Image similarity in Golang. Version 4 (LATEST)
Vitali Fedulov 9014e3e259 Added new func: CustomSimilar (3) 2024-01-30 01:18:45 +01:00

Find similar images with Go (LATEST VERSION)

Comparison of resized and near-duplicate images. No dependencies.

Demo: similar pictures search and clustering (pure in-browser JS app served from).

Major (semantic) versions have their own repositories and are mutually incompatible:

Major version   Repository       Comment
4               images4 - this   recommended; fast hash prefiltering (re)moved to imagehash
3               images3          good, but less optimized
1, 2            images           good, legacy code

Example of comparing 2 images

package main

import (
	"fmt"
	"github.com/vitali-fedulov/images4"
)

func main() {

	// Photos to compare.
	path1 := "1.jpg"
	path2 := "2.jpg"

	// Open files (discarding errors here).
	img1, _ := images4.Open(path1)
	img2, _ := images4.Open(path2)

	// Icons are compact image representations (image "hashes").
	// Name "hash" is reserved for "true" hashes in package imagehash.
	icon1 := images4.Icon(img1)
	icon2 := images4.Icon(img2)

	// Comparison. Images are not used directly. Icons are used instead,
	// because they have a tiny memory footprint and are fast to compare.
	// If you need to include images rotated right and left,
	// use func Similar90270.
	if images4.Similar(icon1, icon2) {
		fmt.Println("Images are similar.")
	} else {
		fmt.Println("Images are distinct.")
	}
}

Main functions

  • Open supports JPEG, PNG and GIF. Other image types can be used through third-party decoders, because the input to func Icon is a Go image.Image. Example fork (not mine) expanded with support for WEBP images.

  • Icon produces "image hashes" called "icons", which are used for comparison.

  • Similar gives a verdict on whether 2 images are similar, using well-tested default thresholds. To see the thresholds, use DefaultThresholds. Rotations and mirroring are not taken into account.

  • Similar90270 extends Similar with an additional comparison to images rotated ±90°. Such rotations are relatively common, and can even happen by accident when taking pictures on mobile phones.

  • EucMetric can be used instead of Similar when you need different precision or want to sort by similarity. Example (not mine) of a custom similarity function.

  • PropMetric allows customization of image proportion threshold.

  • DefaultThresholds prints the default thresholds used in funcs Similar and Similar90270, as a starting point for selecting thresholds on EucMetric and PropMetric.

  • Rotate90 turns an icon 90° clockwise. This is useful for developing a custom similarity function for rotated images with EucMetric and PropMetric. With this function you can also compare to images rotated 180° (by applying Rotate90 twice).
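
The note on Open above says that any decoder producing an image.Image can feed func Icon. A minimal sketch of that idea using only the standard library follows. The helper names decodeAny and samplePNG are hypothetical and not part of this package. The standard image.Decode dispatches to whichever decoders are registered in the program. A third-party decoder is assumed to register itself the same way when its package is imported, which should be checked for the decoder you choose.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/png" // Importing this package also registers the PNG decoder.
)

// decodeAny (hypothetical helper) decodes image bytes with whichever
// decoders are registered in this program. It returns a generic
// image.Image plus the detected format name.
func decodeAny(data []byte) (image.Image, string, error) {
	return image.Decode(bytes.NewReader(data))
}

// samplePNG (hypothetical helper) builds a small in-memory PNG,
// so the example does not depend on any file on disk.
func samplePNG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 40), G: uint8(y * 40), B: 128, A: 255})
		}
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := samplePNG(4, 3)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	img, format, err := decodeAny(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// img is a plain image.Image, the type that func Icon takes as input.
	fmt.Println(format, img.Bounds().Dx(), img.Bounds().Dy())
}
```

The decoded value can then be passed to Icon in place of the result of Open.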

Go doc for code reference.

Algorithm

Detailed explanation, also as a PDF.

Summary: Images are resized in a special way to squares of fixed size called "icons". Euclidean distance between the icons is used to give the similarity verdict. Image proportions are also used, to avoid matching images of distinct shape.
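
To make the summary concrete, here is a toy sketch of the general idea, not the package's actual code. Everything in it is an assumption made for illustration: the names toyIcon, eucDist and toySimilar, the 8x8 grayscale grid, plain cell averaging as the resizing step, and the threshold values. The real resizing method, icon layout and thresholds are described in the detailed explanation.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math"
)

const side = 8 // Hypothetical icon side, chosen only for this sketch.

// toyIcon averages brightness over a side x side grid of cells.
// It assumes the image is at least side pixels wide and high.
func toyIcon(img image.Image) []float64 {
	b := img.Bounds()
	sum := make([]float64, side*side)
	cnt := make([]float64, side*side)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			cx := (x - b.Min.X) * side / b.Dx()
			cy := (y - b.Min.Y) * side / b.Dy()
			g := color.GrayModel.Convert(img.At(x, y)).(color.Gray)
			sum[cy*side+cx] += float64(g.Y)
			cnt[cy*side+cx]++
		}
	}
	for i := range sum {
		if cnt[i] > 0 {
			sum[i] /= cnt[i]
		}
	}
	return sum
}

// eucDist returns the Euclidean distance between two equal-length icons.
func eucDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return math.Sqrt(s)
}

// toySimilar first rejects images of clearly different proportions,
// then compares icons by Euclidean distance. Both thresholds are
// hypothetical parameters, not the package defaults.
func toySimilar(a, b image.Image, maxDist, maxPropDiff float64) bool {
	ra := float64(a.Bounds().Dx()) / float64(a.Bounds().Dy())
	rb := float64(b.Bounds().Dx()) / float64(b.Bounds().Dy())
	if math.Abs(ra-rb)/math.Max(ra, rb) > maxPropDiff {
		return false
	}
	return eucDist(toyIcon(a), toyIcon(b)) <= maxDist
}

// gradient builds a horizontal grayscale gradient as test input.
func gradient(w, h int, invert bool) image.Image {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			v := uint8(255 * x / w)
			if invert {
				v = 255 - v
			}
			img.SetGray(x, y, color.Gray{Y: v})
		}
	}
	return img
}

func main() {
	small := gradient(64, 64, false)
	large := gradient(128, 128, false) // Same picture, resized.
	other := gradient(64, 64, true)    // Different picture.
	fmt.Println("resized copy similar:", toySimilar(small, large, 50, 0.1))
	fmt.Println("inverted image similar:", toySimilar(small, other, 50, 0.1))
}
```

Because both images are reduced to the same fixed-size grid before comparison, a resized copy stays close to the original in this toy metric, while an image with different content or a different shape does not.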

Customization suggestions

To increase precision, you can apply your own thresholds to the values returned by func EucMetric (and PropMetric), or generate icons for image sub-regions and compare those icons.
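
One possible way to obtain sub-regions is sketched below with the standard library only. The helpers quadrants, subRegions and demoImage are hypothetical, and splitting into four quadrants is just one choice of regions. The sketch relies on the SubImage method that many standard image types provide (for example *image.RGBA, *image.Gray and *image.YCbCr). Note that the returned views keep their original coordinates, so their bounds do not start at (0, 0). Whether func Icon handles such bounds correctly is not verified here and should be tested before relying on it.

```go
package main

import (
	"fmt"
	"image"
)

// subImager is satisfied by standard image types that can return
// a view of a rectangular region without copying pixels.
type subImager interface {
	SubImage(r image.Rectangle) image.Image
}

// quadrants splits bounds b into 4 rectangles of roughly equal size.
func quadrants(b image.Rectangle) []image.Rectangle {
	mx := b.Min.X + b.Dx()/2
	my := b.Min.Y + b.Dy()/2
	return []image.Rectangle{
		image.Rect(b.Min.X, b.Min.Y, mx, my), // top left
		image.Rect(mx, b.Min.Y, b.Max.X, my), // top right
		image.Rect(b.Min.X, my, mx, b.Max.Y), // bottom left
		image.Rect(mx, my, b.Max.X, b.Max.Y), // bottom right
	}
}

// subRegions returns the 4 quadrants of img as separate images.
// The second result is false when the image type has no SubImage method.
func subRegions(img image.Image) ([]image.Image, bool) {
	si, ok := img.(subImager)
	if !ok {
		return nil, false
	}
	var out []image.Image
	for _, r := range quadrants(img.Bounds()) {
		out = append(out, si.SubImage(r))
	}
	return out, true
}

// demoImage returns a blank image used only to demonstrate the split.
func demoImage() image.Image {
	return image.NewRGBA(image.Rect(0, 0, 100, 60))
}

func main() {
	regions, ok := subRegions(demoImage())
	if !ok {
		fmt.Println("image type does not support SubImage")
		return
	}
	for i, r := range regions {
		// Each region is an image.Image, so an icon could be made from it.
		fmt.Println(i, r.Bounds())
	}
}
```

Comparing icons region by region lets two images pass only when corresponding regions match, which is stricter than comparing whole-image icons.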

To accelerate image decoding you can generate icons for embedded image thumbnails. Specifically, many JPEG images contain EXIF thumbnails. Packages to read thumbnails: 1 and 2. A note of caution: in rare cases there could be issues with thumbnails not matching image content. EXIF standard specification: 1 and 2.

To search in very large image collections (millions and more), use hash-table pre-filtering with my package imagehash.
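
The general idea of hash-table pre-filtering can be sketched as follows. This is not the API or the algorithm of package imagehash. The names key, cellKey, index, add and candidates, and the constants keyDims and cellSize, are all hypothetical. The sketch quantizes a few non-negative descriptor values into a coarse grid cell and uses that cell as a map key, so that only items in the same bucket need a full comparison.

```go
package main

import "fmt"

const (
	keyDims  = 4    // Hypothetical: number of descriptor values used in the key.
	cellSize = 32.0 // Hypothetical: quantization step per value.
)

// key identifies one coarse grid cell in descriptor space.
type key [keyDims]int

// cellKey quantizes the first keyDims values of v into a grid cell.
// Values are assumed to be non-negative.
func cellKey(v []float64) key {
	var k key
	for i := 0; i < keyDims && i < len(v); i++ {
		k[i] = int(v[i] / cellSize)
	}
	return k
}

// index maps grid cells to the ids of items that fall into them.
type index map[key][]int

// add stores item id under the cell of its descriptor v.
func (ix index) add(id int, v []float64) {
	k := cellKey(v)
	ix[k] = append(ix[k], id)
}

// candidates returns ids that share a cell with the query descriptor.
// Only these need the full (slower) similarity comparison.
func (ix index) candidates(v []float64) []int {
	return ix[cellKey(v)]
}

func main() {
	ix := index{}
	ix.add(0, []float64{10, 200, 90, 40})
	ix.add(1, []float64{12, 205, 92, 45})
	ix.add(2, []float64{250, 10, 10, 10})

	query := []float64{11, 201, 91, 41}
	fmt.Println("candidates:", ix.candidates(query))
}
```

A single grid like this misses near-duplicates whose values fall on opposite sides of a cell boundary, so a practical scheme has to compensate, for example by also probing neighboring cells.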