Image similarity in Golang. Version 4 (LATEST)
Vitali Fedulov 647070ed77 1st commit 2022-08-08 02:53:29 +02:00

Find similar images with Go

Near duplicates and resized images can be found with the package. No dependencies.

Demo: similar image clustering based on a similar algorithm.

This is the latest major version (v4), succeeding v1/2 and v3. The changes versus v3 are: a simplified input for func Icon, icons with a more than 2x smaller memory footprint, an additional IconNN function, fixed GIF support, removal of dependencies, and removal of hashes (a separate package for hashes will be created and linked from here soon).

Func Similar returns a verdict on whether 2 images are similar, using well-tested default thresholds.

Func EucMetric can be used instead when you need different precision or want to sort by similarity. Func PropMetric can be used to customize the image proportion threshold.
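Sorting by similarity can be sketched as below. This is a minimal illustration, not part of the package: the `rank` helper, the `candidate` type and the `dist` callback are hypothetical names, and `demoDist` returns made-up numbers in place of a real metric. The only assumption is that the metric can be reduced to a single number where smaller means more similar.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate pairs a file name with its distance to the query image.
type candidate struct {
	name string
	dist float64
}

// rank returns names ordered from most to least similar.
// dist is a hypothetical callback returning a single number,
// where a smaller value means a closer match.
func rank(names []string, dist func(name string) float64) []string {
	cs := make([]candidate, len(names))
	for i, n := range names {
		cs[i] = candidate{name: n, dist: dist(n)}
	}
	sort.SliceStable(cs, func(i, j int) bool { return cs[i].dist < cs[j].dist })
	out := make([]string, len(cs))
	for i, c := range cs {
		out[i] = c.name
	}
	return out
}

// demoDist stands in for a real metric; the values are made up.
func demoDist(name string) float64 {
	return map[string]float64{"a.jpg": 310.5, "b.jpg": 12.0, "c.jpg": 95.2}[name]
}

func main() {
	fmt.Println(rank([]string{"a.jpg", "b.jpg", "c.jpg"}, demoDist))
}
```

In real use the callback would compute the distance between the query icon and the icon of each candidate, combining the metric output into one number in whatever way suits the application.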

Func Open supports JPEG, PNG and GIF. Other image types can be used through third-party decoders, because the input for func Icon is a Go image.Image.
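The sketch below shows how an image.Image can be obtained without func Open, using only the standard library. The helpers `decodeBytes` and `samplePNG` are hypothetical names introduced for illustration. A third-party decoder that registers itself with the image package could be blank-imported in the same way as `image/jpeg` here; which decoder to use is left open.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	_ "image/jpeg" // Blank import registers the JPEG decoder.
	"image/png"    // Registers the PNG decoder and provides Encode.
)

// decodeBytes decodes any format whose decoder has been registered
// with the image package. It returns the image and the format name.
func decodeBytes(data []byte) (image.Image, string, error) {
	return image.Decode(bytes.NewReader(data))
}

// samplePNG builds a small in-memory PNG so the sketch needs no files.
func samplePNG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 10), G: uint8(y * 10), B: 128, A: 255})
		}
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := samplePNG(8, 6)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	img, format, err := decodeBytes(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// img is a plain image.Image and could now be passed to func Icon.
	fmt.Println(format, img.Bounds().Dx(), img.Bounds().Dy())
}
```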

Go doc for code reference.

Example of comparing 2 images

package main

import (
	"fmt"
	"github.com/vitali-fedulov/images4"
)

func main() {

	// Photos to compare.
	path1 := "1.jpg"
	path2 := "2.jpg"

	// Open files (discarding errors here).
	img1, _ := images4.Open(path1)
	img2, _ := images4.Open(path2)

	// Icons are compact image representations (image "hashes").
	// The name "hash" is intentionally avoided.
	icon1 := images4.Icon(img1)
	icon2 := images4.Icon(img2)

	// Comparison.
	// Images are not used directly. Icons are used instead,
	// because they have a tiny memory footprint and are fast to compare.
	if images4.Similar(icon1, icon2) {
		fmt.Println("Images are similar.")
	} else {
		fmt.Println("Images are distinct.")
	}
}

Algorithm

Detailed explanation, also as a PDF.

Summary: Images are resized in a special way to squares of fixed size called "icons". Euclidean distance between the icons is used to give the similarity verdict. Image proportions are also used to avoid matching images of distinct shape.
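The summary can be illustrated with a toy version of the same idea. This is only a sketch and not the package's implementation: the icon size, the grayscale averaging, both thresholds and all function names (`toIcon`, `sqDist`, `propRatio`, `similar`, `gradient`) are assumptions chosen for readability. The detailed explanation linked above describes the real resizing method and thresholds.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

const iconSide = 8 // Hypothetical icon size, chosen for illustration.

// toIcon shrinks img to an iconSide x iconSide grid of mean gray values.
func toIcon(img image.Image) []float64 {
	b := img.Bounds()
	sum := make([]float64, iconSide*iconSide)
	cnt := make([]float64, iconSide*iconSide)
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			// Map the source pixel to its icon cell.
			cx := (x - b.Min.X) * iconSide / b.Dx()
			cy := (y - b.Min.Y) * iconSide / b.Dy()
			g := color.GrayModel.Convert(img.At(x, y)).(color.Gray)
			sum[cy*iconSide+cx] += float64(g.Y)
			cnt[cy*iconSide+cx]++
		}
	}
	for i := range sum {
		if cnt[i] > 0 {
			sum[i] /= cnt[i]
		}
	}
	return sum
}

// sqDist returns the mean squared difference between two icons
// (a squared Euclidean distance divided by the icon length).
func sqDist(a, b []float64) float64 {
	var d float64
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d / float64(len(a))
}

// propRatio returns how much two aspect ratios differ (always >= 1).
func propRatio(a, b image.Rectangle) float64 {
	ra := float64(a.Dx()) / float64(a.Dy())
	rb := float64(b.Dx()) / float64(b.Dy())
	if ra > rb {
		return ra / rb
	}
	return rb / ra
}

// similar combines both checks. The thresholds are made up.
func similar(imgA, imgB image.Image) bool {
	const maxDist, maxProp = 100.0, 1.1
	if propRatio(imgA.Bounds(), imgB.Bounds()) > maxProp {
		return false
	}
	return sqDist(toIcon(imgA), toIcon(imgB)) < maxDist
}

// gradient makes a synthetic image: a horizontal brightness ramp,
// optionally reversed to produce different content.
func gradient(w, h int, reversed bool) image.Image {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			v := 255 * x / w
			if reversed {
				v = 255 - v
			}
			img.SetGray(x, y, color.Gray{Y: uint8(v)})
		}
	}
	return img
}

func main() {
	a := gradient(64, 64, false)
	fmt.Println("resized copy:", similar(a, gradient(128, 128, false)))
	fmt.Println("other shape:", similar(a, gradient(128, 32, false)))
	fmt.Println("other content:", similar(a, gradient(64, 64, true)))
}
```

The proportion check runs first because it is cheap and rejects images of a clearly different shape before any pixel data is compared.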

Customization suggestions

To increase precision, you can either use your own thresholds in func EucMetric (and PropMetric) or generate icons for image sub-regions and compare those icons.
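One way to obtain sub-regions is sketched below with the standard library only. The helpers `crop`, `quadrants` and `ramp` are hypothetical, and splitting into four quadrants is just one possible choice of regions. Each region is copied into a new image with its origin at (0, 0). That is a cautious assumption, since this README does not say how func Icon treats images whose bounds do not start at zero.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/draw"
)

// crop copies the region r of img into a new zero-origin RGBA image.
func crop(img image.Image, r image.Rectangle) image.Image {
	r = r.Intersect(img.Bounds())
	dst := image.NewRGBA(image.Rect(0, 0, r.Dx(), r.Dy()))
	draw.Draw(dst, dst.Bounds(), img, r.Min, draw.Src)
	return dst
}

// quadrants splits img into 4 sub-images: top-left, top-right,
// bottom-left and bottom-right.
func quadrants(img image.Image) []image.Image {
	b := img.Bounds()
	midX := b.Min.X + b.Dx()/2
	midY := b.Min.Y + b.Dy()/2
	rects := []image.Rectangle{
		image.Rect(b.Min.X, b.Min.Y, midX, midY),
		image.Rect(midX, b.Min.Y, b.Max.X, midY),
		image.Rect(b.Min.X, midY, midX, b.Max.Y),
		image.Rect(midX, midY, b.Max.X, b.Max.Y),
	}
	parts := make([]image.Image, len(rects))
	for i, r := range rects {
		parts[i] = crop(img, r)
	}
	return parts
}

// ramp builds a synthetic image whose red channel equals the x coordinate.
func ramp(w, h int) image.Image {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{R: uint8(x), A: 255})
		}
	}
	return img
}

func main() {
	for _, p := range quadrants(ramp(100, 60)) {
		// Each p is an image.Image and could be passed to func Icon.
		fmt.Println(p.Bounds())
	}
}
```

Icons generated from matching regions of two images can then be compared pairwise, for example requiring all four quadrants to be similar.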

To speed up file processing, you may want to generate icons from available image thumbnails. Specifically, many JPEG images contain EXIF thumbnails, and you could considerably speed up reads by decoding those thumbnails and feeding them into func Icon. External packages to read thumbnails: 1 and 2. A note of caution: in rare cases thumbnails may not match the image content. EXIF standard specification: 1 and 2.