Image similarity in Golang. Version 4 (LATEST)
Vitali Fedulov ca2e7db1ef
-
2024-04-02 01:02:06 +02:00

Find similar images with Go (LATEST VERSION)

Comparison of resized and near-duplicate images. No dependencies. For search in very large image sets, use imagehash2 as a fast pre-filtering step.

Demo: similar pictures search and clustering (pure in-browser JS app served from).

Major (semantic) versions have their own repositories and are mutually incompatible:

| Major version | Repository | Comment |
|---|---|---|
| 4 | images4 | recommended; fast hash pre-filtering moved out to imagehash2 |
| 3 | images3 | good, but less optimized |
| 1, 2 | images | good, legacy code |

See Go doc for full code documentation.

Example of comparing 2 images

package main

import (
	"fmt"
	"github.com/vitali-fedulov/images4"
)

func main() {

	// Opening and decoding images. Silently discarding errors.
	img1, _ := images4.Open("1.jpg")
	img2, _ := images4.Open("2.jpg")

	// Icons are compact hash-like image representations.
	icon1 := images4.Icon(img1)
	icon2 := images4.Icon(img2)

	// Comparison. Images are not used directly.
	// Use func CustomSimilar for different precision.
	if images4.Similar(icon1, icon2) {
		fmt.Println("Images are similar")
	} else {
		fmt.Println("Not similar")
	}

}

Main functions

  • Open decodes JPEG, PNG and GIF. Other formats can be opened with third-party decoders, because the input to func 'Icon' is a Go image.Image. Example: a fork (not mine) extended with support for WEBP images.

  • Icon produces a hash-like struct called an "icon", which is used for comparison. Side note: the name "hash" is reserved for true hash tables in the related package imagehash2, which provides faster comparison.

  • Similar gives a verdict on whether 2 images are similar, using well-tested default thresholds. Rotations and mirrors are not taken into account.

  • CustomSimilar is like 'Similar' above, but allows modifying the default thresholds by multiplication coefficients. When the coefficients equal 1.0, the two functions are equivalent. When the coefficients are less than 1.0, the comparison is stricter, down to 0.0, which matches only identical images.
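
Because 'Icon' accepts a standard image.Image, any decoder that produces one can stand in for 'Open'. The sketch below uses only the standard library: it builds a small PNG in memory and decodes it with image.Decode. The helper names decode and samplePNG are made up for this illustration and are not part of images4. A third-party format would typically be enabled the same way, by importing a decoder package that registers itself with the image package.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/png" // Registers the PNG decoder and provides Encode.
)

// decode turns encoded bytes into an image.Image using whichever
// decoders have been registered through imports.
func decode(data []byte) (image.Image, string, error) {
	return image.Decode(bytes.NewReader(data))
}

// samplePNG builds a small in-memory PNG so the example needs no files.
func samplePNG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x * 10), G: uint8(y * 10), B: 128, A: 255})
		}
	}
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := samplePNG(8, 4)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	img, format, err := decode(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// img is a plain image.Image and could be handed to images4.Icon.
	fmt.Println(format, img.Bounds().Dx(), img.Bounds().Dy())
}
```

The returned format string is the name under which the decoder was registered, which can help when logging unsupported files.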

Advanced functions

  • Similar90270 extends 'Similar' with additional comparisons against images rotated ±90°. Such rotations are relatively common, and can happen by accident when taking pictures on mobile phones.

  • CustomSimilar90270 is the rotation-aware counterpart of 'CustomSimilar', with the same custom coefficients.

  • EucMetric is an alternative to 'CustomSimilar' when you need the metric values themselves, for example to sort by similarity. Example (not mine) of a custom similarity function.

  • PropMetric does the same for image proportions.

  • DefaultThresholds prints the default thresholds used in funcs 'Similar' and 'Similar90270', as a starting point for selecting thresholds on 'EucMetric' and 'PropMetric'.

  • Rotate90 turns an icon 90° clockwise. This is useful for developing a custom similarity function for rotated images with 'EucMetric' and 'PropMetric'. With this function you can also compare to images rotated 180° (by applying 'Rotate90' twice).

  • ResizeByNearest is an image resizing function useful for fast identification of identical images and development of custom distance metrics not involving any of the above comparison functions.
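
To illustrate why applying a 90° clockwise turn twice gives a 180° rotation, here is a minimal sketch on a plain square grid. The grid type and the rotate90 function are hypothetical stand-ins written for this example; they are not the images4 icon type or its 'Rotate90' implementation.

```go
package main

import "fmt"

// grid is a hypothetical stand-in for an icon: an n×n block of values
// stored row by row. It is not the images4 icon type.
type grid struct {
	n    int
	vals []float64
}

// rotate90 returns a copy of g turned 90° clockwise.
func rotate90(g grid) grid {
	out := grid{n: g.n, vals: make([]float64, len(g.vals))}
	for y := 0; y < g.n; y++ {
		for x := 0; x < g.n; x++ {
			// The value at column x, row y moves to column n-1-y, row x.
			out.vals[x*g.n+(g.n-1-y)] = g.vals[y*g.n+x]
		}
	}
	return out
}

func main() {
	// Rows: [1 2] and [3 4].
	g := grid{n: 2, vals: []float64{1, 2, 3, 4}}
	r90 := rotate90(g)
	r180 := rotate90(r90)
	fmt.Println(r90.vals)  // [3 1 4 2]
	fmt.Println(r180.vals) // [4 3 2 1]
}
```

A custom comparison for rotated images could then evaluate the metrics against each rotated variant and keep the best match.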

Algorithm

Images are resampled and resized to squares of fixed size called "icons". Euclidean distance between the icons is used to give the similarity verdict. Image proportions are also used, to avoid matching images of distinct shape.
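
The idea can be sketched with the standard library alone. Everything specific in the block below is an assumption made for illustration: the 8×8 grid, the use of brightness only, the threshold values, and all type and function names. The actual package differs in its details, which the linked explanation covers.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math"
)

// iconSize is a hypothetical side length chosen for this sketch.
const iconSize = 8

// icon is a simplified, hypothetical icon: average brightness per cell
// plus the source dimensions for the proportion check.
type icon struct {
	cells [iconSize * iconSize]float64
	w, h  int
}

// makeIcon averages the brightness of the pixels that fall into each
// cell of an iconSize×iconSize grid laid over the image.
func makeIcon(img image.Image) icon {
	b := img.Bounds()
	ic := icon{w: b.Dx(), h: b.Dy()}
	var counts [iconSize * iconSize]float64
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			cx := (x - b.Min.X) * iconSize / b.Dx()
			cy := (y - b.Min.Y) * iconSize / b.Dy()
			gray := color.GrayModel.Convert(img.At(x, y)).(color.Gray)
			ic.cells[cy*iconSize+cx] += float64(gray.Y)
			counts[cy*iconSize+cx]++
		}
	}
	for i := range ic.cells {
		if counts[i] > 0 { // Cells stay 0 for images smaller than the grid.
			ic.cells[i] /= counts[i]
		}
	}
	return ic
}

// distance is the Euclidean distance between the cells of two icons.
func distance(a, b icon) float64 {
	sum := 0.0
	for i := range a.cells {
		d := a.cells[i] - b.cells[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// proportionsMatch reports whether the width/height ratios differ by
// at most the relative tolerance tol.
func proportionsMatch(a, b icon, tol float64) bool {
	ra := float64(a.w) / float64(a.h)
	rb := float64(b.w) / float64(b.h)
	return math.Abs(ra-rb) <= tol*math.Max(ra, rb)
}

// similar combines both checks using hypothetical thresholds.
func similar(a, b icon) bool {
	const maxDistance, propTolerance = 40.0, 0.05
	return proportionsMatch(a, b, propTolerance) && distance(a, b) <= maxDistance
}

// gradient draws a horizontal brightness ramp of the given size.
func gradient(w, h int) image.Image {
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetGray(x, y, color.Gray{Y: uint8(255 * x / w)})
		}
	}
	return img
}

func main() {
	big := makeIcon(gradient(64, 64))
	small := makeIcon(gradient(32, 32)) // Same picture, resized.
	wide := makeIcon(gradient(128, 64)) // Same ramp, different shape.
	fmt.Println(similar(big, small))    // true
	fmt.Println(similar(big, wide))     // false
}
```

The resized copy passes both checks, while the wide image is rejected by the proportion check before the distance is even considered.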

Detailed explanation, also as a PDF.

Speed and precision

To considerably accelerate comparison in large image collections (thousands of images or more), use hash-table pre-filtering with package imagehash2.

To considerably accelerate image decoding you can generate icons for embedded image thumbnails. Specifically, many JPEG images contain EXIF thumbnails. Packages to read thumbnails: 1 and 2. A note of caution: in rare cases there could be issues with thumbnails not matching image content. EXIF standard specification: 1 and 2.

An alternative to func 'CustomSimilar' for increasing precision is to generate icons for image sub-regions and compare those icons.