Compare commits

...

15 Commits

Author          SHA1        Message                                  Date
Vitali Fedulov  53c0313311  Update README.md                         2024-04-06 03:29:37 +02:00
Vitali Fedulov  af4edd7579  Update README.md                         2024-03-20 22:51:11 +01:00
Vitali Fedulov  0cdc91b59c  -                                        2024-03-20 22:44:19 +01:00
Vitali Fedulov  8330dfbe44  -                                        2024-02-07 21:02:09 +01:00
Vitali Fedulov  bd63fbbcd4  fnv.New64a test fix and README update    2024-02-07 20:55:42 +01:00
Vitali Fedulov  5e6500c206  -                                        2023-10-16 17:15:20 +02:00
Vitali Fedulov  fc910b3659  Update README.md                         2022-07-07 03:14:01 +02:00
Vitali Fedulov  1040b4f3a9  -                                        2022-05-12 01:22:15 +02:00
Vitali Fedulov  2f83399465  -                                        2022-03-22 22:19:35 +01:00
Vitali Fedulov  4d66dfaf90  -                                        2022-03-22 22:18:00 +01:00
Vitali Fedulov  a96a06a77a  -                                        2022-03-16 18:53:45 +01:00
Vitali Fedulov  b98bf8c70d  -                                        2022-03-16 18:51:23 +01:00
Vitali Fedulov  5a935f1d16  Update README.md                         2022-02-09 13:57:31 +01:00
Vitali Fedulov  8dab742792  Update README.md                         2022-02-06 20:23:38 +01:00
Vitali Fedulov  a441b7817f  Update README.md                         2022-02-06 20:21:23 +01:00

4 changed files with 19 additions and 14 deletions

@ -1,6 +1,6 @@
MIT License
-Copyright (c) 2121 Vitali Fedulov (fedulov.vitali@gmail.com)
+Copyright (c) 2021 Vitali Fedulov (fedulov.vitali@gmail.com)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

@ -1,11 +1,16 @@
-# Hashing float vectors in N-dimensions
+# Hashing N-dimensional float vectors
-Package hyper allows fast approximate search of nearest neighbour vectors in n-dimensional space.
+Search nearest neighbour vectors in n-dimensional space with hashes. There are no dependencies in this package.
-Package functions discretize a vector and generate a set of hashes, as described in the [following document](https://vitali-fedulov.github.io/algorithm-for-hashing-high-dimensional-float-vectors.html).
+The algorithm is based on the assumption that two real numbers can be considered equal within certain equality distance. Then quantization is used for comparison. To make sure points near or at quantization borders are also comparable, a vector can be discretized into more than one hash, as described [here](https://vitali-fedulov.github.io/similar.pictures/algorithm-for-hashing-high-dimensional-float-vectors.html) (also as [PDF](https://github.com/vitali-fedulov/research/blob/main/Algorithm%20for%20hashing%20float%20vectors.pdf)). The method indirectly clusters given vectors by hypercubes.
-To use the package follow the sequence of functions/methods:
-1) CubeSet or CentralCube, depending which one is used for a database record and which one for a query.
-2) HashSet and DecimalHash to get corresponding hash set and central hash from results of (2). If DecimalHash is not suitable because of very large number of buckets or dimensions, use FNV1aHash to get both the hash set and the central hash).
-[Example](https://github.com/vitali-fedulov/images3/blob/master/hashes.go) of usage for image comparison.
+## How to use
+1) Provided a float vector []float64, use `CubeSet` and `CentralCube` functions to generate hypercube coordinates []int. The difference between the two functions is that one corresponds to hash-table record and the other to a query or vice versa, depending on performance/memory preference.
+2) `HashSet` and `DecimalHash`/`FNV1aHash` are used to get corresponding hash set and central hash from the hypercube coordinates above. There are 2 alternative hash functions: DecimalHash and FNV1aHash. DecimalHash does not have collisions, but is not suitable for cases with large number of buckets or dimensions. FNV1aHash is applicable for all cases.
+[Example](https://github.com/vitali-fedulov/imagehash2/blob/main/hashes.go) for similar image search and clustering.
+[Go doc](https://pkg.go.dev/github.com/vitali-fedulov/hyper) for full code documentation.
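The README says DecimalHash has no collisions but does not suit large numbers of buckets or dimensions. The sketch below illustrates why. It assumes a positional encoding in which every hypercube coordinate is one digit in base numBuckets. This diff does not show the package's actual DecimalHash formula, so `positionalHash` and `fits` are hypothetical helpers, not the library's API. In this sketch, distinct cubes get distinct hashes only while numBuckets^dims values fit in a uint64. Large bucket counts or dimensions break exactly that condition.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// positionalHash is a hypothetical collision-free cube hash (not the hyper
// package's DecimalHash): each coordinate is one digit in base numBuckets.
// Cubes of the same length with coordinates in [0, numBuckets) get distinct
// hashes as long as numBuckets^len(cube) values fit in a uint64.
func positionalHash(cube []int, numBuckets int) (uint64, error) {
	if numBuckets < 2 {
		return 0, errors.New("numBuckets must be at least 2")
	}
	if !fits(len(cube), numBuckets) {
		return 0, errors.New("numBuckets^dims exceeds the uint64 range")
	}
	base := uint64(numBuckets)
	var h uint64
	for _, c := range cube {
		if c < 0 || c >= numBuckets {
			return 0, fmt.Errorf("coordinate %d outside [0, %d)", c, numBuckets)
		}
		h = h*base + uint64(c)
	}
	return h, nil
}

// fits reports whether all numBuckets^dims possible hashes (every base-numBuckets
// number with dims digits) fit in a uint64.
func fits(dims, numBuckets int) bool {
	if numBuckets < 1 {
		return false
	}
	base := uint64(numBuckets)
	var largest uint64 // largest hash reachable with the digits so far
	for i := 0; i < dims; i++ {
		if largest > (math.MaxUint64-(base-1))/base {
			return false // appending another digit would overflow
		}
		largest = largest*base + (base - 1)
	}
	return true
}

func main() {
	h, err := positionalHash([]int{1, 0, 3}, 10)
	fmt.Println(h, err)                     // 103 <nil>
	fmt.Println(fits(19, 10), fits(20, 10)) // true false
}
```

With numBuckets = 10 this hash is the coordinates written as decimal digits. Only 19 dimensions fit in that case. Past the limit, the README points to FNV1aHash, which accepts any cube but can collide.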

@ -18,7 +18,7 @@ type Params struct {
// CubeSet returns a set of hypercubes, which represent
// fuzzy discretization of one n-dimensional vector,
// as described in
-// https://vitali-fedulov.github.io/algorithm-for-hashing-high-dimensional-float-vectors.html
+// https://vitali-fedulov.github.io/similar.pictures/algorithm-for-hashing-high-dimensional-float-vectors.html
// One hupercube is defined by bucket numbers in each dimension.
// min and max are minimum and maximum possible values of
// the vector components. The assumption is that min and max

@ -17,7 +17,7 @@ func TestDecimalHash(t *testing.T) {
func TestFNV1aHash(t *testing.T) {
cube := Cube{5, 59, 255, 9, 7, 12, 22, 31}
hash := cube.FNV1aHash()
-want := uint64(1659788114117494335)
+want := uint64(6267598672213710911)
if hash != want {
t.Errorf(`Got %v, want %v.`, hash, want)
}
@ -31,10 +31,10 @@ func TestHashSet(t *testing.T) {
{1, 0, 8, 3, 0, 0, 9}}
hashSet := cubes.HashSet((Cube).FNV1aHash)
want := []uint64{
-6172277127052188606,
-3265650857171344968,
-13730239218993256724,
-6843127655045710906}
+9211138565158515574,
+6304441926533466432,
+5296875461196147964,
+13706017245957046114}
if !reflect.DeepEqual(hashSet, want) {
t.Errorf(`Got %v, want %v.`, hashSet, want)
}
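The tests exercise `cube.FNV1aHash()` and `cubes.HashSet((Cube).FNV1aHash)`, and one commit message mentions `fnv.New64a`. Below is a minimal sketch of that hashing step and a record/query lookup, built on the standard library's `hash/fnv`. The functions `fnv1aHash` and `hashSet` are hypothetical stand-ins, not the package's code. The byte encoding of a cube (8 little-endian bytes per coordinate) is also an assumption, so these hashes will not reproduce the `want` values in the test above.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// fnv1aHash hashes one hypercube with 64-bit FNV-1a (hash/fnv.New64a).
// Assumption for illustration: each coordinate is written to the hasher
// as 8 little-endian bytes; the hyper package may encode cubes differently.
func fnv1aHash(cube []int) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	for _, c := range cube {
		binary.LittleEndian.PutUint64(buf[:], uint64(c))
		h.Write(buf[:]) // Write on a hash.Hash never returns an error
	}
	return h.Sum64()
}

// hashSet applies hash to every cube of a cube set, preserving order.
func hashSet(cubes [][]int, hash func([]int) uint64) []uint64 {
	hashes := make([]uint64, len(cubes))
	for i, c := range cubes {
		hashes[i] = hash(c)
	}
	return hashes
}

func main() {
	// Store a record under every hash of its fuzzy cube set.
	// The cubes are hardcoded here; normally they come from discretization.
	index := map[uint64][]string{}
	recordCubes := [][]int{{0, 1, 2}, {0, 1, 3}, {0, 2, 2}, {0, 2, 3}}
	for _, h := range hashSet(recordCubes, fnv1aHash) {
		index[h] = append(index[h], "record-1")
	}

	// Query with the single central cube of another vector. A hit only
	// yields candidates: FNV-1a can collide, so compare them exactly.
	query := []int{0, 2, 2}
	fmt.Println(index[fnv1aHash(query)])
}
```

Indexing each record under its whole cube set costs more memory but needs only one lookup per query. Indexing records by central hash and querying with the cube set does the reverse. This is the performance/memory trade-off the README mentions.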