I have a set of vectors. For a vector in that set I like to find the sub se开发者_StackOverflow社区t that is closeest to this vector. What algorithm can do this.
This class of algorithms is called Nearest Neighbor or K Nearest Neighbor.
The cosine similarity as excepeiont says will work if direction of vector is important. If the vector represents a position in a space, then any metric for representing a distance in the space will work.
For example the Euclidean distance: take the square root of the sum of squares difference in each dimension. This will give you a distance for each vector, then sort your set of vectors ascending on this distance.
This process will be O(N) in time. If this is too slow for you, you might want to look at some common K Nearest Neighbour algorithms.
use the cosinus similarity (http://en.wikipedia.org/wiki/Cosine_similarity) among the vectors and then sort them.
If your problem relates to large amount of data:
I published a related algorithm on ddj.com, that finds the nearest line to a given point:
Accelerated Search For the Nearest Line
You would have to modify this algorithm by i.e. by converting the given vector to a number of points. This will reduce the number of possible matches drastically. The exact match has then to be checked for each possible match by
- Find the cutting point of both vectors or
- Get distance from vector start and end point to the possible match, as described in the article
精彩评论