Is anyone aware of a KD-Tree, or similar spatial index, implemented in SQL? I was considering writing my own using Python and Django's ORM, but I'd like to avoid reinventing the wheel.
I have a table containing millions of rows, with each row containing 128 columns representing image feature data. Given an arbitrary 128-element long list of image features, I want to use a KD-Tree to find the N most similar images in the database. I've found a lot of KD-Tree implementations, but they all appear to only load in local memory and don't scale o开发者_JAVA百科r talk to databases.
KD-tree does not work well for high-dimensional data, and 128 dimensions would be quite high. The KD-tree indexes each dimension at a different level of the tree, and when performing a query the algorithm will do a lot of back-tracking (searching both sides of a branch) and ends up searching most of the points in the tree. When this happens the advantages of using a tree structure disappear and an exhaustive comparison ends up running faster.
You may want to find an existing image similarity search system that you can map your data into. Here is one called Lire which extracts features from images and indexes them using Lucene.
If your work is more research-oriented you may want to read up on metric space indexes and approximate k-nearest neighbor search.
I might be a little out here, but your best bet may be using the Gist / Gin indexes inside of Postgresql
精彩评论