I have a dataset for which I need to find the K nearest neighbours, or all the neighbours within a distance d. The dataset has a custom distance defined, but it is not a Euclidean distance.
I have used metric trees before, mostly the cover tree. In this case, however, my dataset is going to be larger than the available memory. So, is there any data structure that can be used for nearest neighbours on a disk stored dataset? A good database index for this operation would also be useful.
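You could use the cover tree to hold pointers into your on-disk dataset instead of the records themselves. Each pointer would contain the relative record number, plus whatever additional information from the record is needed to traverse the tree. That way only the tree structure has to fit in memory, and a record is read from disk only when a distance to it must be computed.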
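To make the "tree of pointers" idea concrete, here is a minimal sketch. It swaps in a vantage-point tree for the cover tree because it is shorter to write; the principle of storing record numbers rather than records is the same. Everything named here is an assumption for illustration: the fixed-width record layout (`RECORD`, two doubles), the helpers `write_records` and `read_record`, and the `manhattan` distance standing in for your custom one. The pruning is only valid if your distance satisfies the triangle inequality, which a cover tree also requires.

```python
import struct

# Hypothetical fixed-width record layout: two little-endian doubles per record.
RECORD = struct.Struct("<2d")

def write_records(path, points):
    """Write the points to disk as fixed-width records (illustration only)."""
    with open(path, "wb") as f:
        for p in points:
            f.write(RECORD.pack(*p))

def read_record(f, recno):
    """Fetch one record from disk by its relative record number."""
    f.seek(recno * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

def manhattan(a, b):
    """Stand-in for the custom metric; must obey the triangle inequality."""
    return sum(abs(x - y) for x, y in zip(a, b))

def build(f, recnos, dist):
    """Build a vantage-point tree whose nodes hold record numbers only.

    Each node is a tuple (vantage recno, radius, inner subtree, outer subtree).
    """
    if not recnos:
        return None
    vp, rest = recnos[0], recnos[1:]
    if not rest:
        return (vp, 0.0, None, None)
    vp_rec = read_record(f, vp)
    ds = sorted((dist(vp_rec, read_record(f, r)), r) for r in rest)
    mid = len(ds) // 2
    radius = ds[mid][0]
    inner = [r for _, r in ds[:mid]]  # distance to vantage point <= radius
    outer = [r for _, r in ds[mid:]]  # distance to vantage point >= radius
    return (vp, radius, build(f, inner, dist), build(f, outer, dist))

def range_query(f, node, q, d_max, dist, out):
    """Append (distance, recno) to out for every record within d_max of q."""
    if node is None:
        return
    vp, radius, inner, outer = node
    d = dist(q, read_record(f, vp))
    if d <= d_max:
        out.append((d, vp))
    # Triangle inequality: inner records are at least d - radius away from q.
    if d - d_max <= radius:
        range_query(f, inner, q, d_max, dist, out)
    # Outer records are at least radius - d away from q.
    if d + d_max >= radius:
        range_query(f, outer, q, d_max, dist, out)
```

To use it, open the data file in binary mode, call `build(f, list(range(n)), manhattan)` once, then call `range_query` with the same open file. In memory, each node costs one integer and one float regardless of how large the records are. Note that every node visited still triggers a disk seek, so for a large dataset you would probably want to cache hot records or store a few pivot distances in the node itself.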