I have a lot of (x, y) points, and each point has some extra data associated with it, which I'll be storing in a struct. My application requires that, given any one point, I find how many other points lie within a rectangular area surrounding it (the given point is at the centre of the rectangle).
One logic I've thought of is to store all x values as the keys in a map A and all y values as the keys in another map B. Map A will have x as the key and y as the value. Map B will have y as the key and the associated struct as the value.
This way, if the given point is (10.5, 20.6), I can use lower_bound(10.5 - RECTANGLE_WIDTH) and upper_bound(10.5 + RECTANGLE_WIDTH) to find the range of x values lying within the rectangle, and then, for the corresponding y values, check whether they lie within the ± range of 20.6.
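A minimal sketch of this map-based idea, collapsing the two maps into a single std::multimap keyed on x so duplicate x values are allowed (the Point and Extra structs and the half-width/half-height parameters are illustrative assumptions, not fixed names):

```cpp
#include <map>

// Hypothetical payload attached to each point.
struct Extra { int id; };

struct Point { double x, y; Extra data; };

// Count points inside the rectangle centred on (cx, cy) with
// half-width hw and half-height hh, using a multimap keyed on x.
std::size_t count_in_rect(const std::multimap<double, Point>& by_x,
                          double cx, double cy, double hw, double hh) {
    std::size_t count = 0;
    auto lo = by_x.lower_bound(cx - hw);   // first x inside the rectangle
    auto hi = by_x.upper_bound(cx + hw);   // one past the last x inside
    for (auto it = lo; it != hi; ++it) {
        double y = it->second.y;
        if (y >= cy - hh && y <= cy + hh)
            ++count;                       // x already in range; only y needs checking
    }
    return count;
}
```

Points would be inserted once with `by_x.emplace(p.x, p)`. Each query costs O(log n) to find the x-range, plus a linear scan of every point whose x falls inside the slab, regardless of its y.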
My whole point in using a map was that I have a massive store of (x, y) points and the search has to be done every two seconds, so I wanted the O(log n) lookup a map gives. I feel this can be done more efficiently. Suggestions?
This is a typical application for a quadtree. The quadtree facilitates lookup of the m points lying in your rectangle in O(log(n) + m), where n is the total number of points.
Edit: Your approach using the map is not nearly as efficient. For randomly distributed points, it would have an O(sqrt(n)) average complexity, and O(n) worst-case.
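A minimal point quadtree along these lines (the node capacity, struct names, and the centre/half-extent rectangle convention are all illustrative choices, not a fixed API):

```cpp
#include <cmath>
#include <memory>
#include <vector>

struct Pt { double x, y; };

struct Rect {
    double cx, cy, hw, hh;                      // centre and half-extents
    bool contains(const Pt& p) const {
        return p.x >= cx - hw && p.x <= cx + hw &&
               p.y >= cy - hh && p.y <= cy + hh;
    }
    bool intersects(const Rect& o) const {
        return std::abs(cx - o.cx) <= hw + o.hw &&
               std::abs(cy - o.cy) <= hh + o.hh;
    }
};

class Quadtree {
    static constexpr std::size_t kCap = 4;      // split threshold
    Rect bounds_;
    std::vector<Pt> pts_;
    std::unique_ptr<Quadtree> child_[4];

    void subdivide() {
        double hw = bounds_.hw / 2, hh = bounds_.hh / 2;
        int i = 0;
        for (double dx : {-hw, hw})
            for (double dy : {-hh, hh})
                child_[i++] = std::make_unique<Quadtree>(
                    Rect{bounds_.cx + dx, bounds_.cy + dy, hw, hh});
    }

public:
    explicit Quadtree(Rect b) : bounds_(b) {}

    bool insert(const Pt& p) {
        if (!bounds_.contains(p)) return false;
        if (pts_.size() < kCap && !child_[0]) { pts_.push_back(p); return true; }
        if (!child_[0]) subdivide();
        for (auto& c : child_) if (c->insert(p)) return true;
        return false;
    }

    // Collect every stored point lying inside r.
    void query(const Rect& r, std::vector<Pt>& out) const {
        if (!bounds_.intersects(r)) return;
        for (const auto& p : pts_) if (r.contains(p)) out.push_back(p);
        if (child_[0]) for (const auto& c : child_) c->query(r, out);
    }
};
```

Each query descends only into quadrants whose bounds intersect the rectangle, which is what gives the O(log(n) + m) behaviour on reasonably distributed points.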
How about storing the points in a simple two-dimensional array of pointers to those structs? Then finding the point (x, y) is a simple index operation, and the same goes for any other point (x+a, y+b).
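A sketch of this direct-indexing idea, under the assumption that coordinates can be quantised to integer grid positions (the struct and function names are illustrative). Note that while each cell access is O(1), a query still visits every cell covered by the rectangle:

```cpp
#include <vector>

// Hypothetical payload attached to each point.
struct Extra { int id; };

// grid[y][x] points to the struct at (x, y), or nullptr if no point is there.
using Grid = std::vector<std::vector<Extra*>>;

// Count occupied cells in the rectangle centred on (cx, cy)
// with half-width hw and half-height hh.
std::size_t count_in_rect(const Grid& grid, int cx, int cy, int hw, int hh) {
    std::size_t n = 0;
    for (int y = cy - hh; y <= cy + hh; ++y)
        for (int x = cx - hw; x <= cx + hw; ++x)
            if (y >= 0 && y < (int)grid.size() &&
                x >= 0 && x < (int)grid[y].size() && grid[y][x])
                ++n;                       // occupied cell inside the rectangle
    return n;
}
```

The cost per query is proportional to the rectangle's area rather than to the number of points found, so this only pays off when the rectangle is small and the coordinates are genuinely discrete.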
If you use a std::map of points the lookup will always be O(log N) where N is the number of points you have.
Your other option would be to divide your search space into buckets and place each point into its bucket. For your rectangle, you then work out:
- any buckets that lie entirely inside your rectangle (all of their points count)
- any buckets for which there is only partial overlap.
For those with partial overlap, you can then look through the bucket's collection, which is O(M) if you use the right collection type per bucket, and M should be smaller than N. It may even be that M rarely exceeds a handful, in which case you can check the points linearly.
Working out which buckets overlap is a constant-time operation, but you have to run through them linearly (even just to check whether they are empty), so having too many buckets may also be an issue.
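A sketch of the bucket scheme, assuming a uniform grid (cell size, struct names, and the clamping policy are illustrative): buckets whose cells lie wholly inside the rectangle contribute their full count with no per-point test, while partially overlapping buckets are checked point by point.

```cpp
#include <algorithm>
#include <vector>

struct P { double x, y; };

struct BucketGrid {
    double cell;                 // bucket side length
    int nx, ny;                  // buckets per axis
    std::vector<std::vector<P>> buckets;

    BucketGrid(double c, int nx, int ny)
        : cell(c), nx(nx), ny(ny), buckets(nx * ny) {}

    void add(const P& p) {
        int ix = std::clamp(int(p.x / cell), 0, nx - 1);
        int iy = std::clamp(int(p.y / cell), 0, ny - 1);
        buckets[iy * nx + ix].push_back(p);
    }

    // Count points in [x0,x1] x [y0,y1], visiting only overlapping buckets.
    std::size_t count(double x0, double y0, double x1, double y1) const {
        int ix0 = std::clamp(int(x0 / cell), 0, nx - 1);
        int ix1 = std::clamp(int(x1 / cell), 0, nx - 1);
        int iy0 = std::clamp(int(y0 / cell), 0, ny - 1);
        int iy1 = std::clamp(int(y1 / cell), 0, ny - 1);
        std::size_t n = 0;
        for (int iy = iy0; iy <= iy1; ++iy)
            for (int ix = ix0; ix <= ix1; ++ix) {
                const auto& b = buckets[iy * nx + ix];
                double cx0 = ix * cell, cy0 = iy * cell;      // this bucket's cell
                double cx1 = cx0 + cell, cy1 = cy0 + cell;
                if (cx0 >= x0 && cx1 <= x1 && cy0 >= y0 && cy1 <= y1) {
                    n += b.size();          // cell wholly inside: no per-point test
                } else {
                    for (const P& p : b)    // partial overlap: exact test
                        if (p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1)
                            ++n;
                }
            }
        return n;
    }
};
```

Choosing the cell size is the main tuning knob: smaller cells mean fewer wasted point tests per query but more (possibly empty) buckets to walk.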
The first observation would be that std::map wouldn't be the most efficient structure in any case. Your input is pretty much fixed, apparently (from the comments). In that case, std::binary_search on a sorted std::vector is more efficient. The main benefit of std::map over a sorted std::vector is that insertion is O(log N) instead of O(N), and you don't need that.
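A sketch of this sorted-vector alternative (the P struct and the function name are illustrative assumptions). The vector is sorted by x once up front, and each query uses std::lower_bound / std::upper_bound with a comparator to find the x-range:

```cpp
#include <algorithm>
#include <vector>

struct P { double x, y; };

// Precondition: pts is sorted by x. Counts points in the rectangle
// centred on (cx, cy) with half-width hw and half-height hh.
std::size_t count_in_rect(const std::vector<P>& pts,
                          double cx, double cy, double hw, double hh) {
    auto lo = std::lower_bound(pts.begin(), pts.end(), cx - hw,
                               [](const P& p, double v) { return p.x < v; });
    auto hi = std::upper_bound(pts.begin(), pts.end(), cx + hw,
                               [](double v, const P& p) { return v < p.x; });
    std::size_t n = 0;
    for (auto it = lo; it != hi; ++it)      // x in range; filter on y
        if (it->y >= cy - hh && it->y <= cy + hh) ++n;
    return n;
}
```

The one-time `std::sort` by x replaces the map's tree structure, and the contiguous storage also tends to be friendlier to the cache during the scan.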
The next observation would be that you can probably afford to be a bit inaccurate in the first phase. Your output set will probably be a lot smaller than the total number of points (else a linear search would be in order). But assuming that this is the case, you might benefit from rounding up your rectangle. This will result in more candidate points, which you then check against the precise boundary.
For instance, if your points lie randomly distributed in the XY plane between (0,0) and (200,300), you could create a 20x30 matrix, each entry holding a subarea of size (10,10). If you now need the points in the rectangle from (64,23) to (78,45), you need to check only subareas [6,2], [6,3], [6,4], [7,2], [7,3] and [7,4]: 6 of the 600. In the second step, you'd throw out results such as (61,25).
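The subarea indices in that example fall out of plain integer division; a tiny sketch with an illustrative helper:

```cpp
#include <utility>

// Inclusive range of subarea indices a 1-D interval [lo, hi] touches,
// for square cells of the given side length.
std::pair<int, int> cell_range(double lo, double hi, double cell) {
    return { int(lo / cell), int(hi / cell) };
}
```

For the rectangle from (64,23) to (78,45) with cells of side 10, this gives x-indices 6..7 and y-indices 2..4, i.e. exactly the six subareas listed above.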