As part of our research group, we're collecting large amounts of location data. Our data essentially looks like (user id, lat/long co-ordinates, timestamp). There's other metadata involved too, but that's not relevant here. We're collecting about 2-3 million records a week, and expect to collect about a year's worth of data in due time.
I'd really like some advice on techniques for storing and processing this data. We'd like to be able to answer queries similar to:
(1) For a given location, who was near that location (within a specified distance) over a specified period of time?
(2) Which locations are near each other?
That's the general idea. We don't need a real-time response, but what are good databases (or other data storage software) for this? I've come across people talking about k-d trees; do they work at this scale? What kind of hardware do I need? I'm hoping to get pointers towards general strategies. How do we store this data? Does it even make sense to store it all in a database? Which data structures, software, or packages lend themselves well to distance/radius calculations?
We're most familiar with Python/Linux, would prefer to stay away from Java, and would prefer open source/free software. We're new to all this, so pointers to books and papers would also be useful. Any and all advice would be greatly appreciated.
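PostGIS is probably what you are looking for. It is an open-source spatial extension for PostgreSQL that adds geometry and geography types, spatial (GiST) indexes, and distance functions such as ST_DWithin, and it can be queried from Python.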
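To make that a bit more concrete, here is a rough sketch of how query (1) might look with PostGIS. The table name location_fix, its column names, and the helper functions are all made up for illustration and are not from your setup. The SQL is only held in strings here (nothing connects to a database), and the pure-Python haversine check is just a way to sanity-check results on a small sample. It uses a spherical Earth, so it can differ slightly from PostGIS geography distances, which use a spheroid by default.

```python
import math

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE IF NOT EXISTS location_fix (
    user_id     bigint      NOT NULL,
    recorded_at timestamptz NOT NULL,
    geog        geography(Point, 4326) NOT NULL
);
-- Spatial index for radius searches, plain index for the time window.
CREATE INDEX IF NOT EXISTS location_fix_geog_idx ON location_fix USING GIST (geog);
CREATE INDEX IF NOT EXISTS location_fix_time_idx ON location_fix (recorded_at);
"""

# Query (1): who was within radius_m metres of a point during [start, end)?
# ST_MakePoint takes (x, y), i.e. longitude first, then latitude.
NEARBY_SQL = """
SELECT DISTINCT user_id
FROM location_fix
WHERE ST_DWithin(
          geog,
          ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography,
          %(radius_m)s)
  AND recorded_at >= %(start)s
  AND recorded_at <  %(end)s;
"""


def nearby_params(lat, lon, radius_m, start, end):
    """Validate inputs and build the parameter dict for NEARBY_SQL."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    if radius_m < 0:
        raise ValueError("radius must be non-negative")
    return {"lat": lat, "lon": lon, "radius_m": radius_m,
            "start": start, "end": end}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere (mean Earth radius)."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def users_near(records, lat, lon, radius_m, start, end):
    """Brute-force counterpart of NEARBY_SQL for small in-memory samples.

    records is an iterable of (user_id, lat, lon, timestamp) tuples.
    """
    return sorted({
        uid for uid, rlat, rlon, ts in records
        if start <= ts < end and haversine_m(lat, lon, rlat, rlon) <= radius_m
    })
```

If you use a driver such as psycopg2 (an assumption on my part, it is not imported above), the %(name)s placeholders match its named-parameter style, so the call would be along the lines of cur.execute(NEARBY_SQL, nearby_params(...)). Storing points as geography means the radius is given in metres, which tends to be what you want for "within a specified distance" questions.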