This question refers to C++.
Say I have 10 million records of data, each piece of data is a 6 digit number, which I will have numbers being inputted that n开发者_开发技巧eed to be matched to this data.
It boils down to two questions:
- What would be the best way to store this data? An array?
- What would be the best way to search or match this data?
I'm looking for performance more than anything else, memory usage is not a problem. I was looking into hash functions but I'm not sure if that's what I should even be looking for.
For fast lookup, there are basically two options: std::map
, which has O(log n) lookup, or std::unordered_map
, which has expected O(1) lookup (but possibly worse).
If your key type is literally an integer (which by the sound of it is the case), you have perfect hashing for free, so an unordered map would be available with minimal additional cost, so I'd try that one.
But just make a typedef and try both and compare!
#include <map>
#include <unordered_map>
typedef unsigned int key_type; // fine, has < , ==, and std::hash
typedef std::map<key_type, some_value_type> my_map;
// typedef std::unordered_map<key_type, some_value_type> my_map;
my_map m; // populate
my_map::const_iterator it = m.find(<some random key>);
If you don't actually need to associate any data to the keys, i.e. if you don't need a value type, then replace "map" by "set" everywhere. If you need multiple records with the same key, replace "map" by "multimap" everywhere.
With only a 6 digit number to look up, you could keep an array of 1 million elements and do the lookup directly.
If you know right off the bat how many records you're going to have, you can preallocate an array to that size and then start storing the data. Otherwise, some other data structure such as a vector would be better.
For searching, use a binary search. It will significantly cut down on your search time.
Basically, what will happen...(the data needs to be sorted btw)..
You'll jump to the middle element of the data structure and see if your input is higher or lower. If it's higher, you go to the upper half of the structure and repeat this process recursively. If it's lower, you go to the lower half and do the same. You do this until you find your matching data.
Assuming memory is not an issue, why don't you store data into map or set in STL? Search must be one of the fastest.
精彩评论