
Suggest a good method with the least lookup time complexity

Developer https://www.devze.com 2022-12-26 17:48 Source: Internet
I have a structure which has 3 identifier fields and one value field. I have a list of these objects. To give an analogy, the identifier fields are like the primary keys to the object. These 3 fields uniquely identify an object.

struct Entry
{
   int a1;
   int a2;
   int a3;
   int value;
};

I have a list of, say, 1000 objects of this type. I need a lookup function that takes values of a1, a2 and a3, checks whether an object with exactly those identifier values is present, and returns its value. What is the most efficient way to implement this to get the best lookup time?
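
For comparison, a straightforward linear scan implements this lookup in O(n) time per call. This is a minimal sketch. The `Entry` type mirrors the structure above, but the function name `lookup_linear` and the convention of returning a found flag plus an out-parameter are illustrative assumptions, not part of the question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the structure in the question: three identifier fields and a value. */
typedef struct {
    int a1;
    int a2;
    int a3;
    int value;
} Entry;

/* Hypothetical baseline lookup: checks every entry, O(n) per call.
   Returns true and writes the value to *out if (a1, a2, a3) is present. */
bool lookup_linear(const Entry *list, size_t n, int a1, int a2, int a3, int *out) {
    for (size_t i = 0; i < n; i++) {
        if (list[i].a1 == a1 && list[i].a2 == a2 && list[i].a3 == a3) {
            *out = list[i].value;
            return true;
        }
    }
    return false;
}
```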

One solution I could think of is a 3-dimensional matrix (each dimension of length, say, 1000) populated with the values. This gives O(1) lookup time, but it has disadvantages: 1. I need to know the length of each dimension in advance. 2. With more identity fields (say 20), I would need a 20-dimensional matrix, which would be overkill on memory. In my actual implementation, I have 23 identity fields.
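
The matrix idea can be sketched as one flat array indexed by the three fields. This sketch assumes every field lies in a known range [0, DIM), with a hypothetical DIM of 100, and that callers only pass in-range values. All names (`Dense3`, `dense_put`, `dense_get`, etc.) are illustrative. It shows where the O(1) lookup comes from and why the memory cost is DIM^k slots for k fields.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bound: every identifier field is assumed to lie in [0, DIM). */
#define DIM 100

/* Dense 3-D table stored as one flat array of DIM*DIM*DIM slots.
   present[] marks which slots hold a real entry. */
typedef struct {
    int *values;
    bool *present;
} Dense3;

/* Row-major index; no range checking (callers must pass values in [0, DIM)). */
static size_t dense_index(int a1, int a2, int a3) {
    return ((size_t)a1 * DIM + (size_t)a2) * DIM + (size_t)a3;
}

bool dense_init(Dense3 *t) {
    size_t slots = (size_t)DIM * DIM * DIM;   /* 1,000,000 slots for DIM = 100 */
    t->values = calloc(slots, sizeof *t->values);
    t->present = calloc(slots, sizeof *t->present);
    if (t->values == NULL || t->present == NULL) {
        free(t->values);
        free(t->present);
        return false;
    }
    return true;
}

void dense_put(Dense3 *t, int a1, int a2, int a3, int value) {
    size_t i = dense_index(a1, a2, a3);
    t->values[i] = value;
    t->present[i] = true;
}

/* O(1): one index computation and one memory access. */
bool dense_get(const Dense3 *t, int a1, int a2, int a3, int *out) {
    size_t i = dense_index(a1, a2, a3);
    if (!t->present[i]) {
        return false;
    }
    *out = t->values[i];
    return true;
}

void dense_free(Dense3 *t) {
    free(t->values);
    free(t->present);
}
```

With DIM = 100 this already allocates a million slots for three fields. With 23 fields the slot count would be 100^23 = 10^46, which is the memory problem described in the question.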

Can you suggest a good way to store this data which would give me the best look up time?


Create a key class that contains all the identity fields, define appropriate equality and hash functions for it, and then use a hash map from the key class to its associated value. This gives O(1) time per lookup in the expected case. It only needs space proportional to the number of key combinations actually present, rather than to all possible key combinations. The table is typically about twice the number of stored keys, although you can adjust that constant for the time/space trade-off you want.
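
Here is a minimal C sketch of this approach. It assumes the number of entries is known up front, so the table never resizes. C has no built-in hash map, so this hand-rolls open addressing with linear probing. The names `Key`, `KeyMap`, `map_put` and `map_get` are hypothetical. The hash combines the fields in the style of 32-bit FNV-1a (per field rather than per byte). The table is sized to at least twice the expected number of entries, matching the space trade-off mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Key class holding all identity fields (three here; extend for more). */
typedef struct {
    int a1, a2, a3;
} Key;

static bool key_equals(const Key *x, const Key *y) {
    return x->a1 == y->a1 && x->a2 == y->a2 && x->a3 == y->a3;
}

/* FNV-1a-style mixing, one whole field per step, using the
   32-bit FNV offset basis and prime. */
static uint32_t key_hash(const Key *k) {
    uint32_t h = 2166136261u;
    const int fields[3] = {k->a1, k->a2, k->a3};
    for (int i = 0; i < 3; i++) {
        h ^= (uint32_t)fields[i];
        h *= 16777619u;
    }
    return h;
}

typedef struct {
    Key key;
    int value;
    bool used;
} Slot;

typedef struct {
    Slot *slots;
    size_t cap;   /* power of two, at least 2x the expected number of entries */
} KeyMap;

/* Allocates at least 2 * expected slots, rounded up to a power of two. */
bool map_init(KeyMap *m, size_t expected) {
    size_t cap = 1;
    while (cap < 2 * expected) {
        cap <<= 1;
    }
    m->slots = calloc(cap, sizeof *m->slots);
    m->cap = cap;
    return m->slots != NULL;
}

/* Linear probing: walk from the hashed slot until the key or an empty slot is
   found. Terminates because the table is never allowed to fill up. */
static Slot *map_find_slot(const KeyMap *m, const Key *k) {
    size_t i = key_hash(k) & (m->cap - 1);
    while (m->slots[i].used && !key_equals(&m->slots[i].key, k)) {
        i = (i + 1) & (m->cap - 1);
    }
    return &m->slots[i];
}

/* Insert or overwrite; assumes at most `expected` distinct keys are inserted. */
void map_put(KeyMap *m, Key k, int value) {
    Slot *s = map_find_slot(m, &k);
    s->key = k;
    s->value = value;
    s->used = true;
}

/* Expected O(1) lookup. Returns true and writes *out if the key is present. */
bool map_get(const KeyMap *m, Key k, int *out) {
    const Slot *s = map_find_slot(m, &k);
    if (!s->used) {
        return false;
    }
    *out = s->value;
    return true;
}

void map_free(KeyMap *m) {
    free(m->slots);
}
```

Extending this to 23 identity fields means adding them to `Key`, `key_equals` and the loop in `key_hash`. Lookup stays expected O(1) in the number of entries, although hashing and comparing each key get more expensive as fields are added.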


Use a hash table (map). Build the key as the string "a1-a2-a3", and store the data under that key, i.e. H(key) = data.
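
Here is a sketch of building that string key in C, using the `-` separator from the answer and 32-bit FNV-1a as one possible choice of H. The helper names `make_key` and `string_hash` and the buffer size `KEY_BUF` are assumptions for illustration. The separator matters: without it, (1, 23, 4) and (12, 3, 4) would both produce "1234".

```c
#include <stdint.h>
#include <stdio.h>

/* Worst case: three 11-char ints ("-2147483648") + 2 separators + NUL = 36. */
#define KEY_BUF 36

/* Writes the "a1-a2-a3" key into buf, which must hold KEY_BUF bytes. */
void make_key(char *buf, int a1, int a2, int a3) {
    snprintf(buf, KEY_BUF, "%d-%d-%d", a1, a2, a3);
}

/* 32-bit FNV-1a over the key's bytes: one possible H(key). */
uint32_t string_hash(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}
```

Compared with a struct key, this formats a string on every lookup. It is simpler to write but generally slower.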


I would simply sort the array by key, and use a binary search.

(untested)

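/* Entry with three identifier fields and a value, as in the question. */
typedef struct {
    int a1;
    int a2;
    int a3;
    int value;
} ENTRY;

/* Compare two ints without the overflow risk of subtracting them. */
static int cmp_int(int x, int y) {
    return (x > y) - (x < y);
}

/* Lexicographic comparison on (a1, a2, a3).
   Returns >0 if k1 > k2, 0 if k1 == k2, <0 if k1 < k2. */
int compare_entry(const ENTRY *k1, const ENTRY *k2) {
    int d = cmp_int(k1->a1, k2->a1);
    if (d == 0) {
        d = cmp_int(k1->a2, k2->a2);
        if (d == 0) {
            d = cmp_int(k1->a3, k2->a3);
        }
    }
    return d;
}

// Derived from Wikipedia
/* Binary search over a list sorted with compare_entry.
   Returns the index of the matching entry, or -1 if not found. */
int find(const ENTRY *list, int size, const ENTRY *value) {
    if (size <= 0) {
        return -1;                      // empty list: nothing to find
    }
    int low = 0;
    int high = size - 1;
    while (low < high) {
        int mid = low + (high - low) / 2;
        if (compare_entry(&list[mid], value) < 0) {
            low = mid + 1;              // target is strictly to the right of mid
        } else {
            high = mid;                 // target is at mid or to its left
        }
    }
    // low == high: the only remaining candidate
    if (compare_entry(&list[low], value) == 0) {
        return low;                     // found item at 'low' index
    }
    return -1;                          // not found
}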
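
The same sort-then-search approach can also be written with the C standard library's `qsort` and `bsearch`, which share one comparator. This is a sketch that assumes the keys are unique, as the question states. `Entry`, `sort_entries` and `lookup_sorted` are illustrative names. The comparator uses `(x > y) - (x < y)` instead of subtraction so that large values cannot overflow.

```c
#include <stdlib.h>

typedef struct {
    int a1, a2, a3;
    int value;
} Entry;

/* qsort/bsearch comparator: lexicographic on (a1, a2, a3), overflow-safe. */
static int entry_cmp(const void *pa, const void *pb) {
    const Entry *x = pa, *y = pb;
    if (x->a1 != y->a1) return (x->a1 > y->a1) - (x->a1 < y->a1);
    if (x->a2 != y->a2) return (x->a2 > y->a2) - (x->a2 < y->a2);
    return (x->a3 > y->a3) - (x->a3 < y->a3);
}

/* Sort once after loading the list. */
void sort_entries(Entry *list, size_t n) {
    qsort(list, n, sizeof *list, entry_cmp);
}

/* O(log n) lookup. Returns a pointer to the matching entry, or NULL. */
const Entry *lookup_sorted(const Entry *list, size_t n, int a1, int a2, int a3) {
    Entry probe = {a1, a2, a3, 0};   /* value is ignored by the comparator */
    return bsearch(&probe, list, n, sizeof *list, entry_cmp);
}
```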

In the absolute worst case, you run through this loop, what, 10 times (log2 of 1000, rounded up), and every call to compare_entry ends up comparing all three fields. So that's, what, 85 integer math operations (additions, subtractions, and a shift)?
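
The "10 times" figure comes from 2^10 = 1024 ≥ 1000. The hypothetical helper below replays the same `low`/`high` loop used in `find()` over a sorted array of plain ints {0, 1, ..., n-1} and reports the largest number of iterations. Plain ints stand in for ENTRY keys here. It counts only loop passes, not individual arithmetic operations, which depend on the compiler.

```c
/* Returns the worst-case number of loop iterations of the low/high binary
   search over {0, 1, ..., n-1}, trying every element as the target. */
int max_search_iterations(int n) {
    int worst = 0;
    for (int target = 0; target < n; target++) {
        int low = 0, high = n - 1, iters = 0;
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (mid < target) {       /* the element at index mid is mid itself */
                low = mid + 1;
            } else {
                high = mid;
            }
            iters++;
        }
        if (iters > worst) {
            worst = iters;
        }
    }
    return worst;
}
```

Here `max_search_iterations(1000)` and `max_search_iterations(1024)` both return 10. `find()` then does one more comparison after the loop.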

If your a1-a3 each range from 0 to 99, you can make your key a1 * 10000 + a2 * 100 + a3 and do a single compare per step; the worst case is then 63 integer math operations. And your entire array fits in cache on almost any modern processor. And it's memory efficient.
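
Here is a sketch of the packed-key variant. It assumes every field lies in 0 to 99, so each field occupies its own pair of decimal digits. The names `pack_key` and `lookup_packed` and the parallel `keys`/`values` arrays are assumptions. The search is the same binary search, but each step is a single int comparison.

```c
#include <stddef.h>

/* Assumes each of a1, a2, a3 lies in [0, 99]; outside that range keys collide. */
int pack_key(int a1, int a2, int a3) {
    return a1 * 10000 + a2 * 100 + a3;
}

/* Binary search over keys[] sorted ascending; values[i] belongs to keys[i].
   Returns 1 and sets *out if found, else 0. */
int lookup_packed(const int *keys, const int *values, size_t n,
                  int a1, int a2, int a3, int *out) {
    int target = pack_key(a1, a2, a3);
    size_t low = 0, high = n;           /* half-open search range [low, high) */
    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (keys[mid] < target) {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    if (low < n && keys[low] == target) {
        *out = values[low];
        return 1;
    }
    return 0;
}
```

For example, `pack_key(12, 34, 56)` is 123456.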

You could spend more memory on a perfect hash or some other sparse-matrix scheme. Even with a perfect hash, I bet the hash calculation itself would take about as long as this search, considering multiplication is expensive. And it hits the memory bus harder, obviously.
