What kind of overhead is involved in a Hashtable?_问答_开发者

What kind of overhead is involved in a Hashtable?

开发者 https://www.devze.com 2022-12-22 10:10 出处：网络

相关专题：hashtable

I understand that since allocation is at run-time there must be some house-keeping operations involved. But other than that what is the开发者_如何学Python overhead ? Also, would it be wise to create a hashtable Vs array when you need to store the number of times an integer element appears in an infinite stream of numbers ?

Theoretically, it depends on how many unique numbers there are in the stream of numbers. But any real life scenario I can imagine, an array would be ridiculously slower. The more unique numbers you process, the slower the array solution will become.

A HashTable generally maintains the same access speed no matter how large it becomes. For an 'infinite stream', I can't imagine how a HashTable won't be the better solution. How do you intend to search the array?

As Neil's comment implies, the overhead in hashtable implementations depends heavily on the particular implementation of a hash table. Typically, though, there will be storage overhead from unused hashes, and both storage and time overhead from dealing with hash collisions. There is of course also time overhead from computing the hash values.

In answer to your second question, this depends very much on the details of your stream of numbers, and the other aspects of your program. Some questions to consider:

Is the set of possible numbers large or small? (How big an array would you need to create?)
Out of the range of possible numbers, do you expect most of them to be used, or only a few? If you expect most of the possible numbers in the range to be used, then using a hash table will not save you much space.
Do you know the range of possible numbers before you start? Or is this unknown? Hash tables can deal with unknown ranges much more easily.
How important is saving storage space in this program? Can you easily afford to allocate an array of the necessary size? If you can easily afford to allocate the array, why bother with a hash table?
How important is runtime speed in this program? Arrays will be usually be faster.

Hashtables are pretty fast. Just as an experiment, I get about 50x slowdown between a raw array and a c++ hash_map (compile with the #if toggled both ways and try it yourself).

#include <ext/hash_map>
using namespace __gnu_cxx;

int main() {
#if 0
  hash_map<int,int> table;
  for (int i = 0; i < 256; i++) table[i] = 0;
#else
  int table[256];
#endif

  for (int i = 0; i < 100000000; i++) {
    table[i&0xff]++;
  }
}