开发者

What is the space complexity of a hash table?

开发者 https://www.devze.com 2023-03-16 03:45 出处:网络
What is size of a hash table with 32 bit key and 32 bit pointers to values stored separately? Is it going to be 2^32 slots * (4 Bytes (key) + 4 Bytes (pointers to values))

What is size of a hash table with 32 bit key and 32 bit pointers to values stored separately?

Is it going to be 2^32 slots * (4 Bytes (key) + 4 Bytes (pointers to values)) = 4 * 10^9 * (4 + 4) = 32开发者_运维技巧GB ?

I am trying to understand space complexity of hash tables.


I think you are asking the wrong question. The space complexity of a datastructure indicates how much space it occupies in relation to the amount of elements it holds. For example a space complexity of O(1) would mean that the datastructure alway consumes constant space no matter how many elements you put in there. O(n) would mean that the space consumption grows linearly with the amount of elements in it.

A hashtable typically has a space complexity of O(n).

So to answer your question: It depends on the number of elements it currently stores and in real world also on the actual implementation.

A lower bound for the memory consumption of your hashtable is: (Number of Values to Store) * (SizeOf a Value). So if you want to store 1 million values in the hashtable and each occupies 4 bytes then it will consume at least 4 million bytes (roughly 4MB). Usually real world implementations use a bit more memory for infrastructure but again: this highly depends on the actual implementation and there is no way to find out for sure but to measure it.


Hash tables don't match hash function values and slots. The hash function is computed modulo the size of a reference vector that is much smaller than the hash function range. Because this value is fixed, it is not considered in the space complexity computation.

Consequently, the space complexity of every reasonable hash table is O(n).

In general, this works out quite well. While the key space may be large, the number of values to store is usually quite easily predictable. Certainly, the amount of memory that is functionally acceptable for data structure overhead is typically obvious.

This is why hash tables are so ubiquitous. They often provide the best data structure for a given task, mixing strictly bounded memory overhead with better than log2 n time complexity. I love binary trees but they don't usually beat hash tables.


Lets pretend we have a naive hashtable where the number of buckets is equal to double the size of the elements. That is O(2n) the number of elements which is O(n).

When the number of elements exceeds half of the number of available buckets, you need to create a new array of buckets, double the size and rehash all the elements to their new locations in the new array of buckets.

386  public V put(K key, V value) {
387      if (key == null)
388          return putForNullKey(value);
389      int hash = hash(key.hashCode());
390      int i = indexFor(hash, table.length);
391      for (Entry<K,V> e = table[i]; e != null; e = e.next) {
392          Object k;
393          if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
394              V oldValue = e.value;
395              e.value = value;
396              e.recordAccess(this);
397              return oldValue;
398          }
399      }
401      modCount++;
402      addEntry(hash, key, value, i);
403      return null;
404  }

768  void addEntry(int hash, K key, V value, int bucketIndex) {
769      Entry<K,V> e = table[bucketIndex];
770      table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
771      if (size++ >= threshold)
772          resize(2 * table.length);
773  }

471  void resize(int newCapacity) {
472      Entry[] oldTable = table;
473      int oldCapacity = oldTable.length;
474      if (oldCapacity == MAXIMUM_CAPACITY) {
475          threshold = Integer.MAX_VALUE;
476          return;
477      }
479      Entry[] newTable = new Entry[newCapacity];
480      transfer(newTable);
481      table = newTable;
482      threshold = (int)(newCapacity * loadFactor);
483  }

488  void transfer(Entry[] newTable) {
489      Entry[] src = table;
490      int newCapacity = newTable.length;
491      for (int j = 0; j < src.length; j++) {
492          Entry<K,V> e = src[j];
493          if (e != null) {
494              src[j] = null;
495              do {
496                  Entry<K,V> next = e.next;
497                  int i = indexFor(e.hash, newCapacity);
498                  e.next = newTable[i];
499                  newTable[i] = e;
500                  e = next;
501              } while (e != null);
502          }
503      }
504  }

References:

HashMap.put
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#HashMap.put%28java.lang.Object%2Cjava.lang.Object%29

Grepcode is down, you can take a look the openjdk repo here as a better reference: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java


Still there is no perfect answer to the question. I am not sure about the space occupied. As per my understanding of the issue. The size is dynamic and varies with the size of input.

That is we start with a random number, hash table size, which is very less as compare to hash function value. Then we insert the input. Now, as the collision start occurring we dynamically double the hash table size. This is the reason, I think, for O(n) complexity. Kindly correct me if I am wrong.

0

精彩评论

暂无评论...
验证码 换一张
取 消