What is size of a hash table with 32 bit key and 32 bit pointers to values stored separately?
Is it going to be 2^32 slots * (4 Bytes (key) + 4 Bytes (pointers to values)) = 4 * 10^9 * (4 + 4) = 32开发者_运维技巧GB ?
I am trying to understand space complexity of hash tables.
I think you are asking the wrong question. The space complexity of a datastructure indicates how much space it occupies in relation to the amount of elements it holds. For example a space complexity of O(1)
would mean that the datastructure alway consumes constant space no matter how many elements you put in there. O(n)
would mean that the space consumption grows linearly with the amount of elements in it.
A hashtable typically has a space complexity of O(n)
.
So to answer your question: It depends on the number of elements it currently stores and in real world also on the actual implementation.
A lower bound for the memory consumption of your hashtable is: (Number of Values to Store) * (SizeOf a Value). So if you want to store 1 million values in the hashtable and each occupies 4 bytes then it will consume at least 4 million bytes (roughly 4MB). Usually real world implementations use a bit more memory for infrastructure but again: this highly depends on the actual implementation and there is no way to find out for sure but to measure it.
Hash tables don't match hash function values and slots. The hash function is computed modulo the size of a reference vector that is much smaller than the hash function range. Because this value is fixed, it is not considered in the space complexity computation.
Consequently, the space complexity of every reasonable hash table is O(n).
In general, this works out quite well. While the key space may be large, the number of values to store is usually quite easily predictable. Certainly, the amount of memory that is functionally acceptable for data structure overhead is typically obvious.
This is why hash tables are so ubiquitous. They often provide the best data structure for a given task, mixing strictly bounded memory overhead with better than log2 n time complexity. I love binary trees but they don't usually beat hash tables.
Lets pretend we have a naive hashtable where the number of buckets is equal to double the size of the elements. That is O(2n) the number of elements which is O(n).
When the number of elements exceeds half of the number of available buckets, you need to create a new array of buckets, double the size and rehash all the elements to their new locations in the new array of buckets.
386 public V put(K key, V value) {
387 if (key == null)
388 return putForNullKey(value);
389 int hash = hash(key.hashCode());
390 int i = indexFor(hash, table.length);
391 for (Entry<K,V> e = table[i]; e != null; e = e.next) {
392 Object k;
393 if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
394 V oldValue = e.value;
395 e.value = value;
396 e.recordAccess(this);
397 return oldValue;
398 }
399 }
401 modCount++;
402 addEntry(hash, key, value, i);
403 return null;
404 }
768 void addEntry(int hash, K key, V value, int bucketIndex) {
769 Entry<K,V> e = table[bucketIndex];
770 table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
771 if (size++ >= threshold)
772 resize(2 * table.length);
773 }
471 void resize(int newCapacity) {
472 Entry[] oldTable = table;
473 int oldCapacity = oldTable.length;
474 if (oldCapacity == MAXIMUM_CAPACITY) {
475 threshold = Integer.MAX_VALUE;
476 return;
477 }
479 Entry[] newTable = new Entry[newCapacity];
480 transfer(newTable);
481 table = newTable;
482 threshold = (int)(newCapacity * loadFactor);
483 }
488 void transfer(Entry[] newTable) {
489 Entry[] src = table;
490 int newCapacity = newTable.length;
491 for (int j = 0; j < src.length; j++) {
492 Entry<K,V> e = src[j];
493 if (e != null) {
494 src[j] = null;
495 do {
496 Entry<K,V> next = e.next;
497 int i = indexFor(e.hash, newCapacity);
498 e.next = newTable[i];
499 newTable[i] = e;
500 e = next;
501 } while (e != null);
502 }
503 }
504 }
References:
HashMap.put
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java#HashMap.put%28java.lang.Object%2Cjava.lang.Object%29
Grepcode is down, you can take a look the openjdk repo here as a better reference: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java
Still there is no perfect answer to the question. I am not sure about the space occupied. As per my understanding of the issue. The size is dynamic and varies with the size of input.
That is we start with a random number, hash table size, which is very less as compare to hash function value. Then we insert the input. Now, as the collision start occurring we dynamically double the hash table size. This is the reason, I think, for O(n) complexity. Kindly correct me if I am wrong.
精彩评论