开发者

Simple hash function techniques

开发者 https://www.devze.com 2023-02-24 10:21 出处:网络
I\'m pretty new to hashing in Java and I\'ve been getting stuck on a few parts. I have a list of 400 items (and stored in a list of 1.5x = 600), which the item id\'s range from 1-10k. I\'ve been looki

I'm pretty new to hashing in Java and I've been getting stuck on a few parts. I have a list of 400 items (and stored in a list of 1.5x = 600), which the item id's range from 1-10k. I've been looking at a few hash functions and I initially copied the examples in the packet, which just used folding. I noticed that I've been getting about 50-60% null nodes, which is apparently too many. I also noticed that just modding the id by 600 tends to reduce it to a solid 50% nulls.

My current hash function looks something like, and for being as ugly as it is, it's only a 1% decrease in nulls from a simple modding, with an avg list length of 1.32...

   public int getHash( int id )
   {
      int hash = id;

      hash <<= id % 3;
      hash += id << hash % 5;

      /* let's go digit by digit! */          
      int digit;
      for( digit = id % 10;
           id != 0;
           digit = id % 10, id /= 10 )
      {
         if ( digit == 0 ) /* prevent division by zero */
            continue;
         hash += digit * 2;
      }

      hash >>= 5;
      return (hash % 600);
   }

What are so开发者_StackOverflow社区me good techniques for creating simple hash functions?


I would keep it simple. Return the id of your element as your hashcode, and let the hashtable worry about rehashing it if it feels it needs to. Your goal should be to make a hash code unique to your object.

The Java HashMap uses the following rehashing method:

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}


There's a nice review article here. Also, the Wikipedia article on hash functions is a good overview. It suggests using a chi-squared test to assess the quality of your hash function.

0

精彩评论

暂无评论...
验证码 换一张
取 消