family of binary hash functions

Developer https://www.devze.com 2023-01-05 20:34 Source: Internet
I am looking for a family of hash functions F1, ..., Fn, where each Fi maps any key into [0,1]. My first implementation was Fi(k) = F(k,i) = hash(i, hash(k,0)). Here hash is the hashlittle function provided here (http://burtleburtle.net/bob/c/lookup3.c). I haven't looked under the hood at what exactly hashlittle does.

As sharp readers will have noticed, this fails: whenever hash(k1,0) == hash(k2,0), Fi(k1) == Fi(k2) for every i. My question is how to achieve this efficiently. My objective is to minimize, on average, the largest i for which Fi(k1) == Fi(k2) for any given k1,k2 pair. Of course it should be fast too.


Well, I've looked under the hood a bit.

uint32_t hashlittle( const void *key, size_t length, uint32_t initval)
{
  union { const void *ptr; size_t i; } u;     /* needed for Mac Powerbook G4 */

  u.ptr = key;
  if (HASH_LITTLE_ENDIAN && ((u.i & 0x3) == 0)) {

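Writing u.ptr and then reading u.i is type punning through a union. That is undefined behaviour in C++; in C the bytes are reinterpreted as a size_t, so the result depends on the implementation's pointer representation.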
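If the goal is only the alignment test, one way to sidestep the union is to convert the pointer to uintptr_t. This is a minimal sketch, assuming a platform where that conversion keeps the low address bits (the conversion itself is implementation-defined). is_aligned4 is a hypothetical helper name, not something lookup3.c provides:

```c
#include <stdint.h>

/* Hypothetical helper, not part of lookup3.c: returns 1 if p is
   4-byte aligned, 0 otherwise. Assumes the pointer-to-uintptr_t
   conversion keeps the low address bits, which holds on common
   flat-memory platforms but is implementation-defined. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}
```

A call such as is_aligned4(key) could then stand in for the (u.i & 0x3) == 0 test.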

EDIT

I think I understand now. You basically need hash functions that take two parameters as input. You can use nearly any hash function for this.

A hash function takes a data packet of an arbitrary bit size and transforms it into a data packet of a fixed bit size:

hashval = Hash(data, len);

You need a function where an additional parameter is given and used within the transformation, right?

hashval = Hash(data, len, addval);

The simplest way is to concatenate the additional value to the data packet:

/* data must have room for sizeof(addval) extra bytes after len */
memcpy((char *)data + len, &addval, sizeof(addval));
hashval = Hash(data, len + sizeof(addval));
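When the caller's buffer has no spare room, the same concatenation can be done in a scratch copy. The following is only a sketch under stated assumptions: Hash is a stand-in (32-bit FNV-1a) chosen to keep the example runnable, and HashWithParam is a hypothetical name, not an existing API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for Hash(data, len): 32-bit FNV-1a. Any hash with this
   shape would do; it is used here only to keep the sketch runnable. */
static uint32_t Hash(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical wrapper: hashes data followed by the bytes of addval.
   Copies into a scratch buffer so the caller's buffer is never
   written past its end. Returns 0 on success, -1 if malloc fails. */
static int HashWithParam(const void *data, size_t len,
                         uint32_t addval, uint32_t *out)
{
    unsigned char *buf = malloc(len + sizeof addval);
    if (buf == NULL)
        return -1;
    if (len > 0)
        memcpy(buf, data, len);
    memcpy(buf + len, &addval, sizeof addval);
    *out = Hash(buf, len + sizeof addval);
    free(buf);
    return 0;
}
```

Because addval is appended in native byte order, the resulting hash values differ between little-endian and big-endian machines. The extra allocation and copy per call also make this slower than seeding the hash directly.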

If you have the source available, another way is to modify it to use the new parameter as initialization for the internal hash calculation. This is what was done in hashlittle.

Before:
uint32_t Hash (const void *data, size_t len)
{
    uint32_t hashval = 0;
    ....
    return (hashval);
}

After:
uint32_t Hash (const void *data, size_t len, uint32_t init)
{
    uint32_t hashval = init;
    ....
    return (hashval);
}

This option may be a bit harder to do, as the internal state can be much more than a single hashval, and the initialization can be quite sophisticated instead of simply using a 0. In hashlittle it is:

/* Set up the internal state */
a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
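Applied to the question, the index i can serve as the initial value while the key bytes themselves are hashed, so no fixed 32-bit digest of the key sits between the key and every Fi. The sketch below assumes a binary output (one bit per function) and uses a stand-in seeded hash (FNV-1a followed by the MurmurHash3 32-bit finalizer), not hashlittle. The names seeded_hash and F are hypothetical, and the statistical quality of this stand-in has not been evaluated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a seeded hash such as hashlittle(key, length, initval):
   FNV-1a with the seed folded into the initial state, followed by the
   MurmurHash3 32-bit finalizer. This is NOT hashlittle. */
static uint32_t seeded_hash(const void *key, size_t len, uint32_t seed)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u ^ seed;   /* seed replaces a fixed start state */
    assert(key != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    /* final mixing step (MurmurHash3 fmix32 constants) */
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* F_i(k): the index i is the seed and the key bytes are hashed
   directly. Returns the top bit of the hash, i.e. 0 or 1. */
static int F(const void *key, size_t len, uint32_t i)
{
    return (int)(seeded_hash(key, len, i) >> 31);
}
```

With the real hashlittle the corresponding call would be hashlittle(key, length, i), reduced to one bit in the same way. The top bit is taken here because the lowest bit of a plain FNV-1a state is only the XOR of the low bits of the seed and the input bytes.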