I'm using Python 2.6. I have very little knowledge of hash functions.
I want to use a CRC hash function to hash an IP address like '128.0.0.5' into the range [0, H). Currently I'm thinking of doing
zlib.crc32('128.0.0.5')%H.
Is this okay? There are a few questions you could try to answer:
1. Does it make any difference whether I hash '128.0.0.5', its binary form '0001110101010..' (whatever that is), or the string without the '.'s?
2. zlib.crc32 returns a signed integer. Does taking a negative number modulo (%) a positive H always give a non-negative result?
3. Does %-ing by H affect how good the hash function is? (I mean, is that the best I can do for the available space with the available zlib.crc32?)
Thanks!
Why do you want to hash an IP address into a number? They already have a native integer representation. For example, using netaddr:
>>> import netaddr
>>> ip = netaddr.IPAddress('192.168.1.1')
>>> ip.value
3232235777
>>> netaddr.IPAddress(3232235777)
IPAddress('192.168.1.1')
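If you'd rather not depend on netaddr, the standard library can do the same conversion. A minimal sketch, assuming IPv4 input, using socket.inet_aton and struct (`ip_to_int` and `int_to_ip` are hypothetical helper names):

```python
import socket
import struct

def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    # inet_aton packs the address into 4 bytes in network (big-endian) order;
    # '!I' unpacks those bytes as one unsigned 32-bit integer.
    return struct.unpack('!I', socket.inet_aton(ip))[0]

def int_to_ip(value):
    """Convert a 32-bit integer back to dotted-quad notation."""
    return socket.inet_ntoa(struct.pack('!I', value))

print(ip_to_int('192.168.1.1'))   # 3232235777
print(int_to_ip(3232235777))      # 192.168.1.1
```

This runs unchanged on Python 2.6 and Python 3.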
Does it make any difference whether I hash '128.0.0.5', its binary form '0001110101010..' (whatever that is), or the string without the '.'s?
Not really.
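To illustrate, here is a sketch that hashes two representations of the same address (the dotted ASCII string and its packed 4-byte form) and maps each into [0, H). The bucket numbers differ, but both are simply inputs to the same checksum. H = 1000 is an arbitrary example value and `bucket` is a hypothetical helper name.

```python
import socket
import zlib

H = 1000  # example bucket count (arbitrary)

def bucket(data, h=H):
    # Mask to 32 bits so the checksum is unsigned on both Python 2 and 3.
    return (zlib.crc32(data) & 0xffffffff) % h

text_form = b'128.0.0.5'                     # dotted-quad as ASCII bytes
packed_form = socket.inet_aton('128.0.0.5')  # same address as 4 raw bytes

print(bucket(text_form))
print(bucket(packed_form))
```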
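zlib.crc32 returns a signed integer. Does taking a negative number modulo (%) a positive H always give a non-negative result?
Yes. In Python the result of % has the same sign as the divisor, so x % H always falls in [0, H) when H is positive.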
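A quick check of that behaviour. The sketch also masks crc32's output to 32 bits, which makes it unsigned on Python 2 (Python 3's zlib.crc32 already returns an unsigned value). Note that the masked and unmasked values can land in different buckets unless H divides 2**32, so pick one convention and use it consistently. H = 1000 is an arbitrary example value.

```python
import zlib

H = 1000  # example bucket count

# % with a positive divisor never returns a negative number in Python.
print(-7 % 5)   # 3
print(-1 % H)   # 999

crc = zlib.crc32(b'128.0.0.5')     # may be negative on Python 2
unsigned_crc = crc & 0xffffffff    # same bits, read as unsigned 32-bit
print(0 <= crc % H < H)            # True, even if crc is negative
print(0 <= unsigned_crc % H < H)   # True
```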
Does %-ing by H affect how good the hash function is? (I mean, is that the best I can do for the available space with the available zlib.crc32?)
You'd better use all the bits of the checksum to make up for CRC's lack of an "avalanche effect". Small variations such as 192.168.1.1, 192.168.1.2, etc. might produce checksums that differ only in their high bits, and if H is a power of two, % keeps only the low bits, so those hashes will collide.
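One way to use all the bits is to fold the high half of the checksum into the low half before reducing it, so differences in the upper bits still reach the bucket index. This is a sketch of a common mixing trick, not something built into zlib; `crc_bucket` is a hypothetical helper name. Choosing a prime H is another common way to make % depend on every bit.

```python
import zlib

def crc_bucket(ip, h):
    """Map an IPv4 string to a bucket in [0, h) using CRC-32 plus folding."""
    crc = zlib.crc32(ip.encode('ascii')) & 0xffffffff  # unsigned 32-bit checksum
    # XOR the upper 16 bits into the lower 16 so both halves influence
    # the low bits that a power-of-two modulus keeps.
    folded = crc ^ (crc >> 16)
    return folded % h

for ip in ['192.168.1.1', '192.168.1.2', '192.168.1.3']:
    print(ip, crc_bucket(ip, 256))
```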
ad 1) It will yield different results, but does not affect the quality of the hash.
ad 2) It will always yield a positive number or zero.
ad 3) Because you limit the number of possible buckets, it does affect the quality of the hash.
In general: about how large is your H? Remember that an IPv4 address is nothing more than a 32-bit value; 192.168.0.1 is just a more human-readable, byte-wise representation of it. So if your H is larger than 4294967295, there is no need for hashing at all.
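To make the byte-wise point concrete: each of the four numbers is one byte, so the address is just a base-256 number. A minimal sketch, assuming well-formed dotted-quad input (`dotted_to_int` is a hypothetical helper name):

```python
def dotted_to_int(ip):
    """Interpret a dotted-quad IPv4 string as a 32-bit unsigned integer."""
    parts = ip.split('.')
    if len(parts) != 4:
        raise ValueError('expected 4 octets: %r' % ip)
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError('octet out of range: %r' % part)
        value = (value << 8) | octet  # shift earlier bytes left, append this one
    return value

print(dotted_to_int('192.168.0.1'))      # 3232235521
print(dotted_to_int('255.255.255.255'))  # 4294967295, the largest IPv4 value
```

Since every address maps to a distinct integer in [0, 4294967295], an H of 2**32 or more lets you use this value directly as the bucket index.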