I have to store objects that have two attributes (ida and idb) inside a dict. Both attributes are 64 bit positive integers and I can only store one object for a unique arrangement(combination in which the order matters) of ida and idb. For example:
obj1 = SomeClass(ida=5223372036854775807, idb=2)
obj2 = SomeClass(ida=2, idb=5223372036854775807)
obj3 = SomeClass(ida=5223372036854775807, idb=2)
Since the objects themselves are mutable, I'm using hashes of tuples '(ida, idb,)' as keys. Following the example, consider this:
d = {}
# storing obj1
k1 = hash((obj1.ida, o开发者_JS百科bj1.idb,))
d[k1] = obj1
# storing obj2
k2 = hash((obj2.ida, obj2.idb,))
d[k2] = obj2
# storing obj3
k3 = hash((obj3.ida, obj3.idb,)) # note that k3 hash should be the same as k1
d[k3] = obj3
#at this point 'd' should contain only 'obj2' and 'obj3' since 'k1' == 'k3'
I tested the above example in my 64 bit machine and it worked as expected.
I have the following questions:
- Is it possible for two tuples of big integers to compute the same hash if they have differences either in elements or in ordering?
- What about in 32 bit platforms?
- If not, is there a platform independent way to guarantee that the above example will work no matter the values of ida and idb?
In [33]: hash?
Return a hash value for the object. Two objects with the same value have the same hash value. The reverse is not necessarily true, but likely.
Why not just use the tuple (ida,idb) as the key?
import pprint
class SomeClass(object):
def __init__(self,ida,idb):
self.ida=ida
self.idb=idb
obj1 = SomeClass(ida=5223372036854775807, idb=2)
obj2 = SomeClass(ida=2, idb=5223372036854775807)
obj3 = SomeClass(ida=5223372036854775807, idb=2)
d={}
for obj in (obj1,obj2,obj3):
d[obj.ida,obj.idb]=obj
pprint.pprint(d)
# {(2, 5223372036854775807L): <__main__.SomeClass object at 0xb78839ec>,
(5223372036854775807L, 2): <__main__.SomeClass object at 0xb7883a0c>}
Like Ignacio said, you can have hash collisions. Why don't you just use the tuple itself? Tuples are immutable and it looks like your ida and idb are (immutable) integers.
There are only 2**<word size> possible hashes, so you would have to run a 128-bit version of Python to even store all (2**64)**2 possible hashes in the first place. And yes, it could still be possible to have collisions. Use a set
if you need to store unique objects; just define __hash__()
and __eq__()
in a sane manner.
精彩评论