开发者

A hashable, flexible identifier in python

开发者 https://www.devze.com 2023-01-25 09:31 出处:网络
I\'m trying to make some sort of hashable identifier in python; I need it to identify nodes in a graph. The trouble is that some nodes have different attributes. If the attributes of these nodes are p

I'm trying to make some sort of hashable identifier in python; I need it to identify nodes in a graph. The trouble is that some nodes have different attributes. If the attributes of these nodes are portrayed by dictionaries of the attributes to values:

idA = {'type':'A', 'name':'a_100'}
idB = {'type':'B', 'name':'b_3', 'value':7}

I want __hash__() and __eq__() to use the tuple pairs ((key1,value1), (key2,value2), ...).

Dictionaries would be ideal for this, because I'm going to check these properties fairly frequently, and dictionary lookup should be efficient (I'm using many identifiers and each will have many attributes). But dictionaries are not hashable.

A frozenset of the tuple pairs would hash properly, but would it be efficient for lookup?

If I declare an empty class, and then set attributes for it, that does what I want (possibly using a dictionary under the hood), but I don't know how to hash it. Maybe there's some way to hash it's member values using inspect or dir()?

class identifier():
    pass
idA = identifier()
idA.type = 'A'
idA.name = 'a_100'

If there is a way to use a hash (and == operator) based on tuple pairs of (attribute, value), then this would also do what I want.

Or is there some work around that can make the equivalent data type that would satisfy this SAT-type analogy: frozenset is to set as ? is to dict

Thanks for your help.


Edit:

Is this the right direction?

class identifier(dict):
    def to_frozenset(self):
        return frozenset([(k,self[k]) for k in self])
    def __hash__(self):
   开发者_StackOverflow中文版     return hash(self.to_frozenset())
    def __eq__(self, rhs):
        return self.to_frozenset() == rhs.to_frozenset()
    def __ne__(self, rhs):
        return not self == rhs

This shifts the computational complexity so that it is fast to lookup an identifier attribute, but slow to hash an identifier or check two identifiers for equality. If there were a way to cache its hash (and disallow its dictionary to change once the hash was cached), and we were guaranteed few hash collisions of identifier types (so checking for equality were rare), then maybe that would be a good solution? Let me know what you think!


There is no frozendict. But a collections.namedtuple is an approximation to that behaviour which may suit you.


Don't inherit from dict, encapsulate it. That way you can make sure it won't be changed.

As for caching, you can remember to_frozenset or its hash. Depending on the use pattern, remember the hash, which allows you to quickly return on hashing and inequality, and compare the frozensets only if the hashes match.

That said, you're much too worried about performance for someone who hasn't coded a benchmark yet. Build the simplest possible implementation. If it's fast you're done. Otherwise, benchmark it, then find an incremental way to improve measured results.


I'm not sure this solves your problem, but if you want an object to be hashable, you can implement it in this fashion:

class Hashable(object):
    def __hash__(self):
        return hash((self.__class__.__name__,
                     tuple(self.__dict__.items())))

You'll get the data of the object in structured tuple format, along with the class name as a hash signature of some king. You can even extend dict to use in this class.

0

精彩评论

暂无评论...
验证码 换一张
取 消