python: dictionary dilemma: how to properly index objects based on an attribute

first, an example:

given a bunch of Person objects with various attributes (name, ssn, phone, email address, credit card #, etc.)

now imagine the following simple website:

  1. uses a person's email address as unique login name
  2. lets users edit their attributes (including their email address)

if this website had tons of users, then it makes sense to store Person objects in a dictionary indexed by email address, for quick Person retrieval upon login.

however, when a Person's email address is edited, the dictionary key for that Person needs to be changed as well. this is slightly yucky.
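to make the awkwardness concrete, here is a minimal sketch of the manual re-keying. the Person class, the people_by_email dictionary and the change_email helper are hypothetical names used only for illustration:

```python
class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email


# The index: email address -> Person.
people_by_email = {}


def change_email(person, new_email):
    """Update the attribute and move the dictionary entry in one place."""
    if new_email in people_by_email:
        raise ValueError("email already in use: %r" % new_email)
    # The key and the attribute hold the same data, so both must change.
    del people_by_email[person.email]
    person.email = new_email
    people_by_email[new_email] = person


alice = Person("Alice", "alice@example.com")
people_by_email[alice.email] = alice
change_email(alice, "alice@example.org")
```

every piece of code that edits the email has to go through such a helper; a plain `alice.email = ...` silently leaves the index stale.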

i'm looking for suggestions on how to tackle the generic problem:

given a bunch of entities with a shared aspect. the aspect is used both for fast access to the entities and within each entity's functionality. where should the aspect be placed:

  1. within each entity (not good for fast access)
  2. index only (not good for each entity's functionality)
  3. both within each entity and as index (duplicate data/reference)
  4. somewhere else/somehow differently

the problem may be extended, say, if we want to use several indices to index the data (ssn, credit card number, etc.). eventually we may end up with a bunch of SQL tables.

i'm looking for something with the following properties (and more if you can think of them):

# create an index on the attribute of a class
magical_index = magical_index_factory(class, class.attribute)
# create an object
obj = class() 
# set the object's attribute
obj.attribute= value
# retrieve object from using attribute as index
magical_index[value] 
# change object attribute to new value
obj.attribute= new_value 
# automagically object can be retrieved using new value of attribute
magical_index[new_value]
# become less materialistic: get rid of the objects in your life
del obj
# object is really gone
magical_index[new_value]
KeyError: new_value

i want the object, indices, all to play nicely and seamlessly with each other.

please suggest appropriate design patterns

note: the above example is just that, an example. an example used to portray the generic problem. so please provide generic solutions (of course, you may choose to keep using the example when explaining your generic solution)


Consider this.

class Person(object):
    def __init__(self, name, addr, email):  # ... etc. ...
        self.observer = []  # observers to notify when an attribute changes
        self._name = name
        # ... etc. ...
    @property
    def name(self):
        return self._name
    @name.setter
    def name(self, value):
        self._name = value
        # Tell every registered observer that this person changed.
        for obs in self.observer:
            obs.update(self)
    # ... etc. ...

The observer attribute makes Person an Observable: it holds the list of Observers that must be notified whenever an attribute changes.

Each attribute is wrapped in a property. Using descriptors is probably better, because it avoids repeating the observer notification in every setter.
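A minimal sketch of the descriptor variant, assuming the same observer list and update( person ) protocol as above. The names Observed and Recorder are hypothetical and only serve the illustration:

```python
class Observed(object):
    """Data descriptor: stores the value on the instance, notifies on change."""

    def __init__(self, name):
        self.storage = "_" + name  # e.g. "name" is kept in "_name"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.storage)

    def __set__(self, instance, value):
        setattr(instance, self.storage, value)
        # One notification loop shared by every observed attribute.
        for obs in instance.observer:
            obs.update(instance)


class Person(object):
    name = Observed("name")
    email = Observed("email")

    def __init__(self, name, email):
        self.observer = []  # must exist before the descriptors fire
        self.name = name
        self.email = email


class Recorder(object):
    """Hypothetical observer that records each change it is told about."""

    def __init__(self):
        self.seen = []

    def update(self, person):
        self.seen.append((person.name, person.email))


p = Person("Ann", "ann@example.com")
rec = Recorder()
p.observer.append(rec)
p.email = "ann@example.org"  # triggers rec.update(p)
```

The notification logic now lives in one place, and adding another observed attribute is a single line in the class body.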

import collections

class PersonCollection(set):
    def __init__(self, *args, **kw):
        self.byName = collections.defaultdict(list)
        self.byEmail = collections.defaultdict(list)
        super(PersonCollection, self).__init__(*args, **kw)
    def add(self, person):
        super(PersonCollection, self).add(person)
        person.observer.append(self)
        self.byName[person.name].append(person)
        self.byEmail[person.email].append(person)
    def update(self, person):
        """This person changed.  Find them in old indexes and fix them."""
        for index, attr in ((self.byName, "name"), (self.byEmail, "email")):
            # The old key is unknown here, so scan for lists still holding person.
            stale = [k for k, v in index.items() if any(p is person for p in v)]
            for k in stale:
                index[k] = [p for p in index[k] if p is not person]
                if not index[k]:
                    index.pop(k)
            index[getattr(person, attr)].append(person)

    # ... etc. ... for all methods of a collections.Set.

See the abstract base classes in the collections module (collections.abc in Python 3) for what must be implemented.

http://docs.python.org/library/collections.html#abcs-abstract-base-classes

If you want "generic" indexing, then your collection can be parameterized with the names of attributes, and you can use getattr to get those named attributes from the underlying objects.

import collections

class GenericIndexedCollection(set):
    attributes_to_index = []  # List of attribute names
    def __init__(self, *args, **kw):
        # One index per attribute name: attribute value -> list of objects.
        self.indexes = dict((n, collections.defaultdict(list))
                            for n in self.attributes_to_index)
        super(GenericIndexedCollection, self).__init__(*args, **kw)
    def add(self, person):
        super(GenericIndexedCollection, self).add(person)
        for i in self.indexes:
            self.indexes[i][getattr(person, i)].append(person)
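The generic collection does not show how to handle changes. One possible approach, sketched here under the assumption that objects call update on the collection as in the Person example, is to remember the key each object was last filed under, so no scan of the index is needed. ReindexingCollection, find and Item are hypothetical names:

```python
import collections


class ReindexingCollection(object):
    def __init__(self, *attribute_names):
        # attribute name -> (attribute value -> set of objects)
        self.indexes = dict((n, collections.defaultdict(set))
                            for n in attribute_names)
        self._filed_under = {}  # object -> {attribute name: indexed value}

    def add(self, obj):
        keys = {}
        for name, index in self.indexes.items():
            keys[name] = getattr(obj, name)
            index[keys[name]].add(obj)
        self._filed_under[obj] = keys

    def update(self, obj):
        """Move obj from the keys it was filed under to its current keys."""
        old_keys = self._filed_under[obj]
        for name, index in self.indexes.items():
            old, new = old_keys[name], getattr(obj, name)
            if old != new:
                index[old].discard(obj)
                if not index[old]:
                    del index[old]  # drop empty buckets
                index[new].add(obj)
                old_keys[name] = new

    def find(self, name, key):
        # .get avoids creating an empty bucket for a missing key.
        return set(self.indexes[name].get(key, ()))


class Item(object):
    def __init__(self, email):
        self.email = email


items = ReindexingCollection("email")
item = Item("old@example.com")
items.add(item)
item.email = "new@example.com"
items.update(item)  # in the design above, the property setter would call this
```

The price is a second dictionary that duplicates the indexed values, which is the same duplication trade-off the question describes.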

Note. To properly emulate a database, use a set not a list. Database tables are (theoretically) sets. As a practical matter they are unordered, and an index will allow the database to reject duplicates. Some RDBMS's don't reject duplicate rows because -- without an index -- it's too expensive to check.
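The duplicate-rejecting behaviour of an index can be sketched with a plain dictionary. UniqueIndex and Account are hypothetical names, and raising ValueError on a duplicate is an assumption of this sketch rather than anything prescribed above:

```python
class UniqueIndex(object):
    """Maps one attribute value to at most one object, like a UNIQUE index."""

    def __init__(self, attribute):
        self.attribute = attribute
        self._by_value = {}

    def add(self, obj):
        key = getattr(obj, self.attribute)
        # The dictionary lookup makes the duplicate check cheap.
        if key in self._by_value:
            raise ValueError("duplicate %s: %r" % (self.attribute, key))
        self._by_value[key] = obj

    def __getitem__(self, key):
        return self._by_value[key]


class Account(object):
    def __init__(self, email):
        self.email = email


index = UniqueIndex("email")
first = Account("a@example.com")
index.add(first)
try:
    index.add(Account("a@example.com"))
    rejected = False
except ValueError:
    rejected = True
```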


Well, another way may be to implement the following:

  1. Attribute (Attr for short) is an abstraction for a "value". We need this since there is no "assignment overloading" in Python (a simple get / set paradigm is used as the cleanest alternative). Attr also acts as an "Observable".

  2. AttrSet is an "Observer" for Attrs, which tracks their value changes while effectively acting as an Attr-to-whatever (person in our case) dictionary.

  3. create_with_attrs is a factory producing what looks like a named-tuple, forwarding attribute access via supplied Attrs, so that person.name = "Ivan" effectively yields person.name_attr.set("Ivan") and makes the AttrSets observing this person's name appropriately rearrange their internals.

The code (tested):

from collections import defaultdict

class Attribute(object):
    def __init__(self, value):
        super(Attribute, self).__init__()
        self._value = value
        self._notified_set = set()
    def set(self, value):
        old = self._value
        self._value = value
        for n_ch in self._notified_set:
            n_ch(old_value=old, new_value=value)
    def get(self):
        return self._value
    def add_notify_changed(self, notify_changed):
        self._notified_set.add(notify_changed)
    def remove_notify_changed(self, notify_changed):
        self._notified_set.remove(notify_changed)

class AttrSet(object):
    def __init__(self):
        super(AttrSet, self).__init__()
        self._attr_value_to_obj_set = defaultdict(set)
        self._obj_to_attr = {}
        self._attr_to_notify_changed = {}
    def add(self, attr, obj):
        self._obj_to_attr[obj] = attr
        self._add(attr.get(), obj)
        notify_changed = (lambda old_value, new_value:
                          self._notify_changed(obj, old_value, new_value))
        attr.add_notify_changed(notify_changed)
        self._attr_to_notify_changed[attr] = notify_changed
    def get(self, *attr_value_lst):
        attr_value_lst = attr_value_lst or self._attr_value_to_obj_set.keys()
        result = set()
        for attr_value in attr_value_lst:
            # .get() avoids creating an empty entry for a value that is not indexed
            result.update(self._attr_value_to_obj_set.get(attr_value, ()))
        return result
    def remove(self, obj):
        attr = self._obj_to_attr.pop(obj)
        self._remove(attr.get(), obj)
        notify_changed = self._attr_to_notify_changed.pop(attr)
        attr.remove_notify_changed(notify_changed)
    def __iter__(self):
        return iter(self.get())
    def _add(self, attr_value, obj):
        self._attr_value_to_obj_set[attr_value].add(obj)
    def _remove(self, attr_value, obj):
        obj_set = self._attr_value_to_obj_set[attr_value]
        obj_set.remove(obj)
        if not obj_set:
            self._attr_value_to_obj_set.pop(attr_value)
    def _notify_changed(self, obj, old_value, new_value):
        self._remove(old_value, obj)
        self._add(new_value, obj)

def create_with_attrs(**attr_name_to_attr):
    class Result(object):
        def __getattr__(self, attr_name):
            if attr_name in attr_name_to_attr.keys():
                return attr_name_to_attr[attr_name].get()
            else:
                raise AttributeError(attr_name)
        def __setattr__(self, attr_name, attr_value):
            if attr_name in attr_name_to_attr.keys():
                attr_name_to_attr[attr_name].set(attr_value)
            else:
                raise AttributeError(attr_name)
        def __str__(self):
            result = ""
            for attr_name in attr_name_to_attr:
                result += (attr_name + ": "
                           + str(attr_name_to_attr[attr_name].get())
                           + ", ")
            return result
    return Result()

With the data prepared with

name_and_email_lst = [("John","email1@dot.com"),
                      ("John","email2@dot.com"),
                      ("Jack","email3@dot.com"),
                      ("Hack","email4@dot.com"),
                      ]

email = AttrSet()
name = AttrSet()

for name_str, email_str in name_and_email_lst:
    email_attr = Attribute(email_str)
    name_attr = Attribute(name_str)
    person = create_with_attrs(email=email_attr, name=name_attr)
    email.add(email_attr, person)
    name.add(name_attr, person)

def print_set(person_set):
    for person in person_set:
        print(person)
    print()

each of the following pseudo-SQL statements is shown with its equivalent call and the resulting output:

SELECT id FROM email

>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,
email: email1@dot.com, name: John,

SELECT id FROM email WHERE email="email1@dot.com"

>>> print_set(email.get("email1@dot.com"))
email: email1@dot.com, name: John,

SELECT id FROM email WHERE email="email1@dot.com" OR email="email2@dot.com"

>>> print_set(email.get("email1@dot.com", "email2@dot.com"))
email: email1@dot.com, name: John,
email: email2@dot.com, name: John,

SELECT id FROM name WHERE name="John"

>>> print_set(name.get("John"))
email: email1@dot.com, name: John,
email: email2@dot.com, name: John,

SELECT id FROM name, email WHERE name="John" AND email="email1@dot.com"

>>> print_set(name.get("John").intersection(email.get("email1@dot.com")))
email: email1@dot.com, name: John,

UPDATE email, name SET email="jon@dot.com", name="Jon" WHERE id IN (SELECT id FROM email WHERE email="email1@dot.com")

>>> person = email.get("email1@dot.com").pop()
>>> person.name = "Jon"; person.email = "jon@dot.com"
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,
email: jon@dot.com, name: Jon,

DELETE FROM email, name WHERE id=%s

SELECT id FROM email

>>> name.remove(person)
>>> email.remove(person)
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,