First, an example:
Suppose we have a bunch of Person objects with various attributes (name, SSN, phone, email address, credit card #, etc.).
Now imagine the following simple website:
- uses a person's email address as unique login name
- lets users edit their attributes (including their email address)
If this website had tons of users, it would make sense to store Person objects in a dictionary keyed by email address, for fast Person retrieval upon login.
However, when a Person's email address is edited, the dictionary key for that Person needs to be changed as well. This is slightly yucky.
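Here is a minimal Python 3 sketch that makes the "slightly yucky" part concrete. Person is reduced to two attributes, and register and change_email are hypothetical helper names. The email is stored twice, once in person.email and once as the dictionary key, so every edit has to update both.

```python
class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email


people_by_email = {}  # the index: email address -> Person


def register(person):
    people_by_email[person.email] = person


def change_email(person, new_email):
    # The email lives in two places (person.email and the dict key),
    # so every edit must update both, in the right order.
    if new_email in people_by_email:
        raise ValueError("email already in use: %r" % new_email)
    del people_by_email[person.email]
    person.email = new_email
    people_by_email[new_email] = person


john = Person("John", "email1@dot.com")
register(john)
change_email(john, "jon@dot.com")
print(people_by_email["jon@dot.com"].name)  # John
```

Nothing prevents other code from assigning john.email directly and leaving a stale key behind. That coupling is exactly what the question is about.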
I'm looking for suggestions on how to tackle the generic problem:
Given a bunch of entities that share an aspect, where the aspect is used both for fast access to the entities and within each entity's own functionality, where should the aspect be placed?
- within each entity (not good for fast access)
- index only (not good for each entity's functionality)
- both within each entity and as index (duplicate data/reference)
- somewhere else/somehow differently
The problem may be extended, say, if we want several indices over the data (SSN, credit card number, etc.). Eventually we may end up with a bunch of SQL tables.
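For comparison, here is a sketch of where the multi-index version ends up. It uses Python's built-in sqlite3 module with an in-memory database. The person table, its columns, the find_by helper and the person_by_ssn index name are assumptions for illustration. In this setup the database, rather than application code, keeps every index consistent when a value changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE person (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        ssn   TEXT,
        email TEXT UNIQUE          -- SQLite backs UNIQUE with an index
    )
""")
conn.execute("CREATE INDEX person_by_ssn ON person (ssn)")  # a second index

conn.execute("INSERT INTO person (name, ssn, email) VALUES (?, ?, ?)",
             ("John", "123-45-6789", "email1@dot.com"))

# Editing the email is one statement; SQLite updates both indexes itself.
conn.execute("UPDATE person SET email = ? WHERE email = ?",
             ("jon@dot.com", "email1@dot.com"))


def find_by(column, value):
    # column must come from trusted code; value is always a bound parameter
    return conn.execute("SELECT name FROM person WHERE %s = ?" % column,
                        (value,)).fetchone()


print(find_by("email", "jon@dot.com"))     # ('John',)
print(find_by("email", "email1@dot.com"))  # None
print(find_by("ssn", "123-45-6789"))       # ('John',)
```

The trade-off is that the entities are now rows rather than Python objects with behaviour.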
I'm looking for something with the following properties (and more if you can think of them):
# create an index on the attribute of a class
magical_index = magical_index_factory(class, class.attribute)
# create an object
obj = class()
# set the object's attribute
obj.attribute= value
# retrieve object using attribute as index
magical_index[value]
# change object attribute to new value
obj.attribute= new_value
# automagically object can be retrieved using new value of attribute
magical_index[new_value]
# become less materialistic: get rid of the objects in your life
del obj
# object is really gone
magical_index[new_value]
KeyError: new_value
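One possible sketch of the requested behaviour assumes CPython. It reuses magical_index_factory, the hypothetical name from the spec above, but this version takes the attribute name as a string, because an ordinary instance attribute is not reachable as class.attribute. The factory replaces the attribute with a property that keeps a weakref.WeakValueDictionary in sync. Re-assigning the attribute moves the key, and the entry disappears once the last reference to the object is gone. On CPython, reference counting makes that happen right at del obj; other interpreters may delay it until a garbage collection. Values must be hashable. Assigning a value already held by another live object raises ValueError, because this index is one-to-one.

```python
import weakref

_MISSING = object()  # sentinel meaning "attribute never assigned"


def magical_index_factory(cls, attr_name):
    """Replace cls.<attr_name> with a property backed by a value -> object index.

    Hypothetical helper: takes the attribute name as a string. Values must be
    hashable and unique among live objects.
    """
    index = weakref.WeakValueDictionary()  # holds objects only weakly
    storage = "_indexed_" + attr_name      # per-instance slot for the real value

    def getter(obj):
        return getattr(obj, storage)

    def setter(obj, value):
        holder = index.get(value)
        if holder is not None and holder is not obj:
            raise ValueError("%s=%r is already taken" % (attr_name, value))
        old = getattr(obj, storage, _MISSING)
        if old is not _MISSING and index.get(old) is obj:
            del index[old]                 # drop the stale key
        setattr(obj, storage, value)
        index[value] = obj                 # register under the new key

    setattr(cls, attr_name, property(getter, setter))
    return index


class Person(object):
    pass


magical_index = magical_index_factory(Person, "email")
obj = Person()
obj.email = "john@dot.com"
print(magical_index["john@dot.com"] is obj)  # True
obj.email = "jon@dot.com"
print("john@dot.com" in magical_index)      # False
del obj
print("jon@dot.com" in magical_index)       # False on CPython: entry is gone
```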
I want the object and all of its indices to play nicely and seamlessly with each other.
Please suggest appropriate design patterns.
Note: the above example is just that, an example used to portray the generic problem, so please provide generic solutions (of course, you may keep using the example when explaining your generic solution).
Consider this.
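class Person(object):
    def __init__(self, name, addr, email):  # ... etc. ...
        self.observer = []  # Observers to notify when an attribute changes
        self._name = name
        self._addr = addr
        self._email = email
        # ... etc. ...

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value
        for observer in self.observer:
            observer.update(self)

    # ... etc. (the same pattern for addr, email, ...) ...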
The observer attribute is the list of Observers that must be notified of changes; it makes each Person an Observable that notifies its Observers of updates.
Each attribute is wrapped in a property. Using descriptors is probably better, because one descriptor class can hold the observer-notification code instead of repeating it in every setter.
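Here is a minimal sketch of the descriptor variant. It needs Python 3.6+, because it relies on __set_name__. Observed and PrintingObserver are hypothetical names. Unlike the update(self) call in the property version, this notification also passes the attribute name and the old and new values. That is an assumed extension, so that an observer can re-key an index directly.

```python
class Observed(object):
    """Data descriptor: keeps one value per instance and notifies observers.

    Assumes (hypothetically) that the instance has an `observer` list and that
    each observer has update(obj, attr_name, old_value, new_value).
    """

    def __set_name__(self, owner, name):
        self.name = name             # public name, e.g. "email"
        self.storage = "_" + name    # where the value is actually stored

    def __get__(self, obj, objtype=None):
        if obj is None:              # accessed on the class, not an instance
            return self
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        old = getattr(obj, self.storage, None)   # None on first assignment
        setattr(obj, self.storage, value)
        for observer in getattr(obj, "observer", ()):
            observer.update(obj, self.name, old, value)


class Person(object):
    name = Observed()
    email = Observed()

    def __init__(self, name, email):
        self.observer = []           # set first so the setters can see it
        self.name = name
        self.email = email


class PrintingObserver(object):
    def update(self, obj, attr_name, old_value, new_value):
        print("%s: %r -> %r" % (attr_name, old_value, new_value))


john = Person("John", "email1@dot.com")
john.observer.append(PrintingObserver())
john.email = "jon@dot.com"   # prints: email: 'email1@dot.com' -> 'jon@dot.com'
```

With the old value in hand, an index can delete exactly one stale key. It no longer has to scan every entry, which is what update() in PersonCollection has to do.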
import collections


class PersonCollection(set):
    def __init__(self, *args, **kw):
        self.byName = collections.defaultdict(list)
        self.byEmail = collections.defaultdict(list)
        super(PersonCollection, self).__init__()
        for person in set(*args, **kw):  # index any initial members too
            self.add(person)

    def add(self, person):
        super(PersonCollection, self).add(person)
        person.observer.append(self)
        self.byName[person.name].append(person)
        self.byEmail[person.email].append(person)

    def update(self, person):
        """This person changed. Find them in the old indexes and fix them.

        Note: this observer callback shadows set.update().
        """
        for index, new_key in ((self.byName, person.name),
                               (self.byEmail, person.email)):
            for key, people in list(index.items()):
                if any(p is person for p in people):
                    people[:] = [p for p in people if p is not person]
                    if not people:
                        del index[key]
            index[new_key].append(person)

    # ... etc. ... for all methods of a collections.Set.
See the abstract base classes in the collections module (collections.abc since Python 3.3) for what must be implemented:
http://docs.python.org/library/collections.html#abcs-abstract-base-classes
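The sketch here shows what the ABCs linked above buy you. Subclassing collections.abc.MutableSet (Python 3) instead of set means implementing only __contains__, __iter__, __len__, add and discard. remove, pop, clear, |= and the subset/equality comparisons then come from the mixin methods. IndexedSet, find and the update(item, attr_name, old_value, new_value) hook are hypothetical names. The hook assumes the observed object reports the old value, as an observing setter or descriptor could.

```python
from collections import defaultdict
from collections.abc import MutableSet


class IndexedSet(MutableSet):
    """A set of objects with one value -> objects index per attribute name."""

    def __init__(self, attributes, items=()):
        self._items = set()
        self.indexes = {name: defaultdict(set) for name in attributes}
        for item in items:
            self.add(item)

    # --- the five abstract methods MutableSet requires ---
    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item in self._items:
            return
        self._items.add(item)
        for name, index in self.indexes.items():
            index[getattr(item, name)].add(item)

    def discard(self, item):
        if item not in self._items:
            return
        self._items.discard(item)
        for name, index in self.indexes.items():
            key = getattr(item, name)
            index[key].discard(item)
            if not index[key]:
                del index[key]       # keep the index free of empty buckets

    # --- observer hook: item.<attr_name> changed from old_value to new_value ---
    def update(self, item, attr_name, old_value, new_value):
        if item in self._items and attr_name in self.indexes:
            index = self.indexes[attr_name]
            index[old_value].discard(item)
            if not index[old_value]:
                del index[old_value]
            index[new_value].add(item)

    def find(self, attr_name, value):
        return set(self.indexes[attr_name].get(value, ()))


class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email


people = IndexedSet(["name", "email"])
john = Person("John", "email1@dot.com")
people.add(john)
old = john.email
john.email = "jon@dot.com"
people.update(john, "email", old, john.email)  # what an observer would call
print(people.find("email", "jon@dot.com") == {john})  # True
people.remove(john)   # remove() is a MutableSet mixin built on discard()
print(len(people))    # 0
```

The binary operators such as a | b build new instances through _from_iterable, which assumes a ClassName(iterable) constructor. A class with this constructor signature would need to override it before using them.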
If you want "generic" indexing, your collection can be parameterized with the names of the attributes to index, and you can use getattr to fetch those named attributes from the underlying objects.
import collections


class GenericIndexedCollection(set):
    attributes_to_index = []  # List of attribute names; set in subclasses

    def __init__(self, *args, **kw):
        # One index per attribute: attribute value -> list of objects
        self.indexes = dict((n, collections.defaultdict(list))
                            for n in self.attributes_to_index)
        super(GenericIndexedCollection, self).__init__()
        for obj in set(*args, **kw):
            self.add(obj)

    def add(self, obj):
        super(GenericIndexedCollection, self).add(obj)
        for name, index in self.indexes.items():
            index[getattr(obj, name)].append(obj)
Note: to properly emulate a database, use a set, not a list. Database tables are (theoretically) sets. As a practical matter they are unordered, and an index allows the database to reject duplicates. Some RDBMSs don't reject duplicate rows because, without an index, it's too expensive to check.
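Here is a minimal sketch of a unique index that illustrates the note's point about duplicates. UniqueIndex and DuplicateKeyError are hypothetical names. The duplicate check is a dictionary lookup, so rejecting a duplicate costs O(1) on average. Without the index, every stored object would have to be compared.

```python
class DuplicateKeyError(KeyError):
    """Hypothetical error type: the value is already taken in a unique index."""


class UniqueIndex(object):
    """A one-object-per-value index, like a UNIQUE index in a database."""

    def __init__(self, attr_name):
        self.attr_name = attr_name
        self._by_value = {}

    def insert(self, obj):
        value = getattr(obj, self.attr_name)
        current = self._by_value.get(value)   # hash lookup, no full scan
        if current is not None and current is not obj:
            raise DuplicateKeyError(value)
        self._by_value[value] = obj

    def __getitem__(self, value):
        return self._by_value[value]

    def __len__(self):
        return len(self._by_value)


class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email


by_email = UniqueIndex("email")
by_email.insert(Person("John", "email1@dot.com"))
try:
    by_email.insert(Person("Jack", "email1@dot.com"))
except DuplicateKeyError as exc:
    print("rejected duplicate:", exc)
```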
Well, another way may be to implement the following:
- Attribute is an abstraction for a "value". We need it because Python has no "assignment overloading" (a simple get/set paradigm is used as the cleanest alternative). Attribute also acts as an "Observable".
- AttrSet is an "Observer" for Attributes: it tracks their value changes while effectively acting as an Attribute-value-to-whatever (person, in our case) dictionary.
- create_with_attrs is a factory producing what looks like a named tuple, forwarding attribute access to the supplied Attributes, so that person.name = "Ivan" effectively calls .set("Ivan") on the Attribute passed as name, which makes the AttrSets observing that Attribute rearrange their internals appropriately.
The code (tested):
from collections import defaultdict


class Attribute(object):
    """An observable value: set() calls every registered change callback."""

    def __init__(self, value):
        super(Attribute, self).__init__()
        self._value = value
        self._notified_set = set()

    def set(self, value):
        old = self._value
        self._value = value
        for n_ch in self._notified_set:
            n_ch(old_value=old, new_value=value)

    def get(self):
        return self._value

    def add_notify_changed(self, notify_changed):
        self._notified_set.add(notify_changed)

    def remove_notify_changed(self, notify_changed):
        self._notified_set.remove(notify_changed)
class AttrSet(object):
    """Observes Attributes and indexes objects by their Attributes' values."""

    def __init__(self):
        super(AttrSet, self).__init__()
        self._attr_value_to_obj_set = defaultdict(set)
        self._obj_to_attr = {}
        self._attr_to_notify_changed = {}

    def add(self, attr, obj):
        self._obj_to_attr[obj] = attr
        self._add(attr.get(), obj)
        # Re-index obj whenever its Attribute changes value.
        notify_changed = (lambda old_value, new_value:
                          self._notify_changed(obj, old_value, new_value))
        attr.add_notify_changed(notify_changed)
        self._attr_to_notify_changed[attr] = notify_changed

    def get(self, *attr_value_lst):
        """Return objects whose value is any of attr_value_lst (all if none given)."""
        attr_value_lst = attr_value_lst or list(self._attr_value_to_obj_set)
        result = set()
        for attr_value in attr_value_lst:
            # .get() avoids inserting empty sets into the defaultdict on a miss
            result.update(self._attr_value_to_obj_set.get(attr_value, ()))
        return result

    def remove(self, obj):
        attr = self._obj_to_attr.pop(obj)
        self._remove(attr.get(), obj)
        notify_changed = self._attr_to_notify_changed.pop(attr)
        attr.remove_notify_changed(notify_changed)

    def __iter__(self):
        return iter(self.get())

    def _add(self, attr_value, obj):
        self._attr_value_to_obj_set[attr_value].add(obj)

    def _remove(self, attr_value, obj):
        obj_set = self._attr_value_to_obj_set[attr_value]
        obj_set.remove(obj)
        if not obj_set:
            self._attr_value_to_obj_set.pop(attr_value)

    def _notify_changed(self, obj, old_value, new_value):
        self._remove(old_value, obj)
        self._add(new_value, obj)
def create_with_attrs(**attr_name_to_attr):
    class Result(object):
        def __getattr__(self, attr_name):
            if attr_name in attr_name_to_attr:
                return attr_name_to_attr[attr_name].get()
            raise AttributeError(attr_name)

        def __setattr__(self, attr_name, attr_value):
            if attr_name in attr_name_to_attr:
                attr_name_to_attr[attr_name].set(attr_value)
            else:
                raise AttributeError(attr_name)

        def __str__(self):
            result = ""
            for attr_name in attr_name_to_attr:
                result += (attr_name + ": "
                           + str(attr_name_to_attr[attr_name].get())
                           + ", ")
            return result

    return Result()
With the data prepared as follows:
name_and_email_lst = [("John", "email1@dot.com"),
                      ("John", "email2@dot.com"),
                      ("Jack", "email3@dot.com"),
                      ("Hack", "email4@dot.com"),
                      ]
email = AttrSet()  # index: email value -> persons
name = AttrSet()   # index: name value -> persons
for name_str, email_str in name_and_email_lst:
    email_attr = Attribute(email_str)
    name_attr = Attribute(name_str)
    person = create_with_attrs(email=email_attr, name=name_attr)
    email.add(email_attr, person)
    name.add(name_attr, person)


def print_set(person_set):
    for person in person_set:
        print(person)
    print()
the following sequence (each pseudo-SQL statement followed by its Python equivalent) gives:
SELECT id FROM email
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,
email: email1@dot.com, name: John,
SELECT id FROM email WHERE email="email1@dot.com"
>>> print_set(email.get("email1@dot.com"))
email: email1@dot.com, name: John,
SELECT id FROM email WHERE email="email1@dot.com" OR email="email2@dot.com"
>>> print_set(email.get("email1@dot.com", "email2@dot.com"))
email: email1@dot.com, name: John,
email: email2@dot.com, name: John,
SELECT id FROM name WHERE name="John"
>>> print_set(name.get("John"))
email: email1@dot.com, name: John,
email: email2@dot.com, name: John,
SELECT id FROM name, email WHERE name="John" AND email="email1@dot.com"
>>> print_set(name.get("John").intersection(email.get("email1@dot.com")))
email: email1@dot.com, name: John,
UPDATE email, name SET email="jon@dot.com", name="Jon"
WHERE id IN (SELECT id FROM email WHERE email="email1@dot.com")
>>> person = email.get("email1@dot.com").pop()
>>> person.name = "Jon"; person.email = "jon@dot.com"
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,
email: jon@dot.com, name: Jon,
DELETE FROM email, name WHERE id=%s
SELECT id FROM email
>>> name.remove(person)
>>> email.remove(person)
>>> print_set(email.get())
email: email3@dot.com, name: Jack,
email: email4@dot.com, name: Hack,
email: email2@dot.com, name: John,