开发者

Pagination and Multiple relational entity indexes with AppEngine

开发者 https://www.devze.com 2023-01-20 23:33 出处:网络
This is a general question on doing pagination with models containing multiple entity indexes.This is easier with an example so let us consider the example that Brett Slatkin provided in his video (ht

This is a general question on doing pagination with models containing multiple entity indexes. This is easier with an example so let us consider the example that Brett Slatkin provided in his video (http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html).

You have your Message model (I have ignored the MessageIndex model) and Ive also added two extra properties (for publication and expiration times).

class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()
    published = db.DateTimeProperty()
    expires = db.DateTimeProperty()

Now I am interested in doing filters on multiple fields, say messages published in a particular time window. eg

select * from Message where pubished > so开发者_StackOverflow中文版me_date and expires < some_other_date

Since GAE does not allow inequality filters on multiple fields, we have to break down the models into extra indexes. So now we have an index for the published and expires fields, giving us the following models (with a Message instance being the parent of MessagePublishedIndex and MessageExpiryIndex instances):

class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()

class MessagePublishedIndex(db.Model):
    published = db.DateTimeProperty()

class MessageExpiryIndex(db.Model):
    expires = db.DateTimeProperty()

and the following key_only queries:

publish_keys = MessagePublishedIndex.all(key_only = True).filter("published >", some_date)
expire_keys = MessageExpiryIndex.all(key_only = True).filter("expires <", some_other_date)

msgs_by_pubdate = db.get([k.parent() for k in publish_keys])
msgs_by_expiry  = db.get([k.parent() for k in expire_keys])

Now Il have to do an intersection of these lists to find the common ones to get all messages within a particular time window.

This seems pretty wasteful. Is there an easier way to do this? Also this problem is exacerbated if the field in the index is a ListProperty because key_only queries cannot have "IN" filters. And even worse if I want pagination (ie "count" results from an "offset"), Il have to manually discard the first "offset" results and then do an intersection. Surely there has to be an easier (and smarter) way of doing this. Any ideas? It is bad enough GAE does not allow inequality filters on multiple fields (albeit for efficiency reasons), but to have to manually do all the zig-zags ourselves seems pretty inefficient (not to mention running over cpu and time limits).


In your scenario, I would create a single Relational Index Entity

class MessageIndex(db.Model):
    keywords = db.StringListProperty();

where each item in the keywords list would be in the format <property>=<value>

e.g.: keywords = ["published=2011-03-24", "expires=2011-03-25"]

You would need to do serialization/de-serialization yourself to get the property value. Alternatively, you still can store property values in the Message model just for redundancy. However, the approach does not work with range query. (I haven't tested but you probably can use query with prefix to fake range query: u"published=2010" + u"\ufffd", here is more details)

It's been always a challenge to optimize for GAE. But it's fun and rewarding.

0

精彩评论

暂无评论...
验证码 换一张
取 消