I am trying to design tables to build out a follower relationship.
Say I have a stream of 140-character records, each with a user, a hashtag and other text.
Users follow other users, and can also follow hashtags.
I've outlined my design below, but it has two limitations, and I was wondering if others had smarter ways to accomplish the same goal.
The issues with this are
- The list of followers is copied in for each record
- If a new follower is added or one removed, 'all' the records have to be updated.
The code:

from google.appengine.ext import db

class HashtagFollowers(db.Model):
    """
    This table contains the followers for each hashtag
    """
    hashtag = db.StringProperty()
    followers = db.StringListProperty()

class UserFollowers(db.Model):
    """
    This table contains the followers for each user
    """
    username = db.StringProperty()
    followers = db.StringListProperty()

class stream(db.Model):
    """
    This table contains the data stream
    """
    username = db.StringProperty()
    hashtag = db.StringProperty()
    text = db.TextProperty()

    def save(self):
        """
        On each save all the followers for each hashtag and user
        are added into another table with this record as the parent
        """
        super(stream, self).save()
        hfs = HashtagFollowers.all().filter("hashtag =", self.hashtag).fetch(10)
        for hf in hfs:
            sh = streamHashtags(parent=self, followers=hf.followers)
            sh.save()
        ufs = UserFollowers.all().filter("username =", self.username).fetch(10)
        for uf in ufs:
            uh = streamUsers(parent=self, followers=uf.followers)
            uh.save()

class streamHashtags(db.Model):
    """
    The stream record is the parent of this record
    """
    followers = db.StringListProperty()

class streamUsers(db.Model):
    """
    The stream record is the parent of this record
    """
    followers = db.StringListProperty()

Now, to get the stream of followed hashtags:

indexes = db.GqlQuery("""SELECT __key__ from streamHashtags where followers = 'myusername'""")
# Each index entity is a child of its stream record, so the parent key is the record's key
keys = [k.parent() for k in indexes[offset:numresults]]
return db.get(keys)

Is there a smarter way to do this?
The problem you want to solve is called the fan-out problem.
Brett Slatkin from the Google App Engine team gave a talk on an efficient, scalable solution to the fan-out problem on App Engine. You can find a video of the talk here:
http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html
Yes, this is the fan-out problem, as others have noted, and Brett Slatkin's talk is worth watching for anyone interested.
However, I raised two specific limitations:
- The list of followers is copied in for each record
As they say, this is not a bug but a feature. In fact, this is how fan-out scales on App Engine.
- If a new follower is added or one removed, 'all' the records have to be updated.
Either that, or do nothing, so that only future records are affected. In other words, you do not just follow a person's stream; you follow that stream as of a given time. If you unfollow on day two, your stream will still show that user's records from day one, but not those from day two onwards. [Note: this is different from how Twitter does it.]
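To make the "follow at a given time" behaviour concrete, here is a minimal in-memory sketch in plain Python. It does not use the datastore; the Record and Stream names and their methods are made up for illustration, and the only point is that each record keeps the follower list as it was when the record was written.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    username: str
    text: str
    # Followers copied in at write time; never touched again afterwards.
    followers: frozenset = field(default_factory=frozenset)


class Stream:
    """Hypothetical in-memory stand-in for the stream and follower tables."""

    def __init__(self):
        self.user_followers = {}  # username -> set of follower names
        self.records = []

    def follow(self, follower, username):
        self.user_followers.setdefault(username, set()).add(follower)

    def unfollow(self, follower, username):
        self.user_followers.get(username, set()).discard(follower)

    def post(self, username, text):
        # Snapshot the current followers into the new record.
        snapshot = frozenset(self.user_followers.get(username, ()))
        self.records.append(Record(username, text, snapshot))

    def timeline(self, follower):
        # Equivalent in spirit to "where followers = 'myusername'".
        return [r.text for r in self.records if follower in r.followers]


s = Stream()
s.follow("alice", "bob")
s.post("bob", "day one")
s.unfollow("alice", "bob")  # existing records are left alone
s.post("bob", "day two")
print(s.timeline("alice"))  # ['day one']
```

Following or unfollowing is then a single write to the follower list, at the cost of old records keeping the audience they had when they were posted.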
You could use a reference property and keep the followers in a common table that each record references.
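A rough sketch of that idea in plain Python, not datastore code: FollowerList and StreamRecord are hypothetical names, and an ordinary object reference stands in for the reference property. It only shows that one shared follower list means one update per follow or unfollow.

```python
class FollowerList:
    """One shared record per followed user or hashtag (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        self.followers = set()


class StreamRecord:
    def __init__(self, text, follower_list):
        self.text = text
        self.follower_list = follower_list  # a reference, not a copy


bob = FollowerList("bob")
records = [StreamRecord("first", bob), StreamRecord("second", bob)]

# One write to the shared list is visible through every record.
bob.followers.add("alice")
visible = [r.text for r in records if "alice" in r.follower_list.followers]

# Unfollowing is also a single write.
bob.followers.discard("alice")
visible_after = [r.text for r in records if "alice" in r.follower_list.followers]
```

The trade-off, as far as I understand the datastore, is on the read side: a query cannot filter stream records by a property of the entity they reference, so you would first look up the follower lists that contain the user and then fetch the records pointing at those lists.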
I'm not sure how to do this in Google App-Engine, but one database schema I would consider would be:
Tables:
  User    -- a table of users with their attributes
  HashTag -- a table of hashtags with their attributes
  Follows -- a table that defines who follows whom

Columns in the Follows table:
  followed int,         -- the id of the followed entity (could be a User or a HashTag)
  followed_is_user bit, -- whether the followed item is a User
  followed_is_tag bit,  -- whether the followed item is a HashTag
  follower int          -- the id of the follower (this can only be a User, so you may want to make this a foreign key on the User table)

You could probably condense the two bit columns into one, but keeping them separate would allow you to add other things that Users could follow in the future.
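For what it's worth, here is how that Follows schema might look in a relational database, sketched with Python's built-in sqlite3 module. The name and tag columns and the sample rows are made up for illustration, and the bit columns are stored as 0/1 integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE HashTag (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
CREATE TABLE Follows (
    followed INTEGER NOT NULL,          -- id of a User or a HashTag
    followed_is_user INTEGER NOT NULL,  -- 1 if the followed item is a User
    followed_is_tag INTEGER NOT NULL,   -- 1 if the followed item is a HashTag
    follower INTEGER NOT NULL REFERENCES User(id)
);
""")

conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO HashTag VALUES (1, 'python')")
# alice (id 1) follows the user bob (id 2) and the hashtag 'python' (id 1).
conn.executemany(
    "INSERT INTO Follows VALUES (?, ?, ?, ?)",
    [(2, 1, 0, 1), (1, 0, 1, 1)],
)

followed_users = [row[0] for row in conn.execute(
    "SELECT u.name FROM Follows f JOIN User u ON u.id = f.followed "
    "WHERE f.follower = ? AND f.followed_is_user = 1", (1,))]
followed_tags = [row[0] for row in conn.execute(
    "SELECT h.tag FROM Follows f JOIN HashTag h ON h.id = f.followed "
    "WHERE f.follower = ? AND f.followed_is_tag = 1", (1,))]

# Unfollowing is a single-row delete; no stream records are touched.
conn.execute(
    "DELETE FROM Follows WHERE follower = ? AND followed = ? "
    "AND followed_is_user = 1", (1, 2))
remaining = conn.execute("SELECT COUNT(*) FROM Follows").fetchone()[0]
```

This relies on joins at read time, which is the part that does not carry over directly to the App Engine datastore.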