开发者

Google app engine - Users, Lists and Products a Join question on effiency

开发者 https://www.devze.com 2023-03-16 13:39 出处:网络
Let\'s take this models: User - name Product - name - category List - name - creation_date - user (refer开发者_JS百科ence)

Let's take this models:

User
  - name

Product 
  - name
  - category

List 
  - name
  - creation_date
  - user (refer开发者_JS百科ence)

Product_List
  - list ( reference)
  - product ( reference)

How can I retrieve a list of the products that remain out of the list?

  • Should I retrieve them all and then delete them programmatically ( doesn't this make the request slower? )

    1. Get all the products of a certain list of a certain user
    2. Get all the products
    3. Extract the difference ( a nested for? )

Sorry, I'm kind of newbie on this, suggestions and comments are welcome!

Thanks!


If you structure your data like this:

class Product(db.Model):
  # ...

class UserInfo(db.Model):
  # ...

class ProductList(db.Model):
  owner = db.ReferenceProperty(UserInfo)
  products = db.ListProperty(db.Key)

Then you can retrieve the products not in a list like this:

product_keys = set(Product.all(keys_only=True).fetch(1000))
product_list = ProductList.get_by_id(product_list_id)
missing_products = product_keys - set(product_list.products)

missing_products is a set of keys, which you can pass to db.get to retrieve the corresponding product entities.

This will, of course, require retrieving the entire list of products, but this is exactly what a relational database would have to do to satisfy the query too.


For this particular cleanup job, you may want to use a map reduce job to scan through your set of Product_Lists.

In the future, try to keep such situations from occurring - for example, you could make each List an entity group, and when the List is removed, also remove all associated Product_Lists in the same transaction. Alternately, eliminate Product_Lists entirely, and just use a list property in the List entity itself.

In general, the lack of joins is a sacrifice you have to make for scalability. Many joins are only efficient when the entire dataset is in local memory; as such, once you have a database too large to fit in a single server, you will always need to sacrifice joins. This may mean major changes to your schema, and/or turning some operations into batch jobs. GAE is simply forcing you to make these design changes up front, when it's easier to do so, rather than later when you're pressed for time.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号