开发者

Need a way to count entities in GAE datastore that meet a certain condition? (over 1000 entities)

开发者 https://www.devze.com 2022-12-19 18:09 出处:网络
I\'m building an app on GAE that needs to report on events occurring. An event has a type and I 开发者_运维问答also need to report by event type.

I'm building an app on GAE that needs to report on events occurring. An event has a type and I 开发者_运维问答also need to report by event type.

For example, say there is an event A, B and C. They occur periodically at random. User logs in and creates a set of entities to which those events can be attributed. When the user comes back to check the status, I need to be able to tell how many events of A, B and/or C occurred during a specific time range, say a day or a month.

The 1000 limit is throwing a wrench into how I would normally do it. I don't need to retrieve all of the entities and present them to the user, but I do need to show the total count for a specific date range. Any suggestions?

I'm a bit of python/GAE noob...


App Engine is not a relational database and you won't be able to quickly do counts on the fly like this. The best approach is to update the counts at write time, not generate them at read time.

When generating counts, there are two general approaches that scale well with App Engine to minimize write contention:

  1. Store the count in Memcache or local memory and periodically flush. This is the simplest solution, but it can be volatile and data loss is probable.
  2. Use a Sharded Counter. This approach is a bit more reliable but more complex. You won't be able to sort easily by count, but you could also periodically flush to another indexed count field periodically and sort by that.


Results of datastore count() queries and offsets for all datastore queries are no longer capped at 1000.

Since Version 1.3.6


My approach would be to have an aggregate model or models to keep track of event types, dates and counts. I'm not 100% how you should model this given your requirements, though.

Then, I'd fire off deferred tasks to asynchronously update the appropriate aggregate models whenever a user does something that triggers an event.

Nick Johnson's Background work with the deferred library article has much more information, and provides a framework that you might find useful for doing the kind of aggregation you're talking about.


Would a solution using cursors (like the one below) work for you? I personally use this method to count the number of entries in a scenario similar to yours, and haven't seen yet any problems with it (although I run on a schedule, since constant querying of the data store is pretty taxing on the CPU quota).

def count(query): 
  i = 0 
  while True: 
    result = query.fetch(1000) 
    i = i + len(result) 
    if len(result) < 1000: 
      break 
    cursor = query.cursor() 
    query.with_cursor(cursor) 
  return i 


This post is quite old, but I would like to provide a useful reference. App Engine now offers a built-in API to access datastore statistics:

For Python,

from google.appengine.ext.db import stats

global_stat = stats.GlobalStat.all().get()
print 'Total bytes stored: %d' % global_stat.bytes
print 'Total entities stored: %d' % global_stat.count

For Java,

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Query;

// ...
DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Entity globalStat = datastore.prepare(new Query("__Stat_Total__")).asSingleEntity();
Long totalBytes = (Long) globalStat.getProperty("bytes");
Long totalEntities = (Long) globalStat.getProperty("count");

It is also possible to filter entities number only for a particular kind. Take a look at this reference:

https://developers.google.com/appengine/docs/python/datastore/stats https://developers.google.com/appengine/docs/java/datastore/stats


This sounds very similar to a question that I posed on StackOverflow.

How to get the distinct value of one of my models in Google App Engine I needed to know how to get a distinct values for an entities within my models and there is going to be over 1000 entities for that model.

Hope that helps.

0

精彩评论

暂无评论...
验证码 换一张
取 消