Is it viable to have a logger entity in App Engine for writing logs? I'll have an app with ~1,500 req/sec and am thinking about doing it with a task queue. Whenever I receive a request, I would create a task and put it in a queue to write something to a log entity (with date and string properties).
I need this because I have to put statistics on the site, and I think that writing logs this way and reading them with a backend later would solve the problem. It would rock if I had programmatic access to the App Engine logs (from logging), but since that's unavailable, I don't see any other way to do it.
Feedback is very welcome.
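For concreteness, a minimal sketch of what I have in mind (the LogEntry model, the /worker/log URL, and the webapp2 handlers are placeholder names I made up, not anything prescribed by the SDK):

```python
from google.appengine.api import taskqueue
from google.appengine.ext import db
import webapp2

class LogEntry(db.Model):
    # Placeholder log entity: a date plus a string property.
    created = db.DateTimeProperty(auto_now_add=True)
    message = db.StringProperty()

class MainHandler(webapp2.RequestHandler):
    def get(self):
        # One task per incoming request; the worker writes the entity later.
        taskqueue.add(url='/worker/log',
                      params={'message': self.request.path})
        self.response.write('ok')

class LogWorker(webapp2.RequestHandler):
    def post(self):
        LogEntry(message=self.request.get('message')).put()

app = webapp2.WSGIApplication([('/', MainHandler),
                               ('/worker/log', LogWorker)])
```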
There are a few ways to do this:
- Accumulate logs and write them in a single datastore put at the end of the request. This is the highest latency option, but only slightly - datastore puts are fairly fast. This solution also consumes the least resources of all the options.
- Accumulate logs and enqueue a task queue task with them, which writes them to the datastore (or does whatever else you want with them). This is slightly faster (task queue enqueues tend to be quick), but it's slightly more complicated, and limited to 100KB of data (which hopefully shouldn't be a limitation).
- Enqueue a pull task with the data, and have a regular push task or a backend consume the queue and batch-and-insert into the datastore. This is more complicated than option 2, but also more efficient (see the sketch after this list).
- Run a backend that accumulates and writes logs, and make URLFetch calls to it to store logs. The URLFetch handler can write the data to the backend's memory and return asynchronously, making this the fastest in terms of added user latency (less than 1ms for a URLFetch call)! This will require waiting for Python 2.7, though, since you'll need multi-threading to process the log entries asynchronously.
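For illustration, here's a rough sketch of option 3, the pull-queue variant. The queue name log-queue and the LogEntry model are assumptions (the queue would also need mode: pull in queue.yaml); the point is just the batch lease-put-delete cycle:

```python
from google.appengine.api import taskqueue
from google.appengine.ext import db

class LogEntry(db.Model):
    message = db.StringProperty()

def enqueue_log(message):
    # Producer, called at request time: store the log line as a pull task.
    taskqueue.Queue('log-queue').add(
        taskqueue.Task(payload=message, method='PULL'))

def drain_logs():
    # Consumer, run from a push task or a backend: lease a batch of
    # tasks, write them with a single datastore put, then delete them.
    queue = taskqueue.Queue('log-queue')
    tasks = queue.lease_tasks(lease_seconds=60, max_tasks=100)
    if tasks:
        db.put([LogEntry(message=t.payload) for t in tasks])
        queue.delete_tasks(tasks)
```

The batched db.put is what makes this cheaper than one datastore write per request.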
You might also want to take a look at the Prospective Search API, which may allow you to do some filtering and pre-processing on the log data.
How about keeping a memcache data structure of request info (recorded as requests arrive), then running a cron job every 5 minutes (or faster) that crunches the stats on the last 5 minutes of requests from memcache and records those stats in the datastore for that 5-minute interval? The same (or a different) cron job could then clear the memcache so it doesn't grow too big.
Then you can run big-picture analysis on the aggregate of the 5-minute interval stats, which might be more manageable than analyzing hours of raw 1,500 req/s data.
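A minimal sketch of that idea, assuming a plain per-interval request counter (the key scheme and the StatsInterval model are made up for illustration; note memcache is volatile, so an evicted counter means lost samples):

```python
import datetime

from google.appengine.api import memcache
from google.appengine.ext import db

class StatsInterval(db.Model):
    # Hypothetical aggregate entity, one per 5-minute window.
    start = db.DateTimeProperty()
    request_count = db.IntegerProperty()

def bucket_start(now=None):
    # Floor the current time to its 5-minute boundary.
    now = now or datetime.datetime.utcnow()
    return now.replace(minute=now.minute - now.minute % 5,
                       second=0, microsecond=0)

def record_request():
    # Called on every request; incr() with initial_value creates the
    # counter atomically if it doesn't exist yet.
    memcache.incr('reqcount:' + bucket_start().isoformat(),
                  initial_value=0)

def crunch_stats():
    # Cron handler: persist the *previous* interval's count, then
    # delete its counter so memcache doesn't accumulate stale keys.
    prev = bucket_start() - datetime.timedelta(minutes=5)
    key = 'reqcount:' + prev.isoformat()
    count = memcache.get(key)
    if count is not None:
        StatsInterval(start=prev, request_count=int(count)).put()
        memcache.delete(key)
```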