How to find size of db.Model instances in GAE Python before calling db.put()?

Source: https://www.devze.com, 2023-02-14 05:11 (reposted from the web)
I'm writing an optimizer for my application so that db.put() is invoked as rarely as possible. I'm stuck on the following problem:

I have a number of classes derived from db.Model. The instances of those classes are stored in a list:

from google.appengine.ext import db

class DBPutter(object):
    def __init__(self):
        self.data = []  # pending model instances, one list per putter

    def add(self, model):
        # HERE I WANT TO CHECK THAT self.data IS NOT EXCEEDING 1 MB
        self.data.append(model)
        if len(self.data) >= 1000:
            self.flush()  # actual call to db.put() using deferred

With this approach I receive a lot of RequestTooLargeError exceptions. How do I check that my data is not exceeding 1 MB?


Pympler provides asizeof, which should run on Python 2.5: http://code.google.com/p/pympler/

I think you're over-optimizing, though. If an instance is shut down before 1000 objects are in your putter, you could lose data. Also, I think using the deferred library with a large amount of data would result in at least two db.put() calls: one when the task is submitted (because the payload is over 10k), and one inside the task that actually writes your models.


As per the 1.4.0 release notes:

  • Size and quantity limits on datastore batch get/put/delete operations have been removed. Individual entities are still limited to 1 MB, but your app may batch as many entities together for get/put/delete calls as the overall datastore deadline will allow for.

That said, using deferred for this is pointless: Task Queue payloads are limited to 10k, and if your deferred payload is bigger than that, it will create a datastore entity to store the payload in. As a result, it's doing a datastore operation anyway, so you may as well do it yourself.
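To get a feel for whether a batch would cross that threshold, one rough approach is to pickle it and compare the length against the limit. This is only a sketch: payload_fits_in_task and TASK_PAYLOAD_LIMIT are hypothetical names, the 10k figure is simply the one quoted above, and the bytes deferred actually sends may differ somewhat from a plain pickle of the data.

```python
import pickle

# The 10k figure quoted in the answer above; an assumption, check your SDK.
TASK_PAYLOAD_LIMIT = 10 * 1024


def payload_fits_in_task(obj, limit=TASK_PAYLOAD_LIMIT):
    """Return True if the pickled form of obj is at most `limit` bytes.

    Hypothetical helper: it only approximates what a task payload
    would weigh, since the real payload also carries the callable
    and its arguments.
    """
    return len(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)) <= limit
```

If this returns False for a typical batch, the deferred call is likely to fall back to storing its payload in the datastore, which is the extra write described above.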

If you're storing thousands of entities, though, you almost certainly want to be doing the whole process on the task queue in the first place, rather than in an interactive request.


I don't work with GAE, but you could try to call sys.getsizeof on each of your models and verify that the sum is less than 1 MB.

Edit: See this ActiveState recipe for an alternative to sys.getsizeof, which should work in Python 2.5.
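The "sum the sizes and stay under 1 MB" idea can be sketched as a buffer that flushes before a byte budget is exceeded. SizeAwareBuffer and its parameters are hypothetical names, not part of any SDK. The size estimator is passed in because sys.getsizeof only exists from Python 2.6 onward and reports a shallow in-memory size rather than the serialized size. On App Engine, something like len(db.model_to_protobuf(model).Encode()) may be a closer estimate, but treat that as an assumption to verify against your SDK version.

```python
ONE_MB = 1024 * 1024  # budget taken from the question; adjust as needed


class SizeAwareBuffer(object):
    """Hypothetical helper: buffers items, flushing before a byte budget is exceeded."""

    def __init__(self, flush_fn, size_of, max_bytes=ONE_MB, max_items=1000):
        self.flush_fn = flush_fn    # called with a list of items, e.g. db.put
        self.size_of = size_of      # caller-supplied size estimator (an assumption)
        self.max_bytes = max_bytes
        self.max_items = max_items
        self.items = []
        self.total = 0

    def add(self, item):
        size = self.size_of(item)
        # Flush first if this item would push the current batch over budget.
        if self.items and self.total + size > self.max_bytes:
            self.flush()
        self.items.append(item)
        self.total += size
        if len(self.items) >= self.max_items:
            self.flush()

    def flush(self):
        if self.items:
            batch, self.items, self.total = self.items, [], 0
            self.flush_fn(batch)
```

A single item larger than the budget is still passed through in a batch of its own, so the per-entity limit has to be handled separately. Remember to call flush() once more at the end of the request so that buffered items are not lost.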
