Currently my application caches models in memcache like this:
memcache.set("somekey", aModel)
But Nicks' post at http://blog.notdot.net/2009/9/Efficient-model-memcaching suggests that first converting it to protobuffers is a lot more efficient. But after running some tests I found out it's indeed smaller in size, but actually slower (~10%).
Do others have the same experience or am I doing something wrong?
Test results: http://1.latest.sofatest.appspot.com/?times=1000
import pickle
import time
import uuid
from google.appengine.ext import webapp
from google.appengine.ext import db
from google.appengine.ext.webapp import util
from google.appengine.datastore import entity_pb
from google.appengine.api import memcache
class Person(db.Model):
name = db.StringProperty()
times = 10000
class MainHandler(webapp.RequestHandler):
def get(self):
self.response.headers['Content-Type'] = 'text/plain'
m = Person(name='Koen Bok')
t1 = time.time()
for i in xrange(int(self.request.get('times', 1))):
key = uuid.uuid4().hex
memcache.set(key, m)
r = memcache.get(key)
self.response.out.write('Pickle took: %.2f' % (time.time() - t1开发者_开发问答))
t1 = time.time()
for i in xrange(int(self.request.get('times', 1))):
key = uuid.uuid4().hex
memcache.set(key, db.model_to_protobuf(m).Encode())
r = db.model_from_protobuf(entity_pb.EntityProto(memcache.get(key)))
self.response.out.write('Proto took: %.2f' % (time.time() - t1))
def main():
application = webapp.WSGIApplication([('/', MainHandler)], debug=True)
util.run_wsgi_app(application)
if __name__ == '__main__':
main()
The Memcache call still pickles the object with or without using protobuf. Pickle is faster with a protobuf object since it has a very simple model
Plain pickle objects are larger than protobuf+pickle objects, hence they save time on Memcache, but there is more processor time in doing the protobuf conversion
Therefore in general either method works out about the same...but
The reason you should use protobuf is it can handle changes between versions of the models, whereas Pickle will error. This problem will bite you one day, so best to handle it sooner
Both pickle and protobufs are slow in App Engine since they're implemented in pure Python. I've found that writing my own, simple serialization code using methods like str.join tends to be faster since most of the work is done in C. But that only works for simple datatypes.
One way to do it more quickly is to turn your model into a dictionary and use the native eval / repr function as your (de)serializers -- with caution of course, as always with the evil eval, but it should be safe here given that there is no external step.
Below an example of a class Fake_entity implementing exactly that.
You first create your dictionary through fake = Fake_entity(entity)
then you can simply store your data via memcache.set(key, fake.serialize())
. The serialize() is a simple call to the native dictionary method of repr, with some additions if you need (e.g. add an identifier at the beginning of the string).
To fetch it back, simply use fake = Fake_entity(memcache.get(key))
. The Fake_entity object is a simple dictionary whose keys are also accessible as attributes. You can access your entity properties normally, except referenceProperties give keys instead of fetching the object (which is actually quite useful). You can also get() the actual entity with fake.get(), or more interestigly, change it and then save with fake.put().
It does not work with lists (if you fetch multiple entities from a query), but could be easily be adjusted with join/split functions using an identifier like '### FAKE MODEL ENTITY ###' as the separator. Use with db.Model only, would need small adjustments for Expando.
class Fake_entity(dict):
def __init__(self, record):
# simple case: a string, we eval it to rebuild our fake entity
if isinstance(record, basestring):
import datetime # <----- put all relevant eval imports here
from google.appengine.api import datastore_types
self.update( eval(record) ) # careful with external sources, eval is evil
return None
# serious case: we build the instance from the actual entity
for prop_name, prop_ref in record.__class__.properties().items():
self[prop_name] = prop_ref.get_value_for_datastore(record) # to avoid fetching entities
self['_cls'] = record.__class__.__module__ + '.' + record.__class__.__name__
try:
self['key'] = str(record.key())
except Exception: # the key may not exist if the entity has not been stored
pass
def __getattr__(self, k):
return self[k]
def __setattr__(self, k, v):
self[k] = v
def key(self):
from google.appengine.ext import db
return db.Key(self['key'])
def get(self):
from google.appengine.ext import db
return db.get(self['key'])
def put(self):
_cls = self.pop('_cls') # gets and removes the class name form the passed arguments
# import xxxxxxx ---> put your model imports here if necessary
Cls = eval(_cls) # make sure that your models declarations are in the scope here
real_entity = Cls(**self) # creates the entity
real_entity.put() # self explanatory
self['_cls'] = _cls # puts back the class name afterwards
return real_entity
def serialize(self):
return '### FAKE MODEL ENTITY ###\n' + repr(self)
# or simply repr, but I use the initial identifier to test and eval directly when getting from memcache
I would welcome speed tests on this, I would assume this is quite faster than the other approaches. Plus, you do not have any risks if your models have changed somehow in the meantime.
Below an example of what the serialized fake entity looks like. Take a particular look at datetime (created) as well as reference properties (subdomain) :
### FAKE MODEL ENTITY ###{'status': u'admin', 'session_expiry': None, 'first_name': u'Louis', 'last_name': u'Le Sieur', 'modified_by': None, 'password_hash': u'a9993e364706816aba3e25717000000000000000', 'language': u'fr', 'created': datetime.datetime(2010, 7, 18, 21, 50, 11, 750000), 'modified': None, 'created_by': None, 'email': u'chou@glou.bou', 'key': 'agdqZXJlZ2xlcgwLEgVMb2dpbhjmAQw', 'session_ref': None, '_cls': 'models.Login', 'groups': [], 'email___password_hash': u'chou@glou.bou+a9993e364706816aba3e25717000000000000000', 'subdomain': datastore_types.Key.from_path(u'Subdomain', 229L, _app=u'jeregle'), 'permitted': [], 'permissions': []}
Personally I also use static variables (faster than memcache) to cache my entities in the short term, and fetch the datastore when the server has changed or its memory has been flushed for some reason (which happens quite often in fact).
精彩评论