I have a datastore with around 1,000,000 entities in a model. I want to fetch 10 random entities from this.
I am not sure how to do this. Can someone help?
Assign each entity a random number and store it in the entity. Then query for ten records whose random number is greater than (or less than) some other random number.
This isn't totally random, however, since entities with nearby random numbers will tend to show up together. If you want to beat this, do ten queries based around ten random numbers, but this will be less efficient.
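Here is a rough sketch of both variants, using a sorted in-memory list as a stand-in for the datastore so it can run anywhere. The names build_store, fetch_random_batch and fetch_random_independent are made up for illustration, and the wrap-around when the query runs off the end of the range is my own assumption, not something described above.

```python
import bisect
import random


def build_store(n, rng):
    """Simulate a datastore index: (rand_num, entity_id) pairs sorted by rand_num."""
    return sorted((rng.random(), entity_id) for entity_id in range(n))


def fetch_random_batch(store, k, rng):
    """Emulate one query: rand_num >= r, ordered by rand_num, limit k.

    Wraps around to the start of the range if fewer than k rows remain
    (an assumption of this sketch).
    """
    k = min(k, len(store))
    start = bisect.bisect_left(store, (rng.random(),))
    batch = store[start:start + k]
    if len(batch) < k:
        batch += store[:k - len(batch)]
    return [entity_id for _, entity_id in batch]


def fetch_random_independent(store, k, rng):
    """Emulate k separate queries, each with its own random number and limit 1."""
    k = min(k, len(store))
    picked = set()
    while len(picked) < k:
        start = bisect.bisect_left(store, (rng.random(),))
        # Wrap to the first row if the random number is past the last row.
        _, entity_id = store[start % len(store)]
        picked.add(entity_id)
    return list(picked)
```

The single-query version returns ten neighbours in the rand_num ordering, which is why the results tend to show up together. The ten-query version spreads them out at the cost of ten round trips.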
Jason Hall's answer and the one here aren't horrible, but as he mentions, they are not really random either. Even doing ten queries will not be random if, for example, the random numbers are all grouped together. To keep things truly random, here are two possible solutions:
Solution 1
Assign an index to each datastore object, keep track of the maximum index, and randomly select an index every time you want to get a random record:
MyObject.all().filter('index =', random.randrange(0, maxindex + 1)).get()
Upside: Truly random. Fast.
Downside: You have to properly maintain indices when adding and deleting objects, which can make both operations O(N).
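A minimal in-memory sketch of Solution 1, assuming a hypothetical IndexedStore class in place of the real model. The delete step here keeps the index range dense by moving the highest-indexed entity into the freed slot; that trick is my assumption and not part of the answer, and in a real datastore it would also need a transaction and a shared counter.

```python
import random


class IndexedStore:
    """In-memory stand-in for a model whose entities carry a dense `index`."""

    def __init__(self):
        self.by_index = {}  # index -> entity payload
        self.count = 0      # valid indices are 0 .. count - 1

    def add(self, entity):
        # New entities always take the next free index.
        self.by_index[self.count] = entity
        self.count += 1

    def delete(self, index):
        last = self.count - 1
        if not 0 <= index <= last:
            raise IndexError("no entity at index %r" % index)
        # Move the last entity into the gap so no index is left unused.
        self.by_index[index] = self.by_index[last]
        del self.by_index[last]
        self.count -= 1

    def random_entity(self, rng=random):
        # Equivalent to filtering on index = randrange(0, maxindex + 1).
        return self.by_index[rng.randrange(self.count)]

    def random_sample(self, k, rng=random):
        # k distinct indices, then one lookup per index.
        indices = rng.sample(range(self.count), k)
        return [self.by_index[i] for i in indices]
```

Usage would look like store.random_sample(10) to get ten distinct entities. Without the move-last-into-gap step, a delete leaves a hole and every higher index has to be shifted down, which is where the O(N) cost comes from.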
Solution 2
Assign a random number to each datastore entity when it is created. Then, to get a random record the first time, query for a record with a random number greater than or equal to some other random number, ordered by the random numbers (i.e. MyObject.all().order('rand_num').filter('rand_num >=', random.random())). Then save that query's cursor in memcache. To get a random record after the first time, load the cursor from memcache and go to the next item. If there is no next item, run the query again.
To prevent the sequence of objects from repeating, on every datastore read, give the entity you just read a new random number and save it back to the datastore.
Upside: Truly random. No complex indices to maintain.
Downside: Need to keep track of a cursor. Need to do a put every time you get a random record.
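To make the cursor walk in Solution 2 concrete, here is a rough in-memory simulation. RandomCursorStore, its rows list and its memcache dict are all hypothetical stand-ins (a sorted list for the rand_num index, a plain dict for memcache); none of this is real App Engine API.

```python
import bisect
import random


class RandomCursorStore:
    """Simulates entities ordered by rand_num, plus a cursor kept in a cache."""

    def __init__(self, entity_ids, rng):
        self.rng = rng
        # Stand-in for the rand_num index: (rand_num, entity_id), sorted.
        self.rows = sorted((rng.random(), eid) for eid in entity_ids)
        # Stand-in for memcache; holds the last row that was read.
        self.memcache = {}

    def next_random(self):
        if not self.rows:
            raise LookupError("store is empty")
        cursor = self.memcache.get("cursor")
        if cursor is None:
            pos = len(self.rows)  # no cursor yet, force a fresh query
        else:
            pos = bisect.bisect_right(self.rows, cursor)  # first row after cursor
        while pos >= len(self.rows):
            # Nothing after the cursor: run the query again with a new number.
            pos = bisect.bisect_left(self.rows, (self.rng.random(),))
        rand_num, eid = self.rows.pop(pos)
        self.memcache["cursor"] = (rand_num, eid)
        # Give the entity a new random number and "put" it back.
        bisect.insort(self.rows, (self.rng.random(), eid))
        return eid
```

Calling next_random() ten times gives ten records. Because each entity is re-randomized after it is read, it can come up again within those ten, so filter duplicates if you need ten distinct entities.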