I've got a situation where I want to make 1000 different queries to the datastore, do some calculations on the results of each individual query (to get 1000 separate results), and return the list of results.
I would like the list of results to be returned as the response from the same 30-second user request that started the calculation, for better client-side performance. Hah!
I have a bold plan.
Each of these operations individually will usually have no problem finishing in under a second, none of them need to write to the same entity group as any other, and none of them need any information from any of the other queries. Might it be possible to start 1000 independent tasks, each taking on one of these queries, doing its calculations, and storing the result in some sort of temporary collection of entities? The original request could wait 10 seconds, and then do a single query for the results from the datastore (maybe they all set a unique value I can query on). Any results that aren't in yet would be noticed at the client end, and the client could just ask for those values again in another ten seconds.
The questions I hope experienced appengineers can answer are:
- Is this ludicrous? If so, is it ludicrous for any number of tasks? Would 50 at once be reasonable?
- I won't run into datastore contention if I'm reading the same entity 20 times a second, right? That contention stuff is all for writing?
- Is there an easier way to get a response from a task?
Yep, sounds pretty ludicrous :)
You shouldn't rely on the Task Queue to operate like that. You can't count on 1000 tasks being spawned and executed that quickly (although they most likely will be).
Why not use the Channel API to wait for your response? Your solution then becomes:
- Client sends a request to Server
- Server spawns N tasks to do your calculations and responds to Client with a Channel API token
- Client listens to the Channel using token
- Once all the tasks are finished Server pushes response to Client via the Channel
This would avoid the timeout issues that would very likely arise from time to time when tasks don't execute as fast as you'd like, or for some other reason.
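To make the "push once everything is finished" step concrete, here is a rough, runnable sketch of the fan-in logic using only the Python standard library. It is not App Engine code: threads stand in for task queue tasks, a `queue.Queue` stands in for the channel, and `run_calculation`, `FanInTracker` and `handle_request` are hypothetical names made up for illustration. On App Engine the push would presumably be a `channel.send_message` call, and the completion count would have to live somewhere shared such as the datastore or memcache.

```python
import queue
import threading


def run_calculation(task_id):
    # Hypothetical stand-in for one datastore query plus its calculation.
    return task_id * task_id


class FanInTracker(object):
    """Collects task results and pushes them exactly once, when all are in."""

    def __init__(self, expected, channel):
        self._expected = expected
        self._channel = channel
        self._results = {}
        self._pushed = False
        self._lock = threading.Lock()

    def task_done(self, task_id, result):
        with self._lock:
            # Keyed by task_id, so a retried task does not count twice.
            self._results[task_id] = result
            ready = (not self._pushed
                     and len(self._results) == self._expected)
            if ready:
                self._pushed = True
                snapshot = dict(self._results)
        if ready:
            # Stand-in for pushing the response to the client's channel.
            self._channel.put(snapshot)


def handle_request(n_tasks):
    """Spawns the tasks and returns right away with the 'channel'."""
    channel = queue.Queue()
    tracker = FanInTracker(n_tasks, channel)

    def worker(task_id):
        tracker.task_done(task_id, run_calculation(task_id))

    for task_id in range(n_tasks):
        threading.Thread(target=worker, args=(task_id,)).start()
    return channel


# Client side: listen on the channel instead of polling the datastore.
channel = handle_request(50)
results = channel.get(timeout=5)
print(len(results))
```

The point of the design is that the original request never has to guess how long the tasks will take: it returns immediately, and the client hears back whenever the last task reports in.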
The Task Queue doesn't provide firm guarantees on when a task will execute - the ETA (which defaults to the current time) is the earliest time at which it will execute, but if the queue is backed up, or there are no instances available to execute the task, it could execute much later.
One option would be to use Datastore Plus / NDB, which allows you to execute queries in parallel. 1000 queries is going to be very expensive, however, no matter how you execute them.
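As a rough illustration of the parallel-query idea, the sketch below starts every query first and only then collects the results. It uses standard-library threads so that it runs anywhere; it is not NDB code. With NDB the same shape would, as far as I know, come from starting each query with `fetch_async()` and later calling `get_result()` on the returned futures. `run_query`, `calculate` and `run_all` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(i):
    # Hypothetical stand-in for one datastore query; returns fake rows.
    return [i, i + 1, i + 2]


def calculate(rows):
    # Hypothetical stand-in for the per-query calculation.
    return sum(rows)


def run_all(n_queries, max_workers=20):
    """Starts all queries up front, then gathers results in query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything before waiting on anything, so queries overlap.
        futures = [pool.submit(run_query, i) for i in range(n_queries)]
        return [calculate(f.result()) for f in futures]


results = run_all(1000)
print(results[:3])  # [3, 6, 9]
```

Running the queries in parallel only cuts the wall-clock time; the total cost of 1000 queries stays the same.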
Another option, as @Chris suggests, is to use the task queue with the Channel API, so you can notify the user asynchronously when the queries complete.