Task queue processing in Python

The task is: I have a task queue stored in a DB, and it grows. I need to solve tasks with a Python script when I have the resources for it. I see two ways:

  1. A Python script that runs all the time. But I don't like it (risk of a memory leak).

  2. A Python script called by cron that does a small part of the task each run. But I need to ensure only one active script runs at a time (to prevent the number of active scripts from growing). What is the best way to implement this in Python?

Any ideas for solving this problem at all?


You can use a lockfile to prevent multiple scripts from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is good practice in general for anything that must not have multiple instances running, so you should look into it even if you do have the script running constantly (which I do suggest).
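A minimal sketch of that approach, assuming a hypothetical lockfile path (and ignoring the small race between checking and writing the file):

```python
import atexit
import os
import sys

LOCKFILE = "/tmp/task_worker.pid"  # hypothetical path; pick one your cron user can write to

def acquire_lock():
    """Exit quietly if another live instance holds the lock; otherwise claim it."""
    if os.path.exists(LOCKFILE):
        with open(LOCKFILE) as f:
            pid = int(f.read().strip())
        try:
            os.kill(pid, 0)  # signal 0 checks that the process exists without touching it
        except ProcessLookupError:
            pass  # stale lockfile left by a dead process; reclaim it below
        else:
            sys.exit(0)  # another instance is still running
    with open(LOCKFILE, "w") as f:
        f.write(str(os.getpid()))
    atexit.register(os.remove, LOCKFILE)  # remove the lockfile on normal exit

if __name__ == "__main__":
    acquire_lock()
    # ... process one batch of tasks from the queue ...
```

For anything serious you would want an atomic create (e.g. os.open with O_CREAT | O_EXCL) rather than the check-then-write above; that is the sort of detail the lockfile modules in the linked question handle for you.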

For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.

When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.

This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
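A rough sketch of that design; fetch_pending_tasks and run_task here are hypothetical placeholders for your own DB query and task logic:

```python
import time
from multiprocessing import Process

def fetch_pending_tasks():
    """Hypothetical: query the DB for unclaimed tasks and mark them claimed."""
    return []

def run_task(task):
    """Runs in a child process, so any memory it leaks is freed when the child exits."""
    pass  # replace with the real work

def main_loop(poll_seconds=5, max_workers=4):
    workers = []
    while True:
        workers = [p for p in workers if p.is_alive()]  # reap finished children
        for task in fetch_pending_tasks():
            if len(workers) >= max_workers:
                break  # at capacity; the rest stay queued for the next pass
            p = Process(target=run_task, args=(task,))
            p.start()
            workers.append(p)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    main_loop()
```

The loop itself allocates almost nothing per iteration, and the max_workers cap is one simple way to express the "when I have resources for it" condition from the question.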


This is a bit of a vague question. One thing to remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. Cron-ing a Python script to handle the queue isn't very elegant, although it would work fine.

I would use method 1; if you need more power, you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.


I'd suggest using Celery, an asynchronous task queuing system which I use myself.

It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.
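For reference, a minimal Celery setup looks roughly like this (assuming a Redis broker on localhost; the module name tasks.py and the task solve are made up for illustration):

```python
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def solve(task_id):
    """Replace with the real work for one queued task."""
    ...
```

You start workers with `celery -A tasks worker --concurrency=4` and enqueue work from any process with `solve.delay(some_id)`; Celery then takes care of dispatch, retries, and scaling out to more worker machines later.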
