I would like to use Celery with RabbitMQ as a fault-tolerant scheduler in a distributed environment. By fault-tolerant, I mean that if a task is given to a worker and that worker goes down for whatever reason, Celery should be able to reschedule it on another server. How can I achieve this in an environment with multiple worker nodes?
Probably all you need is to set CELERY_ACKS_LATE.
Late ack means the task messages are acknowledged after the task has been executed, not just before, which is the default behaviour. This way, if the worker crashes, RabbitMQ still has the message and can deliver it to another worker.
More info here:
Retry Lost or Failed Tasks (Celery, Django and RabbitMQ)
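For illustration, here is a minimal sketch of what such a settings module might look like. The file name, broker URL and credentials are placeholders I made up, and the prefetch setting is my own addition, not something the answer above requires. It uses the old uppercase setting names to match CELERY_ACKS_LATE; as far as I know, Celery 4 and later spell these `task_acks_late` and `worker_prefetch_multiplier`, and the two naming styles should not be mixed in one config.

```python
# celeryconfig.py (hypothetical file name)

# Placeholder broker URL; point this at your own RabbitMQ instance.
BROKER_URL = "amqp://guest:guest@localhost:5672//"

# Acknowledge a task message only after the task has finished, so a
# message held by a crashed worker stays on the broker for redelivery.
CELERY_ACKS_LATE = True

# Assumption: reserve one message at a time per worker process, so a
# crashed worker is holding as few unacknowledged tasks as possible.
CELERYD_PREFETCH_MULTIPLIER = 1
```

Keep in mind that with late acks a task may run more than once (for example, if the worker dies after finishing the work but before acknowledging), so the tasks themselves should be safe to repeat.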
Have each of the workers consume from the same queue, and RabbitMQ will round-robin the messages to the workers (consumers). If any one of them fails while processing a job, before it has had a chance to send its acknowledgment, the message will be automatically placed back on the queue and the next worker will pick it up. This is an "at least once" delivery pattern.
This link from the RabbitMQ site explains the pattern and includes Python sample code.
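To make the redelivery behaviour concrete, here is a toy in-memory simulation of the pattern. It does not talk to RabbitMQ or use any RabbitMQ client library; the `Broker` class, `run` function and worker names are all hypothetical stand-ins I wrote just to show the flow of deliver, crash, requeue, redeliver.

```python
from collections import deque


class Broker:
    """Toy stand-in for a queue with manual acknowledgments."""

    def __init__(self):
        self.ready = deque()   # messages waiting for a consumer
        self.unacked = {}      # delivery tag -> delivered but unacknowledged message
        self._next_tag = 0

    def publish(self, message):
        self.ready.append(message)

    def deliver(self):
        """Hand the next ready message to a consumer and track it as unacked."""
        message = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """The consumer finished the job; forget the message."""
        del self.unacked[tag]

    def consumer_died(self, tag):
        """The consumer vanished before acking; put the message back in front."""
        self.ready.appendleft(self.unacked.pop(tag))


def run(broker, workers):
    """Round-robin messages over workers; return (worker, message) pairs that completed."""
    completed = []
    turn = 0
    while broker.ready:
        name, handler = workers[turn % len(workers)]
        turn += 1
        tag, message = broker.deliver()
        try:
            handler(message)
        except RuntimeError:
            # Simulated crash: no ack was sent, so the message is requeued.
            broker.consumer_died(tag)
            continue
        broker.ack(tag)
        completed.append((name, message))
    return completed


def flaky(message):
    raise RuntimeError("worker crashed mid-task")


def healthy(message):
    pass


broker = Broker()
for job in ["job-1", "job-2"]:
    broker.publish(job)

completed = run(broker, [("worker-a", flaky), ("worker-b", healthy)])
print(completed)
```

Every job first lands on the crashing worker or gets requeued from it, and ends up completed by the healthy one. The same job can be started more than once, which is exactly what "at least once" means.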