I have a multistep process where each step does some network I/O (a web service call) and then persists some data. I want to design it to be fault tolerant, so that if the service fails, either because of a system crash or because one of the steps fails, I can recover and restart from the last error-free step.
Here is how I am thinking of addressing this (this is pretty high level):
- Store the state of each step (NOT_STARTED, IN_PROGRESS, FAILED) in a database table
- If a step fails, mark it and its dependent steps as FAILED and move on to the next non-dependent step
- Recover by reading this table (e.g., in a bootstrap portion of the application)
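A minimal sketch of the table-driven approach described in the list above, using Python's built-in `sqlite3` with an in-memory database. The table name `step_state`, the helper functions, and the extra `COMPLETED` status (needed to know which steps can be skipped on restart) are assumptions for illustration, not part of any framework.

```python
import sqlite3

# Step statuses from the question, plus an assumed COMPLETED status.
NOT_STARTED, IN_PROGRESS, COMPLETED, FAILED = (
    "NOT_STARTED", "IN_PROGRESS", "COMPLETED", "FAILED")

def init_db(conn, step_names):
    """Create the (hypothetical) state table; keep rows left by an earlier run."""
    conn.execute("CREATE TABLE IF NOT EXISTS step_state "
                 "(name TEXT PRIMARY KEY, status TEXT NOT NULL)")
    with conn:  # commit the inserts
        conn.executemany(
            "INSERT OR IGNORE INTO step_state (name, status) VALUES (?, ?)",
            [(n, NOT_STARTED) for n in step_names])

def set_status(conn, name, status):
    with conn:  # each status change is committed immediately
        conn.execute("UPDATE step_state SET status = ? WHERE name = ?",
                     (status, name))

def get_status(conn, name):
    return conn.execute("SELECT status FROM step_state WHERE name = ?",
                        (name,)).fetchone()[0]

def run_pipeline(conn, steps):
    """steps: list of (name, func, depends_on) in execution order."""
    for name, func, depends_on in steps:
        if get_status(conn, name) == COMPLETED:
            continue  # finished in an earlier run: skip it on recovery
        if any(get_status(conn, dep) != COMPLETED for dep in depends_on):
            set_status(conn, name, FAILED)  # a dependency failed: mark dependent FAILED
            continue                        # and move on to the next step
        set_status(conn, name, IN_PROGRESS)
        try:
            func()  # the network call + persistence for this step
        except Exception:
            set_status(conn, name, FAILED)
            continue
        set_status(conn, name, COMPLETED)
```

Calling `run_pipeline` again after a crash or a failure skips steps marked `COMPLETED` and re-runs anything left `FAILED` or `IN_PROGRESS`. Because a crash can happen after the web service call but before the status update, each step should be idempotent (safe to repeat).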
I was wondering whether there are design patterns, frameworks, or algorithms that address this problem.
This is a nice paper, "Design patterns for checkpoint based recovery", that addresses the problem.
You might consider the Chain of Responsibility design pattern: http://en.wikipedia.org/wiki/Chain-of-responsibility_pattern
Memento (GoF) could be used to store the state before a potentially failing call.
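A minimal sketch of Memento in Python, assuming the state worth protecting is an in-memory dictionary. `WorkflowContext` (the originator), `Memento`, and `call_with_snapshot` (acting as the caretaker) are hypothetical names.

```python
import copy
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Memento:
    """Opaque snapshot of the originator's state."""
    _state: dict

@dataclass
class WorkflowContext:
    """Originator: hypothetical in-memory state built up by the steps."""
    data: dict = field(default_factory=dict)

    def save(self) -> Memento:
        # Deep copy so later mutations don't leak into the snapshot.
        return Memento(copy.deepcopy(self.data))

    def restore(self, memento: Memento) -> None:
        self.data = copy.deepcopy(memento._state)

def call_with_snapshot(ctx, risky_call):
    """Caretaker: snapshot ctx, run the call, restore the snapshot if it raises."""
    snapshot = ctx.save()
    try:
        risky_call(ctx)
        return True
    except Exception:
        ctx.restore(snapshot)
        return False
```

A memento only restores in-process state; it cannot undo a web service call that already reached the remote side or a row already committed to the database.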
One good keyword to search for is transactions. A transaction lets you roll back the changes made before a failure to the nearest "stable" state, and this is something your database already provides.
The Command pattern is also commonly used to implement transactions, since each command can be executed and later undone.
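A minimal sketch of Command objects with `execute`/`undo`, where a failed step triggers undo of the already-completed steps in reverse order. This gives compensating actions rather than a true database rollback. `AppendCommand` and `run_transactionally` are hypothetical names; `AppendCommand` stands in for a real step.

```python
from abc import ABC, abstractmethod

class Command(ABC):
    @abstractmethod
    def execute(self) -> None: ...

    @abstractmethod
    def undo(self) -> None: ...

class AppendCommand(Command):
    """Hypothetical step: appends an item to a shared list."""
    def __init__(self, log, item, fail=False):
        self.log, self.item, self.fail = log, item, fail

    def execute(self):
        if self.fail:
            raise RuntimeError(f"step {self.item!r} failed")
        self.log.append(self.item)

    def undo(self):
        self.log.remove(self.item)

def run_transactionally(commands):
    """Execute commands in order; on failure, undo completed ones in reverse."""
    done = []
    try:
        for cmd in commands:
            cmd.execute()
            done.append(cmd)
    except Exception:
        for cmd in reversed(done):
            cmd.undo()
        raise  # let the caller record the failure
```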
In terms of behavioral design patterns, I would recommend looking into the following, as they seem well suited to your needs. Keep in mind that this list is based on a very high-level understanding of your implementation.
- Template method - For defining program skeletons
- Strategy - For swapping algorithms as needed
- Memento - For restoring objects to their previous states
- State - Often coupled with the Memento pattern
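A minimal sketch combining two of the patterns listed above, under assumed names: `Step.run` is a Template Method that fixes the skeleton (call the service, then persist), while subclasses supply the two hooks. The lifecycle is tracked with a plain enum, a lightweight stand-in for the State pattern rather than a GoF-style implementation with one class per state. `FetchPrices` and its `service`/`store` arguments are hypothetical.

```python
from abc import ABC, abstractmethod
from enum import Enum

class StepState(Enum):
    NOT_STARTED = "NOT_STARTED"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

class Step(ABC):
    """Template Method: run() fixes the skeleton; subclasses fill in the hooks."""
    def __init__(self):
        self.state = StepState.NOT_STARTED

    def run(self):
        if self.state is StepState.COMPLETED:
            return  # already done: nothing to redo
        self.state = StepState.IN_PROGRESS
        try:
            response = self.call_service()
            self.persist(response)
        except Exception:
            self.state = StepState.FAILED
            raise
        self.state = StepState.COMPLETED

    @abstractmethod
    def call_service(self): ...

    @abstractmethod
    def persist(self, response): ...

class FetchPrices(Step):
    """Hypothetical concrete step; `service` and `store` are stand-ins."""
    def __init__(self, service, store):
        super().__init__()
        self.service, self.store = service, store

    def call_service(self):
        return self.service()

    def persist(self, response):
        self.store["prices"] = response
```

Keeping the skeleton in one place means every step gets the same status bookkeeping, and a new step only has to implement `call_service` and `persist`.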
If you're not already familiar, I would STRONGLY recommend looking up the Model-View-Controller and Model-View-Presenter patterns, as they will make your development experience much more enjoyable.
If you have any followup questions, feel free to ask. :)