What would be the best way to handle lightweight crash recovery for my program?
I have a Python program that runs a number of test cases and the results are st开发者_运维知识库ored in a dictionary which serves as a cache. If I could save (and then restore) each item that is added to the dictionary, I could simply run the program again and the caching would provide suitable crash recovery.
- You may assume that the keys and values in the dictionary are easily convertible to strings ie. using either str or the pickle module.
- I want this to be completely cross platform - well at least as cross platform as Python is
- I don't want to simply write out each value to a file and load it in my program might crash while I am writing the file
- UPDATE: This is intended to be a lightweight module so a DBMS is out of the question.
- UPDATE: Alex is correct in that I don't actually need to protect against crashes while writing out, but there are circumstances where I would like to be able to manually terminate it in a recoverable state.
- UPDATE Added a highly limited solution using standard input below
There's no good way to guard against "your program crashing while writing a checkpoint to a file", but why should you worry so much about that?! What ELSE is your program doing at that time BESIDES "saving checkpoint to a file", that could easily cause it to crash?!
It's hard to beat pickle
(or cPickle
) for portability of serialization in Python, but, that's just about "turning your keys and values to strings". For saving key-value pairs (once stringified), few approaches are safer than just appending to a file (don't pickle to files if your crashes are far, far more frequent than normal, as you suggest tjey are).
If your environment is incredibly crash-prone for whatever reason (very cheap HW?-), just make sure you close the file (and fflush if the OS is also crash-prone;-), then reopen it for append. This way, worst that can happen is that the very latest append will be incomplete (due to a crash in the middle of things) -- then you just catch the exception raised by unpickling that incomplete record and redo only the things that weren't saved (because they weren't completed due to a crash, OR because they were completed but not fully saved due to a crash, comes to much the same thing in the end).
If you have the option of checkpointing to a database engine (instead of just doing so to files), consider it seriously! The DB engine will keep transaction logs and ensure ACID properties, making your application-side programming much easier IF you can count on that!-)
The pickle module supports serializing objects to a file (and loading from file):
http://docs.python.org/library/pickle.html
One possibility would be to create a number of smaller files ... each representing a subset of the state that you're trying to preserve and each with a checksum or tag indicating that it's complete as the last line/datum of the file (just before the file is closed).
If the checksum/tag is good then the rest of the data can be considered valid ... though program would then have to find all of these files, open and read all of them, and use meta data you've provided (in their headers or their names?) to determine which ones constitute the most recent cohesive state representation (or checkpoint) from which you can continue processing.
Without knowing more about the nature of the data that you're working with it's impossible to be more specific.
You can use files, of course, or you could use a DBMS system just about as easily. Any decent DBMS (PostgreSQL, MySQL if you're using the proper storage back-ends) can give you ACID guarantees and transactional support. So the data you read back should always be consistent with the constraints that you put in your schema and/or with the transactions (BEGIN, COMMIT, ROLLBACK) that you processed.
A possible advantage of posting your serialized date to a DBMS is that you can host the DBMS on a separate system (which is unlikely to suffer the same instabilities as your test host at the same times).
Pickle/cPickle have problems.
I use the JSON module to serialize objects out. I like it because not only does it work on any OS, but it will work fine in other programming languages, too; many other languages and platforms have readily-accessible JSON deserialization support, which makes it easy to use the same objects in different programs.
Solution with severe restrictions
If I don't worry about it crashing while writing out and I only want to allow manual termination, I can use standard output to control this. Unfortunately, this can only terminate the program when a control point is reached. This could be solved by creating a new thread to read standard input. This thread could use a global lock to check if the main thread is inside a critical section (writing to a file) and terminate the program if this is not the case.
Downsides:
- This is reasonably complex
- It adds an extra thread
- It stops me using standard input for anything else
精彩评论