I have a table that contains approximately 10 million rows. This table is periodically updated (a few times a day) by an external process. Any row in the table that is not included in the update should be deleted. Of course, you don't know whether a row is in the update until the update has finished.
Right now, we take the timestamp of when the update began. When the update finishes, anything that has an "updated" value less than the start timestamp is wiped. This works for now, but is problematic when the updater process crashes for whatever reason - we have to start again with a new timestamp value.
It seems to me that there must be something more robust, as this is a common problem. Any advice?
Instead of a time stamp, use an integer revision number. Increment it ONLY when you have a complete update, and then delete the elements with out of date revisions.
If you use a storage engine that supports transactions, like InnoDB (you're using MySQL, right?), you can consider wrapping the update in a transaction, so that if the update process crashes, the modifications are not committed.
See the official MySQL documentation on InnoDB transactions.
We don't know anything about your architecture, and how you do this update (pure SQL, webservice?), but you might already have a transaction management layer.
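For what it's worth, here is a rough sketch of the transactional approach. Since I don't know your stack, it uses Python's sqlite3 module instead of MySQL/InnoDB, and the table, column, and function names (items, batch, transactional_refresh) are invented for the example. The inserts and the cleanup delete all happen inside one transaction, so a failure partway through rolls everything back and the table stays as it was.

```python
import sqlite3

def create_items(conn):
    # Hypothetical table; "batch" identifies which update wrote the row.
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, "
        "batch INTEGER NOT NULL)"
    )
    conn.commit()

def transactional_refresh(conn, rows, batch_id):
    """Apply one update atomically: either all of it lands or none of it."""
    # "with conn" commits if the block finishes and rolls back if an
    # exception escapes it.
    with conn:
        for item_id, payload in rows:
            conn.execute(
                "INSERT OR REPLACE INTO items (id, payload, batch) "
                "VALUES (?, ?, ?)",
                (item_id, payload, batch_id),
            )
        # Rows this update did not touch are stale; remove them in the
        # same transaction as the inserts.
        conn.execute("DELETE FROM items WHERE batch <> ?", (batch_id,))
```

With 10 million rows, a single transaction this large may be slow or hold locks for a long time, so it is worth testing against your real data before relying on it.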