I am researching best practices for managing a weekly data feed to a 3rd party OLAP service. The analysis database will be 开发者_开发问答initialized with a full data dump. Subsequent weekly feeds will provide new and updated rows. The data is sourced from a SQL Server 2005 database.
What are the preferred approaches to detecting the new and updated rows?
Trigger a modified date field on inserts and updates and grab those greater than the last feed extraction; or
timestamp column on source tables and grab rows with timestamp greater than last feed extraction; or
some excellent idea I have not thought of...
Well that depends on how you define a changed row.
Number 2 works but any change at all to the row will change the timestamp even an update that doesn't actually change anything (say updating the value from 1 to 1). Sounds silly who would do that? But it's easy to do when you use dynamic code.
Number 1 can be modified to ensure there is a differnce between the inserted and deleted tables in the trigger and fixes the problem of Option 2.
However, suppose you have three different feeds that each contain different columns which might be in related tables and you only want to send if one of those actually changes. Now a trigger isn't specific enough to your feeds. We use change data tracking to identity which rows have changed since the last send date (which we store as part of all our SSIS pacakges) and then calculate rowhash data for specific columns to see if the data we are interested in actually changed. I've also seen people use a history table of what was sent and then compare those values (if you are only interested in one or two fields in table) to the ones that actually changed in the change data tracking table. When you have a requirement this specific, you end up with quite a complex system to support it (I'm just giving you a rough idea).
As said HLGEM, timestamp is not a good idea. Trigger is a good option, but here is another question - how do you plan to store changes? Just cloning all records to history tables? Answer depends from how large storage do you have. If yes, or if you have to store just limited in history records (say, only 1 year history) - you can do that. But if you have limited storage or if you want to store all changes in one table, i propose to use method which i prefer: using OUTPUT clause and store changed only columns value in one table as XML. I described this technique in my blog: part_1,part_2
精彩评论