How can I write a query to extract individual changes from snapshots of data?_问答_开发者

I need to create a process that will extract the changes from a table where each row is a snapshot of a row in another table. The real-world problem involves many tables with many fields, but as a simple example, suppose that I have the following snapshot data:

Sequence    DateTaken      ID       Field1    Field2
--------    -----------    ----     ------    ------
       1    '2011-01-01'      1     'Red'          2
       2    '2011-01-01'      2     'Blue'        10
       3    '2011-02-01'      1     'Green'        2
       4    '2011-03-01'      1     'Green'        3
       5    '2011-03-01'      2     'Purple'       2
       6    '2011-04-01'      1     'Yellow'       2

The Sequence and DateTaken fields relate directly to the snapshot table itself. The ID field is the primary key of the source table and Field1 and Field2 are other fields in the same (source) table.

I can get part-way to a solution with a query like this:

WITH Snapshots (Sequence, DateTaken, ID, Field1, Field2, _Index)
AS
(
    SELECT Sequence, DateTaken, ID, Field1, Field2, ROW_NUMBER() OVER (ORDER BY ID, Sequence) _Index
    FROM #Snapshots
)
SELECT
      c.DateTaken, c.ID
    , c.Field1 Field1_Current, p.Field1 Field1_Previous, CASE WHEN c.Field1 = p.Field1 THEN 0 ELSE 1 END Field1_Changed
    , c.Field2 Field2_Current, p.Field2 Field2_Previous, CASE WHEN c.Field2 = p.Field2 THEN 0 ELSE 1 END Field2_Changed
FROM Snapshots c
JOIN Snapshots p ON p.ID = c.ID AND (p._Index + 1) = c._Index
ORDER BY c.Sequence DESC

The above query will identify what is changing from one snapshot to the next, but it is still not in the form that I need. Each row in the output may contain several changes. At the end of the day, I need one row per change that identifies what field was changed, along with its previous/current values. Fields that have not actually changed will need to be excluded from the final output. So if the above query output is like this:

DateTaken   ID  Field1_Current  Field1_Previous  Field1_Changed  Field2_Current  Field2_Previous  Field2_Changed
----------  --  --------------  ---------------  --------------  --------------  ---------------  --------------
2011-04-01  1   Yellow          Green            1               2               3                1
2011-02-01  1   Green           Red              1               2               2                0

I need to transform that into something like this:

DateTaken   ID  Field    Previous   Current
----------  --  -------  --------   ---------
2011-04-01  1   Field1   Green      Yellow
2011-04开发者_如何学C-01  1   Field2   3          2
2011-02-01  1   Field1   Red        Green

I thought I might be able to get there with UNPIVOT, but I've not been able to make that work. I consider any solution involving cursors or similar to be an absolute last resort.

Thanks much for any advice.

Here's a working sample that uses UNPIVOT. It's based on my answer to my question Better way to Partially UNPIVOT in Pairs in SQL

This has some nice features.

Adding additional fields is easy. Just add values to the SELECT and UNPIVOT clause. You don't have to add additional UNION clauses
The where clause WHERE curr.value <> prev.value never changes regardless of how many fields are added.
The performance is surprisingly fast.
Its portable to Current versions of Oracle if you need that

SQL

Declare @Snapshots as table(
Sequence int,
DateTaken      datetime,
[id] int,
field1 varchar(20),
field2 int)



INSERT INTO @Snapshots VALUES 

      (1,    '2011-01-01',      1,     'Red',          2),
      (2,    '2011-01-01',      2,     'Blue',        10),
      (3,    '2011-02-01',      1,     'Green',        2),
      (4,    '2011-03-01',      1,     'Green' ,       3),
      (5,    '2011-03-01',      2,     'Purple',       2),
      (6,    '2011-04-01',      1,     'Yellow',       2)

;WITH Snapshots (Sequence, DateTaken, ID, Field1, Field2, _Index)
AS
(
    SELECT Sequence, DateTaken, ID, Field1, Field2, ROW_NUMBER() OVER (ORDER BY ID, Sequence) _Index
    FROM @Snapshots
)
,  data as(
SELECT
     c._Index
    , c.DateTaken
    ,  c.ID
    , cast(c.Field1  as varchar(max)) Field1
    , cast(p.Field1  as varchar(max))Field1_Previous
    , cast(c.Field2   as varchar(max))Field2
    , cast(p.Field2  as varchar(max)) Field2_Previous 


FROM Snapshots c
JOIN Snapshots p ON p.ID = c.ID AND (p._Index + 1) = c._Index
)


, fieldsToRows 
     AS (SELECT DateTaken, 
                id,
                _Index,
                value,
                field

         FROM   data p UNPIVOT (value FOR field IN (field1, field1_previous, 
                                                        field2, field2_previous) ) 
                AS unpvt
        ) 
SELECT 
    curr.DateTaken,
    curr.ID,
    curr.field,
    prev.value previous,
    curr.value 'current'

FROM 
        fieldsToRows curr 
        INNER  JOIN  fieldsToRows prev
        ON curr.ID = prev.id
            AND curr._Index = prev._Index 
            AND curr.field + '_Previous' = prev.field
WHERE 
    curr.value <> prev.value

Output

DateTaken               ID          field     previous current
----------------------- ----------- --------- -------- -------
2011-02-01 00:00:00.000 1           Field1    Red      Green
2011-03-01 00:00:00.000 1           Field2    2        3
2011-04-01 00:00:00.000 1           Field1    Green    Yellow
2011-04-01 00:00:00.000 1           Field2    3        2
2011-03-01 00:00:00.000 2           Field1    Blue     Purple
2011-03-01 00:00:00.000 2           Field2    10       2

WITH Snapshots (Sequence, DateTaken, ID, Field, FieldValue, _Index) AS
(
    SELECT
        Sequence,
        DateTaken,
        ID,
        'Field1' AS Field
        CAST(Field1 AS VARCHAR(100)) AS FieldValue,  -- Find an appropriate length
        ROW_NUMBER() OVER (ORDER BY ID, Sequence)
    FROM
        #Snapshots
    UNION ALL
    SELECT
        Sequence,
        DateTaken,
        ID,
        'Field2' AS Field
        CAST(Field2 AS VARCHAR(100)) AS FieldValue,  -- Find an appropriate length
        ROW_NUMBER() OVER (ORDER BY ID, Sequence)
    FROM
        #Snapshots
)
SELECT
    S1.DateTaken,
    S1.ID,
    S1.Field,
    S1.FieldValue AS Previous,
    S2.FieldValue As New   -- Not necessarily "Current"
FROM
    Snapshots S1
INNER JOIN Snapshots S2 ON
    S2.ID = S1.ID AND
    S2.Field = S1.Field AND
    S2._Index = S1._Index + 1 AND
    S2.FieldValue <> S1.FieldValue    -- Might need to handle NULL values