Remove all but one duplicate rows where datetime column values are within seconds of each other?

Developer https://www.devze.com 2023-01-16 09:54 Source: Internet

Due to an error in the system, a tracking log was firing repeatedly, causing what should have been one log entry to appear hundreds of times. This has been resolved, but the data is still there and needs to stay for reporting (I can't just delete it all). However, I only want ONE instance of each entry. This is going to be tricky, I think. Here are the relevant fields in the table:

int UserID, int ActorID, nvarchar(50) ActorType, int BoxID, datetime CreateDate, nvarchar(50) Query

Now for every row where all of those are identical and the difference in the CreateDate is within say, 30 seconds of each other, I want to delete all those rows but one.

So all the data in the fields listed will be exactly matched and the CreateDate will range like:

2010-08-17 14:50:11.620
2010-08-17 14:50:11.823
2010-08-17 14:50:12.057
2010-08-17 14:50:12.277
2010-08-17 14:50:12.527
2010-08-17 14:50:12.730
2010-08-17 14:50:12.980
2010-08-17 14:50:13.340
2010-08-17 14:50:13.450
2010-08-17 14:50:13.667
2010-08-17 14:50:13.887
2010-08-17 14:50:14.120
2010-08-17 14:50:14.323
2010-08-17 14:50:14.730
2010-08-17 14:50:14.807
2010-08-17 14:50:15.010
2010-08-17 14:50:15.357
...
2010-08-17 14:51:09.810
2010-08-17 14:51:10.047
2010-08-17 14:51:10.250
2010-08-17 14:51:10.500
2010-08-17 14:51:10.890
2010-08-17 14:51:10.953
2010-08-17 14:51:11.263
2010-08-17 14:51:11.437
2010-08-17 14:51:11.920
2010-08-17 14:51:12.170
2010-08-17 14:51:12.217
2010-08-17 14:51:12.420
2010-08-17 14:51:12.670
2010-08-17 14:51:12.873
2010-08-17 14:51:13.123
2010-08-17 14:51:13.373
2010-08-17 14:51:13.577
2010-08-17 14:51:13.797
2010-08-17 14:51:14.030
2010-08-17 14:51:14.280
2010-08-17 15:29:19.180
2010-08-17 15:32:32.497
2010-08-17 15:32:32.733
2010-08-17 15:32:32.967
2010-08-17 15:32:33.263
2010-08-17 15:32:33.513
2010-08-17 15:32:33.623
2010-08-17 15:32:33.857
2010-08-17 15:32:34.140
2010-08-17 15:32:34.327
2010-08-17 15:32:34.560
2010-08-17 15:32:34.780
2010-08-17 15:32:35.043
2010-08-17 15:32:35.247
2010-08-17 15:32:35.483
2010-08-17 15:32:35.717

But I just want to keep one. I hope that is enough information.


Here's how you can get one row from each group of records that are grouped by the 30-second range. This query can be used to see which rows you would keep in the table.

WITH cte AS
    ( SELECT UserID, ActorID, ActorType, BoxID, Query, CreateDate,
        DATEDIFF(ss, '1/1/2000', CreateDate) / 30 AS CreateDateGroup,
        ROW_NUMBER() OVER (PARTITION BY UserID, ActorID, ActorType, BoxID, Query,
                                     DATEDIFF(ss, '1/1/2000', CreateDate) / 30
                           ORDER BY CreateDate ASC) AS sequence
    FROM TrackingLog
    )

SELECT UserID, ActorID, ActorType, BoxID, Query, CreateDate, CreateDateGroup, sequence
FROM cte
WHERE sequence = 1

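Two columns are produced in the common table expression (CTE). The CreateDateGroup column is calculated by converting the CreateDate value to the number of seconds since '1/1/2000' and dividing by 30, the width of the window in seconds. Both operands are integers, so SQL Server performs integer division and the fractional part is truncated. Every row whose CreateDate falls in the same 30-second bucket therefore gets the same CreateDateGroup value.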
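
To make the bucket arithmetic concrete, here is a small Python sketch (not part of the original answer) that mirrors the CreateDateGroup expression. The function name create_date_group is made up for illustration. The first three timestamps come from the question; the last pair is hypothetical and only there to show what happens at a bucket boundary.

```python
from datetime import datetime

EPOCH = datetime(2000, 1, 1)

def create_date_group(create_date, width_seconds=30):
    """Mimic DATEDIFF(ss, '1/1/2000', CreateDate) / 30 with integer division."""
    # DATEDIFF(ss, ...) counts whole-second boundaries, so drop the fraction first.
    whole_second = create_date.replace(microsecond=0)
    return int((whole_second - EPOCH).total_seconds()) // width_seconds

# Timestamps taken from the question.
a = create_date_group(datetime(2010, 8, 17, 14, 50, 11, 620000))
b = create_date_group(datetime(2010, 8, 17, 14, 50, 15, 357000))
c = create_date_group(datetime(2010, 8, 17, 14, 51, 9, 810000))

# Hypothetical pair, 0.2 seconds apart, straddling the :30 mark.
d = create_date_group(datetime(2010, 8, 17, 14, 50, 29, 900000))
e = create_date_group(datetime(2010, 8, 17, 14, 50, 30, 100000))

print(a == b)   # rows a few seconds apart share a bucket
print(c - a)    # number of buckets between the start and the tail of the burst
print(d == e)   # whether the hypothetical pair shares a bucket
```

Because 2000-01-01 00:00:00 is itself a bucket boundary, buckets start at :00 and :30 of every minute. A burst that runs across one of those marks is split into more than one group, so the query keeps one row per bucket, which is not always one row per burst.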

The sequence column is the row number within the group and is ordered by CreateDate, in ascending order. So, the oldest date in each group will be sequence 1.

The main query includes WHERE sequence = 1, which indicates you want to see the first row in each group.
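
As a rough way to preview which rows WHERE sequence = 1 would keep, the following Python sketch imitates the PARTITION BY and ORDER BY logic in memory. It is only an illustration, not a replacement for the query: the names bucket and first_per_group are invented here, the values in KEY are hypothetical, and the timestamps are a handful picked from the list in the question.

```python
from datetime import datetime
from itertools import groupby

EPOCH = datetime(2000, 1, 1)

def bucket(ts, width=30):
    # Same arithmetic as DATEDIFF(ss, '1/1/2000', CreateDate) / 30
    return int((ts.replace(microsecond=0) - EPOCH).total_seconds()) // width

def first_per_group(rows, width=30):
    """rows are (key, create_date) pairs; return the rows that would get sequence = 1."""
    def group_key(row):
        key, ts = row
        return (key, bucket(ts, width))
    # Sort by group, then by CreateDate, like PARTITION BY ... ORDER BY CreateDate ASC.
    ordered = sorted(rows, key=lambda row: (group_key(row), row[1]))
    # The first row of each group is the earliest one in that bucket.
    return [next(group) for _, group in groupby(ordered, key=group_key)]

# Hypothetical values for UserID, ActorID, ActorType, BoxID, Query.
KEY = (1, 2, "User", 3, "example query")

# A few of the CreateDate values listed in the question.
stamps = [
    "2010-08-17 14:50:11.620",
    "2010-08-17 14:50:11.823",
    "2010-08-17 14:50:12.057",
    "2010-08-17 14:51:09.810",
    "2010-08-17 14:51:10.047",
    "2010-08-17 15:29:19.180",
    "2010-08-17 15:32:32.497",
    "2010-08-17 15:32:32.733",
]
rows = [(KEY, datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f")) for s in stamps]

kept = first_per_group(rows)
for _, ts in kept:
    print(ts)
```

Running the SELECT version of the query against the real table remains the authoritative check before deleting anything.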

When you are ready to delete the unwanted rows, you would alter the main query like the following:

WITH cte AS
    ( SELECT UserID, ActorID, ActorType, BoxID, Query, CreateDate,
        DATEDIFF(ss, '1/1/2000', CreateDate) / 30 AS CreateDateGroup,
        ROW_NUMBER() OVER (PARTITION BY UserID, ActorID, ActorType, BoxID, Query,
                                     DATEDIFF(ss, '1/1/2000', CreateDate) / 30
                           ORDER BY CreateDate ASC) AS sequence
    FROM TrackingLog
    )

DELETE
FROM cte
WHERE sequence > 1
;

This command will delete all rows from the table that are not the first row of each group.
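
One thing to keep in mind is that the groups are fixed 30-second windows, so a burst that crosses a window boundary, or lasts longer than 30 seconds, leaves one row per window. If exactly one row per burst is wanted, a different technique is to start a new group whenever the gap to the previous row exceeds 30 seconds. This is not what the query above does; the Python sketch below only illustrates the idea for rows that already share the same UserID, ActorID, ActorType, BoxID and Query. The function name keep_first_of_each_burst is hypothetical, the first list uses timestamps from the question, and the boundary pair is made up.

```python
from datetime import datetime, timedelta

def keep_first_of_each_burst(timestamps, max_gap=timedelta(seconds=30)):
    """Return the first timestamp of every run whose consecutive gaps are <= max_gap."""
    kept = []
    previous = None
    for ts in sorted(timestamps):
        # A new burst starts at the first row, or after a gap larger than max_gap.
        if previous is None or ts - previous > max_gap:
            kept.append(ts)
        previous = ts
    return kept

# Timestamps from the end of the list in the question.
tail = [
    datetime(2010, 8, 17, 14, 51, 13, 797000),
    datetime(2010, 8, 17, 14, 51, 14, 30000),
    datetime(2010, 8, 17, 14, 51, 14, 280000),
    datetime(2010, 8, 17, 15, 29, 19, 180000),
    datetime(2010, 8, 17, 15, 32, 32, 497000),
    datetime(2010, 8, 17, 15, 32, 32, 733000),
]
kept_tail = keep_first_of_each_burst(tail)

# Hypothetical pair that fixed 30-second windows would split at the :30 mark.
boundary = [
    datetime(2010, 8, 17, 14, 50, 29, 900000),
    datetime(2010, 8, 17, 14, 50, 30, 100000),
]
kept_boundary = keep_first_of_each_burst(boundary)

print(kept_tail)
print(kept_boundary)
```

With this rule a long, continuous burst stays a single group no matter how long it runs, which may or may not be what the report needs.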


Group by all the fields except the timestamp and take the MAX(timestamp_field) value?
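
For comparison, here is a minimal Python sketch of what grouping by every field except the timestamp and taking the maximum would keep. The name latest_per_key and the values in KEY are hypothetical; the timestamps are from the question.

```python
from datetime import datetime

def latest_per_key(rows):
    """rows are (key, create_date) pairs; keep only the latest create_date per key."""
    latest = {}
    for key, ts in rows:
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    return latest

# Hypothetical values for UserID, ActorID, ActorType, BoxID, Query.
KEY = (1, 2, "User", 3, "example query")

rows = [
    (KEY, datetime(2010, 8, 17, 14, 50, 11, 620000)),
    (KEY, datetime(2010, 8, 17, 15, 29, 19, 180000)),
    (KEY, datetime(2010, 8, 17, 15, 32, 35, 717000)),
]
result = latest_per_key(rows)
print(result[KEY])
```

Note that this ignores the 30-second condition: entries that are minutes or hours apart but otherwise identical collapse into a single row as well.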
