开发者

SQL query to get the firsts rows 24 hours apart from each other for a specific fields combination

开发者 https://www.devze.com 2023-03-25 01:43 出处:网络
although I\'ve been using si开发者_开发百科mple CTEs in SQL Server Development I\'ve not been able to figure out how to build this one, my scenario:

although I've been using si开发者_开发百科mple CTEs in SQL Server Development I've not been able to figure out how to build this one, my scenario:

I have a log table with connection events, the platform that produces the records sometimes inserts several records for the same event in a short period of time, so the query needs to find only the first record in a 24 hour period for the same fields combination (in this case for the same MAC, IP and eventCode), in order to show a report with "clean" event records.

Table structure:

CREATE TABLE EventsLog(
    [eventDate] [datetime] NOT NULL,
    [MAC] [varchar](30) NOT NULL,
    [IP] [varchar](20) NULL,
    [eventCode] [int] NULL
) ON [PRIMARY]

Sample data:

eventDate               MAC                            IP                   eventCode
----------------------- ------------------------------ -------------------- -----------
2011-06-01 23:37:05.000 00:04:06:CA:F2:17              90.72.118.70        31276197
2011-06-03 01:21:33.000 00:04:06:CA:F2:17              90.72.118.70        31276198
2011-06-03 13:35:36.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 13:35:54.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 13:38:48.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 13:39:23.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 23:39:34.000 00:04:06:CA:F2:17              90.72.118.70        31304578
2011-06-04 23:39:41.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-04 23:41:07.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 23:41:58.000 00:04:06:CA:F2:17              90.72.118.70        31263067

As you can see in the sample data the eventCode 31276206 has repeated entries, the query needs to get the first entry in a range of 24 hours for each MAC, IP and eventCode. The expected result would be:

eventDate               MAC                            IP                   eventCode
----------------------- ------------------------------ -------------------- -----------
2011-06-01 23:37:05.000 00:04:06:CA:F2:17              90.72.118.70        31276197
2011-06-03 01:21:33.000 00:04:06:CA:F2:17              90.72.118.70        31276198
2011-06-03 13:35:36.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 23:39:34.000 00:04:06:CA:F2:17              90.72.118.70        31304578
2011-06-04 23:39:41.000 00:04:06:CA:F2:17              90.72.118.70        31276206
2011-06-03 23:41:58.000 00:04:06:CA:F2:17              90.72.118.70        31263067

I've tried to figure out how to build a combination of CTEs to do this but my knowledge of CTEs is very basic, so I'll be glad if someone knows how to build them or if it also can be done with some sort of sub-query.

Thanks in advance for your help.


Assuming by "24 hour range" you mean a proper day from midnight to midnight:

WITH x AS
(
    SELECT 
        eventDate, MAC, IP, eventCode, 
        rn = ROW_NUMBER() OVER
        (
            PARTITION BY DATEDIFF(DAY, 0, eventDate),
            MAC, IP, eventCode
            ORDER BY eventDate
        )
        FROM EventsLog
)
SELECT
    eventDate, MAC, IP, eventCode
FROM x
WHERE rn = 1
ORDER BY EventDate;


This doesn't use a CTE but I think it will do what you want

SELECT e1.eventCode, e1.MAC, e1.IP, MIN(e2.eventDate) AS eventDate
FROM EventsLog e1
LEFT OUTER JOIN EventsLog e2 ON e1.IP = e2.IP
                AND e1.MAC = e2.MAC
                AND e1.eventCode = e2.eventCode
                AND DATEDIFF(hh, e1.eventDate, e2.eventDate) <= 24
GROUP BY e1.eventCode, e1.MAC, e1.IP
ORDER BY MIN(e2.eventDate)

This will assume 24-hour period from the first instance.

Performance would degrade with larger amounts of data though.


Can you elaborate if your using a CTE is a requirement or just what you were thinking of using due to perceived performance or lack of other ways to achieve this?

Since you want multiple days returned, I miss read the question, here is the fastest way I can think of doing it (for large data) and doesn't use of CTE:

SELECT MIN([eventDate]) AS [eventDate], [MAC], [IP], [eventCode]
FROM EventsLog
GROUP BY CAST([eventDate] AS DATE), [eventCode], [MAC], [IP]
ORDER BY MIN([eventDate])

This will group by each 24 hour 'day' period (00:00 to 23:59) by the the GROUP BY CAST([eventDate] AS DATE) and give you the MIN([eventDate]) AS [eventDate] for that day

0

精彩评论

暂无评论...
验证码 换一张
取 消