The basic gist of my issue: for every event A, I need to find the earliest following event B associated with the same user. Currently, I have:
SELECT e.UserID, e.date, MIN(e2.date)
FROM Event e
INNER JOIN Event e2 ON e.UserID = e2.UserID AND e.date <= e2.date
WHERE e.Event = 'A' AND e2.Event = 'B'
-- MIN() is an aggregate, so the query needs one group per A event
GROUP BY e.UserID, e.date
However, each event A (which can happen for a user any number of times) is followed by numerous B events, so the inner join produces many extra rows that MIN() then has to weed through. Is there a more efficient or faster way of doing this?
(The server is Microsoft SQL Server 2008.)
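A minimal way to check what the MIN query above returns, sketched under some assumptions: Python's built-in sqlite3 module stands in for SQL Server 2008, the Event table has only the three columns the query uses, and the sample rows, `make_db` and `first_b_after_each_a` are hypothetical. It shows the GROUP BY collapsing the many A/B join rows back to one row per A event.

```python
import sqlite3

# Hypothetical sample data: (UserID, Event, date). ISO-8601 date strings
# sort chronologically, so plain <= and MIN() work on them in SQLite.
ROWS = [
    (1, "A", "2010-01-01"),
    (1, "B", "2010-01-02"),
    (1, "A", "2010-01-03"),
    (1, "B", "2010-01-04"),
    (1, "B", "2010-01-05"),
    (2, "A", "2010-01-10"),
    (2, "B", "2010-01-12"),
    (2, "B", "2010-01-11"),
    (3, "B", "2009-12-31"),  # B before the A: must be ignored
    (3, "A", "2010-01-01"),  # no later B: dropped by the inner join
]


def make_db():
    """Build an in-memory Event table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Event (UserID INTEGER, Event TEXT, date TEXT)")
    conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", ROWS)
    return conn


# The MIN query with its GROUP BY: one output row per A event.
MIN_QUERY = """
SELECT e.UserID, e.date, MIN(e2.date) AS next_b
FROM Event e
INNER JOIN Event e2 ON e.UserID = e2.UserID AND e.date <= e2.date
WHERE e.Event = 'A' AND e2.Event = 'B'
GROUP BY e.UserID, e.date
ORDER BY e.UserID, e.date
"""


def first_b_after_each_a(conn):
    return conn.execute(MIN_QUERY).fetchall()


if __name__ == "__main__":
    print(first_b_after_each_a(make_db()))
```

Run as a script, it prints `[(1, '2010-01-01', '2010-01-02'), (1, '2010-01-03', '2010-01-04'), (2, '2010-01-10', '2010-01-11')]`; user 3 has no B after its A, so it does not appear.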
UPDATE: Would it be faster with RANK()?
SELECT UserID, date, date2
FROM (
    -- rank the B events that follow each A event, earliest first
    SELECT e.UserID, e.date, e2.date AS date2,
           RANK() OVER (PARTITION BY e.UserID, e.date ORDER BY e2.date) AS rnk
    FROM Event e INNER JOIN Event e2 ON e.UserID = e2.UserID
    WHERE e.Event = 'A' AND e2.Event = 'B' AND e.date <= e2.date
) AS ranked  -- SQL Server requires an alias on a derived table
WHERE rnk = 1
Or will the optimizer turn them into basically equivalent plans?
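For comparison, the ranking version can be run on the same hypothetical data, again with SQLite via Python's sqlite3 standing in for SQL Server. This sketch swaps RANK() for ROW_NUMBER() (both exist in SQL Server 2008); they only differ when two B events share a timestamp, where RANK() returns both as rank 1 and ROW_NUMBER() keeps just one. Window functions need SQLite 3.25 or later; `first_b_ranked` is a hypothetical name.

```python
import sqlite3

# Hypothetical sample data: (UserID, Event, date) with sortable ISO dates.
ROWS = [
    (1, "A", "2010-01-01"),
    (1, "B", "2010-01-02"),
    (1, "A", "2010-01-03"),
    (1, "B", "2010-01-04"),
    (1, "B", "2010-01-05"),
    (2, "A", "2010-01-10"),
    (2, "B", "2010-01-12"),
    (2, "B", "2010-01-11"),
    (3, "B", "2009-12-31"),  # B before the A: must be ignored
    (3, "A", "2010-01-01"),  # no later B: dropped by the inner join
]


def make_db():
    """Build an in-memory Event table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Event (UserID INTEGER, Event TEXT, date TEXT)")
    conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", ROWS)
    return conn


# Number the B events after each A event (one partition per A event)
# and keep only the earliest one.
RANKED_QUERY = """
SELECT UserID, date, date2
FROM (
    SELECT e.UserID, e.date, e2.date AS date2,
           ROW_NUMBER() OVER (PARTITION BY e.UserID, e.date
                              ORDER BY e2.date) AS rn
    FROM Event e
    INNER JOIN Event e2 ON e.UserID = e2.UserID
    WHERE e.Event = 'A' AND e2.Event = 'B' AND e.date <= e2.date
) AS ranked
WHERE rn = 1
ORDER BY UserID, date
"""


def first_b_ranked(conn):
    return conn.execute(RANKED_QUERY).fetchall()


if __name__ == "__main__":
    print(first_b_ranked(make_db()))
```

Logically this still enumerates every A/B pair before filtering, so it is not obviously cheaper than the GROUP BY version; comparing actual execution plans on real data is the reliable way to tell.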
Is it faster to join a third time, like this? Probably not, but it might be worth trying. Here, any row returned from table e3 represents a B event dated in between the e date and the e2 date. So we left join with it and keep only the rows where e3 is NULL, i.e. where no earlier B exists.
SELECT e.UserID, e.date, e2.date
FROM Event e
INNER JOIN Event e2 ON (e.UserID = e2.UserID AND e.date <= e2.date)
LEFT JOIN Event e3 ON (e.UserID = e3.UserID AND e.date <= e3.date
                       AND e3.date < e2.date  -- strict: with <=, e2 itself always matches
                       AND e3.Event = 'B')
WHERE e.Event = 'A' AND e2.Event = 'B'
  AND e3.date IS NULL  -- no B event between e and e2
I suspect this uses the same strategy as your MIN query, but maybe not? I'm curious to know either way.
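A runnable sketch of the anti-join on the same kind of hypothetical data (SQLite via Python's sqlite3 standing in for SQL Server). The `strict` parameter is purely illustrative: it toggles the comparison between e3 and e2 to show why it has to be `<`. With `<=`, the e2 row itself always satisfies the e3 join condition, so `e3.date IS NULL` never holds and nothing is returned.

```python
import sqlite3

# Hypothetical sample data: (UserID, Event, date) with sortable ISO dates.
ROWS = [
    (1, "A", "2010-01-01"),
    (1, "B", "2010-01-02"),
    (1, "A", "2010-01-03"),
    (1, "B", "2010-01-04"),
    (1, "B", "2010-01-05"),
    (2, "A", "2010-01-10"),
    (2, "B", "2010-01-12"),
    (2, "B", "2010-01-11"),
    (3, "B", "2009-12-31"),  # B before the A: must be ignored
    (3, "A", "2010-01-01"),  # no later B: dropped by the inner join
]


def make_db():
    """Build an in-memory Event table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Event (UserID INTEGER, Event TEXT, date TEXT)")
    conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", ROWS)
    return conn


# Keep (A, B) pairs for which no other B falls between them.
ANTI_JOIN_TEMPLATE = """
SELECT e.UserID, e.date, e2.date AS date2
FROM Event e
INNER JOIN Event e2 ON e.UserID = e2.UserID AND e.date <= e2.date
LEFT JOIN Event e3 ON e.UserID = e3.UserID
                  AND e.date <= e3.date
                  AND e3.date {op} e2.date
                  AND e3.Event = 'B'
WHERE e.Event = 'A' AND e2.Event = 'B'
  AND e3.date IS NULL
ORDER BY e.UserID, e.date
"""


def first_b_anti_join(conn, strict=True):
    # strict=True uses "<" (correct); strict=False uses "<=", under which
    # e3 can always be the e2 row itself, so the IS NULL filter removes everything.
    # op comes from a fixed pair of literals, never from user input.
    op = "<" if strict else "<="
    return conn.execute(ANTI_JOIN_TEMPLATE.format(op=op)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(first_b_anti_join(conn))
    print(first_b_anti_join(conn, strict=False))
```

With `strict=True` it returns the same three rows as the MIN query; with `strict=False` it returns an empty list.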
The only faster way of doing this that I know of is to process each event A in a loop and find its first event B with a separate query that uses TOP and ORDER BY, which lets the server look the answer up in a suitable index (for example, one on UserID, Event, date). This could be done in a stored procedure for maximum efficiency.
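The loop described above could look roughly like this, assuming a client-side loop in Python with sqlite3 instead of a T-SQL stored procedure. SQLite has no TOP, so the per-A lookup uses the equivalent `ORDER BY date LIMIT 1`; in T-SQL it would be `SELECT TOP 1 date ... ORDER BY date`. The index `ix_event_user_event_date` on (UserID, Event, date) and the function `first_b_by_loop` are hypothetical: with the equality columns first and date last, each lookup can seek to the user's first B on or after the A date and should be able to stop there.

```python
import sqlite3

# Hypothetical sample data: (UserID, Event, date) with sortable ISO dates.
ROWS = [
    (1, "A", "2010-01-01"),
    (1, "B", "2010-01-02"),
    (1, "A", "2010-01-03"),
    (1, "B", "2010-01-04"),
    (1, "B", "2010-01-05"),
    (2, "A", "2010-01-10"),
    (2, "B", "2010-01-12"),
    (2, "B", "2010-01-11"),
    (3, "B", "2009-12-31"),  # B before the A: must be ignored
    (3, "A", "2010-01-01"),  # no later B: skipped by the loop
]


def make_db():
    """Build an in-memory Event table with an index suited to the lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Event (UserID INTEGER, Event TEXT, date TEXT)")
    conn.executemany("INSERT INTO Event VALUES (?, ?, ?)", ROWS)
    # Hypothetical index: equality columns first, then date in sorted order.
    conn.execute(
        "CREATE INDEX ix_event_user_event_date ON Event (UserID, Event, date)"
    )
    return conn


# One lookup per A event: the earliest B for that user on or after the A date.
NEXT_B_SQL = """
SELECT date FROM Event
WHERE UserID = ? AND Event = 'B' AND date >= ?
ORDER BY date
LIMIT 1
"""


def first_b_by_loop(conn):
    results = []
    # Fetch all A events first so the cursor is not reused mid-iteration.
    a_events = conn.execute(
        "SELECT UserID, date FROM Event WHERE Event = 'A' ORDER BY UserID, date"
    ).fetchall()
    for user_id, a_date in a_events:
        row = conn.execute(NEXT_B_SQL, (user_id, a_date)).fetchone()
        if row is not None:  # mirror the inner join: drop A events with no later B
            results.append((user_id, a_date, row[0]))
    return results


if __name__ == "__main__":
    print(first_b_by_loop(make_db()))
```

In SQL Server 2008 the same per-A TOP 1 lookup can also be written as a single statement with CROSS APPLY, which may be worth comparing against a procedural loop.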