Optimizing ROW_NUMBER() in SQL Server_问答_开发者

We have a number of machines which record data into a database at sporadic intervals. For each record, I'd like to obtain the time period between this recording and the previous recording.

I can do this using ROW_NUMBER as follows:

WITH TempTable AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Machine_ID ORDER BY Date_Time) AS Ordering
    FROM dbo.DataTable
)

SELECT [Current].*, Previous.Date_Time AS PreviousDateTime
FROM TempTable AS [Current]
INNER JOIN TempTable AS Previous 
    ON [Current].Machine_ID = Previous.Machine_ID
    AND Previous.Ordering = [Current].Ordering + 1

The problem is, it goes really slow (several minutes on a table with about 10k entries) - I tried creating separate indicies on Machine_ID and Date_Time, and a single joined-index, but nothing 开发者_运维问答helps.

Is there anyway to rewrite this query to go faster?

The given ROW_NUMBER() partition and order require an index on (Machine_ID, Date_Time) to satisfy in one pass:

CREATE INDEX idxMachineIDDateTime ON DataTable (Machine_ID, Date_Time);

Separate indexes on Machine_ID and Date_Time will help little, if any.

How does it compare to this version?:

SELECT x.*
    ,(SELECT MAX(Date_Time)
      FROM dbo.DataTable
      WHERE Machine_ID = x.Machine_ID
          AND Date_Time < x.Date_Time
    ) AS PreviousDateTime
FROM dbo.DataTable AS x

Or this version?:

SELECT x.*
    ,triang_join.PreviousDateTime
FROM dbo.DataTable AS x
INNER JOIN (
    SELECT l.Machine_ID, l.Date_Time, MAX(r.Date_Time) AS PreviousDateTime
    FROM dbo.DataTable AS l
    LEFT JOIN dbo.DataTable AS r
    ON l.Machine_ID = r.Machine_ID
        AND l.Date_Time > r.Date_Time
    GROUP BY l.Machine_ID, l.Date_Time
) AS triang_join
ON triang_join.Machine_ID = x.Machine_ID
    AND triang_join.Date_Time = x.Date_Time

Both would perform best with an index on Machine_ID, Date_Time and for correct results, I'm assuming that this is unique.

You haven't mentioned what is hidden away in * and that can sometimes means a lot since a Machine_ID, Date_Time index will not generally be covering and if you have a lot of columns there or they have a lot of data, ...

If the number of rows in dbo.DataTable is large then it is likely that you are experiencing the issue due to the CTE self joining onto itself. There is a blog post explaining the issue in some detail here

Occasionally in such cases I have resorted to creating a temporary table to insert the result of the CTE query into and then doing the joins against that temporary table (although this has usually been for cases where a large number of joins against the temp table are required - in the case of a single join the performance difference will be less noticable)

I have had some strange performance problems using CTEs in SQL Server 2005. In many cases, replacing the CTE with a real temp table solved the problem.

I would try this before going any further with using a CTE.

I never found any explanation for the performance problems I've seen, and really didn't have any time to dig into the root causes. However I always suspected that the engine couldn't optimize the CTE in the same way that it can optimize a temp table (which can be indexed if more optimization is needed).

Update

After your comment that this is a view, I would first test the query with a temp table to see if that performs better.

If it does, and using a stored proc is not an option, you might consider making the current CTE into an indexed/materialized view. You will want to read up on the subject before going down this road, as whether this is a good idea depends on a lot of factors, not the least of which is how often the data is updated.

What if you use a trigger to store the last timestamp an subtract each time to get the difference?

If you require this data often, rather than calculate it each time you pull the data, why not add a column and calculate/populate it whenever row is added?

(Remus' compound index will make the query fast; running it only once should make it faster still.)