开发者

Help with Improving Performance of SQL Server 2008 CTE

开发者 https://www.devze.com 2023-02-11 08:51 出处:网络
I have a SQL Server 2008 CTE which is responsible for returning the Top Review for a location. The CTE is wrapped in a UDF (Table-Valued), and joined on the LocationId field, so i can get the top rev

I have a SQL Server 2008 CTE which is responsible for returning the Top Review for a location.

The CTE is wrapped in a UDF (Table-Valued), and joined on the LocationId field, so i can get the top review for each location.

Cardinalities:

1 Location has 0-many PostLocations

1 PostLocation has 1 Post

1 Post has 1 Review

Here's the UDF:

CREATE FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
(   

)
RETURNS TABLE 
AS
RETURN 
(
    WITH [RankedLocations] AS
    (
        SELECT      PL.LocationId, 
                    R.Rating, 
                    P.PostID, 
                    P.UniqueUri, 
                    P.Content, 
                    ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank

        From        dbo.PostLocations As PL
        INNER JOIN  dbo.Posts As P
        ON          P.PostId = PL.PostId
        INNER JOIN  dbo.Reviews As R
        ON          R.PostId = P.PostId

        WHERE       R.ReviewTypeId <> 5
        AND         P.Content IS NOT NULL
    )

    SELECT  LocationId, Rating, PostID, UniqueUri, Content
    FROM    RankedLocations
    WHERE   ScoreRank = 1
)

Here's an example of how i'm using it:

select l.LocationId, l.Name, l.Unique开发者_如何学PythonUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join dbo.Review_HighestRated_Aggregated_ByLocation() r 
on l.LocationId = r.LocationId

The above query is taking 15 seconds to execute, which is unacceptable. Without the join to the UDF it's takes 0 seconds.

Any ideas on how i can improve it?

If i look at the execution plan, it's the SORT that is taking up 98% of the execution cost. The IO/subtree cost of this operation is ~300.

I was hoping the execution plan would give me a hint to an index i could create to improve the cost, but i get nothing.

Any ideas?


So i found the performance problem, and it wasn't the CTE, it was how i was using it.

I have several lookup tables, one in particular for Location Type (Street = 7, City = 5, etc).

So to keep my SQL fluent and consistent (and avoid hard-coded magic numbers), i created a wrapper scalar function which returns the evuivalent value based on the string, e.g:

DECLARE @Street_LocationType = [dbo].[ToLocationTypeId]('Street')

The function is extremely simple, just a series of CASE statements.

But, i was using my CTE like this:

SELECT      a.LocationId, b.Content, b.UniqueUri
FROM        [dbo].[Locations] a
INNER JOIN  dbo.Review_HighestRated_Aggregated_ByLocation()  b -- UDF with CTE
ON          a.LocationId = b.LocationId        
WHERE       a.LocationTypeId = @Street_LocationType

So i wasn't even using it on the CTE itself, i was using it as a filter on the Locations table.

If i change the above to hardcode the value (e.g 7), the procedure execution time drops from 13 seconds to 2 secs.

I don't get it, but it solved the problem. I was noticing when the procedure was performing badly, the "SORT" operation in the query plan had an estimated number of rows = 32,000 - which is basically every post in the system.

After my changes, the estimated number of rows is 1 (as it should be).

Bizarre activity indeed.


If your table valued function dont need parameters consider to use a VIEW instead a UDF. Probably it solves the performance problem.


To convert the CTE and the UDF into a VIEW:

DROP FUNCTION [dbo].[Review_HighestRated_Aggregated_ByLocation]
GO

CREATE VIEW Review_HighestRated_Aggregated_ByLocation
AS
SELECT  LocationId, Rating, PostID, UniqueUri, Content
FROM
(
    SELECT      PL.LocationId, 
                R.Rating, 
                P.PostID, 
                P.UniqueUri, 
                P.Content, 
                ROW_NUMBER() OVER (PARTITION BY PL.LocationId ORDER BY R.Rating DESC, P.LocationTypeId, P.CreatedOn DESC) As ScoreRank
    From        dbo.PostLocations As PL
    INNER JOIN  dbo.Posts As P
    ON          P.PostId = PL.PostId
    INNER JOIN  dbo.Reviews As R
    ON          R.PostId = P.PostId
    WHERE       R.ReviewTypeId <> 5
    AND         P.Content IS NOT NULL
) RankedLocations
WHERE   ScoreRank = 1

GO

OP's sample query revised to use the new VIEW:

select l.LocationId, l.Name, l.UniqueUri, r.UniqueUri, r.Content
from @Locations l -- temp table containing around 18 location ids
inner join Review_HighestRated_Aggregated_ByLocation r 
on l.LocationId = r.LocationId
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号