开发者

T-SQL - Help with MAX Operation Over Many-to-Many

开发者 https://www.devze.com 2023-01-31 07:44 出处:网络
This is closely related to a previous question i asked. I have a many-to-many relationship between Post and Location.

This is closely related to a previous question i asked.

I have a many-to-many relationship between Post and Location.

The join table is called PostLocations, and has nothing but the FK's. (LocationId, PostId)

I'm trying to pull back the top post for each location.

This is the query i have (which was given in the answer to my previous question):

SELECT pl.LocationId, p.postid, p.UniqueUri, p.Content, MAX(s.BaseScore) as topscore
from dbo.PostLocations pl
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid
group by pl.locationid, p.postid, p.UniqueUri, p.Content

But the problem is, because PostLocations could have entries like this:

LocationId   PostId
1            213213
2            498324
1            230943

So my above query is returning LocationId 1 twice, because it has two records in the join table. I only want 1 record per location - the top post per locationid.

I've also tried this:

SELECT l.LocationId, p.postid, p.UniqueUri, p.Content, MAX(s.BaseScore) as topscore
from dbo.PostLocations 开发者_JAVA技巧pl
inner join dbo.Locations l on pl.LocationId = l.LocationId
inner join dbo.posts p on pl.PostId = p.PostId
inner join dbo.reviews r on p.postid = r.postid
inner join dbo.scores s on r.scoreid = s.scoreid
group by l.locationid, p.postid, p.UniqueUri, p.Content

Same result - this is what comes back:

LocationId   PostId   UniqueUri   Content   TopScore
1            213213   some-post   pew pew   2.00
2            498324   anot-post   blah bl   4.50
1            230943   sadjsa-as   asijd a   3.5

This is what should come back:

LocationId   PostId   UniqueUri   Content   TopScore
1            230943   sadjsa-as   asijd a   3.5
2            498324   anot-post   blah bl   4.50

Because LocationId 1 has 2 posts, but PostId 230943 has the highest score so that is the one returned.

Any ideas on what i'm missing?


If you are using SQL Server 2005 or later, you can do something like:

With RankedLocations As
    (
    Select PL.LocationId
        , S.BaseScore
        , P.PostID
        , P.UniqueUri
        , P.Content
        , Row_Number() Over( Partition By PL.LocationId Order By S.BaseScore Desc ) As ScoreRank
    From dbo.PostLocations As PL
        Join dbo.Posts As P
            On P.PostId = PL.PostId
        Join dbo.Reviews As R
            On R.PostId = P.PostId
        Join dbo.Scores As S
            On S.ScoreId = R.ScoreId
    )
Select LocationId, BaseScore, PostID, UniqueUri, Content
From RankedLocations
Where ScoreRank = 1


As you're grouping on postID which is unique every single post falls into its own group of one.

I'm not sure if there is a better way to do this however what I've done in the past runs along these lines

Select l.LocationId, p.postid, p.UniqueUri, p.Content, s.basescore as topscore
from
    dbo.Locations l inner join
    (select 
        pl.locationid, max(s.BaseScore) as topscore
    from
        dbo.postlocations pl 
        inner join dbo.posts p on pl.PostId = p.PostId
        inner join dbo.reviews r on p.postid = r.postid
        inner join dbo.scores s on r.scoreid = s.scoreid
    group by
        pl.locationid) as topPost on l.locationid = topPost.locationid
    inner join dbo.postlocations pl on pl.locationid = l.locationid
    inner join dbo.posts p on pl.PostId = p.PostId
    inner join dbo.reviews r on p.postid = r.postid
    inner join dbo.scores s on r.scoreid = s.scoreid and s.basescore = toppost.topscore

We make a subquery to find the top score for a given location then do our joins as before and on the last join ensure that the basescore is the topscore we found earlier.

This means that if we have two equal top scores for a given location we will return both rows, however in all other cases we will return a single row per location, it can be modified to pick an arbitrary post given two equal top scores but I haven't done so.

I'm interested to see if there are any other solutions to this problem as with the number of extra joins this is quite a computationally expensive solution to the problem.

Edit - in response to your comment as postid is the primary key we can rely on it to be greatest for the latest post.

Select l.LocationId, p.postid, p.UniqueUri, p.Content, bar.basescore as topscore
from
    dbo.Locations l inner join
    (select 
        l.LocationId, max(p.postid) as postid ,max(s.basescore) as basescore
    from
        (select 
            pl.locationid, max(s.BaseScore) as topscore
        from
            dbo.postlocations pl 
            inner join dbo.posts p on pl.PostId = p.PostId
            inner join dbo.reviews r on p.postid = r.postid
            inner join dbo.scores s on r.scoreid = s.scoreid

        group by
            pl.locationid) as topPost on l.locationid = topPost.locationid
        inner join dbo.postlocations pl on pl.locationid = l.locationid
        inner join dbo.posts p on pl.PostId = p.PostId
        inner join dbo.reviews r on p.postid = r.postid
        inner join dbo.scores s on r.scoreid = s.scoreid and s.basescore = toppost.topscore
    group by l.locationid) as bar on l.locationid = bar.locationid
    inner join posts p on bar.postid = p.postid
0

精彩评论

暂无评论...
验证码 换一张
取 消