Duplicate results returned from query when distinct is used_问答_开发者

Duplicate results returned from query when distinct is used

开发者 https://www.devze.com 2023-01-22 21:31 出处：网络

On a current project at I am needing to do some pagination of results returned from SQL. I have hit a corner case in which the query can accept identifiers as part of the where clause, normally this isn't an issu开发者_StackOverflow中文版e but in one case we have a single identifier being passed up that has a one to many relationship with one of the tables that the query joins on and it is returning multiple rows in the results. That issue was fixed by introducing a distinct to the query. The following is the query which returns the correct result of one row (all table/field names have been changed of course):

select distinct [item_table].[item_id]
    , row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
    join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
    left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
    and [item_table].[live_item] = 1
    and [item_table].[item_id] in (1404309)

However when I introduce pagination into the query I am finding that it is now returning multiple rows when it should be only be returning one. The method I am using for pagination is as follows:

select [item_id]
from (
      select distinct [item_table].[item_id]
            , row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
      from [item_table]
            join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
            left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
      where [item_table].[pub_item_web] = 1
            and [item_table].[live_item] = 1
            and [item_table].[item_id] in (1404309)
) as [items]
where [items].[row_num] between 0 and 100

I worry that adding a distinct to the outer query will cause an incorrect number of results to be returned and I am unsure of how else to fix this issue. The database I am querying is MS SQL Server 2008.

About 5 minutes after posting the question a possible solution hit me, if I group by the item_id (and any sort criteria) which should only be one instance of it should solve the issue. After testing this was the query that I was left with:

select [item_id]
from (
      select [item_table].[item_id]
            , row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
      from [item_table]
            join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
            left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
      where [item_table].[pub_item_web] = 1
            and [item_table].[live_item] = 1
            and [item_table].[item_id] in (1404309)
      group by [item_table].[item_id], [item_table].[pub_date]
) as [items]
where [items].[row_num] between 0 and 100

I don't see where the DISTINCT is adding any value in your first query. The results are [item_table].[item_id] and [row_num]. Because the value of [row_num] is already distinct, the combination of [item_table].[item_id] and [row_num] will be distinct. When adding the DISTINCT keyword to the query, no rows are excluded.

In the second query, your results will return [item_id] from the sub query where [row_num] meets the criteria. If there where duplicate [item_id] values in the sub-query, there will be duplicates in the final results, but now you don't display [row_num] to distinguish the duplicates.