T-SQL: finding a group by its members_问答_开发者

开发者 https://www.devze.com 2022-12-09 23:04 出处：网络

Given the following two tables in SQL Server 2005: IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = \'GroupItems\')

相关专题：sql tsql

Given the following two tables in SQL Server 2005:

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'GroupItems')
    DROP TABLE GroupItems;
CREATE TABLE GroupItems (
    RowID INT IDENTITY(1,1) PRIMARY KEY
    , GroupID CHAR(1)
    , ItemID INT
);

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'ItemList')
    DROP TABLE ItemList;
CREATE TABLE ItemList (
    ItemID INT PRI开发者_StackOverflow中文版MARY KEY
)


INSERT GroupItems ( GroupID, ItemID )
SELECT 'A', 1
UNION SELECT 'A', 2
UNION SELECT 'A', 3
UNION SELECT 'A', 4
UNION SELECT 'B', 1
UNION SELECT 'B', 2
UNION SELECT 'B', 4
UNION SELECT 'C', 1
UNION SELECT 'C', 2
UNION SELECT 'D', 1
UNION SELECT 'D', 4
UNION SELECT 'D', 5


INSERT ItemList ( ItemID )
SELECT 1
UNION SELECT 2
UNION SELECT 4

I'm trying to find the GroupID(s) from table GroupItems where the ItemIDs are an exact match for the contents of table ItemList.

In the sample data, the result should be Group 'B'.

Group A is rejected because it contains an item that is not in the ItemList table.

Group C is rejected because it doesn't contain all the items in the ItemList table.

Group D is rejected for both reasons.

Currently, I'm doing something like

DECLARE @ListCount INT;
SELECT @ListCount = COUNT(*) FROM ItemList;

SELECT GI.GroupID FROM GroupItems AS GI
INNER JOIN ItemList AS IL ON IL.ItemID = GI.ItemID
INNER JOIN ( SELECT GroupID FROM GroupItems 
             GROUP BY GroupID
             HAVING COUNT(*) = @ListCOunt ) AS GS ON GS.GroupID = GI.GroupID  
GROUP BY GI.GroupID 
HAVING COUNT(*) = @ListCount;

This function gives the correct result that I'm looking for, however, in my production environment, the GroupItems table has hundreds of thousands of rows and thousands of unique GroupIDs. The ItemList table usually contains about a dozen rows. This function is called fairly regularly. I'm looking for a more efficient way to get the same results.

Assuming :

ItemID value can be only > 0

SELECT t.GroupID
FROM (
  SELECT GroupItems.GroupID
        ,count(1) as groupItemsCount 
        ,min(IsNull(ItemList.ItemID, -1)) as minVal
  FROM GroupItems
      LEFT JOIN ItemList
              ON (GroupItems.ItemID = ItemList.ItemID)
  GROUP BY GroupID
) t
WHERE t.groupItemsCount = (SELECT COUNT(1) FROM ItemList)
  AND (t.minVal > 0)

Assuming:

there is no relevant information missing
ItemIDs are PKs, and therefore unique
You don't want GroupIDs where you have repeated group/item combinations

This should work:

select GroupID
from GroupItems
inner join ItemMaster
    on GroupItems.ItemID = ItemMaster.ItemID
inner join GroupMaster
    on GroupItems.GroupID = GroupMaster.GroupID
group by GroupID
having count(*) = (select count(*) from ItemList)

If there was a guarantee in GroupItems of unique group/item combinations, the join would be unneccessary.

Have you considered creating an indexed view to aggregate the counts on GroupItems?

CREATE VIEW GroupCounts (groupId, GroupCount) with SCHEMABINDING
AS
SELECT groupId, COUNT_BIG(1) /* I use 1 instead of asterisk by convention */
FROM GroupItems
GROUP BY groupId

CREATE CLUSTERED INDEX IX_GroupCounts on GroupCounts(groupId)

With this, you can use a similar query to the one you have, but it should have much better performance.

SELECT GS.groupId FROM GroupItems AS GI
INNER JOIN ItemList AS IL ON IL.ItemID = GI.ItemID
INNER JOIN GroupCounts AS GS ON GS.GroupID = GI.GroupID  
GROUP BY GS.GroupID 
HAVING COUNT(1) = groupCount;