Given the following two tables in SQL Server 2005:
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'GroupItems')
DROP TABLE GroupItems;
CREATE TABLE GroupItems (
RowID INT IDENTITY(1,1) PRIMARY KEY
, GroupID CHAR(1)
, ItemID INT
);
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'ItemList')
DROP TABLE ItemList;
CREATE TABLE ItemList (
ItemID INT PRI开发者_StackOverflow中文版MARY KEY
)
INSERT GroupItems ( GroupID, ItemID )
SELECT 'A', 1
UNION SELECT 'A', 2
UNION SELECT 'A', 3
UNION SELECT 'A', 4
UNION SELECT 'B', 1
UNION SELECT 'B', 2
UNION SELECT 'B', 4
UNION SELECT 'C', 1
UNION SELECT 'C', 2
UNION SELECT 'D', 1
UNION SELECT 'D', 4
UNION SELECT 'D', 5
INSERT ItemList ( ItemID )
SELECT 1
UNION SELECT 2
UNION SELECT 4
I'm trying to find the GroupID(s) from table GroupItems where the ItemIDs are an exact match for the contents of table ItemList.
In the sample data, the result should be Group 'B'.
Group A is rejected because it contains an item that is not in the ItemList table.
Group C is rejected because it doesn't contain all the items in the ItemList table.
Group D is rejected for both reasons.
Currently, I'm doing something like
DECLARE @ListCount INT;
SELECT @ListCount = COUNT(*) FROM ItemList;
SELECT GI.GroupID FROM GroupItems AS GI
INNER JOIN ItemList AS IL ON IL.ItemID = GI.ItemID
INNER JOIN ( SELECT GroupID FROM GroupItems
GROUP BY GroupID
HAVING COUNT(*) = @ListCOunt ) AS GS ON GS.GroupID = GI.GroupID
GROUP BY GI.GroupID
HAVING COUNT(*) = @ListCount;
This function gives the correct result that I'm looking for, however, in my production environment, the GroupItems table has hundreds of thousands of rows and thousands of unique GroupIDs. The ItemList table usually contains about a dozen rows. This function is called fairly regularly. I'm looking for a more efficient way to get the same results.
Assuming :
- ItemID value can be only > 0
SELECT t.GroupID FROM ( SELECT GroupItems.GroupID ,count(1) as groupItemsCount ,min(IsNull(ItemList.ItemID, -1)) as minVal FROM GroupItems LEFT JOIN ItemList ON (GroupItems.ItemID = ItemList.ItemID) GROUP BY GroupID ) t WHERE t.groupItemsCount = (SELECT COUNT(1) FROM ItemList) AND (t.minVal > 0)
Assuming:
- there is no relevant information missing
- ItemIDs are PKs, and therefore unique
- You don't want GroupIDs where you have repeated group/item combinations
This should work:
select GroupID
from GroupItems
inner join ItemMaster
on GroupItems.ItemID = ItemMaster.ItemID
inner join GroupMaster
on GroupItems.GroupID = GroupMaster.GroupID
group by GroupID
having count(*) = (select count(*) from ItemList)
If there was a guarantee in GroupItems of unique group/item combinations, the join would be unneccessary.
Have you considered creating an indexed view to aggregate the counts on GroupItems?
CREATE VIEW GroupCounts (groupId, GroupCount) with SCHEMABINDING
AS
SELECT groupId, COUNT_BIG(1) /* I use 1 instead of asterisk by convention */
FROM GroupItems
GROUP BY groupId
CREATE CLUSTERED INDEX IX_GroupCounts on GroupCounts(groupId)
With this, you can use a similar query to the one you have, but it should have much better performance.
SELECT GS.groupId FROM GroupItems AS GI
INNER JOIN ItemList AS IL ON IL.ItemID = GI.ItemID
INNER JOIN GroupCounts AS GS ON GS.GroupID = GI.GroupID
GROUP BY GS.GroupID
HAVING COUNT(1) = groupCount;
精彩评论