I have 3 sql tables:
Data36 (Data_ID:int <PK>, type:int),
Data38(Data_ID:int <PK>, clientId:int),
Data47(Data_ID:int <PK>, payerID:int).
I thought the following queries are identical, because I don't use aggregate functions here and GROUP BY should behave the same way as DISTINCT. But they return very different result sets and I don't understand why. Please help me to understand defference between these queries.
Query 1 (returns 153 rows):
SELECT payer.Data_ID, payer.type
FROM Data36 AS payer
JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
WHERE ((SELECT count(*) FROM Data47 AS regsites WHERE regsites.payerID = payer.Data_ID) = 0)
GROUP BY payer.Data_ID, payer.type
Query 2 (returns 4744 rows):
SELECT DISTINCT payer.Data_ID, payer.type
FROM Data36 AS payer
JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
WHERE ((SELECT count(*) FROM Data47 AS regsites WHERE regsites.payerID = payer.Data_ID) = 0)
SQL Server version is 5.0.40.
Let me know if you need more specific information.
Update: Sorry for not mentioning this: Data_ID is a Primary Key in these tables, so Data_ID is unique for each record in these tables.
SELECT count( * ) FROM Data36
--returns 5998
SELECT count(DISTINCT Data_ID) FROM Data36
--returns 5998
Update 2: In Query 1 I changed '开发者_StackOverflowGROUP BY payer.Data_ID' to 'GROUP BY payer.Data_ID, payer.type'. The result is still the same - 153 rows.
Whan happens when you try this?
select payer.Data_ID, payer.type from
(
SELECT DISTINCT Data_ID, type
FROM Data36
) AS payer
JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
WHERE ((SELECT count(*) FROM Data47 AS regsites WHERE regsites.payerID = payer.Data_ID) = 0)
Correct and most efficient way to found records that hasn't coresponding records in the third table is:
SELECT payer.Data_ID, payer.type
FROM Data36 AS payer
JOIN Data38 AS serv ON payer.Data_ID = serv.clientId
LEFT JOIN Data47 AS regsites ON(regsites.payerID = payer.Data_ID)
WHERE regsites.payerID IS NULL
GROUP BY payer.Data_ID
Looking at your query there appear to be only one functional difference...
Query1 (153 records):
SELECT payer.Data_ID, payer.type FROM <blah> GROUP BY payer.Data_ID
Query2 (4744 records):
SELECT DISTINCT payer.Data_ID, payer.type FROM <blah>
As Query1 only groups by Data_ID I would say that you have 153 Data_IDs but that each Data_ID can have many different type values.
Query2 is returning every different Data_ID/type combination, but Query1 is only returning 1 record for each Data_ID.
I'm also surprised that Query1 runs at all, as I would expect you would be Required to have an aggregate function such as MIN/MAX around the type field.
精彩评论