So I've got a large amount of SQL data that looks basically like this:
user | src | dst
1 | 1 | 1
1 | 1 | 1
1 | 1 | 2
1 | 1 | 2
2 | 1 | 1
2 | 1 | 3
I want to filter out pairs of (sr开发者_StackOverflow社区c,dst) that are unique to one user (even if that user has duplicates), leaving behind only those pairs belonging to more than one user:
user | src | dst
1 | 1 | 1
1 | 1 | 1
2 | 1 | 1
In other words, pair (1,2) is unique to user 1 and pair (1,3) to user 2, so they're dropped, leaving behind only all instances of pair (1,1).
Edit: To clarify, I am not interested in the pairs that get filtered out, I need all the rows where the pairs are not unique.
Any ideas? The answers to the question below can find the non-unique pairs, but my SQL-fu doesn't suffice to handle the complication of requiring that they belong to multiple users as well.
How to select non "unique" rows
My solution would be something like this (tested):
SELECT user, src, dst, COUNT(user) as num_of_users
FROM test
GROUP BY src, dst
HAVING num_of_users = 1
Edit: The following code produces the results that you provided in your example.
SELECT test.user, test.src, test.dst
FROM test
INNER JOIN
(
SELECT user, src, dst, COUNT(DISTINCT user) as num_of_users
FROM test
GROUP BY src, dst
HAVING num_of_users > 1
) as inner_sql
USING(src, dst)
Is this solution the one you are looking for? How is the performance?
Join using EXISTS syntax
SELECT t1.*
FROM table t1
WHERE EXISTS (SELECT 1
FROM table t2
WHERE t1.src = t2.src
AND t1.dst = t2.dst
AND t1.user != t2.user)
Another option that MIGHT work (didn't test). If it works it would be more efficient:
SELECT user, src, dst
FROM table
GROUP BY src, dst
HAVING COUNT(DISTINCT user) > 1
May be with some group by?
something like
select user,src,dst,count(*) as c from table group by user,src,dst
having c = 1
精彩评论