开发者

GROUP BY does not remove duplicates

开发者 https://www.devze.com 2023-03-30 12:02 出处:网络
I have a watchlist system that I\'ve coded, in the overview of the users\' watchlist, they would see a list of records, however the list shows duplicates when in the database it only shows the exact,

I have a watchlist system that I've coded, in the overview of the users' watchlist, they would see a list of records, however the list shows duplicates when in the database it only shows the exact, correct number.

I've tried GROUP BY watch.watch_id, GROUP BY rec.record_id, none of any types of group I开发者_运维问答've tried seems to remove duplicates. I'm not sure what I'm doing wrong.

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch 

    LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE  watch.user_id = 1 
GROUP BY watch.watch_id
LIMIT 0, 25

The watchlist table looks like this:

+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
|       13 |       1 |        22 | 1314038274 |
|       14 |       1 |        25 | 1314038995 |
+----------+---------+-----------+------------+


GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.

If you need to combine rows that are duplicate in some columns, use GROUP BY but you need to to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:

SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id

EDIT

The OP asked for some clarification.

Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.

First, you might have completely duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 3
3 4 5

Then simply use DISTINCT

SELECT DISTINCT * FROM V;

a b c
- - -
1 2 3
3 4 5

Or, you might have partially duplicate rows that you wish to combine:

a b c
- - -
1 2 3
1 2 6
3 4 5

Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:

SELECT DISTINCT a,b FROM V;

a b
- -
1 2
3 4

Or you could perform some kind of aggregation on them. You could add them up:

SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;

a b tot
- - ---
1 2 9
3 4 5

You could add pick the smallest value:

SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;

a b first
- - -----
1 2 3
3 4 5

Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.

What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."


While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.

You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, the exact rows it returns for the non-GROUP BY fields is undefined.

If you wanted to do this with a GROUP BY try something more like the following:

SELECT watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username
FROM
(
    watchlist watch 

    LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id

    LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watch WHERE user_id = 1 
GROUP BY watch.watch_id)
LIMIT 0, 25


I Would never recommend using SELECT DISTINCT, it's really slow on big datasets.

Try using things like EXISTS.


You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.

Also, from the results displayed they have different records. That looks like a perfectly valid expected results. If you are trying to only select distinct values, then you don't want ot GROUP, but you want to select by distinct values.

SELECT DISTINCT()...


If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.

To suppress duplicates in your results, either use DISTINCT as @Laykes says, or try

GROUP BY watch.watch_date,
       rec.street_number,
       rec.street_name,
       rec.city,
       rec.state,
       rec.country,
       usr.username

It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.

0

精彩评论

暂无评论...
验证码 换一张
取 消