How to filter duplicate rows using DISTINCT/GROUP BY with JOINs

Developer https://www.devze.com 2023-02-04 10:15 Source: Internet


For simplicity, I will give a quick example of what I am trying to achieve:

Table 1 - Members

  ID    |   Name
--------------------
  1     |   John    
  2     |   Mike    
  3     |   Sam

Table 2 - Member_Selections

  ID    |   planID
--------------------
  1     |   1    
  1     |   2    
  1     |   1    
  2     |   2    
  2     |   3    
  3     |   2    
  3     |   1

Table 3 - Selection_Details

planID  |   Cost
--------------------
  1     |   5    
  2     |   10    
  3     |   12

When I run my query, I want to return the sum of all member selections, grouped by member. The issue I face, however (e.g. the Table 2 data), is that some members may have duplicate information in the system by mistake. While we do our best to filter this data up front, it sometimes slips through the cracks, so when I make the necessary calls to the system to pull information, I also want to filter it out.

The results SHOULD show:


Results Table

ID  |    Name    | Total_Cost
-----------------------------
1   |    John    |   15
2   |    Mike    |   22
3   |    Sam     |   15

but instead they show John as $20, because plan ID #1 was inserted twice for him by mistake.
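To make the arithmetic concrete, here is a minimal sketch that loads the three example tables into an in-memory SQLite database and runs the join with no deduplication. It uses Python's standard sqlite3 module as a stand-in for MySQL, and the helper build_sample_db is hypothetical. The example tables' Cost column stands in for premium, and rows are grouped per member because the agent table isn't shown.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: load the question's example tables into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (ID INTEGER, name TEXT);
        CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
        CREATE TABLE selection_details (planID INTEGER, Cost INTEGER);
        INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
        -- (1, 1) appears twice: the accidental duplicate
        INSERT INTO member_selections VALUES
            (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
        INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
    """)
    return conn


# Plain join, no deduplication: every member_selections row contributes a cost
naive = """
    SELECT m.ID, m.name, SUM(g.Cost) AS total_cost
    FROM members m
    INNER JOIN member_selections s USING (ID)
    INNER JOIN selection_details g USING (planID)
    GROUP BY m.ID, m.name
    ORDER BY m.ID
"""
rows = build_sample_db().execute(naive).fetchall()
print(rows)  # [(1, 'John', 20), (2, 'Mike', 22), (3, 'Sam', 15)]
```

John's repeated (1, 1) selection is joined to plan 1's cost twice, giving 5 + 10 + 5 = 20 instead of 15.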

My query is currently:

SELECT
    sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
    SELECT
    m.id, m.name, g.premium
    FROM members m
    INNER JOIN member_selections s USING(ID)
    INNER JOIN selection_details g USING(planid)
) sq group by sq.agent

Adding DISTINCT s.planID filters the results incorrectly: it counts plan ID 1 only once (even though members 1 and 3 both bought it).
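The DISTINCT has to apply to the (ID, planID) pair rather than to planID alone. A small sketch against the example data (Python's sqlite3 standing in for MySQL; build_sample_db is a hypothetical helper that loads the example tables) shows the difference:

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: load the question's example tables into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (ID INTEGER, name TEXT);
        CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
        CREATE TABLE selection_details (planID INTEGER, Cost INTEGER);
        INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
        -- (1, 1) appears twice: the accidental duplicate
        INSERT INTO member_selections VALUES
            (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
        INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
    """)
    return conn


conn = build_sample_db()

# DISTINCT on planID alone: each plan survives once, regardless of who bought it
plan_only = conn.execute(
    "SELECT DISTINCT planID FROM member_selections ORDER BY planID"
).fetchall()

# DISTINCT on the (ID, planID) pair: only the accidental repeat of (1, 1) is dropped
pairs = conn.execute(
    "SELECT DISTINCT ID, planID FROM member_selections ORDER BY ID, planID"
).fetchall()

print(plan_only)   # [(1,), (2,), (3,)]
print(len(pairs))  # 6
```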

Any help is appreciated.

EDIT

There is also another table I forgot to mention: the agent table (the agent who sold the plans to members).

The final GROUP BY statement groups ALL items sold by agent ID (which collapses the final results into a single row).

Perhaps the simplest solution is to put a unique composite key on the member_selections table:

 alter table member_selections add unique key ms_key (ID, planID);

This would prevent any record from being added whose ID/planID combination already exists in the table, so only a single (1,1) row would be allowed.
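SQLite has no ALTER TABLE ... ADD UNIQUE KEY, so this sketch uses CREATE UNIQUE INDEX as the equivalent (an assumption made for the sake of a runnable example with Python's sqlite3). It shows that the key rejects only a repeated ID/planID combination: a second plan for the same member, or the same plan for a different member, is still accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_selections (ID INTEGER, planID INTEGER)")
# SQLite stand-in for: alter table member_selections add unique key ms_key (ID, planID);
conn.execute("CREATE UNIQUE INDEX ms_key ON member_selections (ID, planID)")

conn.execute("INSERT INTO member_selections VALUES (1, 1)")
conn.execute("INSERT INTO member_selections VALUES (1, 2)")  # same member, other plan: allowed
conn.execute("INSERT INTO member_selections VALUES (3, 1)")  # same plan, other member: allowed

rejected = False
try:
    conn.execute("INSERT INTO member_selections VALUES (1, 1)")  # repeat of (1, 1)
except sqlite3.IntegrityError:
    rejected = True  # the unique key blocks the duplicate

count = conn.execute("SELECT COUNT(*) FROM member_selections").fetchone()[0]
print(rejected, count)  # True 3
```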

Comment follow-up:

Just saw your comment about the 'alter ignore...'. That'll work fine, but you'd still be left with the bad duplicates in the table. I'd suggest manually cleaning up the table and then adding the unique key (the plain ALTER fails while duplicate ID/planID pairs remain). The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. Once the table's clean, there'll be no need for the duplicate-handling version of the query.
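The duplicate-finding query from the comments isn't reproduced here. As one possible sketch (not necessarily that query), a GROUP BY ... HAVING COUNT(*) > 1 lists the repeated pairs, and a DELETE then automates the clean-up that the answer suggests doing by hand. The DELETE relies on SQLite's implicit rowid, which MySQL does not have, so treat that step as SQLite-specific. build_sample_db is a hypothetical helper that loads the example tables.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: load the question's example tables into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (ID INTEGER, name TEXT);
        CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
        CREATE TABLE selection_details (planID INTEGER, Cost INTEGER);
        INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
        -- (1, 1) appears twice: the accidental duplicate
        INSERT INTO member_selections VALUES
            (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
        INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
    """)
    return conn


conn = build_sample_db()

# List every (ID, planID) pair that occurs more than once, with its count
dupes = conn.execute("""
    SELECT ID, planID, COUNT(*) AS n
    FROM member_selections
    GROUP BY ID, planID
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [(1, 1, 2)]

# SQLite-specific: keep the lowest rowid of each pair and delete the rest
conn.execute("""
    DELETE FROM member_selections
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM member_selections GROUP BY ID, planID
    )
""")

# With the duplicates gone, the unique key can be created without an error
conn.execute("CREATE UNIQUE INDEX ms_key ON member_selections (ID, planID)")
remaining = conn.execute("SELECT COUNT(*) FROM member_selections").fetchone()[0]
print(remaining)  # 6
```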

Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.

What about:

SELECT
    sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
    SELECT
    m.id, m.name, g.premium
    FROM members m
    INNER JOIN 
         (select distinct ID, PlanID from member_selections) s
    USING(ID)
    INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
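Running this derived-table approach against the example tables confirms that it yields the expected totals. In this sketch (Python's sqlite3 standing in for MySQL; build_sample_db is a hypothetical helper), premium is replaced by the example tables' Cost column and the outer GROUP BY uses the member columns, since the agent column is not part of the example schema.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: load the question's example tables into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (ID INTEGER, name TEXT);
        CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
        CREATE TABLE selection_details (planID INTEGER, Cost INTEGER);
        INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
        -- (1, 1) appears twice: the accidental duplicate
        INSERT INTO member_selections VALUES
            (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
        INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
    """)
    return conn


query = """
    SELECT sq.ID, sq.name, SUM(sq.Cost) AS total_cost
    FROM (
        SELECT m.ID, m.name, g.Cost
        FROM members m
        -- deduplicate (ID, planID) pairs before joining to the costs
        INNER JOIN (SELECT DISTINCT ID, planID FROM member_selections) s USING (ID)
        INNER JOIN selection_details g USING (planID)
    ) sq
    GROUP BY sq.ID, sq.name
    ORDER BY sq.ID
"""
rows = build_sample_db().execute(query).fetchall()
print(rows)  # [(1, 'John', 15), (2, 'Mike', 22), (3, 'Sam', 15)]
```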

By the way, is there a reason you don't have a primary key on member_selections that will prevent these duplicates from happening in the first place?

You can add a GROUP BY clause to the inner query that groups by member, plan, and cost, so each member/plan combination is returned only once. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part.)

SELECT
    sq.ID,
    sq.name,
    SUM(sq.Cost) AS total_cost
FROM
(
    SELECT
            m.id,
            m.name,
            g.Cost
    FROM
            members m
            INNER JOIN member_selections s USING(ID)
            INNER JOIN selection_details g USING(planid)

        GROUP BY
            m.ID,
            m.NAME,
            s.planID,
            g.Cost
) sq
group by
    sq.ID,
    sq.NAME
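Grouping the inner query on member and cost alone only works while every plan has a different cost. The sketch below adds invented data (a hypothetical plan 4 that also costs 5 and that John also selects) and compares inner groupings with and without the plan ID. Python's sqlite3 stands in for MySQL; build_sample_db and totals are hypothetical helpers.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical helper: load the question's example tables into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (ID INTEGER, name TEXT);
        CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
        CREATE TABLE selection_details (planID INTEGER, Cost INTEGER);
        INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
        -- (1, 1) appears twice: the accidental duplicate
        INSERT INTO member_selections VALUES
            (1, 1), (1, 2), (1, 1), (2, 2), (2, 3), (3, 2), (3, 1);
        INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
    """)
    return conn


conn = build_sample_db()
# Invented data: plan 4 costs the same as plan 1, and John also selects it
conn.executescript("""
    INSERT INTO selection_details VALUES (4, 5);
    INSERT INTO member_selections VALUES (1, 4);
""")


def totals(inner_group_by: str) -> list:
    """Run the two-level GROUP BY query with the given inner grouping columns."""
    return conn.execute(f"""
        SELECT sq.ID, sq.name, SUM(sq.Cost) AS total_cost
        FROM (
            SELECT m.ID, m.name, g.Cost
            FROM members m
            INNER JOIN member_selections s USING (ID)
            INNER JOIN selection_details g USING (planID)
            GROUP BY {inner_group_by}
        ) sq
        GROUP BY sq.ID, sq.name
        ORDER BY sq.ID
    """).fetchall()


by_cost = totals("m.ID, m.name, g.Cost")             # plans 1 and 4 merge for John
by_plan = totals("m.ID, m.name, s.planID, g.Cost")   # one row per member/plan
print(by_cost)  # [(1, 'John', 15), (2, 'Mike', 22), (3, 'Sam', 15)]
print(by_plan)  # [(1, 'John', 20), (2, 'Mike', 22), (3, 'Sam', 15)]
```

With the cost-only grouping, John's plans 1 and 4 collapse into a single row, so he is totalled at 15 instead of 20.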