SQL Multiple Duplicate Row Detection

Developer https://www.devze.com 2023-04-11 16:18 Source: Internet
I'm trying to determine a correct way to isolate rows within a table that have the same values in 2 columns.

There are two tables: one (Name) holds each person's ID and name, and the other (Nation) holds each person's ID and nation. I combine them with an inner join, so the result has the columns ID, first name, last name, and nation. If I want to find pairs of people who have the same last name and are from the same nation, why isn't

select ID, FName, LName, Nation
from (Name inner join Nation on Name.ID = Nation.ID)
group by Name, Nation
having count(Name) > 1 and count(Nation) > 1

working?

I'm aiming for the result to be a table with columns:

ID -------First--------------- Last ---------Nation

where the last names and nations will be identical pairs while first names will be different.

I feel like the GROUP BY part isn't appropriate, but is there an alternative? Thanks for any help.


If you are using MS SQL Server:

select
    *
from
(
    select 
        Name.*, 
        Nation.Nation, 
        cnt = count(*) over(partition by LName, Nation) 
    from Name
    join Nation on Nation.ID = Name.ID
) t
where cnt > 1
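
For readers without SQL Server, here is a minimal sketch of the same windowed-count idea run against SQLite through Python's sqlite3 module. The sample rows are invented for illustration, the T-SQL `cnt = ...` alias is written as the portable `... AS cnt`, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Name (ID INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
    CREATE TABLE Nation (ID INTEGER PRIMARY KEY, Nation TEXT);
    INSERT INTO Name VALUES (1, 'Ann', 'Smith'), (2, 'Bob', 'Smith'),
                            (3, 'Cid', 'Smith'), (4, 'Dee', 'Jones');
    INSERT INTO Nation VALUES (1, 'US'), (2, 'US'), (3, 'UK'), (4, 'US');
""")

# Count the rows sharing each (LName, Nation) combination without collapsing
# them, then keep only the rows whose combination occurs more than once.
window_rows = conn.execute("""
    SELECT ID, FName, LName, Nation
    FROM (
        SELECT Name.ID AS ID, Name.FName AS FName, Name.LName AS LName,
               Nation.Nation AS Nation,
               COUNT(*) OVER (PARTITION BY Name.LName, Nation.Nation) AS cnt
        FROM Name
        JOIN Nation ON Nation.ID = Name.ID
    ) t
    WHERE cnt > 1
    ORDER BY ID
""").fetchall()

print(window_rows)
```

With this sample data only the two Smiths from the US are returned. The Smith from the UK and the Jones from the US each form a group of one.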


Try this:

SELECT * FROM (
  SELECT Name.ID, Name.FName, Name.LName, Nation.Nation
  FROM Name
  INNER JOIN Nation ON (Name.ID = Nation.ID)
) a
INNER JOIN (
  SELECT Name.ID, Name.FName, Name.LName, Nation.Nation
  FROM Name
  INNER JOIN Nation ON (Name.ID = Nation.ID)
) b ON (a.LName = b.LName AND a.Nation = b.Nation)
WHERE a.ID < b.ID
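
To see what the self-join returns, here is a small sketch using SQLite through Python's sqlite3 module with invented sample rows. The repeated derived table is written once as a CTE named person (my naming, not part of the answer above); the join and the `a.ID < b.ID` filter are the same.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Name (ID INTEGER PRIMARY KEY, FName TEXT, LName TEXT);
    CREATE TABLE Nation (ID INTEGER PRIMARY KEY, Nation TEXT);
    INSERT INTO Name VALUES (1, 'Ann', 'Smith'), (2, 'Bob', 'Smith'),
                            (3, 'Cid', 'Smith'), (4, 'Dee', 'Jones');
    INSERT INTO Nation VALUES (1, 'US'), (2, 'US'), (3, 'US'), (4, 'US');
""")

# Join the combined table to itself on last name and nation.
# a.ID < b.ID drops self-matches and keeps each pair only once.
pair_rows = conn.execute("""
    WITH person AS (
        SELECT Name.ID AS ID, Name.FName AS FName, Name.LName AS LName,
               Nation.Nation AS Nation
        FROM Name
        INNER JOIN Nation ON Name.ID = Nation.ID
    )
    SELECT a.ID, a.FName, b.ID, b.FName, a.LName, a.Nation
    FROM person a
    INNER JOIN person b ON a.LName = b.LName AND a.Nation = b.Nation
    WHERE a.ID < b.ID
    ORDER BY a.ID, b.ID
""").fetchall()

for row in pair_rows:
    print(row)
```

Note that three people sharing a last name and nation produce three pair rows (1-2, 1-3, 2-3), so the output grows quadratically with the size of each group.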


As Simon Righarts hinted, something's not right with the design.

Scenario 1)

If a name can have multiple nations, you would have 3 tables implementing an n:m relationship.

CREATE TABLE name (name_id int, fname text, lname text, ...);
CREATE TABLE nation (nation_id int, nation text, ...);
CREATE TABLE nationality (name_id int references name(name_id)
            ,nation_id int references nation(nation_id)
            ... );

Query for the scenario:

SELECT a.name_id, a.fname, a.lname, n.nation
  FROM name a
  JOIN nationality na USING (name_id)
  JOIN nation n USING (nation_id)
  JOIN (
   SELECT a.lname, na.nation_id
     FROM name a
     JOIN nationality na USING (name_id)
    GROUP BY 1,2
   HAVING count(*) > 1) x USING (lname, nation_id)

Scenario 2)

If a name can only have one nation, there would be a column nation_id in the table name:

CREATE TABLE name (name_id int
                  ,fname text
                  ,lname text
                  ,nation_id int references nation(nation_id), ...);
CREATE TABLE nation (nation_id int, nation text, ...);

Query for this scenario:

SELECT a.name_id, a.fname, a.lname, n.nation
  FROM name a
  JOIN nation n USING (nation_id)
  JOIN (
   SELECT a.lname, a.nation_id
     FROM name a
    GROUP BY 1,2
   HAVING count(*) > 1) x USING (lname, nation_id);

Both queries return every person who belongs to a group of duplicates, not just "pairs", assuming that is what you meant.
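
A minimal sketch of the Scenario 2 query, run against SQLite through Python's sqlite3 module. The sample rows and the fname/lname columns are assumptions for illustration, `GROUP BY 1,2` is spelled out by column name, and the USING clauses are written as explicit ON conditions to avoid relying on how a given engine resolves USING across chained joins.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nation (nation_id INTEGER PRIMARY KEY, nation TEXT);
    CREATE TABLE name (name_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT,
                       nation_id INTEGER REFERENCES nation(nation_id));
    INSERT INTO nation VALUES (1, 'US'), (2, 'UK');
    INSERT INTO name VALUES (1, 'Ann', 'Smith', 1), (2, 'Bob', 'Smith', 1),
                            (3, 'Cid', 'Smith', 1), (4, 'Dee', 'Smith', 2),
                            (5, 'Eve', 'Jones', 1);
""")

# The subquery x finds every (lname, nation_id) combination that occurs more
# than once; joining back to name recovers the individual people in it.
group_rows = conn.execute("""
    SELECT a.name_id, a.fname, a.lname, n.nation
    FROM name a
    JOIN nation n ON n.nation_id = a.nation_id
    JOIN (
        SELECT lname, nation_id
        FROM name
        GROUP BY lname, nation_id
        HAVING COUNT(*) > 1
    ) x ON x.lname = a.lname AND x.nation_id = a.nation_id
    ORDER BY a.name_id
""").fetchall()

print(group_rows)
```

All three Smiths from the US come back as individual rows, while the lone Smith from the UK and the lone Jones are filtered out.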

Your actual description doesn't fit either scenario.
