开发者

SQL Query question - How to select groups of rows using commonalities

开发者 https://www.devze.com 2023-02-18 14:52 出处:网络
I\'m working census data, and I wish to provide the ability to search for records by providing a list of names. The idea is that if you know the name or 2 or 3 family members, you will be able to excl

I'm working census data, and I wish to provide the ability to search for records by providing a list of names. The idea is that if you know the name or 2 or 3 family members, you will be able to exclude all addresses which don't have those names. Consider this example dataset (Address, Family Name, Forename):


"Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)", "Gamble", "Julia"

"Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)", "Gamble", "Richard"

"Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)", "Gamble", "Hannah"

"Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part 开发者_开发百科of), Cork)", "Gamble", "Helen"


A search for Julia, Hannah and Helen should be able to return all 4 rows as they share a common address. It sounds simple enough, but I'm having trouble with this one. Cursors are out because of the size of the dataset. Any ideas?

(Needless to say, I've simplified this somewhat as I'm ignoring the Family-name part of the search for now)


This query:

       select streetaddress, count(*) as occupantcount
       from census
       where firstname in ("Julia", "Hannah", "Helen")
       group by streetaddress
       order by occupantcount desc

would return the address and number of occupants at each address wherever one of the occupants has one of the first names in the IN() list, and order the results in most-occupants-to-least-occupants order. Variations on this same principle (grouping by address) can get you other sorts of information. You can apply successive filters to the results to zero-in on your desired rows.


This is a relational division problem.

SELECT Address, FamilyName, Forename
FROM   YourTable
WHERE  Address IN (SELECT Address
                   FROM   YourTable
                   WHERE  Forename IN ( 'Julia', 'Hannah', 'Helen' )
                   GROUP  BY Address
                   HAVING COUNT(DISTINCT Forename) = 3)  

Or

WITH Names(name)
     AS (SELECT 'Julia'
         UNION ALL
         SELECT 'Hannah'
         UNION ALL
         SELECT 'Helen')
SELECT Address,
       FamilyName,
       Forename
FROM   YourTable y1
WHERE  NOT EXISTS (SELECT *
                   FROM   Names n
                   WHERE  NOT EXISTS(SELECT *
                                     FROM   YourTable y2
                                     WHERE  y1.Address = y2.Address
                                            AND y2.Forename = n.Name))  


You can try this:

SELECT A.*
FROM YourTable A
JOIN (  SELECT Address, COUNT(*) Quant
        FROM YourTable
        WHERE Forename IN ('Julia','Hannah','Helen')
        GROUP BY Address
        HAVING COUNT(DISTINCT Forename) > 2) B
ON A.Address = B.Address


If you index forename and address this should be straightforward and fast:

SELECT a.Address as CommonAddress
FROM (SELECT Address FROM Names WHERE Forename = 'Julia') a
INNER JOIN (SELECT Address FROM Names WHERE Forename = 'Richard') b ON a.Address=b.Address
INNER JOIN (SELECT Address FROM Names WHERE Forename = 'Helen') c on b.Address=c.Address


A solution is to use a subquery to find the address that is common to all of the supplied names, and to return all records at that address.

declare @people table (
    address varchar(255),
    familyName varchar(255),
    forename varchar(255)
)

insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Julia')
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Richard')
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Hannah')
insert into @people
values ('Residents of a house 2 in Janemount Lower (Cork No. 4 Urban (part of), Cork)', 'Gamble', 'Helen')
insert into @people
values ('Residents of a house 2 somewhere else (Cork No. 4 Urban (part of), Cork)', 'Cooper', 'Helen')

select people.*
from @people as people
where people.address in (
    select address
    from @people
    where forename in ('Julia', 'Hannah', 'Helen')
    group by address
    having count(forename) >= 3 -- This must be equal to the number of names searched for
)

One issue with this solution is that, if there are three Hannahs living at the same address, the query will return those people, even if Julia and Helen are not living there.


SELECT t.address
     , t.familyname
     , t.forename 
  FROM yourTable t
  WHERE t.address IN 
    -- search subquery
    ( SELECT s1.address 
        FROM yourTable s1
          JOIN yourTable s2
            ON s2.address = s1.address
          JOIN yourTable s3
            ON s3.address = s1.address
        WHERE s1.forename = "Julia"
          AND s2.forename = "Hannah"
          AND s3.forename = "Helen"
    )
  ORDER BY t.address
         , t.familyname
         , t.forename 
;

2nd solution:

SELECT t.address
     , t.familyname
     , t.forename 
  FROM yourTable t
  WHERE EXISTS 
    -- search subquery
    ( SELECT * 
        FROM yourTable s1
          JOIN yourTable s2
            ON s2.address = s1.address
          JOIN yourTable s3
            ON s3.address = s1.address
        WHERE s1.forename = "Julia"
          AND s2.forename = "Hannah"
          AND s3.forename = "Helen"
          AND s1.address = t.address
    )
  ORDER BY t.address
         , t.familyname
         , t.forename 
;

3rd solution:

SELECT t.address
     , t.familyname
     , t.forename
  FROM yourTable t
  WHERE  -- search subqueries
    EXISTS
      ( SELECT * 
          FROM yourTable s1
            WHERE s1.forename= "Julia"
              AND s1.address = t.address
      )
    AND EXISTS
      ( SELECT * 
          FROM yourTable s2
            WHERE s2.forename = "Hannah"
              AND s2.address = t.address
      )
    AND EXISTS
      ( SELECT * 
          FROM yourTable s3
            WHERE s3.forename = "Helen"
              AND s3.address = t.address
      )
  ORDER BY t.address
         , t.familyname
         , t.forename 
;


DECLARE @namecount int;
DECLARE @forenames TABLE (name varchar(50));
INSERT INTO @forenames
VALUES ('...'),
       ('...'),
       ('...');
SELECT @namecount = COUNT(*) FROM @forenames;

/* list all people by addresses that are shared by people
   whose forenames are included in @forenames */
SELECT cd.*
FROM CensusData cd
  INNER JOIN (
    SELECT d.Address
    FROM CensusData d
      INNER JOIN @forenames f ON d.Forename = f.Name
    GROUP BY d.Address
    HAVING COUNT(DISTINCT d.Forename) >= @namecount
  ) filter ON cd.Address = filter.Address

/* same for family names */
SELECT cd.*
FROM CensusData cd
  INNER JOIN (
    SELECT d.[Family Name]
    FROM CensusData d
      INNER JOIN @forenames f ON d.Forename = f.Name
    GROUP BY d.Address
    HAVING COUNT(DISTINCT d.Forename) >= @namecount
  ) filter ON cd.[Family Name]= filter.[Family Name]

/* and so on fro other criteria */

You can also combine criteria, if needed.


select *
from dataset
where address = (
    select address
    from dataset
    where famly_name = 'Hannah' and forename = 'Gamble'
)


You could start with something like this assuming address is to be matched exactly. It needs to be tweaked as you would most likely see some dups

Select
    T1.*
From
    TableName T1
Inner Join
    TableName T2
On
    T1.Address = T2.Address
Where
    T1.ForeName = 'Julia'
0

精彩评论

暂无评论...
验证码 换一张
取 消