Getting almost distinct rows_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-02-07 17:30 出处：网络

I have a table like this C1C2C3 MikeLondon578 MikeBonn578 JaneMadrid245 BillyParis345 JaneRome245 And I nee开发者_开发问答d a query that gives me:

I have a table like this

C1             C2            C3
Mike           London        578
Mike           Bonn          578
Jane           Madrid        245
Billy          Paris         345
Jane           Rome          245

And I nee开发者_开发问答d a query that gives me:

C1             C2            C3
Mike           London        578
Jane           Madrid        245
Billy          Paris         345

That is, a query that gives me something like a distinct on C1, ignoring the next occurrences of the same value on C1.

EDIT: Please excuse me, this was just a quick sample and somewhat it seems to induce some of you to think that C3 matters, I'm editing it to make it look more like the real table, which has about 50 columns, and the problematic rows all identic except for a value, which can be discarded.

If you don't care from which record the data is, you could just write it as:

SELECT C1, min(C2), min(C3)
FROM table
GROUP BY C1

The problem here is that min(C2) and min(C3) could actually mix data from different records.

If you had a primary key, you could avoid id easily:

SELECT C1, C2, C3
FROM table t
WHERE id IN (
  SELECT min(t2.id) 
  FROM table t2
  GROUP BY t2.C1)

There is really no such simple concept as "next occurrences" in SQL because the sets/relations are by default unordered. You must explicitly state how the rows are to be ordered with an ORDER BY clause and then select from that ordered relation the row or rows you want (using TOP in SQL Server 2000). You don't appear to be sorting by C3 descending (since Jane has a 346 and you want her 245). What tacit order-by is implicit in your word "next" (i.e. you want the first row per distinct person) ? How do you wish to define first in this query? Do you want each person's lowest C3 value? If so you could group by person taking the min(c3) in an inline view and join that inline view to another inline view where you have selected the distinct C1.

Use the RANK() OVER PARITION like this (2005, 2008):

declare @table  as table (c1 nvarchar(10), c2 nvarchar(10), c3 int, id int identity(1,1))

insert into @table values
('Mike',           'London',        578),
('Mike',           'Bonn',          234),
('Jane',           'Madrid',        245),
('Billy',          'Paris',         345),
('Jane',           'Rome',          346)

select c1, c2, c3 from 
( select id, c1, c2, c3, RANK() over ( partition by c1 order by id) as Rank from @table) tmp
where tmp.Rank = 1
order by id

Use an interesting WHERE-clause like this (2000):

select t2.*
from
@table t2
where 
(select COUNT(*) from @table t where t.c1=t2.c1 and (t2.c2 > t.c2 or (t2.c2 = t.c2 and t2.c3 > t.c3))) = 1
union
select * from @table where c1 in (select c1 from @table group by c1 having COUNT(*) = 1)

The ordering is different from above, but you'll have to sort that out in your real world data.

I am more than likely being silly but distinct takes the distinct combination of all the columns in the select.

To acheive this you would need more data, something to determine which row came first.

Here is some code that i whipped...

DECLARE @TBL TABLE(

    ID INT IDENTITY(1,1),
    C1 VARCHAR(100),
    C2 VARCHAR(100),
    C3 INT

)
INSERT INTO @TBL VALUES ('Mike','London',578)
INSERT INTO @TBL VALUES ('Mike','Bonn',234)
INSERT INTO @TBL VALUES ('Jane','Madrid',245)
INSERT INTO @TBL VALUES ('Billy','Paris',345)
INSERT INTO @TBL VALUES ('Jane','Rome',346)

SELECT * FROM @TBL T
WHERE
ID = (SELECT MIN(ID) FROM @TBL CHILD WHERE CHILD.C1 = T.C1)
GROUP BY ID,C1,C2,C3

Hope this helps?