开发者

How do I resolve aggregate data into unique entries, eliminating duplicates, while identifying preferred records within the duplicates

开发者 https://www.devze.com 2023-03-16 14:19 出处:网络
I have a table, derived from several different tables, with several similar (near duplicate) rows and need to resolve similar rows into single rows. Using DISTINCT or UNION resolves the rows but in an

I have a table, derived from several different tables, with several similar (near duplicate) rows and need to resolve similar rows into single rows. Using DISTINCT or UNION resolves the rows but in an arbitrary manner. I want to phrase rules for preferring one row over the other. For example:

Table: Foo  
FieldA, FieldB, FieldC, FieldD, FieldE  
abc123, 123abc, 开发者_如何学运维short, short, extra  
abc123, 123abc, Longer, short, extra  
abc123, 123abc, short, Longer, extra  
abc123, 123abc, Longer, Longer, extra  
abc123, 123abc, short, short, extra  

The relationships are:

  1. FieldA and FieldB form uniqueness (i.e. 1 row for each unique set of these two fields)
  2. The longer FieldC and FieldD are the better
  3. I need to perform a simple WHERE clause on FieldE Performing a GROUP BY can resolve to unique FiledA & FieldB combinations (relationship 1) but how do I resolve which of the rows within the grouping get picked.

The desired result from the table above would be the 4th row:

abc123, 123abc, Longer, Longer, extra


you will use a combination of GROUP BY and HAVING count(*) > 1 clauses.


Ok here's the setup (using your elegantly named sample columns):

CREATE TABLE dbo.Foo
    (
    FieldA varchar(50) NOT NULL,
    FieldB varchar(50) NOT NULL,
    FieldC varchar(50) NOT NULL,
    FieldD varchar(50) NOT NULL,
    FieldE varchar(50) NOT NULL
    )  
go

insert into foo values('abc123', '123abc', '01', '01', '')
insert into foo values('abc123', '123abc', '012', '012', '')
insert into foo values('abc123', '123abc', '0123', '01', '')
insert into foo values('abc123', '123abc', '01234567', '01', '')
insert into foo values('abc123', '123abc', '012345', '012345', '')
insert into foo values('def123', '123def', '012345', '012345', '')
insert into foo values('def123', '123def', '', '012345', '')

And the solution:

select * from
(
select *, row_number() over(partition by fielda, fieldb order by len(fieldc) + len(fieldd) desc) seq
from foo
) ordered
where seq = 1

This will give you unique combinations of your first two columns, and the longest (defined as length of col3 + the length of col4)

Pull out the where clause to see the values it came up for all rows.

0

精彩评论

暂无评论...
验证码 换一张
取 消