How to develop a T-SQL subquery to select only one record each?

Developer https://www.devze.com 2023-04-05 09:33 Source: Internet
I am using SSMS 2008, trying to select just one row per client. I need to select the following columns: client_name, end_date, and program. Some clients have just one row, but others have multiple.

For those clients with multiple rows, they normally have different end_date and program. For instance:

CLIENT       PROGRAM        END_DATE
a            b              c
a            d              e
a            f              g
h            d              e
h            f              NULL

This is a really simplified version of the actual data. As you can see, different clients can be in the same program ("d"), but the same client cannot be in the same program more than once.

Also, the tricky thing is that end_date can be NULL, so when I tried selecting those clients with more than one row, I added a HAVING ... > 1 clause. But this eliminated all of my NULL end_date rows.

To sum up, I want one row per client. So those clients with only one row total + those clients listed above with the following criteria:

  • Select only the row where either the End_date is greatest or NULL. (In most cases the end_date is null for these clients).

How can I achieve this with as little logic as possible?


On SQL Server 2005 and up, you can use a Common Table Expression (CTE) combined with the ROW_NUMBER() function and its PARTITION BY clause. The CTE will "partition" your data by one criterion - in your case by Client, creating a "partition" for each separate client. ROW_NUMBER() will then number the rows in each partition, ordered by another criterion - here I created a DATETIME column, End_Date - assigning numbers from 1 on up, separately for each partition.
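
To make the numbering concrete, here is a minimal sketch that lists every row together with its row number, before any filtering. The table variable @demo and its dates are made up purely for illustration, and the multi-row VALUES syntax assumes SQL Server 2008 or later:

```sql
-- Made-up sample rows, for illustration only (multi-row VALUES needs SQL Server 2008+)
DECLARE @demo TABLE (Client CHAR(1), Program CHAR(1), End_Date DATETIME);

INSERT INTO @demo (Client, Program, End_Date)
VALUES ('a', 'b', '20090505'),
       ('a', 'd', '20100808'),
       ('h', 'd', '20090909');

-- Numbering restarts at 1 for each Client; within a client, the latest End_Date gets 1
SELECT Client, Program, End_Date,
       ROW_NUMBER() OVER (PARTITION BY Client ORDER BY End_Date DESC) AS RowNum
FROM @demo
ORDER BY Client, RowNum;
-- a, d, 2010-08-08 -> RowNum 1
-- a, b, 2009-05-05 -> RowNum 2
-- h, d, 2009-09-09 -> RowNum 1
```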

So in this case, ordering by End_Date DESC, the newest row gets numbered as 1 - and that's the fact I use when selecting from the CTE. I used the ISNULL() function here to assign those rows that have a NULL End_Date an arbitrary value to "get them in order". I wasn't quite sure if I understood your question properly: did you want to select the NULL rows over those with a given End_Date, or did you want to give precedence to an existing End_Date value over NULL?

This will select the most recent row for each client (for each "partition" of your data):

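-- Sample data; the multi-row VALUES syntax requires SQL Server 2008 or later
DECLARE @clients TABLE (Client CHAR(1), Program CHAR(1), End_Date DATETIME);

INSERT INTO @clients (Client, Program, End_Date)
VALUES ('a', 'b', '20090505'),
       ('a', 'd', '20100808'),
       ('a', 'f', '20110303'),
       ('h', 'd', '20090909'),
       ('h', 'f', NULL);

-- Number each client's rows, newest End_Date first; NULL is treated as the latest possible date
WITH LatestData AS
(
    SELECT Client, Program, End_Date,
           ROW_NUMBER() OVER (PARTITION BY Client
                              ORDER BY ISNULL(End_Date, '99991231') DESC) AS RowNum
    FROM @clients
)
-- Keep only the top-ranked row per client
SELECT Client, Program, End_Date
FROM LatestData
WHERE RowNum = 1;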

Results in an output of:

Client  Program  End_Date
   a       f     2011-03-03
   h       f     (NULL)
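
If you would rather give precedence to an existing End_Date over NULL, one possible variation is to drop the ISNULL() and order by End_Date DESC directly. SQL Server sorts NULL as the lowest value, so with DESC the NULL rows land last in each partition and are picked only when a client has no dated row at all. This is only a sketch against the same made-up sample data, not something tested against your real tables:

```sql
-- Same made-up sample data as above (multi-row VALUES needs SQL Server 2008+)
DECLARE @clients TABLE (Client CHAR(1), Program CHAR(1), End_Date DATETIME);

INSERT INTO @clients (Client, Program, End_Date)
VALUES ('a', 'b', '20090505'),
       ('a', 'd', '20100808'),
       ('a', 'f', '20110303'),
       ('h', 'd', '20090909'),
       ('h', 'f', NULL);

-- Without ISNULL(), NULL sorts lowest, so DESC puts NULL rows last per client
WITH LatestData AS
(
    SELECT Client, Program, End_Date,
           ROW_NUMBER() OVER (PARTITION BY Client ORDER BY End_Date DESC) AS RowNum
    FROM @clients
)
SELECT Client, Program, End_Date
FROM LatestData
WHERE RowNum = 1;
-- a, f, 2011-03-03
-- h, d, 2009-09-09
```

Client h now returns program d, because its dated row outranks the NULL one.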