SQL Server logical grouping most recent time

I am looking for the best approach to this issue and have not had any luck so far. I am using Microsoft SQL Server 2008.

So here is sample data of a user:

===========================
Name Status   Date
===========================
BOB  Active   2011-03-07
BOB  Active   2011-03-11
BOB  Disabled 2011-03-15
BOB  Disabled 2011-03-21
BOB  Active   2011-03-23
BOB  Active   2011-03-28
===========================

I want this grouped so that it shows only when the user's status actually changed, not each later check where the status stayed the same. The query should return the following:

===========================
BOB  Active   2011-03-07
BOB  Disabled 2011-03-15
BOB  Active   2011-03-23
===========================

I have no control over how this data is created.

Any ideas on the best approach for getting this data the way I need? Keep in mind this is for thousands of records, so performance should be taken into consideration if possible. I can't think of a way to group with the MIN aggregate function without also grouping the two separate runs of "Active" together.

Is there a way to maybe loop through and make the data look like this for grouping and group on the last column? Would this even be the best approach?

=================================
Name Status   Date       Group
=================================
BOB  Active   2011-03-07 1
BOB  Active   2011-03-11 1
BOB  Disabled 2011-03-15 2
BOB  Disabled 2011-03-21 2
BOB  Active   2011-03-23 3
BOB  Active   2011-03-28 3
=================================

None of my searches have worked out. If someone could give me a keyword to search for to get me on the right path, that would be very much appreciated.

-- First record per name (the initial status)
select tb.name, tb.status, tb.date
from the_table tb
join (select name, min(date) as date from the_table group by name) t_aggr
  on tb.name = t_aggr.name and tb.date = t_aggr.date
union
-- Records whose status differs from the immediately preceding record
select tb2.name, tb2.status, tb2.date
from the_table tb1
join the_table tb2 on tb1.name = tb2.name and tb1.status <> tb2.status and tb2.date > tb1.date
-- tb3 looks for any record of the same name strictly between tb1 and tb2
left join the_table tb3 on tb1.name = tb3.name and tb1.date < tb3.date and tb3.date < tb2.date
where tb3.date is null

The first SELECT in the UNION gets the first record for each name. The second should get the records where the status changes. Test it before relying on it, as I have no SQL Server to try it on.

If you are using SQL Server 2005 or newer (which you are!), you can easily detect changes in status by ranking the dates and joining on the previous date. This will get you what you need:

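WITH rankedDates AS (
    -- Number each person's rows in date order
    SELECT name, status, date, row_number() OVER (PARTITION BY name ORDER BY date) AS dateRank
    FROM #myTable
)
SELECT curr.name, curr.status, curr.date
FROM rankedDates curr
-- Pair each row with the same person's immediately preceding row, if any
LEFT JOIN rankedDates prev
    ON curr.name = prev.name
    AND curr.dateRank = prev.dateRank+1
-- Keep the first row per person (no previous row) and rows where the status changed
WHERE prev.status IS NULL
    OR curr.status <> prev.status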

Some SQL to create the test data I used to get this working (I threw in Jake to make sure it handled two people OK):

SELECT 'BOB' AS name, 'Active' AS status, '2011-03-07' AS date  
INTO #myTable
UNION ALL
SELECT 'BOB', 'Active', '2011-03-11' UNION ALL
SELECT 'BOB', 'Disabled', '2011-03-15' UNION ALL
SELECT 'BOB', 'Disabled', '2011-03-21' UNION ALL
SELECT 'BOB', 'Active', '2011-03-23' UNION ALL
SELECT 'BOB', 'Active', '2011-03-28' 
--include below lines for more complex example
UNION ALL
SELECT 'JAKE', 'Active', '2011-01-11' UNION ALL
SELECT 'JAKE', 'Disabled', '2011-01-15' UNION ALL
SELECT 'JAKE', 'Disabled', '2011-05-21' UNION ALL
SELECT 'JAKE', 'Active', '2011-05-23' UNION ALL
SELECT 'JAKE', 'Active', '2011-07-28'
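
The Group column idea from the question can also be produced directly. This kind of problem is commonly called "gaps and islands", which may be the search keyword the question asks for. Below is a minimal sketch of one common approach, not a tested solution: subtract a per-status row number from a per-name row number, so that consecutive rows with the same status share the same difference. It assumes the #myTable test data above and that each name has at most one row per date. The grp alias is a name made up for this illustration, and its values are not 1, 2, 3 as in the question; they are only meaningful together with name and status.

```sql
WITH numbered AS (
    SELECT name, status, date,
           -- Position among all of this person's rows, minus position among
           -- this person's rows with the same status. The difference stays
           -- constant for as long as the status does not change.
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY date)
         - ROW_NUMBER() OVER (PARTITION BY name, status ORDER BY date) AS grp
    FROM #myTable
)
-- One row per run of identical status, reporting the date the run started
SELECT name, status, MIN(date) AS date
FROM numbered
GROUP BY name, status, grp
ORDER BY name, MIN(date);
```

For BOB, the six rows get the differences 0, 0, 2, 2, 2, 2. Grouping by status as well as grp separates the Disabled run from the second Active run, so the query yields the three start dates 2011-03-07, 2011-03-15 and 2011-03-23.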

Regarding performance, it would be better if each name were identified by an ID rather than by the name string, and an index on (name, date) would help as well.
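
A minimal sketch of such an index, assuming the real data lives in a permanent table. The table name the_table and the index name IX_the_table_name_date are placeholders, not names from the actual schema. Including status is an extra suggestion so the queries above can be answered from the index alone; whether it pays off depends on the real table and workload.

```sql
-- Hypothetical table and index names; adjust to the actual schema.
-- The key order (name, date) matches PARTITION BY name ORDER BY date.
CREATE NONCLUSTERED INDEX IX_the_table_name_date
    ON the_table (name, date)
    INCLUDE (status);  -- optional: lets the index cover the status lookups
```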