Get list with start and end values from table of datetimes_问答_开发者

Currently i have a table built up like this way

DeviceID      Timestamp            Value
----------------------------------------
Device1       1.1.2011 10:00:00    3
Device1       1.1.2011 10:00:01    4
Device1       1.1.2011 10:00:02    4
Device1       1.1.2011 10:00:04    3
Device1       1.1.2011 10:00:05    4
Device1       1.1.2011 14:23:14    8
Device1       1.1.2011 14:23:15    7
Device1       1.1.2011 14:23:17    4
Device1       1.1.2011 14:23:18    2

As you can see there are coming in some values from a device with a given timestamp (the column type is datetime).

The problem is that the device can be started and stopped at any point and there is no direct information within the data that a start or stop had occured. But from the list of given timestamps it is quite easy to tell when a start and stop had occured, due to the fact that whenever the timestamps of two rows is within five seconds, they belong to the same measurement.

Now i'd like to get out of this data a list like this:

DeviceID      Started              Ended
Device1       1.1.2011 10:00:00    1.1.2011 10:00:05
Device1       1.1.2011 14:23:14    1.1.2011 14:23:18

So any ideas how to do this in a fast way? All i can think about is using some kind of cursor and compare each datetime pair by hand. But i think this will get really slow cause we have to inspect each value in each row.

So is there any better SQL solution which won't work with cursors?

Update

Currently i tested all the given answers. And by reading they all look good and had some interesting approaches. Unfortunately all of them (so far) failed on the real data. The biggest problem seems to be the mass of the data (currently their are round about 3.5 millions entries in the table). Performing the given query only on a small subset leads to the expected results, but rolling the query onto the whole table just leads to a very bad performance.

I have to further test and examine if i can chunkify the data and only pass a part of the data to one of these given algorithms to get this thing rolling. But maybe one of you has another smart idea to get the results a little bit faster.

Update (More informations about structure)

Okay, these informations might help, too: Currently there are round about 3.5 million entries in the table. And here are the given column types and indizes:

_ID
- int
- Primary Key
- Grouped Index
- didn't mentioned this column in my example, cause it isn't needed for this query
DeviceID
- int
- not null
- Index
Timestamp
- datetime
- not null
- Index
Value
- several not indexed columns of differen开发者_开发知识库t types (int, real, tinyint)
- all can be null

Maybe this helps to improve your already (or new) solutions to the given problem.

-- Table var to store the gaps
declare @T table
(
  DeviceID varchar(10),
  PrevPeriodEnd datetime,
  NextPeriodStart datetime
)

-- Get the gaps
;with cte as 
(
  select *,
    row_number() over(partition by DeviceID order by Timestamp) as rn
  from data
)
insert into @T
select
  C1.DeviceID,
  C1.Timestamp as PrevPeriodEnd,
  C2.Timestamp as NextPeriodStart
from cte as C1
  inner join cte as C2
    on C1.rn = C2.rn-1 and
       C1.DeviceID = C2.DeviceID and
       datediff(s, C1.Timestamp, C2.Timestamp) > 5

-- Build islands from gaps in @T
;with cte1 as
(
  -- Add first and last timestamp to gaps
  select DeviceID, PrevPeriodEnd, NextPeriodStart
  from @T
  union all
  select DeviceID, max(TimeStamp) as PrevPeriodEnd, null as NextPeriodStart
  from data
  group by DeviceID
  union all
  select DeviceID, null as PrevPeriodEnd, min(TimeStamp) as PrevPeriodEnd
  from data
  group by DeviceID
),
cte2 as
(
  select *,
    row_number() over(partition by DeviceID order by PrevPeriodEnd) as rn
  from cte1
)
select
  C1.DeviceID,
  C1.NextPeriodStart as PeriodStart,
  C2.PrevPeriodEnd as PeriodEnd
from cte2 as C1
  inner join cte2 as C2
    on C1.DeviceID = C2.DeviceID and
       C1.rn = C2.rn-1
order by C1.DeviceID, C1.NextPeriodStart

I've played around with some datatypes and names (just because I can, and because timestamp is a reserved word), and can get your requested result using your sample data.

Sample data:

create table Measures (
    DeviceID int not null,
    Occurred datetime not null,
    Value int not null,
    constraint PK_Measures PRIMARY KEY (DeviceID,Occurred)
)
go
insert into Measures (DeviceID,Occurred,Value)
select 1,'2011-01-01T10:00:00',3 union all
select 1,'2011-01-01T10:00:01',4 union all
select 1,'2011-01-01T10:00:02',4 union all
select 1,'2011-01-01T10:00:04',3 union all
select 1,'2011-01-01T10:00:05',4 union all
select 1,'2011-01-01T14:23:14',8 union all
select 1,'2011-01-01T14:23:15',7 union all
select 1,'2011-01-01T14:23:17',4 union all
select 1,'2011-01-01T14:23:18',2

and now the query:

;with StartPeriods as (
    select m1.DeviceID,m1.Occurred as Started
    from Measures m1 left join Measures m2 on m1.DeviceID = m2.DeviceID and m2.Occurred < m1.Occurred and DATEDIFF(second,m2.Occurred,m1.Occurred) < 6
    where m2.DeviceID is null
), ExtendPeriods as (
    select DeviceID,Started,Started as Ended from StartPeriods
    union all
    select
        ep.DeviceID,ep.Started,m2.Occurred
    from
        ExtendPeriods ep
            inner join
        Measures m2
            on
                ep.DeviceID = m2.DeviceID and
                ep.Ended < m2.Occurred and
                DATEDIFF(SECOND,ep.Ended,m2.Occurred) < 6
)
select DeviceID,Started,MAX(Ended) from ExtendPeriods group by DeviceID,Started

The StartPeriods Common Table Expression (CTE) finds those rows from the Measures table that don't have a previous row within 5 seconds of them. The ExtendPeriods CTE then recursively extends these periods by finding new rows from Measures that occur up to 5 seconds after the current end of the found period.

We then find the rows where the end of the period was as far away from the start as possible.

Try this:

select DeviceID,MIN(Timestamp),MAX(Timestamp) 
          from @table group by DATEPART(hh,Timestamp),DeviceID

The basic idea for the below solution has been borrowed from this answer.

WITH data (DeviceID, Timestamp, Value) AS (
  SELECT 'Device1', CAST('1.1.2011 10:00:00' AS datetime), 3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:01',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:02',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:04',              3 UNION ALL
  SELECT 'Device1',      '1.1.2011 10:00:05',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:14',              8 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:15',              7 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:17',              4 UNION ALL
  SELECT 'Device1',      '1.1.2011 14:23:18',              2
),
ranked AS (
  SELECT
    *,
    rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Timestamp)
  FROM data
),
starts AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn + 1
      AND r1.Timestamp <= DATEADD(second, 5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
),
ends AS (
  SELECT
    r1.DeviceID,
    r1.Timestamp,
    rank = ROW_NUMBER() OVER (PARTITION BY r1.DeviceID ORDER BY r1.Timestamp)
  FROM ranked r1
    LEFT JOIN ranked r2 ON r1.DeviceID = r2.DeviceID
      AND r1.rn = r2.rn - 1
      AND r1.Timestamp >= DATEADD(second, -5, r2.Timestamp)
  WHERE r2.DeviceID IS NULL
)
SELECT
  s.DeviceID,
  Started = s.Timestamp,
  Ended = e.Timestamp
FROM starts s
  INNER JOIN ends e ON s.DeviceID = e.DeviceID AND s.rank = e.rank

Try this, though I'm not sure how well it will perform with lots of data

SELECT a.TS AS [StartTime], (SELECT TOP 1 c.TS FROM TestTime c WHERE c.TS >= a.TS AND
    NOT EXISTS(SELECT * FROM TestTime d WHERE d.TS > c.TS AND DATEDIFF(SECOND, c.TS, d.TS) <= 5) ORDER BY c.TS) AS [StopTime]
FROM TestTime a WHERE NOT EXISTS (SELECT * FROM TestTime b WHERE a.TS > b.TS AND DATEDIFF(SECOND, b.TS, a.TS) <= 5)

my table is called TestTime and the column is called TS so tweak it for your table. I've used the NOT EXISTS to check for a timestamp < the current record and within 5 seconds of it - so display if not found, i.e. a start time (or the first record in the table and then it will look for the lowest timestamp that is greater than any records found that is >= that timestamp (in case it's a single entry, so a start/stop one) and that again uses NOT EXISTS to check for a record that is greater than it and within 5 seconds - so, again, display if a record isn't found (only the 1st). You can probably tweak and improve this, but it might be a good basis.

Note that if it is still running it will list the last time found as the stop time for the last start event.

I haven't put a device name in here, for simplicity, so you would need to put that in the StopTime and WHERE clauses

DECLARE @t TABLE
(DeviceID      VARCHAR(10),
 1678071012    DATETIME,
 VALUE          INT
)

INSERT @t
SELECT 'Device1','20110101 10:00:00',    3
UNION SELECT 'Device1','20110101 10:00:01',    4
UNION SELECT 'Device1','20110101 10:00:02',    4
UNION SELECT 'Device1','20110101 10:00:04',   3
UNION SELECT 'Device1','20110101 10:00:05',    4
UNION SELECT 'Device1','20110101 14:23:14',    8
UNION SELECT 'Device1','20110101 14:23:15',    7
UNION SELECT 'Device1','20110101 14:23:17',    4
UNION SELECT 'Device1','20110101 14:23:18',    2


;WITH myCTE
AS
(
    SELECT DeviceID, 1678071012,
           ROW_NUMBER() OVER (PARTITION BY DeviceID
                              ORDER BY [TIMESTAMP]
                             ) AS rn
    FROM @t
)
, recCTE
AS
(
    SELECT DeviceID, 1678071012,  0 as groupID, rn FROM myCTE
    WHERE rn = 1

    UNION ALL

    SELECT r.DeviceID, g.1678071012,  CASE WHEN DATEDIFF(ss,r.1678071012, g.1678071012) <= 5 THEN r.groupID ELSE r.groupID + 1 END, g.rn 
    FROM recCTE AS r
    JOIN myCTE AS g
    ON g.rn = r.rn + 1
)
SELECT DeviceID, MIN(1678071012) AS [started], MAX(1678071012) AS ended
FROM recCTE
GROUP BY DeviceId, groupId
OPTION (MAXRECURSION 0);

You should be able to use window functions for this (assuming 15 minutes defines a new session below):

SELECT DeviceId,
       Timestamp,
       COALESCE((Timestamp - lag(Timestamp) OVER w) > interval '15 min', TRUE)
       as session_begins
       COALESCE((lead(Timestamp) OVER w - Timestamp) > interval '15 min', TRUE)
       as session_ends
FROM YourTable
WINDOW w AS (PARTITION BY DeviceId ORDER BY Timestamp);

Depending on your where clause, you might want to remove the coalesce/true part since the first/last row fetched may become invalid.

If you need only the boundaries, you could use the above in a subquery and group by DeviceId, session_begins, session_ends having session_begins or session_ends. Also, if you do this, don't forget to put the where clause in the subquery, rather than the main one, else you'll end up doing a seq scan on the whole table because of the window aggregate.