Query to calculate average time between successive events

My question is about how to write an SQL query to calculate the average time between successive events.

I have a small table:

event Name    |    Time

stage 1       |    10:01
stage 2       |    10:03
stage 3       |    10:06
stage 1       |    10:10
stage 2       |    10:15
stage 3       |    10:21
stage 1       |    10:22
stage 2       |    10:23
stage 3       |    10:29

I want to build a query that returns the average of the times between stage(i) and stage(i+1).

For example, the average time between stage 2 and stage 3 is 5:

(3+6+6)/3 =  5

Aaaaand with a sprinkle of black magic:

select a.eventName, b.eventName, AVG(DATEDIFF(MINUTE, a.[Time], b.[Time])) as Average from
     (select *, row_number() over (order by [time]) rn from events) a
join (select *, row_number() over (order by [time]) rn from events) b on (a.rn=b.rn-1)
group by
a.eventName, b.eventName

This will give you rows like:

stage3  stage1  2
stage1  stage2  2
stage2  stage3  5

The first column is the starting event, the second column is the ending event. If a stage 3 ever comes right after a stage 1, that pair will be listed as well. If that is not what you want, you need to provide some criteria for which stage follows which, so that the times are calculated only between those pairs.
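For anyone who wants to try the idea without a SQL Server instance, here is a rough sketch of the same ROW_NUMBER() self-join run through Python's sqlite3 module. It is not the query above verbatim: the table layout (`events`, `event_name`, `minute_of_day`) is my own assumption, times are stored as minutes past midnight so that plain subtraction replaces DATEDIFF, and it needs SQLite 3.25 or later for window functions. Note that SQLite's AVG returns a float, so stage 1 to stage 2 comes out as roughly 2.67, whereas SQL Server's AVG over integers truncates to the 2 shown above.

```python
import sqlite3

# Sample rows from the question. Times are stored as minutes past midnight
# (an assumption of this sketch, so that subtraction is plain arithmetic).
ROWS = [
    ("stage 1", 601), ("stage 2", 603), ("stage 3", 606),
    ("stage 1", 610), ("stage 2", 615), ("stage 3", 621),
    ("stage 1", 622), ("stage 2", 623), ("stage 3", 629),
]

# Number the rows by time, then join each row to the one that follows it.
QUERY = """
WITH numbered AS (
    SELECT event_name, minute_of_day,
           ROW_NUMBER() OVER (ORDER BY minute_of_day) AS rn
    FROM events
)
SELECT a.event_name AS start_event,
       b.event_name AS end_event,
       AVG(b.minute_of_day - a.minute_of_day) AS average
FROM numbered a
JOIN numbered b ON a.rn = b.rn - 1
GROUP BY a.event_name, b.event_name
"""


def average_gaps(rows):
    """Return {(start_event, end_event): average gap in minutes}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_name TEXT, minute_of_day INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = {(start, end): avg for start, end, avg in conn.execute(QUERY)}
    conn.close()
    return result


gaps = average_gaps(ROWS)
for pair, avg in sorted(gaps.items()):
    print(pair, avg)
```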

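Added: The ROW_NUMBER() self-join itself should carry over to other databases that support window functions (Oracle and PostgreSQL, for example), but DATEDIFF and the [Time] bracket quoting are Transact-SQL, so they would need that dialect's equivalents. I haven't tested it, and there could still be syntax errors. This will NOT work on MySQL versions before 8.0, which have no window functions.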

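-- "table" is a placeholder for your table name.
-- rownum is assigned before ORDER BY within a single query block,
-- so the rows are sorted in an inner query and numbered outside it.
-- If time is a DATE, the subtraction yields a number of days.
Select Avg(s2.time - s1.time) as differ
From (Select rownum as r, inn.time
      From (Select time From table Order By time) inn) s1
Join (Select rownum as r, inn.time
      From (Select time From table Order By time) inn) s2
  On mod(s2.r, 3) = 2 and s2.r = s1.r + 1
Where mod(s1.r, 3) = 1;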

The parameters can be changed as the number of stages changes. This is currently set up to find the average between stages 1 and 2 from a 3 stage process.


Your table design is flawed. How can you tell which stage 1 goes with which stage 2? Without a way to do this, I do not think your query is possible.

The easiest way would be to order by time and use a cursor (T-SQL) to iterate over the data. Since cursors are evil, it is advisable to fetch the data ordered by time into your application code and iterate there. There are probably other ways to do this in SQL, but they will be very complicated and rely on non-standard language extensions.
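To illustrate the application-code route, here is a minimal sketch in Python. It assumes the rows have already been fetched as (event name, "HH:MM") pairs all falling on the same day; the function names `to_minutes` and `average_gaps` are made up for this example.

```python
from collections import defaultdict


def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes past midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def average_gaps(events):
    """Average the gap between each event and the next one in time order.

    events: iterable of (event_name, 'HH:MM') pairs, in any order.
    Returns {(previous_name, next_name): average gap in minutes}.
    """
    totals = defaultdict(lambda: [0, 0])  # pair -> [sum of gaps, count]
    previous = None
    for name, hhmm in sorted(events, key=lambda e: to_minutes(e[1])):
        minute = to_minutes(hhmm)
        if previous is not None:
            entry = totals[(previous[0], name)]
            entry[0] += minute - previous[1]
            entry[1] += 1
        previous = (name, minute)
    return {pair: total / count for pair, (total, count) in totals.items()}


table = [
    ("stage 1", "10:01"), ("stage 2", "10:03"), ("stage 3", "10:06"),
    ("stage 1", "10:10"), ("stage 2", "10:15"), ("stage 3", "10:21"),
    ("stage 1", "10:22"), ("stage 2", "10:23"), ("stage 3", "10:29"),
]
averages = average_gaps(table)
print(averages[("stage 2", "stage 3")])
```

Like any approach that simply pairs neighbouring rows, this only gives meaningful numbers when one run of stages finishes before the next begins.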

You don't say which flavour of SQL you want the answer for. This probably means you want the code in SQL Server (as [sql] commonly = [sql-server] in SO tag usage).

But just in case you (or some future seeker) are using Oracle, this kind of query is quite straightforward with analytic functions, in this case LAG(). Check it out:

SQL> select stage_range
          , avg(time_diff)/60 as average_time_diff_in_min
     from
         (
             select event_name
                    , case when event_name = 'stage 2' then 'stage 1 to 2'
                           when event_name = 'stage 3' then 'stage 2 to 3'
                           else '!!!' end as stage_range
                    , stage_secs - lag(stage_secs)
                                   over (order by ts, event_name) as time_diff
             from
                 ( select event_name
                          , ts
                          , to_number(to_char(ts, 'sssss')) as stage_secs
                   from timings )
         )
     where event_name in ('stage 2','stage 3')
     group by stage_range
     /

STAGE_RANGE  AVERAGE_TIME_DIFF_IN_MIN
------------ ------------------------
stage 1 to 2               2.66666667
stage 2 to 3                        5

SQL>

The change of format in the inner query is necessary because I have stored the TIME column as a DATE datatype, so I convert it into seconds past midnight to make the mathematics clearer. An alternative solution would be to work with the INTERVAL DAY TO SECOND datatype instead. But this solution is really all about LAG().

edit

In my take on this query I have explicitly not calculated the difference between a prior Stage 3 and a subsequent Stage 1. This is a matter of requirement.
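If you don't have Oracle to hand, the same LAG() idea can be tried out with Python's sqlite3 module. This is only a sketch under my own assumptions: the `timings` table here stores the time as integer minutes past midnight in a column I've called `minute_of_day` (so there is no seconds conversion), and window functions need SQLite 3.25 or later.

```python
import sqlite3

ROWS = [
    ("stage 1", 601), ("stage 2", 603), ("stage 3", 606),
    ("stage 1", 610), ("stage 2", 615), ("stage 3", 621),
    ("stage 1", 622), ("stage 2", 623), ("stage 3", 629),
]

# LAG() fetches the previous row's time, so each row carries the gap to
# its predecessor. Stage 1 rows are dropped afterwards, which leaves out
# the stage 3 to stage 1 gaps.
QUERY = """
SELECT stage_range, AVG(time_diff) AS average_minutes
FROM (
    SELECT event_name,
           CASE event_name
               WHEN 'stage 2' THEN 'stage 1 to 2'
               WHEN 'stage 3' THEN 'stage 2 to 3'
           END AS stage_range,
           minute_of_day
               - LAG(minute_of_day) OVER (ORDER BY minute_of_day, event_name)
               AS time_diff
    FROM timings
)
WHERE event_name IN ('stage 2', 'stage 3')
GROUP BY stage_range
ORDER BY stage_range
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timings (event_name TEXT, minute_of_day INTEGER)")
conn.executemany("INSERT INTO timings VALUES (?, ?)", ROWS)
averages = dict(conn.execute(QUERY).fetchall())
conn.close()

for stage_range, minutes in averages.items():
    print(stage_range, minutes)
```

The filter on event_name sits in the outer query on purpose: LAG() has to see the stage 1 rows in order to measure the stage 1 to stage 2 gap.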

WITH    q AS
        (
        SELECT  'stage 1' AS eventname, CAST('2009-01-01 10:01:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 2' AS eventname, CAST('2009-01-01 10:03:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 3' AS eventname, CAST('2009-01-01 10:06:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 1' AS eventname, CAST('2009-01-01 10:10:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 2' AS eventname, CAST('2009-01-01 10:15:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 3' AS eventname, CAST('2009-01-01 10:21:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 1' AS eventname, CAST('2009-01-01 10:22:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 2' AS eventname, CAST('2009-01-01 10:23:00' AS DATETIME) AS eventtime
        UNION ALL
        SELECT  'stage 3' AS eventname, CAST('2009-01-01 10:29:00' AS DATETIME) AS eventtime
        )
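-- 1.0 * forces a decimal average; AVG over an integer expression
-- returns a truncated integer in SQL Server.
SELECT  (
        SELECT  AVG(1.0 * DATEDIFF(minute, '2009-01-01', eventtime))
        FROM    q
        WHERE   eventname = 'stage 3'
        ) - 
        (
        SELECT  AVG(1.0 * DATEDIFF(minute, '2009-01-01', eventtime))
        FROM    q
        WHERE   eventname = 'stage 2'
        )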

This relies on the fact that you always have complete groups of the stages and they always go in the same order (that is, stage 1 then stage 2 then stage 3).
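A quick way to convince yourself of why this works, and why the complete-groups condition matters, is to check the arithmetic outside the database. The sketch below is plain Python using the question's data as minutes past midnight; the variable names are mine. With complete groups the difference of the two averages equals the average of the pairwise differences, and dropping a single stage 3 row breaks that.

```python
from fractions import Fraction

stage2 = [603, 615, 623]  # 10:03, 10:15, 10:23 as minutes past midnight
stage3 = [606, 621, 629]  # 10:06, 10:21, 10:29


def mean(values):
    """Exact arithmetic mean, to avoid floating-point noise."""
    return Fraction(sum(values), len(values))


# Complete groups: both ways of computing the answer agree.
difference_of_means = mean(stage3) - mean(stage2)
mean_of_differences = mean([end - start for start, end in zip(stage2, stage3)])

# Incomplete last group (its stage 3 row is missing): the shortcut no
# longer measures the stage 2 to stage 3 gap.
incomplete = mean(stage3[:2]) - mean(stage2)

print(difference_of_means, mean_of_differences, incomplete)
```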

I can't comment, but I have to agree with HLGEM. While you can tell with the provided data set, the OP should be made aware that relying on only a single set of stages existing at one time may be too optimistic.


event Name    |    Time

stage 1       |    10:01
stage 2       |    10:03
stage 3       |    10:06
stage 1       |    10:10
stage 2       |    10:15
stage 3       |    10:21
stage 1       |    10:22
stage 2       |    10:23
stage 1       |    10:25     --- new stage 1
stage 2       |    10:28     --- new stage 2
stage 3       |    10:29
stage 3       |    10:34     --- new stage 3

We don't know the environment or what is creating the data. It is up to the OP to decide if the table is built correctly.

Oracle would handle this with analytic functions, like Vilx's answer.

Try this:

   Select Avg(e.Time - s.Time)
   From Table s
     Join Table e 
         On e.Time = 
             (Select Min(Time)
              From Table
              Where eventname = s.eventname 
                 And time > s.Time)
         And Not Exists 
             (Select * From Table
              Where eventname = s.eventname 
                 And time < s.Time)

For each record representing the start of a stage, this SQL joins it to the record which represents the end, takes the difference between the end time and the start time, and averages those differences. The Not Exists ensures that the intermediate result set of start records joined to end records only includes the start records as s, and the first join condition ensures that only the one end record (the one with the same name and the next time value after the start time) is joined to it.

To see the intermediate result set after the join, but before the average is taken, run the following:

   Select s.EventName,
       s.Time StartTime, e.Time EndTime, 
       (e.Time - s.Time) Elapsed
   From Table s
     Join Table e 
         On e.Time = 
             (Select Min(Time)
              From Table
              Where eventname = s.eventname 
                 And time > s.Time)
         And Not Exists 
             (Select * From Table
              Where eventname = s.eventname 
                 And time < s.Time)