My question is about how to write an SQL query to calculate the average time between successive events.
I have a small table:
event Name | Time
stage 1 | 10:01
stage 2 | 10:03
stage 3 | 10:06
stage 1 | 10:10
stage 2 | 10:15
stage 3 | 10:21
stage 1 | 10:22
stage 2 | 10:23
stage 3 | 10:29
I want to build a query that returns the average of the times between stage(i) and stage(i+1).
For example, the average time between stage 2 and stage 3 is 5:
(3+6+6)/3 = 5
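To make the expected result concrete, here is a minimal Python sketch that reproduces the arithmetic above from the sample rows. The `events` list and the `to_minutes` helper are illustrative names only, and the times are assumed to be HH:MM values on a single day.

```python
# Sample rows from the question, already ordered by time.
events = [
    ("stage 1", "10:01"), ("stage 2", "10:03"), ("stage 3", "10:06"),
    ("stage 1", "10:10"), ("stage 2", "10:15"), ("stage 3", "10:21"),
    ("stage 1", "10:22"), ("stage 2", "10:23"), ("stage 3", "10:29"),
]


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight (hypothetical helper)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Gap between each 'stage 2' row and the 'stage 3' row directly after it.
gaps = [
    to_minutes(t2) - to_minutes(t1)
    for (n1, t1), (n2, t2) in zip(events, events[1:])
    if n1 == "stage 2" and n2 == "stage 3"
]
average = sum(gaps) / len(gaps)
print(gaps, average)
```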
Aaaaand with a sprinkle of black magic:
select a.eventName, b.eventName, AVG(DATEDIFF(MINUTE, a.[Time], b.[Time])) as Average from
(select *, row_number() over (order by [time]) rn from events) a
join (select *, row_number() over (order by [time]) rn from events) b on (a.rn=b.rn-1)
group by
a.eventName, b.eventName
This will give you rows like:
stage3 stage1 2
stage1 stage2 2
stage2 stage3 5
The first column is the starting event, the second column is the ending event. If there is Event 3 right after Event 1, that will be listed as well. Otherwise you should provide some criteria as to which stage follows which stage, so the times are calculated only between those.
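Added: The row_number() window function is available in SQL Server, Oracle and PostgreSQL, but DATEDIFF and the [Time] bracket quoting are Transact-SQL syntax, so those parts need adapting on other databases. I haven't tested it and there could still be syntax errors. This will NOT work on MySQL versions before 8.0, which have no window functions.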
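For anyone who wants to see what the rn = rn - 1 self-join produces without a database at hand, here is a rough Python equivalent. It is only a sketch: the `rows` list and variable names are made up for illustration. Note that it averages as floats, whereas AVG over the integer result of DATEDIFF in SQL Server returns an integer, which is why the SQL output shows 2 for the first two pairs.

```python
from collections import defaultdict
from datetime import datetime

rows = [
    ("stage 1", "10:01"), ("stage 2", "10:03"), ("stage 3", "10:06"),
    ("stage 1", "10:10"), ("stage 2", "10:15"), ("stage 3", "10:21"),
    ("stage 1", "10:22"), ("stage 2", "10:23"), ("stage 3", "10:29"),
]

# Same role as row_number() over (order by [time]).
ordered = sorted(rows, key=lambda r: datetime.strptime(r[1], "%H:%M"))

# Pair every row with the one right after it (a.rn = b.rn - 1)
# and collect the minute differences per (start, end) pair.
diffs = defaultdict(list)
for (a_name, a_time), (b_name, b_time) in zip(ordered, ordered[1:]):
    delta = datetime.strptime(b_time, "%H:%M") - datetime.strptime(a_time, "%H:%M")
    diffs[(a_name, b_name)].append(int(delta.total_seconds() // 60))

# Equivalent of GROUP BY a.eventName, b.eventName with AVG(...).
averages = {pair: sum(v) / len(v) for pair, v in diffs.items()}
for pair, value in averages.items():
    print(pair, value)
```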
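-- "events" stands in for the real table name.
-- Rows are sorted in an inner query first, because rownum is assigned
-- before ORDER BY is applied within the same query block.
Select Avg(differ) From (
  Select s2.time - s1.time As differ
  From (Select rownum As r, inn.time
        From (Select time From events Order By time) inn) s1
  Join (Select rownum As r, inn.time
        From (Select time From events Order By time) inn) s2
    On s2.r = s1.r + 1
  Where mod(s1.r, 3) = 1
    And mod(s2.r, 3) = 2
);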
The parameters can be changed as the number of stages changes. This is currently set up to find the average between stages 1 and 2 from a 3 stage process.
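The positional idea (row number modulo the number of stages) can be sketched in Python as follows. This is an illustration under the same assumption as the query, namely that every cycle is complete and in order; `average_gap` and its parameters are hypothetical names.

```python
def average_gap(minutes, stage_count, from_stage):
    """Average gap between from_stage and the next stage, by row position.

    minutes: event times as minutes since midnight.
    from_stage: 1-based stage number, must be less than stage_count.
    """
    ordered = sorted(minutes)
    gaps = [
        ordered[i + 1] - ordered[i]
        for i in range(len(ordered) - 1)
        # Row number r = i + 1, so mod(r, stage_count) = from_stage
        # becomes i % stage_count == from_stage - 1.
        if i % stage_count == from_stage - 1
    ]
    return sum(gaps) / len(gaps)


# The question's sample times, converted to minutes since midnight.
sample = [601, 603, 606, 610, 615, 621, 622, 623, 629]
print(average_gap(sample, 3, 1))
print(average_gap(sample, 3, 2))
```

Changing `stage_count` and `from_stage` plays the same role as changing the mod() parameters in the query.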
EDIT a couple typos
Your table design is flawed. How can you tell which stage 1 goes with which stage 2? Without a way to do this, I do not think your query is possible.
The easiest way would be to order by time and use a cursor (tsql) for iterating over the data. Since cursors are evil it is advisable to fetch the data ordered by time into your application code and iterate there. There are probably other ways to do this in SQL but they will be very complicated and rely on non-standard language extensions.
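A minimal sketch of the application-side loop, assuming the rows have already been fetched ordered by time (for example with an ORDER BY on the time column). The `NEXT_STAGE` mapping and the function name are hypothetical; the mapping is one way to state which stage is expected to follow which.

```python
# Hypothetical rule for which stage follows which.
NEXT_STAGE = {"stage 1": "stage 2", "stage 2": "stage 3"}


def average_transitions(rows):
    """Single pass over (name, minutes) rows that are sorted by time."""
    totals = {}  # (from_stage, to_stage) -> (sum of gaps, count)
    previous = None
    for name, minute in rows:
        # Only count a gap when this row is the expected successor
        # of the row immediately before it.
        if previous is not None and NEXT_STAGE.get(previous[0]) == name:
            key = (previous[0], name)
            total, count = totals.get(key, (0, 0))
            totals[key] = (total + minute - previous[1], count + 1)
        previous = (name, minute)
    return {key: total / count for key, (total, count) in totals.items()}


rows = [
    ("stage 1", 601), ("stage 2", 603), ("stage 3", 606),
    ("stage 1", 610), ("stage 2", 615), ("stage 3", 621),
    ("stage 1", 622), ("stage 2", 623), ("stage 3", 629),
]
result = average_transitions(rows)
print(result)
```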
You don't say which flavour of SQL you want the answer for. This probably means you want the code in SQL Server (as [sql] commonly = [sql-server] in SO tag usage).
But just in case you (or some future seeker) are using Oracle, this kind of query is quite straightforward with analytic functions, in this case LAG(). Check it out:
SQL> select stage_range
          , avg(time_diff)/60 as average_time_diff_in_min
     from
     (
         select event_name
              , case when event_name = 'stage 2' then 'stage 1 to 2'
                     when event_name = 'stage 3' then 'stage 2 to 3'
                     else '!!!' end as stage_range
              , stage_secs - lag(stage_secs)
                    over (order by ts, event_name) as time_diff
         from
             ( select event_name
                    , ts
                    , to_number(to_char(ts, 'sssss')) as stage_secs
               from timings )
     )
     where event_name in ('stage 2','stage 3')
     group by stage_range
     /

STAGE_RANGE  AVERAGE_TIME_DIFF_IN_MIN
------------ ------------------------
stage 1 to 2               2.66666667
stage 2 to 3                        5

SQL>
The change of format in the inner query is necessary because I have stored the TIME column as a DATE datatype, so I convert it into seconds to make the mathematics clearer. An alternate solution would be to work with the Day to Second Interval datatype instead. But this solution is really all about LAG().
edit
In my take on this query I have explicitly not calculated the difference between a prior Stage 3 and a subsequent Stage 1. This is a matter of requirement.
WITH q AS
(
SELECT 'stage 1' AS eventname, CAST('2009-01-01 10:01:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 2' AS eventname, CAST('2009-01-01 10:03:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 3' AS eventname, CAST('2009-01-01 10:06:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 1' AS eventname, CAST('2009-01-01 10:10:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 2' AS eventname, CAST('2009-01-01 10:15:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 3' AS eventname, CAST('2009-01-01 10:21:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 1' AS eventname, CAST('2009-01-01 10:22:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 2' AS eventname, CAST('2009-01-01 10:23:00' AS DATETIME) AS eventtime
UNION ALL
SELECT 'stage 3' AS eventname, CAST('2009-01-01 10:29:00' AS DATETIME) AS eventtime
)
SELECT (
SELECT AVG(DATEDIFF(minute, '2009-01-01', eventtime))
FROM q
WHERE eventname = 'stage 3'
) -
(
SELECT AVG(DATEDIFF(minute, '2009-01-01', eventtime))
FROM q
WHERE eventname = 'stage 2'
)
This relies on the fact that you always have complete groups of the stages and they always go in the same order (that is, stage 1, then stage 2, then stage 3).
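Why subtracting the two averages works: when every stage 2 has exactly one matching stage 3, the average of the differences equals the difference of the averages. Here is a small Python check on the sample data (minutes since midnight; the variable names are illustrative). The last line mimics integer averaging, since AVG over the integer result of DATEDIFF is itself an integer in SQL Server; it happens to give the same answer here, but truncation could shift the result by one on other data.

```python
from fractions import Fraction

# Sample times as minutes since midnight.
stage2 = [603, 615, 623]
stage3 = [606, 621, 629]


def mean(values):
    """Exact arithmetic mean, to avoid float rounding in the comparison."""
    return Fraction(sum(values), len(values))


average_of_differences = mean([b - a for a, b in zip(stage2, stage3)])
difference_of_averages = mean(stage3) - mean(stage2)

# Integer averaging, as an integer AVG would do it.
truncated = sum(stage3) // len(stage3) - sum(stage2) // len(stage2)

print(average_of_differences, difference_of_averages, truncated)
```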
I can't comment, but I have to agree with HLGEM. While you can tell with the provided data set, the OP should be made aware that relying on only a single set of stages existing at one time may be too optimistic.
event Name | Time
stage 1 | 10:01
stage 2 | 10:03
stage 3 | 10:06
stage 1 | 10:10
stage 2 | 10:15
stage 3 | 10:21
stage 1 | 10:22
stage 2 | 10:23
stage 1 | 10:25 --- new stage 1
stage 2 | 10:28 --- new stage 2
stage 3 | 10:29
stage 3 | 10:34 --- new stage 3
We don't know the environment or what is creating the data. It is up to the OP to decide if the table is built correctly.
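To illustrate the concern, here is a Python sketch on the interleaved table above. The `run_id` values are purely my assumption about which rows belong together (the rows marked "new" are treated as a separate run); nothing in the original table records this. Pairing adjacent rows and pairing by run give different stage 2 to stage 3 averages.

```python
# (run_id, event name, minutes since midnight); run_id is hypothetical.
rows = [
    (1, "stage 1", 601), (1, "stage 2", 603), (1, "stage 3", 606),
    (2, "stage 1", 610), (2, "stage 2", 615), (2, "stage 3", 621),
    (3, "stage 1", 622), (3, "stage 2", 623),
    (4, "stage 1", 625), (4, "stage 2", 628),
    (3, "stage 3", 629),
    (4, "stage 3", 634),
]

# Naive approach: a stage 2 row directly followed by a stage 3 row.
naive = [
    t2 - t1
    for (_, n1, t1), (_, n2, t2) in zip(rows, rows[1:])
    if n1 == "stage 2" and n2 == "stage 3"
]

# With a run identifier, each stage 2 is matched to its own stage 3.
stage2 = {run: t for run, name, t in rows if name == "stage 2"}
stage3 = {run: t for run, name, t in rows if name == "stage 3"}
by_run = [stage3[run] - stage2[run] for run in sorted(stage2) if run in stage3]

print(naive, sum(naive) / len(naive))
print(by_run, sum(by_run) / len(by_run))
```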
Oracle would handle this with analytic functions, like Vilx's answer.
Try this:
Select Avg(e.Time - s.Time)
From Table s
Join Table e
On e.Time =
(Select Min(Time)
From Table
Where eventname = s.eventname
And time > s.Time)
And Not Exists
(Select * From Table
Where eventname = s.eventname
And time < s.Time)
For each record representing the start of a stage, this SQL joins it to the record that represents the end, takes the difference between the end time and the start time, and averages those differences. The Not Exists ensures that the intermediate result set of start records joined to end records only includes the start records as s, and the first join condition ensures that only the one end record (the one with the same name and the next time value after the start time) is joined to it.
To see the intermediate resultset after the join, but before the average is taken, run the following:
Select s.EventName,
s.Time StartTime, e.Time EndTime,
(e.Time - s.Time) Elapsed
From Table s
Join Table e
On e.Time =
(Select Min(Time)
From Table
Where eventname = s.eventname
And time > s.Time)
And Not Exists
(Select * From Table
Where eventname = s.eventname
And time < s.Time)