I'm trying to determine the total elapsed time to complete a set of processes for a multithreaded application which keeps track of start and end times in a table. The easiest way to describe my problem is with an example.
Here's a dumbed-down version of the table (we'll call it processes
) I'm working with:
| id | start_date | end_date |
---------------------------------------------------
| 1 | 07/15/2011 12:00:00 | 07/15/2011 12:01:00 |
| 2 | 07/15/2011 12:00:00 | 07/15/2011 12:02:00 |
| 3 | 07/15/2011 12:00:00 | 07/15/2011 12:03:00 |
| 4 | 07/15/2011 12:01:00 | 07/15/2011 12:05:00 |
| 5 | 07/15/2011 12:01:00 | 07/15/2011 12:03:00 |
| 6 | 07/15/2011 12:03:00 | 07/15/2011 12:04:00 |
| 7 | 07/15/2011 12:03:00 | 07/15/2011 12:07:00 |
| 8 | 07/15/2011 12:03:00 | 07/15/2011 12:06:00 |
| 9 | 07/15/2011 12:04:00 | 07/15/2011 12:05:00 |
| 10 | 07/15/2011 12:05:00 | 07/15/2011 12:07:00 |
| 11 | 07/15/2011 12:08:00 | 07/15/2011 12:09:00 |
With such a small sample of data it's easy enough to visualize this (I'm assuming a thread can finish with a process and instantaneously pick up the next one with no overhead for the purposes of this question)
12:XX: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Thread1: 1---4---------------10------] 11--]
Thread2: 2-------] 开发者_开发知识库6---9---]
Thread3: 3-----------7---------------]
Thread4: 5-------8-----------]
And from there you can easily tell that the total time spent working on the 11 processes was 8 minutes.
The problem arises because I am dealing with thousands of records, and there are some periods of time where no processing is happening at all.
How can I get this result using a PL/SQL query selecting from the table?
This is one approach you could take. Generate the list of possible times (in you case, all the way up to seconds) and use "exists" or join it to your table to get the seconds for each ID .
*Edit : New example using exists instead of the Join.. more in line with the question that the query is trying to anser.. (On how many seconds was at least one active process running..?). Set Up data below the query...* '
select count(time1) from
(
/* All possible seconds in the time frame you want to track */
select trunc(sysdate) + level/(24*60*60) time1
from dual
connect by level <= (trunc(sysdate)-trunc(sysdate-1))*24*60*60
) all_times
where exists (
/* Atleast one process is running at that second...*/
select 1
from t1
where t1.start_time < all_times.time1
and t1.end_time >= all_times.time1
)
Count(time1)
11
For ease of understanding, I took only 4 cases/recordings. The first three recordings intersect and the last one is a complete disjoint recording.
create table t1(
id number,
start_time date,
end_time date
);
SQL> insert into t1 values ( 1,
2 to_date('07/15/2011 12:00:00','MM/DD/YYYY HH24:MI:SS'),
3 to_date('07/15/2011 12:00:05','MM/DD/YYYY HH24:MI:SS'));
1 row created.
SQL> insert into t1 values ( 1,
2 to_date('07/15/2011 12:00:02','MM/DD/YYYY HH24:MI:SS'),
3 to_date('07/15/2011 12:00:04','MM/DD/YYYY HH24:MI:SS'));
1 row created.
SQL> insert into t1 values ( 1,
2 to_date('07/15/2011 12:00:03','MM/DD/YYYY HH24:MI:SS'),
3 to_date('07/15/2011 12:00:06','MM/DD/YYYY HH24:MI:SS'));
1 row created.
SQL> insert into t1 values ( 1,
2 to_date('07/15/2011 12:00:15','MM/DD/YYYY HH24:MI:SS'),
3 to_date('07/15/2011 12:00:20','MM/DD/YYYY HH24:MI:SS'));
1 row created.
SQL> commit;
alter session set NLS_DATE_FORMAT = "MM/DD/YYYY HH24:MI:SS";
SQL> select * from t1;
ID START_TIME END_TIME
---------- ------------------- -------------------
1 07/15/2011 12:00:00 07/15/2011 12:00:05
1 07/15/2011 12:00:02 07/15/2011 12:00:04
1 07/15/2011 12:00:03 07/15/2011 12:00:06
1 07/15/2011 12:00:15 07/15/2011 12:00:20
This part gets you the list of all times (till seconds) on 15th July...
select trunc(sysdate) + level/(24*60*60) time1
from dual
connect by level <= (trunc(sysdate)-trunc(sysdate-1))*24*60*60
Join that to the main table and you'll get this...(each minute on which the process was running...)
SQL > break on id on start_time on end_time
select t1.id,
t1.start_time,
t1.end_time,
all_seconds.time1
from t1,
(select trunc(sysdate) + level/(24*60*60) time1
from dual
connect by level <= (trunc(sysdate)-trunc(sysdate-1))*24*60*60
) all_seconds
where all_seconds.time1 > t1.start_time
and all_seconds.time1 <= t1.end_time
ID START_TIME END_TIME TIME1
---------- ------------------- ------------------- -------------------
1 07/15/2011 12:00:00 07/15/2011 12:00:05 07/15/2011 12:00:01
07/15/2011 12:00:02
07/15/2011 12:00:03
07/15/2011 12:00:04
07/15/2011 12:00:05
2 07/15/2011 12:00:02 07/15/2011 12:00:04 07/15/2011 12:00:03
07/15/2011 12:00:04
3 07/15/2011 12:00:03 07/15/2011 12:00:06 07/15/2011 12:00:04
07/15/2011 12:00:05
07/15/2011 12:00:06
4 07/15/2011 12:00:15 07/15/2011 12:00:20 07/15/2011 12:00:16
07/15/2011 12:00:17
07/15/2011 12:00:18
07/15/2011 12:00:19
07/15/2011 12:00:20
From here... All you need is the distinct count of the minutes. If you have another column to group by, you'll need to modify this query at the end. (example... Projects,start dates and end dates..for which you want the total working days.., you'll need to group by project_id (??))
select count(distinct(time1)) total_minutes
from (
select t1.id,
t1.start_time,
t1.end_time,
all_seconds.time1
from t1,
(select trunc(sysdate) + level/(24*60*60) time1
from dual
connect by level <= (trunc(sysdate)-trunc(sysdate-1))*24*60*60
) all_seconds
where all_seconds.time1 > t1.start_time
and all_seconds.time1 <= t1.end_time
)
/
TOTAL_MINUTES
-------------
11
The problem with this solution is that it is (more or less) brute-force. You have to generate the list of all times possible (If you have data spans across multiple dates and you need seconds precision processing time, the generated data will be huge). Depending on how big your actual table is, the join might not be performant.
I am still trying to see if this can be solved elegantly with analytics. Will post it here if I come up with a better solution.
Unless I'm missing something, all you need is the difference between the lowest start date and the highest end date:
SELECT (MAX(end_date) - MIN(start_time))*24*60 AS elapsed_time
FROM processes;
This'll return the time elapsed in minutes.
Based on the comment, it seems that you want the sum of the time elapsed for each process? If that's the case, it's just a smallish variation on the earlier answer:
SELECT sum(end_date - start_time)*24*60 AS elapsed_time
FROM processes;
Obviously this answer doesn't deal with identify which processes to get the total for, but that's not addressed in the question at all, so I'm assuming that's not an issue.
I see now that this doesn't provide the answer that you're looking for either, because it counts time worked by multiple processes multiple times. I believe the following will work. This is based on @Rajesh's sample data from the other answer.
SELECT SUM (new_end - new_start) * 24 * 60 * 60
FROM (SELECT DISTINCT
MIN (LEAST (a.start_time, b.start_time)) AS new_start,
MAX (GREATEST (a.end_time, b.end_time)) AS new_end
FROM (SELECT ROWNUM AS rnum, t1.* FROM t1) a
INNER JOIN t1 b
ON a.start_time <= b.end_time AND a.end_time >= b.start_time
GROUP BY rnum)
Basically, we're joining each to every other row that overlaps with it and taking the earliest start time and latest end time from each of those pairings. Once we have that, we can use the distinct set to remove the duplicates, which will give us the distinct time periods.
I believe there is still one flaw here, which is cascading time periods: period A overlaps with period B and period B overlaps with period C, but period C does not overlap with period A. I think this issue is solvable, I just haven't quite figured it out yet.
Okay, one more time: rather than joining once between the table and itself, this version uses a recursive join to follow all of the cascades to their end point.
SELECT SUM (new_end - new_start) * 24 * 60 * 60
FROM (SELECT DISTINCT MIN (start_time) AS new_start,
MAX (end_time) AS new_end
FROM (SELECT start_time,
end_time,
CONNECT_BY_ROOT rnum AS rnum
FROM (SELECT ROWNUM AS rnum, t1.* FROM t1) a
CONNECT BY NOCYCLE
a.start_time <= PRIOR end_time
AND a.end_time >= PRIOR start_time)
GROUP BY rnum);
While @Allan's answer does solve my problem, it's horribly slow on large data sets. It's my fault though, since I asked specifically for a query instead of just a solution.
Here's the script I ended up using.
DECLARE
start_time DATE;
end_time DATE;
first_loop boolean := TRUE;
duration NUMBER := 0;
CURSOR client_data IS
SELECT start_date s, end_date e
FROM table_name;
BEGIN
FOR rec IN client_data LOOP
IF ( first_loop ) THEN
first_loop := FALSE;
start_time := rec.s;
end_time := rec.e;
ELSE
IF ( rec.s > end_time ) THEN
duration := duration + (end_time - start_time);
start_time := rec.s;
end_time := rec.e;
ELSIF ( rec.e > end_time ) THEN
end_time := rec.e;
END IF;
END IF;
END LOOP;
duration := duration + (end_time - start_time);
dbms_output.put_line(duration*24*60*60 || ' seconds');
END;
精彩评论