I have a large table (millions of 开发者_StackOverflow中文版rows) where I need to find groups of records based on the presence of a certain column value and where a specified 'timeout' has not occurred. I figure one approach would be to find across the entire table where these 'timeout' gaps have occurred.
Example table:
+----------------+------+ | time | base | +----------------+------+ | 1245184797.064 | a | | 1245184802.020 | a | | 1245184807.103 | b | | 1245184812.089 | b | | 1245184816.831 | b | | 1245184821.856 | a | | 1245184821.856 | a | | 1245184855.903 | a | | 1245184855.903 | b | | 1245184858.362 | b | | 1245184858.362 | b | | 1245184860.360 | a | | 1245184860.360 | a | | 1245184862.174 | a | | 1245184862.174 | b | | 1245185001.480 | b | | 1245185417.556 | a | | 1245185417.844 | a | | 1245185419.960 | b | | 1245185420.181 | b | +----------------+------+
Given this set, how would I quickly find the points in the table where base=a hasn't occurred for a given number of seconds (say 5).
To boil it down, my objective is to find spans of records where base=a HAS occurred consistently without timing out.
I think this will help you:
SELECT * FROM (
SELECT t1.[time],
t1.time - (SELECT MAX(time) FROM my_table t2 WHERE t2.time < t1.time and t2.base = 'a') AS timeout
FROM my_table t1
WHERE t1.base = 'a') d
WHERE timeout > 5
And don't forget to create index for this query to be more effective:
CREATE INDEX idx_my_table_time_base ON my_table (time, base)
One possibility, if you are using a database that supports windowing/analytic functions is something like this:
select * from (
select time,
base,
time - lag(time) over(partition by base order by time) as interval
from example) w
where w.interval > 5
This should be able to work from a single scan of a (base,time) index. It works on PostgreSQL 8.4 and I think should also work on SQL Server 2008 and Oracle 10.
One way to approach this is to check for "stretch heads", that is, occurrences of a base with more than 5 seconds since it's last occurrence. This example query joins the table on itself to filter out non-heads:
select head.*
from @t head
left join @t nohead
on head.base = nohead.base
and head.time - 5 < nohead.time and nohead.time < head.time
where nohead.base is null
order by head.[time]
For each row, the left join
searches for the same base within the last 5 seconds. The where nohead.base is null
clause says such a row may not exist. The effect is a list of when a 5+ second span without a base happens.
It won't list the last gap: you could explicitly add "end time" rows for each base:
<end time> a
<end time> b
...
to make the query check end-gaps.
精彩评论