开发者

How to find a gap in a time ordered table where a given column does not have a certain value for a specified interval

开发者 https://www.devze.com 2022-12-13 13:55 出处:网络
I have a large table (millions of 开发者_StackOverflow中文版rows) where I need to find groups of records based on the presence of a certain column value and where a specified \'timeout\' has not occur

I have a large table (millions of 开发者_StackOverflow中文版rows) where I need to find groups of records based on the presence of a certain column value and where a specified 'timeout' has not occurred. I figure one approach would be to find across the entire table where these 'timeout' gaps have occurred.

Example table:

+----------------+------+
| time           | base |
+----------------+------+
| 1245184797.064 | a    |
| 1245184802.020 | a    |
| 1245184807.103 | b    |
| 1245184812.089 | b    |
| 1245184816.831 | b    |
| 1245184821.856 | a    |
| 1245184821.856 | a    |
| 1245184855.903 | a    |
| 1245184855.903 | b    |
| 1245184858.362 | b    |
| 1245184858.362 | b    |
| 1245184860.360 | a    |
| 1245184860.360 | a    |
| 1245184862.174 | a    |
| 1245184862.174 | b    |
| 1245185001.480 | b    |
| 1245185417.556 | a    |
| 1245185417.844 | a    |
| 1245185419.960 | b    |
| 1245185420.181 | b    |
+----------------+------+

Given this set, how would I quickly find the points in the table where base=a hasn't occurred for a given number of seconds (say 5).

To boil it down, my objective is to find spans of records where base=a HAS occurred consistently without timing out.


I think this will help you:

SELECT * FROM (
    SELECT t1.[time],
           t1.time - (SELECT MAX(time) FROM my_table t2 WHERE t2.time < t1.time and t2.base = 'a') AS timeout
    FROM my_table t1
    WHERE t1.base = 'a') d
WHERE timeout > 5

And don't forget to create index for this query to be more effective:

CREATE INDEX idx_my_table_time_base ON my_table (time, base)


One possibility, if you are using a database that supports windowing/analytic functions is something like this:

select * from (
    select time,
           base,
           time - lag(time) over(partition by base order by time) as interval
    from example) w
where w.interval > 5

This should be able to work from a single scan of a (base,time) index. It works on PostgreSQL 8.4 and I think should also work on SQL Server 2008 and Oracle 10.


One way to approach this is to check for "stretch heads", that is, occurrences of a base with more than 5 seconds since it's last occurrence. This example query joins the table on itself to filter out non-heads:

select    head.* 
from      @t head
left join @t nohead 
on        head.base = nohead.base 
and       head.time - 5 < nohead.time and nohead.time < head.time
where     nohead.base is null
order by  head.[time]

For each row, the left join searches for the same base within the last 5 seconds. The where nohead.base is null clause says such a row may not exist. The effect is a list of when a 5+ second span without a base happens.

It won't list the last gap: you could explicitly add "end time" rows for each base:

<end time>     a
<end time>     b
...

to make the query check end-gaps.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号