开发者

How to select based on maximum time interval

开发者 https://www.devze.com 2023-03-29 12:00 出处:网络
I have a table as follows in Postgres SQL 8.4: 1| John Smith | 2011-08-12 12:44:13.125+08 2| John Smith | 2011-08-16 08:38:57.968+08

I have a table as follows in Postgres SQL 8.4:

 1   | John Smith | 2011-08-12 12:44:13.125+08
 2   | John Smith | 2011-08-16 08:38:57.968+08
 3   | John Smith | 2011-08-16 08:38:58.062+08
 4   | Kenny Long | 2011-08-16 17:06:35.843+08
 5   | Kenny Long | 2011-08-16 17:06:35.90开发者_如何学Go6+08
 6   | Kenny Long | 2011-08-16 17:06:59.281+08
 7   | Kenny Long | 2011-08-16 17:07:00.234+08
 8   | Kenny Long | 2011-08-16 17:07:32.859+08
 9   | Kenny Long | 2011-08-16 17:08:00.437+08
 10  | Kenny Long | 2011-08-16 17:08:22.718+08
 11  | Kenny Long | 2011-08-16 17:08:22.781+08

I would like to select the columns based timestamp. Only one row is needed for those records that fall within 2 minutes from each other. For example records number 4 to 9 should return only row number 4 and ignore the rest of the rows.

How can I achieve this? Your help is greatly appreciated.

Thank you in advance.

Joe Liew


I've tried it with a recursive way. I'm not sure it's the better way, and I'm quite sure I should study some Window operations to reduce it.

But It worked on my test case. The goal is to start with one min timestamp per guy, then track which rows are to be deleted (within the 2 min range), and which row is the next valid one. Then at each iteration we continue from this valid row (one per guy).

So here is the query for table myschema.mytable with colums id,name,tm. Note that the level column is there just to track recursion and debug, not necessary:

WITH RECURSIVE mytmp(id,name,thetime,thelevel) AS (
   -- recursive query: 1st row
   -- starting point, one row of the table for each people
   -- with a subquery to get the min time with id, maybe a better way to do it
   (
    select myschema.mytable.id,myschema.mytable.name,myschema.mytable.tm as thetime,1 as thelevel
    from (
      select name,min(tm) as mintm
      from myschema.mytable
      group by name
     ) q,myschema.mytable
    WHERE myschema.mytable.name=q.name
    AND myschema.mytable.tm=q.mintm
   ORDER BY name ASC) -- end of starting point of recursive query
  UNION ALL 
    -- now the recursive part, starting from the 1st row and then again and again (loop)
    -- get descendants in the 2 minutes interval for every computed row already in mytmp
    --
    -- get from previous iterations targets, one per guy
    -- and track the 1st new valid row (>2min) for that guy
    -- removing bad rows (<2min) is easy, several way to do it
    -- keeping only one valid row (and not all the others is harder, limit and aggregates functions are restricted in recursive terms
    -- we must keep only one, as the future valid rows will depend on the 2 minutes range from this one
    -- maybe some window function could help me, but at least I've a working solution
    select myschema.mytable.id,myschema.mytable.name,myschema.mytable.tm as thetime,q2.thelevel
    FROM myschema.mytable,(
    -- here need to keep 1st true one
    select myschema.mytable.name,MIN(myschema.mytable.tm) as tm,mytmp2.thelevel +1 as thelevel
    FROM myschema.mytable,(
        select id,name,thetime,thelevel
        from mytmp 
    ) mytmp2
    -- hack: mytmp2 is useless, mytmp should have been used
    -- we create this indirection to avoid this message:
    -- "ERROR:  aggregate functions not allowed in a recursive query's recursive term"
    -- on the MIN functions
    -- I do not know why it worked :-)
    WHERE myschema.mytable.name=mytmp2.name
    -- future
    AND myschema.mytable.tm - mytmp2.thetime > INTERVAL '0'
    GROUP BY
        -- hack the group by, to make 2 groups
        -- the first one for rows in the 2 min range and the second one for others
        CASE WHEN ((myschema.mytable.tm - mytmp2.thetime) > INTERVAL '2 minutes') THEN 1 ELSE 2 END,
        myschema.mytable.name,mytmp2.thelevel,mytmp2.thetime
        -- then with the having we keep only the second group, containing the first valid > 2min row
        HAVING ((MIN(myschema.mytable.tm) - mytmp2.thetime) > INTERVAL '2 minutes')=true
    ) q2 -- q2contains 1st true row and all false rows for each people
    -- q2 is used to get the id, that we cannot have in a group by request
    WHERE q2.tm=myschema.mytable.tm
    AND q2.name=myschema.mytable.name
 ) -- end of recursive query
  SELECT *
  FROM mytmp
  ORDER BY name asc, thelevel asc,thetime asc
  -- LIMIT 100 -- to debug, avoid infinite loops

Another solution would maybe be using a stored procedure, doing the same things in a temporary table (take valid rows, delete rows in the 2min range, then take the next valid ones, etc), maybe easier to maintain.


Just some idea, not tested. Windowing function need 8.4 / later.

SELECT * FROM 
(SELECT
     name,
     tm, 
     case
         when lagname is NULL OR   -- first row of everything
              (name <> lagname) OR -- we have order by name, this is first row of this name
              (name = lagname AND lagtm + interval '2 minutes' >= tm) 
         then 1
         else 0
     end as flags
  FROM
  ( 
     SELECT name, 
         tm,
         lag(name) over (order by name,tm) as lagname,
         lag(tm) over (order by name,tm) as lagtm
      from "table"."table"
  ) AS lagtable
) AS blar
WHERE "flags" = 1
0

精彩评论

暂无评论...
验证码 换一张
取 消