开发者

How to find N Consecutive records in a table using SQL

开发者 https://www.devze.com 2022-12-28 04:25 出处:网络
I have the following Table definition with sample data. In the following table, Customer Product & Date are key fields

I have the following Table definition with sample data. In the following table, Customer Product & Date are key fields

Table One
Customer   Product    Date         SALE
   X          A       01/01/2010    YES
   X          A       02/01/2010    YES
   X          A       03/01/2010    NO
   X          A       04/01/2010    NO
   X          A       05/01/2010    YES
   X          A       06/01开发者_Python百科/2010    NO
   X          A       07/01/2010    NO
   X          A       08/01/2010    NO
   X          A       09/01/2010    YES
   X          A       10/01/2010    YES
   X          A       11/01/2010    NO
   X          A       12/01/2010    YES

In the above table, I need to find the N or > N consecutive records where there was no sale, Sale value was 'NO' For example, if N is 2, the the result set would return the following

     Customer   Product    Date         SALE
       X          A       03/01/2010    NO
       X          A       04/01/2010    NO
       X          A       06/01/2010    NO
       X          A       07/01/2010    NO
       X          A       08/01/2010    NO

Can someone help me with a SQL query to get the desired results. I am using SQL Server 2005. I started playing using ROW_NUMBER() AND PARTITION clauses but no luck. Thanks for any help


You need to match your table against itself, as if there where 2 tables. So you use two aliases, o1 and o2 to refer to your table:

SELECT DISTINCT o1.customer, o1.product, o1.datum, o1.sale
  FROM one o1, one o2
  WHERE (o1.datum = o2.datum-1 OR o1.datum = o2.datum +1)
  AND o1.sale = 'NO' 
  AND o2.sale = 'NO'; 
 customer | product |   datum    | sale 
----------+---------+------------+------
 X        | A       | 2010-01-03 | NO
 X        | A       | 2010-01-04 | NO
 X        | A       | 2010-01-06 | NO
 X        | A       | 2010-01-07 | NO
 X        | A       | 2010-01-08 | NO

Note that I performed the query on an postgresql database - maybe the syntax differs on ms-sql-server, maybe at the alias 'FROM one AS o1' perhaps, and maybe you cannot add/substract in that way.


A different approach, inspired by munchs last line.

Get - for a given date the first date with YES later than that, and the last date with YES earlier than that. These form the boundary, where our dates shall fit in.

SELECT (o1.datum),
    MAX (o3.datum) - MIN (o2.datum) AS diff
FROM one o1, one o2, one o3 
WHERE o1.sale = 'NO'
AND o3.datum <
    (SELECT MIN (datum) 
    FROM one 
    WHERE datum >= o1.datum 
    AND SALE = 'YES') 
AND o2.datum > 
    (SELECT MAX (datum) 
    FROM one 
    WHERE datum <= o1.datum 
    AND SALE = 'YES') 
GROUP BY o1.datum 
HAVING MAX (o3.datum) - MIN (o2.datum) >= 2
ORDER BY o1.datum;

Maybe it needs some kind of optimization, because table one is 5 times involved in the query. :)


Thanks to everyone for posting your solution. Thought, I would also share my solution with everyone. Just as an FYI, I received this solution from another SQL Server Central forum member. I am definitely not going to take credit for this solution.

DECLARE @CNT INT
SELECT @CNT = 3

SELECT * FROM
(
  SELECT
    [Customer], [Product], [Date], [Sale], groupID, 
    COUNT(*) OVER (PARTITION BY [Customer], [Product], [Sale], groupID) AS groupCnt
  FROM
  (
    SELECT
      [Customer], [Product], [Date], [Sale],
      ROW_NUMBER() OVER (PARTITION BY [Customer], [Product] ORDER BY [Date])
      - ROW_NUMBER() OVER (PARTITION BY [Customer], [Product], [Sale] ORDER BY [Date]) AS groupID
    FROM
      [TableSales]
  ) T1
) T2
WHERE
  T2.[Sale] = 'NO' AND T2.[groupCnt] >= @CNT


Ok, we need a variable answer. We search for a date, where we have N following dates, all with the sale-field being NO.

SELECT d1.datum
FROM one d1, one d2, i 
WHERE d1.sale = 'NO' AND d2.sale = 'NO'
  AND d1.datum = (d2.datum - i) 
  AND i > 0 AND i < 4 
GROUP BY d1.datum 
HAVING COUNT (*) = 3; 

This will give us the date, which we use for subquerying.

Notes:

  • I used 'datum' instead of date, because date is a reserved keyword on postgresql.

  • In Oracle you can use a virtual table dummy, which contains anything you ask for, like 'SELCT foo FROM dual WHERE foo in (1, 2, 3);' which will give you 1, 2, 3, if I remember correctly. Depending on the vendor, there might be other tricks to get a sequence 1 to N. I created a table i with column i, and filled it with the values 1 to 100, and I expect N not to exceed 100; Since a few versions, postgresql contains a function 'generate_series (from, to) which would solve the problem too, and might have similarities with solutions for your specific database. But table i should work vendor independent.

  • if N == 17, you have to modify 3 places from 3 to 17.

The final query will be:

SELECT o4.* 
FROM one o3, one o4 
WHERE o3.datum = (
    SELECT d1.datum
    FROM one d1, one d2, i 
    WHERE d1.sale = 'NO' AND d2.sale = 'NO'
      AND d1.datum = (d2.datum - i) 
      AND i > 0 AND i <= 3 
    GROUP BY d1.datum 
    HAVING COUNT (*) = 3) 
AND o4.datum <= o3.datum + 3 
AND o4.datum >= o3.datum; 
 customer | product |   datum    | sale 
----------+---------+------------+------
 X        | A       | 2010-02-06 | NO
 X        | A       | 2010-02-07 | NO
 X        | A       | 2010-02-08 | NO
 X        | A       | 2010-02-09 | NO
0

精彩评论

暂无评论...
验证码 换一张
取 消