开发者

SQL query to convert date ranges to per day records

开发者 https://www.devze.com 2023-02-17 16:41 出处:网络
Requirements I have data table that saves data in date ranges. Each record is allowed to overlap previous record(s) (record has a CreatedOn datetime column).

Requirements

  1. I have data table that saves data in date ranges.
  2. Each record is allowed to overlap previous record(s) (record has a CreatedOn datetime column).
  3. New record can define it's own date range if it needs to hence can overlap several older records.
  4. Each new overlapping record overrides settings of older records that it overlaps.

Result set

What I need to get is get per day data for any date range that uses record overlapping. It should return a record per day with corresponding data for that particular day.

To convert ranges to days I was thinking of numbers/dates table and user defined function (UDF) to get data for each day in the range but I wonder whether 开发者_开发百科there's any other (as in better* or even faster) way of doing this since I'm using the latest SQL Server 2008 R2.

Stored data

Imagine my stored data looks like this

ID | RangeFrom | RangeTo  | Starts | Ends  | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1  | 20110101  | 20110331 | 07:00  | 15:00
2  | 20110401  | 20110531 | 08:00  | 16:00
3  | 20110301  | 20110430 | 06:00  | 14:00 <- overrides both partially

Results

If I wanted to get data from 1st January 2011 to 31st May 2001 resulting table should look like the following (omitted obvious rows):

DayDate | Starts | Ends
--------|--------|------
20110101| 07:00  | 15:00  <- defined by record ID = 1
20110102| 07:00  | 15:00  <- defined by record ID = 1
...                          many rows omitted for obvious reasons
20110301| 06:00  | 14:00  <- defined by record ID = 3
20110302| 06:00  | 14:00  <- defined by record ID = 3
...                          many rows omitted for obvious reasons
20110501| 08:00  | 16:00  <- defined by record ID = 2
20110502| 08:00  | 16:00  <- defined by record ID = 2
...                          many rows omitted for obvious reasons
20110531| 08:00  | 16:00  <- defined by record ID = 2


Actually, since you are working with dates, a Calendar table would be more helpful.

Declare @StartDate date
Declare @EndDate date

;With Calendar As
    (
    Select @StartDate As [Date]
    Union All
    Select DateAdd(d,1,[Date])
    From Calendar
    Where [Date] < @EndDate
    )
Select ...
From Calendar
    Left Join MyTable
        On Calendar.[Date] Between MyTable.Start And MyTable.End
Option ( Maxrecursion 0 );

Addition

Missed the part about the trumping rule in your original post:

Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';

-- This first CTE is obviously to represent
-- the source table
With SampleData As 
    (
    Select 1 As Id
        , Cast('20110101' As date) As RangeFrom
        , Cast('20110331' As date) As RangeTo
        , Cast('07:00' As time) As Starts
        , Cast('15:00' As time) As Ends
        , CURRENT_TIMESTAMP As CreatedOn
    Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
    Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
    )
    , Calendar As
    (
    Select @StartDate As [Date]
    Union All
    Select DateAdd(d,1,[Date])
    From Calendar
    Where [Date] < @EndDate
    )
    , RankedData As
    (
    Select C.[Date]
        , S.Id
        , S.RangeFrom, S.RangeTo, S.Starts, S.Ends
        , Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
    From Calendar As C
        Join SampleData As S
            On C.[Date] Between S.RangeFrom And S.RangeTo
    )
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1   
Option ( Maxrecursion 0 );

In short, I rank all the sample data preferring the newer rows that overlap the same date.


Why do it all in DB when you can do it better in memory

This is the solution (I eventually used) that seemed most reasonable in terms of data transferred, speed and resources.

  1. get actual range definitions from DB to mid tier (smaller amount of data)
  2. generate in memory calendar of a certain date range (faster than in DB)
  3. put those DB definitions in (much easier and faster than DB)

And that's it. I realised that complicating certain things in DB is not not worth it when you have executable in memory code that can do the same manipulation faster and more efficient.

0

精彩评论

暂无评论...
验证码 换一张
取 消