开发者

Window moving average in sql server

开发者 https://www.devze.com 2023-02-19 10:13 出处:网络
I am trying to create a function that computes a windowed moving average in SQLServer 2008. I am quite new to SQL so I am having a fair bit of difficulty. The data that I am trying to perform the movi

I am trying to create a function that computes a windowed moving average in SQLServer 2008. I am quite new to SQL so I am having a fair bit of difficulty. The data that I am trying to perform the moving average on needs to be grouped by day (it is all timestamped data) and then a variable moving average window needs to be applied to it.

I already have a function that groups the data by day (and @id) which is shown at the bottom. I have a few questions:

Would it be better to call the grouping function inside the moving average function or should I do it all at once?

Is it possible to get the moving average for the dates input into the function, but go back n days to begin the moving average so that the first n days of the returned data will not have 0 for their average? (ie. if they want a 7 day moving average from 01-08-2011 to 02-08-2011 that I start the moving average calculation on 01-01-2011 so that the first day they defined has a value?)

I am in the process of looking into how to do the moving average, and know that a moving window seems to be the best option (currentSum = prevSum + todayCount - nthDayAgoCount) / nDays but I am still working on figuring out the SQL implementation of this.

I have a grouping 开发者_如何学运维function that looks like this (some variables removed for visibility purposes):

    SELECT
        'ALL' as GeogType,
        CAST(v.AdmissionOn as date) as dtAdmission,    
        CASE WHEN @id IS NULL THEN 99 ELSE v.ID END,
        COUNT(*) as nVisits
    FROM dbo.Table1 v INNER JOIN dbo.Table2 t ON v.FSLDU = t.FSLDU5
    WHERE v.AdmissionOn >= '01-01-2010' AND v.AdmissionOn < DATEADD(day,1,'02-01-2010') 
          AND v.ID = Coalesce(@id,ID)
    GROUP BY    
        CAST(v.AdmissionOn as date),
        CASE WHEN @id IS NULL THEN 99 ELSE v.ID END
    ORDER BY 2,3,4

Which returns a table like so:

ALL 2010-01-01  1   103
ALL 2010-01-02  1   114
ALL 2010-01-03  1   86
ALL 2010-01-04  1   88
ALL 2010-01-05  1   84
ALL 2010-01-06  1   87
ALL 2010-01-07  1   82

EDIT: To answer the first question I asked:

I ended up creating a function which declared a temporary table and inserted the results from the count function into it, then used the example from user662852 to compute the moving average.


Take the hardcoded date range out of your query. Write the output (like your sample at the end) to a temp table (I called it #visits below).
Try this self join to the temp table:

 Select list.dtadmission
   , AVG(data.nvisits) as Avg
   , SUM(data.nvisits) as sum
   , COUNT(data.nvisits) as RollingDayCount
   , MIN(data.dtadmission) as Verifymindate
   , MAX(data.dtadmission)   as Verifymaxdate
 from  #visits as list 
 inner join #visits as data  
 on list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission) group by list.dtadmission

EDIT: I didn't have enough room in Comments to say this in response to your question:

My join is "kinda cartesian" because it uses a between in the join constraint. Each record in list is going up against every other record, and then I want the ones where the date I report is between a lower bound of (-7) days and today. Every data date is available to list date, this is the key to your question. I could have written the join condition as

list.dtadmission between DATEADD(DD,-6,data.dtadmission) and data.dtadmission

But what really happened was I tested it as

list.dtadmission between DATEADD(DD,6,data.dtadmission) and data.dtadmission

Which returns no records because the syntax is "Between LOW and HIGH". I facepalmed on 0 records and swapped the arguments, that's all.

Try the following, see what I mean: This is the cartesian join for just one listdate:

 SELECT 
 list.[dtAdmission] as listdate
 ,data.[dtAdmission] as datadate
 ,data.nVisits as datadata
 ,DATEADD(dd,6,list.dtadmission) as listplus6 
 ,DATEADD(dd,6,data.dtAdmission ) as datapplus6 
 from  [sandbox].[dbo].[admAvg] as list inner join [sandbox].[dbo].[admAvg] as data    
 on 
 1=1
 where list.dtAdmission = '5-Jan-2011'

Compare this to the actual join condition

 SELECT 
      list.[dtAdmission] as listdate
      ,data.[dtAdmission] as datadate
      ,data.nVisits as datadata
      ,DATEADD(dd,6,list.dtadmission) as listplus6 
      ,DATEADD(dd,6,data.dtAdmission ) as datapplus6
from  [sandbox].[dbo].[admAvg] as list   inner join [sandbox].[dbo].[admAvg] as data    
on 
list.dtadmission between data.dtadmission and DATEADD(DD,6,data.dtadmission)
where list.dtAdmission = '5-Jan-2011'

See how list date is between datadate and dataplus6 in all the records?

0

精彩评论

暂无评论...
验证码 换一张
取 消