Consider a dataset with six months of data, as follows:
// Month-01 = 1
// Month-02 = 5
// Month-03 = 3
// Month-04 = 2
// Month-05 = 7
// Month-06 = 8
Then the rolling quarter (the sum of the last 3 months) will be as follows:
// QTR-01 = N/A
// QTR-02 = N/A
// QTR-03 = 9
// QTR-04 = 10
// QTR-05 = 12
// QTR-06 = 17
Now, an inefficient algorithm for this calculation in SQL is as follows (it is not a perfect algorithm; please just consider its general idea):
foreach row { id,month,qtr,... } in database.table
{
    qtrValue = select sum( top 3 month) from database.table where table.id = row.id;
    update row.qtr set row.qtr = qtrValue;
}
Can you suggest an efficient algorithm and/or data warehouse design for this problem? It doesn't matter whether it involves a relational database or not.
A moving SUM window aggregate function would accomplish what you are looking to do.
Something along these lines:
SELECT SUM(Month)
OVER(ORDER BY MonthDate  -- assumption: MonthDate is a hypothetical column giving chronological order
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM database.table
There is also a PARTITION BY option that lets you apply the aggregation separately within each group of rows that share the same values in the listed columns. The exact syntax may vary by database platform. If your database platform doesn't support window aggregates, all hope is not lost, but it will take a bit more SQL to accomplish the same task in set-based notation.
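Since the question allows non-relational solutions, here is a minimal Python sketch of the same moving-sum idea done in a single pass. It is only an illustration: the function name rolling_sum and its window parameter are made up for this example, and it assumes the monthly values arrive already sorted by month. Months that do not yet have a full window are reported as None, matching the N/A rows in the question.

```python
from collections import deque
from typing import Iterable, List, Optional


def rolling_sum(values: Iterable[float], window: int = 3) -> List[Optional[float]]:
    """Return the sum of the last `window` values at each position.

    Positions without a full window yet are reported as None,
    mirroring the N/A entries in the question.
    """
    recent = deque()  # values currently inside the window
    running = 0       # sum of the values held in `recent`
    result = []
    for value in values:
        recent.append(value)
        running += value
        if len(recent) > window:
            running -= recent.popleft()  # subtract the value that left the window
        result.append(running if len(recent) == window else None)
    return result


# Sample data from the question, assumed to be in month order.
monthly = [1, 5, 3, 2, 7, 8]
quarters = rolling_sum(monthly)
print(quarters)  # [None, None, 9, 10, 12, 17]
```

Each month costs one addition and at most one subtraction, so the work grows linearly with the number of months rather than re-summing three rows per row as in the loop from the question.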
Well, my date dimension simply has MonthNumberInEpoch, which is an incrementing integer for each calendar month starting at the epoch of dimDate.
So, I can write something like:
with
q_00 as (-- sales monthly
select
MonthNumberInEpoch
, sum (SaleAmount) as SalesMonthly
from dbo.factSale as f
join dbo.dimDate as d on d.DateKey = f.DateKey
group by MonthNumberInEpoch
)
select
a.MonthNumberInEpoch
, (a.SalesMonthly + b.SalesMonthly + c.SalesMonthly) as SalesThreeMonths
from q_00 as a
join q_00 as b on b.MonthNumberInEpoch + 1 = a.MonthNumberInEpoch
join q_00 as c on c.MonthNumberInEpoch + 2 = a.MonthNumberInEpoch
;
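To check the self-join idea against the sample data from the question, here is a small sketch that runs the same join pattern in an in-memory SQLite database from Python. The table name monthly is hypothetical and stands in for the q_00 result (monthly totals already aggregated); it is not part of the schema above.

```python
import sqlite3

# In-memory database; `monthly` is a hypothetical stand-in for the q_00 CTE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table monthly ("
    " MonthNumberInEpoch integer primary key,"
    " SalesMonthly integer)"
)
conn.executemany(
    "insert into monthly values (?, ?)",
    [(1, 1), (2, 5), (3, 3), (4, 2), (5, 7), (6, 8)],
)

# Join each month to the one before it (b) and the one two before it (c).
rows = conn.execute(
    """
    select
        a.MonthNumberInEpoch
      , a.SalesMonthly + b.SalesMonthly + c.SalesMonthly as SalesThreeMonths
    from monthly as a
    join monthly as b on b.MonthNumberInEpoch + 1 = a.MonthNumberInEpoch
    join monthly as c on c.MonthNumberInEpoch + 2 = a.MonthNumberInEpoch
    order by a.MonthNumberInEpoch
    """
).fetchall()
print(rows)  # [(3, 9), (4, 10), (5, 12), (6, 17)]
conn.close()
```

Because these are inner joins, the first two months produce no row at all, which lines up with the N/A entries in the question. For the same reason, a month with no sales rows would be missing from the monthly totals and would also drop the windows that include it, so gaps may need to be filled beforehand.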