How to write these two queries for a simple data warehouse, using ANSI SQL?

Developer https://www.devze.com 2023-01-01 14:51 Source: web
I am writing a simple data warehouse that will allow me to query the table to observe periodic (say weekly) changes in data, as well as changes in the change of the data (e.g. week to week change in the weekly sale amount).

For the purposes of simplicity, I will present very simplified (almost trivialized) versions of the tables I am using here. The sales data table is a view and has the following structure:

CREATE TABLE sales_data (
     sales_time date NOT NULL,
     sales_amt double precision NOT NULL
);

For the purpose of this question, I have left out other fields you would expect to see (product_id, sales_person_id, etc.), as they have no direct relevance here. AFAICT, the only fields that will be used in the query are sales_time and sales_amt (unless I am mistaken).

I also have a date dimension table with the following structure:

CREATE TABLE date_dimension (
  id integer  NOT NULL,
  datestamp   date NOT NULL,
  day_part    integer NOT NULL,
  week_part   integer NOT NULL,
  month_part  integer NOT NULL,
  qtr_part    integer NOT NULL,
  year_part   integer NOT NULL
);

which partitions dates into reporting ranges.

I need to write queries that will allow me to do the following:

  1. Return the week-on-week change in sales_amt for a specified period. For example, the change between sales today and sales N days ago, where N is a positive integer (N == 7 in this case).

  2. Return the change in the change of sales_amt for a specified period. In (1), we calculated the week-on-week change. Now we want to know how that change differs from the (week-on-week) change calculated last week.

I am stuck at this point, however, as SQL is my weakest skill. I would be grateful if an SQL master could explain how I can write these queries in a DB-agnostic way (i.e. using ANSI SQL).


As noted in the comment above, I probably do not understand your model -- so here is a simple one to get started.

Now, if I want weekly sales for calendar year 2010:

select 
    CalendarYearWeek
  , sum(SalesAmount)
from factSale as f
join dimDate as d on d.DateKey = f.DateKey
where Year = 2010
group by CalendarYearWeek

CalendarYearWeek is a varchar(8) column in dimDate, for example '2010-w03'. Year is an integer column in dimDate too.

Not sure if this is close to what you were looking for, but may be a start.

EDIT

dimDate also has these columns:

WeekNumberInEpoch, integer -- increases by one each week, starting from some epoch date in the past. All rows in dimDate within the same week have the same WeekNumberInEpoch.

DayOfWeek, varchar(10) -- 'sunday', 'monday', ...

DayNumberInWeek, integer -- 1-7
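
For reference, the columns used by the queries in this answer can be collected into a schema sketch. Only the column names come from the queries and the descriptions above. The data types, the keys, and the meaning of DayNumberInEpoch (assumed to be a day counter analogous to WeekNumberInEpoch) are assumptions. The Year column used in the first query is assumed to hold the same value as CalendarYear and is left out.

```sql
-- Hypothetical schema sketch: column names are taken from the queries in
-- this answer; data types, keys and constraints are assumptions.
create table dimDate (
    DateKey           integer     not null primary key
  , FullDate          date        not null
  , CalendarYear      integer     not null
  , CalendarYearWeek  varchar(8)  not null   -- e.g. '2010-w03'
  , WeekNumberInEpoch integer     not null   -- same value for all days of one week
  , DayNumberInEpoch  integer     not null   -- assumed: +1 per day since the epoch date
  , DayOfWeek         varchar(10) not null   -- 'sunday', 'monday', ...
  , DayNumberInWeek   integer     not null   -- 1-7
);

create table factSale (
    DateKey     integer       not null references dimDate (DateKey)
  , SalesAmount decimal(12,2) not null
);
```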

The queries below use CTEs, which should work with recent versions of PostgreSQL, SQL Server, Oracle and DB2. For other databases, you may package the CTE (q_00) into a sub-query.

-- for week to previous week
with
q_00 as (
    select
        WeekNumberInEpoch
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate  as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch
)
select
    a.WeekNumberInEpoch
  , a.Amount as ThisWeekSales
  , b.Amount as LastWeekSales
  , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
order by a.WeekNumberInEpoch desc ;


-- for day of week to day of previous week 
-- monday to monday, tuesday to tuesday, ...
with
q_00 as (
    select
        WeekNumberInEpoch
      , DayOfWeek  
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate  as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch, DayOfWeek
)
select
    a.WeekNumberInEpoch
  , a.DayOfWeek  
  , a.Amount as ThisWeekSales
  , b.Amount as LastWeekSales
  , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on (b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
                   and b.DayOfWeek = a.DayOfWeek)
order by a.WeekNumberInEpoch desc, a.DayOfWeek ;



-- Sliding by day and day difference (= 7)
with
q_00 as (
    select
        DayNumberInEpoch
      , FullDate
      , DayOfWeek
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by DayNumberInEpoch, FullDate, DayOfWeek
)
select
    a.FullDate  as ThisDay
  , a.DayOfWeek as ThisDayName
  , a.Amount    as ThisDaySales
  , b.FullDate  as PreviousPeriodDay
  , b.DayOfWeek as PreviousDayName
  , b.Amount    as PreviousPeriodDaySales
  , a.Amount - b.Amount as Difference
from q_00 as a
join q_00 as b on b.DayNumberInEpoch = a.DayNumberInEpoch - 7
order by a.FullDate desc ;
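
For item (2) of the question, the change in the change, one possible sketch is to extend the week-to-previous-week query with a second self-join on the week before last. It assumes the same factSale and dimDate tables as the queries above. The output column names (ThisWeekChange, LastWeekChange, ChangeInChange) are made up for illustration.

```sql
-- change of the week-to-previous-week change (a sketch, not tested
-- against any particular database)
with
q_00 as (
    select
        WeekNumberInEpoch
      , sum(SalesAmount) as Amount
    from factSale as f
    join dimDate  as d on d.DateKey = f.DateKey
    where CalendarYear = 2010
    group by WeekNumberInEpoch
)
select
    a.WeekNumberInEpoch
  , a.Amount - b.Amount as ThisWeekChange   -- this week vs. last week
  , b.Amount - c.Amount as LastWeekChange   -- last week vs. the week before
  , (a.Amount - b.Amount) - (b.Amount - c.Amount) as ChangeInChange
from q_00 as a
join q_00 as b on b.WeekNumberInEpoch = a.WeekNumberInEpoch - 1
join q_00 as c on c.WeekNumberInEpoch = a.WeekNumberInEpoch - 2
order by a.WeekNumberInEpoch desc ;
```

Because these are inner joins, a week appears in the result only if both of the two preceding weeks have sales within the filtered year. The earliest two weeks of 2010 therefore drop out.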


I suggest you build a separate dimension table for 'time' (one day per row) that contains information about repeating time periods (day, week, month, quarter), so you can easily join/select for that type of information.

Your queries for (1.) and (2.) could be built that way.

Yes, most SQL dialects allow inferring that information with time/date functions, but those are slower and more complicated than using a dimension table.
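
As a rough illustration using the tables from the question itself, a week-on-week comparison might look like the sketch below. It assumes that sales_time matches date_dimension.datestamp exactly, and that week_part is the week number within year_part. Neither is stated in the question. The CTE name weekly and the output column names are made up.

```sql
-- week-on-week change using the question's sales_data and date_dimension
-- (a sketch under the assumptions stated above)
with
weekly as (
    select
        d.year_part
      , d.week_part
      , sum(s.sales_amt) as amount
    from sales_data as s
    join date_dimension as d on d.datestamp = s.sales_time
    group by d.year_part, d.week_part
)
select
    a.year_part
  , a.week_part
  , a.amount as this_week_sales
  , b.amount as last_week_sales
  , a.amount - b.amount as difference
from weekly as a
join weekly as b on (b.year_part = a.year_part
                     and b.week_part = a.week_part - 1)
order by a.year_part desc, a.week_part desc ;
```

This version does not compare the first week of a year with the last week of the previous year. A running week counter in the dimension table, incremented across year boundaries, would avoid that gap.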
