I’m struggling to recreate a certain calculation in SQL that I can do in Excel. It concerns averages.
We collect data over many years. One table we create is a t开发者_StackOverflowhree year average. Below are tables that show: 1) the RAW data, and 2) a new table with the AVERAGE data. The ‘Dimensions’ are [Year] & [ItemCode].
The average data is based on averaging a year’s figure with the 2 years either side. i.e. 1991 average figure = Average (1990,1991,1992). =AVERAGE(B2:D2)
1) RAW
2) AVERAGE
alt text http://tinypic.com/images/404.gif
In Excel this seems simple to do: an average formula that is then dragged into the other cells, or in VBA using R1C1 referencing.
But what is it in SQL? How do you say use ‘this’ year & the year before & the year after data?
Also, I realise that if there’s an empty cell it would throw my calculations, so how do you manage NULL values so it only calculates when there are three figures to use?...and then do it all again until finished (many years and many, many ItemCodes)
Any help would be greatly appreciated.
You don't state the technology, assuming MSSQL.
Assume mydate is 2010
SELECT dateadd(Year, -1, mydate), mydate, dateadd(Year, 1, mydate)
FROM SomethingOrOther
returns
2009, 2010, 2011
DATEADD documentation
Assume x, y, z are all NULL
SELECT COALESCE(x, y, z, 0)
returns
0
COALESCE documentation
I've created an example below in which I average the value for year 1994 to get you started. The SQL query should work in any RDBMS. That said, each database has a better approach if you can use the vendor specific features. For example, using Oracle analytics you can compute a rolling average and access the prev/next row pretty much like wanted.
drop table t;
create table t(
item_code varchar(10) not null
,year int not null
,figure decimal(5,2) not null
,primary key(year, item_code)
);
insert into t(item_code, year, figure) values('A', 1990, 5);
insert into t(item_code, year, figure) values('A', 1991, 7.2);
insert into t(item_code, year, figure) values('A', 1992, 9.2);
insert into t(item_code, year, figure) values('A', 1993, 12.5);
insert into t(item_code, year, figure) values('A', 1994, 10);
insert into t(item_code, year, figure) values('A', 1995, 9);
insert into t(item_code, year, figure) values('B', 1990, 16);
insert into t(item_code, year, figure) values('B', 1991, 17);
insert into t(item_code, year, figure) values('B', 1992, 10);
insert into t(item_code, year, figure) values('B', 1993, 13);
insert into t(item_code, year, figure) values('B', 1994, 15.5);
insert into t(item_code, year, figure) values('B', 1995, 12);
insert into t(item_code, year, figure) values('C', 1990, 4);
insert into t(item_code, year, figure) values('C', 1991, 9);
insert into t(item_code, year, figure) values('C', 1992, 15);
insert into t(item_code, year, figure) values('C', 1993, 12);
insert into t(item_code, year, figure) values('C', 1994, 10);
insert into t(item_code, year, figure) values('C', 1995, 7.2);
select item_code
,1994 as year
,sum(case when year = 1995 then figure else 0 end) as next_year
,sum(case when year = 1994 then figure else 0 end) as this_year
,sum(case when year = 1993 then figure else 0 end) as prev_year
,sum(figure) / count(*) as avg
from t
where year in(1993, 1994, 1995)
group
by item_code
having count(*) = 3;
The count(*) = 3 satisfies the constraint that an average will only be computed if there are three years available. Obviously, you have to replace the constants with some variable. Also, the query only computes the averages for 1 year.
精彩评论