I'm trying to collect some simple statistics on a table. Let's say the simplified structure is the following:
ID | CREATIONDATE            | VALUE |
------------------------------------------------
1  | 2007-01-06 13:54:00.000 | 7     |
2  | 2007-03-07 15:03:00.000 | 8     |
3  | 2008-07-02 18:55:00.000 | 12    |
4  | 2008-09-10 19:12:00.000 | 1     |
5  | 2010-01-06 13:54:00.000 | 4     |
6  | 2010-01-07 07:13:00.000 | 5     |
I'd like to sum the values based on the year of the creation date, so currently I execute the following:
select extract('year' from i.creationdate) as d1,
sum(i.value)
from items i
group by d1
order by d1;
which will output
2007;15
2008;13
2010;9
Is there any way I can improve the query so that it automatically outputs 0 for the missing year 2009, even though there is no row in the table with a date in 2009?
2007;15
2008;13
2009;0
2010;9
Use a left outer join from generate_series:
select year_list.year,
coalesce(item_list.val,0)
from (select generate_series(2007,2010) as year) AS year_list
left outer join
(select date_part('year',creationdate) as item_year,sum(value) as val from items group by item_year) AS item_list
on item_list.item_year = year_list.year
order by year_list.year;
To avoid hard-coding the years in generate_series, substitute subqueries that select the min() and max() of the year part.
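A minimal sketch of that substitution, assuming the table is named items as in the question and that it contains at least one row (with an empty table the min/max subqueries return NULL and the series is empty). The total column alias is only illustrative.

```sql
-- Derive the year range from the data instead of hard-coding 2007..2010.
select year_list.year,
       coalesce(item_list.val, 0) as total
from (
    select generate_series(
               (select min(extract(year from creationdate))::int from items),
               (select max(extract(year from creationdate))::int from items)
           ) as year
) as year_list
left outer join (
    -- One row per year that actually has data.
    select extract(year from creationdate) as item_year,
           sum(value) as val
    from items
    group by item_year
) as item_list
    on item_list.item_year = year_list.year
order by year_list.year;
```

With the sample data from the question, the series runs from 2007 to 2010, so 2009 appears with a total of 0.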
Use generate_series in a JOIN:
SELECT
    generate_series AS d1,
    COALESCE(sum(i.value), 0)
FROM
    generate_series(2005, 2010)
    LEFT JOIN items i ON generate_series = extract('year' from i.creationdate)
GROUP BY
    d1
ORDER BY
    d1;
I would do this in the software which is doing the query, as opposed to in the query itself.
I have done such things (e.g. in Oracle) by using for loops in SQL (no doubt there are similar things in Postgres). However, I came to the conclusion that this was the wrong approach: the solution was more difficult to maintain than if I had simply written the code in the programming language of the application.
SQL returns data that exists, and operates on data that exists, and in that case there is no data for the year in question. That's why it's not a good fit to try and get SQL to do this. Just ask SQL for the data you do have, and when you generate your output in your programming language for the user (e.g. HTML table) then add the for loop to iterate between the first and the last year and print zero if there is no result.
An easy (but not a nice) way is to create a (temporary) table holding all years in one column. Then you just join the data table to your new table.
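A rough sketch of the helper-table approach, assuming PostgreSQL and the items table from the question. The table name years and the hard-coded list of values are hypothetical and would have to be kept up to date by hand, which is the main drawback of this variant.

```sql
-- Hypothetical helper table holding every year that should appear in the report.
create temporary table years (year integer primary key);
insert into years (year) values (2007), (2008), (2009), (2010);

-- The left join keeps every row of years; years without matching items sum to NULL,
-- which coalesce turns into 0.
select y.year,
       coalesce(sum(i.value), 0) as total
from years y
left join items i
       on extract(year from i.creationdate) = y.year
group by y.year
order by y.year;
```

A temporary table is dropped automatically at the end of the session, so nothing is left behind in the schema.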