We are using the following SQL query to produce monthly averages. The statement is heavy used and works pretty well, but it fails with a 'ORA-00979: not a GROUP BY expression' every month or two and we have no idea why.
First about the process:
- we have raw data every few minutes,
- => the raw data is getting averaged to hourly, daily, monthly and yearly values
raw -> hourly
- creates entries in the averages table with average_type 2
- never problems
hourly => daily / daily => monthly / monthly => yearly
- Statements are pretty similar
- averages of a 'lower' type is being averaged to a higher type
Average types are: 2 hourly, 3 daily, (4 weekly not used) 5 monthly and 6 yearly
The bug only appears with the step "daily => monthly".
Query:
- We can't reproduce the bug, the next run of the aggregation job usually works without problems.
- The errors occurs every 50-60 days, without a real pattern
- Environment: Oracle 10g
Does anybody have an idea what the problem could be?
INSERT INTO averages
SELECT averages_seq.NEXTVAL,
avg.*
FROM (
SELECT
m.city_id, m.city_name,
m.state_id, m.state_name,
m.district_id, m.district_name,
m.country_id, m.country_name,
m.currency_id, m.currency_name,
m.category_id, m.category_name,
5 average_type, -- average_type 5 ==> monthly average
0 analysis_type,
TRUNC(m.average_date, 'MM') average_date,
AVG(m.value) value,
SUM(m.sum) sum,
NULL uncertainty,
NULL uncertainty_type,
MIN(m.value_min) value_min,
MAX(m.value_max) value_max,
SUM(number_of_measurements) number_of_measurements,
-- 6 * 24 => measurements per day
-- (ADD_MONTHS(...)) => days per month
100 * SUM(number_of_measurements) /
(6 * 24 *
(ADD_MONTHS(TRUNC(average_date, 'MM'), 1) - TRUNC(average_date, 'MM'))) coverage_percent,
SUM(customers) customers,
NULL dummy_field,
CURRENT_TIMESTAMP calculation_date,
CURRENT_TIMESTAMP creation_date,
'AGGREGATION' creation_user,
CURRENT_TIMESTAMP modification_date,
'AGGREGATION' modification_user,
'n' constant_1,
3 constant_2,
-1 average_state
FROM averages m
WHERE m.average_type = 3 -- average type 3 ==> daily average
AND m.average_date
BETWEEN
TO_TIMESTAMP('2011-06-01T00:00:00Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
AND
TO_TIMESTAMP('2011-06-30T23:59:59Z'开发者_JS百科, 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
AND m.analysis_type = 0
GROUP BY
m.city_id, m.city_name,
m.state_id, m.state_name,
m.district_id, m.district_name,
m.country_id, m.country_name,
m.currency_id, m.currency_name,
m.category_id, m.category_name,
TRUNC(m.average_date, 'MM')
) avg
I would add a group by:
(ADD_MONTHS(TRUNC(average_date, 'MM'), 1) - TRUNC(average_date, 'MM'))
I know this can't change without TRUNC(m.average_date, 'MM')
changing, but it seems to be the only non-aggregate column not in your GROUP BY.
In addition, you could remove all the non-aggregated, constant, columns outside the inner sql, and explicitly name the columns you're inserting and select these constants at the same time:
eg.
INSERT INTO averages(city_id, city_name, ...average_type, analysis_type, ...)
SELECT averages_seq.NEXTVAL,
avg.city_id, avg.city_name, ...
5, 0, ...
...can't say this part will fix the issue, but it would certainly remove them from suspicion of not being aggregated.
Judging by the comparison, the average_date is a timestamp with time zone (local time zone ?), but TRUNC works on a date. I'm wondering what happens if there is some oddity where a selected date is 'jumping' from one month to another (eg it occurred in January in one timezone but Feb in another).
Based on that, also consider whether the client is having an effect, (eg maybe it errors when run from a client that is in a different timezone from the database setting).
I'd expand Gerrat's suggestion of specifying the column names so that you can separate out the constants
INSERT INTO averages
(average_type, analysis_type, uncertainty, uncertainty_type,
dummy_field, calculation_date, creation_date, creation_user,
modification_date, modification_user, constant_1, constant_2,
....
SELECT averages_seq.NEXTVAL,
5 average_type, -- average_type 5 ==> monthly average
0 analysis_type,
NULL uncertainty,
NULL uncertainty_type,
NULL dummy_field,
CURRENT_TIMESTAMP calculation_date,
CURRENT_TIMESTAMP creation_date,
'AGGREGATION' creation_user,
CURRENT_TIMESTAMP modification_date,
'AGGREGATION' modification_user,
'n' constant_1,
3 constant_2,
-1 average_state
avg.*
FROM (
SELECT ...
精彩评论