I have this table
And I need to create a pivot, that displays emotions as columns, with average emotion_level grouped by user_id, user_date, emotion. For example, for user_id = 1, user_date = 2011-07-13 and emotion = 'Anger', the average emotion_level should be 4.0.
I create a pivot:
select USER_ID, user_date,
AVG(case emotion when 'Anger' then convert(float, emotion_level) else 0 end) as Anger,
AVG(case emotion when 'Sadness' then convert(float, emotion_level) else 0 end) as Sadness,
AVG(case emotion when 'Interest' then convert(float, emotion_level) else 0 end) as Interest
from emotions group by USER_ID, user_date;
Which half-works, but calculates average emotion_level among all emotions, but not for emotions grouped by user, date and emotion.
My result for first user + emotion = 'Anger' = 2,开发者_如何学Python but it should be 4.
I guess, I should use the window function (over (partition by user_id, user_date, emotion)), but can't get the syntax run.
Is it possible at all?
I'm using PostgreSQL 9 in prod, but the above example is written in SQL Server.
WITH q (id, user_id, user_date, emotion, emotion_level) AS
(
VALUES
(1, 1, '2011-07-13'::DATE, 'Anger', 3),
(2, 1, '2011-07-13'::DATE, 'Anger', 5),
(3, 1, '2011-07-13'::DATE, 'Sadness', 2),
(4, 1, '2011-07-13'::DATE, 'Interest', 2),
(5, 2, '2011-07-13'::DATE, 'Anger', 1),
(6, 2, '2011-07-13'::DATE, 'Sadness', 4),
(7, 2, '2011-07-13'::DATE, 'Sadness', 5),
(8, 2, '2011-07-13'::DATE, 'Interest', 3),
(9, 3, '2011-07-13'::DATE, 'Anger', 1),
(10, 3, '2011-07-13'::DATE, 'Sadness', 3),
(11, 3, '2011-07-13'::DATE, 'Interest', 4),
(12, 3, '2011-07-13'::DATE, 'Interest', 5)
)
SELECT user_id, user_date,
AVG(CASE emotion WHEN 'Anger' THEN emotion_level END)::numeric(3, 2) AS Anger,
AVG(CASE emotion WHEN 'Sadness' THEN emotion_level END)::numeric(3, 2) AS Sadness,
AVG(CASE emotion WHEN 'Interest' THEN emotion_level END)::numeric(3, 2) AS Interest
FROM q
GROUP BY
user_id, user_date
ORDER BY
user_id, user_date
The problem was that the expressions you originally used:
AVG(case emotion when 'Interest' then convert(float, emotion_level) else 0 end)
averaged over all records for a given user at given date and treated non-Interest
entries as 0
, while they should be treated as NULL
so they would not contribute to Interest
average.
I first updated the datatype of 'emotion_level' to 'float' in the table and then performed the operation using pivot. If the datatype is not updated, the required values are of int datatype.
select user_id, user_date, Anger, Sadness, Interest
from (select user_id,user_date,emotion, emotion_level
from emotions) as emo
pivot(avg(emotion_level)
for emotion in (Anger, Sadness, Interest)) as P;
The result from the above query in the link below
I'd define a CTE, then join
to it - your main problem is probably the fact that you're not group
ing on emotion_level
. Try this:
WITH average_emotion (user_id, date, emotion, average_level) as
(SELECT user_id, user_date, emotion, AVG(convert(float, emotion_level))
FROM emotions
GROUP BY user_id, user_date, emotion)
SELECT a.user_id, a.user_date, COALESCE(b.average_level, 0) as Anger
FROM emotions as a
LEFT JOIN average_emotion as b
ON b.user_id = a.user_id
AND b.date = a.user_date
AND b.emotion = 'Anger'
GROUP BY a.user_id, a.user_date
Then, just add additional left join
s every time you need a new column.
There's a number of ways to restructure this, mostly relating to the rest of your data model, and desired output.
精彩评论