Let's say I've got a SQL Server table that logs user activity. Let's say it has user ID, user name, activity date, and activity type columns. I want to print out a list of all user activity, with one row for each month of activity, and a column for each activity type summing up the number of times that activity occurred in that month. I'm trying to do this with the following query:
SELECT
user_id,
user_name,
CONVERT(VARCHAR(7), activity_date, 120),
SUM(CASE WHEN activity_type = 'Log In' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Save Document' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Create Document' THEN 1 ELSE 0 END)
FROM UserActivity
WHERE DA开发者_运维问答TE BETWEEN '11-1-2010 00:00:00' AND '12-31-2010 23:59:59'
GROUP BY user_id, user_name, CONVERT(VARCHAR(7), activity_date, 120)
The problem is, this query is essentially giving me a separate row for each activity--lots and lots of rows, no counting. I think that the problem is with the way I'm doing the dates, because if I change the query to not select the date, I get a table that looks "mostly correct."
Any thoughts?
You can't have a SUM without a GROUP BY, at least not with other non-aggregates in the SELECT. Do your GROUP BY clause properly.
SELECT
user_id,
user_name,
CONVERT(VARCHAR(7), activity_date, 120),
SUM(CASE WHEN activity_type = 'Log In' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Save Document' THEN 1 ELSE 0 END),
SUM(CASE WHEN activity_type = 'Create Document' THEN 1 ELSE 0 END)
FROM UserActivity
WHERE DATE BETWEEN '11-1-2010 00:00:00' AND '12-31-2010 23:59:59'
GROUP BY user_id,
user_name,
CONVERT(VARCHAR(7), activity_date, 120)
For what it's worth, for date ranges, I prefer to use
WHERE DATE >= '20101101'
AND DATE < '20110101'
I'm sure losing a few records with a timestamp of '12-31-2010 23:59:59.997' won't matter, but it's just more logically correct to use a < next_date
test. And to be pedantic, the format YYYYMMDD is the most robust regardless of regional/language/dateformat settings.
精彩评论