So I want to compare new users to returning users in a table by month. I have a table that contains each action with a username and a date stamp.
I can easily pull users that performed an action i开发者_JS百科n, for example, January 2011. To see see if each user is new I need to then run their username against all previous records (prior to January 2011).
In my fumblings I came up with the following:
SELECT ini.username,
MIN(ini.datetime) AS firstAction,
COUNT(ini.datetime) AS numMonth,
(SELECT COUNT(*)
FROM tableActions tot
WHERE tot.username = ini.username
AND tot.datetime < '201101%'
AND tot.datetime > '201001%') AS numTotal
FROM tableActions ini
WHERE DATETIME >= '201101%'
AND DATETIME < '201102%'
GROUP BY ini.username
ORDER BY firstAction
It doesn't error, but it doesn't finish either. Seems to be quite intense.
You can re-write the query to be (assuming tableactions.datetime
is a DATETIME
data type):
SELECT ini.username,
MIN(ini.datetime) AS firstAction,
COUNT(ini.datetime) AS numMonth,
x.numTotal
FROM tableActions ini
LEFT JOIN (SELECT tot.username,
COUNT(*) AS numTotal
FROM tableActions tot
WHERE tot.datetime > '2010-01-01'
AND tot.datetime < '2011-01-01'
GROUP BY tot.username) x ON x.username = ini.username
WHERE ini.datetime BETWEEN '2011-01-01' AND '2011-01-31'
GROUP BY ini.username
ORDER BY firstAction
Might help to have an index on username
at a minimum, though a covering index using username
, datetime
is worth considering.
The datetime
comparison looks suspect - LIKE
is the only to support wildcards.
I think a simple table-to-itself join with a suitable where clause will be sufficient (this query is straight from my head, not tested):
SELECT curr_activity.username, COUNT(prev_activity.username) AS did_something_in_the_past
FROM tableActions AS curr_activity
LEFT JOIN tableActions AS prev_activity ON curr_activity.username = prev_activity.username
WHERE curr_activity.datetime >= '2011-01-01' AND curr_activity.datetime < '2011-02-01'
AND prev_activity.datetime < '2011-01-01'
GROUP BY curr_activity.username
Indexes do matter. You must index the username
and datetime
column, and the datetime
column must be a datetime
or a similar data type.
SELECT username,
MIN(datetime) AS firstAction,
MAX(datetime) AS numMonth,
COUNT(*) AS numTotal
GROUP BY ini.username
HAVING numTotal > 1
WHERE DATETIME between '201001%' AND '201102%'
ORDER BY username
* I think this collapsed version is what you need ?
I think you can replace
SELECT COUNT(*)
FROM tableActions tot
WHERE tot.username = ini.username
AND tot.datetime < '201101%'
AND tot.datetime > '201001%'
with
SELECT 1
FROM tableActions tot
WHERE tot.username = ini.username
AND tot.datetime < '201101%'
AND tot.datetime > '201001%' LIMIT 1
, so it does not have to loop through all the records and count them.
精彩评论