Can someone please explain the discrepancy among the following queries:
Query 1 (returns 87 results):
SELECT userId, COUNT(userId) as usercount
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-03-01')
GROUP BY userId
HAVING usercount > 1
Query 2 (returns 177 results):
SELECT userId, COUNT(userId) as usercount
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
GROUP BY userId
HAVING usercount > 1
Query 3 (returns 55 results):
SELECT userId, COUNT(userId) as usercount
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
AND created < UNIX_TIMESTAMP('2011-03-01')
GROUP BY userId
HAVING usercount > 1
Now I would think that the number of results from Query 2 minus the number of results from Query 1 would equal the number of results from Query 3. But this is not the case. Can someone please explain why?
Thanks, Jonah
EDIT:
for clarification, the query i want to write is:
SELECT userId
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
AND created < UNIX_TIMESTAMP('2011-03-01')
AND userId "appears in开发者_运维问答 at least one other record from before '2011-03-01'"
Because you're comparing number of groups.
Let's see this on a small example:
user_id | created
1 | 2011-02-02
1 | 2011-02-02
1 | 2011-02-02
2 | 2011-02-02
2 | 2011-02-02
1 | 2011-03-02
1 | 2011-03-02
2 | 2011-03-02
2 | 2011-03-02
First query for this dataset will return 2 rows, second will return also 2 rows, 3rd will return also 2 rows.
Just remove your GROUP BY
and see the difference (without GROUP BY
the math will match, of course)
Query 1 has: WHERE created
>= UNIX_TIMESTAMP('2011-03-01')
ie. date is larger than …
Query 3 has: WHERE created
< UNIX_TIMESTAMP('2011-03-01')
ie. date is less than …
So, Query 3 is not the intersection of Queries 1 and 2.
user_id | created
1 | 2011-02-02
1 | 2011-03-02
1 | 2011-03-02
Query1 would be 1 result, Query2 also 1 result but Query 3 has zero results, because you are missing the people who appeared only once in that time.
精彩评论