I have the following query that UNIONs two tables which contains tags associated with two separate entities: cat and dog. I am trying to create a table that lists tags and the number of times their appear across two tables. This is a normalized tagging system, so only the tag ID is referenced in the cat_tags and dog_tags tables, and that's why I am doing another INNER JOIN to get the actual tag value.
SELECT x.tag_id, (COUNT(y.tag_id) + COUNT(z.tag_id)) AS num, tag_name AS tag
FROM (SELECT dt.tag_id FROM dog_tags dt UNION SELECT st.tag_id FROM cat_tags st) x
LEFT JOIN dog_tags y ON y.tag_id = x.tag_id
LEFT JOIN cat_tags z ON z.tag_id = x.tag_id
INNER JOIN tags t ON x.tag_id = t.tag_id
GROUP BY x.tag_id ORDER BY num DESC LIMIT 0,100
The problem is that the num counts are not correct for tags that appear multiple times across the two tables. For example tag ID number 5 ("collar") appears twice in cat_tags and twice in dog_tags, however the above query gives the total count as 8 instead of 4. Another tag that ap开发者_运维技巧pears three times comes out as 6. Something is multiplying them by 2. What is it?
I think your query is really complicated and you could try something like this:
SELECT tag_id
, tag_name
, sum(num) as num
FROM tags
join
(
SELECT tag_id, count(*) as num FROM dog_tags GROUP BY tag_id
union all
SELECT tag_id, count(*) as num FROM cat_tags GROUP BY tag_id
) as AnimalsCount on AnimalsCount.tag_id = tags.tag_id
GROUP BY tag_id
, tag_name -- you can remove this if you are 100% sure is not necessary
BTW check the difference between union
and union all
: http://dev.mysql.com/doc/refman/5.0/en/union.html.
Try this:
select t.tag_id, t.tag_name as tag,
ifnull(dc.dog_total, 0) + ifnull(cc.cat_total, 0) as num
from
tags t
left join (
select tag_id, count(*) as dog_total
from dog_tags
group by tag_id
) as dc on t.tag_id = dc.tag_id
left join (
select tag_id, count(*) as cat_total
from cat_tags
group by tag_id
) as cc on t.tag_id = cc.tag_id
order by num desc
limit 0, 100
Well, the problem was the joins you have in your query. Because you weren't grouping by tag_id
for each source table. So, if collar
appeared 2 times in dog_tags
and 3 in cat_tags
the join would result in 6 rows that share the same tag_id
, so the counts would be wrong. Remember that a join is a Cartesian product of the rows of each joined table with a certain criteria. So, by grouping first by tag_id
for each source table, we ensure that the tag_id
appears only once in each table or derived table. And when we join the tables each tag_id
will produce a single row.
精彩评论