I have four different tables in my database:
thread:
- thread_id
- thread_content
- timestamp
thread_rating:
- thread_rating_id
- thread_id
- liked
- disliked
thread_report:
- thread_report_id
- thread_id
thread_impression:
- thread_impression_id
- thread_id
And I'm going to join on these tables with this SQL-Query
SELECT t.thread_id,
t.thread_content,
SUM(tra.liked) AS liked,
SUM(tra.disliked) AS disliked,
t.timestamp,
((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent
The Query should return all thread_ids with the calculated liked and disliked, the likes in percent, the timestamp, when the thread was inserted into the database and the reports in percent to the impressions (the times, the thread was shown to the user)...
Nearly all results are right, the only results which are not right are the likes and dislikes.
If I put a count(*) in front of the query, I can see, that the right results have a count of 1 and the wrong ones have sometimes a count of up to 60. Seems like there are cross join-problems...
I think that this is an issue with the Grouping, or perhaps I should embrace the Joins.
I've seen solutions with subselects. But I don't think t开发者_Go百科hat this is a great solutions for this issue...
What am I doing wrong here?
The tra
table has multiple records per thread_id. This caused double counts in the SUM
function.
Do the summations in a subselect, grouped by the join field.
That way you will only have one thread_id in tra2
to join with and duplicate rows will be avoided.
SELECT t.thread_id,
t.thread_content,
tra2.liked
tra2.disliked,
t.timestamp,
tra2.liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN (
SELECT
tra.thread_id
, SUM(tra.liked) AS liked
, SUM(tra.disliked) AS disliked
, ((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent
FROM thread_rating AS tra
GROUP BY tra.thread_id
) as tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent DESC
精彩评论