I currently have two tables, one with documents and another with ratings:
doc_id | doc_groupid | doc_name | doc_time
and then
rating_id | rating_docid | rating_score
where rating_score (named rating_type in the queries below) is either -1 or 1.
What I need to do is have a single query that retrieves every column in the document table WHERE groupid = #, but also has columns which aggregate the ratings. I can retrieve a list of ratings using
SELECT rating_docid,
SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
FROM ratings
GROUP BY rating_docid
Which gives me a list of documents (as long as they have been rated) and how many upvotes or downvotes they have. I can also obviously very easily get a list of documents with
SELECT * FROM documents WHERE doc_groupid = #
But I have no idea how to do this without a subquery (that is, using JOIN or LEFT JOIN), and my understanding is that a subquery is too slow. Honestly, I have no idea how to do this with a subquery either.
So my question is:
- How can I do this with a speedy join?
- How can I do this with a subquery?
Thanks!
Use:
SELECT d.doc_id,
d.doc_name,
d.doc_time,
COALESCE(SUM(CASE WHEN r.rating_type = 1 THEN 1 ELSE 0 END), 0) AS upvotes,
COALESCE(SUM(CASE WHEN r.rating_type = -1 THEN 1 ELSE 0 END), 0) AS downvotes
FROM DOCUMENTS d
LEFT JOIN RATINGS r ON r.rating_docid = d.doc_id
WHERE d.doc_groupid = ?
GROUP BY d.doc_id, d.doc_name, d.doc_time
The doc_time column is odd to me; it makes me think you can have duplicates but with different time values...
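To try the query above without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows and the group id 1 are made up for illustration, and the table names documents and ratings are taken from the query, so treat it as a sanity check rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_groupid INTEGER,
                            doc_name TEXT, doc_time TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY, rating_docid INTEGER,
                          rating_type INTEGER);
    -- hypothetical sample data: doc 1 has 2 up / 1 down, doc 2 is unrated,
    -- doc 3 belongs to another group
    INSERT INTO documents VALUES (1, 1, 'a', '2010-01-01'),
                                 (2, 1, 'b', '2010-01-02'),
                                 (3, 2, 'c', '2010-01-03');
    INSERT INTO ratings (rating_docid, rating_type) VALUES
        (1, 1), (1, 1), (1, -1), (3, 1);
""")

query = """
SELECT d.doc_id, d.doc_name, d.doc_time,
       COALESCE(SUM(CASE WHEN r.rating_type = 1 THEN 1 ELSE 0 END), 0) AS upvotes,
       COALESCE(SUM(CASE WHEN r.rating_type = -1 THEN 1 ELSE 0 END), 0) AS downvotes
FROM documents d
LEFT JOIN ratings r ON r.rating_docid = d.doc_id
WHERE d.doc_groupid = ?
GROUP BY d.doc_id, d.doc_name, d.doc_time
ORDER BY d.doc_id
"""
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps the unrated document in the result with zero upvotes and zero downvotes, which an INNER JOIN would drop.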
JOIN vs Subquery
JOINs (INNER and OUTER) are not subqueries. To make things more complicated, subqueries can mean:
a query in the SELECT clause (AKA sub-select):
SELECT (SELECT col FROM TABLE) AS col2, ...
a query in the WHERE or HAVING clauses:
WHERE col = (SELECT column FROM TABLE)
HAVING col IN (SELECT cols FROM TABLE)
a query in the JOIN (AKA derived table, inline view):
LEFT JOIN (SELECT u.user, COUNT(*) AS num FROM TABLE u GROUP BY u.user) x ON x.user = t.column
There's no hard'n'fast rule about one being better than the other because it all depends on:
- table structure
- data
- indexing and table statistics
- expected results
All that really matters is the work is done in as few passes over a table as necessary--ideally one.
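For the "how can I do this with a subquery" part of the question, the first form listed above (a query in the SELECT clause) could look roughly like the sketch below. It uses Python's sqlite3 with made-up sample rows, and the table names documents and ratings are assumed; each sub-select is evaluated per document, so it conceptually touches the ratings table more often than the single join does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_groupid INTEGER,
                            doc_name TEXT, doc_time TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY, rating_docid INTEGER,
                          rating_type INTEGER);
    -- hypothetical sample data
    INSERT INTO documents VALUES (1, 1, 'a', '2010-01-01'),
                                 (2, 1, 'b', '2010-01-02'),
                                 (3, 2, 'c', '2010-01-03');
    INSERT INTO ratings (rating_docid, rating_type) VALUES
        (1, 1), (1, 1), (1, -1), (3, 1);
""")

# Correlated sub-selects: one COUNT(*) per vote direction for each document row.
subselect = """
SELECT d.*,
       (SELECT COUNT(*) FROM ratings r
         WHERE r.rating_docid = d.doc_id AND r.rating_type = 1) AS upvotes,
       (SELECT COUNT(*) FROM ratings r
         WHERE r.rating_docid = d.doc_id AND r.rating_type = -1) AS downvotes
FROM documents d
WHERE d.doc_groupid = ?
ORDER BY d.doc_id
"""
rows = conn.execute(subselect, (1,)).fetchall()
for row in rows:
    print(row)
```

Whether this is actually slower than the join depends on the optimizer, the indexes and the data, as noted above; an index on rating_docid is the first thing to check in either case.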
I guess you need something like
SELECT *
FROM documents d
LEFT JOIN
(
SELECT rating_docid,
SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
FROM rating_table
GROUP BY rating_docid
)r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = ....
Also, it will probably work faster if you change it to
SELECT *
FROM documents d
LEFT JOIN
(
SELECT rating_docid,
SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
FROM rating_table
INNER JOIN documents d1 ON (d1.doc_id = rating_docid )
WHERE d1.doc_groupid =...
GROUP BY rating_docid
)r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = ....
It might look strange because of the two joins but, supposing you have indexed your columns properly, it should perform very well.
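One thing to watch with the derived-table form: a document without any ratings has no matching row in r, so UpVotes and DownVotes come back as NULL rather than 0. The sketch below shows this with Python's sqlite3 and made-up sample rows; it names the ratings table ratings instead of rating_table, and the up0/down0 aliases are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_groupid INTEGER,
                            doc_name TEXT, doc_time TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY, rating_docid INTEGER,
                          rating_type INTEGER);
    -- hypothetical sample data: doc 2 has no ratings at all
    INSERT INTO documents VALUES (1, 1, 'a', '2010-01-01'),
                                 (2, 1, 'b', '2010-01-02');
    INSERT INTO ratings (rating_docid, rating_type) VALUES (1, 1), (1, 1), (1, -1);
""")

query = """
SELECT d.doc_id, r.UpVotes, r.DownVotes,
       COALESCE(r.UpVotes, 0) AS up0, COALESCE(r.DownVotes, 0) AS down0
FROM documents d
LEFT JOIN (
    SELECT rating_docid,
           SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END) AS UpVotes,
           SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
    FROM ratings
    GROUP BY rating_docid
) r ON r.rating_docid = d.doc_id
WHERE d.doc_groupid = ?
ORDER BY d.doc_id
"""
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row)  # NULL arrives in Python as None
```

Wrapping the two columns in COALESCE in the outer SELECT is enough if the application expects zeros.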
-- Caution: a document with both up and down votes is joined to every
-- (down, up) pair of its ratings, so check the sums against real data.
SELECT d.doc_id, d.doc_name, d.doc_time,
SUM(rd.rating_type) * -1 as DownVotes,
SUM(ru.rating_type) as UpVotes
FROM documents d
LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
GROUP BY d.doc_id
You might want to add a COALESCE (http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce) to keep the query from returning NULL when there is nothing to join.
SELECT d.doc_id,
COALESCE(SUM(rd.rating_type), 0) * -1 as DownVotes,
COALESCE(SUM(ru.rating_type), 0) as UpVotes
FROM documents d ...
I wouldn't recommend a subquery if you have many documents to check, because another query is executed for every document, which means a lot of overhead.
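A caveat for the version with two LEFT JOINs on ratings: when a document has both kinds of rating, every down row is paired with every up row, so the sums are multiplied. The sketch below reproduces that with Python's sqlite3 and made-up rows (2 upvotes and 3 downvotes for one document), and shows COUNT(DISTINCT rating_id) as one possible way around it; that replacement is a suggestion here, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, doc_groupid INTEGER,
                            doc_name TEXT, doc_time TEXT);
    CREATE TABLE ratings (rating_id INTEGER PRIMARY KEY, rating_docid INTEGER,
                          rating_type INTEGER);
    -- hypothetical sample data: doc 1 has 2 up and 3 down votes, doc 2 has none
    INSERT INTO documents VALUES (1, 1, 'a', '2010-01-01'),
                                 (2, 1, 'b', '2010-01-02');
    INSERT INTO ratings (rating_docid, rating_type) VALUES
        (1, 1), (1, 1), (1, -1), (1, -1), (1, -1);
""")

joins = """
FROM documents d
LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
GROUP BY d.doc_id
ORDER BY d.doc_id
"""

# Summing over the joined rows: 3 down rows x 2 up rows = 6 rows for doc 1.
naive = conn.execute("""
SELECT d.doc_id,
       COALESCE(SUM(rd.rating_type), 0) * -1 AS DownVotes,
       COALESCE(SUM(ru.rating_type), 0) AS UpVotes
""" + joins).fetchall()

# Counting each rating_id once is unaffected by the row multiplication.
fixed = conn.execute("""
SELECT d.doc_id,
       COUNT(DISTINCT rd.rating_id) AS DownVotes,
       COUNT(DISTINCT ru.rating_id) AS UpVotes
""" + joins).fetchall()

print(naive)
print(fixed)
```

COUNT also returns 0 for documents with nothing to join, so the COALESCE is not needed in that variant.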