Getting a list of items from one table and adding aggregated ratings from another

I currently have two tables, one with documents and another with ratings:

doc_id | doc_groupid | doc_name | doc_time

and then

rating_id | rating_docid | rating_score

where rating_score is either -1 or 1.

What I need to do is have a single query that retrieves every column in the document table WHERE groupid = #, but also has columns which aggregate the ratings. I can retrieve a list of ratings using

  SELECT rating_docid,
         SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
         SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
    FROM ratings
GROUP BY rating_docid

Which gives me a list of documents (as long as they have been rated) and how many upvotes or downvotes they have. I can also obviously very easily get a list of documents with

SELECT * FROM documents WHERE doc_groupid = #

But I have no idea how to do this with a JOIN or LEFT JOIN rather than a subquery, which I understand to be too slow. Honestly, I have no idea how to do this with a subquery either.

So my question is:

How can I do this with a speedy join?
How can I do this with a subquery?

Thanks!

Use:

   SELECT d.doc_id,
          d.doc_name,
          d.doc_time, 
          COALESCE(SUM(CASE WHEN r.rating_type = 1 THEN 1 ELSE 0 END), 0) AS upvotes,
          COALESCE(SUM(CASE WHEN r.rating_type = -1 THEN 1 ELSE 0 END), 0) AS downvotes
     FROM DOCUMENTS d
LEFT JOIN RATINGS r ON r.rating_docid = d.doc_id
    WHERE d.doc_groupid = ?
 GROUP BY d.doc_id, d.doc_name, d.doc_time

The doc_time column looks odd to me; it makes me think you can have duplicate documents that differ only in their time values...
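As a rough check of the LEFT JOIN query above, here is a small self-contained sketch. The sample rows, the group id 10 and the column types are made up for illustration. The score column is called rating_type, as in the queries, although the schema listing in the question calls it rating_score.

```sql
-- Hypothetical sample schema and data, for illustration only.
CREATE TABLE documents (
    doc_id      INTEGER PRIMARY KEY,
    doc_groupid INTEGER NOT NULL,
    doc_name    VARCHAR(100) NOT NULL,
    doc_time    TIMESTAMP NOT NULL
);

CREATE TABLE ratings (
    rating_id    INTEGER PRIMARY KEY,
    rating_docid INTEGER NOT NULL,
    rating_type  INTEGER NOT NULL   -- 1 = upvote, -1 = downvote
);

INSERT INTO documents (doc_id, doc_groupid, doc_name, doc_time) VALUES
    (1, 10, 'alpha', '2011-01-01 10:00:00'),
    (2, 10, 'beta',  '2011-01-02 10:00:00'),   -- never rated
    (3, 20, 'gamma', '2011-01-03 10:00:00');   -- different group

INSERT INTO ratings (rating_id, rating_docid, rating_type) VALUES
    (1, 1,  1),
    (2, 1,  1),
    (3, 1, -1),
    (4, 3,  1);

-- One pass over the join; the LEFT JOIN keeps documents without ratings.
   SELECT d.doc_id,
          d.doc_name,
          d.doc_time,
          COALESCE(SUM(CASE WHEN r.rating_type = 1 THEN 1 ELSE 0 END), 0) AS upvotes,
          COALESCE(SUM(CASE WHEN r.rating_type = -1 THEN 1 ELSE 0 END), 0) AS downvotes
     FROM documents d
LEFT JOIN ratings r ON r.rating_docid = d.doc_id
    WHERE d.doc_groupid = 10
 GROUP BY d.doc_id, d.doc_name, d.doc_time
 ORDER BY d.doc_id;

-- Rows returned for group 10:
--   doc 1 'alpha': upvotes = 2, downvotes = 1
--   doc 2 'beta' : upvotes = 0, downvotes = 0
```

The unrated document still appears because the LEFT JOIN produces a single row with NULL rating columns for it, and the CASE expressions turn that row into zeros.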

JOIN vs Subquery

JOINs (INNER and OUTER) are not subqueries. To make things more complicated, subqueries can mean:

a query in the SELECT clause (AKA sub-select):
```
SELECT (SELECT col FROM TABLE) AS col2, ...
```

a query in the WHERE or HAVING clauses:

WHERE col = (SELECT column FROM TABLE)
HAVING col IN (SELECT cols FROM TABLE)

a query in the JOIN (AKA derived table, inline view):

LEFT JOIN (SELECT u.user,
                  COUNT(*) AS num
             FROM TABLE u
         GROUP BY u.user) x ON x.user = t.column

There's no hard'n'fast rule about one being better than the other because it all depends on:

table structure
data
indexing and table statistics
expected results

All that really matters is the work is done in as few passes over a table as necessary--ideally one.
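To tie the subquery forms listed above back to the question, here is a sketch of the "query in the SELECT clause" variant. It assumes a ratings table with a rating_type column as in the queries in this thread; the sample rows and the group id 10 are hypothetical.

```sql
-- Hypothetical sample schema and data, for illustration only.
CREATE TABLE documents (
    doc_id      INTEGER PRIMARY KEY,
    doc_groupid INTEGER NOT NULL,
    doc_name    VARCHAR(100) NOT NULL
);

CREATE TABLE ratings (
    rating_id    INTEGER PRIMARY KEY,
    rating_docid INTEGER NOT NULL,
    rating_type  INTEGER NOT NULL   -- 1 = upvote, -1 = downvote
);

INSERT INTO documents (doc_id, doc_groupid, doc_name) VALUES
    (1, 10, 'alpha'),
    (2, 10, 'beta');                -- never rated

INSERT INTO ratings (rating_id, rating_docid, rating_type) VALUES
    (1, 1,  1),
    (2, 1,  1),
    (3, 1, -1);

-- Correlated sub-selects: each one is evaluated per document row.
-- COUNT(*) over zero rows is 0, so no COALESCE is needed here.
SELECT d.doc_id,
       d.doc_name,
       (SELECT COUNT(*)
          FROM ratings r
         WHERE r.rating_docid = d.doc_id
           AND r.rating_type = 1)  AS upvotes,
       (SELECT COUNT(*)
          FROM ratings r
         WHERE r.rating_docid = d.doc_id
           AND r.rating_type = -1) AS downvotes
  FROM documents d
 WHERE d.doc_groupid = 10
 ORDER BY d.doc_id;

-- Rows returned:
--   doc 1 'alpha': upvotes = 2, downvotes = 1
--   doc 2 'beta' : upvotes = 0, downvotes = 0
```

This form reads the ratings table twice per document, whereas the single LEFT JOIN with CASE expressions reads it once, which is the "as few passes as necessary" point made above.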

I guess you need something like

SELECT * 
FROM documents d
LEFT JOIN 
(
   SELECT rating_docid,
     SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
     SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
     FROM rating_table
     GROUP BY rating_docid
)r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = ....
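One detail worth checking with this derived-table form: documents that have no ratings find no match in r, so UpVotes and DownVotes come back as NULL rather than 0. A minimal sketch of one way to handle that, assuming the table is named ratings (the query above calls it rating_table) and using made-up sample rows and a hypothetical group id of 10:

```sql
-- Hypothetical sample schema and data, for illustration only.
CREATE TABLE documents (
    doc_id      INTEGER PRIMARY KEY,
    doc_groupid INTEGER NOT NULL,
    doc_name    VARCHAR(100) NOT NULL
);

CREATE TABLE ratings (
    rating_id    INTEGER PRIMARY KEY,
    rating_docid INTEGER NOT NULL,
    rating_type  INTEGER NOT NULL   -- 1 = upvote, -1 = downvote
);

INSERT INTO documents (doc_id, doc_groupid, doc_name) VALUES
    (1, 10, 'alpha'),
    (2, 10, 'beta');                -- never rated

INSERT INTO ratings (rating_id, rating_docid, rating_type) VALUES
    (1, 1,  1),
    (2, 1, -1),
    (3, 1, -1);

-- COALESCE on the outer columns turns the NULLs of unmatched rows into 0.
SELECT d.doc_id,
       d.doc_name,
       COALESCE(r.UpVotes, 0)   AS UpVotes,
       COALESCE(r.DownVotes, 0) AS DownVotes
FROM documents d
LEFT JOIN
(
   SELECT rating_docid,
     SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
     SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
     FROM ratings
     GROUP BY rating_docid
) r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = 10
ORDER BY d.doc_id;

-- Rows returned:
--   doc 1 'alpha': UpVotes = 1, DownVotes = 2
--   doc 2 'beta' : UpVotes = 0, DownVotes = 0
```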

Also, it will probably work faster if you change it to

 SELECT * 
FROM documents d
LEFT JOIN 
(
   SELECT rating_docid,
     SUM(CASE WHEN rating_type = 1 THEN 1 ELSE 0 END ) AS UpVotes,
     SUM(CASE WHEN rating_type = -1 THEN 1 ELSE 0 END) AS DownVotes
     FROM rating_table
     INNER JOIN documents d1 ON (d1.doc_id = rating_docid )
     WHERE d1.doc_groupid =...
     GROUP BY rating_docid
)r ON (r.rating_docid = d.doc_id)
WHERE d.doc_groupid = ....

Might look strange because of the two joins but, supposing you have indexed your columns properly, it should perform very well.
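What counts as proper indexing depends on the engine and the data, but a plausible starting point is one index on the column used for filtering and one on the column used for joining. The sketch below assumes the table and column names used in this thread; the index names and column types are hypothetical.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE documents (
    doc_id      INTEGER PRIMARY KEY,   -- the primary key is already indexed
    doc_groupid INTEGER NOT NULL,
    doc_name    VARCHAR(100) NOT NULL
);

CREATE TABLE ratings (
    rating_id    INTEGER PRIMARY KEY,
    rating_docid INTEGER NOT NULL,
    rating_type  INTEGER NOT NULL      -- 1 = upvote, -1 = downvote
);

-- Supports the filter WHERE doc_groupid = ...
CREATE INDEX idx_documents_groupid ON documents (doc_groupid);

-- Supports the join on rating_docid; including rating_type lets the
-- aggregation be answered from the index alone on engines that support
-- index-only reads.
CREATE INDEX idx_ratings_docid_type ON ratings (rating_docid, rating_type);
```

Whether the optimizer actually uses these indexes is best confirmed by looking at the query plan (EXPLAIN or its equivalent) on real data.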

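SELECT d.doc_id, d.doc_name, d.doc_time,
       SUM(rd.rating_type) * -1 as DownVotes,
       SUM(ru.rating_type) as UpVotes
FROM documents d
    -- rd matches only downvotes, ru only upvotes.
    -- Note: every rd row is paired with every ru row of the same document,
    -- so both sums are inflated when a document has votes of both kinds.
    LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
    LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
GROUP BY d.doc_id, d.doc_name, d.doc_time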

You might want to add COALESCE (http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#function_coalesce) so that the query returns 0 instead of NULL when there is nothing to join.

SELECT d.doc_id, 
       COALESCE(SUM(rd.rating_type), 0) * -1 as DownVotes,
       COALESCE(SUM(ru.rating_type), 0) as UpVotes
FROM documents d ...
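One thing to verify with the two-join form: joining ratings twice pairs each downvote row with each upvote row of the same document, so the sums can come out too high. The sketch below illustrates this with made-up data (one document with 2 upvotes and 3 downvotes) and shows one possible adjustment that keeps the two joins, counting distinct rating ids instead of summing. The sample rows are hypothetical.

```sql
-- Hypothetical sample schema and data, for illustration only.
CREATE TABLE documents (
    doc_id      INTEGER PRIMARY KEY,
    doc_groupid INTEGER NOT NULL,
    doc_name    VARCHAR(100) NOT NULL
);

CREATE TABLE ratings (
    rating_id    INTEGER PRIMARY KEY,
    rating_docid INTEGER NOT NULL,
    rating_type  INTEGER NOT NULL   -- 1 = upvote, -1 = downvote
);

INSERT INTO documents (doc_id, doc_groupid, doc_name) VALUES
    (1, 10, 'alpha');

INSERT INTO ratings (rating_id, rating_docid, rating_type) VALUES
    (1, 1,  1),
    (2, 1,  1),
    (3, 1, -1),
    (4, 1, -1),
    (5, 1, -1);

-- Summing over both joins: 3 downvote rows x 2 upvote rows = 6 joined rows.
SELECT d.doc_id,
       COALESCE(SUM(rd.rating_type), 0) * -1 AS DownVotes,
       COALESCE(SUM(ru.rating_type), 0)      AS UpVotes
FROM documents d
    LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
    LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
GROUP BY d.doc_id;
-- doc 1: DownVotes = 6, UpVotes = 6 (the real totals are 3 and 2)

-- Counting distinct rating ids is not affected by the row multiplication.
SELECT d.doc_id,
       COUNT(DISTINCT rd.rating_id) AS DownVotes,
       COUNT(DISTINCT ru.rating_id) AS UpVotes
FROM documents d
    LEFT JOIN ratings rd ON d.doc_id = rd.rating_docid AND rd.rating_type < 0
    LEFT JOIN ratings ru ON d.doc_id = ru.rating_docid AND ru.rating_type > 0
GROUP BY d.doc_id;
-- doc 1: DownVotes = 3, UpVotes = 2
```

A single join with CASE expressions inside SUM avoids the multiplication altogether, since each rating row is then joined only once.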

I wouldn't recommend a subquery if you have many documents to check, because another query is executed for every document, which means a lot of overhead.