
Are Inner Queries Okay?

Developer https://www.devze.com 2023-03-17 05:35 Source: the web

I often see something like...

SELECT events.id, events.begin_on, events.name
  FROM events
 WHERE events.user_id IN ( SELECT contacts.user_id 
                             FROM contacts 
                            WHERE contacts.contact_id = '1')
   OR events.user_id IN ( SELECT contacts.contact_id 
                            FROM contacts 
                           WHERE contacts.user_id = '1')

Is it okay to have a query inside a query? Is it called an "inner query" or a "sub-query"? Does it count as three queries (in my example)? If it's bad to do so, how can I rewrite my example?


Your example isn't too bad. The biggest problems usually come from cases where there is what's called a "correlated subquery". That's when the subquery is dependent on a column from the outer query. These are particularly bad because the subquery effectively needs to be rerun for every row in the potential results.
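To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and sample rows are invented for illustration and are not from the question. The first query's subquery never mentions the outer table, so it can be evaluated once; the second references e.user_id, which makes it correlated. Whether an engine really reruns it per row depends on its optimizer.

```python
import sqlite3

# Hypothetical schema and data, just enough to run both query shapes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
CREATE TABLE contacts (user_id INTEGER, contact_id INTEGER);
INSERT INTO events VALUES (1, 2, 'picnic'), (2, 3, 'concert'), (3, 4, 'lecture');
INSERT INTO contacts VALUES (2, 1), (1, 3);
""")

# Uncorrelated: the inner SELECT does not reference the outer table,
# so its result does not change from one outer row to the next.
uncorrelated = """
SELECT e.id
  FROM events e
 WHERE e.user_id IN (SELECT c.user_id
                       FROM contacts c
                      WHERE c.contact_id = 1)
 ORDER BY e.id
"""

# Correlated: the inner SELECT uses e.user_id from the outer query,
# so conceptually it is evaluated once per outer row.
correlated = """
SELECT e.id
  FROM events e
 WHERE (SELECT COUNT(*)
          FROM contacts c
         WHERE c.contact_id = 1
           AND c.user_id = e.user_id) > 0
 ORDER BY e.id
"""

uncorrelated_ids = [row[0] for row in conn.execute(uncorrelated)]
correlated_ids = [row[0] for row in conn.execute(correlated)]
```

Both queries select the same events here; only the shape of the subquery differs.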

You can rewrite your subqueries using joins and GROUP BY. As written, though, performance can vary, especially depending on your RDBMS.


It varies from database to database, and especially with whether the columns compared are

  • indexed or not
  • nullable or not

Generally, though, if your query does not use columns from the table you would join to, you should be using either IN or EXISTS:

SELECT e.id, e.begin_on, e.name
  FROM EVENTS e
 WHERE EXISTS (SELECT NULL
                 FROM CONTACTS c 
                WHERE ( c.contact_id = '1' AND c.user_id = e.user_id )
                   OR ( c.user_id = '1' AND c.contact_id = e.user_id ))

Using a JOIN (INNER or OUTER) can inflate records if the child table has more than one record related to a parent table record. That's fine if you need that information, but if not then you need to use either GROUP BY or DISTINCT to get a result set of unique values -- and that can cost you when you review the query costs.
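A small sketch of that row inflation, again using Python's sqlite3 with made-up tables and rows (none of this data comes from the question). Users 1 and 2 list each other as contacts, so the pair is stored twice and a plain JOIN returns that user's event twice, while EXISTS returns it once.

```python
import sqlite3

# Hypothetical sample data: the 1 <-> 2 relationship is stored in both directions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, begin_on TEXT, name TEXT);
CREATE TABLE contacts (user_id INTEGER, contact_id INTEGER);
INSERT INTO events VALUES
  (1, 2, '2023-04-01', 'picnic'),
  (2, 3, '2023-04-02', 'concert'),
  (3, 4, '2023-04-03', 'lecture');
INSERT INTO contacts VALUES (2, 1), (1, 2), (1, 3);
""")

# The same matching rule is used in all three queries.
match = """
   (c.contact_id = 1 AND c.user_id = e.user_id)
OR (c.user_id = 1 AND c.contact_id = e.user_id)
"""

exists_sql = f"""
SELECT e.id FROM events e
 WHERE EXISTS (SELECT NULL FROM contacts c WHERE {match})
 ORDER BY e.id
"""
join_sql = f"SELECT e.id FROM events e JOIN contacts c ON {match} ORDER BY e.id"
distinct_sql = f"SELECT DISTINCT e.id FROM events e JOIN contacts c ON {match} ORDER BY e.id"

exists_ids = [row[0] for row in conn.execute(exists_sql)]      # one row per event
join_ids = [row[0] for row in conn.execute(join_sql)]          # one row per matching contact
distinct_ids = [row[0] for row in conn.execute(distinct_sql)]  # duplicates collapsed again
```

Event 1 matches two contact rows, so the plain JOIN lists it twice; DISTINCT removes the extra row at the cost of an additional de-duplication step.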

EXISTS

Though EXISTS clauses look like correlated subqueries, they do not execute as such (RBAR: Row By Agonizing Row). EXISTS returns a boolean based on the criteria provided, and exits on the first instance that is true -- this can make it faster than IN when dealing with duplicates in a child table.


You could JOIN to the Contacts table instead:

SELECT events.id, events.begin_on, events.name
FROM events
JOIN contacts
-- the '1' filter belongs on contacts; filtering events.user_id = '1' would return user 1's own events
ON (contacts.contact_id = '1' AND events.user_id = contacts.user_id)
OR (contacts.user_id = '1' AND events.user_id = contacts.contact_id)
GROUP BY events.id  
-- exercise: without the GROUP BY, how many duplicate rows can you end up with?

This leaves the following question up to the database: "Should we look through all the contacts table and find all the '1's in the various columns, or do something else?" where your original SQL didn't give it much choice.
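One way to sanity-check a rewrite like this is to run it next to the original on a small data set. The sketch below does that with Python's sqlite3; the schema and rows are invented for illustration. The JOIN keeps the '1' filter on the contacts side in the ON clause, which is my assumption about the intended meaning of the original query. Note that selecting non-grouped columns alongside GROUP BY events.id is accepted by SQLite and MySQL but rejected by some stricter engines.

```python
import sqlite3

# Hypothetical data: event 4 belongs to user 1 and must not be returned,
# because the original query only asks for events of user 1's contacts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, begin_on TEXT, name TEXT);
CREATE TABLE contacts (user_id INTEGER, contact_id INTEGER);
INSERT INTO events VALUES
  (1, 2, '2023-04-01', 'picnic'),
  (2, 3, '2023-04-02', 'concert'),
  (3, 4, '2023-04-03', 'lecture'),
  (4, 1, '2023-04-04', 'own event');
INSERT INTO contacts VALUES (2, 1), (1, 2), (1, 3);
""")

# The question's query, with the user id passed as a bound parameter.
original_sql = """
SELECT events.id, events.begin_on, events.name
  FROM events
 WHERE events.user_id IN (SELECT contacts.user_id FROM contacts
                           WHERE contacts.contact_id = :uid)
    OR events.user_id IN (SELECT contacts.contact_id FROM contacts
                           WHERE contacts.user_id = :uid)
 ORDER BY events.id
"""

# JOIN rewrite: GROUP BY collapses the rows duplicated by multiple matching contacts.
join_sql = """
SELECT events.id, events.begin_on, events.name
  FROM events
  JOIN contacts
    ON (contacts.contact_id = :uid AND events.user_id = contacts.user_id)
    OR (contacts.user_id = :uid AND events.user_id = contacts.contact_id)
 GROUP BY events.id
 ORDER BY events.id
"""

original_rows = conn.execute(original_sql, {"uid": 1}).fetchall()
join_rows = conn.execute(join_sql, {"uid": 1}).fetchall()
```

If the two row lists differ on your own test data, the rewrite has changed the meaning of the query and not just its plan.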


The most common term for this sort of query is "subquery." There is nothing inherently wrong with using them, and they can make your life easier. However, performance can often be improved by rewriting queries with subqueries to use JOINs instead, because the server may find more optimizations that way.

In your example, three queries are executed: the main SELECT query, and the two SELECT subqueries.

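SELECT events.id, events.begin_on, events.name
FROM events
JOIN contacts
-- the '1' filter belongs on contacts; filtering events.user_id = '1' would return user 1's own events
ON (contacts.contact_id = '1' AND events.user_id = contacts.user_id)
OR (contacts.user_id = '1' AND events.user_id = contacts.contact_id)
GROUP BY events.id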

In your case, I believe the JOIN version will be better as you can avoid two SELECT queries on contacts, opting for the JOIN instead.

See the MySQL docs on the topic.
