JOINS, EXISTS or IN which is better? Few questions on SQL_问答_开发者

I have few questions on SQL..

How to analyze the performance of a query? Any soft开发者_如何转开发ware, inbuilt features of MSSQL server 2005/2008?
What should be used in place of inin queries so that the performance is better? Eg:

SELECT * FROM enquiry_courses WHERE enquiry_id IN ( SELECT enquiry_id FROM enquiries WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1' )
Which is better: JOINS, EXISTS or IN in terms of performance?

Comments/Help appreciated...

Use the SQL Server Management Studio, and include Actual Execution Plan and SET STATISTICS TIME and SET STATISTICS IO.
This in corresponds to a JOIN, but rewriting probably won't matter. A guess could be that you need indexes on enquiry_courses.enquiry_id and on enquiries.session_id to improve query performance.

MSSQL generally comes with a built in gui tool called Query Analyser which describes how the query will be executed.

For 2) you could rewrite as:

SELECT * 
FROM enquiry_courses ec 
WHERE EXISTS (select 1 FROM enquiries e 
              WHERE e.enquiry_id = ec.enquiry_id 
              and e.session_id ='4cd3420a16dbd61c6af58f6199ac00f1' )

but I can't believe it would make any performance difference in a modern RDBMS.

3: I would expect an IN or EXIST clause to be flattened to a JOIN by the database engine, so there shouldn't be a difference in performance. I don't know about SQL Server, but in Oracle you can verify this by checking the execution plan.

This question suggests that EXISTS is quicker which is what I had been taught IN () vs EXISTS () in SqlServer 2005 (or generally in any RDBMS)

One thing to note is that EXISTS and IN should be used in preference to NOT EXISTS and NOT IN

Bit of a tangent from performance but this is a good article on the subtle differences between IN and EXISTS http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

As others have said, check the "execution plan". SQL Server Management studio can show you two kinds of execution plans, estimated and actual. Estimated is how SQL Server guesses it would execute the query and is returned without actually executing the query, and the actual plan is returned together with a result set and shows what was actually done.
That query looks good, but you have to make sure that you have an index on enquiry_courses.enquiry_id, and it's probably best that enquiries.enquiry_id is not nullable.
The semantics of IN and EXISTS are slightly different (IN will return no rows if there is one or more NULLs in the subquery). If the subquery is guaranteed to be not null, it doesn't matter. There is some kind of "internet truth" that you should use EXISTS on SQL Server and IN on Oracle, but this might have been true when dinosaurs ruled the planet but it doesn't apply anymore. IN and EXISTS both perform a semi-join, and the optimizer is more than capable of deciding how to execute this join.

I guess the join gives more free to the engine for choice the best query plan. In your exactly case, probably have all solutions similar performances.

SELECT enquiry_courses.* 
FROM enquiry_courses 
INNER JOIN enquiries ON enquiries.enquiry_id=enquiry_courses 
                        AND session_id = '4cd3420a16dbd61c6af58f6199ac00f1'

They each behave differently: it is not a performance choice

The only correct and reliable choice is EXISTS or NOT EXISTS that works all the time.

JOIN may needs DISTINCT
WHERE/LEFT JOIN would needs correct placement of the filter
NOT IN fails on NULL

Example:

DECLARE @Parent TABLE (foo int NULL)
INSERT @Parent (foo) VALUES (1)
INSERT @Parent (foo) VALUES (2)
INSERT @Parent (foo) VALUES (3)
INSERT @Parent (foo) VALUES (4)

DECLARE @Child TABLE (bar int NULL, foo int NULL)
INSERT @Child (bar, foo) VALUES (100, 1)
INSERT @Child (bar, foo) VALUES (200, 2)
INSERT @Child (bar, foo) VALUES (201, 2)
INSERT @Child (bar, foo) VALUES (300, NULL)
INSERT @Child (bar, foo) VALUES (301, NULL)
INSERT @Child (bar, foo) VALUES (400, 4)
INSERT @Child (bar, foo) VALUES (500, NULL)

--"positive" checks
SELECT -- multiple "2" = FAIL without DISTINCT
    P.*
FROM
    @Parent P JOIN @Child C ON P.foo = C.foo

SELECT -- correct
    P.*
FROM
    @Parent P
WHERE
    P.foo IN (SELECT c.foo FROM @Child C)

SELECT -- correct
    P.*
FROM
    @Parent P
WHERE
    EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)

--"negative" checks
SELECT -- correct
    P.*
FROM
    @Parent P LEFT JOIN @Child C ON P.foo = C.foo
WHERE
    C.foo IS NULL

SELECT -- no rows = FAIL
    P.*
FROM
    @Parent P
WHERE
    P.foo NOT IN (SELECT c.foo FROM @Child C)

SELECT -- correct
    P.*
FROM
    @Parent P
WHERE
    NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)

Note: with EXISTS, the SELECT in the subquery is irrelevant as mentioned in ANSI 92 standard...

NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT NULL FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1 FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1/0 FROM @Child C WHERE P.foo = C.foo)

check the Excution Plan
You can optimise your query by:
- Make a "arguments search" rather than IN
- Put Index on session_id

    SELECT * FROM enquiry_courses as Courses, enquiries as Enquiries
    WHERE Enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1'   
    AND Courses.enquiry_id = Enquiries.enquiry_id

3.Exists is better for performance.

EDIT: Exists & IN are better than JOIN for performance issues.

EDIT: I re-wrote the query so that it's faster (I put the most restrictive condition first in the WHERE close)