I have a few questions on SQL:
1. How can I analyze the performance of a query? Is there any software for this, or any built-in feature of MS SQL Server 2005/2008?
2. What should be used in place of IN in queries so that performance is better? E.g.:
SELECT * FROM enquiry_courses WHERE enquiry_id IN ( SELECT enquiry_id FROM enquiries WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1' )
3. Which is better in terms of performance: JOIN, EXISTS, or IN?
Comments/help appreciated.
Use SQL Server Management Studio, and include the Actual Execution Plan together with SET STATISTICS TIME and SET STATISTICS IO.
This IN corresponds to a JOIN, but rewriting it probably won't matter. A guess would be that you need indexes on enquiry_courses.enquiry_id and on enquiries.session_id to improve query performance.
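A minimal sketch of how the statistics settings mentioned above might be wrapped around the query from the question in a Management Studio query window. The table and column names are the asker's; the read counts and timings are printed on the Messages tab, and the actual plan can be switched on separately with Ctrl+M before running the batch.

```sql
-- Report page reads per table (IO) and CPU/elapsed time (TIME)
-- for every statement that follows in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM enquiry_courses
WHERE enquiry_id IN (SELECT enquiry_id
                     FROM enquiries
                     WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1');

-- Switch the reporting off again so later queries are not affected.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the "logical reads" figures before and after a change (for example, adding an index) is usually more repeatable than comparing elapsed time alone.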
MSSQL generally comes with a built-in GUI tool called Query Analyzer (from SQL Server 2005 onward its functionality is part of SQL Server Management Studio), which describes how the query will be executed.
For 2) you could rewrite as:
SELECT *
FROM enquiry_courses ec
WHERE EXISTS (SELECT 1 FROM enquiries e
              WHERE e.enquiry_id = ec.enquiry_id
              AND e.session_id = '4cd3420a16dbd61c6af58f6199ac00f1')
but I can't believe it would make any performance difference in a modern RDBMS.
3: I would expect an IN or EXISTS clause to be flattened to a JOIN by the database engine, so there shouldn't be a difference in performance. I don't know about SQL Server, but in Oracle you can verify this by checking the execution plan.
This question suggests that EXISTS is quicker, which is what I had been taught: "IN () vs EXISTS () in SqlServer 2005 (or generally in any RDBMS)".
One thing to note is that EXISTS and IN should be used in preference to NOT EXISTS and NOT IN.
Bit of a tangent from performance, but this is a good article on the subtle differences between IN and EXISTS: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
As others have said, check the "execution plan". SQL Server Management Studio can show you two kinds of execution plan: estimated and actual. The estimated plan is SQL Server's guess at how it would execute the query, and it is returned without actually executing the query. The actual plan is returned together with the result set and shows what was actually done.
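Outside the GUI buttons, the same two kinds of plan can also be requested with SET options. The following is only a sketch using the asker's query; GO is the batch separator understood by Management Studio and sqlcmd, and SET SHOWPLAN_XML has to be the only statement in its batch.

```sql
-- Estimated plan: the query is compiled but NOT executed.
SET SHOWPLAN_XML ON;
GO
SELECT *
FROM enquiry_courses
WHERE enquiry_id IN (SELECT enquiry_id
                     FROM enquiries
                     WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the query runs, and the plan (including actual row
-- counts) is returned alongside the result set.
SET STATISTICS XML ON;
GO
SELECT *
FROM enquiry_courses
WHERE enquiry_id IN (SELECT enquiry_id
                     FROM enquiries
                     WHERE session_id = '4cd3420a16dbd61c6af58f6199ac00f1');
GO
SET STATISTICS XML OFF;
GO
```

A large gap between estimated and actual row counts in the second plan is often a hint that statistics are stale or that an index is missing.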
That query looks good, but you have to make sure that you have an index on enquiry_courses.enquiry_id, and it's probably best that enquiries.enquiry_id is not nullable.
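As an illustration of the indexes suggested in this thread, something like the following could be tried. The index names are hypothetical, and the sketch assumes neither column is already covered by an existing index or primary key; check the current indexes on both tables first.

```sql
-- Hypothetical index names; adjust to your own naming convention.

-- Supports the lookup of course rows by enquiry_id.
CREATE NONCLUSTERED INDEX IX_enquiry_courses_enquiry_id
    ON enquiry_courses (enquiry_id);

-- Supports the filter on session_id in the subquery.
CREATE NONCLUSTERED INDEX IX_enquiries_session_id
    ON enquiries (session_id);
```

After creating them, re-run the query with the actual execution plan enabled to confirm that the optimizer really uses the new indexes.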
The semantics of IN and EXISTS are slightly different: the negated form NOT IN will return no rows if the subquery yields one or more NULLs, whereas NOT EXISTS is unaffected. If the subquery column is guaranteed to be not null, it doesn't matter. There is some kind of "internet truth" that you should use EXISTS on SQL Server and IN on Oracle. That might have been true when dinosaurs ruled the planet, but it doesn't apply anymore. IN and EXISTS both perform a semi-join, and the optimizer is more than capable of deciding how to execute this join.
I would guess that a join gives the engine more freedom to choose the best query plan. In your exact case, all the solutions probably perform similarly.
SELECT enquiry_courses.*
FROM enquiry_courses
INNER JOIN enquiries ON enquiries.enquiry_id = enquiry_courses.enquiry_id
AND enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1'
They each behave differently: it is not a performance choice.
The only correct and reliable choice is EXISTS or NOT EXISTS, which works all the time.
- JOIN may need DISTINCT
- WHERE/LEFT JOIN needs correct placement of the filter
- NOT IN fails on NULL
Example:
DECLARE @Parent TABLE (foo int NULL)
INSERT @Parent (foo) VALUES (1)
INSERT @Parent (foo) VALUES (2)
INSERT @Parent (foo) VALUES (3)
INSERT @Parent (foo) VALUES (4)
DECLARE @Child TABLE (bar int NULL, foo int NULL)
INSERT @Child (bar, foo) VALUES (100, 1)
INSERT @Child (bar, foo) VALUES (200, 2)
INSERT @Child (bar, foo) VALUES (201, 2)
INSERT @Child (bar, foo) VALUES (300, NULL)
INSERT @Child (bar, foo) VALUES (301, NULL)
INSERT @Child (bar, foo) VALUES (400, 4)
INSERT @Child (bar, foo) VALUES (500, NULL)
--"positive" checks
SELECT -- multiple "2" = FAIL without DISTINCT
P.*
FROM
@Parent P JOIN @Child C ON P.foo = C.foo
SELECT -- correct
P.*
FROM
@Parent P
WHERE
P.foo IN (SELECT c.foo FROM @Child C)
SELECT -- correct
P.*
FROM
@Parent P
WHERE
EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
--"negative" checks
SELECT -- correct
P.*
FROM
@Parent P LEFT JOIN @Child C ON P.foo = C.foo
WHERE
C.foo IS NULL
SELECT -- no rows = FAIL
P.*
FROM
@Parent P
WHERE
P.foo NOT IN (SELECT c.foo FROM @Child C)
SELECT -- correct
P.*
FROM
@Parent P
WHERE
NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
Note: with EXISTS, the select list in the subquery is irrelevant, as mentioned in the ANSI-92 standard. All of these behave the same:
NOT EXISTS (SELECT * FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT NULL FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1 FROM @Child C WHERE P.foo = C.foo)
NOT EXISTS (SELECT 1/0 FROM @Child C WHERE P.foo = C.foo)
1. Check the execution plan.
2. You can optimise your query by:
- Using a "search argument" (a sargable predicate) rather than IN
- Putting an index on session_id
SELECT * FROM enquiry_courses as Courses, enquiries as Enquiries WHERE Enquiries.session_id = '4cd3420a16dbd61c6af58f6199ac00f1' AND Courses.enquiry_id = Enquiries.enquiry_id
3. EXISTS is better for performance.
EDIT: EXISTS and IN are better than JOIN for performance.
EDIT: I rewrote the query so that it's faster (I put the most restrictive condition first in the WHERE clause).