I've heard that using an IN Clause can hurt performance because it doesn't use Indexes properly. See example below:
SELECT ID, Name, Address
FROM people
WHERE id IN (SELECT ParsedValue FROM UDF_Pa开发者_开发问答rseListToTable(@IDList))
Is it better to use the form below to get these results?
SELECT ID,Name,Address
FROM People as p
INNER JOIN UDF_ParseListToTable(@IDList) as ids
ON p.ID = ids.ParsedValue
Does this depend on which version of SQL Server you are using? If so which ones are affected?
Yes, assuming relatively large data sets.
It's considered better to use EXISTS
for large data sets. I follow this and have noticed improvements in my code execution time.
According to the article, it has to do with how the IN
vs. EXISTS
is internalized. Another article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
It's very simple to find out - open Management studio, put both versions of the query in, then run with the Show Execution plan turned on. Compare the two execution plans. Often, but not always, the query optimizer will make the same exact plan / literally do the same thing for different versions of a query that are logically equivalent.
In fact, that's its purpose - the goal is that the optimizer would take ANY version of a query, assuming the logic is the same, and make an optimal plan. Alas, the process isn't perfect.
Here's one scientific comparison:
http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
IN can hurt performance because SQL Server must generate a complete result set and then create potentially a huge IF statement, depending on the number of rows in the result set. BTW, calling a UDF can be a real performance hit as well. They are very nice to use but can really impact performance, if you are not careful. You can Google UDF and Performance to do some research on this.
More than the IN
or the Table Variable
, I would think that proper use of an Index would increase the performance of your query.
Also, from the table name, it does not seem like you are going to have a lot of entries in it so which way you go may be moot point in this particular example.
Secondly, IN
will be evaluated only once since there is no subquery. In your case, the @IDList variable is probably going to cause mistmatches you will need @IDList1, @IDList2, @IdList3....
because IN
demands a list.
As a general rule of thumb, you should avoid IN
with subqueries and use EXISTS
with a join - you will get better performance more often than not.
Your first example is not the same as your second example, because WHERE X IN (@variable)
is the same as WHERE X = @variable
(i.e. you cannot have variable lists).
Regarding performance, you'll have to look at the execution plans to see what indexes are chosen.
精彩评论