I just observed that my query:
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name)
TableB has a CLUSTERED INDEX on column Name, and this query was taking hours to run. So I rewrote the query as:
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON X.A = Y.Name
UNION
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON X.B = Y.Name
This one runs in a few seconds or, in the worst case, minutes. Now that I've burned myself I understand the reason, but I was wondering if there is a cleaner way to write this query. I was thinking of a CTE, but then ON X.A = Y.Name and ON X.B = Y.Name are like parameters, and I'm not sure how to deal with that.
My actual query is very big, so I want to avoid repeating it twice just for the sake of having a UNION. Any suggestions?
In cases such as this it may be acceptable to use the UNION if the two conditions require using the index in different ways. By combining them with OR in a single condition, you may be removing the optimizer's ability to use the index.
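A minimal sketch of keeping the UNION without repeating the big query: move the shared part into a CTE and repeat only the join to TableB. The CTE name Base is a placeholder, and this assumes the rest of your real query can live inside the CTE body unchanged.

```sql
-- Base is a hypothetical CTE name; put the large, shared part of the query here.
WITH Base AS (
    SELECT X.A, X.B, X.GroupName
    FROM TableA X
    -- ... other joins and filters from the real query ...
)
-- Each branch is a plain equality join, so each one can seek on the clustered index on Y.Name.
SELECT b.A, b.B, b.GroupName
FROM Base b
INNER JOIN TableB Y ON b.A = Y.Name
UNION  -- removes duplicate rows, like the UNION rewrite in the question
SELECT b.A, b.B, b.GroupName
FROM Base b
INNER JOIN TableB Y ON b.B = Y.Name;
```

SQL Server generally does not materialize a CTE: each reference to Base is expanded into the plan, so the shared part may still be evaluated twice. If that part is expensive, loading it into a temporary table (for example #Base) first and joining that twice is a common alternative.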
This is the same problem as:
SELECT MIN(myCol), MAX(myCol)
By including both, you may be borking the query plan's use of the index, because the optimizer tries to find the "best of both worlds" plan rather than "the best of each world, individually, added together".
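For the MIN/MAX case, the "each world individually" version would look roughly like this. MyTable is a placeholder (the line above has no FROM clause), and the sketch assumes an index on myCol; whether the combined form actually loses the index depends on the database engine and version.

```sql
-- Combined form: the optimizer may read the whole index to compute both aggregates.
SELECT MIN(myCol), MAX(myCol)
FROM MyTable;

-- Split form: each scalar subquery can be answered by reading one end of the index on myCol.
SELECT (SELECT MIN(myCol) FROM MyTable) AS MinCol,
       (SELECT MAX(myCol) FROM MyTable) AS MaxCol;
```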
Here are a couple of (outdated) links that illustrate my point:
http://code.cheesydesign.com/?p=279
http://richardfoote.wordpress.com/category/index-full-scan-minmax/
You could try updating statistics; sometimes that helps queries pick appropriate indexes, especially if you haven't done so in a while and have inserted or updated a lot of data.
UPDATE STATISTICS TableB
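To check whether the statistics on TableB are actually stale before rebuilding them, something like the following may help. It uses the sys.stats catalog view and the STATS_DATE function, and assumes TableB is in your default schema. WITH FULLSCAN reads every row instead of a sample, so expect it to take a while on a huge table.

```sql
-- List each statistics object on TableB and when it was last updated.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('TableB');

-- Rebuild the statistics from every row rather than a sample.
UPDATE STATISTICS TableB WITH FULLSCAN;
```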
You could also try using an index hint (replace ClusteredIndexName with the actual name of the clustered index on TableB):
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y WITH (INDEX(ClusteredIndexName)) -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name)
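ClusteredIndexName in the hint stands in for the real index name. One way to look it up, assuming TableB is in your default schema:

```sql
-- Find the name of the clustered index on TableB (a table has at most one).
SELECT i.name
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('TableB')
  AND i.type_desc = 'CLUSTERED';
```

Alternatively, SQL Server accepts INDEX(1) in the hint, which refers to the clustered index by ID without naming it.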
You can see which indexes are being used with "Display Estimated Execution Plan" in the Query menu (Ctrl+L instead of Ctrl+E), though occasionally the plan used when the query actually runs will differ.
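If you prefer to script it, SET SHOWPLAN_XML ON is a T-SQL counterpart of the estimated-plan button: the statements that follow are compiled but not executed, which is handy when the slow version takes hours. A sketch, assuming you run it in SSMS or sqlcmd, where GO separates batches (the SET statement must be alone in its batch):

```sql
SET SHOWPLAN_XML ON;
GO
-- Compiled only, not executed; the estimated plan is returned as XML.
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y ON (X.A = Y.Name OR X.B = Y.Name);
GO
SET SHOWPLAN_XML OFF;
GO
```

Running the same batch with the UNION version lets you compare the two plans; a Clustered Index Seek on TableB in each UNION branch, versus a scan for the OR version, is the kind of difference you would expect to see.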
I'd also recommend the NOLOCK hint. Under the default isolation level, queries take shared locks on the data they read, which prevents those rows from being updated while the locks are held, and the locking itself has some overhead. Using NOLOCK can speed up your query and allow greater concurrency, but it permits dirty reads (reading data another transaction has not yet committed). Say one of the rows your large query touches is updated while it is running: you may get both the old and new versions of that row in your results (I think; I've never seen it happen). Without NOLOCK, that update might be blocked until your query completes, possibly causing an important update to time out.
SELECT X.A, X.B, X.GroupName
FROM TableA X WITH (NOLOCK)
INNER JOIN TableB Y WITH (NOLOCK, INDEX(ClusteredIndexName)) -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name)
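If you want the NOLOCK behaviour for every table in a long query without adding the hint to each table reference, the session-level equivalent is the READ UNCOMMITTED isolation level. A sketch, assuming the session was using the default READ COMMITTED level beforehand; the same dirty-read caveats apply:

```sql
-- Applies to all following statements in this session, like NOLOCK on every table.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y ON (X.A = Y.Name OR X.B = Y.Name);

-- Return to the default isolation level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```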