How do I rewrite a query to suit a CLUSTERED INDEX?

开发者 (https://www.devze.com), 2023-04-12 11:20. Source: the Internet

I just observed a problem with this query:

SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name)

TableB has a CLUSTERED INDEX on column Name, but with the OR in the join condition this query was taking hours to run. So I rewrote the query as:

SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON X.A = Y.Name
UNION
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON X.B = Y.Name

This one runs in a few seconds or, in the worst case, minutes. While I understand the reason now, after getting burned, I was wondering if there is a cleaner way to write this query. I was thinking of a CTE, but the conditions ON X.A = Y.Name and ON X.B = Y.Name act like parameters, and I am not sure how to deal with that.

My actual query is very big, so I want to avoid writing it out twice just for the sake of having a UNION. Any suggestions?

In cases such as this it may be acceptable to use the UNION if the two conditions require using the index in different ways. By combining them with OR in a single condition, you may be removing the optimizer's ability to seek on the index.

This is the same problem as:

SELECT MIN(myCol), MAX(myCol)
FROM myTable -- myTable is a placeholder name

By including both, you may be breaking the query plan's use of the index, as the optimizer tries to find a "best of both worlds" plan rather than "the best of each world, individually, added together".
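A minimal sketch of the "each world, individually" form, assuming a hypothetical table myTable with an index on myCol. Each scalar subquery can be answered from one end of the index. Whether a given engine actually plans the combined form worse than this one is something to check against its execution plan.

```sql
-- myTable and the column aliases are placeholders for illustration.
SELECT
    (SELECT MIN(myCol) FROM myTable) AS MinValue,  -- first entry of the index
    (SELECT MAX(myCol) FROM myTable) AS MaxValue;  -- last entry of the index
```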

Here are two (outdated) links which illustrate my point:
http://code.cheesydesign.com/?p=279
http://richardfoote.wordpress.com/category/index-full-scan-minmax/
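If the main objection to the UNION is having to repeat a large query, one possible alternative is to unpivot X.A and X.B into a single column first, so that the join to TableB stays a single equality and the query body is written only once. The sketch below assumes SQL Server 2008 or later (for the VALUES table constructor); the alias K and its column JoinName are made up for illustration. DISTINCT plays the role of UNION's duplicate removal. There is no guarantee the optimizer will seek on the clustered index here, so compare the plans before adopting it.

```sql
SELECT DISTINCT X.A, X.B, X.GroupName
FROM TableA X
-- Produce one row per candidate key: first X.A, then X.B.
CROSS APPLY (VALUES (X.A), (X.B)) AS K (JoinName)
INNER JOIN TableB Y -- Huge table
ON Y.Name = K.JoinName
```

Because the select list only contains columns from TableA, the result should match the UNION version: each (A, B, GroupName) combination appears once if either A or B has a match in TableB.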

You could try updating statistics. Sometimes that helps queries pick appropriate indices, especially if you haven't done it in a while and have inserted or updated a lot of data.

UPDATE STATISTICS TableB
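To judge whether stale statistics are a plausible cause before rebuilding them, you can look at when they were last updated. This is a sketch that assumes TableB lives in the default schema; the FULLSCAN option reads every row instead of a sample, which can take a while on a huge table.

```sql
-- List each statistics object on TableB with its last update time.
SELECT s.name AS StatsName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('TableB');

-- Rebuild the statistics from all rows rather than a sample.
UPDATE STATISTICS TableB WITH FULLSCAN;
```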

You could also try using an optimizer hint:

SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y WITH (INDEX(ClusteredIndexName)) -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name)

You can see which indices are being used with "Display Estimated Execution Plan" in the Query menu (CTRL+L instead of CTRL+E), though in rare cases the actual plan will differ from the estimated one.
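The estimated plan can also be requested from a script rather than the menu. A small sketch, assuming a client such as SSMS or sqlcmd that understands the GO batch separator: while SHOWPLAN_TEXT is on, statements are not executed, and the server returns the plan text instead.

```sql
SET SHOWPLAN_TEXT ON;  -- must be the only statement in its batch
GO
SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name);
GO
SET SHOWPLAN_TEXT OFF;
GO
```

In the output, look at whether TableB is accessed with a Clustered Index Seek or a Clustered Index Scan.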

I'd also recommend the NOLOCK hint. Normal queries take shared locks on the data they read, which blocks updates to those rows while the locks are held, and taking the locks has some overhead of its own. Using NOLOCK can speed up your query and improve concurrency, but it allows dirty reads: the query can see changes that have not been committed. Say one of the rows in your large query is updated while the query is running. You may get both the old and the new version of the row in your results (I think; I have never seen it happen). If you don't use NOLOCK, that update might block until your query is complete, possibly causing an important update to time out.

SELECT X.A, X.B, X.GroupName
FROM TableA X WITH (NOLOCK)
INNER JOIN TableB Y WITH (NOLOCK, INDEX(ClusteredIndexName)) -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name)
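If the query touches many tables, the same behaviour can be set once for the session instead of repeating the hint on every table. This is a sketch with the same dirty-read caveats as NOLOCK; ClusteredIndexName is still a placeholder for the real index name.

```sql
-- Equivalent to putting NOLOCK on every table read in this session.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT X.A, X.B, X.GroupName
FROM TableA X
INNER JOIN TableB Y WITH (INDEX(ClusteredIndexName)) -- Huge table
ON (X.A = Y.Name OR X.B = Y.Name);

-- Restore the default level for later statements on this connection.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```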