SQL Query Optimisation (Direction of Condition Evaluation)

开发者 https://www.devze.com 2023-03-18 17:25 Source: Internet

Let's say I have a dictionary of 26000 words, 1000 words per letter of the alphabet.

If I want to find all the words that have an 'e' in them, I write:

SELECT * 
  FROM dict 
 WHERE word LIKE '%e%';

If I wanted to reduce that to only the words beginning with 'a', I could change the LIKE condition, or I could do this:

SELECT * 
  FROM dict 
 WHERE word LIKE '%e%' 
   AND id < 1000;

Lots of words contain the letter 'e', so if the conditions are evaluated left to right, many rows would pass the first condition only to fail the second. I would expect faster results if the conditions were evaluated right to left.

My question is: would it be better to put id < 1000 as the first or the second condition, or does this depend on the type of database?

The position of a condition in the WHERE clause is irrelevant; the same number of scans (if applicable) will be required. Conditions are not evaluated in the order they are written: the optimizer determines what is applied, and when, based on table statistics and indexes (if any exist). Those statistics change and can become out of date, which is why maintenance is important.

It would be a mistake to assume that id < 1000 is equivalent to

SELECT * FROM dict WHERE word LIKE 'a%';

If you designed your database this way, it would violate First Normal Form (1NF), specifically the rule that there is no top-to-bottom ordering to the rows. Technically, there is no way to ensure this ordering stays valid, especially if you want to add a word starting with 'A' after you set up your initial state.
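A minimal sketch of that failure mode, using Python's built-in sqlite3 module purely for illustration. The five-row table and its words are hypothetical: ids 1 to 4 were loaded alphabetically, so id < 3 stands in for id < 1000 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, word TEXT NOT NULL)")

# Hypothetical miniature dictionary: ids were assigned in alphabetical order
# at load time, so "id < 3" plays the role of "id < 1000" here.
conn.executemany(
    "INSERT INTO dict (id, word) VALUES (?, ?)",
    [(1, "able"), (2, "acre"), (3, "bake"), (4, "bell")],
)

# A new 'a' word added later gets the next free id (5), not one in the 'a' range.
conn.execute("INSERT INTO dict (word) VALUES ('apex')")

by_id = [w for (w,) in conn.execute(
    "SELECT word FROM dict WHERE word LIKE '%e%' AND id < 3 ORDER BY word")]
by_prefix = [w for (w,) in conn.execute(
    "SELECT word FROM dict WHERE word LIKE '%e%' AND word LIKE 'a%' ORDER BY word")]

print(by_id)      # ['able', 'acre']          (misses 'apex')
print(by_prefix)  # ['able', 'acre', 'apex']
```

The id filter silently drops the late arrival, while the prefix filter states what is actually wanted.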

One of the key design principles of modern relational database management systems is that you, the user, have no true control or say over how the data is actually being stored on the hard drive by the RDBMS. This means that you cannot assume that the data is (a) stored in alphabetical order on the drive, or (b) that when you retrieve the data, it will be retrieved in alphabetical order. The only way to be absolutely 100% sure that you are getting the data you want is to spell out the way you want it, and anything else is an assumption that some day may blow up in your face.

Why does this matter? Because your query assumes that the data you'll be getting will be in alphabetical order, starting with "A" and going up. (And that assumes consistent case: what about "A" vs "a"? Anything with leading spaces or numbers? Different systems handle different data differently...) Fixing this is simple enough: add an ORDER BY clause, such as:

select * from dict where word like '%e%' and id < 1000 order by word;

Of course, if you have more than 1000 words beginning with "A" and containing "e", you're in trouble... and if you have fewer than 1000, you end up with a bunch of "B" words. Try something like:

select * from dict where left(word, 1) = 'A' and word like '%e%';

Depending on your RDBMS and any indexing you have on the table, the system could first identify all "A" words, and then run the "contains e" check on only them.
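As a rough illustration of that idea, here is a sketch in Python with the built-in sqlite3 module. Everything in it is an assumption made for the example: the index name idx_dict_word, the sample words, and the use of a range predicate (word >= 'a' AND word < 'b') in place of left(), which SQLite does not provide. The range form also assumes the words are stored in lowercase, since the default comparison is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, word TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_dict_word ON dict (word)")  # hypothetical index name
conn.executemany(
    "INSERT INTO dict (word) VALUES (?)",
    [("able",), ("acre",), ("also",), ("bake",), ("bell",)],
)

# The range predicate narrows the search to 'a' words via the index;
# the LIKE '%e%' check then only has to run on those rows.
query = (
    "SELECT word FROM dict "
    "WHERE word >= 'a' AND word < 'b' AND word LIKE '%e%' "
    "ORDER BY word"
)

# The fourth column of EXPLAIN QUERY PLAN is the human-readable plan step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
words = [w for (w,) in conn.execute(query)]

print(plan)   # should mention idx_dict_word if the index is used
print(words)  # ['able', 'acre']
```

The exact plan text varies between SQLite versions and will look entirely different on other systems, so treat it as something to inspect rather than rely on.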

Try switching your WHERE clause conditions around and then compare the execution plans.

This will show you the difference, if any (I would guess they will be identical in this case).
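One way to run that comparison, sketched with Python's built-in sqlite3 module. The table contents and the cut-off id < 4 are made up for the example, and EXPLAIN QUERY PLAN is SQLite's syntax; other systems have their own equivalents with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, word TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO dict (word) VALUES (?)",
    [("able",), ("acre",), ("also",), ("bake",), ("bell",)],
)


def plan(sql):
    """Return the readable plan steps SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# The same two conditions, written in opposite orders.
like_first = "SELECT * FROM dict WHERE word LIKE '%e%' AND id < 4"
id_first = "SELECT * FROM dict WHERE id < 4 AND word LIKE '%e%'"

print(plan(like_first))
print(plan(id_first))
print("same plan:", plan(like_first) == plan(id_first))
```

If the two plans match, the written order of the conditions made no difference for this query on this engine.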

The bottom line is that most of the time it makes no difference. However, it can change the execution plan.