Which indexes are necessary?_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-03-24 06:18 出处：网络

If I have a table TABLE_1 with, lets say 5 columns:开发者_StackOverflow COL1|COL2|COL3|COL4|COL5 [line]

If I have a table TABLE_1 with, lets say 5 columns:

COL1    |   COL2    |   COL3    |   COL4    |   COL5
[line]
[line]
[...]

And there are two main queries I want to do:

SELECT * FROM table_a WHERE COL1 = 'X' and COL2 = 'Y'

And the other being:

SELECT * FROM table_a WHERE COL2 = 'Z'

Which indexes should I create? Creating one with columns COL1 and COL2 would index for both queries, or should I need another index just for COL2 in order to have the second query faster?

thanks!

A regular B-tree index on table_a(COL2, COL1) could be used by both queries.

Having an index on COL2 alone might be more efficient to retrieve rows using only a filter on this column but the additional index will use space and will slow down inserts (and updates if you ever update this column). It's a trade-off.

In his answer Vincent says this:

"A regular B-tree index on table_a(COL2, COL1) could be used by both queries."

The crucial word in that sentence is "could". Because it is also true that such an index might not be used by either query.

Database indexing is a complicated and subtle subject. There are entire books written on the topic. Richard Foote has managed to maintain a blog talking about nothing but Oracle Indexes (and David Bowie) for years.

We cannot give a definitive answer to the question without knowing some basic facts about the table: how many rows does it have? How many distinct values of COL1 are there? How many distinct values of COL2 are there?

So, let's look at some alternative answers.

If TABLEA contains only a dozen rows the chances are a Full Table Scan will be more efficient than any indexed read.

If COL2 is unique, the only index we need is table_a(COL2)

If COL2 is unselective (relatively few values compared to total number of rows) then the second query should use a Full Table Scan rather than an indexed read.

If COL2 is unselective but COL1 is highly selective (very many values compared to total number of rows, but not unique) then the first query should use an index on table_a(COL2, COL1).

If COL2 is not particularly selective and COL1 is not particularly selective but the combination of the two is highly selective then the first query should use an index on table_a(COL2, COL1). A Full Table Scan would be the preferred path for the second query.

If COL1 is unique it should have an index, which is what would be used in the first query, but it obviously wouldn't help in the second query.

Then there is the matter of NULLs, which futher complicate the issue. NULLs are not indexed, except in composite indexes (and certain other special cases).

A truly complete answer would also address the matter of skew: if 90% of the rows in COL1 are '0' and the rest are highly selective then an index may or may not be useful. We may need to generate histograms when we gather indexes, but they're only really helpful when a query uses literals rather than bind variables.

You can consider using composite index on col2 and col1. See the Oracle concepts guide for more details