Database indexes: A good thing, a bad thing, or a waste of time?_问答_开发者

Adding indexes is often suggested here as a remedy for performance problems.

(I'm talking about reading & querying ONLY, we all know indexes can make writing slower).

I have tried this remedy many times, over many years, both on DB2 and MSSQL, and the result were invariably disappointing.

My finding has been that no matter how 'obvious' it was that an index would make things better, it turned out that the query optimiser was smarter, and my cleverly-chosen index almost always made things worse.

I 开发者_StackOverflowshould point out that my experiences relate mostly to small tables (<100'000 rows).

Can anyone provide some down-to-earth guidelines on choices for indexing?

The correct answer would be a list of recommendations something like:

Never/always index a table with less than/more than NNNN records
Never/always consider indexes on multi-field keys
Never/always use clustered indexes
Never/always use more than NNN indexes on a single table
Never/always add an index when [some magic condition I'm dying to learn about]

Ideally, the answer will give some instructive examples.

Indexes are kind of like chemotherapy...too much and it kills you...too little and you die...do it the wrong way and you die. You gotta know just how much, how often, and what kind to make it not kill you.

Your hardware, platform, environment, load all play a role. So to answer your questions..

Yes, possibly sometimes.

As a rule of thumb, primary keys and foreign keys need to be indexed. Usually primary key are indexed just by defining them as such, but FKs are not in every database (they definitely are not in SQL Server, I can't really speak for other dbs). You will be using these in joins, so it is generally critical to performance to define these.

Now if you have fields you often use in where clauses, they can benefit from indexes as well providing several things:

First the field must have a range of values. A bit field or a field with only 2 or 3 values will almost never use an index.
Second the queries you write must be sargable. That is they must be designed to use indexes. I suspect if you never get performance improvements from what look like likely candidates for indexes, then you probably have queries that are not sargable. For instance take "WHERE Name like '%Smith'" as a where clause. Without knowing the first characters, the optimizer can't use the index.

Small tables rarely benefit much from indexes. If the optimizer can hold the whole thing in memory, then it is often faster to do so. If you were working with multimillion record tables, you would see that indexes are critical.

Indexing can be very complex and if you are interested in the subject, I suggest you get a good book on performance tuning your particular database and read in depth about them.

An index that's never used is a waste of disk space, as well as adding to the insert/update/delete time. It's probably best to define the clustering index first, then define additional indexes as you find yourself writing WHERE clauses.

One common index mistake I see is people wondering why a select on col2 (or col3) takes so long when the index is defined as col1 ASC, col2 ASC, col3 ASC. When you have a multiple column index, your WHERE clause must use the first column in the index, or the first and second column in the index, and so forth.

If you need to access the data by col2, then you need an additional index that's defined as col2 ASC.

With small domain tables, it's sometimes faster to do a table scan than it is to read rows from the table using an index. This depends on the speed of your database machine and the speed of the network.

You need indexes. Only with indexes you can access data fast enough.

To make it as short as possible:

add indexes for columns you are frequently filtering (or grouping) for. (eg. a state or name)
like and sql functions could make the DBMS not use indexes.
add indexes only on columns which have many different values (eg. no boolean fields)
It is common to add indexes to foreign keys, but it is not always needed.
don't add indexes in very short tables
never add indexes when you don't know how they should enhance performance.

Finally: look into execution plans to decide how to optimize queries.

You'll add indexes just for a single, critical query. In this case, you'll add exactly the indexes that are needed in the query in question (multi-column indexes).

Basically when DB is collecting data and it's alive indexes have to go and evolve with that flow. There maybe really good index on table but after growing beyond of XXX records the same index in the same table is useless and in that case it should be refactored.

To have optimized and fast DB the only way is to monitor it all the time and refactor it over the time as records come in.

Real life example i got some time ago was super fast query restricted by some time range (created_at between A and B) and super slow query where time range was different. Same query, same database, same application and only one difference on time range.

Always use clustered indexes.

In fact you can't help but using them. The data in a table will be laid out on disk in some particular order anyway, it can't be save as a pile or something. You have the chance of specifying how exactly this data will be laid out. Why burn it?

When you have a table which gets new records appended and you observe that some value in those records always grow (like StackOverflow question number), make a clustered index out of it. Then the new data will not be inserted in the middle but will basically be appended to a file on disk which is a relatively cheap operation.

If a table is expected to be the target of a join then it is best to have a clustered index on that table so that the joins can be performed sequentially through the data pages. The columns in the clustered index will (on some DB systems) be included in all of the other indexes on that table, since those are the values that the indexes will use to reference the table data. To keep the other indexes from getting too large, the columns in the clustered index should be as narrow as possible, so it is best to use only numeric—rather than character—data types in the clustered index. In general, fewer columns are better than more columns, but notice that three int columns (12 bytes per row) are much better than one nvarchar(32) column (potentially 64 bytes per row).

If the clustered index is narrow, then a few additional indexes should not negatively impact performance very much even on very large tables.

Seems you are confusing two concepts here. Adding indices *generally ~~can~~ only make a read query faster, very very rarely (almost never) slower. Adding an index never forces the query optimizer to use it. It will only use it if it thinks it can benefit from it, and it is generally very smart about those decisions.

For inserts/updates, of course, every index hurts performance a bit more... But at the other end of the spectrum, for, say a read only database, (like a USPS address database which is distributed monthly), in operational use there would ne no inserts/updates, so the only negative impact of additional indices is the disk space they take up.

This is entirely different that specifying that the query optimizer USE an index, in effect overriding what it would do on it's own... That can potentially make a query slower.

EDIT: Edited to eliminate opportunity for misinterpretation by overly literal readers.