开发者

Reasons not to have a clustered index in SQL Server 2005

开发者 https://www.devze.com 2023-01-22 09:32 出处:网络
I\'ve inherited some database creation scripts for a SQL SERVER 2005 database. One thing I\'ve noticed is that all primary keys are created as NON CLUSTERED indexes as opposed to clustered.

I've inherited some database creation scripts for a SQL SERVER 2005 database.

One thing I've noticed is that all primary keys are created as NON CLUSTERED indexes as opposed to clustered.

I know that you can only have one clustered index per table and that you may want to have it on a non primary key column for query performance of searches etc. However there are no other CLUSTERED indexes on the tables in questions.

So my questi开发者_Go百科on is are there any technical reasons not to have clustered indexes on a primary key column apart from the above.


On any "normal" data or lookup table: no, I don't see any reason whatsoever.

On stuff like bulk import tables, or temporary tables - it depends.

To some people surprisingly, it appears that having a good clustered index actually can speed up operations like INSERT or UPDATE. See Kimberly Tripps excellent The Clustered Index Debate continues.... blog post in which she explains in great detail why this is the case.

In this light: I don't see any valid reason not to have a good clustered index (narrow, stable, unique, ever-increasing = INT IDENTITY as the most obvious choice) on any SQL Server table.

To get some deep insights into how and why to choose clustering keys, read all of Kimberly Tripp's excellent blog posts on the topic:

http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustering-Key.aspx

http://www.sqlskills.com/BLOGS/KIMBERLY/category/Clustered-Index.aspx

Excellent stuff from the "Queen of Indexing" ! :-)


Clustered Tables vs Heap Tables

(Good article on subject at www.mssqltips.com)

HEAP Table (Without clustered index)

  • Data is not stored in any particular order

  • Specific data can not be retrieved quickly, unless there are also non-clustered indexes

  • Data pages are not linked, so sequential access needs to refer back to the index allocation map (IAM) pages

  • Since there is no clustered index, additional time is not needed to maintain the index

  • Since there is no clustered index, there is not the need for additional space to store the clustered index tree

  • These tables have a index_id value of 0 in the sys.indexes catalog view

Clustered Table

  • Data is stored in order based on the clustered index key

  • Data can be retrieved quickly based on the clustered index key, if the query uses the indexed columns

  • Data pages are linked for faster sequential access Additional time is needed to maintain clustered index based on INSERTS, UPDATES and DELETES

  • Additional space is needed to store clustered index tree These tables have a index_id value of 1 in the sys.indexes catalog view


Please read my answer under "No direct access to data row in clustered table - why?", first. Specifically item [2] Caveat.

The people who created the "database" are cretins. They had:

  • a bunch of unnormalised spreadhseets, not normalised relational tables
  • the PKs are all IDENTITY columns (the spreadsheets are linked to each other; they have to be navigated one-by-one-by-one); there is no relational access or relational power across the database
  • they had PRIMARY KEY, which produce UNIQUE CLUSTERED
  • they found that that prevented concurrency
  • they removed the CI and made them all NCIs
  • they were too lazy to finish the reversal; to nominate an alternate (current NCI) to become the new CI, for each table
  • the IDENTITY column remains the Primary Key (it isn't really, but it is in this hamfisted implementation)

For such collections of spreadsheets masquerading as databases, it is becoming more and more common to avoid CIs altogether, and just have NCIs plus the Heap. Obviously they get none of the power or benefits of the CI, but hell, they get none of the power or benefit of Relational databases, so who cares that they get none of the power of CIs (which were designed for Relational databases, which theirs is not). The way they look at it, they have to "refactor" the darn thing every so often anyway, so why bother. Relational databases do not need "refactoring".

If you need to discuss this response further, please post the CREATE TABLE/INDEX DDL; otherwise it is a time-wasting academic argument.


Here is another (have it already been provided in other answers?) possible reason (still to be understood):

  • SQL Server - Poor performance of PK delete

I hope, I shall update later but for now it is rather the desire to link these topics

Update:
What do I miss in understanding the clustered index?


With some b-tree servers/programming languages still used today, fixed or variable length flat ascii files are used for storing data. When a new data record/row is added to a file (table), the record is (1) appended to the end of the file (or replaces a deleted record) and (2) the indexes are balanced. When data is stored this way, you don't have to be concerned about system performance (as far as what the b-tree server is doing to return a pointer to the first data record). The response time is only effected by the # of nodes in your index files.

When you get into using SQL, you hopefully come to realize that system performance has to be considered whenever you write an SQL statement. Using an "ORDER BY" statement on a non-indexed column can bring a system to its knees. Using a clustered index might put an unnecessary load on the CPU. It's the 21st century and I wish we didn't have to think about system performance when programming in SQL, but we still do.

With some older programming languages, it was mandatory to use an index whenever sorted data is retrieved. I only wish this requirement was still in place today. I can only wonder how many companies have updated their slow computer systems due to a poorly written SQL statement on non-indexed data.

In my 25 years of programming, I've never needed my physical data stored in a particular order, so maybe that is why some programmers avoid using clustered indexes. It's hard to know what the tradeoff is (storage time, verses retrieval time) especially if the system you are designing might store millions of records someday.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号