IMO, please correct me...
the leaf of clustered index contains the real table row, so full clustered index, with intermediate leaves, contain much more data than the full table(?) Why/when/how is ever whole clustered index scan chosen over the full table scan?How is clustered index on CUSTOMER_ID column used in SELECT query which does not contain it in either SELECT list or in WHERE condition [1]?
Update:
Update2:
As afterthought, consecutive access cannot give performance boost while loading table through IAM pointers can be parallelized. Does clustered index scan imply consecutive page reading? Does clustered table imply absence of IAM pointers (impossibility of full table scan)? Why cannot clustered table be full table scanned? I still do not understand how/why clustered index full scan can be "better" over full table scan. Does it mean that having clustered index can result in performance worsening?The question is about clustered table not heap (non-indexed) table.
Update3:
Is "full clustered index scan" really synonym to "full table scan"? What are differences?[1] Index Covering Boosts SQL Server Query Performance
The clustered index - or more precisely: its leaf pages ARE the table data - so a clustered index scan really is the same as a table scan (for a table with a clustered index).
If you don't have a clustered index, then your table is a heap - obviously, in this case, if you need to look at all the data, you cannot do a clustered index scan since there is no clustered index, so you'll end up with a table scan which just touches all data pages for that heap table.
The leaf level of a clustered index is the table. "Table Scan" refers to a heap without a clustered index.
Each data page contains pointers to the next and previous leaf node page so the scan does not need to use the higher level pages in the index.
Please read my answer under "No direct access to data row in clustered table - why?", first.
"the leaf of clustered index contains the real table row, so full clustered index, with intermediate leaves, contain much more data than the full table(?)"
See you are mixing up "Table" with storage structures. In the context of your question, eg. thinking about the size of the CI as opposed to the "table", well then you must think about the CI minus the leaf level (which is the data row). The CI, index portion only, is tiny. The intermediate levels (like any B-Tree) contain partial (not full) key entries; it excludes the lowest level, which is the full key entry, which sits in the row itself, and is not duplicated.
The table (full CI) may be 10GB. The CI only may be 10MB. There is an awful lot that can be determined from the 10MB without having to go to the 100GB.
For understanding: the equivalent NCI on the same table (CI) may be 22MB; the equivalent NCI on the same table if you removed the CI may be 21.5MB (assuming the CI key is reasonable, not fat wide).
"Why/when/how is ever whole clustered index scan chosen over the full table scan?"
Quite often. Again the context is, we are talking about the CI-minus-Leaf levels. For queries that use only the columns in the CI, the presence of those columns in the CI (any index actually) allow the query to be a "covered query", which means it can by serviced wholly from the index, no need to go to the data rows. Think range scans on partial keys: BETWEEN x AND yY; x <= y; etc.
(There is always the chance that the optimiser will choose a table scan, when you think it should choose an index scan, bu t that is a different story.)
"I still do not understand how/why clustered index full scan can be "better" over full table scan."
(The terms used by MS are less precise than my answers here.) For any query that can be answered from the 10MB CI, I would much rather churn 10MB through the data cache, than 100GB. For the same queries, bounded by a range on the CI key, that's a fraction of the 10MB.
For queries that requires a "full table scan", well yes, you must read all the Leaf pages of the CI, which is the 100GB.
精彩评论