开发者

How does sql server sort your data?

开发者 https://www.devze.com 2022-12-16 13:13 出处:网络
I was wondering how sql server sorts i开发者_如何学Got\'s data. I noticed that if I have a table that doesn\'t contain the column \"Id\" and you select data without \"ORDER BY\" sql server doesn\'t au

I was wondering how sql server sorts i开发者_如何学Got's data. I noticed that if I have a table that doesn't contain the column "Id" and you select data without "ORDER BY" sql server doesn't automatically sort on the primary column.

Does anyone know what rule sql server follows to sort it's data?


Although it's good to wonder about how it might be explained that you often see the same order, I'd like to point out that it never a good idea to rely on implicit order caused by the particular implementation of the underlying database engine. In other words, its nice to know why, but you should never ever rely on it. For MS SQL, the only thing that reliably delivers the rows in a certain order, is an explicit ORDER BY clause.

Not only do different RDMBS-es behave differently, one particular instance may behave differently due to an update (patch). Not only that, even the state of the RDBMS software may have an impact: a "warm" database behaves differently than a "cold" one, a small table behaves differently than a large one.

Even if you have background information about the implementation (ex: "there is a clustered index, thus it is likely the data will be returned by order of the clustered index"), there is always a possibility that there is another mechanism you don't know about that causes the rows to be returned in a different order (ex1: "if another session just did a full table scan with an explicit ORDER BY the resultset may have been cached; a subsequent full scan will attempt to return the rows from the cache"; ex2: "a GROUP BY may be implemented by sorting the data, thus impacting the order the rows are returned"; ex3: "If the selected columns are all in a secondary index that is already cached in memory, the engine may scan the secondary index instead of the table, most likely returning the rows by order of the secondary index").

Here's a very simple test that illustrates some of my points.

First, startup SQL server (I'm using 2008). Create this table:

create table test_order (
    id int not null identity(1,1) primary key
,   name varchar(10) not null 
)

Examine the table and witness that a clusted index was created to support the primary key on the id column. For example, in sql server management studio, you can use the tree view and navigate to the indexes folder beneath your table. There you should see one index, with a name like: PK__test_ord__3213E83F03317E3D (Clustered)

Insert the first row with this statement:

insert into test_order(name)
select RAND()

Insert more rows by repeating this statement 16 times:

insert into test_order(name)
select RAND()
from   test_order

You should now have 65536 rows:

select COUNT(*) 
from   test_order

Now, select all rows without using an order by:

select *
from   test_order

Most likely, the results will be returned by order of the primary key (although there is no guarantee). Here's the result I got (which is indeed by order of primary key):

#      id    name
1      1     0.605831
2      2     0.517251
3      3     0.52326
.      .     .......
65536  65536 0.902214

(the # is not a column but the ordinal position of the row in the result)

Now, create a secondary index on the name column:

create index idx_name on test_order(name)

Select all rows, but retrieve only the name column:

select name
from   test_order

Most likely the results will be returned by order of the secondary index idx_name, since the query can be resolved by only scanning the index (i.o.w. idx_name is a covering index). Here's the result I got, which is indeed by order of name.

#      name
1      0.0185732
2      0.0185732
.      .........
65536  0.981894

Now, select all columns and all rows again:

select * 
from test_order

Here's the result I got:

#      id    name
1      17    0.0185732
2      18    0.0185732
3      19    0.0185732
...    ..    .........

as you can see, quite different from the first time we ran this query. (It looks like the rows are ordered by the secondary index, but I don't have an explanation why that should be so).

Anyway, the bottom line is - don't rely on implicit order. You can think of explanations why a particular order can be observed, but even then you can't always predict it (like in the latter case) without having intimate knowledge of implementation and runtime state.


If you don't specify an ORDER BY clause explicitly, there is no guaranteed order that the results will be sorted in. It isn't even guaranteed to be based on the clustered index.

You can see an example of this in this article.


AS SQL is based on Set thoery and Set does not guarantee any order, thus if you don't specify a particular order explicityly, order will not be guaranteed.


I had a similar experience with SQL Server returning results sorted differently than I expected. I found that if you specify a table hint in the select statement, giving the name of the clustered index, you get the results ordered how you want:

select * from test_order WITH (INDEX([ClusteredIndexName]))
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号