If databases can support up to millions of records, how i开发者_运维知识库s all of this data organized such that queries can be executed in a reasonable amount of time?
Indexes!
Wikipedia link to a full discussion on Indexes
A complete answer is the topic of numerous books and dissertations ;-)
However, the basic answer is that all modern RDBMS have a couple of features: First, they have a query optimizer that is highly effecient. This find a good (not always the best) path to answer the query provided. Of course, this can actually do more to help complicated questions than simple ones.
Second, they store the data in a structured format that is easy to search. Data without a clustered index is called a heap, and this is generally less effecient to search, but it is still structured data. Data will generally have a clustered index on it which logically orders the data. (Some people believe that it physically orders the data. There is some truth to this, but it is not absolutely true. Itzik Ben Gan amoungst others has an excellent article debunking this belief).
In addition to the clustered index, non-clustered indexes are generally used. These are like indexes in a book which can help tell the query optimizer where to look for the data requested, or in some specific cases the index itself may have all the data needed.
This is of course a gross and horrible oversimplification. If you want a deep level understand, be prepared to pursue a PHD in database theory. For a general level understanding, you can take a look at books about database internals. For SQL Server in particular, SQL Server 2008 Internals has a lot of information on this topic.
I really like This explanation (include graphics ; )
This is the artcicle conclusion:
Conclusion
(source: mattfleming.com)
B-tree indexes are created to decrease the amount of I/O required to find and load a set of data. A highly selective index uses least amount of I/O necessary, poorly selective indices are not much better than a table scan.
精彩评论