As a learning exercise I am attempting to write a simple embedded database in C#. Everything is going fine but I am getting really stuck when it comes to saving the data to disk.
As an example of one of my problems.. I may need to "insert" data into the middle of the data file. This clearly isn't possible with sequential file access. Re-writing the entire last half of the file every time there is an insert isn't an option for obvious performance reasons.
The only solution I can imagine is to write each table followed by some empty space in the file. The empty space will be used to write new data, and the file will need restructuring / growing each time a table uses up its available space.
I gu开发者_如何转开发ess my questions are.. exactly what does the data "look like" inside a typical DB's data file? How / where is new data written in the file?
Generally databases will use a B-tree to store both the data (where the key will be the row's primary key, and the value will be the content of the row) and indexes. This way you can insert rows into arbitrary locations in O(log n)
time.
Ex, see the file format for SQLite databases, which describes how SQLite uses a B-tree where internal nodes only store pointers and leaf nodes only store data.
See also: http://en.wikipedia.org/wiki/B-tree#Insertions_and_deletions_cause_trouble , which seems to address the issue you're having.
The answer of David Wolever is wrong. The data of a database is not stored in B-trees. The B-trees (usually B+-trees) store only keys and child pointers in inner nodes and keys and data pointers in leaf nodes. The B+-trees usually don't store data (they might do it for relation tables). The data of a database is stored in its data files which are organdized in blocks.
精彩评论