I have studied how durability is achieved in databases, and if I understand correctly, it works like this (simplified):
Client's point of view:
- start transaction.
- insert into table values...
- commit transaction
DB engine's point of view (a code sketch follows this list):
- write transaction start indicator to log file
- write changes done by client to log file
- write transaction commit indicator to log file
- flush log file to HDD (this ensures durability of data)
- return 'OK' to client
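A minimal sketch of that sequence in C, assuming a POSIX system; the file name, log record strings, and transaction id are made up for illustration, and a real engine's log format and I/O path are far more involved:

```c
/* Hypothetical sketch of the commit sequence above (POSIX, not any real
 * engine's code): every step appends to the log, and durability comes
 * from the single fsync() before 'OK' is returned to the client. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int log_fd = open("tx.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (log_fd < 0) { perror("open"); return 1; }

    const char *records[] = {
        "BEGIN TX 42\n",                      /* transaction start indicator */
        "TX 42: INSERT INTO t VALUES (1)\n",  /* change made by the client */
        "COMMIT TX 42\n",                     /* transaction commit indicator */
    };
    for (int i = 0; i < 3; i++)
        write(log_fd, records[i], strlen(records[i]));  /* only in OS cache so far */

    if (fsync(log_fd) != 0) {   /* force the log onto the disk itself */
        perror("fsync");
        return 1;               /* cannot promise durability */
    }
    puts("OK");                 /* only now can the engine answer the client */
    close(log_fd);
    return 0;
}
```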
What I observed:
The client application is single-threaded (one DB connection). I'm able to perform 400 transactions/sec, while a simple test that writes something to a file and then fsyncs that file to the HDD achieves only 150 syncs/sec. If the client were multi-threaded / multi-connection, I could imagine the DB engine grouping transactions and doing one fsync per batch of transactions, but that is not the case here.
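For reference, a small benchmark along the lines of the test described above (an assumed reconstruction, not the original test code; the file name and record size are arbitrary) looks roughly like this:

```c
/* Assumed reconstruction of the 'write then fsync' test: counts how many
 * write+fsync pairs a single thread completes per second. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int fd = open("sync_test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char buf[128] = {0};
    int syncs = 0;
    time_t start = time(NULL);
    while (time(NULL) - start < 5) {    /* run for roughly 5 seconds */
        write(fd, buf, sizeof buf);     /* goes to the OS page cache */
        fsync(fd);                      /* wait for the physical disk */
        syncs++;
    }
    printf("%d syncs/sec\n", syncs / 5);
    close(fd);
    return 0;
}
```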
My question is whether, for example, MS SQL Server really synchronizes the log file (fsync, FlushFileBuffers, etc.) on every transaction commit, or whether there is some other kind of magic behind it?
The short answer is that, for a transaction to be durable, the log file has to be written to stable storage before changes to the database are written to disk.
Stable storage is more complicated than you might think. Disks, for example, are not usually considered to be stable storage. (Not by people who write code for transactional database engines, anyway.)
To see how a particular open source DBMS writes to stable storage, you'll need to read the source code. The PostgreSQL source code is online (the relevant file is xlog.c). I don't know about the MySQL source.
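As for the "one fsync per few transactions" idea raised in the question, the usual name for that technique is group commit. Here is a minimal, hypothetical sketch of it (an assumed design using POSIX threads, not taken from MS SQL Server or any other real engine): each committing connection appends its log record and waits, and a single background flusher issues one fsync that covers every commit pending at that moment.

```c
/* Minimal, hypothetical group-commit sketch (POSIX threads; assumed design,
 * not any real engine's code). Committers append their log records and wait;
 * one background flusher fsyncs once for all pending commits. */
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int log_fd;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flushed_cv = PTHREAD_COND_INITIALIZER;
static uint64_t next_lsn = 0;     /* log sequence number of the newest record */
static uint64_t flushed_lsn = 0;  /* everything <= this is known durable */

/* Called on each connection's thread at COMMIT time. */
static void commit(const char *record)
{
    pthread_mutex_lock(&mu);
    write(log_fd, record, strlen(record));  /* append, still only in OS cache */
    uint64_t my_lsn = ++next_lsn;
    while (flushed_lsn < my_lsn)            /* block until a flush covers us */
        pthread_cond_wait(&flushed_cv, &mu);
    pthread_mutex_unlock(&mu);
    /* only now may the engine report 'OK' for this transaction */
}

/* Single background thread: one fsync makes every pending commit durable. */
static void *log_flusher(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mu);
        uint64_t target = next_lsn;         /* all records written so far */
        pthread_mutex_unlock(&mu);

        if (target > flushed_lsn) {
            fsync(log_fd);                  /* one sync for the whole batch */
            pthread_mutex_lock(&mu);
            flushed_lsn = target;
            pthread_cond_broadcast(&flushed_cv);  /* wake all covered committers */
            pthread_mutex_unlock(&mu);
        } else {
            usleep(100);                    /* nothing new to flush yet */
        }
    }
    return NULL;
}

int main(void)
{
    log_fd = open("group.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    pthread_t flusher;
    pthread_create(&flusher, NULL, log_flusher, NULL);
    commit("COMMIT TX 1\n");    /* would normally be called from many threads */
    printf("flushed_lsn = %llu\n", (unsigned long long)flushed_lsn);
    return 0;
}
```

The point of the pattern is that each connection only blocks until some fsync covers its log record, so a single disk sync can make many concurrent commits durable at once; with only one connection, as in the question, there is nothing to batch, which is why the 400 vs. 150 numbers are worth investigating.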