When performing file IO in .NET, it seems that 95% of the examples I see use a 4096-byte buffer. What's so special about 4 KB for a buffer length? Or is it just a convention, like using i for the index in a for loop?
That is because 4K is the default NTFS cluster size for disks up to 16 TB. So when picking a buffer size, it makes sense to allocate the buffer in multiples of the cluster size.
A cluster is the smallest unit of allocation for a file, so a file containing only 1 byte will consume 4K of physical disk space, and a 5K file will result in an 8K allocation.
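To make that arithmetic concrete, here is a minimal sketch (the class and method names are made up for illustration, and 4096 is just an assumed cluster size) of rounding a file size up to the next cluster boundary:

using System;

class ClusterMath
{
    // Rounds a file size up to the next multiple of the cluster size.
    static long RoundUpToCluster(long fileSize, long clusterSize)
    {
        return ((fileSize + clusterSize - 1) / clusterSize) * clusterSize;
    }

    static void Main()
    {
        // Assuming a 4K cluster: 1 byte -> 4096 bytes, 5K -> 8192 bytes.
        Console.WriteLine(RoundUpToCluster(1, 4096));        // 4096
        Console.WriteLine(RoundUpToCluster(5 * 1024, 4096)); // 8192
    }
}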
Update: Added a code sample for getting the cluster size of a drive
using System;
using System.Runtime.InteropServices;

class Program
{
    // P/Invoke declaration for the Win32 GetDiskFreeSpace API.
    [DllImport("kernel32", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool GetDiskFreeSpace(
        string rootPathName,
        out int sectorsPerCluster,
        out int bytesPerSector,
        out int numberOfFreeClusters,
        out int totalNumberOfClusters);

    static void Main(string[] args)
    {
        int sectorsPerCluster;
        int bytesPerSector;
        int numberOfFreeClusters;
        int totalNumberOfClusters;

        if (GetDiskFreeSpace("C:\\",
            out sectorsPerCluster,
            out bytesPerSector,
            out numberOfFreeClusters,
            out totalNumberOfClusters))
        {
            // The cluster size is sectors per cluster times bytes per sector.
            Console.WriteLine("Cluster size = {0} bytes",
                sectorsPerCluster * bytesPerSector);
        }
        else
        {
            Console.WriteLine("GetDiskFreeSpace Failed: {0:x}",
                Marshal.GetLastWin32Error());
        }

        Console.ReadKey();
    }
}
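Once the cluster size is known, it can feed into how a stream is opened. The following is only a rough sketch (the method name and the 4K fallback are assumptions), but the FileStream constructor overload taking a buffer size is standard .NET, so the buffer can be sized as a multiple of the cluster size:

using System.IO;

class BufferedCopy
{
    static void CopyWithClusterSizedBuffer(string sourcePath, string destPath, int clusterSize)
    {
        // Use the cluster size (e.g. 4096) both as the FileStream buffer size
        // and as the chunk size for the read/write loop; fall back to 4K.
        int bufferSize = clusterSize > 0 ? clusterSize : 4096;

        using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
        using (var output = new FileStream(destPath, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
        {
            var buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }
}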
A few factors:
- More often than not, 4K is the cluster size on a disk drive
- 4K is the most common page size on Windows, so the OS can memory map files in 4K chunks
- A 4K page can often be transferred from the drive to the OS to the user process without being copied
- Windows caches files in RAM using 4K buffers.
Most importantly, over the years a lot of people have used 4K buffer lengths because of the above, so a lot of IO and OS code is optimised for 4K buffers!
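To illustrate the memory-mapping point above, here is a small sketch (the file path is hypothetical and it assumes the file is at least one page long) that maps a file and reads its first 4K page through a view accessor:

using System;
using System.IO.MemoryMappedFiles;

class MappedRead
{
    static void Main()
    {
        const int pageSize = 4096; // typical Windows page size

        // Map an existing file (hypothetical path) and read its first page.
        using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\temp\data.bin"))
        using (var view = mmf.CreateViewAccessor(0, pageSize))
        {
            var page = new byte[pageSize];
            view.ReadArray(0, page, 0, page.Length);
            Console.WriteLine("First byte: {0}", page[0]);
        }
    }
}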
My guess would be that it is related to the file block size of the OS, which for .NET means Windows.
My guess … my answer is right and the others are not; they don't go deep enough into history. And since this is an old question, it's all the more important to mention that there were times when performance was not only a question of programming style.
The binary size (4096, 8192 or sometimes 1024) comes from the days when you could still see the connections between the CPU and its peripheral chips. Sorry for sounding old, but this is essential to answering your question. The buffer in your program had to be shifted out to a peripheral device, so it needed address lines (today there are other approaches), and those address lines are bound to powers of two. The chip receiving the information needed (and still needs) memory to hold it, and that memory was and is (!) organised by binary addresses: you won't find a 23 GB chip. So 1K, 2K, 4K or (finally) 8K were good values (in the old days).
However, shifting out an 8K buffer took (more or less) the same time as shifting out a single byte. That's why we have buffers!
That hard disks have this (cluster) size is not the reason for the buffer size; the opposite is true: the organisation of hard disks follows the system described above.