How disk space is allocated for an edited file_问答_开发者

Assume I save a text file in the HDD disk storage(assume the disk storage is new and so defragmented) and the file name is A with a file size of say 10MB

I presume, the file A occupies some space in the disk as shown, where x is an unoccupied space/memory on the disk

AAAAAAAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Now, I create and save another file B of some size. So B will be saved as

AAAAAAAAAAAAABBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx - as the disk is defra开发者_StackOverflowgmented, I assume the storage will be contiguous.

Here, what if I edit the file A and reduce the file size to 2MB. Can you say how the memory will be allocated now.

Some options I could think of are

AAAAAAxxxxxxxxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx

AAxxxAAxxxAxAxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx

or a totally new location freeing up the bigger chunk for other files.

xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBAAAAAAxxxxxxxxxxxxxxxxxxxxxx

or is it any other way based on any algorithm or data-structure.

A lot of this would depend upon what type of filesystem you are using (and also how the OS interacts with it). The behavior of an NTFS filesystem in Windows may be nothing like the behavior of an ext3 filesystem in Ubuntu for the same set of logical operations.

Generally speaking, however, most modern filesystems define a file as a series of pointers to blocks on the disk. There is a minimum block size that describes the smallest allocatable block (typically ranging from 512 bytes to 4 KBytes), so files that are less than this size or not some exact multiple of this size will have some amount of extra space allocated to them.

So what happens when you allocate a 10 MB file 'A'? The filesystem reserves 10MB worth of blocks (perhaps even allowing for a few extra blocks at the end to accommodate any minor edits that are made to the file or its metadata) for the file contents. Ideally these blocks will be contiguous, as in your example. When you edit 'A' and make it smaller, the filesystem will release some or all (most likely all since in most cases editing 'A' involves writing out the entire contents of 'A' to disk again, so there's little reason for the filesystem to prefer keeping 'A' in the same physical location over writing the data to a new location somewhere else on the disk) of the blocks allocated to 'A', and update its reference to include any new blocks that were allocated, if necessary.

With that said, in the typical case and using a modern filesystem and OS, I would expect your example to produce the following final state on disk ('b' and 'a' represent extra bytes allocated to 'B' and 'A' that do not contain any meaningful data):

xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBbbAAAAAAaaxxxxxxxxxxxxxxxxxxxxxx

But real-world results will of course vary by filesystem, OS, and potentially other factors (for instance, when using an SSD data fragmentation becomes irrelevant because any section of the disk can be accessed at very low latency and with no seek penalty but at the same time it becomes important to minimize write cycles so that the device doesn't wear-our prematurely, so the OS may favor leaving 'A' in place as much as possible in that case in order to minimize the number of sectors that need to be overwritten).

So the short answer is, "it depends".

How allocation is done depends entirely on the file system type (e.g. FAT32, NTFS, jfs, reiser, etc. etc.) and the driver software. Your assumption that the file will be stored contiguously is not necessarily true - it may be more performant to store it in a different pattern, depending on hardware. For example, let's say you have a disk with 16 cylinder heads and a blocksize of 512 bytes, then it could be most efficient to store an amount of 8k data on 16 different cylinders.
OTOH, with recent hardware that does not involve rotating mechanical parts, the story changes dramatically - a concept like "fragmentation" becomes suddenly meaningless, because the access time to each block is the same - no matter in which order it is done.

No it's like this:

First you create file A: (here big A stands for data actually used for A and 'a' for reserved data for A, x stands for free).

AAAAAAAAAAAAAaaaaaaaXXXXXXXXXXXXXXXXXXX

Then B is added:

AAAAAAAAAAAAAaaaaaaaBBBBbbbbbbbbbb

Then C is added, but there is no unreserved space left:

AAAAAAAAAAAAAaaaaaaaBBBBbbbbCCCccc

If A is truncated this is what will happen

AAAAAaaaaaaaxxxxxxxxBBBBbbbbCCCccc

If B is now expanded this will happen:

AAAAAaaaaaaaBBBBxxxxxBBBBBBBBCCCccc

You see that the data for B is no longer close to each other, this is called fragmentation. When you run a defragmentation tool the data is placed close together again.