I am currently using 100 megabytes per memory block to copy large files.
Is there a "good" amount that people normally use?
Edit
Thanks for all the great responses.
I'm still quite new to these concepts, so I'll try to understand the ones that have been mentioned (e.g. the write-back cache). I keep learning new things :)
A block size between 4 KB and 32 KB is the typical choice. Using 100 MB is counter-productive: you are tying up RAM in your buffer that could be put to much better use as file-system write-back cache.
Copying files is very fast when the file fits completely in the cache; the WriteFile() call is then a simple memory-to-memory copy, and the cache manager lazily writes it out to the disk. But when there's no more room in the cache, the copy speed drops off a cliff: WriteFile() has to wait for space to be made available, so the copy proceeds at disk write speed.
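As a minimal sketch of that kind of loop in C++ (the function name, paths, and error handling are just placeholders, and 32 KB is the block size suggested above):

```cpp
#include <fstream>
#include <vector>

// Copy src to dst using a small fixed-size buffer; the OS cache does the rest.
bool copy_file_buffered(const char* src, const char* dst)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out)
        return false;

    std::vector<char> buffer(32 * 1024);   // 32 KB block, not 100 MB
    while (in) {
        in.read(buffer.data(), buffer.size());
        std::streamsize n = in.gcount();   // bytes actually read (last block may be short)
        if (n > 0)
            out.write(buffer.data(), n);
    }
    return static_cast<bool>(out);
}
```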
I would recommend benchmarking this, and remember to include much smaller block sizes. In my own tests I got quite counterintuitive results.
When reading and writing from the hard drive, all (power-of-two) block sizes between 512 bytes and 512 kB gave the same speed. Increasing the block size from 512 kB to 1 MB reduced the copying speed to about 60%. Increasing the block size further raised the speed again, but it never got all the way back to the speed of the small blocks.
When all the copied data was in the cache memory, the (much faster) copying speed improved with increasing block sizes, flattening out around 32 kB blocks, and then suddenly dropped to about half the previous speed when going from 256 kB to 512 kB blocks, never to return to the previous speeds.
After this test, I dropped read/write block sizes in several of my programs from around 1 MB to 32 kB.
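For reference, here's roughly the kind of benchmark I mean (a sketch, not my actual test harness): copy the same file with power-of-two block sizes and time each run. The file paths and the size range are assumptions, and the OS cache will skew the results unless you use a file much larger than RAM or drop the cache between runs.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <vector>

// Copy src to dst with a given block size; same loop as before, parameterised.
static void copy_with_block_size(const char* src, const char* dst, std::size_t block)
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    std::vector<char> buf(block);
    while (in) {
        in.read(buf.data(), buf.size());
        if (in.gcount() > 0)
            out.write(buf.data(), in.gcount());
    }
}

int main()
{
    // Sweep block sizes from 512 bytes up to 8 MB and time each copy.
    for (std::size_t block = 512; block <= 8 * 1024 * 1024; block *= 2) {
        auto start = std::chrono::steady_clock::now();
        copy_with_block_size("source.bin", "dest.bin", block);   // placeholder paths
        double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        std::printf("block %8zu bytes: %.3f s\n", block, elapsed);
    }
}
```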
There's generally little benefit in using blocks that large.
Suppose your operating system is super-naive and every read or write operation incurs a hard disc seek (in practice you will often find that writes get queued and reads get read-ahead-buffered, reducing the benefit of using large buffers in your application code).
Then every block costs you (say) 2x10ms for two seeks (one to read and one to write) and there's little point increasing your block size once the time for the actual reading and writing is substantially more than that. A really fast HD might read and write at 150MB/s, in which case that 10ms would correspond to 1.5MB of reading/writing, and you'd be gaining little for blocksizes beyond 15MB.
In practice, (1) your seek time will probably be less, (2) your read and write bandwidth will probably be more, and (3) your OS and drive hardware will probably be caching and queueing things for you; you'll probably see little or no benefit from blocksizes above about 100KB.
(You should probably benchmark a variety of blocksizes and see what you get on your own system.)
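To make the arithmetic above concrete, here's a tiny sketch that plugs in the assumed numbers (10 ms per seek, 150 MB/s sequential throughput; both figures are illustrative only) and prints the block size beyond which seek overhead stops mattering much:

```cpp
#include <cstdio>

int main()
{
    const double seek_s     = 0.010;   // assumed ~10 ms per seek
    const double throughput = 150e6;   // assumed ~150 MB/s sequential read/write
    // Block size at which transfer time is ~10x the seek overhead,
    // i.e. the seek costs you less than ~10% of the total time.
    const double block_bytes = 10 * seek_s * throughput;
    std::printf("seek overhead is under ~10%% of the time once blocks exceed ~%.1f MB\n",
                block_bytes / 1e6);   // prints ~15 MB with these assumptions
}
```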
That's a pretty excessive amount. Consider that you don't even start writing data until you've read 100 MB, so the filesystem driver doesn't get an opportunity to write any of the destination file while you're reading. The disk could otherwise be writing parts of the file that happen to pass under the head while it's reading the source file (see elevator seeking, for example).
I think it depends on how much free memory you have.
If you use 100 MB blocks to copy on a machine that has, say, only 30 MB of free memory, the copy will take much longer than with a smaller (20 MB) block.
If your copy buffer is larger than the available free memory, virtual-memory swapping will make the copy slower than expected.
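As a sketch of that idea on Windows (the divide-by-64 cap and the 32 KB floor are arbitrary assumptions on my part), you could query the available physical memory with GlobalMemoryStatusEx() and size the buffer well below it:

```cpp
#define NOMINMAX
#include <windows.h>
#include <algorithm>
#include <cstdio>

int main()
{
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);
    if (GlobalMemoryStatusEx(&status)) {
        // Use at most a small fraction of free physical RAM, never less than 32 KB,
        // so the copy buffer itself never pushes the system into swapping.
        unsigned long long buffer_size =
            std::max<unsigned long long>(32 * 1024, status.ullAvailPhys / 64);
        std::printf("available RAM: %llu bytes, chosen buffer: %llu bytes\n",
                    status.ullAvailPhys, buffer_size);
    }
    return 0;
}
```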
Given that the drive must seek when it changes tracks, might not a block size of, say, 63 x 512 = 32256 bytes produce optimum results?