
Is it useful to use a Thread for prefetching from a file?

Developer https://www.devze.com 2023-02-15 16:06 Source: Internet

Using multiple threads to speed up IO may work, but I need to process a huge file (or directory tree) sequentially in a single thread. However, I can imagine two possible ways to speed up reading from a file:

Feeder

The main thread gets all its data from a PipedInputStream (or the like) fed by the auxiliary thread, which is the only one accessing the file. The synchronization overhead is higher, but there is less communication with (the underlying library communicating with) the OS. This is straightforward for a single file, but very complicated for a directory tree.
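To make the Feeder idea concrete, here is a minimal sketch, assuming Java 9+ (for InputStream.transferTo). The class name FeederSketch, the helper openFed and the pipeSize parameter are made up for illustration; only the java.io and java.nio classes are real.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class FeederSketch {
    /** Hypothetical helper: returns a stream whose bytes are supplied by a background thread. */
    static InputStream openFed(Path file, int pipeSize) throws IOException {
        PipedInputStream in = new PipedInputStream(pipeSize);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread feeder = new Thread(() -> {
            // The feeder is the only thread that touches the file.
            try (OutputStream sink = out; InputStream src = Files.newInputStream(file)) {
                src.transferTo(sink); // blocks whenever the pipe buffer is full
            } catch (IOException e) {
                // Sketch only: a read error (or the reader closing early) ends the feed,
                // so the main thread simply sees end of stream.
            }
        }, "feeder");
        feeder.setDaemon(true);
        feeder.start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("feeder-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[1_000_000]);

        long total = 0;
        byte[] buf = new byte[8192];
        // The main thread reads sequentially, unaware of the auxiliary thread.
        try (InputStream in = openFed(file, 1 << 16)) {
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
        }
        System.out.println("bytes read: " + total);
    }
}
```

The pipe buffer size bounds how far the feeder can run ahead; whether the extra copy through the pipe pays off probably depends on how expensive the per-chunk processing is.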

Prefetcher

The main thread opens new FileInputStream(file) and reads it as if it were alone. The auxiliary thread opens its own stream over the same file and reads ahead. The main thread doesn't need to wait for the disk, since it gets all its data from the OS cache. There should be some trivial synchronization ensuring that the auxiliary thread doesn't run too far ahead. This could work for directory trees without much additional effort.
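A rough sketch of the Prefetcher idea follows. The names PrefetchSketch, Progress, readWithPrefetch and the window parameter are invented for illustration, and the whole approach assumes the OS keeps recently read file data in its cache, which is not guaranteed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

class PrefetchSketch {
    /** Hypothetical shared state: how many bytes the main thread has consumed so far. */
    static final class Progress {
        private long consumed;
        private boolean done;

        synchronized void advance(long n) { consumed += n; notifyAll(); }
        synchronized void finish() { done = true; notifyAll(); }

        /** Blocks while the prefetcher is `window` or more bytes ahead; false once the consumer is done. */
        synchronized boolean awaitRoom(long prefetched, long window) throws InterruptedException {
            while (!done && prefetched - consumed >= window) {
                wait();
            }
            return !done;
        }
    }

    static void startPrefetcher(File file, Progress progress, long window) {
        Thread t = new Thread(() -> {
            byte[] scratch = new byte[1 << 16];
            long prefetched = 0;
            try (InputStream in = new FileInputStream(file)) {
                while (progress.awaitRoom(prefetched, window)) {
                    int n = in.read(scratch); // bytes are discarded; the read only warms the OS cache
                    if (n < 0) break;
                    prefetched += n;
                }
            } catch (IOException | InterruptedException e) {
                // Prefetching is best effort; the main thread reads correctly without it.
            }
        }, "prefetcher");
        t.setDaemon(true);
        t.start();
    }

    /** The main thread reads the file as if it were alone and only reports its position. */
    static long readWithPrefetch(File file, long window) throws IOException {
        Progress progress = new Progress();
        startPrefetcher(file, progress, window);
        long total = 0;
        byte[] buf = new byte[8192];
        try (InputStream in = new FileInputStream(file)) {
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n; // real processing would go here
                progress.advance(n);
            }
        } finally {
            progress.finish(); // lets the prefetcher exit even if reading failed
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("prefetch-demo", ".bin");
        file.deleteOnExit();
        Files.write(file.toPath(), new byte[1_000_000]);
        System.out.println("bytes read: " + readWithPrefetch(file, 256 * 1024));
    }
}
```

The main thread never waits on the prefetcher, so the synchronization cost is one uncontended monitor entry per read. For a directory tree, the same Progress object could presumably be reused per file, with the prefetcher walking the tree in the same order.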

The questions

  • Which idea (if any) would you recommend trying out?
  • Have you used something like this?
  • Any other idea?


I had an app that read multiple files, created XML out of them, and sent it to a server.
In this situation, having a dedicated "feeder" (which reads files and puts them in a queue) and a few "senders" (which create the XML and send it to the server) helped.

If you are doing moderately to intensively CPU-consuming work (like XML parsing), then having two threads (one reads and one processes) is likely to help even on a single-core machine. I wouldn't be too concerned about synchronization overhead. When there is little contention, the gain from doing work while waiting for IO will be much bigger. If your thread waits for IO from time to time, there will be even more benefit.
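The one-reader, one-processor split can be sketched with a bounded queue. This is an illustration rather than the code from my app: ReadProcessSketch, the END sentinel and the byte-summing "work" are placeholders, and the queue capacity of 16 is an arbitrary assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ReadProcessSketch {
    // Sentinel compared by identity; it marks the end of input.
    private static final byte[] END = new byte[0];

    static long process(Path file) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16); // bounded, so the reader cannot run away
        Thread reader = new Thread(() -> {
            try {
                try (InputStream in = Files.newInputStream(file)) {
                    byte[] buf = new byte[1 << 16];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        queue.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
                    }
                } catch (IOException e) {
                    // Sketch only: a read error is treated like end of input.
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reader");
        reader.setDaemon(true);
        reader.start();

        long sum = 0;
        // The processing thread works on one chunk while the reader waits on IO for the next.
        for (byte[] chunk; (chunk = queue.take()) != END; ) {
            for (byte b : chunk) {
                sum += b & 0xFF; // stand-in for CPU-heavy work such as XML parsing
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("read-process-demo", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[1_000_000];
        Arrays.fill(data, (byte) 1);
        Files.write(file, data);
        System.out.println("byte sum: " + process(file));
    }
}
```

With several "sender" threads, each would call queue.take() in the same way, and the reader would need to enqueue one END marker per consumer.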

I'd recommend reading this chapter from JCiP (Java Concurrency in Practice). It addresses this topic.


It depends! ... on your access patterns, on your hardware...

"Using multiple threads for speeding IO may work" - IF your IO subsystem (such as a large disk array) is capable of handling multiple IO requests at once.

On a single desktop drive, your gains will be limited; if you have several threads performing largely independent work (i.e. there are few synchronisation points) you can benefit from one thread reading data, while others are processing data previously read.
