How come Solution 2 is more efficient than Solution 1?
(The time is the average of 100 runs, and in total they go through 13217 folders.)
// Solution 1 (2608,9ms)
let rec folderCollector path =
    async { let! dirs = Directory.AsyncGetDirectories path
            do! [for z in dirs -> folderCollector z]
                |> Async.Parallel |> Async.Ignore }
// Solution 2 (2510,9ms)
let rec folderCollector path =
    let dirs = Directory.GetDirectories path
    for z in dirs do folderCollector z
I would have thought that Solution 1 would be faster because it's async and I run it in parallel. What am I missing?
As Daniel and Brian already clearly explained, your solution is probably creating too many short-lived asynchronous computations, so the overhead outweighs the gains from parallelism. The AsyncGetDirectories operation also probably isn't really non-blocking, as it is not doing much work. I don't see a truly async version of this operation anywhere - how is it defined?
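I'd guess it is defined roughly like this (an assumption - the question doesn't show the definition):

module Directory =
    // Assumed definition: wrapping the synchronous call in async only
    // *looks* non-blocking - the directory read still blocks a
    // thread-pool thread, so there is no real overlapped IO to gain from.
    let AsyncGetDirectories path =
        async { return System.IO.Directory.GetDirectories path }

If it is defined like that, every recursive call pays the async scheduling cost without any genuinely asynchronous IO underneath.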
Anyway, using the ordinary GetDirectories, I tried the following version (which creates only a small number of parallel asyncs):
// Synchronous version
let rec folderCollectorSync path =
    let dirs = Directory.GetDirectories path
    for z in dirs do folderCollectorSync z

// Asynchronous version that uses the synchronous one when 'nesting <= 0'
let rec folderCollector path nesting =
    async { if nesting <= 0 then return folderCollectorSync path
            else let dirs = Directory.GetDirectories path
                 do! [for z in dirs -> folderCollector z (nesting - 1)]
                     |> Async.Parallel |> Async.Ignore }
Calling a simple synchronous version after a certain number of recursive calls is a common trick - it is used when parallelizing any tree-like structure that is very deep. Using folderCollector path 2, this will start only tens of parallel tasks (as opposed to thousands), so it will be more efficient.
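For example, a call from a script might look like this (the root path is made up):

// Hypothetical usage: fan out in parallel for the top two levels,
// then walk each remaining subtree synchronously.
folderCollector @"C:\data" 2 |> Async.RunSynchronously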
On a sample directory I used (with 4800 sub-dirs and 27000 files), I get:
- folderCollectorSync path takes 1 second
- folderCollector path 2 takes 600ms (the result is similar for any nesting between 1 and 4)
From the comments:

Your function incurs the cost of async without any of the benefits, because:
- you're creating too many asyncs for the short amount of work to be done
- your function is not CPU-bound, but rather IO-bound
I expect that for a problem like this, you may get the best results if you do the async/parallel work at the top level but make the sub-calls synchronous. (Or, if the trees are very deep, maybe make the first two levels async and then go synchronous after that.)
The keys are load-balancing and granularity. With too tiny a piece of work, the overhead of async outweighs the benefits of parallelism, so you want chunks of work big enough to leverage parallelism and overcome the overheads. But if the pieces of work are too large and unbalanced (e.g. one top-level dir has 10000 files, and 3 other top-level dirs have 1000 each), you also suffer: one task stays busy while the rest finish quickly, and you don't maximize parallelism.
If you can estimate the work for each sub-tree beforehand, you can do even better scheduling.
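As a minimal sketch (my own illustration, not from the original answers), one cheap heuristic is longest-first scheduling: estimate each top-level subtree by its immediate entry count and queue the largest ones first, so a big subtree isn't the last thing left running. This reuses folderCollectorSync from above.

// Sketch of a longest-first heuristic (assumption: the immediate entry
// count is a usable proxy for the total work in a subtree).
let folderCollectorBalanced path =
    Directory.GetDirectories path
    // Cheap work estimate: number of immediate files + dirs.
    |> Array.sortByDescending (fun d -> Directory.GetFileSystemEntries(d).Length)
    // Queue the biggest subtrees first so they start earliest.
    |> Array.map (fun d -> async { return folderCollectorSync d })
    |> Async.Parallel
    |> Async.Ignore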
Apparently, your code is IO-bound. Keep in mind how HDDs work: when you use Async to issue multiple reads, the read heads of the HDD have to jump back and forth to serve the different read commands at the same time, which introduces latency. This will likely become much worse if the data on disk is heavily fragmented.