开发者

Parallelizing a for loop

开发者 https://www.devze.com 2023-02-25 18:53 出处:网络
I have a for loop where the computation at iteration i does not depend on the computations done in the previous iterations.

I have a for loop where the computation at iteration i does not depend on the computations done in the previous iterations.

I want to parallelize the for loop(my code is in java) so that the computation of multiple iterations can be run concurrently on multiple processors. Should I create a thread for the computation of each iteration, i.e. num开发者_开发问答ber of threads to be created is equal to the number of iterations(number of iterations are large in the for loop)? How to do this?


Here's a small example that you might find helpful to get started with parallelization. It assumes that:

  1. You create an Input object that contains the input for each iteration of your computation.
  2. You create an Output object that contains the output from computing the input of each iteration.
  3. You want to pass in a list of inputs and get back a list of outputs all at once.
  4. Your input is a reasonable chunk of work to do, so overhead isn't too high.

If your computation is really simple then you'll probably want to consider processing them in batches. You could do that by putting say 100 in each input. It uses as many threads as there are processors in your system. If you're dealing with purely CPU intensive tasks then that's probably the number you want. You'd want to go higher if they're blocked waiting for something else (disk, network, database, etc.)

public List<Output> processInputs(List<Input> inputs)
        throws InterruptedException, ExecutionException {

    int threads = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(threads);

    List<Future<Output>> futures = new ArrayList<Future<Output>>();
    for (final Input input : inputs) {
        Callable<Output> callable = new Callable<Output>() {
            public Output call() throws Exception {
                Output output = new Output();
                // process your input here and compute the output
                return output;
            }
        };
        futures.add(service.submit(callable));
    }

    service.shutdown();

    List<Output> outputs = new ArrayList<Output>();
    for (Future<Output> future : futures) {
        outputs.add(future.get());
    }
    return outputs;
}


You should not do the thread handling manually. Instead:

  • create a reasonably-sized thread pool executor service (if your computations do no IO, use as many threads as you have cores).
  • Run a loop that submits each individual computation to the executor service and keeps the resulting Future objects. Note that if each computation consists of only a small amount of work, this will create a lot of overhead and possibly even be slower than a single-threaded program. In that case, submit jobs that do packets of computation as mdma suggests.
  • Run a second loop that collects the results from all the Futures (it will implicitly wait until all computations have finished)
  • shut down the executor service


No, you should not create one thread for each iteration. The optimum number of threads is related to the number of processors available - too many threads, and you waste too much time context switching for no added performance.

If you're not totally attached to Java, you might want to try a parallel high-performance C system like OpenMPI. OpenMPI is suitable for this kind of problem.


Don't create the threads yourself. I recommend you use the fork/join framework (jsr166y) and create tasks that iterate over a given range of items. It will take care of the thread management for you, using as many threads as the hardware supports.

Task granularity is the main issue here. If each iteration is relatively low computation (say less than 100 operations) then having each iteration executed as a separate task will introduce a lot of overhead of task scheduling. It's better to have each task accept a List of arguments to compute, and return the result as a list. That way you can have each task compute 1, 10 or thousands of elements, to keep task granulary at a reasonable level that balances keeping work available, and reducing task management overhead.

There is also a ParallelArray class in jsr166z, that allows repeated computation over an array. That may work for you, if the values you are computing are primitive types.

0

精彩评论

暂无评论...
验证码 换一张
取 消