开发者

Why is my multi threaded sorting algorithm not faster than my single threaded mergesort

开发者 https://www.devze.com 2022-12-31 18:18 出处:网络
There are certain algorithms whose running time can decrease significantly when one divides up a task and gets each part done in parallel. One of these algorithms is merge sort, where a list is divide

There are certain algorithms whose running time can decrease significantly when one divides up a task and gets each part done in parallel. One of these algorithms is merge sort, where a list is divided into infinitesimally smaller parts and then recombined in a sorted order. I decided to do an experiment to test whether or not I could I increase the speed of this sort by using multiple threads. I am running the following functions in Java on a Quad-Core Dell with Windows Vista.

One function (the control case) is simply recursive:

// x is an array of N elements in random order
public int[] mergeSort(int[] x) {
    if (x.length == 1) 
        return x;

    // Dividing the array in half
    int[] a = new int[x.length/2];
    int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)];
    for(int i = 0; i < x.length/2; i++) 
        a[i] = x[i];
    for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) 
        b[i] = x[i+x.length/2];

    // Sending them off to continue being divided
    mergeSort(a);
    mergeSort(b);

    // Recombining the two arrays
    int ia = 0, ib = 0, i = 0;
    while(ia != a.length || ib != b.length) {
        if (ia == a.length) {
            x[i] = b[ib];
            ib++;
        }
        else if (ib == b.length) {
            x[i] = a[ia];
            ia++;
        }
        else if (a[ia] < b[ib]) {
            x[i] = a[ia];
            ia++;
        }
        else {
            x[i] = b[ib];
            ib++;
        }
        i++;
    }

    return x;
}

The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called:

public class Merger extends Thread
{
    int[] x;
    boolean finished;

    public Merger(int[] x)
    {
        this.x = x;
    }

    public void run()
    {
        if (x.length == 1) {
            finished = true;
            return;
        }

        // Divide the array in half
        int[] a = new int[x.length/2];
        int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)];
        for(int i = 0; i < x.length/2; i++) 
            a[i] = x[i];
        for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) 
            b[i] = x[i+x.length/2];

        // Begin two threads to continue to divide the array
        Merger ma = new Merger(a);
        ma.run();
        Merger mb = new Merger(b);
        mb.run();

        // Wait for the two other threads to finish 
        while(!ma.finished || !mb.finished) ;

        // Recombine the two arrays
        int ia = 0, ib = 0, i = 0;
        while(ia != a.length || ib != b.length) {
            if (ia == a.length) {
                x[i] = b[ib];
                ib++;
            }
            else if (ib == b.length) {
                x[i] =开发者_如何学Python a[ia];
                ia++;
            }
            else if (a[ia] < b[ib]) {
                x[i] = a[ia];
                ia++;
            }
            else {
                x[i] = b[ib];
                ib++;
            }
            i++;
        }

        finished = true;
    }
}

It turns out that function that does not use multithreading actually runs faster. Why? Does the operating system and the java virtual machine not "communicate" effectively enough to place the different threads on different cores? Or am I missing something obvious?


The problem is not multi-threading: I've written a correctly multi-threaded QuickSort in Java and it owns the default Java sort. I did this after witnessing a gigantic dataset being process and had only one core of a 16-cores machine working.

One of your issue (a huge one) is that you're busy looping:

 // Wait for the two other threads to finish 
 while(!ma.finished || !mb.finished) ;

This is a HUGE no-no: it is called busy looping and you're destroying the perfs.

(Another issue is that your code is not spawning any new threads, as it has already been pointed out to you)

You need to use other way to synchronize: an example would be to use a CountDownLatch.

Another thing: there's no need to spawn two new threads when you divide the workload: spawn only one new thread, and do the other half in the current thread.

Also, you probably don't want to create more threads than there are cores availables.

See my question here (asking for a good Open Source multithreaded mergesort/quicksort/whatever). The one I'm using is proprietary, I can't paste it.

Multithreaded quicksort or mergesort

I haven't implemented Mergesort but QuickSort and I can tell you that there's no array copying going on.

What I do is this:

  • pick a pivot
  • exchange values as needed
  • have we reached the thread limit? (depending on the number of cores)
    • yes: sort first part in this thread
    • no: spawn a new thread
  • sort second part in current thread
  • wait for first part to finish if it's not done yet (using a CountDownLatch).

The code spawning a new thread and creating the CountDownLatch may look like this:

            final CountDownLatch cdl = new CountDownLatch( 1 );
            final Thread t = new Thread( new Runnable() {
                public void run() {
                    quicksort(a, i+1, r );
                    cdl.countDown();
                }
            } };

The advantage of using synchronization facilities like the CountDownLatch is that it is very efficient and that your not wasting time dealing with low-level Java synchronization idiosynchrasies.

In your case, the "split" may look like this (untested, it is just to give an idea):

if ( threads.getAndIncrement() < 4 ) {
    final CountDownLatch innerLatch = new CountDownLatch( 1 );
    final Thread t = new Merger( innerLatch, b );
    t.start();
    mergeSort( a );
    while ( innerLatch.getCount() > 0 ) {
        try {
            innerLatch.await( 1000, TimeUnit.SECONDS );
        } catch ( InterruptedException e ) {
            // Up to you to decide what to do here
        }
    }
} else {
    mergeSort( a );
    mergeSort( b );
}

(don't forget to "countdown" the latch when each merge is done)

Where you'd replace the number of threads (up to 4 here) by the number of available cores. You may use the following (once, say to initialize some static variable at the beginning of your program: the number of cores is unlikely to change [unless you're on a machine allowing CPU hotswapping like some Sun systems allows]):

Runtime.getRuntime().availableProcessors()


As others said; This code isn't going to work because it starts no new threads. You need to call the start() method instead of the run() method to create new threads. It also has concurrency errors: the checks on the finished variable are not thread safe.

Concurrent programming can be pretty difficult if you do not understand the basics. You might read the book Java Concurrency in Practice by Brian Goetz. It explains the basics and explains constructs (such as Latch, etc) to ease building concurrent programs.


The overhead cost of synchronization may be comparatively large and prevent many optimizations.

Furthermore you are creating way too many threads.

The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called.

You would be better off with a fixed number of threads, suggestively 4 on a quad core. This could be realized with a thread pool (tutorial) and the pattern would be "bag of tasks". But perhaps it would be better yet, to initially divide the task into four equally large tasks and do "single-threaded" sorting on those tasks. This would then utilize the caches a lot better.


Instead of having a "busy-loop" waiting for the threads to finish (stealing cpu-cycles) you should have a look at Thread.join().


How many elements in the array you have to do sort? If there are too few elements, the time of sync and CPU switching will over the time you save for dividing the job for paralleling

0

精彩评论

暂无评论...
验证码 换一张
取 消