开发者

setContextClassLoader slows down drastically when called concurrently

开发者 https://www.devze.com 2022-12-15 21:52 出处:网络
I have pin pointed a bottle neck in my application, it seems to me that it boils down to a call to Thread::setContextClassLoader.

I have pin pointed a bottle neck in my application, it seems to me that it boils down to a call to Thread::setContextClassLoader.

Basically I was forced to mess around with the thread's context class loader due to issues with 3rd party libraries (see this question to understand why).

The solution I picked was to my knowledge the common one and it works something like this:

Thread thread = Th开发者_JAVA百科read.currentThread();
ClassLoader old = thread.getContextClassLoader();
thread.setContextClassLoader(newClassLoader);

try {
    ... // problematic code that uses the thread context class loader
} finally {
    thread.setContextClassLoader(old);
}

It turned out that the call to setContextClassLoader wasn't a problem when only 1 thread was running, but when more than one thread is doing it, it slows down drastically.

I've made the following test app to isolate the problem:

ArrayList<Thread> threads = new ArrayList<Thread>();
int thread_count = 1;

long start = System.currentTimeMillis();

for (int i = 0; i < thread_count; i++) {
    Thread thread = new Thread(new MyRunnable(100000000));

    thread.start();
    threads.add(thread);
}

for (Thread thread : threads) {
    thread.join();
}

long total = System.currentTimeMillis() - start;
double seconds = (double)total / 1000;

System.out.println("time in seconds: " + seconds);

And this is MyRunnable class:

public class MyRunnable implements Runnable {
    int _iterations;

    public MyRunnable(int iterations) {
        _iterations = iterations;
    }

    public void run() {
        final Thread curr = Thread.currentThread();
        final ClassLoader loader = ClassLoader.getSystemClassLoader();

        for (int i = 0; i < _iterations; i++) {
            curr.setContextClassLoader(loader);
        }
    }
}

Basically it opens a couple of threads, and sets the current threads context class loader to the system class loader in a loop.

Updated results after code change on my machine: When thread_count is 1 it finishes in half a second. 2 threads take 1.5~, 3 threads 2.7~, 4 threads 4~ - well you get the picture.

I've tried looking in Thread's implementation of setContextClassLoader and it appears that all it does is set a member variable to the class loader passed to it. I've found no locking (or access to shared resources that would require) to explain such overhead when running with more than one thread.

What am I missing here?

P.S. I'm using JRE 1.5 but the same thing happens in 1.6.

EDIT: @Tom Hawtin - See the code changes I made to rule out the reason you mentioned. Even when the system class loader is fetched once, the results are slower when the thread count > 1.


The only really obvious thing in the source is not related to Thread.setContextClassLoader. ClassLoader.getSystemClassLoader calls initSystemClassLoader that locks ClassLoader.class even if the system class loader is already initialised.

A potential issue is that reading of volatile variables may have performance impact on some multiprocessor machines.

Note that we are only looking at a few hundred cycles here.


Recent JVMs use a techinque called "biased locking" which makes lock acquisition nearly free if only one thread ever accesses a given lock. The first time a second thread tries to access the lock, the "bias" towards the original accessor is revoked and the lock becomes a normal/lightweight lock which requires an atomic operation to acquire (and sometimes one to release).

The performance difference between a biased lock and a normal one can be an order of magnitude (say 5 cycles vs 50 cycles) which is consistent with your measurement. The lock in question here is probably the lock mentioned in the first reply to your question. Biased locking is described in more detail here if you are interested.

Even ignoring biased locking, two threads or more threads attempting to get the same lock will generally be considerably slower in aggregate throughput than a single thread (since they contend for the cache line containing the lock word).

0

精彩评论

暂无评论...
验证码 换一张
取 消