Maximizing Worker Thread Utilization

Developer https://www.devze.com 2023-01-10 21:39 Source: Internet
To solve a problem (and to improve my understanding of multitasking), I have written a small thread pool implementation. The pool spins up a number of worker threads, which pop tasks off a queue as the client of the thread pool adds them. For the purposes of this question, the worker threads are all terminated once the task queue is empty.

After some basic benchmarking, I discovered that the application spends ~60% of its time waiting to acquire the queue lock. Presumably most of that waiting takes place within the worker threads.

Is this merely an indication I'm not giving the worker threads enough to do, or something more? Is there something straightforward I may be missing to increase worker thread throughput?

EDIT: Here is some rough pseudocode that should illustrate things somewhat. These are the only two places where a lock is acquired or released while the worker threads are executing (which accounts for the vast majority of the application's running time).

std::list<task_t> task_list;

// Called by the client to add tasks to the thread pool
void insert_task(const task_t& task)
{
    lock_type listlock(task_mutex);

    task_list.push_back(task);
}

// The base routine of each thread in the pool. Some details
// such as lifetime management have been omitted for clarity.
void worker_thread_base()
{
    while (true)
    {
        task_t task;

        {
            lock_type listlock(task_mutex);

            if (task_list.empty())
                continue;

            task = task_list.front();
            task_list.pop_front();
        }

        do_task(task);
    }
}


Your design has each worker thread sit and "spin", repeatedly trying to acquire the lock. This will happen constantly unless every worker thread is busy performing work, in which case the lock will sit unacquired while the work gets done.

With all of your threads just sitting and spinning on a lock, you're going to use quite a bit of CPU time waiting. This is to be expected, given your design.

You'll find that the percentage of time blocked will likely shrink dramatically if you have fewer worker threads - and at the point where you have more work items than threads, you'll spend very little time waiting on that lock.
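
As a rough illustration of the point about thread count, here is a minimal sketch of capping the pool size. The helper name choose_worker_count and the idea of also capping at the number of hardware threads are my assumptions, not something taken from your code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: pick a worker count that is no larger than the number
// of pending tasks or the number of hardware threads, and never less than one.
std::size_t choose_worker_count(std::size_t pending_tasks,
                                std::size_t hardware_threads =
                                    std::thread::hardware_concurrency())
{
    // hardware_concurrency() may return 0 when the value cannot be determined.
    if (hardware_threads == 0)
        hardware_threads = 1;

    const std::size_t capped = std::min(pending_tasks, hardware_threads);
    const std::size_t count = std::max<std::size_t>(capped, 1);

    assert(count >= 1); // a pool always gets at least one worker
    return count;
}
```

With fewer workers than queued tasks, each worker tends to find work as soon as it takes the lock, so less time goes to contending for an empty queue.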

A much better design would be to use some form of lockless queue for your work queue, as this could prevent waiting at this point. In addition, having a wait handle that could block the worker threads until there is work in the queue will prevent the unnecessary spinning.
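
To make the blocking idea concrete, here is a minimal sketch that still uses a single mutex (so it is not a lockless queue) but puts idle workers to sleep on a std::condition_variable instead of letting them spin. The names blocking_task_queue, close and run_demo are made up for illustration, and task_t is assumed to be a std::function<void()> since the real type was not shown:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Assumed task type; the original task_t is not shown in the question.
using task_t = std::function<void()>;

class blocking_task_queue
{
public:
    // Called by the client; wakes at most one sleeping worker.
    void push(task_t task)
    {
        assert(task); // reject empty std::function objects early
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        ready_.notify_one();
    }

    // Sleeps until a task is available or the queue has been closed.
    // Returns false once the queue is closed and fully drained.
    bool pop(task_t& out)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty())
            return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

    // Signals that no more tasks will arrive and wakes every worker.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<task_t> tasks_;
    bool closed_ = false;
};

// Runs task_count trivial tasks on worker_count threads and
// returns how many tasks actually executed.
int run_demo(int worker_count, int task_count)
{
    blocking_task_queue queue;
    std::atomic<int> completed{0};
    std::vector<std::thread> workers;

    for (int i = 0; i < worker_count; ++i)
        workers.emplace_back([&queue] {
            task_t task;
            while (queue.pop(task)) // sleeps while the queue is empty
                task();
        });

    for (int i = 0; i < task_count; ++i)
        queue.push([&completed] { ++completed; });

    queue.close();
    for (auto& worker : workers)
        worker.join();
    return completed.load();
}
```

A worker blocked inside pop() holds no lock and uses no CPU while it waits, so the mutex is only contended for the short push and pop operations themselves.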


Are you trying to do this with a single lock or multiple locks? Mutexes? What wait semantics are you using?

I would guess from your description (and this is purely a guess) that you have something similar to:

lock(theLock) {
 // ... do lots of work ...
}

This would be in your main thread, which contains the code that dispatches to the lightweight threads. One reason you might see aggravated wait times here is that you need signals back from the spun-up threads indicating that they have been queued and are waiting for execution (again, this is a guess, since you didn't give any code).

One way you might solve this is to switch from using an explicit lock, as above, into using a signaled mutex which is pulsed when you want one of the threads to grab work.

Without seeing your current implementation, though, I am not sure I can offer much more than that.
