Efficient Memory Barriers

Developer https://www.devze.com 2023-04-03 17:02 Source: Internet
I have a multithreaded application, where each thread has a variable of integer type. These variables are incremented during execution of the program. At certain points in the code, a thread compares its counting variable with those of the other threads.

Since threads running on multiple cores may have their memory operations reordered, a thread might not read the expected counter values of the other threads. One way to solve this is to use atomic variables, such as std::atomic<> from C++11. However, performing a memory fence at each counter increment will significantly slow down the program.

What I want is for a memory fence to be issued only when a thread is about to read the other threads' counters, so that the counters of all threads are updated in memory at that point. How can this be done in C++? I am using Linux and g++.


The C++11 standard library includes support for fences in <atomic> with std::atomic_thread_fence.

Calling this invokes a full fence:

std::atomic_thread_fence(std::memory_order_seq_cst);

If you want to emit only an acquire or only a release fence, you can use std::memory_order_acquire and std::memory_order_release instead.
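
As a rough sketch of how this might map onto the counters in the question: each thread owns one std::atomic<int>, increments it with relaxed operations (no fence on the fast path), and the reader issues a single acquire fence after loading all counters. The names PaddedCounter, increment, sum_counters and run_counter_demo, the thread count, and the 64-byte alignment are assumptions made for illustration, not part of the question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kThreads = 4;   // hypothetical number of worker threads

// One counter per thread, aligned to 64 bytes (an assumed cache-line size)
// so that neighbouring counters do not share a line.
struct alignas(64) PaddedCounter {
    std::atomic<int> value{0};
};

PaddedCounter counters[kThreads];

// Fast path: each counter has a single writer (its owner), so a relaxed
// load plus a relaxed store is sufficient and no fence is requested.
void increment(std::size_t self) {
    int v = counters[self].value.load(std::memory_order_relaxed);
    counters[self].value.store(v + 1, std::memory_order_relaxed);
}

// Slow path: relaxed loads of every counter, then one acquire fence that
// keeps the caller's later memory operations from being reordered ahead
// of these loads.
int sum_counters() {
    int total = 0;
    for (const PaddedCounter& c : counters)
        total += c.value.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return total;
}

// Demo driver (hypothetical): every thread increments its own counter n times.
int run_counter_demo(int n) {
    for (PaddedCounter& c : counters)
        c.value.store(0, std::memory_order_relaxed);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < kThreads; ++t)
        threads.emplace_back([t, n] {
            for (int i = 0; i < n; ++i) increment(t);
        });
    for (std::thread& th : threads) th.join();
    int total = sum_counters();
    assert(total == static_cast<int>(kThreads) * n);  // exact once all threads are joined
    return total;
}
```

Note that a fence in the reading thread only orders that thread's own memory operations; it does not force other threads to publish their counters. While the workers are still running, sum_counters() therefore returns a snapshot that may lag slightly behind each thread's latest increment.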


There are x86 intrinsics that correspond to memory barriers that you can use yourself. The Windows header has a memory barrier macro, so you should be able to find something equivalent for Linux.
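
On Linux with g++, one candidate is the compiler builtin __sync_synchronize(), which issues a full memory barrier. The sketch below uses it in place of an x86-only intrinsic so that it is not tied to one architecture; payload, ready, publish, consume and publish_and_consume are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // hypothetical data published by the writer
std::atomic<bool> ready{false};   // hypothetical flag announcing the data

void publish(int v) {
    payload = v;
    __sync_synchronize();         // GCC builtin: full memory barrier
    ready.store(true, std::memory_order_relaxed);
}

int consume() {
    while (!ready.load(std::memory_order_relaxed))
        std::this_thread::yield();
    __sync_synchronize();         // full barrier before reading the payload
    return payload;
}

// Demo driver (hypothetical): one thread publishes, the caller consumes.
int publish_and_consume(int v) {
    ready.store(false, std::memory_order_relaxed);
    std::thread producer(publish, v);
    int got = consume();
    producer.join();
    assert(got == v);
    return got;
}
```

In new code, std::atomic_thread_fence(std::memory_order_seq_cst) is probably the more portable way to write the same thing.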


You can use boost::asio::strand for this exact purpose. Create a handler responsible for reading the counter. That handler can be called from multiple threads. Instead of calling the handler directly, wrap it inside a boost::asio::strand. This ensures the handler cannot be called concurrently by multiple threads.

http://www.boost.org/doc/libs/1_35_0/doc/html/boost_asio/tutorial/tuttimer5.html

I hope I understood the question right.


My suggestion would be to have a collectTimers() function in a higher-level class that can ask each thread for its counter (via queue/msg). This way, updating the timers is not delayed, but collecting them is a bit slower.

This only works if you have some kind of communication mechanism between the threads.
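
One way this might look, as a sketch only: a one-slot mailbox stands in for the queue/message mechanism, each worker keeps its counter as a plain int and polls the mailbox between increments, and collectTimers() posts a request to every worker and waits for the replies. Worker, poll, run and run_collect_demo are hypothetical names, and the sketch assumes a single collector at a time and non-negative counters (so -1 can mean "no reply yet").

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Worker {
    int counter = 0;                      // plain int, touched only by its own thread
    std::atomic<bool> requested{false};   // set by the collector
    std::atomic<int> reply{-1};           // -1 means "no reply yet"
    std::atomic<bool> stop{false};

    // Answer a pending request, if any. The common case is one relaxed load.
    void poll() {
        if (!requested.load(std::memory_order_relaxed)) return;
        requested.exchange(false, std::memory_order_acquire);
        reply.store(counter, std::memory_order_release);
    }

    void run(int increments) {
        for (int i = 0; i < increments; ++i) {
            ++counter;                    // the update itself is a plain increment
            poll();
        }
        // Keep answering requests until told to stop.
        while (!stop.load(std::memory_order_acquire)) {
            poll();
            std::this_thread::yield();
        }
    }
};

// Ask every worker for its counter, then wait for all replies.
std::vector<int> collectTimers(std::vector<Worker>& workers) {
    for (Worker& w : workers) {
        w.reply.store(-1, std::memory_order_relaxed);
        w.requested.store(true, std::memory_order_release);
    }
    std::vector<int> values;
    for (Worker& w : workers) {
        int v;
        while ((v = w.reply.load(std::memory_order_acquire)) < 0)
            std::this_thread::yield();
        values.push_back(v);
    }
    return values;
}

// Demo driver (hypothetical): take one snapshot while the workers run.
std::vector<int> run_collect_demo(std::size_t nthreads, int increments) {
    std::vector<Worker> workers(nthreads);
    std::vector<std::thread> threads;
    for (Worker& w : workers)
        threads.emplace_back([&w, increments] { w.run(increments); });
    std::vector<int> snapshot = collectTimers(workers);
    for (Worker& w : workers) w.stop.store(true, std::memory_order_release);
    for (std::thread& t : threads) t.join();
    assert(snapshot.size() == workers.size());
    return snapshot;
}
```

The snapshot is taken while the workers are running, so each value is whatever that worker had counted when it noticed the request. The collection cost grows with the polling interval, which matches the trade-off described above.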


Why not have a "control" thread, to which each thread reports its counter increments and from which it asks for the values of the others?

It would make it very efficient and simple. Just a suggestion.


You could try something like the signal-theft limit counter design in Section 4.4.3 of http://mirror.nexcess.net/kernel.org/linux/kernel/people/paulmck/perfbook/perfbook.2011.08.28a.pdf

This kind of design can eliminate the atomic operations from the fastpath (incrementing the per-thread counter). Whether the complexity is worth it is up to you to decide, of course.
