
Critical Sections that Spin on Posix?

Developer https://www.devze.com 2022-12-14 01:23 Source: Internet

The Windows API provides critical sections in which a waiting thread will spin a limited number of times before context switching, but only on a multiprocessor system. These are implemented using InitializeCriticalSectionAndSpinCount. (See http://msdn.microsoft.com/en-us/library/ms682530.aspx.) This is efficient when you have a critical section that is usually held only for a short time, so contention should not immediately trigger a context switch. Two related questions:

  1. For a high-level, cross-platform threading library or an implementation of a synchronized block, is having a small amount of spinning before triggering a context switch a good default?
  2. What, if anything, is the equivalent of InitializeCriticalSectionAndSpinCount on other operating systems, especially POSIX ones?

Edit: Of course no spin count will be optimal for all cases. I'm only interested in whether using a nonzero spin count would be a better default than not using one.


My opinion is that the optimal "spin count" for best application performance is too hardware-dependent for it to be an important part of a cross-platform API, and you should probably just use mutexes (in POSIX, pthread_mutex_init / destroy / lock / trylock) or spin locks (pthread_spin_init / destroy / lock / trylock). Rationale follows.
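To make the two options concrete, here is a minimal sketch that protects a shared counter first with a pthread mutex and then with a pthread spin lock. It assumes a platform that actually provides the pthread_spin_* functions (Linux with glibc does); the names count_with_mutex, count_with_spinlock, NUM_THREADS and ITERATIONS are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define NUM_THREADS 4       /* illustrative values, not tuned */
#define ITERATIONS  50000

static pthread_mutex_t    mutex_lock;
static pthread_spinlock_t spin_lock;
static long counter;

/* Increment the shared counter under the blocking mutex. */
static void *mutex_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&mutex_lock);
        counter++;
        pthread_mutex_unlock(&mutex_lock);
    }
    return NULL;
}

/* Same work, but waiters busy-wait instead of sleeping. */
static void *spin_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_spin_lock(&spin_lock);
        counter++;
        pthread_spin_unlock(&spin_lock);
    }
    return NULL;
}

/* Start NUM_THREADS threads running `worker` and wait for all of them. */
static void run_workers(void *(*worker)(void *)) {
    pthread_t threads[NUM_THREADS];
    int started = 0;
    while (started < NUM_THREADS &&
           pthread_create(&threads[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
}

long count_with_mutex(void) {
    counter = 0;
    pthread_mutex_init(&mutex_lock, NULL);
    run_workers(mutex_worker);
    pthread_mutex_destroy(&mutex_lock);
    return counter;
}

long count_with_spinlock(void) {
    counter = 0;
    pthread_spin_init(&spin_lock, PTHREAD_PROCESS_PRIVATE);
    run_workers(spin_worker);
    pthread_spin_destroy(&spin_lock);
    return counter;
}
```

Both functions should return NUM_THREADS * ITERATIONS if every thread starts; the only difference is what a waiting thread does while the lock is held. Link with -pthread.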

What's the point of the spin count? Basically, if the lock owner is running simultaneously with the thread attempting to acquire the lock, then the lock owner might release the lock quickly enough that the EnterCriticalSection caller could avoid giving up CPU control in acquiring the lock, improving that thread's performance, and avoiding context switch overhead. Two things:

1: obviously this relies on the lock owner running in parallel to the thread attempting to acquire the lock. This is impossible on a single execution core, which is almost certainly why Microsoft treats the count as 0 in such environments. Even with multiple cores, it's quite possible that the lock owner is not running when another thread attempts to acquire the lock, and in such cases the optimal spin count (for that attempt) is still 0.

2: with simultaneous execution, the optimal spin count is still hardware dependent. Different processors will take different amounts of time to perform similar operations. They have different instruction sets (the ARM I work with most doesn't have an integer divide instruction), different cache sizes, the OS will have different pages in memory... Decrementing the spin count may take a different amount of time on a load-store architecture than on an architecture in which arithmetic instructions can access memory directly. Even on the same processor, the same task will take different amounts of time, depending on (at least) the contents and organization of the memory cache.

If the optimal spin count with simultaneous execution is infinite, then the pthread_spin_* functions should do what you're after. If it is not, then use the pthread_mutex_* functions.
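If you do want a finite, nonzero spin count on top of pthreads, one way to approximate it is to call pthread_mutex_trylock a bounded number of times and then fall back to pthread_mutex_lock. The sketch below is only that, a sketch: spin_then_lock, default_spin_count and the value 1000 are hypothetical names and numbers, not part of any standard API, and sysconf(_SC_NPROCESSORS_ONLN) is widely available but not strictly required by POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Arbitrary placeholder; a real value would have to be measured. */
#define PLACEHOLDER_SPIN_COUNT 1000u

/* Hypothetical helper: spin only when more than one CPU is online,
 * mirroring the point above that spinning is useless on a single core. */
unsigned default_spin_count(void) {
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    return cpus > 1 ? PLACEHOLDER_SPIN_COUNT : 0u;
}

/* Try to take the mutex up to spin_count times without blocking,
 * then fall back to a normal blocking lock.
 * Returns 0 on success or an error number from the pthread calls. */
int spin_then_lock(pthread_mutex_t *m, unsigned spin_count) {
    for (unsigned i = 0; i < spin_count; i++) {
        int rc = pthread_mutex_trylock(m);
        if (rc == 0)
            return 0;          /* acquired while spinning */
        if (rc != EBUSY)
            return rc;         /* a real error, not just contention */
    }
    return pthread_mutex_lock(m);  /* give up spinning and block */
}
```

A caller would use spin_then_lock(&m, default_spin_count()) in place of pthread_mutex_lock(&m) and release with pthread_mutex_unlock(&m) as usual. Note that this spins regardless of whether the lock owner is currently running, which is exactly the weakness described in point 1.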


For a high-level, cross-platform threading library or an implementation of a synchronized block, is having a small amount of spinning before triggering a context switch a good default?

One would think so. Many moons ago, Solaris 2.x implemented adaptive locks, which did exactly this: spin for a while if the mutex is held by a thread executing on another CPU, or block otherwise.
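On Linux, glibc offers something in a similar spirit as a non-portable extension: the mutex type PTHREAD_MUTEX_ADAPTIVE_NP, which spins briefly before blocking. Its exact spinning policy is an implementation detail and is not the same as the Solaris behaviour described above. A minimal sketch, assuming glibc and falling back to a default mutex elsewhere (init_adaptive_mutex is a made-up name):

```c
#define _GNU_SOURCE
#include <pthread.h>

/* Initialize *m as an adaptive mutex where glibc provides that type,
 * otherwise as a default mutex. Returns 0 on success or an error number. */
int init_adaptive_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
#if defined(__GLIBC__)
    /* Non-portable (_NP) glibc extension: spin a little, then block. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
#endif
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

After initialization the mutex is used with the ordinary pthread_mutex_lock / unlock / destroy calls, so the spinning stays hidden behind the normal mutex API.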

Obviously, it makes no sense to spin on single-CPU systems.
