开发者

C++0x atomic implementation in c++98 question about __sync_synchronize()

开发者 https://www.devze.com 2022-12-22 09:46 出处:网络
I have written the followin atomic template with a view to mimicing the atomic operations which will be available in the upcoming c++0x standard.

I have written the followin atomic template with a view to mimicing the atomic operations which will be available in the upcoming c++0x standard.

However, I am not sure that the __sync_synchronize() call I have around the returning of the underlying value are necessary.

From my understanding, __sync_synchronize() is a full memory barrier and I'm not 开发者_运维技巧sure I need such a costly call when returning the object value.

I'm pretty sure it'll be needed around the setting of the value but I could also implement this with the assembly ..

__asm__ __volatile__ ( "rep;nop": : :"memory" );

Does anyone know wether I definitely need the synchronize() on return of the object.

M.

template < typename T >
struct atomic
{
private:
    volatile T obj;

public:
    atomic( const T & t ) :
        obj( t )
    {
    }

    inline operator T()
    {
        __sync_synchronize();   // Not sure this is overkill
        return obj;
    }

    inline atomic< T > & operator=( T val )
    {
        __sync_synchronize();   // Not sure if this is overkill
        obj = val;
        return *this;
    }

    inline T operator++()
    {
        return __sync_add_and_fetch( &obj, (T)1 );
    }

    inline T operator++( int )
    {
        return __sync_fetch_and_add( &obj, (T)1 );
    }

    inline T operator+=( T val )
    {
        return __sync_add_and_fetch( &obj, val );
    }

    inline T operator--()
    {
        return __sync_sub_and_fetch( &obj, (T)1 );
    }

    inline T operator--( int )
    {
        return __sync_fetch_and_sub( &obj, (T)1 );
    }

    inline T operator-=( T )
    {
        return __sync_sub_and_fetch( &obj, val );
    }

    // Perform an atomic CAS operation
    // returning the value before the operation
    inline T exchange( T oldVal, T newVal )
    {
        return __sync_val_compare_and_swap( &obj, oldval, newval );
    }

};

Update: I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations.


First, some petty remarks:

volatile T obj;

volatile is useless here, even more that you make all the barriers yourself.

inline T operator++( int )

inline is unneeded, since it is implied when the method is defined inside the class.

Getters and setters:

inline operator T()
{
    __sync_synchronize();   // (I)
    T tmp=obj;
    __sync_synchronize();   // (II)
    return tmp;
}

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // (III)
    obj = val;
    __sync_synchronize();   // (IV)
    return *this;
}

To assure total ordering of the memory accesses on read and write, you need two barriers on each access (like this). I would be happy with only barriers (II) and (III) as they suffice for some uses I came up with (eg. pointer/boolean saying data is there, spinlock), but, unless specified otherwise, I would not omit the others, because someone might need them (it would be nice if someone showed you can omit some of the barriers without restricting possible uses, but I don't think it's possible).

Of course, this would be unnecessarily complicated and slow.

That said, I would just dump the barriers, and even the idea of using the barriers in any place of a similar template. Note that:

  • the ordering semantics of that interface is all defined by you; and if you decide the interface has the barriers here or there, they must be here or there, period. If you don't define it, you can come up with more efficient design, because not all barriers, or even not full barriers, might be needed for a particular problem.
  • usually, you use atomics if you have a lock-free algorithm that could give you a performance advantage; this means an interface that prematurely pessimizes the accesses will probably be unusable as a building block of it, as it will hamper the performance itself.
  • lock-free algorithms typically contain communication that cannot be encapsulated by one atomic data type, so you need to know what's happening in the algorithm to place the barriers precisely where they belong (eg. when implementing a lock, you need a barrier after you've acquired it, but before you release it, which are both writes, at least in principle)
  • if you don't wanna have problems, and are not sure about placing the barriers explicitly in the algorithm, just use lock-based algorithms. There's nothing bad about it.

BTW, the c++0x interface allows you to specify precise memory ordering constraints.


inline operator T()
{
    __sync_synchronize();   // Not sure this is overkill
    return obj;
}

Short version: This IS overkill.

Long version:

Why do you want to implement this class as a template at all? It doesn't make sense because the atomic operations are allowed only on integer types from 1-8 bytes and you even can't be sure that an 8 byte integer is supported on all platforms.

You should implement your atomic class as a non-template version and use the "native" integer type of your hardware/system. This is int32_t on 32 bit processors/os and int64_t on a 64 bit systems. e.g.:

#ifdef ...
typedef ... native_int_type;
#endif
// verify that you choosed the correct integer type
BOOST_STATIC_ASSERT(sizeof(native_int_type) == sizeof(void*));

BOOST_STATIC_ASSERT is straight forward to "static_assert()" from C++0x.

If you're using the "perfect fit" integer type you can write the operator just like that:

operator native_int_type() { return obj; }

Because obj is volatile it's guaranteed to fetch the value and not return any cached value. And because you're using the "native" integer type you can be sure that reading such a value is atomic.

atomic& operator=( native_integer_type val )

Again, you don't need a synchronize if you're using the right integer type. Read/Set an int32 on a intel 32 bit system is atomic and so is Read/Set an int64 on a 64 bit system.

I don't see any benefit from implementing atomic as a template. Atomic operations are platform dependent. It's better to offer an "atomic_int" class that just guarantees to has at least 4 byte (if you're supporting 32bit and 64bit systems) and an "atomic_pointer" if you need it. This way the name of the class also implies a semantic and a purpose.

If you just use "atomic" than one could think: "Wow, i just have to put my string class in this template and then it's thread safe!".


Edit: to Answer your update: "I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations."

To prevent compiler and cpu to reorder read/write operations you need the __sync_synchronize().

But note that acquire/release semantics may yield better performance than full barriers.


Edit2:

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // Not sure if this is overkill
    obj = val;
    return *this;
}

What do you want prevent from reordering? In most cases you want to write this:

    obj = val;
    __sync_synchronize();

instead. Because you want to be sure that the value is written, once you returned from the function.

0

精彩评论

暂无评论...
验证码 换一张
取 消