I'm porting a kernel extension to 32/64-bit AIX on multiprocessor PowerPC, written in C. I don't need more than atomic read and atomic write operations (I have no use for fetch-and-add, compare-and-swap, etc.). Just to clarify: to me, "atomicity" means not only "no interleaving" but also "visibility across multiple cores". The operations operate on pointers, so operations on 'int' variables are useless to me.
If I declare the variable "volatile", the C standard says it can be modified in ways unknown to the compiler, so accesses to it are not subject to optimization.
From what I read, it seems that regular reads and writes are supposed to be non-interleaved, and the Linux kernel sources seem to agree. The PowerPC code there reads:
__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
stw is "store word", which is supposedly atomic, but I don't know what "%U0%X0" means. I also do not understand how this assembly instruction imposes visibility.
When I compile my kernel extension, 'std' ("store doubleword") is used for the assignment I want, and from what I read it should also be atomic on a 64-bit machine. I have very little understanding of the specifics of PowerPC and its instruction set; however, I did not find any memory barrier instructions ("sync" or "eieio") in the assembly listing of the compiled file.
The kernel provides the fetch_and_addlp() service, which can be used to implement an atomic read (for example, v = fetch_and_addlp(&x, 0)).
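The fetch-and-add-zero read idiom can be sketched in portable user-space C. This is a minimal sketch, assuming GCC or Clang: fetch_and_addlp() is an AIX kernel service, so the hypothetical helper fetch_and_add_word() stands in for it using the compiler's __atomic_fetch_add builtin, and atomic_read_ptr()/atomic_write_ptr() are illustrative names, not AIX services.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AIX fetch_and_addlp() kernel service:
 * GCC/Clang's __atomic_fetch_add on a pointer-sized integer. Returns
 * the value the location held before the addition. */
static uintptr_t fetch_and_add_word(volatile uintptr_t *addr, uintptr_t delta)
{
    return __atomic_fetch_add(addr, delta, __ATOMIC_SEQ_CST);
}

/* Hypothetical atomic pointer read: add 0 and keep the old value.
 * The location is left unchanged, but it is read by an indivisible
 * read-modify-write. */
static void *atomic_read_ptr(volatile uintptr_t *slot)
{
    return (void *)fetch_and_add_word(slot, 0);
}

/* Hypothetical matching atomic pointer write. */
static void atomic_write_ptr(volatile uintptr_t *slot, void *p)
{
    __atomic_store_n(slot, (uintptr_t)p, __ATOMIC_SEQ_CST);
}

/* Usage example. */
static void example_read_write(void)
{
    static int x = 42;
    static volatile uintptr_t slot;   /* zero-initialized, reads as NULL */

    assert(atomic_read_ptr(&slot) == NULL);
    atomic_write_ptr(&slot, &x);
    assert(atomic_read_ptr(&slot) == (void *)&x);
    assert(*(int *)atomic_read_ptr(&slot) == 42);
}
```

The cost of this idiom is that every read becomes a read-modify-write (a larx/stcx. loop on PowerPC), which is heavier than a plain load.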
So my questions are:
1. Is it enough to declare the variable 'volatile' to achieve read and write atomicity in the sense of visibility and no interleaving?
2. If the answer to 1 is "no", how is such atomicity achieved?
3. What is the meaning of "%U0%X0" in the Linux PowerPC atomic implementation?
There are idiosyncrasies in the GCC inline assembly syntax.
In the line
__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
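the "=m"(v->counter) part is the output operand (m: a memory location) and the "r"(i) part is the input operand (r: a general-purpose register). The %0 and %1 refer to the operands by position (0 -> "=m", 1 -> "r"), so "stw %1,%0" stores the register holding i into v->counter.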
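The stw assembly instruction takes 2 arguments, and %U0%X0 are not constraints but operand modifiers applied to operand 0, the memory operand. When GCC prints that operand, %U0 emits "u" if it chose an update-form address and %X0 emits "x" if it chose an indexed (register + register) address, so the template expands to stw, stwu, stwx or stwux as needed. This lets GCC pick whatever addressing mode suits v->counter without you hard-coding the instruction form. As it turns out, these modifiers are PowerPC-specific (I'm used to the x86-64 constraint set :). The full list of machine constraints, including the PowerPC ones, can be found in: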
http://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
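To illustrate the operand numbering, here is a sketch modeled on the Linux line quoted in the question. The type my_atomic_t and the functions my_atomic_set()/my_atomic_read() are hypothetical names; the inline assembly is only compiled with GCC on PowerPC, and other hosts use a GCC/Clang __atomic_store_n fallback. Like the original, the PowerPC store issues no barrier instruction: it only ensures the 32-bit value is written by a single stw-family instruction.

```c
#include <assert.h>

/* Hypothetical type modeled on the Linux kernel's atomic_t. */
typedef struct { int counter; } my_atomic_t;

static inline void my_atomic_set(my_atomic_t *v, int i)
{
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
    /* Operand 0 is the output "=m"(v->counter): a memory location.
     * Operand 1 is the input "r"(i): a general-purpose register.
     * So "stw %1,%0" stores the register holding i into v->counter.
     * %U0 and %X0 emit "u" and/or "x" when GCC chose an update or
     * indexed address for operand 0. No barrier is emitted. */
    __asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
#else
    /* Portable fallback so the sketch also builds on non-PowerPC hosts. */
    __atomic_store_n(&v->counter, i, __ATOMIC_RELAXED);
#endif
}

static inline int my_atomic_read(const my_atomic_t *v)
{
    return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

/* Usage example. */
static void example_atomic_set(void)
{
    my_atomic_t a = { 0 };
    my_atomic_set(&a, 7);
    assert(my_atomic_read(&a) == 7);
}
```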
I have managed to answer questions 1 and 2, but not 3:
- No, it's not enough.
- Memory barriers are still required. I used the XL C built-in __lwsync(). This should prevent the processor from reordering memory accesses across the barrier, so other processors observe the changes in the intended order.
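A sketch of the publish/consume pattern that needs such barriers, assuming GCC or Clang in user space rather than an AIX kernel extension. The macro LW_BARRIER(), struct payload, publish() and consume() are hypothetical names. On PowerPC the macro emits lwsync, the instruction the XL C __lwsync() builtin generates; on other targets it falls back to __atomic_thread_fence(). Note that the reader needs a barrier too, between loading the pointer and dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical barrier macro. With GCC on PowerPC it emits lwsync; the
 * "memory" clobber also stops the compiler from moving memory accesses
 * across it. Elsewhere it falls back to a GCC/Clang fence. */
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
#define LW_BARRIER() __asm__ __volatile__("lwsync" : : : "memory")
#else
#define LW_BARRIER() __atomic_thread_fence(__ATOMIC_ACQ_REL)
#endif

struct payload { int value; };

/* Pointer shared between CPUs; accessed only through __atomic builtins
 * so each load or store is a single, untorn access. */
static struct payload *shared;

/* Writer: initialize the object, barrier, then publish the pointer.
 * The barrier keeps the store to p->value ordered before the
 * pointer store. */
static void publish(struct payload *p, int value)
{
    p->value = value;
    LW_BARRIER();
    __atomic_store_n(&shared, p, __ATOMIC_RELAXED);
}

/* Reader: load the pointer, then barrier, so later reads through it
 * cannot be satisfied before the pointer load. */
static struct payload *consume(void)
{
    struct payload *p = __atomic_load_n(&shared, __ATOMIC_RELAXED);
    LW_BARRIER();
    return p;
}

/* Usage example (single-threaded, just to show the call sequence). */
static void example_publish(void)
{
    static struct payload obj;

    assert(consume() == NULL);
    publish(&obj, 42);
    assert(consume() == &obj);
    assert(consume()->value == 42);
}
```

lwsync orders load/load, load/store and store/store pairs but not a store followed by a later load; code that depends on that last ordering needs the full sync instruction (XL C's __sync() builtin).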