Is function call a memory barrier?

Developer, https://www.devze.com, 2023-02-27 02:16. Source: Internet
Consider this C code:

extern volatile int hardware_reg;

void f(const void *src, size_t len)
{
    void *dst = <something>;

    hardware_reg = 1;    
    memcpy(dst, src, len);    
    hardware_reg = 0;
}

The memcpy() call must occur between the two assignments. In general, since the compiler probably doesn't know what the called function will do, it can't move the call before or after the assignments. However, in this case the compiler knows what the function does (and could even substitute an inline built-in), and it can deduce that memcpy() could never access hardware_reg. It appears to me that the compiler would see no obstacle to moving the memcpy() call if it wanted to.

So, the question: is a function call alone enough to act as a memory barrier that prevents reordering, or is an explicit memory barrier needed before and after the call to memcpy() in this case?

Please correct me if I am misunderstanding things.


The compiler cannot reorder the memcpy() operation before hardware_reg = 1 or after hardware_reg = 0. That is what volatile ensures, at least as far as the instruction stream the compiler emits is concerned. A function call is not necessarily a 'memory barrier', but it is a sequence point.

The C99 standard says this about volatile (5.1.2.3/5 "Program execution"):

At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.

So at the sequence point represented by the memcpy(), the volatile access that writes 1 has to have occurred, and the volatile access that writes 0 cannot yet have occurred.

However, there are 2 things I'd like to point out:

  1. Depending on what <something> is, if nothing else is done with the destination buffer, the compiler might be able to remove the memcpy() operation completely. This is the reason Microsoft came up with the SecureZeroMemory() function. SecureZeroMemory() operates on volatile-qualified pointers to prevent the writes from being optimized away.

  2. volatile doesn't necessarily imply a memory barrier (which is a hardware thing, not just a code ordering thing), so if you're running on a multi-proc machine or certain types of hardware you may need to explicitly invoke a memory barrier (perhaps wmb() on Linux).

    Starting with MSVC 8 (VS 2005), Microsoft documents that the volatile keyword implies the appropriate memory barrier, so a separate specific memory barrier call may not be necessary:

    • http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

    Also, when optimizing, the compiler must maintain ordering among references to volatile objects as well as references to other global objects. In particular,

    • A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

    • A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.
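
On toolchains that support C11, the fences in <stdatomic.h> are one portable way to request a barrier. The following is a minimal sketch, not a verified driver pattern: the function name f_with_fences and the local definition of hardware_reg (standing in for the question's extern register) are assumptions made so the example compiles on its own. The standard specifies these fences in terms of atomic operations, so how they interact with volatile or plain accesses, and whether they are sufficient for a real device, depends on the implementation and the hardware.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the real device register; the question declares it extern. */
volatile int hardware_reg;

/* Hypothetical variant of f() that takes dst as a parameter. */
void f_with_fences(void *dst, const void *src, size_t len)
{
    hardware_reg = 1;
    /* Full fence: typically emits a hardware barrier instruction and also
     * stops the compiler from moving memory accesses across this point. */
    atomic_thread_fence(memory_order_seq_cst);
    memcpy(dst, src, len);
    atomic_thread_fence(memory_order_seq_cst);
    hardware_reg = 0;
}
```

Where only compiler reordering is a concern, atomic_signal_fence() takes the same argument and usually emits no instruction at all. Kernel or driver code would normally use the platform's own primitives instead, such as the wmb() mentioned above.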


As far as I can see your reasoning leading to

the compiler would see no trouble in moving the memcpy call

is correct. Your question is not answered by the language definition, and can only be addressed with reference to specific compilers.

Sorry to not have any more-useful information.
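
As one compiler-specific illustration, GCC and Clang accept an empty inline-assembly statement with a "memory" clobber, which is commonly used as a pure compiler barrier. This is a sketch under assumptions: the macro name COMPILER_BARRIER, the function name f_with_barriers, and the local definition of hardware_reg are hypothetical, and the idiom is an extension rather than standard C. It constrains only the compiler's ordering; it emits no CPU barrier instruction.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the real device register; the question declares it extern. */
volatile int hardware_reg;

/* GCC/Clang extension: generates no instruction, but the compiler must
 * assume the statement may read or write any memory, so it does not move
 * memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical variant of f() that takes dst as a parameter. */
void f_with_barriers(void *dst, const void *src, size_t len)
{
    hardware_reg = 1;
    COMPILER_BARRIER();
    memcpy(dst, src, len);
    COMPILER_BARRIER();
    hardware_reg = 0;
}
```

Note that the barrier orders the copy relative to the register writes but does not by itself guarantee that a copy into a buffer nobody else can observe (for example, a local array whose address never escapes) is kept.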


My assumption would be that the compiler never re-orders volatile assignments since it has to assume they must be executed at exactly the position where they occur in the code.


It's probably going to get optimized, either because the compiler inlines the memcpy call and eliminates the first assignment, or because it gets compiled to RISC code or machine code and gets optimized there.


Here is a slightly modified example, compiled with gcc 7.2.1 on x86-64:

#include <string.h>
static int temp;
extern volatile int hardware_reg;
int foo (int x)
{
    hardware_reg = 0;
    memcpy(&temp, &x, sizeof(int));
    hardware_reg = 1;
    return temp;
}

gcc knows that the memcpy() is the same as an assignment, and knows that temp is not accessed anywhere else, so temp and the memcpy() disappear completely from the generated code:

foo:
    movl    $0, hardware_reg(%rip)
    movl    %edi, %eax
    movl    $1, hardware_reg(%rip)
    ret
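
If the copy itself has to survive optimization, one approach in the spirit of SecureZeroMemory() is to perform it through volatile-qualified pointers. The sketch below is an assumption-laden illustration, not what gcc or any library does internally: volatile_copy is a hypothetical helper, hardware_reg is defined locally as a stand-in, and the resulting machine code was not inspected. Compilers in practice treat every access through a volatile lvalue as a side effect that must be performed, which is what this relies on.

```c
#include <stddef.h>

/* Stand-in for the real device register; the original declares it extern. */
volatile int hardware_reg;

static int temp;

/* Hypothetical helper: copies one byte at a time through volatile-qualified
 * pointers, so each load and store counts as a volatile access. */
void volatile_copy(volatile void *dst, const volatile void *src, size_t len)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;
    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
}

int foo(int x)
{
    hardware_reg = 0;
    volatile_copy(&temp, &x, sizeof x);  /* volatile accesses, ordered with the register writes */
    hardware_reg = 1;
    return temp;
}
```

The byte-wise loop is considerably slower than memcpy(), so this trade-off only makes sense where the stores really must happen.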