Controlling GCC optimization

Developer https://www.devze.com 2023-02-17 12:30 Source: Internet


Related topics: assembly c

I'm trying to test the cache properties of a machine I have access to. To do this I am trying to read memory and time it. I vary the working set size and the stride access pattern to get different measurements.

The code looks like so:

clock1 = get_ticks();
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    *array[j];
  }
}
clock2 = get_ticks();

Now the issue is that with a reasonable optimization level, gcc will optimize out the read because it has no side effect. I can't turn optimization off entirely, or else all the loop variables will live in memory and cause extra reads of their own. I've tried a few different things, like making array volatile and using inline functions that cast to volatile, but gcc's treatment of volatile variables is very hard to predict. What is the appropriate way to do this?

One possibility is to make use of the array data in a way that can't easily be optimised away, e.g.

clock1 = get_ticks();
sum = 0;
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    sum += array[j];
  }
}
clock2 = get_ticks();
return sum;

sum should be in a register, and the add operation should add nothing significant to the loop timing.

If the test function and caller are both in the same compilation unit you may also need to ensure that you actually do something with the returned sum value, e.g. output it via printf.
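As a rough, self-contained sketch of the two points above (accumulate into sum, then actually use the result): the element type data_t, the function name timed_reads, and the clock()-based get_ticks() below are assumptions made for illustration, not part of the original code. An unsigned element type is assumed so that the running sum wraps instead of overflowing.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef unsigned int data_t;   /* assumed element type */

/* Hypothetical stand-in for get_ticks(): processor time in clock() units. */
static uint64_t get_ticks(void)
{
    return (uint64_t)clock();
}

/*
 * Read every stride-th element of array[0..count) and repeat that reps
 * times. The reads feed the returned sum, so they have an observable
 * result. The elapsed tick count is stored through elapsed.
 */
data_t timed_reads(const data_t *array, size_t count, size_t stride,
                   int reps, uint64_t *elapsed)
{
    data_t sum = 0;
    uint64_t clock1 = get_ticks();
    for (int i = 0; i < reps; i++) {
        for (size_t j = 0; j < count; j += stride) {
            sum += array[j];
        }
    }
    uint64_t clock2 = get_ticks();
    *elapsed = clock2 - clock1;
    return sum;
}
```

A caller would then print the returned value, for example with printf("%u\n", timed_reads(buf, n, stride, 1000000, &ticks)), so that the sum is used even when everything is inlined. It is still worth checking the generated assembly, since the compiler is free to restructure loops whose result it can derive more cheaply.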

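With GCC, try specifying the used attribute on the index variables (i, j) so that the compiler does not optimize them away, even with a global optimization option enabled. Note that GCC documents used for variables with static storage, so i and j may need to be declared static or at file scope for the attribute to take effect: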

int i __attribute__((used));
int j __attribute__((used));

clock1 = get_ticks();
for (i = 0; i < 1000000; i++) {
  for (j = 0; j < (workingset * stride / sizeof(data_t)); j += stride) {
    *array[j];
    asm (""); // helps keep the loop body from being eliminated
  }
}
clock2 = get_ticks();

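It is also good to know that GCC treats an asm statement with no output operands as implicitly volatile, so it is not removed by the optimizer. You can even use it with an empty assembler string, like this: asm("");. An asm statement that does have output operands can still be dropped if its outputs are unused, unless it is marked volatile.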
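An empty asm on its own keeps the loop body in place, but it does not tell the compiler that the loaded value is needed. A commonly used variant of the same idea passes the value to the empty asm as an input operand. The sketch below assumes GCC or Clang extended asm; the names keep_value and strided_reads and the data_t type are made up for illustration.

```c
#include <stddef.h>

typedef unsigned int data_t;   /* assumed element type */

/*
 * Empty extended asm that takes value as a register input. The compiler
 * has to produce the value in a register, so the load that feeds it is
 * kept. The "memory" clobber also stops loads from being cached or moved
 * across this point. No instruction is emitted for the asm itself.
 */
static inline void keep_value(data_t value)
{
    __asm__ volatile("" : : "r"(value) : "memory");
}

/* Read every stride-th element of array[0..count); return the read count. */
size_t strided_reads(const data_t *array, size_t count, size_t stride)
{
    size_t reads = 0;
    for (size_t j = 0; j < count; j += stride) {
        keep_value(array[j]);
        reads++;
    }
    return reads;
}
```

Compared with summing, this adds no arithmetic on the loaded value, at the cost of relying on a compiler extension rather than standard C.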

I think you should really try to write it in assembler if you don't want the compiler to fuzz around with it. You just can't ensure any "tricks" would work forever. Something that works now might be optimized in a future version of the compiler. Also it's probably hard to predict if it worked. If you're able to check the assembler code to see if it worked (i.e. didn't optimize it), you should be able to write it from scratch as well?

Store the value to a volatile global variable at each iteration. This will ensure that actual writes happen (which are necessary to guarantee that the correct value will be seen in a signal handler, for instance).
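A minimal sketch of the volatile-store approach, assuming a file-scope variable named sink and an unsigned data_t (both hypothetical names chosen for this example):

```c
#include <stddef.h>

typedef unsigned int data_t;   /* assumed element type */

/* Every assignment to a volatile object must be performed as written. */
volatile data_t sink;

/* Read every stride-th element of array[0..count) and store it to sink. */
void strided_reads_to_sink(const data_t *array, size_t count, size_t stride)
{
    for (size_t j = 0; j < count; j += stride) {
        sink = array[j];   /* the store forces the load that feeds it */
    }
}
```

Each iteration now performs one extra store, always to the same address, so it should stay in cache, but it is still part of what gets timed.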

Alternatively, use something like