Let arr be an array of dimension 16 x 20
Here is the valgrind output for the following code snippet. The output is from cachegrind.

    for (i = 0; i < 20; i++)
        arr[0][i] = 0;

    Ir  I1mr  I2mr  Dr  D1mr  D2mr  Dw  D1mw  D2mw
    64     0     0  41     0     0   1     0     0   for (i = 0; i < 20; i++)
    60     0     0  20     0     0  20     2     2       arr[0][i] = 0;
I have read what these individual parameters mean in the valgrind documentation, but I am not able to reconcile them with the figures above. For the for loop, do we really have 41 data cache reads? And for the array arr, how can we have 2 L2 write misses?
My configuration is L1d = L1I = 32KB, L2 = 2MB, 64 byte cache line size, and 8-way set associative.
As Erik Olson says, the 41 reads in the for line are all for i: 21 in the i < 20 test, and 20 in the i++ (if you compile with optimisation, these should reduce).
There are two L2 write misses because your 20 integers cover 80 bytes, which is (at best) two cache lines. Depending on the alignment of the array, it might cover 3 cache lines, which would cause three write misses.
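The two-versus-three cache line arithmetic can be checked with a small helper. This is only a sketch: cache_lines_touched is a hypothetical name, and it assumes 4-byte ints and the 64-byte line size from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of distinct cache lines covered by `size`
   bytes starting at byte address `start`, with `line_size`-byte lines. */
size_t cache_lines_touched(uintptr_t start, size_t size, size_t line_size)
{
    assert(line_size > 0);
    if (size == 0)
        return 0;
    uintptr_t first = start / line_size;              /* line of first byte */
    uintptr_t last = (start + size - 1) / line_size;  /* line of last byte  */
    return (size_t)(last - first + 1);
}
```

With 20 ints of 4 bytes each (80 bytes), a row that starts on a line boundary gives cache_lines_touched(0, 80, 64) == 2, while a row that starts 60 bytes into a line gives cache_lines_touched(60, 80, 64) == 3.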
Most of your data reads come from the loop variable i:
- 21 reads from the conditional i < 20
- 20 reads from i++
- 20 reads from i in the lvalue arr[0][i]
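The tally above can be reproduced with a small model of the loop. This is a sketch, not what cachegrind does internally: it assumes an unoptimised build in which every use of i is a load from memory, and read_tally and tally_loop_reads are hypothetical names.

```c
#include <assert.h>

/* Hypothetical tally of the loads of i, split by where they occur. */
struct read_tally {
    int test_reads;       /* loads of i in the i < n test       */
    int increment_reads;  /* loads of i in i++                  */
    int index_reads;      /* loads of i when forming arr[0][i]  */
};

struct read_tally tally_loop_reads(int n)
{
    assert(n >= 0);
    struct read_tally t = {0, 0, 0};
    int i = 0;
    for (;;) {
        t.test_reads++;        /* the test runs once more than the body */
        if (!(i < n))
            break;
        t.index_reads++;       /* arr[0][i] = 0 reads i for the address */
        t.increment_reads++;   /* i++ reads i before storing i + 1      */
        i++;
    }
    return t;
}
```

For n = 20 the test and increment reads sum to 21 + 20 = 41, matching the Dr figure on the for line, and the 20 index reads match the Dr figure on the arr[0][i] line.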
I'm not up to date on how caches work, but assuming a 32-bit int array, your 20 writes cover 80 bytes, which is two or three 64-byte cache lines. My guess: the two write misses are the first writes to each of those lines.
If you unroll the loop, the reads of i disappear, because there is no loop variable left to load:
arr[0][0]=0;
arr[0][1]=0;
..
I think the figures quoted above may be skewed: they were taken from inside a larger program, so other variables affected the counts as well.