How would I unroll the following nested loops?
for(k = begin; k != end; ++k) {
    for(j = 0; j < Emax; ++j) {
        for(i = 0; i < N; ++i) {
            if (j >= E[i]) continue;
            array[k] += foo(i, tr[k][i], ex[j][i]);
        }
    }
}
I tried the following, but my output isn't the same, and it should be:
for(k = begin; k != end; ++k) {
    for(j = 0; j < Emax; ++j) {
        for(i = 0; i+4 < N; i+=4) {
            if (j >= E[i]) continue;
            array[k] += foo(i, tr[k][i], ex[j][i]);
            array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
            array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
            array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);
        }
        if (i < N) {
            for (; i < N; ++i) {
                if (j >= E[i]) continue;
                array[k] += foo(i, tr[k][i], ex[j][i]);
            }
        }
    }
}
I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this is finished running, another function prints out what is in array[] and right now, with my unrolling, the output isn't identical. Any help is appreciated.
Update: I fixed it. I used the answer for this question to do the unrolling... the output wasn't matching because I wasn't doing array[k] = 0; after the first for loop.
Thanks, Hristo
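To illustrate the fix described in the update, here is a minimal sketch in C of the loop nest with array[k] reset before the j loop. The body of foo, the element type double, the fixed sizes and the helper rerun_is_stable are assumptions made only so the sketch compiles; none of them come from the original code.

```c
enum { EMAX = 3, N = 6 };  /* hypothetical sizes */

/* Hypothetical stand-in for foo(); the real function is not shown in the question. */
static double foo(int i, double t, double e) { return i + t * e; }

/* Same loop nest as in the question, with array[k] reset before accumulating. */
static void accumulate(double *array, int begin, int end, const int *E,
                       double (*tr)[N], double (*ex)[N])
{
    for (int k = begin; k != end; ++k) {
        array[k] = 0.0;  /* without this, values left over from an earlier run are added in */
        for (int j = 0; j < EMAX; ++j) {
            for (int i = 0; i < N; ++i) {
                if (j >= E[i]) continue;
                array[k] += foo(i, tr[k][i], ex[j][i]);
            }
        }
    }
}

/* Runs the nest twice into the same buffer; returns 1 if both runs agree. */
int rerun_is_stable(void)
{
    int E[N] = {1, 0, 3, 2, 1, 3};
    double tr[2][N], ex[EMAX][N];
    double array[2] = {123.0, -7.0};  /* stale garbage on purpose */
    double first[2];

    for (int i = 0; i < N; ++i) {
        tr[0][i] = i;
        tr[1][i] = 2 * i;
        for (int j = 0; j < EMAX; ++j) ex[j][i] = j + 1;
    }
    accumulate(array, 0, 2, E, tr, ex);
    first[0] = array[0];
    first[1] = array[1];
    accumulate(array, 0, 2, E, tr, ex);
    return first[0] == array[0] && first[1] == array[1];
}
```

If the reset line is removed, the second call adds on top of the first result and rerun_is_stable() would be expected to return 0 instead of 1.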
if (j >= E[i]) continue;
array[k] += foo(i, tr[k][i], ex[j][i]);
array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]);
array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]);
array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]);
versus
if (j >= E[i]) continue;
array[k] += foo(i, tr[k][i], ex[j][i]);
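The screening conditions are not identical: the unrolled version tests only E[i] and then processes four elements, while the original tests E[i] for every element.
A better approach to screening, which eliminates the branch:
array[k] += (j < E[i])*foo(i, tr[k][i], ex[j][i]);
Also, you need to guarantee that N is divisible by 4, otherwise you may overshoot. Alternatively, truncate the bound of the unrolled loop to a multiple of four (N - N%4) and process the remaining elements one at a time.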
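As a rough sketch of how the two suggestions above could fit together, the C code below unrolls by four up to N - N%4, uses the 0/1 multiplier for screening, and compares the result with the original loop nest. The body of foo, the type double, the sizes, the input values and the helper names are all assumptions for illustration only.

```c
enum { K = 3, EMAX = 4, N = 10 };  /* hypothetical sizes; N is deliberately not a multiple of 4 */

static int    E[N];
static double tr[K][N], ex[EMAX][N];

/* Hypothetical stand-in for foo(); assumed to have no side effects. */
static double foo(int i, double t, double e) { return (i + 1) * t + e; }

/* Fills the inputs with small made-up integer values. */
static void fill_inputs(void)
{
    for (int i = 0; i < N; ++i) {
        E[i] = i % (EMAX + 1);
        for (int k = 0; k < K; ++k) tr[k][i] = k + 2 * i;
        for (int j = 0; j < EMAX; ++j) ex[j][i] = 3 * j + i;
    }
}

/* Reference: the loop nest from the question. */
static void reference(double *out)
{
    for (int k = 0; k < K; ++k) {
        out[k] = 0.0;
        for (int j = 0; j < EMAX; ++j)
            for (int i = 0; i < N; ++i) {
                if (j >= E[i]) continue;
                out[k] += foo(i, tr[k][i], ex[j][i]);
            }
    }
}

/* Unrolled by four, with the screening folded into a 0/1 multiplier. */
static void unrolled_branchless(double *out)
{
    const int n4 = N - N % 4;  /* largest multiple of 4 not exceeding N */
    for (int k = 0; k < K; ++k) {
        double acc = 0.0;
        for (int j = 0; j < EMAX; ++j) {
            int i;
            for (i = 0; i < n4; i += 4) {
                acc += (j < E[i])     * foo(i,     tr[k][i],     ex[j][i]);
                acc += (j < E[i + 1]) * foo(i + 1, tr[k][i + 1], ex[j][i + 1]);
                acc += (j < E[i + 2]) * foo(i + 2, tr[k][i + 2], ex[j][i + 2]);
                acc += (j < E[i + 3]) * foo(i + 3, tr[k][i + 3], ex[j][i + 3]);
            }
            for (; i < N; ++i)  /* remaining N % 4 elements */
                acc += (j < E[i]) * foo(i, tr[k][i], ex[j][i]);
        }
        out[k] = acc;
    }
}

/* Returns 1 if both versions produce the same sums for the made-up inputs. */
int results_match(void)
{
    double a[K], b[K];
    fill_inputs();
    reference(a);
    unrolled_branchless(b);
    for (int k = 0; k < K; ++k)
        if (a[k] != b[k]) return 0;
    return 1;
}
```

Note that the multiplier form calls foo even for screened-out elements, so it only matches the original if foo has no side effects and returns a finite value. Whether it is actually faster than the branch depends on foo and on the compiler, so it is worth measuring.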
I think that the if (j >= E[i]) continue; is your problem. In the original, this test is run for every index i. In your unrolled version, it is only tested for every fourth index. Try the following:
for (i = 0; i + 4 <= N; /* advanced in loop */) {
    // Test every index; i always advances, so a skipped element cannot stall the loop.
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
    if (j < E[i]) array[k] += foo(i, tr[k][i], ex[j][i]);
    ++i;
}
// Remaining elements (fewer than four):
while (i < N) {
if (j >= E[i]) {
++i; // missing in original version
continue;
}
array[k] += foo(i, tr[k][i], ex[j][i]);
++i;
}
Edit: I forgot to increment an index in the original version, which caused an infinite loop when j >= E[i].
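To make the difference concrete, here is a small C sketch for a single j and a single row that compares the original loop, an unrolled loop that tests only the first index of each group (as in the question), and an unrolled loop that tests every index. The body of foo, the type double, the sample data and the function names are made up for illustration.

```c
enum { N = 8 };  /* hypothetical size */

/* Hypothetical stand-in for foo(); the real function is not shown in the question. */
static double foo(int i, double t, double e) { return (i + 1) * t * e; }

/* Original loop: E[i] is tested for every i. */
double sum_reference(int j, const int *E, const double *tr, const double *ex)
{
    double s = 0.0;
    for (int i = 0; i < N; ++i) {
        if (j >= E[i]) continue;
        s += foo(i, tr[i], ex[i]);
    }
    return s;
}

/* Unrolled as in the question: only the first index of each group is tested. */
double sum_group_test(int j, const int *E, const double *tr, const double *ex)
{
    double s = 0.0;
    int i;
    for (i = 0; i + 4 < N; i += 4) {
        if (j >= E[i]) continue;  /* skips or keeps all four elements at once */
        s += foo(i,     tr[i],     ex[i]);
        s += foo(i + 1, tr[i + 1], ex[i + 1]);
        s += foo(i + 2, tr[i + 2], ex[i + 2]);
        s += foo(i + 3, tr[i + 3], ex[i + 3]);
    }
    for (; i < N; ++i) {
        if (j >= E[i]) continue;
        s += foo(i, tr[i], ex[i]);
    }
    return s;
}

/* Unrolled with one test per element. */
double sum_per_element(int j, const int *E, const double *tr, const double *ex)
{
    double s = 0.0;
    int i;
    for (i = 0; i + 4 <= N; i += 4) {
        if (j < E[i])     s += foo(i,     tr[i],     ex[i]);
        if (j < E[i + 1]) s += foo(i + 1, tr[i + 1], ex[i + 1]);
        if (j < E[i + 2]) s += foo(i + 2, tr[i + 2], ex[i + 2]);
        if (j < E[i + 3]) s += foo(i + 3, tr[i + 3], ex[i + 3]);
    }
    for (; i < N; ++i)
        if (j < E[i]) s += foo(i, tr[i], ex[i]);
    return s;
}

/* Runs one of the three variants on made-up sample data with j = 1. */
double demo(int which)
{
    static const int    E[N]  = {0, 5, 5, 5, 5, 0, 0, 0};
    static const double tr[N] = {1, 1, 1, 1, 1, 1, 1, 1};
    static const double ex[N] = {1, 1, 1, 1, 1, 1, 1, 1};
    const int j = 1;
    if (which == 0) return sum_reference(j, E, tr, ex);
    if (which == 1) return sum_group_test(j, E, tr, ex);
    return sum_per_element(j, E, tr, ex);
}
```

With this sample data, demo(0) and demo(2) both return 14, while demo(1) returns 5: the group test sees E[0] = 0, skips indices 0 through 3 together, and so drops the contributions of indices 1, 2 and 3.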