One particular hot spot that shows up when I profile the code I am working on is the following loop:
for(int loc = start; loc<end; ++loc)
    y[loc]+=a[offset+loc]*x[loc+d];
where the arrays y, a, and x have no overlap. It seems to me that a loop like this should be easy to vectorize. However, when I compile using g++ with the options "-O3 -ftree-vectorize -ftree-vectorizer-verbose=1", I get no indication that this particular loop was vectorized. In contrast, a loop occurring just before the code above:
for(int i=0; i<m; ++i)
    y[i]=0;
does get vectorized according to the output. Any thoughts on why the first loop is not vectorized, or how I might be able to fix this? (I am not all that educated on the concept of vectorization, so I am likely missing something quite obvious.)
As per Oli's suggestion, turning up the verbosity yields the following notes (while I am usually good at reading compiler warnings/errors/output, I have no idea what this means):
./include/mv_ops.h:89: note: dependence distance = 0.
./include/mv_ops.h:89: note: accesses have the same alignment.
./include/mv_ops.h:89: note: dependence distance modulo vf == 0 between *D.50620_89 and *D.50620_89
./include/mv_ops.h:89: note: not vectorized: can't determine dependence between *D.50623_98 and *D.50620_89
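You need to tell the compiler that x, y, and a do not overlap. In C terms that means telling the compiler that those pointers do not alias by declaring them with restrict. Note that restrict is a C99 keyword and not part of standard C++, so with g++ use the __restrict__ (or __restrict) extension instead. gcc is very aggressive about optimizations when it assumes no aliasing, so be careful: if the arrays do overlap after all, the behavior is undefined.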
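As a rough illustration of the suggestion above, here is a minimal sketch of the loop from the question wrapped in a function whose pointer parameters carry the __restrict__ qualifier. The function name scaled_add, the double element type, and the parameter list are assumptions made for the example; the question does not show the real declarations.

```cpp
#include <cassert>

// Hypothetical wrapper around the loop from the question.
// __restrict__ is a GCC/Clang extension (not standard C++). It is a promise
// from the caller that y, a, and x never refer to overlapping memory,
// which removes the dependence the vectorizer could not rule out.
void scaled_add(double* __restrict__ y,
                const double* __restrict__ a,
                const double* __restrict__ x,
                int start, int end, int offset, int d)
{
    assert(start <= end);  // simple precondition check, outside the hot loop
    for (int loc = start; loc < end; ++loc)
        y[loc] += a[offset + loc] * x[loc + d];
}
```

Whether the loop is then actually vectorized still depends on the compiler version and target, so it is worth checking the vectorizer output again after the change.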
One possibility is that the compiler can't guarantee that there are no aliases. In other words, how can the compiler be sure that y, a and x don't overlap in some way?
If you turn the verbosity level up, you may get some extra info.
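To see why the compiler has to be cautious, here is a small sketch in which the same buffer is deliberately passed as both y and x. The function names and the double element type are assumptions made for the example. With d = -1, each iteration reads the value written by the previous one, so the iterations cannot simply be executed side by side.

```cpp
#include <cassert>

// Hypothetical version of the loop with no restrict qualifiers:
// the compiler must assume y, a, and x may overlap.
void scaled_add_may_alias(double* y, const double* a, const double* x,
                          int start, int end, int offset, int d)
{
    assert(start <= end);  // simple precondition check
    for (int loc = start; loc < end; ++loc)
        y[loc] += a[offset + loc] * x[loc + d];
}

// Passes one buffer as both y and x with d = -1, so the loop body becomes
// buf[loc] += a[loc] * buf[loc - 1]: a loop-carried dependence.
double last_after_aliasing()
{
    double buf[4] = {1, 1, 1, 1};
    const double a[4] = {1, 1, 1, 1};
    scaled_add_may_alias(buf, a, buf, 1, 4, 0, -1);
    // In order: buf[1] = 2, buf[2] = 3, buf[3] = 4.
    // If every iteration read the original values, each would be 2 instead.
    return buf[3];
}
```

Because a call like this is legal when the pointers are unqualified, the compiler either has to prove the arrays are distinct or leave the loop scalar.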