I am trying out several compiler switches on a program that performs Sobel kernel convolution on two images (2000H x 3000W and 6800H x 8500W). There are some observations I am not able to interpret. Below are the compiler flags and the time taken in seconds (please focus on the last column, which is the convolution along the Y axis for the larger image):
-O2 -march=barcelona                    0.1483326  0.833264   1.6018882  28.6711242
-O2 -ftree-vectorize                    0.1462104  0.847973   1.506708   26.628592
-O2                                     0.1468406  0.8368156  1.5999718  20.61377564
-O2 -ftree-vectorize -march=barcelona   0.1441898  0.827366   1.4687354  15.2572644
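To make the comparison concrete, here is a small C helper that normalises the last-column timings to plain -O2. It is written only for illustration (the names runs, relative_to_o2 and print_relative_times are hypothetical); the numbers are copied from the table in the question.

```c
#include <stdio.h>

/* Last-column timings in seconds (Y-axis convolution, larger image),
   copied from the table in the question. */
struct run {
    const char *flags;
    double seconds;
};

static const struct run runs[] = {
    { "-O2 -march=barcelona",                  28.6711242  },
    { "-O2 -ftree-vectorize",                  26.628592   },
    { "-O2",                                   20.61377564 },
    { "-O2 -ftree-vectorize -march=barcelona", 15.2572644  },
};

#define N_RUNS (sizeof runs / sizeof runs[0])
#define BASELINE_SECONDS 20.61377564 /* plain -O2 */

/* Time relative to plain -O2: above 1.0 is slower, below 1.0 is faster. */
static double relative_to_o2(double seconds)
{
    return seconds / BASELINE_SECONDS;
}

/* Print each flag set together with its time relative to -O2. */
static void print_relative_times(void)
{
    for (size_t i = 0; i < N_RUNS; ++i)
        printf("%-40s %5.2fx\n", runs[i].flags, relative_to_o2(runs[i].seconds));
}
```

This works out to roughly 1.39x for -march=barcelona alone and 1.29x for -ftree-vectorize alone, against about 0.74x when the two are combined.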
I expected -O2 -march=barcelona to be moderately faster, since the machine I am running on is an AMD Barcelona. Any ideas why plain -O2 is faster than -O2 -march=barcelona?
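As for -ftree-vectorize, I expected it to let the compiler execute instructions in parallel (SIMD), since my loop has no loop-carried dependences. Yet each flag on its own is noticeably slower than plain -O2 (26.6 s with -ftree-vectorize, 28.7 s with -march=barcelona, versus 20.6 s), while the combination -O2 -ftree-vectorize -march=barcelona is the fastest of the lot (15.3 s).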
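For reference, a Y-direction Sobel pass of the kind described might look like the sketch below. It is a hypothetical illustration, not the actual program from the question: it assumes a row-major, single-channel float image and leaves border pixels untouched. Each output pixel reads only input pixels, so the inner loop carries no dependence. The restrict qualifiers are also an assumption (the original code may not use them); they promise GCC that the input and output buffers do not alias, which can make it easier for the tree vectorizer to emit SIMD code.

```c
#include <stddef.h>

/* Hypothetical sketch: vertical (Y-direction) Sobel gradient on a
   row-major float image of size h x w. Border pixels of `out` are not
   written. `restrict` asserts that `in` and `out` do not overlap. */
void sobel_y(const float *restrict in, float *restrict out,
             size_t h, size_t w)
{
    for (size_t y = 1; y + 1 < h; ++y) {
        const float *up   = in + (y - 1) * w;  /* row above */
        const float *down = in + (y + 1) * w;  /* row below */
        float *dst        = out + y * w;       /* output row */

        /* Kernel Gy = [-1 -2 -1; 0 0 0; 1 2 1]. No iteration depends on
           another, so this loop is a candidate for auto-vectorization. */
        for (size_t x = 1; x + 1 < w; ++x) {
            dst[x] = (down[x - 1] + 2.0f * down[x] + down[x + 1])
                   - (up[x - 1]   + 2.0f * up[x]   + up[x + 1]);
        }
    }
}
```

On a vertical ramp where every pixel equals its row index, each interior output is 4(y+1) - 4(y-1) = 8.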
It would be great if I could understand this behavior.
Regards,
Sayan