Hi!
I need to optimize some matrix multiplication code in C, and I'm doing it with SSE vector instructions. I also found that SSE4.1 already has an instruction for the dot product, dpps.
The problem is that the machine this software is supposed to run on has an old version of gcc installed (4.1.2), which has no support for SSE4.1, although its processor does support it (don't ask me why the gcc version is older than the processor...). So I cannot use the _mm_dp_ps function.
I have been playing around a bit with adding some assembler code to the C source. The problem is that I have never used assembler before, so it's really confusing. Also, is it more efficient to write all of the code that deals with vector instructions in assembler?
So I am asking whether there are any other ways to use the dpps instruction, and whether it is even worth using.
Frankly, I do not see the problem. From your description, it seems that the machine on which the final code needs to be executed supports SSE4.1 and DPPS. Therefore, once your source code, including the intrinsic (or assembly), is compiled, it can be executed on this machine. You would only have to get your code compiled with a newer version of the compiler, either by installing a newer version on that machine or by compiling on a different machine and then copying the executable to the machine it has to run on.
As to whether optimisation with DPPS is worth the effort, that will depend on your code (i.e., how much potential for optimisation there is; you should profile thoroughly to find out where your bottlenecks are) and on how important performance actually is in this specific case (i.e., is it worth your time? Time is money).
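One way to judge whether DPPS pays off is to time it against a baseline that needs only the original SSE intrinsics from xmmintrin.h, which should be available even with an old gcc. Below is a minimal sketch of such a baseline, not a tuned implementation. The name dot_sse is made up for illustration, the inputs are assumed to be possibly unaligned, and a plain scalar loop is used when SSE is not enabled at compile time:

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE1 intrinsics only, no SSE4.1 needed */
#endif

/* dot_sse is a hypothetical helper name, not part of any library. */
float dot_sse(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
#if defined(__SSE__)
    {
        __m128 acc = _mm_setzero_ps();
        __m128 hi;
        /* Multiply and accumulate four floats per iteration. */
        for (; i + 4 <= n; i += 4) {
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                             _mm_loadu_ps(b + i)));
        }
        /* Horizontal sum: add lanes 2,3 onto lanes 0,1,
           then add lane 1 onto lane 0. */
        hi  = _mm_movehl_ps(acc, acc);
        acc = _mm_add_ps(acc, hi);
        hi  = _mm_shuffle_ps(acc, acc, 1);
        acc = _mm_add_ss(acc, hi);
        _mm_store_ss(&sum, acc);
    }
#endif
    /* Scalar tail (or the whole loop when SSE is not enabled). */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

For long rows the multiply-and-add loop does one horizontal sum per row rather than one per four elements, so it is worth measuring before assuming DPPS will be faster for your matrix sizes.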
Obviously, if you have little assembly experience, implementing your routine in asm, or maybe even just writing your own asm wrapper function around DPPS, becomes less attractive. (But it is certainly possible to do.)
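For what it's worth, such a wrapper can be quite small with GCC-style extended inline asm. The following is only a sketch under several assumptions: the name dot4 is made up, the assembler that gcc invokes must itself recognise the dpps mnemonic (the assembler usually comes from binutils and may be just as old as the compiler), and the CPU must support SSE4.1 at run time. The immediate 0xF1 asks DPPS to multiply all four lanes and write the sum to the lowest lane only. When SSE is not enabled at compile time, the sketch falls back to plain C:

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* only for __m128 and the load/store helpers */
#endif

/* dot4 is a hypothetical name: dot product of two 4-float vectors. */
float dot4(const float *a, const float *b)
{
#if defined(__SSE__)
    float result;
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* AT&T operand order: immediate, source, destination.
       "+x" = read/write XMM register, "x" = input XMM register.
       Assumes the assembler knows DPPS and the CPU has SSE4.1. */
    __asm__ ("dpps $0xF1, %1, %0" : "+x" (va) : "x" (vb));
    _mm_store_ss(&result, va);   /* the sum is in the lowest lane */
    return result;
#else
    /* Fallback when SSE is not available at compile time. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Because the compiler still picks the registers, this avoids writing the whole routine in assembler. Only the one instruction that the old compiler lacks an intrinsic for is written by hand.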