sse
Mode for _mm_cmpistrm SSE4.2 intrinsic
I'm trying to figure out how to set the "mode" flag for the _mm_cmpistrm SSE4.2 intrinsic. I have a regular C string (char*) that I am loading into an __m128i type with _mm_lddqu_si128. I was going... [Details]
2023-03-12 18:38 Category: Q&A
Why doesn't the Windows x64 calling convention use XMM registers to pass more than 4 integer args?
The (Microsoft) x64 calling convention states: The arguments are passed in registers RCX, RDX, R8, and R9. If the arguments are float/double, they are passed in XMM0L, XMM1L, XMM2L, and XMM3L. [Details]
2023-03-11 22:35 Category: Q&A
if/else statement in SSE intrinsics
I am trying to optimize a small piece of code with SSE intrinsics (I am a complete beginner on the topic), but I am a little stuck on the use of conditionals. [Details]
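One common way to express a per-lane if/else with SSE is to build a comparison mask and blend the two candidate results with bitwise operations. The sketch below is only an illustration of that general pattern, not the asker's code: the helper name `select_gt_ps` and the demo values are hypothetical, and it assumes an x86 target with plain SSE available.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: _mm_cmpgt_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps */

/* Hypothetical helper: per lane, result = (a > b) ? x : y, without branching. */
__m128 select_gt_ps(__m128 a, __m128 b, __m128 x, __m128 y)
{
    __m128 mask = _mm_cmpgt_ps(a, b);            /* all ones where a > b, else zero */
    return _mm_or_ps(_mm_and_ps(mask, x),        /* keep x where the mask is set */
                     _mm_andnot_ps(mask, y));    /* keep y where the mask is clear */
}

/* Usage sketch: lanes 1, 2, 3, 4 compared against 2.5. */
void select_gt_demo(void)
{
    float r[4];
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* lane 0 holds 1.0f */
    __m128 res = select_gt_ps(a, _mm_set1_ps(2.5f),
                              _mm_set1_ps(10.0f), _mm_set1_ps(20.0f));
    _mm_storeu_ps(r, res);
    assert(r[0] == 20.0f && r[1] == 20.0f);  /* 1 and 2 are not > 2.5 */
    assert(r[2] == 10.0f && r[3] == 10.0f);  /* 3 and 4 are > 2.5 */
}
```

Both branches are computed for every lane, so this pattern fits best when each side is cheap and has no side effects.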
2023-03-11 21:12 Category: Q&A
Efficient way to convert scatter indices into gather indices?
I'm trying to write a stream compaction (take an array and get rid of empty elements) with SIMD intrinsics. Each iteration of the loop processes 8 elements at a time (SIMD width). [Details]
2023-03-11 12:52 Category: Q&A
How would you write code for unsigned addition likely to be optimized into one SSE instruction?
In C or C++, how would you write code for unsigned addition of two arrays that is likely to be optimized, by GCC say, into one 128-bit SSE unsigned addition instruction? // N number of ints to be ad... [Details]
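As a rough illustration of the kind of loop this question is about, here is a minimal sketch of an element-wise unsigned add written so that an auto-vectorizer has a good chance of using packed integer adds (PADDD on SSE2, which is the same instruction for signed and unsigned 32-bit lanes). The function name `add_u32` is hypothetical, and whether the compiler actually emits a single 128-bit add per four elements depends on the compiler version, optimization level, and target flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] + b[i], wrapping modulo 2^32.
   `restrict` promises the arrays do not overlap, which removes the aliasing
   checks that often block or complicate auto-vectorization. */
void add_u32(uint32_t *restrict out, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    assert(n == 0 || (out && a && b));  /* non-empty input needs valid pointers */
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

With GCC, inspecting the generated assembly (for example with `-O3 -S`) is the usual way to check what was actually emitted.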
2023-03-08 13:56 Category: Q&A
How to compare __m128 types?
__m128 a; __m128 b; How do I code a != b? Which should I use: _mm_cmpneq_ps or _mm_cmpneq_ss? How do I process the result? [Details]
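For orientation, `_mm_cmpneq_ps` compares all four float lanes and `_mm_cmpneq_ss` only the lowest one; either way the result is a lane mask, not a boolean. A minimal sketch of one way to reduce the packed mask to a single yes/no answer follows. The helper name `m128_any_lane_differs` is hypothetical, and it assumes "a != b" means "at least one lane differs". Note that a NaN lane compares as not equal, even to itself.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_cmpneq_ps, _mm_movemask_ps */

/* Hypothetical helper: returns 1 if any of the four float lanes of a and b
   differ, 0 if all four compare equal. */
int m128_any_lane_differs(__m128 a, __m128 b)
{
    __m128 neq = _mm_cmpneq_ps(a, b);   /* each lane: all ones where a != b */
    return _mm_movemask_ps(neq) != 0;   /* collect the four lane sign bits */
}

/* Usage sketch. */
void m128_compare_demo(void)
{
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 c = _mm_set_ps(4.0f, 3.0f, 2.5f, 1.0f);  /* one lane differs */
    assert(!m128_any_lane_differs(a, a));
    assert(m128_any_lane_differs(a, c));
}
```

The raw value of `_mm_movemask_ps` (0 to 15) can also be kept when the caller needs to know which lanes differ.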
2023-03-07 03:46 Category: Q&A
OpenMP and SSE, my program doesn't speed up
Here is the part of my code that runs in parallel: timer.Start(); for(int i = 0; i < params.epochs; ++i) ... [Details]
2023-03-05 10:48 Category: Q&A
Bilinear filter with SSE4.1 intrinsics
I am trying to write a reasonably fast bilinear filtering function, for just one filtered sample at a time for now, as an exercise in getting used to intrinsics; anything up to SSE4.1 is fine. [Details]
2023-03-04 21:02 Category: Q&A
Overloading conflict with vector types __m128, __m256 in GCC
I've started playing around with AVX instructions on Intel's new Sandy Bridge processor. I'm using GCC 4.5.2, the TDM-GCC 64-bit build of MinGW64. [Details]
2023-03-04 06:36 Category: Q&A
How to force gcc to use all SSE (or AVX) registers?
I'm trying to write some computationally intensive code for a Windows x64 target, with SSE or the new AVX instructions, compiling with GCC 4.5.2 and 4.6.1, MinGW64 (TDM GCC build, and some custom build). [Details]
2023-03-04 05:20 Category: Q&A