sse2
Visual Studio 2010 and SSE 4.2
I would like to know what needs to be set in Visual Studio 2010 to enable SSE 4.2. I would like to use it because of the optimized POPCNT...
2023-03-17 11:38 Category: Q&A
SSE2 double multiplication slower than with standard multiplication
I'm wondering why the following code with SSE2 instructions performs the multiplication slower than the standard C++ implementation.
2023-03-17 04:56 Category: Q&A
Tweaking MIT's bitcount algorithm to count words in parallel?
I want to use a version of the well-known MIT bitcount algorithm to count neighbors in Conway's Game of Life using SSE2 instructions.
2023-03-14 22:57 Category: Q&A
Sorting tuples inside signed integers
I'm sorting tuples of 16+16 bits as 32-bit integers with SSE2. There are only signed integer instructions for compare and min/max. I don't have a problem with the order for the higher part as its jus...
2023-02-23 04:58 Category: Q&A
Array of sse type: Segmentation Fault
Today I tried to initialize an array of the SSE type __m128d. Unfortunately it didn't work - why? Is it generally impossible to create arrays of SSE types (since they are register types?). The follow...
2023-02-15 01:15 Category: Q&A
Speeding up some SSE2 Intrinsics for color conversion
I'm trying to perform image colour conversion from YCbCr to BGRA (Don't ask about the A bit, such a headache).
2023-02-11 06:27 Category: Q&A
Optimizing loop with few instructions (SSE2, SSE4) with TBB
I have a simple image processing related algorithm. Briefly, an image (mean) in float is subtracted by an 8-bit image...
2023-02-10 03:33 Category: Q&A
SIMD: Why is the SSE RGB to YUV color conversion about the same speed as the C++ implementation?
I've just tried to optimize an RGB to YUV420 converter. Using a lookup table yielded a speed increase, as did using fixed point arithmetic. However I was expecting the real gains using SSE instructio...
2023-02-07 19:45 Category: Q&A
How to make the following code faster
int u1, u2; unsigned long elm1[20], _mulpre[16][20], res1[40], res2[40]; 64 bits long res1, res2 initialized to zero.
2023-01-30 12:25 Category: Q&A
SIMD code vs Scalar Code
The following loop is executed hundreds of times. elma and elmc are both unsigned long (64-bit) arrays, as are res1 and res2.
2023-01-29 10:06 Category: Q&A