simd
improve locality and decrease cache pollution in a medical image reconstruction implementation
I'm doing research for my university on an image reconstruction algorithm for medical use.
2023-02-05 04:10 Category: Q&A
approximating log10[x^k0 + k1]
Greetings. I'm trying to approximate the function Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is an integer < 2^14.
2023-02-04 18:27 Category: Q&A
SSE (SIMD extensions) support in gcc
I see the code below: #include "stdio.h" #define VECTOR_SIZE 4 typedef float v4sf __attribute__ ((vector_size(sizeof(float)*VECTOR_SIZE)));
2023-02-03 00:53 Category: Q&A
indexing into an array with SSE
Suppose I have an array: uint8_t arr[256]; and an element __m128i x containing 16 bytes, x_1, x_2, ... x_16
2023-01-31 03:28 Category: Q&A
How to make the following code faster
int u1, u2; unsigned long elm1[20], _mulpre[16][20], res1[40], res2[40]; 64 bits long res1, res2 initialized to zero.
2023-01-30 12:25 Category: Q&A
SIMD code vs Scalar Code
The following loop is executed hundreds of times. elma and elmc are both unsigned long (64-bit) arrays, as are res1 and res2.
2023-01-29 10:06 Category: Q&A
64-bit specific simd intrinsic
I am using the following union declaration in SSE2. typedef unsigned long uli; typedef uli v4si __attribute__ ((vector_size(16)));
2023-01-29 06:46 Category: Q&A
What's the most efficient way to load and extract 32 bit integer values from a 128 bit SSE vector?
I'm trying to optimize my code using SSE intrinsics but am running into a problem where I don't know of a good way to extract the integer values from a vector after I've done the SSE
2023-01-28 19:25 Category: Q&A
Porting MMX/SSE instructions to AltiVec
Let me preface this by saying that I have extremely limited experience with ASM, and even less with SIMD. But it happens that I have the following MMX/SSE optimised code, that I would like to port across to
2023-01-28 15:47 Category: Q&A
Most efficient way to store 4 dot products into a contiguous array in C using SSE intrinsics
I am optimizing some code for an Intel x86 Nehalem micro-architecture using SSE intrinsics. A portion of my program computes 4 dot products and adds each result to the previous values in
2023-01-24 17:04 Category: Q&A