gpgpu
In OpenCL, what does mem_fence() do, as opposed to barrier()?
Unlike barrier() (which I think I understand), mem_fence() does not affect all items in the work group. The OpenCL spec says (section 6.11.10), for mem_fence(): ...
2023-04-11 17:58 Category: Q&A
How To Structure Large OpenCL Kernels?
I have worked with OpenCL on a couple of projects, but have always written the kernel as one (sometimes rather large) function. Now I am working on a more complex project and would like ...
2023-04-10 04:07 Category: Q&A
Using int index where double is expected in C++ AMP restrict(direct3d) code
Googling didn’t help much; has anyone used AMP? In the code snippet below, the cast from integer to double (double v = idx.x) leads to a “Failed to create shader” runtime error.
2023-04-08 16:17 Category: Q&A
How good is NVCC at code optimizations?
How well does NVCC optimize device code? Does it perform optimizations such as constant folding and common subexpression elimination?
2023-04-06 18:22 Category: Q&A
Can you predict the runtime of a CUDA kernel?
To what degree can one predict or calculate the performance of a CUDA kernel? Having worked a bit with CUDA, this seems nontrivial.
2023-04-06 09:51 Category: Q&A
CUDA multiple memory access
Please explain how memory access works in the following kernel: __global__ void kernel(float4 *a) ...
2023-04-04 15:54 Category: Q&A
How to quickly find an image in another image using CUDA?
In my current project I need to find the pixel-exact position of an image contained in another, larger image. The smaller image is never rotated or stretched (so it should match pixel by pixel) but it may ha...
2023-04-04 07:44 Category: Q&A
CUDA n-body simulation - shared memory problem
Based on the example from the NVIDIA GPU Computing SDK, I created two kernels for the n-body simulation. The first kernel, which doesn't take advantage of shared memory, is ~15% faster than the second kernel ...
2023-04-01 20:30 Category: Q&A
OpenCL vs OpenMP performance [closed]
Closed. This question needs to be more focused. It is not currently accepting answers. Want to improve this question? Update the question so it focuses on one problem only by editing this po...
2023-04-01 08:12 Category: Q&A
CUDA kernel function taking longer than equivalent host function
I'm following along with http://code.google.com/p/stanford-cs193g-sp2010/ and the video lectures posted online. Doing one of the posted problem sets (the first one), I've encountered something slight...
2023-03-31 18:29 Category: Q&A