gpgpu
Why does CUDA Profiler indicate replayed instructions: 82% != global replay + local replay + shared replay?
I got information from CUDA Profiler. I am confused about why Replayed Instructions != Global memory replay + Local memory replay + Shared bank conflict replay? [Details]
2023-03-30 09:27 Category: Q&A
OpenCL - How do I query for a device's SIMD width?
In CUDA, there is a concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVID [Details]
2023-03-29 05:18 Category: Q&A
Sparse Cholesky factorization algorithm for GPU [closed]
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers. [Details]
2023-03-29 01:56 Category: Q&A
cpu vs gpu - when cpu is better [closed]
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, a [Details]
2023-03-28 21:32 Category: Q&A
How does the opencl command queue work, and what can I ask of it
I'm working on an algorithm that does pretty much the same operation a bunch of times. Since the operation consists of some linear algebra (BLAS), I thought I would try using the GPU for this. [Details]
2023-03-27 03:24 Category: Q&A
Why does padding the shared memory array by one column increase the speed of the kernel by 40%?
Why is this matrix transpose kernel faster when the shared memory array is padded by one column? I found the kernel at PyCuda/Examples/MatrixTranspose. [Details]
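As background for the question above, here is a minimal sketch of the shared-memory bank arithmetic that is usually behind this kind of speedup. It is a plain Python model, not the PyCuda kernel itself. It assumes 32 banks of 4-byte words (older hardware used 16) and a 32x32 float tile; the names `NUM_BANKS`, `TILE`, `banks_hit_by_column_read`, and `conflict_degree` are hypothetical and chosen only for illustration.

```python
# Simplified model (assumption): word index w of a shared array lives in bank w % NUM_BANKS.
NUM_BANKS = 32  # assumed bank count; older GPUs used 16
TILE = 32       # assumed tile height, one thread per row


def banks_hit_by_column_read(row_width, column=0):
    """Banks touched when TILE threads each read one element of the same column
    of a row-major array whose rows are `row_width` words wide."""
    return [(row * row_width + column) % NUM_BANKS for row in range(TILE)]


def conflict_degree(banks):
    """Largest number of threads that land on the same bank (1 means conflict-free)."""
    return max(banks.count(b) for b in set(banks))


unpadded = conflict_degree(banks_hit_by_column_read(TILE))    # tile[32][32]
padded = conflict_degree(banks_hit_by_column_read(TILE + 1))  # tile[32][33]
print(unpadded, padded)  # prints "32 1"
```

Under these assumptions, a column read from the unpadded tile sends all 32 threads to the same bank, so the accesses are serialized. With one extra column per row, each thread hits a different bank. Whether this accounts for the full 40% in the kernel being asked about depends on details not shown in the excerpt.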
2023-03-27 01:51 Category: Q&A
Is there a way to independently task and use heterogeneous multi gpus in a Windows 7 system?
Can I have two mixed chipset/generation AMD gpus in my desktop; a 6950 and 4870, and dedicate one gpu (4870) for opencl/gpgpu purposes only, eliminating the device from video output or display driving [Details]
2023-03-26 07:19 Category: Q&A
how much time does it take to make a call to opencl?
I'm currently implementing an algorithm that does a lot of linear algebra on small matrices and vectors. The code is fast but I'm wondering if it would make sense to implement it on a gpgpu instead [Details]
2023-03-25 21:51 Category: Q&A
CUDA limit seems to be reached, but what limit is that?
I have a CUDA program that seems to be hitting some sort of limit of some resource, but I can't figure out what that resource is. Here is the kernel function: [Details]
2023-03-24 21:32 Category: Q&A
CUDA - copy to array within array of Objects
I have a CUDA application I'm working on with an array of Objects; each object has a pointer to an array of std::pair<int, double>. I'm trying to cudaMemcpy the array of objects over, then cuda [Details]
2023-03-24 19:26 Category: Q&A