CUDA algorithm structure

I would like to understand the general way of doing the following on a GPU using CUDA.

I have an algorithm that might look something like this:

void DoStuff(int[,] inputMatrix, int[,] outputMatrix)
{
    for (...) {          // outer loop, e.g. over rows
        for (...) {      // inner loop
            if (something) {
                DoStuffA(inputMatrix, a, b, c, outputMatrix);
            }
            else {
                DoStuffB(inputMatrix, a, b, c, outputMatrix);
            }
        }
    }
}

DoStuffA and DoStuffB are simple parallelizable functions (e.g. performing a matrix row operation) of the kind the CUDA examples cover plenty of.
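For concreteness, a row operation of that sort might look roughly like the CUDA kernel below; the name, signature, and placeholder arithmetic are purely illustrative, not taken from any sample:

// Illustrative only: one thread per element of row `row`, so a single
// kernel launch processes the whole row in parallel.
__global__ void RowOp(const int* in, int* out, int row, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < cols)
        out[row * cols + col] = in[row * cols + col] + 1;  // placeholder op
}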

What I want to know is how to put the main algorithm "DoStuff" onto the GPU and then call DoStuffA and DoStuffB as and when I need to (with those calls executing in parallel). I.e. the outer loop part is single-threaded, but the inner calls are not.

The examples I have seen seem to be multithreaded from the get-go. I assume there is a way to just call a single GPU-based method from the outside world and have it control all of the parallel bits by itself?


It depends on how the data in the for loops interrelates, but roughly I would:

  1. Pack all input matrices into one block of memory
  2. Upload the input matrices
  3. Run the for loops on the CPU, calling a kernel for DoStuffA or DoStuffB as needed
  4. Download the output matrices in one block

This way, the biggest cost is the overhead of launching each kernel. If your input data is large, it won't be so bad.
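As a rough, self-contained sketch of those four steps in CUDA C++ (the kernel bodies, the branch condition, and the matrix dimensions are all assumed for illustration; only the overall structure matters):

#include <cuda_runtime.h>

// Stand-ins for DoStuffA/DoStuffB: each parallelizes one row operation.
__global__ void DoStuffA(const int* in, int* out, int row, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < cols)
        out[row * cols + col] = in[row * cols + col] + 1;   // placeholder op
}

__global__ void DoStuffB(const int* in, int* out, int row, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col < cols)
        out[row * cols + col] = in[row * cols + col] * 2;   // placeholder op
}

void DoStuff(const int* hostIn, int* hostOut, int rows, int cols)
{
    int *devIn = nullptr, *devOut = nullptr;
    size_t bytes = (size_t)rows * cols * sizeof(int);

    // Steps 1-2: pack the input into one block and upload it once.
    cudaMalloc(&devIn, bytes);
    cudaMalloc(&devOut, bytes);
    cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (cols + threads - 1) / threads;

    // Step 3: the outer loop stays single-threaded on the CPU; each
    // iteration launches a kernel whose threads run in parallel.
    for (int row = 0; row < rows; ++row) {
        if (row % 2 == 0)   // stand-in for the "if (something)" test
            DoStuffA<<<blocks, threads>>>(devIn, devOut, row, cols);
        else
            DoStuffB<<<blocks, threads>>>(devIn, devOut, row, cols);
    }

    // Step 4: download the whole output in one transfer. cudaMemcpy on
    // the default stream waits for the queued kernels to finish first.
    cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
}

If the per-row work turns out to be small, batching several iterations into one kernel launch is a common way to cut down that launch overhead.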
