My algorithm consists from two steps:
- Data generation. On this step I generate data array in cycle as some function result
- Data processing. For this step I written OpenCL kernel which process data array generated on previous step.
Now first step runs on CPU because it hard to parallelize. I want to run it on GPU because each step of generation takes some time. And I want to run second step for already generated data immediately.
Can I run another opencl kernel from currently runned kernel in separated thread? Or it be run in the some thread that caller kernel?
Some pseudocode for illustrate my point:
__kernel second(__global int * data, int index) {
//work on data[i]. This process takes a lot of time
}
__kernel first(__global int * data, const int length) {
for (i开发者_如何转开发nt i = 0; i < length; i++) {
// generate data and store it in data[i]
// This kernel will be launched in some thread that caller or in new thread?
// If in same thread, there are ways to launch it in separated thread?
second(data, i);
}
}
No, OpenCL has no concept of threads, and neither a kernel execution can launch another kernel. All kernel execution is triggered by the CPU.
You should launch one kernel. Then do a clFInish(); Then execute the next kernel.
There are more efficient ways but I will only mess you with events.
You just use the memory output of the first kernel as input for the second one. With that, you aboid CPU->GPU copy process.
I believe that the global work size might be considered as the number of threads that will be executed, in one way or another. Correct me if I'm wrong.
精彩评论