OpenCL: Running only a single instance of a CPU-based kernel

Developer https://www.devze.com 2023-03-03 15:13 Source: Internet

I have two OpenCL kernels: the first is a parallel task and the second is linear (LZW). The first one, being parallel, runs on a GPU, and the second one, being linear, runs on a CPU. I have a multicore CPU and I really want to run only a single instance of the kernel on a single core, not 2+ instances on 2+ cores. This is not required for production but rather for my academic study of the performance of various types of tasks.

The rather dumb method I am using now is:

if (get_global_id(0) == 0) execute();
else do_nothing();

Is there a better approach than this?

Thank you.

You can run your kernel with clEnqueueTask. This starts a single work-item, so the kernel runs in a single thread.

Another scenario, if you need to launch the kernel with clEnqueueNDRangeKernel: OpenCL was developed for parallel computing, so setting the global and local work sizes to 1 is the only way to get the desired effect. Even then, the compiler might try to optimize the kernel by running parts of it in parallel. As long as your OpenCL compiler supports it, passing the option "-cl-opt-disable" to clBuildProgram disables optimizations. I believe that is not necessary, though.

To my knowledge, and based on a quick search of the OpenCL specs, there is no portable way to restrict execution of a kernel to a single processor. Consider the following definitions from the spec:

Processing Element: A virtual scalar processor. A work-item may execute on one or more processing elements.

Compute Unit: An OpenCL device has one or more compute units. A work-group executes on a single compute unit. A compute unit is composed of one or more processing elements. A compute unit may also include dedicated texture filtering units that can be accessed by its processing elements.

A CPU-based implementation could conceivably treat the several cores of a single chip as processing elements comprising a single compute unit. The OpenCL kernel compiler would then be free to exploit any parallelism in your kernel, even if you set the global and local work group sizes to 1 when you enqueue your kernel.

However, I doubt that anybody has bothered to make such a clever OpenCL implementation for x86, so by enqueuing only a single instance of your kernel (that is, using global and local work-group sizes of 1), and using an in-order command queue if you want to run several jobs, you can probably get your tasks to use only a single CPU for computation. The host-side OpenCL work (things like compiling your kernels) will probably still use other CPUs.