OpenCL inside Visual Studio - can we compile one exe that would use all the devices OpenCL can get, on all OpenCL-supporting platforms?

So I mean compiling code like:

//******************************************************************* 
// Demo OpenCL application to compute a simple vector addition  
// computation between 2 arrays on the GPU 
// ****************************************************************** 
#include <stdio.h> 
#include <stdlib.h> 
#include <CL/cl.h> 

// OpenCL source code 
const char* OpenCLSource[] = {   
"__kernel void VectorAdd(__global int* c, __global int* a,__global int* b)", 
  "{", 
  "  // Index of the elements to add \n", 
  "  unsigned int n = get_global_id(0);", 
  "  // Sum the n’th element of vectors a and b and store in c \n", 
  "  c[n] = a[n] + b[n];", 
  "}" 
}; 

// Some interesting data for the vectors 
int InitialData1[20] = {37,50,54,50,56,0,43,43,74,71,32,36,16,43,56,100,50,25,15,17}; 
int InitialData2[20] = {35,51,54,58,55,32,36,69,27,39,35,40,16,44,55,14,58,75,18,15}; 

// Number of elements in the vectors to be added 
#define SIZE 2048   

// Main function  
// ********************************************************************* 
int main(int argc, char **argv) 
{ 
   // Two integer source vectors in Host memory 
    int HostVector1[SIZE], HostVector2[SIZE]; 

    // Initialize with some interesting repeating data 
    for(int c = 0; c < SIZE; c++) 
    { 
        HostVector1[c] = InitialData1[c % 20]; 
        HostVector2[c] = InitialData2[c % 20]; 
    } 

    // Create a context to run OpenCL on our CUDA-enabled NVIDIA GPU  
    cl_context GPUContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU,  
                    NULL, NULL, NULL); 

    // Get the list of GPU devices associated with this context 
    size_t ParmDataBytes; 
    clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, 0, NULL, &ParmDataBytes); 
    cl_device_id* GPUDevices = (cl_device_id*)malloc(ParmDataBytes); 
    clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, ParmDataBytes, GPUDevices, NULL);
    // Create a command-queue on the first GPU device 
    cl_command_queue GPUCommandQueue = clCreateCommandQueue(GPUContext,  
                 GPUDevices[0], 0, NULL); 

    // Allocate GPU memory for source vectors AND initialize from CPU memory 
    cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | 
          CL_MEM_COPY_HOST_PTR, sizeof(int) * SIZE, HostVector1, NULL); 
    cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | 
               CL_MEM_COPY_HOST_PTR, sizeof(int) * SIZE, HostVector2, NULL); 

    // Allocate output memory on GPU 
    cl_mem GPUOutputVector = clCreateBuffer(GPUContext, CL_MEM_WRITE_ONLY, 
               sizeof(int) * SIZE, NULL, NULL); 

    // Create OpenCL program with source code 
    cl_program OpenCLProgram = clCreateProgramWithSource(GPUContext, 7, 
               OpenCLSource, NULL, NULL); 

    // Build the program (OpenCL JIT compilation) 
    clBuildProgram(OpenCLProgram, 0, NULL, NULL, NULL, NULL); 

    // Create a handle to the compiled OpenCL function (Kernel) 
    cl_kernel OpenCLVectorAdd = clCreateKernel(OpenCLProgram, "VectorAdd", NULL);

    // In the next step we associate the GPU memory with the Kernel arguments 
    clSetKernelArg(OpenCLVectorAdd, 0, sizeof(cl_mem),(void*)&GPUOutputVector); 
    clSetKernelArg(OpenCLVectorAdd, 1, sizeof(cl_mem), (void*)&GPUVector1); 
    clSetKernelArg(OpenCLVectorAdd, 2, sizeof(cl_mem), (void*)&GPUVector2); 

    // Launch the Kernel on the GPU 
    size_t WorkSize[1] = {SIZE};  // one dimensional Range 
    clEnqueueNDRangeKernel(GPUCommandQueue, OpenCLVectorAdd, 1, NULL, 
         WorkSize, NULL, 0, NULL, NULL); 

    // Copy the output in GPU memory back to CPU memory 
    int HostOutputVector[SIZE]; 
    clEnqueueReadBuffer(GPUCommandQueue, GPUOutputVector, CL_TRUE, 0, 
               SIZE * sizeof(int), HostOutputVector, 0, NULL, NULL); 

    // Cleanup  
    free(GPUDevices); 
    clReleaseKernel(OpenCLVectorAdd);   
    clReleaseProgram(OpenCLProgram); 
    clReleaseCommandQueue(GPUCommandQueue); 
    clReleaseContext(GPUContext); 
    clReleaseMemObject(GPUVector1); 
    clReleaseMemObject(GPUVector2); 
    clReleaseMemObject(GPUOutputVector); 

    // Print out the results 
    for (int Rows = 0; Rows < (SIZE/20); Rows++, printf("\n")){ 
        for(int c = 0; c < 20; c++){ 
            printf("%c", (char)HostOutputVector[Rows * 20 + c]); 
        } 
    } 
    return 0; 
}

into an exe with VS (in my case 2008), can we be sure that anywhere we run it, it will use the maximum of the PC's computing power? If not, how do we do that? (BTW, how do we make it work with AMD cards and other non-CUDA GPUs? Meaning one program for all PCs.)


This code should run on any Windows PC with OpenCL-capable GPU drivers.

However, to improve portability and possibly performance, you should be more careful about a few things.

  • You require a working OpenCL implementation. If no OpenCL implementation is available, this won't work at all: the program fails to start because the OpenCL DLL is missing.
  • You require a GPU implementation of OpenCL. It's a better idea to query all OpenCL platforms and devices and prefer GPU devices (see the sketch after this list).
  • You simply use the first available device, which might not be ideal. It's better to determine the most powerful device via heuristics or to give the user a choice.
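
For reference, a minimal selection sketch covering those points (PickDevice is a hypothetical helper, the platform array size is chosen arbitrarily, and error checking is omitted): walk every platform, take the first GPU device found, and fall back to a CPU device if no platform has a GPU.

    #include <CL/cl.h>
    #include <stddef.h>

    // Hypothetical helper: prefer a GPU device on any platform,
    // fall back to a CPU device. Returns NULL if nothing is found.
    static cl_device_id PickDevice(cl_platform_id* ChosenPlatform)
    {
        cl_uint NumPlatforms = 0;
        cl_platform_id Platforms[16];
        cl_device_id Fallback = NULL;
        cl_platform_id FallbackPlatform = NULL;

        clGetPlatformIDs(16, Platforms, &NumPlatforms);
        if (NumPlatforms > 16) NumPlatforms = 16;

        for (cl_uint p = 0; p < NumPlatforms; p++)
        {
            cl_device_id Device;
            // Take the first GPU we find on any platform
            if (clGetDeviceIDs(Platforms[p], CL_DEVICE_TYPE_GPU, 1, &Device, NULL) == CL_SUCCESS)
            {
                *ChosenPlatform = Platforms[p];
                return Device;
            }
            // Remember a CPU device in case no platform offers a GPU
            if (Fallback == NULL &&
                clGetDeviceIDs(Platforms[p], CL_DEVICE_TYPE_CPU, 1, &Device, NULL) == CL_SUCCESS)
            {
                Fallback = Device;
                FallbackPlatform = Platforms[p];
            }
        }

        *ChosenPlatform = FallbackPlatform;
        return Fallback;
    }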

Furthermore, you are not using clCreateContextFromType correctly. On ICD-enabled OpenCL implementations you need to explicitly specify a platform ID in the context properties (you can query the IDs with clGetPlatformIDs). All of today's OpenCL implementations use the ICD wrapper (well, except Apple's).
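
A minimal sketch of the corrected call, assuming ChosenPlatform came from clGetPlatformIDs (or from the helper above); error checking is again omitted:

    // Property list: the platform to use, terminated by 0
    cl_context_properties Props[] = {
        CL_CONTEXT_PLATFORM, (cl_context_properties)ChosenPlatform,
        0
    };

    cl_context Context = clCreateContextFromType(Props, CL_DEVICE_TYPE_GPU,
                                                 NULL, NULL, NULL);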


By writing your program for OpenCL, it will run on any computer that supports your executable format and has an OpenCL driver. It might be GPU-accelerated and it might not, and it won't be quite as fast as hardware-specific machine code, but it should be fairly portable, if by portable you mean an x86 CPU architecture and the Windows OS with only the video card changing. For true portability, you'll need to distribute either source code or bytecode that is JIT-compiled, not C or C++ binaries.
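
One common way to keep the kernel itself portable is to ship it as source next to the executable and JIT-compile it at runtime. A sketch, assuming a hypothetical VectorAdd.cl file and omitting error handling:

    #include <stdio.h>
    #include <stdlib.h>
    #include <CL/cl.h>

    // Read an entire kernel source file into a heap buffer (caller frees it).
    static char* LoadKernelSource(const char* Path, size_t* Length)
    {
        FILE* File = fopen(Path, "rb");
        if (!File) return NULL;
        fseek(File, 0, SEEK_END);
        long Size = ftell(File);
        fseek(File, 0, SEEK_SET);
        char* Source = (char*)malloc((size_t)Size + 1);
        fread(Source, 1, (size_t)Size, File);
        Source[Size] = '\0';
        fclose(File);
        if (Length) *Length = (size_t)Size;
        return Source;
    }

    // Usage (Context created as shown earlier):
    //   size_t Length;
    //   char* Source = LoadKernelSource("VectorAdd.cl", &Length);
    //   cl_program Program = clCreateProgramWithSource(Context, 1,
    //                            (const char**)&Source, &Length, NULL);
    //   clBuildProgram(Program, 0, NULL, NULL, NULL, NULL);
    //   free(Source);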


In principle, OpenCL code should run anywhere. But there are multiple versions of the implementation, 1.1 and 1.2 (the latter not yet on NVIDIA), with the 2.0 spec released and implementations due sometime. There are API incompatibilities between the OpenCL versions. AMD has provided a way for their OpenCL 1.2 implementation to use the 1.1 API calls.
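
If you need to branch on the version at runtime, you can parse the version string. A short sketch, assuming a Platform handle was already queried (sscanf comes from <stdio.h>; the "OpenCL <major>.<minor>" prefix format is defined by the spec):

    char VersionString[128];
    int Major = 0, Minor = 0;

    clGetPlatformInfo(Platform, CL_PLATFORM_VERSION,
                      sizeof(VersionString), VersionString, NULL);

    // The string is guaranteed to start with "OpenCL <major>.<minor>"
    sscanf(VersionString, "OpenCL %d.%d", &Major, &Minor);

    if (Major > 1 || (Major == 1 && Minor >= 2))
    {
        // OpenCL 1.2 API paths are safe here
    }
    else
    {
        // Stay within the OpenCL 1.1 API surface
    }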

There are important caveats, noted by other authors above. The OpenCL ICD (http://www.khronos.org/registry/cl/extensions/khr/cl_khr_icd.txt) supports running on different vendor platforms, but since there is no binary portability for compiled kernel code, you'll need to provide the kernel source code.

In the general case, check the OpenCL platform version and profile. Remember there can easily be multiple platforms and devices on a system. Then determine the minimum platform version and profile you can run on. If your code relies on specific device extensions, like double-precision floating point, then you need to check for their presence on the device.

Use

clGetPlatformInfo( ... )

to get the platform profile and version.

Use

clGetDeviceInfo( ... )

to get the device-specific extensions.
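
As a sketch of those checks (assuming Platform and Device handles are already in hand, with arbitrary buffer sizes and no error handling), querying the platform profile and testing a device for the cl_khr_fp64 double-precision extension:

    #include <string.h>   /* strstr */

    char Profile[64];
    char Extensions[4096];

    clGetPlatformInfo(Platform, CL_PLATFORM_PROFILE, sizeof(Profile), Profile, NULL);
    // Profile is either "FULL_PROFILE" or "EMBEDDED_PROFILE"

    clGetDeviceInfo(Device, CL_DEVICE_EXTENSIONS, sizeof(Extensions), Extensions, NULL);
    if (strstr(Extensions, "cl_khr_fp64") != NULL)
    {
        // The device supports double-precision floating point
    }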
