Okay, I have already been through most of the ATI and NVIDIA guides to OpenCL. There are a few things I just want to be sure of, and some that need clarification. Nothing in the documentation gives a clear-cut answer.
I have a Radeon 4650, and on querying my device I got:
CL_DEVICE_MAX_COMPUTE_UNITS: 8
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 128 / 128 / 128
CL_DEVICE_MAX_WORK_GROUP_SIZE: 128
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 256 MByte
OK, first: my card has 1 GB of memory, so why am I allowed only 256 MB?
Second, I don't understand the work-item dimension part. Does that mean I can have up to 128*3 or 128^3 work-items?
When I calculated this before I ran the query, I got 8 cores * 16 stream processors * 4 work-items = 512. Why is this wrong?
Also, I got the same three-dimension work-item values for my Intel Core 2 Duo CPU. Does the same calculation apply?
As for the command queues: when I tried accessing my Core Duo CPU as a device using OpenCL, everything got processed on one core only. I tried creating multiple queues and enqueueing several entries, but the work was still processed on one core only. I used a global_work_size of 128*128*128*8 for a simple write program where each work-item writes its own global ID to the buffer, and I got only zeros.
And what about NVIDIA cards? On an NVIDIA 9500 GT with 32 CUDA cores, are the work-items calculated similarly?
Thanks a lot, I've really been all over the place trying to find answers.
OK, first: my card has 1 GB of memory, so why am I allowed only 256 MB?
This is an ATI driver bug/limitation, AFAIK. I'll check whether I can reproduce it on my 5850.
http://devforums.amd.com/devforum/messageview.cfm?catid=390&threadid=124142&messid=1069111&parentid=0&FTVAR_FORUMVIEWTMP=Branch
Second, I don't understand the work-item dimension part. Does that mean I can have up to 128*3 or 128^3 work-items?
No. Since CL_DEVICE_MAX_WORK_ITEM_SIZES is 128 / 128 / 128, you can have at most 128 work-items along any one dimension of a work-group. And since CL_DEVICE_MAX_WORK_GROUP_SIZE is 128, you can have, e.g., work_group_size(128, 1, 1), work_group_size(1, 128, 1), work_group_size(64, 1, 2), or work_group_size(8, 4, 4), etc. As long as the product of the dimensions is <= 128, it will be fine. Note that both limits apply to a single work-group, not to the total (global) number of work-items.
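To make the two limits concrete, here is a minimal sketch in plain C of the check described above. The function name local_size_fits and its parameters are made up for illustration and are not part of the OpenCL API; in real code the limits would come from clGetDeviceInfo rather than being passed in by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL call): returns true when a work-group
 * size respects both device limits discussed above.
 *   local[]          - requested work-group size per dimension
 *   max_item_sizes[] - values of CL_DEVICE_MAX_WORK_ITEM_SIZES
 *   dims             - number of dimensions in use (1..3 on this device)
 *   max_group_size   - value of CL_DEVICE_MAX_WORK_GROUP_SIZE
 */
bool local_size_fits(const size_t local[], const size_t max_item_sizes[],
                     size_t dims, size_t max_group_size)
{
    assert(dims >= 1);

    size_t product = 1;
    for (size_t d = 0; d < dims; ++d) {
        /* Per-dimension limit: each entry must be non-zero and within range. */
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return false;

        /* Total limit: the running product must not exceed the group maximum. */
        product *= local[d];
        if (product > max_group_size)
            return false;
    }
    return true;
}
```

With the values reported for the Radeon 4650 (128 / 128 / 128 and 128), a size of (8, 4, 4) passes, while (16, 16, 1) is rejected because its product, 256, exceeds 128.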
When I calculated this before I ran the query, I got 8 cores * 16 stream processors * 4 work-items = 512. Why is this wrong?
Also, I got the same three-dimension work-item values for my Intel Core 2 Duo CPU. Does the same calculation apply?
I don't understand what you are trying to compute here.