Hey there, I'm currently developing a Mex-file in matlab including CUDA computation. I wonder if there's a good way to 'automatically' optimize the program for arbitrary input parameters from the user. E.g. when the input-parameters don't exceed a certain size, try开发者_StackOverflow中文版 to use shared and/or constant memory... which will only work up to certain limits. From there on, global memory has to be used. But such optimizations can only be made in runtime because that's the point I get to know the size of input parameters from the user. Any simple solution? Thanks!
You can simply write different kernels and decide which ones to call at runtime.
You can also use the device query API or do some micro-benchmarking to figure out the sizes of shared/constant memory at runtime. This is probably necessary if you don't want to assume a particular GPU model.
精彩评论