开发者

Is Myers Diff amenable to running on GPUs?

开发者 https://www.devze.com 2023-03-10 13:41 出处:网络
I\'m interested in making a faster Myers diff implementation by running it on a GPU, i.e. with OpenCL.I have a good understanding of the algorithm but am new to GPU programming.My hunch is that the GP

I'm interested in making a faster Myers diff implementation by running it on a GPU, i.e. with OpenCL. I have a good understanding of the algorithm but am new to GPU programming. My hunch is that the GPU will not perform well, but I'd like to hear thoughts and ideas.

Here's a description of one iteration of the algorithm in C. We have two constant buffers of bytes 'left' and 'right' (the data we're comparing), and a shared mutable array of int32 called vector. 'idx' is the iteration index. Then the algorithm is essentially this:

   void myers_diff_iteration(const uint8 *left, const uint8 *right, int32 *开发者_运维百科vector, int32 idx) {
       int32 x = MAX(vector[idx-1], vector[idx+1]);
       int32 y = idx - x;
       while (left[x] == right[y]) {
           x++;
           y++;
       }
       vector[x] = x;
   }

My guess is that the while loop (which has a very unpredictable number of iterations, ranging from zero to a million) is likely to be very bad on the GPU, and eliminate any performance gain. Is that true? Any hints for how to improve it?

Also, the vector is shared between all iterations of the loop. Each iteration writes to a different location, so there's no synchronization needed (beyond requiring that writes to an aligned 4-byte word don't affect neighboring words). Is such a shared vector going to perform well?

Thanks for any help!


You could try. The GPU will have serious problems with the while loop, but as long as there is enough "iterations" (threads) running, shouldn't be any speed lose.

You could rewrite it this way:

   void myers_diff_iteration(const uint8 *left, const uint8 *right, int32 *vector, int32 idx) {
       int id = get_global_id(0);
       int32 x = MAX(vector[idx-1], vector[idx+1]) + id;
       int32 y = idx - x + id;
       if (left[x] != right[y]) {
           vector[x] = x;
       }
   }

You only then need to run the max while loop numer of threads. But it will only produce 1 result of the vector each OpenCL run.

The best it to try, and then do some variations.

0

精彩评论

暂无评论...
验证码 换一张
取 消