开发者

parallel strlen?

开发者 https://www.devze.com 2023-02-13 05:42 出处:网络
I\'m w开发者_开发百科ondering if there would be any merit in trying to code a strlen function to find the \\0 sequence in parallel. If so, what should such a function take into account? Thanks.strlen(

I'm w开发者_开发百科ondering if there would be any merit in trying to code a strlen function to find the \0 sequence in parallel. If so, what should such a function take into account? Thanks.


strlen() is sequential by spirit - one step beyond the null-terminator is undefined behavior and the null-terminator can be anywhere - the first character or the one millionth character, so you have to scan sequentially.


You'd have to make sure the NUL found by a thread is the first NUL in the string, which means that the threads would need to synchronize on what their lowest NUL location is. So while it could be done, the overhead for the sync would be far more expensive than any potential gain from parallelization.

Also, there's the issue of caching. A single thread can read a string contiguously, which is cache friendly. Multiple threads run the risk of stepping on each other's toes.


It would be possible on some parallel architectures, but only if one could guarantee that a substantial amount the memory beyond the string could be accessed safely; it would only be practical if the strings were expected to be quite long and thread communication and synchronization were cheap. For example, if one had sixteen processors and one knew that one could safely access 256KB beyond the end of a string, one could start by dispatching the sixteen processors to handle sixteen 4K chunks. Each time a processor finished and hadn't found a zero, it could either start to handle either the next 4K chunk (if it was within 256KB of the lowest chunk that was still in progress), or wait for the lowest processor to complete. In practice, unless strings were really huge, synchronization delays and excess work would moot any gains from parallelism, but if one needed to find the length of a multi-megabyte string, the task could be done in parallel.


To parallelize tasks, you have to split input data and dispatch it to multiple threads. Without knowing the length of the string in advance, you cannot split up the data.

So you must know the allocated size of the input data (which is not necessarily identical with the string-length) in advance, then it will work.

Your program might return multiple NUL-values which is may have found. Your function can only know that the correct NUL value has been found if all threads which are processing the data that comes before any of the NUL values that have been found, have been completed.

Say we have our string split up in 8 chunks (0-7). If we found NUL-values in chunk 3 we cannot know if maybe there are other NUL-values in chunks 0-2, so we have to wait for any of these threads, an we can immediately stop all other threads. If then a NUL-value is found in thread 1 we only have to wait for thread 0 to complete, so we can get a definitive answer.


You could use this on FIXED-WIDTH strings, but not much more than that.


It depends on the architecture. Nothing wrong with having multiple compute units hunt for that first null character, but you will have to keep them fed with a steady stream of data from memory. You will probably want to perform platform specific tuning for the exact parameters keeping cache boundaries in mind.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号