I have some code that uses Intel TBB, and I'm running it on a 32-core machine. In the code, I use
parallel_for(blocked_range (2,left_image_width-2, left_image_width /32) ...
to spawn 32 threads that do concurrent work. There are no race conditions, and each thread is hopefully given the same amount of work. I'm using clock_t to measure how long my program takes; for a certain image, it takes roughly 19 seconds to complete.
Then I ran my code through Intel Parallel Studio, and it ran the code in 2 seconds. This is the result I was expecting, but I can't figure out why there's such a large difference between the two. Does clock_t sum the clock cycles on all the cores? Even then, it doesn't make sense. Below is the snippet in question.
clock_t begin=clock();
create_threads_and_do_work();
clock_t end=clock();
double diffticks=end-begin;
double diffms=(diffticks*1000)/CLOCKS_PER_SEC;
cout<<"And the time is "<<diffms<<" ms"<<endl;
Any advice would be appreciated.
It isn't quite clear whether the difference in run time is the result of two different inputs (images) or simply of two different run-time measuring methods (clock_t difference vs. Intel software measurement). Furthermore, you aren't showing us what goes on in create_threads_and_do_work(), and you didn't mention which tool within Intel Parallel Studio you are using. Is it VTune?
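One way to tell the two measuring methods apart is to time the same work with std::clock() and with a wall clock. Because clock() is specified as processor time rather than elapsed time, the two can disagree on a multi-core machine. This is a minimal C++11 sketch that uses plain std::thread instead of TBB so it compiles on its own; spin_workers() and measure() are hypothetical helpers, and in your program you would wrap the same two timers around create_threads_and_do_work() instead. (TBB also ships its own wall-clock timer, tbb::tick_count.)

```cpp
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical stand-in for create_threads_and_do_work(): every worker
// busy-spins until a shared deadline, so each one keeps a core busy.
void spin_workers(unsigned n_threads, std::chrono::milliseconds duration) {
    const auto deadline = std::chrono::steady_clock::now() + duration;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n_threads; ++i) {
        workers.emplace_back([deadline] {
            // Burn CPU time until the deadline has passed.
            while (std::chrono::steady_clock::now() < deadline) {
            }
        });
    }
    // Wait for all workers before returning, as parallel_for() does.
    for (auto& t : workers) t.join();
}

struct Timings {
    double cpu_ms;   // from std::clock(): processor time
    double wall_ms;  // from std::chrono::steady_clock: elapsed real time
};

// Times the same parallel work with both clocks.
Timings measure(unsigned n_threads) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();

    spin_workers(n_threads, std::chrono::milliseconds(200));

    const std::clock_t c1 = std::clock();
    const auto w1 = std::chrono::steady_clock::now();

    Timings t;
    t.cpu_ms = 1000.0 * static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    t.wall_ms = std::chrono::duration<double, std::milli>(w1 - w0).count();
    return t;
}
```

Calling measure(std::thread::hardware_concurrency()) and printing both fields on a multi-core Linux machine should show cpu_ms at several times wall_ms, whereas with Microsoft's C runtime the two should roughly agree.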
What your clock_t difference actually measures depends on the platform. The C standard only specifies that clock() approximates the processor time used by the program: with glibc on Linux, that is the CPU time of the whole process, summed over all of its threads, so work spread across many cores is reported as much more time than actually elapsed, while Microsoft's C runtime returns the wall-clock time elapsed since the process started. In either case, the measurement only covers the threads' work if create_threads_and_do_work() waits for all of them to complete before returning, rather than spawning them and exiting immediately. If all you do in the function is that parallel_for(), you are fine on that count, because parallel_for() returns only after the whole range has been processed.
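A minimal sketch of the completion issue, assuming a hypothetical launcher that starts worker threads and returns without waiting for them. It uses plain C++11 std::thread, and std::this_thread::sleep_for() stands in for real work: stopping the timer right after the launcher returns misses almost all of the work, while stopping it after join() captures it.

```cpp
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical launcher: starts the workers and hands them back WITHOUT
// waiting, like a function that spawns threads and exits immediately.
std::vector<std::thread> launch_workers(unsigned n, std::chrono::milliseconds work) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        // Sleeping stands in for real work of a known duration.
        workers.emplace_back([work] { std::this_thread::sleep_for(work); });
    }
    return workers;
}

struct Elapsed {
    double before_join_ms;  // timer stopped as soon as the launcher returned
    double after_join_ms;   // timer stopped only after every worker finished
};

Elapsed time_with_and_without_join(unsigned n) {
    using steady = std::chrono::steady_clock;
    const auto start = steady::now();

    std::vector<std::thread> workers =
        launch_workers(n, std::chrono::milliseconds(200));
    const auto early = steady::now();  // misses the workers' run time

    for (auto& t : workers) t.join();  // wait for all workers to complete
    const auto done = steady::now();

    const auto ms = [](steady::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    };
    return {ms(early - start), ms(done - start)};
}
```

With a 200 ms task per worker, before_join_ms should come out small and after_join_ms at least 200 ms. tbb::parallel_for() waits for all of its work internally, which is why a timer placed right after it already sees the finished work.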
Intel Parallel Studio includes a profiling tool called VTune. It is a powerful tool: when you run your program through it, you can view (in a graphically pleasing way) the processing time of each function in your code, as well as how many times each one was called. I'm fairly sure that after doing this you'll figure it out.
One last idea: did the program run to completion when you used the Intel software? I'm asking because VTune sometimes collects data for a while and then stops without letting the program finish.