Background: For a C++ AMP overview, see Daniel Moth's recent BUILD talk.
I have been going through the initial walk-throughs here, here, here, and here.
Only in the last reference do they make a call to array_view.synchronize().
In these simple examples, is a call to synchronize() not needed? When is it safe to exclude? Can we trust parallel_for_each to behave "synchronously" without it (w/r/t the proceeding code)?
Use synchronize() when you want to access the data without going through the array_view interface. If all of your access to the data uses array_view operators and functions, you don't need to use synchronize(). As Daniel mentioned, the destructor of an array_view forces a synchronize as well, but it's better to call synchronize() explicitly rather than rely on the destructor, so you can catch any exceptions that might be thrown.
The synchronize function forces an update to the buffer within the calling context -- that is, if you write data on the GPU and then call synchronize in CPU code, at that point the updated values are copied to CPU memory.
This seems obvious from the name, but I mention it because other array_view operations can cause a 'synchronize' as well. C++ AMP array_view tries its best to make copying between CPU and GPU memory implicit -- any operation which reads data through the array_view interface will cause a copy as well.
std::vector<int> v(10);
array_view<int, 1> av(10, v);
// Developer-preview syntax; the released C++ AMP spells these av.extent and restrict(amp).
parallel_for_each(av.grid, [=](index<1> i) restrict(direct3d) {
    av[i] = 7;
});
// at this point, data isn't copied back
std::wcout << v[0]; // should print 0
// using the array_view to access data will force a copy
std::wcout << av[0]; // should print 7
// at this point data is copied back
std::wcout << v[0]; // should print 7
my_array_view_instance.synchronize is not required for the simple examples I showed because the destructor calls synchronize. Having said that, I am not following best practice (sorry), which is to explicitly call synchronize. The reason is that if any exceptions are thrown at that point, you would not observe them if you left them up to the destructor, so please call synchronize explicitly.
Cheers
Daniel
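To illustrate the point above about exceptions being lost when the destructor does the synchronizing, here is a minimal sketch in standard C++. MockView, rely_on_destructor and explicit_sync are hypothetical names made up for this illustration; MockView is not the C++ AMP array_view, it only mimics the one behavior under discussion (a destructor that synchronizes and swallows any failure).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a view over device data; this is NOT the C++ AMP API.
class MockView {
public:
    explicit MockView(bool copy_will_fail) : copy_will_fail_(copy_will_fail) {}

    // Copies pending results back to the host; may throw on failure.
    void synchronize() {
        if (pending_ && copy_will_fail_) {
            pending_ = false;  // nothing left to retry in this sketch
            throw std::runtime_error("copy-back failed");
        }
        pending_ = false;
    }

    // A destructor must not let exceptions escape, so any failure is swallowed here.
    ~MockView() {
        try { synchronize(); } catch (...) { /* the error is lost */ }
    }

private:
    bool copy_will_fail_;
    bool pending_ = true;
};

// Relies on the destructor: returns an empty string even when the copy fails.
std::string rely_on_destructor(bool fail) {
    std::string observed;
    try {
        MockView view(fail);
    } catch (const std::exception& e) {
        observed = e.what();  // never reached: the destructor swallowed the error
    }
    return observed;
}

// Calls synchronize() explicitly: the failure is visible to the caller.
std::string explicit_sync(bool fail) {
    std::string observed;
    try {
        MockView view(fail);
        view.synchronize();
    } catch (const std::exception& e) {
        observed = e.what();
    }
    return observed;
}
```

With a failing copy, rely_on_destructor(true) reports nothing, while explicit_sync(true) hands the error message back to the caller, which is the reason given above for calling synchronize() yourself.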
Just noticed the second question in your post, about parallel_for_each being synchronous vs asynchronous (sorry, I am used to 1 question per thread ;-): "Can we trust parallel_for_each to behave "synchronously" without it (w/r/t the proceeding code)?"
The answer to that is on my post about parallel_for_each: http://www.danielmoth.com/Blog/parallelforeach-From-Amph-Part-1.aspx
..and also in the BUILD recording you pointed to from 29:20-33:00 http://channel9.msdn.com/Events/BUILD/BUILD2011/TOOL-802T
In a nutshell: no, you cannot trust it to be synchronous; it is asynchronous. The (implicit or explicit) synchronization point is any code that tries to access the data that is expected to be copied back from the GPU as a result of your parallel loop.
Cheers
Daniel
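As a rough analogy for the asynchronous launch described above, the following sketch uses only standard C++, not C++ AMP. std::async stands in for the asynchronous kernel launch and future::get() stands in for the first access that acts as the synchronization point; the function name first_element_after_async_fill is made up for this illustration.

```cpp
#include <future>
#include <vector>

// Analogy only: std::async plays the role of the asynchronous launch,
// and future::get() plays the role of the synchronization point.
int first_element_after_async_fill() {
    std::vector<int> data(10, 0);

    // Launch the "kernel"; this call returns without waiting for the work.
    std::future<void> done = std::async(std::launch::async, [&data] {
        for (int& x : data) x = 7;
    });

    // Reading data here would race with the worker thread.

    done.get();      // synchronization point: blocks until the fill has finished
    return data[0];  // safe to read now
}
```

The difference in C++ AMP, as described above, is that accessing the results through the array_view is itself the synchronization point, so no separate handle such as the future is needed.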
I'm betting that it's never safe to exclude, because in a multi-threaded (concurrent or parallel) environment it's never safe to assume anything. Certain constructs give you certain guarantees, but you have to be super careful and meticulous not to break those guarantees by introducing something you think is fine to do when in reality there's a lot of complexity underpinning the whole thing.
Haven't spent any time with C++-AMP yet but I'm inclined to try it out.