How can I make an MPI process notify the others about an error, for example, especially in an MPI program where all the MPI processes are independent of each other (there is no synchronisation between the different MPI processes)?
Thanks
I find your idea of an MPI program in which all the processes are independent very strange. I think that, by definition, the processes in an MPI program are not independent. For example, after you have called MPI_INIT they are all in the same communicator (MPI_COMM_WORLD), so they all 'know' of each other's existence. You may have written your code so that the processes do not synchronise after that, but the means still exist for processes to communicate with each other.
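One mechanism to look into (which does require synchronisation) is MPI_BCAST (broadcast). Another approach would be to use MPI_ISEND, the non-blocking send operation; but sooner or later some process will have to post a matching receive, and your sending process ought to check with MPI_TEST or MPI_WAIT whether the send has completed. Note that completion only means the send buffer can be reused; it does not guarantee that the message has already been received.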
The disparity you point out makes me wonder: why are you using MPI? It doesn't seem to fit your problem, and there's not much worse than trying to shove a square peg into MPI's round hole(s). "No synchronisation between MPI processes" makes it sound like you've taken a workload that is inherently serial farming (many independent serial jobs) and are trying to turn it into MPI.
That said, you can probably do what you want simply by posting an MPI_Irecv and polling it periodically with MPI_Test.
Being independent and having no synchronisation are two entirely different scenarios when dealing with MPI, thanks to non-blocking communication.
It seems to me that what you want can be implemented this way: when an error occurs, a process broadcasts a message with a designated "error" tag, and each process periodically posts non-blocking receives for a message with this tag. If they receive such a message, it means that an error occurred recently and they can react accordingly; otherwise they continue their normal execution.
(Note that "broadcasting" in this case doesn't refer to MPI_Bcast, since that's a collective operation: every process in the communicator has to call it at a matching point in its execution, and the call blocks until that process's part of the broadcast is done. Instead, it simply means sending the same message to every process it may concern. If you want to maintain no synchronisation between the processes, then this sending will have to be non-blocking as well.)
There is nothing in the MPI Standard that allows an "interrupt" to be sent from one rank to another rank (or ranks). In general, progress requires that user code enter the MPI library from time to time, for example by calling MPI_Test. Absent progress, there is no standard way to communicate between the ranks.
Synchronization requires that from time to time there is some entry into the MPI library. MPI_Barrier is the "big hammer" approach to synchronization. Combined with a reduction such as MPI_Reduce_scatter, in which each rank contributes an error flag, it becomes possible for every rank to know whether there is an error on at least one rank.