Are there any error handlers implemented in OpenMPI and MPICH other than MPI_ERRORS_RETURN and MPI_ERRORS_ARE_FATAL? Which implementation is better at handling errors? Kindly suggest a link for more information about the same.
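No, MPI_ERRORS_ARE_FATAL and MPI_ERRORS_RETURN are the only two error handlers the standard currently predefines. You can attach a handler function of your own with MPI_Comm_create_errhandler, but that only changes how an error is reported; it doesn't make the library itself any more fault tolerant.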
The MPI Forum is currently working on what will become MPI-3, and error handling and fault tolerance will be an important component of the new standard (there's a working group dedicated to the topic). Until that work is complete, however, the only way to get stronger fault tolerance out of MPI is to use earlier, nonstandard extensions. FT-MPI was a project that developed a very robust MPI, but unfortunately it's based on MPI-1.2, a very early version of the standard. There's MPICH-V, based on MPI-2, but that's more checkpoint-restart based.
Along the lines of checkpoint-restart, both OpenMPI and MPICH2 have support for BLCR (Berkeley Lab Checkpoint/Restart), a transparent form of checkpoint-restart fault tolerance that allows easy rollback to the last checkpoint in case of hardware or network failure.
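To make the rollback idea concrete, here is a rough sketch of checkpoint-restart done by hand at the application level. This is not BLCR and uses none of its API: BLCR snapshots the whole process transparently, whereas here the program saves its own state. The SimState struct, the function names, and the toy computation are all assumptions made up for illustration.

```c
#include <stdio.h>

/* Hypothetical application state: next step to run and a running sum. */
typedef struct {
    long step;
    double sum;
} SimState;

/* Write the state to `path`. Returns 0 on success, -1 on failure. */
int save_checkpoint(const char *path, const SimState *s)
{
    FILE *f = fopen(path, "wb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1)
        return -1;
    return 0;
}

/* Read the state from `path`. Returns 0 on success, or -1 (leaving *s
 * untouched) when there is no usable checkpoint. */
int load_checkpoint(const char *path, SimState *s)
{
    FILE *f = fopen(path, "rb");
    SimState tmp;
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(&tmp, sizeof tmp, 1, f);
    fclose(f);
    if (n != 1)
        return -1;
    *s = tmp;
    return 0;
}

/* Toy computation: add up 0..last_step-1, resuming from the checkpoint
 * at `path` if one exists and saving a new one every `interval` steps. */
double run(const char *path, long last_step, long interval)
{
    SimState s = {0, 0.0};

    load_checkpoint(path, &s);  /* on failure, start from scratch */
    while (s.step < last_step) {
        s.sum += (double)s.step;
        s.step++;
        if (s.step % interval == 0)
            save_checkpoint(path, &s);
    }
    return s.sum;
}
```

If the job dies partway through, calling run() again with the same path picks up from the last saved step instead of from zero. A real code would write to a temporary file and rename it, so a crash during the write can't destroy the previous checkpoint, and in an MPI job every rank would need to save a consistent state.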