Error handlers in MPICH and OpenMPI

开发者 https://www.devze.com 2023-02-28 21:24 Source: web
Are there any error handlers implemented in OpenMPI and MPICH other than MPI_ERRORS_RETURN and MPI_ERRORS_ARE_FATAL? Which implementation is better at handling errors? Kindly suggest a link with more information on this.


No, those are currently the only two predefined error handlers in the standard.

The MPI Forum is currently working on what will become MPI-3, and error handling and fault tolerance will be an important component of the new standard (there's a working group dedicated to the topic). Until that work is complete, however, the only way to get stronger fault tolerance out of MPI is to use earlier, nonstandard extensions. FT-MPI was a project that developed a very robust MPI, but unfortunately it's based on MPI 1.2, a very early version of the standard. There's also MPICH-V, based on MPI-2, but that's more checkpoint-restart based.

Along the lines of checkpoint-restart, both OpenMPI and MPICH2 support BLCR (Berkeley Lab Checkpoint/Restart), a transparent checkpoint-restart form of fault tolerance that allows easy rollback to the last checkpoint after a hardware or network failure.
