OpenMPI debugging with Valgrind and suppressions in OS X

Developer https://www.devze.com 2023-03-28 16:47 Source: Internet

I am writing a parallel code in C++ on my OS X (Snow Leopard) laptop, and I am trying to debug it with memchecker. I have successfully built OpenMPI with Valgrind support using:

configure --prefix=/opt/openmpi-1.4.3/ --enable-debug --enable-memchecker --with-valgrind=/opt/valgrind-3.6.0/ FFLAGS=-m64 F90FLAGS=-m64

(Ignore the Fortran flags; they are there because my Fortran compiler comes from GCC.)

When I run my application with

mpirun -np 2 valgrind --suppressions=/opt/openmpi-1.4.3/share/openmpi/openmpi-valgrind.supp --leak-check=yes --dsymutil=yes ./program

I get a whole lot of warnings from Valgrind (most of them from the heap summary at the end). I have included a small snippet of the warnings below. As I understand them, Valgrind detects memory leaks and uninitialised values in the MPI library, but I'm not really interested in those; I want warnings from the code I write. I am already running Valgrind with the suppression file provided by OpenMPI, but evidently it is not enough. How can I easily ignore all the other warnings detected in the OpenMPI distribution? Is there a suppression file for debugging OpenMPI programs with Valgrind on OS X, or do you know any cunning trick?

The first warning is:

 ==1531==    Syscall param writev(vector[...]) points to uninitialised byte(s)
 ==1531==    at 0x1014E16E2: writev (in /usr/lib/libSystem.B.dylib)
 ==1531==    by 0x101AEA4C5: mca_oob_tcp_peer_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
 ==1531==    by 0x101AF0B88: mca_oob_tcp_send_nb (in /opt/openmpi-1.4.3/lib/openmpi/mca_oob_tcp.so)
 ==1531==    by 0x101AC7F48: orte_rml_oob_send (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
 ==1531==    by 0x101AC8AA1: orte_rml_oob_send_buffer (in /opt/openmpi-1.4.3/lib/openmpi/mca_rml_oob.so)
 ==1531==    by 0x101B3489E: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so) 
 ==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)
 ==1531==  Address 0x101a8911b is 107 bytes inside a block of size 256 alloc'd
 ==1531==    at 0x10002DB2D: realloc (vg_replace_malloc.c:525)
 ==1531==    by 0x1012240B6: opal_dss_buffer_extend (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
 ==1531==    by 0x101225CF7: opal_dss_copy_payload (in /opt/openmpi-1.4.3/lib/libopen-pal.0.dylib)
 ==1531==    by 0x101B347CA: allgather (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x101B3525D: modex (in /opt/openmpi-1.4.3/lib/openmpi/mca_grpcomm_bad.so)
 ==1531==    by 0x1000A48E6: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)

After execution, a small snippet of the heap summary looks like this:

 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,950 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100077C96: create_comm (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10007798A: ompi_attr_create_predefined (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000737CF: ompi_attr_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000A4840: ompi_mpi_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000F7806: MPI_Init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100001AF2: main (main.cpp:34)

...

 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,952 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1065ACFE6: ???
 ==1531==    by 0x10658867B: ???
 ==1531==    by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100179985: mca_io_base_file_select (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100089D55: ompi_file_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1000E1ED1: MPI_File_open (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531== 
 ==1531== 88 bytes in 1 blocks are definitely lost in loss record 1,953 of 2,194
 ==1531==    at 0x10002D915: malloc (vg_replace_malloc.c:236)
 ==1531==    by 0x100073888: opal_obj_new (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073808: opal_obj_new_debug (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073C17: ompi_attr_create_keyval_impl (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x100073FCF: ompi_attr_create_keyval (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x10014CEC5: PMPI_Keyval_create (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)
 ==1531==    by 0x1065A6210: ???
 ==1531==    by 0x106597149: ???
 ==1531==    by 0x106596AAB: ???
 ==1531==    by 0x1065AD14C: ???
 ==1531==    by 0x10658867B: ???
 ==1531==    by 0x10017A591: module_init (in /opt/openmpi-1.4.3/lib/libmpi.0.dylib)


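I can't speak to Open MPI's behavior under Valgrind, but MPICH2 should be better about this. If you don't specifically need Open MPI as your MPI implementation, you can configure MPICH2 to avoid these problems with Valgrind, for example by building it with --enable-g=dbg,meminit (the meminit option pre-initializes internal structures so that Valgrind does not flag accesses to them).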


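You can add additional suppressions for Valgrind yourself, either in a file of your own passed with a second --suppressions option or appended to the OpenMPI suppression file. These will take care of the first set of warnings that you posted: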

{
  ORTE OOB suppression rule
  Memcheck:Param
  writev(vector[...])
  fun:writev
  ...
  fun:mca_oob_tcp_peer_send
  fun:mca_oob_tcp_send_nb
  fun:orte_rml_oob_send
  fun:orte_rml_oob_send_buffer
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:malloc
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:realloc
  ...
  fun:ompi_mpi_init
}

{
  OMPI init leak
  Memcheck:Leak
  fun:calloc
  ...
  fun:ompi_mpi_init
}