I'm trying to run NAS-UPC benchmarks on a 32 node cluster.
It works fine when the problem size is small. When I move up to a bigger problem size (CLASS D), I get this error (for the MG benchmark):
*** Caught a fatal signal: SIGBUS(7) on node 2/32
p4_error: latest msg from perror: Bad file descriptor
*** Caught a signal: SIGPIPE(13) on node 0/32
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 27/32
*** Caught a signal: SIGPIPE(13) on node 20/32
*** Caught a signal: SIGPIPE(13) on node 21/32
p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 16/32
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
Can anybody explain why this is happening? Has anyone seen this error before and fixed it?
EDIT: I figured out that it is a memory-related problem, but I'm unable to allot the right amount of memory for the application at compile time.
Check the dmesg output: it may be an out-of-memory issue. Or one of the limits shown by ulimit -a may have been hit, e.g. the stack size (the default stack size is too small for some NAS tasks).
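To make the dmesg check easier to repeat on every node, here is a minimal sketch that scans the dmesg output for OOM-killer messages. The function names (find_oom_kills, read_dmesg) are my own, and the exact message wording differs between kernel versions, so treat the pattern as an assumption and adjust it to what your kernels actually print.

```python
import re
import subprocess

# Assumed wording: "Out of memory: Killed process 1234" on newer kernels,
# "Out of Memory: Kill process 1234" on some older ones. Adjust if yours differs.
OOM_PATTERN = re.compile(r"out of memory: kill(?:ed)? process (\d+)", re.IGNORECASE)


def find_oom_kills(log_text):
    """Return the PIDs (as ints) named in OOM-killer lines of log_text."""
    return [int(m.group(1)) for m in OOM_PATTERN.finditer(log_text)]


def read_dmesg():
    """Return the dmesg output, or an empty string if dmesg cannot be run."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    pids = find_oom_kills(read_dmesg())
    print("OOM-killed PIDs:", pids if pids else "none found")
```

Run it on each machine of the cluster (reading dmesg may need root on some systems); any PID it reports belongs to a process the kernel killed for lack of memory.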
If you have lines like "Out of Memory: Killed process ###" in the dmesg output on any of your machines, it means that your program required (and tried to use) more memory than the OS can give to the application. There are several memory limits:
- ulimit -v: the user limit for virtual memory size. Check all the ulimit -a limits too, although this does not seem to be your case.
- You cannot use more memory than your total RAM plus all swap space (check with the free command). But if your application uses more memory than the RAM size and begins swapping, performance will be bad (in most cases).
- There are architectural limits on the maximum memory a single process can have. For 32-bit nodes this limit can be anywhere from 1 GB (a very rare case) to 2, 3 or 4 GB. Even if your 32-bit system has more than 4 GB of memory, e.g. by using PAE, no single process can take more than 4 GB. A big part of the 4 GB virtual address space is also taken by the OS (from hundreds of MB up to several GB).
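The three limits above can be inspected from a script as well as from the shell. This is only a sketch, assuming a Linux node with /proc/meminfo; the helper names (describe, read_meminfo, ram_plus_swap_kb) are made up for illustration. It uses the standard resource module, where RLIMIT_AS corresponds to ulimit -v and RLIMIT_STACK to ulimit -s.

```python
import resource


def describe(limit):
    """Format one rlimit value in MiB; RLIM_INFINITY means no limit is set."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit / 2**20:.1f} MiB"


def read_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value
    (the size fields such as MemTotal and SwapTotal are in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def ram_plus_swap_kb(info):
    """Upper bound on usable memory: total RAM plus total swap, in kB."""
    return info.get("MemTotal", 0) + info.get("SwapTotal", 0)


if __name__ == "__main__":
    checks = (
        ("virtual memory (ulimit -v)", resource.RLIMIT_AS),
        ("stack size (ulimit -s)", resource.RLIMIT_STACK),
    )
    for label, which in checks:
        soft, hard = resource.getrlimit(which)
        print(f"{label}: soft={describe(soft)}, hard={describe(hard)}")

    try:
        with open("/proc/meminfo") as f:
            info = read_meminfo(f.read())
        # kB divided by 2**20 gives GiB
        print(f"RAM + swap: {ram_plus_swap_kb(info) / 2**20:.1f} GiB")
    except OSError:
        print("RAM + swap: /proc/meminfo not available on this system")

    # A 32-bit pointer can address at most 2**32 bytes, PAE or not.
    print(f"32-bit per-process ceiling: {2**32 / 2**30:.0f} GiB")
```

Note that the limits reported are those of the shell that launches the script, which may differ from the limits your job launcher gives to the benchmark processes on the remote nodes.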
I figured out it was a problem with the benchmark needing more memory than I had allotted to it at compile time.
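For a rough idea of how much memory to allot, a back-of-the-envelope estimate helps. The sketch below is not taken from the benchmark source: the grid size of 1024 points per side for class D and the count of three full-size arrays of doubles are assumptions on my part, so check your benchmark's own parameter file before relying on the numbers. The function name per_thread_gib is hypothetical.

```python
def per_thread_gib(points_per_side, full_size_arrays, bytes_per_value, threads):
    """Estimate memory per thread in GiB for a cubic grid split evenly across threads."""
    total_bytes = points_per_side ** 3 * full_size_arrays * bytes_per_value
    return total_bytes / threads / 2**30


if __name__ == "__main__":
    # Assumed inputs: 1024^3 grid, 3 arrays of 8-byte doubles, 32 threads.
    estimate = per_thread_gib(1024, 3, 8, 32)
    print(f"Estimated memory per thread: {estimate:.2f} GiB")
```

Under those assumptions each of the 32 threads needs about 0.75 GiB for the main arrays alone, before any runtime overhead, so whatever is allotted per thread has to be comfortably above that.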