
Caught a fatal signal: SIGBUS(7) on node 2/32

https://www.devze.com 2023-02-19 07:13 (source: web)

I'm trying to run NAS-UPC benchmarks on a 32 node cluster.

It works fine when the problem size is small. When I move up to a bigger problem size (CLASS D), I get this error (for the MG benchmark):

*** Caught a fatal signal: SIGBUS(7) on node 2/32
    p4_error: latest msg from perror: Bad file descriptor
*** Caught a signal: SIGPIPE(13) on node 0/32
    p4_error: latest msg from perror: Bad file descriptor
    p4_error: latest msg from perror: Bad file descriptor

*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 27/32
*** Caught a signal: SIGPIPE(13) on node 20/32
*** Caught a signal: SIGPIPE(13) on node 21/32
    p4_error: latest msg from perror: Bad file descriptor
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit
*** Caught a signal: SIGPIPE(13) on node 16/32
*** FATAL ERROR: recursion failure in AMMPI_SPMDExit

Can anybody explain why this is happening? Has anyone seen this error before and fixed it?

EDIT: I figured out it is a memory-related problem, but I'm unable to allot the right amount of memory to the application at compile time.


Check the dmesg output - it can be an out-of-memory issue. Or it can be that one of the ulimit -a limits was hit, e.g. the stack size (the default stack size is too small for some NAS tasks).

If you see lines like "Out of Memory: Killed process ###" in the dmesg output on any of your machines, it means that your program required (and tried to use) more memory than the OS could give it. There are several limits on memory:

  1. ulimit -v - the user limit on virtual memory size. Check all the ulimit -a limits as well, though it seems your case is not this one.
  2. You cannot use more memory than your total RAM plus all swap (check with the free command). And if your application uses more memory than RAM alone and begins to swap, performance will be bad in most cases.
  3. There are architectural limits on the maximum memory a single process may have. On 32-bit nodes this limit can range from 1 GB (a very rare case) to 2, 3, or 4 GB. Even if your 32-bit system has more than 4 GB of memory, e.g. by using PAE, no single process can take more than 4 GB, and a large part of that 4 GB virtual address space is also taken by the OS (from hundreds of MB up to several GB).
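The checks above can be gathered into a small script and run on each node (e.g. over ssh). Everything here is a standard Linux tool; only the exact messages your kernel logs may differ:

```shell
#!/bin/bash
# Per-node check of the limits discussed above.
echo "virtual memory limit (KB): $(ulimit -v)"
echo "stack size limit (KB):     $(ulimit -s)"
# Total RAM and swap; a CLASS D run should fit in RAM to avoid swapping.
free -m 2>/dev/null || echo "free not available"
# OOM-killer messages; reading the kernel log may require root.
dmesg 2>/dev/null | grep -i "out of memory" || echo "no OOM-killer messages found"
```

Running it on the node that reported SIGBUS first is the quickest way to tell an OS limit apart from an application-level allocation failure.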


I figured out the benchmark needs more memory than I had allotted to it at compile time.
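If the shortfall is the UPC shared heap rather than an OS limit, you may not need to recompile at all: the Berkeley UPC runtime lets you size the shared heap at launch. A sketch of the commands, assuming the Berkeley UPC toolchain and the usual NPB naming (the mg.D.32 binary name and the make variables are conventions that vary between NAS-UPC ports; verify the flag with upcrun -help on your installation):

```shell
# Rebuild for the larger class (NPB convention; your makefile may use
# NP= or THREADS= instead of NPROCS=):
#   make mg CLASS=D NPROCS=32

# Raise the per-thread shared heap when launching (Berkeley UPC runtime):
upcrun -shared-heap=2GB -n 32 ./mg.D.32
```

If the run still dies with SIGBUS after raising the shared heap, the next suspects are the per-node physical memory and the 32-bit per-process address-space ceiling described in the answer above.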
