[FLASH-USERS] MPI failure

Peter Vitello vitello at llnl.gov
Mon Jun 11 17:54:01 CDT 2007


Thanks for the suggestion, but it doesn't look like my MPI
out-of-memory failure is due to a limited stack size.
The results from ulimit are below, and the stack size is
unlimited.  Even so, I don't know how much memory is actually
available.  Does anyone know where else to look for what would cause
FLASH 2.5 to generate:

Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(144): MPI_Irecv(buf=0x124f8cc0, count=12, MPI_DOUBLE_PRECISION, src=21, tag=440, MPI_COMM_WORLD, request=0x137dab7c) failed
MPID_Irecv(74): Out of memory

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 37376
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 37376
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
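
The listing above only shows what the interactive shell reports; the
processes started by the MPI launcher may inherit different limits.
One way to check from inside a process is sketched below in Python,
assuming a Unix node with Python's standard "resource" module and,
for the memory figures, a Linux-style /proc/meminfo. The names
LIMITS, format_limit, current_limits and parse_meminfo are made up
for this illustration and are not part of FLASH or MPI.

```python
import resource

# Limits most relevant to an out-of-memory failure (values are in bytes).
# This selection is an assumption made for illustration.
LIMITS = {
    "stack size": resource.RLIMIT_STACK,
    "data seg size": resource.RLIMIT_DATA,
    "virtual memory": resource.RLIMIT_AS,
    "max locked memory": resource.RLIMIT_MEMLOCK,
}


def format_limit(value):
    """Render a limit as 'unlimited' or as a number of bytes."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} as seen by the calling process."""
    return {
        name: tuple(format_limit(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as huge page counts)
    are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:20s} soft={soft} hard={hard}")
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        print("MemTotal (kB):", mem.get("MemTotal"))
        print("MemFree  (kB):", mem.get("MemFree"))
    except OSError:
        print("/proc/meminfo is not available on this system")
```

Running the script through the same launcher and batch system as the
FLASH job, rather than from a login shell, would show the limits and
memory on the compute nodes themselves.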

Thanks for any help.

Peter Vitello
LLNL

At 02:46 PM 6/11/2007, you wrote:

>>Perhaps you have checked, but the "Out of memory" suggests exactly
>>that. Was the block count on each processor approaching MAXBLOCKS?
>>I have seen crashes with non-intuitive mpi errors when the number 
>>of blocks on the processors gets close to the maximum and the code
>>runs out of memory. You might try the run again on more processors.
>
>I have fouled up the usage of more codes than most
>people have ever heard of: if it was an out-of-memory problem and
>maxblocks was not being reached, check to make sure your stacksize is
>not limited.  You should be able to run the Unix command "limit" and
>see something like
>
>% limit
>cputime      unlimited
>filesize     unlimited
>datasize     unlimited
>stacksize    unlimited
>coredumpsize 0 kbytes
>memoryuse    unlimited
>vmemoryuse   unlimited
>descriptors  1024
>memorylocked 32 kbytes
>maxproc      98304
>
>If that shows some smaller limit, enter the line
>     unlimit stacksize
>in your .cshrc file (for bash, put "ulimit -s unlimited" in .bashrc
>instead) so that all new processes (esp. the MPI ones) are started
>without the stacksize limited.
>
>This will only cause you trouble if your code is starting a lot of 
>Java virtual machines, or you are directly using pthreads, which are
>both unusual for most HPC MPI codes.
>
>This has been an irritating bug encountered so many times that I've
>started having our sysadmin add it to the default startup file
>for every new student I get. A limited stacksize always causes
>troubles that seem far removed from the root cause.  I strongly
>suspect that by now LLNL has already done this by default for most
>users, but if you copied over a .cshrc file from another machine it
>may have been overwritten or overridden.
>
>And if this was the problem, feel free to post this to the rest of the
>flash-user list. I have no sense of shame anymore, and don't mind
>everyone knowing about how many times I've made this same mistake!


