From woitke at strw.leidenuniv.nl Thu Dec 2 07:14:47 2004
From: woitke at strw.leidenuniv.nl (Peter Woitke)
Date: Thu, 2 Dec 2004 14:14:47 +0100 (MET)
Subject: [FLASH-BUGS] MPI_REQUEST_MAX
Message-ID:

Dear developers,

for long runs (more than 12 hours on 32 Processors) my flash-job
crashes with error message

*** MPI has run out of request entries.
*** The current allocation level is:
*** MPI_REQUEST_MAX = 16384

Here is some advice from google:
http://www.cray.com/craydoc/manuals/004-3687-001/html-004-3687-001/zfixe
dlqtgeczn.htmlD
8.5. Why do I keep getting error messages about MPI_REQUEST_MAX being
too small, no matter how large I set it?
You are probably calling MPI_Isend(3) or MPI_Irecv(3) and not completing
or freeing your request objects. You should use MPI_Request_free(3), as
described in the previous question.

I checked that, indeed, MPI_Isend and MPI_Irecv are called from the
FLASH-code, but MPI_Request_free is never called.

Have you thought about this?

Thanx,

Peter Woitke

From tomek at flash.uchicago.edu Thu Dec 2 09:31:44 2004
From: tomek at flash.uchicago.edu (Tomasz Plewa)
Date: Thu, 2 Dec 2004 09:31:44 -0600
Subject: [FLASH-BUGS] MPI_REQUEST_MAX
In-Reply-To: ; from woitke@strw.leidenuniv.nl on Thu, Dec 02, 2004 at 02:14:47PM +0100
References:
Message-ID: <20041202093144.A25957@flash.uchicago.edu>

Peter -

This is the first time I have seen this kind of error. Can you
associate this error with any part of the code? Having a checkpoint
file saved prior to the suspected crash and restarting the code would
tell us whether the error is really due to an accumulation of requests.
Writing some trace messages from the master processor around the major
modules in the flash.F90 evolution loop would tell us where this error
might be occurring.

Tomek

--
On Thu, Dec 02, 2004 at 02:14:47PM +0100, Peter Woitke wrote:
> Dear developers,
>
> for long runs (more than 12 hours on 32 Processors) my flash-job
> crashes with error message
>
> *** MPI has run out of request entries.
> *** The current allocation level is:
> *** MPI_REQUEST_MAX = 16384
>
> Here is some advice from google:
> http://www.cray.com/craydoc/manuals/004-3687-001/html-004-3687-001/zfixe
> dlqtgeczn.htmlD
> 8.5. Why do I keep getting error messages about MPI_REQUEST_MAX being
> too small, no matter how large I set it?
> You are probably calling MPI_Isend(3) or MPI_Irecv(3) and not completing
> or freeing your request objects. You should use MPI_Request_free(3), as
> described in the previous question.
>
> I checked that, indeed, MPI_Isend and MPI_Irecv are called from the
> FLASH-code, but MPI_Request_free is never called.
>
> Have you thought about this?
>
> Thanx,
>
> Peter Woitke

--
Thu, 09:31 CST (15:31 GMT), Dec-02-2004
_______________________________________________________________________________
Tomasz Plewa                                       www: flash.uchicago.edu
Computational Physics and Validation Group         email: tomek at uchicago.edu
The ASC FLASH Center, The University of Chicago    phone: 773.834.3227
5640 South Ellis Ave, RI 475, Chicago, IL 60637    fax: 773.834.3230
_______________________________________________________________________________

From lusk at mcs.anl.gov Thu Dec 2 09:36:34 2004
From: lusk at mcs.anl.gov (Rusty Lusk)
Date: Thu, 02 Dec 2004 09:36:34 -0600 (CST)
Subject: [FLASH-BUGS] MPI_REQUEST_MAX
In-Reply-To:
References:
Message-ID: <20041202.093634.37225370.lusk@localhost>

You should not need to call MPI_Request_free if your Isend's and Irecv's
are waited on with MPI_Wait or MPI_Waitall. So it is not an obviously
bad thing if there are no explicit calls to MPI_Request_free. But
something is causing you to have lots of requests active at one time.
16,000 is a lot for a single process.

Rusty Lusk

From: Peter Woitke
Subject: [FLASH-BUGS] MPI_REQUEST_MAX
Date: Thu, 2 Dec 2004 14:14:47 +0100 (MET)

> Dear developers,
>
> for long runs (more than 12 hours on 32 Processors) my flash-job
> crashes with error message
>
> *** MPI has run out of request entries.
> *** The current allocation level is:
> *** MPI_REQUEST_MAX = 16384
>
> Here is some advice from google:
> http://www.cray.com/craydoc/manuals/004-3687-001/html-004-3687-001/zfixe
> dlqtgeczn.htmlD
> 8.5. Why do I keep getting error messages about MPI_REQUEST_MAX being
> too small, no matter how large I set it?
> You are probably calling MPI_Isend(3) or MPI_Irecv(3) and not completing
> or freeing your request objects. You should use MPI_Request_free(3), as
> described in the previous question.
>
> I checked that, indeed, MPI_Isend and MPI_Irecv are called from the
> FLASH-code, but MPI_Request_free is never called.
>
> Have you thought about this?
>
> Thanx,
>
> Peter Woitke
>

From rloy at mcs.anl.gov Thu Dec 2 13:13:13 2004
From: rloy at mcs.anl.gov (Ray Loy)
Date: Thu, 02 Dec 2004 13:13:13 -0600
Subject: [FLASH-BUGS] MPI_REQUEST_MAX
In-Reply-To: Message from Rusty Lusk of "Thu, 02 Dec 2004 09:36:34 CST." <20041202.093634.37225370.lusk@localhost>
Message-ID: <200412021913.iB2JDDN20797@shakey.mcs.anl.gov>

> Rusty Lusk wrote:
> ----
> You should not need to call MPI_Request_free if your Isend's and Irecv's
> are waited on with MPI_Wait or MPI_Waitall.

That is in fact the case, at least in the routines that do the bulk of
the MPI. (Assuming a compile with Paramesh 2; I'm not as familiar
with P3).

-r

From dubey at tagore.uchicago.edu Thu Dec 2 16:22:14 2004
From: dubey at tagore.uchicago.edu (Anshu Dubey)
Date: Thu, 2 Dec 2004 16:22:14 -0600 (CST)
Subject: [FLASH-BUGS] MPI_REQUEST_MAX
In-Reply-To: <200412021913.iB2JDDN20797@shakey.mcs.anl.gov>
Message-ID:

This exact problem surfaced once in the past, and it turned out that
one specific non-blocking call didn't have a corresponding wait. That
was fixed (as far as I know). It is possible that we overlooked another
one. Peter, could you give us more specifics about when you run into
the problem, so we can investigate?
Thanks

On Thu, 2 Dec 2004, Ray Loy wrote:

>
> > Rusty Lusk wrote:
> > ----
> > You should not need to call MPI_Request_free if your Isend's and Irecv's
> > are waited on with MPI_Wait or MPI_Waitall.
>
>
> That is in fact the case, at least in the routines that do the bulk of
> the MPI. (Assuming a compile with Paramesh 2; I'm not as familiar
> with P3).
>
>
> -r
>

--
Anshu Dubey
Astronomy & Astrophysics
University of Chicago
5640 S. Ellis Ave.
Chicago IL 60637
Tel : (773) 834 2999
Fax : (773) 834 3230
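
The thread comes down to one question: does every non-blocking call that creates a request (MPI_Isend, MPI_Irecv) have a matching call that completes or releases it (MPI_Wait, MPI_Waitall, MPI_Test, MPI_Request_free)? As a rough first pass, that can be checked by scanning the source text. What follows is only a sketch in Python and is not part of FLASH. The names audit_source and looks_leaky and the two sample snippets are hypothetical, made up for illustration. The scan is a crude heuristic: it counts call names per file, strips Fortran 90 "!" comments without understanding string literals, and cannot tell whether a given wait actually covers a given request.

```python
import re

# Non-blocking point-to-point calls that create a request handle.
START = re.compile(r"\bMPI_I(?:send|recv|ssend|bsend|rsend)\b", re.IGNORECASE)

# Calls that complete or release a request handle.
FINISH = re.compile(
    r"\bMPI_(?:Wait(?:all|any|some)?|Test(?:all|any|some)?|Request_free)\b",
    re.IGNORECASE,
)


def audit_source(text):
    """Return (started, finished): counts of request-creating and
    request-completing MPI call names found in the given source text."""
    # Drop Fortran 90 '!' comments (crude: a '!' inside a string also cuts the line).
    code = "\n".join(line.split("!", 1)[0] for line in text.splitlines())
    started = len(START.findall(code))
    finished = len(FINISH.findall(code))
    return started, finished


def looks_leaky(text):
    """Flag source that creates requests but never completes or frees any.

    One MPI_Waitall can complete many requests, so the two counts are not
    compared directly; only the 'no completion call at all' case is flagged.
    """
    started, finished = audit_source(text)
    return started > 0 and finished == 0


# Hypothetical snippets, written only to exercise the scan.
SAMPLE_OK = """
      call MPI_IRECV(rbuf, n, MPI_REAL, src, tag, comm, req(1), ierr)
      call MPI_ISEND(sbuf, n, MPI_REAL, dst, tag, comm, req(2), ierr)
      call MPI_WAITALL(2, req, stats, ierr)
"""

SAMPLE_LEAK = """
      call MPI_Isend(sbuf, n, MPI_REAL, dst, tag, comm, req, ierr)  ! MPI_Wait missing
"""

if __name__ == "__main__":
    for name, src in (("SAMPLE_OK", SAMPLE_OK), ("SAMPLE_LEAK", SAMPLE_LEAK)):
        print(name, audit_source(src), "suspicious" if looks_leaky(src) else "ok")
```

Following Rusty's point, the scan treats waits and tests as completing a request, so a file with no MPI_Request_free call is not flagged on that basis alone. A file that the scan flags is only a candidate for a closer look, and a file that passes may still leak a request on some code path.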