From klaus at flash.uchicago.edu Tue Sep 1 10:26:54 2009
From: klaus at flash.uchicago.edu (Klaus Weide)
Date: Tue, 1 Sep 2009 10:26:54 -0500 (CDT)
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org>
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org>
Message-ID:

On Mon, 31 Aug 2009, James Guillochon wrote:

> Hi all,
>
> I'm trying to restart a FLASH simulation on Franklin. If I run on 2000 cpus,
> the job runs fine, however if I try to push the code to 4000 cpus, I get the
> following error:
>
> abort_message [flash_convert_cc_hook] Trying to convert non-zero mass-specific
> variable to per-volume form, but dens is zero!

James,

As you have probably noticed, the problem (or at least the symptom) is essentially the same as what you reported to flash-users in April: unwanted zero values in DENS_VAR. The code in flash_convert_cc_hook, which triggers the abort, does essentially the same as the code in amr_prolong_gen_unk1_fun, where the problem showed up in your previous report. I don't know what was ultimately the cause of the previous problem, but you have solved it somehow; could the cause be similar this time?

Please remind us whether you are using the latest released version of FLASH. Also,
- does this problem occur immediately after restart, or some time later?
- What are the last log file messages before the abort?
- There should also have been a message on standard output, with additional information (PE, ivar, and value). Do you have that?
- Does the problem occur on several PEs at the same time (you should then see several of the standard output messages), or only on one CPU?

Klaus

From jfg at ucolick.org Tue Sep 1 12:31:57 2009
From: jfg at ucolick.org (James Guillochon)
Date: Tue, 1 Sep 2009 10:31:57 -0700
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To:
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org>
Message-ID: <7035E0FD-DFD9-49A2-A1A2-07B389C4588D@ucolick.org>

Hi Klaus,

I am using FLASH 3.0. The problem occurs immediately after restart, before the first time step. Here's a copy of the log before aborting:

[ 08-31-2009 21:56:16.578 ] [GRID amr_refine_derefine]: initiating refinement
[GRID amr_refine_derefine] min blks 17 max blks 21 tot blks 76393
[GRID amr_refine_derefine] min leaf blks 13 max leaf blks 17 tot leaf blks 66844
[ 08-31-2009 21:56:16.655 ] [GRID amr_refine_derefine]: refinement complete
[DRIVER_ABORT] Driver_abort() called by PE 4093
abort_message [flash_convert_cc_hook] Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!

Here's the standard output:

file: wdacc_hdf5_chk_00170 opened for restart
read_data: read 76393 blocks.
io_readData: finished reading input file.
[Eos_init] Cannot open helm_table.bdat!
[Eos_init] Trying old helm_table.dat!
Source terms initialized
don_dist, don_mass 3359870241.058920 6.5677519229596569E+032
[EOS Helmholtz] WARNING! Mask setting does not speed up Eos Helmholtz calls
iteration, no. not moved = 0 76246
iteration, no. not moved = 1 42904
iteration, no. not moved = 2 0
refined: total leaf blocks = 66844
refined: total blocks = 76393
[flash_convert_cc_hook] PE= 4093, ivar= 4, why=2
Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!
Application 1231431 exit codes: 1
Application 1231431 exit signals: Killed
Application 1231431 resources: utime 0, stime 0

The error seems to be happening on one of the very last indexed processors (there are 4096 processors in total, and the error is happening on 4093), and only on one of them.

I've tried enabling "amr_error_checking" to dump some additional information, but if I enable that option I end up with segmentation faults on all processors. Here's the standard output just before crashing:

mpi_amr_1blk_restrict: after commsetup: pe 3
mpi_amr_1blk_restrict: after commsetup: pe 2
mpi_amr_1blk_restrict: pe 3 blk 10 ich = 1
mpi_amr_1blk_restrict: pe 3 blk 10 child = 3 11
mpi_amr_1blk_restrict: pe 3 blk 10 cnodetype = 1
mpi_amr_1blk_restrict: pe 3 blk 10 cempty = 0
mpi_amr_1blk_restrict: pe 2 blk 1 ich = 1
mpi_amr_1blk_restrict: pe 3 blk 10 calling perm
mpi_amr_1blk_restrict: pe 2 blk 1 child = 2 2
mpi_amr_1blk_restrict: pe 2 blk 1 cnodetype = 1
mpi_amr_1blk_restrict: pe 2 blk 1 cempty = 0
mpi_amr_1blk_restrict: pe 2 blk 1 calling perm
mpi_amr_1blk_restrict: pe 1 blk 2 after reset blockgeom
mpi_amr_1blk_restrict: pe 1 blk 2 bef reset amr_restrict_unk_fun
mpi_amr_1blk_restrict: pe 3 blk 10 exited perm
mpi_amr_1blk_restrict: pe 2 blk 1 exited perm
mpi_amr_1blk_restrict: pe 3 blk 10 calling blockgeom
mpi_amr_1blk_restrict: pe 1 blk 2 aft reset amr_restrict_unk_fun
mpi_amr_1blk_restrict: pe 2 blk 1 calling blockgeom
mpi_amr_1blk_restrict: pe 1 blk 2 aft lcc
mpi_amr_1blk_restrict: pe 1 blk 2 ich = 3
mpi_amr_1blk_restrict: pe 1 blk 2 child = 1 5
mpi_amr_1blk_restrict: pe 1 blk 2 cnodetype = 1
mpi_amr_1blk_restrict: pe 1 blk 2 cempty = 0
mpi_amr_1blk_restrict: pe 1 blk 2 calling perm
Application 1231925 exit codes: 139
Application 1231925 exit signals: Killed
Application 1231925 resources: utime 4885, stime 185

Unfortunately, I wasn't able to pin down what I actually did to fix the problem I had with zero density values a few months ago. I had been trying many different things, including changing bits of the actual Simulation code.

Your help is very much appreciated!

--
James Guillochon
Department of Astronomy & Astrophysics
University of California, Santa Cruz
jfg at ucolick.org

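For readers following the thread, the abort message describes a guard on the conversion between mass-specific and per-volume form. Below is a minimal Python sketch of that kind of guard, assuming the conversion is a plain multiplication by density. It is not FLASH's Fortran code: the function names `to_per_volume` and `to_mass_specific` and the sample values are hypothetical, chosen only to show why a zero density next to a non-zero mass-specific value cannot be converted and then recovered.

```python
def to_per_volume(dens, specific):
    """Convert mass-specific values to per-volume form (value * density).

    Hypothetical helper for illustration only. It refuses any cell where the
    density is zero but the mass-specific value is not, because the product
    would be zero and the original value could not be recovered afterwards.
    """
    converted = []
    for cell, (rho, x) in enumerate(zip(dens, specific)):
        if rho == 0.0 and x != 0.0:
            raise ValueError(
                f"cell {cell}: non-zero mass-specific value but dens is zero"
            )
        converted.append(rho * x)
    return converted


def to_mass_specific(dens, per_volume):
    """Inverse conversion; cells with zero density are left at zero."""
    return [pv / rho if rho != 0.0 else 0.0 for rho, pv in zip(dens, per_volume)]


if __name__ == "__main__":
    dens = [2.0, 4.0]
    specific = [0.5, 0.25]
    per_volume = to_per_volume(dens, specific)
    print(per_volume)
    print(to_mass_specific(dens, per_volume))
```

In this sketch a single cell with zero density and a non-zero mass-specific value is enough to stop the conversion, which matches the symptom of one PE reporting the abort.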
From seyit at astro.rug.nl Wed Sep 2 04:00:28 2009
From: seyit at astro.rug.nl (Seyit Hocuk)
Date: Wed, 02 Sep 2009 11:00:28 +0200
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To: <7035E0FD-DFD9-49A2-A1A2-07B389C4588D@ucolick.org>
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org> <7035E0FD-DFD9-49A2-A1A2-07B389C4588D@ucolick.org>
Message-ID: <4A9E342C.6060501@astro.rug.nl>

Hi,

Just a quick note on this. For me, using -O3 in the compiler options got rid of the unwanted zero values (and the weird refinement boundary fluctuations). Not using -O3 with Flash3 creates lots of problems.

I hope this is of some use to you.

Seyit

From jfg at ucolick.org Fri Sep 4 19:10:14 2009
From: jfg at ucolick.org (James Guillochon)
Date: Fri, 4 Sep 2009 17:10:14 -0700
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To:
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org>
Message-ID: <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org>

Some more information on this issue:

I compiled FLASH with -debug and I get the following errors:

0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1
0: Subscript out of range for array loc_message_size (mpi_pack_blocks.F90: 146)
subscript=-1, lower bound=1, upper bound=54, dimension=1

Etc, etc. The -1 subscript is coming from the "to_be_sent" array, which by default is initialized as an array with all entries = -1, but is populated by MPI calls in another function. So it seems like the error leading to my crash is somewhere in the PM3 mpi_source directory.

Anyone familiar with that part of the code? I would upgrade to 3.2 to see if PM4 fixes the problem, but I am not sure if I can restart from my 3.0 checkpoint if I do that.

Thanks all!

--
James Guillochon
Department of Astronomy & Astrophysics
University of California, Santa Cruz
jfg at ucolick.org

From jzuhone at cfa.harvard.edu Fri Sep 4 19:20:09 2009
From: jzuhone at cfa.harvard.edu (John ZuHone)
Date: Fri, 4 Sep 2009 20:20:09 -0400
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To: <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org>
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org> <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org>
Message-ID: <07ECFCCD-665C-40B2-A6A2-72CDABAC63E5@cfa.harvard.edu>

James,

My (hopefully educated) guess based on my experience is that a restart with the later version should work, as the data structures stored in the checkpoint are the same (someone please correct me if not!). If it does, you may of course want to try an earlier checkpoint to make sure FLASH gives the same answers. Of course this doesn't resolve the bug but would be a nice check.

Best,

John

==========================
Sent from John ZuHone's iPhone

From klaus at flash.uchicago.edu Fri Sep 4 22:57:58 2009
From: klaus at flash.uchicago.edu (Klaus Weide)
Date: Fri, 4 Sep 2009 22:57:58 -0500 (CDT)
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To: <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org>
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org> <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org>
Message-ID:

On Fri, 4 Sep 2009, James Guillochon wrote:

> Etc, etc. The -1 subscript is coming from the "to_be_sent" array, which by
> default is initialized as an array with all entries = -1, but is populated by
> MPI calls in another function. So it seems like the error leading to my crash
> is somewhere in the PM3 mpi_source directory.
>
> Anyone familiar with that part of the code? I would upgrade to 3.2 to see if
> PM4 fixes the problem, but I am not sure if I can restart from my 3.0
> checkpoint if I do that.

James,

Yes it is a good idea to upgrade to 3.2. Quite a bit of experience with running on large numbers of processors has gone into the code development since version 3.0 was released. So the cause of the problem may well have been fixed, with either the Paramesh4.0 or the Paramesh4dev variants of PARAMESH that come with FLASH3.2.

The format of checkpoint files has changed slightly, but restarting with version 3.2 code from 3.0 checkpoint files should work fine. (Unless the grid state was somehow already corrupt when the checkpoint was written!)

Klaus

From kmur at kft.umcs.lublin.pl Sat Sep 5 04:54:06 2009
From: kmur at kft.umcs.lublin.pl (Kris Murawski)
Date: Sat, 05 Sep 2009 11:54:06 +0200
Subject: [FLASH-USERS] changing description of axis in IDL
Message-ID: <4AA2353E.9020308@kft.umcs.lublin.pl>

Dear FLASH users,

I am using FLASH 3.2 with the default unit system, which is in fact "none". I measure time in seconds, but spatial coordinates are expressed in megameters. While drawing plasma profiles I got x, y and other spatial plasma variables labelled in cgs units. Please have a look at the attached file. Is there any simple way of having "x(Mm)" and "y(Mm)" instead of "x(cm)" and "y(cm)"? Ideally it would be nice to replace "(g/cm^3)" by " ".

Thanks a lot in advance for your time spent on solving my problem.

Happy Flashing,

Kris

From latife at astro.rug.nl Mon Sep 7 07:47:00 2009
From: latife at astro.rug.nl (Latif)
Date: Mon, 07 Sep 2009 14:47:00 +0200
Subject: [FLASH-USERS] WARNING after gc filling
Message-ID: <4AA500C4.9060202@astro.rug.nl>

Hi Dear All,

I am using FLASH 3.2. I am getting the following warnings, which I also saw in FLASH 3.1. After discussion about them, it was suggested that they are not harmful. The same warnings appear in FLASH 3.2; they come from gr_sanitizeDataAfterInterp.F90. If I look at the values in VisIt they seem normal. What is your take on this?

WARNING after gc filling: min. unk(EINT_VAR)=79113608.35385749 PE=0 block=433 type=1

Cheers

Latif

From jfg at ucolick.org Mon Sep 7 19:58:36 2009
From: jfg at ucolick.org (James Guillochon)
Date: Mon, 7 Sep 2009 17:58:36 -0700
Subject: [FLASH-USERS] 4000+ cpus on Franklin
In-Reply-To:
References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org> <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org>
Message-ID: <73FDB89B-1AEF-493A-A6F5-DA397F6E7475@ucolick.org>

OK, I've upgraded to FLASH 3.2...again, I can run the simulation on 2048 processors, but when I try 4096 I get the following errors:

aborting job:
Fatal error in MPI_Irecv: Invalid tag, error stack:
MPI_Irecv(144): MPI_Irecv(buf=0x1373f760, count=378, MPI_INTEGER, src=4095, tag=16777312, MPI_COMM_WORLD, request=0x1377aea8) failed
MPI_Irecv(97).: Invalid tag, value is 16777312
aborting job:
Fatal error in MPI_Irecv: Invalid tag, error stack:
MPI_Irecv(144): MPI_Irecv(buf=0x1373e040, count=378, MPI_INTEGER, src=4095, tag=16777315, MPI_COMM_WORLD, request=0x1377af18) failed
MPI_Irecv(97).: Invalid tag, value is 16777315
aborting job:
Fatal error in MPI_Irecv: Invalid tag, error stack:
MPI_Irecv(144): MPI_Irecv(buf=0x13788c98, count=378, MPI_INTEGER, src=4095, tag=16777313, MPI_COMM_WORLD, request=0x13741674) failed
MPI_Irecv(97).: Invalid tag, value is 16777313
aborting job:
Fatal error in MPI_Irecv: Invalid tag, error stack:
MPI_Irecv(144): MPI_Irecv(buf=0x1373e040, count=378, MPI_INTEGER, src=4095, tag=16777314, MPI_COMM_WORLD, request=0x1377af18) failed
MPI_Irecv(97).: Invalid tag, value is 16777314
aborting job:
Fatal error in MPI_Ssend: Invalid tag, error stack:
MPI_Ssend(167): MPI_Ssend(buf=0x1354d910, count=378, MPI_INTEGER, dest=4091, tag=16777312, MPI_COMM_WORLD) failed
MPI_Ssend(93).: Invalid tag, value is 16777312
[NID 12795]Apid 1253267: initiated application termination

MPI_TAG_UB is about 2 billion on the machine I'm running on, so the tag numbers here don't seem to be out of range...

I also tried switching the paramesh library to paramesh4dev...but for some reason it seems unable to find the amr_runtime_parameters file. I've tried setting "ParameshLibraryMode=true", but that doesn't seem to help. Here is the error:

PGFIO-F-209/OPEN/unit=35/'OLD' specified for file which does not exist.
File name = amr_runtime_parameters
In source file amr_set_runtime_parameters.F90, at line number 86
[NID 1593]Apid 1255265: initiated application termination

Thanks in advance!

--
James Guillochon
Department of Astronomy & Astrophysics
University of California, Santa Cruz
jfg at ucolick.org

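One arithmetic observation about the tags in the "Invalid tag" messages of this thread, offered as a sketch rather than a diagnosis: all of the rejected values lie just above 2**24. Whether the MPI library on this machine enforces an effective limit near 2**24 is an assumption that the thread does not confirm; the snippet only decodes the numbers reported in the log, and the helper name `offsets_from` is hypothetical.

```python
# Tag values copied from the "Invalid tag" messages in the error output.
rejected_tags = [16777312, 16777315, 16777313, 16777314]

# 2**24 is used here only as a reference point. It is an assumption, not a
# documented limit of the MPI library on Franklin.
REFERENCE = 2 ** 24


def offsets_from(reference, tags):
    """Return how far each tag lies above the reference value."""
    return [tag - reference for tag in tags]


if __name__ == "__main__":
    for tag, offset in zip(rejected_tags, offsets_from(REFERENCE, rejected_tags)):
        print(f"tag {tag} = 2**24 + {offset}")
```

A direct check would be to query the MPI_TAG_UB attribute of MPI_COMM_WORLD at run time, on the same machine and with the same MPI library as the failing job, and compare it with these values.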
From phy1erp at leeds.ac.uk Tue Sep 8 04:17:56 2009
From: phy1erp at leeds.ac.uk (Ross Parkin)
Date: Tue, 08 Sep 2009 10:17:56 +0100
Subject: [FLASH-USERS] Improving the load balancing in FLASH
Message-ID: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk>

Hello,

I've been thinking about the load balancing method used in FLASH. The timestep used by the hydro units is the same on all refinement levels, and so blocks are essentially distributed by paramesh so that all processors have roughly the same number of blocks. Has anyone tried modifying FLASH so that each refinement level has its own timestep? To do this properly you would need to modify how PARAMESH distributes blocks amongst processors, taking account of the relative work done by each block on each refinement level, i.e. a block on a finer refinement level may need to do twice as many hydro loops before its simulation time equals that of a coarser block.

Any ideas guys?

Cheers,

Ross Parkin

From dubey at flash.uchicago.edu Tue Sep 8 08:25:04 2009
From: dubey at flash.uchicago.edu (Anshu Dubey)
Date: Tue, 8 Sep 2009 08:25:04 -0500
Subject: [FLASH-USERS] Improving the load balancing in FLASH
In-Reply-To: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk>
References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk>
Message-ID: <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com>

This was considered several years ago. Then Mike Zingale and Jonathan Dursi proved that the computational savings from coarse blocks having a coarse time step are too insignificant to be worth pursuing. This is because the computational time in the finer blocks completely dominates that in coarser blocks for the same area of computational domain. I believe that study appeared in some proceedings; if you are interested I can dig up the reference.

Anshu

From oshea at msu.edu Tue Sep 8 08:35:48 2009
From: oshea at msu.edu (Brian O'Shea)
Date: Tue, 8 Sep 2009 09:35:48 -0400
Subject: [FLASH-USERS] Improving the load balancing in FLASH
In-Reply-To: <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com>
References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com>
Message-ID:

In all fairness, the statement that Anshu made depends a lot on the type of problem you are using. For example, the Enzo AMR code is mainly used for astrophysical applications that are gravity-dominated, so most of the action takes place in a very small fraction of the simulation volume, and can use many levels of refinement. The only way that this sort of problem is tractable is by using adaptive time-stepping. One of the main complications of such an adaptive time-stepping scheme is keeping all of the mesh levels synched up. It also makes load-balancing a significant challenge - one is forced to distribute grids on a level-by-level basis, and if only a small fraction of the volume is refined (as in our cosmology simulations...) you destroy grid locality, which increases communication and decreases scalability.

Brian

From mzingale at scotty.ess.sunysb.edu Tue Sep 8 08:43:04 2009
From: mzingale at scotty.ess.sunysb.edu (Mike Zingale)
Date: Tue, 8 Sep 2009 09:43:04 -0400 (EDT)
Subject: [FLASH-USERS] Improving the load balancing in FLASH
In-Reply-To:
References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com>
Message-ID:

Brian is right -- the situation is entirely problem dependent. For cases where you are locally refining only a small region of the domain, subcycling in time can give you a big boost. Jonathan and I considered cases that were typical of the applications that the FLASH center was working on at the time -- large regions were typically refined.

Mike On Tue, 8 Sep 2009, Brian O'Shea wrote: > In all fairness, the statement that Anshu made depends a lot on the type of > problem you are using. For example, the Enzo AMR code is mainly used for > astrophysical applications that are gravity-dominated, so most of the action > takes place in a very small fraction of the simulation volume, and can use > many levels of refinement. The only way that this sort of problem is > tractable is by using adaptive time-stepping. One of the main complications > of such an adaptive time-stepping scheme is keeping all of the mesh levels > synched up. It also makes load-balancing a significant challenge - one is > forced to distribute grids on a level-by-level basis, and if only a small > fraction of the volume is refined (as in our cosmology simulations...) you > destroy grid locality, which increases communication and decreases > scalability. > > Brian > >> This was considered several years ago. Then Mike Zingale and Jonathan >> Dursi proved that the computational savings from coarse blocks having >> a coarse time step are too insignificant to be worth pursuing. This is >> because the computational time in the finer blocks completely >> dominates that in coarser blocks for the same area of computational >> domain. I believe that study appeared in some proceedings, if you are >> interested I can dig up the reference. >> >> Anshu >> >> On Tue, Sep 8, 2009 at 4:17 AM, Ross Parkin wrote: >>> Hello, >>> >>> I've been thinking about the load balancing method used in FLASH. The >>> timestep used by the hydro units is the same on all refinement levels and >>> so >>> blocks are essentially distributed by paramesh so that all processors have >>> the roughly the smae number of blocks. Has anyone tried modifying FLASH so >>> that each refinement level has its own timestep? 
To do this properly you >>> would need to modify how PARAMESH distributes blocks amongst processors, >>> taking account of the relative work done by each block on each refinement >>> level, i.e. a block on a finer refinement level may need to do twice as >>> many >>> hydro loops before its simulation time equals that of a coarser block. >>> >>> Any ideas guys? >>> >>> Cheers, >>> >>> Ross Parkin >>> > ----------------------------------------------------------------------------- Michael Zingale (mzingale at mail.astro.sunysb.edu) Assistant Professor Dept. of Physics and Astronomy office: ESS 440 Stony Brook University phone: 631-632-8225 Stony Brook, NY 11794-3800 web: http://www.astro.sunysb.edu/mzingale ----------------------------------------------------------------------------- From dubey at flash.uchicago.edu Tue Sep 8 08:43:23 2009 From: dubey at flash.uchicago.edu (Anshu Dubey) Date: Tue, 8 Sep 2009 08:43:23 -0500 Subject: [FLASH-USERS] Improving the load balancing in FLASH In-Reply-To: References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com> Message-ID: <5b942a610909080643x2ffffb1dt32b2e51de76d660c@mail.gmail.com> Brian, The statement pertains to Paramesh as much as the problem itself, I should have stated that in my earlier email. Paramesh is Oct-tree based and has the requirement that adjacent blocks cannot be more than a factor of two different in their resolution. Enzo I believe is patch based where these restriction don't apply, and therefore my earlier statement may not be valid for many problems. Anshu On Tue, Sep 8, 2009 at 8:35 AM, Brian O'Shea wrote: > In all fairness, the statement that Anshu made depends a lot on the type of > problem you are using. 
For example, the Enzo AMR code is mainly used for > astrophysical applications that are gravity-dominated, so most of the action > takes place in a very small fraction of the simulation volume, and can use > many levels of refinement. The only way that this sort of problem is > tractable is by using adaptive time-stepping. One of the main complications > of such an adaptive time-stepping scheme is keeping all of the mesh levels > synched up. It also makes load-balancing a significant challenge - one is > forced to distribute grids on a level-by-level basis, and if only a small > fraction of the volume is refined (as in our cosmology simulations...) you > destroy grid locality, which increases communication and decreases > scalability. > > Brian > >> This was considered several years ago. Then Mike Zingale and Jonathan >> Dursi proved that the computational savings from coarse blocks having >> a coarse time step are too insignificant to be worth pursuing. This is >> because the computational time in the finer blocks completely >> dominates that in coarser blocks for the same area of computational >> domain. I believe that study appeared in some proceedings, if you are >> interested I can dig up the reference. >> >> Anshu >> >> On Tue, Sep 8, 2009 at 4:17 AM, Ross Parkin wrote: >>> >>> Hello, >>> >>> I've been thinking about the load balancing method used in FLASH. The >>> timestep used by the hydro units is the same on all refinement levels and >>> so >>> blocks are essentially distributed by paramesh so that all processors >>> have >>> roughly the same number of blocks. Has anyone tried modifying FLASH >>> so >>> that each refinement level has its own timestep? To do this properly you >>> would need to modify how PARAMESH distributes blocks amongst processors, >>> taking account of the relative work done by each block on each refinement >>> level, i.e.
a block on a finer refinement level may need to do twice as >>> many >>> hydro loops before its simulation time equals that of a coarser block. >>> >>> Any ideas guys? >>> >>> Cheers, >>> >>> Ross Parkin >>> >> > > From dubey at flash.uchicago.edu Tue Sep 8 08:56:45 2009 From: dubey at flash.uchicago.edu (Anshu Dubey) Date: Tue, 8 Sep 2009 08:56:45 -0500 Subject: [FLASH-USERS] Improving the load balancing in FLASH In-Reply-To: References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com> Message-ID: <5b942a610909080656s796e826fxeb9a5cd774b94e12@mail.gmail.com> I suppose if you have an extremely unbalanced oct-tree that could work out. But I'd have to work through an example to be fully convinced of it, I have a feeling we will find that the factor of two constraint will limit much of gain to be had by subcycling. Anshu On Tue, Sep 8, 2009 at 8:43 AM, Mike Zingale wrote: > Brian is right -- the situation is entirely problem dependent. ?For cases > where you are locally refining only a small region of the domain, subcycling > in time can give you a big boost. ?Jonathan and I considered cases that were > typical to the applications that the FLASH center was working on at the time > -- large regions were typically refined. > > Mike > > > On Tue, 8 Sep 2009, Brian O'Shea wrote: > >> In all fairness, the statement that Anshu made depends a lot on the type >> of problem you are using. ?For example, the Enzo AMR code is mainly used for >> astrophysical applications that are gravity-dominated, so most of the action >> takes place in a very small fraction of the simulation volume, and can use >> many levels of refinement. ?The only way that this sort of problem is >> tractable is by using adaptive time-stepping. ?One of the main complications >> of such an adaptive time-stepping scheme is keeping all of the mesh levels >> synched up. 
?It also makes load-balancing a significant challenge - one is >> forced to distribute grids on a level-by-level basis, and if only a small >> fraction of the volume is refined (as in our cosmology simulations...) you >> destroy grid locality, which increases communication and decreases >> scalability. >> >> Brian >> >>> This was considered several years ago. Then Mike Zingale and Jonathan >>> Dursi proved that the computational savings from coarse blocks having >>> a coarse time step are too insignificant to be worth pursuing. This is >>> because the computational time in the finer blocks completely >>> dominates that in coarser blocks for the same area of computational >>> domain. I believe that study appeared in some proceedings, if you are >>> interested I can dig up the reference. >>> >>> Anshu >>> >>> On Tue, Sep 8, 2009 at 4:17 AM, Ross Parkin wrote: >>>> >>>> Hello, >>>> >>>> I've been thinking about the load balancing method used in FLASH. The >>>> timestep used by the hydro units is the same on all refinement levels >>>> and so >>>> blocks are essentially distributed by paramesh so that all processors >>>> have >>>> the roughly the smae number of blocks. Has anyone tried modifying FLASH >>>> so >>>> that each refinement level has its own timestep? To do this properly you >>>> would need to modify how PARAMESH distributes blocks amongst processors, >>>> taking account of the relative work done by each block on each >>>> refinement >>>> level, i.e. a block on a finer refinement level may need to do twice as >>>> many >>>> hydro loops before its simulation time equals that of a coarser block. >>>> >>>> Any ideas guys? >>>> >>>> Cheers, >>>> >>>> Ross Parkin >>>> >> > > > ----------------------------------------------------------------------------- > Michael Zingale (mzingale at mail.astro.sunysb.edu) > Assistant Professor > > Dept. of Physics and Astronomy ? ? office: ESS 440 > Stony Brook University ? ? ? ? ? ? 
phone: 631-632-8225 > Stony Brook, NY 11794-3800 web: http://www.astro.sunysb.edu/mzingale > ----------------------------------------------------------------------------- > > From phy1erp at leeds.ac.uk Tue Sep 8 09:17:16 2009 From: phy1erp at leeds.ac.uk (Ross Parkin) Date: Tue, 08 Sep 2009 15:17:16 +0100 Subject: [FLASH-USERS] Improving the load balancing in FLASH Message-ID: <20090908151716.jzgvws5m3kg480gk@webmail.leeds.ac.uk> I can see how the choice is problem specific. Sadly, the problem I'm working on at the minute requires a few more levels of refinement in a very small fraction of the grid and the use of a global timestep is making the simulation run quite slowly. I had a quick look in the paramesh manual and they do discuss using paramesh with different timesteps on different refinement levels. As far as I can see this would involve altering how FLASH (or the paramesh bits of FLASH to be more exact) does the load balancing, and then setting the hydro loop to run with a level-dependent timestep. I was wondering if anyone had attempted to re-jig FLASH to do this, but from the sounds of things it's a bigger task than I first anticipated. Thanks for the input, Ross Quoting Anshu Dubey : > Brian, > > The statement pertains to Paramesh as much as the problem itself, I > should have stated that in my earlier email. Paramesh is Oct-tree > based and has the requirement that adjacent blocks cannot be more than > a factor of two different in their resolution. Enzo I believe is patch > based where these restrictions don't apply, and therefore my earlier > statement may not be valid for many problems. > > Anshu > > On Tue, Sep 8, 2009 at 8:35 AM, Brian O'Shea wrote: >> In all fairness, the statement that Anshu made depends a lot on the type of >> problem you are using.
?For example, the Enzo AMR code is mainly used for >> astrophysical applications that are gravity-dominated, so most of the action >> takes place in a very small fraction of the simulation volume, and can use >> many levels of refinement. ?The only way that this sort of problem is >> tractable is by using adaptive time-stepping. ?One of the main complications >> of such an adaptive time-stepping scheme is keeping all of the mesh levels >> synched up. ?It also makes load-balancing a significant challenge - one is >> forced to distribute grids on a level-by-level basis, and if only a small >> fraction of the volume is refined (as in our cosmology simulations...) you >> destroy grid locality, which increases communication and decreases >> scalability. >> >> Brian >> >>> This was considered several years ago. Then Mike Zingale and Jonathan >>> Dursi proved that the computational savings from coarse blocks having >>> a coarse time step are too insignificant to be worth pursuing. This is >>> because the computational time in the finer blocks completely >>> dominates that in coarser blocks for the same area of computational >>> domain. I believe that study appeared in some proceedings, if you are >>> interested I can dig up the reference. >>> >>> Anshu >>> >>> On Tue, Sep 8, 2009 at 4:17 AM, Ross Parkin wrote: >>>> >>>> Hello, >>>> >>>> I've been thinking about the load balancing method used in FLASH. The >>>> timestep used by the hydro units is the same on all refinement levels and >>>> so >>>> blocks are essentially distributed by paramesh so that all processors >>>> have >>>> the roughly the smae number of blocks. Has anyone tried modifying FLASH >>>> so >>>> that each refinement level has its own timestep? To do this properly you >>>> would need to modify how PARAMESH distributes blocks amongst processors, >>>> taking account of the relative work done by each block on each refinement >>>> level, i.e. 
a block on a finer refinement level may need to do twice as >>>> many >>>> hydro loops before its simulation time equals that of a coarser block. >>>> >>>> Any ideas guys? >>>> >>>> Cheers, >>>> >>>> Ross Parkin >>>> >>> >> >> > ----- End forwarded message ----- From cdaley at flash.uchicago.edu Tue Sep 8 09:41:28 2009 From: cdaley at flash.uchicago.edu (Chris Daley) Date: Tue, 08 Sep 2009 09:41:28 -0500 Subject: [FLASH-USERS] 4000+ cpus on Franklin In-Reply-To: <73FDB89B-1AEF-493A-A6F5-DA397F6E7475@ucolick.org> References: <815AAC49-6FB6-499D-8DF0-5E8ED7A9F798@ucolick.org> <8259D92A-8DC7-4E0C-A7E4-9DD812E838F4@ucolick.org> <73FDB89B-1AEF-493A-A6F5-DA397F6E7475@ucolick.org> Message-ID: <4AA66D18.9050200@flash.uchicago.edu> Hi James, > 1 aborting job: > 2 Fatal error in MPI_Irecv: Invalid tag, error stack: > 3 MPI_Irecv(144): MPI_Irecv(buf=0x1373f760, count=378, > MPI_INTEGER, src=4095, tag=16777312, MPI_COMM_WORLD, > request=0x1377aea8) failed > 4 MPI_Irecv(97).: Invalid tag, value is 16777312 It is very likely that the tags are overflowing. I have experienced tag overflow errors on Jaguar when using > 4000 processes (both Jaguar and Franklin are Cray XT systems). Paramesh4dev will not overflow MPI tags. Alternatively, you can use Paramesh4.0 but remove PPDEFINE PM_UNIQUE_MPI_TAGS from Grid/GridMain/paramesh/paramesh4/Paramesh4.0/Config before setting up your application. However, we have sometimes experienced deadlock situations in this special mode with Paramesh4.0 (we're not sure why yet). We've never encountered a deadlock with Paramesh4dev (which uses this special mode by default), so I recommend Paramesh4dev. > > I also tried switching the paramesh library to paramesh4dev...but for > some reason it seems unable to find the amr_runtime_parameters file. Simply copy amr_runtime_parameters from your object directory to your run directory. 
Regards, Chris James Guillochon wrote: > OK, I've upgraded to FLASH 3.2...again, I can run the simulation on > 2048 processors, but when I try 4096 I get the following errors: > > 1 aborting job: > 2 Fatal error in MPI_Irecv: Invalid tag, error stack: > 3 MPI_Irecv(144): MPI_Irecv(buf=0x1373f760, count=378, > MPI_INTEGER, src=4095, tag=16777312, MPI_COMM_WORLD, > request=0x1377aea8) failed > 4 MPI_Irecv(97).: Invalid tag, value is 16777312 > 5 aborting job: > 6 Fatal error in MPI_Irecv: Invalid tag, error stack: > 7 MPI_Irecv(144): MPI_Irecv(buf=0x1373e040, count=378, > MPI_INTEGER, src=4095, tag=16777315, MPI_COMM_WORLD, > request=0x1377af18) failed > 8 MPI_Irecv(97).: Invalid tag, value is 16777315 > 9 aborting job: > 10 Fatal error in MPI_Irecv: Invalid tag, error stack: > 11 MPI_Irecv(144): MPI_Irecv(buf=0x13788c98, count=378, > MPI_INTEGER, src=4095, tag=16777313, MPI_COMM_WORLD, > request=0x13741674) failed > 12 MPI_Irecv(97).: Invalid tag, value is 16777313 > 13 aborting job: > 14 Fatal error in MPI_Irecv: Invalid tag, error stack: > 15 MPI_Irecv(144): MPI_Irecv(buf=0x1373e040, count=378, > MPI_INTEGER, src=4095, tag=16777314, MPI_COMM_WORLD, > request=0x1377af18) failed > 16 MPI_Irecv(97).: Invalid tag, value is 16777314 > 17 aborting job: > 18 Fatal error in MPI_Ssend: Invalid tag, error stack: > 19 MPI_Ssend(167): MPI_Ssend(buf=0x1354d910, count=378, > MPI_INTEGER, dest=4091, tag=16777312, MPI_COMM_WORLD) failed > 20 MPI_Ssend(93).: Invalid tag, value is 16777312 > 21 [NID 12795]Apid 1253267: initiated application termination > > MPI_TAG_UB is about 2 billion on the machine I'm running on, so the > tag numbers here don't seem to be out of range... > > I also tried switching the paramesh library to paramesh4dev...but for > some reason it seems unable to find the amr_runtime_parameters file. > I've tried setting "ParameshLibraryMode=true", but that doesn't seem > to help. 
Here is the error: > > 1 PGFIO-F-209/OPEN/unit=35/'OLD' specified for file which does > not exist. > 2 File name = amr_runtime_parameters > 3 In source file amr_set_runtime_parameters.F90, at line number 86 > 4 [NID 1593]Apid 1255265: initiated application termination > > Thanks in advance! > From dubey at flash.uchicago.edu Tue Sep 8 09:58:25 2009 From: dubey at flash.uchicago.edu (Anshu Dubey) Date: Tue, 8 Sep 2009 09:58:25 -0500 Subject: [FLASH-USERS] Improving the load balancing in FLASH In-Reply-To: <5b942a610909080656s796e826fxeb9a5cd774b94e12@mail.gmail.com> References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> <5b942a610909080625y21a8cd99xd4101dc1d114aff@mail.gmail.com> <5b942a610909080656s796e826fxeb9a5cd774b94e12@mail.gmail.com> Message-ID: <5b942a610909080758q117c5ebeoc907dae2e01bb7ab@mail.gmail.com> I take that back. Here is a simple example to counter my argument. n blocks at level m, 1 block refined to level m+1. If n >> 1, subcycling would gain almost a factor of 2. That said, Ross you are right that there is no simple way to do subcycling in FLASH right now. If you can quantify the loss in computational efficiency you are seeing, and if it is likely to be worth your while to spend some time in coding, we can spend some time thinking about how to do it. Anshu On Tue, Sep 8, 2009 at 8:56 AM, Anshu Dubey wrote: > I suppose if you have an extremely unbalanced oct-tree that could work out. > > But I'd have to work through an example to be fully convinced of it, I > have a feeling we will find that the factor of two constraint will > limit much of gain to be had by subcycling. > > Anshu > > On Tue, Sep 8, 2009 at 8:43 AM, Mike > Zingale wrote: >> Brian is right -- the situation is entirely problem dependent. ?For cases >> where you are locally refining only a small region of the domain, subcycling >> in time can give you a big boost. 
?Jonathan and I considered cases that were >> typical to the applications that the FLASH center was working on at the time >> -- large regions were typically refined. >> >> Mike >> >> >> On Tue, 8 Sep 2009, Brian O'Shea wrote: >> >>> In all fairness, the statement that Anshu made depends a lot on the type >>> of problem you are using. ?For example, the Enzo AMR code is mainly used for >>> astrophysical applications that are gravity-dominated, so most of the action >>> takes place in a very small fraction of the simulation volume, and can use >>> many levels of refinement. ?The only way that this sort of problem is >>> tractable is by using adaptive time-stepping. ?One of the main complications >>> of such an adaptive time-stepping scheme is keeping all of the mesh levels >>> synched up. ?It also makes load-balancing a significant challenge - one is >>> forced to distribute grids on a level-by-level basis, and if only a small >>> fraction of the volume is refined (as in our cosmology simulations...) you >>> destroy grid locality, which increases communication and decreases >>> scalability. >>> >>> Brian >>> >>>> This was considered several years ago. Then Mike Zingale and Jonathan >>>> Dursi proved that the computational savings from coarse blocks having >>>> a coarse time step are too insignificant to be worth pursuing. This is >>>> because the computational time in the finer blocks completely >>>> dominates that in coarser blocks for the same area of computational >>>> domain. I believe that study appeared in some proceedings, if you are >>>> interested I can dig up the reference. >>>> >>>> Anshu >>>> >>>> On Tue, Sep 8, 2009 at 4:17 AM, Ross Parkin wrote: >>>>> >>>>> Hello, >>>>> >>>>> I've been thinking about the load balancing method used in FLASH. 
The >>>>> timestep used by the hydro units is the same on all refinement levels >>>>> and so >>>>> blocks are essentially distributed by paramesh so that all processors >>>>> have >>>>> the roughly the smae number of blocks. Has anyone tried modifying FLASH >>>>> so >>>>> that each refinement level has its own timestep? To do this properly you >>>>> would need to modify how PARAMESH distributes blocks amongst processors, >>>>> taking account of the relative work done by each block on each >>>>> refinement >>>>> level, i.e. a block on a finer refinement level may need to do twice as >>>>> many >>>>> hydro loops before its simulation time equals that of a coarser block. >>>>> >>>>> Any ideas guys? >>>>> >>>>> Cheers, >>>>> >>>>> Ross Parkin >>>>> >>> >> >> >> ----------------------------------------------------------------------------- >> Michael Zingale (mzingale at mail.astro.sunysb.edu) >> Assistant Professor >> >> Dept. of Physics and Astronomy ? ? office: ESS 440 >> Stony Brook University ? ? ? ? ? ? phone: ?631-632-8225 >> Stony Brook, NY 11794-3800 ? ? ? ? web: http://www.astro.sunysb.edu/mzingale >> ----------------------------------------------------------------------------- >> >> > From tplewa at fsu.edu Tue Sep 8 13:07:06 2009 From: tplewa at fsu.edu (Tomasz Plewa) Date: Tue, 08 Sep 2009 14:07:06 -0400 Subject: [FLASH-USERS] Improving the load balancing in FLASH In-Reply-To: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> References: <20090908101756.1hyev2dujqos08ok@webmail.leeds.ac.uk> Message-ID: <4AA69D4A.4080405@fsu.edu> Early versions of Paramesh provided support for individual level time stepping. Putting that functionality back into the code is not an impossible task, but quite tedious and boring. It would also require essentially doubling the code memory, in the simplest approach. 
Taking the number of time steps required to evolve a given level into consideration when load balancing is less important than the information about refinement level. This is because the number of parent blocks is always smaller than the number of offspring blocks. But it could be included at no cost. At some point data locality would interfere and become a serious problem. This is a common problem for all Berger-Colella AMR implementations. Paramesh is a very restricted subset of that approach and those simplifications make it efficient for most well-balanced physics applications. Level-based load balancing would be helpful in general. One example is a multigrid solver. That has been bypassed algorithmically in Enzo, if I am not mistaken. I was told Chombo provides level-based load balancing. Tomek -- Ross Parkin wrote: > Hello, > > I've been thinking about the load balancing method used in FLASH. The > timestep used by the hydro units is the same on all refinement levels > and so blocks are essentially distributed by paramesh so that all > processors have roughly the same number of blocks. Has anyone > tried modifying FLASH so that each refinement level has its own > timestep? To do this properly you would need to modify how PARAMESH > distributes blocks amongst processors, taking account of the relative > work done by each block on each refinement level, i.e. a block on a > finer refinement level may need to do twice as many hydro loops before > its simulation time equals that of a coarser block. > > Any ideas guys? > > Cheers, > > Ross Parkin > -------------- next part -------------- A non-text attachment was scrubbed...
Name: tplewa.vcf Type: text/x-vcard Size: 338 bytes Desc: not available Url : http://flash.uchicago.edu/pipermail/flash-users/attachments/20090908/ce2ae14c/attachment.vcf From gforjan at gmu.edu Mon Sep 14 04:08:52 2009 From: gforjan at gmu.edu (Gary F Forjan) Date: Mon, 14 Sep 2009 05:08:52 -0400 Subject: [FLASH-USERS] WARNING after gc fillling problem Message-ID: Hello FLASH Users I have a question similar to Latif's in her email of 9/7/09, only my problem is more serious. I am getting zero values for my ENER_VAR, DENS_VAR, and other variables and my simulation is crashing after a call to EOS. I have a 1-dimensional domain and am using the following boundary condition types xl_boundary_type = "user" xr_boundary_type = "reflect" To implement the user defined boundary condition, I have followed the WindTunnel example and just modified the Grid_applyBCEdge.F90 routine to use my own variable values which appear to be correct. I also make a call to EOS in my Simulation_initBlock.F90 routine using temperature and density to obtain the pressure and energy. I get the following error at the start of the flash run after several refinements have taken place (Nblockx=1) WARNING after gc filling: min. unk(DENS_VAR)=0.000000000000000000000 PE=1 block=1 type=1 1 2.32E-15 2.24E-15 2.16E-15 2.08E-15 2.01E-15 1.94E-15 1.87E-15 1.80E-15 1.74E-15 1.68E-15 1.62E-15 1.56E-15 0.0 0.0 0.0 0.0 WARNING after gc filling: min. unk(ENER_VAR)=0.000000000000000 PE=1 block=1 type=1 1 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 0.0 0.0 0.0 0.0 WARNING after gc filling: min. unk(EINT_VAR)=0.000000000000000 PE=1 block=1 type=1 1 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 0.0 0.0 0.0 0.0 DRIVER_ABORT: [Eos_putData] ERROR Density or Internal Energy are zero after a call to EOS! 
Would anyone know what is causing this problem and or where I should start looking in the code or my setup files to track it down? Thanks for any information. I have attached my Config, Simulation_initBlock.F90, Grid_applyBCEdge.F90, and error output files as well in case they are of any help. One other point: I want to use just a simple fully ionized proton+electron plasma so that I can use radiative cooling in my simulation. This was easy in FLASH2 since you could just specify in the Config file REQUIRES materials/composition/prot+elec For FLASH3, I assume I need to just specify REQUIRES Simulation/SimulationComposition/Ionize in the Config file. Thanks Gary Gary Forjan Department of Computational and Data Sciences George Mason University, Fairfax, Va gforjan at gmu.edu -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Grid_applyBCEdge.F90 Url: http://flash.uchicago.edu/pipermail/flash-users/attachments/20090914/27251bfb/attachment.pl -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Simulation_initBlock.F90 Url: http://flash.uchicago.edu/pipermail/flash-users/attachments/20090914/27251bfb/attachment-0001.pl -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Config Url: http://flash.uchicago.edu/pipermail/flash-users/attachments/20090914/27251bfb/attachment-0002.pl -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: flash_output Url: http://flash.uchicago.edu/pipermail/flash-users/attachments/20090914/27251bfb/attachment-0003.pl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: flash.log Type: application/octet-stream Size: 101882 bytes Desc: not available Url : http://flash.uchicago.edu/pipermail/flash-users/attachments/20090914/27251bfb/attachment.obj From seyit at astro.rug.nl Mon Sep 14 05:29:40 2009 From: seyit at astro.rug.nl (Seyit Hocuk) Date: Mon, 14 Sep 2009 12:29:40 +0200 Subject: [FLASH-USERS] WARNING after gc fillling problem In-Reply-To: References: Message-ID: <4AAE1B14.5080300@astro.rug.nl> Hi Gary, On the first point, are you compiling with -O3. Somehow not using -O3 is punishable by Flash3 with zero variable values :). This is what I noticed. Seyit Gary F Forjan wrote: > Hello FLASH Users > > I have a question similar to Latif's in her email of 9/7/09, only my problem is more serious. I am getting zero values for my ENER_VAR, DENS_VAR, and other variables and my simulation is crashing after a call to EOS. I have a 1-dimensional domain and am using the following boundary condition types > > xl_boundary_type = "user" > xr_boundary_type = "reflect" > > To implement the user defined boundary condition, I have followed the WindTunnel example and just modified the Grid_applyBCEdge.F90 routine to use my own variable values which appear to be correct. I also make a call to EOS in my Simulation_initBlock.F90 routine using temperature and density to obtain the pressure and energy. I get the following error at the start of the flash run after several refinements have taken place (Nblockx=1) > > WARNING after gc filling: min. unk(DENS_VAR)=0.000000000000000000000 PE=1 block=1 type=1 > 1 2.32E-15 2.24E-15 2.16E-15 2.08E-15 2.01E-15 1.94E-15 1.87E-15 1.80E-15 1.74E-15 1.68E-15 1.62E-15 1.56E-15 0.0 0.0 0.0 0.0 > WARNING after gc filling: min. unk(ENER_VAR)=0.000000000000000 PE=1 block=1 type=1 > 1 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 0.0 0.0 0.0 0.0 > WARNING after gc filling: min. 
unk(EINT_VAR)=0.000000000000000 PE=1 block=1 type=1 > 1 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 0.0 0.0 0.0 0.0 > DRIVER_ABORT: > [Eos_putData] ERROR Density or Internal Energy are zero after a call to EOS! > > > Would anyone know what is causing this problem and or where I should start looking in the code or my setup files to track it down? Thanks for any information. I have attached my Config, Simulation_initBlock.F90, Grid_applyBCEdge.F90, and error output files as well in case they are of any help. > > One other point: I want to use just a simple fully ionized proton+electron plasma so that I can use radiative cooling in my simulation. This was easy in FLASH2 since you could just specify in the Config file > > REQUIRES materials/composition/prot+elec > > For FLASH3, I assume I need to just specify > > REQUIRES Simulation/SimulationComposition/Ionize > > in the Config file. > > Thanks > > Gary > > Gary Forjan > Department of Computational and Data Sciences > George Mason University, Fairfax, Va > gforjan at gmu.edu From seyit at astro.rug.nl Mon Sep 14 07:18:37 2009 From: seyit at astro.rug.nl (Seyit Hocuk) Date: Mon, 14 Sep 2009 14:18:37 +0200 Subject: [FLASH-USERS] Oddness in results when using uneven number of parallel processors Message-ID: <4AAE349D.5050802@astro.rug.nl> Hi Community, I am experiencing somewhat odd results with a newly written unit when I am running with odd number of processors. Everything seems fine when using 1 or even number of processors. By the way, I realise now that it works for 2^x number of processors. So, with even, I mean 2^x and odd =! 2^x. The code goes something like this (A is important here): >>Unit<< do blockID = 1, blockCount ... do k = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS) do j = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS) do i = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS) ... do LeafID = 1, blockCount call subroutine1 ... ... end >>subroutine1<< ... 
call subroutine2 ... end >>subroutine2<< do n = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS) do m = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS) do l = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS) ... A = A + x (a summation) ... end I was thinking that every processor calculates its own piece of the summation of A, but fails to communicate with the others, so that it is not the actual sum I am seeing. However, I find it weird that it does work for certain processor counts greater than 1. To solve this, I used the solnData pointer. That is, instead of A, I have tried solnData(A,i,j,k) so that the different processors will be connected through this mediator. No change. Kind Regards, Seyit From klaus at flash.uchicago.edu Mon Sep 14 10:47:46 2009 From: klaus at flash.uchicago.edu (Klaus Weide) Date: Mon, 14 Sep 2009 10:47:46 -0500 (CDT) Subject: [FLASH-USERS] WARNING after gc filling In-Reply-To: <4AA500C4.9060202@astro.rug.nl> References: <4AA500C4.9060202@astro.rug.nl> Message-ID: On Mon, 7 Sep 2009, Latif wrote: > Hi > Dear All, > I am using FLASH 3.2. I am getting the following warnings, which I also saw in FLASH > 3.1. After discussion about them, it was suggested that they are not harmful. > Same warnings are appearing in FLASH 3.2, which are coming from > gr_sanitizeDataAfterInterp.F90. If I look at values in Visit they seem normal. > What is your take on this? > WARNING after gc filling: min. unk(EINT_VAR)=79113608.35385749 PE=0 block=433 type=1 My take: - If the min values are 0.0 exactly, then this may be an indication that something went wrong within Paramesh (or with FLASH's use of it). However, if the warnings only occur for non-leaf blocks, i.e. for type=2 or type=3, then this *may* still be ok. Also, some warnings with min values .ne. 0.0 *may* be ignorable if the block type is non-leaf.
- If the min values are positive but slightly below the minimum given by smlrho (for density) or smalle (for energy variables ener, eint), as perhaps in the example quoted above, then this is probably an indication of interpolation undershooting. The interpolation here is the one that happens automatically for filling finer-resolution guard cells at fine/coarse boundaries. One can expect this to happen for some sorts of setups with steep gradients. Whether this is a serious problem should be analyzed for the specific setup. It could indicate, for example, that the resolution is not good enough, or even that the model of the physics that is being implemented is not appropriate. Klaus From dubey at flash.uchicago.edu Mon Sep 14 14:34:17 2009 From: dubey at flash.uchicago.edu (Anshu Dubey) Date: Mon, 14 Sep 2009 14:34:17 -0500 Subject: [FLASH-USERS] Oddness in results when using uneven number of parallel processors In-Reply-To: <4AAE349D.5050802@astro.rug.nl> References: <4AAE349D.5050802@astro.rug.nl> Message-ID: <5b942a610909141234s70e56f27r61c49b6ebb03cdd3@mail.gmail.com> Seyit, There is no implicit shared memory parallelism in FLASH. If you need A to be summed up over all processors, you have to make a call to the MPI_Allreduce function. As an example you can see the file source/Grid/GridSolvers/Multipole/gr_mpoleMoments.F90 where the Moments are first computed locally, and then summed over all processors with allreduce calls. The fact that you are seeing some reasonable values in some parallel situations has to be just a coincidence. Anshu On Mon, Sep 14, 2009 at 7:18 AM, Seyit Hocuk wrote: > Hi Community, > > I am experiencing somewhat odd results with a newly written unit when I am > running with odd number of processors. Everything seems fine when using 1 or > even number of processors. > By the way, I realise now that it works for 2^x number of processors. So, > with even, I mean 2^x and odd =! 2^x.
> > > The code goes something like this (A is important here): > >>>Unit<< > do blockID = 1, blockCount > ... > do k = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS) > do j = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS) > do i = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS) > ... > do LeafID = 1, blockCount > call subroutine1 > ... > ... > end > >>>subroutine1<< > ... > call subroutine2 > ... > end > >>>subroutine2<< > do n = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS) > do m = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS) > do l = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS) > ... > A = A + x (a summation) > ... > end > > > I was thinking that every processor calculates its own piece of the summation of > A, but the processors fail to communicate with each other, so that it is not the actual > sum I am seeing. However, I find it weird that it does work for certain > numbers of processors greater than 1. To solve this, I used the solnData > pointer. That is, instead of A, I have tried solnData(A,i,j,k) so that the > different processors will be connected through this mediator. No change. > > > Kind Regards, > Seyit > > > From seyit at astro.rug.nl Tue Sep 15 05:54:04 2009 From: seyit at astro.rug.nl (Seyit Hocuk) Date: Tue, 15 Sep 2009 12:54:04 +0200 Subject: [FLASH-USERS] Oddness in results when using uneven number of parallel processors In-Reply-To: <5b942a610909141234s70e56f27r61c49b6ebb03cdd3@mail.gmail.com> References: <4AAE349D.5050802@astro.rug.nl> <5b942a610909141234s70e56f27r61c49b6ebb03cdd3@mail.gmail.com> Message-ID: <4AAF724C.8030409@astro.rug.nl> Hi Anshu, I realise now that my second block loop is also reduced by the number of processors, which may be the problem. That is, blockCount and blockList are divided (reduced). The first block loop can be parallelized; however, I want each block/zone to do the complete block loop again, not a part of it.
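The situation described here, in which each processor loops only over its own share of the blocks, can be sketched without any MPI library. The following is a plain-Python simulation, not FLASH or MPI code: the names `split_blocks` and `allreduce_sum` are hypothetical, the block values are made up, and the contiguous split is an assumption (Paramesh may distribute blocks differently). It only illustrates that a locally accumulated A differs from the global sum until a reduction of the kind MPI_Allreduce performs with a sum operation is applied.

```python
def split_blocks(block_values, n_ranks):
    """Distribute block values over ranks in contiguous chunks (assumed layout)."""
    base, extra = divmod(len(block_values), n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks get one additional block each.
        size = base + (1 if rank < extra else 0)
        chunks.append(block_values[start:start + size])
        start += size
    return chunks


def allreduce_sum(local_values):
    """Mimic a sum all-reduce: every rank receives the sum over all ranks."""
    total = sum(local_values)
    return [total for _ in local_values]


# Hypothetical per-block contributions to A (integers keep the sums exact).
block_values = list(range(1, 11))  # 10 blocks

for n_ranks in (1, 3, 4):
    chunks = split_blocks(block_values, n_ranks)
    local_a = [sum(chunk) for chunk in chunks]  # what each rank sees on its own
    global_a = allreduce_sum(local_a)           # what each rank sees after the reduction
    print(n_ranks, local_a, global_a)
```

With a single rank the local and global values coincide, which is consistent with the one-processor run looking fine; with more ranks each local value is only a partial sum.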
The fact that I noticed no problem when I used 2^x procs must come from the way that I do things in the code and from the fact that with 2^x procs the box is divided symmetrically. Like you said, it was just a coincidence. I'm trying to understand what the MPI_Allreduce function does. Does it do exactly what I want, i.e., make blockList and blockCount complete/undivided again? Thanks, Seyit Anshu Dubey wrote: > Seyit, > > There is no implicit shared memory parallelism in FLASH. If you need A > to be summed up over all processors, you have to make a call to the > MPI_Allreduce function. As an example you can see the file > source/Grid/GridSolvers/Multipole/gr_mpoleMoments.F90 where the > Moments are first computed locally, and then summed over all > processors with allreduce calls. The fact that you are seeing some > reasonable values in some parallel situations has to be just a > coincidence. > > Anshu > > On Mon, Sep 14, 2009 at 7:18 AM, Seyit Hocuk wrote: > >> Hi Community, >> >> I am experiencing somewhat odd results with a newly written unit when I am >> running with odd number of processors. Everything seems fine when using 1 or >> even number of processors. >> By the way, I realise now that it works for 2^x number of processors. So, >> with even, I mean 2^x and odd =! 2^x. >> >> >> The code goes something like this (A is important here): >> >> >>>> Unit<< >>>> >> do blockID = 1, blockCount >> ... >> do k = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS) >> do j = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS) >> do i = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS) >> ... >> do LeafID = 1, blockCount >> call subroutine1 >> ... >> ... >> end >> >> >>>> subroutine1<< >>>> >> ... >> call subroutine2 >> ... >> end >> >> >>>> subroutine2<< >>>> >> do n = blkLimits(LOW,KAXIS),blkLimits(HIGH,KAXIS) >> do m = blkLimits(LOW,JAXIS),blkLimits(HIGH,JAXIS) >> do l = blkLimits(LOW,IAXIS),blkLimits(HIGH,IAXIS) >> ... >> A = A + x (a summation) >> ...
>> end >> >> >> I was thinking that every processor calculates its own piece of summation of >> A, but fails to communicate between eacother so that it is not the actual >> sum I am seeing. However, I find it weird that it does works for certain >> number of processors more than 1. To solve this, I used the solnData >> pointer. That is, instead of A, I have tried solnData(A,i,j,k) so that the >> different processors will be connected through this mediator. No change. >> >> >> Kind Regards, >> Seyit >> >> >> >> From gforjan at gmu.edu Tue Sep 15 06:45:19 2009 From: gforjan at gmu.edu (Gary F Forjan) Date: Tue, 15 Sep 2009 07:45:19 -0400 Subject: [FLASH-USERS] WARNING after gc filling problem In-Reply-To: <4AAE1B14.5080300@astro.rug.nl> References: <4AAE1B14.5080300@astro.rug.nl> Message-ID: Hi Seyit Thanks for the suggestion but compiling the code with the -O3 flag did not resolve the problem of zero values for the variables. I have to think Klaus is right that something is wrong in Paramesh but I am not sure what in my Config and flash.par file is causing it. Perhaps my composition. I include only the line REQUIRES Simulation/SimulationComposition/Ionize in my Config file to implement a proton+elec plasma but the Flash.h file shows that NSPECIES is defined as 5. I am not sure how FLASH determines that. Gary Gary Forjan Department of Computational and Data Sciences George Mason University, Fairfax, Va gforjan at gmu.edu ----- Original Message ----- From: Seyit Hocuk Date: Monday, September 14, 2009 6:29 am Subject: Re: [FLASH-USERS] WARNING after gc fillling problem > Hi Gary, > > On the first point, are you compiling with -O3. Somehow not using - > O3 is > punishable by Flash3 with zero variable values :). > This is what I noticed. > > Seyit > > > > Gary F Forjan wrote: > > Hello FLASH Users > > > > I have a question similar to Latif's in her email of 9/7/09, > only my problem is more serious. 
I am getting zero values for my > ENER_VAR, DENS_VAR, and other variables and my simulation is > crashing after a call to EOS. I have a 1-dimensional domain and > am using the following boundary condition types > > > > xl_boundary_type = "user" > > xr_boundary_type = "reflect" > > > > To implement the user defined boundary condition, I have > followed the WindTunnel example and just modified the > Grid_applyBCEdge.F90 routine to use my own variable values which > appear to be correct. I also make a call to EOS in my > Simulation_initBlock.F90 routine using temperature and density to > obtain the pressure and energy. I get the following error at the > start of the flash run after several refinements have taken place > (Nblockx=1)> > > WARNING after gc filling: min. > unk(DENS_VAR)=0.000000000000000000000 PE=1 block=1 > type=1 > > 1 2.32E-15 2.24E-15 2.16E-15 2.08E-15 2.01E-15 1.94E-15 1.87E- > 15 1.80E-15 1.74E-15 1.68E-15 1.62E-15 1.56E-15 0.0 0.0 > 0.0 0.0 > > WARNING after gc filling: min. unk(ENER_VAR)=0.000000000000000 > PE=1 block=1 > type=1 > > 1 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 > 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 0.0 > 0.0 0.0 0.0 > > WARNING after gc filling: min. unk(EINT_VAR)=0.000000000000000 > PE=1 block=1 > type=1 > > 1 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 > 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 2.24E+14 0.0 > 0.0 0.0 0.0 > > DRIVER_ABORT: > > [Eos_putData] ERROR Density or Internal Energy are zero after a > call to EOS! > > > > > > Would anyone know what is causing this problem and or where I > should start looking in the code or my setup files to track it > down? Thanks for any information. I have attached my Config, > Simulation_initBlock.F90, Grid_applyBCEdge.F90, and error output > files as well in case they are of any help. > > > > One other point: I want to use just a simple fully ionized > proton+electron plasma so that I can use radiative cooling in my > simulation. 
This was easy in FLASH2 since you could just specify > in the Config file > > > > REQUIRES materials/composition/prot+elec > > > > For FLASH3, I assume I need to just specify > > > > REQUIRES Simulation/SimulationComposition/Ionize > > > > in the Config file. > > > > Thanks > > > > Gary > > > > Gary Forjan > > Department of Computational and Data Sciences > > George Mason University, Fairfax, Va > > gforjan at gmu.edu > > From klaus at flash.uchicago.edu Tue Sep 15 08:39:51 2009 From: klaus at flash.uchicago.edu (Klaus Weide) Date: Tue, 15 Sep 2009 08:39:51 -0500 (CDT) Subject: [FLASH-USERS] WARNING after gc filling problem In-Reply-To: References: <4AAE1B14.5080300@astro.rug.nl> Message-ID: On Tue, 15 Sep 2009, Gary F Forjan wrote: > > Thanks for the suggestion but compiling the code with the -O3 flag did not resolve the problem of zero values for the variables. I have to think Klaus is right that something is wrong in Paramesh but I am not sure what in my Config and flash.par file is causing it. Perhaps my composition. I include only the line > > REQUIRES Simulation/SimulationComposition/Ionize > > in my Config file to implement a proton+elec plasma but the Flash.h file shows that NSPECIES is defined as 5. I am not sure how FLASH determines that. Hi Gary, Yes, that NSPECIES number is strange. You can try to find in Flash.h what those five species are supposed to be (look for XXX_SPECIES). I am not very familiar with the Ionize code unit, but one obvious thing that is different between your FLASH3 build and a reference setup (NeiTest) for Ionization is the following: your setup does not include the Multispecies unit (as shown in the "FLASH Units used" section of your log file). So add a REQUESTS for that to your Config file and see what happens. You may then also have to use Eos/EosMain/Multigamma instead of Eos/EosMain/Gamma.
Some other ideas to try: Add a REQUESTS Grid/GridBoundaryConditions explicitly to your Config file, before the REQUESTS Grid/GridBoundaryConditions/OneRow you already have. Normally this should always be implied, so this is unlikely to do anything, but please try it anyway. I have seen a similar effect (all guard cells filled with 0.0) in the past when the xl_boundary_type etc. string was misspelled in a .par file, like "refecting". Make sure that your Grid_applyBCEdge invocation sees the correct negative values for bcType (USER_DEFINED and REFLECTING as defined in constants.h). If you force the Paramesh grid to be uniform (set lrefine_min==lrefine_max), do you still see these errors? Klaus From latife at astro.rug.nl Tue Sep 15 09:02:32 2009 From: latife at astro.rug.nl (Latif) Date: Tue, 15 Sep 2009 16:02:32 +0200 Subject: [FLASH-USERS] How memory requirement calculations work ? Message-ID: <4AAF9E78.3070601@astro.rug.nl> Hi Dear All, I am trying to run 512^3 (which is equivalent to lrefinemax=7 in 3-D) uniform grid cosmological simulations (including gravity, particles, hydro, EOS). I am running FLASH on a machine with 250 GB of internal memory. I was not able to run the simulations because I received the message that there is not enough virtual memory available. Will using the uniform grid (UG) reduce the memory requirements? Can anyone explain to me how the memory requirement calculation is done in FLASH? I am really interested in knowing about it. Someone was arguing with me that he can run 512^3 SPH simulations on a 30 GB machine, so why can't I? Memory requirements in AMR codes are generally high. Which part of FLASH is expensive in terms of memory? How can I know how much memory will be required for X^3 grid calculations?
Cheers Latif From seyit at astro.rug.nl Tue Sep 15 09:55:20 2009 From: seyit at astro.rug.nl (Seyit Hocuk) Date: Tue, 15 Sep 2009 16:55:20 +0200 Subject: [FLASH-USERS] Oddness in results when using uneven number of parallel processors In-Reply-To: <6D2A7F96-5016-4AD9-9301-F7BD6CD0F1B6@gmail.com> References: <4AAE349D.5050802@astro.rug.nl> <5b942a610909141234s70e56f27r61c49b6ebb03cdd3@mail.gmail.com> <4AAF724C.8030409@astro.rug.nl> <6D2A7F96-5016-4AD9-9301-F7BD6CD0F1B6@gmail.com> Message-ID: <4AAFAAD8.9010509@astro.rug.nl> Hi Nathan, No, I understand what MPI_Allreduce does and am also trying to work out MPI_Allgather (for blockList). Either I have to use these to make the block loop complete again, or find another way to write the code without making use of (especially) blockList. Since it was already there, it was easy to make use of them; however, I had never considered the parallelisation of the code. Suggestions on how to make a complete block loop within a partial block loop are always welcome of course. :) Thanks Nathan, Seyit Nathan C. Hearn wrote: > Hi Seyit, > > MPI_Allreduce collects a value from each MPI process, applies an > operation (like summation) on the set of values, and reports the > result back to all processes. If it looks like you will be writing > your own code to communicate between processes, I would recommend > getting a copy of one of the "Using MPI" books from MIT Press -- they > provide a good tutorial for MPI and explanations of the various MPI > functions. > > > - Nathan > > > On Sep 15, 2009, at 4:54, Seyit Hocuk wrote: > >> >> Hi Anshu, >> >> I realise now that my second block loop is also reduced by the number >> of processors which may be the problem. That is, blockCount and >> blockList are divided(reduced). The first block loop can be >> parallelized, however, I want that each block/zone to do the complete >> block loop again, not a part of it.
The fact that I noticed no >> problem when I used 2^x procs, must come from the way that I do >> things in the code and that with 2^x procs the box is divided >> symmetricly. Like you said, it was just a coincidence. >> >> Im trying to understand what MPI_allreduce function does. Does it >> exactly do what I want, i.e., make blockList and blockCount >> complete/undivided again? >> >> Thanks, >> Seyit >> From gforjan at gmu.edu Tue Sep 15 12:07:20 2009 From: gforjan at gmu.edu (Gary F Forjan) Date: Tue, 15 Sep 2009 13:07:20 -0400 Subject: [FLASH-USERS] WARNING after gc filling problem In-Reply-To: References: <4AAE1B14.5080300@astro.rug.nl> Message-ID: Hi Klaus Thanks for the comments - I will try all your suggestions. Since I think all compositions include electrons and hydrogen at a minimum, perhaps I do need to use the Multispecies unit to get a fully ionized proton/electron plasma. I'll also see if using a uniform grid makes any difference. Thanks again, Gary Gary Forjan Department of Computational and Data Sciences George Mason University, Fairfax, Va gforjan at gmu.edu ----- Original Message ----- From: Klaus Weide Date: Tuesday, September 15, 2009 9:39 am Subject: Re: [FLASH-USERS] WARNING after gc filling problem > On Tue, 15 Sep 2009, Gary F Forjan wrote: > > > > > Thanks for the suggestion but compiling the code with the -O3 > flag did not resolve the problem of zero values for the variables. > I have to think Klaus is right that something is wrong in > Paramesh but I am not sure what in my Config and flash.par file is > causing it. Perhaps my composition. I include only the line > > > > REQUIRES Simulation/SimulationComposition/Ionize > > > > in my Config file to implement a proton+elec plasma but the > Flash.h file shows that NSPECIES is defined as 5. I am not sure > how FLASH determines that. > > Hi Gary, > > Yes, that NSPECIES number is strange. You can try to find in > Flash.h > what those five species are supposed to be (look for XXX_SPECIES). 
> > I am not very familiar with the Ionize code unit, but one obvious > thing > that is different between your FLASH3 build and a reference setup > (NeiTest) for Ionization is the following: your setup does not > include the > Multispecies unit (as shown in the "FLASH Units used" section of > your log > file). So add a REQUESTS for that to you Config file and see what > happens. You may then also have to use Eos/EosMain/Multigamma > instead of > Eos/EosMain/Gamma. > > > Some other ideas to try: > Add a > REQUESTS Grid/GridBoundaryConditions > explicitly to your Config file, before the > REQUESTS Grid/GridBoundaryConditions/OneRow > you already have. Normally this should always be implied, so this > is > unlikekly to do anything, but please try it anyway. > > I have seen a similar effect (all guard cells filled with 0.0) in > the past > when the xl_boundary_type etc. string was misspelled in a .par > file, like > "refecting". Make sure that your Grid_applyBCEdge invocation sees > the > correct negative values for bcType (USER_DEFINED and REFLECTING as > defined > in constants.h). > > If you force the Paramesh grid to be uniform (set > lrefine_min==lrefine_max), do you still see these errors? > > Klaus > From dubey at flash.uchicago.edu Tue Sep 15 12:56:38 2009 From: dubey at flash.uchicago.edu (Anshu Dubey) Date: Tue, 15 Sep 2009 12:56:38 -0500 Subject: [FLASH-USERS] How memory requirement calculations work ? In-Reply-To: <4AAF9E78.3070601@astro.rug.nl> References: <4AAF9E78.3070601@astro.rug.nl> Message-ID: <5b942a610909151056m1d0aa0f2s431fd34ca185d277@mail.gmail.com> Latif, You should certainly be able to run 512^3 problem on your machine unless you have an exorbitant number of variables. Uniform Grid will be significantly cheaper than AMR memory wise, especially if you are using 8^3 blocks. The simplest calculation of memory per variable is as follows. 
With 8^3 blocks, the amount of memory per block is 16^3 words per variable, of which only 8^3 words hold interior data; the remaining cells are guard cells. Effectively, for 512^3 cells of actual data, 262144 blocks, or 262144*16^3 cells, need to be stored per variable. If the same space were covered with 32^3 blocks, the numbers reduce to 4096 blocks and 4096*40^3 cells per variable. If you were to run the problem as a single block, the number of cells is 520^3 per variable. AMR has several other memory overheads as well, because a lot more metadata and buffer storage is required. So if your mesh ends up being uniform, you should absolutely use the UG implementation instead of Paramesh for your simulation. Anshu On Tue, Sep 15, 2009 at 9:02 AM, Latif wrote: > Hi > Dear All, > I am trying to run 512^3 ( which is equivalent to lrefinemax=7 in 3-D) > uniform grid cosmological simulations (including gravity, particles, > hydro,EOS). I am running FLASH on 250 GB internal memory machine. I was not > able to run simulations because i received the message that there is > not enough virtual memory available. Will using uniform grid (UG) > reduce memory requirements. Can any one explain to me, how memory > requirement calculation is done in FLASH. I am really interested in knowing > about. Some guy was arguing with me that he can run 512^3 sph > simulations on 30 GB machine why you can't. Memory requirement in AMR codes > is generally high. Which part of FLASH is expensive in terms of memory? How > can i know how much memory i will be required for X^3 grid calculations?
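The per-variable estimate in this reply can be worked through numerically. This is a minimal sketch in Python, assuming 4 guard cells on each side of a block in every direction (which is what turns an 8^3 block into 16^3 and a 32^3 block into 40^3 in the numbers quoted) and assuming 8 bytes per word; the function name `cells_per_variable` is hypothetical. It counts only the blocks that tile the domain at a single level and ignores the extra metadata and buffer storage that AMR needs.

```python
def cells_per_variable(n_domain, n_block, n_guard=4):
    """Return (number of blocks, stored cells per variable) for a cubic domain."""
    if n_domain % n_block != 0:
        raise ValueError("block size must divide the domain size")
    blocks_per_side = n_domain // n_block
    n_blocks = blocks_per_side ** 3
    # Each block stores its interior plus n_guard cells on both sides per direction.
    cells_per_block = (n_block + 2 * n_guard) ** 3
    return n_blocks, n_blocks * cells_per_block


BYTES_PER_WORD = 8  # assumption: double-precision storage
interior = 512 ** 3

for n_block in (8, 32, 512):
    n_blocks, cells = cells_per_variable(512, n_block)
    gib = cells * BYTES_PER_WORD / 2 ** 30
    print(f"{n_block:>3}^3 blocks: {n_blocks:>6} blocks, {cells:>10} cells, "
          f"{cells / interior:.2f}x interior, {gib:.2f} GiB per variable")
```

Under these assumptions the 8^3 blocking stores 8 times as many cells as there are interior cells, while a single 512^3 block stores about 1.05 times as many. Multiplying the per-variable figure by the number of variables gives a rough lower bound for the grid data alone.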
> Cheers > Latif > > From andrew.szymkowiak at yale.edu Thu Sep 24 10:54:01 2009 From: andrew.szymkowiak at yale.edu (Andrew Szymkowiak) Date: Thu, 24 Sep 2009 11:54:01 -0400 Subject: [FLASH-USERS] problem w/ userdef boundary Message-ID: <4ABB9619.2080905@yale.edu> (I'll continue to crawl around in idb, but I've spent sufficient time w/o progress that I thought it might now be expedient to run even my partial problem report by the experts, to see if any bells are rung and they can direct me...) I've been working with a new simulation, and my Simulation_initBlock works fine. Now I'm trying to add a Grid_bcApplyToRegionSpecialized, and much of that is working, when all 6 boundaries are set to "reflect". But when I set one boundary to "userdef", my simulation dies because the guardcells along that face are never initialized, they are left at zero, and the "sanitize" routine complains because densities, etc are too small. So I've started crawling around in the debugger, and what I've discovered so far is that the gr_bcApplyToOneFace routine is only calling my Grid_bcApplyToRegionSpecialized 5 times, once for each of the reflecting boundaries, but never for the userdef one. I've been watching the surrblks array, and never see the userdef value (-38) in it, only the -31 (=REFLECTING) Any ideas? Thanks, Andy S. From klaus at flash.uchicago.edu Thu Sep 24 11:11:01 2009 From: klaus at flash.uchicago.edu (Klaus Weide) Date: Thu, 24 Sep 2009 11:11:01 -0500 (CDT) Subject: [FLASH-USERS] problem w/ userdef boundary In-Reply-To: <4ABB9619.2080905@yale.edu> References: <4ABB9619.2080905@yale.edu> Message-ID: On Thu, 24 Sep 2009, Andrew Szymkowiak wrote: > I've been working with a new simulation, and my Simulation_initBlock works > fine. Now I'm trying to add a Grid_bcApplyToRegionSpecialized, and much of > that is working, when all 6 boundaries are set to "reflect". 
But when I set > one boundary to "userdef", my simulation dies because the guardcells along > that face are never initialized, they are left at zero, and the "sanitize" > routine complains because densities, etc are too small. So I've started > crawling around in the debugger, and what I've discovered so far is that the > gr_bcApplyToOneFace routine is only calling my Grid_bcApplyToRegionSpecialized > 5 times, once for each of the reflecting boundaries, but never for the userdef > one. I've been watching the surrblks array, and never see the userdef value > (-38) in it, only the -31 (=REFLECTING) Could it be as simple as this: The string you are using is not recognized and silently ignored. For boundary conditions, the recognized strings are in the file RuntimeParameters_mapStrToInt.F90: $ grep -in user RuntimeParameters_mapStrToInt.F90 122: case ("user", "user-defined","USER","USER-DEFINED") 123:#ifdef USER_DEFINED 124: constKey = USER_DEFINED So you should have "user" or "user-defined" in your flash.par, not "userdef". Klaus From andrew.szymkowiak at yale.edu Thu Sep 24 11:33:35 2009 From: andrew.szymkowiak at yale.edu (Andrew Szymkowiak) Date: Thu, 24 Sep 2009 12:33:35 -0400 Subject: [FLASH-USERS] problem w/ userdef boundary In-Reply-To: References: <4ABB9619.2080905@yale.edu> Message-ID: <4ABB9F5F.8030207@yale.edu> Well that was sure embarrassing - yes that was it. > So you should have "user" or "user-defined" in your flash.par, not > "userdef". > > Klaus > (I have a distinct memory of having discovered the right value by looking at that routine, and having fixed this in my flash.par. It seems likely that I again suffered from the fact that when I re-run setup, in this case to turn on debugging, that it re-writes the (good) flash.par in the object library from the original in the source directory, and reverted to the old, bad value - but in my head, this had already been fixed. Sorry. 
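The failure mode in this thread, an unrecognized boundary string being silently ignored, can be illustrated with a small sketch. This is Python, not the Fortran of RuntimeParameters_mapStrToInt.F90, and it is not how FLASH implements the mapping: the function `map_boundary_string` and its lookup table are hypothetical, and only the strings and integer values that appear in this thread ("reflect" giving -31, "user" and "user-defined" giving -38) are used. The point is only that the fall-through case can report the offending string instead of staying quiet.

```python
import warnings

# Integer keys as reported in this thread (values taken from the messages above).
REFLECTING = -31
USER_DEFINED = -38

# Hypothetical lookup table; a real code would list every supported string.
_BOUNDARY_KEYS = {
    "reflect": REFLECTING,
    "user": USER_DEFINED,
    "user-defined": USER_DEFINED,
}


def map_boundary_string(name):
    """Map a boundary-type string to its integer key, warning if it is unknown."""
    key = _BOUNDARY_KEYS.get(name.strip().lower())
    if key is None:
        # Default case: tell the user instead of silently ignoring the string.
        warnings.warn(f"unrecognized boundary type string: {name!r}")
    return key


for candidate in ("reflect", "user", "userdef"):
    print(candidate, map_boundary_string(candidate))
```

A warning of this kind would have pointed directly at the "userdef" entry in flash.par.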
(Perhaps the string to integer routine should emit a warning when the default case is reached?) From klaus at flash.uchicago.edu Thu Sep 24 12:15:49 2009 From: klaus at flash.uchicago.edu (Klaus Weide) Date: Thu, 24 Sep 2009 12:15:49 -0500 (CDT) Subject: [FLASH-USERS] problem w/ userdef boundary In-Reply-To: <4ABB9F5F.8030207@yale.edu> References: <4ABB9619.2080905@yale.edu> <4ABB9F5F.8030207@yale.edu> Message-ID: On Thu, 24 Sep 2009, Andrew Szymkowiak wrote: > (Perhaps the string to integer > routine should emit a warning when the default case is reached?) I agree that FLASH should at least emit some warning if a boundary condition string is not recognized. Klaus From andrew.szymkowiak at yale.edu Wed Sep 30 11:48:20 2009 From: andrew.szymkowiak at yale.edu (Andrew Szymkowiak) Date: Wed, 30 Sep 2009 12:48:20 -0400 Subject: [FLASH-USERS] problems with refinement? Message-ID: <4AC38BD4.3080200@yale.edu> This will only be a brief version of my situation - I'm looking for debugging suggestions, etc. I've written a Grid_bcApplyToRegionSpecialized, which puts an outflow into some regions of the boundary. But when I do a run with max_lrefine > 2, after some iterations, I am getting "ghost" copies of parts of my flow in other regions of the bottom plane. I've spent a lot of time trying to see if it's my (mis-)handling of the RegionData, and the endpoints, etc - but have never found anything I yet recognize as wrong in the debugging data I've been dumping. Also, the "ghosts" have a lower amplitude than the region where I'm actually inserting the flow, so it seems unlikely that my routine is setting the guardcells under the ghosts. I was wondering, partially based on the earlier discussion about the markRefineDerefine problems in the Jeans simulation, whether I could be having problems with parent blocks vs leaf blocks, etc. I did just try a run with "advance_all_levels" set to true, but it made no difference.
This is with FLASH 3.2, and a setup including "-3d -auto +8wave". In addition to the ApplyToRegion routine, the only other sources in my directory are a Simulation_data.F90, a Simulation_init.F90, and a Simulation_initBlock.F90, so all the rest should be "vanilla" FLASH. Thanks, Andy S.