From seyit at astro.rug.nl Mon Aug 4 08:00:10 2008
From: seyit at astro.rug.nl (Seyit Hocuk)
Date: Mon, 04 Aug 2008 15:00:10 +0200
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
References: <48441DC4.9050701@astro.rug.nl> <487CE1DC.8080505@flash.uchicago.edu> <4880C5A5.50708@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
Message-ID: <4896FD5A.50202@astro.rug.nl>

Hi Paul, hi Nathan,

First of all, using --with-default-api-version=v16 when configuring hdf5-1.8.1 works fine. Thanks for that, Paul.

Nathan, if by uninitialized you mean that the types are not defined (like REAL or INTEGER or whatever in the Config file), then no, because I define them all. But if you mean that I have included modules or parameters which I don't use, that's correct. I am no expert in programming, so it might indeed be good to check this.

Greetz,
Seyit

Nathan Hearn wrote:
> Hi Seyit,
>
> Forgive me if I am asking a question that has already been
> addressed, but is it possible that there are uninitialized variables
> that are being used in your code? It is certainly possible that they
> could yield different results with each change in compiler, library
> version, and optimization level.
>
> If you haven't done so already, you might want to look through the
> output of your first checkpoint file (using hdfview or h5dump) to see
> if any of the datasets have unexpected values. You could also see if
> your compiler has an option to automatically initialize variables when
> Flash is run (this should not be used to fix the problem, but rather,
> to expose it); if possible, see if you can automatically initialize
> all floating point data to NaN.
>
>
> - Nathan
>
>
> On Wed, Jul 30, 2008 at 4:48 AM, Seyit Hocuk wrote:
>
>> Ah thanks Nathan,
>>
>> I see there is a good link to the api bindings. That answers the question I
>> posted a minute ago I guess.
>> I think I just have to configure hdf5 with a certain flag.
>> "--with-default-api-version=v16"
>>
>> Let me try that out.
>>
>> Greetz,
>> Seyit
>>

From seyit at astro.rug.nl Mon Aug 4 08:17:20 2008
From: seyit at astro.rug.nl (Seyit Hocuk)
Date: Mon, 04 Aug 2008 15:17:20 +0200
Subject: [FLASH-USERS] Flash conferences, workshops etc.
Message-ID: <48970160.5020908@astro.rug.nl>

Hi,

I have a different type of question. Are there any FLASH workshops, meetings or conferences planned in the future that you know of? Or any get-together that any of the creators of FLASH will attend? My friend and I would like to attend one, since FLASH has become a big part of our research.

I know there was one last year. Maybe there is a list somewhere on the web.

Greetz,
Seyit

From nhearn at uchicago.edu Mon Aug 4 10:20:33 2008
From: nhearn at uchicago.edu (Nathan Hearn)
Date: Mon, 4 Aug 2008 10:20:33 -0500
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <4896FD5A.50202@astro.rug.nl>
References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl>
Message-ID: <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>

Hi Seyit,

An uninitialized variable is one that is declared (specified as integer, real, etc.) but not assigned a value. Thus, an uninitialized variable usually has whatever value was in its memory location before it was declared. (It could be a random number, "infinity," or just garbage.) If this variable gets used before a value is assigned to it, strange behavior may result, which would be very compiler- and architecture-specific. If you are using uninitialized pointer or allocatable variables, the effects can be quite drastic and hard to identify.

Generally speaking, it is a good idea to assign a value to every variable soon after it is declared, even if it is only a temporary value that is not actually used. (As I recall, there is a way to assign a null value to pointers, which is also a very useful practice.)

- Nathan

From nhearn at uchicago.edu Mon Aug 4 10:26:29 2008
From: nhearn at uchicago.edu (Nathan Hearn)
Date: Mon, 4 Aug 2008 10:26:29 -0500
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
References: <48441DC4.9050701@astro.rug.nl> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
Message-ID: <2467fdc0808040826w788f1273jff47321ba1d12218@mail.gmail.com>

P.S., I just realized that I may have confused things a bit. (Sorry!) I am actually referring to variables used in the Fortran code, and not the Flash parameters declared in Config.

- Nathan

--
Nathan C. Hearn
nhearn at uchicago.edu

ASC Flash Center
Computational Physics Group
University of Chicago

From carlo at oddjob.uchicago.edu Mon Aug 4 11:48:55 2008
From: carlo at oddjob.uchicago.edu (Carlo Graziani)
Date: Mon, 04 Aug 2008 11:48:55 -0500
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
Message-ID: <489732F7.8090902@oddjob.uchicago.edu>

Hi Seyit.

Nathan's suggestion of investigating the use of uninitialized variables and erratically allocated/deallocated memory is very sensible for tracking down a problem that manifests itself as unpredictable behavior.

There is actually an open-source tool for doing this called valgrind. It may already be installed on your local Linux systems, and if not it is easy enough to obtain.
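
To make the discussion concrete, here is a minimal Fortran sketch of the habit Nathan describes: give every variable a value before its first use, and start pointers out as null. It is an illustration only, not code from FLASH or from Seyit's setup; the program name and the variables total and work are made up. Deleting the marked assignment would leave total read before it is ever set, which is the kind of access that valgrind, or a compiler's run-time check for uninitialized variables, is meant to flag.

```fortran
program init_demo
  implicit none
  integer :: i
  real :: total                       ! declared, but holds no defined value yet
  real, pointer :: work(:) => null()  ! pointer starts out disassociated (null)

  ! Hypothetical example: without this assignment, the loop below would
  ! read 'total' before any value had been stored in it.
  total = 0.0

  do i = 1, 10
     total = total + real(i)
  end do
  print *, 'total = ', total

  ! Because 'work' was nullified at declaration, associated() gives a
  ! well-defined answer and we can safely decide whether to allocate.
  if (.not. associated(work)) then
     allocate(work(10))
     work = 0.0                       ! give the new storage a defined value
  end if
  print *, 'sum(work) = ', sum(work)

  deallocate(work)
end program init_demo
```

The point of the null() initialization is that associated() may only be called on a pointer whose association status is defined; a pointer that was never nullified or assigned has an undefined status, which matches the "drastic and hard to identify" case mentioned earlier in the thread.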
One runs a program under valgrind very simply (the documentation gives more options):

Prompt> valgrind

Then one sits back and digests the potentially voluminous output.

Valgrind will flag any access to uninitialized memory and memory-management screw-ups.

Caveats are:

(1) It's slow;

(2) You'd be amazed at how much valgrind finds distasteful in system libraries. You'll probably have to filter away a bunch of uninteresting warnings about libc/mpi/hdf5 and so on (which are probably harmless, and which you can't do much about anyway). Valgrind has some facilities for suppressing certain types of warnings, which you can use to cut down the noise.

If you can make a small version of the problem, running on one processor, that exhibits the erratic behavior, this would probably be an ideal case to feed to valgrind. There's some support for parallel debugging, but you'd probably have to spend some quality time with documentation and haunt some other mailing lists to get that running.

Cheers,

Carlo

--
Carlo Graziani                                 (773) 702-7973 (Voice)
Department of Astronomy and Astrophysics       (773) 702-6645 (FAX)
University of Chicago
5640 South Ellis Avenue       | If the free market really allocates resources
Chicago, IL 60637             | efficiently, why does LA have four times as
carlo at oddjob.uchicago.edu  | many plastic surgeons as brain surgeons?

From cdaley at flash.uchicago.edu Mon Aug 4 13:37:27 2008
From: cdaley at flash.uchicago.edu (Chris Daley)
Date: Mon, 04 Aug 2008 13:37:27 -0500
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <489732F7.8090902@oddjob.uchicago.edu>
References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com> <489732F7.8090902@oddjob.uchicago.edu>
Message-ID: <48974C67.5010809@flash.uchicago.edu>

Hi Seyit,

I fully agree with Carlo's recommendation - Valgrind is an excellent tool. However, before resorting to such a powerful tool, it may be worthwhile using your compiler to detect uninitialised data. I notice that you have the Intel compiler on two of your computing platforms.

Try adding "-check uninit" and "-traceback" to your Fortran (debug) compilation options. This will check whether an uninitialised variable is used in a calculation, and will generate a runtime error if it is.

I've just run a mini test in which I used a variable in my Simulation_initBlock.F90 that is never initialised. Here is the output:

forrtl: severe (193): Run-Time Check Failure. The variable 'simulation_initblock_$PP' is being used without being defined

Image              PC                Routine            Line        Source
flash3             00000000006B010B  Unknown               Unknown  Unknown
flash3             00000000006AE446  Unknown               Unknown  Unknown
flash3             00000000006878DE  Unknown               Unknown  Unknown
flash3             000000000065CEB2  Unknown               Unknown  Unknown
flash3             000000000065D147  Unknown               Unknown  Unknown
flash3             000000000049D3DC  simulation_initbl          88  Simulation_initBlock.F90
flash3             000000000044F05B  Unknown               Unknown  Unknown
flash3             000000000040E152  Unknown               Unknown  Unknown
flash3             000000000041211D  Unknown               Unknown  Unknown
flash3             0000000000404A6A  Unknown               Unknown  Unknown
libc.so.6          000000355501C3FB  Unknown               Unknown  Unknown
flash3             00000000004049AA  Unknown               Unknown  Unknown
p0_31573: p4_error: interrupt SIGx: 13

Remember to compile your code in "debug" mode, i.e. to include the -g compilation flag and no optimisations.

You may also want to look at the option "-ftrapuv", which initialises local stack variables to "unusual values". Further information about all of these options can be found in the Intel man page.

Regards,
Chris

From hirche at astro.univie.ac.at Wed Aug 6 04:18:15 2008
From: hirche at astro.univie.ac.at (Stefan Hirche)
Date: Wed, 6 Aug 2008 11:18:15 +0200
Subject: [FLASH-USERS] IDL loaddata, flash2
Message-ID: <200808061118.16302.hirche@astro.univie.ac.at>

Hi,

Unfortunately, I'm not able to use flash3, because I depend on the database. Now I have a problem with the IDL loaddata function. I have a variable in the database that has values around 1.e-72. This does not fit into a float, but it does fit into a double. So far, however, I have not been able to read this variable with the loaddata function, not even with the parameter double=1. With double=1 I no longer get the "float underflow" warning, but the returned value is still only 0.

Any ideas what I'm doing wrong?

Thanks,
Stefan

From mh475 at cam.ac.uk Wed Aug 6 11:59:47 2008
From: mh475 at cam.ac.uk (Martin Huarte-Espinosa)
Date: Wed, 6 Aug 2008 17:59:47 +0100
Subject: [FLASH-USERS] Problems compiling Flash3
In-Reply-To: <3182e9f70808060952x5fd0434pf9bf4b15297f7cc7@mail.gmail.com>
References: <3182e9f70808060952x5fd0434pf9bf4b15297f7cc7@mail.gmail.com>
Message-ID: <3182e9f70808060959l5449ee70t228379434db2c26c@mail.gmail.com>

Good day Flash3 users:

I have some compilation problems that I haven't been able to solve.

I run $ make in the object directory and it displays:

-L/mraosw/data1/krause/lib/mpich-1.2.4/lib
/home/krause/dataw/lib/mpich-1.2.4/lib/libmpich.a(p4_secure.o)(.text+0x91): In function `start_slave':
: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/mraosw/data1/krause/lib/hdf/5-1.6.5-amd-icc/lib/libhdf5.a(H5FDstream.o)(.text+0x723): In function `H5FD_stream_open_socket':
: warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Grid_bcApplyToRegionSpecialized.o(.text+0x5ac): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_counter1_
Grid_bcApplyToRegionSpecialized.o(.text+0x5c7): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_numepis_
Grid_bcApplyToRegionSpecialized.o(.text+0x5e2): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_tinit_
Grid_bcApplyToRegionSpecialized.o(.text+0x5f0): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_tjet_
Grid_bcApplyToRegionSpecialized.o(.text+0x600): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_counter1_
Grid_bcApplyToRegionSpecialized.o(.text+0x629): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_counter1_
Grid_bcApplyToRegionSpecialized.o(.text+0x943): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_jetradius_
Grid_bcApplyToRegionSpecialized.o(.text+0x9a1): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_jetvel_
Grid_bcApplyToRegionSpecialized.o(.text+0x9fc): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_gamma_
Grid_bcApplyToRegionSpecialized.o(.text+0xa4b): In function `grid_bcapplytoregionspecialized_':
: relocation truncated to fit: R_X86_64_PC32 simulation_data_mp_sim_gamma_
Grid_conserveField.o(.text+0xe): In function `grid_conservefield_':
: additional relocation overflows omitted from the output
/home/krause/dataw/lib/mpich-1.2.4/lib/libpmpich.a(getpname.o)(.text+0x1b): In function `PMPI_Get_processor_name':
: undefined reference to `MPID_Node_name'
make: *** [flash3] Error 1

My make file (taken from my working flash2.5 for the same machine):

LD_LIBRARY_PATH = $LD_LIBRARY_PATH:/mraosw/data1/krause/lib/hdf/szip2.0-amd-enc/lib
#HDF4_PATH = /usr/local/hdf4
HDF5_PATH = /mraosw/data1/krause/lib/hdf/5-1.6.5-amd-icc
#hdf/hdf5-1.6.5/hdf5/
#PAPI_PATH = /usr/papi

#PAPI_FLAGS = -c -fast -I$(PAPI_PATH)/include

#----------------------------------------------------------------------------
# Compiler and linker commands
#
# Use the MPICH wrappers around the compilers -- these will automatically
# load the proper libraries and include files. Version of MPICH prior
# to 1.2.2 (?) do not recognize .F90 as a valid Fortran file extension.
# You need to edit mpif90 and add .F90 to the test of filename extensions,
# or upgrade your MPICH.
#----------------------------------------------------------------------------
FCOMP = /mraosw/data1/krause/lib/mpich-1.2.4/bin//mpif90
CCOMP = /mraosw/data1/krause/lib/mpich-1.2.4/bin//mpicc
CPPCOMP = /mraosw/data1/krause/lib/mpich-1.2.4/bin//mpicc
LINK = /mraosw/data1/krause/lib/mpich-1.2.4/bin//mpif90 -static
#FCOMP = /mraosw/data1/krause/lib/mpich2-1.0.4p1/bin/mpif90
#CCOMP = /mraosw/data1/krause/lib/mpich2-1.0.4p1/bin/mpicc
#CPPCOMP = /mraosw/data1/krause/lib/mpich2-1.0.4p1/bin/mpicc
#LINK = /mraosw/data1/krause/lib/mpich2-1.0.4p1/bin/mpif90 -static
# pre-processor flag
PP = -D

#----------------------------------------------------------------------------
# Compilation flags
#
# Three sets of compilation/linking flags are defined: one for optimized
# code, one for testing, and one for debugging. The default is to use the
# _OPT version. Specifying -debug to setup will pick the _DEBUG version,
# these should enable bounds checking. Specifying _TEST is used for
# flash_test, and is set for quick code generation, and (sometimes)
# profiling. The Makefile generated by setup will assign the generic token
# (ex. FFLAGS) to the proper set of flags (ex. FFLAGS_OPT).
#----------------------------------------------------------------------------

#FFLAGS_OPT = -c -fast -r8 -i4
#FFLAGS_DEBUG = -g -c -Mbounds -r8 -i4
#FFLAGS_TEST = -c -r8 -i4 -Mprof=lines
FFLAGS_OPT = -c -r8 -i4 -O3 -I/home/krause/mpich2-0.971/mpich2-install/include
FFLAGS_DEBUG = -g -c -Mbounds -double
FFLAGS_TEST = -c -double -Mprof=lines

F90FLAGS = -w -v -mismatch -dusty

# if we are using HDF5, we need to specify the path to the include files
CFLAGS_HDF5 = -I $(HDF5_PATH)/include

CFLAGS_OPT = -O3 -c
CFLAGS_DEBUG = -g -c
CFLAGS_TEST = -c

#----------------------------------------------------------------------------
# Linker flags
#
# There is a seperate version of the linker flags for each of the _OPT,
# _DEBUG, and _TEST cases.
#----------------------------------------------------------------------------

LFLAGS_OPT = -o
LFLAGS_DEBUG = -g -o
LFLAGS_TEST = -Mprof=lines -o

#----------------------------------------------------------------------------
# Library specific linking
#
# If a FLASH module has a 'LIBRARY xxx' line in its Config file, we need to
# create a macro in this Makefile.h for LIB_xxx, which will be added to the
# link line when FLASH is built. This allows us to switch between different
# (incompatible) libraries. We also create a _OPT, _DEBUG, and _TEST
# library macro to add any performance-minded libraries (like fast math),
# depending on how FLASH was setup.
#----------------------------------------------------------------------------

#LIB_MPI = -L/mraosw/data1/krause/lib/mpich-1.2.4/lib
LIB_MPI = -L/mraosw/data1/krause/lib/mpich-1.2.6/lib
LIB_HDF4 = -L$(HDF4_PATH)/lib -lmfhdf -ldf -ljpeg -lz
LIB_HDF5 = -L$(HDF5_PATH)/lib -lhdf5 -lz -lhdf5_fortran
# SZ with path to be included if v1.6.3 is used (not recommended)
#LIB_PAPI = $(PAPI_PATH)/lib/libpapi.a
LIB_MATH = -ldfftw -ldrfftw

#LIB_OPT = -L/home/krause/mpich2-0.971/mpich2-install/lib

LIB_DEBUG =
LIB_TEST =

#----------------------------------------------------------------------------
# Additional machine-dependent object files
#
# Add any machine specific files here -- they will be compiled and linked
# when FLASH is built.
#----------------------------------------------------------------------------

MACHOBJ =

#----------------------------------------------------------------------------
# Additional commands
#----------------------------------------------------------------------------

MV = mv -f
AR = ar -r
RM = rm -f
CD = cd
RL = ranlib
ECHO = echo

Do you have any hints?

Thank you.

--
Martin Huarte-Espinosa
PhD Student, Astrophysics Group,
Physics Department, University of Cambridge
mh475 at cam.ac.uk

From cdaley at flash.uchicago.edu Wed Aug 6 12:22:21 2008
From: cdaley at flash.uchicago.edu (Chris Daley)
Date: Wed, 06 Aug 2008 12:22:21 -0500
Subject: [FLASH-USERS] Problems compiling Flash3
In-Reply-To: <3182e9f70808060959l5449ee70t228379434db2c26c@mail.gmail.com>
References: <3182e9f70808060952x5fd0434pf9bf4b15297f7cc7@mail.gmail.com> <3182e9f70808060959l5449ee70t228379434db2c26c@mail.gmail.com>
Message-ID: <4899DDCD.6030905@flash.uchicago.edu>

Hi there,

It may be your LIB_MPI flag. Try this:

LIB_MPI = -L/mraosw/data1/krause/lib/mpich-1.2.6/lib -lmpich

Regards,
Chris

From hudson at mcs.anl.gov Wed Aug 13 10:02:53 2008
From: hudson at mcs.anl.gov (Randy Hudson)
Date: Wed, 13 Aug 2008 10:02:53 -0500
Subject: [FLASH-USERS] Visit - corrections
Message-ID: <3319ADBB-ED8C-4654-862D-B55B9FD86949@mcs.anl.gov>

(The few of you who already know about this can ignore this email....)

I've fixed two problems with VisIt, to wit:

1) the messy drawing, one atop another, of overlapping blocks from different refinement levels, and

2) the failure to read some FLASH3 files with particles.

But it'll be a few weeks before VisIt release 1.10, which contains these fixes, is available.
If you think you're having either of these problems, and would like to try
building the fixed flash plugin stand-alone, let me know and I can send you
the source and build instructions.

Randy.

From seyit at astro.rug.nl Thu Aug 14 12:03:23 2008
From: seyit at astro.rug.nl (Seyit Hocuk)
Date: Thu, 14 Aug 2008 19:03:23 +0200
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <48974C67.5010809@flash.uchicago.edu>
References: <48441DC4.9050701@astro.rug.nl>
	<2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com>
	<4880D78C.4020708@astro.rug.nl>
	<4880DC19.7070205@scs.fsu.edu>
	<488F58E1.9050306@astro.rug.nl>
	<488F7953.4010802@flash.uchicago.edu>
	<2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com>
	<489038FC.2030005@astro.rug.nl>
	<2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
	<4896FD5A.50202@astro.rug.nl>
	<2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
	<489732F7.8090902@oddjob.uchicago.edu>
	<48974C67.5010809@flash.uchicago.edu>
Message-ID: <48A4655B.90801@astro.rug.nl>

Hi Chris, Nathan, Carlo,

It has been a while now, but finally I had time to check some stuff.
Like Chris said I tried compiling with -CU (check uninit) and also
with -ftrapuv. The error with -CU is shown here below. However, the
same error is showing when I use normal jeans setup and a different
error is generated with Sedov test problem. I think there are many
undeclared variables or subroutines anyway. When I use -ftrapuv, I get
segmentation faults. These debug options are not much use unfortunately.

By the way, what is the use of -g when debugging. It seems like it is
doing nothing.


* JEANS SETUP

 perturbation is unstable with growth time 8776137041619.21
forrtl: severe (193): Run-Time Check Failure. The variable 'poisson_mg_relax_$ERROR' is being used without being defined
Image      PC                Routine            Line     Source
flash2     00000000005B70FE  Unknown            Unknown  Unknown
flash2     00000000005B62FE  Unknown            Unknown  Unknown
flash2     000000000056EB56  Unknown            Unknown  Unknown
flash2     0000000000536D51  Unknown            Unknown  Unknown
flash2     00000000005383C6  Unknown            Unknown  Unknown
flash2     00000000004D31E4  poisson_mg_relax_  315      poisson_mg_relax.F90
flash2     00000000004D4DE9  poisson_mg_solve_  38       poisson_mg_solve.F90
flash2     00000000004CA3C4  mg_cycle_          82       mg_cycle.F90
flash2     00000000004CEEC8  multigrid_         150      multigrid.F90
flash2     00000000004D249A  poisson_           87       poisson.F90
flash2     0000000000430CE6  modulegravpotenti  139      GravPotentialAllBlocks.F90
flash2     000000000042AF1B  init_from_scratch  257      init_from_scratch.F90
flash2     000000000041D45A  init_flash_        324      init_flash.F90
flash2     000000000041C5BF  MAIN__             62       flash.F90
flash2     0000000000405362  Unknown            Unknown  Unknown
libc.so.6  00007F7DBB0131C4  Unknown            Unknown  Unknown
flash2     00000000004052A9  Unknown            Unknown  Unknown


* SEDOV EXPLOSION SETUP

 [CHECKPOINT_WR] NOTE: will send 710 blocks per message.
 [CHECKPOINT_WR] Writing checkpoint file sedov_hdf5_chk_0000
 Progress: |
forrtl: severe (193): Run-Time Check Failure. The variable 'perfmon_mp_log_timers_$CHECKPT_NUM' is being used without being defined
Image      PC                Routine            Line     Source
flash2     00000000005A1856  Unknown            Unknown  Unknown
flash2     00000000005A0A56  Unknown            Unknown  Unknown
flash2     000000000055A646  Unknown            Unknown  Unknown
flash2     0000000000522B81  Unknown            Unknown  Unknown
flash2     00000000005241F6  Unknown            Unknown  Unknown
flash2     00000000004CB301  perfmon_mp_log_ti  1067     perfmon.F90
flash2     0000000000465F75  checkpoint_wr_     701      checkpoint_wr.F90
flash2     000000000045E2F6  output_initial_    189      output_initial.F90
flash2     000000000041DB09  init_flash_        335      init_flash.F90
flash2     000000000041CC33  MAIN__             62       flash.F90
flash2     00000000004051E2  Unknown            Unknown  Unknown
libc.so.6  00007F0BCE1FF1C4  Unknown            Unknown  Unknown
flash2     0000000000405129  Unknown            Unknown  Unknown


Chris Daley wrote:
> Hi Seyit,
>
> I fully agree with Carlo's recommendation - Valgrind is an excellent
> tool. However, before resorting to such a powerful tool, it may be
> worthwhile using your compiler to detect uninitialised data. I notice
> that you have the intel compiler on two of your computing platforms.
>
> Try adding "-check uninit" and "-traceback" to your Fortran
> (debug) compilation options. This will check if an uninitialised variable
> is used in a calculation, and will generate a runtime error if it is used.
>
> I've just run a mini test in which I used a variable in my
> Simulation_initBlock.F90 that is never initialised. Here is the output:
>
> forrtl: severe (193): Run-Time Check Failure.
> The variable 'simulation_initblock_$PP' is being used without being defined
> Image      PC                Routine            Line     Source
> flash3     00000000006B010B  Unknown            Unknown  Unknown
> flash3     00000000006AE446  Unknown            Unknown  Unknown
> flash3     00000000006878DE  Unknown            Unknown  Unknown
> flash3     000000000065CEB2  Unknown            Unknown  Unknown
> flash3     000000000065D147  Unknown            Unknown  Unknown
> flash3     000000000049D3DC  simulation_initbl  88       Simulation_initBlock.F90
> flash3     000000000044F05B  Unknown            Unknown  Unknown
> flash3     000000000040E152  Unknown            Unknown  Unknown
> flash3     000000000041211D  Unknown            Unknown  Unknown
> flash3     0000000000404A6A  Unknown            Unknown  Unknown
> libc.so.6  000000355501C3FB  Unknown            Unknown  Unknown
> flash3     00000000004049AA  Unknown            Unknown  Unknown
> p0_31573: p4_error: interrupt SIGx: 13
>
> Remember to compile your code in "debug" mode, i.e. to
> include the -g compilation flag and no optimisations.
>
> You may also want to look at the option "-ftrapuv" which initialises
> local stack variables to "unusual values". Further information about
> all of these options can be found in the intel man page.
>
> Regards,
> Chris
>
>
> Carlo Graziani wrote:
>> Hi Seyit.
>>
>> Nathan's suggestion of investigating un-initialized variables use
>> and erratically allocated/deallocated memory is very sensible for
>> tracking down a problem that manifests itself as unpredictable behavior.
>>
>> There is actually an open-source tool for doing this called valgrind.
>> It may already be installed on your local linux systems, and if not it
>> is easy enough to obtain.
>>
>> One runs a program under valgrind very simply (the documentation gives
>> more options):
>>
>> Prompt> valgrind
>>
>> Then one sits back and digests the potentially-voluminous output.
>>
>> Valgrind will flag any access to uninitialized memory and
>> memory-management screw-ups.
>>
>> Caveats are: (1) It's slow;
>>
>> (2) You'd be amazed at how much valgrind finds distasteful in system
>> libraries. You'll probably have to filter away a bunch of uninteresting
>> warnings about libc/mpi/hdf5 and so on (which are probably harmless,
>> and which you can't do much about anyway). Valgrind has some facilities
>> for suppressing certain types of warnings, which you can use to cut down
>> the noise.
>>
>> If you can make a small version of the problem, running on one
>> processor, that exhibits the erratic behavior, this would probably be
>> an ideal case to feed to valgrind. There's some support for parallel
>> debugging, but you'd probably have to spend some quality time with
>> documentation and haunt some other mailing lists to get that running.
>>
>> Cheers,
>>
>> Carlo
>>
>> Nathan Hearn wrote:
>>> Hi Seyit,
>>>
>>> An uninitialized variable is one that is declared (specified as
>>> integer, real, etc.), but not assigned a value. Thus, an
>>> uninitialized variable usually has whatever value was in its memory
>>> location before it was declared. (It could be a random number,
>>> "infinity," or just garbage.) If this variable gets used before a
>>> value is assigned to it, strange behavior may result (which would be
>>> very compiler- and architecture-specific). If you are using
>>> uninitialized pointer or allocatable variables, the effects can be
>>> quite drastic and hard to identify.
>>>
>>> Generally speaking, it is a good idea to assign a value to every
>>> variable soon after it is declared, even if it is only a temporary
>>> value that is not actually used. (As I recall, there is a way to
>>> assign a null value to pointers, which is also a very useful
>>> practice.)
>>>
>>>
>>> - Nathan
>>>
>>>
>>> On 8/4/08, Seyit Hocuk wrote:
>>>> Hi Paul, hi Nathan,
>>>>
>>>> First of all; using --with-default-api-version=v16 when configuring
>>>> hdf5-1.8.1 works fine. Thanks for that Paul.
>>>>
>>>> Nathan, if you mean by uninitialized that the types are not defined
>>>> (like in config file REAL or INTEGER or whatever), then no because I
>>>> define them all. But if you mean I have included modules or parameters
>>>> which I don't use, that's correct and I am no expert in programming so
>>>> it might indeed be good to check this.
>>>>
>>>> Greetz,
>>>> Seyit
>>
>>
>

From cdaley at flash.uchicago.edu Fri Aug 15 15:02:39 2008
From: cdaley at flash.uchicago.edu (Chris Daley)
Date: Fri, 15 Aug 2008 15:02:39 -0500
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <48A4655B.90801@astro.rug.nl>
References: <48441DC4.9050701@astro.rug.nl>
	<2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com>
	<4880D78C.4020708@astro.rug.nl>
	<4880DC19.7070205@scs.fsu.edu>
	<488F58E1.9050306@astro.rug.nl>
	<488F7953.4010802@flash.uchicago.edu>
	<2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com>
	<489038FC.2030005@astro.rug.nl>
	<2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
	<4896FD5A.50202@astro.rug.nl>
	<2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
	<489732F7.8090902@oddjob.uchicago.edu>
	<48974C67.5010809@flash.uchicago.edu>
	<48A4655B.90801@astro.rug.nl>
Message-ID: <48A5E0DF.1050401@flash.uchicago.edu>

Hi Seyit,

The compiler flag is doing its job - we get a run-time error when an
uninitialised variable is used.

1. I can confirm that the abort from perfmon.F90 is an error. Notice
that the subroutine argument is "check_ptnum", but the condition in the
"if" statement evaluates a variable named "checkpt_num". This emphasises
the importance of "implicit none".

2. The abort from poisson_mg_relax.F90 stems from a situation which will
not normally cause problems in a simulation. This is because the "error"
variable is part of the following logical expression.

   done = (iter == nsmooth) .or. ((iter > iterating_to_convergence_limit) .and. (error > 0.))

i.e. (iter > iterating_to_convergence_limit) .and. (error > 0.). So we
need both conditions to be .true. to obtain a .true. in the cumulative
expression.
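
A minimal sketch of the two situations above, written as a stand-alone
program rather than taken from the FLASH source. The program name, the
values of nsmooth and iterating_to_convergence_limit, and the stand-in
residual are all assumptions made for illustration; only the form of the
"done" expression comes from the message. The Fortran standard does not
require short-circuit evaluation of .and., so the compiler may read "error"
even when the first operand is .false., which is presumably why the
run-time check fires.

```fortran
program relax_sketch
  ! With "implicit none", a misspelt name (e.g. checkpt_num written for
  ! check_ptnum) is rejected at compile time instead of silently creating
  ! a new, uninitialised variable.
  implicit none

  integer, parameter :: nsmooth = 8                         ! hypothetical value
  integer, parameter :: iterating_to_convergence_limit = 4  ! hypothetical value
  integer :: iter
  real    :: error
  logical :: done

  ! Give "error" a defined value before it is first tested. Fortran does
  ! not promise short-circuit evaluation, so (error > 0.) may be evaluated
  ! even while (iter > iterating_to_convergence_limit) is still false.
  error = 0.0

  iter = 0
  done = .false.
  do while (.not. done)
     iter = iter + 1
     if (iter > iterating_to_convergence_limit) then
        error = 1.0e-3   ! stand-in for the real residual calculation
     end if
     done = (iter == nsmooth) .or. &
            ((iter > iterating_to_convergence_limit) .and. (error > 0.))
  end do

  print *, 'stopped after', iter, 'iterations'
end program relax_sketch
```

Built with the Intel debug options mentioned in this thread (-g, -check
uninit, -traceback), a version of this without the "error = 0.0" line would
be expected to trigger the same kind of run-time check failure.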
In the poisson_mg_relax.F90 source, "error" is only ever
initialised when "iter" > "iterating_to_convergence_limit", so the
times when "error" is not initialised will generally not be an issue.

In pursuit of robust code you should correct the code as follows:

1. Rename the "checkpt_num" variable to "check_ptnum".
2. Initialise "error" to 0.0 after the variable declaration.

If you re-compile and run you will hopefully get a similar kind
of abort during execution of your simulation specific code. The error can
then be resolved using similar steps as above. You must keep the -g flag
as it gives us line-level information in the stack traces.

Finally, I would recommend that you use FLASH3 if it
supports the physics that you need. It is much easier for
us to answer your FLASH3 questions rather than FLASH2 questions,
and we would rather spend our time making FLASH3 a better
application.

Chris

From seyit at astro.rug.nl Sat Aug 16 09:02:03 2008
From: seyit at astro.rug.nl (seyit)
Date: Sat, 16 Aug 2008 16:02:03 +0200
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <48A5E0DF.1050401@flash.uchicago.edu>
References: <48441DC4.9050701@astro.rug.nl>
	<2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com>
	<4880D78C.4020708@astro.rug.nl>
	<4880DC19.7070205@scs.fsu.edu>
	<488F58E1.9050306@astro.rug.nl>
	<488F7953.4010802@flash.uchicago.edu>
	<2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com>
	<489038FC.2030005@astro.rug.nl>
	<2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
	<4896FD5A.50202@astro.rug.nl>
	<2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
	<489732F7.8090902@oddjob.uchicago.edu>
	<48974C67.5010809@flash.uchicago.edu>
	<48A4655B.90801@astro.rug.nl>
	<48A5E0DF.1050401@flash.uchicago.edu>
Message-ID: <8f4d75acec0506d8e133e6319cc172c0@astro.rug.nl>

Hi Chris,

The thing is, I spent many hours already trying to switch to FLASH3.0.
Besides the name changes, the structure of FLASH has changed in version 3.
Together with my colleague we spent a lot of time but we can't get past
certain things. We want to implement cooling function and for that we need
to implement multispecies, where we get problems. We also get stuck when we
want to start with density and temperature instead of density and pressure
thus we need to change the EOS. We must understand the code better in order
to implement these things amongst others. This is the reason that we, for
the moment, still stick with FLASH2.5.
Don't think we do not want to switch to FLASH3.0, we most certainly do.

One other problem we detected with FLASH2.5 jeans setup is that the jeans
setup has its own refinement criteria using delta_ref and delta_deref
together with the reference density. Except that "ctore" or better yet the
refinement_... is not mentioned in the config file of the setup causing an
undefined parameter getting a crazy value depending on compiler. This we
found by accident when we were checking where the differences are coming
from between the two machines (32-bit machine with ifort 8.0 -vs.- 64-bit
machine with ifort 10.1).

Greetz,
Seyit

From seyit at astro.rug.nl Mon Aug 18 09:38:26 2008
From: seyit at astro.rug.nl (Seyit Hocuk)
Date: Mon, 18 Aug 2008 16:38:26 +0200
Subject: [FLASH-USERS] Big Problem??
In-Reply-To: <48A5E0DF.1050401@flash.uchicago.edu>
References: <48441DC4.9050701@astro.rug.nl>
	<2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com>
	<4880D78C.4020708@astro.rug.nl>
	<4880DC19.7070205@scs.fsu.edu>
	<488F58E1.9050306@astro.rug.nl>
	<488F7953.4010802@flash.uchicago.edu>
	<2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com>
	<489038FC.2030005@astro.rug.nl>
	<2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
	<4896FD5A.50202@astro.rug.nl>
	<2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
	<489732F7.8090902@oddjob.uchicago.edu>
	<48974C67.5010809@flash.uchicago.edu>
	<48A4655B.90801@astro.rug.nl>
	<48A5E0DF.1050401@flash.uchicago.edu>
Message-ID: <48A98962.6050407@astro.rug.nl>

Hi Chris, and others,

Thank you, that solved the two errors for standard jeans and sedov setup
by the way. However, those were unrelated to our problem of differences.
Besides, we found another mistake in jeans setup ref_marking routine. There
was a "save" missing when declaring refine_cutoff and derefine_cutoff and
the filter. Anyway, after these, there were no other errors showing up when
debugging, so the code is a-ok also for our own setup.
No uninitialized parameters. We downloaded the latest ifort compilers (10.1.012 / 015 / 017) and tried each of them. The difference between ifort 8.0 and 10.1 seems huge, even on the same machine! Refinement is very different even for the normal jeans setup. The differences became very small when we reduced the boxsize just to check. Normally I use a boxsize of 8.0E19; when we tried 8.0E9 (mind you, it's 10 orders of magnitude less), the differences were minimal.

Now comparing the two computers, one 32-bit and one 64-bit but both with the same/similar compilers, results in very small differences. This means, I guess, that there was something wrong with ifort 8.0 (for large numbers perhaps?). If so, that annoying thing cost us many, many hours, days, weeks and headaches of searching. The slight differences that still exist can hopefully be attributed to the difference in the number of bits. I hope you guys can answer that better.

Thanks,
Seyit

ps: don't say switch to FLASH3.0, because I am trying.

Chris Daley wrote:
> Hi Seyit,
>
> The compiler flag is doing its job - we get a run-time error when an
> uninitialised variable is used.
>
> 1. I can confirm that the abort from perfmon.F90 is an error. Notice
> that the subroutine argument is "check_ptnum", but the condition in the
> "if" statement evaluates a variable named "checkpt_num". This
> emphasises the importance of "implicit none".
>
> 2. The abort from poisson_mg_relax.F90 stems from a situation which will
> not normally cause problems in a simulation. This is because the
> "error" variable is part of the following logical expression.
>
> done = (iter == nsmooth) .or. ((iter > iterating_to_convergence_limit)
> .and. (error > 0.))
>
> i.e. (iter > iterating_to_convergence_limit) .and. (error > 0.). So we
> need both conditions to be .true. to obtain a .true. in the cumulative
> expression.
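A minimal sketch of the reasoning about the "done" expression, written in Python purely for illustration (the original code is Fortran). It assumes, as stated in this thread, that "error" only holds a meaningful value once iter exceeds iterating_to_convergence_limit. The function name done, the values of nsmooth and limit, and the list of "garbage" values are all hypothetical. Note that Python's and/or short-circuit, whereas Fortran does not guarantee this, which is one reason a run-time check can still fire even though the result is unaffected.

```python
def done(iter_, nsmooth, limit, error):
    """Mirror of: done = (iter == nsmooth) .or. ((iter > limit) .and. (error > 0.))"""
    return (iter_ == nsmooth) or ((iter_ > limit) and (error > 0.0))


# Hypothetical values, chosen only for illustration.
nsmooth, limit = 8, 4

# Stand-ins for whatever might sit in an uninitialised variable.
garbage_values = [-1.0e30, 0.0, 3.7, float("inf"), float("nan")]

# While iter <= limit, the result does not depend on `error` at all.
for iter_ in range(1, limit + 1):
    results = {done(iter_, nsmooth, limit, g) for g in garbage_values}
    assert len(results) == 1

# Once iter > limit, `error` decides the outcome, so it must hold a real value by then.
late_results = {done(limit + 1, nsmooth, limit, g) for g in garbage_values}
print(sorted(late_results))
```

The first loop shows why the uninitialised value is normally harmless; the last two lines show why initialising "error" is still the robust choice.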
In the poisson_mg_relax.F90 source, "error" is only ever > initialised when "iter" > "iterating_to_convergence_limit", so the > times when "error" is not initialised will generally not be an issue. > > In pursuit of robust code you should correct the code as follows: > > 1. Rename "checkpt_num" variable to "check_ptnum" > 2. Initialise "error" to 0.0 after the variable declaration. > > > If you re-compile and run you will hopefully get a similar kind > of abort during execution of your simulation specific code. The error > can then > be resolved using similar steps as above. You must keep the -g flag > as it > gives us line-level information in the stack traces. > > Finally, I would recommend that you use FLASH3 if it > supports the physics that you need. It is much easier for > us to answer your FLASH3 questions rather than FLASH2 questions, > and we would rather spend our time making FLASH3 a better > application. > > Chris > > > > > > Seyit Hocuk wrote: >> Hi Chris, Nathan, Carlo, >> >> It has been a while now, but finally I had time to check some stuff. >> Like Chris said I tried compiling with -CU (check uninit) and also >> with -ftrapuv. The error with -CU is shown here below. However, the >> same error is showing when I use normal jeans setup and a different >> error is generated with Sedov test problem. I think there are many >> undeclared variables or subroutines anyway. When I use -ftrapuv, I >> get segmentation faults. These debug options are not much use >> unfortunately. >> >> By the way, what is the use of -g when debugging. It seems like it is >> doing nothing. >> >> >> >> * JEANS SETUP >> >> perturbation is unstable with growth time 8776137041619.21 >> forrtl: severe (193): Run-Time Check Failure. 
The variable >> 'poisson_mg_relax_$ERROR' is being used without being defined >> Image PC Routine Line >> Source flash2 00000000005B70FE >> Unknown Unknown Unknown >> flash2 00000000005B62FE Unknown Unknown >> Unknown >> flash2 000000000056EB56 Unknown Unknown >> Unknown >> flash2 0000000000536D51 Unknown Unknown >> Unknown >> flash2 00000000005383C6 Unknown Unknown >> Unknown >> flash2 00000000004D31E4 poisson_mg_relax_ 315 >> poisson_mg_relax.F90 >> flash2 00000000004D4DE9 poisson_mg_solve_ 38 >> poisson_mg_solve.F90 >> flash2 00000000004CA3C4 mg_cycle_ 82 >> mg_cycle.F90 >> flash2 00000000004CEEC8 multigrid_ 150 >> multigrid.F90 >> flash2 00000000004D249A poisson_ 87 >> poisson.F90 >> flash2 0000000000430CE6 modulegravpotenti 139 >> GravPotentialAllBlocks.F90 >> flash2 000000000042AF1B init_from_scratch 257 >> init_from_scratch.F90 >> flash2 000000000041D45A init_flash_ 324 >> init_flash.F90 >> flash2 000000000041C5BF MAIN__ 62 >> flash.F90 >> flash2 0000000000405362 Unknown Unknown >> Unknown >> libc.so.6 00007F7DBB0131C4 Unknown Unknown >> Unknown >> flash2 00000000004052A9 Unknown Unknown >> Unknown >> >> >> >> * SEDOV EXPLOSION SETUP >> >> >> [CHECKPOINT_WR] NOTE: will send 710 blocks per message. >> [CHECKPOINT_WR] Writing checkpoint file sedov_hdf5_chk_0000 >> Progress: | >> forrtl: severe (193): Run-Time Check Failure. 
The variable >> 'perfmon_mp_log_timers_$CHECKPT_NUM' is being used without being defined >> Image PC Routine Line >> Source flash2 00000000005A1856 >> Unknown Unknown Unknown >> flash2 00000000005A0A56 Unknown Unknown >> Unknown >> flash2 000000000055A646 Unknown Unknown >> Unknown >> flash2 0000000000522B81 Unknown Unknown >> Unknown >> flash2 00000000005241F6 Unknown Unknown >> Unknown >> flash2 00000000004CB301 perfmon_mp_log_ti 1067 >> perfmon.F90 >> flash2 0000000000465F75 checkpoint_wr_ 701 >> checkpoint_wr.F90 >> flash2 000000000045E2F6 output_initial_ 189 >> output_initial.F90 >> flash2 000000000041DB09 init_flash_ 335 >> init_flash.F90 >> flash2 000000000041CC33 MAIN__ 62 >> flash.F90 >> flash2 00000000004051E2 Unknown Unknown >> Unknown >> libc.so.6 00007F0BCE1FF1C4 Unknown Unknown >> Unknown >> flash2 0000000000405129 Unknown Unknown >> Unknown >> >> >> >> >> >> >> >> >> >> >> >> >> >> Chris Daley wrote: >>> Hi Seyit, >>> >>> I fully agree with Carlo's recommendation - Valgrind is an excellent >>> tool. However, before resorting to such a powerful tool, it may be >>> worthwhile using your compiler to detect uninitialised data. I notice >>> that you have the intel compiler on two of your computing platforms. >>> >>> Try adding "-check uninit" and "-traceback" to your Fortran >>> (debug) compilation options. This will check if an uninitialised >>> variable >>> is used in a calculation, and will generate a runtime error if it is >>> used. >>> >>> I've just run a mini test in which I used a variable in my >>> Simulation_initBlock.F90 >>> that is never initialised. Here is the output: >>> >>> forrtl: severe (193): Run-Time Check Failure. 
The variable >>> 'simulation_initblock_$PP' is being used without being defined >>> Image PC Routine Line >>> Source >>> flash3 00000000006B010B Unknown Unknown >>> Unknown >>> flash3 00000000006AE446 Unknown Unknown >>> Unknown >>> flash3 00000000006878DE Unknown Unknown >>> Unknown >>> flash3 000000000065CEB2 Unknown Unknown >>> Unknown >>> flash3 000000000065D147 Unknown Unknown >>> Unknown >>> flash3 000000000049D3DC simulation_initbl 88 >>> Simulation_initBlock.F90 >>> flash3 000000000044F05B Unknown Unknown >>> Unknown >>> flash3 000000000040E152 Unknown Unknown >>> Unknown >>> flash3 000000000041211D Unknown Unknown >>> Unknown >>> flash3 0000000000404A6A Unknown Unknown >>> Unknown >>> libc.so.6 000000355501C3FB Unknown Unknown >>> Unknown >>> flash3 00000000004049AA Unknown Unknown >>> Unknown >>> p0_31573: p4_error: interrupt SIGx: 13 >>> >>> Remember to compile your code in "debug" mode, i.e. to >>> include the -g compilation flag and no optimisations. >>> >>> You may also want to look at the option "-ftrapuv" which initialises >>> local stack variables to "unusual values". Further information about >>> all of these options can be found in the intel man page. >>> >>> Regards, >>> Chris >>> >>> >>> Carlo Graziani wrote: >>>> Hi Seyit. >>>> >>>> Nathan's suggestion of investigating un-initialized variables use >>>> and erratically allocated/deallocated memory is very sensible for >>>> tracking down a problem that manifests itself as unpredictable >>>> behavior. >>>> >>>> There is actually an open-source tool for doing this called valgrind. >>>> It may already be installed on your local linux systems, and if not it >>>> is easy enough to obtain. >>>> >>>> One runs a program under valgrind very simply (the documentation gives >>>> more options): >>>> >>>> Prompt> valgrind >>>> >>>> Then one sits back and digests the potentially-voluminous output. >>>> >>>> Valgrind will flag any access to uninitialized memory and >>>> memory-management >>>> screw-ups. 
>>>> >>>> Caveats are: (1) It's slow; >>>> >>>> (2) You'd be amazed at how much valgrind finds distasteful in system >>>> libraries. You'll probably have to filter away a bunch of >>>> uninteresting >>>> warnings about libc/mpi/hdf5 and so on (which are probably harmless, >>>> and which you can't do much about anyway). Valgrind has some >>>> facilities >>>> for suppressing certain types of warnings, which you can use to cut >>>> down >>>> the noise. >>>> >>>> If you can make a small version of the problem, running on one >>>> processor, that >>>> exhibits the erratic behavior, this would probably be an ideal case to >>>> feed to valgrind. There's some support for parallel debugging, but >>>> you'd probably >>>> have to spend some quality time with documentation and haunt some >>>> other >>>> mailing lists to get that running. >>>> >>>> Cheers, >>>> >>>> Carlo >>>> >>>> Nathan Hearn wrote: >>>>> Hi Seyit, >>>>> >>>>> An uninitialized variable is one that is declared (specified as >>>>> integer, real, etc.), but not assigned a value. Thus, an >>>>> uninitialized variable usually has whatever value was in its memory >>>>> location before it was declared. (It could be a random number, >>>>> "infinity," or just garbage.) If this variable gets used before a >>>>> value is assigned to it, strange behavior may result, which would be >>>>> very compiler- and architecture-specific). If you are using >>>>> uninitialized pointer or allocatable variables, the effects can be >>>>> quite drastic and hard to identify. >>>>> >>>>> Generally speaking, it is a good idea to assign a value to every >>>>> variable soon after it is declared, even if it is only a temporary >>>>> value that is not actually used. (As I recall, there is a way to >>>>> assign a null value to pointers, which is also a very useful >>>>> practice.) 
>>>>> >>>>> >>>>> - Nathan >>>>> >>>>> >>>>> On 8/4/08, Seyit Hocuk wrote: >>>>>> Hi Paul, hi Nathan, >>>>>> >>>>>> First of all; using --with-default-api-version=v16 when configuring >>>>>> hdf5-1.8.1 works fine. Thanks for that Paul. >>>>>> >>>>>> Nathan, if you mean by uninitialized that the types are not defined >>>>>> (like in config file REAL or INTEGER or whatever), then no because I >>>>>> define them all. But if you mean I have included modules or >>>>>> parameters >>>>>> which I don't use, that's correct and I am no expert in >>>>>> programming so >>>>>> it might indeed be good to check this. >>>>>> >>>>>> Greetz, >>>>>> Seyit >>>> >>>> >>> > From klaus at flash.uchicago.edu Mon Aug 18 14:48:32 2008 From: klaus at flash.uchicago.edu (Klaus Weide) Date: Mon, 18 Aug 2008 14:48:32 -0500 (CDT) Subject: [FLASH-USERS] Big Problem?? In-Reply-To: <48A98962.6050407@astro.rug.nl> References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com> <489732F7.8090902@oddjob.uchicago.edu> <48974C67.5010809@flash.uchicago.edu> <48A4655B.90801@astro.rug.nl> <48A5E0DF.1050401@flash.uchicago.edu> <48A98962.6050407@astro.rug.nl> Message-ID: On Mon, 18 Aug 2008, Seyit Hocuk wrote: > Anyway, after these, there were no other errors showing up when debugging, so > the code is a-ok also for our own setup. No uninitialized parameters. We > downloaded the latest compilers of ifort (10.1.012 / 015 / 017) and tried > them each. The difference between ifort 8.0 and 10.1 seems huge even on the > same machine! Refinement is very different even for normal jeans setup. 
The
> differences became very small when we reduced the boxsize just to check.
> Normally I use boxsize of 8.0E19, when we tried 8.0E9 (mind you its 10 orders
> of magnitude less), the differences were minimal.

Seyit,

I note that setups/jeans/ref_marking.F90 has a hardwired "small number" here:

    do kk = 1, ndim2
       num = num + delu2(kk)**2
       denom = denom + (delu3(kk) + &
            &          (epsil*delu4(kk)+1.e-20))**2
    end do

Maybe that has something to do with the dependence of your results on box sizes? You may want to check whether changing that number to something many orders of magnitude different changes your findings.

Klaus

From seyit at astro.rug.nl Tue Aug 19 09:55:12 2008
From: seyit at astro.rug.nl (Seyit Hocuk)
Date: Tue, 19 Aug 2008 16:55:12 +0200
Subject: [FLASH-USERS] Big Problem??
In-Reply-To:
References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com> <489732F7.8090902@oddjob.uchicago.edu> <48974C67.5010809@flash.uchicago.edu> <48A4655B.90801@astro.rug.nl> <48A5E0DF.1050401@flash.uchicago.edu> <48A98962.6050407@astro.rug.nl>
Message-ID: <48AADED0.3090702@astro.rug.nl>

Klaus,

Actually, I think that number might even be too big. It causes the second order derivative refinement not to work. If the number is reduced to 1E-99, then the second order derivative refinement works.

Seyit

Klaus Weide wrote:
> On Mon, 18 Aug 2008, Seyit Hocuk wrote:
>
>> Anyway, after these, there were no other errors showing up when
>> debugging, so the code is a-ok also for our own setup. No
>> uninitialized parameters.
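On the hardwired 1.e-20 discussed in this exchange, here is a small sketch of why such a floor can matter. It is written in Python for illustration only and is not the FLASH routine: the function name estimator, the sample differences, the value of epsil, and the final square root are assumptions; only the num/denom sums follow the loop quoted from ref_marking.F90. Whether this mechanism explains the box-size dependence reported in the thread is not established here.

```python
import math


def estimator(delu2, delu3, delu4, epsil, floor):
    """Ratio built from the num/denom sums quoted from ref_marking.F90.

    Taking the square root of num/denom is an assumption of this sketch.
    """
    num = sum(d2 ** 2 for d2 in delu2)
    denom = sum((d3 + (epsil * d4 + floor)) ** 2 for d3, d4 in zip(delu3, delu4))
    return math.sqrt(num / denom)


def scaled(values, factor):
    """Multiply every entry by the same factor (a pure change of units or magnitude)."""
    return [v * factor for v in values]


# Hypothetical differences for data of order unity (one dimension only).
delu2, delu3, delu4 = [1.0], [2.0], [4.0]
epsil = 0.01

for scale in (1.0, 1.0e-24):
    for floor in (1.0e-20, 1.0e-99):
        e = estimator(scaled(delu2, scale), scaled(delu3, scale),
                      scaled(delu4, scale), epsil, floor)
        print(f"scale={scale:.0e} floor={floor:.0e} estimator={e:.3g}")
```

With data of order unity either floor is negligible, but when the same data are scaled down to around 1e-24 the 1.e-20 floor dominates the denominator and the estimator collapses, while the 1E-99 floor leaves it unchanged. That is at least consistent with the observation that the second-derivative refinement only worked after the number was reduced.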
We downloaded the latest compilers of ifort >> (10.1.012 / 015 / 017) and tried them each. The difference between >> ifort 8.0 and 10.1 seems huge even on the same machine! Refinement is >> very different even for normal jeans setup. The differences became >> very small when we reduced the boxsize just to check. Normally I use >> boxsize of 8.0E19, when we tried 8.0E9 (mind you its 10 orders of >> magnitude less), the differences were minimal. > > Seyit, > > I note that setups/jeans/ref_marking.F90 has a hardwirded "small > number" here: > > do kk = 1, ndim2 > num = num + delu2(kk)**2 > denom = denom + (delu3(kk) + & > & (epsil*delu4(kk)+1.e-20))**2 > end do > > Maybe that has something to do with the dependence of your results on > box sizes? You may want to try if changing that number to something > many orders of magnitude different changes your findings. > > Klaus From John.K.Cannizzo at nasa.gov Tue Aug 19 14:03:34 2008 From: John.K.Cannizzo at nasa.gov (John Cannizzo) Date: Tue, 19 Aug 2008 15:03:34 -0400 (EDT) Subject: [FLASH-USERS] boundary condition question (r=0 line in cylindrical coordinates) Message-ID: A boundary condition question: In cylindrical coordinates, how is locus corresponding to r=0 handled in relativistic hydrodynamical calculations? If someone could point me to the relevant stretch of code within FLASH, for example, that would be great, --many thanks! --John From justin-parsons at uiowa.edu Tue Aug 19 16:19:00 2008 From: justin-parsons at uiowa.edu (Parsons, Justin T) Date: Tue, 19 Aug 2008 16:19:00 -0500 Subject: [FLASH-USERS] error in compiling Sedov example Message-ID: <6F972D706464F746A870605073DE96614278C1C576@IOWAEVS04.iowa.uiowa.edu> Greetings! I'm in the process of testing my FLASH install and familiarizing myself with the code. 
I've been trying to compile the Sedov problem but keep running in to this error: --ERROR-- /home/jussn/mpich2-install/lib/libmpich.a(simple_pmi.o): In function `PMI_Init': simple_pmi.c:(.text+0x1fcd): warning: Using 'gethostbyname' in statically linked applicatio ns requires at runtime the shared libraries from the glibc version used for linking /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e nqueue_request': (.text+0x1c3): undefined reference to `pthread_getschedparam' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e nqueue_request': (.text+0x2dd): undefined reference to `pthread_cond_signal' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e nqueue_request': (.text+0x3f7): undefined reference to `pthread_attr_init' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e nqueue_request': (.text+0x407): undefined reference to `pthread_attr_setdetachstate' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e nqueue_request': (.text+0x41c): undefined reference to `pthread_attr_setstacksize' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e nqueue_request': (.text+0x489): undefined reference to `pthread_attr_destroy' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ fildes_io': (.text+0x676): undefined reference to `pthread_getschedparam' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ fildes_io': (.text+0x6bd): undefined reference to `pthread_setschedparam' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ fildes_io': (.text+0x790): undefined reference to `pthread_cond_signal' /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ fildes_io': (.text+0x9d1): 
undefined reference to `pthread_attr_init'
/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_fildes_io':
(.text+0x9e1): undefined reference to `pthread_attr_setdetachstate'
/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_fildes_io':
(.text+0xa7c): undefined reference to `pthread_cond_timedwait'
/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_notify.o): In function `__aio_notify_only':
(.text+0xa9): undefined reference to `pthread_attr_init'
/usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_notify.o): In function `__aio_notify_only':
(.text+0xb9): undefined reference to `pthread_attr_setdetachstate'
collect2: ld returned 1 exit status
make: *** [flash3] Error 1
--ERROR--

I'm not quite sure where to go from here. Makefile.h in my "object" dir has the correct paths and, I believe, I've installed all necessary libraries and deps. Any tips?

System is Ubuntu 8.04 with AMD 32bit.

Cheers
Justin Parsons
http://jussn.beevomit.org

From richp at flash.uchicago.edu Tue Aug 19 16:37:47 2008
From: richp at flash.uchicago.edu (Paul M. Rich)
Date: Tue, 19 Aug 2008 16:37:47 -0500
Subject: [FLASH-USERS] error in compiling Sedov example
In-Reply-To: <6F972D706464F746A870605073DE96614278C1C576@IOWAEVS04.iowa.uiowa.edu>
References: <6F972D706464F746A870605073DE96614278C1C576@IOWAEVS04.iowa.uiowa.edu>
Message-ID: <48AB3D2B.5040901@flash.uchicago.edu>

Justin,

I've seen things like this happen before, and these undefined symbols look like they come from libpthread. Effectively it is a dependency of a dependency. Are you linking statically or dynamically in your Makefile.h? You can try manually including that library in your Makefile.h to ensure that these symbols are defined.

Does that help?

Paul Rich
--------------------------------
richp at flash.uchicago.edu
ASC Flash Center
University of Chicago

Parsons, Justin T wrote:
> Greetings!
> > I'm in the process of testing my FLASH install and familiarizing myself with the code. I've been trying to compile the Sedov problem but keep running in to this error: > > --ERROR-- > /home/jussn/mpich2-install/lib/libmpich.a(simple_pmi.o): In function `PMI_Init': > simple_pmi.c:(.text+0x1fcd): warning: Using 'gethostbyname' in statically linked applicatio > ns requires at runtime the shared libraries from the glibc version used for linking > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e > nqueue_request': > (.text+0x1c3): undefined reference to `pthread_getschedparam' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e > nqueue_request': > (.text+0x2dd): undefined reference to `pthread_cond_signal' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e > nqueue_request': > (.text+0x3f7): undefined reference to `pthread_attr_init' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e > nqueue_request': > (.text+0x407): undefined reference to `pthread_attr_setdetachstate' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e > nqueue_request': > (.text+0x41c): undefined reference to `pthread_attr_setstacksize' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `__aio_e > nqueue_request': > (.text+0x489): undefined reference to `pthread_attr_destroy' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ > fildes_io': > (.text+0x676): undefined reference to `pthread_getschedparam' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ > fildes_io': > (.text+0x6bd): undefined reference to `pthread_setschedparam' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ > fildes_io': > (.text+0x790): undefined 
reference to `pthread_cond_signal' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ > fildes_io': > (.text+0x9d1): undefined reference to `pthread_attr_init' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ > fildes_io': > (.text+0x9e1): undefined reference to `pthread_attr_setdetachstate' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_misc.o): In function `handle_ > fildes_io': > (.text+0xa7c): undefined reference to `pthread_cond_timedwait' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_notify.o): In function `__aio > _notify_only': > (.text+0xa9): undefined reference to `pthread_attr_init' > /usr/lib/gcc/i486-linux-gnu/4.2.3/../../../../lib/librt.a(aio_notify.o): In function `__aio > _notify_only': > (.text+0xb9): undefined reference to `pthread_attr_setdetachstate' > collect2: ld returned 1 exit status > make: *** [flash3] Error 1 > --ERROR-- > > I'm not quite sure where to go from here. Makefile.h in my "object" dir has the correct paths and, I believe, I've installed all necessary libraries and deps. Any tips? > > System is Ubuntu 8.04 with AMD 32bit. 
>
>
> Cheers
> Justin Parsons
> http://jussn.beevomit.org
>

From mateuszr at umich.edu Sat Aug 23 13:52:44 2008
From: mateuszr at umich.edu (mateuszr at umich.edu)
Date: Sat, 23 Aug 2008 14:52:44 -0400
Subject: [FLASH-USERS] restarting FLASH
In-Reply-To:
References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com> <489732F7.8090902@oddjob.uchicago.edu> <48974C67.5010809@flash.uchicago.edu> <48A4655B.90801@astro.rug.nl> <48A5E0DF.1050401@flash.uchicago.edu> <48A98962.6050407@astro.rug.nl>
Message-ID: <20080823145244.54885qcme0cxccqo@web.mail.umich.edu>

Hi all,

I am having trouble restarting the code. More specifically, I am getting a segmentation fault when I try to restart from a valid checkpoint file. The details are enclosed below. I would be very grateful for some clues.
thanks,
Mateusz

[mateuszr at galaxy ~/FLASH]$ more out.txt

[io_readData] Opening test_hdf5_chk_0011 for restart
rank 1 in job 2 galaxy001.astro.lsa.umich.edu_38345 caused collective abort of all ranks
exit status of rank 1: killed by signal 9

[mateuszr at galaxy ~/FLASH]$ more flash_run.err
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
flash3             000000000050CC48  Unknown               Unknown  Unknown
flash3             000000000043EEBA  Unknown               Unknown  Unknown
flash3             000000000043C359  Unknown               Unknown  Unknown
flash3             0000000000410513  Unknown               Unknown  Unknown
flash3             0000000000415C19  Unknown               Unknown  Unknown
flash3             0000000000406982  Unknown               Unknown  Unknown
libc.so.6          00000035AFE1D8A4  Unknown               Unknown  Unknown
flash3             00000000004068A9  Unknown               Unknown  Unknown
touch: cannot touch `/scratch///.keep': Permission denied

From richp at flash.uchicago.edu Mon Aug 25 11:27:30 2008
From: richp at flash.uchicago.edu (Paul M. Rich)
Date: Mon, 25 Aug 2008 11:27:30 -0500
Subject: [FLASH-USERS] restarting FLASH
In-Reply-To: <20080823145244.54885qcme0cxccqo@web.mail.umich.edu>
References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> <488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com> <489732F7.8090902@oddjob.uchicago.edu> <48974C67.5010809@flash.uchicago.edu> <48A4655B.90801@astro.rug.nl> <48A5E0DF.1050401@flash.uchicago.edu> <48A98962.6050407@astro.rug.nl> <20080823145244.54885qcme0cxccqo@web.mail.umich.edu>
Message-ID: <48B2DD72.4030502@flash.uchicago.edu>

Mateusz,

We could use some more information so that we can help you figure this out. What Flash units are you using in your setup?
Which ones have you customized or overridden in your setup, particularly if any of the initializations were changed? Are there any unusual runtime parameters being used? Which IO mode is this? This information will help us narrow down where to look considerably. Thanks, Paul Rich ------------------------------ ASC Flash Center University of Chicago richp at flash.uchicago.edu mateuszr at umich.edu wrote: > > Hi all, > > I am having trouble restarting the code. More specifically, I am > getting a segmentation fault when I try to restart from a valid > checkpoint file. The details are enclosed below. I would be vary > grateful for some clues. > > thanks, > Mateusz > > > [mateuszr at galaxy ~/FLASH]$ more out.txt > > [io_readData] Opening test_hdf5_chk_0011 for restart > rank 1 in job 2 galaxy001.astro.lsa.umich.edu_38345 caused > collective abort of all ranks > exit status of rank 1: killed by signal 9 > > [mateuszr at galaxy ~/FLASH]$ more flash_run.err > forrtl: severe (174): SIGSEGV, segmentation fault occurred > Image PC Routine Line > Source > flash3 000000000050CC48 Unknown Unknown > Unknown > flash3 000000000043EEBA Unknown Unknown > Unknown > flash3 000000000043C359 Unknown Unknown > Unknown > flash3 0000000000410513 Unknown Unknown > Unknown > flash3 0000000000415C19 Unknown Unknown > Unknown > flash3 0000000000406982 Unknown Unknown > Unknown > libc.so.6 00000035AFE1D8A4 Unknown Unknown > Unknown > flash3 00000000004068A9 Unknown Unknown > Unknown > touch: cannot touch `/scratch///.keep': Permission denied > From mateuszr at umich.edu Mon Aug 25 13:38:48 2008 From: mateuszr at umich.edu (mateuszr at umich.edu) Date: Mon, 25 Aug 2008 14:38:48 -0400 Subject: [FLASH-USERS] restarting FLASH In-Reply-To: <48B2DD72.4030502@flash.uchicago.edu> References: <48441DC4.9050701@astro.rug.nl> <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com> <4880D78C.4020708@astro.rug.nl> <4880DC19.7070205@scs.fsu.edu> <488F58E1.9050306@astro.rug.nl> 
<488F7953.4010802@flash.uchicago.edu> <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com> <489038FC.2030005@astro.rug.nl> <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com> <4896FD5A.50202@astro.rug.nl> <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com> <489732F7.8090902@oddjob.uchicago.edu> <48974C67.5010809@flash.uchicago.edu> <48A4655B.90801@astro.rug.nl> <48A5E0DF.1050401@flash.uchicago.edu> <48A98962.6050407@astro.rug.nl> <20080823145244.54885qcme0cxccqo@web.mail.umich.edu> <48B2DD72.4030502@flash.uchicago.edu> Message-ID: <20080825143848.21085mn2itnipd44@web.mail.umich.edu> Hi Paul, thanks for coming back to me on this. Here are additional details: 1) units: default cgs 2) IO mode: default serial (since the last e-mail I also ran the same simulation w/ parallel IO, then stopped it and tried to restart. The code behavior was different: it did read the checkpoint file and proceeded to the evolution stage taking just one (n=1) step and then just hung in there (though technically it did not crash) 3) I do have one problem-specific runtime parameter that is only used at the initialization stage and that is never important for restarts. Do let me know if you need any more information. thanks, Mateusz P.S. Btw, I did do some simple Sedov case restart experiment to check if the problem is not with the cluster itself. I was able to restart the code without any problem in this test case. Quoting "Paul M. Rich" : > Mateusz, > > We could use some more information so that we can help you figure > this out. What Flash units are you using in your setup? Which ones > have you customized or overridden in your setup, particularly if any > of the initializations were changed? Are there any unusual runtime > parameters being used? Which IO mode is this? > > This information will help us narrow down where to look considerably. 
>
> Thanks,
>
> Paul Rich
> ------------------------------
> ASC Flash Center
> University of Chicago
> richp at flash.uchicago.edu
>
>
> mateuszr at umich.edu wrote:
>>
>> Hi all,
>>
>> I am having trouble restarting the code. More specifically, I am
>> getting a segmentation fault when I try to restart from a valid
>> checkpoint file. The details are enclosed below. I would be very
>> grateful for some clues.
>>
>> thanks,
>> Mateusz
>>
>>
>> [mateuszr at galaxy ~/FLASH]$ more out.txt
>>
>> [io_readData] Opening test_hdf5_chk_0011 for restart
>> rank 1 in job 2 galaxy001.astro.lsa.umich.edu_38345 caused
>> collective abort of all ranks
>> exit status of rank 1: killed by signal 9
>>
>> [mateuszr at galaxy ~/FLASH]$ more flash_run.err
>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>> Image      PC                Routine  Line     Source
>> flash3     000000000050CC48  Unknown  Unknown  Unknown
>> flash3     000000000043EEBA  Unknown  Unknown  Unknown
>> flash3     000000000043C359  Unknown  Unknown  Unknown
>> flash3     0000000000410513  Unknown  Unknown  Unknown
>> flash3     0000000000415C19  Unknown  Unknown  Unknown
>> flash3     0000000000406982  Unknown  Unknown  Unknown
>> libc.so.6  00000035AFE1D8A4  Unknown  Unknown  Unknown
>> flash3     00000000004068A9  Unknown  Unknown  Unknown
>> touch: cannot touch `/scratch///.keep': Permission denied
>>
>
>

From richp at flash.uchicago.edu Mon Aug 25 14:07:47 2008
From: richp at flash.uchicago.edu (Paul M.
Rich)
Date: Mon, 25 Aug 2008 14:07:47 -0500
Subject: [FLASH-USERS] restarting FLASH
In-Reply-To: <20080825143848.21085mn2itnipd44@web.mail.umich.edu>
References: <48441DC4.9050701@astro.rug.nl>
 <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com>
 <4880D78C.4020708@astro.rug.nl>
 <4880DC19.7070205@scs.fsu.edu>
 <488F58E1.9050306@astro.rug.nl>
 <488F7953.4010802@flash.uchicago.edu>
 <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com>
 <489038FC.2030005@astro.rug.nl>
 <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
 <4896FD5A.50202@astro.rug.nl>
 <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
 <489732F7.8090902@oddjob.uchicago.edu>
 <48974C67.5010809@flash.uchicago.edu>
 <48A4655B.90801@astro.rug.nl>
 <48A5E0DF.1050401@flash.uchicago.edu>
 <48A98962.6050407@astro.rug.nl>
 <20080823145244.54885qcme0cxccqo@web.mail.umich.edu>
 <48B2DD72.4030502@flash.uchicago.edu>
 <20080825143848.21085mn2itnipd44@web.mail.umich.edu>
Message-ID: <48B30303.3010103@flash.uchicago.edu>

Mateusz,

My guess is that a unit that initializes before IO runs in a restart is
overrunning memory. Grid and Multispecies, particularly if you have
overridden anything there, would be good places to look. You could also
try using array bounds checking if your compiler supports it.

Among the units that Driver_initFlash starts before IO comes in a restart
are: Grid, Particles, MaterialProperties and Multispecies.

Also, if the runtime parameter is not being used on a restart, I would
look and check that the code it controls is not being executed on a
restart.

Does this help?

Paul Rich

mateuszr at umich.edu wrote:
>
> Hi Paul,
>
> thanks for coming back to me on this. Here are additional details:
>
> 1) units: default cgs
>
> 2) IO mode: default serial
> (since the last e-mail I also ran the same simulation w/ parallel IO,
> then stopped it and tried to restart.
The code behavior was different:
> it did read the checkpoint file and proceeded to the evolution stage
> taking just one (n=1) step and then just hung in there (though
> technically it did not crash))
>
> 3) I do have one problem-specific runtime parameter that is only used
> at the initialization stage and that is never important for restarts.
>
> Do let me know if you need any more information.
>
> thanks,
> Mateusz
>
> P.S. Btw, I did do some simple Sedov case restart experiment to check
> if the problem is not with the cluster itself. I was able to restart
> the code without any problem in this test case.
>
>
> Quoting "Paul M. Rich":
>
>> Mateusz,
>>
>> We could use some more information so that we can help you figure
>> this out. What Flash units are you using in your setup? Which ones
>> have you customized or overridden in your setup, particularly if any
>> of the initializations were changed? Are there any unusual runtime
>> parameters being used? Which IO mode is this?
>>
>> This information will help us narrow down where to look considerably.
>>
>> Thanks,
>>
>> Paul Rich
>> ------------------------------
>> ASC Flash Center
>> University of Chicago
>> richp at flash.uchicago.edu
>>
>>
>> mateuszr at umich.edu wrote:
>>>
>>> Hi all,
>>>
>>> I am having trouble restarting the code. More specifically, I am
>>> getting a segmentation fault when I try to restart from a valid
>>> checkpoint file. The details are enclosed below. I would be very
>>> grateful for some clues.
>>>
>>> thanks,
>>> Mateusz
>>>
>>>
>>> [mateuszr at galaxy ~/FLASH]$ more out.txt
>>>
>>> [io_readData] Opening test_hdf5_chk_0011 for restart
>>> rank 1 in job 2 galaxy001.astro.lsa.umich.edu_38345 caused
>>> collective abort of all ranks
>>> exit status of rank 1: killed by signal 9
>>>
>>> [mateuszr at galaxy ~/FLASH]$ more flash_run.err
>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>> Image      PC                Routine  Line     Source
>>> flash3     000000000050CC48  Unknown  Unknown  Unknown
>>> flash3     000000000043EEBA  Unknown  Unknown  Unknown
>>> flash3     000000000043C359  Unknown  Unknown  Unknown
>>> flash3     0000000000410513  Unknown  Unknown  Unknown
>>> flash3     0000000000415C19  Unknown  Unknown  Unknown
>>> flash3     0000000000406982  Unknown  Unknown  Unknown
>>> libc.so.6  00000035AFE1D8A4  Unknown  Unknown  Unknown
>>> flash3     00000000004068A9  Unknown  Unknown  Unknown
>>> touch: cannot touch `/scratch///.keep': Permission denied
>>>
>>
>>

From josef.stoeckl at uibk.ac.at Mon Aug 25 17:43:39 2008
From: josef.stoeckl at uibk.ac.at (Josef Stöckl)
Date: Tue, 26 Aug 2008 00:43:39 +0200
Subject: [FLASH-USERS] restarting FLASH
In-Reply-To: <20080823145244.54885qcme0cxccqo@web.mail.umich.edu>
References: <48441DC4.9050701@astro.rug.nl>
 <2467fdc0807181010q52dcbecft6793e7c00dcd1518@mail.gmail.com>
 <4880D78C.4020708@astro.rug.nl>
 <4880DC19.7070205@scs.fsu.edu>
 <488F58E1.9050306@astro.rug.nl>
 <488F7953.4010802@flash.uchicago.edu>
 <2467fdc0807291324t5368140ax904909774b1bf582@mail.gmail.com>
 <489038FC.2030005@astro.rug.nl>
 <2467fdc0807300705n71a85d08gabfc731daee9c955@mail.gmail.com>
 <4896FD5A.50202@astro.rug.nl>
 <2467fdc0808040820t405dbd15j19359a4558417803@mail.gmail.com>
 <489732F7.8090902@oddjob.uchicago.edu>
 <48974C67.5010809@flash.uchicago.edu>
 <48A4655B.90801@astro.rug.nl>
 <48A5E0DF.1050401@flash.uchicago.edu>
 <48A98962.6050407@astro.rug.nl>
 <20080823145244.54885qcme0cxccqo@web.mail.umich.edu>
Message-ID:
 <005201c90704$054918f0$0fdb4ad0$@stoeckl@uibk.ac.at>

Hi Mateusz,

There is a distinct restart bug in the serial HDF5 IO unit, which I described
a few months back in the flash-bugs mailing list. Basically, in a 3D problem
the wrong buffer variable gets used (most likely due to copy-and-paste). I
also posted a fix, which consists of modifying one line in the file
IO/IOMain/hdf5/serial/PM/io_readData.F90:

    if(NDIM .gt. 2) then

      allocate(faceZBuf(NUNK_VARS, NXB, NYB, NZB+1, localNumBlocks))
-     call MPI_RECV(unk(i,:,:,:,1:localNumBlocks), &
+     call MPI_RECV(faceZBuf(i,:,:,:,1:localNumBlocks), &
           NXB*NYB*(NZB+1)*localNumBlocks, &
           FLASH_REAL, MASTER_PE, &
           9+i+NUNK_VARS+(NFACE_VARS*2), &
           MPI_COMM_WORLD, status, ierr)
      facevarz(i,io_ilo:io_ihi, io_jlo:io_jhi, io_klo:io_khi+1,1:localNumBlocks) = &
           faceZBuf(i,1:NXB,1:NYB,1:NZB+1,1:localNumBlocks)
      deallocate(faceZBuf)

    end if

(this is a unified patch-like description)

I hope this helps you!

Best regards,
Josef


-----Original Message-----
From: flash-users-bounces at flash.uchicago.edu
[mailto:flash-users-bounces at flash.uchicago.edu] On Behalf Of
mateuszr at umich.edu
Sent: Saturday, 23 August 2008 20:53
To: flash-users at flash.uchicago.edu
Subject: [FLASH-USERS] restarting FLASH


Hi all,

I am having trouble restarting the code. More specifically, I am
getting a segmentation fault when I try to restart from a valid
checkpoint file. The details are enclosed below. I would be very
grateful for some clues.

thanks,
Mateusz


[mateuszr at galaxy ~/FLASH]$ more out.txt

[io_readData] Opening test_hdf5_chk_0011 for restart
rank 1 in job 2 galaxy001.astro.lsa.umich.edu_38345 caused
collective abort of all ranks
exit status of rank 1: killed by signal 9

[mateuszr at galaxy ~/FLASH]$ more flash_run.err
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image      PC                Routine  Line     Source
flash3     000000000050CC48  Unknown  Unknown  Unknown
flash3     000000000043EEBA  Unknown  Unknown  Unknown
flash3     000000000043C359  Unknown  Unknown  Unknown
flash3     0000000000410513  Unknown  Unknown  Unknown
flash3     0000000000415C19  Unknown  Unknown  Unknown
flash3     0000000000406982  Unknown  Unknown  Unknown
libc.so.6  00000035AFE1D8A4  Unknown  Unknown  Unknown
flash3     00000000004068A9  Unknown  Unknown  Unknown
touch: cannot touch `/scratch///.keep': Permission denied


From mateuszr at umich.edu Wed Aug 27 18:33:09 2008
From: mateuszr at umich.edu (mateuszr at umich.edu)
Date: Wed, 27 Aug 2008 19:33:09 -0400
Subject: [FLASH-USERS] restarting FLASH
Message-ID: <20080827193309.11325b53iu38wr0o@web.mail.umich.edu>

Hi Josef/Paul,

thanks for your suggestions. I did more experiments with restarting
the code and here are my observations:

1) serial IO restart (compiled with bounds check and traceback) shows
that there is indeed a problem in the part of the code that Josef
described in his e-mail. I included that update in
IO/IOMain/hdf5/serial/PM/io_readData.F90 as he suggested. This nicely
removed the segmentation fault but unfortunately led to the following
error message (and to a crash):

INFO: Grid_fillGuardCells is ignoring masking.
[flash_convert_cc_hook] PE= 92, ivar= 3, why=2
Trying to convert non-zero mass-specific variable to per-volume form,
but dens is zero!

The checkpoint file from which I am attempting to restart the
simulation is "valid" and looks as expected when inspected w/ IDL.

2) when the code is run in the parallel IO mode, stopped and then
restarted, it effectively stops without crashing at step n=1 (i.e.,
no output is produced and the batch job has "R" status). This does
not depend on the amount of allocated memory or the number of
processors.

This is all a bit puzzling. If anybody has any thoughts on this then I
would be grateful for comments/suggestions.

thanks,
Mateusz


Quoting Josef Stöckl:

> Hi Mateusz,
>
> There is a distinct restart bug in the serial HDF5 IO unit, which I described
> a few months back in the flash-bugs mailing list. Basically, in a 3D problem
> the wrong buffer variable gets used (most likely due to copy-and-paste). I
> also posted a fix, which consists of modifying one line in the file
> IO/IOMain/hdf5/serial/PM/io_readData.F90:
>
>     if(NDIM .gt. 2) then
>
>       allocate(faceZBuf(NUNK_VARS, NXB, NYB, NZB+1, localNumBlocks))
> -     call MPI_RECV(unk(i,:,:,:,1:localNumBlocks), &
> +     call MPI_RECV(faceZBuf(i,:,:,:,1:localNumBlocks), &
>            NXB*NYB*(NZB+1)*localNumBlocks, &
>            FLASH_REAL, MASTER_PE, &
>            9+i+NUNK_VARS+(NFACE_VARS*2), &
>            MPI_COMM_WORLD, status, ierr)
>       facevarz(i,io_ilo:io_ihi, io_jlo:io_jhi,
> io_klo:io_khi+1,1:localNumBlocks) = &
>            faceZBuf(i,1:NXB,1:NYB,1:NZB+1,1:localNumBlocks)
>       deallocate(faceZBuf)
>
>     end if
>
> (this is a unified patch-like description)
>
> I hope this helps you!
>
> Best regards,
> Josef
>
>
> -----Original Message-----
> From: flash-users-bounces at flash.uchicago.edu
> [mailto:flash-users-bounces at flash.uchicago.edu] On Behalf Of
> mateuszr at umich.edu
> Sent: Saturday, 23 August 2008 20:53
> To: flash-users at flash.uchicago.edu
> Subject: [FLASH-USERS] restarting FLASH
>
>
> Hi all,
>
> I am having trouble restarting the code. More specifically, I am
> getting a segmentation fault when I try to restart from a valid
> checkpoint file. The details are enclosed below. I would be very
> grateful for some clues.
>
> thanks,
> Mateusz
>
>
> [mateuszr at galaxy ~/FLASH]$ more out.txt
>
> [io_readData] Opening test_hdf5_chk_0011 for restart
> rank 1 in job 2 galaxy001.astro.lsa.umich.edu_38345 caused
> collective abort of all ranks
> exit status of rank 1: killed by signal 9
>
> [mateuszr at galaxy ~/FLASH]$ more flash_run.err
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> Image      PC                Routine  Line     Source
> flash3     000000000050CC48  Unknown  Unknown  Unknown
> flash3     000000000043EEBA  Unknown  Unknown  Unknown
> flash3     000000000043C359  Unknown  Unknown  Unknown
> flash3     0000000000410513  Unknown  Unknown  Unknown
> flash3     0000000000415C19  Unknown  Unknown  Unknown
> flash3     0000000000406982  Unknown  Unknown  Unknown
> libc.so.6  00000035AFE1D8A4  Unknown  Unknown  Unknown
> flash3     00000000004068A9  Unknown  Unknown  Unknown
> touch: cannot touch `/scratch///.keep': Permission denied
>
>

From richp at flash.uchicago.edu Thu Aug 28 11:03:07 2008
From: richp at flash.uchicago.edu (Paul M. Rich)
Date: Thu, 28 Aug 2008 11:03:07 -0500
Subject: [FLASH-USERS] [Fwd: Re: restarting FLASH]
Message-ID: <48B6CC3B.8060306@flash.uchicago.edu>

Since this bounced from the list:
-------------- next part --------------
An embedded message was scrubbed...
From: "Paul M. Rich"
Subject: Re: [FLASH-USERS] restarting FLASH
Date: Thu, 28 Aug 2008 02:12:26 -0500
Size: 6031
Url: http://flash.uchicago.edu/pipermail/flash-users/attachments/20080828/9ace7ab0/attachment.eml
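
The error reported earlier in this thread ("Trying to convert non-zero mass-specific variable to per-volume form, but dens is zero!") suggests one cheap check on the data read back at restart: look for cells where density is zero while a mass-specific variable is not, or where either value is non-finite. The following is only a minimal sketch of that check, not FLASH code. The function name `find_suspect_cells` and the sample values are hypothetical, and it assumes the density and one mass-specific variable have already been extracted from the checkpoint into flat Python sequences by whatever HDF5 tool is at hand.

```python
import math


def find_suspect_cells(dens, mass_specific):
    """Return (index, reason) pairs for cells that look inconsistent.

    dens and mass_specific are flat sequences of floats of equal length.
    How they are read from the checkpoint file is outside this sketch.
    """
    suspects = []
    for idx, (rho, value) in enumerate(zip(dens, mass_specific)):
        if not math.isfinite(rho) or not math.isfinite(value):
            # NaN or infinity usually points at data that was never filled in
            suspects.append((idx, "non-finite"))
        elif rho == 0.0 and value != 0.0:
            # the combination named in the error message quoted above
            suspects.append((idx, "zero density, non-zero mass-specific value"))
    return suspects


# Illustrative values only, not taken from any real checkpoint
dens = [1.0e-24, 0.0, 2.5e-24, float("nan")]
mass_specific = [0.7, 0.7, 0.0, 0.1]
for idx, reason in find_suspect_cells(dens, mass_specific):
    print(idx, reason)
```

If the checkpoint file itself passes such a check, any suspect cells showing up after the restart read would point at the read path rather than at the file.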