[FLASH-USERS] Details of speedup after restart
Nathan Hearn
nhearn at uchicago.edu
Tue Jun 5 20:04:17 CDT 2007
Hi Sanjib,
This is very curious. How do the output data files that come from
restarts compare with those from non-restarts? Are they
binary-equivalent? My concern is that the speed-up is coming from the
code processing the input data differently each time.
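
A minimal sketch of such a binary-equivalence check, assuming Python is available on the machine holding the output files. The function names are made up for illustration and are not part of FLASH. Note that a byte-level mismatch does not by itself prove the physical data differ, since files may embed metadata such as timestamps; a match, however, is strong evidence the two runs wrote identical output.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large checkpoint files are never
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def binary_equivalent(path_a, path_b):
    """True if the two files have identical contents, byte for byte."""
    return file_digest(path_a) == file_digest(path_b)
```

Calling binary_equivalent() on the same-numbered output file from a restarted run and from a non-restarted run (the paths are whatever your two run directories use) gives a quick yes/no answer.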
Alternatively, is it possible that there are background operations
on the compute nodes (e.g., cluster node monitors, file system checks,
network loads, etc.) that are interfering with your benchmarking runs?
(Can you use top on any of the processing nodes while the simulation
is running?)
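
If an interactive top session is awkward on the compute nodes, a rough non-interactive alternative is to record the load average a few times just before a run starts. This is only a sketch: it assumes a Python interpreter is available on the node, and sample_load is a made-up name. A node that already shows a substantial load before the job starts would point to background activity.

```python
import os
import time


def sample_load(samples=3, interval=1.0):
    """Collect the 1-minute load average `samples` times, `interval` seconds apart."""
    readings = []
    for i in range(samples):
        # os.getloadavg() returns the 1-, 5-, and 15-minute load averages.
        readings.append(os.getloadavg()[0])
        if i < samples - 1:
            time.sleep(interval)
    return readings
```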
- Nathan
On 6/5/07, sanjib gupta <guptasanjib at lanl.gov> wrote:
> Hi,
>
> I am attaching 2 log files: one from the initial run on 128 processors,
> and one from immediately killing the job and restarting from the first
> checkpoint file "hc-rt-hdf5_chk_0000".
> Notice about 4 timesteps per second initially, then ~30 timesteps/sec
> after restart.
>
> On 64 processors I noticed the gain was higher, but my resolution was
> lower (half the number of nblocky, same nblockx; this is a 2D run).
> Sorry, I did not keep the logfiles.
>
> However, this "gain" cannot be predicted. Sometimes I don't get it on
> the first restart, so I restart a couple of times!
> As you all can guess, this plays havoc with any benchmarking efforts,
> and we do intend to showcase our results from FLASH soon ... :-)
>
> We compile with Intel Fortran 9.1.033 and Open MPI 1.1 on a Linux cluster,
> with HDF5 version 1.6.5. Makefile.h is attached.
> Architecture: 64-bit AMD Opteron
> running FC3 Linux + BProcV4 (cluster OS) with kernel 2.6.14
>
> Thanks much for your help/insight/suggestions,
> Sanjib.