[FLASH-USERS] Details of speedup after restart
sanjib gupta
guptasanjib at lanl.gov
Tue Jun 5 21:17:03 CDT 2007
For instance, when I ran using 64 processors I had
nblockx=4
nblocky=16
in "flash.par", so that makes sense.
So that means that if all allotted processors are already at 100% usage
by my job when I am at 1 block/proc, then I will get no speedup from
doubling to 128 processors,
so it makes sense to increase the resolution instead?
That explains some tests I did to see how the runs scale with the
number of processors, but I'm not sure it explains the drastic difference
between the initial run and all subsequent (restarted) runs.
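The block-count arithmetic above can be sketched as follows. This is only a rough illustration, assuming a uniform grid with no refinement (so the total block count is just nblockx * nblocky) and that blocks are handed out whole, one process at a time; block_distribution is a made-up helper name, not anything from FLASH.

```python
def block_distribution(nblockx, nblocky, nprocs):
    """Return (total_blocks, max_blocks_per_proc, idle_procs).

    Assumption: total blocks = nblockx * nblocky (no refinement),
    and a process can only receive whole blocks.
    """
    total = nblockx * nblocky
    base, extra = divmod(total, nprocs)
    # The busiest process gets one extra block if the split is uneven.
    max_per_proc = base + (1 if extra else 0)
    # Processes beyond the number of blocks have nothing to compute.
    idle = max(nprocs - total, 0)
    return total, max_per_proc, idle


if __name__ == "__main__":
    # Values from flash.par quoted above: nblockx=4, nblocky=16.
    for nprocs in (64, 128):
        total, max_per_proc, idle = block_distribution(4, 16, nprocs)
        print(f"{nprocs} procs: {total} blocks, "
              f"at most {max_per_proc} block(s)/proc, {idle} idle procs")
```

Under these assumptions, going from 64 to 128 processors with the same 4 x 16 layout leaves the busiest process with the same single block, so the extra processes would only add communication.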
Thanks,
Sanjib.
Dan Sheeler wrote:
> Does this run have just 64 blocks (4 x 16)? If this is a standard
> FLASH setup with local physics, having fewer than one block per process
> will probably produce weird performance numbers. In a standard run,
> 2D 8x8 blocks require very little RAM or work per process. Single
> processes are happy working on thousands of blocks. Furthermore, work
> is distributed to the processors in nothing smaller than block-sized
> chunks. If your run is a typical setup, then half of the processes
> have more-or-less nothing to do but add communication, and I can
> imagine that would cause runtime to fluctuate non-deterministically.
>
> Dan
>
> --
> Dan Sheeler
> ASC Flash Center
> sheeler at flash.uchicago.edu
> (773) 834-3236
>
> On Tue, 5 Jun 2007, sanjib gupta wrote:
>
>> Hi,
>>
>> I am attaching 2 log files: the initial run on 128 processors, and the
>> run after immediately killing the job and restarting from the first
>> checkpoint file "hc-rt-hdf5_chk_0000".
>> Notice about 4 timesteps per second initially, then ~30 timesteps/sec
>> after restart.
>>
>> On 64 processors I noticed the gain was higher, but my resolution
>> was lower (half the nblocky, same nblockx; this is a 2D
>> run). Sorry, I did not keep the logfiles.
>>
>> However, this "gain" cannot be predicted. Sometimes I don't get it
>> on the first restart, so I restart a couple of times!
>> As y'all can guess, this plays havoc with any benchmarking efforts,
>> and we do intend to showcase our results from FLASH soon
>> :-)
>>
>> We compile with Intel Fortran 9.1.033 and Open MPI 1.1 on a Linux
>> cluster, with HDF5 version 1.6.5. Makefile.h is attached.
>> Architecture: 64-bit AMD Opteron
>> running FC3 Linux + BProcV4 (cluster OS) with kernel = 2.6.14
>>
>> Thanks much for your help/insight/suggestions,
>> Sanjib.
>>