[FLASH-USERS] Details of speedup after restart

sanjib gupta guptasanjib at lanl.gov
Tue Jun 5 21:17:03 CDT 2007


For instance, when I ran using 64 processors I had
nblockx=4
nblocky=16
in "flash.par", so that makes sense.
So if every allotted processor is already fully used by my job at 
1 block/proc, then I will get no speedup from doubling to 128 
processors, and it makes sense to increase the resolution instead?
That explains some tests I did to see how the runs scale with the 
number of processors, but I'm not sure it explains the drastic 
difference between the initial run and all subsequent (restarted) runs.
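
To make the block-counting argument concrete, here is a minimal sketch of 
the arithmetic in Python. It is not FLASH code. It assumes a 2D run with 
no refinement, so the block count is just nblockx*nblocky, and that a 
block is the smallest unit of work handed to a process, as Dan describes. 
The function name block_distribution is made up for illustration.

```python
def block_distribution(nblockx, nblocky, nprocs):
    """Return (total_blocks, busy_procs, idle_procs, max_blocks_per_proc).

    Assumes no refinement (block count = nblockx * nblocky) and that
    blocks are indivisible, so a process holds whole blocks or none.
    """
    total_blocks = nblockx * nblocky
    busy_procs = min(total_blocks, nprocs)      # processes holding at least one block
    idle_procs = nprocs - busy_procs            # processes with no block to work on
    max_per_proc = -(-total_blocks // nprocs)   # ceiling division
    return total_blocks, busy_procs, idle_procs, max_per_proc


if __name__ == "__main__":
    # nblockx=4, nblocky=16 as in the flash.par values quoted above
    for nprocs in (32, 64, 128):
        total, busy, idle, max_per = block_distribution(4, 16, nprocs)
        print(f"{nprocs:4d} procs: {total} blocks, {busy} busy, "
              f"{idle} idle, at most {max_per} block(s) per proc")
```

Under these assumptions, 64 blocks on 128 processes leaves 64 processes 
with no block at all, which is the "half of the processes have nothing to 
do" case. If the grid is allowed to refine, the real block count will be 
larger than nblockx*nblocky and the numbers change accordingly.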
Thanks,
Sanjib.

Dan Sheeler wrote:
> Does this run have just 64 blocks (4 x 16)?  If this is a standard 
> FLASH setup with local physics, having fewer than one block per 
> process will probably produce weird performance numbers.  In a 
> standard run, 2D 8x8 blocks require very little RAM or work per 
> process.  Single processes are happy working on thousands of blocks.  
> Furthermore, work is distributed to the processors in nothing smaller 
> than block-sized chunks.  If your run is a typical setup, then half of 
> the processes have more or less nothing to do but add communication, 
> and I can imagine that would cause runtime to fluctuate 
> non-deterministically.
>
> Dan
>
>  --
> Dan Sheeler
> ASC Flash Center
> sheeler at flash.uchicago.edu
> (773) 834-3236
>
> On Tue, 5 Jun 2007, sanjib gupta wrote:
>
>> Hi,
>>
>> I am attaching 2 log files: one from the initial run on 128 
>> processors, and one from immediately killing that job and restarting 
>> from the first checkpoint file "hc-rt-hdf5_chk_0000".
>> Notice about 4 timesteps per second initially, then ~30 timesteps/sec 
>> after restart.
>>
>> On 64 processors I noticed the gain was higher, but my resolution 
>> was lower (half the nblocky, same nblockx; this is a 2D run). Sorry, 
>> I did not keep the logfiles.
>>
>> However, this "gain" cannot be predicted. Sometimes I don't get it 
>> on the first restart, so I restart a couple of times!
>> As y'all can guess, this plays havoc with any benchmarking efforts, 
>> and we do intend to showcase our results from FLASH soon :-)
>>
>> We compile with Intel Fortran 9.1.033 and Open MPI 1.1 on a Linux 
>> cluster, with HDF5 version 1.6.5. Makefile.h is attached.
>> Architecture: 64-bit AMD Opteron
>> running FC3 Linux + BProcV4 (cluster OS) with kernel 2.6.14
>>
>> Thanks much for your help/insight/suggestions,
>> Sanjib.
>>



