FLASH User's Guide FLASH User's Guide

Version 1.0

October 1999

uofc_logo.gif
asci_med_logo.gif
(ASC) Flash Center
University of Chicago

Contents

1  Introduction
2  Algorithmic overview
    2.1  Hydrodynamics
    2.2  Adaptive mesh refinement
    2.3  Equation of state
    2.4  Thermonuclear reactions
3  Quick start
4  FLASH tools
    4.1  The FLASH configuration script (setup)
    4.2  FLASH output comparison utility (focu)
        4.2.1  Building focu
        4.2.2  Using focu
    4.3  IDL routines for the analysis of FLASH output (fidlr)
        4.3.1  Configuring your environment to use the IDL routines
        4.3.2  Using the xflash graphical plotting tool
        4.3.3  Other IDL routines
5  The supplied problems
    5.1  Running the FLASH test suite
    5.2  The Sod shock-tube problem
    5.3  The Woodward-Colella interacting blast-wave problem
    5.4  The Sedov explosion problem
    5.5  The advection problem
    5.6  The problem of a wind tunnel with a step
6  Going further with FLASH
    6.1  The FLASH code architecture
        6.1.1  Code infrastructure
        6.1.2  Configuration files
        6.1.3  Module files
        6.1.4  Runtime parameters
        6.1.5  Code modules
        6.1.6  HDF output produced by FLASH
    6.2  Creating new problems
    6.3  Adding new solvers
    6.4  Porting FLASH to other machines
7  Contacting the authors of FLASH
8  References

Acknowledgments





The FLASH Code Group is supported by the (ASC) Flash Center at the University of Chicago under U. S. Department of Energy contract B341495. Some of the test calculations described here were performed using the Origin 2000 computer at Argonne National Laboratory and the (ASC) Blue Mountain computer at Los Alamos National Laboratory.

1  Introduction

The Center for Astrophysical Thermonuclear Flashes, or Flash Center, was founded at the University of Chicago in 1997 under contract to the United States Department of Energy as part of its Accelerated Strategic Computing Initiative ((ASC)). The goal of the Center is to solve several problems related to thermonuclear flashes on the surfaces of compact stars (neutron stars and white dwarfs), in particular X-ray bursts, Type Ia supernovae, and novae. To solve these problems requires the participants in the Center to develop new simulation tools capable of handling the extreme resolution and physical requirements imposed by conditions in these explosions, and to do so while making efficient use of the parallel supercomputers developed by the ASC project, the most powerful constructed to date.

The FLASH code represents an important step along the road to this goal. FLASH is a modular, adaptive, parallel simulation code capable of handling general compressible flow problems in astrophysical environments. FLASH has been designed to allow users to configure initial and boundary conditions, change algorithms, and add new physical effects with minimal effort. It uses the PARAMESH library to manage a block-structured adaptive grid, placing resolution elements only where they are needed most. FLASH uses the Message-Passing Interface (MPI) library to achieve portability and scalability on a variety of different message-passing parallel computers.

This user's guide is designed to enable individuals unfamiliar with the FLASH code to quickly get acquainted with its structure and to move beyond the simple test problems distributed with FLASH, customizing it to suit their own needs. The second section briefly describes the equations and algorithms used by the physics modules distributed with FLASH. It assumes that the reader has some familiarity both with the basic physics involved and with numerical hydrodynamics methods. This familiarity is absolutely essential in using FLASH (or any other simulation code) to arrive at meaningful solutions to physical problems. The novice reader is directed to an introductory text such as Patrick Roache's Fundamentals of Computational Fluid Dynamics (Hermosa, 1998) or C. A. J. Fletcher's Computational Techniques for Fluid Dynamics (Springer-Verlag, 1991). The advanced reader who wishes to know more specific information about the various algorithms is directed to the literature references in this section.

The third section discusses how to quickly get started with FLASH, describing how to configure, build, and run the code with one of the included test problems, then examine the resulting output. Users familiar with the capabilities of FLASH who wish to quickly `get their feet wet' with the code can skip directly to this section. The fourth section describes in more detail the use of the configuration and analysis tools distributed with FLASH. The fifth section describes the different test problems distributed with FLASH. The sixth section describes the FLASH code structure at length and gives detailed instructions for extending FLASH's capabilities by adding new problem setups. This section also discusses the code modules included with FLASH 1.0 and describes how new solvers may be integrated into the code. Finally, the seventh section gives guidance on contacting the authors of FLASH.

2  Algorithmic overview

FLASH provides a general simulation framework capable of incorporating several different types of physics module (e.g., hydrodynamics on structured or unstructured meshes, N-body solvers, reactive source terms, etc.) and initial or boundary conditions. However, it has been built to solve a particular class of problems, namely reactive, compressible flows in astrophysical environments. Thus, FLASH 1.0 is distributed with a particular set of algorithms to handle compressible hydrodynamics on a block-structured adaptive mesh, together with an appropriate nuclear equation of state and nuclear reaction network. We treat each of these elements in turn.

2.1  Hydrodynamics

The hydrodynamic module in FLASH 1.0 is based on the PROMETHEUS code (Fryxell, Müller, and Arnett 1989). This code solves Euler's equations for compressible gas dynamics in one, two, or three spatial dimensions. These equations can be written in conservative form as

r
t
+ Ñ ·rv
=
0
(1)
rv
t
+ Ñ ·rv v +Ñ P
=
rg
(2)
rE
t
+Ñ ·( rE + P ) v
=
rv ·g
(3)
where r is the fluid density, v is the fluid velocity, P is the pressure, E is the sum of the internal energy e and kinetic energy per unit mass,
E = e+ 1
2
v2 ,
(4)
g is the acceleration due to gravity, and t is the time coordinate. The pressure is obtained from the energy and density using the equation of state. For the case of an ideal gas equation of state, the pressure is given by
P = (g- 1) re,
(5)
where g is the ratio of specific heats. More general equations of state are discussed below.

For reactive flows, a separate advection equation must be solved for each chemical or nuclear species:

rXl
t
+ Ñ ·rXl v = 0 ,
(6)
where Xl is the mass fraction of the lth species, with the constraint that ål Xl = 1. The quantity rXl represents the partial density of the lth fluid. The code does not explicitly track interfaces between the fluids, so a small amount of numerical mixing can be expected during the course of a calculation.

The equations are solved using a modified version of the Piecewise-Parabolic Method (PPM), which is described in detail in Woodward and Colella (1984) and Colella and Woodward (1984). PPM is a higher-order version of the method developed by Godunov (1959). Godunov's method uses a finite-volume spatial discretization of the Euler equations together with an explicit forward time difference. Time-advanced fluxes at cell boundaries are computed using the analytic solution to Riemann's shock tube problem at each boundary. Initial conditions for each Riemann problem are determined by assuming the nonadvanced solution to be piecewise constant in each cell. Using the Riemann solution has the effect of introducing explicit nonlinearity into the difference equations and permits the calculation of sharp shock fronts and contact discontinuities without introducing significant nonphysical oscillations into the flow. Since the value of each variable in each cell is assumed to be constant, Godunov's method is limited to first-order accuracy in both space and time.

PPM improves on Godunov's method by representing the flow variables with piecewise parabolic functions. It also uses a monotonicity constraint rather than artificial viscosity to control oscillations near discontinuities, a feature shared with the MUSCL scheme of van Leer (1979). Although this could lead to a method which is accurate to third order, PPM is formally accurate only to second order in both space and time, as a fully third-order scheme proved not to be cost-effective. Nevertheless, PPM is considerably more accurate and efficient than most formally second-order algorithms.

PPM is particularly well-suited to flows involving discontinuities, such as shocks and contact discontinuities. The method also performs extremely well for smooth flows, although other schemes which do not perform the extra work necessary for the treatment of discontinuities might be more efficient in these cases. The high resolution and accuracy of PPM are obtained by the explicit nonlinearity of the scheme and through the use of intelligent dissipation algorithms, such as monotonicity enforcement, contact steepening, and interpolant flattening. These algorithms are described in detail by Colella and Woodward (1984).

A complete description of PPM is beyond the scope of this user's guide. However, for comparison with other codes, we note that the implementation of PPM in FLASH 1.0 uses the direct Eulerian formulation of PPM and the technique for allowing nonideal equations of state described by Colella and Glaz (1985). For multidimensional problems, FLASH 1.0 uses second-order operator splitting (Strang 1968). FLASH also implements PPM on a block-structured adaptive mesh, which is described in the following section.

2.2  Adaptive mesh refinement

We have used a package known as PARAMESH (MacNeice et al. 1999) for the parallelization and adaptive mesh refinement (AMR) portion of FLASH. PARAMESH consists of a suite of subroutines which handle refinement/derefinement, distribution of work to processors, guard cell filling, and flux conservation. In this section we briefly describe this package and the ways in which it has been modified for use with FLASH.

PARAMESH uses a block-structured adaptive mesh refinement scheme similar to others in the literature (e.g., Parashar 1999; Berger & Oliger 1984; Berger & Colella 1989; DeZeeuw & Powell 1993) as well as to schemes which refine on an individual cell basis (Khokhlov 1997). In block-structured AMR, the fundamental data structure is a block of uniform cells arranged in a logically Cartesian fashion. Each cell can be specified using a block identifier (processor number and local block number) and a coordinate triple (i,j,k), where i = 1¼nxb, j = 1¼nyb, and k = 1¼nzb. The complete computational grid consists of a collection of blocks with different physical cell sizes, related to each other in a hierarchical fashion using a tree data structure. The blocks at the root of the tree have the largest cells, while their children have smaller cells and are said to be refined. Two rules govern the establishment of refined child blocks in PARAMESH. First, the cells of a refined child block must be one-half as large as those of its parent block. Second, a block's children must be nested; that is, the child blocks must fit within their parent block and cannot overlap one another, and the complete set of children of a block must fill its volume. Thus, in d dimensions a given block can have at most 2d children.

Each block contains nxb×nyb×nzb interior cells and a set of guard cells. The guard cells contain boundary information needed to update the interior cells. These can be obtained from physically neighboring blocks, externally specified boundary conditions, or both. The number of guard cells needed depends upon the interpolation scheme and differencing stencil used for the hydrodynamics algorithm; for the explicit PPM algorithm distributed with FLASH, four guard cells are needed in each direction. PARAMESH handles the filling of guard cells with information from other blocks or a user-specified external boundary routine. If a block's neighbor has the same level of refinement, PARAMESH fills its guard cells using a direct copy from the neighbor's interior cells. If the neighbor has a different level of refinement, the neighbor's interior cells are used to interpolate guard cell values for the block. If the block and its neighbor are stored in the memory of different processors, PARAMESH handles the appropriate parallel communication (blocks are not split between processors). PARAMESH supports only linear interpolation for guard cell filling at jumps at refinement, but it is easily extended to allow other interpolation schemes; for example, for FLASH we have added a quadratic interpolation scheme. Once each block's guard cells are filled, it can be updated independently of the other blocks.

PARAMESH also enforces flux conservation at jumps in refinement, as described by Berger and Colella (1989). At jumps in refinement, the fluxes of mass, momentum, energy, and nuclear abundances across boundary cell faces are averaged across the fine cells at that face and provided to the corresponding coarse face on the neighboring block. These averaged fluxes are then used to update the values of the coarse block's local flow variables.

Each processor decides when to refine or derefine its blocks by computing a user-defined error estimator for each block. Refinement involves creation of between one and 2d refined child blocks, while derefinement involves deletion of a block. As child blocks are created, they are temporarily placed at the end of the processor's block list. After the refinements and derefinements are complete, the blocks are redistributed among the processors using a work-weighted Morton space-filling curve in a manner similar to that described by Warren and Salmon (1987) for a parallel treecode. During the distribution step each block is assigned a work value (an estimate of the relative amount of time required to update the block). The Morton number of the block is then computed by interleaving the bits of its integer coordinates as described by Warren and Salmon (1987); this determines its location along the space-filling curve. The Morton numbers of all of the blocks in the problem are then sorted using a parallel bitonic sort. Finally, the list of all blocks is partitioned among the processors using the block weights, equalizing the estimated workload of each processor. During the sorting step, only Morton numbers and not block data are communicated between processors; blocks are only moved (if necessary) after the sort. Changes in refinement are typically performed every ten timesteps.

The refinement criterion used by PARAMESH is adapted from Löhner (1987). Löhner's error estimator was originally developed for finite element applications and has the advantage that it uses an entirely local calculation. Furthermore, the estimator is dimensionless and can be applied with complete generality to any of the field variables of the simulation or any combination of them (by default, PARAMESH uses the density and pressure). Löhner's estimator is a modified second derivative, normalized by the average of the gradient over one computational cell. In one dimension on a uniform mesh it is given by

Ei = \mid ui+1 - 2ui + ui-1 \mid
\mid ui+1 - ui \mid + \mid ui - ui-1 \mid + e[ \mid ui+1 \mid - 2 \mid ui \mid + \mid ui-1 \mid ]
 ,
(7)
where ui is the refinement test variable's value in the ith cell. The last term in the denominator of this expression acts as a filter, preventing refinement of small ripples. The constant e is given a value of 10-4. Although PPM is formally second-order and its leading error terms scale as the third derivative, we have found the second derivative criterion to be very good at detecting discontinuities in the flow variable u. When extending this criterion to multidimensions, all cross derivatives are computed, and the following generalization of the above expresion is used:
Ei = é
ê
ê
ê
ê
ê
ê
ê
ê
ë

å
k,l 
æ
ç
è
2 u
xk xl
ö
÷
ø

é
ê
ë
(\mid u
xk
\midi+1 + \mid u
xk
\midi)/Dxl + e\mid 2 u
xk xl
\mid ù
ú
û
2

 
ù
ú
ú
ú
ú
ú
ú
ú
ú
û
1/2





 
 ,
(8)
where the sums are carried out over the coordinate directions. If the maximum value of Ei in a block is larger than an adjustable constant, CTORE, the block is marked for refinement. If the maximum value of Ei is less than a second constant (CTODE), the block is marked for derefinement. PARAMESH uses the default values CTORE = 0.8 and CTODE = 0.2. For each block, the error estimator is calculated for all interior cells plus two layers of guard cells, helping to ensure that blocks refine in advance of discontinuities.

2.3  Equation of state

It is common to call an equation of state routine more than 109 times when calculating two- and three-dimensional hydrodynamic models of stellar phenomena. Thus, it is very desirable to have an equation of state that is as efficient as possible, yet accurately represents the relevant physics. While FLASH is easily capable of including any equation of state, here we discuss three equation of state (henceforth EOS) routines which are supplied with FLASH 1.0: the ideal-gas or gamma-law EOS, the Nadyozhin EOS, and the Helmholtz EOS.

Each EOS routine takes as input the temperature T, density r, mean number of nucleons per isotope [`A], and mean charge per isotope [`Z]. Each routine then returns as output the scalar pressure (Ptot, specific thermal energy etot, and entropy Stot. In addition to these quantities, the EOS routines return partial derivatives of the pressure, specific thermal energy, and entropy with respect to the density and temperature:

Ptot
T
ê
ê
ê


r 
 , Ptot
r
ê
ê
ê


T 
 , etot
T
ê
ê
ê


r 
 , etot
r
ê
ê
ê


T 
 , Stot
T
ê
ê
ê


r 
 , Stot
r
ê
ê
ê


T 
 .
(9)
Quantities such as the specific heats and the adiabatic indices can be determined once these partial derivatives are known. When solving the energy equation, the temperature usually is obtained by iteration, and the thermodynamic derivatives must be available in order to implement efficient iteration schemes. Confirming that an EOS routine numerically satisfies thermodynamic consistency also demands that the derivatives be available.

An EOS is thermodynamically consistent if its outputs satisfy the Maxwell relations:

P  
=
 r2   e
r
ê
ê
ê


T 
  +  T   P
T
ê
ê
ê


r 
(10)
e
T
ê
ê
ê


r 
 
=
  T   S
T
ê
ê
ê


r 
(11)
- S
r
ê
ê
ê


T 
 
=
  1
r2
  P
T
ê
ê
ê


r 
 .
(12)
Thermodynamic inconsistency can manifest itself in the unphysical buildup (or decay) of the entropy (or temperature) during numerical simulations of what should be an adiabatic flow. Entropy-sensitive models (e.g., core-collapse supernovae) may suffer inaccuracies if thermodynamic consistency is significantly violated over sufficiently long time scales. The equations of state supplied with FLASH satisfy the Maxwell relations to a good degree of precision (in some cases to the limits of 64-bit arithmetic).

The gamma-law EOS provided with FLASH represents a simple ideal gas with constant adiabatic index g:

P = Na  k
_
A
rTe = 1
g- 1
  P
r
S = P/r+ e
T
 ,
(13)
where Na is the Avogadro number, and k is the Boltzmann constant. Because it requires no iteration in order to obtain the temperature, this EOS is quite inexpensive to evaluate.

More complex equations of state are necessary for realistic models of X-ray bursts, Type Ia supernovae, classical novae, and other stellar phenomena, where the electrons and positrons may be relativistic and/or degenerate and where radiation contributes significantly to the thermodynamic state. The Nadyozhin and Helmholtz EOS routines address this need. The Nadyozhin EOS, summarized by Nadyozhin (1974) and explained in detail by Blinnikov et al. (1996), was found by Timmes and Arnett (1999) in a comparison of five different analytic EOS schemes to give the best tradeoff among accuracy, thermodynamic consistency, and speed. The Helmholtz EOS (Timmes & Swesty 1999) uses interpolation from a two-dimensional table of the Helmholtz free energy F(r,T), obtaining comparable accuracy, better thermodynamic consistency, and about twice the speed of the Nadyozhin EOS.

For stellar phenomena, the pressure, specific thermal energy, and entropy contain contributions from several components:

Ptot
=
Prad + Pion + Pele + Ppos
(14)
etot
=
erad + eion +eele + epos
(15)
Stot
=
Srad + Sion + Sele + Spos .
(16)
Here the subscripts ``rad,'' ``ion,'' ``ele,'' and ``pos'' represent radiation, nuclei, electrons, and positrons, respectively. The radiation portion is that of a blackbody in local thermodynamic equilibrium:
Prad = a T4
3
erad = 3 Prad
r
Srad = (P/r+ e)
T
(17)
where a is the Stefan-Boltzmann energy constant, and c is the speed of light. The ion contribution is that of an ideal gas with g = 5/3. The electrons and positrons are treated as a non-interacting Fermi gas; the number densities of free electrons Nele and positrons Npos are given by
Nele
=
8 pÖ2
h3
  me3  c3  b3/2   [ F1/2(h,b)  +  F3/2(h,b) ]
(18)
Npos
=
8 pÖ2
h3
  me3  c3  b3/2 [ F1/2 ( -h- 2/b, b)  +  b  F3/2 ( -h- 2 /b, b) ] ,
(19)
where h is the Planck constant, me is the electron rest mass, b = k T / (me c2) is the relativity parameter, h = m/ k T is the normalized chemical potential energy m for electrons, and Fk(h,b) is the Fermi-Dirac integral
Fk(h,b) = ¥
ó
õ
0 
  xk  (1 + 0.5  b x)1/2  dx
exp(x - h) + 1
 .
(20)
Because the electron rest mass is not included in the chemical potential, the positron chemical potential must have the form hpos = -h-2/b. For complete ionization, the number density of free electrons in the matter is
Nele,matter =
_
Z

_
A
 Na  r = _
Z
 
 Nion ,
(21)
and charge neutrality requires
Nele,matter = Nele - Npos .
(22)
Solving this equation with a standard one-dimensional root-finding algorithm determines h. Once h is known, the Fermi-Dirac integrals can be evaluated, giving the pressure, specific thermal energy, and entropy due to the free electrons and positrons.

The Nadyozhin EOS routine uses polynomial or rational functions to evaluate the thermodynamic quantities. The temperature-density plane is decomposed into 5 regions (see Figure 12 of Blinnikov et al. 1996), with different expansions or fitting functions applied to each region. The methods used in the 5 regions are (1) a perfect gas approximation with the first-order corrections for degeneracy, (2) expansions of the half-integer Fermi-Dirac functions, (3) Chandrasekhar's (1939) expansion for a degenerate gas, (4) relativistic asymptotics, and (5) Gaussian quadrature for the thermodynamic quantities. The perturbation expansions, asymptotic relations, and fitting functions for the 5 regions are combined, with extraordinary care given to making sure that transitions between the regions are continuous, smooth, and thermodynamically consistent. All of the partial derivatives are obtained analytically.

The electron-positron Helmholtz free energy table used by the Helmholtz EOS was generated by using the Timmes EOS routine (Timmes & Arnett 1999). The Fermi-Dirac integrals, along with their derivatives with respect to h and b, were calculated to at least 18 significant figures using the quadrature schemes of Aparicio (1998). Newton-Raphson iteration was used to solve for h to at least 15 significant figures. The thermodynamic derivatives were computed analytically. Searches through the free energy table are avoided by computing hash indices from the values of any given (T,r[`Z]/[`A]) pair. No computationally expensive divisions are required in interpolating from the table; all of them can be computed and stored the first time the EOS routine is called.

2.4  Thermonuclear reactions

Modelling thermonuclear flashes typically requires the energy generation rate due to nuclear burning over a large range of temperatures, densities and compositions. The average energy generated or lost over a period of time is found by integrating a system of ordinary differential equations (the nuclear reaction network) for the abundances of important nuclei and the total energy release. In some contexts, such as Type II supernova models, the abundances themselves are also of interest. In either case, the coefficients that appear in the equations typically are extremely sensitive to temperature. The resulting stiffness of the system of equations requires the use of an implicit time integration scheme.

Timmes (1999) has reviewed several methods for solving stiff nuclear reaction networks, providing the basis for the reaction network solver included with FLASH. The scaling properties and behavior of three semi-implicit time integration algorithms (a traditional first-order accurate Euler method, a fourth-order accurate Kaps-Rentrop method, and a variable order Bader-Deuflhard method) and eight linear algebra packages (LAPACK, LUDCMP, LEQS, GIFT, MA28, UMFPACK, and Y12M) were investigated by running each of these 24 combinations on seven different nuclear reaction networks (hard-wired 13- and 19-isotope networks and table-lookup networks using 47, 76, 127, 200, and 489 isotopes). Timmes' analysis suggested that the best balance of accuracy, overall efficiency, memory footprint, and ease-of-use was provided by the choice of a 13- or 19-isotope reaction network, the variable order Bader-Deuflhard time integration method, and the MA28 sparse matrix package. The default FLASH distribution uses this combination of methods, which we describe in this section.

We begin by describing the equations solved by the nuclear burning module. We consider material which may be described by a density r and a single temperature T and contains a number of isotopes i, each of which has Zi protons and Ai nucleons (protons + neutrons). Let ni and ri denote the number and mass density, respectively, of the ith isotope, and let Xi denote its mass fraction, so that

Xi = ri/r = ni Ai/(rNA) ,
(23)
where NA is the Avogadro number. Let the molar abundance of the ith isotope be
Yi = Xi/Ai = ni/(rNA) .
(24)
Mass conservation is then expressed by
N
å
i = 1 
Xi = 1
(25)
At the end of each timestep, FLASH checks that the stored abundances satisfy equation (25) to machine precision in order to avoid the unphysical buildup (or decay) of the abundances or energy generation rate. Roundoff errors in this equation can lead to significant problems in some contexts (e.g., classical nova envelopes) where trace abundances are important.

The general continuity equation for the ith isotope is given in Lagrangian formulation by

dYi
dt
+ Ñ·( Yi Vi ) = .
Ri
 
 .
(26)
In this equation [(Ri)\dot] is the total reaction rate due to all binary reactions of the form i(j,k)l,
.
Ri
 
=
å
j,k 
Yl Yk lkj(l) - Yi Yj ljk(i) ,
(27)
where lkj and ljk are the reverse (creation) and forward (destruction) nuclear reaction rates, respectively. Contributions from three-body reactions, such as the triple-a reaction, are easy to append to equation (27). The mass diffusion velocities Vi in equation (26) are obtained from the solution of a multicomponent diffusion equation (Chapman & Cowling 1970; Burgers 1969; Williams 1988) and reflect the fact that mass diffusion processes arise from pressure, temperature, and/or abundance gradients as well as external gravitational or electrical forces.

The case Vi º 0 is important for two reasons. First, mass diffusion is often unimportant when compared to other transport process such as thermal or viscous diffusion (i.e., large Lewis numbers and/or small Prandtl numbers). Such a situation obtains, for example, in the study of laminar flame fronts propagating through the quiescent interior of a white dwarf. Second, this case permits the decoupling of the reaction network solver from the hydrodynamical solver through the use of operator splitting, greatly simplifying the algorithm. This is the method used by the default FLASH distribution. Setting Vi º 0 transforms equation (26) into

dYi
dt
= .
Ri
 
 ,
(28)
which may be written in the more compact and standard form
.
y
 
= f  (y) .
(29)
Stated another way, in the absence of mass diffusion or advection, any changes to the fluid composition are due to local processes.

Because of the highly nonlinear temperature dependence of the nuclear reaction rates, and because the abundances themselves often range over several orders of magnitude in value, the values of the coefficients which appear in equations (28) and (29) can vary quite significantly. As a result, the nuclear reaction network equations are ``stiff.'' A system of equations is stiff when the ratio of the maximum to the minimum eigenvalue of the Jacobian matrix [(J)\tilde] º f/y is large and imaginary. This means that at least one of the isotopic abundances changes on a much shorter timescale than another. Implicit or semi-implicit time integration methods are generally necessary to avoid following this short-timescale behavior, requiring the calculation of the Jacobian matrix.

It is instructive at this point to look at an example of how equation (28) and the associated Jacobian matrix are formed. Consider the 12C(a,g)16O reaction, which competes with the triple-a reaction during helium burning in stars. The rate R at which this reaction proceeds is critical for evolutionary models of massive stars since it determines how much of the core is carbon and how much of the core is oxygen after the initial helium fuel is exhausted. This reaction sequence contributes to the right-hand side equation (29) through the terms

.
Y
 
(4He)
= - Y(4He)  Y(12C)  R + ¼
.
Y
 
(12C)
= - Y(4He)  Y(12C)  R  + ¼
 ,
(30)
.
Y
 
(16O)
= + Y(4He)  Y(12C)  R  + ¼
where the ellipsis indicate additional terms coming from other reaction sequences. The minus signs indicate that helium and carbon are being destroyed, while the plus sign indicates that oxygen is being created. Each of these three expressions contributes two terms to the Jacobian matrix [(J)\tilde]=f/y:
J(4He,4He) = - Y(12C)  R  + ¼
J(4He,12C) = - Y(4He)  R  + ¼
J(12C,4He) = - Y(12C)  R  + ¼
J(12C,12C) = - Y(4He)  R  + ¼
 .
(31)
J(16O,4He) = + Y(12C)  R  + ¼
J(16O,12C) = + Y(4He)  R  + ¼
Entries in the Jacobian matrix represent the flow, in number of nuclei s-1, into (positive) or out of (negative) an isotope. All of the temperature and density dependence is included in the reaction rate R. The Jacobian matrices that arise from nuclear reaction networks are neither positive-definite nor symmetric since the forward and reverse reaction rates are generally not equal. However, the magnitudes of the matrix entries change as the abundances, temperature, or density change with time.

The default FLASH distribution includes two reaction networks. The 13-isotope a-chain plus heavy-ion reaction network is suitable for most multi-dimensional simulations of stellar phenomena where having a reasonably accurate energy generation rate is of primary concern. The 19-isotope reaction network has the same a-chain and heavy-ion reactions as the 13-isotope network, but it includes additional isotopes to accommodate some types of hydrogen burning (PP and CNO chains), along with some aspects of photodisintegration into 54Fe. This reaction network is described in Weaver, Zimmerman, & Woosley (1978). Both the networks supplied with FLASH are examples of ``hard-wired'' reaction networks, where each of the reaction sequences are carefully entered by hand. This approach is suitable for small networks when minimizing the CPU time required to run the reaction network is a primary concern, although it suffers the disadvantage of inflexibility.

The Jacobian matrices of nuclear reaction networks tend to be sparse, and they become more sparse as the number of isotopes increases. Implicit time integration schemes require the inverse of the Jacobian matrix, so we need to use a sparse linear algebra package to compute this inverse. To control the iteration count for a prespecified accuracy, FLASH uses a direct sparse matrix solver. Direct methods typically divide the solution of [(A)\tilde] ·x = b into a symbolic LU decomposition, numerical LU decomposition, and a backsubstitution phase. In the symbolic LU decomposition phase the pivot order of a matrix is determined, and a sequence of decomposition operations which minimize the amount of fill-in is recorded. Fill-in refers to zero matrix elements which become nonzero (e.g., a sparse matrix times a sparse matrix is generally a denser matrix). The matrix is not (usually) decomposed; only the steps to do so are stored. Since the nonzero pattern of a chosen nuclear reaction network does not change, the symbolic LU decomposition is a one-time initialization cost for reaction networks. In the numerical LU decomposition phase, a matrix with the same pivot order and nonzero pattern as a previously factorized matrix is numerically decomposed into its lower-upper form. This phase must be done only once for each staged set of linear equations. In the backsubstitution phase, a set of linear equations is solved with the factors calculated from a previous numerical decomposition. The backsubstitution phase may be performed with as many right-hand sides as needed, and not all of the right-hand sides need to be known in advance.

The MA28 linear algebra package used by FLASH is described by Duff, Erisman, & Reid (1986). It uses a combination of nested dissection and frontal envelope decomposition to minimize fill-in during the factorization stage. An approximate degree update algorithm that is much faster (asymptotically and in practice) than computing the exact degrees is employed. One continuous real parameter sets the amount of searching done to locate the pivot element. When this parameter is set to zero, no searching is done and the diagonal element is the pivot, while when set to unity, complete partial pivoting is done. Since the matrices generated by reaction networks are usually diagonally dominant, the routine is set in FLASH to use the diagonal as the pivot element. Several test cases showed that using partial pivoting did not make a significant accuracy difference, but were less efficient since a search for an appropriate pivot element had to be performed. MA28 accepts the nonzero entries of the matrix in the (i, j, ai,j) coordinate system, and typically uses uses 70-90% less storage than storing the full dense matrix.

The time integration method used by FLASH for evolving the reaction networks is the variable order Bader-Deuflhard method (e.g., Bader & Deuflhard 1983; Press et al. 1992). The reaction network is advanced over a large time step H from yn to yn+1 by the following sequence of matrix equations. First,

h
= H/m
( ~
1
 
- ~
J
 
) ·D0
= h f(yn)
(32)
y1
= yn + D0 .
Then from k = 1,2,¼,m-1
( ~
1
 
- ~
J
 
) ·x
= h f(yk) - Dk-1
Dk
= Dk-1 + 2 x
(33)
yk+1
= yk + Dk  ,
and closure is obtained by the last stage
( ~
1
 
- ~
J
 
) ·Dm
= h [ f (ym) - Dm-1 ]
yn+1
= ym + Dm  .
(34)
This staged sequence of matrix equations is executed at least twice with m=2 and m=6, yielding a fifth-order method. The sequence may be executed a maximum of seven times, which yields a fifteenth-order method. The exact number of times the staged sequence is executed depends on the accuracy requirements (set to one part in 106 in FLASH) and the smoothness of the solution. Estimates of the accuracy of an integration step are made by comparing the solutions derived from different orders. The minimum cost of this method - which applies for a single time step that met or exceeded the specified integration accuracy - is one Jacobian evaluation, eight evaluations of the right-hand side, two matrix decompositions, and ten backsubstitutions. This minimum cost can be increased at a rate of one decomposition (the expensive part) and m backsubstitutions (the inexpensive part) for every increase in the order 2k+1. The cost of increasing the order is compensated for, hopefully, by taking a correspondingly larger (but accurate) time step. The controls for order versus step size are a built-in part of the Bader-Deuflhard method. The cost per step of this integration method is at least twice as large as the cost per step of either a traditional first-order accurate Euler method or a fourth-order accurate Kaps-Rentrop (essentially an implicit Runge-Kutta) method, However, if the Bader-Deuflhard method can take accurate time steps that are at least twice as large, then this method will be more efficient globally. Timmes (1999) shows that this is typically the case. Note that in equations (32) - (34) not all of the right-hand sides are known in advance since the sequence of linear equations is staged. This staging feature of the integration method will make some matrix packages, such as MA28, a more efficient choice.

The energy generation rate is given by the sum

.
e
 

nuc 
= NA  
å
i 
  dYi
dt
 .
(35)
Note that a nuclear reaction network does not need to be evolved in order to obtain the instantaneous energy generation rate, since only the right hand sides of the ordinary differential equations need to be evaluated. It is more appropriate in the FLASH program to use the average nuclear energy generated over a time step
.
e
 

nuc 
= NA  
å
i 
  DYi
Dt
 .
(36)
In this case, the nuclear reaction network does need to be evolved. The energy generation rate, after subtraction of any neutrino losses, is returned to the FLASH program for use with the operator splitting technique.

Finally, the tabulation of Caughlan & Fowler (1988) is used in FLASH for most of the key nuclear reaction rates. Modern values for some of the reaction rates were taken from the reaction rate library of Hoffman (1999, priv. comm.). Nuclear reaction rate screening effects as implemented by Wallace, Woosley, & Weaver (1982), and decreases in the energy generation rate [(e)\dot]nuc due to neutrino losses as given by Itoh et al. (1985) are included in calculations done with FLASH.

3  Quick start

This section describes how to quickly get up and running with FLASH, showing how to configure and build it to solve the Sedov explosion problem, how to run it, and how to examine the output using IDL. To begin, verify that you have the following:

FLASH has been tested on the following Unix-based platforms. In addition, it may work with others not listed (see section ).

In this section we will use irix as an example of the machine type. Substitute your machine type for irix wherever it appears.

To begin, unpack the FLASH source code distribution. If you have a Unix tar file, type `tar xvf FLASHX.tar' (without the quotes), where X is the FLASH major version number (for example, use FLASH1.tar and FLASH1/ for FLASH version 1.0). If you are working with a CVS source tree, use `cvs checkout FLASHX' to obtain a personal copy of the tree. You may need to obtain permission from the local CVS administrator to do this. In either case you will create a directory called FLASHX/. Type `cd FLASHX' to enter this directory.

Next, configure the FLASH source tree for the Sedov explosion problem. Type

./setup sedov irix -defaults

This configures FLASH for the sedov problem, using the default hydrodynamic solver, equation of state, and I/O format defined for this problem. The source tree is configured to create a two-dimensional code by default.

The next step is to edit the file object/Makefile.h. This file contains all of the site-dependent and architecture-dependent parts of the Makefile. Verify that the compiler settings (FCOMP, CCOMP, and CPPCOMP) and the library paths (LIB) are correct for your system. No other changes should be necessary at this stage (later on you may wish to fiddle with the compiler optimization settings). Future versions of FLASH may make use of the GNU configure program to handle this step.

From the FLASH root directory (ie. the directory from which you ran setup), execute make. This will compile the FLASH code. If you should have problems and need to recompile, `make clean' will remove all object files from the object/ directory, leaving the source configuration intact; `make realclean' will remove all files and links from object/. After `make realclean,' a new invocation of setup is required before the code can be built.

Assuming compilation and linking were successful, you should now find an executable named flashX_sedov_irix in the object/ directory. (Remember to replace irix with your machine type.) You may wish to check that this is the case.

FLASH expects to find a flat text file named flash.par in the directory from which it is run. This file sets the values of various runtime parameters which determine the behavior of FLASH. If it is not present, FLASH will use default values for the runtime parameters, which may or may not produce desirable results. Here we will create a simple flash.par which sets a few parameters and allows the rest to take on default values. With your text editor, create flash.par in the main FLASH directory with the contents of Figure 1.

# runtime parameters

lrefine_max = 4
basenm = ßedov_4_"
restart = .false.
trstrt = 0.005
nend = 1000
tmax = 0.02
gamma = 1.4
xlboundary = -21 # code -21 is outflow
xrboundary = -21
ylboundary = -21
yrboundary = -21

Figure 1: FLASH parameter file contents for the quick start example.

This instructs FLASH to use up to four levels of adaptive mesh refinement (AMR) and to name the output files appropriately. We will not be starting from a checkpoint file (this is the default, but here it is explicitly set for purposes of illustration). Output files are to be written every 0.005 time units, and we will integrate until t = 0.02 or 1000 timesteps have been taken, whichever comes first. The ratio of specific heats for the gas (g) is taken to be 1.4, and all four boundaries of the two-dimensional grid have outflow (zero-gradient or Neumann) boundary conditions. Note the format of the file: each line is a comment (denoted by a hash mark, #), blank, or of the form variable = value. String values are enclosed in double quotes ("). Boolean values are indicated in the Fortran style, .true. or .false.

We are now ready to run FLASH. To run FLASH on N processors, type

mpirun -np N object/flashX_sedov_irix

remembering to replace N, X, and irix with the appropriate values. Some systems may require you to start MPI programs with a different command; use whichever command is appropriate to your system. The FLASH executable can take one command-line argument, the name of the runtime parameter file. The default parameter file name is flash.par.

You should see a number of lines of output indicating that FLASH is initializing the Sedov problem, listing the initial parameters, and giving the timestep chosen at each step. At the end FLASH prints a summary of the CPU time spent in various parts of the code and terminates. In the current directory you should now find several files:

We will use the xflash routine under IDL to examine the output. Before doing so, we need to set the values of two environment variables, IDL_PATH and XFLASH_DIR. Under csh this can be done using the commands

setenv XFLASH_DIR "$PWD/tools/idl"
setenv IDL_PATH "${XFLASH_DIR}:$IDL_PATH"

If you get a message indicating that IDL_PATH is not defined, just enter

setenv IDL_PATH "$XFLASH_DIR"

Now run IDL (idl) and enter xflash at the IDL> prompt. You should see a control panel widget as shown in Figure 2.

xflash.gif
Figure 2: Control panel for the xflash visualization tool under IDL.

The path entry should be filled in for you with the current directory. Enter sedov_4_ as the base filename and enter 4 as the suffix. (xflash can generate output for a number of consecutive files, but if you fill in only the beginning suffix, only one file is read.) Our output files are in HDF format, so make sure the HDF button is selected. Choose the image format (screen, Postscript, GIF) and the problem type (in our case, Sedov). Selecting the problem type is only important for choosing default ranges for our plot; plots for other problems can be generated by ignoring this setting and overriding the default values for the data range and the coordinate ranges. Select the desired plotting variable and colormap. The `abundance conservation' variable produces a plot of the sum of the nuclear abundances, useful for comparison with the expected value (unity). Under `Options,' select whether to plot the logarithm of the desired quantity, and select whether to plot the outlines of the AMR blocks. For very highly refined grids the block outlines can obscure the data, but they are useful for verifying that FLASH is putting resolution elements where they are needed. Finally, check `velocity vectors' to overlay the velocity field. The `xskip' and `yskip' parameters enable you plot only a fraction of the vectors so that they do not obscure the background plot.

When the control panel settings are to your satisfaction, click the `Plot' button to generate the plot. For Postscript and GIF output, a file is created in the current directory. The result should look something like Figure 3, although this figure was generated from a run with eight levels of refinement rather than the four used in the quick start example run. With fewer levels of refinement the Cartesian grid causes the explosion to appear somewhat diamond-shaped.

xflash_output.gif
Figure 3: Example of xflash output for the Sedov problem with eight levels of refinement.

FLASH is intended to be customized by the user to work with interesting initial and boundary conditions. In the following sections we will cover in more detail the structure of FLASH and the sample problems and tools distributed with it.

4  FLASH tools

In this section we describe several of the tools distributed with FLASH. The source code for these programs (except for setup) can be found in the main FLASH source tree under the tools/ directory.

4.1  The FLASH configuration script (setup)

The setup script, found in the FLASH root directory, provides the primary command-line interface to the FLASH source code. It configures the source tree for a given problem and target machine and creates files needed to parse the runtime parameter file and make the FLASH executable. More description of what setup does may be found in Section 6.1. Here we describe its basic usage.

Running setup without any options prints a message describing the available options:

[sphere 5:09pm] % ./setup
usage: setup problem-name machine-type [options]

problems: 2blast advect mach rt sedov sod xrayburst
machines: asci_blue asci_bluemtn asci_red irix linux t3e
options: -verbose -portable -defaults -[123]d -maxblocks=#

Available values for the two mandatory options, the name of the problem to configure and the type of machine to configure for, are determined by scanning the setups/ and source/systems/ directories, respectively.

A ``problem'' consists of a set of initial and boundary conditions, possibly additional physics (e.g., a subgrid model for star formation), and a set of adjustable parameters. The directory associated with a problem contains source code files which implement the initial and boundary conditions and extra physics, together with a configuration file, read by setup, which contains information on required physics modules and adjustable parameters.

A ``machine type'' consists of an architecture and an associated operating system. Normally it does not refer to a specific machine, but in the case of the (ASC) machines (Blue Pacific, Blue Mountain, and Red), the particular machine is unique and is treated as a separate architecture. The directory for each machine type contains a makefile fragment ( Makefile.h) which sets command names, compiler flags, and library paths, and any replacement or additional source files needed to compile FLASH for that machine type.

setup uses the problem and machine type, together with a user-supplied file called Modules which lists the code modules to include, to generate a directory called object/ which contains links to the appropriate source files and makefile fragments. It also creates the master makefile (object/Makefile) and several Fortran include files (object/rt_*) which are needed by the code in order to parse the runtime parameter file. After running setup, the user can make the FLASH executable by running make in the object/ directory (or from the FLASH root directory, if the -portable option is not used with setup). The optional command-line modifiers have the following interpretations:

-verbose Normally setup echoes to the standard output summary messages indicating what it is doing. Including the -verbose option causes it to also list the links it creates.
-portable This option creates a portable build directory. Instead of object/, setup creates object_problem_machine/, and instead of creating links to the source files in source/ and setups/, it copies files from those directories into the build directory. The resulting build directory can be placed into a tar archive and sent to another machine for building. Because the makefile in the FLASH root directory is generic and does not know what problem or machine has been selected, it cannot be used to make code which has been configured with -portable.

-defaults Normally setup requires that the user supply a plain text file called Modules (in the FLASH root directory) which specifies which code modules to include. A sample Modules file appears in Figure 4. Each line is either a comment (preceded by a hash mark (#)) or a module include statement of the form INCLUDE module. Sub-modules are indicated by specifying the path to the sub-module in question; in the example, the sub-module gamma of the eos module is included. If a module has a default sub-module, but no sub-module is specified, setup automatically selects the default using information provided by the module's configuration file.
The purpose of the -defaults option is to enable setup to generate a ``rough draft'' of a Modules file for the user. The configuration file for each problem setup specifies a number of code module requirements; for example, a problem may require the perfect-gas equation of state (eos/gamma) and an unspecified hydro solver (hydro). With -defaults, setup creates a Modules file by converting these requirements into module include statements. In addition, it checks the configuration files for the required modules and includes any of their required modules, eliminating duplicates. This is only done for one level of requirement, but usually this is enough. Most users configuring a problem for the first time will want to run setup with -defaults to generate a Modules file, then edit Modules directly to specify different sub-modules (e.g., hydro solvers or I/O formats). After editing Modules in this way, re-run setup without -defaults to incorporate the changes into the code configuration.

# Modules file constructed for rt problem by setup -defaults

INCLUDE driver
INCLUDE eos/gamma
INCLUDE grav
INCLUDE paramesh
INCLUDE mmpi
INCLUDE mpi_amr
INCLUDE hydro
INCLUDE io

Figure 4: Example of the Modules file used by setup to determine which code modules to include.

-[123]d By default setup creates a makefile which produces a FLASH executable capable of solving one- or two-dimensional problems (equivalent to -2d). To generate a makefile with options appropriate to three-dimensional problems, use -3d. The resulting code will be able to solve 1D, 2D, or 3D problems. To generate a one-dimensional code, use -1d. While -3d provides maximum flexibility, it also allocates the most memory at runtime. It is generally very wasteful of memory to solve a 1D or 2D problem using an executable compiled with -3d. These options are mutually exclusive and cause setup to add the appropriate compilation option to the makefile it generates.
-maxblocks=# This option is also used by setup in constructing the makefile compiler options. It determines the amount of memory allocated at runtime to the adaptive mesh refinement (AMR) block data structure. For example, to allocate enough memory on each processor for 500 blocks, use -maxblocks=500. If the default block buffer size is too large for your system, you may wish to try a smaller number here (the defaults are currently defined in source/driver/physicaldata.fh). Alternatively, you may wish to experiment with larger buffer sizes if your system has enough memory.

After configuring the source tree with setup and using make to create an executable in the build directory, look in the build directory for a file named rt_parms.txt. This contains a complete list of all of the runtime parameters available to the executable, together with their variable types, default values, and comments. To set runtime parameters to values other than the defaults, create a runtime parameter file named flash.par in the directory from which FLASH is to be run. The format of this file is described in Section 3.

4.2  FLASH output comparison utility (focu)

The FLASH output comparison utility, or focu (pronounced faux-coup), is a command-line utility which compares FLASH output files according to specified criteria (e.g., local or global error criterion on any of the hydrodynamical variables) and reports whether they are the same or not. This is useful in determining whether a change to FLASH (or to the value of a runtime parameter) has changed the results obtained with a given problem.

4.2.1  Building focu

The focu executable is configured and built in much the same way as is FLASH: obtain source tree, run setup to configure the source tree for a given architecture, and run make in the build directory. FLASH and focu are different in that focu is problem-independent, so only a machine type needs to be specified, and in that (at least at the present time) all of the modules included with focu are used, so setup is always run with the -defaults option. (See Section 4.1 for details regarding the usage of setup.) To summarize the steps in building focu:

  1. Move to the focu root directory (tools/focu/ relative to the FLASH root directory).
  2. Run ./setup machine-type -defaults, where machine-type is the name of the architecture for which you are configuring focu (run setup without options for a list). Do not use the setup script in the FLASH root directory.
  3. Edit object/Makefile.h to set compiler flags and the locations of libraries. Since focu contains routines to read both HDF and Fortran unformatted output, you must have the HDF library installed.
  4. Run make. If compilation is successful, you should find an executable named focu_machine-type in the current directory.

Additional setup options are available and are interpreted as described in Section 4.1.

4.2.2  Using focu

Running focu is very similar to running FLASH itself. First prepare a runtime parameter file, then run focu using

mpirun -np N focu_machine-type parameter-file

(or use the command used by your system to start MPI programs). Here N is the number of processors to use (which need not be the same as the number of processors used by the FLASH job which produced the output files to be read), and machine-type is the machine type specified in the setup step. If the parameter-file argument is not supplied, focu looks for a parameter file named focu.par in the current directory. The format of this file is identical to that used by FLASH (see Section 3).

The runtime parameters understood by focu are as follows. Default values are listed in brackets following each description.

file1 Base file name for the first file to compare. The file name will be constructed by appending the file number, padded with zeros to four places, to the base file name. [``file1'']
num1 The file number for the first file to compare. [0]
format1 The format of the first file: ``HDF'' for Hierarchical Data Format, ``F77'' for Fortran unformatted. [``HDF'']
file2 Base file name for the second file to compare. [``file2'']
num2 File number for the second file. [0]
format2 Format for the second file. [``HDF'']

variable The name of the hydrodynamical variable to examine. Available values are ``density,'' ``pressure,'' ``x velocity,'' ``y velocity,'' ``z velocity,'' ``temperature,'' ``energy,'' ``kinetic energy,'' ``internal energy,'' ``x momentum,'' ``y momentum,'' ``z momentum,'' and ``abundance N,'' where N is the index of the abundance to use. [``density'']

criterion The error criterion to apply in comparing the files. This must be of the form ``A B'', where A is either ``local'' or ``global,'' and B is either ``sum,'' ``min,'' or ``max.'' For the global criteria, focu first computes the volume-weighted average, minimum value, or maximum value of variable for each file, then computes the L1 deviation between the two global quantities. For the local criteria, the files are compared on a zone-by-zone basis. Consequently the AMR block structures of the two files must be the same in order for the comparison to succeed. If criterion is ``local sum,'' the volume-weighted average of the local L1 deviations is compared to threshold to determine success or failure. If criterion is ``local min'' or ``local max,'' the minimum or maximum L1 deviation is compared. The L1 deviation for two quantities X1 and X2 is defined as |X2-X1| / max{1/2|X1+X2|,10-99}. [``global sum'']

threshold The error threshold to use in deciding success or failure. [10-6]
outfile The name of the output file to create when generating the comparison report. Setting this to ``stdout'' causes focu to write to standard output. [``stdout'']

 FOCU: Flash Output Comparison Utility

Comparing: ßedov_4_f77_" # 3
ßedov_4_hdf_" # 3
Variable: density
Criterion: local sum
Threshold: 1.0000000000000E-09
Result:
  Error: 0.0000000000000E+00
  Minimum: 0.0000000000000E+00
  Maximum: 0.0000000000000E+00
  SUCCESS

Figure 5: Example of focu output format.

The output of focu is a short report listing the volume-averaged sum, minimum, and maximum deviations (local or global, depending on the setting of criterion) and reporting SUCCESS or FAILURE according to the supplied threshold. For example, if two runs of FLASH are performed using the Sedov problem, one with HDF output and one with Fortran unformatted output, the resulting output files should be identical. The report generated by a focu comparison of two such files is shown in Figure 5. Note that the last line of the report begins with SUCCESS or FAILURE, making it easy for a script to test the results. For example, with csh one might use

set result = `mpirun -np 2 focu_irix focu_sedov.par | grep SUCCESS`
if ("result" == "SUCCESS") then
...
endif

focu is used by the automated FLASH test script (Section 5.1) to compare FLASH outputs to the reference results supplied as part of the test suite.

4.3  IDL routines for the analysis of FLASH output (fidlr)

fidlr is a set of routines, written in the IDL language, which provide tools for plotting results obtained with FLASH. The routines include programs which can be run from the IDL command line to read 2D or 3D FLASH datasets and interpolate them onto uniform grids. An IDL program called xflash provides a graphical interface to these routines, enabling users to read and plot 2D AMR datasets without interpolating onto a uniform mesh.

4.3.1  Configuring your environment to use the IDL routines

The FLASH IDL routines are located in the tools/fidlr/ subdirectory of the FLASH root directory. To use them you must set two environment variables. First set the value of XFLASH_DIR to the location of the FLASH IDL routines; for example, under csh, use

setenv XFLASH_DIR flash-root-path/tools/fidlr

where flash-root-path is the absolute path of the FLASH root directory. Next add this directory to your IDL_PATH variable:

setenv IDL_PATH "${XFLASH_DIR}:$IDL_PATH"

If you have not previously defined IDL_PATH, omit :$IDL_PATH from the above command. You may wish to include these commands in your .cshrc file (or profile if you use another shell) to avoid having to reissue them every time you log in.

To begin using the routines, start IDL (idl). You may wish to make use of the program start.pro included with fidlr; this forces the use of an eight-bit colormap on 24-bit displays which have the ability to maintain two color depths (such as those found on SGIs). To use this program, start IDL using idl start.pro.

4.3.2  Using the xflash graphical plotting tool

xflash provides a graphical interface to IDL for plotting one or several 2D AMR datasets. It plots variables using a shaded colormap and can overlay the outline of the AMR block structure or arrows showing the velocity field. It also enables users to zoom in to different regions of the dataset and to probe the dataset.

To use the graphical tool, from the IDL command line (idl>) execute xflash. The widget is divided into sections controlling the file location, output type, problem, variable to plot, colormap, plot options, data range, zooming, and velocity vectors. At the bottom is a status display. See Figure 2 for an example of the appearance of the xflash window.

File location

The files to be read are specified by entering a path (by default the current IDL working directory), a base name, and a range of suffixes. If only the first suffix is entered, only that one is used; otherwise all integers between the beginning and ending suffix are used. Select the file type by choosing ``HDF'' for Hierarchical Data Format or ``F77'' for Fortran unformatted output. At present the Fortran reader does not convert little-endian files to big-endian or vice versa.

Output device

It is possible to direct output to the screen, to a Postscript file, or to a GIF image. The plotting routine will determine the orientation (portrait or landscape) using the aspect ratio of the domain to be plotted, and it will plot the largest area that maintains this ratio.

Problem

The problem buttons load default data ranges, axis orientation, and velocity information for different problems. New problems can be added quite easily by modifying the file tools/fidlr/xflash_defaults.pro. It is not necessary to add a problem in order to plot a dataset. The problem selection exists only to provide rational default values for data ranges; these values can be overridden through the widget.

Variables

Presently the following variables can be plotted: temperature, density, pressure, total energy, velocity magnitude, and abundance conservation error. The latter is the difference between the sum of the nuclear abundances in a zone and unity. The choice of problem and variable affects the default data range chosen.

Colormap

The colormaps used by xflash differ from the standard IDL colormaps. The first 12 colors are uniform across the colormaps and contain standard colors (red, green, blue, white, black) that are used to set the background color, text color, and so forth.

Plotting options

Two plotting options are currently available. The ``log'' option plots the common logarithm of the chosen variable, while the ``show blocks'' option overlays the boundaries of the AMR block structure on the colormap used to represent the variable. For AMR grids with many levels of refinement, the block boundaries can completely obscure the colormap; however, this option can be useful for verifying that resolution elements are being placed where they are needed.

Data range

The default data range may be overridden if desired by entering a new minimum or maximum value here. The values of the defaults are set in tools/fidlr/xflash_defaults.pro.

Zoom

A value of -1 here indicates that the default position for the corresponding limit is to be used. Leave each limit at -1 to plot the entire grid, or enter different limits to plot a smaller part of the grid.

Velocity vectors

If this box is checked, velocity arrows will be overlaid on the colormap. The arrow lengths are scaled according to a ``typical'' velocity magnitude, and magnitudes outside of a specified range are not plotted. The values of the typical velocity and the clipping range are set in tools/fidlr/xflash_defaults.pro. If the velocity vector box is checked, the xskip and yskip options become selectable. These options set the number of zones to skip in each direction when plotting arrows. For highly refined meshes this can help one to avoid obscuring the colormap with arrows.

Control buttons

Once all of the options are set, press the ``Plot'' button to produce the plot. The status of the plot appears in the status window below the buttons. If only a single plot is made (ie. only one suffix was specified), additional plots can be made without reading in the dataset again. This permits one to easily experiment with color tables, data ranges, and so forth. Once the plot is complete, press the ``Query'' button and click anywhere in the plot to probe the dataset. Mouse clicks will bring up a window showing the position of the click and the values of the hydrodynamical variables at that position.

Press the ``Quit'' button to terminate the widget.

4.3.3  Other IDL routines

Following is a description of each of the IDL routines included with the FLASH IDL tools in the tools/fidlr subdirectory.

Main programs

xflash.pro Widget routine and event handler.
xflash_defaults.pro Contains problem-specific defaults.
xplot_amr.pro Main plotting routine, called by xflash.

Reading routines

read3_amr.pro Reads block-structured AMR data from an HDF 4 file. The data are stored in common blocks.
read_amr_f77.pro Reads block-structured AMR data from a Fortran unformatted file. The data are stored in common blocks.
def_common.pro Defines the common blocks which hold the AMR data. Run this from the command line in order to access the AMR data structure directly (e.g., in order to pass a variable name to one of the uniform resampling routines).

Data manipulation routines

merge_amr.pro Resamples a given AMR data structure on a uniform grid of a given coarseness. Currently the only way to manipulate 3D data.
scale2_amr.pro Resamples a given AMR data structure on a uniform grid, producing a byte array for plotting purposes.

Plotting routines

partvelvec.pro Plots velocity vectors.
draw_blocks.pro Plots the outline of the AMR block structure.
colorbar2.pro Produces a horizontal color key.
vcolorbar.pro Produces a vertical color key.

Utility routines

cmcongrid.pro A special version of congrid used by tvimage.
color_gif.pro Captures a window and saves it to a GIF file.
nolabel.pro A hack to allow unlabelled axes.
start.pro When run at the start of an IDL session, sets up an eight-bit X visual on a 24-bit display (when supported by the X server).
tvimage.pro Similar to the IDL tv command, but allows a byte array to be drawn on the screen or directly into a Postscript file.
undefine.pro Frees up memory used by AMR data structures.

5  The supplied problems

5.1  Running the FLASH test suite

To verify that FLASH works as expected, and to debug changes in the code, we have created a suite of standard test problems. Most of these problems have analytical solutions which can be used to test the accuracy of the code. The remaining problems do not have analytical solutions, but they produce well-defined flow features which have been verified by experiments and are stringent tests of the code. The test suite configuration code is included with the FLASH source tree (in the setups/ directory), so it is easy to configure and run FLASH with any of these problems `out of the box.' Sample runtime parameter files are also included. Files containing reference results for the test suite tend to be large and so are distributed separately (via the FLASH_TEST CVS tree or in archive form as FLASH_TEST.tar). The FLASH_TEST tree also contains the runtime parameter files needed to reproduce the reference results.

Included with the test suite results is a script named flash_test which automates the process of comparing test suite results with the reference results. flash_test takes one argument, the machine type (run flash_test with no arguments for a list of available types). It automatically configures and builds FLASH for each of the test problems, builds the focu output comparison utility (Section 4.2), runs FLASH with each test problem, and uses focu to compare the results. It prepares a report for the entire suite of problems, flagging compilation problems, catastrophic runtime errors, and erroneous results. Since flash_test can be quite time-consuming to execute for the full test suite, it is generally a good idea to execute it as a batch job on the target system. Note that the tests may be run individually using the runtime parameter files provided with the FLASH test suite. Plots similar to those in this section can be produced using IDL routines provided with the test suite; these make use of the FLASH IDL tools (Section 4.3).

5.2  The Sod shock-tube problem

The Sod problem (Sod 1978) is an essentially one-dimensional flow discontinuity problem which provides a good test of a compressible code's ability to capture shocks and contact discontinuities with a small number of zones and to produce the correct density profile in a rarefaction. It also tests a code's ability to correctly satisfy the Rankine-Hugoniot shock jump conditions. When implemented at an angle to a multidimensional grid, it can also be used to detect irregularities in planar discontinuities produced by grid geometry or operator splitting effects.

We construct the initial conditions for the Sod problem by establishing a planar interface at some angle to the x and y axes. The fluid is initially at rest on either side of the interface, and the density and pressure jumps are chosen so that all three types of flow discontinuity (shock, contact, and rarefaction) develop. To the ``left'' and ``right'' of the interface we have

rL
=
1
rR
=
0.125
pL
=
1
pR
=
0.1
(37)
The ratio of specific heats g is chosen to be 1.4 on both sides of the interface.

In FLASH, the Sod problem (sod) uses the runtime parameters listed in Table 1 in addition to the regular ones supplied with the code. For this problem we use the eos/gamma equation of state module and set the value of the parameter gamma supplied by this module to 1.4. The default values listed in Table 1 are appropriate to a shock with normal parallel to the x-axis which initially intersects that axis at x = 0.5 (halfway across a box with unit dimensions).

Table 1: Runtime parameters used with the sod test problem.

Variable Type Default Description
rho_left real 1 Initial density to the left of the interface (rL)
rho_rightreal 0.125 Initial density to the right (rR)
p_left real 1 Initial pressure to the left (pL)
p_right real 0.1 Initial pressure to the right (pR)
u_left real 0 Initial velocity (perpendicular to interface) to the left (uL)
u_right real 0 Initial velocity (perpendicular to interface) to the right (uR)
xangle real 0 Angle made by interface normal with the x-axis (degrees)
yangle real 90 Angle made by interface normal with the y-axis (degrees)
posn real 0.5 Point of intersection between the interface plane and the x-axis

sod_single.gif
Figure 6: Comparison of numerical and analytical solutions to the Sod problem. A 2D grid with six levels of refinement is used. The shock normal is parallel to the x-axis.

sod_compare.gif
Figure 7: Comparison of numerical solutions to the Sod problem for two different angles (q) of the shock normal relative to the x-axis. A 2D grid with six levels of refinement is used.

Figure 6 shows the result of running the Sod problem with FLASH on a two-dimensional grid, with the analytical solution shown for comparison. The hydrodynamical algorithm used here is the directionally split piecewise-parabolic method (PPM) included with FLASH. In this run the shock normal is chosen to be parallel to the x-axis. With six levels of refinement, the effective grid size at the finest level is 2562, so the finest zones have width 0.00390625. At t = 0.2 three different nonlinear waves are present: a rarefaction between x » 0.25 and x » 0.5, a contact discontinuity at x » 0.68, and a shock at x » 0.85. The two sharp discontinuities are each resolved with approximately three zones at the highest level of refinement, demonstrating the ability of PPM to handle sharp flow features well. Near the contact discontinuity and in the rarefaction we find small errors of about 1-2% in the density and specific internal energy, with similar errors in the velocity inside the rarefaction. Elsewhere the numerical solution is exact; no oscillation is present.

Figure 7 shows the result of running the Sod problem on the same two-dimensional grid with different shock normals: parallel to the x-axis (q = 0°) and along the box diagonal (q = 45°). For the diagonal solution we have interpolated values of density, specific internal energy, and velocity to a set of 256 points spaced exactly as in the x-axis solution. This comparison shows the effects of the second-order directional splitting used with FLASH on the resolution of shocks. At the right side of the rarefaction and at the contact discontinuity the diagonal solution undergoes slightly larger oscillations (on the order of a few percent) than the x-axis solution. Also, the value of each variable inside the discontinuity regions differs between the two solutions by up to 10% in most cases. However, the location and thickness of the discontinuities is the same between the two solutions. In general shocks at an angle to the grid are resolved with approximately the same number of zones as shocks parallel to a coordinate axis.

sod_2d_density.gif
Figure 8: Density in the diagonal 2D Sod problem with six levels of refinement at t = 0.2. The outlines of AMR blocks are shown (each block contains 8×8 zones).

Figure 8 presents a grayscale map of the density at t = 0.2 in the diagonal solution together with the block structure of the AMR grid in this case. Note that regions surrounding the discontinuities are maximally refined, while behind the shock and discontinuity the grid has de-refined as the second derivative of the density has decreased in magnitude. Because zero-gradient outflow boundaries were used for this test, some reflections are present at the upper left and lower right corners, but at t = 0.2 these have not yet propagated to the center of the grid.

5.3  The Woodward-Colella interacting blast-wave problem

This problem was originally used by Woodward and Colella (1984) to compare the performance of several different hydrodynamical methods on problems involving strong, thin shock structures. It has no analytical solution, but since it is one-dimensional, it is easy to produce a converged solution by running the code with a very large number of zones, permitting an estimate of the self-convergence rate when narrow, interacting discontinuities are present. For FLASH it also provides a good test of the adaptive mesh refinement scheme.

The initial conditions consist of two parallel, planar flow discontinuities. Reflecting boundary conditions are used. The density in the left, middle, and right portions of the grid (rL, rM, and rR, respectively) is unity; everywhere the velocity is zero. The pressure is large to the left and right and small in the center:

pL
=
1000
pM
=
0.01
pR
=
100 .
(38)
The equation of state is that of a perfect gas with g = 1.4.

Figure 9 shows the density and velocity profiles at several different times in the converged solution, demonstrating the complexity inherent in this problem. The initial pressure discontinuities drive shocks into the middle part of the grid; behind them, rarefactions form and propagate toward the outer boundaries, where they are reflected back onto the grid and interact with themselves. By the time the shocks collide at t = 0.028, the reflected rarefactions have caught up to them, weakening them and making their post-shock structure more complex. Because the right-hand shock is initially weaker, the rarefaction on that side reflects from the wall later, so the resulting shock structures going into the collision from the left and right are quite different. Behind each shock is a contact discontinuity left over from the initial conditions (at x » 0.50 and 0.73). The shock collision produces an extremely high and narrow density peak; in the Woodward and Colella calculation the peak density is slightly less than 30. Even with ten levels of refinement, FLASH obtains a value of only 18 for this peak. Reflected shocks travel back into the colliding material, leaving a complex series of contact discontinuities and rarefactions between them. A new contact discontinuity has formed at the point of the collision (x » 0.69). By t = 0.032 the right-hand reflected shock has met the original right-hand contact discontinuity, producing a strong rarefaction which meets the central contact discontinuity at t = 0.034. Between t = 0.034 and t = 0.038 the slope of the density behind the left-hand shock changes as the shock moves into a region of constant entropy near the left-hand contact discontinuity.

2blast_soln.gif
Figure 9: Density and velocity profiles in the Woodward-Colella interacting blast-wave problem, as computed by FLASH using ten levels of refinement.

Figure 10 shows the self-convergence of density and pressure when FLASH is run on this problem. For several runs with different maximum refinement levels, we compare the density, pressure, and total specific energy at t = 0.038 to the solution obtained using FLASH with ten levels of refinement. This figure plots the L1 error norm for each variable u, defined using

E(Nref;u) º 1
N(Nref)
N(Nref)
å
i = 1 
ê
ê
ê
ui(Nref) -ui(10)
ui(10)
ê
ê
ê
 ,
(39)
against the effective number of zones (N(Nref)) at the highest level of refinement Nref. In computing this norm, both the `converged' solution u(10) and the test solution u(Nref) are interpolated onto a uniform mesh having N(Nref) zones. Values of Nref between 2 (corresponding to cell size Dx = 1/16) and 9 (Dx = 1/2048) are shown.

2blast_conv.gif
Figure 10: Self-convergence of the density, pressure, and total specific energy in the 2blast test problem.

Although PPM is formally a second-order method, one sees from this plot that, for the interacting blast-wave problem, the convergence rate is only linear. Indeed, in their comparison of the performance of seven nominally second-order hydrodynamic methods on this problem, Woodward and Colella found that only PPM achieved even linear convergence; the other methods were worse. The error norm is very sensitive to the correct position and shape of the strong, narrow shocks generated in this problem.

The additional runtime parameters supplied with the 2blast problem are listed in Table 2. This problem is configured to use the perfect-gas equation of state (eos/gamma), and it is run in a two-dimensional unit box with gamma set to 1.4. Boundary conditions in the y direction (transverse to the shock normals) are taken to be periodic.

Table 2: Runtime parameters used with the 2blast test problem.

Variable Type Default Description
rho_left real 1 Initial density to the left of the left interface (rL)
rho_mid real 1 Initial density between the interfaces (rM)
rho_rightreal 1 Initial density to the right of the right interface (rR)
p_left real 1000 Initial pressure to the left (pL)
p_mid real 0.01 Initial pressure in the middle (pM)
p_right real 100 Initial pressure to the right (pR)
u_left real 0 Initial velocity (perpendicular to interface) to the left (uL)
u_mid real 0 Initial velocity (perpendicular to interface) in the middle (uL)
u_right real 0 Initial velocity (perpendicular to interface) to the right (uR)
xangle real 0 Angle made by interface normal with the x-axis (degrees)
yangle real 90 Angle made by interface normal with the y-axis (degrees)
posnL real 0.1