Computer Methods in Applied Mechanics and Engineering
North-Holland
112 (1994) 3-24
Finite element solution strategies for large-scale
flow simulations*
M. Behr and T.E. Tezduyar
Department of Aerospace Engineering and Mechanics, Army High-Performance Computing Research Center and
Supercomputer Institute, University of Minnesota, 1200 Washington Avenue South,
Minneapolis, MN 55415, USA
Received 4 December 1992
Revised manuscript received 10 December 1992
Large-scale flow simulation strategies involving implicit finite element formulations are described in the context
of incompressible flows. The stabilized space-time formulation for problems involving moving boundaries and
interfaces is presented, followed by a discussion of mesh moving schemes. The methods of solution of large linear
systems of equations are reviewed, and an implementation of the entire finite element code, permitting the use of
totally unstructured meshes, on a massively parallel supercomputer is considered. As an example, this methodology
is applied to a flow problem involving three-dimensional simulation of liquid sloshing in a tank subjected to vertical
vibrations.
1. Introduction
Fueled by advances in supercomputer technology, the popularity of numerical simulation as a way of
investigating physical phenomena has been growing steadily. Fluid dynamics research in particular is
benefitting greatly from the new methodology. Yet, numerical approximation of physical systems, and
especially the necessary replacement of continuum with a finite number of variables, is rarely
straightforward. The possibility of approximation errors obliterating the actual physical phenomena is a
constant danger. A combination of mathematical analysis and careful comparison with the experimental
and analytic results is necessary to validate any new numerical method. Even after such verification,
many obstacles remain on the road to realistic large-scale simulations. Chief among them is the need to
solve large systems of linear equations, which is inherent in the implicit methods considered here. A
more recent challenge is the need to consider parallelization potential of every algorithm employed.
These issues are among those addressed in this article. Even though we consider a formulation for
incompressible flows, many of the procedures and remarks are equally applicable to simulation of
compressible flows.
We discuss the application of finite element tools to flow problems which involve moving boundaries
and interfaces. The domain over which such problems have to be solved evolves in time, requiring
modifications of the usual simulation techniques. We address this problem with a stabilized finite
element formulation, which incorporates the motion of the domain automatically by virtue of deforming
space-time elements.
Any discussion of numerical techniques cannot be easily separated from implementational issues.
Practical application of finite element methods on scalar, and even vector, architectures has been
extensively discussed in many textbooks and tutorials. The advent of massively parallel computers,
while promising continuing advances in computational speed, rendered most of these classical finite
element programming techniques obsolete. Also, the range of best to worst code performance
dramatically increased with the shift away from traditional supercomputers to machines with hundreds
or thousands of processing units. Both the data structure and the algorithm design have to be changed
extensively to take advantage of the speed-ups offered by parallelism. An implementational framework
suitable for unstructured finite element grids, yet resulting in good and scalable parallel performance, is
also discussed.
Correspondence to: Professor Tayfun E. Tezduyar, Supercomputer Institute, 1200 Washington Avenue South, Minneapolis,
MN 55415, USA.
* This research was sponsored by NASA-JSC under grant NAG 9-449, by NSF under grants MSM-8796352 and ASC-9211083,
by ALCOA Foundation and by ARPA under NIST contract 60NANB2D1272. Partial support for this work has also come from
the Army Research Office contract number DAAL03-89-C-0038 with the Army High Performance Computing Research Center
at the University of Minnesota. The authors are thankful for the technical support they received from the Thinking Machines
Corporation, in particular from Dr. John Kennedy.
© 1994 Elsevier Science B.V. All rights reserved
We begin by introducing the equations of the fluid motion for the incompressible case in Section 2.
In Section 3, we open with a general discussion of numerical methods used to solve flow problems
involving moving boundaries and interfaces. Subsequently, we present the space-time velocity-pressure
formulation which takes such deformations into account in an elegant and straightforward manner. We
describe the stabilization technique which will endow the method with necessary robustness. In Section
4 we discuss the applicability of the method to problems involving deforming domains, and related
issues such as mesh movement options.
The formulations such as the one described in Section 3 give rise to large coupled linear systems of
equations, which can be solved by using either a direct or an iterative solution technique. The merits
and disadvantages of both approaches are discussed in Section 5. The iterative scheme under
consideration is based on the preconditioned Generalized Minimum Residual (GMRES) method.
The large-scale flow problems which motivate us to develop the finite element formulations in the
first place, prompt us also to pay particular attention to the usability of the related algorithms on the
computer architectures that offer highest computational speeds. Today, that is synonymous with
implementing the concepts included in the following sections on massively parallel distributed-memory
computers. Section 6 relates our experience in that area, which is so far restricted to the Single-Instruction-Multiple-Data
(SIMD), or data parallel, style of programming, which is accessible on the
Connection Machine CM-200 and CM-5 computers.
A numerical example is presented in Section 7. Here, the space-time velocity-pressure formulation is
used in a three-dimensional simulation of sloshing in a tank subjected to vertical vibrations.
2. Problem statement
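We consider a viscous, incompressible fluid occupying at an instant $t \in (0, T)$ a bounded region
$\Omega_t \subset \mathbb{R}^{n_{sd}}$, with boundary $\Gamma_t$, where $n_{sd}$ is the number of space dimensions. The velocity and pressure,
$u(x, t)$ and $p(x, t)$, are governed by the momentum and mass balance equations:
$$\rho \left( \frac{\partial u}{\partial t} + u \cdot \nabla u - f \right) - \nabla \cdot \sigma = 0 \quad \text{on } \Omega_t \quad \forall t \in (0, T), \tag{1}$$
$$\nabla \cdot u = 0 \quad \text{on } \Omega_t \quad \forall t \in (0, T), \tag{2}$$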
where $\rho$ is the fluid density, assumed to be constant, and $f(x, t)$ is an external, e.g., gravitational, force
field. The closure is obtained with a constitutive equation relating the stress tensor $\sigma$ to velocity and
pressure fields. In the constitutive equation for a Newtonian fluid, the deviatoric stress is assumed to be
proportional to the fluid rate of strain tensor which is defined as
$$\varepsilon(u) = \tfrac{1}{2} \left( \nabla u + (\nabla u)^T \right). \tag{3}$$
Under this assumption, the stress tensor for a fluid with dynamic viscosity $\mu$ is defined as follows:
$$\sigma = -p I + T, \qquad T = 2 \mu \varepsilon(u). \tag{4}$$
Both the Dirichlet and Neumann-type boundary conditions are taken into account, represented as
$$u \cdot e_d = g_d \quad \text{on } (\Gamma_t)_{g,d}, \quad d = 1, \ldots, n_{sd}, \tag{5}$$
$$n \cdot \sigma \cdot e_d = h_d \quad \text{on } (\Gamma_t)_{h,d}, \quad d = 1, \ldots, n_{sd}, \tag{6}$$
(6)
where $(\Gamma_t)_{g,d}$ and $(\Gamma_t)_{h,d}$ are complementary subsets of the boundary $\Gamma_t$, and $\{e_d\}_{d=1}^{n_{sd}}$ is a basis in $\mathbb{R}^{n_{sd}}$.
Note that the decomposition of $\Gamma_t$ may be different for each of the basis vectors $e_d$. In practice, the
bases often coincide with the local directions normal and tangential to the boundary. All of the
commonly used inflow, no-slip, symmetry and outflow traction-free boundary conditions can be
represented in this way.
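The Newtonian constitutive relation and the traction appearing in the Neumann condition can be made concrete with a minimal sketch. The function names and the index convention `grad_u[i][j]` = d u_i / d x_j are assumptions made for illustration only; they are not taken from the authors' code.

```python
def strain_rate(grad_u):
    """Rate-of-strain tensor: symmetric part of the velocity gradient."""
    n = len(grad_u)
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(n)]
            for i in range(n)]


def newtonian_stress(grad_u, p, mu):
    """Stress sigma = -p I + 2 mu eps(u) for a Newtonian fluid."""
    eps = strain_rate(grad_u)
    n = len(eps)
    return [[(-p if i == j else 0.0) + 2.0 * mu * eps[i][j] for j in range(n)]
            for i in range(n)]


def traction(sigma, normal):
    """Traction vector n . sigma on a surface with unit normal n."""
    n = len(normal)
    return [sum(normal[i] * sigma[i][j] for i in range(n)) for j in range(n)]


# Simple shear flow u = (2 y, 0), pressure 1, viscosity 0.5 (illustrative values).
grad_u = [[0.0, 2.0], [0.0, 0.0]]
sigma = newtonian_stress(grad_u, p=1.0, mu=0.5)
h = traction(sigma, [0.0, 1.0])  # traction on a surface with normal along y
```

Projecting `h` on a basis vector gives the scalar that a Neumann condition prescribes for the corresponding velocity component.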
The initial condition consists of a divergence-free velocity field specified over the entire domain:
$$u(x, 0) = u_0, \qquad \nabla \cdot u_0 = 0 \quad \text{on } \Omega_0. \tag{7}$$
Of interest here is the case where the domain $\Omega_t$, and consequently its boundary $\Gamma_t$, evolve in time.
3. Space-time velocity-pressure formulation
When faced with fluid dynamics problems involving free surfaces, moving interfaces and deforming
domains in general, a decision tree regarding the choice of a numerical method shown in Fig. 1 may be
the one to follow.
The Marker-And-Cell (MAC) method developed by Harlow and Welch, and described in an
exhaustive report [1], has held its own against the more established finite element and finite difference
approaches in this particular area. The numerous examples presented in [1] attest to the method's
extraordinary flexibility. The concept of a mesh covering the maximum possible domain, but with
computations being performed only for the cells containing marker particles, leads to efficient
implementations. Since the tracer particles are indistinguishable from one another, the joining and
separation of fluid parts - a source of mesh generation nightmares in the finite difference/element
approaches - can be effortlessly simulated. Yet the MAC method is not free of shortcomings. Various
stress effects at free surfaces, such as surface tension and contact angles, have been notoriously difficult
to incorporate, requiring complex surface fitting techniques such as those introduced in [2]. The method
can suffer from stability problems which may be difficult to detect. That is, an apparently reasonable
particle distribution may be an artifact of gross approximation errors. Some modifications were proposed
to stabilize the MAC method in [3]. Another variation of the MAC concept replaces the discrete
marker particles with a continuous fractional volume of fluid (VOF) function, as described in [4].
The extensive pool of knowledge accumulated about the finite difference, element and volume
methods in the context of fixed domain problems makes these more standard approaches also
attractive. Here, we concentrate on the finite element approach. It should be noted that some of the
finite element methods reviewed in this section possess a finite difference counterpart. For a finite
element method to be applicable to problems involving deforming domains, it needs to have one
important feature which may be omitted in the case of problems involving fixed domains only. The
necessity of accurately following the deforming boundary of the domain requires some degree of
Lagrangian description to be present in the formulation, at least in the vicinity of the boundary.
Since the use of a Lagrangian viewpoint is unavoidable, some researchers consider a fully Lagrangian
formulation for the entire domain [5, 6]. The simplicity of such a solution is offset by several difficulties.
The flow field in the interior of the domain may exhibit a large amount of circulation, such as local
vortices, which is often unrelated to the motion of the boundary. The fully Lagrangian approach may
result in unnecessary mesh distortion and tangling in such interior regions, even for moderate or null
displacements of the boundary. On the other hand, internal circulation and shear layers are handled
without difficulty by an Eulerian description.
Fig. 1. Numerical methods applicable to problems involving deforming domains: authors' decision tree (see text for explanation
of terms).
These facts provided strong motivation for the development of methods capable of combining the
Lagrangian and Eulerian approaches in the same domain. The Arbitrary Lagrangian-Eulerian (ALE)
formulation was initially stated in a finite difference context by Hirt [7], and eventually adopted also in
the finite element community (see [8, 9] and references contained therein for a detailed description). In
the modern ALE approach, velocities of the nodes of the computational mesh explicitly enter the
momentum equation, which is written over a reference domain. This domain may or may not coincide
with either material or spatial domains. Nodal velocities may be either prescribed a priori, or be a
function of the fluid velocity (most notably at the free surface). The ALE method has been used to
solve a number of interesting problems, including those quoted in [9] and [10].
Another avenue was pioneered by Jamet and Bonnerot [11, 12] as a means of solution of Euler
equations and Stefan-type problems. The method uses finite element discretization in both space and
time to automatically account for the deformation of the computational domain. This approach was
later applied to shallow water equations by Lynch and Gray [13] and flows of incompressible fluids by
Frederiksen and Watts [14]. Improving on the continuous-in-time interpolations used in the applications
mentioned above, Jamet [15] proposed the use of discontinuous-in-time functions for a model parabolic
problem. Here for the first time the numerical method was presented together with analytic proofs of
stability and estimates of accuracy.
In a parallel development, the discontinuous-in-time space-time methods have been gaining
popularity for fixed domain problems, as an alternative to the more common semidiscrete (finite
elements in space, finite differences in time) approach. The advantages, such as potential for selective
temporal refinement and possibility of obtaining rigorous convergence predictions, were discussed by
Hughes and Hulbert in [16] in the context of elastodynamics. Considerable attention was given to the
space-time method for fixed domains by Hansbo et al. [17]. Their characteristic streamline diffusion
method takes advantage of the possibility of varying node distribution inside the domain to reduce
advective effects by moving interior nodes approximately along the characteristic directions,
followed by remeshing.
The potential of the discontinuous-in-time space-time formulation as a means of simulating flows of
fluids in a deforming domain has been exploited by Tezduyar et al. in [18, 19]. Later applications of the
technique are described in [20, 21]. It has also been used to conduct large-scale studies of incompressible
flows past oscillating airfoils and cylinders [22, 23]. More recently, the same method, with a
particular mesh moving scheme conforming to the characteristic streamline diffusion ideology, has been
applied to deforming domain problems by Hansbo [24]. Finally let us note that a similar approach can
be equally successful for compressible flow simulation, as shown by Aliabadi and Tezduyar in [25]. The
remainder of this section recapitulates the method introduced in [18, 19], and discusses mesh motion
and related aspects. The discontinuous-in-time space-time technique considered here may be expected
to retain the detailed accuracy analyses carried out for its fixed domain counterpart, while providing an
extremely elegant way of accounting for the displacements of the reference domain, i.e., the finite
element mesh.
Note that in addition to the direct modeling of the governing equations, the free-surface flows are
amenable to treatment with specialized methods, which may use specific assumptions about the flow to
produce an approximate set of governing equations. As is the case with the Shallow Water Equations
(SWE) [26], this modified problem is often significantly simpler to model than the original one. The
simplified problem is in turn solved with a suitable numerical method. These types of approximations
are not considered here.
3.1. Variational formulation
In order to construct the finite element function spaces for the space-time method, we partition the
time interval $(0, T)$ into subintervals $I_n = (t_n, t_{n+1})$, where $t_n$ and $t_{n+1}$ belong to an ordered series of
time levels $0 = t_0 < t_1 < \cdots < t_N = T$. Let $\Omega_n = \Omega_{t_n}$ and $\Gamma_n = \Gamma_{t_n}$. We define the space-time slab $Q_n$ as the
domain enclosed by the surfaces $\Omega_n$, $\Omega_{n+1}$ and $P_n$, where $P_n$ is the surface described by the boundary $\Gamma_t$
as $t$ traverses $I_n$. These concepts are sketched in Fig. 2. As is the case with $\Gamma_t$, surface $P_n$ can be
decomposed into $(P_n)_g$ and $(P_n)_h$ with respect to the type of boundary condition (Dirichlet or
Neumann) being applied, again with a possibly different decomposition for each of the velocity degrees
of freedom. For each space-time slab, we define the following finite element interpolation function
spaces for the velocity and pressure:
$$(\mathcal{S}_u^h)_n = \{ u^h \mid u^h \in [H^{1h}(Q_n)]^{n_{sd}}, \; u^h \doteq g^h \text{ on } (P_n)_g \}, \tag{8}$$
$$(\mathcal{V}_u^h)_n = \{ u^h \mid u^h \in [H^{1h}(Q_n)]^{n_{sd}}, \; u^h \doteq 0 \text{ on } (P_n)_g \}, \tag{9}$$
$$(\mathcal{S}_p^h)_n = (\mathcal{V}_p^h)_n = \{ p^h \mid p^h \in H^{1h}(Q_n) \}. \tag{10}$$
Over the element domain, the interpolation is constructed by using first-order polynomials in space and,
depending on our choice, zeroth- or first-order polynomials in time. Globally, the interpolation
functions are continuous in space but discontinuous in time. However, for two-liquid flows, the solution
and variational function spaces for pressure should include the functions that are discontinuous across
the interface. It should be remarked that when required, the method can employ higher-order
interpolation functions in both space and time, to provide necessary accuracy.
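The stabilized space-time formulation for deforming domains can be written as follows: given $(u^h)_n^-$,
find $u^h \in (\mathcal{S}_u^h)_n$ and $p^h \in (\mathcal{S}_p^h)_n$ such that $\forall w^h \in (\mathcal{V}_u^h)_n$, $\forall q^h \in (\mathcal{V}_p^h)_n$:
$$
\begin{aligned}
& \int_{Q_n} w^h \cdot \rho \left( \frac{\partial u^h}{\partial t} + u^h \cdot \nabla u^h - f \right) dQ
+ \int_{Q_n} \varepsilon(w^h) : \sigma(p^h, u^h) \, dQ
+ \int_{Q_n} q^h \, \nabla \cdot u^h \, dQ \\
& + \int_{\Omega_n} (w^h)_n^+ \cdot \rho \left( (u^h)_n^+ - (u^h)_n^- \right) d\Omega \\
& + \sum_e \int_{Q_n^e} \tau_{MOM} \frac{1}{\rho}
\left[ \rho \left( \frac{\partial w^h}{\partial t} + u^h \cdot \nabla w^h \right) - \nabla \cdot \sigma(q^h, w^h) \right]
\cdot \left[ \rho \left( \frac{\partial u^h}{\partial t} + u^h \cdot \nabla u^h - f \right) - \nabla \cdot \sigma(p^h, u^h) \right] dQ \\
& + \sum_e \int_{Q_n^e} \tau_{CONT} \, \nabla \cdot w^h \, \rho \, \nabla \cdot u^h \, dQ
= \int_{(P_n)_h} w^h \cdot h^h \, dP,
\end{aligned}
\tag{11}
$$
where $Q_n^e$ denotes a space-time element of the slab $Q_n$. The notational conventions used in (11) are shown below:
$$(u^h)_n^{\pm} = \lim_{\epsilon \to 0} u(t_n \pm \epsilon). \tag{12}$$
The solution to (11) is obtained sequentially for all space-time slabs $Q_0, Q_1, \ldots, Q_{N-1}$. The
computations start with
$$(u^h)_0^- = u_0. \tag{15}$$
Fig. 2. Space-time discretization: concept of a space-time slab.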
In the variational formulation given by (11), the first three terms of the left-hand side, together with
the right-hand side, constitute the standard Galerkin formulation of the problem. The fourth integral
enforces, in a weak sense, the continuity of the velocity in time over the lower boundary $\Omega_n$ of the
space-time slab $Q_n$. The fifth term in (11) is a least-squares form of the momentum equation added to
the formulation. This term provides crucial stability characteristics and is discussed in more detail,
including the definition of $\tau_{MOM}$, later in this section. At high Reynolds numbers, the stability is further
enhanced by the sixth term present in the formulation (11), which includes a least-squares form of the
continuity equation. The coefficient $\tau_{CONT}$ is also defined later in this section.
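The role of the jump term and the slab-by-slab solution order can be illustrated on a scalar model problem. This model is an assumption made purely for illustration and is not part of the formulation above: the equation du/dt + lam*u = 0, with constant-in-time interpolation on each slab and no stabilization terms. With a unit weighting function, the Galerkin term plus the jump term reduce to lam*dt*u + (u - u_prev_minus) = 0, which is a backward Euler update. All names below are hypothetical.

```python
def solve_slab(u_prev_minus, lam, dt):
    """Solve one slab of the model problem du/dt + lam*u = 0.

    The unknown u is constant over the slab. The jump term couples it
    weakly to u_prev_minus, the value at the end of the previous slab:
        lam * dt * u + (u - u_prev_minus) = 0
    """
    return u_prev_minus / (1.0 + lam * dt)


def march(u0, lam, dt, n_slabs):
    """Process the slabs sequentially, starting from the initial value u0."""
    history = []
    u_minus = u0  # plays the role of the initial condition for the first slab
    for _ in range(n_slabs):
        u_minus = solve_slab(u_minus, lam, dt)
        history.append(u_minus)
    return history


values = march(u0=1.0, lam=2.0, dt=0.5, n_slabs=2)
```

Each slab needs only the end value of its predecessor, which is why the slabs can be solved one at a time.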
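Fig. 3. Space-time discretization: deforming elements.
The deformation of the mesh is reflected in the deformation of space-time elements, and directly
affects the computation of the transport terms. This effect is illustrated for a typical deformed
space-time element shown in Fig. 3. Let $\xi$ and $\theta$ serve as a set of standard reference coordinates for
that element. In the special, yet common, case when the $\theta = \text{const}$ levels coincide with the $t = \text{const}$
levels, the Jacobian matrix of the transformation between the reference and physical domains can be
written as
$$
\begin{bmatrix} \dfrac{\partial x}{\partial \xi} & \dfrac{\partial x}{\partial \theta} \\[2mm] 0 & \dfrac{\partial t}{\partial \theta} \end{bmatrix}
=
\begin{bmatrix} \dfrac{\partial x}{\partial \xi} & \dfrac{\Delta t}{2} v^h \\[2mm] 0 & \dfrac{\Delta t}{2} \end{bmatrix},
\tag{16}
$$
and its inverse as
$$
\begin{bmatrix} \dfrac{\partial \xi}{\partial x} & \dfrac{\partial \xi}{\partial t} \\[2mm] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial t} \end{bmatrix}
=
\begin{bmatrix} \left( \dfrac{\partial x}{\partial \xi} \right)^{-1} & - \left( \dfrac{\partial x}{\partial \xi} \right)^{-1} v^h \\[2mm] 0 & \dfrac{2}{\Delta t} \end{bmatrix},
\tag{17}
$$
where $\Delta t = t_{n+1} - t_n$ and $v^h$ is the local mesh velocity defined as
$$v^h = \left. \frac{\partial x}{\partial t} \right|_{\xi = \text{const}}. \tag{18}$$
Now the transport term can be computed as
$$
\begin{aligned}
\frac{\partial u^h}{\partial t} + u^h \cdot \nabla u^h
&= \frac{\partial u^h}{\partial \theta} \frac{\partial \theta}{\partial t} + \frac{\partial u^h}{\partial \xi} \frac{\partial \xi}{\partial t} + u^h \cdot \nabla u^h \\
&= \frac{\partial u^h}{\partial \theta} \frac{2}{\Delta t} - \frac{\partial u^h}{\partial \xi} \, v^h \cdot \nabla \xi + u^h \cdot \nabla u^h \\
&= \left. \frac{\partial u^h}{\partial t} \right|_{\xi} + (u^h - v^h) \cdot \nabla u^h.
\end{aligned}
\tag{19}
$$
This is equivalent to modifying the advective velocity by subtracting the mesh velocity. If the element
does not deform, as in a fully Eulerian description, the $\partial \xi / \partial t$ term vanishes and the transport terms
retain full advective velocity. On the other hand, in an approximately Lagrangian description, the node
movement follows the fluid velocity field, and the effective advection velocity $(u^h - v^h)$ is close to zero.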
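The effect of element deformation on the transport term can be checked numerically on a single space-time element. The sketch below assumes one space dimension, bilinear shape functions on the reference square with both reference coordinates in [-1, 1], and time levels coinciding with constant-theta levels. The node ordering and all function names are hypothetical.

```python
# Node order in reference coordinates (xi, theta): (-1,-1), (1,-1), (-1,1), (1,1).
# The first two nodes lie at time t_n, the last two at t_n + dt.
XI = (-1.0, 1.0, -1.0, 1.0)
TH = (-1.0, -1.0, 1.0, 1.0)


def shape_derivatives(xi, theta):
    """Derivatives of the bilinear shape functions w.r.t. xi and theta."""
    d_xi = [0.25 * a * (1.0 + b * theta) for a, b in zip(XI, TH)]
    d_th = [0.25 * (1.0 + a * xi) * b for a, b in zip(XI, TH)]
    return d_xi, d_th


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transport_terms(x_nodes, u_nodes, dt, xi, theta):
    """Return (du/dt at fixed x, du/dx, mesh velocity) at a reference point."""
    d_xi, d_th = shape_derivatives(xi, theta)
    dx_dxi = dot(d_xi, x_nodes)
    v_mesh = dot(d_th, x_nodes) * 2.0 / dt           # dx/dt at fixed xi
    du_dx = dot(d_xi, u_nodes) / dx_dxi
    du_dt_fixed_xi = dot(d_th, u_nodes) * 2.0 / dt   # du/dt at fixed xi
    # Subtracting the mesh velocity contribution recovers du/dt at fixed x.
    du_dt_fixed_x = du_dt_fixed_xi - v_mesh * du_dx
    return du_dt_fixed_x, du_dx, v_mesh


# Deforming element: the left node moves by 0.2 and the right node by 0.5 over dt.
dt = 0.1
x_nodes = [0.0, 1.0, 0.2, 1.5]
t_nodes = [0.0, 0.0, dt, dt]
u_nodes = [3.0 * x + 5.0 * t for x, t in zip(x_nodes, t_nodes)]  # u = 3x + 5t
du_dt, du_dx, v_mesh = transport_terms(x_nodes, u_nodes, dt, xi=0.3, theta=-0.4)
```

For the field u = 3x + 5t the routine recovers the time derivative 5 and the gradient 3 even though the element deforms, because the mesh velocity contribution is removed from the time derivative taken along moving nodes.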
The mixed Galerkin formulation, i.e., formulation (11) with $\tau_{MOM} = 0$ and $\tau_{CONT} = 0$, suffers from
two well known stability problems. The first of the two is observed when a solution of an
advection-dominated problem is attempted. The spurious oscillations arise in the vicinity of the boundary layers,
and pollute the entire domain. As a remedy, the streamline upwind/Petrov-Galerkin (SUPG)
technique introduced by Brooks and Hughes [27] has been widely accepted in the fluid dynamics
community, and is a part of the stabilization terms of (11). The second stability problem is unique to
mixed methods, and manifests itself for only certain combinations of the interpolations for the velocity
and pressure fields. Simply stated, the danger stems from the fact that the pressure field enters the
mixed Galerkin variational formulation, i.e., (11) with $\tau_{MOM} = 0$, only through the term
$\int_{Q_n} \nabla \cdot w^h \, p^h \, dQ$. For not sufficiently rich $(\mathcal{V}_u^h)_n$, spurious pressure fields, or modes, may 'slip through' a net of
velocity weighting functions and remain undetected by the variational formulation. A consistent
approach for the Stokes problem was introduced by Hughes et al. in [28]. Here the weighted residual
character of the Galerkin formulation was preserved, with added stabilizing terms which vanish when
the solution approaches the exact one. A generalization of that work for the finite Reynolds number
flows led to the pressure stabilized/Petrov-Galerkin (PSPG) formulation developed by Tezduyar et al.
[29]. As seen in [29], the PSPG formulation has been successfully applied to different equal-order
interpolations in the context of Navier-Stokes equations. The Galerkin/Least-Squares technique used
here combines the SUPG and PSPG stabilizations in one conceptually simple term. As seen in (11), the
stabilizing term has the form of the sum of element-level inner products of the momentum equation and
its variation.
The $\tau_{CONT}$-term has been proposed by Franca and Hughes [30], and provides stability to the velocity
field at high Reynolds numbers. This term can dramatically enhance the convergence of the nonlinear
iterative algorithm, without compromising the consistency of the original method. The standard stability
analysis for the advection-diffusion equation with SUPG or Galerkin/Least-Squares stabilization relies
on the divergence-free property of the advection velocity field. In general, the velocity field obtained by
solving (11) will become free of divergence only in the $h \to 0$ limit, where $h$ is the mesh parameter. This
potential for instability is eliminated with the addition of the consistent $\tau_{CONT}$-term to the formulation.
A parameter $\tau_{MOM}$ designed specifically for use with bilinear interpolations has been given in, e.g.,
[EC], and is repeated here:
(20)
or
(21)
in the steady-state case. Here $\|u^h\|_2 = (\sum_i |u_i^h|^2)^{1/2}$ is the pointwise velocity norm, $h$ is the element
length and $\nu = \mu / \rho$ is the kinematic viscosity. The second design (21) coincides in the advective limit
with a definition used earlier for non-space-time formulations (see e.g. [29]):
Re,, < 3
Re,>3’
(22)
where $Re_u$ is an element level Reynolds number.
The definition of $\tau_{CONT}$ is the same as that introduced in [31]:
with $Re_u$ and the accompanying function of $Re_u$ defined as in (22). The positive parameter appearing in
this definition is usually taken as unity. The
definition (23) is a smoothed version of the parameter initially used in [32].
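How a momentum stabilization parameter of this kind can be evaluated is sketched below. The sketch assumes the inverse root-sum-of-squares form that is common in the stabilized finite element literature, combining a temporal, an advective and a diffusive scale. The coefficients 2, 2 and 4 and the function name are assumptions and should be checked against (20) and (21) before use.

```python
import math


def tau_mom(u_norm, h, nu, dt=None):
    """Assumed stabilization parameter built from three reciprocal time scales.

    u_norm : pointwise velocity norm
    h      : element length
    nu     : kinematic viscosity
    dt     : time step; None selects the steady-state variant
    """
    total = (2.0 * u_norm / h) ** 2 + (4.0 * nu / h ** 2) ** 2
    if dt is not None:
        total += (2.0 / dt) ** 2
    if total == 0.0:
        raise ValueError("all scales vanish; the parameter is undefined")
    return 1.0 / math.sqrt(total)


tau_advective = tau_mom(u_norm=2.0, h=0.1, nu=0.0)           # tends to h / (2 |u|)
tau_diffusive = tau_mom(u_norm=0.0, h=0.1, nu=0.01)          # tends to h^2 / (4 nu)
tau_transient = tau_mom(u_norm=0.0, h=0.1, nu=0.0, dt=0.02)  # tends to dt / 2
```

Under these assumed coefficients the steady-state variant reduces to h/(2|u|) when advection dominates, which is the advective limit mentioned in the text.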
Special remarks apply to the formulation (11) when it is used with first-order interpolation functions
for the velocity field. Although the formulation appears to be consistent, the least-squares stabilization
terms involve an unmodified form of the momentum equation, and hence second derivatives of the
velocity field. These derivatives vanish identically for linear (e.g., triangle) velocity interpolations, and
are incomplete for bi- or trilinear (e.g., quadrilateral) velocity elements. Thus, while the full viscous
momentum equation is represented in the Galerkin part, only an inviscid, or nearly so, version of it is
present in the least-squares contribution. The imbalance created by the absence of the viscous terms
will normally be absorbed by the pressure and convective terms. This effect is hardly noticeable for the
$\tau_{MOM}$ definitions in use here. But in the space-time formulation, the presence of the time evolution
terms in the least-squares contribution may lead to a non-negligible and surprising phenomenon. When
a steady flow of a fluid is computed with a time-dependent procedure, the convergence is achieved
when $(u^h)_{n+1}^- = (u^h)_n^-$, yet $(u^h)_n^-$ need not be equal to $(u^h)_n^+$. When this criterion is used, there exists the
possibility of a non-zero $\partial u^h / \partial t$ and a corresponding non-zero jump term. Indeed, the already
mentioned inconsistency seen in the least-squares form of the momentum equation will feed such a
saw-tooth pattern in the time behavior of the solution. By varying the time step size, the weighting
expression $\partial w^h / \partial t$ is changed, leading to relative shifts in weight between the Galerkin and
Least-Squares forms of the momentum equation. This accounts for the visible dependence of the steady-state
solution on the time step, when the bilinear or trilinear velocity interpolation is used.
Another low order element anomaly appears in the context of temporal discretization. In the
applications of space-time methods to the fixed domain problems, satisfactory solutions may be
obtained with the use of constant-in-time interpolation functions. When such interpolation is applied to
the deforming domain problems, it falls into the subparametric category, which may invalidate some of
the assumptions of convergence analysis, such as completeness of the shape function sets.
4. Moving boundary treatment
Similar to other Lagrangian-Eulerian formulations, the current formulation gives us nearly complete
freedom of the mesh movement within the domain, and to some extent also at the boundary. In the
particular case of free surfaces and moving no-flux interfaces, the only requirement for the velocity $v$ of
the nodes on that boundary is $v \cdot n = u \cdot n$, i.e., the normal components of the nodal and fluid velocities
must match. The tangential component of $v$ remains arbitrary, with some common choices depicted in
Fig. 4. Boundary nodes can be made to move in a prescribed direction (e.g., horizontal or vertical), in a
direction normal to the free surface, or in the direction of local fluid velocity vector. In the last case, we
obtain a locally Lagrangian description at the boundary.
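The three options for the nodal velocity at a free surface can be written down directly from the requirement that the normal components match. The sketch below is illustrative only; the function name and the option labels are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def mesh_velocity(u, n, option="normal", direction=None):
    """Nodal velocity v on a free surface such that v.n equals u.n.

    u         : fluid velocity at the node
    n         : surface normal (normalized internally)
    option    : "prescribed", "normal" or "fluid"
    direction : direction of motion, required for "prescribed"
    """
    n = normalize(n)
    un = dot(u, n)
    if option == "normal":
        return [un * ni for ni in n]
    if option == "fluid":
        return list(u)  # locally Lagrangian description
    if option == "prescribed":
        dn = dot(direction, n)
        if abs(dn) < 1e-12:
            raise ValueError("prescribed direction is tangent to the surface")
        return [un / dn * di for di in direction]
    raise ValueError("unknown option: " + option)


u = [1.0, 2.0]
n = [0.6, 0.8]
v_vertical = mesh_velocity(u, n, "prescribed", direction=[0.0, 1.0])
v_normal = mesh_velocity(u, n, "normal")
v_fluid = mesh_velocity(u, n, "fluid")
```

The prescribed-direction option fails when the chosen direction is tangent to the surface, since no motion along it can then supply the required normal velocity.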
In problems involving mild or cyclic deformations only, the initial mesh can be designed to provide
the necessary flexibility, and remain topologically unchanged throughout the simulation. In problems
involving large deformations, on the other hand, it may be necessary to introduce new meshes during
the course of the computations, and project the solution from the old set of nodes to the new one for
every new mesh. There are situations in which the deformation of the mesh is known in advance, either
precisely, or qualitatively. Such is the case when computing flows around bodies moving in a prescribed
manner. Thus the initial mesh can be made to absorb certain types of deformations without significant
element distortion or the need for remeshing, as shown schematically in Fig. 5. This approach is
successfully used in conjunction with the present formulation to compute flows past oscillating cylinders
and pitching airfoils (see [22, 23]). In general, such techniques will be successful when the deformation
of the domain is either cyclic or tends to a steady-state.
Fig. 4. Mesh moving options at the free surface: prescribed direction (left), normal direction (center) and local velocity
direction (right).
Fig. 5. Mesh moving options: movement with no remeshing.
Fig. 6. Mesh moving options: movement with discrete remeshing.
Fig. 7. Mesh moving options: movement with continuous remeshing.
In many cases, however, the distortion of the domain is irregular or unpredictable, and therefore the
design of an explicit mesh moving method is difficult. More flexible techniques have been developed, in
which the domain is treated as an elastic solid under given boundary displacements [21]. The movement
of the nodes is determined by solving the elasticity equations to obtain the displacement field in the
interior of the domain. The shape of the critical elements can be improved by adjusting the (fictitious)
material properties for these elements. Only the initial mesh needs to be explicitly generated, possibly
by an automatic mesh generator.
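The idea of moving interior nodes by solving an elasticity problem with adjustable fictitious properties can be sketched in one dimension. The bar model below is an assumed analog, not the elasticity solver of [21]: each element acts as a spring whose modulus grows as the element gets smaller, controlled by a hypothetical `stiffening` exponent.

```python
def move_mesh_1d(x, d_left, d_right, stiffening=0.0):
    """Move interior nodes of a 1D mesh given the two end displacements.

    Element e has stiffness k_e = E_e / h_e with fictitious modulus
    E_e = (1 / h_e) ** stiffening, so small elements deform less when
    stiffening > 0. The tridiagonal system is solved by the Thomas algorithm.
    """
    n_el = len(x) - 1
    h = [x[e + 1] - x[e] for e in range(n_el)]
    k = [(1.0 / he) ** stiffening / he for he in h]
    m = n_el - 1  # number of interior nodes
    if m == 0:
        return [x[0] + d_left, x[1] + d_right]
    lower = [-k[i] for i in range(m)]
    diag = [k[i] + k[i + 1] for i in range(m)]
    upper = [-k[i + 1] for i in range(m)]
    rhs = [0.0] * m
    rhs[0] += k[0] * d_left      # known displacement of the left end
    rhs[-1] += k[m] * d_right    # known displacement of the right end
    for i in range(1, m):        # forward elimination
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    d = [0.0] * m
    d[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        d[i] = (rhs[i] - upper[i] * d[i + 1]) / diag[i]
    disp = [d_left] + d + [d_right]
    return [xi + di for xi, di in zip(x, disp)]


uniform = move_mesh_1d([0.0, 1.0, 2.0, 3.0, 4.0], 0.0, 0.4)
plain = move_mesh_1d([0.0, 0.1, 1.0], 0.0, 0.5, stiffening=0.0)
stiff = move_mesh_1d([0.0, 0.1, 1.0], 0.0, 0.5, stiffening=1.0)
```

With `stiffening=0` every element stretches by the same relative amount. With `stiffening=1` the small element near the left end absorbs much less of the deformation, which is the kind of protection of critical elements that the adjustment of fictitious properties is meant to provide.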
In some other cases, dramatic changes in the shape and volume of the domain may cause both of the
aforementioned techniques to break down. Our approach here is to use an automatic mesh generator,
the INRIA-developed Emc2 [33], to discretize anew the domain deformed during the previous time
step, as shown in Fig. 6. The solution has to be projected to the new mesh via an interpolation
technique. The jump term integral in (11) can also serve as a projection mechanism. The ability of this
scheme to handle arbitrary domains is offset by the diffusive effect of the projection, which grows with
increasing time resolution, as the number of projections increases with decreasing time step size.
The ideal approach should integrate the elastic treatment of the interior domain, with infrequent
remeshing when unacceptable mesh distortions are reached. It is also desirable to isolate, if possible, a
section of the domain in which remeshing should be avoided, and preserve the mesh in this region even
as the mesh in the rest of the computational domain undergoes restructuring and data there are
projected. It is clear that the linear projection will not affect a flow field with uniform velocity or
uniform shear rate. On the other hand, the adverse effects of projection are most pronounced in
regions where the shear rate varies rapidly, e.g., in boundary layers or internal shear layers. Sometimes,
the location of such regions is predictable. For example, computation of fluid flow around a solid object
might involve a structured, stiff mesh fragment around the object which is never subjected to remeshing
and the associated projection process, thus preserving the flow field within its boundary layer. Outside
this small area, even frequent remeshing may be allowed, as it will not degrade the quality of a slowly
varying flow field.
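The claim that a linear projection leaves uniform-shear fields untouched, while degrading rapidly varying ones, can be checked with a one-dimensional sketch. The piecewise-linear interpolation routine below is a hypothetical stand-in for the projection step; meshes and fields are illustrative.

```python
def project_linear(x_old, f_old, x_new):
    """Project nodal data from an old 1D mesh to new nodes by linear interpolation.

    Both node lists are assumed sorted, with x_new inside the old mesh.
    """
    out = []
    for x in x_new:
        e = 0  # index of the old element containing x
        while e < len(x_old) - 2 and x > x_old[e + 1]:
            e += 1
        t = (x - x_old[e]) / (x_old[e + 1] - x_old[e])
        out.append((1.0 - t) * f_old[e] + t * f_old[e + 1])
    return out


x_old = [0.0, 0.5, 1.0]
x_new = [0.0, 0.25, 0.6, 1.0]

# Uniform shear: velocity varies linearly, so the projection is exact.
shear = project_linear(x_old, [2.0 * x + 1.0 for x in x_old], x_new)
shear_error = max(abs(f - (2.0 * x + 1.0)) for f, x in zip(shear, x_new))

# Varying shear rate: a quadratic profile is smeared by the projection.
quad = project_linear(x_old, [x * x for x in x_old], x_new)
quad_error = max(abs(f - x * x) for f, x in zip(quad, x_new))
```

Repeating such a projection at every time step accumulates the second kind of error, which is the diffusive effect attributed to frequent remeshing.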
Yet another possibility is available to us with the use of a completely unstructured mesh within the
space-time slab. In the approaches discussed so far, the space-time mesh was formed by a straightforward
extension of the spatial mesh in the time dimension. Such a strategy yields three-dimensional
six-node or eight-node prisms for two-dimensional problems discretized with triangular or quadrilateral
elements, and four-dimensional bricks for three-dimensional problems. The important point is that the
mesh generation burden is restricted to the spatial domain only. With an $n_{sd}$-simplex mesh in the spatial
domain, it is also possible to connect two dissimilar meshes at the two time levels of a space-time slab
with an unstructured $(n_{sd} + 1)$-simplex mesh. The remeshing in this case is achieved in a continuous
manner, as shown in Fig. 7. Unfortunately, with the present state of affairs in the area of mesh
generation, this strategy is not yet viable for three-dimensional problems. The mesh motion with no
remeshing (Fig. 5) is utilized in the sloshing problem in Section 7.
5. Solution techniques
At each nonlinear iteration step of a finite element procedure, we have the task of solving a large
linear equation system,
$$A x = b. \tag{24}$$
Initially A is assumed to be an N × N general matrix, while later we discuss special properties of the
matrices arising from the finite element discretizations. The size of the system in our applications ranges
from N = 5000 for small space-time test problems to N = 300 000 for some of the large-scale problems.
Only in the lower part of this spectrum are the direct solution techniques feasible on contemporary
computers. For larger problems the memory limitations, as well as rapidly increasing computing cost,
make iterative solution techniques a necessity. It is therefore tempting to abandon the direct solver
entirely, for the sake of code uniformity. Yet time and again, the direct solvers prove to be a necessary
tool in at least the development stage of a finite element implementation. For example, to properly
diagnose instabilities of a new variational form, we may have to approach and pinpoint the borders of
the nonconvergence of the nonlinear iteration process. The robustness of the direct technique, which is
lacking from the practical iterative techniques, makes such studies of nearly ill-posed problems possible.
Iterative solvers are usually applied after the basic properties of the system matrix, often predicted by
analysis, are confirmed numerically with the aid of a direct solver.
Of particular interest here is the LU decomposition of the matrix A, and the Crout algorithm for
achieving such decomposition. This method gives an option of efficient solution of a system with
multiple right-hand sides, yet having a low operations count similar to the methods of Gauss and
Gauss-Jordan. The multiple right-hand side feature can be useful when the nonlinear iterations are
performed with a frozen left-hand side matrix to save matrix formation expense.
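The reuse of one factorization for several right-hand sides can be illustrated with a minimal sketch of a Crout-type factorization (unit diagonal in U, no pivoting) on a small dense matrix. The function names `crout_lu` and `lu_solve`, the dense list-of-lists storage and the 3 × 3 test matrix are assumptions made for illustration only; they are not taken from the implementation described here, which uses skyline storage.

```python
def crout_lu(A):
    """Crout factorization A = L U with a unit diagonal in U (no pivoting)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of L, on and below the diagonal.
        for i in range(j, n):
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        # Row j of U, to the right of the diagonal.
        for i in range(j + 1, n):
            U[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b by forward and back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward substitution, L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution, U x = y
        x[i] = y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x


# Hypothetical frozen left-hand side matrix, factored once.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
L, U = crout_lu(A)

# Each new right-hand side costs only the two triangular solves.
right_hand_sides = [[5.0, 6.0, 5.0], [4.0, 0.0, 0.0]]
solutions = [lu_solve(L, U, b) for b in right_hand_sides]
```

The factorization is the expensive step; with a frozen matrix it is paid once, and every subsequent nonlinear iteration needs only the substitutions.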
While the direct solution methods offer much better tolerance of unfavorable properties of the linear
system than their iterative counterparts, they are not immune to failure. Aside from the obvious
requirement of the nonsingularity, matrix A may also require pivoting to correct certain orderings of the
equations, which otherwise would lead to numerical overflows. In the case of the stabilized methods
discussed here, we are guaranteed that the symmetric part of the matrix A is positive definite. The
design of the stabilization parameters ensures that the positive definiteness constant is greater than
machine precision zero. Thus pivoting is not necessary in our implementations.
Another important effect of the equation ordering is the bandwidth of the matrix A. Figure 8 shows
the position of nonzero entries of a typical matrix arising from a finite element discretization. It is
natural to take advantage of the sparsity of such a matrix by employing the skyline matrix storage
scheme, such as that described in Section 11.2 of [34]. In that scheme, only the entries below the first
nonzero entry in each column of the upper-triangular part of A, and the entries to the right of the first
nonzero entry in each row of the lower-triangular part, are stored. Some zero entries of the initial
matrix A are still stored, as they become nonzero, or filled in, in the process of the LU decomposition. The
resulting arrangement of stored and ignored entries is often called a skyline profile of the matrix. The
skyline profile of the example matrix is also illustrated in Fig. 8. Twice the average height of the column
in a skyline matrix is called the mean bandwidth. The storage required for a skyline N × N matrix with
a mean bandwidth b is Nb. Significant changes in the bandwidth may result from simple reordering of
the equations in the linear system. It is therefore worthwhile to seek an equation numbering that
produces an optimal bandwidth for a given system. Normally, the initial equation numbering will
correspond to the node numbering produced by the mesh generator. When a structured mesh generator
is used, the equation ordering can be nearly optimal without any additional effort. To illustrate, the
system pictured in Fig. 8 has been constructed from a quasi-structured finite element mesh. Unfortunately, the more automatic the mesh generator becomes, the less control is exercised over the
node ordering. This may have a disastrous effect, as the mean bandwidth b may approach N for
randomly ordered meshes, leading to unmanageable storage requirements. The remedy is to use an
empirical node renumbering scheme as an adjunct to the mesh generation step. Our implementations
which rely on an automatic mesh generation incorporate a reverse Cuthill-McKee [35, 36] renumbering
algorithm. Figure 9 shows the skyline profile of a typical finite element matrix resulting from a
discretization via an unstructured grid. The nonzero entries, including the fill-in, account for 37% of the
entire matrix. Application of the reverse Cuthill-McKee reordering yields a matrix representing the
same equation system, but with dramatically different skyline profile, also shown in Fig. 9. Now the
stored entries take up only 9% of the full matrix. Thus, an important four-fold reduction in storage
requirement is accomplished.

Fig. 8. Typical finite element matrix: location of non-zero entries (left) and skyline profile (right) for a quasi-structured mesh (inset).
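The effect of renumbering on skyline storage can be reproduced on a toy problem. The following is a minimal sketch of a reverse Cuthill-McKee ordering and of a skyline storage count, assuming a structurally symmetric matrix described only by its node adjacency. The function names, the eight-node chain with scrambled labels, and the per-column count `2 * height + 1` are illustrative assumptions, not the implementation or the meshes used here.

```python
from collections import deque


def skyline_storage(adj, order):
    """Number of stored entries in a skyline profile with symmetric structure.

    adj[i] lists the neighbours of node i; order[k] is the node placed at
    position k of the new numbering.
    """
    pos = {old: new for new, old in enumerate(order)}
    total = 0
    for old, nbrs in enumerate(adj):
        j = pos[old]
        first = min([pos[n] for n in nbrs] + [j])   # first nonzero row in column j
        total += 2 * (j - first) + 1                # upper column, lower row, diagonal
    return total


def reverse_cuthill_mckee(adj):
    """Breadth-first numbering from low-degree nodes, then reversed."""
    n = len(adj)
    degree = [len(a) for a in adj]
    visited = [False] * n
    order = []
    for start in sorted(range(n), key=lambda i: degree[i]):   # covers every component
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in sorted(adj[node], key=lambda i: degree[i]):
                if not visited[nbr]:
                    visited[nbr] = True
                    queue.append(nbr)
    return order[::-1]


# Hypothetical example: a chain of 8 nodes whose labels were scrambled.
chain = [0, 5, 2, 7, 4, 1, 6, 3]
adj = [[] for _ in chain]
for a, b in zip(chain, chain[1:]):
    adj[a].append(b)
    adj[b].append(a)

original = skyline_storage(adj, list(range(len(adj))))
rcm_order = reverse_cuthill_mckee(adj)
reordered = skyline_storage(adj, rcm_order)
mean_bandwidth = reordered / len(adj)    # storage = N * b, so b = storage / N
```

Because fill-in during the LU decomposition stays inside the skyline, the count above is also the storage needed for the factors.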
In an effort to reduce the cost required to solve a large linear system, we often lower our sights from
the exact solution and become content with a solution which approximates the exact one with a
prescribed accuracy. Such approximation can often be found at a greatly reduced expense. The iterative
methods achieve this aim by constructing a series of guesses approximating the true solution. The
conjugate gradient (CG) method has been extremely successful when applied to symmetric linear
systems. Discounting the truncation errors, this method is known to produce an exact solution in at
most N iterations, where N is the order of the linear system. For a review of this and earlier iterative
methods see, e.g., [37]. Although several generalizations of the CG algorithm to nonsymmetric linear
systems exist, the Generalized Minimum Residual (GMRES) method introduced by Saad and Schultz
[38] has proven to be the method of choice for many finite element researchers. The algorithm that
follows was introduced by Saad in [39].
Fig. 9. Finite element matrix for an unstructured grid: skyline profile for the mesh (inset) before (left) and after reordering (right).
5.1. GMRES algorithm
The outline of the GMRES algorithm is shown in Box 1. Given the equation system (24), an initial
guess $x_0$, and a preconditioner matrix M (to be discussed later), the process leads to an approximate
solution vector x which minimizes the residual of the initial system over the preconditioned Krylov
subspace. Each outer iteration involves solution of the linear system projected to a lower-dimensional
subspace. The successive outer iterations differ in the quality of the initial guess $x_0$ only.
The inner iteration loop constructs the Krylov space and projects the original equation system to this
space, producing the matrix $\bar{H}$, which is by definition upper-Hessenberg. The modified Gram-Schmidt
orthogonalization of the basis vectors is typically the most computationally expensive part of the
algorithm for moderate values of m.
The solution of the reduced system takes advantage of the upper-Hessenberg form of $\bar{H}$, as shown in
Box 2. First the matrix $\bar{H}$ is reduced to an upper-triangular form by a series of Givens rotations. The
optimal y is then found through a simple back-substitution on the transformed system. Note that the
operations enclosed in frames in Box 2 are omitted from practical implementations.

Box 1
GMRES algorithm: control flow
  for $l = 1, \ldots, n_{outer}$                                GMRES outer iterations
    $r_0 := b - A x_0$                                          compute initial residual
    $\beta := \|r_0\|_2$                                        compute initial residual norm
    $v_1 := r_0 / \beta$                                        define first Krylov vector
    for $j = 1, \ldots, m$                                      GMRES inner iteration
      $z_j := M_j^{-1} v_j$                                     preconditioning step
      $w := A z_j$                                              matrix-vector product
      for $i = 1, \ldots, j$                                    Gram-Schmidt orthogonalization
        $h_{i,j} := (w, v_i)$
        $w := w - h_{i,j} v_i$
      $h_{j+1,j} := \|w\|_2$
      $v_{j+1} := w / h_{j+1,j}$                                define next Krylov vector
    $\bar{H} := \{h_{i,j}\}$, $1 \le i \le j+1$, $1 \le j \le m$   define reduced system matrix
    $y := \arg\min_y \|\beta e_1 - \bar{H} y\|_2$               solve reduced system - see Box 2
    $x := x_0 + \sum_{i=1}^{m} y_i z_i$                         form approximate solution
    if $\|\beta e_1 - \bar{H} y\|_2 \le \epsilon$ exit          convergence check
    else $x_0 := x$                                             restart

Box 2
GMRES algorithm: solution of the reduced system
  $p := \beta e_1$                                              initialize right-hand side
  for $j = 1, \ldots, m$                                        loop over columns of $\bar{H}$
    $\gamma := \sqrt{h_{j,j}^2 + h_{j+1,j}^2}$
    $c := h_{j,j} / \gamma$
    $s := h_{j+1,j} / \gamma$
    $h_{j+1,j} := -s h_{j,j} + c h_{j+1,j} = 0$                 Givens rotation on $\bar{H}$
It should be noted that the algorithm shown in Box 1 differs from the classical GMRES in two ways.
In the classical GMRES, the Krylov space is continually expanded until convergence is reached. This
leads to excessive memory requirements, and the advantage over a direct method of solution is lost. To
alleviate this problem, in a practical GMRES algorithm, the dimension of the Krylov space m is limited
a priori, and if this dimension proves insufficient, the algorithm is restarted in the next outer iteration.
Alternatively, the orthogonalization process can be truncated to include only a given number of
previous basis vectors, but this option is not pursued here. Unfortunately, these modified methods are
not guaranteed to converge, and the choice of the restart frequency has to be made carefully, usually by
numerical experimentation. The current algorithm also differs from other GMRES derivatives in that it
allows variable preconditioning at each inner iteration. This feature enables us to, e.g., freely mix two
different but complementary preconditioners, as described in [40].
In a more efficient (but less intuitive) version of the GMRES algorithm, the Givens rotations are
computed and applied to the reduced system even as it is being constructed. This modification gives us
an option of discontinuing the inner GMRES iterations when a proper convergence criterion is met
within the inner iteration loop. Here, we use the fact that after j rotations applied to the reduced
system, the current value of the norm $\|\beta e_1 - \bar{H} y\|_2$ is equal to the absolute value $|p_{j+1}|$, as shown in
[38]. The final algorithm adopted for present computations is thus presented in Box 3.
Box 3
GMRES algorithm: modified control flow
  for $l = 1, \ldots, n_{outer}$                                GMRES outer iterations
    $r_0 := b - A x_0$                                          compute initial residual
    $\beta := \|r_0\|_2$                                        compute initial residual norm
    $v_1 := r_0 / \beta$                                        define first Krylov vector
    $p := \beta e_1$                                            initialize right-hand side
    for $j = 1, \ldots, m$                                      GMRES inner iteration
      $z_j := M_j^{-1} v_j$                                     preconditioning step
      $w := A z_j$                                              matrix-vector product
      for $i = 1, \ldots, j$                                    Gram-Schmidt orthogonalization
        $h_{i,j} := (w, v_i)$
        $w := w - h_{i,j} v_i$
      $h_{j+1,j} := \|w\|_2$
      $v_{j+1} := w / h_{j+1,j}$                                define next Krylov vector
      for $i = 1, \ldots, j-1$                                  previous Givens rotations on $\bar{H}$
        $(h_{i,j},\ h_{i+1,j}) := (c_i h_{i,j} + s_i h_{i+1,j},\ -s_i h_{i,j} + c_i h_{i+1,j})$
      $\gamma := \sqrt{h_{j,j}^2 + h_{j+1,j}^2}$                compute next rotation
      $c_j := h_{j,j} / \gamma$
      $s_j := h_{j+1,j} / \gamma$
      $h_{j,j} := \gamma$                                       Givens rotation on $\bar{H}$
      $h_{j+1,j} := 0$
      $(p_j,\ p_{j+1}) := (c_j p_j,\ -s_j p_j)$                 Givens rotation on p
      if $|p_{j+1}| \le \epsilon$ exit loop                     inner loop convergence check
    $y := [h_{i,k}]_{1 \le i,k \le j}^{-1} (p_1, \ldots, p_j)^T$   backsubstitution
    $x := x_0 + \sum_{i=1}^{j} y_i z_i$                         form approximate solution
    if $|p_{j+1}| \le \epsilon$ exit loop                       outer loop convergence check
    else $x_0 := x$                                             restart
For equation systems stemming from the increment form of the variational formulation, the initial
guess x0 is normally taken as zero.
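The control flow of Box 3 can be sketched in a few dozen lines. The version below is a minimal illustration in pure Python, assuming dense list vectors and user-supplied callables; the names `fgmres`, `matvec` and `precond`, the default parameters and the 3 × 3 test system are all hypothetical. It includes no safeguards beyond a check for a vanishing next Krylov vector, and it is not the data parallel implementation discussed in this paper.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fgmres(matvec, b, x0, precond, m=20, n_outer=10, eps=1e-10):
    """Restarted GMRES with right preconditioning that may vary with j."""
    x = list(x0)
    for _ in range(n_outer):                              # outer iterations
        r = [bi - ai for bi, ai in zip(b, matvec(x))]     # initial residual
        beta = math.sqrt(dot(r, r))                       # initial residual norm
        if beta <= eps:
            break
        V = [[ri / beta for ri in r]]                     # first Krylov vector
        Z, H, cs, sn = [], [], [], []
        p = [beta]                                        # right-hand side beta * e_1
        for j in range(m):                                # inner iterations
            Z.append(precond(V[j], j))                    # preconditioning step
            w = matvec(Z[j])                              # matrix-vector product
            h = []
            for i in range(j + 1):                        # modified Gram-Schmidt
                hij = dot(w, V[i])
                h.append(hij)
                w = [wk - hij * vk for wk, vk in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            h.append(h_next)
            for i in range(j):                            # previous Givens rotations
                h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                                  -sn[i] * h[i] + cs[i] * h[i + 1])
            gamma = math.hypot(h[j], h[j + 1])            # next rotation
            cs.append(h[j] / gamma)
            sn.append(h[j + 1] / gamma)
            h[j], h[j + 1] = gamma, 0.0                   # rotation on column j
            H.append(h)                                   # H[c][i] is row i of column c
            p.append(-sn[j] * p[j])                       # rotation on p (old p[j] used)
            p[j] = cs[j] * p[j]
            if abs(p[j + 1]) <= eps or h_next == 0.0:     # inner convergence check
                break
            V.append([wk / h_next for wk in w])           # next Krylov vector
        k = len(H)
        y = [0.0] * k
        for i in reversed(range(k)):                      # back-substitution
            y[i] = (p[i] - sum(H[c][i] * y[c] for c in range(i + 1, k))) / H[i][i]
        for i in range(k):                                # form approximate solution
            x = [xk + y[i] * zk for xk, zk in zip(x, Z[i])]
        if abs(p[k]) <= eps:                              # outer convergence check
            break
    return x


# Hypothetical nonsymmetric test system with exact solution (1, 2, 3).
A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
matvec = lambda v: [sum(a * vk for a, vk in zip(row, v)) for row in A]
jacobi = lambda v, j: [vi / A[i][i] for i, vi in enumerate(v)]   # diagonal M, same for all j
x = fgmres(matvec, [6.0, 10.0, 10.0], [0.0, 0.0, 0.0], jacobi, m=3, n_outer=5, eps=1e-12)
```

Passing the inner iteration index `j` to `precond` is what allows a different preconditioner at each inner iteration; storing the preconditioned vectors `Z` is the price paid for that flexibility.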
In contrast to the direct methods of solution, the iterative schemes can be extremely sensitive to the
particular numerical properties of the system matrix A, and the GMRES algorithm is no exception. The
rate of convergence of Krylov subspace methods is influenced by the condition number of the matrix A,
as shown in [41]. The analysis contained therein indicates also that the order of convergence will
degrade as eccentricity of the ellipse containing the eigenvalues of the linear system decreases. In
practical terms, this measure is smaller for problems dominated by advection effects, as indicated by the
Reynolds number.
In most cases, a preconditioning of the original system will be needed to achieve reasonable
convergence rates. Preconditioning involves a matrix M resembling in some sense the matrix A, but
whose inverse is easy to compute. Assuming a constant preconditioner matrix for the duration of the
iterative process (as mentioned earlier, variable preconditioning is also possible), the (right) preconditioned system becomes

    $A M^{-1} (M x) = b$ .    (25)

The closer the matrix $A M^{-1}$ is to identity, the better convergence one may expect from the iterative
solver. Once the solution $Mx$ of (25) is obtained, the solution of the original system, i.e., the x vector
itself, is easily computed.
Scaling of the linear system is a related concept. Here we construct a system

    $W^{-1/2} A W^{-1/2} (W^{1/2} x) = W^{-1/2} b$ .    (26)

The matrix of the iteratively solved system becomes thus $W^{-1/2} A W^{-1/2}$. In contrast with the one-sided
preconditioning, the scaled matrix retains the symmetry characteristics of the original system.
The simplest preconditioning method uses the diagonal part of the system matrix M = diag A. Such
preconditioning removes the worst effects of the large variations in the magnitude of diagonal entries,
which for diagonally dominant matrices is closely related to a high condition number. A diagonal
scaling may be constructed for positive definite matrices with W = diag A as well. Diagonal preconditioning or scaling becomes less effective as the Reynolds number increases and off-diagonal entries assume
importance. The more advanced preconditioning techniques such as the Element-By-Element (EBE)
[42], Cluster-Element-By-Element (CEBE) [43, 44] and CEBE/Cluster Companion (CEBE/CC) [40]
methods are not discussed here. The iterative implementation used here employs the diagonal scaling
exclusively.
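The difference between one-sided diagonal preconditioning and diagonal scaling is easy to see on a small example. The sketch below assumes a hypothetical symmetric positive definite 3 × 3 matrix and forms both $A M^{-1}$ with M = diag A and $W^{-1/2} A W^{-1/2}$ with W = diag A; the variable names are illustrative only.

```python
import math

# Hypothetical symmetric positive definite matrix with widely varying diagonal.
A = [[4.0, 1.0, 0.0],
     [1.0, 9.0, 2.0],
     [0.0, 2.0, 16.0]]
n = len(A)
d = [A[i][i] for i in range(n)]

# Right diagonal preconditioning: column j of A is divided by d[j].
AM_inv = [[A[i][j] / d[j] for j in range(n)] for i in range(n)]

# Symmetric diagonal scaling: entry (i, j) is divided by sqrt(d[i] * d[j]).
scaled = [[A[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]


def is_symmetric(M, tol=1e-14):
    return all(abs(M[i][j] - M[j][i]) <= tol
               for i in range(len(M)) for j in range(len(M)))


preconditioned_is_symmetric = is_symmetric(AM_inv)   # False for this matrix
scaled_is_symmetric = is_symmetric(scaled)           # True
```

Both matrices have a unit diagonal, but only the scaled one keeps the symmetry of A.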
6. Parallel implementation
Implementation notes contained in this section follow the parallel program design outlined in [20].
We restrict the discussion to distributed memory Single Instruction/Multiple Data (SIMD)
architectures, such as the Connection Machine range of supercomputers. It should be noted that the latest
model in this hardware family is also capable of sustaining the Multiple Instruction/Multiple Data
(MIMD) type of control flow.
The starting point in the design of a massively parallel implementation is the decision regarding the
distribution of variables among the processors of the computer. Since most massively parallel
architectures are distributed memory machines, the placement of the data is of paramount importance
and can radically affect the performance of the implementation. We assume that the architecture that is
used has a developed concept of virtual processors, so that each of the data entries that are subject to a
parallel operation may be assumed to reside on its own processor. In reality, each physical processor
takes over the duties of several virtual processors.
We use two primary data storage modes termed ELEM and EQN. The ELEM mode is used for storage
of element level data, with one element and its degrees of freedom associated with exactly one virtual
processor. On the other hand, the EQN mode will hold variables at the level of the global equation
system. These two arrangements are shown schematically in Fig. 10 for a typical subset of a finite
element mesh. The nodal data, coordinates, element-level properties and element level matrices are
stored in the ELEM mode. The global increment vector, global residual and certain intermediate
variables are kept in the EQN form. The mapping between the two datasets is denoted
LM : ELEM ↦ EQN. Communication operation ELEM ← EQN is called a gather, while movement of the
data in the opposite direction, ELEM → EQN, is known as a scatter. The scatter is usually coupled with a
combining operation at the destination, such as addition or overwriting. Both gather and scatter may be
implemented efficiently on the Connection Machine computers, provided they are done repeatedly with
a static communication pattern as defined by LM. In that case, the communication trace may be
optimized and saved the first time communication is performed, resulting in extremely fast subsequent
gathers and scatters. The general nature of the scatter and gather operations sets no conditions on the
connectivity of the mesh, allowing the use of totally unstructured meshes characteristic of the finite
element method.
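A serial analogue of the two communication operations may make the roles of ELEM, EQN and LM concrete. This is a minimal sketch assuming a hypothetical one-dimensional mesh of three two-node elements; the function names `gather` and `scatter_add` are illustrative and do not correspond to Connection Machine library calls.

```python
# Hypothetical mesh: 3 two-node elements sharing 4 global equations.
# LM[e][a] is the global equation number of local degree of freedom a of element e.
LM = [[0, 1], [1, 2], [2, 3]]
n_eqn = 4


def gather(eqn_values, LM):
    """EQN -> ELEM: copy each global value to every element that references it."""
    return [[eqn_values[g] for g in elem] for elem in LM]


def scatter_add(elem_values, LM, n_eqn):
    """ELEM -> EQN: combine element contributions by addition at the destination."""
    out = [0.0] * n_eqn
    for elem, vals in zip(LM, elem_values):
        for g, v in zip(elem, vals):
            out[g] += v
    return out


elem_x = gather([10.0, 20.0, 30.0, 40.0], LM)          # element-level copies
residual = scatter_add([[1.0, 1.0]] * 3, LM, n_eqn)    # assembled global vector
```

Nothing in either routine depends on the numbering pattern in `LM`, which is why the same operations serve structured and totally unstructured meshes alike.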
The bulk of the computations in an implicit finite element program will typically come from two
stages. One is the formation of the element level matrices and residual vectors. Second is the solution of
a linear system of equations, the components of which were constructed in the first stage. Efficient
implementation of these two stages on a massively parallel architecture is the most important, as well as
difficult, task.
The process of solution of the global linear system used here does not require the assembly of the
global matrix in any form. Instead, it operates on unassembled element level matrices. Therefore, apart
from a simple assembly, or scatter, of the global residual vector from element-level residuals, the task
of forming the equation system takes place entirely at the element level. Consequently, in the matrix
formation phase, no inter-processor communication is involved and the parallelism of the operations
may be fully exploited. A more detailed description of the matrix formation phase, including a
pseudocode fragment, is found in [20].

Fig. 10. Parallel implementation: element-level (left) and equation-level (right) data storage modes.
The solution of the linear system (24) typically consumes most of the computational resources in an
implicit finite element program, and is also the most challenging part from the point of view of parallel
implementation. The direct solution techniques are extremely hard to implement efficiently on massively
parallel computers, except for the special cases of, e.g., a uniformly banded matrix with very small
bandwidth of 3-5 elements. Such matrices rarely appear in finite element codes. Efforts are underway
to create more general parallel direct solvers employing the MIMD paradigm which by nature is more
flexible than the SIMD model.
In contrast, the iterative solution techniques have proven significantly more adaptable to parallel
architectures. The example in Section 7 employs a data parallel implementation of the GMRES
algorithm described previously. The bulk of the computations in the GMRES algorithm shown in Box 3
will involve EQN structures only. These include the set of Krylov vectors (original and preconditioned)
and the entries of the diagonal preconditioner. The computationally intensive task of Gram-Schmidt
orthogonalization of the Krylov vectors involves only on-processor operations and communication
operations of the scan/reduce type. The latter are performed very efficiently on the architectures used
here, being in fact one of the fastest inter-processor communication operations. The only interplay
between the ELEM and EQN occurs when the matrix-vector product is to be computed. Such product
involves a gather of EQN-level values into the ELEM dataset, followed by a communication-free
on-processor matrix-vector product, and finally a scatter back into the EQN data set. The gather and
scatter operations utilize the highly optimized Connection Machine Scientific Software Library
(CMSSL) available on the CM-200 and CM-5.
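The three-stage matrix-vector product on unassembled element matrices can be written serially as follows. This is a minimal sketch assuming hypothetical 2 × 2 element matrices on a three-element one-dimensional mesh; the function name `ebe_matvec` is illustrative, and on a data parallel machine the loop over elements would run concurrently, one element per virtual processor.

```python
def ebe_matvec(elem_mats, LM, x):
    """Compute y = A x without assembling A, using element matrices and the map LM."""
    y = [0.0] * len(x)
    for Ke, dofs in zip(elem_mats, LM):
        xe = [x[g] for g in dofs]                          # gather: EQN -> ELEM
        ye = [sum(Ke[a][b] * xe[b] for b in range(len(dofs)))
              for a in range(len(dofs))]                   # on-processor product
        for g, v in zip(dofs, ye):                         # scatter with addition: ELEM -> EQN
            y[g] += v
    return y


# Hypothetical element matrices and connectivity.
LM = [[0, 1], [1, 2], [2, 3]]
Ke = [[1.0, -1.0], [-1.0, 1.0]]
y = ebe_matvec([Ke, Ke, Ke], LM, [1.0, 2.0, 4.0, 8.0])
```

Only the middle stage performs floating point work; the gather and the scatter are pure data movement.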
In the current implementation, all variables related to the reduced system are stored on the scalar
front-end processor, and the factorization and back substitution of that system is also performed in a
scalar fashion. For moderate values of the Krylov space dimension $m \le 50$, it was observed that the
solution of the reduced system consumed less than 3% of the time spent on the GMRES iterations, so
the use of the comparatively slow front-end processor here is not an impediment.
See also the Ph.D. thesis of Johan [45] for another discussion of data parallel solution techniques and
a description of a memory-saving matrix-free GMRES implementation.
It is important to note the rapid advancements in computational speed which are still seen in the
universe of parallel computing. Most of the performance measurements quoted in the next section come
from either a CM-200 computer, or an early version of CM-5. These early CM-5 machines are now
being equipped with Vector Execution Units (VEU), which dramatically improve the pace of floating
point operations. We have compared the speed of the VEU version of CM-5 to that of a CM-200 in a
preliminary three-dimensional computation of an incompressible flow past a sphere at Reynolds number
100. Figure 11 shows the surface portions of the mesh for this problem, which uses 16 384 elements and
134 542 unknowns.

Fig. 11. Flow past a sphere: mesh surface for the benchmark problem.
On a per-node basis, we have obtained a speed-up of over 19 for the computationally intensive
system formation stage, and of 7 for the communication intensive GMRES stage. This resulted
in an overall speed-up by a factor of almost 9. Note that 'node' refers to either a Weitek
floating point unit on CM-200, or a 4-VEU processing node of a CM-5. It is also important to realize
that while the communication libraries on the CM-200 were subjected to optimization efforts lasting
several years, the development of their CM-5/VEU counterparts has just begun. These results are
based upon a test version of the Thinking Machines Corporation software where the emphasis was on
providing functionality and the tools necessary to begin testing the CM-5 vector units. This software
release has not had the benefit of optimization or performance tuning and, consequently, is not
necessarily representative of the performance of the full version of this software.
7. Numerical example: sloshing in a tank subjected to vertical excitation
In this section, we perform a three-dimensional simulation of sloshing in a tank subjected to external
excitation. The objective of this exercise is to assess the capabilities of the space-time finite element
formulation as a method for handling deforming domain problems. Experimental and theoretical
evidence [46, 47] indicates the existence of multiple solution branches when the horizontal cross-section
of the tank is nearly square. Depending on the excitation frequency, the competing wave modes
interact, generating complex periodic, as well as chaotic, wave behavior. The particular case considered
here is based on the experiment performed by Feng and Sethna [47].
Water fills a container with a rectangular cross-section. The initial dimensions of the domain are
W × H × D with W = 0.1778 m (7 in), H = 0.18034 m (7.1 in) and D = 0.127 m (5 in). Side and bottom
boundaries allow slip in the direction tangent to the surface. The top boundary is left free, and moves
with the normal component of the fluid velocity at the surface. The normal directions are computed
using the consistent normal approach of Engelman et al. [48]. The external forces acting on the fluid
consist of a constant gravitational acceleration of magnitude $g = 9.81$ m s$^{-2}$ and of a sinusoidal
vertical excitation $A g \sin \omega t$ with $\omega = 2 \pi f$, $f = 4.00$ Hz and A such that the amplitude of the
oscillations remains at 1 mm. The problem is nondimensionalized, with the dimensional scales chosen so
that the x-dimension length W and vertical acceleration g become unity. The nondimensional time step is
$\Delta t = 0.093$, leading to 20 time steps per excitation period. At each time level, the domain is discretized
using 7056 nodes and 6000 hexahedral trilinear elements. The space-time mesh for each time slab
consists of 14 112 nodes and 6000 quadrilinear four-dimensional brick elements.
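The quoted nondimensional time step can be cross-checked from the stated scales. The short sketch below assumes that the time scale implied by unit length W and unit acceleration g is $\sqrt{W/g}$, and, for the excitation coefficient, that the 1 mm amplitude refers to the displacement of the tank, so that $A g = a \omega^2$; the second assumption is an interpretation and is not stated explicitly in the text. All variable names are illustrative.

```python
import math

W = 0.1778        # tank width in m, the length scale
g = 9.81          # gravitational acceleration in m/s^2, the acceleration scale
f = 4.00          # excitation frequency in Hz
steps_per_period = 20

time_scale = math.sqrt(W / g)                 # seconds per nondimensional time unit
period = (1.0 / f) / time_scale               # nondimensional excitation period
dt = period / steps_per_period                # nondimensional time step

# Assumption: "amplitude of the oscillations" is a displacement amplitude a = 1 mm,
# so the acceleration amplitude A * g equals a * omega**2.
a = 0.001
omega = 2.0 * math.pi * f
A_exc = a * omega ** 2 / g

print(f"dt = {dt:.4f}, A = {A_exc:.4f}")
```

Under these assumptions the time step rounds to the value 0.093 given above.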
A GMRES solver with diagonal scaling was used to solve the linear system of equations at each
nonlinear iteration. A Krylov space size of 40 was chosen, and the maximum number of outer GMRES
iterations was 5 initially, and 10 for larger fluid motions. At each time step, an average of 3 nonlinear
iterations were needed. The simulations were done in 64-bit precision on the CM-200 computer.
The computations start with a perturbed solution which has been obtained by carrying out the
simulation for 20 time steps (one excitation period), with additional force components applied in the x-
and y-directions. These components have the same amplitude and frequency as the vertical one. At the
end of the perturbation period, the surface of the fluid becomes deflected by about 7% from the
original flat position. The additional forces are removed, and motion of the fluid continues influenced
by the vertical excitation only. To get an idea of how much of the wave motion is contributed by the
two competing modes (1,0) and (0,1), we show the wave height reading at points (x, y) = (W, H/2)
and (x, y) = (W/2, H). These points correspond to the location of probes in the experiment of Feng and
Sethna [47], and are referred to as A1 and A2, respectively. The A1 amplitude identifies the (1,0)-mode
component and amplitude at A2 indicates the strength of the (0,1)-mode. The time histories of these
two amplitudes are shown in Figs. 12 and 13. Even though the perturbation exhibits no preference for
either of the two competing modes, i.e., (0,1) and (1,0), after its removal, we witness the growth of
the (0,1)-mode.
A period of the wave motion is shown in Fig. 14. The elevation plots are viewed from the negative
x-direction. In the z-direction, the visible bounding box extends from 0.5 to 0.9. Here the (0,1)-mode
is dominant, with a small, but non-zero, (1,0)-mode component. The two competing modes have a
slight phase difference, leading to a small 'rotating' effect. The experimental data predicts the presence
of a mixed mode (called MS1 in [47]) at this frequency, which contains a large (0,1)-mode component
and a small (1,0)-mode component.

Fig. 12. Vertically excited tank: time history of the wave height at point A1.

Fig. 13. Vertically excited tank: time history of the wave height at point A2.
Performance tests were conducted for the space-time implementation, with the sloshing problem
serving as a benchmark, yielding the following results. On the 32K-processor CM-200 computer, the
formation of the element level matrices and residuals took place at 800 million 64-bit floating point
operations per second (megaflops). The GMRES solution phase speed was observed as 872 megaflops.
For a three-dimensional space-time formulation, the bulk of time taken by the GMRES solution
process is consumed by the matrix-vector product. As discussed in Section 6, this product involves
three components: gather, on-processor matrix-vector multiplication and scatter. Only the second
component contributes to the floating point operation count. The ratio of the computation time
consumed by the three stages was found to be 2 : 1 : 2. Thus the 872 megaflops speed for the entire
GMRES solver indicates that the on-processor matrix-vector product is being computed near its peak
speed of 5 gigaflops. The performance of the entire code, including the parallel input/output operations
and problem setup, was 794 megaflops. As the speeds of the two major code parts, i.e., the matrix
formation stage and the GMRES solution stage, are comparable, the overall performance of the
program should not be sensitive to the choice of parameters such as the number of outer or nonlinear
iterations. Nevertheless, such sensitivity may exist in other situations.

Fig. 14. Vertically excited tank: free surface view at t = 71.29, 71.76, 72.22 (top row), t = 72.69, 73.15, 73.61 (middle row),
t = 74.08, 74.54, 75.00 (bottom row).
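The inference about the on-processor matrix-vector product rate can be made explicit with a line of arithmetic. The estimate below is a rough sketch under two simplifying assumptions: that the whole GMRES time T is split among the three stages in the ratio 2 : 1 : 2, and that all $N_{\mathrm{flop}}$ counted floating point operations occur in the multiplication stage. The symbols T, $t_{\mathrm{mult}}$ and $N_{\mathrm{flop}}$ are introduced here for illustration only.

```latex
t_{\mathrm{mult}} = \frac{1}{2 + 1 + 2}\, T = \frac{T}{5},
\qquad
\frac{N_{\mathrm{flop}}}{t_{\mathrm{mult}}}
  = 5\, \frac{N_{\mathrm{flop}}}{T}
  = 5 \times 872\ \mathrm{megaflops}
  = 4360\ \mathrm{megaflops}
  \approx 4.4\ \mathrm{gigaflops}.
```

Under these assumptions the multiplication stage runs at roughly 87% of the quoted 5 gigaflops peak.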
8. Conclusions
We outlined the stabilized space-time velocity-pressure formulation which is applicable to problems
involving moving boundaries and interfaces. In an effort to communicate our philosophy regarding the
movement of the mesh, we discussed several distinct mesh moving schemes. Subsequently, we
concentrated on the solution methods for the large linear systems of equations which arise from finite
element discretizations. Noting the continuing need for direct solution techniques, we entrusted the large
scale simulations to the GMRES iterative method. The simulation of three-dimensional phenomena
places a big strain on computing resources, and the efficient implementation of the finite element
methods on the latest generation of computers becomes an important issue. To this end, we successfully
employ massively parallel CM-200 and CM-5 machines, in conjunction with arbitrary unstructured finite
element meshes. Here, we discussed some of the programming issues which arise from the use of
distributed memory hardware. Finally, we presented the results from a simulation, which involved
three-dimensional sloshing phenomena in a vertically excited tank.
References
[1] F.H. Harlow, J.E. Welch, J.P. Shannon and B.J. Daly, The MAC method, Report LA-3425, Los Alamos Scientific
Laboratory, 1965.
[2] B.J. Daly, A technique for including surface tension effects in hydrodynamics calculations, J. Comput. Phys. 4 (1969)
97-117.
[3] R.K.C. Chan and R.L. Street, A computer study of finite-amplitude water waves, J. Comput. Phys. 6 (1970) 68-94.
[4] C.W. Hirt and B.D. Nichols, Volume of Fluid (VOF) method for the dynamics of free boundaries, J. Comput. Phys. 39
(1981) 201-225.
[5] C.W. Hirt, J.L. Cook and T.D. Butler, A Lagrangian method for calculating the dynamics of an incompressible fluid with
free surface, J. Comput. Phys. 5 (1970) 103-124.
[6] T. Okamoto and M. Kawahara, Two-dimensional sloshing analysis by Lagrangian finite element method, Internat. J.
Numer. Methods Fluids 11 (1990) 453-477.
[7] C.W. Hirt, A.A. Amsden and J.L. Cook, An arbitrary Lagrangian Eulerian computing method for all flow speeds, J.
Comput. Phys. 14 (1974) 227-253.
[8] T.J.R. Hughes, W.K. Liu and T.K. Zimmermann, Lagrangian-Eulerian finite element formulation for incompressible
viscous flows, Comput. Methods Appl. Mech. Engrg. 29 (1981) 329-349.
[9] A. Huerta and W.K. Liu, Viscous flow with large free surface motion, Comput. Methods Appl. Mech. Engrg. 69 (1988)
277-324.
[10] A. Soulaimani, M. Fortin, G. Dhatt and Y. Ouellet, Finite element simulation of two- and three-dimensional free surface
flows, Comput. Methods Appl. Mech. Engrg. 86 (1991) 265-296.
[11] P. Jamet and R. Bonnerot, Numerical solution of the Eulerian equations of compressible flow by a finite element method
which follows the free boundary and the interfaces, J. Comput. Phys. 18 (1975) 21-45.
[12] R. Bonnerot and P. Jamet, Numerical computation of the free boundary for the two-dimensional Stefan problem by
space-time finite elements, J. Comput. Phys. 25 (1977) 163-181.
[13] D.R. Lynch and W.G. Gray, Finite element simulation of flow in deforming regions, J. Comput. Phys. 36 (1980) 135-153.
[14] C.S. Frederiksen and A.M. Watts, Finite-element method for time-dependent incompressible free surface flows, J. Comput.
Phys. 39 (1981) 282-304.
[15] P. Jamet, Galerkin-type approximations which are discontinuous in time for parabolic equations in a variable domain, SIAM
J. Numer. Anal. 15 (1978) 912-928.
[16] T.J.R. Hughes and G.M. Hulbert, Space-time finite element methods for elastodynamics: Formulations and error estimates,
Comput. Methods Appl. Mech. Engrg. 66 (1988) 339-363.
[17] P. Hansbo and A. Szepessy, A velocity-pressure streamline diffusion finite element method for the incompressible
Navier-Stokes equations, Comput. Methods Appl. Mech. Engrg. 84 (1990) 175-192.
[18] T.E. Tezduyar, M. Behr and J. Liou, A new strategy for finite element computations involving moving boundaries and
interfaces - The deforming-spatial-domain/space-time procedure: I. The concept and the preliminary numerical tests,
Comput. Methods Appl. Mech. Engrg. 94 (1992) 339-351.
[19] T.E. Tezduyar, M. Behr, S. Mittal and J. Liou, A new strategy for finite element computations involving moving boundaries
and interfaces - The deforming-spatial-domain/space-time procedure: II. Computation of free-surface flows, two-liquid
flows, and flows with drifting cylinders, Comput. Methods Appl. Mech. Engrg. 94 (1992) 353-371.
[20] M. Behr, A. Johnson, J. Kennedy, S. Mittal and T.E. Tezduyar, Computation of incompressible flows with implicit finite
element implementations on the Connection Machine, Comput. Methods Appl. Mech. Engrg. 108 (1993) 99-118.
[21] T.E. Tezduyar, M. Behr, S. Mittal and A.A. Johnson, Computation of unsteady incompressible flows with the finite element
methods - Space-time formulations, iterative strategies and massively parallel implementations, in: New Methods in
Transient Analysis, PVP, Vol. 246 (ASME, New York, 1992) 7-24.
[22] S. Mittal and T.E. Tezduyar, A finite element study of incompressible flows past oscillating cylinders and airfoils, Internat. J.
Numer. Methods Fluids 15 (1992) 1073-1118.
[23] S. Mittal, Stabilized space-time finite element formulations for unsteady incompressible flows involving fluid-body
interactions, PhD thesis, University of Minnesota, 1992.
[24] P. Hansbo, The characteristic streamline diffusion method for the time-dependent incompressible Navier-Stokes equations,
Comput. Methods Appl. Mech. Engrg. 99 (1992) 171-186.
[25] S. Aliabadi and T.E. Tezduyar, Space-time finite element computation of compressible flows involving moving boundaries
and interfaces, Comput. Methods Appl. Mech. Engrg. 107 (1993) 209-223.
[26] J.P. Benqué, A. Hauguel and P.L. Viollet, Numerical methods in environmental fluid mechanics, in: M.B. Abbott and J.A.
Cunge, eds., Engineering Applications of Computational Hydraulics, Vol. II (Pitman, London, 1982).
[27] A.N. Brooks and T.J.R. Hughes, Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with
particular emphasis on the incompressible Navier-Stokes equations, Comput. Methods Appl. Mech. Engrg. 32 (1982)
199-259.
[28] T.J.R. Hughes, L.P. Franca and M. Balestra, A new finite element formulation for computational fluid dynamics: V.
Circumventing the Babuška-Brezzi condition: A stable Petrov-Galerkin formulation of the Stokes problem accommodating
equal-order interpolations, Comput. Methods Appl. Mech. Engrg. 59 (1986) 85-99.
[29] T.E. Tezduyar, S. Mittal, S.E. Ray and R. Shih, Incompressible flow computations with stabilized bilinear and linear
equal-order-interpolation velocity-pressure elements, Comput. Methods Appl. Mech. Engrg. 95 (1992) 221-242.
[30] L.P. Franca and T.J.R. Hughes, Two classes of mixed finite element methods, Comput. Methods Appl. Mech. Engrg. 69
(1988) 89-129.
[31] M. Behr, L.P. Franca and T.E. Tezduyar, Stabilized finite element methods for the velocity-pressure-stress formulation of
incompressible flows, Comput. Methods Appl. Mech. Engrg. 104 (1993) 31-48.
[32] L.P. Franca and S.L. Frey, Stabilized finite element methods: II. The incompressible Navier-Stokes equations, Comput.
Methods Appl. Mech. Engrg. 99 (1992) 209-233.
[33] F. Hecht and E. Saltel, Emc² un logiciel d'édition de maillages et de contours bidimensionnels, RT no. 118, INRIA, 1990.
[34] T.J.R. Hughes, The Finite Element Method: Linear Static and Dynamic Finite Element Analysis (Prentice Hall, Englewood
Cliffs, NJ, 1987).
[35] E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in: Proc. 24th National Conf. ACM, Vol.
P-69 (1969) 157-172.
[36] J.A. George, Computer implementation of the finite element method, Technical Report STAN-CS-71-208, Computer
Science Department, Stanford University, Stanford, California, 1971.
[37] D.M. Young, A historical overview of iterative methods, Comput. Phys. Comm. 53 (1989) 1-17.
[38] Y. Saad and M. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM
J. Sci. Statist. Comput. 7 (1986) 856-869.
[39] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Statist. Comput. 14 (1993) 461-469.
[40] T.E. Tezduyar, M. Behr, S.K. Abadi, S. Mittal and S.E. Ray, A new mixed preconditioning method for finite element
computations, Comput. Methods Appl. Mech. Engrg. 99 (1992) 27-42.
[41] Y. Saad, Krylov subspace methods for solving large unsymmetric linear systems, Math. Comp. 37 (1981) 105-126.
[42] T.J.R. Hughes, J. Winget, I. Levit and T.E. Tezduyar, New alternating direction procedures in finite element analysis based
upon EBE approximate factorizations, in: S. Atluri and N. Perrone, eds., Recent Developments in Computer Methods for
Nonlinear Solid and Structural Mechanics (ASME, New York, 1983) 75-109.
[43] J. Liou and T.E. Tezduyar, A clustered element-by-element iteration method for finite element computations, in: R.
Glowinski et al., eds., Domain Decomposition Methods for Partial Differential Equations (SIAM, Philadelphia, PA, 1991)
140-150.
[44] J. Liou and T.E. Tezduyar, Clustered element-by-element computations for fluid flow, in: Horst D. Simon, ed., Parallel
Computational Fluid Dynamics - Implementation and Results, Scientific and Engineering Computation Series (MIT Press,
Cambridge, MA, 1992) 167-187.
[45] Z. Johan, Data Parallel Finite Element Techniques for Large-Scale Computational Fluid Dynamics, PhD thesis, Stanford
University, 1992.
[46] F. Simonelli and J.P. Gollub, Surface wave mode interactions: Effects of symmetry and degeneracy, J. Fluid Mech. 199
(1989) 472-494.
[47] Z.C. Feng and P.R. Sethna, Symmetry-breaking bifurcations in resonant surface waves, J. Fluid Mech. 199 (1989) 495-518.
[48] M.S. Engelman, R.L. Sani and P.M. Gresho, The implementation of normal and/or tangential boundary conditions in finite
element codes for incompressible fluid flow, Internat. J. Numer. Methods Fluids 2 (1982) 225-238.