INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING
Int. J. Numer. Meth. Engng. 42, 1441–1462 (1998)
A MULTI-GRID ENHANCED GMRES ALGORITHM
FOR ELASTO-PLASTIC PROBLEMS
Y. T. FENG, D. PERIC∗ AND D. R. J. OWEN
Department of Civil Engineering, University of Wales Swansea, Swansea SA2 8PP, U.K.
ABSTRACT
A combination of the GMRES and multi-grid (MG) methods is presented in this paper for solving large-scale
two- and three-dimensional elasto-plastic problems, in which each MG iteration cycle serves as the
preconditioning step for the GMRES procedure. A particular multi-grid approach, termed the Galerkin
multi-grid scheme, is considered and the main effort is devoted to the implementation aspects of the proposed
algorithm. Numerical examples, characterised by large scale (up to 82 145 DOF), strong non-linearity (nearly
plastic limit state, necking and localization) and severely ill-conditioned states (presence of loading limit points),
and also involving symmetric and unsymmetric as well as SPD and indefinite system matrices, are provided.
The numerical results illustrate that the proposed method performs efficiently and robustly in all of these
circumstances. © 1998 John Wiley & Sons, Ltd.
KEY WORDS: GMRES; Galerkin multi-grid; variable preconditioning scheme; conjugate gradient method; large-scale
elasto-plastic problem
1. INTRODUCTION
The GMRES algorithm,1 together with the Conjugate Gradient Squared Method (CGS)2 and the
Bi-Conjugate Gradient Stabilized Method (BiCGStab),3 is a very popular Krylov-type iterative
solver for general unsymmetric linear systems arising from a wide variety of applications. The
main attractive feature of the GMRES algorithm over CGS and BiCGStab is its good numerical
stability, combined with a non-increasing residual norm sequence. Nevertheless, the algorithm
requires storage of all the basis vectors of the Krylov space, resulting in a large increase in
terms of both memory requirement and orthogonalization cost if the procedure cannot converge
within a relatively small number of iterations. A practical remedy to these drawbacks is to adopt
a restarted version, but at the cost of requiring a greater number of iterations to attain convergence.
A feature that makes the GMRES algorithm particularly attractive is that
it can readily incorporate non-constant or non-linear preconditioning schemes in its algorithmic
framework, and therefore its performance can be substantially enhanced by choosing the most
appropriate preconditioner at each iteration step.4,5 Such situations arise (i) when it is more desirable
to adopt different preconditioners determined, for instance, by means of heuristics at different stages
of the algorithm, i.e. a non-constant preconditioning strategy may be more appropriate, or (ii) when
any iterative algorithm is employed as a preconditioning step, i.e. a non-linear preconditioner is
employed.
∗ Correspondence to: D. Peric, Department of Civil Engineering, University of Wales Swansea, Swansea SA2 8PP, U.K.
E-mail: [email protected]
Contract/grant sponsor: EPSRC; Contract/grant number: GR/K88965
CCC 0029–5981/98/081441–22$17.50
Received 1 May 1997
The version of GMRES with variable preconditioning, proposed by Saad in Reference 4, incurs
no additional cost in the arithmetic but doubles the memory requirement. An important property of
this new version of GMRES is that it still satisfies the residual norm minimization property over
the preconditioned Krylov subspace just as in the standard algorithm. A few numerical experiments
are also presented in Reference 4 to demonstrate how the new version can be used to improve
the robustness of the standard GMRES algorithm, while the first illustration of the benefits of this
algorithm in finite element applications is described by Tezduyar et al.6
The particular non-linear preconditioning scheme considered in the present work is the so-called
Multi-Grid Algorithm (MG). It is motivated by the fact that the multi-grid approach has been
established as an efficient iterative method for solving a wide variety of practical problems, and
thus it has the potential to become one of the best preconditioners. Consequently, it is naturally
expected that by applying a multi-grid iteration as the preconditioning step at each iteration, GMRES
may become a highly competitive iterative method in terms of efficiency and robustness. Such
a combination, to be referred to as the MG-GMRES algorithm, is presented in this paper for solving
large-scale two- and three-dimensional elasto-plastic problems, which are typically characterized by
poor conditioning when substantial plastic flow develops, so that conventional iterative
solvers normally do not perform well. It is important to emphasize that the proposed MG-GMRES
method could be a very efficient iterative solver not only for unsymmetric problems, but also for
symmetric situations, although the Conjugate Gradient Method (CG) is generally employed in the
latter case.
The paper is organized as follows: in the next section, the GMRES algorithm with constant or
variable preconditioning schemes is reviewed. Then a particular version of the multi-grid method,
termed the Galerkin Multi-Grid Approach (GMG) proposed recently by Feng et al.7 is reviewed
in Section 3. Next, the combination of GMRES and GMG is addressed with the emphasis on
the practical implementation issues. Finally numerical experiments are undertaken to assess the
performance of the proposed MG-GMRES algorithm for a set of examples with a wide spectrum
of system conditioning.
2. GMRES WITH VARIABLE PRECONDITIONER
2.1. Standard restarted GMRES
First a general introduction is given to the restarted version of the standard GMRES algorithm
with right preconditioning for solving the following linear system of equations:
$$Ax = b \tag{1}$$
or equivalently the preconditioned equations
$$AM^{-1}(Mx) = b \tag{2}$$
where $A \in \mathbb{R}^{n \times n}$ is the non-singular coefficient matrix, $M \in \mathbb{R}^{n \times n}$ is the preconditioning matrix,
and $b, x \in \mathbb{R}^n$ are respectively the right-hand side and the solution to be sought.
The GMRES algorithm begins with an initial guess $x_0$, from which an initial residual $r_0 = b - Ax_0$
is calculated. Then a (modified) Gram–Schmidt process is used to construct an orthonormal basis
$\{v_i;\ i = 1, \ldots, j\}$ of the preconditioned Krylov subspace
$$K_j(AM^{-1}, r_0) \equiv \mathrm{Span}\{r_0, AM^{-1}r_0, \ldots, (AM^{-1})^{j-1} r_0\}$$
The approximate solution to (1) is then expressed as
$$x_j = x_0 + V_j y_j$$
where $V_j = [v_1, \ldots, v_j]$ and $y_j$ is the solution of the following least-squares problem:
$$\rho_j = \min_{y_j \in \mathbb{R}^j} \|b - A(x_0 + V_j y_j)\| \tag{3}$$
i.e. at step $j$, GMRES attains a solution that has a minimum residual norm in the shifted
preconditioned Krylov subspace $x_0 + K_j(AM^{-1}, r_0)$. The outline of the GMRES algorithm is presented
as follows.1
Algorithm 1: Restarted GMRES(k) with right preconditioning
1. Start: Choose $x_0$ and a dimension $k$ of the Krylov subspace.
   Define a $(k+1) \times k$ matrix $\bar{H}_k$ and initialize all its entries $h_{i,j}$ to zero.
2. Arnoldi process:
   (a) Compute: $r_0 = b - Ax_0$, $\beta_1 = \|r_0\|$, and $v_1 = r_0/\beta_1$.
   (b) For $j = 1, \ldots, k$ Do
       (i) Compute: $z_j = M^{-1} v_j$;
       (ii) Compute: $w = Az_j$;
       (iii) For $i = 1, \ldots, j$ Do: $h_{i,j} = w^T v_i$; $w \leftarrow w - h_{i,j} v_i$. End Do
       (iv) Compute: $\beta_{j+1} = h_{j+1,j} = \|w\|$ and $v_{j+1} = w/\beta_{j+1}$
       (v) Convergence check: if $\rho_j \le \varepsilon$ (given tolerance), go to (c)
       End Do
   (c) Define $V_j = [v_1, \ldots, v_j]$
3. Form the approximate solution: Compute $x_j = x_0 + M^{-1} V_j y_j$, where $y_j$ is the solution of
   $\min_{y \in \mathbb{R}^j} \|\beta_1 e_1 - \bar{H}_j y\|$, in which $e_1 = [1, 0, \ldots, 0]^T$
4. Restart: If $\rho_j \le \varepsilon$ stop, else set $x_0 \leftarrow x_j$ and go to 2.
We denote by $H_j$ the $j \times j$ matrix obtained from $\bar{H}_j$ by deleting its last row. Then the iterative
procedure of step 2 can be expressed in compact form as
$$AM^{-1}V_j = V_j H_j + \beta_{j+1} v_{j+1} e_j^T \tag{4}$$
with $V_j^T V_j = I_j$ (identity matrix of order $j$) and $V_j^T v_{j+1} = 0$. In practical implementation, Givens
rotations are employed to solve the least-squares problem
$$\rho_j = \min_{y \in \mathbb{R}^j} \|\beta_1 e_1 - \bar{H}_j y\| \tag{5}$$
By using $j$ successive plane rotations, $\bar{H}_j$ is factorized into the following form:
$$\bar{H}_j = Q_j^T \bar{H}_j^* \tag{6}$$
where $Q_j$ is a $(j+1) \times (j+1)$ matrix and is the accumulated product of the rotation matrices,
while $\bar{H}_j^*$ is an upper triangular matrix of dimension $(j+1) \times j$, whose last row is zero. Applying
the same rotations to $\beta_1 e_1$ yields
$$\bar{g}_j = \beta_1 Q_j e_1 \tag{7}$$
As $Q_j^T Q_j = I_{j+1}$, the solution to (5) can then be obtained by solving the upper triangular linear
system:
$$H_j^* y_j = g_j$$
in which $H_j^*$ and $g_j$ result from removing the last row of $\bar{H}_j^*$ and the last component of $\bar{g}_j$
respectively. More importantly, the magnitude of the last component of $\bar{g}_j$ is in fact the residual norm, i.e.
$$\rho_j = |e_{j+1}^T \bar{g}_j|$$
Alternatively, if we apply $Q_{j-1}$ to $h_j$, which is the $j$th column of $H_j$, and let $\mu_j$ be the rotated
value of $h_{j,j}$, i.e.
$$\mu_j = e_j^T Q_{j-1} h_j$$
the residual norm $\rho_j$ can be obtained as
$$\rho_j = s_j \rho_{j-1} \quad (\text{with } \rho_0 = \beta_1) \tag{8}$$
where $s_j = \beta_{j+1} / \sqrt{\beta_{j+1}^2 + \mu_j^2} \le 1$. Clearly, $\rho_j \le \rho_{j-1}$, namely, the residual norm is not increased from
one iteration to the next. In addition, if $\beta_{j+1} = 0$, it follows that $\rho_j = 0$, thus the solution $x_j$ will
be exact. Note that $\beta_{j+1}$ and $\mu_j$ will not both be zero at the same time.
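The Givens-rotation update described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes no preconditioning ($M = I$), a zero initial guess and no breakdown, and the function name `arnoldi_givens` and the test matrix are hypothetical. It tracks the rotated right-hand side and compares its last entry with the residual of the small least-squares problem (5).

```python
import numpy as np

def arnoldi_givens(A, b, k):
    """k Arnoldi steps from x0 = 0; residual norms come from Givens rotations."""
    n = b.size
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))        # Hessenberg matrix, rotated in place
    H_raw = np.zeros((k + 1, k))    # unrotated copy, kept only for checking
    g = np.zeros(k + 1)             # rotated right-hand side beta_1 * e_1
    cs, sn = np.zeros(k), np.zeros(k)
    beta1 = np.linalg.norm(b)
    V[:, 0] = b / beta1
    g[0] = beta1
    rho = []
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):      # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)      # beta_{j+1}; assumed non-zero here
        V[:, j + 1] = w / H[j + 1, j]
        H_raw[:, j] = H[:, j]
        for i in range(j):          # apply the previous rotations to column j
            t = cs[i] * H[i, j] + sn[i] * H[i + 1, j]
            H[i + 1, j] = -sn[i] * H[i, j] + cs[i] * H[i + 1, j]
            H[i, j] = t
        mu, beta = H[j, j], H[j + 1, j]      # rotated diagonal, subdiagonal
        d = np.hypot(mu, beta)
        cs[j], sn[j] = mu / d, beta / d      # sn[j] plays the role of s_j
        H[j, j], H[j + 1, j] = d, 0.0
        g[j + 1] = -sn[j] * g[j]
        g[j] = cs[j] * g[j]
        rho.append(abs(g[j + 1]))            # rho_j = s_j * rho_{j-1}
    return H_raw, beta1, np.array(rho)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) + 4.0 * np.eye(8)   # arbitrary test matrix
b = rng.standard_normal(8)
H_raw, beta1, rho = arnoldi_givens(A, b, 5)
```

The recurrence needs only the newest rotation at each step, which is why the residual norm is available without forming the approximate solution.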
2.2. GMRES with variable preconditioning
If non-constant or non-linear preconditioners are adopted, the standard GMRES algorithm can
readily incorporate these changes into its algorithmic framework. Assume that the $j$th
preconditioning step can be symbolically denoted by
$$z_j = M_j^{-1} v_j$$
although the preconditioning matrix $M_j$ may not be expressed in explicit form in many cases. Then
the GMRES algorithm with variable preconditioning can be represented in the following form.4
Algorithm 2: Restarted GMRES(k) with variable preconditioning
1. Start: Choose $x_0$ and a dimension $k$ of the Krylov subspace.
   Define a $(k+1) \times k$ matrix $\bar{H}_k$ and initialize all its entries $h_{i,j}$ to zero.
2. Arnoldi process:
   (a) Compute: $r_0 = b - Ax_0$, $\beta_1 = \|r_0\|$, and $v_1 = r_0/\beta_1$.
   (b) For $j = 1, \ldots, k$ Do
       (i) Compute: $z_j = M_j^{-1} v_j$;
       (ii) Compute: $w = Az_j$;
       (iii) For $i = 1, \ldots, j$ Do: $h_{i,j} = w^T v_i$; $w \leftarrow w - h_{i,j} v_i$. End Do
       (iv) Compute: $\beta_{j+1} = h_{j+1,j} = \|w\|$ and $v_{j+1} = w/\beta_{j+1}$
       (v) Convergence check: if $\rho_j \le \varepsilon$ (given tolerance), go to (c)
       End Do
   (c) Define $Z_j = [z_1, \ldots, z_j]$
3. Form the approximate solution: Compute $x_j = x_0 + Z_j y_j$, where $y_j$ is the solution of
   $\min_{y \in \mathbb{R}^j} \|\beta_1 e_1 - \bar{H}_j y\|$
4. Restart: If $\rho_j \le \varepsilon$ stop, else set $x_0 \leftarrow x_j$ and go to 2.
As can be observed, this variant incurs no additional arithmetic cost but requires
extra memory to store the set of vectors $\{z_j\}$, $j = 1, \ldots, k$. An important property of this variant
is that it still satisfies a residual norm minimization property analogous to that of the
standard GMRES algorithm. The only difference is that the approximate
solution $x_j$ obtained at step $j$ minimizes the residual norm $\|b - Ax_j\|$ over the space $x_0 + \mathrm{Span}\{Z_j\}$. This
difference, however, no longer guarantees that $x_j$ is exact if $\beta_{j+1} = 0$ unless $\mu_j \ne 0$, or equivalently,
$H_j$ is non-singular. In addition, the iteration procedure may break down if $\beta_{j+1} = 0$ and $\mu_j = 0$.
However, this new approach provides the possibility of enhancing performance by choosing
the most appropriate preconditioner at each iteration step. This added feature therefore significantly
offsets the difficulties mentioned above. Furthermore, our experience shows that breakdown of the
GMRES iteration procedure does not happen in practice if a conventional preconditioning scheme
is adopted.
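A compact NumPy sketch of restarted GMRES(k) with a variable preconditioner may help to make the role of the stored vectors $z_j$ concrete. It is a minimal illustration under several assumptions that are not taken from the paper: the function name `fgmres`, its arguments, the dense least-squares solve via `np.linalg.lstsq` (used in place of Givens rotations for brevity), the relative stopping test, and the alternating Jacobi / lower-triangular preconditioner in the usage lines are all hypothetical choices.

```python
import numpy as np

def fgmres(A, b, precond, k=15, tol=1e-5, max_restarts=20):
    """Restarted GMRES(k); precond(v, j) returns z_j and may change with j."""
    n = b.size
    x = np.zeros(n)
    bnorm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - A @ x
        beta1 = np.linalg.norm(r)
        if beta1 <= tol * bnorm:
            break
        V = np.zeros((n, k + 1))
        Z = np.zeros((n, k))              # the extra k vectors z_j
        H = np.zeros((k + 1, k))
        V[:, 0] = r / beta1
        for j in range(k):
            Z[:, j] = precond(V[:, j], j)
            w = A @ Z[:, j]
            for i in range(j + 1):        # modified Gram-Schmidt
                H[i, j] = w @ V[:, i]
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            m = j + 1
            e1 = np.zeros(m + 1)
            e1[0] = beta1
            # small least-squares problem: min || beta_1 e_1 - H y ||
            y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
            rho = np.linalg.norm(e1 - H[:m + 1, :m] @ y)
            if rho <= tol * bnorm or H[j + 1, j] == 0.0:
                break
            V[:, j + 1] = w / H[j + 1, j]
        x = x + Z[:, :m] @ y              # update uses Z_j, not M^{-1} V_j
    return x

# Hypothetical usage: a diagonally dominant unsymmetric tridiagonal system.
n = 60
A = 4.0 * np.eye(n) - np.eye(n, k=1) - 1.5 * np.eye(n, k=-1)
b = np.ones(n)
L = np.tril(A)

def alternating(v, j):
    # even steps: Jacobi scaling; odd steps: one lower-triangular solve
    return v / np.diag(A) if j % 2 == 0 else np.linalg.solve(L, v)

x = fgmres(A, b, alternating)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

Because the preconditioner changes from step to step, the solution update has to use the stored $z_j$ directly, which is the source of the doubled memory requirement.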
In order to indicate the performance of the $j$th preconditioning step, a reduction factor $\gamma_j$ is
defined as
$$\gamma_j = \frac{\|v_j - Az_j\|}{\|v_j\|} = \|v_j - Az_j\| \tag{9}$$
Accordingly, a preconditioning step can be regarded as ‘good’ if the corresponding reduction
factor is reasonably small, say $\gamma_j \approx 0.5$. It is worth pointing out that this factor will play an
important role in the analysis to be conducted in Section 4. Next, a relation between $\gamma_j$ and $\rho_j$
will be established.
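As
$$\gamma_j^2 = \|Az_j\|^2 - 2v_j^T Az_j + 1 = \|Az_j\|^2 - 2h_{j,j} + 1$$
and the $j$th iteration of GMRES can be represented as
$$Az_j = V_j h_j + \beta_{j+1} v_{j+1}$$
i.e.
$$\|Az_j\|^2 = \|h_j\|^2 + \beta_{j+1}^2$$
Thus
$$\beta_{j+1}^2 = \gamma_j^2 - \|h_j\|^2 + 2h_{j,j} - 1 = \gamma_j^2 - (h_{j,j} - 1)^2 - \sum_{i=1}^{j-1} h_{i,j}^2 \le \gamma_j^2$$
namely
$$\beta_{j+1} \le \gamma_j \tag{10}$$
Therefore it follows that
$$\rho_j \le \frac{\gamma_j}{\sqrt{\beta_{j+1}^2 + \mu_j^2}}\, \rho_{j-1} \tag{11}$$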
which reveals the influence of the quality of a preconditioning step on the convergence of the
GMRES procedure. In particular, when the preconditioning is ‘exact’ at step $j$, i.e. $Az_j = v_j$, the
approximation $x_j$ will be exact if in addition $\mu_{j-1} \ne 0$ or $H_j$ is non-singular (note that $\mu_j$ would
not be equal to zero if $\mu_{j-1} \ne 0$ in this situation). More importantly, our numerical experience
suggests that the following relation normally holds for the case of $\gamma_j < 1$:
$$\rho_j \approx \gamma_j \rho_{j-1} \tag{12}$$
Applications of this expression and further numerical verification will be presented in Sections 4
and 5 respectively. For the case of $\gamma_j \ge 1$, i.e. a poor preconditioning step, the minimization property
of GMRES still ensures a non-increasing residual norm, $\rho_j \le \rho_{j-1}$.
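The identity behind the bound on the subdiagonal entry by the reduction factor can be checked numerically for a single Arnoldi step. The snippet below is a sketch with made-up data: the random matrix, the orthonormal basis and the crude diagonal "preconditioner" are assumptions chosen only to exercise the algebra, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, j = 12, 4
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)

# Orthonormal basis v_1..v_j (columns of V); the last column plays the role of v_j.
V, _ = np.linalg.qr(rng.standard_normal((n, j)))
v_j = V[:, -1]

# Any approximate preconditioning result z_j; here a crude diagonal solve.
z_j = v_j / np.diag(A)

w = A @ z_j
gamma_j = np.linalg.norm(v_j - w)        # reduction factor, since ||v_j|| = 1
h = V.T @ w                              # h_{1,j}, ..., h_{j,j}
beta_next = np.linalg.norm(w - V @ h)    # h_{j+1,j}

lhs = beta_next**2
rhs = gamma_j**2 - (h[-1] - 1.0)**2 - np.sum(h[:-1]**2)
```

Here `lhs` and `rhs` agree to rounding error, and `beta_next` does not exceed `gamma_j`, whatever approximate solve is used to produce `z_j`.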
3. THE GALERKIN MULTI-GRID METHOD
The essential multi-grid principle is based on the observation that the smooth (or long-wavelength)
part of the error, which may not be efficiently swept out by iterative methods, could be substantially
reduced by a coarse mesh correction. The success of MG strategies lies primarily in (i) their
excellent convergence characteristics, which theoretically should not depend on the size of the
finite element mesh; (ii) their high efficiency, whereby solutions of problems with $n_{eq}$ unknowns
are obtained with $O(n_{eq})$ work and storage for large classes of problems. Several
different multi-grid schemes have been put forward in the last decade.8–11 In this
paper we focus on one particular scheme termed the Galerkin multi-grid method proposed by
Feng et al. in Reference 7.
To illustrate the basic idea of the Galerkin multi-grid scheme we consider its two-grid form.
Suppose that $G_c$ and $G$ are, respectively, coarse and fine meshes which discretize the same
geometrical domain $\Omega$, and that the fine mesh represents the current problem considered.
It should be emphasized that the fine and coarse meshes can be totally non-nested as well as
fully unstructured. We use subscript $c$ to distinguish the quantities of the coarse mesh from those
of the fine mesh. Let $A_c$ be the coarse grid matrix, and $P$ and $Q$ be, respectively, the matrix
representations of the interpolation and projection operators. In the GMG method, the coarse mesh
matrix $A_c$ is constructed by direct projection of the fine mesh matrix as
$$A_c = QAP \tag{13}$$
Here the projection operator $Q$ is taken as $Q = P^T$. Therefore
$$A_c = P^T AP \tag{14}$$
Efficient computation of $A_c$ is crucial to achieve an overall high performance of the complete GMG
method. Such an implementation, together with other relevant issues, can be found in Reference 7.
Let $S(x, \nu)$ denote the smoother with $x$ as the initial guess and $\nu$ the maximum number of
iterations, and let $\nu_1$ and $\nu_2$ be the maximum numbers of iterations of the pre- and post-smoothing
procedures performed, respectively, before and after the coarse grid correction, which is accomplished by
a profile solver.
Then one cycle of two-level multi-grid iteration can be represented as follows:
Algorithm 3: One cycle of two-level multi-grid MG($b$, $S$, $\nu_1$, $\nu_2$)
• Pre-smoothing
  – Smoothing on fine mesh: $x \leftarrow S(0, \nu_1)$
  – Compute the new residual: $r = b - Ax$
• Coarse mesh correction
  – Project the residual: $r_c = P^T r$
  – Solve: $A_c x_c = r_c$
• Post-smoothing
  – Update initial guess: $x \leftarrow x + Px_c$
  – Smoothing on fine mesh: $x \leftarrow S(x, \nu_2)$
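The two-grid cycle above can be sketched on a toy problem. The following NumPy example is a simplified illustration and departs from the paper in ways that should be read as assumptions: it uses a 1-D Poisson matrix on nested meshes with linear interpolation, damped Jacobi as the smoother (the paper uses preconditioned CG or BiCGStab), and a dense solve in place of the profile solver. All function names are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Tridiagonal 1-D Poisson matrix with n interior nodes."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def linear_interpolation(nc):
    """P maps nc coarse values to 2*nc + 1 fine values (nested 1-D meshes)."""
    P = np.zeros((2 * nc + 1, nc))
    for c in range(nc):
        i = 2 * c + 1                 # fine node coinciding with coarse node c
        P[i, c] = 1.0
        P[i - 1, c] = 0.5
        P[i + 1, c] = 0.5
    return P

def jacobi(A, b, x, nu, omega=2.0 / 3.0):
    """nu sweeps of damped Jacobi, a stand-in for the smoother S(x, nu)."""
    d = np.diag(A)
    for _ in range(nu):
        x = x + omega * (b - A @ x) / d
    return x

def two_grid_cycle(A, P, Ac, b, nu1, nu2):
    x = jacobi(A, b, np.zeros_like(b), nu1)   # pre-smoothing from a zero guess
    r = b - A @ x                             # new residual
    xc = np.linalg.solve(Ac, P.T @ r)         # coarse mesh correction
    x = x + P @ xc                            # update
    return jacobi(A, b, x, nu2)               # post-smoothing

nc = 31
P = linear_interpolation(nc)
A = poisson_1d(2 * nc + 1)
Ac = P.T @ A @ P                              # Galerkin coarse matrix
v = np.random.default_rng(0).standard_normal(A.shape[0])
v /= np.linalg.norm(v)
z = two_grid_cycle(A, P, Ac, v, nu1=2, nu2=2)
gamma = np.linalg.norm(v - A @ z)             # reduction factor of the cycle
```

Note that the coarse matrix is built purely from `A` and `P`, with no coarse-mesh material or loading data, which is the property the Galerkin scheme relies on.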
Obviously, the efficiency of GMG will depend upon the quality of the coarse mesh and the
appropriate selection of interpolation and projection operators. Once $A_c$, $P$ and $Q$ are determined,
the performance of GMG will entirely depend on the smoother $S$ and the numbers of iterations
$\nu_1$, $\nu_2$. The practical selection of smoothers can range from very simple Jacobi (or DS), Gauss–
Seidel, SOR, SSOR to incomplete decomposition, and even to any iterative algorithm. In the present
work, preconditioned CG and BiCGStab algorithms are respectively chosen as the smoother for
symmetric and non-symmetric problems.
Naturally the previously defined reduction factor $\gamma$ can also be employed as an indicator to
measure the performance of each GMG iteration cycle, or more specifically, to indicate the accuracy
of the solution obtained by the cycle.
In the above algorithm, the smoothing procedure is terminated once the predetermined number
of iterations has been performed. The disadvantage of this strategy is that, as the plastic zone develops,
an increased number of smoothing iterations may be required to sweep out the high-frequency error
due to the increased ill-conditioning caused by the incompressible plastic flow. Alternatively, the
smoothing sweep can be stopped if the current reduction factor $\gamma$ is less than a prescribed tolerance
$\delta < 1$, i.e.
$$\gamma \le \delta \tag{15}$$
Instead of using either of these approaches, a scheme which combines both versions is actually
employed in the current work, namely, the GMG iteration cycle will be terminated (i) if an
approximate solution satisfying the specified tolerance $\delta$ is achieved; or (ii) if the predetermined
maximum number of iterations is performed. One of the advantages of this combined scheme will
be demonstrated in the next section.
As mentioned in the introduction, an important point is that one cycle of MG iteration can be
considered as a preconditioning step, denoted as
$$x = \mathrm{MG}(b, S, \nu_1, \nu_2, \delta) \tag{16}$$
Except for cases where linear stationary iterative algorithms are used as the smoother, this
preconditioning step will generally be non-linear in terms of the relation between $b$ and $x$.
Finally, it is important to note that the Galerkin multi-grid method has some distinct features
compared with its conventional MG counterparts. As the Galerkin strategy has been fully adopted for the
generation of coarse mesh equations and no material or loading information for coarse meshes is
utilized, the GMG approach is relatively easy to incorporate into existing solution procedures,
and is particularly suitable for implementation in material non-linear cases, including elasto-plastic
and frictional contact problems. For geometrically non-linear cases, the approach may use a constant transfer
operator throughout the whole solution process without significantly influencing the convergence
property. Other forms of coarse mesh evolution patterns have been extensively discussed in
Reference 12. Another important feature of the GMG method is that coarse and fine meshes can
be non-nested and unstructured, which not only allows for easy treatment of problems with complex
geometry, but also makes combination with adaptive mesh refinement
techniques straightforward. For more details regarding these features, see Reference 7.
4. GMRES WITH MULTI-GRID AS PRECONDITIONER
By employing one cycle of the multi-grid algorithm described above as the preconditioning scheme
in GMRES(k), a non-linear GMRES scheme is obtained, referred to as the MG-GMRES(k) algorithm.
This algorithm is identical to Algorithm 2 except that the preconditioning step 2(b)(i) is
replaced with
(i) MG preconditioning: $z_j = \mathrm{MG}(v_j, S, \nu_1, \nu_2, \delta)$
It is important to note that instead of considering the MG cycle as the preconditioner for GMRES,
the MG-GMRES(k) algorithm can be equivalently viewed as a new variant of the MG method
using GMRES to accelerate its outer iteration procedure. Rather than further exploiting the
theoretical aspects of the MG-GMRES(k) algorithm we will focus on some implementation issues that
may greatly influence the performance of the algorithm in practice.
4.1. Avoidance of matrix–vector multiplication
One matrix–vector multiplication is required at each iteration in the standard GMRES algorithm.
This multiplication can, however, be totally avoided in the MG-GMRES scheme if the reduction
factor $\gamma$ is monitored in each MG cycle. In fact, as discussed in the preceding section, $\gamma$ is
always evaluated within MG iterations in order to terminate the iteration if the condition $\gamma \le \delta$ is
satisfied. Therefore, the residual $r_j = v_j - Az_j$ is available after each MG cycle and consequently
the product $w = Az_j$ can be simply obtained as
$$w = v_j - r_j$$
With this modification, only $2j$ vector–vector operations are actually required at the $j$th outer
iteration of the MG-GMRES approach, which is, in terms of computational cost, normally negligible
in comparison with MG inner iterations, particularly for large-scale 3-D applications, unless the
number of outer loops is too large.
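The saving described here relies only on the inner solver handing back the residual it already holds. A minimal sketch, assuming a hypothetical Richardson-type inner iteration as a stand-in for the MG cycle (the name `inner_solve` and its parameters are illustrative, not from the paper):

```python
import numpy as np

def inner_solve(A, v, nu, omega):
    """Stand-in inner solver: returns z and the residual r = v - A z it maintains."""
    z = np.zeros_like(v)
    r = v.copy()
    for _ in range(nu):
        dz = omega * r
        z = z + dz
        r = r - A @ dz            # residual updated incrementally
    return z, r

rng = np.random.default_rng(2)
B = rng.standard_normal((10, 10))
A = B @ B.T + 10.0 * np.eye(10)   # arbitrary SPD test matrix
v = rng.standard_normal(10)
v /= np.linalg.norm(v)

z, r = inner_solve(A, v, nu=5, omega=0.02)
w = v - r                         # equals A z without an extra product
gamma = np.linalg.norm(r)         # reduction factor comes for free as well
```

The outer iteration then proceeds with `w` exactly as if it had computed `A @ z` itself.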
4.2. Termination of last MG iteration cycle
Our previous choice for the termination of the MG iteration cycle is that $\gamma < \delta$ or the
predetermined maximum number of iterations is performed. An obvious disadvantage of this choice is
that $(\nu_1 + \nu_2)$ fine grid smoothings, or the number required to give $\gamma < \delta$, are
always applied at each MG cycle, and that might lead to a higher accuracy than is actually
required in some cases. For example, when $\rho_{m-1}$ is close to satisfying the stopping criterion $\varepsilon$,
it is expected that a small number of fine grid smoothing iterations will be sufficient to give
$\rho_m < \varepsilon$. A simple strategy that may remedy this disadvantage is to estimate the
accuracy required of the solution at the last MG cycle. According to the empirical relation (12), the
required tolerance is easily estimated as
$$\delta_m = \varepsilon / \rho_{m-1}$$
and thus a considerable number of smoothing iterations may be saved at the last MG cycle if the
prescribed value of $\delta$ is too small and the value of $\nu_1 + \nu_2$ is too large. Of course, the efficiency of
this strategy depends on the validity of relation (12). Our limited experience has shown that
this relation appears to hold reasonably well for various cases. Further numerical verification
will be presented in the next section. In order to deal with the problem that the last GMG iteration
cycle $m$ is normally unknown a priori, we modify the expected tolerance at any iteration $j$ to be
$$\delta_j = \max\{\delta, \varepsilon / \rho_{j-1}\}$$
and correspondingly the preconditioning step becomes
$$z_j = \mathrm{MG}(v_j, \nu_1, \nu_2, \delta_j)$$
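The modified tolerance is a one-line rule; the short sketch below shows how it relaxes only when the outer residual is already close to the target. The function name `mg_tolerance` and the sample values are hypothetical, and the residual norm is assumed to be measured relative to the initial residual.

```python
def mg_tolerance(delta, eps, rho_prev):
    """Tolerance for the j-th MG cycle: max(delta, eps / rho_{j-1}), rho_{j-1} > 0."""
    return max(delta, eps / rho_prev)

delta, eps = 0.25, 1e-5
far = mg_tolerance(delta, eps, 1e-2)    # far from convergence: delta is kept
near = mg_tolerance(delta, eps, 2e-5)   # one mild reduction would suffice
```

With these values `far` stays at 0.25 while `near` is relaxed to 0.5, so the last cycle can stop smoothing earlier.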
4.3. Choice of the restart value of k
The parameters which need to be chosen in the MG-GMRES algorithm include the restart value
$k$, the pre- and post-smoothing numbers $\nu_1$, $\nu_2$ and the prescribed tolerance $\delta$. An appropriate
selection of a value for $k$ should lead to a good balance between the memory requirement and
the convergence rate under the restriction of the available computer resources. Without taking
into account the storage requirement of the MG iteration, the major memory requirement of the
outer GMRES procedure consists of $2k$ vectors $v_j$, $z_j$, $j = 1, \ldots, k$. As far as the convergence is
concerned, however, no theoretical result seems available to indicate the required iteration number
for any given $k$ under a general condition. For this reason, a simplified analysis will be conducted
to estimate the total number of iterations $n$ for a full version GMRES. Assume that a constant
reduction factor $\gamma$ is always achieved at each MG iteration cycle. Then by recursively applying
the relation (12), it follows that
$$\rho_n \approx \gamma_1 \cdots \gamma_n = \gamma^n \le \varepsilon$$
Hence $n$ may be estimated as
$$n \approx \frac{\log \varepsilon}{\log \gamma} \tag{17}$$
Table I lists the required number of iterations of the full version of GMRES for different reduction
factors $\gamma$ at tolerance level $\varepsilon = 10^{-5}$. Note that a very attractive property of the result is that
the convergence of GMRES is independent of the problem size. Based on the above analysis,
a recommended upper bound of $k$ is taken as 15. With this value, the storage requirement of
GMRES(k) generally accounts for a small part of the total memory required by the whole FE
computation, particularly for 3-D elasto-plastic applications. Furthermore, no restart may be
necessary in the case that $\gamma \le 0.5$, i.e. a full version may actually be used. Even if $\gamma \approx 0.7$, one
restart run is usually sufficient to obtain the solution, meaning that the convergence rate may not
be significantly degraded. Only in the case that the MG cycle performs poorly, e.g. $\gamma > 0.9$, will
a good convergence rate possibly not be attained. Nevertheless enhancing the MG iteration rather
than increasing $k$ may be more beneficial in this situation.

Table I. Required iterations $n$ of GMRES for different $\gamma$ at $\varepsilon = 10^{-5}$
$\gamma$   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
$n$        5     8     10    13    17    23    33    52    110
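The entries of Table I follow from estimate (17) rounded up to the next integer. A short sketch (the function name `estimated_iterations` is hypothetical; base-10 logarithms and a small guard are used only to avoid round-off when the ratio is an exact integer):

```python
import math

def estimated_iterations(gamma, eps=1e-5):
    """Smallest integer n with gamma**n <= eps, from n ~ log(eps) / log(gamma)."""
    ratio = math.log10(eps) / math.log10(gamma)
    return math.ceil(ratio - 1e-9)   # guard against round-off at exact integers

gammas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
table = [estimated_iterations(g) for g in gammas]
# table == [5, 8, 10, 13, 17, 23, 33, 52, 110]
```

The estimate depends only on the reduction factor and the tolerance, which is why it carries no dependence on the problem size.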
4.4. Selection of MG parameters
After $k$ is determined, the performance of the MG-GMRES(k) algorithm will entirely depend
on the selection of the MG parameters $\nu_1$, $\nu_2$ and $\delta$. Ideally, these parameters should be chosen in
such a way that the minimum computational cost in terms of CPU time is required to attain
the solution. Unfortunately, due to the complex relationship between these parameters and the
convergence characteristics of the MG-GMRES(k) algorithm, it is very difficult to give a simple
expression that can explicitly reflect the effects of these parameters on the total computational cost.
Instead, a simplified analysis similar to that of the previous subsection is presented
to give some indication of the effects on the total solution cost. Suppose that a constant $\gamma$ is
achieved at each MG cycle with the same computational cost $W_{in}(\gamma)$. If the cost of the outer
(full version) GMRES iteration process is denoted by $W_{out}(n)$ ($= O(n^2)$), where $n$ is the total
number of iterations determined approximately by equation (17), then the total computational cost
of MG-GMRES(k) can be expressed as
$$W = W_{out}(n) + nW_{in}(\gamma) \tag{18}$$
Figure 1. Relations of $W_{in}$, $W_{out}$ and $W$ with $\gamma$ for $\gamma \in (0, 1)$
The relations of $W_{in}$, $W_{out}$ and $W$ with $\gamma$ for $\gamma \in (0, 1)$ are respectively depicted in Figure 1 for
illustration purposes, in which the curve of $W_{in}(\gamma)$ is plotted on the basis of the observation that
in the multi-grid algorithm the convergence rate of the smoothing process normally slows down
with increase in the number of steps. The figure demonstrates that for a smaller $\gamma$, a relatively
small number of outer loops of GMRES are required to obtain the solution, but the cost of each
inner MG iteration will increase, and as a result, the overall performance of MG-GMRES may be
degraded in terms of CPU time requirement. On the contrary, for a larger value of $\gamma$, less cost
may be required by each inner MG iteration, but the significantly increased number of outer loops
may make the total cost too high. Therefore, there should exist an optimal value of $\gamma$, $\gamma_{opt}$, which
will minimize the total cost by achieving a balance between the cost of each MG cycle and the
number of outer GMRES iterations. However, it is quite difficult to determine this optimal value as
the relation of $W_{in}(\gamma)$ with $\gamma$ may be too complex to be well established, and furthermore, it may
be problem dependent. From a practical point of view, however, it is satisfactory if a value of $\gamma$
reasonably close to the optimum can be selected. Therefore, our target is to appropriately choose
the parameters $\nu_1$, $\nu_2$ and $\delta$ so that the resulting $\gamma$ may not be far away from the optimal value.
Numerical experiments show that for linear problems or at the early stage of elasto-plastic cases,
a small value for both $\nu_1$ and $\nu_2$ may be sufficient to achieve a relatively small $\gamma$, but with the
development of plastic zones, large values will become necessary in order to obtain a reasonably
small reduction factor. Based on this observation, in order to achieve a high efficiency for a wide
range of elasto-plastic non-linearity, $\delta$ should not be selected too small, whereas the value of $\nu_1 + \nu_2$
should be sufficiently large. It is important to point out that if the MG iteration cycle is stopped
only after $\nu_1$ and $\nu_2$ pre- and post-smoothing sweeps are performed, as currently used in the
literature, or stopped alternatively after $\gamma \le \delta$, the expected high performance of MG-GMRES(k)
for both linear and elasto-plastic situations may not be achieved.
4.5. Adoption of variable smoothers in MG
Similarly to GMRES, different smoothing algorithms may, at least in theory, be applied at each
cycle of MG iteration so that more robust MG, and thus MG-GMRES, procedures could also be
achieved.
5. NUMERICAL EXPERIMENTS
In this section, three elasto-plastic problems in two and three dimensions, including small and
finite strain situations, are presented to provide an assessment of the performance of the proposed
MG-GMRES(k) approach. The numerical experiments are also undertaken in order to further
verify relation (12) and the modified convergence check scheme for the last MG iteration cycle
proposed in the previous section. It should be emphasised that these examples are characterized
by large scale (up to 82 145 active DOF), strong non-linearity (nearly plastic limit state, necking
and localization) and severely ill-conditioned states (presence of loading limit points). Furthermore,
various situations including symmetric and unsymmetric as well as SPD and indefinite system
matrices are all covered. The behaviour of the MG-GMRES(k) algorithm for a metal forming
process involving frictional contact can be found in Reference 13.
The performance of the MG-GMRES(k) method is assessed by comparison with similar computations
carried out by the following algorithms: (i) the standard incomplete Cholesky factorization
preconditioned conjugate gradient method (IC-CG) in symmetric cases; (ii) the IC-BiCGStab
method for the unsymmetric problem; and (iii) the standard GMG algorithm. The standard GMRES
algorithm is not included because its performance in our unsymmetric case is inferior
to that of BiCGStab. The same tolerance of $10^{-5}$ for the relative residual norm is applied as
the termination criterion for the standard GMG, MG-GMRES and CG/BiCGStab procedures.
A two-grid form of the GMG scheme is used in the present work. It should be emphasised
that the fine and coarse meshes are totally non-nested and have been generated independently
from each other. Moreover, except for the fine mesh of the third example, all meshes are also
fully unstructured. The nodal numbering of the fine mesh, together with the profile of the coarse
mesh correction equation $A_c$, is optimized by Lewis’ implemented version of the Gibbs–King
algorithm.14,15 IC-CG and IC-BiCGStab are also employed, respectively, as the fine mesh
smoother in the symmetric and unsymmetric cases. The combination strategy as described in
Section 4 is adopted to switch between a MG cycle and a GMRES iteration, with the corresponding
parameters taken as $\nu_1 = 0$, $\nu_2 = 15$ and $\delta = 0.25$. As recommended, the GMRES restart value
$k$ is taken as 15. In addition, a constant transfer operator is used for geometrically non-linear
problems (Examples 2 and 3).
A full Newton–Raphson method is employed in all computations, and the convergence of the
finite element solution is established on the basis of the standard Euclidean norm of the out-of-balance force, with the tolerance taken as 10⁻⁴. All results presented here were obtained on an SGI
Challenge using one R4400 processor.
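The Newton–Raphson control just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-DOF model, its `residual` and `tangent` functions and the name `newton_raphson` are hypothetical; the stopping test is assumed to be relative to the initial out-of-balance force norm (the text only gives the tolerance 10⁻⁴); and a direct 2 × 2 solve stands in for the iterative linear solver.

```python
import math

def residual(u, load):
    # Hypothetical out-of-balance force of a two-DOF non-linear spring model.
    f_int = [4.0 * u[0] - u[1] + 0.5 * u[0] ** 3,
             -u[0] + 3.0 * u[1] + 0.2 * u[1] ** 3]
    return [load[0] - f_int[0], load[1] - f_int[1]]

def tangent(u):
    # Tangent stiffness: derivative of the internal force with respect to u.
    return [[4.0 + 1.5 * u[0] ** 2, -1.0],
            [-1.0, 3.0 + 0.6 * u[1] ** 2]]

def solve2(K, r):
    # Direct 2x2 solve (assumed stand-in for the iterative linear solver).
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(K[1][1] * r[0] - K[0][1] * r[1]) / det,
            (K[0][0] * r[1] - K[1][0] * r[0]) / det]

def newton_raphson(load, tol=1e-4, max_iter=25):
    u = [0.0, 0.0]
    r = residual(u, load)
    r0 = math.hypot(*r)  # Euclidean norm of the initial out-of-balance force
    for it in range(1, max_iter + 1):
        du = solve2(tangent(u), r)          # full Newton step
        u = [u[0] + du[0], u[1] + du[1]]
        r = residual(u, load)
        if math.hypot(*r) <= tol * r0:      # assumed relative convergence test
            return u, it
    raise RuntimeError("Newton-Raphson did not converge")

u, iterations = newton_raphson([2.0, 1.0])
```

In a full analysis the call to `solve2` would be replaced by one of the iterative solvers compared in this section, each stopped at its own relative residual tolerance.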
5.1. Problem descriptions
Example 1 (Three-dimensional beam—perfect elasto-plastic material at small strains). The
geometry of this example is characterized by a 4 × 4 × 20 block, with one end clamped and a distributed vertical load applied at the other end. The material is assumed to be perfect elasto-plastic,
Figure 2. Three-dimensional beam—perfect elasto-plastic case: (a) fine mesh: element type = 10-node tetrahedra, elements = 13 468, nodes = 20 521, DOF = 60 668; (b) coarse mesh: element type = 4-node tetrahedra, elements = 5411,
nodes = 1216, DOF = 3540
with Young’s modulus E and Poisson’s ratio ν taken, respectively, as E = 2·1 × 10⁵ and ν = 0·32
and the yield stress as σ_Y = 200. The fine mesh is composed of 13 468 unstructured 10-node
tetrahedral elements with 60 668 active DOF, while the coarse mesh consists of 5411 unstructured
4-node tetrahedral elements with 3540 active DOF. Both meshes are shown in Figure 2. The
load is applied gradually up to the total value of 184·93 and a curve depicting the total load
against the vertical displacement of the top corner, obtained by the MG-GMRES(15) approach,
is illustrated in Figure 3. As can be clearly seen from the curve, the plastic limit state is almost
reached at the final increment, implying a very ill-conditioned situation at that point.
Example 2 (Three-dimensional plate with a hole—perfect elasto-plastic with geometrical
non-linearity). This example simulates the finite stretch of a perfect elasto-plastic 3-D plate (of dimensions 20 × 10 × 5) with a cylindrical hole (of radius 5) in the centre. The elasto-plastic material
is defined by a Young’s modulus E = 70 and Poisson’s ratio ν = 0·29 and the yield stress is taken
as σ_Y = 0·243. Due to symmetry, only a quarter of the plate is considered in the analysis. The fine
mesh consists of 18 698 unstructured 10-node tetrahedral elements with 82 145 active DOF, while
the coarse mesh consists of 2194 unstructured 4-node tetrahedral elements with 1542 active DOF.
The two unstructured meshes of the problem are shown in Figures 4(a) and 4(b). A horizontal
stretch of u = 2·0 is imposed at the far end of the plate. The final configuration of the fine mesh
is shown in Figure 5, from which the occurrence of a necking phenomenon can be clearly seen.
Figure 6 depicts the force–displacement diagram obtained by MG-GMRES(15) during the load
incrementation. The curve reveals that the structure under the current loading condition passes
Figure 3. Three-dimensional beam—perfect elasto-plastic case. Force–displacement diagram and performance comparison
points
a loading limit point around u = 0·105, which implies a severely ill-conditioned situation near that
point, and a ‘softening’ behaviour thereafter, which gives rise to an indefinite system of equations.
It may be helpful to note that the final configuration might not represent the actual behaviour of
the structure as a consequence of the inadequate number of elements used, but this disagreement,
if it exists, does not affect our current purpose.
Example 3 (Two-dimensional bar—elasto-plastic geometrically non-linear case). The example
consists of the simulation of the finite stretch of a 53·334 mm × 12·826 mm rectangular bar under
plane strain conditions. A width reduction of 1·8 per cent is introduced in the centre of the bar to
trigger strain localisation. Besides the presence of both material and geometrical non-linearities, this
example also involves an unsymmetric system of equations, which arises from the adoption of a new 4-node
quadrilateral element for large straining of nearly incompressible solids, as described in Reference 16.
The material is assumed to be elasto-plastic with Young’s modulus E = 206·9, Poisson’s ratio
ν = 0·29, yield stress σ_Y = 0·45 and an isotropic hardening law defined by

    σ_Y(ε̄) = (σ_∞ − σ_0)[1 − exp(−δ ε̄)] + H ε̄

with constants σ_0 = 0·45; σ_∞ = 0·715; δ = 16·93 and H = −0·012924. The symmetry of the
problem results in a reduction of the computation domain to one quarter of the bar. Figures 7(a) and
7(b) illustrate the two non-nested meshes. A horizontal prescribed displacement u = 4·2 mm is applied
incrementally to the right end of the bar. Figure 7(c) depicts the final deformed configuration
Figure 4. Three-dimensional plate with a hole—perfect elasto-plastic with geometrical non-linearity: (a) fine mesh: element type = 10-node tetrahedra, elements = 18 698, nodes = 27 849, DOF = 82 145; (b) coarse mesh: element type = 4-node
tetrahedra, elements = 2194, nodes = 545, DOF = 1542
Figure 5. Three-dimensional plate with a hole—perfect elasto-plastic with geometrical non-linearity. Final deformed
configuration (u = 2·0)
corresponding to the prescribed displacement u = 4·2 mm, from which a very localized shear band
can be observed. The force–displacement curve obtained by MG-GMRES(15)
during the loading process is plotted in Figure 8, revealing non-linear behaviour similar to that of the
previous example. Note that the maximum force is reached near u = 2·75.
Figure 6. Three-dimensional plate with a hole—perfect elasto-plastic with geometrical non-linearity. Force–displacement
diagram and performance comparison points
Details of the above three examples, together with the corresponding DOF ratio between the
coarse and fine meshes, are summarised in Table II. It is noted that the equation order of Example 3
is about 10 000, which is regarded as a moderate system. The restriction that prevents us from
testing a larger problem lies in the fact that a very fine mesh may become so distorted, even at
a very early stage of the computation, that the solution procedure may not continue unless remeshing
is applied.
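As a quick arithmetic check, the coarse-to-fine DOF ratios can be recomputed from the active DOF counts quoted in the problem descriptions. The sketch below assumes the ratio is simply the coarse DOF count divided by the fine DOF count, expressed as a percentage; the names `meshes` and `dof_ratio_percent` are illustrative only.

```python
# (coarse DOF, fine DOF) for each example, taken from the problem descriptions.
meshes = {
    "Example 1 (3-D beam)": (3540, 60668),
    "Example 2 (3-D plate with a hole)": (1542, 82145),
    "Example 3 (2-D bar)": (1245, 9719),
}

def dof_ratio_percent(coarse_dof, fine_dof):
    # Coarse-to-fine DOF ratio expressed as a percentage (assumed definition).
    return 100.0 * coarse_dof / fine_dof

ratios = {name: dof_ratio_percent(c, f) for name, (c, f) in meshes.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f} %")
```

Under this assumed definition the first two values agree with the ratios quoted in Table II to the digits shown, while the third evaluates to roughly 12·81, within 0·01 of the tabulated 12·82.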
5.2. Performance comparison
The performance of the proposed MG-GMRES algorithm, together with CG, BiCGStab and the
standard GMG, is measured in terms of CPU time (s) and iterations (the number of fine mesh
sweeps for GMG and MG-GMRES). The CPU cost includes the time spent on the solution phase,
and also the time of forming and factorizing the coarse mesh matrix A_c for the GMG and MG-GMRES(k) methods, but excludes the cost of the incomplete Cholesky decomposition procedure.
As the transfer operator P is generated once for the whole non-linear computation, the corresponding cost is negligible with respect to the whole analysis computation cost and thus is not taken
into account. An important consideration in conducting the comparison is to provide a complete
picture of the behaviour of each algorithm for a wide range of non-linearity from the linear state
to very ill-conditioned situations. To achieve this goal, the performances of the algorithms for each
Figure 7. Two-dimensional bar—geometrically non-linear and elasto-plastic case: (a) fine mesh: element type = 4-node
quadrilateral, elements = 4800, nodes = 4961, DOF = 9719; (b) coarse mesh: element type = 4-node quadrilateral,
elements = 599, nodes = 663, DOF = 1245; (c) the final deformed configuration (u = 4·2)
example are assessed at several selected loading steps, each representing a different deformation
stage of the structure.
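The cost items listed above (forming and factorizing the coarse mesh matrix, fine mesh sweeps, and a transfer operator P built once) can be pictured with a small two-grid cycle. The sketch below is only an illustration under several assumptions that differ from the paper: it uses a nested 1-D model problem rather than non-nested unstructured meshes, Gauss–Seidel sweeps as the smoother in place of IC-CG or IC-BiCGStab, linear interpolation for P, a dense Gaussian elimination in place of a profile factorization, and the usual Galerkin form A_c = Pᵀ A P for the coarse operator. All function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

def solve_dense(A, b):
    # Gaussian elimination with partial pivoting (stand-in for factorizing A_c).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gauss_seidel(A, b, x, sweeps):
    # In-place Gauss-Seidel sweeps on the fine mesh (assumed smoother).
    for _ in range(sweeps):
        for i in range(len(b)):
            s = sum(A[i][j] * x[j] for j in range(len(b)) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

def two_grid_cycle(A, Ac, P, b, x, sweeps=2):
    x = gauss_seidel(A, b, x[:], sweeps)                 # pre-smoothing
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]     # fine mesh residual
    rc = matvec(transpose(P), r)                         # restriction by P^T
    ec = solve_dense(Ac, rc)                             # coarse mesh correction
    x = [xi + ei for xi, ei in zip(x, matvec(P, ec))]    # prolongation by P
    return gauss_seidel(A, b, x, sweeps)                 # post-smoothing

# Model problem: 1-D Laplacian on n = 2*nc + 1 interior fine nodes.
nc = 7
n = 2 * nc + 1
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
# Linear interpolation: coarse node c sits at fine index 2c + 1.
P = [[0.0] * nc for _ in range(n)]
for c in range(nc):
    P[2 * c + 1][c] = 1.0
    P[2 * c][c] = 0.5
    P[2 * c + 2][c] = 0.5
Ac = matmul(transpose(P), matmul(A, P))                  # Galerkin coarse operator
b = [1.0] * n
x = [0.0] * n
history = []                                             # relative residual per cycle
for _ in range(10):
    x = two_grid_cycle(A, Ac, P, b, x)
    history.append(norm([bi - ai for bi, ai in zip(b, matvec(A, x))]) / norm(b))
```

In this toy setting `P` and `Ac` are built once and reused in every cycle, which mirrors the accounting described above: the coarse matrix cost is charged to the solver, while the one-off construction of P is treated as negligible.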
Tables III–V summarize the averaged performances of CG/BiCGStab, GMG and MG-GMRES(15)
at each selected loading step for Examples 1, 2 and 3, respectively. The tables also list the
CPU speed-ups of GMG and MG-GMRES over CG/BiCGStab and the results corresponding to the
linear state. In addition, the averaged numbers of coarse mesh corrections for GMG and of outer
GMRES loops for MG-GMRES are given in brackets next to the iteration number. As an
unsymmetric system of equations arises due to the adoption of a new form of element in the third
example, BiCGStab is employed instead of CG for the comparison. The positions of the selected
loading steps for performance comparison for each example are also marked on the corresponding
solution curves shown in Figures 3, 6 and 8, respectively, while the number of Newton–Raphson
iterations required at each step can be found in the tables.
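The speed-up columns can be reproduced from the CPU columns, assuming the speed-up is the ratio of the CG (or BiCGStab) CPU time to the CPU time of the method in question; this definition is inferred from the tabulated values rather than stated explicitly. The short sketch below does this for the first five load levels of Table III (the last level is omitted because the GMG entries there are only bounds); the names `cpu_times` and `speed_up` are illustrative.

```python
# CPU times (s) for Example 1, transcribed from Table III:
# load level -> (CG, standard GMG, MG-GMRES(15)).
cpu_times = {
    "109.60": (695.85, 155.67, 151.91),
    "129.69": (685.21, 126.12, 122.43),
    "149.78": (700.94, 254.64, 231.96),
    "169.87": (764.63, 331.04, 276.64),
    "179.91": (993.88, 1392.59, 398.33),
}

def speed_up(reference_cpu, method_cpu):
    # Assumed definition: CPU time of the reference solver over that of the method.
    return reference_cpu / method_cpu

speed_ups = {load: (speed_up(cg, gmg), speed_up(cg, mg))
             for load, (cg, gmg, mg) in cpu_times.items()}
for load, (s_gmg, s_mg) in speed_ups.items():
    print(f"load {load}: GMG {s_gmg:.2f}, MG-GMRES {s_mg:.2f}")
```

A value below one, as for GMG at load 179·91, means the method was slower than the reference solver at that step.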
From the results, it is evident that MG-GMRES is the most efficient approach in terms of
both CPU cost and convergence in all circumstances. Firstly, MG-GMRES clearly outperforms
CG/BiCGStab with a CPU speed-up ranging from 1·31 to 5·98, although it is observed that
the performance of MG-GMRES in general degrades gradually as the systems become more ill-conditioned. It not only exhibits an excellent performance with a CPU speed-up from 3·54 up to
5·98 at the linear or early stage of elasto-plastic deformation of all three examples considered,
Figure 8. Two-dimensional bar—geometrically non-linear and elasto-plastic case. Force–displacement diagram and
performance comparison points
Table II. Summary of the three elasto-plastic examples

                                                                                                                        Coarse/fine
Example                    Description                              Mesh    Element type          Elements  Nodes   DOF     DOF ratio (%)
1  3-D beam                Small strain, nearly limit state         Coarse  4-node tetrahedral    5411      1216    3540    5·835
                                                                    Fine    10-node tetrahedral   13 468    20 521  60 668
2  3-D plate with a hole   Large strain with necking                Coarse  4-node tetrahedral    2194      545     1542    1·877
                                                                    Fine    10-node tetrahedral   18 698    27 849  82 145
3  2-D bar                 Large strain, localization, unsymmetric  Coarse  4-node quadrilateral  599       663     1245    12·82
                                                                    Fine    4-node quadrilateral  4800      4961    9719
but also achieves very good CPU speed-ups (1·65; 2·54 and 4·72 for the three examples, respectively)
in severely ill-conditioned cases (near load limit points). However, as the necking and localization develop, namely, as the stiffness matrices become indefinite, the performance of the MG-GMRES
method is usually degraded, although reasonable speed-ups have still been obtained. This fact
Table III. Comparison of performances of CG, GMG and MG-GMRES(15) (Example 1)

                                 CG                Standard GMG                    MG-GMRES(15)
Load             N-R iter. No.   Iter.  CPU        Iter.      CPU        Speed-up  Iter.     CPU      Speed-up
109·60 (Linear)  1               268    695·85     34(8)      155·67     4·47      32(8)     151·91   4·58
129·69           4               264    685·21     24(7)      126·12     5·43      22(7)     122·43   5·60
149·78           4               270    700·94     65(8)      254·64     2·75      58(8)     231·96   3·02
169·87           5               295    764·63     96(10)     331·04     2·31      74(9)     276·64   2·76
179·91           6               378    993·88     442(31)    1392·59    0·713     112(10)   398·33   2·50
184·93           8               492    1316·54    >712(49)   >2221·31   <0·593    226(17)   795·58   1·65
Table IV. Comparison of performances of CG, GMG and MG-GMRES(15) (Example 2)

                                 CG                Standard GMG                    MG-GMRES(15)
u                N-R iter. No.   Iter.  CPU        Iter.      CPU        Speed-up  Iter.     CPU      Speed-up
0·00 (Linear)    1               156    554·51     16(7)      115·89     4·78      13(6)     101·20   5·48
0·04             5               205    711·32     25(8)      153·78     4·63      22(7)     137·30   5·18
0·105            7               357    1278·0     205(16)    894·07     1·43      115(10)   524·38   2·54
0·30             5               365    1306·2     246(18)    1078·1     1·21      130(10)   587·83   2·22
Table V. Comparison of performances of BiCGStab, GMG and MG-GMRES(15) (Example 3)

                                 BiCGStab          Standard GMG                    MG-GMRES(15)
u                N-R iter. No.   Iter.  CPU        Iter.      CPU        Speed-up  Iter.     CPU      Speed-up
0·00 (Linear)    1               47     11·68      4(4)       3·24       3·60      4(4)      3·30     3·54
1·20711          4               298    73·53      50(7)      15·26      4·82      38(6)     12·29    5·98
2·66247          4               530    130·61     114(10)    32·02      4·08      77(8)     22·49    4·72
3·35269          12              309    75·70      328(24)    86·02      0·88      159(14)   44·13    1·72
4·20000          7               95     23·55      71(8)      21·41      1·10      59(7)     18·01    1·31
reveals that the current MG-GMRES algorithm may not be able to deal with indefinite systems very
efficiently.
Secondly, it is also clear that MG-GMRES is superior to the standard GMG algorithm in terms
of both convergence speed and computational cost, but with a totally different pattern from the
above CG/BiCGStab case. More specifically, the performance difference between GMG and
MG-GMRES is generally marginal at the linear stage, but gradually becomes considerable as
the plastic zones develop, particularly when the system is severely ill-conditioned. This general
trend indicates that the GMRES acceleration of the MG iterations may not be significant when the
MG cycle performs well (e.g. for linear cases), but may become very significant if the performance
of MG deteriorates (normally for poorly conditioned cases).
Figure 9. Comparison of the relative residual norm predicted by relation (12) and the actual value at one typical
Newton–Raphson iteration within an arbitrarily selected loading step: (a) Example 1; (b) Example 2; (c) Example 3
The computational results also reveal that the maximum number of outer GMRES loops will
normally not exceed 15 except for a few extremely ill-conditioned cases. Therefore the restart
number of k may be taken as less than 15 if the memory consumption becomes intensive.
The influence of the coarse mesh size on the convergence of the MG-GMRES method is not
addressed in the previous sections. It appears that the same coarse mesh is unlikely to achieve
the same computational efficiency at different stages of the elasto-plastic computations. Clearly, an
increase in coarse mesh size will definitely accelerate the convergence of the MG iteration, but
on the other hand, the computational costs involved in the construction and decomposition of the
coarse mesh matrices as well as in the correction steps will also grow accordingly. In the case
that the coarse/fine mesh ratio becomes too high, the above-mentioned overhead may completely
offset the benefit gained from the decrease of MG inner iterations, and hence no improvement will
be offered in terms of overall performance of MG-GMRES. Comparing the CPU speed-ups
of both GMG and MG-GMRES at different stages for all three examples suggests that the
coarse mesh in the second example is suitable for linear computation but may be inadequate for
the later stages of elasto-plastic analysis. For the first example, the coarse mesh is slightly over-refined
for the linear case but still not fine enough for the severe case. In the third example, the
relatively high speed-up achieved in the poorly conditioned case indicates that the coarse mesh is
probably appropriate for the severe elasto-plastic computation but over-refined for the linear case.
These observations are in fact consistent with the actual DOF ratios between the coarse and fine
meshes of the three examples listed in Table II. Therefore further research regarding the selection
of an appropriate coarse/fine mesh ratio, so that an optimal overall performance of MG-GMRES
can be achieved, is of practical importance.
5.3. Other numerical verifications
The validity of relation (12) is further investigated owing to its influence on the performance
of other aspects of the MG-GMRES approach. It is surprising to find that relation (12) is able
to provide an estimate very close to the actual error in all circumstances considered. Figures
9(a)–9(c) illustrate the actual relative residual norm against the value predicted by relation (12)
at one typical Newton–Raphson iteration within an arbitrarily selected loading step for the three
examples. Excellent agreement between the two values in all cases can be clearly observed.
Due to the success of relation (12), the modified termination scheme at the last MG cycle also
performs quite well. In general, more than 40 per cent saving in terms of inner MG iterations at
the last MG cycles for all three examples is obtained, and further saving is achieved in the poorly
conditioned cases, where more iterations are normally required at each MG cycle.
6. CONCLUSIONS
A combination of both GMRES and MG methods has been presented, in which each MG iteration
cycle is employed as the preconditioning step for the GMRES iteration procedure; or alternatively,
the GMRES procedure is employed to accelerate the standard MG iterations. The corresponding
algorithm, termed MG-GMRES, can be readily implemented within the framework of a non-linear
GMRES algorithm with variable preconditioner. The most attractive feature of this approach is that
it not only inherits all the advantages of both MG and GMRES approaches, but also significantly
enhances their performances in terms of computational efficiency and robustness.
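To make the idea of a GMRES iteration with a variable preconditioner concrete, the following is a minimal sketch of a restarted flexible GMRES in the spirit of Reference 4. It is not the authors' MG-GMRES: the switching strategy of Section 4 and the modified termination scheme are not reproduced, and a few Jacobi sweeps whose number changes from step to step stand in for the MG cycle. The function names `fgmres` and `jacobi_precond`, the test matrix and the sweep schedule are all assumptions made for illustration, and breakdown handling is limited to the exact-convergence case.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def fgmres(A, b, precond, k=15, tol=1e-5, max_restarts=50):
    """Restarted flexible GMRES; precond(v, j) may differ at every step j."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        beta = math.sqrt(dot(r, r))
        if beta <= tol * b_norm:             # relative residual test
            return x, beta / b_norm
        V = [[ri / beta for ri in r]]        # orthonormal basis vectors
        Z, H, rot, g = [], [], [], [beta]    # directions, R columns, rotations, rhs
        for j in range(k):
            z = precond(V[j], j)             # variable preconditioning step
            w = matvec(A, z)
            h = []
            for v in V:                      # modified Gram-Schmidt
                hij = dot(w, v)
                h.append(hij)
                w = [wi - hij * vi for wi, vi in zip(w, v)]
            h_next = math.sqrt(dot(w, w))
            for i, (c, s) in enumerate(rot): # apply earlier Givens rotations
                h[i], h[i + 1] = c * h[i] + s * h[i + 1], -s * h[i] + c * h[i + 1]
            denom = math.hypot(h[j], h_next)
            c, s = h[j] / denom, h_next / denom
            rot.append((c, s))
            h[j] = denom                     # new rotation zeroes h_next
            g.append(-s * g[j])
            g[j] = c * g[j]
            Z.append(z)
            H.append(h)
            if h_next > 0.0:
                V.append([wi / h_next for wi in w])
            if abs(g[j + 1]) <= tol * b_norm or h_next == 0.0:
                break
        m = len(H)
        y = [0.0] * m
        for i in reversed(range(m)):         # back substitution, R y = g
            acc = sum(H[jj][i] * y[jj] for jj in range(i + 1, m))
            y[i] = (g[i] - acc) / H[i][i]
        for jj in range(m):                  # update with stored directions Z
            x = [xi + y[jj] * zi for xi, zi in zip(x, Z[jj])]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    return x, math.sqrt(dot(r, r)) / b_norm

def jacobi_precond(A, sweeps_for_step):
    # Stand-in for an MG cycle: a step-dependent number of Jacobi sweeps.
    def apply(v, j):
        z = [0.0] * len(v)
        for _ in range(sweeps_for_step(j)):
            Az = matvec(A, z)
            z = [zi + (vi - azi) / A[i][i]
                 for i, (zi, vi, azi) in enumerate(zip(z, v, Az))]
        return z
    return apply

# Hypothetical unsymmetric, diagonally dominant tridiagonal test system.
n = 30
A = [[2.5 if i == j else -1.3 if j == i - 1 else -0.7 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x, rel_res = fgmres(A, b, jacobi_precond(A, lambda j: 1 + j % 3), k=15, tol=1e-5)
```

The defaults `k=15` and `tol=1e-5` echo the restart value and relative residual tolerance used in the experiments. The key difference from standard GMRES is that the preconditioned directions `Z` are stored and used for the solution update, which is what allows the preconditioner to change between steps.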
By means of numerical experiments, the performance of the new approach has been assessed
over a set of examples with a wide range of system conditioning. A very good performance of
the MG-GMRES method has been observed for both less ill-conditioned and severely ill-conditioned
problems, and also for both symmetric and unsymmetric cases, demonstrating that the GMG method
could become a very promising iterative method for solving elasto-plastic problems encountered
in practical applications.
ACKNOWLEDGEMENTS
This work is funded by the EPSRC of UK under grant No. GR/K88965. This support is gratefully
acknowledged.
REFERENCES
1. Y. Saad and M. Schultz, ‘GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems’,
SIAM J. Sci. Statist. Comput., 7, 856–869 (1986).
2. P. Sonneveld, ‘CGS: a fast Lanczos-type solver for nonsymmetric linear systems’, SIAM J. Sci. Statist. Comput., 10,
36–52 (1989).
3. H. van der Vorst, ‘Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric
linear equations’, SIAM J. Sci. Statist. Comput., 13, 631–644 (1992).
4. Y. Saad, ‘A flexible inner–outer preconditioned GMRES algorithm’, SIAM J. Sci. Comput., 14, 461–469 (1993).
5. H. van der Vorst and C. Vuik, ‘GMRESR: A family of nested GMRES methods’, Numer. Linear Algebra Appl., 1,
368–386 (1994).
6. T. E. Tezduyar, M. Behr, S. K. Aliabadi, S. Mittal and S. E. Ray, ‘A new mixed preconditioning method for finite
element computations’, Comput. Meth. Appl. Mech. Engng., 99, 27–42 (1992).
7. Y. T. Feng, D. Peric and D. R. J. Owen, ‘A non-nested Galerkin Multi-Grid method for solving linear and nonlinear
solid mechanics problems’, Comput. Meth. Appl. Mech. Engng., 144, 307–325 (1997).
8. A. Brandt, ‘Multi-level adaptive solution to boundary-value problems’, Math. Comput., 31, 333–390 (1977).
9. J. W. Ruge and K. Stuben, ‘Chapter 4: Algebraic multigrid’, in S. F. McCormick (ed.), Multigrid Methods, SIAM,
Philadelphia, Pennsylvania, USA, 1987, pp. 73–130.
10. W. Hackbuch, Multi-grid Methods And Applications, Springer, Berlin, Germany, 1985.
11. P. Wesseling, An Introduction to Multigrid Methods, Wiley, Chichester, U.K., 1992.
12. Y. T. Feng, D. Peric and D. R. J. Owen, ‘Coarse Mesh Evolution Strategies in the Galerkin Multigrid Method for
Solving Geometrically Nonlinear Problems’, 4th Int. Conf. on Computational Structures Technology, Edinburgh,
Scotland, U.K., 1998.
13. D. R. J. Owen, Y. T. Feng and D. Peric, ‘Iterative methods on parallel computers for FE simulation of 3-d sheet
forming operations’, in R. H. Wagoner, J. K. Lee and G. L. Kinzel (eds.), 3rd Int. Conf. on Numerical Simulation
of 3-D Sheet Forming Processes (NUMISHEET’96), Dearborn, Michigan, USA, 1996.
14. J. G. Lewis, ‘Implementation of the Gibbs-Poole-Stockmeyer and Gibbs-King algorithm’, ACM Trans. Math. Software,
8, 180–189 (1982).
15. N. E. Gibbs, ‘A hybrid profile reduction algorithm’, ACM Trans. Math. Software, 2, 378–387 (1976).
16. E. A. De Souza Neto, D. Peric, M. Dutko and D. R. J. Owen, ‘Design of simple low order finite elements for large
strain analysis of nearly incompressible solids’, Int. J. Solids Struct., 33, 3277–3296 (1996).