
A Supernodal Approach to Incomplete LU Factorization
Meiyue Shao
Department of Computing Science,
Umeå University, Sweden
[email protected]
Under the instruction of
Dr. Xiaoye S. Li and Prof. Weiguo Gao
Zürich, September 2009
Notations
— 1/27 —
A               coefficient matrix
L, U            triangular factors
F = L + U − I   filled matrix
Dr, Dc          diagonal scaling matrices
P, P0, Pr, Pc   permutation matrices
nnz(A)          number of nonzeros in A
nnz(F)/nnz(A)   fill ratio
(s : t)         the index sequence (s, s + 1, . . . , t)
Outline
• Introduction
• Sketch of the Algorithm
• Secondary Dropping
• Some Variations
• Numerical Experiments
• Conclusion
— 2/27 —
Introduction
— 3/27 —
ILUs are mainly classified into two types: structure-based and threshold-based.
• Structure-based ILU (symbolic):
  the nonzero pattern can be determined before the numerical factorization.
• Threshold-based ILU (numeric):
  nonzeros are determined on the fly.
Existing packages (algorithms), some derived from a direct solver
(e.g. UMFPACK, SuperLU, MA48, . . . ):
e.g. SPARSKIT (ILUTP), ILUPACK (ILUSTAB), . . .
Introduction
— 4/27 —
Our purpose:
• Derive an ILU algorithm from SuperLU for general unsymmetric matrices.
• Keep the supernodal structure in SuperLU.
• Generate the preconditioner even if only a limited amount of memory is available.
Introduction
— 5/27 —
Supernodal Structure in SuperLU
Storage:
• Supernode: consecutive columns with the same nonzero rows in the L part.
• The lower triangular part and the dense diagonal blocks are stored in L.
• The upper triangular part is stored in U.
Algorithm:
• left-looking
• supernodal-panel update
• partial pivoting
[Figure: block layout of the factors, showing the L and U storage and one panel.]
Outline
• Introduction
• Sketch of the Algorithm
• Secondary Dropping
• Some Variations
• Numerical Experiments
• Conclusion
— 6/27 —
Sketch of the Algorithm
— 7/27 —
• Preprocessing: ordering and equilibration
  Call MC64: A → P0 Dr A Dc, which is called an I-matrix (|a_ii| = 1, |a_ij| ≤ 1).
• Sparse ordering:
  P0 Dr A Dc → P0 Dr A Dc Pc^T (using graph(A^T A))
• Factorization:
  for each panel:
      symbolic factorization
      partial pivoting
      numerical factorization
      apply the dropping rules to L and U
  end loop
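As a concrete illustration, SciPy's `spilu` wraps SuperLU's ILU routine, so the pipeline above can be exercised end to end (a minimal sketch: the matrix and parameter values are illustrative, and the MC64 preprocessing step is not exposed by SciPy):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# A small sparse test matrix (diagonally dominant, so ILU behaves well).
n = 100
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# spilu wraps SuperLU's ILU: drop_tol plays the role of tau,
# fill_factor that of the fill-ratio bound gamma.
ilu = spilu(A, drop_tol=1e-4, fill_factor=10.0)

# The incomplete factors act as a preconditioner: apply (L~ U~)^{-1}.
b = np.ones(n)
x = ilu.solve(b)
# For a tridiagonal matrix there is no fill, so the factorization is
# essentially exact and the residual below is tiny.
residual = np.linalg.norm(A @ x - b)
```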
Dropping Rule
— 8/27 —
Threshold-based dropping criteria for ILU(τ):
• Dropping entries in U:
  if |u_ij| ≤ τ ||A(:, j)||∞, we set u_ij to zero.
• Dropping entries in L:
  in a supernode L(:, s : t), if ||L(i, s : t)||∞ ≤ τ, we set the ENTIRE i-th row to zero.
Entries in L are dropped once the boundary of a supernode is determined.
Compared with scalar ILU(τ):
• For 54 matrices with τ = 10−4, SILU+GMRES converged in 47 cases, versus 43 for
  scalar ILU+GMRES.
• SILU+GMRES is 2.3x faster than scalar ILU+GMRES.
SILU is reliable, but sometimes the fill ratio can be large.
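A dense toy sketch of the two dropping rules (helper names and array layout are illustrative, not SuperLU's internal data structures):

```python
import numpy as np

def drop_u(u_col, a_col_inf_norm, tau):
    """ILU(tau) dropping in U: zero out entries with
    |u_ij| <= tau * ||A(:, j)||_inf."""
    u = u_col.copy()
    u[np.abs(u) <= tau * a_col_inf_norm] = 0.0
    return u

def drop_l_supernode(l_block, tau):
    """Supernodal dropping in L: within a supernode L(:, s:t), a row i is
    dropped entirely when ||L(i, s:t)||_inf <= tau, so every column in the
    supernode keeps the same row structure."""
    out = l_block.copy()
    out[np.abs(out).max(axis=1) <= tau, :] = 0.0
    return out

u = drop_u(np.array([1.0, 1e-6, 0.5]), 2.0, 1e-4)      # 1e-6 is dropped
L = drop_l_supernode(np.array([[0.5, 1e-5],
                               [1e-6, 5e-5]]), 1e-4)    # row 1 is dropped whole
```

Dropping whole rows (rather than single entries) is what preserves the supernode: all its columns must continue to share one row index set.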
Outline
• Introduction
• Sketch of the Algorithm
• Secondary Dropping
• Some Variations
• Numerical Experiments
• Conclusion
— 9/27 —
Secondary Dropping
— 10/27 —
Existing methods:
• Y. Saad’s ILUTP (1994):
For a given integer p, keep at most p nonzeros in each column. Usually the entries
with largest modulus are kept.
We denote it as ILU(τ,p).
• A. Gupta & T. George’s approach (2008):
  Given γ, if nnz(L(:, j)) > γ nnz(A( j : n, j)), compute
      τs^{-1} = α dmax^{-1} + (1 − α) dmin^{-1},
  where α = γ nnz(A( j : n, j))/nnz(L(:, j)), and dmax and dmin are the maximum and
  minimum “scores” among the remaining rows, respectively.
  Then τs is used as the new dropping tolerance.
  Assuming a uniform distribution of the reciprocals of the row scores, this method
  can be considered a dynamic ILU(τ,p( j)) approach, where p( j) = γ nnz(A(:, j)).
Secondary Dropping
— 11/27 —
Area-based approach:
• Given a continuous function f ( j) such that f (n) ≤ γ, let
      γ( j) = nnz(F(:, 1 : j))/nnz(A(:, 1 : j));
  then drop “small” rows once γ( j) > f ( j).
• This is also a dynamic ILU(τ,p( j)) approach, where
      p( j) = max_k { ( f ( j) nnz(A(:, 1 : j)) − nnz(F(:, 1 : j − k)) ) / k , k }.
• Sometimes it is necessary to treat L and U separately. We can choose fL( j) and
  fU ( j) separately, as long as fL(n) + fU (n) ≤ γ.
  According to the data structure in SuperLU, we choose
      fU ( j) = 0.9 × γ/2 = 9γ/20,   fL( j) = (1 − j/(2n)) γ.
  Here fL( j) is chosen as the ratio of the lower triangular part in the first j columns
  (ratio of area).
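The two budgets can be written down directly (a small sketch with illustrative function names):

```python
def f_U(j, n, gamma):
    """Fill budget for U: fU(j) = 0.9 * gamma / 2 = 9*gamma/20
    (constant in j; n is kept only for a symmetric signature)."""
    return 0.9 * gamma / 2.0

def f_L(j, n, gamma):
    """Fill budget for L: fL(j) = (1 - j/(2n)) * gamma, i.e. the area ratio
    of the lower triangular part over the first j columns."""
    return (1.0 - j / (2.0 * n)) * gamma

# Consistency check: at j = n the combined budget is
# gamma/2 + 9*gamma/20 = 19*gamma/20 <= gamma, as required.
total_at_n = f_L(100, 100, 10.0) + f_U(100, 100, 10.0)
```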
Secondary Dropping
— 12/27 —
Adaptive τ approach:
• Since ILU(τ,p( j)) can be treated as a dynamic ILU(τ( j)), we can try to find τ( j) for
  each column directly.
• Given τ(1) = τ0, according to the area-based rule,
      τ( j + 1) = min{1, 2τ( j)}     if γ( j) > f ( j);
      τ( j + 1) = max{τ0, τ( j)/2}   if γ( j) ≤ f ( j).
• Gain: avoids sorting; τ( j) changes gently.
• Loss: the memory is not controlled as precisely as with area-based ILU(τ,p( j)).
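The update rule is only a few lines (an illustrative sketch):

```python
def next_tau(tau_j, gamma_j, f_j, tau0):
    """Adaptive-tau update: double tau(j) (capped at 1) when the running fill
    ratio gamma(j) exceeds its budget f(j); otherwise halve it, but never
    below the initial tolerance tau0."""
    if gamma_j > f_j:
        return min(1.0, 2.0 * tau_j)
    return max(tau0, tau_j / 2.0)

# Over budget: tau doubles; under budget: tau shrinks back toward tau0.
t_up = next_tau(1e-4, gamma_j=12.0, f_j=10.0, tau0=1e-4)
t_dn = next_tau(2e-4, gamma_j=8.0, f_j=10.0, tau0=1e-4)
t_cap = next_tau(0.8, gamma_j=12.0, f_j=10.0, tau0=1e-4)
```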
Secondary Dropping
— 13/27 —
Experiments: SILU+GMRES
• Use restarted GMRES with our ILU as a right preconditioner:
  solve Pr A (L̃Ũ)^{-1} y = Pr b.
• Size of the Krylov subspace is set to 50.
• Stopping criteria: ||b − A x_k||2 ≤ δ ||b||2 with δ = 10−8, or at most 1000 iterations.
Secondary Dropping
— 14/27 —
SILU for extended MHD calculation (fusion)
ILU parameters: τ = 10−4, γ = 10
(t: time in seconds; f-r: fill ratio; iter: GMRES iterations)

Problem      order    nonzeros  ILU(t)  ILU(f-r)  ITER(t)  iter  SuperLU(t)  SuperLU(f-r)
matrix31     17,298       2.7M     8.2       2.7      0.6     9        33.3          13.1
matrix41     30,258       4.7M    18.6       2.9      1.4    11       111.1          17.5
matrix61     66,978      10.6M    54.3       3.0      7.3    20       612.5          26.3
matrix121   263,538      42.5M   145.2       1.7     47.8    45        fail             -
matrix181   589,698      95.2M   415.0       1.7    716.0   289        fail             -
Secondary Dropping
— 15/27 —
Compare with column-based ILUTP
[Figure: performance profile Prob(Fill Ratio ≤ x), 0 ≤ x ≤ 16, comparing
ilu(1e−4,p), ilu(1e−8,p), column-based (1e−4, 1e−8), area-based (1e−4, 1e−8),
adaptive τ (1e−4), and ilu(1e−4).]
Here γ is set to 10.
Outline
• Introduction
• Sketch of the Algorithm
• Secondary Dropping
• Some Variations
• Numerical Experiments
• Conclusion
— 16/27 —
Classic MILU Method
— 17/27 —
• Classic column-wise MILU for column j:
  (1) Obtain the current filled column F(:, j);
  (2) Compute the sum of the dropped entries in F(:, j): S = Σ_dropped f_ij;
  (3) Set f_jj := f_jj + S;
  (4) Pivot: row interchange in the lower triangular part F( j : n, j);
  (5) Separate U and L: U(1 : j, j) = F(1 : j, j); L( j : n, j) = F( j : n, j)/F( j, j).
This algorithm preserves the column sums (e^T Pr A = e^T LU).
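Steps (2)-(3) carry the key idea; a dense, pedagogical sketch (not SuperLU code, and with pivoting omitted):

```python
import numpy as np

def milu_drop_column(f_col, j, tau):
    """Steps (2)-(3) of the classic column-wise MILU: drop small off-diagonal
    entries of the filled column F(:, j) and add their sum S back to the
    diagonal. Moving (rather than discarding) the dropped mass is what
    preserves the column sums e^T Pr A = e^T L U."""
    col = np.asarray(f_col, dtype=float).copy()
    off_diag = np.arange(col.size) != j
    dropped = off_diag & (np.abs(col) <= tau)
    S = col[dropped].sum()   # sum of dropped entries
    col[dropped] = 0.0
    col[j] += S              # diagonal compensation
    return col

c = np.array([2.0, 1e-6, -3e-6, 0.5])
out = milu_drop_column(c, 0, 1e-4)  # small entries move into the diagonal
```

The column sum of `out` equals that of `c`, which is exactly the invariant the slide states.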
Variations of MILU
— 18/27 —
• SMILU-1: supernodal MILU for column j
  (1) Compute the sum of the dropped entries in U(:, j): S = Σ_dropped u_ij;
  (2) Choose pivot row i such that i = arg max_{i ≥ j} | f_ij + S |;
  (3) Swap rows i and j, and set u_jj := f_ij + S;
  (4) IF j starts a new supernode THEN
        let (r : t) be the newly formed supernode (t ≡ j − 1);
        for each column k in the supernode (r ≤ k ≤ t):
            compute the sum of the dropped entries: S_k = Σ_dropped l_ik;
            set u_kk := u_kk + S_k · u_kk;
      END IF
  This algorithm does not preserve the column sums because of the delayed dropping.
• SMILU-2: S = |Σ_dropped u_ij|,  S_k = |Σ_dropped l_ik|.
• SMILU-3: S = Σ_dropped |u_ij|,  S_k = Σ_dropped |l_ik|.
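The three variants differ only in where the absolute value is taken; a sketch (illustrative helper, not the SuperLU implementation):

```python
import numpy as np

def smilu_compensation(dropped, variant):
    """Diagonal compensation term built from the dropped entries:
    SMILU-1 uses the signed sum (cancellation possible),
    SMILU-2 the absolute value of the sum (nonnegative, still cancels),
    SMILU-3 the sum of absolute values (nonnegative, no cancellation)."""
    d = np.asarray(dropped, dtype=float)
    if variant == 1:
        return d.sum()
    if variant == 2:
        return abs(d.sum())
    if variant == 3:
        return np.abs(d).sum()
    raise ValueError("variant must be 1, 2, or 3")

s1 = smilu_compensation([0.3, -0.1], 1)  # signed sum: 0.2
s2 = smilu_compensation([0.3, -0.1], 2)  # |sum|: 0.2
s3 = smilu_compensation([0.3, -0.1], 3)  # sum of |.|: 0.4
```

SMILU-3's compensation is the largest of the three, which pushes the diagonal furthest from zero at the cost of a less accurate factorization.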
Variations of MILU
— 19/27 —
Some statistics for ILU(τ) and SMILU (with GMRES)

                converge  slow  diverge  memory  zero pivots
τ = 10−4
  ILU                47     3        4       0         7786
  SMILU-1            35    15        4       0            9
  SMILU-2            44     6        4       0            9
  SMILU-3            38    14        2       0            9
τ = 10−6
  ILU                51     0        3       0          685
  SMILU-1            49     3        1       1            0
  SMILU-2            50     3        1       0            0
  SMILU-3            49     5        0       0            0
τ = 10−8
  ILU                52     1        0       1            0
  SMILU-1            50     3        0       1            0
  SMILU-2            50     3        0       1            0
  SMILU-3            51     3        0       0            0
(τ = 0.0)
  SuperLU            53     0        0       1            0

slow:    δ < ||r||2/||b||2 < 1 when the maximum number of iterations is exceeded
diverge: (numerically) ||r||2 ≥ ||b||2
memory:  out of memory
Relaxed Pivoting
— 20/27 —
Given η ∈ [0, 1], if | f_jj| ≥ η max_{i ≥ j} | f_ij|, then f_jj is chosen as the pivot.

Effect of diag_thresh (η) with ILU(τ):

Diag_thresh (η)                   1.0   0.8   0.5   0.1  0.01 0.001     0
SILU(1e-4)  number of successes    47    48    48    48    47    48    44
            average fill ratio  12.60 12.02 11.91 11.99 12.62 12.92 11.71
SILU(1e-6)  number of successes    51    51    51    51    51    51    45
            average fill ratio  28.64 28.84 28.67 28.62 28.85 29.06 29.87
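A sketch of the relaxed pivot choice for one column (illustrative, dense):

```python
import numpy as np

def choose_pivot(f_col, j, eta):
    """Relaxed (threshold) pivoting: keep the diagonal f_jj as the pivot when
    |f_jj| >= eta * max_{i >= j} |f_ij|; otherwise take the entry of largest
    magnitude on or below the diagonal. eta = 1 is classical partial
    pivoting; eta = 0 always keeps the diagonal (if it is nonzero)."""
    sub = np.abs(np.asarray(f_col, dtype=float)[j:])
    if sub[0] >= eta * sub.max():
        return j
    return j + int(sub.argmax())

p_keep = choose_pivot([9.0, 1.0, 0.2, 0.5], 1, 0.5)   # diagonal is large: keep j
p_swap = choose_pivot([9.0, 0.1, 2.0, 0.5], 1, 0.5)   # diagonal too small: swap
p_lazy = choose_pivot([9.0, 0.1, 2.0, 0.5], 1, 0.0)   # eta = 0: keep j anyway
```

Smaller η means fewer row swaps, which keeps supernodes larger; the table above shows accuracy barely suffers until η reaches 0.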
Breakdown Due to Zero Pivots
— 21/27 —
• Whether a zero pivot is encountered depends on the nonzero pattern; the
  numerical values affect the probability of encountering one.
• A 2 × 2 example:
      [ a  b ]
      [ c  0 ]
  Assuming a, b, c are drawn independently from the uniform distribution on [−1, 1],
  we have Prob{u_22 = 0} = τ/2 > 0.
• For a matrix of order n, the condition for encountering a zero pivot at column j
  for the first time is that there exists a permutation matrix P such that
  (PA)(1 : j − 1, 1 : j − 1) is nonsingular and (PA)( j : n, j) = 0.
  Under this condition, the probability of encountering a zero pivot is at least
  (τ/2)^{nnz(A)−n} (with the same distribution as above).
Handling the Breakdown
— 22/27 —
• Do NOT drop anything in the last several columns.
• Introduce a perturbation once a zero pivot is encountered:
  if f_jj = 0, set f_jj = τ̂ ||A(:, j)||∞.
  This is equivalent to adding a perturbation of size at most τ̂ to l_jk.
• We can choose τ̂ = τ so that the perturbation does not exceed the upper bound of the
  error propagated by the droppings.
• In our code, we choose τ̂( j) = 10^{−2(1− j/n)}. This prevents the diagonal entries of U
  from becoming too small.
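The schedule is a one-liner (illustrative sketch):

```python
def tau_hat(j, n):
    """Zero-pivot perturbation schedule: tau_hat(j) = 10**(-2 * (1 - j/n)).
    It grows from 10^-2 in the first column toward 1 in the last, so diagonal
    entries of U created late in the factorization are not left too small."""
    return 10.0 ** (-2.0 * (1.0 - j / n))

early = tau_hat(0, 100)    # 0.01 at the start
mid   = tau_hat(50, 100)   # 0.1 halfway through
late  = tau_hat(100, 100)  # 1.0 in the final column
```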
• Sometimes it is helpful, but NOT always.
Outline
• Introduction
• Sketch of the Algorithm
• Secondary Dropping
• Some Variations
• Numerical Experiments
• Conclusion
— 23/27 —
Numerical Experiments
— 24/27 —
Default values of the parameters of the ILU routine xGSITRF

Options                Default
MC64                   ON
equilibration          ON
drop tolerance (τ)     10−4
fill-ratio bound (γ)   10
diag_thresh (η)        0.1
column permutation     COLAMD
SMILU                  SMILU-2
secondary dropping     area-based
Iterative solver: GMRES
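Most of these defaults are reachable through SciPy's `spilu` wrapper around xGSITRF (a sketch: the matrix here is illustrative, and MC64, equilibration, and the SMILU variant are handled inside SuperLU rather than exposed as keywords):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

# A random sparse matrix made safely diagonally dominant for the demo.
n = 80
A = (sp.random(n, n, density=0.05, random_state=0)
     + 10.0 * sp.eye(n)).tocsc()

ilu = spilu(
    A,
    drop_tol=1e-4,           # tau
    fill_factor=10.0,        # gamma
    diag_pivot_thresh=0.1,   # eta
    permc_spec="COLAMD",     # column permutation
    drop_rule="basic,area",  # threshold dropping + area-based secondary dropping
)
x = ilu.solve(np.ones(n))
```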
Numerical Experiments
— 25/27 —
Compare with ILUPACK:
[Figure: performance profile Prob(Fill Ratio ≤ x), 0 ≤ x ≤ 10, comparing
ilu(1e−4), area-based (1e−4), and ILUPACK (1e−4).]
Conclusion
— 26/27 —
Summary:
• Use dual dropping rules
• Retain partial pivoting
• Retain supernode structure while dropping
• Faster and more reliable than classical column-based ILUTP
• Competitive with an inverse-based multilevel ILU method: ILUPACK
New routines of ILU are available in SuperLU 4.0.
http://crd.lbl.gov/~xiaoye/SuperLU/
Future work:
• Adjust the parameters automatically if the preconditioner does not work.
• Offer more iterative solvers.
• Parallel ILU in SuperLU DIST.
Some References
— 27/27 —
1. J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu. A
supernodal approach to sparse partial pivoting. SIAM J. Matrix Analysis and
Applications, 20(3):720-755, 1999.
2. J. W. Demmel, J. R. Gilbert, and X. S. Li. SuperLU Users’ Guide, 1999.
3. Y. Saad. ILUT: A dual threshold incomplete LU factorization. Numerical Linear
Algebra with Applications, 1(4):387-402, 1994.
4. A. Gupta and T. George. Adaptive techniques for improving the performance of
incomplete factorization preconditioning. IBM Research Report RC
24598(W0807-036), 2008.
5. M. Bollhöfer, Y. Saad, and O. Schenk. ILUPACK - preconditioning software
package. http://ilupack.tu-bs.de, TU Braunschweig.
6. S. C. Jardin, J. Breslau and N. Ferraro. A high-order implicit finite element method
for integrating the two-fluid magnetohydrodynamic equations in two dimensions.
Journal of Computational Physics, 226:2146-2174, 2007.
Thank you!