A Supernodal Approach to Incomplete LU Factorization

Meiyue Shao
Department of Computing Science, Umeå University, Sweden
[email protected]
Under the supervision of Dr. Xiaoye S. Li and Prof. Weiguo Gao
Zürich, September 2009

Notations

  A                coefficient matrix
  L, U             triangular factors
  F = L + U − I    filled matrix
  Dr, Dc           diagonal matrices
  P, P0, Pr, Pc    permutation matrices
  nnz(A)           number of nonzeros in A
  nnz(F)/nnz(A)    fill ratio
  (s : t)          (s, s + 1, . . . , t)

Outline

• Introduction
• Sketch of the Algorithm
• Secondary Dropping
• Some Variations
• Numerical Experiments
• Conclusion

Introduction

ILU methods are mainly classified into two types: structure-based and threshold-based.
• Structure-based ILU (symbolic): the nonzero pattern can be determined before the numerical factorization.
• Threshold-based ILU (numeric): nonzeros are determined on the fly; such methods are derived from a direct solver (e.g. UMFPACK, SuperLU, MA48, . . . ).
Existing packages (algorithms): e.g. SPARSKIT (ILUTP), ILUPACK (ILUSTAB), . . .

Our purpose:
• Derive an ILU algorithm from SuperLU for general unsymmetric matrices.
• Keep the supernodal structure in SuperLU.
• Generate the preconditioner even if only a limited amount of memory is available.

Supernodal Structure in SuperLU

Storage:
• Supernode: consecutive columns with the same nonzero rows in the L part.
• The lower triangular part and the dense diagonal blocks are stored in L.
• The upper triangular part is stored in U.

Algorithm:
• left-looking
• supernodal-panel update
• partial pivoting

[Figure: supernodal storage layout of L and U, with a panel highlighted.]

Sketch of the Algorithm

• Preprocessing: ordering and equilibration.
  Call MC64: A → P0 Dr A Dc, the so-called I-matrix (|a_ii| = 1, |a_ij| ≤ 1).
• Sparse ordering: P0 Dr A Dc → P0 Dr A Dc Pc^T (using graph(A^T A)).
• Factorization: for each panel,
    – Symbolic factorization
    – Partial pivoting
    – Numerical factorization
    – Apply the dropping rules to L and U
  end loop

Dropping Rule

Threshold-based dropping criteria for ILU(τ):
• Dropping elements in U: if |u_ij| ≤ τ‖A(:, j)‖∞, set u_ij to zero.
• Dropping elements in L: in a supernode L(:, s : t), if ‖L(i, s : t)‖∞ ≤ τ, set the ENTIRE i-th row to zero.
Elements in L are dropped once the boundary of a supernode is determined.

Comparison with scalar ILU(τ), over 54 matrices with τ = 10^−4:
• SILU+GMRES converged in 47 cases, versus 43 for scalar ILU+GMRES.
• SILU+GMRES is 2.3× faster than scalar ILU+GMRES.
SILU is reliable, but sometimes the fill ratio can be large.

Secondary Dropping

Existing methods:
• Y. Saad's ILUTP (1994): for a given integer p, keep at most p nonzeros in each column; usually the entries of largest modulus are kept. We denote it ILU(τ, p).
• A. Gupta & T. George's approach (2008): given γ, if nnz(L(:, j)) > γ nnz(A(j : n, j)), compute
      τ_s^−1 = α d_max^−1 + (1 − α) d_min^−1,
  where α = γ nnz(A(j : n, j))/nnz(L(:, j)), and d_max and d_min are the maximum and minimum "score" among the remaining rows, respectively. Then τ_s is used as the new dropping tolerance. Assuming a uniform distribution of the reciprocals of the row scores, this method can be viewed as a dynamic ILU(τ, p(j)) approach with p(j) = γ nnz(A(:, j)).

Area-based approach:
• Given a continuous function f(j) such that f(n) ≤ γ, let γ(j) = nnz(F(:, 1 : j))/nnz(A(:, 1 : j)); drop "small" rows once γ(j) > f(j).
• This is also a dynamic ILU(τ, p(j)) approach, where
      p(j) = max{ [f(j) nnz(A(:, 1 : j)) − nnz(F(:, 1 : j − k))]/k , k }.
• Sometimes it is necessary to treat L and U separately. We can choose fL(j) and fU(j) separately, so long as fL(n) + fU(n) ≤ γ.
According to the data structure in SuperLU, we choose
      fU(j) = 0.9 × γ/2 = 9γ/20,    fL(j) = (1 − j/(2n)) γ.
Here fL(j) is chosen according to the fraction of the first j columns lying in the lower triangular part (ratio of area).

Adaptive τ approach:
• Since ILU(τ, p(j)) can be treated as a dynamic ILU(τ(j)), we can try to find τ(j) for each column directly.
• Given τ(1) = τ0, following the area-based rule,
      τ(j + 1) = min{1, 2τ(j)}      if γ(j) > f(j);
      τ(j + 1) = max{τ0, τ(j)/2}    if γ(j) ≤ f(j).
• Gain: sorting is avoided, and τ(j) changes gently.
• Loss: the memory is not controlled as precisely as with area-based ILU(τ, p(j)).

Experiments: SILU+GMRES
• Use restarted GMRES with our ILU as a right preconditioner: solve Pr A (L̃Ũ)^−1 y = Pr b.
• Size of the Krylov subspace is set to 50.
• Stopping criteria: ‖b − A x_k‖₂ ≤ δ‖b‖₂ with δ = 10^−8, and at most 1000 iterations.

SILU for an extended MHD calculation (fusion), with ILU parameters τ = 10^−4, γ = 10 (t: time; f-r: fill ratio):

  Problem    order    nonzeros  ILU(t)  ILU(f-r)  ITER(t)  iter  SuperLU(t)  SuperLU(f-r)
  matrix31   17,298   2.7M      8.2     2.7       0.6      9     33.3        13.1
  matrix41   30,258   4.7M      18.6    2.9       1.4      11    111.1       17.5
  matrix61   66,978   10.6M     54.3    3.0       7.3      20    612.5       26.3
  matrix121  263,538  42.5M     145.2   1.7       47.8     45    fail        -
  matrix181  589,698  95.2M     415.0   1.7       716.0    289   fail        -

Comparison with column-based ILUTP:

[Figure: performance profile of the fill ratio, Prob(Fill Ratio ≤ x), for ilu(1e−4,p), ilu(1e−8,p), column-based and area-based secondary dropping at τ = 10^−4 and 10^−8, adaptive τ at 10^−4, and ilu(1e−4).] Here γ is set to 10.
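As a small self-contained illustration of the adaptive-τ update above (a sketch in Python, not the SuperLU implementation; the function and variable names are invented):

```python
def next_tau(tau, tau0, gamma_j, f_j):
    """One column step of the adaptive-tau rule: when the running fill ratio
    gamma(j) exceeds the budget f(j), double tau so that more entries are
    dropped; otherwise halve it, keeping tau within [tau0, 1]."""
    if gamma_j > f_j:
        return min(1.0, 2.0 * tau)
    return max(tau0, tau / 2.0)
```

Doubling and halving keep τ(j) changing gently from column to column, which is the stated advantage over the sorting required by ILU(τ, p(j)).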
Classic MILU Method

Classic column-wise MILU for column j:
(1) Obtain the current filled column F(:, j).
(2) Compute the sum of the dropped entries in F(:, j): S = Σ_dropped f_ij.
(3) Set f_jj to f_jj + S.
(4) Pivot: row interchange in the lower triangular part F(j : n, j).
(5) Separate U and L: U(1 : j, j) = F(1 : j, j); L(j : n, j) = F(j : n, j)/F(j, j).
This algorithm preserves the column sums (e^T Pr A = e^T LU).

Variations of MILU

• SMILU-1: supernodal MILU for column j:
  (1) Compute the sum of the dropped entries in U(:, j): S = Σ_dropped u_ij.
  (2) Choose the pivot row i such that i = argmax_{i≥j} |f_ij + S|.
  (3) Swap rows i and j, and set u_jj := f_ij + S.
  (4) IF j starts a new supernode THEN
        let (r : t) be the newly formed supernode (t ≡ j − 1);
        for each column k in the supernode (r ≤ k ≤ t):
          compute the sum of the dropped entries S_k = Σ_dropped l_ik;
          set u_kk := u_kk + S_k · u_kk.
      END IF
  This algorithm does not preserve the column sums because of the delayed dropping.
• SMILU-2: S = |Σ_dropped u_ij|,  S_k = |Σ_dropped l_ik|.
• SMILU-3: S = Σ_dropped |u_ij|,  S_k = Σ_dropped |l_ik|.

Some statistics for ILU(τ) and SMILU (with GMRES):

             converge  slow  diverge  memory  zero pivots
  τ = 10^−4
  ILU        47        3     4        0       7786
  SMILU-1    35        15    4        0       9
  SMILU-2    44        6     4        0       9
  SMILU-3    38        14    2        0       9
  τ = 10^−6
  ILU        51        0     3        0       685
  SMILU-1    49        3     1        1       0
  SMILU-2    50        3     1        0       0
  SMILU-3    49        5     0        0       0
  τ = 10^−8
  ILU        52        1     0        1       0
  SMILU-1    50        3     0        1       0
  SMILU-2    50        3     0        1       0
  SMILU-3    51        3     0        0       0
  (τ = 0.0)
  SuperLU    53        0     0        1       0

slow: δ < ‖r‖₂/‖b‖₂ < 1 when the maximum number of iterations is exceeded
diverge: (numerically) ‖r‖₂ ≥ ‖b‖₂
memory: out of memory

Relaxed Pivoting

Given η ∈ [0, 1], if |f_jj| ≥ η max_{i≥j} |f_ij|, then f_jj is chosen as the pivot.

Effect of diag thresh (η) with ILU(τ):
  Diag thresh (η)             1.0    0.8    0.5    0.1    0.01   0.001  0
  SILU(1e-4)  successes       47     48     48     48     47     48     44
              avg fill ratio  12.60  12.02  11.91  11.99  12.62  12.92  11.71
  SILU(1e-6)  successes       51     51     51     51     51     51     45
              avg fill ratio  28.64  28.84  28.67  28.62  28.85  29.06  29.87

Breakdown Due to Zero Pivots

• Whether zero pivots are encountered depends on the nonzero pattern; the numerical values affect the probability of encountering them.
• A 2 × 2 example:
      [ a  b ]
      [ c  0 ]
  Assuming a, b, c are drawn independently from the uniform distribution on [−1, 1], we have Prob{u_22 = 0} = γ/2 > 0.
• For a matrix of order n, the condition for encountering a zero pivot at column j for the first time is that there exists a permutation matrix P such that (PA)(1 : j − 1, 1 : j − 1) is nonsingular and (PA)(j : n, j) = 0. Under this condition, the probability of encountering a zero pivot is at least (γ/2)^(nnz(A)−n) (assuming the same distribution as above).

Handling the Breakdown

• Do NOT drop anything in the last several columns.
• Introduce a perturbation once a zero pivot is encountered: if f_jj = 0, set f_jj = τ̂ ‖A(:, j)‖∞. This is equivalent to adding a perturbation no greater than τ̂ at l_jk.
• We can choose τ̂ = τ, so that the perturbation does not exceed the upper bound of the error propagated by the droppings.
• In our code we choose τ̂(j) = 10^(−2(1 − j/n)). This prevents the diagonal entries of U from becoming too small.
• Sometimes this is helpful, but NOT always.
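The zero-pivot perturbation above can be sketched as follows (a minimal illustration, not the actual SuperLU routine; the function and variable names are invented):

```python
def pivot_with_perturbation(f_jj, col_norm_inf, j, n):
    """Return the pivot for column j (1-based) of an order-n matrix.
    If f_jj is exactly zero, replace it by tau_hat(j) * ||A(:, j)||_inf,
    where tau_hat(j) = 10**(-2 * (1 - j/n)) grows from about 1e-2 in the
    first columns to 1 in the last column, so the diagonal entries of U
    produced near the end of the factorization stay away from zero."""
    if f_jj != 0.0:
        return f_jj
    tau_hat = 10.0 ** (-2.0 * (1.0 - j / n))
    return tau_hat * col_norm_inf
```

The column-dependent τ̂(j) is a design compromise: a small perturbation early on limits the error introduced, while a larger one near the end avoids the tiny diagonal entries of U that make the triangular solves unstable.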
Numerical Experiments

Default values of the parameters of the ILU routine xGSITRF:

  Option                Default
  MC64                  ON
  equilibration         ON
  drop tolerance (τ)    10^−4
  fill-ratio bound (γ)  10
  diag thresh (η)       0.1
  column permutation    COLAMD
  SMILU                 SMILU-2
  secondary dropping    area-based

Iterative solver: GMRES.

Comparison with ILUPACK:

[Figure: performance profile of the fill ratio, Prob(Fill Ratio ≤ x), for ilu(1e−4), the area-based variant at 1e−4, and ILUPACK at 1e−4.]

Conclusion

Summary:
• Use dual dropping rules.
• Retain partial pivoting.
• Retain the supernodal structure while dropping.
• Faster and more reliable than the classical column-based ILUTP.
• Competitive with an inverse-based multilevel ILU method: ILUPACK.
The new ILU routines are available in SuperLU 4.0: http://crd.lbl.gov/~xiaoye/SuperLU/

Future work:
• Adjust the parameters automatically if the preconditioner does not work.
• Offer more iterative solvers.
• Parallel ILU in SuperLU DIST.

Some References

1. J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, and J. W. H. Liu. A supernodal approach to sparse partial pivoting. SIAM J. Matrix Analysis and Applications, 20(3):720-755, 1999.
2. J. W. Demmel, J. R. Gilbert, and X. S. Li. SuperLU Users' Guide, 1999.
3. Y. Saad. ILUT: a dual threshold incomplete LU factorization. Numerical Linear Algebra with Applications, 1(4):387-402, 1994.
4. A. Gupta and T. George. Adaptive techniques for improving the performance of incomplete factorization preconditioning. IBM Research Report RC 24598 (W0807-036), 2008.
5. M. Bollhöfer, Y. Saad, and O. Schenk. ILUPACK - preconditioning software package. http://ilupack.tu-bs.de, TU Braunschweig.
6. S. C. Jardin, J. Breslau, and N. Ferraro. A high-order implicit finite element method for integrating the two-fluid magnetohydrodynamic equations in two dimensions.
Journal of Computational Physics, 226:2146-2174, 2007. Thank you!