® Identification, Assessment and Correction of Ill-Conditioning and Numerical Instability in Linear and Integer Programs Ed Klotz ([email protected]), Math Programming Specialist, IBM © 2014 IBM Corporation IBM Software Group Objective Enable more precise assessment of ill conditioning in linear and integer programs – Multiple metrics available to assess ill conditioning Discuss some techniques to treat the symptoms and causes of ill conditioning in LP and MIP models 2 © 2014 IBM Corporation IBM Software Group Outline Finite precision computing fundamentals Description of ill conditioning Assessment of ill conditioning Alternate metrics for ill conditioning Numerical stability of algorithms Identification and treatment of symptoms of ill conditioning Identification and treatment of sources of ill conditioning Anomalies, misconceptions, inconsistencies and contradictions Examples that illustrate modeling pitfalls that can contribute to ill conditioning – Formulation alternatives Conclusions 3 © 2014 IBM Corporation IBM Software Group Finite Precision Computing Fundamentals 64 bit double precision is most commonly used in scientific applications – 32 bit single precision requires less memory, but is less accurate • Memory savings not significant for LP and MIP solvers anymore – 128 bit double precision is more accurate, but requires more memory and computing time Many floating point numbers cannot be represented exactly – Base of floating point representation determines those that can – Base 2 typically used • 4 Numbers that are integer linear combinations of (positive and negative) powers of 2 can be represented exactly, within the limits of the minimum and maximum possible exponents © 2014 IBM Corporation IBM Software Group Example: 64 Bit IEEE Double Representation 16 digits (53 bits, one bit implicit) 4 digits (11bits) 1/3 + + sign 0 0 0 3 3 … exponent 3 3 Absolute round-off error =( 10-16/ 3 ) 3 3 3 3 … Absolute round-off mantissa error =( 10-12/ 3 ) 10000/3 + + 0 0 4 3 3 … 3 3 3 3 3 3 … Absolute round-off error =( 10-20/ 3 ) 1/30000 5 + - 0 0 4 3 3 … 3 3 3 3 3 3 … © 2014 IBM Corporation IBM Software Group IEEE 64 Bit Double Addition 16 digits (53 bits, one bit implicit) 4 digits (11bits) 1/3 + 1/30000 + + sign + 0 0 0 3 3 … exponent + 0 3 Abs round-off error = 10-16/ 3 3 3 3 0 0 0 0 0 … Abs round-off mantissa 0 3 -17 error = 10-16 /3 3 … 3 Shifted exponent 3 … Abs round-off -21 error = 10-20 /3 1/30000 6 + - 0 0 4 3 3 3 3 3 … 3 3 … © 2014 IBM Corporation IBM Software Group IEEE 64 Bit Double Addition 16 digits (53 bits, one bit implicit) 4 digits (11bits) 1/3 + + + 0 0 Abs round-off error = 10-12/ 3 4 0 0 0 0 3 ... 3 3 … Abs round-off sign exponent mantissa error = 10-12/ 3 10000/3 + + 0 0 4 3 3 3 … 3 3 3 3 … 1/3 + + 0 0 0 3 3 3 3 3 ... 3 3 … 7 © 2014 IBM Corporation IBM Software Group IEEE 64 Bit Division Subtract exponents, divide mantissas – Errors in representation in mantissa determine magnitude of roundoff error Don’t divide big numbers by small numbers in data calculations For a > > b, compare a/b and b/a (a = 3, b = 1/30000, ε = 10-8 ) b/a ~ (b + ε )/a = b/a + ε /a (error ~ 10-8 ) a/b ~ a/(b + ε ) = a/b - aε / ( (b + ε )b) (error ~ 100 ) (a = 3, b = 1/30000, ε = 10-16 ) 8 b/a ~ (b + ε )/a = b/a + ε /a (error ~ 10-16 ) a/b ~ a/(b + ε ) = a/b - aε / ( (b + ε )b) (error ~ 10-8 ) © 2014 IBM Corporation IBM Software Group Implications of Finite Precision Representation Simply representing the model data can introduce roundoff errors – Larger numbers have larger absolute round-off errors in their representations Arithmetic calculations can introduce additional round-off errors Arithmetic calculations on numbers of the same order of magnitude are more accurate than calculations on numbers of different orders of magnitude 9 © 2014 IBM Corporation IBM Software Group Description Ill Conditioning – Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas? Data to 3 decimal places Meteorological Model (.506) Data to 6 decimal places Meteorological Model (.506127) 10 © 2014 IBM Corporation IBM Software Group Problem definition Ill Conditioning – Small change in input leads to big change in output Given x ∈ R n , y ∈ R m , y = f ( x ) For y + ∆ y = f ( x + ∆ x ), compute bound κ : ∆y ≤ κ ∆x – Can we quantitatively measure ill conditioning? • For many mathematical systems or models, quantitative measures have yet to be discovered. But, sometimes we can measure it. • Specifically, we can measure ill conditioning when solving square linear systems of equations 11 © 2014 IBM Corporation IBM Software Group Condition Number of a Square Matrix (Turing, 1948; Rice, 1966) CPLEX solves square linear systems of form: exact solution is: How will a change to the input vector b affect the computed solution x? Cauchy-Schwarz inequality: Cauchy-Schwarz for original system: Combine and rearrange: 12 Bx = b − 1 x= B b −1 x + δ x = B (b + δ b) −1 ⇒ δx = B δb δ x ≤ B− 1 ⋅ δ b b ≤ B ⋅ x δx δb −1 ≤ B ⋅ B ⋅ x b © 2014 IBM Corporation IBM Software Group Condition Number of a Square Matrix (ctd.) CPLEX solves square linear systems of form: exact solution is: Bx = b x = B − 1b How will a change to the input matrix B affect the computed solution x? ⇒ Bδ x = − δ B ( x + δ x ) Cauchy-Schwarz inequality: Rearrange: Multiply by 13 B B : ( B + δ B )( x + δ x ) = b Bx + δ Bx + Bδ x + δ Bδ x = b ⇒ − δ x = B − 1δ B ( x + δ x ) δ x ≤ B− 1 ⋅ δ b ⋅ x + δ x δx ≤ B− 1 ⋅ δ B x+ δx δx δB −1 ≤ B ⋅ B ⋅ x+ δx B © 2014 IBM Corporation IBM Software Group Condition Number Condition number of B is defined as δx x ≤ κ ( B) ⋅ δ b b δx ≤ κ ( B) ⋅ δ B x+ δx B κ ( B) = B ⋅ B− 1 As condition number increases, potential change in solution relative to (normwise) change in data also increases Even if the modeler doesn‘t change the data, finite precision computers can introduce small changes – Machine precision for 64 bit double = 1e-16 – Just moving from a Windows machine to an AIX machine can change precision enough to significantly influence results on an ill conditioned linear system 14 © 2014 IBM Corporation IBM Software Group Assessment of Ill Conditioning What constitutes a large or small value? – Depends on machine, data and algorithm precision (ε ), algorithm tolerances (t) – Ill conditioning can occur when round off error associated with machine precision is large enough to influence algorithm decisions δx x ≤ κ ( B) ⋅ δ b ≥t? b ε – Classify based on threshold defined by t /ε • Four distinct categories – Example: CPLEX has default algorithmic tolerances of 1e-6, runs double precision arithmetic on machines with precision of ~1e-16 • t / ɛ = 1e-6/1e-16 = 1e+10 is a key threshold 15 © 2014 IBM Corporation IBM Software Group Assessment of Ill Conditioning Condition number is a bound for the increase of the error: δx x ≤ κ ( B) ⋅ δ b b Basic epsilons: – Machine precision (double): 1e-16 – Default feasibility and optimality tolerance: 1e-6 Classification of condition numbers for LP bases: 16 ≤ κ(B) < 1e+7 – Stable: 0 – Suspicious: 1e+7 ≤ κ(B) < 1e+10 – Unstable: 1e+10 ≤ κ(B) < 1e+14 – Ill-posed: 1e+14 ≤ κ(B) © 2014 IBM Corporation IBM Software Group Assessment of Ill Conditioning What about MIP? Previously, CPLEX provided Kappa for the associated fixed LP: – Some indication of condition of primal solution • But optimal basis for fixed LP may not match that of node LP that yielded the associated integer solution – Optimality proof of MIP is based on pruning during tree search and thus not available with final solution How reliable is it? – Need to monitor condition number of all optimal bases used during Branch-and-Cut search – Performance impact – Can be mitigated by sampling 17 © 2014 IBM Corporation IBM Software Group Assessment of Ill Conditioning MIP Kappa feature, available starting with CPLEX 12.2 – Sample from the series of condition numbers • New parameter CPX_PARAM_MIPKAPPA with settings: -1: off 0: auto (defaults to off) 1: sample 2: use every optimal basis – Classification thresholds provide percentages of each category – Provide an assessment for users unfamiliar with ill conditioning If enabled, categorize condition numbers of optimal bases • • • • 18 Stable Suspicious Unstable Ill-posed © 2014 IBM Corporation IBM Software Group Assessment of Ill Conditioning MIP Kappa sample output : Branch-and-cut subproblem optimization: Max condition number: 3.5490e+16 Percentage of stable bases: 0.0% Percentage of suspicious bases: 86.9% Percentage of unstable bases: 13.0% Percentage of ill-posed bases: 0.1% Attention level: 0.048893 CPLEX encountered numerical difficulties while solving this model. Attention level – =0 if only stable bases encountered – >0 if at least one basis encountered that is not stable – Max value is 1 (all bases ill-posed) – Not “linear” 19 © 2014 IBM Corporation IBM Software Group Implications of Ill Conditioning Now that we can better assess the meaning of the basis condition numbers, what can we do about it? – Ill conditioning can occur under perfect arithmetic. • For some models, even perfect data, algorithm and machine precision may not address the problem. Consider adjustments to existing formulation, or alternate formulations that provide the solution to the ill conditioned model. – But, in most cases, finite precision can perturb the exact system of equations we wish to solve, resulting in significant changes to the computed solution. • Calculate data, formulate model and configure algorithm to keep such perturbations as small as possible – Condition number provides a worst case bound on the effect • CPLEX provides good quality solutions on majority of models containing some basis condition numbers in [1e+10, 1e+14] 20 © 2014 IBM Corporation IBM Software Group Implications of Ill Conditioning Sources of perturbations – Finite precision representation of exact data – Calculation of problem data in finite precision – Truncation of calculated data • Good idea if based on knowledge of the model and associated physical system (cleaning up the model data) • Bad idea if done arbitrarily without considering the implications for the model and associated physical system (garbage in, garbage out). – Errors in algorithmic calculations of data • Statistical methods to predict demand for production planning or asset returns for consideration in a financial portfolio – Errors in physical measurements of data values – Any other differences between the conceptual perfect precision calculation and the practical finite precision calculation • Example: addition and multiplication no longer associative and distributive under finite precision 21 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Condition number of Simplex Solutions – Simplex solution is intersection point of n hyperplanes 22 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Condition number of Simplex Solutions – Simplex solution is intersection point of n hyperplanes ∀∆b = change in hyperplane (input); ∆x = change in solution (output) ∆x κ = ∆x / ∆b ≈1: well conditioned! ∆b 23 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Condition number of Simplex Solutions – Simplex solution is intersection point of n hyperplanes κ = ∆x / ∆b >>1: ill conditioned! ∆x ∆b 24 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Condition number of Simplex Solutions – Simplex solution is intersection point of n hyperplanes stable Solution ill-conditioned Solution 25 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Distance to singularity of a matrix is the reciprocal of its condition number (Gastinel, Kahan). ∆B distp( B ) := min{ : B + ∆ B singular} ∆B || B || p distp( B ) = 1 / Κ p( B ) p – Implies that linear combinations of rows or columns of B that are close to 0 imply ill conditioning: T if B λ = v, || v ||< ε , || λ ||> > || ε ||, B is close to singular, and hence ill conditioned – λ provides a certificate of ill conditioning; its support identifies rows to examine 26 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Diagonals of U in LU factorization provide a proxy for basis condition number ( B ) = ( L )( D )(U ) −1 −1 −1 −1 ( B ) = (U )( D )( L ) 27 © 2014 IBM Corporation IBM Software Group Implications for the Practitioner (Data Input) Can’t avoid perturbations due to machine precision δx x ≤ κ ( B) ⋅ δ b ≥t? b ε – But, increasing tolerances when condition number is high can prevent algorithmic decisions based on round off error associated with machine precision. – More precise input values are better • Always calculate and input model data in double precision • Machine precision for 32 bit floats ~1e-8 28 Condition numbers > 1e+2 could result in algorithmic decisions based on machine precision based round off error If you really need to use single precision in the model data, increase the algorithms default tolerances above 1e-6 © 2014 IBM Corporation IBM Software Group Implications for the Practitioner (Data Calculation) Minimize perturbations involving other factors δx x ≤ κ ( B) ⋅ δ b ≥t? b ε – Model data values • Don’t divide big numbers by small numbers in data calculations Increases round off error • Make sure all procedures that calculate the data are implemented in a numerically stable manner • Less round off error if all data values of similar order of magnitude 29 Mix of large and small numbers results in more shifting of the exponents, loss of precision in the mantissa. – Use CPLEX’s aggressive scaling if unavoidable © 2014 IBM Corporation IBM Software Group Implications for the Practitioner (Formulation) Avoid nearly linear dependent rows or columns if B λ = v, || v ||< ε , || λ ||> > || ε ||, B is close to singular, and hence ill conditioned T – Such linear combinations of rows and columns often arise from round off error in the data 30 © 2014 IBM Corporation IBM Software Group Implications for the Practitioner (Formulation) Imprecise model data values and near singular matrices (example) – Avoid rounding if you can, or round as precisely as possible – Matrices can be ill conditioned despite small spread of coefficients – Exact formulation: if B T λ = v, || v ||< ε , || λ ||> > || ε ||, Maximize x1 + x2 c1: 1/3 x1 + 2/3 x2 = 1 B is close to singular, and hence ill conditioned c2: x1 + 2 x2 = 3 – Imprecisely rounded, single [double] precision Maximize x1 + x2 c1: .33333333 x1 + .66666667 x2 = 1 (results in near singular matrix) [ c1: .3333333333333333 x1 + .666666666666667 x2 = 1] (better) c2: x1 + 2x2 = 3 – Scale to integral value whenever possible: Maximize x1 + x2 c1: x1 + 2 x2 = 3 c2: x1 + 2 x2 = 3 31 (best) © 2014 IBM Corporation IBM Software Group Numerical Stability of Algorithms Numerical instability and ill conditioning are not the same – Ill conditioning can occur under perfect precision; numerical instability is specific to finite precision – Informally, an algorithm is numerically unstable if it performs calculations that introduce unnecessarily large amounts of round-off error – Formally, numerical stability (or lack thereof) involves error analysis Given x ∈ R n , y ∈ R m , y = f ( x ) Forward error analysis : ∆ y = fl ( f ( x )) − f ( x ) Backward error analysis : ∆ x : f ( x + ∆ x ) = fl ( f ( x )) • Forward: change in computed solution due to round-off errors • Backward: change in model (under perfect precision) required to achieve finite precision result • An algorithm is numerically stable when the bound on the backward error is small relative to the error in the input 32 © 2014 IBM Corporation IBM Software Group Numerical Stability of Algorithms Sources of numerical instability in finite precision algorithms and calculations – Performing arithmetic operations on numbers of dramatically different orders of magnitude • Look for mathematically equivalent calculation on numbers of more similar magnitude – Algorithms that rely on ill conditioned subproblems • Example: Gomory cuts become almost parallel in cutting plane algorithm as it nears convergence – Ill-conditioned transformations of the problem* • Example: LU factorization calculated with numerically unstable pivot selections (more on this soon) – Calculations involving large intermediate values compared to final solution values* • Small relative error for large intermediate values are much larger relative to final value * source: Higham, Accuracy and Stability of Numerical Algorithms 33 © 2014 IBM Corporation IBM Software Group Numerical Stability of Algorithms Example: stability test in the LU factorization calculation* ~ A= ε 1 1 - 1 = 1 1 ε 1 1 1 A = = ε -1 ε 1 ~ -1 ε + 1 A = -1 ε +1 1 ε + 1 ε ε + 1 0 ε -1 1 1 0 1 + ε 0 1 1 1 0 - (1 + ε ) 1. Divide small numbers into much larger ones 2. Large intermediate values 3. Ill conditioned transformation (Both matrices are well conditioned) *source: Higham, Accuracy and Stability of Numerical Algorithms 34 © 2014 IBM Corporation IBM Software Group Numerical Stability of Algorithms Resulting round-off error 0 ε -1 1 = 1 0 1 + ε ~ ~ ~ ε - 1 ε - 1 A - fl ( L ) fl (U ) = − 1 1 1 0 1 ~ ~ fl ( L ) fl (U ) = 1 ε 1 fl ( L) fl (U ) = ε ε -1 1 0 0 = 0 0 1 Should be 1 0 1 1 1 1 = 1 0 - (1 + ε ) 0 - 1 1 1 1 1 0 0 A - fl ( L) fl (U ) = − = ε - 1 0 - 1 ε 0 35 Should be ɛ © 2014 IBM Corporation IBM Software Group Numerical Stability of Algorithms Implications for the practitioner – Optimizer developers are responsible for stability of their algorithms • CPLEX applies such tests in factorizations, ratio tests, presolve operations, other places • Simpler software that doesn’t use numerically stable procedures in the implementation will not be consistently reliable – Practitioner is responsible for stability of algorithms used to calculate model data for optimizer • Statistical packages that do predictive analytics as input for prescriptive analytics • Watch out for ill-conditioned transformations of the problem • Watch out for data calculations involving large intermediate values compared to final data values 36 © 2014 IBM Corporation IBM Software Group Identification of symptoms of ill conditioning Tools for assessing presence of ill conditioning or excessive round-off error – Examine problem statistics of model before starting the optimization • Mixtures of large and small coefficients • Indications of nearly linearly dependent rows Values with repeating decimal places – Examine node or iteration log during the optimization • Loss of feasibility for LP/QP solves • Large iteration counts for node relaxations – Examine solution quality after the optimization • Significant primal or dual residuals often indicate large basis condition numbers – (If available) Examine the diagonals of U in B=LU (MINOS) – (If available) Run the MIP Kappa feature for MIPs (CPLEX) 37 © 2014 IBM Corporation IBM Software Group Identification of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/) Problem statistics: Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros : 43749 [Nneg: 36001, Box: 874, Free: 6874] : 24000 : 50622 [Greater: 38622, Equal: 12000] : 1406739 : 24000 Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros : Min LB: 0.000000 Max UB: 3.000000 : Min : 1.000000 Max : 100.0000 : : Min : 1.987766e-08 Max : 1364210. : Min : 0.0005000000 Max : 5.030775e+07 Wide range of coefficients; smallest below default feasibility, optimality tolerances 38 © 2014 IBM Corporation IBM Software Group Identification of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/) Iteration log #1: Loss of feasibility after basis refactorization Iteration: 674222 Dual objective = 913.318204 Should only Iteration: 674228 Dual objective = 913.318204 refactor every … <3 more refactorizations> 100+ iters Iteration: 674256 Dual objective = 913.318205 Iteration: 674258 Dual objective = 913.318205 Removing perturbation. Massive loss Iteration: 674259 Scaled dual infeas = 12123.146176 of feasibility Iteration: 674772 Scaled dual infeas = 595.276887 Elapsed time = 16959.32 sec. (6667412.82 ticks, 674876 iterations) ... Elapsed time = 17138.76 sec. (6737439.02 ticks, 681930 iterations) Iteration: 681949 Scaled dual infeas = 0.000002 ... Iteration: 682542 Scaled dual infeas = 0.000000 Objective Iteration: 682624 Dual objective = -19559.930294 much worse Iteration: 682896 Dual objective = -18109.597465 Elapsed time = 17160.88 sec. (6747443.75 ticks, 682941 iterations) 39 © 2014 IBM Corporation IBM Software Group Identification, of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/) Iteration log #2: Increase in Markowitz tolerance after frequent refactorizations of the basis Iteration: 783543 Dual objective Iteration: 783548 Dual objective Iteration: 783550 Dual objective Iteration: 783553 Dual objective Iteration: 783556 Dual objective Removing shift (209). Markowitz threshold set to 0.99999 Iteration: 783558 Dual objective = = = = = = 3.635840 3.635840 3.635840 3.635840 3.635840 3.635695 Should only refactor every 100+ iters CPLEX reacts to signs of trouble Dual feasibility preserved 40 © 2014 IBM Corporation IBM Software Group Identification, of symptoms of ill conditioning: ns1687037 (http://plato.asu.edu/ftp/lptestset/) Solution quality (available in all CPLEX APIs) Max. unscaled (scaled) bound infeas. = 8.39528e-07 (8.39528e-07) (Reduce feasibility tolerance, continue optimizing to decrease bound infeasibilities) Max. unscaled (scaled) reduced-cost infeas. = 2.31959e-08 (2.31959e-08) (Reduce optimality tolerance, continue optimizing to decrease reduced cost infeasibilities) Max. unscaled (scaled) Ax-b resid. = 3.51461e-07 (1.16886e-12) (If exceeds feasibility tolerance, CPLEX feasibility decisions based on round off error) Max. unscaled (scaled) c-B'pi resid. = 1.18561e-13 (1.18561e-13) (If exceeds optimality tolerance, CPLEX optimality decisions based on round off error) Max. unscaled (scaled) |x| = 24139.1 (24139.1) Max. unscaled (scaled) |slack| = 48278.2 (48278.2) Max. unscaled (scaled) |pi| = 76.2637 (76.2637) Max. unscaled (scaled) |red-cost| = 100 (100) Condition number of scaled basis = 2.2e+12 (Use to assess sensitivity of solution to perturbations in the model data) 41 © 2014 IBM Corporation IBM Software Group Treatment of symptoms of ill conditioning Distinguish the symptoms from the cause Treatment of symptoms often not as robust, but it may provide a quick resolution to a pressing problem CPLEX parameters to treat symptoms – Set the scale parameter to 1 • Geometric mean based scaling works well on models with wide range of coefficients – Increase the Markowitz tolerance from its default of 0.01 to . 90 or larger (max of .9999) • Tightens the pivot threshold in the row stability test of the LU factorization • Equivalently, tightens the bound on the sub diagonal elements of L from 1/.01 to 1/.9 – Turn on the numerical emphasis parameter • Causes CPLEX to invoke internal logic to perform more accurate calculations (including quad precision for the LU factorization) 42 © 2014 IBM Corporation IBM Software Group Treatment of symptoms of ill conditioning ns1687037 – Problem stats indicated wide range of coefficients in matrix • Well suited for setting scale parameter to 1 • Removed all problems (loss of feasibility, overly frequent basis refactorizations) seen in iteration logs • Results: Huge reductions in run times for dual simplex and barrier, modest reduction for primal simplex: Algorithm Settings Primal Dual Barrier Default 14214.6 21094.2 1258.23 Scaling=1 11164.64 907.5 83.52 • Solution quality was better as well 43 © 2014 IBM Corporation IBM Software Group Treatment of causes of ill conditioning ns1687037 – Problem stats indicated wide range of coefficients in matrix Linear constraints Nonzeros RHS nonzeros : : Min : Min : 1.987766e-08 Max : 0.0005000000 Max : 1364210. : 5.030775e+07 – Are the small coefficients of 1e-8 meaningful or due to round-off error in the data calculations? • Changing them to 0 results in an LP that solves to optimality within 1 second, just with presolve Suggests these coefficients have meaning, but may cause trouble for CPLEX’s default feasibility or optimality tolerances of 1e-6 • Modeller or owner of data needs to assess whether these coefficients are meaningful 44 © 2014 IBM Corporation IBM Software Group Treatment of causes of ill conditioning ns1687037 – Consider the constraints that contain the tiny coefficients • Fortunately, they appear repeatedly in small subsets of constraints • Reformulation of one subset will apply to other subsets R0002624: 50150 C0024008 + 50150 C0024010 + 50150 C0024012 + 50150 C0024014 + 50150 C0024016 + 113600 C0024020 + 50150 C0024024 + 113600 C0024026 + 113600 C0024038 + ... + 69070 C0025728 + 69070 C0025734 + 47585 C0025738 + 50150 C0025742 + 50150 C0025744 + 69070 C0025748 + C0025749 = 50307748 R0002625: - C0025749 + C0025750 >= 0 These variables R0002626: C0025749 + C0025750 >= 0 only appear in R0002627: 1.9877659e-8 C0025750 - C0025751 = 0 constraints on this R0002628: C0000001 - C0025751 >= 0 slide R0002629: C0000002 - C0025751 >= -0.0005 R0002630: C0000003 - C0025751 >= -0.0008 R0002631: C0000004 - C0025751 >= -0.0009 45 © 2014 IBM Corporation IBM Software Group Treatment of causes of ill conditioning ns1687037 These vars appear in other constraints R0002624: 50150 C0024008 + 50150 C0024010 + 50150 C0024012 + 50150 C0024014 + 50150 C0024016 + 113600 C0024020 + 50150 C0024024 + 113600 C0024026 + 113600 C0024038 + ... + 69070 C0025728 + 69070 C0025734 + 47585 C0025738 + 50150 C0025742 + 50150 C0025744 + 69070 C0025748 + C0025749 = 50307748 // C0025749 free; all others ≥ 0 R0002625: - C0025749 + C0025750 >= 0 C25750 = | C0025749| R0002626: C0025749 + C0025750 >= 0 R0002627: 1.9877659e-8 C0025750 - C0025751 = 0 R0002628: C0000001 - C0025751 >= 0 R0002629: C0000002 - C0025751 >= -0.0005 Scale abs. value of R0002630: C0000003 - C0025751 >= -0.0008 violation for R0002631: C0000004 - C0025751 >= -0.0009 R0002624 Penalty variables; appear here and in objective 46 © 2014 IBM Corporation IBM Software Group Treatment of causes of ill conditioning ns1687037 – Objective contains only variables like C0000001,…,C0000004 • Piecewise linear, higher cost for larger absolute violation of R0002624 • Move the scaling of absolute violation in the constraints to the objective Dramatically improves the coefficient spread in the constraint matrix, LU factors • Better: use unscaled violation. Objective value will be larger, but, if needed, recapture actual value after the optimization R0002627: 1.9877659 C0025750 – 1e+8 C0025751 = 0 R0002628: 1e+8 C0000001 – 1e+8 C0025751 >= 0 R0002629: 1e+8 C0000002 – 1e+8 C0025751 >= -50000 R0002630: 1e+8 C0000003 – 1e+8 C0025751 >= -80000 R0002631: 1e+8 C0000004 – 1e+8 C0025751 >= -90000 (x ʹ = (1e+8) x) R0002627: 1.9877659 C0025750 – C0025751ʹ = 0 R0002628: C0000001ʹ – C0025751ʹ >= 0 R0002629: C0000002ʹ – C0025751ʹ >= -50000 R0002630: C0000003ʹ – C0025751ʹ >= -80000 R0002631: C0000004ʹ – C0025751ʹ >= -90000 47 © 2014 IBM Corporation IBM Software Group Treatment of causes of ill conditioning ns1687037 – Reformulation: R0002624: 50150 C0024008 + 50150 C0024010 + 50150 C0024012 + 50150 C0024014 + 50150 C0024016 + 113600 C0024020 + 50150 C0024024 + 113600 C0024026 + 113600 C0024038 ... + 69070 C0025728 + 69070 C0025734 + 47585 C0025738 + 50150 C0025742 + 50150 C0025744 + 69070 C0025748 + C0025749 = 50307748 R0002625: - C0025749 + C0025750 >= 0 R0002626: C0025749 + C0025750 >= 0 R0002627: 1.9877659 C0025750 - C0025751 = 0 R0002628: C0000001 - C0025751 >= 0 R0002629: C0000002 - C0025751 >= -50000 R0002630: C0000003 - C0025751 >= -80000 R0002631: C0000004 - C0025751 >= -90000 48 © 2014 IBM Corporation IBM Software Group Treatment of causes of ill conditioning ns1687037 – Run times for original formulation: Algorithm Settings Primal Dual Barrier Default 14214.6 21094.2 1258.23 Scaling=1 11164.64 907.5 83.52 – Run times for modified formulation: Algorithm Settings 49 Primal Dual Barrier Default 2310.9 2926.5 41.4 Scaling=1 6890.8 1054.7 68.2 © 2014 IBM Corporation IBM Software Group Common sources of ill conditioning Ill conditioning can be caused by large or small subsets of constraints and variables in the model Such subsets can be difficult to isolate Build up a list of common sources, use that before more model specific analysis 50 © 2014 IBM Corporation IBM Software Group Common sources of ill conditioning Mixture of large and small coefficients in the model – Does not guarantee large basis condition numbers – Additional round-off in floating point representation, arithmetic calculations enables modest condition numbers to magnify error in computed solutions above optimizer tolerances Imprecise data resulting in near singular matrices – Near singular matrices have large condition numbers Long sequences of transfer constraints – Mixture of large and small coefficients is implicit rather than explicit This is not a comprehensive list – Add items based on your own modelling experiences 51 © 2014 IBM Corporation IBM Software Group Common sources Imprecise model data values ∆B distp( B ) := min{ : B + ∆ B singular} ∆B || B || p distp( B ) = 1 / Κ p( B ) p – Avoid rounding if you can, or round as precisely as possible – Matrices can be ill conditioned despite small spread of coefficients – Exact formulation: Maximize x1 + x2 c1: 1/3 x1 + 2/3 x2 = 1 c2: x1 + 2 x2 = 3 – Imprecisely rounded, single [double] precision Maximize x1 + x2 c1: .33333333 x1 + .66666667 x2 = 1 (results in near singular matrix) [ c1: .3333333333333333 x1 + .666666666666667 x2 = 1] (better) c2: x1 + 2x2 = 3 – Scale to integral value whenever possible: Maximize x1 + x2 c1: x1 + 2 x2 = 3 c2: x1 + 2 x2 = 3 52 (best) © 2014 IBM Corporation IBM Software Group Common sources of ill conditioning Long sequences of transfer constraints x1 = 2 x2 x2 = 2 x3 x3 = 2 x4 M x n − 1 = 2 xn xn = 1 x j ≥ 0 for j = 1,K, n All coefficients have same order of magnitude All coefficients can be represented exactly as IEEE doubles How bad can it be? 53 © 2014 IBM Corporation IBM Software Group Common sources of ill conditioning Long sequences of transfer constraints (ctd) – If any one variable > 0, all the others are basic as well – Κ=3*2n – Bound from condition number is fairly tight 1 -2 1 2 1 -2 ~ B = 1 O 1 2 1 1 ~ -1 B = 54 2 4 8 1 2 4 1 2 1 O 1 2 n-1 n-2 2 2 n -3 2 1 © 2014 IBM Corporation IBM Software Group Common sources of ill conditioning Long sequences of transfer constraints (ctd) – Substitute out variables: x1 = 2 x2 x2 = 2 x3 ( x1 = 4 x3 ) x3 = 2 x4 ( x1 = 8 x4 ) M xn − 1 = 2 xn ( x1 = 2n − 1 xn ) xn = 1 x j ≥ 0 for j = 1,K, n – Small change in xn propagates into large change in x1 55 © 2014 IBM Corporation IBM Software Group Diagnostics for ill conditioning and numerical instability Consider the list of common sources first Look at solution values – Extremely large primal or dual values can identify small subsets of constraints and variables involved in the ill-conditioning Then look at the basis and its inverse for large values – C API programs available among IBM Technotes* For MIPs, consider MIP Kappa feature – C API program available to export node LPs with conditions number above a user supplied threshold available as well* • Look at solution values, basis values or inverse values after locating a node LP with ill conditioned optimal basis *http://www-01.ibm.com/support/docview.wss?uid=swg21662382 56 © 2014 IBM Corporation IBM Software Group Diagnostics for ill conditioning and numerical instability Consider other metrics for ill conditioning Distance to singularity – Find linear combination of rows or columns that is closest to 0 • Since B is nonsingular, no linear combination is exactly 0 Min eT (u + w) s.t. c1 : BT λ − v = 0 c2 : v = u − w c 3 : eT λ = 1 λ , v free; u, v ≥ 0 v is a nonzero linear combination of the rows of B Used to minimize |v| in the objective – Nonzeros of optimal λ* identify a single subset of constraints involved in ill-conditioning if optimal objective value is small relative to λ* 57 © 2014 IBM Corporation IBM Software Group Diagnostics for ill conditioning and numerical instability Challenges with distance to singularity LP – B is known to be ill conditioned; that’s why this LP is of interest • May be better to start with all slack basis • Turn on numerical emphasis – λ* may identify a large number of constraints, making diagnosis difficult • Solve a MIP instead Min eT ( y + z ) s.t. c1 : λ = u − w c2 : u − y ≤ 0 c3 : w − z ≤ 0 Count the nonzero elements of λ; makes use of objective c4 : BT λ = s − t c5 : e s + e t ≤ ε T T c 6 : eT λ = 1 λ , free; s, t , u, w ≥ 0; y , z ∈ {0,1} 58 Nonzero linear combo of rows of B that is close to 0 © 2014 IBM Corporation IBM Software Group Diagnostics for ill conditioning and numerical instability Challenges with distance to singularity MIP – B is known to be ill conditioned, as with LP – Need to choose a small value for ɛ • Conceptually want ɛ = 1/κ, but too small for large κ associated with ill conditioned basis • Need to use larger value, which could compromise the usefulness of the resulting solution – Solving a MIP can be much harder than solving an LP 59 © 2014 IBM Corporation IBM Software Group Anomalies, Misconceptions, Inconsistencies and Contradictions Unscaled infeasibilities 3y1 + 5y2 + 799988z ≤ 800000, y1, y2 ∈ [0,100000], z binary After default scaling: 3/799988 y1 + 5/799988 y2 + z ≤ 800000/799988 y1 =0, y2= 2.5, z=1: lhs = 1 + 12.5/799988, rhs = 1 + 12/799988 Scaled infeasibilities = .5/799988 ~= 6.25e-7 Unscaled: lhs = 800000.5, rhs = 800000; unscaled infeasibilities = .5 – Remedies • Improve ratio of largest to smallest coefficients to make default scaling more effective If bounds of 100000 can be reduced to 1000, constraint becomes 3y1 +5y2 +7988z <= 8000, scaled infeasibility increases to ~6.25*10-5, excluding y1 = 0, y2 = 2.5, z=1 • If binary involved, try to replace with indicator constraints 60 z = 1 3y1 +5y2 <= 12 © 2014 IBM Corporation IBM Software Group Anomalies, Misconceptions, Inconsistencies and Contradictions Models on the edge of feasibility and infeasibility – This can happen with well-conditioned models – Example: c1 : - x1 + 24x2 ≤ 21; - ∞ ≤ x1 ≤ 3; x2 ≥ 1.00000008 • For CPLEX’s default feasibility tolerance of 1e-6, different bases can legitimately result in a declaration of feasibility or infeasibility 61 © 2014 IBM Corporation IBM Software Group Anomalies, Misconceptions, Inconsistencies and Contradictions c1 : - x1 + 24x2 ≤ 21; Models on the edge of feasibility and infeasibility – Different bases legitimately declare the model feasible or infeasible - ∞ ≤ x1 ≤ 3; x2 ≥ 1.00000008 – Presolve (all slack basis) c1 : - x1 + 24x2 ≤ 21; row min : - 1 * 3 + 24 * 1.00000008 = 21 + 1.92e − 6 ≥ 21 (decreasing x1 or increasing x2 from bound will not reduce infeasibility Presolve off, defaults otherwise (uses all slack basis again) Primal simplex - Infeasible: Infeasibility = 1.9199999990e-06 CPLEX > display solution reduced Variable Name Reduced Cost x1 -1.000000 x2 24.000000 Constraint Name Slack Value slack c1 -0.000002 62 © 2014 IBM Corporation IBM Software Group Anomalies, Misconceptions, Inconsistencies and Contradictions c1 : - x1 + 24x2 ≤ 21; Models on the edge of feasibility and infeasibility – Different bases legitimately declare the model feasible or infeasible - ∞ ≤ x1 ≤ 3; x2 ≥ 1.00000008 – Presolve off, defaults (starting basis of x2) • x1 = 3, slack on c1 = 0 x2 = 1.0 • Bound violation on x2 of .8e-8 is within CPLEX’s default feasibility tolerance of 1e-6 • Feasible and optimal basis, since objective is all zero – What should we do? • Consistent (infeasible) results obtained with feasibility tolerance of 1e-8 • Feasibility and optimality tolerances should be smaller than smallest legitimate data value • Need to decide if .00000008 is legitimate or round-off error 63 Legitimate: Reduce tolerances Round-off: Clean data; change lower bound of x2 to 1.0 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns1687037 (previously discussed) – Wide range of coefficients – Setting scaling to 1 helped – Recognize that smallest coefficients involved penalties on constraint violations that could be moved into the objective function, improving numerics of basis factorization – Reformulating the model to improve the numerics yielded additional improvements, addressed the underlying cause of the problem 64 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) Nodes Node Left * 0+ 0 0 0 ... Cuts/ Objective IInf Best Integer 0 -1.97987e+14 0 -6.33335e+16 704 -1.97987e+14 0 -6.31118e+16 702 -1.97987e+14 0 -6.26076e+16 631 -1.97987e+14 Best Bound ItCnt Gap -6.51749e+17 -6.33335e+16 Cuts: 1870 Cuts: 1870 71155 71155 185865 292616 --------- 0 0 -5.87363e+16 1206 -3.21168e+15 Cuts: 1604 4545331 --* 0+ 0 -7.75173e+15 -5.87363e+16 4654552 657.72% ... * 0+ 0 -1.16804e+16 -5.81032e+16 5626309 397.44% 0 0 -5.80853e+16 1585 -1.16804e+16 Cuts: 566 5632163 397.29% Heuristic still looking. 0 2 -5.80853e+16 1583 -1.16804e+16 -5.80853e+16 5633601 397.29% Elapsed time = 52951.48 sec. (11682283.45 ticks, tree = 0.01 MB, solutions = 17) 1 3 -5.80295e+16 1444 -1.16804e+16 -5.80853e+16 5643488 397.29% ... 12862 10763 -4.10127e+16 1032 -1.46845e+16 -4.29552e+16 29639775 192.52% Elapsed time = 71901.33 sec. (18469780.15 ticks, tree = 33.05 MB, solutions = 24) 12866 10767 -3.86482e+16 979 -1.46845e+16 -4.29552e+16 29661467 192.52% 65 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) – Problem statistics: Variables : 7891 [Fix: 1, Box: 3655, Binary: 4235] Objective nonzeros : 2383 Linear constraints : 9095 [Less: 8390, Greater: 645, Equal: 60] Nonzeros : 168227 RHS nonzeros : 4145 Variables : Min LB: 0.000000 Objective nonzeros : Min : 1.000000 Linear constraints : Nonzeros : Min : 1.000000 RHS nonzeros : Min : 1.000000 66 Max UB: 1.000000e+07 Max : 1.724400e+11 Max Max : 5.000000e+07 : 3000000. © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) – Typical node LP iteration log Wasted iterations slow node throughput Iteration log ... Iteration: 1 Scaled dual infeas = 0.000130 Iteration: 8 Scaled dual infeas = 0.000069 Iteration: 12 Dual objective = -6614900586660791.000000 Iteration: 23 Scaled dual infeas = 0.000107 Iteration: 32 Dual objective = -6614900586660791.000000 Iteration: 46 Dual infeasibility = 0.000038 Iteration: 52 Dual objective = -6614900586660791.000000 Iteration: 58 Dual infeasibility = 0.000038 Iteration: 64 Dual objective = -6614900586660791.000000 Maximum unscaled reduced-cost infeasibility = 7.62939e-06. Maximum scaled reduced-cost infeasibility = 7.62939e-06. Dual simplex - Optimal: Objective = -6.6149005867e+15 ... 67 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) – Solution quality for same node LP Max. unscaled (scaled) bound infeas. = 1.81311e-07 (1.81311e-07) Max. unscaled (scaled) reduced-cost infeas. = 7.62939e-06 (7.62939e-06) Max. unscaled (scaled) Ax - b resid. = 9.99989e-10 (6.10345e-14) Max. unscaled (scaled) c - B'pi resid. = 7.3125 (7.3125) Max. unscaled (scaled) |x| = 5596 (55296) Max. unscaled (scaled) |slack| = 6.12827e+06 (10.6908) Max. unscaled (scaled) |pi| = 8.67329e+16 (4.02442e+17) Max. unscaled (scaled) |red-cost| = 4.31401e+17 (4.31401e+17) Condition number of scaled basis = 5.2e+08 Reasonable optimal basis condition number 68 Incoming variable choice based on round-off error © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) – Large objective coefficients, not large basis condition numbers, cause slow node throughput • Model is numerically unstable, not ill conditioned • 16 base 10 digits of accuracy for IEEE doubles, objective coefficients on the order of 1e+11 Round-off error of 1e-5 just to represent Modest basis condition numbers of 1e+8 can magnify to 1e+8*1e-5 = 1e+3 Default optimality tolerance: 1e-6 • Simplex method pivots heavily influenced by round-off error – What can we do? • Take a closer look at the objective function 69 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) – Histogram of objective coefficients* OBJECTIVE Range Count [10^0,10^1]: 31 [10^9,10^10]: 1236 [10^11,10^12]: 1116 All binaries; relative contribution to objective is below CPLEX’s default relative MIP gap Current objective poorly scaled; would be well scaled if we deleted the 31 small objective coeffs. – Large objective coefficients problematic for dual feasibility of node LP, dual residuals in solution quality *Using program from https://www-304.ibm.com/support/docview.wss?uid=swg21400100. 70 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma (unsolved MIP from unstable test set of MIPLIB 2010) – 31 binaries with relatively small objective coefficients have negligible impact on objective coefficients • Especially when current solutions have relative MIP gaps of over 150% – Remove them from the objective • Remaining objective coefficients are all on the order of [1e+9,1e+12] • Rescale by 1e+9 – Much better results with adjusted model • Much faster node throughput • Much better intermediate results regarding MIP gap • Moderately better final MIP gap after ~20 hours – More to be done • MIP gap remains challenging • But at least now node throughput sufficiently fast to consider MIP parameter tuning, other changes to formulation 71 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets cdma Implications – Mixture of large and small coefficients can be problematic • Consider solving sequence of problems with a hierarchical objective rather than solving a problem with a single, blended objective – Examined node log to discover slow node throughput was a major performance bottleneck – Examined node LP iteration log, solution quality, problem statistics to identify large dual residuals as the primary source of slow node LP solve times – Adjusted formulation to obtain a well scaled objective 72 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets de063155 (LP from http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/ – CPLEX solves it in less than 0.1 seconds – Iterations logs indicate no sign of trouble – Problem statistics and solution quality raise questions regarding the solution and the associated physical system – Is the solution acceptable? • Depends • Examine the problem statistics and solution quality to find out. 73 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets de063155 (LP from http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/ – Problem stats Variables : 228, Other: 84] Objective nonzeros : Linear constraints : Nonzeros : RHS nonzeros : Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros 74 1488 [Nneg: 756, Fix: 205, Box: 215, Free: 852 852 [Less: 360, Equal: 492] 4553 777 : Min LB: -10000.00 Max UB: 30.90000 : Min : 1.279580e-05 Max : 1000.000 : : Min : 2.106480e-07 Max : 8.354500e+11 : Min : 0.0002187500 Max : 4.227560e+17 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets de063155 (LP from http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/ – Solution quality: There are no bound infeasibilities. There are no reduced-cost infeasibilities. Max. unscaled (scaled) Ax-b resid. = 747.949 (5.12641e-08) Max. unscaled (scaled) c-B'pi resid. = 7.74852e-10 (8.51025e-06) Max. unscaled (scaled) |x| = 3.10148e+13 (3.76112e+07) Max. unscaled (scaled) |slack| = 3.75814e+07 (3.75814e+07) Max. unscaled (scaled) |pi| = 62061.1 (5.106e+09) Max. unscaled (scaled) |red-cost| = 6.78639e+09 (8.5923e+09) Condition number of scaled basis = 1.7e+08 – Using aggressive scaling or turning on numerical emphasis does not improve the solution quality [verify scaling] 75 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets de063155 – Is the solution quality a problem? Max. unscaled (scaled) Ax-b resid. = 747.949 (5.12641e-08) Max. unscaled (scaled) c-B'pi resid. = 7.74852e-10 (8.51025e-06) Max. unscaled (scaled) |x| = 3.10148e+13 (3.76112e+07) Max. unscaled (scaled) |slack| = 3.75814e+07 (3.75814e+07) Max. unscaled (scaled) |pi| = 62061.1 (5.106e+09) Max. unscaled (scaled) |red-cost| = 6.78639e+09 (8.5923e+09) Condition number of scaled basis = 1.7e+08 – Unscaled primal residuals are large in an absolute sense, but not relative to primal solution values – Solution quality does not hinder performance – Practitioner must assess acceptability in context of the physical system being modelled • Look at the constraints with the large absolute residuals • Need more computing precision if not acceptable • Or need to reformulate model in order to eliminate large matrix and right hand side coefficients 76 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob (LP from http://www.sztaki.hu/meszaros/public_ftp/lptestset/problematic/ – Problem stats are benign – CPLEX and most other solvers indicate the model is infeasible – CPLEX’s conflict refiner generates a conflict that is feasible – CPLEX’s final basis has a condition number of ~1e+7, but small changes to the model make it feasible – There are some claims on the web that the model is feasible – Investigate the source of these inconsistencies 77 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob – Problem stats are benign (but still useful) Square linear system with all free variables Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros : 3001 [Free: 3001] : 3000 : 3001 [Equal: 3001] : 9000 : 3000 Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros : Min LB: all infinite Max UB: all infinite : Min : 1.000000 Max : 1.000000 : : Min : 0.01030000 Max : 9940.000 : Min : 1.000000 Max : 1.000000 – Has a feasible solution if the square basis matrix of all free variables is nonsingular 78 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob LP file Minimize obj.func: c2 + c3 + c4 + … + c2999 + c3000 + c3001 Subject to r1: 0.373 c2 + 1.23 c3 + 39.4 c4 + ... + 0.798 c3000 + 0.0889 c3001 = 1 r2: - 0.373 c1 + c2 = 1 r3: - 1.23 c1 + c3 = 1 r4: 39.4 c1 + c4 = 1 ... r3000: 0.798 c1 + c3000 = 1 r3001: - 0.0889 c1 + c3001 = 0 – All variables but c1 appear in the first constraint – Variable c1 appears in the remaining 3000 constraints, with one other variable • In each of these constraints, c1 has a coefficient whose absolute value is the coefficient of the other variable in the first constraint 79 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob, model formulation 3000 min ∑ x j subject to j= 2 3000 r1 : ∑ j= 2 α j xj = 1 rk : α k x1 + xk = bk k = 2,K,3001 x free 80 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob, structure of free variable basis matrix 3000 min ∑ x j x1 α 2 α 3 L α 3001 α 2 1 α 1 3 M O α 1 3000 α L 1 3001 r1 0 subject to j= 2 3000 r1 : ∑ α j xj = 1 j= 2 rk : α k x1 + xk = bk k = 2,K,3001 x free After permuting x1, r1 to last row and column: I B= α 81 T α 0 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob, closed form representation of LU factors and basis inverse I L= α T 0 1 α I U= 0 -α Tα δ Singular when 82 δ = 0 I B= α α α I δ −1 B = T α δ T T α 0 α δ 1 - δ © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob, singular when I B= T α vT B = 0 vT b ≠ 0 83 α ; 0 δ = α Tα = 0 let v = ( − α ,1), then (proof of singularity) (proof of infeasibility) © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob – Singular when δ = 0 – Becomes increasingly ill conditioned as δ → 0 • Gastinel and Kahan’s metric • Definition of condition number of B – δ = 0 for the data instance at the web site (under perfect precision) • CPLEX and the other solvers correctly declare infeasibility • Any slight perturbation to the data causes δ ≠ 0 Problem then is feasible (although potentially ill conditioned) • Model is on the boundary of feasibility and infeasibility – Excellent model for exploring ill conditioning • Model structure easily reproducible with just a few constraints and variables 84 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets iprob – Implication for practitioner • Needs to assess meaning of small and 0 values of the physical system being modelled δ in the context of Can check the model data in advance to calculate the value of Declare infeasible without solving in the singular case Consider reformulating if need to solve the boundary cases δ • Additional precision beyond 64 bits may help for small values 85 Exact rational arithmetic would be better. © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603 (http://miplib.zib.de/miplib2010-unstable.php). – Red model from MIPLIB 2010, never solved to optimality – Wide range of coefficients like cdma – Highly variable results • Different objective values claimed optimal with irrelevant changes to optimization • Dramatic variability in run time as well 86 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603, problem stats Variables Objective nonzeros Linear constraints 2488] Nonzeros RHS nonzeros Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros Imprecise rounding? 87 : 19300 [Nneg: 11712, Binary: 7588] : 5880 : 24754 [Less: 14706, Greater: 7560, Equal: : 77044 : 10088 : Min LB : 0.000000 Max UB : 1.000000 : Min : 1.000000 Max : 450.0000 : : Min : 0.04166670 Max : 1.000000e+08 : Min : 0.02083335 Max : 1.000000e+08 Wide coefficient range © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603, MIP Kappa output: Max condition number: 6.0818e+23 Percentage (number) of stable bases: 1.89% (108) Percentage (number) of suspicious bases: 60.29% (3446) Percentage (number) of unstable bases: 35.22% (2013) Percentage (number) of ill-posed bases: 2.61% (149) Attention level: 0.137747 – Κ = 1e+23 -> perturbations of 1e-16 can be magnified to 1e-16*1e+23 = 1e+7 – Need to look at node LPs with ill-posed optimal bases • http://www-01.ibm.com/support/docview.wss?uid=swg21662382 contains a program that writes out node LPs with optimal basis condition numbers that exceed a threshold provided by the user 88 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603, ill conditioned node LP solution quality Max. unscaled (scaled) bound infeas. = 1.72662e-12 (1.72662e-12) Max. unscaled (scaled) reduced-cost infeas.= 3.97904e-13 (3.97904e-13) Max. unscaled (scaled) Ax - b resid. = 7.73721e-07 (9.67152e-08) Max. unscaled (scaled) c - B'pi resid. = 4.33376e-12 (4.33376e-12) Max. unscaled (scaled) |x| = 4.8e+09 (6e+08) Max. unscaled (scaled) |slack| = 4.81015e+09 (6e+08) Max. unscaled (scaled) |pi| = 66773.2 (6.71089e+07) Max. unscaled (scaled) |red-cost| = 1.00002e+08 (1.34218e+08) Condition number of scaled basis = 8.6e+17 Large primal values (large red. costs too, but consider primal value first since primal residuals are larger 89 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603, ill conditioned node LP primal variable values Variable Name Solution Value C0213 0.007973 C0217 0.025673 ... C10142 36.450029 C10458 4799996160.003072 ... C11441 2399998080.001536 C11442 2399998080.001536 ... C11711 2583222987.267681 C11712 7.851078 ... C19155 0.000476 All other variables in the range 1-16642 are 0. 90 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603, constraints associated with large primal variable value Intersecting Constraints, Variable C11441, Model ns2122603 Variable C11441 coefficients Linear Constraint: Coefficient: Linear Objective 0 R13353 -0.0416667 R14613 1 R15873 1 1/24 ???? R17133 1 R18393 1 R13353: C10181 - 0.0416667 C11441 - 100000000 C12701 + 100000000 C14025 ≥ -100000000 Variable Name Solution Value C11441 2399998080.001536 The variable ``C12701'' is 0. The variable ``C14025'' is 0. The variable ``C10181'' is 0. 91 2400000000? (Change in input by 1e-7 changes output by ~2e+3) © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603 – Change .0416667 to 1/24 • Do likewise with some other coefficients in the model – Resulting constraint, after rescaling by 24: R13353: 24 C10181 + 2400000000 C14025 - 2400000000 C12701 - 1.0 C11441 ≥ -2400000000 – Inconsistencies, bad MIP Kappa stats persist – Mixture of large and small coefficients remain 92 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603 Binaries – Large coefficients appear to be big M values R13353: 24 C10181 + 2400000000 C14025 - 2400000000 C12701 - 1.0 C11441 ≥ -2400000000 – Try to remove the large coefficients using indicator variables – First need to interpret constraint • C14025 = 0 and C12701 = 1 C11441 – 24C10181 ≤ 0 • Otherwise, nonbinding • Need a binary that assumes value of 1 when C14025 = 0 and C12701 = 1, 0 otherwise Z ≥ C12701 - C14025 Z ≤ C12701 Z + C12701+C14025 ≤ 2 Z ≥ 1 C11441 – 24C10181 ≤ 0 93 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603 – Problem stats after replacing big M constraints with indicators Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros Indicator constraints Nonzeros RHS nonzeros Variables Objective nonzeros Linear constraints Nonzeros RHS nonzeros Indicator constraints: Nonzeros RHS nonzeros 94 : 23080 [Nneg: 11712, Binary: 11368] : 5880 : 28534 [Less: 18486, Greater: 7560, Equal: 2488] : 84604 : 10088 Previously : 2520 [Greater: 2520] [4e-2, 1e+8] : 3780 : 1260 : Min LB : 0.000000 Max UB : 1.000000 : Min : 1.000000 Max : 450.0000 : : Min : 1.000000 Max : 110555.0 : Min : 1.000000 Max : 1.295001e+07 : Min : 1.000000 Max : 24.00000 : Min : 1.000000 Max : 1.000000 © 2014 IBM Corporation IBM Software Group Examples from publicly available test sets ns2122603, indicator formulation Consistent results regardless of parameter settings, random seeds MIP model remains challenging – Now have a chance to address the MIP performance issues 95 © 2014 IBM Corporation IBM Software Group Summary of examples from publicly available test sets ns1687037 (LP; previously discussed) – Wide range of coefficients – Setting scaling to 1 helped – Reformulating the model to improve the numerics yielded additional improvements, addressed the underlying cause of the problem • Moving the scaling issue from the matrix to the objective removed the numerical problems from the basis matrix cdma (MIP) – – – – 96 Basis condition numbers OK Wide range of objective coefficients were the real problem Separate large objective coefficients, rescale Faster node throughput yields significantly better solutions faster, but solving MIP to optimality remains challenging © 2014 IBM Corporation IBM Software Group Summary of publicly available examples (ctd). de063155 (LP) – No performance problem; solves within a second – Problem statistics, solution quality are cause for concern – Large data values, significant absolute residuals that are relatively small • Need to assess whether residuals are acceptable in the context of the associated system being modelled iprob (LP) – Depending on data values can be Ill posed/singular, highly ill conditioned or well conditioned • Ill posed for specific data instance – Straightforward to mathematically represent the model • Closed form representation of relevant basis matrices – Use closed form representation to filter ill posed data instances – Decide based on associated system being modelled what to do ns2122603 (MIP) – Ill conditioned basis matrices yield inconsistent results – Replacing constraints with large big M values with CPLEX’s indicator constraints improves the formulation, yielding consistent results – Solving the MIP to optimality remains challenging 97 © 2014 IBM Corporation IBM Software Group Summary and Conclusion Key observations Ill conditioning can occur under perfect arithmetic – But finite precision introduces perturbations that ill conditioning may magnify • Can’t avoid machine precision Algorithms cannot be ill conditioned, but they can be numerically unstable – Avoid unnecessary round-off errors in computations – Ill conditioned transformations in algorithms promote ill numerical instability • Stability tests prevent dividing smaller numbers into much larger ones In CPLEX’s algorithms In data calculations, including algorithms (e.g., predictive analytics) Most LP and MIP solvers use absolute rather than relative tolerances – Models with limited accuracy or significant round-off error may require larger tolerances 98 © 2014 IBM Corporation IBM Software Group Summary and Conclusion Key recommendations Make sure optimizer doesn’t make algorithmic decisions based on round-off error – Distinguish meaningful data from round-off error – Large basis condition numbers can magnify round-off error – Solution quality can help assess this • Compare primal and dual residuals to feasibility and optimality tolerances, respectively Avoid single precision data calculations whenever possible – Trade-off between memory savings and accuracy is usually unfavorable – If necessary, use larger feasibility and optimality tolerances Build a list of known sources of ill conditioning Try to reproduce numerical issues on smaller model instances Make use of the numerous different metrics available to measure and assess ill conditioning 99 © 2014 IBM Corporation IBM Software Group Summary and Conclusions Tools for assessing presence of ill conditioning or excessive round-off error – Examine problem statistics of model before starting the optimization • Mixtures of large and small coefficients • Indications of nearly linearly dependent rows – Examine node or iteration log during the optimization • Loss of feasibility for LP/QP solves • Large iteration counts for node relaxations – Examine solution quality after the optimization • Significant primal or dual residuals often indicate large basis condition numbers – (If available) Examine the diagonals of U in B=LU (MINOS) – (If available) Run the MIP Kappa feature for MIPs (CPLEX) 100 © 2014 IBM Corporation IBM Software Group Summary and Conclusion Assessing high condition numbers – Increasing algorithm tolerances can mitigate effects of large condition numbers • But improvements to formulation, accuracy of input are more robust – 1e+10, 1e+14 are important thresholds on common machines when using 64 bit double precision – CPLEX’s MIP Kappa feature enables assessment of ill conditioning in MIPs CPLEX solves most models with some or all of these problems just fine – Familiarity with these remedies can save the practitioner precious time when they do encounter a badly ill conditioned model. 101 © 2014 IBM Corporation IBM Software Group Discussion What other features in CPLEX Optimization Studio would help you bridge the gap between the mathematical model and the practical application? – How useful would a minimal subset of an ill conditioned model that remains ill conditioned be? • Even if it involved a significant number of constraints and variables? Let us know ([email protected]). 102 © 2014 IBM Corporation IBM Software Group References/Further Reading More detailed discussion in INFORMS TutORials in Operations Research 2014 Higham, Accuracy and Stability of Numeric Algorithms Duff, Erisman and Reid, Direct Methods for Sparse Matrices Gill, Murray and Wright, Practical Optimization Golub and Van Loan, Matrix Computations Floating point arithmetic: http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/arith.flpt.html MIP Performance tuning and formulation strengthening: α – Klotz, Newman. Practical Guidelines for Solving Difficult Mixed Integer Programs http://www.sciencedirect.com/science/article/pii/S1876735413000020 LP performance issues – Klotz, Newman. Practical Guidelines for Solving Difficult Linear Programs http://www.sciencedirect.com/science/article/pii/S1876735412000189 Converting repeating decimals into rational fractions: http://en.wikipedia.org/wiki/Repeating_decimal#Converting_repeating_decimals_to _fractions 103 © 2014 IBM Corporation IBM Software Group Backup Material Backup Material 104 © 2014 IBM Corporation IBM Software Group Problem Definition Ill Conditioning – Motivated by work of meteorologist & mathematician Edward Lorenz – Lorenz focused on small changes in initial conditions, resulting trajectories in nonlinear meteorological models • Lorenz subsequently became a pioneer in the field of Chaos Theory – Ill conditioning extends beyond the nonlinear meteorological models on which Lorenz worked – More generally, a mathematical model or system is ill conditioned when a small change in the input can result in a large change to the computed solution 105 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Skeel’s condition number: || | B − 1 | ⋅ | B | || – Invariant under row scaling • Not invariant under column scaling – If significantly smaller than regular condition number, some rows of the matrix have larger norms than others 106 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Skeel’s condition number: – Example (http://www.hsl.rl.ac.uk/specs/mc75.pdf): c1: 3300 x1 + 1e-11 x2 = 1 c2: x1 + 3300 x2 + 1e-11 x3 = 1 c3: x2 + 3300 x3 + 1e-11 x4 = 1 c4: 10000 x2 + 10000 x3 + 330000000 x4 = 1 // much larger row norm xj free, j=1,...,4 – Skeel condition number: 1.0061 – CPLEX exact condition number (no scaling): 100036 – CPLEX exact condition number (default scaling): 10.0091 107 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis Skeel’s condition number: || | B −1 | ⋅ | B | || – Why does this metric measure sensitivity to perturbations? ( ) −1 • We saw how κ B = B ⋅ B measured potential magnification of error in the solution relative to perturbation in the input −1 • What is the underlying theoretical justification for || | B | ⋅ | B | || ? 108 © 2014 IBM Corporation IBM Software Group Alternate interpretations of Ill Conditioned Basis What is the underlying theoretical justification for || | B −1 | ⋅ | B | ||? – Use absolute values on individual components instead of norms during derivation δ B ≤ ε B (ε – Use componentwise perturbation instead of norms: (original system) (perturbed system) Bx = b (Combine and rearrange) > 0) ( B + δ B )( x + δ x ) = b −1 ⇒ − δ x = B δ B( x + δ x ) ≤ εB (Componentwise abs value) ⇒ δ x = B − 1δ B ( x + δ x ) ⇒ δ x = B − 1δ B ( x + δ x ) ≤ B − 1 ⋅ B ⋅ ( x + δ x ) ε ⇒ δ x ( x + δ x) ≤ B − 1 ⋅ B ⋅ ε 109 © 2014 IBM Corporation IBM Software Group Examples Consider alternate formulations to improve numerics – Fixed costs on continuous variables using big Ms: Minimize cT x + f T z subject to Ax = b xi − Mzi ≤ 0 ( c , f ≥ 0) (only constraint with zi ) xi ≥ 0, 0 ≤ zi ≤ 1 (Mixture of large and small numbers) zi integer • LP relaxation solution xi ≤ Mzi ⇒ xi / M ≤ zi ⇒ zi = xi / M • CPLEX default integrality tolerance: 1e-5 xi = 100, M = 1e + 10 ⇒ zi = xi / M = 1e − 8 zi not eligible for branching unless M ≤ 1e + 7 110 “integer feasible” solution within integrality tolerance that violates intent of the model (trickle flow) © 2014 IBM Corporation IBM Software Group Examples To get correct answers with big-M formulation – Use smallest possible value of big-M that doesn’t violate intent of model • Bound strengthening in CPLEX presolve often does this automatically – Set integrality tolerance to 0 – Set simplex tolerances to minimum values, 1e-9 – Ask for more accuracy on a potentially ill-conditioned system • Turn on numerical emphasis parameter Many users are unfamiliar with issues – Frequent source of CPLEX customer calls – One of most popular CPLEX FAQs – But should they have to be? 111 © 2014 IBM Corporation IBM Software Group Examples Indicator constraint formulation for fixed costs on continuous variables Minimize c T x + f T z subject to Ax = b (c, f ≥ 0) zi = 0 → xi ≤ 0 (CPLEX branches on these directly) xi ≥ 0, 0 ≤ zi ≤ 1 zi integer – LP relaxation solution xi = 100, zi = 0 indicator constraint i requires branching 112 (integer feasible solutions aligned with intent of the model) © 2014 IBM Corporation IBM Software Group Examples Which approach to use? – Indicator formulation more precise representation of model • Indicator and big-M formulation equivalent when M=∞ – If we can use modest values for big-M, indicator formulation tends to be weaker – Use indicator constraints, let CPLEX decide whether to replace with big-Ms if preprocessing can deduce big-M values of modest size • Presolve tightens the indicator formulation (improved further in CPLEX 12.2.0) 113 Presolve on indicators (improved) Node presolve on indicators Probing on by default Probing on indicator constraints Re-presolve by default © 2014 IBM Corporation
© Copyright 2026 Paperzz