Confidence in the Range of Variability
H.J. Pradlwarter and G.I. Schuëller
Institute of Engineering Mechanics, Leopold-Franzens University, Innsbruck, Austria, EU

Problem definition
Suppose only a few measured values of an uncertain quantity are available. Is it possible under such circumstances to establish a credible probability distribution for the reliability assessment?
- Without strong assumptions, certainly not!
- There is an infinite number of options!
- Some physical background information is needed to proceed any further.

Problem definition (cont.)
Among the infinite set of options, the choice should reflect the needs of the analysis. Some options:
- The uncertainty due to the insufficient number of data points should be considered.
- For estimating the performance (safety assessment), confidence in the estimates will be crucial.

Problem definition (cont.)
Only a few measured values of an uncertain quantity are available. Is it possible under such circumstances to establish a credible probability distribution for the reliability assessment? Yes, if:
- Some physical background information can be safely assumed.
- We are not looking for the best estimate (e.g. a Bayesian approach) but for a conservative PDF estimate for a required confidence level.

Overview
- Bootstrap procedure
- Statistical results, e.g. confidence intervals
- Probability of observation
- Probability of lying outside the observed domain
- Probability density estimation
- Extended bootstrap procedure
- Marginal distributions for calibration data
- Joint distributions
- Results
- Conclusions

Bootstrap procedure
- Modern, computer-intensive, general-purpose approach to statistical inference.
- Approach to compute properties of an estimator (e.g. variance, confidence intervals, correlations).
- Advantage: straightforward also for complex estimators and complex distributions.
- Disadvantage: tendency to be too optimistic for small sample sizes.
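The bootstrap idea named above can be illustrated with a minimal sketch. It assumes a percentile interval and the sample variance as estimator; the function name `bootstrap_ci`, its parameters and the data values are hypothetical and not taken from the talk.

```python
import random
import statistics


def bootstrap_ci(data, estimator, n_resamples=2000, significance=0.10, seed=0):
    """Percentile bootstrap interval for estimator(data) (illustrative only)."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values with replacement; every original
    # data point is picked with probability 1/n.
    estimates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval bounds off the sorted bootstrap estimates.
    lower = estimates[int((significance / 2) * (n_resamples - 1))]
    upper = estimates[int((1 - significance / 2) * (n_resamples - 1))]
    return lower, upper


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.9, 10.6]
low, high = bootstrap_ci(sample, statistics.variance)
print(f"90% bootstrap interval for the variance: [{low:.3f}, {high:.3f}]")
```

Because every resample reuses only the observed values, the interval cannot reflect variability beyond the observed range, which is the small-sample optimism noted in the disadvantage above.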
Bootstrap procedure (cont.)
Given the data set
$$x = (x_1, x_2, \ldots, x_{n-1}, x_n)$$
Resampling: generate artificially a large number $N$ of sets $\{x^{(j)}\}_{j=1}^{N}$ from the data by random sampling:
$$x^{(j)} = (x_{I(1,j)}, x_{I(2,j)}, \ldots, x_{I(n-1,j)}, x_{I(n,j)}), \qquad P[I(i,j) = k] = \frac{1}{n}, \quad 1 \le k \le n$$
Determine the estimator (e.g. mean, variance, etc.) for each of the $N$ sets, establish the histogram and derive confidence intervals.
[Figure: histogram of the estimator (e.g. variance)]

Bootstrap procedure (cont.)
Resampling corresponds to sampling from the discrete probability distribution.
[Figure: discrete probability distribution over $x$]
- The inference is only justified if the sample represents the underlying unknown distribution well.
- The method is not reliable if only very few data are available, i.e. if $n$ is small.
- The case $n < 30$ will be investigated in the following.

Probability mass outside the observation range
$N > 1$ data points specify the observed range.
[Figure: data points $x^{(1)}, \ldots, x^{(j)}, \ldots, x^{(N)}$ on the $x$ axis, with $x_{\min}$, $x_{\max}$ and the interval bounds $a$, $b$ marked]
Define the interval $[a,b]$:
$$a = x_{\min} - \frac{x_{\max} - x_{\min}}{2N - 2}, \qquad b = x_{\max} + \frac{x_{\max} - x_{\min}}{2N - 2}$$
Assume independent data points:
$$P[a \le x^{(j)} \le b,\ j = 1, \ldots, N] = q^N \quad \text{with} \quad q = \int_a^b f_X(x)\,dx$$
Suggestion: interpret $q^N$ as the level of significance $\alpha$; the confidence level is $1 - \alpha$.

Probability mass outside the observation range
Probability $P[x^{(N+k)} \notin [a,b]] = p$:
$$p = P[x^{(N+k)} \notin [a,b]] = 1 - q, \qquad q = \alpha^{1/N}, \qquad p(\alpha,N) = 1 - \alpha^{1/N}$$
[Figure: $p(N,\alpha)$]

Probability density
Density outside the observed domain
Until now we just have an estimate for the probability, not the density!
Almost everything is possible without any physical background information.
Reasonable (physical) assumptions:
- The density is high in the neighbourhood of any observation.
- The density decreases with the distance from the observations.
- The density has a single domain with PDF(x) > 0.

Proposed PDF
Extended bootstrap distribution: replace the underlying discrete bootstrap probability distribution by continuous Gaussian kernel density functions.
[Figure: the discrete distribution at the data points $x^{(1)}, \ldots, x^{(j)}, \ldots, x^{(N)}$ and the continuous kernel density over the same axis, with $x_{\min}$, $x_{\max}$ and the interval bounds $a$, $b$ marked]

Proposed PDF
Kernel densities: $N$ Gaussian densities centered at the data points
$$f_X(x,\sigma) = \frac{1}{N\sigma\sqrt{2\pi}} \sum_{j=1}^{N} \exp\left(-\frac{(x - x^{(j)})^2}{2\sigma^2}\right)$$
Justification:
+ each data point has equal weight and provides identical information
+ each data point has the same variability
+ the probability of occurrence decreases with the distance
$p(\alpha,N)$ is used to specify the standard deviation $\sigma$:
$$\int_{-\infty}^{a} f_X(x,\sigma)\,dx + \int_{b}^{\infty} f_X(x,\sigma)\,dx = p(\alpha,N)$$

Application to data
Calibration experiments: three data sets with $N_c = 5$, $N_c = 20$ and $N_c = 30$.
[Figure: Young's modulus $E(L_c/2)$ [MPa] versus elongation $\Delta L_c$ [mm] for the three data sets]
Notation:
- $\beta^{(j)}$: inverse Young's modulus
- $\bar{\beta}^{(j)}$: average inverse Young's modulus over the length
$$\beta^{(j)} = \frac{1}{E^{(j)}(L_c/2)}, \qquad \bar{\beta}^{(j)} = \frac{1}{L_c}\int_0^{L_c} \beta^{(j)}(x)\,dx = \frac{A_c\,\Delta L_c^{(j)}}{F_c\,L_c}$$

Application to calibration data
Inverse Young's modulus
$$\beta^{(j)} = \frac{1}{E^{(j)}(L_c/2)}, \qquad j = 1, 2, \ldots, N_c$$
[Figure: marginal PDFs for $\alpha = 0.02$ and $\alpha = 0.10$, each with $N_c = 5$, $N_c = 20$ and $N_c = 30$]
- Exceptionally large dispersion for $N_c = 5$ when compared with $N_c = 30$.
- The distribution is a function of the number of data points $N_c$ and the required confidence level $1 - \alpha$.

Application to calibration data
Average inverse Young's modulus
$$\bar{\beta}^{(j)} = \frac{1}{L_c}\int_0^{L_c} \beta^{(j)}(x)\,dx = \frac{A_c\,\Delta L_c^{(j)}}{F_c\,L_c}$$
[Figure: marginal PDFs for $\alpha = 0.02$ and $\alpha = 0.10$, each with $N_c = 5$, $N_c = 20$ and $N_c = 30$]
- Exceptionally small dispersion for $N_c = 5$ when compared with $N_c = 30$.
- The distribution is a function of the number of data points $N_c$ and the required confidence level $1 - \alpha$.
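The dependence of the marginal distributions on the number of data points and on the confidence level follows from the calibration of the kernel width. A minimal sketch of that step is given here, assuming the interval bounds extend the observed range by $(x_{\max} - x_{\min})/(2N - 2)$ on each side and that a simple bisection is an acceptable way to solve the tail-mass condition. All function names and the data values are hypothetical; the talk does not state how the equation is solved.

```python
import math


def outside_probability(alpha, n):
    """p(alpha, N) = 1 - alpha**(1/N): probability mass outside [a, b]."""
    return 1.0 - alpha ** (1.0 / n)


def interval(data):
    """Interval [a, b] extending the observed range (needs N > 1, x_max > x_min)."""
    n = len(data)
    x_min, x_max = min(data), max(data)
    margin = (x_max - x_min) / (2 * n - 2)
    return x_min - margin, x_max + margin


def _phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tail_mass(data, sigma, a, b):
    """Mass of the equal-weight Gaussian kernel mixture outside [a, b]."""
    return sum(
        _phi((a - x) / sigma) + 1.0 - _phi((b - x) / sigma) for x in data
    ) / len(data)


def calibrate_sigma(data, alpha):
    """Find sigma such that tail_mass equals p(alpha, N), by bisection."""
    a, b = interval(data)
    target = outside_probability(alpha, len(data))
    lo, hi = 0.0, b - a
    # The tail mass grows monotonically with sigma; widen the bracket if needed.
    while tail_mass(data, hi, a, b) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_mass(data, mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data set with N = 5 points, for illustration only.
points = [1.9, 2.3, 2.0, 2.6, 2.2]
for alpha in (0.10, 0.02):
    print(alpha, calibrate_sigma(points, alpha))
```

A smaller significance level, or fewer data points, raises $p(\alpha,N)$ and therefore widens the kernels, which is the safeguard against underestimating the variability.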
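Application to calibration data
Joint distribution as function of $N_c$ and significance level $\alpha$
[Figure: joint PDF $f(\beta, \bar{\beta} \mid \alpha, N_c)$; panels $N_c = 5, \alpha = 0.10$ and $N_c = 5, \alpha = 0.02$]

Application to calibration data
Joint distribution as function of $N_c$ and significance level $\alpha$
[Figure: joint PDF $f(\beta, \bar{\beta} \mid \alpha, N_c)$; panels $N_c = 20, \alpha = 0.10$ and $N_c = 20, \alpha = 0.02$]

Application to calibration data
Joint distribution as function of $N_c$ and significance level $\alpha$
[Figure: joint PDF $f(\beta, \bar{\beta} \mid \alpha, N_c)$; panels $N_c = 30, \alpha = 0.02$ and $N_c = 30, \alpha = 0.10$]

Random field calibration
Random field model: simple, piecewise linear
$$\Delta L_c = \frac{F_c}{A_c} \sum_{i=1}^{M} \beta_i \Delta_i, \qquad L_c = \sum_{i=1}^{M} \Delta_i, \qquad \beta_i = \frac{\hat{\beta}_{i-1} + \hat{\beta}_i}{2}$$
- $\hat{\beta}_i$: i.i.d. inverse Young's moduli
- Distribution of $\Delta_i$ (maximum entropy principle):
$$F(\Delta) = 1 - \exp\left(-\frac{\Delta - \Delta_{\min}}{D}\right), \qquad D = E[\Delta - \Delta_{\min}]$$
- $\Delta_{\min}$ derives from mechanics.

Random field calibration
Fitting of $D = E[\Delta - \Delta_{\min}]$
The average correlation length $D$ is selected such that it fits the joint distribution $F(\beta, \bar{\beta})$ best.
Simple Monte Carlo search:
$$\varepsilon(D) = \frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} \left( F(\beta^{[i]}, \bar{\beta}^{[i]}) - K(\beta^{[i]}, \bar{\beta}^{[i]}) \right)^2 \rightarrow \text{Min}$$
Here $K(\beta, \bar{\beta})$ is the empirical joint distribution of the $N_{MC}$ simulated pairs $(\beta^{[k]}, \bar{\beta}^{[k]})$, built from the indicator functions $I[\cdot]$, with
$$\bar{\beta}^{[i]} = \frac{1}{L_c} \sum_{k=1}^{M_i} \beta^{[i,k]} \Delta^{[i,k]}, \qquad L_c = \sum_{k=1}^{M_i} \Delta^{[i,k]}$$

Application: Static Challenge problem
Prediction of exceedance probability
- Young's modulus in all four bars is modelled as a random field.
- Challenge: estimation of the exceedance probability $P[y_P > 3.0\ \text{mm}] = p_f(\alpha)$.
[Figure: structure with the displacement $y_P$ marked]

Application: Static Challenge problem
Prediction of exceedance probability
- Consistent results.
- Severe underestimation occurs unless a low significance level $\alpha$ is introduced.
[Figure: predictions of $P[y_P > 3.0\ \text{mm}] = p_f(\alpha)$ for $\alpha = 0.02, 0.10$ and $N_c = 5, 20, 30$]

Summary and Conclusion
- The spread of the assumed probability distribution is a function of the number of data points and the required confidence level.
- The introduction of a confidence level provides a suitable safeguard against a severe underestimation of the variability of parameters derived from a small data set.
- Consistent results can be obtained even though the small data set might be misleading.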
Acknowledgment
This research is partially supported by the European Commission under contract # RTN505164 (MADUSE).