CwU 2004

Confidence in the Range of Variability
H.J. Pradlwarter and G.I. Schuëller
[email protected]
Institute of Engineering Mechanics
Leopold-Franzens University
Innsbruck, Austria, EU
1
Problem definition

Suppose only a few measured values of an uncertain quantity are available:
Is it under such circumstances possible to establish a credible probability distribution for the reliability assessment?
- Without any strong assumptions, certainly not!
- There is an infinite number of options!
- Some physical background information is needed to proceed any further.
2
Problem definition (cont.)
Among the infinite set of options, the choice should reflect the needs of the analysis.
Some options:
- The uncertainty due to the insufficient number of data points should be considered.
- For estimating the performance (safety assessment), confidence in the estimates will be crucial.
3
Problem definition (cont.)

Only a few measured values of an uncertain quantity are available:
Is it under such circumstances possible to establish a credible probability distribution for the reliability assessment?
Yes, if:
- Some physical background information can be safely assumed.
- We are not looking for the best estimate (e.g. a Bayesian approach), but for a conservative PDF estimate at a required confidence level.
4
Overview
- Bootstrap procedure
  - Statistical results, e.g. confidence intervals
  - Probability of observation
  - Probability of lying outside the observed domain
- Probability density estimation
  - Extended bootstrap procedure
  - Marginal distributions for calibration data
  - Joint distributions
- Results
- Conclusions
5
Bootstrap procedure

Bootstrap procedure:
- Modern, computer-intensive, general-purpose approach to statistical inference.
- Approach to compute properties of an estimator (e.g. variance, confidence intervals, correlations).
- Advantage: straightforward also for complex estimators and complex distributions.
- Disadvantage: tendency to be too optimistic for small sample sizes.
6
Bootstrap procedure

Bootstrap procedure (cont.):

- Given the data set
  $$\mathbf{x} = (x_1, x_2, \ldots, x_{n-1}, x_n)$$
- Resampling: generate artificially a large number $N$ of sets $\{\mathbf{x}^{(j)}\}_{j=1}^{N}$ from the data by random sampling,
  $$\mathbf{x}^{(j)} = (x_{I(1,j)}, x_{I(2,j)}, \ldots, x_{I(n-1,j)}, x_{I(n,j)}), \qquad P[I(i,j) = k] = \frac{1}{n}, \quad 1 \le k \le n$$
- Determine for each of the $N$ sets the estimator (e.g. mean, variance, etc.), establish the histogram and derive confidence intervals.
[Figure: histogram of the estimator (e.g. variance)]
7
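As an aside, the resampling scheme above fits in a few lines of Python (NumPy only). This is a minimal sketch; the choice of estimator, the number of bootstrap sets $N$ and the confidence level are illustrative and not prescribed by the slides.

```python
import numpy as np

def bootstrap_ci(x, estimator=np.var, N=10000, confidence=0.95, rng=None):
    """Resample the data with replacement, evaluate the estimator on every
    bootstrap set and return the empirical two-sided confidence interval."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    # indices I(i, j) are uniform on {0, ..., n-1}, i.e. P[I = k] = 1/n
    idx = rng.integers(0, n, size=(N, n))
    stats = np.array([estimator(x[row]) for row in idx])
    alpha = 1.0 - confidence
    lo, hi = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return stats, (lo, hi)

# example with a hypothetical small sample
data = [2.1, 1.9, 2.4, 2.0, 2.2]
stats, (lo, hi) = bootstrap_ci(data, estimator=np.var)
print(f"95% bootstrap interval for the variance: [{lo:.4f}, {hi:.4f}]")
```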
Bootstrap procedure

Bootstrap procedure (cont.):

- Resampling corresponds to sampling from the discrete (empirical) probability distribution concentrated at the observed data points.
[Figure: discrete probability masses at the data points]
- The inference is only justified if the sample represents the underlying unknown distribution well.
- The method is not reliable if only very few data are available, i.e. if $n$ is small.
- The case $n < 30$ will be investigated in the following.
8
Probability mass outside the observation range

- $N > 1$ data points specify the observed range $[x_{\min}, x_{\max}]$.
[Figure: data points $x^{(1)}, \ldots, x^{(N)}$ on an axis, with interval bounds $a \le x_{\min}$ and $b \ge x_{\max}$]
- Define the interval $[a, b]$:
  $$a = x_{\min} - \frac{x_{\max} - x_{\min}}{2N - 2}, \qquad b = x_{\max} + \frac{x_{\max} - x_{\min}}{2N - 2}$$
- Assume independent data points:
  $$P\!\left[a \le x^{(j)} \le b,\; j = 1, \ldots, N\right] = q^N \quad \text{with} \quad q = \int_a^b f_X(x)\,dx$$
- Suggestion: interpret $q^N$ as the level of significance $\alpha$, i.e. confidence level $= 1 - \alpha$.
9
Probability mass outside the observation range

- Probability that a further observation $x^{(N+k)}$ falls outside $[a, b]$:
  $$p = P\!\left[x^{(N+k)} \notin [a, b]\right] = 1 - q$$
- At significance level $\alpha$ (confidence $1 - \alpha$): $q \ge \alpha^{1/N}$, and hence
  $$p \le p(\alpha, N) = 1 - \alpha^{1/N}$$
[Figure: $p(\alpha, N)$ as a function of the number of data points $N$]
10
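A small numerical sketch of these two slides, using hypothetical measurement values; it only evaluates the interval $[a, b]$ and the bound $p(\alpha, N) = 1 - \alpha^{1/N}$ derived above.

```python
import numpy as np

def observation_interval(x):
    """Interval [a, b] around the observed range (slide 9)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    margin = (x.max() - x.min()) / (2 * N - 2)
    return x.min() - margin, x.max() + margin

def p_outside(alpha, N):
    """Bound p(alpha, N) = 1 - alpha**(1/N) on the probability that a further
    observation falls outside [a, b], at confidence level 1 - alpha."""
    return 1.0 - alpha ** (1.0 / N)

data = [2.1, 1.9, 2.4, 2.0, 2.2]          # hypothetical measurements
a, b = observation_interval(data)
for alpha in (0.02, 0.10):
    print(f"alpha = {alpha}: [a, b] = [{a:.3f}, {b:.3f}], "
          f"p(alpha, N) = {p_outside(alpha, len(data)):.3f}")
```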
Probability density

- Density outside the observed domain: until now we only have an estimate for the probability, not for the density!
- Almost everything is possible without any physical background information.
- Reasonable (physical) assumptions:
  - The density is high in the neighbourhood of any observation.
  - The density decreases with the distance from the observations.
  - The density has a single domain with PDF(x) > 0.
11
Proposed PDF

Extended bootstrap distribution:

- Replace the underlying discrete bootstrap probability distribution
[Figure: discrete probability masses at the data points $x^{(1)}, \ldots, x^{(N)}$ within $[a, b]$]
- by continuous Gaussian kernel density functions.
[Figure: Gaussian kernels centred at the data points $x^{(1)}, \ldots, x^{(N)}$ within $[a, b]$]
12
Proposed PDF

Kernel densities:
$N$ Gaussian densities centred at the data points:
$$f_X(x, \sigma) = \frac{1}{N\sqrt{2\pi}\,\sigma} \sum_{j=1}^{N} \exp\!\left(-\frac{(x - x^{(j)})^2}{2\sigma^2}\right)$$
Justification:
+ each data point has equal weight and provides identical information
+ each data point has the same variability
+ the probability of occurrence decreases with the distance from the observations
$p(\alpha, N)$ is used to specify the standard deviation $\sigma$:
$$\int_{-\infty}^{a} f_X(x, \sigma)\,dx + \int_{b}^{\infty} f_X(x, \sigma)\,dx = p(\alpha, N)$$
13
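A possible implementation of the extended bootstrap density is sketched below (Python, NumPy/SciPy). Only the kernel mixture and the tail-mass condition come from the slides; the root-finding call, its bracketing values and the example data are choices of this sketch.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def extended_bootstrap_pdf(x, alpha):
    """Gaussian kernel density centred at the data points, with sigma chosen
    such that the probability mass outside [a, b] equals p(alpha, N)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    margin = (x.max() - x.min()) / (2 * N - 2)
    a, b = x.min() - margin, x.max() + margin
    p_target = 1.0 - alpha ** (1.0 / N)

    def tail_mass(sigma):
        # probability mass of the kernel mixture below a plus above b
        return np.mean(norm.cdf(a, loc=x, scale=sigma)
                       + norm.sf(b, loc=x, scale=sigma))

    # tail mass grows monotonically with sigma, so simple bracketing works
    sigma = brentq(lambda s: tail_mass(s) - p_target,
                   1e-9 * (b - a), 1e6 * (b - a))

    def pdf(t):
        t = np.atleast_1d(t)[:, None]
        return np.mean(norm.pdf(t, loc=x, scale=sigma), axis=1)

    return pdf, sigma, (a, b)

data = [2.1, 1.9, 2.4, 2.0, 2.2]          # hypothetical measurements
pdf, sigma, (a, b) = extended_bootstrap_pdf(data, alpha=0.10)
print(f"sigma = {sigma:.4f}, interval = [{a:.3f}, {b:.3f}]")
```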
Application to data

Calibration experiments:


- Three data sets: $N_c = 5$, $N_c = 20$, $N_c = 30$
- Measured quantities: Young's modulus $E(L_c/2)$ [MPa] and elongation $\Delta L_c$ [mm]
- Notation: $\theta^{(j)}$ denotes the inverse Young's modulus and $\bar{\theta}^{(j)}$ the average inverse Young's modulus over the length,
$$\theta^{(j)} = \frac{1}{E^{(j)}(L_c/2)}, \qquad \bar{\theta}^{(j)} = \frac{1}{L_c}\int_0^{L_c} \theta^{(j)}(x)\,dx = \frac{A_c\,\Delta L_c^{(j)}}{F_c\,L_c}$$
14
Application to calibration data

Inverse Young's modulus:
$$\theta^{(j)} = \frac{1}{E^{(j)}(L_c/2)}, \qquad j = 1, 2, \ldots, N_c$$
[Figure: estimated PDFs of $\theta$ for $\alpha = 0.02$ and $\alpha = 0.10$, each for $N_c = 5, 20, 30$]
- Exceptionally large dispersion for $N_c = 5$ when compared with $N_c = 30$.
- The distribution is a function of the number of data points $N_c$ and the required confidence level $1 - \alpha$.
15
Application to calibration data

Average inverse Young's modulus:
$$\bar{\theta}^{(j)} = \frac{1}{L_c}\int_0^{L_c} \theta^{(j)}(x)\,dx = \frac{A_c\,\Delta L_c^{(j)}}{F_c\,L_c}$$
[Figure: estimated PDFs of $\bar{\theta}$ for $\alpha = 0.02$ and $\alpha = 0.10$, each for $N_c = 5, 20, 30$]
- Exceptionally small dispersion for $N_c = 5$ when compared with $N_c = 30$.
- The distribution is a function of the number of data points $N_c$ and the required confidence level $1 - \alpha$.
16
Application to calibration data

Joint distribution as a function of $N_c$ and the significance level $\alpha$:
[Figure: joint PDF $f(\theta, \bar{\theta} \mid \alpha, N_c)$ for $N_c = 5$, shown for $\alpha = 0.02$ and $\alpha = 0.10$]
17
Application to calibration data

Joint distribution as a function of $N_c$ and the significance level $\alpha$:
[Figure: joint PDF $f(\theta, \bar{\theta} \mid \alpha, N_c)$ for $N_c = 20$, shown for $\alpha = 0.02$ and $\alpha = 0.10$]
18
Application to calibration data

Joint distribution as a function of $N_c$ and the significance level $\alpha$:
[Figure: joint PDF $f(\theta, \bar{\theta} \mid \alpha, N_c)$ for $N_c = 30$, shown for $\alpha = 0.02$ and $\alpha = 0.10$]
19
Random field calibration

Random field model:
- Simple piecewise linear random field over the bar length,
  $$\Delta L_c = \frac{F_c}{A_c} \sum_{i=1}^{M} \ell_i\,\bar{\theta}_i, \qquad L_c = \sum_{i=1}^{M} \ell_i, \qquad \bar{\theta}_i = \frac{\hat{\theta}_i + \hat{\theta}_{i+1}}{2}$$
- i.i.d. inverse Young's moduli $\hat{\theta}_i$
- Distribution of $\hat{\theta}_i$ (maximum entropy principle):
  $$F(\theta) = 1 - \exp\!\left(-\frac{\theta - \theta_{\min}}{D}\right), \qquad D = E[\theta - \theta_{\min}]$$
- $\theta_{\min}$ derives from mechanics
20
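A minimal sketch of how one realisation of this random field model could be sampled (Python/NumPy); the numerical values of $F_c$, $A_c$, $L_c$, $M$, $\theta_{\min}$ and $D$ as well as the equal segment lengths are placeholders for illustration.

```python
import numpy as np

def sample_elongation(Fc, Ac, Lc, M, theta_min, D, rng=None):
    """Draw one realisation of the piecewise-linear inverse-modulus field
    and return the resulting elongation Delta L_c.

    Nodal values theta_hat_i are i.i.d. with the maximum-entropy (shifted
    exponential) distribution F(theta) = 1 - exp(-(theta - theta_min)/D)."""
    rng = np.random.default_rng() if rng is None else rng
    theta_hat = theta_min + rng.exponential(scale=D, size=M + 1)  # M+1 nodes
    theta_bar = 0.5 * (theta_hat[:-1] + theta_hat[1:])            # segment averages
    ell = np.full(M, Lc / M)                                      # equal segment lengths
    return (Fc / Ac) * np.sum(ell * theta_bar)

# placeholder parameter values, for illustration only
dLc = sample_elongation(Fc=1000.0, Ac=100.0, Lc=500.0, M=10,
                        theta_min=1e-5, D=2e-6)
print(dLc)
```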
Random field calibration

Fitting of $D = E[\theta - \theta_{\min}]$:
- The average correlation length and $D$ are selected such that the joint distribution $F(\theta, \bar{\theta})$ is fitted best.
- Simple Monte Carlo search:
  $$\frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} \left( F(\theta^{[i]}, \bar{\theta}^{[i]}) - K(\theta^{[i]}, \bar{\theta}^{[i]}) \right)^2 \;\rightarrow\; \text{Min}$$
where $K$ denotes the empirical joint distribution estimated from the $N_{MC}$ simulated field realisations,
$$K(\theta^{[i]}, \bar{\theta}^{[i]}) = \frac{1}{2 N_{MC}} + \frac{1}{1 + N_{MC}} \sum_{k=1}^{N_{MC}} I\!\left[\theta^{[k]} \le \theta^{[i]} \wedge \bar{\theta}^{[k]} \le \bar{\theta}^{[i]}\right],$$
and $\bar{\theta}^{[i]}$ is the simulated length average over the $M_i$ segments of realisation $i$,
$$\bar{\theta}^{[i]} = \frac{1}{L_c} \sum_{k=1}^{M_i} \ell^{[i,k]}\, \bar{\theta}^{[i,k]}, \qquad L_c = \sum_{k=1}^{M_i} \ell^{[i,k]}.$$
21
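The Monte Carlo search could be organised roughly as follows; this is a sketch under several assumptions (equal segment lengths, $\theta$ taken at mid-length, a plain empirical joint CDF instead of the corrected estimator $K$ above, and a simple grid search over $D$), not the authors' actual implementation.

```python
import numpy as np

def mismatch(F_joint, D, theta_min, M, n_mc=2000, rng=None):
    """Mean squared difference between a target joint CDF F_joint(theta, theta_bar)
    and the empirical joint CDF of (theta, theta_bar) simulated from the
    piecewise-linear random field with parameter D."""
    rng = np.random.default_rng() if rng is None else rng
    # nodal values for n_mc field realisations (M+1 nodes each)
    theta_hat = theta_min + rng.exponential(scale=D, size=(n_mc, M + 1))
    theta_bar_seg = 0.5 * (theta_hat[:, :-1] + theta_hat[:, 1:])
    theta_bar = theta_bar_seg.mean(axis=1)          # equal segment lengths
    theta_mid = theta_hat[:, (M + 1) // 2]          # value near mid-length
    # empirical joint CDF evaluated at the simulated points
    K = np.array([np.mean((theta_mid <= t) & (theta_bar <= tb))
                  for t, tb in zip(theta_mid, theta_bar)])
    F = F_joint(theta_mid, theta_bar)
    return np.mean((F - K) ** 2)

def fit_D(F_joint, theta_min, M, candidates):
    """Simple search: return the candidate D with the smallest mismatch."""
    errors = [mismatch(F_joint, D, theta_min, M) for D in candidates]
    return candidates[int(np.argmin(errors))]
```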
Application: Static Challenge problem

Prediction of the exceedance probability:
- Young's modulus in all four bars modelled as a random field.
- Challenge: estimation of the exceedance probability
  $$P[y_P \ge 3.0\ \mathrm{mm}] = p_f = \Phi(-\beta)$$
[Figure: frame structure with the monitored displacement $y_P$]
22
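Once the structural response $y_P$ can be evaluated for a realisation of the random field, the exceedance probability itself is a plain Monte Carlo estimate; the sampling and response functions below are hypothetical placeholders for the structural model.

```python
import numpy as np

def exceedance_probability(sample_field, response_yP, threshold=3.0,
                           n_mc=100_000, rng=None):
    """Direct Monte Carlo estimate of p_f = P[y_P >= threshold].

    sample_field(rng) -> one realisation of the random Young's modulus field
    response_yP(field) -> displacement y_P [mm] of the frame for that field."""
    rng = np.random.default_rng() if rng is None else rng
    hits = sum(response_yP(sample_field(rng)) >= threshold for _ in range(n_mc))
    return hits / n_mc
```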
Application: Static Challenge problem

Prediction of the exceedance probability $P[y_P \ge 3.0\ \mathrm{mm}] = p_f = \Phi(-\beta)$:
- Predictions for $\alpha = 0.02, 0.10$ and $N_c = 5, 20, 30$.
- Consistent results.
- Severe underestimation without introducing a low level of significance $\alpha$.
[Figure: predicted exceedance probability as a function of the displacement $y_P$ for the different $\alpha$ and $N_c$]
23
Summary and Conclusion

- The spread of the assumed probability distribution is a function of the number of data points and the required confidence level.
- The introduction of a confidence level provides a suitable safeguard against a severe underestimation of the variability of the parameters derived from a small data set.
- Consistent results can be obtained even though the small data set might be misleading.
24
Acknowledgment
This research is partially supported by the
European Commission under contract
# RTN505164 (MADUSE)
25