Overcoming the Fear of Using an Incorrect Probability Distribution

J.D. Solomon, PE, CRE, CMRP
[email protected]
Palisade Risk Conference
July 27, 2016
Chicago, IL
Today’s Discussion - Topics
 The Sources of Our Fears
 Some Comforting Reasons Why It May Not Matter
 Types of Data
 Four Methods to Better Select a Probability Distribution
 The Degree to Which It Matters
 Moving Forward
Beginner to Intermediate Level… Practical… Often Controversial… Intended to Stir Debate
A little bit about me…
 BS – Civil Engineering – NC State University
 Master of Business Administration – University of South Carolina
 Professional Certificate – Strategic Decisions & Risk Management – Stanford
 Professional Engineer (NC, SC, VA)
 Certified Reliability Engineer
 Certified Maintenance and Reliability Professional
 Six Sigma Black Belt
 Certified in Lean Management
Some Past Presentations at Palisade Conferences
 System Renewal & Replacement Forecasting
 Partnering Agreements
 Reliability Assessments
 Communications with Boards, Elected Officials, and the Public
 Big Data and Mind Maps
The Sources of Our Fears
1. Statistical Illiteracy
2. Deterministic Education and Training
3. Inadequate Post-Mortems
4. Don’t Use the Tool Enough
Some Comforts (or maybe discomforts)
Few people are ever going to follow up
 They don’t understand
 There is not enough money
 If inaccurate, people will not usually beat a dead horse
 Majority of people don’t understand it better than you –
especially after today
Some Comforts (or maybe discomforts)
You will probably be conservative anyway
 Most people governed by Central Limit Theorem
 Anchoring is more common than uncommon
Types of Data
Data Classification
 Qualitative (categorical)
 Quantitative (numeric)
• Discrete data are numeric data that take a countable
number of possible values (e.g., integers)
• Continuous data can take any value within a range,
with no gaps or interruptions (real numbers)
Data Classification and Statistics
Scale: Data Type; Applicable Statistics
 Ratio: Continuous (real numbers); full range of statistics (parametric)
 Interval: May be treated as continuous; most parametric approaches
 Ordinal: Discrete (integers); non-parametric approaches
 Nominal: Categorical (qualitative); none beyond counts and the mode
Statistical approaches based on Stevens (1946)
Data Classification and Statistics
 Parametric statistics assumes that sample data comes from a population that follows a
probability distribution based on a fixed set of parameters
 When the assumptions are correct, parametric methods will produce more accurate and
precise estimates than non-parametric methods
 Non-parametric “distribution free” approaches (think ranking and counting) include:
Wilcoxon, Mann-Whitney, Kruskal-Wallis, Mood’s median test, Friedman
 When to use non-parametric:
• Data set is better represented by the median
• Very small sample size
• Ordinal data, ranked data, or outliers that you can’t remove
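The "ranking and counting" idea behind the non-parametric tests can be sketched in a few lines. The example below is a minimal illustration only, not a full hypothesis test: it computes the Mann-Whitney U statistic by counting pairs and shows that a single extreme outlier moves the mean but not the rank-based statistic or the median. The crew repair times are hypothetical values made up for the illustration.

```python
from statistics import mean, median

def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y, ties counting one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical repair times in hours for two crews (illustrative values only).
crew_a = [2.1, 2.4, 2.6, 3.0, 3.3]
crew_b = [3.5, 3.9, 4.2, 4.8, 5.1]
# Same data, but the last value is replaced by an outlier that cannot be removed.
crew_b_outlier = [3.5, 3.9, 4.2, 4.8, 51.0]

u_clean = mann_whitney_u(crew_a, crew_b)
u_outlier = mann_whitney_u(crew_a, crew_b_outlier)

print("U without outlier:", u_clean, "| U with outlier:", u_outlier)
print("mean of crew B:", mean(crew_b), "->", mean(crew_b_outlier))
print("median of crew B:", median(crew_b), "->", median(crew_b_outlier))
```

Because U depends only on the ordering of the values, the outlier changes nothing in the rank-based comparison, while the mean of crew B roughly triples.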
So What Does This Mean?
 Potential mixing of scales in the source data
set can be problematic
 Otherwise, in practice, it is a bit of a fine point
 The distinction is correct, and understanding it should provide
some added degree of confidence
 It is a common weak point among those
you fear may criticize you
 Remember that a key outcome of any
statistical analysis is to provide an indication
of central tendency and dispersion
[Figures: a bar chart describes discrete/interval data; a histogram describes continuous/ratio data]
Four Methods to Better Select a
Probability Distribution
3 Major Challenges for Establishing Useful
Probability Distributions
 Nature of the data (use of continuous or discrete scales)
 Underlying quality of the available reference data
 Potential lack of meaningful lifetime data
Four Approaches We Will Discuss Today
1. Classic statistics – “complete” data
2. Distribution Fitting & Bootstrapping – partial data
3. Cumulative Distribution Function – limited data or no data
4. Quantiles – partial lifetime data
1. Classic Statistics with Good Lifetime Data
Pros
 Most definitive
 This is what we are all hoping to have and use
Cons
 Laboratory testing does not reflect actual operating conditions
 Field testing is expensive and often not practical
 Running controlled tests to failure takes time and is expensive
 Data capture from work orders is erratic and of questionable quality
 In practice, we seldom run important things to failure
 Few organizations do formal Root Cause Failure Analysis (RCFA)
2. Distribution Fitting & Bootstrapping
 Applicable with partial lifetime data
 Efron introduced the bootstrap method (1979); Efron and Tibshirani later popularized it
 Distribution fitting is a standard tool within @Risk
Pros
 Proven and accepted approach
 Distribution fitting is a standard tool in @Risk
Cons
 Requires some knowledge of the typical distribution for the class of data
 The fitted curve is frequently used beyond the range of the available data
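As a rough sketch of the bootstrapping half of this approach, the example below resamples a small data set with replacement and reads off a percentile interval for the mean. It uses only the Python standard library rather than @Risk; the function name `bootstrap_ci`, its defaults, and the pump lifetimes are assumptions made up for the illustration.

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_resamples=5000, level=0.90, seed=1):
    """Percentile bootstrap interval for stat(data), resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_resamples)
    hi_idx = int((1.0 - tail) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical partial lifetime data: years to failure for eight pumps.
lifetimes = [6.5, 8.0, 9.2, 11.0, 12.5, 13.1, 15.8, 21.0]

low, high = bootstrap_ci(lifetimes)
print(f"sample mean: {mean(lifetimes):.2f} years")
print(f"90% bootstrap interval for the mean: {low:.2f} to {high:.2f} years")
```

Note the design trade-off: a bootstrap resample can never contain a value outside the observed range, whereas a fitted curve extrapolates beyond it, which is the caution listed under Cons.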
3. Cumulative Distribution Function
 Applicable when limited data or no data
 Interview-based approach
 Developed by Spetzler and Stael von Holstein (1974)
Pros
 Proven and accepted approach
Cons
 Requires some knowledge of the typical distribution for the class of data
 Reliance on Subject Matter Experts, facilitation skills, survey quality
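One simple way to turn interview answers into a usable distribution is to connect the elicited points with straight lines and sample by inverting that curve. The sketch below assumes five hypothetical (probability, value) pairs from a subject matter expert; the linear interpolation is an illustrative simplification, not the specific encoding procedure of Spetzler and Stael von Holstein.

```python
import random
from bisect import bisect_right

# Hypothetical elicited points: (cumulative probability, cost in $k).
# Each pair answers "what is the chance the cost is at or below this value?"
elicited = [(0.00, 50.0), (0.10, 80.0), (0.50, 120.0), (0.90, 200.0), (1.00, 300.0)]

def quantile(p, points=elicited):
    """Invert the piecewise-linear cumulative curve through the elicited points."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    probs = [q for q, _ in points]
    values = [v for _, v in points]
    # Index of the segment's right-hand point, clamped so p = 1 uses the last segment.
    i = min(bisect_right(probs, p), len(probs) - 1)
    p0, p1 = probs[i - 1], probs[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (p - p0) / (p1 - p0)

# Inverse-transform sampling: feed uniform random numbers through the quantile function.
rng = random.Random(7)
samples = [quantile(rng.random()) for _ in range(10000)]

print("median (P50):", quantile(0.50))
print("P80:", quantile(0.80))
print("simulated mean:", round(sum(samples) / len(samples), 1))
```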
Bar Charts Provide Meaningful Insights
4. Quantiles
 Applicable with partial data
 Keelin and Powley advocate the use of quantiles (2012)
 Quantile density function (QDF), as yet another way of describing a PDF
Pros
 Mathematically rigorous method
Cons
 Not common
 Requires some specialty knowledge and tools
Types of Distributions
Most Common Distributions
Continuous
 Normal: Symmetric range and not bounded
 Triangular: Symmetric or skewed range and bounded
 PERT: Non-symmetric ranges and bounded
 LogNormal: Non-symmetric and bounded on one side
Discrete
 Binomial
 Poisson
Triangular
 No theoretical justification, however,
simple and clear to use. Allows for
skewness.
 Good choice where simple intuitive
understanding and flexibility is of
utmost importance.
 Perhaps the most commonly-used
distribution in construction risk
analysis.
PERT
 Appropriate when more confident that
the true value lies in the vicinity of the
most likely outcome than at the
extremes of the distribution
 Will typically result in a lower risk-based project
contingency than the Triangular distribution
when both use the same minimum, most likely, and maximum values.
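The contingency difference can be checked with a small simulation. The sketch below assumes the standard PERT form (a scaled Beta distribution with a shape weight of 4) and a hypothetical right-skewed cost estimate of 100 / 120 / 200; it uses the Python standard library rather than @Risk, and the helper names are made up for the illustration.

```python
import random
from statistics import mean, quantiles

def sample_triangular(rng, low, mode, high):
    # Note the standard-library argument order: (low, high, mode).
    return rng.triangular(low, high, mode)

def sample_pert(rng, low, mode, high):
    """Standard PERT (shape weight 4) built from a scaled Beta distribution."""
    alpha = 1.0 + 4.0 * (mode - low) / (high - low)
    beta = 1.0 + 4.0 * (high - mode) / (high - low)
    return low + (high - low) * rng.betavariate(alpha, beta)

# Hypothetical cost estimate in $k: minimum, most likely, maximum.
low, mode, high = 100.0, 120.0, 200.0

rng = random.Random(2016)
tri = [sample_triangular(rng, low, mode, high) for _ in range(20000)]
pert = [sample_pert(rng, low, mode, high) for _ in range(20000)]

# quantiles(..., n=10) returns the nine decile cut points; index 7 is the 80th percentile.
tri_p80 = quantiles(tri, n=10)[7]
pert_p80 = quantiles(pert, n=10)[7]

print(f"Triangular: mean {mean(tri):.1f}, P80 {tri_p80:.1f}")
print(f"PERT:       mean {mean(pert):.1f}, P80 {pert_p80:.1f}")
```

For these inputs the theoretical means are (100 + 120 + 200) / 3 = 140 for the Triangular and (100 + 4 × 120 + 200) / 6 = 130 for the PERT, so a contingency set at the mean or at P80 comes out lower with PERT.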
Lognormal
 Good for any risk that can have a cost
outcome ranging from $0 to infinity
 Good for risks that vary over several
orders of magnitude with low
probability “bad outcomes”.
 Used to characterize extreme events
 Examples: maximum annual rainfall, cost of utility
relocations, installation and testing of
complex systems, and repair times for
maintenance.
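When only a low and a high estimate are available, Lognormal parameters can be backed out from two percentiles. The sketch below assumes a hypothetical repair time with a P10 of 2 hours and a P90 of 40 hours; the helper name `lognormal_from_p10_p90` is made up for the illustration.

```python
import math
import random
from statistics import NormalDist, mean, median

def lognormal_from_p10_p90(p10, p90):
    """Return (mu, sigma) of ln(X) so that X has the given 10th and 90th percentiles."""
    z90 = NormalDist().inv_cdf(0.90)  # about 1.2816
    mu = (math.log(p10) + math.log(p90)) / 2.0
    sigma = (math.log(p90) - math.log(p10)) / (2.0 * z90)
    return mu, sigma

# Hypothetical repair-time estimate in hours (illustrative values only).
p10, p90 = 2.0, 40.0
mu, sigma = lognormal_from_p10_p90(p10, p90)

rng = random.Random(11)
samples = [rng.lognormvariate(mu, sigma) for _ in range(20000)]

share_below_p10 = sum(s < p10 for s in samples) / len(samples)
share_below_p90 = sum(s < p90 for s in samples) / len(samples)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"median = {median(samples):.1f} h, mean = {mean(samples):.1f} h")
print(f"share below P10: {share_below_p10:.3f}, share below P90: {share_below_p90:.3f}")
```

The mean lands well above the median, which is the long right tail of low-probability "bad outcomes" described above.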
Normal Distribution
 Central limit theorem states that the
mean of a set of values drawn
independently from a population with finite variance will be
approximately normally distributed when the sample is large enough.
 Somewhat more difficult to specify a
standard deviation than Triangular and
other distributions that require
specification of upper/lower limits.
 Examples: manufacturing processes, many man-made
mechanical devices, inflation
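The central limit theorem can be seen in a quick simulation. The sketch below draws from a strongly skewed exponential population (a choice made only for illustration) and compares the skewness of single draws with the skewness of averages of 30 draws.

```python
import random
from statistics import mean

def skewness(values):
    """Population skewness: third central moment over the 1.5 power of the variance."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)

# 5,000 single draws from a skewed population (exponential with mean 1).
single = [rng.expovariate(1.0) for _ in range(5000)]

# 5,000 averages, each taken over 30 independent draws from the same population.
means_of_30 = [mean([rng.expovariate(1.0) for _ in range(30)]) for _ in range(5000)]

print(f"skewness of single draws: {skewness(single):.2f}")
print(f"skewness of means of 30:  {skewness(means_of_30):.2f}")
print(f"average of the means:     {mean(means_of_30):.3f}")
```

The averages are much closer to symmetric than the raw draws, although some skew remains at a sample size of 30, which is why the normal approximation is only approximate.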
Discrete Distributions
 Used when each outcome has a value
and a probability of occurrence.
 Used to model several discrete outcomes
(e.g. worst case, expected case, best
case) or to model discrete scenarios.
 Often used for Yes/No conditions
 Examples would be things like counting
number of defects, compliance, and rare
events (event risks)
Specialty Distributions
 Many other specialty distributions
 The Weibull distribution can take three different forms depending on its
shape parameter. Used commonly in reliability and systems
engineering, where the three forms correspond to the three regions of the
bathtub curve (decreasing, constant, and increasing failure rate).
 The Beta distribution represents the uncertainty
distribution for probability of a binomial process given
observed data
 The Gamma distribution represents the uncertainty for the
intensity of a Poisson process
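One common reading of the "three types" of Weibull behavior is the failure-rate (hazard) trend set by the shape parameter: below 1 the rate falls over time, at 1 it is constant, and above 1 it rises. The sketch below evaluates the two-parameter Weibull hazard at a few times; the shape values 0.5, 1.0, and 3.0 and the labels are chosen only for illustration.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Hazard (instantaneous failure rate): (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

times = [0.5, 1.0, 2.0]
cases = [("early-life failures", 0.5), ("random failures", 1.0), ("wear-out", 3.0)]

hazards = {}
for label, shape in cases:
    hazards[shape] = [weibull_hazard(t, shape) for t in times]
    rounded = [round(h, 3) for h in hazards[shape]]
    print(f"shape = {shape}: {label:20s} hazard at t = {times}: {rounded}")
```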
The Degree to Which the World is Skewed
Skewed
 Biological (contamination, fish kills)
 Social Sciences/Human Behavior (financial markets)
 Some Physical Sciences (floods, sand piles, earthquakes)
Normally Distributed
 Many Physical Processes, especially manufacturing systems or
mechanical systems
The Degree to Which It Matters
Example: Agriculture and Nutrients
Theoretical Foundations (use some common sense!)
Measurement
 Representativeness: Objects can actually be assigned a number
 Uniqueness: Different measurers cannot assign different numbers to
the same object
Surveys
 Validity: A survey is valid if it can be shown to measure the variable that it is intended to
measure and not others.
 Reliability: For surveys, the extent to which the same results are
obtained when the same question is repeated to the same group of
respondents.
“Applying the Scales‐Measurement Theory and Statistical Analysis Controversy to Risk Assessment” by Solomon, Vallero, and Benson
Moving Forward
Just Get in the Game!
 Vast majority of users do not have all the data they would prefer.
 Embrace it! It is part of the power of probabilistic analysis.
 Be Bold! First generation models direct where more data is needed.
 Be conscious of types of data.
 Be aware of embedded data – make sure to use some common sense.
 Expand your ranges! Avoid anchoring!
Some Insightful References
 Benoit Mandelbrot: The (Mis)behavior of Markets
 Benoit Mandelbrot: The Fractal Geometry of Nature
 Mark Buchanan: Ubiquity
 Philip Ball: Critical Mass
 Sam Savage: The Flaw of Averages