Overcoming the Fear of Using an Incorrect Probability Distribution
J.D. Solomon, PE, CRE, CMRP
[email protected]
Palisade Risk Conference, July 27, 2016, Chicago, IL

Today's Discussion – Topics
• The Sources of Our Fears
• Some Comforting Reasons Why It May Not Matter
• Types of Data
• Four Methods to Better Select a Probability Distribution
• The Degree to Which It Matters
• Moving Forward
Beginner to intermediate level… practical… often controversial… intended to stir debate.

A Little Bit About Me
• BS, Civil Engineering – NC State University
• Master of Business Administration – University of South Carolina
• Professional Certificate, Strategic Decisions & Risk Management – Stanford
• Professional Engineer (NC, SC, VA)
• Certified Reliability Engineer
• Certified Maintenance and Reliability Professional
• Six Sigma Black Belt
• Certified in Lean Management

Some Past Presentations at Palisade Conferences
• System Renewal & Replacement Forecasting
• Partnering Agreements
• Reliability Assessments
• Communications with Boards, Elected Officials, and the Public
• Big Data and Mind Maps

The Sources of Our Fears
1. Statistical illiteracy
2. Deterministic education and training
3. Inadequate post-mortems
4. Not using the tool enough

Some Comforts (or Maybe Discomforts)
• Few people are ever going to follow up.
• They don't understand.
• There is not enough money.
• If the analysis turns out to be inaccurate, people will not usually beat a dead horse.
• The majority of people do not understand it better than you – especially after today.
• You will probably be conservative anyway.
• Most people are governed by the Central Limit Theorem.
• Anchoring is more common than uncommon.

Types of Data

Data Classification
• Qualitative (categorical)
• Quantitative (numeric)
  – Discrete data are numeric data that have a finite number of possible values (integers).
  – Continuous data have an infinite number of possible values, with no gaps or interruptions (real numbers).

Data Classification and Statistics (statistical approaches based on Stevens, 1946)
Scale      Data type                       Statistical approaches
Ratio      Continuous (real numbers)       Full range of statistics (parametric)
Interval   May be treated as continuous    Most parametric approaches
Ordinal    Discrete (integers)             Non-parametric approaches
Nominal    Categorical (qualitative)       None

• Parametric statistics assume that sample data come from a population that follows a probability distribution based on a fixed set of parameters.
• When those assumptions are correct, parametric methods will produce more accurate and precise estimates than non-parametric methods.
• Non-parametric, "distribution-free" approaches (think ranking and counting) include Wilcoxon, Mann-Whitney, Kruskal-Wallis, Mood's median test, and Friedman.
• When to use non-parametric methods: the data set is better represented by the median; the sample size is very small; or the data are ordinal, ranked, or contain outliers that cannot be removed (a quick comparison is sketched below).
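The parametric versus non-parametric choice is easy to try for yourself. The short Python sketch below is not part of the original deck; it assumes numpy and scipy are available, uses made-up repair-time data, and contrasts a parametric Welch t-test with a non-parametric Mann-Whitney U test on the kind of small, skewed sample the bullets above describe.

```python
# Minimal sketch (not from the slides): parametric vs. non-parametric test
# on a small, skewed data set. Assumes Python with numpy and scipy installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two small samples of repair times (hours) - hypothetical values drawn
# from skewed (lognormal) populations with different central tendencies.
crew_a = rng.lognormal(mean=1.0, sigma=0.8, size=12)
crew_b = rng.lognormal(mean=1.5, sigma=0.8, size=12)

# Parametric: two-sample (Welch) t-test assumes roughly normal populations.
t_stat, t_p = stats.ttest_ind(crew_a, crew_b, equal_var=False)

# Non-parametric: Mann-Whitney U works on ranks, so it tolerates skewness,
# very small samples, and outliers that cannot be removed.
u_stat, u_p = stats.mannwhitneyu(crew_a, crew_b, alternative="two-sided")

print(f"Welch t-test:   p = {t_p:.3f}")
print(f"Mann-Whitney U: p = {u_p:.3f}")
```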
So What Does This Mean?
• Potential mixing of scales in the source data set can be problematic.
• Otherwise, practically it is a bit of a fine point.
• It is correct, and understanding it should provide some added degree of confidence.
• It is not an uncommon weak point for those whose criticism you fear.
• Remember that a key outcome of any statistical analysis is to provide an indication of central tendency and dispersion.
• A bar chart describes discrete/interval data; a histogram describes continuous/ratio data.

Four Methods to Better Select a Probability Distribution

Three Major Challenges for Establishing Useful Probability Distributions
1. Nature of the data (use of continuous or discrete scales)
2. Underlying quality of the available reference data
3. Potential lack of meaningful lifetime data

Four Approaches We Will Discuss Today
1. Classic statistics – "complete" data
2. Distribution fitting & bootstrapping – partial data
3. Cumulative distribution function – limited data or no data
4. Quantiles – partial lifetime data

1. Classic Statistics with Good Lifetime Data
Pros
• Most definitive – this is what we are all hoping to have and use.
Cons
• Laboratory testing does not reflect actual operating conditions.
• Field testing is expensive and often not practical.
• Running controlled tests to failure takes time and is expensive.
• Data capture from work orders is erratic and of questionable quality.
• In practice, we seldom run important things to failure.
• Few organizations do formal Root Cause Failure Analysis (RCFA).

2. Distribution Fitting & Bootstrapping
• Applicable with partial lifetime data.
• The bootstrapping method is credited to Efron and Tibshirani.
Pros
• Proven and accepted approach.
• Distribution fitting is a standard tool within @Risk.
Cons
• Requires some knowledge of the typical distribution for the class of data.
• The fitted curve is frequently used beyond the range of the available data.

3. Cumulative Distribution Function
• Applicable with limited data or no data.
• Interview-based approach, developed by Spetzler and Stael von Holstein (1974).
Pros
• Proven and accepted approach.
Cons
• Requires some knowledge of the typical distribution for the class of data.
• Relies on subject matter experts, facilitation skills, and survey quality.

[Slide of example charts: Bar Charts Provide Meaningful Insights]

4. Quantiles
• Applicable with partial data.
• Keelin and Powley advocate the use of quantiles (2012); the quantile density function (QDF) is yet another way of describing a PDF.
Pros
• Mathematically rigorous method.
Cons
• Not common; requires some specialty knowledge and tools.

Types of Distributions

Most Common Distributions
Continuous
• Normal: symmetric range and not bounded
• Triangular: symmetric ranges and bounded
• PERT: non-symmetric ranges and bounded
• Lognormal: non-symmetric and bounded on one side
Discrete
• Binomial
• Poisson

Triangular
• No theoretical justification; however, it is simple and clear to use, and it allows for skewness.
• A good choice where simple intuitive understanding and flexibility are of utmost importance.
• Perhaps the most commonly used distribution in construction risk analysis (a small sampling sketch follows).
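As a rough illustration of the triangular distribution in practice, the following Python sketch samples a single skewed cost item with numpy's triangular generator. It is a stand-in for a triangular input in @Risk, not the author's model, and the minimum / most likely / maximum values are hypothetical.

```python
# Minimal sketch (hypothetical values): Monte Carlo sampling of a single
# cost item modeled with a triangular distribution skewed to the high side.
import numpy as np

rng = np.random.default_rng(7)

low, mode, high = 80_000, 100_000, 150_000  # min, most likely, max ($)
samples = rng.triangular(low, mode, high, size=100_000)

# Central tendency and dispersion - the key outputs of the analysis.
print(f"Mean:   {samples.mean():,.0f}")
print(f"Median: {np.median(samples):,.0f}")
print(f"P90:    {np.percentile(samples, 90):,.0f}")
```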
PERT
• Appropriate when you are more confident that the true value lies in the vicinity of the most likely outcome than at the extremes of the distribution.
• Will result in a lower risk-based project contingency than use of the triangular distribution.

Lognormal
• Good for any risk that can have a cost outcome ranging from $0 to infinity, and for risks that vary over several orders of magnitude with low-probability "bad outcomes."
• Used to characterize extreme events: maximum annual rainfall, cost of utility relocations, installation and testing of complex systems, and repair times for maintenance.

Normal Distribution
• The central limit theorem states that the mean of a set of values drawn independently from a population will be approximately normally distributed (for a sufficiently large sample).
• It is somewhat more difficult to specify a standard deviation than to specify the upper/lower limits required by the triangular and similar distributions.
• Examples: manufacturing processes, many man-made mechanical devices, inflation.

Discrete Distributions
• Used when each outcome has a value and a probability of occurrence.
• Used to model several discrete outcomes (e.g., worst case, expected case, best case) or to model discrete scenarios.
• Often used for yes/no conditions.
• Examples include counting the number of defects, compliance, and rare events (event risks).

Specialty Distributions
• Many other specialty distributions exist.
• The Weibull distribution can represent three different types of behavior depending on its shape parameter – decreasing, constant, or increasing failure rates, the three regions of the classic bathtub curve. It is used commonly in reliability and systems engineering.
• The Beta distribution represents the uncertainty distribution for the probability of a binomial process given observed data.
• The Gamma distribution represents the uncertainty for the intensity of a Poisson process.

The Degree to Which the World is Skewed
Skewed
• Biological (contamination, fish kills)
• Social sciences/human behavior (financial markets)
• Some physical sciences (floods, sand piles, earthquakes)
Normally distributed
• Many physical processes, especially manufacturing or mechanical systems

The Degree to Which It Matters
Example: agriculture and nutrients

Theoretical Foundations (use some common sense!)
Measurement
• Representativeness: objects can actually be assigned to a number.
• Uniqueness: the same object cannot be assigned different numbers by different measurers.
Surveys
• Validity: the survey can be shown to measure the variable it is intended to measure and not others.
• Reliability: the extent to which the same results are obtained with the same question when repeated to the same group of respondents.
See "Applying the Scales-Measurement Theory and Statistical Analysis Controversy to Risk Assessment" by Solomon, Vallero, and Benson.

Moving Forward
• Just get in the game! The vast majority of users do not have all the data they would prefer (a small bootstrap sketch follows this list).
• Embrace it! It is part of the power of probabilistic analysis.
• Be bold! First-generation models direct where more data is needed.
• Be conscious of types of data, and be aware of embedded data – use some common sense.
• Expand your ranges! Avoid anchoring!
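In the spirit of "just get in the game," the sketch below applies a basic bootstrap (the Efron-style resampling mentioned earlier) to a small, hypothetical set of failure ages. It is a simplified Python illustration rather than the @Risk workflow from the deck; the width of the resulting interval is what tells a first-generation model where more data are needed.

```python
# Minimal sketch (hypothetical data): bootstrap the mean from a small,
# partial data set to see how much the limited data constrain the estimate.
import numpy as np

rng = np.random.default_rng(1)
failure_ages = np.array([3.1, 4.7, 5.0, 6.2, 6.8, 8.5, 9.9, 12.4])  # years

n_boot = 10_000
boot_means = np.array([
    rng.choice(failure_ages, size=failure_ages.size, replace=True).mean()
    for _ in range(n_boot)
])

# A wide interval flags where a first-generation model needs more data.
lo, hi = np.percentile(boot_means, [5, 95])
print(f"Sample mean: {failure_ages.mean():.2f} years")
print(f"90% bootstrap interval for the mean: {lo:.2f} to {hi:.2f} years")
```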
Some Insightful References
• Benoit Mandelbrot, The (Mis)behavior of Markets
• Benoit Mandelbrot, The Fractal Geometry of Nature
• Mark Buchanan, Ubiquity
• Philip Ball, Critical Mass
• Sam Savage, The Flaw of Averages