Math 6330: Statistical Consulting
Class 4

Tony Cox
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/

Assignment for next time (February 7)
• Evaluate evidence that PM2.5 causes elderly mortality in new data set, Sample4.xlsx, at http://cox-associates.com/6330/. Be prepared to present your thoughts in < 5-minute presentation (just in case!)
• Read Russo & Schoemaker, 1989, Chapter 5 (improving intelligence-gathering and estimation), https://professional.sauder.ubc.ca/re_creditprogram/course_resources/courses/content/499/russo.pdf
• Software: Download Netica, bring it next time http://www.norsys.com/download.html
• (Optional) YouTube: Hans Rosling TED talk, www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen
• (Optional) Fair Coin problem

Introduction to descriptive analytics (cont.)
– Some high-value tools
• Interaction plots
• CART trees
• Bayesian Networks (BNs)
• Random Forests
• Partial dependence plots
• Visualization

Interaction plots

Interaction plot descriptions can generate important hypotheses
Why is low income so strongly associated with increased risk of heart attack?

Interaction plot descriptions can raise worthwhile research questions
How and why does education affect self-reported health risks?

BNs can help to answer such questions

PM2.5 is informative about elderly mortality in Sample4
Dependence between elderly and other mortality suggests hidden confounders

Do changes in PM2.5 predict changes in elderly mortality in Sample4?

Descriptive analytics: Visualization
• Hans Rosling TED talk, www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen

Introduction to predictive analytics

Everyone wants predictive analytics
“Move your analytics program from descriptive to predictive with Microsoft Azure Machine Learning—part of the Cortana Intelligence Suite. You can use our pre-built modules or upload your own R or Python code.
Learn more with our machine learning guide for data scientists.”

Predictive analytics
• What will happen if we do nothing?
• How sure can we be?
Example: Black-box ARIMA forecasting of losses due to terrorist attacks
http://www.slideshare.net/VictorOdutokun/arima-analysis-project-slide

Predictive analytics challenge: Different models make different predictions for case 7
• Model 1: Outcome = Predictor 3
• Model 2: Outcome = majority(Predictors 2-4)
• Model 3: Outcome = max(Predictors 3-4)

Case  Predictor 1  Predictor 2  Predictor 3  Predictor 4  Outcome
1     1            1            1            1            1
2     0            0            0            0            0
3     0            1            1            0            1
4     1            1            0            0            0
5     0            0            0            0            0
6     1            0            1            1            1
7     1            1            0            1            ?

Predictive analytics techniques
• Forecasting: Pr(future outputs | past)
• Regression: Pr(output | covariates)
• Dynamic simulation: Pr(outputs | inputs)
• Inference: Bayesian network (BN) PDFs
  – Inference: Pr(outputs | observed inputs)
    • Monte-Carlo and exact inference algorithms
    • Structure learning and ensemble learning algs
  – Dynamic Bayesian Networks (DBNs)
    • Kalman filtering and extensions
    • Particle swarm optimization

Breakthroughs in predictive analytics
• Averaging predictions from multiple models improves predictions!
  – More accurate, less bias, more precise (lower error variance), less over-confidence (fewer type 1, type 2 errors)
• Ensemble methods improve forecasts
  – Random forest (rf)
  – Gradient boosting (gbm)
  – Cross-validation, BMA
  – Super-learning

Introduction to Bayesian inference with Netica®

Example: HIV screening
• Pr(s) = 0.01 = fraction of population with HIV
  – s = has HIV, s′ = does not have HIV
  – y = test is positive
• Pr(test positive | HIV) = 0.99
• Pr(test positive | no HIV) = 0.02
• Find: Pr(HIV | test positive) = Pr(s | y)
  – Subjective probability estimates?

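For comparison with any subjective estimate, here is a minimal sketch of the manual calculation by Bayes' rule, using only the three probabilities stated in the HIV screening example. The function name `posterior` and its parameter names are illustrative choices, not part of the course materials.

```python
def posterior(prior, p_pos_given_s, p_pos_given_not_s):
    """Return (Pr(y), Pr(s | y)) for a binary state s and a positive test y."""
    # Total probability of a positive test, summing over both states.
    p_y = prior * p_pos_given_s + (1.0 - prior) * p_pos_given_not_s
    # Bayes' rule: Pr(s | y) = Pr(s) * Pr(y | s) / Pr(y).
    return p_y, prior * p_pos_given_s / p_y


# Inputs taken from the example: Pr(s) = 0.01, Pr(y | s) = 0.99, Pr(y | s') = 0.02.
p_y, p_s_given_y = posterior(prior=0.01, p_pos_given_s=0.99, p_pos_given_not_s=0.02)

print(f"Pr(test positive)       = {p_y:.4f}")          # 0.0297
print(f"Pr(HIV | test positive) = {p_s_given_y:.4f}")  # 0.3333
```

Because false positives among the 99% without HIV (0.99 × 0.02 = 0.0198) outnumber true positives (0.01 × 0.99 = 0.0099) two to one, only about a third of positive tests correspond to HIV.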
Solution via Bayesian Network (BN) Solver
• DAG model: “True state → Observation”
  – DAG = “directed acyclic graph”: Nodes and arrows, no cycles allowed
• Store “marginal probabilities” at input nodes (having output arrows only)
• Store “conditional probability tables” at all other nodes.
• Make observations
• Enter query
  – Solver calculates conditional probabilities

Solution in Netica
• Step 1: Build model, compile network
  Node beliefs (%):
  HIV_status:  HIV present 1.0, HIV not present 99.0
  Test_result: test positive 2.97, test negative 97.0
• Step 2: Condition on observation (right-click, choose “Enter findings”), view conditional probabilities
  Node beliefs (%):
  HIV_status:  HIV present 33.3, HIV not present 66.7
  Test_result: test positive 100, test negative 0

Wrap-up on Netica introduction
• User just needs to enter model and observations (“findings”)
• Netica uses Bayesian Network algorithms to update all probabilities (conditioning them on findings)
• We will learn to do this manually for small problems
• Algorithms and software are essential for large, complex inference problems

Fair Coin Problem
• A box contains two coins: (a) A fair coin; and (b) A coin with a head on each side. One coin is selected at random (we don’t know which) and tossed once. It comes up heads.
• Q1: What is the probability that the coin is the fair coin?
• Q2: If the same coin is tossed again and shows heads again, then what is the new (posterior) probability that it is the fair coin?
Solve manually and/or using Netica.

Using Netica to solve fair coin problem
• Step 1: Create DAG model. (Q: What is its root?)
  A: Root node is “Coin is fair”
• Step 2: Use “Enter Findings” (right-click) to specify observations (i.e., histories of observations on which answers are to be conditioned, e.g., “Head on first toss” or “Heads on first two tosses”)
• Step 3: View the “Coin is fair” root node to view the answer (i.e., Pr(Coin is fair | Observations))

Node beliefs (%) after Step 1, before any findings are entered:
  CoinIsFair:  Yes 50.0, No 50.0
  FirstToss:   Head 75.0, Tail 25.0
  SecondToss1: Head 75.0, Tail 25.0

Node beliefs (%) after entering the finding FirstToss = Head:
  CoinIsFair:  Yes 33.3, No 66.7
  FirstToss:   Head 100, Tail 0
  SecondToss1: Head 83.3, Tail 16.7

Node beliefs (%) after entering the findings FirstToss = Head and SecondToss1 = Head:
  CoinIsFair:  Yes 20.0, No 80.0
  FirstToss:   Head 100, Tail 0
  SecondToss1: Head 100, Tail 0
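The manual solution to Q1 and Q2 can be sketched as repeated application of Bayes' rule, with each posterior serving as the prior for the next toss. This is an illustrative sketch, not Netica's algorithm; the names `update`, `p_fair`, `predictive`, and `history` are assumptions introduced here. Exact fractions are used so the results can be compared directly with the rounded percentages Netica displays.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def update(prior_fair, toss):
    """One Bayes update of Pr(coin is fair) after observing toss 'H' or 'T'."""
    like_fair = HALF                                   # fair coin: H and T equally likely
    like_two_headed = Fraction(1 if toss == "H" else 0)  # two-headed coin never shows T
    numerator = prior_fair * like_fair
    denominator = numerator + (1 - prior_fair) * like_two_headed
    return numerator / denominator


p_fair = HALF      # one of the two coins is selected at random
predictive = []    # Pr(next toss is a head), computed before each toss is observed
history = []       # Pr(coin is fair) after each observed toss

for toss in ["H", "H"]:
    predictive.append(p_fair * HALF + (1 - p_fair) * 1)
    p_fair = update(p_fair, toss)
    history.append(p_fair)

print("Pr(head) before each toss:", [str(p) for p in predictive])  # ['3/4', '5/6']
print("Pr(fair) after each toss: ", [str(p) for p in history])     # ['1/3', '1/5']
```

The values 3/4, 1/3, 5/6, and 1/5 correspond to the 75.0, 33.3, 83.3, and 20.0 percent beliefs reported by Netica for this problem.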