Math 6330: Statistical Consulting, Class 7
Tony Cox, [email protected]
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/

Agenda
• Charniak Family Out network
• Causal analytics (cont.)
• Prescriptive analytics
  – Simulation-optimization
  – Newsvendor problem and applications
  – Decision rules, optimal statistical decisions
• Decision psychology
  – Heuristics and biases
• Probabilistic expert systems (Netica)

Recommended readings on causal analytics
• Charniak (1991), pages 50-53; build the network in Figure 2. www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836
• Pearl (2009). http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
• Daly and Shen (2007), Methods to Accelerate the Learning of Bayesian Network Structures. https://pdfs.semanticscholar.org/e7d3/029e84a1775bb12e7e67541beaf2367f7a88.pdf
• Mooij et al. (2016), Distinguishing cause from effect using observational data. www.jmlr.org/papers/volume17/14-518/14-518.pdf
• Lagani et al. (2016), Probabilistic computational causal discovery for systems biology. www.mensxmachina.org/files/publications/Probabilistic%20Causal%20Discovery%20for%20Systems%20Biology_prePrint.pdf

Family out network
[Netica screenshot: prior marginal probabilities (Yes/No, %) with no evidence entered: family_out 15.0/85.0, bowel_problem 1.00/99.0, light_on 13.3/86.8, dog_out 16.9/83.1, hear_bark 12.6/87.4]

Family out network: Inference
[Netica screenshot: evidence entered, light_on = Yes (100/0) and hear_bark = No (0/100); posteriors become family_out 44.8/55.2, bowel_problem 0.56/99.4, dog_out 33.5/66.5]

Family out network: Inference
[Same Netica screenshot, annotated]
• Information flows both backward up arrows (via Bayes' rule) and forward down arrows
• Multiple sources of evidence are fused at the "dog_out" node (via conditioning)
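The same kind of posterior updating can be reproduced outside Netica by brute-force enumeration of the joint distribution. A minimal sketch in R follows (R because the rest of the course tooling pushes data to R). The conditional probability tables are the ones given in Charniak (1991); they are assumed here for illustration and are not necessarily the values behind the Netica screenshots above, so the exact posteriors differ, but the mechanics of conditioning on light_on = Yes and hear_bark = No, and of fusing both pieces of evidence, are the same.

```r
# Illustrative CPTs from Charniak (1991); not necessarily the slide's Netica values.
p_fo <- 0.15                                         # P(family_out)
p_bp <- 0.01                                         # P(bowel_problem)
p_lo <- function(fo) ifelse(fo, 0.60, 0.05)          # P(light_on | family_out)
p_do <- function(fo, bp) ifelse(fo & bp, 0.99,
                         ifelse(fo, 0.90,
                         ifelse(bp, 0.97, 0.30)))    # P(dog_out | family_out, bowel_problem)
p_hb <- function(do_) ifelse(do_, 0.70, 0.01)        # P(hear_bark | dog_out)

# Enumerate all 2^5 joint states and condition on the evidence.
states <- expand.grid(fo = c(TRUE, FALSE), bp = c(TRUE, FALSE), lo = c(TRUE, FALSE),
                      do_ = c(TRUE, FALSE), hb = c(TRUE, FALSE))
joint <- with(states,
  ifelse(fo, p_fo, 1 - p_fo) * ifelse(bp, p_bp, 1 - p_bp) *
  ifelse(lo, p_lo(fo), 1 - p_lo(fo)) *
  ifelse(do_, p_do(fo, bp), 1 - p_do(fo, bp)) *
  ifelse(hb, p_hb(do_), 1 - p_hb(do_)))

ev <- states$lo & !states$hb                         # evidence: light is on, no bark heard
sum(joint[ev & states$fo]) / sum(joint[ev])          # P(family_out | evidence), about 0.50 here
```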
Pearl (2009), Sections 1-2
Four under-used ideas for causal analysis:
1. Counterfactual analysis
2. Nonparametric structural equations, y = f(x, e)
3. Graphical models (DAGs)
4. Symbiosis between counterfactual and graphical methods: y = f(do(x), e)
Challenges: untested assumptions, new notation

Project ideas

Goals for student projects
• Extract good problems from available knowledge and data
  – "Good" = high value of analysis = large improvement in decisions, results, etc.
• Apply high-value techniques to produce valuable answers and insights
  – Unexpected directions are ok!
• Present the results so that the potential value is actually delivered
• If possible, document impact and next steps
• Or… do algorithm research, a software project, suggest improved analytics for –omics data analysis, etc.

General ideas for projects
1. Read and report on a book or a collection of about 3-5 technical papers in an area where statistical consulting is extensively used
   – Marketing, pricing, A/B testing; location, operations, inventory and logistics; bidding
   – Policy analysis
   – Genomic and other –omics data analysis and predictive analytics
2. Research and report on, or extend, some of the algorithms behind Netica, simulation-optimization, or other analytics software
3. Create a piece of software for advanced analytics
4. Apply statistical consulting methods and software to a real-world problem from work or life
5. Propose your own novel research or application

Some examples of applied statistical consulting project ideas
1. Optimizing buyer discounts given history
2. Excel simulation-optimization (SO) solver for multi-item inventory control decisions
3. SO for call center intent-based routing to reduce call handling times
4. NFL decision analytics: improving 4th-down decisions
5. Providing more valuable information to help change customer/patient behaviors

Components of a successful project
1. Problem statement and motivation
2. Data
3. Analysis plan/narrative
4. Tools and software
5. Results: reports and displays
6. Presentation: what did we learn?
7. Evaluation: what was the impact?
8. Proposed next steps

Causal analytics (cont.)

How to get from data to causal predictions… objectively?
• Causal prediction
  – Deterministic causal prediction: doing X will make Y happen to people of type Z
  – Probabilistic causal prediction: doing X will change the conditional probability distribution of Y, given covariates Z
• Goal: manipulative causation (vs. associational, counterfactual, predictive, computational, etc.)
• Data: observed (X, Y, Z) values
• Challenge: how will changing X change Y?

Ambiguous associations undermine causal predictions from data
• How would cutting exposure concentration C in half affect future response rate R?

  Community   Concentration C   Income I   Response rate R
  A                4               100            8
  B                8                60           16
  C               12                20           24

Model 1: R = 2C. If this is a valid structural equation, then ∆R = 2∆C. The corresponding DAG is C → R.

Ambiguous associations undermine causal predictions from data (cont.)
• How would cutting exposure concentration C in half affect future response rate R?
  – There is no way to determine this from historical data alone: the same table is fitted exactly by all of the following models.

Model 1: R = 2C (with I = 140 – 10C); DAG: I ← C → R
Model 2: R = 35 – 0.5C – 0.25I; DAG: C → R ← I
Model 3: R = 28 – 0.2I (with C = 14 – 0.1I); DAG: C ← I → R

So decreasing C could decrease R, increase it, or leave it unchanged.
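The ambiguity is easy to verify: all three structural models reproduce the observed table exactly, yet they predict different consequences of halving C. A minimal R check, using only the numbers on the slide (Model 2's prediction holds income I fixed at its observed values):

```r
# Data from the three-community table
d <- data.frame(C = c(4, 8, 12), I = c(100, 60, 20), R = c(8, 16, 24))

# Each model fits the observed response rates exactly
all.equal(d$R, 2 * d$C)                        # Model 1: I <- C -> R
all.equal(d$R, 35 - 0.5 * d$C - 0.25 * d$I)    # Model 2: C -> R <- I
all.equal(d$R, 28 - 0.2 * d$I)                 # Model 3: C <- I -> R

# Predicted response rates after cutting C in half
C_half <- d$C / 2
data.frame(community = c("A", "B", "C"),
           model1 = 2 * C_half,                        # R is halved
           model2 = 35 - 0.5 * C_half - 0.25 * d$I,    # R increases
           model3 = 28 - 0.2 * d$I)                    # R is unchanged
```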
Implications
• Ambiguous associations obscure objective answers and make sound modeling and inference more difficult
  – Conclusions are not purely data-driven
    • hypothesis → data → conclusion
  – Instead, they conflate data and modeling assumptions
    • hypothesis/model/assumptions + data → conclusions
  – Sound (objective, trustworthy, well-justified, independently repeatable, verifiable) inference is undermined when conclusions rest on untested assumptions
  – Ambiguous associations are common in practice
• Wanted: a way to reach valid, robust (model-independent) conclusions from data that can be fully specified before seeing the data

Causal Analytics Toolkit (CAT) software for advanced analytics

CAT uses data in Excel
• Load data in Excel, then click "Excel to R" to send it to R
  – Los Angeles air basin, 1,461 days, 2007-2010 (Lopiano et al., 2015; thanks to Stan Young for the data)
  – PM2.5 data from CARB
  – Elderly mortality ("AllCause75") from the CA Department of Health
  – Daily min and max temperatures and max relative humidity from ORNL and EPA
• Risk question: Does PM2.5 exposure increase elderly mortality risk? If so, by how much?

Using CAT to examine associations: Plotting the data
1. Send data from Excel to R
   – Highlight columns
   – Click on "Excel to R"
2. Select columns to analyze
   – Click on column headers
   – Ctrl-click toggles selection
3. Click on Plots to view frequency distributions, scatter plots, correlations, and smooth regression curves
   – PM2.5 is slightly negatively associated with mortality

Using CAT to examine associations: Plotting more data
1. Send data from Excel to R
   – Highlight columns
   – Click on "Excel to R"
2. Select columns
   – Click on column headers
   – Ctrl-click toggles selection
3. Click on Plots to view frequency distributions, scatter plots, correlations, and smooth regression curves
   – Temperature is positively associated with PM2.5
   – Temperature is negatively associated with mortality

Basic ideas of causal analytics
• Use a network to show which variables provide direct information about each other
  – Arrows between variables show that they are informative about each other, even given all other variables
  – Learn network structure directly from data
  – Carefully check conclusions
    • In non-parametric analyses we trust!
    • Do power analyses using simulation
  – Interpret neighbors in the network as potential direct causes (satisfying a necessary condition)
• Use sensitivity (partial dependence) graphs, based on averaging over many trees, to quantify the relation between independent and dependent variables

Run BN structure discovery algorithms
• Click the Bayesian Network button to generate the DAG structure
  – Only variables connected to the response variable by an arrow are identified as potential direct causes
  – Multiple pathways between two variables reveal potential direct and indirect effects
  – Example: direct and indirect paths between tmax and AllCause75
[CAT output: CAT_bnLearn(year, month, day, AllCause75, PM2.5, tmin, tmax, MAXRH). Bayesian network diagram; an arrow between two variables shows that they are informative about each other. Network discovered by bnlearn.]

Confirm or refute/refine BN structure with additional non-parametric tests
• Conditioning on very different values of a direct cause should cause the distribution of the response variable to change
• If the response variable does not change, then any association between them may be due to indirect pathways (e.g., confounding)

Discovering DAG structure resolves ambiguous associations
• How would cutting PM2.5 pollution in half affect future elderly mortalities per year?
  – No way to determine from association data alone

  Community   PM2.5 in 1980 (µg/m3)   Income   Elderly mortality rate in 1980
  A                   4                 100                 8
  B                   8                  60                16
  C                  12                  20                24

Model 1: Income ← PM2.5 → Mortality: mortality would be halved
Model 2: PM2.5 → Mortality ← Income: mortality would increase
Model 3: PM2.5 ← Income → Mortality: mortality would not change

Quantify direct causal relations
• Procedure: to quantify direct (potentially causal) relations after controlling for other variables and indirect pathways, estimate the partial dependence graph for response R vs. (potential) cause C
• Rationale: screening and BN structure discovery have shown that the relation might be causal; partial dependence estimates the size of the potential effect
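Outside the CAT front end, the two steps just described (structure discovery with bnlearn, then partial dependence from a Random Forest ensemble) can be sketched directly in R. The file name and the model formula below are assumptions for illustration; the column names follow the CAT_bnLearn call shown above, and this is not the CAT implementation itself.

```r
library(bnlearn)        # Bayesian network structure learning
library(randomForest)   # non-parametric tree ensemble for partial dependence

# Assumed CSV export of the Los Angeles air-basin data set described above
df <- read.csv("LA_air_basin_2007_2010.csv")
df[] <- lapply(df, as.numeric)      # bnlearn expects all-numeric (or all-factor) columns

# Step 1: learn a DAG from the data (hill-climbing score-based search; gs(), mmhc(), etc. are alternatives)
dag <- hc(df)
plot(dag)                           # arrows link variables that are informative about each other

# Step 2: quantify the direct PM2.5-mortality relation while controlling for the other variables
rf <- randomForest(AllCause75 ~ PM2.5 + tmin + tmax + MAXRH + month, data = df)
partialPlot(rf, pred.data = df, x.var = "PM2.5",
            main = "Partial dependence of AllCause75 on PM2.5")
```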
Validate quantified C-R relations in a hold-out sample
• The current CAT uses bootstrap and cross-validation approaches for Random Forest ensembles
• Cross-validation and hold-out-sample validation reports for regression and other analyses

Summary of CAT's causal analytics
• Screen for total, partial, and temporal associations and information relations
• Learn BN network structure from data
• Estimate quantitative dependence relations among neighboring variables
  – Use partial dependence plots (Random Forest ensemble of non-parametric trees)
  – Use trees to quantify multivariate dependencies on multiple neighbors simultaneously
• Validate on hold-out samples

Systems dynamics simulations
http://www.systemdynamics.org/conferences/2003/proceed/PAPERS/417.pdf

Systems dynamics model structure reveals key loops to break
https://wdsi.wordpress.com/2014/03/30/analysing-terrorism-from-a-systems-thinking-perspective/

Introduction to prescriptive analytics (decision analytics)

Concurrent choice experiment
Your client must pick one of A or B, and pick one of C or D:
• A: Gain $240 for sure
• B: 25% chance to gain $1000, else gain 0
• C: Sure loss of $750
• D: 75% chance to lose $1000, else lose 0
So the client's four possible choices are:
• A & C
• A & D
• B & C
• B & D
Which pair would you recommend?

Usual concurrent choices
Pick one of A or B, and pick one of C or D:
• A: Gain $240 for sure [84%] (Tversky & Kahneman, 1981)
• B: 25% chance to gain $1000, else gain 0 [16%]
• C: Sure loss of $750 [13%]
• D: 75% chance to lose $1000, else lose 0 [87%]
But the usual choice (A & D) is objectively worse than the unusual choice (B & C). Why?
• (A and D) is equivalent to a 75% chance to lose $760 (= $240 – $1000), else gain $240.
• (B and C) is equivalent to a 75% chance to lose $750, else gain $250.

Stochastic dominance
• (A and D) gives a 75% chance to lose $760 and a 25% chance to gain $240.
• (B and C) gives a 75% chance to lose $750 and a 25% chance to gain $250.
• (B & C) stochastically dominates (A & D)
  – First-order stochastic dominance (FSD): the probabilities of preferred outcomes are greater for (B & C) than for (A & D)
• Decision analysis (DA) identifies undominated choices
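The dominance claim is easy to check numerically. A minimal R verification of the two portfolio distributions above (the dollar amounts and probabilities are exactly those on the slide):

```r
# Outcome distributions of the two concurrent-choice portfolios
AD <- data.frame(outcome = c(240 - 1000, 240), prob = c(0.75, 0.25))  # usual choice A & D
BC <- data.frame(outcome = c(-750, 250),       prob = c(0.75, 0.25))  # unusual choice B & C

sum(AD$outcome * AD$prob)   # expected value of A & D: -510
sum(BC$outcome * BC$prob)   # expected value of B & C: -500

# First-order stochastic dominance: for every threshold t, B & C gives at least as high
# a probability of ending up above t as A & D does (and strictly higher for some t)
p_above <- function(lottery, t) sum(lottery$prob[lottery$outcome > t])
thresholds <- seq(-1000, 300, by = 5)
all(sapply(thresholds, function(t) p_above(BC, t) >= p_above(AD, t)))  # TRUE
```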
Conclusions on concurrent choice
• Humans tend to consider each choice in isolation rather than looking at portfolios of outcomes and their probabilities for different combinations of choices
• This leads us to make predictably sub-optimal choices. (So do many other psychological barriers to rational decision-making.)
• Statistical consulting (and decision analysis) can overcome such sub-optimal decision-making

Prescriptive analytics
• What to do next?
  – Uncertain models for predicting probable effects of different actions
  – Value of information
  – Exploration-exploitation trade-off
• Decision rules
• Trial design
  – Sequential multiple assignment randomized trials

Optimize changes in controllable inputs, given Pr(c | x) and u(c)
https://wdsi.wordpress.com/2014/03/30/analysing-terrorism-from-a-systems-thinking-perspective/
http://www.tandfonline.com/doi/abs/10.1198/jasa.2009.0155?journalCode=uasa20

Traditional prescriptive analytics
• Standard decision problem: choose x ∈ X to maximize an objective function f(x)
  – x = decision variable (control vector, trajectory, policy mapping information to action, etc.)
    • May be distributed over multiple collaborating agents
  – X = choice set
    • Possibly specified by constraints; may be hard to search
• Typically, f(x) = EU(x) = expected utility of decision x
  – Possibly evaluated via Monte Carlo

In slightly more detail…
• Pr(c | x) = probability of receiving consequence c if we do x
  – c is a member of the consequence set C
  – Pr(c | x) is a risk model
    • Fine print: Pr(c | x) is not really a conditional probability, because x is an act, not an event
• u(c) = utility of consequence c
• EU(x) = Σc∈C u(c)·Pr(c | x)

Challenges
• EU(x) may be uncertain or unknown, or may be different for different agents
  – Unknown consequence probabilities, Pr(c | x)
    • How to estimate them from data?
  – Uncertain consequence set, C ("black swans")
  – Uncertain choice set, X
    • What constraints hold? How hard or soft are they?
  – Uncertain values, u(c)
• Problems with multiple interacting agents
• Even for known EU(x), the maximization problem may be hard (or impossible)

Causal analysis challenge
• What to do next when the consequences of alternative decisions are uncertain?
  – Known probabilities → EU theory
    • Known risk model: Pr(c | x)
    • Known probabilities of different risk models
  – Unknown probabilities → ?

Decisions with unknown models
• Learn causal model(s) from data/experience, then use them to optimize decisions
  – Design of experiments (DOE)
  – Randomized controlled trials (RCTs) and practical/pragmatic clinical trials (PCTs)
  – Learn from natural experiments and quasi-experiments
• Learn decision rules from data
  – Adaptive learning: trial and error, low-regret learning
  – Highly effective in simple (e.g., MDP) environments
• Robust optimization
  – Find decisions that perform well no matter how uncertainties are resolved

Algorithms for optimizing actions
• Influence diagram algorithms
  – Learning them from data
  – Validating causal mechanisms
  – Using them for inference and recommendations
• Simulation-optimization
• Robust optimization
• Adaptive optimization/learning algorithms
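To tie the EU(x) notation to the "simulation-optimization" and "newsvendor problem" items on the agenda, here is a minimal, self-contained R sketch of simulation-optimization for a newsvendor decision. The price, unit cost, and Poisson demand distribution are invented purely for illustration; EU(x) is evaluated by Monte Carlo for each candidate order quantity x in the choice set, and the best x is selected.

```r
# Newsvendor sketch: choose order quantity x to maximize expected profit (risk-neutral utility u).
# The price, cost, and demand distribution below are illustrative assumptions, not course data.
set.seed(1)
price <- 10
cost  <- 6
demand <- rpois(10000, lambda = 50)            # Monte Carlo sample of the uncertain consequence driver

expected_profit <- function(x) {
  mean(price * pmin(x, demand) - cost * x)     # EU(x): average simulated profit for order quantity x
}

X  <- 0:100                                    # choice set of candidate order quantities
EU <- sapply(X, expected_profit)
X[which.max(EU)]                               # simulation-optimization: best order quantity found
```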