Math 6330: Statistical Consulting
Class 8
Tony Cox, [email protected]
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/

Agenda
• Projects and schedule
• Prescriptive (decision) analytics (cont.)
  – Decision trees
  – Simulation-optimization
  – Newsvendor problem and applications
  – Decision rules, optimal statistical decisions
  – Quality control, SPRT
• Evaluation analytics
• Learning analytics
• Decision psychology
  – Heuristics and biases

Recommended readings
• Charniak (1991) (rest of paper); build the network in Figure 2
  www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836
• Pearl (2009) http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
• Daly and Shen (2007), Methods to Accelerate the Learning of Bayesian Network Structures
  https://pdfs.semanticscholar.org/e7d3/029e84a1775bb12e7e67541beaf2367f7a88.pdf
• Mooij et al. (2016), Distinguishing cause from effect using observational data
  www.jmlr.org/papers/volume17/14-518/14-518.pdf
• Lagani et al. (2016), Probabilistic computational causal discovery for systems biology
  www.mensxmachina.org/files/publications/Probabilistic%20Causal%20Discovery%20for%20Systems%20Biology_prePrint.pdf

Projects

Papers and projects: 3 types
• Applied: Analyze an application (description, prediction, causal analysis, decision, evaluation, learning) using high-value statistical consulting methods
• Research/develop software
  – R packages, algorithms, CAT modules, etc.
• Research/review a book or papers (3-5 articles)
  – Explain a topic within statistical consulting
  – Examples: Netica’s Bayesian inference algorithms, multicriteria decision-making, machine learning algorithms, etc.

Projects (cont.)
• A typical report paper is about 10-20 pages, 12-point font, 1.5 line spacing (typical, not required)
• Content matters; length does not
• A typical in-class presentation is 20-30 minutes
  – Can run longer if needed
• Purposes:
  1. Learn something interesting and useful
  2. Either explain/show what you learned, or show how to use it in practice (or both)

Project proposals due March 17
• If you have not yet done so, please send me a succinct description of what you want to do (and perhaps what you hope to learn by doing it):
  – Problem to be addressed
  – Methods to be researched/applied
  – Hoped-for results
• Due by end of day on Friday, March 17 (sooner is welcome)
• Key dates: April 14 for rough draft (or very good outline)
• In-class presentations/discussions start April 18
• May 4, 8:00 PM for final version

Course schedule
• March 14: No class (work on project idea)
• March 17: Project/paper proposals due
• March 21: No class (spring break)
• April 14: Draft of project/term paper due
• April 18, 25, May 2, (May 9): In-class presentations
• May 4: Final project/paper due by 8:00 PM

Prescriptive analytics (cont.)

Algorithms for optimizing actions
• Decision analysis framework: Choose act a from choice set A to maximize expected utility of consequence c, given a causal model c(a, s), Pr(s) or Pr(c | a, s), Pr(s) (a worked sketch follows below)
  – s = state = random variable = things that affect c other than the choice of act a
• Influence diagram algorithms
  – Learning ID structure from data
  – Validating causal mechanisms
  – Using for inference and recommendations
• Simulation-optimization
• Robust optimization
• Adaptive optimization/learning algorithms
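As a concrete illustration of this framework, here is a minimal Python sketch that picks the act maximizing EU(a) = ∑s Pr(s)·u(c(a, s)). The two acts, two states, probabilities, and payoffs are hypothetical, not from the course materials.

```python
# Minimal expected-utility sketch: choose act a in A maximizing
# EU(a) = sum over states s of Pr(s) * u(c(a, s)).
# All acts, states, probabilities, and payoffs here are hypothetical.

probs = {"good_market": 0.6, "bad_market": 0.4}          # Pr(s)

consequences = {                                         # c(a, s), in $
    "launch":     {"good_market": 100_000, "bad_market": -50_000},
    "do_nothing": {"good_market": 0,       "bad_market": 0},
}

def u(c):
    """Risk-neutral utility u(c) = c; substitute a concave u for risk aversion."""
    return c

def expected_utility(act):
    return sum(pr * u(consequences[act][s]) for s, pr in probs.items())

for act in consequences:
    print(f"EU({act}) = {expected_utility(act):,.0f}")
print("Optimal act:", max(consequences, key=expected_utility))
```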
Prescriptive analytics methods
• Optimization
  – Decision trees
  – Stochastic dynamic programming, optimal control
  – Gittins indices
  – Reinforcement learning (RL) algorithms
• Influence diagram solution algorithms
• Simulation-optimization
• Adaptive learning and optimization
  – EVOP (evolutionary operation)
  – Multi-armed bandit problems, UCB strategies

Decision tree ingredients
• Three types of nodes
  – Choice nodes (squares)
  – Chance nodes (circles)
  – Terminal nodes/value nodes
• Arcs show how decisions and chance events can unfold over time
  – Uncertainties are resolved as time passes and choices are made

Solving decision trees
• “Backward induction”
• “Stochastic dynamic programming”
  – “Average out and roll back”
  – Implicitly, the tree determines Pr(c | a)
• Procedure:
  – Start at the tips of the tree and work backward
  – Compute the expected value at each chance node (“averaging out”)
  – Choose the maximum expected value at each choice node
  – (a code sketch of this rollback follows the “Key points” slide below)

Obtaining Pr(s) from decision trees
http://www.eogogics.com/talkgogics/tutorials/decision-tree
• Decision 1: Develop or Do Not Develop
• The chance-branch probabilities supply Pr(s): Development Successful (70%) vs. Development Unsuccessful (30%)
• Averaging out: (70% × $172,000) + (30% × (−$500,000)) = $120,400 + (−$150,000) = −$29,600

What happened to act a and state s?
http://www.eogogics.com/talkgogics/tutorials/decision-tree
• What are the 3 possible acts in this tree?
  – (a) Don’t develop; (b) Develop, then rebuild if successful; (c) Develop, then new line if successful
• Optimize decisions!

Key points
• Solving decision trees (with decisions) requires embedded optimization
  – Make future decisions optimally, given the information available when they are made
• Event trees = decision trees with no decisions
  – Can be solved, to find outcome probabilities, by forward Monte Carlo simulation, or by multiplication and addition
• In general, sequential decision-making cannot be modeled well using event trees
  – Must include (optimal choice | information)
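Below is a minimal Python sketch of “average out and roll back,” applied to the Develop example above. The nested-tuple tree encoding is an assumption of this sketch, as are treating the $172,000 figure as the already rolled-back value of the success branch and assigning Do Not Develop a payoff of $0.

```python
# Minimal "average out and roll back" sketch. A tree is a nested structure:
#   ("choice", {act_name: subtree}), ("chance", [(prob, subtree), ...]),
# or a bare number (terminal payoff).

def rollback(node):
    """Return (expected value, best act at this node or None)."""
    if isinstance(node, (int, float)):              # terminal node
        return node, None
    kind, branches = node
    if kind == "chance":                            # "averaging out"
        return sum(p * rollback(sub)[0] for p, sub in branches), None
    if kind == "choice":                            # embedded optimization
        values = {act: rollback(sub)[0] for act, sub in branches.items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node kind: {kind}")

tree = ("choice", {
    "Develop": ("chance", [
        (0.70,  172_000),     # development successful (rolled-back value)
        (0.30, -500_000),     # development unsuccessful
    ]),
    "Do Not Develop": 0,
})

ev, act = rollback(tree)
print(f"Optimal act: {act}; EV = ${ev:,.0f}")
# EV(Develop) = 0.7*172,000 + 0.3*(-500,000) = 120,400 - 150,000 = -29,600,
# so "Do Not Develop" (EV = $0) is optimal at these numbers.
```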
What happened to state s?
http://www.eogogics.com/talkgogics/tutorials/decision-tree
• What are the 4 possible states?
  – C1 can succeed or not; C2 can have high or low demand

Acts and states cause consequences
http://www.eogogics.com/talkgogics/tutorials/decision-tree

Key theoretical insight
• A complex decision model can be viewed as a (possibly large) simple Pr(c | a) model
  – s = selection of branch at each chance node
  – a = selection of branch at each choice node
  – c = outcome at terminal node for (a, s)
  – Pr(c | a) = ∑s Pr(c | a, s)·Pr(s)
• Other complex decision models can also be interpreted as c(a, s), Pr(c | a, s), or Pr(c | a) models
  – s = system state & information signal
  – a = decision rule (information → act)
  – c may include changes in s and in possible a

Real decision trees can quickly become “bushy messes” (Raiffa, 1968) with many duplicated subtrees
[Figure: BSE import-testing decision tree with decisions D1 (Track Imports vs. Don’t Track Imports) and D2|d1 (Test All, Test CA only, Repeat Test), and chance outcomes Y1|d1 and Y2|d1,d2 (No BSE, BSE in CA, BSE in US from US, BSE in US from CA), illustrating many duplicated subtrees]

Influence diagrams help to avoid large trees
http://en.wikipedia.org/wiki/Decision_tree
• Often much more compact than decision trees

Limitations of decision trees
• Combinatorial explosion
  – Example: Searching for a prize in one of N boxes or locations involves N! = N(N – 1)…·2·1 possible inspection orders
• Infinite trees
  – Continuous variables
  – When to stop growing a tree?
• How to evaluate utilities and probabilities?

Optimization formulations of decision problems
• Example: The prize is in location j with prior probability p(j), j = 1, 2, …, N
• It costs c(j) to inspect location j
• What search strategy minimizes the expected cost of finding the prize?
  – What is a strategy? An order in which to inspect
  – How many are there? N!

With two locations, 1 and 2
• Strategy 1: Inspect 1, then 2 if needed
  – Expected cost: c1 + (1 – p1)c2 = c1 + c2 – p1c2
• Strategy 2: Inspect 2, then 1 if needed
  – Expected cost: c2 + (1 – p2)c1 = c1 + c2 – p2c1
• Strategy 1 has lower expected cost if p1c2 > p2c1, i.e., p1/c1 > p2/c2
• So, look first at the location with the highest success probability per unit cost

With N locations
• Optimal decision rule: Always inspect next the (as-yet uninspected) location with the greatest success probability-to-cost ratio (see the sketch below)
  – An example of an “index policy” (“Gittins index”)
  – If M players take turns competing to find the prize, each should still use this rule
• A decision table or tree can be unwieldy even for such simple optimization problems
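A minimal Python sketch of the index policy, with hypothetical probabilities and costs; a brute-force check over all N! inspection orders is included for comparison.

```python
# Index-policy sketch for the search problem above: inspect locations in
# decreasing order of p(j)/c(j). The probabilities and costs are hypothetical.
from itertools import permutations

p = {"A": 0.5, "B": 0.3, "C": 0.2}   # prior Pr(prize is at j); sums to 1
c = {"A": 4.0, "B": 1.2, "C": 1.0}   # cost of inspecting location j

def expected_cost(order):
    """E[cost] = sum over inspections of Pr(not yet found) * inspection cost."""
    total, pr_unfound = 0.0, 1.0
    for j in order:
        total += pr_unfound * c[j]
        pr_unfound -= p[j]           # prize found at j with probability p(j)
    return total

index_order = sorted(p, key=lambda j: p[j] / c[j], reverse=True)
brute_force = min(permutations(p), key=expected_cost)   # enumerates all N! orders

print("Index policy:", index_order, "cost:", round(expected_cost(index_order), 3))
print("Brute force: ", list(brute_force), "cost:", round(expected_cost(brute_force), 3))
```

The index order matches the brute-force optimum, but enumeration grows as N! while the index rule needs only a sort.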
Other optimization formulations
• max a∈A EU(a)
  – Typically, a is a vector and A is the feasible set
  – More generally, a is a strategy/policy/decision rule and A is the choice set of feasible strategies
  – In the previous example, A = the set of permutations
• max a∈A EU(a) subject to:
  EU(a) = ∑c Pr(c | a)·u(c)
  Pr(c | a) = ∑s Pr(c | a, s)·p(s)
  g(a) ≤ 0 (feasible set, A)

Introduction to evaluation analytics

Evaluation analytics: How well are policies working?
• Algorithms for evaluating effects of actions, events, conditions
  – Intervention analysis/interrupted time series
• Key idea: Compare predicted outcomes with no action to observed outcomes with the action (a code sketch follows at the end of this section)
  – Counterfactual causal analysis
  – Google’s CausalImpact algorithm
• Quasi-experimental designs and analysis
  – Refute non-causal explanations for the data
  – Compare to control groups to estimate effects

How did the U.K. National Institute for Health and Clinical Excellence (NICE) recommendation in March 2008 of complete cessation of antibiotic prophylaxis for prevention of infective endocarditis affect the incidence of infective endocarditis?
www.thelancet.com/journals/lancet/article/PIIS0140-6736(14)62007-9/fulltext?rss=yes

Different models yield different conclusions. So, how to deal with model uncertainty?
• Solution: Model ensembles, Bayesian model averaging (BMA)
www.thelancet.com/journals/lancet/article/PIIS0140-6736(14)62007-9/fulltext?rss=yes

Nonlinear models complicate inference of intervention effects
• Solution: Non-parametric models, gradient boosting

Quasi-experiments: Refuting non-causal explanations with control groups
• Example: Do delinquency interventions work?
http://www.slideshare.net/horatjitra/research-design-and-validity

Algorithms for evaluating effects of combinations of factors
• Classification trees
  – Boosted trees, Random Forest, MARS
• Bayesian network algorithms
  – Discovery
    • Conditional independence tests
  – Validation
  – Inference and explanation
• Response surface algorithms
  – Adaptive learning, design of experiments
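A minimal sketch of the interrupted-time-series idea on simulated data: fit the pre-intervention trend, project it forward as the “no action” counterfactual, and compare with observed post-intervention outcomes. (Google’s CausalImpact uses Bayesian structural time series with control series; the simple OLS trend below only illustrates the counterfactual-comparison logic.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly incidence with a shift after an intervention at month 36.
t = np.arange(60)
y = 10 + 0.05 * t + rng.normal(0, 0.5, 60)
y[36:] += 1.5                            # true intervention effect: +1.5/month

pre, post = t < 36, t >= 36

# Fit a linear trend to the pre-intervention period only, then project it
# forward over the post-period as the predicted "no action" counterfactual.
slope, intercept = np.polyfit(t[pre], y[pre], 1)
counterfactual = intercept + slope * t[post]

effect = y[post] - counterfactual        # observed minus counterfactual
print(f"Estimated mean effect: {effect.mean():.2f} per month (true: 1.5)")
```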
Learning analytics
• Learn to predict better
  – Create an ensemble of models, algorithms
• Use multiple machine learning algorithms
  – Logistic regression, Random Forest, SVM, ANN, deep learning, gradient boosting, KNN, lasso, etc.
  – “Stack” models (hybridize multiple predictions); a stacking sketch appears at the end of these notes
• Cross-validation assesses model performance
  – A meta-learner combines performance-weighted predictors to produce an improved predictor
• Theoretical guarantees, practical successes (Kaggle competitions)
• Learn to decide better
  – Low-regret learning of decision rules
  – Theoretical guarantees (MDPs) and practical performance
http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html

Collaborative risk analytics: Multiple interacting learning agents
http://groups.inf.ed.ac.uk/agents/index.php/Main/Projects

Collaborative risk analytics
• Global performance metrics
• Local information, control, tasks, priorities, rewards
  – Hierarchical distributed control
  – Collaborative sensing, filtering, deliberation, and decision-control networks of agents
• Mixed human and machine agents
• Autonomous agents vs. intelligent assistants
http://www.cities.io/news/page/3/

Collaborative risk analytics: Games as labs for distributed AI
• Local information, control, tasks, priorities
  – Hierarchical distributed control
  – Collaborative sensing, deliberation, control networks
• From decentralized agents to effective risk analytics teams and HCI support
  – Trust, reputation, performance
  – Sharing information, attention, control, evaluation, learning
http://people.idsia.ch/~juergen/learningrobots.html

Risk analytics toolkit: Summary
1. Descriptive analytics
   – Change-point analysis, likelihood ratio (CPA)
   – Machine learning, response surfaces (ML: LR, RF, GBM, SVM, ANN, KNN, etc.)
2. Predictive analytics
   – Bayesian networks, dynamic BNs (BN, DBN)
   – Bayesian model averaging (BMA, ML)
3. Causal analytics & principles
   – Causal BNs, systems dynamics (DAGs, SD simulation)
   – Time series causation
4. Prescriptive analytics (IDs, simulation-optimization, robust optimization)
5. Evaluation analytics (QE, credit assignment, attribution)
6. Learning analytics
   – Machine learning, superlearning (ML)
   – Low-regret learning of decision rules (collaborative learning)

Applied risk analytics toolkit: Toward more practical analytics
Reorientation: From solving well-posed problems to discovering how to act more effectively
1. Descriptive analytics: What’s happening?
2. Predictive analytics: What’s coming next?
3. Causal analytics: What can we do about it?
4. Prescriptive analytics: What should we do?
5. Evaluation analytics: How well is it working?
6. Learning analytics: How to do better?
7. Collaboration: How to do better together?
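As a coda, here is a minimal scikit-learn sketch of the stacking/superlearning idea referenced above: out-of-fold predictions from several base learners become features for a meta-learner. The synthetic dataset and the particular learners are illustrative choices, not the course’s prescribed ones.

```python
# Stacking sketch: base learners' out-of-fold predicted probabilities become
# inputs to a meta-learner. Dataset and learners are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_learners = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=200, random_state=0),
    KNeighborsClassifier(),
]

# Out-of-fold predicted probabilities: one column per base learner.
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_learners
])

meta = LogisticRegression()   # meta-learner combines the base predictions
for m in base_learners:
    print(type(m).__name__, cross_val_score(m, X, y, cv=5).mean().round(3))
# Note: a strict superlearner would nest this meta-level cross-validation
# inside the outer folds; this single level slightly flatters the estimate.
print("Stacked:", cross_val_score(meta, Z, y, cv=5).mean().round(3))
```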