
Math 6330: Statistical Consulting
Class 8
Tony Cox
[email protected]
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/
Agenda
• Projects and schedule
• Prescriptive (decision) analytics (Cont.)
– Decision trees
– Simulation-optimization
– Newsvendor problem and applications
– Decision rules, optimal statistical decisions
– Quality control, SPRT
• Evaluation analytics
• Learning analytics
• Decision psychology
– Heuristics and biases
Recommended readings
• Charniak (1991) (rest of paper)
– Build the network in Figure 2 www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836
• Pearl (2009) http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
• Methods to Accelerate the Learning of Bayesian Network
Structures, Daly and Shen (2007)
https://pdfs.semanticscholar.org/e7d3/029e84a1775bb12e7e67541beaf2367f7a88.pdf
• Distinguishing cause from effect using observational data
(Mooij et al., 2016), www.jmlr.org/papers/volume17/14-518/14-518.pdf
• Probabilistic computational causal discovery for systems biology (Lagani et al., 2016)
www.mensxmachina.org/files/publications/Probabilistic%20Causal%20Discovery%20for%20Systems%20Biology_prePrint.pdf
Projects
Papers and projects: 3 types
• Applied: Analyze an application (description,
prediction, causal analysis, decision, evaluation,
learning) using high-value statistical consulting
methods
• Research/develop software
– R packages, algorithms, CAT modules, etc.
• Research/review book or papers (3-5 articles)
– Explain a topic within statistical consulting
– Examples: Netica’s Bayesian inference algorithms,
multicriteria decision-making, machine learning
algorithms, etc.
Projects (cont.)
• A typical report is about 10-20 pages, 12-point font, 1.5 line spacing (typical, not required)
• Content matters; length does not
• Typical in-class presentation is 20-30 minutes
– Can run longer if needed
• Purposes:
1. Learn something interesting and useful;
2. Either explain/show what you learned, or show
how to use it in practice (or both)
Project proposals due March 17
• If you have not yet done so, please send me a succinct
description of what you want to do (and perhaps what
you hope to learn by doing it).
– Problem to be addressed
– Methods to be researched/applied
– Hoped-for results
• Due by end of day on Friday, March 17th (though
sooner is welcome)
• Key dates: April 14 for rough draft (or very good
outline)
• Start in-class presentations/discussions April 18
• Final project/paper due May 4 by 8:00 PM
Course schedule
• March 14: No class (work on project idea)
• March 17: Project/paper proposals due
• March 21: No class (Spring break)
• April 14: Draft of project/term paper due
• April 18, 25, May 2, (May 9): In-class presentations
• May 4: Final project/paper due by 8:00 PM
Prescriptive analytics (cont.)
Algorithms for optimizing actions
• Decision analysis framework: Choose act a from choice set A to maximize expected utility of consequence c, given a causal model c(a, s) with Pr(s), or Pr(c | a, s) with Pr(s) (see the formulation below)
– s = state = random variable = everything that affects c other than the choice of act a
• Influence diagram algorithms
– Learning ID structure from data
– Validating causal mechanisms
– Using for inference and recommendations
• Simulation-optimization
• Robust optimization
• Adaptive optimization/learning algorithms
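Stated compactly (a restatement of the framework at the top of this slide, in the same notation):

a* = argmax a∈A EU(a), where EU(a) = ∑s Pr(s)·u(c(a, s)) = ∑c Pr(c | a)·u(c)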
Prescriptive analytics methods
• Optimization
– Decision trees
– Stochastic dynamic programming, optimal control
– Gittins indices
– Reinforcement learning (RL) algorithms
• Influence diagram solution algorithms
• Simulation-optimization
• Adaptive learning and optimization
– EVOP (Evolutionary operations)
– Multi-armed bandit problems, UCB (upper confidence bound) strategies
Decision tree ingredients
• Three types of nodes
– Choice nodes (squares)
– Chance nodes (circles)
– Terminal nodes / value nodes
• Arcs show how decisions and chance events
can unfold over time
– Uncertainties are resolved as time passes and
choices are made
Solving decision trees
• “Backward induction”
• “Stochastic dynamic programming”
– “Average out and roll back” → implicitly, the tree determines Pr(c | a)
• Procedure (see the sketch below):
– Start at the tips of the tree and work backward
– Compute the expected value at each chance node (“averaging out”)
– Choose the maximum expected value at each choice node
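A minimal Python sketch of “average out and roll back” (illustrative only, not from the course materials; the payoffs reuse the develop example on the next slide, with $172,000 taken as the value of success):

# A node is one of:
#   ("choice",   {action_name: subtree, ...})
#   ("chance",   [(probability, subtree), ...])
#   ("terminal", payoff)

def rollback(node):
    """Return the expected value of a (sub)tree under optimal play."""
    kind, data = node
    if kind == "terminal":
        return data
    if kind == "chance":  # "averaging out"
        return sum(p * rollback(sub) for p, sub in data)
    return max(rollback(sub) for sub in data.values())  # optimal choice

# Numbers from the develop / don't-develop example:
tree = ("choice", {
    "develop": ("chance", [(0.70, ("terminal", 172_000)),
                           (0.30, ("terminal", -500_000))]),
    "don't develop": ("terminal", 0),
})
print(rollback(tree))  # 0: developing has expected value -29,600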
Obtaining Pr(s) from decision trees
http://www.eogogics.com/talkgogics/tutorials/decision-tree
Decision 1: Develop or Do Not Develop
Expected value = (Development Successful) + (Development Unsuccessful)
= (70% × $172,000) + (30% × (−$500,000))
= $120,400 + (−$150,000) = −$29,600
What happened to act a and state s?
What are the 3 possible acts in this tree?
(a) Don’t develop; (b) Develop, then rebuild if successful; (c) Develop, then new line if successful.
Optimize decisions!
Key points
• Solving decision trees (with decisions) requires
embedded optimization
– Make future decisions optimally, given the
information available when they are made
• Event trees = decision trees with no decisions
– Can be solved for outcome probabilities by forward Monte-Carlo simulation, or by multiplication and addition (see the sketch below)
• In general, sequential decision-making cannot be
modeled well using event trees.
– Must include (optimal choice | information)
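A minimal forward Monte-Carlo sketch for an event tree (illustrative; the 70/30 split is from the example above, while the 60/40 demand split is a hypothetical addition):

import random
from collections import Counter

# Event tree (chance nodes only): a node is either an outcome label
# or a list of (probability, subtree) branches.
event_tree = [(0.70, [(0.60, "success, high demand"),
                      (0.40, "success, low demand")]),
              (0.30, "failure")]

def simulate(node):
    """Walk forward, sampling one branch at each chance node."""
    while not isinstance(node, str):
        r, cum = random.random(), 0.0
        for p, sub in node:
            cum += p
            if r < cum:
                node = sub
                break
        else:
            node = sub  # guard against floating-point round-off
    return node

runs = 100_000
freq = Counter(simulate(event_tree) for _ in range(runs))
print({k: v / runs for k, v in freq.items()})
# ≈ {'success, high demand': 0.42, 'success, low demand': 0.28, 'failure': 0.30}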
What happened to state s?
http://www.eogogics.com/talkgogics/tutorials/decision-tree
What are the 4 possible states?
C1 can succeed or not; C2 can be high or low demand
Acts and states cause consequences
http://www.eogogics.com/talkgogics/tutorials/decision-tree
Key theoretical insight
• A complex decision model can be viewed as a (possibly
large) simple Pr(c | a) model.
– s = selection of branch at each chance node
– a = selection of branch at each choice node
– c = outcome at terminal node for (a, s)
– Pr(c | a) = ∑s Pr(c | a, s)·Pr(s)
• Other complex decision models can also be interpreted
as c(a, s), Pr(c | a, s), or Pr(c |a) models
– s = system state & information signal
– a = decision rule (information → act)
– c may include changes in s and in possible a.
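Spelled out (same notation; this just chains the two formulas above):

EU(a) = ∑c Pr(c | a)·u(c) = ∑c ∑s Pr(c | a, s)·Pr(s)·u(c)

so any model that supplies Pr(c | a, s) and Pr(s) reduces, for decision purposes, to a simple Pr(c | a) model.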
Real decision trees can quickly
become “bushy messes” (Raiffa,
1968) with many duplicated subtrees
[Figure: a BSE-testing decision tree. Decisions: Track Imports vs. Don’t Track Imports (D1); Test All vs. Test CA only vs. Repeat Test (D2). Chance outcomes: No BSE, BSE in CA, BSE in US from CA, BSE in US from US. Many subtrees (A, B, C) are duplicated throughout.]
Influence Diagrams help to avoid large trees
http://en.wikipedia.org/wiki/Decision_tree
Often much more compact than decision trees
Limitations of decision trees
• Combinatorial explosion
– Example: Searching for a prize in one of N boxes or locations gives a tree with N! = N(N – 1)…·2·1 possible inspection orders
• Infinite trees
– Continuous variables
– When to stop growing a tree?
• How to evaluate utilities and probabilities?
Optimization formulations of decision
problems
• Example: Prize is in location j with prior
probability p(j), j = 1, 2, …, N
• It costs c(j) to inspect location j
• What search strategy minimizes expected cost
of finding prize?
– What is a strategy? Order in which to inspect
– How many are there? N!
With two locations, 1 and 2
Strategy 1: Inspect 1, then 2 if needed:
– Expected cost: c1 + (1 – p1)c2 = c1 + c2 – p1c2
Strategy 2: Inspect 2, then 1 if needed:
– Expected cost: c2 + (1 – p2)c1 = c1 + c2 – p2c1
Strategy 1 has lower expected cost if:
• p1c2 > p2c1, or p1/c1 > p2/c2
• So, look first at location with highest success
probability per unit cost
With N locations
• Optimal decision rule: Always inspect next the
(as-yet uninspected) location with the greatest
success probability-to-cost ratio
– Example of an “index policy,” “Gittins index” (see the sketch below)
– If M players take turns, competing to find prize,
each should still use this rule.
• A decision table or tree can be unwieldy even
for such simple optimization problems
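A minimal Python sketch of the index policy, checked against brute force (the probabilities and costs are hypothetical):

from itertools import permutations

p = [0.5, 0.3, 0.2]  # hypothetical prior probabilities p(j)
c = [4.0, 1.0, 2.0]  # hypothetical inspection costs c(j)

def expected_cost(order):
    """Expected total cost: location j's cost is paid only if the
    prize was not found at an earlier inspection."""
    total, pr_not_found = 0.0, 1.0
    for j in order:
        total += pr_not_found * c[j]
        pr_not_found -= p[j]
    return total

# Index policy: inspect in decreasing order of p(j)/c(j).
index_order = sorted(range(len(p)), key=lambda j: p[j] / c[j], reverse=True)

# Brute force over all N! orders should agree with the index policy.
best = min(permutations(range(len(p))), key=expected_cost)
print(index_order, expected_cost(index_order))  # [1, 0, 2] 4.2
print(list(best), expected_cost(best))          # [1, 0, 2] 4.2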
Other optimization formulations
• max a∈A EU(a)
– Typically, a is a vector and A is the feasible set
– More generally, a is a strategy/policy/decision rule and A is the choice set of feasible strategies
– In the previous example, A = set of permutations
• max a∈A EU(a)
s.t. EU(a) = ∑c Pr(c | a)·u(c)
Pr(c | a) = ∑s Pr(c | a, s)·p(s)
g(a) ≤ 0 (defines the feasible set A)
Introduction to evaluation
analytics
Evaluation analytics:
How well are policies working?
• Algorithms for evaluating effects of actions,
events, conditions
– Intervention analysis/interrupted time series
• Key idea: Compare outcomes predicted with no action to outcomes observed with it (see the sketch below)
– Counterfactual causal analysis
– Google’s new CausalImpact algorithm
• Quasi-experimental designs and analysis
– Refute non-causal explanations for data
– Compare to control groups to estimate effects
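A minimal interrupted-time-series sketch in Python (synthetic data; this illustrates the counterfactual-comparison idea, not Google’s CausalImpact API):

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)
intervention = 60                        # time of the policy change
y = 10 + 0.5 * t + rng.normal(0, 1, 100)
y[intervention:] += 5.0                  # true (to-be-estimated) effect

# Fit a linear trend to the pre-intervention period only.
slope, intercept = np.polyfit(t[:intervention], y[:intervention], 1)

# Counterfactual: what the pre-period model predicts with no action.
counterfactual = intercept + slope * t[intervention:]
effect = y[intervention:] - counterfactual
print("estimated effect:", effect.mean())  # ≈ 5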
How did the U.K. National Institute for Health and Clinical Excellence (NICE) recommendation (March 2008) of complete cessation of antibiotic prophylaxis for prevention of infective endocarditis affect the incidence of infective endocarditis?
www.thelancet.com/journals/lancet/article/PIIS0140-6736(14)62007-9/fulltext?rss=yes
Different models yield different conclusions.
So, how to deal with model uncertainty?
Solution: Model ensembles, Bayesian Model Averaging (BMA)
www.thelancet.com/journals/lancet/article/PIIS0140-6736(14)62007-9/fulltext?rss=yes
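A minimal BMA sketch in Python, weighting candidate models by approximate posterior probabilities derived from BIC (a common approximation; the data and candidate models are synthetic):

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 50)

def fit(degree):
    """Least-squares polynomial fit; returns predictions and BIC."""
    pred = np.polyval(np.polyfit(x, y, degree), x)
    n, k = len(y), degree + 1
    bic = n * np.log(np.sum((y - pred) ** 2) / n) + k * np.log(n)
    return pred, bic

preds, bics = zip(*(fit(d) for d in (1, 2, 3)))
# Approximate posterior model weights: w_m ∝ exp(-BIC_m / 2).
w = np.exp(-(np.array(bics) - min(bics)) / 2)
w /= w.sum()
bma_pred = sum(wi * pi for wi, pi in zip(w, preds))  # ensemble prediction
print("model weights:", np.round(w, 3))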
Nonlinear models complicate
inference of intervention effects
Solution: Non-parametric models, gradient boosting
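A minimal sketch of the non-parametric approach using scikit-learn’s gradient boosting (synthetic data; the nonlinear baseline and the constant effect of 2.0 are assumptions for illustration):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 300)
treated = rng.random(300) < 0.5
# Nonlinear baseline plus a constant treatment effect of 2.0.
y = np.sin(3 * x) + 2.0 * treated + rng.normal(0, 0.2, 300)

model = GradientBoostingRegressor().fit(np.column_stack([x, treated]), y)

# Estimated effect: average predicted treated-vs-untreated difference.
on = np.column_stack([x, np.ones(300)])
off = np.column_stack([x, np.zeros(300)])
print((model.predict(on) - model.predict(off)).mean())  # ≈ 2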
Quasi-experiments: Refuting non-causal
explanations with control groups
Example:
Do delinquency
interventions work?
http://www.slideshare.net/horatjitra/research-design-and-validity
Algorithms for evaluating effects of
combinations of factors
• Classification trees
– Boosted trees, Random Forest, MARS (see the sketch below)
• Bayesian Network algorithms
– Discovery
• Conditional independence tests
– Validation
– Inference and explanation
• Response surface algorithms
– Adaptive learning, design of experiments
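A minimal sketch of one item on this list: a random forest surfacing a combination of factors that drives an outcome (synthetic data; the AND interaction is an assumption for illustration):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(500, 4))  # four binary factors
# Outcome driven by the combination of factors 0 AND 1, plus noise.
y = ((X[:, 0] & X[:, 1]) | (rng.random(500) < 0.1)).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(np.round(rf.feature_importances_, 3))  # factors 0 and 1 dominate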
Learning analytics
• Learn to predict better
– Create ensemble of models, algorithms
• Use multiple machine learning algorithms
– Logistic regression, Random Forest, SVM, ANN, deep learning, gradient boosting, KNN, lasso,
etc.
– “Stack” models (hybridize multiple predictions; see the sketch below)
• Cross-validation assesses model performance
– Meta-learner combines performance-weighted predictors to produce an improved predictor
• Theoretical guarantees, practical successes (Kaggle competitions)
• Learn to decide better
– Low-regret learning of decision rules
• Theoretical guarantees (MDPs); practical performance
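A minimal stacking sketch using scikit-learn’s StackingClassifier, which cross-validates the base learners and fits a meta-learner on their out-of-fold predictions (synthetic data):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # meta-learner
    cv=5,  # base predictions come from 5-fold cross-validation
)
print(cross_val_score(stack, X, y, cv=5).mean())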
http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html
Collaborative risk analytics: Multiple
interacting learning agents
http://groups.inf.ed.ac.uk/agents/index.php/Main/Projects
Collaborative risk analytics
• Global performance metrics
• Local information, control,
tasks, priorities, rewards
– Hierarchical distributed control
– Collaborative sensing, filtering,
deliberation, and decision-control
networks of agents
• Mixed human and machine agents
• Autonomous agents vs. intelligent
assistants
http://www.cities.io/news/page/3/
Collaborative risk analytics: Games as
labs for distributed AI
• Local information, control,
tasks, priorities
– Hierarchical distributed control
– Collaborative sensing,
deliberation, control networks
• From decentralized agents to
effective risk analytics teams
and HCI support
– Trust, reputation, performance
– Sharing information, attention,
control, evaluation, learning
http://people.idsia.ch/~juergen/learningrobots.html
Risk analytics toolkit: Summary
1. Descriptive analytics
– Change-point analysis, likelihood ratio: CPA
– Machine learning, response surfaces: ML (LR, RF, GBM, SVM, ANN, KNN, etc.)
2. Predictive analytics
– Bayesian networks, dynamic BN: BN, DBN
– Bayesian model averaging: BMA, ML
3. Causal analytics & principles
– Causal BNs, systems dynamics (SD): DAGs, SD simulation
– Time series causation
4. Prescriptive analytics: IDs, simulation-optimization, robust optimization
5. Evaluation analytics: QE, credit assignment, attribution
6. Learning analytics
– Machine learning, superlearning: ML
– Low-regret learning of decision rules: collaborative learning
Applied risk analytics toolkit: Toward
more practical analytics
Reorientation: From solving well-posed problems to
discovering how to act more effectively
1. Descriptive analytics: What’s happening?
2. Predictive analytics: What’s coming next?
3. Causal analytics: What can we do about it?
4. Prescriptive analytics: What should we do?
5. Evaluation analytics: How well is it working?
6. Learning analytics: How to do better?
7. Collaboration: How to do better together?