Statistical Consulting - Cox Associates Consulting

Math 6330: Statistical Consulting
Class 7
Tony Cox
[email protected]
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/
Agenda
• Charniak Family Out network
• Causal analytics (cont.)
• Prescriptive analytics
– Simulation-optimization
– Newsvendor problem and applications
– Decision rules, optimal statistical decisions
• Decision psychology
– Heuristics and biases
• Probabilistic expert systems (Netica)
Recommended readings on causal
analytics
• Charniak (1991), pages 50-53,
– Build the network in Figure 2 www.aaai.org/ojs/index.php/aimagazine/article/viewFile/918/836
• Pearl (2009) http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
• Methods to Accelerate the Learning of Bayesian Network
Structures, Daly and Shen (2007)
https://pdfs.semanticscholar.org/e7d3/029e84a1775bb12e7e67541beaf2367f7a88.pdf
• Distinguishing cause from effect using observational data
(Mooij et al., 2016), www.jmlr.org/papers/volume17/14-518/14-518.pdf
• Probabilistic computational causal discovery for systems
biology (Lagani et al., 2016)
www.mensxmachina.org/files/publications/Probabilistic%20Causal%20Discovery%20for%20Systems%20Biology_prePrint.pdf
Family out network
[Netica screenshot: prior marginal probabilities (%) for each node]
family_out: Yes 15.0, No 85.0
bowel_problem: Yes 1.00, No 99.0
light_on: Yes 13.3, No 86.8
dog_out: Yes 16.9, No 83.1
hear_bark: Yes 12.6, No 87.4
Family out network: Inference
[Netica screenshot: posterior probabilities (%) after entering evidence light_on = Yes, hear_bark = No]
family_out: Yes 44.8, No 55.2
bowel_problem: Yes 0.56, No 99.4
light_on: Yes 100, No 0 (evidence)
dog_out: Yes 33.5, No 66.5
hear_bark: Yes 0, No 100 (evidence)
Family out network: Inference
[Netica screenshot: same posteriors as the previous slide (%)]
family_out: Yes 44.8, No 55.2
bowel_problem: Yes 0.56, No 99.4
light_on: Yes 100, No 0 (evidence)
dog_out: Yes 33.5, No 66.5
hear_bark: Yes 0, No 100 (evidence)
Information flows both backward, up arrows (via Bayes' rule), and down arrows
Multiple sources of evidence are fused at the “dog_out” node (via conditioning)
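The inference shown on these slides can be reproduced by brute-force enumeration. The sketch below uses the conditional probability tables as commonly quoted from Charniak (1991), Figure 2; these are not necessarily the exact tables behind the Netica screenshots here, so the posteriors need not match the slide numbers.

```python
from itertools import product

# CPTs for the family-out network (values as quoted from Charniak 1991, Fig. 2;
# the Netica screenshots on the slides may use different tables).
P_fo = 0.15                      # P(family_out)
P_bp = 0.01                      # P(bowel_problem)
P_lo = {True: 0.6, False: 0.05}  # P(light_on | family_out)
P_do = {(True, True): 0.99, (True, False): 0.90,   # P(dog_out | fo, bp)
        (False, True): 0.97, (False, False): 0.30}
P_hb = {True: 0.7, False: 0.01}  # P(hear_bark | dog_out)

def joint(fo, bp, lo, do, hb):
    """Joint probability of one full assignment, via the chain rule."""
    p = (P_fo if fo else 1 - P_fo) * (P_bp if bp else 1 - P_bp)
    p *= P_lo[fo] if lo else 1 - P_lo[fo]
    p *= P_do[fo, bp] if do else 1 - P_do[fo, bp]
    p *= P_hb[do] if hb else 1 - P_hb[do]
    return p

def posterior_family_out(lo, hb):
    """P(family_out | light_on=lo, hear_bark=hb), by enumerating hidden nodes."""
    num = den = 0.0
    for fo, bp, do in product([True, False], repeat=3):
        p = joint(fo, bp, lo, do, hb)
        den += p
        if fo:
            num += p
    return num / den

# Same evidence pattern as the slide: light on, no barking heard
print(posterior_family_out(lo=True, hb=False))
```

Enumeration is exponential in the number of hidden nodes; Netica uses junction-tree algorithms to do the same conditioning efficiently, but for five binary nodes enumeration is a useful sanity check.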
Pearl, 2009, Sections 1-2
Four under-used ideas for causal analysis:
1. Counterfactual analysis
2. Nonparametric structural equations, y = f(x, e)
3. Graphical models (DAGs)
4. Symbiosis between counterfactual and graphical
methods: y = f(do(x), e)
Challenges: Untested assumptions, new notation
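Pearl's ideas 2 and 4 can be made concrete with a toy simulation (all structural equations and numbers below are invented for illustration). Conditioning on an observed x is not the same as setting x by intervention: do(x) severs x's dependence on its own causes, while conditioning does not.

```python
import random

random.seed(0)

def draw(do_x=None):
    """One draw from a toy structural model y = f(x, e) with confounder z.
    Illustrative equations:
        z = Uniform(0, 1)
        x = z + noise          (replaced by do_x under an intervention)
        y = 2*x - 3*z + noise
    """
    z = random.random()
    e_x, e_y = random.gauss(0, 0.1), random.gauss(0, 0.1)
    x = do_x if do_x is not None else z + e_x
    y = 2 * x - 3 * z + e_y
    return x, y

# Observational E[y | x near 0.9]: conditioning drags z up with x
obs = [y for x, y in (draw() for _ in range(200000)) if abs(x - 0.9) < 0.05]
# Interventional E[y | do(x = 0.9)]: z keeps its own distribution
do_ = [y for _, y in (draw(do_x=0.9) for _ in range(200000))]

print(sum(obs) / len(obs), sum(do_) / len(do_))
```

The two averages differ sharply (here the observational estimate is negative while the interventional one is positive), which is exactly the gap between seeing and doing that the do() notation formalizes.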
Project ideas
Goals for student projects
• Extract good problems from available knowledge and
data
– “Good” = high value of analysis = large improvement in
decisions, results, etc.
• Apply high-value techniques to produce valuable
answers and insights
– Unexpected directions are ok!
• Present the results so that the potential value is
actually delivered
• If possible, document impact and next steps
• Or… do algorithm research, software project, suggest
improved analytics for –omics data analysis, etc.
General ideas for projects
1. Read and report on a book or a collection of about 3-5 technical papers in an area where statistical consulting is extensively used
– Marketing, pricing, A/B testing; Location, operations, inventory and logistics; Bidding
– Policy analysis
– Genomic, other –omics data analysis and predictive analytics
2. Research and report on or extend some of the algorithms behind Netica, simulation-optimization, or other analytics software
3. Create a piece of software for advanced analytics
4. Apply statistical consulting methods and software to a real-world problem from work or life
5. Propose your own novel research or application
Some examples of applied statistical
consulting project ideas
1. Optimizing buyer discounts given history
2. Excel simulation-optimization (SO) solver for
multi-item inventory control decisions
3. SO for call center intent-based routing to reduce
call handling times
4. NFL decision analytics; Improving 4th down
decisions
5. Providing more valuable information to help
change customer/patient behaviors
Components of a successful project
1. Problem statement and motivation
2. Data
3. Analysis plan/narrative
4. Tools and software
5. Results: reports and displays
6. Presentation: What did we learn?
7. Evaluation: What was the impact?
8. Proposed next steps
Causal analytics (cont.)
How to get from data to causal
predictions… objectively?
• Causal prediction
– Deterministic causal prediction: Doing X will make
Y happen to people of type Z
– Probabilistic causal prediction: Doing X will change
conditional probability distribution of Y, given
covariates Z
• Goal: Manipulative causation (vs. associational,
counterfactual, predictive, computational, etc.)
• Data: Observed (X, Y, Z) values
• Challenge: How will changing X change Y?
Ambiguous associations undermine
causal predictions from data
• How would cutting exposure concentration C
in half affect future response rate R?
Community   Concentration, C   Income, I   Response rate, R
A           4                  100         8
B           8                  60          16
C           12                 20          24

Model 1: R = 2C
If this is a valid structural equation, then ∆R = 2∆C
The corresponding DAG is: C → R
Ambiguous associations undermine
causal predictions from data
• How would cutting exposure concentration C
in half affect future response rate R?
– No way to determine from historical data
Community   Concentration, C   Income, I   Mortality rate, R
A           4                  100         8
B           8                  60          16
C           12                 20          24

Model 1: R = 2C, (I = 140 – 10C), DAG: I ← C → R
Model 2: R = 35 – 0.5C – 0.25I, DAG: C → R ← I
Model 3: R = 28 – 0.2I, (C = 14 – 0.1I), DAG: C ← I → R
So, decreasing C could decrease R, increase it, or leave it unchanged.
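This ambiguity can be checked numerically. The sketch below (plain Python, no fitting, just the three equations from the slide) verifies that all three models reproduce the observed table exactly yet disagree about the intervention do(C := C/2):

```python
# The three observed (C, I, R) rows from the slide
data = [(4, 100, 8), (8, 60, 16), (12, 20, 24)]

# Three structural models, all consistent with the observed data
model1 = lambda C, I: 2 * C                    # C -> R (I irrelevant to R)
model2 = lambda C, I: 35 - 0.5 * C - 0.25 * I  # C -> R <- I
model3 = lambda C, I: 28 - 0.2 * I             # C <- I -> R (C irrelevant to R)

# 1. All three models reproduce every observed row exactly
for C, I, R in data:
    assert model1(C, I) == model2(C, I) == model3(C, I) == R

# 2. But they disagree about do(C := C/2). Take community B (C=8, I=60, R=16);
#    intervening on C leaves I at its observed value in every model.
C, I = 8, 60
print(model1(C / 2, I))  # 8.0  -> response halves
print(model2(C / 2, I))  # 18.0 -> response increases
print(model3(C / 2, I))  # 16.0 -> response unchanged
```

Identical fit to the data, three different causal predictions: no amount of curve-fitting to this table can distinguish the models.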
Implications
• Ambiguous associations obscure objective functions and make sound modeling and inference more difficult
– Conclusions are not purely data-driven
• hypothesis → data → conclusion
– Instead, they conflate data and modeling assumptions
• hypothesis/model/assumptions → conclusions ← data
– This undermines sound (objective, trustworthy, well-justified, independently repeatable, verifiable) inference
• Inference is undermined when conclusions rest on untested assumptions
– Ambiguous associations are common in practice
• Wanted: A way to reach valid, robust (model-independent) conclusions from data that can be fully specified before seeing the data.
Causal Analytics Toolkit (CAT)
software for advanced analytics
CAT uses data in Excel
• Load data in Excel, click
Excel to R to send it to R
– Los Angeles air basin
– 1461 days, 2007-2010
(Lopiano et al., 2015, thanks
to Stan Young for data)
– PM2.5 data from CARB
– Elderly mortality
(“AllCause75”) from CA
Department of Health
– Daily min and max temps &
max relative humidity from
ORNL and EPA
• Risk question: Does PM2.5
exposure increase elderly
mortality risk? If so, how
much?
Using CAT to examine associations:
Plotting the data
1. Send data from Excel to R
– Highlight columns
– Click on “Excel to R”
2. Select columns to analyze
– Click on column headers
– Ctrl-click toggles selection
3. Click on Plots to view frequency distributions, scatter plots, correlation, smooth regression curves
– PM2.5 is slightly negatively associated with mortality
Using CAT to examine associations:
Plotting more data
1. Send data from Excel to R
– Highlight columns
– Click on “Excel to R”
2. Select columns
– Click on column headers
– Ctrl-click toggles selection
3. Click on Plots to view frequency distributions, scatter plots, correlations, smooth regression curves
– Temperature is positively associated with PM2.5
– Temperature is negatively associated with mortality
Basic ideas of Causal Analytics
• Use a network to show which variables provide
direct information about each other
– Arrows between variables show they are
informative about each other, even given all
other variables
– Learn network structure directly from data
– Carefully check conclusions
• In non-parametric analyses we trust!
• Do power analyses using simulation
– Interpret neighbors in network as potential
direct causes (satisfying necessary condition)
• Use sensitivity (partial dependence) graphs
(based on averaging over many trees) to
quantify relation between independent and
dependent variables.
Run BN structure discovery algorithms
• Click Bayesian Network to generate DAG structure.
– Only variables connected
to response variable by
an arrow are identified as
potential direct causes
– Multiple pathways
between two variables
reveal potential direct
and indirect effects
– Example: Direct and
indirect paths between
tmax and AllCause75.
[CAT output: Bayesian Network diagram from CAT_bnLearn(year, month, day, AllCause75, PM2.5, tmin, tmax, MAXRH). An arrow between two variables shows that they are informative about each other. Network discovered by bnlearn.]
Confirm or refute/refine BN structure
with additional non-parametric tests
• Conditioning on very
different values of a direct
cause should cause the
distribution of the response
variable to change
• If the response variable
does not change, then any
association between them
may be due to indirect
pathways (e.g.,
confounding)
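The conditioning test described above can be sketched on synthetic data (all numbers invented for illustration). Here I confounds C and R, and there is no direct C → R link, so the apparent C–R effect should vanish within narrow strata of I:

```python
import random

random.seed(1)

# Synthetic data: I confounds C and R; C has NO direct effect on R.
rows = []
for _ in range(50000):
    I = random.gauss(0, 1)
    C = I + random.gauss(0, 1)
    R = 2 * I + random.gauss(0, 1)
    rows.append((I, C, R))

def mean_R(rows, c_low):
    """Mean response among rows with C below (c_low=True) or above the median of C."""
    cs = sorted(c for _, c, _ in rows)
    med = cs[len(cs) // 2]
    sel = [r for _, c, r in rows if (c < med) == c_low]
    return sum(sel) / len(sel)

# Unconditionally, low-C and high-C groups differ a lot in R...
print(mean_R(rows, True), mean_R(rows, False))

# ...but within a narrow stratum of the confounder I they do not,
# which is evidence AGAINST a direct C -> R arrow.
stratum = [(i, c, r) for i, c, r in rows if abs(i) < 0.1]
print(mean_R(stratum, True), mean_R(stratum, False))
```

If R's distribution had still shifted with C inside the stratum, that would have been consistent with (though not proof of) a direct causal arrow.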
Discovering DAG structure resolves
ambiguous associations
• How would cutting PM2.5 pollution in half
affect future elderly mortalities per year?
– No way to determine from association data
Community   PM2.5 in 1980 (µg/m³)   Income   Elderly mortality rate in 1980
A           4                       100      8
B           8                       60       16
C           12                      20       24

Model 1: Income ← PM2.5 → Mortality: mortality would be halved
Model 2: PM2.5 → Mortality ← Income: mortality would increase
Model 3: PM2.5 ← Income → Mortality: mortality would not change
Quantify direct causal relations
• Procedure: To quantify direct (potentially causal) relations after controlling for other variables and indirect pathways, estimate the partial dependence graph for response R vs. (potential) cause C.
• Rationale: Screening and BN structure discovery have shown that the relation might be causal. Partial dependence estimates the size of the potential effect.
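The partial dependence computation itself is simple to sketch. CAT fits a Random Forest; here, to keep the sketch dependency-free, a hand-written stand-in plays the role of the fitted model f̂(C, I) (its coefficients are borrowed from Model 2 of the earlier slide, purely for illustration). The averaging logic is the same either way:

```python
def f_hat(C, I):
    """Stand-in for a fitted response surface (illustrative coefficients only).
    In CAT this would be a Random Forest's prediction function."""
    return 35 - 0.5 * C - 0.25 * I

# Observed covariate values (the three communities from the earlier slide)
observed_I = [100, 60, 20]

def partial_dependence(f, c_values, covariates):
    """PD(c) = average of f(c, I_i) over the observed covariate values I_i.
    Varying c while holding the covariate distribution fixed isolates the
    direct C -> R relation implied by the fitted model."""
    return [sum(f(c, I) for I in covariates) / len(covariates) for c in c_values]

grid = [4, 8, 12]
print(partial_dependence(f_hat, grid, observed_I))  # [18.0, 16.0, 14.0]
```

The resulting curve, 20 − 0.5C on this grid, slopes downward even though C and R are positively associated in the raw table, because the averaging removes the contribution of I.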
Validate quantified C-R relations in
hold-out sample
• Current CAT uses bootstrap and cross-validation approaches for Random Forest ensembles
• Cross-validation and hold-out sample validation reports for regression and other analyses
Summary of CAT’s causal analytics
• Screen for total, partial, and temporal
associations and information relations
• Learn BN network structure from data
• Estimate quantitative dependence relations
among neighboring variables
– Use partial dependence plots (Random Forest
ensemble of non-parametric trees)
– Use trees to quantify multivariate dependencies on
multiple neighbors simultaneously
• Validate on hold-out samples
Systems dynamics simulations
http://www.systemdynamics.org/conferences/2003/proceed/PAPERS/417.pdf
Systems dynamics model structure reveals
key loops to break
https://wdsi.wordpress.com/2014/03/30/analysing-terrorism-from-a-systems-thinking-perspective/
Introduction to prescriptive
analytics (decision analytics)
Concurrent choice experiment
Your client must pick one of A or B, and pick one of C or D
• A: Gain $240 for sure
• B: 25% chance to gain $1000, else gain 0
• C: Sure loss of $750
• D: 75% chance to lose $1000, else lose 0

So, client’s four possible choices are
• A & C
• A & D
• B & C
• B & D

Which pair would you recommend?
Usual Concurrent Choices
Pick one of A or B, and pick one of C or D
• A: Gain $240 for sure [84%] (Tversky & Kahneman, ‘81)
• B: 25% chance to gain $1000, else gain 0 [16%]
• C: Sure loss of $750 [13%]
• D: 75% chance to lose $1000, else lose 0 [87%]

But the usual choice (A & D) is objectively worse than the unusual choice (B & C). Why?
• (A and D) is equivalent to a 75% chance to lose $760 (= $240 − $1000), else gain $240.
• (B and C) is equivalent to a 75% chance to lose $750, else gain $250.
Stochastic dominance
• (A and D) gives 75% chance to lose $760,
25% chance to gain $240.
• (B and C) gives 75% chance to lose $750,
25% chance to gain $250.
• (B & C) stochastically dominates (A & D)
– First-order stochastic dominance (FSD):
Probabilities of preferred outcomes are
greater for (B & C) than for (A & D)
• Decision analysis (DA) identifies undominated choices
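The dominance claim can be verified mechanically: build each portfolio's outcome distribution and compare cumulative distribution functions. A minimal sketch:

```python
from itertools import product

def combine(l1, l2):
    """Distribution of the sum of two independent lotteries.
    A lottery is a list of (probability, payoff) pairs."""
    out = {}
    for (p1, x1), (p2, x2) in product(l1, l2):
        out[x1 + x2] = out.get(x1 + x2, 0) + p1 * p2
    return sorted(out.items())  # [(payoff, prob)] in ascending payoff order

A = [(1.0, 240)]                   # gain $240 for sure
B = [(0.25, 1000), (0.75, 0)]      # 25% chance of $1000
C = [(1.0, -750)]                  # sure loss of $750
D = [(0.75, -1000), (0.25, 0)]     # 75% chance to lose $1000

AD = combine(A, D)   # [(-760, 0.75), (240, 0.25)]
BC = combine(B, C)   # [(-750, 0.75), (250, 0.25)]

def cdf(lottery, t):
    """P(payoff <= t)."""
    return sum(p for x, p in lottery if x <= t)

# First-order stochastic dominance: BC's CDF never exceeds AD's CDF,
# i.e., BC gives at least as high a probability of exceeding every threshold.
points = sorted({x for x, _ in AD + BC})
assert all(cdf(BC, t) <= cdf(AD, t) for t in points)
print("B&C first-order stochastically dominates A&D")
```

Any decision maker who prefers more money to less should therefore pick (B & C), regardless of risk attitude, which is what makes the 84%/87% modal choice pattern a genuine error rather than a taste.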
Conclusions on concurrent choice
• Humans tend to consider each choice in isolation
rather than looking at portfolios of outcomes and
their probabilities for different combinations of
choices
• This leads us to make predictably sub-optimal
choices. (So do many other psychological barriers to
rational decision-making.)
• Statistical consulting (and decision analysis) can
overcome such sub-optimal decision-making.
Prescriptive analytics
• What to do next?
– Uncertain models for predicting probable effects
of different actions
– Value-of-information
– Exploration-exploitation trade-off
• Decision rules
• Trial design
– Sequential multiple assignment randomized trials
Optimize changes in controllable inputs,
given Pr(c | x) and u(c)
https://wdsi.wordpress.com/2014/03/30/analysing-terrorism-from-a-systems-thinking-perspective/
http://www.tandfonline.com/doi/abs/10.1198/jasa.2009.0155?journalCode=uasa20
Traditional prescriptive analytics
• Standard decision problem: Choose x ∈ X to maximize objective function f(x)
– x = decision variable (control vector, trajectory, policy mapping information to action, etc.)
• May be distributed over multiple collaborating agents
– X = choice set
• Possibly specified by constraints, hard to search
• Typically, f(x) = EU(x) = expected utility of decision x
– Possibly evaluated via Monte Carlo
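The newsvendor problem from the agenda gives a minimal simulation-optimization sketch of this setup: EU(q) is estimated by Monte Carlo and maximized by searching the choice set. All prices and the demand distribution below are invented for illustration. Each candidate order quantity is evaluated on the same simulated demands (common random numbers), a standard trick that makes comparisons across q much less noisy.

```python
import random

random.seed(2)

price, cost = 5.0, 3.0   # unit sell price and unit cost (illustrative)

def profit(q, demand):
    """Newsvendor profit: sell min(q, demand) units, pay for all q ordered."""
    return price * min(q, demand) - cost * q

# Common random numbers: one fixed demand sample reused for every candidate q
demands = [random.uniform(0, 100) for _ in range(20000)]

def expected_profit(q):
    """Monte Carlo estimate of EU(q) with risk-neutral (linear) utility."""
    return sum(profit(q, d) for d in demands) / len(demands)

# Simulation-optimization: grid search over the choice set X = {0, 5, ..., 100}
best_q = max(range(0, 101, 5), key=expected_profit)
print(best_q)  # analytic critical-fractile optimum: (5-3)/5 * 100 = 40 for Uniform(0,100)
```

For Uniform(0, 100) demand the critical-fractile solution is q* = (price − cost)/price × 100 = 40, so the simulation estimate should land at or next to 40 on this grid.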
In slightly more detail…
• Pr(c | x) = probability of receiving consequence c if we do x
– c is a member of consequence set C
– Pr(c | x) is a risk model
• Fine print: Pr(c | x) is not really a conditional probability, because x is an act, not an event
• u(c) = utility of consequence c
• EU(x) = Σ_{c∈C} u(c)·Pr(c | x)
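The EU formula applies directly to gambles A and B from the concurrent-choice slides. The exponential utility below is just one illustrative risk-averse choice, not anything prescribed by the slides:

```python
import math

def expected_utility(lottery, u):
    """EU = sum over consequences c of u(c) * Pr(c | x).
    A lottery is a list of (probability, consequence) pairs."""
    return sum(p * u(c) for p, c in lottery)

A = [(1.0, 240)]                 # gain $240 for sure
B = [(0.25, 1000), (0.75, 0)]    # 25% chance of $1000, else 0

linear = lambda c: c                          # risk-neutral utility
concave = lambda c: 1 - math.exp(-c / 200)    # one illustrative risk-averse utility

print(expected_utility(A, linear), expected_utility(B, linear))    # 240.0 250.0
print(expected_utility(A, concave) > expected_utility(B, concave)) # True
```

Under risk neutrality B is better (EU 250 vs 240), but a sufficiently risk-averse utility flips the ranking toward the sure $240, which shows why EU(x) depends on u(c) and not just on expected monetary value.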
Challenges
• EU(x) may be uncertain or unknown or may be
different for different agents
– Unknown consequence probabilities, Pr(c | x)
• How to estimate from data?
– Uncertain consequence set, C (“Black swans”)
– Uncertain choice set, X
• What constraints hold? How hard or soft?
– Uncertain values, u(c)
• Problems with multiple interacting agents
• Even for known EU(x), maximization problem may
be hard (or impossible)
Causal analysis challenge
• What to do next when consequences of
alternative decisions are uncertain?
– Known probabilities → EU theory
• Known risk model: Pr(c | x)
• Known probabilities of different risk models
– Unknown probabilities → ?
Decisions with unknown models
• Learn causal model(s) from data/experience, then
use them to optimize decisions
– Design of experiments (DOE)
– Randomized control trials (RCTs) and practical PCTs
– Learn from natural experiments, quasi-experiments
• Learn decision rules from data
– Adaptive learning: trial and error, low-regret learning
– Highly effective in simple (e.g., MDP) environments
• Robust optimization
– Find decisions that perform well no matter how uncertainties are
resolved
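The "learn decision rules from data" bullet can be sketched with a minimal ε-greedy bandit, one of the simplest adaptive (trial-and-error) learning schemes. The two success probabilities are made up and unknown to the agent, which must learn which action pays off without ever being given Pr(c | x):

```python
import random

random.seed(3)

true_p = [0.3, 0.6]   # success probabilities, hidden from the agent (illustrative)
counts = [0, 0]       # times each action was tried
wins = [0, 0]         # successes per action
eps = 0.1             # exploration rate

for t in range(5000):
    if random.random() < eps or 0 in counts:
        a = random.randrange(2)                               # explore
    else:
        a = max(range(2), key=lambda i: wins[i] / counts[i])  # exploit best estimate
    reward = 1 if random.random() < true_p[a] else 0
    counts[a] += 1
    wins[a] += reward

print(counts)  # the better action (index 1) should get the large majority of pulls
```

Keeping ε > 0 forever guarantees continued exploration, so the empirical estimates converge and the per-step regret stays small; schedules that shrink ε over time (or upper-confidence-bound rules) achieve the low-regret guarantees the slide alludes to.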
Algorithms for optimizing actions
• Influence diagram algorithms
– Learning from data
– Validating causal mechanisms
– Using for inference and recommendations
• Simulation-optimization
• Robust optimization
• Adaptive optimization/learning algorithms