
Math 6330: Statistical Consulting
Class 11
Tony Cox
[email protected]
University of Colorado at Denver
Course web site: http://cox-associates.com/6330/
Course schedule
• April 14: Draft of project/term paper due
• April 18, 25, May 2: In-class presentations
• May 2: Last class
• May 4: Final project/paper due by 8:00 PM
MAB Thompson sampling (cont.)
Thompson sampling and adaptive Bayesian control: Bernoulli trials
• Basic idea: Choose each of the k actions according to the probability that it is best
• Estimate the probability via Bayes’ rule
– It is the mean of the posterior distribution
– Use beta conjugate-prior updating for the “Bernoulli bandit” (0-1 reward, fail/succeed)
– Sample from the posterior for each arm, 1…k; choose the one with the highest sample value. Update & repeat (sketch below).
(Figure: beta-posterior updating after each success (S) or failure (F).)
Agrawal and Goyal, 2012: http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf
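A minimal sketch of this Bernoulli-bandit loop in Python (the arm success probabilities and horizon are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
true_p = [0.3, 0.5, 0.7]     # assumed arm success probabilities (simulation only)
k = len(true_p)
successes = np.ones(k)       # Beta(1, 1) uniform prior: "alpha" counts
failures = np.ones(k)        # "beta" counts

for t in range(10_000):
    theta = rng.beta(successes, failures)    # one draw from each arm's posterior
    arm = int(np.argmax(theta))              # play the arm with the highest draw
    reward = rng.random() < true_p[arm]      # Bernoulli trial: S = 1, F = 0
    successes[arm] += reward                 # conjugate beta update
    failures[arm] += 1 - reward

print("posterior means:", successes / (successes + failures))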
Thompson sampling: General stochastic (random) rewards
• Second idea: Generalize to an arbitrary reward distribution (normalized to the interval [0, 1]) by considering a trial a “success” with probability equal to its reward (sketch below)
Agrawal and Goyal, 2012: http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf
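This generalization needs only one extra step in the update; a hedged sketch (the observed reward value here is a placeholder):

import numpy as np

rng = np.random.default_rng(0)
successes = np.ones(3)    # beta posterior counts for k = 3 arms, uniform prior
failures = np.ones(3)

def update(arm, r):
    """Score a reward r in [0, 1] as a Bernoulli success with probability r,
    then apply the usual conjugate beta update."""
    s = rng.random() < r
    successes[arm] += s
    failures[arm] += 1 - s

update(arm=1, r=0.8)   # a normalized reward of 0.8 counts as a success 80% of the time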
Thompson sampling with complex online actions
• Main idea: Embed simulation-optimization in the Thompson sampling loop
– Θ = state space, S
– Sample the states
• Applications: Job scheduling (assigning jobs to machines); web advertising with reward depending on the sets of ads shown
– Y = observation, h = reward, X = random variable depending on θ
• Updating posteriors can be done efficiently using a sampling-based approach (particle filtering); see the sketch below
Gopalan et al., 2014: http://jmlr.org/proceedings/papers/v32/gopalan14.pdf
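The particle-filtering idea can be illustrated generically; this is a minimal sequential importance resampling step for one arm's success probability, not the specific algorithm of Gopalan et al.:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
particles = rng.uniform(0.0, 1.0, size=n)   # particle approximation of the posterior,
weights = np.full(n, 1.0 / n)               # starting from a uniform prior

def particle_update(reward):
    """Reweight particles by the Bernoulli likelihood of an observed 0/1 reward,
    then resample to keep the approximation well conditioned."""
    global particles, weights
    likelihood = particles if reward else 1.0 - particles
    weights = weights * likelihood
    weights = weights / weights.sum()
    idx = rng.choice(n, size=n, p=weights)   # resampling step
    particles = particles[idx]
    weights = np.full(n, 1.0 / n)

particle_update(1)                     # observe a success on this arm
theta_draw = rng.choice(particles)     # Thompson draw: one sample from the posterior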
Comparing methods
• In simulation experiments, Thompson sampling works well with batch updating, even with slowly or occasionally changing rewards and other realistic complexities.
• Beats UCB1 in many but not all comparisons.
• More practical than UCB1 for batch updating because it keeps experimenting (trying actions with some randomness) between updates; a sketch of UCB1 follows for comparison.
http://engineering.richrelevance.com/recommendations-thompson-sampling/
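For reference, a minimal sketch of the UCB1 index rule being compared against (the arm probabilities are assumed for illustration):

import numpy as np

rng = np.random.default_rng(0)
true_p = [0.3, 0.5, 0.7]    # assumed arm success probabilities (simulation only)
k = len(true_p)
counts = np.zeros(k)
means = np.zeros(k)

for t in range(1, 10_001):
    if t <= k:
        arm = t - 1     # initialization: play each arm once
    else:
        ucb = means + np.sqrt(2 * np.log(t) / counts)   # optimism bonus
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < true_p[arm])
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]   # running mean update

print("plays per arm:", counts)

Unlike Thompson sampling, UCB1's choice is deterministic given the counts, which is why it explores poorly between batch updates.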
MAB variations
• Contextual bandits
– See a signal before acting
– Constrained contextual bandits: actions are constrained
• Adversarial bandits
– Adaptive adversaries
– Bubeck and Slivkins, 2012: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/COLT12_BS.pdf
• Restless bandits: probabilities change over time
• Gittins index: maximizes expected discounted reward, but is not easy to compute
• Correlated bandits
Wrap-up on MAB problems
• Adaptive Bayesian learning works well in simple environments, including many of practical interest
• The resulting rules are *much* simpler to implement than previous methods (e.g., Gittins index policies)
• Sampling-based approaches (Thompson sampling, particle filtering, etc.) promote computationally practical “online learning”
Wrap-up on adaptive learning
• No need for a causal model
– Learn act-consequence probabilities and optimal decision rules directly
• Assumes a stationary (or slowly changing) decision environment, a known choice set, and immediate feedback (reward) following each action
• Works very well when these assumptions are met: low-regret learning is possible
Optimal stopping
Optimal stopping decision problems
• Suppose that a decision-maker (d.m.) faces a random sequence of opportunities
• How long should the d.m. wait for the best one?
• When should the d.m. stop and commit to a final choice?
• Examples: selling a house, hiring a new employee, accepting a job offer, replacing a component, shuttering an aging facility, taking a parking spot, etc.
• Other optimal stopping problems: least-cost policies for replacing aging components
Hazard functions: Conditional rate of failure given survival so far
• Let T = length of life for a component (or person, or time until first occurrence of an event, etc.)
– T is a random variable with cdf F(t) = Pr(T ≤ t) and survival function S(t) = 1 − F(t) = Pr(T > t)
– The pdf for T is then f(t) = F′(t) = dF(t)/dt
• The hazard function for T is defined as
h(t) = lim_{dt → 0} Pr(t < T ≤ t + dt | T > t)/dt
h(t) = f(t)/S(t) = f(t)/[1 − F(t)]
– Interpretation: “instantaneous failure rate”
• h(t)·dt ≈ Pr(failure occurs in next dt | survival until t)
– In discrete time, dt = 1 and no limit is taken
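A quick numerical check of the identity h(t) = f(t)/S(t), using an assumed Weibull lifetime with shape 2, whose closed-form hazard on the unit scale is 2t:

import numpy as np
from scipy.stats import weibull_min

t = np.linspace(0.1, 5, 50)
dist = weibull_min(c=2.0)           # assumed Weibull(shape=2, scale=1) lifetime
hazard = dist.pdf(t) / dist.sf(t)   # h(t) = f(t) / S(t)
assert np.allclose(hazard, 2 * t)   # matches the closed form h(t) = 2t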
Using hazard functions to guide decisions
• The shape of the hazard function can often guide decisions, e.g.:
– If h(t) is increasing, then the optimal time to stop is when h(t) reaches a certain threshold (sketch below)
– If h(t) is decreasing, then the best decision is either don’t start, or else continue until failure occurs
– Normal distribution hazard function calculator: http://reliabilityanalyticstoolkit.appspot.com/normal_distribution
– SPRT and other calculators: http://reliabilityanalyticstoolkit.appspot.com/
www.wolfram.com/mathematica/new-in-9/enhanced-probability-and-statistics/define-a-distribution-given-its-hazard-function.html
https://www.ncss.com/software/ncss/survival-analysis-in-ncss/
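A sketch of the increasing-hazard stopping rule from the first sub-bullet; the Weibull parameters and the hazard threshold are illustrative assumptions (in practice the threshold comes from the cost trade-off):

import numpy as np
from scipy.stats import weibull_min

dist = weibull_min(c=2.0, scale=10.0)    # assumed lifetime with increasing hazard
threshold = 0.15                         # assumed hazard threshold
t = np.linspace(0.01, 30.0, 3000)
h = dist.pdf(t) / dist.sf(t)             # h(t) = f(t) / S(t), here 0.02 * t
stop_time = t[np.argmax(h >= threshold)] # first time the hazard reaches the threshold
print(f"stop/replace at about t = {stop_time:.2f}")   # ~7.5 for these assumptions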
Example: optimal age replacement
• The lifetime T of a component is a random variable with a known distribution
• Suppose it costs $10 to replace the component before it fails and $50 to replace it after it fails
• When should the component be voluntarily replaced (if it has not failed yet)?
• The answer can be calculated by minimizing expected average cost per cycle (or by equating the marginal benefit of continuing to the marginal cost), but the calculations are detailed and soon get tedious; a sketch follows
• Alternative: Google “optimal replacement age calculator”
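A sketch of the cost-per-cycle calculation, using the standard renewal-reward formula C(τ) = [c_p·S(τ) + c_f·F(τ)] / ∫₀^τ S(u) du with the slide's $10/$50 costs; the Weibull lifetime is an assumed example:

import numpy as np
from scipy.stats import weibull_min
from scipy.integrate import quad

dist = weibull_min(c=2.0, scale=10.0)   # assumed lifetime distribution (wear-out)
c_planned, c_failure = 10.0, 50.0       # replacement costs from the slide

def cost_rate(tau):
    """Expected cost per unit time for the policy: replace at age tau,
    or at failure, whichever comes first (renewal-reward theorem)."""
    expected_cycle_cost = c_planned * dist.sf(tau) + c_failure * dist.cdf(tau)
    expected_cycle_length = quad(dist.sf, 0, tau)[0]
    return expected_cycle_cost / expected_cycle_length

taus = np.linspace(0.5, 20.0, 200)
best = min(taus, key=cost_rate)
print(f"replace at age {best:.2f} (cost rate {cost_rate(best):.3f} per unit time)")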
Optimal age replacement calculator
http://www.reliawiki.org/index.php/Optimum_Replacement_Time_Example
Optimal selling of an asset
• If offers arrive sequentially from a known distribution and costs of waiting are known, then an optimal decision boundary (blue) can be constructed to maximize EMV
(Figure: sell when the price series W(t), the red line, first hits the blue decision boundary; S(t) = maximum price so far.)
http://file.scirp.org/Html/9-1040163_25151.htm
Optimal stopping: Variations
• Offers arrive sequentially from an unknown distribution
– Bayesian updating provides solutions
• Time pressure: must sell by a deadline, or within a fixed number of offers (sketch below)
• With or without being able to go back to previous offers
(Figure: sell when the blue line first hits the green decision boundary.)
http://file.scirp.org/Html/9-1040163_25151.htm
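A minimal sketch of the deadline variant: offers i.i.d. from a known distribution, no recall of past offers, and a fixed number of chances (the distribution, waiting cost, and deadline below are illustrative assumptions). Backward induction yields a reservation price for each stage:

import numpy as np

rng = np.random.default_rng(0)
offers = rng.uniform(0, 100, size=100_000)   # Monte Carlo sample from the assumed offer distribution
c = 1.0                                      # assumed cost of waiting per offer
N = 10                                       # deadline: at most N offers

# v[n] = expected value of optimal play with n offers still to come
# (v[0] = 0: if the deadline passes unsold, nothing is received).
v = np.zeros(N + 1)
for n in range(1, N + 1):
    v[n] = np.mean(np.maximum(offers, v[n - 1])) - c

# Decision rule: accept an offer x, with n offers remaining after it, iff x >= v[n].
print("reservation prices (first offer to last):", np.round(v[N-1::-1], 2))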
Wrap-up on optimal stopping and statistical decision theory
• Many valuable decision problems can be solved using the philosophy of simulation-optimization:
– Try different decisions and evaluate their probable consequences
– Choose the one with the best (EMV- or EU-maximizing) probability distribution of consequences
• Finding a best decision or decision rule can become very technical
– Use appropriate software or on-line calculators
• For business applications, understanding how to formulate decision problems and solve them with software can create high value in practice
Heuristics and biases