2351 06a

Session 6a
Agenda
Stochastic dynamic programming
Examples:
–TV Game Show
–Personalized Marketing
–Stochastic Production Scheduling
TV Game Show
Beginning the game with no accumulated winnings, you spin the
wheel once, and after each spin, you are allowed to spin again if you
wish.
On any one spin, there are four possible outcomes:
Outcome                  Probability
Win $1                   0.25
Win $5                   0.25
Win $10                  0.25
Lose 100% (game over)    0.25
TV Game Show
After each spin, assuming you still have money, you must either
(i) take your accumulated winnings and thereby end the game or
(ii) choose to spin the wheel again.
Your objective is to maximize your expected accumulated winnings
from playing the game.
Solve the problem assuming there are one or two or three (or ten)
spins available.
Single Spin Analysis
Case 1: Initial State $0
Outcome     End State   Prob.
$1.00       $1.00       0.25
$5.00       $5.00       0.25
$10.00      $10.00      0.25
wipe out    $0.00       0.25
End State = Initial State + Outcome (e.g. =A4+B4)
EMV of spinning = $4.00 (=SUMPRODUCT(C2:C5,D2:D5))

Case 2: Initial State $20
Outcome     End State   Prob.
$1.00       $21.00      0.25
$5.00       $25.00      0.25
$10.00      $30.00      0.25
wipe out    $0.00       0.25
EMV of spinning = $19.00 (=SUMPRODUCT(C10:C13,D10:D13))
Case 1. Begin with $0: Best to spin
Case 2. Begin with $20: Best not to spin
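The two cases can be checked with a few lines of Python. This is a minimal sketch of the single-spin comparison only; the names `WINS`, `P_WIPE_OUT`, and `emv_of_spinning` are illustrative and do not come from the slides.

```python
# Winning outcomes of one spin as (amount won, probability).
WINS = [(1, 0.25), (5, 0.25), (10, 0.25)]
P_WIPE_OUT = 0.25  # a wipe-out leaves $0, so it adds nothing to the expectation

# Sanity check: the four outcomes cover all the probability mass.
assert abs(sum(p for _, p in WINS) + P_WIPE_OUT - 1.0) < 1e-12


def emv_of_spinning(cash):
    """Expected final cash if we spin exactly once more starting from `cash`."""
    return sum(p * (cash + win) for win, p in WINS)


for cash in (0, 20):
    emv = emv_of_spinning(cash)
    decision = "spin" if emv > cash else "do not spin"
    print(f"Begin with ${cash}: EMV of spinning = ${emv:.2f} -> {decision}")
```

Comparing the EMV of spinning with the cash already in hand is the same comparison the two spreadsheet cases make.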
How to organize a general model for all situations?
Multiple-Spins Analysis: Tree Representation
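[Figure: decision tree for up to three spins. Each node is labelled with the accumulated winnings at that point, from $0 at the start up to $30 after three spins.]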
Characteristics of Example
Characteristic 1
–The problem can be divided into stages with a decision required at
each stage.
Characteristic 2
–Each stage has a number of states associated with it.
–By a state, we mean the information that is needed at any stage to
make an optimal decision.
Characteristic 3
–The decision chosen at any stage describes how the state at the
current stage is transformed into the state at the next stage
(transition state).
–How?
Characteristics of DP Applications
Characteristic 4
− Given the current state, the optimal decision for each of the
remaining stages must not depend on previously reached states or
previously chosen decisions.
− This is known as the principle of optimality.
Characteristic 5
− If the states for the problem have been classified into one of T
stages, there must be a recursion that relates the cost or reward
earned during stages t, t+1, …, T to the cost or reward earned from
stages t+1, t+2, …, T (cost/value-to-go function).
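One common way to write the recursion in Characteristic 5 for a deterministic minimization problem is sketched below. The symbols are assumptions introduced here for illustration: s is the state at stage t, x the decision, c_t(s, x) the immediate cost, and g_t(s, x) the state reached at stage t+1.

```latex
\[
f_t(s) = \min_{x} \bigl\{ c_t(s, x) + f_{t+1}\bigl(g_t(s, x)\bigr) \bigr\},
\qquad t = T, T-1, \dots, 1, \qquad f_{T+1}(s) = 0 .
\]
```

For a reward-maximization problem, replace min with max and read c_t as a reward.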
DP with Uncertainty
In deterministic dynamic programming, a specification of the current
state and current decision is enough to tell us with certainty the new
state (transition state) and the immediate cost/payoff during the
current stage.
In many practical problems, these factors may not be known with
certainty, even when the current state and decision are known.
Fishing example
• Profit/ton can be uncertain
• Reproduction rate can be uncertain
Adapt deterministic DP methodology to incorporate uncertainty.
• Expected values?
• Multiple transition states, each assigned a probability?
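The value of the best decision, given State i at Stage t:
\[
f(S_i, t) = \max\left\{ S_i,\ \sum_{j=1}^{n} f(S_j, t+1)\, P(S_{t+1} = S_j) \right\}
\]
In the game show, S_i is the cash kept by stopping in state i, and the sum is the expected value of spinning, taken over the n possible next states S_j.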
EMV from Spinning
Spin outcomes (cash): $1.00, $5.00, $10.00, Wipe Out, each with probability 0.25

Value Function
State    3 Spins Left   2 Spins Left   1 Spin Left   0 Spins Left
$0.00    $6.81          $6.00          $4.00         $0.00
$1.00                   $6.56          $4.75         $1.00
$2.00                   $7.13          $5.50         $2.00
$3.00                   $7.69          $6.25         $3.00
$4.00                   $8.25          $7.00         $4.00
Example cell formula: =MAX(H7,(SUM(C8,C12,C17)/4))
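The spreadsheet recursion can be sketched in Python as a memoized value function. This is an illustrative sketch, not the course's implementation; the names `value`, `WINS`, and `P_EACH` are assumptions. Each call compares stopping (keep the cash) with the expected value of one more spin, mirroring the `=MAX(...)` cell formula.

```python
from functools import lru_cache

WINS = (1, 5, 10)   # dollar amounts that can be won on a spin
P_EACH = 0.25       # probability of each outcome; the fourth outcome wipes out to $0


@lru_cache(maxsize=None)
def value(cash, spins_left):
    """Maximum expected final winnings with `cash` in hand and `spins_left` spins available."""
    if spins_left == 0:
        return float(cash)  # no spins left: keep what you have
    # Expected value of spinning; the wipe-out branch contributes 0.25 * 0.
    emv_spin = P_EACH * sum(value(cash + w, spins_left - 1) for w in WINS)
    return max(float(cash), emv_spin)


# Rebuild the first rows of the value-function table (states $0 to $4).
for cash in range(5):
    row = "  ".join(f"{value(cash, n):6.2f}" for n in (3, 2, 1, 0))
    print(f"${cash}: {row}")
```

Calling `value(0, 10)` answers the ten-spin version of the question with the same code, since the state is just the pair (cash, spins left).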
Multiple-Spins Analysis: Tree Representation
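[Figure: the same three-spin tree annotated with values. Start: $0 is worth $6.81. After one spin: $1 is worth $6.56, $5 is worth $8.81, $10 is worth $11.88. After two spins: $2 is worth $5.50, $6 is worth $8.50, $10 is worth $11.50, $11 is worth $12.25, $15 is worth $15.25; at $20 the EMV of spinning is $19.00, below the $20 in hand.]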
Personalized Marketing
Suppose you run an online wine shop that sells 12-bottle cases of
wines of your choice.
You have segmented your customers based on how many boxes of
wine they have purchased in the previous quarter.
You plan to send emails offering a discount on the 12-bottle cases to two
distinct segments of customers:
•active customers who have made a purchase in the previous quarter,
•inactive customers who have not made a purchase in the previous
quarter.
Personalized Marketing
By analyzing historical data, you have estimated the chance of a
customer making a purchase in response to different discount
levels.
You have found that customers in different segments respond
differently to discount offers.
Customer Segment   Discount   Probability of Purchase
Inactive           None       0.030
Inactive           Minor      0.051
Inactive           Major      0.171
Active             None       0.110
Active             Minor      0.150
Active             Major      0.504
You would like to run the online shop for five more quarters. What
is your optimal strategy to maximize the expected lifetime value of
a customer?
Assume 4% annual discount rate.
Personalized Marketing
In particular, you are considering choosing one of the three levels
of discounts for each customer each quarter:
Discount   Description      Direct Cost
0%         No discount      $0
11.67%     Minor discount   $1
46.67%     Major discount   $1
Your supplier charges you $65 per 12-bottle case of wines. Without
any discount, your customer would pay $150 for a case of wines.
This implies that a customer needs to pay only $80 if he/she is
offered a major 46.67% discount.
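The price arithmetic can be checked directly. The sketch below assumes the minor-discount price of $132.50 that appears in the spreadsheet's revenue row; the function name `discount_and_margin` is illustrative.

```python
LIST_PRICE = 150.00      # price of a case without any discount
SUPPLIER_COST = 65.00    # what the supplier charges per case

# Price paid by the customer under each offer.
PRICES = {"None": 150.00, "Minor": 132.50, "Major": 80.00}


def discount_and_margin(price):
    """Return (discount in percent, profit margin per case sold) for a given price."""
    discount_pct = round(100 * (LIST_PRICE - price) / LIST_PRICE, 2)
    margin = price - SUPPLIER_COST
    return discount_pct, margin


for name, price in PRICES.items():
    pct, margin = discount_and_margin(price)
    print(f"{name:5s} price ${price:7.2f}  discount {pct:5.2f}%  margin ${margin:.2f}")
```

The margins of $85.00, $67.50, and $15.00 are the per-sale profits before the $1 direct cost of sending a discount offer.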
Active means “made a purchase this month”.
Note that a customer is always in one of two states, as
represented in this network diagram.
[Network diagram: five stages (4, 3, 2, 1, and 0 Months to Go), each with two states, Inactive and Active.]
Also note that there are three decision alternatives for each
state, but only two possible subsequent states (each of
which could be reached through any of the three decision
alternatives).
We will define an end stage, when there are no months left.
Assume that neither type of customer buys anything in this
“terminal” stage.
Inactive
$0.00
Active
$0.00
EMV = Expected Profit This Period + Expected Profit in Future Periods
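Written as a recursion, the EMV relationship for this example takes roughly the following form. The notation is an assumption introduced here: V_t(s) is the value of a customer in segment s with t periods to go, c_a the direct cost of offer a, p_{s,a} the purchase probability, m_a the profit per sale, and δ the per-period discount factor.

```latex
\[
V_t(s) = \max_{a} \Bigl\{ \underbrace{-c_a + p_{s,a}\, m_a}_{\text{expected profit this period}}
 + \underbrace{\delta \bigl[ p_{s,a}\, V_{t-1}(\text{Active}) + (1 - p_{s,a})\, V_{t-1}(\text{Inactive}) \bigr]}_{\text{expected profit in future periods}} \Bigr\},
\qquad V_0(s) = 0 .
\]
```

A purchase moves the customer to Active for the next period; no purchase moves the customer to Inactive.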
Inactive customer with 1 month to go (next-stage values: Inactive $0.00, Active $0.00):
No discount
$0.00 + (0.030 * $85.00) + 0.99 * ($0.00) = $2.55
Minor discount
-$1.00 + (0.051 * $67.50) + 0.99 * ($0.00) = $2.44
Major discount
-$1.00 + (0.171 * $15.00) + 0.99 * ($0.00) = $1.57
Best decision: No discount, so an Inactive customer with 1 month to go is worth $2.55.
In each line, the terms before "0.99" are the Expected Profit this Period (the $1 direct cost is paid whether or not the customer buys), and the term multiplied by 0.99 is the Expected Profit in Future Periods, which might have multiple outcomes and might involve discounting.
Spreadsheet model
                          Nothing    Minor      Major
Direct cost               $0.00      $1.00      $1.00
Discount                  0.0000     0.1167     0.4667
Probability, Inactive     0.030      0.051      0.171
Probability, Active       0.110      0.150      0.504
Revenue                   $150.00    $132.50    $80.00
Supplier cost             $65.00     $65.00     $65.00
Profit                    $85.00     $67.50     $15.00
Discount factor per period: 0.990099

0 Month to go
            Nothing   Minor     Major     Optimal
Inactive    $0.000    $0.000    $0.000    $0.000
Active      $0.000    $0.000    $0.000    $0.000

Cell formula for the later stages:
=-D$2+D$7*D$12+$F$2*(D$7*$F17+(1-D$7)*$F16)
  -D$2+D$7*D$12 is the Expected Profit this Period
  $F$2*(D$7*$F17+(1-D$7)*$F16) is the Expected Profit in Future Periods

1 Month to go
            Nothing   Minor     Major     Optimal   Decision
Inactive    $2.55     $2.44     $1.57     $2.550    Nothing
Expected Profit from Best Decision
1 Month to go
            Nothing   Minor     Major     Optimal   Decision
Inactive    $2.55     $2.44     $1.57     $2.550    Nothing
Active      $9.35     $9.13     $6.56     $9.350    Nothing
Optimal value formula: =MAX(B21:D21)
Active customer with 1 month to go:
No discount
$0.00 + (0.110 * $85.00) + 0.99 * ($0.00) = $9.35
Minor discount
-$1.00 + (0.150 * $67.50) + 0.99 * ($0.00) = $9.13
Major discount
-$1.00 + (0.504 * $15.00) + 0.99 * ($0.00) = $6.56
Segment    1 Month to Go   0 Months to Go
Inactive   $2.55           $0.00
Active     $9.35           $0.00
Segment    2 Months to Go   1 Month to Go     0 Months to Go
Inactive   $5.31 (Minor)    $2.55 (Nothing)   $0.00
Active     -                $9.35 (Nothing)   $0.00
Inactive customer with 2 months to go:
No discount:
$0.00 + (0.030*$85.00) + 0.99*(0.030*$9.35 + (1-0.030)*$2.55) = $5.28
Minor discount:
-$1.00 + (0.051*$67.50) + 0.99*(0.051*$9.35 + (1-0.051)*$2.55) = $5.31
Major discount:
-$1.00 + (0.171*$15.00) + 0.99*(0.171*$9.35 + (1-0.171)*$2.55) = $5.24
Segment    2 Months to Go   1 Month to Go     0 Months to Go
Inactive   $5.31 (Minor)    $2.55 (Nothing)   $0.00
Active     $12.66 (Minor)   $9.35 (Nothing)   $0.00
Active customer with 2 months to go:
No discount:
$0.00 + (0.110*$85.00) + 0.99*(0.110*$9.35 + (1-0.110)*$2.55) = $12.62
Minor discount:
-$1.00 + (0.150*$67.50) + 0.99*(0.150*$9.35 + (1-0.150)*$2.55) = $12.66
Major discount:
-$1.00 + (0.504*$15.00) + 0.99*(0.504*$9.35 + (1-0.504)*$2.55) = $12.48
Solved Version
Segment    4 Months to Go   3 Months to Go   2 Months to Go   1 Month to Go     0 Months to Go
Inactive   $10.81 (Major)   $8.07 (Minor)    $5.31 (Minor)    $2.55 (Nothing)   $0.00
Active     $18.25 (Major)   $15.49 (Major)   $12.66 (Minor)   $9.35 (Nothing)   $0.00
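The solved network can be reproduced with a short backward-induction loop. The sketch below follows the spreadsheet's cell formula (direct cost charged whether or not the customer buys, discount factor 0.990099 taken as 1/1.01); the function and variable names are illustrative assumptions.

```python
ACTIONS = ("Nothing", "Minor", "Major")
DIRECT_COST = {"Nothing": 0.0, "Minor": 1.0, "Major": 1.0}   # cost of making the offer
PROFIT = {"Nothing": 85.0, "Minor": 67.5, "Major": 15.0}      # margin per case sold
P_BUY = {
    "Inactive": {"Nothing": 0.030, "Minor": 0.051, "Major": 0.171},
    "Active": {"Nothing": 0.110, "Minor": 0.150, "Major": 0.504},
}
DELTA = 1 / 1.01  # per-period discount factor, 0.990099 in the spreadsheet


def solve(periods):
    """Backward induction; returns a list of (periods_to_go, values, policy)."""
    value = {"Inactive": 0.0, "Active": 0.0}  # terminal stage: nobody buys
    plan = []
    for to_go in range(1, periods + 1):
        new_value, policy = {}, {}
        for segment, probs in P_BUY.items():
            best_q, best_a = None, None
            for a in ACTIONS:
                p = probs[a]
                now = -DIRECT_COST[a] + p * PROFIT[a]
                future = DELTA * (p * value["Active"] + (1 - p) * value["Inactive"])
                if best_q is None or now + future > best_q:
                    best_q, best_a = now + future, a
            new_value[segment], policy[segment] = best_q, best_a
        value = new_value
        plan.append((to_go, dict(value), policy))
    return plan


for to_go, values, policy in solve(4):
    print(to_go, {s: (round(v, 2), policy[s]) for s, v in values.items()})
```

Because there are only two states, each stage needs just six evaluations (two segments times three offers), however many periods are solved.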
Production Planning
At the beginning of each period, decide the production quantity
(before knowing the actual demand of that period).
Each period’s demand is equally likely to be 1 or 2 units.
Production cost is c(x) if x units are produced. Assume c(x) = $5x.
It is required that all demand be met on time. All demand occurs at the
beginning of the period.
After meeting the current period’s demand out of current production
and inventory, the firm’s end-of-period inventory is evaluated, and a
holding cost of $1 per unit is assessed.
Inventory at the end of each period cannot exceed 3 units. Any
inventory on hand at the end of period 3 can be sold at $2 per unit.
At the beginning of period 1, the firm has 1 unit of inventory.
Production Planning: Characteristics
The problem can be divided into three stages (three periods)
At each stage, the production quantity has to be decided
The state for each stage is the beginning inventory; this is the
information needed to make future decisions
Optimal decisions for the remaining stages do not depend on how we
reached the beginning inventory of the current stage
The expected cost from stage t to stage 3 is the sum of the expected
immediate cost at stage t and the expected cost from stage t+1 to
stage 3.
Stochastic Production Planning Model
Define ft(i) to be the minimum expected net cost incurred during the
periods t, t +1,…3 when the inventory at the beginning of period t is i
units. Then
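\[
f_3(i) = \min_{x} \left\{ c(x) + \tfrac{1}{2}(i + x - 1) + \tfrac{1}{2}(i + x - 2) - \tfrac{1}{2}\,2(i + x - 1) - \tfrac{1}{2}\,2(i + x - 2) \right\}
\]
Term labels from the slide: the ½ factors are the demand Probabilities; c(x) is the Production Cost; the two added inventory terms are the Inventory Costs (2 scenarios; labelled "End of other periods" on the slide); the two subtracted terms are the Salvage Values (2 scenarios, End of 3rd Period).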
where x must be an integer and x must satisfy (2-i) ≤ x ≤ (4 - i).
Stochastic Production Planning Model
For t = 1, 2, we can derive the recursive relation for ft(i) by noting that
for any period t production level x, the expected costs incurred during
periods t, t+1, …, 3 are the sum of the expected costs incurred during
period t and the expected costs incurred during periods t+1, t+2, …, 3.
If x units are produced during period t, the expected cost during
period t will be c(x) + (½)(i+x-1) + (½)(i+x-2).
If x units are produced during period t, the expected cost during
periods t+1, t+2, …, 3 is computed as follows.
Stochastic Production Planning Model
Half of the time, the demand during period t will be 1 unit, and the
inventory at the beginning of t+1 will be i + x – 1.
In this situation, the expected cost incurred during periods t+1, t+2,
…, 3 is ft+1(i+x-1).
Similarly, there is a ½ chance that the inventory at the beginning of
period t+1 will be i + x – 2.
In this case, the expected cost incurred during periods t+1, t+2, …,3
will be ft+1(i+x-2).
In summary, the expected cost during the periods t+1, t+2, …,3 will be
(½) ft+1(i+x-1) + (½) ft+1(i+x-2).
Stochastic Production Planning Model
For t = 1, 2:
\[
f_t(i) = \min_{x} \left\{ c(x) + \tfrac{1}{2}(i + x - 1) + \tfrac{1}{2}(i + x - 2) + \tfrac{1}{2} f_{t+1}(i + x - 1) + \tfrac{1}{2} f_{t+1}(i + x - 2) \right\}
\]
where x must be an integer and x must satisfy (2-i) ≤ x ≤ (4-i).
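The recursion can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: c(x) = 5x as on the slide, production cannot be negative, and the names `solve`, `horizon`, and `hold_in_last_period` are introduced here. By default it implements the three-period model exactly as written, with the holding cost charged in every period. The optional arguments exist because the values shown in the solved network figure ($28.00 at inventory 1 in Period 1, and so on) appear to match a four-period run with no holding cost in the last period; that reading is inferred from the numbers and is not stated on the slides.

```python
from functools import lru_cache

DEMANDS = (1, 2)   # equally likely demand in each period
UNIT_COST = 5      # c(x) = 5x
HOLDING = 1        # holding cost per unit of end-of-period inventory
SALVAGE = 2        # salvage value per unit left after the last period


def solve(horizon=3, hold_in_last_period=True):
    """Return f(t, i) -> (minimum expected net cost from period t on, best production x)."""

    @lru_cache(maxsize=None)
    def f(t, i):
        best = None
        # Feasible x: demand of 2 must be met, and ending inventory may not exceed 3.
        for x in range(max(0, 2 - i), 4 - i + 1):
            cost = UNIT_COST * x
            for d in DEMANDS:
                end = i + x - d  # end-of-period inventory under demand d
                if t < horizon:
                    cost += 0.5 * (HOLDING * end + f(t + 1, end)[0])
                else:
                    hold = HOLDING * end if hold_in_last_period else 0
                    cost += 0.5 * (hold - SALVAGE * end)
            if best is None or cost < best[0]:
                best = (cost, x)
        return best

    return f


f = solve()
cost, x = f(1, 1)
print(f"Three periods, start with 1 unit: expected net cost {cost:.2f}, produce {x} in period 1")
```

Returning the best x alongside the cost lets the optimal production quantity for each beginning inventory be read off directly, for example `f(2, 0)[1]`.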
[Figure: state network of beginning inventory (0 to 3 units) by period, Periods 1 to 4, starting with 1 unit in Period 1.]
Solved network: expected cost by beginning inventory and period
Inventory   Period 1   Period 2   Period 3   Period 4
0                      $25.00     $17.00     $9.00
1           $28.00     $20.00     $12.00     $4.00
2                      $15.00
3                      $11.00

Inventory   Optimal Production Quantity
0           2
1           1
2           0
3           0
Next Class
Stochastic DP Formulation II
Approximate lognormal distribution by binomial tree
State depends on values of multiple parameters
Assignment 2 is due before class:
Submit in class a hard copy of your report
Submit online your report and the Excel files