SEEM 3470: Dynamic Optimization and Applications
2013–14 Second Term
Handout 1: Introduction to Dynamic Programming
Instructor: Shiqian Ma
January 6, 2014
Suggested Reading: Sections 1.1 – 1.5 of Chapter I of Richard Bellman, Dynamic Programming,
Dover Publications, Inc., 2003. Also review material from SEEM 3440: Operations Research II.
1 Dynamic Programming: Introduction and Examples
• Operations Research: a science about decision making
– Operations: Activities carried out in an organization related to the attainment of its
goals: Decision making among different options (Example: Shortest Path)
– Research: Scientific methods to study the operations
– Operations Research: Develop scientific methods to help people make decisions of activities so as to achieve a specific objective
– Two features:
∗ Decision making: which path?
∗ Achieve some objective, e.g., maximize profits or minimize costs
• Deterministic model
– All info and data are deterministic
– Example: producing chairs and tables using two materials
• Stochastic model
– Some info and data are stochastic
– Example: the lifespan of a USB drive is random; when should I replace it?
• Where is operations research used?
– Airline: Scheduling aircraft crews (minimum number of crews)
– Logistics and supply chain: Inventory (how many to order, demand, order cost, inventory
cost)
– Revenue management: Pricing (retailer selects products to display)
– Financial industry: Portfolio selection, asset allocation
– Civil engineering: Traffic analysis and transportation system design (the routes and
frequency of buses, emergency evacuation system)
• Dynamic Programming: multi-stage optimization: take advantage of the new information in
each stage to make a new decision.
• Example:
– Scheduling (Shortest Path)
– Inventory Control
– Two-game chess match
– Machine replacement
2 Basic Terminologies in Optimization
An optimization problem typically takes the form
minimize f (x) subject to x ∈ X. (P)
Here, f : Rn → R is called the objective function, and X ⊂ Rn is called the feasible region.
Thus, x = (x1 , . . . , xn ) is an n–dimensional vector, and we shall agree that it is represented in
column form. In other words, we treat x as an n × 1 matrix. The entries x1 , . . . , xn are called the
decision variables of (P ).
If X = Rn , then (P) is called an unconstrained optimization problem. Otherwise, it is
called a constrained optimization problem.
As the above formulation suggests, we are interested in an optimal solution to (P), which
is defined as a point x∗ ∈ X such that f (x∗ ) ≤ f (x) for all x ∈ X. We call f (x∗ ) the optimal
value of (P).
To illustrate the above concepts, let us consider the following example:
Example 1 Suppose that f : R2 → R is given by f (x1 , x2 ) = x1² + 2x2², and X = {(x1 , x2 ) ∈ R2 :
0 ≤ x1 ≤ 1, 1 ≤ x2 ≤ 3}. Then, we can write (P) as
minimize x1² + 2x2² subject to 0 ≤ x1 ≤ 1, 1 ≤ x2 ≤ 3. (P)
This is a constrained optimization problem, and it is easy to verify that
f (x1 , x2 ) ≥ f (0, 1) = 2
for all (x1 , x2 ) ∈ X.
Thus, we say that (0, 1) is an optimal solution to (P ), and f (0, 1) = 2 is the optimal value.
It is worth computing the derivative of f at (0, 1):
(∂f /∂x1 , ∂f /∂x2 ) = (2x1 , 4x2 ), which at (x1 , x2 ) = (0, 1) equals (0, 4) ≠ (0, 0).
This shows that for a constrained optimization problem, the derivative at the optimal solution need
not be zero.
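This claim can be checked numerically; a minimal Python sketch (the grid resolution is an arbitrary choice, not part of the handout):

```python
# Verify Example 1: minimize f(x1, x2) = x1^2 + 2*x2^2 over 0 <= x1 <= 1, 1 <= x2 <= 3.
# A coarse grid search confirms the minimum is at (0, 1) with value 2, and that
# the derivative (2*x1, 4*x2) there is (0, 4), not (0, 0).

def f(x1, x2):
    return x1**2 + 2 * x2**2

def grad_f(x1, x2):
    return (2 * x1, 4 * x2)

# grid over the feasible region [0, 1] x [1, 3]
candidates = [(i / 100, 1 + 2 * j / 100) for i in range(101) for j in range(101)]
best = min(candidates, key=lambda p: f(*p))

print(best)            # (0.0, 1.0)
print(f(*best))        # 2.0
print(grad_f(*best))   # (0.0, 4.0) -- nonzero at the constrained optimum
```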
The different structures of f and X in (P ) give rise to different classes of optimization problems.
Some important classes include:
1. discrete optimization problems, when the set X consists of countably many points;
2. linear optimization problems, when f takes the form a1 x1 + a2 x2 + · · · + an xn for some
given a1 , . . . , an , and X is a set defined by linear inequalities;
3. nonlinear optimization problems, when f is nonlinear or X cannot be defined by linear
inequalities alone;
4. stochastic optimization problems, where f takes the form
f (x) = EZ [F (x, Z)],
where Z is a random parameter.
To illustrate the above concepts, let us consider the following problem, which will serve as our
running example:
Resource Allocation Problem. Suppose that we have an initial wealth of S0 dollars, and we
want to allocate it to two investment options. By allocating x0 to the first option, one earns a
return of g(x0 ). The remaining S0 − x0 dollars will earn a return of h(S0 − x0 ). Here, we are
assuming that 0 ≤ x0 ≤ S0 , so that we are not borrowing extra money to fund our investments.
Now, a natural goal is to choose the allocation amount x0 to maximize our total return, which is given
by f (x0 ) = g(x0 ) + h(S0 − x0 ).
In our notation, the resource allocation problem is nothing but the following optimization
problem:
maximize g(x0 ) + h(S0 − x0 )
(RAP)
subject to x0 ∈ X = [0, S0 ].
Consider the following scenarios:
1. Suppose that both g and h are linear, i.e., g(x) = ax + b and h(x) = cx + d for some
a, b, c, d ∈ R. Then, (RAP) becomes
maximize (a − c)x0 + b + d + cS0 subject to 0 ≤ x0 ≤ S0 , (RAP–L)
which is a linear optimization problem. In this case, the optimal solution to (RAP–L) can
be determined explicitly. Indeed, if a − c ≥ 0, then it is profitable to make x0 as large as
possible, and hence the optimal solution is x∗0 = S0 . On the other hand, if a − c < 0, then a
similar argument shows that the optimal solution should be x∗0 = 0.
Suppose that we change the constraint in (RAP–L) from x0 ∈ [0, S0 ] to
x0 ∈ X = {0, S0 /M, 2S0 /M, . . . , S0 },
where M ≥ 2 is some integer. Then, the problem becomes a discrete optimization problem,
as the feasible region X now consists of only a finite number of points.
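Both the continuous and the discrete versions of this linear case can be verified by enumeration; a small sketch, with illustrative values for a, b, c, d, S0 and M (not from the handout):

```python
# Linear RAP: objective (a - c)*x0 + b + d + c*S0 on [0, S0].
# For a linear objective the optimum lies at an endpoint: x0* = S0 if a >= c, else x0* = 0.

a, b, c, d, S0, M = 3.0, 1.0, 2.0, 0.5, 10.0, 5   # illustrative values

def total_return(x0):
    return a * x0 + b + c * (S0 - x0) + d   # g(x0) + h(S0 - x0)

# continuous case: the sign of a - c decides the endpoint
x_cont = S0 if a - c >= 0 else 0.0

# discrete case: brute force over the grid {0, S0/M, 2*S0/M, ..., S0}
grid = [k * S0 / M for k in range(M + 1)]
x_disc = max(grid, key=total_return)

print(x_cont, x_disc)   # both are S0 = 10.0 here, since a > c
```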
2. Suppose that g(x) = a log x and h(x) = b log x for some a, b > 0. Then, (RAP) becomes
maximize a log x0 + b log(S0 − x0 ) subject to 0 ≤ x0 ≤ S0 , (RAP–LOG)
which is a nonlinear optimization problem. Observe that if x∗0 is an optimal solution to
(RAP–LOG), then we must have 0 < x∗0 < S0 . In other words, the boundary points x0 = 0
and x0 = S0 cannot be optimal for (RAP–LOG). This implies that the optimal solution x∗0
can be found by differentiating the objective function and setting the derivative to zero; i.e.,
x∗0 satisfies
df /dx0 = a/x0 − b/(S0 − x0 ) = 0.
In particular, we obtain x∗0 = aS0 /(a + b).
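The closed-form solution can be compared against a direct grid search; a sketch with illustrative values of a, b, S0:

```python
# Logarithmic RAP: maximize a*log(x0) + b*log(S0 - x0) on (0, S0).
# The first-order condition a/x0 - b/(S0 - x0) = 0 gives x0* = a*S0/(a + b).
import math

a, b, S0 = 2.0, 3.0, 10.0          # illustrative values
x_star = a * S0 / (a + b)          # closed form: 4.0

def obj(x0):
    return a * math.log(x0) + b * math.log(S0 - x0)

# interior grid search (endpoints excluded, since log(0) is undefined)
grid = [S0 * k / 1000 for k in range(1, 1000)]
x_num = max(grid, key=obj)

print(x_star, x_num)   # 4.0 4.0 -- the grid maximizer matches the closed form
```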
3. Let Z be a random variable with
Pr(Z = 1) = 1/4, Pr(Z = −1) = 3/4.
Consider the functions G and g defined by
G(x, Z) = Zx + b,
g(x) = EZ [G(x, Z)],
where b ∈ R is a given constant. Furthermore, suppose that h(x) = cx + d, where c, d ∈ R
are given. Then, (RAP) becomes
maximize EZ [G(x0 , Z)] + c(S0 − x0 ) + d subject to 0 ≤ x0 ≤ S0 , (RAP–S)
which is a stochastic optimization problem. Note that by definition of expectation, we have
EZ [G(x, Z)] = G(x, −1) · Pr(Z = −1) + G(x, 1) · Pr(Z = 1) = (3/4)(−x + b) + (1/4)(x + b) = −x/2 + b
for any x. Hence, (RAP–S) can be written as
maximize −(c + 1/2)x0 + b + cS0 + d subject to 0 ≤ x0 ≤ S0 ,
which is a simple linear optimization problem.
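The expectation step can be reproduced by enumerating the two outcomes of Z; a sketch with an illustrative value of b:

```python
# Stochastic RAP: g(x) = E_Z[G(x, Z)] with G(x, Z) = Z*x + b,
# Pr(Z = 1) = 1/4, Pr(Z = -1) = 3/4. Enumerating the two outcomes gives
# E_Z[G(x, Z)] = -x/2 + b, reducing the stochastic problem to a linear one.

b = 2.0   # illustrative constant

def G(x, z):
    return z * x + b

def g(x):
    # expectation over the two outcomes of Z
    return 0.25 * G(x, 1) + 0.75 * G(x, -1)

# check against the closed form -x/2 + b at a few points
for x in [0.0, 1.0, 5.0]:
    assert abs(g(x) - (-0.5 * x + b)) < 1e-12
print("expectation matches -x/2 + b")
```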
3 Introduction to Dynamic Programming
Observe that all the optimization problems introduced in the previous section involve only a one–
stage decision, namely, to choose a point x∗ in the feasible region X to minimize an objective
function f . However, in reality, information is often released in stages, and we are allowed to take
advantage of the new information in each stage to make a new decision. This gives rise to multi–
stage optimization problems, which we shall refer to as dynamic programming or dynamic
optimization problems.
Before we introduce the theory of dynamic programming, let us study an example and understand some of the difficulties of dynamic optimization. Consider a two–stage generalization of the
resource allocation problem, in which the first–stage proceeds as before. However, as a price of
obtaining the return g(x0 ), the original allocation x0 to the first option is reduced to ax0 , where
0 < a < 1. Similarly, the allocation S0 − x0 for obtaining the return h(S0 − x0 ) is reduced to
b(S0 − x0 ), where 0 < b < 1. In particular, at the end of the first stage, the available wealth for
investment in the next stage is S1 = ax0 + b(S0 − x0 ). Now, in the second–stage, one can again
split the S1 dollars into the two investment options, obtaining a return of g(x1 ) + h(S1 − x1 ) if x1
dollars is allocated to the first option and the remaining amount S1 − x1 is allocated to the second
option. The goal now is to choose the allocation amounts x0 and x1 in both stages to maximize
the total return
fS0 (x0 , x1 ) = g(x0 ) + h(S0 − x0 ) + g(x1 ) + h(S1 − x1 ).
In other words, we can formulate the two–stage resource allocation problem as follows:
maximize g(x0 ) + h(S0 − x0 ) + g(x1 ) + h(S1 − x1 )
subject to 0 ≤ x0 ≤ S0 , 0 ≤ x1 ≤ S1 , (RAP–2)
S1 = ax0 + b(S0 − x0 ).
Of course, there is no reason to stop at a second–stage problem. By iterating the above process,
we have an N –stage resource allocation problem, where at the end of the k–th stage (where k =
1, 2, . . . , N − 1), the available wealth would be Sk = axk−1 + b(Sk−1 − xk−1 ), where xk−1 is the
amount allocated to the first option in the k–th stage. Mathematically, the N –stage problem can
be formulated as follows:
maximize ∑_{k=0}^{N−1} [g(xk ) + h(Sk − xk )]
subject to 0 ≤ x0 ≤ S0 , 0 ≤ x1 ≤ S1 , . . . , 0 ≤ xN−1 ≤ SN−1 , (RAP–N )
Sk = axk−1 + b(Sk−1 − xk−1 ) for k = 1, . . . , N − 1.
Now, an important question is, how would one solve (RAP–N )? If g, h are linear, then (RAP–
N ) is a linear optimization problem, and hence it can in principle be solved by, say, the simplex
method. However, the problem becomes more difficult if g, h are nonlinear. One possibility is to
use calculus. Towards that end, suppose that the optimal solution (x∗0 , x∗1 , . . . , x∗N −1 ) to (RAP–N )
satisfies 0 < x∗k < Sk for k = 0, 1, . . . , N − 1. Let
fS0 (x0 , x1 , . . . , xN−1 ) = ∑_{k=0}^{N−1} [g(xk ) + h(Sk − xk )].
Then, we set all the partial derivatives of fS0 to zero and solve for x0 , x1 , . . . , xN −1 :
∂fS0 /∂xN−1 = g′(xN−1 ) − h′(SN−1 − xN−1 ) = 0,
∂fS0 /∂xN−2 = g′(xN−2 ) − h′(SN−2 − xN−2 ) + (a − b)h′(SN−1 − xN−1 ) = 0,
...
This approach requires us to solve a system of N nonlinear equations in N unknowns, which in
general is not an easy task. Worse yet, we have to also check the boundary points xk = 0 and
xk = Sk for optimality. Fortunately, not all is lost. Observe that in the above approach, we have not
taken into account the sequential nature of the problem, i.e., the allocations x0 , x1 , . . . , xN −1 should
be determined sequentially. This motivates us to consider approaches that can take advantage of
such a structure.
Towards that end, observe that the maximum total return of the N –stage resource allocation
problem depends only on N and the initial wealth S0 . Hence, we can define a function qN by
qN (S0 ) = max {fS0 (x0 , x1 , . . . , xN−1 ) : 0 ≤ xk ≤ Sk for k = 0, 1, . . . , N − 1} . (1)
In words, qN (S0 ) is the maximum return of the N –stage resource allocation problem if the initial
wealth is S0 . For instance, we have
q1 (S0 ) = max {g(x0 ) + h(S0 − x0 ) : 0 ≤ x0 ≤ S0 } , (2)
which coincides with (RAP). Now, although we can use the definition of q2 (S0 ) as given in (1), we
can also express it in terms of q1 (S0 ). To see this, recall that the total return of the 2–stage problem
is the first–stage return plus the second–stage return. Clearly, whatever we choose the first–stage
allocation x0 to be, the wealth available at the end of the first stage, i.e., S1 = ax0 + b(S0 − x0 ),
must be allocated optimally for the second stage if we wish to maximize the total return. Thus, if
x0 is our allocation in the first stage, then we will obtain a return of q1 (S1 ) in the second stage by
choosing x1 optimally. It follows that
q2 (S0 ) = max {g(x0 ) + h(S0 − x0 ) + q1 (ax0 + b(S0 − x0 )) : 0 ≤ x0 ≤ S0 } . (3)
More generally, by using the same idea, we obtain the following recurrence relation for qN (S0 ):
qN (S0 ) = max {g(x0 ) + h(S0 − x0 ) + qN−1 (ax0 + b(S0 − x0 )) : 0 ≤ x0 ≤ S0 } . (4)
An important feature of (4) is that it has only one decision variable (i.e., x0 ), as opposed to N
decision variables (i.e., x0 , x1 , . . . , xN −1 ) in the definition of qN (S0 ) as given by (1). Now, starting
with q1 (S0 ), as given by (2), we can use (3) to compute q2 (S0 ), which in turn can be used to
compute q3 (S0 ) and so on using (4). Thus, the formulation (4) allows us to turn the original N –
variable formulation (RAP–N ) into N one–dimensional problems. We shall see the computational
advantage of such a formulation later in the course.
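The recurrence (4) can be turned into a simple numerical procedure by discretizing the allocation in each stage; the sketch below assumes illustrative concave returns, illustrative shrinkage factors, and the convention q0 ≡ 0, none of which are prescribed by the handout:

```python
# Numerical version of recurrence (4):
#   qN(S) = max over 0 <= x <= S of  g(x) + h(S - x) + q_{N-1}(a*x + b*(S - x)),
# computed by backward recursion with the convention q0(S) = 0 (an assumption).
import math

a_shrink, b_shrink = 0.6, 0.8        # the factors a, b with 0 < a, b < 1

def g(x): return math.log(1 + x)     # illustrative concave return functions
def h(x): return 2 * math.log(1 + x)

def q(N, S, steps=200):
    """Maximum total return of the N-stage problem with initial wealth S."""
    if N == 0:
        return 0.0
    best = -math.inf
    for k in range(steps + 1):
        x = S * k / steps            # candidate first-stage allocation
        S_next = a_shrink * x + b_shrink * (S - x)
        best = max(best, g(x) + h(S - x) + q(N - 1, S_next))
    return best

# each extra stage adds a nonnegative return, so q is monotone in N here
print(q(1, 10.0) <= q(2, 10.0))   # True
```

Memoizing q on a fixed wealth grid would turn this recursion into exactly the N one-dimensional problems described above; the plain recursive form is kept here to mirror (4) directly.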
As an illustration, consider the following example:
Example 2 Consider the 2–stage resource allocation problem, where g(x) = a log x and h(x) =
b log x for some a, b > 0, and the initial wealth is S0 . Recall that the maximum total return of this
problem is given by
q2 (S0 ) = max {g(x0 ) + h(S0 − x0 ) + q1 (ax0 + b(S0 − x0 )) : 0 ≤ x0 ≤ S0 } .
To determine q2 (S0 ), we start with q1 (S1 ), where S1 = ax0 + b(S0 − x0 ). By definition, we have
q1 (S1 ) = max {a log x + b log(S1 − x) : 0 ≤ x ≤ S1 } .
Observe that the optimal solution x∗ to q1 (S1 ) must satisfy 0 < x∗ < S1 . Hence, by differentiating
the objective function and setting the derivative to zero, we obtain
a/x∗ − b/(S1 − x∗ ) = 0 ⇐⇒ x∗ = aS1 /(a + b).
In particular,
q1 (S1 ) = a log(rS1 ) + b log((1 − r)S1 ), where r = a/(a + b).
Upon substituting this into q2 (S0 ), we have
q2 (S0 ) = max {a log(x0 ) + b log(S0 − x0 ) + a log(rS1 ) + b log((1 − r)S1 ) : 0 ≤ x0 ≤ S0 } .
Again, the optimal solution x∗0 to q2 (S0 ) must satisfy 0 < x∗0 < S0 . Hence, by differentiating the
objective function and setting the derivative to zero, we have
a/x∗0 − b/(S0 − x∗0 ) + a(a − b)/(ax∗0 + b(S0 − x∗0 )) + b(a − b)/(ax∗0 + b(S0 − x∗0 )) = 0.
This is just a quadratic equation in x∗0 and hence the optimal solution x∗0 can be found easily. We
leave this as an exercise to the reader.
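As a numerical sanity check (not a substitute for the exercise), one can maximize the q2 objective by grid search and confirm that the derivative vanishes at the maximizer; the values of a, b, S0 below are illustrative:

```python
# Example 2 check: g(x) = a*log(x), h(x) = b*log(x), S1 = a*x0 + b*(S0 - x0)
# (following the handout, the same a, b appear as return coefficients and
# shrinkage factors).
import math

a, b, S0 = 0.4, 0.6, 10.0   # illustrative values with 0 < a, b < 1
r = a / (a + b)

def q1(S1):
    # optimal second-stage return in closed form
    return a * math.log(r * S1) + b * math.log((1 - r) * S1)

def obj(x0):
    # first-stage return plus optimal second-stage return
    S1 = a * x0 + b * (S0 - x0)
    return a * math.log(x0) + b * math.log(S0 - x0) + q1(S1)

grid = [S0 * k / 10000 for k in range(1, 10000)]   # interior points only
x0_star = max(grid, key=obj)

# central difference: the derivative should (approximately) vanish at x0*
eps = 1e-5
deriv = (obj(x0_star + eps) - obj(x0_star - eps)) / (2 * eps)
print(round(x0_star, 3), abs(deriv) < 1e-2)
```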