SHORT INTRODUCTION TO DYNAMIC PROGRAMMING
1. Example
We consider different stages (discrete time events) given by k = 0, . . . , N. Let x_k
be the amount of money owned by a consumer at stage k. At each stage k, the
consumer decides the fraction u_k of the capital x_k that he will use. The amount
which is consumed at stage k is therefore given by c_k = u_k x_k. The rest is saved
at a given interest rate. The evolution of the capital is thus given by the equation
    x_{k+1} = ρ(1 − u_k) x_k

where ρ > 1. We consider the utility function U(c) = √c. In particular, the
function U is strictly increasing and also strictly concave. It is increasing because
the consumer spends his money on something which increases his satisfaction.
However, the marginal increase in satisfaction decreases with the amount of money
which is spent. For example, the increase in satisfaction obtained by spending 10
extra kroner is less if it is added to 1000 kroner than if it is added to 10 kroner.
The concavity assumption takes this fact into account.
The consumer wants to maximize
J=
N
X
U (ck ) =
k=0
N
X
√
xk uk
k=0
The consumer has to find a good balance between his immediate satisfaction and
his future satisfaction. To illustrate this, let us consider two opposite strategies:
• The consumer uses all his money at the first stage: u_0 = 1 and u_k arbitrary
  for k = 1, . . . , N. Then, x_k = 0 for k = 1, . . . , N and

      J = √x_0 .
• The consumer saves his money until the last stage, where he spends it all:
  u_k = 0 for k = 0, . . . , N − 1, u_N = 1. Then, x_k = ρ^k x_0 and

      J = √(ρ^N x_0).
The first strategy is not optimal because, at the first stage, the consumer does not
take into account the satisfaction he can get in the future by saving. In the second
strategy, the consumer takes this fact into account and manages to get the highest
capital possible but this capital is not used in the optimal way. Indeed, due to the
concavity of the utility function, it is not optimal to spend a lot of money at the
same time. The optimal strategy is a balance between these two strategies, which
we are going to compute.
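The two extreme strategies above can be checked numerically. The sketch below is illustrative only: the values ρ = 1.05, x_0 = 100 and N = 10 are assumptions, not taken from the text.

```python
import math

# Illustrative parameters (assumed): interest factor rho > 1,
# initial capital x0, and N+1 stages k = 0, ..., N.
rho, x0, N = 1.05, 100.0, 10

def total_utility(u, rho, x0):
    """Total utility J = sum_k sqrt(x_k * u_k) for a control sequence u."""
    x, J = x0, 0.0
    for uk in u:
        J += math.sqrt(x * uk)
        x = rho * (1 - uk) * x   # capital evolution x_{k+1} = rho (1 - u_k) x_k
    return J

# Strategy 1: spend everything at stage 0  ->  J = sqrt(x0)
spend_now = [1.0] + [0.0] * N
# Strategy 2: save until the last stage    ->  J = sqrt(rho^N * x0)
spend_last = [0.0] * N + [1.0]

print(total_utility(spend_now, rho, x0))   # sqrt(100) = 10.0
print(total_utility(spend_last, rho, x0))  # sqrt(1.05**10 * 100) ≈ 12.76
```

With these numbers, saving until the end already beats immediate consumption, but neither strategy is optimal.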
2. Terminology and statement of the problem
We consider a system where the events happen in stages and the total number of
stages is fixed. At each stage k (where k = 0, . . . , N), the state variable x_k gives
a description of the system. The evolution of the system is given by a governing
equation of the form
(1)    x_{k+1} = g_k(x_k, u_k)
where g_k is a given function and u_k is a control variable. By choosing the control
variable, we influence the evolution of the state variable (we cannot, in general, set
the state variable directly; some inertia in the system can be modelled by (1)). At
each stage, there is a profit (or a cost) given as a function f_k(x_k, u_k) of the state
variable and of the control variable. The total value (or total cost) is thus
(2)    J = Σ_{k=0}^{N} f_k(x_k, u_k).
We want to find the optimal values for u_k (k = 0, . . . , N) which maximize or
minimize J. We denote the optimal value of J by J^*.
We will consider the case where x_k and u_k belong to R (the scalar case) but the
results can be readily extended to the case where x_k ∈ R^n and u_k ∈ R^p for some
integers n and p (the vector case).
In general the state and control variables cannot take just any value in R and we
have

    u_k ∈ U

where U, the control space, is a given subset of R. It is standard to take U
compact (bounded and closed) so that the optimization problems have a solution.
One can also consider a set U_k which depends on the stage.
We can finally formulate the problem as follows: Given x_0 ∈ R, find the optimal
sequence, which we denote {u_k^*}_{k=0}^N, such that u_k^* ∈ U_k for k = 0, . . . , N and

(3)    J^* = Σ_{k=0}^{N} f_k(x_k, u_k^*) = max_{{u_k}_{k=0}^N} Σ_{k=0}^{N} f_k(x_k, u_k)

where

(4)    x_{k+1} = g_k(x_k, u_k).
3. The DP algorithm
We define the value function J_k(x) for each stage k as follows.

Definition 1. For any x ∈ R, we define J_k(x) as

(5)    J_k(x) = max_{{u_i}_{i=k}^N} Σ_{i=k}^{N} f_i(x_i, u_i)

where

    x_k = x    and    x_{i+1} = g_i(x_i, u_i) for i = k, . . . , N − 1.
It is clear from the definition of the optimal control (3) that we have
    J^* = J_0(x_0).
We want to compute the functions J_k(x). We can see that J_N(x) is easy to obtain.
Indeed, we have

    J_N(x) = max_{u∈U_N} f_N(x, u)
so that J_N(x) is obtained by solving a standard maximisation problem (the only
unknown is u). The idea is then to compute J_k(x) for all values of x by going
backwards: we assume that J_{k+1}(x) is given and then we compute J_k(x) by using
the following proposition, which constitutes the fundamental principle in dynamic
programming.
Fundamental principle in dynamic programming. We have
(6)    J_k(x) = max_{u∈U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).
Proof. Given x ∈ R, for any sequence of controls {u_i}_{i=k}^N, we have

       Σ_{i=k}^{N} f_i(x_i, u_i) = f_k(x_k, u_k) + Σ_{i=k+1}^{N} f_i(x_i, u_i)
                                 ≤ f_k(x_k, u_k) + J_{k+1}(x_{k+1})               (by definition of J_{k+1})
                                 = f_k(x, u_k) + J_{k+1}(g_k(x, u_k))             (by (4))
(7)                              ≤ max_{u∈U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).

The right-hand side of (7) is a number which does not depend on the sequence
{u_i}_{i=k}^N. We take the maximum over all the sequences {u_i} on the left-hand side and
obtain that

    J_k(x) ≤ max_{u∈U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ).
It remains to prove the inequality in the other direction. From now on, we assume
that the maxima are always attained. Consider u^* which maximizes f_k(x, u) +
J_{k+1}(g_k(x, u)) and then u_i^* (i = k + 1, . . . , N) which maximizes Σ_{i=k+1}^{N} f_i(x_i, u_i)
where x_{k+1} = g_k(x, u^*) and x_{i+1} = g_i(x_i, u_i). Hence,

    max_{u∈U_k} ( f_k(x, u) + J_{k+1}(g_k(x, u)) ) = f_k(x, u^*) + J_{k+1}(g_k(x, u^*))
                                                   = f_k(x, u^*) + Σ_{i=k+1}^{N} f_i(x_i, u_i^*)
                                                   ≤ J_k(x).

The last inequality follows from the definition of J_k as a maximum, see (5).
To solve the problem, we can use the following algorithm.

DP algorithm. By using the fundamental principle of dynamic programming, we
compute J_k(x) for k = N, . . . , 0 (going backwards in k). The optimal value for J is
given by J^* = J_0(x_0). The optimal control sequence {u_k^*}_{k=0}^N is then recovered
going forwards: for k = 0, . . . , N,

    u_k^* = argmax_{u∈U_k} ( f_k(x_k, u) + J_{k+1}(g_k(x_k, u)) ),    x_{k+1} = g_k(x_k, u_k^*),

with x_0 given and the convention J_{N+1} = 0.
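For finite state and control spaces, the backward recursion and the forward recovery of the controls can be sketched as follows. This is a minimal illustration, not part of the notes: the function names and the tabular (dictionary) representation of J_k are implementation choices, and J_{N+1} = 0 is used as a convention so that the stage-N step needs no special case.

```python
# A minimal tabular sketch of the DP algorithm for finite state and control
# spaces (names are illustrative, not from the notes).
def dp_solve(states, controls, f, g, N):
    """Value tables J[k][x], maximizing sum_{k=0}^{N} f(k, x_k, u_k)
    subject to x_{k+1} = g(k, x_k, u_k)."""
    J = [dict() for _ in range(N + 2)]
    J[N + 1] = {x: 0.0 for x in states}          # convention J_{N+1} = 0
    for k in range(N, -1, -1):                   # backwards: k = N, ..., 0
        for x in states:
            J[k][x] = max(f(k, x, u) + J[k + 1][g(k, x, u)] for u in controls)
    return J

def dp_policy(J, x0, states, controls, f, g, N):
    """Recover an optimal control sequence forwards from x0 using (6)."""
    x, us = x0, []
    for k in range(N + 1):
        u = max(controls, key=lambda u: f(k, x, u) + J[k + 1][g(k, x, u)])
        us.append(u)
        x = g(k, x, u)
    return us

# Tiny illustration: reward 1 per stage when x = u = 1, next state is u.
states, controls, N = [0, 1], [0, 1], 2
f = lambda k, x, u: 1.0 if (x == 1 and u == 1) else 0.0
g = lambda k, x, u: u
J = dp_solve(states, controls, f, g, N)
print(J[0][1], dp_policy(J, 1, states, controls, f, g, N))  # 3.0 [1, 1, 1]
```

Note that all the tables J_k are kept in memory; this is exactly the memory cost discussed for the shortest path problem below.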
Let us use the DP algorithm to solve the example of the first section. We compute
J_N(x); we have

    J_N(x) = max_{u∈[0,1]} √(xu) = √x.
At the last stage, the consumer spends all the money left. We compute J_{N−1}(x);
we have

    J_{N−1}(x) = max_{u∈[0,1]} ( √(xu) + J_N(g(x, u)) )

so that

    J_{N−1}(x) = max_{u∈[0,1]} ( √(xu) + √(ρ(1 − u)x) )
               = √x · max_{u∈[0,1]} ( √u + √ρ √(1 − u) ).
We want to maximize the function φ(u) = √u + √ρ √(1 − u). We have

    φ′(u) = 1/(2√u) − √ρ/(2√(1 − u))

and φ′(u^*) = 0 if and only if

    u^* = 1/(1 + ρ).

Then, we have φ(u^*) = √(1 + ρ). Since u^* is the only extremum in (0, 1) and
φ(u^*) ≥ φ(0) = √ρ and φ(u^*) ≥ φ(1) = 1, u^* is the maximum. Hence,

    J_{N−1}(x) = √(1 + ρ) √x.
We compute J_{N−2}(x); we have

    J_{N−2}(x) = max_{u∈[0,1]} ( √(xu) + J_{N−1}(g(x, u)) )

so that

    J_{N−2}(x) = max_{u∈[0,1]} ( √(xu) + √(1 + ρ) √(ρ(1 − u)x) )
               = √x · max_{u∈[0,1]} ( √u + √(ρ + ρ²) √(1 − u) ).

We want to maximize the function φ̃(u) = √u + √(ρ + ρ²) √(1 − u). We observe that
φ̃ is obtained from φ by replacing ρ by ρ + ρ². Hence, the maximum of φ̃ is equal
to √(1 + ρ + ρ²) and is reached for u^* = 1/(1 + ρ + ρ²). Thus,

    J_{N−2}(x) = √(1 + ρ + ρ²) √x.
By induction, we prove that

    J_{N−p}(x) = √(1 + ρ + ρ² + · · · + ρ^p) √x = √( (ρ^{p+1} − 1)/(ρ − 1) · x ).

Hence, the optimal value is

    J^* = √( (ρ^{N+1} − 1)/(ρ − 1) · x_0 )

and is obtained by choosing

    u^*_{N−p} = (ρ − 1)/(ρ^{p+1} − 1).
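The induction step can be checked numerically: one step of the backward recursion, evaluated by grid search over u, should reproduce the closed form √(1 + ρ + · · · + ρ^p) √x. The values ρ = 1.05 and x = 100 below are illustrative assumptions, not taken from the text.

```python
import math

# Illustrative values (assumed): any rho > 1 and x > 0 would do.
rho, x = 1.05, 100.0

def phi_max(c):
    """max over u in [0,1] of sqrt(u) + sqrt(c) * sqrt(1 - u), by grid search."""
    return max(math.sqrt(u) + math.sqrt(c * (1 - u))
               for u in (i / 10_000 for i in range(10_001)))

# One recursion step gives J_{N-p}(x) = sqrt(x) * max_u (sqrt(u) + sqrt(rho * s_{p-1}) sqrt(1-u))
# with s_p = 1 + rho + ... + rho^p; the claim is J_{N-p}(x) = sqrt(s_p * x).
s = 1.0                              # s_0, i.e. J_N(x) = sqrt(x)
for p in range(1, 6):
    s_next = 1.0 + rho * s           # s_p = 1 + rho * s_{p-1}
    lhs = math.sqrt(x) * phi_max(rho * s)
    rhs = math.sqrt(s_next * x)
    assert abs(lhs - rhs) < 1e-3     # grid search matches the closed form
    s = s_next

print(s)  # s_5 = 1 + rho + ... + rho^5
```

The grid search is a check, not a method: the whole point of the closed form is that no numerical maximization is needed.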
4. The shortest path problem
We consider N nodes. Some of the nodes are connected. The lengths between the
connected nodes are given. There is a starting node that we denote s and an ending
node that we denote t. The shortest path problem consists of finding the shortest
path between s and t. Figure 1 gives an example of such a graph. The length between
two connected nodes is indicated in the figure. We order the nodes and give them
Figure 1. Example of a graph for a shortest path problem
a number from 1 to N. Let f(i, j) be the length between the connected nodes i
and j. A path of p nodes is a sequence of nodes x_k ∈ {1, . . . , N} for k = 1, . . . , p
such that x_k and x_{k+1} are connected, x_1 = s and x_p = t. The length of the path
{x_k}_{k=1}^p is given by

(8)    Σ_{k=1}^{p−1} f(x_k, x_{k+1}).

The solution of the shortest path problem is a path {x_k}_{k=1}^p which minimizes the
length given by (8).
We now want to rewrite the shortest path problem as a DP problem. We extend
the definition of f(i, j) to any pair of nodes (not just the connected ones) by setting

    f(i, j) = ∞ if the nodes i and j are not connected
and f(i, i) = 0. We make the following assumptions:
• There does not exist any cyclic path of negative length. If this assumption
is not fulfilled and a cycle with negative length can be reached
from s, then the problem does not admit a solution, as any path can always
be improved by taking extra loops in this cycle.
• There exists at least one path of finite length which connects s to t.
With these assumptions, it is clear that an optimal path exists and that it contains
at most N nodes. We consider the DP problem given by minimizing
(9)    J = Σ_{k=1}^{N−1} f(x_k, u_k)

where

    x_1 = s,    x_{k+1} = u_k
and we take u_k ∈ U_k where U_k = {1, . . . , N − 1} for k = 1, . . . , N − 2 and U_{N−1} =
{t}. This DP problem is equivalent to the shortest path problem. In the DP
formulation (9), we have a fixed number of stages N − 1 while p was variable in the
shortest path problem formulation (8). We find the actual number of nodes in the
optimal path by removing the repeated nodes in the solution of the DP problem.
Let us consider the example above with s = 1 and t = 6. The function f is given in
Figure 2. By using (6), we compute recursively the values of J_k(i) for k = 1, . . . , 5
and i = 1, . . . , 6 and the results are given in Figure 3. For illustration purposes, we
consider in detail the computation of J_4(1). We have
    J_4(1) = min_x ( f(1, x) + J_5(x) ).

Since

    f(1, ·) + J_5(·) = (0, 3, 2, ∞, ∞, ∞) + (∞, 6, ∞, 1, 3, 0) = (∞, 9, ∞, ∞, ∞, ∞),

we have J_4(1) = 9.
From the results in Figure 3, we get that the optimal length is J^* = J_1(s) = J_1(1) =
5. To find the optimal path, we have to solve

(10)    x_{k+1} = argmin_{x∈{1,...,N}} ( f(x_k, x) + J_{k+1}(x) ).

Hence, we get

    x_1 = 1,   x_2 = 1,   x_3 = 3,   x_4 = 5,   x_5 = 4,   x_6 = 6.
There is one repeated node (x1 = x2 ).
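The whole computation can be reproduced in a few lines. The sketch below is illustrative: the matrix encodes f as read off Figure 2, and the 0-indexed node numbering is an implementation choice.

```python
# Recomputing the tables of Figure 3 for the example graph (f as in
# Figure 2), using the backward recursion (6) with min instead of max.
INF = float("inf")
f = [
    [0,   3,   2,   INF, INF, INF],
    [3,   0,   INF, 2,   INF, 6],
    [2,   INF, 0,   5,   1,   INF],
    [INF, 2,   5,   0,   1,   1],
    [INF, INF, 1,   1,   0,   3],
    [INF, 6,   INF, 1,   3,   0],
]
N, s, t = 6, 0, 5                     # nodes are 0-indexed here: s = 1, t = 6

# J[k][i] for stages k = 1, ..., N-1; the last stage is forced to end at t.
J = [None] * N
J[N - 1] = [f[i][t] for i in range(N)]
for k in range(N - 2, 0, -1):
    J[k] = [min(f[i][j] + J[k + 1][j] for j in range(N)) for i in range(N)]

print(J[1][s])                        # optimal length J* = J_1(1) = 5

# Forward recovery of the path, x_{k+1} = argmin_j (f(x_k, j) + J_{k+1}(j)):
path, x = [s], s
for k in range(1, N - 1):
    x = min(range(N), key=lambda j: f[x][j] + J[k + 1][j])
    path.append(x)
path.append(t)                        # U_{N-1} = {t} forces the final node
print([p + 1 for p in path])          # [1, 1, 3, 5, 4, 6] in 1-indexed nodes
```

Removing the repeated node at the start gives the actual shortest path 1, 3, 5, 4, 6 of length 5.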
To compute the shortest path, a straightforward method is to consider all the paths,
compute the length of each of them and find the smallest. Since there are N nodes,
there exist of the order of (N − 2)! paths and we can roughly estimate by N · N! the
number of operations needed to compute all the lengths. The question is how this
method compares with the DP algorithm. In the DP algorithm, we have to find the
functions J_k, that is, compute J_k(x_i) for k = 1, . . . , N and i = 1, . . . , N (in total
N · N values to compute). To compute each J_k(x_i), we need to solve a minimization
problem which requires N operations. Hence, for the DP algorithm, the number of
operations is of order N³. Since, for N large, N³ is much smaller than N · N!, the
DP algorithm is computationally advantageous. However, it requires more memory
(all the J_k have to be stored), which is not the case for the first approach.
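To get a feel for the gap between the two rough estimates, one can tabulate N · N! against N³ for a few values of N (the chosen values of N are arbitrary):

```python
import math

# Rough operation counts from the discussion above: about N * N! for
# brute-force enumeration versus about N**3 for the DP algorithm.
for N in (5, 10, 15):
    brute = N * math.factorial(N)
    dp = N ** 3
    print(N, brute, dp)   # the gap widens extremely fast with N
```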
         1   2   3   4   5   6
    1    0   3   2   ∞   ∞   ∞
    2    3   0   ∞   2   ∞   6
    3    2   ∞   0   5   1   ∞
    4    ∞   2   5   0   1   1
    5    ∞   ∞   1   1   0   3
    6    ∞   6   ∞   1   3   0

Figure 2. The value of f(i, j) is given by the element (i, j) in the table.
         1   2   3   4   5   6
    J5   ∞   6   ∞   1   3   0
    J4   9   3   4   1   2   0
    J3   6   3   3   1   2   0
    J2   5   3   3   1   2   0
    J1   5   3   3   1   2   0

Figure 3. Computation of J_k(i) for k = 1, . . . , 5 and i = 1, . . . , 6.