J Optim Theory Appl (2013) 156:380–416 DOI 10.1007/s10957-012-0118-2

Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results

Mauro Gaggero · Giorgio Gnecco · Marcello Sanguineti

Received: 4 August 2011 / Accepted: 21 June 2012 / Published online: 6 July 2012
© Springer Science+Business Media, LLC 2012

Abstract Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insights into the successful performances reported in the literature on the use of value-function approximators in DP. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.

Keywords Sequential decision problems · Dynamic programming · Approximation schemes · Curse of dimensionality · Suboptimal solutions · Optimal consumption

Communicated by Francesco Zirilli.

M. Gaggero
Institute of Intelligent Systems for Automation, National Research Council of Italy, Genova, Italy
e-mail: [email protected]

G. Gnecco · M. Sanguineti
DIBRIS, University of Genova, Genova, Italy
e-mail: [email protected]

G. Gnecco
e-mail: [email protected]

1 Introduction

Tasks that require making sequential decisions so as to maximize a reward (or minimize a cost) expressed as a summation over stages arise in a variety of applications. Often, a model of the process evolving through the stages is assumed to be available, and the decisions taken at each step depend on a "state variable" that captures the "history" of the process. Examples can be found in scheduling fleets of vehicles, managing systems of water reservoirs, allocating resources (e.g., people, equipment, commodities, and facilities), selling assets, investing money in portfolios, optimizing transportation or telecommunication networks, inventory forecasting, financial planning, etc. Depending on the application context, both continuous and discrete states are considered. In this paper, we are interested in the case where the state can take a continuum of values.

Sequential decision problems have been extensively studied by means of the Dynamic Programming (DP) methodology [1]. DP solves sequential decision problems iteratively, by introducing at each stage the value function (called cost-to-go function when a cost has to be minimized), which expresses the optimal value of the reward from the current stage onward as a function of the state at that stage. The solution is formally obtained by means of recursive equations. However, closed-form solutions to such equations can be derived only in particular cases. In general, one has to search for suboptimal solutions [2–5]. We refer to the various techniques and algorithms developed to this end as Approximate Dynamic Programming (ADP).
Effective approximation approaches require understanding the structure of the problem at hand. In the case of continuous problems, they typically share the feature of combining DP with tools from approximation theory, so as to replace the value functions with simple approximators (e.g., orthogonal polynomials, splines, and neural networks [6]) containing parameters to be optimized (e.g., the coefficients of orthogonal polynomials, or the weights and connections in the computational units of neural networks). Knowledge of smoothness properties of the value functions is useful to choose suitable approximation strategies (see, e.g., the discussion in [3, Chap. 11]).

In the basic version of ADP, the current and next state vectors are first discretized by using a number of levels in each of their components. In such a way, the application of DP requires solving the recursive equations only for a finite number of state values. However, in order to obtain the solution of the original sequential decision problem, one needs to know each value function not merely at the discretized states, but also at the other states that might be "visited" by the DP algorithm. Hence, as a second step, a suitable technique has to be applied to approximate each value function at the states outside the discretization set (see, e.g., [3, Chaps. 7 and 11] and [7, Chap. 6]).

The idea of combining DP with approximations of the value functions arose at the very beginning of DP. After the seminal contributions [1] and [8], it is possible to trace an evolution going from polynomial approximation [8–10] to spline interpolation [11, 12] and neural networks [5, 13]. Interpolation methods were developed in [14] to approximate the value functions of high-dimensional problems. Several methods that involve the use of neural networks were presented in [2] under the name of "neuro-dynamic programming". A nice exposition of approximation methods for continuous-state problems can be found in [15]. Among recent monographs on ADP methods and algorithms, we mention [3] and [4].

The aim of this paper is to investigate how DP and suitable approximations of the value functions can be combined to develop a methodology that allows one to face high-dimensional, continuous-state, sequential decision problems. As pointed out in [2, Chap. 6, p. 335], in order to have performance guarantees in value-function approximation, "the function approximator must be able to closely represent all of the intermediate cost-to-go functions" . . . "Given that such a condition is in general very difficult to verify, one must either accept the risk of a divergent algorithm (when the decision horizon goes to infinity) or else restrict to particular types of function approximators under which divergent behavior becomes impossible". The search for such approximators is the departure point of our work.

We consider approximations of the value functions taking on the form of linear combinations of basis functions obtained from a chosen "mother function" by varying some "inner parameters". For instance, the mother function can be a Gaussian, in which case the inner parameters are the variance and the coordinates of its center. Such inner parameters have to be optimized together with the coefficients of the linear combinations.
Since the basis functions can vary with the inner parameters, while the structure of the mother function remains unchanged, one has a variable-basis approximation scheme [16]. In contrast, traditional approaches are fixed-basis approximation schemes, as they are made up of linear combinations of a certain number of a priori fixed basis functions (e.g., algebraic and trigonometric polynomials). In particular, we consider variable-basis functions that model approximators successfully used in the applications of ADP to sequential decision problems, such as splines, Gaussian radial-basis functions, and neural networks [6].

Unfortunately, the number of basis functions (hence, the number of coefficients to be optimized) required to guarantee a desired approximation accuracy of the value functions may grow "very fast" with the dimension of the state vector. This behavior, known as curse of dimensionality in value-function approximation, is a major source of difficulties in the development of computationally efficient ADP techniques. However, there is large experimental evidence of the effectiveness of variable-basis approximation schemes with certain mother functions (see, e.g., [2, 17], [3, Sect. 7.4], and [18]). This calls for a thorough theoretical investigation, which, to the best of our knowledge, is still lacking in the literature (see, e.g., the remarks in [3, Sect. 7.4] and [7, Chap. 6]). For example, quoting [19, p. 61], the use of neural networks to approximate value functions "has led to some successes", but "it is very hard to quantify or analyze the performance of neural-network-based techniques."

In this paper, we provide such an analysis for a variety of variable-basis approximators used in ADP, including several kinds of neural networks. Our results give conditions under which a relatively small number of parameterized basis functions (e.g., the number of computational units in neural networks; hence, the total number of parameters to be adjusted) is large enough to guarantee sufficiently accurate suboptimal solutions. The numerical comparison between variable-basis schemes and linear approximators in the solution of optimization problems has received very little attention in the literature (we are aware only of the papers [20, 21] and the monograph [5], in preparation). To gain insights in this direction, we compare from a numerical point of view the proposed variable-basis approximation schemes with fixed-basis ones, showing that the former provide, in general, a better accuracy in the approximation of the value functions, the number of computational units being the same. To the best of our knowledge, estimates of this kind have not been derived before.

The paper is organized as follows. Section 2 states the considered sequential decision problem, summarizes the DP algorithm, and estimates the error propagation in ADP with value-function approximation. Section 3 derives smoothness properties of the value functions. In Sect. 4, such properties are combined with tools from approximation theory, in order to investigate the accuracy of suboptimal solutions obtained via our methodology. Section 5 applies the approach to a problem of optimal consumption. Section 6 describes the ADP procedure suggested by our theoretical study. Simulation results are given in Sect. 7, where variable-basis schemes are compared with fixed-basis ones. Section 8 contains some final remarks.
To keep continuity of exposition, all proofs are collected in the Appendix.

2 Error Propagation

We consider the following model of an N-stage, continuous-state, sequential decision problem, in which a reward functional, expressed as a summation of N terms, has to be maximized [22].

Problem $\Sigma_N$. For $x_0 \in X_0$, find
\[
J^o(x_0) := \sup \left\{ \sum_{t=0}^{N-1} \beta^t h_t(x_t, x_{t+1}) + \beta^N h_N(x_N) \right\}
\]
\[
\text{s.t. } (x_t, x_{t+1}) \in D_t, \quad t = 0, 1, \ldots, N-1, \quad x_t \in X_t,
\]
where: $x_t \in X_t \subseteq \mathbb{R}^{d_t}$, $t = 0, 1, \ldots, N$, is the state vector at time t, and $X_t$ is the state space; the next state $x_{t+1}$, $t = 0, 1, \ldots, N-1$, has to be chosen subject to the constraint $(x_t, x_{t+1}) \in D_t$, where $D_t \subseteq X_t \times X_{t+1}$ is a correspondence that models the transition from stage t to stage t + 1; $h_t : D_t \to \mathbb{R}$, $t = 0, 1, \ldots, N-1$, are transition rewards that depend only on the current and next states; $h_N : X_N \to \mathbb{R}$ is the final reward, which depends only on the final state $x_N$; $\beta \in [0, 1]$ is a fixed discount factor (the case $\beta = 0$ corresponds to a static optimization problem).

Under mild hypotheses on the correspondence and the reward functions [1, 7, 23], DP allows one to formally solve Problem $\Sigma_N$ in an iterative way, by introducing, for $t = N-1, \ldots, 0$, the following subproblems:
\[
J_N^o(x_N) := h_N(x_N), \tag{1a}
\]
\[
J_t^o(x_t) := \sup \sum_{k=t}^{N-1} \beta^{k-t} h_k(x_k, x_{k+1}) + \beta^{N-t} h_N(x_N) \tag{1b}
\]
\[
\text{s.t. } (x_k, x_{k+1}) \in D_k, \quad k = t, \ldots, N-1. \tag{1c}
\]
The function $J_t^o : X_t \to \mathbb{R}$ is called the tth value function (tth cost-to-go function when a cost has to be minimized). Subproblems (1a)–(1c) can be restated in terms of Bellman's operators $T_t$, defined for every $t = N-1, \ldots, 0$ and every bounded continuous function $f_{t+1}$ on $X_{t+1}$ as
\[
(T_t f_{t+1})(x_t) := \sup_{y \in D_t(x_t)} \big\{ h_t(x_t, y) + \beta f_{t+1}(y) \big\},
\]
where, for $x_t \in X_t$, we let $D_t(x_t) := \{ y \in X_{t+1} \mid (x_t, y) \in D_t \}$. In terms of Bellman's operators, subproblems (1a)–(1c) satisfy the following recursive equations for the value functions, known as Bellman's equations:
\[
J_N^o(x_N) = h_N(x_N), \tag{2a}
\]
\[
J_t^o(x_t) = (T_t J_{t+1}^o)(x_t), \quad t = N-1, \ldots, 0. \tag{2b}
\]
Their iterated application is the DP algorithm. The solution to Problem $\Sigma_N$ is given by $J_0^o(x_0) = J^o(x_0)$.

For $t = 0, \ldots, N-1$, let us denote by $\tilde J_t^o$ the approximation of the tth value function $J_t^o$, obtained from a given approximating family $F_t$. The use of approximate value functions is the essence of the ADP algorithm. The last stage does not require any approximation, as one can set $\tilde J_N^o = J_N^o := h_N$. Suppose that, at a certain stage t + 1, one has at one's disposal an approximation $\tilde J_{t+1}^o$ (obtained from previous iterations of ADP) of the (t + 1)th value function $J_{t+1}^o$. By replacing $J_{t+1}^o$ with $\tilde J_{t+1}^o$ in Eqs. (2a)–(2b), one gets, instead of $J_t^o(x_t)$,
\[
\hat J_t^o(x_t) = \big(T_t \tilde J_{t+1}^o\big)(x_t).
\]
In general, $\hat J_t^o \neq J_t^o$. Thus, there is an error propagation from one iteration of ADP to the following one. Since $\hat J_t^o$ may not belong to the approximating family $F_t$ used at stage t, before performing the next iteration of ADP one has to approximate $\hat J_t^o$ by an element $\tilde J_t^o \in F_t$ (this is done, e.g., by choosing the function $\tilde J_t^o$ that minimizes a suitable error between $\tilde J_t^o$ and $\hat J_t^o$). Suppose that one is able to guarantee that the approximation $\tilde J_{t+1}^o$ of $J_{t+1}^o$ is "sufficiently accurate"; then, one would like the approximations $\hat J_t^o$ of $J_t^o$ and $\tilde J_t^o$ of $\hat J_t^o$ (hence of $J_t^o$) to be "sufficiently accurate" too.
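To make the backward recursion concrete, the following minimal sketch (not from the paper; the scalar state, the reward, the feasible set, and the polynomial fitting family are assumptions chosen only for illustration) computes $\hat J_t^o = T_t \tilde J_{t+1}^o$ at sampled states and fits an element of $F_t$ to it — exactly the step whose error propagation is analyzed below.

```python
# A minimal sketch of one backward ADP iteration, assuming a scalar state,
# a hypothetical reward h_t, and a least-squares polynomial standing in
# for the approximating family F_t.
import numpy as np
from scipy.optimize import minimize_scalar

beta = 0.9

def h_t(x, y):
    # hypothetical transition reward (placeholder for the problem's h_t)
    return -0.5 * x**2 - 0.5 * (y - 0.5 * x)**2

def bellman_apply(J_next, x, y_lo=0.0, y_hi=1.0):
    # (T_t J_next)(x) = max_{y in D_t(x)} { h_t(x, y) + beta * J_next(y) },
    # with D_t(x) = [y_lo, y_hi] as a stand-in feasible set
    res = minimize_scalar(lambda y: -(h_t(x, y) + beta * J_next(y)),
                          bounds=(y_lo, y_hi), method="bounded")
    return -res.fun

def adp_step(J_next, samples):
    # evaluate \hat J_t at the sampled states, then fit \tilde J_t in F_t
    values = np.array([bellman_apply(J_next, x) for x in samples])
    coeffs = np.polyfit(samples, values, deg=4)
    return lambda x: np.polyval(coeffs, x)

J_tilde = lambda x: 1.0 - 0.5 * x**2      # stand-in for \tilde J_{t+1}
samples = np.linspace(0.0, 1.0, 50)       # discretized states of X_t
J_t = adp_step(J_tilde, samples)          # \tilde J_t, used at stage t-1
```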
In order to investigate how the error propagates from one iteration of ADP to the next one, a suitable norm has to be chosen to evaluate the approximation error. Given a Lebesgue-measurable set $\Omega \subseteq \mathbb{R}^d$ and $1 \le p \le +\infty$, we denote by $L_p(\Omega)$ the corresponding Lebesgue space, where integration is performed with respect to the Lebesgue measure (for bounded and continuous functions, the $L_\infty$ norm coincides with the supremum norm). For our purposes, the following assumption is needed.

Assumption 2.1 The final reward $h_N$ is bounded and continuous on $X_N$, and, for $t = 0, \ldots, N-1$, the transition reward $h_t$ is bounded and continuous on $D_t$. For every $x_t \in X_t$, the set $D_t(x_t)$ is nonempty.

We shall exploit the following known result (see, e.g., [22, Theorem 3.3, p. 54]).

Proposition 2.1 If Assumption 2.1 holds and $f_{t+1}$, $k_{t+1}$ are bounded and continuous on $X_{t+1}$, then
\[
\sup_{x_t \in X_t} \big| (T_t f_{t+1})(x_t) - (T_t k_{t+1})(x_t) \big| \le \beta \sup_{x_{t+1} \in X_{t+1}} \big| f_{t+1}(x_{t+1}) - k_{t+1}(x_{t+1}) \big|.
\]

The next proposition provides an upper bound on the approximation error through the N stages; the estimate in (i) is analogous to the one derived in [2, pp. 332–333] for infinite-horizon problems.

Proposition 2.2 (Error Propagation in ADP) Let Assumption 2.1 hold. Suppose that, for every $t = 0, 1, \ldots, N$, $J_t^o$ is bounded and continuous and, for $t = 0, 1, \ldots, N-1$, let $F_t$ be a family of bounded and continuous functions on $X_t$. Let $\tilde J_N^o = J_N^o := h_N$.
(i) If, for $t = 0, 1, \ldots, N-1$, there exists $f_t \in F_t$ such that
\[
\sup_{x_t \in X_t} \big| \big(T_t \tilde J_{t+1}^o\big)(x_t) - f_t(x_t) \big| \le \varepsilon_t \tag{3}
\]
and one takes $\tilde J_t^o = f_t$, then
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} \beta^t \varepsilon_t. \tag{4}
\]
(ii) If, for every $t = 0, 1, \ldots, N-1$, there exists $f_t \in F_t$ such that
\[
\sup_{x_t \in X_t} \big| J_t^o(x_t) - f_t(x_t) \big| \le \varepsilon_t, \tag{5}
\]
then one can choose each $\tilde J_t^o$ on the basis of the information available at every stage t in such a way that
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} (2\beta)^t \varepsilon_t. \tag{6}
\]

Condition (3) requires that, at every stage t, the family $F_t$ be "flexible enough" to approximate with a desired degree of accuracy every function of the form $T_t \tilde J_{t+1}^o$, obtained by applying Bellman's operator to the previous-stage approximation $\tilde J_{t+1}^o$. As remarked in [2, p. 335], for small values of $\varepsilon_t$, this may be rather difficult. In Sect. 4, we shall provide conditions under which (3) holds for suitable classes of functions $F_t$. Condition (5) is a much weaker requirement than (3): it expresses the capability of $F_t$ to approximate, at every stage t, merely the true value function $J_t^o = T_t J_{t+1}^o$. However, by replacing (3) with (5), one gets, instead of (4), the upper bound (6), where the factor $2\beta$ replaces $\beta$ inside the summation. When $\varepsilon_t$ does not depend on t, for $\beta \in [1/2, 1]$ the bound (6) may exhibit a curse of time horizon (we use this terminology by analogy with the curse of dimensionality). This may happen, e.g., when the same family of approximators is used at each stage and the value functions at all stages are "almost identical". When the transition rewards are stage-independent, the latter is reasonable for N large enough, since, as the horizon N tends to infinity, the tth value functions typically converge to a stationary value function. In this case, the right-hand side of (6) becomes a geometric series.
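The following few lines (a numerical illustration with assumed values of $\beta$ and $\varepsilon_t$, not results from the paper) evaluate the right-hand sides of (4) and (6) and make the curse of time horizon visible: for $\beta \ge 1/2$ and stage-independent $\varepsilon_t$, the bound (6) grows geometrically with N.

```python
# Numerical illustration (assumed values) of the error-propagation bounds.
def bound_4(beta, eps, N):   # sum_t beta^t * eps_t, eq. (4)
    return sum(beta**t * eps for t in range(N))

def bound_6(beta, eps, N):   # sum_t (2*beta)^t * eps_t, eq. (6)
    return sum((2.0 * beta)**t * eps for t in range(N))

for N in (5, 10, 20):
    print(N, bound_4(0.9, 1e-3, N), bound_6(0.9, 1e-3, N))
# bound (4) stays below eps/(1-beta), while bound (6) blows up with N
```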
So, in the case of a "nearly constant" value of $\varepsilon_t$ and a sufficiently large horizon N, computational feasibility limits the range of applicability of Proposition 2.2(ii) to $\beta \in [0, 1/2)$; if, instead, the horizon is sufficiently small, then the whole range $\beta \in [0, 1]$ works. Values of β not very close to 1 are sometimes encountered in sequential decision problems (see, e.g., [24]). For instance, the role of small discount factors in the convergence rates of algorithms for discounted Markov decision problems was investigated in [25]. Small values of β arise, e.g., in optimal control of admission to multiserver queues and single-server queues with random vacation periods of the server [26], adaptive critics for control [27], technology adoption in economic growth [28], management of renewable resources [29], debt dynamics and sustainability [30], etc.

When $\beta \in [1/2, 1)$, there are two possibilities: either making the stronger requirement (3) satisfied, or resorting to the technique known as multistage lookahead (see, e.g., [2, Sect. 6.1.2], [7, Sect. 6.3], and [31]), considered in the remainder of this section. Let M be a fixed positive integer such that the horizon N is a multiple of M (we assume this for simplicity, but, at the expense of a heavier notation, the result can be easily generalized to the nonmultiple case). Starting from $\tilde J_{N/M}^o := J_N^o$, for $t = N/M - 1, \ldots, 0$, one searches for a function that approximates $\hat J_t^{o(M)} := T_t^{(M)} \tilde J_{t+1}^o$, where $T_t^{(M)} = T_{Mt} \cdots T_{M(t+1)-1}$ is the operator obtained by applying the M Bellman operators $T_{M(t+1)-1}, \ldots, T_{Mt}$ in this order. We denote by ADP(M) the corresponding value-function approximation procedure. For the use of multistage lookahead in value-function approximation, see also [2, Sect. 6.1.2]. Advantages of this technique (which, of course, are obtained at the expense of heavier computations) are pointed out, e.g., in [2, pp. 266, 375].

Proposition 2.3 (Error Propagation in ADP with Multistage Lookahead) Let Assumption 2.1 hold. Suppose that, for t = 0, 1, ..., N, $J_t^o$ is bounded and continuous, and let M be a positive integer such that the horizon N is a multiple of M. For every t = 0, ..., N/M − 1, let $F_t$ be a family of bounded and continuous functions on $X_{Mt}$, and let $\tilde J_{N/M}^o = J_N^o = h_N$. If, for t = 0, 1, ..., N/M − 1, there exists $f_t \in F_t$ such that
\[
\sup_{x_{Mt} \in X_{Mt}} \big| J_{Mt}^o(x_{Mt}) - f_t(x_{Mt}) \big| \le \varepsilon_t, \tag{7}
\]
then the approximation error of ADP(M) is bounded from above as follows:
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N/M-1} \big( 2\beta^M \big)^t \varepsilon_t. \tag{8}
\]

According to the bound (8), the above-mentioned curse of time horizon is avoided for $0 \le \beta < (1/2)^{1/M}$, which, already for reasonably small values of M, covers a much wider range of values than the bound (6) from Proposition 2.2. For example, for M = 2, 3, and 4, we get the interval $0 \le \beta < \beta_{\max}$ with $\beta_{\max} \approx 0.7$, 0.8, and 0.84, respectively. This is obtained at the expense of computing $\hat J_t^{o(M)} = T_t^{(M)} \tilde J_{t+1}^o$, which amounts to solving, for every t, the nonlinear programming problem
\[
\hat J_t^{o(M)}(x_t) = \max_{\substack{x_{t+1}, \ldots, x_{t+M}:\ x_{k+1} \in D_k(x_k),\\ k = t, \ldots, t+M-1}} \left\{ \sum_{k=t}^{t+M-1} \beta^{k-t} h_k(x_k, x_{k+1}) + \beta^M \tilde J_{t+1}^o(x_{t+M}) \right\}.
\]
From the computational point of view, the number of real variables involved in the equation above is d · M, where d is the dimension of the state vector. When d · M is not too large, this approach may be computationally feasible [2, Sect. 6.1.2].
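The next sketch illustrates the M-stage lookahead evaluation as a single nonlinear program in the M decision variables (d = 1 here); the scalar state, the box-shaped feasible sets, and the reward are assumptions for illustration only.

```python
# A sketch of the M-stage lookahead evaluation \hat J^{o(M)}_t(x_t):
# the M next states are stacked into one vector and optimized jointly.
import numpy as np
from scipy.optimize import minimize

beta, M = 0.9, 2

def h(k, x, y):
    # hypothetical stage-k transition reward
    return -0.5 * x**2 - 0.1 * (y - x)**2

def lookahead_value(J_next, x_t, t, lo=0.0, hi=1.0):
    def neg_reward(z):
        # z = (x_{t+1}, ..., x_{t+M})
        states = np.concatenate(([x_t], z))
        total = sum(beta**k * h(t + k, states[k], states[k + 1])
                    for k in range(M))
        return -(total + beta**M * J_next(states[-1]))
    z0 = np.full(M, 0.5 * (lo + hi))
    res = minimize(neg_reward, z0, bounds=[(lo, hi)] * M)
    return -res.fun

J_next = lambda x: 1.0 - 0.5 * x**2
print(lookahead_value(J_next, 0.3, t=0))
```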
3 Smoothness of the Value Functions

The smoothness results of this section are expressed in terms of $L_p$-norms of partial derivatives, so the natural setting is provided by Sobolev spaces (see [32] and the references therein for other smoothness results available in the literature). For a smooth function $f : X \to \mathbb{R}$ and a (multi-)index $r := (r_1, \ldots, r_d) \in \mathbb{N}_0^d$, we let $|r| := \sum_{i=1}^d r_i$ and $D^r f(x) := \frac{\partial^{|r|} f}{\partial x_1^{r_1} \cdots \partial x_d^{r_d}}(x)$. For an open set $\Omega \subseteq \mathbb{R}^d$, a positive integer m, and $1 \le p \le \infty$, by $W_p^m(\Omega)$ we denote the Sobolev space of functions on Ω whose (distributional) partial derivatives up to order m are in $L_p(\Omega)$, endowed with the norm
\[
\|f\|_{W_p^m(\Omega)} :=
\begin{cases}
\left( \sum_{0 \le |r| \le m} \|D^r f\|_{L_p(\Omega)}^p \right)^{1/p} = \left( \sum_{0 \le |r| \le m} \int_\Omega |D^r f(x)|^p \, dx \right)^{1/p} & \text{if } 1 \le p < +\infty, \\[2mm]
\max_{0 \le |r| \le m} \|D^r f\|_{L_\infty(\Omega)} = \max_{0 \le |r| \le m} \operatorname{ess\,sup}_{x \in \Omega} |D^r f(x)| & \text{if } p = +\infty.
\end{cases}
\]
As we shall deal with continuous functions, "ess sup" can be replaced by "sup".

At stage t + 1, the state $x_{t+1}$ can be written as the value assumed by a function $g_t : X_t \to X_{t+1}$, under the constraint that $g_t(x_t) \in D_t(x_t)$ for every $x_t \in X_t$, and Problem $\Sigma_N$ can be viewed as the maximization of a reward that depends on the N functions $g_0, \ldots, g_{N-1}$, called policy functions or simply policies. We have
\[
g_t^o(x_t) \in \operatorname*{argmax}_{y \in D_t(x_t)} \big\{ h_t(x_t, y) + \beta J_{t+1}^o(y) \big\}, \tag{9}
\]
and so optimal policy functions can be obtained as a byproduct of DP. In writing (9), we are supposing that, for every $t = N-1, \ldots, 0$ and every $x_t \in X_t$, $\max_{y \in D_t(x_t)} [h_t(x_t, y) + \beta J_{t+1}^o(y)]$ exists. This happens, e.g., when $D_t(x_t)$ is nonempty and compact and $h_t(x_t, y) + \beta J_{t+1}^o(y)$ is continuous in y.

We make the following assumption. Recall that, for a convex set $X \subseteq \mathbb{R}^d$ and $\alpha \in \mathbb{R}$, a function $f : X \to \mathbb{R}$ is α-concave on X if and only if $f(x) + \frac{1}{2}\alpha \|x\|^2$ is concave on X (for α = 0, we get the standard definition of concavity). For a positive integer m, we denote by $C^m(X)$ the set of functions on X whose partial derivatives are continuous up to order m. If f is of class $C^2(X)$, then f is α-concave if and only if [33]
\[
\sup_{x \in X} \lambda_{\max}\big( \nabla^2 f(x) \big) \le -\alpha, \tag{10}
\]
where $\lambda_{\max}(\nabla^2 f(x))$ is the maximum eigenvalue of the Hessian $\nabla^2 f(x)$.

Assumption 3.1 Let m ≥ 2 be an integer. The following hold for t = 0, ..., N − 1:
(i) $X_t \subset \mathbb{R}^d$ and $D_t \subseteq X_t \times X_{t+1}$ are compact, convex, and have nonempty interiors;
(ii) there exist optimal policies $g_t^o$ that are continuous and interior on $\operatorname{int}(X_t)$, i.e., for every $x_t \in \operatorname{int}(X_t)$, $g_t^o(x_t) \in \operatorname{int}(D_t(x_t))$;
(iii) $h_t \in C^m(D_t)$, and there exists $\alpha_t > 0$ such that $h_t$ is $\alpha_t$-concave;
(iv) $h_N \in C^m(X_N)$ and is concave.

Note that Assumption 3.1 implies Assumption 2.1, and Assumption 3.1(iii) implies the concavity of $h_t$. Although the concavity of the transition and final rewards might appear restrictive, it is quite common in applications and typically assumed in the literature [7, 22, 23]. In many cases, the existence and continuity of the optimal policies can be checked through the Maximum Theorem [22, Theorem 3.6, p. 62] and its consequences [22, discussion on p. 63 and Exercise 3.11(a), p. 57], while Assumption 3.1(i), (iii), and (iv) can be enforced by the problem formulation.

The next proposition gives conditions under which the tth value functions are of class $C^m$ on compact sets that are closures of open sets, and each value function $J_t^o$ can be extended to the whole $\mathbb{R}^d$ by functions belonging to the Sobolev spaces $W_p^m(\mathbb{R}^d)$. We shall exploit such properties in Sect. 4.
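Condition (10) suggests a simple numerical test of α-concavity; the following sketch (with a hypothetical function f and finite-difference Hessians, not part of the paper's methodology) checks it at sampled points of X.

```python
# A small numerical check of the alpha-concavity test (10): f is
# alpha-concave iff the largest Hessian eigenvalue is <= -alpha on X.
import numpy as np

def hessian_fd(f, x, eps=1e-5):
    # central finite-difference approximation of the Hessian of f at x
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

f = lambda x: -np.sum(x**2) + 0.1 * np.sum(np.cos(x))  # hypothetical reward
alpha = 1.0
samples = np.random.uniform(-1.0, 1.0, size=(100, 2))  # sampled points of X
ok = all(np.linalg.eigvalsh(hessian_fd(f, x)).max() <= -alpha
         for x in samples)
print("alpha-concave on the sampled points:", ok)
```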
For $\Omega_1 \subset \Omega \subseteq \mathbb{R}^d$ and a function f on Ω, we denote by $f|_{\Omega_1}$ its restriction to $\Omega_1$ (similarly, f is an extension to Ω of $f|_{\Omega_1}$). For r > 0, by $B_1^r(\mathbb{R}^d)$ we denote the Bessel potential space [34, p. 134] of functions $u : \mathbb{R}^d \to \mathbb{R}$ that can be written as $u = G_r * \lambda$, where $G_r : \mathbb{R}^d \to \mathbb{R}$ is the inverse Fourier transform of $\hat G_r(\omega) := (2\pi)^{-d/2} (1 + \|\omega\|^2)^{-r/2}$, "∗" denotes the convolution operator, and $\lambda \in L_1(\mathbb{R}^d)$. We let $\|u\|_{B_1^r(\mathbb{R}^d)} := \|\lambda\|_{L_1(\mathbb{R}^d)}$.

Proposition 3.1 (Smoothness Properties of the Value Functions) Let Assumption 3.1 hold. For every t = 0, ..., N, the following hold.
(i) $J_t^o \in C^m(X_t)$;
(ii) for every $1 \le p \le \infty$, there exists a function $\bar J_t^{o,p} \in W_p^m(\mathbb{R}^d)$ such that $J_t^o = \bar J_t^{o,p}|_{X_t}$;
(iii) for every $1 < p < \infty$, there exists a function $\bar J_t^{o,p} \in B_p^m(\mathbb{R}^d)$ such that $J_t^o = \bar J_t^{o,p}|_{X_t}$. The same holds for p = 1 when m ≥ 2 is even.

For a "sufficiently good" approximation $\tilde J_{t+1}^o$ of $J_{t+1}^o$ and of its derivatives up to a suitable order, the following technical result provides smoothness properties of $\hat J_t^o := T_t \tilde J_{t+1}^o$.

Proposition 3.2 (Smoothness Properties of the Approximate Value Functions) Let Assumption 3.1 hold for t = 0, ..., N − 1 and $1 \le p \le \infty$, with item (ii) referred to the policies
\[
\tilde g_t^o(x_t) \in \operatorname*{argmax}_{y \in D_t(x_t)} \big\{ h_t(x_t, y) + \beta \tilde J_{t+1}^o(y) \big\} \tag{11}
\]
instead of the optimal policies, and suppose that $h_N$ is $\alpha_N$-concave for some $\alpha_N > 0$.
(i) If $\tilde J_{t+1}^o \in C^m(X_{t+1})$ is concave, then $T_t \tilde J_{t+1}^o \in C^m(X_t)$, and there exists $\hat J_t^{o,p} \in W_p^m(\mathbb{R}^d)$ such that $T_t \tilde J_{t+1}^o = \hat J_t^{o,p}|_{X_t}$.
(ii) There exists a function $\bar J_t^{o,p} \in W_p^m(\mathbb{R}^d)$ such that $J_t^o = \bar J_t^{o,p}|_{X_t}$.
(iii) For j = 1, 2, ..., let $\tilde J_{t+1,j}^o \in C^m(X_{t+1})$ be such that
\[
\lim_{j \to \infty} \max_{0 \le |r| \le m} \sup_{x_{t+1} \in X_{t+1}} \big| D^r J_{t+1}^o(x_{t+1}) - D^r \tilde J_{t+1,j}^o(x_{t+1}) \big| = 0. \tag{12}
\]
Then, for all sufficiently large j, (i) holds with $\tilde J_{t+1}^o$ replaced by $\tilde J_{t+1,j}^o$ and, denoting by $\hat J_{t,j}^{o,p}$ the corresponding extension of $T_t \tilde J_{t+1,j}^o$, we have
\[
\lim_{j \to \infty} \big\| \bar J_t^{o,p} - \hat J_{t,j}^{o,p} \big\|_{W_p^m(\mathbb{R}^d)} = 0.
\]

Proposition 3.2 has the following meaning. The function $T_t \tilde J_{t+1}^o$ can be extended to the whole $\mathbb{R}^d$ by a function $\hat J_t^{o,p}$ in the Sobolev space $W_p^m(\mathbb{R}^d)$. Moreover, if a sequence $\tilde J_{t+1,j}^o$, j = 1, 2, ..., of approximations of $J_{t+1}^o$ converges uniformly to $J_{t+1}^o$, together with the partial derivatives up to order m, then the sequence $\hat J_{t,j}^{o,p}$, j = 1, 2, ..., of extensions of $T_t \tilde J_{t+1,j}^o$ converges in $W_p^m(\mathbb{R}^d)$ to the extension $\bar J_t^{o,p}$ of $J_t^o$. See, e.g., [32, Sect. 6] for an example of a problem in which the assumptions of Proposition 3.2 are satisfied.

4 Accuracy of Suboptimal Solutions

Classical approximation schemes in a function space can be expressed as linear combinations of fixed basis functions $\varphi_1, \ldots, \varphi_n$ that span an (at most) n-dimensional linear subspace [35]. So, they take on the form
\[
\sum_{i=1}^n \delta_i \varphi_i(\cdot), \tag{13}
\]
where the coefficients $\delta_1, \ldots, \delta_n$ are determined so as to minimize the approximation error; for the use of such models in value-function approximation, see [3, Sect. 7.2.3]. For example, this is the case with algebraic and trigonometric polynomials in the space of continuous functions on compact sets. Properties of linear approximation schemes have been extensively studied (see, e.g., [35]).

An alternative approximation scheme consists of linear combinations of basis functions $\phi(\cdot, w_1), \ldots, \phi(\cdot, w_n)$ obtained by varying the vectors $w_1, \ldots, w_n \in \mathbb{R}^p$ of "inner" parameters of a "mother function" φ:
\[
\sum_{i=1}^n \delta_i \phi(\cdot, w_i). \tag{14}
\]
The "inner" parameter vectors $w_1, \ldots, w_n$ have to be optimized together with the coefficients $\delta_1, \ldots, \delta_n$ of the linear combination. In general, the presence of the "inner" parameters "destroys" linearity, so (14) is a nonlinear approximation scheme [3, Sect. 7.4]. Equation (14) models a wide variety of approximating families used in applications, such as free-node splines, radial-basis-function networks with variable centers and widths, trigonometric polynomials with free frequencies and phases, and feedforward neural networks [16]. Approximating families of the form (14) belong to the so-called variable-basis approximation schemes [16, 36], whose advantages over classical linear ones of the form (13) were investigated, e.g., in [16, 36–38]. Roughly speaking, for a given accuracy of approximation of functions in certain spaces, variable-basis approximation schemes may require many fewer parameters to be optimized than linear ones.

In the following, we consider the use in value-function approximation of some approximation schemes of the form (14). For a compact set $X \subset \mathbb{R}^d$, we define the ridge variable-basis approximation scheme
\[
R(\psi, n) := \Big\{ f_n : X \to \mathbb{R} \ \Big|\ f_n(x) = \sum_{i=1}^n \delta_i \psi(a_i \cdot x + b_i),\ a_i \in \mathbb{R}^d,\ \delta_i, b_i \in \mathbb{R} \Big\}, \tag{15}
\]
where $a_i \cdot x$ denotes the scalar product in $\mathbb{R}^d$ of $a_i$ and x. (Functions constant along hyperplanes are known as ridge functions: each ridge function results from the composition of a multivariable function of a particularly simple form, i.e., the inner product, with an arbitrary function of a single variable.) Note that (15) is of the form (14) with $\phi(x, w_i) := \psi(a_i \cdot x + b_i)$ and $w_i := (a_{i,1}, \ldots, a_{i,d}, b_i) \in \mathbb{R}^{d+1}$.

We say that the mother function $\psi : \mathbb{R} \to \mathbb{R}$ is q-smooth if and only if it belongs to the family
\[
S^q := \Big\{ \psi : \mathbb{R} \to \mathbb{R} \ \Big|\ \psi \text{ nonzero, compactly supported, with continuous and uniformly bounded derivatives up to order } q, \text{ and } \exists\, l \ge q \text{ s.t. } 0 < \int_{\mathbb{R}} \Big| \frac{d^l \psi}{dz^l} \Big| \, dz < \infty \Big\}. \tag{16}
\]
Examples of functions in $S^q$ are splines of smoothness order q + 1 [39]. We are also interested in basis functions belonging to the family
\[
S := \Big\{ \psi : \mathbb{R} \to \mathbb{R} \ \Big|\ \psi \text{ nonzero, infinitely many times differentiable in some open interval } (a, b) \subset \mathbb{R}, \text{ and such that there exists } c \in (a, b) \text{ s.t. } \frac{d^k \psi}{dz^k}\Big|_{z=c} \neq 0 \ \forall k \in \mathbb{N} \Big\}. \tag{17}
\]
Examples of such basis functions used in applications are the so-called squashing (or logistic) function $(1 + e^{-z})^{-1}$ [40] and the sinusoidal functions. The hyperbolic tangent (to which the so-called feedforward sigmoidal neural networks correspond [37]) is in S too, since $\tanh z = 2(1 + e^{-2z})^{-1} - 1$.

We also define the radial variable-basis approximation scheme
\[
G(\psi, n) := \Big\{ f_n : X \to \mathbb{R} \ \Big|\ f_n(x) = \sum_{i=1}^n \delta_i \psi\big( \|x - \tau_i\| / \sigma_i \big),\ \tau_i \in \mathbb{R}^d,\ \delta_i \in \mathbb{R},\ \sigma_i > 0 \Big\}, \tag{18}
\]
where $\psi : \mathbb{R} \to \mathbb{R}$ is a radial-basis function (e.g., the Gaussian $e^{-z^2}$, to which the so-called Gaussian radial-basis networks correspond [41]). Note that (18) is of the form (14) with $\phi(x, w_i) := \psi(\|x - \tau_i\| / \sigma_i)$ and $w_i := (\tau_{i,1}, \ldots, \tau_{i,d}, \sigma_i) \in \mathbb{R}^{d+1}$.
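As a concrete reading of (15) and (18), the following sketch evaluates a member of $R(\psi, n)$ with $\psi = \tanh$ and a member of $G(\psi, n)$ with Gaussian ψ; all parameter values are randomly generated placeholders, not values used in the paper.

```python
# A sketch of the variable-basis families (15) and (18); parameter shapes
# follow w_i = (a_i, b_i) and (tau_i, sigma_i) in R^{d+1}.
import numpy as np

def ridge_fn(x, delta, A, b):
    # f_n(x) = sum_i delta_i * psi(a_i . x + b_i), psi = tanh  (family R)
    return np.tanh(A @ x + b) @ delta            # A: (n, d); b, delta: (n,)

def radial_fn(x, delta, tau, sigma):
    # f_n(x) = sum_i delta_i * exp(-(||x - tau_i|| / sigma_i)^2)  (family G)
    z = np.linalg.norm(x - tau, axis=1) / sigma  # tau: (n, d); sigma: (n,)
    return np.exp(-z**2) @ delta

d, n = 2, 5
rng = np.random.default_rng(0)
x = rng.uniform(size=d)
print(ridge_fn(x, rng.normal(size=n), rng.normal(size=(n, d)),
               rng.normal(size=n)))
print(radial_fn(x, rng.normal(size=n), rng.uniform(size=(n, d)),
                rng.uniform(0.1, 1.0, size=n)))
```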
The following proposition investigates the sup-norm approximation of functions in balls of certain Sobolev spaces via the variable-basis schemes (15) and (18).

Proposition 4.1 (Approximation Error Bounds) Let d, q be positive integers and $X \subset \mathbb{R}^d$ compact and convex.
(i) If $\psi \in S^{q+s}$ and $s = \lfloor d/2 \rfloor + 1$, then there exists C > 0 such that, for every ρ > 0, every $f \in B_\rho(\|\cdot\|_{W_2^{q+2s+1}(\mathbb{R}^d)})$, and every positive integer n, there is $f_n \in R(\psi, n)$ such that
\[
\max_{0 \le |r| \le q} \sup_{x \in X} \big| D^r f(x) - D^r f_n(x) \big| \le C \frac{\rho}{\sqrt{n}}. \tag{19}
\]
(ii) If $\psi \in S$ and $s = \lfloor d/2 \rfloor + 1$, then there exists C > 0 such that, for every ρ > 0, every $f \in B_\rho(\|\cdot\|_{W_\infty^s(\mathbb{R}^d)})$, and every positive integer n, there is $f_n \in R(\psi, n)$ such that
\[
\sup_{x \in X} \big| f(x) - f_n(x) \big| \le C \frac{\rho}{\sqrt{n}}. \tag{20}
\]
(iii) If ψ is the Gaussian and s > d, then there exists C > 0 such that, for every ρ > 0, every $f \in B_\rho(\|\cdot\|_{B_1^s(\mathbb{R}^d)})$, and every positive integer n, there is $f_n \in G(\psi, n)$ such that
\[
\sup_{x \in X} \big| f(x) - f_n(x) \big| \le C \frac{\rho}{\sqrt{n}}. \tag{21}
\]

For an increasing number n of variable-basis functions, the upper bounds provided by Proposition 4.1 decrease at the rate $1/\sqrt{n}$ (the constants C may differ in each bound). The three estimates (i), (ii), and (iii) differ in the way the approximation error is measured, in the required degree of smoothness s, and in the families of functions to which they apply. Item (i) provides an upper bound on the error in approximating functions in Sobolev balls (of radius ρ, centered at the origin) together with all their partial derivatives up to a certain order, uniformly on a compact and convex set $X \subset \mathbb{R}^d$. The approximators are variable-basis functions belonging to the family $R(\psi, n)$ defined in (15), with a (q + s)-smooth basis function ψ as defined in (16). The values q and s are related to the largest order of the partial derivatives to be approximated and to the dimension d, respectively. For a lower degree of smoothness, a result similar to Proposition 4.1(i) was obtained in [42]. Item (ii) of Proposition 4.1 gives an upper bound on the error in approximating, uniformly on X and by elements of $R(\psi, n)$ with $\psi \in S$, functions in Sobolev balls, without enforcing derivative approximation. Finally, item (iii) provides an upper bound on the error in approximating, uniformly on X and by elements of $G(\psi, n)$ with ψ a Gaussian, functions belonging to balls of the Bessel potential space $B_1^s(\mathbb{R}^d)$, again without enforcing derivative approximation.

Combining the approximation tools given in Proposition 4.1 with the smoothness results stated in Propositions 3.1 and 3.2, we get the next Proposition 4.2. It estimates the error in the uniform approximation of $J_t^o$ and its partial derivatives when basis functions from $S^q$ or S are used in the approximation scheme (15), or the Gaussian is used in the approximation scheme (18).

Proposition 4.2 (Approximation of the Value Functions) Let $s := \lfloor d/2 \rfloor + 1$. Then, there exist N positive constants $C_t$, N positive constants $C_t'$, and N positive constants $C_t''$ such that the following hold for t = 0, ..., N − 1.
(i) Let Assumption 3.1 hold with item (ii) referred to the policies (11) and $m \ge 2 + (2s+1)N$; let $\psi_t \in S^{2+(2s+1)(t+1)}$; and let $\bar n_{N-1}, \ldots, \bar n_0$ be an N-tuple of sufficiently large positive integers. For t = N − 1, ..., 0, if $\tilde J_k^o \in R(\psi_k, n_k)$, $n_k \ge \bar n_k$, k = N − 1, ..., t + 1, are suitable approximations of the value functions $J_k^o$ from stage t + 1 to stage N − 1, then, for every positive integer $n_t$, there exists $f_t \in R(\psi_t, n_t)$ such that
\[
\max_{0 \le |r| \le 2+(2s+1)t} \sup_{x_t \in X_t} \big| D^r \big(T_t \tilde J_{t+1}^o\big)(x_t) - D^r f_t(x_t) \big| \le \frac{C_t}{\sqrt{n_t}}. \tag{22}
\]
The approximations $\tilde J_k^o$ are obtained recursively by setting $\tilde J_k^o := f_k$ for k = N − 1, ..., t + 1.
(ii) Let Assumption 3.1 hold with m ≥ s, and let $\psi_t \in S$. Then, for every t = 0, ..., N − 1 and every positive integer $n_t$, there exists $f_t \in R(\psi_t, n_t)$ such that
\[
\sup_{x_t \in X_t} \big| J_t^o(x_t) - f_t(x_t) \big| \le \frac{C_t'}{\sqrt{n_t}}. \tag{23}
\]
(iii) Let Assumption 3.1 hold with m > d, and let $\psi_t$ be the Gaussian. Then, for every t = 0, ..., N − 1 and every positive integer $n_t$, there exists $f_t \in G(\psi_t, n_t)$ such that
\[
\sup_{x_t \in X_t} \big| J_t^o(x_t) - f_t(x_t) \big| \le \frac{C_t''}{\sqrt{n_t}}. \tag{24}
\]

The last step consists in combining the estimates from Proposition 4.2 with the bounds on error propagation from Proposition 2.2. This provides the following upper bounds on the approximation error.

Theorem 4.1 (Final Error Bounds) Let $s := \lfloor d/2 \rfloor + 1$. Then, there exist N positive constants $C_t$, N positive constants $C_t'$, and N positive constants $C_t''$ such that, under the assumptions of Proposition 4.2(i), (ii), and (iii), respectively, we have
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} \beta^t \frac{C_t}{\sqrt{n_t}},
\]
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} (2\beta)^t \frac{C_t'}{\sqrt{n_t}},
\]
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} (2\beta)^t \frac{C_t''}{\sqrt{n_t}},
\]
where $\tilde J_t^o$, t = N − 1, ..., 0, satisfies (3) with $\varepsilon_t := C_t/\sqrt{n_t}$, $\varepsilon_t := C_t'/\sqrt{n_t}$, and $\varepsilon_t := C_t''/\sqrt{n_t}$, respectively.

To the best of our knowledge, for quite general formulations of N-stage optimization problems of the form of Problem $\Sigma_N$, Theorem 4.1 gives the first upper bounds on the sup-norm error of value-function approximation by variable-basis approximation schemes in which the number of variable-basis functions required to guarantee a desired accuracy is estimated. Thus, Theorem 4.1 provides a partial answer to the issues raised in [2, Chap. 6, p. 335], quoted in Sect. 1, and provides new insights into the effectiveness of value-function approximation by neural networks [3, Sect. 7.4]. Note that Theorem 4.1 requires a degree of smoothness of the functions to be approximated that grows linearly with the dimension d of the state space. Estimates analogous to those provided by Theorem 4.1 can be obtained for ADP with the multistage-lookahead technique described in Sect. 2, by combining the estimates from Proposition 4.2 with the bounds on error propagation from Proposition 2.3.
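As a rough usage example of the first bound of Theorem 4.1 (with hypothetical constants $C_t$, since the theorem does not provide their values), one can size the numbers $n_t$ of basis functions so that each stage contributes at most ε/N to the final error.

```python
# Back-of-the-envelope sizing from Theorem 4.1, first bound: requiring
# beta^t * C_t / sqrt(n_t) <= eps / N for each stage gives
# n_t >= (N * beta^t * C_t / eps)^2. The constants C_t are assumed values.
import math

def required_units(C, beta, N, eps):
    return [math.ceil((N * beta**t * C[t] / eps) ** 2) for t in range(N)]

C = [2.0, 1.5, 1.0]          # hypothetical stage constants C_t
print(required_units(C, beta=0.9, N=3, eps=0.1))
# later stages are discounted by beta^t, so they need fewer units
```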
5 Application to a Problem of Optimal Consumption

In the classical problem of Optimal Consumption (OC) [43, Chap. 6], a consumer aims at maximizing, over a given time horizon, the discounted value of consumption of a good for a given sequence of interest rates. The consumer has a certain initial wealth and at each time period earns an income, modeled as an exogenous input. We consider a "multidimensional version" of the OC problem, denoted by $OC_N^d$, in which there are d > 1 consumers that aim at maximizing a "social utility function".

Problem $OC_N^d$. A set of d consumers aims at finding
\[
J^o(a_0) = \sup_{c_t,\ t=0,\ldots,N} \left\{ \sum_{t=0}^{N-1} \beta^t \Big[ u(c_t) + \sum_{j=1}^d v_{t,j}(a_{t,j}) \Big] + \beta^N u(c_N) \right\},
\]
where
\[
a_t \in A_t = \prod_{j=1}^d A_{t,j} \subseteq \mathbb{R}^d,
\]
\[
a_{t+1,j} = f_{t,j}(a_{t,j}, c_{t,j}) = (1 + r_{t,j})(a_{t,j} + y_{t,j} - c_{t,j}), \quad j = 1, \ldots, d,
\]
\[
a_{N,j} + y_{N,j} - c_{N,j} \ge 0, \quad c_{t,j} \ge 0, \quad j = 1, \ldots, d,
\]
\[
y_{0,j}, \ldots, y_{N,j} \ge 0 \text{ given}, \quad r_{0,j}, \ldots, r_{N-1,j} \ge 0 \text{ given}.
\]
Here, $a_{t,j}$ and $y_{t,j}$ are the wealth and the labor income of consumer j at time t, respectively, $c_{t,j} \ge 0$ is the current consumption of a good by consumer j, and $r_{t,j}$ is an interest rate associated with that good. Each vector of consumptions $c_t$ is chosen as a function of the current state vector $a_t$. The function u is a social utility associated with the vector $c_t$ of consumptions with components $c_{t,j}$, $v_{t,j}$ is an individual utility depending on $a_{t,j}$, and β > 0 is a fixed discount factor.

For j = 1, ..., d, the budget constraints
\[
a_{N,j} + y_{N,j} - c_{N,j} \ge 0 \tag{25}
\]
(also called no-Ponzi-game conditions) mean that all the consumers have to repay any debts within the time N. We assume that the utility functions $v_{t,j}(a_{t,j})$ penalize the closeness of $a_{t,j}$ to its minimum value $a_{t,j}^{\min}$, for which all the consumers will be able to satisfy conditions (25) in the future. The latter imply some constraints on the sets $A_{t,j}$ to which the state variables $a_{t,j}$ belong. These are taken into account in the next assumption.

Assumption 5.1 For every given $a_{0,j}^{\max}$, j = 1, ..., d, the sets $A_{t,j}$ are the closed and bounded intervals
\[
A_{0,j} := \big[ a_{0,j}^{\min},\ a_{0,j}^{\max} \big], \quad a_{0,j}^{\min} = -\frac{\sum_{i=0}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) + y_{N,j}}{\prod_{k=0}^{N-1} (1 + r_{k,j})},
\]
\[
A_{t,j} := \big[ a_{t,j}^{\min},\ a_{t,j}^{\max} \big], \quad a_{t,j}^{\min} = -\frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) + y_{N,j}}{\prod_{k=t}^{N-1} (1 + r_{k,j})},
\]
\[
a_{t,j}^{\max} = a_{0,j}^{\max} \prod_{k=0}^{t-1} (1 + r_{k,j}) + \sum_{i=0}^{t-1} y_{i,j} \prod_{k=i}^{t-1} (1 + r_{k,j}), \quad t = 1, \ldots, N-1,
\]
\[
A_{N,j} := \big[ a_{N,j}^{\min},\ a_{N,j}^{\max} \big] = \Big[ -y_{N,j},\ a_{0,j}^{\max} \prod_{k=0}^{N-1} (1 + r_{k,j}) + \sum_{i=0}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) \Big].
\]

Note that $a_{t,j}^{\min} \le 0$ and that $a_{t+1,j}^{\min}$ can be recursively computed as $a_{t+1,j}^{\min} = (a_{t,j}^{\min} + y_{t,j})(1 + r_{t,j})$; moreover, $a_{t,j}^{\min} + y_{t,j} \le 0$ for t = 0, ..., N − 1 and $a_{N,j}^{\min} + y_{N,j} = 0$.

Proposition 5.1 Assumption 5.1 implies that the budget constraints (25) can be satisfied. Suppose that the partial derivatives of u with respect to each of its arguments are positive. Then, at stage N, the best choice for the d consumers is $c_{N,j} = a_{N,j} + y_{N,j}$, j = 1, ..., d.

A change of variable allows one to write the objective of Problem $OC_N^d$ as
\[
\sum_{t=0}^{N-1} \beta^t \left[ u\!\left( \frac{(1 + r_t) \circ (a_t + y_t) - a_{t+1}}{1 + r_t} \right) + \sum_{j=1}^d v_{t,j}(a_{t,j}) \right] + \beta^N u(a_N + y_N),
\]
where, for two vectors a, b of the same dimension, we denote by $a \circ b$ their entry-wise product and, provided that all their components are different from 0, by 1/a, 1/b their entry-wise reciprocals, so that the fraction above is understood entry-wise. Having replaced $c_{t,j}$ by its expression in terms of $a_{t,j}$ and $a_{t+1,j}$, the largest allowable consumption at time t when consumer j is in state $a_{t,j}$ is
\[
c_{t,j}^{\max}(a_{t,j}) = \frac{(1 + r_{t,j})(a_{t,j} + y_{t,j}) - a_{t+1,j}^{\min}}{1 + r_{t,j}}.
\]
With such a choice, the next state of consumer j is $a_{t+1,j}^{\min}$. Moreover, the nonnegativity constraint $c_{t,j} \ge 0$ becomes $a_{t+1,j} \le (1 + r_{t,j})(a_{t,j} + y_{t,j})$, whereas $a_{t+1,j} \ge a_{t+1,j}^{\min}$ by the no-Ponzi-game conditions. Summing up, Problem $OC_N^d$ can be reformulated as an instance of Problem $\Sigma_N$ with
\[
X_t := \prod_{j=1}^d A_{t,j}, \tag{26}
\]
\[
D_t := \big\{ (a_t, a_{t+1}) \in A_t \times A_{t+1} : a_{t+1,j} \in \big[ a_{t+1,j}^{\min},\ (1 + r_{t,j})(a_{t,j} + y_{t,j}) \big] \big\}, \tag{27}
\]
\[
h_t(a_t, a_{t+1}) := u\!\left( \frac{(1 + r_t) \circ (a_t + y_t) - a_{t+1}}{1 + r_t} \right) + \sum_{j=1}^d v_{t,j}(a_{t,j}), \tag{28}
\]
and
\[
h_N(a_N) := u(a_N + y_N). \tag{29}
\]

In the following, we describe a condition that guarantees the smoothness of the optimal policies on suitable subsets $\bar X_t \subset X_t$. When one of the components j of $a_t$ is equal to $a_{t,j}^{\min}$, the optimal policy $g_t^o(a_t)$ cannot be interior, since one necessarily has $g_{t,j}^o(a_t) = a_{t+1,j}^{\min}$. This is the reason why, in the next Assumption 5.2(i), the interiority of the optimal policies is imposed only on a suitable subset of the state space. We let
\[
\bar A_t := \big\{ a_t \in A_t : a_{t,j} \ge \bar a_{t,j}^{\min},\ j = 1, \ldots, d \big\}, \quad t = 0, \ldots, N, \tag{30}
\]
for some $\bar a_{t,j}^{\min}$ such that $a_{t,j}^{\min} < \bar a_{t,j}^{\min} < a_{t,j}^{\max}$, and
\[
\bar D_t := D_t \cap (\bar A_t \times \bar A_{t+1}), \quad t = 0, \ldots, N-1. \tag{31}
\]
We also require that $g_t^o(\bar A_t) \subseteq \bar A_{t+1}$ and that, for every $a_t \in \operatorname{int}(\bar A_t)$, one has $g_t^o(a_t) \in \operatorname{int}(\bar D_t(a_t))$.
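The interval bounds of Assumption 5.1 and the wealth dynamics admit a direct numerical check; in the following sketch, the incomes $y_{t,j}$ and rates $r_{t,j}$ are assumed values for a single consumer j.

```python
# A sketch of the wealth dynamics of Problem OC^d_N and of the interval
# bounds of Assumption 5.1 for one consumer j (assumed incomes and rates).
import numpy as np

N = 3
y = np.array([3.0, 4.0, 2.5, 3.5])   # y_{0,j}, ..., y_{N,j}
r = np.array([0.05, 0.08, 0.02])     # r_{0,j}, ..., r_{N-1,j}
a0_max = 20.0

def step(a, c, t):
    # a_{t+1,j} = (1 + r_{t,j}) * (a_{t,j} + y_{t,j} - c_{t,j})
    return (1.0 + r[t]) * (a + y[t] - c)

def a_min(t):
    # a_min_{t,j} = -(sum_{i=t}^{N-1} y_i * prod_{k=i}^{N-1}(1+r_k) + y_N)
    #               / prod_{k=t}^{N-1}(1+r_k)
    num = sum(y[i] * np.prod(1.0 + r[i:N]) for i in range(t, N)) + y[N]
    return -num / np.prod(1.0 + r[t:N])

def a_max(t):
    # a_max_{t,j} = a_max_{0,j} * prod_{k<t}(1+r_k) + sum_{i<t} y_i * prod_{k=i}^{t-1}(1+r_k)
    return (a0_max * np.prod(1.0 + r[:t])
            + sum(y[i] * np.prod(1.0 + r[i:t]) for i in range(t)))

# recursion check: a_min_{t+1} = (a_min_t + y_t) * (1 + r_t)
for t in range(N - 1):
    assert abs(step(a_min(t), 0.0, t) - a_min(t + 1)) < 1e-9
print([(a_min(t), a_max(t)) for t in range(N)])
```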
Assumption 5.2 Let m ≥ 2 be an integer; let the sets $A_t$, t = 0, ..., N, be chosen as in Assumption 5.1; let $\bar A_t := \{ a_t \in A_t : a_{t,j} \ge \bar a_{t,j}^{\min},\ j = 1, \ldots, d \}$, t = 0, ..., N, for some $\bar a_{t,j}^{\min}$ such that $a_{t,j}^{\min} < \bar a_{t,j}^{\min} < a_{t,j}^{\max}$; and let $I_u^d := \prod_{j=1}^d I_{u,j}$, where, for j = 1, ..., d,
\[
I_{u,j} := \Big[ 0,\ a_{0,j}^{\max} \prod_{k=0}^{N-1} (1 + r_{k,j}) + \sum_{i=0}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) + y_{N,j} \Big].
\]
Then:
(i) there exist optimal policies $g_t^o$ that are continuous and interior on $\operatorname{int}(\bar A_t)$, $g_t^o(\bar A_t) \subseteq \bar A_{t+1}$, and, for every $a_t \in \operatorname{int}(\bar A_t)$, we have $g_t^o(a_t) \in \operatorname{int}(\bar D_t(a_t))$;
(ii) $u \in C^m(I_u^d)$, u is α-concave on $I_u^d$ for some α > 0, and the partial derivatives of u with respect to each of its arguments are positive on the set $I_u^d$ (i.e., the marginal utility of each consumption is positive on $I_u^d$);
(iii) $v_{t,j} \in C^m(A_{t,j})$, $v_{t,j}$ is $\beta_{t,j}$-concave on $A_{t,j}$ for some $\beta_{t,j} > 0$, and the derivative of each $v_{t,j}$ is positive on $A_{t,j}$, for t = 0, ..., N, j = 1, ..., d.

The set $I_{u,j}$ in Assumption 5.2 represents the largest interval to which the consumption $c_{N,j}$ (and so all the other consumptions $c_{t,j}$, t = 0, ..., N − 1) can belong.

Proposition 5.2 For Problem $OC_N^d$, Assumption 5.2 implies Assumption 3.1 with $X_t$ replaced by $\bar A_t$ and $D_t$ replaced by $\bar D_t$.

Proposition 5.2 allows one to derive, for Problem $OC_N^d$, particular forms of Propositions 3.1 and 3.2 and of Theorem 4.1 on the accuracy of suboptimal solutions obtained via approximation of the value functions by the families $F_t = R(\psi_t, n_t)$ (see (15)) or $F_t = G(\psi_t, n_t)$ (see (18)); they are not reported here for lack of space.

6 Design of the ADP Algorithm

Our results can be exploited to develop an ADP algorithm based on the use of variable-basis approximation schemes to approximate the value function at each stage. The first step consists in performing a suitable discretization of the sets $X_t$, t = 0, ..., N − 1, to which the corresponding state vectors $x_t$ belong. Let $L_t$ be the number of discretization points at stage t. Of course, they should be spread "as uniformly as possible". The notions of "uniformity" and "good spreading" have been largely discussed in the literature on statistics and number-theoretic methods [44]. Given a compact set $S \subset \mathbb{R}^d$ and a positive integer L, let $S_L \subset S$ be a set of L sample points $s_i$, i = 1, 2, ..., L, belonging to S. The dispersion of $S_L$ is defined as [44]
\[
\theta(S_L) := \sup_{s \in S} \min_{\tilde s \in S_L} \| s - \tilde s \|.
\]
So, $\theta(S_L)$ is a measure of the uniformity of the distribution of the L points of $S_L$. Roughly speaking, a small value of $\theta(S_L)$ guarantees that the points of $S_L$ are spread over S "in a uniform way", i.e., "close enough" to one another and without leaving regions "undersampled".

Random sampling with a uniform distribution, called Monte Carlo sampling [45], can be used to generate the discretized sets. Unfortunately, it is known that the resulting points are not uniformly scattered, in the sense that their dispersion takes large values. To sample at each stage t = 0, ..., N − 1 the set $X_t$ to which the corresponding state vectors $x_t$ belong, we shall adopt the approach of [13], based on taking finite portions of so-called low-discrepancy sequences (e.g., the good-lattice-points sequence, the Niederreiter sequence, the Halton sequence, the Hammersley sequence, and the Sobol' sequence), as it is known [46, p. 152] that they are low-dispersion sequences. Sampling the set S with such sequences is usually referred to as quasi-Monte Carlo sampling.
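The following sketch (using SciPy's qmc module, with the dispersion estimated on a finite evaluation grid rather than over the whole set S) reproduces the qualitative comparison between Monte Carlo and Sobol' samplings discussed next and illustrated in Fig. 1.

```python
# A sketch comparing Monte Carlo and Sobol' (quasi-Monte Carlo) samplings
# of the unit square; theta(S_L) is estimated on a fine evaluation grid.
import numpy as np
from scipy.stats import qmc

def dispersion(points, grid):
    # theta(S_L) ~= max over grid points of the distance to nearest sample
    dists = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    return dists.min(axis=1).max()

L, d = 1000, 2                      # 1000 points, as in Fig. 1
rng = np.random.default_rng(0)
mc = rng.uniform(size=(L, d))
sobol = qmc.Sobol(d=d, scramble=False).random(L)

g = np.linspace(0.0, 1.0, 50)
grid = np.array([[u, v] for u in g for v in g])
print("MC dispersion:   ", dispersion(mc, grid))
print("Sobol dispersion:", dispersion(sobol, grid))  # typically smaller
```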
As compared with Monte Carlo sampling, discretization based on low-discrepancy sequences suffers less from the formation of clusters of points in particular regions of the space [44] (such clustering undermines the uniformity of the sampling). A comparison between a sampling of the two-dimensional unit cube by a sequence of 1000 independently and uniformly distributed points and a sampling of the same cube obtained via the Sobol' sequence [47] is shown in Fig. 1. The space is clearly better covered by the second sequence, and the largest empty spaces among the points appear in the first sampling scheme. These properties motivate our choice of Sobol' sequences to sample the sets $X_t$. We denote by $X_t^{L_t}$ the corresponding discretized set and by $x_t^l$, l = 1, ..., $L_t$, its points.

Fig. 1 Comparison between pure-random and Sobol' samplings of the 2-dimensional unit cube

The second step of the ADP algorithm consists in performing the optimizations
\[
\hat J_t^o\big(x_t^l\big) := \sup_{y \in D_t(x_t^l)} \big\{ h_t\big(x_t^l, y\big) + \beta \tilde J_{t+1}^o(y) \big\}, \quad l = 1, \ldots, L_t,\ t = N-1, \ldots, 0, \tag{32}
\]
for each discretized value, from stage t = N − 1 back to stage t = 0, where $\tilde J_{t+1}^o(\cdot)$ is the approximation of the value function at stage t + 1. Such an approximation is built up backwards from t = N − 1 to t = 0 using a ridge (see Eq. (15)) or radial (see Eq. (18)) variable-basis approximation scheme, in such a way as to guarantee an error at most equal to $\varepsilon_t$ uniformly over $X_t$, i.e.,
\[
\sup_{x_t \in X_t} \big| \tilde J_t^o(x_t) - \hat J_t^o(x_t) \big| \le \varepsilon_t, \quad t = N-1, \ldots, 0 \tag{33}
\]
(see Proposition 2.2). The value of $\varepsilon_t$ has to be chosen in such a way as to guarantee the desired accuracy of the suboptimal solution, i.e., such that $\sup_{x_0 \in X_0} |J_0^o(x_0) - \tilde J_0^o(x_0)|$ is below a desired threshold (see Theorem 4.1 for conditions guaranteeing the existence of a sequence of approximators for which the accuracy is below the threshold). The initialization is given by $\tilde J_N^o(\cdot) \equiv h_N(\cdot)$.

We have to face the following two issues.
(1) The sup-norm in (33) does not allow the application of iterative descent methods. To overcome this drawback, at each stage t we consider the $L_{p_t}$-norm:
\[
\big\| \tilde J_t^o - \hat J_t^o \big\|_{L_{p_t}(X_t)} \le \varepsilon_t^{(1)}. \tag{34}
\]
By [48, Theorem 14F, p. 39], for any bounded and continuous function f on $X_t$, one has $\lim_{p_t \to \infty} \|f\|_{L_{p_t}(X_t)} = \sup_{x_t \in X_t} |f(x_t)|$.
(2) Only the values $\hat J_t^o(x_t^l)$ at the discretization points are available, so, instead of (34), all we can impose is an upper bound on the corresponding discretized $L_{p_t}$-norm:
\[
\left( \frac{1}{L_t} \sum_{l=1}^{L_t} \big| \tilde J_t^o\big(x_t^l\big) - \hat J_t^o\big(x_t^l\big) \big|^{p_t} \right)^{1/p_t} \le \varepsilon_t^{(2)}. \tag{35}
\]

By exploiting the properties of the low-discrepancy sequences used to discretize the sets $X_t$, the smoothness properties that we have proved for the functions $J_t^o$ and $\hat J_t^o$, and the smoothness properties of the mother function (either a Gaussian or belonging to the family (17)) used to obtain the approximation $\tilde J_t^o$, one can show that, for a sufficiently large number $L_t$ of samples, a sufficiently large value of $p_t$, and a sufficiently small value of $\varepsilon_t^{(2)}$, the upper bound (35) implies (34). Similarly, one can prove that, for a suitable value of $\varepsilon_t^{(1)}$, the upper bound (34) guarantees (33). As a detailed analysis of these topics is beyond the scope of the present paper, in the following we limit ourselves to a sketch of a possible procedure suggested by the results of the previous sections and the discussion above (a code sketch follows the list of steps below).
For t = 0, ..., N − 1, let $\varepsilon_t$ be the maximum allowed error in the sup-norm approximation $\sup_{x_t \in X_t} |\tilde J_t^o(x_t) - \hat J_t^o(x_t)|$, chosen in such a way as to guarantee the desired solution accuracy, i.e., such that $\sup_{x_0 \in X_0} |J_0^o(x_0) - \tilde J_0^o(x_0)|$ is below a desired threshold (Proposition 2.2).

(1) Choose a mother function ψ, either a Gaussian or belonging to the family (17).
(2) Let $\tilde J_N^o(\cdot) := h_N(\cdot)$. Set t := N − 1.
(3) Choose a value of $p_t$ and a corresponding discretized $L_{p_t}$-norm, a number $n_t$ of basis functions, and a number $L_t$ of discretization points.
(4) Use low-discrepancy sequences to generate the discretized set $X_t^{L_t}$.
(4.1) Compute $\hat J_t^o(x_t^l) := \max_{y \in D_t(x_t^l)} \big[ h_t(x_t^l, y) + \beta \tilde J_{t+1}^o(y) \big]$, l = 1, ..., $L_t$.
(4.2) Let $\tilde J_t(\cdot, \delta_t, w_t) := \sum_{i=1}^{n_t} \delta_{t,i} \phi(\cdot, w_{t,i})$ be the structure of the approximate value function obtained via a ridge (see Eq. (15)) or radial (see Eq. (18)) variable-basis approximation scheme with the mother function ψ. Find
\[
\big( \delta_t^o, w_t^o \big) := \arg\min_{\delta_t, w_t} \frac{1}{L_t} \sum_{l=1}^{L_t} \big| \tilde J_t\big(x_t^l, \delta_t, w_t\big) - \hat J_t^o\big(x_t^l\big) \big|^{p_t}.
\]
Let $\tilde J_t^o(\cdot) := \tilde J_t(\cdot, \delta_t^o, w_t^o)$.
(4.3) Compute the maximum error $\mathrm{Err}_t(X_t^{L_t})$ over the discretized set $X_t^{L_t}$:
\[
\mathrm{Err}_t\big(X_t^{L_t}\big) := \max_{l=1,\ldots,L_t} \big| \tilde J_t^o\big(x_t^l\big) - \hat J_t^o\big(x_t^l\big) \big|.
\]
(4.4) If $\mathrm{Err}_t(X_t^{L_t}) \le \varepsilon_t$ and t = 0, then stop. If $\mathrm{Err}_t(X_t^{L_t}) \le \varepsilon_t$ and t ≠ 0, then set t := t − 1 and go back to step (3). If $\mathrm{Err}_t(X_t^{L_t}) > \varepsilon_t$, then increase $p_t$ and/or $L_t$ and/or $n_t$ and go back to step (4).

Many variations of the above-described procedure are possible. For instance, the smoothness of $\tilde J_t^o$ can be increased by adding a regularization term to the objective function of step (4.2), e.g., proportional to the squared $l_2$-norm of the vector of its parameters. Moreover, instead of fixing $\varepsilon_t$ and varying $n_t$ until the desired accuracy is guaranteed, one may fix $n_t$ and find $\varepsilon_t$ a posteriori, after performing step (4.2). If the resulting value of $\varepsilon_t$ is not sufficiently small, then one can increase $n_t$ and repeat step (4.2) with the new value of $n_t$.
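A compact sketch of steps (2)–(4.3) is given below, under simplifying assumptions (scalar state, the Gaussian radial scheme (18), $p_t = 2$, and hypothetical rewards and feasible sets); it is meant only to fix the structure of the loop, not as the implementation used in Sect. 7 (which relied on the Matlab Optimization Toolbox).

```python
# A sketch of the ADP procedure of Sect. 6: Sobol' discretization,
# Bellman maximization at the sampled states, and least-squares fitting
# of the variable-basis parameters.
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import qmc

beta, N, n, L = 0.9, 3, 5, 64

def h(t, x, y):                      # hypothetical transition reward
    return 1.0 - 0.5 * x**2 - 0.5 * (y - 0.5 * x)**2
h_N = lambda x: 1.0 - 0.5 * x**2

def model(x, p):                     # radial scheme (18), parameters p in R^{3n}
    d_, tau, s = p[:n], p[n:2*n], np.abs(p[2*n:]) + 1e-3
    return np.sum(d_ * np.exp(-((x - tau) / s)**2))

J_tilde = [None] * N + [h_N]         # step (2): J_tilde[N] := h_N
for t in range(N - 1, -1, -1):       # backward over the stages
    xs = qmc.Sobol(d=1, seed=t).random(L).ravel()      # step (4)
    # step (4.1): J_hat(x^l) = max_y { h(t, x^l, y) + beta * J_tilde[t+1](y) }
    J_hat = np.array([
        -minimize_scalar(lambda y: -(h(t, x, y) + beta * J_tilde[t + 1](y)),
                         bounds=(0.0, 1.0), method="bounded").fun
        for x in xs])
    # step (4.2): least-squares fit of the basis parameters (p_t = 2)
    loss = lambda p: np.mean([(model(x, p) - v)**2 for x, v in zip(xs, J_hat)])
    p_opt = minimize(loss, np.concatenate(
        [np.zeros(n), np.linspace(0.0, 1.0, n), np.full(n, 0.3)])).x
    J_tilde[t] = lambda x, p=p_opt: model(x, p)
    err = max(abs(model(x, p_opt) - v) for x, v in zip(xs, J_hat))  # (4.3)
    print(f"stage {t}: Err_t = {err:.4f}")
```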
7 Numerical Results

In order to evaluate the effectiveness of our approach, we present numerical results for two instances of Problem $\Sigma_N$: Problem $OC_N^d$ described in Sect. 5, and a test problem for which the optimal solution can be found. In the following, we shall refer to the latter as the "Problem with Known Solution", or Problem $KS_N^d$ for short. We constructed it via the inverse-problem technique used in [49, 50] to derive N-stage optimization problems associated with given optimal policy functions.

Problem $KS_N^d$. Given $X_t := [0,1]^d$, $D_t := [0,1]^d \times [0,1]^d$, t = 0, ..., N − 1, $\beta \in [0,1]$, and a sequence of functions $\bar g_t : [0,1]^d \to [0,1]^d$, t = 0, ..., N − 1, such that $\bar g_t(x_t) \in (0,1)^d$ for all $x_t \in (0,1)^d$, find
\[
J^o(x_0) := \sup_{x_{t+1},\ t=0,\ldots,N-1} \sum_{t=0}^{N-1} \beta^t h_t(x_t, x_{t+1}) + h_N(x_N),
\]
where $h_N(x_N) := 1 - \frac{1}{2}\|x_N\|^2$ and
\[
h_t(x_t, x_{t+1}) := 1 - \tfrac{1}{2}\|x_t\|^2 - \tfrac{1}{2}\|x_{t+1} - \bar g_t(x_t)\|^2 - \beta\big(1 - \tfrac{1}{2}\|x_{t+1}\|^2\big).
\]

It follows by an application of Bellman's equations (2a)–(2b) that the optimal policies of Problem $KS_N^d$ are $g_t^o(x_t) = \bar g_t(x_t)$, t = 0, ..., N − 1, and the value functions are given by $J_t^o(x_t) = 1 - \frac{1}{2}\|x_t\|^2$ for t = 0, ..., N (so they are smooth and $\alpha_t$-concave with $\alpha_t = 1$). Thus, the optimal value function of Problem $KS_N^d$ at time t = 0 is
\[
J^o(x_0) := J_0^o(x_0) = 1 - \tfrac{1}{2}\|x_0\|^2.
\]

For both Problems $KS_N^d$ and $OC_N^d$, we present the results obtained with two instances of the ridge approximation scheme (15), with basis functions given by the hyperbolic tangent sigmoid (in the following, sigmoidal basis functions) and the sinusoid (both belonging to the family (17)), respectively, and one instance of the radial approximation scheme (18) with Gaussian basis functions. The inner parameters are the weights and biases in the sigmoids, the frequencies and phases in the sinusoids, and the centers and widths in the Gaussians. For every t = 0, ..., N − 1, we used the same number $n_t$ of basis functions, denoted simply by n; the left-hand side of (35) was computed for $p_t = 2$. In all the considered cases, for a given value of n, the number of parameters to be optimized is the same for functions belonging to the classes R and G (see Eqs. (15) and (18)). We compared the results with linear approximators with the same type of basis functions, but having fixed inner parameters (so, only the outer parameters $\delta_i$, i = 1, ..., n, have to be optimized). Specifically, we fixed the inner parameter values randomly by using Sobol' low-discrepancy sequences, in order to cover the state space as uniformly as possible. We used such sequences also to discretize via $L_t = 1000$ points the sets $X_t$, t = 0, ..., N − 1 (we used the same value of $L_t$ for all t). The general case in which $L_t$, $n_t$, and $p_t$ depend on t is described in Sect. 6 and can be implemented in a similar way.

To display in a compact way the realizations of the error for various values of the initial state $x_0$, we use a pictorial representation known in statistics as a boxplot [51]. Specifically, the "box" in a boxplot ranges from the 25th percentile (25% of the realizations are at or below it) to the 75th percentile (75% of the realizations are at or below it). The length of the box corresponds to the inter-quartile range, i.e., the difference between the 75th percentile and the 25th percentile. The line inside the box is the median, or the 50th percentile (half of the realizations are above it and half are below it). The lines extending from each end of the box are the whiskers: they contain "less significant" realizations of the error that do not fit into the box. The single values beyond the borders of the whiskers are the outliers. The simulations were performed using the Optimization Toolbox of Matlab on a personal computer with a 1.8 GHz Core2 Duo CPU and 2 GB of RAM.

7.1 Numerical Results for Problem $KS_N^d$

We solved Problem $KS_N^d$ over a total number of four decision stages (i.e., N = 3) and for two values of d: d = 2 and d = 10. The discount factor β was taken equal to 1, and the functions $\bar g_t$, t = 0, ..., N − 1, were chosen as
\[
\bar g_t(x_t) := \frac{1}{2} + \frac{2}{5} \sin(t x_t),
\]
with the sine applied componentwise.
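Since Problem $KS_N^d$ is built by the inverse-problem technique, one can verify numerically that $J_t^o(x) = 1 - \frac{1}{2}\|x\|^2$ satisfies Bellman's equation (2b); the following sketch (with assumed values of d, t, and x) does so for the choice of $\bar g_t$ above.

```python
# Numerical sanity check that 1 - 0.5*||x||^2 satisfies (2b) for KS^d_N:
# the inner maximum over x_{t+1} is attained at g_bar_t(x_t), with value
# 1 - 0.5*||x_t||^2.
import numpy as np
from scipy.optimize import minimize

d, beta, t = 2, 1.0, 1
g_bar = lambda x: 0.5 + 0.4 * np.sin(t * x)
J_next = lambda x: 1.0 - 0.5 * np.sum(x**2)

def h_t(x, y):
    return (1.0 - 0.5 * np.sum(x**2) - 0.5 * np.sum((y - g_bar(x))**2)
            - beta * (1.0 - 0.5 * np.sum(y**2)))

x = np.array([0.3, 0.7])
res = minimize(lambda y: -(h_t(x, y) + beta * J_next(y)),
               x0=np.full(d, 0.5), bounds=[(0.0, 1.0)] * d)
print(res.x, g_bar(x))                      # maximizer ~ g_bar_t(x)
print(-res.fun, 1.0 - 0.5 * np.sum(x**2))   # optimal value ~ J_t^o(x)
```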
The results reported in the following compare the performances of the considered types of fixed- and variable-basis functions for a total of 1000 different values of $x_0 \in [0,1]^d$. As the solution (i.e., $J^o(x_0)$ at stage 0) of Problem $KS_N^d$ is known, once $x_0$ is fixed, we can evaluate the effectiveness of the approach by measuring the distance between the optimal and the approximate solutions as
\[
e_J(x_0) := \frac{\big| \tilde J^o(x_0) - J^o(x_0) \big|}{\big| J^o(x_0) \big|}. \tag{36}
\]
Table 1 provides the medians of the error $e_J(x_0)$ computed as in (36), together with the mean simulation times (in seconds) obtained by the ADP algorithm with fixed- and variable-basis approximators for 1000 different initial conditions $x_0$. Figure 2 contains a pictorial sketch of the error $e_J$ using boxplots.

Table 1  Summary of the simulation results for Problem $KS_N^d$

                       Median of e_J(x_0)                              Mean simulation time [s]
                       Sigmoidal     Gaussian      Sinusoidal         Sigmoidal    Gaussian     Sinusoidal

Fixed-basis functions
d = 2       n = 5      2.00 × 10^-2  1.02 × 10^-3  1.34 × 10^-2       3.33 × 10^1  9.50 × 10^1  9.45 × 10^1
            n = 10     1.84 × 10^-2  3.98 × 10^-4  3.45 × 10^-4       1.35 × 10^2  1.72 × 10^2  1.60 × 10^2
            n = 15     1.35 × 10^-2  1.10 × 10^-4  3.57 × 10^-4       1.07 × 10^2  2.67 × 10^2  2.76 × 10^2
            n = 20     7.01 × 10^-4  8.71 × 10^-5  6.10 × 10^-5       1.37 × 10^2  4.45 × 10^2  3.36 × 10^2
d = 10      n = 5      4.94 × 10^-1  3.16 × 10^-1  4.92 × 10^-1       1.09 × 10^2  7.03 × 10^2  4.68 × 10^2
            n = 10     4.91 × 10^-1  8.58 × 10^-2  4.85 × 10^-1       2.27 × 10^2  3.12 × 10^3  1.25 × 10^3
            n = 15     4.85 × 10^-1  3.18 × 10^-2  4.88 × 10^-1       7.50 × 10^2  4.77 × 10^3  2.30 × 10^3
            n = 20     4.62 × 10^-1  7.07 × 10^-3  4.11 × 10^-1       3.82 × 10^2  6.17 × 10^3  3.57 × 10^3

Variable-basis functions
d = 2       n = 5      5.35 × 10^-4  1.53 × 10^-3  1.86 × 10^-4       1.04 × 10^3  2.39 × 10^3  9.22 × 10^2
            n = 10     1.94 × 10^-5  4.23 × 10^-4  7.88 × 10^-5       2.00 × 10^3  5.39 × 10^3  1.52 × 10^3
            n = 15     7.71 × 10^-6  3.04 × 10^-4  3.65 × 10^-5       2.64 × 10^3  1.02 × 10^4  2.28 × 10^3
            n = 20     8.67 × 10^-6  3.46 × 10^-5  2.96 × 10^-5       3.31 × 10^3  1.33 × 10^4  2.63 × 10^3
d = 10      n = 5      3.86 × 10^-1  2.07 × 10^-1  3.01 × 10^-1       1.19 × 10^4  1.50 × 10^4  1.36 × 10^4
            n = 10     3.27 × 10^-1  2.01 × 10^-2  4.29 × 10^-2       1.98 × 10^4  2.64 × 10^4  1.99 × 10^4
            n = 15     1.25 × 10^-1  1.94 × 10^-3  2.19 × 10^-2       2.07 × 10^4  3.78 × 10^4  2.08 × 10^4
            n = 20     6.95 × 10^-3  3.23 × 10^-3  1.67 × 10^-2       2.24 × 10^4  4.92 × 10^4  2.23 × 10^4

Fig. 2 Boxplots of the error $e_J$ for Problem $KS_N^d$, where "F" and "V" denote the approximation schemes with fixed- and variable-basis functions, respectively

7.2 Numerical Results for Problem $OC_N^d$

We solved Problem $OC_N^d$ over a total number of four decision stages (i.e., N = 3) and for various numbers d of consumers: d = 2, d = 10, and d = 30. In all these cases, the labor income $y_{t,j}$ of consumer j at time t, j = 1, ..., d, t = 0, ..., N − 1, was randomly generated according to the uniform distribution on the interval [2, 5]. The interest rate $r_{t,j}$ of the good consumed by consumer j at time t, j = 1, ..., d, t = 0, ..., N, was uniformly distributed between 0 and 0.1. The discount factor β was taken equal to 1, and $a_{0,j}^{\max}$ was set equal to 20 for every j = 1, ..., d.
7.2 Numerical Results for Problem OC^d_N

We solved Problem OC^d_N over a total number of four decision stages (i.e., N = 3) and various numbers d of consumers: d = 2, d = 10, and d = 30. In all these cases, the labor income y_{t,j} of consumer j at time t, j = 1, . . . , d, t = 0, . . . , N − 1, was randomly generated according to the uniform distribution on the interval [2, 5]. The interest rate r_{t,j} of the good consumed by consumer j at time t, j = 1, . . . , d, t = 0, . . . , N, was uniformly distributed between 0 and 0.1. The discount factor β was taken equal to 1, and a_{0,j}^max was set equal to 20 for every j = 1, . . . , d.

The conditions of Assumption 5.2 were imposed by choosing suitable logarithmic functions for u(·) and v_{t,j}(·) in such a way that, for every consumer, the choices a_{t+1,j} = a_{t+1,j}^min and a_{t+1,j} = (1 + r_{t,j})(a_{t,j} + y_{t,j}) (i.e., c_t = 0) are penalized, so that they are never optimal next choices for the jth component of the state, at least for values of a_{t,j} in a suitable interval of the form [ā_{t,j}^min, ā_{t,j}^max] ⊂ [a_{t,j}^min, a_{t,j}^max], where a_{t,j}^min and a_{t,j}^max are determined by a_{0,j}^max and Assumption 5.1. In particular, the function u(·), for t = 0, . . . , N, was taken equal to

    u(c_t) := (3/2) Σ_{j=1}^d K ln(c_{t,j} + ε) − (1/2) √(4 + (Σ_{j=1}^d K ln(c_{t,j} + ε))²),   (37)

where K := 10 and ε := 1. For t = 0, . . . , N − 1 and j = 1, . . . , d, we have chosen v_{t,j}(a_{t,j}) := K ln(a_{t,j} − a_{t,j}^min + ε), with the same values K := 10 and ε := 1 as in u(c_t). The use of logarithmic reward functions is quite common for the problem of optimal consumption (see, e.g., [43, Chap. 6]). The value ε > 0 in the expression (37) of the social utility function has to be sufficiently small so that the choice c_{t,j} = 0, j = 1, . . . , d, is sufficiently penalized (i.e., the corresponding value u(c_{t,j}) is negative and has a sufficiently large absolute value), while the arguments of the logarithms remain positive. The function u(c_t) specified by (37) is of the form f_2(f_1(c_t)), where f_1(c_t) := Σ_{j=1}^d K ln(c_{t,j} + ε) and f_2(z) := (3/2) z − (1/2) √(4 + z²); the functions f_1 and f_2 are nonlinear, and their composition is strongly concave. The function f_2 allows one to increase the interactions among the d consumers.
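The reward functions just described are straightforward to code; the following sketch reproduces the composition u = f_2 ∘ f_1 of (37) and the per-consumer rewards v_{t,j}, with a_{t,j}^min passed as an argument since it depends on the data of the problem.

```python
import numpy as np

K, EPS = 10.0, 1.0   # the constants chosen in the experiments

def f1(c):
    # Additive aggregator over the d consumers: sum_j K ln(c_j + eps).
    return K * np.log(np.asarray(c, dtype=float) + EPS).sum()

def f2(z):
    # Concave nonlinear transform (3/2) z - (1/2) sqrt(4 + z^2); it couples
    # the consumption choices of the d consumers.
    return 1.5 * z - 0.5 * np.sqrt(4.0 + z ** 2)

def social_utility(c):
    # u(c_t) = f2(f1(c_t)), Eq. (37).
    return f2(f1(c))

def v(a, a_min):
    # Per-consumer reward v_{t,j}(a_{t,j}) = K ln(a_{t,j} - a_{t,j}^min + eps).
    return K * np.log(a - a_min + EPS)

print(social_utility([1.0, 2.0]))   # e.g., d = 2 consumers
```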
The results reported in the following compare the performances of the considered types of fixed- and variable-basis functions for a total of 1000 different values of a_{0,j} ∈ [a_{0,j}^min, a_{0,j}^max], j = 1, . . . , d (see Assumption 5.1). Examples of the wealth income a_{t,j} in the cases of d = 2, d = 10, and d = 30 consumers obtained with variable-basis functions are reported in Fig. 3.

Fig. 3 Examples of wealth incomes of Problem OC^d_N obtained with variable-basis functions

Table 2 provides the medians of the approximate value J̃^o(a_0) at stage 0, together with the mean simulation times (in seconds) obtained by the ADP algorithm with both fixed- and variable-basis approximators for 1000 different initial conditions a_0 in the cases of 2, 10, and 30 consumers.

Table 2 Summary of the simulation results for Problem OC^d_N

                          Median of J̃^o(a_0)                              Mean simulation time [s]
                          Sigmoidal     Gaussian      Sinusoidal          Sigmoidal     Gaussian      Sinusoidal
Fixed-basis functions
d = 2      n = 5          1.84 × 10^2   1.08 × 10^2   1.15 × 10^2         8.63 × 10^1   9.62 × 10^1   7.97 × 10^1
           n = 10         2.61 × 10^2   1.89 × 10^2   1.38 × 10^2         1.59 × 10^2   2.68 × 10^2   1.75 × 10^2
           n = 15         2.61 × 10^2   2.55 × 10^2   1.39 × 10^2         2.71 × 10^2   5.41 × 10^2   2.97 × 10^2
           n = 20         2.64 × 10^2   2.88 × 10^2   1.45 × 10^2         4.80 × 10^2   1.02 × 10^3   5.19 × 10^2
d = 10     n = 5          6.79 × 10^2   6.08 × 10^2   7.07 × 10^2         7.61 × 10^2   6.96 × 10^2   8.28 × 10^2
           n = 10         8.29 × 10^2   8.81 × 10^2   7.15 × 10^2         1.94 × 10^3   4.80 × 10^3   1.44 × 10^3
           n = 15         9.41 × 10^2   9.92 × 10^2   7.20 × 10^2         3.07 × 10^3   1.17 × 10^4   1.99 × 10^3
           n = 20         1.09 × 10^3   1.33 × 10^3   7.50 × 10^2         4.56 × 10^3   1.47 × 10^4   2.83 × 10^3
d = 30     n = 5          1.94 × 10^3   2.07 × 10^3   1.98 × 10^3         3.93 × 10^3   1.86 × 10^3   3.73 × 10^3
           n = 10         2.10 × 10^3   2.17 × 10^3   2.14 × 10^3         8.07 × 10^3   4.15 × 10^3   5.95 × 10^3
           n = 15         2.19 × 10^3   2.18 × 10^3   2.19 × 10^3         1.01 × 10^4   2.03 × 10^4   9.74 × 10^3
           n = 20         2.19 × 10^3   2.36 × 10^3   2.28 × 10^3         1.09 × 10^4   3.06 × 10^4   1.22 × 10^4
Variable-basis functions
d = 2      n = 5          2.88 × 10^2   2.90 × 10^2   1.48 × 10^2         2.44 × 10^3   7.21 × 10^3   1.06 × 10^3
           n = 10         2.91 × 10^2   2.90 × 10^2   1.48 × 10^2         5.17 × 10^3   1.07 × 10^4   1.26 × 10^3
           n = 15         2.91 × 10^2   2.91 × 10^2   1.60 × 10^2         9.24 × 10^3   3.12 × 10^4   2.56 × 10^3
           n = 20         2.90 × 10^2   2.91 × 10^2   1.55 × 10^2         1.43 × 10^4   3.92 × 10^4   4.18 × 10^3
d = 10     n = 5          8.22 × 10^2   1.28 × 10^3   7.71 × 10^2         9.42 × 10^3   1.64 × 10^4   4.58 × 10^3
           n = 10         7.80 × 10^2   1.32 × 10^3   7.82 × 10^2         1.41 × 10^4   2.37 × 10^4   1.24 × 10^4
           n = 15         9.51 × 10^2   1.34 × 10^3   9.00 × 10^2         1.46 × 10^4   3.78 × 10^4   1.57 × 10^4
           n = 20         9.85 × 10^2   1.29 × 10^3   9.53 × 10^2         1.54 × 10^4   4.11 × 10^4   1.94 × 10^4
d = 30     n = 5          2.18 × 10^3   2.66 × 10^3   2.24 × 10^3         1.28 × 10^4   1.76 × 10^4   2.94 × 10^4
           n = 10         2.17 × 10^3   2.58 × 10^3   2.46 × 10^3         1.48 × 10^4   3.28 × 10^4   3.48 × 10^4
           n = 15         2.47 × 10^3   3.51 × 10^3   2.56 × 10^3         1.57 × 10^4   4.11 × 10^4   3.71 × 10^4
           n = 20         2.60 × 10^3   3.64 × 10^3   2.66 × 10^3         1.61 × 10^4   4.87 × 10^4   3.95 × 10^4

Due to the unavailability of closed-form solutions to Problem OC^d_N, following the criterion adopted in [12, p. 46], we evaluated the performances of the approximations on the basis of many different solutions obtained with the ADP approach itself. More specifically, once n and d are fixed, for each initial condition a_0 the corresponding "optimal" value J^o(a_0) was taken to be the largest one among those obtained by the sigmoidal, Gaussian, or sinusoidal basis functions with either fixed- or variable-basis schemes. We denote such an optimal value by J̃^{o,max}(a_0). Then, we define the error e_J(a_0) as the normalized difference between the value J̃^o(a_0) obtained by a given type of basis function and the "optimal" one J̃^{o,max}(a_0), i.e.,

    e_J(a_0) := |J̃^o(a_0) − J̃^{o,max}(a_0)| / |J̃^{o,max}(a_0)|.   (38)

Such a procedure does not allow one to evaluate the performance of an approximation strategy per se, i.e., it does not provide the distance between J̃^o(a_0) and J^o(a_0), but it is useful whenever one wants to compare the solutions of a given optimization problem obtained by the same algorithm in different "configurations". In our case, such configurations correspond to the different types of basis functions used for the approximation of the value functions with the ADP algorithm using either fixed- or variable-basis schemes. Figure 4 shows the boxplots of the "errors" e_J(a_0) computed according to (38) for different initial conditions a_0 and types and numbers of basis functions in the cases of 2, 10, and 30 consumers.

Fig. 4 Boxplots of the errors e_J for Problem OC^d_N, where "F" and "V" denote the approximation schemes using fixed- and variable-basis functions, respectively
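A minimal sketch of the comparison criterion (38): each configuration is measured against the best value returned by any of the six configurations (three basis types, each with fixed or variable bases). The dictionary `values` is a hypothetical container mapping a configuration label to the array of values J̃^o(a_0) over the 1000 initial conditions.

```python
import numpy as np

def relative_gaps(values):
    # values: dict mapping configuration label -> array of approximate optimal
    # values, one entry per initial condition a0.
    stacked = np.stack(list(values.values()))   # shape (n_configs, n_runs)
    best = stacked.max(axis=0)                  # J-tilde^{o,max}(a0) in (38)
    return {cfg: np.abs(v - best) / np.abs(best) for cfg, v in values.items()}
```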
7.3 Discussion of the Simulation Results

The simulation results obtained for Problems KS^d_N and OC^d_N exhibit similar features, so a unified discussion is presented.

For both problems, the best results were obtained by using Gaussian variable-basis functions. The performances of the sigmoidal basis functions are similar to those of the Gaussian ones only for d = 2, whereas a larger value of d (i.e., d = 10 or d = 30) entails worse results. The sinusoidal basis functions provide the worst results for d = 2, whereas for d = 10 and d = 30 they yield results similar to those obtained with the sigmoidal functions.

The similar performances of sigmoidal and sinusoidal functions may be ascribed to the fact that they belong to the same family R defined by (15), i.e., they both use a scalar product between coefficients and inputs. By contrast, the Gaussian basis functions belong to the family G defined by (18) and are based on a distance between the inputs and the coordinates of the centers rather than on a scalar product. As a consequence, they are more "localized" than sigmoids and sinusoids, which are spread all over the domain X_t. In the considered examples, the radial variable-basis scheme provides better results than the ridge one.

As compared with the results obtained with fixed-basis approximators, variable-basis ones guarantee better performances in terms of accuracy. The gap in the results increases with d: for d = 2, the performances of the two types of approximators are similar, whereas for d = 10 and d = 30 the variable-basis approximators outperform the fixed-basis ones. In both Problems KS^d_N and OC^d_N, the simulation results confirm the good properties of variable-basis approximation schemes when the dimension of the inputs of the value functions increases: the number of fixed-basis functions needed to guarantee the same approximation capabilities as variable-basis ones grows with that dimension.

Once the type of basis functions and the number d have been fixed, the value of the errors decreases as the number n of basis functions increases. This turns out to be more evident with d = 10 or d = 30, whereas with d = 2 the difference in the results obtained with various n is reduced. This can be explained as follows. In the case of d = 2, the dimension of the inputs x_t, t = 0, . . . , N − 1, of the approximate tth value functions J̃^o_t is quite small, so even a small number of basis functions can provide satisfactory approximations. In other words, in this case there is no need to use many basis functions, i.e., many parameters. On the contrary, when the dimension of the input of the approximate tth value functions increases (in the example, from d = 10 to d = 30), a larger value of n (thus a larger number of parameters) provides an increased approximation capability, which enables one to obtain better approximations. This is particularly evident in the case of d = 30.

Concerning the simulation times, the larger d, the larger the computational time needed to perform the optimizations with either fixed- or variable-basis functions. From the numerical results it turns out that the Gaussians require a larger computational effort than the other two types of basis functions. The computational times of the sigmoidal and sinusoidal basis functions are similar to each other and smaller than the corresponding times for the Gaussians.
In all cases, as expected, the simulation times grow as the number n of basis functions grows, since the optimal values of a larger number of parameters have to be found. For the same number n of basis functions, the variable-basis schemes require a larger computational effort. Indeed, the number of parameters to be optimized in the variable-basis case is larger than in the fixed-basis one (one has to optimize also the values of the inner parameters).

7.4 Computational Aspects

Table 3 summarizes the computational efforts needed to find the approximations of the value functions at each stage t, using at each stage the same numbers L of discretization levels and n of basis functions.

Table 3 Computational efforts of value-function approximations

                                            Fixed-basis schemes           Variable-basis schemes
Number of discretization points of X_t      L                             L
Number of optimizations                     L + 1 (Eqs. (32) and (35))    L + 1 (Eqs. (32) and (35))
Number of unknowns                          d (Eq. (32))                  d (Eq. (32))
                                            n (Eq. (35))                  dn + 2n (Eq. (35))

The approximate value functions J̃^o_t are obtained by solving the optimization problem (35) at each time t. In the case of variable-basis functions, such an optimization is performed in the (dn + 2n)-dimensional space of the parameters of the approximating structures (15) and (18). For fixed-basis functions, the number of unknowns in Eq. (35) is equal to n. At each time t = 0, . . . , N − 1, to find the approximation of J^o_t, we first have to perform the L optimizations (32), so as to obtain the values of Ĵ^o_t at the discretization points x^l_t of X_t. Each of these additional optimizations involves merely d unknowns, and thus is easier to solve than the optimization (35). Hence, the total number of optimizations that have to be performed at each time t is equal to L + 1.

Each optimization in (35) is a mathematical programming problem that can be solved, e.g., via iterative descent methods. Specifically, we exploited the sequential quadratic programming algorithm [52]. In the case of the variable-basis approach, problem (35) is more difficult to solve than in the fixed-basis one, because of the presence of the free inner parameters, on which the approximate value functions depend nonlinearly. As a consequence, the numerical solver used for (35) may be trapped in local minima, thus compromising the effectiveness of the approximation. In order to mitigate this risk, we have adopted a "multistart" technique, which consists in solving (35) for several different initial values of the parameter vectors of the approximating structures (15) and (18), and choosing the parameters corresponding to the best result as the optimal ones. By contrast, in the fixed-basis case the approximate value functions depend linearly on the unknown parameters, and thus their optimal selection can be performed more easily. However, when globally optimal solutions are found, the quality of variable-basis approximations is better in general, as variable-basis functions have a greater approximation capability than fixed-basis ones.
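The following sketch illustrates, under stated assumptions, the multistart solution of (35) for the Gaussian scheme (18): SciPy's SLSQP solver stands in for the Matlab SQP routine used in the experiments, the dn + 2n parameters are packed into a single vector, and the objective is the empirical squared error at the grid points (the case p_t = 2); `grid_pts` and `j_hat_vals` denote the hypothetical pairs produced by the L optimizations (32).

```python
import numpy as np
from scipy.optimize import minimize

def fit_gaussian_value_fn(grid_pts, j_hat_vals, n, n_starts=10, seed=0):
    # Multistart fit of the radial model (18): parameters are packed as
    # [outer coeffs (n), centers (n*d), widths (n)], i.e., dn + 2n unknowns.
    L, d = grid_pts.shape
    rng = np.random.default_rng(seed)

    def model(theta):
        delta = theta[:n]
        centers = theta[n:n + n * d].reshape(n, d)
        widths = theta[n + n * d:]
        dist2 = ((grid_pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return (delta * np.exp(-dist2 / widths ** 2)).sum(-1)

    def loss(theta):   # empirical squared error, the criterion (35) with p_t = 2
        return np.mean((model(theta) - j_hat_vals) ** 2)

    bounds = ([(None, None)] * n           # outer coefficients: unconstrained
              + [(0.0, 1.0)] * (n * d)     # centers kept inside [0, 1]^d
              + [(1e-2, None)] * n)        # widths kept strictly positive
    best = None
    for _ in range(n_starts):              # multistart against local minima
        theta0 = np.concatenate([rng.standard_normal(n),
                                 rng.random(n * d),
                                 0.5 + rng.random(n)])
        res = minimize(loss, theta0, method='SLSQP', bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun

# Example on synthetic data: fit n = 5 Gaussians to values on a 2-D grid.
pts = np.random.default_rng(2).random((100, 2))
vals = np.sin(pts.sum(axis=1))
theta, err = fit_gaussian_value_fn(pts, vals, n=5)
print(f"fitted squared error: {err:.2e}")
```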
In general, the number of fixed-basis functions needed to obtain the same approximation accuracy as variable-basis ones is large if d is large. Hence, in the online phase of the ADP algorithm, after having solved problem (35), one has to deal with complex approximators, in which the linear combinations in (15) or (18) are made up of a large number of terms. On the contrary, variable-basis approximators are easier to deal with online, once the optimal values of the parameters have been found, since the sums in (15) or (18) are made up of a smaller number of terms.

The use of sigmoidal or Gaussian basis functions requires the computation of exponential functions, whereas the use of sinusoidal basis functions requires the computation of trigonometric functions. In both cases, this is usually done by means of truncated Taylor series. Furthermore, the use of Gaussian basis functions entails the computation of norms, whereas sigmoidal and sinusoidal basis functions are based on the computation of inner products. The complexity of these two operations is quite similar. Thus, we can conclude that the time needed to compute the output of an approximating structure is almost the same for all the considered types of approximators. The choice of the type of approximator does not change the total number of optimizations described above. The increased amount of computational time needed to find the approximations when using the Gaussians may be ascribed to the form of the objective functions one has to deal with in the various optimizations. In this respect, note that Gaussians exhibit geometrical properties opposite to those of sigmoids and sinusoids. In the first case, one has to compute distances to centers, and such distances become the arguments of a Gaussian; hence, the units respond to "localized" regions. By contrast, in the second case, the arguments of the basis functions are weighted sums of inputs plus biases, so the units respond to "nonlocalized" regions of the input space. The local nature of the Gaussian functions may generate very complex objective functions, possibly with many local minima. For this reason, the optimization procedure may be more complex, and so more time-consuming, when Gaussian functions are used.

8 Conclusive Remarks

We have investigated the solution of sequential decision problems, modeled as Problem Σ_N, by means of DP combined with approximation of the value functions at each stage. We have considered variable-basis approximation schemes frequently used in applications, such as neural networks. Both standard approximate DP and the multistage-lookahead technique have been studied. In particular, we have addressed the curse of dimensionality in value-function approximation, i.e., the risk of a very fast growth of the number of basis functions in the approximation scheme (hence, of the number of coefficients to be optimized) required to guarantee a desired approximation accuracy of the value functions. The estimates that we have derived on the accuracy of suboptimal solutions are proportional to 1/√n, where n is the number of variable-basis functions. To the best of our knowledge, for quite general formulations of sequential optimization problems in the form of Problem Σ_N, these are the first estimates of this kind, in which the number of variable-basis functions required to guarantee a desired accuracy is estimated. Our results show a way to face, via approximate DP, high-dimensional, continuous-state, sequential decision problems, and they provide insights into the effectiveness of value-function approximation by neural networks, for which there exists large experimental evidence. The proposed approach has been tested numerically on a problem of optimal consumption under uncertainty, for which we have compared traditional fixed-basis approximators with variable-basis ones that model Gaussian, sinusoidal, and sigmoidal neural networks.

Appendix

Proof of Proposition 2.2 (i) We use a backward induction argument. For t = N − 1, . . . , 0, assume that, at stage t + 1, J̃^o_{t+1} ∈ F_{t+1} is such that sup_{x_{t+1} ∈ X_{t+1}} |J^o_{t+1}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ η_{t+1} for some η_{t+1} ≥ 0. In particular, for t = N − 1, one has η_N = 0, as J̃^o_N = J^o_N.
By (3), there exists f_t ∈ F_t such that sup_{x_t ∈ X_t} |(T_t J̃^o_{t+1})(x_t) − f_t(x_t)| ≤ ε_t. Set J̃^o_t = f_t. By the triangle inequality and Proposition 2.1,

    sup_{x_t ∈ X_t} |J^o_t(x_t) − J̃^o_t(x_t)| ≤ sup_{x_t ∈ X_t} |(T_t J^o_{t+1})(x_t) − (T_t J̃^o_{t+1})(x_t)| + sup_{x_t ∈ X_t} |(T_t J̃^o_{t+1})(x_t) − J̃^o_t(x_t)| ≤ β η_{t+1} + ε_t =: η_t.

Then, after N iterations we get sup_{x_0 ∈ X_0} |J^o_0(x_0) − J̃^o_0(x_0)| ≤ η_0 = ε_0 + β η_1 = ε_0 + β ε_1 + β² η_2 = · · · = Σ_{t=0}^{N−1} β^t ε_t.

(ii) As before, for t = N − 1, . . . , 0, assume that, at stage t + 1, J̃^o_{t+1} ∈ F_{t+1} is such that sup_{x_{t+1} ∈ X_{t+1}} |J^o_{t+1}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ η_{t+1} for some η_{t+1} ≥ 0. In particular, for t = N − 1, one has η_N = 0, as J̃^o_N = J^o_N. Let Ĵ^o_t = T_t J̃^o_{t+1}. Proposition 2.1 gives

    sup_{x_t ∈ X_t} |J^o_t(x_t) − Ĵ^o_t(x_t)| = sup_{x_t ∈ X_t} |(T_t J^o_{t+1})(x_t) − (T_t J̃^o_{t+1})(x_t)| ≤ β sup_{x_{t+1} ∈ X_{t+1}} |J^o_{t+1}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ β η_{t+1}.

Before moving to the tth stage, one has to find an approximation J̃^o_t ∈ F_t of J^o_t = T_t J^o_{t+1}. Such an approximation has to be obtained from Ĵ^o_t = T_t J̃^o_{t+1} (which, in general, may not belong to F_t), because J^o_t = T_t J^o_{t+1} is unknown. By assumption, there exists f_t ∈ F_t such that sup_{x_t ∈ X_t} |J^o_t(x_t) − f_t(x_t)| ≤ ε_t. However, in general, one cannot set J̃^o_t = f_t since, on a neighborhood of radius β η_{t+1} of Ĵ^o_t in the sup-norm, there may exist (besides J^o_t) some other function I_t ≠ J^o_t which can also be approximated by some function f̃_t ∈ F_t with error less than or equal to ε_t. As J^o_t is unknown, in the worst case it happens that one chooses J̃^o_t = f̃_t instead of J̃^o_t = f_t. In such a case, we get

    sup_{x_t ∈ X_t} |J^o_t(x_t) − J̃^o_t(x_t)| ≤ sup_{x_t ∈ X_t} |J^o_t(x_t) − Ĵ^o_t(x_t)| + sup_{x_t ∈ X_t} |Ĵ^o_t(x_t) − I_t(x_t)| + sup_{x_t ∈ X_t} |I_t(x_t) − J̃^o_t(x_t)| ≤ 2β η_{t+1} + ε_t.

Let η_t := 2β η_{t+1} + ε_t. Then, after N iterations we have sup_{x_0 ∈ X_0} |J^o_0(x_0) − J̃^o_0(x_0)| ≤ η_0 = ε_0 + 2β η_1 = ε_0 + 2β ε_1 + 4β² η_2 = · · · = Σ_{t=0}^{N−1} (2β)^t ε_t.

Proof of Proposition 2.3 Set η_{N/M} = 0 and, for t = N/M − 1, . . . , 0, assume that, at stage t + 1 of ADP(M), J̃^o_{t+1} ∈ F_{t+1} is such that sup_{x_{t+1} ∈ X_{t+1}} |J^o_{M·(t+1)}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ η_{t+1}. Proceeding as in the proof of Proposition 2.2(i), we get the recursion η_t = 2β^M η_{t+1} + ε_t (where β^M replaces β, since in each iteration of ADP(M) one can apply Proposition 2.1 M times).

In order to prove Proposition 3.1, we shall apply the following technical lemma (which readily follows by [53, Theorem 2.13, p. 69] and the example in [53, p. 70]). Given a square partitioned real matrix M = [A B; C D] such that D is nonsingular, Schur's complement M/D of D in M is defined [53, p. 18] as the matrix M/D = A − B D⁻¹ C. For a symmetric real matrix, we denote by λ_max its maximum eigenvalue.

Lemma 9.1 Let M = [A B; Bᵀ D] be a partitioned symmetric negative-semidefinite matrix such that D is nonsingular. Then λ_max(M/D) ≤ λ_max(M).

In the proof of the next theorem, we shall use the following notation. The symbol ∇ denotes the gradient operator when it is applied to a scalar-valued function and the Jacobian operator when it is applied to a vector-valued function. We use the notation ∇² for the Hessian. In the case of a composite function, e.g., f(g(x, y, z), h(x, y, z)), by ∇_i f(g(x, y, z), h(x, y, z)) we denote the gradient of f with respect to its ith (vector) argument, computed at (g(x, y, z), h(x, y, z)). The full gradient of f with respect to the argument x is denoted by ∇_x f(g(x, y, z), h(x, y, z)).
Similarly, by ∇²_{i,j} f(g(x, y, z), h(x, y, z)) we denote the submatrix of the Hessian of f computed at (g(x, y, z), h(x, y, z)), whose first indices belong to the vector argument i and whose second ones belong to the vector argument j. ∇J^o_t(x_t) is a column vector, and ∇g^o_t(x_t) is a matrix whose rows are the transposes of the gradients of the components of g^o_t(x_t). We denote by g^o_{t,j} the jth component of the optimal policy function g^o_t (j = 1, . . . , d). The other notations used in the proof are detailed in Sect. 3.

Proof of Proposition 3.1 (i) Let us first show by backward induction on t that J^o_t ∈ C^m(X_t) and, for every j ∈ {1, . . . , d}, g^o_{t,j} ∈ C^{m−1}(X_t) (which we also need in the proof). Since J^o_N = h_N, we have J^o_N ∈ C^m(X_N) by hypothesis. Now, fix t and suppose that J^o_{t+1} ∈ C^m(X_{t+1}) and is concave. Let x_t ∈ int(X_t). As by hypothesis the optimal policy g^o_t is interior on int(X_t), the first-order optimality condition ∇_2 h_t(x_t, g^o_t(x_t)) + β ∇J^o_{t+1}(g^o_t(x_t)) = 0 holds. By the implicit function theorem we get

    ∇g^o_t(x_t) = −[∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t))]⁻¹ ∇²_{2,1} h_t(x_t, g^o_t(x_t)),   (39)

where ∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t)) is nonsingular, as ∇²_{2,2} h_t(x_t, g^o_t(x_t)) is negative semidefinite by the α_t-concavity of h_t for α_t > 0, and ∇²J^o_{t+1}(g^o_t(x_t)) is negative definite since J^o_{t+1} is concave.

By differentiating the two members of (39) up to derivatives of h_t and J^o_{t+1} of order m, we get g^o_{t,j} ∈ C^{m−1}(int(X_t)) for j = 1, . . . , d. As the expressions that one can obtain for its partial derivatives up to the order m − 1 are bounded and continuous not only on int(X_t) but on the whole X_t, one has g^o_{t,j} ∈ C^{m−1}(X_t).

By differentiating the equality J^o_t(x_t) = h_t(x_t, g^o_t(x_t)) + β J^o_{t+1}(g^o_t(x_t)) we obtain

    ∇J^o_t(x_t)ᵀ = ∇_1 h_t(x_t, g^o_t(x_t))ᵀ + [∇_2 h_t(x_t, g^o_t(x_t)) + β ∇J^o_{t+1}(g^o_t(x_t))]ᵀ ∇g^o_t(x_t).

So, by the first-order optimality condition we get

    ∇J^o_t(x_t) = ∇_1 h_t(x_t, g^o_t(x_t)).   (40)

By differentiating the two members of (40) up to derivatives of h_t of order m, we obtain J^o_t ∈ C^m(int(X_t)). As for the optimal policies, this extends to J^o_t ∈ C^m(X_t). In order to conclude the backward induction step, it remains to show that J^o_t is concave. This can be proved by the following direct argument. By differentiating (40) and using (39), for the Hessian of J^o_t we obtain

    ∇²J^o_t(x_t) = ∇²_{1,1} h_t(x_t, g^o_t(x_t)) − ∇²_{1,2} h_t(x_t, g^o_t(x_t)) [∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t))]⁻¹ ∇²_{2,1} h_t(x_t, g^o_t(x_t)),

which is Schur's complement of [∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t))] in the matrix

    [ ∇²_{1,1} h_t(x_t, g^o_t(x_t))   ∇²_{1,2} h_t(x_t, g^o_t(x_t))                                  ]
    [ ∇²_{2,1} h_t(x_t, g^o_t(x_t))   ∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t)) ].

Note that such a matrix is negative semidefinite, as it is the sum of the two matrices

    [ ∇²_{1,1} h_t(x_t, g^o_t(x_t))   ∇²_{1,2} h_t(x_t, g^o_t(x_t)) ]        [ 0   0                               ]
    [ ∇²_{2,1} h_t(x_t, g^o_t(x_t))   ∇²_{2,2} h_t(x_t, g^o_t(x_t)) ]   and  [ 0   β ∇²J^o_{t+1}(g^o_t(x_t)) ],

which are negative semidefinite, as h_t and J^o_{t+1} are concave and twice continuously differentiable. In particular, it follows by [54, p. 102] (which gives bounds on the eigenvalues of the sum of two symmetric matrices) that its maximum eigenvalue is smaller than or equal to −α_t.
Then, it follows by Lemma 9.1 that J^o_t is concave (even α_t-concave). Thus, by backward induction on t and by the compactness of X_t, we conclude that, for every t = N, . . . , 0, J^o_t ∈ C^m(X_t) ⊂ W^m_p(int(X_t)) for every 1 ≤ p ≤ +∞.

(ii) As X_t is bounded and convex, by Sobolev's extension theorem [34, Theorem 5, p. 181, and Example 2, p. 189], for every 1 ≤ p ≤ +∞, the function J^o_t ∈ W^m_p(int(X_t)) can be extended on the whole R^d to a function J̄^{o,p}_t ∈ W^m_p(R^d).

(iii) For 1 < p < +∞, the statement follows by item (ii) and the equivalence between Sobolev spaces and Bessel potential spaces [34, Theorem 3, p. 135]. For p = 1 and m ≥ 2 even, it follows by item (ii) and the inclusion W^m_1(R^d) ⊂ B^m_1(R^d) from [34, p. 160].

Proof of Proposition 3.2 (i) It is proved like Proposition 3.1, by replacing J^o_{t+1} with J̃^o_{t+1} and g^o_t with g̃^o_t.

(ii) Inspection of the proof of Proposition 3.1(i) shows that J^o_t is α_t-concave (α_t > 0) for t = 0, . . . , N − 1, whereas the α_N-concavity (α_N > 0) of J^o_N = h_N is assumed. By (12) and condition (10), J̃^o_{t+1,j} is concave for j sufficiently large. Hence, one can apply (i) to J̃^o_{t+1,j}, and so there exists Ĵ^{o,p}_{t,j} ∈ W^m_p(R^d) such that T_t J̃^o_{t+1,j} = Ĵ^{o,p}_{t,j}|_{X_t}. Proceeding as in the proof of Proposition 3.1, one obtains equations analogous to (39) and (40) (with obvious replacements). Then, by differentiating T_t J̃^o_{t+1,j} up to the order m, we get

    lim_{j→∞} max_{0≤|r|≤m} sup_{x_t ∈ X_t} |D^r J^o_t(x_t) − D^r (T_t J̃^o_{t+1,j})(x_t)| = 0.

Finally, the statement follows by the continuity of the embedding of C^m(X_t) into W^m_p(int(X_t)) (since X_t is compact) and the continuity of Sobolev's extension operator.

Proof of Proposition 4.1 (i) For ω ∈ R^d, let M(ω) = max{‖ω‖, 1}, let ν be a positive integer, and define the set of functions

    Γ^ν(R^d) := {f ∈ L²(R^d) : ∫_{R^d} M(ω)^ν |f̂(ω)| dω < ∞},

where f̂ is the Fourier transform of f. For f ∈ Γ^ν(R^d), let

    ‖f‖_{Γ^ν(R^d)} := ∫_{R^d} M(ω)^ν |f̂(ω)| dω,

and, for θ > 0, denote by

    B_θ(‖·‖_{Γ^ν(R^d)}) := {f ∈ L²(R^d) : ∫_{R^d} M(ω)^ν |f̂(ω)| dω ≤ θ}

the closed ball of radius θ in Γ^ν(R^d). By [55, Corollary 3.2]³, the compactness of the support of ψ, and the regularity of its boundary (which allows one to apply the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168]), for s = d/2 + 1 and ψ ∈ S^{q+s}, there exists⁴ C₁ > 0 such that, for every f ∈ B_θ(‖·‖_{Γ^{q+s+1}}) and every positive integer n, there is f_n ∈ R(ψ, n) such that

    max_{0≤|r|≤q} sup_{x ∈ X} |D^r f(x) − D^r f_n(x)| ≤ C₁ θ/√n.   (41)

The next step consists in proving that, for every positive integer ν and s = d/2 + 1, the space W^{ν+s}_2(R^d) is continuously embedded in Γ^ν(R^d). Let f ∈ W^{ν+s}_2(R^d). Then

    ∫_{R^d} M(ω)^ν |f̂(ω)| dω = ∫_{‖ω‖≤1} |f̂(ω)| dω + ∫_{‖ω‖>1} ‖ω‖^ν |f̂(ω)| dω.

The first integral is finite by the Cauchy–Schwarz inequality and the finiteness of ∫_{‖ω‖≤1} |f̂(ω)|² dω. To study the second integral, taking the hint from [37, p. 941], we factorize ‖ω‖^ν |f̂(ω)| = a(ω) b(ω), where a(ω) := (1 + ‖ω‖^{2s})^{−1/2} and b(ω) := ‖ω‖^ν |f̂(ω)| (1 + ‖ω‖^{2s})^{1/2}. By the Cauchy–Schwarz inequality,

    ∫_{‖ω‖>1} ‖ω‖^ν |f̂(ω)| dω ≤ (∫_{R^d} a²(ω) dω)^{1/2} (∫_{R^d} b²(ω) dω)^{1/2}.

The integral ∫_{R^d} a²(ω) dω = ∫_{R^d} (1 + ‖ω‖^{2s})^{−1} dω is finite for 2s > d, which is satisfied for all d ≥ 1, as s = d/2 + 1. By Parseval's identity [57, p. 172], since f has square-integrable νth and (ν + s)th partial derivatives, the integral ∫_{R^d} b²(ω) dω = ∫_{R^d} ‖ω‖^{2ν} |f̂(ω)|² (1 + ‖ω‖^{2s}) dω = ∫_{R^d} |f̂(ω)|² (‖ω‖^{2ν} + ‖ω‖^{2(ν+s)}) dω is finite.
Hence, ∫_{R^d} M(ω)^ν |f̂(ω)| dω is finite, so f ∈ Γ^ν(R^d), and, by the argument above, there exists C₂ > 0 such that B_ρ(‖·‖_{W^{ν+s}_2}) ⊂ B_{C₂ρ}(‖·‖_{Γ^ν}).

Taking ν = q + s + 1 as required in (41) and C = C₁ · C₂, we conclude that, for every f ∈ B_ρ(‖·‖_{W^{q+2s+1}_2}) and every positive integer n, there exists f_n ∈ R(ψ, n) such that max_{0≤|r|≤q} sup_{x ∈ X} |D^r f(x) − D^r f_n(x)| ≤ C ρ/√n.

(ii) Follows by [40, Theorem 2.1] and the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], which allows one to use "sup" in (20) instead of "ess sup".

(iii) Follows by [58, Corollary 5.2].

³ Note that [55, Corollary 3.2] uses "ess sup" instead of "sup" in (41). However, by the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], one can replace "ess sup" with "sup".

⁴ Unfortunately, [55, Corollary 3.2] provides neither a closed-form expression of C₁ nor an upper bound on it. For results similar to [55, Corollary 3.2] and for specific choices of ψ, [55] gives upper bounds on similar constants (see, e.g., [55, Theorem 2.3 and Corollary 3.3]).

Proof of Proposition 4.2 (i) We detail the proof for t = N − 1 and t = N − 2; the other cases follow by backward induction.

Let us start with t = N − 1 and J̃^o_N = J^o_N. By Proposition 3.1(ii), there exists J̄^{o,2}_{N−1} ∈ W^{2+(2s+1)N}_2(R^d) such that T_{N−1} J̃^o_N = T_{N−1} J^o_N = J^o_{N−1} = J̄^{o,2}_{N−1}|_{X_{N−1}}. By Proposition 4.1(i) with q = 2 + (2s+1)(N−1) applied to J̄^{o,2}_{N−1}, we obtain (22) for t = N − 1. Set J̃^o_{N−1} = f_{N−1} in (22). By (22) and condition (10), there exists a positive integer n̄_{N−1} such that J̃^o_{N−1} is concave for n_{N−1} ≥ n̄_{N−1}.

Now consider t = N − 2. By Proposition 3.2(i), it follows that there exists Ĵ^{o,2}_{N−2} ∈ W^{2+(2s+1)(N−1)}_2(R^d) such that T_{N−2} J̃^o_{N−1} = Ĵ^{o,2}_{N−2}|_{X_{N−2}}. By applying to Ĵ^{o,2}_{N−2} Proposition 4.1(i) with q = 2 + (2s+1)(N−2), for every positive integer n_{N−2} we conclude that there exists f_{N−2} ∈ R(ψ_t, n_{N−2}) such that

    max_{0≤|r|≤2+(2s+1)(N−2)} sup_{x_{N−2} ∈ X_{N−2}} |D^r (T_{N−2} J̃^o_{N−1})(x_{N−2}) − D^r f_{N−2}(x_{N−2})| ≤ C̄_{N−2} ‖Ĵ^{o,2}_{N−2}‖_{W^{2+(2s+1)(N−1)}_2(R^d)} / √n_{N−2},   (42)

where, by Proposition 3.2(i), Ĵ^{o,2}_{N−2} ∈ W^{2+(2s+1)(N−1)}_2(R^d) is a suitable extension of T_{N−2} J̃^o_{N−1} to R^d, and C̄_{N−2} > 0 does not depend on the approximations generated in the previous iterations. The statement for t = N − 2 follows by the fact that the dependence of the bound (42) on ‖Ĵ^{o,2}_{N−2}‖_{W^{2+(2s+1)(N−1)}_2(R^d)} can be removed by exploiting Proposition 3.2(ii); in particular, we can choose C_{N−2} > 0 independently of n_{N−1}. So, we get (22) for t = N − 2. Set J̃^o_{N−2} = f_{N−2} in (22). By (22) and condition (10), there exists a positive integer n̄_{N−2} such that J̃^o_{N−2} is concave for n_{N−2} ≥ n̄_{N−2}. The proof proceeds similarly for the other values of t; each constant C_t can be chosen independently of n_{t+1}, . . . , n_{N−1}.

(ii) Follows by Proposition 3.1(ii) (with p = +∞) and Proposition 4.1(ii).

(iii) Follows by Proposition 3.1(iii) (with p = 1) and Proposition 4.1(iii).

Proof of Proposition 5.1 We first derive some constraints on the form of the sets A_{t,j}, and then show that the budget constraints (25) are satisfied if and only if the sets A_{t,j} are chosen as in Assumption 5.1 (or are suitable subsets). As the labor incomes y_{t,j} and the interest rates r_{t,j} are known, for t = 1, . . . , N, we have
    a_{t,j} ≤ a_{0,j} Π_{k=0}^{t−1} (1 + r_{k,j}) + Σ_{i=0}^{t−1} y_{i,j} Π_{k=i}^{t−1} (1 + r_{k,j}) =: a_{t,j}^max

(the upper bound is achieved when all the consumptions c_{t,j} are equal to 0), so the corresponding feasible sets A_{t,j} are bounded from above by a_{t,j}^max. The boundedness from below of each A_{t,j} follows from the budget constraints (25), which for c_{k,j} = 0 (k = t, . . . , N) are equivalent, for t = N, to

    a_{N,j} ≥ −y_{N,j}   (43)

and, for t = 0, . . . , N − 1, to a_{t,j} Π_{k=t}^{N−1} (1 + r_{k,j}) + Σ_{i=t}^{N−1} y_{i,j} Π_{k=i}^{N−1} (1 + r_{k,j}) + y_{N,j} ≥ 0, i.e.,

    a_{t,j} ≥ − (Σ_{i=t}^{N−1} y_{i,j} Π_{k=i}^{N−1} (1 + r_{k,j}) + y_{N,j}) / Π_{k=t}^{N−1} (1 + r_{k,j}).   (44)

So, in order to satisfy the budget constraints (25), the constraints (43) and (44) have to be satisfied. Then the maximal sets A_t that satisfy the budget constraints (25) have the form described in Assumption 5.1.

Proof of Proposition 5.2 (a) About Assumption 3.1(i). By construction, the sets Ā_t are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the D̄_t since, by (31), they are the intersections between Ā_t × Ā_{t+1} and the sets D_t, which are compact, convex, and have nonempty interiors too.

(b) About Assumption 3.1(ii). This is Assumption 5.2(i), with the obvious replacements of X_t and D_t.

(c) About Assumption 3.1(iii). Recall that for Problem OC^d_N and t = 0, . . . , N − 1, we have

    h_t(a_t, a_{t+1}) = u(((1 + r_t) ∘ (a_t + y_t) − a_{t+1}) / (1 + r_t)) + Σ_{j=1}^d v_{t,j}(a_{t,j}).

Then, h_t ∈ C^m(D̄_t) by Assumption 5.2(ii) and (iii). As u(·) and the v_{t,j}(·) are twice continuously differentiable, the second part of Assumption 3.1(iii) means that there exists some α_t > 0 such that the function

    u(((1 + r_t) ∘ (a_t + y_t) − a_{t+1}) / (1 + r_t)) + Σ_{j=1}^d v_{t,j}(a_{t,j}) + (α_t/2) ‖a_t‖²

has negative-semidefinite Hessian with respect to the variables a_t and a_{t+1}. Assumption 5.2(ii) and easy computations show that the function u(((1 + r_t) ∘ (a_t + y_t) − a_{t+1}) / (1 + r_t)) has negative-semidefinite Hessian. By Assumption 5.2(iii), for each j = 1, . . . , d and α_{t,j} ∈ (0, β_{t,j}], the function v_{t,j}(a_{t,j}) + (α_{t,j}/2) a_{t,j}² has negative-semidefinite Hessian too. So, Assumption 3.1(iii) is satisfied for every α_t ∈ (0, min_{j=1,...,d} {β_{t,j}}].

(d) About Assumption 3.1(iv). Recall that for Problem OC^d_N we have h_N(a_N) = u(a_N + y_N). Then, h_N ∈ C^m(Ā_N) and is concave by Assumption 5.2(ii).

References

1. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
2. Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
3. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, Hoboken (2007)
4. Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)
5. Zoppoli, R., Parisini, T., Sanguineti, M., Gnecco, G.: Neural Approximations for Optimal Control and Decision. Springer, London (2012, in preparation)
6. Haykin, S.: Neural Networks: a Comprehensive Foundation. Prentice Hall, New York (1998)
7. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont (2005)
8. Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)
9. Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation: a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)
10. Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)
11. Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)
12. Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)
13. Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)
14. Philbrick, C.R. Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)
15. Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)
16. Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)
17. Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)
18. Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)
19. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)
20. Zoppoli, R., Sanguineti, M., Parisini, T.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)
21. Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)
22. Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)
23. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2. Athena Scientific, Belmont (2007)
24. White, D.J.: Markov Decision Processes. Wiley, New York (1993)
25. Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978)
26. Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)
27. Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)
28. Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)
29. Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)
30. Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)
31. Nawijn, W.M.: Look-ahead policies for admission to a single-server loss system. Oper. Res. 38, 854–862 (1990)
32. Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)
33. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1993)
34. Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)
35. Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)
36. Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)
37. Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993)
38. Gnecco, G., Kůrková, V., Sanguineti, M.: Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw. 24, 171–182 (2011)
39. Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
40. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)
41. Kainen, P.C., Kůrková, V., Sanguineti, M.: Complexity of Gaussian radial-basis networks approximating smooth functions. J. Complex. 25, 63–74 (2009)
42. Alessandri, A., Gnecco, G., Sanguineti, M.: Minimizing sequences for a family of functional optimal estimation problems. J. Optim. Theory Appl. 147, 243–262 (2010)
43. Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)
44. Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)
45. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)
46. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)
47. Sobol', I.: The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7, 784–802 (1967)
48. Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)
49. Boldrin, M., Montrucchio, L.: On the indeterminacy of capital accumulation paths. J. Econ. Theory 40, 26–39 (1986)
50. Dawid, H., Kopel, M., Feichtinger, G.: Complex solutions of nonconcave dynamic optimization models. Econ. Theory 9, 427–439 (1997)
51. Chambers, J., Cleveland, W.: Graphical Methods for Data Analysis. Wadsworth/Cole Publishing Company, Pacific Grove (1983)
52. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
53. Zhang, F. (ed.): The Schur Complement and Its Applications. Springer, New York (2005)
54. Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford Science Publications, Oxford (2004)
55. Hornik, K., Stinchcombe, M., White, H., Auer, P.: Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 6, 1262–1275 (1994)
56. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, San Diego (2003)
57. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973)
58. Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher's complexity. Appl. Math. Sci. 2, 153–176 (2008)