J Optim Theory Appl (2013) 156:380–416 DOI 10.1007/s10957-012-0118-2

Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results

Mauro Gaggero · Giorgio Gnecco · Marcello Sanguineti

Received: 4 August 2011 / Accepted: 21 June 2012 / Published online: 6 July 2012
© Springer Science+Business Media, LLC 2012

Abstract Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insights into the successful performances reported in the literature on the use of value-function approximators in DP. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.

Keywords Sequential decision problems · Dynamic programming · Approximation schemes · Curse of dimensionality · Suboptimal solutions · Optimal consumption

Communicated by Francesco Zirilli.

M. Gaggero
Institute of Intelligent Systems for Automation, National Research Council of Italy, Genova, Italy
e-mail: [email protected]

G. Gnecco · M. Sanguineti
DIBRIS, University of Genova, Genova, Italy
e-mail: [email protected]

G. Gnecco
e-mail: [email protected]

1 Introduction

Tasks that require making sequential decisions so as to maximize a reward (or minimize a cost) expressed as a summation over stages arise in a variety of applications. Often, a model of the process evolving through the stages is assumed to be available, and the decisions taken at each step depend on a "state variable" that captures the "history" of the process. Examples can be found in scheduling fleets of vehicles, managing systems of water reservoirs, allocating resources (e.g., people, equipment, commodities, and facilities), selling assets, investing money in portfolios, optimizing transportation or telecommunication networks, inventory forecasting, financial planning, etc. Depending on the application context, both continuous and discrete states are considered. In this paper, we are interested in the case where the state can take a continuum of values.

Sequential decision problems have been extensively studied by means of the Dynamic Programming (DP) methodology [1]. DP solves sequential decision problems iteratively, by introducing at each stage the value function (called cost-to-go function when a cost has to be minimized), which expresses the optimal value of the reward from the current stage onward as a function of the state at that stage. The solution is formally obtained by means of recursive equations. However, closed-form solutions to such equations can be derived only in particular cases. In general, one has to search for suboptimal solutions [2–5]. We refer to the various techniques and algorithms developed to this end as Approximate Dynamic Programming (ADP).
Effective approximation approaches require understanding the structure of the problem at hand. In the case of continuous problems, they typically share the feature of combining DP with tools from approximation theory, so as to replace the value functions with simple approximators (e.g., orthogonal polynomials, splines, and neural networks [6]) containing parameters to be optimized (e.g., the coefficients of orthogonal polynomials, or the weights and connections in the computational units of neural networks). Knowledge of smoothness properties of the value functions is useful to choose suitable approximation strategies (see, e.g., the discussion in [3, Chap. 11]).

In the basic version of ADP, the current and next state vectors are first discretized by using a number of levels in each of their components. In such a way, the application of DP requires solving the recursive equations only for a finite number of state values. However, in order to obtain the solution of the original sequential decision problem, one needs to know each value function not merely at the discretized states, but also at the other states that might be "visited" by the DP algorithm. Hence, as a second step, a suitable technique has to be applied to approximate each value function at the states outside the discretization set (see, e.g., [3, Chaps. 7 and 11] and [7, Chap. 6]).

The idea of combining DP with approximations of the value functions arose at the very beginning of DP. After the seminal contributions [1] and [8], it is possible to trace an evolution going from polynomial approximation [8–10] to spline interpolation [11, 12] and neural networks [5, 13]. Interpolation methods were developed in [14] to approximate the value functions of high-dimensional problems. Several methods that involve the use of neural networks were presented in [2] under the name of "neuro-dynamic programming". A nice exposition of approximation methods for continuous-state problems can be found in [15]. Among recent monographs on ADP methods and algorithms, we mention [3] and [4].

The aim of this paper is to investigate how DP and suitable approximations of the value functions can be combined to develop a methodology that allows one to face high-dimensional, continuous-state, sequential decision problems. As pointed out in [2, Chap. 6, p. 335], in order to have performance guarantees in value-function approximation, "the function approximator must be able to closely represent all of the intermediate cost-to-go functions" . . . "Given that such a condition is in general very difficult to verify, one must either accept the risk of a divergent algorithm (when the decision horizon goes to infinity) or else restrict to particular types of function approximators under which divergent behavior becomes impossible". The search for such approximators is the departure point of our work.

We consider approximations of the value functions taking on the form of linear combinations of basis functions obtained from a chosen "mother function" by varying some "inner parameters". For instance, the mother function can be a Gaussian, in which case the inner parameters are the variance and the coordinates of its center. Such inner parameters have to be optimized together with the coefficients of the linear combinations.
Since the basis functions can vary with the inner parameters, while the structure of the mother function remains unchanged, one has a variable-basis approximation scheme [16]. In contrast, traditional approaches are fixed-basis approximation schemes, as they are made up of linear combinations of a certain number of a priori fixed basis functions (e.g., algebraic and trigonometric polynomials). In particular, we consider variable-basis functions that model approximators successfully used in the applications of ADP to sequential decision problems, such as splines, Gaussian radial-basis functions, and neural networks [6].

Unfortunately, the number of basis functions (hence, the number of coefficients to be optimized) required to guarantee a desired approximation accuracy of the value functions may grow "very fast" with the dimension of the state vector. This behavior, known as curse of dimensionality in value-function approximation, is a major source of difficulties in the development of computationally efficient ADP techniques. However, there is large experimental evidence of the effectiveness of variable-basis approximation schemes with certain mother functions (see, e.g., [2, 17], [3, Sect. 7.4], and [18]). This calls for a thorough theoretical investigation, which, to the best of our knowledge, is still lacking in the literature (see, e.g., the remarks in [3, Sect. 7.4] and [7, Chap. 6]). For example, quoting [19, p. 61], the use of neural networks to approximate value functions "has led to some successes", but "it is very hard to quantify or analyze the performance of neural-network-based techniques."

In this paper, we provide such an analysis for a variety of variable-basis approximators used in ADP, including several kinds of neural networks. Our results give conditions under which a relatively small number of parameterized basis functions (e.g., the number of computational units in neural networks; hence, the total number of parameters to be adjusted) is large enough to guarantee sufficiently accurate suboptimal solutions. The numerical comparison between variable-basis schemes and linear approximators in the solution of optimization problems has received very little attention in the literature (we are aware only of the papers [20, 21] and the monograph [5], in preparation). To gain insights in this direction, we compare from a numerical point of view the proposed variable-basis approximation schemes with fixed-basis ones, showing that the former provide, in general, a better accuracy in the approximation of the value functions, the number of computational units being the same. To the best of our knowledge, estimates of this kind have not been derived before.

The paper is organized as follows. Section 2 states the considered sequential decision problem, summarizes the DP algorithm, and estimates the error propagation in ADP with value-function approximation. Section 3 derives smoothness properties of the value functions. In Sect. 4, such properties are combined with tools from approximation theory, in order to investigate the accuracy of suboptimal solutions obtained via our methodology. Section 5 applies the approach to a problem of optimal consumption. Section 6 describes the ADP procedure suggested by our theoretical study. Simulation results are given in Sect. 7, where variable-basis schemes are compared with fixed-basis ones. Section 8 contains some final remarks.
To keep continuity of exposition, all proofs are collected in the Appendix.

2 Error Propagation

We consider the following model of an N-stage, continuous-state, sequential decision problem, in which a reward functional, expressed as a summation of N terms, has to be maximized [22].

Problem $\Sigma_N$. For $x_0 \in X_0$, find
\[
J^o(x_0) := \sup \left\{ \sum_{t=0}^{N-1} \beta^t h_t(x_t, x_{t+1}) + \beta^N h_N(x_N) \right\}
\]
\[
\text{s.t. } (x_t, x_{t+1}) \in D_t, \quad t = 0, 1, \ldots, N-1, \quad x_t \in X_t,
\]
where: $x_t \in X_t \subseteq \mathbb{R}^{d_t}$, $t = 0, 1, \ldots, N$, is the state vector at time t, and $X_t$ is the state space; the next state $x_{t+1}$, $t = 0, 1, \ldots, N-1$, has to be chosen subject to the constraint $(x_t, x_{t+1}) \in D_t$, where $D_t \subseteq X_t \times X_{t+1}$ is a correspondence that models the transition from stage t to stage t + 1; $h_t : D_t \to \mathbb{R}$, $t = 0, 1, \ldots, N-1$, are transition rewards that depend only on the current and next states; $h_N : X_N \to \mathbb{R}$ is the final reward, which depends only on the final state $x_N$; $\beta \in [0, 1]$ is a fixed discount factor (the case $\beta = 0$ corresponds to a static optimization problem).

Under mild hypotheses on the correspondence and the reward functions [1, 7, 23], DP allows one to formally solve Problem $\Sigma_N$ in an iterative way, by introducing, for $t = N-1, \ldots, 0$, the following subproblems:
\[
J_N^o(x_N) := h_N(x_N), \tag{1a}
\]
\[
J_t^o(x_t) := \sup \sum_{k=t}^{N-1} \beta^{k-t} h_k(x_k, x_{k+1}) + \beta^{N-t} h_N(x_N) \tag{1b}
\]
\[
\text{s.t. } (x_k, x_{k+1}) \in D_k, \quad k = t, \ldots, N-1. \tag{1c}
\]
The function $J_t^o : X_t \to \mathbb{R}$ is called the tth value function (tth cost-to-go function when a cost has to be minimized). Subproblems (1a)–(1c) can be restated in terms of Bellman's operators $T_t$, defined for every $t = N-1, \ldots, 0$ and every bounded continuous function $f_{t+1}$ on $X_{t+1}$ as
\[
(T_t f_{t+1})(x_t) := \sup_{y \in D_t(x_t)} \big\{ h_t(x_t, y) + \beta f_{t+1}(y) \big\},
\]
where, for $x_t \in X_t$, we let $D_t(x_t) := \{ y \in X_{t+1} \mid (x_t, y) \in D_t \}$. In terms of Bellman's operators, subproblems (1a)–(1c) satisfy the following recursive equations for the value functions, known as Bellman's equations:
\[
J_N^o(x_N) = h_N(x_N), \tag{2a}
\]
\[
J_t^o(x_t) = (T_t J_{t+1}^o)(x_t), \quad t = N-1, \ldots, 0. \tag{2b}
\]
Their iterated application is the DP algorithm. The solution to Problem $\Sigma_N$ is given by $J_0^o(x_0) = J^o(x_0)$.

For $t = 0, \ldots, N-1$, let us denote by $\tilde J_t^o$ the approximation of the tth value function $J_t^o$, obtained from a given approximating family $F_t$. The use of approximate value functions is the essence of the ADP algorithm. The last stage does not require any approximation, as one can set $\tilde J_N^o = J_N^o := h_N$. Suppose that, at a certain stage t + 1, one has at one's disposal an approximation $\tilde J_{t+1}^o$ (obtained from previous iterations of ADP) of the (t + 1)th value function $J_{t+1}^o$. By replacing $J_{t+1}^o$ with $\tilde J_{t+1}^o$ in Eqs. (2a)–(2b), one gets, instead of $J_t^o(x_t)$,
\[
\hat J_t^o(x_t) = \big(T_t \tilde J_{t+1}^o\big)(x_t).
\]
In general, $\hat J_t^o \neq J_t^o$. Thus, there is an error propagation from one iteration of ADP to the following one. Since $\hat J_t^o$ may not belong to the approximating family $F_t$ used at stage t, before performing the next iteration of ADP one has to approximate $\hat J_t^o$ by an element $\tilde J_t^o \in F_t$ (this is done, e.g., by choosing the function $\tilde J_t^o$ that minimizes a suitable error between $\tilde J_t^o$ and $\hat J_t^o$). Suppose that one is able to guarantee that the approximation $\tilde J_{t+1}^o$ of $J_{t+1}^o$ is "sufficiently accurate"; then, one would like the approximations $\hat J_t^o$ of $J_t^o$ and $\tilde J_t^o$ of $\hat J_t^o$ (hence of $J_t^o$) to be "sufficiently accurate" too.
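To make the backward recursion concrete, the following minimal sketch (not from the paper; the scalar state, the reward, the feasible set, and the polynomial fitting family are assumptions chosen only for illustration) computes $\hat J_t^o = T_t \tilde J_{t+1}^o$ at sampled states and fits an element of $F_t$ to it — exactly the step whose error propagation is analyzed below.

```python
# A minimal sketch of one backward ADP iteration, assuming a scalar state,
# a hypothetical reward h_t, and a least-squares polynomial standing in
# for the approximating family F_t.
import numpy as np
from scipy.optimize import minimize_scalar

beta = 0.9

def h_t(x, y):
    # hypothetical transition reward (placeholder for the problem's h_t)
    return -0.5 * x**2 - 0.5 * (y - 0.5 * x)**2

def bellman_apply(J_next, x, y_lo=0.0, y_hi=1.0):
    # (T_t J_next)(x) = max_{y in D_t(x)} { h_t(x, y) + beta * J_next(y) },
    # with D_t(x) = [y_lo, y_hi] as a stand-in feasible set
    res = minimize_scalar(lambda y: -(h_t(x, y) + beta * J_next(y)),
                          bounds=(y_lo, y_hi), method="bounded")
    return -res.fun

def adp_step(J_next, samples):
    # evaluate \hat J_t at the sampled states, then fit \tilde J_t in F_t
    values = np.array([bellman_apply(J_next, x) for x in samples])
    coeffs = np.polyfit(samples, values, deg=4)
    return lambda x: np.polyval(coeffs, x)

J_tilde = lambda x: 1.0 - 0.5 * x**2      # stand-in for \tilde J_{t+1}
samples = np.linspace(0.0, 1.0, 50)       # discretized states of X_t
J_t = adp_step(J_tilde, samples)          # \tilde J_t, used at stage t-1
```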
In order to investigate how the error propagates from one iteration of ADP to the next one, a suitable norm has to be chosen to evaluate the approximation error. Given a Lebesgue-measurable set $\Omega \subseteq \mathbb{R}^d$ and $1 \le p \le +\infty$, we denote by $L_p(\Omega)$ the corresponding Lebesgue space, where integration is performed with respect to the Lebesgue measure (for bounded and continuous functions, the $L_\infty$ norm coincides with the supremum norm). For our purposes, the following assumption is needed.

Assumption 2.1 The final reward $h_N$ is bounded and continuous on $X_N$, and, for $t = 0, \ldots, N-1$, the transition reward $h_t$ is bounded and continuous on $D_t$. For every $x_t \in X_t$, the set $D_t(x_t)$ is nonempty.

We shall exploit the following known result (see, e.g., [22, Theorem 3.3, p. 54]).

Proposition 2.1 If Assumption 2.1 holds and $f_{t+1}$, $k_{t+1}$ are bounded and continuous on $X_{t+1}$, then
\[
\sup_{x_t \in X_t} \big| (T_t f_{t+1})(x_t) - (T_t k_{t+1})(x_t) \big| \le \beta \sup_{x_{t+1} \in X_{t+1}} \big| f_{t+1}(x_{t+1}) - k_{t+1}(x_{t+1}) \big|.
\]

The next proposition provides an upper bound on the approximation error through the N stages; the estimate in (i) is analogous to the one derived in [2, pp. 332–333] for infinite-horizon problems.

Proposition 2.2 (Error Propagation in ADP) Let Assumption 2.1 hold. Suppose that, for every $t = 0, 1, \ldots, N$, $J_t^o$ is bounded and continuous and, for $t = 0, 1, \ldots, N-1$, let $F_t$ be a family of bounded and continuous functions on $X_t$. Let $\tilde J_N^o = J_N^o := h_N$.
(i) If, for $t = 0, 1, \ldots, N-1$, there exists $f_t \in F_t$ such that
\[
\sup_{x_t \in X_t} \big| \big(T_t \tilde J_{t+1}^o\big)(x_t) - f_t(x_t) \big| \le \varepsilon_t \tag{3}
\]
and one takes $\tilde J_t^o = f_t$, then
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} \beta^t \varepsilon_t. \tag{4}
\]
(ii) If, for every $t = 0, 1, \ldots, N-1$, there exists $f_t \in F_t$ such that
\[
\sup_{x_t \in X_t} \big| J_t^o(x_t) - f_t(x_t) \big| \le \varepsilon_t, \tag{5}
\]
then one can choose each $\tilde J_t^o$ on the basis of the information available at every stage t in such a way that
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} (2\beta)^t \varepsilon_t. \tag{6}
\]

Condition (3) requires that, at every stage t, the family $F_t$ be "flexible enough" to approximate with a desired degree of accuracy every function of the form $T_t \tilde J_{t+1}^o$, obtained by applying Bellman's operator to the previous-stage approximation $\tilde J_{t+1}^o$. As remarked in [2, p. 335], for small values of $\varepsilon_t$, this may be rather difficult. In Sect. 4, we shall provide conditions under which (3) holds for suitable classes of functions $F_t$. Condition (5) is a much weaker requirement than (3): it expresses the capability of $F_t$ to approximate, at every stage t, merely the true value function $J_t^o = T_t J_{t+1}^o$. However, by replacing (3) with (5), one gets, instead of (4), the upper bound (6), where the factor $2\beta$ replaces $\beta$ inside the summation. When $\varepsilon_t$ does not depend on t, for $\beta \in [1/2, 1]$ the bound (6) may exhibit a curse of time horizon (we use this terminology by analogy with the curse of dimensionality). This may happen, e.g., when the same family of approximators is used at each stage and the value functions at all stages are "almost identical". When the transition rewards are stage-independent, the latter is reasonable for N large enough, since, as the horizon N tends to infinity, the tth value functions typically converge to a stationary value function. In this case, the right-hand side of (6) becomes a geometric series.
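The following few lines (a numerical illustration with assumed values of $\beta$ and $\varepsilon_t$, not results from the paper) evaluate the right-hand sides of (4) and (6) and make the curse of time horizon visible: for $\beta \ge 1/2$ and stage-independent $\varepsilon_t$, the bound (6) grows geometrically with N.

```python
# Numerical illustration (assumed values) of the error-propagation bounds.
def bound_4(beta, eps, N):   # sum_t beta^t * eps_t, eq. (4)
    return sum(beta**t * eps for t in range(N))

def bound_6(beta, eps, N):   # sum_t (2*beta)^t * eps_t, eq. (6)
    return sum((2.0 * beta)**t * eps for t in range(N))

for N in (5, 10, 20):
    print(N, bound_4(0.9, 1e-3, N), bound_6(0.9, 1e-3, N))
# bound (4) stays below eps/(1-beta), while bound (6) blows up with N
```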
So, in the case of a "nearly constant" value of $\varepsilon_t$ and a sufficiently large horizon N, computational feasibility limits the range of applicability of Proposition 2.2(ii) to $\beta \in [0, 1/2)$; if, instead, the horizon is sufficiently small, then the whole range $\beta \in [0, 1]$ works. Values of β not very close to 1 are sometimes encountered in sequential decision problems (see, e.g., [24]). For instance, the role of small discount factors in the convergence rates of algorithms for discounted Markov decision problems was investigated in [25]. Small values of β arise, e.g., in optimal control of admission to multiserver queues and single-server queues with random vacation periods of the server [26], adaptive critics for control [27], technology adoption in economic growth [28], management of renewable resources [29], debt dynamics and sustainability [30], etc.

When $\beta \in [1/2, 1)$, there are two possibilities: either making the stronger requirement (3) satisfied, or resorting to the technique known as multistage lookahead (see, e.g., [2, Sect. 6.1.2], [7, Sect. 6.3], and [31]), considered in the remainder of this section. Let M be a fixed positive integer such that the horizon N is a multiple of M (we assume this for simplicity, but, at the expense of a heavier notation, the result can be easily generalized to the nonmultiple case). Starting from $\tilde J_{N/M}^o := J_N^o$, for $t = N/M - 1, \ldots, 0$, one searches for a function that approximates $\hat J_t^{o(M)} := T_t^{(M)} \tilde J_{t+1}^o$, where $T_t^{(M)} = T_{Mt} \cdots T_{M(t+1)-1}$ is the operator obtained by applying the M Bellman operators $T_{M(t+1)-1}, \ldots, T_{Mt}$ in this order. We denote by ADP(M) the corresponding value-function approximation procedure. For the use of multistage lookahead in value-function approximation, see also [2, Sect. 6.1.2]. Advantages of this technique (which, of course, are obtained at the expense of heavier computations) are pointed out, e.g., in [2, pp. 266, 375].

Proposition 2.3 (Error Propagation in ADP with Multistage Lookahead) Let Assumption 2.1 hold. Suppose that, for t = 0, 1, ..., N, $J_t^o$ is bounded and continuous, and let M be a positive integer such that the horizon N is a multiple of M. For every t = 0, ..., N/M − 1, let $F_t$ be a family of bounded and continuous functions on $X_{Mt}$, and let $\tilde J_{N/M}^o = J_N^o = h_N$. If, for t = 0, 1, ..., N/M − 1, there exists $f_t \in F_t$ such that
\[
\sup_{x_{Mt} \in X_{Mt}} \big| J_{Mt}^o(x_{Mt}) - f_t(x_{Mt}) \big| \le \varepsilon_t, \tag{7}
\]
then the approximation error of ADP(M) is bounded from above as follows:
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N/M-1} \big( 2\beta^M \big)^t \varepsilon_t. \tag{8}
\]

According to the bound (8), the above-mentioned curse of time horizon is avoided for $0 \le \beta < (1/2)^{1/M}$, which, already for reasonably small values of M, covers a much wider range of values than the bound (6) from Proposition 2.2. For example, for M = 2, 3, and 4, we get the interval $0 \le \beta < \beta_{\max}$ with $\beta_{\max} \approx 0.7$, 0.8, and 0.84, respectively. This is obtained at the expense of computing $\hat J_t^{o(M)} = T_t^{(M)} \tilde J_{t+1}^o$, which amounts to solving, for every t, the nonlinear programming problem
\[
\hat J_t^{o(M)}(x_t) = \max_{\substack{x_{t+1}, \ldots, x_{t+M}:\ x_{k+1} \in D_k(x_k),\\ k = t, \ldots, t+M-1}} \left\{ \sum_{k=t}^{t+M-1} \beta^{k-t} h_k(x_k, x_{k+1}) + \beta^M \tilde J_{t+1}^o(x_{t+M}) \right\}.
\]
From the computational point of view, the number of real variables involved in the equation above is d · M, where d is the dimension of the state vector. When d · M is not too large, this approach may be computationally feasible [2, Sect. 6.1.2].
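The next sketch illustrates the M-stage lookahead evaluation as a single nonlinear program in the M decision variables (d = 1 here); the scalar state, the box-shaped feasible sets, and the reward are assumptions for illustration only.

```python
# A sketch of the M-stage lookahead evaluation \hat J^{o(M)}_t(x_t):
# the M next states are stacked into one vector and optimized jointly.
import numpy as np
from scipy.optimize import minimize

beta, M = 0.9, 2

def h(k, x, y):
    # hypothetical stage-k transition reward
    return -0.5 * x**2 - 0.1 * (y - x)**2

def lookahead_value(J_next, x_t, t, lo=0.0, hi=1.0):
    def neg_reward(z):
        # z = (x_{t+1}, ..., x_{t+M})
        states = np.concatenate(([x_t], z))
        total = sum(beta**k * h(t + k, states[k], states[k + 1])
                    for k in range(M))
        return -(total + beta**M * J_next(states[-1]))
    z0 = np.full(M, 0.5 * (lo + hi))
    res = minimize(neg_reward, z0, bounds=[(lo, hi)] * M)
    return -res.fun

J_next = lambda x: 1.0 - 0.5 * x**2
print(lookahead_value(J_next, 0.3, t=0))
```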
3 Smoothness of the Value Functions

The smoothness results of this section are expressed in terms of $L_p$-norms of partial derivatives, so the natural setting is provided by Sobolev spaces (see [32] and the references therein for other smoothness results available in the literature). For a smooth function $f : X \to \mathbb{R}$ and a (multi-)index $r := (r_1, \ldots, r_d) \in \mathbb{N}_0^d$, we let $|r| := \sum_{i=1}^d r_i$ and $D^r f(x) := \frac{\partial^{|r|} f}{\partial x_1^{r_1} \cdots \partial x_d^{r_d}}(x)$. For an open set $\Omega \subseteq \mathbb{R}^d$, a positive integer m, and $1 \le p \le \infty$, by $W_p^m(\Omega)$ we denote the Sobolev space of functions on Ω whose (distributional) partial derivatives up to order m are in $L_p(\Omega)$, endowed with the norm
\[
\|f\|_{W_p^m(\Omega)} :=
\begin{cases}
\left( \sum_{0 \le |r| \le m} \|D^r f\|_{L_p(\Omega)}^p \right)^{1/p} = \left( \sum_{0 \le |r| \le m} \int_\Omega |D^r f(x)|^p \, dx \right)^{1/p} & \text{if } 1 \le p < +\infty, \\[2mm]
\max_{0 \le |r| \le m} \|D^r f\|_{L_\infty(\Omega)} = \max_{0 \le |r| \le m} \operatorname{ess\,sup}_{x \in \Omega} |D^r f(x)| & \text{if } p = +\infty.
\end{cases}
\]
As we shall deal with continuous functions, "ess sup" can be replaced by "sup".

At stage t + 1, the state $x_{t+1}$ can be written as the value assumed by a function $g_t : X_t \to X_{t+1}$, under the constraint that $g_t(x_t) \in D_t(x_t)$ for every $x_t \in X_t$, and Problem $\Sigma_N$ can be viewed as the maximization of a reward that depends on the N functions $g_0, \ldots, g_{N-1}$, called policy functions or simply policies. We have
\[
g_t^o(x_t) \in \operatorname*{argmax}_{y \in D_t(x_t)} \big\{ h_t(x_t, y) + \beta J_{t+1}^o(y) \big\}, \tag{9}
\]
and so optimal policy functions can be obtained as a byproduct of DP. In writing (9), we are supposing that, for every $t = N-1, \ldots, 0$ and every $x_t \in X_t$, $\max_{y \in D_t(x_t)} [h_t(x_t, y) + \beta J_{t+1}^o(y)]$ exists. This happens, e.g., when $D_t(x_t)$ is nonempty and compact and $h_t(x_t, y) + \beta J_{t+1}^o(y)$ is continuous in y.

We make the following assumption. Recall that, for a convex set $X \subseteq \mathbb{R}^d$ and $\alpha \in \mathbb{R}$, a function $f : X \to \mathbb{R}$ is α-concave on X if and only if $f(x) + \frac{1}{2}\alpha \|x\|^2$ is concave on X (for α = 0, we get the standard definition of concavity). For a positive integer m, we denote by $C^m(X)$ the set of functions on X whose partial derivatives are continuous up to order m. If f is of class $C^2(X)$, then f is α-concave if and only if [33]
\[
\sup_{x \in X} \lambda_{\max}\big( \nabla^2 f(x) \big) \le -\alpha, \tag{10}
\]
where $\lambda_{\max}(\nabla^2 f(x))$ is the maximum eigenvalue of the Hessian $\nabla^2 f(x)$.

Assumption 3.1 Let m ≥ 2 be an integer. The following hold for t = 0, ..., N − 1:
(i) $X_t \subset \mathbb{R}^d$ and $D_t \subseteq X_t \times X_{t+1}$ are compact, convex, and have nonempty interiors;
(ii) there exist optimal policies $g_t^o$ that are continuous and interior on $\operatorname{int}(X_t)$, i.e., for every $x_t \in \operatorname{int}(X_t)$, $g_t^o(x_t) \in \operatorname{int}(D_t(x_t))$;
(iii) $h_t \in C^m(D_t)$, and there exists $\alpha_t > 0$ such that $h_t$ is $\alpha_t$-concave;
(iv) $h_N \in C^m(X_N)$ and is concave.

Note that Assumption 3.1 implies Assumption 2.1, and Assumption 3.1(iii) implies the concavity of $h_t$. Although the concavity of the transition and final rewards might appear restrictive, it is quite common in applications and typically assumed in the literature [7, 22, 23]. In many cases, the existence and continuity of the optimal policies can be checked through the Maximum Theorem [22, Theorem 3.6, p. 62] and its consequences [22, discussion on p. 63 and Exercise 3.11(a), p. 57], while Assumption 3.1(i), (iii), and (iv) can be enforced by the problem formulation.

The next proposition gives conditions under which the tth value functions are of class $C^m$ on compact sets that are closures of open sets, and each value function $J_t^o$ can be extended to the whole $\mathbb{R}^d$ by functions belonging to the Sobolev spaces $W_p^m(\mathbb{R}^d)$. We shall exploit such properties in Sect. 4.
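Condition (10) suggests a simple numerical test of α-concavity; the following sketch (with a hypothetical function f and finite-difference Hessians, not part of the paper's methodology) checks it at sampled points of X.

```python
# A small numerical check of the alpha-concavity test (10): f is
# alpha-concave iff the largest Hessian eigenvalue is <= -alpha on X.
import numpy as np

def hessian_fd(f, x, eps=1e-5):
    # central finite-difference approximation of the Hessian of f at x
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    return H

f = lambda x: -np.sum(x**2) + 0.1 * np.sum(np.cos(x))  # hypothetical reward
alpha = 1.0
samples = np.random.uniform(-1.0, 1.0, size=(100, 2))  # sampled points of X
ok = all(np.linalg.eigvalsh(hessian_fd(f, x)).max() <= -alpha
         for x in samples)
print("alpha-concave on the sampled points:", ok)
```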
For $\Omega_1 \subset \Omega \subseteq \mathbb{R}^d$ and a function f on Ω, we denote by $f|_{\Omega_1}$ its restriction to $\Omega_1$ (similarly, f is an extension to Ω of $f|_{\Omega_1}$). For r > 0, by $B_1^r(\mathbb{R}^d)$ we denote the Bessel potential space [34, p. 134] of functions $u : \mathbb{R}^d \to \mathbb{R}$ that can be written as $u = G_r * \lambda$, where $G_r : \mathbb{R}^d \to \mathbb{R}$ is the inverse Fourier transform of $\hat G_r(\omega) := (2\pi)^{-d/2} (1 + \|\omega\|^2)^{-r/2}$, "∗" denotes the convolution operator, and $\lambda \in L_1(\mathbb{R}^d)$. We let $\|u\|_{B_1^r(\mathbb{R}^d)} := \|\lambda\|_{L_1(\mathbb{R}^d)}$.

Proposition 3.1 (Smoothness Properties of the Value Functions) Let Assumption 3.1 hold. For every t = 0, ..., N, the following hold.
(i) $J_t^o \in C^m(X_t)$;
(ii) for every $1 \le p \le \infty$, there exists a function $\bar J_t^{o,p} \in W_p^m(\mathbb{R}^d)$ such that $J_t^o = \bar J_t^{o,p}|_{X_t}$;
(iii) for every $1 < p < \infty$, there exists a function $\bar J_t^{o,p} \in B_p^m(\mathbb{R}^d)$ such that $J_t^o = \bar J_t^{o,p}|_{X_t}$. The same holds for p = 1 when m ≥ 2 is even.

For a "sufficiently good" approximation $\tilde J_{t+1}^o$ of $J_{t+1}^o$ and of its derivatives up to a suitable order, the following technical result provides smoothness properties of $\hat J_t^o := T_t \tilde J_{t+1}^o$.

Proposition 3.2 (Smoothness Properties of the Approximate Value Functions) Let Assumption 3.1 hold for t = 0, ..., N − 1 and $1 \le p \le \infty$, with item (ii) referred to the policies
\[
\tilde g_t^o(x_t) \in \operatorname*{argmax}_{y \in D_t(x_t)} \big\{ h_t(x_t, y) + \beta \tilde J_{t+1}^o(y) \big\} \tag{11}
\]
instead of the optimal policies, and suppose that $h_N$ is $\alpha_N$-concave for some $\alpha_N > 0$.
(i) If $\tilde J_{t+1}^o \in C^m(X_{t+1})$ is concave, then $T_t \tilde J_{t+1}^o \in C^m(X_t)$, and there exists $\hat J_t^{o,p} \in W_p^m(\mathbb{R}^d)$ such that $T_t \tilde J_{t+1}^o = \hat J_t^{o,p}|_{X_t}$.
(ii) There exists a function $\bar J_t^{o,p} \in W_p^m(\mathbb{R}^d)$ such that $J_t^o = \bar J_t^{o,p}|_{X_t}$.
(iii) For j = 1, 2, ..., let $\tilde J_{t+1,j}^o \in C^m(X_{t+1})$ be such that
\[
\lim_{j \to \infty} \max_{0 \le |r| \le m} \sup_{x_{t+1} \in X_{t+1}} \big| D^r J_{t+1}^o(x_{t+1}) - D^r \tilde J_{t+1,j}^o(x_{t+1}) \big| = 0. \tag{12}
\]
Then, for all sufficiently large j, (i) holds with $\tilde J_{t+1}^o$ replaced by $\tilde J_{t+1,j}^o$ and, denoting by $\hat J_{t,j}^{o,p}$ the corresponding extension of $T_t \tilde J_{t+1,j}^o$, we have
\[
\lim_{j \to \infty} \big\| \bar J_t^{o,p} - \hat J_{t,j}^{o,p} \big\|_{W_p^m(\mathbb{R}^d)} = 0.
\]

Proposition 3.2 has the following meaning. The function $T_t \tilde J_{t+1}^o$ can be extended to the whole $\mathbb{R}^d$ by a function $\hat J_t^{o,p}$ in the Sobolev space $W_p^m(\mathbb{R}^d)$. Moreover, if a sequence $\tilde J_{t+1,j}^o$, j = 1, 2, ..., of approximations of $J_{t+1}^o$ converges uniformly to $J_{t+1}^o$, together with the partial derivatives up to order m, then the sequence $\hat J_{t,j}^{o,p}$, j = 1, 2, ..., of extensions of $T_t \tilde J_{t+1,j}^o$ converges in $W_p^m(\mathbb{R}^d)$ to the extension $\bar J_t^{o,p}$ of $J_t^o$. See, e.g., [32, Sect. 6] for an example of a problem in which the assumptions of Proposition 3.2 are satisfied.

4 Accuracy of Suboptimal Solutions

Classical approximation schemes in a function space can be expressed as linear combinations of fixed basis functions $\varphi_1, \ldots, \varphi_n$ that span an (at most) n-dimensional linear subspace [35]. So, they take on the form
\[
\sum_{i=1}^n \delta_i \varphi_i(\cdot), \tag{13}
\]
where the coefficients $\delta_1, \ldots, \delta_n$ are determined so as to minimize the approximation error; for the use of such models in value-function approximation, see [3, Sect. 7.2.3]. For example, this is the case with algebraic and trigonometric polynomials in the space of continuous functions on compact sets. Properties of linear approximation schemes have been extensively studied (see, e.g., [35]).

An alternative approximation scheme consists of linear combinations of basis functions $\phi(\cdot, w_1), \ldots, \phi(\cdot, w_n)$ obtained by varying the vectors $w_1, \ldots, w_n \in \mathbb{R}^p$ of "inner" parameters of a "mother function" φ:
\[
\sum_{i=1}^n \delta_i \phi(\cdot, w_i). \tag{14}
\]
The "inner" parameter vectors $w_1, \ldots, w_n$ have to be optimized together with the coefficients $\delta_1, \ldots, \delta_n$ of the linear combination. In general, the presence of the "inner" parameters "destroys" linearity, so (14) is a nonlinear approximation scheme [3, Sect. 7.4]. Equation (14) models a wide variety of approximating families used in applications, such as free-node splines, radial-basis-function networks with variable centers and widths, trigonometric polynomials with free frequencies and phases, and feedforward neural networks [16]. Approximating families of the form (14) belong to the so-called variable-basis approximation schemes [16, 36], whose advantages over classical linear ones of the form (13) were investigated, e.g., in [16, 36–38]. Roughly speaking, for a given accuracy of approximation of functions in certain spaces, variable-basis approximation schemes may require many fewer parameters to be optimized than linear ones.

In the following, we consider the use in value-function approximation of some approximation schemes of the form (14). For a compact set $X \subset \mathbb{R}^d$, we define the ridge variable-basis approximation scheme
\[
R(\psi, n) := \Big\{ f_n : X \to \mathbb{R} \ \Big|\ f_n(x) = \sum_{i=1}^n \delta_i \psi(a_i \cdot x + b_i),\ a_i \in \mathbb{R}^d,\ \delta_i, b_i \in \mathbb{R} \Big\}, \tag{15}
\]
where $a_i \cdot x$ denotes the scalar product in $\mathbb{R}^d$ of $a_i$ and x. (Functions constant along hyperplanes are known as ridge functions: each ridge function results from the composition of a multivariable function of a particularly simple form, i.e., the inner product, with an arbitrary function of a single variable.) Note that (15) is of the form (14) with $\phi(x, w_i) := \psi(a_i \cdot x + b_i)$ and $w_i := (a_{i,1}, \ldots, a_{i,d}, b_i) \in \mathbb{R}^{d+1}$.

We say that the mother function $\psi : \mathbb{R} \to \mathbb{R}$ is q-smooth if and only if it belongs to the family
\[
S^q := \Big\{ \psi : \mathbb{R} \to \mathbb{R} \ \Big|\ \psi \text{ nonzero, compactly supported, with continuous and uniformly bounded derivatives up to order } q, \text{ and } \exists\, l \ge q \text{ s.t. } 0 < \int_{\mathbb{R}} \Big| \frac{d^l \psi}{dz^l} \Big| \, dz < \infty \Big\}. \tag{16}
\]
Examples of functions in $S^q$ are splines of smoothness order q + 1 [39]. We are also interested in basis functions belonging to the family
\[
S := \Big\{ \psi : \mathbb{R} \to \mathbb{R} \ \Big|\ \psi \text{ nonzero, infinitely many times differentiable in some open interval } (a, b) \subset \mathbb{R}, \text{ and such that there exists } c \in (a, b) \text{ s.t. } \frac{d^k \psi}{dz^k}\Big|_{z=c} \neq 0 \ \forall k \in \mathbb{N} \Big\}. \tag{17}
\]
Examples of such basis functions used in applications are the so-called squashing (or logistic) function $(1 + e^{-z})^{-1}$ [40] and the sinusoidal functions. The hyperbolic tangent (to which the so-called feedforward sigmoidal neural networks correspond [37]) is in S too, since $\tanh z = 2(1 + e^{-2z})^{-1} - 1$.

We also define the radial variable-basis approximation scheme
\[
G(\psi, n) := \Big\{ f_n : X \to \mathbb{R} \ \Big|\ f_n(x) = \sum_{i=1}^n \delta_i \psi\big( \|x - \tau_i\| / \sigma_i \big),\ \tau_i \in \mathbb{R}^d,\ \delta_i \in \mathbb{R},\ \sigma_i > 0 \Big\}, \tag{18}
\]
where $\psi : \mathbb{R} \to \mathbb{R}$ is a radial-basis function (e.g., the Gaussian $e^{-z^2}$, to which the so-called Gaussian radial-basis networks correspond [41]). Note that (18) is of the form (14) with $\phi(x, w_i) := \psi(\|x - \tau_i\| / \sigma_i)$ and $w_i := (\tau_{i,1}, \ldots, \tau_{i,d}, \sigma_i) \in \mathbb{R}^{d+1}$.
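As a concrete reading of (15) and (18), the following sketch evaluates a member of $R(\psi, n)$ with $\psi = \tanh$ and a member of $G(\psi, n)$ with Gaussian ψ; all parameter values are randomly generated placeholders, not values used in the paper.

```python
# A sketch of the variable-basis families (15) and (18); parameter shapes
# follow w_i = (a_i, b_i) and (tau_i, sigma_i) in R^{d+1}.
import numpy as np

def ridge_fn(x, delta, A, b):
    # f_n(x) = sum_i delta_i * psi(a_i . x + b_i), psi = tanh  (family R)
    return np.tanh(A @ x + b) @ delta            # A: (n, d); b, delta: (n,)

def radial_fn(x, delta, tau, sigma):
    # f_n(x) = sum_i delta_i * exp(-(||x - tau_i|| / sigma_i)^2)  (family G)
    z = np.linalg.norm(x - tau, axis=1) / sigma  # tau: (n, d); sigma: (n,)
    return np.exp(-z**2) @ delta

d, n = 2, 5
rng = np.random.default_rng(0)
x = rng.uniform(size=d)
print(ridge_fn(x, rng.normal(size=n), rng.normal(size=(n, d)),
               rng.normal(size=n)))
print(radial_fn(x, rng.normal(size=n), rng.uniform(size=(n, d)),
                rng.uniform(0.1, 1.0, size=n)))
```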
The following proposition investigates the sup-norm approximation of functions in balls of certain Sobolev spaces via the variable-basis schemes (15) and (18).

Proposition 4.1 (Approximation Error Bounds) Let d, q be positive integers and $X \subset \mathbb{R}^d$ compact and convex.
(i) If $\psi \in S^{q+s}$ and $s = \lfloor d/2 \rfloor + 1$, then there exists C > 0 such that, for every ρ > 0, every $f \in B_\rho(\|\cdot\|_{W_2^{q+2s+1}(\mathbb{R}^d)})$, and every positive integer n, there is $f_n \in R(\psi, n)$ such that
\[
\max_{0 \le |r| \le q} \sup_{x \in X} \big| D^r f(x) - D^r f_n(x) \big| \le C \frac{\rho}{\sqrt{n}}. \tag{19}
\]
(ii) If $\psi \in S$ and $s = \lfloor d/2 \rfloor + 1$, then there exists C > 0 such that, for every ρ > 0, every $f \in B_\rho(\|\cdot\|_{W_\infty^s(\mathbb{R}^d)})$, and every positive integer n, there is $f_n \in R(\psi, n)$ such that
\[
\sup_{x \in X} \big| f(x) - f_n(x) \big| \le C \frac{\rho}{\sqrt{n}}. \tag{20}
\]
(iii) If ψ is the Gaussian and s > d, then there exists C > 0 such that, for every ρ > 0, every $f \in B_\rho(\|\cdot\|_{B_1^s(\mathbb{R}^d)})$, and every positive integer n, there is $f_n \in G(\psi, n)$ such that
\[
\sup_{x \in X} \big| f(x) - f_n(x) \big| \le C \frac{\rho}{\sqrt{n}}. \tag{21}
\]

For an increasing number n of variable-basis functions, the upper bounds provided by Proposition 4.1 decrease at the rate $1/\sqrt{n}$ (the constants C may differ in each bound). The three estimates (i), (ii), and (iii) differ in the way the approximation error is measured, in the required degree of smoothness s, and in the families of functions to which they apply. Item (i) provides an upper bound on the error in approximating functions in Sobolev balls (of radius ρ, centered at the origin) together with all their partial derivatives up to a certain order, uniformly on a compact and convex set $X \subset \mathbb{R}^d$. The approximators are variable-basis functions belonging to the family $R(\psi, n)$ defined in (15), with a (q + s)-smooth basis function ψ as defined in (16). The values q and s are related to the largest order of the partial derivatives to be approximated and to the dimension d, respectively. For a lower degree of smoothness, a result similar to Proposition 4.1(i) was obtained in [42]. Item (ii) of Proposition 4.1 gives an upper bound on the error in approximating, uniformly on X and by elements of $R(\psi, n)$ with $\psi \in S$, functions in Sobolev balls, without enforcing derivative approximation. Finally, item (iii) provides an upper bound on the error in approximating, uniformly on X and by elements of $G(\psi, n)$ with ψ a Gaussian, functions belonging to balls of the Bessel potential space $B_1^s(\mathbb{R}^d)$, again without enforcing derivative approximation.

Combining the approximation tools given in Proposition 4.1 with the smoothness results stated in Propositions 3.1 and 3.2, we get the next Proposition 4.2. It estimates the error in the uniform approximation of $J_t^o$ and its partial derivatives when basis functions from $S^q$ or S are used in the approximation scheme (15), or the Gaussian is used in the approximation scheme (18).

Proposition 4.2 (Approximation of the Value Functions) Let $s := \lfloor d/2 \rfloor + 1$. Then, there exist N positive constants $C_t$, N positive constants $C_t'$, and N positive constants $C_t''$ such that the following hold for t = 0, ..., N − 1.
(i) Let Assumption 3.1 hold with item (ii) referred to the policies (11) and $m \ge 2 + (2s+1)N$; let $\psi_t \in S^{2+(2s+1)(t+1)}$; and let $\bar n_{N-1}, \ldots, \bar n_0$ be an N-tuple of sufficiently large positive integers. For t = N − 1, ..., 0, if $\tilde J_k^o \in R(\psi_k, n_k)$, $n_k \ge \bar n_k$, k = N − 1, ..., t + 1, are suitable approximations of the value functions $J_k^o$ from stage t + 1 to stage N − 1, then, for every positive integer $n_t$, there exists $f_t \in R(\psi_t, n_t)$ such that
\[
\max_{0 \le |r| \le 2+(2s+1)t} \sup_{x_t \in X_t} \big| D^r \big(T_t \tilde J_{t+1}^o\big)(x_t) - D^r f_t(x_t) \big| \le \frac{C_t}{\sqrt{n_t}}. \tag{22}
\]
The approximations $\tilde J_k^o$ are obtained recursively by setting $\tilde J_k^o := f_k$ for k = N − 1, ..., t + 1.
(ii) Let Assumption 3.1 hold with m ≥ s, and let $\psi_t \in S$. Then, for every t = 0, ..., N − 1 and every positive integer $n_t$, there exists $f_t \in R(\psi_t, n_t)$ such that
\[
\sup_{x_t \in X_t} \big| J_t^o(x_t) - f_t(x_t) \big| \le \frac{C_t'}{\sqrt{n_t}}. \tag{23}
\]
(iii) Let Assumption 3.1 hold with m > d, and let $\psi_t$ be the Gaussian. Then, for every t = 0, ..., N − 1 and every positive integer $n_t$, there exists $f_t \in G(\psi_t, n_t)$ such that
\[
\sup_{x_t \in X_t} \big| J_t^o(x_t) - f_t(x_t) \big| \le \frac{C_t''}{\sqrt{n_t}}. \tag{24}
\]

The last step consists in combining the estimates from Proposition 4.2 with the bounds on error propagation from Proposition 2.2. This provides the following upper bounds on the approximation error.

Theorem 4.1 (Final Error Bounds) Let $s := \lfloor d/2 \rfloor + 1$. Then, there exist N positive constants $C_t$, N positive constants $C_t'$, and N positive constants $C_t''$ such that, under the assumptions of Proposition 4.2(i), (ii), and (iii), respectively, we have
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} \beta^t \frac{C_t}{\sqrt{n_t}},
\]
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} (2\beta)^t \frac{C_t'}{\sqrt{n_t}},
\]
\[
\sup_{x_0 \in X_0} \big| J_0^o(x_0) - \tilde J_0^o(x_0) \big| \le \sum_{t=0}^{N-1} (2\beta)^t \frac{C_t''}{\sqrt{n_t}},
\]
where $\tilde J_t^o$, t = N − 1, ..., 0, satisfies (3) with $\varepsilon_t := C_t/\sqrt{n_t}$, $\varepsilon_t := C_t'/\sqrt{n_t}$, and $\varepsilon_t := C_t''/\sqrt{n_t}$, respectively.

To the best of our knowledge, for quite general formulations of N-stage optimization problems of the form of Problem $\Sigma_N$, Theorem 4.1 gives the first upper bounds on the sup-norm error of value-function approximation by variable-basis approximation schemes in which the number of variable-basis functions required to guarantee a desired accuracy is estimated. Thus, Theorem 4.1 provides a partial answer to the issues raised in [2, Chap. 6, p. 335], quoted in Sect. 1, and provides new insights into the effectiveness of value-function approximation by neural networks [3, Sect. 7.4]. Note that Theorem 4.1 requires a degree of smoothness of the functions to be approximated that grows linearly with the dimension d of the state space. Estimates analogous to those provided by Theorem 4.1 can be obtained for ADP with the multistage-lookahead technique described in Sect. 2, by combining the estimates from Proposition 4.2 with the bounds on error propagation from Proposition 2.3.
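As a rough usage example of the first bound of Theorem 4.1 (with hypothetical constants $C_t$, since the theorem does not provide their values), one can size the numbers $n_t$ of basis functions so that each stage contributes at most ε/N to the final error.

```python
# Back-of-the-envelope sizing from Theorem 4.1, first bound: requiring
# beta^t * C_t / sqrt(n_t) <= eps / N for each stage gives
# n_t >= (N * beta^t * C_t / eps)^2. The constants C_t are assumed values.
import math

def required_units(C, beta, N, eps):
    return [math.ceil((N * beta**t * C[t] / eps) ** 2) for t in range(N)]

C = [2.0, 1.5, 1.0]          # hypothetical stage constants C_t
print(required_units(C, beta=0.9, N=3, eps=0.1))
# later stages are discounted by beta^t, so they need fewer units
```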
5 Application to a Problem of Optimal Consumption

In the classical problem of Optimal Consumption (OC) [43, Chap. 6], a consumer aims at maximizing, over a given time horizon, the discounted value of consumption of a good for a given sequence of interest rates. The consumer has a certain initial wealth and at each time period earns an income, modeled as an exogenous input. We consider a "multidimensional version" of the OC problem, denoted by $OC_N^d$, in which there are d > 1 consumers that aim at maximizing a "social utility function".

Problem $OC_N^d$. A set of d consumers aims at finding
\[
J^o(a_0) = \sup_{c_t,\ t=0,\ldots,N} \left\{ \sum_{t=0}^{N-1} \beta^t \Big[ u(c_t) + \sum_{j=1}^d v_{t,j}(a_{t,j}) \Big] + \beta^N u(c_N) \right\},
\]
where
\[
a_t \in A_t = \prod_{j=1}^d A_{t,j} \subseteq \mathbb{R}^d,
\]
\[
a_{t+1,j} = f_{t,j}(a_{t,j}, c_{t,j}) = (1 + r_{t,j})(a_{t,j} + y_{t,j} - c_{t,j}), \quad j = 1, \ldots, d,
\]
\[
a_{N,j} + y_{N,j} - c_{N,j} \ge 0, \quad c_{t,j} \ge 0, \quad j = 1, \ldots, d,
\]
\[
y_{0,j}, \ldots, y_{N,j} \ge 0 \text{ given}, \quad r_{0,j}, \ldots, r_{N-1,j} \ge 0 \text{ given}.
\]
Here, $a_{t,j}$ and $y_{t,j}$ are the wealth and the labor income of consumer j at time t, respectively, $c_{t,j} \ge 0$ is the current consumption of a good by consumer j, and $r_{t,j}$ is an interest rate associated with that good. Each vector of consumptions $c_t$ is chosen as a function of the current state vector $a_t$. The function u is a social utility associated with the vector $c_t$ of consumptions with components $c_{t,j}$, $v_{t,j}$ is an individual utility depending on $a_{t,j}$, and β > 0 is a fixed discount factor.

For j = 1, ..., d, the budget constraints
\[
a_{N,j} + y_{N,j} - c_{N,j} \ge 0 \tag{25}
\]
(also called no-Ponzi-game conditions) mean that all the consumers have to repay any debts within the time N. We assume that the utility functions $v_{t,j}(a_{t,j})$ penalize the closeness of $a_{t,j}$ to its minimum value $a_{t,j}^{\min}$, for which all the consumers will be able to satisfy conditions (25) in the future. The latter imply some constraints on the sets $A_{t,j}$ to which the state variables $a_{t,j}$ belong. These are taken into account in the next assumption.

Assumption 5.1 For every given $a_{0,j}^{\max}$, j = 1, ..., d, the sets $A_{t,j}$ are the closed and bounded intervals
\[
A_{0,j} := \big[ a_{0,j}^{\min},\ a_{0,j}^{\max} \big], \quad a_{0,j}^{\min} = -\frac{\sum_{i=0}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) + y_{N,j}}{\prod_{k=0}^{N-1} (1 + r_{k,j})},
\]
\[
A_{t,j} := \big[ a_{t,j}^{\min},\ a_{t,j}^{\max} \big], \quad a_{t,j}^{\min} = -\frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) + y_{N,j}}{\prod_{k=t}^{N-1} (1 + r_{k,j})},
\]
\[
a_{t,j}^{\max} = a_{0,j}^{\max} \prod_{k=0}^{t-1} (1 + r_{k,j}) + \sum_{i=0}^{t-1} y_{i,j} \prod_{k=i}^{t-1} (1 + r_{k,j}), \quad t = 1, \ldots, N-1,
\]
\[
A_{N,j} := \big[ a_{N,j}^{\min},\ a_{N,j}^{\max} \big] = \Big[ -y_{N,j},\ a_{0,j}^{\max} \prod_{k=0}^{N-1} (1 + r_{k,j}) + \sum_{i=0}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) \Big].
\]

Note that $a_{t,j}^{\min} \le 0$ and that $a_{t+1,j}^{\min}$ can be recursively computed as $a_{t+1,j}^{\min} = (a_{t,j}^{\min} + y_{t,j})(1 + r_{t,j})$; moreover, $a_{t,j}^{\min} + y_{t,j} \le 0$ for t = 0, ..., N − 1 and $a_{N,j}^{\min} + y_{N,j} = 0$.

Proposition 5.1 Assumption 5.1 implies that the budget constraints (25) can be satisfied. Suppose that the partial derivatives of u with respect to each of its arguments are positive. Then, at stage N, the best choice for the d consumers is $c_{N,j} = a_{N,j} + y_{N,j}$, j = 1, ..., d.

A change of variable allows one to write the objective of Problem $OC_N^d$ as
\[
\sum_{t=0}^{N-1} \beta^t \left[ u\!\left( \frac{(1 + r_t) \circ (a_t + y_t) - a_{t+1}}{1 + r_t} \right) + \sum_{j=1}^d v_{t,j}(a_{t,j}) \right] + \beta^N u(a_N + y_N),
\]
where, for two vectors a, b of the same dimension, we denote by $a \circ b$ their entry-wise product and, provided that all their components are different from 0, by 1/a, 1/b their entry-wise reciprocals, so that the fraction above is understood entry-wise. Having replaced $c_{t,j}$ by its expression in terms of $a_{t,j}$ and $a_{t+1,j}$, the largest allowable consumption at time t when consumer j is in state $a_{t,j}$ is
\[
c_{t,j}^{\max}(a_{t,j}) = \frac{(1 + r_{t,j})(a_{t,j} + y_{t,j}) - a_{t+1,j}^{\min}}{1 + r_{t,j}}.
\]
With such a choice, the next state of consumer j is $a_{t+1,j}^{\min}$. Moreover, the nonnegativity constraint $c_{t,j} \ge 0$ becomes $a_{t+1,j} \le (1 + r_{t,j})(a_{t,j} + y_{t,j})$, whereas $a_{t+1,j} \ge a_{t+1,j}^{\min}$ by the no-Ponzi-game conditions. Summing up, Problem $OC_N^d$ can be reformulated as an instance of Problem $\Sigma_N$ with
\[
X_t := \prod_{j=1}^d A_{t,j}, \tag{26}
\]
\[
D_t := \big\{ (a_t, a_{t+1}) \in A_t \times A_{t+1} : a_{t+1,j} \in \big[ a_{t+1,j}^{\min},\ (1 + r_{t,j})(a_{t,j} + y_{t,j}) \big] \big\}, \tag{27}
\]
\[
h_t(a_t, a_{t+1}) := u\!\left( \frac{(1 + r_t) \circ (a_t + y_t) - a_{t+1}}{1 + r_t} \right) + \sum_{j=1}^d v_{t,j}(a_{t,j}), \tag{28}
\]
and
\[
h_N(a_N) := u(a_N + y_N). \tag{29}
\]

In the following, we describe a condition that guarantees the smoothness of the optimal policies on suitable subsets $\bar X_t \subset X_t$. When one of the components j of $a_t$ is equal to $a_{t,j}^{\min}$, the optimal policy $g_t^o(a_t)$ cannot be interior, since one necessarily has $g_{t,j}^o(a_t) = a_{t+1,j}^{\min}$. This is the reason why, in the next Assumption 5.2(i), the interiority of the optimal policies is imposed only on a suitable subset of the state space. We let
\[
\bar A_t := \big\{ a_t \in A_t : a_{t,j} \ge \bar a_{t,j}^{\min},\ j = 1, \ldots, d \big\}, \quad t = 0, \ldots, N, \tag{30}
\]
for some $\bar a_{t,j}^{\min}$ such that $a_{t,j}^{\min} < \bar a_{t,j}^{\min} < a_{t,j}^{\max}$, and
\[
\bar D_t := D_t \cap (\bar A_t \times \bar A_{t+1}), \quad t = 0, \ldots, N-1. \tag{31}
\]
We also require that $g_t^o(\bar A_t) \subseteq \bar A_{t+1}$ and that, for every $a_t \in \operatorname{int}(\bar A_t)$, one has $g_t^o(a_t) \in \operatorname{int}(\bar D_t(a_t))$.
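The interval bounds of Assumption 5.1 and the wealth dynamics admit a direct numerical check; in the following sketch, the incomes $y_{t,j}$ and rates $r_{t,j}$ are assumed values for a single consumer j.

```python
# A sketch of the wealth dynamics of Problem OC^d_N and of the interval
# bounds of Assumption 5.1 for one consumer j (assumed incomes and rates).
import numpy as np

N = 3
y = np.array([3.0, 4.0, 2.5, 3.5])   # y_{0,j}, ..., y_{N,j}
r = np.array([0.05, 0.08, 0.02])     # r_{0,j}, ..., r_{N-1,j}
a0_max = 20.0

def step(a, c, t):
    # a_{t+1,j} = (1 + r_{t,j}) * (a_{t,j} + y_{t,j} - c_{t,j})
    return (1.0 + r[t]) * (a + y[t] - c)

def a_min(t):
    # a_min_{t,j} = -(sum_{i=t}^{N-1} y_i * prod_{k=i}^{N-1}(1+r_k) + y_N)
    #               / prod_{k=t}^{N-1}(1+r_k)
    num = sum(y[i] * np.prod(1.0 + r[i:N]) for i in range(t, N)) + y[N]
    return -num / np.prod(1.0 + r[t:N])

def a_max(t):
    # a_max_{t,j} = a_max_{0,j} * prod_{k<t}(1+r_k) + sum_{i<t} y_i * prod_{k=i}^{t-1}(1+r_k)
    return (a0_max * np.prod(1.0 + r[:t])
            + sum(y[i] * np.prod(1.0 + r[i:t]) for i in range(t)))

# recursion check: a_min_{t+1} = (a_min_t + y_t) * (1 + r_t)
for t in range(N - 1):
    assert abs(step(a_min(t), 0.0, t) - a_min(t + 1)) < 1e-9
print([(a_min(t), a_max(t)) for t in range(N)])
```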
Assumption 5.2 Let m ≥ 2 be an integer; let the sets $A_t$, t = 0, ..., N, be chosen as in Assumption 5.1; let $\bar A_t := \{ a_t \in A_t : a_{t,j} \ge \bar a_{t,j}^{\min},\ j = 1, \ldots, d \}$, t = 0, ..., N, for some $\bar a_{t,j}^{\min}$ such that $a_{t,j}^{\min} < \bar a_{t,j}^{\min} < a_{t,j}^{\max}$; and let $I_u^d := \prod_{j=1}^d I_{u,j}$, where, for j = 1, ..., d,
\[
I_{u,j} := \Big[ 0,\ a_{0,j}^{\max} \prod_{k=0}^{N-1} (1 + r_{k,j}) + \sum_{i=0}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1 + r_{k,j}) + y_{N,j} \Big].
\]
Then:
(i) there exist optimal policies $g_t^o$ that are continuous and interior on $\operatorname{int}(\bar A_t)$, $g_t^o(\bar A_t) \subseteq \bar A_{t+1}$, and, for every $a_t \in \operatorname{int}(\bar A_t)$, we have $g_t^o(a_t) \in \operatorname{int}(\bar D_t(a_t))$;
(ii) $u \in C^m(I_u^d)$, u is α-concave on $I_u^d$ for some α > 0, and the partial derivatives of u with respect to each of its arguments are positive on the set $I_u^d$ (i.e., the marginal utility of each consumption is positive on $I_u^d$);
(iii) $v_{t,j} \in C^m(A_{t,j})$, $v_{t,j}$ is $\beta_{t,j}$-concave on $A_{t,j}$ for some $\beta_{t,j} > 0$, and the derivative of each $v_{t,j}$ is positive on $A_{t,j}$, for t = 0, ..., N, j = 1, ..., d.

The set $I_{u,j}$ in Assumption 5.2 represents the largest interval to which the consumption $c_{N,j}$ (and so all the other consumptions $c_{t,j}$, t = 0, ..., N − 1) can belong.

Proposition 5.2 For Problem $OC_N^d$, Assumption 5.2 implies Assumption 3.1 with $X_t$ replaced by $\bar A_t$ and $D_t$ replaced by $\bar D_t$.

Proposition 5.2 allows one to derive, for Problem $OC_N^d$, particular forms of Propositions 3.1 and 3.2 and of Theorem 4.1 on the accuracy of suboptimal solutions obtained via approximation of the value functions by the families $F_t = R(\psi_t, n_t)$ (see (15)) or $F_t = G(\psi_t, n_t)$ (see (18)); they are not reported here for lack of space.

6 Design of the ADP Algorithm

Our results can be exploited to develop an ADP algorithm based on the use of variable-basis approximation schemes to approximate the value function at each stage. The first step consists in performing a suitable discretization of the sets $X_t$, t = 0, ..., N − 1, to which the corresponding state vectors $x_t$ belong. Let $L_t$ be the number of discretization points at stage t. Of course, they should be spread "as uniformly as possible". The notions of "uniformity" and "good spreading" have been largely discussed in the literature on statistics and number-theoretic methods [44]. Given a compact set $S \subset \mathbb{R}^d$ and a positive integer L, let $S_L \subset S$ be a set of L sample points $s_i$, i = 1, 2, ..., L, belonging to S. The dispersion of $S_L$ is defined as [44]
\[
\theta(S_L) := \sup_{s \in S} \min_{\tilde s \in S_L} \| s - \tilde s \|.
\]
So, $\theta(S_L)$ is a measure of the uniformity of the distribution of the L points of $S_L$. Roughly speaking, a small value of $\theta(S_L)$ guarantees that the points of $S_L$ are spread over S "in a uniform way", i.e., "close enough" to one another and without leaving regions "undersampled".

Random sampling with a uniform distribution, called Monte Carlo sampling [45], can be used to generate the discretized sets. Unfortunately, it is known that the resulting points are not uniformly scattered, in the sense that their dispersion takes large values. To sample at each stage t = 0, ..., N − 1 the set $X_t$ to which the corresponding state vectors $x_t$ belong, we shall adopt the approach of [13], based on taking finite portions of so-called low-discrepancy sequences (e.g., the good-lattice-points sequence, the Niederreiter sequence, the Halton sequence, the Hammersley sequence, and the Sobol' sequence), as it is known [46, p. 152] that they are low-dispersion sequences. Sampling the set S with such sequences is usually referred to as quasi-Monte Carlo sampling.
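The following sketch (using SciPy's qmc module, with the dispersion estimated on a finite evaluation grid rather than over the whole set S) reproduces the qualitative comparison between Monte Carlo and Sobol' samplings discussed next and illustrated in Fig. 1.

```python
# A sketch comparing Monte Carlo and Sobol' (quasi-Monte Carlo) samplings
# of the unit square; theta(S_L) is estimated on a fine evaluation grid.
import numpy as np
from scipy.stats import qmc

def dispersion(points, grid):
    # theta(S_L) ~= max over grid points of the distance to nearest sample
    dists = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    return dists.min(axis=1).max()

L, d = 1000, 2                      # 1000 points, as in Fig. 1
rng = np.random.default_rng(0)
mc = rng.uniform(size=(L, d))
sobol = qmc.Sobol(d=d, scramble=False).random(L)

g = np.linspace(0.0, 1.0, 50)
grid = np.array([[u, v] for u in g for v in g])
print("MC dispersion:   ", dispersion(mc, grid))
print("Sobol dispersion:", dispersion(sobol, grid))  # typically smaller
```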
As compared with Monte Carlo sampling, discretization based on low-discrepancy sequences suffers less from the formation of clusters of points in particular regions of the space [44] (such clustering undermines the uniformity of the sampling). A comparison between a sampling of the two-dimensional unit cube by a sequence of 1000 independently and uniformly distributed points and a sampling of the same cube obtained via the Sobol' sequence [47] is shown in Fig. 1. The space is clearly better covered by the second sequence, and the largest empty spaces among the points appear in the first sampling scheme. These properties motivate our choice of Sobol' sequences to sample the sets $X_t$. We denote by $X_t^{L_t}$ the corresponding discretized set and by $x_t^l$, l = 1, ..., $L_t$, its points.

Fig. 1 Comparison between pure-random and Sobol' samplings of the 2-dimensional unit cube

The second step of the ADP algorithm consists in performing the optimizations
\[
\hat J_t^o\big(x_t^l\big) := \sup_{y \in D_t(x_t^l)} \big\{ h_t\big(x_t^l, y\big) + \beta \tilde J_{t+1}^o(y) \big\}, \quad l = 1, \ldots, L_t,\ t = N-1, \ldots, 0, \tag{32}
\]
for each discretized value, from stage t = N − 1 back to stage t = 0, where $\tilde J_{t+1}^o(\cdot)$ is the approximation of the value function at stage t + 1. Such an approximation is built up backwards from t = N − 1 to t = 0 using a ridge (see Eq. (15)) or radial (see Eq. (18)) variable-basis approximation scheme, in such a way as to guarantee an error at most equal to $\varepsilon_t$ uniformly over $X_t$, i.e.,
\[
\sup_{x_t \in X_t} \big| \tilde J_t^o(x_t) - \hat J_t^o(x_t) \big| \le \varepsilon_t, \quad t = N-1, \ldots, 0 \tag{33}
\]
(see Proposition 2.2). The value of $\varepsilon_t$ has to be chosen in such a way as to guarantee the desired accuracy of the suboptimal solution, i.e., such that $\sup_{x_0 \in X_0} |J_0^o(x_0) - \tilde J_0^o(x_0)|$ is below a desired threshold (see Theorem 4.1 for conditions guaranteeing the existence of a sequence of approximators for which the accuracy is below the threshold). The initialization is given by $\tilde J_N^o(\cdot) \equiv h_N(\cdot)$.

We have to face the following two issues.
(1) The sup-norm in (33) does not allow the application of iterative descent methods. To overcome this drawback, at each stage t we consider the $L_{p_t}$-norm:
\[
\big\| \tilde J_t^o - \hat J_t^o \big\|_{L_{p_t}(X_t)} \le \varepsilon_t^{(1)}. \tag{34}
\]
By [48, Theorem 14F, p. 39], for any bounded and continuous function f on $X_t$, one has $\lim_{p_t \to \infty} \|f\|_{L_{p_t}(X_t)} = \sup_{x_t \in X_t} |f(x_t)|$.
(2) Only the values $\hat J_t^o(x_t^l)$ at the discretization points are available, so, instead of (34), all we can impose is an upper bound on the corresponding discretized $L_{p_t}$-norm:
\[
\left( \frac{1}{L_t} \sum_{l=1}^{L_t} \big| \tilde J_t^o\big(x_t^l\big) - \hat J_t^o\big(x_t^l\big) \big|^{p_t} \right)^{1/p_t} \le \varepsilon_t^{(2)}. \tag{35}
\]

By exploiting the properties of the low-discrepancy sequences used to discretize the sets $X_t$, the smoothness properties that we have proved for the functions $J_t^o$ and $\hat J_t^o$, and the smoothness properties of the mother function (either a Gaussian or belonging to the family (17)) used to obtain the approximation $\tilde J_t^o$, one can show that, for a sufficiently large number $L_t$ of samples, a sufficiently large value of $p_t$, and a sufficiently small value of $\varepsilon_t^{(2)}$, the upper bound (35) implies (34). Similarly, one can prove that, for a suitable value of $\varepsilon_t^{(1)}$, the upper bound (34) guarantees (33). As a detailed analysis of these topics is beyond the scope of the present paper, in the following we limit ourselves to a sketch of a possible procedure suggested by the results of the previous sections and the discussion above (a code sketch follows the list of steps below).
For t = 0, ..., N − 1, let $\varepsilon_t$ be the maximum allowed error in the sup-norm approximation $\sup_{x_t \in X_t} |\tilde J_t^o(x_t) - \hat J_t^o(x_t)|$, chosen in such a way as to guarantee the desired solution accuracy, i.e., such that $\sup_{x_0 \in X_0} |J_0^o(x_0) - \tilde J_0^o(x_0)|$ is below a desired threshold (Proposition 2.2).

(1) Choose a mother function ψ, either a Gaussian or belonging to the family (17).
(2) Let $\tilde J_N^o(\cdot) := h_N(\cdot)$. Set t := N − 1.
(3) Choose a value of $p_t$ and a corresponding discretized $L_{p_t}$-norm, a number $n_t$ of basis functions, and a number $L_t$ of discretization points.
(4) Use low-discrepancy sequences to generate the discretized set $X_t^{L_t}$.
(4.1) Compute $\hat J_t^o(x_t^l) := \max_{y \in D_t(x_t^l)} \big[ h_t(x_t^l, y) + \beta \tilde J_{t+1}^o(y) \big]$, l = 1, ..., $L_t$.
(4.2) Let $\tilde J_t(\cdot, \delta_t, w_t) := \sum_{i=1}^{n_t} \delta_{t,i} \phi(\cdot, w_{t,i})$ be the structure of the approximate value function obtained via a ridge (see Eq. (15)) or radial (see Eq. (18)) variable-basis approximation scheme with the mother function ψ. Find
\[
\big( \delta_t^o, w_t^o \big) := \arg\min_{\delta_t, w_t} \frac{1}{L_t} \sum_{l=1}^{L_t} \big| \tilde J_t\big(x_t^l, \delta_t, w_t\big) - \hat J_t^o\big(x_t^l\big) \big|^{p_t}.
\]
Let $\tilde J_t^o(\cdot) := \tilde J_t(\cdot, \delta_t^o, w_t^o)$.
(4.3) Compute the maximum error $\mathrm{Err}_t(X_t^{L_t})$ over the discretized set $X_t^{L_t}$:
\[
\mathrm{Err}_t\big(X_t^{L_t}\big) := \max_{l=1,\ldots,L_t} \big| \tilde J_t^o\big(x_t^l\big) - \hat J_t^o\big(x_t^l\big) \big|.
\]
(4.4) If $\mathrm{Err}_t(X_t^{L_t}) \le \varepsilon_t$ and t = 0, then stop. If $\mathrm{Err}_t(X_t^{L_t}) \le \varepsilon_t$ and t ≠ 0, then set t := t − 1 and go back to step (3). If $\mathrm{Err}_t(X_t^{L_t}) > \varepsilon_t$, then increase $p_t$ and/or $L_t$ and/or $n_t$ and go back to step (4).

Many variations of the above-described procedure are possible. For instance, the smoothness of $\tilde J_t^o$ can be increased by adding a regularization term to the objective function of step (4.2), e.g., proportional to the squared $l_2$-norm of the vector of its parameters. Moreover, instead of fixing $\varepsilon_t$ and varying $n_t$ until the desired accuracy is guaranteed, one may fix $n_t$ and find $\varepsilon_t$ a posteriori, after performing step (4.2). If the resulting value of $\varepsilon_t$ is not sufficiently small, then one can increase $n_t$ and repeat step (4.2) with the new value of $n_t$.
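A compact sketch of steps (2)–(4.3) is given below, under simplifying assumptions (scalar state, the Gaussian radial scheme (18), $p_t = 2$, and hypothetical rewards and feasible sets); it is meant only to fix the structure of the loop, not as the implementation used in Sect. 7 (which relied on the Matlab Optimization Toolbox).

```python
# A sketch of the ADP procedure of Sect. 6: Sobol' discretization,
# Bellman maximization at the sampled states, and least-squares fitting
# of the variable-basis parameters.
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import qmc

beta, N, n, L = 0.9, 3, 5, 64

def h(t, x, y):                      # hypothetical transition reward
    return 1.0 - 0.5 * x**2 - 0.5 * (y - 0.5 * x)**2
h_N = lambda x: 1.0 - 0.5 * x**2

def model(x, p):                     # radial scheme (18), parameters p in R^{3n}
    d_, tau, s = p[:n], p[n:2*n], np.abs(p[2*n:]) + 1e-3
    return np.sum(d_ * np.exp(-((x - tau) / s)**2))

J_tilde = [None] * N + [h_N]         # step (2): J_tilde[N] := h_N
for t in range(N - 1, -1, -1):       # backward over the stages
    xs = qmc.Sobol(d=1, seed=t).random(L).ravel()      # step (4)
    # step (4.1): J_hat(x^l) = max_y { h(t, x^l, y) + beta * J_tilde[t+1](y) }
    J_hat = np.array([
        -minimize_scalar(lambda y: -(h(t, x, y) + beta * J_tilde[t + 1](y)),
                         bounds=(0.0, 1.0), method="bounded").fun
        for x in xs])
    # step (4.2): least-squares fit of the basis parameters (p_t = 2)
    loss = lambda p: np.mean([(model(x, p) - v)**2 for x, v in zip(xs, J_hat)])
    p_opt = minimize(loss, np.concatenate(
        [np.zeros(n), np.linspace(0.0, 1.0, n), np.full(n, 0.3)])).x
    J_tilde[t] = lambda x, p=p_opt: model(x, p)
    err = max(abs(model(x, p_opt) - v) for x, v in zip(xs, J_hat))  # (4.3)
    print(f"stage {t}: Err_t = {err:.4f}")
```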
7 Numerical Results

In order to evaluate the effectiveness of our approach, we present numerical results for two instances of Problem $\Sigma_N$: Problem $OC_N^d$ described in Sect. 5, and a test problem for which the optimal solution can be found. In the following, we shall refer to the latter as the "Problem with Known Solution", or Problem $KS_N^d$ for short. We constructed it via the inverse-problem technique used in [49, 50] to derive N-stage optimization problems associated with given optimal policy functions.

Problem $KS_N^d$. Given $X_t := [0,1]^d$, $D_t := [0,1]^d \times [0,1]^d$, t = 0, ..., N − 1, $\beta \in [0,1]$, and a sequence of functions $\bar g_t : [0,1]^d \to [0,1]^d$, t = 0, ..., N − 1, such that $\bar g_t(x_t) \in (0,1)^d$ for all $x_t \in (0,1)^d$, find
\[
J^o(x_0) := \sup_{x_{t+1},\ t=0,\ldots,N-1} \sum_{t=0}^{N-1} \beta^t h_t(x_t, x_{t+1}) + h_N(x_N),
\]
where $h_N(x_N) := 1 - \frac{1}{2}\|x_N\|^2$ and
\[
h_t(x_t, x_{t+1}) := 1 - \tfrac{1}{2}\|x_t\|^2 - \tfrac{1}{2}\|x_{t+1} - \bar g_t(x_t)\|^2 - \beta\big(1 - \tfrac{1}{2}\|x_{t+1}\|^2\big).
\]

It follows by an application of Bellman's equations (2a)–(2b) that the optimal policies of Problem $KS_N^d$ are $g_t^o(x_t) = \bar g_t(x_t)$, t = 0, ..., N − 1, and the value functions are given by $J_t^o(x_t) = 1 - \frac{1}{2}\|x_t\|^2$ for t = 0, ..., N (so they are smooth and $\alpha_t$-concave with $\alpha_t = 1$). Thus, the optimal value function of Problem $KS_N^d$ at time t = 0 is
\[
J^o(x_0) := J_0^o(x_0) = 1 - \tfrac{1}{2}\|x_0\|^2.
\]

For both Problems $KS_N^d$ and $OC_N^d$, we present the results obtained with two instances of the ridge approximation scheme (15), with basis functions given by the hyperbolic tangent sigmoid (in the following, sigmoidal basis functions) and the sinusoid (both belonging to the family (17)), respectively, and one instance of the radial approximation scheme (18) with Gaussian basis functions. The inner parameters are the weights and biases in the sigmoids, the frequencies and phases in the sinusoids, and the centers and widths in the Gaussians. For every t = 0, ..., N − 1, we used the same number $n_t$ of basis functions, denoted simply by n; the left-hand side of (35) was computed for $p_t = 2$. In all the considered cases, for a given value of n, the number of parameters to be optimized is the same for functions belonging to the classes R and G (see Eqs. (15) and (18)). We compared the results with linear approximators with the same type of basis functions, but having fixed inner parameters (so, only the outer parameters $\delta_i$, i = 1, ..., n, have to be optimized). Specifically, we fixed the inner parameter values randomly by using Sobol' low-discrepancy sequences, in order to cover the state space as uniformly as possible. We used such sequences also to discretize via $L_t = 1000$ points the sets $X_t$, t = 0, ..., N − 1 (we used the same value of $L_t$ for all t). The general case in which $L_t$, $n_t$, and $p_t$ depend on t is described in Sect. 6 and can be implemented in a similar way.

To display in a compact way the realizations of the error for various values of the initial state $x_0$, we use a pictorial representation known in statistics as a boxplot [51]. Specifically, the "box" in a boxplot ranges from the 25th percentile (25% of the realizations are at or below it) to the 75th percentile (75% of the realizations are at or below it). The length of the box corresponds to the inter-quartile range, i.e., the difference between the 75th percentile and the 25th percentile. The line inside the box is the median, or the 50th percentile (half of the realizations are above it and half are below it). The lines extending from each end of the box are the whiskers: they contain "less significant" realizations of the error that do not fit into the box. The single values beyond the borders of the whiskers are the outliers. The simulations were performed using the Optimization Toolbox of Matlab on a personal computer with a 1.8 GHz Core2 Duo CPU and 2 GB of RAM.

7.1 Numerical Results for Problem $KS_N^d$

We solved Problem $KS_N^d$ over a total number of four decision stages (i.e., N = 3) and for two values of d: d = 2 and d = 10. The discount factor β was taken equal to 1, and the functions $\bar g_t$, t = 0, ..., N − 1, were chosen as
\[
\bar g_t(x_t) := \frac{1}{2} + \frac{2}{5} \sin(t x_t),
\]
with the sine applied componentwise.
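Since Problem $KS_N^d$ is built by the inverse-problem technique, one can verify numerically that $J_t^o(x) = 1 - \frac{1}{2}\|x\|^2$ satisfies Bellman's equation (2b); the following sketch (with assumed values of d, t, and x) does so for the choice of $\bar g_t$ above.

```python
# Numerical sanity check that 1 - 0.5*||x||^2 satisfies (2b) for KS^d_N:
# the inner maximum over x_{t+1} is attained at g_bar_t(x_t), with value
# 1 - 0.5*||x_t||^2.
import numpy as np
from scipy.optimize import minimize

d, beta, t = 2, 1.0, 1
g_bar = lambda x: 0.5 + 0.4 * np.sin(t * x)
J_next = lambda x: 1.0 - 0.5 * np.sum(x**2)

def h_t(x, y):
    return (1.0 - 0.5 * np.sum(x**2) - 0.5 * np.sum((y - g_bar(x))**2)
            - beta * (1.0 - 0.5 * np.sum(y**2)))

x = np.array([0.3, 0.7])
res = minimize(lambda y: -(h_t(x, y) + beta * J_next(y)),
               x0=np.full(d, 0.5), bounds=[(0.0, 1.0)] * d)
print(res.x, g_bar(x))                      # maximizer ~ g_bar_t(x)
print(-res.fun, 1.0 - 0.5 * np.sum(x**2))   # optimal value ~ J_t^o(x)
```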
The results reported in the following compare the performances of the considered types of fixed- and variable-basis functions for a total of 1000 different values of $x_0 \in [0,1]^d$. As the solution (i.e., $J^o(x_0)$ at stage 0) of Problem $KS_N^d$ is known, once $x_0$ is fixed, we can evaluate the effectiveness of the approach by measuring the distance between the optimal and the approximate solutions as
\[
e_J(x_0) := \frac{\big| \tilde J^o(x_0) - J^o(x_0) \big|}{\big| J^o(x_0) \big|}. \tag{36}
\]
Table 1 provides the medians of the error $e_J(x_0)$ computed as in (36), together with the mean simulation times (in seconds) obtained by the ADP algorithm with fixed- and variable-basis approximators for 1000 different initial conditions $x_0$. Figure 2 contains a pictorial sketch of the error $e_J$ using boxplots.

Table 1  Summary of the simulation results for Problem $KS_N^d$

                       Median of e_J(x_0)                              Mean simulation time [s]
                       Sigmoidal     Gaussian      Sinusoidal         Sigmoidal    Gaussian     Sinusoidal

Fixed-basis functions
d = 2       n = 5      2.00 × 10^-2  1.02 × 10^-3  1.34 × 10^-2       3.33 × 10^1  9.50 × 10^1  9.45 × 10^1
            n = 10     1.84 × 10^-2  3.98 × 10^-4  3.45 × 10^-4       1.35 × 10^2  1.72 × 10^2  1.60 × 10^2
            n = 15     1.35 × 10^-2  1.10 × 10^-4  3.57 × 10^-4       1.07 × 10^2  2.67 × 10^2  2.76 × 10^2
            n = 20     7.01 × 10^-4  8.71 × 10^-5  6.10 × 10^-5       1.37 × 10^2  4.45 × 10^2  3.36 × 10^2
d = 10      n = 5      4.94 × 10^-1  3.16 × 10^-1  4.92 × 10^-1       1.09 × 10^2  7.03 × 10^2  4.68 × 10^2
            n = 10     4.91 × 10^-1  8.58 × 10^-2  4.85 × 10^-1       2.27 × 10^2  3.12 × 10^3  1.25 × 10^3
            n = 15     4.85 × 10^-1  3.18 × 10^-2  4.88 × 10^-1       7.50 × 10^2  4.77 × 10^3  2.30 × 10^3
            n = 20     4.62 × 10^-1  7.07 × 10^-3  4.11 × 10^-1       3.82 × 10^2  6.17 × 10^3  3.57 × 10^3

Variable-basis functions
d = 2       n = 5      5.35 × 10^-4  1.53 × 10^-3  1.86 × 10^-4       1.04 × 10^3  2.39 × 10^3  9.22 × 10^2
            n = 10     1.94 × 10^-5  4.23 × 10^-4  7.88 × 10^-5       2.00 × 10^3  5.39 × 10^3  1.52 × 10^3
            n = 15     7.71 × 10^-6  3.04 × 10^-4  3.65 × 10^-5       2.64 × 10^3  1.02 × 10^4  2.28 × 10^3
            n = 20     8.67 × 10^-6  3.46 × 10^-5  2.96 × 10^-5       3.31 × 10^3  1.33 × 10^4  2.63 × 10^3
d = 10      n = 5      3.86 × 10^-1  2.07 × 10^-1  3.01 × 10^-1       1.19 × 10^4  1.50 × 10^4  1.36 × 10^4
            n = 10     3.27 × 10^-1  2.01 × 10^-2  4.29 × 10^-2       1.98 × 10^4  2.64 × 10^4  1.99 × 10^4
            n = 15     1.25 × 10^-1  1.94 × 10^-3  2.19 × 10^-2       2.07 × 10^4  3.78 × 10^4  2.08 × 10^4
            n = 20     6.95 × 10^-3  3.23 × 10^-3  1.67 × 10^-2       2.24 × 10^4  4.92 × 10^4  2.23 × 10^4

Fig. 2 Boxplots of the error $e_J$ for Problem $KS_N^d$, where "F" and "V" denote the approximation schemes with fixed- and variable-basis functions, respectively

7.2 Numerical Results for Problem $OC_N^d$

We solved Problem $OC_N^d$ over a total number of four decision stages (i.e., N = 3) and for various numbers d of consumers: d = 2, d = 10, and d = 30. In all these cases, the labor income $y_{t,j}$ of consumer j at time t, j = 1, ..., d, t = 0, ..., N − 1, was randomly generated according to the uniform distribution on the interval [2, 5]. The interest rate $r_{t,j}$ of the good consumed by consumer j at time t, j = 1, ..., d, t = 0, ..., N, was uniformly distributed between 0 and 0.1. The discount factor β was taken equal to 1, and $a_{0,j}^{\max}$ was set equal to 20 for every j = 1, ..., d.
7.2 Numerical Results for Problem OC^d_N

We solved Problem OC^d_N over a total number of four decision stages (i.e., N = 3) and various numbers d of consumers: d = 2, d = 10, and d = 30. In all these cases, the labor income y_{t,j} of consumer j at time t, j = 1, . . . , d, t = 0, . . . , N − 1, was randomly generated according to the uniform distribution on the interval [2, 5]. The interest rate r_{t,j} of the good consumed by consumer j at time t, j = 1, . . . , d, t = 0, . . . , N, was uniformly distributed between 0 and 0.1. The discount factor β was taken equal to 1, and a_{0,j}^max was set equal to 20 for every j = 1, . . . , d.

The conditions of Assumption 5.2 were imposed by choosing suitable logarithmic functions for u(·) and v_{t,j}(·) in such a way that, for every consumer, the choices a_{t+1,j} = a_{t+1,j}^min and a_{t+1,j} = (1 + r_{t,j})(a_{t,j} + y_{t,j}) (i.e., c_t = 0) are penalized, so that they are never optimal next choices for the jth component of the state, at least for values of a_{t,j} in a suitable interval of the form [ā_{t,j}^min, ā_{t,j}^max] ⊂ [a_{t,j}^min, a_{t,j}^max], where a_{t,j}^min and a_{t,j}^max are determined by a_{0,j}^max and Assumption 5.1. In particular, the function u(·), for t = 0, . . . , N, was taken equal to

    u(c_t) := (3/2) Σ_{j=1}^d K ln(c_{t,j} + ε) − (1/2) √(4 + (Σ_{j=1}^d K ln(c_{t,j} + ε))²),   (37)

where K := 10 and ε := 1. For t = 0, . . . , N − 1 and j = 1, . . . , d, we have chosen v_{t,j}(a_{t,j}) := K ln(a_{t,j} − a_{t,j}^min + ε), with the same values K := 10 and ε := 1 as in u(c_t). The use of logarithmic reward functions is quite common for the problem of optimal consumption (see, e.g., [43, Chap. 6]). The value ε > 0 in the expression (37) of the social utility function has to be sufficiently small so that the choice c_{t,j} = 0, j = 1, . . . , d, is sufficiently penalized (i.e., the corresponding value u(c_{t,j}) is negative and has a sufficiently large absolute value), while the arguments of the logarithms remain positive. The function u(c_t) specified by (37) is of the form f_2(f_1(c_t)), where f_1(c_t) := Σ_{j=1}^d K ln(c_{t,j} + ε) and f_2(z) := (3/2) z − (1/2) √(4 + z²); the functions f_1 and f_2 are nonlinear, and their composition is strongly concave. The function f_2 allows one to increase the interactions among the d consumers.
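The reward functions just described are straightforward to code; the following sketch reproduces the composition u = f_2 ∘ f_1 of (37) and the per-consumer rewards v_{t,j}, with a_{t,j}^min passed as an argument since it depends on the data of the problem.

```python
import numpy as np

K, EPS = 10.0, 1.0   # the constants chosen in the experiments

def f1(c):
    # Additive aggregator over the d consumers: sum_j K ln(c_j + eps).
    return K * np.log(np.asarray(c, dtype=float) + EPS).sum()

def f2(z):
    # Concave nonlinear transform (3/2) z - (1/2) sqrt(4 + z^2); it couples
    # the consumption choices of the d consumers.
    return 1.5 * z - 0.5 * np.sqrt(4.0 + z ** 2)

def social_utility(c):
    # u(c_t) = f2(f1(c_t)), Eq. (37).
    return f2(f1(c))

def v(a, a_min):
    # Per-consumer reward v_{t,j}(a_{t,j}) = K ln(a_{t,j} - a_{t,j}^min + eps).
    return K * np.log(a - a_min + EPS)

print(social_utility([1.0, 2.0]))   # e.g., d = 2 consumers
```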
The results reported in the following compare the performances of the considered types of fixed- and variable-basis functions for a total of 1000 different values of a_{0,j} ∈ [a_{0,j}^min, a_{0,j}^max], j = 1, . . . , d (see Assumption 5.1). Examples of the wealth income a_{t,j} in the cases of d = 2, d = 10, and d = 30 consumers obtained with variable-basis functions are reported in Fig. 3.

Fig. 3 Examples of wealth incomes of Problem OC^d_N obtained with variable-basis functions

Table 2 provides the medians of the approximate value J̃^o(a_0) at stage 0, together with the mean simulation times (in seconds) obtained by the ADP algorithm with both fixed- and variable-basis approximators for 1000 different initial conditions a_0 in the cases of 2, 10, and 30 consumers.

Table 2 Summary of the simulation results for Problem OC^d_N

                          Median of J̃^o(a_0)                              Mean simulation time [s]
                          Sigmoidal     Gaussian      Sinusoidal          Sigmoidal     Gaussian      Sinusoidal
Fixed-basis functions
d = 2      n = 5          1.84 × 10^2   1.08 × 10^2   1.15 × 10^2         8.63 × 10^1   9.62 × 10^1   7.97 × 10^1
           n = 10         2.61 × 10^2   1.89 × 10^2   1.38 × 10^2         1.59 × 10^2   2.68 × 10^2   1.75 × 10^2
           n = 15         2.61 × 10^2   2.55 × 10^2   1.39 × 10^2         2.71 × 10^2   5.41 × 10^2   2.97 × 10^2
           n = 20         2.64 × 10^2   2.88 × 10^2   1.45 × 10^2         4.80 × 10^2   1.02 × 10^3   5.19 × 10^2
d = 10     n = 5          6.79 × 10^2   6.08 × 10^2   7.07 × 10^2         7.61 × 10^2   6.96 × 10^2   8.28 × 10^2
           n = 10         8.29 × 10^2   8.81 × 10^2   7.15 × 10^2         1.94 × 10^3   4.80 × 10^3   1.44 × 10^3
           n = 15         9.41 × 10^2   9.92 × 10^2   7.20 × 10^2         3.07 × 10^3   1.17 × 10^4   1.99 × 10^3
           n = 20         1.09 × 10^3   1.33 × 10^3   7.50 × 10^2         4.56 × 10^3   1.47 × 10^4   2.83 × 10^3
d = 30     n = 5          1.94 × 10^3   2.07 × 10^3   1.98 × 10^3         3.93 × 10^3   1.86 × 10^3   3.73 × 10^3
           n = 10         2.10 × 10^3   2.17 × 10^3   2.14 × 10^3         8.07 × 10^3   4.15 × 10^3   5.95 × 10^3
           n = 15         2.19 × 10^3   2.18 × 10^3   2.19 × 10^3         1.01 × 10^4   2.03 × 10^4   9.74 × 10^3
           n = 20         2.19 × 10^3   2.36 × 10^3   2.28 × 10^3         1.09 × 10^4   3.06 × 10^4   1.22 × 10^4
Variable-basis functions
d = 2      n = 5          2.88 × 10^2   2.90 × 10^2   1.48 × 10^2         2.44 × 10^3   7.21 × 10^3   1.06 × 10^3
           n = 10         2.91 × 10^2   2.90 × 10^2   1.48 × 10^2         5.17 × 10^3   1.07 × 10^4   1.26 × 10^3
           n = 15         2.91 × 10^2   2.91 × 10^2   1.60 × 10^2         9.24 × 10^3   3.12 × 10^4   2.56 × 10^3
           n = 20         2.90 × 10^2   2.91 × 10^2   1.55 × 10^2         1.43 × 10^4   3.92 × 10^4   4.18 × 10^3
d = 10     n = 5          8.22 × 10^2   1.28 × 10^3   7.71 × 10^2         9.42 × 10^3   1.64 × 10^4   4.58 × 10^3
           n = 10         7.80 × 10^2   1.32 × 10^3   7.82 × 10^2         1.41 × 10^4   2.37 × 10^4   1.24 × 10^4
           n = 15         9.51 × 10^2   1.34 × 10^3   9.00 × 10^2         1.46 × 10^4   3.78 × 10^4   1.57 × 10^4
           n = 20         9.85 × 10^2   1.29 × 10^3   9.53 × 10^2         1.54 × 10^4   4.11 × 10^4   1.94 × 10^4
d = 30     n = 5          2.18 × 10^3   2.66 × 10^3   2.24 × 10^3         1.28 × 10^4   1.76 × 10^4   2.94 × 10^4
           n = 10         2.17 × 10^3   2.58 × 10^3   2.46 × 10^3         1.48 × 10^4   3.28 × 10^4   3.48 × 10^4
           n = 15         2.47 × 10^3   3.51 × 10^3   2.56 × 10^3         1.57 × 10^4   4.11 × 10^4   3.71 × 10^4
           n = 20         2.60 × 10^3   3.64 × 10^3   2.66 × 10^3         1.61 × 10^4   4.87 × 10^4   3.95 × 10^4

Due to the unavailability of closed-form solutions to Problem OC^d_N, following the criterion adopted in [12, p. 46], we evaluated the performances of the approximations on the basis of many different solutions obtained with the ADP approach itself. More specifically, once n and d are fixed, for each initial condition a_0 the corresponding "optimal" value J^o(a_0) was taken to be the largest one among those obtained by the sigmoidal, Gaussian, or sinusoidal basis functions with either fixed- or variable-basis schemes. We denote such an optimal value by J̃^{o,max}(a_0). Then, we define the error e_J(a_0) as the normalized difference between the value J̃^o(a_0) obtained by a given type of basis function and the "optimal" one J̃^{o,max}(a_0), i.e.,

    e_J(a_0) := |J̃^o(a_0) − J̃^{o,max}(a_0)| / |J̃^{o,max}(a_0)|.   (38)

Such a procedure does not allow one to evaluate the performance of an approximation strategy per se, i.e., it does not provide the distance between J̃^o(a_0) and J^o(a_0), but it is useful whenever one wants to compare the solutions of a given optimization problem obtained by the same algorithm in different "configurations". In our case, such configurations correspond to the different types of basis functions used for the approximation of the value functions with the ADP algorithm using either fixed- or variable-basis schemes. Figure 4 shows the boxplots of the "errors" e_J(a_0) computed according to (38) for different initial conditions a_0 and types and numbers of basis functions in the cases of 2, 10, and 30 consumers.

Fig. 4 Boxplots of the errors e_J for Problem OC^d_N, where "F" and "V" denote the approximation schemes using fixed- and variable-basis functions, respectively
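A minimal sketch of the comparison criterion (38): each configuration is measured against the best value returned by any of the six configurations (three basis types, each with fixed or variable bases). The dictionary `values` is a hypothetical container mapping a configuration label to the array of values J̃^o(a_0) over the 1000 initial conditions.

```python
import numpy as np

def relative_gaps(values):
    # values: dict mapping configuration label -> array of approximate optimal
    # values, one entry per initial condition a0.
    stacked = np.stack(list(values.values()))   # shape (n_configs, n_runs)
    best = stacked.max(axis=0)                  # J-tilde^{o,max}(a0) in (38)
    return {cfg: np.abs(v - best) / np.abs(best) for cfg, v in values.items()}
```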
7.3 Discussion of the Simulation Results

The simulation results obtained for Problems KS^d_N and OC^d_N exhibit similar features, so a unified discussion is presented.

For both problems, the best results were obtained by using Gaussian variable-basis functions. The performances of the sigmoidal basis functions are similar to those of the Gaussian ones only for d = 2, whereas a larger value of d (i.e., d = 10 or d = 30) entails worse results. The sinusoidal basis functions provide the worst results for d = 2, whereas for d = 10 and d = 30 they yield results similar to those obtained with the sigmoidal functions.

The similar performances of sigmoidal and sinusoidal functions may be ascribed to the fact that they belong to the same family R defined by (15), i.e., they both use a scalar product between coefficients and inputs. By contrast, the Gaussian basis functions belong to the family G defined by (18) and are based on a distance between the inputs and the coordinates of the centers rather than on a scalar product. As a consequence, they are more "localized" than sigmoids and sinusoids, which are spread all over the domain X_t. In the considered examples, the radial variable-basis scheme provides better results than the ridge one.

As compared with the results obtained with fixed-basis approximators, variable-basis ones guarantee better performances in terms of accuracy. The gap in the results increases with d: for d = 2, the performances of the two types of approximators are similar, whereas for d = 10 and d = 30 the variable-basis approximators outperform the fixed-basis ones. In both Problems KS^d_N and OC^d_N, the simulation results confirm the good properties of variable-basis approximation schemes when the dimension of the inputs of the value functions increases: the number of fixed-basis functions needed to guarantee the same approximation capabilities as variable-basis ones grows with that dimension.

Once the type of basis functions and the number d have been fixed, the value of the errors decreases as the number n of basis functions increases. This turns out to be more evident with d = 10 or d = 30, whereas with d = 2 the difference in the results obtained with various n is reduced. This can be explained as follows. In the case of d = 2, the dimension of the inputs x_t, t = 0, . . . , N − 1, of the approximate tth value functions J̃^o_t is quite small, so even a small number of basis functions can provide satisfactory approximations. In other words, in this case there is no need to use many basis functions, i.e., many parameters. On the contrary, when the dimension of the input of the approximate tth value functions increases (in the example, from d = 10 to d = 30), a larger value of n (thus a larger number of parameters) provides an increased approximation capability, which enables one to obtain better approximations. This is particularly evident in the case of d = 30.

Concerning the simulation times, the larger d, the larger the computational time needed to perform the optimizations with either fixed- or variable-basis functions. From the numerical results it turns out that the Gaussians require a larger computational effort than the other two types of basis functions. The computational times of the sigmoidal and sinusoidal basis functions are similar to each other and smaller than the corresponding times for the Gaussians.
In all cases, as expected, the simulation times grow as the number n of basis functions grows, since the optimal values of a larger number of parameters have to be found. For the same number n of basis functions, the variable-basis schemes require a larger computational effort. Indeed, the number of parameters to be optimized in the variable-basis case is larger than in the fixed-basis one (one has to optimize also the values of the inner parameters).

7.4 Computational Aspects

Table 3 summarizes the computational efforts needed to find the approximations of the value functions at each stage t, using at each stage the same numbers L of discretization levels and n of basis functions.

Table 3 Computational efforts of value-function approximations

                                            Fixed-basis schemes           Variable-basis schemes
Number of discretization points of X_t      L                             L
Number of optimizations                     L + 1 (Eqs. (32) and (35))    L + 1 (Eqs. (32) and (35))
Number of unknowns                          d (Eq. (32))                  d (Eq. (32))
                                            n (Eq. (35))                  dn + 2n (Eq. (35))

The approximate value functions J̃^o_t are obtained by solving the optimization problem (35) at each time t. In the case of variable-basis functions, such an optimization is performed in the (dn + 2n)-dimensional space of the parameters of the approximating structures (15) and (18). For fixed-basis functions, the number of unknowns in Eq. (35) is equal to n. At each time t = 0, . . . , N − 1, to find the approximation of J^o_t, we first have to perform the L optimizations (32), so as to obtain the values of Ĵ^o_t at the discretization points x^l_t of X_t. Each of these additional optimizations involves merely d unknowns, and thus is easier to solve than the optimization (35). Hence, the total number of optimizations that have to be performed at each time t is equal to L + 1.

Each optimization in (35) is a mathematical programming problem that can be solved, e.g., via iterative descent methods. Specifically, we exploited the sequential quadratic programming algorithm [52]. In the case of the variable-basis approach, problem (35) is more difficult to solve than in the fixed-basis one, because of the presence of the free inner parameters, on which the approximate value functions depend nonlinearly. As a consequence, the numerical solver used for (35) may be trapped in local minima, thus compromising the effectiveness of the approximation. In order to mitigate this risk, we have adopted a "multistart" technique, which consists in solving (35) for several different initial values of the parameter vectors of the approximating structures (15) and (18), and choosing the parameters corresponding to the best result as the optimal ones. By contrast, in the fixed-basis case the approximate value functions depend linearly on the unknown parameters, and thus their optimal selection can be performed more easily. However, when globally optimal solutions are found, the quality of variable-basis approximations is better in general, as variable-basis functions have a greater approximation capability than fixed-basis ones.
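The following sketch illustrates, under stated assumptions, the multistart solution of (35) for the Gaussian scheme (18): SciPy's SLSQP solver stands in for the Matlab SQP routine used in the experiments, the dn + 2n parameters are packed into a single vector, and the objective is the empirical squared error at the grid points (the case p_t = 2); `grid_pts` and `j_hat_vals` denote the hypothetical pairs produced by the L optimizations (32).

```python
import numpy as np
from scipy.optimize import minimize

def fit_gaussian_value_fn(grid_pts, j_hat_vals, n, n_starts=10, seed=0):
    # Multistart fit of the radial model (18): parameters are packed as
    # [outer coeffs (n), centers (n*d), widths (n)], i.e., dn + 2n unknowns.
    L, d = grid_pts.shape
    rng = np.random.default_rng(seed)

    def model(theta):
        delta = theta[:n]
        centers = theta[n:n + n * d].reshape(n, d)
        widths = theta[n + n * d:]
        dist2 = ((grid_pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return (delta * np.exp(-dist2 / widths ** 2)).sum(-1)

    def loss(theta):   # empirical squared error, the criterion (35) with p_t = 2
        return np.mean((model(theta) - j_hat_vals) ** 2)

    bounds = ([(None, None)] * n           # outer coefficients: unconstrained
              + [(0.0, 1.0)] * (n * d)     # centers kept inside [0, 1]^d
              + [(1e-2, None)] * n)        # widths kept strictly positive
    best = None
    for _ in range(n_starts):              # multistart against local minima
        theta0 = np.concatenate([rng.standard_normal(n),
                                 rng.random(n * d),
                                 0.5 + rng.random(n)])
        res = minimize(loss, theta0, method='SLSQP', bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun

# Example on synthetic data: fit n = 5 Gaussians to values on a 2-D grid.
pts = np.random.default_rng(2).random((100, 2))
vals = np.sin(pts.sum(axis=1))
theta, err = fit_gaussian_value_fn(pts, vals, n=5)
print(f"fitted squared error: {err:.2e}")
```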
In general, the number of fixed-basis functions needed to obtain the same approximation accuracy as variable-basis ones is large if d is large. Hence, in the online phase of the ADP algorithm, after having solved problem (35), one has to deal with complex approximators, in which the linear combinations in (15) or (18) are made up of a large number of terms. On the contrary, variable-basis approximators are easier to deal with online, once the optimal values of the parameters have been found, since the sums in (15) or (18) are made up of a smaller number of terms.

The use of sigmoidal or Gaussian basis functions requires the computation of exponential functions, whereas the use of sinusoidal basis functions requires the computation of trigonometric functions. In both cases, this is usually done by means of truncated Taylor series. Furthermore, the use of Gaussian basis functions entails the computation of norms, whereas sigmoidal and sinusoidal basis functions are based on the computation of inner products. The complexity of these two operations is quite similar. Thus, we can conclude that the time needed to compute the output of an approximating structure is almost the same for all the considered types of approximators. The choice of the type of approximator does not change the total number of optimizations described above. The increased amount of computational time needed to find the approximations when using the Gaussians may be ascribed to the form of the objective functions one has to deal with in the various optimizations. In this respect, note that Gaussians exhibit geometrical properties opposite to those of sigmoids and sinusoids. In the first case, one has to compute distances to centers, and such distances become the arguments of a Gaussian; hence, the units respond to "localized" regions. By contrast, in the second case, the arguments of the basis functions are weighted sums of inputs plus biases, so the units respond to "nonlocalized" regions of the input space. The local nature of the Gaussian functions may generate very complex objective functions, possibly with many local minima. For this reason, the optimization procedure may be more complex, and so more time-consuming, when Gaussian functions are used.

8 Conclusive Remarks

We have investigated the solution of sequential decision problems, modeled as Problem Σ_N, by means of DP combined with approximation of the value functions at each stage. We have considered variable-basis approximation schemes frequently used in applications, such as neural networks. Both standard approximate DP and the multistage-lookahead technique have been studied. In particular, we have addressed the curse of dimensionality in value-function approximation, i.e., the risk of a very fast growth of the number of basis functions in the approximation scheme (hence, of the number of coefficients to be optimized) required to guarantee a desired approximation accuracy of the value functions. The estimates that we have derived on the accuracy of suboptimal solutions are proportional to 1/√n, where n is the number of variable-basis functions. To the best of our knowledge, for quite general formulations of sequential optimization problems in the form of Problem Σ_N, these are the first estimates of this kind, in which the number of variable-basis functions required to guarantee a desired accuracy is estimated. Our results show a way to face, via approximate DP, high-dimensional, continuous-state, sequential decision problems, and they provide insights into the effectiveness of value-function approximation by neural networks, for which there exists large experimental evidence. The proposed approach has been tested numerically on a problem of optimal consumption under uncertainty, for which we have compared traditional fixed-basis approximators with variable-basis ones that model Gaussian, sinusoidal, and sigmoidal neural networks.

Appendix

Proof of Proposition 2.2 (i) We use a backward induction argument. For t = N − 1, . . . , 0, assume that, at stage t + 1, J̃^o_{t+1} ∈ F_{t+1} is such that sup_{x_{t+1} ∈ X_{t+1}} |J^o_{t+1}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ η_{t+1} for some η_{t+1} ≥ 0. In particular, for t = N − 1, one has η_N = 0, as J̃^o_N = J^o_N.
By (3), there exists f_t ∈ F_t such that sup_{x_t ∈ X_t} |(T_t J̃^o_{t+1})(x_t) − f_t(x_t)| ≤ ε_t. Set J̃^o_t = f_t. By the triangle inequality and Proposition 2.1,

    sup_{x_t ∈ X_t} |J^o_t(x_t) − J̃^o_t(x_t)| ≤ sup_{x_t ∈ X_t} |(T_t J^o_{t+1})(x_t) − (T_t J̃^o_{t+1})(x_t)| + sup_{x_t ∈ X_t} |(T_t J̃^o_{t+1})(x_t) − J̃^o_t(x_t)| ≤ β η_{t+1} + ε_t =: η_t.

Then, after N iterations we get sup_{x_0 ∈ X_0} |J^o_0(x_0) − J̃^o_0(x_0)| ≤ η_0 = ε_0 + β η_1 = ε_0 + β ε_1 + β² η_2 = · · · = Σ_{t=0}^{N−1} β^t ε_t.

(ii) As before, for t = N − 1, . . . , 0, assume that, at stage t + 1, J̃^o_{t+1} ∈ F_{t+1} is such that sup_{x_{t+1} ∈ X_{t+1}} |J^o_{t+1}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ η_{t+1} for some η_{t+1} ≥ 0. In particular, for t = N − 1, one has η_N = 0, as J̃^o_N = J^o_N. Let Ĵ^o_t = T_t J̃^o_{t+1}. Proposition 2.1 gives

    sup_{x_t ∈ X_t} |J^o_t(x_t) − Ĵ^o_t(x_t)| = sup_{x_t ∈ X_t} |(T_t J^o_{t+1})(x_t) − (T_t J̃^o_{t+1})(x_t)| ≤ β sup_{x_{t+1} ∈ X_{t+1}} |J^o_{t+1}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ β η_{t+1}.

Before moving to the tth stage, one has to find an approximation J̃^o_t ∈ F_t of J^o_t = T_t J^o_{t+1}. Such an approximation has to be obtained from Ĵ^o_t = T_t J̃^o_{t+1} (which, in general, may not belong to F_t), because J^o_t = T_t J^o_{t+1} is unknown. By assumption, there exists f_t ∈ F_t such that sup_{x_t ∈ X_t} |J^o_t(x_t) − f_t(x_t)| ≤ ε_t. However, in general, one cannot set J̃^o_t = f_t since, on a neighborhood of radius β η_{t+1} of Ĵ^o_t in the sup-norm, there may exist (besides J^o_t) some other function I_t ≠ J^o_t which can also be approximated by some function f̃_t ∈ F_t with error less than or equal to ε_t. As J^o_t is unknown, in the worst case it happens that one chooses J̃^o_t = f̃_t instead of J̃^o_t = f_t. In such a case, we get

    sup_{x_t ∈ X_t} |J^o_t(x_t) − J̃^o_t(x_t)| ≤ sup_{x_t ∈ X_t} |J^o_t(x_t) − Ĵ^o_t(x_t)| + sup_{x_t ∈ X_t} |Ĵ^o_t(x_t) − I_t(x_t)| + sup_{x_t ∈ X_t} |I_t(x_t) − J̃^o_t(x_t)| ≤ 2β η_{t+1} + ε_t.

Let η_t := 2β η_{t+1} + ε_t. Then, after N iterations we have sup_{x_0 ∈ X_0} |J^o_0(x_0) − J̃^o_0(x_0)| ≤ η_0 = ε_0 + 2β η_1 = ε_0 + 2β ε_1 + 4β² η_2 = · · · = Σ_{t=0}^{N−1} (2β)^t ε_t.

Proof of Proposition 2.3 Set η_{N/M} = 0 and, for t = N/M − 1, . . . , 0, assume that, at stage t + 1 of ADP(M), J̃^o_{t+1} ∈ F_{t+1} is such that sup_{x_{t+1} ∈ X_{t+1}} |J^o_{M·(t+1)}(x_{t+1}) − J̃^o_{t+1}(x_{t+1})| ≤ η_{t+1}. Proceeding as in the proof of Proposition 2.2(i), we get the recursion η_t = 2β^M η_{t+1} + ε_t (where β^M replaces β, since in each iteration of ADP(M) one can apply Proposition 2.1 M times).

In order to prove Proposition 3.1, we shall apply the following technical lemma (which readily follows by [53, Theorem 2.13, p. 69] and the example in [53, p. 70]). Given a square partitioned real matrix M = [A B; C D] such that D is nonsingular, Schur's complement M/D of D in M is defined [53, p. 18] as the matrix M/D = A − B D⁻¹ C. For a symmetric real matrix, we denote by λ_max its maximum eigenvalue.

Lemma 9.1 Let M = [A B; Bᵀ D] be a partitioned symmetric negative-semidefinite matrix such that D is nonsingular. Then λ_max(M/D) ≤ λ_max(M).

In the proof of the next theorem, we shall use the following notation. The symbol ∇ denotes the gradient operator when it is applied to a scalar-valued function and the Jacobian operator when it is applied to a vector-valued function. We use the notation ∇² for the Hessian. In the case of a composite function, e.g., f(g(x, y, z), h(x, y, z)), by ∇_i f(g(x, y, z), h(x, y, z)) we denote the gradient of f with respect to its ith (vector) argument, computed at (g(x, y, z), h(x, y, z)). The full gradient of f with respect to the argument x is denoted by ∇_x f(g(x, y, z), h(x, y, z)).
Similarly, by ∇²_{i,j} f(g(x, y, z), h(x, y, z)) we denote the submatrix of the Hessian of f computed at (g(x, y, z), h(x, y, z)), whose first indices belong to the vector argument i and whose second ones belong to the vector argument j. ∇J^o_t(x_t) is a column vector, and ∇g^o_t(x_t) is a matrix whose rows are the transposes of the gradients of the components of g^o_t(x_t). We denote by g^o_{t,j} the jth component of the optimal policy function g^o_t (j = 1, . . . , d). The other notations used in the proof are detailed in Sect. 3.

Proof of Proposition 3.1 (i) Let us first show by backward induction on t that J^o_t ∈ C^m(X_t) and, for every j ∈ {1, . . . , d}, g^o_{t,j} ∈ C^{m−1}(X_t) (which we also need in the proof). Since J^o_N = h_N, we have J^o_N ∈ C^m(X_N) by hypothesis. Now, fix t and suppose that J^o_{t+1} ∈ C^m(X_{t+1}) and is concave. Let x_t ∈ int(X_t). As by hypothesis the optimal policy g^o_t is interior on int(X_t), the first-order optimality condition ∇_2 h_t(x_t, g^o_t(x_t)) + β ∇J^o_{t+1}(g^o_t(x_t)) = 0 holds. By the implicit function theorem we get

    ∇g^o_t(x_t) = −[∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t))]⁻¹ ∇²_{2,1} h_t(x_t, g^o_t(x_t)),   (39)

where ∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t)) is nonsingular, as ∇²_{2,2} h_t(x_t, g^o_t(x_t)) is negative semidefinite by the α_t-concavity of h_t for α_t > 0, and ∇²J^o_{t+1}(g^o_t(x_t)) is negative definite since J^o_{t+1} is concave.

By differentiating the two members of (39) up to derivatives of h_t and J^o_{t+1} of order m, we get g^o_{t,j} ∈ C^{m−1}(int(X_t)) for j = 1, . . . , d. As the expressions that one can obtain for its partial derivatives up to the order m − 1 are bounded and continuous not only on int(X_t) but on the whole X_t, one has g^o_{t,j} ∈ C^{m−1}(X_t).

By differentiating the equality J^o_t(x_t) = h_t(x_t, g^o_t(x_t)) + β J^o_{t+1}(g^o_t(x_t)) we obtain

    ∇J^o_t(x_t)ᵀ = ∇_1 h_t(x_t, g^o_t(x_t))ᵀ + [∇_2 h_t(x_t, g^o_t(x_t)) + β ∇J^o_{t+1}(g^o_t(x_t))]ᵀ ∇g^o_t(x_t).

So, by the first-order optimality condition we get

    ∇J^o_t(x_t) = ∇_1 h_t(x_t, g^o_t(x_t)).   (40)

By differentiating the two members of (40) up to derivatives of h_t of order m, we obtain J^o_t ∈ C^m(int(X_t)). As for the optimal policies, this extends to J^o_t ∈ C^m(X_t). In order to conclude the backward induction step, it remains to show that J^o_t is concave. This can be proved by the following direct argument. By differentiating (40) and using (39), for the Hessian of J^o_t we obtain

    ∇²J^o_t(x_t) = ∇²_{1,1} h_t(x_t, g^o_t(x_t)) − ∇²_{1,2} h_t(x_t, g^o_t(x_t)) [∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t))]⁻¹ ∇²_{2,1} h_t(x_t, g^o_t(x_t)),

which is Schur's complement of [∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t))] in the matrix

    [ ∇²_{1,1} h_t(x_t, g^o_t(x_t))   ∇²_{1,2} h_t(x_t, g^o_t(x_t))                                  ]
    [ ∇²_{2,1} h_t(x_t, g^o_t(x_t))   ∇²_{2,2} h_t(x_t, g^o_t(x_t)) + β ∇²J^o_{t+1}(g^o_t(x_t)) ].

Note that such a matrix is negative semidefinite, as it is the sum of the two matrices

    [ ∇²_{1,1} h_t(x_t, g^o_t(x_t))   ∇²_{1,2} h_t(x_t, g^o_t(x_t)) ]        [ 0   0                               ]
    [ ∇²_{2,1} h_t(x_t, g^o_t(x_t))   ∇²_{2,2} h_t(x_t, g^o_t(x_t)) ]   and  [ 0   β ∇²J^o_{t+1}(g^o_t(x_t)) ],

which are negative semidefinite, as h_t and J^o_{t+1} are concave and twice continuously differentiable. In particular, it follows by [54, p. 102] (which gives bounds on the eigenvalues of the sum of two symmetric matrices) that its maximum eigenvalue is smaller than or equal to −α_t.
Then, it follows by Lemma 9.1 that J^o_t is concave (even α_t-concave). Thus, by backward induction on t and by the compactness of X_t, we conclude that, for every t = N, . . . , 0, J^o_t ∈ C^m(X_t) ⊂ W^m_p(int(X_t)) for every 1 ≤ p ≤ +∞.

(ii) As X_t is bounded and convex, by Sobolev's extension theorem [34, Theorem 5, p. 181, and Example 2, p. 189], for every 1 ≤ p ≤ +∞, the function J^o_t ∈ W^m_p(int(X_t)) can be extended on the whole R^d to a function J̄^{o,p}_t ∈ W^m_p(R^d).

(iii) For 1 < p < +∞, the statement follows by item (ii) and the equivalence between Sobolev spaces and Bessel potential spaces [34, Theorem 3, p. 135]. For p = 1 and m ≥ 2 even, it follows by item (ii) and the inclusion W^m_1(R^d) ⊂ B^m_1(R^d) from [34, p. 160].

Proof of Proposition 3.2 (i) It is proved like Proposition 3.1, by replacing J^o_{t+1} with J̃^o_{t+1} and g^o_t with g̃^o_t.

(ii) Inspection of the proof of Proposition 3.1(i) shows that J^o_t is α_t-concave (α_t > 0) for t = 0, . . . , N − 1, whereas the α_N-concavity (α_N > 0) of J^o_N = h_N is assumed. By (12) and condition (10), J̃^o_{t+1,j} is concave for j sufficiently large. Hence, one can apply (i) to J̃^o_{t+1,j}, and so there exists Ĵ^{o,p}_{t,j} ∈ W^m_p(R^d) such that T_t J̃^o_{t+1,j} = Ĵ^{o,p}_{t,j}|_{X_t}. Proceeding as in the proof of Proposition 3.1, one obtains equations analogous to (39) and (40) (with obvious replacements). Then, by differentiating T_t J̃^o_{t+1,j} up to the order m, we get

    lim_{j→∞} max_{0≤|r|≤m} sup_{x_t ∈ X_t} |D^r J^o_t(x_t) − D^r (T_t J̃^o_{t+1,j})(x_t)| = 0.

Finally, the statement follows by the continuity of the embedding of C^m(X_t) into W^m_p(int(X_t)) (since X_t is compact) and the continuity of Sobolev's extension operator.

Proof of Proposition 4.1 (i) For ω ∈ R^d, let M(ω) = max{‖ω‖, 1}, let ν be a positive integer, and define the set of functions

    Γ^ν(R^d) := {f ∈ L²(R^d) : ∫_{R^d} M(ω)^ν |f̂(ω)| dω < ∞},

where f̂ is the Fourier transform of f. For f ∈ Γ^ν(R^d), let

    ‖f‖_{Γ^ν(R^d)} := ∫_{R^d} M(ω)^ν |f̂(ω)| dω,

and, for θ > 0, denote by

    B_θ(‖·‖_{Γ^ν(R^d)}) := {f ∈ L²(R^d) : ∫_{R^d} M(ω)^ν |f̂(ω)| dω ≤ θ}

the closed ball of radius θ in Γ^ν(R^d). By [55, Corollary 3.2]³, the compactness of the support of ψ, and the regularity of its boundary (which allows one to apply the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168]), for s = d/2 + 1 and ψ ∈ S^{q+s}, there exists⁴ C₁ > 0 such that, for every f ∈ B_θ(‖·‖_{Γ^{q+s+1}}) and every positive integer n, there is f_n ∈ R(ψ, n) such that

    max_{0≤|r|≤q} sup_{x ∈ X} |D^r f(x) − D^r f_n(x)| ≤ C₁ θ/√n.   (41)

The next step consists in proving that, for every positive integer ν and s = d/2 + 1, the space W^{ν+s}_2(R^d) is continuously embedded in Γ^ν(R^d). Let f ∈ W^{ν+s}_2(R^d). Then

    ∫_{R^d} M(ω)^ν |f̂(ω)| dω = ∫_{‖ω‖≤1} |f̂(ω)| dω + ∫_{‖ω‖>1} ‖ω‖^ν |f̂(ω)| dω.

The first integral is finite by the Cauchy–Schwarz inequality and the finiteness of ∫_{‖ω‖≤1} |f̂(ω)|² dω. To study the second integral, taking the hint from [37, p. 941], we factorize ‖ω‖^ν |f̂(ω)| = a(ω) b(ω), where a(ω) := (1 + ‖ω‖^{2s})^{−1/2} and b(ω) := ‖ω‖^ν |f̂(ω)| (1 + ‖ω‖^{2s})^{1/2}. By the Cauchy–Schwarz inequality,

    ∫_{‖ω‖>1} ‖ω‖^ν |f̂(ω)| dω ≤ (∫_{R^d} a²(ω) dω)^{1/2} (∫_{R^d} b²(ω) dω)^{1/2}.

The integral ∫_{R^d} a²(ω) dω = ∫_{R^d} (1 + ‖ω‖^{2s})^{−1} dω is finite for 2s > d, which is satisfied for all d ≥ 1, as s = d/2 + 1. By Parseval's identity [57, p. 172], since f has square-integrable νth and (ν + s)th partial derivatives, the integral ∫_{R^d} b²(ω) dω = ∫_{R^d} ‖ω‖^{2ν} |f̂(ω)|² (1 + ‖ω‖^{2s}) dω = ∫_{R^d} |f̂(ω)|² (‖ω‖^{2ν} + ‖ω‖^{2(ν+s)}) dω is finite.
Hence, ∫_{R^d} M(ω)^ν |f̂(ω)| dω is finite, so f ∈ Γ^ν(R^d), and, by the argument above, there exists C₂ > 0 such that B_ρ(‖·‖_{W^{ν+s}_2}) ⊂ B_{C₂ρ}(‖·‖_{Γ^ν}).

Taking ν = q + s + 1 as required in (41) and C = C₁ · C₂, we conclude that, for every f ∈ B_ρ(‖·‖_{W^{q+2s+1}_2}) and every positive integer n, there exists f_n ∈ R(ψ, n) such that max_{0≤|r|≤q} sup_{x ∈ X} |D^r f(x) − D^r f_n(x)| ≤ C ρ/√n.

(ii) Follows by [40, Theorem 2.1] and the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], which allows one to use "sup" in (20) instead of "ess sup".

(iii) Follows by [58, Corollary 5.2].

³ Note that [55, Corollary 3.2] uses "ess sup" instead of "sup" in (41). However, by the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], one can replace "ess sup" with "sup".

⁴ Unfortunately, [55, Corollary 3.2] provides neither a closed-form expression of C₁ nor an upper bound on it. For results similar to [55, Corollary 3.2] and for specific choices of ψ, [55] gives upper bounds on similar constants (see, e.g., [55, Theorem 2.3 and Corollary 3.3]).

Proof of Proposition 4.2 (i) We detail the proof for t = N − 1 and t = N − 2; the other cases follow by backward induction.

Let us start with t = N − 1 and J̃^o_N = J^o_N. By Proposition 3.1(ii), there exists J̄^{o,2}_{N−1} ∈ W^{2+(2s+1)N}_2(R^d) such that T_{N−1} J̃^o_N = T_{N−1} J^o_N = J^o_{N−1} = J̄^{o,2}_{N−1}|_{X_{N−1}}. By Proposition 4.1(i) with q = 2 + (2s+1)(N−1) applied to J̄^{o,2}_{N−1}, we obtain (22) for t = N − 1. Set J̃^o_{N−1} = f_{N−1} in (22). By (22) and condition (10), there exists a positive integer n̄_{N−1} such that J̃^o_{N−1} is concave for n_{N−1} ≥ n̄_{N−1}.

Now consider t = N − 2. By Proposition 3.2(i), it follows that there exists Ĵ^{o,2}_{N−2} ∈ W^{2+(2s+1)(N−1)}_2(R^d) such that T_{N−2} J̃^o_{N−1} = Ĵ^{o,2}_{N−2}|_{X_{N−2}}. By applying to Ĵ^{o,2}_{N−2} Proposition 4.1(i) with q = 2 + (2s+1)(N−2), for every positive integer n_{N−2} we conclude that there exists f_{N−2} ∈ R(ψ_t, n_{N−2}) such that

    max_{0≤|r|≤2+(2s+1)(N−2)} sup_{x_{N−2} ∈ X_{N−2}} |D^r (T_{N−2} J̃^o_{N−1})(x_{N−2}) − D^r f_{N−2}(x_{N−2})| ≤ C̄_{N−2} ‖Ĵ^{o,2}_{N−2}‖_{W^{2+(2s+1)(N−1)}_2(R^d)} / √n_{N−2},   (42)

where, by Proposition 3.2(i), Ĵ^{o,2}_{N−2} ∈ W^{2+(2s+1)(N−1)}_2(R^d) is a suitable extension of T_{N−2} J̃^o_{N−1} to R^d, and C̄_{N−2} > 0 does not depend on the approximations generated in the previous iterations. The statement for t = N − 2 follows by the fact that the dependence of the bound (42) on ‖Ĵ^{o,2}_{N−2}‖_{W^{2+(2s+1)(N−1)}_2(R^d)} can be removed by exploiting Proposition 3.2(ii); in particular, we can choose C_{N−2} > 0 independently of n_{N−1}. So, we get (22) for t = N − 2. Set J̃^o_{N−2} = f_{N−2} in (22). By (22) and condition (10), there exists a positive integer n̄_{N−2} such that J̃^o_{N−2} is concave for n_{N−2} ≥ n̄_{N−2}. The proof proceeds similarly for the other values of t; each constant C_t can be chosen independently of n_{t+1}, . . . , n_{N−1}.

(ii) Follows by Proposition 3.1(ii) (with p = +∞) and Proposition 4.1(ii).

(iii) Follows by Proposition 3.1(iii) (with p = 1) and Proposition 4.1(iii).

Proof of Proposition 5.1 We first derive some constraints on the form of the sets A_{t,j}, and then show that the budget constraints (25) are satisfied if and only if the sets A_{t,j} are chosen as in Assumption 5.1 (or are suitable subsets). As the labor incomes y_{t,j} and the interest rates r_{t,j} are known, for t = 1, . . . , N, we have
    a_{t,j} ≤ a_{0,j} Π_{k=0}^{t−1} (1 + r_{k,j}) + Σ_{i=0}^{t−1} y_{i,j} Π_{k=i}^{t−1} (1 + r_{k,j}) =: a_{t,j}^max

(the upper bound is achieved when all the consumptions c_{t,j} are equal to 0), so the corresponding feasible sets A_{t,j} are bounded from above by a_{t,j}^max. The boundedness from below of each A_{t,j} follows from the budget constraints (25), which for c_{k,j} = 0 (k = t, . . . , N) are equivalent, for t = N, to

    a_{N,j} ≥ −y_{N,j}   (43)

and, for t = 0, . . . , N − 1, to a_{t,j} Π_{k=t}^{N−1} (1 + r_{k,j}) + Σ_{i=t}^{N−1} y_{i,j} Π_{k=i}^{N−1} (1 + r_{k,j}) + y_{N,j} ≥ 0, i.e.,

    a_{t,j} ≥ − (Σ_{i=t}^{N−1} y_{i,j} Π_{k=i}^{N−1} (1 + r_{k,j}) + y_{N,j}) / Π_{k=t}^{N−1} (1 + r_{k,j}).   (44)

So, in order to satisfy the budget constraints (25), the constraints (43) and (44) have to be satisfied. Then the maximal sets A_t that satisfy the budget constraints (25) have the form described in Assumption 5.1.

Proof of Proposition 5.2 (a) About Assumption 3.1(i). By construction, the sets Ā_t are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the D̄_t since, by (31), they are the intersections between Ā_t × Ā_{t+1} and the sets D_t, which are compact, convex, and have nonempty interiors too.

(b) About Assumption 3.1(ii). This is Assumption 5.2(i), with the obvious replacements of X_t and D_t.

(c) About Assumption 3.1(iii). Recall that for Problem OC^d_N and t = 0, . . . , N − 1, we have

    h_t(a_t, a_{t+1}) = u(((1 + r_t) ∘ (a_t + y_t) − a_{t+1}) / (1 + r_t)) + Σ_{j=1}^d v_{t,j}(a_{t,j}).

Then, h_t ∈ C^m(D̄_t) by Assumption 5.2(ii) and (iii). As u(·) and the v_{t,j}(·) are twice continuously differentiable, the second part of Assumption 3.1(iii) means that there exists some α_t > 0 such that the function

    u(((1 + r_t) ∘ (a_t + y_t) − a_{t+1}) / (1 + r_t)) + Σ_{j=1}^d v_{t,j}(a_{t,j}) + (α_t/2) ‖a_t‖²

has negative-semidefinite Hessian with respect to the variables a_t and a_{t+1}. Assumption 5.2(ii) and easy computations show that the function u(((1 + r_t) ∘ (a_t + y_t) − a_{t+1}) / (1 + r_t)) has negative-semidefinite Hessian. By Assumption 5.2(iii), for each j = 1, . . . , d and α_{t,j} ∈ (0, β_{t,j}], the function v_{t,j}(a_{t,j}) + (α_{t,j}/2) a_{t,j}² has negative-semidefinite Hessian too. So, Assumption 3.1(iii) is satisfied for every α_t ∈ (0, min_{j=1,...,d} {β_{t,j}}].

(d) About Assumption 3.1(iv). Recall that for Problem OC^d_N we have h_N(a_N) = u(a_N + y_N). Then, h_N ∈ C^m(Ā_N) and is concave by Assumption 5.2(ii).

References

1. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
2. Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
3. Powell, W.B.: Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, Hoboken (2007)
4. Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)
5. Zoppoli, R., Parisini, T., Sanguineti, M., Gnecco, G.: Neural Approximations for Optimal Control and Decision. Springer, London (2012, in preparation)
6. Haykin, S.: Neural Networks: a Comprehensive Foundation. Prentice Hall, New York (1998)
7. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont (2005)
8. Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)
9. Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation: a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)
10. Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)
11. Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)
12. Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)
13. Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)
14. Philbrick, C.R. Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)
15. Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)
16. Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)
17. Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)
18. Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)
19. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)
20. Zoppoli, R., Sanguineti, M., Parisini, T.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)
21. Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)
22. Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)
23. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 2. Athena Scientific, Belmont (2007)
24. White, D.J.: Markov Decision Processes. Wiley, New York (1993)
25. Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978)
26. Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)
27. Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)
28. Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)
29. Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)
30. Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)
31. Nawijn, W.M.: Look-ahead policies for admission to a single-server loss system. Oper. Res. 38, 854–862 (1990)
32. Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)
33. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1993)
34. Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)
35. Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)
36. Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)
37. Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993)
38. Gnecco, G., Kůrková, V., Sanguineti, M.: Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw. 24, 171–182 (2011)
39. Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
40. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)
41. Kainen, P.C., Kůrková, V., Sanguineti, M.: Complexity of Gaussian radial-basis networks approximating smooth functions. J. Complex. 25, 63–74 (2009)
42. Alessandri, A., Gnecco, G., Sanguineti, M.: Minimizing sequences for a family of functional optimal estimation problems. J. Optim. Theory Appl. 147, 243–262 (2010)
43. Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)
44. Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)
45. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)
46. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)
47. Sobol', I.: The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7, 784–802 (1967)
48. Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)
49. Boldrin, M., Montrucchio, L.: On the indeterminacy of capital accumulation paths. J. Econ. Theory 40, 26–39 (1986)
50. Dawid, H., Kopel, M., Feichtinger, G.: Complex solutions of nonconcave dynamic optimization models. Econ. Theory 9, 427–439 (1997)
51. Chambers, J., Cleveland, W.: Graphical Methods for Data Analysis. Wadsworth/Cole Publishing Company, Pacific Grove (1983)
52. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
53. Zhang, F. (ed.): The Schur Complement and Its Applications. Springer, New York (2005)
54. Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford Science Publications, Oxford (2004)
55. Hornik, K., Stinchcombe, M., White, H., Auer, P.: Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 6, 1262–1275 (1994)
56. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, San Diego (2003)
57. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973)
58. Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher's complexity. Appl. Math. Sci. 2, 153–176 (2008)