Pessimistic Optimal Choice for Risk-averse Agents: The Continuous-time Limit∗ Paolo Vitale University of Pescara† July 2014 ∗ I wish to thank Fausto di Biase and seminar participants at LUISS University, the University of Bologna, the University of Tor Vergata, the EIEF Center in Rome, and the 2014 John Cabot University Economic Theory Workshop. † Department of Economics, Università Gabriele d’Annunzio, Viale Pindaro 42, 65127 Pescara (Italy); telephone: ++39-085-453-7647; fax: ++39-085-453-7565; webpage: http:/www.unich.it/~vitale; e-mail: [email protected] Pessimistic Optimal Choice for Risk-averse Agents: The Continuous-time Limit ABSTRACT We extend Hansen and Sargent’s (Hansen and Sargent, 1994, 1995, 2013) analysis of dynamic optimization with risk-averse agents in two directions. Firstly, following Whittle (Whittle, 1990), we show that the optimal risk-averse policy is identified via a pessimistic choice mechanism and described by simple recursive formulae. Secondly, we investigate the continuous-time limit and show that sufficient conditions for the existence of optimal solutions coincide with those which apply under risk-neutrality. Our analysis is conducted both under perfect and imperfect state observation. As an illustrative example, we analyze the optimal production policy of an entrepreneur running a monopolistic firm which faces a demand schedule subject to stochastic shocks, showing that risk-aversion induces her to act more aggressively. JEL Classification Numbers: C61. Keywords: Pessimistic Agents, Time-discounting, Linear Exponential Quadratic Gaussian. Pessimism: the tendency to expect the worst in all things Collins New Dictionary Introduction Risk-aversion is an important aspect of agents’ preferences, which heavily influences their actions, in particular when the economic environment is complex and uncertain, and agents need to consider the future implications of their decisions. Investigating the nexus between risk-aversion and agents’ behavior is a challenge, in that optimal control problems are typically difficult to solve under risk-aversion. Researchers have usually relied on linear-quadratic formulations as they can be easily solved employing well-established results. Thus, within a linear-quadratic set-up straightforward recursive formulae immediately yield the optimal policy, while according to the certainty equivalence principle unknown variables can be replaced by their maximum likelihood estimates. Linear-quadratic formulations are however problematic, as the convexity of the quadratic cost function represents risk-aversion in an unsatisfactory manner. In fact, the prescribed optimal policy does not change when the environmental uncertainty varies, while risk-averse agents ought to care for the degree of uncertainty they face. In addition, in many economic problems, such as the production problem we investigate in Section 2, a quadratic objective function corresponds to the assumption of risk-neutrality on the part of the optimizing agent. To correct for these shortcomings, Whittle (1990) introduces a risk-adjustment in the objective function of the standard linear-quadratic set-up moving from a linear-quadratic to a linear-exponential-quadratic formulation. In doing so he augments the convexity of the objective function and the degree of risk-aversion of the optimizing agent. Within his formulation, he shows that: i) the optimal policy is identified via a pessimistic (or worst-case) choice mechanism; ii) risk-sensitive (or modified) certainty equivalence and separation principles hold; and iii) recursive formulae describe the optimal policy. Despite its versatility, few researchers have employed Whittle’s methodology in economics (exceptions are Mamaysky and Spiegel, 2002; van der Ploeg, 2009, 2010; Vitale, 1995, 2012; Zhang, 2004). This is because an important limitation of his methodology is that it does not allow to consider time-discounting in a satisfactory way. 1 Hansen and Sargent (1994, 1995, 2013) have introduced time-discounting into risk-sensitive optimal control problems by formulating a recursive optimization criterion à la Epstein and Zin. We extend their contribution in two ways. Firstly, we manipulate their optimization criterion in order to exploit a number of results developed by Whittle. Secondly, we consider the continuous-time limit of their discrete-time formulation. This allows to establish that sufficient conditions for the existence of optimal solutions coincide with those which apply under riskneutrality. Indeed, an important feature of Hansen and Sargent’s linear-exponential-quadratic formulation in discrete-time is that for a sufficiently high degree of risk-aversion no optimal policy exists. On the contrary, in the continuous time limit an optimal solution always exists if some regularity conditions are met, irrespective of the degree of risk-aversion. This paper is organized as follows. In Section 1 we reformulate Hansen and Sargent’s recursive optimization criterion for the class of Markovian discounted linear exponential quadratic Gaussian (DLEQG) problems so as to apply, with simple adjustments, results originally derived by Whittle for the class of linear exponential quadratic Gaussian (LEQG) problems. In this Section we see how Whittle’s risk-sensitive certainty equivalence principle (risk-sensitive CEP) is maintained while the corresponding Riccati equation, which yields the optimal policy for the class of Markovian LEQG problems, is modified. In Section 2 we investigate the continuous-time limit of the DLEQG problem, showing that in the limit the optimal policy is characterized by a modified version of the differential Riccati equation which applies to the corresponding LEQG formulation. We are then able to show that in the continuous-time limit if the quadratic cost function in the DLEQG formulation is positive definite an optimal solution always exists irrespective of the degree of risk-aversion of the optimizing agent. In this Section we also discuss, as an illustrative example of the Markovian DLEQG problem in continuous-time, the optimal production policy of a risk-averse entrepreneur which runs a monopolist firm facing a demand schedule subject to stochastic shocks for the commodity it produces. Interestingly, risk-aversion makes the entrepreneur more aggressive. Given her preferences, the entrepreneur finds it optimal to systematically produce a larger quantity than that selected by her risk-neutral counterpart. This is because, despite a larger supply of the commodity depresses its price and jeopardizes future profits, it also reduces their variability. Consequently, a risk-averse entrepreneur is willing to gain smaller profits (in order to reduce their variability) and to produce a larger quantity of the commodity than a risk-neutral counterpart. 2 This may appear counter-intuitive as it contradicts results typically obtained within static formulations. However, such a feature of the impact of risk-aversion on the behavior of economic agents in dynamic optimization exercises appears elsewhere. For instance, Holden and Subrahmanyam (1994) and Vitale (1995, 2012) find that risk-aversion makes a privately informed strategic agent trade more aggressively in a sequential call auction market, while the same trader would be more cautions in a one-shot call auction market. In other words, a point which is worth emphasizing is that the Markovian DLEQG formulation allows to derive implications of risk-aversion which are both general and stark. Indeed, not only risk-averse agents are pessimistic, but also bold. In Section 3 we investigate the class of DLEQG problems under imperfect state observation. Here, Whittle’s risk-sensitive separation principle (risk-sensitive SP), which allows to separate control and estimation, is reformulated for the class of DLEQG problems and it is applied to the example discussed in Section 2. In Section 4 we extend our analysis to the formulation in which the state vector is subject to pre-determined disturbances. A final Section offers some concluding remarks. 1 Discounted LEQG Problems We define a specific class of optimal control problems, which are characterized by: i) a Markovian linear dynamic structure for a vector of state variables, zn ; ii) a multi-normal distribution for an innovation vector, n ; and iii) a recursive optimization criterion à la Epstein and Zin. Suppose the interval of time [0, T ] is divided in N sub-periods (n = 1, 2, . . . , N ) of length ∆ = T /N . For ∆ finite (e.g. ∆ = 1) we will have a discrete-time formulation, while for ∆ ↓ 0 we converge to the continuous-time analogous. We then introduce the following Definition: Definition 1 An optimal control problem is said to be Markovian linear exponential quadratic Gaussian with time-discounting if the following recursive optimization Vn = min un ρ i 2 h ln En exp δ ∆ V n+1 , ∆cn + ρ 2 (1.1) where ρ (with ρ > 0) is the risk-enhancement coefficient, δ (with 0 < δ < 1) is the timediscounting factor, ∆cn is the (per-period) scalar-valued cost function and V n is the value function (with terminal condition V N +1 = 0), is solved in periods n = 1, 2, . . . , N with respect to the free- 3 valued control vector un under the conditions that: (i) the cost function, cn , is a quadratic form in the control vector, un , and the state vector, zn , cn = u0n Q un + z0n R zn + 2 u0n S zn , (ii) the state vector, zn , is governed by the following linear plant equation zn = (I + A∆) zn−1 + B∆ un−1 + n , where n ∼ N [0, ∆N] and n ⊥ n0 . The parameter ρ represents a risk-enhancement coefficient, in that it introduces extra convexity vis-a-vis that of the objective function of the Markovian linear quadratic Gaussian problem with time-discounting (DLQG problem henceforth). It can be shown that the convexity of ∆cn + ρ/2 ln En exp δ ∆ ρ2 V n+1 is increasing in ρ, while limρ↓0 ρ2 ln En exp δ ∆ ρ2 V n+1 = δ ∆ En [V n+1 ]. This indicates that in the limit, for ρ ↓ 0, the solution of the recursive optimization in Definition 1 converges to that of the standard dynamic programming recursion of a discounted Markovian optimal control problem, i.e. V n = minun {∆cn + δ ∆ En [V n+1 ]}.1 By increasing the degree of risk-aversion of the optimizing agent, the DLEQG problem is better suited to capture the impact of risk-aversion on agents’ decisions than the standard DLQG formulation. In particular, the optimal policy is no longer independent of the agents’ degree of uncertainty (i.e. differently from what happens in the DLQG formulation, the optimal policy will depend on the covariance matrix of the innovation vector, ∆N). Furthermore, it allows to introduce risk-aversion in those DLQG problems, such as the one discussed in Section 2, where agents are risk-neutral. Our formulation of the optimization criterion is an extension of that proposed by Hansen and Sargent (1994, 1995, 2013) (it collapses to their formulation for ∆ = 1 and ρ0 = δρ), which preserves the properties of monotonicity and convexity of their criterion and allows to accommodate a continuous-time limit and to exploit a number of useful results derived by Whittle (1990) for the class of LEQG problems. The optimization criterion (1.1) is not a special characterization of preferences, as it can be employed in the analysis of several economic issues (Hansen and Sargent, 2013; Hansen, Sargent, and Tallarini, 1999; Luo, 2004; Luo and Young, 2010). Moreover, it corresponds to the recursive preferences of Epstein and Zin (1991) when the elasticity of inter-temporal sub1 These and other results are proved in a separate Appendix. 4 stitution is 1, ρ is the coefficient of relative risk-aversion and log consumption is approximated by a quadratic form of the state and control vectors (Tallarini, 2000). In Whittle’s formulation of the LEQG problem there is no time-discounting, in that in any period n the aggregate cost function ln En exp ρ C2 with C a time-separable function, PN C = n=1 ∆cn , is minimized with respect to the control vector un . To accommodate time PN ∆ n ∆c . discounting one could consider the modified time-separable function C = n n=1 δ This would be unsatisfactory because: i) the impact of risk-aversion would dissipate over-time (Bouakiz and Sobel, 1984); and in the continuous-time limit the optimal control rule would coincide with the risk-neutral counter-part. Therefore, as suggested by Hansen and Sargent, a recursive optimization criterion such as (1.1) is called for. However, whereas our optimization criterion differs from Whittle’s, we are still able to preserve most of his results with some minor adjustments.2 To show this we notice that the following holds: Lemma 1 The recursive optimization criterion (1.1) can be equivalently formulated as follows Vn = n h ρ io 2 ∆ ln min En exp (∆cn + δ V n+1 ) . un ρ 2 (1.2) Borrowing Whittle’s intuition we introduce the concept of (discounted) stress: Definition 2 In n the (discounted) stress is S n ≡ ∆cn − 1 ρ dn+1 + δ ∆ V n+1 , where dn is a per- period discrepancy function equal to 0n (∆N)−1 n for n = 1, 2, . . . , N and 0 for n = N + 1. This concept is similar to that originally introduced by Whittle. Our notion differs in two respects: firstly, it covers only periods n and n + 1; secondly, the value function in period n + 1, V n+1 , is pre-multiplied by the per-period discount factor δ ∆ . The (discounted) stress is useful in that we can rely on the following Lemma, which adapts a result firstly outlined by Whittle for the class of LEQG problems: Lemma 2 In a Markovian DLEQG problem if the value function in n + 1, V n+1 , is a quadratic form in the state vector zn+1 and the (discounted) stress S n satisfies a saddle point condition with 2 For δ ↑ 1 V n does not converge to Whittle’s aggregate cost function, since in the optimization criterion (1.1) the minimization argument does not contain the past cost components, cm with m < n. However, for δ ↑ 1 the optimal policy for the optimization criterion (1.1) converges to that for Whittle’s Markovian LEQG problem, in that in period n cm , with m < n, is deterministic and constant with respect to un . 5 respect to n+1 and un , so that minun maxn+1 S n exists, the following proportionality holds min En un h i ρ exp (∆cn + δ V n+1 ) ∝ exp min max S n , 2 2 un n+1 ρ ∆ where the proportionality constant is independent of the state vector zn , while the value function V n is a quadratic form in zn equal to the extremized stress, minun maxn+1 S n , plus a constant independent of zn . To prove this Lemma we first need to apply the following result: Lemma 3 If Q(u, ) is a quadratic form in the vectors u and which admits the saddle point maxu min Q(u, ), the following proportionality holds Z min u 1 exp − Q(u, ) 2 1 d ∝ exp − max min Q(u, ) 2 u . It is worth noticing that the order of the extremizing operations is irrelevant in the determination of the saddle point if the two conditions Q > 0 and Qu u − Qu Q−1 Q u < 0 hold.3 More importantly, analogous results apply when Q is a non-homogeneous quadratic form, which depends on u and alongside a third vector z, insofar it admits a saddle point maxu min Q(u, , z). We are now ready to prove Lemma 2. Proof of Lemma 2. Consider that if V n+1 is a quadratic form in zn+1 , as the latter is linearly dependent on n+1 , h ρ i min En exp (∆cn + δ ∆ V n+1 ) un 2 Z ∝ = ρ 1 min exp (∆cn + δ ∆ V n+1 ) − 0n+1 (∆N)−1 n+1 un 2 2 Z Sn dn+1 . min exp ρ un 2 Since V n+1 is as a quadratic form in n+1 , un and zn , so is S n . If the stress in n admits the saddle point minun maxn+1 S n , then −S n admits the saddle point maxun minn+1 {−S n }. We can apply Lemma 3 for Q(un , n+1 ) = −ρS n , so that Z min un 3 Sn exp ρ dn+1 2 ∝ exp 1 − max min (−ρS n ) 2 un n+1 = exp ρ min max S n 2 un n+1 The quadratic form is extremized when it is maximized with respect to and minimized with respect to u. 6 , dn+1 where the constant of proportionality depends on the covariance matrix of the vector n+1 and it is independent of both un and zn . As S n is a quadratic form in n+1 , un and zn , from the saddle point condition we conclude that the extremized stress, minun maxn+1 S n , is a quadratic form in zn , while V n in equation (1.2) is equal to the extremized stress plus a constant independent of zn . The requirement that the stress satisfies a saddle point condition may actually not hold. As it will be clearer later, for a sufficiently large degree of risk-aversion (i.e. for ρ large enough) the stress in period n will not be negative definite in n+1 indicating that the saddle point condition cannot be met and that the DLEQG problem does not present an optimizing solution, in that the value of V n becomes infinite. In other words, while suggesting a way to solving Markovian DLEQG problems, this Lemma also establishes that under extreme circumstances such problems are not well-behaved and their optimization is meaningless. However, we will see that such a problem vanishes in the continuous-time limit, as for ∆ ↓ 0 a saddle point always exists under the same sufficient conditions which apply to the risk-neutral formulation (specifically that Q being positive definite). As corollary of Lemma 2 we establish a simplified version of Whittle’s risk-sensitive certainty equivalence principle (RSCEP) which is useful in addressing Markovian DLEQG problems. Theorem 1 - (Risk-sensitive Certainty Equivalence Principle). In a Markovian DLEQG problem, if the saddle point condition for the stress is respected in all future periods, i.e. minun+j maxn+j+1 S n+j exists for j = 0, 1, . . . , N − n, the optimal value of the vector un is determined in period n by simultaneously minimizing S n with respect to un and maximizing it with respect to n+1 . The value function is then proportional to the extremized stress, V n ∝ minun maxn+1 S n . Proof. Notice that in N cN is a positive definite quadratic form in uN , while V N +1 = dN +1 = 0. This implies that S N is a quadratic form in uN and N +1 and hence that the conditions to apply Lemma 2 are met, so that the saddle point condition for S N yields the optimal control uN , with the extremized stress, minuN maxN +1 S N , and the value function, V N , both quadratic forms in zN . By backward induction the statement is established. Theorem 1 is particularly useful in that it suggests that, when the stress is well-behaved and hence the DLEQG problem admits a meaningful solution, to pin down the optimal policy it is sufficient to impose recursively the saddle point condition for S n . The recursion starts in period N and proceeds backward. Therefore, if the saddle point condition is met in periods N , N − 1, . . ., n + 1, in n the optimal policy is derived by first maximizing S n with respect to 7 the innovation vector n+1 and then by minimizing the resulting expression with respect to the control vector un . An economic interpretation of Theorem 1 is that a risk-averse agent whose preferences are represented by the optimization criterion (1.1) attempts to hedge against the worst possible values for the vector n+1 , by following a min-max strategy according to which she selects un to minimize her welfare loss (i.e. the stress) against the most unfavorable innovation vector n+1 . Such an agent acts as if she were pessimistic, considering these worst-case realizations very likely. Consequently she tunes her actions on their impact on her welfare, applying what we term, borrowing Whittle’s terminology, a pessimistic choice mechanism. Theorem 1 revises the certainty equivalence principle of the Markovian DLQG problem: the normally distributed unobservable variables are no longer replaced by their maximum likelihood (ML) estimates, but by those that maximize the stress in order to compensate for risk-aversion. In other words, in the Markovian DLQG problem the separation principle between optimization over the control vector and estimation of the unknown values applies, in that the control vector is chosen as it would be in the perfect information case, with the unobservable values replaced by their ML estimates. On the contrary, in the DLEQG problem the derivation of the optimal control and the optimal estimation of the unknown values are intertwined, as the optimal control and optimal estimates are chosen in order to extremize the stress. Indeed, differently from the Markovian DLQG problem, uncertainty over the innovation vector n+1 conditions the optimal choice of the control vector, un . Specifically, the statistical characteristics of n+1 , and hence its covariance matrix ∆N, influence the optimal value of the vector un . Viceversa, the cost function and the degree of risk-aversion affect the optimal estimate of n+1 , which no longer corresponds to the ML estimate. Theorem 1 indicates that because of the recursive structure of the Markovian DLEQG problem the extremization of the stress can be undertaken recursively. In particular, starting from N one proceeds backward imposing the saddle point conditions for the stress in periods N , N − 1, . . ., 1. In this respect, we adapt to the Markovian DLEQG problem a result originally derived by Whittle. Lemma 4 The saddle point conditions for the stress S n , with n = 1, 2, . . . , N , can be satisfied by solving the following (discounted) future stress backward recursion Fn (zn ) = min un max n+1 ∆cn − 8 1 dn+1 + δ ∆ Fn+1 (zn+1 ) , ρ (1.3) where Fn (zn ), denoted as the extremized (discounted) future stress, is a quadratic form in the state vector zn , Fn (zn ) ≡ z0n Πn zn with ΠN +1 ≡ 0. The value function in n is V n = νn + Fn (zn ), where νn is independent of zn . Proof. Let us start from N . By definition V N +1 = 0 and dN +1 = 0. Given that cN is a quadratic form in uN and zN , we immediately see that: i) imposing the saddle point condition for the stress in N , S N , is equivalent to solving recursion (1.3); and ii) there exist a matrix ΠN and a constant νN independent of zN such that the extremized future stress is FN (zN ) ≡ z0N ΠN zN and that exp(ρV N /2) = exp( 12 ρ[νN + FN (zN )]). Proceeding backward, the optimal control vector in period N − 1 is obtained by imposing the following saddle point condition min max S N −1 = min uN −1 N uN −1 1 ∆ . max ∆cN −1 − dN + δ V N N ρ Since V N = νN + FN (zN ) and νN is independent of zN , this is equivalent to imposing the saddle point condition 1 ∆ . min max ∆cN −1 − dN + δ FN (zN ) uN −1 N ρ Given that cN −1 is a quadratic function in uN −1 and zN −1 , dN is a quadratic form in N and FN (zN ) is a quadratic form in zN while this is linear in uN −1 , zN −1 and N , we find that the result of this extremization is given by a quadratic form of zN −1 , so that there exists a matrix ΠN −1 such that FN −1 (zN −1 ) = z0N −1 ΠN zN −1 and exp(ρV N −1 /2) = exp( 21 ρ[νN −1 + FN −1 (zN −1 )]). Since the same argument applies in any other period n as long as Fn+1 is a quadratic form in zn+1 , by backward induction the statement is proved. In conclusion, a recursion similar to the Bellman equation for the value function of dynamic programming is established: given the extremized future stress in period n + 1, the optimal policy in n is obtained by solving the future stress recursion.4 Similarly to the Markovian DLQG problem, the extremized future stress is a quadratic form in the state vector, Fn (zn ) = z0n Πn zn , while it can be established that the optimal policy is linear in the state vector, un = Kn zn , where Πn and Kn respect recursions which correspond to modified versions of the Riccati recursions for the Markovian DLQG problem. Such a recursion is analogous to that provided by Hansen and Sargent (1994, 1995, 2013).5 4 For δ ↑ 1 the future stress recursion (1.3) converges to the recursion originally derived by Whittle for the Markovian LEQG problem. This confirms that the optimal policy for our optimization criterion converges to that for Whittle’s Markovian LEQG problem for δ ↑ 1. In other words, the DLEQG problem encompasses the LEQG one. 5 Interestingly, they also propose a similar result when they find that solutions to their Markovian DLEQG prob- 9 Applying Lemma 4 we can establish the following Theorem, which extends Whittle’s risksensitive Riccati equation to the class of Markovian DLEQG problems and reveals the nexus with the common solutions to the standard Markovian DLQG problem: Theorem 2 - (Risk-sensitive Riccati Equation). If the matrix (δ ∆ Πn+1 )−1 − ρ∆N is positive definite, in period n the extremized (discounted) future stress exists and it is given by Fn (zn ) = z0n Πn zn for the optimal control un = Kn zn , where (1.4) e 0Π e n+1 A e − (S0 + A e 0Π e n+1 B)(Q + ∆B0 Π e n+1 B)−1 (∆S + ∆B0 Π e n+1 A) e (1.5) Πn = R∆ + A , Kn = e n+1 B)−1 (S + B0 Π e n+1 A) e , − (Q + ∆B0 Π (1.6) e n+1 = ((δ ∆ Πn+1 )−1 − ρ ∆N)−1 Π (1.7) e = (I + A∆) . and A Proof. In the Markovian DLEQG problem the extremized future stress Fn (zn ) respects the double recursion Fn = L L̃ Fn+1 , based on the following two operators e + Bu)] and L̃ φ(z) = max [φ(z + ) − L φ(z) = min [∆c(z, u) + φ(Az u where φ(z) = δ ∆ z0 Πz, so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) − 1 0 (∆N)−1 ] , ρ 1 0 −1 ρ (∆N) ]. Taking first derivatives, we find that ˜ = − (δ ∆ Π − 1 −1 ∆N−1 )−1 δΠ z = − Π̆ δΠ z , ρ which pins down a maximum if Π̆ is negative definite, or equivalently if (δ ∆ Π)−1 −ρ∆N is positive definite. Replacing this expression we conclude that L̃ φ(z) = z0 ((δΠ)−1 − ρ∆N)−1 z = e z. For L̃ φ(z) = z0 Π e z, solution of the operator L yields the standard recursive formulae z0 Π e = ((δ ∆ Π)−1 − ρ∆N)−1 replaces for Π and K from the Markovian LQG problem where Π Π. Applying the two operators in n we obtain the recursive formulae for Πn and Kn presented in the statement. Importantly, as the cost function cn is positive definite in un and zn , e n+1 B is positive definite, so that the second order condition for minimization in Q + ∆B0 Π the operator L holds and Πn is positive semi-definite. It is worth noticing that with respect to the standard Riccati equation which applies to lems are obtained using the same Bellman equation which yields the optimal policy in their robust control problem (Hansen and Sargent, 2008). 10 the Markovian DLQG problem, the matrix δ ∆ Πn+1 is now replaced by the modified matrix e n+1 .6 This shows that in the DLEQG formulation the optimal policy retains a specification Π which is similar to the one that would prevail in the DLQG one. Indeed, as for ρ = 0 we obtain the standard Riccati equation of the DLQG problem, it is confirmed that the DLEQG problem encompasses the DLQG one. For ρ > 0 a straightforward correction for the impact of uncertainty and risk-aversion must however be inserted in the expressions for the recursions of Πn and Kn , as the optimal policy depends on the agent’s risk-enhancement coefficient, ρ, and her uncertainty over future shocks, ∆N. The requirement that the matrix (δ ∆ Πn+1 )−1 − ρ∆N being positive definite derives from a second order condition which must hold for the stress S n to satisfy the saddle point condition imposed by Theorem 1. As noted by Whittle, whenever the cost function cn is non-negative such condition fails for ρ large enough, indicating that the value function V n is infinite and confirming our earlier claim that for a sufficiently large degree of risk-aversion the DLEQG problem is not well-behaved and does not admit an optimizing solution. An economic interpretation of the failure of the optimization problem is that in these extreme circumstances the optimizing agent becomes so pessimistic as to consider her control ineffective and hence useless.7 Because of time-discounting it is possible to consider the limit case for T ↑ ∞, that is a DLEQG problem with infinite horizon. As indicated by Hansen and Sargent (1994) there is no certainty that for T ↑ ∞ the value function V n is finite and as a consequence the DLEQG problem may be not well-defined. However, when a minimum is reached we can identify a stationary solution, in that in the limit Πn → Π and Kn → K, where the limit matrices are determined by the fixed point in the risk-sensitive Riccati equation, Π = e 0Π eA e − (S0 + A e 0 ΠB)(Q e e −1 (∆S + ∆B0 Π e A) e , R∆ + A + ∆B0 ΠB) (1.8) e ≡ ((δ ∆ Π)−1 − ρ∆N)−1 . with Π (1.9) 6 With respect to the original result by Whittle for the LEQG problem two adjustments are required as consequence of the introduction of time-discounting: i) the second order condition for the extremization of the future stress function requires that (δ ∆ Πn+1 )−1 − ρ∆N, rather than Π−1 n+1 − ρ∆N, is positive definite; and ii) in the e n+1 = ((δ ∆ Πn+1 )−1 − ρ∆N)−1 replaces Π e n+1 = (Π−1 − ρ∆N)−1 . modified Riccati equation Π n+1 7 The matrix Πn+1 may be non-invertible (this being necessarily the case for ΠN +1 = 0), so that Theorem 2 cannot be applied. The proof of such Theorem indicates how to amend its statement, in that for Πn+1 non-invertible, the second order condition for the stress function to satisfy the saddle point condition e n+1 = is that δ ∆ Πn+1 − (1/ρ)(∆N)−1 is negative definite, while the modified matrix now takes the form Π δ ∆ Πn+1 (I − δ ∆ ρ∆NΠn+1 )−1 . In N , since ΠN +1 = 0, we immediately see that the second order condition that e N +1 = 0. δ ∆ ΠN +1 − (1/ρ)(∆N)−1 being negative definite is trivially satisfied, while Π 11 2 The Continuous-time Limit Whereas for ∆ = 1 we have the standard discrete time formulation, for ∆ ↓ 0 the Markovian DLEQG problem of Section1 converges to its continuous-time analogue. Assuming that for ∆ ↓ 0, period n converges to time t (and final period N to terminal date T ), the following result holds: Theorem 3 In the continuous-time limit of the Markovian DLEQG problem, the optimal policy is u(t) = K(t) z(t) , with (2.1) K(t) = − Q−1 (S0 + B0 Π(t)) , (2.2) where the matrix Π(t) solves the following continuous-time risk-sensitive Riccati equation d Π(t) + R + A0 Π(t) + Π(t) A − (S0 + Π(t) B) Q−1 (S + B0 Π(t)) dt + ρ Π(t) N Π(t) + ln δ Π(t) under the boundary condition Π(T ) = 0 . = 0, (2.3) (2.4) Proof. From the proof of Theorem 2 we know that the extremized future stress is Fn (zn ) = z0n Πn zn for un = Kn zn , where 0e −1 Kn = − (Q + ∆ B Πn+1 B) Πn 0 e S + B Πn+1 (I + A ∆) , 0 0 0e e e e = Πn+1 + ∆ R + Kn Q Kn + 2S Kn + A Πn+1 + Πn+1 A + 2Πn+1 B Kn + o(∆2 ) . e n+1 (Π e n+1 = ((δ ∆ Πn+1 )−1 − ρ∆N)−1 ) and considering Given the expressions for Kn and Π e n+1 = lim∆↓0 Πn+1 and lim∆↓0 Πn = Π(t), for ∆ ↓ 0 the Riccati difference that lim∆↓0 Π equation for Πn converges to its continuous-time counterpart in (2.3), while the terminal condition ΠN +1 = 0 converges to the boundary condition Π(T ) = 0. The extremized future stress in the continuous-time limit is F (z(t), t) = z(t)0 Π(t)z(t), while the optimal policy is u(t) = K(t)z(t) with K(t) = −Q−1 (S0 + B0 Π(t)). It is worth noticing that in the continuous-time limit the solution to the Markovian DLEQG e n converges to Πn . Indeed, this problem is simpler, in that for ∆ ↓ 0 the modified matrix Π 12 is consequence of the fact that for ∆ ↓ 0 the stress S n always possesses a maximum in n+1 for n+1 = 0. This implies that the positive-definiteness condition for the extremization of e n+1 B0 ) converges the stress in Theorem 2 is redundant. Furthermore, in the limit (Q + ∆BΠ to Q, so that a sufficient condition for the existence of a solution to the Markovian DLEQG problem is that such matrix is positive definite. This means that as long as the cost function, cn , is positive definite in un , in the continuous-time limit an optimal policy for the Markovian DLEQG always exists, since the stress, S n , surely possesses a saddle point. In brief, conditions for DLEQG problems to admit solutions are less restrictive in continuos time than in discrete time and correspond to those which apply to DLQG problems. Finally, notice that with respect to the standard Riccati equation which applies to the LQG problem in continuous time, two extra terms appear in the DLEQG extension. The term ρΠ(t)NΠ(t) modifies the standard Riccati equation as in the Markovian LEQG problem discussed by Whittle and captures the impact in continuous time of risk-aversion on the dynamics of the optimal solution. Similarly, the extra term ln δ Π(t) captures the impact of time-discounting. Interestingly, the two terms enter separately into the modified Riccati equation indicating that in the continuous-time limit the impact of risk-enhancement and timediscounting is clearly disjointed. 2.1 Optimal Production Policy for a Risk-averse Entrepreneur To illustrate how the Markovian DLEQG formulation operates in continuous time we consider an entrepreneur running a monopolistic firm which produces a perishable commodity. The firm’s profits at time t are equal to p(t)x(t) − qx(t)2 , where p(t) is the commodity price and x(t) the quantity produced. Since q is a positive constant, production presents increasing marginal costs. We assume the demand schedule for this commodity is linear in its price and its price variation, xd (t) = ad − bd p(t) − cd d p(t) + d (t) , dt where all coefficients are positive. This means that the demand for the commodity is decreasing in both its price level and its price variation. Since the commodity is perishable, the entrepreneur chooses the quantity the firm produces, accepting any price which will clear the market. From the inverted demand function we obtain the following plant equation d p(t) dt = a p(t) + b x(t) + µ + (t) , 13 (2.5) where a = −bd /cd , b = −1/cd (with a and b negative), while µ = ad /cd and (t) = −d (t)/cd . Because the demand for the commodity is subject to stochastic shocks the entrepreneur faces an uncertain environment and will not be able to anticipate the profits her production decisions will bring about in the future. Assuming she is risk-averse, her optimal production problem can be casted within our Markovian DLEQG framework, for z(t) = p(t), u(t) = x(t) and c(t) = qx(t)2 − p(t)x(t). For simplicity we restrict our analysis to a homogeneous formulation with µ = 0, so that Q = q, R = 0, S = −1/2, A = a and B = b.8 From Theorem 3, we find that the optimal production policy respects the formulation x(t) = κ(t) p(t) , where 1 1 − b π(t) and π(t) = − κ(t) = q 2 (2.6) √ ζ1 b2 q − ρ σ2 e ζ1 ζ2 D(T −t) − 1 √ e D(T −t) − 1 ! , 2 with σ2 the variance of (t), T the terminal date, D = (2a + b/q + ln δ)2 − (1/q)( bq − ρσ2 ) √ and ζ1 , ζ2 = − 12 (2a + b/q + ln δ) ± 21 D. Inspection of this solution indicates that π(t) is never positive, as the monopolistic firm would stop producing if the entrepreneur did not earn any profits. In addition, since for t ↑ T π(t) ↑ 0 and κ(t) ↑ 1/(2q), as the entrepreneur approaches the final horizon of her optimization problem, her optimal policy converges to the static solution, in which the entrepreneur chooses her production to maximizes her expected per-period 2 profits. On the contrary, for t ↓ −∞, π(t) ↓ −ζ2 /( bq − ρσ2 ), while k(t) ↓ 1 1 q (2 + bζ2 ), b2 /q−ρσ2 indicating that the optimal policy converges to the stationary solution for the formulation with infinite horizon. In Figure 4 we plot the dynamics of the coefficients π(t) and κ(t) under a specific parametric choice for ρ = 0 and 5 and δ = 1 and 0.5. This graphical representation clearly confirms that in the dynamic optimization exercise the entrepreneur will be more cautious than in the static formulation and will choose to produce a smaller quantity of the commodity (the intensity of her production policy is smaller than in the static formulation as κ(t) < (1/2q)). The entrepreneur realizes that a larger quantity of the commodity brought to the market at time t will jeopardize her future profits. Indeed, a larger quantity x(t) in t lowers the commodity price p(t). Because of inertia in the demand for the commodity, such reduction propagates through time, so that a larger production today induces lower prices and impaired profit opportunities tomorrow. As the terminal date T approaches, future profits have a smaller impact on the entrepreneur’s optimal policy, which converges to the static solution (κ(t) converges to the 8 In Section 4 we discuss the general formulation with µ > 0. 14 Control Rule Function Riccati Function 2 0 ‐0.05 1.9 1.8 ‐0.15 κ(t) π(t) ‐0.1 ρ=5,δ=0.5 ρ=0,δ=0.5 ρ=5,δ=1 ρ=0,δ=1 ρ=5,δ=0.5 ρ=0,δ=0.5 ρ=5,δ=1 ρ=0,δ=1 1.7 ‐0.2 1.6 ‐0.25 1.5 ‐0.3 ‐0.35 0 0.2 0.4 Time, t 0.6 0.8 0 1 0.2 0.4 Time, t 0.6 0.8 1 Figure 4: The dynamics π(t) and κ(t) for T = 1, σ2 = 0.25, a = − 1, b = − 0.5 and q = 0.25. static optimal value 1/(2q)). Both time-discounting and risk-aversion condition the production policy.9 Unsurprisingly, for δ smaller the entrepreneur becomes more aggressive, as she gives less weight to future profits in determining her current production policy. This implies that the intensity of the production policy, κ(t), is larger for any t. Similarly, risk-aversion makes the entrepreneur more aggressive, as κ(t) is larger for ρ = 5 than ρ = 0 throughout time. The impact of risk-aversion on the optimal policy can be explained considering that a larger quantity x(t) in t will reduce both the expected value and the variance of future profits. Given that the optimization criterion in (1.1) favors early resolution of uncertainty, a risk-averse entrepreneur will be willing to accept smaller expected future profits to reduce their variance and will consequently choose a more aggressive production policy. 3 DLEQG Problems Under Imperfect State Observation In many situations agents observe only a noisy signal on the state vector zn in period n. In such a scenario the optimization criterion (1.1) is not well-defined, as the cost function cn is no longer deterministic. Nevertheless, we can employ the optimization criterion (1.2). Under 9 While the plots represented in Figure 4 are specific to the parametric constellation we have chosen, qualitatively similar results are obtained for other choices of such parameters. This suggests that Figure 4 represents the general features of the impact of time-discounting and risk-aversion on the optimal policy of the entrepreneur. 15 imperfect state observation, while no longer equivalent to the former one, this optimization criterion is well-defined. To allow for imperfect state observation we introduce the following modified Definition for the class of Markovian DLEQG problems: Definition 3 An optimal control problem is said to be Markovian linear exponential quadratic Gaussian with time-discounting and imperfect state observation if the following recursive optimization i h ρ Vn En exp 2 = h ρ i min En exp (∆cn + δ ∆ V n+1 ) , un 2 (3.1) is solved in periods n = 1, 2, . . . , N with respect to the free-valued control vector un under the conditions that: (i) for n = 1, 2, . . . N , the cost function, cn , is a positive-definite quadratic form in the control vector, un , and the state vector, zn , cn = u0n Qun + z0n Rzn + 2u0n Szn ; (ii) the state vector, zn , follows the plant equation zn = (I + A∆)zn−1 + B∆un−1 + n ; (iii) in n the vector of observable variables is given by wn = C zn−1 + η n , with ψ n = n ! ηn " ∼ N 0 0 ! , !# ∆N L L0 1 ∆M and ψ n ⊥ ψ n0 . As zn is now unobservable, the (discounted) stress takes a new formulation. In particular, let ẑn−1 denote the expectation of the state vector zn−1 conditional on the information contained in observation history Hn−1 = {h0 , Un−2 , Wn−1 }, with Un−2 = (u1 , . . . , un−2 ), Wn−1 = (w1 , . . . , wn−1 ), Ωn−1 the corresponding conditional covariance matrix and P the covariance matrix for ψ n , where P= ∆N L0 L 1 ∆M ! . We introduce a revised Definition for the (discounted) stress: Definition 4 Under imperfect state observation, the (discounted) stress is S n ≡ ∆cn − ρ1 (D n−1 + dn + dn+1 ) + δ ∆ V n+1 where dn is equal to ψ 0n P−1 ψ n for n ≤ N and 0 for n = N + 1, while D n−1 = (zn−1 − ẑn−1 )0 Ω−1 n−1 (zn−1 − ẑn−1 ). 16 We can then prove the following Lemma, which reformulates a result stated in Lemma 2 under the perfect state observation: Lemma 5 In a Markovian DLEQG problem, under imperfect state observation, if the value function in n + 1, V n+1 , is a quadratic form in the state vector zn+1 and the (discounted) stress S n satisfies a saddle point condition with respect to ξn (with ξ 0n ≡ (z0n−1 − ẑ0n−1 ψ 0n ψ 0n+1 )) and un , so that minun maxξn S n exists, the following proportionality condition holds h ρ i ρ ∆ min En exp (∆cn + δ V n+1 ) ∝ exp min max S n , un 2 2 un ξn where the proportionality constant is independent of the state vector zn , while the value function V n is a quadratic form in zn equal to the extremized (discounted) stress minun maxξn S n plus a constant independent of zn . Proof. Consider that under imperfect state observation V n+1 is still a function of zn+1 , while cn is function of zn . Since under imperfect state information zn and zn+1 can be expressed in terms of the vector ξ n , we have h ρ i min En exp (∆cn + δ ∆ V n+1 ) un 2 Z ∝ min un ρ 1 0 −1 ∆ exp (∆cn + δ V n+1 ) − ξ n Υn−1 ξ n dξ n , 2 2 where Υn−1 denotes the covariance matrix of ξ n conditional on observation history Hn−1 . In addition, since (zn−1 − ẑn−1 )0 ⊥ ψ 0n ⊥ ψ 0n+1 , we can write h ρ i min En exp (∆cn + δ ∆ V n+1 ) un 2 Z ∝ min un ρ 1 exp (∆cn + δ ∆ V n+1 ) − 2 2 ψ 0n+1 P−1 ψ n+1 + ψ 0n P−1 ψ n + (zn−1 − ẑn−1 )0 Ω−1 n−1 (zn−1 − ẑn−1 ) Z = min un ! Sn exp ρ dξ n . 2 We proceed as in the Proof of Lemma 2. Since V n+1 is a quadratic function of zn+1 and the latter is linearly dependent on ξ n and un , S n is a quadratic form in ξ n and un . If the stress, S n , respects the aforementioned saddle point condition, exploiting the property of quadratic 17 dξ n forms outlined in Lemma 3, we conclude that Z min un Sn exp ρ 2 Z dξ n = min un ! 1 exp − (−ρS n ) dξ n 2 | {z } Q(un ,ξn ) ∝ exp 1 ρ − max min (−ρS n ) = exp min max S n , 2 un ξn 2 un ξn where once again we have made use of the fact that if the saddle point minun maxξn S n exists, so does maxun minξn −S n . As the stress S n respects the saddle point condition, its extremized value will be a quadratic form in zn and so will be V n . Lemma 5 suggests a revision of the RSCEP outlined in Theorem 1: Theorem 4 - (Risk-sensitive Certainty Equivalence Principle). In a Markovian DLEQG problem, under imperfect state observation, if the stress S n+j respects a saddle point condition with respect to ξn+j and un+j for j = 0, 1, . . . , N − n, the optimal value of the vector un is determined in period n by simultaneously minimizing S n with respect to un and maximizing it with respect to ξ n . The value function is then proportional to the extremized stress, V n ∝ minun maxξn S n . The vector ξ n contains the vectors zn−1 − ẑn−1 , n , η n , n+1 and η n+1 which in period n are unknown. Following Whittle’s suggestion they can be expressed as linear functions of the unobservable (at time n) state vectors, zn−1 , zn and zn+1 , and the vector of signals, wn+1 . It follows that the saddle-point condition for the stress in n can be equivalently written as min max un zn−1 ,zn ,zn+1 ,wn+1 Sn . By writing the saddle point condition in this way, we see that it can be satisfied proceeding in two stages: in stage i), conditionally on the current state vector zn , the stress, S n , is extremized with respect to un , zn−1 , zn+1 and wn+1 ; in stage ii) the resulting function is extremized with respect to zn . In fact, we notice that min max un zn−1 ,zn ,zn+1 ,wn+1 S n ⇔ max zn min max un zn−1 ,zn+1 ,wn+1 Sn . In stage i), conditionally on zn , the extremization of the stress will be achieved by isolating terms in S n which pertain to past and future. This allows a partial separation between estimation and control. Specifically, let Pn (zn , Hn ) denote the extremized (discounted) past stress 18 defined as Pn (zn , Hn ) = max zn−1 1 − ρ dn + D n−1 (3.2) , where Hn is observation history in n. Then, the following Lemma, which adapts to the Markovian DLEQG problem a result originally derived by Whittle, holds: Lemma 6 Under imperfect state observation, the extremization of the stress in period n, can be obtained operating into two stages. Firstly, the extremized past and future stresses, Pn (zn , Hn ) and Fn (zn ), are calculated conditionally on the current state vector, zn . Secondly, the saddle point for the stress S n is achieved by maximizing Pn (zn , Hn ) + Fn (zn ) with respect to zn . Proof. The stress can be divided into two parts which contain values respectively depending on past and future variables, − 1 ρ dn + D n−1 and ∆cn − 1 dn+1 + δ ∆ V n+1 . ρ In stage i), extremizing S n with respect to un and zn−1 , zn+1 , wn+1 , conditionally on the current state vector zn , is equivalent to solving separately the two programs 1 max − zn−1 ρ ∆dn + D n−1 and min max un zn+1 ,wn+1 1 ∆cn − dn+1 + δ ∆ V n+1 ρ . As zn+1 and wn+1 are linearly dependent on n+1 and η n+1 , the latter program can be rewritten as follows min max un n+1 ,η n+1 1 ∆ ∆cn − dn+1 + δ V n+1 . ρ The maximization of ∆cn −(1/ρ)dn+1 +δ ∆ V n+1 with respect to η n+1 reduces to maxηn+1 {−(1/ρ)dn+1 } = − ρ1 0n+1 (∆N)−1 n+1 . Importantly, this means that under imperfect state observation the extremization, conditionally on zn , of the stress, S n , with respect to un , zn+1 and wn+1 is equivalent to the extremization of the stress under perfect state information with respect to un and n+1 . Lemma 4 shows that this corresponds to calculating the extremized future stress, Fn (zn ), which yields the optimal policy, u(zn ), conditional on the state vector zn . The former program instead corresponds to calculating the extremized past stress in n, Pn (zn , Hn ). This extremization identifies the optimal estimate for zn conditionally on information up to time n. In stage ii) estimation and control are recoupled by maximizing the sum Pn (zn , Hn ) + Fn (zn ) with respect to the current state vector zn . 19 Theorem 4 and Lemma 6 extend to the class of Markovian DLEQG problems Whittle’s risksensitive separation principle (RSSP) which allows to find the optimal policy by partially separating estimation and control. In particular, if the conditions listed in Theorem 4 for the existence of a meaningful solution to the DLEQG problem under imperfect state observation are met, this is found through the following Theorem: Theorem 5 - (Risk-sensitive Separation Principle). The extremized past and future stresses, Pn and Fn , relate to estimation and control respectively, as the evaluation of the former identifies the optimal estimate for the state vector zn conditional on past observations, while that of the latter pins down the control un (zn ) which would be optimal if zn were known. The calculations of Pn and Fn are recoupled by maximizing Pn + Fn with respect to the state vector zn so as to find the maximum stress estimate (MSE) z̆n for the state vector zn . - (Risk-sensitive Certainty Equivalence Principle). The optimal value of the control vector in n is given by un (z̆n ). As mentioned, conditionally on zn the future stress, Fn , respects the recursion (1.3) in Lemma 4. Then, Theorem 2 provides the conditional optimal policy, un (zn ). The extremization of the past stress is obtained via the following Lemma:10 Lemma 7 The extremized past stress is equal to Pn (zn , Hn ) = −(1/ρ) D n + · · · , where D n = (zn − ẑn )0 Ω−1 n (zn − ẑn ) and · · · indicates terms independent of zn , ẑn denotes the ML estimate of the state vector zn , given by its expectation conditional on observation history Hn , and Ωn the corresponding conditional covariance matrix. Proof. Pn (zn , Hn ) = maxzn−1 − ρ1 {dn + D n−1 } . From the definition of D n−1 we see that this maximum is equivalent to maxzn−1 ρ1 2 ln f (zn | Hn−1 ) where f (.) is the conditional density function of zn given Hn−1 . The maximum corresponds to Pn (zn , Hn ) = − ρ1 D n + · · · , where once again · · · indicates terms independent of zn . The extremized past stress depends on the maximum likelihood estimate (MLE) of the state vector zn . Given the noisy linear signal wn , from the Kalman filter we see that this value 10 A corollary of this Lemma is that the extremized past stress solves a (forward) recursion complementary to the future stress recursion, in that Pn (zn , Hn ) = maxzn−1 {− ρ1 dn + Pn−1 (zn−1 , Hn−1 )}, with P0 (z0 , H0 ) = − ρ1 D 0 . 20 respects the following recursion ẑn = e ẑn−1 + B∆ un−1 + (L + AΩ e n−1 C0 ) A 1 M + C Ωn−1 C0 ∆ −1 × (wn − C ẑn−1 ) , (3.3) e = I + A∆ and the conditional covariance matrix Ωn respects the Riccati recursion where A 0 e Ωn−1 A e − (L + AΩ e n−1 C ) Ωn = ∆N + A 1 M + C Ωn−1 C0 ∆ −1 × e n−1 C0 )0 . (3.4) (L + AΩ Importantly, Lemma 7 indicates that differently from what applies to the Markovian LEQG problem studied by Whittle, in the calculation of the extremized past stress no adjustment is made to the MLE of zn to correct for the impact of risk-aversion. This is because differently from Whittle’s formulation the past stress does not depend on the cost function.11 However, estimation and control are still intertwined as suggested in the discussion of Theorem 1. In fact, to re-couple the extremized past and future stresses we apply Theorem 5. This requires maximizing the sum Pn (zn , Hn ) + Fn (zn ) with respect to the state vector zn to obtain the maximum stress estimate (MSE), z̆n . Given that Pn (zn , Hn ) + Fn (zn ) = 0 −(1/ρ)(zn − ẑn )0 Ω−1 n (zn − ẑn ) + zn Πn zn plus terms independent of zn , from the first derivative of this sum with respect to zn it is immediate to see that, for Ω−1 n − ρ Πn positive definite, z̆n is given by the following expression z̆n = (I − ρ Ωn Πn )−1 ẑn , (3.5) which is function of ρ and the matrix Πn . As the latter depends on the components of cn , we see that the MSE z̆n is clearly affected by the shape of the cost function alongside the degree of risk-aversion, confirming the close nexus between control and estimation for the class of DLEQG problems. Finally, the optimal control vector under imperfect state observation is given by Theorem 2 where z̆n replaces zn , i.e. un = Kn z̆n , with Kn the matrix of optimal coefficients presented in Theorem 2. 11 Whittle shows that a maximum past stress estimate (MPSE) replaces the standard MLE in the expression for Pn . Such MPSE is obtained by introducing an adjustment to the Kalman filter which accounts for the impact of risk-aversion on the optimal estimation of the state vector zn . 21 3.1 The Continuous-time Limit with Imperfect State Information Under imperfect state information one should see how optimal control and estimation behaves for ∆ ↓ 0. It is immediate to see that Theorem 4 and Lemma 7 still apply, with the qualification that the ML estimate of zn will be replaced by its continuous-time counterpart. The continuoustime version of the Kalman filter indicates that for w(t) = Cz(t) + η(t), the ML estimate of the state vector respects the following expression, d ẑ(t) dt = A ẑ(t) + B u(t) + (L + Ω(t) C0 )M−1 (w(t) − C ẑ(t)) , (3.6) where the conditional covariance matrix Ω(t) respects the Riccati differential equation, d Ω(t) dt = N + A Ω(t) + Ω(t) A0 − (L + Ω(t) C0 )M−1 (L + Ω(t) C0 )0 . (3.7) Re-coupling the extremized past and future stresses yields the MSE for the state vector. In the continuous-time limit this will be given by the expression z̆(t) = (I − ρ Ω(t) Π(t))−1 ẑ(t) . (3.8) Let us reconsider our risk-averse entrepreneur and assume she only observes a noisy signal, w(t) = p(t) + η(t), of the commodity price, p(t). To justify such an assumption notice that p(t) corresponds to a relative price. While the entrepreneur observes the nominal price at which the firm sells its commodity, she does not observe the general price level but only a price index, which represents a noisy signal of the former. Assuming that she can continuously check such price index, at any time t she receives the noisy signal w(t) on p(t). Applying equations (3.6) and (3.7) we see that dp̂(t) dt = ap̂(t) + bx(t) + dσp2 (t) dt = σ2 + 2w(t) − σp2 (t) w(t) − p̂(t) , ση2 1 4 σ (t) . ση2 p The latter is a standard Riccati equation in continuous time. Given the boundary condition 22 σp2 (0) = σ02 , the conditional variance for the relative price p(t) at time t, is σp2 (t) = ση2 θ = a+ σ2 ση2 1/2 a+ σ2 ση2 1/2 θι1 exp(ι1 t) + ι2 exp(ι2 t) θ exp(ι1 t) + exp(ι2 t) + σ02 ση2 −a − σ02 ση2 −a , where 1/2 σ2 . and ι1 , ι2 = a ± a + σ 2 η From equation (3.8) we see that the MSE for the relative price is then p̌(t) = 1 1−ρσp2 (t)π(t) p̂(t), while the optimal production policy becomes x(t) = κ(t)p̌(t). 4 The Time-heterogeneous Formulation In Definition 1 the Markovian DLEQG problem is time-homogeneous, in that neither the plant equation nor the cost function explicitly depends on period n. This Definition can be adjusted to accommodate a non-homogenous plant equation and/or cost function, by making any of the matrices A, B, N, Q, R and S time-dependent. Treatment of this generalization is straightforward. It is more involving the analysis of the formulation with pre-determined disturbance terms into the plant equation. In particular, assume the state vector respects the following law of motion zn = (I + A∆)zn−1 + B∆un−1 + µn ∆ + n , where the vector µn contains pre-determined values. These values are known in advance and represent anticipated disturbances which modify the original plant equation introduced in Definition 1. Under perfect state observation, with a pre-determined disturbance term µn in the law of motion for the state vector Lemma 2 and Theorem 1 hold, as the stress in n, S n , is still a quadratic form in un , n+1 and zn . However, this quadratic form is no longer homogeneous in zn . This implies that Lemma 4 must be amended, in that the extremized future stress is now a non-homogenous quadratic form, Fn (zn ) = z0n Πn zn − 2ϑ0n zn · · · (with ϑn a vector of coefficients for the components of zn and · · · indicating terms independent of zn ). Given that in N + 1 FN +1 ≡ 0, the terminal conditions ΠN +1 = 0 and ϑN +1 = 0 apply. Therefore, Theorem 2 must be extended as follows: 23 Theorem 6 - (Risk-sensitive Riccati Equation). Under perfect state observation, if the matrix (δ ∆ Πn+1 )−1 − ρN is positive definite and the state vector respects the linear plant equation with pre-determined disturbances, in period n the extremized future stress is given by Fn (zn ) = z0n Πn zn − 2ϑ0n zn + · · · for (4.1) un = e n+1 B)−1 B0 Π e n+1 (Π−1 ϑn+1 − µn+1 ∆) , Kn zn + (Q + ∆B0 Π n+1 (4.2) where e 0Π e n+1 A e − (S0 + A e 0Π e n+1 B)(Q + ∆B0 Π e n+1 B)−1 (∆S + ∆B0 Π e n+1 A) e (4.3) Πn = R∆ + A , e n+1 B)−1 (S + B0 Π e n+1 A) e , Kn = − (Q + ∆B0 Π (4.4) e n+1 (Π−1 ϑn+1 − µn+1 ∆) , ϑn = Γ0n Π n+1 (4.5) e + ∆BKn , Γn = A e n+1 = ((δ ∆ Πn+1 )−1 − ρ N)−1 and A e = (I + ∆A) . Π with (4.6) Proof. We just repeat the steps followed in the proof of Theorem 2. Recall that the extremized future stress respects the double recursion Fn = L L̃ Fn+1 based on the two operators 1 e L φ(z) = min [∆c(z, u) + φ(Az+∆Bu+µ∆)] and L̃ φ(z) = max [φ(z+) − 0 (∆N)−1 ] . u ρ Assume that φ(z) = δ ∆ z0 Π z − 2δ ∆ ϑ0 z + · · · , so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) − 2δ ∆ ϑ0 (z + ) − 1 0 −1 ρ (∆N) ˜ = − (δ ∆ Π − + · · · ]. Taking first derivatives, we find that 1 1 (∆N)−1 )−1 δ ∆ Π z + (δ ∆ Π − (∆N)−1 )−1 δ ∆ ϑ , ρ ρ which pins down a maximum if (δ ∆ Π)−1 − ρ∆N is positive definite. Replacing this expression e 0 z+· · · , where Π e = ΠΠ e z − 2ϑ e = ((δ ∆ Π)−1 − ρ∆N)−1 , ϑ e −1 ϑ we conclude that L̃ φ(z) = z0 Π e 0 z + · · · , the solution of the e z − 2ϑ and · · · denotes terms independent of z. For L̃ φ(z) = z0 Π operator L yields the standard recursive formulae for Π, K and ϑ from the Markovian LQG e replace e = ((δ ∆ Π)−1 − ρ∆N)−1 and ϑ problem with pre-determined disturbances, where Π respectively Π and ϑ. Specifically, applying the double recursion Fn = L L̃ Fn+1 , we find that e 0 zn +· · · for un = Kn zn +(Q+∆B0 Π e n+1 − Π e n+1 B)−1 B0 (ϑ e n+1 µ F (zn ) = z0 Πn zn −2ϑ ∆), n where Kn = −(Q + n+1 e n+1 B)−1 (S B0 Π n+1 + e n+1 A). e B0 Π e n+1 with Π e n+1 Π−1 ϑn+1 we Replacing ϑ n+1 find the recursive formulae presented in the statement. 24 Theorem 6 indicates that in the presence of pre-determined disturbances the optimal policy contains a risk-adjusted correction which takes into account their anticipated values. This also implies that the optimal control is not longer a simple linear function of the state vector, as an extra term enters into the optimal policy in equation 4.2.12 A second adjustment must be introduced under imperfect state observation, when predetermined disturbances enter into the plant equation for the state vector. As Theorem 5 and Lemma 7 still apply, in recoupling the extremized past and future stresses, the sum Pn (zn , Hn ) + Fn (zn ) is maximized with respect to zn to obtain the MSE, z̆n . Given that 0 0 0 Pn (zn , Hn ) + Fn (zn ) = −(1/ρ)(zn − ẑn )0 Ω−1 n (zn − ẑn ) + zn Πn zn − 2ϑn zn plus terms indepen- dent of zn , taking the first derivative of this sum with respect to zn we see that, for Ω−1 n − ρ Πn positive definite, z̆n = (I − ρ Ωn Πn )−1 (ẑn − ρ Ωn ϑn ) , (4.7) where ẑn is still the ML estimate of zn . Thanks to the presence of the pre-determined disturbances, this is given by ẑn e n−1 C0 ) e ẑn−1 + B∆ un−1 + µn ∆ + (L + AΩ = A 1 M + C Ωn−1 C0 ∆ (wn − C ẑn−1 ) . −1 × (4.8) For the continuous-time limit of the Markovian DLEQG problem, where the pre-determined disturbance, µ(t), enters into the law of motion for the state vector, the following result holds:13 12 If the matrix Πn+1 is non-invertible Theorem 6 must be amended. The expression for the vector ϑn is now e n+1 µ e − δ ∆ ρ∆NΠn+1 )−1 ϑn+1 − Π n+1 ∆], while Πn+1 and the second order condition take the formulation given in footnote 7. 13 The presence of the pre-determined disturbances will only add a vector of pre-determined values, µ(t), in the expression for the MLE of the state vector under imperfect state observation in equation (3.6), while the continuous-time analogue for the MSE in equation (4.7) is z̆(t) = (I − ρ Ω(t)Π(t))−1 (ẑ(t) − ρ Ω(t) ϑ(t)). Γ0n [δ ∆ (I 25 Theorem 7 In the continuous-time limit of the Markovian DLEQG problem with pre-determined disturbances the optimal policy is u(t) = K(t)z(t) + Q−1 B0 ϑ(t) , (4.9) K(t) = − Q−1 (S0 + B0 Π(t)) , with (4.10) where the matrix Π(t) solves equation (2.3) and the vector ϑ(t) the following one d ϑ(t) + dt 0 e ln δ + Γ(t) ϑ(t) − Π(t) µ(t) = 0 , e with Γ(t) = A − B Q−1 S − −1 BQ (4.11) B − ρ N Π(t) , 0 under the boundary conditions Π(T ) = 0 and ϑ(T ) = 0 . (4.12) (4.13) Proof. From Theorem 6 we know that the extremized future stress is Fn (zn ) = z0n Πn zn − e n+1 is given by ((δ ∆ Πn+1 )−1 − ρN∆)−1 and the optimal control vector is 2ϑ0 zn + · · · , while Π n 0e −1 un = − (Q + ∆ B Πn+1 B) 0 e (I + A ∆) S+B Π zn e n+1 − Π e n+1 ∆B)−1 B0 (ϑ e n+1 µn+1 ∆) , where + (Q + ∆B0 Π e n+1 = Π e n+1 = ((δ ∆ Πn+1 )−1 − ρ ∆ N)−1 and ϑ e n+1 Π−1 ϑn+1 . Π n+1 While the matrix Πn respects the Riccati difference equation derived in the proof of Theorem 3, the vector ϑn respects the following difference equation e 0 [ I + ∆ (A + B Kn ) ] − ∆µ0 Π e ϑ0n = ϑ n+1 n+1 n+1 e n+1 − Π e n+1 µ∆)0 B (Q + ∆B0 Π e n+1 ∆B)−1 [ Q Kn + S + B0 Π e n+1 (I + A ∆) ] . − ∆ (ϑ e n+1 and Π e n+1 and considering that lim∆↓0 Π e n+1 = lim∆↓0 Πn+1 , Given the expressions for Kn , ϑ e n+1 = lim∆↓0 ϑn+1 and lim∆↓0 ϑn = ϑ(t), for ∆ ↓ 0 the differlim∆↓0 Πn = Π(t), lim∆↓0 ϑ ence equation for ϑn converges to its continuous-time counterpart in (4.11), while the terminal conditions ΠN +1 = 0 and ϑN +1 = 0 converge to the boundary conditions Π(T ) = 0 and ϑ(T ) = 0. The extremized future stress in the continuous-time limit is F (z(t), t) = z(t)0 Π(t) z(t) − 2ϑ(t)0 z(t) + · · · , while the optimal policy is u(t) = K(t)z(t) + Q−1 B0 ϑ(t) with K(t) = −Q−1 (S0 + B0 Π(t)). 26 In the illustrative example we have considered in Section 2 we have assumed that µ = ad /cd = 0. For µ 6= 0, we see that x(t) = κ(t)p(t) + ϑ(t)µ, where, applying Theorem 7, we find that ϑ(t) solves the non-homogenous differential equation dϑ(t) + dt 2 b 1 2 ϑ(t) − ln δ + a + − ρσ π(t)ϑ(t) − µπ(t) = 0 . 2q q (4.14) Numerical methods are called for to solve such a differential equation. To obtain a closed-form solution, we can consider the infinite horizon formulation of the optimal production problem assuming that T ↑ ∞. In this case π(t) → π̄ while ϑ(t) → ϑ̄, so that x(t) = κ̄p(t) + ϑ̄µ. In Section 2 we have seen that π̄ = − b2 q ζ2 − ρσ2 bζ2 1 1 . + 2 and κ̄ = b q 2 2 − ρσ q In addition, notice that in steady state equation (4.14) simplifies to 2 b 1 2 ϑ̄ − − ρσ π̄ ϑ̄ − µπ̄ = 0 , ln δ + a + 2q q so that π̄ ϑ̄ = ln δ + a + 1 2q − b2 q µ. − ρσ2 π̄ Importantly, the conclusions we drew in Section 2 on the impact of risk-aversion on the sensitiveness of the production policy with respect to the commodity price are unaffected by the presence of the pre-determined disturbance µ. 5 Concluding Remarks We have analyzed a specific class of optimal control problems, termed discounted linear exponential quadratic Gaussian (DLEQG) problems, where risk-averse agents satisfy a recursive optimization criterion defined over a quadratic cost function in state and control vectors as in Hansen and Sargent (1994, 1995, 2013) and a Markovian linear plant equation for the state vector dictates the dynamics of the economic environment. We haveshown that the DLEQG class is a generalization of Whittle’s (Whittle, 1990) linear exponential quadratic Gaussian (LEQG) class which allows to accommodate time-discounting while preserving most of the results he derives. Thus, Whittle’s risk-sensitive certainty-equivalence (RSCEP) and separa27 tion (SP) principles are reformulated, while his recursive formulae for the optimal control are modified to adapt them to the DLEQG class. In our analysis we consider the continuous-time limit of this class of optimal control problems and show that sufficient conditions for DLEQG problems to admit meaningful solutions are less stringent in continuous than in discrete time and coincide with those which apply to DLQG problems. Our revised RSCEP confirms within the DLEQG class Whittle’s result that the optimal behavior of risk-averse agents can be identified via a pessimistic choice mechanism, according to which the control variables are chosen by applying a min-max strategy, so that agents minimize their welfare loss against the most adverse shocks. A possible conjecture is that amid an uncertain environment a pessimistic agent acts cautiously, choosing conservative control rules. In effect, it is immediate to verify that within static optimization problems under uncertainty more risk-averse agents are less aggressive. Common intuition would then indicate that a similar result also holds within fully dynamic optimization problems. Our analysis suggests the contrary. We see, within the fully dynamic optimization problem faced by our monopolistic entrepreneur, that a larger degree of risk-aversion brings about bolder optimal policies. Hence, a conclusion we draw from our analysis is that, differently from what common intuition would suggest, a pessimistic agent should not be confused with a cautious one, as we see that pessimism induces such agent to act boldly. References B OUAKIZ , M., AND M. S OBEL (1984): “Non Stationarity Policies Are Optimal for Risk-sensitive Markov Decision Processes,” Discussion paper, California Institute of Technology. E PSTEIN , L. G., AND S. E. Z IN (1991): “Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: An Empirical Analysis,” Journal of Political Economy, 99, 263–286. H ANSEN , L., AND T. J. S ARGENT (1994): “Discounted Linear Exponential Quadratic Gaussian Control,” Discussion paper, University of Chicago mimeo. (1995): “Discounted Linear Exponential Quadratic Gaussian Control,” IEEE Transactions on Automatic Control, 40, 968–971. 28 (2008): Robustness. Princeton University Press, Princeton. (2013): Recursive Models of Dynamic Linear Economies. Princeton University Press. H ANSEN , L. P., T. J. S ARGENT, AND T. D. TALLARINI (1999): “Robust Permanent Income and Pricing,” Review of Financial Studies, 66, 873–907. H OLDEN , C. W., AND A. S UBRAHMANYAM (1994): “Risk-aversion, Imperfect Competition, and Long-lived Information,” Economic Letters, 44, 181–190. L UO, Y. (2004): “Consumption Dynamics, Asset Pricing, and Welfare Effects under Information Processing Constraints,” Discussion paper, Princeton University. L UO, Y., AND E. R. Y OUNG (2010): “Risk-sensitive Consumption and Investment under Rational Inattention,” American Economic Journal: Macroeconomics, 2, 281–325. M AMAYSKY, H., AND M. S PIEGEL (2002): “A Theory of Mutual Funds: Optimal Fund Objectives and Industry Organization,” Discussion paper, Yale School of Management. TALLARINI , T. D. (2000): “Risk-Sensitive Real Business Cycles,” Journal of Monetary Economics, 45, 507–532. VAN DER P LOEG, F. (2009): “Prudent Monetary Policy and Prediction of the Output Gap,” Journal of Macroeconomics, 31, 217–230. (2010): “Political Economy of Prudent Budgetary Policy,” International Tax and Public Finance, 17, 295–314. V ITALE , P. (1995): “Risk-Averse Traders and Inside Information,” Discussion Paper 9504, Department of Applied Economics, University of Cambridge. (2012): “Risk-averse Insider Trading in Multi-asset Sequential Auction Markets,” Economics Letters, 117, 673–675. W HITTLE , P. (1990): Risk-sensitive Optimal Control. John Wiley & Sons, New York. Z HANG, H. (2004): “Dynamic Beta, Time-Varying Risk Premium, and Momentum,” Discussion paper, Yale School of Management. 29 Appendix A: Markovian DLEQG Problems in Discrete Time A.1. Monotonicity and Convexity of the Optimization Criterion (1.1). Let R(V) ≡ ln E exp δ ∆ ρ2 V . Then assume V 1 ≥ V 2 ≥ 0. Consider that R(V 1 ) − R(V 2 ) h ρ i h ρ i ln E exp δ ∆ V 1 − ln E exp δ ∆ V 2 = ln 2 2 = ! E exp δ ∆ ρ2 V 1 ≥ 0 . E exp δ ∆ ρ2 V 2 This means that R(V) is monotone increasing in V. So, consider the convex combination θV 1 + (1 − θ)V 2 , with 0 < θ < 1. R(θV 1 + (1 − θ)V 2 ) = ≤ = = i ρ θ ρ 1−θ h ρ = ln E exp δ ∆ V 1 exp δ ∆ V 2 ln E exp δ ∆ [θV 1 + (1 − θ)V 2 ] 2 2 2 n h ρ ioθ n h io1−θ ρ ln E exp δ ∆ V 1 · E exp δ ∆ V 2 2 2 i h ρ i h ρ + ln(1 − θ) E exp δ ∆ V 2 ln(θ) ln E exp δ ∆ V 1 2 2 ln(θ)R(V 1 ) + ln(1 − θ)R(V 2 ) , where the inequality follows from Hölder’s inequality. In fact, Hölder’s inequality states that for p and q such that 1 < p, q, with 1/p + 1/q = 1, E[| X · Y |] ≤ (E[| X |p ])1/p · (E[| Y |q ])1/q . We can apply Hölder’s θ 1−θ inequality setting X ≡ exp δ ∆ ρ2 V 1 and Y ≡ exp δ ∆ ρ2 V 2 , choosing p = 1/θ and q = 1/(1 − θ) and noticing that the exponential function is non-negative. This means that R is convex in V. Define Γ(u, z, V) ≡ Q(u, z) + R(V), where Q is positive definite in u and z. From the properties of the function R, it follows that Γ(u, z, V) is monotone increasing in V and convex in u, z and V, so that the recursive optimization criterion captures risk-aversion. In fact, the larger ρ the larger the convexity of the function Γ. A.2. Limit Properties of the Optimization Criterion (1.1). For ρ > 0 we have that ρ Vn n h ρ io ρ i 2 h ∆ ∆ = min ρ∆cn + 2 ln En exp δ V n+1 = ρ min ∆cn + ln En exp δ V n+1 , un un 2 ρ 2 ie. V n = min ∆cn + un ρ i 2 h ln En exp δ ∆ V n+1 ρ 2 . We proceed by backward induction. Assume that V n+1 is independent of ρ (this being certainly true in N + 1). Then, ρ i 1 h lim ln En exp δ ∆ V n+1 ρ↓0 ρ 2 = lim δ ρ↓0 ∆ ρ 1 En exp δ ∆ 2 V n+1 · V n+1 1 = δ ∆ En [ V n+1 ] , 2 2 En exp δ ∆ ρ2 V n+1 i where we have used the Hôpital’s rule and moved the derivative operator inside the expectation opera tor. This implies that for ρ ↓ 0 we have V n = minun ∆cn + δ ∆ En [V n+1 ] , with V n independent of ρ, while the argmin of the Markovian DLEQG problem converges to that of the corresponding Markovian DLQG problem for ρ ↓ 0. This implies that the Markovian DLEQG problem encompasses the Markovian DLQG problem and it can be considered an extension of the latter. A.3. Equivalence for the Optimization Criterion (1.1) and that of Hansen and Sargent’s. For ∆ = 1 and ρ > 0 the following holds Vn = min cn un = min = min cn un cn un ρ i 2 h ln En exp δ V n+1 + ρ 2 ρ i δ 2 h + ln En exp δ V n+1 δ ρ 2 0 ρ 2 + δ 0 ln En exp V n+1 , ρ 2 so that our recursive optimization criterion (1.1) corresponds to that proposed by Hansen and Sargent (1994) for ρ replaced by ρ0 ≡ δρ. A.4. The Optimization Criterion (1.1) and Epstein and Zin’s Preferences. Suppose U n solves Epstein and Zin’s recursion ( U n = max (1 − δ)Cn1−η + δ En h U 1−γ n+1 1−η i( 1−γ ) 1 )( 1−η ) , where 1/η is the elasticity of inter-temporal substitution. Let η = 1. Tallarini (2000) shows that δ h i ( 1−γ ) . U n = max Cn1−δ En U 1−γ n+1 Taking logs, ln U n = max (1 − δ) ln Cn + h i δ ln En U 1−γ , n+1 1−γ or equivalently ln U n = max 1−δ ln Cn + h i δ ln En U 1−γ . n+1 (1 − δ)(1 − γ) We can re-write this as h i ln U n δ 1−γ − = min − ln Cn − ln En U n+1 . 1−δ (1 − δ)(1 − γ) Un For V n = − ln1−δ , we have that −(1 − δ)V n = ln U n , so that U n+1 = exp(−(1 − δ)V n+1 ) and U 1−γ n+1 = (exp(−(1 − δ)V n+1 )) 1−γ ii = exp(−(1 − δ)(1 − γ)V n+1 ) . Setting ρ0 = −2(1 − δ)(1 − γ), we can write V n = min − ln Cn + δ 0 2 ρ ln E exp V , n n+1 ρ0 2 which corresponds to the optimization criterion (1.1) for ρ = ρ0 δ , − ln Cn equal to a quadratic form in the control and state vectors, un and zn , and ∆ = 1. A.5. Proof of Lemma 1. We can write exp ρ 2 Vn io ρ exp min ∆cn + ln En exp δ V n+1 un 2 2 n hρ h ρ iio min exp ∆cn + ln En exp δ ∆ V n+1 un 2 2 n h ρ h ρ iio min exp ln exp ∆cn + ln En exp δ ∆ V n+1 un 2 2 h ρ iio n h ρ ∆cn En exp δ ∆ V n+1 min exp ln exp un 2 2 n ρ h ρ io min exp ∆cn En exp δ ∆ V n+1 un 2 2 n h io ρ min En exp (∆cn + δ ∆ V n+1 ) , so that un 2 n h io ρ = ln min En exp (∆cn + δ ∆ V n+1 ) . un 2 = = = = = = ρ Vn 2 nρ h ∆ A.6. Proof of Lemma 3. Consider the quadratic form Q(u, ), where Q(u, ) = u0 Qu u u + 2u0 Qu + 0 Q . Assume Q admits a minimum in in that Q is positive definite. The following holds Z 1 exp − Q(u, ) d 2 ∝ 1 exp − min Q(u, ) . 2 This is because, for ˆ = argmin Q, we can write Q(u, ) = Q(u, ˆ) + ( − ˆ)0 Q ( − ˆ). In fact, as Q is positive definite and invertible, the minimum of Q with respect to is obtained for ˆ = −Q−1 Q u u and is equal to Q(u, ˆ) = u0 [Qu u − Qu Q−1 Q u ]u. Then, Q(u, ) − Q(u, ˆ) = 0 Q + 0 Q u u + u0 Qu + u0 Qu Q−1 Q u u = 0 Q − 0 Q ˆ − ˆ0 Q + ˆ0 Q ˆ = ( − ˆ)0 Q ( − ˆ) . iii As Q(u, ˆ) = min Q(u, ) is a constant in the integration, 1 exp − Q(u, ) d 2 Z = Z 1 1 exp − min Q(u, ) × exp[ − ( − ˆ)0 Q ( − ˆ)] d . 2 2 Let ∆ denote (−ˆ). Because Q is positive definite, for ∆ integrated over Rp where p is the dimension R of , we find that exp( − 21 ∆0 Q ∆) d ∆ = (2π)p/2 det(Q )−1/2 . It follows Z 1 exp − Q(u, ) d 2 = (2π) p/2 −1/2 det(Q ) 1 × exp − min Q(u, ) , 2 where the constant (2π)p/2 det(Q )−1/2 is independent of u. Suppose that we solve the program minu R exp − 12 Q(u, ) . Assume that Q admits a saddle point with respect to and u, so that maxu min Q(u, ) exists. This is the case if Q > 0 and Qu u − Qu Q−1 Q u < 0 hold. As a corollary of the former result we have Z min u 1 exp − Q(u, ) d 2 1 1 ∝ min exp − min Q(u, ) = exp − max min Q(u, ) . 2 u 2 2 u A.7. The (Discounted) Future Stress Backward Recursion. From Lemma 2 we know that if V n+1 is a quadratic form in zn+1 ρ exp Vn 2 ρ min max S n constant × exp 2 un n+1 = ρ = exp γn + min max S n , un n+1 2 for γn a constant independent of zn . This implies that V n = γn + minun maxn+1 S n . Then, assume that V n+1 = νn+1 + z0n+1 Πn+1 zn+1 . Since S n = ∆cn − ρ1 dn+1 + δ ∆ V n+1 , it follows that 1 ∆ ∆ 0 min max ∆cn − dn+1 + δ νn+1 + δ zn+1 Πn+1 zn+1 un n+1 ρ 1 ∆ ∆ 0 δ νn+1 + min max ∆cn − dn+1 + δ zn+1 Πn+1 zn+1 un n+1 ρ min max S n un n+1 = = = δ ∆ νn+1 + z0n Πn zn = δ ∆ νn+1 + Fn (zn ) , so that V n = γn + minun maxn+1 S n = νn + Fn (zn ), with νn = γn + δ ∆ νn+1 and Fn (zn ) = z0n Πn zn . iv A.8. The Value Function. Consider that (for p the dimension of the vector n+1 ) h ρ i En exp (∆cn + δ ∆ V n+1 ) 2 ρ exp Vn 2 ρ 1 (∆cn + δ ∆ V n+1 ) − 0n+1 (∆N)−1 n+1 dn+1 2 2 Z Sn = (2π)−p/2 det(∆N)−1/2 exp ρ dn+1 , so that 2 Z Sn −p/2 −1/2 = (2π) det(∆N) min exp ρ dn+1 . un 2 = (2π)−p/2 det(∆N)−1/2 Z exp The function −ρS n is a quadratic form in un and n+1 , which we can write as u0n Suu un +2u0n Su n+1 + 0n+1 S n+1 , with S = (∆N)−1 − δ ∆ ρΠn+1 . We can apply Lemma 3. From its proof we know that Z min un Sn exp ρ dn+1 2 (2π)p/2 det((∆N)−1 − δ ∆ ρΠn+1 )−1/2 × exp = ρ min max S n . 2 un n+1 Notice that (∆N)−1 − δ ∆ ρΠn+1 = (∆N)−1 (I − δ ∆ ρ∆NΠn+1 ), so that det((∆N)−1 − δ ∆ ρΠn+1 ) = det(I − δ ∆ ρ∆NΠn+1 )/det(∆N). Therefore, ρ exp Vn 2 ∆ = det(I − δ ρ∆N Πn+1 ) This implies that in A.7. γn = 2 ρ −1/2 Z min un Sn exp ρ 2 dn+1 . ln(det(I − δ ∆ ρ∆N Πn+1 )−1/2 ) = − ρ1 ln(det(I − δ ∆ ρ∆N Πn+1 )), while νn = γn + δ ∆ νn+1 . In steady state V n = F (zn ) + ν, where F (zn ) = z0n Πzn , while ν = − 1 1 − δ∆ 1 ln(det(I − δ ∆ ρ∆N Π)) . ρ For ρ ↓ 0 we can prove that γn → δ ∆ Tr(∆NΠn+1 ). This also implies that in steady state ρ ↓ 0 ν → δ∆ 1−δ ∆ Tr(∆NΠ). To show this result consider that calculate the limit for ρ ↓ 0 of γn we need to apply Hôpital’s rule. Then, we should use the following results which apply to any invertible matrix Ξ d ln(det(Ξ)) 1 d det(Ξ) = dy det(Ξ) dy and d det(Ξ) −1 d Ξ = det(Ξ) Tr Ξ . dy dy Then, d ln det I − δ ∆ ρ∆NΠn+1 dρ = − δ ∆ Tr (I − δ ∆ ρ∆NΠn+1 )−1 ∆NΠn+1 . For ρ ↓ 0 this converges to −δ ∆ Tr(∆NΠn+1 ). Applying Hôpital’s rule we conclude that γn → δ ∆ Tr(∆NΠn+1 ), so that in steady state ν → δ∆ 1−δ ∆ Tr(∆NΠ). v A.9. The L̃-Recursion. Suppose φ(z) = δ ∆ z0 Πz, so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) − 1 0 −1 ]. ρ (∆N) Taking first derivatives, we find that 2 1 (∆N)−1 ρ δ∆ Π − + 2δ ∆ Π z = 0 ⇔ ˜ = − δ∆ Π − 1 (∆N)−1 ρ −1 −1 ∆ δ ∆ Π z = − Π̌ δ Πz, which identifies a maximum for Π̌ negative definite. Plugging this formula in the expression for L̃φ(z), we find that L̃φ(z) = − 1 0 ˜ (∆N)−1 ˜ + (z + ˜)0 δ ∆ Π (z + ˜) ρ = − 1 0 ∆ −1 −1 −1 −1 z δ Π Π̌ (∆N)−1 Π̌ δ ∆ Π z + z0 (I − Π̌ δ ∆ Π)0 δ ∆ Π (I − Π̌ δ ∆ Π) z . ρ Now, − 1 ∆ −1 −1 −1 −1 δ Π Π̌ (∆N)−1 Π̌ δ ∆ Π + (I − Π̌ δ ∆ Π)0 δ ∆ Π (I − Π̌ δ ∆ Π) = ρ 1 ∆ −1 −1 −1 −1 −1 δ Π Π̌ (∆N)−1 Π̌ δ ∆ Π + δ ∆ Π − 2 δ ∆ Π Π̌ δ ∆ Π + δ ∆ Π Π̌ δ ∆ Π Π̌ δ ∆ Π = ρ 1 −1 −1 −1 δ ∆ Π − 2 δ ∆ Π Π̌ δ ∆ Π + δ ∆ Π Π̌ Π − (∆N)−1 Π̌ δ ∆ Π = ρ − δ ∆ Π − 2 δ ∆ Π Π̌ −1 ∆ −1 ∆ δ Π + δ ∆ Π Π̌ −1 ∆ δ Π = δ ∆ Π − δ ∆ Π Π̌ −1 ∆ δ Π = δ ∆ Π [I − Π̌ Consider that 1 (∆N)−1 ρ −1 1 δ ∆ Π − (∆N)−1 ρ δ∆ Π − I− δ∆ Π − 1 (∆N)−1 ρ = 1 1 δ ∆ (∆N)−1 ρ∆N Π − (∆N)−1 , so that ρ ρ = −1 δ ∆ ρ∆N Π − I δ∆ Π −1 ρ∆N . Hence, −1 ∆ I + I − δ ∆ ρ∆N Π δ ρ∆NΠ . = Since for I + A invertible (I + A)−1 = I − (I + A)−1 A, we find that −1 1 ∆ −1 I − δ Π − (∆N) δ∆ Π ρ " ∆ δ Π I− 1 δ Π − (∆N)−1 ρ ∆ −1 −1 I − δ ∆ ρ∆N Π = # ∆ δ Π vi and hence −1 = δ ∆ Π I − δ ∆ ρ∆N Π . δ Π] . Then, " 0 ∆ L̃φ(z) = z δ Π I − 1 δ Π − (∆N)−1 ρ ∆ −1 # −1 δ Π z = z0 δ ∆ Π I − δ ∆ ρ∆N Π z. ∆ For Π invertible, δ ∆ Π (I − ρ ∆Nδ ∆ Π)−1 = δ ∆ Π [((δ ∆ Π)−1 − ρ ∆N) δ ∆ Π]−1 = ((δ ∆ Π)−1 − ρ ∆N)−1 . e z, where Π e = ((δ ∆ Π)−1 − ρ ∆N)−1 if Π is invertible and δ ∆ Π I − δ ∆ ρ∆N Π −1 We conclude that L̃φ(z) = z0 Π otherwise. A.10. Second Order Conditions for the L̃-Recursion. Consider that the second order condition for the maximization in the L̃-recursion is that δ ∆ Π − 1 −1 ρ (∆N) being negative definite. Now, as this is a symmetric matrix, there exists a coordinate transfor- mation which diagonalizes it. This matrix will be negative definite iff all its eigenvalues are negative, or equivalently iff its elements on the main diagonal are negative, suggesting that is possible to proceed as in the scalar case. Hence, suppose that Π is invertible. The condition δ ∆ Π − ∆ −1 equivalent to (δ Π) 1 −1 ρ (∆N) < 0 is − ρ∆N > 0, as the elements on the main diagonal of the former matrix will be negative iff those on the latter are positive, or equivalently the former matrix is negative definite iff the latter is positive definite. We therefore establish that a solution to the L̃-recursion exists if an only if (δ ∆ Π)−1 − ρ∆N is positive definite. A.11. The L-Recursion. Suppose that Π is positive semi-definite and Q and R are positive definite. This will be true if the cost function c is positive definite in u and z (that Q and R are positive definite when the cost function is a positive definite quadratic form is obvious; that in this case Π is also positive semi-definite will be shown below). Assume also that the condition in Theorem 2 for the DLEQG problem to have a proper e (as long as its inverse, (δ ∆ Π)−1 − ρ∆N) is positive definite. solution holds, so that Π In solving the L-recursion consider that we need to solve min [∆c + φ((I + A ∆)z + B ∆ u)] , u e z and ∆c = u0 Q ∆ u + z0 R ∆ z + 2u0 S ∆ z. The first order condition is where φ(z) = z0 Π e B u + S z + B0 Π e (I + A ∆) z = 0 , so that 2 ∆ Q u + ∆ B0 Π −1 0e e u = K z with K = − (Q + ∆ B Π B) S + B Π (I + A ∆) . 0 e are positive definite a minimum is certainly reached since the denominator in the expression As Q and Π for the optimal control is also positive definite. Plugging the optimal control vector into the L-recursion, vii standard algebra shows that Lφ(z) = z0 Πz, where −1 e (I+A∆) − S0 + (I + A∆)0 Π eB eB e (I + A∆) . Π = ∆R + (I+A∆)0 Π Q + ∆B0 Π ∆S + ∆B0 Π A.12. Second Order Conditions for the L-Recursion. e n+1 is positive definite. Now, consider that Suppose that in n + 1 Πn+1 is positive semi-definite, while Π Lφ(zn ) = min un [∆c(zn , un ) + ((I + A∆) zn + ∆B un )0 Π̃n+1 ((I + A∆) zn + ∆B un )] . e n+1 is positive definite, Since the cost function c is a positive definite quadratic form in un and zn and Π Lφ(zn ) is non-negative and therefore Πn must be positive semi-definite. As in N ΠN is equal to 0, by induction we prove that in any period n the L-recursion has the solution discussed in the proof of Theorem 2, as the second order condition of the minimization is always respected, while the matrix Πn is positive semi-definite. That the cost function c is a positive definite quadratic form in un and zn is a sufficient condition for the DLEQG problem to have the recursive solution presented in Theorem 2, but it is not necessary. If e n+1 B is positive this assumption is abandoned, it will be necessary to verify that the matrix Q + ∆B0 Π definite in any period n. B. Markovian DLEQG Problems in Continuous Time B.1. The Continuous-time Limit. In A.9 we have shown that in the L̃-recursion, with φ(z) = z0 Πz, a maximum is found for ˜ = − 1 δ Π − (∆ N)−1 ρ ∆ −1 e z , with Π e = ((δ ∆ Π)−1 − ρ∆ N)−1 . δ ∆ Π z , so that L̃φ(z) = z0 Π Importantly, whatever Π, for ∆ ↓ 0 a maximum in the L̃-recursion above is found for ˜ = 0. In fact, when ∆ ↓ 0, for 6= 0, −(1/ρ)0 (∆N)−1 ↓ −∞ and hence a maximum for the L̃-recursion with respect e → Π. to is found for ˜ = 0, irrespective of Π. Notice also that for ∆ ↓ 0, Π In A.11 we have analyzed the L-recursion, showing that in the minimization min [∆c + φ((I + A ∆)z + B ∆ u)] , u e z and ∆c = u0 Q ∆ u + z0 R ∆ z + 2u0 S ∆ z, the first order condition is with φ(z) = z0 Π e B u + S z + B0 Π e (I + A ∆) z = 0 , so that 2 ∆ Q u + ∆ B0 Π viii −1 0e e S + B Π (I + A ∆) . u = K z with K = − (Q + ∆ B Π B) 0 e → Π, K → − Q−1 (S + B0 Π). Thus, in the limit a sufficient condition for the DLEQG For ∆ ↓ 0, as Π problem to have the recursive solution discussed in Theorem 3 is that Q being positive definite. Inserting the expression above for u in the minimizing function we find Lφ(z) = z0 Π− z, where e + ∆ R + K0 Q K + 2S0 K + A0 Π + Π e A + 2Π B K = Π Π− e A + K0 B Π e B K + 2A0 Π e BK , + ∆2 A0 Π so that applying the double recursion in n, Fn = LL̃Fn+1 , we find that Πn = e n+1 + ∆ R + K0 Q Kn + 2S0 Kn + A0 Π e n+1 + Π e n+1 A + 2Π e n+1 B Kn + o(∆2 ) , Π n e n+1 = ((δ ∆ Πn+1 )−1 − ρ∆ N)−1 = δ ∆ Πn+1 (I − ρ∆ N δ ∆ Πn+1 )−1 . This implies that where Π Πn − Πn+1 ∆ e n+1 − Πn+1 Π o(∆2 ) 0 0 0e e e + R + Kn Q Kn + 2S Kn + A Πn+1 + Πn+1 A + 2Πn+1 B Kn + ∆ ∆ 2 e n+1 − Πn+1 Π e n+1 + Π e n+1 A + K0 Q Kn + 2S0 Kn + 2Π e n+1 B Kn + o(∆ ) . + R + A0 Π n ∆ ∆ = = Now, given the expression for Kn , K0n Q 0 0e −1 e e + 2 (S + Πn+1 B) Kn = (S + Πn+1 B) 2I − (Q + ∆ B Πn+1 B) Q + o(∆) Kn = 0 0 e n+1 B) − (S + Π e n+1 B) − (S0 + Π 0 e n+1 B)−1 Q + o(∆) 2I − (Q + ∆ B Π e n+1 B)−1 Q 2I − (Q + ∆ B0 Π −1 0e e (Q + ∆ B Πn+1 B) S + B Πn+1 (I + A ∆) = 0 e n+1 B)−1 S + B0 Π e n+1 + o(∆) + o(∆2 ) . (Q + ∆ B0 Π e n+1 = lim∆↓0 Πn+1 = Π(t), we conclude that Therefore, considering that lim∆↓0 Π e n+1 B) Kn lim K0n Q + 2 (S0 + Π ∆↓0 = − (S0 + Π(t) B) Q−1 (S + B0 Π(t)) . In addition, lim ∆↓0 Πn − Πn+1 d Π(t) = − and ∆ dt e n+1 + Π e n+1 A = A0 Π(t) + Π(t) A . lim A0 Π ∆↓0 ix Finally, consider that e n+1 − Πn+1 = Πn+1 δ ∆ (I − ρ∆ N δ ∆ Πn+1 )−1 − I . Π For ∆ small we can write (I − ρ∆ N δ ∆ Πn+1 )−1 = I + ρ∆ N δ ∆ Πn+1 + o(∆2 ), so that e n+1 − Πn+1 = Πn+1 (δ ∆ − 1) I + ρδ 2∆ ∆ N Πn+1 ) + o(∆2 ) . Π It follows that e n+1 − Πn+1 Π ∆ = δ∆ − 1 ∆ Πn+1 + ρ δ 2∆ Πn+1 N Πn+1 + o(∆) . In conclusion, we find that lim ∆↓0 e n+1 − Πn+1 Π ∆ = ln δ Π(t) + ρ Π(t) N Π(t) and that the continuous-time counterpart of the Riccati equation is d Π(t) + R + A0 Π(t) + Π(t) A − (S0 + Π(t) B) Q−1 (S + B0 Π(t)) + ln δ Π(t) + ρ Π(t) N Π(t) dt = 0. B.2. The Value Function in the Continuous-time Limit. In A.8 and A.9 we have seen that V n = νn +Fn (zn ), where νn = γn +δ ∆ νn+1 and γn = − ρ1 ln det I − δ ∆ ρ∆NΠn+1 . We can then write that ∆ δ −1 νn+1 − νn νn+1 + ∆ ∆ The limit of the l.h.s. for ∆ ↓ 0 is ln δ ν(t) + = d ν(t) dt . 1 1 ln det I − δ ∆ ρ∆NΠn+1 . ∆ ρ As for the r.h.s. we need to apply Hôpital’s rule. Recalling that for any invertible matrix Ξ d ln(det(Ξ)) dy = −1 Tr Ξ dΞ dy , we see that d ln det I − δ ∆ ρ∆NΠn+1 d∆ = − δ ∆ (1 + ∆ ln δ) ρ Tr (I − δ ∆ ρ∆NΠn+1 )−1 NΠn+1 . For ∆ ↓ 0 this converges to −ρTr(NΠ(t)). This implies that in the limit νn converges to ν(t) which solves the following differential equation d ν(t) + ln δ ν(t) + Tr(NΠ(t)) dt x = 0. B.3. The Continuous-time Limit of the Kalman Filter. Consider that zn = (I + A∆)zn−1 + B∆un−1 + n , while wn = Czn−1!+ η n . In addition, we know that ∆N L the perturbations (0n , η n ) possess covariance matrix . Then, application of Kalman 1 0 L ∆M filter implies that ẑn (I + A∆)ẑn−1 + B∆un−1 + (L + (I + A∆)Ωn−1 C0 ) = 1 M + C Ωn−1 C0 ∆ −1 (wn − C ẑn−1 ) , where the conditional covariance matrix Ωn respects the Riccati recursion Ωn ∆N + (I + A∆) Ωn−1 (I + A∆)0 − (L + (I + A∆)Ωn−1 C0 ) = 1 M + C Ωn−1 C0 ∆ −1 × (L + (I + A∆)Ωn−1 C0 )0 . The former entails that ẑn − ẑn−1 ∆ Notice that ẑn − ẑn−1 ∆ For ∆ ↓ 0, = Aẑn−1 + Bun−1 1 ∆M + C Ωn−1 C0 −1 1 + (L + (I + A∆)Ωn−1 C0 ) ∆ = ∆ (M + ∆C Ωn−1 C0 ) −1 1 M + C Ωn−1 C0 ∆ −1 dẑ(t) dt , → d ẑ(t) dt (wn − C ẑn−1 ) , , so that = Aẑn−1 + Bun−1 + (L + (I + A∆)Ωn−1 C0 ) (M + ∆C Ωn−1 C0 ) ẑn − ẑn−1 ∆ −1 (wn − C ẑn−1 ) , ẑn−1 → ẑ(t), un−1 → u(t), wn → w(t), Ωn−1 → Ω(t). We conclude that = A ẑ(t) + B u(t) + (L + Ω(t) C0 )M−1 (w(t) − C ẑ(t)) . As for the limit behavior Ωn consider that we can write 1 (Ωn − Ωn−1 ) ∆ For ∆ ↓ 0, = Ωn − Ωn−1 ∆ d Ω(t) dt −1 N + AΩn−1 + Ωn−1 A0 − (L + Ωn−1 C0 ) (M + ∆C Ωn−1 C0 ) → = dΩ(t) dt and hence N + A Ω(t) + Ω(t) A0 − (L + Ω(t) C0 )M−1 (L + Ω(t) C0 )0 . xi (L + Ωn−1 C0 )0 + o(∆) . C. Markovian DLEQG Problems with Deterministic Disturbances C.1. The Recursive Optimization with Deterministic Disturbances to the Plant Equation. Assume that V n+1 = z0n+1 Πn+1 z0n+1 − 2ϑ0n+1 zn+1 + νn+1 . Then, consider exp ρ 2 Vn = h ρ i min E exp ∆cn + δ ∆ V n+1 . un 2 Given that zn+1 is a linear function of n+1 , this expression is equal to exp ρ 2 Vn = min un (2π)−p/2 det(∆N)1/2 Z ρ 1 0 ∆ −1 exp ∆cn + δ V n+1 − n+1 (∆N) n+1 d n+1 . 2 2 Given the expression for V n+1 , the argument inside the exponential can be written as ρ2 κn + Q(n+1 ), where κn contains terms independent of n+1 , while Q(n+1 ) = − 21 0n+1 Q, n+1 + Q0 n+1 with Q, = (I − δ ∆ ρΠn+1 ∆N)(∆N)−1 and e n + B∆un ) − δ ∆ ρ ϑn+1 . Q = δ ∆ ρ Πn+1 (Az n+1 ) = 21 Q0 Q−1 For n+1 = ˆn+1 = Q−1 , Q . Moreover, and , Q , Q(n+1 ) has a maximum equal to Q(ˆ more importantly, even if Q is not an homogeneous quadratic form, it is immediate to see that Q(n+1 ) max Q(n+1 ) − = n+1 1 (n+1 − ˆn+1 )0 Q, (n+1 − ˆn+1 ) . 2 In fact, − 12 (n+1 −ˆn+1 )0 Q, (n+1 −ˆn+1 ) = − 21 0n+1 Q, n+1 − 21 ˆ0n+1 Q, ˆn+1 +ˆ0n+1 Q, n+1 . Substi1 0 1 0 −1 0 tuting ˆn+1 with Q−1 , Q , we find − 2 n+1 Q, n+1 +Q n+1 − 2 Q Q, Q = Q(n+1 )−maxn+1 Q(n+1 ). Then, exp ρ 2 Vn = (2π)−p/2 min det(∆N)1/2 un = (2π)−p/2 det(∆N)1/2 Z ρ 1 exp κn + max Q(n+1 ) − (n+1 − ˆn+1 )0 Q, (n+1 − ˆn+1 ) d n+1 n+1 2 2 Z ρ 1 0 min exp κn + max Q(n+1 ) exp − ∆ Q, ∆ d∆ , un n+1 2 2 where ∆ = n+1 − ˆn+1 . This implies that exp ρ 2 Vn = (2π)−p/2 (2π)p/2 min 1/2 det(∆N) det(Q, )1/2 un exp ρ κn + max Q(n+1 ) n+1 2 Considering that det(Q, ) = det(I − δ ∆ ρΠn+1 ∆N)/ det(∆N), we find that exp ρ 2 Vn = = ρ det(I − δ ∆ ρΠn+1 ∆N)−1/2 exp min κn + max Q(n+1 ) un n+1 2 ρ det(I − δ ∆ ρΠn+1 ∆N)−1/2 exp min max κn + Q(n+1 ) . un n+1 2 xii It is immediate to see that, given the definitions of Q(n+1 ) and S n , ρ κn + Q(n+1 ) 2 = = exp ρ 2 Vn = 1 ∆cn + δ ∆ z0n+1 Πn+1 z0n+1 − 2δ ∆ ϑ0n+1 zn+1 − 0n+1 (∆N)−1 n+1 + δ ∆ νn+1 ρ ρ S n + δ ∆ νn+1 and hence even in this case 2 ρ det(I − δ ∆ ρΠn+1 ∆N)−1/2 exp min max S n + δ ∆ νn+1 . 2 un n+1 ρ 2 This implies that as in the homogenous case (see A.7 and A.8) Vn γn + δ ∆ νn+1 + min max S n , where γn = − = un n+1 1 log(det(I − δ ∆ ρΠn+1 ∆N)) . ρ C.2. The L̃-Recursion with Deterministic Disturbances to the Plant Equation. Suppose φ(z) = δ ∆ z0 Πz − 2δ ∆ ϑ0 z + δ ∆ ν, so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) − 2δ ∆ ϑ0 (z + ) + δ∆ ν − 1 0 −1 ]. ρ (∆N) Taking first derivatives, we find that 2 ˜ = − δ∆ Π − 1 (∆N)−1 ρ + 2δ ∆ (Π z − ϑ) = 0 ⇔ −1 −1 1 1 δ ∆ Π − (∆N)−1 δ ∆ Π z + δ ∆ Π − (∆N)−1 δ∆ ϑ , ρ ρ ie. ˜ = ˜o + ˜e , with ˜o = −Π̌ −1 ∆ δ Πz and ˜e = Π̌ −1 ∆ δ ϑ. Plugging this formula in the expression for L̃φ(z), we find that L̃φ(z) = = − 1 (˜o + ˜e )0 (∆N)−1 (˜o + ˜e ) + (z + ˜o + ˜e )0 δ ∆ Π (z + ˜o + ˜e ) − 2 δ ∆ ϑ0 (z + ˜o + ˜e ) + δ ∆ ν ρ 1 1 − ˜0o (∆N)−1 ˜o + (z + ˜o )0 δ ∆ Π (z + ˜o ) − ˜0e (∆N)−1 ˜e + ˜0e δ ∆ Π ˜e − 2 δ ∆ ϑ0 ˜e + δ ∆ ν ρ ρ | {z } | {z } ez z0 Π independent of z 2 − ˜0e (∆N)−1 ˜o + 2 ˜0e δ ∆ Π (z + ˜o ) − 2 δ ∆ ϑ0 (z + ˜o ) . ρ | {z } linear in z e z” correspond to the function L̃φ(z) of the homogenous formulation (See The terms tagged as “z0 Π A.9), so that we concentrate on those tagged as “independent of z” and “linear z”. Thus, starting from −1 ∆ the terms “independent of z”, consider that, as ˜e = Π̌ xiii δ ϑ, the sum of the first three can be written as δ 2∆ 1 −1 −1 −1 −1 −1 −1 ∆ ϑ − Π̌ (∆N) Π̌ + δ Π̌ Π Π̌ − 2 Π̌ ϑ ρ 1 −1 −1 −1 −1 2∆ 0 ∆ Π̌ − 2 Π̌ ϑ δ ϑ Π̌ δ Π − (∆N) ρ 0 = = −1 − δ 2∆ ϑ0 Π̌ ϑ. This implies that sum of the terms “independent of z” is equal to a constant ν̃, where ν̃ = ∆ δ ν − δ 2∆ −1 −1 1 ∆ −1 ϑ δ Π − (∆N) ϑ = δ ∆ ν + δ 2∆ ρ ϑ0 ∆N I − δ ∆ ρ Π ∆N ϑ. ρ 0 The terms “linear in z” can be re-written as follows 1 0 2 0 ∆ −1 ∆ 0 ˜ (∆N) − ˜e δ Π + δ ϑ (z + ˜o ) + ˜0e (∆N)−1 z . −2 ρ e ρ Given the expression for ˜e we find that the sum of the first three terms is equal to 1 ∆ 0 −1 −1 ∆ 0 −1 ∆ ∆ 0 δ ϑ Π̌ (∆N) − δ ϑ Π̌ δ Π + δ ϑ (z + ˜o ) −2 ρ = 1 −1 ∆ −1 δ Π − (∆N) −2 δ ϑ I − Π̌ (z + ˜o ) = 0 , while ρ | {z } ∆ 0 Π̌ 2 0 −1 ˜ (∆N)−1 z = 2 δ ∆ ϑ0 Π̌ (ρ∆N)−1 z . ρ e 0 −1 e z, with −ϑ e = δ ∆ (− ρ∆N)−1 Π̌ which we can write as −2ϑ −1 (− ρ∆N)−1 Π̌ (−ϑ). Now, = (− ρ∆N)−1 (δ ∆ Π − (ρ∆N)−1 )−1 = (− ρ∆N)−1 [(I + δ ∆ Π(− ρ∆N))(−ρ∆N)−1 )]−1 = (I − δ ∆ ρ Π∆N)−1 . e 0 z+ν̃, with ϑ e = δ ∆ (− ρ∆N)−1 Π̌−1 ϑ = δ ∆ (I−δ ∆ ρΠ ∆N)−1 ϑ e ϑ Then, we conclude that Lφ(z) = z0 Πz−2 −1 and ν̃ = δ ∆ ν + δ 2∆ ρϑ0 ∆N I − δ ∆ ρΠ∆N ϑ. Notice that for Π invertible we can also write e ϑ = δ ∆ ((δ ∆ Π)−1 − ρ ∆N)−1 (δ ∆ Π)−1 ϑ = e Π−1 ϑ . ((δ ∆ Π)−1 − ρ ∆N)−1 Π−1 ϑ = Π xiv C.3. The L-Recursion with Deterministic Disturbances to the Plant Equation. Suppose that φ(z) = z0 Πz − 2ϑ0 z + ν. Then, consider that Lφ(z) = e z + B∆ u + µ∆)0 Π (A e z + B∆u + µ∆) − 2 ϑ0 (A e z + B∆ u + µ∆) + ν] , min [∆c(z, u) + (A c(z, u) = u0 Q u + 2 u0 S z + z R z . u The first order condition is e z + B0 (Π µ∆ − ϑ) = 0 , (Q + ∆B0 Π B) u + (S + BΠA) so that the optimal control is uo + ue where uo = e z, Ko z = − (Q + ∆B0 Π B)−1 (S + B ΠA) ue = Ke (Π µ∆ − ϑ) = − (Q + ∆B0 Π B)−1 B0 (Π µ∆ − ϑ) . Plugging the expressions for uo and ue into Lφ(z), we find that z0 Π−1 z − 2 ϑ0−1 z + ν−1 = z0 R∆ z + (uo + ue )0 ∆Q (uo + ue ) + 2 z0 S0 ∆ (uo + ue ) + e z + B∆(uo + ue ) + µ∆]0 Π [A e z + B∆(uo + ue ) + µ∆] − 2 ϑ0 [A e z + B∆(uo + ue ) + µ∆] + ν = [A e z + B∆uo )0 Π (A e z + B∆uo ) + z0 R∆ z + u0o Q∆ uo + 2 z0 S0 ∆ uo + (A {z } | quadratic in z e z ] + 2 (Π µ∆ − ϑ)0 (Az e + B∆ uo ) + 2 u0 [(Q∆ + B0 ∆ΠB∆)uo + 2 (S∆ + B0 ∆ Π A) {z } | e linear in z u0e Q∆ ue + (B∆ue + µ∆)0 Π (B∆ue + µ∆) − 2 ϑ0 (B∆ue + µ∆) + ν . | {z } independent of z The terms in the right hand side of this equation tagged as “quadratic in z” will correspond to z0 Π−1 z”, the terms tagged as “linear in z” to will correspond to −2ϑ0−1 z, while those tagged “independent of z will correspond to ν−1 . In particular, as z0 Π−1 z e z + B∆uo ]0 Π [A e z + B∆uo ] = z0 R∆ z + u0o Q∆ uo + 2 z0 S0 uo + [A corresponds to the Riccati recursion of the homogenous LQG formulation, standard results indicate that e 0 ΠA e − (S0 + A e 0 ΠB)(Q + ∆B0 ΠB)−1 (S∆ + B0 ∆ΠA) e . Π−1 = R∆ + A Consider hence the sum of the terms “linear in z”, corresponding to −2ϑ0−1 z. Given that uo = Kz and xv e we see that the first term in parentheses is null so that that K = (Q + ∆B0 ΠB)−1 (S + B0 Π A), e + B∆K . −2ϑ0−1 z = 2 (Πµ∆ − ϑ)0 Γ z , i.e. ϑ−1 = Γ0 (ϑ − Π µ∆) with Γ = A Consider hence the sum of the terms “independent of z”, corresponding to ν−1 . We can write ue = Kµ µ∆ + Kϑ ϑ, where Kµ = −(Q + ∆B0 ΠB)−1 B0 Π and Kϑ = (Q + ∆B0 ΠB)−1 B0 , and B∆ue + µ∆ = (I + B∆Kµ )µ∆ + B∆Kϑ ϑ. Substituting out ue and B∆ue + µ∆ and regrouping terms, one finds that ! ν−1 = µ∆ 0 K0µ (Q∆ 0 + B ∆ΠB∆)Kµ + Π + 2K0µ B0 ∆Π µ∆ + ! 0 K0ϑ (Q∆ ϑ 0 0 + B ∆ΠB∆)Kϑ − 2B ∆Kϑ ϑ + ! 2ϑ 0 K0ϑ (Q∆ + B0 ∆ΠB∆)Kµ + K0ϑ B0 ∆Π − I − B∆Kµ µ∆ + ν . Substituting out Kµ and Kθ , we can show that the three terms in parentheses are equal respectively to (I − BJ−1 B0 ∆)Π, −BJ−1 B0 ∆ and −(I − BJ−1 B0 ∆Π), where J = (Q + B0 ΠB∆). This implies that ν−1 = µ∆0 (I − BJ−1 B0 ∆)Πµ∆ − 2 ϑ0 (I − BJ−1 B0 ∆Π)µ∆ − ϑ0 BJ−1 B0 ∆ϑ + ν . C.4. The Value Function with Deterministic Disturbances to the Plant Equation. We just combine results in C.1, C.2 and C.3. From C.1 we know that if V n+1 = z0n+1 Πn+1 z0n+1 − 2ϑ0n+1 zn+1 + νn+1 . Then Vn = γn + min max ∆cn + δ un n+1 ∆ z0n+1 Πn+1 z0n+1 − 2δ ∆ ϑ0n+1 zn+1 1 − 0n+1 (∆N)−1 n+1 + δ ∆ νn+1 ρ with γn = − ρ1 log(det(I − δ ∆ ρΠn+1 ∆N)). From C.2 we know that 1 max ∆cn + δ ∆ z0n+1 Πn+1 z0n+1 − 2δ ∆ ϑ0n+1 zn+1 − 0n+1 (∆N)−1 n+1 + δ ∆ νn+1 = n+1 ρ e0 e n+1 z0 en+1 , where z0n+1 Π n+1 − 2ϑn+1 zn+1 + ν ((δ ∆ Πn+1 )−1 − ρ ∆N)−1 , e n+1 Π = e n+1 ϑ = δ ∆ (I − δ ∆ ρΠn+1 ∆N)−1 ϑn+1 = ((δ ∆ Πn+1 )−1 − ρ ∆N)−1 Π−1 n+1 ϑn+1 , ν̃n+1 = −1 δ ∆ νn+1 + δ 2∆ ρ ϑ0n+1 ∆N I − δ ∆ ρ Πn+1 ∆N ϑn+1 . xvi , Then, from C.3 we find that V n = z0n Πn z0n − 2ϑ0n zn + νn , where Πn = e 0Π e n+1 A e − (S0 + A e 0Π e n+1 B)(Q + ∆B0 Π e n+1 B)−1 (S∆ + B0 ∆Π e n+1 A) e , R∆ + A ϑn = e n+1 − Π e n+1 µ Γ0n (ϑ n+1 ∆) , νn = e−1 B0 ∆)Π e n+1 µn+1 ∆ γn + ν̃n+1 + µ0n+1 ∆(I − BJ n+1 e 0 (I − BJ e0 e−1 B0 ∆Π e n+1 )µ e−1 0 e − 2ϑ n+1 ∆ − ϑn+1 BJn+1 B ∆ϑn+1 n+1 n+1 en+1 = (Q + B0 Π e n+1 B∆). and J C.5. The Continuous-time Limit with Pre-determined Disturbances. In C.3 we have seen that if zn = (I + A∆)zn−1 + B∆un−1 + µn ∆ + n , then, ϑ0n = e 0 [ I + ∆ (A + B Kn ) ] − ∆µ0 Π e ϑ n+1 n+1 n+1 [ I + ∆ (A + B Kn ) ] or equivalently ϑn − ϑn+1 ∆ e n+1 − ϑn+1 ϑ e n+1 − Π e n+1 µn+1 + o(∆) . + (A + B Kn )0 ϑ ∆ = e n+1 e n+1 = lim∆↓0 Πn = Π(t), lim∆↓0 ϑ Now, lim∆↓0 Π n+1 lim∆↓0 ϑn −ϑ ∆ = − d ϑ(t) dt . In addition, lim∆↓0 Kn = −Q = lim∆↓0 ϑn = ϑ(t), lim∆↓0 µn+1 = µ(t), −1 (S + B0 Π(t)). We conclude that lim (A + B Kn ) = A − B Q−1 S − JΠ(t) , e n+1 µ lim Π n+1 = Π(t) µ(t) , ∆↓0 ∆↓0 e n+1 = Π e n+1 Π−1 ϑn+1 , with J = B Q−1 B0 . Since ϑ n+1 e n+1 − ϑn+1 ϑ = e n+1 Π−1 − I) ϑn+1 . (Π n+1 e n+1 = (I − ρ ∆ δ ∆ Πn+1 N)−1 δ ∆ Πt+1 , so that In addition, Π e n+1 Π−1 − I) (Π n+1 = (I − ρ ∆ δ ∆ Πn+1 N)−1 δ ∆ − I . Since for ∆ small we can write (I − ρ ∆ δ ∆ Πn+1 N)−1 = I + ρ ∆ δ ∆ Πn+1 N + o(∆2 ) we find that e n+1 Π−1 − I) (Π n+1 = (I + ρ ∆ δ ∆ Πn+1 N + o(∆2 )) δ ∆ − I (δ ∆ − 1) I + ρ ∆ δ 2∆ Πn+1 N + o(∆2 ) . It follows that ∆ (δ − 1) = I + ρ δ 2∆ Πn+1 N + o(∆) ϑn+1 ∆ = e n+1 − ϑn+1 ϑ ∆ xvii and that lim ∆↓0 e n+1 − ϑn+1 ϑ ∆ = ln δ ϑ(t) + ρ Π(t) N ϑ(t) . Summing up terms we find that d ϑ(t) + ln δ ϑ(t) + ρ Π(t) N ϑ(t) + [A − B Q−1 S − J Π(t)]0 ϑ(t) − Π(t) µ(t) = 0 , dt which can also be written as d ϑ(t) e 0 ) ϑ(t) − Π(t) µ(t) = 0 , with + (ln δ + Γ(t) dt e Γ(t) = A − B Q−1 S − (J − ρ N )Π(t) . Finally, consider the term νn . We see from C.4 that νn = e−1 ∆B0 )Π e n+1 µn+1 ∆ γn + ν̃n+1 + ∆µ0n+1 (I − BJ n+1 e 0 BJ e n+1 , e 0 (I − BJ e−1 ∆B0 ϑ e−1 ∆B0 Π e n+1 )∆µn+1 − ϑ − 2ϑ n+1 n+1 n+1 n+1 where en+1 J = e n+1 B∆) (Q + B0 Π e n+1 ϑ = δ ∆ (I − δ ∆ ρ Πn+1 N∆)−1 ϑn+1 , ν̃n+1 = δ ∆ νn+1 + δ 2∆ ρ ϑ0n+1 N∆ I − δ ∆ ρ Πn+1 N∆ γn = − −1 ϑn+1 , 1 ln det I − δ ∆ ρ∆NΠn+1 . ρ This is equivalent to νn − δ ∆ νn+1 ∆ = −1 1 γn + δ 2∆ ρ ϑ0n+1 N I − δ ∆ ρ Πn+1 N∆ ϑn+1 ∆ e0 µ e0 e−1 0 e − 2ϑ n+1 n+1 − ϑn+1 BJn+1 B ϑn+1 + o(∆) . The l.h.s. can be written as νn − νn+1 1 − δ∆ νn − νn+1 ∂ν(t) 1 − δ∆ + νn+1 with lim = − and lim νn+1 = − ν(t) ln δ . ∆↓0 ∆↓0 ∆ ∆ ∆ ∂t ∆ so that the l.h.s. converges to −( ∂ν(t) ∂t + ln δ ν(t)). For the r.h.s. consider that in B.2 we have shown that xviii 1 ∆ γn e n+1 = lim∆↓0 ϑn+1 = ϑ(t), lim∆↓0 µ = Tr(NΠ(t)). Finally, recall that lim∆↓0 ϑ n+1 = µ(t) −1 −1 ∆ e and consider that lim∆↓0 J = −Q and lim∆↓0 (I − δ ρΠn+1 N∆) = I. Thus, we conclude that the lim∆↓0 n+1 r.h.s. converges to Tr(NΠ(t)) − 2 ϑ(t)0 µ(t) − ϑ(t)0 (BQB0 − ρN)ϑ(t) so that ν(t) respects the following differential equation ∂ν(t) + ln δ ν(t) + Tr(NΠ(t)) − 2 ϑ(t)0 µ(t) − ϑ(t)0 (BQ−1 B0 − ρN)ϑ(t) = 0 . ∂t D. Optimal Policy for a Risk-averse Monopolist D.1. Optimal Production Policy for a Risk-averse Entrepreneur. Given that Q = q, R = 0, S = −1/2, A = a, B = b, z = p and u = x, the continuous-time risk-sensitive Riccati equation is 1 d π(t) + 2 a π(t) − dt q which we can re-write as d π(t) dt 1 2 b π(t) − 2 + ρ σ2 π(t)2 + ln δ π(t) = 0, = h0 + h1 π(t) + h2 π(t)2 , with h0 = 1/(4q), h1 = −(2a + b/q + ln δ), h2 = b2 /q − ρσ2 , and transformed into a homogeneous ordinary differential equation of order two, d y(t) d y(t) d2 y(t) − h1 + h0 h2 y(t) d2 t dt 0 , with π(t) = − = 1 dt . h2 y(t) Assume then that y(t) = m exp(ζ t). We have a solution of the ODE iff ζ 2 m exp(ζ t) − ζ h1 m exp(ζ t) + h0 h2 m exp(ζ t) = 0 , ie. iff m ζ 2 − m h1 ζ + m h0 h2 = 0. ( This admits two roots equal to ζ = ζ1 = ζ2 = m1 exp(ζ1 t) + m2 exp(ζ2 t). Given that π(t) = π(t) = − 2 ( bq 1 2 h1 + 1 2 h1 − − h12 d y(t) dt y(t) 1 2 √ √ 1 2 D D , with D = h21 − 4h0 h2 . Thus, y(t) = , we can write that m1 ζ1 exp(ζ1 t) + m2 ζ2 exp(ζ2 t) . − ρ σ2 ) (m1 exp(ζ1 t) + m2 exp(ζ2 t)) We can impose the terminal condition π(T ) = 0 to find that m1 ζ1 exp(ζ1 T ) + m2 ζ2 exp(ζ2 T ) = 0 ⇔ m2 = − xix √ ζ1 ζ1 m1 exp((ζ1 − ζ2 ) T ) = − m1 exp( D T ) . ζ2 ζ2 Re-inserting this expression in that for π(t) we find that π(t) = − = − = − = − ( bq 1 − ρ σ2 ) 2 ( bq ζ1 − ρ σ2 ) 2 ( bq ζ1 − ρ σ2 ) 2 ( bq ζ1 − ρ σ2 ) 2 as ζ1 − ζ2 = D. Notice, that for 2 b q b2 q ! √ ζ1 exp(ζ1 t) − ζ1 exp( D T ) exp(ζ2 t) √ exp(ζ1 t) − ζζ21 exp( D T ) exp(ζ2 t) ! √ 1 − exp( D T ) exp(−(ζ1 − ζ2 ) t) √ 1 − ζζ12 exp( D T ) exp(−(ζ1 − ζ2 ) t) ! √ 1 − exp(− D (t − T )) √ 1 − ζζ12 exp(− D (t − T )) ! √ exp( D (T − t)) − 1 √ , ζ1 ζ2 exp( D (T − t)) − 1 > ρσ2 0 < ζ2 < ζ1 . It immediately follows that π(t) < 0, while for < ρσ2 , ζ2 < 0 < ζ1 . Even in this case π(t) < 0 as that for 2 b q 2 ( bq ζ1 −ρ σ2 ) and √ exp( D (T −t))−1 √ exp( D (T −t))−1 ζ1 ζ2 change sign. Notice = ρσ2 , ζ2 = 0 and the solution turns degenerate and collapses to the static solution. In fact, in this case ζ2 = 0 and ζ1 = h1 , so that y(t) = m1 exp(h1 t) + m2 and hence d y(t)/d t = m1 h1 exp(h1 t). The terminal condition π(T ) = 0 entails that m1 = 0 and this on turn implies that π(t) = 0. D.2. Riccati Equation for the Conditional Variance of the Commodity Price. The conditional variance of p(t) solves the continuous time Riccati equation dσp2 (t) dt = σ2 + 2aw(t) − 1 4 σ (t) . ση2 p Even this can be transformed into an ordinary differential equation d2 y(t) d y(t) σ2 − 2a − y(t) d2 t dt ση2 0 , with σp2 (t) = ση2 = d y(t) 1 . d t y(t) The ordinary differential equation will present a general solution y(t) = m exp(ιt) iff m exp(ι t) ι2 − 2a m exp(ι t)ι − σ2 m exp(ι t) ση2 = 0 , ie. iff σ2 m ση2 = 0. m ι2 − 2am ι − ( ι1 = a + q ι2 = a − q This admits two roots equal to ι = a2 + σ2 ση2 a2 + σ2 ση2 . xx Thus, y(t) = m1 exp(ι1 t) + m2 exp(ι2 t). Given that σp2 (t) = ση2 d y(t) dt y(t) , we can write that σp2 (t) = ση2 m1 ι1 exp(ι1 t) + m2 ι2 exp(ι2 t) m1 exp(ι1 t) + m2 exp(ι2 t) = ση2 θ ι1 exp(ι1 t) + ι2 exp(ι2 t) . θ exp(ι1 t) + exp(ι2 t) Imposing the initial condition that σp2 (0) = σ02 and substituting out ι1 and ι2 we find that s σ02 = ση2 a+ σ2 θ − 1 a2 + 2 ση θ + 1 ! which is equivalent to σ02 ση2 −a q a2 + σ2 ση2 θ−1 = θ+1 ⇐⇒ θ = xxi a+ σ2 ση2 1/2 a+ σ2 ση2 1/2 + σ02 ση2 − σ02 ση2 −a . −a
© Copyright 2026 Paperzz