Pessimistic Optimal Choice for Risk

Pessimistic Optimal Choice for Risk-averse Agents:
The Continuous-time Limit∗
Paolo Vitale
University of Pescara†
July 2014
∗
I wish to thank Fausto di Biase and seminar participants at LUISS University, the University of Bologna, the
University of Tor Vergata, the EIEF Center in Rome, and the 2014 John Cabot University Economic Theory Workshop.
†
Department of Economics, Università Gabriele d’Annunzio, Viale Pindaro 42, 65127 Pescara (Italy); telephone: ++39-085-453-7647; fax: ++39-085-453-7565; webpage: http:/www.unich.it/~vitale; e-mail:
[email protected]
Pessimistic Optimal Choice for Risk-averse Agents:
The Continuous-time Limit
ABSTRACT
We extend Hansen and Sargent’s (Hansen and Sargent, 1994, 1995, 2013) analysis of
dynamic optimization with risk-averse agents in two directions. Firstly, following Whittle
(Whittle, 1990), we show that the optimal risk-averse policy is identified via a pessimistic
choice mechanism and described by simple recursive formulae. Secondly, we investigate
the continuous-time limit and show that sufficient conditions for the existence of optimal
solutions coincide with those which apply under risk-neutrality. Our analysis is conducted
both under perfect and imperfect state observation. As an illustrative example, we analyze
the optimal production policy of an entrepreneur running a monopolistic firm which faces
a demand schedule subject to stochastic shocks, showing that risk-aversion induces her to
act more aggressively.
JEL Classification Numbers: C61.
Keywords: Pessimistic Agents, Time-discounting, Linear Exponential Quadratic Gaussian.
Pessimism: the tendency to
expect the worst in all things
Collins New Dictionary
Introduction
Risk-aversion is an important aspect of agents’ preferences, which heavily influences their actions, in particular when the economic environment is complex and uncertain, and agents
need to consider the future implications of their decisions. Investigating the nexus between
risk-aversion and agents’ behavior is a challenge, in that optimal control problems are typically difficult to solve under risk-aversion. Researchers have usually relied on linear-quadratic
formulations as they can be easily solved employing well-established results. Thus, within a
linear-quadratic set-up straightforward recursive formulae immediately yield the optimal policy, while according to the certainty equivalence principle unknown variables can be replaced
by their maximum likelihood estimates.
Linear-quadratic formulations are however problematic, as the convexity of the quadratic
cost function represents risk-aversion in an unsatisfactory manner. In fact, the prescribed
optimal policy does not change when the environmental uncertainty varies, while risk-averse
agents ought to care for the degree of uncertainty they face. In addition, in many economic
problems, such as the production problem we investigate in Section 2, a quadratic objective
function corresponds to the assumption of risk-neutrality on the part of the optimizing agent.
To correct for these shortcomings, Whittle (1990) introduces a risk-adjustment in the objective function of the standard linear-quadratic set-up moving from a linear-quadratic to a
linear-exponential-quadratic formulation. In doing so he augments the convexity of the objective function and the degree of risk-aversion of the optimizing agent. Within his formulation,
he shows that: i) the optimal policy is identified via a pessimistic (or worst-case) choice mechanism; ii) risk-sensitive (or modified) certainty equivalence and separation principles hold; and
iii) recursive formulae describe the optimal policy. Despite its versatility, few researchers have
employed Whittle’s methodology in economics (exceptions are Mamaysky and Spiegel, 2002;
van der Ploeg, 2009, 2010; Vitale, 1995, 2012; Zhang, 2004). This is because an important
limitation of his methodology is that it does not allow to consider time-discounting in a satisfactory way.
1
Hansen and Sargent (1994, 1995, 2013) have introduced time-discounting into risk-sensitive
optimal control problems by formulating a recursive optimization criterion à la Epstein and
Zin. We extend their contribution in two ways. Firstly, we manipulate their optimization criterion in order to exploit a number of results developed by Whittle. Secondly, we consider the
continuous-time limit of their discrete-time formulation. This allows to establish that sufficient
conditions for the existence of optimal solutions coincide with those which apply under riskneutrality. Indeed, an important feature of Hansen and Sargent’s linear-exponential-quadratic
formulation in discrete-time is that for a sufficiently high degree of risk-aversion no optimal
policy exists. On the contrary, in the continuous time limit an optimal solution always exists if
some regularity conditions are met, irrespective of the degree of risk-aversion.
This paper is organized as follows. In Section 1 we reformulate Hansen and Sargent’s recursive optimization criterion for the class of Markovian discounted linear exponential quadratic
Gaussian (DLEQG) problems so as to apply, with simple adjustments, results originally derived
by Whittle for the class of linear exponential quadratic Gaussian (LEQG) problems. In this
Section we see how Whittle’s risk-sensitive certainty equivalence principle (risk-sensitive CEP)
is maintained while the corresponding Riccati equation, which yields the optimal policy for the
class of Markovian LEQG problems, is modified.
In Section 2 we investigate the continuous-time limit of the DLEQG problem, showing that
in the limit the optimal policy is characterized by a modified version of the differential Riccati
equation which applies to the corresponding LEQG formulation. We are then able to show
that in the continuous-time limit if the quadratic cost function in the DLEQG formulation is
positive definite an optimal solution always exists irrespective of the degree of risk-aversion of
the optimizing agent.
In this Section we also discuss, as an illustrative example of the Markovian DLEQG problem
in continuous-time, the optimal production policy of a risk-averse entrepreneur which runs
a monopolist firm facing a demand schedule subject to stochastic shocks for the commodity
it produces. Interestingly, risk-aversion makes the entrepreneur more aggressive. Given her
preferences, the entrepreneur finds it optimal to systematically produce a larger quantity than
that selected by her risk-neutral counterpart. This is because, despite a larger supply of the
commodity depresses its price and jeopardizes future profits, it also reduces their variability.
Consequently, a risk-averse entrepreneur is willing to gain smaller profits (in order to reduce
their variability) and to produce a larger quantity of the commodity than a risk-neutral counterpart.
2
This may appear counter-intuitive as it contradicts results typically obtained within static
formulations. However, such a feature of the impact of risk-aversion on the behavior of economic agents in dynamic optimization exercises appears elsewhere. For instance, Holden and
Subrahmanyam (1994) and Vitale (1995, 2012) find that risk-aversion makes a privately informed strategic agent trade more aggressively in a sequential call auction market, while the
same trader would be more cautions in a one-shot call auction market. In other words, a point
which is worth emphasizing is that the Markovian DLEQG formulation allows to derive implications of risk-aversion which are both general and stark. Indeed, not only risk-averse agents
are pessimistic, but also bold.
In Section 3 we investigate the class of DLEQG problems under imperfect state observation.
Here, Whittle’s risk-sensitive separation principle (risk-sensitive SP), which allows to separate
control and estimation, is reformulated for the class of DLEQG problems and it is applied to
the example discussed in Section 2. In Section 4 we extend our analysis to the formulation in
which the state vector is subject to pre-determined disturbances. A final Section offers some
concluding remarks.
1
Discounted LEQG Problems
We define a specific class of optimal control problems, which are characterized by: i) a Markovian linear dynamic structure for a vector of state variables, zn ; ii) a multi-normal distribution
for an innovation vector, n ; and iii) a recursive optimization criterion à la Epstein and Zin.
Suppose the interval of time [0, T ] is divided in N sub-periods (n = 1, 2, . . . , N ) of length
∆ = T /N . For ∆ finite (e.g. ∆ = 1) we will have a discrete-time formulation, while for ∆ ↓ 0
we converge to the continuous-time analogous. We then introduce the following Definition:
Definition 1 An optimal control problem is said to be Markovian linear exponential quadratic
Gaussian with time-discounting if the following recursive optimization
Vn
=
min
un
ρ
i
2 h
ln En exp δ ∆ V n+1
,
∆cn +
ρ
2
(1.1)
where ρ (with ρ > 0) is the risk-enhancement coefficient, δ (with 0 < δ < 1) is the timediscounting factor, ∆cn is the (per-period) scalar-valued cost function and V n is the value function
(with terminal condition V N +1 = 0), is solved in periods n = 1, 2, . . . , N with respect to the free-
3
valued control vector un under the conditions that:
(i) the cost function, cn , is a quadratic form in the control vector, un , and the state vector, zn ,
cn = u0n Q un + z0n R zn + 2 u0n S zn ,
(ii) the state vector, zn , is governed by the following linear plant equation
zn = (I + A∆) zn−1 + B∆ un−1 + n , where n ∼ N [0, ∆N] and n ⊥ n0 .
The parameter ρ represents a risk-enhancement coefficient, in that it introduces extra convexity vis-a-vis that of the objective function of the Markovian linear quadratic Gaussian problem with time-discounting (DLQG problem henceforth). It can be shown that the convexity of
∆cn + ρ/2 ln En exp δ ∆ ρ2 V n+1
is increasing in ρ, while limρ↓0 ρ2 ln En exp δ ∆ ρ2 V n+1
= δ ∆ En [V n+1 ]. This indicates that in the limit, for ρ ↓ 0, the solution of the recursive optimization in Definition 1 converges to that of the standard dynamic programming recursion of
a discounted Markovian optimal control problem, i.e. V n = minun {∆cn + δ ∆ En [V n+1 ]}.1
By increasing the degree of risk-aversion of the optimizing agent, the DLEQG problem is
better suited to capture the impact of risk-aversion on agents’ decisions than the standard
DLQG formulation. In particular, the optimal policy is no longer independent of the agents’
degree of uncertainty (i.e. differently from what happens in the DLQG formulation, the optimal
policy will depend on the covariance matrix of the innovation vector, ∆N). Furthermore, it
allows to introduce risk-aversion in those DLQG problems, such as the one discussed in Section
2, where agents are risk-neutral.
Our formulation of the optimization criterion is an extension of that proposed by Hansen
and Sargent (1994, 1995, 2013) (it collapses to their formulation for ∆ = 1 and ρ0 = δρ),
which preserves the properties of monotonicity and convexity of their criterion and allows to
accommodate a continuous-time limit and to exploit a number of useful results derived by
Whittle (1990) for the class of LEQG problems.
The optimization criterion (1.1) is not a special characterization of preferences, as it can
be employed in the analysis of several economic issues (Hansen and Sargent, 2013; Hansen,
Sargent, and Tallarini, 1999; Luo, 2004; Luo and Young, 2010). Moreover, it corresponds to
the recursive preferences of Epstein and Zin (1991) when the elasticity of inter-temporal sub1
These and other results are proved in a separate Appendix.
4
stitution is 1, ρ is the coefficient of relative risk-aversion and log consumption is approximated
by a quadratic form of the state and control vectors (Tallarini, 2000).
In Whittle’s formulation of the LEQG problem there is no time-discounting, in that in any
period n the aggregate cost function ln En exp ρ C2
with C a time-separable function,
PN
C =
n=1 ∆cn , is minimized with respect to the control vector un . To accommodate time
PN
∆ n ∆c .
discounting one could consider the modified time-separable function C =
n
n=1 δ
This would be unsatisfactory because: i) the impact of risk-aversion would dissipate over-time
(Bouakiz and Sobel, 1984); and in the continuous-time limit the optimal control rule would
coincide with the risk-neutral counter-part. Therefore, as suggested by Hansen and Sargent, a
recursive optimization criterion such as (1.1) is called for. However, whereas our optimization
criterion differs from Whittle’s, we are still able to preserve most of his results with some minor
adjustments.2
To show this we notice that the following holds:
Lemma 1 The recursive optimization criterion (1.1) can be equivalently formulated as follows
Vn
=
n h
ρ
io
2
∆
ln min En exp
(∆cn + δ V n+1 )
.
un
ρ
2
(1.2)
Borrowing Whittle’s intuition we introduce the concept of (discounted) stress:
Definition 2 In n the (discounted) stress is S n ≡ ∆cn −
1
ρ
dn+1 + δ ∆ V n+1 , where dn is a per-
period discrepancy function equal to 0n (∆N)−1 n for n = 1, 2, . . . , N and 0 for n = N + 1.
This concept is similar to that originally introduced by Whittle. Our notion differs in two
respects: firstly, it covers only periods n and n + 1; secondly, the value function in period n + 1,
V n+1 , is pre-multiplied by the per-period discount factor δ ∆ . The (discounted) stress is useful
in that we can rely on the following Lemma, which adapts a result firstly outlined by Whittle
for the class of LEQG problems:
Lemma 2 In a Markovian DLEQG problem if the value function in n + 1, V n+1 , is a quadratic
form in the state vector zn+1 and the (discounted) stress S n satisfies a saddle point condition with
2
For δ ↑ 1 V n does not converge to Whittle’s aggregate cost function, since in the optimization criterion (1.1)
the minimization argument does not contain the past cost components, cm with m < n. However, for δ ↑ 1 the
optimal policy for the optimization criterion (1.1) converges to that for Whittle’s Markovian LEQG problem, in that
in period n cm , with m < n, is deterministic and constant with respect to un .
5
respect to n+1 and un , so that minun maxn+1 S n exists, the following proportionality holds
min En
un
h
i
ρ
exp
(∆cn + δ V n+1 )
∝ exp
min max S n ,
2
2 un n+1
ρ
∆
where the proportionality constant is independent of the state vector zn , while the value function
V n is a quadratic form in zn equal to the extremized stress, minun maxn+1 S n , plus a constant
independent of zn .
To prove this Lemma we first need to apply the following result:
Lemma 3 If Q(u, ) is a quadratic form in the vectors u and which admits the saddle point
maxu min Q(u, ), the following proportionality holds
Z
min
u
1
exp − Q(u, )
2
1
d ∝ exp − max min Q(u, )
2 u
.
It is worth noticing that the order of the extremizing operations is irrelevant in the determination of the saddle point if the two conditions Q > 0 and Qu u − Qu Q−1
Q u < 0
hold.3 More importantly, analogous results apply when Q is a non-homogeneous quadratic
form, which depends on u and alongside a third vector z, insofar it admits a saddle point
maxu min Q(u, , z).
We are now ready to prove Lemma 2.
Proof of Lemma 2. Consider that if V n+1 is a quadratic form in zn+1 , as the latter is linearly
dependent on n+1 ,
h
ρ
i
min En exp
(∆cn + δ ∆ V n+1 )
un
2
Z
∝
=
ρ
1
min exp
(∆cn + δ ∆ V n+1 ) − 0n+1 (∆N)−1 n+1
un
2
2
Z
Sn
dn+1 .
min
exp ρ
un
2
Since V n+1 is as a quadratic form in n+1 , un and zn , so is S n . If the stress in n admits the
saddle point minun maxn+1 S n , then −S n admits the saddle point maxun minn+1 {−S n }. We
can apply Lemma 3 for Q(un , n+1 ) = −ρS n , so that
Z
min
un
3
Sn
exp ρ
dn+1
2
∝
exp
1
− max min (−ρS n )
2 un n+1
= exp
ρ
min max S n
2 un n+1
The quadratic form is extremized when it is maximized with respect to and minimized with respect to u.
6
,
dn+1
where the constant of proportionality depends on the covariance matrix of the vector n+1
and it is independent of both un and zn . As S n is a quadratic form in n+1 , un and zn ,
from the saddle point condition we conclude that the extremized stress, minun maxn+1 S n , is
a quadratic form in zn , while V n in equation (1.2) is equal to the extremized stress plus a
constant independent of zn . The requirement that the stress satisfies a saddle point condition may actually not hold. As
it will be clearer later, for a sufficiently large degree of risk-aversion (i.e. for ρ large enough)
the stress in period n will not be negative definite in n+1 indicating that the saddle point
condition cannot be met and that the DLEQG problem does not present an optimizing solution,
in that the value of V n becomes infinite. In other words, while suggesting a way to solving
Markovian DLEQG problems, this Lemma also establishes that under extreme circumstances
such problems are not well-behaved and their optimization is meaningless. However, we will
see that such a problem vanishes in the continuous-time limit, as for ∆ ↓ 0 a saddle point
always exists under the same sufficient conditions which apply to the risk-neutral formulation
(specifically that Q being positive definite).
As corollary of Lemma 2 we establish a simplified version of Whittle’s risk-sensitive certainty
equivalence principle (RSCEP) which is useful in addressing Markovian DLEQG problems.
Theorem 1 - (Risk-sensitive Certainty Equivalence Principle). In a Markovian DLEQG problem, if
the saddle point condition for the stress is respected in all future periods, i.e. minun+j maxn+j+1 S n+j
exists for j = 0, 1, . . . , N − n, the optimal value of the vector un is determined in period n by simultaneously minimizing S n with respect to un and maximizing it with respect to n+1 . The value
function is then proportional to the extremized stress, V n ∝ minun maxn+1 S n .
Proof. Notice that in N cN is a positive definite quadratic form in uN , while V N +1 = dN +1 = 0.
This implies that S N is a quadratic form in uN and N +1 and hence that the conditions to apply
Lemma 2 are met, so that the saddle point condition for S N yields the optimal control uN , with
the extremized stress, minuN maxN +1 S N , and the value function, V N , both quadratic forms
in zN . By backward induction the statement is established. Theorem 1 is particularly useful in that it suggests that, when the stress is well-behaved
and hence the DLEQG problem admits a meaningful solution, to pin down the optimal policy
it is sufficient to impose recursively the saddle point condition for S n . The recursion starts in
period N and proceeds backward. Therefore, if the saddle point condition is met in periods
N , N − 1, . . ., n + 1, in n the optimal policy is derived by first maximizing S n with respect to
7
the innovation vector n+1 and then by minimizing the resulting expression with respect to the
control vector un .
An economic interpretation of Theorem 1 is that a risk-averse agent whose preferences are
represented by the optimization criterion (1.1) attempts to hedge against the worst possible
values for the vector n+1 , by following a min-max strategy according to which she selects un
to minimize her welfare loss (i.e. the stress) against the most unfavorable innovation vector
n+1 . Such an agent acts as if she were pessimistic, considering these worst-case realizations
very likely. Consequently she tunes her actions on their impact on her welfare, applying what
we term, borrowing Whittle’s terminology, a pessimistic choice mechanism. Theorem 1 revises
the certainty equivalence principle of the Markovian DLQG problem: the normally distributed
unobservable variables are no longer replaced by their maximum likelihood (ML) estimates,
but by those that maximize the stress in order to compensate for risk-aversion.
In other words, in the Markovian DLQG problem the separation principle between optimization over the control vector and estimation of the unknown values applies, in that the
control vector is chosen as it would be in the perfect information case, with the unobservable
values replaced by their ML estimates. On the contrary, in the DLEQG problem the derivation
of the optimal control and the optimal estimation of the unknown values are intertwined, as
the optimal control and optimal estimates are chosen in order to extremize the stress. Indeed,
differently from the Markovian DLQG problem, uncertainty over the innovation vector n+1
conditions the optimal choice of the control vector, un . Specifically, the statistical characteristics of n+1 , and hence its covariance matrix ∆N, influence the optimal value of the vector un .
Viceversa, the cost function and the degree of risk-aversion affect the optimal estimate of n+1 ,
which no longer corresponds to the ML estimate.
Theorem 1 indicates that because of the recursive structure of the Markovian DLEQG problem the extremization of the stress can be undertaken recursively. In particular, starting from
N one proceeds backward imposing the saddle point conditions for the stress in periods N ,
N − 1, . . ., 1. In this respect, we adapt to the Markovian DLEQG problem a result originally
derived by Whittle.
Lemma 4 The saddle point conditions for the stress S n , with n = 1, 2, . . . , N , can be satisfied by
solving the following (discounted) future stress backward recursion
Fn (zn )
=
min
un
max
n+1
∆cn −
8
1
dn+1 + δ ∆ Fn+1 (zn+1 )
,
ρ
(1.3)
where Fn (zn ), denoted as the extremized (discounted) future stress, is a quadratic form in the
state vector zn , Fn (zn ) ≡ z0n Πn zn with ΠN +1 ≡ 0. The value function in n is V n = νn + Fn (zn ),
where νn is independent of zn .
Proof. Let us start from N . By definition V N +1 = 0 and dN +1 = 0. Given that cN is a quadratic
form in uN and zN , we immediately see that: i) imposing the saddle point condition for the
stress in N , S N , is equivalent to solving recursion (1.3); and ii) there exist a matrix ΠN and a
constant νN independent of zN such that the extremized future stress is FN (zN ) ≡ z0N ΠN zN
and that exp(ρV N /2) = exp( 12 ρ[νN + FN (zN )]). Proceeding backward, the optimal control
vector in period N − 1 is obtained by imposing the following saddle point condition
min max S N −1 = min
uN −1
N
uN −1
1
∆
.
max ∆cN −1 − dN + δ V N
N
ρ
Since V N = νN + FN (zN ) and νN is independent of zN , this is equivalent to imposing the
saddle point condition
1
∆
.
min max ∆cN −1 − dN + δ FN (zN )
uN −1
N
ρ
Given that cN −1 is a quadratic function in uN −1 and zN −1 , dN is a quadratic form in N and
FN (zN ) is a quadratic form in zN while this is linear in uN −1 , zN −1 and N , we find that
the result of this extremization is given by a quadratic form of zN −1 , so that there exists a
matrix ΠN −1 such that FN −1 (zN −1 ) = z0N −1 ΠN zN −1 and exp(ρV N −1 /2) = exp( 21 ρ[νN −1 +
FN −1 (zN −1 )]). Since the same argument applies in any other period n as long as Fn+1 is a
quadratic form in zn+1 , by backward induction the statement is proved. In conclusion, a recursion similar to the Bellman equation for the value function of dynamic
programming is established: given the extremized future stress in period n + 1, the optimal
policy in n is obtained by solving the future stress recursion.4 Similarly to the Markovian DLQG
problem, the extremized future stress is a quadratic form in the state vector, Fn (zn ) = z0n Πn zn ,
while it can be established that the optimal policy is linear in the state vector, un = Kn zn ,
where Πn and Kn respect recursions which correspond to modified versions of the Riccati
recursions for the Markovian DLQG problem. Such a recursion is analogous to that provided
by Hansen and Sargent (1994, 1995, 2013).5
4
For δ ↑ 1 the future stress recursion (1.3) converges to the recursion originally derived by Whittle for the
Markovian LEQG problem. This confirms that the optimal policy for our optimization criterion converges to that
for Whittle’s Markovian LEQG problem for δ ↑ 1. In other words, the DLEQG problem encompasses the LEQG one.
5
Interestingly, they also propose a similar result when they find that solutions to their Markovian DLEQG prob-
9
Applying Lemma 4 we can establish the following Theorem, which extends Whittle’s risksensitive Riccati equation to the class of Markovian DLEQG problems and reveals the nexus
with the common solutions to the standard Markovian DLQG problem:
Theorem 2 - (Risk-sensitive Riccati Equation). If the matrix (δ ∆ Πn+1 )−1 − ρ∆N is positive
definite, in period n the extremized (discounted) future stress exists and it is given by
Fn (zn ) = z0n Πn zn for the optimal control un = Kn zn ,
where
(1.4)
e 0Π
e n+1 A
e − (S0 + A
e 0Π
e n+1 B)(Q + ∆B0 Π
e n+1 B)−1 (∆S + ∆B0 Π
e n+1 A)
e (1.5)
Πn = R∆ + A
,
Kn
=
e n+1 B)−1 (S + B0 Π
e n+1 A)
e ,
− (Q + ∆B0 Π
(1.6)
e n+1 = ((δ ∆ Πn+1 )−1 − ρ ∆N)−1
Π
(1.7)
e = (I + A∆) .
and A
Proof. In the Markovian DLEQG problem the extremized future stress Fn (zn ) respects the
double recursion Fn = L L̃ Fn+1 , based on the following two operators
e + Bu)] and L̃ φ(z) = max [φ(z + ) −
L φ(z) = min [∆c(z, u) + φ(Az
u
where φ(z) = δ ∆ z0 Πz, so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) −
1 0
(∆N)−1 ] ,
ρ
1 0
−1
ρ (∆N) ].
Taking
first derivatives, we find that
˜ = − (δ ∆ Π −
1
−1
∆N−1 )−1 δΠ z = − Π̆ δΠ z ,
ρ
which pins down a maximum if Π̆ is negative definite, or equivalently if (δ ∆ Π)−1 −ρ∆N is positive definite. Replacing this expression we conclude that L̃ φ(z) = z0 ((δΠ)−1 − ρ∆N)−1 z =
e z. For L̃ φ(z) = z0 Π
e z, solution of the operator L yields the standard recursive formulae
z0 Π
e = ((δ ∆ Π)−1 − ρ∆N)−1 replaces
for Π and K from the Markovian LQG problem where Π
Π. Applying the two operators in n we obtain the recursive formulae for Πn and Kn presented in the statement. Importantly, as the cost function cn is positive definite in un and zn ,
e n+1 B is positive definite, so that the second order condition for minimization in
Q + ∆B0 Π
the operator L holds and Πn is positive semi-definite. It is worth noticing that with respect to the standard Riccati equation which applies to
lems are obtained using the same Bellman equation which yields the optimal policy in their robust control problem
(Hansen and Sargent, 2008).
10
the Markovian DLQG problem, the matrix δ ∆ Πn+1 is now replaced by the modified matrix
e n+1 .6 This shows that in the DLEQG formulation the optimal policy retains a specification
Π
which is similar to the one that would prevail in the DLQG one. Indeed, as for ρ = 0 we
obtain the standard Riccati equation of the DLQG problem, it is confirmed that the DLEQG
problem encompasses the DLQG one. For ρ > 0 a straightforward correction for the impact of
uncertainty and risk-aversion must however be inserted in the expressions for the recursions
of Πn and Kn , as the optimal policy depends on the agent’s risk-enhancement coefficient, ρ,
and her uncertainty over future shocks, ∆N.
The requirement that the matrix (δ ∆ Πn+1 )−1 − ρ∆N being positive definite derives from a
second order condition which must hold for the stress S n to satisfy the saddle point condition
imposed by Theorem 1. As noted by Whittle, whenever the cost function cn is non-negative
such condition fails for ρ large enough, indicating that the value function V n is infinite and
confirming our earlier claim that for a sufficiently large degree of risk-aversion the DLEQG
problem is not well-behaved and does not admit an optimizing solution. An economic interpretation of the failure of the optimization problem is that in these extreme circumstances
the optimizing agent becomes so pessimistic as to consider her control ineffective and hence
useless.7
Because of time-discounting it is possible to consider the limit case for T ↑ ∞, that is a
DLEQG problem with infinite horizon. As indicated by Hansen and Sargent (1994) there is
no certainty that for T ↑ ∞ the value function V n is finite and as a consequence the DLEQG
problem may be not well-defined. However, when a minimum is reached we can identify a
stationary solution, in that in the limit Πn → Π and Kn → K, where the limit matrices are
determined by the fixed point in the risk-sensitive Riccati equation,
Π
=
e 0Π
eA
e − (S0 + A
e 0 ΠB)(Q
e
e −1 (∆S + ∆B0 Π
e A)
e ,
R∆ + A
+ ∆B0 ΠB)
(1.8)
e ≡ ((δ ∆ Π)−1 − ρ∆N)−1 .
with Π
(1.9)
6
With respect to the original result by Whittle for the LEQG problem two adjustments are required as consequence of the introduction of time-discounting: i) the second order condition for the extremization of the future
stress function requires that (δ ∆ Πn+1 )−1 − ρ∆N, rather than Π−1
n+1 − ρ∆N, is positive definite; and ii) in the
e n+1 = ((δ ∆ Πn+1 )−1 − ρ∆N)−1 replaces Π
e n+1 = (Π−1 − ρ∆N)−1 .
modified Riccati equation Π
n+1
7
The matrix Πn+1 may be non-invertible (this being necessarily the case for ΠN +1 = 0), so that Theorem 2 cannot be applied. The proof of such Theorem indicates how to amend its statement, in that for
Πn+1 non-invertible, the second order condition for the stress function to satisfy the saddle point condition
e n+1 =
is that δ ∆ Πn+1 − (1/ρ)(∆N)−1 is negative definite, while the modified matrix now takes the form Π
δ ∆ Πn+1 (I − δ ∆ ρ∆NΠn+1 )−1 . In N , since ΠN +1 = 0, we immediately see that the second order condition that
e N +1 = 0.
δ ∆ ΠN +1 − (1/ρ)(∆N)−1 being negative definite is trivially satisfied, while Π
11
2
The Continuous-time Limit
Whereas for ∆ = 1 we have the standard discrete time formulation, for ∆ ↓ 0 the Markovian
DLEQG problem of Section1 converges to its continuous-time analogue. Assuming that for
∆ ↓ 0, period n converges to time t (and final period N to terminal date T ), the following
result holds:
Theorem 3 In the continuous-time limit of the Markovian DLEQG problem, the optimal policy is
u(t) = K(t) z(t) , with
(2.1)
K(t) = − Q−1 (S0 + B0 Π(t)) ,
(2.2)
where the matrix Π(t) solves the following continuous-time risk-sensitive Riccati equation
d Π(t)
+ R + A0 Π(t) + Π(t) A − (S0 + Π(t) B) Q−1 (S + B0 Π(t))
dt
+ ρ Π(t) N Π(t) + ln δ Π(t)
under the boundary condition Π(T ) = 0 .
=
0,
(2.3)
(2.4)
Proof. From the proof of Theorem 2 we know that the extremized future stress is Fn (zn ) =
z0n Πn zn for un = Kn zn , where
0e
−1
Kn = − (Q + ∆ B Πn+1 B)
Πn
0 e
S + B Πn+1 (I + A ∆) ,
0
0
0e
e
e
e
= Πn+1 + ∆ R + Kn Q Kn + 2S Kn + A Πn+1 + Πn+1 A + 2Πn+1 B Kn + o(∆2 ) .
e n+1 (Π
e n+1 = ((δ ∆ Πn+1 )−1 − ρ∆N)−1 ) and considering
Given the expressions for Kn and Π
e n+1 = lim∆↓0 Πn+1 and lim∆↓0 Πn = Π(t), for ∆ ↓ 0 the Riccati difference
that lim∆↓0 Π
equation for Πn converges to its continuous-time counterpart in (2.3), while the terminal
condition ΠN +1 = 0 converges to the boundary condition Π(T ) = 0. The extremized future
stress in the continuous-time limit is F (z(t), t) = z(t)0 Π(t)z(t), while the optimal policy is
u(t) = K(t)z(t) with K(t) = −Q−1 (S0 + B0 Π(t)). It is worth noticing that in the continuous-time limit the solution to the Markovian DLEQG
e n converges to Πn . Indeed, this
problem is simpler, in that for ∆ ↓ 0 the modified matrix Π
12
is consequence of the fact that for ∆ ↓ 0 the stress S n always possesses a maximum in n+1
for n+1 = 0. This implies that the positive-definiteness condition for the extremization of
e n+1 B0 ) converges
the stress in Theorem 2 is redundant. Furthermore, in the limit (Q + ∆BΠ
to Q, so that a sufficient condition for the existence of a solution to the Markovian DLEQG
problem is that such matrix is positive definite. This means that as long as the cost function,
cn , is positive definite in un , in the continuous-time limit an optimal policy for the Markovian
DLEQG always exists, since the stress, S n , surely possesses a saddle point. In brief, conditions
for DLEQG problems to admit solutions are less restrictive in continuos time than in discrete
time and correspond to those which apply to DLQG problems.
Finally, notice that with respect to the standard Riccati equation which applies to the
LQG problem in continuous time, two extra terms appear in the DLEQG extension. The
term ρΠ(t)NΠ(t) modifies the standard Riccati equation as in the Markovian LEQG problem discussed by Whittle and captures the impact in continuous time of risk-aversion on the
dynamics of the optimal solution. Similarly, the extra term ln δ Π(t) captures the impact of
time-discounting. Interestingly, the two terms enter separately into the modified Riccati equation indicating that in the continuous-time limit the impact of risk-enhancement and timediscounting is clearly disjointed.
2.1
Optimal Production Policy for a Risk-averse Entrepreneur
To illustrate how the Markovian DLEQG formulation operates in continuous time we consider
an entrepreneur running a monopolistic firm which produces a perishable commodity. The
firm’s profits at time t are equal to p(t)x(t) − qx(t)2 , where p(t) is the commodity price and
x(t) the quantity produced. Since q is a positive constant, production presents increasing
marginal costs. We assume the demand schedule for this commodity is linear in its price and
its price variation,
xd (t) = ad − bd p(t) − cd
d p(t)
+ d (t) ,
dt
where all coefficients are positive. This means that the demand for the commodity is decreasing in both its price level and its price variation. Since the commodity is perishable, the
entrepreneur chooses the quantity the firm produces, accepting any price which will clear the
market. From the inverted demand function we obtain the following plant equation
d p(t)
dt
=
a p(t) + b x(t) + µ + (t) ,
13
(2.5)
where a = −bd /cd , b = −1/cd (with a and b negative), while µ = ad /cd and (t) = −d (t)/cd .
Because the demand for the commodity is subject to stochastic shocks the entrepreneur faces
an uncertain environment and will not be able to anticipate the profits her production decisions
will bring about in the future. Assuming she is risk-averse, her optimal production problem
can be casted within our Markovian DLEQG framework, for z(t) = p(t), u(t) = x(t) and
c(t) = qx(t)2 − p(t)x(t). For simplicity we restrict our analysis to a homogeneous formulation
with µ = 0, so that Q = q, R = 0, S = −1/2, A = a and B = b.8
From Theorem 3, we find that the optimal production policy respects the formulation
x(t) = κ(t) p(t) , where
1 1
− b π(t) and π(t) = −
κ(t) =
q 2
(2.6)
√
ζ1
b2
q
− ρ σ2
e
ζ1
ζ2
D(T −t) − 1
√
e D(T −t) − 1
!
,
2
with σ2 the variance of (t), T the terminal date, D = (2a + b/q + ln δ)2 − (1/q)( bq − ρσ2 )
√
and ζ1 , ζ2 = − 12 (2a + b/q + ln δ) ± 21 D. Inspection of this solution indicates that π(t) is never
positive, as the monopolistic firm would stop producing if the entrepreneur did not earn any
profits. In addition, since for t ↑ T π(t) ↑ 0 and κ(t) ↑ 1/(2q), as the entrepreneur approaches
the final horizon of her optimization problem, her optimal policy converges to the static solution, in which the entrepreneur chooses her production to maximizes her expected per-period
2
profits. On the contrary, for t ↓ −∞, π(t) ↓ −ζ2 /( bq − ρσ2 ), while k(t) ↓
1 1
q (2
+
bζ2
),
b2 /q−ρσ2
indicating that the optimal policy converges to the stationary solution for the formulation with
infinite horizon.
In Figure 4 we plot the dynamics of the coefficients π(t) and κ(t) under a specific parametric
choice for ρ = 0 and 5 and δ = 1 and 0.5. This graphical representation clearly confirms that
in the dynamic optimization exercise the entrepreneur will be more cautious than in the static
formulation and will choose to produce a smaller quantity of the commodity (the intensity
of her production policy is smaller than in the static formulation as κ(t) < (1/2q)). The
entrepreneur realizes that a larger quantity of the commodity brought to the market at time t
will jeopardize her future profits. Indeed, a larger quantity x(t) in t lowers the commodity price
p(t). Because of inertia in the demand for the commodity, such reduction propagates through
time, so that a larger production today induces lower prices and impaired profit opportunities
tomorrow. As the terminal date T approaches, future profits have a smaller impact on the
entrepreneur’s optimal policy, which converges to the static solution (κ(t) converges to the
8
In Section 4 we discuss the general formulation with µ > 0.
14
Control Rule Function
Riccati Function
2
0
‐0.05
1.9
1.8
‐0.15
κ(t)
π(t)
‐0.1
ρ=5,δ=0.5
ρ=0,δ=0.5
ρ=5,δ=1
ρ=0,δ=1
ρ=5,δ=0.5
ρ=0,δ=0.5
ρ=5,δ=1
ρ=0,δ=1
1.7
‐0.2
1.6
‐0.25
1.5
‐0.3
‐0.35
0
0.2
0.4
Time, t
0.6
0.8
0
1
0.2
0.4
Time, t
0.6
0.8
1
Figure 4: The dynamics π(t) and κ(t) for T = 1, σ2 = 0.25, a = − 1, b = − 0.5 and q = 0.25.
static optimal value 1/(2q)).
Both time-discounting and risk-aversion condition the production policy.9 Unsurprisingly,
for δ smaller the entrepreneur becomes more aggressive, as she gives less weight to future
profits in determining her current production policy. This implies that the intensity of the
production policy, κ(t), is larger for any t. Similarly, risk-aversion makes the entrepreneur
more aggressive, as κ(t) is larger for ρ = 5 than ρ = 0 throughout time.
The impact of risk-aversion on the optimal policy can be explained considering that a larger
quantity x(t) in t will reduce both the expected value and the variance of future profits. Given
that the optimization criterion in (1.1) favors early resolution of uncertainty, a risk-averse
entrepreneur will be willing to accept smaller expected future profits to reduce their variance
and will consequently choose a more aggressive production policy.
3
DLEQG Problems Under Imperfect State Observation
In many situations agents observe only a noisy signal on the state vector zn in period n. In
such a scenario the optimization criterion (1.1) is not well-defined, as the cost function cn is
no longer deterministic. Nevertheless, we can employ the optimization criterion (1.2). Under
9
While the plots represented in Figure 4 are specific to the parametric constellation we have chosen, qualitatively
similar results are obtained for other choices of such parameters. This suggests that Figure 4 represents the general
features of the impact of time-discounting and risk-aversion on the optimal policy of the entrepreneur.
15
imperfect state observation, while no longer equivalent to the former one, this optimization
criterion is well-defined.
To allow for imperfect state observation we introduce the following modified Definition for
the class of Markovian DLEQG problems:
Definition 3 An optimal control problem is said to be Markovian linear exponential quadratic
Gaussian with time-discounting and imperfect state observation if the following recursive optimization
i
h
ρ
Vn
En exp
2
=
h
ρ
i
min En exp
(∆cn + δ ∆ V n+1 ) ,
un
2
(3.1)
is solved in periods n = 1, 2, . . . , N with respect to the free-valued control vector un under the
conditions that:
(i) for n = 1, 2, . . . N , the cost function, cn , is a positive-definite quadratic form in the control
vector, un , and the state vector, zn , cn = u0n Qun + z0n Rzn + 2u0n Szn ;
(ii) the state vector, zn , follows the plant equation zn = (I + A∆)zn−1 + B∆un−1 + n ;
(iii) in n the vector of observable variables is given by
wn = C zn−1 + η n ,
with ψ n =
n
!
ηn
"
∼ N
0
0
!
,
!#
∆N
L
L0
1
∆M
and ψ n ⊥ ψ n0 .
As zn is now unobservable, the (discounted) stress takes a new formulation. In particular, let
ẑn−1 denote the expectation of the state vector zn−1 conditional on the information contained
in observation history Hn−1 = {h0 , Un−2 , Wn−1 }, with Un−2 = (u1 , . . . , un−2 ), Wn−1 =
(w1 , . . . , wn−1 ), Ωn−1 the corresponding conditional covariance matrix and P the covariance
matrix for ψ n , where
P=
∆N
L0
L
1
∆M
!
.
We introduce a revised Definition for the (discounted) stress:
Definition 4 Under imperfect state observation, the (discounted) stress is S n ≡ ∆cn − ρ1 (D n−1 +
dn + dn+1 ) + δ ∆ V n+1 where dn is equal to ψ 0n P−1 ψ n for n ≤ N and 0 for n = N + 1, while
D n−1 = (zn−1 − ẑn−1 )0 Ω−1
n−1 (zn−1 − ẑn−1 ).
16
We can then prove the following Lemma, which reformulates a result stated in Lemma 2 under
the perfect state observation:
Lemma 5 In a Markovian DLEQG problem, under imperfect state observation, if the value function in n + 1, V n+1 , is a quadratic form in the state vector zn+1 and the (discounted) stress S n
satisfies a saddle point condition with respect to ξn (with ξ 0n ≡ (z0n−1 − ẑ0n−1 ψ 0n ψ 0n+1 )) and un ,
so that minun maxξn S n exists, the following proportionality condition holds
h
ρ
i
ρ
∆
min En exp
(∆cn + δ V n+1 )
∝ exp
min max S n ,
un
2
2 un ξn
where the proportionality constant is independent of the state vector zn , while the value function
V n is a quadratic form in zn equal to the extremized (discounted) stress minun maxξn S n plus a
constant independent of zn .
Proof. Consider that under imperfect state observation V n+1 is still a function of zn+1 , while
cn is function of zn . Since under imperfect state information zn and zn+1 can be expressed in
terms of the vector ξ n , we have
h
ρ
i
min En exp
(∆cn + δ ∆ V n+1 )
un
2
Z
∝
min
un
ρ
1 0 −1
∆
exp (∆cn + δ V n+1 ) − ξ n Υn−1 ξ n dξ n ,
2
2
where Υn−1 denotes the covariance matrix of ξ n conditional on observation history Hn−1 . In
addition, since (zn−1 − ẑn−1 )0 ⊥ ψ 0n ⊥ ψ 0n+1 , we can write
h
ρ
i
min En exp
(∆cn + δ ∆ V n+1 )
un
2
Z
∝
min
un
ρ
1
exp
(∆cn + δ ∆ V n+1 ) −
2
2
ψ 0n+1 P−1 ψ n+1 +
ψ 0n P−1 ψ n + (zn−1 − ẑn−1 )0 Ω−1
n−1 (zn−1 − ẑn−1 )
Z
=
min
un
!
Sn
exp ρ
dξ n .
2
We proceed as in the Proof of Lemma 2. Since V n+1 is a quadratic function of zn+1 and the
latter is linearly dependent on ξ n and un , S n is a quadratic form in ξ n and un . If the stress,
S n , respects the aforementioned saddle point condition, exploiting the property of quadratic
17
dξ n
forms outlined in Lemma 3, we conclude that
Z
min
un
Sn
exp ρ
2
Z
dξ n
=
min
un
!
1
exp − (−ρS n ) dξ n
2 | {z }
Q(un ,ξn )
∝
exp
1
ρ
− max min (−ρS n ) = exp
min max S n ,
2 un ξn
2 un ξn
where once again we have made use of the fact that if the saddle point minun maxξn S n exists,
so does maxun minξn −S n . As the stress S n respects the saddle point condition, its extremized
value will be a quadratic form in zn and so will be V n . Lemma 5 suggests a revision of the RSCEP outlined in Theorem 1:
Theorem 4 - (Risk-sensitive Certainty Equivalence Principle). In a Markovian DLEQG problem,
under imperfect state observation, if the stress S n+j respects a saddle point condition with respect
to ξn+j and un+j for j = 0, 1, . . . , N − n, the optimal value of the vector un is determined in
period n by simultaneously minimizing S n with respect to un and maximizing it with respect to
ξ n . The value function is then proportional to the extremized stress, V n ∝ minun maxξn S n .
The vector ξ n contains the vectors zn−1 − ẑn−1 , n , η n , n+1 and η n+1 which in period n
are unknown. Following Whittle’s suggestion they can be expressed as linear functions of the
unobservable (at time n) state vectors, zn−1 , zn and zn+1 , and the vector of signals, wn+1 . It
follows that the saddle-point condition for the stress in n can be equivalently written as
min
max
un zn−1 ,zn ,zn+1 ,wn+1
Sn .
By writing the saddle point condition in this way, we see that it can be satisfied proceeding in
two stages: in stage i), conditionally on the current state vector zn , the stress, S n , is extremized
with respect to un , zn−1 , zn+1 and wn+1 ; in stage ii) the resulting function is extremized with
respect to zn . In fact, we notice that
min
max
un zn−1 ,zn ,zn+1 ,wn+1
S n ⇔ max
zn
min
max
un zn−1 ,zn+1 ,wn+1
Sn
.
In stage i), conditionally on zn , the extremization of the stress will be achieved by isolating
terms in S n which pertain to past and future. This allows a partial separation between estimation and control. Specifically, let Pn (zn , Hn ) denote the extremized (discounted) past stress
18
defined as
Pn (zn , Hn )
=
max
zn−1
1
−
ρ
dn + D n−1
(3.2)
,
where Hn is observation history in n. Then, the following Lemma, which adapts to the Markovian DLEQG problem a result originally derived by Whittle, holds:
Lemma 6 Under imperfect state observation, the extremization of the stress in period n, can be
obtained operating into two stages. Firstly, the extremized past and future stresses, Pn (zn , Hn )
and Fn (zn ), are calculated conditionally on the current state vector, zn . Secondly, the saddle point
for the stress S n is achieved by maximizing Pn (zn , Hn ) + Fn (zn ) with respect to zn .
Proof. The stress can be divided into two parts which contain values respectively depending
on past and future variables,
−
1
ρ
dn + D n−1
and ∆cn −
1
dn+1 + δ ∆ V n+1 .
ρ
In stage i), extremizing S n with respect to un and zn−1 , zn+1 , wn+1 , conditionally on the
current state vector zn , is equivalent to solving separately the two programs
1
max −
zn−1
ρ
∆dn + D n−1
and min
max
un zn+1 ,wn+1
1
∆cn − dn+1 + δ ∆ V n+1
ρ
.
As zn+1 and wn+1 are linearly dependent on n+1 and η n+1 , the latter program can be rewritten as follows
min
max
un n+1 ,η n+1
1
∆
∆cn − dn+1 + δ V n+1 .
ρ
The maximization of ∆cn −(1/ρ)dn+1 +δ ∆ V n+1 with respect to η n+1 reduces to maxηn+1 {−(1/ρ)dn+1 }
= − ρ1 0n+1 (∆N)−1 n+1 . Importantly, this means that under imperfect state observation the
extremization, conditionally on zn , of the stress, S n , with respect to un , zn+1 and wn+1 is
equivalent to the extremization of the stress under perfect state information with respect to un
and n+1 . Lemma 4 shows that this corresponds to calculating the extremized future stress,
Fn (zn ), which yields the optimal policy, u(zn ), conditional on the state vector zn .
The former program instead corresponds to calculating the extremized past stress in n,
Pn (zn , Hn ). This extremization identifies the optimal estimate for zn conditionally on information up to time n. In stage ii) estimation and control are recoupled by maximizing the sum
Pn (zn , Hn ) + Fn (zn ) with respect to the current state vector zn . 19
Theorem 4 and Lemma 6 extend to the class of Markovian DLEQG problems Whittle’s risksensitive separation principle (RSSP) which allows to find the optimal policy by partially separating estimation and control. In particular, if the conditions listed in Theorem 4 for the
existence of a meaningful solution to the DLEQG problem under imperfect state observation
are met, this is found through the following Theorem:
Theorem 5 - (Risk-sensitive Separation Principle). The extremized past and future stresses, Pn
and Fn , relate to estimation and control respectively, as the evaluation of the former identifies the
optimal estimate for the state vector zn conditional on past observations, while that of the latter
pins down the control un (zn ) which would be optimal if zn were known. The calculations of Pn
and Fn are recoupled by maximizing Pn + Fn with respect to the state vector zn so as to find the
maximum stress estimate (MSE) z̆n for the state vector zn .
- (Risk-sensitive Certainty Equivalence Principle). The optimal value of the control vector in n is
given by un (z̆n ).
As mentioned, conditionally on zn the future stress, Fn , respects the recursion (1.3) in
Lemma 4. Then, Theorem 2 provides the conditional optimal policy, un (zn ). The extremization
of the past stress is obtained via the following Lemma:10
Lemma 7 The extremized past stress is equal to Pn (zn , Hn ) = −(1/ρ) D n + · · · , where D n =
(zn − ẑn )0 Ω−1
n (zn − ẑn ) and · · · indicates terms independent of zn , ẑn denotes the ML estimate
of the state vector zn , given by its expectation conditional on observation history Hn , and Ωn the
corresponding conditional covariance matrix.
Proof. Pn (zn , Hn ) = maxzn−1 − ρ1 {dn + D n−1 } . From the definition of D n−1 we see that this
maximum is equivalent to maxzn−1 ρ1 2 ln f (zn | Hn−1 ) where f (.) is the conditional density
function of zn given Hn−1 . The maximum corresponds to Pn (zn , Hn ) = − ρ1 D n + · · · , where
once again · · · indicates terms independent of zn . The extremized past stress depends on the maximum likelihood estimate (MLE) of the state
vector zn . Given the noisy linear signal wn , from the Kalman filter we see that this value
10
A corollary of this Lemma is that the extremized past stress solves a (forward) recursion complementary to the
future stress recursion, in that Pn (zn , Hn ) = maxzn−1 {− ρ1 dn + Pn−1 (zn−1 , Hn−1 )}, with P0 (z0 , H0 ) = − ρ1 D 0 .
20
respects the following recursion
ẑn =
e ẑn−1 + B∆ un−1 + (L + AΩ
e n−1 C0 )
A
1
M + C Ωn−1 C0
∆
−1
×
(wn − C ẑn−1 ) , (3.3)
e = I + A∆ and the conditional covariance matrix Ωn respects the Riccati recursion
where A
0
e Ωn−1 A
e − (L + AΩ
e n−1 C )
Ωn = ∆N + A
1
M + C Ωn−1 C0
∆
−1
×
e n−1 C0 )0 . (3.4)
(L + AΩ
Importantly, Lemma 7 indicates that differently from what applies to the Markovian LEQG
problem studied by Whittle, in the calculation of the extremized past stress no adjustment
is made to the MLE of zn to correct for the impact of risk-aversion. This is because differently from Whittle’s formulation the past stress does not depend on the cost function.11
However, estimation and control are still intertwined as suggested in the discussion of Theorem 1. In fact, to re-couple the extremized past and future stresses we apply Theorem
5. This requires maximizing the sum Pn (zn , Hn ) + Fn (zn ) with respect to the state vector
zn to obtain the maximum stress estimate (MSE), z̆n . Given that Pn (zn , Hn ) + Fn (zn ) =
0
−(1/ρ)(zn − ẑn )0 Ω−1
n (zn − ẑn ) + zn Πn zn plus terms independent of zn , from the first derivative
of this sum with respect to zn it is immediate to see that, for Ω−1
n − ρ Πn positive definite, z̆n
is given by the following expression
z̆n = (I − ρ Ωn Πn )−1 ẑn ,
(3.5)
which is function of ρ and the matrix Πn . As the latter depends on the components of cn ,
we see that the MSE z̆n is clearly affected by the shape of the cost function alongside the
degree of risk-aversion, confirming the close nexus between control and estimation for the
class of DLEQG problems. Finally, the optimal control vector under imperfect state observation
is given by Theorem 2 where z̆n replaces zn , i.e. un = Kn z̆n , with Kn the matrix of optimal
coefficients presented in Theorem 2.
11
Whittle shows that a maximum past stress estimate (MPSE) replaces the standard MLE in the expression for
Pn . Such MPSE is obtained by introducing an adjustment to the Kalman filter which accounts for the impact of
risk-aversion on the optimal estimation of the state vector zn .
21
3.1
The Continuous-time Limit with Imperfect State Information
Under imperfect state information one should see how optimal control and estimation behaves
for ∆ ↓ 0. It is immediate to see that Theorem 4 and Lemma 7 still apply, with the qualification
that the ML estimate of zn will be replaced by its continuous-time counterpart. The continuoustime version of the Kalman filter indicates that for w(t) = Cz(t) + η(t), the ML estimate of the
state vector respects the following expression,
d ẑ(t)
dt
= A ẑ(t) + B u(t) + (L + Ω(t) C0 )M−1 (w(t) − C ẑ(t)) ,
(3.6)
where the conditional covariance matrix Ω(t) respects the Riccati differential equation,
d Ω(t)
dt
= N + A Ω(t) + Ω(t) A0 − (L + Ω(t) C0 )M−1 (L + Ω(t) C0 )0 .
(3.7)
Re-coupling the extremized past and future stresses yields the MSE for the state vector. In the
continuous-time limit this will be given by the expression
z̆(t) = (I − ρ Ω(t) Π(t))−1 ẑ(t) .
(3.8)
Let us reconsider our risk-averse entrepreneur and assume she only observes a noisy signal,
w(t) = p(t) + η(t), of the commodity price, p(t). To justify such an assumption notice that p(t)
corresponds to a relative price. While the entrepreneur observes the nominal price at which
the firm sells its commodity, she does not observe the general price level but only a price index,
which represents a noisy signal of the former. Assuming that she can continuously check such
price index, at any time t she receives the noisy signal w(t) on p(t).
Applying equations (3.6) and (3.7) we see that
dp̂(t)
dt
=
ap̂(t) + bx(t) +
dσp2 (t)
dt
=
σ2 + 2w(t) −
σp2 (t)
w(t)
−
p̂(t)
,
ση2
1 4
σ (t) .
ση2 p
The latter is a standard Riccati equation in continuous time. Given the boundary condition
22
σp2 (0) = σ02 , the conditional variance for the relative price p(t) at time t, is
σp2 (t) = ση2
θ = a+
σ2
ση2
1/2
a+
σ2
ση2
1/2
θι1 exp(ι1 t) + ι2 exp(ι2 t)
θ exp(ι1 t) + exp(ι2 t)
+
σ02
ση2
−a
−
σ02
ση2
−a
, where
1/2
σ2
.
and ι1 , ι2 = a ± a + σ 2
η
From equation (3.8) we see that the MSE for the relative price is then p̌(t) =
1
1−ρσp2 (t)π(t)
p̂(t),
while the optimal production policy becomes x(t) = κ(t)p̌(t).
4
The Time-heterogeneous Formulation
In Definition 1 the Markovian DLEQG problem is time-homogeneous, in that neither the plant
equation nor the cost function explicitly depends on period n. This Definition can be adjusted
to accommodate a non-homogenous plant equation and/or cost function, by making any of the
matrices A, B, N, Q, R and S time-dependent. Treatment of this generalization is straightforward. It is more involving the analysis of the formulation with pre-determined disturbance
terms into the plant equation. In particular, assume the state vector respects the following law
of motion
zn
=
(I + A∆)zn−1 + B∆un−1 + µn ∆ + n ,
where the vector µn contains pre-determined values. These values are known in advance
and represent anticipated disturbances which modify the original plant equation introduced in
Definition 1.
Under perfect state observation, with a pre-determined disturbance term µn in the law of
motion for the state vector Lemma 2 and Theorem 1 hold, as the stress in n, S n , is still a
quadratic form in un , n+1 and zn . However, this quadratic form is no longer homogeneous
in zn . This implies that Lemma 4 must be amended, in that the extremized future stress is
now a non-homogenous quadratic form, Fn (zn ) = z0n Πn zn − 2ϑ0n zn · · · (with ϑn a vector of
coefficients for the components of zn and · · · indicating terms independent of zn ). Given that
in N + 1 FN +1 ≡ 0, the terminal conditions ΠN +1 = 0 and ϑN +1 = 0 apply. Therefore,
Theorem 2 must be extended as follows:
23
Theorem 6 - (Risk-sensitive Riccati Equation). Under perfect state observation, if the matrix
(δ ∆ Πn+1 )−1 − ρN is positive definite and the state vector respects the linear plant equation with
pre-determined disturbances, in period n the extremized future stress is given by
Fn (zn )
=
z0n Πn zn − 2ϑ0n zn + · · · for
(4.1)
un
=
e n+1 B)−1 B0 Π
e n+1 (Π−1 ϑn+1 − µn+1 ∆) ,
Kn zn + (Q + ∆B0 Π
n+1
(4.2)
where
e 0Π
e n+1 A
e − (S0 + A
e 0Π
e n+1 B)(Q + ∆B0 Π
e n+1 B)−1 (∆S + ∆B0 Π
e n+1 A)
e (4.3)
Πn = R∆ + A
,
e n+1 B)−1 (S + B0 Π
e n+1 A)
e ,
Kn = − (Q + ∆B0 Π
(4.4)
e n+1 (Π−1 ϑn+1 − µn+1 ∆) ,
ϑn = Γ0n Π
n+1
(4.5)
e + ∆BKn ,
Γn = A
e n+1 = ((δ ∆ Πn+1 )−1 − ρ N)−1 and A
e = (I + ∆A) .
Π
with
(4.6)
Proof. We just repeat the steps followed in the proof of Theorem 2. Recall that the extremized
future stress respects the double recursion Fn = L L̃ Fn+1 based on the two operators
1
e
L φ(z) = min [∆c(z, u) + φ(Az+∆Bu+µ∆)]
and L̃ φ(z) = max [φ(z+) − 0 (∆N)−1 ] .
u
ρ
Assume that φ(z) = δ ∆ z0 Π z − 2δ ∆ ϑ0 z + · · · , so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) −
2δ ∆ ϑ0 (z + ) −
1 0
−1
ρ (∆N) ˜ = − (δ ∆ Π −
+ · · · ]. Taking first derivatives, we find that
1
1
(∆N)−1 )−1 δ ∆ Π z + (δ ∆ Π − (∆N)−1 )−1 δ ∆ ϑ ,
ρ
ρ
which pins down a maximum if (δ ∆ Π)−1 − ρ∆N is positive definite. Replacing this expression
e 0 z+· · · , where Π
e = ΠΠ
e z − 2ϑ
e = ((δ ∆ Π)−1 − ρ∆N)−1 , ϑ
e −1 ϑ
we conclude that L̃ φ(z) = z0 Π
e 0 z + · · · , the solution of the
e z − 2ϑ
and · · · denotes terms independent of z. For L̃ φ(z) = z0 Π
operator L yields the standard recursive formulae for Π, K and ϑ from the Markovian LQG
e replace
e = ((δ ∆ Π)−1 − ρ∆N)−1 and ϑ
problem with pre-determined disturbances, where Π
respectively Π and ϑ. Specifically, applying the double recursion Fn = L L̃ Fn+1 , we find that
e 0 zn +· · · for un = Kn zn +(Q+∆B0 Π
e n+1 − Π
e n+1 B)−1 B0 (ϑ
e n+1 µ
F (zn ) = z0 Πn zn −2ϑ
∆),
n
where Kn = −(Q +
n+1
e n+1 B)−1 (S
B0 Π
n+1
+
e n+1 A).
e
B0 Π
e n+1 with Π
e n+1 Π−1 ϑn+1 we
Replacing ϑ
n+1
find the recursive formulae presented in the statement. 24
Theorem 6 indicates that in the presence of pre-determined disturbances the optimal policy
contains a risk-adjusted correction which takes into account their anticipated values. This also
implies that the optimal control is not longer a simple linear function of the state vector, as an
extra term enters into the optimal policy in equation 4.2.12
A second adjustment must be introduced under imperfect state observation, when predetermined disturbances enter into the plant equation for the state vector. As Theorem 5
and Lemma 7 still apply, in recoupling the extremized past and future stresses, the sum
Pn (zn , Hn ) + Fn (zn ) is maximized with respect to zn to obtain the MSE, z̆n . Given that
0
0
0
Pn (zn , Hn ) + Fn (zn ) = −(1/ρ)(zn − ẑn )0 Ω−1
n (zn − ẑn ) + zn Πn zn − 2ϑn zn plus terms indepen-
dent of zn , taking the first derivative of this sum with respect to zn we see that, for Ω−1
n − ρ Πn
positive definite,
z̆n = (I − ρ Ωn Πn )−1 (ẑn − ρ Ωn ϑn ) ,
(4.7)
where ẑn is still the ML estimate of zn . Thanks to the presence of the pre-determined disturbances, this is given by
ẑn
e n−1 C0 )
e ẑn−1 + B∆ un−1 + µn ∆ + (L + AΩ
= A
1
M + C Ωn−1 C0
∆
(wn − C ẑn−1 ) .
−1
×
(4.8)
For the continuous-time limit of the Markovian DLEQG problem, where the pre-determined
disturbance, µ(t), enters into the law of motion for the state vector, the following result
holds:13
12
If the matrix Πn+1 is non-invertible Theorem 6 must be amended. The expression for the vector ϑn is now
e n+1 µ
e
− δ ∆ ρ∆NΠn+1 )−1 ϑn+1 − Π
n+1 ∆], while Πn+1 and the second order condition take the formulation given in footnote 7.
13
The presence of the pre-determined disturbances will only add a vector of pre-determined values, µ(t), in
the expression for the MLE of the state vector under imperfect state observation in equation (3.6), while the
continuous-time analogue for the MSE in equation (4.7) is z̆(t) = (I − ρ Ω(t)Π(t))−1 (ẑ(t) − ρ Ω(t) ϑ(t)).
Γ0n [δ ∆ (I
25
Theorem 7 In the continuous-time limit of the Markovian DLEQG problem with pre-determined
disturbances the optimal policy is
u(t) = K(t)z(t) + Q−1 B0 ϑ(t) ,
(4.9)
K(t) = − Q−1 (S0 + B0 Π(t)) ,
with
(4.10)
where the matrix Π(t) solves equation (2.3) and the vector ϑ(t) the following one
d ϑ(t)
+
dt
0
e
ln δ + Γ(t) ϑ(t) − Π(t) µ(t) = 0 ,
e
with Γ(t)
= A − B Q−1 S −
−1
BQ
(4.11)
B − ρ N Π(t) ,
0
under the boundary conditions Π(T ) = 0 and ϑ(T ) = 0 .
(4.12)
(4.13)
Proof. From Theorem 6 we know that the extremized future stress is Fn (zn ) = z0n Πn zn −
e n+1 is given by ((δ ∆ Πn+1 )−1 − ρN∆)−1 and the optimal control vector is
2ϑ0 zn + · · · , while Π
n
0e
−1
un = − (Q + ∆ B Πn+1 B)
0
e (I + A ∆)
S+B Π
zn
e n+1 − Π
e n+1 ∆B)−1 B0 (ϑ
e n+1 µn+1 ∆) , where
+ (Q + ∆B0 Π
e n+1 = Π
e n+1 = ((δ ∆ Πn+1 )−1 − ρ ∆ N)−1 and ϑ
e n+1 Π−1 ϑn+1 .
Π
n+1
While the matrix Πn respects the Riccati difference equation derived in the proof of Theorem
3, the vector ϑn respects the following difference equation
e 0 [ I + ∆ (A + B Kn ) ] − ∆µ0 Π
e
ϑ0n = ϑ
n+1
n+1 n+1
e n+1 − Π
e n+1 µ∆)0 B (Q + ∆B0 Π
e n+1 ∆B)−1 [ Q Kn + S + B0 Π
e n+1 (I + A ∆) ] .
− ∆ (ϑ
e n+1 and Π
e n+1 and considering that lim∆↓0 Π
e n+1 = lim∆↓0 Πn+1 ,
Given the expressions for Kn , ϑ
e n+1 = lim∆↓0 ϑn+1 and lim∆↓0 ϑn = ϑ(t), for ∆ ↓ 0 the differlim∆↓0 Πn = Π(t), lim∆↓0 ϑ
ence equation for ϑn converges to its continuous-time counterpart in (4.11), while the terminal conditions ΠN +1 = 0 and ϑN +1 = 0 converge to the boundary conditions Π(T ) = 0
and ϑ(T ) = 0. The extremized future stress in the continuous-time limit is F (z(t), t) =
z(t)0 Π(t) z(t) − 2ϑ(t)0 z(t) + · · · , while the optimal policy is u(t) = K(t)z(t) + Q−1 B0 ϑ(t)
with K(t) = −Q−1 (S0 + B0 Π(t)). 26
In the illustrative example we have considered in Section 2 we have assumed that µ =
ad /cd = 0. For µ 6= 0, we see that x(t) = κ(t)p(t) + ϑ(t)µ, where, applying Theorem 7, we find
that ϑ(t) solves the non-homogenous differential equation
dϑ(t)
+
dt
2
b
1
2
ϑ(t) −
ln δ + a +
− ρσ π(t)ϑ(t) − µπ(t) = 0 .
2q
q
(4.14)
Numerical methods are called for to solve such a differential equation. To obtain a closed-form
solution, we can consider the infinite horizon formulation of the optimal production problem
assuming that T ↑ ∞. In this case π(t) → π̄ while ϑ(t) → ϑ̄, so that x(t) = κ̄p(t) + ϑ̄µ. In
Section 2 we have seen that
π̄ = − b2
q
ζ2
− ρσ2


bζ2
1 1
.
+ 2
and κ̄ =
b
q 2
2
−
ρσ
q
In addition, notice that in steady state equation (4.14) simplifies to
2
b
1
2
ϑ̄ −
− ρσ π̄ ϑ̄ − µπ̄ = 0 ,
ln δ + a +
2q
q
so that
π̄
ϑ̄ =
ln δ + a +
1
2q
−
b2
q
µ.
− ρσ2 π̄
Importantly, the conclusions we drew in Section 2 on the impact of risk-aversion on the sensitiveness of the production policy with respect to the commodity price are unaffected by the
presence of the pre-determined disturbance µ.
5
Concluding Remarks
We have analyzed a specific class of optimal control problems, termed discounted linear exponential quadratic Gaussian (DLEQG) problems, where risk-averse agents satisfy a recursive
optimization criterion defined over a quadratic cost function in state and control vectors as in
Hansen and Sargent (1994, 1995, 2013) and a Markovian linear plant equation for the state
vector dictates the dynamics of the economic environment. We haveshown that the DLEQG
class is a generalization of Whittle’s (Whittle, 1990) linear exponential quadratic Gaussian
(LEQG) class which allows to accommodate time-discounting while preserving most of the
results he derives. Thus, Whittle’s risk-sensitive certainty-equivalence (RSCEP) and separa27
tion (SP) principles are reformulated, while his recursive formulae for the optimal control are
modified to adapt them to the DLEQG class.
In our analysis we consider the continuous-time limit of this class of optimal control problems and show that sufficient conditions for DLEQG problems to admit meaningful solutions
are less stringent in continuous than in discrete time and coincide with those which apply to
DLQG problems.
Our revised RSCEP confirms within the DLEQG class Whittle’s result that the optimal behavior of risk-averse agents can be identified via a pessimistic choice mechanism, according to
which the control variables are chosen by applying a min-max strategy, so that agents minimize
their welfare loss against the most adverse shocks.
A possible conjecture is that amid an uncertain environment a pessimistic agent acts cautiously, choosing conservative control rules. In effect, it is immediate to verify that within static
optimization problems under uncertainty more risk-averse agents are less aggressive. Common
intuition would then indicate that a similar result also holds within fully dynamic optimization
problems. Our analysis suggests the contrary. We see, within the fully dynamic optimization
problem faced by our monopolistic entrepreneur, that a larger degree of risk-aversion brings
about bolder optimal policies. Hence, a conclusion we draw from our analysis is that, differently from what common intuition would suggest, a pessimistic agent should not be confused
with a cautious one, as we see that pessimism induces such agent to act boldly.
References
B OUAKIZ , M.,
AND
M. S OBEL (1984): “Non Stationarity Policies Are Optimal for Risk-sensitive
Markov Decision Processes,” Discussion paper, California Institute of Technology.
E PSTEIN , L. G., AND S. E. Z IN (1991): “Substitution, Risk Aversion, and the Temporal Behavior
of Consumption and Asset Returns: An Empirical Analysis,” Journal of Political Economy, 99,
263–286.
H ANSEN , L.,
AND
T. J. S ARGENT (1994): “Discounted Linear Exponential Quadratic Gaussian
Control,” Discussion paper, University of Chicago mimeo.
(1995): “Discounted Linear Exponential Quadratic Gaussian Control,” IEEE Transactions on Automatic Control, 40, 968–971.
28
(2008): Robustness. Princeton University Press, Princeton.
(2013): Recursive Models of Dynamic Linear Economies. Princeton University Press.
H ANSEN , L. P., T. J. S ARGENT,
AND
T. D. TALLARINI (1999): “Robust Permanent Income and
Pricing,” Review of Financial Studies, 66, 873–907.
H OLDEN , C. W.,
AND
A. S UBRAHMANYAM (1994): “Risk-aversion, Imperfect Competition, and
Long-lived Information,” Economic Letters, 44, 181–190.
L UO, Y. (2004): “Consumption Dynamics, Asset Pricing, and Welfare Effects under Information
Processing Constraints,” Discussion paper, Princeton University.
L UO, Y., AND E. R. Y OUNG (2010): “Risk-sensitive Consumption and Investment under Rational
Inattention,” American Economic Journal: Macroeconomics, 2, 281–325.
M AMAYSKY, H.,
AND
M. S PIEGEL (2002): “A Theory of Mutual Funds: Optimal Fund Objectives
and Industry Organization,” Discussion paper, Yale School of Management.
TALLARINI , T. D. (2000): “Risk-Sensitive Real Business Cycles,” Journal of Monetary Economics,
45, 507–532.
VAN DER
P LOEG, F. (2009): “Prudent Monetary Policy and Prediction of the Output Gap,”
Journal of Macroeconomics, 31, 217–230.
(2010): “Political Economy of Prudent Budgetary Policy,” International Tax and Public
Finance, 17, 295–314.
V ITALE , P. (1995): “Risk-Averse Traders and Inside Information,” Discussion Paper 9504, Department of Applied Economics, University of Cambridge.
(2012): “Risk-averse Insider Trading in Multi-asset Sequential Auction Markets,” Economics Letters, 117, 673–675.
W HITTLE , P. (1990): Risk-sensitive Optimal Control. John Wiley & Sons, New York.
Z HANG, H. (2004): “Dynamic Beta, Time-Varying Risk Premium, and Momentum,” Discussion
paper, Yale School of Management.
29
Appendix A: Markovian DLEQG Problems in Discrete Time
A.1. Monotonicity and Convexity of the Optimization Criterion (1.1).
Let R(V) ≡ ln E exp δ ∆ ρ2 V . Then assume V 1 ≥ V 2 ≥ 0. Consider that
R(V 1 ) − R(V 2 )
h
ρ
i
h
ρ
i
ln E exp δ ∆ V 1
− ln E exp δ ∆ V 2
= ln
2
2
=
!
E exp δ ∆ ρ2 V 1
≥ 0 .
E exp δ ∆ ρ2 V 2
This means that R(V) is monotone increasing in V. So, consider the convex combination θV 1 + (1 −
θ)V 2 , with 0 < θ < 1.
R(θV 1 + (1 − θ)V 2 )
=
≤
=
=
i
ρ
θ
ρ
1−θ h
ρ
= ln E exp δ ∆ V 1 exp δ ∆ V 2
ln E exp δ ∆ [θV 1 + (1 − θ)V 2 ]
2
2
2
n h
ρ
ioθ n h
io1−θ
ρ
ln
E exp δ ∆ V 1
· E exp δ ∆ V 2
2
2
i
h
ρ
i
h
ρ
+ ln(1 − θ) E exp δ ∆ V 2
ln(θ) ln E exp δ ∆ V 1
2
2
ln(θ)R(V 1 ) + ln(1 − θ)R(V 2 ) ,
where the inequality follows from Hölder’s inequality. In fact, Hölder’s inequality states that for p and q
such that 1 < p, q, with 1/p + 1/q = 1, E[| X · Y |] ≤ (E[| X |p ])1/p · (E[| Y |q ])1/q . We can apply Hölder’s
θ
1−θ
inequality setting X ≡ exp δ ∆ ρ2 V 1 and Y ≡ exp δ ∆ ρ2 V 2
, choosing p = 1/θ and q = 1/(1 − θ)
and noticing that the exponential function is non-negative. This means that R is convex in V. Define
Γ(u, z, V) ≡ Q(u, z) + R(V), where Q is positive definite in u and z. From the properties of the
function R, it follows that Γ(u, z, V) is monotone increasing in V and convex in u, z and V, so that the
recursive optimization criterion captures risk-aversion. In fact, the larger ρ the larger the convexity of
the function Γ.
A.2. Limit Properties of the Optimization Criterion (1.1).
For ρ > 0 we have that
ρ Vn
n
h
ρ
io
ρ
i
2 h
∆
∆
= min ρ∆cn + 2 ln En exp δ
V n+1
= ρ min ∆cn + ln En exp δ
V n+1
,
un
un
2
ρ
2
ie. V n = min
∆cn +
un
ρ
i
2 h
ln En exp δ ∆ V n+1
ρ
2
.
We proceed by backward induction. Assume that V n+1 is independent of ρ (this being certainly true in
N + 1). Then,
ρ
i
1 h
lim ln En exp δ ∆ V n+1
ρ↓0 ρ
2
=
lim δ
ρ↓0
∆
ρ
1 En exp δ ∆ 2 V n+1 · V n+1
1
= δ ∆ En [ V n+1 ] ,
2
2
En exp δ ∆ ρ2 V n+1
i
where we have used the Hôpital’s rule and moved the derivative operator inside the expectation opera
tor. This implies that for ρ ↓ 0 we have V n = minun ∆cn + δ ∆ En [V n+1 ] , with V n independent of ρ,
while the argmin of the Markovian DLEQG problem converges to that of the corresponding Markovian
DLQG problem for ρ ↓ 0. This implies that the Markovian DLEQG problem encompasses the Markovian
DLQG problem and it can be considered an extension of the latter.
A.3. Equivalence for the Optimization Criterion (1.1) and that of Hansen and Sargent’s.
For ∆ = 1 and ρ > 0 the following holds
Vn
=
min
cn
un
=
min
=
min
cn
un
cn
un
ρ
i
2 h
ln En exp δ V n+1
+
ρ
2
ρ
i
δ 2 h
+
ln En exp δ V n+1
δ ρ
2
0
ρ
2
+ δ 0 ln En exp
V n+1
,
ρ
2
so that our recursive optimization criterion (1.1) corresponds to that proposed by Hansen and Sargent
(1994) for ρ replaced by ρ0 ≡ δρ.
A.4. The Optimization Criterion (1.1) and Epstein and Zin’s Preferences.
Suppose U n solves Epstein and Zin’s recursion
(
U n = max (1 − δ)Cn1−η + δ En
h
U 1−γ
n+1
1−η
i( 1−γ
)
1
)( 1−η
)
,
where 1/η is the elasticity of inter-temporal substitution. Let η = 1. Tallarini (2000) shows that
δ
h
i ( 1−γ
)
.
U n = max Cn1−δ En U 1−γ
n+1
Taking logs,
ln U n = max (1 − δ) ln Cn +
h
i
δ
ln En U 1−γ
,
n+1
1−γ
or equivalently
ln U n
= max
1−δ
ln Cn +
h
i
δ
ln En U 1−γ
.
n+1
(1 − δ)(1 − γ)
We can re-write this as
h
i
ln U n
δ
1−γ
−
= min − ln Cn −
ln En U n+1
.
1−δ
(1 − δ)(1 − γ)
Un
For V n = − ln1−δ
, we have that −(1 − δ)V n = ln U n , so that U n+1 = exp(−(1 − δ)V n+1 ) and
U 1−γ
n+1 = (exp(−(1 − δ)V n+1 ))
1−γ
ii
= exp(−(1 − δ)(1 − γ)V n+1 ) .
Setting ρ0 = −2(1 − δ)(1 − γ), we can write
V n = min
− ln Cn + δ
0
2
ρ
ln
E
exp
V
,
n
n+1
ρ0
2
which corresponds to the optimization criterion (1.1) for ρ =
ρ0
δ ,
− ln Cn equal to a quadratic form in
the control and state vectors, un and zn , and ∆ = 1.
A.5. Proof of Lemma 1.
We can write
exp
ρ
2
Vn
io
ρ
exp min
∆cn + ln En exp δ
V n+1
un
2
2
n
hρ
h
ρ
iio
min exp ∆cn + ln En exp δ ∆ V n+1
un
2
2
n
h ρ
h
ρ
iio
min exp ln exp
∆cn
+ ln En exp δ ∆ V n+1
un
2
2
h
ρ
iio
n
h ρ
∆cn En exp δ ∆ V n+1
min exp ln exp
un
2
2
n
ρ
h
ρ
io
min exp
∆cn En exp δ ∆ V n+1
un
2
2
n h
io
ρ
min En exp
(∆cn + δ ∆ V n+1 )
, so that
un
2
n h
io
ρ
= ln min En exp
(∆cn + δ ∆ V n+1 )
.
un
2
=
=
=
=
=
=
ρ
Vn
2
nρ
h
∆
A.6. Proof of Lemma 3.
Consider the quadratic form Q(u, ), where Q(u, ) = u0 Qu u u + 2u0 Qu + 0 Q . Assume Q admits
a minimum in in that Q is positive definite. The following holds
Z
1
exp − Q(u, ) d 2
∝
1
exp − min Q(u, ) .
2 This is because, for ˆ = argmin Q, we can write Q(u, ) = Q(u, ˆ) + ( − ˆ)0 Q ( − ˆ). In fact, as Q is positive definite and invertible, the minimum of Q with respect to is obtained for ˆ = −Q−1
Q u u
and is equal to Q(u, ˆ) = u0 [Qu u − Qu Q−1
Q u ]u. Then,
Q(u, ) − Q(u, ˆ)
=
0 Q + 0 Q u u + u0 Qu + u0 Qu Q−1
Q u u
=
0 Q − 0 Q ˆ − ˆ0 Q + ˆ0 Q ˆ = ( − ˆ)0 Q ( − ˆ) .
iii
As Q(u, ˆ) = min Q(u, ) is a constant in the integration,
1
exp − Q(u, ) d 2
Z
=
Z
1
1
exp − min Q(u, ) × exp[ − ( − ˆ)0 Q ( − ˆ)] d .
2 2
Let ∆ denote (−ˆ). Because Q is positive definite, for ∆ integrated over Rp where p is the dimension
R
of , we find that exp( − 21 ∆0 Q ∆) d ∆ = (2π)p/2 det(Q )−1/2 . It follows
Z
1
exp − Q(u, ) d 2
=
(2π)
p/2
−1/2
det(Q )
1
× exp − min Q(u, ) ,
2 where the constant (2π)p/2 det(Q )−1/2 is independent of u.
Suppose that we solve the program minu
R
exp − 12 Q(u, ) . Assume that Q admits a saddle point
with respect to and u, so that maxu min Q(u, ) exists. This is the case if Q > 0 and Qu u −
Qu Q−1
Q u < 0 hold. As a corollary of the former result we have
Z
min
u
1
exp − Q(u, ) d 2
1
1
∝ min exp − min Q(u, ) = exp − max min Q(u, ) . 2
u
2 2 u
A.7. The (Discounted) Future Stress Backward Recursion.
From Lemma 2 we know that if V n+1 is a quadratic form in zn+1
ρ
exp
Vn
2
ρ
min max S n
constant × exp
2 un n+1
=
ρ
= exp
γn + min max S n ,
un n+1
2
for γn a constant independent of zn . This implies that V n = γn + minun maxn+1 S n . Then, assume
that V n+1 = νn+1 + z0n+1 Πn+1 zn+1 . Since S n = ∆cn − ρ1 dn+1 + δ ∆ V n+1 , it follows that
1
∆
∆ 0
min max ∆cn − dn+1 + δ νn+1 + δ zn+1 Πn+1 zn+1
un
n+1
ρ
1
∆
∆ 0
δ νn+1 + min max ∆cn − dn+1 + δ zn+1 Πn+1 zn+1
un
n+1
ρ
min max S n
un
n+1
=
=
=
δ ∆ νn+1 + z0n Πn zn = δ ∆ νn+1 + Fn (zn ) ,
so that V n = γn + minun maxn+1 S n = νn + Fn (zn ), with νn = γn + δ ∆ νn+1 and Fn (zn ) = z0n Πn zn .
iv
A.8. The Value Function.
Consider that (for p the dimension of the vector n+1 )
h
ρ
i
En exp
(∆cn + δ ∆ V n+1 )
2
ρ
exp
Vn
2
ρ
1
(∆cn + δ ∆ V n+1 ) − 0n+1 (∆N)−1 n+1 dn+1
2
2
Z
Sn
= (2π)−p/2 det(∆N)−1/2 exp ρ
dn+1 , so that
2
Z
Sn
−p/2
−1/2
= (2π)
det(∆N)
min exp ρ
dn+1 .
un
2
=
(2π)−p/2 det(∆N)−1/2
Z
exp
The function −ρS n is a quadratic form in un and n+1 , which we can write as u0n Suu un +2u0n Su n+1 +
0n+1 S n+1 , with S = (∆N)−1 − δ ∆ ρΠn+1 . We can apply Lemma 3. From its proof we know that
Z
min
un
Sn
exp ρ
dn+1
2
(2π)p/2 det((∆N)−1 − δ ∆ ρΠn+1 )−1/2 × exp
=
ρ
min max S n .
2 un n+1
Notice that (∆N)−1 − δ ∆ ρΠn+1 = (∆N)−1 (I − δ ∆ ρ∆NΠn+1 ), so that det((∆N)−1 − δ ∆ ρΠn+1 ) =
det(I − δ ∆ ρ∆NΠn+1 )/det(∆N). Therefore,
ρ
exp
Vn
2
∆
= det(I − δ ρ∆N Πn+1 )
This implies that in A.7. γn =
2
ρ
−1/2
Z
min
un
Sn
exp ρ
2
dn+1 .
ln(det(I − δ ∆ ρ∆N Πn+1 )−1/2 ) = − ρ1 ln(det(I − δ ∆ ρ∆N Πn+1 )),
while νn = γn + δ ∆ νn+1 . In steady state V n = F (zn ) + ν, where F (zn ) = z0n Πzn , while
ν = −
1
1 − δ∆
1
ln(det(I − δ ∆ ρ∆N Π)) .
ρ
For ρ ↓ 0 we can prove that γn → δ ∆ Tr(∆NΠn+1 ). This also implies that in steady state ρ ↓ 0
ν →
δ∆
1−δ ∆ Tr(∆NΠ).
To show this result consider that calculate the limit for ρ ↓ 0 of γn we need to
apply Hôpital’s rule. Then, we should use the following results which apply to any invertible matrix Ξ
d ln(det(Ξ))
1
d det(Ξ)
=
dy
det(Ξ)
dy
and
d det(Ξ)
−1 d Ξ
= det(Ξ) Tr Ξ
.
dy
dy
Then,
d ln det I − δ ∆ ρ∆NΠn+1
dρ
=
− δ ∆ Tr (I − δ ∆ ρ∆NΠn+1 )−1 ∆NΠn+1 .
For ρ ↓ 0 this converges to −δ ∆ Tr(∆NΠn+1 ). Applying Hôpital’s rule we conclude that γn → δ ∆ Tr(∆NΠn+1 ),
so that in steady state ν →
δ∆
1−δ ∆ Tr(∆NΠ).
v
A.9. The L̃-Recursion.
Suppose φ(z) = δ ∆ z0 Πz, so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) −
1 0
−1
].
ρ (∆N)
Taking first
derivatives, we find that
2
1
(∆N)−1
ρ
δ∆ Π −
+ 2δ ∆ Π z = 0 ⇔ ˜ = −
δ∆ Π −
1
(∆N)−1
ρ
−1
−1 ∆
δ ∆ Π z = − Π̌
δ Πz,
which identifies a maximum for Π̌ negative definite. Plugging this formula in the expression for L̃φ(z),
we find that
L̃φ(z)
=
−
1 0
˜ (∆N)−1 ˜ + (z + ˜)0 δ ∆ Π (z + ˜)
ρ
=
−
1 0 ∆
−1
−1
−1
−1
z δ Π Π̌ (∆N)−1 Π̌ δ ∆ Π z + z0 (I − Π̌ δ ∆ Π)0 δ ∆ Π (I − Π̌ δ ∆ Π) z .
ρ
Now,
−
1 ∆
−1
−1
−1
−1
δ Π Π̌ (∆N)−1 Π̌ δ ∆ Π + (I − Π̌ δ ∆ Π)0 δ ∆ Π (I − Π̌ δ ∆ Π) =
ρ
1 ∆
−1
−1
−1
−1
−1
δ Π Π̌ (∆N)−1 Π̌ δ ∆ Π + δ ∆ Π − 2 δ ∆ Π Π̌ δ ∆ Π + δ ∆ Π Π̌ δ ∆ Π Π̌ δ ∆ Π =
ρ
1
−1
−1
−1
δ ∆ Π − 2 δ ∆ Π Π̌ δ ∆ Π + δ ∆ Π Π̌
Π − (∆N)−1 Π̌ δ ∆ Π =
ρ
−
δ ∆ Π − 2 δ ∆ Π Π̌
−1 ∆
−1 ∆
δ Π + δ ∆ Π Π̌
−1 ∆
δ Π = δ ∆ Π − δ ∆ Π Π̌
−1 ∆
δ Π = δ ∆ Π [I − Π̌
Consider that
1
(∆N)−1
ρ
−1
1
δ ∆ Π − (∆N)−1
ρ
δ∆ Π −
I−
δ∆ Π −
1
(∆N)−1
ρ
=
1
1
δ ∆ (∆N)−1 ρ∆N Π − (∆N)−1 , so that
ρ
ρ
=
−1
δ ∆ ρ∆N Π − I
δ∆ Π
−1
ρ∆N . Hence,
−1 ∆
I + I − δ ∆ ρ∆N Π
δ ρ∆NΠ .
=
Since for I + A invertible (I + A)−1 = I − (I + A)−1 A, we find that
−1
1
∆
−1
I − δ Π − (∆N)
δ∆ Π
ρ
"
∆
δ Π I−
1
δ Π − (∆N)−1
ρ
∆
−1
−1
I − δ ∆ ρ∆N Π
=
#
∆
δ Π
vi
and hence
−1
= δ ∆ Π I − δ ∆ ρ∆N Π
.
δ Π] .
Then,
"
0 ∆
L̃φ(z) = z δ Π I −
1
δ Π − (∆N)−1
ρ
∆
−1
#
−1
δ Π z = z0 δ ∆ Π I − δ ∆ ρ∆N Π
z.
∆
For Π invertible,
δ ∆ Π (I − ρ ∆Nδ ∆ Π)−1 = δ ∆ Π [((δ ∆ Π)−1 − ρ ∆N) δ ∆ Π]−1 = ((δ ∆ Π)−1 − ρ ∆N)−1 .
e z, where Π
e = ((δ ∆ Π)−1 − ρ ∆N)−1 if Π is invertible and δ ∆ Π I − δ ∆ ρ∆N Π −1
We conclude that L̃φ(z) = z0 Π
otherwise.
A.10. Second Order Conditions for the L̃-Recursion.
Consider that the second order condition for the maximization in the L̃-recursion is that δ ∆ Π −
1
−1
ρ (∆N)
being negative definite. Now, as this is a symmetric matrix, there exists a coordinate transfor-
mation which diagonalizes it. This matrix will be negative definite iff all its eigenvalues are negative,
or equivalently iff its elements on the main diagonal are negative, suggesting that is possible to proceed
as in the scalar case. Hence, suppose that Π is invertible. The condition δ ∆ Π −
∆
−1
equivalent to (δ Π)
1
−1
ρ (∆N)
< 0 is
− ρ∆N > 0, as the elements on the main diagonal of the former matrix will be
negative iff those on the latter are positive, or equivalently the former matrix is negative definite iff the
latter is positive definite. We therefore establish that a solution to the L̃-recursion exists if an only if
(δ ∆ Π)−1 − ρ∆N is positive definite.
A.11. The L-Recursion.
Suppose that Π is positive semi-definite and Q and R are positive definite. This will be true if the cost
function c is positive definite in u and z (that Q and R are positive definite when the cost function is
a positive definite quadratic form is obvious; that in this case Π is also positive semi-definite will be
shown below). Assume also that the condition in Theorem 2 for the DLEQG problem to have a proper
e (as long as its inverse, (δ ∆ Π)−1 − ρ∆N) is positive definite.
solution holds, so that Π
In solving the L-recursion consider that we need to solve
min [∆c + φ((I + A ∆)z + B ∆ u)] ,
u
e z and ∆c = u0 Q ∆ u + z0 R ∆ z + 2u0 S ∆ z. The first order condition is
where φ(z) = z0 Π
e B u + S z + B0 Π
e (I + A ∆) z = 0 , so that
2 ∆ Q u + ∆ B0 Π
−1
0e
e
u = K z with K = − (Q + ∆ B Π B)
S + B Π (I + A ∆) .
0
e are positive definite a minimum is certainly reached since the denominator in the expression
As Q and Π
for the optimal control is also positive definite. Plugging the optimal control vector into the L-recursion,
vii
standard algebra shows that Lφ(z) = z0 Πz, where
−1 e (I+A∆) − S0 + (I + A∆)0 Π
eB
eB
e (I + A∆) .
Π = ∆R + (I+A∆)0 Π
Q + ∆B0 Π
∆S + ∆B0 Π
A.12. Second Order Conditions for the L-Recursion.
e n+1 is positive definite. Now, consider that
Suppose that in n + 1 Πn+1 is positive semi-definite, while Π
Lφ(zn ) = min un [∆c(zn , un ) + ((I + A∆) zn + ∆B un )0 Π̃n+1 ((I + A∆) zn + ∆B un )] .
e n+1 is positive definite,
Since the cost function c is a positive definite quadratic form in un and zn and Π
Lφ(zn ) is non-negative and therefore Πn must be positive semi-definite. As in N ΠN is equal to 0,
by induction we prove that in any period n the L-recursion has the solution discussed in the proof of
Theorem 2, as the second order condition of the minimization is always respected, while the matrix Πn
is positive semi-definite.
That the cost function c is a positive definite quadratic form in un and zn is a sufficient condition for
the DLEQG problem to have the recursive solution presented in Theorem 2, but it is not necessary. If
e n+1 B is positive
this assumption is abandoned, it will be necessary to verify that the matrix Q + ∆B0 Π
definite in any period n.
B. Markovian DLEQG Problems in Continuous Time
B.1. The Continuous-time Limit.
In A.9 we have shown that in the L̃-recursion, with φ(z) = z0 Πz, a maximum is found for
˜ = −
1
δ Π − (∆ N)−1
ρ
∆
−1
e z , with Π
e = ((δ ∆ Π)−1 − ρ∆ N)−1 .
δ ∆ Π z , so that L̃φ(z) = z0 Π
Importantly, whatever Π, for ∆ ↓ 0 a maximum in the L̃-recursion above is found for ˜ = 0. In fact,
when ∆ ↓ 0, for 6= 0, −(1/ρ)0 (∆N)−1 ↓ −∞ and hence a maximum for the L̃-recursion with respect
e → Π.
to is found for ˜ = 0, irrespective of Π. Notice also that for ∆ ↓ 0, Π
In A.11 we have analyzed the L-recursion, showing that in the minimization
min [∆c + φ((I + A ∆)z + B ∆ u)] ,
u
e z and ∆c = u0 Q ∆ u + z0 R ∆ z + 2u0 S ∆ z, the first order condition is
with φ(z) = z0 Π
e B u + S z + B0 Π
e (I + A ∆) z = 0 , so that
2 ∆ Q u + ∆ B0 Π
viii
−1
0e
e
S + B Π (I + A ∆) .
u = K z with K = − (Q + ∆ B Π B)
0
e → Π, K → − Q−1 (S + B0 Π). Thus, in the limit a sufficient condition for the DLEQG
For ∆ ↓ 0, as Π
problem to have the recursive solution discussed in Theorem 3 is that Q being positive definite.
Inserting the expression above for u in the minimizing function we find Lφ(z) = z0 Π− z, where
e + ∆ R + K0 Q K + 2S0 K + A0 Π + Π
e A + 2Π B K
= Π
Π−
e A + K0 B Π
e B K + 2A0 Π
e BK ,
+ ∆2 A0 Π
so that applying the double recursion in n, Fn = LL̃Fn+1 , we find that
Πn
=
e n+1 + ∆ R + K0 Q Kn + 2S0 Kn + A0 Π
e n+1 + Π
e n+1 A + 2Π
e n+1 B Kn + o(∆2 ) ,
Π
n
e n+1 = ((δ ∆ Πn+1 )−1 − ρ∆ N)−1 = δ ∆ Πn+1 (I − ρ∆ N δ ∆ Πn+1 )−1 . This implies that
where Π
Πn − Πn+1
∆
e n+1 − Πn+1
Π
o(∆2 )
0
0
0e
e
e
+ R + Kn Q Kn + 2S Kn + A Πn+1 + Πn+1 A + 2Πn+1 B Kn +
∆
∆
2
e n+1 − Πn+1
Π
e n+1 + Π
e n+1 A + K0 Q Kn + 2S0 Kn + 2Π
e n+1 B Kn + o(∆ ) .
+ R + A0 Π
n
∆
∆
=
=
Now, given the expression for Kn ,
K0n Q
0
0e
−1
e
e
+ 2 (S + Πn+1 B) Kn = (S + Πn+1 B) 2I − (Q + ∆ B Πn+1 B) Q + o(∆) Kn =
0
0
e n+1 B)
− (S + Π
e n+1 B)
− (S0 + Π
0
e n+1 B)−1 Q + o(∆)
2I − (Q + ∆ B Π
e n+1 B)−1 Q
2I − (Q + ∆ B0 Π
−1
0e
e
(Q + ∆ B Πn+1 B)
S + B Πn+1 (I + A ∆) =
0
e n+1 B)−1 S + B0 Π
e n+1 + o(∆) + o(∆2 ) .
(Q + ∆ B0 Π
e n+1 = lim∆↓0 Πn+1 = Π(t), we conclude that
Therefore, considering that lim∆↓0 Π
e n+1 B) Kn
lim K0n Q + 2 (S0 + Π
∆↓0
= − (S0 + Π(t) B) Q−1 (S + B0 Π(t)) .
In addition,
lim
∆↓0
Πn − Πn+1
d Π(t)
= −
and
∆
dt
e n+1 + Π
e n+1 A = A0 Π(t) + Π(t) A .
lim A0 Π
∆↓0
ix
Finally, consider that
e n+1 − Πn+1 = Πn+1 δ ∆ (I − ρ∆ N δ ∆ Πn+1 )−1 − I .
Π
For ∆ small we can write (I − ρ∆ N δ ∆ Πn+1 )−1 = I + ρ∆ N δ ∆ Πn+1 + o(∆2 ), so that
e n+1 − Πn+1 = Πn+1 (δ ∆ − 1) I + ρδ 2∆ ∆ N Πn+1 ) + o(∆2 ) .
Π
It follows that
e n+1 − Πn+1
Π
∆
=
δ∆ − 1
∆
Πn+1 + ρ δ 2∆ Πn+1 N Πn+1 + o(∆) .
In conclusion, we find that
lim
∆↓0
e n+1 − Πn+1
Π
∆
=
ln δ Π(t) + ρ Π(t) N Π(t)
and that the continuous-time counterpart of the Riccati equation is
d Π(t)
+ R + A0 Π(t) + Π(t) A − (S0 + Π(t) B) Q−1 (S + B0 Π(t)) + ln δ Π(t) + ρ Π(t) N Π(t)
dt
=
0.
B.2. The Value Function in the Continuous-time Limit.
In A.8 and A.9 we have seen that V n = νn +Fn (zn ), where νn = γn +δ ∆ νn+1 and γn = − ρ1 ln det I − δ ∆ ρ∆NΠn+1 .
We can then write that
∆
δ −1
νn+1 − νn
νn+1 +
∆
∆
The limit of the l.h.s. for ∆ ↓ 0 is ln δ ν(t) +
=
d ν(t)
dt .
1 1
ln det I − δ ∆ ρ∆NΠn+1 .
∆ ρ
As for the r.h.s. we need to apply Hôpital’s rule.
Recalling that for any invertible matrix Ξ
d ln(det(Ξ))
dy
=
−1
Tr Ξ
dΞ
dy
,
we see that
d ln det I − δ ∆ ρ∆NΠn+1
d∆
=
− δ ∆ (1 + ∆ ln δ) ρ Tr (I − δ ∆ ρ∆NΠn+1 )−1 NΠn+1 .
For ∆ ↓ 0 this converges to −ρTr(NΠ(t)). This implies that in the limit νn converges to ν(t) which
solves the following differential equation
d ν(t)
+ ln δ ν(t) + Tr(NΠ(t))
dt
x
=
0.
B.3. The Continuous-time Limit of the Kalman Filter.
Consider that zn = (I + A∆)zn−1 + B∆un−1 + n , while wn = Czn−1!+ η n . In addition, we know that
∆N
L
the perturbations (0n , η n ) possess covariance matrix
. Then, application of Kalman
1
0
L
∆M
filter implies that
ẑn
(I + A∆)ẑn−1 + B∆un−1 + (L + (I + A∆)Ωn−1 C0 )
=
1
M + C Ωn−1 C0
∆
−1
(wn − C ẑn−1 ) ,
where the conditional covariance matrix Ωn respects the Riccati recursion
Ωn
∆N + (I + A∆) Ωn−1 (I + A∆)0 − (L + (I + A∆)Ωn−1 C0 )
=
1
M + C Ωn−1 C0
∆
−1
×
(L + (I + A∆)Ωn−1 C0 )0 .
The former entails that
ẑn − ẑn−1
∆
Notice that
ẑn − ẑn−1
∆
For ∆ ↓ 0,
= Aẑn−1 + Bun−1
1
∆M
+ C Ωn−1 C0
−1
1
+
(L + (I + A∆)Ωn−1 C0 )
∆
= ∆ (M + ∆C Ωn−1 C0 )
−1
1
M + C Ωn−1 C0
∆
−1
dẑ(t)
dt ,
→
d ẑ(t)
dt
(wn − C ẑn−1 ) ,
, so that
= Aẑn−1 + Bun−1 + (L + (I + A∆)Ωn−1 C0 ) (M + ∆C Ωn−1 C0 )
ẑn − ẑn−1
∆
−1
(wn − C ẑn−1 ) ,
ẑn−1 → ẑ(t), un−1 → u(t), wn → w(t), Ωn−1 → Ω(t). We conclude that
=
A ẑ(t) + B u(t) + (L + Ω(t) C0 )M−1 (w(t) − C ẑ(t)) .
As for the limit behavior Ωn consider that we can write
1
(Ωn − Ωn−1 )
∆
For ∆ ↓ 0,
=
Ωn − Ωn−1
∆
d Ω(t)
dt
−1
N + AΩn−1 + Ωn−1 A0 − (L + Ωn−1 C0 ) (M + ∆C Ωn−1 C0 )
→
=
dΩ(t)
dt
and hence
N + A Ω(t) + Ω(t) A0 − (L + Ω(t) C0 )M−1 (L + Ω(t) C0 )0 .
xi
(L + Ωn−1 C0 )0 + o(∆) .
C. Markovian DLEQG Problems with Deterministic Disturbances
C.1. The Recursive Optimization with Deterministic Disturbances to the Plant Equation.
Assume that V n+1 = z0n+1 Πn+1 z0n+1 − 2ϑ0n+1 zn+1 + νn+1 . Then, consider
exp
ρ
2
Vn
=
h
ρ
i
min E exp
∆cn + δ ∆ V n+1
.
un
2
Given that zn+1 is a linear function of n+1 , this expression is equal to
exp
ρ
2
Vn
=
min
un
(2π)−p/2
det(∆N)1/2
Z ρ
1 0
∆
−1
exp
∆cn + δ V n+1 − n+1 (∆N) n+1
d n+1 .
2
2
Given the expression for V n+1 , the argument inside the exponential can be written as ρ2 κn + Q(n+1 ),
where κn contains terms independent of n+1 , while Q(n+1 ) = − 21 0n+1 Q, n+1 + Q0 n+1 with
Q, = (I − δ ∆ ρΠn+1 ∆N)(∆N)−1
and
e n + B∆un ) − δ ∆ ρ ϑn+1 .
Q = δ ∆ ρ Πn+1 (Az
n+1 ) = 21 Q0 Q−1
For n+1 = ˆn+1 = Q−1
, Q . Moreover, and
, Q , Q(n+1 ) has a maximum equal to Q(ˆ
more importantly, even if Q is not an homogeneous quadratic form, it is immediate to see that
Q(n+1 )
max Q(n+1 ) −
=
n+1
1
(n+1 − ˆn+1 )0 Q, (n+1 − ˆn+1 ) .
2
In fact, − 12 (n+1 −ˆn+1 )0 Q, (n+1 −ˆn+1 ) = − 21 0n+1 Q, n+1 − 21 ˆ0n+1 Q, ˆn+1 +ˆ0n+1 Q, n+1 . Substi1 0
1 0 −1
0
tuting ˆn+1 with Q−1
, Q , we find − 2 n+1 Q, n+1 +Q n+1 − 2 Q Q, Q = Q(n+1 )−maxn+1 Q(n+1 ).
Then,
exp
ρ
2
Vn
=
(2π)−p/2
min
det(∆N)1/2 un
=
(2π)−p/2
det(∆N)1/2
Z ρ
1
exp
κn + max Q(n+1 ) − (n+1 − ˆn+1 )0 Q, (n+1 − ˆn+1 )
d n+1
n+1
2
2
Z ρ
1 0
min exp
κn + max Q(n+1 )
exp − ∆ Q, ∆
d∆ ,
un
n+1
2
2
where ∆ = n+1 − ˆn+1 . This implies that
exp
ρ
2
Vn
=
(2π)−p/2
(2π)p/2
min
1/2
det(∆N)
det(Q, )1/2 un
exp
ρ
κn + max Q(n+1 )
n+1
2
Considering that det(Q, ) = det(I − δ ∆ ρΠn+1 ∆N)/ det(∆N), we find that
exp
ρ
2
Vn
=
=
ρ
det(I − δ ∆ ρΠn+1 ∆N)−1/2 exp min
κn + max Q(n+1 )
un
n+1
2
ρ
det(I − δ ∆ ρΠn+1 ∆N)−1/2 exp min max
κn + Q(n+1 )
.
un n+1 2
xii
It is immediate to see that, given the definitions of Q(n+1 ) and S n ,
ρ
κn + Q(n+1 )
2
=
=
exp
ρ
2
Vn
=
1
∆cn + δ ∆ z0n+1 Πn+1 z0n+1 − 2δ ∆ ϑ0n+1 zn+1 − 0n+1 (∆N)−1 n+1 + δ ∆ νn+1
ρ
ρ
S n + δ ∆ νn+1
and hence even in this case
2
ρ
det(I − δ ∆ ρΠn+1 ∆N)−1/2 exp
min max S n + δ ∆ νn+1
.
2 un n+1
ρ
2
This implies that as in the homogenous case (see A.7 and A.8)
Vn
γn + δ ∆ νn+1 + min max S n , where γn = −
=
un
n+1
1
log(det(I − δ ∆ ρΠn+1 ∆N)) .
ρ
C.2. The L̃-Recursion with Deterministic Disturbances to the Plant Equation.
Suppose φ(z) = δ ∆ z0 Πz − 2δ ∆ ϑ0 z + δ ∆ ν, so that L̃ φ(z) = max [(z + )0 δ ∆ Π(z + ) − 2δ ∆ ϑ0 (z + ) +
δ∆ ν −
1 0
−1
].
ρ (∆N)
Taking first derivatives, we find that
2
˜ = −
δ∆ Π −
1
(∆N)−1
ρ
+ 2δ ∆ (Π z − ϑ) = 0
⇔
−1
−1
1
1
δ ∆ Π − (∆N)−1
δ ∆ Π z + δ ∆ Π − (∆N)−1
δ∆ ϑ ,
ρ
ρ
ie. ˜ = ˜o + ˜e , with ˜o = −Π̌
−1 ∆
δ Πz and ˜e = Π̌
−1 ∆
δ ϑ. Plugging this formula in the expression for
L̃φ(z), we find that
L̃φ(z)
=
=
−
1
(˜o + ˜e )0 (∆N)−1 (˜o + ˜e ) + (z + ˜o + ˜e )0 δ ∆ Π (z + ˜o + ˜e ) − 2 δ ∆ ϑ0 (z + ˜o + ˜e ) + δ ∆ ν
ρ
1
1
− ˜0o (∆N)−1 ˜o + (z + ˜o )0 δ ∆ Π (z + ˜o ) − ˜0e (∆N)−1 ˜e + ˜0e δ ∆ Π ˜e − 2 δ ∆ ϑ0 ˜e + δ ∆ ν
ρ
ρ
|
{z
} |
{z
}
ez
z0 Π
independent of z
2
− ˜0e (∆N)−1 ˜o + 2 ˜0e δ ∆ Π (z + ˜o ) − 2 δ ∆ ϑ0 (z + ˜o ) .
ρ
|
{z
}
linear in z
e z” correspond to the function L̃φ(z) of the homogenous formulation (See
The terms tagged as “z0 Π
A.9), so that we concentrate on those tagged as “independent of z” and “linear z”. Thus, starting from
−1 ∆
the terms “independent of z”, consider that, as ˜e = Π̌
xiii
δ ϑ, the sum of the first three can be written
as
δ
2∆
1 −1
−1
−1
−1
−1
−1
∆
ϑ − Π̌ (∆N) Π̌
+ δ Π̌ Π Π̌
− 2 Π̌
ϑ
ρ
1
−1
−1
−1
−1
2∆ 0
∆
Π̌
− 2 Π̌
ϑ
δ ϑ Π̌
δ Π − (∆N)
ρ
0
=
=
−1
− δ 2∆ ϑ0 Π̌
ϑ.
This implies that sum of the terms “independent of z” is equal to a constant ν̃, where
ν̃
=
∆
δ ν − δ
2∆
−1
−1
1
∆
−1
ϑ δ Π − (∆N)
ϑ = δ ∆ ν + δ 2∆ ρ ϑ0 ∆N I − δ ∆ ρ Π ∆N
ϑ.
ρ
0
The terms “linear in z” can be re-written as follows
1 0
2
0 ∆
−1
∆ 0
˜ (∆N) − ˜e δ Π + δ ϑ (z + ˜o ) + ˜0e (∆N)−1 z .
−2
ρ e
ρ
Given the expression for ˜e we find that the sum of the first three terms is equal to
1 ∆ 0 −1
−1
∆ 0 −1 ∆
∆ 0
δ ϑ Π̌ (∆N) − δ ϑ Π̌ δ Π + δ ϑ (z + ˜o )
−2
ρ


=


1
−1

∆
−1 
δ Π − (∆N)
−2 δ ϑ I − Π̌
 (z + ˜o ) = 0 , while
ρ


|
{z
}
∆
0
Π̌
2 0
−1
˜ (∆N)−1 z = 2 δ ∆ ϑ0 Π̌ (ρ∆N)−1 z .
ρ e
0
−1
e z, with −ϑ
e = δ ∆ (− ρ∆N)−1 Π̌
which we can write as −2ϑ
−1
(− ρ∆N)−1 Π̌
(−ϑ). Now,
=
(− ρ∆N)−1 (δ ∆ Π − (ρ∆N)−1 )−1
=
(− ρ∆N)−1 [(I + δ ∆ Π(− ρ∆N))(−ρ∆N)−1 )]−1
=
(I − δ ∆ ρ Π∆N)−1 .
e 0 z+ν̃, with ϑ
e = δ ∆ (− ρ∆N)−1 Π̌−1 ϑ = δ ∆ (I−δ ∆ ρΠ ∆N)−1 ϑ
e
ϑ
Then, we conclude that Lφ(z) = z0 Πz−2
−1
and ν̃ = δ ∆ ν + δ 2∆ ρϑ0 ∆N I − δ ∆ ρΠ∆N
ϑ. Notice that for Π invertible we can also write
e
ϑ
=
δ ∆ ((δ ∆ Π)−1 − ρ ∆N)−1 (δ ∆ Π)−1 ϑ
=
e Π−1 ϑ .
((δ ∆ Π)−1 − ρ ∆N)−1 Π−1 ϑ = Π
xiv
C.3. The L-Recursion with Deterministic Disturbances to the Plant Equation.
Suppose that φ(z) = z0 Πz − 2ϑ0 z + ν. Then, consider that
Lφ(z)
=
e z + B∆ u + µ∆)0 Π (A
e z + B∆u + µ∆) − 2 ϑ0 (A
e z + B∆ u + µ∆) + ν] ,
min [∆c(z, u) + (A
c(z, u)
=
u0 Q u + 2 u0 S z + z R z .
u
The first order condition is
e z + B0 (Π µ∆ − ϑ) = 0 ,
(Q + ∆B0 Π B) u + (S + BΠA)
so that the optimal control is uo + ue where
uo
=
e z,
Ko z = − (Q + ∆B0 Π B)−1 (S + B ΠA)
ue
=
Ke (Π µ∆ − ϑ) = − (Q + ∆B0 Π B)−1 B0 (Π µ∆ − ϑ) .
Plugging the expressions for uo and ue into Lφ(z), we find that
z0 Π−1 z − 2 ϑ0−1 z + ν−1 = z0 R∆ z + (uo + ue )0 ∆Q (uo + ue ) + 2 z0 S0 ∆ (uo + ue ) +
e z + B∆(uo + ue ) + µ∆]0 Π [A
e z + B∆(uo + ue ) + µ∆] − 2 ϑ0 [A
e z + B∆(uo + ue ) + µ∆] + ν =
[A
e z + B∆uo )0 Π (A
e z + B∆uo ) +
z0 R∆ z + u0o Q∆ uo + 2 z0 S0 ∆ uo + (A
{z
}
|
quadratic in z
e z ] + 2 (Π µ∆ − ϑ)0 (Az
e + B∆ uo ) +
2 u0 [(Q∆ + B0 ∆ΠB∆)uo + 2 (S∆ + B0 ∆ Π A)
{z
}
| e
linear in z
u0e Q∆ ue + (B∆ue + µ∆)0 Π (B∆ue + µ∆) − 2 ϑ0 (B∆ue + µ∆) + ν .
|
{z
}
independent of z
The terms in the right hand side of this equation tagged as “quadratic in z” will correspond to z0 Π−1 z”,
the terms tagged as “linear in z” to will correspond to −2ϑ0−1 z, while those tagged “independent of z
will correspond to ν−1 . In particular, as
z0 Π−1 z
e z + B∆uo ]0 Π [A
e z + B∆uo ]
= z0 R∆ z + u0o Q∆ uo + 2 z0 S0 uo + [A
corresponds to the Riccati recursion of the homogenous LQG formulation, standard results indicate that
e 0 ΠA
e − (S0 + A
e 0 ΠB)(Q + ∆B0 ΠB)−1 (S∆ + B0 ∆ΠA)
e .
Π−1 = R∆ + A
Consider hence the sum of the terms “linear in z”, corresponding to −2ϑ0−1 z. Given that uo = Kz and
xv
e we see that the first term in parentheses is null so that
that K = (Q + ∆B0 ΠB)−1 (S + B0 Π A),
e + B∆K .
−2ϑ0−1 z = 2 (Πµ∆ − ϑ)0 Γ z , i.e. ϑ−1 = Γ0 (ϑ − Π µ∆) with Γ = A
Consider hence the sum of the terms “independent of z”, corresponding to ν−1 . We can write ue =
Kµ µ∆ + Kϑ ϑ, where Kµ = −(Q + ∆B0 ΠB)−1 B0 Π and Kϑ = (Q + ∆B0 ΠB)−1 B0 , and B∆ue + µ∆ =
(I + B∆Kµ )µ∆ + B∆Kϑ ϑ. Substituting out ue and B∆ue + µ∆ and regrouping terms, one finds that
!
ν−1
=
µ∆
0
K0µ (Q∆
0
+ B ∆ΠB∆)Kµ + Π +
2K0µ B0 ∆Π
µ∆ +
!
0
K0ϑ (Q∆
ϑ
0
0
+ B ∆ΠB∆)Kϑ − 2B ∆Kϑ ϑ +
!
2ϑ
0
K0ϑ (Q∆ + B0 ∆ΠB∆)Kµ + K0ϑ B0 ∆Π − I − B∆Kµ µ∆ + ν .
Substituting out Kµ and Kθ , we can show that the three terms in parentheses are equal respectively to
(I − BJ−1 B0 ∆)Π, −BJ−1 B0 ∆ and −(I − BJ−1 B0 ∆Π), where J = (Q + B0 ΠB∆). This implies that
ν−1
=
µ∆0 (I − BJ−1 B0 ∆)Πµ∆ − 2 ϑ0 (I − BJ−1 B0 ∆Π)µ∆ − ϑ0 BJ−1 B0 ∆ϑ + ν .
C.4. The Value Function with Deterministic Disturbances to the Plant Equation.
We just combine results in C.1, C.2 and C.3. From C.1 we know that if V n+1 = z0n+1 Πn+1 z0n+1 −
2ϑ0n+1 zn+1 + νn+1 . Then
Vn
=
γn + min max ∆cn + δ
un
n+1
∆
z0n+1 Πn+1
z0n+1
− 2δ
∆
ϑ0n+1 zn+1
1
− 0n+1 (∆N)−1 n+1 + δ ∆ νn+1
ρ
with γn = − ρ1 log(det(I − δ ∆ ρΠn+1 ∆N)). From C.2 we know that
1
max ∆cn + δ ∆ z0n+1 Πn+1 z0n+1 − 2δ ∆ ϑ0n+1 zn+1 − 0n+1 (∆N)−1 n+1 + δ ∆ νn+1 =
n+1
ρ
e0
e n+1 z0
en+1 , where
z0n+1 Π
n+1 − 2ϑn+1 zn+1 + ν
((δ ∆ Πn+1 )−1 − ρ ∆N)−1 ,
e n+1
Π
=
e n+1
ϑ
= δ ∆ (I − δ ∆ ρΠn+1 ∆N)−1 ϑn+1 = ((δ ∆ Πn+1 )−1 − ρ ∆N)−1 Π−1
n+1 ϑn+1 ,
ν̃n+1
=
−1
δ ∆ νn+1 + δ 2∆ ρ ϑ0n+1 ∆N I − δ ∆ ρ Πn+1 ∆N
ϑn+1 .
xvi
,
Then, from C.3 we find that V n = z0n Πn z0n − 2ϑ0n zn + νn , where
Πn
=
e 0Π
e n+1 A
e − (S0 + A
e 0Π
e n+1 B)(Q + ∆B0 Π
e n+1 B)−1 (S∆ + B0 ∆Π
e n+1 A)
e ,
R∆ + A
ϑn
=
e n+1 − Π
e n+1 µ
Γ0n (ϑ
n+1 ∆) ,
νn
=
e−1 B0 ∆)Π
e n+1 µn+1 ∆
γn + ν̃n+1 + µ0n+1 ∆(I − BJ
n+1
e 0 (I − BJ
e0
e−1 B0 ∆Π
e n+1 )µ
e−1 0 e
− 2ϑ
n+1 ∆ − ϑn+1 BJn+1 B ∆ϑn+1
n+1
n+1
en+1 = (Q + B0 Π
e n+1 B∆).
and J
C.5. The Continuous-time Limit with Pre-determined Disturbances.
In C.3 we have seen that if zn = (I + A∆)zn−1 + B∆un−1 + µn ∆ + n , then,
ϑ0n
=
e 0 [ I + ∆ (A + B Kn ) ] − ∆µ0 Π
e
ϑ
n+1
n+1 n+1 [ I + ∆ (A + B Kn ) ] or equivalently
ϑn − ϑn+1
∆
e n+1 − ϑn+1
ϑ
e n+1 − Π
e n+1 µn+1 + o(∆) .
+ (A + B Kn )0 ϑ
∆
=
e n+1
e n+1 = lim∆↓0 Πn = Π(t), lim∆↓0 ϑ
Now, lim∆↓0 Π
n+1
lim∆↓0 ϑn −ϑ
∆
=
− d ϑ(t)
dt .
In addition, lim∆↓0 Kn = −Q
= lim∆↓0 ϑn = ϑ(t), lim∆↓0 µn+1 = µ(t),
−1
(S + B0 Π(t)). We conclude that
lim (A + B Kn ) = A − B Q−1 S − JΠ(t) ,
e n+1 µ
lim Π
n+1 = Π(t) µ(t) ,
∆↓0
∆↓0
e n+1 = Π
e n+1 Π−1 ϑn+1 ,
with J = B Q−1 B0 . Since ϑ
n+1
e n+1 − ϑn+1
ϑ
=
e n+1 Π−1 − I) ϑn+1 .
(Π
n+1
e n+1 = (I − ρ ∆ δ ∆ Πn+1 N)−1 δ ∆ Πt+1 , so that
In addition, Π
e n+1 Π−1 − I)
(Π
n+1
=
(I − ρ ∆ δ ∆ Πn+1 N)−1 δ ∆ − I .
Since for ∆ small we can write (I − ρ ∆ δ ∆ Πn+1 N)−1 = I + ρ ∆ δ ∆ Πn+1 N + o(∆2 ) we find that
e n+1 Π−1 − I)
(Π
n+1
=
(I + ρ ∆ δ ∆ Πn+1 N + o(∆2 )) δ ∆ − I
(δ ∆ − 1) I + ρ ∆ δ 2∆ Πn+1 N + o(∆2 ) . It follows that
∆
(δ − 1)
=
I + ρ δ 2∆ Πn+1 N + o(∆) ϑn+1
∆
=
e n+1 − ϑn+1
ϑ
∆
xvii
and that
lim
∆↓0
e n+1 − ϑn+1
ϑ
∆
=
ln δ ϑ(t) + ρ Π(t) N ϑ(t) .
Summing up terms we find that
d ϑ(t)
+ ln δ ϑ(t) + ρ Π(t) N ϑ(t) + [A − B Q−1 S − J Π(t)]0 ϑ(t) − Π(t) µ(t) = 0 ,
dt
which can also be written as
d ϑ(t)
e 0 ) ϑ(t) − Π(t) µ(t) = 0 , with
+ (ln δ + Γ(t)
dt
e
Γ(t)
=
A − B Q−1 S − (J − ρ N )Π(t) .
Finally, consider the term νn . We see from C.4 that
νn
=
e−1 ∆B0 )Π
e n+1 µn+1 ∆
γn + ν̃n+1 + ∆µ0n+1 (I − BJ
n+1
e 0 BJ
e n+1 ,
e 0 (I − BJ
e−1 ∆B0 ϑ
e−1 ∆B0 Π
e n+1 )∆µn+1 − ϑ
− 2ϑ
n+1
n+1
n+1
n+1
where
en+1
J
=
e n+1 B∆)
(Q + B0 Π
e n+1
ϑ
=
δ ∆ (I − δ ∆ ρ Πn+1 N∆)−1 ϑn+1 ,
ν̃n+1
=
δ ∆ νn+1 + δ 2∆ ρ ϑ0n+1 N∆ I − δ ∆ ρ Πn+1 N∆
γn
=
−
−1
ϑn+1 ,
1
ln det I − δ ∆ ρ∆NΠn+1 .
ρ
This is equivalent to
νn − δ ∆ νn+1
∆
=
−1
1
γn + δ 2∆ ρ ϑ0n+1 N I − δ ∆ ρ Πn+1 N∆
ϑn+1
∆
e0 µ
e0
e−1 0 e
− 2ϑ
n+1 n+1 − ϑn+1 BJn+1 B ϑn+1 + o(∆) .
The l.h.s. can be written as
νn − νn+1
1 − δ∆
νn − νn+1
∂ν(t)
1 − δ∆
+ νn+1
with lim
= −
and lim νn+1
= − ν(t) ln δ .
∆↓0
∆↓0
∆
∆
∆
∂t
∆
so that the l.h.s. converges to −( ∂ν(t)
∂t + ln δ ν(t)). For the r.h.s. consider that in B.2 we have shown that
xviii
1
∆ γn
e n+1 = lim∆↓0 ϑn+1 = ϑ(t), lim∆↓0 µ
= Tr(NΠ(t)). Finally, recall that lim∆↓0 ϑ
n+1 = µ(t)
−1
−1
∆
e
and consider that lim∆↓0 J
= −Q and lim∆↓0 (I − δ ρΠn+1 N∆) = I. Thus, we conclude that the
lim∆↓0
n+1
r.h.s. converges to
Tr(NΠ(t)) − 2 ϑ(t)0 µ(t) − ϑ(t)0 (BQB0 − ρN)ϑ(t)
so that ν(t) respects the following differential equation
∂ν(t)
+ ln δ ν(t) + Tr(NΠ(t)) − 2 ϑ(t)0 µ(t) − ϑ(t)0 (BQ−1 B0 − ρN)ϑ(t) = 0 .
∂t
D. Optimal Policy for a Risk-averse Monopolist
D.1. Optimal Production Policy for a Risk-averse Entrepreneur.
Given that Q = q, R = 0, S = −1/2, A = a, B = b, z = p and u = x, the continuous-time risk-sensitive
Riccati equation is
1
d π(t)
+ 2 a π(t) −
dt
q
which we can re-write as
d π(t)
dt
1
2
b π(t) −
2
+ ρ σ2 π(t)2 + ln δ π(t)
=
0,
= h0 + h1 π(t) + h2 π(t)2 , with h0 = 1/(4q), h1 = −(2a + b/q + ln δ),
h2 = b2 /q − ρσ2 , and transformed into a homogeneous ordinary differential equation of order two,
d y(t)
d y(t)
d2 y(t)
− h1
+ h0 h2 y(t)
d2 t
dt
0 , with π(t) = −
=
1 dt
.
h2 y(t)
Assume then that y(t) = m exp(ζ t). We have a solution of the ODE iff
ζ 2 m exp(ζ t) − ζ h1 m exp(ζ t) + h0 h2 m exp(ζ t)
=
0 , ie. iff
m ζ 2 − m h1 ζ + m h0 h2
=
0.
(
This admits two roots equal to ζ =
ζ1 =
ζ2 =
m1 exp(ζ1 t) + m2 exp(ζ2 t). Given that π(t) =
π(t)
= −
2
( bq
1
2
h1 +
1
2
h1 −
− h12
d y(t)
dt
y(t)
1
2
√
√
1
2
D
D
, with D = h21 − 4h0 h2 . Thus, y(t) =
, we can write that
m1 ζ1 exp(ζ1 t) + m2 ζ2 exp(ζ2 t)
.
− ρ σ2 ) (m1 exp(ζ1 t) + m2 exp(ζ2 t))
We can impose the terminal condition π(T ) = 0 to find that
m1 ζ1 exp(ζ1 T ) + m2 ζ2 exp(ζ2 T ) = 0 ⇔ m2 = −
xix
√
ζ1
ζ1
m1 exp((ζ1 − ζ2 ) T ) = − m1 exp( D T ) .
ζ2
ζ2
Re-inserting this expression in that for π(t) we find that
π(t)
=
−
=
−
=
−
=
−
( bq
1
− ρ σ2 )
2
( bq
ζ1
− ρ σ2 )
2
( bq
ζ1
− ρ σ2 )
2
( bq
ζ1
− ρ σ2 )
2
as ζ1 − ζ2 = D. Notice, that for
2
b
q
b2
q
!
√
ζ1 exp(ζ1 t) − ζ1 exp( D T ) exp(ζ2 t)
√
exp(ζ1 t) − ζζ21 exp( D T ) exp(ζ2 t)
!
√
1 − exp( D T ) exp(−(ζ1 − ζ2 ) t)
√
1 − ζζ12 exp( D T ) exp(−(ζ1 − ζ2 ) t)
!
√
1 − exp(− D (t − T ))
√
1 − ζζ12 exp(− D (t − T ))
!
√
exp( D (T − t)) − 1
√
,
ζ1
ζ2 exp( D (T − t)) − 1
> ρσ2 0 < ζ2 < ζ1 . It immediately follows that π(t) < 0, while for
< ρσ2 , ζ2 < 0 < ζ1 . Even in this case π(t) < 0 as
that for
2
b
q
2
( bq
ζ1
−ρ σ2 )
and
√
exp( D (T −t))−1
√
exp( D (T −t))−1
ζ1
ζ2
change sign. Notice
= ρσ2 , ζ2 = 0 and the solution turns degenerate and collapses to the static solution. In fact,
in this case ζ2 = 0 and ζ1 = h1 , so that y(t) = m1 exp(h1 t) + m2 and hence d y(t)/d t = m1 h1 exp(h1 t).
The terminal condition π(T ) = 0 entails that m1 = 0 and this on turn implies that π(t) = 0.
D.2. Riccati Equation for the Conditional Variance of the Commodity Price.
The conditional variance of p(t) solves the continuous time Riccati equation
dσp2 (t)
dt
=
σ2 + 2aw(t) −
1 4
σ (t) .
ση2 p
Even this can be transformed into an ordinary differential equation
d2 y(t)
d y(t)
σ2
−
2a
−
y(t)
d2 t
dt
ση2
0 , with σp2 (t) = ση2
=
d y(t) 1
.
d t y(t)
The ordinary differential equation will present a general solution y(t) = m exp(ιt) iff
m exp(ι t) ι2 − 2a m exp(ι t)ι −
σ2
m exp(ι t)
ση2
=
0 , ie. iff
σ2
m
ση2
=
0.
m ι2 − 2am ι −
(
ι1 = a +
q
ι2 = a −
q
This admits two roots equal to ι =
a2 +
σ2
ση2
a2 +
σ2
ση2 .
xx
Thus, y(t) = m1 exp(ι1 t) + m2 exp(ι2 t).
Given that σp2 (t) = ση2
d y(t)
dt
y(t)
, we can write that
σp2 (t)
=
ση2
m1 ι1 exp(ι1 t) + m2 ι2 exp(ι2 t)
m1 exp(ι1 t) + m2 exp(ι2 t)
=
ση2
θ ι1 exp(ι1 t) + ι2 exp(ι2 t)
.
θ exp(ι1 t) + exp(ι2 t)
Imposing the initial condition that σp2 (0) = σ02 and substituting out ι1 and ι2 we find that
s
σ02
=
ση2
a+
σ2 θ − 1
a2 + 2
ση θ + 1
!
which is equivalent to
σ02
ση2
−a
q
a2 +
σ2
ση2
θ−1
=
θ+1
⇐⇒
θ = xxi
a+
σ2
ση2
1/2
a+
σ2
ση2
1/2
+
σ02
ση2
−
σ02
ση2
−a
.
−a