Optimal Dynamic Information Acquisition
Weijie Zhong1
Columbia University
Abstract
In this paper, I study a decision problem in which the decision maker (DM) can acquire information about a payoff-relevant state to facilitate decision making. The design of the information flow is fully general, while the amount of information acquired per unit time is either limited or costly. I characterize five key properties of the optimal dynamic information acquisition strategy in the continuous time limit: the DM seeks informative evidence arriving as a Poisson process that Confirms the Prior Belief and leads to Immediate Action, with Increasing Precision and Decreasing Intensity over time conditional on continuation. Within the scope of my assumptions, the results provide an optimization foundation for Poisson-bandit learning, a dynamic foundation for rational inattention, and a full characterization of dynamic information acquisition.
Keywords: dynamic information acquisition, rational inattention, Poisson-bandits
Contents

1  Introduction                                              2
2  A General Discrete Time Framework                         3
   2.1  Simplification of Information Structure              5
   2.2  Flow Cost Structure                                   6
   2.3  Continuous Time Limit of Information                  7
3  Optimal Information Acquisition in Continuous Time         8
   3.1  Convergence in Continuous Time                        9
   3.2  Characterization of Solution                         10
4  Extensions                                                16
   4.1  Convex Flow Cost                                     17
   4.2  Continuum of Actions                                 18
   4.3  General Information Measure                          20
   4.4  Connection to Static Problem                         21
5  Discussions and Conclusion                                23
Appendix: Omitted Proofs                                     26
Email address: [email protected] (Weijie Zhong)
Very preliminary draft version
January 20, 2017
1. Introduction
A standard approach in research involving dynamic information acquisition is to model the information flow as a simple family of random processes. The decision maker (DM) controls the parameters of the process representing the dimension of interest. For example, the DM may control the timing of action (Wald-type and Poisson-bandit problems), the type of evidence that arrives (evidence-seeking problems), or the intensity of experimentation (random-sampling problems). These models provide elegant characterizations, but at a cost. First, we cannot justify which of these families would be pinned down endogenously by optimization if no restriction were imposed on the design of the information flow. Second, the different choice dimensions never co-exist, so we cannot infer how the different aspects of the information structure interact.
In this paper, I develop an information acquisition framework that imposes no restriction on the specific form of information the DM can acquire and allows the DM to optimize over all the aforementioned dimensions. Its most important feature is that the DM can choose an arbitrary random process and observe its realization as her information. Her objective is to maximize the expected utility from the action she chooses based on the signals she receives, subject to a constraint or convex cost on the amount of information acquired per unit time. To gain tractability with this much generality, I impose the following assumptions on the main model: 1) continuous time, 2) binary state, and 3) a posterior-separable informativeness measure.
Within this framework, I pursue three main goals. The first goal is to provide a foundation for a family of simple information structures that arises endogenously as the optimizer in a dynamic information acquisition environment. The main finding is that, under my assumptions, the optimal information acquisition strategy is Evidence Seeking: in each period, the DM waits for a signal arriving at a Poisson rate. The signal is very informative about the state and drives a jump in the DM's posterior belief once observed. This finding suggests that in environments with more flexibility in information acquisition (for example, R&D activity, researchers' experimental design, or market sampling), the information flow is more likely to resemble a Poisson-bandit model.
The second goal is to characterize the interaction among all four dimensions of interest in the design of information: the type of evidence, the informativeness of the signal, the intensity of experimentation, and the timing of actions. It is optimal for the DM to seek informative evidence that Confirms the Prior Belief and leads to Immediate Action, with Increasing Precision and Decreasing Intensity over time. A decision maker whose prior belief favors one state waits for a signal that arrives occasionally. Once the signal is received, her posterior belief jumps towards her prior conjecture being true and she immediately chooses the optimal action associated with that posterior belief. If the signal does not arrive, her belief drifts towards being unsure and she seeks a more informative signal arriving at a lower Poisson rate.
The characterization of the precision versus intensity trade-off is novel to the literature. The pattern that higher experimentation intensity is associated with more extreme beliefs is shared with Moscarini and Smith (2001), who use a single parameter to represent both precision and intensity. By separating precision from intensity, however, I illustrate the intuition that with a higher continuation value (which is the cost of discounting), the gain from increasing the signal arrival rate is higher relative to increasing signal precision. Therefore, lower signal precision is associated with more extreme beliefs. The patterns of no repeated experimentation and of confirmatory evidence seeking being associated with ambiguous beliefs are shared with Che and Mierendorff (2016), who use the magnitude of the belief jump as the control parameter representing the type of evidence. By allowing the DM to choose both the type of evidence and the arrival rate, however, I show that when beliefs are more extreme, a higher continuation value makes confirmatory evidence more appealing than contradictory evidence because of its higher arrival rate.
Third, I set up a link between the dynamic information acquisition problem and well-studied static problems. I derive a discretization of the dynamic problem in this paper. The discretized model is a sequential generalization of the rational inattention model of Matejka and McKay (2014), which studies the design of flexible information acquisition within one period. In the zero-discounting limit and the linear-cost limit, my model nests the static rational inattention model as a special case. Therefore, the within-period foundation of my model boils down to the static rational inattention model, and my model serves as a dynamic foundation for static rational inattention.
The rest of the paper is structured as follows. Section 2 sets up a general discrete time framework for the dynamic information acquisition problem and provides simplification results based on information theory. Section 3 derives the proper continuous time limit of the discrete time problem and fully characterizes its solution. Section 4 studies several extensions of the baseline model. Section 5 concludes.
2. A General Discrete Time Framework
Assume that a decision maker (DM) faces the following decision problem:
• Decision problem: Time t = 0, 1, . . . , ∞ is discrete, and each time interval has length dt. Both the action space A and the state space X are finite. The utility associated with action-state pair (a, x) is u(a, x), and the DM discounts utility exponentially with factor ρ > 0. If the DM takes action a ∈ A at time t when the state is x ∈ X, her utility gain is $e^{-\rho t\, dt}\, u(a, x)$.
• Uncertainty: When the true state is unknown, the DM forms a belief µ ∈ ∆X about the state. Her preference under uncertainty is von Neumann-Morgenstern expected utility. I use two essentially equivalent formulations of expected utility: 1) given belief µ, the expected utility associated with each action a ∈ A is Eµ[u(a, x)]; 2) state and action are treated as random variables X, A, and expected utility is denoted E[u(A, X)]. F(µ) = maxa∈A Eµ[u(a, x)] denotes the expected utility from choosing the optimal action given belief µ.
• Information: Information is defined as a signal space S = {s} together with state-dependent conditional distributions of signals {g(s|x)}. The DM is Bayesian and sequentially rational. Given prior µ and signal structure (S, g(s|x)), her posterior belief upon observing signal s is determined by Bayes' rule:
$$\mu'(x|s) = \frac{g(s|x)\,\mu(x)}{\sum_{x' \in X} g(s|x')\,\mu(x')}.$$
An information structure can equivalently be denoted by a random variable S, whose realization lies in S and whose joint distribution with the state X is defined by the conditional distribution g.
Markov Chain Property: To capture the fact that the information required to take an action must be provided by signals, I assume that state, information and action form a Markov chain:
$$X \to S \to A$$
That is, state and action are independent conditional on information: X ⊥ A | S.
Information Measure: The informativeness of an information structure is measured by a prior-dependent function I(S; X|µ) ≥ 0, the information measure of signal structure S when the prior is µ. I is defined as the reduction in the uncertainty of beliefs: I(S; X|µ) = H(µ) − Es[H(µ′(x|s))], where the posteriors µ′ are determined by Bayes' rule. I(S; X|S1) = I(S; X|µ′(x|s))|S1=s denotes the random variable whose value is the information measure of S conditional on the prior belief of X being the posterior belief associated with the realization of S1. The uncertainty measure H : ∆X → R+ is assumed to be a concave function.¹ (A short numerical sketch of this measure is given below, after the dynamic problem is stated.)
Information Cost: Given the structure of the information measure, I define a time-separable information cost. In each period, the DM pays information cost f(I(S; X|µ)), which transforms the information measure of the period's information structure into a utility loss. f : R+ → R+ is non-decreasing.
• Dynamic Optimization: The dynamic optimization problem of the DM is:
$$\sup_{\mathcal{T},\, A_t,\, S^t}\; \mathbb{E}\left[ e^{-\rho\, dt \cdot \mathcal{T}}\, \mathbb{E}\!\left[u(A_{\mathcal{T}}, X) \,\middle|\, S^{\mathcal{T}-1}\right] - \sum_{t=0}^{\infty} e^{-\rho\, dt \cdot t}\, f\!\left(I\!\left(S^t; X \mid S^{t-1}\right)\right) \right] \tag{1}$$
$$\text{s.t.}\quad \begin{cases} X \to S^{t-1} \to \mathbf{1}_{\mathcal{T} \le t} \\ X \to S^{t-1}\big|_{\mathcal{T}=t} \to A_t\big|_{\mathcal{T}=t} \\ X \to S^t \to S^{t-1} \end{cases}$$
where 𝒯 ∈ ∆N and t ∈ N. S⁻¹ is defined as a deterministic variable inducing the same belief as the prior belief µ of the decision maker (for notational simplicity). The DM chooses the action time 𝒯, the action conditional on the action time At, and the signals St, subject to the information cost and three natural constraints on the information process:²
1. The information path prior to the action time is sufficient for the action time.
2. The information received prior to period t is sufficient for the action at time t.
3. Information is accumulative over time.
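As an illustration of the posterior-separable measure used above, the following minimal numerical sketch computes posteriors by Bayes' rule and evaluates I(S; X|µ) = H(µ) − Es[H(µ′)] for a binary-state experiment, taking Shannon entropy as the uncertainty measure H. The specific experiment and all parameters are made up for illustration and are not part of the formal model.

```python
import numpy as np

def entropy(mu):
    """Shannon entropy H(mu) of a belief vector over finitely many states."""
    mu = np.asarray(mu, dtype=float)
    nz = mu[mu > 0]
    return -np.sum(nz * np.log(nz))

def information_measure(mu, g):
    """Posterior-separable measure I(S;X|mu) = H(mu) - E_s[H(mu'(.|s))].

    mu : prior over states, shape (n_states,)
    g  : conditional signal distributions g[s, x] = P(signal s | state x),
         shape (n_signals, n_states); each column sums to one.
    """
    mu = np.asarray(mu, dtype=float)
    g = np.asarray(g, dtype=float)
    p_s = g @ mu                               # unconditional signal probabilities
    posteriors = (g * mu) / p_s[:, None]       # Bayes' rule, row s is mu'(.|s)
    exp_post_H = np.sum(p_s * np.array([entropy(row) for row in posteriors]))
    return entropy(mu) - exp_post_H, posteriors

# Illustrative binary-state experiment: the signal matches the state with prob. 0.8.
mu0 = [0.6, 0.4]
g = [[0.8, 0.2],   # P(s = 0 | x)
     [0.2, 0.8]]   # P(s = 1 | x)
I, post = information_measure(mu0, g)
print(I, post)     # the measure is nonnegative because H is concave
```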
¹ This is sufficient and necessary for I(S; X|µ) to be non-negative for any information structure and prior.
² Notice that in every period, the information of the current period has not yet been acquired, so the decision can only be based on information already acquired in the past. Hence the Markov chain property on information and the action time/action has information lagged by one period. This within-period timing issue makes no difference in the continuous time limit; it matters in the linear cost limit, and we highlight this in Theorem 9.
2.1. Simplification of Information Structure
In this subsection, I simplify the problem defined in Section 2. It is almost impossible to directly solve, or even characterize the solution of, Equation (1), due to both the dimensionality and the complexity of the space of feasible strategies. On the one hand, the space of general random processes is an enormous, infinite-dimensional space. On the other hand, the three constraints on the information and action processes interact in a very implicit way. The simplification result is mainly built on the following lemma:
Lemma 1. Information measure I(S; X |µ) satisfies the following properties:
1. Markov property: If X → S → T , then I(T ; X |S) = 0.
2. Chain rule (Linearity): I(S, T ; X |µ) = I(S; X |µ) + E[I(T ; X |S, µ)].
3. Information processing inequality: If X → S → T , then I(T ; X |µ) ≤ I(S; X |µ).
Lemma 1 is an analog of three key theorems on mutual information proved in Cover and Thomas (2012), generalizing the log-sum structure of mutual information to the general form of information measure defined here. The intuition of Lemma 1 is simple. The first statement says that when S contains all the information about X in T, the cost of acquiring T when S is known is zero. This is intuitive because all randomness contained in T beyond S is irrelevant to the unknown state and can be produced by an arbitrary random device jointly with the known signal S. The second statement says that the cost of acquiring a combined signal (S, T) can be decomposed linearly into first acquiring S and then acquiring the remaining information in T. The third statement says that if S contains all the information about X in T, then acquiring S is more costly than acquiring T.
Lemma 1 illustrates two key ingredients of my framework: sophistication and flexibility. On the one hand, the hypothetical DM is sufficiently deliberate that she can perfectly distinguish "random noise" from "useful information". She can also perfectly extract information from any random process and form the correct posterior belief by rationally applying Bayes' rule. On the other hand, the design of information is fully flexible. Therefore, the DM can separate out any information that is costly but unnecessary for her decision making and discard it to minimize the information cost.
Given any strategy (𝒯, At, St), the expected utility can be weakly improved by simplifying the information structure St in three steps. Step one is to enrich St by nesting all past signals (S⁰, . . . , St) into the current period signal S̃t. Since information is accumulative, this operation adds no information to what is currently learned or potentially to be learned. Step two is to discard all information after an action is taken; once an action is taken, information is obviously useless to the DM. Step three is to replace the signal process by the action process itself. This operation weakly reduces the cost, by the information processing inequality. To sum up, we can assume WLOG that the optimal strategy only involves signal structures St of the following form:
$$\begin{cases} S^t \mid \left(S^{t-1},\, \mathbf{1}_{\mathcal{T}>t+1}\right) \text{ are degenerate} \\ S^{t-1}\big|_{\mathcal{T}=t} = A_t\big|_{\mathcal{T}=t} \\ S^t\big|_{\mathcal{T}\le t} = S^{t-1}\big|_{\mathcal{T}\le t} \end{cases}$$
In each period, the DM acquires information in the form of a combination of two kinds of signals: signals that directly lead to an action in the next period, and signals that indicate continuation. Then, instead of an abstract random process, I can represent information as conditional distributions of actions/signals. By rewriting the sequential problem as a recursive problem, I obtain the following representation lemma:
Lemma 2 (Recursive formulation). Vdt (µ) is the optimal utility level solving Equation (1)
given initial belief µ if and only if Vdt (µ) solves the following functional equation:
$$V_{dt}(\mu) = \max\left\{ F(\mu),\; \sup_{p_i,\, \mu_i}\; e^{-\rho\, dt} \sum p_i V_{dt}(\mu_i) - f(C) \right\} \tag{2}$$
$$\text{s.t.}\quad \begin{cases} C = H(\mu) - \sum p_i H(\mu_i) \\ \sum p_i \mu_i = \mu \end{cases}$$
where $(p_i) \in \Delta(2|X|)$ and $(\mu_i) \in (\Delta X)^{2|X|}$.
Equation (2) is straightforward except for the subtlety in the dimensionality of the strategy space. The first term in the maximization problem is the utility from immediate action based on the current belief. The second term is the supremum of the expected gain from information acquisition. Choosing a signal structure is equivalent to choosing posterior beliefs subject to Bayesian plausibility $\sum p_i \mu_i = \mu$. Thus $e^{-\rho\, dt} \sum p_i V_{dt}(\mu_i)$ is the expected utility from observing the signals inducing µi and delaying further decisions to the next period, and f(C) is the cost of information acquisition in this period.
The only remaining issue with Equation (2) is that it covers a very restricted space of strategies. The choice of signal structure is restricted to no more than 2|X| posteriors, each with positive probability, while the original space contains signals with an arbitrary number (even a continuum) of realizations. This simplification is based on a generalized concavification method developed in Theorem 10. The original concavification method from Aumann et al. (1995) (utilized by Kamenica and Gentzkow (2009)) states that in Rⁿ (a finite dimensional belief space), any point in the upper concave hull of a function f(x) can be achieved by a linear combination of no more than n points in the graph of the function. My problem, Equation (2), involves an additional term f(C), which is not simply an expectation, but a similar mathematical intuition applies: one can replace this extra term with a multiplier and apply the concavification method to the Lagrangian. In this case, the number of posterior beliefs needed at most doubles.
Finally, I establish that solving the simplified problem, Equation (2), is equivalent to solving the original problem, Equation (1). Thanks to the reduced dimensionality, the standard Blackwell conditions can be used in Lemma 3 to prove the existence and uniqueness of the solution to Equation (2).
Lemma 3. ∀dt > 0, there exists a unique Vdt ∈ C(∆X) solving Equation (2).
2.2. Flow Cost Structure
From this subsection on, we can focus on Equation (2). To characterize the solution of this functional equation using differential equations, we impose extra smoothness assumptions on the information measure H and the cost function f:
Assumption 1 (Uncertainty measure).
• H : ∆X → R+ is C⁽²⁾ smooth.
• ∀δ > 0, on ∆δ = {µ ∈ ∆X | minx µ(x) ≥ δ}, H″(µ) is Lipschitz continuous.
• H″(µ) is negative definite ∀µ ∈ ∆X.
Assumption 1 imposes conditions on the uncertainty measure H. The first and second conditions are smoothness conditions for mathematical convenience. The third condition rules out the case where H is locally linear at some µ ∈ ∆X. Linearity of H around µ is equivalent to a very uninformative signal structure at prior µ being totally free; I rule out this case to avoid technical details related to free information.
Assumption 2 (Bounded capacity). ∀dt > 0, there exists c such that:
$$f(c') = \begin{cases} 0 & \text{when } c' \le c\, dt \\ \infty & \text{when } c' > c\, dt \end{cases}$$
Assumption 2 restricts the cost function f to be a hard cap: information is essentially free when its measure is below the per period capacity c·dt and infinitely costly when it exceeds this capacity. This condition forces the DM to smooth her information acquisition over time. With Assumption 2, the solution to Equation (1) is the solution to the following functional equation:
$$V_{dt}(\mu) = \max\left\{ F(\mu),\; \max_{p_i,\, \mu_i}\; e^{-\rho\, dt} \sum p_i V_{dt}(\mu_i) \right\} \tag{3}$$
$$\text{s.t.}\quad \begin{cases} H(\mu) - \sum p_i H(\mu_i) \le c\, dt \\ \sum p_i \mu_i = \mu \end{cases}$$
In the main text, I impose Assumptions 1 and 2 throughout and focus on Equation (3) as well as its continuous time limit. This lets me focus on the design of the delicate structure of information in a dynamic environment and keeps the analysis clean. In Section 4, I replace Assumption 2 with Assumption 2′ and show that the main characterization results are exactly the same in an environment with a flexible cost.
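For concreteness, the following minimal sketch solves the capacity-constrained discrete-time problem, Equation (3), numerically for a binary state by value iteration on a belief grid. The per-period signal is restricted to two grid posteriors, and all parameters (payoffs, ρ, c, dt, grid size) are illustrative choices of mine, not taken from the paper.

```python
import numpy as np

# Minimal value-iteration sketch of Equation (3) for a binary state.
rho, c, dt = 1.0, 2.0, 0.05
grid = np.linspace(0.001, 0.999, 41)
Hg = -(grid * np.log(grid) + (1 - grid) * np.log(1 - grid))   # entropy on the grid
Fg = np.maximum(0.5 * grid - 0.2, 0.3 - 0.5 * grid)           # immediate-action payoff

V = Fg.copy()
for _ in range(300):                                 # value iteration
    V_new = Fg.copy()
    for i, mu in enumerate(grid):
        best = -np.inf
        for j in range(i + 1):                       # posterior below the prior
            for k in range(i, len(grid)):            # posterior above the prior
                if k == j:
                    continue
                p_hi = (mu - grid[j]) / (grid[k] - grid[j])   # Bayes plausibility
                p_lo = 1.0 - p_hi
                if Hg[i] - p_lo * Hg[j] - p_hi * Hg[k] > c * dt:
                    continue                          # exceeds per-period capacity
                best = max(best, np.exp(-rho * dt) * (p_lo * V[j] + p_hi * V[k]))
        V_new[i] = max(Fg[i], best)
    V = V_new
# V approximates V_dt on the grid; points with V > F mark the experimentation region.
```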
2.3. Continuous Time Limit of Information
Before moving to the continuous time limit of the optimization problem, let us informally study the form of signal structures in the continuous time limit. In a static environment, the conditional distribution formulation used in the rational inattention literature and the posterior belief formulation used in the Bayesian persuasion literature provide great mathematical convenience. In a dynamic environment, tracking the whole trajectory of signal distributions or induced beliefs becomes extremely hard. However, when the time interval converges to zero and the per period information cost is bounded to be of the same order as the interval length, any dynamic information acquisition strategy can be represented as a combination of two families of familiar basic random processes:
1. Limiting diffusion signals: at prior µ, if a sequence of experiments ({µ′i,dt}, pdt(µ′i)) has µ′i,dt → µ and pdt(µ − µ′i,dt)² ~ O(dt), then this sequence of signal structures is limiting diffusion.
2. Limiting Poisson signals: at prior µ, if a sequence of experiments ({µ′i,dt}, pdt(µ′i)) has pdt(µ′i) ~ O(dt) for all i such that µ′i,dt ↛ µ, then this sequence of signal structures is limiting Poisson.
Lemma 4. Suppose either Assumptions 1 and 2 or Assumptions 1 and 2′ are satisfied. At any prior µ, the optimal policy as dt → 0 must be a combination of limiting diffusion and limiting Poisson signals, i.e. for every convergent subsequence (µ′i,dt, pdt) and every i, pdt(µ − µ′i,dt)² ~ O(dt).
Consider posterior beliefs as a random process. Lemma 4 shows that the optimal policy must have flow variance of the same order as the length of the time interval. Therefore, for any converging sequence of processes (in the sense of both posteriors and probabilities), there are only two possible limiting behaviors: 1) posterior beliefs are bounded away from the prior, in which case their probabilities must be of the order of the interval length; 2) posterior beliefs converge to the prior, in which case the process behaves as a diffusion around the prior with bounded flow variance. In Section 3, I formalize this argument by showing that it is sufficient to consider only these two kinds of processes in the continuous time limit of the optimization problem.
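To see the scaling concretely, here is a back-of-the-envelope check (my own addition, not part of the formal argument). A Poisson-type experiment that induces a fixed posterior ν ≠ µ with probability pdt = λ·dt has flow variance
$$p_{dt}(\nu - \mu)^2 = \lambda (\nu - \mu)^2\, dt \sim O(dt),$$
and, using Bayes plausibility and a first-order expansion of H around µ, its information measure is approximately $\lambda\left[H(\mu) - H(\nu) + H'(\mu)(\nu - \mu)\right] dt \sim O(dt)$, the same expression that appears in the flow constraint below. A diffusion-type experiment that induces posteriors $\mu \pm \sigma\sqrt{dt}$ with equal probability has flow variance $\sigma^2 dt \sim O(dt)$ and information measure approximately $-\tfrac{1}{2}H''(\mu)\,\sigma^2 dt \sim O(dt)$. Both families therefore exhaust a per period capacity of order dt at the same rate, which is why both survive in the limit.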
3. Optimal Information Acquisition in Continuous Time
In this section, I restrict attention to a one-dimensional belief space. I derive the proper continuous time limit of Equation (3) and characterize the optimal information acquisition strategy. The continuous time limit of the solution Vdt(µ) of Equation (3) is characterized by a very simple functional equation: at any instant, the DM chooses between the optimal immediate action and continued information acquisition. Conditional on continuation, the DM essentially allocates the flow capacity between two kinds of signals: 1) Poisson-bandit signals that arrive at a Poisson rate and drive jumps in the posterior belief, and 2) a diffusion signal that drives diffusion in the posterior belief. The optimal strategy is even simpler: it involves a single Poisson signal that drives a jump toward confirming the DM's prior belief, and any arrival of the signal is followed by an immediate action.
The following assumption is needed for me to apply tools from ordinary differential equations to the analysis of this problem:
Assumption 3.
1. (Binary states): |X| = 2, ∆X = [0, 1].
2. (Positive payoff ): F (0), F (1) > 0.
Assumption 3 contains two parts. First, the state space is binary, so the belief space is one-dimensional. Second, I assume that knowing the true state guarantees the DM strictly positive utility. This assumption ensures that there are non-degenerate regions of beliefs at which the DM is sufficiently sure about the state and chooses an immediate action, which is needed to set up proper boundary conditions for the differential equations.
The continuous time problem is introduced and derived in Section 3.1. The solution is characterized in Section 3.2.
3.1. Convergence in Continuous Time
Formally, the object I study is the limit of the value function Vdt as the time interval dt converges to zero. Lemma 5 shows the existence of such a limit under the L∞ norm:
Lemma 5. Let V (µ) = lim supdt→0 Vdt (µ), then ∥Vdt (µ) − V (µ)∥∞ → 0 when dt → 0.
Now I set up a continuous time optimization problem directly. Theorem 1 below guarantees that this problem properly characterizes the V(µ) just defined. Consider the following Bellman equation defined on C⁽¹⁾ smooth functions:
$$\rho V(\mu) = \max\left\{ \rho F(\mu),\; \sup_{\mu_i \in [0,1],\, p_i,\, \hat\sigma \in \mathbb{R}_+}\; \sum p_i \left(V(\mu_i) - V(\mu) - V'(\mu)(\mu_i - \mu)\right) + \frac{D^2 V(\mu)}{2}\,\hat\sigma^2 \right\} \tag{4}$$
$$\text{s.t.}\quad \sum p_i \left(H(\mu) - H(\mu_i) + H'(\mu)(\mu_i - \mu)\right) - \frac{H''(\mu)}{2}\,\hat\sigma^2 \le c$$
Despite the use of the generalized second derivative D²V(µ) (defined by $D^2V(\mu) = \limsup_{\mu' \to \mu} \frac{V'(\mu') - V'(\mu)}{\mu' - \mu}$; this generalization is needed because at the boundaries where V is pasted to F, a C⁽¹⁾ smooth V need not be twice differentiable), Equation (4) is quite straightforward given the intuition from the last section. The left hand side of Equation (4) is the flow cost from discounting; in a Bellman equation it equals the flow utility gain from continuation. The right hand side of Equation (4) maximizes over two terms. The first term is the flow utility loss from not choosing an optimal immediate action. The second, supremum, term is the utility gain from information acquisition in an infinitesimal period. Consider a Poisson signal that arrives at Poisson rate pi and induces posterior belief µi. When it arrives, the utility gain is V(µi) − V(µ). When it does not arrive, the posterior belief drifts away from µi. A standard result shows that the speed of drift is exactly −pi(µi − µ). Therefore, the utility gain from receiving no signal is −piV′(µ)(µi − µ), because one essentially drifts along the value function. To sum up, the term pi(V(µi) − V(µ) − V′(µ)(µi − µ)) summarizes the flow gain from waiting for a signal (pi, µi). Consider instead a signal that arrives constantly but with very low informativeness, for example a Wiener process parametrized by the state. Then the utility gain from waiting for such a signal is the flow variance of the belief process times the local concavity of the value function, (σ̂²/2)D²V(µ). This is introduced in detail in Moscarini and Smith (2001).
It is not surprising that the cost constraint takes the form in Equation (4). By Equation (3), the utility gain from information acquisition in discrete time is $\sum p_i V_{dt}(\mu_i) - V_{dt}(\mu)$, while the information measure of this information structure is $H(\mu) - \sum p_i H(\mu_i)$. The utility gain therefore takes the same form as the information cost. In the continuous time limit, applying the same form used for the utility gain to the information cost yields the constraint formulation in Equation (4).
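For completeness, the drift rate quoted above follows from the martingale property of beliefs (a one-line derivation I am adding): over an interval dt, the signal arrives with probability pi·dt and moves the belief to µi; with probability 1 − pi·dt it does not arrive and the belief moves to µ + dµ. Bayes plausibility requires
$$p_i\, dt \cdot \mu_i + (1 - p_i\, dt)(\mu + d\mu) = \mu \;\;\Longrightarrow\;\; d\mu = -\frac{p_i(\mu_i - \mu)}{1 - p_i\, dt}\, dt \approx -p_i(\mu_i - \mu)\, dt,$$
so conditional on no arrival the belief drifts at rate −pi(µi − µ).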
An informal interpretation of Equation (4) is that the trade-off for the DM is precision versus intensity. When choosing a posterior µi further away from the prior µ, the signal is more precise and potentially leads to a higher utility gain upon arrival. But due to the capacity constraint, the DM can only do so at the cost of reducing its arrival rate pi. When the DM chooses a very fuzzy signal in the limit, she essentially chooses a diffusion signal. Note that both D²V and V(µi) − V(µ) − V′(µ)(µi − µ) characterize the concavity of a function (local and global concavity, respectively). Therefore, the key factor determining the optimal level of experimentation is the relative concavity of the value function V and the uncertainty measure H. When V is relatively more concave than H, the DM values the gain from an informative signal µi relatively more than its cost; in this case, she is willing to wait for a signal to arrive. Conversely, when V is relatively less concave than H, the cost of waiting outweighs the gain from information, so the DM stops information acquisition and chooses an immediate action. Before discussing the details of the solution of Equation (4), I prove that it properly characterizes V(µ) through the following theorem:
Theorem 1. Suppose Assumptions 1, 2 and 3 are satisfied, and suppose V̄(µ) ∈ C⁽¹⁾ is a solution of Equation (4). Let Vdt(µ) be the solution of Equation (3) and V(µ) = lim sup_{dt→0} Vdt(µ). Then V̄(µ) = V(µ).
Theorem 1 proves that whenever a solution of Equation (4) exists, it is unique and coincides with the limit of the solutions of the discrete time problem, Equation (3). Therefore, the Bellman equation I developed is the proper problem to study. The intuition of the proof is simple and has three steps. Step 1: rewrite the whole functional equation on a larger space of locally Lipschitz continuous functions. By a standard maximum-principle intuition, the Vdt are all Lipschitz continuous because each type can simply mimic its neighbours; as their limit, V(µ) is locally Lipschitz continuous. Equation (4) is rewritten to accommodate all locally Lipschitz continuous value functions. Step 2: show that V(µ) is unimprovable in the optimization problem. Suppose, by contradiction, that V(µ) is improvable. Then the strictly dominating strategy (pi, µi) or σ̂² can be discretized so that the discrete time value function Vdt, for dt sufficiently small, can also be improved. Step 3: show that V(µ) equals the solution V̄(µ) of the functional equation (this proves feasibility automatically). The idea is as follows. Suppose, by contradiction, that they differ, and consider a µ at which the distance between V(µ) and V̄(µ) is maximized (say V(µ) < V̄(µ)). Then at µ the lower function is weakly "more convex" than the higher function, and jumping/diffusing to elsewhere gives the DM a weakly higher utility gain: ∀µ′, V(µ′) − V(µ) − V′(µ)(µ′ − µ) ≥ V̄(µ′) − V̄(µ) − V̄′(µ)(µ′ − µ). Then, by the nature of the problem, any signal structure is more beneficial with V(µ) than with V̄(µ) at µ. However, the value function is proportional to the experimentation gain everywhere in this problem, and this leads to a contradiction.
3.2. Characterization of Solution
3.2.1. Geometric Representation
In the last section, I loosely referred to the "relative concavity" of the value function and the uncertainty measure as the key factor determining the optimal strategy. Let us now study this idea more formally. Suppose the value function V(µ) is given, and consider the optimization problem of choosing one posterior belief:
$$\sup_{\nu,\, p}\; p\left(V(\nu) - V(\mu) - V'(\mu)(\nu - \mu)\right) \tag{5}$$
$$\text{s.t.}\quad p\left(H(\mu) - H(\nu) + H'(\mu)(\nu - \mu)\right) \le c$$
Equation (5) is much more restrictive than Equation (4): I assume that the DM has decided to continue information acquisition and acquires only one Poisson-bandit signal. It is not hard to see that this is without much loss of generality. Since both the objective function and the constraint are linear in pi, σ̂², it is WLOG to assume that the DM always chooses the one signal with the largest gain/cost ratio. If a diffusion signal is optimal, it can be approached by ν → µ when V is sufficiently smooth (twice differentiable); in this part, let us assume V is sufficiently smooth. The first order conditions for ν and p imply:
$$\text{FOC-}\nu:\quad V'(\nu) - V'(\mu) + \lambda\left(H'(\nu) - H'(\mu)\right) = 0$$
$$\text{FOC-}p:\quad V(\nu) - V(\mu) - V'(\mu)(\nu - \mu) + \lambda\left(H(\nu) - H(\mu) - H'(\mu)(\nu - \mu)\right) = 0$$
$$\overset{G\, =\, V + \lambda H}{\Longrightarrow}\quad \begin{cases} G'(\nu) = G'(\mu) \\ G(\nu) - G(\mu) - G'(\mu)(\nu - \mu) = 0 \end{cases} \tag{6}$$
Let us call G(µ) the gross value function, which incorporates the shadow cost of the capacity constraint. Equation (6) is simply a concavification characterization: both G(µ) and G(ν) are on the upper concave hull Ĝ of G, and G(µ) + G′(µ)(ν − µ), as a linear function of ν, is the lowest supporting hyperplane of the graph of G. This is very similar to the concavification familiar from the Bayesian persuasion literature. The only difference is that in the current problem, at the prior µ, the graph of the gross value function must coincide with the upper concave hull of its own graph. This result has a clear economic meaning. By definition of the gross value function, when the shadow cost λ is chosen properly, solving the unconstrained problem
$$\sup_{p,\, \nu}\; p\left(G(\nu) - G(\mu) - G'(\mu)(\nu - \mu)\right)$$
is equivalent to solving the constrained problem. Whether this problem yields a positive payoff depends on whether G(µ) is strictly below, or on, its upper concave hull. In a static maximization problem, G(µ) strictly below its upper concave hull implies a strictly positive gain from information. In the current problem, however, a strictly positive gain would induce the DM to increase the experimentation intensity p without bound. To keep the flow cost bounded, the payoff from concavifying the gross value function must be exactly zero. One can understand this as the limit in which the prior belief µ is arbitrarily close to the boundary of a region where Ĝ > G: the utility gain from experimentation diminishes at the same order as the length of the time interval, which matches the magnitude of the opportunity cost of waiting. Moreover, supposing V(µ) is achieved by solving Equation (5), we can impose the feasibility condition:
$$\rho V(\mu) = c\,\frac{V(\nu) - V(\mu) - V'(\mu)(\nu - \mu)}{H(\mu) - H(\nu) + H'(\mu)(\nu - \mu)} \;\overset{\text{FOC-}p}{\Longrightarrow}\; \lambda = \frac{\rho}{c}\, V(\mu)$$
The shadow cost λ on the capacity constraint is determined by two terms: ρ/c is the effective discount factor (increasing the flow capacity is equivalent to time passing faster), and V(µ) is the value function at µ. This is also intuitive: the shadow cost of violating the capacity constraint can be translated into the loss from a reduced arrival rate, which is exactly the discounted continuation value.
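The concavification geometry can be visualized numerically. The following sketch uses the immediate-action payoff F in place of V and a fixed multiplier lam as a stand-in for (ρ/c)V(µ), purely to illustrate where the linear segments of the upper concave hull of a gross value function lie; it is not the paper's solution algorithm, and all parameters are made up.

```python
import numpy as np

grid = np.linspace(1e-4, 1 - 1e-4, 2001)
H = lambda m: -(m * np.log(m) + (1 - m) * np.log(1 - m))
F = lambda m: np.maximum(0.5 * m - 0.2, 0.3 - 0.5 * m)
lam = 0.3                                   # stands in for (rho/c) * V(mu)

G = F(grid) + lam * H(grid)                 # gross value function (with F in place of V)

def upper_hull(x, y):
    """Indices of points on the upper concave hull (monotone-chain scan)."""
    hull = []
    for j in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (x[b] - x[a]) * (y[j] - y[a]) - (y[b] - y[a]) * (x[j] - x[a]) >= 0:
                hull.pop()                  # b lies weakly below the chord a--j
            else:
                break
        hull.append(j)
    return hull

hull = upper_hull(grid, G)
# Long hull segments are the linear pieces of the concave hull: a belief strictly
# below the hull inside such a segment gains from a Poisson jump to its endpoints.
gaps = [(grid[hull[k]], grid[hull[k + 1]]) for k in range(len(hull) - 1)
        if grid[hull[k + 1]] - grid[hull[k]] > 0.01]
print(gaps)
```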
[Figure 1: Concavification of the gross value function. Left panel: the optimal value function V(µ) (blue) with F(µ) given by the dashed lines. Center panel: the uncertainty measure H(µ), the standard entropy function. Right panel: the gross value function evaluated at µ, G = V + (ρ/c)V(µ)H (blue); the dashed black line is the supporting hyperplane of the graph of G, tangent to G at both µ and ν (the optimal posterior). Parameters used to calculate this example: F(µ) = max{0.5µ − 0.2, 0.3 − 0.5µ}, ρ/c = 2, H(µ) = −µ log(µ) − (1 − µ) log(1 − µ).]
A geometric illustration of the preceding analysis is given in Figure 1. The left and center panels of Figure 1 show the typical shapes of the value function V (convex) and the uncertainty measure H (concave). The right panel shows the gross value function V + (ρ/c)V(µ)H at prior µ. The gross value function must have the following property: the prior point (µ, G(µ)) is on the upper concave hull of G, the optimal posterior point (ν, G(ν)) is also on the upper concave hull of G, and they lie on the same supporting hyperplane (the dashed line). The figure makes clear that whether continuing information acquisition is profitable depends on the relative convexity of the value function versus the uncertainty measure. At µ, V is relatively more convex than H and information is sufficiently valuable, so it is optimal to wait for an informative signal. When µ approaches the boundary, V becomes flat and it is optimal to choose an action immediately.
Remark. So far I have focused on the problem of choosing an optimal posterior away from the prior and have ignored the possibility of diffusion signals. But diffusion signals being optimal can also be represented within the same framework: the gross value function G is locally linear but globally concave. The choice between Poisson-bandit and diffusion signals is ultimately determined by the local versus global concavity of the gross value function.
Finally, if one is interested in exactly which type of evidence (posterior belief) is chosen, perturbing the weight λ in the gross value function provides useful intuition. The shapes of V and H are fixed for a particular problem, so for different priors µ, the only factor governing the shape of G is λ = (ρ/c)V(µ). Increasing λ increases the concavity of G, so the tangent points of the supporting hyperplane to G move closer together. That is, λ adjusts the weight on the local versus global convexity of G, which determines the location of the optimal posterior ν. Recall that λ measures the continuation value. So with a higher continuation value, the DM is more willing to give up signal precision to achieve a higher signal arrival rate.
The discussion of the geometric representation of the optimization problem provides a clear map between the aspects of the information acquisition problem of interest and geometric properties of the value function. Table 1 summarizes the key factors determining the four aspects of interest. The first three columns summarize the discussion in this section. The last column is discussed in Section 4, where I endogenize the capacity constraint with an actual cost function.
Table 1: Four aspects of the dynamic information acquisition problem

Dimension of interest     | Trade-off                     | Determining factor
Timing of action          | continuation vs. stopping     | relative convexity of V to H
Informativeness of signal | Poisson-bandit vs. diffusion  | global vs. local convexity of G
Type of evidence          | precision vs. arrival rate    | λ adjusting global and local concavity of G
Experimentation intensity | N.A.                          | continuation value
3.2.2. Main Characterization Theorem
In this subsection, I state the main characterization theorem for the value function V(µ), discuss its proof using the geometric representation developed in the last subsection, and explain its economic intuition.
Given Theorem 1, to characterize V(µ) it is sufficient to find a smooth solution of Equation (4). I prove the existence of such a solution and provide the characterization simultaneously by constructing the optimal policy function:
Theorem 2. Suppose Assumptions 1, 2 and 3 are satisfied. Then there exists V ∈ C⁽¹⁾[0, 1] solving Equation (4). Let E = {µ ∈ [0, 1] | V(µ) > F(µ)} be the experimentation region. There exists ν : E → [0, 1] \ E such that:
$$\rho V(\mu) = -c\,\frac{F(\nu(\mu)) - V(\mu) - V'(\mu)(\nu(\mu) - \mu)}{H(\nu(\mu)) - H(\mu) - H'(\mu)(\nu(\mu) - \mu)}$$
where ν(µ) has the following properties:
1. ∃µ∗ ∈ (0, 1) s.t. ν(µ) ≥ µ when µ ≥ µ∗ and ν(µ) ≤ µ when µ ≤ µ∗.
2. ν(µ) is piecewise C⁽¹⁾ smooth on E.
3. ν(µ) is piecewise strictly decreasing on E.
4. ν(µ) is the unique solution of Equation (4) almost everywhere.
Theorem 2 proves the existence of a solution of Equation (4) and characterizes the optimal policy function. The theorem first states that the optimal value function can be achieved through evidence arriving as a Poisson process, i.e. a signal inducing the optimal posterior belief ν(µ). Moreover, since ν maps the experimentation region E into the immediate action region [0, 1] \ E, the DM takes an immediate action upon arrival of the signal. Property 1 says that the optimal signal is confirmatory evidence: when µ ≥ µ∗, the DM holds the prior belief that state 1 is more likely and she acquires information that induces an even higher posterior belief, and vice versa for µ ≤ µ∗. Conditional on receiving no signal, the DM's belief drifts towards µ∗. Property 3 says that while the belief drifts towards µ∗, the posterior belief induced by the signal moves towards 0 or 1: as the DM feels more and more ambiguous about which state is true, she acquires signals of increasing precision. Finally, properties 2 and 4 state that the optimal policy ν is a well-behaved function and is essentially uniquely defined.
I computed a simple example and present the solution in Figures 2 and 3. There are four actions, whose payoffs are represented by F(µ), the dashed curve with three kinks in Figure 2. I refer to the two actions with steeper slopes as the riskier actions and the two actions with flatter slopes as the safer actions. The blue curve in Figure 2 is the value function V(µ); in the shaded region V(µ) > F(µ), so its projection on the horizontal axis is the experimentation region E. Figure 3 shows the optimal policy function ν(µ). Both the blue and the red curves show ν: blue means the optimal action associated with the posterior is a riskier action, red means it is a safer action. As stated in Theorem 2, the policy function is piecewise smooth and decreasing. The three arrowed curves in Figure 2 show the optimal strategy at three different priors: each arrow starts at a prior and points to the optimal posterior. The two blue arrows correspond to a riskier associated action and the red arrow to a safer one.
[Figure 2: Value function. The blue line is the value function V; the dashed black line is the immediate action payoff F. The shaded region, projected on the horizontal axis, is the experimentation region E. Arrows start from a prior and point to its optimal posterior.]
[Figure 3: Policy function. The dashed straight line is ν = µ. The blue and red lines are the policy function ν(µ): where the policy function is blue, the optimal posterior leads to one of the two outer (riskier) actions; where it is red, it leads to one of the two inner (safer) actions.]
[Parameters used in this example: F(µ) = max{0.5µ − 0.25, 1.3µ − 1, 0.25 − 0.5µ, 0.3 − µ}, ρ/c = 3, H is the standard entropy function.]
Sketched proof: I prove Theorem 2 by construction and verification. I conjecture that the optimal policy for Equation (4) takes the form in Theorem 2: a single confirmatory signal associated with immediate action. Then I construct V(µ) and ν(µ) in three steps. Step 1 is to determine the critical belief µ∗. µ∗ can be computed as the essentially unique belief at which acquiring the optimal signal with a higher posterior or with a lower posterior yields the same payoff, i.e. the unique belief at which a stationary information acquisition strategy is optimal. Step 2 is to fix an action and solve for the optimal policy function. Once the action
is fixed, the only parameter to choose is the posterior belief ν. The problem becomes:
$$\rho V(\mu) = \max_\nu\; -c\,\frac{\big(u(a,1)\nu + u(a,0)(1-\nu)\big) - V(\mu) - V'(\mu)(\nu - \mu)}{H(\nu) - H(\mu) - H'(\mu)(\nu - \mu)}$$
Since the action a is fixed, the kinked function F is replaced by the linear function Fₐ(ν) = u(a, 1)ν + u(a, 0)(1 − ν), and the objective function is sufficiently smooth in ν. The first order condition with respect to ν together with the feasibility condition then yields a well-behaved first order ordinary differential equation characterizing ν(µ), so we can solve for the optimal policy ν and compute the value V(µ) accordingly (the resulting system is written out after this paragraph). Step 3 is to update the value function with respect to all alternative actions and smoothly paste the solved value function piece by piece. This step starts by using the value and policy at µ∗ as the initial condition for the ODE defined in Step 2. I then extend the value function towards µ ∈ {0, 1}. Whenever I reach a belief at which two actions yield the same payoff, I set up a new ODE with the new action. This process continues until the computed value function V(µ) smoothly pastes to F(µ).
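For concreteness, the system from Step 2 can be written out as follows; this is my own rearrangement of the FOC-ν, FOC-p and feasibility conditions above, using V′(ν) = Fₐ′ = u(a, 1) − u(a, 0) because arrival triggers an immediate action:
$$\begin{cases} V'(\mu) = F_a' + \dfrac{\rho}{c}\,V(\mu)\,\big(H'(\nu) - H'(\mu)\big), \\[4pt] \rho V(\mu) = -c\,\dfrac{F_a(\nu) - V(\mu) - V'(\mu)(\nu - \mu)}{H(\nu) - H(\mu) - H'(\mu)(\nu - \mu)}. \end{cases}$$
Given V(µ), the two equations jointly pin down V′(µ) and ν(µ); treating V′(µ) as the derivative of the state V then defines an implicit first order ODE in µ, integrated outward from the initial condition at µ∗.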
Verification also takes three steps. Step 1 is to verify that V(µ) is optimal with respect to repeated experimentation. Step 2 is to verify that V(µ) is optimal with respect to non-confirmatory signals. Step 3 is to verify that V(µ) is optimal with respect to diffusion signals. The intuition of the proofs is easy to understand with the geometric characterization introduced in the last subsection. First, suppose by contradiction that repeated experimentation is profitable at some belief µ. If we consider G(ν) = F(ν) + (ρ/c)V(µ)H(ν) (the gross value function with immediate action), (ν(µ), G(ν(µ))) is on the upper concave hull of G. However, if we allow continued experimentation and take G̃(ν) = V(ν) + (ρ/c)V(µ)H(ν) (the gross value function with continued experimentation), there would be a point (ν′, G̃(ν′)) strictly above the upper concave hull of G. As shown in Figure 4, the red curve is the hypothetical higher gross value function. Take the point (ν′, G̃(ν′)) on the hypothetical value function that is furthest away from Ĝ. Then clearly ν′ is the only point of G̃ touching the supporting hyperplane (the red dashed line). Now consider the maximization problem at ν′: since ν′ ≥ µ, λ is even higher and the gross value function at ν′ is even more concave, so ν′ must still be the unique point touching its supporting hyperplane. This means V(ν′) is not feasible, which is a contradiction.
[Figure 4: Proof of no repeated experimentation. Figure 5: Proof of confirmatory evidence. Figure 6: Proof of Poisson signal. Each panel plots the gross value function G against µ, together with the relevant supporting hyperplane (dashed) and the posteriors µ, ν (and ν′ in Figure 4).]
Second, consider the gross value function at µ∗. By construction, a stationary experiment is optimal at µ∗, so the supporting hyperplane must touch G at two posteriors on either side of µ∗. The solution of the ODE shows that V is minimized at µ∗. Therefore, at any belief other than µ∗, the gross value function must be strictly more concave. From the shape of the gross value function in Figure 5, with a more concave gross value function, at µ > µ∗ the supporting hyperplane can only touch at an even higher posterior belief, and vice versa for µ < µ∗. Finally, suppose a diffusion signal strictly dominated the Poisson signal at µ; then the gross value function would have the shape in Figure 6. Considering beliefs around µ, the shape of the gross value function implies that they can be achieved through concavification only with a lower λ. This would make µ a local maximizer of V, contradicting the quasi-convex shape of V.
Economic intuition: The timing of action is determined by the kinks of F(µ). Geometrically, at a kink of F, F is infinitely convex relative to H, so continuation is always locally profitable; in the flat regions of F, H tends to be sufficiently concave that immediate action is optimal. Every kink of F(µ) is a critical belief at which an arbitrarily small shift in the posterior belief changes the optimal action. Therefore, at those beliefs an infinitesimal amount of information is valuable while its cost diminishes at second order. As a result, the experimentation region E takes the form of disjoint intervals around the kinks of F(µ).
The type of evidence is determined by the level of the continuation value. Geometrically, when µ is further away from µ∗, λ is larger, so the global concavity of G is relatively lower and the optimal posterior is closer to the prior. When the DM is more sure about the state, the continuation value is high, so the cost of discounting (parametrized by λ) is high; the DM wants to receive a signal soon and enjoy the high value, so the optimal signal has lower precision and a higher arrival rate. Vice versa, when the DM is more ambiguous about the state, the continuation value is low, and she is willing to suffer a long wait for a more precise signal. Diffusion signals are only used as a limit when the DM is close to the boundary of the experimentation region.
4. Extensions
In this section, I study several extensions of the baseline model. The first extension endogenizes the capacity constraint and studies the dynamics of experimentation intensity jointly with the choice of evidence type and precision. I set up a continuous time model that allows the DM to pay a convex cost on the measure of information. The standard monotonicity property (the optimal flow cost is isomorphic to the value function; Proposition 1 of Moscarini and Smith (2001)) still holds for the experimentation intensity, while the dynamics of the evidence type and precision are the same as in the baseline model. The second extension relaxes the finite action assumption: I show that a problem with a continuum of actions can be approximated well by adding actions, in the sense of both the value function and the policy function. The third extension generalizes the structure of the information measure and studies the robustness of the main characterization result. I model a generic set of information measures in continuous time and show that it is generically true that an informative Poisson signal is strictly superior to diffusion signals. The fourth extension connects the dynamic model to well-studied static models: the dynamic model in the current paper reduces to a static rational inattention model when the agent is fully patient or when the cost function is linear.
4.1. Convex Flow Cost
In addition to the dimensions studied in the baseline model, I am also interested in the optimal experimentation intensity when the flow capacity is endogenized. In this extension, I replace Assumption 2 with Assumption 2′.
Assumption 2′ (Convex flow cost). The function h : R+ → R+ is C⁽²⁾ smooth, h′(c) > 0, and ∃ε > 0 s.t. h″(c) ≥ ε > 0. For all dt > 0, f(C) is defined by $f(C) = dt \cdot h\!\left(\frac{C}{dt}\right)$.
Assumption 2′ restricts the cost function f to be a C⁽²⁾ smooth, strictly convex function: acquiring an additional unit of information comes at increasing marginal cost. This gives the DM an incentive to smooth the acquisition of information over time instead of learning everything and deciding instantly. With Assumption 2′, the solution to Equation (1) is the solution to the following functional equation:
$$V_{dt}(\mu) = \max\left\{ F(\mu),\; \sup_{p_i,\, \mu_i}\; e^{-\rho\, dt} \sum p_i V_{dt}(\mu_i) - h\!\left(\frac{H(\mu) - \sum p_i H(\mu_i)}{dt}\right) dt \right\} \tag{7}$$
$$\text{s.t.}\quad \sum p_i \mu_i = \mu$$
The continuous time version of Equation (7) is the following Bellman equation on V ∈ C⁽¹⁾[0, 1]:
$$\rho V(\mu) = \max\left\{\rho F(\mu),\; \max_{\mu_i\in[0,1],\, p_i,\hat\sigma,c\in\mathbb{R}_+}\; \sum p_i\left(V(\mu_i) - V(\mu) - V'(\mu)(\mu_i - \mu)\right) + \frac{D^2 V(\mu)}{2}\hat\sigma^2 - h(c) \right\} \tag{8}$$
$$\text{where}\quad c = -\sum p_i\left(H(\mu_i) - H(\mu) - H'(\mu)(\mu_i - \mu)\right) - \frac{H''(\mu)}{2}\hat\sigma^2$$
Equation (8) is an analog of Equation (4): the flow gain and cost of experimentation are defined in the same way as in Equation (4). The only difference is that instead of imposing a constraint on the maximal flow cost c, the DM actually pays a convex function h(c) of the flow cost.
Theorem 3. Suppose Assumptions 1, 2′ and 3 are satisfied. Then there exists V ∈ C⁽¹⁾[0, 1] solving Equation (8). Let E = {µ ∈ [0, 1] | V(µ) > F(µ)} be the experimentation region. There exist ν : E → [0, 1] \ E and c ∈ C⁽¹⁾(E) s.t.
$$\rho V(\mu) = -c(\mu)\,\frac{F(\nu(\mu)) - V(\mu) - V'(\mu)(\nu(\mu) - \mu)}{H(\nu(\mu)) - H(\mu) - H'(\mu)(\nu(\mu) - \mu)} - h(c(\mu))$$
where ν and c have the following properties:
1. ∃µ∗ ∈ (0, 1) s.t. ν(µ) ≥ µ when µ ≥ µ∗ and ν(µ) ≤ µ when µ ≤ µ∗.
2. ν(µ) is piecewise C⁽¹⁾ smooth on E.
3. ν(µ) is piecewise strictly decreasing on E.
4. c(µ) is strictly quasi-convex and minimized at µ∗.
5. (ν(µ), c(µ)) is the unique solution of Equation (8) almost everywhere.
Theorem 3 proves the existence of a solution of Equation (8) and characterizes the optimal strategy. Other than property 4, the optimal strategy shares the same set of properties as in Theorem 2: the optimal value function can be achieved through evidence arriving as a Poisson process, the DM takes an immediate action upon arrival of the signal, and the unique optimal signal is confirmatory evidence whose precision increases conditional on continuation. Property 4 states that the intensity of experimentation (parametrized by the flow cost) is higher when the prior belief is further from µ∗. Since the belief process always drifts towards µ∗ conditional on continuation, this implies that the DM invests a decreasing amount of resources into information acquisition.
The intuition for the dynamics of the experimentation intensity is already provided in Moscarini and Smith (2001): the marginal gain from experimentation is proportional to the continuation value while the marginal cost is increasing in c, so the optimal cost is isomorphic to the value function. In Moscarini and Smith (2001), the intensity and precision of experimentation share the same parameter (the flow variance of the diffusion process). My analysis separates this intuition from the distinct intuition that lower signal precision is associated with a higher continuation value.
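To make this link explicit, write the per-unit-capacity gain from the optimal signal as $g(\mu) = \dfrac{F(\nu(\mu)) - V(\mu) - V'(\mu)(\nu(\mu)-\mu)}{-\big(H(\nu(\mu)) - H(\mu) - H'(\mu)(\nu(\mu)-\mu)\big)}$. Since scaling the arrival rate scales the gain and the information measure proportionally, the intensity choice in Equation (8) reduces to $\max_c\; c\, g(\mu) - h(c)$ (my rearrangement of the equation in Theorem 3, not a statement taken from the text), so the optimal intensity satisfies
$$h'(c(\mu)) = g(\mu), \qquad \rho V(\mu) = c(\mu)\, h'(c(\mu)) - h(c(\mu)),$$
and since c·h′(c) − h(c) is strictly increasing in c for strictly convex h, c(µ) inherits the quasi-convex shape of V(µ).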
4.2. Continuum of Actions
In this section, I extend the model to accommodate a continuum of actions in the underlying decision problem, i.e. |A| = ∞. Mathematically, the only difference is that the value of immediate action is no longer a piecewise linear function:
$$F(\mu) = \sup_{a\in A} E_\mu[u(a, x)]$$
A continuum of actions raises several technical issues, for example whether the supremum is attained and whether F has bounded subdifferentials. We impose the following assumption:
Assumption 4 (Continuum of actions). F(µ) = maxa∈A Eµ[u(a, x)] has bounded subdifferentials.
Assumption 4 rules out two cases. The first is that the supremum is not attained. The second is that some optimal action is infinitely risky: the optimal action as the belief approaches x = 0 has utility approaching −∞ as the belief approaches x = 1 (and similarly with the states swapped). A sufficient condition for Assumption 4 is:
Assumption 4′. A is a compact set and ∀x ∈ X, u(a, x) ∈ C(A) ∩ TB(A).
The proof of Theorem 1 does not rely on F(µ) being piecewise linear. In fact, the only properties of F(µ) needed are boundedness and continuity, used in Lemma 3 to prove the existence of a solution of the discrete time functional equation, Equation (A.1). Therefore, Assumption 4 guarantees that Lemma 3 and Lemma 5 still hold with a continuum of actions. With Assumption 4, the problem with a continuum of actions can be approximated well by a sequence of problems with discrete actions.
Lemma 6 (Convergence of value function). Suppose Assumptions 1, 2, 3 and 4 are satisfied. Let Vdt(F) be the unique solution of Equation (3) and V(F) = limdt→0 Vdt(F). Then V is a Lipschitz continuous functional under the L∞ norm.
Lemma 6 states that a problem with a continuum of actions can be approximated well by sequences of problems with discrete actions, in the sense of value function convergence. Next, I push the convergence criterion further to convergence of the policy function.
Theorem 4 (Convergence of policy function). Suppose Assumptions 1, 2, 3 and 4 are satisfied, and let {Fn} be a set of piecewise linear functions on [0, 1] satisfying:
1. ∥Fn − F∥∞ → 0;
2. ∀µ ∈ [0, 1], lim Fn′(µ) = F′(µ).
Define Vdt(Fn) as the solution of Equation (3) and the functional V(F) = limdt→0 Vdt(F). Then |V(F) − V(Fn)| → 0. Moreover:
1. V(F) solves Equation (4).
2. ∀µ s.t. V(µ) > F(µ), let νn be a maximizer for V(Fn) such that ν = limn→∞ νn exists; then ν achieves V(F) at µ.
Theorem 4 states that, to solve a continuous time problem with a continuum of actions, one can simply use both the value function and the policy function from problems with finitely many actions as approximations. As long as the immediate action values Fn converge uniformly in value and pointwise in the first derivative, the optimal value functions have a uniform limit; the limit solves Equation (4), and the optimal policy function is the pointwise limit of the policy functions of the finite action problems.
[Figure 7: Approximation of a continuum of actions. The left panel shows the optimal policy function with discrete actions (red) and continuous actions (blue); the dashed line is ν = µ. The right panel shows the optimal value function; the thin black line is the value from immediate action F(µ), and the dashed lines are discrete approximations of the continuous action value.]
Figure 7 illustrates this approximation process. In both panels, only µ ∈ [0.5, 1] is plotted. The thin black curve shows a smooth F(µ) associated with a continuum of actions. Since the optimal policy only uses a subset of actions, I approximate the smooth function locally by the upper envelope of the dashed lines (each representing one action). The optimal value function with continuous actions is the blue curve and the approximation is the red curve. In the policy panel, the smooth blue curve is the optimal policy of the continuous action problem and the red curve with breaks is the optimal policy of the finite action problem.
To approximate a smooth F(µ), one can simply add more and more actions to the finite action problem and use F's supporting hyperplanes to approximate it. The optimal policy function then has more and more breaks, as the optimal policy jumps among actions more frequently. In the limit, as the number of breaks grows to infinity, the size of the breaks shrinks to zero and the policy approaches a smooth policy function.
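As a small illustration of this approximation step (my own sketch with a made-up smooth payoff), the following snippet builds a piecewise linear Fn as the upper envelope of n supporting (tangent) lines of a smooth convex F and checks that the uniform error shrinks as actions are added.

```python
import numpy as np

# Illustrative smooth convex immediate-action value and its tangent-line approximation.
mu = np.linspace(0.01, 0.99, 1000)
F  = lambda m: (m - 0.4) ** 2 + 0.05     # made-up smooth convex value function
dF = lambda m: 2.0 * (m - 0.4)

def F_n(m, n):
    """Upper envelope of n supporting lines of F: a finite-action proxy for F."""
    anchors = np.linspace(0.05, 0.95, n)          # each anchor = one discrete action
    lines = [F(a) + dF(a) * (m - a) for a in anchors]
    return np.max(lines, axis=0)                  # tangents of a convex F lie below it

for n in (2, 4, 8, 16, 32):
    err = np.max(np.abs(F_n(mu, n) - F(mu)))
    print(n, round(float(err), 5))                # uniform error shrinks with n
```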
4.3. General Information Measure
In this section, I extend the model to information measures other than the posterior separable measure. A general critique of posterior separable measures, including mutual information, is that the information measure of the same Blackwell experiment is prior dependent. I set up a continuous time Bellman equation with a general information cost structure that imposes no specific link between prior and posterior, and show that one key feature identified in the baseline model, evidence seeking, is generic.
Let J(µ, ν) be a sufficiently smooth function and consider the following functional equation:
$$\rho V(\mu) = \max\left\{\rho F(\mu),\; \sup_{p_i,\nu_i,\sigma^2}\; \sum p_i\left(V(\nu_i) - V(\mu) - V'(\mu)(\nu_i - \mu)\right) + \sigma^2 V''(\mu) \right\} \tag{9}$$
$$\text{s.t.}\quad \sum p_i\, J(\mu, \nu_i) + \sigma^2 J''_{\nu\nu}(\mu, \mu) \le c$$
The objective function of Equation (9) is exactly the same as that of Equation (4): the DM chooses both Poisson signals and a diffusion signal, and the gain from experimentation is specified so that the diffusion signal is consistent with the uninformative limit of a Poisson signal. The information measure is different. I take J(µ, ν) to be an arbitrary function that depends on both the prior and the posterior (which of course also accommodates prior-independent measures). The cost of a Poisson signal is assumed to be pi·J(µ, νi), rather than a fully general function of (pi, µ, νi), to capture the fact that the DM can always break a signal into several signals with lower arrival rates inducing the same posterior. The cost of a diffusion signal is assumed to be consistent with the uninformative limit of a Poisson signal, so a Taylor expansion implies that it is σ²J″νν(µ, µ). We impose the following assumptions on J(µ, ν):
Assumption 5. J ∈ C⁽⁴⁾(0,1). ∀µ ∈ (0,1): J(µ,µ) = J_ν′(µ,µ) = 0 and J_νν″(µ,µ) > 0.
First, J is assumed to be sufficiently smooth to eliminate technical difficulties. J(µ,µ) = J_ν′(µ,µ) = 0 is necessary for an uninformative experiment to be free, and J_νν″(µ,µ) > 0 is necessary for any signal that is not totally uninformative to be costly. Within this continuous time framework, the assumptions imposed on J are without loss of generality.
Theorem 5. Suppose V ∈ C⁽³⁾(0,1) solves Equation (9) and Assumption 5 is satisfied. Let L(µ) be defined by:
$$L(\mu) = \frac{\rho}{c}\,J''_{\nu\nu}(\mu,\mu)^2 \;-\; \frac{2J^{(3)}_{\nu\nu\mu}(\mu,\mu)^2 + J^{(3)}_{\nu\nu\nu}(\mu,\mu)\,J^{(3)}_{\nu\nu\mu}(\mu,\mu)}{J''_{\nu\nu}(\mu,\mu)} \;+\; J^{(4)}_{\nu\nu\nu\mu}(\mu,\mu) + J^{(4)}_{\nu\nu\mu\mu}(\mu,\mu)$$
Then in the open region
$$D = \{\mu \mid V(\mu) > F(\mu) \text{ and } L(\mu) \neq 0\},$$
the set of µ such that
$$\rho V(\mu) = c\,\frac{V''(\mu)}{J''_{\nu\nu}(\mu,\mu)}$$
will be of zero measure.
The interpretation of Theorem 5 is that a Poisson signal is almost always strictly superior to a diffusion signal. In the experimentation region where L(µ) ≠ 0, V(µ) can be achieved by a diffusion signal only on a zero-measure set of points. L(µ) = 0 is a partial differential equation in J(µ, ν), so the set of points where L(µ) = 0 can contain an interval only if J(µ, ν) locally solves this PDE. The solutions to a specific PDE form a non-generic subset of the functions satisfying Assumption 5. In this sense, for an arbitrary information measure J(µ, ν), the optimal policy almost never contains a diffusion signal.
A simple sufficient condition for L(µ) ≠ 0 is Assumption 6.
Assumption 6. ∀µ ∈ (0,1), J⁽³⁾_ννµ(µ,µ) = 0.
Assumption 6 states that the local curvature of J with respect to ν at ν = µ does not vary with the prior µ. The information measure defined in the main model satisfies this (a symbolic check is given after the corollary below). With Assumption 6, I get a direct corollary of Theorem 5:
Corollary 6. Suppose V ∈ C⁽³⁾(0,1) solves Equation (9) and Assumptions 5 and 6 are satisfied. Then the set of µ such that
$$\rho V(\mu) = c\,\frac{V''(\mu)}{J''_{\nu\nu}(\mu,\mu)}$$
will be of zero measure.
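As a sanity check on Assumption 6, the following symbolic sketch (my own illustration, not part of the original argument) takes the baseline posterior separable measure J(µ, ν) = H(µ) − H(ν) + H′(µ)(ν − µ), with H chosen to be Shannon entropy, verifies J⁽³⁾_ννµ(µ, µ) = 0, and shows that L(µ) then reduces to (ρ/c)·H″(µ)² > 0, so Corollary 6 applies and the diffusion signal is suboptimal almost everywhere.

import sympy as sp

mu, nu, rho, c = sp.symbols('mu nu rho c', positive=True)

# Uncertainty measure: Shannon entropy (any strictly concave H behaves similarly).
H = lambda x: -x*sp.log(x) - (1 - x)*sp.log(1 - x)

# Baseline posterior separable (GKLD) measure: J(mu, nu) = H(mu) - H(nu) + H'(mu)(nu - mu).
J = H(mu) - H(nu) + sp.diff(H(mu), mu)*(nu - mu)

diag = lambda e: sp.simplify(e.subs(nu, mu))    # evaluate derivatives at nu = mu
J_nn   = diag(sp.diff(J, nu, 2))                # J''_{nu nu}(mu, mu) = -H''(mu) > 0
J_nnm  = diag(sp.diff(J, nu, 2, mu))            # J^(3)_{nu nu mu}(mu, mu)
J_nnn  = diag(sp.diff(J, nu, 3))                # J^(3)_{nu nu nu}(mu, mu)
J_nnnm = diag(sp.diff(J, nu, 3, mu))            # J^(4)_{nu nu nu mu}(mu, mu)
J_nnmm = diag(sp.diff(J, nu, 2, mu, 2))         # J^(4)_{nu nu mu mu}(mu, mu)

print("Assumption 6 holds:", sp.simplify(J_nnm) == 0)

L = rho/c*J_nn**2 - (2*J_nnm**2 + J_nnn*J_nnm)/J_nn + J_nnnm + J_nnmm
print("L(mu) =", sp.simplify(L))                # reduces to rho*H''(mu)^2/c > 0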
4.4. Connection to Static Problem
In this section, I establish the connection between the dynamic information acquisition problem studied in the current paper and static information acquisition problems studied in the literature. I show that in two kinds of limits—constant discounting and linear flow cost—the dynamic information acquisition problem reduces to a fairly standard static rational inattention problem as studied in Matejka and McKay (2014). Constant discounting is discussed in Section 4.4.1 and linear cost in Section 4.4.2.
4.4.1. Constant discounting
In this subsection, I study the case in which the DM does not discount her utility exponentially. Instead, she pays a constant flow cost m > 0 for waiting. The Bellman equation becomes:
$$m = \sup_{p_i,\nu_i,\sigma^2}\ \sum_i p_i\big(V(\nu_i) - V(\mu) - V'(\mu)(\nu_i-\mu)\big) + \tfrac{1}{2}\sigma^2 V''(\mu) \tag{10}$$
$$\text{s.t.}\quad \sum_i p_i\big(H(\mu) - H(\nu_i) + H'(\mu)(\nu_i-\mu)\big) - \tfrac{1}{2}\sigma^2 H''(\mu) \le c$$
Equation (10) differs from Equation (4) only in the left-hand side: the flow loss from waiting ρV(µ) is replaced by the fixed flow cost m. We look for a smooth function V such that (1) V(0) = F(0) and V(1) = F(1); and (2) in the region where V(µ) > F(µ), V is twice differentiable and the functional equation (10) is satisfied.
Theorem 7. Suppose Assumption 1 is satisfied. There exists a unique V ∈ C⁽¹⁾[0,1] solving Equation (10) with the following properties:
1. {µ | V(µ) > F(µ)} consists of a finite number of open intervals ∪I_n.
2. In each interval I_n, there exists a linear function L_n(µ) such that V(µ) = L_n(µ) − (m/c)H(µ).
Corollary 8. V solving Equation (10) also solves the following problem:
$$V(\mu) = \sup_{S,A}\ \mathbb{E}[u(A,X)] - \frac{m}{c}\, I(S;X) \qquad \text{s.t.}\quad X\to S\to A \tag{11}$$
Theorem 7 and Corollary 8 show that when discounting is replaced by a constant flow cost of waiting, the value function of the dynamic information acquisition problem is identical to the value function of a static information acquisition problem with linear cost (m/c)·I(S;X). Theorem 7 also shows that the optimal value function V is a linear transformation of the uncertainty measure H in every experimentation region. Therefore, both
$$c\,\frac{V(\nu)-V(\mu)-V'(\mu)(\nu-\mu)}{H(\mu)-H(\nu)+H'(\mu)(\nu-\mu)} \qquad\text{and}\qquad -c\,\frac{V''(\mu)}{H''(\mu)}$$
equal m. This implies that the actual form of the information structure does not matter. The DM is essentially solving the following problem: she first decides how many periods to spend accumulating the capacity constraint; she then chooses the optimal information strategy subject to the total constraint; finally she divides the experiment arbitrarily over time so that the flow cost constraint is satisfied in each period. Utility from the action is not discounted, and the cost of waiting is m/c times the total capacity constraint. This is essentially the static problem in Equation (11).
A different interpretation of Theorem 7 is that in the limit when the DM is fully patient (ρ = 0) but still pays for the capacity constraint, the dynamic problem reduces to the static problem with linear cost. With this interpretation, the result is logically straightforward. (A numerical sketch of the limiting static problem follows.)
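The following minimal Python sketch illustrates the limiting static problem. The binary state, the matching-the-state payoff F(ν) = max(ν, 1 − ν), the Shannon-entropy uncertainty measure, and the weight m/c = 0.5 are all hypothetical choices made only for illustration; the code searches over binary-support posterior distributions, which is enough with a binary state since at most two posteriors are needed in one dimension.

import numpy as np

lam, mu = 0.5, 0.65                                   # lam = m / c (hypothetical value)
H = lambda p: -p*np.log(p) - (1 - p)*np.log(1 - p)    # Shannon entropy of Bernoulli(p)
F = lambda p: max(p, 1 - p)                           # value of the immediate action

lo = np.linspace(1e-4, mu, 400)        # candidate low posterior  nu1 <= mu
hi = np.linspace(mu, 1 - 1e-4, 400)    # candidate high posterior nu2 >= mu
best, best_pair = F(mu), (mu, mu)      # acquiring no information is always feasible
for n1 in lo:
    for n2 in hi:
        if n2 - n1 < 1e-6:
            continue
        p2 = (mu - n1)/(n2 - n1)       # Bayes plausibility: p2*n2 + (1 - p2)*n1 = mu
        gain = (1 - p2)*F(n1) + p2*F(n2)
        cost = lam*(H(mu) - (1 - p2)*H(n1) - p2*H(n2))
        if gain - cost > best:
            best, best_pair = gain - cost, (n1, n2)

print("V(mu) ~", best, "attained by posteriors", best_pair)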
4.4.2. Linear flow cost
In this subsection, I study the case in which the DM pays a linear flow cost instead of a convex flow cost on the information measure (Assumption 2′).
Assumption 2″ (Linear flow cost). The function h is defined by h(c) = λc.
Theorem 9. Suppose Assumptions 1 and 2″ are satisfied. Then the solution to the following two-period problem solves Equation (1):
$$V(\mu) = \max\Big\{F(\mu),\ \sup_{A}\ \mathbb{E}_\mu\big[e^{-\rho dt}\,u(A,X)\big] - \lambda I(A;X)\Big\} \tag{12}$$
Theorem 9 states that when the flow cost function h is linear, the value function of the dynamic information acquisition problem is identical to that of a two-stage problem: the DM either takes an immediate action, or acquires information at linear cost λI(A;X) and delays the action to the next period. This is almost a static problem. In fact, in the dynamic problem in Section 2 I assumed, for simplicity, that the action is delayed by one period relative to the information. If we tweak the timing a little and allow the DM to use information acquired in the current period for the action taken in the current period, then the solution to Equation (1) reduces entirely to a static problem with linear cost.
The intuition for this result is simple. Suppose a dynamic information acquisition strategy involves an action being taken at some time t. Combining all information acquired before t into period t strictly reduces the total discounted cost, because all costs are delayed (the elementary inequality is sketched below). The strategy then becomes waiting for t periods doing nothing and applying a static strategy in period t. If this strategy yields a positive payoff, it is profitable not to discount the utility at all and to implement the strategy right in the first period.
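To make the cost-postponement step explicit, the following is a minimal sketch of the comparison it relies on (my own rendering, with c_s denoting the information measure of the signal originally acquired in period s ≤ t and h(c) = λc linear):
$$e^{-\rho t}\,\lambda\,c_s \ \le\ e^{-\rho s}\,\lambda\,c_s \ \text{ for every } s\le t
\qquad\Longrightarrow\qquad
e^{-\rho t}\,\lambda\sum_{s\le t}c_s \ \le\ \sum_{s\le t} e^{-\rho s}\,\lambda\,c_s .$$
By the chain rule of the posterior separable measure (Lemma 1), the combined experiment run in period t has total measure Σ_{s≤t} c_s and induces the same posterior distribution at t, so the expected utility term e^{−ρt}E[u(A_t, X)] is unchanged while the discounted cost weakly falls.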
On the one hand, this section shows that my model can serve as a dynamic foundation for the static rational inattention model: the optimal information acquisition strategy in a static RI problem can be implemented by optimally searching for informative evidence arriving at a Poisson rate. On the other hand, the analysis in this section implies that the dynamic framework set up in the current paper is quite tight. If we accept the posterior separable information measure (i.e. that the cost of an information structure is linearly separable), then a convex cost on information is necessary to give the DM an incentive to smooth information acquisition over time, and real discounting is necessary to make the time distribution of the information process matter.
5. Discussions and Conclusion
5.1. Information Measure
It is not hard to see that all the assumptions I explicitly made (Assumptions 1, 2 and 3) are either purely technical or can be extended to the general case. The only real restriction imposed by this framework is the structure of the information measure. The information measure used in the current paper is a generalized version of the Kullback-Leibler divergence (GKLD; the standard KLD is introduced in Kullback and Leibler (1951)). (Generalized) KLD measures the distance between the prior and posterior beliefs by the expected difference in entropy (or a general concave uncertainty measure) between the two distributions. This helps me illustrate the key mechanism in the simplest way.
The key trade-off in the dynamic information acquisition problem is between the gain from information (measured by the concavity of the value function) and the loss from the information cost (measured by the information measure). With KLD, these two terms are directly comparable.
The gross value of an information structure can be measured by a linear combination of the value function and the uncertainty measure, and the optimal choice of information structure is determined by the concavity of this gross value function. Therefore, the key is to study the endogenous multiplier that adjusts the weight on value and cost. What is more, this yields a clean geometric representation of the whole problem.
This convenience comes at a loss of generality in the following sense. GKLD restricts the measure of a “compound experiment” to be linearly separable in its components. Geometrically speaking, the function governing the information cost (H) at one prior must be exactly the same function at every other prior. As shown in the proof of Theorem 2, this is critical for establishing the immediate action property and the confirmatory evidence property. One can imagine an information measure with the following property: there is a small threshold such that when the distance between the prior and the induced posterior is below the threshold, information is almost free, while jumping to a posterior beyond the threshold incurs a rapidly increasing cost. It would then be very profitable to break a long jump into several small ones. If the underlying decision problem makes informative signals very valuable (F is globally very convex), the DM will find it profitable to jump multiple times before taking an action. One can also imagine an information measure that makes long jumps very cheap compared with short ones; then it might be profitable to choose contradictory evidence when confirmatory evidence does not increase the arrival rate by much (approximating the baseline setup in Che and Mierendorff (2016)).
To sum up, among the properties characterized in the current paper, the optimality of Poisson signals and the monotonicity of experimentation intensity are generally robust to alternative structures of the information measure. The immediate action and confirmatory evidence properties are more specific to the generalized KLD framework, or equivalently to the linear separability of the information measure.
5.2. Conclusion
This paper provides a framework for dynamic information acquisition that allows a fully general design of the information flow, and characterizes the optimal information acquisition strategy. My first contribution to the literature is a robust optimization foundation for a family of simple random processes: for a generic information acquisition problem with flexibility in experiment design, Poisson signals, which induce jumps in the posterior belief, are the endogenously optimal form of signal structure. Second, I contribute to the discussion of optimal experimentation design by providing a complete characterization of the optimal solution for a subset of problems with posterior separable measures: it is optimal to seek informative evidence that confirms the prior belief and leads to immediate action, with increasing precision and decreasing intensity over time. This paper also contributes to the rational inattention literature by providing a dynamic foundation: static rational inattention is both the full-patience limit and the linear-flow-cost limit of the dynamic information acquisition problem.
References
Aumann, R. J., Maschler, M., and Stearns, R. E. (1995). Repeated games with incomplete information. MIT Press.
Blackwell, D. (1951). Comparison of experiments. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 93–102.
Che, Y.-K. and Mierendorff, K. (2016). Optimal sequential decision with limited attention.
Cover, T. M. and Thomas, J. A. (2012). Elements of information theory. John Wiley & Sons.
Kamenica, E. and Gentzkow, M. (2009). Bayesian persuasion. Technical report, National Bureau of Economic Research.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
Matejka, F. and McKay, A. (2014). Rational inattention to discrete choices: A new foundation for the multinomial logit model. The American Economic Review, 105(1):272–298.
Moscarini, G. and Smith, L. (2001). The optimal level of experimentation. Econometrica, 69(6):1629–1644.
Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics, 50(3):665–690.
Wald, A. (1947). Foundations of a general theory of sequential decision functions. Econometrica, pages 279–313.
Contents
A Proofs in Section 2
  A.1 Concavification
  A.2 Simplification of Information Structure
    A.2.1 Proof of Lemma 1
    A.2.2 Proof of Lemma 2
    A.2.3 Proof of Lemma 3
  A.3 Continuous Time Limit of Experiments
    A.3.1 Proof of Lemma 4
B Proofs in Section 3
  B.1 Convergence
    B.1.1 Proof of Lemma 5
    B.1.2 Proof of Theorem 1
  B.2 Characterization
    B.2.1 Proof of Theorem 2
C Proofs in Section 4
  C.1 Convex Flow Cost
    C.1.1 Proof of Theorem 3
  C.2 Continuum of Actions
    C.2.1 Proof of Lemma 6
    C.2.2 Proof of Theorem 4
  C.3 General Information Measure
    C.3.1 Proof of Theorem 5
  C.4 Connection to Static Problem
    C.4.1 Proof of Theorem 7
    C.4.2 Proof of Corollary 8
    C.4.3 Proof of Theorem 9
A. Proofs in Section 2
This section contains formal proofs for theorems and lemmas in Section 2.
A.1. Concavification
Theorem 10 (Concavification). Let X be a finite state space, V ∈ C(∆X), µ ∈ ∆X, and let H ∈ C(∆X) be non-negative and f : ℝ₊ → ℝ₊ increasing and convex. Then there exists τ with |supp(τ)| ≤ 2|X| solving:
$$\sup_{\tau\in\Delta^2 X}\ \mathbb{E}_\tau[V(\mu')] - f\big(H(\mu)-\mathbb{E}_\tau[H(\mu')]\big) \qquad \text{s.t.}\quad \mathbb{E}_\tau[\mu']=\mu \tag{A.1}$$
Moreover, there exist C, λ ∈ ℝ₊ such that C = H(µ) − E_τ[H(µ′)], E_τ[µ′] = µ, λ ∈ df(C), and:
$$Co(V+\lambda H)(\mu) = \mathbb{E}_\tau[(V+\lambda H)(\mu')]$$
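As an illustration of the construction in Theorem 10 (a numerical sketch under assumptions of my own choosing, not part of the proof), the Python code below takes a binary state, a hypothetical payoff V, entropy as H and a fixed multiplier λ, builds the upper concave hull of V + λH on a grid, and reads off the two hull vertices that span a given prior—these are the posteriors supporting the optimal τ (two suffice in one dimension).

import numpy as np

lam = 0.3                                             # hypothetical multiplier lambda
grid = np.linspace(1e-6, 1 - 1e-6, 2001)
V = np.maximum(0.8*grid, 1 - grid)                    # hypothetical payoff from action
H = -grid*np.log(grid) - (1 - grid)*np.log(1 - grid)  # entropy as uncertainty measure

def upper_concave_hull(x, y):
    """Vertices of the upper concave envelope of the points (x, y)."""
    pts = []
    for xi, yi in zip(x, y):
        pts.append((xi, yi))
        while len(pts) >= 3:
            (x0, y0), (x1, y1), (x2, y2) = pts[-3], pts[-2], pts[-1]
            # drop the middle point whenever the slopes fail to be decreasing
            if (y1 - y0)*(x2 - x1) <= (y2 - y1)*(x1 - x0):
                pts.pop(-2)
            else:
                break
    hx, hy = map(np.array, zip(*pts))
    return hx, hy

hx, hy = upper_concave_hull(grid, V + lam*H)

mu0 = 0.55
i = np.searchsorted(hx, mu0)
# The prior mu0 is spanned by the two hull vertices around it: the supporting posteriors.
print("supporting posteriors near mu0:", hx[i - 1], hx[i])
print("Co(V + lam*H)(mu0) =", np.interp(mu0, hx, hy))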
Proof. We first show that, for fixed C > 0, there exist λ ≥ 0 and τ as defined in Theorem 10 such that τ solves the following constrained maximization problem:
$$\mathbb{E}_\tau[V(\mu')] = \sup_{p\in\Delta^2 X}\ \mathbb{E}_p[V(\mu')] \qquad \text{s.t.}\quad \mathbb{E}_p[\mu']=\mu,\quad H(\mu)-\mathbb{E}_p[H(\mu')]\le C \tag{A.2}$$
• Define the correspondence c∗ : ℝ₊ ⇒ ℝ₊ by:
$$c^*(\lambda) = \Big\{ H(\mu)-\mathbb{E}_p[H(\mu')] \ \Big|\ |\mathrm{supp}(p)|\le|X|,\ \mathbb{E}_p[\mu']=\mu,\ Co(V+\lambda H)(\mu)=\mathbb{E}_p[(V+\lambda H)(\mu')] \Big\}$$
where Co(V + λH) is the upper concave hull of V + λH. First, we show that c∗(λ) has a non-empty graph. ∀λ ≥ 0, if (V + λH)(µ) = Co(V + λH)(µ), let p = δ_µ, so c∗(λ) = {0}. If Co(V + λH)(µ) > (V + λH)(µ), then by definition of the upper concave hull the point (µ, Co(V + λH)(µ)) is a convex combination of points on the graph of V + λH. Since ∆X is (|X| − 1)-dimensional, the graph of V + λH is |X|-dimensional; therefore there exist at most |X| distinct µ_i and weights α_i with Σα_i = 1 such that Σα_i(µ_i, (V + λH)(µ_i)) = (µ, Co(V + λH)(µ)). Let τ be the distribution placing probability α_i on posterior µ_i; then τ generates an element of c∗(λ).
Second, c∗ has a closed graph. Consider any λ_n → λ, c_n → c, and let p_n be the corresponding signal structure with H(µ) − E_{p_n}[H(µ′)] = c_n. p_n can be summarized by (µ′ᵢⁿ, pᵢⁿ) ∈ ℝ^{|X|} × ∆(X). By compactness of Euclidean space, we can find a converging subsequence µ′ᵢⁿᵏ → µ′ᵢ, pᵢⁿᵏ → pᵢ. Continuity of H guarantees H(µ) − Σpᵢ H(µ′ᵢ) = c, and convergence guarantees Σpᵢµ′ᵢ = µ. Then Co(V + λH)(µ) ≥ E_p[(V + λH)(µ′)], and continuity of H and V guarantees that E_p[(V + λH)(µ′)] = lim Co(V + λ_nH)(µ). To prove that c ∈ c∗(λ), it is sufficient to prove that Co(V + λH)(µ) is continuous in λ. ∀λ′ ∈ ℝ₊, for every point g_i = (µ_i, (V + λH)(µ_i)) on the graph of V + λH, pick g′_i = (µ_i, (V + λ′H)(µ_i)) on the graph of V + λ′H. Let M = max_µ H; then |g_i − g′_i| ≤ |λ − λ′|M. Suppose (µ, Co(V + λH)(µ)) = Σα_ig_i; then:
$$\Big|\sum \alpha_i g_i' - \big(\mu,\, Co(V+\lambda H)(\mu)\big)\Big| = \Big|\sum \alpha_i g_i' - \sum\alpha_i g_i\Big| \ \le\ |\lambda - \lambda'|\, M$$
Therefore Co(V + λ′H)(µ) ≥ Co(V + λH)(µ) − |λ − λ′|M. Since λ and λ′ are interchangeable, we have actually shown that Co(V + λH)(µ) is Lipschitz continuous in λ with constant M. Therefore c ∈ c∗(λ).
• The next step is to convexify c∗(λ). Take ĉ∗(λ) = conv(c∗(λ)). Then ĉ∗ also has a closed graph: consider any λ_n → λ, c_n → c; the corresponding signal structures can be summarized by (αⁿ, µ′ᵢⁿ, pᵢⁿ, µ′ⱼⁿ, pⱼⁿ). By compactness of Euclidean space, we can find a subsequence converging to (α, µ′ᵢ, pᵢ, µ′ⱼ, pⱼ). Both (µ′ᵢ, pᵢ) and (µ′ⱼ, pⱼ) generate elements of c∗(λ), and
$$c = H(\mu) - \alpha\sum p_i H(\mu_i) - (1-\alpha)\sum p_j H(\mu_j) = \alpha\Big(H(\mu)-\sum p_i H(\mu_i)\Big) + (1-\alpha)\Big(H(\mu)-\sum p_j H(\mu_j)\Big) \in \mathrm{conv}\big(c^*(\lambda)\big).$$
• Let C̲ = lim_{λ→+∞} c∗(λ). C̲ must be zero: suppose not; then H(µ) > E_{p_n}[H(µ′)] + ε for some ε > 0, λ_n → ∞, and p_n defined as in c∗(λ_n). Then E_{p_n}[(V + λ_nH)(µ′)] − λ_nH(µ) ≤ E_{p_n}[V(µ′)] − λ_nε diverges to −∞, whereas Co(V + λ_nH)(µ) − λ_nH(µ) ≥ V(µ) is bounded below—a contradiction. Let C̄ = sup c∗(0) be the cost associated with the unconstrained maximization of E_p[V]. For any C ∈ [0, C̄], since ĉ∗ has convex values and a closed graph, there exists λ such that C ∈ ĉ∗(λ); for any C > C̄ we take λ = 0. Since the image of c∗ is one-dimensional, by the convexification taken in the last step there exist τ₁, τ₂ satisfying the definition of c∗(λ) and α ∈ [0,1] such that H(µ) − E_{ατ₁+(1−α)τ₂}[H(µ′)] = C. Let τ = ατ₁ + (1−α)τ₂; then |supp(τ)| ≤ 2|X|.
• Suppose now there exists p with E_p[µ′] = µ, H(µ) − E_p[H(µ′)] ≤ C and E_p[V(µ′)] > E_τ[V(µ′)]. Then E_p[V(µ′) + λ(H(µ′) − H(µ))] > E_τ[V(µ′) + λ(H(µ′) − H(µ))], contradicting the fact that E_τ[(V + λH)(µ′)] achieves the upper concave hull of V + λH at µ. So the τ defined here indeed solves Equation (A.2).
Let V∗(C) be the maximum in Equation (A.2). We now show concavity of V∗. Take any C₁, C₂ ∈ [0, C̄] and α ∈ [0,1], and let τ₁, τ₂ be signal structures achieving V∗(C₁) and V∗(C₂). Consider the compound signal ατ₁ + (1−α)τ₂. Then H(µ) − E_{ατ₁+(1−α)τ₂}[H(µ′)] = αC₁ + (1−α)C₂ and E_{ατ₁+(1−α)τ₂}[V(µ′)] = αV∗(C₁) + (1−α)V∗(C₂). Since ατ₁ + (1−α)τ₂ is feasible under cost αC₁ + (1−α)C₂, we have V∗(αC₁ + (1−α)C₂) ≥ αV∗(C₁) + (1−α)V∗(C₂). Obviously V∗(C) is weakly increasing for C ∈ [0, C̄], and for C > C̄ the constraint never binds, so V∗(C) is flat. Therefore V∗ is concave on all of ℝ₊.
Given convexity of f, V∗(C) − f(C) is concave and a maximum exists. We know that C maximizes V∗ − f if and only if dV∗(C) ∩ df(C) ≠ ∅, so it remains to compute the superdifferential of V∗ (its existence is guaranteed by concavity). We do so by proving a generalized envelope theorem. By the construction of V∗(C), for every C there exists λ(C) with C ∈ ĉ∗(λ(C)); let τ(C) be the associated signal structure maximizing E_τ[V + λ(C)H] (and hence solving Equation (A.2)). Using E_{τ(C)}[H(µ′)] = H(µ) − C and E_{τ(C₁)}[H(µ′)] = H(µ) − C₁, for any C₁:
$$\begin{aligned}
V^*(C_1)-V^*(C) &= \mathbb{E}_{\tau(C_1)}[V(\mu')]-\mathbb{E}_{\tau(C)}[V(\mu')]\\
&= \int \big(V(\mu')+\lambda(C)H(\mu')\big)\big(\tau(C_1)-\tau(C)\big)(d\mu') + \lambda(C)(C_1-C)\\
&\le \lambda(C)(C_1-C),
\end{aligned}$$
where the last inequality follows because τ(C) maximizes E_τ[V + λ(C)H]. On the other hand,
$$\begin{aligned}
V^*(C_1)-V^*(C) &= \int \big(V(\mu')+\lambda(C_1)H(\mu')\big)\big(\tau(C_1)-\tau(C)\big)(d\mu') + \lambda(C_1)(C_1-C)\\
&\ge \lambda(C_1)(C_1-C),
\end{aligned}$$
where the last inequality follows because τ(C₁) maximizes E_τ[V + λ(C₁)H]. Combining both, ∀dC > 0:
$$\lambda(C) \ \ge\ \frac{V^*(C+dC)-V^*(C)}{dC} \ \ge\ \lambda(C+dC), \qquad \lambda(C-dC) \ \ge\ \frac{V^*(C-dC)-V^*(C)}{-dC} \ \ge\ \lambda(C)$$
$$\Longrightarrow\quad dV^*(C) \ \subset\ \Big[\lim_{dC\to 0^+}\lambda(C+dC),\ \lim_{dC\to 0^+}\lambda(C-dC)\Big].$$
What's more, the previous argument implies that λ(C) is weakly decreasing. Since λ(C) is picked arbitrarily from the inverse graph of ĉ∗(λ), this implies that ĉ∗(λ) is weakly decreasing in the set order, i.e. ∀λ₁ > λ, inf ĉ∗(λ) ≥ sup ĉ∗(λ₁). Define Λ(C) = {λ | C ∈ ĉ∗(λ)}; we show that:
$$\Big[\lim_{dC\to 0^+}\lambda(C+dC),\ \lim_{dC\to 0^+}\lambda(C-dC)\Big] = \Lambda(C)$$
Monotonicity of ĉ∗ implies Λ(C) ⊂ [lim_{dC→0⁺}λ(C+dC), lim_{dC→0⁺}λ(C−dC)], and the closed graph property implies [lim_{dC→0⁺}λ(C+dC), lim_{dC→0⁺}λ(C−dC)] ⊂ Λ(C).
Finally, we show that dV∗(C) = Λ(C). Λ(C) is a correspondence with convex values, a closed graph, and decreasing in the set order. By the definition of the subdifferential of a concave function, dV∗(C) is also a correspondence with convex values, a closed graph, and decreasing in the set order. Suppose dV∗(C) ⊊ Λ(C). The first possibility is inf dV∗(C) > inf Λ(C) = lim_{C′→C⁺} Λ(C′) ≥ lim_{C′→C⁺} dV∗(C′); the second is sup dV∗(C) < sup Λ(C) = lim_{C′→C⁻} Λ(C′) ≤ lim_{C′→C⁻} dV∗(C′). Both cases contradict the closed graph property of dV∗.
To sum up, we have shown that V∗ attains the maximum of Equation (A.1) if and only if there exists C such that df(C) ∩ Λ(C) ≠ ∅. That is to say, ∃τ such that |supp(τ)| ≤ 2|X|, E_τ[µ′] = µ, H(µ) − E_τ[H(µ′)] = C and Co(V + λH)(µ) = E_τ[(V + λH)(µ′)].
A.2. Simplification of Information Structure
A.2.1. Proof of Lemma 1:
Proof.
1. Markov property: Let s, t denote the signal realizations of S, T. Then:
$$I(T;X\mid S) = \mathbb{E}_s\big[H(\mu(x|s)) - \mathbb{E}_t[H(\mu(x|t,s))\mid s]\big] = \mathbb{E}_s\big[H(\mu(x|s)) - \mathbb{E}_t[H(\mu(x|s))\mid s]\big] = 0$$
The first equality is by the definition of I. The second holds because T ⊥ X | S: conditional on s, the realization t does not shift the belief about X at all.
2. Chain rule: Let s, t denote the signal realizations of S, T. Then:
$$\begin{aligned}
I(S,T;X\mid\mu) &= \mathbb{E}_{s,t}\big[H(\mu)-H(\mu(x|s,t))\big]\\
&= \mathbb{E}_{s,t}\big[H(\mu)-H(\mu(x|s)) + \big(H(\mu(x|s))-H(\mu(x|s,t))\big)\big]\\
&= \big(H(\mu)-\mathbb{E}_s[H(\mu(x|s))]\big) + \mathbb{E}_s\big[H(\mu(x|s)) - \mathbb{E}_t[H(\mu(x|s,t))\mid s]\big]\\
&= I(S;X\mid\mu) + \mathbb{E}\big[I(T;X\mid S,\mu)\big]
\end{aligned}$$
The first equality is by definition, the second is trivial, and the third is the chain rule of conditional expectation.
3. Information processing inequality:
$$I(S;X\mid\mu) = I(S,T;X\mid\mu) - I(T;X\mid S,\mu) = I(S,T;X\mid\mu) = I(T;X\mid\mu) + I(S;X\mid T,\mu) \ \ge\ I(T;X\mid\mu)$$
The first and third equalities follow from the chain rule, and the second from the Markov property.
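A minimal numerical check of the chain rule property (my own illustration, not part of the proof): with a binary state, Shannon entropy as the uncertainty measure, and two hypothetical signals S and T that are conditionally independent given X, the measure of the joint experiment equals the measure of S plus the expected measure of T evaluated at the posteriors induced by S.

import numpy as np

H = lambda p: -np.sum(p*np.log(p))        # entropy; all probabilities here are > 0

mu = np.array([0.4, 0.6])                  # prior on X in {0, 1}
qS = np.array([[0.8, 0.2], [0.3, 0.7]])    # P(S = s | X = x), rows indexed by x
qT = np.array([[0.6, 0.4], [0.1, 0.9]])    # P(T = t | X = x)

def info(prior, likelihood):
    """I(signal; X | prior) = H(prior) - E_signal[H(posterior)]."""
    joint = prior[:, None]*likelihood      # P(X = x, signal = s)
    p_sig = joint.sum(axis=0)
    post = joint/p_sig                     # columns are posteriors given each signal
    return H(prior) - sum(p_sig[s]*H(post[:, s]) for s in range(len(p_sig)))

# Left-hand side: information of the joint signal (S, T).
qST = np.einsum('xs,xt->xst', qS, qT).reshape(2, -1)   # P(S = s, T = t | X = x)
lhs = info(mu, qST)

# Right-hand side: I(S; X | mu) + E_s[ I(T; X | posterior induced by s) ].
joint_S = mu[:, None]*qS
pS = joint_S.sum(axis=0)
post_S = joint_S/pS
rhs = info(mu, qS) + sum(pS[s]*info(post_S[:, s], qT) for s in range(2))

print(lhs, rhs)    # the two values coincide up to floating-point error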
A.2.2. Proof of Lemma 2:
Proof. I break the proof of Lemma 2 into three lemmas. Lemma 7 shows that solving Equation (1) is equivalent to solving Equation (A.3), which reduces the signal structure to a nested one containing only actions as direct signals and continuation signals. Lemma 8 then shows that solving Equation (A.3) is equivalent to solving Equation (A.4), which transforms the abstract random process formulation into a conditional distribution formulation. Lemma 9 shows that solving the functional equation (A.5) is equivalent to solving the sequential problem (A.4), using the standard methodology. Finally, we apply Theorem 10 to Equation (A.5) to further reduce the dimensionality of the strategy space, yielding Equation (2).
Lemma 7 (Reduction of redundancy). (S^t, T, A_T) solves Equation (1) if and only if there exists (S̃^t, T, A_T) solving:
$$\sup_{\tilde S^t,\,T,\,A_T}\ \sum_{t=0}^{\infty} e^{-\rho dt\, t}\Big( P[T=t]\,\mathbb{E}\big[u(A_t,X)\mid \tilde S^{t-1}, T=t\big] - P[T>t]\,\mathbb{E}\big[ f\big( I(\tilde S^t;X\mid \tilde S^{t-1})\big)\mid T>t\big]\Big) \tag{A.3}$$
$$\text{s.t.}\quad \big(\tilde S^{t-1}, 1_{T>t+1}\big)\big|_{\tilde S^t}\ \text{are degenerate},\qquad \tilde S^{t-1}\big|_{T=t} = A_t\big|_{T=t},\qquad \tilde S^{t}\big|_{T\le t} = \tilde S^{t-1}\big|_{T\le t}$$
What's more, the optimal utility level is the same in Equation (1) and Equation (A.3).
Proof. Suppose (S^t, T, A_t) is a solution to Equation (1). First, combine the signal T into S and let Ŝ^t = (S^t, 1_{T>t+1}). Consider:
$$\begin{aligned}
I(\hat S^t; X\mid \hat S^{t-1}) &= I(S^t, 1_{T>t+1}; X\mid S^{t-1}, 1_{T>t})\\
&= \mathbb{E}_{1_{T>t}}\big[I(S^t, 1_{T>t+1}; X\mid S^{t-1}, 1_{T>t})\big]\\
&= I(S^t, 1_{T>t+1}, 1_{T>t}; X\mid S^{t-1}) - I(1_{T>t}; X\mid S^{t-1})\\
&= I(S^t, 1_{T>t+1}; X\mid S^{t-1})\\
&= I(S^t; X\mid S^{t-1})
\end{aligned}$$
The second equality follows from the Markov property X → S^{t−1} → 1_{T>t}: conditional on S^{t−1}, 1_{T>t} does not shift the belief about X. The third equality is the chain rule, and the fourth and fifth equalities again use the Markov chain property. Now define the nested information structure S̄^t = (Ŝ^0, …, Ŝ^t). Iterating the same argument period by period—applying the chain rule and then using the Markov property of each indicator 1_{T>τ} to drop the uninformative terms—gives:
$$I(\bar S^t; X\mid \bar S^{t-1}) = I(\hat S^t; X\mid S^{t-1}) = I(S^t; X\mid S^{t-1}).$$
Now define S̃^t by:
$$\tilde S^t = \begin{cases} \hat S^{t-1} & \text{when } T < t+1\\ \big(A_{t+1},\ \hat S^{t-1}\big) & \text{when } T = t+1\\ \hat S^t & \text{when } T > t+1 \end{cases}$$
The initial information S̃^{−1} is defined as a degenerate (uninformative) signal, so the induced belief is the prior. We verify the properties of S̃^t:
1. S̃^{t−1}|_{T=t} = A_t|_{T=t} is satisfied by definition. What's more, S̃^{t−1} is sufficient for 1_{T>t} and A_t by definition.
2. By definition, 1_{T>t+1} is contained in Ŝ^t, and Ŝ^{t−1} is also contained in Ŝ^t. Therefore (S̃^{t−1}, 1_{T>t+1})|_{S̃^t} is degenerate by definition when T > t + 1; when T ≤ t + 1, 1_{T>t+1} is degenerate automatically. This is of course a special case of the Markov chain condition in Equation (1). Therefore (S̃^t, T, A_t) is feasible in both Equation (1) and Equation (A.3).
3. Now we calculate the information cost associated with (S̃^t, T, A_t):
$$I(\tilde S^t; X\mid \tilde S^{t-1}) = \begin{cases} 0 & \text{if } T< t+1\\[3pt] P[T=t+1]\, I\big(A_{t+1}; X\mid \tilde S^{t-1}, T=t+1\big) + P[T>t+1]\, I\big(\hat S^{t}; X\mid \tilde S^{t-1}, T>t+1\big) & \text{if } T\ge t+1 \end{cases}$$
$$\le\ I(\hat S^t; X\mid \hat S^{t-1}) = I(S^t; X\mid S^{t-1}),$$
where the first branch is zero because S̃^t = S̃^{t−1} when T < t + 1, and in the second branch only A_{t+1} enters because Ŝ^{t−1} is already contained in S̃^{t−1}. Therefore (S̃^t, T, A_t) dominates the original solution of Equation (1): it achieves the same action profile at a weakly lower cost. By optimality, they must achieve the same utility level. What remains to be proved is that (S̃^t, T, A_t) solves Equation (A.3). Of course (S̃^t, T, A_t) is feasible in Equation (A.3). Suppose Equation (A.3) has a solution (S̃^t, T, A_t); it is feasible in Equation (1), and:
$$\begin{aligned}
&\mathbb{E}\Big[ e^{-\rho dt\,T}\,\mathbb{E}\big[u(A_T,X)\mid \tilde S^{T-1}\big] - \sum_{t=0}^{\infty} e^{-\rho dt\, t} f\big(I(\tilde S^t; X\mid \tilde S^{t-1})\big)\Big]\\
&= \sum_{t=0}^\infty e^{-\rho dt\, t}\Big( P[T=t]\,\mathbb{E}\big[u(A_t,X)\mid \tilde S^{t-1},T=t\big] - \mathbb{E}\big[f\big(I(\tilde S^t;X\mid \tilde S^{t-1})\big)\big]\Big)\\
&= \sum_{t=0}^\infty e^{-\rho dt\, t}\Big( P[T=t]\,\mathbb{E}\big[u(A_t,X)\mid \tilde S^{t-1},T=t\big] - P[T>t]\,\mathbb{E}\big[f\big(I(\tilde S^t;X\mid \tilde S^{t-1})\big)\mid T>t\big]\Big)
\end{aligned}$$
The last equality uses I(S̃^t; X | S̃^{t−1}, T ≤ t) = 0. Therefore the maximum of Equation (1) is weakly higher than the maximum of Equation (A.3). Combining the previous argument, they must be identical.
Lemma 8 (Transformation of space). (S^t, T, A_T) solves Equation (1) if and only if there exist p^t(µ^{t+1}|µ^t) : ∆X → ∆²X and q_s^t(µ^t) : ∆X → [0,1] solving:
$$\sup_{(p^t,\,q_s^t)}\ \sum_{t=0}^{\infty} e^{-\rho dt\, t}\!\int_{\Delta X^{t}} \Big[\Big(\max_a \sum_j u(a,x_j)\mu^t_j\Big) q_s^t(\mu^t) - f\Big( H(\mu^t) - \int_{\Delta X}\! H(\mu^{t+1})\,p^t(\mu^{t+1}|\mu^t)\,d\mu^{t+1}\Big)\big(1-q_s^t(\mu^t)\big)\Big] \prod_{\tau=0}^{t-1} p^\tau(\mu^{\tau+1}|\mu^\tau)\big(1-q_s^\tau(\mu^\tau)\big)\, d\mu^1\cdots d\mu^{t-1}\,d\mu^t \tag{A.4}$$
$$\text{s.t.}\quad \int_{\Delta X}\mu\, p^t(\mu|\mu^t)\,d\mu = \mu^t$$
What's more, the optimal utility level is the same in Equation (1) and Equation (A.4).
Proof. Let p^t(·|µ^t) be the distribution of posteriors generated by S̃^t conditional on T > t and S̃^{t−1} = s̃^{t−1}, where µ^t is the posterior belief associated with the signal realization s̃^{t−1}, and let q_s^t(µ^t) = P[T = t | S̃^{t−1} = s̃^{t−1}, T ≥ t]. The distribution of (S̃, T, A) can now be represented explicitly with these conditional distributions. First, P[T = t] and P[T > t] are obtained by iterating the conditioning period by period:
$$P[T=t] = \int \prod_{\tau=0}^{t-1} p^\tau(\mu^{\tau+1}|\mu^\tau)\big(1-q_s^\tau(\mu^\tau)\big)\, q_s^t(\mu^t)\,d\mu^1\cdots d\mu^t,\qquad
P[T>t] = \int \prod_{\tau=0}^{t-1} p^\tau(\mu^{\tau+1}|\mu^\tau)\big(1-q_s^\tau(\mu^\tau)\big)\big(1-q_s^t(\mu^t)\big)\,d\mu^1\cdots d\mu^t.$$
The joint distributions of (T, µ^t) are obtained by omitting the integration over µ^t, and the conditional laws of A_t given T = t and of S̃^t given T > t are the corresponding normalized densities. This implies:
$$P[T=t]\,\mathbb{E}\big[u(A_t,X)\mid \tilde S^{t-1},T=t\big] = \int_{\Delta X}\Big(\max_a\sum_j u(a,x_j)\mu^t_j\Big)\int_{\Delta X^{t-1}}\prod_{\tau=0}^{t-1} p^\tau(\mu^{\tau+1}|\mu^\tau)\big(1-q_s^\tau(\mu^\tau)\big)\,q_s^t(\mu^t)\,d\mu^1\cdots d\mu^{t-1}\,d\mu^t,$$
$$P[T>t]\,\mathbb{E}\big[f\big(I(\tilde S^t;X\mid\tilde S^{t-1})\big)\mid T>t\big] = \int_{\Delta X} f\Big(H(\mu^t)-\int_{\Delta X}H(\mu^{t+1})p^t(\mu^{t+1}|\mu^t)\,d\mu^{t+1}\Big)\int_{\Delta X^{t-1}}\prod_{\tau=0}^{t-1} p^\tau(\mu^{\tau+1}|\mu^\tau)\big(1-q_s^\tau(\mu^\tau)\big)\big(1-q_s^t(\mu^t)\big)\,d\mu^1\cdots d\mu^{t-1}\,d\mu^t.$$
To sum up, starting from (S̃, T, A) solving Equation (A.3) we can construct (p^t, q_s^t) such that the value of Equation (A.3) is achieved in Equation (A.4). Conversely, starting from (p^t, q_s^t) solving Equation (A.4), define T by T|_{T≥t,µ^t} ∼ B(q_s^t(µ^t)) conditionally independently across all t and µ^t, S̃^t|_{T>t,µ^t} ∼ p^t(·|µ^t), and A_t|_{T=t,µ^t} = arg max_a Σ_j u(a,x_j)µ^t_j. The previous calculation then shows that the value of Equation (A.4) is also achieved in Equation (A.3). Combining with the previous result, we conclude that Equation (A.3) and Equation (A.4) are equivalent in the sense that (S̃, T, A) solves Equation (A.3) if and only if the corresponding (p^t, q_s^t) solves Equation (A.4).
Lemma 9 (Recursive representation). V_dt(µ) is the optimal utility level solving Equation (1) given initial belief µ if and only if V_dt(µ) satisfies the following functional equation:
$$V_{dt}(\mu) = \max\Big\{ \max_a \mathbb{E}[u(a,x)\mid\mu],\ \sup_{p\in\Delta^2 X}\ e^{-\rho dt}\!\int_{\Delta X} V_{dt}(\mu')p(\mu')\,d\mu' - f(C)\Big\} \tag{A.5}$$
$$\text{s.t.}\quad \int_{\Delta X}\nu\, p(\nu)\,d\nu = \mu, \qquad C = H(\mu) - \int_{\Delta X} H(\nu)p(\nu)\,d\nu$$
Proof. We first derive the recursive representation of Equation (A.4). Consider the following functional equation:
$$V_{dt}(\mu) = \sup_{q_s(\mu),\,p(\cdot|\mu)}\ q_s(\mu)\max_a\sum_j u(a,x_j)\mu_j + \big(1-q_s(\mu)\big)\Big[e^{-\rho dt}\!\int_{\Delta X}V_{dt}(\nu)p(\nu|\mu)\,d\nu - f\Big(H(\mu)-\int_{\Delta X}H(\nu)p(\nu|\mu)\,d\nu\Big)\Big]$$
$$\text{s.t.}\quad \int_{\Delta X}\nu\,p(\nu|\mu)\,d\nu = \mu$$
Since the RHS is linear in q_s(µ), it is without loss of generality to consider only the boundary solutions q_s(µ) ∈ {0, 1}, in which case the equation is exactly Equation (A.5). Now consider the equivalence between the sequential problem and the recursive problem. By assumption, E[u(a,x)|µ] is bounded above by max_{a,x}u(a,x), so e^{−ρdt·t}E[u(a,x)|µ] converges to zero uniformly (over all choices of µ, a) as t → ∞. Therefore V_dt(µ) is the solution of Equation (A.4).
A.2.3. Proof of Lemma 3
Proof.
• Let Z = {V ∈ C(∆X) | F ≤ V ≤ F̂}. Define the operator:
$$T(V)(\mu) = \max\Big\{F(\mu),\ \max_{p_i,\mu_i}\ e^{-\rho dt}\sum p_iV(\mu_i) - f(C)\Big\} \tag{A.6}$$
$$\text{s.t.}\quad C = \sum p_i\big(H(\mu)-H(\mu_i)\big),\qquad \sum p_i\mu_i = \mu$$
Note that the maximization operator is well defined since V ∈ C(∆X). We show that T is a contraction mapping on the space (Z, ℓ^∞).
• T(Z) ⊂ Z: First, by choosing the uninformative signal structure µ_i = µ, the constraints in Equation (A.6) are satisfied, so T(V)(µ) ≥ F(µ). What's more, since F̂(µ) is the full-information value, the upper concave hull of F̂ is F̂ itself; therefore T(V)(µ) ≤ max{F(µ), e^{−ρdt}F̂(µ)} ≤ F̂(µ). What remains to be shown is that ∀V ∈ Z, T(V) is continuous.
We first show that T(V) is lower semi-continuous, i.e. lim inf_{µ′→µ} T(V)(µ′) ≥ T(V)(µ). If T(V)(µ) = F(µ) this is trivial because T(V) ≥ F, so suppose T(V)(µ) > F(µ). Let (p_i, µ_i) solve Equation (A.6) at µ for V; WLOG drop all signals with p_i = 0, so the remaining signals satisfy the Bayes condition with p_i > 0. Let q(µ_i|x_j) be the conditional probability of posterior µ_i given state x_j. For any µ′, define:
$$\mu_i'(x_j) = \frac{q(\mu_i|x_j)\,\mu'(x_j)}{\sum_j q(\mu_i|x_j)\,\mu'(x_j)},\qquad p_i' = \sum_j q(\mu_i|x_j)\,\mu'(x_j)$$
$$\Longrightarrow\qquad \frac{\partial p_i'}{\partial \mu_j'}\bigg|_{\mu'=\mu} = q(\mu_i|x_j),\qquad \frac{\partial \mu_i'}{\partial \mu_j'}\bigg|_{\mu'=\mu} = -\mu_i(x_j)\,\frac{q(\mu_i|x_j)}{p_i} + \frac{q(\mu_i|x_j)}{p_i}\,1_{i=j}$$
Since p_i > 0, q(µ_i|x_j) ≤ 1 and µ_i ∈ ∆X, there exist δ, M > 0 such that ∀|µ′ − µ| ≤ δ, |µ′_i − µ_i| + |p′_i − p_i| ≤ M|µ′ − µ|; in particular µ′_i and p′_i are continuous in µ′. Now define:
$$\tilde V(\mu') = e^{-\rho dt}\sum_i p_i'V(\mu_i')\,\min\Big\{1,\ \frac{\sum_i p_i\big(H(\mu)-H(\mu_i)\big)}{\sum_i p_i'\big(H(\mu')-H(\mu_i')\big)}\Big\} - f\Big(\sum_i p_i\big(H(\mu)-H(\mu_i)\big)\Big)$$
Since V and H are continuous around µ and the µ_i, Ṽ(µ′) is continuous around µ, and Ṽ(µ) = T(V)(µ). If Σ_ip_i(H(µ) − H(µ_i)) ≥ Σ_ip′_i(H(µ′) − H(µ′_i)), then
$$\tilde V(\mu') = e^{-\rho dt}\sum_i p_i'V(\mu_i') - f\Big(\sum_i p_i\big(H(\mu)-H(\mu_i)\big)\Big) \ \le\ e^{-\rho dt}\sum_i p_i'V(\mu_i') - f\Big(\sum_i p_i'\big(H(\mu')-H(\mu_i')\big)\Big) \ \le\ T(V)(\mu'),$$
since (p′_i, µ′_i) is feasible at µ′. Otherwise, consider the following diluted information structure at µ′:
$$\mu_i'' = \mu_i',\quad \mu_0'' = \mu',\qquad p_i'' = p_i'\,\frac{\sum p_i\big(H(\mu)-H(\mu_i)\big)}{\sum p_i'\big(H(\mu')-H(\mu_i')\big)},\qquad p_0'' = 1 - \frac{\sum p_i\big(H(\mu)-H(\mu_i)\big)}{\sum p_i'\big(H(\mu')-H(\mu_i')\big)}$$
This structure satisfies the Bayes condition at µ′ and its cost is Σp″_i(H(µ′) − H(µ″_i)) = Σp_i(H(µ) − H(µ_i)); therefore
$$\tilde V(\mu') = e^{-\rho dt}\sum_i p_i''V(\mu_i'') - f\Big(\sum_i p_i''\big(H(\mu')-H(\mu_i'')\big)\Big) \ \le\ T(V)(\mu'),$$
by suboptimality of (p″_i, µ″_i) in Equation (A.5). In both cases Ṽ(µ′) ≤ T(V)(µ′), hence
$$\liminf_{\mu'\to\mu} T(V)(\mu') \ \ge\ \liminf_{\mu'\to\mu}\tilde V(\mu') = \tilde V(\mu) = T(V)(\mu).$$
Then T(V)(µ) is lower semi-continuous on ∆X. Second, we show that T(V) is upper semi-continuous, i.e. lim sup_{µ′→µ} T(V)(µ′) ≤ T(V)(µ). Consider a sequence µ_n → µ such that lim T(V)(µ_n) = lim sup_{µ′→µ} T(V)(µ′). Since the number of posteriors in the optimization problem is bounded by 2|X| by Theorem 10, we can find a subsequence of n such that (p_iⁿ, µ_iⁿ) → (p_i, µ_i); this is done by choosing converging µ_iⁿ and p_iⁿ for each index. By continuity of H:
$$\sum p_i\big(H(\mu)-H(\mu_i)\big) = \lim_{n\to\infty}\sum p_i^n\big(H(\mu_n)-H(\mu_i^n)\big),\qquad \sum p_i\mu_i = \mu,\qquad \sum p_i = 1$$
Since V is continuous, it is continuous at each µ_i. Since f is convex, it is continuous at every C < ∞ with f(C) < ∞. By optimality, f(Σp_iⁿ(H(µ_n) − H(µ_iⁿ))) < ∞; since f⁻¹(ℝ₊) is closed, f(Σp_i(H(µ) − H(µ_i))) < ∞, so f is continuous around this point. Therefore:
$$T(V)(\mu) \ \ge\ e^{-\rho dt}\sum p_iV(\mu_i) - f\Big(\sum p_i\big(H(\mu)-H(\mu_i)\big)\Big) = \lim_{n\to\infty}\Big[e^{-\rho dt}\sum p_i^nV(\mu_i^n) - f\Big(\sum p_i^n\big(H(\mu_n)-H(\mu_i^n)\big)\Big)\Big] = \limsup_{\mu'\to\mu} T(V)(\mu')$$
To sum up, T(V)(µ) is continuous.
• T(V) is monotone. Suppose U(µ) ≥ 0 and U + V ∈ Z. If T(V)(µ) = F(µ), then by construction T(V + U)(µ) ≥ F(µ) = T(V)(µ). If T(V)(µ) > F(µ), let (p_i, µ_i) be a solution to Equation (A.6) at µ for V; then:
$$T(V+U)(\mu) \ \ge\ e^{-\rho dt}\sum p_i\big(V(\mu_i)+U(\mu_i)\big) - f\Big(\sum p_i\big(H(\mu)-H(\mu_i)\big)\Big) = T(V)(\mu) + e^{-\rho dt}\sum p_iU(\mu_i) \ \ge\ T(V)(\mu)$$
• T(V) discounts: we claim that T(V + α)(µ) ≤ T(V)(µ) + e^{−ρdt}α. Suppose not at some µ; then obviously T(V + α)(µ) > F(µ). Let (p_i, µ_i) be the solution of Equation (A.6) at µ for V + α. Then:
$$T(V)(\mu) \ \ge\ e^{-\rho dt}\sum p_iV(\mu_i) - f\Big(\sum p_i\big(H(\mu)-H(\mu_i)\big)\Big) = T(V+\alpha)(\mu) - e^{-\rho dt}\alpha \ >\ T(V)(\mu),$$
a contradiction.
• Therefore, by the Blackwell conditions, T is a contraction mapping on Z, and there exists a unique V_dt ∈ Z solving the fixed point problem T(V_dt) = V_dt.
A.3. Continuous Time Limit of Experiments
A.3.1. Proof of Lemma 4
Proof. First, Lemma 10 shows that under either set of assumptions, the flow cost of the optimal strategy in each period is bounded by a finite constant times the interval length. Let us then discuss all possible cases:
• Case 1: Suppose for some i, µ′_{i,dt} − µ ̸→ 0. Take a subsequence along which µ′_{i,dt} converges to some µ′_i bounded away from µ. Without loss, we can combine all the other signals into one, which only lowers the conditional information since the combined experiment is less informative. For the combined signal µ′_{0,dt} we must have µ′_{0,dt} → µ: otherwise the information measure would remain of order one, contradicting Lemma 10. Since µ′_{0,dt} → µ and Σp_{i,dt}µ′_{i,dt} + p_{0,dt}µ′_{0,dt} = µ, we have:
$$\sum p_{i,dt}H(\mu'_{i,dt}) + p_{0,dt}H(\mu'_{0,dt}) - H(\mu) = \sum p_{i,dt}\big(H(\mu'_{i,dt})-H(\mu)-H'(\mu)(\mu'_{i,dt}-\mu)\big) + p_{0,dt}\Big[\frac{H''(\mu)}{2}(\mu'_{0,dt}-\mu)^2 + O\big((\mu'_{0,dt}-\mu)^3\big)\Big]$$
By strict concavity of the uncertainty measure and because the limits µ′_i are bounded away from µ, the terms multiplying p_{i,dt} are bounded away from zero. Thus, for the whole expression to be of order O(dt), p_{i,dt} has to be O(dt).
• Case 2: For the remaining i's, µ′_{i,dt} → µ. Combine all the other i's into µ′_{0,dt}, p_{0,dt}. By the argument in the first part, µ′_{0,dt} − µ ∼ O(√dt) (this follows from the quadratic term in the expansion above). We have:
$$\sum p_{i,dt}H(\mu'_{i,dt}) + p_{0,dt}H(\mu'_{0,dt}) - H(\mu) = \sum p_{i,dt}\Big[\frac{H''(\mu)}{2}(\mu'_{i,dt}-\mu)^2 + O\big((\mu'_{i,dt}-\mu)^3\big)\Big] + O(dt)$$
For the whole term to be bounded by O(dt), we must have p_{i,dt}(µ′_{i,dt} − µ)² ∼ O(dt).
Thus, along any convergent sequence of experiments there are only two kinds of limiting behavior: either µ′_{i,dt} ̸→ µ and p_{i,dt} ∼ O(dt), or µ′_{i,dt} → µ and p_{i,dt}(µ′_{i,dt} − µ)² ∼ O(dt). Put differently, in the limit the optimal policies involve only Poisson-like signals and diffusion-like signals.
Lemma 10 (Bounded flow cost). With Assumption 2′ satisfied, there exists ∆ ∈ ℝ₊ such that C∗_dt(µ) ≤ ∆·dt for all µ, dt, where C∗_dt(µ) = Σp_i(H(µ) − H(µ_i)) for the optimal (p_i, µ_i) in Equation (7).
Proof. ∀dt, ∀ optimal policy (p_i(µ), µ_i(µ)), ∀µ ∈ ∆(X), by optimality:
$$\max\Big\{F(\mu),\ e^{-\rho dt}\sum p_iV(\mu_i) - h\Big(\frac{C^*_{dt}(\mu)}{dt}\Big)dt\Big\}\ \ge\ e^{-2\rho dt}\sum p_iV(\mu_i) - h\Big(\frac{C^*_{dt}(\mu)}{2dt}\Big)dt - e^{-\rho dt}h\Big(\frac{C^*_{dt}(\mu)}{2dt}\Big)dt.$$
The first line is simply the definition of the optimal value function; the second line is the value of delaying experimentation by one more period and dividing the experiment so that the cost paid in each of the two periods is the same. Let us first show that dividing the experiment in this way is feasible. Consider any optimal experiment (p_i, µ_i) at prior µ. There are two possibilities:
• Case 1: ∀i, µ_i = µ, i.e. the optimal experiment at prior µ is uninformative. Then C∗_dt(µ) = 0 and the proof is done.
• Case 2: C∗_dt(µ) > 0, so ∃µ_i ≠ µ. Take intermediate posteriors µ′_i = α_iµ_i + (1 − α_i)µ, and let the second-stage experiment at µ′_i place probability pⁱ_j = α_ip_j + 1_{j=i}(1 − α_i) on the original posterior µ_j; the first-stage experiment places the appropriately renormalized probabilities p′_i ∝ p_i/(1 − α_i) on the intermediate posteriors µ′_i. Each stage satisfies the Bayes condition, the induced terminal distribution over posteriors is the original (p_i, µ_i), and by continuity of H the mixing weights α_i can be chosen so that the information cost paid in the first stage and at every second-stage posterior equals C∗_dt(µ)/2.
On the other hand, C∗_dt(µ) > 0 implies that F(µ) is suboptimal, so the optimal value is attained by the experimentation branch, and the inequality above yields:
$$\frac{e^{-\rho dt}\big(1-e^{-\rho dt}\big)}{dt}\sum p_iV(\mu_i)\ \ge\ h\Big(\frac{C^*_{dt}(\mu)}{dt}\Big) - 2h\Big(\frac{C^*_{dt}(\mu)}{2dt}\Big).$$
We now proceed in three steps.
• Step 1: The left-hand side is bounded above by 2ρ∆, choosing ∆ = max_{a,x}u(a,x).
• Step 2: h(x) − 2h(x/2) is bounded below by x²ε/4, where ε denotes the lower bound on h″ implied by Assumption 2′:
$$h(x)-2h(x/2) = \int_{x/2}^{x}h'(z)\,dz - \int_{0}^{x/2}h'(z)\,dz = \int_0^{x/2}\!\!\int_0^{x/2}h''(z+y)\,dz\,dy\ \ge\ \frac{x^2\varepsilon}{4}.$$
• Step 3: Combining Steps 1 and 2 with x = C∗_dt(µ)/dt:
$$2\rho\Delta\ \ge\ \frac{C^*_{dt}(\mu)^2\,\varepsilon}{4\,dt^2}\quad\Longrightarrow\quad C^*_{dt}(\mu)\ \le\ 2\sqrt{\frac{2\rho\Delta}{\varepsilon}}\,dt.$$
B. Proofs in Section 3
B.1. Convergence
B.1.1. Proof of Lemma 5
Proof. We break the proof of Lemma 5 into three steps:
• Step 1 : Let V dt = lim supn→∞ V dtn , then V dt − V dtn → 0. First it’s trivial that V dtn is
2
2
2
an increasing sequence, because every experimentation strategy associated with 2dtn can
dt
be replicated in a problem with 2n+1
. The DM can always split the experiment into two
stages with equal cost in two periods and get an identical distribution of posterior beliefs
at the end of second period. Thus existence of V dt = lim V dtn is guaranteed by monotonic
2
convergence theorem.
Now let’s prove the convergence is uniform, i.e. V dtn is a Cauchy sequence under sup
2
norm. ∀m > n, ∀µ0 , consider the problem with 2dtm , consider the optimal experimentation
(pi (µ), µi (µ)) and associated action rule AT , the expected utility is:
∑
dt
e−ρT 2m Eµ0 [u(AT , X)] .
V dtm (µ0 ) =
2
=
∑
e
−ρT 2dt
n
2m−n
∑−1
e−ρτ 2m Eµ0 [u(AT 2m−n +τ , X)]
dt
(B.1)
τ =0
The second equality is get by rewriting T = 2m−n T ′ + τ . Then take summation first over
τ then over T ′ (and relabel T ′ to be T ).
Now we construct an experimentation strategy for problem with 2dtn . We combine all
experiments between 2m−n T to 2m−n (T + 1), and get the joint distribution of posteriors.
We use this as the signal structure in each period T . Given this construction, at the end of
each 2m−n T , the posterior distribution will be exactly as that using original experiment.
Then we assign same action as before to each posterior. By construction this action profile
satisfies Markov property of information (i.e. signal realization is a sufficient statistics
for action). In the end, the new experimentation strategy satisfies cost constraint due to
linearity of Mutual information. Therefore if we let U (µ0 ) be the discounted expected
utility associated with the aforementioned strategy at µ0 :
V dtn (µ0 ) ≥U (µ0 )
2
=
∑
−ρT 2dt
n
e
2m−n
∑−1
e−ρ 2n Eµ0 [u(A2m−n T +τ , X)]
dt
(B.2)
τ =0
=e
−ρ 2dt
n
∑
e
−ρT 2dt
n
2m−n
∑−1
Eµ0 [u(A2m−n T +τ , X)]
τ =0
>e−ρT 2n
dt
∑
e−ρT 2n
dt
2m−n
∑−1
e−ρ 2m Eµ0 [u(A2m−n T +τ , X)]
dt
τ =0
=e
−ρ 2dt
n
V dtm (µ0 )
2
Noticing that Equation (B.2) is different from Equation (B.1) by only one term: the
dt
dt
discounting term in inner summation (e−ρ 2m and e−ρ 2n ). This characterize the experiment design in problem 2dtn . In each period T , actions are all postponed to the
end of period. Therefore they are discounted by 2dtn , which is period length. The
dt
second equality is from moving the constant e−ρ 2n out of summations. The next indt
equality is from
e−ρ 2m( < 1 By Lemma
3, Vdt are uniformly bounded by max v, then
)
dt
V dtn − V dtm ≤ max v 1 − e−ρ 2n → 0 when n → 0.
2
2
• Step 2 : ∀dt > 0, V dt are identical, WLOG we can call it V (µ). ∀dt, dt′ > 0, ∀n,
′
dt
consider V dtn . Pick m large enough that there exists N s.t. 2n+1
≤ N 2dtm ≤ 2dtn ≤
′
2
(N +1) 2dtm . Consider optimal experimentation and action associated with 2dtn , we construct
′
experimentation strategy for problem with 2dtm . For each time period T in the original
problem, split the experiment in period T into N + 1 periods and take any action at the
end of N + 1th period. In the new experiment strategy, the effective period length will
′
′
increase from 2dtn to (N + 1)c 2dtm . First, comparing the cost constraint c 2dtn < (N + 1)c 2dtm .
Therefore the new experiment strategy satisfies cost constraint. Second, since induced
posterior distribution and action distribution are still the same, Markov property still
holds. Finally:
∑
dt′
V dtm′ (µ0 ) ≥
e−ρT (N +1) 2m Eµ0 [u(AT , X)]
2
)
∑
∑(
dt
dt′
dt
=
e−ρT 2n Eµ0 [u(AT , X)] −
e−ρT 2n − e−ρT (N +1) 2m Eµ0 [u(AT , X)]
∑
∑
∑
′
−ρT 2dt
−ρT 2dt
−ρT (N +1) 2dtm n
n
≥
e
Eµ0 [u(AT , X)] − max v e
−
e
dt′
e−ρ 2n − e−ρ(N +1) 2m
)(
)
=V dtn (µ0 ) − max v (
dt
dt′
2
1 − e−ρ 2n
1 − e−ρ(N +1) 2m
dt
dt′
≥V dtn (µ0 ) − max v
dt′
e−ρN 2m − e−ρ(N +1) 2m
(1 − e−ρ 2n )2
dt
2
dt′
=V dtn (µ0 ) − max v
2
e−ρN 2m
−ρ 2dt
n
(1 − e
dt′
)2
e−ρ 2n+1
(eρ 2m − 1)
dt
≥V dtn (µ0 ) − max v
2
dt′
(1 − e−ρ 2n )2
dt
(eρ 2m − 1)
First inequality is from suboptimality
of the constructed experiment. Second inequality
′
′
−ρT 2dt
−ρT (N +1) 2dtm
n
is from e
≥e
. Third inequality is from 2dtn ≥ N 2dtm . Last inequality is
′
dt
from N 2dtm ≥ 2n+1
. Take m → ∞ on both side, we have V dt′ (µ0 ) ≥ V dtn (µ0 ). Then take
2
n → 0 on both side V dt′ (µ0 ) ≥ V dt (µ0 ). Since this holds for arbitrary dt, dt′ and µ0 , we
conclude that V dt = V dt′ .
• Step 3 : ∥V
dt − V ∥ →
0 when dt → 0. Fix any dt > 0, then ∀ε > 0, there exists N s.t.
∀n ≥ N , V dtn − V < 2ε . Then given the proof in last part, for any dt′ < 2dtn , suppose
2
dt
≤ N dt′ ≤
there exists N s.t. 2n−1
Vdt′ will be bounded by:
dt
2n
′
≤ (N + 1) 2dtm , then the difference between V dtn and
2
e−ρ 2n+1
dt
max v
[
]
(1 − e
−ρ 2dt
n
′
)
dt
′
Actually
such
N = 2n dt′ exists for any dt ≤
Vdt′ − V dtn < 2ε , then ∥Vdt′ − V ∥ < ε.
(eρdt − 1)
dt
.
2n
Thus there exists δ s.t. ∀dt′ < δ,
2
B.1.2. Proof of Theorem 1
The proof of Theorem 1 shows that V(µ) is within the family of locally Lipschitz continuous functions on [0,1]: V ∈ L = {V : [0,1] → ℝ | lim sup_{µ′→µ} |(V(µ′) − V(µ))/(µ′ − µ)| ∈ ℝ}. Then consider the following Bellman equation defined on functions in the space L:
$$\rho V(\mu) = \max\Big\{\rho F(\mu),\ \sup_{\mu_i\in[0,1],\,p_i,\,\hat\sigma\in\mathbb{R}_+}\ \sum p_i\big(V(\mu_i)-V(\mu)\big) - DV\Big(\mu,\ \frac{\sum p_i\mu_i}{\sum p_i}\Big)\sum p_i(\mu_i-\mu) + \frac{D^2V(\mu)}{2}\hat\sigma^2\Big\} \tag{B.3}$$
$$\text{s.t.}\quad -\sum p_i\big(H(\mu_i)-H(\mu)-H'(\mu)(\mu_i-\mu)\big) - \frac{H''(\mu)}{2}\hat\sigma^2 \le c$$
Since V is not necessarily differentiable, we use operators D and D² in place of the first and second derivatives. D and D² are defined as follows:
Definition 1 (General derivative). ∀f ∈ L:
$$Df(x,x') = \begin{cases} \displaystyle\liminf_{x_n\to x^-} \frac{f(x_n)-f(x)}{x_n-x} & \text{when } x'>x\\[8pt] \displaystyle\limsup_{x_n\to x^+} \frac{f(x_n)-f(x)}{x_n-x} & \text{when } x'<x \end{cases}$$
$$D^2f(x) = \begin{cases} +\infty & \text{when } Df(x,x^+) > Df(x,x^-)\\ -\infty & \text{when } Df(x,x^+) < Df(x,x^-)\\[4pt] \displaystyle\limsup_{dx\to 0} \frac{2f(x+dx)-2f(x)-2Df(x)\,dx}{dx^2} & \text{otherwise} \end{cases}$$
Proof.
Local Lipschitz Continuity: First, since V is the uniform limit of continuous Vdt ,
V is continuous. Suppose V is not locally Lipschitz continuous. Then there exists µ s.t.
(µ)
| ≥ n. We discuss only the case µn > µ, µn < µ can be proved
∃ µn → µ, | V (µµnn)−V
−µ
using same method, then there are two possibilities.
•
V (µn )−V (µ)
µn −µ
c
≥ n. Then pick µ0 = 0, we have:
V (0) − V (µ) −
H(µ) − H(0) +
V (µn )−V (µ)
(0 − µ)
µn −µ
H(µn )−H(µ)
(0 − µ)
µn −µ
≥c
V (0) − V (µ) + nµ
H(µ) − H(0) +
H(µn )−H(µ)
(0
µn −µ
− µ)
(µ)
Noticing that the only difference between LHS and RHS is that V (µµnn)−V
is replaced
−µ
with n on RHS. Take n → ∞ on RHS, we observe that RHS goes to infinity. Therefore,
there exists N s.t. ∀n ≥ N , RHS is larger than 2ρ sup F .
c
V (0) − V (µ) −
H(µ) − H(0) +
V (µn )−V (µ)
(0 − µ)
µn −µ
H(µn )−H(µ)
(0 − µ)
µn −µ
≥ 2ρ sup F
(µn − µ)V (0) + (µ − 0)V (µn ) + V (µ)(0 − µn )
2ρ
≥
sup F
(µ − µn )H(0) + (0 − µ)H(µn ) + H(µ)(µn − 0)
c
µ−µn
0−µ
V (0) + 0−µ
V (µn ) − V (µ)
2ρ
0−µn
n
≥
sup F
=⇒
µ−µn
0−µ
c
H(µ) − 0−µn H(0) − 0−µn H(µn )
=⇒
=⇒
µ−µn
V
0−µn
(0) +
0−µ
V
0−µn
(µn ) − V (µ)
≥
2ρ
sup F
c
I(µn , 0|µ)
µ − µn
0−µ
ρ
=⇒
V (0) +
V (µn ) ≥ V (µ) + 2 sup F I(µn , 0|µ)
0 − µn
0 − µn
c
ρ
ρ
≥ V (µ)(1 + I(µn , 0|µ)) + sup F I(µn , 0|µ)
c
c
Then N can be chosen sufficiently large such that:
e
ρ
I(µn ,0|µ)
c
( ρ )k
∑
ρ
1
1
− 1 − I(µn , 0|µ) =
I(µn , 0|µ) ≤
c
(k + 1)! c
2
k=1
=⇒
ρ
0−µ
µ − µn
ρ
V (0) +
V (µn ) ≥ V (µ)e c I(µn ,0|µ) + sup F I(µn , 0|µ)
0 − µn
0 − µn
2c
Then we pick dt = I(µnc,0|µ) and dtm = 2dtm . m is chosen sufficiently large that |V −
ρ
ρ
Vdtm |e c I(µn ,0|µ) < 8c
sup F I(µn , 0|µ), then:
ρ
µ − µn
0−µ
ρ
Vdtm (0) +
Vdtm (µn ) ≥Vdtm (µ)e c I(µn ,0|µ) + sup F I(µn , 0|µ)
0 − µn
0 − µn
4c
We consider an experimentation strategy that divides the I(µn , 0|µ) uniformly into 2m
periods, and wait for 2m periods before taking action:
(
)
ρ
µ − µn
0−µ
ρ
− ρc I(µn ,0|µ)
e
Vdtm (0) +
Vdtm (µn ) ≥ Vdtm (µ) +
sup F I(µn , 0|µ)e− c I(µn ,0|µ)
0 − µn
0 − µn
4c
LHS will be the expected utility from taking the aforementioned experiment at µ.
ρ I(µn ,0|µ)
Taking m sufficiently large, RHS will be strictly larger than e c 2m Vdtm (µ). Thus
this experiment dominates optimal experiment of dtm problem at µ. Contradiction.
•
V (µn )−V (µ)
µn −µ
c
≤ −n. Then pick µ0 = 1, we have:
V (1) − V (µn ) −
H(µn ) − H(1) +
V (µn )−V (µ)
(1 − µn )
µn −µ
H(µn )−H(µ)
(1 − µn )
µn −µ
≥c
V (1) − V (µn ) + n(1 − µn )
H(µn ) − H(1) +
H(µn )−H(µ)
(1
µn −µ
− µn )
Take n → ∞ on RHS, RHS goes to infinity. Therefore there exists N s.t. ∀n ≥ N ,
RHS is larger than 2ρ sup F .
=⇒
1 − µn
ρ
µn − µ
V (1) +
V (µ) ≥V (µn ) + 2 sup F I(µ, 1|µn )
1−µ
1−µ
c
ρ
ρ
≥V (µn )(1 + I(µ, 1|µn )) + sup F I(µ, 1|µn )
c
c
Similar to last part, n can be chosen sufficiently large that:
ρ
µn − µ
1 − µn
ρ
V (1) +
V (µ) ≥ V (µn )e c I(µ,1|µn ) + sup F I(µ, 1|µn )
1−µ
1−µ
2c
Then pick dt =
I(µ,1|µn )
c
and dtm =
dt
,
2m
m can be chosen sufficiently large that:
ρ
µn − µ
1 − µn
ρ
Vdtm (1) +
Vdtm (µ) ≥ Vdtm (µn )e c I(µ,1|µn ) + sup F I(µ, 1|µn )
1−µ
1−µ
4c
We consider the similar experimentation strategy as before that divides the experiment:
)
(
ρ
1 − µn
µn − µ
ρ
− ρc I(µ,1|µn )
e
Vdtm (1) +
Vdtm (µ) ≥ Vdtm (µn ) + e− c I(µ,1|µn ) sup F I(µ, 1|µn )
1−µ
1−µ
4c
ρ I(µ,1|µn )
m can be taken sufficiently large that RHS is strictly larger than e c 2m Vdtm (µn ).
This experiment dominates optimal experiment of dtm problem at µn . Contradiction.
Unimprovability: Let’s show that V is unimprovable. Suppose not, then there
exists pi , µi , σ̂i s.t.:
( ∑
)
∑
pi µi ∑
D2 V (µ) 2
σ̂
ρV (µ) <
pi (V (µi ) − V (µ)) − DV µ, ∑
pi (µi − µ) +
2
pi
∑
H ′′ (µ) 2
s.t. −
pi (H(µi ) − H(µ) − H ′ (µ)(µi − µ)) −
σ̂ ≤ c
2
∑
We spilt µi ’s into (µi , µj ) s.t.:
pi µi = µ and all remaining µj are on the same side of
µ. Then
∑
ρV (µ) <
pi (V (µi ) − V (µ))
∑
+
pj (V (µj ) − V (µ) − DV (µ, µj )(µj − µ))
D2 V (µ) 2
σ̂
∑2
c≥−
pi (H(µi ) − H(µ))
∑
−
pj (H(µj ) − H(µ) − H ′ (µ)(µj − µ))
+
−
H ′′ (µ) 2
σ̂
2
Then if we compare the following three groups of ratios:
∑
(V (µj ) − V (µ) − DV (µ, µj )(µj − µ)) D2 V (µ)
p (V (µi ) − V (µ))
∑i
,
,
− pi (H(µi ) − H(µ))
−(H(µj ) − H(µ) − H ′ (µ)(µj − µ))
−H ′′ (µ)
At least one of them must be larger than
• Case 1 :
ρV (µ)
.
c
∑
ρ
p (V (µi ) − V (µ))
∑ i
> V (µ)
pi (H(µ) − H(µi ))
c
Then there exists ε > 0 s.t.:
∑
p (V (µi ) − V (µ))
ρ
∑ i
≥ V (µ) + ε
pi (H(µ) − H(µi ))
c
∑ pi
∑ (V (µi ) − V (µ))
=⇒
pi
∑ pi
∑ pi
ρ
∑ (H(µ) − H(µi )) + ε
∑ (H(µ) − H(µi ))
≥ V (µ) + V (µ)
c
pi
pi
(
)
∑
ρ
pi
=⇒
pei (V (µi ), µ) ≥ V (µ) 1 + I(µi |µ) + εI(µi |µ) (if we let pei = ∑ )
c
pi
pi , µi ) violates cost constraint. We define the
Now for any δt < I(µci |µ) , experiment (e
following experiment strategy: experiment (e
pi , µi ) is taken with probability I(µcdt
every
i |µ)
period. If it’s not taken, then same strategy is applied next period. Then the utility
associated with this strategy is:
((
)
)
∑
cdt
cdt
−ρdt
1−
Vedt = e
pei Vdt (µi )
Vedt +
I(µi |µ)
I(µi |µ)
∑
e−ρdt I(µcdt
pei Vdt (µi )
|µ)
i
(
)
=⇒ Vedt =
1 − e−ρdt 1 − I(µcdt
i |µ)
=⇒ lim Vedt =
∑
dt→0
pei V (µi ) lim
dt→0
=⇒ lim (Vedt − Vdt (µ)) >
dt→0
c
I(µi |µ)
(
e−ρdt ρ +
c
I(µi |µ)
∑
)=
pei V (µi )
1 + ρc I(µi |µ)
εI(µi |µ)
1 + ρc I(µi |µ)
Therefore, there exists δt sufficiently small that:
Vedt > Vdt (µ) +
1
1
εI(µi |µ)
2
+ ρc I(µi |µ)
Contradicting the optimality of Vdt (µ).
• Case 2 :
V (µj ) − V (µ) − DV (µ, µj )(µj − µ)
ρ
> V (µ)
′
H(µ) − H(µj ) + H (µ)(µj − µ)
c
It’s easy to see that if we call this posterior µ′ , corresponding probability p(µ′ ) =
c
, then ∃ ε ≥ 0 s.t.
H(µ)−H(µ′ )+H ′ (µ)(µ′ −µ)
p(µ′ )(V (µ′ ) − V (µ) − DV (µ, µ′ )) ≥ ρV (µ) + ε
We discuss the case µ′ > µ first. By Definition 1, there exists µ1 → µ− s.t.
V (µ1 )−V (µ)
(µ′ − µ)
ε
µ1 −µ
c
≥
ρV
(µ)
+
2
H(µ) − H(µ′ ) + H(µµ11)−H(µ)
(µ′ − µ)
−µ
ε
V (µ′ )(µ − µ1 ) − V (µ)(µ′ − µ1 ) + V (µ1 )(µ′ − µ)
≥ ρV (µ) +
c
′
′
′
H(µ)(µ − µ1 ) − H(µ )(µ − µ1 ) − H(µ1 )(µ − µ)
2
′
µ − µ1
µ −µ
ρ
ε
V (µ′ ) + ′
V (µ1 ) ≥ V (µ)(1 + I(µ1 , µ′ |µ)) + I(µ1 , µ′ |µ)
′
µ − µ1
µ − µ1
c
2c
V (µ′ ) − V (µ) −
=⇒
=⇒
Given ε, µ1 can be chosen arbitrarily close to µ− such that
e
( ρ )k
∑
ρ
1
ε
′
− 1 − I(µ , µ1 |µ) =
I(µ′ , µ1 |µ)k ≤
c
(k + 1)! c
4 max v
k=1
ρ
I(µ′ ,µ1 |µ)
c
Then:
ρ
µ − µ1
µ′ − µ
ε
′
′
V
(µ
)
+
V (µ1 ) ≥ e c I(µ ,µ1 |µ) V (µ) + I(µ1 , µ′ |µ)
′
′
µ − µ1
µ − µ1
4c
Then pick ∆t =
∀ n ≥ N:
e
I(µ′ ,µ1 |µ)
,
c
−ρndtn
(
dtn =
dt
.
n
By uniform convergence, there exists N s.t.
µ − µ1
µ′ − µ
′
V
(µ
)
+
Vdt (µ1 )
dt
µ′ − µ1 n
µ′ − µ1 n
)
> Vdtn (µ)
cndtn = I(µ′ , µ1 |µ)
That is to say we find a feasible experiment, whose cost can be spread into n periods and
satisfies cost constraint. This experiment strictly dominates the optimal experiment at
µ for dtn discrete problem. Contradiction. Thus V must be unimprovable at µ. Same
proof applies to case where µ′ < µ.
2
D V (µ)
2
′
• Case 3 : c −H
′′ (µ) ≥ ρV (µ) + 2ε. Then by definition of operator D , there exists µ
sufficiently close to µ s.t.:
c
=⇒ c
V (µ′ )−V (µ)−DV (µ)(µ′ −µ)
(µ′ −µ)2
H(µ)−H(µ′ )+H ′ (µ′ −µ)
(µ′ −µ)2
′
≥ ρV (µ) + ε
V (µ ) − V (µ) − DV (µ)(µ′ − µ)
≥ ρV (µ) + ε
H(µ) − H(µ′ ) + H ′ (µ′ − µ)
Then by definition of operator D, there exists µ1 → µ s.t.:
c
=⇒
V (µ′ ) − V (µ) −
H(µ) − H(µ′ ) −
µ′ −µ
(V (µ) − V (µ1 ))
µ−µ1
µ′ −µ
(H(µ) − H(µ1 ))
µ−µ1
′
≥ ρV (µ) +
ε
2
µ − µ1
µ −µ
ρ
ε
V (µ′ ) + ′
V (µ1 ) ≥ V (µ)(1 + I(µ1 , µ′ |µ)) + I(µ1 , µ′ |µ)
′
µ − µ1
µ − µ1
c
2c
By similar argument as before, we can pick µ1 small enough such that 1 + ρc I(µ′ , µ1 |µ)
ρ
′
approaches e c I(µ ,µ1 |µ) . Noticing that the expression we are studying is exactly the same
as before. The using same argument, we rule out this case being possible.
Equality: Then we show that ∀V solving (B.3), V = V . Noticing that this automatically proves uniqueness of solution of (B.3).
• V (µ) ≥ V (µ): Suppose not, then consider U (µ) = V (µ) − V (µ). Since both V and V
are in L, U ∈ L. Therefore arg min U is non empty and min U < 0 according to our
assumption. Choose µ ∈ arg min U . Let (pi , µi , σ̂) be the strategy approaches V (µ):
( ∑
)
∑
p i µi ∑
D2 V (µ) 2
ρV (µ) =
pi (V (µi ) − V (µ)) − DV µ, ∑
pi (µi − µ) +
σ̂ + ε
pi
2
Now we compare DV and DV : ∀µn → µ− :
V (µn ) − V (µ)
V (µ) − V (µn )
=
µn − µ
µ − µn
V (µ) − V (µn ) + U (µ) − U (µn )
=
µ − µn
V (µn ) − V (µ)
≤
µn − µ
V (µn ) − V (µ)
V (µn ) − V (µ)
=⇒ lim inf
≤ lim inf
µn − µ
µn − µ
=⇒ DV− (µ) ≤ DV − (µ)
47
Similarly for µn → µ+ :
V (µn ) − V (µ) V (µn ) − V (µ) − U (µ) + U (µn )
=
µn − µ
µn − µ
V (µn ) − V (µ)
≥
µn − µ
=⇒ DV+ (µ) ≥DV + (µ)
Suppose DV+ (µ) > DV− (µ), then D2 V (µ) = +∞. This contradicts unimprovability of
V at µ. Therefore the only possibility is DV+ (µ) = DV( µ) = DV + (µ) = DV − (µ) = D.
∀µn → µ:
2V (µn ) − 2V (µ) − 2D 2V (µn ) − 2V (µ) − 2D − 2U (µ) + 2U (µn )
=
(µn − µ)2
(µn − µ)2
2V (µn ) − 2V (µ) − 2D
≥
(µn − µ)2
=⇒ D2 V (µ) ≥D2 V (µ)
Therefore:
ρV (µ) =
≤
∑
(
pi (V (µi ) − V (µ)) − DV
∑
∑
)
D2 V (µ) 2
pi µi ∑
σ̂ + ε
µ, ∑
pi (µi − µ) +
pi
2
pi (V (µi ) − V (µ) − (U (µ) − U (µi )))
( ∑
)
pi µi ∑
D2 V (µ) 2
− DV µ, ∑
σ̂ + ε
pi (µi − µ) +
2
pi
( ∑
)
∑
pi µi ∑
D2 V (µ) 2
∑
≤
pi (V (µi ) − V (µ)) − DV µ,
σ̂ + ε
pi (µi − µ) +
2
pi
<ρV (µ) + ε
(µ)
If we choose ε = ρV (µ)−ρV
in the begining,
we get a contradiction. The first inequality
2∑
∑
∑
pi µi
∑
. Therefore it takes DV− when the coefficient
comes from
pi µi ≥ pi µ iff µ ≤
pi
is negative and vice versa. The second inequality comes from µ ∈ arg min U .
• V (µ) ≥ V (µ): We prove by showing that ∀dt > 0, V ≥ Vdt . Suppose not, then there
exists µ′ , dt s.t. Vdt (µ′ ) > V (µ′ ). Let dtn = 2dtn . Since Vdtn is increasing, there exists
ε > 0 s.t. Vdtn (µ′ ) − V (µ′ ) ≥ ε ∀n ∈ N. Now consider Un = V − Vdtn . Un will be
continuous by Lemma 3 and Un (µ′ ) < −ε. Therefore, there exists µn ∈ arg min Un .
Since ∆(X) is compact, there exists a converging sequence lim µn = µ. By assumption,
Un (µn ) ≤ −ε, therefore µ must be in interior of ∆(X). Now consider the optimal
strategy of discrete time problem:

∑
n
−ρdtn

pi Vdtn (µni )
V
dt

n (µ ) = e

∑
pi (H(µn ) − H(µni )) = cdtn


∑


pi µni = µn
48
By definition of Un (µ):
∑ (
) ∑
pi (Vdtn (µni ) − Vdtn (µn ) − Un (µn ) + U (µni ))
pi V (µni ) − V (µn ) =
∑
≥
pi (Vdtn (µni ) − Vdtn (µn ))
(
)
= eρdtn − 1 Vdtn (µn )
≥ρdtn ε + ρdtn V (µn )
∑ pi (
)
=⇒ ρV (µn ) ≤ − ρε +
V (µni ) − V (µn )
dtn
The first equality is definition of Un . The first inequality is from µn ∈ arg min Un .
The second inequality is from ex − 1 = x and Un (µn ) ≤ −ε. Now since number of
posteriors µni is no more than 2 |X|, we can take a subsequence of n such that all
lim µni = µi . We partition µni into two kinds, lim µni = µi ̸= µ, lim µnj = µ. First, since
V is unimprovable, we have −D2 V (µ) ≤ ρc V (µ)H ′′ (µ) point wise. Since V ∈ C (1) ,
H ∈ C (2) , ∀η, there exists δ s.t. ∀ |µ′ − µ| ≤ δ:

D2 V (µ′ ) ≤ − ρ V (µ)H ′′ (µ) + η
c
|H ′′ (µ) − H ′′ (µn )| ≤ η
Then there exists N s.t. ∀n ≥ N , µnj − µ < δ, |µn − µ| < δ. We have an intermediate
value result:
V (µnj ) − V (µn ) − V ′ (µn )(µnj − µn )
(
)
= V ′ (ξjn ) − V ′ (µn ) (µnj − µn )
(
)2
≤ sup D2 V (ξ) µnj − µn
ξ∈[ξjn ,µn ]
(
)2
≤ sup D2 V (ξ) µnj − µn
|ξ−µ|≤δ
)(
( ρ
)2
≤ − V (µ)H ′′ (µ) + η µnj − µn
c
Therefore:
∑
(
) ∑ (
)
pi,j V (µni,j ) − V (µn ) =
pi V (µni ) − V (µn ) − V ′ (µn ) (µni − µn )
∑ (
(
))
+
pj V (µnj ) − V (µn ) − V ′ (µn ) µnj − µn
∑ (
)
≤
pi V (µni ) − V (µn ) − V ′ (µn )(µni − µn )
)
∑ (
)2 ( ρ
− V (µ)H ′′ (µ) + η
+
pj µnj − µn
c
∑
Now let pni =
pi
,
dtn
∑
σ̂n2 = 2
n
n
′ n
n
n
pn
j (H(µ )−H(µj )+H (µ )(µj −µ ))
,
−H ′′ (µn )dtn
we will have:
1
pni (H(µn ) − H(µni ) + H ′ (µn )(µni − µn )) − σ̂n2 H ′′ (µn ) = c
2
49
(pni , µni , σ̂ n ) is a feasible experiment at µ for problem Equation (B.3). Therefore, by
optimality of V at µn , we have:

1 n ′′
n
∑ (
)

n
n
n
n
′
n
n
n c + 2 σ̂ H (µ )

pi V (µi ) − V (µ ) − V (µ )(µi − µ ) ≤ ρV (µ )

c
n
ρ
(µ
)
V


D2 V (µn ) ≤ −
c H ′′ (µn )
∑
Then we study term
pj (µnj − µn )2 . Consider:
∑ (
)
pj H(µn ) − H(µnj ) + H ′ (µn )(µnj − µn )
∑ (
)2
)(
=
pj −H ′′ (ξjn ) µnj − µn
∑
∑
)2
(
pj (µnj − µn )2
≥
pj (−H ′′ (µ)) µnj − µn − η
Therefore, to sum up:
∑ pi,j (
) ∑ n(
)
V (µni,j ) − V (µn ) ≤
pi V (µni ) − V (µn ) − V ′ (µn )(µni − µn )
dtn
)
∑ pj (
)2 ( ρ
+
µnj − µn
− V (µ)H ′′ (µ) + η
dtn
c
1 2 ′′
n
c + 2 σ̂n H (µ )
≤ρV (µn )
c
(∑
+
pj (H(µn ) − H(µnj ) + H ′ (µn )(µnj − µn ))
)
) ρ
1 ∑ ( n
n 2
η
pj µj − µ
V (µ)
+
dtn
c
1 ∑
+
pj (µnj − µn )2 η
dtn
c + 12 σ̂n2 H ′′ (µn )
=ρV (µn )
c
(
)
) ρ
1 n
1 ∑ ( n
′′
n
n 2
V (µ)
+
σ̂ (−H (µ )) +
η
pj µj − µ
2
dtn
c
1 ∑
+
pj (µnj − µn )2 η
dtn
What’s more:
(
)
pj H(µn ) − H(µnj ) + H ′ (µn )(µnj − µn )
−µ ) ≤
−H ′′ (µ) − η
cdtn
≤
−H ′′ (µ) − η
n
2 ′′
c + σ̂n H (µ )
≤1
c
=⇒ ρV (µ) ≤ − ρε + ρV (µ)
∑
∑
pj (µnj
n 2
+η
ρV (µ)
c
+η
′′
′′
−H (µ) − η
−H (µ) − η
→ ρV (µ) − ρε when η → 0
50
Contradiction. Therefore
V (µ) ≥ Vdt (µ)
=⇒ V (µ) ≥ lim sup Vdt (µ) = V (µ)
dt→0
B.2. Characterization
B.2.1. Proof of Theorem 2
V
0.30
V
0.30
V
0.30
0.25
0.25
0.25
0.20
0.20
0.20
0.15
0.15
0.15
0.10
0.10
0.10
0.05
0.05
0.05
0.5
0.6
0.7
0.8
0.9
1.0
⋁
0.5
0.6
0.7
0.8
0.9
1.0
⋁
0.5
V
0.30
V
0.30
V
0.30
0.25
0.25
0.25
0.20
0.20
0.20
0.15
0.15
0.15
0.10
0.10
0.10
0.05
0.05
0.05
0.5
0.6
0.7
0.8
0.9
1.0
⋁
0.5
0.6
0.7
0.8
0.9
1.0
⋁
0.5
0.6
0.7
0.8
0.9
1.0
0.6
0.7
0.8
0.9
1.0
⋁
⋁
The two black (dashed and solid) lines are Fm−1 (µ), Fm (µ).
The blue line is optimal value function from taking immediate action m.
The red line is optimal value function from taking immediate action m − 1.
Figure B.8: Construction of optimal value function.
Proof. We prove Theorem 2 by constructing the point µ∗ and function ν(µ). By definition
of ν(µ), it’s a feasible mechanism in Equation (B.3), then the corresponding V (µ) will be
feasible. To prove Theorem 2, it will be sufficient to prove unimprovability of V (µ). We
prove unimprovabiltiy of V (µ) after the construction. To simplify notation, we define a
flow version of information measure:
J(µ, µ′ ) = H(µ) − H(µ′ ) − H ′ (µ)(µ′ − µ)
Then total flow information cost will be p(J(µ, µ′ )).
Algorithm:
In this part, we introduce the algorithm to construct V (µ) and ν(µ). We only discuss
the case µ ≥ µ∗ and the case µ ≤ µ∗ will follow by a symmetric method.
51
• Step 1 : By Lemma 13, there exists µ∗ ∈ [0, 1] and V (µ) defined as:
V (µ) = max
′
µ ,m
Fm (µ′ )
1 + ρc J(µ, µ′ )
• Step 2: We construct the first piece of V (µ) to the right of µ∗ . By Lemma 13, there
are three possible cases of µ∗ to discuss (we omitted µ∗ = 1 by symmetry).
Case 1: Suppose µ∗ ∈ (0, 1) and V (µ∗ ) > F (µ∗ ). Then, there exists m, ν(µ∗ ) > µ∗ s.t.
V (µ∗ ) =
Fm (ν(µ∗ ))
1 + ρc J(µ∗ , ν(µ∗ ))
(
)
With initial condition µ0 = µ∗ , V0 = V (µ∗ ), V0′ = 0 , we solve for Vm (µ) as defined in
Lemma 15. This refers to Figure B.8-1. Let µ̂m be the first µ ≥ µ∗ that:
Vm (µ) = max
′
µ ≥µ
c Fm−1 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)
ρ
J(µ, µ′ )
Noticing that Vm (µ̂m ) ≥ Fm−1 (µ) otherwise, there will be a µ even smaller.This
refers to Figure B.8-2. Then we solve for Vm−1 with initial condition µ0 = µ̂m , V0 =
Vm (µ̂m ), V0′ = Vm′ (µ̂m ). If m − 1 > m, we continue this procedure by looking for µ̂m−1
being first µ s.t.:
Vm−1 (µ) = max
′
µ ≥µ
c Fm−2 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)
ρ
J(µ, µ′ )
until we get Vm (µ). This refers to Figure B.8-3. Vm will be piecewise defined as Vm .
By definition, Vm (µ) will be smoothly increasing until it hits F . Since Vm (µ∗ ) > F (µ∗ ),
this intersection point will be µ∗∗ > µ∗ . This refers to the intersection point of red
curve and F in Figure B.8-3.
Case 2: Suppose µ∗ ∈ (0, 1) but V (µ∗ ) = F (µ∗ ), consider:
Let µ∗∗
c Fk (µ′ ) − F (µ)
Ve (µ) = max
µ′ ≥µ,k ρ
J(µ, µ′ )
{
}
e
= inf µ|V (µ) > F (µ) .
Case 3: Suppose µ∗ = 0, consider
c Fk (µ′ ) − F1 (µ) − F1′ (µ′ − µ)
Ve (µ) = max
µ′ ≥µ,k ρ
J(µ, µ′ )
{
}
e
≤ inf F . Therefore, inf µ|V (µ) > F1 (µ) >
There exists δ s.t. ∀µ < δ, ∀µ ≤ µ2 ,
∗∗
0. We call it µ . This step refers to Figure B.8-4.
′
7
7
sup F
J(µ,µ′ )
µk = inf {µ|Fk (µ) > Fk−1 (µ)}
52
• Step 3: For all µ ≥ µ∗∗ such that:
c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ)
µ ≥µ,k ρ
J(µ′ , µ)
F (µ) = max
′
(B.4)
Let m be the optimal action. Solve for Vm with initial condition µ0 = µ, V0 =
F (µ), V0′ = F ′− (µ). Let µ̂m be the first µ ≥ µ0 that:
Vm (µ) = max
′
µ ≥µ
c Fm−1 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)
ρ
J(µ, µ′ )
Then we solve for Vm−1 with initial condition µ0 = µ̂m , V0 = Vm (µ̂m ), V0′ = Vm′ (µ̂m ).
This step refers to Figure B.8-5. We continue this procedure until we get Vm0 , where
m0 is the index such that Fm0 −1 (µ) = F (µ). Now suppose Vm0 first hit F (µ) at some
point µ′′ (can potentially be µ), define:

′

if µ′ < µ

F (µ )
Vµ (µ′ ) = Vm0 (µ′ ) if µ′ ∈ [µ, µ′′ ]


F (µ′ )
if µ′ > µ′′
Vµ ≥ F point wise. Vµ can be identical to F . However whennever Vµ is not identical
to F , Vµ (µ′ ) > F (µ′ ) on an open interval (µ, µ′′ ). Call the set of all these µ0 : Ω. Since
F ′− (µ) is a left continuous function that only jumps down. It’s not hard to verify that
Ω is a closed set.
• Step 4: Define:
V (µ) =

Vm (µ)
if µ ∈ [µ∗ , µ∗∗ ]
 sup {Vµ0 (µ)}
if µ ≥ µ∗∗
µ0 ∈Ω
Then ∀V (µ) > F (µ), there must exists µn s.t. V (µ) = limn Vµn (µ) by definition of sup.
Since Ω is a closed set (and bounded in [0, 1]), there exists µnk → µ0 . By continuous
dependence, Vµ0 (µ) = V (µ). Then in the open interval around µ that V > F , V (µ) =
Vµ0 (µ). Otherwise Vµ0 intersects some other Vµ′ in the region Vµ0 > 0, which violates
uniqueness of ODE. Therefore V is a smooth function on {µ|V (µ) > F (µ)}. This step
refers to Figure B.8-6.
In the algorithm, we only discussed the case µ∗ < 1 and constructed the value function
on the right of µ∗ . On the left of µ∗ , V can be defined by using a totally symmetric
argument by referring to Lemma 15′ .
Before we proceed to proof of smoothness and unimprovability of V , we state a useful
result:
Lemma 11. ∀µ ≥ µ∗ s.t. V (µ) = Fm (µ), we have:
Fm (µ) ≥ max
′
µ ≥µ
c Fm+k (µ′ ) − Fm (µ) − Fm− ′ (µ′ − µ)
k
≜ Um
(µ)
ρ
J(µ, µ′ )
53
k
We prove this result by contradiction. Suppose not true, then exists µ s.t. Um
(µ) >
−′
k
Fm (µ). F is a lower-semi continous and left continuous function. Then Um will be
upper-semi continuous and left continuous w.r.t. µ when m is taken that F (µ) = Fm (µ)
k
(a continuous function with only downward jumps). By the definition of µ∗∗ and Um
(µ) >
k
Fm (µ), there exists µ0 > 0 s.t. Um0 (µ0 ) = Fm0 (µ0 ), where m0 is the corresponding index
s.t. F (µ0 ) = Fm0 (µ0 ). Take µ0 < µ to be the supremum of µ0 such that this is true
(
)
(we know the existence by continuity). Now consider initial condition µ0 , Fm0 (µ0 ), Fm′ 0 ,
by Lemma 15, we solve for Vk (µ) on [µ0 , µ]. Now consider any µ′ ∈ (µ0 , µ), suppose
Vk (µ′ ) ≤ Fm′ (µ′ ), then by immediate value theorem, µ′ can be picked that Vk′ (µ′ ) ≤ Fm′ ′ .
Therefore:
c Fk (µ′′ ) − Vk (µ′ ) − Vk′ (µ′ )(µ′′ − µ′ )
J(µ′ , µ′′ )
µ′′ ρ
c Fk (µ′′ ) − Fm′ (µ′ ) − Fm′ ′ (µ′′ − µ′ )
≥ sup
J(µ′ , µ′′ )
µ′′ ρ
Vk (µ′ ) = sup
>Fm′ (µ′ )
Last inequality is from the fact that µ′ ∈ (µ0 , µ]. Therefore, by definition of V (µ),
V (µ) ≥ Vk (µ) on [µ0 , µ]. This contradicts the fact that V (µ) = Fm (µ).
Lemma 11′ . ∀µ ≤ µ∗ s.t. V (µ) = Fm (µ), we have:
Fm (µ) ≥ max
′
µ ≤µ
c Fm−k (µ′ ) − Fm (µ) − Fm+ ′ (µ′ − µ)
k
≜ Um
(µ)
ρ
J(µ, µ′ )
Smoothness:
Given our construction of V (µ), ∀µ s.t. V (µ) > F (µ), V is piecewise solution of the
ODEs and is C (1) smooth by construction. However on {µ|V (µ) = F (µ)}, our definition
of V is by taking supremum over an uncountable set of Vµ ’s. Therefore V (µ) is not
necessarily differentiable. We now discuss smoothness of V on this set in details (we
only discuss µ ≥ µ∗ and leave the remaining case to symmetry argument). Suppose
µ ∈ {µ|V (µ) = F (µ)}o , then V = F locally on an open interval. To show smoothness of
V , it’s sufficient to show smoothness of F . Suppose not, then µ = µm . However, at µm :
′
′
′
c Fm+1 (µ ) − Fm (µm ) − Fm (µ − µm )
=∞
lim
µ′ →µ+
ρ
J(µm , µ′ )
m
Therefore, we apply the result just derived and get contradiction. Now we only need
to discuss the boundary of {µ|V (µ) = F (µ)}. The first case is that {µ|V (µ) > F (µ)} is
not dense locally. Therefore, V = F locally at only side of µ, which implies one sided
smoothness. The only remaining case is that there exists µn → µ s.t. F (µn ) < V (µn ). We
first show differentiability of V at µ. We already know that V (µ′ ) − V (µ) ≥ F ′ (µ)(µ′ − µ)
(µ)
since V ≥ F . Suppose now µn → µ+ and V (µµnn)−V
≥ F ′ (µ) + ε. Then apply Lemma 12
−µ
to V (µ) − F (µ), we can pick µn → µ+ and V ′ (µn ) ≥ F ′ (µ) + ε.
Consider ν(µn ) being the solution of posterior associated with µn , by definition of
µn , ν(µn ) ≥ µm+1 (when µn < µm+1 , the objective function will be negative, therefore
54
suboptimal for sure). So we can pick a converging subsequence of µ2n to some ν ≥ µm+1 .
then:
F (µ) = lim V (µn )
c Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn )
= lim
n→∞ ρ
J(µn , ν(µn ))
c Fmn (ν(µn )) − F (µn ) − (F ′ (µ) + ε)(ν(µn ) − µn )
≤ lim
n→∞ ρ
J(µn , ν(µn ))
c Fmn (ν(µn )) − F (µn ) − F ′ (µ)(ν(µn ) − µn )
cε ν(µn ) − µn
≤ lim
− lim
n→∞ ρ
n→∞
J(µn , ν(µn ))
ρ J(µn , ν(µn ))
′
c Fm′ (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ
=
−
ρ
J(µ, ν)
ρ J(µ, ν)
<F (µ)
(µn )
Contradiction. Now suppose µn → µ− and V (µ)−V
≤ F ′ (µ) − ε. Then similarly we can
µ−µn
choose µn s.t. V ′ (µn ) ≤ F ′ (µ) − ε. Choose ν, m being the optimal posterior and action
at µ. Then:
F (µ) = lim V (µn )
c Fm (ν) − V (µn ) − V ′ (µn )(ν − µn )
≥ lim
n→∞ ρ
J(µn , ν)
c Fm (ν) − V (µn ) − F ′ (µ)(ν − µn ) c ε(ν − µn )
+
≥ lim
n→∞ ρ
J(µn , ν)
ρ J(µn , ν)
′
c Fm (ν) − V (µn ) − F (µ)(ν − µn )
c ε(ν − µn )
≥ lim
+ lim
n→∞ ρ
J(µn , ν)
n→∞ ρ J(µn , ν)
′
c Fm (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ
+
=
ρ
J(µ, ν)
ρ J(µ, ν)
>F (µ)
Contradiction. Therefore we showed that V will be differentiable everywhere. Now
suppose V ′ is not continuous at µ. Utilizing previous proof, we have already ruled out
the cases when limµ′ →µ+ > F ′ (µ) and limµ′ →µ− < F ′ (µ). Suppose now exists µn → µ+
and V ′ (µn ) ≤ F ′ (µ) − ε. Then consider:
F (µ) = lim V (µn )
c Fm (ν) − V (µn ) − V ′ (µn )(ν − µn )
≥ lim
n→∞ ρ
J(µn , ν)
c Fm (ν) − V (µn ) − F ′ (µ)(ν − µn )
c ε(ν − µn )
≥ lim
+ lim
n→∞ ρ
J(µn , ν)
n→∞ ρ J(µn , ν)
′
c Fm (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ
+
=
ρ
J(µ, ν)
ρ J(µ, ν)
>F (µ)
Contradiction. When µn → µ− and V ′ (µn ) ≥ F ′ (µ) + ε, similarly as before, we can take
ν(µn ) converging to ν ≥ µm+1 . Then:
F (µ) = lim V (µn )
55
c Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn )
n→∞ ρ
J(µn , ν(µn ))
c Fmn (ν(µn )) − F (µn ) − F ′ (µ)(ν(µn ) − µn )
cε ν(µn ) − µn
≤ lim
− lim
n→∞ ρ
n→∞ ρ J(µn , ν(µn ))
J(µn , ν(µn ))
′
c Fm′ (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ
=
−
ρ
J(µ, ν)
ρ J(µ, ν)
= lim
<F (µ)
Contradiction. To sum up, we proved that V (µ) is differentiable on (0, 1) and V ′ (µ)
is continuous on (0, 1). What’s more, since µ∗ ∗ is bounded away from {0, 1}, in the
neighbour of {0, 1}, V = F . Therefore V (µ) is C (1) smooth on [0, 1].
Unimprovability:
Now we prove the unimprovability of V (µ).
• Step 1 : We first show that V (µ) solves the following problem:
{
}
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
V (µ) = max F (µ), max
µ′ ,m ρ
J(µ, µ′ )
{
µ′ ≥ µ when µ ≥ µ∗
(P-C)
µ′ ≤ µ when µ ≤ µ∗
Equation (P-C) is the maximization problem over all confirmatory evidence seeking
with immediate decision making upon arrival of signals. It has less constraint than the
definition of V (µ) in the following sense: when we are defining V (µ), we also optimize
over confirmatory eivdence seeking and immediate decision making. But we optimizing
over all possible actions in a sequential way – with µ increasing, the DM is forced to
choose actions with decreasing index m. However in Equation (P-C), unimprovability
is over all possible choice of actions.
We still focus on the case µ ≥ µ∗ . For the case µ ≤ µ∗ , a totally symmetric argument
applies by referring to Lemma 19′ .
Case 1 : V (µ) > F (µ). Then there exists µ0 s.t. V (µ) = Vµ0 (µ), and m0 is the optimal
action corresponding to µ0 . Suppose the associated action is m at µ. Then in the
construction of Vµ0 , we explicitly constructed µ̂m′ ≤ µ, ∀m′ ∈ {m0 , . . . , m}. Then by
Lemma 19, Equation (P-C) is satisfied at µ.
Case 2 : V (µ) = F (µ). Then according to Step 4, there are two possibilities. If µ ∈ Ω,
then by construction of Vµ , we have:
c Fk (µ′ ) − F (µ) − F ′ (µ)(µ′ − µ)
µ ≥µ,k ρ
J(µ, µ′ )
F (µ) = max
′
This exactly is Equation (P-C). If µ ̸∈ Ω and F (µ) is larger than the maximum on
RHS of Equation (B.4), then this also satisfies Equation (P-C).
The only remaining contradictory case is that µ ̸∈ Ω and:
c Fk (µ′ ) − F (µ) − F ′ (µ)(µ′ − µ)
µ ≥µ,k ρ
J(µ, µ′ )
F (µ) < max
′
By Lemma 11 we proved in the construction part, we conclude that this is impossible.
56
• Step 2 : Then we show that V (µ) solves the following problem:
{
}
c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
V (µ) = max F (µ), max
µ′ ρ
J(µ, µ′ )
{
µ′ ≥ µ when µ ≥ µ∗
(P-D)
µ′ ≤ µ when µ ≤ µ∗
Equation (P-D) is the maximization problem over all confirmatory evidence seekings.
It has less constraint than Equation (P-C) in the following sense: when a signal arrives
and posterior belief µ′ is realized, the DM is allowed to continue experimentation
instead of being forced to take an action.
We only show that case µ ≥ µ∗ and a totally symmetric argument applies to µ ≤ µ∗ .
Suppose not, then there exists:
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
c V (µ′′ ) − V (µ) − V ′ (µ)(µ′′ − µ)
Ve = max
≤
V
(µ)
<
µ′ ≥µ,m ρ
J(µ, µ′ )
ρ
J(µ, µ′′ )
Suppose the maximizer is µ′ , m. Optimality implies Equation (B.9) and Equation (B.8):

ρ
ρ
Fm′ + Ve H ′ (µ′ ) = V ′ (µ) + Ve H ′ (µ)
c
(
) ( c
) (
)
 F (µ′ ) + ρ Ve H(µ′ ) − V (µ) + ρ Ve H(µ) = V ′ (µ) + ρ V (µ)H ′ (µ)(µ′ − µ)
m
c
c
c
We define L(V, λ, µ)(µ′ ) as a linear function of µ′
L(V, λ, µ)(µ′ ) = (V (µ) + λH(µ)) + (V ′ (µ) + λH ′ (µ))(µ′ − µ)
Define G(V, λ)(µ) as a function of µ:
G(V, λ)(µ) = V (µ) + λH(µ)
Then G(Fm , ρc Ve )(µ′ ) is a concave function of µ′ . Consider:
( ρ
)
(
ρ e) ′
′
e
L V, V , µ (µ ) − G Fm , V (µ )
c
c
FOC implies it will be convex and attains minimum 0 at µ′ . For any m′ other than m,
( ρ
)
(
ρ )
L V, Ve , µ (µ′ ) − G Fm′ , Ve (µ′ )
c
c
will be convex and weakly larger than zero. However,
( ρ
)
( ρ )
L V, Ve , µ (µ′′ ) − G V, Ve (µ′′ )
c
( c
)
ρe
′′
′
′′
′′
= − V (µ ) − V (µ) − V (µ)(µ − µ) − V J(µ, µ )
c
<0
(
)
(
)
ρe
ρe
′
Therefore L V, c V , µ (µ )−G V, c V (µ′ ) will have minimum strictly negative. Suppose it’s minimized at µ
e. Then FOC implies:
ρ
ρ
µ) + Ve H ′ (e
µ)
V ′ (µ) + Ve H ′ (µ) = V ′ (e
c
c
57
Consider:
(
(
ρ e)
ρe )
e (ν(e
µ)) − G Fm , V (e
ν)
L V, V , µ
c )
( ρc
)
(
ρ
=L V, Ve , µ (ν(e
µ)) − G Fm , Ve (ν(e
µ))
c
c
(
ρe
ρe ′ )
′
+ V (e
µ) − V (µ) + V (H(e
µ) − H(µ)) − V (µ) + V H (µ) (e
µ − µ)
c
c
(
)
ρ
ρ
≥V (e
µ) − V (µ) + Ve (H(e
µ) − H(µ)) − V ′ (µ) + Ve H ′ (µ) (e
µ − µ)
c (
c
( ρ )
)
ρ
µ) − L V, Ve , µ (e
µ)
=G V, Ve (e
c
c
>0
In the first equality we used FOC. In first inequality we used suboptimality of µ
e at µ.
′
However for m and ν(e
µ) being optimizer at µ
e:
( ρ
)
(
)
ρ
0 =L V, V (e
µ), µ
e (ν(e
µ)) − G Fm′ , V (e
µ) (ν(e
µ))
c)
( ρc
)
(
ρ
e (ν(e
µ)) − G Fm′ , Ve (ν(e
µ))
=L V, Ve , µ
c
c
ρ
+ (V (e
µ) − Ve )(H(e
µ) − H(ν(e
µ)) + H ′ (e
µ)(ν(e
µ) − µ
e))
c
ρ
> (V (e
µ − Ve ))J(e
µ, ν(e
µ))
c
Contradiction. Therefore, we proved Equation (P-D).
• Step 3 : We show that V satisfies Equation (B.3), which is less restrictive than Equation (P-D) by allowing 1) diffusion experiments. 2) evidience seeking of all possible
posteriors instead of just confirmatory evidence.
First, since V is smooth and has a differentiable optimizer ν, envelope theorem implies:
c −V ′′ (µ)(ν − µ)
−H ′′ (µ)(ν − µ)
+ V (µ)
ρ
J(µ, ν)
J(µ, ν)
(
)
ρ
c ν−µ
′′
′′
V (µ) + V (µ)H (µ)
=−
ρ J(µ, ν)
c
V ′ (µ) =
>0
ρ
=⇒ V ′′ (µ) + V (µ)H ′′ (µ) < 0
c
Therefore, allocating to diffusion experiment will always be suboptimal. What’s more
consider:
c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
µ ≤µ ρ
J(µ, µ′ )
(
c µ′ − µ
ρ − ′′ )
−′
′′
=⇒ V (µ) = −
V (µ) + V H (µ)
ρ J(µ, µ′ )
c
V − (µ) = max
′
V − (µ∗ ) = V (µ∗ ) and whenever V (µ) = V − (µ), we will have V −′ (µ) < 0. Therefore,
V − (µ) can never cross V (µ) from below. That is to say:
{
}
1 ′′
′
′
′
2
ρV (µ) = max ρF (µ), max
p(V (µ ) − V (µ) − V (µ)(µ − µ)) + V (µ)σ
µ′ ,p,σ
2
58
1
s.t. pJ(µ, µ′ ) + H ′′ (µ)σ 2 ≤ c
2
To sum up, we construncted a policy function ν(µ) and value function V (µ) solving
Equation (B.3). Now we show the four properties in Theorem 2. First, by our construction
algorithm, in the case µ∗ ∈ {0, 1}, we can replace µ∗ with µ∗∗ ∈ (0, 1). Therefore we can
{
}
WLOG set µ∗ ∈ (0, 1). Second, E = µ ∈ [0, 1]V (µ) > F (µ) is an open set, thus a
union of disjoint open intervals. By our construction, in each interval, ν is piecewise
smoothly defined by ODE. Therefore, ν(µ) is a piecewise C (1) smooth function. Third,
by Lemma 17, solution to the ODE is strictly decreasing. By our construction, on each
interval constructing E, when µ ≥ µ∗ increase ν always jumps to an action with lower
slope. Therefore ν is piecewise decreasing. When µ ≤ µ∗ symmetric argument applies.
Finally, discussion in Lemma 17 shows that when restricted to only one action, ν is
uniquely determined by FOC. Therefore, except for those discountinous points of ν, ν
is uniquely defined. Number of such discontinuous points is countable, thus of zero
measure.
Lemma 12. Suppose f : D ∈ R 7→ R is continuous and f is differentiable at x s.t.
f (x) ̸= 0. Then ∀x s.t. f (x) = 0:
• limx′ →x+
f (x′ )−f (x)
x′ −x
> ε, then there exists xn → x+ s.t. f ′ (xn ) ≥ ε.
• limx′ →x+
f (x′ )−f (x)
x′ −x
< −ε, then there exists xn → x+ s.t. f ′ (xn ) ≤ −ε.
• limx′ →x−
f (x′ )−f (x)
x′ −x
> ε, then there exists xn → x− s.t. f ′ (xn ) ≥ ε.
• limx′ →x−
f (x′ )−f (x)
x′ −x
< −ε, then there exists xn → x− s.t. f ′ (xn ) ≤ −ε.
Proof. We only prove the first result and the other three follow by symmetric argument.
′
(x)
(x)
Suppose limx′ →x+ f (xx)−f
> ε, then there exists xn → x+ s.t. f (xxnn)−f
≥ ε. Now define
′ −x
−x
′
′
′
′
g(x ) = f (x )−ε(x −x). We have g(x) = 0 and g(xn ) ≥ 0. Since g(x ) = 0 =⇒ f (x′ ) > 0,
g is differentiable at its roots. Suppose g(x) ≤ 0 on [x, xn ], then ∀x′ < xn , g(x′ ) ≤ g(xn ).
Therefore g ′ (xn ) ≥ 0 =⇒ f ′ (xn ) − ε ≥ 0. Suppose g(x′ ) > 0 for some x′ ∈ (0, xn ), then
g(x′ ) is maximized at interiror point µ∗n and FOC implies g ′ (x∗n ) = 0 =⇒ f ′ (x∗n ) = ε. In
this case, we deifne xn = x∗n . Since x∗n ∈ [x, xn ], the newly defined xn → x+ .
Lemma 13. Define V + and V − :
Fm (µ′ )
ρ
µ ≥µ,m 1 + J(µ, µ′ )
c
′
F
m (µ )
V − (µ) = max
ρ
µ′ ≤µ,m 1 + J(µ, µ′ )
c
V + (µ) = max
′
There exists µ∗ ∈ [0, 1] s.t. V + (µ) ≥ V − (µ) ∀µ ≥ µ∗ ; V − (µ) ≤ V − (µ) ∀µ ≤ µ∗ .
+
−
Proof. We define function Um
and Um
as following:
+
Um
(µ) = max
′
µ ≥µ
Fm (µ′ )
1 + ρc J(µ, µ′ )
59
−
Um
(µ) = max
′
µ ≤µ
Fm (µ′ )
1 + ρc J(µ, µ′ )
Since Fm (µ) is a linear function, J(µ, µ′ ) ≥ 0 and smooth, the objective function will
be a continuous function on compact domain. Therefore both maximization operators
are well defined. Existence is already guaranteed, therefore we can refer to first order
condition to characterize the maximizer:
(
)
ρ
ρ
′
′
(B.5)
FOC : Fm 1 + J(µ, µ ) + Fm (µ′ ) (H ′ (µ′ ) − H ′ (µ)) = 0
c
c
ρ
SOC : Fm′ (H ′ (µ′ ) − H ′ (µ))
(B.6)
c
First we discuss first problem where µ′ ≥ µ, since (1 + ρc J) > 0, H ′′ < 0, we have
H ′ (µ′ ) − H ′ (µ) ≤ 0 and inequality is strict when µ′ > µ. Therefore, if Fm′ < 0, FOC
being held will imply SOC being strictly positive. So ∀Fm′ < 0, optimal µ′ will be
boundary. What’s more,
Fm (µ)
Fm (1)
= Fm (µ) > Fm (1) >
ρ
1 + c J(µ, µ)
1 + ρc J(µ, 1)
If Fm′ = 0, then ∀µ′ > µ:
Fm (µ′ )
fm (µ)
′
=
F
(µ)
=
F
(µ
)
≥
m
m
1 + ρc J(µ, µ)
1 + ρc J(µ, µ′ )
+
Therefore ∀Fm′ ≤ 0, Um
(µ) = Fm (µ). Then we only need to consider the case Fm′ > 0. We
will have SOC strictly negative when FOC holds. Therefore solution of FOC characterizes
maximizer.
ρ
ρ
Fm′ (1 + J(µ, µ)) + Fm (µ) (H ′ (µ) − H ′ (µ)) = Fm′ > 0
c
c
ρ
ρ
′
′
lim
Fm (1 + J(µ, µ )) + Fm (µ) (H ′ (µ′ ) − H(µ)) = −∞
′
µ →1
c
c
Therefore a unique solution of µ′ exists by solving FOC. Since FOC itself is a smooth
function of µ, µ′ , and SOC is non-diminishing, implicit function theorem implies µ′ being
a smooth function of µ. This is sufficient to apply envelope theorem:
Fm (µ′ )(−H ′′ (µ)(µ′ − µ))
d +
Um (µ) =
>0
(
)2
dµ
1 + ρ J(µ, µ′ )
c
Let m being the first Fm′ > 0 (not necessarily exists). Let:
+
U + (µ) = max Um
(µ)
m≥m
Then U + (µ) will be a strictly increasing function when it’s well defined and we define:
{
}
V + (µ) = max F (µ), U + (µ)
Remark. U + doesn’t necessarily exist. Existence of U + is equivalent ot existence of an
strictly increasing Fm . Similar for U − in the following discussion for the other case.
60
Similarly, in the second problem, we have H ′ (µ′ ) − H ′ (µ) ≥ 0. Therefore ∀Fm′ ≥ 0,
we have
−
Um
(µ) = Fm (µ)
∀Fm′ < 0, we have a unique and smooth solution characterized by FOC. Envelope theorem
implies:
Fm (µ′ )(−H ′′ (µ))(µ′ − µ)
d −
Um (µ) =
<0
(
)2
dµ
1 + ρ J(µ, µ′ )
c
We can define:
{
}
V − (µ) = max F (µ), U − (µ)
Where U − (µ) is a strictly decreasing function which might not exist. However, at least
one of U − (µ) and U + (µ) exists otherwise the decision problem is trivial. Now we can
discuss the intersection of U + and U − . We first eliminate one possible case: U + and U −
has interior intersection µ∗ ∈ (0, 1) but at µ∗ , U + (µ∗ ) = U − (µ∗ ) = F (µ∗ ) ̸= 0. We show
by contradiction that this is not possible. Suppose such µ∗ exists and F (µ∗ ) = Fm (µ∗ ),
Fm′ < 0. Previous FOC implies
ρ
ρ
lim
Fm′ (1 + J(µ, µ′ )) + Fm (µ) (H ′ (µ′ ) − H ′ (µ)) = Fm′ < 0
′
µ →µ
c
c
−
and boundary solution
Therefore there must exists interior solution µ′ < µ∗ maximizes Um
−
∗
∗
µ must be suboptimal. Contradicting U (µ ) = F (µ ). By symmetry, we can also prove
the other case Fm′ > 0 doesn’t exist. Therefore, we can define µ∗ in the following three
cases:
• Case 1: Both U + , U − exist and intersect at µ∗ ∈ (0, 1), where U + (µ∗ ) = U − (µ∗ ) ≥
F (µ∗ ), then we define this intersection as µ∗ . Equality holds only when F ′ (µ) = 0.
• Case 2: Only U + exists or U + ≥ U − , we define µ∗ = 0.
• Case 3: Only U − exists or U − ≤ U + , we define µ∗ = 1.
Finally, we define V as:
V (µ) =
{
V + (µ) when µ ≥ µ∗
V − (µ) when µ ≤ µ∗
Given our construction, µ∗ always exists and satisfies the conditions in Lemma 13.
Lemma 14. Define V :
{
}
V (µ) := max V + (µ), V − (µ)
V is a piecewise smooth function. It is increasing if µ > µ∗ , V ′ > 0 when V (µ) > F (µ);
decreasing if µ < µ∗ , V ′ < 0 when V (µ) > F (µ).
61
Proof. We discuss three cases of µ∗ separately:
• Case 1: µ∗ ∈ (0, 1). Suppose F ′ (µ) = 0, then it’s trivial. Because both V and U + are
increasing on [µ∗ , 1], then V = max {F, U + } must be increasing. What’s more, when
V > F , V = U + , therefore V ′ > 0. By symmetric argument we can prove for [0, µ∗ ].
{
}
Suppose F ′ (µ) > 0, then V (µ) > F (µ). Consider µ1 = inf µ ≥ µ∗ |V (µ) = F (µ) ,
then F ′ ≥ V ′ > 0 on the left of µ1 . Then on [µ∗ , µ1 ], V = U + , on [µ1 , 1], F ′ > 0.
Therefore V is strictly increasing on [µ∗ , 1]. By symmetric argument we can prove for
[0, µ∗ ].
• Case 2: µ∗ = 0. Suppose U + ≥ U − , then U + ≥ F and the result is trivial. Suppose
U − doesn’t exist, then F itself is increainsg, then V = max {U + , F } is increasing.
• Case 3: µ∗ = 1. Suppose U + ≤ U − , then U − ≥ F and the result is trivial. Suppose
U + doesn’t exist, then F itself is decreainsg, then V = max {U + , F } is decreasing.
Lemma 15. Assume µ0 ≥ µ∗ , Fm′ ≥ 0, V0 , V0′ ≥ 0 satisfies:


V (µ0 ) ≥ V0 ≥ Fm (µ0 )
c Fm (µ′ ) − V0 − V0′ (µ′ − µ0 )

V
=
max
 0
µ′ ≥µ ρ
J(µ0 , µ′ )
Then there exists a C (1) smooth and strictly increasing V (µ) defined on [µ0 , 1] satisfying
V (µ) = max
′
µ ≥µ
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
ρ
J(µ, µ′ )
(B.7)
and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ .
Proof. We start from deriving the FOC and SOC for Equation (B.7):
Fm′ − Vm′ (µ) Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)
+
(H ′ (µ′ ) − H ′ (µ)) = 0
′
′
2
J(µ, µ )
J(µ, µ )
( ′
)
′
′
′
′
H (µ ) − H (µ) Fm − Vm (µ) Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)
′
′
′
SOC:
+
(H (µ ) − H (µ))
J(µ, µ′ )
J(µ, µ′ )
J(µ, µ′ )2
H ′′ (µ′ )
(Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)) ≤ 0
+
J(µ, µ′ )
FOC:
If we impose feasibility:
Vm (µ) =
c Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)
ρ
J(µ, µ′ )
(B.8)
FOC and SOC reduces to:
ρ
FOC: Fm′ − Vm′ (µ) + V (µ)(H ′ (µ′ ) − H ′ (µ)) = 0
c
ρ ′′ ′
SOC: H (µ )V (µ) ≤ 0
c
62
(B.9)
(B.10)
We proceed as following, we use FOC and feasiblity to derive an ODE system with
intial value defined by V0 , V0′ . Then we prove that the solution must be strictly positive.
Therefore, SOC is satisfied at the point where FOC is satisfied, the solution must be
locally maximizer. What’s more, since H ′ (µ′ ) − H ′ (µ) < 0, when FOC is positive, SOC
must be negative, then FOC will have a unique solution. Therefore the solution we get
from the ODE system will be solution to problem Equation (B.7).

F (µ′ ) − Vm′ (µ)(µ′ − µ)

Equation (B.8) =⇒ Vm (µ) = m
1 + ρc J(µ, µ′ )

Equation (B.9) =⇒ V ′ (µ) = F ′ + ρ V (µ)(H ′ (µ′ ) − H ′ (µ))
m
m
c
(
)
Fm (µ′ ) − Fm′ + ρc Vm (µ)(H ′ (µ′ ) − H ′ (µ)) (µ′ − µ)
=⇒ Vm (µ) =
1 + ρc J(µ, µ′ )
(
)
ρ
ρ
=⇒ Vm (µ) 1 + J(µ, µ′ ) = Fm (µ) − Vm (µ)(H ′ (µ′ ) − H ′ (µ))(µ′ − µ)
c
c
Fm (µ)
=⇒ Vm (µ) =
ρ
1 + c (H(µ) − H(µ′ ) + (H ′ (µ) + H ′ (µ′ ) − H ′ (µ)) (µ′ − µ))

Fm (µ)



Vm (µ) = 1 − ρ J(µ′ , µ)
c
(B.11)
=⇒
ρ
Fm (µ)(H ′ (µ′ ) − H ′ (µ))


′
′
c

Vm (µ) = Fm +
1 − ρc J(µ′ , µ)
Consistency of Equation (B.11) implies that µ′ = ν(µ) characterized by the following
ODE:
ρ
F (µ)(H ′ (ν) − H ′ (µ))
∂
∂
Fm (µ)
Fm (µ)
′
c m
+
ν̇
=
F
+
m
∂µ 1 − ρc J(ν, µ) ∂ν 1 − ρc J(ν, µ)
1 − ρc J(ν, µ)
(B.12)
Simplifying Equation (B.12), we get:
ρ
ρ
F (µ)(H ′ (ν) − H ′ (µ))
F (µ)H ′′ (ν)(µ − ν)
Fm′
c m
c m
+
+
)2
)2 ν̇
(
(
1 − ρc J(ν, µ)
1 − ρc J(ν, µ)
1 − ρc J(ν, µ)
Fm′ + ρc (−Fm′ J(ν, µ) + Fm (µ)(H ′ (ν) − H ′ (µ)))
=
1 − ρc J(ν, µ)
=⇒ Fm (µ)(H ′ (ν) − H ′ (µ)) + Fm (µ)H ′′ (ν)(µ − ν)ν̇
ρ
=(−Fm′ J(ν, µ) + Fm (µ)(H ′ (ν) − H ′ (µ)))(1 − J(ν, µ))
c
ρ
ρ
′′
′
=⇒ Fm (µ)H (ν)(µ − ν)ν̇ = −Fm J(ν, µ)(1 − J(ν, µ)) − J(ν, µ)Fm (µ)(H ′ (ν) − H ′ (µ))
c
c
(
)
Fm′ 1 − ρc J(ν, µ) + ρc Fm (µ)(H ′ (ν) − H ′ (µ))
=⇒ ν̇ = J(ν, µ)
Fm (µ)H ′′ (ν)(ν − µ)
(
)
ρ
Fm′ 1 + c J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ))
ν̇ = J(ν, µ)
Fm (µ)H ′′ (ν)(ν − µ)
Since we want to solve for V0 on [µ0 , 1], we solve for ν0 at µ0 as the initial condition of
ODE for ν. To utilitze Lemma 17, we need to first verify the inequality condition in
Lemma 17.
63
∗
The FOC characterizing optimizer νm
of V (µ0 ) is Equation (B.5):
)
(
ρ
∗
0 ρ
∗
Fm′ 1 + J(µ0 , νm
) + Fm (νm
) (H ′ (νm
) − H ′ (µ0 )) = 0
c
c
The FOC characterizing ν is Equation (B.11):
(
) ρ
ρ
(Fm′ − V0′ ) 1 − J(ν0 , µ0 ) + Fm (µ0 ) (H ′ (ν0 ) − H ′ (µ0 )) = 0
c )
c
(
(
)
ρ
ρ
ρ
′
⇔Fm 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ)) = V0′ 1 − J(ν0 , µ0 )
c )
(
) c
)
(c (
ρ
ρ
ρ
⇔Fm (µ0 ) Fm′ 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ)) = V0′ Fm (µ0 ) 1 − J(ν0 , µ0 )
c
c
c
(µ0 )
Since V0 = 1−FρmJ(ν
≥ 0, we can conclude that LHS is weakly positive. This satisifes
0 ,µ0 )
c
the condition in Lemma 17 when Fm (µ0 ) ̸= 0. When Fm (µ0 ) = 0 then the condition holds
for sure. Then Lemma 17 guarantees existence of ν(µ), which is continuously decreasing
from µ0 until it hits ν(µ) = µ. Suppose ν is minimized at µm < 1, we define Vm (µ) as
following:

Fm (µ)


if µ ∈ [µ0 , µm )
ρ
Vm (µ) = 1 − c J(ν(µ), µ)

F (µ)
if µ ∈ [µ , 1]
m
m
Then we prove the properies of Vm :
1. When µ → µm , ν(µ) → µ. Therefore J(ν, µ) → 0. This implies Vm (µ) → Fm (µ). So
Vm is continuous.
2. By Equation (B.11), when µ ∈ [µ0 , µm ):
Vm′ (µ) = Fm′ +
Fm (µ)(H ′ (ν(µ)) − H ′ (µ))
c
− J(ν(µ), µ)
ρ
When µ → µm , H ′ (ν(µ)) − H ′ (µ) → 0, J(ν(µ), µ) → 0. Thus Vm′ (µ) → Fm′ . So Vm′
will be continuous everywhere on [µ0 , 1]. Vm ∈ C (1) [µ0 , 1].
3. By Lemma 17, ν will be decreasing when µ < µm . Therefore 1 − ρc J(ν, µ) will be
increasing. So V (µ) > 0 ∀µ ∈ [µ0 , µm ). It’s trivial that when µ ≥ µm , Vm (µ) = Fm (µ).
Therefore Vm (µ) > 0 everywhere. By our previous discussion, ν will actually be the
maximizer.
4. Rewrite Equation (B.11) on [µ0 , 1]:
(
)
Fm′ 1 + ρc J(µ, ν) + Fm (ν)(H ′ (ν) − H ′ (µ))
′
Vm (µ) =
1 − ρc J(ν, µ)
Accodring to proof of Lemma 17, Vm′ (µ) > 0 ∀µ ∈ (µ0 , 1].
64
(B.13)
Lemma 16. ∀δ, η > 0, ∀µ, ν s.t. µ, ν ∈ (δ, 1 − δ), |Fm (µ)| > η,
(
)
Fm′ 1 + ρc J(ν, µ) + ρc Fm (ν)(H ′ (ν) − H ′ (µ))
L(µ, ν) = J(ν, µ)
(ν − µ)Fm (µ)H ′′ (ν)
L(µ, ν) is uniformly Lipschtiz continuous in ν and continuous in µ.
Proof. It’s not hard to see that ∀ξ > 0 when |µ − ν| > ξ, then L(µ, ν) will be well
behaved. We will discuss this case and µ = ν separately:
• When |µ − ν| > ξ, there exists ε, ∆ > 0 s.t.:


∆ ≥ |Fm (µ)| ≥ η


∆ ≥ |H ′′ (ν)| ≥ ε

∆ ≥ |H ′ (µ)| , |H ′ (ν)|




∆ ≥ |H(µ)| , |H(ν)|
and H ′′ (µ) having Lipschtiz parameter ∆ on [δ, 1 − δ]. Then:
|L(µ, ν) − L(µ, ν ′ )|
( (
)
)
J(ν, µ) F ′ 1 + ρ J(µ, ν) + ρ Fm (ν)(H ′ (ν) − H ′ (µ))
m
c
c
− J(ν ′ , µ) (F ′ (1 + ρ J(µ, ν ′ )) + ρ F (ν ′ )(H ′ (ν ′ ) − H ′ (µ))) m
m
c
c
=
′′
(ν − µ)Fm (µ)H (ν)
)
)
( (
ρ
ρ
+ J(ν ′ , µ) Fm′ 1 + (µ, ν) + Fm (ν)(H ′ (ν) − H ′ (µ)) c
c
′
(ς − µ)Fm (µ)H ′′ (ν ′ ) − (ν − µ)Fm (µ)H ′′ (ν) × ′
(ν − µ)Fm (µ)H ′′ (ν ′ ) × (ν − µ)Fm (µ)H ′′ (ν) ρ
c Fm (ν̃) (H(µ) − H(ν̃) + (ν̃ − µ)(2H ′ (ν̃ − H ′ (µ)))) (
)
ρ
′
′′
1
+
+
(ν̃
−
µ)F
J(µ,
ν̃)
m
c
|H (ν̃)| |ν ′ − ν|
≤ ηξε
( (
)) ∆2 |ν − ν ′ | + ∆2 |ν ′ − ν|
ρ
+ 3∆ Fm′ (1 + 3∆) + 2∆2
c
ξ 2 η 2 ε2
)
(
(
)
ρ
ρ
(
(
) ρ
)
3
′
1
+
3∆
+
δF
3∆
ρ
2∆2
m
c
+ 3∆Fm′ 1 + 3∆ + 2∆2
≤ c
|ν − ν ′ |
ξεη
c
c
ξ 3 η 3 ε2
Therefore L(µ, ν) is uniformly Lipschtiz continuous in ν.
• When ν → µ, we still use the parameters defined in first case:
L(µ, ν) (µ − ν)H ′′ (ν̃) ρc Fm (ν̃ ′ )(ν − µ) =
ρ
ν−µ
(ν − µ)2 Fm (µ)H ′′ (ν)
c
∆2
≤
η
65
ν
1.0
0.8
0.6
0.4
0.2
0.2
0.4
0.6
v(μ)
⋁(μ)=μ
0.8
1.0
μ
⋁m * (μ)
ν(µ) is defined by: ρc J(ν(µ), µ) = 1.
)
(
∗ (µ) is defined by: F ′ 1 + ρ J(µ, ν ∗ (µ)) + ρ F (ν ∗ (µ))(H ′ (ν ∗ (µ)) −
νm
m
m
c
c m m
H ′ (µ)) = 0.
The red line and blue lines are solution path of ODE µ̇ = L(µ, ν) with initial
value satisfying Lemma 17.
Figure B.9: Phase diagram of (µ̇, ν̇).
66
Combine the two cases, L(µ, ν) will be uniformly Lipschtiz continous for all ν ∈ [δ, 1 − δ].
On the other hand, continuous lf L(µ, ν) on µ will be trivial because all components
constructing L are continuous when µ is bounded aways from ν. When µ → ν, the result
in second case we discussed before can also be used to show continuity in µ.
Lemma 17. Assume µ0 ∈ [µ∗ , 1), Fm (µ0 ) ̸= 0, Fm′ ≥ 0, ν0 ∈ [µ0 , 1) satisfies:
( (
) ρ
)
ρ
′
′
′
Fm (µ0 ) Fm 1 + J(µ0 , ν0 ) + Fm (ν0 )(H (ν0 ) − H (µ0 )) ≥ 0
c
c
Then there is a continuous function ν on [µ0 , 1] satisfying initial condition ν(µ0 ) = ν0 .
On {µ|ν(µ) > µ}, ν is differentiable, strictly decreasing and satisfis ODE:
(
)
Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ))
ν̇ = J(ν, µ)
(ν − µ)Fm (µ)H ′′ (ν)
Proof. Before we proceed to solving the ODE, we characterize the dynamics of (µ, ν)
on [0, 1]2 . Figure B.9 shows the phase diagram of µ̇, ν̇ on [0, 1]2 and some important
functions that determines the dynamics of (µ, ν). The horizongtal axis is µ and vertical
axis is ν. The black line is ν = µ. The two thin black lines characterizes ν(µ) as the
solutions to:
ρ
1 − J(ν(µ), µ) = 0
c
The two dashed black lines characterizes ν ∗ (µ) as the two solutions to:
(
) ρ
ρ
′
∗
Fm 1 + J(µ, ν (µ)) + Fm (ν ∗ (µ))(H ′ (ν ∗ (µ)) − H ′ (µ)) = 0
c
c
Since we are discussing the case ν ≥ µ, we only focus on the upper left half the graph:
• F (µ0 ) < 0. This corresponds to the left half of the graph.
(
) ρ
ρ
Fm′ 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ0 )) ≤ 0
c
c
∗
=⇒ ν0 ≥ ν (µ0 )
Therefore our initial condition means (µ0 , ν0 ) lies in the red region. ν̇ = 0 when
ν(µ) = ν ∗ . otherwise ν̇ < 0. When F (µ) is close to 0, ν̇ goes to negative infinity if
ν > ν ∗ (µ). So the dynamics of ν in this region must have ν strictly decreasing and
reaches ν ∗ when F (µ) = 0. Intuitively, ν will never reach the region ν > ν0 . Then
uniform Lipschtiz continuity of L(µ, ν) on ν ∈ [µ, ν0 ], for µ ∈ [µ0 , F −1 (−η)] will be
enough to guarantee existence of solution.
• F (µ0 ) > 0. This corresponds to the right half of the graph.
(
) ρ
ρ
′
Fm 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ0 )) ≥ 0
c
c
=⇒ ν0 ≤ ν ∗ (µ0 )
Our intial condition will lie below the dashed line in blue region. L(µ, ν) < 0 in this
region and L(µ, ν ∗ ) = 0. So the dynamics of ν in this region must have ν strictly
decreasing until it reaches ν = µ. Then uniform Lipschtiz continuity of L(µ, ν) on
ν ∈ [µ, ν0 ] for µ ∈ [µ0 , 1] will be sufficient ot guanrantee existence of solution.
67
Then we discuss in details the solution of ODEs:
• Fm (µ0 ) > 0. Our conjecture is that solution ν will be no larger than ν0 within the
region: µ ∈ [µ0 , ν0 ], ν ∈ [µ0 , ν0 ]. Therefore, we modify L(µ, ν) to define L̃(µ, ν) on the
whole space:
L̃(µ, ν) = L (max {min {µ, ν0 } , µ0 } , max {min {ν, ν0 } , µ0 })
It’s not hard to see that L̃ is uniformly Lipschtiz continuous w.r.t ν ∈ R for µ ∈ [0, 1]
and continuous in µ ∈ [0, 1]. We can apply Picard-Lindelof to solve for ODE ν̃˙ = L̃(µ, ν̃)
on the space with initial condition ν̃(µ0 ) = ν0 .
– Consider ν̃ on [µ0 , 1], it starts at ν0 > µ0 . It first reaches ν = µ at µ ∈ (µ0 , 1]
(we define it to be 1 when it doesn’t exist). Then for µ ∈ (µ0 , µ), we must have
∗
L(µ, ν̃) < 0. Suppose not, then there exists ν̃(µ) ≥ νm
(µ) > ν0 . We pick a
smallest µ such that this is true. Then this µ must be strictly larger than µ0
∗
∗ (µ ). Then at µ, ṽ(µ)
˙
because L(µ,0 , ν0 ) = 0 < ν˙m
= 0 but ν̇m
(µ) > 0. It’s
0
∗
˙
impossible that ṽ crosses νm from below. Contradiction. Then ν̃ < 0 until it hits
ν = µ.
– µ < ν0 . Suppose µ ≥ µ0 , since ν̃ < 0 on (µ0 , µ), ν̃(µ) < ν0 . Contradiction.
Therefore, ν̃ on [µ0 , µ] will be with region [µ0 , ν0 ].
In the region [µ0 , µ] × [µ0 , ν0 ], L̃ coincides L. Therefore, ṽ is a solution to original ODE
Equation (B.12). We define ν:
{
ν̃(µ) if µ ∈ [µ0 , µ]
ν(µ) =
µ
if µ ∈ [µ, 1]
It’s easy to verify that ν satisfies Lemma 17. The blue line on Figure B.9 illustrates a
solution in this case.
• Fm (µ0 ) < 0. Define µ0 = F −1 (0), our conjecture is that solution ν will be decreasing
on [µ0 , µ0 ). ∀η > 0, define µη = F−1 (−η), we modify L(µ, ν) to define L̃(µ, ν) on the
whole space:
∗
L̃(µ, ν) = L (max (min (µ, µη ) , µ0 ) , max {min {ν, ν0 } , νm
(µ)})
It’s not hard to see that L̃ is uniformly Lipschtiz continuosu w.r.t. ν ∈ R for µ ∈ [0, 1]
and continuous in µ ∈ [0, 1]. We can apply Picard-Lindelof to solve for ODE ν̃˙ = L̃(µ, ν̃)
on the space with initial condition ν̃(µ0 ) = ν0 . ν̃ will be strictly decreasing on (µ0 , µη ].
∗
is must crosses from below and this is not possible.
Because when ν̃ first touches νm
η
Then, when µ ∈ [µ0 , µ ], we have L(µ, ν̃) = L̃(µ, ν̃). Therefore ν̃ is a solution to
original ODE Equation (B.12).
Then we extend ν̃ to [µ0 , µ0 ) by taking η → 0 and define:
{
ν̃(µ)
if µ ∈ [µ0 , µ0 )
ν(µ) =
limµ→F −1 (0) ν̃(µ) if µ = F −1 (0)
68
First since ν̃ is decreasing, the sup limit will actually be the limit and ν ∈ C[µ0 , µ0 ].
Then we show that this extension is left differentiable at µ0 . Consider:
V (µ) =
1−
Fm (µ)
ρ
J(ν(µ), µ)
c
By Equation (B.13), we know that on [µ0 , µ0 ) sign of V ′ is determined by sign of
1 − ρc J(ν(µ), µ). At initial value, V0 ≥ 0 =⇒ 1 − ρc J(ν0 , µ0 ) > 0. On the other hand,
V (µ) will be bounded above by V . So 1 − ρc J(ν(µ), µ) as a continuous function of µ has
to stay above 0. Therefore V ′ (µ) > 0 on [µ)0 , µ0 ). By monotonic convergence, there
exists limµ→µ0− V (µ). Define it as V (µ0 ). We define:
ν̇(µ0 ) =
′
Fm
+ ρc (H ′ (ν(µ0 )) − H ′ (µ0 ))
V (µ0 )
ρ ′′
H (ν(µ0 ))(ν(µ0 ) − µ0 )
c
0
)
. Suppose not, there exists ε > 0, µn → µ0
Now we show that ν̇(µ0 ) = limµ→µ0 ν(µ)−ν(µ
µ−µ0
0) s.t. ν̇(µ0 ) − ν(µµnn)−ν(µ
> ε. Suppose ν(µn ) > ν(µ0 ) + (ν̇(µ0 ) − ε) (µn − µ0 ):
−µ0
Fm (µ)
+ (ν̇(µ0 ) − ε)(µn − µ0 ), µn )
1−
Fm′
=⇒ lim V (µn ) ≤ ρ
n→∞
(−H ′ (ν(µ0 )) + H ′ (µ0 ) + H ′′ (ν(µ0 ))(ν(µ0 ) − µ0 )(ν̇(µ0 ) − ε))
c
Fm′
<ρ
(−H ′ (ν(µ0 )) + H ′ (µ0 ) + H ′′ (ν(µ0 ))(ν(µ0 ) − µ0 )ν̇(µ0 ))
c
V (µn ) <
ρ
J
c
(ν 0
=V (µ0 )
First strict inequality is from 1 − ρc J(ν, µ) strictly increasing in ν. When Fm (µ) < 0,
Fm (µ)
will be decreasing in ν. Second inequality is by taking limit of lower bounded
1− ρc J(ν,µ)
of V (µn ) with L’Hospital rule. Third strict inequality is from ε > 0, H ′′ < 0. Last
equality is from definition of ν̇(µ0 ). We get contradiction. Similarly, we cna rule out
ν(µn ) < ν(µ0 ) + (ν̇(µ0 ) + ε)(µn − µ0 ). Therefore, we extended ν to [µ0 , µ0 ] such that
it’s differentiable on [µ0 , µ0 ] and smooth on (µ0 , µ0 ).
Let µ0 = µ0 , ν0 = ν(µ0 ), ν0′ = ν̇(µ0 ), then ν0 > µ0 and
{
1 − ρc J(ν0 , µ0 ) = 0
ρ
(H ′ (µ0 )
c
− H ′ (ν0 ) + H ′′ (ν0 )(ν0 − µ0 )ν0′ ) =
V (µ0 )
′
Fm
≤
V (µ0 )
′
Fm
Then by Lemma 18, we can solve for ν(µ) on [µ0 , 1] satisfying the conditions in
Lemma 17. What’s more, ν̇(µ0 ) = ν0 , then ν is differentiable at µ0 . For any other
points in {µ|ν(µ) > µ}, ν is C (1) smooth. Since ν0′ < 0, then the solved ν will be
strictly decreasing.
Lemma 18. Assume Fm (µ0 ) = 0, Fm′ > 0, ν0 ∈ [µ0 , 1), ν0′ satisfies
{
1 − ρc J(ν0 , µ0 ) = 0
0<
ρ
c
(H ′ (µ0 ) − H ′ (ν0 ) + H ′′ (ν0 )(ν0 − µ0 )ν0′ ) ≤
69
V (µ0 )
′
Fm
Then there is a continuous function ν on [µ0 , 1] satifying initial condition ν(µ0 ) = ν0 ,
ν̇(µ0 ) = ν0′ . On {µ|ν(µ) > µ}, ν is differentiable, strictly increasing and satisfies ODE:
)
(
Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ))
ν̇ = J(ν, µ)
(ν − µ)Fm (µ)H ′′ (ν)
∗
Proof. ∀ µ1 ∈ (µ0 , 1), ∀ν1 ∈ [µ1 , νm
(µ1 )), we consider the solution of ODE with initial
condition (µ0 , ν0 ). ∀η > 0, define µη = F −1 (η). Then like the proof of Lemma 17, we
∗
can solve for a smooth ν on [µη , µ]. ν will be strictly decreasing below νm
and strictly
∗
increasing over νm . Consider the slope of ν:
H ′ (ν) − H ′ (µ)
ν̇ = ′′
= L(µ, ν)
H (ν)(ν − µ)
ν itself satisfies ODE Equation (B.12), then uniqness of solution to ODE implies ν < ν
∀µ ∈ [µη , µ]. So solution must lies in the blue region in Figure B.9. When ν1 → ν(µ1 ),
1 − ρc J(ν, µ) → 0∀µ. Thus V (µ) → ∞. On the other hand, when µ1 → µ0 , ν1 = µ1 ,
V (µ) → Fm (µ0 ) = 0. Then when we move around µ1 , ν1 , we will have ν̇(µ0 ) higher or
lower than ν0′ .
We index V (µ0 ) by initial value (µ1 , ν1 ): V0 (µ1 , ν1 ). We show that V0 (µ1 , ν1 ) is continuous in (µ1 , ν1 ). Suppose not, then there exists limµn1 ,ν1n →µ1 ,ν1 V0 (µn1 , ν1n ) ̸= V0 (µ1 , ν1 ).
On the other hand, we index V (µη ) by initial value (µ1 , ν1 ): Vη (µ1 , ν1 ), then continuous
dependence of ODE guanrantees that limµn1 ,ν1n →µ1 ,ν1 Vη (µn1 , ν1n ) = Vη (µ1 , ν1 ). Therefore,
∀N , there exists η s.t.
limµn ,ν n →µ1 ,ν1 V0 (µn1 , ν1n ) − V0 (µ1 , ν1 )
1 1
> 3N
|µ0 − µη |
Then by continuity, we can have η sufficiently small that:
limµn ,ν n →µ1 ,ν1 V0 (µn1 , ν1n ) − Vη (µ1 , ν1 )
1
1
|µ0 − µη |
> 2N
Then we can have n sufficiently large that:
|V0 (µn1 , ν1n ) − Vη (µn1 , ν1n )|
>N
|µ0 − µη |
Then there must exists µ̃N s.t. |V ′ (µ̃N )| > N . On the other hand, |V ′ | must be bounded
because:
Fm (ν) − V ′ (µ)(ν − µ)
V (µ) =
1 + ρc J(µ, ν)
When V ′ going to positive infinity, V (µ) will go to Fm (µ). When V ′ going to negative
infinity, V (µ) will go to positive infinity. Both cases are impossible. Therefore, V0 (µ1 , ν1 )
will be a continuous function on initial value.
To sum up, there exists initial value (µ1 , ν1 ) s.t. the solved ν̇(µ0 ) = ν0′ .
70
Lemma 15′ . Assume µ0 ≤ µ∗ , Fm′ ≤ 0, V0 , V0′ satisfies:


V (µ0 ) ≥ V0 ≥ Fm (µ0 )
c Fm (µ′ ) − V0 − V0′ (µ′ − µ0 )

V0 = max
µ′ ≤µ ρ
J(µ0 , µ′ )
Then there exists a C (1) smooth and strictly decreasing V (µ) defined on [0, µ0 ] satisfying
V (µ) = max
′
µ ≤µ
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
ρ
J(µ, µ′ )
(B.7’)
and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ .
Lemma 17′ . Assume µ0 ∈ (0, µ∗ ], Fm (µ0 ) ̸= 0, Fm′ ≤ 0, ν0 ∈ (0, µ0 ] satisfies:
(
(
) ρ
)
ρ
Fm (µ0 ) −Fm′ 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (µ0 ) − H ′ (ν0 )) ≥ 0
c
c
Then ∃ν ∈ C[0, µ0 ] satisfying initial condition ν(µ0 ) = ν0 . On {µ|ν(µ) > ν}, ν is differentiable, strictly decreasing and satifies ODE:
(
)
Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ))
′
ν = J(ν, µ)
(ν − µ)Fm (µ)H ′′ (ν)
Lemma 18′ . Assume Fm (µ0 ) = 0, Fm′ < 0, ν0 ∈ (0, µ0 ], ν0′ satisfies
{
1 − ρc J(ν0 , µ0 ) = 0
0 > ρc (H ′ (µ0 ) − H ′ (ν0 ) + J ′′ (ν0 )(ν0 − µ0 )ν0′ ) ≥
V (µ0 )
′
Fm
Then ∃ ν ∈ C[0, µ0 ] satifying initial condition ν(µ0 ) = ν0 , ν̇(µ0 ) = ν0′ . On {µ|ν(µ) > µ},
ν is differentiable, strictly decreasing and satisfies ODE:
(
)
Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ))
′
ν = J(ν, µ)
(ν − µ)Fm (µ)H ′′ (ν)
Lemma 19. Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies:

′
′
′
′
′
′

V0 = max c Fm−k (µ ) − V0 − V0 (µ − µ) ≥ max c Fm (µ ) − V0 − V0 (µ − µ)
µ′ ≥µ ρ
µ′ ≥µ ρ
J(µ, µ′ )
J(µ, µ′ )

V (µ ) ≥ V ≥ F
(µ )
0
0
m−k
0
Vm−k is the solution as defined in Lemma 15 with initial condition V0 , V0′ , then ∀µ ∈
[µ0 , ν(µ0 )]:
Vm−k (µ) ≥
′
(µ)(µ′ − µ)
c Fm′ (µ′ ) − Vm−k (µ) − Vm−k
µ′ ≥µ,m ∈[m−k,m] ρ
J(µ, µ′ )
max
′
71
Proof. We first show that:
V0 ≥
c Vm−k (µ′ ) − V0 − V0′ (µ′ − µ)
max
J(µ, µ′ )
µ′ ∈[µ0 ,µm ] ρ
Suppose not, then there exists µ′ s.t.
V0 <
c Vm−k (µ′ ) − V0 − V0′ (µ′ − µ)
ρ
J(µ, µ′ )
(B.14)
By definition of V0 , we must have Vm−k (µ′ ) > Fm−k (µ′ ). The inequality is trivial because if Fm−k (µ′ ) = Vm−k (µ′ ), then choosing µ′ will be suboptimal. Optimality implies
Equation (B.9) and Equation (B.8):

ρ
ρ
′
Fm−k
+ V0 H ′ (ν(µ)) = V0′ + V0 H ′ (µ)
c
(
)c (
) (
)
ρ
ρ
ρ
′
′
 F
(ν(µ))
+
V
H(ν(µ))
−
V
+
V
H(µ)
=
V
+
V
H
(µ)
(ν(µ) − µ)
m−k
0
0
0
0
0
c
c
c
We define L(V, λ, µ)(µ′ ) as a linear function of µ′ :
L(V, λ, µ)(µ′ ) = (V (µ) + λH(µ)) + (V ′ (µ) + λH ′ (µ))(µ′ − µ)
(B.15)
Define G(V, λ)(µ) as a function of µ:
G(V, λ)(µ) = V (µ) + λH(µ)
(B.16)
Then G(Fm−k , ρc Vm−k (µ0 ))(µ′ ) is a concave function of µ′ . Consider:
(
)
(
)
ρ
ρ
L Vm−k , Vm−k (µ0 ), µ0 (µ′ ) − G Fm−k , Vm−k (µ0 ) (µ′ )
c
c
This is a convex function and have unique minimum. Therefore, the minimum will be
determined by FOC. Simple calculation shows that it is minimized at ν(µ0 ) and the
minimal value is 0.
ρ
ρ
′
′
+ Vm−k (µ0 )H ′ (µ′ )
FOC : Vm−k
(µ0 ) + Vm−k (µ0 )H ′ (µ0 ) = Fm−k
c
c
It’s easy to see that this equation is identical to the FOC for ν(µ0 ). Now consider:
(
)
(
)
ρ
ρ
′
L Vm−k , Vm−k (µ0 ), µ0 (µ ) − G Vm−k , Vm−k (µ0 ) (µ′ )
c
c
(
) (
)
ρ
ρ
′
= Vm−k (µ0 ) + Vm−k (µ0 )H(µ0 ) + Vm−k (µ0 ) + Vm−k (µ0 )H ′ (µ0 ) (µ′ − µ0 )
c
c
(
)
ρ
′
′
− Vm−k (µ ) + Vm−k (µ0 )H(µ )
c
(
)
ρ
′
′
(µ0 )(µ′ − µ0 ) − Vm−k (µ0 )J(µ0 , µ′ )
= − Vm−k (µ ) − Vm−k (µ0 ) − Vm−k
c
<0
The last inequality is from rewriting Equation (B.14). Therefrore, L(Vm−k , ρc Vm−k (µ0 ), µ0 )(µ′ )−
G(Vm−k , ρc Vm−k (µ0 ))(µ′ ) will have minimum strictly negative. Suppose it’s minimized at
µ′′ . Then FOC implies:
ρ
ρ
′
′
(µ′′ ) + Vm−k (µ0 )H(µ′′ )
(µ0 ) + Vm−k (µ0 )H ′ (µ0 ) = Vm−k
Vm−k
c
c
72
Consider:
(
)
(
)
ρ
ρ
L Vm−k , Vm−k (µ0 ), µ′′ (ν(µ′′ )) − G Fm−k , Vm−k (µ0 ) (ν(µ′′ ))
c
c
(
)
(
)
ρ
ρ
=L Vm−k , Vm−k (µ0 ), µ0 (ν(µ′′ )) − G Fm−k , Vm−k (µ0 ) (ν(µ′′ ))
c
c
ρ
ρ
′′
′′
′
+ Vm−k (µ ) − Vm−k (µ0 ) + Vm−k (µ0 )(H(µ ) − H(µ0 )) − (Vm−k
(µ0 ) + H ′ (µ0 ))(µ′′ − µ0 )
c
c
ρ ′
ρ
′′
′
′′
≥Vm−k (µ ) − Vm−k (µ0 ) + Vm−k (µ0 )(H(µ ) − H(µ0 )) − (Vm−k (µ0 ) + H (µ0 ))(µ′′ − µ0 )
c
(
) c
(
)
ρ
ρ
′′
′′
=G Vm−k , Vm−k (µ0 ) (µ ) − L Vm−k , Vm−k (µ0 ), µ0 (µ )
c
c
>0
In the first equality we used FOC. In the first inequality we used suboptimality of ν(µ′′ )
at µ0 . However:
)
(
(
)
ρ
ρ
0 =L Vm−k , Vm−k (µ′′ ), µ′′ (ν(µ′′ )) − G Fm−k , Vm−k (µ′′ ) (ν(µ′′ ))
c
c
(
)
(
)
ρ
ρ
′′
′′
=L Vm−k , Vm−k (µ0 ), µ (ν(µ )) − G Fm−k , Vm−k (µ0 ) (ν(µ′′ ))
c
c
ρ
′′
′′
′′
+ (Vm−k (µ ) − Vm−k (µ)) (H(µ ) − H(ν(µ )) + H ′ (µ′′ )(ν(µ′′ ) − µ′′ ))
c
ρ
> (Vm−k (µ′′ ) − Vm−k (µ))J(µ′′ , ν(µ′′ ))
c
>0
Contradiction with optimality of ν(µ′′ ).
Now we show Lemma 19. Suppose it’s not true, then there exists µ′ ∈ (µ0 , ν(µ0 )),
µ′′ ≥ µm′ s.t.:
Vm−k (µ′ ) <
′
(µ′ )(µ′′ − µ′ )
c Fm′ (µ′′ ) − Vm−k (µ′ ) − Vm−k
ρ
J(µ′ , µ′′ )
Then by definition:
)
(
)
(
ρ
ρ
0 ≤L Vm−k , Vm−k (µ0 ), µ0 (µ′′ ) − G Fm′ , Vm−k (µ0 ) (µ′′ )
c
(
)
( c ρ
)
ρ
′′
=L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ ) − G Fm′ , Vm−k (µ0 ) (µ′′ )
c
c
(
)
(
)
ρ
ρ
′
0 ≤L Vm−k , Vm−k (µ0 ), µ0 (µ ) − G Vm−k , Vm−k (µ0 ) (µ′ )
c
(
)
( c ρ
)
ρ
′
=L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ ) − G Vm−k , Vm−k (µ0 ) (µ′ )
c
c
(
)
(
)
ρ
ρ
′
′′
=⇒ L Fm−k , Vm−k (µ ), ν(µ0 ) (µ ) − G Fm′ , Vm−k (µ′ ) (µ′′ )
c
c
(
)
(
)
ρ
ρ
′′
′
=L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ ) − G Fm , Vm−k (µ0 ) (µ′′ )
c
c
ρ
′
′′
+ (Vm−k (µ ) − Vm−k (µ0 )) J(µ0 , µ )
c
>0
)
(
)
(
ρ
ρ
L Fm−k , Vm−k (µ′ ), ν(µ0 ) (µ′ ) − G Vm−k , Vm−k (µ′ ) (µ′ )
c
c
73
(
)
(
)
ρ
ρ
=L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ′ ) − G Vm−k , Vm−k (µ0 ) (µ′ )
c
c
ρ
′
′
+ (Vm−k (µ ) − Vm−k (µ0 )) J(µ0 , µ )
c
>0
)
(
Now we consider L Vm−k , ρc Vm−k (µ′ ), µ′ (·):
 (
)
(
)
ρ
ρ
′
′
′
′

L Vm−k , Vm−k (µ ), µ (µ ) = G Vm−k , Vm−k (µ ) (µ′ )



c

)
( c ρ
(
)

ρ

′
′

L Vm−k , Vm−k (µ ), µ (ν(µ0 )) ≥ G Vm−k , Vm−k (µ′ ) (ν(µ0 ))
c
c
(
)
(
)
ρ
ρ

′
′
′

L Fm−k , Vm−k (µ ), ν(µ0 ) (µ ) > G Vm−k , Vm−k (µ ) (µ′ )


c

(
)
( c ρ
)


ρ

′
L Fm−k , Vm−k (µ ), ν(µ0 ) (ν(µ0 )) = G Vm−k , Vm−k (µ′ ) (ν(µ0 ))
c
c
 (
)
(
)
ρ
ρ

L Vm−k , Vm−k (µ′ ), µ′ (ν(µ0 )) ≥ L Fm−k , Vm−k (µ′ ), ν(µ0 ) (ν(µ0 ))
c
c
(
)
(
)
=⇒
ρ
ρ

′
′
′
L Vm−k , Vm−k (µ ), µ (µ ) < L Fm−k , Vm−k (µ′ ), ν(µ0 ) (µ′ )
c
c
The two equalities are directly from definition of L and G. First inequality is from subopti)
(
mality, second inequality is from previous calculation. Therefore L Vm−k , ρc Vm−k (µ′ ), µ′ (·)
(
)
is lower at µ′ and L Fm−k , ρc Vm−k (µ′ ), ν(µ0 ) (·) is lower at ν(µ0 ). Since both of them
(
)
are linear fuinctions, then L Vm−k , ρc Vm−k (µ′ ), µ′ (·) must be higher at any µ′′ > ν(µ0 ).
Therefore, this implies:
)
(
)
(
ρ
ρ
L Vm−k , Vm−k (µ′ ), µ′ (µ′′ ) > G Fm′ , Vm−k (µ′ ) (µ′′ )
c
c
Contradicting that µ′′ is superior than ν(µ′ ).
Lemma 19′ . Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies:

′
′
′
′
′
′

V0 = max c Fm−k (µ ) − V0 − V0 (µ − µ) ≤ max c Fm (µ ) − V0 − V0 (µ − µ)
µ′ ≤µ ρ
µ′ ≥µ ρ
J(µ, µ′ )
J(µ, µ′ )

V (µ ) ≥ V ≥ F
(µ )
0
0
m+k
0
Vm+k is the solution as defined in Lemma 15 with initial condition V0 , V0′ , then ∀µ ∈
[ν(µ0 ), µ0 ]:
Vm+k (µ) ≥
′
(µ)(µ′ − µ)
c Fm′ (µ′ ) − Vm−k (µ) − Vm−k
µ′ ≤µ,m ∈[m,m+k] ρ
J(µ, µ′ )
max
′
C. Proofs in Section 4
C.1. Convex Flow Cost
C.1.1. Proof of Theorem 3
Proof. In this part, we introduce the algorithm to construct V (µ) and ν(µ). We only
discuss the case µ ≥ µ∗ and the case µ ≤ µ∗ will follow by a symmetric method.
Algorithm:
74
• Step 1 : By Lemma 21, there exists µ∗ ∈ ∆(X) and V (µ) defined as:
Fm (µ′ ) − h(c)
J(µ, µ′ )
c
µ ,m,c
1 + ρc J(µ, µ′ )
V (µ) = max
′
• Step 2 : We construct the first piece of V (µ) to the right of µ∗ . By Lemma 21, there
are three possible cases of µ∗ to discuss (µ∗ = 1 is omitted by symmetry).
Case 1 : Suppose µ∗ ∈ (0, 1) and V (µ∗ ) > F (µ∗ ). Then there exists (m, ν(µ∗ ) > µ∗ , c)
s.t.
V (µ∗ ) =
Fm (ν(µ∗ )) − h(c)
J(µ∗ , ν(µ∗ ))
c
1 + ρc J(µ∗ , ν(µ∗ ))
With initial condition (µ0 = µ∗ , V0 = V (µ∗ ), V0′ = 0), we solve for Vm (µ) on [µ∗ , 1] as
defined in Lemma 23. Let µ̂m be the first µ ≥ µ∗ that:
c Fm−1 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) h(c)
−
µ ≥µ,c ρ
J(µ, µ′ )
ρ
Vm (µ) = max
′
We must have Vm (µ̂m ) ≥ Fm−1 (µ) otherwise there will be a µ even smaller. Then
we solve for Vm−1 with initial condition µ0 = µ̂m , V0 = Vm (µ̂m ), V0′ = Vm′ (µ̂m ). If
m − 1 > m, we continue this procedure by looking for µ̂m−1 until we get Vm (µ). Vm will
be piecewise defined by different Vm . By definition, Vm (µ) will be smoothly increasing
until it hits F . Since V, (µ∗ ) > F (µ∗ ), this intersection point will be µ∗∗ > µ∗ .
Case 2 : Suppose µ∗ ∈ (0, 1) but V (µ∗ ) = F (µ∗ ), consider:
Let µ∗∗
c Fk (µ′ ) − F (µ) h(c)
Ve (µ) = ′max
−
µ ≥µ,k,c ρ
J(µ, µ′ )
ρ
{
}
e
= inf µ|V (µ) > F (µ) .
Case 3 : Suppose µ∗ = 0, consider:
c Fk (µ′ ) − F1 (µ) − F1′ (µ′ − µ) h(c)
−
µ ≥µ,k,c ρ
J(µ, µ′ )
ρ
{ }
sup F
≤
inf
F
.
Therefore,
inf
µ
Ṽ
(µ)
>
F
(µ)
>
There exists δ s.t. ∀µ < δ, ∀µ′ ≤ µ2 , J(µ,µ
1
′)
0. We call it µ∗∗ .
Ṽ (µ) = ′max
• Step 3: For all µ ≥ µ∗∗ such that:
c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ) h(c)
−
µ ≥µ,k,c ρ
J(µ′ , µ)
ρ
F (µ) = ′max
Let m be the optimal action. Solve for Vm with initial condition µ0 = µ, V0 =
F (µ), V0′ = F ′− (µ). Let µ̂m be the first µ ≥ µ0 that:
c Fm−1 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) h(c)
−
µ ≥µ,c ρ
J(µ, µ′ )
ρ
Vm (µ) = max
′
75
Then we solve for Vm−1 with initial condition µ0 = µ̂m , V0 = Vm (µ̂m ), V0′ = Vm′ (µ̂m ).
We continue this procedure until we get Vm0 , where m0 is the optimal action solved in
the definition of Ṽ (µ). Now suppose Vm0 first hits F (µ) at some point µ′′ , define:

′

if µ′ < µ

F (µ )
Vµ (µ′ ) =
Vm0 (µ′ ) if µ′ ∈ [µ, µ′′ ]


F (µ′ )
if µ′ > µ′′
Let Ω be the set of all these µ0 . Then either Vµ = F , or there exists open interval
(µ, µ′′ ) on which Vµ > F .
• Step 4 : Define:

Vm (µ)
V (µ) =
if µ ∈ [µ∗ , µ∗∗ ]
∗∗
 sup {Vµ0 (µ)} if µ ≥ µ
µ0 ∈Ω
Then ∀V (µ) > F (µ), there must exists µn s.t. V (µ) = limn Vµn (µ) by definition of
sup. Since Ω is a closed set, there exists µnk → µ0 . By continuous dependence,
Vµ0 (µ) = V (µ). Then in the open interval around µ that V > F , V (µ) = Vµ0 (µ).
Other uniqueness of ODE will be
violated. Therefore V will be locally defined as Vµ0
{ }
and is a smooth function on µ V (µ) > F (µ) .
In the algorithm, we only discussed the case µ∗ < 1 and constructed the value function
to the right of µ∗ . On the left of µ∗ , V can be defined by using a totally symmetric
argument by referring to Lemma 22′ .
Before we proceed to proof of smoothness and unimprovability of V , we state a useful
result:
Lemma 20. ∀µ ≥ µ∗ s.t. V (µ) = Fm (µ), we have:
c Fm+k (µ′ ) − Fm (µ) − Fm− ′ (µ′ − µ) h(c)
k
−
≜ Um
(µ)
µ ≥µ,c ρ
J(µ, µ′ )
ρ
Fm (µ) ≥ max
′
k
Suppose Lemma 20 is not true, then exists µ s.t. Um
(µ) > Fm (µ) = V (µ). F −′ is LSC
and left continuous function. Then Um k will be USC and left continuous w.r.t. µ when m
k
(µ) > Fm (µ), there exists µ0 > 0
is taken that F (µ = Fm (µ). By definition of µ∗∗ and Um
k
s.t. Um0 (µ0 ) = Fm0 (µ0 ). Take µ0 < µ to be the supremum of such µ0 . Now consider
)
(
initial condition µ0 , Fm0 (µ0 ), Fm′ 0 , by Lemma 22, we solve fore Vk (µ) on [µ0 , µ]. Now
∀µ′ ∈ (µ0 , µ) s.t. Vk (µ′ ) ≤ Fm′ (µ′ ), by immediate value theorem, µ′ can be picked that
Vk′ (µ′ ) ≤ Fm′ ′ . Therefore:
c Fk (µ′′ ) − Vk (µ′ ) − Vk′ (µ′ )(µ′′ − µ′ ) h(c)
−
J(µ′ , µ′′ )
ρ
µ′′ ,c ρ
c Fk (µ′′ ) − Fm′ (µ′ ) − Fm′ ′ (µ′ )(µ′′ − µ′ ) h(c)
≥ sup
−
J(µ′ , µ′′ )
ρ
µ′′ ,c ρ
Vk (µ′ ) = sup
≥Fm′ (µ′ )
Contradicting V (µ) = Fm (µ).
76
Lemma 20′ . ∀µ ≤ µ∗ s.t. V (µ) = Fm (µ), we have:
c Fm−k (µ′ ) − Fm (µ) − Fm+ ′ (µ′ − µ) h(c)
k
−
≜ Um
(µ)
µ ≤µ,c ρ
J(µ, µ′ )
ρ
Fm (µ) ≥ max
′
Smoothness
Given our construction of V (µ), ∀µ s.t. V (µ) > F (µ), V is piecewise solution of the
ODEs and is C (1) smooth by construction. However on {µ|V (µ) = F (µ)}, our definition
of V is by taking supremum over an uncountable set of Vµ ’s. Therefore V (µ) is not
necessarily differentiable. We now discuss smoothness of V on this set in details (we
only discuss µ ≥ µ∗ and leave the remaining case to symmetry argument). Suppose
µ ∈ {µ|V (µ) = F (µ)}o , then V = F locally on an open interval. To show smoothness of
V , it’s sufficient to show smoothness of F . Suppose not, then µ = µm . However, at µm ,
∀c > 0:
′
′
′
c Fm+1 (µ ) − Fm (µm ) − Fm (µ − µm ) h(c)
lim
−
=∞
µ′ →µ+
ρ
J(µm , µ′ )
ρ
m
Therefore, we apply the result just derived and get contradiction. Now we only need
to discuss the boundary of {µ|V (µ) = F (µ)}. The first case is that {µ|V (µ) > F (µ)} is
not dense locally. Therefore, V = F locally at only side of µ, which implies one sided
smoothness. The only remaining case is that there exists µn → µ s.t. F (µn ) < V (µn ). We
first show differentiability of V at µ. We already know that V (µ′ ) − V (µ) ≥ F ′ (µ)(µ′ − µ)
(µ)
since V ≥ F . Suppose now µn → µ+ and V (µµnn)−V
≥ F ′ (µ) + ε. ∀n, V (µn ) ≥
−µ
F (µn ) + ε(µn − µ) > F (µn ). Then V is smooth around µn . µn can be picked that
V ′ (µ′n ) ≥ F ′ (µ) + ε by Lemma 12.
Consider (ν(µn ), c(µn )) being the solution of posterior and cost level associated with
µn , by definition of µn , ν(µn ) ≥ µm+1 (when µn < µm+1 , the objective function will be
negative, therefore suboptimal for sure). So we can pick a converging subsequence of µ2n
to some ν ≥ µm+1 and c(µn ) → c. Then:
F (µ) = lim V (µn )
(
)
c(µn ) Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn ) h(c(µn ))
= lim
−
n→∞
ρ
J(µn , ν(µn ))
ρ
′
c(µn ) Fmn (ν(µn )) − F (µn ) − (F (µ) + ε)(ν(µn ) − µn ) h(c)
≤ lim
−
n→∞
ρ
J(µn , ν(µn ))
ρ
′
cε ν(µn ) − µn
h(c)
c(µn ) Fmn (ν(µn )) − F (µn ) − F (µ)(ν(µn ) − µn )
− lim
−
≤ lim
n→∞ ρ J(µn , ν(µn ))
n→∞
ρ
J(µn , ν(µn ))
ρ
′
c Fm′ (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ
h(c)
=
−
−
ρ
J(µ, ν)
ρ J(µ, ν)
ρ
<F (µ)
(µn )
Contradiction. Now suppose µn → µ− and V (µ)−V
≤ F ′ (µ) − ε. Then similarly we can
µ−µn
choose µn s.t. V ′ (µn ) ≤ F ′ (µ) − ε. Choose (ν, m, c) being the optimal posterior, action
and cost at µ. Then:
F (µ) = lim V (µn )
77
c(µn ) Fm (ν) − V (µn ) − V ′ (µn )(ν − µn ) h(c(µn ))
−
n→∞
ρ
J(µn , ν)
ρ
′
c(µn ) Fm (ν) − V (µn ) − F (µ)(ν − µn ) c ε(ν − µn ) h(c(µn ))
≥ lim
+
−
n→∞
ρ
J(µn , ν)
ρ J(µn , ν)
ρ
′
c(µn ) Fm (ν) − V (µn ) − F (µ)(ν − µn )
c(µn ) ε(ν − µn ) h(c(µn ))
≥ lim
+ lim
−
n→∞
ρ
J(µn , ν)
ρ J(µn , ν)
ρ
n→∞
′
c Fm (ν) − F (µ) − F (µ)(ν − µ) h(c) cε ν − µ
=
−
+
ρ
J(µ, ν)
ρ
ρ J(µ, ν)
≥ lim
>F (µ)
Contradiction. Therefore we showed that V will be differentiable everywhere. Now
suppose V ′ is not continuous at µ. Utilizing previous proof, we have already ruled out
the cases when limµ′ →µ+ > F ′ (µ) and limµ′ →µ− < F ′ (µ). Suppose now exists µn → µ+
and V ′ (µn ) ≤ F ′ (µ) − ε. Then consider:
F (µ) = lim V (µn )
c Fm (ν) − V (µn ) − V ′ (µn )(ν − µn ) h(c)
−
≥ lim
n→∞ ρ
J(µn , ν)
ρ
′
c Fm (ν) − V (µn ) − F (µ)(ν − µn )
c ε(ν − µn ) h(c)
≥ lim
+ lim
−
n→∞ ρ
J(µn , ν)
ρ
n→∞ ρ J(µn , ν)
′
c Fm (ν) − F (µ) − F (µ)(ν − µ) h(c) cε ν − µ
−
+
=
ρ
J(µ, ν)
ρ
ρ J(µ, ν)
>F (µ)
Contradiction. When µn → µ− and V ′ (µn ) ≥ F ′ (µ) + ε, similarly as before, we can take
ν(µn ) converging to ν ≥ µm+1 and c(µn ) converging to c. Then:
F (µ) = lim V (µn )
c(µn ) Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn ) h(c(µn ))
= lim
−
n→∞
ρ
J(µn , ν(µn ))
ρ
′
c(µn ) Fmn (ν(µn )) − F (µn ) − F (µ)(ν(µn ) − µn )
cε ν(µn ) − µn
h(c(µn ))
≤ lim
− lim
−
n→∞
n→∞ ρ J(µn , ν(µn ))
ρ
J(µn , ν(µn ))
ρ
′
c Fm′ (ν) − F (µ) − F (µ)(ν − µ) h(c) cε ν − µ
=
−
−
ρ
J(µ, ν)
ρ
ρ J(µ, ν)
<F (µ)
Contradiction. To sum up, we proved that V (µ) is differentiable on (0, 1) and V ′ (µ)
is continuous on (0, 1). What’s more, since µ∗ ∗ is bounded away from {0, 1}, in the
neighbour of {0, 1}, V = F . Therefore V (µ) is C (1) smooth on [0, 1].
Unimprovability
Now we prove the unimprovability of V (µ).
• Step 1 : We first show that V (µ) solves the following problem:
{
}
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c)
V (µ) = max F (µ), max
−
µ′ ,m,c ρ
J(µ, µ′ )
ρ
78
(P-C1)
{
µ′ ≥ µ when µ ≥ µ∗
µ′ ≤ µ when µ ≤ µ∗
Case 1 : V (µ) > F (µ). Then there exists µ0 s.t. V (µ) = Vµ0 (µ). Suppose the associated
action is m0 at µ0 , m at µ. Then the construction of Vµ0 guarantees than ∀µ′ ∈ [µ, µ0 ],
∃µ̂m′ ∈ [µ0 , µ]. Then by Lemma 24, Equation (P-C1) is satisfied at µ.
Case 2 : V (µ) = F (µ). Then there are two possibilities. If µ ∈ Ω, then by construction
of Vµ , we have:
c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ) h(c)
F (µ) = ′max
−
µ ≥µ,k,c ρ
J(µ, µ′ )
ρ
This is exactly Equation (P-C1).
The only remaining case is that µ ∈
̸ Ω and Equation (P-C1) is violated:
c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ) h(c)
F (µ) < ′max
−
µ ≥µ,k,c ρ
J(µ, µ′ )
ρ
This possibility is ruled out by Lemma 20.
• Step 2 : Then we show that V (µ) solves the following problem:
{
}
c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c)
−
V (µ) = max F (µ), max
µ′ ,c ρ
J(µ, µ′ )
ρ
{
µ′ ≥ µ when µ ≥ µ∗
(P-D1)
µ′ ≤ µ when µ ≤ µ∗
Suppose not, then there exists:
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c)
−
µ ≥µ,c ρ
J(µ, µ′ )
ρ
′′
′′
′
′′
′′
c V (µ ) − V (µ) − V (µ)(µ − µ) h(c )
≤ V (µ) <
−
ρ
J(µ, µ′ )
ρ
Ṽ = max
′
Suppose the optimizer is µ′ , m, c. Optimality implies Equation (C.3):
V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
= h′ (c′ )
J(µ, µ′ )
Together with Equation (C.1), we have c′ h′ (c′ ) = ρṼ + h(c′ ). Then combine with
Equation (C.2), we get:
{
Fm′ + h′ (c)H ′ (µ′ ) = V ′ (µ) + h′ (c′ )H ′ (µ)
(Fm (µ′ ) + h′ (c′ )H(µ′ )) − (V (µ) + h′ (c′ )H(µ)) = (V ′ (µ) + h′ (c′ )H ′ (µ))(µ′ − µ)
We define L and G as in Theorem 2. Then L will be linear and G(Fm , h′ (c′ ))(µ′ ) will
be a concave function of µ′ . Consider:
L(V, h′ (c′ ), µ)(µ′ ) − G(Fm , h′ (c′ ))
79
FOC implies that it will be convex and attains minimum 0 at µ′ . For any m′ other
than m,
L(V, h′ (c′ ))(µ′ ) − G(Fm′ , h′ (c′ ))(µ′ )
will be convex and weakly larger than zero. However:
L(V, h′ (c′ ), µ)(µ′′ ) − G(V, h′ (c′ ))(µ′′ )
= − (V (µ′′ ) − V (µ) − V ′ (µ)(µ′′ − µ) − h′ (c′ )J(µ, µ′′ ))
<0
The inequality is from definition of c′ :
c′ h′ (c′ ) − h(c′ ) < c′′ h′ (c′′ ) − h(c′′ )
=⇒ h′ (c′ ) < h′ (c′′ )
V (µ′′ ) − V (µ) − V ′ (µ)(µ′′ − µ)
=⇒ h′ (c′ ) <
J(µ, µ′′ )
Therefore, L(V, h′ (c′ ), µ)(·)−G(V, h′ (c′ ))(·) will have a strictly negative minimum. Suppose it’s minimized at µ̃, Then FOC implies:
V ′ (µ) + h′ (c′ )H ′ (µ) = V ′ (µ̃) + h′ (c′ )H ′ (µ̃)
Consider:
L (V, h′ (c′ ), µ̃) (ν(µ̃)) − G (Fm , h′ (c′ )) (ν̃)
=L (V, h′ (c′ ), µ) (ν(µ̃)) − G (Fm , h′ (c′ )) (ν(µ̃))
+ V (µ̃) − V (µ) + h′ (c′ )(H(µ̃) − H(µ)) − (V ′ (µ) + h′ (c′ )H(µ)) (µ̃ − µ)
≥V (µ̃) − V (µ) + h′ (c′ )(H(µ̃) − H(µ)) − (V ′ (µ) + h′ (c′ )H ′ (µ)) (µ̃ − µ)
=G (V, h′ (c′ )) (µ̃) − L (V, h′ (c′ ), µ) (µ̃)
>0
Let m′ , ν(µ̃), c̃ be maximizer at µ̃, c̃h′ (c̃) = ρV (µ̃) + h(c̃):
0 =L(V, f ′ (c̃), µ̃)(ν(µ̃)) − G(Fm′ , h′ (c̃))(ν(µ̃))
=L(V, h′ (c′ ), µ̃)(ν(µ̃)) − G(Fm′ , h′ (c′ ))(ν(µ̃))
+ (f ′ (c̃) − h′ (c′ ))J(µ̃, ν(µ̃))
>(f ′ (c̃) − h′ (c′ ))J(µ̃, ν(µ̃))
Since µ̃ > µ, we have h′ (c̃) − h′ (c′ ) > 0. Contradiction. Therefore we proved Equation (P-D1).
• Step 3 : We show that V satisfies Equation (8). First, since V is smooth, envelope
theorem implies:
V ′ (µ) = −
c ν−µ
(V ′′ (µ) + h′ (c)H ′′ (µ))
ρ J(µ, ν)
80
>0
=⇒ V ′′ (µ) + h′ (c)H ′′ (µ) < 0
Therefore, allocating to diffusion experiment will always be suboptimal. What’s more,
consider:
c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
µ ≤µ,c ρ
J(µ, µ′ )
c µ′ − µ
=⇒ V ′− (µ) = −
(V ′′ (µ) + h′ (c)H ′′ (µ))
′
ρ J(µ, µ )
V − (µ) = max
′
V − (µ∗ ) = V (µ∗ ) and whenever V (µ) = V − (µ), we will have V −′ (µ) < 0. Therefore,
V − (µ) can never cross from below, that is to say:
}
{
1 ′′
′
′
′
2
ρV (µ) = max ρF (µ), max
p(V (µ ) − V (µ) − V (µ)(µ − µ)) + V (µ)σ − h(c)
µ′ ,p,σ,c
2
1
s.t. pJ(µ, µ′ ) + H ′′ (µ)σ 2 = c
2
Lemma 21. Define V + and V − :
cFm (µ′ ) − h(c)J(µ, µ′ )
µ ≥µ,m,c
c + ρJ(µ, µ′ )
cFm (µ′ ) − h(c)J(µ, µ′ )
V − (µ) = ′max
µ ≤µ,m,c
c + ρJ(µ, µ′ )
V + (µ) = ′max
There exists µ∗ ∈ [0, 1] s.t. V + (µ) ≥ V − (µ) ∀µ ≥ µ∗ ; V + (µ) ≤ V − (µ) ∀µ ≤ µ∗ .
−
+
as following:
and Um
Proof. We define function Um
cFm (µ′ ) − h(c)J(µ, µ′ )
= max
µ′ ≥µ,c
c + ρJ(µ, µ′ )
cFm (µ′ ) − h(c)J(µ, µ′ )
−
U m (µ) = max
µ′ ≤µ,c
c + ρJ(µ, µ′ )
U+
m (µ)
Since h(c), Fm (µ) and J(µ, µ′ ) are all smooth functions, the objective function will be
smooth. First consider FOCs and SOCs:
)
(
) ( h(c) ρ
ρ
′
′
′
′
FOC-µ :Fm 1 + J(µ, µ ) −
+ Fm (µ ) (H ′ (µ) − H ′ (µ′ )) = 0
c
c
c
FOC-c : ρFm (µ′ ) + h(c) − h′ (c) (c + ρJ(µ, µ′ )) = 0
[
]
c(ρFm (µ′ ) + h(c))(c + ρJ(µ, µ′ ))H ′′ (µ′ )
0
SOC : H =
0
−J(µ, µ′ )(c + ρJ(µ, µ′ ))2 h′′ (c)
Noticing that SOC is evaluated at the pairs (µ′ , c) at which FOC holds.
Remark. Details of calculation of second derivatives:
81
• Hµ′ ,µ′ :
∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ )
∂µ′2
c + ρJ(µ, µ′ )
[
1
2
′
′
′
′
′ 2
=
3 2ρ (cFm (µ ) − h(c)J(µ, µ )) (H (µ) − H (µ ))
′
(c + ρJ(µ, µ ))
− 2ρ (c + ρJ(µ, µ′ )) (H ′ (µ) − H ′ (µ′ )) (cFm′ − h(c)(H ′ (µ) − H ′ (µ′ )))
+ ρ(c + ρJ(µ, µ′ ))(cFm (µ′ ) − h(c)J(µ, µ′ ))H ′′ (µ′ )
]
′ 2
′′
′
+(c + ρJ(µ, µ )) h(c)H (µ )
(h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ ))
c + ρJ(µ, µ′ )
∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ )
=⇒
∂µ′2
c + ρJ(µ, µ′ )
[
1
=
2ρ2 (cFm (µ′ ) − h(c)J(µ, µ′ )) (H ′ (µ) − H ′ (µ′ ))2
(c + ρJ(µ, µ′ ))3
FOC-µ′ =⇒ Fm′ =
+ 2ρ(c + ρJ(µ, µ′ ))h(c)(H ′ (µ) − H ′ (µ′ ))2
− 2ρ(h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ ))2
+ ρ(c + ρJ(µ, µ′ ))(cFm (µ′ ) − h(c)J(µ, µ′ ))H ′′ (µ′ )
]
′ 2
′′
′
+(c + ρJ(µ, µ )) h(c)H (µ )
[
1
=
ρ(c + ρJ(µ, µ′ ))(cFm (µ′ ) − h(c)J(µ, µ′ ))H ′′ (µ′ )
(c + ρJ(µ, µ′ ))3
]
′ 2
′′
′
+(c + ρJ(µ, µ )) h(c)H (µ )
=(c + ρJ(µ, µ′ ))H ′′ (µ′ ) (ρcFm (µ′ ) − ρh(c)J(µ, µ′ ) + ch(c) + ρh(c)J(µ, µ′ ))
c(ρFm (µ′ ) + h(c))(c + ρJ(µ, µ′ ))H ′′ (µ′ )
=
(c + ρJ(µ, µ′ ))3
• Hc,c :
∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ )
∂c2
c + ρJ(µ, µ′ )
[
1
=
2 (cFm (µ′ ) − h(c)J(µ, µ′ ))
(c + ρJ(µ, µ′ ))3
− 2(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ ))
]
′
′ 2 ′′
− J(µ, µ )(c + ρJ(µ, µ )) h (c)
FOC-c =⇒ cFm (µ′ ) − h(c)J(µ, µ′ ) = (c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ ))
∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ )
∂c2
c + ρJ(µ, µ′ )
[
1
=
2(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ ))
′
3
(c + ρJ(µ, µ ))
=⇒
82
− 2(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ ))
]
′
′ 2 ′′
− J(µ, µ )(c + ρJ(µ, µ )) h (c)
=
−J(µ, µ′ )(c + ρJ(µ, µ′ ))2 h′′ (c)
(c + ρJ(µ, µ′ ))3
• Hµ′ ,c :
∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ )
∂c∂µ′
c + ρJ(µ, µ′ )
[
1
2ρ(cFm (µ′ ) − h(c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ ))
=
′
3
(c + ρJ(µ, µ ))
− ρ(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ ))
− (c + ρJ(µ, µ′ ))(cFm′ − h(c)(H ′ (µ − H ′ (µ′ ))))
′
2
(Fm′
′
′
′
]
′
− h (c)(H (µ) − H (µ )))
+ (c + ρJ(µ, µ ))
[
1
=
2ρ(cFm (µ′ ) − h(c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ ))
′
3
(c + ρJ(µ, µ ))
− ρ(cFm (µ′ ) − h(c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ ))
(h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ ))
− (c + ρJ(µ, µ′ ))(c
h(c)(H ′ (µ − H ′ (µ′ ))))
c + ρJ(µ, µ′ )
)]
(
(h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ ))
′
′
′
′
′ 2
− h (c)(H (µ) − H (µ ))
+ (c + ρJ(µ, µ ))
c + ρJ(µ, µ′ )
H ′ (µ) − H ′ (µ′ )
=
(ρcFm (µ′ ) − ρh(c)J(µ, µ′ ) − c(h(c) + ρFm (µ′ ))
(c + ρJ(µ, µ′ ))3
)
+(c + ρJ(µ, µ′ ))h(c) + (c + ρJ(µ, µ′ )(h(c) + ρFm (µ′ ))) − (c + ρJ(µ, µ′ ))2 h′ (c)
=0
The only term we don’t know its sign is
ρFm (µ′ ) + h(c) =
c + ρJ(µ, µ′ ) ′
F
H ′ (µ) − H ′ (µ′ ) m
Therefore, H will be ND if µ′ > µ and Fm′ > 0, or µ′ < µ and Fm′ < 0. In these cases,
FOC uniquely characterizes the maximum. Suppose µ′ > µ and Fm′ < 0 or µ′ < µ
and Fm′ > 0, the H will never be ND, and choice of µ′ will be on boundary. What’s
more, simple calculation shows that choosing µ′ = µ will dominate choosing µ′ = 0, 1.
Therefore:
′
U+
m (µ) = Fm (µ) when Fm < 0
′
U−
m (µ) = Fm (µ) when Fm > 0
When Fm′ > 0, envelope condition implies:
(
)
−H ′′ (µ)(µ′ − µ) h(c) + ρc Fm (µ′ )
d +
U (µ) =
>0
)2
(
dµ m
1 + ρ J(µ, µ′ )
c
83
Similarly, when Fm′ < 0, envelope condition implies:
(
)
−H ′′ (µ)(µ′ − µ) h(c) + ρc Fm (µ′ )
d −
U (µ) =
<0
(
)2
dµ m
1 + ρ J(µ, µ′ )
c
−
Therefore, U +
m and U m have exactly the same properties as in Lemma 13, the rest of
∗
proofs simply follow Lemma 13. What’s more, we define νm
and c∗m as the maximizer in
this problem.
Lemma 22. Assume µ0 ≥ µ∗ , Fm′ ≥ 0, V0 , V0′ satisfies:


V (µ0 ) ≥ V0 > Fm (µ0 )
c Fm (µ′ ) − V0 − V0′ (µ′ − µ) h(c)

−
V0 = max
µ′ ≥µ,c ρ
J(µ, µ′ )
ρ
Then there exists a C (1) smooth and strictly increasing V (µ) defined on [µ0 , 1] satisfying:
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c)
V (µ) = max
−
µ′ ≥µ,c ρ
J(µ, µ′ )
ρ
(B.7-c)
and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ .
Proof. We start from deriving FOC and SOC for Equation (B.7-c):
( ′
)
Fm − V ′ (µ) Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) ′ ′
′ c
′
FOC-µ :
+
(H (µ ) − H (µ)) = 0
ρ
J(µ, µ′ )
J(µ, µ′ )2
)
(
1 Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
− h′ (c) = 0
FOC-c:
′
ρ
J(µ, µ )
[ −2(H ′ (µ)−H ′ (µ′ ))(FOC-µ′ )
]
c (Fm (µ′ )−V (µ)−V ′ (µ)(µ′ −µ))H ′′ (µ′ )
1
′
+
FOC-µ
′
′
2
J(µ,µ )
ρ
J(µ,µ )
c
SOC:
H=
′′
1
′
FOC-µ
− h ρ(c)
c
Noticing that Hc,c < 0, therefore c satisfying FOC will be unique given µ, µ′ . On the other
hand, FOC-µ′ is independent of c. Hµ′ ,µ′ < 0 when FOC-µ′ ≥ 0. Therefore, solution of
F)C-µ′ will be unique. When FOCs are satisfied, H is strictly ND, then the solution of
FOCs are going to be maximizer. Therefore, FOC-µ′ and FOC-c uniquely characterize
optimal choice of µ′ , c. Now we impose feasibility:
V (µ) =
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c)
−
ρ
J(µ, µ′ )
ρ
(C.1)
FOCs reduces to:
ρV (µ) + h(c) ′ ′
(H (µ ) − H ′ (µ)) = 0
c
FOC-c: ch′ (c) = ρV (µ) + h(c)
FOC-µ′ :(Fm′ − V ′ (µ)) +
Differentiate FOC-c, we get:

ch′ (c) − h(c)


V (µ) =
ρ
′′

ch (c)

V ′ (µ) =
ċ
ρ
84
(C.2)
(C.3)
(C.4)
Plug Equation (C.4) into Equation (C.2) and Equation (C.1):

ρ

ċ
=
(Fm′ + h′ (c)(H ′ (µ′ ) − H ′ (µ)))


′′
ch (c)
(
)
h(c)
+
ρF
(µ)
1

m
′

c−
J(µ , µ) =
ρ
h′ (c)
(C.5)
We obtained an equation system with one ODE of (c, ċ) and one regular equation for
µ′ . Since J(µ′ , µ) is strictly monotonic for µ′ ≥ µ, we can also define an implicit inverse
function M to eliminate µ′ in the equation.
J(M (y, µ), µ) = y
Therefore we get an ODE:
(
( ( ( (
)) )
))
ρ
1
h(c) + ρFm (µ)
′
′
′
′
ċ = ′′
Fm + h (c) H M
c−
, µ − H (µ)
ch (c)
ρ
h′ (c)
(C.6)
We define cm (µ0 )f ′ (cm (µ0 )) − f (cm (µ0 )) = ρFm (µ) when this equation has solution and
cm (µ) = 0 when ρFm (µ) is so small that this equation has no solution. Since Fm (µ) is
increasing in µ, cm (µ) is increasing and strictly increasing when cm (µ) > 0. We consider
the initial conditions:
c0 h′ (c0 ) − h(c0 )
≤ V (µ0 )
Fm (µ0 ) < V0 =
ρ
=⇒ cm (µ0 ) < c0 ≤ c∗m (µ0 )
Then Lemma 23 guaranteed the existence of an increasing function c(µ) on [µ0 , 1].
Lemma 23. Define M as J(M (y, µ), µ) = y. Assume µ0 ∈ [µ∗ , 1), c0 satisfies:
cm (µ0 ) < c0 ≤ c∗m (µ0 )
Then there exists a C (1) and strictly increasing c on [µ0 , 1] satisfying initial condition
c(µ0 ) = c0 . On {µ|c(µ) > cm (µ)}, c solves:
(
( ( ( (
)) )
))
ρ
1
h(c) + ρFm (µ)
′
′
′
′
ċ = ′′
Fm + h (c) H M
c−
, µ − H (µ)
(C.6)
ch (c)
ρ
h′ (c)
Proof. We first characterize some useful properties of the ODE. We denote the ODE by
ċ = R(µ, c).
• Domain: By definition of cm (µ), ∀µ ∈ (0, 1)
cm (µ) −
h(cm (µ)) + ρFm (µ)
=0
f ′ (cm (µ))
Since cm ≥ 0, then h(cm (µ)) + ρFm (µ) ≥ 0. Therefore at c = cm (µ):
(
)
∂
h(c) + ρFm (µ)
h(c) + ρFm (µ) ′′
c−
=
h (c) > 0
′
∂c
h (c)
h′ (c)2
Therefore, ∀c ≥ cm (µ), c −
h(c)+ρFm (µ)
h′ (c)
≥ 0. Strictly inequality holds when c > cm (µ).
m (µ)
On the other hand, when c < cm (µ), if Fm (µ) ≥ 0, then c − h(c)+ρF
< 0. Else if
h′ (c)
Fm (µ) ≤ 0, then cm (µ) = 0. Since M only applies to non-negative reals, we know that
the ODE is only well defined in the region: {c|c ≥ cm (µ)}.
85
c
1.0
0.8
0.6
0.4
0.2
0.0
0.0
0.2
0.4
0.6
c(μ)
0.8
1.0
μ
cm * (μ)
Figure C.10: Phase diagram of (µ̇, ċ).
• Continuity: It’s straight forward that the ODE is well behaved (satisfying PicardLindelof) when µ is strictly bounded away from {0, 1}, c is uniformly bounded away
from cm (µ).
• Monotonicity: When c = c∗m (µ), ċ = 0. This can be shown by considering FOC on c∗m :
{
Fm′ − h′ (c)(H ′ (µ) − H ′ (µ′ )) = 0
(c + ρJ(µ, µ′ ))h′ (c) = h(c) + ρFm (µ′ )
=⇒ (c − ρJ(µ′ , µ))h′ (c) = h(c) + ρFm (µ) + ρFm′ (µ′ − µ) + h′ (c)(H ′ (µ′ ) − H ′ (µ))(µ′ − µ)
=⇒ (c − ρJ(µ′ , µ))h′ (c) = h(c) + ρFm (µ)
( ( ( (
) ))
)
h(c) + ρFm (µ)
1
′
′
′
′
=⇒ Fm + h (c) H M
c−
,µ
− H (µ) = 0
ρ
h′ (c)
=⇒ ċ = R(µ, c) = 0
Then we consider the monotonicity of R(µ, c):
∂
H ′′ (M )
1 h(c) + ρFm (µ) ′′
′′
′
′
′
R(µ, c) = h (c) (H (M ) − H (µ)) + h (c) ′′
h (c) < 0
∂c
H (M )(µ − M ) ρ
h′ (c)2
Therefore, R(µ, c) will be positive in {cm (µ) < c ≤ c∗m (µ)}. This refers to the blue
region in Figure C.10.
∀δ > 0, we consider solving the ODE ċ = R(µ, c) in region: µ ∈ [δ, 1 − δ], c ∈
[cm (µ) + δ, c∗m (µ)]. The initial condition (µ0 , c0 ) is in the blue region of Figure C.10.
Picard-Lindelof guarantees a unique solution satisfying the ODE in the region. What’s
more, it’s straight forward that the solution c(µ) will be increasing. A solution is a blue
86
line with arrows in Figure C.10. A solution c(µ) will lie between cm (µ) and c∗m (µ) until
it hits the boundary of region.
Now we can take δ → 0 and extend c(µ) towards the boundary. Since the end point
of c(µ) has both µ, c monotonically increasing, there is a limit c, µ with cm (µ) = c. Then
′
since R(µ, c) has a limit chρF′′m
, we actually have limµ→µ V ′ (µ) = Fm′ by Equation (C.4).
(c)
So the resulting V (µ) calculated from
V (µ) =
c(µ)h′ (c(µ)) − h(c(µ))
ρ
will be smooth on [µ0 , 1].
Lemma 22′ . Assume µ0 ≤ µ∗ , Fm′ ≥ 0, V0 , V0′ satisfies:


V (µ0 ) ≥ V0 > Fm (µ0 )
c Fm (µ′ ) − V0 − V0′ (µ′ − µ) h(c)

−
V0 = max
µ′ ≤µ,c ρ
J(µ, µ′ )
ρ
Then there exists a C (1) smooth and strictly decreasing V (µ) defined on [0, µ0 ] satisfying:
c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c)
−
µ ≤µ,c ρ
J(µ, µ′ )
ρ
V (µ) = max
′
(B.7-c’)
and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ .
Lemma 23′ . Define M as J(M (y, µ), µ) = y. Assume µ0 ∈ (0, µ∗ ], c0 satisfies:
cm (µ0 ) < c0 ≤ c∗m (µ0 )
Then there exists a C (1) and strictly decreasing c on [0, µ0 ] satisfying initial condition
c(µ0 ) = c0 . On {µ|c(µ) > cm (µ)}, c solves:
(
( ( ( (
)) )
))
ρ
1
h(c) + ρFm (µ)
′
′
′
′
ċ = ′′
Fm + h (c) H M
c−
, µ − H (µ)
(C.6’)
ch (c)
ρ
h′ (c)
Lemma 24. Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies:

′
′
′
′
′
′

V0 = max c Fm−k (µ ) − V0 − V0 (µ − µ) − h(c) ≥ max c Fm (µ ) − V0 − V0 (µ − µ) − h(c)
µ′ ≥µ,c ρ
µ′ ≥µ,c ρ
J(µ, µ′ )
ρ
J(µ, µ′ )
ρ

V (µ ) ≥ V ≥ F
(µ )
0
0
m−k
0
Vm−k is the solution as defined in Lemma 23 with initial condition µ0 , V0 , V0′ , then ∀µ ∈
[µ0 , ν(µ0 )]:
Vm−k (µ) ≥
′
(µ)(µ′ − µ) h(c)
c Fm′ − Vm−k (µ) − Vm−k
−
µ′ ≥µ,µ ∈[m−k,m],c ρ
J(µ, µ′ )
ρ
max
′
Proof. We first show that:
V0 ≥
c Vm−k (µ′ ) − V0 − V0′ (µ′ − µ) h(c)
−
µ′ ∈[µ0 ,µm ],c ρ
J(µ, µ′ )
ρ
max
87
Suppose not, then there exists µ′ , c′ s.t.

c′ Vm−k (µ′ ) − V0 − V0′ (µ′ − µ) h(c′ )


V
<
−
 0
ρ
J(µ, µ′ )
ρ
′
′
′

V
(µ ) − V0 − V0 (µ − µ)

 m−k
= h′ (c′ )
J(µ, µ′ )
(C.7)
Let c0 h′ (c0 ) = ρV0 + h(c0 ), then optimality implies Equation (C.1) and Equation (C.2):
{
′
Fm−k
+ h′ (c0 )H ′ (ν(µ)) = V0′ + h′ (c0 )H ′ (µ)
(Fm−k (ν(µ)) + h′ (c0 )H(ν(µ))) − (V0 + h′ (c0 )H(µ)) = (Vo′ + h′ (c0 )H ′ (µ)) (ν(µ) − µ)
We define L(V, λ, µ)(µ′ ) and G(V, λ)(µ) as Equation (B.15), Equation (B.16). Consider:
L (Vm−k , h′ (c0 ), µ0 ) (µ′ ) − G (Fm−k , h′ (c0 )) (µ′ )
L is a linear function and G is a concave function. Therefore this is a convex function and
have unique minimum determined by FOC. Simple calculation shows that it is minimized
at ν(µ0 ) and the minimal value is 0. Now consider
L (Vm−k , h′ (c0 ), µ0 ) (µ′ ) − G (Vm−k , h′ (c0 )) (µ′ )
′
= − (Vm−k (µ′ ) − Vm−k (µ0 ) − Vm−k
(µ0 )(µ′ − µ0 ) − h′ (c0 )J(µ0 , µ′ ))
<0
The inequality is from Equation (C.7) and definition of c0 :
c0 h′ (c0 ) − h(c0 ) < c′ h′ (c′ ) − h(c′ )
=⇒ h′ (c0 ) < h′ (c′ )
Vm−k (µ′ ) − V0 − V0′ (µ′ − µ0 )
=⇒ h′ (c0 ) <
J(µ0 , µ′ )
Therefore L (Vm−k , h′ (c0 ), µ0 ) (µ′ )−G (Vm−k , h′ (c0 )) (µ′ ) will be strictly negative at µ′ and
will have minimum strictly negative. Suppose it’s minimized at µ′′ , then FOC implies:
′
′
Vm−k
(µ0 ) + h′ (c0 )H ′ (µ0 ) = Vm−k
(µ′′ ) + h′ (c0 )H(µ′′ )
Let c′′ h′ (c′′ ) = ρVm−k (µ′′ ) + h(c′′ ), then we have c′′ > c0 and h′ (c′′ ) > h′ (c0 ). Consider:
L(Vm−k , h′ (c0 ), µ′′ )(ν(µ′′ )) − G(Fm−k , h′ (c0 ))(ν(µ′′ ))
=L(Vm−k , h′ (c0 ), µ0 )(ν(µ′′ )) − G(Fm−k , h′ (c0 ))(ν(µ′′ ))
+ Vm−k (µ′′ ) − Vm−k (µ0 ) + h′ (c0 )(H(µ′′ ) − H(µ0 )) − (V ′ (µ0 ) + h′ (c0 ))(µ′′ − µ0 )
≥Vm−k (µ′′ ) − Vm−k (µ0 ) + h′ (c0 )(H(µ′′ ) − H(µ0 )) − (V ′ (µ0 ) + h′ (c0 ))(µ′′ − µ0 )
=G(Vm−k , h′ (c0 ))(µ′′ ) − L(Vm−k , h′ (c0 ), µ0 )(µ′′ )
>0
However:
0 =L(Vm−k , h′ (c′′ ), µ′′ )(ν(µ′′ )) − G(Fm−k , h′ (c′′ ))(ν(µ′′ ))
88
=L(Vm−k , h′ (c0 ), µ′′ )(ν(µ′′ )) − G(Fm−k , h′ (c0 ))(ν(µ′′ ))
+ (f ′ (µ′′ ) − h′ (c0 ))(H(µ′′ ) − H(ν(µ′′ )) + H ′ (µ′′ )(ν(µ′′ ) − µ′′ ))
>(h′ (c′′ ) − h′ (c0 ))J(µ′′ , ν(µ′′ ))
>0
Contradiction.
Now we show Lemma 24. Suppose it’s not true, then there exists µ′ ∈ (µ0 , ν(µ0 )),
µ′′ ≥ µm , and c′′ s.t.

′
′
(µ′ ) − Vm−k
(µ′ )(µ′′ − µ′ ) h(c′′ )
c′′ Fm′ (µ′′ ) − Vm−k
′


−
V
(µ
)
<
 m−k
ρ
J(µ′ , µ′′ )
ρ
′′
′
′
′
′
′′
′

F ′ (µ ) − Vm−k (µ ) − Vm−k (µ )(µ − µ )

 m
= h′ (c′′ )
J(µ′ , µ′′ )
If we let c′ h′ (c′ ) = ρV (µ′ ) + h(c′ ), then c′ > c0 and h′ (c′ ) > h′ (c0 ). By definition:
0 ≤L(Vm−k , h′ (c0 ), µ0 )(µ′′ ) − G(Fm′ , h′ (c0 ))(µ′′ )
=L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c0 ))(µ′′ )
0 ≤L(Vm−k , h′ (c0 ), µ0 )(µ′ ) − G(Fm′ , h′ (c0 ))(µ′ )
=L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′ ) − G(Fm′ , h′ (c0 ))(µ′ )
=⇒ L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c′ ))(µ′′ )
=L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c0 ))(µ′′ )
+ (h′ (c′ ) − h′ (c0 ))J(µ0 , µ′′ )
>0
L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c′ ))(µ′′ )
=L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′ ) − G(Fm′ , h′ (c0 ))(µ′ )
+ (h′ (c′ ) − h′ (c0 ))J(µ0 , µ′ )
>0
No we consider L(Vm−k , h′ (c′ ), µ′ )(·) and L(Fm−k , h′ (c′ ), ν(µ0 ))(·):


L(Vm−k , h′ (c′ ), µ′ )(µ′ ) = G(Vm−k , h′ (c′ ))(µ′ )



L(v
′ ′
′
′ ′
m−k , h (c ), µ )(ν(µ0 )) ≥ G(Vm−k , h (c ))(ν(µ0 ))

L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′ ) > G(Vm−k , h′ (c′ ))(µ′ )




L(Fm−k , h′ (c′ ), ν(µ0 ))(ν(µ0 )) = G(Vm−k , h′ (c′ ))(ν(µ0 ))
{
L(Vm−k , h′ (c′ ), µ′ )(ν(µ0 )) ≥ L(Fm−k , h′ (c′ ), ν(µ0 ))(ν(µ0 ))
=⇒
L(Vm−k , h′ (c′ ), µ′ )(µ′ ) < L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′ )
d
d
Since both functions are linear: dµ
L(Vm−k , h′ (c′ ), µ′ )(µ) > dµ
L(Fm−k , h′ (c′ )ν(µ0 ))(µ),
then L(Vm−k , h′ (c′ ), µ′ )(·) must be larger than L(Fm−k , h′ (c′ ), ν(µ0 ))(·) at any µ′′ ≥ ν(µ0 ).
This implies:
L(Vm−k , h′ (c′ ), µ′ )(µ′′ ) > G(Fm′ , h′ (c′ ))(µ′′ )
Contradicting the assumption.
89
Lemma 24′ . Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies:

′
′
′
′
′
′

V0 = max c Fm+k (µ ) − V0 − V0 (µ − µ) − h(c) ≥ max c Fm (µ ) − V0 − V0 (µ − µ) − h(c)
µ′ ≤µ,c ρ
µ′ ≤µ,c ρ
J(µ, µ′ )
ρ
J(µ, µ′ )
ρ

V (µ ) ≥ V ≥ F
(µ )
0
0
m+k
0
Vm+k is the solution as defined in Lemma 23 with initial condition µ0 , V0 , V0′ , then ∀µ ∈
[ν(µ0 ), µ0 ]:
Vm+k (µ) ≥
′
(µ)(µ′ − µ) h(c)
c Fm′ − Vm−k (µ) − Vm−k
−
µ′ ≤µ,µ ∈[m,m+k],c ρ
J(µ, µ′ )
ρ
max
′
C.2. Continuum of Actions
C.2.1. Proof of Lemma 6
Proof. We prove with two steps:
Step 1 : We first show that if we let Vdt (F ) be the solution to Equation (3), then
Vdt is Lipschitz continuous in F under L∞ norm. ∀F1 , F2 convex and with bounded
subdifferentials, consider F = max {F1 ,2 }, F = min {F1 , F2 }. Then by properties of
convex functions, F , F are convex. ∂F (µ), ∂F (µ) ∈ {∂F1 (µ), ∂F2 (µ)}. Therefore F and
F are both within the domian of convex and bounded subdifferential functions with the
following quantitative property:
{
F ≥ F1 , F2 ≥ F
F − F = |F1 − F2 |
Noticing that Vdt is the fixed point of operator T defined by Equation (A.6). It’s not
hard to see that T is monotonicall increasing in F . Therefore, we have:
Vdt (F ) ≤ Vdt (F1 ), Vdt (F2 ) ≤ Vdt (F )
Now let (pi , µi ) be the policy solving Vdt (F ). Let V dt = Vdt (F ), V dt = Vdt (F ). Then
consider:
∑
V dt (µ) ≥1V dt (µ)≤F (µ) F (µ) + 1V dt (µ)>F (µ) e−ρdt
p1i (µ)V dt (µ1i )
∑
p1i (µ)1V dt (µ1i )≤F (µ1i ) F (µ1i )
≥1V dt (µ)≤F (µ) F (µ) + 1V dt (µ)>F (µ) e−ρdt
∑
∑
p1i (µ)1V dt (µ1i )>F (µ1i )
p2i (µ1i )V dt (µ2i )
+ 1V dt (µ)>F (µ) e−2ρdt
≥···
∑ ∏
∑
∑
t
pτi (µτi −1 )1V dt (µτi )>F (µτi )
=
e−ρt·dt
pti (µt−1
i )1V dt (µti )≤F (µti ) F (µi )
t
≥
∑
t
−
i1 ,...,it−1
e−ρt·dt
∑ ∏
pτi (µτi −1 )1V dt (µτi )>F (µτi )
i1 ,...,it−1
∑ ∑ ∏
t
pτi (µτi −1 )1V dt (µτi )>F (µτi )
i1 ,...,it−1
=V dt (µ) − F − F 90
∑
∑
t
pti (µt−1
i )1V dt (µti )≤F (µti ) F (µi )
pti (µt−1
i )1V dt (µti )≤F (µti ) F − F
Therefore, V dt − V dt ≤ F − F =⇒ |Vdt (F1 ) − Vdt (F2 )| ≤ |F1 − F2 |. Vdt (F ) has
Lipschitz parameter 1.
Step 2 : ∀F1 , F2 , ∀ε > 0, by Theorem 1, there exists dt s.t. |V(Fi ) − Vdt (Fi )| ≤
ε |F1 − F2 |. Therefore:
|V(F1 ) − V(F2 )| ≤ |V(F1 ) − Vdt (F1 )| + |V(F2 ) − Vdt (F2 )| + |Vdt (F1 ) − Vdt (F2 )|
≤(1 + 2ε) |F1 − F2 |
Take ε → 0, since LHS is not a function of ε, we conclude that V(F ) is Lipschitz continuous in F with Lipschitz parameter 1.
C.2.2. Proof of Theorem 4
Proof. We prove the three main results in following steps:
• Lipschitz continuity. By Lemma 6, we directly get Lipschitz continuity of operator V
on {Fn , F } and the Lipschitz parameter being 1.
• Convergence of derivatives. Let Vn = V(Fn ), V = V(F ), we show that ∀µ s.t. V (µ) >
F (µ), V ′ (µ) = lim Vn′ (µ). Since V (µ) > F (µ), by continuity strict inequality holds
in an closed interval [µ1 , µ2 ] around µ. Then by Lemma 26, limn→∞ Vn′ (µ′ ) exists
∀µ′ ∈ [µ1 , µ2 ]. Now consider function Vn′ (µ). Since Vn′′ (µ) is uniformly bounded for all
n, Vn′ (µ) are uniformly Lipschitz continuous, thus equicontinuous and totally bounded.
Therefore by lemma Arzela-Ascolli, Vn′ converges uniformly to lim Vn′ . By convergence
theorem of derivatives, V ′ = lim Vn′ on [µ1 , µ2 ]. Therefore, V ′ (µ) = limn→∞ Vn′ (µ).
• Feasibility. For µ s.t. V (µ) = F (µ), feasibility is trivial. Now we discuss the case
V (µ) > F (µ). We only prove for µ > µ∗ and µ = µ∗ , the case µ < µ∗ follows by
symmetry. If µ > µ∗ , there exists N s.t. ∀n ≥ N , µ > µ∗n . N can be picked large
enough that in a closed interval around µ, Vn (µ) > Fn (µ). Therefore, there exists νn
being maximizer for Vn (µ) bounded away from µ and satisfying:
Vn (µ) =
c Fn (νn ) − Vn (µ) − Vn′ (µ)(νn − µ)
ρ
J(µ, νn )
Pick a converging subsequence νn → ν:
c F (ν) − V (µ) − V ′ (µ)(ν − µ)
ρ
J(µ, νn )
c Fn (νn ) − Vn (ν) − Vn′ (ν)(νn − µ)
= lim
n→∞ ρ
J(µ, νn )
= lim Vn (µ)
n→∞
=V (µ)
Therefore V (µ) is feasible in Equation (B.3).
Suppose µ = µ∗ . Then there exists a subsequence of µ∗n converging from one side of
µ∗ . Suppose they are converging from left. Then µ ≥ µ∗n . Previous proof still works.
Essentially, what we showed is that the limit of strategy in discrete action problem
achieves V (µ) in the continuous action limit.
91
• Unimprovability. First, when µ ∈ {0, 1}, information provides no value but discounting
is costly, therefore V (µ) is unimprovable. We now show unimprovability on (0, 1) by
adding more feasible information acquisition strategies in several steps.
– Step 1. Poisson experiments at V (µ) > F (µ). In this step, we show that ∀µ ≥ µ∗
and V (µ) > F (µ):
ρV (µ) = max
c
′
µ ≥µ
F (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
J(µ, µ′ )
Suppose not true, then there exists ν s.t.:
lim ρVn (µ) =ρV (µ)
n→∞
F (ν) − V (µ) − V ′ (ν)(ν − µ)
<c
J(µ, ν)
Fn (ν) − Vn (µ) − Vn′ (µ)(ν − µ)
= lim c
n→∞
J(µ, ν)
≤ lim ρVn (µ)
n→∞
Second line is by the opposite proposition. Third line is by convergence of Fn by
assumption, convergence of Vn by Lemma 6 and convergence of Vn′ by Lemma 26.
Last inequality is by suboptimality of ν.
Similarly, for the case µ ≤ µ∗ , we can apply a symmetric argument to prove.
– Step 2. Poisson experiments at V (µ) = F (µ). In this step, we shoe that ∀µ ≥ µ∗
and V (µ) = F (µ) (The symmetric case µ ≤ µ∗ is ommited):
F (µ′ ) − F (µ) − DV (µ, µ′ )(µ′ − µ)
ρF (µ) ≥ max
c
µ′ ≥µ
J(µ, µ′ )
First of all, we show that V is differentiable at µ and V ′ (µ) = F ′ (µ). Suppose not,
then since V (µ) = F (µ) and V ≥ F , we know that V − F is locally minimized at
µ. Therefore DV+ (µ) > DV− (µ). By Definition 1, there exists ε > 0, µn1 ↗ µ and
V (µn
V (µ)−V (µn )
2 )−V (µ)
µn2 ↘ µ s.t.
≥ ε + µ−µn 1 . Let δ1n = µ − µn1 , δ2n = µn2 − µ, this implies:
µn −µ
2
1
µn2 − µ
(µn2 − µ)(µ − µn1 )
µ − µn1
n
n
)
−
V
(µ))
+
)
−
V
(µ))
≥
ε
(V
(µ
(V
(µ
2
1
µn2 − µn1
µn2 − µn1
µn2 − µn1
µn2 − µ
µ − µn1
n
V
(µ
)
+
V (µn1 ) ≥ V (µ) + ε · min {δ1n , δ2n }
=⇒ n
2
n
n
n
µ2 − µ1
µ2 − µ1
On the other hand:
µ − µn1
µn2 − µ
n
(H(µ)
H(µ
))
+
(H(µ) − H(µn1 ))
−
2
n
n
n
n
µ2 − µ1
µ2 − µ1
(
)
n
1 ′′ n
µ − µ1
′
n
n
H (µ)(µ − µ2 ) + H (ξ2 )(µ − µ2 )
= n
µ2 − µn1
2
(
)
n
1 ′′ n
µ2 − µ
′
n
n 2
H (µ)(µ − µ1 ) + H (ξ1 )(µ − µ1 )
+ n
µ2 − µn1
2
92
=
1 (µn2 − µ)(µ − µn1 )
(H ′′ (ξ2n )(µn2 − µ) + H ′′ (ξ1n )(µ − µn1 ))
2
µn2 − µn1
ξ1n and ξ2n are determined by applying intermediate value theorem on H ′ . Now we
can choose N s.t. ∀n ≥ N , maxµ′ ∈[µn1 ,µn2 ] {H ′′ (µ′ )} ≤ 2H ′′ (µ). Therefore:
µ − µn1
µn2 − µ
n
(H(µ)
−
H(µ
))
+
(H(µ) − H(µn1 ))
2
µn2 − µn1
µn2 − µn1
≤H ′′ (µ)(µn2 − µ)(µ − µn1 )
=H ′′ (µ)δ1n δ2n
Now we consider a stationary experiment at µ that takes any experiment with posc
teriors (µn1 , µn2 ) with flow probability H ′′ (µ)δ
n n . Then by definition the flow cost of
1 δ2
this information acquisition strategy is less than c, thus is feasible. The expected
utility is:
c
Ve (µ) =
ρ
µ−µn
1
nV
µn
2 −µ1
µ−µn
1
n
µn
2 −µ1
µn
n
2 −µ
e
n V (µ1 ) − V (µ)
µn
2 −µ1
µn −µ
H(µn2 )) + µn2−µn (H(µ) − H(µn1 ))
2
1
(µn2 ) +
(H(µ) −
V (µ) − Ve (µ) + ε min {δ1n , δ2n }
H ′′ (µ)δ1n δ2n
V (µ) + ε min {δ1n , δ2n }
=⇒ Ve (µ) ≥
1 + ρc H ′′ (µ)δ1n δ2n
ε min {δ1n , δ2n } − ρc H ′′ (µ)δ1n δ2n
=V (µ) +
1 + ρc H ′′ (µ)δ1n δ2n
ε − H ′′ (µ) max {δ1n , δ2n }
=V (µ) + min {δ1n , δ2n }
1 + ρc H ′′ (µ)δ1n δ2n
≥
n can be pick large enough that ε−H ′′ (µ) max {δ1n , δ2n } is positive. Therefore Ve (µ) >
V (µ). Now fix n and define:
c
Vem (µ) =
ρ
=⇒
µn
n
2 −µ
e
n Vm (µ1 ) − Vm (µ)
µn
2 −µ1
µn −µ
H(µn2 )) + µn2−µn (H(µ) − H(µn1 ))
2
1
µ−µn
n
1
n Vm (µ2 )
µn
2 −µ1
µ−µn
1
n
µn
2 −µ1
(H(µ) −
+
lim Vem (µ) = Ve (µ) > lim Vm (µ)
m→∞
m→∞
There exists m large enough that Vem (µ) > Vm (µ), violating optimality of Vm . Contradiction. Therefore, we showed that V ′ (µ) = F ′ (µ).
Next we show unimperovability. Suppose not, then ∃ν s.t.:
F (µ) <
c F (ν) − F (µ) − F ′ (µ)(ν − µ)
ρ
J(µ, ν)
By continuity of V , ∃ε and a neighbourhood µ ∈ O, ∀µ′ ∈ O:
V (µ′ ) + ε ≤
c F (ν) − V (µ′ ) − F ′ (µ)(ν − µ′ )
ρ
J(µ′ , ν)
93
By uniform convergence of Fn and Vn , there exists ε > 0 and N s.t. ∀n ≥ N :
ε
c Fn (ν) − Vn (µ′ ) − F ′ (µ)(ν − µ′ )
Vn (µ ) + ≤
2
ρ
J(µ′ , ν)
c Fn (ν) − Vn (µ′ ) − Vn′ (µ′ )(ν − µ′ ) ε
c Fn (ν) − Vn (µ′ ) − F ′ (µ)(ν − µ′ )
=⇒
+
≤
ρ
J(µ′ , ν)
2
ρ
J(µ′ , ν)
ρε J(µ′ , ν)
=⇒ Vn′ (µ′ ) ≥ F ′ (µ) +
2c ν − µ′
′
′
J(µ ,ν)
, which is a positive number
In an interval around µ, Vn′ (µ′ ) − F ′ (µ) ≥ ρε
2c ν−µ′
independent of n and uniformly bounded away from 0 for all µ′ . Then it’s impossible
that V ′ (µ) = F ′ (µ). Contradiction.
What’s more, since V ′ is Lipschitz continuous at any V (µ) > F (µ), it can be extended smoothly to the boundary. Since V ′ = F ′ at V (µ) = F (µ), then the limit of
this smooth extension has lim V ′ (µ) = F ′ (µ). Therefore V is C (1) smooth on [0, 1].
– Step 3. Repeated experiments and contradictory experiments. With the convergence
result we have on hand, we can apply similar proof by contradiction method in step
1 and 2 to rule out these two cases. We omitted the proofs here. Therefore:
{
}
c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ)
V (µ) = max F (µ), max
µ′ ρ
J(µ, µ′ )
– Step 4. Diffusion experiments. Suppose at µ, diffusion experiment is strictly optimal:
V (µ) < −
c D2 V (µ)
ρ H ′′ (µ)
Then by Definition 1, there exists ε, δ1 s.t.:
V (µ) + ε ≤
c V (µ + δ1 ) − V (µ) − V ′ (µ)δ1
ρ H(µ) − H(µ + δ1 ) + H ′ (µ)δ1
Then by definition of derivative, there exists δ2 s.t.:
V (µ) +
c
ε
≤
2
ρ
δ2
(V (µ + δ1 ) − V (µ))
δ1 +δ2
δ2
(H(µ) − H(µ + δ1 ))
δ1 +δ2
+
+
δ2
δ1 +δ2
δ2
δ1 +δ2
(V (µ − δ2 ) − V (µ))
(H(µ) − H(µ − δ2 ))
By convergence of Vn , there exists n s.t.:
Vn (µ) +
ε
c
≤
4
ρ
δ2
(Vn (µ + δ1 ) − Vn (µ))
δ1 +δ2
δ2
(H(µ) − H(µ + δ1 ))
δ1 +δ2
+
+
δ2
δ1 +δ2
δ2
δ1 +δ2
(Vn (µ − δ2 ) − Vn (µ))
(H(µ) − H(µ − δ2 ))
δ1
δ2
Vn (µ + δ1 ) +
Vn (µ − δ2 )
δ1 + δ2
δ1 + δ2
(
(
))
ρ
δ2
δ1
≥Vn (µ) 1 +
H(µ) −
H(µ + δ1 ) −
H(µ − δ2 )
c
δ1 + δ2
δ1 + δ2
(
)
ρ
δ2
δ1
ε
+
H(µ) −
H(µ + δ1 ) −
H(µ − δ2 )
c
δ1 + δ2
δ1 + δ2
4
=⇒
94
If we consider the experiment with posterior beliefs µ + δ1 , µ − δ2 at µ. Taking this
experiment at µ with flow probability:
H(µ) −
δ2
H(µ
δ1 +δ2
c
+ δ1 ) −
δ1
H(µ
δ1 +δ2
− δ2 )
Then the flow cost constraint will be satisfied and the utility gain is:
Ven (µ) =
δ2
V (µ
δ1 +δ2 n
+ δ1 ) +
δ1
V (µ
δ1 +δ2 n
− δ2 )
)
δ1
2
H(µ) − δ1δ+δ
H(µ
δ
H(µ
δ
+
)
−
−
)
1
2
δ1 +δ2
2
(
)
ρ
δ2
δ1
H(µ) − δ1 +δ2 H(µ + δ1 ) − δ1 +δ2 H(µ − δ2 )
c
ε
)
(
≥Vn (µ) +
2
1
1 + ρc H(µ) − δ1δ+δ
H(µ + δ1 ) − δ1δ+δ
H(µ − δ2 ) 4
2
2
1+
ρ
c
(
>Vn (µ)
Contradiction.
To sum up, we proved that V (µ) solves Equation (B.3).
Lemma 25 (Convergence of µ∗ ). Suppose Assumptions 1, 2, 3 and 4 are satisfied. Let
Fn be piecewise linear function on [0,1] satisfying:
1. |Fn − F | → 0;
2. ∀µ ∈ [0, 1], lim Fn′ (µ) = F ′ (µ).
Let µ∗n be as defined in Lemma 13 associated with Fn . Suppose µ∗ = lim µ∗n . Then,
1. ∀µ > µ∗ , ∃N s.t. ∀n ≥ N , νn (µ) ≥ µ.
2. ∀µ < µ∗ , ∃N s.t. ∀n ≥ N , νn (µ) ≤ µ.
Proof. ∀µ > µ∗ , by definition lim µ∗n = µ∗ , there exists N s.t. ∀n ≥ N : |µ∗n − µ∗ | <
|µ − µ∗ |. Therefore µ > µ∗n and thus νn (µ) ≥ µ. Same argument applies to µ < µ∗ .
Lemma 26. Suppose Assumptions 1, 2, 3 and 4 are satisfied. Let Fn be piecewise linear
function on [0,1] satisfying:
1. |Fn − F | → 0;
2. ∀µ ∈ [0, 1], lim Fn′ (µ) = F ′ (µ).
Define Vn = V(Fn ) and V = V(F ). Then: ∀µ ∈ [0, 1] s.t. V (µ) > F (µ), ∃ lim Vn′ (µ).
Proof. With Lemma 25, we can define µ∗ ∈ [0, 1] (we pick an arbitrary limiting point
when there are multiple ones). First by assumption lim Fn′ (µ) = F ′ (µ), and Vn′ = Fm′
on the boundary by construction in Theorem 2, the statement is automatically true for
µ ∈ {0, 1}. We discuss three possible cases for different µ ∈ (0, 1) separately.
• Case 1 : µ > µ∗ . If V (µ) > F (µ), then by convergence in L∞ norm, there exists
N and neighbourhood µ ∈ O s.t. ∀n ≥ N , µ′ ∈ O, Vn (µ′ ) > Fn (µ′ ). We know
that by no-repeated-experimentation property of solution νn (µ) to problem with Fn ,
νn (µ) > sup O. Now consider Vn′ (µ). Suppose Vn′ (µ) have unlimited limiting point.
95
Then exists subsequence lim Vn′ (µ) = ∞ or −∞. If lim Vn′ (µ) = ∞, consider ν = 0, else
if lim Vn′ (µ) = −∞, consider ν = 1:
V (µ) = lim Vn (µ)
n→∞
c Fn (ν) − Vn (µ) − Vn′ (µ)(ν − µ)
≥ lim
n→∞ ρ
J(µ, ν)
c F (ν) − V (µ) c
ν−µ
=
− lim Vn′ (µ)
n→∞
ρ J(µ, ν)
ρ
J(µ, ν)
=+∞
Contradiction. Therefore we know that Vn′ (µ) must have finite limiting points. Now
suppose Vn′ (µ) doesn’t converge, then there exists two subsequences lim Vn′ (µ) = V1′ and
lim Vm′ (µ) = V2′ , V1′ ̸= V2′ ∈ R. Suppose V1′ > V2′ . Now take a converging subsequence
of optimal policy at µ νnk → ν 1 . By previous result ν 1 ≥ sup O. Therefore ν 1 will be
bounded away from µ. Consider:
V (µ) = lim Vnk (µ)
k→∞
c Fmk (ν 1 ) − Vmk (µ) − Vm′ k (µ)(ν 1 − µ)
k→∞ ρ
J(µ, ν 1 )
c F (ν 1 ) − V (µ) − V2′ (ν 1 − µ)
=
ρ
J(µ, ν 1 )
Fnk (νnk ) − Vnk (µ) − Vn′K (µ)(νnk − µ) (V1′ − V2′ )(ν 1 − µ)
+
= lim
k→∞
J(µ, νnk )
J(µ, ν 1 )
≥ lim
>V (µ)
Contradiction. Therefore, limit point of Vn′ (µ) must be unique. Such limit point exists
since Vn′ are uniformly bounded. To sum up, there exists lim Vn′ (µ).
• Case 2 : µ = µ∗ . Since V (µ∗ ) > F (µ∗ ). This implies that ∃N s.t. ∀n ≥ N , Vn (µ∗ ) >
Fn (µ∗ ). In this case, by Lemma 13, µ∗n are unique. Since µ∗n is the unique intersection
of U n+ and U n− (Definition of U n+ , U n−1 are as in Lemma 13, n is index), we can first
establish convergence of µ∗ through convergence of U n+ and U n−1 . By definition:
U + (µ) =
Fm (µ′ )
ρ
µ ≥µ,m≥µ 1 + J(µ, µ′ )
c
max
′
Therefore, suppose the maximizer for index n is νn , mn , then for index n′ :
′
Fn′ (νn )
1 + ρc J(µ, νn )
Fn (νn ) − Fn′ (νn )
≥U n+ (µ) +
1 + ρc J(µ, νn )
U n + (µ) ≥
≥U n+ (µ) − |Fn − Fn′ |
Since n and n′ are totally symmetric, we actually showed that the functional map
from Fn to U n+ is Lipschitz continuous in Fn with Lipschitz parameter 1. Symmetric
96
argument shows that same property for U n− . Since by assumption Fn is uniformly
converging, we can conclude that U n+ and U n− are Cauchy sequence with L∞ norm.
Therefore converging. Then U n+ − U n− uniformly converges and their roots will be
UHC when n → ∞. To show convergence of µ∗n , it’s sufficient to show that such
limit is unique. This is not hard to see by applying envelope theory to U n+ and U n− :
′′ (µ)(ν −µ)
d
n
U n+ (µ) = − ρc F (νn )H
. Therefore U n+ − U n−1 will have slope bounded below
dµ
J(µ,νn )2
from zero, therefore the limit will also be strictly increasing. So µ∗ is unique.
Since µ∗n → µ, and Vn′′ (µ) are all bounded from above:
Vn′ (µ∗ ) =Vn′ (µ∗n ) + Vn′′ (ξn )(µ∗ − µ∗n )
=Vn′′ (ξn )(µ∗ − µ∗n ) → 0
• Case 3 : µ < µ∗ . We can apply exactly the symmetric proof of case 1.
C.3. General Information Measure
C.3.1. Proof of Theorem 5
Proof. Consider Equation (9), it’s sasy to see that both the inner maximization problem
and the constraint are linear in pi and σ 2 . Therefore, Equation (9) can be written
equivalently as choosing either one posterior or a diffusion experiment:
{
}
c (V (ν) − V (µ) − V ′ (µ)(ν − µ)) cV ′′ (µ)
ρV (µ) = max ρF (µ), sup
, ′′
J(µ, ν)
Jνν (µ, µ)
ν
Now suppose µ ∈ D and ρV (µ) = c JV′′
′′ (µ)
νν (µ,µ)
sup
ν
. This is saying, the maximization problem:
c (V (ν) − V (µ) − V ′ (µ)(ν − µ))
J(µ, ν)
will be solved for ν → µ. Therefore, consider the FOC:
FOC:
V ′ (ν) − V ′ (µ) Jν′ (µ, ν)
−
(V (ν) − V (µ) − V ′ (µ)(ν − µ))
J(µ, ν)
J(µ, ν)2
It must be ≤ 0 when ν → µ+ and ≥ 0 when ν → µ− . Otherwise, the diffusion experiment
will be locally dominated by some Poisson experiment. When ν → µ, J(µ, ν) → 0,
V ′ (ν) → V ′ (µ), V (ν) − V (µ) − V ′ (µ)(ν − µ) → 0. Therefore, we can apply L’Hospital’s
rule:
)
(
V (ν)−V (µ)−V ′ (µ)(ν−µ)
′′
′′
′
limν→µ V (ν) − Jνν (µ, ν)
− Jν (µ, ν) · F OC
J(µ,ν)
lim FOC =
ν→µ
limν→µ Jν′ (µ, ν)
)
(
V (ν)−V (µ)−V ′ (µ)(ν−µ)
′′
′′
(µ,
ν)
lim
V
(ν)
−
J
ν→µ
νν
J(µ,ν)
1
=
′
2
limν→µ Jν (µ, ν)
(
)
(3)
V (ν)−V (µ)(ν−µ)
(3)
′′
lim
V
(ν)
−
J
(µ,
ν)
−
J
(µ,
ν)
·
FOC
ννν
ν→µ
νν
J(µ,ν)
1
=
′′ (µ, ν)
2
limν→µ Jνν
97
1V
=
3
(3)
(3)
(µ) − Jννν (µ, µ) JV′′
′′ (µ)
νν (µ,µ)
′′ (µ, µ)
Jνν
′′
(µ)
Now consider V (µ) − ρc JV′′ (µ,µ)
. By assumption, it’s non-negative and achieves 0 at µ.
νν
Therefore it is locally minimized at µ:
(
)
d
c V ′′ (µ)
V (µ) −
=0
′′ (µ, µ)
dµ
ρ Jνν
)
ρ
V (3) (µ)
V ′′ (µ) ( (3)
(3)
(µ,
µ)
=0
(µ,
µ)
+
J
=⇒ V ′ (µ) − ′′
J
+ ′′
ννµ
c
Jνν (µ, µ) Jνν (µ, µ)2 ννν
(3)
=⇒
V (3) (µ) − Jννν (µ, µ) JV′′
νν (µ,µ)
′′ (µ, µ)
Jνν
(3)
=⇒
′′ (µ)
V (3) (µ) − Jννν (µ, µ) JV′′
′′ (µ)
νν (µ,µ)
′′ (µ, µ)
Jνν
(3)
ρ
Jννµ (µ, µ)
= V ′ (µ) + V ′′ (µ) ′′
c
Jνν (µ, µ)2
(3)
ρ
ρ
Jννµ (µ, µ)
= V ′ (µ) + V (µ) ′′
c
c
Jνν (µ, µ)
By smoothness of V and J, for FOC to be non-positive when ν → µ+ and non-negative
when ν → µ− , it can only be that:
′′
(3)
V ′ (µ)Jνν
(µ, µ) + V (µ)Jννµ
(µ, µ) = 0
′′
n)
, we have:
Now suppose there exists µn → µ s.t. ρV (µn ) = c J ′′V (µ(µn ,µ
n)
νν
′′
(3)
V ′ (µn )Jνν
(µn , µn ) + V (µn )Jννµ
(µn , µn ) = 0
By differentiability of the whole term, we have:
)
d ( ′
′′
(3)
V (µ)Jνν
(µ, µ) + V (µ)Jννµ
(µ, µ) = 0
dµ
( (3)
)
( (4)
)
′′
(3)
(4)
=⇒ V ′′ (µ)Jνν
(µ, µ) + V ′ (µ) 2Jννµ
(µ, µ) + Jννν
(µ, µ) + V (µ) Jνννµ
(µ, µ) + Jννµµ
(µ, µ) = 0
)
V (µ) ( (3)
ρ
′′
(3)
(3)
(µ, µ)2 − ′′
2Jννµ (µ, µ)2 + Jννν
(µ, µ)Jννµ
(µ, µ)
=⇒ V (µ)Jνν
c
Jνν (µ, µ)
( (4)
)
(4)
+ V (µ) Jνννµ (µ, µ) + Jννµµ
(µ, µ) = 0
(3)
(3)
(3)
2Jννµ (µ, µ)2 + Jννν (µ, µ)Jννµ (µ, µ)
ρ ′′
2
(4)
(4)
+ Jνννµ
(µ, µ) + Jννµµ
(µ, µ) = 0
=⇒ Jνν (µ, µ) −
′′
c
Jνν (µ, µ)
By assumption, µ ∈ D, therefore the differential equation must not be satisfied. This
implies that there doesn’t exist such µn → µ. So the set:
{
}
V ′′ (µ)
µ ∈ DρV (µ) = c ′′
Jνν (µ, µ)
is a closed set (closed w.r.t. D) containing no limiting point. That is to say, within any
compact subset of D, this set is finite. By definition of Lebesgue measure, the measure
of a set can be approximated by compact subsets from below. Therefore, this set will be
a zero-measure set.
98
C.4. Connection to Static Problem
C.4.1. Proof of Theorem 7
Proof. We prove by constructing a candidate solution satisfying the characterization in
Theorem 7, then show its optimality and uniqueness.
• Construction: F is a piecewise linear convex function on [0, 1] with finite kinks µk .
Now consider the function G(µ) = F (µ) + mc H(µ). By definition, in each interval
b
[µk , µk+1 ], G(µ) is a strictly concave function. Now consider G(µ)
= Co(G) which is
b will be locally linear in neighbourhood of any µ where
the upper concave hull of{G. G
}
b
b
b and piecewise
G(µ)
> G(µ). Let I = µG(µ)
> G(µ) . Since concave function G
convex function G are both continuous, I will be an open set. Therefore, I will be
consisted of countable open intervals ∪In .
Now we prove the following statement: ∀In , there exists µk ∈ In = {an , bn }. Suppose
b n ), G(bn ) = G(b
b n ).
not, then G(µ) will be strictly concave on In and G(an ) = G(a
′
Concavity of G implies that G (µ) being strictly decreasing on (an , bn ). On the other
b
b n ), this implies that G′ (an ) ≤
hand, since G(µ)
≥ G(µ) ∀µ ∈ (an , bn ) and G(an ) = G(a
b n ). Similarly, G′ (bn ) ≥ inf ∂ G(b
b n ). This is to say, if we replace G
b with G on
sup ∂ G(a
b will still have decreasing subdifferentials. G
b being the upper concave implies that
In , G
b
G(µ)
= G(µ) on In . Contradiction.
Since the number of µk is finite, we’ve shown that I is consisted of finite number of
open intervals ∪In . Now we define:
Ln (µ) = G(an ) +
G(bn ) − G(an )
(µ − an )
b n − an
Noticing that this is equivalent to defining:
m
b
V (µ) =G(µ)
− H(µ)
c
(
m
m )
=Co F + H (µ) − H(µ)
c
c
• Optimality: First it’s easy to see that V is feasible in Equation (10). ∀µ s.t. V (µ) >
(
)
F (µ), pick σ 2 = − H ′′2c(µ) . Then 12 σ 2 V ′′ (µ) = − H ′′c(µ) − mc H ′′ (µ) = m. Now we show
that it’s unimprovable in Equation (10). By construction, V (µ) + mc H(µ) is a concave
function, therefore ∀ν:
(
)
m
m
m
V (ν) + H(ν) ≤ V (µ) + H(µ) + V ′ (µ) + H ′ (µ) (ν − µ)
c
c
c
m
′
(H(µ) − H(ν) + H ′ (µ)(ν − µ))
=⇒ V (ν) − V (µ) − V (µ)(ν − µ) ≤
c
∑
1
′
=⇒
pi (V (νi ) − V (µ) − V (µ)(νi − µ)) + σ 2 V ′′ (µ)
2
∑ m
1 m
′
≤
pi (H(µ) − H(νi ) + H (µ)(ν − µ)) − σ 2 H ′′ (µ)
c
2 c
≤m
That is to say, V is unimprovable.
99
• Uniqueness: Suppose there is Ve ̸= V solving Equation (10), where Ve ∈ C (1) [0, 1] and
twice differentiable when Ve (µ) > F (µ). Now consider U = Ve − V ̸= 0. Suppose
min U < 0. Let µ∗ ∈ arg min U . By definition, µ∗ ∈ (0, 1). U (µ∗ ) < 0 implies that
V (µ∗ ) > F (µ∗ ). Therefore, µ∗ ∈ In . Now consider:
Ve (bn ) − Ve (µ∗ ) − Ve ′ (µ∗ )(bn − µ∗ )
H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ )
V (bn ) − V (µ∗ ) − V ′ (µ∗ )(bn − µ∗ )
U (bn ) − U (µ∗ ) − U ′ (µ∗ )(bn − µ∗ )
+
c
=c
H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ )
H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ )
∗
′
∗
∗
V (bn ) − V (µ ) − V (µ )(bn − µ )
U (bn ) − U (µ∗ )
=c
+
c
H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ )
H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ )
∗
′
∗
∗
V (bn ) − V (µ ) − V (µ )(bn − µ )
>c
H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ )
m ≥c
=m
The second equality is from µ∗ ∈ arg min U . The last inequality is from U (bn ) ≥ 0 and
U (µ∗ ) = 0. Contradiction.
e = Ve + m H ≥ G.
e First we show that G
e is weakly
Now suppose max U > 0. Consider G
c
concave. Suppose not, then there exists µ, ν s.t.
e
e
e′ (µ)(ν − µ)
G(ν)
> G(µ)
+G
m
=⇒ Ve (ν) − Ve (µ) − Ve ′ (µ)(ν − µ) >
(H(µ) − H(ν) + H ′ (µ)(ν − µ))
c
′
e
e
e
V (ν) − V (µ) − V (µ)(ν − µ)
=⇒ c
>m
H(µ) − H(ν) + H ′ (µ)(ν − µ)
Contradicting the optimality of Ve . Since max U > 0, there exists open interval I =
e > G on I and G(a)
e
e
e is
(a, b) s.t. G
= G(a), G(b)
= G(b). Since Fe > F on I, G
′
e (µ1 ) −
twice differentiable. By intermediate value theorem, there exists µ1 < µ2 , G
′
′
′
′
′
e (µ2 ) − G (µ2 ) < 0. By concavity of G, G (µ1 ) ≥ G (µ2 ). Therefore,
G (µ1 ) > 0, G
′
e (µ1 ) > G
f′ (µ2 ). Again by intermediate value theorem, there exists G
e′′ (µ) < 0. Since
G
e is globally concave, ∀ν ̸= µ,
G
e
e
e′ (µ)(ν − µ)
G(ν)
< G(µ)
+G
m
(H(µ) − H(ν) + H ′ (µ)(ν − µ))
=⇒ Ve (ν) − Ve (µ) − Ve ′ (µ)(ν − µ) <
c
Ve (ν) − Ve (µ) − Ve ′ (µ)(ν − µ)
=⇒ c
<m
H(µ) − H(ν) + H ′ (µ)(ν − µ)
e′′ (µ) < 0 =⇒ Ve ′′ (µ) < − m H ′′ (µ) =⇒ c Ve ′′′′(µ) < m. Contradicting feasibility of Ve .
G
c
−H (µ)
To sum up, we showed that V solving Equation (10) is unique.
100
C.4.2. Proof of Corollary 8
Proof. It’s not hard to observe that:
)
(
I(S; X ) =I (A, S; X ) − I A; X S
)
(
=I (A; X ) + I S; X A
≥I (A; X )
Therefore, Equation (11) will be equivalent to:
m
sup Eµ [u(A, X )] − I (A; X )
c
A
)
∑
∑
m(
H(µ) −
pi H(νi )
= sup
pi F [νi ] −
c
νi ,pi
) m
∑ (
m
= sup
pi F [νi ] + H(νi ) − H(µ)
c
c
νi ,pi
(
)
m
m
=Co F + H (µ) − H(µ)
c
c
This is exactly the solution in Theorem 7.
C.4.3. Proof of Theorem 9
Proof. Take any information acquisition strategy (S t , At , T ) that satisfies the constraints
in Equation (1). The achieved expected utility will be:
]
[
∞
∑
(
)
[
]
e−ρdt·t λI S t ; X |S t−1
E(S t =S t )∞
e−ρdt·T E u(AT , X )|S T −1 −
t=0
t=0
We can separate the utility gain part and information cost part. By the iterated rule of
expectation, the utility gain part is:
[ −ρdt·T [
]]
E(S t =S t )∞
e
E u(AT , X )|S T −1
t=0
[
(
)]
=ET ,AT e−ρdt·T u AT , X
It’s easy to see that
this is determined only by action time T and action process AT . Let
(
)
Set−1 = 1T =t , At T =t . Then by the three Markov properties in Equation (1), we have:
X → S t → Set
Therefore:
∞
∑
)
(
e−ρdt·t λI S t ; X S t−1
t=0
∞
∑
=λ
t=0
=λ
∞
∑
( (
)
(
))
e−ρdt·t I S t ; X − I S t−1 ; X
( (
)
(
(
)
(
))
)
e−ρdt·t I Set ; X + I S t ; S Set − I Set−1 ; X − I S t−1 ; S Set−1
t=0
=λ
∞
∑
t=0
e
−ρdt·t
(
∞
∞
(
(
∑
∑
t−1 )
t)
t−1 )
t
−ρdt·t
t
−ρdt·t
t−1
e
e
e
I S ;X S
+λ
e
I S ;X S − λ
e
I S ; X Se
t=0
t=0
101
=λ
∞
∑
t=0
≥λ
∞
∑
∞
)
(
(
)
(
)∑
I S t ; X Set
e−ρdt·t I Set ; X Set−1 + λ 1 − e−ρdt
t=0
(
)
e−ρdt·t I Set ; X Set−1
t=0
Therefore, by replacing signal process S t with Set , the DM can achieve the same utility
gain and pay a weakly lower information cost. Now consider:
∞
)
(
∑
[
(
)]
e−ρdt·t I Set ; X Set−1
ET ,AT e−ρdt·T u AT , X − λ
t=0
[ (
)
]
=P [T = 0] E u At , X T = 0
[
(
)]
[
(
)]
+ P [T = 1] E e−ρdt u AT , X + P [T > 1] E e−ρdt·T u AT , X
∞
(
(
∑
)
t−1 )
0
−ρdt·t
t
e
e
− λI A ; X µ − λ
e
I S ; X Se
t=1
[ (
)
]
=P [T = 0] E u At , X T = 0
(
)
[ −ρdt ( 1 ) ]
0
e
+ P [T = 1] E e
u A , X T = 1 − λI S ; X µ
∞
(
∑
t−1 )
[ −ρdt·T ( T
)]
−ρdt·t
t
e
e
I S ; X Se
+ P [T > 1] E e
u A ,X − λ
t=1
Suppose the term:
e
−ρdt
(
0)
]
[ ( 1 )
1
e
P [T = 1] E u A , X T = 1 − λI S ; X Se
(C.8)
∞
(
)
∑
[
(
)]
e−ρdt·t I Set ; X Set−1 ≤ 0
+P [T > 1] E e−ρdt·T u AT , X − λ
t=1
Then discard all actions and information after first period will give the DM higher expected utility E [u (A0 ; X )]. This information and action process satisfies this theorem.
Therefore, WLOG we assume Equation (C.8), as well as all continuation payoffs are
non-negative. Then:
ET ,AT
∞
(
∑
t−1 )
[ −ρdt·T ( T
)]
−ρdt·t
t
e
e
u A ,X − λ
e
I S ; X Se
t=0
[ (
)
]
=P [T = 0] E u A0 , X T = 0
(
)
[
(
)
]
+ P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ
[
−ρdt·T
+ P [T > 1] E e
(
T
u A ,X
)]
−λ
∞
∑
e
−ρdt·t
(
t−1 )
t
e
I S ; X Se
t=1
[ (
)
]
=P [T = 0] E u A0 , X T = 0
(
)
[
(
)
]
+ P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ
∞
(
∑
t−1 )
[ −ρdt·T ( T
)]
−ρdt·t
t
e
+ P [T > 1] E e
u A ,X − λ
e
I S ; X Se
t=1
102
[ (
)
]
=P [T = 0] E u A0 , X T = 0
(
)
[
(
)
]
+ P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ
(
)
[
(
)
]
+ P [T = 2] E e−2ρdt u A2 , X T = 2 − e−ρdt λI Se1 ; X Se0
∞
(
)
∑
[
(
)]
+ P [T > 2] E e−ρdt·T u AT , X − λ
e−ρdt·t I Set ; X Set−1
t=1
[ (
)
]
≤P [T = 0] E u A0 , X T = 0
(
)
[
(
)
]
+ P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ
(
)
[
(
)
]
+ P [T = 2] E e−ρdt u A2 , X T = 2 − λI Se1 ; X Se0
∞
(
∑
t−1 )
[ −ρdt·(T −1) ( T
)]
−ρdt·(t−1)
t
e
+ P [T > 2] E e
u A ,X − λ
e
I S ; X Se
t=1
[ (
)
]
=P [T = 0] E u A0 , X T = 0
(
)
[ −ρdt ( T
)
]
0 e1
e
+ P [T = 1, 2] E e
u A , X T = 1, 2 − λI S , S ; X µ
∞
(
∑
t−1 )
[ −ρdt·(T −1) ( T
)]
−ρdt·(t−1)
t
e
+ P [T > 2] E e
u A ,X − λ
e
I S ; X Se
t=1
≤···
]
[ (
)
=P [T = 0] E u A0 , X T = 0
(
)
[
]
+ P [T > 0] E e−ρdt u(AT , X ) − λI Se0 , Se1 , . . . ; X µ
]
[ (
)
≤P [T = 0] E u A0 , X T = 0
)
[
]
(
+ P [T > 0] E e−ρdt u(AT , X ) − λI AT ; X µ
)
[
]
(
≤P [T = 0] F (µ) + P [T > 0] E e−ρdt u(AT , X ) − λI AT ; X µ
By definition of information measure:
)
(
I AT ; X µ
[ ( )]
=H(µ) − EAt =AT H µ′ At
(
[
])
=P [T = 0] (H(µ) − H(µ)) + P [T > 0] H(µ) − EAt =AT H(µ′ At )T > 0
)
(
=P [T > 0] I AT ; X µ
Therefore:
ET ,AT
∞
(
)
∑
[ −ρdt·T ( T
)]
e
u A ,X − λ
e−ρdt·t I Set ; X Set−1
t=0
))
( [
(
)]
(
≤P [T = 0] F (µ) + (1 − P [T = 0]) E e−ρdt u AT , X − λI AT ; X µ
{
}
[ −ρdt
]
≤ max F (µ), sup E e
u(A, X ) − λI(A; X )
A
Therefore, we showed that any dynamic information acquisition strategy solving Equation (1) will have weakly lower expected utility level than a static information acquisition
103
strategy solving Equation (12). On the other hand, any solution to Equation (12) will
be a dynamic information acquisition strategy with only one period non-degenerate information, which will be dominated by solution to Equation (1).
104