Optimal Dynamic Information Acquisition Weijie Zhong1 Columbia University Abstract In this paper, I studied a decision problem in which the decision maker(DM) can acquire information about payoff relevant state to facilitate decision making. The design of information flow can be fully general, while acquisition of information in unit time is either limited or costly. I characterized five key properties of optimal dynamic information acquisition strategy in the continuous time limit: the DM will seek for informative evidence arriving as a Poisson process that Confirms Prior Belief and lead to Immediate Action with Increasing Precision and Decreasing Intensity over time conditional on continuation. Within the scope of assumptions I made, the results provided an optimization foundation for Poisson bandits learning, a dynamic foundation for rational inattention and a full characterization of dynamic information acquisition. Keywords: dynamic information acquisition, rational inattention, Poisson-bandits Contents 1 Introduction 2 2 A General Discrete Time Framework 2.1 Simplification of Information Structure . . . . . . . . . . . . . . . . . . . 2.2 Flow Cost Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Continuous Time Limit of Information . . . . . . . . . . . . . . . . . . . 3 5 6 7 3 Optimal Information Acquisition in Continuous Time 3.1 Convergence in Continuous Time . . . . . . . . . . . . . . . . . . . . . . 3.2 Characterization of Solution . . . . . . . . . . . . . . . . . . . . . . . . . 8 9 10 4 Extensions 4.1 Convex Flow Cost . . . . . . . 4.2 Continuum of Actions . . . . 4.3 General Information Measure 4.4 Connection to Static Problem 16 17 18 20 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Discussions and Conclusion 23 Appendix: Omitted Proofs 26 Email address: [email protected] (Weijie Zhong) Vert preliminary draft version January 20, 2017 1. Introduction A standard approach in researches involving dynamic information acquisition is to model information flow as a simple family of random process. The decision maker (DM) has control over parameters of the process representing the dimension of interest. For example, DM can control timing of action—Wald-type problem and Poissonbandit problem, type of evidence that arrives—evidence seeking problem, or intensity of experimentation—random sampling problem. These models provide elegant characterizations, but at a cost: First, we can’t justify which of those families will be endogenously pinned down by optimization when we don’t impose any restriction on the design of information flow. Second, the different choice dimensions never co-exist so we can’t infer about interactions between different aspects of information structure. In this paper, I develop an information acquisition framework which imposes no restriction on specific form of information the DM can acquire and allows the DM to optimize over all aforementioned dimensions. The most important feature is that, the DM can choose any arbitrary random process and she observes realization of this process as her information. Her objective is to maximize expected utility from action she chooses based on signals she receives, subject to a constraint or convex cost on amount of information acquired within unit time. 
To gain tractability with so much generality, I impose following assumptions on main model 1) continuous time, 2) binary state and 3) informativeness measure being posterior separable. Within this framework, I try to accomplish three main goals. The first goal is to provide foundation for a family of simple information structures that arises endogenously as optimizer in dynamic information acquisition environment. The main finding is that under my assumptions, the optimal information acquisition strategy will be Evidence Seeking: In each period, the DM will wait for a signal arriving at a Poisson rate. The signal is very informative about the state and it drives a jump in the DM’s posterior belief once being observed. This finding suggests that in environment where there is more flexibility in information acquisition, for example R&D activity, experiment design of researchers, market sampling etc., information flow is more likely to assemble a Poisson bandit model. The second goal is to characterize the interaction among all four dimensions of interest in design of information: type of evidence, informativeness of signal, intensity of experimentation, and timing of actions. It’s optimal for the DM to seek for informative evidences that Confirms Prior Belief and lead to Immediate Action with Increasing Precision and Decreasing Intensity over time. A decision maker with prior belief that one state is most likely will wait for a signal arriving occasionally. Once the signal is received, the decision maker’s posterior belief will jump towards her prior conjecture being true and she will immediately choose an optimal action associated with posterior belief. If the signal doesn’t arrive, belief of the decision maker drifts towards being unsure and she will seek for more informative signal arriving at lower Poisson rate. The characterization of precision v.s. intensity trade-off is novel to the literature. The following pattern: higher experimentation intensity associated with more extreme beliefs is shared with Moscarini and Smith (2001), which uses a single parameter to represent 2 both precision and intensity. However, by separating precision from intensity, I illustrate the intuition that with higher continuation value (which is the cost of discounting), the gain from increasing signal arrival rate is higher relative to increasing signal precision. Therefore, lower signal precision is associated with more extreme beliefs. The following patterns: no repeated experimentation and confirmatory evidence seeking associated with ambiguous beliefs are shared with Che and Mierendorff (2016), which uses magnitude of belief jumping as control parameter to represent type of evidences. However, by allowing freedom of choosing both type of evidence and arrival rate, I suggest that when belief is more extreme, higher continuation value will make confirmatory evidence more appealing for its high arrival rate comparing to contradictory evidence. Third, I want to setup link between dynamic information acquisition problem and well studied static problems. I derive a discretization of the dynamic problem in this paper. The discretized model is a sequential generalization of rational inattention model from Matejka and McKay (2014), which studies design of flexible information acquisition within one period. On the other hand, in zero discounting limit and linear cost limit our model nests static rational inattention model as a special case. 
Therefore, the within period foundation of my model boils down to static rational inattention model and my model serves as dynamic foundation for static rational inattention. The rest of the paper is structured as follows. Section 2 setup a general discrete time framework for dynamic information acquisition problem and provide simplification results based on information theory. Section 3 derives the proper continuous time limit of the discrete time problem and provides full characterization of its solution. Section 4 studies several extensions of baseline model. Section 5 concludes. 2. A General Discrete Time Framework Assume that a decision maker(DM) faces the following decision problem: • Decision problem: Time horizon t = 0, 1, . . . , ∞ is discrete. Length of each time interval is dt. Both actions space A and state space X are finite. Utility associated with action-state pair (a, x) is u(a, x) and the DM discounts his utility exponentially with factor ρ > 0. If the DM take action a ∈ A at time t conditional on state being x ∈ X, then his utility gain is e−ρt·dt u(a, x). • Uncertainty: When not knowing the true state, the DM will form a belief µ ∈ ∆X about the state. Her preference under uncertainty is expressed as von NeumannMorgenstern expected utility. I am going to use two essentially equivalent formulations to express expected utility. 1) Given belief µ, the expected utility associated with each action a ∈ A is Eµ [u(a, x)]. 2) State and action are treated as random variables X , A. Expected utility is denoted by E[u(A, X )]. F (µ) = maxa∈A Eµ [u(a, x)] is assumed to be the expected utility from choosing optimal action given belief µ. • Information: Information is defined as a signal space S = {s} with state dependent conditional distribution of signals {g(s|x)}. The DM is Bayesian sequentially rational. Given prior µ, signal structure (S, g(s|x)), her posterior belief µ′ when observing signal 3 s is determined by Bayes rule: µ′ (x|s) = ∑ ′ g(s|x)µ(x) ′ ′ . An information structure can x ∈X g(s|x )µ(x ) be equivalently denoted by a random variable S, whose realization is in S and whose joint distribution with state X is defined by conditional distribution g. Markov Chain Property: To capture the fact that the information required to take an action must be provided by signals, I assume that state, information and action form a Markov chain: X →S→A That is to say, state and action are independent conditional on information: X ⊥A|S. Information Measure: Measure of informativeness of information structure is characterized by a prior dependent function I(S; X |µ) ≥ 0, which is the information measure of signal structure S when prior is µ. I is defined as a measure of reduced uncertainty level of belief: I(S; X |µ) = H(µ) − Es [H(µ′ (x|s))], where µ′ are posterior beliefs determined by Bayes rule. I(S; X |S1 ) = I(S; X |µ′ (x|s))S1 =s represents a random variable taking value as information measure of S conditional on prior belief of X being posterior beliefs associated with realizations of S1 . Uncertainty measure H : ∆X 7→ R+ is assumed to be a concave function. 1 Information Cost: Given the structure of information measure, we define a time separable information cost function. In each period, DM pays information cost f (I (S, X |µ)) which transforms information measure of the information structure into utility loss. f : R+ 7→ R+ is non-decreasing. 
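As a purely illustrative aside (the code and all names in it are mine, not part of the formal model), the following minimal Python sketch computes posterior beliefs by Bayes rule and the information measure I(S; X|µ) = H(µ) − E_s[H(µ′)] for a finite binary-state experiment, taking H to be the standard entropy function used in the paper's later examples.

import numpy as np

def entropy(mu):
    # Uncertainty measure H(mu) over two states, evaluated at P(x = 1) = mu.
    mu = np.clip(mu, 1e-12, 1 - 1e-12)
    return -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

def information_measure(mu, g1, g0):
    # I(S; X | mu) = H(mu) - E_s[H(mu'(s))] for a signal with conditional
    # distributions g1 = g(.|x=1) and g0 = g(.|x=0) over finitely many signals.
    g1, g0 = np.asarray(g1, float), np.asarray(g0, float)
    p_s = mu * g1 + (1 - mu) * g0          # unconditional signal probabilities
    post = mu * g1 / p_s                   # Bayes rule: posterior P(x = 1 | s)
    return entropy(mu) - np.sum(p_s * entropy(post))

# A symmetric binary experiment with precision 0.8 at prior mu = 0.6:
print(round(information_measure(0.6, g1=[0.8, 0.2], g0=[0.2, 0.8]), 4))

By concavity of H, the printed value is non-negative, and it is zero exactly when the experiment is uninformative.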
• Dynamic Optimization: The dynamic optimization problem of the DM is:

    sup_{S^t, A_t, T}  E[ e^{−ρ·dt·T} E[u(A_T, X) | S^{T−1}] − Σ_{t=0}^∞ e^{−ρ·dt·t} f( I(S^t; X | S^{t−1}) ) ]        (1)

    s.t.  X → S^{t−1} → 1_{T≤t},    X → S^{t−1}|_{T=t} → A_t|_{T=t},    X → S^t → S^{t−1}

where T ∈ ∆N, t ∈ N, and the outer expectation is taken over the signal process (S^t)_{t=0}^∞. S^{−1} is defined as a deterministic variable inducing the same belief as the DM's prior µ (for notational simplicity). The DM chooses the action time T, the action A_t conditional on acting at time t, and the signals S^t, subject to the information cost and three natural constraints on the information process:
1. The information path prior to the action time is sufficient for the action time.
2. Information received prior to period t is sufficient for the action taken at time t.
3. Information is cumulative over time.

Footnote 1: This is sufficient and necessary for I(S; X|µ) to be non-negative for any information structure and prior.
Footnote 2: Note that in each period the current period's information has not yet been acquired, so decisions can only be based on information already acquired in the past. The Markov chain property linking information to the action time and the action therefore has information lagged by one period. This within-period timing issue makes no difference in the continuous time limit; it matters in the linear cost limit, and we highlight it in Theorem 9.

2.1. Simplification of Information Structure

In this subsection, I simplify the problem defined in Section 2. It is almost impossible to directly solve, or even characterize the solution of, Equation (1), due to both the dimensionality and the complexity of the space of feasible strategies. On one hand, the space of general random processes is an enormous, infinite-dimensional space. On the other hand, the three constraints on the information and action processes interact in a very implicit way. The simplification result is mainly built on the following lemma:

Lemma 1. The information measure I(S; X|µ) satisfies the following properties:
1. Markov property: If X → S → T, then I(T; X|S) = 0.
2. Chain rule (linearity): I(S, T; X|µ) = I(S; X|µ) + E[I(T; X|S, µ)].
3. Information processing inequality: If X → S → T, then I(T; X|µ) ≤ I(S; X|µ).

Lemma 1 is an analog of three key theorems on mutual information proved in Cover and Thomas (2012), generalizing the log-sum structure of mutual information to the general form of information measure defined here. The intuition of Lemma 1 is simple. The first statement says that when S contains all the information about X in T, the cost of acquiring T when S is already known is zero. This is intuitive: all randomness in T beyond S is irrelevant to the unknown state and can be produced by an arbitrary random device jointly with the known signal S. The second statement says that the cost of acquiring a combined signal (S, T) can be decomposed linearly into first acquiring S and then acquiring the remaining information in T. The third statement says that if S contains all the information about X in T, then acquiring S is weakly more costly than acquiring T. Lemma 1 illustrates two key ingredients of my framework: sophistication and flexibility. On one hand, the hypothetical DM is sufficiently deliberate that she can perfectly distinguish "random noise" from "useful information". She can also perfectly extract information from any random process and form the correct posterior belief by rationally applying Bayes rule.
On the other hand, the design of information is fully flexible. The DM can therefore separate out any information that is costly but unnecessary for her decision making and discard it to minimize the information cost.

Given any strategy (T, A_t, S^t), the expected utility can be weakly improved by simplifying the information structure S^t in three steps. Step one is to enrich S^t by nesting all past signals (S^0, . . . , S^t) into the current period signal S̃^t. Since information is cumulative, this operation adds nothing to what is currently learned or what could potentially be learned. Step two is to discard all information after an action has been taken; once an action is taken, information is useless to the DM. Step three is to replace the signal process by the action process itself. This operation weakly reduces cost by the information processing inequality. To sum up, we can assume WLOG that the optimal strategy only involves signal structures S^t of the following form:

    S^t | (S^{t−1}, 1_{T>t+1}) is degenerate,    S^{t−1}|_{T=t} = A_t|_{T=t},    S^t|_{T≤t} = S^{t−1}|_{T≤t}

In each period, the DM acquires information as a combination of two kinds of signals: signals that directly lead to actions in the next period, and signals that indicate continuation. Instead of an abstract random process, I can then represent information as conditional distributions of actions/signals. Rewriting the sequential problem as a recursive one yields the following representation lemma:

Lemma 2 (Recursive formulation). V_dt(µ) is the optimal utility level solving Equation (1) given initial belief µ if and only if V_dt(µ) solves the following functional equation:

    V_dt(µ) = max{ F(µ),  sup_{p_i, µ_i} e^{−ρdt} Σ_i p_i V_dt(µ_i) − f(C) }        (2)

    s.t.  C = H(µ) − Σ_i p_i H(µ_i),    Σ_i p_i µ_i = µ

where (p_i) ∈ ∆(2|X|) and (µ_i) ∈ (∆X)^{2|X|}.

Equation (2) is very straightforward except for a subtlety in the dimensionality of the strategy space. The first term in the maximization problem is the utility from an immediate action based on the current belief. The second term is the supremum of the expected gain from information acquisition. Choosing a signal structure is equivalent to choosing posterior beliefs subject to Bayesian plausibility Σ p_i µ_i = µ, so e^{−ρdt} Σ p_i V_dt(µ_i) is the expected utility from observing the signals inducing the µ_i and delaying further decisions to the next period. f(C) is the cost of information acquisition in the current period.

The only remaining issue with Equation (2) is that it covers a very restricted space of strategies: the choice of signal structure is restricted to no more than 2|X| posteriors, each with positive probability, while the original space contains signals with an arbitrary number (even a continuum) of realizations. This simplification is based on a generalized concavification methodology developed in Theorem 10. The original concavification methodology from Aumann et al. (1995) (utilized by Kamenica and Gentzkow (2009)) states that in R^n (a finite dimensional belief space), any point on the upper concave hull of a function f(x) can be achieved by a linear combination of no more than n points on the graph of the function. My problem, Equation (2), involves an additional term f(C), which is not simply of expectation form, but a similar mathematical intuition applies: one can replace the extra term by a multiplier and apply the concavification method to the Lagrangian. In this case, the number of posterior beliefs needed grows by no more than a factor of two.
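As an aside, the concavification operation invoked here is easy to compute numerically. The sketch below is an illustration under my own choices of grid and payoff (the kinked payoff mirrors the Figure 1 example later in the paper) and is not the generalized construction of Theorem 10: it builds the upper concave hull of a function on a belief grid and checks whether a prior lies strictly below it, the static condition for a strictly positive gain from information.

import numpy as np

def upper_concave_hull(x, y):
    # Upper concave hull of the points (x_i, y_i), evaluated back on the grid x.
    hull = []                                  # hull vertices, left to right
    for p in zip(x, y):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the middle point while it lies on or below the chord
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    hx = np.array([q[0] for q in hull])
    hy = np.array([q[1] for q in hull])
    return np.interp(x, hx, hy)

grid = np.linspace(0.0, 1.0, 501)
G = np.maximum(0.5 * grid - 0.2, 0.3 - 0.5 * grid)   # a kinked payoff over beliefs
cav = upper_concave_hull(grid, G)
mu = 0.45
print(np.interp(mu, grid, cav) - np.interp(mu, grid, G))   # strictly positive: information is valuable here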
Finally, I establish that solving the simplified problem, Equation (2), is equivalent to solving the original problem, Equation (1). Thanks to the reduced dimensionality, the standard Blackwell conditions can be applied: Lemma 3 proves existence and uniqueness of the solution to Equation (2).

Lemma 3. For all dt > 0, there exists a unique V_dt ∈ C(∆X) solving Equation (2).

2.2. Flow Cost Structure

Starting from this subsection, we can focus on Equation (2). To characterize the solution of this functional equation using differential equations, we impose extra smoothness assumptions on the information measure H and the cost function f:

Assumption 1 (Uncertainty measure).
• H : ∆X → R+ is C^(2) smooth.
• For all δ > 0, H″(µ) is Lipschitz continuous on ∆_δ = {µ ∈ ∆X | min_x µ(x) ≥ δ}.
• H″(µ) is negative definite for all µ ∈ ∆X.

Assumption 1 imposes conditions on the uncertainty measure H. The first and second conditions are smoothness conditions for mathematical convenience. The third condition rules out the case that H is locally linear at some µ ∈ ∆X. Linearity of H around µ is equivalent to a very uninformative signal structure at prior µ being totally free; I rule this out to avoid technical details related to free information.

Assumption 2 (Bounded capacity). For all dt > 0, there exists c such that:

    f(c′) = 0  if c′ ≤ c·dt,      f(c′) = ∞  if c′ > c·dt

Assumption 2 restricts the cost function f to a hard cap: information is essentially free when its measure is below the per-period capacity c·dt and infinitely costly when it exceeds this capacity. This condition forces the DM to smooth her information acquisition over time. With Assumption 2, the solution to Equation (1) is the solution to the following functional equation:

    V_dt(µ) = max{ F(µ),  max_{p_i, µ_i} e^{−ρdt} Σ_i p_i V_dt(µ_i) }        (3)

    s.t.  H(µ) − Σ_i p_i H(µ_i) ≤ c·dt,    Σ_i p_i µ_i = µ

In the main paper, I maintain Assumptions 1 and 2 from here on and focus on Equation (3) and its continuous time limit. This keeps the analysis clean and lets me focus on the design of the delicate structure of information in a dynamic environment. In Section 4, I replace Assumption 2 with Assumption 2′ and show that the main characterization results are exactly the same in an environment with a flexible cost.

2.3. Continuous Time Limit of Information

Before moving to the continuous time limit of the optimization problem, let us informally study the form of signal structures in the continuous time limit. In a static environment, the conditional distribution formulation used in the rational inattention literature and the posterior belief formulation used in the Bayesian persuasion literature provide great mathematical convenience in their respective problems. In a dynamic environment, tracking the whole trajectory of signal distributions or induced beliefs becomes extremely hard. However, when the time interval converges to zero and the per-period information cost is bounded to be of the same order as the interval length, any dynamic information acquisition strategy can be represented as a combination of two familiar families of basic random processes:

1. Limiting diffusion signals: At prior µ, if a series of experiments ({µ′_{i,dt}}, p_dt(µ′_i)) has µ′_{i,dt} → µ and p_dt·(µ − µ′_{i,dt})² ∼ O(dt), then this series of signal structures is limiting diffusion.
2. Limiting Poisson signals: At prior µ, if a series of experiments ({µ′_{i,dt}}, p_dt(µ′_i)) has p_dt(µ′_i) ∼ O(dt) for all i such that µ′_{i,dt} ̸→ µ, then this series of signal structures is limiting Poisson.

Lemma 4.
Suppose either Assumptions 1 and 2 or Assumptions 1 and 2′ hold. At any prior µ, the optimal policy as dt → 0 must be a combination of limiting diffusion and limiting Poisson signals, i.e. for every convergent subsequence (µ′_{i,dt}, p_dt) and every i, p_dt·(µ − µ′_{i,dt})² ∼ O(dt).

Consider the posterior belief as a random process. Lemma 4 shows that the optimal policy must have flow variance of the same order as the length of the time interval. Therefore, for any convergent (in the sense of both posteriors and probabilities) sequence of processes, there are only two possible limiting behaviors: 1) the posterior belief is bounded away from the prior, in which case its probability must be of the order of the interval length; 2) the posterior belief converges to the prior, in which case the process behaves as a diffusion around the prior with bounded flow variance. In Section 3, I formalize this argument by showing that it is sufficient to consider only these two kinds of processes in the continuous time limit of the optimization problem.

3. Optimal Information Acquisition in Continuous Time

In this section, I restrict attention to a one-dimensional belief space. I derive the proper continuous time limit of Equation (3) and characterize the optimal information acquisition strategy. The continuous time limit of the solution V_dt(µ) to Equation (3) is characterized by a very simple functional equation: at any instant, the DM chooses between an optimal immediate action and continuing information acquisition. Conditional on continuation, the DM essentially allocates the flow capacity between two kinds of signals: 1) Poisson-bandit signals that arrive at a Poisson rate and drive jumps in the posterior belief; 2) a diffusion signal that drives diffusion in the posterior belief. The optimal strategy is even simpler: it involves a single Poisson signal that drives a jump toward confirming the DM's prior belief, and any arrival of the signal is followed by an immediate action. The following assumption is needed to apply tools from ordinary differential equations to this problem:

Assumption 3.
1. (Binary states): |X| = 2, ∆X = [0, 1].
2. (Positive payoff): F(0), F(1) > 0.

Assumption 3 contains two parts. First, the state space is binary, so the belief space is one-dimensional. Second, I assume that knowing the true state ensures the DM a strictly positive utility. This guarantees a non-degenerate region of beliefs at which the DM is sufficiently sure about the state and chooses an immediate action, which is needed to set up a proper boundary condition for the differential equations. The continuous time problem is introduced and derived in Section 3.1; its solution is characterized in Section 3.2.

3.1. Convergence in Continuous Time

Formally, the object I study is the limit of the value function V_dt as the time interval dt converges to zero. Lemma 5 shows the existence of such a limit under the L∞ norm:

Lemma 5. Let V(µ) = lim sup_{dt→0} V_dt(µ). Then ∥V_dt(µ) − V(µ)∥∞ → 0 as dt → 0.

Now I set up a continuous time optimization problem directly. Theorem 1 below guarantees that this problem properly characterizes the V(µ) just defined. Consider the following Bellman equation defined on C^(1) smooth functions:

    ρV(µ) = max{ ρF(µ),  sup_{µ_i∈[0,1], p_i, σ̂∈R+}  Σ_i p_i (V(µ_i) − V(µ) − V′(µ)(µ_i − µ)) + (D²V(µ)/2)·σ̂² }        (4)

    s.t.  Σ_i p_i (H(µ) − H(µ_i) + H′(µ)(µ_i − µ)) − (H″(µ)/2)·σ̂² ≤ c

Despite the use of the general second derivative D²V(µ) (defined in footnote 3), Equation (4) is quite straightforward given the intuition from the last section.
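Before interpreting Equation (4) term by term, here is a quick numerical sanity check of its cost constraint, a sketch of my own assuming the entropy uncertainty measure used in the paper's examples: the cost of a Poisson signal approaches the diffusion cost −H″(µ)σ̂²/2 as the posterior approaches the prior while the flow variance p(ν − µ)² is held fixed, which is exactly the sense in which a diffusion signal is the uninformative limit of a Poisson signal.

import numpy as np

def H(mu):            # entropy-based uncertainty measure
    return -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

def dH(mu):           # H'(mu)
    return np.log((1 - mu) / mu)

def d2H(mu):          # H''(mu)
    return -1.0 / (mu * (1 - mu))

mu, flow_var = 0.4, 0.05                  # prior and target flow variance sigma_hat^2
for gap in [0.2, 0.05, 0.01, 0.001]:
    nu = mu + gap                          # posterior induced by the Poisson signal
    p = flow_var / gap ** 2                # arrival rate keeping p * (nu - mu)^2 fixed
    poisson_cost = p * (H(mu) - H(nu) + dH(mu) * (nu - mu))
    print(round(poisson_cost, 5))          # converges to the diffusion cost below
print(round(-d2H(mu) * flow_var / 2, 5))   # diffusion cost: -H''(mu) * sigma_hat^2 / 2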
The left hand side of Equation (4) is the flow cost of discounting; in a Bellman equation it equals the flow utility gain from continuation. The right hand side maximizes over two terms. The first term is the flow utility loss from not choosing an optimal immediate action. The second, supremum, term is the utility gain from information acquisition over an infinitesimal period. Consider a Poisson signal that arrives at Poisson rate p_i and induces posterior belief µ_i. When it arrives, the utility gain is V(µ_i) − V(µ). When it does not arrive, the posterior belief drifts away from µ_i; a standard result shows that the speed of drift is exactly −p_i(µ_i − µ), so the utility gain from receiving no signal is −p_i V′(µ)(µ_i − µ), because the DM is essentially drifting along the value function. In sum, the term p_i(V(µ_i) − V(µ) − V′(µ)(µ_i − µ)) summarizes the flow gain from waiting for a signal (p_i, µ_i). Consider instead a signal which arrives constantly but with very low informativeness, for example a Wiener process parametrized by the state. The utility gain from waiting for such a signal is the flow variance of the belief process times the local concavity of the value function, (σ̂²/2)·D²V(µ). This is introduced in detail in Moscarini and Smith (2001).

It is not surprising that the cost constraint takes the form in Equation (4). By Equation (3), the utility gain from information acquisition in discrete time is Σ p_i V_dt(µ_i) − V_dt(µ), while the information measure of this information structure is H(µ) − Σ p_i H(µ_i): the utility gain takes the same form as the information cost. In the continuous time limit, applying the same functional form to the information cost yields the constraint in Equation (4).

An informal interpretation of Equation (4) is that the trade-off for the DM is precision versus intensity. When choosing a posterior µ_i further away from the prior µ, the signal is more precise and potentially leads to a higher utility gain upon arrival. But due to the capacity constraint, the DM can only do so at the cost of reducing its arrival rate p_i. When the DM chooses a very fuzzy signal, in the limit she essentially chooses a diffusion signal. It is not hard to see that both D²V and V(µ_i) − V(µ) − V′(µ)(µ_i − µ) characterize the concavity of a function (local and global concavity, respectively). Therefore, the key factor determining the optimal level of experimentation is the relative concavity between the value function V and the uncertainty measure H. When V is relatively more concave than H, the DM values the gain from an informative signal µ_i more than its cost; in this case she is willing to wait for a signal to arrive. When V is relatively less concave than H, the cost of waiting outweighs the gain from information, and the DM stops acquiring information and chooses an immediate action.

Footnote 3: The general second derivative D²V(µ) is defined by D²V(µ) = lim sup_{µ′→µ} [V′(µ′) − V′(µ)] / (µ′ − µ). I make this generalization because at boundaries where V is pasted to F, a C^(1) smooth V is not twice differentiable.

Before discussing the details of the solution to Equation (4), I prove that it properly characterizes V(µ) through the following theorem:

Theorem 1. With Assumptions 1, 2 and 3 satisfied, suppose V(µ) ∈ C^(1) is a solution to Equation (4). Let V_dt(µ) be the solution to Equation (3) and V̄(µ) = lim sup_{dt→0} V_dt(µ). Then V(µ) = V̄(µ).
Theorem 1 proves that whenever a solution to Equation (4) exists, it is unique and coincides with the limit of the solutions to the discrete time problem, Equation (3). The Bellman equation I developed is therefore the proper problem to study. The intuition of the proof is simple and has three steps. Step 1: rewrite the whole functional equation on a larger space of locally Lipschitz continuous functions. By a standard maximum-principle intuition, the V_dt are all Lipschitz continuous, because a DM at one belief can simply mimic the strategy of her neighbours; as a limit, V̄(µ) is locally Lipschitz continuous. Rewrite Equation (4) to accommodate all locally Lipschitz continuous value functions. Step 2: show that V̄(µ) is unimprovable in this optimization problem. Suppose by contradiction that V̄(µ) is improvable; then the strictly dominating strategy (p_i, µ_i) or σ̂² can be discretized so that a discrete time value function V_dt with dt sufficiently small can also be improved. Step 3: show that V̄(µ) equals the solution V(µ) of the functional equation (this proves feasibility automatically). The idea is as follows. Suppose by contradiction the two functions differ, and consider a belief µ at which the distance between them is maximized. At such a µ, the lower of the two functions is weakly "more convex" than the higher one, so jumping/diffusing elsewhere gives the DM a weakly higher utility gain under the lower function: for all µ′, the term V_low(µ′) − V_low(µ) − V_low′(µ)(µ′ − µ) weakly exceeds the corresponding term for the higher function. By the structure of the problem, any signal structure is then weakly more beneficial under the lower function than under the higher one at µ. However, the value function is proportional to the experimentation gain everywhere in the problem, and this leads to a contradiction.

3.2. Characterization of Solution

3.2.1. Geometric Representation

In the last section I spoke loosely of the "relative concavity" of the value function and the uncertainty measure as the key factor determining the optimal strategy. Let us now study this idea more formally. Suppose the value function V(µ) is given and consider the optimization problem of choosing a single posterior belief:

    sup_{ν, p}  p·(V(ν) − V(µ) − V′(µ)(ν − µ))        (5)

    s.t.  p·(H(µ) − H(ν) + H′(µ)(ν − µ)) ≤ c
The only difference is that in current problem, at prior µ, graph of gross value function must coincide with convex hull of its own graph. This result has clear economic meaning. By definition of gross value function, when shadow cost λ is chosen properly such that solving unconstrained problem: sup p(G(ν) − G(µ) − G′ (ν)(ν − µ)) p,ν is equivalent to solving the constrained problem. Whether this problem yields positive payoff depends on whether G(µ) is below or on its upper concave hull. In static maximization problem, G(µ) strictly below its upper concave hull implies strictly positive gain from information. However, in the current problem, strictly positive gain induces the DM to infinitely increase experimentation intensity p. To keep flow cost bounded, the payoff from concavifying gross value function must be exactly zero. One can understand it as the limit that prior belief µ being arbitrarily close to the boundary of a region where b > G. So utility gain from experimentation diminishes at the same order of time interG val length, which matches the magnitude of opportunity cost of waiting. What’s more, suppose V (µ) is achieved by solving Equation (5), we can impose feasibility condition: ρV (µ) = c V (ν) − V (µ) − V ′ (µ)(ν − µ) H(µ) − H(ν) + H ′ (µ)(ν − µ) ρ FOC-p ===⇒ λ = V (µ) c The shadow cost on capacity constraint λ is determined by two terms: ρc is the effective discount factor (increasing flow capacity constraint is equivalent to time passing faster). 11 V (µ) is the value function at µ. This is also intuitive. The shadow cost of violating capacity constraint can be translated to loss in reduced arrival rate, which is exactly discounted continuation value. {Value function} {Uncertainty Measure} V 0 {Gross value function} H 1 G μ 1 μ 0 μ ν 1 μ The left panel shows optimal value function V (µ) (blue line) with F (µ) defined by the dashed lines. The center panel shows the uncertainty measure H(µ), defined as standard Entropy function. The right panel shows the gross value function evaluated at µ, G = V + ρc V (µ)H (blue line). The dashed black line is supporting hyperplane of graph of G. It tangents G at both µ and ν (the optimal posterior). Figure 1: Concavification of gross value function A geometric characterization of my previous analysis is illustrated in Figure 1 4 Left and center panel of Figure 1 shows typical shape of value function V (convex) and information measure H (concave). Right panel shows the gross value function V + ρc V (µ)H with prior µ. The gross value must have the following property: prior (µ, G(µ)) is on upper concave hull of G, the optimal posterior (ν, G(ν)) is also on upper concave hull of G. And they must be on the same supporting hyper plane (the dashed line). On the figure, it’s clear that whether continuing information acquisition is profitable depends on relative convexity of value function versus uncertainty measure. At µ, V is relatively more convex than H and information is sufficiently valuable. Then it’s optimal to wait for an informative signal. When µ approaches boundary, V becomes flat and it will be optimal to choose an action immediately. Remark. So far I focused on the optimization problem choosing an optimal posterior away from prior and totally ignores the possibility of diffusion signals. But it’s not hard for one to see that diffusion signals being optimal can also be represented within the same framework: gross value function G is locally linear but globally concave. 
The choice between Poisson-bandit and diffusion signals is ultimately determined by the local versus global concavity of the gross value function. Finally, if one is interested in exactly which type of evidence (posterior belief) is chosen, perturbing the weight λ in the gross value function provides useful intuition. The shapes of V and H are fixed for a particular problem, so for different priors µ the only factor governing the shape of G is λ = (ρ/c)V(µ). Increasing λ increases the concavity of G, so the tangent points of the supporting hyperplane of G move closer together. That is, λ adjusts the weight on the local versus global convexity of G, which determines the location of the optimal posterior ν. Recall that λ measures the continuation value. This means that with a higher continuation value, the DM is more willing to give up signal precision to achieve a higher signal arrival rate.

Footnote 4: Parameters used to calculate this example: F(µ) = max{0.5µ − 0.2, 0.3 − 0.5µ}, ρ/c = 2, H(µ) = −µ log(µ) − (1 − µ) log(1 − µ).

The discussion of the geometric representation of the optimization problem provides a clear map between the aspects of the information acquisition problem we are interested in and geometric properties of the value function. Table 1 summarizes the key factors determining the four aspects of interest. The first three columns summarize the discussion in this section; the last column is discussed in Section 4, where I endogenize the capacity constraint with an actual cost function.

Table 1: Four aspects of the dynamic information acquisition problem

    Dimension of interest:   Timing of action             | Informativeness of signal        | Type of evidence                             | Experimentation intensity
    Trade-off:               continuation v.s. stopping   | Poisson-bandit v.s. diffusion    | precision v.s. arrival rate                  | N.A.
    Deterministic factor:    relative convexity of V to H | global v.s. local convexity of G | λ adjusting global and local concavity of G  | continuation value

3.2.2. Main Characterization Theorem

In this subsection, I state the main characterization theorem for the value function V(µ), discuss its proof using the geometric representation developed in the last subsection, and explain its economic intuition. Given Theorem 1, to characterize V(µ) it suffices to find a smooth solution to Equation (4). I prove the existence of such a solution and provide the characterization simultaneously by constructing the optimal policy function:

Theorem 2. With Assumptions 1, 2 and 3 satisfied, there exists V ∈ C^(1)[0, 1] solving Equation (4). Let E = {µ ∈ [0, 1] | V(µ) > F(µ)} be the experimentation region. There exists ν : E → [0, 1] \ E such that:

    ρV(µ) = −c · [F(ν(µ)) − V(µ) − V′(µ)(ν(µ) − µ)] / [H(ν(µ)) − H(µ) − H′(µ)(ν(µ) − µ)]

where ν(µ) has the following properties:
1. There exists µ* ∈ (0, 1) such that ν(µ) ≥ µ when µ ≥ µ* and ν(µ) ≤ µ when µ ≤ µ*.
2. ν(µ) is piecewise C^(1) smooth on E.
3. ν(µ) is piecewise strictly decreasing on E.
4. ν(µ) is the unique solution to Equation (4) almost everywhere.

Theorem 2 proves existence of a solution to Equation (4) and characterizes the optimal policy function. The theorem first states that the optimal value function can be achieved through evidence arriving as a Poisson process, i.e. a signal inducing the optimal posterior belief ν(µ). Moreover, since ν maps the experimentation region E into the immediate action region [0, 1] \ E, the DM takes an immediate action upon arrival of the signal. Property 1 says that the optimal signal is confirmatory evidence.
When µ ≥ µ*, the DM holds the prior belief that state 1 is more likely, and she acquires information that induces an even higher posterior belief; vice versa for µ ≤ µ*. Conditional on receiving no signal, the DM's belief drifts towards µ*. Property 3 says that while the belief drifts towards µ*, the posterior belief induced by the signal moves towards 0 or 1: as the DM becomes more and more unsure which state is true, she acquires signals of increasing precision. Finally, properties 2 and 4 state that the optimal policy ν is a well-behaved function and is essentially uniquely defined.

I computed a simple example and present the solution in Figures 2 and 3 (footnote 5). There are four actions, whose payoffs are represented by F(µ), the dashed curve in Figure 2 with three kinks. We refer to the two actions with steeper slopes as the riskier actions and the two actions with flatter slopes as the safer actions. The blue curve in Figure 2 is the value function V(µ); in the shaded region V(µ) > F(µ), so its projection on the horizontal axis is the experimentation region E. Figure 3 shows the optimal policy function ν(µ). Both the blue and red curves depict ν: blue means the action associated with the posterior is a riskier action, red means it is a safer action. As stated in Theorem 2, the policy function is piecewise smooth and decreasing. The three arrowed curves in Figure 2 show the optimal strategy at three different priors: each arrow starts at a prior and points to its optimal posterior. The two blue arrows indicate that the associated action is a riskier action; the red arrow indicates a safer action.

Figure 2: Value function. The blue line is the value function V; the dashed black line is the immediate action payoff F; the shaded region projected on the horizontal axis is the experimentation region E; arrows start from a prior and point to its optimal posterior.

Figure 3: Policy function. The dashed straight line is ν = µ. The blue and red lines are the policy function ν(µ): when the policy function is blue, the optimal posterior leads to one of the two outer actions (riskier actions); when it is red, the optimal posterior leads to one of the two inner actions (safer actions).

Footnote 5: Parameters used in this example: ρ/c = 3, F(µ) = max{0.5µ − 0.25, 1.3µ − 1, 0.25 − 0.5µ, 0.3 − µ}, H is the standard entropy function.

Sketched proof: I prove Theorem 2 by construction and verification. I conjecture that the optimal policy for Equation (4) takes the form in Theorem 2: a single confirmatory signal associated with an immediate action. I then construct V(µ) and ν(µ) in three steps. Step 1 is to determine the critical belief µ*. It can be calculated as essentially the unique belief at which acquiring the optimal signal with a higher posterior and acquiring the optimal signal with a lower posterior yield the same payoff, i.e. the unique belief at which a stationary information acquisition strategy is optimal. Step 2 is to fix an action and solve for the optimal policy function. Once the action is fixed, the only parameter to choose is the posterior belief ν. The problem is:

    ρV(µ) = max_ν  −c · [ (u(a,1)ν + u(a,0)(1 − ν)) − V(µ) − V′(µ)(ν − µ) ] / [ H(ν) − H(µ) − H′(µ)(ν − µ) ]

Since the action a is fixed, the kinked function F is replaced by the linear function F_a(ν) = u(a,1)ν + u(a,0)(1 − ν), and the objective is sufficiently smooth in ν.
The first order condition with respect to ν, combined with the feasibility condition, yields a well-behaved first order ordinary differential equation characterizing ν(µ). We can therefore solve for the optimal policy ν and calculate the value V(µ) accordingly. Step 3 is to update the value function with respect to all alternative actions and smoothly paste the solved value function together piece by piece. This step starts by using the value and policy at µ* as the initial condition for the ODE defined in Step 2, and then extends the value function towards µ ∈ {0, 1}. Whenever a belief is reached at which two actions yield the same payoff, a new ODE is set up with the new action. The process continues until the calculated value function V(µ) smoothly pastes to F(µ).

Verification also takes three steps. Step 1 is to verify that V(µ) is optimal with respect to repeated experimentation; Step 2, with respect to non-confirmatory signals; and Step 3, with respect to diffusion signals. The intuition of the proofs is easy to understand with the geometric characterization introduced in the last subsection.

First, suppose by contradiction that repeated experimentation is profitable at some belief µ. Consider G(ν) = F(ν) + (ρ/c)V(µ)H(ν), the gross value function with immediate action; then (ν(µ), G(ν(µ))) is on the upper concave hull of G. If instead we allow continued experimentation and take G̃(ν) = V(ν) + (ρ/c)V(µ)H(ν), the gross value function with continued experimentation, there would be a point (ν′, G̃(ν′)) strictly above the upper concave hull of G. As shown in Figure 4, the red curve is this hypothetical higher gross value function. Take the point (ν′, G̃(ν′)) on the hypothetical function that is furthest above Ĝ; then ν′ is clearly the only point of G̃ touching its supporting hyperplane (the red dashed line). Now consider the maximization problem at ν′: since ν′ ≥ µ, λ is even higher and the gross value function at ν′ is even more concave, so ν′ must still be the unique point touching its supporting hyperplane. This means V(ν′) is not feasible, which is a contradiction.

Figure 4: Proof of no repeated experimentation. Figure 5: Proof of confirmatory evidence. Figure 6: Proof of Poisson signal.

Second, consider the gross value function at µ*. By construction, a stationary experiment is optimal at µ*, so the supporting hyperplane must touch G at two posteriors, one on each side of µ*. The solution to the ODE shows that V is minimized at µ*. Therefore, at any belief other than µ*, the gross value function must be strictly more concave. It is not hard to see from the shape of the gross value function in Figure 5 that with a more concave gross value function, at µ > µ* the supporting hyperplane can only touch at an even higher posterior belief, and vice versa for µ < µ*.

Finally, suppose a diffusion signal strictly dominates Poisson signals at µ; then the gross value function has the shape in Figure 6. Considering beliefs around µ, the shape of the gross value function implies that they can be achieved through concavification only if they have a lower λ. This would make µ a local maximizer of V, contradicting the quasi-convex shape of V.

Economic intuition: The timing of action is determined by the kinks of F(µ). Geometrically, at kinks of F, F is infinitely convex relative to H, so continuation is always locally profitable; in flat areas of F, H tends to be sufficiently concave that immediate action is optimal.
Every kink of F (µ) is a critical belief at which arbitrarily small shift in posterior belief will shift the choice of optimal action. Therefore, at those beliefs infinitesimal amount of information will be valuable while cost diminishes at second order. As a result, the shape of experimentation region E takes the form of disjoint intervals around kinks of F (µ). Type of evidence is determined by level of continuation value. Geometrically, when µ is further away from µ∗ , λ is larger. So global concavity of G is relatively lower and optimal posterior will be closer to prior as a result. When the DM is more sure about the state the continuation value is high, thus cost from discounting (parametrized by λ) is high. The DM wants to receive a signal and enjoy the high value soon. Therefore, optimal signal will be with lower precision and higher arrival rate. Vice versa, when DM is more ambiguous about the state the continuation value is low. The DM is willing to suffer from long waiting for a more precise signal. Diffusion signals will only be used as a limit when the DM is close to the boundary of experimentation region. 4. Extensions In this section, I’m going to study several extensions of the baseline model. The first extension is to endogenize the capacity constraint and study the dynamics of experimentation intensity jointly with choice of evidence type and precision. I setup a continuous time model which allows the DM to pay a convex cost on measure of information. The standard monotonic property6 still holds on experiment intensity while the dynamics of evidence type and precision are same as characterized in baseline model. The second extension is to relax the finite action assumption. I showed that problem with continuum of actions can be approximated well by adding actions, in the sense of both value function and policy function. The third extension is to generalize structure of information measure and study robustness of the main characterization result. I modeled a generic set of information measure in continuous time and showed that it’s generally true that informative Poisson signal is strictly superior to diffusion signals. The forth extension is to connect the dynamic model to well studied static models. The dynamic model in 6 Optimal cost function is isomorphic to value function. Proposition 1 of Moscarini and Smith (2001). 16 current paper will reduce to a static rational inattention model when the agent is fully patient or when cost function is linear. 4.1. Convex Flow Cost In addition to the dimensions studied in baseline model, I am also interested in optimal experimentation intensity when flow capacity is also endogenized. In this extension, I replace Assumption 2 with Assumption 2′ . Assumption 2′ (Convex flow cost). Function h : R+ 7→ R+ is C (2) smooth, h′ (c) > 0, (C ) ∃ε > 0 s.t. h′′ (c) ≥ ε > 0, ∀dt > 0. f (C) is defined by f (C) = dt · h dt Assumption 2′ restrict the cost function f to be C (2) smooth strictly convex function: acquiring additional unit of information will be of increasing marginal cost. This gives the DM incentive to smooth the acquisition of information over time instead of learning everything and making decision instantly. With Assumption 2′ , solution to Equation (1) will be the solution to following functional equation: { ∑ ( ) } ∑ H(µ) − p H(µ ) i i pi Vdt (µi ) − h Vdt (µ) = max F (µ), supe−ρdt dt (7) dt pi ,µi ∑ s.t. 
pi µi = µ The continuous time version of Equation (7) will be the following Bellmen equation on V ∈ C (1) [0, 1]: { } 2 ∑ D V (µ) ρV (µ) = max ρF (µ), max σ̂ 2 − h(c) pi (V (µi ) − V (µ) − V ′ (µ)(µi − µ)) + µi ∈[0,1], 2 pi ,σ̂,c∈R+ (8) where c = − ∑ pi (H(µi ) − H(µ) − H ′ (µ)(µi − µ)) − ′′ H (µ) 2 σ̂ 2 Equation (8) is an analog to Equation (4). Flow gain and cost of experimentation are defined in the same way as in Equation (4). The only difference is that instead of imposing a constraint on maximal flow cost c, I assume the DM actually pays a convex function of flow cost h(c). ′ Theorem 3. With Assumptions 1, there exists V ∈ C (1) [0, 1] solving 2 and 3 satisfied, { } Equation (8). Let E = µ ∈ [0, 1]V (µ) > F (µ) be experimentation region, there ∃ ν : E 7→ [0, 1] \ E , c ∈ C (1) (E) s.t. ρV (µ) = − c(µ) F (ν(µ)) − V (µ) − V ′ (µ)(ν(µ) − µ) − h(c(µ)) H(ν(µ)) − H(µ) − H ′ (µ)(ν(µ) − µ) where ν and c have the following properties: 1. ∃µ∗ ∈ (0, 1) s.t. ν(µ) ≥ µ when µ ≥ µ∗ and ν(µ) ≤ µ when µ ≤ µ∗ . 2. ν(µ) is piecewise C (1) smooth on E. 3. ν(µ) is piecewise strictly decreasing on E. 4. c(µ) is strictly quasi convex and minimized at µ∗ . 17 5. (ν(µ), c(µ)) is unique solution to Equation (8) almost everywhere. Theorem 3 proves existence of solution to Equation (8) and provides characterization of optimal strategy. Other than property 4, the optimal strategy shares the same set of properties as in Theorem 2. The optimal value function can be achieved through evidences arriving as Poisson process. The DM takes immediate action upon arrival of signals. Unique optimal signal will be confirmatory evidence arriving at increasing precision conditional on continuation. Property 4 states that intensity of experimentation (parametrized by flow cost) is higher when prior belief is further away from µ∗ . Since the belief process is always drifting towards µ∗ conditional on continuation, this implies that the DM will invest decreasing amount of resources into information acquisition. The intuition for dynamics of experimentation intensity is already provided in Moscarini and Smith (2001). The marginal gain from experimentation is proportional to continuation value while marginal cost is increasing in c. Therefore, the optimal cost is actually isomorphic to value function. In Moscarini and Smith (2001) intensity and precision of experimentation shares the same parameter (flow variance of diffusion process). My analysis identifies this intuition separately from the intuition of lower signal precision associated with higher continuation value. 4.2. Continuum of Actions In this section, I will study extension of my model to accommodate a continuum of actions in the underlying decision problem i.e. |A| = ∞. Then mathematically, the only difference is that value from immediate action is no-longer a piecewise linear function: F (µ) = sup E[u(a, x)] a∈A There will be several technical problems arising from a continuum of actions. For example whether the supremum is achievable and whether F has bounded subdifferential. We impose the following assumptions: Assumption 4 (Continuum of actions). F (µ) = maxa∈A E [u(a, x)] has bounded subdifferentials. Assumption 4 rules out two cases. First is that the supremum is not achievable. Second case is some optimal action being infinitely risky: the optimal action with belief approaching x = 0 has utility approaching −∞ with belief approaching x = 1 (and similar case with states swapped). A sufficient condition for Assumption 4 will be: Assumption 4′ . A is a compact set. 
∀x ∈ X, u(a, x) ∈ C(A) ∩ TB(A). The proof of Theorem 1 doesn’t rely on the fact that F (µ)is piecewise linear. Actually the only necessary properties of F (µ) is boundedness and continuity in Lemma 3 that proves existence of solution to discrete time functional equation Equation (A.1). Therefore Assumption 4 guarantees that Lemma 3 and Lemma 5 still hold when there is a continuum of actions. With Assumption 4, problem with continuum of actions can be approximated well by a sequence of problems with discrete actions. 18 Lemma 6 (Convergence of value function). With Assumptions 1, 2, 3 and 4 satisfied, let Vdt (F ) be the unique solution of Equation (3) and V(F ) = limdt→0 Vdt (F ). Then V is a Lipschitz continuous functional under L∞ norm. Lemma 6 states that a problem with continuum of actions can be approximated well by sequences of problems with discrete actions in the sense of value function convergence. Next, I push the convergence criteria further to convergence of policy function. Theorem 4 (Convergence of policy function). With Assumptions 1, 2, 3 and 4 satisfied, let {Fn } be a set of piecewise linear functions on [0,1] satisfying: 1. ∥Fn − F ∥∞ → 0; 2. ∀µ ∈ [0, 1], lim Fn′ (µ) = F ′ (µ). Define Vdt (Fn ) as the solution to Equation (3). Define functional V(F ) = limdt→0 Vdt (F ). Then |V(F ) − V(Fn )| → 0. What’s more: 1. V(F ) solves Equation (4). 2. ∀µ s.t. V (µ) > F (µ), let νn be maximizer of V(Fn ) s.t. ν = limn→∞ νn exists, then ν achieves V(F ) at µ. Theorem 4 states that to solve for a continuous time problem with a continuum of actions, one can simply use both value function and policy function from problem with finite actions to approximate. As long as the immediate action values Fn converge both uniformly in value and pointwise in first derivative, the optimal value functions have a uniform limit. The limit will solve Equation (4) and the optimal policy function will be the pointwise limit of policy functions for finite action problems. V 0.6 ν 1.0 0.5 0.9 0.4 0.8 0.3 0.7 0.2 0.6 0.1 0.6 0.7 0.8 0.9 1.0 μ 0.5 0.6 0.7 0.8 0.9 1.0 μ The left panel shows the optimal policy function of discrete actions (red) and continuous actions (blue). The dashed line is ν = µ. The right panel shows the optimal value function. The thin black line is value from immediate action F (µ), the dashed lines are discrete approximations of the continuous action value. Figure 7: Approximation of a continuum of actions Figure 7 illustrates this approximation process. On both panels, only µ ∈ [0.5, 1] is plotted. On the right panel, the thin black curve shows a smooth F (µ) associated with continuum of actions. Since optimal policy only utilizes a subset of actions, I approximate 19 the smooth function only locally as the upper envelope of dashed lines (each represents one action). The optimal value function with continuous actions is blue curve and the approximation is red curve. Left panel shows the approximation of policy function. The blue smooth curve is optimal policy of continuous action problem and the red curve with breaks is optimal policy of finite action problem. To approximate a smooth F (µ), one can simply add more and more actions to the finite action problem and use F ’s supporting hyper planes to approximate it. Then the optimal policy function will have more and more breaks as optimal policy will involve more frequent jumping among actions. In the limit, as number of breaks grow to infinity, the size of breaks shrinks to zero and approaches a smooth policy function. 4.3. 
General Information Measure In this section, I extend my model to information measures other than the posterior separable measure. A general critique to posterior separable measures including mutual information is that the information measure of the same Blackwell experiment is prior dependent. I setup a continuous time Bellmen equation with a general information cost structure which imposes no specific link between prior and posterior. I want to show that one key feature—Evidence seeking I identified in the baseline model is generic. Let J(µ, ν) be a sufficiently smooth function. Consider the following functional equation: { } ∑ ρV (µ) = max ρF (µ), sup pi (V (νi ) − V (µ) − V ′ (µ)(νi − µ)) + σ 2 V ′′ (µ) (9) pi ,νi ,σ 2 s.t. ∑ ′′ pi J(µ, νi ) + σ 2 Jνν (µ, µ) ≤ c The objective function of Equation (9) is exactly the same as that of Equation (4). I assume the DM choose both Poisson signals and diffusion signal. The gain from experimentation is assumed in a way that diffusion signal is consistent with the uninformative limit of Poisson signal. The information measure is different. I assume J(µ, ν) to be an arbitrary function which is both prior and posterior dependent (of course this can also accommodate prior independent measures). Cost of Poisson signal is assumed to be pi J(µ, νi ) instead of a totally general function on (pi , µ, νi ) to capture the fact that DM can always break a signal into several ones with lower arrival rate but inducing same posterior. Cost of diffusion signal is assumed to be consistent with uninformative limit ′′ (µ, µ). We impose following of Poisson signal. So Taylor expansion implies it being σ 2 Jνν assumptions on J(µ, ν): ′′ Assumption 5. J ∈ C (4) (0, 1). ∀µ ∈ (0, 1), J(µ, µ) = Jν′ (µ, µ) = 0, Jνν (µ, µ) > 0. First J is assumed to be sufficiently smooth to eliminate technical difficulty. J(µ, µ) = ′′ (µ, µ) > 0 is neces= 0 is necessary for uninformative experiment being free. Jνν sary for any signal that’s not totally uninformative being costly. Within this continuous time framework, the assumptions imposed on J is without loss of generality. Jν′ (µ, µ) 20 Theorem 5. Suppose V ∈ C (3) (0, 1) solves Equation (9) and Assumption 5 is satisfied, let L(µ) be defined by: (3) (3) (3) ρ ′′ 2Jννµ (µ, µ)2 + Jννν (µ, µ)Jννµ (µ, µ) (4) (4) L(µ) = Jνν (µ, µ)2 − + Jνννµ (µ, µ) + Jννµµ (µ, µ) ′′ c Jνν (µ, µ) Then in the open region: { } D = µV (µ) > F (µ) and L(µ) ̸= 0 The set of µ s.t.: ρV (µ) = c V ′′ (µ) ′′ (µ, µ) Jνν will be of zero measure. The interpretation of Theorem 5 is that Poisson signal is almost always going to be strictly superior than diffusion signal. In the experimentation region where L(µ) ̸= 0, V (µ) can be achieved by a diffusion signal only at a zero measure of points. L(µ) = 0 is a partial differential equation on J(µ, ν). Therefore, the set of points that L(µ) = 0 could contain interval only when J(µ, ν) is locally solution to the PDE. Solution to a specific PDE is a non-generic set in the set of all functions satisfying Assumption 5. In this sense, for an arbitrary information measure J(µ, ν), the optimal policy function will not contain diffusion signal almost everywhere. A simple sufficient condition for L(µ) ̸= 0 is Assumption 6. (3) Assumption 6. ∀µ ∈ (0, 1), Jννµ (µ, µ) = 0. Assumption 6 states that the local concavity of J w.r.t. ν at ν = µ is constant for different µ. The information measure I defined in main model satisfies this. With Assumption 6, I get a direct corollary of Theorem 5: Corollary 6. 
With Assumption 6, I get a direct corollary of Theorem 5:

Corollary 6. Suppose V ∈ C^(3)(0, 1) solves Equation (9) and Assumptions 5 and 6 are satisfied. Then the set of µ such that

ρV(µ) = c V″(µ) / J″_νν(µ, µ)

is of zero measure.

4.4. Connection to Static Problem

In this section, I establish the connection between the dynamic information acquisition problem studied in the current paper and the static information acquisition problems studied in the literature. I show that in two kinds of limit, constant discounting and linear flow cost, the dynamic information acquisition problem reduces to a fairly standard static rational inattention problem as studied in Matejka and McKay (2014). Constant discounting is discussed in Section 4.4.1 and linear cost in Section 4.4.2.

4.4.1. Constant discounting

In this subsection, I study the case in which the DM does not discount her utility exponentially. Instead, she pays a constant flow cost m > 0 for waiting. The Bellman equation becomes:

m = sup_{p_i, ν_i, σ} ∑_i p_i (V(ν_i) − V(µ) − V′(µ)(ν_i − µ)) + (1/2) σ² V″(µ)    (10)
s.t. ∑_i p_i (H(µ) − H(ν_i) + H′(µ)(ν_i − µ)) − (1/2) σ² H″(µ) ≤ c

Equation (10) differs from Equation (4) only on the left-hand side: the flow loss from waiting, ρV(µ), is replaced by the fixed flow cost m. We look for a smooth function V such that (1) V(0) = F(0), V(1) = F(1), and (2) in the region where V(µ) > F(µ), V is twice differentiable and the functional equation (10) is satisfied.

Theorem 7. Suppose Assumption 1 is satisfied. There exists a unique V ∈ C^(1)[0, 1] solving Equation (10) with the following properties:
1. {µ | V(µ) > F(µ)} consists of a finite number of open intervals ∪I_n.
2. In each interval I_n, there exists a linear function L_n(µ) s.t. V(µ) = L_n(µ) − (m/c) H(µ).

Corollary 8. V solving Equation (10) also solves the following problem:

V(µ) = sup_{S,A} E[u(A, X)] − (m/c) I(S; X)    (11)
s.t. X → S → A

Theorem 7 and Corollary 8 show that when discounting is replaced by a constant flow cost of waiting, the value function of the dynamic information acquisition problem is identical to the value function of a static information acquisition problem with linear cost (m/c) I(S; X). Theorem 7 also shows that the optimal value function V is a linear transformation of the uncertainty measure H in every experimentation region. Therefore, both

c [V(ν) − V(µ) − V′(µ)(ν − µ)] / [H(µ) − H(ν) + H′(µ)(ν − µ)]   and   −c V″(µ) / H″(µ)

equal m, which implies that the actual form of the information structure does not matter. The DM is essentially solving the following problem: she first decides how many periods to spend accumulating capacity; she then chooses the optimal information strategy subject to the total capacity; finally she divides the experiment arbitrarily over time so that the flow cost constraint is satisfied in every period. Utility from the action is not discounted, and the cost of waiting is m/c times the total capacity accumulated. This is exactly the static problem in Equation (11).

A different interpretation of Theorem 7 is that in the limit where the DM is fully patient (ρ = 0) but still pays for capacity, the dynamic problem reduces to the static problem with linear cost. With this interpretation, the result is logically straightforward. A numerical sketch of the static problem (11) follows below.
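The following is a minimal numerical sketch of the static problem (11) (my own illustration; the two-action payoff F(ν) = max{ν, 1 − ν}, Shannon entropy as H, and the ratio m/c = 0.5 are hypothetical choices). It uses the concavification logic of Theorem 10 with a linear cost: with λ = m/c fixed, the value is V(µ) = Co(F + λH)(µ) − λH(µ), computed here on a grid.

import numpy as np

# Static problem (11) with posterior separable cost: V(mu) = Co(F + k*H)(mu) - k*H(mu),
# where k = m/c, H is binary Shannon entropy and F is the immediate-action value.
nu = np.linspace(1e-6, 1.0 - 1e-6, 2001)
F = np.maximum(nu, 1.0 - nu)                                  # two hypothetical actions
H = -(nu * np.log(nu) + (1.0 - nu) * np.log(1.0 - nu))        # binary Shannon entropy
k = 0.5                                                       # m/c, unit cost of information

def upper_concave_envelope(x, y):
    # Upper concave envelope of the points (x, y), evaluated back on the grid x.
    hull = [0]
    for i in range(1, len(x)):
        # pop the last kept point while it lies weakly below the chord to the new point
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            if (y[i1] - y[i0]) * (x[i] - x[i0]) <= (y[i] - y[i0]) * (x[i1] - x[i0]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], y[hull])

V = upper_concave_envelope(nu, F + k * H) - k * H
print("value at prior 1/2:", round(float(V[len(nu) // 2]), 4))   # exceeds F(1/2) = 0.5

Whenever the envelope lies strictly above F + λH at the prior, information is acquired, and the beliefs where the envelope touches F + λH identify the optimal posteriors.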
4.4.2. Linear flow cost

In this subsection, I study the case in which, instead of a convex flow cost on the information measure (Assumption 2′), the DM pays a linear cost.

Assumption 2″ (Linear flow cost). The function h is defined by h(c) = λc.

Theorem 9. Suppose Assumptions 1 and 2″ are satisfied. Then the solution to the following two-period problem:

V(µ) = max{ F(µ), sup_A E_µ[ e^(−ρdt) u(A, X) − λ I(A; X) ] }    (12)

solves Equation (1).

Theorem 9 states that when the flow cost function h is linear, the value function of the dynamic information acquisition problem is identical to that of a two-stage problem: the DM either takes an immediate action, or acquires information at linear cost λI(A; X) and delays the action to the next period. This is almost a static problem. In fact, in the dynamic problem of Section 2 I assumed, for simplicity, that the action is delayed one period relative to the information. If we tweak the timing slightly to allow the DM to use information acquired in the current period for the current period's action, then the solution to Equation (1) reduces entirely to a static problem with linear cost.

The intuition for this result is simple. Suppose a dynamic information acquisition strategy involves an action being taken at some time t. Combining all information acquired before t into period t strictly reduces the total discounted cost, because all costs are delayed. The strategy then becomes waiting for t periods doing nothing and applying a static strategy in period t. If this strategy yields a positive payoff, it is profitable not to discount the utility at all and to implement the static strategy in the first period.

On one hand, this section shows that my model can serve as a dynamic foundation for the static rational inattention model: the optimal information acquisition strategy in the static RI problem can be implemented by optimally searching for informative evidence arriving at a Poisson rate. On the other hand, the analysis in this section implies that the dynamic framework set up in the current paper is quite tight. If we accept the posterior separable information measure (i.e. that the cost of an information structure is linearly separable), then a convex flow cost on information is necessary to give the DM an incentive to smooth information acquisition over time, and real discounting is necessary to make the time distribution of the information process matter.

5. Discussions and Conclusion

5.1. Information Measure

All assumptions I explicitly made (Assumptions 1, 2 and 3) are either purely technical or can be extended to the general case. The only real restriction imposed by this framework is the structure of the information measure. The information measure used in the current paper is a generalized version of the Kullback-Leibler divergence (GKLD; the standard KLD is introduced in Kullback and Leibler (1951)). (Generalized) KLD measures the distance between prior and posterior beliefs by the expected change in an entropy-like generalized uncertainty measure between the two distributions. This helps me illustrate the key mechanism in the simplest way. The key trade-off in the dynamic information acquisition problem is between the gain from information (measured by the concavity of the value function) and the loss from the information cost (measured by the information measure). With KLD, these two terms are directly comparable. The gross value of an information structure can be measured by a linear combination of the value function and the uncertainty measure, and the optimal choice of information structure is determined by the concavity of this gross value function. Therefore, studying the endogenous multiplier that adjusts the weight on value and cost is the key. What is more, there is a clean geometric representation of the whole problem, sketched below.
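To make this geometric representation concrete, here is a short specialization (my own restatement, not in the original text) of the continuous time Bellman equation to a single Poisson signal with no diffusion component. If the DM waits for one signal that moves the belief from µ to ν and arrives at rate p, the binding flow constraint p J(µ, ν) = c, with J(µ, ν) = H(µ) − H(ν) + H′(µ)(ν − µ), pins down p, and in the experimentation region the Bellman equation becomes

ρV(µ) = sup_ν  c [ V(ν) − V(µ) − V′(µ)(ν − µ) ] / [ H(µ) − H(ν) + H′(µ)(ν − µ) ].

At each candidate posterior ν, the numerator is the gain in value net of its tangent at µ and the denominator is the loss in the uncertainty measure net of its tangent at µ; the optimal jump target maximizes this gain-to-cost ratio. This is the same ratio form used to construct the optimal value function piece by piece in the proof of Theorem 2 (Appendix B.2.1).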
This convenience comes at a loss of generality in the following sense. GKLD restricts the measure of a "compound experiment" to be linearly separable in its components. Geometrically speaking, the function governing the information cost (H) at one prior must be exactly the same function at every other prior. As shown in the proof of Theorem 2, this is critical for establishing the immediate action property and the confirmatory evidence property. One can imagine an information measure with the following property: there is a small threshold such that when the distance between the prior and the induced posterior is below the threshold, information is almost free, while jumping to a posterior beyond the threshold incurs a rapidly increasing cost. It would then be very profitable to break a long jump into several small ones. If the underlying decision problem makes informative signals very useful (F is globally very convex), the DM will find it profitable to jump multiple times before taking an action. One can also imagine an information measure that makes long jumps very cheap compared with short ones. It might then be profitable to choose contradictory evidence when confirmatory evidence does not increase the arrival rate by much (approximating the baseline setup in Che and Mierendorff (2016)). To sum up, among the properties characterized in the current paper, the optimality of Poisson signals and the monotonicity of experimentation intensity are generally robust to alternative structures of the information measure. The immediate action and confirmatory evidence properties are more specific to the generalized KLD framework, or equivalently to the linear separability of the information measure.

5.2. Conclusion

This paper provides a framework for the dynamic information acquisition problem which allows fully general design of the information flow, and characterizes the optimal information acquisition strategy. My first contribution to the literature is a robust optimization foundation for a family of simple random processes: for a generic information acquisition problem with flexibility in experiment design, Poisson signals which induce jumps in the posterior belief are the endogenously optimal form of signal structure. Second, I contribute to the discussion of optimal experimentation design by providing a complete characterization of the optimal solution for a class of problems with posterior separable measures: it is optimal to seek informative evidence which confirms the prior belief and leads to immediate action, with increasing precision and decreasing intensity over time. This paper also contributes to the rational inattention literature by providing a dynamic foundation: static rational inattention is both the full patience limit and the linear flow cost limit of the dynamic information acquisition problem.
Aumann, R. J., Maschler, M., and Stearns, R. E. (1995). Repeated Games with Incomplete Information. MIT Press.
Blackwell, D. (1951). Comparison of experiments. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 93–102.
Che, Y.-K. and Mierendorff, K. (2016). Optimal sequential decision with limited attention.
Cover, T. M. and Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.
Kamenica, E. and Gentzkow, M. (2009). Bayesian persuasion. Technical report, National Bureau of Economic Research.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
Matejka, F. and McKay, A. (2014). Rational inattention to discrete choices: A new foundation for the multinomial logit model. The American Economic Review, 105(1):272–298.
Moscarini, G. and Smith, L. (2001). The optimal level of experimentation. Econometrica, 69(6):1629–1644.
Sims, C. A. (2003). Implications of rational inattention. Journal of Monetary Economics, 50(3):665–690.
Wald, A. (1947). Foundations of a general theory of sequential decision functions. Econometrica, pages 279–313.

Contents

A Proofs in Section 2  26
  A.1 Concavification  26
  A.2 Simplification of Information Structure  30
    A.2.1 Proof of Lemma 1  30
    A.2.2 Proof of Lemma 2  30
    A.2.3 Proof of Lemma 3  35
  A.3 Continuous Time Limit of Experiments  38
    A.3.1 Proof of Lemma 4  38
B Proofs in Section 3  40
  B.1 Convergence  40
    B.1.1 Proof of Lemma 5  40
    B.1.2 Proof of Theorem 1  42
  B.2 Characterization  51
    B.2.1 Proof of Theorem 2  51
C Proofs in Section 4  74
  C.1 Convex Flow Cost  74
    C.1.1 Proof of Theorem 3  74
  C.2 Continuum of Actions  90
    C.2.1 Proof of Lemma 6  90
    C.2.2 Proof of Theorem 4  91
  C.3 General Information Measure  97
    C.3.1 Proof of Theorem 5  97
  C.4 Connection to Static Problem  99
    C.4.1 Proof of Theorem 7  99
    C.4.2 Proof of Corollary 8  101
    C.4.3 Proof of Theorem 9  101

A. Proofs in Section 2

This section contains formal proofs for theorems and lemmas in Section 2.

A.1. Concavification

Theorem 10 (Concavification). Let X be a finite state space, V ∈ C(∆X), µ ∈ ∆X, H ∈ C(∆X) non-negative, and f : R+ → R+ increasing and convex. Then there exists τ with |supp(τ)| ≤ 2|X| solving:

sup_{τ ∈ ∆²X} E_τ[V(µ′)] − f(H(µ) − E_τ[H(µ′)])    (A.1)
s.t. E_τ[µ′] = µ

Moreover, there exist C, λ ∈ R+ such that C = H(µ) − E_τ[H(µ′)], E_τ[µ′] = µ, λ ∈ df(C), and:

Co(V + λH)(µ) = E_τ[(V + λH)(µ′)]

Proof. We first show that, fixing C > 0, there exist λ ≥ 0 and τ as defined in Theorem 10 such that the following constrained maximization problem is solved by τ:

E_τ[V(µ′)] = sup_{p ∈ ∆²X} E_p[V(µ′)]    (A.2)
s.t. E_p[µ′] = µ, H(µ) − E_p[H(µ′)] ≤ C

• Define the correspondence c∗ : R+ ⇒ R+ as:

c∗(λ) = { H(µ) − E_p[H(µ′)] | |supp(p)| ≤ |X|, E_p[µ′] = µ, Co(V + λH)(µ) = E_p[(V + λH)(µ′)] }

where Co(V + λH) is defined as the upper concave hull of V + λH. First, we show that c∗(λ) has non-empty graph.
∀λ ≥ 0, if (V + λH)(µ) = Co (V + λH) (µ), then let p = δµ and c∗ (λ) = {0}. If Co(V + λH)(µ) > (V + λH)(µ), then by definition of upper concave hull, it must be a cavexification of the graph of V +λH. Since ∆X is |X|−1 dimensional, the graph of V + λH is |X| dimensional. Therefore, there exists at most |X| ∑ ∑ different (µi ) and αi = 1 s.t. α1 (µi , (V + λ(H))(µi )) = (µ, Co(V + λH)(µ)). Let τ be a distribution defined as taking probability αi on posteriors µi , then τ ∈ c∗ (λ). Second, c∗ has closed graph. Consider any λn → λ, cn → c. Let pn be the corresponding signal structure s.t. H(µ) − Epn [H(µ′ )] = cn . pn can be summarized by |X| n × ∆(X). By compactness of Euclidean space, we can find a converging (µ′n i , pi ) ∈ R ∑ k subsequence of µ′n → µ′i , pni k → pi . Continuity of H guarantees H(µ)− pi H(µ′i ) = c. i ∑ Convergence guarantees pi µ′i = µ. Then Co(V + λH)(µ) ≥ Ep [(V + λH)(µ′ )]. Continuity of H and V guarantees that Ep [(V + λH)(µ′ )] = lim Co(V +λn H)(µ). To porve that c ∈ c∗ (λ), it’s sufficeint to prove that Co(V +λH)(µ) is continuous in λ. ∀λ′ ∈ R+ , ∀gi = (µi , (V + λH)(µi )) in graph of (V + λH), we pick gi′ = (µi , (V + λ′ H)(µi )) in graph of (V + λ′ H). Let M = maxµ H, then |gi − gi′ | ≤ |λ − λ′ | M . Suppose ∑ (µ, Co(V + λH)(µ)) = αi gi , then: ∑ ∑ ∑ ′ ′ α i gi − αi gi = αi gi − (µ, Co(V + λH)(µ)) ≤ |λ − λ′ | M Therefore, Co(V + λ′ H)(µ) ≥ Co(V + λH)(µ) − |λ − λ′ | M . Since λ and λ′ are interchangeble, we actually showed that Co(V + λH)(µ) is Lipschtiz continuous in λ with parameter M . Therefore c ∈ c∗ (λ). • The next step is to convexify c∗ (λ). We take b c∗ (λ) = cov (c∗ (λ)). Then b c∗ also has closed graph: Consider any λn → λ, cn → c, the corresponding signal structure can be summarized by (αn , µ′i n , pni , µ′j n , pnj ). By compactness of Euclidean space, we can find converging subsequence to (α, µ′i , pi , µ′j , pj ). We know that (µ′i , pi ) and (µ′j , pj ) are both ∑ ∑ ∑ in c∗ (λ). c = H(µ) − α pi H(µi ) − (1 − α) pj H(µj ) = α (H(µ) − pi H(µi )) + (1 − ∑ α) (H(µ) − pj H(µj )) ∈ cov(c∗ (λ)). • Let C = limλ→+∞ {c∗ (λ)}. C must be zero. Suppose not, then H(µ) > Epn [H(µ′ )] + ε for λn → ∞ and pn defined as in c∗ (λn ). Then Epn [(V + λn H)(µ′ )] − λn H(µ) ≤ 27 Epn [V (µ′ )] − ε) is going to −∞. However, Co(V + λn H)(µ) − λn H(µ) ≥ V (µ) is positive. Let C = sup {c∗ (0)} is the cost associated with unconstrained maximization of Ep [V ]. Therefore for any C in [0, C], since b c∗ has convex value and closed graph, exists λ s.t. b c∗ (λ) = C. For any C > C, we take λ = 0. Since the image of c∗ is one-dimensional, by convexification we took in last step, there exists τ1 , τ2 satisfying the definition of c∗ (λ) s.t. H(µ) − Eατ1 +(1−α)τ2 [H(µ′ )] = C. Let τ = ατ1 + (1 − α)τ2 , then |supp(τ )| ≤ 2 |X|. • Suppose now exists p s.t. Ep [µ′ ] = µ, H(µ)−Ep [H(µ′ )] ≤ C and Ep [V (µ′ )] > Eτ [V (µ′ )]. Then by definition Ep [V (µ′ ) + λ(H(µ′ ) − H(µ))] > Eτ [V (µ′ ) + λ(H(µ′ ) − H(µ))]. This contradicts the fact that Eτ [(V + λH)(µ)] achieves the upper concave hull of V + λH. So τ defined here actually solves Equation (A.2). Let V ∗ (C) be the maximum in Equation (A.2). Now we show concavity of V ∗ . Take any C1 , C2 ∈ [0, C], α ∈ [0, 1]. Let τ1 and τ2 be signal structures that achieves V ∗ (C1 ) and V ∗ (C2 ). Consider a compound signal ατ1 + (1 − α)τ2 . Then H(µ) − Eατ1 +(1−α)τ2 [H(µ′ )] = H(µ) − αEτ1 [H(µ′ )] − (1 − α)Eτ2 [H(µ′ )] = αC1 + (1 − α)C2 . Eατ1 +(1−α)τ2 [V (µ′ )] = αV ∗ (C1 ) + (1 − α)V ∗ (C2 ). 
Therefore, since ατ1 + (1 − α)τ2 is feasible under cost αC1 + (1 − α)C2 , we have V ∗ (αC1 + (1 − α)C2 ) ≥ αV ∗ (C1 ) + (1 − α)V ∗ (C2 ). It’s obvious that V ∗ (C) is weakly increasing in C when C ∈ [0, C]. When C > C, constraint is never binding, V ∗ (C) is flat. Therefore, V ∗ (C) is concave globally for C ∈ R+ . Given concavity of f , V ∗ (C)−f (C) is weakly concave and maximum exists. We know that C maximizes (V ∗ − f ) if and only if dV ∗ (C) ∩ df (C) ̸= ∅. Therefore it remains to solve for the general derivative of V ∗ . It’s existence is guaranteed by concavity. We calculate it by proving a generalized envelope theorem. By our construction of V ∗ (C), ∀C there exists λ(C) s.t. c∗ (λ(C)) = C. Let τ (C) be the associated signal structure that maximize V + λ(C)H (as well as Equation (A.2)). Then for any C1 : V ∗ (C1 ) − V ∗ (C) ( ) =Eτ (C1 ) [V (µ′ )] + λ(C1 ) C1 − H(µ) + Eτ (C1 ) [H(µ′ )] ( ) − Eτ (C) [V (µ′ )] − λ(C) C − H(µ) + Eτ (C) [H(µ′ )] ( ) ∫ ∫ ′ ′ ′ ′ = V (µ )(τ (C1 ) − τ (C))dµ + λ(C) C1 − C + H(µ )(τ (C1 ) − τ (C))dµ ( ) + (λ(C1 ) − λ(C)) C1 − H(µ) + Eτ (C1 ) [H(µ′ )] ∫ = (V (µ′ ) + λ(C)H(µ′ )) (τ (C1 ) − τ (C))dµ′ + λ(C1 )(C1 − C) ( ) + (λ(C1 ) − λ(C)) C − H(µ) + Eτ (C1 ) [H(µ′ )] ∫ = (V (µ′ ) + λ(C)H(µ′ )) (τ (C1 ) − τ (C))dµ′ + λ(C1 )(C1 − C) + (λ(C1 ) − λ(C)) (C − C1 ) ∫ = (V (µ′ ) + λ(C)H(µ′ )) (τ (C1 ) − τ (C))dµ′ + λ(C)(C1 − C) ≤λ(C)(C1 − C) 28 The last inequality is from the fact that τ (C) maximizes Eτ [V + λ(C)H]. On the other hand: V ∗ (C1 ) − V ∗ (C) ) ( ∫ ∫ ′ ′ ′ ′ = V (µ )(τ (C1 ) − τ (C))dµ + λ(C1 ) C1 − C + H(µ )(τ (C1 ) − τ (C))dµ ( ) + (λ(C1 ) − λ(C)) C − H(µ) + Eτ (C) [H(µ′ )] ∫ = (V (µ′ ) + λ(C1 )H(µ′ )) (τ (C1 ) − τ (C))dµ′ + λ(C1 )(C1 − C) ( ) + (λ(C1 ) − λ(C)) C1 − H(µ) + Eτ (C) [H(µ′ )] ∫ = (V (µ′ ) + λ(C1 )H(µ′ )) (τ (C1 ) − τ (C))dµ′ + λ(C1 )(C1 − C) ≥λ(C1 )(C1 − C) Last inequality is from the fact that τ (C1 ) maximizes Eτ [V + λ(C1 )H]. Combining both, we have ∀dC > 0: V ∗ (C + dC) − V ∗ (C) λ(C) ≥ ≥ λ(C + dC) dC V ∗ (C − dC) − V ∗ (C) λ(C − dC) ≥ ≥ λ(C) −dC [ ] =⇒ dV ∗ (C) ⊂ limdC→0+ λ(C + dC), limdC→0+ λ(C − dC) What’s more, previous proof implies that λ(C) is weakly decreasing. Since λ(C) is pick arbitrarily from the inversed graph of b c∗ (λ), this implies that b c∗ (λ) is weakly decreasing in set order i.e. ∀λ1 > λ, inf c∗ (λ) ≥ sup c∗ (λ1 ). Define Λ(C) = {λ|C ∈ b c∗ (µ)}, then we show that: [ ] limdC→0+ λ(C + dC), limdC→0+ λ(C − dC) = Λ(C) [ ] Monotonicity of c∗ implies Λ(C) ⊂ limdC→0+ λ(C + dC), limdC→0+ λ(C − dC) . Close [ ] graph property implies limdC→0+ λ(C + dC), limdC→0+ λ(C − dC) ⊂ Λ(C). Finally, we show that: dV ∗ (C) = Λ(C) Λ(C) is a correspondence with convex value, closed graph and decreasing in set order. By definition of subdifferential to concave function, dV ∗ (C) is also a correspondence with convex value, closed graph and decreasing in set order. Suppose dV ∗ (C) ⫋ Λ(C), first possibility is that inf dV ∗ (C) > inf Λ(C) = limC ′ →C + Λ(C ′ ) ≥ limC ′ →C + dV ∗ (C ′ ). Second ∑ ∗ possiblity is that dV (C) < sup Λ(C) = limC ′ →C − Λ(C ′ ) ≤ limC ′ →C − dV ∗ (C ′ ). Both case contradicts closed graph property of dV ∗ . To sum up, we showed that V ∗ is the maximum of Equation (A.1) if and only if exists C s.t.: df (C) ∪ Λ(C) ̸= ∅ That is to say ∃τ s.t. |supp(τ )| ≤ 2 |X|, Eτ [µ′ ] = µ, H(µ) − Eτ [H(µ′ )] = C and Co(V + λH)(µ) = Eτ [(V + λH) (µ′ )]. 29 A.2. Simplification of Information Structure A.2.1. Proof of Lemma 1: Proof. 1. Markov property: Suppose the signal realization of S, T are denoted by s, t. 
Then: I(T ; X |S) = Es [H(µ(x|s)) − Et [H(µ(x|t, s))|s]] = Es [H(µ(x|s)) − Et [H(µ(x|s))|s]] =0 First equality is by definition of I. Second equality is by T ⊥X S, then conditional on s, t will not shift belief of X at all. 2. Chain rule: Suppose the signal realization of S, T are denoted by s, t. Then: I(S, T ; X |µ) = Es,t [H(µ) − H(µ(x|s, t))] = Es,t [H(µ) − H(µ(x|s)) + (H(µ(x|s)) − H(µ(x|s, t)))] = (H(µ) − Es [H(µ(x|s))]) + (Es [H(µ(x|s)) − Et [H(µ(x|s, t))|s]]) = I(S; X |µ) + E[I(T ; X |S, µ)] First equality is by definition. Second equality is trivial. Third equality is by chain rule of conditional expectation. 3. Information processing inequality: I(S; X |µ) = I(S, T ; X |µ) − I(T ; X |S, µ) = I(S, T ; X |µ) = I(T ; X |µ) + I(S; X |T , µ) ≥ I(T ; X |µ) First and third equalities are from chain rule. Second equality is from Markov property. A.2.2. Proof of Lemma 2: Proof. I break the proof of Lemma 2 into three lemmas. Lemma 7 shows that solving Equation (1) is equivalent to solving Equation (A.3), which reduces the signal structure to be nested, containing only action as direct signals and continuation signals. Then Lemma 8 shows that solving Equation (A.3) is equivalent to solving Equation (A.4), which transforms abstract random process formulation to conditional distribution formulation. Then Lemma 9 shows that solving functional equation Equation (A.5) is equivalent to solving sequential problem Equation (A.4) using the standard methodology. Finally, we apply Theorem 10 to Equation (A.5) to further reduce the dimensionality of strategy space to Equation (2). 30 ( ) Lemma 7 ( (Reduction )of redundency). S t , T , AT solves Equation (1) if and only if there exists SeT , T , AT solving : ∞ ( ∑ sup S t ,T ,AT ( [ ]) e−ρdt P[T = t] E u(At , X )|S t−1 , T = t (A.3) [ ( ( ]) t−1 )) t T > t − P[T > t]E f I Se ; X Se t−1 t S are degenerate. S , 1 T >t+1 s.t. Set−1 T =t = At T =t Set = Set−1 T ≤t T ≤t t=0 What’s more, the optimal utility level is same in Equation (1) and Equation (A.3). Proof. Suppose (S t , T , At ) is a solution to Equation (1). First, we combine signal T into S to get Sbt = (S t , 1T >t+1 ). Consider: ( ) I(Sbt ; X |Sbt−1 ) = I S t , 1T >t+1 ; X |S t−1 , 1T >t [ ( )] = E1T >t I S t , 1T >t+1 ; X S t−1 , 1T >t = I(S t , 1T >t+1 , 1T >t ; X |S t−1 ) − I(1T >t ; X |S t−1 ) = I(S t , 1T >t+1 ; X |S t−1 ) = I(S t ; X |S t−1 ) Second equality is from Markov property X → S t−1 → 1T >t . Then 1T >t doesn’t affect posterior belief given S t−1 . Third equality is from chain rule. Forth and fifth equality ( is from )Markov chain property. Now define a nested information structure t Sb = Sb0 , . . . , Sbt , then: ( ) ( ) I Sbt ; X |Sbt−1 =I Sb0 , . . . , Sbt ; X Sb0 , . . . , Sbt−1 ( ) ( ) 0 0 t t−1 0 t−1 t−1 b b b b b b b =I S ; X S , . . . , S + I S ,...,S ;X S ,...,S ( ) =I Sbt ; X S 0 , . . . , S t−1 , 1T >0 , . . . , 1T >t [ ( )] =E1T >t I Sbt ; X S 0 , . . . , S t−1 , 1T >0 , . . . , 1T >t ( ) 0 t t−1 b =I S , 1T >t ; X S , . . . , S , 1T >0 , . . . , 1T >t−1 ( ) − I 1T >t ; X S 0 , . . . , S t−1 , 1T >0 , . . . , 1T >t−1 ( ) =I Sbt ; X S 0 , . . . , S t−1 , 1T >0 , . . . , 1T >t−1 [ ( )] + ESbt I 1T >t ; X S 0 , . . . , S t , 1T >0 , . . . , 1T >t−1 =··· ( ) =I Sbt ; X S t−1 =I(S t ; X |S t−1 ) 31 Now we define Set : et S = bt−1 S when T < t + 1 At+1 , Sbt−1 Sbt when T = t + 1 when T > t + 1 Initial information Se−1 is defined as a degenerate(uninformative) signal and induced belief is the prior. Verify the properties of Set : 1. 
Set−1 |T =t = At |T =t is satisfied by definition. What’s more, Set−1 is sufficient for 1T >t and At by definition. 2. By definition, 1T >t+1 is contained in Sbt . Sbt−1 is also contained in Sbt . Therefore Set−1 , 1T >t+1 Set is degenerate by definition when T > t + 1. When T ≤ t + 1, 1T >t+1 is degenerate automatically. Of(course this ) is a special case of the Markov chain condition t t in Equation (1). Therefore Se , T , A is feasible in both problem Equation (1) and Equation (A.3). ( ) 3. Now we calculate the information cost associated with Set , T , At : ( ) bt−1 ; X Set−1 , T < t + 1 I S ( ) ( ) I Set ; X Set−1 = P[T = t + 1]I At+1 , Sbt−1 ; X Set−1 , T = t + 1 ( ) + P[T > t + 1]I Sbt ; X Set−1 , T > t + 1 if T < t + 1 if T ≥ t + 1 0 if T < t + 1 ( ) = P[T = t + 1]I At+1 ; X Set−1 , T = t + 1 ( ) if T ≥ t + 1 + P[T > t + 1]I Sbt ; X Set−1 , T > t + 1 ( ) ≤I Sbt ; X Sbt−1 ( ) =I S t ; X |S t−1 ( ) Therefore, Set , T , At dominates the original solution in Equation (1) by achieving same action profile but lower costs. By optimality, ( )they must achieve the same utility t t e level. What remains to be proved is that S , T , A solves Equation (A.3). Of course ( ) t t e S , T , A is feasible under Equation (A.3). Suppose Equation (A.3) has a solution ( ) Set , T , At . We know that it’s feasible in Equation (1): [ E e−ρdt·T E[u(AT , X )|S T −1 ] − = ∞ ( ∑ ∞ ∑ ( ( )) e−ρdt·t f I Set ; X Set−1 t=0 ] [ ( ( ))]) ( [ ]) e−ρdt P[T = t] E u(At , X )|S t , T = t − E f I Set ; X Set−1 t=0 ∞ ( ( [ ]) ∑ −ρdt t t−1 e = e P[T = t] E u(A , X )|S , T = t t=0 32 [ ( ( ]) t−1 )) t e e T >t − P[T > t]E f I S ; X S ( ) The last equality is from I Set ; X Set−1 , T ≤ t = 0. Therefore, the maximum of Equation (1) should be higher than maximum of Equation (A.3). Combining the previous proof, they must be identical. ( ) Lemma 8 (Tranformation of space). S t , T , AT solves Equation (1) if and only if there exists pt (µt+1 |µt ) : ∆X 7→ ∆2 X and qst (µt ) : ∆X 7→ [0, 1] solving: ) ∫ [( ∞ ∑ ∑ max u(a, xj ) · µj qst (µt ) e−ρdt·t sup (A.4) (pt ,qst ) t=0 ( −f ∫ H(µt ) − j t−1 ∏ ∆X t−1 ] ) H(µt+1 )pt (µt+1 |µt )dµt+1 (1 − qst (µt )) ∆X (∫ ∫ a ∆X ) pτ (µτ +1 |µτ ) (1 − qsτ (µτ )) dµ1 . . . µt−1 dµt τ =0 µpt (µ|µt )dµ = µt s.t. ∆X What’s more, the optimal utility level is same in Equation (1) and Equation (A.4). Proof. Let pt (·|µt ) be the distribution of posteriors generated by Set T >t,Set−1 =Set−1 , where µt [ ] is posterior belief associated with signal Set−1 . Let qst (µt ) = P T = tSet−1 = Set−1 , T ≥ t . e T , A with the conditional distribuNow we can explicity represent the distribution of S, tions. First, P[T = t] and P[T > t] can be calculated by integrating qst (µt ): [ ] −1 e P[T = t] =E P[T = t|S ] [ [ ] [ ]] =E P T = tSe−1 , T > 0 P T > 0Se−1 [ ] =(1 − qs0 (µ0 ))P T = tT > 0 [ [ ]] =(1 − qs0 (µ0 ))E P T = tT > 0, Se0 ∫ [ ] 0 0 =(1 − qs (µ )) P T = tT ≥ 1, Se0 p0 (µ1 |µ0 )dµ1 ∫ [ ] [ ] 0 0 =(1 − qs (µ )) P T = tT > 1, Se0 P T > 1T ≥ 1, Se0 p0 (µ1 |µ0 )dµ1 ∫ ] ] [ [ 0 1 1 0 e =(1 − qs (µ )) E P T = t T > 1, S µ (1 − qs1 (µ1 ))p0 (µ1 |µ0 )dµ1 =··· ∫ ∏ t−1 pτ (µτ +1 |µτ )(1 − qsτ (µτ ))qst (µt )dµ1 . . . µt = τ =0 Similarly, we can get: P [T > t] = ∫ ∏ t−1 pτ (µτ +1 |µτ )(1 − qsτ (µτ ))(1 − qst (µt ))dµ1 . . . µt τ =0 33 Then we can calculate the joint distribution of T and µt : ∫ ∏ t−1 [ ] t P T = t, µ = ν = pτ (µτ +1 |µτ )(1 − qsτ (µτ ))qst (µt )dµ1 . . . µt−1 τ =0 ∫ ∏ t−1 [ ] t P T > t, µ = ν = pτ (µτ +1 |µτ )(1 − qsτ (µτ ))(1 − qst (µt ))dµ1 . . . 
µt−1 τ =0 Therefore: ∫ ∏t−1 τ τ +1 τ |µ )(1 − qsτ (µτ ))dµ1 . . . µt−1 qst (µt ) t τ =0 p (µ A T =t ∼ ∫ ∏t−1 pτ (µτ +1 |µτ )(1 − q τ (µτ ))q t (µt )dµ1 . . . µt s s τ =0 ∫ ∏t−1 τ τ +1 τ τ τ |µ )(1 − qs (µ ))dµ1 . . . µt−1 (1 − qst (µt )) τ =0 p (µ Set T >t ∼ ∫ ∏ t−1 τ τ +1 |µτ )(1 − q τ (µτ )(1 − q t (µt ))dµ1 . . . µt s s τ =0 p (µ This implies: [ ] t−1 e P [T = t] E u(A , X ) S , T = t ∫ ∫ t−1 ∑ ∏ t max u(a, xj )µj pτ (µτ +1 |µτ )(1 − qsτ (µτ ))qst (µt )dµ1 . . . µt−1 dµt = ∆X a t ∆X t−1 τ =0 j [ ( ( ] t−1 )) t e e P [T > t] E f I S ; X S T >t ( ) ∫ ∫ t t+1 t t+1 t t+1 = f H(µ ) − H(µ )p (µ |µ )dµ ∆X × ∫ ∏ t−1 ∆X pτ (µτ +1 |µτ )(1 − qsτ (µτ ))(1 − qst (µt ))dµ1 . . . µt−1 dµt τ =0 e T , A solving Equation (A.3), we can conTo sum up, we showed that starting from S, t t struct p , qs such that the value of Equation (A.3) is achieved in Equation (A.4). Next, we start from (pt , q t ) solving Equation (A.4). We can easily define T : T T ≥t,µt ∼ B(qst (µt )) conditionally independent across all t, µt . Set T >t,µt ∼ pt (·|µt ), At T =t,µt = ∑ arg max u(a, xj )µtj . Therefore, the previous calculation shows that the value of Equation (A.4) is also achieved in Equation (A.3). Combining with the previous result, ( we con) e clude that Equation (A.3) and Equation (A.4) are equivalent in the sense that S, T , A solves Equation (A.3) if and only if the corresponding (pt , qst ) solves Equation (A.4). Lemma 9 (Recursive representation). Vdt (µ) is the optimal utility level solving Equation (1) given initial belief µ if and only if Vdt (µ) satisfies the following functional equation: { } ∫ Vdt (µ) = max max E [u(a, x)|µ] , sup e−ρdt ∫ s.t. a p∈∆2 X νp(ν)dν = µ ∫ H(ν)p(ν)dν C = H(µ) − ∆X ∆X 34 Vdt (µ)p(µ)dµ − f (C) ∆X (A.5) Proof. We first derive the recursive representation of Equation (A.4). Consider the following functional equation: [∫ ( ) ∑ Vdt (µ) = sup qs (µ) max u(a, xj )µj + (1 − qs (µ)) Vdt (ν)p(ν|µ)dν qs (µ),p(·|µ) a ( )] ∫ −f H(µ) − H(ν)p (ν|µ) dν ∆X ∫ s.t. νp(ν|µ)dν = µ ∆X ∆X Since RHS is linear in qs (µ), it will be WLOG that we only consider boundary solution qs (µ) ∈ {0, 1}. Therefore, it will be exactly the same as Equation (A.5). Now consider the equivalence between sequential problem and recursive problem. By assumption E [u(a, x)|µ] is bounded above by maxa,x u(a, x). Therefore, e−ρdt·t E[u(a, x)|µ] is uniformly (for all choice of µ, a) converging to zero when t → ∞. Then Vdt (µ) will be the solution of Equation (A.4). A.2.3. Proof of Lemma 3 Proof. { } • Let Z = V ∈ C∆X F ≤ V ≤ F̂ . We define operator: { } ∑ −ρdt T (V )(µ) = max F (µ), max e pi V (µi ) − f (C) pi ,µi ∑ s.t. C = pi (H(µ) − H(µi )) ∑ pi µi = µ (A.6) Noticing that the maximization operator is well defined since V ∈ C∆X. Then T will be a contraction mapping on space (Z, l∞ ). • T (Z) ⊂ Z: First it’s obvious that by choosing an uninformative signal structure µi = µ, constraints in Equation (A.6) are satisfied and T (V )(µ) ≥ F (µ). What’s more, since F̂ (µ) is the full information limit,}the upper concave hull of F̂ will be F̂ itself. Therefore { −ρdt ˆ T (V )(µ) ≤ max F (µ), e F (µ) ≤ F̂ (µ). What remains to be shown is that ∀V ∈ Z, T (V ) will be continuous. We first show that T (V ) is lower semi-continuous i.e. lim inf µ′ →µ T (V )(µ′ ) ≥ T (V )(µ). Suppose T (V )(µ) = F (µ), then this is trivial because T (V ) ≥ F . Therefore we only need to discuss T (V )(µ) > F (µ). Suppose pi , µi solves Equation (A.6) at µ for V . WLOG, we drop all signals with pi = 0. 
Then the remaining signals still satisfies Bayes condition and pi > 0. Suppose q(µi |xj ) is the conditional distribution of each realization of posterior beliefs. ∀µ′ , define: q(µi |xj )µ′ (xj ) ′ ∑ µ (x ) = j i ′ j q(µi |xj )µ (xj ) ∑ p′ = q(µi |xj )µ′ (xj ) i j 35 ′ ∂pi ′ ′ = q(µi |xj ) ∂µj µ =µ =⇒ ∂µ′i q(µi |xj ) q(µi |xj ) ′ = −µi (xj ) + 1i=j ∂µj µ′ =µ pi pi Since pi > 0, q(si |xj ) ≤ 1, µi ∈ ∆X, we can conclude that there exists δ, M > 0 s.t. ∀ |µ′ − µ| ≤ δ, |µ′i − µi | + |p′i − pi | ≤ M |µ′ − µ|. Then µ′i , p′i are all continuous in µ′ . Now we define: { ∑ } (∑ ) ∑ p (H(µ) − H(µ )) i i ′ ′ −ρdt ′ e ∑ V (µ ) =e pi V (µi ) min 1, −f pi (H(µ) − H(µi )) p′i (H(µ′ ) − H(µ′i )) Since V (µ) and H(µ) are continuous around µ and µi , Ve (µ′ ) will be continuous around ∑ ∑ ′ µ. If pi (H(µ) − H(µi )) ≤ pi (H(µ) − H(µ′i )), then: (∑ ) ∑ Ve (µ′ ) =e−ρdt p′i V (µ′i ) − f pi (H(µ) − H(µi )) (∑ ) ∑ ≤e−ρdt p′i V (µ′i ) − f p′i (H(µ) − H(µ′i )) ≤T (V )(µ′ ) ∑ ∑ If pi (H(µ) − H(µi )) > p′i (H(µ) − H(µ′i )), consider the following information structure: ′′ µi = µ′i , µ′′0 = µ′ ∑ pi (H(µ) − H(µi )) ′′ ′ pi = pi ∑ ′ p (H(µ′ ) − H(µ′i )) ∑i pi (H(µ) − H(µi )) p′′0 = 1 − ∑ ′ pi (H(µ′ ) − H(µ′i )) ∑ ∑ ) ∑ ( ∑ pi (H(µ) − H(µi )) pi (H(µ) − H(µi )) ′′ ′′ ′ ′ ′ = µ′ pi µi = µ 1 − ∑ ′ + pi µi ∑ ′ ′ ′ ′ ) − H(µ′ )) p (H(µ ) − H(µ )) p (H(µ i i i i (∑ ) ∑ p (H(µ) − H(µ )) ∑ i i =⇒ p′′i (H(µ′ ) − H(µ′′i )) = p′i (H(µ′ ) − H(µ′i )) ∑ ′ ′ ) − H(µ′ )) p (H(µ i i ∑ = pi (H(µ) − H(µi )) Therefore: Ve (µ′ ) =e−ρdt ∑ p′′i V (µ′′i ) − f (∑ ) p′′i (H(µ′ ) − H(µ′′i )) ≤T (V )(µ′ ) The inequality is from suboptimality of (p′′i , µ′′i ) in Equation (A.5). What’s more, it’s obvious that Ve (µ) = T (V )(µ). Therefore: lim inf T (V )(µ′ ) ≥ lim inf Ve (µ′ ) = Ve (µ) = T (V )(µ) ′ ′ µ →µ µ →µ Then T (V )(µ) is lower semi-continuous on ∆X. Second, we show that T (V ) is upper semi-continuous i.e. lim supµ′ →µ T (V )(µ′ ) ≤ T (V )(µ). Consider a sequence of µn → µ 36 s.t. lim T (V )(µn ) = lim supµ′ →µ T (V )(µ′ ). Now since the number of posteriors in the optimization problem is bounded by 2 |X| by Theorem 10, we can find a subsequence if n s.t. (pni , µni ) → (pi , µi ). This is done by choosing converging µni and pni for each index. Now by continuity of H(µ), we get: ∑ ∑ p (H(µ) − H(µ )) = lim pni (H(µn ) − H(µni )) i i n→∞ ∑ p i µi = µ ∑ p =1 i Since V (µ) is continuous, it will be continuous at each µi . Since f (C) is convex, it will be ∑ continuous at C < ∞ s.t. f (C) < ∞. By optimzality ,we know that f ( pni (H(µn ) − H(µni ))) < ∑ ∞. Since f −1 (R+ ) is closed, we know that f ( pi (H(µ) − H(µi ))) < ∞, then f is continuous around it. (∑ ) ∑ −ρdt T (V )(µ) ≥e pi V (µi ) − f pi (H(µ) − H(µi )) (∑ ) ∑ = lim e−ρdt pni V (µni ) − f pni (H(µn ) − H(µni )) n→∞ = lim sup T (V )(µ′ ) µ′ →µ To sum up, T (V )(µ) is continuous. • T (V ) is monotonic. Suppose U (µ) ≥ 0 and U + V ∈ Z. If T (V )(µ) = F (µ), then by construction T (V + U ) ≥ F (µ) = T (V )(µ). If T (V )(µ) > F (µ), let (pi , µi ) be solution to Equation (A.6) at µ for V .: ∑ T (V + U )(µ) ≥e−ρdt pi (V (µi ) + U (µi )) ∑ =T (V )(µ) + e−ρdt pi U (µi ) ≥T (V )(µ) • T (V ) is contraction. We claim that T (V + α)(µ) ≤ T (V )(µ) + e−ρdt α. Suppose not at µ. Obviously T (V + α)(µ) > F (µ). Then let (pi , µi ) be the solution of Equation (A.6) at µ for V + α. ∑ pi V (µi ) T (V )(µ) ≥e−ρdt ∑ pi (V (µi ) + α) − e−ρdt α =e−ρdt =T (V + α)(µ) − e−ρdt α >T (V )(µ) Contradiction. • Therefore, by Blackwell condition, T (V ) is a contraction mapping on Z. 
And there exists a unique solution Vdt ∈ Z solving the fixed point problem T (Vdt )Vdt . 37 A.3. Continuous Time Limit of Experiments A.3.1. Proof of Lemma 4 Proof. First, Lemma 10 shows that with both set of assumptions, flow cost of optimal strategy in each period will be bounded by a finite constant times interval length. Then let’s discuss all possible cases: • Case 1 : Suppose for some i, µ′idt − µ ̸→ 0. Let’s take subsequence that they converge to µ′i away from µ. Without loss, we can combine all other signals and have a lower conditional mutual information since the experiment is less informative. For the combined signal µ′0 , we must have µ′0 → µ. Otherwise it’s obvious we will have limit of mutual information of order zero, contradiction. ∑ Since µ′0 → 0 and pidt µ′idt + p0dt µ′0dt = µ, we have: ∑ pidt H(µ′idt ) + p0dt H(µ′0dt ) − H(µ) ∑ = pidt (H(µ′idt ) − H(µ) − H ′ (µ)(µ′idt − µ)) + p0dt (H(µ′0dt ) − H(µ) − H ′ (µ)(µ′idt − µ)) ∑ = pidt (H(µ′idt ) − H(µ) − H ′ (µ)(µ′idt − µ)) H ′′ (µ) ′ + p0dt (µ0dt − µ)2 + O((µ′0dt − µ)3 ) 2 ∑ ≥ pidt (H(µ′i ) − H(µ) − H ′ (µ)(µ′i − µ)) when dt → 0 By convexity of entropy function and µ′i are bounded away from µ, we know that the time multiplied by pidt are strictly positive. Thus for the whole term to be bounded by O(dt), pidt has to be O(dt). • Case 2 : Now for the rest i’s, µ′idt → 0. We combine all√the other i’s to µ′0dt , p0dt . By the argument in the first part, we know µ′0dt − µ ∼ O( dt) (This is got by observing the quadratic term in the third line). We have: ∑ pidt H(µ′idt ) + p0dt H(µ′0dt ) − H(µ) ∑ H ′′ (µ) ′ 3 = pidt (µidt − µ)2 + O(pidt )O((µ′idt − µ) ) + O(dt) 2 To have the whole term bounded by O(dt), we must have pidt (µ′idt − µ)2 ∼ O(dt). Thus for all convergence sequence of experiments, there are only two kinds of limiting behavior, either µ′idt ̸→ µ and pidt ∼ O(dt), or µ′idt → µ and pidt (µ′idt − µ)2 ∼ O(dt). To put it differently, optimal policies involves only Poisson like signals and diffusion like signals in the limit. ∗ (µ) ≤ Lemma 10 (Bounded flow cost). With Assumption 2′ satisfied, ∃∆ ∈ R+ s.t. Cdt ∑ ∗ ∆dt. ∀µ, dt. Where Cdt (µ) = pi (H(µ) − H(µi )) for optimal (pi , µi ) in Equation (7) Proof. ∀dt, ∀ optimal policy (pi (µ), µi (µ)), ∀µ ∈ ∆(X), by optimality: ( ∗ { ) } ∑ Cdt (µ) −ρdt pi V (µi ) − h max F (µ), e dt dt 38 −2ρdt ≥e ∑ ( pi V (µi ) − h ∗ Cdt (µ) 2dt ) ( dt − e −ρdt h ∗ Cdt (µ) 2dt ) dt. The first line is simply definition of optimal value function. The second line is the value from delaying experimentation by one more period and divide the experiment such that the costs paid in each period are the same. Let’s first prove that dividing the experiment in this way is feasible. Consider any optimal experiment (pi , µi ) at prior µ. There are two possibilities: • Case 1 : ∀i, µ′i = µ: That’s to say the optimal experiment at prior µ is uninformative. ∗ Then Cdt (µ) = 0 and proof is done. ∗ • Case 2 : Cdt (µ) > 0 =⇒ ∃µi ̸= µ. Take µ′i = αi µi + (1 − αi )µ. Consider the following signal structure: { µij = µj pij = αi pj + 1j=i (1 − αi ) ∑ ∑ i i p µ = α pj µj + (1 − αi )µi = µ′i i j j j =⇒ ∑ ∑ i ′ i ′ p (H(µ ) − H(µ )) = H(µ ) − α pj H(µj ) − (1 − αi )H(µi ) i j i j i j ( i i) µ , p satisfies Bayes condition at µ′i and the associated cost is 1 when αi = 0, ∑j j ∑ pi (H(µ) − H(µi )) when αi = 1. By continuity of H, there exists αi s.t. j pij (H(µ′i )− ∑ i )) H(µij )) = pi (H(µ)−H(µ . Suppose αi is already defined for all µi ̸= µ. For µi = µ, let 2 αi = 1. 
Consider the following signal structure: ′ ′ µ i = µ i pi 1−αi ′ pi = ∑ pj j 1−αj =⇒ ∑ ∑ p i αi ∑ p µ i i i p′i µ′i = ∑ pj + ∑ 1−α pj µ = µ 1−αj i ∑ ∑ p′i (H(µ) − H(µ′i )) 1−αj ∑ ∑ p′i H(µ′i ) ∑ ∑ ∑ ∑ ∑ = pi (H(µ) − H(µi )) − p′i H(µ′i ) − αi p′i pj H(µj ) − (1 − αi )p′i H(µi ) ∑ ∑ ∑ = pi (H(µ) − H(µi )) − p′i (H(µ′i ) − pij H(µij )) ∑ pi (H(µ) − H(µi )) = 2 = pi (H(µ) − H(µi )) + pi H(µi ) − That is to say, we developed information structure (µ′i , p′i ) and (µij , pij ) such that the induced belief has distribution (µi , pi ), the cost in first period and in any posterior in C ∗ (µ) ∗ second period is dt2 . On the other hand, Cdt (µ) > 0 implies F (µ) is suboptimal. Then the optimal value is achieved by: ( ∗ ) ∑ Cdt (µ) −ρdt pi V (µi ) − h e dt dt 39 ( ) ∗ Cdt (µ) ≥e pi V (µi ) − h dt − e h dt 2dt ( ) ) ) ( ∗ ( ∗ e−ρdt 1 − e−ρdt ∑ Cdt (µ) Cdt (µ) =⇒ pi V (µi ) ≥ h − 2h dt dt 2dt −2ρdt ∑ ) ∗ Cdt (µ) 2dt ( −ρdt Let’s proceed step by step: • Step 1 : It’s obvious that left hand side is dominated by 2ρ∆ if we choose ∆ = maxa,x u(a, x). ( ∗ ) ( ∗ ) C (µ) C (µ) C ∗ (µ) • Step 2 : We show that h dtdt − 2h dt bounds dtdt . 2dt ∫ x h(x) − 2h(x/2) = ∫ ′ h (z)dz − 2 0 ∫ x ∫ ′ = h (z)dz − x/2 x/2 ∫ ∫ = 0 x/2 0 z/2 h′ (z)dz h′ (z)dz 0 x/2 h′′ (z + y)dzdy 0 x2 ε ≥ 4 • Step 3 : To sum up: ∗ Cdt (µ)2 ε 2 4dt √ 2ρ∆ ∗ dt =⇒ Cdt (µ) ≤ 2 ε 2ρ∆ ≥ B. Proofs in Section 3 B.1. Convergence B.1.1. Proof of Lemma 5 Proof. We break down the proof of Lemma Lemma 5 into three steps: • Step 1 : Let V dt = lim supn→∞ V dtn , then V dt − V dtn → 0. First it’s trivial that V dtn is 2 2 2 an increasing sequence, because every experimentation strategy associated with 2dtn can dt be replicated in a problem with 2n+1 . The DM can always split the experiment into two stages with equal cost in two periods and get an identical distribution of posterior beliefs at the end of second period. Thus existence of V dt = lim V dtn is guaranteed by monotonic 2 convergence theorem. Now let’s prove the convergence is uniform, i.e. V dtn is a Cauchy sequence under sup 2 norm. ∀m > n, ∀µ0 , consider the problem with 2dtm , consider the optimal experimentation (pi (µ), µi (µ)) and associated action rule AT , the expected utility is: ∑ dt e−ρT 2m Eµ0 [u(AT , X)] . V dtm (µ0 ) = 2 40 = ∑ e −ρT 2dt n 2m−n ∑−1 e−ρτ 2m Eµ0 [u(AT 2m−n +τ , X)] dt (B.1) τ =0 The second equality is get by rewriting T = 2m−n T ′ + τ . Then take summation first over τ then over T ′ (and relabel T ′ to be T ). Now we construct an experimentation strategy for problem with 2dtn . We combine all experiments between 2m−n T to 2m−n (T + 1), and get the joint distribution of posteriors. We use this as the signal structure in each period T . Given this construction, at the end of each 2m−n T , the posterior distribution will be exactly as that using original experiment. Then we assign same action as before to each posterior. By construction this action profile satisfies Markov property of information (i.e. signal realization is a sufficient statistics for action). In the end, the new experimentation strategy satisfies cost constraint due to linearity of Mutual information. 
Therefore if we let U (µ0 ) be the discounted expected utility associated with the aforementioned strategy at µ0 : V dtn (µ0 ) ≥U (µ0 ) 2 = ∑ −ρT 2dt n e 2m−n ∑−1 e−ρ 2n Eµ0 [u(A2m−n T +τ , X)] dt (B.2) τ =0 =e −ρ 2dt n ∑ e −ρT 2dt n 2m−n ∑−1 Eµ0 [u(A2m−n T +τ , X)] τ =0 >e−ρT 2n dt ∑ e−ρT 2n dt 2m−n ∑−1 e−ρ 2m Eµ0 [u(A2m−n T +τ , X)] dt τ =0 =e −ρ 2dt n V dtm (µ0 ) 2 Noticing that Equation (B.2) is different from Equation (B.1) by only one term: the dt dt discounting term in inner summation (e−ρ 2m and e−ρ 2n ). This characterize the experiment design in problem 2dtn . In each period T , actions are all postponed to the end of period. Therefore they are discounted by 2dtn , which is period length. The dt second equality is from moving the constant e−ρ 2n out of summations. The next indt equality is from e−ρ 2m( < 1 By Lemma 3, Vdt are uniformly bounded by max v, then ) dt V dtn − V dtm ≤ max v 1 − e−ρ 2n → 0 when n → 0. 2 2 • Step 2 : ∀dt > 0, V dt are identical, WLOG we can call it V (µ). ∀dt, dt′ > 0, ∀n, ′ dt consider V dtn . Pick m large enough that there exists N s.t. 2n+1 ≤ N 2dtm ≤ 2dtn ≤ ′ 2 (N +1) 2dtm . Consider optimal experimentation and action associated with 2dtn , we construct ′ experimentation strategy for problem with 2dtm . For each time period T in the original problem, split the experiment in period T into N + 1 periods and take any action at the end of N + 1th period. In the new experiment strategy, the effective period length will ′ ′ increase from 2dtn to (N + 1)c 2dtm . First, comparing the cost constraint c 2dtn < (N + 1)c 2dtm . Therefore the new experiment strategy satisfies cost constraint. Second, since induced posterior distribution and action distribution are still the same, Markov property still 41 holds. Finally: ∑ dt′ V dtm′ (µ0 ) ≥ e−ρT (N +1) 2m Eµ0 [u(AT , X)] 2 ) ∑ ∑( dt dt′ dt = e−ρT 2n Eµ0 [u(AT , X)] − e−ρT 2n − e−ρT (N +1) 2m Eµ0 [u(AT , X)] ∑ ∑ ∑ ′ −ρT 2dt −ρT 2dt −ρT (N +1) 2dtm n n ≥ e Eµ0 [u(AT , X)] − max v e − e dt′ e−ρ 2n − e−ρ(N +1) 2m )( ) =V dtn (µ0 ) − max v ( dt dt′ 2 1 − e−ρ 2n 1 − e−ρ(N +1) 2m dt dt′ ≥V dtn (µ0 ) − max v dt′ e−ρN 2m − e−ρ(N +1) 2m (1 − e−ρ 2n )2 dt 2 dt′ =V dtn (µ0 ) − max v 2 e−ρN 2m −ρ 2dt n (1 − e dt′ )2 e−ρ 2n+1 (eρ 2m − 1) dt ≥V dtn (µ0 ) − max v 2 dt′ (1 − e−ρ 2n )2 dt (eρ 2m − 1) First inequality is from suboptimality of the constructed experiment. Second inequality ′ ′ −ρT 2dt −ρT (N +1) 2dtm n is from e ≥e . Third inequality is from 2dtn ≥ N 2dtm . Last inequality is ′ dt from N 2dtm ≥ 2n+1 . Take m → ∞ on both side, we have V dt′ (µ0 ) ≥ V dtn (µ0 ). Then take 2 n → 0 on both side V dt′ (µ0 ) ≥ V dt (µ0 ). Since this holds for arbitrary dt, dt′ and µ0 , we conclude that V dt = V dt′ . • Step 3 : ∥V dt − V ∥ → 0 when dt → 0. Fix any dt > 0, then ∀ε > 0, there exists N s.t. ∀n ≥ N , V dtn − V < 2ε . Then given the proof in last part, for any dt′ < 2dtn , suppose 2 dt ≤ N dt′ ≤ there exists N s.t. 2n−1 Vdt′ will be bounded by: dt 2n ′ ≤ (N + 1) 2dtm , then the difference between V dtn and 2 e−ρ 2n+1 dt max v [ ] (1 − e −ρ 2dt n ′ ) dt ′ Actually such N = 2n dt′ exists for any dt ≤ Vdt′ − V dtn < 2ε , then ∥Vdt′ − V ∥ < ε. (eρdt − 1) dt . 2n Thus there exists δ s.t. ∀dt′ < δ, 2 B.1.2. Proof of Theorem 1 Proof for Theorem 1 shows that { V (µ) is within the family oflocally Lipschitz } contin V (µ′ )−V (µ) uous functions on [0, 1]: V ∈ L = V : [0, 1] 7→ R, lim supµ′ →µ µ′ −µ ∈ R . 
Then consider the following Bellmen equation defined on functions in space L: { ∑ ρV (µ) = max ρF (v), sup pi (V (µi ) − V (µ)) µi ∈[0,1], pi ,σ̂∈R+ ( − DV } ∑ ) D2 V (µ) 2 pi µi ∑ pi (µi − µ) + σ̂ µ, ∑ pi 2 42 (B.3) s.t. − ∑ pi (H(µi ) − H(µ) − H ′ (µ)(µi − µ)) − H ′′ (µ) 2 σ̂ ≤ c 2 Since V is not necessarily differentiable, we use operator D and D2 to replace first and second derivatives. D and D2 are defined as following: Definition 1 (General Derivative). ∀f ∈ L: { (x) when x′ > x lim inf xn →x− f (xxnn)−f ′ −x Df (x, x ) = (x) lim supxn →x+ f (xxnn)−f when x′ < x −x when Df (x, x+ ) < Df (x, x− ) +∞ D2 f (x) = −∞ lim sup when Df (x, x+ ) > Df (x, x− ) 2f (x+dx)−2f (x)−2Df (x)dx dx→0 dx2 otherwise Proof. Local Lipschitz Continuity: First, since V is the uniform limit of continuous Vdt , V is continuous. Suppose V is not locally Lipschitz continuous. Then there exists µ s.t. (µ) | ≥ n. We discuss only the case µn > µ, µn < µ can be proved ∃ µn → µ, | V (µµnn)−V −µ using same method, then there are two possibilities. • V (µn )−V (µ) µn −µ c ≥ n. Then pick µ0 = 0, we have: V (0) − V (µ) − H(µ) − H(0) + V (µn )−V (µ) (0 − µ) µn −µ H(µn )−H(µ) (0 − µ) µn −µ ≥c V (0) − V (µ) + nµ H(µ) − H(0) + H(µn )−H(µ) (0 µn −µ − µ) (µ) Noticing that the only difference between LHS and RHS is that V (µµnn)−V is replaced −µ with n on RHS. Take n → ∞ on RHS, we observe that RHS goes to infinity. Therefore, there exists N s.t. ∀n ≥ N , RHS is larger than 2ρ sup F . c V (0) − V (µ) − H(µ) − H(0) + V (µn )−V (µ) (0 − µ) µn −µ H(µn )−H(µ) (0 − µ) µn −µ ≥ 2ρ sup F (µn − µ)V (0) + (µ − 0)V (µn ) + V (µ)(0 − µn ) 2ρ ≥ sup F (µ − µn )H(0) + (0 − µ)H(µn ) + H(µ)(µn − 0) c µ−µn 0−µ V (0) + 0−µ V (µn ) − V (µ) 2ρ 0−µn n ≥ sup F =⇒ µ−µn 0−µ c H(µ) − 0−µn H(0) − 0−µn H(µn ) =⇒ =⇒ µ−µn V 0−µn (0) + 0−µ V 0−µn (µn ) − V (µ) ≥ 2ρ sup F c I(µn , 0|µ) µ − µn 0−µ ρ =⇒ V (0) + V (µn ) ≥ V (µ) + 2 sup F I(µn , 0|µ) 0 − µn 0 − µn c ρ ρ ≥ V (µ)(1 + I(µn , 0|µ)) + sup F I(µn , 0|µ) c c Then N can be chosen sufficiently large such that: e ρ I(µn ,0|µ) c ( ρ )k ∑ ρ 1 1 − 1 − I(µn , 0|µ) = I(µn , 0|µ) ≤ c (k + 1)! c 2 k=1 43 =⇒ ρ 0−µ µ − µn ρ V (0) + V (µn ) ≥ V (µ)e c I(µn ,0|µ) + sup F I(µn , 0|µ) 0 − µn 0 − µn 2c Then we pick dt = I(µnc,0|µ) and dtm = 2dtm . m is chosen sufficiently large that |V − ρ ρ Vdtm |e c I(µn ,0|µ) < 8c sup F I(µn , 0|µ), then: ρ µ − µn 0−µ ρ Vdtm (0) + Vdtm (µn ) ≥Vdtm (µ)e c I(µn ,0|µ) + sup F I(µn , 0|µ) 0 − µn 0 − µn 4c We consider an experimentation strategy that divides the I(µn , 0|µ) uniformly into 2m periods, and wait for 2m periods before taking action: ( ) ρ µ − µn 0−µ ρ − ρc I(µn ,0|µ) e Vdtm (0) + Vdtm (µn ) ≥ Vdtm (µ) + sup F I(µn , 0|µ)e− c I(µn ,0|µ) 0 − µn 0 − µn 4c LHS will be the expected utility from taking the aforementioned experiment at µ. ρ I(µn ,0|µ) Taking m sufficiently large, RHS will be strictly larger than e c 2m Vdtm (µ). Thus this experiment dominates optimal experiment of dtm problem at µ. Contradiction. • V (µn )−V (µ) µn −µ c ≤ −n. Then pick µ0 = 1, we have: V (1) − V (µn ) − H(µn ) − H(1) + V (µn )−V (µ) (1 − µn ) µn −µ H(µn )−H(µ) (1 − µn ) µn −µ ≥c V (1) − V (µn ) + n(1 − µn ) H(µn ) − H(1) + H(µn )−H(µ) (1 µn −µ − µn ) Take n → ∞ on RHS, RHS goes to infinity. Therefore there exists N s.t. ∀n ≥ N , RHS is larger than 2ρ sup F . 
=⇒ 1 − µn ρ µn − µ V (1) + V (µ) ≥V (µn ) + 2 sup F I(µ, 1|µn ) 1−µ 1−µ c ρ ρ ≥V (µn )(1 + I(µ, 1|µn )) + sup F I(µ, 1|µn ) c c Similar to last part, n can be chosen sufficiently large that: ρ µn − µ 1 − µn ρ V (1) + V (µ) ≥ V (µn )e c I(µ,1|µn ) + sup F I(µ, 1|µn ) 1−µ 1−µ 2c Then pick dt = I(µ,1|µn ) c and dtm = dt , 2m m can be chosen sufficiently large that: ρ µn − µ 1 − µn ρ Vdtm (1) + Vdtm (µ) ≥ Vdtm (µn )e c I(µ,1|µn ) + sup F I(µ, 1|µn ) 1−µ 1−µ 4c We consider the similar experimentation strategy as before that divides the experiment: ) ( ρ 1 − µn µn − µ ρ − ρc I(µ,1|µn ) e Vdtm (1) + Vdtm (µ) ≥ Vdtm (µn ) + e− c I(µ,1|µn ) sup F I(µ, 1|µn ) 1−µ 1−µ 4c ρ I(µ,1|µn ) m can be taken sufficiently large that RHS is strictly larger than e c 2m Vdtm (µn ). This experiment dominates optimal experiment of dtm problem at µn . Contradiction. 44 Unimprovability: Let’s show that V is unimprovable. Suppose not, then there exists pi , µi , σ̂i s.t.: ( ∑ ) ∑ pi µi ∑ D2 V (µ) 2 σ̂ ρV (µ) < pi (V (µi ) − V (µ)) − DV µ, ∑ pi (µi − µ) + 2 pi ∑ H ′′ (µ) 2 s.t. − pi (H(µi ) − H(µ) − H ′ (µ)(µi − µ)) − σ̂ ≤ c 2 ∑ We spilt µi ’s into (µi , µj ) s.t.: pi µi = µ and all remaining µj are on the same side of µ. Then ∑ ρV (µ) < pi (V (µi ) − V (µ)) ∑ + pj (V (µj ) − V (µ) − DV (µ, µj )(µj − µ)) D2 V (µ) 2 σ̂ ∑2 c≥− pi (H(µi ) − H(µ)) ∑ − pj (H(µj ) − H(µ) − H ′ (µ)(µj − µ)) + − H ′′ (µ) 2 σ̂ 2 Then if we compare the following three groups of ratios: ∑ (V (µj ) − V (µ) − DV (µ, µj )(µj − µ)) D2 V (µ) p (V (µi ) − V (µ)) ∑i , , − pi (H(µi ) − H(µ)) −(H(µj ) − H(µ) − H ′ (µ)(µj − µ)) −H ′′ (µ) At least one of them must be larger than • Case 1 : ρV (µ) . c ∑ ρ p (V (µi ) − V (µ)) ∑ i > V (µ) pi (H(µ) − H(µi )) c Then there exists ε > 0 s.t.: ∑ p (V (µi ) − V (µ)) ρ ∑ i ≥ V (µ) + ε pi (H(µ) − H(µi )) c ∑ pi ∑ (V (µi ) − V (µ)) =⇒ pi ∑ pi ∑ pi ρ ∑ (H(µ) − H(µi )) + ε ∑ (H(µ) − H(µi )) ≥ V (µ) + V (µ) c pi pi ( ) ∑ ρ pi =⇒ pei (V (µi ), µ) ≥ V (µ) 1 + I(µi |µ) + εI(µi |µ) (if we let pei = ∑ ) c pi pi , µi ) violates cost constraint. We define the Now for any δt < I(µci |µ) , experiment (e following experiment strategy: experiment (e pi , µi ) is taken with probability I(µcdt every i |µ) period. If it’s not taken, then same strategy is applied next period. Then the utility associated with this strategy is: (( ) ) ∑ cdt cdt −ρdt 1− Vedt = e pei Vdt (µi ) Vedt + I(µi |µ) I(µi |µ) 45 ∑ e−ρdt I(µcdt pei Vdt (µi ) |µ) i ( ) =⇒ Vedt = 1 − e−ρdt 1 − I(µcdt i |µ) =⇒ lim Vedt = ∑ dt→0 pei V (µi ) lim dt→0 =⇒ lim (Vedt − Vdt (µ)) > dt→0 c I(µi |µ) ( e−ρdt ρ + c I(µi |µ) ∑ )= pei V (µi ) 1 + ρc I(µi |µ) εI(µi |µ) 1 + ρc I(µi |µ) Therefore, there exists δt sufficiently small that: Vedt > Vdt (µ) + 1 1 εI(µi |µ) 2 + ρc I(µi |µ) Contradicting the optimality of Vdt (µ). • Case 2 : V (µj ) − V (µ) − DV (µ, µj )(µj − µ) ρ > V (µ) ′ H(µ) − H(µj ) + H (µ)(µj − µ) c It’s easy to see that if we call this posterior µ′ , corresponding probability p(µ′ ) = c , then ∃ ε ≥ 0 s.t. H(µ)−H(µ′ )+H ′ (µ)(µ′ −µ) p(µ′ )(V (µ′ ) − V (µ) − DV (µ, µ′ )) ≥ ρV (µ) + ε We discuss the case µ′ > µ first. By Definition 1, there exists µ1 → µ− s.t. 
V (µ1 )−V (µ) (µ′ − µ) ε µ1 −µ c ≥ ρV (µ) + 2 H(µ) − H(µ′ ) + H(µµ11)−H(µ) (µ′ − µ) −µ ε V (µ′ )(µ − µ1 ) − V (µ)(µ′ − µ1 ) + V (µ1 )(µ′ − µ) ≥ ρV (µ) + c ′ ′ ′ H(µ)(µ − µ1 ) − H(µ )(µ − µ1 ) − H(µ1 )(µ − µ) 2 ′ µ − µ1 µ −µ ρ ε V (µ′ ) + ′ V (µ1 ) ≥ V (µ)(1 + I(µ1 , µ′ |µ)) + I(µ1 , µ′ |µ) ′ µ − µ1 µ − µ1 c 2c V (µ′ ) − V (µ) − =⇒ =⇒ Given ε, µ1 can be chosen arbitrarily close to µ− such that e ( ρ )k ∑ ρ 1 ε ′ − 1 − I(µ , µ1 |µ) = I(µ′ , µ1 |µ)k ≤ c (k + 1)! c 4 max v k=1 ρ I(µ′ ,µ1 |µ) c Then: ρ µ − µ1 µ′ − µ ε ′ ′ V (µ ) + V (µ1 ) ≥ e c I(µ ,µ1 |µ) V (µ) + I(µ1 , µ′ |µ) ′ ′ µ − µ1 µ − µ1 4c Then pick ∆t = ∀ n ≥ N: e I(µ′ ,µ1 |µ) , c −ρndtn ( dtn = dt . n By uniform convergence, there exists N s.t. µ − µ1 µ′ − µ ′ V (µ ) + Vdt (µ1 ) dt µ′ − µ1 n µ′ − µ1 n 46 ) > Vdtn (µ) cndtn = I(µ′ , µ1 |µ) That is to say we find a feasible experiment, whose cost can be spread into n periods and satisfies cost constraint. This experiment strictly dominates the optimal experiment at µ for dtn discrete problem. Contradiction. Thus V must be unimprovable at µ. Same proof applies to case where µ′ < µ. 2 D V (µ) 2 ′ • Case 3 : c −H ′′ (µ) ≥ ρV (µ) + 2ε. Then by definition of operator D , there exists µ sufficiently close to µ s.t.: c =⇒ c V (µ′ )−V (µ)−DV (µ)(µ′ −µ) (µ′ −µ)2 H(µ)−H(µ′ )+H ′ (µ′ −µ) (µ′ −µ)2 ′ ≥ ρV (µ) + ε V (µ ) − V (µ) − DV (µ)(µ′ − µ) ≥ ρV (µ) + ε H(µ) − H(µ′ ) + H ′ (µ′ − µ) Then by definition of operator D, there exists µ1 → µ s.t.: c =⇒ V (µ′ ) − V (µ) − H(µ) − H(µ′ ) − µ′ −µ (V (µ) − V (µ1 )) µ−µ1 µ′ −µ (H(µ) − H(µ1 )) µ−µ1 ′ ≥ ρV (µ) + ε 2 µ − µ1 µ −µ ρ ε V (µ′ ) + ′ V (µ1 ) ≥ V (µ)(1 + I(µ1 , µ′ |µ)) + I(µ1 , µ′ |µ) ′ µ − µ1 µ − µ1 c 2c By similar argument as before, we can pick µ1 small enough such that 1 + ρc I(µ′ , µ1 |µ) ρ ′ approaches e c I(µ ,µ1 |µ) . Noticing that the expression we are studying is exactly the same as before. The using same argument, we rule out this case being possible. Equality: Then we show that ∀V solving (B.3), V = V . Noticing that this automatically proves uniqueness of solution of (B.3). • V (µ) ≥ V (µ): Suppose not, then consider U (µ) = V (µ) − V (µ). Since both V and V are in L, U ∈ L. Therefore arg min U is non empty and min U < 0 according to our assumption. Choose µ ∈ arg min U . Let (pi , µi , σ̂) be the strategy approaches V (µ): ( ∑ ) ∑ p i µi ∑ D2 V (µ) 2 ρV (µ) = pi (V (µi ) − V (µ)) − DV µ, ∑ pi (µi − µ) + σ̂ + ε pi 2 Now we compare DV and DV : ∀µn → µ− : V (µn ) − V (µ) V (µ) − V (µn ) = µn − µ µ − µn V (µ) − V (µn ) + U (µ) − U (µn ) = µ − µn V (µn ) − V (µ) ≤ µn − µ V (µn ) − V (µ) V (µn ) − V (µ) =⇒ lim inf ≤ lim inf µn − µ µn − µ =⇒ DV− (µ) ≤ DV − (µ) 47 Similarly for µn → µ+ : V (µn ) − V (µ) V (µn ) − V (µ) − U (µ) + U (µn ) = µn − µ µn − µ V (µn ) − V (µ) ≥ µn − µ =⇒ DV+ (µ) ≥DV + (µ) Suppose DV+ (µ) > DV− (µ), then D2 V (µ) = +∞. This contradicts unimprovability of V at µ. Therefore the only possibility is DV+ (µ) = DV( µ) = DV + (µ) = DV − (µ) = D. ∀µn → µ: 2V (µn ) − 2V (µ) − 2D 2V (µn ) − 2V (µ) − 2D − 2U (µ) + 2U (µn ) = (µn − µ)2 (µn − µ)2 2V (µn ) − 2V (µ) − 2D ≥ (µn − µ)2 =⇒ D2 V (µ) ≥D2 V (µ) Therefore: ρV (µ) = ≤ ∑ ( pi (V (µi ) − V (µ)) − DV ∑ ∑ ) D2 V (µ) 2 pi µi ∑ σ̂ + ε µ, ∑ pi (µi − µ) + pi 2 pi (V (µi ) − V (µ) − (U (µ) − U (µi ))) ( ∑ ) pi µi ∑ D2 V (µ) 2 − DV µ, ∑ σ̂ + ε pi (µi − µ) + 2 pi ( ∑ ) ∑ pi µi ∑ D2 V (µ) 2 ∑ ≤ pi (V (µi ) − V (µ)) − DV µ, σ̂ + ε pi (µi − µ) + 2 pi <ρV (µ) + ε (µ) If we choose ε = ρV (µ)−ρV in the begining, we get a contradiction. The first inequality 2∑ ∑ ∑ pi µi ∑ . 
Therefore it takes DV− when the coefficient comes from pi µi ≥ pi µ iff µ ≤ pi is negative and vice versa. The second inequality comes from µ ∈ arg min U . • V (µ) ≥ V (µ): We prove by showing that ∀dt > 0, V ≥ Vdt . Suppose not, then there exists µ′ , dt s.t. Vdt (µ′ ) > V (µ′ ). Let dtn = 2dtn . Since Vdtn is increasing, there exists ε > 0 s.t. Vdtn (µ′ ) − V (µ′ ) ≥ ε ∀n ∈ N. Now consider Un = V − Vdtn . Un will be continuous by Lemma 3 and Un (µ′ ) < −ε. Therefore, there exists µn ∈ arg min Un . Since ∆(X) is compact, there exists a converging sequence lim µn = µ. By assumption, Un (µn ) ≤ −ε, therefore µ must be in interior of ∆(X). Now consider the optimal strategy of discrete time problem: ∑ n −ρdtn pi Vdtn (µni ) V dt n (µ ) = e ∑ pi (H(µn ) − H(µni )) = cdtn ∑ pi µni = µn 48 By definition of Un (µ): ∑ ( ) ∑ pi (Vdtn (µni ) − Vdtn (µn ) − Un (µn ) + U (µni )) pi V (µni ) − V (µn ) = ∑ ≥ pi (Vdtn (µni ) − Vdtn (µn )) ( ) = eρdtn − 1 Vdtn (µn ) ≥ρdtn ε + ρdtn V (µn ) ∑ pi ( ) =⇒ ρV (µn ) ≤ − ρε + V (µni ) − V (µn ) dtn The first equality is definition of Un . The first inequality is from µn ∈ arg min Un . The second inequality is from ex − 1 = x and Un (µn ) ≤ −ε. Now since number of posteriors µni is no more than 2 |X|, we can take a subsequence of n such that all lim µni = µi . We partition µni into two kinds, lim µni = µi ̸= µ, lim µnj = µ. First, since V is unimprovable, we have −D2 V (µ) ≤ ρc V (µ)H ′′ (µ) point wise. Since V ∈ C (1) , H ∈ C (2) , ∀η, there exists δ s.t. ∀ |µ′ − µ| ≤ δ: D2 V (µ′ ) ≤ − ρ V (µ)H ′′ (µ) + η c |H ′′ (µ) − H ′′ (µn )| ≤ η Then there exists N s.t. ∀n ≥ N , µnj − µ < δ, |µn − µ| < δ. We have an intermediate value result: V (µnj ) − V (µn ) − V ′ (µn )(µnj − µn ) ( ) = V ′ (ξjn ) − V ′ (µn ) (µnj − µn ) ( )2 ≤ sup D2 V (ξ) µnj − µn ξ∈[ξjn ,µn ] ( )2 ≤ sup D2 V (ξ) µnj − µn |ξ−µ|≤δ )( ( ρ )2 ≤ − V (µ)H ′′ (µ) + η µnj − µn c Therefore: ∑ ( ) ∑ ( ) pi,j V (µni,j ) − V (µn ) = pi V (µni ) − V (µn ) − V ′ (µn ) (µni − µn ) ∑ ( ( )) + pj V (µnj ) − V (µn ) − V ′ (µn ) µnj − µn ∑ ( ) ≤ pi V (µni ) − V (µn ) − V ′ (µn )(µni − µn ) ) ∑ ( )2 ( ρ − V (µ)H ′′ (µ) + η + pj µnj − µn c ∑ Now let pni = pi , dtn ∑ σ̂n2 = 2 n n ′ n n n pn j (H(µ )−H(µj )+H (µ )(µj −µ )) , −H ′′ (µn )dtn we will have: 1 pni (H(µn ) − H(µni ) + H ′ (µn )(µni − µn )) − σ̂n2 H ′′ (µn ) = c 2 49 (pni , µni , σ̂ n ) is a feasible experiment at µ for problem Equation (B.3). Therefore, by optimality of V at µn , we have: 1 n ′′ n ∑ ( ) n n n n ′ n n n c + 2 σ̂ H (µ ) pi V (µi ) − V (µ ) − V (µ )(µi − µ ) ≤ ρV (µ ) c n ρ (µ ) V D2 V (µn ) ≤ − c H ′′ (µn ) ∑ Then we study term pj (µnj − µn )2 . Consider: ∑ ( ) pj H(µn ) − H(µnj ) + H ′ (µn )(µnj − µn ) ∑ ( )2 )( = pj −H ′′ (ξjn ) µnj − µn ∑ ∑ )2 ( pj (µnj − µn )2 ≥ pj (−H ′′ (µ)) µnj − µn − η Therefore, to sum up: ∑ pi,j ( ) ∑ n( ) V (µni,j ) − V (µn ) ≤ pi V (µni ) − V (µn ) − V ′ (µn )(µni − µn ) dtn ) ∑ pj ( )2 ( ρ + µnj − µn − V (µ)H ′′ (µ) + η dtn c 1 2 ′′ n c + 2 σ̂n H (µ ) ≤ρV (µn ) c (∑ + pj (H(µn ) − H(µnj ) + H ′ (µn )(µnj − µn )) ) ) ρ 1 ∑ ( n n 2 η pj µj − µ V (µ) + dtn c 1 ∑ + pj (µnj − µn )2 η dtn c + 12 σ̂n2 H ′′ (µn ) =ρV (µn ) c ( ) ) ρ 1 n 1 ∑ ( n ′′ n n 2 V (µ) + σ̂ (−H (µ )) + η pj µj − µ 2 dtn c 1 ∑ + pj (µnj − µn )2 η dtn What’s more: ( ) pj H(µn ) − H(µnj ) + H ′ (µn )(µnj − µn ) −µ ) ≤ −H ′′ (µ) − η cdtn ≤ −H ′′ (µ) − η n 2 ′′ c + σ̂n H (µ ) ≤1 c =⇒ ρV (µ) ≤ − ρε + ρV (µ) ∑ ∑ pj (µnj n 2 +η ρV (µ) c +η ′′ ′′ −H (µ) − η −H (µ) − η → ρV (µ) − ρε when η → 0 50 Contradiction. 
Therefore $V(\mu)\geq V_{dt}(\mu)$ for every $dt>0$, and hence
\[ V(\mu)\;\geq\;\limsup_{dt\to 0}V_{dt}(\mu)\;=\;\overline V(\mu). \]

B.2. Characterization

B.2.1. Proof of Theorem 2

[Figure B.8 about here: six panels showing successive steps of the construction, each plotting value-function candidates $V$ against the belief $\mu\in[0.5,1]$.] The two black (dashed and solid) lines are $F_{m-1}(\mu)$ and $F_m(\mu)$. The blue line is the optimal value function from taking immediate action $m$; the red line is the optimal value function from taking immediate action $m-1$.

Figure B.8: Construction of the optimal value function.

Proof. We prove Theorem 2 by constructing the point $\mu^*$ and the function $\nu(\mu)$. By definition, $\nu(\mu)$ is a feasible mechanism in Equation (B.3), so the corresponding $V(\mu)$ is feasible. To prove Theorem 2 it therefore suffices to prove unimprovability of $V(\mu)$, which we do after the construction. To simplify notation, we define a flow version of the information measure:
\[ J(\mu,\mu')\;=\;H(\mu)-H(\mu')-H'(\mu)(\mu-\mu'), \]
so that the total flow information cost of a Poisson experiment with arrival rate $p$ and target posterior $\mu'$ is $p\,J(\mu,\mu')$.

Algorithm: In this part we introduce the algorithm constructing $V(\mu)$ and $\nu(\mu)$. We only discuss the case $\mu\geq\mu^*$; the case $\mu\leq\mu^*$ follows by a symmetric method.

• Step 1: By Lemma 13, there exist $\mu^*\in[0,1]$ and $\overline V(\mu)$ defined as
\[ \overline V(\mu)\;=\;\max_{\mu',m}\;\frac{F_m(\mu')}{1+\frac{\rho}{c}J(\mu,\mu')}. \]

• Step 2: We construct the first piece of $V(\mu)$ to the right of $\mu^*$. By Lemma 13, there are three possible cases of $\mu^*$ to discuss (we omit $\mu^*=1$ by symmetry).

Case 1: Suppose $\mu^*\in(0,1)$ and $\overline V(\mu^*)>F(\mu^*)$. Then there exist $m$ and $\nu(\mu^*)>\mu^*$ such that
\[ \overline V(\mu^*)\;=\;\frac{F_m(\nu(\mu^*))}{1+\frac{\rho}{c}J(\mu^*,\nu(\mu^*))}. \]
With initial condition $\big(\mu_0=\mu^*,\,V_0=\overline V(\mu^*),\,V_0'=0\big)$ we solve for $V_m(\mu)$ as defined in Lemma 15. This refers to Figure B.8-1. Let $\hat\mu_m$ be the first $\mu\geq\mu^*$ at which
\[ V_m(\mu)\;=\;\max_{\mu'\geq\mu}\;\frac{c}{\rho}\,\frac{F_{m-1}(\mu')-V_m(\mu)-V_m'(\mu)(\mu'-\mu)}{J(\mu,\mu')}. \]
Notice that $V_m(\hat\mu_m)\geq F_{m-1}(\hat\mu_m)$; otherwise there would be an even smaller such $\mu$. This refers to Figure B.8-2. Then we solve for $V_{m-1}$ with initial condition $\mu_0=\hat\mu_m$, $V_0=V_m(\hat\mu_m)$, $V_0'=V_m'(\hat\mu_m)$. If further actions remain to be crossed, we continue this procedure by looking for $\hat\mu_{m-1}$, the first $\mu$ such that
\[ V_{m-1}(\mu)\;=\;\max_{\mu'\geq\mu}\;\frac{c}{\rho}\,\frac{F_{m-2}(\mu')-V_{m-1}(\mu)-V_{m-1}'(\mu)(\mu'-\mu)}{J(\mu,\mu')}, \]
and so on, until the last piece is obtained. This refers to Figure B.8-3. The constructed function is defined piecewise by these $V_{m'}$; by definition it is smoothly increasing until it hits $F$. Since $\overline V(\mu^*)>F(\mu^*)$, the intersection point $\mu^{**}$ is strictly larger than $\mu^*$. This refers to the intersection point of the red curve and $F$ in Figure B.8-3.

Case 2: Suppose $\mu^*\in(0,1)$ but $\overline V(\mu^*)=F(\mu^*)$. Consider
\[ \widetilde V(\mu)\;=\;\max_{\mu'\geq\mu,\,k}\;\frac{c}{\rho}\,\frac{F_k(\mu')-F(\mu)}{J(\mu,\mu')}. \]
Let $\mu^{**}=\inf\{\mu\,|\,\widetilde V(\mu)>F(\mu)\}$.

Case 3: Suppose $\mu^*=0$. Consider
\[ \widetilde V(\mu)\;=\;\max_{\mu'\geq\mu,\,k}\;\frac{c}{\rho}\,\frac{F_k(\mu')-F_1(\mu)-F_1'(\mu'-\mu)}{J(\mu,\mu')}. \]
There exists $\delta>0$ such that for all $\mu<\delta$ and all $\mu'\geq\mu_2$,⁷ $\frac{c}{\rho}\frac{\sup F}{J(\mu,\mu')}\leq\inf F$. Therefore $\inf\{\mu\,|\,\widetilde V(\mu)>F_1(\mu)\}>0$; we call it $\mu^{**}$. This step refers to Figure B.8-4.

⁷ $\mu_k=\inf\{\mu\,|\,F_k(\mu)>F_{k-1}(\mu)\}$.

• Step 3: For every $\mu\geq\mu^{**}$ such that
\[ F(\mu)\;=\;\max_{\mu'\geq\mu,\,k}\;\frac{c}{\rho}\,\frac{F_k(\mu')-F(\mu)-F'_-(\mu)(\mu'-\mu)}{J(\mu,\mu')} \tag{B.4} \]
holds, let $m$ be the optimal action. Solve for $V_m$ with initial condition $\mu_0=\mu$, $V_0=F(\mu)$, $V_0'=F'_-(\mu)$. Let $\hat\mu_m$ be the first $\mu\geq\mu_0$ at which
\[ V_m(\mu)\;=\;\max_{\mu'\geq\mu}\;\frac{c}{\rho}\,\frac{F_{m-1}(\mu')-V_m(\mu)-V_m'(\mu)(\mu'-\mu)}{J(\mu,\mu')}. \]
Then we solve for $V_{m-1}$ with initial condition $\mu_0=\hat\mu_m$, $V_0=V_m(\hat\mu_m)$, $V_0'=V_m'(\hat\mu_m)$.
This step refers to Figure B.8-5. We continue this procedure until we get Vm0 , where m0 is the index such that Fm0 −1 (µ) = F (µ). Now suppose Vm0 first hit F (µ) at some point µ′′ (can potentially be µ), define: ′ if µ′ < µ F (µ ) Vµ (µ′ ) = Vm0 (µ′ ) if µ′ ∈ [µ, µ′′ ] F (µ′ ) if µ′ > µ′′ Vµ ≥ F point wise. Vµ can be identical to F . However whennever Vµ is not identical to F , Vµ (µ′ ) > F (µ′ ) on an open interval (µ, µ′′ ). Call the set of all these µ0 : Ω. Since F ′− (µ) is a left continuous function that only jumps down. It’s not hard to verify that Ω is a closed set. • Step 4: Define: V (µ) = Vm (µ) if µ ∈ [µ∗ , µ∗∗ ] sup {Vµ0 (µ)} if µ ≥ µ∗∗ µ0 ∈Ω Then ∀V (µ) > F (µ), there must exists µn s.t. V (µ) = limn Vµn (µ) by definition of sup. Since Ω is a closed set (and bounded in [0, 1]), there exists µnk → µ0 . By continuous dependence, Vµ0 (µ) = V (µ). Then in the open interval around µ that V > F , V (µ) = Vµ0 (µ). Otherwise Vµ0 intersects some other Vµ′ in the region Vµ0 > 0, which violates uniqueness of ODE. Therefore V is a smooth function on {µ|V (µ) > F (µ)}. This step refers to Figure B.8-6. In the algorithm, we only discussed the case µ∗ < 1 and constructed the value function on the right of µ∗ . On the left of µ∗ , V can be defined by using a totally symmetric argument by referring to Lemma 15′ . Before we proceed to proof of smoothness and unimprovability of V , we state a useful result: Lemma 11. ∀µ ≥ µ∗ s.t. V (µ) = Fm (µ), we have: Fm (µ) ≥ max ′ µ ≥µ c Fm+k (µ′ ) − Fm (µ) − Fm− ′ (µ′ − µ) k ≜ Um (µ) ρ J(µ, µ′ ) 53 k We prove this result by contradiction. Suppose not true, then exists µ s.t. Um (µ) > −′ k Fm (µ). F is a lower-semi continous and left continuous function. Then Um will be upper-semi continuous and left continuous w.r.t. µ when m is taken that F (µ) = Fm (µ) k (a continuous function with only downward jumps). By the definition of µ∗∗ and Um (µ) > k Fm (µ), there exists µ0 > 0 s.t. Um0 (µ0 ) = Fm0 (µ0 ), where m0 is the corresponding index s.t. F (µ0 ) = Fm0 (µ0 ). Take µ0 < µ to be the supremum of µ0 such that this is true ( ) (we know the existence by continuity). Now consider initial condition µ0 , Fm0 (µ0 ), Fm′ 0 , by Lemma 15, we solve for Vk (µ) on [µ0 , µ]. Now consider any µ′ ∈ (µ0 , µ), suppose Vk (µ′ ) ≤ Fm′ (µ′ ), then by immediate value theorem, µ′ can be picked that Vk′ (µ′ ) ≤ Fm′ ′ . Therefore: c Fk (µ′′ ) − Vk (µ′ ) − Vk′ (µ′ )(µ′′ − µ′ ) J(µ′ , µ′′ ) µ′′ ρ c Fk (µ′′ ) − Fm′ (µ′ ) − Fm′ ′ (µ′′ − µ′ ) ≥ sup J(µ′ , µ′′ ) µ′′ ρ Vk (µ′ ) = sup >Fm′ (µ′ ) Last inequality is from the fact that µ′ ∈ (µ0 , µ]. Therefore, by definition of V (µ), V (µ) ≥ Vk (µ) on [µ0 , µ]. This contradicts the fact that V (µ) = Fm (µ). Lemma 11′ . ∀µ ≤ µ∗ s.t. V (µ) = Fm (µ), we have: Fm (µ) ≥ max ′ µ ≤µ c Fm−k (µ′ ) − Fm (µ) − Fm+ ′ (µ′ − µ) k ≜ Um (µ) ρ J(µ, µ′ ) Smoothness: Given our construction of V (µ), ∀µ s.t. V (µ) > F (µ), V is piecewise solution of the ODEs and is C (1) smooth by construction. However on {µ|V (µ) = F (µ)}, our definition of V is by taking supremum over an uncountable set of Vµ ’s. Therefore V (µ) is not necessarily differentiable. We now discuss smoothness of V on this set in details (we only discuss µ ≥ µ∗ and leave the remaining case to symmetry argument). Suppose µ ∈ {µ|V (µ) = F (µ)}o , then V = F locally on an open interval. To show smoothness of V , it’s sufficient to show smoothness of F . Suppose not, then µ = µm . 
However, at µm : ′ ′ ′ c Fm+1 (µ ) − Fm (µm ) − Fm (µ − µm ) =∞ lim µ′ →µ+ ρ J(µm , µ′ ) m Therefore, we apply the result just derived and get contradiction. Now we only need to discuss the boundary of {µ|V (µ) = F (µ)}. The first case is that {µ|V (µ) > F (µ)} is not dense locally. Therefore, V = F locally at only side of µ, which implies one sided smoothness. The only remaining case is that there exists µn → µ s.t. F (µn ) < V (µn ). We first show differentiability of V at µ. We already know that V (µ′ ) − V (µ) ≥ F ′ (µ)(µ′ − µ) (µ) since V ≥ F . Suppose now µn → µ+ and V (µµnn)−V ≥ F ′ (µ) + ε. Then apply Lemma 12 −µ to V (µ) − F (µ), we can pick µn → µ+ and V ′ (µn ) ≥ F ′ (µ) + ε. Consider ν(µn ) being the solution of posterior associated with µn , by definition of µn , ν(µn ) ≥ µm+1 (when µn < µm+1 , the objective function will be negative, therefore 54 suboptimal for sure). So we can pick a converging subsequence of µ2n to some ν ≥ µm+1 . then: F (µ) = lim V (µn ) c Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn ) = lim n→∞ ρ J(µn , ν(µn )) c Fmn (ν(µn )) − F (µn ) − (F ′ (µ) + ε)(ν(µn ) − µn ) ≤ lim n→∞ ρ J(µn , ν(µn )) c Fmn (ν(µn )) − F (µn ) − F ′ (µ)(ν(µn ) − µn ) cε ν(µn ) − µn ≤ lim − lim n→∞ ρ n→∞ J(µn , ν(µn )) ρ J(µn , ν(µn )) ′ c Fm′ (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ = − ρ J(µ, ν) ρ J(µ, ν) <F (µ) (µn ) Contradiction. Now suppose µn → µ− and V (µ)−V ≤ F ′ (µ) − ε. Then similarly we can µ−µn choose µn s.t. V ′ (µn ) ≤ F ′ (µ) − ε. Choose ν, m being the optimal posterior and action at µ. Then: F (µ) = lim V (µn ) c Fm (ν) − V (µn ) − V ′ (µn )(ν − µn ) ≥ lim n→∞ ρ J(µn , ν) c Fm (ν) − V (µn ) − F ′ (µ)(ν − µn ) c ε(ν − µn ) + ≥ lim n→∞ ρ J(µn , ν) ρ J(µn , ν) ′ c Fm (ν) − V (µn ) − F (µ)(ν − µn ) c ε(ν − µn ) ≥ lim + lim n→∞ ρ J(µn , ν) n→∞ ρ J(µn , ν) ′ c Fm (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ + = ρ J(µ, ν) ρ J(µ, ν) >F (µ) Contradiction. Therefore we showed that V will be differentiable everywhere. Now suppose V ′ is not continuous at µ. Utilizing previous proof, we have already ruled out the cases when limµ′ →µ+ > F ′ (µ) and limµ′ →µ− < F ′ (µ). Suppose now exists µn → µ+ and V ′ (µn ) ≤ F ′ (µ) − ε. Then consider: F (µ) = lim V (µn ) c Fm (ν) − V (µn ) − V ′ (µn )(ν − µn ) ≥ lim n→∞ ρ J(µn , ν) c Fm (ν) − V (µn ) − F ′ (µ)(ν − µn ) c ε(ν − µn ) ≥ lim + lim n→∞ ρ J(µn , ν) n→∞ ρ J(µn , ν) ′ c Fm (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ + = ρ J(µ, ν) ρ J(µ, ν) >F (µ) Contradiction. When µn → µ− and V ′ (µn ) ≥ F ′ (µ) + ε, similarly as before, we can take ν(µn ) converging to ν ≥ µm+1 . Then: F (µ) = lim V (µn ) 55 c Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn ) n→∞ ρ J(µn , ν(µn )) c Fmn (ν(µn )) − F (µn ) − F ′ (µ)(ν(µn ) − µn ) cε ν(µn ) − µn ≤ lim − lim n→∞ ρ n→∞ ρ J(µn , ν(µn )) J(µn , ν(µn )) ′ c Fm′ (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ = − ρ J(µ, ν) ρ J(µ, ν) = lim <F (µ) Contradiction. To sum up, we proved that V (µ) is differentiable on (0, 1) and V ′ (µ) is continuous on (0, 1). What’s more, since µ∗ ∗ is bounded away from {0, 1}, in the neighbour of {0, 1}, V = F . Therefore V (µ) is C (1) smooth on [0, 1]. Unimprovability: Now we prove the unimprovability of V (µ). • Step 1 : We first show that V (µ) solves the following problem: { } c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) V (µ) = max F (µ), max µ′ ,m ρ J(µ, µ′ ) { µ′ ≥ µ when µ ≥ µ∗ (P-C) µ′ ≤ µ when µ ≤ µ∗ Equation (P-C) is the maximization problem over all confirmatory evidence seeking with immediate decision making upon arrival of signals. 
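To fix ideas, the following is a minimal numerical sketch of how the inner maximization in Equation (P-C) is evaluated. All numbers are illustrative assumptions: binary entropy $H$, two linear payoffs $F_1(\mu)=1-2\mu$ and $F_2(\mu)=2\mu-1$ (so $\mu^*=1/2$ by symmetry), and a hypothetical candidate pair $(V(\mu),V'(\mu))$ at a single belief $\mu\geq\mu^*$; the code scans confirmatory targets $\mu'\geq\mu$ for each action and reports the best one-step deviation.

import numpy as np

# Minimal sketch of the inner maximization in Equation (P-C).  All numbers are
# illustrative: binary entropy H, payoffs F_1(mu)=1-2mu and F_2(mu)=2mu-1, and a
# hypothetical candidate pair (V(mu), V'(mu)) at one belief mu >= mu* = 1/2.
rho, c = 0.5, 1.0
H  = lambda m: -(m*np.log(m) + (1-m)*np.log(1-m))
Hp = lambda m: np.log((1-m)/m)                    # H'
J  = lambda a, b: H(a) - H(b) - Hp(a)*(a - b)     # flow information measure
F  = {1: lambda m: 1 - 2*m, 2: lambda m: 2*m - 1}

mu, V, dV = 0.60, 0.205, 1.95                     # hypothetical candidate value and slope
stop = max(Fm(mu) for Fm in F.values())           # immediate-action payoff F(mu)
targets = np.linspace(mu + 1e-4, 1 - 1e-4, 4000)  # confirmatory targets mu' >= mu
for m, Fm in F.items():
    dev = (c/rho) * (Fm(targets) - V - dV*(targets - mu)) / J(mu, targets)
    print(f"m={m}: best target mu'={targets[np.argmax(dev)]:.3f}, value={dev.max():.4f}")
print(f"stopping payoff F(mu)={stop:.3f}, candidate V(mu)={V}")

A candidate solving Equation (P-C) would make $V(\mu)$ equal to the larger of the stopping payoff and the best deviation at every belief; Equation (P-C) is in this sense a pointwise condition that can be checked belief by belief.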
It has less constraint than the definition of V (µ) in the following sense: when we are defining V (µ), we also optimize over confirmatory eivdence seeking and immediate decision making. But we optimizing over all possible actions in a sequential way – with µ increasing, the DM is forced to choose actions with decreasing index m. However in Equation (P-C), unimprovability is over all possible choice of actions. We still focus on the case µ ≥ µ∗ . For the case µ ≤ µ∗ , a totally symmetric argument applies by referring to Lemma 19′ . Case 1 : V (µ) > F (µ). Then there exists µ0 s.t. V (µ) = Vµ0 (µ), and m0 is the optimal action corresponding to µ0 . Suppose the associated action is m at µ. Then in the construction of Vµ0 , we explicitly constructed µ̂m′ ≤ µ, ∀m′ ∈ {m0 , . . . , m}. Then by Lemma 19, Equation (P-C) is satisfied at µ. Case 2 : V (µ) = F (µ). Then according to Step 4, there are two possibilities. If µ ∈ Ω, then by construction of Vµ , we have: c Fk (µ′ ) − F (µ) − F ′ (µ)(µ′ − µ) µ ≥µ,k ρ J(µ, µ′ ) F (µ) = max ′ This exactly is Equation (P-C). If µ ̸∈ Ω and F (µ) is larger than the maximum on RHS of Equation (B.4), then this also satisfies Equation (P-C). The only remaining contradictory case is that µ ̸∈ Ω and: c Fk (µ′ ) − F (µ) − F ′ (µ)(µ′ − µ) µ ≥µ,k ρ J(µ, µ′ ) F (µ) < max ′ By Lemma 11 we proved in the construction part, we conclude that this is impossible. 56 • Step 2 : Then we show that V (µ) solves the following problem: { } c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) V (µ) = max F (µ), max µ′ ρ J(µ, µ′ ) { µ′ ≥ µ when µ ≥ µ∗ (P-D) µ′ ≤ µ when µ ≤ µ∗ Equation (P-D) is the maximization problem over all confirmatory evidence seekings. It has less constraint than Equation (P-C) in the following sense: when a signal arrives and posterior belief µ′ is realized, the DM is allowed to continue experimentation instead of being forced to take an action. We only show that case µ ≥ µ∗ and a totally symmetric argument applies to µ ≤ µ∗ . Suppose not, then there exists: c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) c V (µ′′ ) − V (µ) − V ′ (µ)(µ′′ − µ) Ve = max ≤ V (µ) < µ′ ≥µ,m ρ J(µ, µ′ ) ρ J(µ, µ′′ ) Suppose the maximizer is µ′ , m. Optimality implies Equation (B.9) and Equation (B.8): ρ ρ Fm′ + Ve H ′ (µ′ ) = V ′ (µ) + Ve H ′ (µ) c ( ) ( c ) ( ) F (µ′ ) + ρ Ve H(µ′ ) − V (µ) + ρ Ve H(µ) = V ′ (µ) + ρ V (µ)H ′ (µ)(µ′ − µ) m c c c We define L(V, λ, µ)(µ′ ) as a linear function of µ′ L(V, λ, µ)(µ′ ) = (V (µ) + λH(µ)) + (V ′ (µ) + λH ′ (µ))(µ′ − µ) Define G(V, λ)(µ) as a function of µ: G(V, λ)(µ) = V (µ) + λH(µ) Then G(Fm , ρc Ve )(µ′ ) is a concave function of µ′ . Consider: ( ρ ) ( ρ e) ′ ′ e L V, V , µ (µ ) − G Fm , V (µ ) c c FOC implies it will be convex and attains minimum 0 at µ′ . For any m′ other than m, ( ρ ) ( ρ ) L V, Ve , µ (µ′ ) − G Fm′ , Ve (µ′ ) c c will be convex and weakly larger than zero. However, ( ρ ) ( ρ ) L V, Ve , µ (µ′′ ) − G V, Ve (µ′′ ) c ( c ) ρe ′′ ′ ′′ ′′ = − V (µ ) − V (µ) − V (µ)(µ − µ) − V J(µ, µ ) c <0 ( ) ( ) ρe ρe ′ Therefore L V, c V , µ (µ )−G V, c V (µ′ ) will have minimum strictly negative. Suppose it’s minimized at µ e. 
Then FOC implies: ρ ρ µ) + Ve H ′ (e µ) V ′ (µ) + Ve H ′ (µ) = V ′ (e c c 57 Consider: ( ( ρ e) ρe ) e (ν(e µ)) − G Fm , V (e ν) L V, V , µ c ) ( ρc ) ( ρ =L V, Ve , µ (ν(e µ)) − G Fm , Ve (ν(e µ)) c c ( ρe ρe ′ ) ′ + V (e µ) − V (µ) + V (H(e µ) − H(µ)) − V (µ) + V H (µ) (e µ − µ) c c ( ) ρ ρ ≥V (e µ) − V (µ) + Ve (H(e µ) − H(µ)) − V ′ (µ) + Ve H ′ (µ) (e µ − µ) c ( c ( ρ ) ) ρ µ) − L V, Ve , µ (e µ) =G V, Ve (e c c >0 In the first equality we used FOC. In first inequality we used suboptimality of µ e at µ. ′ However for m and ν(e µ) being optimizer at µ e: ( ρ ) ( ) ρ 0 =L V, V (e µ), µ e (ν(e µ)) − G Fm′ , V (e µ) (ν(e µ)) c) ( ρc ) ( ρ e (ν(e µ)) − G Fm′ , Ve (ν(e µ)) =L V, Ve , µ c c ρ + (V (e µ) − Ve )(H(e µ) − H(ν(e µ)) + H ′ (e µ)(ν(e µ) − µ e)) c ρ > (V (e µ − Ve ))J(e µ, ν(e µ)) c Contradiction. Therefore, we proved Equation (P-D). • Step 3 : We show that V satisfies Equation (B.3), which is less restrictive than Equation (P-D) by allowing 1) diffusion experiments. 2) evidience seeking of all possible posteriors instead of just confirmatory evidence. First, since V is smooth and has a differentiable optimizer ν, envelope theorem implies: c −V ′′ (µ)(ν − µ) −H ′′ (µ)(ν − µ) + V (µ) ρ J(µ, ν) J(µ, ν) ( ) ρ c ν−µ ′′ ′′ V (µ) + V (µ)H (µ) =− ρ J(µ, ν) c V ′ (µ) = >0 ρ =⇒ V ′′ (µ) + V (µ)H ′′ (µ) < 0 c Therefore, allocating to diffusion experiment will always be suboptimal. What’s more consider: c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) µ ≤µ ρ J(µ, µ′ ) ( c µ′ − µ ρ − ′′ ) −′ ′′ =⇒ V (µ) = − V (µ) + V H (µ) ρ J(µ, µ′ ) c V − (µ) = max ′ V − (µ∗ ) = V (µ∗ ) and whenever V (µ) = V − (µ), we will have V −′ (µ) < 0. Therefore, V − (µ) can never cross V (µ) from below. That is to say: { } 1 ′′ ′ ′ ′ 2 ρV (µ) = max ρF (µ), max p(V (µ ) − V (µ) − V (µ)(µ − µ)) + V (µ)σ µ′ ,p,σ 2 58 1 s.t. pJ(µ, µ′ ) + H ′′ (µ)σ 2 ≤ c 2 To sum up, we construncted a policy function ν(µ) and value function V (µ) solving Equation (B.3). Now we show the four properties in Theorem 2. First, by our construction algorithm, in the case µ∗ ∈ {0, 1}, we can replace µ∗ with µ∗∗ ∈ (0, 1). Therefore we can { } WLOG set µ∗ ∈ (0, 1). Second, E = µ ∈ [0, 1]V (µ) > F (µ) is an open set, thus a union of disjoint open intervals. By our construction, in each interval, ν is piecewise smoothly defined by ODE. Therefore, ν(µ) is a piecewise C (1) smooth function. Third, by Lemma 17, solution to the ODE is strictly decreasing. By our construction, on each interval constructing E, when µ ≥ µ∗ increase ν always jumps to an action with lower slope. Therefore ν is piecewise decreasing. When µ ≤ µ∗ symmetric argument applies. Finally, discussion in Lemma 17 shows that when restricted to only one action, ν is uniquely determined by FOC. Therefore, except for those discountinous points of ν, ν is uniquely defined. Number of such discontinuous points is countable, thus of zero measure. Lemma 12. Suppose f : D ∈ R 7→ R is continuous and f is differentiable at x s.t. f (x) ̸= 0. Then ∀x s.t. f (x) = 0: • limx′ →x+ f (x′ )−f (x) x′ −x > ε, then there exists xn → x+ s.t. f ′ (xn ) ≥ ε. • limx′ →x+ f (x′ )−f (x) x′ −x < −ε, then there exists xn → x+ s.t. f ′ (xn ) ≤ −ε. • limx′ →x− f (x′ )−f (x) x′ −x > ε, then there exists xn → x− s.t. f ′ (xn ) ≥ ε. • limx′ →x− f (x′ )−f (x) x′ −x < −ε, then there exists xn → x− s.t. f ′ (xn ) ≤ −ε. Proof. We only prove the first result and the other three follow by symmetric argument. ′ (x) (x) Suppose limx′ →x+ f (xx)−f > ε, then there exists xn → x+ s.t. f (xxnn)−f ≥ ε. 
Now define ′ −x −x ′ ′ ′ ′ g(x ) = f (x )−ε(x −x). We have g(x) = 0 and g(xn ) ≥ 0. Since g(x ) = 0 =⇒ f (x′ ) > 0, g is differentiable at its roots. Suppose g(x) ≤ 0 on [x, xn ], then ∀x′ < xn , g(x′ ) ≤ g(xn ). Therefore g ′ (xn ) ≥ 0 =⇒ f ′ (xn ) − ε ≥ 0. Suppose g(x′ ) > 0 for some x′ ∈ (0, xn ), then g(x′ ) is maximized at interiror point µ∗n and FOC implies g ′ (x∗n ) = 0 =⇒ f ′ (x∗n ) = ε. In this case, we deifne xn = x∗n . Since x∗n ∈ [x, xn ], the newly defined xn → x+ . Lemma 13. Define V + and V − : Fm (µ′ ) ρ µ ≥µ,m 1 + J(µ, µ′ ) c ′ F m (µ ) V − (µ) = max ρ µ′ ≤µ,m 1 + J(µ, µ′ ) c V + (µ) = max ′ There exists µ∗ ∈ [0, 1] s.t. V + (µ) ≥ V − (µ) ∀µ ≥ µ∗ ; V − (µ) ≤ V − (µ) ∀µ ≤ µ∗ . + − Proof. We define function Um and Um as following: + Um (µ) = max ′ µ ≥µ Fm (µ′ ) 1 + ρc J(µ, µ′ ) 59 − Um (µ) = max ′ µ ≤µ Fm (µ′ ) 1 + ρc J(µ, µ′ ) Since Fm (µ) is a linear function, J(µ, µ′ ) ≥ 0 and smooth, the objective function will be a continuous function on compact domain. Therefore both maximization operators are well defined. Existence is already guaranteed, therefore we can refer to first order condition to characterize the maximizer: ( ) ρ ρ ′ ′ (B.5) FOC : Fm 1 + J(µ, µ ) + Fm (µ′ ) (H ′ (µ′ ) − H ′ (µ)) = 0 c c ρ SOC : Fm′ (H ′ (µ′ ) − H ′ (µ)) (B.6) c First we discuss first problem where µ′ ≥ µ, since (1 + ρc J) > 0, H ′′ < 0, we have H ′ (µ′ ) − H ′ (µ) ≤ 0 and inequality is strict when µ′ > µ. Therefore, if Fm′ < 0, FOC being held will imply SOC being strictly positive. So ∀Fm′ < 0, optimal µ′ will be boundary. What’s more, Fm (µ) Fm (1) = Fm (µ) > Fm (1) > ρ 1 + c J(µ, µ) 1 + ρc J(µ, 1) If Fm′ = 0, then ∀µ′ > µ: Fm (µ′ ) fm (µ) ′ = F (µ) = F (µ ) ≥ m m 1 + ρc J(µ, µ) 1 + ρc J(µ, µ′ ) + Therefore ∀Fm′ ≤ 0, Um (µ) = Fm (µ). Then we only need to consider the case Fm′ > 0. We will have SOC strictly negative when FOC holds. Therefore solution of FOC characterizes maximizer. ρ ρ Fm′ (1 + J(µ, µ)) + Fm (µ) (H ′ (µ) − H ′ (µ)) = Fm′ > 0 c c ρ ρ ′ ′ lim Fm (1 + J(µ, µ )) + Fm (µ) (H ′ (µ′ ) − H(µ)) = −∞ ′ µ →1 c c Therefore a unique solution of µ′ exists by solving FOC. Since FOC itself is a smooth function of µ, µ′ , and SOC is non-diminishing, implicit function theorem implies µ′ being a smooth function of µ. This is sufficient to apply envelope theorem: Fm (µ′ )(−H ′′ (µ)(µ′ − µ)) d + Um (µ) = >0 ( )2 dµ 1 + ρ J(µ, µ′ ) c Let m being the first Fm′ > 0 (not necessarily exists). Let: + U + (µ) = max Um (µ) m≥m Then U + (µ) will be a strictly increasing function when it’s well defined and we define: { } V + (µ) = max F (µ), U + (µ) Remark. U + doesn’t necessarily exist. Existence of U + is equivalent ot existence of an strictly increasing Fm . Similar for U − in the following discussion for the other case. 60 Similarly, in the second problem, we have H ′ (µ′ ) − H ′ (µ) ≥ 0. Therefore ∀Fm′ ≥ 0, we have − Um (µ) = Fm (µ) ∀Fm′ < 0, we have a unique and smooth solution characterized by FOC. Envelope theorem implies: Fm (µ′ )(−H ′′ (µ))(µ′ − µ) d − Um (µ) = <0 ( )2 dµ 1 + ρ J(µ, µ′ ) c We can define: { } V − (µ) = max F (µ), U − (µ) Where U − (µ) is a strictly decreasing function which might not exist. However, at least one of U − (µ) and U + (µ) exists otherwise the decision problem is trivial. Now we can discuss the intersection of U + and U − . We first eliminate one possible case: U + and U − has interior intersection µ∗ ∈ (0, 1) but at µ∗ , U + (µ∗ ) = U − (µ∗ ) = F (µ∗ ) ̸= 0. We show by contradiction that this is not possible. Suppose such µ∗ exists and F (µ∗ ) = Fm (µ∗ ), Fm′ < 0. 
Previous FOC implies ρ ρ lim Fm′ (1 + J(µ, µ′ )) + Fm (µ) (H ′ (µ′ ) − H ′ (µ)) = Fm′ < 0 ′ µ →µ c c − and boundary solution Therefore there must exists interior solution µ′ < µ∗ maximizes Um − ∗ ∗ µ must be suboptimal. Contradicting U (µ ) = F (µ ). By symmetry, we can also prove the other case Fm′ > 0 doesn’t exist. Therefore, we can define µ∗ in the following three cases: • Case 1: Both U + , U − exist and intersect at µ∗ ∈ (0, 1), where U + (µ∗ ) = U − (µ∗ ) ≥ F (µ∗ ), then we define this intersection as µ∗ . Equality holds only when F ′ (µ) = 0. • Case 2: Only U + exists or U + ≥ U − , we define µ∗ = 0. • Case 3: Only U − exists or U − ≤ U + , we define µ∗ = 1. Finally, we define V as: V (µ) = { V + (µ) when µ ≥ µ∗ V − (µ) when µ ≤ µ∗ Given our construction, µ∗ always exists and satisfies the conditions in Lemma 13. Lemma 14. Define V : { } V (µ) := max V + (µ), V − (µ) V is a piecewise smooth function. It is increasing if µ > µ∗ , V ′ > 0 when V (µ) > F (µ); decreasing if µ < µ∗ , V ′ < 0 when V (µ) > F (µ). 61 Proof. We discuss three cases of µ∗ separately: • Case 1: µ∗ ∈ (0, 1). Suppose F ′ (µ) = 0, then it’s trivial. Because both V and U + are increasing on [µ∗ , 1], then V = max {F, U + } must be increasing. What’s more, when V > F , V = U + , therefore V ′ > 0. By symmetric argument we can prove for [0, µ∗ ]. { } Suppose F ′ (µ) > 0, then V (µ) > F (µ). Consider µ1 = inf µ ≥ µ∗ |V (µ) = F (µ) , then F ′ ≥ V ′ > 0 on the left of µ1 . Then on [µ∗ , µ1 ], V = U + , on [µ1 , 1], F ′ > 0. Therefore V is strictly increasing on [µ∗ , 1]. By symmetric argument we can prove for [0, µ∗ ]. • Case 2: µ∗ = 0. Suppose U + ≥ U − , then U + ≥ F and the result is trivial. Suppose U − doesn’t exist, then F itself is increainsg, then V = max {U + , F } is increasing. • Case 3: µ∗ = 1. Suppose U + ≤ U − , then U − ≥ F and the result is trivial. Suppose U + doesn’t exist, then F itself is decreainsg, then V = max {U + , F } is decreasing. Lemma 15. Assume µ0 ≥ µ∗ , Fm′ ≥ 0, V0 , V0′ ≥ 0 satisfies: V (µ0 ) ≥ V0 ≥ Fm (µ0 ) c Fm (µ′ ) − V0 − V0′ (µ′ − µ0 ) V = max 0 µ′ ≥µ ρ J(µ0 , µ′ ) Then there exists a C (1) smooth and strictly increasing V (µ) defined on [µ0 , 1] satisfying V (µ) = max ′ µ ≥µ c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) ρ J(µ, µ′ ) (B.7) and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ . Proof. We start from deriving the FOC and SOC for Equation (B.7): Fm′ − Vm′ (µ) Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) + (H ′ (µ′ ) − H ′ (µ)) = 0 ′ ′ 2 J(µ, µ ) J(µ, µ ) ( ′ ) ′ ′ ′ ′ H (µ ) − H (µ) Fm − Vm (µ) Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) ′ ′ ′ SOC: + (H (µ ) − H (µ)) J(µ, µ′ ) J(µ, µ′ ) J(µ, µ′ )2 H ′′ (µ′ ) (Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ)) ≤ 0 + J(µ, µ′ ) FOC: If we impose feasibility: Vm (µ) = c Fm (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) ρ J(µ, µ′ ) (B.8) FOC and SOC reduces to: ρ FOC: Fm′ − Vm′ (µ) + V (µ)(H ′ (µ′ ) − H ′ (µ)) = 0 c ρ ′′ ′ SOC: H (µ )V (µ) ≤ 0 c 62 (B.9) (B.10) We proceed as following, we use FOC and feasiblity to derive an ODE system with intial value defined by V0 , V0′ . Then we prove that the solution must be strictly positive. Therefore, SOC is satisfied at the point where FOC is satisfied, the solution must be locally maximizer. What’s more, since H ′ (µ′ ) − H ′ (µ) < 0, when FOC is positive, SOC must be negative, then FOC will have a unique solution. Therefore the solution we get from the ODE system will be solution to problem Equation (B.7). 
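To illustrate what this procedure produces, the following is a minimal numerical sketch (not the paper's code) under illustrative assumptions: the binary-entropy measure $H$, a single linear payoff $F_m(\mu)=2\mu-1$, and $\rho/c=0.5$. It integrates the ODE for the target posterior $\nu(\mu)$ stated in Lemma 17 below by an explicit Euler scheme, and recovers $V_m$ from the closed form $V_m(\mu)=F_m(\mu)/\big(1-\frac{\rho}{c}J(\nu(\mu),\mu)\big)$ derived below as Equation (B.11).

import numpy as np

# Minimal numerical sketch of the construction outlined above (illustrative
# assumptions: binary entropy H, payoff F_m(mu)=2mu-1, rho/c=0.5).
rc = 0.5                                    # rho / c
H   = lambda m: -(m*np.log(m) + (1-m)*np.log(1-m))
Hp  = lambda m: np.log((1-m)/m)             # H'
Hpp = lambda m: -1.0/(m*(1-m))              # H''
J   = lambda a, b: H(a) - H(b) - Hp(a)*(a - b)
Fm, dFm = (lambda m: 2*m - 1), 2.0

def nu_dot(mu, nu):                         # right-hand side of the ODE in Lemma 17
    num = dFm*(1 + rc*J(mu, nu)) + rc*Fm(nu)*(Hp(nu) - Hp(mu))
    return J(nu, mu) * num / ((nu - mu) * Fm(mu) * Hpp(nu))

mu, nu, h = 0.60, 0.70, 1e-3                # initial condition satisfying Lemma 17
rows = []
while nu > mu + h:
    rows.append((mu, nu, Fm(mu) / (1 - rc*J(nu, mu))))    # (mu, nu(mu), V_m(mu))
    nu += h * nu_dot(mu, nu)                # explicit Euler step in mu
    mu += h
for mu, nu, V in rows[::20]:
    print(f"mu={mu:.3f}   nu(mu)={nu:.3f}   V_m(mu)={V:.4f}")

With this picture in mind, we now derive the ODE explicitly from the first-order condition and the feasibility constraint.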
F (µ′ ) − Vm′ (µ)(µ′ − µ) Equation (B.8) =⇒ Vm (µ) = m 1 + ρc J(µ, µ′ ) Equation (B.9) =⇒ V ′ (µ) = F ′ + ρ V (µ)(H ′ (µ′ ) − H ′ (µ)) m m c ( ) Fm (µ′ ) − Fm′ + ρc Vm (µ)(H ′ (µ′ ) − H ′ (µ)) (µ′ − µ) =⇒ Vm (µ) = 1 + ρc J(µ, µ′ ) ( ) ρ ρ =⇒ Vm (µ) 1 + J(µ, µ′ ) = Fm (µ) − Vm (µ)(H ′ (µ′ ) − H ′ (µ))(µ′ − µ) c c Fm (µ) =⇒ Vm (µ) = ρ 1 + c (H(µ) − H(µ′ ) + (H ′ (µ) + H ′ (µ′ ) − H ′ (µ)) (µ′ − µ)) Fm (µ) Vm (µ) = 1 − ρ J(µ′ , µ) c (B.11) =⇒ ρ Fm (µ)(H ′ (µ′ ) − H ′ (µ)) ′ ′ c Vm (µ) = Fm + 1 − ρc J(µ′ , µ) Consistency of Equation (B.11) implies that µ′ = ν(µ) characterized by the following ODE: ρ F (µ)(H ′ (ν) − H ′ (µ)) ∂ ∂ Fm (µ) Fm (µ) ′ c m + ν̇ = F + m ∂µ 1 − ρc J(ν, µ) ∂ν 1 − ρc J(ν, µ) 1 − ρc J(ν, µ) (B.12) Simplifying Equation (B.12), we get: ρ ρ F (µ)(H ′ (ν) − H ′ (µ)) F (µ)H ′′ (ν)(µ − ν) Fm′ c m c m + + )2 )2 ν̇ ( ( 1 − ρc J(ν, µ) 1 − ρc J(ν, µ) 1 − ρc J(ν, µ) Fm′ + ρc (−Fm′ J(ν, µ) + Fm (µ)(H ′ (ν) − H ′ (µ))) = 1 − ρc J(ν, µ) =⇒ Fm (µ)(H ′ (ν) − H ′ (µ)) + Fm (µ)H ′′ (ν)(µ − ν)ν̇ ρ =(−Fm′ J(ν, µ) + Fm (µ)(H ′ (ν) − H ′ (µ)))(1 − J(ν, µ)) c ρ ρ ′′ ′ =⇒ Fm (µ)H (ν)(µ − ν)ν̇ = −Fm J(ν, µ)(1 − J(ν, µ)) − J(ν, µ)Fm (µ)(H ′ (ν) − H ′ (µ)) c c ( ) Fm′ 1 − ρc J(ν, µ) + ρc Fm (µ)(H ′ (ν) − H ′ (µ)) =⇒ ν̇ = J(ν, µ) Fm (µ)H ′′ (ν)(ν − µ) ( ) ρ Fm′ 1 + c J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ)) ν̇ = J(ν, µ) Fm (µ)H ′′ (ν)(ν − µ) Since we want to solve for V0 on [µ0 , 1], we solve for ν0 at µ0 as the initial condition of ODE for ν. To utilitze Lemma 17, we need to first verify the inequality condition in Lemma 17. 63 ∗ The FOC characterizing optimizer νm of V (µ0 ) is Equation (B.5): ) ( ρ ∗ 0 ρ ∗ Fm′ 1 + J(µ0 , νm ) + Fm (νm ) (H ′ (νm ) − H ′ (µ0 )) = 0 c c The FOC characterizing ν is Equation (B.11): ( ) ρ ρ (Fm′ − V0′ ) 1 − J(ν0 , µ0 ) + Fm (µ0 ) (H ′ (ν0 ) − H ′ (µ0 )) = 0 c ) c ( ( ) ρ ρ ρ ′ ⇔Fm 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ)) = V0′ 1 − J(ν0 , µ0 ) c ) ( ) c ) (c ( ρ ρ ρ ⇔Fm (µ0 ) Fm′ 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ)) = V0′ Fm (µ0 ) 1 − J(ν0 , µ0 ) c c c (µ0 ) Since V0 = 1−FρmJ(ν ≥ 0, we can conclude that LHS is weakly positive. This satisifes 0 ,µ0 ) c the condition in Lemma 17 when Fm (µ0 ) ̸= 0. When Fm (µ0 ) = 0 then the condition holds for sure. Then Lemma 17 guarantees existence of ν(µ), which is continuously decreasing from µ0 until it hits ν(µ) = µ. Suppose ν is minimized at µm < 1, we define Vm (µ) as following: Fm (µ) if µ ∈ [µ0 , µm ) ρ Vm (µ) = 1 − c J(ν(µ), µ) F (µ) if µ ∈ [µ , 1] m m Then we prove the properies of Vm : 1. When µ → µm , ν(µ) → µ. Therefore J(ν, µ) → 0. This implies Vm (µ) → Fm (µ). So Vm is continuous. 2. By Equation (B.11), when µ ∈ [µ0 , µm ): Vm′ (µ) = Fm′ + Fm (µ)(H ′ (ν(µ)) − H ′ (µ)) c − J(ν(µ), µ) ρ When µ → µm , H ′ (ν(µ)) − H ′ (µ) → 0, J(ν(µ), µ) → 0. Thus Vm′ (µ) → Fm′ . So Vm′ will be continuous everywhere on [µ0 , 1]. Vm ∈ C (1) [µ0 , 1]. 3. By Lemma 17, ν will be decreasing when µ < µm . Therefore 1 − ρc J(ν, µ) will be increasing. So V (µ) > 0 ∀µ ∈ [µ0 , µm ). It’s trivial that when µ ≥ µm , Vm (µ) = Fm (µ). Therefore Vm (µ) > 0 everywhere. By our previous discussion, ν will actually be the maximizer. 4. Rewrite Equation (B.11) on [µ0 , 1]: ( ) Fm′ 1 + ρc J(µ, ν) + Fm (ν)(H ′ (ν) − H ′ (µ)) ′ Vm (µ) = 1 − ρc J(ν, µ) Accodring to proof of Lemma 17, Vm′ (µ) > 0 ∀µ ∈ (µ0 , 1]. 64 (B.13) Lemma 16. ∀δ, η > 0, ∀µ, ν s.t. µ, ν ∈ (δ, 1 − δ), |Fm (µ)| > η, ( ) Fm′ 1 + ρc J(ν, µ) + ρc Fm (ν)(H ′ (ν) − H ′ (µ)) L(µ, ν) = J(ν, µ) (ν − µ)Fm (µ)H ′′ (ν) L(µ, ν) is uniformly Lipschtiz continuous in ν and continuous in µ. 
Proof. It is not hard to see that for any $\xi>0$, $L(\mu,\nu)$ is well behaved when $|\mu-\nu|>\xi$. We discuss this case and the case $\nu\to\mu$ separately:

• When $|\mu-\nu|>\xi$, there exist $\varepsilon,\Delta>0$ such that
\[ \Delta\geq|F_m(\mu)|\geq\eta,\qquad \Delta\geq|H''(\nu)|\geq\varepsilon,\qquad \Delta\geq|H'(\mu)|,\ |H'(\nu)|,\qquad \Delta\geq|H(\mu)|,\ |H(\nu)|, \]
and $H''$ has Lipschitz constant $\Delta$ on $[\delta,1-\delta]$. Write $L(\mu,\nu)-L(\mu,\nu')$ as the difference of the numerators $J(\nu,\mu)\big(F_m'(1+\frac{\rho}{c}J(\mu,\nu))+\frac{\rho}{c}F_m(\nu)(H'(\nu)-H'(\mu))\big)$ over a common denominator, plus a term involving the difference of the denominators $(\nu-\mu)F_m(\mu)H''(\nu)$. Each factor appearing is bounded in absolute value by a polynomial in $\Delta$ and $\frac{\rho}{c}$, each such difference is bounded by a constant multiple of $|\nu-\nu'|$, and each denominator is bounded below in absolute value in terms of $\xi$, $\eta$ and $\varepsilon$. Therefore
\[ |L(\mu,\nu)-L(\mu,\nu')|\;\leq\;K\big(\Delta,\tfrac{\rho}{c},\xi,\eta,\varepsilon\big)\,|\nu-\nu'| \]
for a constant $K$ depending only on $\Delta$, $\frac{\rho}{c}$, $\xi$, $\eta$ and $\varepsilon$. Hence $L(\mu,\nu)$ is uniformly Lipschitz continuous in $\nu$.

• When $\nu\to\mu$, still using the parameters defined in the first case, a mean value expansion of $J(\nu,\mu)$ and of $H'(\nu)-H'(\mu)$ gives, for some $\tilde\nu,\tilde\nu'$ between $\mu$ and $\nu$,
\[ \frac{L(\mu,\nu)}{\nu-\mu}\;=\;\frac{(\mu-\nu)\,H''(\tilde\nu)\,\frac{\rho}{c}F_m(\tilde\nu')\,(\nu-\mu)}{(\nu-\mu)^2\,F_m(\mu)\,H''(\nu)}\;\leq\;\frac{\Delta^2}{\eta}, \]
so $L(\mu,\cdot)$ remains Lipschitz near the diagonal.

[Figure B.9 about here: phase diagram in the $(\mu,\nu)$ plane.]
$\bar\nu(\mu)$ is defined by $\frac{\rho}{c}J(\bar\nu(\mu),\mu)=1$. $\nu_m^*(\mu)$ is defined by $F_m'\big(1+\frac{\rho}{c}J(\mu,\nu_m^*(\mu))\big)+\frac{\rho}{c}F_m(\nu_m^*(\mu))\big(H'(\nu_m^*(\mu))-H'(\mu)\big)=0$. The red and blue lines are solution paths of the ODE $\dot\nu=L(\mu,\nu)$ with initial values satisfying Lemma 17.

Figure B.9: Phase diagram of $(\dot\mu,\dot\nu)$.

Combining the two cases, $L(\mu,\nu)$ is uniformly Lipschitz continuous for all $\nu\in[\delta,1-\delta]$. On the other hand, continuity of $L(\mu,\nu)$ in $\mu$ is immediate because all components of $L$ are continuous when $\mu$ is bounded away from $\nu$; when $\mu\to\nu$, the bound in the second case above also gives continuity in $\mu$.

Lemma 17. Assume $\mu_0\in[\mu^*,1)$, $F_m(\mu_0)\neq0$, $F_m'\geq0$, and $\nu_0\in[\mu_0,1)$ satisfies
\[ F_m(\mu_0)\Big(F_m'\big(1+\tfrac{\rho}{c}J(\mu_0,\nu_0)\big)+\tfrac{\rho}{c}F_m(\nu_0)\big(H'(\nu_0)-H'(\mu_0)\big)\Big)\;\geq\;0. \]
Then there is a continuous function $\nu$ on $[\mu_0,1]$ satisfying the initial condition $\nu(\mu_0)=\nu_0$. On $\{\mu\,|\,\nu(\mu)>\mu\}$, $\nu$ is differentiable, strictly decreasing and satisfies the ODE
\[ \dot\nu\;=\;J(\nu,\mu)\,\frac{F_m'\big(1+\frac{\rho}{c}J(\mu,\nu)\big)+\frac{\rho}{c}F_m(\nu)\big(H'(\nu)-H'(\mu)\big)}{(\nu-\mu)\,F_m(\mu)\,H''(\nu)}. \]

Proof. Before proceeding to solve the ODE, we characterize the dynamics of $(\mu,\nu)$ on $[0,1]^2$. Figure B.9 shows the phase diagram of $(\dot\mu,\dot\nu)$ on $[0,1]^2$ and the functions that determine the dynamics of $(\mu,\nu)$. The horizontal axis is $\mu$ and the vertical axis is $\nu$. The black line is $\nu=\mu$. The two thin black lines are the solutions $\bar\nu(\mu)$ of
\[ 1-\tfrac{\rho}{c}J(\bar\nu(\mu),\mu)\;=\;0. \]
The two dashed black lines are the two solutions $\nu^*(\mu)$ of
\[ F_m'\big(1+\tfrac{\rho}{c}J(\mu,\nu^*(\mu))\big)+\tfrac{\rho}{c}F_m(\nu^*(\mu))\big(H'(\nu^*(\mu))-H'(\mu)\big)\;=\;0. \]
Since we are discussing the case $\nu\geq\mu$, we only focus on the upper-left half of the graph:

• $F_m(\mu_0)<0$. This corresponds to the left half of the graph. Then
\[ F_m'\big(1+\tfrac{\rho}{c}J(\mu_0,\nu_0)\big)+\tfrac{\rho}{c}F_m(\nu_0)\big(H'(\nu_0)-H'(\mu_0)\big)\;\leq\;0\;\Longrightarrow\;\nu_0\geq\nu^*(\mu_0). \]
Therefore the initial condition $(\mu_0,\nu_0)$ lies in the red region. Here $\dot\nu=0$ when $\nu(\mu)=\nu^*(\mu)$ and $\dot\nu<0$ otherwise; when $F_m(\mu)$ is close to $0$, $\dot\nu$ goes to negative infinity if $\nu>\nu^*(\mu)$. So in this region $\nu$ must be strictly decreasing, reaching $\nu^*$ when $F_m(\mu)=0$. Intuitively, $\nu$ never enters the region $\nu>\nu_0$.
Then uniform Lipschtiz continuity of L(µ, ν) on ν ∈ [µ, ν0 ], for µ ∈ [µ0 , F −1 (−η)] will be enough to guarantee existence of solution. • F (µ0 ) > 0. This corresponds to the right half of the graph. ( ) ρ ρ ′ Fm 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (ν0 ) − H ′ (µ0 )) ≥ 0 c c =⇒ ν0 ≤ ν ∗ (µ0 ) Our intial condition will lie below the dashed line in blue region. L(µ, ν) < 0 in this region and L(µ, ν ∗ ) = 0. So the dynamics of ν in this region must have ν strictly decreasing until it reaches ν = µ. Then uniform Lipschtiz continuity of L(µ, ν) on ν ∈ [µ, ν0 ] for µ ∈ [µ0 , 1] will be sufficient ot guanrantee existence of solution. 67 Then we discuss in details the solution of ODEs: • Fm (µ0 ) > 0. Our conjecture is that solution ν will be no larger than ν0 within the region: µ ∈ [µ0 , ν0 ], ν ∈ [µ0 , ν0 ]. Therefore, we modify L(µ, ν) to define L̃(µ, ν) on the whole space: L̃(µ, ν) = L (max {min {µ, ν0 } , µ0 } , max {min {ν, ν0 } , µ0 }) It’s not hard to see that L̃ is uniformly Lipschtiz continuous w.r.t ν ∈ R for µ ∈ [0, 1] and continuous in µ ∈ [0, 1]. We can apply Picard-Lindelof to solve for ODE ν̃˙ = L̃(µ, ν̃) on the space with initial condition ν̃(µ0 ) = ν0 . – Consider ν̃ on [µ0 , 1], it starts at ν0 > µ0 . It first reaches ν = µ at µ ∈ (µ0 , 1] (we define it to be 1 when it doesn’t exist). Then for µ ∈ (µ0 , µ), we must have ∗ L(µ, ν̃) < 0. Suppose not, then there exists ν̃(µ) ≥ νm (µ) > ν0 . We pick a smallest µ such that this is true. Then this µ must be strictly larger than µ0 ∗ ∗ (µ ). Then at µ, ṽ(µ) ˙ because L(µ,0 , ν0 ) = 0 < ν˙m = 0 but ν̇m (µ) > 0. It’s 0 ∗ ˙ impossible that ṽ crosses νm from below. Contradiction. Then ν̃ < 0 until it hits ν = µ. – µ < ν0 . Suppose µ ≥ µ0 , since ν̃ < 0 on (µ0 , µ), ν̃(µ) < ν0 . Contradiction. Therefore, ν̃ on [µ0 , µ] will be with region [µ0 , ν0 ]. In the region [µ0 , µ] × [µ0 , ν0 ], L̃ coincides L. Therefore, ṽ is a solution to original ODE Equation (B.12). We define ν: { ν̃(µ) if µ ∈ [µ0 , µ] ν(µ) = µ if µ ∈ [µ, 1] It’s easy to verify that ν satisfies Lemma 17. The blue line on Figure B.9 illustrates a solution in this case. • Fm (µ0 ) < 0. Define µ0 = F −1 (0), our conjecture is that solution ν will be decreasing on [µ0 , µ0 ). ∀η > 0, define µη = F−1 (−η), we modify L(µ, ν) to define L̃(µ, ν) on the whole space: ∗ L̃(µ, ν) = L (max (min (µ, µη ) , µ0 ) , max {min {ν, ν0 } , νm (µ)}) It’s not hard to see that L̃ is uniformly Lipschtiz continuosu w.r.t. ν ∈ R for µ ∈ [0, 1] and continuous in µ ∈ [0, 1]. We can apply Picard-Lindelof to solve for ODE ν̃˙ = L̃(µ, ν̃) on the space with initial condition ν̃(µ0 ) = ν0 . ν̃ will be strictly decreasing on (µ0 , µη ]. ∗ is must crosses from below and this is not possible. Because when ν̃ first touches νm η Then, when µ ∈ [µ0 , µ ], we have L(µ, ν̃) = L̃(µ, ν̃). Therefore ν̃ is a solution to original ODE Equation (B.12). Then we extend ν̃ to [µ0 , µ0 ) by taking η → 0 and define: { ν̃(µ) if µ ∈ [µ0 , µ0 ) ν(µ) = limµ→F −1 (0) ν̃(µ) if µ = F −1 (0) 68 First since ν̃ is decreasing, the sup limit will actually be the limit and ν ∈ C[µ0 , µ0 ]. Then we show that this extension is left differentiable at µ0 . Consider: V (µ) = 1− Fm (µ) ρ J(ν(µ), µ) c By Equation (B.13), we know that on [µ0 , µ0 ) sign of V ′ is determined by sign of 1 − ρc J(ν(µ), µ). At initial value, V0 ≥ 0 =⇒ 1 − ρc J(ν0 , µ0 ) > 0. On the other hand, V (µ) will be bounded above by V . So 1 − ρc J(ν(µ), µ) as a continuous function of µ has to stay above 0. Therefore V ′ (µ) > 0 on [µ)0 , µ0 ). 
By monotonic convergence, there exists limµ→µ0− V (µ). Define it as V (µ0 ). We define: ν̇(µ0 ) = ′ Fm + ρc (H ′ (ν(µ0 )) − H ′ (µ0 )) V (µ0 ) ρ ′′ H (ν(µ0 ))(ν(µ0 ) − µ0 ) c 0 ) . Suppose not, there exists ε > 0, µn → µ0 Now we show that ν̇(µ0 ) = limµ→µ0 ν(µ)−ν(µ µ−µ0 0) s.t. ν̇(µ0 ) − ν(µµnn)−ν(µ > ε. Suppose ν(µn ) > ν(µ0 ) + (ν̇(µ0 ) − ε) (µn − µ0 ): −µ0 Fm (µ) + (ν̇(µ0 ) − ε)(µn − µ0 ), µn ) 1− Fm′ =⇒ lim V (µn ) ≤ ρ n→∞ (−H ′ (ν(µ0 )) + H ′ (µ0 ) + H ′′ (ν(µ0 ))(ν(µ0 ) − µ0 )(ν̇(µ0 ) − ε)) c Fm′ <ρ (−H ′ (ν(µ0 )) + H ′ (µ0 ) + H ′′ (ν(µ0 ))(ν(µ0 ) − µ0 )ν̇(µ0 )) c V (µn ) < ρ J c (ν 0 =V (µ0 ) First strict inequality is from 1 − ρc J(ν, µ) strictly increasing in ν. When Fm (µ) < 0, Fm (µ) will be decreasing in ν. Second inequality is by taking limit of lower bounded 1− ρc J(ν,µ) of V (µn ) with L’Hospital rule. Third strict inequality is from ε > 0, H ′′ < 0. Last equality is from definition of ν̇(µ0 ). We get contradiction. Similarly, we cna rule out ν(µn ) < ν(µ0 ) + (ν̇(µ0 ) + ε)(µn − µ0 ). Therefore, we extended ν to [µ0 , µ0 ] such that it’s differentiable on [µ0 , µ0 ] and smooth on (µ0 , µ0 ). Let µ0 = µ0 , ν0 = ν(µ0 ), ν0′ = ν̇(µ0 ), then ν0 > µ0 and { 1 − ρc J(ν0 , µ0 ) = 0 ρ (H ′ (µ0 ) c − H ′ (ν0 ) + H ′′ (ν0 )(ν0 − µ0 )ν0′ ) = V (µ0 ) ′ Fm ≤ V (µ0 ) ′ Fm Then by Lemma 18, we can solve for ν(µ) on [µ0 , 1] satisfying the conditions in Lemma 17. What’s more, ν̇(µ0 ) = ν0 , then ν is differentiable at µ0 . For any other points in {µ|ν(µ) > µ}, ν is C (1) smooth. Since ν0′ < 0, then the solved ν will be strictly decreasing. Lemma 18. Assume Fm (µ0 ) = 0, Fm′ > 0, ν0 ∈ [µ0 , 1), ν0′ satisfies { 1 − ρc J(ν0 , µ0 ) = 0 0< ρ c (H ′ (µ0 ) − H ′ (ν0 ) + H ′′ (ν0 )(ν0 − µ0 )ν0′ ) ≤ 69 V (µ0 ) ′ Fm Then there is a continuous function ν on [µ0 , 1] satifying initial condition ν(µ0 ) = ν0 , ν̇(µ0 ) = ν0′ . On {µ|ν(µ) > µ}, ν is differentiable, strictly increasing and satisfies ODE: ) ( Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ)) ν̇ = J(ν, µ) (ν − µ)Fm (µ)H ′′ (ν) ∗ Proof. ∀ µ1 ∈ (µ0 , 1), ∀ν1 ∈ [µ1 , νm (µ1 )), we consider the solution of ODE with initial condition (µ0 , ν0 ). ∀η > 0, define µη = F −1 (η). Then like the proof of Lemma 17, we ∗ can solve for a smooth ν on [µη , µ]. ν will be strictly decreasing below νm and strictly ∗ increasing over νm . Consider the slope of ν: H ′ (ν) − H ′ (µ) ν̇ = ′′ = L(µ, ν) H (ν)(ν − µ) ν itself satisfies ODE Equation (B.12), then uniqness of solution to ODE implies ν < ν ∀µ ∈ [µη , µ]. So solution must lies in the blue region in Figure B.9. When ν1 → ν(µ1 ), 1 − ρc J(ν, µ) → 0∀µ. Thus V (µ) → ∞. On the other hand, when µ1 → µ0 , ν1 = µ1 , V (µ) → Fm (µ0 ) = 0. Then when we move around µ1 , ν1 , we will have ν̇(µ0 ) higher or lower than ν0′ . We index V (µ0 ) by initial value (µ1 , ν1 ): V0 (µ1 , ν1 ). We show that V0 (µ1 , ν1 ) is continuous in (µ1 , ν1 ). Suppose not, then there exists limµn1 ,ν1n →µ1 ,ν1 V0 (µn1 , ν1n ) ̸= V0 (µ1 , ν1 ). On the other hand, we index V (µη ) by initial value (µ1 , ν1 ): Vη (µ1 , ν1 ), then continuous dependence of ODE guanrantees that limµn1 ,ν1n →µ1 ,ν1 Vη (µn1 , ν1n ) = Vη (µ1 , ν1 ). Therefore, ∀N , there exists η s.t. limµn ,ν n →µ1 ,ν1 V0 (µn1 , ν1n ) − V0 (µ1 , ν1 ) 1 1 > 3N |µ0 − µη | Then by continuity, we can have η sufficiently small that: limµn ,ν n →µ1 ,ν1 V0 (µn1 , ν1n ) − Vη (µ1 , ν1 ) 1 1 |µ0 − µη | > 2N Then we can have n sufficiently large that: |V0 (µn1 , ν1n ) − Vη (µn1 , ν1n )| >N |µ0 − µη | Then there must exists µ̃N s.t. |V ′ (µ̃N )| > N . 
On the other hand, |V ′ | must be bounded because: Fm (ν) − V ′ (µ)(ν − µ) V (µ) = 1 + ρc J(µ, ν) When V ′ going to positive infinity, V (µ) will go to Fm (µ). When V ′ going to negative infinity, V (µ) will go to positive infinity. Both cases are impossible. Therefore, V0 (µ1 , ν1 ) will be a continuous function on initial value. To sum up, there exists initial value (µ1 , ν1 ) s.t. the solved ν̇(µ0 ) = ν0′ . 70 Lemma 15′ . Assume µ0 ≤ µ∗ , Fm′ ≤ 0, V0 , V0′ satisfies: V (µ0 ) ≥ V0 ≥ Fm (µ0 ) c Fm (µ′ ) − V0 − V0′ (µ′ − µ0 ) V0 = max µ′ ≤µ ρ J(µ0 , µ′ ) Then there exists a C (1) smooth and strictly decreasing V (µ) defined on [0, µ0 ] satisfying V (µ) = max ′ µ ≤µ c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) ρ J(µ, µ′ ) (B.7’) and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ . Lemma 17′ . Assume µ0 ∈ (0, µ∗ ], Fm (µ0 ) ̸= 0, Fm′ ≤ 0, ν0 ∈ (0, µ0 ] satisfies: ( ( ) ρ ) ρ Fm (µ0 ) −Fm′ 1 + J(µ0 , ν0 ) + Fm (ν0 )(H ′ (µ0 ) − H ′ (ν0 )) ≥ 0 c c Then ∃ν ∈ C[0, µ0 ] satisfying initial condition ν(µ0 ) = ν0 . On {µ|ν(µ) > ν}, ν is differentiable, strictly decreasing and satifies ODE: ( ) Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ)) ′ ν = J(ν, µ) (ν − µ)Fm (µ)H ′′ (ν) Lemma 18′ . Assume Fm (µ0 ) = 0, Fm′ < 0, ν0 ∈ (0, µ0 ], ν0′ satisfies { 1 − ρc J(ν0 , µ0 ) = 0 0 > ρc (H ′ (µ0 ) − H ′ (ν0 ) + J ′′ (ν0 )(ν0 − µ0 )ν0′ ) ≥ V (µ0 ) ′ Fm Then ∃ ν ∈ C[0, µ0 ] satifying initial condition ν(µ0 ) = ν0 , ν̇(µ0 ) = ν0′ . On {µ|ν(µ) > µ}, ν is differentiable, strictly decreasing and satisfies ODE: ( ) Fm′ 1 + ρc J(µ, ν) + ρc Fm (ν)(H ′ (ν) − H ′ (µ)) ′ ν = J(ν, µ) (ν − µ)Fm (µ)H ′′ (ν) Lemma 19. Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies: ′ ′ ′ ′ ′ ′ V0 = max c Fm−k (µ ) − V0 − V0 (µ − µ) ≥ max c Fm (µ ) − V0 − V0 (µ − µ) µ′ ≥µ ρ µ′ ≥µ ρ J(µ, µ′ ) J(µ, µ′ ) V (µ ) ≥ V ≥ F (µ ) 0 0 m−k 0 Vm−k is the solution as defined in Lemma 15 with initial condition V0 , V0′ , then ∀µ ∈ [µ0 , ν(µ0 )]: Vm−k (µ) ≥ ′ (µ)(µ′ − µ) c Fm′ (µ′ ) − Vm−k (µ) − Vm−k µ′ ≥µ,m ∈[m−k,m] ρ J(µ, µ′ ) max ′ 71 Proof. We first show that: V0 ≥ c Vm−k (µ′ ) − V0 − V0′ (µ′ − µ) max J(µ, µ′ ) µ′ ∈[µ0 ,µm ] ρ Suppose not, then there exists µ′ s.t. V0 < c Vm−k (µ′ ) − V0 − V0′ (µ′ − µ) ρ J(µ, µ′ ) (B.14) By definition of V0 , we must have Vm−k (µ′ ) > Fm−k (µ′ ). The inequality is trivial because if Fm−k (µ′ ) = Vm−k (µ′ ), then choosing µ′ will be suboptimal. Optimality implies Equation (B.9) and Equation (B.8): ρ ρ ′ Fm−k + V0 H ′ (ν(µ)) = V0′ + V0 H ′ (µ) c ( )c ( ) ( ) ρ ρ ρ ′ ′ F (ν(µ)) + V H(ν(µ)) − V + V H(µ) = V + V H (µ) (ν(µ) − µ) m−k 0 0 0 0 0 c c c We define L(V, λ, µ)(µ′ ) as a linear function of µ′ : L(V, λ, µ)(µ′ ) = (V (µ) + λH(µ)) + (V ′ (µ) + λH ′ (µ))(µ′ − µ) (B.15) Define G(V, λ)(µ) as a function of µ: G(V, λ)(µ) = V (µ) + λH(µ) (B.16) Then G(Fm−k , ρc Vm−k (µ0 ))(µ′ ) is a concave function of µ′ . Consider: ( ) ( ) ρ ρ L Vm−k , Vm−k (µ0 ), µ0 (µ′ ) − G Fm−k , Vm−k (µ0 ) (µ′ ) c c This is a convex function and have unique minimum. Therefore, the minimum will be determined by FOC. Simple calculation shows that it is minimized at ν(µ0 ) and the minimal value is 0. ρ ρ ′ ′ + Vm−k (µ0 )H ′ (µ′ ) FOC : Vm−k (µ0 ) + Vm−k (µ0 )H ′ (µ0 ) = Fm−k c c It’s easy to see that this equation is identical to the FOC for ν(µ0 ). 
Now consider: ( ) ( ) ρ ρ ′ L Vm−k , Vm−k (µ0 ), µ0 (µ ) − G Vm−k , Vm−k (µ0 ) (µ′ ) c c ( ) ( ) ρ ρ ′ = Vm−k (µ0 ) + Vm−k (µ0 )H(µ0 ) + Vm−k (µ0 ) + Vm−k (µ0 )H ′ (µ0 ) (µ′ − µ0 ) c c ( ) ρ ′ ′ − Vm−k (µ ) + Vm−k (µ0 )H(µ ) c ( ) ρ ′ ′ (µ0 )(µ′ − µ0 ) − Vm−k (µ0 )J(µ0 , µ′ ) = − Vm−k (µ ) − Vm−k (µ0 ) − Vm−k c <0 The last inequality is from rewriting Equation (B.14). Therefrore, L(Vm−k , ρc Vm−k (µ0 ), µ0 )(µ′ )− G(Vm−k , ρc Vm−k (µ0 ))(µ′ ) will have minimum strictly negative. Suppose it’s minimized at µ′′ . Then FOC implies: ρ ρ ′ ′ (µ′′ ) + Vm−k (µ0 )H(µ′′ ) (µ0 ) + Vm−k (µ0 )H ′ (µ0 ) = Vm−k Vm−k c c 72 Consider: ( ) ( ) ρ ρ L Vm−k , Vm−k (µ0 ), µ′′ (ν(µ′′ )) − G Fm−k , Vm−k (µ0 ) (ν(µ′′ )) c c ( ) ( ) ρ ρ =L Vm−k , Vm−k (µ0 ), µ0 (ν(µ′′ )) − G Fm−k , Vm−k (µ0 ) (ν(µ′′ )) c c ρ ρ ′′ ′′ ′ + Vm−k (µ ) − Vm−k (µ0 ) + Vm−k (µ0 )(H(µ ) − H(µ0 )) − (Vm−k (µ0 ) + H ′ (µ0 ))(µ′′ − µ0 ) c c ρ ′ ρ ′′ ′ ′′ ≥Vm−k (µ ) − Vm−k (µ0 ) + Vm−k (µ0 )(H(µ ) − H(µ0 )) − (Vm−k (µ0 ) + H (µ0 ))(µ′′ − µ0 ) c ( ) c ( ) ρ ρ ′′ ′′ =G Vm−k , Vm−k (µ0 ) (µ ) − L Vm−k , Vm−k (µ0 ), µ0 (µ ) c c >0 In the first equality we used FOC. In the first inequality we used suboptimality of ν(µ′′ ) at µ0 . However: ) ( ( ) ρ ρ 0 =L Vm−k , Vm−k (µ′′ ), µ′′ (ν(µ′′ )) − G Fm−k , Vm−k (µ′′ ) (ν(µ′′ )) c c ( ) ( ) ρ ρ ′′ ′′ =L Vm−k , Vm−k (µ0 ), µ (ν(µ )) − G Fm−k , Vm−k (µ0 ) (ν(µ′′ )) c c ρ ′′ ′′ ′′ + (Vm−k (µ ) − Vm−k (µ)) (H(µ ) − H(ν(µ )) + H ′ (µ′′ )(ν(µ′′ ) − µ′′ )) c ρ > (Vm−k (µ′′ ) − Vm−k (µ))J(µ′′ , ν(µ′′ )) c >0 Contradiction with optimality of ν(µ′′ ). Now we show Lemma 19. Suppose it’s not true, then there exists µ′ ∈ (µ0 , ν(µ0 )), µ′′ ≥ µm′ s.t.: Vm−k (µ′ ) < ′ (µ′ )(µ′′ − µ′ ) c Fm′ (µ′′ ) − Vm−k (µ′ ) − Vm−k ρ J(µ′ , µ′′ ) Then by definition: ) ( ) ( ρ ρ 0 ≤L Vm−k , Vm−k (µ0 ), µ0 (µ′′ ) − G Fm′ , Vm−k (µ0 ) (µ′′ ) c ( ) ( c ρ ) ρ ′′ =L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ ) − G Fm′ , Vm−k (µ0 ) (µ′′ ) c c ( ) ( ) ρ ρ ′ 0 ≤L Vm−k , Vm−k (µ0 ), µ0 (µ ) − G Vm−k , Vm−k (µ0 ) (µ′ ) c ( ) ( c ρ ) ρ ′ =L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ ) − G Vm−k , Vm−k (µ0 ) (µ′ ) c c ( ) ( ) ρ ρ ′ ′′ =⇒ L Fm−k , Vm−k (µ ), ν(µ0 ) (µ ) − G Fm′ , Vm−k (µ′ ) (µ′′ ) c c ( ) ( ) ρ ρ ′′ ′ =L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ ) − G Fm , Vm−k (µ0 ) (µ′′ ) c c ρ ′ ′′ + (Vm−k (µ ) − Vm−k (µ0 )) J(µ0 , µ ) c >0 ) ( ) ( ρ ρ L Fm−k , Vm−k (µ′ ), ν(µ0 ) (µ′ ) − G Vm−k , Vm−k (µ′ ) (µ′ ) c c 73 ( ) ( ) ρ ρ =L Fm−k , Vm−k (µ0 ), ν(µ0 ) (µ′ ) − G Vm−k , Vm−k (µ0 ) (µ′ ) c c ρ ′ ′ + (Vm−k (µ ) − Vm−k (µ0 )) J(µ0 , µ ) c >0 ) ( Now we consider L Vm−k , ρc Vm−k (µ′ ), µ′ (·): ( ) ( ) ρ ρ ′ ′ ′ ′ L Vm−k , Vm−k (µ ), µ (µ ) = G Vm−k , Vm−k (µ ) (µ′ ) c ) ( c ρ ( ) ρ ′ ′ L Vm−k , Vm−k (µ ), µ (ν(µ0 )) ≥ G Vm−k , Vm−k (µ′ ) (ν(µ0 )) c c ( ) ( ) ρ ρ ′ ′ ′ L Fm−k , Vm−k (µ ), ν(µ0 ) (µ ) > G Vm−k , Vm−k (µ ) (µ′ ) c ( ) ( c ρ ) ρ ′ L Fm−k , Vm−k (µ ), ν(µ0 ) (ν(µ0 )) = G Vm−k , Vm−k (µ′ ) (ν(µ0 )) c c ( ) ( ) ρ ρ L Vm−k , Vm−k (µ′ ), µ′ (ν(µ0 )) ≥ L Fm−k , Vm−k (µ′ ), ν(µ0 ) (ν(µ0 )) c c ( ) ( ) =⇒ ρ ρ ′ ′ ′ L Vm−k , Vm−k (µ ), µ (µ ) < L Fm−k , Vm−k (µ′ ), ν(µ0 ) (µ′ ) c c The two equalities are directly from definition of L and G. First inequality is from subopti) ( mality, second inequality is from previous calculation. Therefore L Vm−k , ρc Vm−k (µ′ ), µ′ (·) ( ) is lower at µ′ and L Fm−k , ρc Vm−k (µ′ ), ν(µ0 ) (·) is lower at ν(µ0 ). Since both of them ( ) are linear fuinctions, then L Vm−k , ρc Vm−k (µ′ ), µ′ (·) must be higher at any µ′′ > ν(µ0 ). 
Therefore, this implies: ) ( ) ( ρ ρ L Vm−k , Vm−k (µ′ ), µ′ (µ′′ ) > G Fm′ , Vm−k (µ′ ) (µ′′ ) c c Contradicting that µ′′ is superior than ν(µ′ ). Lemma 19′ . Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies: ′ ′ ′ ′ ′ ′ V0 = max c Fm−k (µ ) − V0 − V0 (µ − µ) ≤ max c Fm (µ ) − V0 − V0 (µ − µ) µ′ ≤µ ρ µ′ ≥µ ρ J(µ, µ′ ) J(µ, µ′ ) V (µ ) ≥ V ≥ F (µ ) 0 0 m+k 0 Vm+k is the solution as defined in Lemma 15 with initial condition V0 , V0′ , then ∀µ ∈ [ν(µ0 ), µ0 ]: Vm+k (µ) ≥ ′ (µ)(µ′ − µ) c Fm′ (µ′ ) − Vm−k (µ) − Vm−k µ′ ≤µ,m ∈[m,m+k] ρ J(µ, µ′ ) max ′ C. Proofs in Section 4 C.1. Convex Flow Cost C.1.1. Proof of Theorem 3 Proof. In this part, we introduce the algorithm to construct V (µ) and ν(µ). We only discuss the case µ ≥ µ∗ and the case µ ≤ µ∗ will follow by a symmetric method. Algorithm: 74 • Step 1 : By Lemma 21, there exists µ∗ ∈ ∆(X) and V (µ) defined as: Fm (µ′ ) − h(c) J(µ, µ′ ) c µ ,m,c 1 + ρc J(µ, µ′ ) V (µ) = max ′ • Step 2 : We construct the first piece of V (µ) to the right of µ∗ . By Lemma 21, there are three possible cases of µ∗ to discuss (µ∗ = 1 is omitted by symmetry). Case 1 : Suppose µ∗ ∈ (0, 1) and V (µ∗ ) > F (µ∗ ). Then there exists (m, ν(µ∗ ) > µ∗ , c) s.t. V (µ∗ ) = Fm (ν(µ∗ )) − h(c) J(µ∗ , ν(µ∗ )) c 1 + ρc J(µ∗ , ν(µ∗ )) With initial condition (µ0 = µ∗ , V0 = V (µ∗ ), V0′ = 0), we solve for Vm (µ) on [µ∗ , 1] as defined in Lemma 23. Let µ̂m be the first µ ≥ µ∗ that: c Fm−1 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) h(c) − µ ≥µ,c ρ J(µ, µ′ ) ρ Vm (µ) = max ′ We must have Vm (µ̂m ) ≥ Fm−1 (µ) otherwise there will be a µ even smaller. Then we solve for Vm−1 with initial condition µ0 = µ̂m , V0 = Vm (µ̂m ), V0′ = Vm′ (µ̂m ). If m − 1 > m, we continue this procedure by looking for µ̂m−1 until we get Vm (µ). Vm will be piecewise defined by different Vm . By definition, Vm (µ) will be smoothly increasing until it hits F . Since V, (µ∗ ) > F (µ∗ ), this intersection point will be µ∗∗ > µ∗ . Case 2 : Suppose µ∗ ∈ (0, 1) but V (µ∗ ) = F (µ∗ ), consider: Let µ∗∗ c Fk (µ′ ) − F (µ) h(c) Ve (µ) = ′max − µ ≥µ,k,c ρ J(µ, µ′ ) ρ { } e = inf µ|V (µ) > F (µ) . Case 3 : Suppose µ∗ = 0, consider: c Fk (µ′ ) − F1 (µ) − F1′ (µ′ − µ) h(c) − µ ≥µ,k,c ρ J(µ, µ′ ) ρ { } sup F ≤ inf F . Therefore, inf µ Ṽ (µ) > F (µ) > There exists δ s.t. ∀µ < δ, ∀µ′ ≤ µ2 , J(µ,µ 1 ′) 0. We call it µ∗∗ . Ṽ (µ) = ′max • Step 3: For all µ ≥ µ∗∗ such that: c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ) h(c) − µ ≥µ,k,c ρ J(µ′ , µ) ρ F (µ) = ′max Let m be the optimal action. Solve for Vm with initial condition µ0 = µ, V0 = F (µ), V0′ = F ′− (µ). Let µ̂m be the first µ ≥ µ0 that: c Fm−1 (µ′ ) − Vm (µ) − Vm′ (µ)(µ′ − µ) h(c) − µ ≥µ,c ρ J(µ, µ′ ) ρ Vm (µ) = max ′ 75 Then we solve for Vm−1 with initial condition µ0 = µ̂m , V0 = Vm (µ̂m ), V0′ = Vm′ (µ̂m ). We continue this procedure until we get Vm0 , where m0 is the optimal action solved in the definition of Ṽ (µ). Now suppose Vm0 first hits F (µ) at some point µ′′ , define: ′ if µ′ < µ F (µ ) Vµ (µ′ ) = Vm0 (µ′ ) if µ′ ∈ [µ, µ′′ ] F (µ′ ) if µ′ > µ′′ Let Ω be the set of all these µ0 . Then either Vµ = F , or there exists open interval (µ, µ′′ ) on which Vµ > F . • Step 4 : Define: Vm (µ) V (µ) = if µ ∈ [µ∗ , µ∗∗ ] ∗∗ sup {Vµ0 (µ)} if µ ≥ µ µ0 ∈Ω Then ∀V (µ) > F (µ), there must exists µn s.t. V (µ) = limn Vµn (µ) by definition of sup. Since Ω is a closed set, there exists µnk → µ0 . By continuous dependence, Vµ0 (µ) = V (µ). Then in the open interval around µ that V > F , V (µ) = Vµ0 (µ). Other uniqueness of ODE will be violated. 
Therefore V will be locally defined as Vµ0 { } and is a smooth function on µ V (µ) > F (µ) . In the algorithm, we only discussed the case µ∗ < 1 and constructed the value function to the right of µ∗ . On the left of µ∗ , V can be defined by using a totally symmetric argument by referring to Lemma 22′ . Before we proceed to proof of smoothness and unimprovability of V , we state a useful result: Lemma 20. ∀µ ≥ µ∗ s.t. V (µ) = Fm (µ), we have: c Fm+k (µ′ ) − Fm (µ) − Fm− ′ (µ′ − µ) h(c) k − ≜ Um (µ) µ ≥µ,c ρ J(µ, µ′ ) ρ Fm (µ) ≥ max ′ k Suppose Lemma 20 is not true, then exists µ s.t. Um (µ) > Fm (µ) = V (µ). F −′ is LSC and left continuous function. Then Um k will be USC and left continuous w.r.t. µ when m k (µ) > Fm (µ), there exists µ0 > 0 is taken that F (µ = Fm (µ). By definition of µ∗∗ and Um k s.t. Um0 (µ0 ) = Fm0 (µ0 ). Take µ0 < µ to be the supremum of such µ0 . Now consider ) ( initial condition µ0 , Fm0 (µ0 ), Fm′ 0 , by Lemma 22, we solve fore Vk (µ) on [µ0 , µ]. Now ∀µ′ ∈ (µ0 , µ) s.t. Vk (µ′ ) ≤ Fm′ (µ′ ), by immediate value theorem, µ′ can be picked that Vk′ (µ′ ) ≤ Fm′ ′ . Therefore: c Fk (µ′′ ) − Vk (µ′ ) − Vk′ (µ′ )(µ′′ − µ′ ) h(c) − J(µ′ , µ′′ ) ρ µ′′ ,c ρ c Fk (µ′′ ) − Fm′ (µ′ ) − Fm′ ′ (µ′ )(µ′′ − µ′ ) h(c) ≥ sup − J(µ′ , µ′′ ) ρ µ′′ ,c ρ Vk (µ′ ) = sup ≥Fm′ (µ′ ) Contradicting V (µ) = Fm (µ). 76 Lemma 20′ . ∀µ ≤ µ∗ s.t. V (µ) = Fm (µ), we have: c Fm−k (µ′ ) − Fm (µ) − Fm+ ′ (µ′ − µ) h(c) k − ≜ Um (µ) µ ≤µ,c ρ J(µ, µ′ ) ρ Fm (µ) ≥ max ′ Smoothness Given our construction of V (µ), ∀µ s.t. V (µ) > F (µ), V is piecewise solution of the ODEs and is C (1) smooth by construction. However on {µ|V (µ) = F (µ)}, our definition of V is by taking supremum over an uncountable set of Vµ ’s. Therefore V (µ) is not necessarily differentiable. We now discuss smoothness of V on this set in details (we only discuss µ ≥ µ∗ and leave the remaining case to symmetry argument). Suppose µ ∈ {µ|V (µ) = F (µ)}o , then V = F locally on an open interval. To show smoothness of V , it’s sufficient to show smoothness of F . Suppose not, then µ = µm . However, at µm , ∀c > 0: ′ ′ ′ c Fm+1 (µ ) − Fm (µm ) − Fm (µ − µm ) h(c) lim − =∞ µ′ →µ+ ρ J(µm , µ′ ) ρ m Therefore, we apply the result just derived and get contradiction. Now we only need to discuss the boundary of {µ|V (µ) = F (µ)}. The first case is that {µ|V (µ) > F (µ)} is not dense locally. Therefore, V = F locally at only side of µ, which implies one sided smoothness. The only remaining case is that there exists µn → µ s.t. F (µn ) < V (µn ). We first show differentiability of V at µ. We already know that V (µ′ ) − V (µ) ≥ F ′ (µ)(µ′ − µ) (µ) since V ≥ F . Suppose now µn → µ+ and V (µµnn)−V ≥ F ′ (µ) + ε. ∀n, V (µn ) ≥ −µ F (µn ) + ε(µn − µ) > F (µn ). Then V is smooth around µn . µn can be picked that V ′ (µ′n ) ≥ F ′ (µ) + ε by Lemma 12. Consider (ν(µn ), c(µn )) being the solution of posterior and cost level associated with µn , by definition of µn , ν(µn ) ≥ µm+1 (when µn < µm+1 , the objective function will be negative, therefore suboptimal for sure). So we can pick a converging subsequence of µ2n to some ν ≥ µm+1 and c(µn ) → c. 
Then: F (µ) = lim V (µn ) ( ) c(µn ) Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn ) h(c(µn )) = lim − n→∞ ρ J(µn , ν(µn )) ρ ′ c(µn ) Fmn (ν(µn )) − F (µn ) − (F (µ) + ε)(ν(µn ) − µn ) h(c) ≤ lim − n→∞ ρ J(µn , ν(µn )) ρ ′ cε ν(µn ) − µn h(c) c(µn ) Fmn (ν(µn )) − F (µn ) − F (µ)(ν(µn ) − µn ) − lim − ≤ lim n→∞ ρ J(µn , ν(µn )) n→∞ ρ J(µn , ν(µn )) ρ ′ c Fm′ (ν) − F (µ) − F (µ)(ν − µ) cε ν − µ h(c) = − − ρ J(µ, ν) ρ J(µ, ν) ρ <F (µ) (µn ) Contradiction. Now suppose µn → µ− and V (µ)−V ≤ F ′ (µ) − ε. Then similarly we can µ−µn choose µn s.t. V ′ (µn ) ≤ F ′ (µ) − ε. Choose (ν, m, c) being the optimal posterior, action and cost at µ. Then: F (µ) = lim V (µn ) 77 c(µn ) Fm (ν) − V (µn ) − V ′ (µn )(ν − µn ) h(c(µn )) − n→∞ ρ J(µn , ν) ρ ′ c(µn ) Fm (ν) − V (µn ) − F (µ)(ν − µn ) c ε(ν − µn ) h(c(µn )) ≥ lim + − n→∞ ρ J(µn , ν) ρ J(µn , ν) ρ ′ c(µn ) Fm (ν) − V (µn ) − F (µ)(ν − µn ) c(µn ) ε(ν − µn ) h(c(µn )) ≥ lim + lim − n→∞ ρ J(µn , ν) ρ J(µn , ν) ρ n→∞ ′ c Fm (ν) − F (µ) − F (µ)(ν − µ) h(c) cε ν − µ = − + ρ J(µ, ν) ρ ρ J(µ, ν) ≥ lim >F (µ) Contradiction. Therefore we showed that V will be differentiable everywhere. Now suppose V ′ is not continuous at µ. Utilizing previous proof, we have already ruled out the cases when limµ′ →µ+ > F ′ (µ) and limµ′ →µ− < F ′ (µ). Suppose now exists µn → µ+ and V ′ (µn ) ≤ F ′ (µ) − ε. Then consider: F (µ) = lim V (µn ) c Fm (ν) − V (µn ) − V ′ (µn )(ν − µn ) h(c) − ≥ lim n→∞ ρ J(µn , ν) ρ ′ c Fm (ν) − V (µn ) − F (µ)(ν − µn ) c ε(ν − µn ) h(c) ≥ lim + lim − n→∞ ρ J(µn , ν) ρ n→∞ ρ J(µn , ν) ′ c Fm (ν) − F (µ) − F (µ)(ν − µ) h(c) cε ν − µ − + = ρ J(µ, ν) ρ ρ J(µ, ν) >F (µ) Contradiction. When µn → µ− and V ′ (µn ) ≥ F ′ (µ) + ε, similarly as before, we can take ν(µn ) converging to ν ≥ µm+1 and c(µn ) converging to c. Then: F (µ) = lim V (µn ) c(µn ) Fmn (ν(µn )) − V (µn ) − V ′ (µn )(ν(µn ) − µn ) h(c(µn )) = lim − n→∞ ρ J(µn , ν(µn )) ρ ′ c(µn ) Fmn (ν(µn )) − F (µn ) − F (µ)(ν(µn ) − µn ) cε ν(µn ) − µn h(c(µn )) ≤ lim − lim − n→∞ n→∞ ρ J(µn , ν(µn )) ρ J(µn , ν(µn )) ρ ′ c Fm′ (ν) − F (µ) − F (µ)(ν − µ) h(c) cε ν − µ = − − ρ J(µ, ν) ρ ρ J(µ, ν) <F (µ) Contradiction. To sum up, we proved that V (µ) is differentiable on (0, 1) and V ′ (µ) is continuous on (0, 1). What’s more, since µ∗ ∗ is bounded away from {0, 1}, in the neighbour of {0, 1}, V = F . Therefore V (µ) is C (1) smooth on [0, 1]. Unimprovability Now we prove the unimprovability of V (µ). • Step 1 : We first show that V (µ) solves the following problem: { } c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c) V (µ) = max F (µ), max − µ′ ,m,c ρ J(µ, µ′ ) ρ 78 (P-C1) { µ′ ≥ µ when µ ≥ µ∗ µ′ ≤ µ when µ ≤ µ∗ Case 1 : V (µ) > F (µ). Then there exists µ0 s.t. V (µ) = Vµ0 (µ). Suppose the associated action is m0 at µ0 , m at µ. Then the construction of Vµ0 guarantees than ∀µ′ ∈ [µ, µ0 ], ∃µ̂m′ ∈ [µ0 , µ]. Then by Lemma 24, Equation (P-C1) is satisfied at µ. Case 2 : V (µ) = F (µ). Then there are two possibilities. If µ ∈ Ω, then by construction of Vµ , we have: c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ) h(c) F (µ) = ′max − µ ≥µ,k,c ρ J(µ, µ′ ) ρ This is exactly Equation (P-C1). The only remaining case is that µ ∈ ̸ Ω and Equation (P-C1) is violated: c Fk (µ′ ) − F (µ) − F ′− (µ)(µ′ − µ) h(c) F (µ) < ′max − µ ≥µ,k,c ρ J(µ, µ′ ) ρ This possibility is ruled out by Lemma 20. 
• Step 2 : Then we show that V (µ) solves the following problem: { } c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c) − V (µ) = max F (µ), max µ′ ,c ρ J(µ, µ′ ) ρ { µ′ ≥ µ when µ ≥ µ∗ (P-D1) µ′ ≤ µ when µ ≤ µ∗ Suppose not, then there exists: c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c) − µ ≥µ,c ρ J(µ, µ′ ) ρ ′′ ′′ ′ ′′ ′′ c V (µ ) − V (µ) − V (µ)(µ − µ) h(c ) ≤ V (µ) < − ρ J(µ, µ′ ) ρ Ṽ = max ′ Suppose the optimizer is µ′ , m, c. Optimality implies Equation (C.3): V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) = h′ (c′ ) J(µ, µ′ ) Together with Equation (C.1), we have c′ h′ (c′ ) = ρṼ + h(c′ ). Then combine with Equation (C.2), we get: { Fm′ + h′ (c)H ′ (µ′ ) = V ′ (µ) + h′ (c′ )H ′ (µ) (Fm (µ′ ) + h′ (c′ )H(µ′ )) − (V (µ) + h′ (c′ )H(µ)) = (V ′ (µ) + h′ (c′ )H ′ (µ))(µ′ − µ) We define L and G as in Theorem 2. Then L will be linear and G(Fm , h′ (c′ ))(µ′ ) will be a concave function of µ′ . Consider: L(V, h′ (c′ ), µ)(µ′ ) − G(Fm , h′ (c′ )) 79 FOC implies that it will be convex and attains minimum 0 at µ′ . For any m′ other than m, L(V, h′ (c′ ))(µ′ ) − G(Fm′ , h′ (c′ ))(µ′ ) will be convex and weakly larger than zero. However: L(V, h′ (c′ ), µ)(µ′′ ) − G(V, h′ (c′ ))(µ′′ ) = − (V (µ′′ ) − V (µ) − V ′ (µ)(µ′′ − µ) − h′ (c′ )J(µ, µ′′ )) <0 The inequality is from definition of c′ : c′ h′ (c′ ) − h(c′ ) < c′′ h′ (c′′ ) − h(c′′ ) =⇒ h′ (c′ ) < h′ (c′′ ) V (µ′′ ) − V (µ) − V ′ (µ)(µ′′ − µ) =⇒ h′ (c′ ) < J(µ, µ′′ ) Therefore, L(V, h′ (c′ ), µ)(·)−G(V, h′ (c′ ))(·) will have a strictly negative minimum. Suppose it’s minimized at µ̃, Then FOC implies: V ′ (µ) + h′ (c′ )H ′ (µ) = V ′ (µ̃) + h′ (c′ )H ′ (µ̃) Consider: L (V, h′ (c′ ), µ̃) (ν(µ̃)) − G (Fm , h′ (c′ )) (ν̃) =L (V, h′ (c′ ), µ) (ν(µ̃)) − G (Fm , h′ (c′ )) (ν(µ̃)) + V (µ̃) − V (µ) + h′ (c′ )(H(µ̃) − H(µ)) − (V ′ (µ) + h′ (c′ )H(µ)) (µ̃ − µ) ≥V (µ̃) − V (µ) + h′ (c′ )(H(µ̃) − H(µ)) − (V ′ (µ) + h′ (c′ )H ′ (µ)) (µ̃ − µ) =G (V, h′ (c′ )) (µ̃) − L (V, h′ (c′ ), µ) (µ̃) >0 Let m′ , ν(µ̃), c̃ be maximizer at µ̃, c̃h′ (c̃) = ρV (µ̃) + h(c̃): 0 =L(V, f ′ (c̃), µ̃)(ν(µ̃)) − G(Fm′ , h′ (c̃))(ν(µ̃)) =L(V, h′ (c′ ), µ̃)(ν(µ̃)) − G(Fm′ , h′ (c′ ))(ν(µ̃)) + (f ′ (c̃) − h′ (c′ ))J(µ̃, ν(µ̃)) >(f ′ (c̃) − h′ (c′ ))J(µ̃, ν(µ̃)) Since µ̃ > µ, we have h′ (c̃) − h′ (c′ ) > 0. Contradiction. Therefore we proved Equation (P-D1). • Step 3 : We show that V satisfies Equation (8). First, since V is smooth, envelope theorem implies: V ′ (µ) = − c ν−µ (V ′′ (µ) + h′ (c)H ′′ (µ)) ρ J(µ, ν) 80 >0 =⇒ V ′′ (µ) + h′ (c)H ′′ (µ) < 0 Therefore, allocating to diffusion experiment will always be suboptimal. What’s more, consider: c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) µ ≤µ,c ρ J(µ, µ′ ) c µ′ − µ =⇒ V ′− (µ) = − (V ′′ (µ) + h′ (c)H ′′ (µ)) ′ ρ J(µ, µ ) V − (µ) = max ′ V − (µ∗ ) = V (µ∗ ) and whenever V (µ) = V − (µ), we will have V −′ (µ) < 0. Therefore, V − (µ) can never cross from below, that is to say: } { 1 ′′ ′ ′ ′ 2 ρV (µ) = max ρF (µ), max p(V (µ ) − V (µ) − V (µ)(µ − µ)) + V (µ)σ − h(c) µ′ ,p,σ,c 2 1 s.t. pJ(µ, µ′ ) + H ′′ (µ)σ 2 = c 2 Lemma 21. Define V + and V − : cFm (µ′ ) − h(c)J(µ, µ′ ) µ ≥µ,m,c c + ρJ(µ, µ′ ) cFm (µ′ ) − h(c)J(µ, µ′ ) V − (µ) = ′max µ ≤µ,m,c c + ρJ(µ, µ′ ) V + (µ) = ′max There exists µ∗ ∈ [0, 1] s.t. V + (µ) ≥ V − (µ) ∀µ ≥ µ∗ ; V + (µ) ≤ V − (µ) ∀µ ≤ µ∗ . − + as following: and Um Proof. We define function Um cFm (µ′ ) − h(c)J(µ, µ′ ) = max µ′ ≥µ,c c + ρJ(µ, µ′ ) cFm (µ′ ) − h(c)J(µ, µ′ ) − U m (µ) = max µ′ ≤µ,c c + ρJ(µ, µ′ ) U+ m (µ) Since h(c), Fm (µ) and J(µ, µ′ ) are all smooth functions, the objective function will be smooth. 
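Before turning to the first-order conditions, the following minimal numerical sketch illustrates the joint choice of target posterior $\mu'$ and attention level $c$ in the problem defining $U_m^+$, and checks the condition FOC-$c$ derived next at the grid optimum. The binary entropy $H$, the payoff $F_m(\mu)=2\mu-1$, the convex flow cost $h(c)=c^2$, the value $\rho=0.5$, and the bounded grid for $c$ are all illustrative assumptions.

import numpy as np

# Minimal numerical sketch of the maximization defining U_m^+ in the proof of
# Lemma 21.  Illustrative assumptions: binary entropy H, payoff F_m(mu)=2mu-1,
# convex flow cost h(c)=c**2, rho=0.5, and a bounded grid for c.
rho = 0.5
H  = lambda m: -(m*np.log(m) + (1-m)*np.log(1-m))
Hp = lambda m: np.log((1-m)/m)
J  = lambda a, b: H(a) - H(b) - Hp(a)*(a - b)
Fm = lambda m: 2*m - 1
h, hp = (lambda c: c**2), (lambda c: 2*c)

mu = 0.55
NU, C = np.meshgrid(np.linspace(mu + 1e-3, 0.999, 400), np.linspace(1e-3, 2.0, 400))
val = (C*Fm(NU) - h(C)*J(mu, NU)) / (C + rho*J(mu, NU))       # the objective of U_m^+
i, k = np.unravel_index(np.argmax(val), val.shape)
nu_s, c_s = NU[i, k], C[i, k]
print(f"U_m^+({mu}) ~ {val[i, k]:.4f}  at  mu' = {nu_s:.3f},  c = {c_s:.3f}")
# FOC-c from the text: rho*F_m(mu') + h(c) - h'(c)*(c + rho*J(mu, mu')) = 0 at an
# interior optimum; the residual below should be close to zero up to grid error.
print("FOC-c residual at grid optimum:", rho*Fm(nu_s) + h(c_s) - hp(c_s)*(c_s + rho*J(mu, nu_s)))

Returning to the analytic characterization of the maximizer: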
First consider FOCs and SOCs: ) ( ) ( h(c) ρ ρ ′ ′ ′ ′ FOC-µ :Fm 1 + J(µ, µ ) − + Fm (µ ) (H ′ (µ) − H ′ (µ′ )) = 0 c c c FOC-c : ρFm (µ′ ) + h(c) − h′ (c) (c + ρJ(µ, µ′ )) = 0 [ ] c(ρFm (µ′ ) + h(c))(c + ρJ(µ, µ′ ))H ′′ (µ′ ) 0 SOC : H = 0 −J(µ, µ′ )(c + ρJ(µ, µ′ ))2 h′′ (c) Noticing that SOC is evaluated at the pairs (µ′ , c) at which FOC holds. Remark. Details of calculation of second derivatives: 81 • Hµ′ ,µ′ : ∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ ) ∂µ′2 c + ρJ(µ, µ′ ) [ 1 2 ′ ′ ′ ′ ′ 2 = 3 2ρ (cFm (µ ) − h(c)J(µ, µ )) (H (µ) − H (µ )) ′ (c + ρJ(µ, µ )) − 2ρ (c + ρJ(µ, µ′ )) (H ′ (µ) − H ′ (µ′ )) (cFm′ − h(c)(H ′ (µ) − H ′ (µ′ ))) + ρ(c + ρJ(µ, µ′ ))(cFm (µ′ ) − h(c)J(µ, µ′ ))H ′′ (µ′ ) ] ′ 2 ′′ ′ +(c + ρJ(µ, µ )) h(c)H (µ ) (h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ )) c + ρJ(µ, µ′ ) ∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ ) =⇒ ∂µ′2 c + ρJ(µ, µ′ ) [ 1 = 2ρ2 (cFm (µ′ ) − h(c)J(µ, µ′ )) (H ′ (µ) − H ′ (µ′ ))2 (c + ρJ(µ, µ′ ))3 FOC-µ′ =⇒ Fm′ = + 2ρ(c + ρJ(µ, µ′ ))h(c)(H ′ (µ) − H ′ (µ′ ))2 − 2ρ(h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ ))2 + ρ(c + ρJ(µ, µ′ ))(cFm (µ′ ) − h(c)J(µ, µ′ ))H ′′ (µ′ ) ] ′ 2 ′′ ′ +(c + ρJ(µ, µ )) h(c)H (µ ) [ 1 = ρ(c + ρJ(µ, µ′ ))(cFm (µ′ ) − h(c)J(µ, µ′ ))H ′′ (µ′ ) (c + ρJ(µ, µ′ ))3 ] ′ 2 ′′ ′ +(c + ρJ(µ, µ )) h(c)H (µ ) =(c + ρJ(µ, µ′ ))H ′′ (µ′ ) (ρcFm (µ′ ) − ρh(c)J(µ, µ′ ) + ch(c) + ρh(c)J(µ, µ′ )) c(ρFm (µ′ ) + h(c))(c + ρJ(µ, µ′ ))H ′′ (µ′ ) = (c + ρJ(µ, µ′ ))3 • Hc,c : ∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ ) ∂c2 c + ρJ(µ, µ′ ) [ 1 = 2 (cFm (µ′ ) − h(c)J(µ, µ′ )) (c + ρJ(µ, µ′ ))3 − 2(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ )) ] ′ ′ 2 ′′ − J(µ, µ )(c + ρJ(µ, µ )) h (c) FOC-c =⇒ cFm (µ′ ) − h(c)J(µ, µ′ ) = (c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ )) ∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ ) ∂c2 c + ρJ(µ, µ′ ) [ 1 = 2(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ )) ′ 3 (c + ρJ(µ, µ )) =⇒ 82 − 2(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ )) ] ′ ′ 2 ′′ − J(µ, µ )(c + ρJ(µ, µ )) h (c) = −J(µ, µ′ )(c + ρJ(µ, µ′ ))2 h′′ (c) (c + ρJ(µ, µ′ ))3 • Hµ′ ,c : ∂ 2 cFm (µ′ ) − h(c)J(µ, µ′ ) ∂c∂µ′ c + ρJ(µ, µ′ ) [ 1 2ρ(cFm (µ′ ) − h(c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ )) = ′ 3 (c + ρJ(µ, µ )) − ρ(c + ρJ(µ, µ′ ))(Fm (µ′ ) − h′ (c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ )) − (c + ρJ(µ, µ′ ))(cFm′ − h(c)(H ′ (µ − H ′ (µ′ )))) ′ 2 (Fm′ ′ ′ ′ ] ′ − h (c)(H (µ) − H (µ ))) + (c + ρJ(µ, µ )) [ 1 = 2ρ(cFm (µ′ ) − h(c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ )) ′ 3 (c + ρJ(µ, µ )) − ρ(cFm (µ′ ) − h(c)J(µ, µ′ ))(H ′ (µ) − H ′ (µ′ )) (h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ )) − (c + ρJ(µ, µ′ ))(c h(c)(H ′ (µ − H ′ (µ′ )))) c + ρJ(µ, µ′ ) )] ( (h(c) + ρFm (µ′ ))(H ′ (µ) − H ′ (µ′ )) ′ ′ ′ ′ ′ 2 − h (c)(H (µ) − H (µ )) + (c + ρJ(µ, µ )) c + ρJ(µ, µ′ ) H ′ (µ) − H ′ (µ′ ) = (ρcFm (µ′ ) − ρh(c)J(µ, µ′ ) − c(h(c) + ρFm (µ′ )) (c + ρJ(µ, µ′ ))3 ) +(c + ρJ(µ, µ′ ))h(c) + (c + ρJ(µ, µ′ )(h(c) + ρFm (µ′ ))) − (c + ρJ(µ, µ′ ))2 h′ (c) =0 The only term we don’t know its sign is ρFm (µ′ ) + h(c) = c + ρJ(µ, µ′ ) ′ F H ′ (µ) − H ′ (µ′ ) m Therefore, H will be ND if µ′ > µ and Fm′ > 0, or µ′ < µ and Fm′ < 0. In these cases, FOC uniquely characterizes the maximum. Suppose µ′ > µ and Fm′ < 0 or µ′ < µ and Fm′ > 0, the H will never be ND, and choice of µ′ will be on boundary. What’s more, simple calculation shows that choosing µ′ = µ will dominate choosing µ′ = 0, 1. 
Therefore: ′ U+ m (µ) = Fm (µ) when Fm < 0 ′ U− m (µ) = Fm (µ) when Fm > 0 When Fm′ > 0, envelope condition implies: ( ) −H ′′ (µ)(µ′ − µ) h(c) + ρc Fm (µ′ ) d + U (µ) = >0 )2 ( dµ m 1 + ρ J(µ, µ′ ) c 83 Similarly, when Fm′ < 0, envelope condition implies: ( ) −H ′′ (µ)(µ′ − µ) h(c) + ρc Fm (µ′ ) d − U (µ) = <0 ( )2 dµ m 1 + ρ J(µ, µ′ ) c − Therefore, U + m and U m have exactly the same properties as in Lemma 13, the rest of ∗ proofs simply follow Lemma 13. What’s more, we define νm and c∗m as the maximizer in this problem. Lemma 22. Assume µ0 ≥ µ∗ , Fm′ ≥ 0, V0 , V0′ satisfies: V (µ0 ) ≥ V0 > Fm (µ0 ) c Fm (µ′ ) − V0 − V0′ (µ′ − µ) h(c) − V0 = max µ′ ≥µ,c ρ J(µ, µ′ ) ρ Then there exists a C (1) smooth and strictly increasing V (µ) defined on [µ0 , 1] satisfying: c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c) V (µ) = max − µ′ ≥µ,c ρ J(µ, µ′ ) ρ (B.7-c) and initial condition V (µ0 ) = V0 , V ′ (µ0 ) = V0′ . Proof. We start from deriving FOC and SOC for Equation (B.7-c): ( ′ ) Fm − V ′ (µ) Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) ′ ′ ′ c ′ FOC-µ : + (H (µ ) − H (µ)) = 0 ρ J(µ, µ′ ) J(µ, µ′ )2 ) ( 1 Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) − h′ (c) = 0 FOC-c: ′ ρ J(µ, µ ) [ −2(H ′ (µ)−H ′ (µ′ ))(FOC-µ′ ) ] c (Fm (µ′ )−V (µ)−V ′ (µ)(µ′ −µ))H ′′ (µ′ ) 1 ′ + FOC-µ ′ ′ 2 J(µ,µ ) ρ J(µ,µ ) c SOC: H= ′′ 1 ′ FOC-µ − h ρ(c) c Noticing that Hc,c < 0, therefore c satisfying FOC will be unique given µ, µ′ . On the other hand, FOC-µ′ is independent of c. Hµ′ ,µ′ < 0 when FOC-µ′ ≥ 0. Therefore, solution of F)C-µ′ will be unique. When FOCs are satisfied, H is strictly ND, then the solution of FOCs are going to be maximizer. Therefore, FOC-µ′ and FOC-c uniquely characterize optimal choice of µ′ , c. Now we impose feasibility: V (µ) = c Fm (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) h(c) − ρ J(µ, µ′ ) ρ (C.1) FOCs reduces to: ρV (µ) + h(c) ′ ′ (H (µ ) − H ′ (µ)) = 0 c FOC-c: ch′ (c) = ρV (µ) + h(c) FOC-µ′ :(Fm′ − V ′ (µ)) + Differentiate FOC-c, we get: ch′ (c) − h(c) V (µ) = ρ ′′ ch (c) V ′ (µ) = ċ ρ 84 (C.2) (C.3) (C.4) Plug Equation (C.4) into Equation (C.2) and Equation (C.1): ρ ċ = (Fm′ + h′ (c)(H ′ (µ′ ) − H ′ (µ))) ′′ ch (c) ( ) h(c) + ρF (µ) 1 m ′ c− J(µ , µ) = ρ h′ (c) (C.5) We obtained an equation system with one ODE of (c, ċ) and one regular equation for µ′ . Since J(µ′ , µ) is strictly monotonic for µ′ ≥ µ, we can also define an implicit inverse function M to eliminate µ′ in the equation. J(M (y, µ), µ) = y Therefore we get an ODE: ( ( ( ( ( )) ) )) ρ 1 h(c) + ρFm (µ) ′ ′ ′ ′ ċ = ′′ Fm + h (c) H M c− , µ − H (µ) ch (c) ρ h′ (c) (C.6) We define cm (µ0 )f ′ (cm (µ0 )) − f (cm (µ0 )) = ρFm (µ) when this equation has solution and cm (µ) = 0 when ρFm (µ) is so small that this equation has no solution. Since Fm (µ) is increasing in µ, cm (µ) is increasing and strictly increasing when cm (µ) > 0. We consider the initial conditions: c0 h′ (c0 ) − h(c0 ) ≤ V (µ0 ) Fm (µ0 ) < V0 = ρ =⇒ cm (µ0 ) < c0 ≤ c∗m (µ0 ) Then Lemma 23 guaranteed the existence of an increasing function c(µ) on [µ0 , 1]. Lemma 23. Define M as J(M (y, µ), µ) = y. Assume µ0 ∈ [µ∗ , 1), c0 satisfies: cm (µ0 ) < c0 ≤ c∗m (µ0 ) Then there exists a C (1) and strictly increasing c on [µ0 , 1] satisfying initial condition c(µ0 ) = c0 . On {µ|c(µ) > cm (µ)}, c solves: ( ( ( ( ( )) ) )) ρ 1 h(c) + ρFm (µ) ′ ′ ′ ′ ċ = ′′ Fm + h (c) H M c− , µ − H (µ) (C.6) ch (c) ρ h′ (c) Proof. We first characterize some useful properties of the ODE. We denote the ODE by ċ = R(µ, c). 
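Before turning to the properties of the ODE, note that the first-order condition (C.2), $c h'(c) = \rho V(\mu) + h(c)$, together with Equation (C.4), $V(\mu) = (c h'(c) - h(c))/\rho$, gives a strictly monotone one-to-one link between the continuation value and the attention level; this inversion is used repeatedly below (for instance when defining $c_0$ and $c'$ from value levels). The following is a minimal numerical sketch of the inversion under an illustrative quadratic cost, an assumption of the sketch rather than the paper's specification.

```python
import numpy as np
from scipy.optimize import brentq

rho = 0.1          # discount rate (illustrative)

def h(c):
    return c ** 2  # illustrative convex cost with h(0) = 0

def hp(c):
    return 2 * c   # h'(c)

def value_of_attention(c):
    """Equation (C.4): value level supported by attention c, V = (c h'(c) - h(c)) / rho."""
    return (c * hp(c) - h(c)) / rho

def attention_of_value(V, c_max=1e3):
    """Invert FOC-c, c h'(c) = rho V + h(c), for c given the value level V > 0."""
    return brentq(lambda c: c * hp(c) - h(c) - rho * V, 1e-12, c_max)

V = 0.35
c = attention_of_value(V)
print(c, np.sqrt(rho * V))      # for h(c) = c^2 the closed form is c = sqrt(rho * V)
print(value_of_attention(c))    # recovers V
```

Since $\frac{d}{dc}\left(c h'(c) - h(c)\right) = c h''(c) > 0$, the inversion is well defined whenever $\rho V$ lies in the range of $c h'(c) - h(c)$; this is the same monotonicity that underlies the comparisons of $h'(c_0)$, $h'(c')$ and $h'(c'')$ in the surrounding arguments.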
• Domain: By definition of $c_m(\mu)$, for all $\mu \in (0,1)$,
$$c_m(\mu) - \frac{h(c_m(\mu)) + \rho F_m(\mu)}{h'(c_m(\mu))} = 0$$
Since $c_m \geq 0$, we have $h(c_m(\mu)) + \rho F_m(\mu) \geq 0$. Therefore, at $c = c_m(\mu)$:
$$\frac{\partial}{\partial c}\left(c - \frac{h(c) + \rho F_m(\mu)}{h'(c)}\right) = \frac{h(c) + \rho F_m(\mu)}{h'(c)^2}\, h''(c) > 0$$
Therefore, for all $c \geq c_m(\mu)$, $c - \frac{h(c) + \rho F_m(\mu)}{h'(c)} \geq 0$, with strict inequality when $c > c_m(\mu)$. On the other hand, when $c < c_m(\mu)$: if $F_m(\mu) \geq 0$, then $c - \frac{h(c) + \rho F_m(\mu)}{h'(c)} < 0$; if $F_m(\mu) \leq 0$, then $c_m(\mu) = 0$. Since $M$ only applies to non-negative reals, the ODE is well defined only on the region $\{c \mid c \geq c_m(\mu)\}$.

[Figure C.10: Phase diagram of $(\dot\mu, \dot c)$.]

• Continuity: It is straightforward that the ODE is well behaved (satisfying Picard–Lindelöf) when $\mu$ is strictly bounded away from $\{0,1\}$ and $c$ is uniformly bounded away from $c_m(\mu)$.

• Monotonicity: When $c = c^*_m(\mu)$, $\dot c = 0$. This can be shown by considering the FOCs defining $c^*_m$:
$$\begin{cases} F_m' - h'(c)\left(H'(\mu) - H'(\mu')\right) = 0 \\ \left(c + \rho J(\mu, \mu')\right) h'(c) = h(c) + \rho F_m(\mu') \end{cases}$$
$$\implies \left(c - \rho J(\mu', \mu)\right) h'(c) = h(c) + \rho F_m(\mu) + \rho F_m'(\mu' - \mu) + \rho h'(c)\left(H'(\mu') - H'(\mu)\right)(\mu' - \mu)$$
$$\implies \left(c - \rho J(\mu', \mu)\right) h'(c) = h(c) + \rho F_m(\mu)$$
$$\implies F_m' + h'(c)\left(H'\!\left(M\!\left(\frac{1}{\rho}\left(c - \frac{h(c) + \rho F_m(\mu)}{h'(c)}\right), \mu\right)\right) - H'(\mu)\right) = 0$$
$$\implies \dot c = R(\mu, c) = 0$$
Then we consider the monotonicity of $R(\mu, c)$:
$$\frac{\partial}{\partial c} R(\mu, c) = h''(c)\left(H'(M) - H'(\mu)\right) + h'(c)\, H''(M)\, \frac{1}{H''(M)(\mu - M)}\, \frac{1}{\rho}\, \frac{h(c) + \rho F_m(\mu)}{h'(c)^2}\, h''(c) < 0$$
Therefore, $R(\mu, c)$ is positive on $\{c_m(\mu) < c < c^*_m(\mu)\}$ and zero at $c = c^*_m(\mu)$. This is the blue region in Figure C.10.

For every $\delta > 0$, we consider solving the ODE $\dot c = R(\mu, c)$ on the region $\mu \in [\delta, 1-\delta]$, $c \in [c_m(\mu) + \delta, c^*_m(\mu)]$. The initial condition $(\mu_0, c_0)$ lies in the blue region of Figure C.10. Picard–Lindelöf guarantees a unique solution of the ODE in this region, and it is straightforward that the solution $c(\mu)$ is increasing; a solution corresponds to the blue curve with arrows in Figure C.10. The solution $c(\mu)$ lies between $c_m(\mu)$ and $c^*_m(\mu)$ until it hits the boundary of the region. Now take $\delta \to 0$ and extend $c(\mu)$ towards the boundary. Since at the end point of $c(\mu)$ both $\mu$ and $c$ are monotonically increasing, there is a limit $(\bar\mu, \bar c)$ with $c_m(\bar\mu) = \bar c$. Since $R(\mu, c)$ has the limit $\frac{\rho F_m'}{\bar c\, h''(\bar c)}$ there, we in fact have $\lim_{\mu \to \bar\mu} V'(\mu) = F_m'$ by Equation (C.4). So the resulting $V(\mu)$, calculated from
$$V(\mu) = \frac{c(\mu)\, h'(c(\mu)) - h(c(\mu))}{\rho},$$
will be smooth on $[\mu_0, 1]$.

Lemma 22′. Assume $\mu_0 \leq \mu^*$, $F_m' \geq 0$, and $V_0, V_0'$ satisfy:
$$V(\mu_0) \geq V_0 > F_m(\mu_0)$$
$$V_0 = \max_{\mu' \leq \mu_0,\, c} \frac{c}{\rho} \frac{F_m(\mu') - V_0 - V_0'(\mu' - \mu_0)}{J(\mu_0, \mu')} - \frac{h(c)}{\rho}$$
Then there exists a $C^{(1)}$ smooth and strictly decreasing $V(\mu)$ defined on $[0, \mu_0]$ satisfying
$$V(\mu) = \max_{\mu' \leq \mu,\, c} \frac{c}{\rho} \frac{F_m(\mu') - V(\mu) - V'(\mu)(\mu' - \mu)}{J(\mu, \mu')} - \frac{h(c)}{\rho} \qquad \text{(B.7-c′)}$$
and initial condition $V(\mu_0) = V_0$, $V'(\mu_0) = V_0'$.

Lemma 23′. Define $M$ as $J(M(y, \mu), \mu) = y$. Assume $\mu_0 \in (0, \mu^*]$ and $c_0$ satisfies:
$$c_m(\mu_0) < c_0 \leq c^*_m(\mu_0)$$
Then there exists a $C^{(1)}$ and strictly decreasing $c$ on $[0, \mu_0]$ satisfying initial condition $c(\mu_0) = c_0$. On $\{\mu \mid c(\mu) > c_m(\mu)\}$, $c$ solves:
$$\dot c = \frac{\rho}{c\, h''(c)} \left( F_m' + h'(c) \left( H'\!\left( M\!\left( \frac{1}{\rho}\left(c - \frac{h(c) + \rho F_m(\mu)}{h'(c)}\right), \mu \right) \right) - H'(\mu) \right) \right) \qquad \text{(C.6′)}$$

Lemma 24.
Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies: ′ ′ ′ ′ ′ ′ V0 = max c Fm−k (µ ) − V0 − V0 (µ − µ) − h(c) ≥ max c Fm (µ ) − V0 − V0 (µ − µ) − h(c) µ′ ≥µ,c ρ µ′ ≥µ,c ρ J(µ, µ′ ) ρ J(µ, µ′ ) ρ V (µ ) ≥ V ≥ F (µ ) 0 0 m−k 0 Vm−k is the solution as defined in Lemma 23 with initial condition µ0 , V0 , V0′ , then ∀µ ∈ [µ0 , ν(µ0 )]: Vm−k (µ) ≥ ′ (µ)(µ′ − µ) h(c) c Fm′ − Vm−k (µ) − Vm−k − µ′ ≥µ,µ ∈[m−k,m],c ρ J(µ, µ′ ) ρ max ′ Proof. We first show that: V0 ≥ c Vm−k (µ′ ) − V0 − V0′ (µ′ − µ) h(c) − µ′ ∈[µ0 ,µm ],c ρ J(µ, µ′ ) ρ max 87 Suppose not, then there exists µ′ , c′ s.t. c′ Vm−k (µ′ ) − V0 − V0′ (µ′ − µ) h(c′ ) V < − 0 ρ J(µ, µ′ ) ρ ′ ′ ′ V (µ ) − V0 − V0 (µ − µ) m−k = h′ (c′ ) J(µ, µ′ ) (C.7) Let c0 h′ (c0 ) = ρV0 + h(c0 ), then optimality implies Equation (C.1) and Equation (C.2): { ′ Fm−k + h′ (c0 )H ′ (ν(µ)) = V0′ + h′ (c0 )H ′ (µ) (Fm−k (ν(µ)) + h′ (c0 )H(ν(µ))) − (V0 + h′ (c0 )H(µ)) = (Vo′ + h′ (c0 )H ′ (µ)) (ν(µ) − µ) We define L(V, λ, µ)(µ′ ) and G(V, λ)(µ) as Equation (B.15), Equation (B.16). Consider: L (Vm−k , h′ (c0 ), µ0 ) (µ′ ) − G (Fm−k , h′ (c0 )) (µ′ ) L is a linear function and G is a concave function. Therefore this is a convex function and have unique minimum determined by FOC. Simple calculation shows that it is minimized at ν(µ0 ) and the minimal value is 0. Now consider L (Vm−k , h′ (c0 ), µ0 ) (µ′ ) − G (Vm−k , h′ (c0 )) (µ′ ) ′ = − (Vm−k (µ′ ) − Vm−k (µ0 ) − Vm−k (µ0 )(µ′ − µ0 ) − h′ (c0 )J(µ0 , µ′ )) <0 The inequality is from Equation (C.7) and definition of c0 : c0 h′ (c0 ) − h(c0 ) < c′ h′ (c′ ) − h(c′ ) =⇒ h′ (c0 ) < h′ (c′ ) Vm−k (µ′ ) − V0 − V0′ (µ′ − µ0 ) =⇒ h′ (c0 ) < J(µ0 , µ′ ) Therefore L (Vm−k , h′ (c0 ), µ0 ) (µ′ )−G (Vm−k , h′ (c0 )) (µ′ ) will be strictly negative at µ′ and will have minimum strictly negative. Suppose it’s minimized at µ′′ , then FOC implies: ′ ′ Vm−k (µ0 ) + h′ (c0 )H ′ (µ0 ) = Vm−k (µ′′ ) + h′ (c0 )H(µ′′ ) Let c′′ h′ (c′′ ) = ρVm−k (µ′′ ) + h(c′′ ), then we have c′′ > c0 and h′ (c′′ ) > h′ (c0 ). Consider: L(Vm−k , h′ (c0 ), µ′′ )(ν(µ′′ )) − G(Fm−k , h′ (c0 ))(ν(µ′′ )) =L(Vm−k , h′ (c0 ), µ0 )(ν(µ′′ )) − G(Fm−k , h′ (c0 ))(ν(µ′′ )) + Vm−k (µ′′ ) − Vm−k (µ0 ) + h′ (c0 )(H(µ′′ ) − H(µ0 )) − (V ′ (µ0 ) + h′ (c0 ))(µ′′ − µ0 ) ≥Vm−k (µ′′ ) − Vm−k (µ0 ) + h′ (c0 )(H(µ′′ ) − H(µ0 )) − (V ′ (µ0 ) + h′ (c0 ))(µ′′ − µ0 ) =G(Vm−k , h′ (c0 ))(µ′′ ) − L(Vm−k , h′ (c0 ), µ0 )(µ′′ ) >0 However: 0 =L(Vm−k , h′ (c′′ ), µ′′ )(ν(µ′′ )) − G(Fm−k , h′ (c′′ ))(ν(µ′′ )) 88 =L(Vm−k , h′ (c0 ), µ′′ )(ν(µ′′ )) − G(Fm−k , h′ (c0 ))(ν(µ′′ )) + (f ′ (µ′′ ) − h′ (c0 ))(H(µ′′ ) − H(ν(µ′′ )) + H ′ (µ′′ )(ν(µ′′ ) − µ′′ )) >(h′ (c′′ ) − h′ (c0 ))J(µ′′ , ν(µ′′ )) >0 Contradiction. Now we show Lemma 24. Suppose it’s not true, then there exists µ′ ∈ (µ0 , ν(µ0 )), µ′′ ≥ µm , and c′′ s.t. ′ ′ (µ′ ) − Vm−k (µ′ )(µ′′ − µ′ ) h(c′′ ) c′′ Fm′ (µ′′ ) − Vm−k ′ − V (µ ) < m−k ρ J(µ′ , µ′′ ) ρ ′′ ′ ′ ′ ′ ′′ ′ F ′ (µ ) − Vm−k (µ ) − Vm−k (µ )(µ − µ ) m = h′ (c′′ ) J(µ′ , µ′′ ) If we let c′ h′ (c′ ) = ρV (µ′ ) + h(c′ ), then c′ > c0 and h′ (c′ ) > h′ (c0 ). 
By definition: 0 ≤L(Vm−k , h′ (c0 ), µ0 )(µ′′ ) − G(Fm′ , h′ (c0 ))(µ′′ ) =L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c0 ))(µ′′ ) 0 ≤L(Vm−k , h′ (c0 ), µ0 )(µ′ ) − G(Fm′ , h′ (c0 ))(µ′ ) =L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′ ) − G(Fm′ , h′ (c0 ))(µ′ ) =⇒ L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c′ ))(µ′′ ) =L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c0 ))(µ′′ ) + (h′ (c′ ) − h′ (c0 ))J(µ0 , µ′′ ) >0 L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′′ ) − G(Fm′ , h′ (c′ ))(µ′′ ) =L(Fm−k , h′ (c0 ), ν(µ0 ))(µ′ ) − G(Fm′ , h′ (c0 ))(µ′ ) + (h′ (c′ ) − h′ (c0 ))J(µ0 , µ′ ) >0 No we consider L(Vm−k , h′ (c′ ), µ′ )(·) and L(Fm−k , h′ (c′ ), ν(µ0 ))(·): L(Vm−k , h′ (c′ ), µ′ )(µ′ ) = G(Vm−k , h′ (c′ ))(µ′ ) L(v ′ ′ ′ ′ ′ m−k , h (c ), µ )(ν(µ0 )) ≥ G(Vm−k , h (c ))(ν(µ0 )) L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′ ) > G(Vm−k , h′ (c′ ))(µ′ ) L(Fm−k , h′ (c′ ), ν(µ0 ))(ν(µ0 )) = G(Vm−k , h′ (c′ ))(ν(µ0 )) { L(Vm−k , h′ (c′ ), µ′ )(ν(µ0 )) ≥ L(Fm−k , h′ (c′ ), ν(µ0 ))(ν(µ0 )) =⇒ L(Vm−k , h′ (c′ ), µ′ )(µ′ ) < L(Fm−k , h′ (c′ ), ν(µ0 ))(µ′ ) d d Since both functions are linear: dµ L(Vm−k , h′ (c′ ), µ′ )(µ) > dµ L(Fm−k , h′ (c′ )ν(µ0 ))(µ), then L(Vm−k , h′ (c′ ), µ′ )(·) must be larger than L(Fm−k , h′ (c′ ), ν(µ0 ))(·) at any µ′′ ≥ ν(µ0 ). This implies: L(Vm−k , h′ (c′ ), µ′ )(µ′′ ) > G(Fm′ , h′ (c′ ))(µ′′ ) Contradicting the assumption. 89 Lemma 24′ . Suppose at µ0 , V0 , V0′ , k ≥ 1 satisfies: ′ ′ ′ ′ ′ ′ V0 = max c Fm+k (µ ) − V0 − V0 (µ − µ) − h(c) ≥ max c Fm (µ ) − V0 − V0 (µ − µ) − h(c) µ′ ≤µ,c ρ µ′ ≤µ,c ρ J(µ, µ′ ) ρ J(µ, µ′ ) ρ V (µ ) ≥ V ≥ F (µ ) 0 0 m+k 0 Vm+k is the solution as defined in Lemma 23 with initial condition µ0 , V0 , V0′ , then ∀µ ∈ [ν(µ0 ), µ0 ]: Vm+k (µ) ≥ ′ (µ)(µ′ − µ) h(c) c Fm′ − Vm−k (µ) − Vm−k − µ′ ≤µ,µ ∈[m,m+k],c ρ J(µ, µ′ ) ρ max ′ C.2. Continuum of Actions C.2.1. Proof of Lemma 6 Proof. We prove with two steps: Step 1 : We first show that if we let Vdt (F ) be the solution to Equation (3), then Vdt is Lipschitz continuous in F under L∞ norm. ∀F1 , F2 convex and with bounded subdifferentials, consider F = max {F1 ,2 }, F = min {F1 , F2 }. Then by properties of convex functions, F , F are convex. ∂F (µ), ∂F (µ) ∈ {∂F1 (µ), ∂F2 (µ)}. Therefore F and F are both within the domian of convex and bounded subdifferential functions with the following quantitative property: { F ≥ F1 , F2 ≥ F F − F = |F1 − F2 | Noticing that Vdt is the fixed point of operator T defined by Equation (A.6). It’s not hard to see that T is monotonicall increasing in F . Therefore, we have: Vdt (F ) ≤ Vdt (F1 ), Vdt (F2 ) ≤ Vdt (F ) Now let (pi , µi ) be the policy solving Vdt (F ). Let V dt = Vdt (F ), V dt = Vdt (F ). Then consider: ∑ V dt (µ) ≥1V dt (µ)≤F (µ) F (µ) + 1V dt (µ)>F (µ) e−ρdt p1i (µ)V dt (µ1i ) ∑ p1i (µ)1V dt (µ1i )≤F (µ1i ) F (µ1i ) ≥1V dt (µ)≤F (µ) F (µ) + 1V dt (µ)>F (µ) e−ρdt ∑ ∑ p1i (µ)1V dt (µ1i )>F (µ1i ) p2i (µ1i )V dt (µ2i ) + 1V dt (µ)>F (µ) e−2ρdt ≥··· ∑ ∏ ∑ ∑ t pτi (µτi −1 )1V dt (µτi )>F (µτi ) = e−ρt·dt pti (µt−1 i )1V dt (µti )≤F (µti ) F (µi ) t ≥ ∑ t − i1 ,...,it−1 e−ρt·dt ∑ ∏ pτi (µτi −1 )1V dt (µτi )>F (µτi ) i1 ,...,it−1 ∑ ∑ ∏ t pτi (µτi −1 )1V dt (µτi )>F (µτi ) i1 ,...,it−1 =V dt (µ) − F − F 90 ∑ ∑ t pti (µt−1 i )1V dt (µti )≤F (µti ) F (µi ) pti (µt−1 i )1V dt (µti )≤F (µti ) F − F Therefore, V dt − V dt ≤ F − F =⇒ |Vdt (F1 ) − Vdt (F2 )| ≤ |F1 − F2 |. Vdt (F ) has Lipschitz parameter 1. Step 2 : ∀F1 , F2 , ∀ε > 0, by Theorem 1, there exists dt s.t. |V(Fi ) − Vdt (Fi )| ≤ ε |F1 − F2 |. 
Therefore: |V(F1 ) − V(F2 )| ≤ |V(F1 ) − Vdt (F1 )| + |V(F2 ) − Vdt (F2 )| + |Vdt (F1 ) − Vdt (F2 )| ≤(1 + 2ε) |F1 − F2 | Take ε → 0, since LHS is not a function of ε, we conclude that V(F ) is Lipschitz continuous in F with Lipschitz parameter 1. C.2.2. Proof of Theorem 4 Proof. We prove the three main results in following steps: • Lipschitz continuity. By Lemma 6, we directly get Lipschitz continuity of operator V on {Fn , F } and the Lipschitz parameter being 1. • Convergence of derivatives. Let Vn = V(Fn ), V = V(F ), we show that ∀µ s.t. V (µ) > F (µ), V ′ (µ) = lim Vn′ (µ). Since V (µ) > F (µ), by continuity strict inequality holds in an closed interval [µ1 , µ2 ] around µ. Then by Lemma 26, limn→∞ Vn′ (µ′ ) exists ∀µ′ ∈ [µ1 , µ2 ]. Now consider function Vn′ (µ). Since Vn′′ (µ) is uniformly bounded for all n, Vn′ (µ) are uniformly Lipschitz continuous, thus equicontinuous and totally bounded. Therefore by lemma Arzela-Ascolli, Vn′ converges uniformly to lim Vn′ . By convergence theorem of derivatives, V ′ = lim Vn′ on [µ1 , µ2 ]. Therefore, V ′ (µ) = limn→∞ Vn′ (µ). • Feasibility. For µ s.t. V (µ) = F (µ), feasibility is trivial. Now we discuss the case V (µ) > F (µ). We only prove for µ > µ∗ and µ = µ∗ , the case µ < µ∗ follows by symmetry. If µ > µ∗ , there exists N s.t. ∀n ≥ N , µ > µ∗n . N can be picked large enough that in a closed interval around µ, Vn (µ) > Fn (µ). Therefore, there exists νn being maximizer for Vn (µ) bounded away from µ and satisfying: Vn (µ) = c Fn (νn ) − Vn (µ) − Vn′ (µ)(νn − µ) ρ J(µ, νn ) Pick a converging subsequence νn → ν: c F (ν) − V (µ) − V ′ (µ)(ν − µ) ρ J(µ, νn ) c Fn (νn ) − Vn (ν) − Vn′ (ν)(νn − µ) = lim n→∞ ρ J(µ, νn ) = lim Vn (µ) n→∞ =V (µ) Therefore V (µ) is feasible in Equation (B.3). Suppose µ = µ∗ . Then there exists a subsequence of µ∗n converging from one side of µ∗ . Suppose they are converging from left. Then µ ≥ µ∗n . Previous proof still works. Essentially, what we showed is that the limit of strategy in discrete action problem achieves V (µ) in the continuous action limit. 91 • Unimprovability. First, when µ ∈ {0, 1}, information provides no value but discounting is costly, therefore V (µ) is unimprovable. We now show unimprovability on (0, 1) by adding more feasible information acquisition strategies in several steps. – Step 1. Poisson experiments at V (µ) > F (µ). In this step, we show that ∀µ ≥ µ∗ and V (µ) > F (µ): ρV (µ) = max c ′ µ ≥µ F (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) J(µ, µ′ ) Suppose not true, then there exists ν s.t.: lim ρVn (µ) =ρV (µ) n→∞ F (ν) − V (µ) − V ′ (ν)(ν − µ) <c J(µ, ν) Fn (ν) − Vn (µ) − Vn′ (µ)(ν − µ) = lim c n→∞ J(µ, ν) ≤ lim ρVn (µ) n→∞ Second line is by the opposite proposition. Third line is by convergence of Fn by assumption, convergence of Vn by Lemma 6 and convergence of Vn′ by Lemma 26. Last inequality is by suboptimality of ν. Similarly, for the case µ ≤ µ∗ , we can apply a symmetric argument to prove. – Step 2. Poisson experiments at V (µ) = F (µ). In this step, we shoe that ∀µ ≥ µ∗ and V (µ) = F (µ) (The symmetric case µ ≤ µ∗ is ommited): F (µ′ ) − F (µ) − DV (µ, µ′ )(µ′ − µ) ρF (µ) ≥ max c µ′ ≥µ J(µ, µ′ ) First of all, we show that V is differentiable at µ and V ′ (µ) = F ′ (µ). Suppose not, then since V (µ) = F (µ) and V ≥ F , we know that V − F is locally minimized at µ. Therefore DV+ (µ) > DV− (µ). By Definition 1, there exists ε > 0, µn1 ↗ µ and V (µn V (µ)−V (µn ) 2 )−V (µ) µn2 ↘ µ s.t. ≥ ε + µ−µn 1 . 
Let δ1n = µ − µn1 , δ2n = µn2 − µ, this implies: µn −µ 2 1 µn2 − µ (µn2 − µ)(µ − µn1 ) µ − µn1 n n ) − V (µ)) + ) − V (µ)) ≥ ε (V (µ (V (µ 2 1 µn2 − µn1 µn2 − µn1 µn2 − µn1 µn2 − µ µ − µn1 n V (µ ) + V (µn1 ) ≥ V (µ) + ε · min {δ1n , δ2n } =⇒ n 2 n n n µ2 − µ1 µ2 − µ1 On the other hand: µ − µn1 µn2 − µ n (H(µ) H(µ )) + (H(µ) − H(µn1 )) − 2 n n n n µ2 − µ1 µ2 − µ1 ( ) n 1 ′′ n µ − µ1 ′ n n H (µ)(µ − µ2 ) + H (ξ2 )(µ − µ2 ) = n µ2 − µn1 2 ( ) n 1 ′′ n µ2 − µ ′ n n 2 H (µ)(µ − µ1 ) + H (ξ1 )(µ − µ1 ) + n µ2 − µn1 2 92 = 1 (µn2 − µ)(µ − µn1 ) (H ′′ (ξ2n )(µn2 − µ) + H ′′ (ξ1n )(µ − µn1 )) 2 µn2 − µn1 ξ1n and ξ2n are determined by applying intermediate value theorem on H ′ . Now we can choose N s.t. ∀n ≥ N , maxµ′ ∈[µn1 ,µn2 ] {H ′′ (µ′ )} ≤ 2H ′′ (µ). Therefore: µ − µn1 µn2 − µ n (H(µ) − H(µ )) + (H(µ) − H(µn1 )) 2 µn2 − µn1 µn2 − µn1 ≤H ′′ (µ)(µn2 − µ)(µ − µn1 ) =H ′′ (µ)δ1n δ2n Now we consider a stationary experiment at µ that takes any experiment with posc teriors (µn1 , µn2 ) with flow probability H ′′ (µ)δ n n . Then by definition the flow cost of 1 δ2 this information acquisition strategy is less than c, thus is feasible. The expected utility is: c Ve (µ) = ρ µ−µn 1 nV µn 2 −µ1 µ−µn 1 n µn 2 −µ1 µn n 2 −µ e n V (µ1 ) − V (µ) µn 2 −µ1 µn −µ H(µn2 )) + µn2−µn (H(µ) − H(µn1 )) 2 1 (µn2 ) + (H(µ) − V (µ) − Ve (µ) + ε min {δ1n , δ2n } H ′′ (µ)δ1n δ2n V (µ) + ε min {δ1n , δ2n } =⇒ Ve (µ) ≥ 1 + ρc H ′′ (µ)δ1n δ2n ε min {δ1n , δ2n } − ρc H ′′ (µ)δ1n δ2n =V (µ) + 1 + ρc H ′′ (µ)δ1n δ2n ε − H ′′ (µ) max {δ1n , δ2n } =V (µ) + min {δ1n , δ2n } 1 + ρc H ′′ (µ)δ1n δ2n ≥ n can be pick large enough that ε−H ′′ (µ) max {δ1n , δ2n } is positive. Therefore Ve (µ) > V (µ). Now fix n and define: c Vem (µ) = ρ =⇒ µn n 2 −µ e n Vm (µ1 ) − Vm (µ) µn 2 −µ1 µn −µ H(µn2 )) + µn2−µn (H(µ) − H(µn1 )) 2 1 µ−µn n 1 n Vm (µ2 ) µn 2 −µ1 µ−µn 1 n µn 2 −µ1 (H(µ) − + lim Vem (µ) = Ve (µ) > lim Vm (µ) m→∞ m→∞ There exists m large enough that Vem (µ) > Vm (µ), violating optimality of Vm . Contradiction. Therefore, we showed that V ′ (µ) = F ′ (µ). Next we show unimperovability. Suppose not, then ∃ν s.t.: F (µ) < c F (ν) − F (µ) − F ′ (µ)(ν − µ) ρ J(µ, ν) By continuity of V , ∃ε and a neighbourhood µ ∈ O, ∀µ′ ∈ O: V (µ′ ) + ε ≤ c F (ν) − V (µ′ ) − F ′ (µ)(ν − µ′ ) ρ J(µ′ , ν) 93 By uniform convergence of Fn and Vn , there exists ε > 0 and N s.t. ∀n ≥ N : ε c Fn (ν) − Vn (µ′ ) − F ′ (µ)(ν − µ′ ) Vn (µ ) + ≤ 2 ρ J(µ′ , ν) c Fn (ν) − Vn (µ′ ) − Vn′ (µ′ )(ν − µ′ ) ε c Fn (ν) − Vn (µ′ ) − F ′ (µ)(ν − µ′ ) =⇒ + ≤ ρ J(µ′ , ν) 2 ρ J(µ′ , ν) ρε J(µ′ , ν) =⇒ Vn′ (µ′ ) ≥ F ′ (µ) + 2c ν − µ′ ′ ′ J(µ ,ν) , which is a positive number In an interval around µ, Vn′ (µ′ ) − F ′ (µ) ≥ ρε 2c ν−µ′ independent of n and uniformly bounded away from 0 for all µ′ . Then it’s impossible that V ′ (µ) = F ′ (µ). Contradiction. What’s more, since V ′ is Lipschitz continuous at any V (µ) > F (µ), it can be extended smoothly to the boundary. Since V ′ = F ′ at V (µ) = F (µ), then the limit of this smooth extension has lim V ′ (µ) = F ′ (µ). Therefore V is C (1) smooth on [0, 1]. – Step 3. Repeated experiments and contradictory experiments. With the convergence result we have on hand, we can apply similar proof by contradiction method in step 1 and 2 to rule out these two cases. We omitted the proofs here. Therefore: { } c V (µ′ ) − V (µ) − V ′ (µ)(µ′ − µ) V (µ) = max F (µ), max µ′ ρ J(µ, µ′ ) – Step 4. Diffusion experiments. 
Suppose at µ, diffusion experiment is strictly optimal: V (µ) < − c D2 V (µ) ρ H ′′ (µ) Then by Definition 1, there exists ε, δ1 s.t.: V (µ) + ε ≤ c V (µ + δ1 ) − V (µ) − V ′ (µ)δ1 ρ H(µ) − H(µ + δ1 ) + H ′ (µ)δ1 Then by definition of derivative, there exists δ2 s.t.: V (µ) + c ε ≤ 2 ρ δ2 (V (µ + δ1 ) − V (µ)) δ1 +δ2 δ2 (H(µ) − H(µ + δ1 )) δ1 +δ2 + + δ2 δ1 +δ2 δ2 δ1 +δ2 (V (µ − δ2 ) − V (µ)) (H(µ) − H(µ − δ2 )) By convergence of Vn , there exists n s.t.: Vn (µ) + ε c ≤ 4 ρ δ2 (Vn (µ + δ1 ) − Vn (µ)) δ1 +δ2 δ2 (H(µ) − H(µ + δ1 )) δ1 +δ2 + + δ2 δ1 +δ2 δ2 δ1 +δ2 (Vn (µ − δ2 ) − Vn (µ)) (H(µ) − H(µ − δ2 )) δ1 δ2 Vn (µ + δ1 ) + Vn (µ − δ2 ) δ1 + δ2 δ1 + δ2 ( ( )) ρ δ2 δ1 ≥Vn (µ) 1 + H(µ) − H(µ + δ1 ) − H(µ − δ2 ) c δ1 + δ2 δ1 + δ2 ( ) ρ δ2 δ1 ε + H(µ) − H(µ + δ1 ) − H(µ − δ2 ) c δ1 + δ2 δ1 + δ2 4 =⇒ 94 If we consider the experiment with posterior beliefs µ + δ1 , µ − δ2 at µ. Taking this experiment at µ with flow probability: H(µ) − δ2 H(µ δ1 +δ2 c + δ1 ) − δ1 H(µ δ1 +δ2 − δ2 ) Then the flow cost constraint will be satisfied and the utility gain is: Ven (µ) = δ2 V (µ δ1 +δ2 n + δ1 ) + δ1 V (µ δ1 +δ2 n − δ2 ) ) δ1 2 H(µ) − δ1δ+δ H(µ δ H(µ δ + ) − − ) 1 2 δ1 +δ2 2 ( ) ρ δ2 δ1 H(µ) − δ1 +δ2 H(µ + δ1 ) − δ1 +δ2 H(µ − δ2 ) c ε ) ( ≥Vn (µ) + 2 1 1 + ρc H(µ) − δ1δ+δ H(µ + δ1 ) − δ1δ+δ H(µ − δ2 ) 4 2 2 1+ ρ c ( >Vn (µ) Contradiction. To sum up, we proved that V (µ) solves Equation (B.3). Lemma 25 (Convergence of µ∗ ). Suppose Assumptions 1, 2, 3 and 4 are satisfied. Let Fn be piecewise linear function on [0,1] satisfying: 1. |Fn − F | → 0; 2. ∀µ ∈ [0, 1], lim Fn′ (µ) = F ′ (µ). Let µ∗n be as defined in Lemma 13 associated with Fn . Suppose µ∗ = lim µ∗n . Then, 1. ∀µ > µ∗ , ∃N s.t. ∀n ≥ N , νn (µ) ≥ µ. 2. ∀µ < µ∗ , ∃N s.t. ∀n ≥ N , νn (µ) ≤ µ. Proof. ∀µ > µ∗ , by definition lim µ∗n = µ∗ , there exists N s.t. ∀n ≥ N : |µ∗n − µ∗ | < |µ − µ∗ |. Therefore µ > µ∗n and thus νn (µ) ≥ µ. Same argument applies to µ < µ∗ . Lemma 26. Suppose Assumptions 1, 2, 3 and 4 are satisfied. Let Fn be piecewise linear function on [0,1] satisfying: 1. |Fn − F | → 0; 2. ∀µ ∈ [0, 1], lim Fn′ (µ) = F ′ (µ). Define Vn = V(Fn ) and V = V(F ). Then: ∀µ ∈ [0, 1] s.t. V (µ) > F (µ), ∃ lim Vn′ (µ). Proof. With Lemma 25, we can define µ∗ ∈ [0, 1] (we pick an arbitrary limiting point when there are multiple ones). First by assumption lim Fn′ (µ) = F ′ (µ), and Vn′ = Fm′ on the boundary by construction in Theorem 2, the statement is automatically true for µ ∈ {0, 1}. We discuss three possible cases for different µ ∈ (0, 1) separately. • Case 1 : µ > µ∗ . If V (µ) > F (µ), then by convergence in L∞ norm, there exists N and neighbourhood µ ∈ O s.t. ∀n ≥ N , µ′ ∈ O, Vn (µ′ ) > Fn (µ′ ). We know that by no-repeated-experimentation property of solution νn (µ) to problem with Fn , νn (µ) > sup O. Now consider Vn′ (µ). Suppose Vn′ (µ) have unlimited limiting point. 95 Then exists subsequence lim Vn′ (µ) = ∞ or −∞. If lim Vn′ (µ) = ∞, consider ν = 0, else if lim Vn′ (µ) = −∞, consider ν = 1: V (µ) = lim Vn (µ) n→∞ c Fn (ν) − Vn (µ) − Vn′ (µ)(ν − µ) ≥ lim n→∞ ρ J(µ, ν) c F (ν) − V (µ) c ν−µ = − lim Vn′ (µ) n→∞ ρ J(µ, ν) ρ J(µ, ν) =+∞ Contradiction. Therefore we know that Vn′ (µ) must have finite limiting points. Now suppose Vn′ (µ) doesn’t converge, then there exists two subsequences lim Vn′ (µ) = V1′ and lim Vm′ (µ) = V2′ , V1′ ̸= V2′ ∈ R. Suppose V1′ > V2′ . Now take a converging subsequence of optimal policy at µ νnk → ν 1 . By previous result ν 1 ≥ sup O. Therefore ν 1 will be bounded away from µ. 
Consider: V (µ) = lim Vnk (µ) k→∞ c Fmk (ν 1 ) − Vmk (µ) − Vm′ k (µ)(ν 1 − µ) k→∞ ρ J(µ, ν 1 ) c F (ν 1 ) − V (µ) − V2′ (ν 1 − µ) = ρ J(µ, ν 1 ) Fnk (νnk ) − Vnk (µ) − Vn′K (µ)(νnk − µ) (V1′ − V2′ )(ν 1 − µ) + = lim k→∞ J(µ, νnk ) J(µ, ν 1 ) ≥ lim >V (µ) Contradiction. Therefore, limit point of Vn′ (µ) must be unique. Such limit point exists since Vn′ are uniformly bounded. To sum up, there exists lim Vn′ (µ). • Case 2 : µ = µ∗ . Since V (µ∗ ) > F (µ∗ ). This implies that ∃N s.t. ∀n ≥ N , Vn (µ∗ ) > Fn (µ∗ ). In this case, by Lemma 13, µ∗n are unique. Since µ∗n is the unique intersection of U n+ and U n− (Definition of U n+ , U n−1 are as in Lemma 13, n is index), we can first establish convergence of µ∗ through convergence of U n+ and U n−1 . By definition: U + (µ) = Fm (µ′ ) ρ µ ≥µ,m≥µ 1 + J(µ, µ′ ) c max ′ Therefore, suppose the maximizer for index n is νn , mn , then for index n′ : ′ Fn′ (νn ) 1 + ρc J(µ, νn ) Fn (νn ) − Fn′ (νn ) ≥U n+ (µ) + 1 + ρc J(µ, νn ) U n + (µ) ≥ ≥U n+ (µ) − |Fn − Fn′ | Since n and n′ are totally symmetric, we actually showed that the functional map from Fn to U n+ is Lipschitz continuous in Fn with Lipschitz parameter 1. Symmetric 96 argument shows that same property for U n− . Since by assumption Fn is uniformly converging, we can conclude that U n+ and U n− are Cauchy sequence with L∞ norm. Therefore converging. Then U n+ − U n− uniformly converges and their roots will be UHC when n → ∞. To show convergence of µ∗n , it’s sufficient to show that such limit is unique. This is not hard to see by applying envelope theory to U n+ and U n− : ′′ (µ)(ν −µ) d n U n+ (µ) = − ρc F (νn )H . Therefore U n+ − U n−1 will have slope bounded below dµ J(µ,νn )2 from zero, therefore the limit will also be strictly increasing. So µ∗ is unique. Since µ∗n → µ, and Vn′′ (µ) are all bounded from above: Vn′ (µ∗ ) =Vn′ (µ∗n ) + Vn′′ (ξn )(µ∗ − µ∗n ) =Vn′′ (ξn )(µ∗ − µ∗n ) → 0 • Case 3 : µ < µ∗ . We can apply exactly the symmetric proof of case 1. C.3. General Information Measure C.3.1. Proof of Theorem 5 Proof. Consider Equation (9), it’s sasy to see that both the inner maximization problem and the constraint are linear in pi and σ 2 . Therefore, Equation (9) can be written equivalently as choosing either one posterior or a diffusion experiment: { } c (V (ν) − V (µ) − V ′ (µ)(ν − µ)) cV ′′ (µ) ρV (µ) = max ρF (µ), sup , ′′ J(µ, ν) Jνν (µ, µ) ν Now suppose µ ∈ D and ρV (µ) = c JV′′ ′′ (µ) νν (µ,µ) sup ν . This is saying, the maximization problem: c (V (ν) − V (µ) − V ′ (µ)(ν − µ)) J(µ, ν) will be solved for ν → µ. Therefore, consider the FOC: FOC: V ′ (ν) − V ′ (µ) Jν′ (µ, ν) − (V (ν) − V (µ) − V ′ (µ)(ν − µ)) J(µ, ν) J(µ, ν)2 It must be ≤ 0 when ν → µ+ and ≥ 0 when ν → µ− . Otherwise, the diffusion experiment will be locally dominated by some Poisson experiment. When ν → µ, J(µ, ν) → 0, V ′ (ν) → V ′ (µ), V (ν) − V (µ) − V ′ (µ)(ν − µ) → 0. Therefore, we can apply L’Hospital’s rule: ) ( V (ν)−V (µ)−V ′ (µ)(ν−µ) ′′ ′′ ′ limν→µ V (ν) − Jνν (µ, ν) − Jν (µ, ν) · F OC J(µ,ν) lim FOC = ν→µ limν→µ Jν′ (µ, ν) ) ( V (ν)−V (µ)−V ′ (µ)(ν−µ) ′′ ′′ (µ, ν) lim V (ν) − J ν→µ νν J(µ,ν) 1 = ′ 2 limν→µ Jν (µ, ν) ( ) (3) V (ν)−V (µ)(ν−µ) (3) ′′ lim V (ν) − J (µ, ν) − J (µ, ν) · FOC ννν ν→µ νν J(µ,ν) 1 = ′′ (µ, ν) 2 limν→µ Jνν 97 1V = 3 (3) (3) (µ) − Jννν (µ, µ) JV′′ ′′ (µ) νν (µ,µ) ′′ (µ, µ) Jνν ′′ (µ) Now consider V (µ) − ρc JV′′ (µ,µ) . By assumption, it’s non-negative and achieves 0 at µ. 
νν Therefore it is locally minimized at µ: ( ) d c V ′′ (µ) V (µ) − =0 ′′ (µ, µ) dµ ρ Jνν ) ρ V (3) (µ) V ′′ (µ) ( (3) (3) (µ, µ) =0 (µ, µ) + J =⇒ V ′ (µ) − ′′ J + ′′ ννµ c Jνν (µ, µ) Jνν (µ, µ)2 ννν (3) =⇒ V (3) (µ) − Jννν (µ, µ) JV′′ νν (µ,µ) ′′ (µ, µ) Jνν (3) =⇒ ′′ (µ) V (3) (µ) − Jννν (µ, µ) JV′′ ′′ (µ) νν (µ,µ) ′′ (µ, µ) Jνν (3) ρ Jννµ (µ, µ) = V ′ (µ) + V ′′ (µ) ′′ c Jνν (µ, µ)2 (3) ρ ρ Jννµ (µ, µ) = V ′ (µ) + V (µ) ′′ c c Jνν (µ, µ) By smoothness of V and J, for FOC to be non-positive when ν → µ+ and non-negative when ν → µ− , it can only be that: ′′ (3) V ′ (µ)Jνν (µ, µ) + V (µ)Jννµ (µ, µ) = 0 ′′ n) , we have: Now suppose there exists µn → µ s.t. ρV (µn ) = c J ′′V (µ(µn ,µ n) νν ′′ (3) V ′ (µn )Jνν (µn , µn ) + V (µn )Jννµ (µn , µn ) = 0 By differentiability of the whole term, we have: ) d ( ′ ′′ (3) V (µ)Jνν (µ, µ) + V (µ)Jννµ (µ, µ) = 0 dµ ( (3) ) ( (4) ) ′′ (3) (4) =⇒ V ′′ (µ)Jνν (µ, µ) + V ′ (µ) 2Jννµ (µ, µ) + Jννν (µ, µ) + V (µ) Jνννµ (µ, µ) + Jννµµ (µ, µ) = 0 ) V (µ) ( (3) ρ ′′ (3) (3) (µ, µ)2 − ′′ 2Jννµ (µ, µ)2 + Jννν (µ, µ)Jννµ (µ, µ) =⇒ V (µ)Jνν c Jνν (µ, µ) ( (4) ) (4) + V (µ) Jνννµ (µ, µ) + Jννµµ (µ, µ) = 0 (3) (3) (3) 2Jννµ (µ, µ)2 + Jννν (µ, µ)Jννµ (µ, µ) ρ ′′ 2 (4) (4) + Jνννµ (µ, µ) + Jννµµ (µ, µ) = 0 =⇒ Jνν (µ, µ) − ′′ c Jνν (µ, µ) By assumption, µ ∈ D, therefore the differential equation must not be satisfied. This implies that there doesn’t exist such µn → µ. So the set: { } V ′′ (µ) µ ∈ DρV (µ) = c ′′ Jνν (µ, µ) is a closed set (closed w.r.t. D) containing no limiting point. That is to say, within any compact subset of D, this set is finite. By definition of Lebesgue measure, the measure of a set can be approximated by compact subsets from below. Therefore, this set will be a zero-measure set. 98 C.4. Connection to Static Problem C.4.1. Proof of Theorem 7 Proof. We prove by constructing a candidate solution satisfying the characterization in Theorem 7, then show its optimality and uniqueness. • Construction: F is a piecewise linear convex function on [0, 1] with finite kinks µk . Now consider the function G(µ) = F (µ) + mc H(µ). By definition, in each interval b [µk , µk+1 ], G(µ) is a strictly concave function. Now consider G(µ) = Co(G) which is b will be locally linear in neighbourhood of any µ where the upper concave hull of{G. G } b b b and piecewise G(µ) > G(µ). Let I = µG(µ) > G(µ) . Since concave function G convex function G are both continuous, I will be an open set. Therefore, I will be consisted of countable open intervals ∪In . Now we prove the following statement: ∀In , there exists µk ∈ In = {an , bn }. Suppose b n ), G(bn ) = G(b b n ). not, then G(µ) will be strictly concave on In and G(an ) = G(a ′ Concavity of G implies that G (µ) being strictly decreasing on (an , bn ). On the other b b n ), this implies that G′ (an ) ≤ hand, since G(µ) ≥ G(µ) ∀µ ∈ (an , bn ) and G(an ) = G(a b n ). Similarly, G′ (bn ) ≥ inf ∂ G(b b n ). This is to say, if we replace G b with G on sup ∂ G(a b will still have decreasing subdifferentials. G b being the upper concave implies that In , G b G(µ) = G(µ) on In . Contradiction. Since the number of µk is finite, we’ve shown that I is consisted of finite number of open intervals ∪In . Now we define: Ln (µ) = G(an ) + G(bn ) − G(an ) (µ − an ) b n − an Noticing that this is equivalent to defining: m b V (µ) =G(µ) − H(µ) c ( m m ) =Co F + H (µ) − H(µ) c c • Optimality: First it’s easy to see that V is feasible in Equation (10). ∀µ s.t. V (µ) > ( ) F (µ), pick σ 2 = − H ′′2c(µ) . 
Then 12 σ 2 V ′′ (µ) = − H ′′c(µ) − mc H ′′ (µ) = m. Now we show that it’s unimprovable in Equation (10). By construction, V (µ) + mc H(µ) is a concave function, therefore ∀ν: ( ) m m m V (ν) + H(ν) ≤ V (µ) + H(µ) + V ′ (µ) + H ′ (µ) (ν − µ) c c c m ′ (H(µ) − H(ν) + H ′ (µ)(ν − µ)) =⇒ V (ν) − V (µ) − V (µ)(ν − µ) ≤ c ∑ 1 ′ =⇒ pi (V (νi ) − V (µ) − V (µ)(νi − µ)) + σ 2 V ′′ (µ) 2 ∑ m 1 m ′ ≤ pi (H(µ) − H(νi ) + H (µ)(ν − µ)) − σ 2 H ′′ (µ) c 2 c ≤m That is to say, V is unimprovable. 99 • Uniqueness: Suppose there is Ve ̸= V solving Equation (10), where Ve ∈ C (1) [0, 1] and twice differentiable when Ve (µ) > F (µ). Now consider U = Ve − V ̸= 0. Suppose min U < 0. Let µ∗ ∈ arg min U . By definition, µ∗ ∈ (0, 1). U (µ∗ ) < 0 implies that V (µ∗ ) > F (µ∗ ). Therefore, µ∗ ∈ In . Now consider: Ve (bn ) − Ve (µ∗ ) − Ve ′ (µ∗ )(bn − µ∗ ) H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ ) V (bn ) − V (µ∗ ) − V ′ (µ∗ )(bn − µ∗ ) U (bn ) − U (µ∗ ) − U ′ (µ∗ )(bn − µ∗ ) + c =c H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ ) H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ ) ∗ ′ ∗ ∗ V (bn ) − V (µ ) − V (µ )(bn − µ ) U (bn ) − U (µ∗ ) =c + c H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ ) H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ ) ∗ ′ ∗ ∗ V (bn ) − V (µ ) − V (µ )(bn − µ ) >c H(µ∗ ) − H(bn ) + H ′ (µ∗ )(bn − µ∗ ) m ≥c =m The second equality is from µ∗ ∈ arg min U . The last inequality is from U (bn ) ≥ 0 and U (µ∗ ) = 0. Contradiction. e = Ve + m H ≥ G. e First we show that G e is weakly Now suppose max U > 0. Consider G c concave. Suppose not, then there exists µ, ν s.t. e e e′ (µ)(ν − µ) G(ν) > G(µ) +G m =⇒ Ve (ν) − Ve (µ) − Ve ′ (µ)(ν − µ) > (H(µ) − H(ν) + H ′ (µ)(ν − µ)) c ′ e e e V (ν) − V (µ) − V (µ)(ν − µ) =⇒ c >m H(µ) − H(ν) + H ′ (µ)(ν − µ) Contradicting the optimality of Ve . Since max U > 0, there exists open interval I = e > G on I and G(a) e e e is (a, b) s.t. G = G(a), G(b) = G(b). Since Fe > F on I, G ′ e (µ1 ) − twice differentiable. By intermediate value theorem, there exists µ1 < µ2 , G ′ ′ ′ ′ ′ e (µ2 ) − G (µ2 ) < 0. By concavity of G, G (µ1 ) ≥ G (µ2 ). Therefore, G (µ1 ) > 0, G ′ e (µ1 ) > G f′ (µ2 ). Again by intermediate value theorem, there exists G e′′ (µ) < 0. Since G e is globally concave, ∀ν ̸= µ, G e e e′ (µ)(ν − µ) G(ν) < G(µ) +G m (H(µ) − H(ν) + H ′ (µ)(ν − µ)) =⇒ Ve (ν) − Ve (µ) − Ve ′ (µ)(ν − µ) < c Ve (ν) − Ve (µ) − Ve ′ (µ)(ν − µ) =⇒ c <m H(µ) − H(ν) + H ′ (µ)(ν − µ) e′′ (µ) < 0 =⇒ Ve ′′ (µ) < − m H ′′ (µ) =⇒ c Ve ′′′′(µ) < m. Contradicting feasibility of Ve . G c −H (µ) To sum up, we showed that V solving Equation (10) is unique. 100 C.4.2. Proof of Corollary 8 Proof. It’s not hard to observe that: ) ( I(S; X ) =I (A, S; X ) − I A; X S ) ( =I (A; X ) + I S; X A ≥I (A; X ) Therefore, Equation (11) will be equivalent to: m sup Eµ [u(A, X )] − I (A; X ) c A ) ∑ ∑ m( H(µ) − pi H(νi ) = sup pi F [νi ] − c νi ,pi ) m ∑ ( m = sup pi F [νi ] + H(νi ) − H(µ) c c νi ,pi ( ) m m =Co F + H (µ) − H(µ) c c This is exactly the solution in Theorem 7. C.4.3. Proof of Theorem 9 Proof. Take any information acquisition strategy (S t , At , T ) that satisfies the constraints in Equation (1). The achieved expected utility will be: ] [ ∞ ∑ ( ) [ ] e−ρdt·t λI S t ; X |S t−1 E(S t =S t )∞ e−ρdt·T E u(AT , X )|S T −1 − t=0 t=0 We can separate the utility gain part and information cost part. By the iterated rule of expectation, the utility gain part is: [ −ρdt·T [ ]] E(S t =S t )∞ e E u(AT , X )|S T −1 t=0 [ ( )] =ET ,AT e−ρdt·T u AT , X It’s easy to see that this is determined only by action time T and action process AT . 
Let ( ) Set−1 = 1T =t , At T =t . Then by the three Markov properties in Equation (1), we have: X → S t → Set Therefore: ∞ ∑ ) ( e−ρdt·t λI S t ; X S t−1 t=0 ∞ ∑ =λ t=0 =λ ∞ ∑ ( ( ) ( )) e−ρdt·t I S t ; X − I S t−1 ; X ( ( ) ( ( ) ( )) ) e−ρdt·t I Set ; X + I S t ; S Set − I Set−1 ; X − I S t−1 ; S Set−1 t=0 =λ ∞ ∑ t=0 e −ρdt·t ( ∞ ∞ ( ( ∑ ∑ t−1 ) t) t−1 ) t −ρdt·t t −ρdt·t t−1 e e e I S ;X S +λ e I S ;X S − λ e I S ; X Se t=0 t=0 101 =λ ∞ ∑ t=0 ≥λ ∞ ∑ ∞ ) ( ( ) ( )∑ I S t ; X Set e−ρdt·t I Set ; X Set−1 + λ 1 − e−ρdt t=0 ( ) e−ρdt·t I Set ; X Set−1 t=0 Therefore, by replacing signal process S t with Set , the DM can achieve the same utility gain and pay a weakly lower information cost. Now consider: ∞ ) ( ∑ [ ( )] e−ρdt·t I Set ; X Set−1 ET ,AT e−ρdt·T u AT , X − λ t=0 [ ( ) ] =P [T = 0] E u At , X T = 0 [ ( )] [ ( )] + P [T = 1] E e−ρdt u AT , X + P [T > 1] E e−ρdt·T u AT , X ∞ ( ( ∑ ) t−1 ) 0 −ρdt·t t e e − λI A ; X µ − λ e I S ; X Se t=1 [ ( ) ] =P [T = 0] E u At , X T = 0 ( ) [ −ρdt ( 1 ) ] 0 e + P [T = 1] E e u A , X T = 1 − λI S ; X µ ∞ ( ∑ t−1 ) [ −ρdt·T ( T )] −ρdt·t t e e I S ; X Se + P [T > 1] E e u A ,X − λ t=1 Suppose the term: e −ρdt ( 0) ] [ ( 1 ) 1 e P [T = 1] E u A , X T = 1 − λI S ; X Se (C.8) ∞ ( ) ∑ [ ( )] e−ρdt·t I Set ; X Set−1 ≤ 0 +P [T > 1] E e−ρdt·T u AT , X − λ t=1 Then discard all actions and information after first period will give the DM higher expected utility E [u (A0 ; X )]. This information and action process satisfies this theorem. Therefore, WLOG we assume Equation (C.8), as well as all continuation payoffs are non-negative. Then: ET ,AT ∞ ( ∑ t−1 ) [ −ρdt·T ( T )] −ρdt·t t e e u A ,X − λ e I S ; X Se t=0 [ ( ) ] =P [T = 0] E u A0 , X T = 0 ( ) [ ( ) ] + P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ [ −ρdt·T + P [T > 1] E e ( T u A ,X )] −λ ∞ ∑ e −ρdt·t ( t−1 ) t e I S ; X Se t=1 [ ( ) ] =P [T = 0] E u A0 , X T = 0 ( ) [ ( ) ] + P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ ∞ ( ∑ t−1 ) [ −ρdt·T ( T )] −ρdt·t t e + P [T > 1] E e u A ,X − λ e I S ; X Se t=1 102 [ ( ) ] =P [T = 0] E u A0 , X T = 0 ( ) [ ( ) ] + P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ ( ) [ ( ) ] + P [T = 2] E e−2ρdt u A2 , X T = 2 − e−ρdt λI Se1 ; X Se0 ∞ ( ) ∑ [ ( )] + P [T > 2] E e−ρdt·T u AT , X − λ e−ρdt·t I Set ; X Set−1 t=1 [ ( ) ] ≤P [T = 0] E u A0 , X T = 0 ( ) [ ( ) ] + P [T = 1] E e−ρdt u A1 , X T = 1 − λI Se0 ; X µ ( ) [ ( ) ] + P [T = 2] E e−ρdt u A2 , X T = 2 − λI Se1 ; X Se0 ∞ ( ∑ t−1 ) [ −ρdt·(T −1) ( T )] −ρdt·(t−1) t e + P [T > 2] E e u A ,X − λ e I S ; X Se t=1 [ ( ) ] =P [T = 0] E u A0 , X T = 0 ( ) [ −ρdt ( T ) ] 0 e1 e + P [T = 1, 2] E e u A , X T = 1, 2 − λI S , S ; X µ ∞ ( ∑ t−1 ) [ −ρdt·(T −1) ( T )] −ρdt·(t−1) t e + P [T > 2] E e u A ,X − λ e I S ; X Se t=1 ≤··· ] [ ( ) =P [T = 0] E u A0 , X T = 0 ( ) [ ] + P [T > 0] E e−ρdt u(AT , X ) − λI Se0 , Se1 , . . . 
; X µ ] [ ( ) ≤P [T = 0] E u A0 , X T = 0 ) [ ] ( + P [T > 0] E e−ρdt u(AT , X ) − λI AT ; X µ ) [ ] ( ≤P [T = 0] F (µ) + P [T > 0] E e−ρdt u(AT , X ) − λI AT ; X µ By definition of information measure: ) ( I AT ; X µ [ ( )] =H(µ) − EAt =AT H µ′ At ( [ ]) =P [T = 0] (H(µ) − H(µ)) + P [T > 0] H(µ) − EAt =AT H(µ′ At )T > 0 ) ( =P [T > 0] I AT ; X µ Therefore: ET ,AT ∞ ( ) ∑ [ −ρdt·T ( T )] e u A ,X − λ e−ρdt·t I Set ; X Set−1 t=0 )) ( [ ( )] ( ≤P [T = 0] F (µ) + (1 − P [T = 0]) E e−ρdt u AT , X − λI AT ; X µ { } [ −ρdt ] ≤ max F (µ), sup E e u(A, X ) − λI(A; X ) A Therefore, we showed that any dynamic information acquisition strategy solving Equation (1) will have weakly lower expected utility level than a static information acquisition 103 strategy solving Equation (12). On the other hand, any solution to Equation (12) will be a dynamic information acquisition strategy with only one period non-degenerate information, which will be dominated by solution to Equation (1). 104
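To complement the proofs of Theorem 7 and Corollary 8 above, here is a minimal numerical sketch of the static solution $V = \mathrm{Co}\!\left(F + \frac{m}{c} H\right) - \frac{m}{c} H$ on a belief grid. The payoff lines, the entropy-based $H$, and the ratio $m/c$ are illustrative assumptions of the sketch; the concave-envelope routine is a standard upper-hull construction, not an algorithm from the paper.

```python
import numpy as np

# --- Illustrative primitives (assumptions for this sketch, not the paper's calibration) ---
m_over_c = 0.5                                # the ratio m/c appearing in Theorem 7

def H(mu):
    """Binary-state entropy."""
    mu = np.clip(mu, 1e-9, 1 - 1e-9)
    return -(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

def F(mu):
    """Piecewise-linear convex stopping payoff: upper envelope of two affine action payoffs."""
    return np.maximum(1.0 * mu - 0.2, -1.0 * mu + 0.8)

def upper_concave_envelope(x, y):
    """Concave majorant of the sampled points (x_i, y_i), evaluated back on the grid x."""
    hull = []
    for p in zip(x, y):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the chord from hull[-2] to p
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    hx, hy = zip(*hull)
    return np.interp(x, hx, hy)

mu = np.linspace(0.0, 1.0, 1001)
Fv = F(mu)
G = Fv + m_over_c * H(mu)                                  # G = F + (m/c) H
V = upper_concave_envelope(mu, G) - m_over_c * H(mu)       # Theorem 7: V = Co(G) - (m/c) H

print(V[500], Fv[500])    # value with information acquisition vs. acting immediately at mu = 0.5
```

On each interval $I_n = (a_n, b_n)$ where the hull lies strictly above $G = F + \frac{m}{c} H$, the value is attained by a binary experiment with posteriors at the endpoints $a_n, b_n$, which corresponds to the linear piece $L_n$ constructed in the proof of Theorem 7.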