Random Intertemporal Choice
Jay Lu
Kota Saito
UCLA
Caltech∗
March 21, 2016
Abstract
We provide a theory of random intertemporal choice. Choice is random due to unobserved heterogeneity in discounting from the perspective of a modeler. First, we show that
the modeler can identify the distribution of discount rates uniquely from random choice.
We then provide axiomatic characterizations of random discounting utility models, including exponential and quasi-hyperbolic discounting as special cases. Finally, we test our
axioms using recent experimental data. We find that random exponential discounting is
not rejected and the distribution of discount rates is statistically indistinguishable across
decision times.
1 Introduction
In many economic models of intertemporal choice, agents have discount rates that are heterogeneous across the population. Theoretically, this heterogeneity is useful as it allows
the modeler to capture aspects of the data that cannot be explained assuming a standard
representative agent model.1 Empirically, this heterogeneity is realistic and consistent
∗ We would like to thank Taisuke Imai for his excellent RA work on the data analysis. We also want to thank Jose Apesteguia, Miguel Ballester, Yoram Halevy, Yoichiro Higashi, Vijay Krishna, Tomasz Strzalecki, Charlie Sprenger and participants at the D-Day conference at Duke and the LA theory bash for their helpful comments.
1 For example, see Krusell and Smith (1998).
with what is observed in actual choice data. In many cases, however, the modeler (an outside observer such as an econometrician) is not privy to all the different factors affecting discount rates across the population. Whenever there is unobserved heterogeneity in discounting, the resulting aggregate choice data from the perspective of the modeler is random.
In this paper, we provide a theory of random intertemporal choice due to unobserved
heterogeneity in discounting. Our contributions are threefold. First, we show that the
modeler can identify the distribution of discounting in the population uniquely from random choice. Second, we provide axiomatic characterizations of random discounting, including random exponential and quasi-hyperbolic discounting as special cases. Third, using
the data from Halevy (2015), we test our axioms and find that they are consistent with
random exponential discounting. Moreover, while Halevy (2015) finds significant time-inconsistency and non-stationarity in individual choices, we find that the distribution of
discount rates is statistically indistinguishable across decision times. This demonstrates
how models of random discounting can deliver insights not easily captured by models of
individual deterministic choice.
Our general model is a random utility maximization model with discounted utilities.
Agents in a population are characterized by a distribution µ of discount functions and
a taste utility u. In other words, the only residual unobserved heterogeneity from the
perspective of the modeler is due to discounting.2 The probability that an infinite-period
consumption stream f = (f(0), f(1), . . . ) is chosen from a menu F of consumption streams
is the proportion of agents who rank f higher than every other consumption stream in F .
In other words, if we let ρF (f ) denote this probability, then
ρF(f) = µ{D ∈ D | ∑t D(t)u(f(t)) ≥ ∑t D(t)u(g(t)) for all g ∈ F},
where D is the set of discount functions that are decreasing in t, with D(0) = 1 and ∑s>t D(s) → 0 as t → ∞. We call this a random discounting model.
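To make the representation concrete, it can be simulated directly. The sketch below is a hypothetical finite-horizon setup, not the paper's infinite-horizon model: streams are given directly as utility sequences (u(f(0)), u(f(1)), . . . ), and µ is taken to be one illustrative discounting distribution, exponential with δ drawn uniformly from (0, 1).

```python
import random

random.seed(0)

def rho(f_utils, menu, n_draws=100_000):
    """Monte Carlo estimate of rho_F(f): the measure of discount-function
    draws under which f's discounted utility weakly beats every stream in
    the menu F. Illustrative mu: D(t) = delta**t, delta ~ Uniform(0, 1)."""
    T = len(f_utils)
    wins = 0
    for _ in range(n_draws):
        delta = random.random()
        D = [delta ** t for t in range(T)]
        def value(g):
            return sum(d * u for d, u in zip(D, g))
        if value(f_utils) >= max(value(g) for g in menu) - 1e-12:
            wins += 1
    return wins / n_draws

early = [0.5, 0.0]  # utility 0.5 today, nothing tomorrow
late = [0.0, 1.0]   # nothing today, utility 1.0 tomorrow
p = rho(late, [early, late])  # the late stream wins exactly when delta >= 0.5
```

Because δ is uniform here, the delayed stream is chosen with probability close to 1/2: ρF(f) is literally the µ-measure of discount functions under which f is optimal in F.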
Our first main result shows that under a random discounting model, the modeler can uniquely identify the distribution of discounting in the population. Moreover, this identification can be achieved using only binary choice data. This extends related uniqueness results of random utility representations to our setup with infinite-period consumption streams.
2 We can easily generalize our model to accommodate unobserved tastes as well.
Our second main result is an axiomatic characterization of our model. We introduce
three new axioms: Initial Determinism, Time Monotonicity, and Stochastic Impatience.
Initial Determinism requires that choice must be deterministic when all consumption
streams differ only at time 0. Time Monotonicity requires that consumption streams that
dominate at every time period must be chosen for sure. Stochastic Impatience requires
that when a menu consists of early and delayed consumption streams, the early streams
must be chosen for sure. We show that these three axioms along with the standard axioms
for random utility representations fully characterize the random discounting model.
By adding two new axioms, Stochastic Stationarity and Intertemporal Extremeness,
we characterize a special case of the random discounting model: random exponential discounting. In a random exponential discounting model, every agent in the population is
an exponential discounter. In other words, for each D in the support of µ, there exists
δ ∈ (0, 1) such that
D(t) = δ^t.
Stochastic Stationarity requires that choice probabilities remain unchanged when all consumption streams in a menu are delayed by the same number of time periods. It is exactly
the stochastic version of the deterministic stationarity axiom originally proposed by Koopmans (1960) which requires that intertemporal choices are not reversed when consumptions are delayed by the same number of time periods. Our second axiom, Intertemporal
Extremeness is novel and requires that when faced with a consumption stream and two
appropriately delayed streams, either the earliest or the latest stream will be chosen for
sure. It is the intertemporal analog of Extremeness proposed by Gul and Pesendorfer
(2006) (see Appendix A for further discussion on this relationship).
By weakening Stochastic Stationarity, we axiomatize a model of random quasi-hyperbolic
discounting. In a random quasi-hyperbolic discounting model, every agent in the population is a quasi-hyperbolic discounter. In other words, for each D in the support of µ, there exist β ∈ [0, 1] and δ ∈ (0, 1) such that, for all t > 0,
D(t) = βδ^t.
Weak Stochastic Stationarity requires Stochastic Stationarity to hold only for consumption
streams that agree at time 0. Analogous to the deterministic quasi-hyperbolic discounting
model, random quasi-hyperbolic discounting allows for violations of Stochastic Stationarity
when comparing immediate to future consumptions.
Our last main result involves testing our axioms using the data from Halevy (2015).
Of the axioms for random exponential discounting, the available data allow us to test
Stochastic Stationarity, Stochastic Impatience, and Time Monotonicity. We find that all
three are not rejected. The fact that Stochastic Stationarity holds is surprising given
that Halevy found that around 40% of individuals in the experiment violate deterministic
stationarity.
Finally, we elicit the distribution of discount factors. Since in Halevy’s experiments,
the same subjects were asked questions at two different time periods, we were able to
elicit two distributions of discount factors, one for each decision time. We find that the
distributions are statistically indistinguishable across decision times. This is remarkable
given Halevy’s finding that around half of the individuals in the experiment are time-inconsistent and make different choices at the two decision times. These results suggest
the potential usefulness of our model in terms of prediction; while individual choice data
may be highly inconsistent, a theory of random choice may reveal patterns of choice at the aggregate level that are useful for inference by the modeler.
We are not the first to find stable discount distributions across decision times. In
a large field experiment conducted over two years, Meier and Sprenger (2015) elicited
time preferences using incentivized choice experiments. They found that the aggregate distributions of discount factors and the proportion of present-biased individuals were unchanged over the two years. This suggests that our finding on the stability of
discount distributions may be robust in other settings as well.
On the theoretical side, the closest papers to ours are Gul and Pesendorfer (2006)
and Lu (2015). While they do not model intertemporal choice, from a technical perspective, we generalize their results to a richer domain with an infinite-dimensional product
space. This extension is necessary in order for us to deal with stochastic choice over the
standard domain for intertemporal preferences, i.e., the set of infinite-period consumption
streams. Furthermore, our axiomatic characterizations for the random exponential and
quasi-hyperbolic discounting models are novel and address issues unique to intertemporal
choice.
Using a different primitive, Higashi et al. (2009) also provide a model of random discounting which includes random exponential discounting as a special case. Their primitive
consists of an ex-ante menu preference reflecting the agent’s anticipation of future uncertainty in discount rates. In contrast, our primitive consists of ex-post random choice. More
recently, Higashi et al. (2016) propose a behavioral definition of being more or less impatient using their ex-ante menu preference primitive.
Pennesi (2015) provides an axiomatization of an intertemporal version of Luce’s model.
As in Luce’s model, the probability that an agent chooses a consumption stream from
a menu of streams is equal to the ratio of the utility of the stream over the sum of
the utilities of the streams in the menu, where each utility is evaluated according to
exponential discounting. Pennesi (2015) also provides a generalization which accounts for
quasi-hyperbolic discounting.
Apesteguia and Ballester (2015) analyze the validity of stochastic choice models in
intertemporal and risky choice. They show the possibility of a fundamental problem
arising in the standard application of random utility models. Our models are free of their
criticism as our random discounting model belongs to the class of what they call random
parameter models.3 In a more recent paper, Apesteguia and Ballester (2016) propose an
axiomatic model of stochastic choice in which the collection of utility functions satisfies
the single-crossing property.
The rest of the paper is organized as follows. Section 2 introduces our model and provides the main identification result. Section 3 provides the axioms for random discounting. Section 4 provides the axioms for the special cases of random exponential and quasi-hyperbolic discounting. Section 5 provides our analysis of comparative statics. In Section 6, we test our axioms using the data from Halevy (2015) and show that the discount distribution is stable across decision times. Unless otherwise stated, all proofs are in the appendices.
3 This is demonstrated by our comparative statics in Section 5.
2 Model

2.1 Primitives and Notation
We consider agents choosing an infinite-period stream of consumption lotteries. Let time
be denoted by T := {0, 1, 2, . . . }, that is, the set of all nonnegative integers. Let X be
some finite set of prizes. The consumption at each time period is given by a lottery in ∆X.
A consumption stream corresponds to a sequence of consumption lotteries f ∈ (∆X)T .4
Let H be the set of all possible consumption streams. While the use of lotteries to model
consumption allows us to obtain a straightforward characterization, any metric space would
also suffice.
A menu is a finite set of consumption streams. Let K be the set of all menus of consumption streams. We endow H with the product topology and K with the corresponding
Hausdorff metric.5 Given any menu F ∈ K, we let extF denote the extreme points of F .
Choice data in our model is a random choice rule (RCR) that specifies a choice distribution over consumption streams for every menu F ∈ K. Let ∆H be the set of all
measures over consumption streams and endow it with the topology of weak convergence.
Formally, a RCR is a function ρ : K → ∆H such that ρF (F ) = 1. We use the notation
ρF (f ) to denote the probability that consumption stream f will be chosen in the menu F .
4 While in our primitive, consumption lotteries are independent across time, our results would still hold if we adopted a primitive that allowed for temporal correlations of lotteries and assumed that agents are indifferent to randomization. We could accommodate preference for randomization by adopting a more general model such as a random intertemporal version of Saito (2015).
5 The product topology corresponds to point-wise convergence in that fk → f if fk(t) → f(t) for all t ∈ T. The corresponding metric can be defined as d(f, g) := ∑t (1/2^t) · ‖f(t) − g(t)‖/(1 + ‖f(t) − g(t)‖), where ‖·‖ is the Euclidean norm in ∆X. Alternatively, using uniform convergence would have resulted in a continuity axiom that would be too weak. Of course, if T is finite, then both notions of convergence agree.
As in standard models of deterministic choice, we allow for indifferences by relaxing the
restriction that all choice probabilities are specified. In other words, we allow the RCR to
be silent about choice probabilities between indifferent streams in a menu.6 Let K0 ⊂ K
denote the set of menus without indifferences.
2.2 Representations
We now describe the general model. Call D : T → [0, 1] a discount function if it is decreasing, D(0) = 1, and ∑s>t D(s) → 0 as t → ∞. Let D be the set of all discount functions.
Definition (Random Discounting Representation). ρ is said to have a Random Discounting Representation if there exists a probability measure µ on D and a vN-M function u on ∆X such that for all F ∈ K and f ∈ F,
ρF(f) = µ{D ∈ D | ∑t∈T D(t)u(f(t)) ≥ ∑t∈T D(t)u(g(t)) for all g ∈ F}.
We say that µ is regular if the random utilities of two consumption streams are either
always or never equal. In other words, if ties occur, then they occur almost surely.7
Regular distributions are dense in the set of all distributions. They relax the standard restriction in traditional random utility models that utilities are never equal, which allows us to handle indifferences. Going forward, we only consider regular µ and call
it a discounting distribution. If ρ has a Random Discounting Representation, we say that
it is represented by some (µ, u). The following shows that the discounting distribution can
be uniquely identified by only looking at binary choices over consumption streams.
Theorem 1. Let ρ and τ be represented by (µ, u) and (ν, v) respectively. Then the following
are equivalent.
(1) ρ{f,g}(f) = τ{f,g}(f) for all f, g ∈ H
6 Formally, indifferences correspond to non-measurability with respect to a σ-algebra H on H. Given any menu F, the corresponding choice distribution ρF is a measure on the σ-algebra generated by H ∪ {F}. Without loss of generality, we let ρ denote the outer measure with respect to this σ-algebra. See Lu (2015) for details.
7 Formally, this means that for all z ∈ [0, 1]^T, D · z = 0 occurs with µ-measure zero or one.
(2) ρ = τ
(3) (ν, v) = (µ, αu + β) for some α > 0 and β ∈ R.
Proof. See Appendix B.1.
In a Random Discounting Representation, randomness in choice is driven by the
stochasticity of discounting. Each choice realization corresponds to a realization of the
discount function D ∈ D of an agent in the population. Note that for simplicity, our
model assumes that the vN-M utility u is the same across agents. In other words, the only
unobserved heterogeneity is due to discounting. This can be easily generalized so that u
is random as well, in which case, we would obtain a more general representation that is
characterized by a joint distribution over both discount functions and vN-M utilities.
While we have focused on the population interpretation of this random utility model,
one could alternatively use our model to describe the random choice of a single agent
choosing consumption streams repeatedly over time. In this case, the richness of our
model even allows us to accommodate learning by the agent. For example, suppose D(t) = E[δ(1) · · · δ(t) | δ(1)], where δ(t) is the actual discount factor at time t ∈ T. In this case,
the model describes an agent who learns about future discount factors from the realization
of the current discount factor δ (t). If δ corresponds to interest rates for instance, then the
agent updates about future interest rates given today’s interest rate.
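For instance, if one assumes (purely for illustration; this process is not specified in the paper) that δ(1), δ(2), . . . are i.i.d., the conditional expectation collapses to D(t) = δ(1) · E[δ]^(t−1) for t ≥ 1:

```python
def learned_discount(delta1, mean_delta):
    """D(t) = E[delta(1)...delta(t) | delta(1)] when delta(2), delta(3), ...
    are i.i.d. with mean mean_delta and independent of delta(1)."""
    def D(t):
        return 1.0 if t == 0 else delta1 * mean_delta ** (t - 1)
    return D

# An agent who observes a high current discount factor expects
# more patience at every future horizon as well:
D_high = learned_discount(0.95, 0.9)
D_low = learned_discount(0.80, 0.9)
```

The realization of δ(1) shifts the entire discount function, which is how a single draw from µ can encode the agent's updated beliefs about the future.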
Two natural special cases of the representation are the following.
Definition. A discounting distribution µ is
(1) exponential if and only if, µ-a.s., for each t ∈ T,
D(t) = δ^t
for some δ ∈ (0, 1);
(2) quasi-hyperbolic if and only if, µ-a.s., for each t > 0,
D(t) = βδ^t
for some δ ∈ (0, 1) and β ∈ [0, 1].
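The two special cases are easy to transcribe; the parameter values used in any illustration below are arbitrary:

```python
def exponential_D(delta):
    """Exponential: D(t) = delta**t for all t."""
    return lambda t: delta ** t

def quasi_hyperbolic_D(beta, delta):
    """Quasi-hyperbolic: D(0) = 1 and D(t) = beta * delta**t for t > 0."""
    return lambda t: 1.0 if t == 0 else beta * delta ** t
```

With β = 1 the two coincide; β < 1 uniformly downweights every future period relative to today, which is exactly the present-bias wedge that distinguishes the two models.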
3 Axioms for Random Discounting
We now introduce the axioms for our model. The first set of axioms are standard.
Axiom (Monotonicity). For any F, G ∈ K, if G ⊂ F , then ρG (f ) ≥ ρF (f ).
Axiom (Linearity). For any F ∈ K, f ∈ F, g ∈ H, and a ∈ (0, 1),
ρF(f) = ρaF+(1−a)g(af + (1 − a)g).
Axiom (Extremeness). For any F ∈ K, ρF (extF ) = 1.
Axiom (Continuity). ρ : K0 → ∆H is continuous.
Axiom (Nondegeneracy). ρF (f ) < 1 for some F and f ∈ F .
In our paper, the only residual unobserved heterogeneity from the perspective of the
modeler is due to discounting. If the consumption streams are different only at period 0,
then the choices over such consumption streams must be deterministic. This requirement
is formalized by the following axiom.
Axiom (Initial Determinism). For any F ∈ K and any f, g ∈ F, if f(t) = g(t) for all t > 0, then ρF(f) ∈ {0, 1}.
We now introduce some useful notation. Given any two consumption streams f and g and a time period t ∈ T, define the spliced consumption stream f tg by
f tg(s) = f(s) if s < t, and f tg(s) = g(s − t) if s ≥ t.
Thus, f tg is the consumption stream that is f up to period t − 1 and then restarts with g from t onwards. In other words,
f tg = (f(0), f(1), . . . , f(t − 1), g(0), g(1), . . . ).
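Viewing a stream as a map from periods to consumptions, the splice f tg has a one-line transcription (the stand-in streams below are hypothetical, recording only which stream and which period is being consumed):

```python
def splice(f, g, t):
    """The spliced stream f t g: follow f for periods 0, ..., t-1,
    then restart with g from period t onward."""
    return lambda s: f(s) if s < t else g(s - t)

f = lambda s: ("f", s)  # stand-in stream f
g = lambda s: ("g", s)  # stand-in stream g
h = splice(f, g, 3)     # (f(0), f(1), f(2), g(0), g(1), ...)
```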
For F = {f1, . . . , fk} and G = {g1, . . . , gk}, we can also define the spliced menu
F tG := ∪i=1,...,k {fi tgi}.
Note that the notation F tG is well-defined if either F or G is a singleton. Also note that the sequence of menus F tg converges to the menu F as t → ∞ under the product topology. By the Continuity axiom, ρF tg → ρF. In other words, only consumptions in finite time matter.
For any lottery p ∈ ∆X, we let p ∈ H denote the constant consumption stream that
yields p every period. Given Initial Determinism and the fact that the set of prizes is
finite, we can pin down preferences using time-0 choice data and find a worst consumption stream w ∈ H, where w is a constant consumption stream and, for all f, g ∈ H,
ρ{f 1g, w1g}(f 1g) = 1.
Lemma 2 in the Appendix shows that given the standard axioms, the worst consumption
stream is well-defined. We can now define our next axiom.
Axiom (Time Monotonicity). For all F ∈ K and f ∈ F, if ρF(t)1w(f(t)1w) = 1 for all t ∈ T, then ρF(f) = 1, where F(t)1w := {g(t)1w | g ∈ F}.
Time Monotonicity says that if the consumption at every time period of a stream is
the best in a menu, then that stream must be chosen for sure. It is the natural temporal
analog of standard monotonicity axioms.
Finally, we define delayed consumptions. For any f ∈ H and t ∈ T, let f^t := wtf. Hence, f^t is a consumption stream that consists of f delayed by t periods, with w at the beginning. In other words,
f^t = (w, . . . , w, f(0), f(1), f(2), . . . ),
where w is repeated t times.
Stochastic Impatience below states that earlier streams are always chosen over delayed
ones.
Axiom (Stochastic Impatience). For any f ∈ H and t ∈ T, ρ{f, f^t}(f) = 1.
We are now ready to state our general representation theorem.
Theorem 2. ρ has a Random Discounting Representation if and only if it satisfies Monotonicity, Linearity, Extremeness, Continuity, Nondegeneracy, Initial Determinism, Time
Monotonicity and Stochastic Impatience.
Proof. See Appendix B.2.
The proof of Theorem 2 consists of two steps. First, we use standard arguments to obtain a representation for menus consisting of streams that yield non-worst consumptions in only a finite number of time periods. We then use Kolmogorov’s Extension Theorem and Continuity to obtain the representation for all menus.
4 Random Exponential and Quasi-Hyperbolic Discounting

In order to characterize the random exponential discounting model, we need two more axioms. For any F ∈ K and t ∈ T, define F^t := ∪f∈F {f^t}.
Axiom (Stochastic Stationarity). For any F ∈ K, f ∈ F, and t ∈ T,
ρF(f) = ρF^t(f^t).
This is the random choice version of the deterministic stationarity axiom as proposed by
Koopmans (1960). However, we need something more for characterizing a representation of
random exponential discounting (see the example below). First, define a forward consumption stream f^{−1} by (f^{−1})^1 = f. Note that f^{−1} is well defined if and only if f(0) = w.
Axiom (Intertemporal Extremeness). For all f, g, h ∈ H and a ∈ [0, 1], if f = ag^{−1} + (1 − a)w and g = ah^{−1} + (1 − a)w, then
ρ{f,g,h}({f, h}) = 1.
To understand Intertemporal Extremeness, note that there are two aspects to intertemporal choices: the level of consumption and the timing of consumption. The condition
g = ah^{−1} + (1 − a)w (or f = ag^{−1} + (1 − a)w) implies that there is a trade-off between
these two aspects when an agent is choosing between g and h (or f and g). By choosing
g over h (or f over g), the agent consumes one period earlier but his level of consumption
decreases due to the mixing with the worst outcome. In this sense, one can think of f and
h as the extreme choices; f is the best in terms of consumption timing but worst in terms
of consumption level while h is the best in terms of consumption level but worst in terms
of consumption timing. Stream g is in between f and h and is the intermediary choice.
Intertemporal Extremeness states that g will never be chosen.
Technically, Intertemporal Extremeness is the intertemporal analog of the extremeness
axiom proposed by Gul and Pesendorfer (2006). In other words, just as linearity by itself is insufficient and requires extremeness for a random expected utility representation,
stationarity by itself is insufficient and requires intertemporal extremeness for a random
exponential representation (see Appendix A for a precise statement of this relationship).
Theorem 3. Let ρ be represented by (µ, u). Then µ is exponential if and only if ρ satisfies
Stochastic Stationarity and Intertemporal Extremeness.
Proof. See Appendix B.3.
We now provide an example of a random discounting representation that satisfies Stochastic Stationarity but is not exponential. This example clarifies the importance of Intertemporal Extremeness. For each ω ∈ [0, 1], define
Dω(t) = exp(−2n) if t = 2n, and Dω(t) = exp(−2n − 1/2 − ω) if t = 2n + 1.
In other words, Dω = (1, exp(−1/2 − ω), exp(−2), exp(−5/2 − ω), exp(−4), exp(−9/2 − ω), . . . ). Let µ be the discounting distribution that is uniform over {Dω | ω ∈ [0, 1]}.
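The structure of Dω that drives the proof sketch below can be checked numerically from its definition: shifting by an even t rescales Dω by exp(−t), while shifting by an odd t yields a positive multiple of D1−ω.

```python
import math

def D(omega, t):
    """D_omega(t) = exp(-2n) if t = 2n, exp(-2n - 1/2 - omega) if t = 2n + 1."""
    n, r = divmod(t, 2)
    return math.exp(-2 * n) if r == 0 else math.exp(-2 * n - 0.5 - omega)

for omega in (0.0, 0.3, 1.0):
    for s in range(6):
        # even shift by 2: the tail equals exp(-2) times D_omega itself
        assert abs(D(omega, 2 + s) - math.exp(-2) * D(omega, s)) < 1e-12
        # odd shift by 1: the tail is a positive multiple of D_{1-omega}
        c = math.exp(-0.5 - omega)
        assert abs(D(omega, 1 + s) - c * D(1 - omega, s)) < 1e-12
```

Since a positive rescaling never affects which of two discounted utilities is larger, even shifts preserve each Dω's ranking, while odd shifts swap the roles of Dω and D1−ω.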
Proposition 1. (i) µ is not exponential but satisfies Stochastic Stationarity; (ii) µ violates
Intertemporal Extremeness.
[Figure 4.1 here: log Dω(t) plotted against t = 0, 1, . . . , 4, showing log D0 and log D1.]
Figure 4.1: Non-exponential µ which satisfies Stochastic Stationarity
Proof. See Appendix B.4.
The formal proof is in the Appendix. Here, we provide the sketch as to why µ satisfies
Stochastic Stationarity. If t is even, then (Dω(t), Dω(t + 1), . . . ) = exp(−t)(Dω(0), Dω(1), . . . ). Then, for any f, g ∈ H, Dω · (u ◦ f^t) ≥ Dω · (u ◦ g^t) ⇔ Dω · (u ◦ f) ≥ Dω · (u ◦ g). Note that this equivalence is captured in Figure 4.1 by the fact that the slope of log Dω is the same between periods 0 to 1 and periods 2 to 3, for example. Therefore, when t
is even, each realization Dω predicts no violation of the deterministic stationarity axiom.
If t is odd, then (Dω(t), Dω(t + 1), . . . ) = exp(−t + 1/2 − ω)(D1−ω(0), D1−ω(1), . . . ), a positive multiple of the latter sequence. Then, for any f, g ∈ H, Dω · (u ◦ f^t) ≥ Dω · (u ◦ g^t) ⇔ D1−ω · (u ◦ f) ≥ D1−ω · (u ◦ g). Note that this
equivalence is captured in Figure 4.1 by the fact that the slope of log D1 between periods
from 0 to 1 is the same as the slope of log D0 between periods from 1 to 2, for example.
Therefore when t is odd, Dω predicts a violation of the deterministic stationarity axiom if
and only if D1−ω predicts the opposite direction of the violation. The two reversals cancel
each other out and “on average” Stationarity is satisfied, which implies that Stochastic
Stationarity is satisfied.
For the random quasi-hyperbolic discounting model, we need to weaken Stochastic
Stationarity. In particular, suppose that Stochastic Stationarity holds only for the menus
in which all consumptions at time 0 are the same.
Axiom (Weak Stochastic Stationarity). For any F ∈ K, f ∈ F, and t ∈ T, if g(0) = h(0) for all g, h ∈ F, then
ρF(f) = ρF^t(f^t).
The deterministic version of this axiom has appeared in Hayashi (2003) and Olea and
Strzalecki (2014). In our model, Weak Stochastic Stationarity along with Intertemporal Extremeness exactly characterizes random quasi-hyperbolic discounting.
Theorem 4. Let ρ be represented by (µ, u). Then µ is quasi-hyperbolic if and only if ρ
satisfies Weak Stochastic Stationarity and Intertemporal Extremeness.
Proof. See Appendix B.3.
5 Comparative Statics
We now present some comparative statics for our random discounting model. First, consider two RCRs ρ and τ with worst consumption streams wρ and wτ respectively. A
consumption stream f is one-shot under ρ if f (t) = wρ for all t > 0 and ρ{wρ ,f } (wρ ) = 0.
In other words, f gives the ρ-worst payoff for all future time periods and something strictly
better than the ρ-worst payoff initially. We say one RCR is stochastically more patient than another if the probability that the first chooses a delayed one-shot consumption stream is always at least the corresponding probability for the second.
Definition. ρ is stochastically more patient than τ if, for any f and g that are one-shot under ρ and τ respectively, any s > t, and any a ∈ [0, 1],
ρ{f^s, af^t + (1 − a)wρ}(f^s) ≥ τ{g^s, ag^t + (1 − a)wτ}(g^s),
where wρ and wτ are the worst streams for ρ and τ respectively.
Given two discounting distributions µ and ν, let µ ≽ ν denote the fact that for all s, t ∈ T with s > t, the distribution of D(s)/D(t) under µ first-order stochastically dominates (FOSD) its distribution under ν. This exactly captures the ordering of distributions of discount factors according to the level of patience. We now have the following result.
Proposition 2. Let ρ and τ be represented by (µ, u) and (ν, v) respectively. Then µ ≽ ν if and only if ρ is stochastically more patient than τ.
Proof. Consider the constant consumption streams r, p ∈ H and let f = r1wρ and g = p1wτ. Now, for s > t,
ρ{f^s, af^t + (1 − a)wρ}(f^s) ≥ τ{g^s, ag^t + (1 − a)wτ}(g^s)
⇔ µ{D ∈ D | D(s)u(r) ≥ aD(t)u(r)} ≥ ν{D ∈ D | D(s)v(p) ≥ aD(t)v(p)}
⇔ µ{D ∈ D | D(s) ≥ aD(t)} ≥ ν{D ∈ D | D(s) ≥ aD(t)},
where the last line follows from the fact that u(r) > 0 and v(p) > 0 because ρ{wρ,f}(wρ) = τ{wτ,g}(wτ) = 0.
Note that this immediately implies the following result that allows the modeler to
compare FOSD of exponential discounting distributions using random choice.
Corollary 1. Let ρ and τ be represented by (µ, u) and (ν, v) respectively where both µ and
ν are exponential. Then µ FOSD ν if and only if ρ is stochastically more patient than τ .
Proof. Follows immediately from Proposition 2 above.
6 Testing the Axioms
In this section, we report the empirical results from testing our axioms using the data from
Halevy (2015). Our main finding is that the axioms for random exponential discounting
are not rejected. Moreover, since we have choice data from the same population of subjects
at two different time periods, we can elicit the distribution of discount factors at each time
period. Surprisingly, we find that they are statistically indistinguishable across the two
decision times.
In the experiments of Halevy (2015), subjects are asked to choose between a sooner
but smaller consumption and a later but larger consumption at two different time periods
(i.e., week 0, week 4). In the questions, the payoffs and consumption times vary as follows:
At week 0, Question 1: [$10; now] or [$x; 1 week later],
At week 0, Question 2: [$10; 4 weeks later] or [$x; 5 weeks later],
At week 4, Question 3: [$10; now] or [$x; 1 week later],
where x varies from 9.90 to 10.90. In these questions, the smaller payoff is fixed at $10. Halevy also repeated the same questions with all payments scaled up (e.g., the smaller payoff is $100 instead of $10). We also performed the same analysis using the data from these additional questions; our results do not change (see Appendix C).
Note that the difference between Questions 1 and 2 is that the consumptions in the
latter are delayed by exactly four weeks. Hence, the deterministic stationarity axiom would
imply that a subject would choose the sooner but smaller consumption in Question 1 if
and only if he chooses the sooner but smaller consumption in Question 2. Halevy (2015)
found that about 40% of the subjects violate the deterministic stationarity axiom.
Note also that the consumptions in Questions 2 and 3 are the same both in terms of
the size of the prizes and the time period in which they are realized. They are different
only in terms of the decision time of the subjects. A subject is time consistent if he does
not reverse his choices between the two decision times. Halevy (2015) found that about
half of the subjects exhibit time inconsistency.
Halevy (2015) called the above set of experiments “Wave 1”. He also conducted the
same experiment using different subjects at different times in a second set of experiments
which he called “Wave 2”. In total, 130 subjects participated in all the experiments. Among
these subjects, 10 had multiple switching points. As in Halevy (2015), we exclude them
from our analysis.
For each Wave and Question, we test whether the random choice induced by the subjects satisfies the axioms we proposed. In particular, we focus on Stochastic Stationarity,
Stochastic Impatience, and Time Monotonicity since given Halevy’s data, these were the
only axioms we could test. As a robustness check, we also performed the same analysis
using the combined data from Waves 1 and 2. Our results do not change (see Appendix
C for details).
Stochastic Stationarity requires consistency between Questions 1 and 2 as
follows. To simplify notation, let ρ(x, y) denote ρ{x,y} (x). In Questions 1 and 2, Stochastic
Stationarity requires that for any x ∈ {9.9, . . . , 10.9},
ρ([$x; 1 week later], [$10; now]) = ρ([$x; 5 weeks later], [$10; 4 weeks later]). (6.1)
We statistically test (6.1) for Waves 1 and 2, and it is not rejected at the 10% significance
level in both Waves for all values of x (see Appendix C for p-values). This result is
surprising given that at the individual level, Halevy (2015) finds that 40% of the subjects
violate the deterministic stationarity axiom. Halevy (2015) does find that the means of
aggregated choices satisfy deterministic stationarity, and while this is consistent with our
results, our findings are stronger in that they are not implied by his results.
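To illustrate how an equality of choice proportions such as (6.1) can be tested, here is a standard two-sided two-proportion z-test in pure Python. This is only one possible procedure, and it treats the two samples as independent even though the same subjects answer both questions; the tests actually used and the p-values are reported in Appendix C.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test of H0: p1 = p2, given k successes out of n choices
    in each of two conditions. Returns (z, p_value). Assumes the pooled
    proportion is strictly between 0 and 1."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Failing to reject H0 for every value of x is what "Stochastic Stationarity is not rejected" means at the level of aggregate choice frequencies.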
Stochastic Impatience requires the following conditions on the choice data:
ρ([$10; 1 week later], [$10; now]) = 0,
ρ([$10; 5 weeks later], [$10; 4 weeks later]) = 0. (6.2)
In other words, subjects should all prefer earlier to later prizes. We find that (6.2) is not
rejected at the 10% significance level in each Wave (see Appendix C for p-values).
Time Monotonicity together with Stochastic Impatience requires the following conditions on the choice data:
ρ([$9.9; 1 week later], [$10; now]) = 0,
ρ([$9.9; 5 weeks later], [$10; 4 weeks later]) = 0. (6.3)
This provides an indirect test of Time Monotonicity and we find that (6.3) is not rejected
at the 10% significance level in each Wave (see Appendix C for p-values).
In summary, the three axioms that we set out to test, Stochastic Stationarity, Stochastic
Impatience, and Time Monotonicity, are all not rejected at the 10% significance level. Next,
we elicit the distribution of discount factors µ using the data. We assume risk neutrality
and elicit the switching points x at which subjects prefer consuming $x later to consuming a fixed amount immediately. We assume that these switching points correspond to indifference and define each subject's discount factor as follows: if a subject is indifferent between [$10; now] and [$x; 1 week later], we define the subject's discount factor as 10/x.

Figure 6.1: µ0 (blue) and µ4 (green) in the left panel (Wave 1, Base 10); µ1 (blue) and µ5 (green) in the right panel (Wave 2, Base 10). The horizontal axis is the discount factor.
Following this methodology, we elicit the distribution µ according to the week when
the experiment was conducted (i.e., weeks 0 and 4 in Wave 1; weeks 1 and 5 in Wave 2).
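The mapping from switching points to discount factors can be sketched as follows (the switching points below are hypothetical, for illustration only):

```python
# Each subject's switching point x: the amount at which [$x; 1 week later]
# becomes preferred to [$10; now]. Indifference at x implies a discount factor of 10/x.
switching_points = [10.0, 10.1, 10.3, 10.5, 10.9]  # dollars, illustrative only

discount_factors = [10 / x for x in switching_points]
# A switching point of exactly 10.0 implies a discount factor of 1 (no discounting);
# larger switching points imply smaller weekly discount factors.
```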
For each Wave, we test whether the distributions are statistically indistinguishable across
decision times. Using the Kolmogorov-Smirnov test, we test the following hypotheses
separately at the 10% significance level:
µ0 = µ4,   µ1 = µ5,   (6.4)
where µt denotes the distribution elicited from the experiment conducted at week t ∈
{0, 1, 4, 5}. Figure 6.1 displays the four corresponding distributions. Neither equality in (6.4) is rejected (see Appendix C for p-values).
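A self-contained version of this two-sample Kolmogorov-Smirnov test, using the asymptotic critical value (the data below are simulated placeholders, not the elicited discount factors; `scipy.stats.ks_2samp` is the standard library alternative):

```python
import math
import random

def ks_statistic(xs, ys):
    """Two-sample KS statistic: the max gap between the two empirical CDFs."""
    def ecdf(sample, v):
        return sum(1 for s in sample if s <= v) / len(sample)
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in xs + ys)

def ks_reject(xs, ys, alpha=0.10):
    """Reject H0 (equal distributions) at level alpha, asymptotic critical value."""
    n, m = len(xs), len(ys)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(xs, ys) > critical

rng = random.Random(0)
mu_0 = [rng.gauss(0.99, 0.02) for _ in range(57)]  # placeholder week-0 factors
mu_4 = [rng.gauss(0.99, 0.02) for _ in range(57)]  # placeholder week-4 factors
rejected = ks_reject(mu_0, mu_4)
```

On the actual data, `ks_reject` would be applied to the discount factors elicited at the two decision times of each Wave.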
This last finding is of particular interest given Halevy's result that about half of the subjects exhibit time inconsistency. In other words, there is a high degree of preference reversals between the two decision times at the individual level, but when we look at random choice at the aggregate level, this inconsistency disappears. A priori, there is no
reason why the aggregated random choice data should satisfy (6.4). This demonstrates
how models of random discounting can deliver insights not easily captured by models of
individual deterministic choice. It also illustrates potential applications of random choice
models in terms of prediction. For instance, while individual choice data may be highly
inconsistent, a theory of random choice may reveal patterns of choice at the aggregate
level that would be useful for inference for the modeler.
One may wonder whether our findings are consistent with half the subjects in the population being exponential discounters while the rest are present- or future-biased. While this is technically possible, satisfying Stochastic Stationarity would require that the fraction of present-biased subjects be exactly equal to the fraction of future-biased subjects. Moreover, it would also mean that the magnitudes of present- and future-biasedness cancel each other out exactly. While unlikely, this could happen if there were symmetric noise around discount factors in the population. Note that this is exactly the example we consider in Proposition 1. Hence, testing for Intertemporal Extremeness would allow us to determine unequivocally whether all subjects are exponential discounters.
A Appendix: Intertemporal Extremeness
In this section, we argue that Intertemporal Extremeness plays a role analogous to that of Extremeness in the random expected utility model of Gul and Pesendorfer (2006). Let X be a finite set and ∆X be the set of lotteries over X. Let C and C′ be finite sets of lotteries. We say C is a translate of C′ if and only if C = C′ + (p − q) for some p ∈ C and q ∈ C′.8 First, note that in the lottery setup, Stochastic Stationarity is equivalent to Linearity∗, a weaker condition than Linearity.

Axiom (Linearity∗). ρC (f) = ρC′ (f′) if C and f are translates of C′ and f′ respectively.

Clearly, Linearity implies Linearity∗. There are random non-expected utility representations that yield random choice rules that satisfy Linearity∗ but not Extremeness. We now describe one such example. Let X = {x, y}, so we can associate each lottery with a
point p ∈ [0, 1]. Let ω be uniformly distributed on [0, 1] and let
uω (p) := |p − ω| ,
vω (p) := − |p − ω| .
Consider a random utility that puts 1/2 weight on uω and 1/2 weight on vω. To show that this violates Extremeness, let C = {0, 1/2, 1}. Since the mixed lottery 1/2 is never chosen in C under uω, we have that

ρC (1/2) = (1/2) · P { ω ∈ [0, 1] | vω (1/2) ≥ max {vω (0), vω (1)} } = 1/4 > 0.
To show that this satisfies Linearity∗, suppose C = {p1, . . . , pk} with p1 < p2 < · · · < pk. Now, for each pi such that 1 < i < k, we have

ρC (pi) = (1/2) · (pi+1 − pi−1)/2,

which is unchanged if we translate C. For p1, we have

ρC (p1) = (1/2) · (p1 + p2)/2 + (1/2) · (1 − (p1 + pk)/2) = (1/2) · (1 − (pk − p2)/2),

which is again unchanged if we translate C. By a symmetric argument, the same holds for pk as well, so Linearity∗ is satisfied, but this is clearly not a random expected utility model.

8 More explicitly, there exists p ∈ C and q ∈ C′ such that C = {r + p − q | r ∈ C′}.
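This example is easy to verify by Monte Carlo simulation. A sketch, where `rho` estimates the choice rule of the mixture above (the menus are illustrative):

```python
import random

def rho(C, n=200_000, seed=1):
    """Estimate the choice distribution over menu C under the example:
    with prob 1/2 maximize u_w(p) = |p - w|, with prob 1/2 maximize
    v_w(p) = -|p - w|, where w ~ Uniform[0, 1] (ties occur with probability 0)."""
    rng = random.Random(seed)
    counts = dict.fromkeys(C, 0)
    for _ in range(n):
        w = rng.random()
        sign = 1.0 if rng.random() < 0.5 else -1.0
        counts[max(C, key=lambda p: sign * abs(p - w))] += 1
    return {p: c / n for p, c in counts.items()}

# Extremeness fails: the mixed lottery 1/2 is chosen with probability about 1/4.
probs = rho([0.0, 0.5, 1.0])

# Linearity*: an interior lottery's probability, (1/2)(p_{i+1} - p_{i-1})/2,
# is unchanged when the whole menu is translated.
before = rho([0.1, 0.3, 0.6])
after = rho([0.3, 0.5, 0.8], seed=2)  # the same menu translated by 0.2
```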
By imposing Extremeness however, we are able to obtain a random expected utility
representation.
Proposition 3. Let ρ be represented by µ. Then µ is expected utility if and only if ρ satisfies Linearity∗ and Extremeness.
Proof. See Appendix B.5.
In other words, Extremeness provides the additional restrictions to ensure that the
random utilities are linear. By analogy, Intertemporal Extremeness plays the same role in
our model and is hence aptly named.
B Appendix: Proofs
Recall that T = {0, 1, . . . , ∞}. In the following, we will write ρ (f, g) to denote ρ{f,g} (f )
for any f, g ∈ H. For every D ∈ [0, 1]T , f ∈ H, and vN-M utility function u on ∆X, we
use the condensed notation
D · (u ◦ f) := ∑_{t=0}^{∞} D (t) u (f (t))

whenever the limit is well-defined, which may be infinite. Note that this converges for all D ∈ D since ∑_{s>t} D (s) → 0 as t → ∞ and u is bounded since X is finite. Given
consumption streams f, g ∈ H and t ∈ T , recall the spliced consumption stream
f tg (s) = f (s) if s < t, and f tg (s) = g (s − t) if s ≥ t.
When F = {f1 , . . . , fk } and G = {g1 , . . . , gk }, F tG denotes the spliced menu. Note that
this is also well-defined if either F or G is a singleton menu. Finally, recall that for any
p ∈ ∆X, we also let p ∈ H denote the constant consumption stream that yields p every
period.
B.1 Proof of Theorem 1
Let ρ and τ be represented by (µ, u) and (ν, v) respectively. Note that if part (3) is true,
then ρF (f ) = τF (f ) for all f ∈ H from the representation. Moreover, since ρ (f, g) =
ρ (g, f ) = 1 iff τ (f, g) = τ (g, f ) = 1 iff f and g are tied, both RCRs have the same ties
so ρ = τ and part (2) is true. Since part (2) implies part (1) trivially, we have that (3)
implies (2) implies (1).
Hence, all that remains is to prove that part (1) implies part (3). Suppose (1) is
true so ρ{f,g} (f ) = τ{f,g} (g) for all f, g ∈ H. First, note that for any p, q, r ∈ ∆X,
u (p) ≥ u (q) ⇔ µ { D ∈ D | u (p) ≥ u (q)} = 1 ⇔ ρ (p1r, q1r) = 1 ⇔ τ (p1r, q1r) = 1 ⇔
τ { D ∈ D | v (p) ≥ v (q)} = 1 ⇔ v (p) ≥ v (q), so u = αv + β for some α > 0. Without loss of generality, we can let u = v and let w ∈ ∆X be the worst lottery for both ρ and τ. Fix some finite J ⊂ T and let f ∈ H be such that f (t) = w for all t ∉ J. Let p ∈ ∆X
such that u (p) = v (p) = a ∈ [0, 1] and note that

µ { D ∈ D | ∑_{t∈J} D (t) u (f (t)) ≥ a } = ρ (f, p1w) = τ (f, p1w) = ν { D ∈ D | ∑_{t∈J} D (t) v (f (t)) ≥ a }.
Since this is true for all a ∈ [0, 1] and all such f, the distribution of ∑_{t∈J} D (t) z (t) must be the same under µ and ν for all z ∈ [0, 1]J. Note that we can easily extend this to all z ∈ RJ+ by scaling, so by Cramér-Wold, (D (t))t∈J has the same distribution under µ and ν.9 Since this is true for all finite J ⊂ T, by Kolmogorov's Extension Theorem, µ = ν. This proves (3).

9 For each z ∈ RJ+, we can find k ∈ Z++ such that z/k ∈ [0, 1]J. Define µ(D ∈ DJ | D · z ≥ a) := µ(D ∈ DJ | D · (z/k) ≥ a/k). Note that the definition does not depend on k.
B.2 Proof of Theorem 2

B.2.1 Worst Consumption Stream is Well-Defined

We first prove that the worst consumption stream w is well-defined. To do so, we begin with a technical lemma showing that Linearity implies the following invariance.
Lemma 1. If ρ satisfies Linearity, then ρ (p1f, q1f ) = ρ (p1g, q1g) for all p, q ∈ ∆X and
f, g ∈ H.
Proof. Let r := (1/2) p + (1/2) q and note that

(1/2)(p1f) + (1/2)(q1g) = r1((1/2) f + (1/2) g) = (1/2)(p1g) + (1/2)(q1f),
(1/2)(q1f) + (1/2)(q1g) = q1((1/2) f + (1/2) g) = (1/2)(q1g) + (1/2)(q1f).

By Linearity, this implies that

ρ (p1f, q1f) = ρ ((1/2)(p1f) + (1/2)(q1g), (1/2)(q1f) + (1/2)(q1g))
= ρ ((1/2)(p1g) + (1/2)(q1f), (1/2)(q1g) + (1/2)(q1f)) = ρ (p1g, q1g)

as desired.
We can now show that the worst consumption stream w ∈ H is well-defined.
Lemma 2. Suppose ρ satisfies Monotonicity, Linearity, Extremeness, Continuity and Initial Determinism. Then there exists a constant consumption stream w ∈ H such that
ρ (f 1g, w1g) = 1 for all f, g ∈ H.
Proof. Fix some consumption lottery r ∈ ∆X. Consider the random choice rule τ on ∆X
such that for any finite set of lotteries C ⊂ ∆X and p ∈ C,
τC (p) = ρC1r (p1r) .
Note that by Initial Determinism, τ is deterministic. Hence, from Lu (2015), τ can be
represented by a deterministic expected utility u on ∆X. Let w ∈ ∆X be some worst
lottery according to u. Note that w exists as X is finite. Let w ∈ H denote the constant
consumption stream that yields w every period. From Lemma 1, this implies that for any
f, g ∈ H, ρ (f 1g, w1g) = ρ (f 1r, w1r) = τ (f (0) , w) = 1, as desired.
B.2.2 Sufficiency
In order to prove that a Random Discounting Representation exists, we first prove it exists for a subset of menus. For each finite J ⊂ T such that 0 ∈ J, let HJ be the subset of streams such that f (t) = w for all t ∉ J, where the existence of w follows from Lemma 2. Let KJ ⊂ K be the subset of menus that only contain streams in HJ. Hence, we can define a RCR ρJ on KJ such that for all F ∈ KJ and f ∈ F,

ρJF (f) = ρF (f).

By the same argument as in Lu (2015), for every finite J, we can find a measure νJ on ∆J and a vN-M utility u on ∆X such that for every F ∈ KJ and f ∈ F,

ρJF (f) = νJ { p ∈ ∆J | p · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F }.

Note that Initial Determinism and Time Monotonicity imply that this u is fixed and independent of J. We normalize u : ∆X → [0, 1] such that u (w) = 0. Choose f ∈ H such that u (f (t)) = 1 for some t ∈ T and f (s) = w for all s ≠ t. Then by Stochastic Impatience, for any J such that {t, t + 1} ⊂ J, we have

1 = ρ (f, f 1) = ρJ (f, f 1) = νJ { p ∈ ∆J | p (t) ≥ p (t + 1) }.
Hence, p is decreasing ν J -a.s. for all finite J where 0 ∈ J. For any J ⊂ T such that 0 ∈ J,
let DJ ⊂ [0, 1]J be such that D (0) = 1 for all D ∈ DJ. We can define a measure µJ on DJ such that for every F ∈ KJ and f ∈ F,

ρJF (f) = µJ { D ∈ DJ | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F }.

We now extend this representation from any finite J to all of T by using Kolmogorov's Extension Theorem. Hence, we need to check the following consistency condition. Let 0 ∈ S ⊂ J ⊂ T. For any F ∈ KS and f ∈ F,

µS { D ∈ DS | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F } = ρSF (f) = ρF (f) = ρJF (f) = µJ { D ∈ DJ | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F }.

Let f ∈ HS and p ∈ ∆X such that u (p) = a ∈ [0, 1]. Since p1w ∈ HS, we then have

µS { D ∈ DS | D · (u ◦ f) ≥ a } = µJ { D ∈ DJ | D · (u ◦ f) ≥ a }.

In other words, for all z ∈ [0, 1]S, the distribution of D · z under µS is the same as that under µJ. As in the proof of Theorem 1, we can easily extend this to all z ∈ RS+, so by Cramér-Wold, it must be that µS is exactly the projection of µJ on DS. Formally, if we let χJS : DJ → DS be the projection mapping from DJ to DS, then

µS = µJ ◦ χJS−1.

Hence, from Kolmogorov's Extension Theorem, we know there exists a measure µ on DT such that for any finite J ⊂ T, F ∈ KJ and f ∈ F,

ρF (f) = µJ { D ∈ DJ | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F } = µ { D ∈ DT | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F }.
We now need to generalize the representation to all F ∈ K. First, for every f ∈ F ∈ K and finite t ∈ T, define the following two sets of maximizing discount functions:

N (f, F) := { D ∈ DT | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F },
N t (f, F) := { D ∈ DT | D · (u ◦ (f tw) − u ◦ (gtw)) ≥ 0 for all g ∈ F }.
Note that N (f, F ) is well-defined only if D · (u ◦ f − u ◦ g) is well-defined for all f, g ∈ F .
Lemma 3. Suppose D · (u ◦ f − u ◦ g) is well-defined for all f, g ∈ F and D ∈ DT . Then
(1) ρF (f) = µ (N (f, F)) for all f ∈ F,

(2) µ { D ∈ DT | D · (u ◦ f − u ◦ g) = 0 } ∈ {0, 1} for all f, g ∈ F.
Proof. We first show that if the premise holds, then ρF (f ) ≤ µ (N (f, F )). In order to
show this, we prove that lim sup_t 1_{N t (f,F)} (D) ≤ 1_{N (f,F)} (D) for all D ∈ DT. Suppose lim sup_t 1_{N t (f,F)} (D) = 1, so for any t ∈ T, we can find some t′ > t where D ∈ N t′ (f, F), or

∑_{s≤t′} D (s) · (u (f (s)) − u (g (s))) ≥ 0

for all g ∈ F. Since D · (u ◦ f − u ◦ g) is well-defined for all f, g ∈ F and D ∈ DT, this implies that

D · (u ◦ f − u ◦ g) = lim_t ∑_{s≤t} D (s) · (u (f (s)) − u (g (s))) ≥ 0

for all g ∈ F, so D ∈ N (f, F). Hence, lim sup_t 1_{N t (f,F)} (D) ≤ 1_{N (f,F)} (D). By Fatou's Lemma, lim_t ρF tw (f tw) = lim_t µ (N t (f, F)) ≤ ∫DT lim sup_t 1_{N t (f,F)} (D) µ (dD) ≤ ∫DT 1_{N (f,F)} (D) µ (dD) = µ (N (f, F)). Since F tw → F, by Continuity, this implies that

ρF (f) = lim_t ρF tw (f tw) ≤ µ (N (f, F))   (B.1)
as desired.
Before completing the proof of part (1), we will now prove part (2). Fix f, g ∈ F
and note that if f and g are tied, then from equation (B.1), we have 1 = ρ (f, g) ≤
µ (N (f, {f, g})) and 1 = ρ (g, f) ≤ µ (N (g, {f, g})), so µ { D ∈ DT | D · (u ◦ f − u ◦ g) = 0 } = 1.
Now, suppose f and g are not tied. Let r ∈ ∆X be such that u (r) = 1. By Linearity, we can assume without loss of generality that (1/2) u (f (0)) + (1/2) u (g (0)) < u (r). For any ε > 0, let pε ∈ ∆X be such that u (pε) = (1/2) u (f (0)) + (1/2) u (g (0)) + ε and define hε ∈ H such that hε (0) = pε and hε (t) = (1/2) f (t) + (1/2) g (t) for all t > 0. Now, for all D ∈ DT,

D · (u ◦ f − u ◦ hε) = D · (u ◦ f − u ◦ ((1/2) f + (1/2) g) − (ε, 0, 0, . . .)) = (1/2) D · (u ◦ f − u ◦ g) − ε,

which is well-defined as D · (u ◦ f − u ◦ g) is well-defined.
By a symmetric argument, D · (u ◦ g − u ◦ hε) = (1/2) D · (u ◦ g − u ◦ f) − ε. For each ε > 0, define Fε := {f, g, hε}. Then

N (f, Fε) = { D ∈ DT | D · (u ◦ f − u ◦ g) ≥ 2ε },
N (g, Fε) = { D ∈ DT | D · (u ◦ g − u ◦ f) ≥ 2ε }.
Note that N (f, Fε ) ∩ N (g, Fε ) = Ø as ε > 0. Now, from equation (B.1) again, we have
ρFε (f ) + ρFε (g) ≤ µ (N (f, Fε )) + µ (N (g, Fε )) = µ (N (f, Fε ) ∪ N (g, Fε )) ≤ 1.
Consider a sequence of menus Fεi as εi → 0. Suppose there are three menus Fεi , Fεj ,
and Fεk in this sequence that are not in K0 . Since f and g are not tied, we can assume
without loss of generality that hεj and hεk are both tied with f (the case for both tied
with g is symmetric). Hence, hεj and hεk must be tied, so hεj and (1/2) f + (1/2) g must be
tied. By Linearity, this implies that r1w is tied with w, contradicting the representation
from above. Hence, there cannot be more than two menus in this sequence that are not
in K0 . So we can always remove menus Fεi that are not in K0 . Hence, we can assume
that Fεi ∈ K0 for all i without loss of generality. By Continuity, we thus have that
1 = ρ (f, g) + ρ (g, f ) = limi ρFεi (f ) + ρFεi (g) ≤ limi µ (N (f, Fεi ) ∪ N (g, Fεi )). Hence,
µ { D ∈ DT | D · (u ◦ f − u ◦ g) = 0 } = lim_i µ { D ∈ DT | −2εi < D · (u ◦ f − u ◦ g) < 2εi }
= 1 − lim_i µ (N (f, Fεi) ∪ N (g, Fεi)) = 0.
This proves part (2) of the lemma.
We now return to the proof of part (1). Suppose that the inequality in equation (B.1)
is strict for some f ∈ F . Let F ∗ ⊂ F be the subset of streams in F that are not tied. If
we sum over all the non-tied streams F∗, then

1 = ∑_{g∈F∗} ρF (g) < ∑_{g∈F∗} µ (N (g, F)) ≤ 1,
where the last inequality follows from part (2) as F∗ contains no ties. Since this cannot be true, it must be that ρF (f) = µ (N (f, F)) for all f ∈ F. This completes the proof of the lemma.
We now complete the sufficiency proof. Let r ∈ ∆X such that u (r) = 1 and note that
wtr → w as t → ∞. Now, for every D ∈ DT ,
St (D) := D · (u ◦ (wtr)) = ∑_{s≥t} D (s)
is well-defined, which may be infinite. Hence, by part (1) of Lemma 3 and Continuity, we have 1 = lim_t ρ (w, wtr) = lim_t µ { D ∈ DT | St (D) = 0 }, as {w, wtr} → {w}. Since St is decreasing in t, lim_{t→∞} St is well-defined, although it could be infinite. Moreover, if St (D) = 0 for some t ∈ T, then lim_{t′→∞} St′ (D) ≤ St (D) = 0. So for all D ∈ DT,

lim sup_t 1_{St (D)=0} (D) ≤ 1_{lim_{t→∞} St (D)=0} (D).

By Fatou's Lemma again, 1 = lim_t ∫DT 1_{St (D)=0} (D) µ (dD) ≤ ∫DT lim sup_t 1_{St (D)=0} (D) µ (dD) ≤ ∫DT 1_{lim_{t→∞} St (D)=0} (D) µ (dD) = µ { D ∈ DT | lim_{t→∞} St (D) = 0 }. Hence, lim_{t→∞} ∑_{s≥t} D (s) = 0 µ-a.s.. Note this implies that D · (u ◦ f − u ◦ g) converges for all f, g ∈ F. Since D being decreasing µ-a.s. follows trivially from Stochastic Impatience, we have µ (D) = 1. Hence, by part (1) of Lemma 3, we have for all F ∈ K and f ∈ F,
ρF (f ) = µ { D ∈ D | D · (u ◦ f ) ≥ D · (u ◦ g) for all g ∈ F } .
Moreover, the regularity of µ follows from part (2) of Lemma 3. We thus have a Random
Discounting Representation as desired.
B.2.3 Necessity

We now prove necessity of the axioms under a Random Discounting Representation. Note that Monotonicity, Linearity, Extremeness and Non-degeneracy follow by similar arguments as in Lu (2015). To see Initial Determinism, note that if f (t) = g (t) for all t > 0 and f, g ∈ F, then for any f ∈ F, ρF (f) = µ { D ∈ D | u (f (0)) ≥ u (g (0)) for all g ∈ F } ∈ {0, 1}, as desired. To see Time Monotonicity, note that for f ∈ F, if u (f (t)) ≥ u (g (t)) for all g ∈ F and t ∈ T, then ρF (f) = µ { D ∈ D | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F } = 1, as desired.
To see Stochastic Impatience, note that

ρ (f, f t) = µ { D ∈ D | D · (u ◦ f − u ◦ f t) ≥ 0 }
= µ { D ∈ D | ∑_{s∈T} (D (s) − D (s + t)) u (f (s)) ≥ 0 } = 1
as D is decreasing µ-a.s..
Finally, we prove Continuity. Let Fk → F where Fk , F ∈ K0 . Note that for any
f, g ∈ Fk , f and g are not tied. Since µ is regular, this implies that D · (u ◦ f − u ◦ g) = 0
with µ-measure zero. Now, define

I := ∪_{f,g∈Fk∪F} { D ∈ D | D · (u ◦ f) = D · (u ◦ g) }
as the set of all discount functions that rank some f, g ∈ Fk ∪ F as the same. Note that
µ (I) = 0 so if we let D∗ := D\I, then µ (D∗ ) = 1. Let µ∗ be the restriction of µ on D∗ .
We will now define random variables ξk : D∗ → H and ξ : D∗ → H that have distributions
ρFk and ρF respectively. For each Fk, let ξk : D∗ → H be such that

ξk (D) := arg max_{f∈Fk} D · (u ◦ f)
and define ξ similarly for F . Note that these are well-defined because there exists a unique
maximizer f for D ∈ D∗ . For any measurable set E ⊂ H,
ξk−1 (E) = { D ∈ D∗ | ξk (D) ∈ E ∩ Fk } = ∪_{f∈E∩Fk} { D ∈ D∗ | D · (u ◦ f) > D · (u ◦ g) ∀g ∈ Fk },
which is measurable. Hence, ξk and ξ are random variables. Note that
µ∗ ◦ ξk−1 (E) = ∑_{f∈E∩Fk} µ∗ { D ∈ D∗ | D · (u ◦ f) > D · (u ◦ g) ∀g ∈ Fk }
= ∑_{f∈E∩Fk} µ { D ∈ D | D · (u ◦ f) ≥ D · (u ◦ g) ∀g ∈ Fk }
= ρFk (E ∩ Fk) = ρFk (E),
so ρFk and ρF are the distributions of ξk and ξ respectively. Note that for any D ∈ D∗ ⊂ D,
D · (u ◦ f ) is bounded and thus continuous in f . Hence, by the Maximum Theorem,
ξk (D) = arg maxf ∈Fk D · (u ◦ f ) is continuous in Fk . Hence, ξk → ξ µ∗ -a.s. and since a.s.
convergence implies convergence in distribution, ρFk → ρF as desired.
B.3 Proof of Theorems 3 and 4

We will now prove Theorems 3 and 4 in reverse order, as Theorem 3 follows easily from Theorem 4. To begin, we first show the following lemma.
Lemma 4. Let ρ be represented by (µ, u).
(1) If ρ satisfies Weak Stochastic Stationarity, then for all t ≥ 1, µ {D ∈ D | D (1) = 0 } =
µ {D ∈ D | D (t) = 0 }.
(2) If ρ satisfies Stochastic Stationarity, then for all t ∈ T , µ {D ∈ D | D (t) = 0 } = 0.
Proof. Suppose ρ is represented by (µ, u). Let r ∈ ∆X be such that u (r) = 1. We prove
the two cases separately.
(1) First, suppose ρ satisfies Weak Stochastic Stationarity. Let f ∈ H be such that f (1) = r and f (s) = w for all s ≠ 1. Now, by Weak Stochastic Stationarity, for any t ∈ T, µ {D ∈ D | D (1) = 0 } = ρ (w, f) = ρ (wt, f t) = ρ (w, f t) = µ {D ∈ D | D (t) = 0 }, as desired.
(2) Now, suppose ρ satisfies Stochastic Stationarity. If we let h ∈ H be such that h (0) = r and h (s) = w for all s > 0, then by the same argument as above we have µ {D ∈ D | D (0) = 0 } = ρ (w, h) = ρ (wt, ht) = µ {D ∈ D | D (t) = 0 }. Since D (0) = 1, the result follows.
B.3.1 Sufficiency of Theorem 4
We first prove the sufficiency part of Theorem 4. Suppose ρ is represented by (µ, u). Since
ρ satisfies Weak Stochastic Stationarity, from Lemma 4 and the fact that µ is regular, we
know that for all t ≥ 1,
µ {D ∈ D | D (t) = 0 } = µ {D ∈ D | D (1) = 0 } ∈ {0, 1}.   (B.2)
By this result, it suffices to consider the following two cases.
Case 1: µ {D ∈ D | D (1) = 0 } = 1. Then by (B.2), µ {D ∈ D | D (t) = 0 } = 1 for all
t ≥ 1. Then since T is countable, this implies that D (t) = 0 for all t ≥ 1 µ-a.s.. Hence,
ρF (f ) = µ {D ∈ D | u (f (0)) ≥ u (g (0)) for all g ∈ F }, so µ is trivially quasi-hyperbolic
with β = 0.
Case 2: µ {D ∈ D | D (1) = 0 } = 0. Then by (B.2), µ {D ∈ D | D (t) = 0 } = 0 for all t ≥ 1. Since D ≥ 0, µ {D ∈ D | D (t) > 0 } = 1 for all t ≥ 1. So D (t) > 0 µ-a.s. for all t ∈ T. Hence, D (t + 1) / D (t) is well-defined µ-a.s. for all t ∈ T.
Choose r ∈ ∆X such that u (r) = 1. Define h ∈ H such that h (2) = r and h (s) = w for all s ≠ 2. Also, for any a > 0, define fa, ga ∈ H such that ga = a h^{−1} + (1 − a) w and fa = a ga^{−1} + (1 − a) w. Hence, we can write down the utility streams for fa, ga, h as follows:

u ◦ h = (0, 0, 1, 0, . . .), u ◦ ga = (0, a, 0, 0, . . .), u ◦ fa = (a^2, 0, 0, 0, . . .).

Moreover, for any t ∈ T, the utility streams for the t-delayed streams fat, gat, ht are as follows:

u ◦ ht = (0, . . . , 0, 0, 1, 0, . . .), u ◦ gat = (0, . . . , 0, a, 0, 0, . . .), u ◦ fat = (0, . . . , a^2, 0, 0, 0, . . .),
where ht (t + 2) = r. Note that for D ∈ D,

D · (u ◦ ht) = D (t + 2), D · (u ◦ gat) = D (t + 1) a, D · (u ◦ fat) = D (t) a^2.

Let Fat := {fat, gat, ht}. We now consider two subcases.
Subcase 2.1: Suppose there exists some a > 0 such that ga1 is tied with either fa1 or h1. Consider the case in which ga1 is tied with h1, so ρ (ga1, h1) = 1 = ρ (h1, ga1). By Weak Stochastic Stationarity, for all t ∈ T, ρ (gat, ht) = ρ (ga1, h1) = 1 = ρ (h1, ga1) = ρ (ht, gat). Hence, for all t ∈ T,

1 = µ {D ∈ D | D (t + 1) a = D (t + 2) } = µ { D ∈ D | D (t + 2) / D (t + 1) = a }.

Thus, if we let β = D (1) / a and δ = a, then for all t > 0, we have µ-a.s.

D (t) = (D (1) / a) a^t = βδ^t,

so µ is quasi-hyperbolic as desired. The case in which ga1 is tied with fa1 is symmetric.
Subcase 2.2: Now consider the second case where ga1 is not tied with fa1 nor h1 for all
a > 0. Note that by Weak Stochastic Stationarity, this implies that for all t ≥ 1, gat is not
tied with fat nor ht. Since Fat contains no ties, by Intertemporal Extremeness,

0 = ρFat (gat) = µ { D ∈ D | D (t + 1) a ≥ D (t + 2) and D (t + 1) a ≥ D (t) a^2 }
= µ { D ∈ D | D (t + 1) / D (t) ≥ a ≥ D (t + 2) / D (t + 1) } = µ {D ∈ D | Xt ≥ a ≥ Yt },

where we define Xt := D (t + 1) / D (t) ≥ 0 and Yt := D (t + 2) / D (t + 1) ≥ 0. Since this is true for all a > 0, it must be that µ-a.s. for all t ≥ 1,

D (t + 1) / D (t) = Xt ≤ Yt = D (t + 2) / D (t + 1).   (B.3)

In other words, Intertemporal Extremeness implies that the discount ratios are increasing µ-a.s..
For the final step, note that {fat+1, gat+1} = a {gat, ht} + (1 − a) w. Hence, by Weak Stochastic Stationarity and Linearity, ρ (fat, gat) = ρ (fat+1, gat+1) = ρ (gat, ht). This implies that µ { D ∈ D | D (t) a^2 ≥ D (t + 1) a } = µ {D ∈ D | D (t + 1) a ≥ D (t + 2) }, i.e., µ {D ∈ D | Xt ≤ a } = µ {D ∈ D | Yt ≤ a }.
Finally, by the inclusion-exclusion principle (for any two events A and B, P (A ∩ B) = P (A) + P (B) − P (A ∪ B)), we have

µ {D ∈ D | Xt ≤ a ≤ Yt } = µ {D ∈ D | Xt ≤ a } + µ {D ∈ D | a ≤ Yt } − µ {D ∈ D | Xt ≤ a or a ≤ Yt }
= µ {D ∈ D | Yt ≤ a } + µ {D ∈ D | a ≤ Yt } − µ {D ∈ D | Xt ≤ a or a ≤ Yt }
= 1 − µ {D ∈ D | Xt ≤ a or a ≤ Yt }
= µ {D ∈ D | Xt ≥ a ≥ Yt },
where the third and fourth equalities hold because gat is not tied with fat nor ht . Therefore,
by (B.3), we have that µ {D ∈ D | Xt ≤ a ≤ Yt } = µ {D ∈ D | Xt ≥ a ≥ Yt } = 0. Since
this holds for any a > 0, it must be that for all t ≥ 1,
D (t + 1) / D (t) = Xt = Yt = D (t + 2) / D (t + 1)
µ-a.s.. If we let δ = D (2) / D (1) and β = D (1)^2 / D (2), then for all t > 0,

D (t) = D (1) (D (2) / D (1))^{t−1} = βδ^t

µ-a.s.. Hence, µ is quasi-hyperbolic as desired.
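The closing algebra can be sanity-checked numerically (this is not part of the proof; D(1) and D(2) below are arbitrary positive values):

```python
# If D(t+2)/D(t+1) = D(t+1)/D(t) for all t >= 1, then with delta = D(2)/D(1)
# and beta = D(1)**2 / D(2), the discount function is quasi-hyperbolic:
# D(t) = beta * delta**t for all t > 0, with D(0) = 1.
D1, D2 = 0.7, 0.56  # arbitrary, with 0 < D2 < D1 < 1
delta = D2 / D1
beta = D1 ** 2 / D2

D = {0: 1.0, 1: D1, 2: D2}
for t in range(2, 20):
    D[t + 1] = D[t] * delta  # constant discount ratio from period 1 onward

ok = all(abs(D[t] - beta * delta ** t) < 1e-12 for t in range(1, 21))
```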
B.3.2 Sufficiency of Theorem 3

Now, suppose ρ satisfies Stochastic Stationarity and Intertemporal Extremeness. From Lemma 4, we know that D (t) > 0 µ-a.s. for all t ∈ T. As in the sufficiency proof for Theorem 4, define the streams h, ga, fa and ht, gat, fat such that for D ∈ D,

D · (u ◦ ht) = D (t + 2), D · (u ◦ gat) = D (t + 1) a, D · (u ◦ fat) = D (t) a^2.
Again we consider two cases.
Case 1: Suppose there exists some a > 0 such that ga is tied with either fa or h. Consider the case in which ga is tied with h, so ρ (ga, h) = 1 = ρ (h, ga). By Stochastic Stationarity, for all t ≥ −1, ρ (gat, ht) = 1 = ρ (ht, gat). Hence, for all t ∈ T,

1 = µ {D ∈ D | D (t) a = D (t + 1) } = µ { D ∈ D | D (t + 1) / D (t) = a }.

If we let δ = a, then for all t ∈ T, we have µ-a.s.

D (t) = D (0) a^t = δ^t,

so µ is exponential as desired. As before, the case in which ga is tied with fa is symmetric.
Case 2: Now consider the second case where ga is not tied with fa nor h for any a > 0. By Stochastic Stationarity, this implies that for all t ∈ T, gat is not tied with fat nor ht. Let Xt := D (t + 1) / D (t) and Yt := D (t + 2) / D (t + 1) as before. Now, by the same argument as in the sufficiency proof for Theorem 4, Intertemporal Extremeness implies that for all t ∈ T, µ {D ∈ D | Xt ≥ a ≥ Yt } = 0. Moreover, by the same argument using Stochastic Stationarity and Linearity, we have that µ {D ∈ D | Xt ≤ a ≤ Yt } = µ {D ∈ D | Xt ≥ a ≥ Yt } = 0. Since this holds for any a > 0, it must be that for all t ∈ T,

D (t + 1) / D (t) = Xt = Yt = D (t + 2) / D (t + 1)

µ-a.s.. If we let δ = D (1), then for all t ∈ T, D (t) = D (1)^t = δ^t µ-a.s.. Hence, µ is exponential as desired.
B.3.3 Necessity of Theorem 4

We now prove that if µ is quasi-hyperbolic, then ρ must satisfy Intertemporal Extremeness and Weak Stochastic Stationarity. We first show Intertemporal Extremeness. Choose any
f, g, h ∈ H and a ∈ [0, 1] such that f = a g^{−1} + (1 − a) w and g = a h^{−1} + (1 − a) w. Hence, the utility streams for f, g, h are

u ◦ h = (0, 0, u (h (2)), u (h (3)), . . .),
u ◦ g = (0, a u (h (2)), a u (h (3)), . . .),
u ◦ f = (a^2 u (h (2)), a^2 u (h (3)), . . .).
Let F = {f, g, h}. Note that if g is tied with either f or h, then ρF ({f, h}) = 1 trivially.
Hence, assume g is not tied with f nor h. Now, note that since µ is quasi-hyperbolic,
D · (u ◦ g − u ◦ f) = βδ a u (h (2)) − a^2 u (h (2)) + βδ^2 a u (h (3)) − βδ a^2 u (h (3)) + · · ·
= a (βδ − a) u (h (2)) + a (δ − a) [βδ u (h (3)) + βδ^2 u (h (4)) + · · ·],
D · (u ◦ g − u ◦ h) = βδ a u (h (2)) − βδ^2 u (h (2)) + βδ^2 a u (h (3)) − βδ^3 u (h (3)) + · · ·
= (a − δ) [βδ u (h (2)) + βδ^2 u (h (3)) + · · ·].
Suppose D · (u ◦ g − u ◦ h) ≥ 0, so a − δ ≥ 0. Since β ≤ 1, βδ − a ≤ 0, so

a (βδ − a) u (h (2)) ≤ 0 and a (δ − a) [βδ u (h (3)) + βδ^2 u (h (4)) + · · ·] ≤ 0,

which implies that D · (u ◦ g − u ◦ f) ≤ 0. So D · (u ◦ g − u ◦ h) ≥ 0 implies D · (u ◦ g − u ◦ f) ≤ 0. Hence,
ρF (g) = µ {D ∈ D | D · (u ◦ g − u ◦ f ) ≥ 0 and D · (u ◦ g − u ◦ h) ≥ 0 }
≤ µ {D ∈ D | D · (u ◦ g − u ◦ f ) = 0 } = 0
as µ is regular and g is not tied with f . Hence, Intertemporal Extremeness is satisfied.
We now prove Weak Stochastic Stationarity. Suppose for all f, g ∈ F , f (0) = g (0).
Now, for any t ∈ T and f, g ∈ F,

D · (u ◦ f t − u ◦ g t) = ∑_{s≥1} βδ^{s+t} [u (f (s)) − u (g (s))] = δ^t [D · (u ◦ f − u ◦ g)].
Hence,

ρF t (f t) = µ { D ∈ D | D · (u ◦ f t − u ◦ g t) ≥ 0 for all g t ∈ F t }
= µ {D ∈ D | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F } = ρF (f),

so Weak Stochastic Stationarity is satisfied.
B.3.4 Necessity of Theorem 3
We now prove that if µ is exponential, then ρ must satisfy Intertemporal Extremeness and
Stochastic Stationarity. Note that Intertemporal Extremeness follows immediately from
the necessity proof of Theorem 4. To show Stochastic Stationarity, note that for any t ∈ T
and f, g ∈ F,

D · (u ◦ f t − u ◦ g t) = ∑_{s∈T} δ^{s+t} [u (f (s)) − u (g (s))] = δ^t [D · (u ◦ f − u ◦ g)].
Hence,

ρF t (f t) = µ { D ∈ D | D · (u ◦ f t − u ◦ g t) ≥ 0 for all g t ∈ F t }
= µ {D ∈ D | D · (u ◦ f − u ◦ g) ≥ 0 for all g ∈ F } = ρF (f),

so Stochastic Stationarity is satisfied.
B.4 Proof of Proposition 1

Obviously, µ is not exponential. To show that µ satisfies Stochastic Stationarity, choose any d ∈ T. Consider first the case in which d is even, so d = 2n. Then (Dω (d), Dω (d + 1), . . .) = exp(−2n) Dω.11 Therefore, for any f, g ∈ H,
(Dω (d), Dω (d+1), . . . )·u◦f > (Dω (d), Dω (d+1), . . . )·u◦g ⇔ Dω ·u◦f > Dω ·u◦g. (B.4)
So for any F ⊂ H and f ∈ F,

ρF d (f d) = µ{Dω | (Dω (d), Dω (d + 1), . . .) · u ◦ f > (Dω (d), Dω (d + 1), . . .) · u ◦ g for all g ∈ F }
= µ{Dω | Dω · u ◦ f > Dω · u ◦ g for all g ∈ F }   (∵ (B.4))
= ρF (f).
Next, consider the case in which d is odd, so d = 2n + 1. Then (Dω (d), Dω (d + 1), . . .) = exp(−2n − 1/2 − ω) D1−ω.12 Therefore, for any f, g ∈ H,

(Dω (d), Dω (d + 1), . . .) · u ◦ f > (Dω (d), Dω (d + 1), . . .) · u ◦ g ⇔ D1−ω · u ◦ f > D1−ω · u ◦ g.   (B.5)

Let I be the uniform distribution over [0, 1]. Then for any F ⊂ H and f ∈ F,

ρF d (f d) = µ{Dω | (Dω (d), Dω (d + 1), . . .) · u ◦ f > (Dω (d), Dω (d + 1), . . .) · u ◦ g for all g ∈ F }
= I({ω | (Dω (d), Dω (d + 1), . . .) · u ◦ f > (Dω (d), Dω (d + 1), . . .) · u ◦ g for all g ∈ F })
= I({ω | D1−ω · u ◦ f > D1−ω · u ◦ g for all g ∈ F })   (∵ (B.5))
= I({1 − ω | D1−ω · u ◦ f > D1−ω · u ◦ g for all g ∈ F })   (∵ I is uniform)
= µ{Dω | Dω · u ◦ f > Dω · u ◦ g for all g ∈ F }
= ρF (f).
Finally, we show that µ violates Intertemporal Extremeness. Fix r ∈ ∆X such that u (r) = 1 > 0 = u (w). Define h ∈ H such that h (2) = r and h (t) = w for all t ∈ T with t ≠ 2. Fix a ∈ (0, 1). Define f, g ∈ H by f = a g^{−1} + (1 − a) w and g = a h^{−1} + (1 − a) w. Then

Dω · u ◦ g > Dω · u ◦ h ⇔ Dω (1) a > Dω (2) ⇔ a > exp(−3/2 + ω),
Dω · u ◦ g > Dω · u ◦ f ⇔ Dω (1) a > a^2 ⇔ exp(−1/2 − ω) > a.

Note that exp(−1/2 − ω) > exp(−3/2 + ω) if and only if ω < 1/2. Let a = exp(−1). Then for all ω < 1/2, we have exp(−1/2 − ω) > exp(−1) > exp(−3/2 + ω). Hence, ρ{f,g,h} (g) = I(ω < 1/2) = 1/2. This contradicts Intertemporal Extremeness.

11 (Dω (d), Dω (d + 1), . . .) = (exp(−2n), exp(−2n − 1/2 − ω), exp(−2(n + 1)), exp(−2(n + 1) − 1/2 − ω), . . .) = exp(−2n)(1, exp(−1/2 − ω), exp(−2), exp(−5/2 − ω), . . .) = exp(−2n) Dω.

12 (Dω (d), Dω (d + 1), . . .) = (exp(−2n − 1/2 − ω), exp(−2(n + 1)), exp(−2(n + 1) − 1/2 − ω), exp(−2(n + 2)), . . .) = exp(−2n − 1/2 − ω)(1, exp(−1/2 − (1 − ω)), exp(−2), exp(−5/2 − (1 − ω)), . . .) = exp(−2n − 1/2 − ω) D1−ω.
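This violation can also be checked by simulation. A sketch of the construction above with a = exp(−1); the utility streams are truncated after period 2, where they are zero anyway:

```python
import math
import random

def D_omega(omega, horizon=3):
    """Proposition 1 discount function: D(2n) = exp(-2n),
    D(2n + 1) = exp(-2n - 1/2 - omega)."""
    return [math.exp(-t) if t % 2 == 0 else math.exp(-(t - 1) - 0.5 - omega)
            for t in range(horizon)]

a = math.exp(-1)
# Utility streams: u◦f = (a^2, 0, 0), u◦g = (0, a, 0), u◦h = (0, 0, 1).
streams = {"f": [a ** 2, 0.0, 0.0], "g": [0.0, a, 0.0], "h": [0.0, 0.0, 1.0]}

rng = random.Random(0)
n, g_count = 100_000, 0
for _ in range(n):
    D = D_omega(rng.random())  # omega ~ Uniform[0, 1]
    values = {k: sum(d * u for d, u in zip(D, s)) for k, s in streams.items()}
    if max(values, key=values.get) == "g":
        g_count += 1

share_g = g_count / n  # about 1/2: the strictly mixed stream g is chosen half the time
```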
B.5 Proof of Proposition 3
We show that Linearity∗ and Extremeness imply Linearity. The rest follows from Gul and Pesendorfer (2006). Let C′ = aC + (1 − a) r for some r ∈ ∆X and a ∈ (0, 1). By Extremeness, we can restrict attention to the extreme points of C without loss of generality. Suppose C has k extreme points. Hence, we can translate C′ k times such that each translate C′i overlaps with an extreme point pi ∈ C and conv (C′i) ⊂ conv (C). Now, define

E := ∪i C′i.

Note that ext (E) = C. By Extremeness, we know that ρE (C) = 1. By Monotonicity, we have

ρE (pi) ≤ ρC (pi)

for all pi ∈ C. Since the pi are also the extreme points of E, by Extremeness, we also have that ∑i ρE (pi) = 1. Hence, it must be that ρE (pi) = ρC (pi) for all i. Moreover, by Monotonicity again, we have that for each i,

ρC′i (pi) ≥ ρE (pi)

as C′i ⊂ E for all i. If we let p′i = api + (1 − a) r ∈ C′, then by Linearity∗, we have

ρC′ (p′i) = ρC′i (pi) ≥ ρE (pi) = ρC (pi)

for all pi ∈ C. Since this is true for all i, and by Extremeness again ∑i ρC′ (p′i) = 1, it must be that ρC′ (p′i) = ρC (pi) for all i. Hence, Linearity is satisfied as desired.
C Appendix: Details of Empirical Tests
In this section, we report the detailed results of our empirical tests. The tests are based
on the data from Halevy (2015), which is available from the journal website. We use
the data from his main experiments (not the robustness treatments). As mentioned, 130
subjects completed the experiments, and among them, 10 had multiple switching points.
As in Halevy (2015), we exclude these 10 subjects from our analysis. In addition to these
10 subjects, Halevy (2015) also excluded 3 subjects who chose all later rewards, implying
strict non-impatience. As excluding them would render our test of Stochastic Impatience
trivial, we keep them in our data set.
As mentioned in the main text, Halevy also asked questions in which all payoffs were
scaled up (e.g. the smaller payoff was $100 instead of $10). We performed our analysis on
the data from these questions as well, and our main results do not change. These results
are also reported in what follows.
As a robustness check, we ran all tests on the combined dataset from Wave 1 and
Wave 2. These tests have greater power because of the larger number of choices. None of
the results changed.
The first and second tables report p-values for the tests using data from Waves 1 and
2, respectively. The third table reports p-values for the tests using the combined data from
Waves 1 and 2. The last two tables report p-values of the Kolmogorov-Smirnov tests using
data from Waves 1 and 2 separately and using the combined data, respectively. All p-values
are larger than 10%, so none of the null hypotheses is rejected at the 10% significance level.
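The equality tests in the tables that follow are standard tests of equal proportions on binary choice data. For a 2×2 table, the two-sided pooled z-test and the Pearson chi-squared test are the same test (the chi-squared statistic equals z²), which is why those two rows in the tables agree up to rounding. A minimal sketch, where the choice counts are illustrative rather than Halevy's actual data:

```python
import math

def two_proportion_pvalues(k1, n1, k2, n2):
    """p-values for H0: p1 = p2, via the pooled two-sided z-test and the
    Pearson chi-squared test (1 df). k1, k2 are counts of subjects choosing
    the later reward; the counts used below are illustrative only."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_z = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value
    chi2 = z * z                              # Pearson statistic for a 2x2 table
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))   # chi-squared (1 df) tail probability
    return p_z, p_chi2

# e.g. 14 of 45 subjects choose the later reward in Question 1
# and 19 of 45 in Question 2 (made-up counts)
p_z, p_chi2 = two_proportion_pvalues(14, 45, 19, 45)
assert abs(p_z - p_chi2) < 1e-12  # the two tests coincide exactly
```

Fisher's exact test, reported in the third row of each table, instead conditions on the margins and is conservative in small samples, which is why its p-values are weakly larger than the asymptotic ones.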
Baseline payoff $10
x                               9.9     10.0    10.1    10.2    10.3    10.4    10.5    10.6    10.7    10.8    10.9    11
ρ([x; 1w], [10; now]) ≡ p(x)    0.0000  0.0444  0.3111  0.3111  0.3333  0.4222  0.5556  0.5556  0.5556  0.5778  0.5778  0.6667
ρ([x; 5w], [10; 4w]) ≡ p′(x)    0.0000  0.0444  0.4222  0.4222  0.4222  0.4222  0.5556  0.6000  0.6000  0.6000  0.6000  0.6667
z-test (p(x) = p′(x))           NA      1.0000  0.2741  0.2741  0.3845  1.0000  1.0000  0.6695  0.6695  0.8304  0.8304  1.0000
chi squared (p(x) = p′(x))      NA      1.0000  0.2740  0.2740  0.3840  1.0000  1.0000  0.6700  0.6700  0.8300  0.8300  1.0000
Fisher's exact (p(x) = p′(x))   NA      1.0000  0.3820  0.3820  0.5150  1.0000  1.0000  0.8310  0.8310  1.0000  1.0000  1.0000
Chi squared (p(x) = 0)          NA      0.7656
Chi squared (p′(x) = 0)         NA      0.7656

Baseline payoff $100
x                               99      100     101     102     103     104     105     106     107     108     109     110
ρ([x; 1w], [100; now]) ≡ p(x)   0.0222  0.0667  0.4667  0.4889  0.5333  0.6000  0.8444  0.8444  0.8889  0.9111  0.9111  0.9333
ρ([x; 5w], [100; 1w]) ≡ p′(x)   0.0222  0.0889  0.4667  0.4667  0.4667  0.5111  0.7556  0.7556  0.8000  0.8000  0.8000  0.8667
z-test (p(x) = p′(x))           1.0000  0.6939  1.0000  0.8329  0.5271  0.3961  0.2918  0.2918  0.2447  0.1338  0.1338  0.2918
chi squared (p(x) = p′(x))      1.0000  0.6940  1.0000  0.8330  0.5270  0.3960  0.2920  0.2920  0.2450  0.1340  0.1340  0.2920
Fisher's exact (p(x) = p′(x))   1.0000  1.0000  1.0000  1.0000  0.6740  0.5250  0.4300  0.4300  0.3840  0.2300  0.2300  0.4850
Chi squared (p(x) = 0)          0.8815  0.6547
Chi squared (p′(x) = 0)         0.8815  0.5510

Table 1: p-values of tests using data from Wave 1: The upper table is for the experiments with baseline payoff $10 and
the lower table for those with baseline payoff $100, as the first row of each table shows. In each table, the row labeled x
shows the amount x which a subject can consume 1 week later in Question 1 or 5 weeks later in Question 2. The next two
rows show ρ([x; 1w], [10; now]) and ρ([x; 5w], [10; 4w]) for each x, respectively. The next three rows show p-values of tests
of the hypothesis that ρ([x; 1w], [10; now]) = ρ([x; 5w], [10; 4w]) for each x. (We conducted three different tests to make
sure the results are robust.) The last two rows show p-values of tests of the hypotheses that ρ([x; 1w], [10; now]) = 0 and
ρ([x; 5w], [10; 4w]) = 0, respectively, for each x. NA means that p-values are not available because the test values are
exactly equal to zero.
Baseline payoff $10
x                               9.9     10.0    10.1    10.2    10.3    10.4    10.5    10.6    10.7    10.8    10.9    11
ρ([x; 1w], [10; now]) ≡ p(x)    0.0133  0.0133  0.3867  0.3867  0.4000  0.4000  0.5333  0.5600  0.5867  0.6000  0.6000  0.8133
ρ([x; 5w], [10; 4w]) ≡ p′(x)    0.0133  0.0267  0.3867  0.3867  0.3867  0.4000  0.4800  0.5333  0.5467  0.5733  0.5733  0.7067
z-test (p(x) = p′(x))           1.0000  0.5598  1.0000  1.0000  0.8673  1.0000  0.5136  0.7429  0.6211  0.7402  0.7402  0.1262
chi squared (p(x) = p′(x))      1.0000  0.5600  1.0000  1.0000  0.8670  1.0000  0.5140  0.7430  0.6210  0.7400  0.7400  0.1260
Fisher's exact (p(x) = p′(x))   1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  0.6240  0.8700  0.7420  0.8680  0.8680  0.1800
Chi squared (p(x) = 0)          0.9081  0.9081
Chi squared (p′(x) = 0)         0.9081  0.8174

Baseline payoff $100
x                               99      100     101     102     103     104     105     106     107     108     109     110
ρ([x; 1w], [100; now]) ≡ p(x)   0.0133  0.0133  0.4533  0.4933  0.5467  0.5733  0.8133  0.8667  0.8667  0.8933  0.8933  0.9333
ρ([x; 5w], [100; 1w]) ≡ p′(x)   0.0133  0.0267  0.4133  0.4667  0.5067  0.5333  0.7467  0.8000  0.8267  0.8533  0.8533  0.9200
z-test (p(x) = p′(x))           1.0000  0.5598  0.6211  0.7438  0.6237  0.6222  0.3244  0.2733  0.4966  0.4614  0.4614  0.7514
chi squared (p(x) = p′(x))      1.0000  0.5600  0.6210  0.7440  0.6240  0.6220  0.3240  0.2730  0.4970  0.4610  0.4610  0.7540
Fisher's exact (p(x) = p′(x))   1.0000  1.0000  0.7420  0.8700  0.7440  0.7430  0.4310  0.3810  0.6510  0.6240  0.6240  1.0000
Chi squared (p(x) = 0)          0.9081  0.9081
Chi squared (p′(x) = 0)         0.9081  0.8174

Table 2: p-values of tests using data from Wave 2
Baseline payoff $10
x                               9.9     10.0    10.1    10.2    10.3    10.4    10.5    10.6    10.7    10.8    10.9    11
ρ([x; 1w], [10; now]) ≡ p(x)    0.0083  0.0250  0.3583  0.3583  0.3750  0.4083  0.5417  0.5583  0.5750  0.5917  0.5917  0.7583
ρ([x; 5w], [10; 4w]) ≡ p′(x)    0.0083  0.0333  0.4000  0.4000  0.4000  0.4083  0.5083  0.5583  0.5667  0.5833  0.5833  0.6917
z-test (p(x) = p′(x))           1.0000  0.7013  0.5059  0.5059  0.6910  1.0000  0.6051  1.0000  0.8962  0.8957  0.8957  0.2475
chi squared (p(x) = p′(x))      1.0000  0.7010  0.5060  0.5060  0.6910  1.0000  0.6050  1.0000  0.8960  0.8960  0.8960  0.2470
Fisher's exact (p(x) = p′(x))   1.0000  1.0000  0.5950  0.5950  0.7910  1.0000  0.6980  1.0000  1.0000  1.0000  1.0000  0.3120
Chi squared (p(x) = 0)          0.9273  0.7842
Chi squared (p′(x) = 0)         0.9273  0.7150

Baseline payoff $100
x                               99      100     101     102     103     104     105     106     107     108     109     110
ρ([x; 1w], [100; now]) ≡ p(x)   0.0167  0.0333  0.4583  0.4917  0.5417  0.5833  0.8250  0.8583  0.8750  0.9000  0.9000  0.9333
ρ([x; 5w], [100; 1w]) ≡ p′(x)   0.0167  0.0500  0.4333  0.4667  0.4917  0.5250  0.7500  0.7833  0.8167  0.8333  0.8333  0.9000
z-test (p(x) = p′(x))           1.0000  0.5182  0.6968  0.6983  0.4383  0.3633  0.1556  0.1298  0.2108  0.1287  0.1287  0.3502
chi squared (p(x) = p′(x))      1.0000  0.5180  0.6970  0.6980  0.4380  0.3630  0.1560  0.1300  0.2110  0.1290  0.1290  0.3500
Fisher's exact (p(x) = p′(x))   1.0000  0.7490  0.7950  0.7960  0.5180  0.4360  0.2070  0.1780  0.2830  0.1830  0.1830  0.4840
Chi squared (p(x) = 0)          0.8551  0.7150
Chi squared (p′(x) = 0)         0.8551  0.5839

Table 3: p-values of tests using merged data from Wave 1 and Wave 2
Hypothesis     p-value
µ0 = µ4        0.8198
µ′0 = µ′4      0.8274
µ1 = µ5        0.5279
µ′1 = µ′5      0.4677

Table 4: p-values of Kolmogorov-Smirnov tests: µt denotes the distribution of discount factors
elicited from the experiments conducted at week t ∈ {0, 1, 4, 5} where the smaller payoff is $10.
µ′t denotes the distribution of discount factors elicited from the experiments conducted at week
t ∈ {0, 1, 4, 5} where the smaller payoff is $100.
Hypothesis         p-value
µ0,1 = µ4,5        0.3789
µ′0,1 = µ′4,5      0.5352

Table 5: p-values of Kolmogorov-Smirnov tests for the combined dataset from Waves 1 and 2: µ0,1
denotes the distribution of discount factors elicited from the experiments conducted at weeks 0
and 1, where the smaller payoff is $10. µ4,5 denotes the distribution of discount factors elicited
from the experiments conducted at weeks 4 and 5, where the smaller payoff is $10. µ′0,1 denotes
the distribution of discount factors elicited from the experiments conducted at weeks 0 and 1,
where the smaller payoff is $100. µ′4,5 denotes the distribution of discount factors elicited from
the experiments conducted at weeks 4 and 5, where the smaller payoff is $100.
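The two-sample Kolmogorov-Smirnov test in Tables 4 and 5 compares empirical distributions of elicited discount factors via the sup-distance between their empirical CDFs. A minimal self-contained sketch: the discount-factor samples are made up, the asymptotic p-value uses the standard Kolmogorov series approximation, and ties are handled only approximately:

```python
import math

def ks_2samp(x, y):
    """Two-sample Kolmogorov-Smirnov statistic and asymptotic p-value.
    Simplified sketch: ties between samples are broken in favor of x."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    # sup-distance between the two empirical CDFs via a merge scan
    d, i, j = 0.0, 0, 0
    while i < n and j < m:
        if x[i] <= y[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / n - j / m))
    # asymptotic p-value from the Kolmogorov distribution
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# illustrative (made-up) discount factors elicited at week 0 and week 4
delta_w0 = [0.90, 0.92, 0.95, 0.97, 0.98, 0.99]
delta_w4 = [0.91, 0.93, 0.94, 0.96, 0.98, 1.00]
d, p = ks_2samp(delta_w0, delta_w4)
assert 0.0 <= d <= 1.0 and 0.0 < p <= 1.0
```

A large p-value, as in Tables 4 and 5, means the empirical distributions are statistically indistinguishable across decision times.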
References
Apesteguia, J. and M. A. Ballester (2015): “Monotone stochastic choice models:
The case of risk and time preferences,” Working paper.
——— (2016): “Single-crossing random utility models,” Working paper.
Gul, F. and W. Pesendorfer (2006): “Random Expected Utility,” Econometrica, 74, 121–146.
Halevy, Y. (2015): “Time consistency: Stationarity and time invariance,” Econometrica,
83, 335–352.
Hayashi, T. (2003): “Quasi-stationary cardinal utility and present bias,” Journal of Economic Theory, 112, 343–352.
Higashi, Y., K. Hyogo, and N. Takeoka (2009): “Subjective random discounting and
intertemporal choice,” Journal of Economic Theory, 144, 1015–1053.
Higashi, Y., K. Hyogo, N. Takeoka, and H. Tanaka (2016): “Comparative impatience under random discounting,” Economic Theory, 1–31.
Koopmans, T. C. (1960): “Stationary Ordinal Utility and Impatience,” Econometrica,
28, 287–309.
Krusell, P. and A. Smith (1998): “Income and Wealth Heterogeneity in the Macroeconomy,” Journal of Political Economy, 106, 867–896.
Lu, J. (2015): “Random choice and private information,” Working paper, UCLA.
Meier, S. and C. D. Sprenger (2015): “Temporal stability of time preferences,” Review
of Economics and Statistics, 97, 273–286.
Olea, J. L. M. and T. Strzalecki (2014): “Axiomatization and measurement of quasi-hyperbolic discounting,” The Quarterly Journal of Economics, 129, 1449–1499.
Pennesi, D. (2015): “The Intertemporal Luce Rule,” Working paper, THEMA.
Saito, K. (2015): “Preferences for Flexibility and Randomization under Uncertainty,”
American Economic Review, 105, 1246–71.