
Learning and Valuation with Costly Attention
Jacob LaRiviere^a and William Neilson^b
June 2015
Abstract
This paper develops a theoretical framework of consumer learning and product
valuation when attending to new information is costly. The key feature of the
model is that agents are unsure which product characteristics are present in a good. In
the model, increased beliefs that a good contains valuable attributes serve to increase
learning and, possibly, willingness to pay for the good. We derive two
implications from the model which can be readily tested using lab or field data.
JEL Codes: D01; D83; Q41
Keywords: Information, Updating, Preferences, Uncertainty
^a University of Tennessee & Baker Center for Public Policy, Department of Economics, 525 Stokely
Management Center, Knoxville, TN 37996-0550. Email: [email protected].
^b University of Tennessee, Department of Economics, 508 Stokely Management Center, Knoxville, TN
37996-0550.
1 Introduction
There is mounting evidence that learning is a complicated process. Recent research notes
that if attention is scarce or learning is costly, consumers may be left with inefficient levels
of information (DellaVigna (2009), Bordalo, Gennaioli, and Shleifer (2013), Schwartzstein
(2014), and LaRiviere, Czajkowski, Hanley, and Simpson (2015)). Despite this, little
is known either theoretically or empirically about the role of costly learning in
how consumers form valuations for goods, even though it has long been recognized that learning
resources are indeed scarce (Gabaix, Laibson, Moloche, and Weinberg (2006)). It is also
unclear how increasing the retention of information causally affects economic
decision-making.
Hanna, Mullainathan, and Schwartzstein (2014) tackle some of these issues
in the context of firms. They develop a theoretical model of costly attention in
which firms must pay a cost to attend to information about how effective an input
is at increasing productivity, and they test the model using data
from farmers in India, finding supportive evidence. Our paper takes aspects of their model and applies them to consumers.
We model both uncertainty that an attribute is present in a good and uncertainty as
to the level of the attribute conditional on it being present. The theoretical model leads to
two key propositions in addition to other testable predictions. Intuitively, if the probability
that a desirable attribute is present in a good increases, learning about the good increases.
If the attribute is indeed present, then valuations subsequently increase. Both
propositions can be tested in the lab or the field. Consistent with the empirical findings
of LaRiviere, Czajkowski, Hanley, and Simpson (2015), our key modelling assumption is
that learning is probabilistic.
2 Theoretical framework
The approach is to construct the simplest possible model to capture all of the relevant
aspects of the consumer’s problem of learning and valuation with costly effort in order
to guide the design of experiments and formulate testable hypotheses. In particular, the
model considers a good with multiple features which might or might not be embodied
in the good (e.g., purchasing a bottle of wine which may or may not have heavy
citrus but is definitely a Sauvignon Blanc from Marlborough, New Zealand), and allows for
learning about how much a consumer values uncertain or new features. This framework
applies to new private goods, new public goods, or new mixed goods with both private
and public attributes like green energy blocks. This model bears qualitative similarity to
other recent models of costly learning (Hanna, Mullainathan, and Schwartzstein (2014)
and Schwartzstein (2014)).
A consumer faces two types of goods, a composite good y and another good z. A
standard framework for such a setting would treat utility as quasilinear:

u(y, z) = y + v(z),    (1)
but the model used here has more structure behind v(z). A unit of z has two attributes,
a familiar attribute in amount a, and an unfamiliar attribute b which may or may not be
valued by the consumer. The utility generated by a single unit of z is
v(1) = a + π ∫ φ(b) dF(b)    (2)

where π ∈ [0, 1] is the probability that the good includes the new feature and the
distribution function F(b) governs the amount of the unfamiliar attribute the individual uses
conditional on it being included.
The function φ is nondecreasing and concave with
φ(0) = 0. The additive characterization of v(1) has each attribute generate utility separately, and the individual displays diminishing sensitivity over the unfamiliar feature. We
refer to this latter property as concavity in attributes and note that it bears similarity to
the Lancaster (1966) utility formulation.
The individual can choose an amount of good z, and consuming more z entails scaling
up the two characteristics proportionately. This leads to the representation
v(z) = Φ(z)(a + π ∫ φ(b) dF(b)),    (3)
where Φ is increasing and strictly concave to reflect diminishing marginal utility of the good
as a whole. We refer to this property as concavity in levels. The two forms of concavity,
in attributes and in levels, replace the concavity of v(z) in the standard representation of
(1).
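For concreteness, the following sketch evaluates (2) and (3) under one set of illustrative assumptions; the functional forms and parameter values below (φ, Φ, a, π, and the two-point distribution for b) are chosen purely for the example and are not part of the model.

```python
# A minimal numerical sketch of equations (2)-(3). All functional forms and
# parameter values here are illustrative assumptions, not taken from the paper.
import math

a  = 2.0                          # amount of the familiar attribute
pi = 0.6                          # probability the unfamiliar attribute is included
b_vals, b_probs = [0.75, 1.25], [0.5, 0.5]   # assumed two-point distribution F(b)

phi = math.sqrt                   # nondecreasing, concave, phi(0) = 0 ("concavity in attributes")
Phi = lambda z: math.log(1 + z)   # increasing, strictly concave ("concavity in levels")

# Equation (2): expected per-unit value of the good
per_unit = a + pi * sum(p * phi(b) for b, p in zip(b_vals, b_probs))

# Equation (3): utility from z units, scaling both attributes proportionately
v = lambda z: Phi(z) * per_unit

for z in (1, 2, 3, 4):
    print(f"z = {z}: v(z) = {v(z):.3f}")   # increments shrink as z grows (concavity in levels)
```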
The individual is endowed with the noisy random variable b̃ + ε̃, where ε̃ is a noise
variable with E[ε̃|b] = 0 for all b. Learning is costly and probabilistic.
If learning is
successful she learns whether the unfamiliar attribute is, in fact, included in the good and,
if it is, the noise is removed from the random variable governing its amount. Learning
whether the attribute is included resolves the binary random variable captured by the prior
probability π, and we assume that this prior is unbiased. If the attribute is included and
learning is successful, the per-unit amount of the unfamiliar attribute is governed by the
random variable b̃ instead of the noisy b̃ + ε̃. If the attribute is not included and learning
is successful, she learns that b = 0 with certainty. On the other hand, when learning is
unsuccessful she learns nothing about either the inclusion probability π or the amount of the
attribute, so she still faces the random variable b̃ + ε̃.
Learning is measured by its success probability t and requires energy, which is expended
at cost c(t), where c is nondecreasing and convex with c(0) = c′(0) = 0. This representation
of learning cost is consistent with either a fatigue interpretation (digging deeper requires
increasingly more energy) or one based on ease of finding information (digging deeper
requires more effort to find and digest additional relevant information).
Let Fb̃+ε̃ (b) be the distribution function for the noisy random variable b̃ + ε̃, and
let Fb̃ (b) denote the distribution function for the noiseless, but still random, variable b̃.
Combining all of these notions into (1) and (3) yields

u(y, z, t) = y + Φ(z)(a + tπ ∫ φ(b) dFb̃(b) + (1 − t)π ∫ φ(b) dFb̃+ε̃(b)) − c(t).    (4)
This representation makes learning cost a utility cost, not a monetary expenditure. The
individual chooses y, z, and t to maximize u subject to the budget constraint y + z = m,
where m is income.
The timing of the decision process is as follows. First the individual observes π, the
prior probability that the attribute is included in the good. The individual then chooses
learning effort t and the resulting information is revealed. Finally, the individual allocates
the budget m between the composite good y and the good z.
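The sketch below traces this timing numerically. All functional forms and parameter values (Φ, φ, c, the distributions of b̃ and ε̃, a, and m) are illustrative assumptions rather than part of the model; the sketch simply solves the post-learning demand problem on a grid for each possible information outcome and then picks the learning effort that maximizes expected utility net of c(t).

```python
# Illustrative sketch of the consumer's problem in (4), solved by backward
# induction on a grid. All functional forms and parameters are assumptions.
import math

a, m = 2.0, 10.0
b_vals, b_probs = [0.75, 1.25], [0.5, 0.5]      # b-tilde, conditional on inclusion
e_vals, e_probs = [-0.75, 0.75], [0.5, 0.5]     # mean-zero noise; keeps b + e >= 0

phi = math.sqrt
Phi = lambda z: math.log(1 + z)
c   = lambda t: 0.25 * t ** 2                   # convex learning cost, c(0) = c'(0) = 0

w   = sum(p * phi(b) for b, p in zip(b_vals, b_probs))     # E[phi(b)]
w_e = sum(pb * pe * phi(b + e)
          for b, pb in zip(b_vals, b_probs)
          for e, pe in zip(e_vals, e_probs))               # E[phi(b + e)]

def post_learning_utility(value):
    """Stage 2: choose z to maximize m - z + Phi(z) * value (grid search)."""
    zs = [i / 1000 for i in range(10001)]
    return max(m - z + Phi(z) * value for z in zs)

def optimal_t(pi):
    """Stage 1: choose learning effort t given the prior pi."""
    u_absent  = post_learning_utility(a)               # learned: attribute not included
    u_present = post_learning_utility(a + w)           # learned: attribute included
    u_noinfo  = post_learning_utility(a + pi * w_e)    # learning failed
    ts = [i / 1000 for i in range(1001)]
    return max(ts, key=lambda t: t * (1 - pi) * u_absent + t * pi * u_present
                                 + (1 - t) * u_noinfo - c(t))

for pi in (0.2, 0.5, 0.8):
    print(f"pi = {pi:.1f}  ->  t* = {optimal_t(pi):.3f}")
# In this parameterization t* rises with pi, in line with Theorem 1 below.
```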
Theorem 1 Suppose that b̃ and b̃+ ε̃ are both nonnegative random variables. An increase
in the inclusion probability π leads to an increase in learning t and an increase in expected
utility.
The proof of the theorem relies on the existence of three states of the world that could
follow the learning stage. In state 1 the individual learns that the feature is absent from
the good, while in state 2 she learns that it is, in fact, included.
In state 3 learning is
ineffective and she does not know whether the feature is included. In practice the feature
is either included or not, and if it is included then no one can learn that it is absent, and
so no one can enter state 1 in such a state of the world. If the feature is absent no one
can learn that it is included, so nobody can enter state 2. Individuals can still enter state
3 regardless of whether the good contains the feature because learning was ineffective.
Let λ be a binary variable with λ = 1 when the attribute is included and λ = 0 when
it is absent.
The post-learning expected demand for the good depends on whether the
attribute is included, and it is given by
E[z∗] = t∗z1∗ + (1 − t∗)z3∗   if λ = 0
E[z∗] = t∗z2∗ + (1 − t∗)z3∗   if λ = 1
where z1∗ , z2∗ , and z3∗ are the individual’s post-learning demands in each of the states
described above.
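Continuing the illustrative parameterization from the earlier sketch, and using the fact that with the assumed Φ(z) = ln(1 + z) the demand maximizing m − z + Φ(z)V is simply z∗ = V − 1, expected demand in the two λ-states can be traced as the prior varies. Everything below is an assumption chosen for the example, not part of the model.

```python
# Illustrative sketch of expected post-learning demand E[z*]. With the assumed
# Phi(z) = ln(1 + z), the demand maximizing m - z + Phi(z)*V is z*(V) = V - 1
# (interior for V >= 1); all parameter values below are assumptions.
import math

a, m = 2.0, 10.0
w, w_e = 0.992, 0.836             # illustrative E[phi(b)] and E[phi(b + noise)]

z_star = lambda V: V - 1.0                             # post-learning demand
U      = lambda V: m - z_star(V) + V * math.log(V)     # optimized post-learning utility

for pi in (0.2, 0.5, 0.8):
    z1, z2, z3 = z_star(a), z_star(a + w), z_star(a + pi * w_e)
    # learning effort: c'(t) = 0.5 t for c(t) = 0.25 t^2, equated to the gain from learning
    B = (1 - pi) * U(a) + pi * U(a + w) - U(a + pi * w_e)
    t = min(1.0, 2.0 * B)
    Ez_included = t * z2 + (1 - t) * z3                # lambda = 1
    Ez_absent   = t * z1 + (1 - t) * z3                # lambda = 0
    print(f"pi = {pi:.1f}: E[z*|included] = {Ez_included:.3f}, E[z*|absent] = {Ez_absent:.3f}")
# E[z*|included] rises with pi in this example (cf. Theorem 2 below); the
# absent case is ambiguous in general.
```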
Theorem 2 An increase in the inclusion probability π has a positive impact on demand
for good z if the attribute is included in the good; that is, dE[z∗|λ = 1]/dπ > 0.
When the attribute is not included, an increase in π has an ambiguous impact on
expected demand. There are two countervailing effects. First, demand increases among
those for whom learning is ineffective because they place higher probability on the higher-valued (but incorrect) state of the world in which the good contains the attribute. Second,
demand falls among those for whom learning is effective, because they find out that the
attribute is absent from the good and therefore value it less.
These two results lead to our two main hypotheses, which could be tested in either the
field or the lab:
Hypothesis 1. When attention is drawn to an included and desirable feature, subjects
learn more about the good.
Hypothesis 2. When attention is drawn to an included and desirable feature, subjects
are willing to pay more for the good.
3 Discussion
In this paper, we developed a theoretical model which delivers comparative statics for how
agents value goods when they have incomplete information and effort devoted to learning
is costly. The model finds that an agent will spend more effort learning about the attributes
embedded in a good if her belief that the good contains a desirable attribute increases.
The model also finds that the agent's valuation for the good will increase as well if the
desirable attribute is actually embedded in the good.
Testing this model of consumer valuation with costly attention, a consumer analog of
Hanna, Mullainathan, and Schwartzstein (2014), would be straightforward in an experimental setting, whether in a lab or in the field. The key attribute of any such experiment
would be creating exogenous variation in π, the prior that subjects place on the attribute
being included in the good. If the good does, in reality, contain the feature in question
then Theorem 1 applies.
There are several feasible ways that an experimental design could create exogenous
variation in π. For example, a quiz over the good's characteristics could alter the probability
subjects place on attributes being embedded in the good. Information on the attributes
of similar goods could also create variation in π. Lastly, recent
evidence suggests that creating variation in subjects' beliefs about the accuracy of their
dataset affects valuations for public goods (LaRiviere, Czajkowski, Hanley, Aanesen, Falk-Petersen, and Tinch (2014)). A similar type of treatment could also create variation in
π.
There are also secondary hypotheses which could be tested in an experiment. Conditional on a particular level of π, for example, the model implies that the distribution
of possible valuations becomes less dispersed with more information. Further, the experimental design
could provide information about the mechanism behind learning due to increases in π.
References
Bordalo, P., N. Gennaioli, and A. Shleifer (2013): “Salience and Consumer
Choice,” Journal of Political Economy, 121(5), 803–843.
DellaVigna, S. (2009): “Psychology and Economics: Evidence from the Field,” Journal
of Economic Literature, 47(2), 315–375.
Gabaix, X., D. Laibson, G. Moloche, and S. Weinberg (2006): “Costly Information Acquisition: Experimental Analysis of a Boundedly Rational Model,” American
Economic Review, 96(4), 1043–1068.
Hanna, R., S. Mullainathan, and J. Schwartzstein (2014): “Learning Through
Noticing: Theory and Experimental Evidence in Farming,” The Quarterly Journal of
Economics, 129(3), 1311–1353.
Lancaster, K. (1966): “A New Approach to Consumer Theory,” Journal of Political
Economy, 74(2), 132–157.
LaRiviere, J., M. Czajkowski, N. Hanley, M. Aanesen, J. Falk-Petersen, and
D. Tinch (2014): “The Value of Familiarity: Effects of Knowledge and Objective
Signals on Willingness to Pay for a Public Good,” Journal of Environmental Economics
and Management, 68(2), 376–389.
LaRiviere, J., M. Czajkowski, N. Hanley, and K. Simpson (2015): “What is the
Causal Effect of Knowledge on Decision Making,” University of Tennessee Working
Paper.
Schwartzstein, J. (2014): “Selective Attention and Learning,” Journal of the European
Economic Association, 12(6), 1423–1452.
Proofs
Theorem 1 Suppose that b̃ and b̃ + ε̃ are both nonnegative random variables. An
increase in the inclusion probability π leads to an increase in learning t and an increase
in expected utility.
Proof. For notational ease define w = ∫ φ(b) dFb̃(b) and wε = ∫ φ(b) dFb̃+ε̃(b). Because
φ is concave and b̃ + ε̃ differs from b̃ by a Rothschild-Stiglitz mean-preserving increase in
risk, w ≥ wε. By hypothesis wε > 0. Let ∆w = w − wε.
Solve the problem by backward induction. The end of the game has three states of
the world: one in which learning is successful but the attribute is not included (b = 0),
one in which learning is successful and the attribute is included (b drawn from b̃), and
one in which learning is unsuccessful (b drawn from b̃ + ε̃). We take these one at a time.
Suppose that learning is successful but the attribute is not included. The individual
then chooses z to maximize y + aΦ(z) subject to the constraint y + z = m. Substituting
from the constraint yields the unconstrained objective function m − z + aΦ(z), and the
first-order condition is Φ′(z) = 1/a. Let z1∗ denote the value of z that solves this problem.
This case occurs with probability t(1 − π).
Now suppose that the individual learns that the attribute is included in the good.
She chooses z to maximize m − z + Φ(z)(a + w) and the first-order condition is Φ′(z) =
1/(a + w). Let z2∗ denote the solution to this equation. This case occurs with probability
tπ.
Finally, suppose that learning is unsuccessful. She chooses z to maximize m − z +
Φ(z)(a + πwε) and the first-order condition has Φ′(z) = 1/(a + πwε). Let z3∗ denote the
solution to this expression. This case occurs with probability 1 − t.
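As a concrete check under one illustrative functional form that is not part of the model, take Φ(z) = ln(1 + z) with a ≥ 1, so that Φ′(z) = 1/(1 + z) and interior solutions obtain. The three first-order conditions then solve in closed form as

z1∗ = a − 1,    z2∗ = a + w − 1,    z3∗ = a + πwε − 1,

so that z1∗ ≤ z3∗ ≤ z2∗ (because w ≥ wε) and z3∗ is strictly increasing in π, consistent with the derivative computed below.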
Now turn to the problem of choosing t. The objective function is

h(t) = t(1 − π)[m − z1∗ + aΦ(z1∗)] + tπ[m − z2∗ + Φ(z2∗)(a + w)] + (1 − t)[m − z3∗ + Φ(z3∗)(a + πwε)] − c(t).

The first-order condition is

0 = (1 − π)[m − z1∗ + aΦ(z1∗)] + π[m − z2∗ + Φ(z2∗)(a + w)] − [m − z3∗ + Φ(z3∗)(a + πwε)] − c′(t).    (5)
Let t∗ denote the solution to this problem.
We want to know how t and z respond to a change in π. Note that t does not appear
in the first-order conditions defining any of the zi∗ ’s, and so dz1∗ /dt = dz2∗ /dt = dz3∗ /dt = 0.
Also, π does not appear in the first-order conditions defining z1∗ and z2∗ , and so dz1∗ /dπ =
dz2∗/dπ = 0. Implicitly differentiating the first-order condition identifying z3∗ yields

Φ″(z) dz3∗/dπ = − wε / (a + πwε + πt∆w),

so that

dz3∗/dπ = − wε / (Φ″(z3∗)(a + πwε + πt∆w)) > 0.
Now implicitly differentiate (5) with respect to π to get

0 = −[m − z1∗ + aΦ(z1∗)] + [m − z2∗ + Φ(z2∗)(a + w)] + dz3∗/dπ − Φ′(z3∗)(a + πwε) dz3∗/dπ − c″(t) dt∗/dπ.
The first-order condition for z3∗ has Φ′(z3∗)(a + πwε) = 1, and so the two terms involving
dz3∗/dπ cancel out. We are left with

dt∗/dπ = (1/c″(t)) ([m − z2∗ + Φ(z2∗)(a + w)] − [m − z1∗ + aΦ(z1∗)]).
The term in parentheses on the right-hand side is strictly positive. To see why, note
that m − z2∗ + Φ(z2∗)(a + w) is the optimized utility when the individual learns that the
attribute matters, while m − z1∗ + aΦ(z1∗) is the optimized utility when she learns that the attribute does
not matter. In the former case she could have chosen z1∗ instead of z2∗, generating utility
m − z1∗ + Φ(z1∗)(a + w), but because she chose z2∗ it must be the case that

m − z1∗ + aΦ(z1∗) < m − z1∗ + Φ(z1∗)(a + w) ≤ m − z2∗ + Φ(z2∗)(a + w).
Convexity of the learning cost means that c″(t) > 0, and consequently, dt∗/dπ > 0.
The intuition is that increasing learning increases the probability that the individual
gets to make an ex post optimal choice of z. An increase in π increases the probability
that learning will reveal that the feature is included, and the individual learns more in
order to reach a better optimum for this case.
Expected utility is given by

EU = t∗(1 − π)[m − z1∗ + aΦ(z1∗)] + t∗π[m − z2∗ + Φ(z2∗)(a + w)] + (1 − t∗)[m − z3∗ + Φ(z3∗)(a + πwε)] − c(t∗).
By the envelope theorem,
dEU/dπ = ∂EU/∂π = t∗([m − z2∗ + Φ(z2∗)(a + w)] − [m − z1∗ + aΦ(z1∗)]) + (1 − t∗)Φ(z3∗)wε.
As argued above, the term in parentheses is positive, and the final term
is positive, so expected utility increases when π increases.
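As a numerical sanity check on this envelope expression (not a proof), the sketch below compares its right-hand side with a central finite-difference derivative of optimized expected utility, under the same illustrative functional forms used in the earlier sketches (Φ(z) = ln(1 + z), c(t) = 0.25t², and assumed values of w and wε); the two should agree up to numerical error.

```python
# Numerical check of the envelope expression for dEU/dpi under assumed forms
# (Phi(z) = ln(1+z), c(t) = 0.25 t^2, and illustrative values of w and w_e).
import math

a, m = 2.0, 10.0
w, w_e = 0.992, 0.836

U = lambda V: m - (V - 1) + V * math.log(V)   # optimized stage-2 utility when z* = V - 1

def solve(pi):
    """Return (t*, z3*, EU) for prior pi, with c(t) = 0.25 t^2 so c'(t) = 0.5 t."""
    B = (1 - pi) * U(a) + pi * U(a + w) - U(a + pi * w_e)   # marginal value of learning
    t = 2.0 * B                                             # interior solution of c'(t) = B
    z3 = a + pi * w_e - 1                                   # demand when learning fails
    EU = (t * (1 - pi) * U(a) + t * pi * U(a + w)
          + (1 - t) * U(a + pi * w_e) - 0.25 * t ** 2)
    return t, z3, EU

pi, h = 0.5, 1e-5
t, z3, _ = solve(pi)
envelope = t * (U(a + w) - U(a)) + (1 - t) * math.log(1 + z3) * w_e
finite_diff = (solve(pi + h)[2] - solve(pi - h)[2]) / (2 * h)
print(f"envelope expression: {envelope:.4f}   finite difference: {finite_diff:.4f}")
```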
Theorem 2 An increase in the inclusion probability π has a positive impact on demand
for good z if the attribute is included in the good.
Proof. When the attribute is included, the expected value of z∗ is

E[z∗|λ = 1] = t∗z2∗ + (1 − t∗)z3∗.

It follows that

dE[z∗|λ = 1]/dπ = t∗ dz2∗/dπ + (1 − t∗) dz3∗/dπ + (z2∗ − z3∗) dt∗/dπ.
From the proof of Theorem 1, dz2∗/dπ = 0, dz3∗/dπ > 0, and dt∗/dπ > 0. Concavity of Φ implies that
z3∗ < z2∗. Consequently dE[z∗|λ = 1]/dπ > 0.