Meaning, its Evolution, and Credibility in
Experimental Cheap-Talk Games∗
Ernest K. Lai
Department of Economics
Lehigh University
Wooyoung Lim†
Department of Economics
The Hong Kong University of
Science and Technology
February 2, 2016
Abstract
We design two sets of experimental games to evaluate the predictive power of the pioneering cheap-talk refinement, neologism-proofness. In our first set of treatments designed
to evaluate neologism-proofness with its usual emphasis on literal meanings, we observe
that an equilibrium was played more often when it was neologism-proof than when it was
not. We further find that the mere existence of a neologism, albeit not credible, attracted
deviating behavior from the senders, while the receivers’ deviations required the neologism
to be credible. Our second set of treatments evaluates the refinement concept in the absence
of a pre-existing common language, in which the meaning of a neologism must evolve. We
obtain a few, but not all, observations where the meaning of a neologism emerged in accordance with what neologism-proofness predicts. Our findings shed light on the powers and
limitations of neologism-proofness in predicting laboratory behavior under different levels
of challenges posed by different language environments.
Keywords: Neologism-Proofness; Cheap Talk; Equilibrium Refinement; Evolution of Meanings; Laboratory Experiments
JEL classification: C72; C92; D82; D83
∗
We are grateful to Andreas Blume, Syng-Joo Choi, Navin Kartik, Barry Sopher and Joel Sobel for their
valuable comments and suggestions. We also thank conference and seminar participants at HKUST, Rutgers
University, the 2nd Haverford Meeting on Behavioral and Experimental Economics, the 16th KAEA-KEA International Conference, the 5th Annual Xiamen University International Workshop on Experimental Economics,
the 90th Western Economic Association International Annual Conference and the 26th International Conference
on Game Theory for valuable discussions. This study is supported by a grant from the Research Grants Council
of Hong Kong (Grant No. ECS-699613). Lai gratefully acknowledges the financial support from the Office of
the Vice President and Associate Provost for Research and Graduate Studies at Lehigh University. This paper
is a substantially revised and extended version of an earlier paper “Meaning and Credibility in Experimental
Cheap-Talk Games.”
†
Corresponding author. Address: HKUST Department of Economics, LSK Business Building, Clearwater Bay,
Kowloon, Hong Kong. Phone: (852) 2358 7628.
1 Introduction
Cheap-talk games are a type of signaling game in which messages have no direct payoff consequence. The costless nature of messages has profound implications for the treatment of cheap-talk
equilibria. When cheap talk is used to communicate, the meanings of messages can be established only by use (in equilibrium) and cannot be learned by introspection. Furthermore, for every
equilibrium with unsent messages, there exists another outcome-equivalent equilibrium in which
all messages are used so that no off-equilibrium paths exist. Consequently, the standard refinement concepts for costly signaling games (e.g., the intuitive criterion), which select equilibria by
restricting how messages should be interpreted off the equilibrium path, have no selection power
for cheap-talk games.
Several refinement concepts exclusively designed for cheap-talk games have been proposed.
Farrell’s (1993) notion of neologism-proofness is one such concept. It is the pioneering contribution that serves as the headwater of the refinement literature on cheap talk.1 Although
neologism-proofness was introduced more than twenty years ago, no experimental
effort has been devoted exclusively to evaluating its predictive power. In fact, only a handful
of earlier studies (e.g., Sopher and Zapater, 2000; Blume, DeJong, Kim, and Sprinkle, 2001)
have experimentally evaluated cheap-talk refinements, and only a limited set of concepts has
been covered.2 The situation is unlike that of the major refinement concepts for costly signaling
games, which have been subject to thorough experimental investigations (e.g., Brandts and Holt,
1992; Banks, Camerer and Porter, 1994) at an early stage of the literature development.
This study helps fill this void in the experimental cheap-talk literature. Multiplicity of equilibria is a notorious property of cheap-talk games, and the refinement of cheap-talk
equilibria is an issue that has not been entirely settled. There is therefore an apparent need for
empirical work, which may complement the theoretical advancement of cheap-talk refinements.
In this endeavor, it is natural to revisit the very first refinement concept proposed for these games.
Meaning and credibility are two building blocks of neologism-proofness. In the theoretical
context, a neologism, from the Greek for “new word,” refers to an out-of-equilibrium message.3
1
Rabin’s (1990) nonequilibrium notion of “credible message rationalizability” builds on neologism-proofness
by combining it with rationalizability. Matthews, Okuno-Fujiwara and Postlewaite (1991) develop the concept
of announcement-proofness, which generalizes neologism-proofness. Chen, Kartik and Sobel (2008) relate their
criterion, NITS (No Incentive To Separate), to neologism-proofness by showing that an equilibrium survives NITS
if it is neologism-proof. More recently, De Groot Ruiz, Offerman and Onderstal (2015) propose a behavioral
refinement, ACDC (Average Credible Deviation Criterion), which further extends neologism-proofness and
announcement-proofness. There are also other refinement concepts that are, unlike neologism-proofness and its
derivatives, not belief-based. They include the “partial common interest” by Blume, Kim and Sobel (1993) and
the “recurrent mop” by Rabin and Sobel (1996). It is also worth mentioning the concept of immunity
to credible deviations by Eső and Schummer (2009), which has selection power for Crawford and Sobel’s (1982)
model where the cheap talk is replaced by costly messages.
2
There are some recent experimental studies on cheap-talk refinements (e.g., Kawagoe and Takizawa, 2008;
De Groot Ruiz, Offerman and Onderstal, 2014), which will be discussed in the literature review.
3
Although any unused message in a cheap-talk equilibrium can be turned into use by means of sender’s
Despite the fact that a neologism, being an unsent message, has no meaning in light of established
use, Farrell (1993) argues that if the players share a pre-existing language (e.g., English), a
neologism may still be understood by its commonly known literal meaning.4 Building on the
assumption that a neologism is to be understood literally, Farrell (1993) then introduces the
concept of credible neologisms: when certain types of the sender, who want to deviate from
a putative equilibrium by sending a particular neologism, precisely match the meaning of the
neologism (expressed in the form of “I am of one of these types”), the neologism is credible or
self-signaling. A neologism-proof equilibrium is an equilibrium in which no credible neologisms
exist.
The above definition of neologism-proofness, with its emphasis on literal meanings, informs
the design of our first set of treatments. We design two cheap-talk games with binary state and
with literally meaningful messages. We vary the cardinality of the message spaces from two to
three, where the third, additional message represents the neologism that we induce with respect
to a putative equilibrium under consideration. This manipulation of the message spaces results
in a total of four treatments. Each of the four games admits a fully revealing equilibrium. But their
payoff structures and/or message spaces are different so that only the fully revealing equilibrium
in some of the games is neologism-proof, and this serves as our treatment variation.
Farrell (1993) also provides an evolutionary interpretation of neologisms, invoking the fact
that an equilibrium can be viewed as an evolutionarily stable outcome. In this interpretation,
it is natural to consider messages with no a priori meanings, where the meaning of a neologism must evolve endogenously. An experimental study with subjects participating in repeated
interactions provides an appropriate arena to study neologism-proof equilibria in light of this
interpretation. Such a study allows us to investigate empirically how the evolved meaning of
a neologism may affect the communication outcome depending on whether the equilibrium in
question is neologism-proof.
Our second set of treatments takes the two basic games in the first set of treatments and
endows them with a priori meaningless messages, initially with only two messages. Unlike the
first set of treatments where the presence of a neologism is a between-subject treatment variable,
in this second set of treatments the introduction of a neologism, i.e., a third meaningless message,
is within subject. It is introduced in the middle of an experimental session, after meanings have
supposedly been established for the two initial messages. We investigate whether the introduction
of a neologism in this manner may disrupt any (fully revealing) equilibrium play and whether
that is as predicted by the neologism-proof property of the equilibrium in question.
randomization (in another outcome-equivalent equilibrium), Farrell (1993) argues that randomization may not
be plausible, especially when the message space is large. In developing neologism-proofness, he therefore assumes
that neologisms (unused messages) always exist.
4
A feature of neologism-proofness that distinguishes it from standard signaling refinements is its reference to
natural language. Perfect sequential equilibrium developed by Grossman and Perry (1986) for costly signaling
games does not have the same natural language element but is otherwise closely related to neologism-proofness.
Our findings from the first set of treatments indicate that neologisms and their credibility
played an evident role in how subjects played the games. Overall, fully revealing equilibria that
were neologism-proof were played more often than those that were not. The senders and the
receivers, however, responded differently to the presence and credibility of neologisms. The mere
existence of a literally meaningful neologism, albeit not credible, sufficed to attract deviations
from the senders. The receivers, on the other hand, deviated from the separating strategies only
when the meaningful neologism was credible.
In the absence of a pre-existing common language, there was a greater variation in individual
group behavior. We observed a few, but not all, individual groups in which, before the neologism
was introduced, subjects used and interpreted messages in a way consistent with a fully revealing
equilibrium. In other words, distinct meanings evolved for the two initial messages in selected
groups.5 We further observed among these groups some cases where the introduction of the
neologism disrupted the equilibrium play, at the same time as the meaning of the neologism
evolved to render it self-signaling. On the other hand, when the fully revealing equilibrium was
neologism-proof, there were cases where the equilibrium play remained intact after the neologism
was introduced.
Before we proceed to the literature review, we discuss three limitations of neologism-proofness
as a refinement concept and argue how our study may provide responses to two of the limitations.
The discussion serves to further highlight the contribution of our experiment in the context of
the theory. As a refinement concept, neologism-proofness lacks a general existence property,
and therefore we do not know what it predicts when existence fails. It also lacks a complete
formalization regarding the presence of unsent messages with natural meanings. Finally, it falls
short of fully addressing the concern over the usefulness of natural language, as neologisms arise
off the equilibrium path but languages on the path are still arbitrary.6
Regarding the first limitation, our experiment helps assess from an empirical vantage point
whether the lack of a general existence property should be a reason to discard the refinement
concept even for the games in which it has the power to predict play.7 While our study does
not address the second limitation, it provides an experimental response to the third limitation
by exploring whether there is an empirical tendency for literal meanings to be followed on the
equilibrium path. Furthermore, our treatments with meaningless messages offer a contrasting
5
By individual group, we mean a matching group in which a few sender-subjects randomly match with a few
receiver-subjects to form groups of two. Statistically, a matching group constitutes an independent observation.
6
We thank Joel Sobel for sharing these points with us.
7
Regarding the lack of existence, for instance, no equilibrium outcome in the leading example of Crawford and
Sobel (1982), the uniform quadratic formulation, is neologism-proof. Yet in some other instances neologism-proofness is still relied upon (probably due to the lack of a perfect alternative) to obtain sharper predictions of
the games (e.g., Gertner, Gibbons and Scharfstein, 1988; Farrell and Gibbons, 1988, 1989; Austen-Smith, 1990;
Lim, 2014; Kim and Kircher, 2014); see also the discussion in Farrell and Rabin (1996). Thus, a major question
we pose in this paper is: how well does neologism-proofness predict play in the laboratory for the games in which the
refinement concept has some bite?
point to demonstrate the importance of natural language in a laboratory setting.
Related Literature. To our knowledge, the first experimental study that involves neologism-proofness is Blume et al. (2001). With the primary objective of testing the validity of the
Partial Common Interest (PCI) criterion (Blume, Kim, and Sobel, 1993), Blume et al. (2001)
compare PCI with other selection criteria including neologism-proofness. In the games they
consider, neologism-proofness either shares the same prediction as Pareto efficiency or rejects
all the equilibria. Their findings therefore do not distinctively reveal how useful neologism-proofness is in predicting play, which is not a focus of their study to begin with. As will be
discussed below, avoiding the potential confound of Pareto efficiency as a selection criterion is a
major consideration informing our experimental design.
Two other experimental studies related to cheap-talk refinements are Sopher and Zapater
(2000) and Kawagoe and Takizawa (2008). Sopher and Zapater (2000) find experimental support
for the credible message rationalizability proposed by Rabin (1990). Kawagoe and Takizawa
(2008) compare the effectiveness of refinement concepts and of the level-k model in explaining their
data, concluding that the level-k analysis performs better. Their design, however, does not provide a rich
enough environment to address neologism-proofness as neologisms are absent in their induced
environments.
More recently, De Groot Ruiz, Offerman and Onderstal (2015) propose a new cheap-talk refinement, Average Credible Deviation Criterion (ACDC), in which the stability of an equilibrium
is measured by the frequency and size of credible deviations. They show that ACDC successfully
organizes the data from several prior experiments. Most closely related to our study is their other
paper (De Groot Ruiz, Offerman and Onderstal, 2014), which further investigates ACDC with
their own experiment. They find that neologism-proofness performs well when its prediction
is unique, in which case ACDC predicts the same as does neologism-proofness. Perhaps more
importantly, the ACDC equilibrium is found to perform the best when neologism-proofness (and
announcement-proofness) has no selection power. Our paper differs from theirs in focus, design, and the issue addressed. While they use the inability of neologism-proofness to predict as
a contrasting point to demonstrate the usefulness of ACDC, our study aims at evaluating the
usefulness of neologism-proofness by focusing exclusively on the cases where the concept selects
an equilibrium. The use of a priori meaningless messages to investigate the evolution of meaning
of a neologism is also a unique feature of our design, an issue not addressed in their study.
Our paper is also related to two other strands of literature. The first is the literature on
experimental communication games (e.g., Dickhaut, McCabe, and Mukherji, 1995; Blume et al.
1998, 2001; Gneezy, 2005; Cai and Wang, 2006; Sánchez-Pagés and Vorsatz, 2007; Hurkens and
Kartik, 2009; Wang, Spezio, and Camerer, 2010).8 A robust finding of this literature is the
8
See also Crawford (1998) for a survey of earlier studies and a discussion on the connection between cheap-talk
refinements and costly signaling game refinements.
observation of “over-communication” or “lying aversion,” where subjects communicated more
than what equilibrium predicts. Our paper differs from these papers in that we are interested
not only in whether subjects play according to equilibria but also in whether they play according
to selected or refined equilibria.
Our investigation of the evolution of meanings using a priori meaningless messages is a subject
matter of Blume et al. (1998, 2001) and Blume, DeJong, and Sprinkle (2008), but studying the
evolution in reference to neologism-proof equilibria represents a new inquiry. The communication
games we use also have a different qualitative property from those of the games used in these
studies. While Blume et al. (1998, 2001) and Blume, DeJong, and Sprinkle (2008) use either
common-interest or partially-common-interest games, our study is the first to investigate the
evolution of meanings in games in which the sender and the receiver have conflicting preferences
over different equilibria.9 Furthermore, the introduction of an additional meaningless message
in the middle of an experimental session has also, to our knowledge, not been explored before.10
The other literature to which our paper is related is the experimental investigation of equilibrium refinements for costly signaling games. Brandts and Holt (1992) test the intuitive criterion
by Cho and Kreps (1987). They find that both message-level and action-level data supported
the intuitive equilibrium, although this equilibrium was vulnerable to self-enforcing assignment
by an outside authority. Banks, Camerer, and Porter (1994) design games to separate various
refinements including the Nash equilibrium, sequential equilibrium, intuitive criterion, divine,
universal divine, and never-weak-best-reply. They show that subjects’ behaviors converged to
the more refined equilibrium up to the intuitive criterion.
The rest of the paper proceeds as follows. Section 2 lays out our experimental games and analyzes their equilibria. Section 3 discusses our experimental hypotheses and procedures. Section
4 reports our findings. Section 5 concludes.
9
Duffy, Lai, and Lim (2015) study how meanings of a priori meaningless messages develop in the Battle of
the Sexes game, also a game with conflicting preferences over different equilibria. Their study is, however, not in
a sender-receiver setting and is about how subjects learn to play a correlated equilibrium aided by an external
correlation device sending meaningless messages.
10
In relation to our manipulation of the sizes of the message spaces, Blume et al. (2008) document that
restricting the message space expedites convergence when messages are a priori meaningless; Lai, Lim, and
Wang (2015) highlight the role of the message space in facilitating information transmission in multidimensional
settings; Serra-Garcia, van Damme, and Potters (2013) study the effect of limiting the message space in public
goods games with cheap-talk communication about private information. Our novel contribution relative to these
studies is to provide a new channel—a refinement of equilibria—through which message spaces may play a role
in information transmission.
2 The Experimental Cheap-Talk Games
2.1 Four Games with Literal Messages
Our first set of treatments consists of four cheap-talk games endowed with literally meaningful
messages. Putting aside the variations in the sizes of the message spaces, the four games reduce
to two games with different payoff structures, which we call Game 1 and Game 2. We first
describe these two games and provide details regarding the message spaces when we discuss
neologisms below.
There are two players in each game, a sender (he) and a receiver (she). The sender is privately
informed about his type θ ∈ {s, t}, and the common prior is that the two types are equally likely.
After observing the realized θ, the sender sends a cheap-talk message, m ∈ M, to the receiver.
Upon receiving m, the receiver takes an action a ∈ {L, C, R}.
Table 1 presents the payoffs of the games. In each payoff cell, the first number is the sender’s
payoff and the second number is the receiver’s payoff when an action is taken in a state.
(a) Game 1
            s         t
    L    30, 20    30, 20
    C    20, 30     8, 0
    R     0, 8     20, 30

(b) Game 2
            s         t
    L    50, 20    10, 20
    C    20, 30     8, 0
    R     0, 8     20, 30

Table 1: Payoffs of the Games
We design these games out of simplicity and three other considerations. First, as long as
|M| ≥ 2, Games 1 and 2 both have multiple perfect Bayesian equilibrium (henceforth “equilibrium”) outcomes with no Pareto ranking. There is thus room for equilibrium selection, and
yet Pareto dominance is ruled out as a confounding selection criterion.11 There are two equilibrium outcomes, babbling and fully revealing. In the babbling equilibrium outcome, the receiver
ignores the sender’s message, taking the ex-ante ideal action L under the equal priors; in the
fully revealing equilibrium outcome, the receiver takes different actions, C or R, after receiving
distinct messages sent by distinct types of the sender. There is no Pareto dominance as there
are conflicting preferences over the two outcomes: in both games, the sender strictly prefers the
babbling outcome to the fully revealing outcome, while the opposite is true for the receiver.
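The two equilibrium outcomes described above can be checked mechanically against Table 1. The following is a minimal sketch, not part of the experimental design: the dictionary layout and the function names (best_response, check_equilibria) are illustrative assumptions, and only pure strategies are considered.

```python
# Payoffs from Table 1, indexed as PAYOFFS[game][action][type] = (sender, receiver).
# The data layout and helper names are illustrative assumptions.
PAYOFFS = {
    "Game 1": {"L": {"s": (30, 20), "t": (30, 20)},
               "C": {"s": (20, 30), "t": (8, 0)},
               "R": {"s": (0, 8), "t": (20, 30)}},
    "Game 2": {"L": {"s": (50, 20), "t": (10, 20)},
               "C": {"s": (20, 30), "t": (8, 0)},
               "R": {"s": (0, 8), "t": (20, 30)}},
}
PRIOR = {"s": 0.5, "t": 0.5}


def best_response(game, belief):
    """Receiver's expected-payoff-maximizing action given a belief over types."""
    table = PAYOFFS[game]
    return max(table, key=lambda a: sum(p * table[a][th][1] for th, p in belief.items()))


def check_equilibria(game):
    """Return the babbling action, the actions induced under full revelation,
    and whether no sender type gains by mimicking the other type."""
    table = PAYOFFS[game]
    babbling_action = best_response(game, PRIOR)
    revealed = {th: best_response(game, {th: 1.0}) for th in PRIOR}
    no_mimicry = all(
        table[revealed[th]][th][0] >= table[revealed[other]][th][0]
        for th in PRIOR for other in PRIOR
    )
    return babbling_action, revealed, no_mimicry


for game in PAYOFFS:
    print(game, check_equilibria(game))
```

In both games the receiver's best response to the prior is L, her best responses to a revealed s and t are C and R, and neither type benefits from imitating the other, which is what sustains the fully revealing outcome.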
Our second consideration is to minimize the possibility that subjects choose to play different
equilibria in different games out of pure payoff considerations. To this end, we hold the profile of
the sender’s expected payoffs from the two equilibrium outcomes, 30 for the babbling and 20 for
the fully revealing, constant across the two games.
11
Game 1 shares the same qualitative structure as the “I Won’t Tell You” game in Farrell (1993), Game Γ2
in Matthews, Okuno-Fujiwara, and Postlewaite (1991), Game 2 in Kawagoe and Takizawa (2008) and Game 1 in
Sobel (2013). Refer to Sobel (2013) for a justification for the payoff structure of the game.
Our last consideration is to ensure that other-regarding preferences, if any, do not play a
role in the equilibrium selection. To achieve this, we assign payoffs so that in both games a
player’s expected payoff from one equilibrium outcome is the other player’s expected payoff from
the other outcome, making the payoff profiles similar to those found in the Battle of the Sexes
game.12
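The expected-payoff profiles behind the last two design considerations can be reproduced from Table 1. The sketch below is illustrative only; the function name outcome_payoffs and the data layout are assumptions, and the action plans (L after every message under babbling, C for s and R for t under full revelation) are taken from the description of the two outcomes.

```python
# Payoffs from Table 1, indexed as PAYOFFS[game][action][type] = (sender, receiver).
# The data layout and function name are illustrative assumptions.
PAYOFFS = {
    "Game 1": {"L": {"s": (30, 20), "t": (30, 20)},
               "C": {"s": (20, 30), "t": (8, 0)},
               "R": {"s": (0, 8), "t": (20, 30)}},
    "Game 2": {"L": {"s": (50, 20), "t": (10, 20)},
               "C": {"s": (20, 30), "t": (8, 0)},
               "R": {"s": (0, 8), "t": (20, 30)}},
}
PRIOR = {"s": 0.5, "t": 0.5}


def outcome_payoffs(game):
    """Ex-ante expected (sender, receiver) payoffs in the two equilibrium outcomes."""
    table = PAYOFFS[game]
    plans = {
        "babbling": {"s": "L", "t": "L"},         # receiver ignores the message
        "fully revealing": {"s": "C", "t": "R"},  # receiver learns the type
    }
    result = {}
    for name, plan in plans.items():
        sender = sum(PRIOR[th] * table[plan[th]][th][0] for th in PRIOR)
        receiver = sum(PRIOR[th] * table[plan[th]][th][1] for th in PRIOR)
        result[name] = (sender, receiver)
    return result


for game in PAYOFFS:
    print(game, outcome_payoffs(game))
```

In both games the profile is (30, 20) under babbling and (20, 30) under full revelation, so each player's expected payoff from one outcome equals the other player's expected payoff from the other outcome.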
Neologism-Proofness Applied. A neologism-proof equilibrium is an equilibrium in which a
credible neologism does not exist. To set the ground for defining credible neologisms, Farrell
(1993) assumes that for every relevant equilibrium, a neologism, being an unsent message with
literal meaning “my type is in K,” exists for every non-empty subset K of types. Relative to
a putative equilibrium, a neologism with meaning “my type is in K” is then credible, or self-signaling, if a) the sender’s types in K strictly prefer the outcome achieved when the neologism
is believed over the putative equilibrium outcome, and b) the types not in K weakly prefer to
stay in the putative equilibrium.
In designing a laboratory environment that is easy for subjects to comprehend, we look for
games that allow us to apply neologism-proofness in the simplest possible setting. Note that
Farrell’s (1993) requirement that a neologism exists for every non-empty subset of types is not
part of the definition of credible neologisms. Accordingly, neologism-proofness can still be applied
when, for a given equilibrium, a neologism exists for some but not all subsets of types. This
balance between a simple design and the applicability of the concept guides the design of our
message spaces, which feature what we call limited neologisms.
We take Games 1 and 2 and pair each of them with a message space that contains three literal messages: M = {“my type is s”, “my type is t”, “I won’t tell you my type”}. The resulting
games are called Game 1M3 and Game 2M3. We characterize the neologism-proof equilibrium
outcomes of these two games as follows:
Proposition 1. The fully revealing outcome in Game 1M3 cannot be supported as neologism-proof, whereas that in Game 2M3 can be so supported. The babbling outcome in Game 1M3 can
be supported as neologism-proof, whereas that in Game 2M3 cannot be so supported.
To show that the fully revealing outcome in Game 1M3 cannot be supported as neologism-proof, it suffices to show that one of the fully revealing equilibria is not neologism-proof. Consider
then the truth-telling equilibrium in which s and t send, respectively, “my type is s” and “my
type is t.”13 The payoffs for both types in this equilibrium are 20. Now, “I won’t tell you my
12
Yet another consideration we have is to create an environment in which the prediction of neologism-proofness
can be separated as much as possible from that of the level-k model of bounded rationality, which is commonly
applied to rationalize findings from communication games. Note that the only difference between Games 1 and 2
is the type-t sender’s ideal actions. The expected payoff from each equilibrium outcome for each player, as well
as the receiver’s ideal actions, are designed to be the same across the two games. Accordingly, for a given size
of the message space, very limited configurations of level-k models can generate different predictions for the two
games. For a supplementary level-k analysis, see Appendix A.
13
The truth-telling equilibrium is a fully revealing equilibrium in which literal meanings are used on the
type,” which literally means the sender’s type is in {s, t}, is the neologism. If the receiver believes
the literal meaning of this neologism, she will take the ex-ante ideal action L, resulting in a payoff
of 30 for the sender regardless of his type. Given that 30 > 20, both s and t, precisely the types
corresponding to the literal meaning of the neologism, prefer the outcome achieved when the
neologism is believed over the putative equilibrium outcome: the neologism is thus self-signaling.
For the truth-telling equilibrium in Game 2M3, note that when the receiver believes the
literal meaning of “I won’t tell you my type” and takes action L, only s but not t strictly prefers
the neologism to be believed: the neologism is thus not self-signaling. Consider further the
remaining two fully revealing equilibria that are not truth telling, in which either “my type is
s” or “my type is t” is the neologism. In either case, the type expressed by the literal meaning
of the neologism would not strictly prefer the outcome achieved when the neologism is believed,
because the on- and off-equilibrium path payoffs are the same for the type. The neologisms in
these non-truth-telling fully revealing equilibria are thus also not self-signaling. Accordingly, the
fully revealing outcome in Game 2M3 can be supported as neologism-proof.14
In Game 1M3, since both types of the sender receive their maximum payoffs in the babbling
outcome, it is straightforward that no credible neologisms can exist, and thus the babbling
outcome can be supported as neologism-proof. Finally, to show that the babbling outcome in
Game 2M3, which gives type s a payoff of 50 and type t a payoff of 10, cannot be supported
as neologism-proof, it suffices to consider a babbling equilibrium in which “my type is t” is a
neologism (there may be another neologism). In this case, type t strictly prefers the outcome
achieved when the neologism is believed (in which case the receiver takes R, giving type s a
payoff of 0 and type t a payoff of 20) over the putative equilibrium outcome. Type s, on the
other hand, prefers to stay in the equilibrium; together, these two facts make the neologism self-signaling.
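The case-by-case argument for Proposition 1 can be replayed with a small checker for conditions a) and b) of a credible neologism. This is a sketch under stated assumptions: the function names (best_response, is_self_signaling) and data layout are illustrative, and a receiver who believes a neologism with meaning “my type is in K” is assumed to best respond to the prior conditioned on K.

```python
# Payoffs from Table 1, indexed as PAYOFFS[game][action][type] = (sender, receiver).
# The data layout and helper names are illustrative assumptions.
PAYOFFS = {
    "Game 1": {"L": {"s": (30, 20), "t": (30, 20)},
               "C": {"s": (20, 30), "t": (8, 0)},
               "R": {"s": (0, 8), "t": (20, 30)}},
    "Game 2": {"L": {"s": (50, 20), "t": (10, 20)},
               "C": {"s": (20, 30), "t": (8, 0)},
               "R": {"s": (0, 8), "t": (20, 30)}},
}
PRIOR = {"s": 0.5, "t": 0.5}


def best_response(game, belief):
    """Receiver's expected-payoff-maximizing action given a belief over types."""
    table = PAYOFFS[game]
    return max(table, key=lambda a: sum(p * table[a][th][1] for th, p in belief.items()))


def is_self_signaling(game, K, eq_payoff):
    """Check a neologism with literal meaning 'my type is in K' against a putative
    equilibrium giving each sender type the payoff eq_payoff[type]."""
    total = sum(PRIOR[th] for th in K)
    belief = {th: PRIOR[th] / total for th in K}  # prior conditioned on K
    action = best_response(game, belief)
    gain = {th: PAYOFFS[game][action][th][0] - eq_payoff[th] for th in PRIOR}
    strictly_better_in_K = all(gain[th] > 0 for th in K)                  # condition a)
    stay_outside_K = all(gain[th] <= 0 for th in PRIOR if th not in K)    # condition b)
    return strictly_better_in_K and stay_outside_K


REVEALING = {"s": 20, "t": 20}  # sender payoffs in any fully revealing equilibrium
BABBLING = {"Game 1": {"s": 30, "t": 30}, "Game 2": {"s": 50, "t": 10}}

print(is_self_signaling("Game 1", {"s", "t"}, REVEALING))     # "I won't tell you my type"
print(is_self_signaling("Game 2", {"s", "t"}, REVEALING))
print(is_self_signaling("Game 2", {"t"}, BABBLING["Game 2"]))  # "my type is t"
```

Under these assumptions the three printed checks correspond to the self-signaling neologism in Game 1, the non-credible one in Game 2, and the self-signaling “my type is t” against babbling in Game 2.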
To generate richer treatment variations so as to identify the effects of neologisms, we introduce two more games, Game 1M2 and Game 2M2, which are Games 1 and 2 endowed with a
binary message space M′ = {“my type is s”, “my type is t”}. The neologism-proof equilibrium
outcomes of the two games are characterized as follows:
Proposition 2. Both the fully revealing and babbling outcomes in Game 1M2 can be supported as
neologism-proof. Only the fully revealing outcome in Game 2M2 can be supported as neologism-proof.
equilibrium path.
14
While the use of limited neologisms does not prevent us from applying neologism-proofness, it calls for a
different approach in showing that an equilibrium outcome can be supported as neologism-proof. In Farrell (1993),
since for any equilibrium a neologism exists for every subset of types, when a neologism-proof equilibrium exists
to support an outcome, it automatically covers all neologisms, i.e., under no neologism does self-signaling occur
to compromise the equilibrium outcome. When there are limited neologisms, however, different equilibria that
support an outcome may be associated with different (sets of) neologisms. Even if self-signaling does not occur
under the neologism(s) available in one supporting equilibrium, it may occur under another neologism in another
supporting equilibrium. Accordingly, to show that an outcome can be supported as neologism-proof, we need to
show that all its supporting equilibria with different neologisms are neologism-proof so as to cover all neologisms;
if there is one supporting equilibrium that is not neologism-proof, the outcome cannot be so supported, even
though there are other neologism-proof supporting equilibria.
For these games with binary messages, the fully revealing outcome is either supported by a
truth-telling equilibrium or by another equilibrium in which the literal meanings of the messages
are not followed on the equilibrium path. The neologism-proofness of the fully revealing outcomes in these games is a trivial consequence of there being only two messages so that in either
supporting equilibrium there is no neologism. We have thus taken the limited neologisms to the
extreme by eliminating neologisms.15 The neologism-proofness of the babbling outcomes in
Games 1M2 and 2M2 is characterized in the same way as that in Games 1M3 and 2M3.
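The counting argument behind the binary-message case can be illustrated by enumerating the sender's pure strategies and listing the messages each leaves unsent. This is a sketch only: restricting attention to pure sender strategies is an assumption made for illustration, and the names unsent_by_profile and neologism_counts are hypothetical.

```python
from itertools import product

TYPES = ("s", "t")
M2 = ("my type is s", "my type is t")       # binary message space
M3 = M2 + ("I won't tell you my type",)     # three-message space


def unsent_by_profile(messages):
    """Map each pure sender strategy (message of s, message of t) to its unsent messages."""
    return {
        profile: tuple(m for m in messages if m not in profile)
        for profile in product(messages, repeat=len(TYPES))
    }


def neologism_counts(messages):
    """Numbers of unsent messages found among separating and among pooling profiles."""
    separating, pooling = set(), set()
    for profile, unsent in unsent_by_profile(messages).items():
        target = separating if len(set(profile)) == len(TYPES) else pooling
        target.add(len(unsent))
    return separating, pooling


print(neologism_counts(M2))
print(neologism_counts(M3))
```

With two messages every separating profile uses both, so no neologism is available in a fully revealing equilibrium, whereas a pooling profile always leaves one message unsent. With three messages each separating profile leaves exactly one message unsent, which is the limited neologism discussed above.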
A couple of remarks about how the design of message spaces for these treatments relates
to Farrell’s (1993) notion of natural language and his assumed existence of neologisms are in
order. There are two main ingredients in Farrell’s (1993) notion of natural language: a) common
language, i.e., each message has a literal meaning that is associated with a type in the type space,
and b) rich language, i.e., the message space is large enough (relative to the type space) so that
for any subset K of the sender’s types, a message with the literal meaning “my type is in K”
exists. Our message spaces fulfill the first requirement: being framed in English, the messages
have clear and commonly understood literal meanings. While by most standards our message
spaces cannot be considered as large and despite the limited neologisms we induce, our message
space M is rich enough to describe all subsets of the binary types, even though some of these
messages are used in equilibrium.
In assuming the existence of neologisms, Farrell (1993) argues that a mixed-strategy equilibrium in which the sender randomizes over messages with the same equilibrium meaning is
implausible. This type of equilibrium is implausible when the sender has a preference for
short, simple, and straightforward messages.16 Our design with two different sizes of message
spaces allows us to evaluate whether the assumed existence of unsent messages is an empirically
plausible assumption. If randomization is a natural empirical tendency, having an additional
message in M relative to M′ should not affect the information transmission outcomes.
15
Our Game 1M 2 is of the same class as Game 2 in Kawagoe and Takizawa (2008). Our conclusion on the
neologism-proofness of the fully revealing outcome is, however, different from theirs. With binary types and
binary messages, there will be no unsent messages in any fully revealing equilibrium in our Game 1M 2 or in
Kawagoe and Takizawa’s (2008) Game 2. Since no neologisms exist, we consider that the fully revealing outcome
trivially survives neologism-proofness, while they consider that only the babbling outcome survives the refinement.
This difference in characterization will have implications for the interpretation of experimental findings. As will
become apparent below, subjects’ behavior in our Game 1M 2 conformed to the fully revealing equilibrium, which,
according to our view, is consistent with the prediction of neologism-proofness. On the other hand, under the
conviction that only the babbling outcome survives neologism-proofness, Kawagoe and Takizawa (2008) attribute
their similar findings to over-communication.
16
On this point, Farrell (1993) states that: “For example, if type t wants (and is expected) to reveal himself,
and if both the English sentences, ‘I am t,’ and ‘I am either u or v,’ are interpreted in equilibrium as meaning ‘I
am t,’ then S [the sender] will prefer the former” (p. 518, Section 4. Are There Unused Messages?).
2.2 Two Games with A Priori Meaningless Messages
Despite the important role of common language in neologism-proofness, Farrell (1993) also provides an evolutionary interpretation of neologisms in an environment without pre-existing language, where the meaning of a neologism must evolve. To investigate the predictive power of
neologism-proofness from this perspective, we design a second set of treatments, which consists
of two additional games with a priori meaningless messages, Game 1E and Game 2E.
Games 1E and 2E are, respectively, Games 1 and 2 endowed with an initial binary message
space M″ = {$, %}. In the course of the repetitions of subjects’ tasks, a third meaningless
message, “&”, will be introduced and made available to the sender-subjects in each game. Our
second set of treatments thus involves a within-subject variation of the message spaces (the
details of the experimental procedures will be provided in Section 3.2 below).17
Without proving additional propositions, we discuss what behavior may emerge in Games 1E
and 2E, leveraging the characterizations in Proposition 1 and anticipating that meanings will
eventually emerge in one way or the other.18 Note that our second set of treatments is more of
an exploratory nature, and the discussion below is meant to provide some behavioral restrictions
that help inform our experimental hypotheses.
We start with two plausible conjectures about how play may evolve before and immediately
after the introduction of the third message. First, given the payoff structures of the games,
it is plausible that some separation may occur under the binary message spaces, so that
distinct meanings emerge for the two initial messages, “$” and “%”.19 When the
third message, “&”, is introduced and sent, it is considered as a neologism. In the absence
of a precedent and a literal meaning, a natural initial response of the receiver is to ignore it:
our second conjecture is that, immediately after it is introduced, the receiver considers that the
neologism does not provide any information and takes the ex-ante ideal action L under the equal
priors. These conjectures and the simple analysis of initial behavior lead to different predicted
paths of play in Games 1E and 2E.
Recall that in Game 1E, both types of the sender obtain 20 from separation in a fully revealing
equilibrium and 30 from the receiver’s taking L. Accordingly, it pays for both s and t to send
17
Accordingly, Games 1E and 2E should be more properly understood as an experimental design rather than
one-shot games in a formal sense; the variation of the message space in a particular “game” (treatment) involves,
strictly speaking, two different games.
18
Farrell’s (1993) discussion regarding the evolutionary interpretation of neologism-proofness (pp. 526-527, Section 9) is based on an example and is also rather informal. Note that our design is not meant to empirically evaluate
the exact argument he makes with his example, partly because his example is based on a game with a different
equilibrium property. Rather, our objective is to investigate how well neologism-proofness might predict in the
absence of pre-existing language.
19
As an empirical precedent supporting this conjecture, Blume et al. (1998) obtain the finding that meaningful
and efficient communication (i.e., separation) endogenously emerged in a common-interest communication game
with a priori meaningless messages, where, as in our case, the type space and the message space share the same
cardinality of two.
the neologism when the receiver’s initial response to it is L. As they do, a meaning that does
not exist in the prevailing fully revealing equilibrium emerges for “&”, which is exactly that the
sender can be of either type. This in turn reinforces the receiver’s choice of L. The neologism is
thus self-signaling with respect to its evolved meaning, and we expect to see less separation after
it is introduced, echoing the fact that the fully revealing outcome in the corresponding Game
1M 3 cannot be supported as neologism-proof.
In Game 2E, the receiver’s initial response of L to “&” will only attract type s to send
the neologism. As “&” becomes a choice of message by s, its meaning evolves to be that the
sender’s type is s, to which the receiver’s best response is C. The incentive for type s to send
the neologism is then weakened, because s receives the same treatment in the prevailing fully
revealing equilibrium. The neologism also does not convey any new meaning relative to the
prevailing equilibrium. On the other hand, regardless of whether the receiver’s response to “&”
is L (initially) or C (eventually), it does not pay for type t to send the neologism. Since no
type strictly prefers to send “&”, it is not self-signaling; separation remains even with the third
message, echoing the fact that the fully revealing outcome in the corresponding Game 2M 3 can
be supported as neologism-proof.
3 Experimental Hypotheses and Procedures
3.1 Hypotheses
The three different configurations of message spaces and the two different payoff structures give
rise to a 3 × 2 treatment design, which is summarized in Table 2. The table also summarizes the
neologism-proofness characterizations presented in Section 2.
Table 2: Experimental Treatments

Message space                   Game 1                                      Game 2
Literal M, |M| = 3              Game 1M 3; neologism-proof: babbling        Game 2M 3; neologism-proof: fully revealing
Literal M′, |M′| = 2            Game 1M 2; neologism-proof: both            Game 2M 2; neologism-proof: fully revealing
Meaningless M″, |M″| = 2 → 3    Game 1E; neologism-proof: both → babbling   Game 2E; neologism-proof: fully revealing → fully revealing
In formulating the experimental hypotheses, we make comparisons within each set of treatments, one with literal messages and one with meaningless messages. For the treatments with
literal messages, we compare the frequencies of fully revealing outcomes in different games. Informed by Propositions 1 and 2, the hypotheses are based on the premise that a fully revealing
equilibrium surviving neologism-proofness is more likely to be played.
We start the comparative statics with Game 2M 2: from Game 2M 2 to Game 2M 3, we
create a neologism that is not credible. Our first hypothesis concerns the effect of the existence
of non-credible neologisms:
Hypothesis 1 (Effect of the Existence of Non-Credible Neologisms). The frequency of
fully revealing outcome in Game 2M 2 is the same as that in Game 2M 3.
From Game 2M 3 to Game 1M 3, we make the neologism credible. Our second hypothesis
concerns the effect of the credibility of neologisms:
Hypothesis 2 (Effect of the Credibility of Neologisms). The frequency of fully revealing
outcome is higher in Game 2M 3 than in Game 1M 3.
From Game 1M 3 with a credible neologism to Game 1M 2 with only two messages, we get rid
of any neologism. Our third hypothesis concerns the joint effect of the existence and credibility
of neologisms:
Hypothesis 3 (Effect of the Existence and Credibility of Neologisms). The frequency
of fully revealing outcome is lower in Game 1M 3 than in Game 1M 2.
Our last hypothesis concerns the treatments with meaningless messages. It evaluates how
the evolution of meanings, to be established empirically by the frequencies of the senders’ types
conditional on the messages sent, impacts on subjects’ play. In particular, we examine three
effects of the evolved meaning of the third, neologistic message implied by the analysis in Section
2.2:
Hypothesis 4 (Effects of the Evolution of Meanings of Credible and Non-Credible
Neologisms).
(i) In Game 1E, a meaning not present before the introduction of the third message “&”, that
the sender is equally likely to be of either type, endogenously emerges for “&”; in Game
2E, no new meaning emerges for “&” after it is introduced.
(ii) The frequency of “&” being sent is higher in Game 1E than in Game 2E.
(iii) In Game 1E, the frequency of fully revealing outcome after the introduction of “&” is lower
than that before the neologism is introduced; in Game 2E, there is no difference in such
frequencies before and after “&” is introduced.
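The empirical notion of meaning used in Hypothesis 4, the frequencies of sender types conditional on the message sent, can be illustrated with a minimal sketch. The function name and the sample observations below are hypothetical and are not drawn from the experimental data.

```python
from collections import Counter, defaultdict

def empirical_meanings(observations):
    """Return the relative frequency of each sender type conditional on each message."""
    counts = defaultdict(Counter)
    for sender_type, message in observations:
        counts[message][sender_type] += 1
    meanings = {}
    for message, by_type in counts.items():
        total = sum(by_type.values())
        meanings[message] = {t: n / total for t, n in by_type.items()}
    return meanings

# Hypothetical (type, message) pairs for illustration only.
sample = [("s", "$"), ("s", "$"), ("t", "%"), ("t", "%"), ("s", "&"), ("t", "&")]
meanings = empirical_meanings(sample)
print(meanings["&"])  # both types equally likely in this made-up sample
```

In this made-up sample, "&" is sent equally often by both types, which is the kind of evolved meaning Hypothesis 4(i) anticipates for Game 1E.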
3.2 Procedures
Our experiment was conducted in English using networked computers and z-Tree (Fischbacher,
2007) at The Hong Kong University of Science and Technology. A total of 256 subjects participated in the six treatments. The subjects, all without prior experience with our experiment,
were recruited from the undergraduate population of the university. Upon arrival at the lab,
subjects were instructed to sit at separate computer terminals. Each received a copy of the
experimental instructions. The instructions were read aloud using slide illustrations as an aid,
which was followed by a comprehension quiz and a practice round.
The two sets of treatments follow different procedures and have different instructions. Appendix B contains the instructions for Games 1M 3 (with literal messages) and 1E (with meaningless messages); the instructions for the other games are similar. In the following, we describe
the essence of the experimental procedures, starting with the literal-message treatments.
Treatments with Literal Messages. Two sessions were conducted for each game with literal
messages. Each session comprised two matching groups. A matching group consisted of
10 subjects, five as senders (Member As) and five as receivers (Member Bs). Random matching
allowing repeating partners was used in each matching group, where senders and receivers were
matched within groups. Viewing each matching group as an independent observation, we thus
have four observations per game, which provide sufficient statistical power for non-parametric
tests.
In each session, subjects participated in 20 rounds of decision making under a single treatment
condition, i.e., between-subject design was adopted for these treatments. At the beginning of a
session, half of the subjects were randomly assigned the role of Member A and the other half
the role of Member B. The role assignment remained fixed throughout the session. During the
entire session, the relevant payoff table in Table 1 was shown on each subject’s screen.
In order to elicit a rich set of information to analyze how subjects make decisions, we employed
the strategy method and elicited beliefs. Subjects were told that there was a random variable,
X, which was equally likely to take any integer value from 1 to 100. At the beginning of each round, the
Member A in each group was asked what message he/she would send to the paired Member B
if X > 50 (corresponding to θ = s) and if X ≤ 50 (corresponding to θ = t). For the games
with three messages, the available messages were “X is bigger than 50,” “X is smaller than or
equal to 50,” and “I won’t tell you” (the last message was not available in the game with two
messages). The Member B in each group was asked what action he/she would take, namely L,
C, or R, upon receiving each of the two or three messages from the paired Member A.20 Once
20
For the sake of simple experimental instructions and design, we did not allow subjects to randomize explicitly
over different messages/actions for a given contingency. Note that a strict and unique best-response action exists
for a babbling message and for any truth-telling message so that allowing explicit randomization does not result
in additional gain in generality.
the members had stated their strategies, the value of X would be realized and revealed to them.
Their strategies were then implemented based on the realized X, and the rewards for the round
would be determined. After learning the implemented message based on the realized X, the
Member A in each group was further asked to predict the paired Member B’s action choice. A
correct prediction was rewarded with two payoff points. Feedback on choices and rewards, but
not strategies, was provided at the end of each round.
For these treatments with 20 rounds of play, we randomly selected two rounds for payments.
The sum of the payoffs a subject earned in the two selected rounds was converted into Hong
Kong Dollars at a fixed and known exchange rate of HK$1 per payoff point. A show-up fee of
HK$30 was also provided. Subjects on average earned HK$76.8 (≈ US$9.85) by participating in
a session that on average lasted about an hour.21
Treatments with A Priori Meaningless Messages. Two sessions were conducted for each
game with meaningless messages. Each session comprised six matching groups. In order to
expedite the emergence of meanings while maintaining the use of random matching, we reduced
the size of a matching group to four subjects, two as Member As and two as Member Bs. The
smaller-sized matching groups also allow us to have more observations without increasing the
number of sessions per game. Viewing each matching group as an independent observation, we
have 12 observations per game in these treatments.
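The group and subject counts stated in this section can be cross-checked with a short calculation. The class and field names below are illustrative; the numbers are those given in the text (two sessions per game in both sets of treatments).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TreatmentSet:
    games: int
    sessions_per_game: int
    groups_per_session: int
    subjects_per_group: int

    def observations_per_game(self) -> int:
        # Each matching group is treated as one independent observation.
        return self.sessions_per_game * self.groups_per_session

    def subjects(self) -> int:
        return self.games * self.observations_per_game() * self.subjects_per_group

literal = TreatmentSet(games=4, sessions_per_game=2, groups_per_session=2, subjects_per_group=10)
meaningless = TreatmentSet(games=2, sessions_per_game=2, groups_per_session=6, subjects_per_group=4)

total_subjects = literal.subjects() + meaningless.subjects()
print(literal.observations_per_game(), meaningless.observations_per_game(), total_subjects)
```

The result is four and 12 independent observations per game, respectively, and 160 + 96 = 256 subjects, which agrees with the total reported for the six treatments.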
In each session, subjects participated in 40 rounds of decision making, 20 rounds with two
messages and 20 rounds with three messages. Subjects made decisions under a single treatment
condition with a within-subject variation of the message spaces. The role-assigning procedure
and the display of payoff table on subjects’ screens were the same as those in the treatments
with literal messages.
Given the focus on the emergence of meanings, we adopted the choice method for these
treatments, which allows us to collect sufficient information for the purpose. We also did not
elicit beliefs.22 Subjects were told that there was a random variable, X, whose value
would be drawn with equal probability from the integers 1 to 100. At the beginning of each round, the Member A
in each group was privately informed about whether the randomly drawn X was larger than 50
(corresponding to θ = s) or less than or equal to 50 (corresponding to θ = t). In each of the
first 20 rounds, the Member A was asked to send one of the two messages, “$” or “%”,
to the paired Member B after seeing the realized range of X.23
21
Under the Hong Kong currency board system, the HK dollar is pegged to the US dollar at the rate of
HK$7.8 to US$1.
22
While we draw analogy between the two sets of treatments, we do not perform direct comparative statics
between the games with and without literal messages. The different use of decision-elicitation method and the
omission of belief elicitations therefore do not jeopardize the integrity of our design. The different procedure used
in these treatments with meaningless messages was also driven by a practical consideration. Since subjects were
making decisions in 20 more rounds, using the strategy method and eliciting beliefs would prolong the time allocated
for a session beyond the limit stipulated by our lab policy.
23
In order to avoid subjects using the positions of the symbols “$” and “%” on their screens and in the
In the paper instructions, subjects were told that further instructions would be given for the
last 20 rounds. Thus, they were aware of the different instructions for the later rounds, but
their decisions in the first 20 rounds could not be influenced by the details of the forthcoming
instructions. After the 20th round was over, further instructions were provided on subjects’
screens. They were told that an additional message, “&”, became available for Member As to
send. Everything else remained the same as in the first 20 rounds.
In each round, after receiving a message from the paired Member A, the Member B in each
group chose one of the three actions, L, C, or R. The rewards for the round were then determined
based on the realized X and the Member B’s action choice. Feedback on choices and rewards
was provided at the end of each round.
Since subjects participated in 40 rounds of decision making in these treatments, we commensurately increased their average payments to reflect the extra time they spent. We randomly
selected three rounds, and the sum of the payoffs a subject earned in these three rounds was
converted into Hong Kong Dollars at a fixed and known exchange rate of HK$1 per payoff point.
A higher show-up fee of HK$40 was also provided. Subjects on average earned HK$105.88 (≈ US$13.57)
by participating in a session that on average lasted about two hours.
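The payment rule in both sets of treatments (a show-up fee plus the payoff points of a few randomly selected rounds at HK$1 per point) and the US-dollar conversion at the HK$7.8 peg can be sketched as follows. The function names and the constant payoff stream are hypothetical; only the fees, the numbers of paid rounds, the exchange rate, and the reported averages come from the text.

```python
import random

HKD_PER_USD = 7.8  # peg stated in the paper's footnote on the currency board

def session_payment(round_payoffs, rounds_paid, show_up_fee, rng):
    """Show-up fee plus the payoff points (HK$1 each) of randomly selected rounds."""
    paid_rounds = rng.sample(range(len(round_payoffs)), rounds_paid)
    return show_up_fee + sum(round_payoffs[i] for i in paid_rounds)

def hkd_to_usd(amount_hkd):
    return amount_hkd / HKD_PER_USD

rng = random.Random(0)
# Hypothetical subject earning 20 points in every round (illustration only).
literal_payment = session_payment([20] * 20, rounds_paid=2, show_up_fee=30, rng=rng)
meaningless_payment = session_payment([20] * 40, rounds_paid=3, show_up_fee=40, rng=rng)

# Conversion of the reported average earnings.
usd_literal = round(hkd_to_usd(76.8), 2)
usd_meaningless = round(hkd_to_usd(105.88), 2)
```

Converting the reported averages of HK$76.8 and HK$105.88 at this rate reproduces the US$9.85 and US$13.57 figures in the text.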
4 Experimental Findings
4.1 Games with Literal Messages
In this subsection, we report the findings from our first set of treatments with literal messages.
We first analyze the data obtained via the strategy method by examining the on-path play, using
it to evaluate the relevant hypotheses. We then examine the elicited strategies and beliefs, which
provide further, supporting evidence.
On-Path Plays. To extract on-path plays from the elicited strategies, we apply the elicited
strategies to the realized X to determine the realized path of play in each instance of the decision
making. We then use these realized plays to establish the frequencies of the relevant information
transmission outcomes and of the subjects’ behavior. Other than the major task of evaluating
Hypotheses 1-3, the comparisons of information transmission outcomes in games with different
sizes of message spaces will also allow us to assess the empirical plausibility of Farrell’s (1993)
assumed existence of unsent messages, which, as discussed in Section 2.1, is an essential part of
the refinement concept.
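As an illustration of how on-path plays follow from elicited strategies, the sketch below applies a sender's and a receiver's contingent plans to a realized X. The dictionary representation and function name are assumptions made for this example; the ideal actions (C for type s, R for type t) are inferred from the separating strategies described in the text.

```python
# Receiver's ideal action for each sender type (inferred from the text; an assumption here).
IDEAL_ACTION = {"s": "C", "t": "R"}

def on_path_play(sender_strategy, receiver_strategy, x):
    """Apply elicited strategies to a realized X and return the realized path of play."""
    sender_type = "s" if x > 50 else "t"
    message = sender_strategy[sender_type]
    action = receiver_strategy[message]
    ideal_taken = action == IDEAL_ACTION[sender_type]
    return sender_type, message, action, ideal_taken

# Elicited strategies of a truth-telling sender and a receiver who follows literal meanings.
sender = {"s": "X is bigger than 50", "t": "X is smaller than or equal to 50"}
receiver = {
    "X is bigger than 50": "C",
    "X is smaller than or equal to 50": "R",
    "I won't tell you": "L",
}

play_high = on_path_play(sender, receiver, 73)
play_low = on_path_play(sender, receiver, 50)
```

Only the contingency selected by the realized X enters the on-path data; the receiver's planned response to "I won't tell you" is elicited but not realized in either of these two plays.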
instructions as focal points for their choices of messages, we randomized the relative positions of the two symbols
on the decision screens and in the instructions, and this was made known to subjects. On a Member A’s decision
screen, the positions of the buttons “$” and “%” are randomized in each round. We prepared two sets of
instructions, in which the two symbols are stated in different orders. We randomly distributed the instructions
so that half of the subjects received one set of instructions and the other half received the other set.
[Figure 1 plots, for each of Games 2M 2, 2M 3, 1M 3, and 1M 2, the proportion of fully revealing equilibrium outcomes (vertical axis, 0.00 to 1.00) against the round (horizontal axis, 1 to 20).]
Figure 1: Frequencies of Fully Revealing Equilibrium Outcomes, Games 1M 2, 1M 3, 2M 2, and 2M 3
Figure 1 presents the round-by-round frequencies of fully revealing outcomes in the four games
with literal messages. The frequencies were obtained by measuring how often the receivers took
their ideal actions. Our first finding confirms Hypothesis 1:
Finding 1 (Effect of the Existence of Non-Credible Neologisms). Confirming Hypothesis
1, there was no significant difference in the frequency of fully revealing outcome between Game
2M 2 and Game 2M 3.
Using the independent group-level, all-round data, we fail to reject the null hypothesis of
there being no difference in the frequencies of fully revealing outcome between Games 2M 2 and
2M 3, which were, respectively, 55% and 51% (two-sided p = 0.486, Mann-Whitney test).24
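To make the group-level comparisons concrete, the following sketch computes an exact one-sided Mann-Whitney p-value by enumerating every way of splitting the pooled observations into two groups. It assumes that exact rather than asymptotic p-values are of interest, which the text does not specify; the data are hypothetical placeholders and the function names are illustrative.

```python
from itertools import combinations
from math import comb

def u_statistic(x, y):
    """Number of pairs (x_i, y_j) with x_i > y_j, counting ties as one half."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)

def exact_one_sided_p(x, y):
    """P(U >= observed U) over all relabelings of the pooled data (alternative: x tends to be larger)."""
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    n = len(x)
    hits = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += u_statistic(gx, gy) >= observed
    return hits / comb(len(pooled), n)

# Hypothetical group-level frequencies, four matching groups per game (not the paper's data).
group_a = [0.60, 0.55, 0.58, 0.52]
group_b = [0.30, 0.35, 0.28, 0.33]
p_value = exact_one_sided_p(group_a, group_b)
```

With four observations per game there are 70 possible splits, so the smallest attainable exact one-sided p-value is 1/70 ≈ 0.0143, reached when the two samples are completely separated as in this made-up example.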
While the existence of a non-credible neologism had no overall impact on the frequencies of
fully revealing outcome, it did affect the senders’ and the receivers’ behavior. Figure 2 presents
the frequencies of messages conditional on types and of actions conditional on messages.25
Using the observations from Game 2M 2 as a point of reference, in Game 2M 3 the introduction of the non-credible neologism (relative to the truth-telling equilibrium), “I won’t tell you,”
attracted deviating behavior on the senders’ part. The senders in Game 2M 2 exhibited truth-telling behavior, where s-types sent “my type is s” 86% of the time and t-types sent “my type is
t” 79% of the time. In Game 2M 3, the former frequency dropped significantly to 38% (p = 0.014,
Mann-Whitney test) and the latter dropped, though not significantly, to 61% (p = 0.243, Mann-Whitney test).
Furthermore, in Game 2M 3, s-types and t-types used the neologism “I won’t tell you” 58% and
31% of the time respectively. In response to “I won’t tell you,” the receivers in Game 2M 3 took
24
As Figure 1 reveals, there was no notable convergence in subjects’ behavior. We therefore use all-round
data for our statistical tests. Unless otherwise indicated, the p values are from one-sided tests. We consider a
comparison with p < 0.06 as being statistically significant.
25
To illustrate how these frequencies are calculated based on on-path plays, suppose that in a round a) a sender
furnishes the following strategy: sending “X is bigger than 50” if X > 50 and sending “X is smaller than or equal
to 50” if X ≤ 50; and b) a receiver furnishes the following strategy: taking action C after receiving message “X
is bigger than 50,” taking R after receiving “X is smaller than or equal to 50,” and taking L after receiving “I won’t
tell you.” If it turns out that in this round the realized X > 50, then one data point of “my type is s” being sent
by s and one data point of action C being taken after “my type is s” is received will be recorded.
[Figure 2 has two panels, each reporting proportions (0.00 to 1.00) for Games 2M 2, 2M 3, 1M 3, and 1M 2. Left panel, “Proportions of Type-Message Realized Pairs”: for each type (s, t), the proportion of each message (“My type is s,” “My type is t,” “I won’t tell you”). Right panel, “Proportions of Message-Action Realized Pairs”: for each message, the proportion of each action (L, C, R).]
Figure 2: Senders’ and Receivers’ Behavior, Games 1M 2, 1M 3, 2M 2, and 2M 3
L 55% of the time. Even though theoretically speaking “I won’t tell you” is not a self-signaling
neologism in Game 2M 3, s-types sent it to induce—to a certain extent successfully—the receivers
to take the ex-ante ideal action so as to obtain the maximal payoff of 50.
Unlike the case of the senders, the presence of a non-credible neologism did not attract more
deviating behavior from the receivers. In fact, compared to the responses in Game 2M 2, in
Game 2M 3 the receivers’ responses to “my type is s” and “my type is t” were more in line with
the truth-telling equilibrium behavior (i.e., taking actions as if the sender was telling the truth).
The frequency of action C conditional on “my type is s” increased significantly from 62% in
Game 2M 2 to 80% in Game 2M 3 (p = 0.014, Mann-Whitney test); the frequency of action R
conditional on “my type is t” increased from 70% in Game 2M 2 to 72% in Game 2M 3, though
not significantly (p = 0.557, Mann-Whitney test).
In terms of generating deviating behavior, the neologism “I won’t tell you,” being non-credible
in Game 2M 3, affected the senders but not the receivers. Relative to the observations from Game
2M 2, the more frequent deviating behavior of the senders was more or less compensated by the
more frequent truth-telling equilibrium behavior of the receivers, resulting in the same overall
frequencies of fully revealing outcome between the two games.26
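The arithmetic behind footnote 26 can be sketched as follows: if senders reveal truthfully and receivers respond optimally at given rates, and the two choices are treated as independent (an assumption of this sketch), the frequency of fully revealing outcomes is the type-weighted product of the two rates. The function name and the dictionary layout are illustrative.

```python
def fully_revealing_rate(sender_rates, receiver_rates, prior_s=0.5):
    """Type-weighted probability that the sender reveals and the receiver best-responds.

    Assumes sender and receiver choices are independent; prior_s is the prior on type s.
    """
    p_s = sender_rates["s"] * receiver_rates["s"]
    p_t = sender_rates["t"] * receiver_rates["t"]
    return prior_s * p_s + (1 - prior_s) * p_t

# Footnote 26's example: 80% conforming behavior on each side, equal priors.
footnote_example = fully_revealing_rate({"s": 0.8, "t": 0.8}, {"s": 0.8, "t": 0.8})
print(round(footnote_example, 2))
```

The product of two individually high conformity rates is noticeably lower than either rate, which is why outcome frequencies in the 50% range need not indicate poor adherence to the equilibrium.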
We report our second finding, which confirms Hypothesis 2:
Finding 2 (Effect of the Credibility of Neologisms). Confirming Hypothesis 2, the frequency of fully revealing outcome was significantly higher in Game 2M 3 than in Game 1M 3.
The frequency of fully revealing outcome in Game 1M 3 was 32%, significantly lower than the
51% in Game 2M 3 (p = 0.029, Mann-Whitney test). This is consistent with our premise that
26
The frequencies of fully revealing outcomes observed in Games 2M 2 and 2M 3, being 55% and 51%, may
appear to be on the low side, calling into question how well subjects played according to the equilibrium. Note,
however, that even with reasonably high frequencies of conforming behavior, e.g., that the senders truthfully
reveal 80% of the time and the receivers optimally respond 80% of the time, the resulting frequencies of fully
revealing outcomes would be substantially lower at 64%. Also, our primary interests are in the comparative
statics, not the absolute levels of the frequencies.
the equilibrium that survives neologism-proofness is more likely to be played in the laboratory.
The credibility of the neologism, however, had only small impacts on the senders’ uses of the
neologism. For t-types, the frequency of “I won’t tell you” increased insignificantly from 31% in
Game 2M 3 to 45% in Game 1M 3 (p = 0.100, Mann-Whitney test); for s-types, the frequency of
“I won’t tell you” decreased insignificantly from 58% to 45% (p = 0.100, Mann-Whitney test).
By contrast, the receivers’ behavior was more significantly influenced by the credibility of the
neologism; we observed a lower extent of truth-telling equilibrium behavior when the neologism
became credible. The frequency of action C conditional on “my type is s” decreased significantly
from 80% in Game 2M 3 to 54% in Game 1M 3 (p = 0.057, Mann-Whitney test); the frequency
of R conditional on “my type is t” decreased, though insignificantly, from 72% in Game 2M 3 to
56% in Game 1M 3 (p = 0.243, Mann-Whitney test).
The two properties of neologisms, their existence and credibility, therefore had totally opposite
implications for laboratory behavior: in terms of generating deviating behavior, the credibility
of neologisms affected, opposite to the case of existence, the receivers but not the senders. This
finding also suggests that the lower adherence to fully revealing outcome in Game 1M 3 relative
to the observations from Game 2M 3 was a result of more frequent deviations by the receivers
(with the senders’ behavior showing minimal variation between the two games).
We report our last finding from the literal-message treatments, which confirms Hypothesis 3:
Finding 3 (Effect of the Existence and Credibility of Neologisms). Confirming Hypothesis 3, the frequency of fully revealing outcome was significantly lower in Game 1M 3 than in
Game 1M 2.
The frequency of fully revealing outcome in Game 1M 2 was 67%, significantly higher than the
32% in Game 1M 3 (p = 0.015, Mann-Whitney test). For the senders’ behavior, the frequency
at which s-types sent “my type is s” increased significantly from 45% in Game 1M 3 to 85% in
Game 1M 2 (p = 0.015, Mann-Whitney test); the frequency at which t-types sent “my type is t”
increased significantly from 40% in Game 1M 3 to 85% in Game 1M 2 (p = 0.014, Mann-Whitney
test). For the receivers’ behavior, the frequency of C conditional on “my type is s” increased
insignificantly from 54% in Game 1M 3 to 71% in Game 1M 2 (p = 0.171, Mann-Whitney test);
the frequency of R conditional on “my type is t” increased significantly from 56% in Game 1M 3
to 84% in Game 1M 2 (p = 0.029, Mann-Whitney test).
Getting rid of any neologism increased the frequencies of truth-telling equilibrium behavior
of both the senders and the receivers, and this resulted in the receivers taking their ideal actions
significantly more often in Game 1M 2 than in Game 1M 3.
To conclude our on-path-play analysis, we further note that the different information transmission outcomes observed under two different sizes of message spaces, as we see in Games 1M 2
and 1M 3, undermine the idea that randomization is a natural empirical tendency. This observation lends force to the empirical plausibility of Farrell’s (1993) assumed existence of unsent
messages. Even though there was no significant difference between the frequencies of fully revealing outcome in Games 2M 2 and 2M 3, the fact that the underlying senders’ and receivers’
behavior differed in the two games also rules out randomization as a plausible phenomenon.
Elicited Strategies and Beliefs. We proceed to examine the elicited strategies, providing
further support for the above on-path-play findings. Senders’ elicited beliefs about the receivers’
responses will also be examined.
In analyzing the senders’ elicited strategies, we classify the strategies into the following four
categories: literal babbling (“I won’t tell you” chosen for both s and t), non-revealing (the same
message chosen for both s and t), truth-telling (“My type is s” chosen for s and “My type is
t” chosen for t), and fully revealing (different messages chosen for s and t).27 For the receivers,
we classify their elicited strategies into two categories: pooling (action L chosen for all available
messages) and separating (action C chosen for message “my type is s,” R chosen for “my type
is t,” and, for three-message games, any one of the actions chosen for “I won’t tell you”). Figure
3 presents the frequencies of the senders’ and the receivers’ strategy categories.
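The classification of elicited strategies described above can be sketched as follows. The representation of a strategy as a dictionary and the residual receiver category "other" are assumptions of this sketch; the category definitions follow the text, including the fact that sender categories may overlap.

```python
BABBLE = "I won't tell you"
TRUTH = {"s": "My type is s", "t": "My type is t"}

def sender_categories(strategy):
    """Return every category a sender's plan (type -> message) falls into."""
    categories = set()
    if strategy["s"] == strategy["t"]:
        categories.add("non-revealing")
        if strategy["s"] == BABBLE:
            categories.add("literal babbling")
    else:
        categories.add("fully revealing")
        if strategy == TRUTH:
            categories.add("truth-telling")
    return categories

def receiver_category(strategy):
    """Classify a receiver's plan (message -> action) as pooling or separating."""
    if all(action == "L" for action in strategy.values()):
        return "pooling"
    if strategy[TRUTH["s"]] == "C" and strategy[TRUTH["t"]] == "R":
        return "separating"  # any action is allowed after the third message
    return "other"  # residual label introduced for this sketch only

mixed_sender = sender_categories({"s": BABBLE, "t": "My type is t"})
```

A sender who chooses "I won't tell you" for s and "My type is t" for t is classified as fully revealing but not truth-telling, which is the kind of strategy that opens a gap between the two frequencies.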
[Figure 3 has two panels, each reporting the frequencies (proportions, 0.00 to 1.00) of strategy categories for Games 2M 2, 2M 3, 1M 3, and 1M 2. Left panel, “Senders’ Strategy Categories”: Literal Babbling, Non-Revealing, Truth Telling, Fully Revealing. Right panel, “Receivers’ Strategy Categories”: Pooling (L, L, L)/(L, L); Separating (C, R, *)/(C, R); Separating (C, R, L); Separating (C, R, C); Separating (C, R, R).]
Figure 3: Senders’ and Receivers’ Strategies, Games 1M 2, 1M 3, 2M 2, and 2M 3
In Game 2M 2, almost all fully revealing strategies of the senders were truth telling (82% vs.
76%). This is to be contrasted with Game 2M 3, in which the frequencies of fully revealing and
truth-telling strategies were, respectively, 64% and 31%. This gap suggests that some senders in
Game 2M 3 chose to send “I won’t tell you” for one type and another message for the other type.28
The on-path-play finding that the non-credible neologism attracted deviating behavior from the
senders was also apparent from the significant drop in the frequency of truth-telling strategies
from 76% in Game 2M 2 to 31% in Game 2M 3 (p = 0.0143, Mann-Whitney test).
27
Note that some of these categories are not mutually exclusive: a literal babbling strategy is also non-revealing,
while a truth-telling strategy is also fully revealing.
28
The on-path plays shown in the left panel of Figure 2 suggest that these senders chose “I won’t tell you” for
s and “My type is t” for t, which would effectively reveal their types.
Consistent with the fact that the fully revealing outcome in Game 1M 3 cannot be supported
as neologism-proof, there were more instances of literal babbling in Game 1M 3 than in Game
2M 3 (37% vs. 29%). The difference was, however, not significant (p = 0.3316, Mann-Whitney
test), echoing the on-path-play finding that the effect of the credibility of a neologism on the
senders was present but overall not significant. Without any neologism, the frequencies of fully
revealing and truth-telling strategies in Game 1M 2 (77% and 71%) were as high as those in
Game 2M 2 (82% and 76%) and significantly higher than those in Game 1M 3 (51% and 33%;
p = 0.0143, Mann-Whitney tests), again echoing the corresponding on-path-play findings.
For the receivers, the existence of a non-credible neologism in Game 2M3 led to a greater
adoption of separating strategies, although the increase was not significant: the frequency of this
strategy category increased from 58% in Game 2M2 to 72% in Game 2M3 (p = 0.1918, Mann-Whitney test). This non-significant increase was, however, in line with the on-path-play finding
that the presence of a non-credible neologism had a less significant impact on the receivers than
on the senders.
Further echoing the finding that the credibility of the neologism contributed to deviating
behavior from the receivers, a significantly lower frequency of separating strategies was observed
in Game 1M3 (45% vs. 72% in Game 2M3; p = 0.0571, Mann-Whitney test); a higher frequency
of pooling strategies, though not significant, was also observed in Game 1M3 (28% vs. 14% in
Game 2M3; p = 0.1547, Mann-Whitney test). Finally, the frequency of separating strategies was
72% in Game 1M2, significantly higher than the 45% in Game 1M3 (p = 0.0147, Mann-Whitney
test). Overall, the analysis of elicited strategies confirms that the on-path plays were the results of
subjects playing the corresponding strategies.
[Figure 4: bar chart titled "Senders' Predictions of Receivers' Responses Conditional on Senders' Strategy Categories", with vertical axis "Proportion" (0.00 to 1.00), horizontal axis "Strategy Category-Prediction", and one bar per game (Games 1M2, 1M3, 2M2, and 2M3). For each of the senders’ strategy categories (Literal Babbling, Non-Revealing, Truth Telling, Fully Revealing), the chart shows the proportions of Separating and Pooling predictions.]
Figure 4: Senders’ Predictions of Receivers’ Responses, Games 1M2, 1M3, 2M2, and 2M3
Figure 4 presents the elicited senders’ beliefs about the receivers’ responses, which are classified into either separating or pooling.29 In presenting the frequency bars of the predicted receivers’
29
Recall that the senders’ beliefs were elicited after they were informed about their types, and each sender was
asked to predict his/her paired receiver’s response to the message implemented given his/her realized type (and
the furnished strategy). Accordingly, in characterizing the senders’ predictions of the receivers’ responses, we
define separating and pooling with respect to the realized type: a predicted response is classified as separating if
responses, we categorize them by the senders’ strategy categories defined above. This helps make
clear that the senders’ strategies were, to varying degrees, consistent with their reported beliefs.
In all four games, more than 50% of the predicted receivers’ responses, ranging from 54% to
92%, were best responses to the senders’ adopted literal babbling or truth-telling strategies.
Even when considering the more encompassing sets of non-revealing and fully-revealing strategies, the corresponding percentages were above 40% and as high as 83%. These findings suggest
that what we observed in the laboratory was largely a manifestation of purposeful behavior;
out of the anticipation that the receivers would best respond accordingly, some of the senders
adopted the truth-telling/fully revealing strategies and some adopted the babbling strategies.30
We nevertheless note the unique finding from Game 1M3, in which substantial proportions
(up to 45%) of the predicted receivers’ responses were pooling when the senders adopted the
truth-telling or fully revealing strategies. These senders revealed their types even when they
anticipated that their messages would be ignored. Recall that Game 1M3 has the unique property
among the four games with literal messages that its fully revealing outcome is the only one that
is not neologism-proof. The senders’ pooling predictions signaled their anticipation that the
receivers understood that they had no incentive to reveal truthfully in the presence of a credible
neologism. While any communication strategy would be a best response to the belief that the
messages would be ignored, these senders chose to break the indifference by being truthful. This
suggests that, consistent with the lying aversion documented in prior experimental studies, some
of our subjects might have a homegrown preference for telling the truth.
4.2 Games with A Priori Meaningless Messages
In this subsection, we report findings from our second set of treatments with a priori meaningless
messages, focusing on the three hypothesized effects in Hypothesis 4. Not surprisingly, in the
absence of a pre-existing common language, we observed a greater variety in the uses and interpretations of messages across different matching groups. To provide a comprehensive picture of
our findings, we therefore make individual group behavior the focus of our analysis. Table 3 reports, for each matching group in Games 1E and 2E, the frequencies of messages conditional on
types and of types conditional on messages. The former frequency captures how the senders used
the messages, and the latter helps determine the endogenous, empirical meanings of messages.
The frequencies are aggregated separately across the first 20 rounds and the last 20 rounds.
In order to investigate whether a new meaning emerges for the third message [Hypothesis
the sender anticipates that 1) the receiver will take C after observing his/her type to be s, or 2) the receiver will
take R after observing his/her type to be t; a predicted response is classified as pooling if the sender anticipates
that the receiver will take L after observing his/her own type to be either s or t.
30
Refer to Appendix A for a level-k analysis in which we explore an alternative rationalization of our findings.
While complementarities exist between neologism-proofness and the level-k approach, a natural specification of
level-k models for communication games fails to explain some of our findings from Games 1M3 and 2M2.
4(i)], it is imperative to first ascertain the endogenous meanings of the two initial messages.
Amid the heterogeneous behavior across matching groups, our strategy is not to look for general
empirical regularity but to explore whether individual observations consistent with the predicted
emergence of meanings exist. For each game, we first identify the matching groups in which
separation occurred and distinct meanings emerged for the two messages in the first 20 rounds.
We then evaluate among these groups the meaning of the third message in the last 20 rounds.
We apply a simple criterion to the first-20-round frequencies of types conditional on messages
to determine whether distinct meanings have emerged for the two initial messages.31 For Game
1E, we conclude that distinct meanings could be established for “$” and “%” in the first 20
rounds in 6 out of the 12 observations (matching groups). These identified observations are
highlighted in Table 3.32 Observations 2-2 (Session-Group) and 2-5 represent, respectively, the
strongest and the weakest examples. In the first 20 rounds of Observation 2-2, message “$” was
sent exclusively by type t—the frequency of t conditional on “$” was 100%—and 87% of the time
message “%” was sent by type s, which strongly established the empirical meaning of “$” to be
t and that of “%” to be s.33 For Observation 2-5, 70% of the time “$” was sent by s and 65%
of the time “%” was sent by t, establishing the meaning of “$” to be s and that of “%” to be
t. Note that different meanings emerged in these two observations, but the mapping [“%” ↦ s
and “$” ↦ t] was the dominant finding: it was established not only for Observation 2-2 but also
for the other four observations, 1-4, 1-6, 2-4, and 2-6.34
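The criterion described in footnote 31 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function and variable names are hypothetical, the inputs are the first-20-round frequencies of type s conditional on each message for Game 1E as reported in Table 3, the frequency of t is assumed to be one minus that of s, and a tie between the two types is assumed to yield no candidate meaning.

```python
# Illustrative sketch of the footnote-31 criterion (names are hypothetical).
# Values: Freq.(s | message) in the first 20 rounds of Game 1E, from Table 3.
GAME_1E_FIRST_20 = {
    "1-1": {"$": 0.58, "%": 0.21}, "1-2": {"$": 0.52, "%": 0.53},
    "1-3": {"$": 0.60, "%": 0.50}, "1-4": {"$": 0.11, "%": 0.76},
    "1-5": {"$": 0.25, "%": 0.54}, "1-6": {"$": 0.37, "%": 0.76},
    "2-1": {"$": 0.50, "%": 0.43}, "2-2": {"$": 0.00, "%": 0.87},
    "2-3": {"$": 0.32, "%": 0.58}, "2-4": {"$": 0.29, "%": 0.78},
    "2-5": {"$": 0.70, "%": 0.35}, "2-6": {"$": 0.20, "%": 0.63},
}

LOW, HIGH = 0.60, 0.70  # thresholds stated in footnote 31


def candidate(p_s):
    """Return (type, frequency) for the more frequent type, or (None, p_s) on a tie."""
    p_t = round(1.0 - p_s, 2)  # assumption: the two conditional frequencies sum to one
    if p_s == p_t:
        return None, p_s
    return ("s", p_s) if p_s > p_t else ("t", p_t)


def established_meanings(p_s_given_message):
    """Return {"$": type, "%": type} if distinct meanings are established, else None."""
    type_a, freq_a = candidate(p_s_given_message["$"])
    type_b, freq_b = candidate(p_s_given_message["%"])
    # Necessary condition: the two candidates exist and differ.
    if type_a is None or type_b is None or type_a == type_b:
        return None
    # Sufficient condition: lower frequency at least 60%, higher at least 70%.
    if min(freq_a, freq_b) >= LOW and max(freq_a, freq_b) >= HIGH:
        return {"$": type_a, "%": type_b}
    return None


established = {
    obs: meanings
    for obs, freqs in GAME_1E_FIRST_20.items()
    if (meanings := established_meanings(freqs)) is not None
}
print(sorted(established))
```

Under these assumptions the sketch selects the same six observations as the text and assigns Observation 2-5 the alternative mapping.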
31
Our criterion, though rather arbitrary, requires reasonably clear differentiation between the uses of the two
messages while providing leeway for imperfect subjects’ behavior. For each message, the type with the higher
conditional frequency is considered the candidate for the meaning of the message. The necessary condition for
distinct meanings is that the two candidates, one for each message, cannot be the same. (We use the empirical
frequencies of s and t rather than their theoretical equal priors in the calculation of the conditional frequencies,
which makes it possible that the higher conditional frequencies for both messages rest on the same type.) In
concluding whether the distinct candidates are indeed deemed to be the meanings, we further require the sufficient
condition that the lower of the two conditional frequencies, one for each candidate, be at least 60% and the higher
be at least 70%. The ensuing examples in the main text illustrate our criterion.
32
The shaded observations in Table 3 represent the observations in which individual group behavior was
consistent with the theoretical predictions, either only in the first 20 rounds or in all 40 rounds.
33
Note how this can be reconciled with the frequencies of messages conditional on types via Bayes’ rule.
Table 3 reveals that, in the first 20 rounds of Observation 2-2, 100% of the messages sent by s-types were “%”
and 85% of the messages sent by t-types were “$”. Since s-types never sent “$” while t-types did, the frequency
of t conditional on “$” must be 100%. On the other hand, since “%” was sent by both types, but the frequency
by s was higher, the frequency of s conditional on “%” should also be higher than that of t conditional on “%”.
34
Observation 1-1 was close but did not meet our criterion for the establishment of meanings. It is, however,
another observation in which subjects used messages in a way consistent with the alternative meaning mapping.
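The Bayes' rule reconciliation described in footnote 33 can be illustrated numerically. The sketch below is hypothetical (the function name and the equal type shares are assumptions; the paper uses the empirical type frequencies, which are not reported in this section), and it takes the message frequencies for Observation 2-2 quoted in that footnote as inputs.

```python
def freq_s_given_message(msg_given_s, msg_given_t, share_s=0.5):
    """Bayes' rule: frequency of type s conditional on a message.

    share_s is the share of s-types among senders; 0.5 is an assumption here.
    """
    joint_s = msg_given_s * share_s
    joint_t = msg_given_t * (1.0 - share_s)
    return joint_s / (joint_s + joint_t)


# Observation 2-2, Game 1E, first 20 rounds (footnote 33):
# s-types sent "%" 100% of the time and "$" never;
# t-types sent "$" 85% of the time and "%" 15% of the time.
s_given_percent = freq_s_given_message(msg_given_s=1.00, msg_given_t=0.15)
s_given_dollar = freq_s_given_message(msg_given_s=0.00, msg_given_t=0.85)

print(round(s_given_percent, 2))       # frequency of s conditional on "%"
print(round(1.0 - s_given_dollar, 2))  # frequency of t conditional on "$"
```

With equal shares the implied frequency of s conditional on “%” rounds to 87% and that of t conditional on “$” is 100%, in line with the figures quoted in the main text.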
Table 3: Frequencies of Messages Conditional on Types and of Types Conditional on Messages, Games 1E and 2E

Game 1E, First 20 Rounds
           Freq.(Message|Type)       Freq.(Type|Message)
           Type s      Type t        “$”                 “%”
Sess.-Grp. “$”   “%”   “$”   “%”     Sent  s     t       Sent  s     t
1-1        0.83  0.17  0.50  0.50    0.65  0.58  0.42    0.35  0.21  0.79
1-2        0.62  0.38  0.63  0.37    0.62  0.52  0.48    0.38  0.53  0.47
1-3        0.55  0.45  0.44  0.56    0.50  0.60  0.40    0.50  0.50  0.50
1-4        0.11  0.89  0.77  0.23    0.48  0.11  0.89    0.52  0.76  0.24
1-5        0.24  0.76  0.52  0.48    0.40  0.25  0.75    0.60  0.54  0.46
1-6        0.30  0.70  0.71  0.29    0.48  0.37  0.63    0.52  0.76  0.24
2-1        0.33  0.67  0.27  0.73    0.30  0.50  0.50    0.70  0.43  0.57
2-2        0.00  1.00  0.85  0.15    0.43  0.00  1.00    0.57  0.87  0.13
2-3        0.56  0.44  0.79  0.21    0.70  0.32  0.68    0.30  0.58  0.42
2-4        0.22  0.78  0.71  0.29    0.43  0.29  0.71    0.57  0.78  0.22
2-5        0.67  0.33  0.32  0.68    0.50  0.70  0.30    0.50  0.35  0.65
2-6        0.10  0.90  0.42  0.58    0.25  0.20  0.80    0.75  0.63  0.37
Mean       –     –     –     –       0.48  –     –       0.52  –     –

Game 1E, Last 20 Rounds
           Freq.(Message|Type)                     Freq.(Type|Message)
           Type s              Type t              “$”                 “%”                 “&”
Sess.-Grp. “$”   “%”   “&”     “$”   “%”   “&”     Sent  s     t       Sent  s     t       Sent  s     t
1-1        0.32  0.00  0.68    0.62  0.00  0.38    0.48  0.32  0.68    0.00  N/A   N/A     0.53  0.62  0.38
1-2        0.59  0.18  0.23    0.33  0.33  0.33    0.47  0.68  0.32    0.25  0.40  0.60    0.28  0.45  0.55
1-3        0.85  0.10  0.05    0.05  0.55  0.40    0.44  0.94  0.06    0.33  0.15  0.85    0.23  0.11  0.89
1-4        0.05  0.14  0.82    0.50  0.00  0.50    0.25  0.10  0.90    0.08  1.00  0.00    0.67  0.67  0.33
1-5        0.05  0.38  0.57    0.32  0.42  0.26    0.18  0.14  0.86    0.40  0.50  0.50    0.42  0.71  0.29
1-6        0.00  0.45  0.55    0.67  0.06  0.28    0.30  0.00  1.00    0.28  0.91  0.09    0.42  0.71  0.29
2-1        0.15  0.23  0.62    0.21  0.29  0.50    0.18  0.57  0.43    0.25  0.60  0.40    0.57  0.70  0.30
2-2        0.00  0.39  0.61    0.59  0.00  0.41    0.33  0.00  1.00    0.18  1.00  0.00    0.49  0.55  0.45
2-3        0.06  0.35  0.59    0.22  0.09  0.69    0.15  0.17  0.83    0.20  0.75  0.25    0.65  0.38  0.62
2-4        0.06  0.67  0.28    0.41  0.00  0.59    0.25  0.10  0.90    0.30  1.00  0.00    0.45  0.28  0.72
2-5        0.32  0.09  0.59    0.33  0.17  0.50    0.33  0.54  0.46    0.13  0.40  0.60    0.49  0.59  0.41
2-6        0.04  0.42  0.54    0.44  0.31  0.25    0.20  0.13  0.87    0.38  0.67  0.33    0.42  0.76  0.24
Mean       –     –     –       –     –     –       0.30  –     –       0.23  –     –       0.47  –     –

Game 2E, First 20 Rounds
           Freq.(Message|Type)       Freq.(Type|Message)
           Type s      Type t        “$”                 “%”
Sess.-Grp. “$”   “%”   “$”   “%”     Sent  s     t       Sent  s     t
1-1        0.24  0.76  0.78  0.22    0.55  0.18  0.82    0.45  0.72  0.28
1-2        0.14  0.86  0.67  0.33    0.38  0.20  0.80    0.62  0.76  0.24
1-3        0.05  0.95  0.68  0.32    0.35  0.07  0.93    0.65  0.77  0.23
1-4        0.05  0.95  0.89  0.11    0.45  0.06  0.94    0.55  0.91  0.09
1-5        0.10  0.90  0.95  0.05    0.50  0.10  0.90    0.50  0.95  0.05
1-6        0.00  1.00  0.95  0.05    0.45  0.00  1.00    0.55  0.95  0.05
2-1        0.26  0.74  0.76  0.24    0.48  0.32  0.68    0.52  0.81  0.19
2-2        0.40  0.60  0.56  0.44    0.50  0.30  0.70    0.50  0.45  0.55
2-3        0.25  0.75  0.62  0.38    0.48  0.21  0.79    0.52  0.57  0.43
2-4        0.52  0.48  0.41  0.59    0.48  0.63  0.37    0.52  0.52  0.48
2-5        0.00  1.00  1.00  0.00    0.50  0.00  1.00    0.50  1.00  0.00
2-6        0.00  1.00  1.00  0.00    0.62  0.00  1.00    0.38  1.00  0.00
Mean       –     –     –     –       0.48  –     –       0.52  –     –

Game 2E, Last 20 Rounds
           Freq.(Message|Type)                     Freq.(Type|Message)
           Type s              Type t              “$”                 “%”                 “&”
Sess.-Grp. “$”   “%”   “&”     “$”   “%”   “&”     Sent  s     t       Sent  s     t       Sent  s     t
1-1        0.05  0.62  0.33    0.63  0.11  0.26    0.33  0.08  0.92    0.37  0.87  0.13    0.30  0.58  0.42
1-2        0.00  0.95  0.05    0.75  0.25  0.00    0.38  0.00  1.00    0.59  0.79  0.21    0.03  1.00  0.00
1-3        0.00  0.36  0.64    0.62  0.00  0.38    0.40  0.00  1.00    0.13  1.00  0.00    0.47  0.47  0.53
1-4        0.00  0.76  0.24    0.83  0.00  0.17    0.47  0.00  1.00    0.33  1.00  0.00    0.20  0.50  0.50
1-5        0.00  0.57  0.43    1.00  0.00  0.00    0.65  0.00  1.00    0.20  1.00  0.00    0.15  1.00  0.00
1-6        0.04  0.70  0.26    0.76  0.18  0.06    0.35  0.07  0.93    0.47  0.84  0.16    0.18  0.86  0.14
2-1        0.12  0.59  0.29    0.69  0.09  0.22    0.45  0.11  0.89    0.30  0.83  0.17    0.25  0.50  0.50
2-2        0.16  0.42  0.42    0.38  0.13  0.49    0.25  0.40  0.60    0.30  0.83  0.17    0.45  0.56  0.44
2-3        0.16  0.68  0.16    0.66  0.05  0.29    0.43  0.18  0.82    0.35  0.93  0.07    0.23  0.33  0.67
2-4        0.65  0.30  0.05    0.25  0.45  0.30    0.44  0.72  0.28    0.38  0.40  0.60    0.18  0.14  0.86
2-5        0.00  1.00  0.00    0.96  0.00  0.04    0.59  0.00  1.00    0.38  1.00  0.00    0.03  0.00  1.00
2-6        0.00  0.24  0.76    0.68  0.00  0.32    0.33  0.00  1.00    0.13  1.00  0.00    0.54  0.73  0.27
Mean       –     –     –       –     –     –       0.42  –     –       0.33  –     –       0.25  –     –

Note: “Freq.(Message|Type)”, the frequency of message conditional on type, is used to measure how the senders used the messages. “Freq.(Type|Message)”, the frequency of type conditional on message, is used to determine the endogenous, empirical meanings of messages implied by the senders’ uses of messages; “N/A” indicates the cases in which this conditional frequency cannot be calculated because the message in question was not used at all. The additional columns “Sent” provide the total frequency at which the particular message was sent by the senders (received by the receivers; same as the frequencies reported under “Received” in Table 4). Means are reported only for these “Sent” frequencies, because alternative uses of the a priori meaningless messages in different matching groups render the means for other frequencies not informative about the average behavior of the senders. The shaded observations represent the ones in which individual group behavior was consistent with the theoretical predictions, either only in the first 20 rounds or in all 40 rounds.
For the last 20 rounds, recall that the third message, “&”, is predicted to be self-signaling in
Game 1E. Given that separation has occurred with the two initial messages, it pays for both s
and t to send “&” when it becomes available as a neologism, which leads to a new endogenous
meaning that both types are equally likely. Using a criterion consistent with the one applied to
the first 20 rounds, we conclude that in 2 out of the 6 observations identified above, 2-2 and 2-5,
this new meaning emerged for “&”.35 Observation 2-2 remained the strongest example
in the second half of the sessions. Table 3 reveals that in Observation 2-2 the frequencies of
s and t conditional on “&” were approximately equal (respectively 55% and 45%) and were
clearly different from the 87% frequency of s (conditional on “%”) and the 100% frequency of t
(conditional on “$”) in the first 20 rounds, indicating that the endogenous meaning of “&” was
not present before it was introduced.
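The "approximately equal" check from footnote 35 (within 50% ± 10%) can be sketched as follows. The names are hypothetical, and the inputs are the last-20-round frequencies of s conditional on “&” reported in Table 3 for the six Game 1E observations with established meanings in the first 20 rounds.

```python
# Freq.(s | "&") in the last 20 rounds of Game 1E (Table 3), restricted to the
# six observations with established meanings in the first 20 rounds.
S_GIVEN_AMPERSAND = {
    "1-4": 0.67, "1-6": 0.71, "2-2": 0.55,
    "2-4": 0.28, "2-5": 0.59, "2-6": 0.76,
}


def approximately_equal(p_s, center=0.50, band=0.10):
    """True if the frequency of s lies within center ± band (footnote 35)."""
    # A small tolerance guards against floating-point noise at the band edges.
    return abs(p_s - center) <= band + 1e-9


new_meaning = [obs for obs, p in S_GIVEN_AMPERSAND.items() if approximately_equal(p)]
print(new_meaning)
```

The sketch flags Observations 2-2 and 2-5 only; Observation 1-4 (67%) falls outside the band, which is why the text treats it separately.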
It is worth mentioning another one of the 6 observations identified in the first-20-round
analysis, Observation 1-4. While they do not meet our criterion for the last 20 rounds, the
respective frequencies of s and t conditional on “&” in this observation were 67% and 33%,
fairly different from the 76% frequency of s (conditional on “%”) and the 89% frequency of
t (conditional on “$”) in the first 20 rounds. More importantly, Table 3 reveals that in this
observation the frequency of “&” was 67%, the highest among all the 12 observations from
Game 1E. This high frequency of “&” and the corresponding low frequencies of the two other
messages in the last 20 rounds (e.g., the frequency of “%” was down from 52% in the first 20
rounds to a mere 8% after “&” was introduced) suggest that there was a strong incentive for the
senders to deviate to send the neologism, an implication of the self-signaling property of “&”.
Accordingly, we also highlight Observation 1-4 as an observation from Game 1E that was in line
with the predicted self-signaling property of “&”. Note further that substantial decreases in the
frequencies of the two initial messages after “&” was introduced were also observed in the other
two highlighted observations, 2-2 and 2-5.
Applying our criterion for the first 20 rounds to Game 2E, we conclude that distinct meanings
could be established for “$” and “%” in 9 out of the 12 observations. Table 3 reveals that
Observations 2-5 and 2-6 represent the strongest examples. In both observations, “$” was sent
exclusively by t and “%” exclusively by s. In the other 7 identified observations, 1-1, 1-2, 1-3,
1-4, 1-5, 1-6, and 2-1, the distinct meanings were also strongly established, with the relevant
conditional frequencies being above 70% (with one exception) and as high as 95%. It is
also curious that the mapping [“%” ↦ s and “$” ↦ t] was observed in all 9 observations.36
35
We consider the frequencies of s and t conditional on “&” to be approximately equal if they are within
50% ± 10%.
36
This common mapping of meanings was observed in Game 2E (and to a lesser extent in Game 1E) despite our
effort to randomize the distributions of alternative instructions and the decision buttons on the screen. Although
a keyboard was not involved in subjects’ inputs of decisions, we speculate that the relative positions of “$” and
“%” on a keyboard attached to a computer terminal in our lab might have provided a focal point for mapping
“$” to X ≤ 50 (θ = t) and “%” to X > 50 (θ = s), which would have been hard to avoid even if we had used
other symbols for the messages. Regardless, this potential focal point originated from the keyboard positions of
For the last 20 rounds, recall the prediction for Game 2E that, given that separation has
occurred with the two initial messages, the third message, “&”, is not preferred by type t and is
only weakly preferred by type s, rendering “&” not credible. With the endogenous meaning that
the sender’s type is s (since only s may send “&”), “&” does not convey any meaning not present
before it is introduced. Out of the 9 observations identified above, there were 4 observations, 1-2,
1-5, 1-6, and 2-6, in which “&” was established under our criterion to mean s; Table 3 reveals
that in these observations the frequencies of s conditional on “&” ranged from 73% to 100%.
Parallel to the case of Observation 1-4 from Game 1E, in Observation 2-5 from Game 2E
we observed an extremely low frequency of “&”—only 3%. Even though the meaning of “&”
in this observation was established perfectly to be t, it was established under only a few cases
of “&”. More importantly, a low frequency of “&” was a predicted consequence of the non-credibility of “&” as a neologism.37 We therefore also highlight Observation 2-5 from Game 2E
as a conforming observation for the last 20 rounds.
In fact, while we were able to establish the predicted meanings only for a subset of observations, the predicted consequence of the contrasting properties of “&” in Games 1E and 2E on
the frequencies of the neologism being used [Hypothesis 4(ii)] was observed in most observations:
the average frequency of “&” was 47% in Game 1E, significantly higher than the 25% in Game
2E (p = 0.003, Mann-Whitney test); the self-signaling property of “&” in Game 1E rendered it
on average the most frequently sent message among the three messages available in the last 20
rounds, while the non-credibility of “&” in Game 2E made it the least popular.
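The comparison of “&” frequencies across the two games can be illustrated with a hand-rolled Mann-Whitney U statistic. This is a sketch under stated assumptions: the matching group is taken as the unit of observation, the inputs are the last-20-round “Sent” frequencies of “&” from Table 3, and only the U statistic and the group means are computed (no attempt is made to reproduce the reported p-value). The function name is hypothetical.

```python
# Last-20-round frequencies of "&" by matching group (Table 3, "Sent" columns).
AMP_1E = [0.53, 0.28, 0.23, 0.67, 0.42, 0.42, 0.57, 0.49, 0.65, 0.45, 0.49, 0.42]
AMP_2E = [0.30, 0.03, 0.47, 0.20, 0.15, 0.18, 0.25, 0.45, 0.23, 0.18, 0.03, 0.54]


def mann_whitney_u(xs, ys):
    """U statistic for xs: count pairs with x > y, counting ties as one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


mean_1e = sum(AMP_1E) / len(AMP_1E)
mean_2e = sum(AMP_2E) / len(AMP_2E)
u_1e = mann_whitney_u(AMP_1E, AMP_2E)

print(round(mean_1e, 2), round(mean_2e, 2))
print(u_1e, "out of", len(AMP_1E) * len(AMP_2E), "pairwise comparisons")
```

A U statistic well above half the number of pairwise comparisons indicates that “&” was typically sent more often in a Game 1E group than in a Game 2E group, which is the direction of the difference reported in the text.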
We summarize our findings to this point, in which we address the qualitative Hypothesis 4(i)
by emphasizing the existence of conforming behavior and confirm the quantitative Hypothesis
4(ii) with aggregate behavior:
Finding 4a (Effects of the Evolution of Meanings of Credible and Non-Credible
Neologisms).
(i) There existed observations in Games 1E and 2E that were consistent with the evolutions of
meanings hypothesized in Hypothesis 4(i). In two observations from Game 1E, the meaning
“the sender is (approximately) equally probable to be of either type” emerged for the third,
neologistic message, “&”, which was a meaning not present before “&” was introduced; in
four observations from Game 2E, the meaning “the sender’s type is (highly likely to be) s”
emerged for “&”, which was an existing meaning before “&” was introduced.
(ii) Confirming Hypothesis 4(ii), the average frequency of “&” being sent was significantly
higher in Game 1E than in Game 2E.
the symbols did not undermine the purpose of our design, which is to avoid focal points to be developed directly
from the literal meanings of messages.
37
Note also that an endogenous meaning of t for “&”, though inconsistent with the specific prediction on the
emerged meaning of “&”, is consistent with the more general prediction that no new meaning emerges for “&”.
Table 4: Frequencies of Actions Conditional on Messages, Games 1E and 2E

Game 1E, First 20 Rounds
           Freq.(Action|Message)
           “$”                           “%”
Sess.-Grp. Received  L     C     R       Received  L     C     R
1-1        0.65      0.58  0.23  0.19    0.35      0.07  0.29  0.64
1-2        0.62      0.44  0.28  0.28    0.38      0.20  0.33  0.47
1-3        0.50      0.15  0.50  0.35    0.50      0.30  0.20  0.50
1-4        0.48      0.26  0.00  0.74    0.52      0.38  0.57  0.05
1-5        0.40      0.19  0.25  0.56    0.60      0.17  0.58  0.25
1-6        0.48      0.16  0.11  0.73    0.52      0.29  0.57  0.14
2-1        0.30      0.17  0.33  0.50    0.70      0.29  0.29  0.42
2-2        0.43      0.12  0.00  0.88    0.57      0.26  0.65  0.09
2-3        0.70      0.46  0.00  0.54    0.30      0.17  0.33  0.50
2-4        0.43      0.41  0.06  0.53    0.57      0.13  0.74  0.13
2-5        0.50      0.35  0.10  0.55    0.50      0.45  0.15  0.40
2-6        0.25      0.10  0.00  0.90    0.75      0.10  0.83  0.07
Mean       0.48      –     –     –       0.52      –     –     –

Game 1E, Last 20 Rounds
           Freq.(Action|Message)
           “$”                           “%”                           “&”
Sess.-Grp. Received  L     C     R       Received  L     C     R       Received  L     C     R
1-1        0.48      0.47  0.37  0.16    0.00      N/A   N/A   N/A     0.53      0.47  0.29  0.24
1-2        0.47      0.47  0.42  0.11    0.25      0.10  0.30  0.60    0.28      0.45  0.00  0.55
1-3        0.44      0.11  0.89  0.00    0.33      0.23  0.15  0.62    0.23      0.44  0.00  0.56
1-4        0.25      0.30  0.00  0.70    0.08      0.00  1.00  0.00    0.67      0.52  0.48  0.00
1-5        0.18      0.43  0.00  0.57    0.40      0.38  0.31  0.31    0.42      0.18  0.58  0.24
1-6        0.30      0.00  0.00  1.00    0.28      0.00  1.00  0.00    0.42      0.59  0.35  0.06
2-1        0.18      0.29  0.14  0.57    0.25      0.40  0.10  0.50    0.57      0.26  0.17  0.57
2-2        0.33      0.00  0.00  1.00    0.18      0.29  0.71  0.00    0.49      0.70  0.00  0.30
2-3        0.15      0.00  0.00  1.00    0.20      0.25  0.50  0.25    0.65      0.50  0.00  0.50
2-4        0.25      0.20  0.00  0.80    0.30      0.33  0.67  0.00    0.45      0.17  0.33  0.50
2-5        0.33      0.69  0.00  0.31    0.18      0.40  0.00  0.60    0.49      0.41  0.05  0.54
2-6        0.20      0.00  0.00  1.00    0.38      0.13  0.74  0.13    0.42      0.06  0.24  0.70
Mean       0.30      –     –     –       0.23      –     –     –       0.47      –     –     –

Game 2E, First 20 Rounds
           Freq.(Action|Message)
           “$”                           “%”
Sess.-Grp. Received  L     C     R       Received  L     C     R
1-1        0.55      0.09  0.23  0.68    0.45      0.22  0.56  0.22
1-2        0.38      0.47  0.20  0.33    0.62      0.64  0.20  0.16
1-3        0.35      0.00  0.00  1.00    0.65      0.38  0.58  0.04
1-4        0.45      0.06  0.00  0.94    0.55      0.00  0.91  0.09
1-5        0.50      0.00  0.00  1.00    0.50      0.15  0.85  0.00
1-6        0.45      0.00  0.06  0.94    0.55      0.64  0.27  0.09
2-1        0.48      0.26  0.21  0.53    0.52      0.38  0.43  0.19
2-2        0.50      0.35  0.05  0.60    0.50      0.30  0.50  0.20
2-3        0.48      0.05  0.16  0.79    0.52      0.43  0.47  0.10
2-4        0.48      0.47  0.21  0.32    0.52      0.52  0.19  0.29
2-5        0.50      0.00  0.05  0.95    0.50      0.40  0.55  0.05
2-6        0.62      0.40  0.04  0.56    0.38      0.33  0.60  0.07
Mean       0.48      –     –     –       0.52      –     –     –

Game 2E, Last 20 Rounds
           Freq.(Action|Message)
           “$”                           “%”                           “&”
Sess.-Grp. Received  L     C     R       Received  L     C     R       Received  L     C     R
1-1        0.33      0.00  0.08  0.92    0.37      0.00  1.00  0.00    0.30      0.17  0.25  0.58
1-2        0.38      0.67  0.00  0.33    0.59      0.66  0.21  0.13    0.03      1.00  0.00  0.00
1-3        0.40      0.00  0.00  1.00    0.13      0.00  1.00  0.00    0.47      0.68  0.16  0.16
1-4        0.47      0.00  0.00  1.00    0.33      0.00  1.00  0.00    0.20      0.00  0.25  0.75
1-5        0.65      0.00  0.00  1.00    0.20      0.13  0.87  0.00    0.15      0.17  0.50  0.33
1-6        0.35      0.00  0.00  1.00    0.47      0.47  0.53  0.00    0.18      0.57  0.29  0.14
2-1        0.45      0.00  0.00  1.00    0.30      0.58  0.42  0.00    0.25      0.60  0.40  0.00
2-2        0.25      0.20  0.10  0.70    0.30      0.25  0.42  0.33    0.45      0.33  0.39  0.28
2-3        0.43      0.12  0.06  0.85    0.35      0.14  0.86  0.00    0.23      0.45  0.33  0.22
2-4        0.44      0.44  0.28  0.28    0.38      0.33  0.54  0.13    0.18      0.14  0.43  0.43
2-5        0.59      0.00  0.00  1.00    0.38      0.67  0.33  0.00    0.03      1.00  0.00  0.00
2-6        0.33      0.38  0.00  0.62    0.13      0.60  0.40  0.00    0.54      0.72  0.14  0.14
Mean       0.42      –     –     –       0.33      –     –     –       0.25      –     –     –

Note: “Freq.(Action|Message)”, the frequency of action conditional on message, is used to measure how the receivers responded to the senders’ messages. The additional columns “Received” provide the total frequency at which the particular message was received by the receivers (sent by the senders; same as the frequencies reported under “Sent” in Table 3). Means are reported only for these “Received” frequencies, because alternative responses to the a priori meaningless messages in different matching groups render the means for other frequencies not informative about the average behavior of the receivers. The shaded observations represent the ones in which individual group behavior was consistent with the theoretical predictions, either only in the first 20 rounds or in all 40 rounds.
We proceed to evaluate the remaining Hypothesis 4(iii). Given the senders’ uses of messages,
the achievement of fully revealing outcome depends on whether the implied meanings are picked
up by the receivers. Before we evaluate the revelation outcomes, it is therefore useful to first
examine the receivers’ behavior. Table 4 reports, for each matching group in Games 1E and 2E,
the frequencies of actions conditional on messages, which are again aggregated separately across
the first 20 rounds and the last 20 rounds.
Our analysis of the receivers’ responses focuses again on individual group behavior and involves two steps. For each game, we first identify, among the matching groups with established
distinct meanings in the first 20 rounds, the groups in which the receivers’ actions were best responses given the meanings. We then further examine among the identified groups the receivers’
responses to the third message in the last 20 rounds.
We apply a similar criterion to the first-20-round frequencies of actions conditional on messages. Among the 6 observations from Game 1E with established meanings in the first 20 rounds,
we identified 5 observations, 1-4, 1-6, 2-2, 2-4, and 2-6, in which the receivers’ most frequent
responses were best responses to the corresponding messages given their empirically determined
meanings.38 Observation 2-2 remained one of the strongest examples when it comes to the
receivers’ responses. Recall that in Observation 2-2 (as well as in the other 4 identified observations), the meaning of “$” was established to be t and that of “%” was established to be s; Table
4 reveals that in this observation the receiver’s ideal action under t, R, was taken 88% of the time
when “$” was received, and the ideal action under s, C, was taken 65% of the time when “%” was
received. Observation 2-6 represents an even stronger example, in which the two ideal actions
were taken 90% and 83% of the time. It is also curious that the only observation where meanings
could be established but the best response frequencies did not meet our criterion, Observation
2-5, was the observation in which the alternative meaning mapping was established.
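The best-response criterion of footnote 38 can be sketched in the same spirit. The names below are hypothetical; the inputs are the first-20-round frequencies of actions conditional on messages from Table 4 for the six Game 1E observations with established meanings, together with the meaning mappings described in the text. The ideal actions (C under s, R under t) are those stated in the main text.

```python
IDEAL_ACTION = {"s": "C", "t": "R"}  # receiver's ideal action under each type

# (meaning mapping, Freq.(Action | Message)) for Game 1E, first 20 rounds.
OBSERVATIONS = {
    "1-4": ({"$": "t", "%": "s"}, {"$": {"C": 0.00, "R": 0.74}, "%": {"C": 0.57, "R": 0.05}}),
    "1-6": ({"$": "t", "%": "s"}, {"$": {"C": 0.11, "R": 0.73}, "%": {"C": 0.57, "R": 0.14}}),
    "2-2": ({"$": "t", "%": "s"}, {"$": {"C": 0.00, "R": 0.88}, "%": {"C": 0.65, "R": 0.09}}),
    "2-4": ({"$": "t", "%": "s"}, {"$": {"C": 0.06, "R": 0.53}, "%": {"C": 0.74, "R": 0.13}}),
    "2-5": ({"$": "s", "%": "t"}, {"$": {"C": 0.10, "R": 0.55}, "%": {"C": 0.15, "R": 0.40}}),
    "2-6": ({"$": "t", "%": "s"}, {"$": {"C": 0.00, "R": 0.90}, "%": {"C": 0.83, "R": 0.07}}),
}


def best_responded(meanings, action_freqs, low=0.50, high=0.60):
    """Footnote-38 criterion: lower best-response frequency >= 50%, higher >= 60%."""
    freqs = [
        action_freqs[message][IDEAL_ACTION[meaning]]
        for message, meaning in meanings.items()
    ]
    return min(freqs) >= low and max(freqs) >= high


conforming = [
    obs for obs, (meanings, freqs) in OBSERVATIONS.items()
    if best_responded(meanings, freqs)
]
print(conforming)
```

Under these inputs only Observation 2-5, where the alternative mapping was established, fails the criterion.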
Regarding the responses to the third message, “&”, in 3 out of the 5 observations from Game
1E identified above, 1-4, 1-6, and 2-2, the receivers’ most frequent responses were L, the ex-ante
ideal action under the equal priors. On the other hand, recall that Observations 1-4, 2-2, and
2-5 had either the meaning of “&” established to be that both sender’s types are equally likely
or such a meaning was less clearly established but there was a high frequency of “&”. The
intersection of these two sets of observations, 1-4 and 2-2, represents the cases where both types
of the sender deviated to send the neologism when it became available and the receivers were
able to respond optimally to such deviations. In fact, Observation 2-2 was the strongest example
all along. Both the senders and the receivers in this matching group behaved consistently with
the predicted disruption to the fully revealing equilibrium when a self-signaling neologism was
introduced. While we only obtained one such observation, it is remarkable that subjects were able
38
Our criterion places a more stringent requirement than demanding the best responses to be chosen more
than 1/3 of the time. For the conditional frequencies of the two best responses, one for each message, we require
the lower frequency to be at least 50% and the higher to be at least 60%.
to respond optimally to the incentives underlying the self-signaling neologism in an environment
without pre-existing common language.
For the first 20 rounds in Game 2E, we identified 6 out of the 9 observations with established
meanings, 1-1, 1-3, 1-4, 1-5, 2-5, and 2-6, in which the receivers best responded to the meanings.
The strongest examples were Observations 1-4 and 1-5. Recall that the same meaning mapping,
[“$” ↦ t and “%” ↦ s], was established for those 9 observations. Table 4 reveals that, in
Observation 1-4, the frequency of R conditional on “$” was 94% and that of C conditional on
“%” was 91%; in Observation 1-5, the corresponding frequencies were 100% and 85%.
For the last 20 rounds in Game 2E, we were able to identify only one observation among
the 6 observations identified above, 1-5, in which the receivers’ responses to “&” matched its
theoretically predicted and empirically established meaning that the sender is of type s. In this
observation, the frequency of C—the ideal action under s—conditional on “&” met our criterion
at precisely 50%. Note also that the last-20-round frequencies of R conditional on “$” and of C
conditional on “%” showed almost no variation from the corresponding frequencies in the first 20
rounds. Parallel to Observation 2-2 from Game 1E, Observation 1-5 from Game 2E provides an
example in which subjects, in an environment without pre-existing common language, behaved
consistently with the fully revealing equilibrium, where this time the introduction of a non-credible neologism created no disruption to the fully revealing outcome.39
We further compare Observation 2-2 from Game 1E and Observation 1-5 from Game 2E,
discussing the implications of their similarities and differences for the fully revealing outcomes. In
both observations, strong distinct meanings were established for the two messages “$” and “%”,
and any changes from the first 20 rounds to the last 20 rounds concerning these meanings were
to sharpen them even further. Similarly, in both observations the receivers best responded to
the endogenous meanings of the two messages, and the frequencies of the best responses showed
an upward trend over rounds. However, in Observation 2-2 (Game 1E) the receivers responded
to the third message, “&”, mostly with the ex-ante ideal action L, whereas in Observation 1-5
(Game 2E) the receivers responded to “&” (sent exclusively by s) mostly with the ex-post ideal
action C. Furthermore, the frequency of “&” in the former observation was more than three
times the frequency in the latter. Even though the two observations showed similar uses and
interpretations of “$” and “%”, their differences with respect to the uses and interpretations of
“&” resulted in, when the neologism was introduced, a drop in the frequency of fully revealing
outcome in Observation 2-2 (Game 1E) that was not observed in Observation 1-5 (Game 2E).
39
We emphasize fully revealing outcome because we did observe disruption to the fully revealing equilibrium: s-types deviated to send “&”, which accounted for the 15% of “&” observed in the last 20 rounds of this observation.
Given that more than a majority of the responses to “&” was C, the ideal action under s, this disruption to the
fully revealing equilibrium, however, did not result in the same disruption to the fully revealing outcome. More
thorough analysis of fully revealing outcomes in both games follows.
Table 5 reports the first-20-round and the last-20-round frequencies of fully revealing outcomes
in all observations from both games. In Observation 2-2 from Game 1E, the frequency was 70%
in the first 20 rounds, which dropped to 55% in the last 20 rounds. By contrast, in Observation
1-5 from Game 2E, the frequency was 85% in the first 20 rounds, which increased to 90% in
the last 20 rounds.40 The same qualitative differences between the first 20 rounds and the last
20 rounds were observed in the aggregate data, although the differences were not statistically
significant. Table 5 reveals that the first-20-round average frequency of fully revealing outcome
in Game 1E was 49%, which was higher than the last-20-round average frequency at 45%
(p = 0.194, Wilcoxon signed-rank test); the first-20-round average frequency of fully revealing
outcome in Game 2E was 53%, which was lower than the last-20-round average frequency at
56% (p = 0.127, Wilcoxon signed-rank test).41
Table 5: Frequencies of Fully Revealing Outcomes, Games 1E and 2E
              Game 1E                            Game 2E
Observation   First 20 Rounds   Last 20 Rounds   First 20 Rounds   Last 20 Rounds
1-1           0.33              0.23             0.45              0.63
1-2           0.50              0.28             0.28              0.23
1-3           0.55              0.65             0.60              0.53
1-4           0.53              0.45             0.85              0.95
1-5           0.48              0.43             0.85              0.90
1-6           0.53              0.68             0.58              0.63
2-1           0.48              0.30             0.43              0.53
2-2           0.70              0.55             0.40              0.38
2-3           0.35              0.40             0.38              0.63
2-4           0.53              0.68             0.20              0.38
2-5           0.30              0.20             0.75              0.73
2-6           0.58              0.55             0.58              0.28
Mean          0.49              0.45             0.53              0.56
Note: The frequency of fully revealing outcome is measured by the frequency at which the receivers took
the (ex-post) ideal actions, C under type s and R under type t.
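As a quick consistency check on Table 5, the following sketch recomputes the column means and counts, for each game, in how many observations the frequency of fully revealing outcome rose or fell between the two halves of the session. The numbers are transcribed from the table; the names `TABLE_5`, `summarize`, and `SUMMARY` are ours and purely illustrative. Because the table entries are rounded to two decimals, a recomputed mean may differ from the reported one in the last digit.

```python
from statistics import mean

# Frequencies transcribed from Table 5, ordered 1-1, ..., 1-6, 2-1, ..., 2-6.
# Each entry is (first 20 rounds, last 20 rounds).
TABLE_5 = {
    "Game 1E": [(0.33, 0.23), (0.50, 0.28), (0.55, 0.65), (0.53, 0.45),
                (0.48, 0.43), (0.53, 0.68), (0.48, 0.30), (0.70, 0.55),
                (0.35, 0.40), (0.53, 0.68), (0.30, 0.20), (0.58, 0.55)],
    "Game 2E": [(0.45, 0.63), (0.28, 0.23), (0.60, 0.53), (0.85, 0.95),
                (0.85, 0.90), (0.58, 0.63), (0.43, 0.53), (0.40, 0.38),
                (0.38, 0.63), (0.20, 0.38), (0.75, 0.73), (0.58, 0.28)],
}


def summarize(pairs):
    """Return the column means and the number of observations that rose or fell."""
    first = [f for f, _ in pairs]
    last = [l for _, l in pairs]
    return {
        "mean_first": mean(first),
        "mean_last": mean(last),
        "rose": sum(l > f for f, l in pairs),
        "fell": sum(l < f for f, l in pairs),
    }


SUMMARY = {game: summarize(pairs) for game, pairs in TABLE_5.items()}

for game, s in SUMMARY.items():
    print(f"{game}: first={s['mean_first']:.3f}, last={s['mean_last']:.3f}, "
          f"rose={s['rose']}, fell={s['fell']}")
```

On the rounded entries, the frequency fell in 8 of the 12 observations of Game 1E and rose in 7 of the 12 observations of Game 2E, in line with the direction of the aggregate comparison in the text. The sketch does not attempt to reproduce the reported Wilcoxon p-values.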
We summarize the above findings, which address Hypothesis 4(iii):
Finding 4b (Effects of the Evolution of Meanings of Credible and Non-Credible
Neologisms).
(i) Consistent with the qualitative difference hypothesized in the first part of Hypothesis 4(iii),
the average frequency of fully revealing outcome in Game 1E after the introduction of the
third message, “&”, was lower than that before the neologism was introduced; the difference
was, however, not statistically significant, so the hypothesis is not confirmed.
(ii) The average frequency of fully revealing outcome in Game 2E after the introduction of “&”
was higher than that before the neologism was introduced; the differences were, however,
not statistically significant, thus confirming the second part of Hypothesis 4(iii).
40
Other than the introduction of a third message, learning may also have contributed to how often the fully
revealing equilibrium was played in the last 20 rounds relative to the first 20 rounds. When learning occurred
toward the equilibrium and the third message introduced no disruption, the frequency of fully revealing outcome
could be higher in the last 20 rounds as was observed in this observation.
41
The two respective two-sided p-values are 0.388 and 0.255.
Without confirming Hypothesis 4(iii) for Game 1E, we consider the confirmation for Game
2E to be of limited value. To highlight what we learned from these treatments without the
anchor of literal meanings, we instead emphasize that the fully revealing equilibrium and the
credible/non-credible neologisms with their evolved meanings were able to predict some observed
behavior.
5 Concluding Remarks
In this study, we experimentally evaluate the first cheap-talk refinement concept, neologism-proofness. In the experimental literature of communication games, behavioral phenomena such as
overcommunication are prevalent, and they have been quite successfully explained by behavioral
concepts or models such as lying aversion and level-k reasoning. The same behavioral factors
may have been driving some of our findings. However, by focusing on the differences among the
games in our two sets of treatments, we have largely controlled for these factors, which allows us
to focus on the predictive power of neologism-proofness.
In our first set of treatments which features literally meaningful messages, we find that
whether a fully revealing equilibrium is neologism-proof affects how often it is played, lending
support to the predictive power of the refinement concept. We also obtain a finding not covered
by the theory: even non-credible neologisms can cause senders to deviate. On the other hand,
the presence of credible neologisms resulted in a lower frequency of fully revealing outcome via
the receivers’ deviating behavior.
In our second set of treatments which features a priori meaningless messages, we find, rather
unsurprisingly, that subjects had a harder time playing a fully revealing equilibrium and responding to the incentives behind the neologisms. We nonetheless obtained a few observations
that confirm the theoretical predictions, in particular the predicted evolution of meanings of the
credible and non-credible neologisms. These conforming observations demonstrate that sophisticated coordinated plays can be achieved by human subjects in the laboratory. In our case, the
plays involve the evolution of equilibrium plays and a disruption to the equilibrium as predicted
by a refinement concept, all in an environment without the anchor of common language.
Our findings shed light on the powers and limitations of neologism-proofness as a predictive
theory. While we view it as a remarkable finding that the refinement concept was able to
predict behavior in the absence of pre-existing common language, the prediction was confirmed
only in selected individual matching groups. By contrast, individual group behavior in the
treatments with literal messages conformed to the major prediction of neologism-proofness fairly
consistently. From an empirical vantage point, this observed contrast validates the importance
of literal meanings in the applications of neologism-proofness. Within the common-language
environment, our findings indicate that the receivers were better able to appreciate the senders’
incentives to send credible neologisms than the senders expected them to be. This suggests
that strategic sophistication and the expectations thereon may be a missing element in the
current state of the theories and may warrant a place in a more comprehensive, predictive theory
of cheap-talk refinements. We hope that our study can help inform future research in this regard.
Appendix A – Level-k Analysis
Motivated by the over-communication phenomenon well-documented in the literature of experimental cheap-talk games and following the convention in the literature (Crawford, 2003; Cai and
Wang, 2006), we specify the following level-k model. The model starts with a credulous L0 receiver and a truthful L0 sender. The sender of the lowest level of sophistication, L0, simply sends
the truthful message all the time. In response to the L0 sender, the L0 receiver trusts the sender
and always chooses the ideal action given the belief that is consistent with the literal meaning
of a message. The model further assumes that Lk senders with k ≥ 1 best-respond to L(k−1) receivers and
Lk receivers with k ≥ 1 best-respond to Lk senders. Tables A1 and A2 report the level-k predictions for
Game 1M3 and Game 2M3, respectively. Table A3 reports the level-k predictions for Game
1M2 and Game 2M2, where the predictions coincide.
Table A1: Level-k Predictions for Game 1M3
            Sender’s Strategy                             Receiver’s Strategy
Level       s                     t                     “My type is s.”   “My type is t.”   “I won’t tell you.”
L0          “My type is s.”       “My type is t.”       C                 R                 L
L1          “I won’t tell you.”   “I won’t tell you.”   C                 R                 L
Lk, k ≥ 2   “I won’t tell you.”   “I won’t tell you.”   C                 R                 L
Table A2: Level-k Predictions for Game 2M3
            Sender’s Strategy                                               Receiver’s Strategy
Level       s                                          t                 “My type is s.”   “My type is t.”   “I won’t tell you.”
L0          “My type is s.”                            “My type is t.”   C                 R                 L
L1          “I won’t tell you.”                        “My type is t.”   C                 R                 C
Lk, k ≥ 2   “My type is s.” or “I won’t tell you.”     “My type is t.”   C                 R                 C
Table A3: Level-k Predictions for Game 1M2 and Game 2M2
            Sender’s Strategy                     Receiver’s Strategy
Level       s                 t                 “My type is s.”   “My type is t.”
L0          “My type is s.”   “My type is t.”   C                 R
L1          “My type is s.”   “My type is t.”   C                 R
Lk, k ≥ 2   “My type is s.”   “My type is t.”   C                 R
First of all, like neologism-proofness, the model predicts a babbling outcome for Game 1M3.
Second, like neologism-proofness, the model essentially predicts a fully revealing outcome for
Game 2M3. Similarly, the model predicts a fully revealing outcome for Game 1M2 and Game
2M2. However, unlike neologism-proofness, the model fails to predict the systematic difference
between receivers’ strategies in Game 1M3 and Game 2M3. Recall that neologism-proofness
predicts that when the neologism becomes credible in Game 1M3, the receiver would use the
pooling strategy in which action L is taken regardless of the message received. Our data indicate
that the frequency of the pooling strategy being used by receivers is indeed significantly higher
in Game 1M3 than in Game 2M3. On the other hand, the specified level-k model organizes the
data from Game 2M3 quite well. In particular, the model predicts that the message “I won’t tell
you” is paired with another message, “My type is t”, to reveal senders’ types effectively, which
is indeed observed in our strategy-level data.
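The outcomes implied by the level-k strategies can be checked mechanically. The sketch below transcribes the Lk (k ≥ 2) rows of Tables A1 and A2, computes the actions each sender type induces on the path of play, and labels the result. The names `PROFILES`, `induced_actions`, and `classify` are ours, and the two-way classification is a deliberately coarse illustration rather than part of the model: "fully revealing" means that s induces C and t induces R (the actions the credulous L0 receiver takes after the corresponding literal messages), and "babbling" means that both types induce the same single action.

```python
S_MSG, T_MSG, SILENT = "My type is s.", "My type is t.", "I won't tell you."

# Lk (k >= 2) rows transcribed from Tables A1 and A2.
# A sender strategy maps each type to the set of messages it may send;
# a receiver strategy maps each message to an action.
PROFILES = {
    "Game 1M3": {
        "sender": {"s": {SILENT}, "t": {SILENT}},
        "receiver": {S_MSG: "C", T_MSG: "R", SILENT: "L"},
    },
    "Game 2M3": {
        "sender": {"s": {S_MSG, SILENT}, "t": {T_MSG}},
        "receiver": {S_MSG: "C", T_MSG: "R", SILENT: "C"},
    },
}

# Receiver's ideal action for each type (L0 receiver rows of the tables).
IDEAL = {"s": {"C"}, "t": {"R"}}


def induced_actions(sender, receiver):
    """Actions that each sender type induces on the path of play."""
    return {t: {receiver[m] for m in msgs} for t, msgs in sender.items()}


def classify(outcome):
    """Coarse, illustrative labeling of an induced outcome."""
    if outcome == IDEAL:
        return "fully revealing"
    if outcome["s"] == outcome["t"] and len(outcome["s"]) == 1:
        return "babbling"
    return "other"


RESULTS = {
    game: classify(induced_actions(p["sender"], p["receiver"]))
    for game, p in PROFILES.items()
}
print(RESULTS)
```

Representing a sender strategy as a set of messages per type accommodates the indifference of the s-type in Game 2M3, where either of two messages induces action C.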
A few remarks regarding alternative specifications of L0 players are in order. One plausible
alternative is to specify that the L0 receiver uniformly randomizes over the three actions regardless of the message. This alternative specification has two drawbacks. First, it is somewhat unnatural given our adopted
message space, in which each message has a clear literal meaning. Second, it
essentially generates the same fully revealing outcomes for Game 1M3 and Game 2M3, which
do not match our data.
We believe that the level-k model and the equilibrium notion of neologism-proofness should be
complements rather than substitutes in explaining the experimental data. Our result indicates
that neologism-proofness is crucial to explain the observed treatment effects or the difference
among the four treatments. However, the finding does not imply that the level-k model has no
power to explain the data. Our simple analysis in this section demonstrates that the level-k
model can explain some features of our data reasonably well.
Appendix B – Sample Experimental Instructions
Instructions for Game 1M3
Welcome to the experiment. This experiment studies decision making between two individuals. In the following hour or so, you will participate in 20 rounds of decision making. Please read
the instructions below carefully; the cash payment you will receive at the end of the experiment
depends on how you make your decisions according to these instructions.
Your Role and Decision Group
There are 20 participants in today’s session. Prior to the first round, one half of the participants will be randomly assigned the role of Member A and the other half the role of Member B.
Your role will remain fixed throughout the experiment. In each round, one Member A and one
Member B will be randomly and anonymously paired to form a group, with a total of 10 groups.
Regarding how players are matched, the 10 groups are equally divided into 2 classes so that
there are 5 groups in each class with 10 participants, 5 Members A and 5 Members B; in each
and every round, you will be randomly matched with a participant in the other role in your class.
Thus, in a round you will have an equal, 1 in 5 chance of being paired with a participant in the
other role in your class. You will not be told the identity of the participant you are matched
with, nor will that participant be told your identity—even after the end of the experiment.
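For readers who wish to replicate the design, the role assignment and within-class random matching described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the software used in the sessions: the function names `form_classes` and `match_round`, the integer participant labels, and the fixed seed are all hypothetical.

```python
import random


def form_classes(participants, n_classes, rng):
    """Randomly assign roles, then split each role equally across classes."""
    ids = list(participants)
    rng.shuffle(ids)
    half = len(ids) // 2
    members_a, members_b = ids[:half], ids[half:]
    size = half // n_classes  # number of Members A (and of Members B) per class
    return [
        (members_a[c * size:(c + 1) * size], members_b[c * size:(c + 1) * size])
        for c in range(n_classes)
    ]


def match_round(classes, rng):
    """Pair every Member A with a randomly drawn Member B from the same class."""
    pairs = []
    for members_a, members_b in classes:
        shuffled_b = list(members_b)
        rng.shuffle(shuffled_b)  # a uniform permutation gives equal pairing chances
        pairs.extend(zip(members_a, shuffled_b))
    return pairs


rng = random.Random(2016)  # fixed seed only to make the sketch reproducible
classes = form_classes(range(20), n_classes=2, rng=rng)  # 2 classes of 5 A + 5 B
pairs = match_round(classes, rng)  # the 10 groups of one round
print(pairs)
```

Calling `match_round` once per round redraws the pairing while keeping roles and classes fixed, so each participant faces any given participant in the other role of the same class with a 1 in 5 chance.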
Your Decision in Each Round
Member A’s Decision
In each round and for each group, the computer randomly selects, with equal chance, an
integer X from 1 to 100. (Each number therefore has probability 1/100 to be selected.)
Figure 5 shows Member A’s decision screen. You as Member A have two kinds of decisions
to make:
1. what to tell Member B if it turns out that X > 50; and
2. what to tell Member B if it turns out that X ≤ 50.
Note that you are making a decision plan (what to do if this happens and what to do
if that happens), where you will make these decisions without knowing the actually
selected X. For each decision, you can choose from the following three options: 1) “X is bigger
than 50”; 2) “X is smaller than or equal to 50”; and 3) “I won’t tell you”. Your choices for
different decisions could be different or the same.
Figure 5: Member A’s decision
Figure 6: Member B’s decision
Your decision for the round is completed after the two choices, which will then enter into the
determination of rewards at a later stage of the round.
Member B’s Decision
Figure 6 shows Member B’s decision screen. You as Member B have three kinds of decisions
to make:
1. what action to take if Member A says “X is bigger than 50”;
2. what action to take if Member A says “X is smaller than or equal to 50”; and
3. what action to take if Member A says “I won’t tell you”.
Note that you are making a decision plan (what to do if this happens, what to do
if that happens, etc.), where you will make these decisions without knowing what
Member A actually says. For each decision, you can choose from the following three options:
1) action L; 2) action C; and 3) action R. Your choices for different decisions could be different
or the same.
Your decision for the round is completed after the three choices, which will then enter into
the determination of rewards at a later stage of the round.
Your Reward in Each Round
After Member A’s and Member B’s decisions, the computer will proceed to draw the integer
X randomly from 1 to 100. Then the realization of X will be revealed to Member A and Member
B. Member A’s and Member B’s rewards will be determined according to the reward table in
Figures 5 and 6 and what choices they have made. In each cell of the reward table, the first
number represents the reward in HKD to Member A and the second number the reward in HKD
to Member B. (The relevant numbers for each role are highlighted in Blue.)
The reward procedure, in which the computer’s draw determines the relevant row of the reward
table and Member B’s action the relevant column, may best be illustrated with an example.
Consider the following choices made by the two members. Member A chooses to say
1. “X is bigger than 50” if X > 50;
2. “I won’t tell you” if X ≤ 50.
Member B chooses to take
1. action L if Member A says “X is bigger than 50”;
2. action C if Member A says “X is smaller than or equal to 50”;
3. action R if Member A says “I won’t tell you”.
Suppose the computer randomly draws X = 27. According to the choices, Member A says
“I won’t tell you” and Member B takes action R. Given that X ≤ 50, Member A will receive 20
HKD and Member B will receive 30 HKD. On the other hand, if the computer draws X = 79,
according to the choices Member A says “X is bigger than 50” and Member B takes action L.
Given that X > 50, Member A will receive 30 HKD and Member B will receive 20 HKD.
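The way the two decision plans are resolved once X is drawn can be sketched as follows. The plans are those of the example above; the function name `resolve_round` and the labels "high" and "low" for the two ranges of X are hypothetical and introduced only for illustration. The sketch stops at the message and the action, because the full reward table appears only in the decision screens.

```python
BIG = "X is bigger than 50"
SMALL = "X is smaller than or equal to 50"
SILENT = "I won't tell you"


def resolve_round(x, plan_a, plan_b):
    """Return the message sent and the action taken once X is realized."""
    state = "high" if x > 50 else "low"  # the range of X selects Member A's choice
    message = plan_a[state]
    action = plan_b[message]             # the message selects Member B's choice
    return message, action


# Decision plans from the example: one choice per contingency.
plan_a = {"high": BIG, "low": SILENT}
plan_b = {BIG: "L", SMALL: "C", SILENT: "R"}

for x in (27, 79):
    print(x, resolve_round(x, plan_a, plan_b))
```

Because each member submits a complete plan, only one entry of each plan is used in a given round, which is why the feedback reported to participants covers the realized message and action rather than the whole plan.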
Prediction Reward by Member A. If you are Member A, you have an opportunity to
earn an extra reward. After the computer draws X, you will be asked to predict what action Member
B will take after listening to what you choose to say to him/her for the relevant range of X. If
your prediction is correct, you will receive an extra 2 HKD.
Information Feedback
During the course of a round, you will be informed about the drawn X, what Member A
chooses to say for the relevant range of X, and what Member B chooses to take for what Member
A chooses to say. (The information provided does not include the whole decision plan of the
members.)
Your Cash Payment
The experimenter randomly selects 2 rounds out of 20 to calculate your cash payment. (So it
is in your best interest to take each round seriously.) Your total cash payment at the end of the
experiment will be the sum of HKD you earned in the 2 selected rounds plus a 30 HKD show-up
fee.
Quiz and Practice
To ensure your comprehension of the instructions, we will provide you with a quiz and a
practice round. We will go through the quiz after you answer it on your own. You will then
participate in 1 practice round. At the beginning of the practice round, you will be randomly
assigned the role of either Member A or Member B. Your role in the official rounds is the same
as that in the practice round.
Once the practice round is over, the computer will tell you “The official rounds begin now!”
Administration
Your decisions as well as your monetary payment will be kept confidential. Remember that
you have to make your decisions entirely on your own; please do not discuss your decisions with
any other participants.
Upon finishing the experiment, you will receive your cash payment. You will be asked to sign
your name to acknowledge your receipt of the payment (which will not be used for tax purposes).
You are then free to leave.
If you have any question, please raise your hand now. We will answer your question individually. If there is no question, we will proceed to the quiz.
Quiz
1. True or False: I will remain as a Member A or Member B in all 20 rounds of decision-making. Circle one: True / False
2. True or False: I will be matched with the same player in the other role in all 20 rounds.
Circle one: True / False
3. True or False: My decisions involve making plans for different scenarios but not a specific
choice for a single scenario. Circle one: True / False
4. True or False: At the end of the experiment, I will be paid my earnings in HKD from two
randomly chosen rounds in addition to a 30 HKD show-up fee. Circle one: True / False
Instructions for Game 1E
INSTRUCTION
Welcome to the experiment. This experiment studies decision making between two individuals. In the following hour or so, you will participate in 40 rounds of decision making. Please read
the instructions below carefully; the cash payment you will receive at the end of the experiment
depends on how you make your decisions according to these instructions.
Your Role and Decision Group
There are 20 participants in today’s session. At the beginning of the experiment, one half of
the participants will be randomly assigned the role of Member A and the other half the role of
Member B. Your role will remain fixed throughout the experiment. In each round, one Member
A and one Member B will be randomly and anonymously paired to form a group, with a total
of 10 groups.
Regarding how participants are matched, the 10 groups are equally divided into 5 classes so
that there are 2 groups in each class with 4 participants, 2 Member As and 2 Member Bs; in each
and every round, you will be randomly matched with a participant in the other role in your class.
Thus, in a round you will have an equal, 1 in 2 chance of being paired with a participant in the
other role in your class. You will not be told the identity of the participant you are matched
with, nor will that participant be told your identity—even after the end of the experiment.
Your Decision in Each of Rounds 1-20
In each round and for each group, the computer randomly selects, with equal chance, an
integer X from 1 to 100. (Each number therefore has probability 1/100 to be selected.) The
computer will reveal (only) to Member A whether the selected X is bigger than 50 (X > 50) or it
is smaller than or equal to 50 (X ≤ 50). Member B, without knowing the range of the selected
X, needs to make an action choice.
Member A’s Decision
Figure 7 shows Member A’s decision screen. After being informed of the range of the selected
X (whether X > 50 or X ≤ 50), you as Member A have to decide what message to send to your
paired Member B. You will be prompted to enter your choice of message by clicking one of the
two buttons ‘%’ or ‘$’. Note that the relative positions of the two message buttons are randomly
determined in each round, so the positions of the buttons ‘%’ and ‘$’ in a round may or may
not be the same as those in the previous or later round.42 Once you click one of the buttons,
your decision in the round is completed and the chosen message will be transmitted to the paired
Member B.
Figure 7: Member A’s Decision
Member B’s Decision
Figure 8 shows Member B’s decision screen. You as Member B will see the message sent
by your paired Member A. After that, you will be prompted to enter your choice of action by
clicking one of the three buttons ‘L’, ‘C’, or ‘R’.
Your decision for the round is then completed.
Your Reward in Each Round
After Member A’s and Member B’s decisions, the realization of X and Member B’s choice of
action will be revealed to both Member A and Member B. Member A’s and Member B’s rewards
will be determined according to the reward table in Figures 7 and 8 and the choices they have
made. In each cell of the reward table, the first number represents the reward in HKD to Member
A and the second number the reward in HKD to Member B. (The relevant numbers for each role
are highlighted in Blue.)
42
We also have two sets of instructions for today’s session, one in which the two symbols are stated in the order
‘%’ followed by ‘$’ and the other in the order ‘$’ followed by ‘%’. We randomly distribute the instructions so
that half of the participants have their instructions adopting one order and the other half have their instructions
adopting the other order. This means that another participant in your group may or may not have the same
symbol order in his/her instructions.
Figure 8: Member B’s Decision
Information Feedback
At the end of a round, you will be informed about the selected X, Member A’s choice of
message, Member B’s choice of action, and your reward for the round.
Your Decision in Each of Rounds 21-40
After the 20th round, your screen will provide further instructions for your decisions in Rounds
21-40. Please read the instructions carefully before you start the 21st round. You will have an
opportunity to ask questions if anything is unclear about the new instructions.
Your Cash Payment
The experimenter will randomly select 3 rounds out of the 40 to calculate your cash payment.
(So it is in your best interest to take each round seriously.) Your total cash payment at the end
of the experiment will be the sum of HKD you earned in the 3 selected rounds plus a 40 HKD
show-up fee.
Quiz and Practice
To ensure your comprehension of the instructions, we will provide you with a quiz and a
practice round. We will go through the quiz after you answer it on your own. You will then
participate in 1 practice round. At the beginning of the practice round, you will be randomly
assigned the role of either Member A or Member B. Your role in the official rounds is the same
as that in the practice round.
Once the practice round is over, the computer will tell you “The official rounds begin now!”
Administration
Your decisions as well as your monetary payment will be kept confidential. Remember that
you have to make your decisions entirely on your own; please do not discuss your decisions with
any other participants.
Upon finishing the experiment, you will receive your cash payment. You will be asked to sign
your name to acknowledge your receipt of the payment (which will not be used for tax purposes).
You are then free to leave.
If you have any question, please raise your hand now. We will answer your question individually. If there is no question, we will proceed to the quiz now.
Quiz
1. True or False: I will remain as a Member A or Member B in all 40 rounds of decision-making. Circle one: True / False
2. True or False: I will be matched with the same player in the other role in all 40 rounds.
Circle one: True / False
3. True or False: Member A, but not Member B, will be informed of the randomly selected
X. Circle one: True / False.
4. True or False: Member A will be responsible for taking an action, while Member B will be
sending a message. Circle one: True / False
References
[1] Austen-Smith, David. 1990. “Information Transmission in Debate.” American Journal of
Political Science, 34: 124-152.
[2] Banks, Jeffrey S., Colin F. Camerer and David Porter. 1994. “Experimental Tests of Nash
Refinements in Signaling Games.” Games and Economic Behavior, 6: 1-31.
[3] Blume, Andreas, Yong-Gwan Kim and Joel Sobel. 1993. “Evolutionary Stability in Games of
Communication.” Games and Economic Behavior, 5: 547-575.
[4] Blume, Andreas, Douglas V. Dejong, and Geoffrey B. Sprinkle. 2008. “The Effect of Message
Space Size on Learning and Outcomes in Sender-Receiver Games.” Handbook of Experimental
Economics Results, Elsevier.
[5] Blume, Andreas, Douglas V. Dejong, Yong-Gwan Kim, and Geoffrey B. Sprinkle. 1998. “Experimental Evidence on the Evolution of the Meaning of Messages in Sender-Receiver Games.”
American Economic Review, 88: 1323-1340.
[6] Blume, Andreas, Douglas V. Dejong, Yong-Gwan Kim, and Geoffrey B. Sprinkle. 2001. “Evolution of Communication with Partial Common Interest.” Games and Economic Behavior,
37: 79-120.
[7] Brandts, Jordi and Charles A. Holt. 1992. “An Experimental Test of Equilibrium Dominance
in Signaling Games.” American Economic Review, 82: 1350-1365.
[8] Cai, Hongbin, and Joseph Tao-Yi Wang. 2006. “Overcommunication in Strategic Information
Transmission Games.” Games and Economic Behavior, 56: 7-36.
[9] Chen, Ying, Navin Kartik, Joel Sobel. 2008. “Selecting Cheap-Talk Equilibria.” Econometrica, 76: 117-136.
[10] Cho, In-Koo and Kreps, David M. 1987. “Signaling Games and Stable Equilibria.” Quarterly
Journal of Economics, 102: 179-221.
[11] Crawford, Vincent. 1998. “A Survey of Experiments on Communication via Cheap Talk.”
Journal of Economic Theory, 78: 286-298.
[12] Crawford, Vincent. 2003. “Lying for strategic advantage: Rational and boundedly rational
misrepresentation of intentions.” American Economic Review, 93: 133-149.
[13] Crawford, Vincent P. and Joel Sobel. 1982. “Strategic Information Transmission.” Econometrica, 50: 1431-1451.
[14] De Groot Ruiz, Adrian, Theo Offerman, and Sander Onderstal. 2015. “Equilibrium Selection
in Experimental Cheap Talk Games.” Games and Economic Behavior, 91: 14-25.
[15] De Groot Ruiz, Adrian, Theo Offerman, and Sander Onderstal. 2014. “For Those about to
Talk We Salute You: An Experimental Study of Credible Deviations and ACDC.” Experimental Economics, 17: 173-199.
[16] Dickhaut, John W., Kevin A. McCabe, and Arijit Mukherji. 1995. “An Experimental Study
of Strategic Information Transmission.” Economic Theory, 6: 389-403.
[17] Duffy, John, Ernest K. Lai, and Wooyoung Lim. 2015. “Coordination via Correlation: An
Experimental Study.” Mimeo.
[18] Eső, Péter and James Schummer. 2009. “Credible Deviations from Signaling Equilibria.”
International Journal of Game Theory, 38: 411-430.
[19] Farrell, Joseph. 1993. “Meaning and Credibility in Cheap Talk Games.” Games and Economic Behavior, 5: 514-531.
[20] Farrell, Joseph and Robert Gibbons. 1988. “Cheap Talk, Neologisms, and Bargaining.”
working paper, Berkeley and MIT.
[21] Farrell, Joseph and Robert Gibbons. 1989. “Cheap Talk Can Matter in Bargaining.” Journal
of Economic Theory, 48: 221-237.
[22] Farrell, Joseph and Matthew Rabin. 1996. “Cheap Talk.” Journal of Economic Perspectives,
10: 103-118.
[23] Gertner, Robert, Robert Gibbons, David Scharfstein. 1988. “Simultaneous Signalling to the
Capital and Product Markets.” The RAND Journal of Economics, 19: 173-190.
[24] Gneezy, Uri. 2005. “Deception: The Role of Consequences.” American Economic Review,
95(1): 384-394.
[25] Grossman, Sanford J. and Motty Perry, 1986. “Perfect Sequential Equilibrium.” Journal of
Economic Theory, 39: 97-119.
[26] Hurkens, Sjaak, and Navin Kartik, 2009. “Would I Lie to You? On Social Preferences and
Lying Aversion.” Experimental Economics, 12: 180-192.
[27] Kawagoe, Toshiji, and Hirokazu Takizawa, 2008. “Equilibrium Refinement vs. Level-k Analysis: An Experimental Study of Cheap-Talk Games with Private Information.” Games and
Economic Behavior, 66: 238-255.
[28] Kim, Kyungmin and Philipp Kircher, 2015. “Efficient Competition through Cheap Talk:
The Case of Competing Auctions.” Econometrica, forthcoming.
[29] Lai, Ernest K., Wooyoung Lim, and Joseph Tao-yi Wang, 2015. “An Experimental Analysis
of Multidimensional Cheap Talk.” Games and Economic Behavior, 91: 114-144.
[30] Lim, Wooyoung, 2014. “Communication in Bargaining over Decision Rights.” Games and
Economic Behavior, 85: 159-179.
[31] Matthews, Steven, Masahiro Okuno-Fujiwara, Andrew Postlewaite, 1991. “Refining Cheap-Talk Equilibria.” Journal of Economic Theory, 55: 247-273.
[32] Rabin, Matthew. 1990. “Communication between Rational Agents.” Journal of Economic
Theory, 51: 144-170.
[33] Rabin, Matthew and Joel Sobel. 1996. “Deviations, Dynamics and Equilibrium Refinements.” Journal of Economic Theory, 68: 1-25.
[34] Sánchez-Pagés, Santiago, and Marc Vorsatz. 2007. “An Experimental Study of Truth-Telling
in Sender-Receiver Game.” Games and Economic Behavior, 61: 86-112.
[35] Serra-Garcia, Marta, Eric van Damme, and Jan Potters. 2013. “Lying About What You
Know or About What You Do?” Journal of the European Economic Association, 11(5),
1204-1229.
[36] Sobel, Joel. 2013. “Ten Possible Experiments on Communication and Deception.” Journal
of Economic Behavior and Organization, 93: 408-413.
[37] Sopher, Barry and Iñigo Zapater. 2000. “Communication and Coordination in Signalling
Games: An Experimental Study.” Mimeo.
[38] Wang, Joseph Tao-Yi, Michael Spezio, and Colin F. Camerer. 2010. “Pinocchio’s Pupil:
Using Eyetracking and Pupil Dilation to Understand Truth Telling and Deception in Sender-Receiver Games.” American Economic Review, 100: 984-1007.