Meaning, its Evolution, and Credibility in Experimental Cheap-Talk Games∗ Ernest K. Lai Department of Economics Lehigh University [email protected] Wooyoung Lim† Department of Economics The Hong Kong University of Science and Technology [email protected] February 2, 2016 Abstract We design two sets of experimental games to evaluate the predictive power of the pioneering cheap-talk refinement, neologism-proofness. In our first set of treatments designed to evaluate neologism-proofness with its usual emphasis on literal meanings, we observe that an equilibrium was played more often when it was neologism-proof than when it was not. We further find that the mere existence of a neologism, albeit not credible, attracted deviating behavior from the senders, while the receivers’ deviations required the neologism to be credible. Our second set of treatments evaluates the refinement concept in the absence of a pre-existing common language, in which the meaning of a neologism must evolve. We obtain a few, but not all, observations where the meaning of a neologism emerged in accordance with what neologism-proofness predicts. Our findings shed light on the powers and limitations of neologism-proofness in predicting laboratory behavior under different levels of challenges posed by different language environments. Keywords: Neologism-Proofness; Cheap Talk; Equilibrium Refinement; Evolution of Meanings; Laboratory Experiments JEL classification: C72; C92; D82; D83 ∗ We are grateful to Andreas Blume, Syng-Joo Choi, Navin Kartik, Barry Sopher and Joel Sobel for their valuable comments and suggestions. We also thank conference and seminar participants at HKUST, Rutgers University, the 2nd Haverford Meeting on Behavioral and Experimental Economics, the 16th KAEA-KEA International Conference, the 5th Annual Xiamen University International Workshop on Experimental Economics, the 90th Western Economic Association International Annual Conference and the 26th International Conference on Game Theory for valuable discussions. This study is supported by a grant from the Research Grants Council of Hong Kong (Grant No. ECS-699613). Lai gratefully acknowledges the financial support from the Office of the Vice President and Associate Provost for Research and Graduate Studies at Lehigh University. This paper is a substantially revised and extended version of an earlier paper “Meaning and Credibility in Experimental Cheap-Talk Games.” † Corresponding author. Address: HKUST Department of Economics, LSK Business Building, Clearwater Bay, Kowloon, Hong Kong. Phone: (852) 2358 7628. 1 Introduction Cheap-talk games are a type of signaling game in which messages have no direct payoff consequence. The costless nature of messages has profound implications for the treatment of cheap-talk equilibria. In using cheap talk to communicate, the meanings of messages can only be established by use (in equilibrium) and cannot be learned by introspection. Furthermore, for every equilibrium with unsent messages, there exists another outcome-equivalent equilibrium in which all messages are used so that no off-equilibrium paths exist. Consequently, the standard refinement concepts for costly signalling games (e.g., the intuitive criterion), which select equilibria by restricting how messages should be interpreted off the equilibrium path, have no selection power for cheap-talk games. Several refinement concepts exclusively designed for cheap-talk games have been proposed. Farrell’s (1993) notion of neologism-proofness is one such concept. It is the pioneering contribution that serves as the headwater of the refinement literature on cheap talk.1 Despite the fact that neologism-proofness has been developed for more than twenty years, no experimental effort has been devoted exclusively to evaluating its predictive power. In fact, only a handful of earlier studies (e.g., Sopher and Zapater, 2000; Blume, DeJong, Kim, and Sprinkle, 2001) has experimentally evaluated cheap-talk refinements, and only a limited set of concepts have been covered.2 The situation is unlike that of the major refinement concepts for costly signaling games, which have been subject to thorough experimental investigations (e.g., Brandts and Holt, 1992; Banks, Camerer and Porter, 1994) at an early stage of the literature development. This study contributes to filling this void of the experimental cheap-talk literature. Multiplicity of equilibria is a notorious property of cheap-talk games, and the refinement of cheap-talk equilibria is an issue that has not been entirely settled. There is therefore an apparent need for empirical work, which may complement the theoretical advancement of cheap-talk refinements. In this, it is natural to revisit its very first refinement concept. Meaning and credibility are two building blocks of neologism-proofness. In the theoretical context, a neologism, from the Greek for “new word,” refers to an out-of-equilibrium message.3 1 Rabin’s (1990) nonequilibrium notion of “credible message rationalizability” builds on neologism-proofness by combining it with rationalizability. Matthews, Okuno-Fujiwara and Postlewaite (1991) develop the concept of announcement-proofness, which generalizes neologism-proofness. Chen, Kartik and Sobel (2008) relate their criterion, NITS (No Incentive To Separate), to neologism-proofness by showing that an equilibrium survives NITS if it is neologism-proof. More recently, De Groot Ruiz, Offerman and Onderstal (2015) propose a behavioral refinement, ACDC (Average Credible Deviation Criterion), which further extends on neologism-proofness and announcement-proofness. There are also other refinement concepts that are, unlike neologism-proofness and its derivatives, not belief-based. They include the “partial common interest” by Blume, Kim and Sobel (1993) and the “recurrent mop” by Blume, Kim and Sobel (1993). It is also worth mentioning the concept of immunity to credible deviations by Eső and Schummer (2009), which has selection power for Crawford and Sobel’s (1982) model where the cheap talk is replaced by costly messages. 2 There are some recent experimental studies on cheap-talk refinements (e.g., Kawagoe and Takizawa, 2008; De Groot Ruiz, Offerman and Onderstal, 2014), which will be discussed in the literature review. 3 Although any unused message in a cheap-talk equilibrium can be turned into use by means of sender’s 1 Despite the fact that a neologism, being an unsent message, has no meaning in light of established use, Farrell (1993) argues that if the players share a pre-existing language (e.g., English), a neologism may still be understood by its commonly known literal meaning.4 Building on the assumption that a neologism is to be understood literally, Farrell (1993) then introduces the concept of credible neologisms: when certain types of the sender, who want to deviate from a putative equilibrium by sending a particular neologism, precisely match the meaning of the neologism (expressed in the form of “I am of one of these types”), the neologism is credible or self-signaling. A neologism-proof equilibrium is an equilibrium in which no credible neologisms exist. The above definition of neologism-proofness, with its emphasis on literal meanings, informs the design of our first set of treatments. We design two cheap-talk games with binary state and with literally meaningful messages. We vary the cardinality of the message spaces from two to three, where the third, additional message represents the neologism that we induce with respect to a putative equilibrium under consideration. This manipulation of the message spaces results in a total of four treatments. All four games each admit a fully revealing equilibrium. But their payoff structures and/or message spaces are different so that only the fully revealing equilibrium in some of the games is neologism proof, and this serves as our treatment variation. Farrell (1993) also provides an evolutionary interpretation of neologisms, invoking the fact that an equilibrium can be viewed as an evolutionarily stable outcome. In this interpretation, it is natural to consider messages with no a priori meanings, where the meaning of a neologism must evolve endogenously. An experimental study with subjects participating in repeated interactions provides an appropriate arena to study neologism-proof equilibria in light of this interpretation. Such a study allows us to investigate empirically how the evolved meaning of a neologism may affect the communication outcome depending on whether the equilibrium in question is neologism-proof. Our second set of treatments takes the two basic games in the first set of treatments and endow them with a priori meaningless messages, initially with only two messages. Unlike the first set of treatments where the presence of a neologism is a between-subject treatment variable, in this second set of treatments the introduction of a neologism, i.e., a third meaningless message, is within subject. It is introduced in the middle of an experimental session, after meanings have supposedly been established for the two initial messages. We investigate whether the introduction of a neologism in this manner may disrupt any (fully revealing) equilibrium play and whether that is as predicted by the neologism-proof property of the equilibrium in question. randomization (in another outcome-equivalent equilibrium), Farrell (1993) argues that randomization may not be plausible, especially when the message space is large. In developing neologism-proofness, he therefore assumes that neologisms (unused messages) always exist. 4 A feature of neologism-proofness that distinguishes it from standard signaling refinements is its reference to natural language. Perfect sequential equilibrium developed by Grossman and Perry (1986) for costly signaling games does not have the same natural language element but is otherwise closely related to neologism-proofness. 2 Our findings from the first set of treatments indicate that neologisms and their credibility played an evident role in how subjects played the games. Overall, fully revealing equilibria that were neologism-proof were played more often than those that were not. The senders and the receivers, however, responded differently to the presence and credibility of neologisms. The mere existence of a literally meaningful neologism, albeit not credible, sufficed to attract deviations from the senders. The receivers, on the other hand, deviated from the separating strategies only when the meaningful neologism was credible. In the absence of a pre-existing common language, there was a greater variation in individual group behavior. We observed a few, but not all, individual groups in which, before the neologism was introduced, subjects used and interpreted messages in a way consistent with a fully revealing equilibrium. In other words, distinct meanings evolved for the two initial messages in selected groups.5 We further observed among these groups some cases where the introduction of the neologism disrupted the equilibrium play, at the same time when the meaning of the neologism evolved to render it self-signaling. On the other hand, when the fully revealing equilibrium was neologism-proof, there were cases where the equilibrium play remained intact after the neologism was introduced. Before we proceed to the literature review, we discuss three limitations of neologism-proofness as a refinement concept and argue how our study may provide responses to two of the limitations. The discussion serves to further highlight the contribution of our experiment in the context of the theory. As a refinement concept, neologism-proofness lacks a general existence property, and therefore we do not know what it predicts when existence fails. It also lacks a complete formalization regarding the presence of unsent messages with natural meanings. Finally, it falls short of fully addressing the concern over the usefulness of natural language, as neologisms arise off the equilibrium path but languages on the path are still arbitrary.6 Regarding the first limitation, our experiment helps assess from an empirical vantage point whether the lack of a general existence property should be a reason to discard the refinement concept even for the games in which it has the power to predict play.7 While our study does not address the second limitation, it provides an experimental response to the third limitation by exploring whether there is an empirical tendency for literal meanings to be followed on the equilibrium path. Furthermore, our treatments with meaningless messages offer a contrasting 5 By individual group, we mean a matching group in which a few sender-subjects randomly match with a few receiver-subjects to form groups of two. Statistically, a matching group constitutes an independent observation. 6 We thank Joel Sobel for sharing these points with us. 7 Regarding the lack of existence, for instance, no equilibrium outcome in the leading example of Crawford and Sobel (1982)—the uniform quadratic formulation—is neologism-proof. Yet in some other instances neologismproofness is still relied upon (probably due to the lack of a perfect alternative) to obtain sharper predictions of the games (e.g., Gertner, Gibbons and Schafstein, 1988; Farrell and Gibbons, 1988, 1989; Austen-Smith, 1990; Lim, 2014; Kim and Kircher, 2014); see also the discussion in Farrell and Rabin (1996). Thus, a major question we pose in this paper is: how well neologism-proofness predicts play in the laboratory for the games in which the refinement concept has some bite? 3 point to demonstrate the importance of natural language in a laboratory setting. Related Literature. To our knowledge, the first experimental study that involves neologismproofness is Blume et al. (2001). With the primary objective of testing the validity of the Partial Common Interest (PCI) criterion (Blume, Kim, and Sobel, 1993), Blume et al. (2001) compare PCI with other selection criteria including neologism-proofness. In the games they consider, neologism-proofness either shares the same prediction as Pareto efficiency or rejects all the equilibria. Their findings therefore do not distinctively reveal how useful neologismproofness is in predicting play, which is not a focus of their study to begin with. As will be discussed below, avoiding the potential confound of Pareto efficiency as a selection criterion is a major consideration informing our experimental design. Two other experimental studies related to cheap-talk refinements are Sopher and Zapater (2000) and Kawagoe and Takiawa (2008). Sopher and Zapater (2000) find experimental support for the credible message rationalizability proposed by Rabin (1990). Kawagoe and Takizawa (2008) compare the effectivenesses of refinement concepts and of level-k model in explaining their data, concluding that level-k analysis outperforms. Their design, however, does not provide a rich enough environment to address neologism-proofness as neologisms are absent in their induced environments. More recently, De Groot Ruiz, Offerman and Onderstal (2015) propose a new cheap-talk refinement, Average Credible Deviation Criterion (ACDC), in which the stability of an equilibrium is measured by the frequency and size of credible deviations. They show that ACDC successfully organizes the data from several prior experiments. Most related to our study is their another paper (De Groot Ruiz, Offerman and Onderstal, 2014), which further investigates ACDC with their own experiment. They find that neologism-proofness performs well when its prediction is unique, in which case ACDC predicts the same as does neologism-proofness. Perhaps more importantly, the ACDC equilibrium is found to perform the best when neologism-proofness (and announcement-proofness) has no selection power. Our paper differs from theirs on focus, design, and the issue addressed. While they use the inability of neologism-proofness to predict as a contrasting point to demonstrate the usefulness of ACDC, our study aims at evaluating the usefulness of neologism-proofness by focusing exclusively on the cases where the concept selects an equilibrium. The use of a priori meaningless messages to investigate the evolution of meaning of a neologism is also a unique feature of our design, an issue not addressed in their study. Our paper is also related to two other strands of literature. The first is the literature on experimental communication games (e.g., Dickhaut, McCabe, and Mukherji, 1995; Blume et al. 1998, 2001; Gneezy, 2005; Cai and Wang, 2006; Sánchez-Pagés and Vorsatz, 2007; Hurkens and Kartik, 2009; Wang, Spezio, and Camerer, 2010).8 A robust finding of this literature is the 8 See also Crawford (1998) for a survey of earlier studies and a discussion on the connection between cheap-talk refinements and costly signaling game refinements. 4 observation of “over-communication” or “lying aversion,” where subjects communicated more than what equilibrium predicts. Our paper differs from these papers in that we are interested not only in whether subjects play according to equilibria but also in whether they play according to selected or refined equilibria. Our investigation of the evolution of meanings using a priori meaningless messages is a subject matter of Blume et al. (1998, 2001) and Blume, DeJong, and Sprinkle (2008), but studying the evolution in reference to neologism-proof equilibria represents a new inquiry. The communication games we use also have a different qualitative property from those of the games used in these studies. While Blume et al. (1998, 2001) and Blume, DeJong, and Sprinkle (2008) use either common-interest or partially-common-interest games, our study was the first to investigate the evolution of meanings in games in which the sender and the receiver have conflicting preferences over different equilibria.9 Furthermore, the introduction of an additional meaningless message in the middle of an experimental session is also, to our knowledge, not explored before.10 The other literature to which our paper is related is the experimental investigation of equilibrium refinements for costly signaling games. Brandts and Holt (1992) test the intuitive criterion by Cho and Kreps (1987). They find that both message-level and action-level data supported the intuitive equilibrium, although this equilibrium was vulnerable to self-enforcing assignment by an outside authority. Banks, Camerer, and Porter (1994) design games to separate various refinements including the Nash equilibrium, sequential equilibrium, intuitive criterion, divine, universal divine, and never-weak-best-reply. They show that subjects’ behaviors converged to the more refined equilibrium up to the intuitive criterion. The rest of the paper proceeds as follows. Section 2 lays out our experimental games and analyzes their equilibria. Section 3 discusses our experimental hypotheses and procedures. Section 4 reports our findings. Section 5 concludes. 9 Duffy, Lai, and Lim (2015) study how meanings of a priori meaningless messages develop in the Battle of the Sexes game, also a game with conflicting preferences over different equilibria. Their study is, however, not in a sender-receiver setting and is about how subjects learn to play a correlated equilibrium aided by an external correlation device sending meaningless messages. 10 In relation to our manipulation of the sizes of the message spaces, Blume et al. (2008) document that restricting the message space expedites convergence when messages are a priori meaningless; Lai, Lim, and Wang (2015) highlight the role of the message space in facilitating information transmission in multidimensional settings; Serra-Garcia, van Damme, and Potters (2013) study the effect of limiting the message space in public goods games with cheap-talk communication about private information. Our novel contribution relative to these studies is to provide a new channel—a refinement of equilibria—through which message spaces may play a role in information transmission. 5 2 The Experimental Cheap-Talk Games 2.1 Four Games with Literal Messages Our first set of treatments consists of four cheap-talk games endowed with literally meaningful messages. Putting aside the variations in the sizes of the message spaces, the four games reduce to two games with different payoff structures, which we call Game 1 and Game 2. We first describe these two games and provide details regarding the message spaces when we discuss neologisms below. There are two players in each game, a sender (he) and a receiver (she). The sender is privately informed about his type θ P ts, tu, and the common prior is that the two types are equally likely. After observing the realized θ, the sender sends a cheap-talk message, m P M , to the receiver. Upon receiving m, the receiver takes an action a P tL, C, Ru. Table 1 presents the payoffs of the games. In each payoff cell, the first number is the sender’s payoff and the second number is the receiver’s payoff when an action is taken in a state. s t L 30, 20 30, 20 C 20, 30 8, 0 R 0, 8 20, 30 s t L 50, 20 10, 20 (a) Game 1 C 20, 30 8, 0 R 0, 8 20, 30 (b) Game 2 Table 1: Payoffs of the Games We design these games out of simplicity and three other considerations. First, as long as |M | ě 2, Games 1 and 2 both have multiple perfect Bayesian equilibrium (henceforth “equilibrium”) outcomes with no Pareto ranking. There are thus rooms for equilibrium selection, and yet Pareto dominance is ruled out as a confounding selection criterion.11 There are two equilibrium outcomes, babbling and fully revealing. In the babbling equilibrium outcome, the receiver ignores the sender’s message, taking the ex-ante ideal action L under the equal priors; in the fully revealing equilibrium outcome, the receiver takes different actions, C or R, after receiving distinct messages sent by distinct types of the sender. There is no Pareto dominance as there are conflicting preferences over the two outcomes: in both games, the sender strictly prefers the babbling outcome to the fully revealing outcome, while the opposite is true for the receiver. Our second consideration is to minimize the possibility that subjects choose to play different equilibria in different games out of pure payoff considerations. In this, we control the profile of the sender’s expected payoffs from the two equilibrium outcomes, 30 for the babbling and 20 for the fully revealing, to be the same across the two games. 11 Game 1 shares the same qualitative structure as the “I Won’t Tell You” game in Farrell (1993), Game Γ2 in Matthews, Okuno-Fujiwara, and Postlewaite (1991), Game 2 in Kawagoe and Takizawa (2008) and Game 1 in Sobel (2013). Refer to Sobel (2013) for a justification for the payoff structure of the game. 6 Our last consideration is to ensure that other-regarding preferences, if any, do not play a role in the equilibrium selection. To achieve this, we assign payoffs so that in both games a player’s expected payoff from one equilibrium outcome is the other player’s expected payoff from the other outcome, making the payoff profiles similar to that found in the Battle of the Sexes game.12 Neologism-Proofness Applied. A neologism-proof equilibrium is an equilibrium in which a credible neologism does not exist. To set the ground for defining credible neologisms, Farrell (1993) assumes that for every relevant equilibrium, a neologism, being an unsent message with literal meaning “my type is in K,” exists for every non-empty subset K of types. Relative to a putative equilibrium, a neologism with meaning “my type is in K” is then credible, or selfsignaling, if a) the sender’s types in K strictly prefer the outcome achieved when the neologism is believed over the putative equilibrium outcome, and b) the types not in K weakly prefer to stay in the putative equilibrium. In designing a laboratory environment that is easy for subjects to comprehend, we look for games that allow us to apply neologism-proofness in the simplest possible setting. Note that Farrell’s (1993) requirement that a neologism exists for every non-empty subset of types is not part of the definition of credible neologisms. Accordingly, neologism-proofness can still be applied when, for a given equilibrium, a neologism exists for some but not all subsets of types. This balance between a simple design and the applicability of the concept guides the design of our message spaces, which feature what we call limited neologisms. We take Games 1 and 2 and pair each of them with a message space that contains three literal messages: M “ t“my type is s”, “my type is t”, “I won’t tell you my type”u. The resulting games are called Game 1M 3 and Game 2M 3. We characterize the neologism-proof equilibrium outcomes of these two games as follows: Proposition 1. The fully revealing outcome in Game 1M 3 cannot be supported as neologismproof, whereas that in Game 2M 3 can be so supported. The babbling outcome in Game 1M 3 can be supported as neologism-proof, whereas that in Game 2M 3 cannot be so supported. To show that the fully revealing outcome in Game 1M 3 cannot be supported as neologismproof, it suffices to show that one of the fully revealing equilibria is not neologism-proof. Consider then the truth-telling equilibrium in which s and t send, respectively, “my type is s” and “my type is t.”13 The payoffs for both types in this equilibrium are 20. Now, “I won’t tell you my 12 Yet another consideration we have is to create an environment in which the prediction of neologism-proofness can be separated as much as possible from that of the level-k model of bounded rationality, which is commonly applied to rationalize findings from communication games. Note that the only difference between Games 1 and 2 is the type-t sender’s ideal actions. The expected payoff from each equilibrium outcome for each player, as well as the receiver’s ideal actions, are designed to be the same across the two games. Accordingly, for a given size of the message space, very limited configurations of level-k models can generate different predictions for the two games. For a supplementary level-k analysis, see Appendix A. 13 The truth-telling equilibrium is a fully revealing equilibrium in which literal meanings are used on the 7 type,” which literally means the sender’s type is in ts, tu, is the neologism. If the receiver believes the literal meaning of this neologism, she will take the ex-ante ideal action L, resulting in a payoff of 30 for the sender regardless of his type. Given that 30 ą 20, both s and t—precisely the types corresponding to the literal meaning of the neologism—prefer the outcome achieved when the neologism is believed over the putative equilibrium outcome: the neologism is thus self-signaling. For the truth-telling equilibrium in Game 2M 3, note that when the receiver believes the literal meaning of “I won’t tell you my type” and takes action L, only s but not t strictly prefers the neologism to be believed: the neologism is thus not self-signaling. Consider further the remaining two fully revealing equilibria that are not truth telling, in which either “my type is s” or “my type is t” is the neologism. In either case, the type expressed by the literal meaning of the neologism would not strictly prefer the outcome achieved when the neologism is believed, because the on- and off-equilibrium path payoffs are the same for the type. The neologisms in these non-truth-telling fully revealing equilibria are thus also not self-signaling. Accordingly, the fully revealing outcome in Game 2M 3 can be supported as neologism-proof.14 In Game 1M 3, since both sender’s types receive their maximum payoffs in the babbling outcome, it is straightforward that no credible neologisms can exist, and thus the babbling outcome can be supported as neologism-proof. Finally, to show that the babbling outcome in Game 2M 3, which gives type s a payoff of 50 and type t a payoff of 10, cannot be supported as neologism-proof, it suffices to consider a babbling equilibrium in which “my type is t” is a neologism (there may be another neologism). In this case, type t strictly prefers the outcome achieved when the neologism is believed (in which case the receiver takes R, giving type s a payoff of 0 and type t a payoff of 20) over the putative equilibrium outcome. Type s, on the other hand, prefers to stay in the equilibrium, which together makes the neologism self-signaling. To generate richer treatment variations so as to identify the effects of neologisms, we introduce two more games, Game 1M 2 and Game 2M 2, which are Games 1 and 2 endowed with a binary message space M 1 “ t“my type is s”, “my type is t”u. The neologism-proof equilibrium outcomes of the two games are characterized as follows: Proposition 2. Both the fully revealing and babbling outcomes in Game 1M 2 can be supported as neologism-proof. Only the fully revealing outcome in Game 2M 2 can be supported as neologismequilibrium path. 14 While the use of limited neologisms does not prevent us from applying neologism-proofness, it calls for a different approach in showing that an equilibrium outcome can be supported as neologism-proof. In Farrell (1993), since for any equilibrium a neologism exists for every subset of types, when a neologism-proof equilibrium exists to support an outcome, it automatically covers all neologisms, i.e., under no neologism does self-signaling occur to compromise the equilibrium outcome. When there are limited neologisms, however, different equilibria that support an outcome may be associated with different (sets of) neologisms. Even if self-signaling does not occur under the neologism(s) available in one supporting equilibrium, it may occur under another neologism in another supporting equilibrium. Accordingly, to show that an outcome can be supported as neologism-proof, we need to show that all its supporting equilibria with different neologisms are neologism-proof so as to cover all neologisms; if there is one supporting equilibrium that is not neologism-proof, the outcome cannot be so supported, even though there are other neologism-proof supporting equilibria. 8 proof. For these games with binary messages, the fully revealing outcome is either supported by a truth-telling equilibrium or by another equilibrium in which the literal meanings of the messages are not followed on the equilibrium path. The neologism-proofness of the fully revealing outcomes in these games is a trivial consequence of there being only two messages so that in either supporting equilibrium there is no neologism. We have thus taken the limited neologisms to the extreme by eliminating neologisms.15 The neologism-proofnesses of the babbling outcomes in Games 1M 2 and 2M 2 are characterized in the same way as those in Games 1M 3 and 2M 3. A couple of remarks about how the design of message spaces for these treatments relates to Farrell’s (1993) notion of natural language and his assumed existence of neologisms is in order. There are two main ingredients in Farrell’s (1993) notion of natural language: a) common language, i.e., each message has a literal meaning that is associated with a type in the type space, and b) rich language, i.e., the message space is large enough (relative to the type space) so that for any subset K of the sender’s types, a message with the literal meaning “my type is in K” exists. Our message spaces fulfill the first requirement: being framed in English, the messages have clear and commonly understood literal meanings. While by most standards our message spaces cannot be considered as large and despite the limited neologisms we induce, our message space M is rich enough to describe all subsets of the binary types, even though some of these messages are used in equilibrium. In assuming the existence of neologisms, Farrell (1993) argues that a mixed-strategy equilibrium in which the sender randomizes over messages with the same equilibrium meaning is implausible. This type of equilibria will be implausible when the sender has a preference for short, simple, and straightforward messages.16 Our design with two different sizes of message spaces allows us to evaluate whether the assumed existence of unsent messages is an empirically plausible assumption. If randomization is a natural empirical tendency, having an additional message in M relative to M 1 should not affect the information transmission outcomes. 15 Our Game 1M 2 is of the same class as Game 2 in Kawagoe and Takiawa (2008). Our conclusion on the neologism-proofness of the fully revealing outcome is, however, different from theirs. With binary types and binary messages, there will be no unsent messages in any fully revealing equilibrium in our Game 1M 2 or in Kawagoe and Takiawa’s (2008) Game 2. Since no neologisms exist, we consider that the fully revealing outcome trivially survives neologism-proofness, while they consider that only the babbling outcome survives the refinement. This difference in characterization will have implications for the interpretation of experimental findings. As will become apparent below, subjects’ behavior in our Game 1M 2 conformed to the fully revealing equilibrium, which, accordingly to our view, is consistent with the prediction of neologism-proofness. On the other hand, under the conviction that only the babbling outcome survives neologism-proofness, Kawagoe and Takiawa (2008) attribute their similar findings to over-communication. 16 On this point, Farrell (1993) states that: “For example, if type t wants (and is expected) to reveal himself, and if both the English sentences, ‘I am t,’ and ‘I am either u or v,’ are interpreted in equilibrium as meaning ‘I am t,’ then S [the sender] will prefer the former” (pp. 518, Section 4. Are There Unused Messages?). 9 2.2 Two Games with A Priori Meaningless Messages Despite the important role of common language in neologism-proofness, Farrell (1993) also provides an evolutionary interpretation of neologisms in an environment without pre-existing language, where the meaning of a neologism must evolve. To investigate the predictive power of neologism-proofness from this perspective, we design a second set of treatments, which consists of two additional games with a priori meaningless messages, Game 1E and Game 2E. Games 1E and 2E are, respectively, Games 1 and 2 endowed with an initial binary message space M 2 “ t$, %u. In the course of the repetitions of subjects’ tasks, a third meaningless message, “&”, will be introduced and made available to the sender-subjects in each game. Our second set of treatments thus involves a within-subject variation of the message spaces (the details of the experimental procedures will be provided in Section 3.2 below).17 Without proving additional propositions, we discuss what behavior may emerge in Games 1E and 2E, leveraging the characterizations in Proposition 1 and anticipating that meanings will eventually emerge in one way or the other.18 Note that our second set of treatments is more of an exploratory nature, and the discussion below is meant to provide some behavioral restrictions that help inform our experimental hypotheses. We start with two plausible conjectures about how play may evolve before and immediately after the introduction of the third message. First, given the payoff structures of the games, it is plausible that some separation may occur under the binary message spaces, which results in distinct meanings being emerged for the two initial messages, “$” and “%”.19 When the third message, “&”, is introduced and sent, it is considered as a neologism. In the absence of a precedence and a literal meaning, a natural initial response of the receiver is to ignore it: our second conjecture is that, immediately after it is introduced, the receiver considers that the neologism does not provide any information and takes the ex-ante ideal action L under the equal priors. These conjectures and the simple analysis of initial behavior lead to different predicted paths of play in Games 1E and 2E. Recall that in Game 1E, both types of the sender obtain 20 from separation in a fully revealing equilibrium and 30 from the receiver’s taking L. Accordingly, it pays for both s and t to send 17 Accordingly, Games 1E and 2E should be more properly understood as an experimental design rather than one-shot games in a formal sense; the variation of the message space in a particular “game” (treatment) involves, strictly speaking, two different games. 18 Farrell’s (1993) discussion regarding the evolutionary interpretation of neologism-proofness (p.526-527, Section 9) is based on an example and is also rather informal. Note that our design is not to empirically evaluate the exact argument he makes with his example, partly because his example is based on a game with a different equilibrium property. Rather, our objective is to investigate how well neologism-proofness might predict in the absence of pre-existing language. 19 As an empirical precedence supporting this conjecture, Blume et al. (1998) obtain the finding that meaningful and efficient communication (i.e., separation) endogenously emerged in a common-interest communication game with a priori meaningless messages, where, as in our case, the type space and the message space share the same cardinality of two. 10 the neologism when the receiver’s initial response to it is L. As they do, a meaning that does not exist in the prevailing fully revealing equilibrium emerges for “&”, which is exactly that the sender can be of either type. This in turn reinforces the receiver to choose L. The neologism is thus self-signaling with respect to its evolved meaning, and we expect to see less separation after it is introduced, echoing the fact that the fully revealing outcome in the corresponding Game 1M 3 cannot be supported as neologism-proof. In Game 2E, the receiver’s initial response of L to “&” will only attract type s to send the neologism. As “&” becomes a choice of message by s, its meaning evolves to be that the sender’s type is s, to which the receiver’s best response is C. The incentive for type s to send the neologism is then weakened, because s receives the same treatment in the prevailing fully revealing equilibrium. The neologism also does not convey any new meaning relative to the prevailing equilibrium. On the other hand, regardless of whether the receiver’s response to “&” is L (initially) or C (eventually), it does not pay for type t to send the neologism. Since no type strictly prefers to send “&”, it is not self-signaling; separation remains even with the third message, echoing the fact that the fully revealing outcome in the corresponding Game 2M 3 can be supported as neologism-proof. 3 Experimental Hypotheses and Procedures 3.1 Hypotheses The three different configurations of message spaces and the two different payoff structures give rise to a 3 ˆ 2 treatment design, which is summarized in Table 2. The table also summarizes the neologism-proofness characterizations presented in Section 2. Table 2: Experimental Treatments Game 1 Game 2 Literal M |M | “ 3 Game 1M 3 neologism-proof: babbling Game 2M 3 neologism-proof: fully revealing Literal M 1 |M 1 | “ 2 Game 1M 2 neologism-proof: both Game 2M 2 neologism-proof: fully revealing Meaningless M 2 |M 2 | “ 2 Ñ 3 Game 1E neologism-proof: both Ñ babbling Game 2E neologism-proof: fully revealing Ñ fully revealing In formulating the experimental hypotheses, we make comparisons within each set of treatments, one with literal messages and one with meaningless messages. For the treatments with 11 literal messages, we compare the frequencies of fully revealing outcomes in different games. Informed by Propositions 1 and 2, the hypotheses are based on the premise that a fully revealing equilibrium surviving neologism-proof is more likely to be played. We start the comparative statics with Game 2M 2: from Game 2M 2 to Game 2M 3, we create a neologism that is not credible. Our first hypothesis concerns the effect of the existence of non-credible neologisms: Hypothesis 1 (Effect of the Existence of Non-Credible Neologisms). The frequency of fully revealing outcome in Game 2M 2 is the same as that in Game 2M 3. From Game 2M 3 to Game 1M 3, we make the neologism credible. Our second hypothesis concerns the effect of the credibility of neologisms: Hypothesis 2 (Effect of the Credibility of Neologisms). The frequency of fully revealing outcome is higher in Game 2M 3 than in Game 1M 3. From Game 1M 3 with a credible neologism to Game 1M 2 with only two messages, we get rid of any neologism. Our third hypothesis concerns the joint effect of the existence and credibility of neologisms: Hypothesis 3 (Effect of the Existence and Credibility of Neologisms). The frequency of fully revealing outcome is lower in Game 1M 3 than in Game 1M 2. Our last hypothesis concerns the treatments with meaningless messages. It evaluates how the evolution of meanings, to be established empirically by the frequencies of the senders’ types conditional on the messages sent, impacts on subjects’ play. In particular, we examine three effects of the evolved meaning of the third, neologistic message implied by the analysis in Section 2.2: Hypothesis 4 (Effects of the Evolution of Meanings of Credible and Non-Credible Neologisms). (i) In Game 1E, a meaning not present before the introduction of the third message “&”, that the sender is equally likely to be of either type, endogenously emerges for “&”; in Game 2E, no new meaning emerges for “&” after it is introduced. (ii) The frequency of “&” being sent is higher in Game 1E than in Game 2E. (iii) In Game 1E, the frequency of fully revealing outcome after the introduction of “&” is lower than that before the neologism is introduced; in Game 2E, there is no difference in such frequencies before and after “&” is introduced. 12 3.2 Procedures Our experiment was conducted in English using networked computers and z-Tree (Fischbacher, 2007) at The Hong Kong University of Science and Technology. A total of 256 subjects participated in the six treatments. The subjects, all without prior experience with our experiment, were recruited from the undergraduate population of the university. Upon arrival at the lab, subjects were instructed to sit at separate computer terminals. Each received a copy of the experimental instructions. The instructions were read aloud using slide illustrations as an aid, which was followed by a comprehension quiz and a practice round. The two sets of treatments follow different procedures and have different instructions. Appendix B contains the instructions for Games 1M 3 (with literal messages) and 1E (with meaningless messages); the instructions for the other games are similar. In the following, we describe the essence of the experimental procedures, starting with the literal-message treatments. Treatments with Literal Messages. Two sessions were conducted for each game with literal messages. A session was participated by two matching groups. A matching group consisted of 10 subjects, five as senders (Member As) and five as receivers (Member Bs). Random matching allowing repeating partners was used in each matching group, where senders and receivers were matched within groups. Viewing each matching group as an independent observation, we thus have four observations per game, which provide sufficient statistical power for non-parametric tests. In each session, subjects participated in 20 rounds of decision making under a single treatment condition, i.e., between-subject design was adopted for these treatments. At the beginning of a session, half of the subjects were randomly assigned the role of Member A and the other half the role of Member B. The role assignment remained fixed throughout the session. During the entire session, the relevant payoff table in Table 1 was shown on each subject’s screen. In order to elicit a rich set of information to analyze how subjects make decisions, we employed the strategy method and elicited beliefs. Subjects were told that there was a random variable, X, whose integer value ranged equally likely from 1 to 100. At the beginning of each round, the Member A in each group was asked what message he/she would send to the paired Member B if X ą 50 (corresponding to θ “ s) and if X ď 50 (corresponding to θ “ t). For the games with three messages, the available messages were “X is bigger than 50,” “X is smaller than or equal to 50,” and “I won’t tell you” (the last message was not available in the game with two messages). The Member B in each group was asked what action he/she would take, namely L, C, or R, upon receiving each of the two or three messages from the paired Member A.20 Once 20 For the sake of simple experimental instructions and design, we did not allow subjects to randomize explicitly over different messages/actions for a given contingency. Note that a strict and unique best-response action exists for a babbling message and for any truth-telling message so that allowing explicit randomization does not result in additional gain in generality. 13 the members had stated their strategies, the value of X would be realized and revealed to them. Their strategies were then implemented based on the realized X, and the rewards for the round would be determined. Having known the implemented message based on the realized X, the Member A in each group was further asked to predict the paired Member B’s action choice. A correct prediction was rewarded with two payoff points. Feedback on choices and rewards, but not strategies, was provided at the end of each round. For these treatments with 20 rounds of play, we randomly selected two rounds for payments. The sum of the payoffs a subject earned in the two selected rounds was converted into Hong Kong Dollars at a fixed and known exchange rate of HK$1 per payoff point. A show-up fee of HK$30 was also provided. Subjects on average earned HK$76.8 (« US$9.85) by participating in a session that on average lasted about an hour.21 Treatments with A Priori Meaningless Messages. Two sessions were conducted for each game with meaningless messages. A session was participated by six matching groups. In order to expedite the emergence of meanings while maintaining the use of random matching, we reduced the size of a matching group to four subjects, two as Member As and two as Member Bs. The smaller-sized matching groups also allow us to have more observations without increasing the number of sessions per game. Viewing each matching group as an independent observation, we have 12 observations per game in these treatments. In each session, subjects participated in 40 rounds of decision making, 20 rounds with two messages and 20 rounds with three messages. Subjects made decisions under a single treatment condition with a within-subject variation of the message spaces. The role-assigning procedure and the display of payoff table on subjects’ screens were the same as those in the treatments with literal messages. Given the focus on the emergence of meanings, we adopted the choice method for these treatments, which allows us to collect sufficient information for the purpose. We also did not elicit beliefs.22 Subjects were told that there was a random variable, X, whose integer value would be drawn equally likely from 1 to 100. At the beginning of each round, the Member A in each group was privately informed about whether the randomly drawn X was larger than 50 (corresponding to θ “ s) or less than or equal to 50 (corresponding to θ “ t). In each of the 1st round to the 20th round, the Member A was asked to send one of the two messages, “$” or “%”, to the paired Member B after seeing the realized range of X.23 21 Under the Hong Kong currency board system, the HK dollar is pegged to the US dollar at the rate of HK $7.8 to US$1. 22 While we draw analogy between the two sets of treatments, we do not perform direct comparative statics between the games with and without literal messages. The different use of decision-elicitation method and the omission of belief elicitations therefore do not jeopardize the integrity of our design. The different procedure used in these treatments with meaningless messages was also driven by a practical consideration. Since subjects were making 20 more rounds of decision, using strategy method and eliciting beliefs would prolong the time allocated for a session beyond the limit stipulated by our lab policy. 23 In order to avoid subjects using the positions of the symbols “$” and “%” on their screens and in the 14 In the paper instructions, subjects were told that further instructions would be given for the last 20 rounds. Thus, they were aware of the different instructions for the later rounds, but their decisions in the first 20 rounds could not be influenced by the details of the forthcoming instructions. After the 20th round was over, further instructions were provided on subjects’ screens. They were told that an additional message, “&”, became available for Member As to send. Everything else remained the same as in the first 20 rounds. In each round, after receiving a message from the paired Member A, the Member B in each group chose one of the three actions, L, C, or R. The rewards for the round were then determined based on the realized X and the Member B’s action choice. Feedback on choices and rewards was provided at the end of each round. Since subjects participated in 40 rounds of decision making in these treatments, we commensurately increased their average payments to reflect the extra time they spent. We randomly selected three rounds, and the sum of the payoffs a subject earned in these three rounds was converted into Hong Kong Dollars at a fixed and known exchange rate of HK$1 per payoff point. A higher show-up fee of HK$40 was also provided. Subjects on average earned HK$105.88 (« US$13.57) by participating in a session that on average lasted about two hours. 4 4.1 Experimental Findings Games with Literal Messages In this subsection, we report the findings from our first set of treatments with literal messages. We first analyze the data obtained via the strategy method by examining the on-path play, using it to evaluate the relevant hypotheses. We then examine the elicited strategies and beliefs, which provide further, supporting evidence. On-Path Plays. To extract on-path plays from the elicited strategies, we apply the elicited strategies to the realized X to determine the realized path of play in each instance of the decision making. We then use these realized plays to establish the frequencies of the relevant information transmission outcomes and of the subjects’ behavior. Other than the major task of evaluating Hypotheses 1-3, the comparisons of information transmission outcomes in games with different sizes of message spaces will also allow us to assess the empirical plausibility of Farrell’s (1993) assumed existence of unsent messages, which, as discussed in Section 2.1, is an essential part of the refinement concept. instructions as focal points for their choices of messages, we randomized the relative positions of the two symbols on the decision screens and in the instructions, and this was made known to subjects. On a Member A’s decision screen, the positions of the buttons “$” and “%” are randomized in each round. We prepared two sets of instructions, in which the two symbols are stated in different orders. We randomly distributed the instructions so that half of the subjects received one set of instructions and the other half received the other set. 15 Frequencies of Fully Revealing Equilibrium Outcomes Game 2M2 Game 2M3 Game 1M3 Game 1M2 1.00 Proportion 0.80 0.60 0.40 0.20 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Round Figure 1: Frequencies of Fully Revealing Equilibrium Outcomes, Games 1M 2, 1M 3, 2M 2, and 2M 3 Figure 1 presents the round-by-round frequencies of fully revealing outcomes in the four games with literal messages. The frequencies were obtained by measuring how often the receivers took their ideal actions. Our first finding confirms Hypothesis 1: Finding 1 (Effect of the Existence of Non-Credible Neologisms). Confirming Hypothesis 1, there was no significant difference in the frequency of fully revealing outcome between Game 2M 2 and Game 2M 3. Using the independent group-level, all-round data, we fail to reject the null hypothesis of there being no difference in the frequencies of fully revealing outcome between Games 2M 2 and 2M 3, which were, respectively, 55% and 51% (two-sided p “ 0.486, Mann-Whitney test).24 While the existence of a non-credible neologism had no overall impact on the frequencies of fully revealing outcome, it did affect the senders’ and the receivers’ behavior. Figure 2 presents the frequencies of messages conditional on types and of actions conditional on messages.25 Using the observations from Game 2M 2 as a point of reference, in Game 2M 3 the introduction of the non-credible neologism (relative to the truth-telling equilibrium), “I won’t tell you,” attracted deviating behavior on the senders’ part. The senders in Game 2M 2 exhibited truthtelling behavior, where s-types sent “my type is s” 86% of the time and t-types sent “my type is t” 79% of the time. In Game 2M 3, the former frequency dropped significantly to 38% (p “ 0.014, Mann-Whitney test) and the latter not significantly to 61% (p “ 0.243, Mann-Whitney test). Furthermore, in Game 2M 3, s-types and t-types used the neologism “I won’t tell you” 58% and 31% of the time respectively. In response to “I won’t tell you,” the receivers in Game 2M 3 took 24 As Figure 1 reveals, there was no notable convergence in subjects’ behavior. We therefore use all-round data for our statistical tests. Unless otherwise indicated, the p values are from one-sided tests. We consider a comparison with p ă 0.06 as being statistically significant. 25 To illustrate how these frequencies are calculated based on on-path plays, suppose that in a round a) a sender furnishes the following strategy: sending “X is bigger than 50” if X ą 50 and sending “X is smaller than or equal to 50” if X ď 50; and b) a receiver furnishes the following strategy: taking action C after receiving message “X is bigger than 50,” taking R after receiving “X is smaller or equal to 50,” and taking L after receiving “I won’t tell you.” If it turns out that in this round the realized X ą 50, then one data point of “my type is s” being sent by s and one data point of action C being taken after “my type is s” is received will be recorded. 16 Proportions of Type‐Message Realized Pairs Proportions of Message‐Action Realized Pairs 1.00 0.60 Game 2M2 Game 2M3 Game 1M3 Game 1M2 0.80 Proportion 0.80 Proportion 1.00 Game 2M2 Game 2M3 Game 1M3 Game 1M2 0.40 0.20 0.60 0.40 0.20 0.00 "My type is s" "My type is t" "I won't tell you" "My type is s" s "My type is t" 0.00 "I won't tell you" L C "My type is s" t Type‐Message Type-Message R L C R "My type is t" Message‐Action L C R "I won’t tell you" Message-Action Figure 2: Senders’ and Receivers’ Behavior, Games 1M 2, 1M 3, 2M 2, and 2M 3 L 55% of the time. Even though theoretically speaking “I won’t tell you” is not a self-signaling neologism in Game 2M 3, s-types sent it to induce—to a certain extent successfully—the receivers to take the ex-ante ideal action so as to obtain the maximal payoff of 50. Unlike the case of the senders, the presence of a non-credible neologism did not attract more deviating behavior from the receivers. In fact, compared to the responses in Game 2M 2, in Game 2M 3 the receivers’ responses to “my type is s” and “my type is t” were more in line with the truth-telling equilibrium behavior (i.e., taking actions as if the sender was telling the truth). The frequency of action C conditional on “my type is s” increased significantly from 62% in Game 2M 2 to 80% in Game 2M 3 (p “ 0.014, Mann-Whitney test); the frequency of action R conditional on “my type is t” increased from 70% in Game 2M 2 to 72% in Game 2M 3, though not significantly (p “ 0.557, Mann-Whitney test). In terms of generating deviating behavior, the neologism “I won’t tell you,” being non-credible in Game 2M 3, affected the senders but not the receivers. Relative to the observations from Game 2M 2, the more frequent deviating behavior of the senders was more or less compensated by the more frequent truth-telling equilibrium behavior of the receivers, resulting in the same overall frequencies of fully revealing outcome between the two games.26 We report our second finding, which confirms Hypothesis 2: Finding 2 (Effect of the Credibility of Neologisms). Confirming Hypothesis 2, the frequency of fully revealing outcome was significantly higher in Game 2M 3 than in Game 1M 3. The frequency of fully revealing outcome in Game 1M 3 was 32%, significantly lower than the 51% in Game 2M 3 (p “ 0.029, Mann-Whitney test). This is consistent with our premise that 26 The frequencies of fully revealing outcomes observed in Games 2M 2 and 2M 3, being 55% and 51%, may appear to be on the low side, calling into question how well subjects played according to the equilibrium. Note, however, that even with reasonably high frequencies of conforming behavior, e.g., that the senders truthfully reveal 80% of the time and the receivers optimally respond 80% of the time, the resulting frequencies of fully revealing outcomes would be substantially lower at 64%. Also, our primary interests are in the comparative statics, not the absolute levels of the frequencies. 17 the equilibrium that survives neologism-proofness is more likely to be played in the laboratory. The credibility of the neologism, however, had only small impacts on the senders’ uses of the neologism. For t-types, the frequency of “I won’t tell you” increased insignificantly from 31% in Game 2M 3 to 45% in Game 1M 3 (p “ 0.100, Mann-Whitney test); for s-types, the frequency of “I won’t tell you” decreased insignificantly from 58% to 45% (p “ 0.100, Mann-Whitney test). By contrast, the receivers’ behavior was more significantly influenced by the credibility of the neologism; we observed a lower extent of truth-telling equilibrium behavior when the neologism became credible. The frequency of action C conditional on “my type is s” decreased significantly from 80% in Game 2M 3 to 54% in Game 1M 3 (p “ 0.057, Mann-Whitney test); the frequency of R conditional on “my type is t” decreased, though insignificantly, from 72% in Game 2M 3 to 56% in Game 1M 3 (p “ 0.243, Mann-Whitney test). The two properties of neologisms, their existence and credibility, therefore had totally opposite implications for laboratory behavior: in terms of generating deviating behavior, the credibility of neologisms affected, opposite to the case of existence, the receivers but not the senders. This finding also suggests that the lower adherence to fully revealing outcome in Game 1M 3 relative to the observations from Game 2M 3 was a result of a more frequent deviations of the receivers (while the senders’ behavior showing minimal variations between the two games). We report our last finding from the literal-message treatments, which confirms Hypothesis 3: Finding 3 (Effect of the Existence and Credibility of Neologisms). Confirming Hypothesis 3, the frequency of fully revealing outcome was significantly lower in Game 1M 3 than in Game 1M 2. The frequency of fully revealing outcome in Game 1M 2 was 67%, significantly higher than the 32% in Game 1M 3 (p “ 0.015, Mann-Whitney test). For the senders’ behavior, the frequency at which s-types sent “my type is s” increased significantly from 45% in Game 1M 3 to 85% in Game 1M 2 (p “ 0.015, Mann-Whitney test); the frequency at which t-types sent “my type is t” increased significantly from 40% in Game 1M 3 to 85% in Game 1M 2 (p “ 0.014, Mann-Whitney test). For the receivers’ behavior, the frequency of C conditional on “my type is s” increased insignificantly from 54% in Game 1M 3 to 71% in Game 1M 2 (p “ 0.171, Mann-Whitney test); the frequency of R conditional on “my type is t” increased significantly from 56% in Game 1M 3 to 84% in Game 1M 2 (p “ 0.029, Mann-Whitney test). Getting rid of any neologism increased the frequencies of truth-telling equilibrium behavior of both the senders and the receivers, and this resulted in the receivers taking their ideal actions significantly more often in Game 1M 2 than in Game 1M 3. To conclude our on-path-play analysis, we further note that the different information transmission outcomes observed under two different sizes of message spaces, as we see in Games 1M 2 18 and 1M 3, undermines the idea that randomization is a natural empirical tendency. This observation lends force to the empirical plausibility of Farrell’s (1993) assumed existence of unsent messages. Even though there was no significant difference between the frequencies of fully revealing outcome in Games 2M 2 and 2M 3, the fact that the underlying senders’ and receivers’ behavior differed in the two games also rules out randomization as a plausible phenomenon. Elicited Strategies and Beliefs. We proceed to examine the elicited strategies, providing further support for the above on-path-play findings. Senders’ elicited beliefs about the receivers’ responses will also be examined. In analyzing the senders’ elicited strategies, we classify the strategies into the following four categories: literal babbling (“I won’t tell you” chosen for both s and t), non-revealing (the same message chosen for both s and t), truth-telling (“My type is s” chosen for s and “My type is t” chosen for t), and fully revealing (different messages chosen for s and t).27 For the receivers, we classify their elicited strategies into two categories: pooling (action L chosen for all available messages) and separating (action C chosen for message “my type is s,” R chosen for “my type is t,” and, for three-message games, any one of the actions chosen for “I won’t tell you”). Figure 3 presents the frequencies of the senders’ and the receivers’ strategy categories. Frequencies of Strategy Categories Frequencies of Strategy Categories 1.00 1.00 Game 2M2 Game 2M3 0.80 Game 1M3 Proportion Game 1M2 Proportion Game 2M2 Game 2M3 0.80 0.60 0.40 Game 1M3 Game 1M2 0.60 0.40 0.20 0.20 0.00 0.00 Literal Babbling Non‐Revealing Truth Telling Fully revealing Strategy Category Senders’ Strategy Categories Pooling Separating Separating Separating Separating (L, L, L)/(L, L) (C, R, *)/(C, R) (C, R, L) Strategy Category (C, R, C) (C, R, R) Receivers’ Strategy Categories Figure 3: Senders’ and Receivers’ Strategies, Games 1M 2, 1M 3, 2M 2, and 2M 3 In Game 2M 2, almost all fully revealing strategies of the senders were truth telling (82% vs. 76%). This is to be contrasted with Game 2M 3, in which the frequencies of fully revealing and truth-telling strategies were, respectively, 64% and 31%. This gap suggests that some senders in Game 2M 3 chose to send “I won’t tell you” for one type and another message for another type.28 The on-path-play finding that the non-credible neologism attracted deviating behavior from the senders was also apparent from the significant drop in the frequency of truth-telling strategies from 76% in Game 2M 2 to 31% in Game 2M 3 (p “ 0.0143, Mann-Whitney test). 27 Note that some of these categories are not mutually exclusive: a literal babbling strategy is also non-revealing, while a truth-telling strategy is also fully revealing. 28 The on-path plays shown in the left panel of Figure 2 suggest that these senders chose “I won’t tell you” for s and “My type is t” for t, which would effectively reveal their types. 19 Consistent with the fact that the fully revealing outcome in Game 1M 3 cannot be supported as neologism-proof, there were more instances of literal babbling in Game 1M 3 than in Game 2M 3 (37% vs. 29%). The difference was, however, not significant (p “ 0.3316, Mann-Whitney test), echoing the on-path-play finding that the effect of the credibility of a neologism on the senders was present but overall not significant. Without any neologism, the frequencies of fully revealing and truth-telling strategies in Game 1M 2 (77% and 71%) were as high as those in Game 2M 2 (82% and 76%) and significantly higher than those in Game 1M 3 (51% and 33%; p “ 0.0143, Mann-Whitney tests), again echoing the corresponding on-path-play findings. For the receivers, the existence of a non-credible neologism in Game 2M 3 led to a greater adoption of separating strategy, although the increase was not significant: the frequencies of this strategy category increased from 58% in Game 2M 2 to 72% in Game 2M 3 (p “ 0.1918, MannWhitney test). This non-significant increase was, however, in line with the on-path-play finding that the presence of a non-credible neologism had a less significant impact on the receivers than on the senders. Further echoing the finding that the credibility of the neologism contributed to deviating behavior from the receivers, a significantly lower frequency of separating strategy was observed in Game 1M 3 (45% vs. 72% in Game 2M 3; p “ 0.0571, Mann-Whitney test); a higher frequency of pooling strategy, though not significant, was also observed in Game 1M 3 (28% vs. 14% in Game 2M 3; p “ 0.1547, Mann-Whitney test). Finally, the frequency of separating strategy was 72% in Game 1M 2, significantly higher than the 45% in Game 1M 3 (p “ 0.0147, Mann-Whitney test). Overall, the analysis of elicited strategies confirms that the on-path plays were results of subjects playing the corresponding strategies. Senders' Predictions of Receivers' Responses Conditional on Senders' Strategy Categories Game 2M2 Game 2M3 Game 1M3 Game 1M2 1.00 Proportion 0.80 0.60 0.40 0.20 0.00 Separating Pooling Literal Babbling Separating Pooling Separating Pooling Non‐Revealing Truth Telling Strategy Category‐Prediction Separating Pooling Fully Revealing Figure 4: Senders’ Predictions of Receivers’ Responses, Games 1M 2, 1M 3, 2M 2, and 2M 3 Figure 4 presents the elicited senders’ beliefs about the receivers’ responses, which are classified into either separating or pooling.29 In presenting the frequency bars of the predicted receivers’ 29 Recall that the senders’ beliefs were elicited after they were informed about their types, and each sender was asked to predict his/her paired receiver’s response to the message implemented given his/her realized type (and the furnished strategy). Accordingly, in characterizing the senders’ predictions of the receivers’ responses, we define separating and pooling with respect to the realized type: a predicted response is classified as separating if 20 responses, we categorize them by the senders’ strategy categories defined above. This helps make clear that the senders’ strategies were, to varying degrees, consistent with their reported beliefs. In all four games, more than 50% of the predicted receivers’ responses, ranging from 54% to 92%, were best responses to the senders’ adopted literal babbling or truth-telling strategies. Even when considering the more encompassing sets of non-revealing and fully-revealing strategies, the corresponding percentages were above 40% and as high as 83%. These findings suggest that what we observed in the laboratory was largely a manifestation of purposeful behavior; out of the anticipation that the receivers would best respond accordingly, some of the senders adopted the truth-telling/fully revealing strategies and some adopted the babbling strategies.30 We nevertheless note the unique finding from Game 1M 3, in which substantial proportions— up to 45%—of the predicted receivers’ responses were pooling when the senders adopted the truth-telling or fully revealing strategies. These senders revealed their types even when they anticipated that their messages would be ignored. Recall that Game 1M 3 has the unique property among the four games with literal messages that its fully revealing outcome is the only one that is not neologism-proof. The senders’ pooling predictions signaled their anticipations that the receivers understood that they had no incentives to truthfully reveal in the presence of a credible neologism. While any communication strategy would be a best response to the beliefs that the messages would be ignored, these senders chose to break the indifference by being truthful. This suggests that, consistent with the lying aversion documented in prior experimental studies, some of our subjects might have a homegrown preference for telling the truthful. 4.2 Games with A Priori Meaningless Messages In this subsection, we report findings from our second set of treatments with a priori meaningless messages, focusing on the three hypothesized effects in Hypothesis 4. Not surprisingly, in the absence of a pre-existing common language, we observed a greater variety in the uses and interpretations of messages across different matching groups. To provide a comprehensive picture of our findings, we therefore make individual group behavior the focus of our analysis. Table 3 reports, for each matching group in Games 1E and 2E, the frequencies of messages conditional on types and of types conditional on messages. The former frequency captures how the senders used the messages, and the latter helps determine the endogenous, empirical meanings of messages. The frequencies are aggregated separately across the first 20 rounds and the last 20 rounds. In order to investigate whether a new meaning emerges for the third message [Hypothesis the sender anticipates that 1) the receiver will take C after observing his/her type to be s, or 2) the receiver will take R after observing his/her type to be t; a predicted response is classified as pooling if the sender anticipates that the receiver will take L after observing his/her own type to be either s or t. 30 Refer to Appendix A for a level-k analysis in which we explore an alternative rationalization of our findings. While complementarities exist between neologism-proofness and the level-k approach, a natural specification of level-k models for communication games fails to explain some of our findings from Games 1M 3 and 2M 2. 21 4(i)], it is imperative to first ascertain the endogenous meanings of the two initial messages. Amid the heterogenous behavior across matching groups, our strategy is not to look for general empirical regularity but to explore whether individual observations consistent with the predicted emergence of meanings exist. For each game, we first identify the matching groups in which separation occurred and distinct meanings emerged for the two messages in the first 20 rounds. We then evaluate among these groups the meaning of the third message in the last 20 rounds. We apply a simple criterion to the first-20-round frequencies of types conditional on messages to determine whether distinct meanings have emerged for the two initial messages.31 For Game 1E, we conclude that distinct meanings could be established for “$” and “%” in the first 20 rounds in 6 out of the 12 observations (matching groups). These identified observations are highlighted in Table 3.32 Observations 2-2 (Session-Group) and 2-5 represent, respectively, the strongest and the weakest examples. In the first 20 rounds of Observation 2-2, message “$” was sent exclusively by type t—the frequency of t conditional on “$” was 100%—and 87% of the time message “%” was sent by type s, which strongly established the empirical meaning of “$” to be t and that of “%” to be s.33 For Observation 2-5, 70% of the time “$” was sent by s and 65% of the time “%” was sent by t, establishing the meaning of “$” to be s and that of “%” to be t. Note that different meanings emerged in these two observations, but the mapping [“%” ÞÑ s and “$” ÞÑ t] was the dominant finding: it was established not only for Observation 2-2 but also for the other four observations, 1-4, 1-6, 2-4, and 2-6.34 31 Our criterion, though rather arbitrary, requires reasonably clear differentiation between the uses of the two messages while providing leeway for imperfect subjects’ behavior. For each message, the type with the higher conditional frequency is considered the candidate for the meaning of the message. The necessary condition for distinct meanings is that the two candidates, one for each message, cannot be the same. (We use the empirical frequencies of s and t rather than their theoretical equal priors in the calculation of the conditional frequencies, which makes it possible that the higher conditional frequencies for both messages rest on the same type). In concluding whether the distinct candidates are indeed deemed to be the meanings, we further require the sufficient condition that the lower of the two conditional frequencies, one for each candidate, be at least 60% and the higher be at least 70%. The ensuing examples in the main text illustrate our criterion. 32 The shaded observations in Table 3 represent the observations in which individual group behavior was consistent with the theoretical predictions, either only in the first 20 rounds or in all 40 rounds. 33 Note how this can be reconciled with the frequencies of messages conditional on types via the Bayes’ rule. Table 3 reveals that, in the first 20 rounds of Observation 2-2, 100% of the messages sent by s-types was “%” and 85% of the messages sent by t-types was “$”. Since s-types never sent “$” while t-types did, the frequency of t conditional on “$” must be 100%. On the other hand, since “%” was sent by both types, but the frequency by s was higher, the frequency of s conditional on “%” should also be higher than that of t conditional on “%”. 34 Observation 1-1 was close but did not meet our criterion for the establishment of meanings. It is, however, another observation in which subjects used messages in a way consistent with the alternative meaning mapping. 22 23 Freq.(Message|Type) s t “$” “%” “$” “%” 0.24 0.76 0.78 0.22 0.14 0.86 0.67 0.33 0.05 0.95 0.68 0.32 0.05 0.95 0.89 0.11 0.10 0.90 0.95 0.05 0.00 1.00 0.95 0.05 0.26 0.74 0.76 0.24 0.40 0.60 0.56 0.44 0.25 0.75 0.62 0.38 0.52 0.48 0.41 0.59 0.00 1.00 1.00 0.00 0.00 1.00 1.00 0.00 – – – – Game/ Obs. 2E Sess.-Grp. 1-1 1-2 1-3 1-4 1-5 1-6 2-1 2-2 2-3 2-4 2-5 2-6 Mean First 20 Rounds Freq.(Type|Message) “$” “%” Sent s t Sent s 0.55 0.18 0.82 0.45 0.72 0.38 0.20 0.80 0.62 0.76 0.35 0.07 0.93 0.65 0.77 0.45 0.06 0.94 0.55 0.91 0.50 0.10 0.90 0.50 0.95 0.45 0.00 1.00 0.55 0.95 0.48 0.32 0.68 0.52 0.81 0.50 0.30 0.70 0.50 0.45 0.48 0.21 0.79 0.52 0.57 0.48 0.63 0.37 0.52 0.52 0.50 0.00 1.00 0.50 1.00 0.62 0.00 1.00 0.38 1.00 0.48 – – 0.52 – First 20 Rounds Freq.(Type|Message) “$” “%” Sent s t Sent s 0.65 0.58 0.42 0.35 0.21 0.62 0.52 0.48 0.38 0.53 0.50 0.60 0.40 0.50 0.50 0.48 0.11 0.89 0.52 0.76 0.40 0.25 0.75 0.60 0.54 0.48 0.37 0.63 0.52 0.76 0.30 0.50 0.50 0.70 0.43 0.43 0.00 1.00 0.57 0.87 0.70 0.32 0.68 0.30 0.58 0.43 0.29 0.71 0.57 0.78 0.50 0.70 0.30 0.50 0.35 0.25 0.20 0.80 0.75 0.63 0.48 – – 0.52 – t 0.28 0.24 0.23 0.09 0.05 0.05 0.19 0.55 0.43 0.48 0.00 0.00 – t 0.79 0.47 0.50 0.24 0.46 0.24 0.57 0.13 0.42 0.22 0.65 0.37 – “$” 0.05 0.00 0.00 0.00 0.00 0.04 0.12 0.16 0.16 0.65 0.00 0.00 – “$” 0.32 0.59 0.85 0.05 0.05 0.00 0.15 0.00 0.06 0.06 0.32 0.04 – “&” 0.68 0.23 0.05 0.82 0.57 0.55 0.62 0.61 0.59 0.28 0.59 0.54 – “$” 0.62 0.33 0.05 0.50 0.32 0.67 0.21 0.59 0.22 0.41 0.33 0.44 – t “%” 0.00 0.33 0.55 0.00 0.42 0.06 0.29 0.00 0.09 0.00 0.17 0.31 – s “%” 0.62 0.95 0.36 0.76 0.57 0.70 0.59 0.42 0.68 0.30 1.00 0.24 – “&” 0.33 0.05 0.64 0.24 0.43 0.26 0.29 0.42 0.16 0.05 0.00 0.76 – “$” 0.63 0.75 0.62 0.83 1.00 0.76 0.69 0.38 0.66 0.25 0.96 0.68 – t “%” 0.11 0.25 0.00 0.00 0.00 0.18 0.09 0.13 0.05 0.45 0.00 0.00 – Freq.(Message|Type) s “%” 0.00 0.18 0.10 0.14 0.38 0.45 0.23 0.39 0.35 0.67 0.09 0.42 – Freq.(Message|Type) “&” 0.26 0.00 0.38 0.17 0.00 0.06 0.22 0.49 0.29 0.30 0.04 0.32 – “&” 0.38 0.33 0.40 0.50 0.26 0.28 0.50 0.41 0.69 0.59 0.50 0.25 – t 0.68 0.32 0.06 0.90 0.86 1.00 0.43 1.00 0.83 0.90 0.46 0.87 – Sent 0.33 0.38 0.40 0.47 0.65 0.35 0.45 0.25 0.43 0.44 0.59 0.33 0.42 “$” s 0.08 0.00 0.00 0.00 0.00 0.07 0.11 0.40 0.18 0.72 0.00 0.00 – t 0.92 1.00 1.00 1.00 1.00 0.93 0.89 0.60 0.82 0.28 1.00 1.00 – Last 20 Rounds Sent 0.48 0.47 0.44 0.25 0.18 0.30 0.18 0.33 0.15 0.25 0.33 0.20 0.30 “$” s 0.32 0.68 0.94 0.10 0.14 0.00 0.57 0.00 0.17 0.10 0.54 0.13 – Last 20 Rounds Freq.(Type|Message) “%” Sent s t 0.37 0.87 0.13 0.59 0.79 0.21 0.13 1.00 0.00 0.33 1.00 0.00 0.20 1.00 0.00 0.47 0.84 0.16 0.30 0.83 0.17 0.30 0.83 0.17 0.35 0.93 0.07 0.38 0.40 0.60 0.38 1.00 0.00 0.13 1.00 0.00 0.33 – – Freq.(Type|Message) “%” Sent s t 0.00 N/A N/A 0.25 0.40 0.60 0.33 0.15 0.85 0.08 1.00 0.00 0.40 0.50 0.50 0.28 0.91 0.09 0.25 0.60 0.40 0.18 1.00 0.00 0.20 0.75 0.25 0.30 1.00 0.00 0.13 0.40 0.60 0.38 0.67 0.33 0.23 – – Sent 0.30 0.03 0.47 0.20 0.15 0.18 0.25 0.45 0.23 0.18 0.03 0.54 0.25 Sent 0.53 0.28 0.23 0.67 0.42 0.42 0.57 0.49 0.65 0.45 0.49 0.42 0.47 “&” s 0.58 1.00 0.47 0.50 1.00 0.86 0.50 0.56 0.33 0.14 0.00 0.73 – “&” s 0.62 0.45 0.11 0.67 0.71 0.71 0.70 0.55 0.38 0.28 0.59 0.76 – t 0.42 0.00 0.53 0.50 0.00 0.14 0.50 0.44 0.67 0.86 1.00 0.27 – t 0.38 0.55 0.89 0.33 0.29 0.29 0.30 0.45 0.62 0.72 0.41 0.24 – Note: “Freq.(Message|Type)”, the frequency of message conditional on type, is used to measure how the senders used the messages. “Freq.(Type|Message)”, the frequency of type conditional on message, is used to determine the endogenous, empirical meanings of messages implied by the senders’ uses of messages; “N/A” indicates the cases in which this conditional frequency cannot be calculated because the message in question was not used at all. The additional columns “Sent” provide the total frequency at which the particular message was sent by the senders (received by the receivers; same as the frequencies reported under “Received” in Table 4). Means are reported only for these “Sent” frequencies, because alternative uses of the a priori meaningless messages in different matching groups render the means for other frequencies not informative about the average behavior of the senders. The shaded observations represent the ones in which individual group behavior was consistent with the theoretical predictions, either only in the first 20 rounds or in all 40 rounds. Freq.(Message|Type) s t “$” “%” “$” “%” 0.83 0.17 0.50 0.50 0.62 0.38 0.63 0.37 0.55 0.45 0.44 0.56 0.11 0.89 0.77 0.23 0.24 0.76 0.52 0.48 0.30 0.70 0.71 0.29 0.33 0.67 0.27 0.73 0.00 1.00 0.85 0.15 0.56 0.44 0.79 0.21 0.22 0.78 0.71 0.29 0.67 0.33 0.32 0.68 0.10 0.90 0.42 0.58 – – – – Game/ Obs. 1E Sess.-Grp. 1-1 1-2 1-3 1-4 1-5 1-6 2-1 2-2 2-3 2-4 2-5 2-6 Mean Table 3: Frequencies of Messages Conditional on Types and of Types Conditional on Messages, Games 1E and 2E For the last 20 rounds, recall that the third message, “&”, is predicted to be self-signaling in Game 1E. Given that separation has occurred with the two initial messages, it pays for both s and t to send “&” when it becomes available as a neologism, which leads to a new endogenous meaning that both types are equally likely. Using a criterion consistent with the one applied to the first 20 rounds, we conclude that 2 out of the 6 observations identified above, 2-2 and 2-5, had this new meaning emerged for “&”.35 Observation 2-2 remains to be the strongest example in the second half of the sessions. Table 3 reveals that in Observation 2-2 the frequencies of s and t conditional on “&” were approximately equal (respectively 55% and 45%) and were clearly different from the 87% frequency of s (conditional on “%”) and the 100% frequency of t (conditional on “$”) in the first 20 rounds, indicating that the endogenous meaning of “&” was not present before it was introduced. It is worth mentioning another one of the 6 observations identified in the first-20-round analysis, Observation 1-4. While they do not meet our criterion for the last 20 rounds, the respective frequencies of s and t conditional on “&” in this observation were 67% and 33%, fairly different from the 76% frequency of s (conditional on “%”) and the 89% frequency of t (conditional on “$”) in the first 20 rounds. More importantly, Table 3 reveals that in this observation the frequency of “&” was 67%, the highest among all the 12 observations from Game 1E. This high frequency of “&” and the corresponding low frequencies of the two other messages in the last 20 rounds (e.g., the frequency of “%” was down from 52% in the first 20 rounds to a mere 8% after “&” was introduced) suggest that there was a strong incentive for the senders to deviate to send the neologism, an implication of the self-signaling property of “&”. Accordingly, we also highlight Observation 1-4 as an observation from Game 1E that was in line with the predicted self-signaling property of “&”. Note further that substantial decreases in the frequencies of the two initial messages after “&” was introduced were also observed in the other two highlighted observations, 2-2 and 2-5. Applying our criterion for the first 20 rounds to Game 2E, we conclude that distinct meanings could be established for “$” and “%” in 9 out of the 12 observations. Table 3 reveals that Observations 2-5 and 2-6 represent the strongest examples. In both observations, “$” was sent exclusively by t and “%” exclusively by s. In the other 7 identified observations, 1-1, 1-2, 1-3, 1-4, 1-5, 1-6, and 2-1, the distinct meanings were also strongly established, with the relevant conditional frequencies being at least above 70% (with one exception) and as high as 95%. It is also curious that the mapping [“%” ÞÑ s and “$” ÞÑ t] was observed in all 9 observations.36 35 We consider the frequencies of s and t conditional on “&” to be approximately equal if they are within 50% ˘ 10%. 36 This common mapping of meanings was observed in Game 2E (and to a lesser extent in Game 1E) despite our effort to randomize the distributions of alternative instructions and the decision buttons on the screen. Although a keyboard was not involved in subjects’ inputs of decisions, we speculate that the relative positions of “$” and “%” on a keyboard attached to a computer terminal in our lab might have provided a focal point for mapping “$” to X ď 50 (θ “ t) and “%” to X ą 50 (θ “ s), which would have been hard to avoid even if we had used other symbols for the messages. Regardless, this potential focal point originated from the keyboard positions of 24 For the last 20 rounds, recall the prediction for Game 2E that, given that separation has occurred with the two initial messages, the third message, “&”, is not preferred by type t and is only weakly preferred by type s, rendering “&” not credible. With the endogenous meaning that the sender’s type is s (since only s may send “&”), “&” does not convey any meaning not present before it is introduced. Out of the 9 observations identified above, there were 4 observations, 1-2, 1-5, 1-6, and 2-6, in which “&” was established under our criterion to mean s; Table 3 reveals that in these observations the frequencies of s conditional on “&” ranged from 73% to 100%. Parallel to the case of Observation 1-4 from Game 1E, in Observation 2-5 from Game 2E we observed an extremely low frequency of “&”—only 3%. Even though the meaning of “&” in this observation was established perfectly to be t, it was established under only a few cases of “&”. More importantly, a low frequency of “&” was a predicted consequence of the noncredibility of “&” as a neologism.37 We therefore also highlight Observation 2-5 from Game 2E as a conforming observation for the last 20 rounds. In fact, while we were able to establish the predicted meanings only for a subset of observations, the predicted consequence of the contrasting properties of “&” in Games 1E and 2E on the frequencies of the neologism being used [Hypothesis 4(ii)] was observed in most observations: the average frequency of “&” was 47% in Game 1E, significantly higher than the 25% in Game 2E (p “ 0.003, Mann-Whitney test); the self-signaling property of “&” in Game 1E rendered it on average the most frequently sent message among the three messages available in the last 20 rounds, while the non-credibility of “&” in Game 2E made it the least popular. We summarize our findings to this point, in which we address the qualitative Hypothesis 4(i) by emphasizing the existence of conforming behavior and confirm the quantitative Hypothesis 4(ii) with aggregate behavior: Finding 4a (Effects of the Evolution of Meanings of Credible and Non-Credible Neologisms). (i) There existed observations in Games 1E and 2E that were consistent with the evolutions of meanings hypothesized in Hypothesis 4(i). In two observations from Game 1E, the meaning “the sender is (approximately) equally probable to be of either type” emerged for the third, neologistic message, “&”, which was a meaning not present before “&” was introduced; in four observations from Game 2E, the meaning “the sender’s type is (highly likely to be) s” emerged for “&”, which was an existing meaning before “&” was introduced. (ii) Confirming Hypothesis 4(ii), the average frequency of “&” being sent was significantly higher in Game 1E than in Game 2E. the symbols did not undermine the purpose of our design, which is to avoid focal points to be developed directly from the literal meanings of messages. 37 Note also that an endogenous meaning of t for “&”, though inconsistent with the specific prediction on the emerged meaning of “&”, is consistent with the more general prediction that no new meaning emerges for “&”. 25 26 Received 0.55 0.38 0.35 0.45 0.50 0.45 0.48 0.50 0.48 0.48 0.50 0.62 0.48 Game/ Obs. 2E Sess.-Grp. 1-1 1-2 1-3 1-4 1-5 1-6 2-1 2-2 2-3 2-4 2-5 2-6 Mean “$” L 0.09 0.47 0.00 0.06 0.00 0.00 0.26 0.35 0.05 0.47 0.00 0.40 – “$” L 0.58 0.44 0.15 0.26 0.19 0.16 0.17 0.12 0.46 0.41 0.35 0.10 – R 0.19 0.28 0.35 0.74 0.56 0.73 0.50 0.88 0.54 0.53 0.55 0.90 – Received 0.35 0.38 0.50 0.52 0.60 0.52 0.70 0.57 0.30 0.57 0.50 0.75 0.52 C 0.23 0.20 0.00 0.00 0.00 0.06 0.21 0.05 0.16 0.21 0.05 0.04 – R 0.68 0.33 1.00 0.94 1.00 0.94 0.53 0.60 0.79 0.32 0.95 0.56 – Received 0.45 0.62 0.65 0.55 0.50 0.55 0.52 0.50 0.52 0.52 0.50 0.38 0.52 First 20 Rounds Freq.(Action|Message) C 0.23 0.28 0.50 0.00 0.25 0.11 0.33 0.00 0.00 0.06 0.10 0.00 – First 20 Rounds Freq.(Action|Message) “%” L 0.22 0.64 0.38 0.00 0.15 0.64 0.38 0.30 0.43 0.52 0.40 0.33 – “%” L 0.07 0.20 0.30 0.38 0.17 0.29 0.29 0.26 0.17 0.13 0.45 0.10 – C 0.56 0.20 0.58 0.91 0.85 0.27 0.43 0.50 0.47 0.19 0.55 0.60 – C 0.29 0.33 0.20 0.57 0.58 0.57 0.29 0.65 0.33 0.74 0.15 0.83 – R 0.22 0.16 0.04 0.09 0.00 0.09 0.19 0.20 0.10 0.29 0.05 0.07 – R 0.64 0.47 0.50 0.05 0.25 0.14 0.42 0.09 0.50 0.13 0.40 0.07 – Received 0.33 0.38 0.40 0.47 0.65 0.35 0.45 0.25 0.43 0.44 0.59 0.33 0.42 Received 0.48 0.47 0.44 0.25 0.18 0.30 0.18 0.33 0.15 0.25 0.33 0.20 0.30 “$” L 0.00 0.67 0.00 0.00 0.00 0.00 0.00 0.20 0.12 0.44 0.00 0.38 – “$” L 0.47 0.47 0.11 0.30 0.43 0.00 0.29 0.00 0.00 0.20 0.69 0.00 – C 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.06 0.28 0.00 0.00 – C 0.37 0.42 0.89 0.00 0.00 0.00 0.14 0.00 0.00 0.00 0.00 0.00 – R 0.92 0.33 1.00 1.00 1.00 1.00 1.00 0.70 0.85 0.28 1.00 0.62 – R 0.16 0.11 0.00 0.70 0.57 1.00 0.57 1.00 1.00 0.80 0.31 1.00 – Last 20 Rounds Freq.(Action|Message) “%” Received L C R 0.37 0.00 1.00 0.00 0.59 0.66 0.21 0.13 0.13 0.00 1.00 0.00 0.33 0.00 1.00 0.00 0.20 0.13 0.87 0.00 0.47 0.47 0.53 0.00 0.30 0.58 0.42 0.00 0.30 0.25 0.42 0.33 0.35 0.14 0.86 0.00 0.38 0.33 0.54 0.13 0.38 0.67 0.33 0.00 0.13 0.60 0.40 0.00 0.33 – – – Last 20 Rounds Freq.(Action|Message) “%” Received L C R 0.00 N/A N/A N/A 0.25 0.10 0.30 0.60 0.33 0.23 0.15 0.62 0.08 0.00 1.00 0.00 0.40 0.38 0.31 0.31 0.28 0.00 1.00 0.00 0.25 0.40 0.10 0.50 0.18 0.29 0.71 0.00 0.20 0.25 0.50 0.25 0.30 0.33 0.67 0.00 0.18 0.40 0.00 0.60 0.38 0.13 0.74 0.13 0.23 – – – Received 0.30 0.03 0.47 0.20 0.15 0.18 0.25 0.45 0.23 0.18 0.03 0.54 0.25 Received 0.53 0.28 0.23 0.67 0.42 0.42 0.57 0.49 0.65 0.45 0.49 0.42 0.47 “&” L 0.17 1.00 0.68 0.00 0.17 0.57 0.60 0.33 0.45 0.14 1.00 0.72 – “&” L 0.47 0.45 0.44 0.52 0.18 0.59 0.26 0.70 0.50 0.17 0.41 0.06 – C 0.25 0.00 0.16 0.25 0.50 0.29 0.40 0.39 0.33 0.43 0.00 0.14 – C 0.29 0.00 0.00 0.48 0.58 0.35 0.17 0.00 0.00 0.33 0.05 0.24 – R 0.58 0.00 0.16 0.75 0.33 0.14 0.00 0.28 0.22 0.43 0.00 0.14 – R 0.24 0.55 0.56 0.00 0.24 0.06 0.57 0.30 0.50 0.50 0.54 0.70 – Note: “Freq.(Action|Message)”, the frequency of action conditional on message, is used to measure how the receivers responded to the senders’ messages. The additional columns “Received” provide the total frequency at which the particular message was received by the receivers (sent by the senders; same as the frequencies reported under “Sent” in Table 3). Means are reported only for these “Received” frequencies, because alternative responses to the a priori meaningless messages in different matching groups render the means for other frequencies not informative about the average behavior of the receivers. The shaded observations represent the ones in which individual group behavior was consistent with the theoretical predictions, either only in the first 20 rounds or in all 40 rounds. Received 0.65 0.62 0.50 0.48 0.40 0.48 0.30 0.43 0.70 0.43 0.50 0.25 0.48 Game/ Obs. 1E Sess.-Grp. 1-1 1-2 1-3 1-4 1-5 1-6 2-1 2-2 2-3 2-4 2-5 2-6 Mean Table 4: Frequencies of Actions Conditional on Messages, Games 1E and 2E We proceed to evaluate the remaining Hypothesis 4(iii). Given the senders’ uses of messages, the achievement of fully revealing outcome depends on whether the implied meanings are picked up by the receivers. Before we evaluate the revelation outcomes, it is therefore useful to first examine the receivers’ behavior. Table 4 reports, for each matching group in Games 1E and 2E, the frequencies of actions conditional on messages, which are again aggregated separately across the first 20 rounds and the last 20 rounds. Our analysis of the receivers’ responses focuses again on individual group behavior and involves two steps. For each game, we first identify, among the matching groups with established distinct meanings in the first 20 rounds, the groups in which the receivers’ actions were best responses given the meanings. We then further examine among the identified groups the receivers’ responses to the third message in the last 20 rounds. We apply a similar criterion to the first-20-round frequencies of actions conditional on messages. Among the 6 observations from Game 1E with established meanings in the first 20 rounds, we identified 5 observations, 1-4, 1-6, 2-2, 2-4, and 2-6, in which the receivers’ most frequent responses were best responses to the corresponding messages given their empirically determined meanings.38 Observation 2-2 remains to be one of the strongest examples when it comes to the receivers’ responses. Recall that in Observation 2-2 (as well as in the other 4 identified observations), the meaning of “$” was established to be t and that of “%” was established to be s; Table 4 reveals that in this observation the receiver’s ideal action under t, R, was taken 88% of the time when “$” was received, and the ideal action under s, C, was taken 65% of the time when “%” was received. Observation 2-6 represents an even stronger example, in which the two ideal actions were taken 90% and 83% of the time. It is also curious that the only observation where meanings could be established but the best response frequencies did not meet our criterion, Observation 2-5, was the observation in which the alternative meaning mapping was established. Regarding the responses to the third message, “&”, in 3 out of the 5 observations from Game 1E identified above, 1-4, 1-6, and 2-2, the receivers’ most frequent responses were L, the ex-ante ideal action under the equal priors. On the other hand, recall that Observations 1-4, 2-2, and 2-5 had either the meaning of “&” established to be that both sender’s types are equally likely or such a meaning was less clearly established but there was a high frequency of “&”. The intersection of these two sets of observations, 1-4 and 2-2, represent the cases where both types of the sender deviated to send the neologism when it became available and the receivers were able to respond optimally to such deviations. In fact, Observation 2-2 was the strongest example all along. Both the senders and the receivers in this matching group behaved consistently with the predicted disruption to the fully revealing equilibrium when a self-signaling neologism was introduced. While we only obtained one such observation, it is remarkable that subjects were able 38 Our criterion places a more stringent requirement than demanding the best responses to be chosen more than 1/3 of the time. For the conditional frequencies of the two best responses, one for each message, we require the lower frequency to be at least 50% and the higher to be at least 60%. 27 to respond optimally to the incentives underlying the self-signaling neologism in an environment without pre-existing common language. For the first 20 rounds in Game 2E, we identified 6 out of the 9 observations with established meanings, 1-1, 1-3, 1-4, 1-5, 2-5, and 2-6, in which the receivers best responded to the meanings. The strongest examples were Observations 1-4 and 1-5. Recall that the same meaning mapping, [“$” ÞÑ t and “%” ÞÑ s], was established for those 9 observations. Table 4 reveals that, in Observation 1-4, the frequency of R conditional on “$” was 94% and that of C conditional on “%” was 91%; in Observation 1-5, the corresponding frequencies were 100% and 85%. For the last 20 rounds in Game 2E, we were able to identify only one observation among the 6 observations identified above, 1-5, in which the receivers’ responses to “&” matched its theoretically predicted and empirically established meaning that the sender is of type s. In this observation, the frequency of C—the ideal action under s—conditional on “&” met our criterion at precisely 50%. Note also that the last-20-round frequencies of R conditional on “$” and of C conditional on “%” showed almost no variation from the corresponding frequencies in the first 20 rounds. Parallel to Observation 2-2 from Game 1E, Observation 1-5 from Game 2E provides an example in which subjects, in an environment without pre-existing common language, behaved consistently with the fully revealing equilibrium, where this time the introduction of a noncredible neologism created no disruption to the fully revealing outcome.39 We further compare Observation 2-2 from Game 1E and Observation 1-5 from Game 2E, discussing the implications of their similarities and differences for the fully revealing outcomes. In both observations, strong distinct meanings were established for the two messages “$” and “%”, and any changes from the first 20 rounds to the last 20 rounds concerning these meanings were to sharpen them even further. Similarly, in both observations the receivers best responded to the endogenous meanings of the two messages, and the frequencies of the best responses showed an upward trend over rounds. However, in Observation 2-2 (Game 1E) the receivers responded to the third message, “&”, mostly with the ex-ante ideal action L, whereas in Observation 1-5 (Game 2E) the receivers responded to “&” (sent exclusively by s) mostly with the ex-post ideal action C. Furthermore, the frequency of “&” in the former observation was more than three times the frequency in the latter. Even though the two observations showed similar uses and interpretations of “$” and “%”, their differences with respect to the uses and interpretations of “&” resulted in, when the neologism was introduced, a drop in the frequency of fully revealing outcome in Observation 2-2 (Game 1E) that was not observed in Observation 1-5 (Game 2E). Table 5 reports the first-20-round and the last-20-round frequencies of fully revealing outcomes 39 We emphasize fully revealing outcome because we did observe disruption to the fully revealing equilibrium: stypes deviated to send “&”, which accounted for the 15% of “&” observed in the last 20 rounds of this observation. Given that more than a majority of the responses to “&” was C, the ideal action under s, this disruption to the fully revealing equilibrium, however, did not result in the same disruption to the fully revealing outcome. More thorough analysis of fully revealing outcomes in both games follows. 28 in all observations from both games. In Observation 2-2 from Game 1E, the frequency was 70% in the first 20 rounds, which dropped to 55% in the last 20 rounds. By contrast, in Observation 1-5 from Game 2E, the frequency was 85% in the first 20 rounds, which increased to 90% in the last 20 rounds.40 The same qualitative differences between the first 20 rounds and the last 20 rounds were observed in the aggregate data, although the differences were not statistically significant. Table 5 reveals that the first-20-round average frequency of fully revealing outcome in Game 1E was 49%, which was higher than the second-20-round average frequency at 45% (p “ 0.194, Wilcoxon signed-rank test); the first-20-round average frequency of fully revealing outcome in Game 2E was 53%, which was lower than the second-20-round average frequency at 56% (p “ 0.127, Wilcoxon signed-rank test).41 Table 5: Frequencies of Fully Revealing Outcomes, Games 1E and 2E Observation 1-1 1-2 1-3 1-4 1-5 1-6 2-1 2-2 2-3 2-4 2-5 2-6 Mean Game 1E First 20 Rounds Last 0.33 0.50 0.55 0.53 0.48 0.53 0.48 0.70 0.35 0.53 0.30 0.58 0.49 20 Rounds 0.23 0.28 0.65 0.45 0.43 0.68 0.30 0.55 0.40 0.68 0.20 0.55 0.45 Game 2E First 20 Rounds Last 0.45 0.28 0.60 0.85 0.85 0.58 0.43 0.40 0.38 0.20 0.75 0.58 0.53 20 Rounds 0.63 0.23 0.53 0.95 0.90 0.63 0.53 0.38 0.63 0.38 0.73 0.28 0.56 Note: The frequency of fully revealing outcome is measured by the frequency at which the receivers took the (ex-post) ideal actions, C under type s and R under type t. We summarize the above findings, which address Hypothesis 4(iii): Finding 4b (Effects of the Evolution of Meanings of Credible and Non-Credible Neologisms). (i) Consistent with the qualitative difference hypothesized in the first part of Hypothesis 4(iii), the average frequency of fully revealing outcome in Game 1E after the introduction of the third message, “&”, was lower than that before the neologism was introduced; the differences were, however, not statistically significant enough to confirm the hypothesis. (ii) The average frequency of fully revealing outcome in Game 2E after the introduction of “&” 40 Other than the introduction of a third message, learning may also have contributed to how often the fully revealing equilibrium was played in the last 20 rounds relative to the first 20 rounds. When learning occurred toward the equilibrium and the third message introduced no disruption, the frequency of fully revealing outcome could be higher in the last 20 rounds as was observed in this observation. 41 The two respective two-sided p-values are 0.388 and 0.255. 29 was higher than that before the neologism was introduced; the differences were, however, not statistically significant, thus confirming the second part of Hypothesis 4(iii). Without confirming Hypothesis 4(iii) for Game 1E, we consider the confirmation for Game 2E to be of limited value. To highlight what we learned from these treatments without the anchor of literal meanings, we instead emphasize that the fully revealing equilibrium and the credible/non-credible neologisms with their evolved meanings were able to predict some observed behavior. 5 Concluding Remarks In this study, we experimentally evaluate the first cheap-talk refinement concept, neologismproofness. In the experimental literature of communication games, behavioral phenomena such as overcommunication are prevalent, and they have been quite successfully explained by behavioral concepts or models such as lying aversion and level-k reasoning. The same behavioral factors may have been driving some of our findings. However, by focusing on the differences among the games in our two sets of treatments, we have largely controlled for these factors, which allows us to focus on the predictive power of neologism-proofness. In our first set of treatments which features literally meaningful messages, we find that whether a fully revealing equilibrium is neologism-proof affects how often it is played, lending support to the predictive power of the refinement concept. We also obtain a finding not covered by the theory: even non-credible neologisms can cause senders to deviate. On the other hand, the presence of credible neologisms resulted in a lower frequency of fully revealing outcome via the receivers’ deviating behavior. In our second set of treatments which features a priori meaningless messages, we find, rather unsurprisingly, that subjects had a harder time playing a fully revealing equilibrium and responding to the incentives behind the neologisms. We nonetheless obtained a few observations that confirm the theoretical predictions, in particular the predicted evolution of meanings of the credible and non-credible neologisms. These conforming observations demonstrate that sophisticated coordinated plays can be achieved by human subjects in the laboratory. In our case, the plays involve the evolution of equilibrium plays and a disruption to the equilibrium as predicted by a refinement concept, all in an environment without the anchor of common language. Our findings shed light on the powers and limitations of neologism-proofness as a predictive theory. While we view it as a remarkable finding that the refinement concept was able to predict behavior in the absence of pre-existing common language, the prediction was confirmed only in selected individual matching groups. By contrast, individual group behavior in the treatments with literal messages conformed to the major prediction of neologism-proofness fairly 30 consistently. From an empirical vantage point, this observed contrast validates the importance of literal meanings in the applications of neologism-proofness. Within the common-language environment, our findings indicate that the receivers were better able to appreciate the senders’ incentives to send credible neologisms than what the senders expected them to be. This suggests that strategic sophistication and the expectations thereon may be a missing element in the current state of the theories and may warrant a place in a more comprehensive, predictive theory of cheap-talk refinements. We hope that our study can help inform future research in this regard. 31 Appendix A – Level-k Analysis Motivated by the over-communication phenomenon well-documented in the literature of experimental cheap-talk games and following the convention in the literature (Crawford, 2003; Cai and Wang, 2006), we specify the following level-k model. The model starts with a credulous L0 receiver and a truthful L0 sender. The sender of the lowest level of sophistication, L0 , simply sends the truthful message all the time. In response to the L0 sender, the L0 receiver trusts the sender and always chooses the ideal action given the belief that is consistent with the literal meaning of a message. The model further assumes that Lkě1 senders best-respond to Lk´1 receivers and Lkě1 receivers best-respond to Lk senders. Tables A1 and A2 report the level-k predictions for Game 1M 3 and Game 2M 3, respectively. Tables A3 reports the level-k predictions for Game 1M 2 and Game 2M 2 where the predictions coincide. Table A1: Level-k Predictions for Game 1M 3 Sender’s Strategy L0 L1 Lkě2 s t “My type is s.” “My type is s.” “I won’t tell you.” “I won’t tell you.” “My type is t.” “I won’t tell you.” “I won’t tell you.” C C C Receiver’s Strategy “My type is t.” “I won’t tell you.” R R R L L L Table A2: Level-k Predictions for Game 2M 3 Sender’s Strategy s L0 L1 Lkě2 “My type is s.” “I won’t tell you.” “My type is s.” or “I won’t tell you.” t “My type is s.” “My type is t.” “My type is t.” “My type is t.” C C C Receiver’s Strategy “My type is t.” “I won’t tell you.” R R R L C C Table A3: Level-k Predictions for Game 1M 2 and Game 2M 2 Sender’s Strategy L0 L1 Lkě2 Receiver’s Strategy “My type is s.” “My type is t.” s t “My type is s.” “My type is s.” “My type is s.” “My type is t.” “My type is t.” “My type is t.” C C C R R R First of all, like neologism-proofness, the model predicts a babbling outcome for Game 1M 3. Second, like neologism-proofness, the model essentially predicts a fully revealing outcome for Game 2M 3. Similarly, the model predicts a fully revealing outcome for Game 1M 2 and Game 2M 2. However, unlike neologism-proofness, the model fails to predict the systematic difference between receivers’ strategies in Game 1M 3 and Game 2M 3. Recall that neologism-proofness predicts that when the neologism becomes credible in Game 1M 3, the receiver would use the 32 pooling strategy in which action L is taken regardless of the message received. Our data indicate that the frequency of the pooling strategy being used by receivers is indeed significantly higher in Game 1M 3 than in Game 2M 3. On the other hand, the specified level-k model organizes the data from Game 2M 3 quite well. In particular, the model predicts that the message “I won’t tell you” was paired with another message “My type is t” to reveal senders’ types effectively, which is indeed observed in our strategy-level data. A few remarks regarding alternative specifications of L0 players are in order. One plausible alternative is to specify that the L0 receiver uniformly randomizes over the three actions regardless of the message. But this alternative specification is somewhat unnatural given our adopted message space in which each message has a clear literal meaning, and secondly the specification essentially generates the same fully revealing outcomes from Game 1M 3 and Game 2M 3, which do not match our data. We believe that the level-k model and the equilibrium notion of neologism proofness should be complements rather than substitutes in explaining the experimental data. Our result indicates that neologism-proofness is crucial to explain the observed treatment effects or the difference among the four treatments. However, the finding does not imply that the level-k model has no power to explain the data. Our simple analysis in this section demonstrates that the level-k model can explain some feature of our data reasonably well. 33 Appendix B – Sample Experimental Instructions Instructions for Game 1M 3 Welcome to the experiment. This experiment studies decision making between two individuals. In the following hour or so, you will participate in 20 rounds of decision making. Please read the instructions below carefully; the cash payment you will receive at the end of the experiment depends on how you make your decisions according to these instructions. Your Role and Decision Group There are 20 participants in today’s session. Prior to the first round, one half of the participants will be randomly assigned the role of Member A and the other half the role of Member B. Your role will remain fixed throughout the experiment. In each round, one Member A and one Member B will be randomly and anonymously paired to form a group, with a total of 10 groups. Regarding how players are matched, the 10 groups are equally divided into 2 classes so that there are 5 groups in each class with 10 participants, 5 Members A and 5 Members B; in each and every round, you will be randomly matched with a participant in the other role in your class. Thus, in a round you will have an equal, 1 in 5 chance of being paired with a participant in the other role in your class. You will not be told the identity of the participant you are matched with, nor will that participant be told your identity—even after the end of the experiment. Your Decision in Each Round Member A’s Decision In each round and for each group, the computer randomly selects, with equal chance, an 1 integer X from 1 to 100. (Each number therefore has probability 100 to be selected.) Figure 5 shows Member A’s decision screen. You as Member A have two kinds of decisions to make: 1. what to tell Member B if it turns out that X ą 50; and 2. what to tell Member B if it turns out that X ď 50. Note that you are making a decision plan (what to do if this happens and what to do if that happens), where you will make these decisions without knowing the actually selected X. For each decision, you can choose from the following three options: 1) “X is bigger 34 Figure 5: Member A’s decision Figure 6: Member B’s decision than 50”; 2) “X is smaller than or equal to 50”; and 3) “I won’t tell you”. Your choices for different decisions could be different or the same. Your decision for the round is completed after the two choices, which will then enter into the determination of rewards at a later stage of the round. Member B’s Decision 35 Figure 6 shows Member B’s decision screen. You as Member B have three kinds of decisions to make: 1. what action to take if Member A says “X is bigger than 50”; 2. what action to take if Member A says “X is smaller than or equal to 50”; and 3. what action to take if Member A says “I won’t tell you”. Note that you are making a decision plan (what to do if this happens and what to do if that happens and etc.), where you will make these decisions without knowing what Member A actually says. For each decision, you can choose from the following three options: 1) action L; 2) action C; and 3) action R. Your choices for different decisions could be different or the same. Your decision for the round is completed after the three choices, which will then enter into the determination of rewards at a later stage of the round. Your Reward in Each Round After Member A’s and Member B’s decisions, the computer will proceed to draw the integer X randomly from 1 to 100. Then the realization of X will be revealed to Member A and Member B. Member A’s and Member B’s rewards will be determined according to the reward table in Figures 5 and 6 and what choices they have made. In each cell of the reward table, the first number represents the reward in HKD to Member A and the second number the reward in HKD to Member B. (The relevant numbers for each role are highlighted in Blue.) The reward procedure, in which the computer draw determines the relevant row of the reward table and Member B’s action the relevant column, may best be illustrated with an example. Consider the following choices made by the two members. Member A chooses to say 1. “X is bigger than 50” if X ą 50; 2. “I won’t tell you” if X ď 50. Member B chooses to take 1. action L if Member A says “X is bigger than 50”; 2. action C if Member A says “X is smaller than or equal to 50”; 3. action R if Member A says “I won’t tell you”. 36 Suppose the computer randomly draws X “ 27. According to the choices, Member A says “I won’t tell you” and Member B takes action R. Given that X ď 50, Member A will receive 20 HKD and Member B will receive 30 HKD. On the other hand, if the computer draws X “ 79, according to the choices Member A says “X is bigger than 50” and Member B takes action L. Given that X ą 50, Member A will receive 30 HKD and Member B will receive 20 HKD. Prediction Reward by Member A. If you are Member A, you have an opportunity to earn an extra reward. After the computer draws X, you will be asked to predict what Member B will take after listening to what you choose to say to him/her for the relevant range of X. If your prediction is correct, you will receive an extra 2 HKD. Information Feedback During the course of a round, you will be informed about the drawn X, what Member A chooses to say for the relevant range of X, and what Member B chooses to take for what Member A chooses to say. (The information provided does not include the whole decision plan of the members.) Your Cash Payment The experimenter randomly selects 2 rounds out of 20 to calculate your cash payment. (So it is in your best interest to take each round seriously.) Your total cash payment at the end of the experiment will be the sum of HKD you earned in the 2 selected rounds plus a 30 HKD show-up fee. Quiz and Practice To ensure your comprehension of the instructions, we will provide you with a quiz and a practice round. We will go through the quiz after you answer it on your own. You will then participate in 1 practice round. At the beginning of the practice round, you will be randomly assigned the role of either Member A or Member B. Your role in the official rounds is the same as that in the practice round. Once the practice round is over, the computer will tell you “The official rounds begin now!” Adminstration Your decisions as well as your monetary payment will be kept confidential. Remember that you have to make your decisions entirely on your own; please do not discuss your decisions with any other participants. 37 Upon finishing the experiment, you will receive your cash payment. You will be asked to sign your name to acknowledge your receipt of the payment (which will not be used for tax purposes). You are then free to leave. If you have any question, please raise your hand now. We will answer your question individually. If there is no question, we will proceed to the quiz. Quiz 1. True or False: I will remain as a Member A or Member B in all 20 rounds of decisionmaking. Circle one: True / False 2. True or False: I will be matched with the same player in the other role in all 20 rounds. Circle one: True / False 3. True or False: My decisions involve making plans for different scenarios but not specific choice for single scenario. Circle one: True / False 4. True or False: At the end of the experiment, I will be paid my earnings in HKD from two randomly chosen rounds in addition to 30 HKD show-up fee. Circle one: True / False. 38 Instructions for Game 1E INSTRUCTION Welcome to the experiment. This experiment studies decision making between two individuals. In the following hour or so, you will participate in 40 rounds of decision making. Please read the instructions below carefully; the cash payment you will receive at the end of the experiment depends on how you make your decisions according to these instructions. Your Role and Decision Group There are 20 participants in today’s session. At the beginning of the experiment, one half of the participants will be randomly assigned the role of Member A and the other half the role of Member B. Your role will remain fixed throughout the experiment. In each round, one Member A and one Member B will be randomly and anonymously paired to form a group, with a total of 10 groups. Regarding how participants are matched, the 10 groups are equally divided into 5 classes so that there are 2 groups in each class with 4 participants, 2 Member As and 2 Member Bs; in each and every round, you will be randomly matched with a participant in the other role in your class. Thus, in a round you will have an equal, 1 in 2 chance of being paired with a participant in the other role in your class. You will not be told the identity of the participant you are matched with, nor will that participant be told your identity—even after the end of the experiment. Your Decision in Each of Round 1-20 In each round and for each group, the computer randomly selects, with equal chance, an 1 integer X from 1 to 100. (Each number therefore has probability 100 to be selected.) The computer will reveal (only) to Member A whether the selected X is bigger than 50 (X ą 50) or it is smaller than or equal to 50 (X ď 50). Member B, without knowing the range of the selected X, needs to make an action choice. Member A’s Decision Figure 7 shows Member A’s decision screen. After being informed of the range of the selected X (whether X ą 50 or X ď 50), you as Member A have to decide what message to send to your paired Member B. You will be prompted to enter your choice of message by clicking one of the two buttons ‘%’ or ‘$’. Note that the relative positions of the two message buttons are randomly determined in each round, so the positions of the buttons ‘%’ and ‘$’ in a round may or may 39 Figure 7: Member A’s Decision not be the same as those in the previous or later round.42 Once you click one of the buttons, your decision in the round is completed and the chosen message will be transmitted to the paired Member B. Member B’s Decision Figure 8 shows Member B’s decision screen. You as Member B will see the message sent by your paired Member A. After that, you will be prompted to enter your choice of action by clicking one of the three buttons ‘L’, ‘C’, or ‘R’. Your decision for the round is then completed. Your Reward in Each Round After Member A’s and Member B’s decisions, the realization of X and Member B’s choice of action will be revealed to both Member A and Member B. Member A’s and Member B’s rewards will be determined according to the reward table in Figures 5 and 6 and the choices they have made. In each cell of the reward table, the first number represents the reward in HKD to Member A and the second number the reward in HKD to Member B. (The relevant numbers for each role are highlighted in Blue.) 42 We also have two sets of instructions for today’s session, one in which the two symbols are stated in the order ‘%’ followed by ‘$’ and the other in the order ‘$’ followed by ‘%’. We randomly distribute the instructions so that half of the participants have their instructions adopting one order and the other half have their instructions adopting the other order. This means that another participant in your group may or may not have the same symbol order in his/her instructions. 40 Figure 8: Member B’s Decision Information Feedback At the end of a round, you will be informed about the selected X, Member A’s choice of message, Member B choice of action, and your reward for the round. Your Decision in Each of Round 21-40 After the 20th round, your screen will provide further instructions for your decisions in Round 21-40. Please read the instructions carefully before you start the 21st round. You will have an opportunity to ask questions if anything is unclear about the new instructions. Your Cash Payment The experimenter will randomly select 3 rounds out of the 40 to calculate your cash payment. (So it is in your best interest to take each round seriously.) Your total cash payment at the end of the experiment will be the sum of HKD you earned in the 3 selected rounds plus a 40 HKD show-up fee. Quiz and Practice To ensure your comprehension of the instructions, we will provide you with a quiz and a practice round. We will go through the quiz after you answer it on your own. You will then participate in 1 practice round. At the beginning of the practice round, you will be randomly 41 assigned the role of either Member A or Member B. Your role in the official rounds is the same as that in the practice round. Once the practice round is over, the computer will tell you “The official rounds begin now!” Adminstration Your decisions as well as your monetary payment will be kept confidential. Remember that you have to make your decisions entirely on your own; please do not discuss your decisions with any other participants. Upon finishing the experiment, you will receive your cash payment. You will be asked to sign your name to acknowledge your receipt of the payment (which will not be used for tax purposes). You are then free to leave. If you have any question, please raise your hand now. We will answer your question individually. If there is no question, we will proceed to the quiz now. Quiz 1. True or False: I will remain as a Member A or Member B in all 40 rounds of decisionmaking. Circle one: True / False 2. True or False: I will be matched with the same player in the other role in all 40 rounds. Circle one: True / False 3. True or False: Member A, but not Member B, will be informed of the randomly selected X. Circle one: True / False. 4. True or False: Member A will be responsible for taking action, while Member B will be sending message. Circle one: True / False 42 References [1] Austen-Smith, David. 1990. “Information Transmission in Debate.” American Journal of Political Science, 34: 124-152. [2] Banks, Jeffrey S., Colin F. Camerer and David Porter. 1994. “Experimental Tests of Nash Refinements in Signaling Games.” Games and Economic Behavior, 6: 1-31. [3] Blume, Andreas, Yong-Gwan Kim and Joel Sobel. 1993. “Evolutionary Stability in Games of Communication.” Games and Economic Behavior, 5: 547-575. [4] Blume, Andreas, Douglas V. Dejong, and Geoffrey B. Sprinkle. 2008. “‘The Effect of Message Space Size on Learning and Outcomes in Sender-Receiver Games.” Handbook of Experimental Economics Results, Elsevier. [5] Blume, Andreas, Douglas V. Dejong, Yong-Gwan Kim, and Geoffrey B. Sprinkle. 1998. “Experimental Evidence on the Evolution of the Meaning of Messages in Sender-Recevier Games.” American Economic Review, 88: 1323-1340. [6] Blume, Andreas, Douglas V. Dejong, Yong-Gwan Kim, and Geoffrey B. Sprinkle. 2001. “Evolution of Communication with Partial Common Interest.” Games and Economic Behavior, 37: 79-120. [7] Brandts, Jordi and Charles A. Holt. 1992. “An Experimental Test of Equilibrium Dominance in Signaling Games.” American Economic Review, 82: 1350-1365. [8] Cai, Hongbin, and Joseph Tao-Yi Wang. 2006. “Overcommunication in Strategic Information Transmission Games.” Games and Economic Behavior, 56: 7-36. [9] Chen, Ying, Navin Kartik, Joel Sobel. 2008. “Selecting Cheap-Talk Equilibria.” Econometrica, 76: 117-136. [10] Cho, In-Koo and Kreps, David M. 1987. “Signaling Games and Stable Equilibria.” Quarterly Journal of Economics, 102: 179-221. [11] Crawford, Vincent. 1998. “A Survey of Experiments on Communication via Cheap Talk.” Journal of Economic Theory, 78: 286-298. [12] Crawford, Vincent. 2003. “Lying for strategic advantage: Rational and boundedly rational misrepresentation of intentions.” American Economic Review, 93: 133-149. [13] Crawford, Vincent P. and Joel Sobel. 1982. “Strategic Information Transmission.” Econometrica, 50: 1431-1451. 43 [14] De Groot Ruiz, Adrian, Theo Offerman, and Sander Onderstal. 2015. “Equilibrium Selection in Experimental Cheap Talk Games.” Games and Economic Behavior, 91: 14-25. [15] De Groot Ruiz, Adrian, Theo Offerman, and Sander Onderstal. 2014. “For Those about to Talk We Salute You: An Experimental Study of Credible Deviations and ACDC.” Experimental Economics, 17: 173-199. [16] Dickhaut, John W., Kevin A. McCabe, and Arijit Mukherji. 1995. “An Experimental Study of Strategic Information Transmission.” Economic Theory, 6: 389-403. [17] Duffy, John, Ernest K. Lai, and Wooyoung Lim. 2015. “Coordination via Correlation: An Experimental Study.” Mimeo. [18] Eső, Péter and James Schummer. 2009. “Credible Deivations from Signaling Equilibria.” International Journal of Game Theory, 38: 411-430. [19] Farrell, Joseph. 1993. “Meaning and Credibility in Cheap Talk Games.” Games and Economics Behavior, 5: 514-531. [20] Farrell, Joseph and Robert Gibbons. 1988. “Cheap Talk, Neologisms, and Bargaining.” working paper, Berkeley and MIT. [21] Farrell, Joseph and Robert Gibbons. 1989. “Cheap Talk Can Matter in Bargaining.” Journal of Economic Theory, 48: 221-237. [22] Farrell, Joseph and Matthew Rabin. 1996. “Cheap Talk.” Journal of Economic Perspectives, 10: 103-118. [23] Gertner, Robert, Robert Gibbons, David Scharfstein. 1988. “Simultaneous Signalling to the Capital and Product Markets.” The RAND Journal of Economics, 19: 173-190. [24] Gneezy, Uri. 2005. “Deception: The Role of Consequences.” American Economic Review, 95(1): 384-394. [25] Grossman, Sanford J. and Motty Perry, 1986. “Perfect Sequential Equilibrium.” Journal of Economic Theory, 39: 97-119. [26] Hurkens, Sjaak, and Navin Kartik, 2009. “Would I Lie to You? On Social Preferences and Lying Aversion.” Experimental Economics, 12: 180-192. [27] Kawagoe, Toshiji, and Hirokazu Takizawa, 2008. “Equilibrium Refinement vs. Level-k Analysis: An Experimental Study of Cheap-Talk Games with Private Information.” Games and Economics Behavior, 66: 238-255. 44 [28] Kim, Kyungmin and Philipp Kircher, 2015. “Efficient Competition through Cheap Talk: The Case of Competing Auctions.” Econometrica, forthcoming. [29] Lai, Ernest K., Wooyoung Lim, and Joseph Tao-yi Wang, 2015. “An Experimental Analysis of Multidimensional Cheap Talk.” Games and Economic Behavior, 91: 114-144. [30] Lim, Wooyoung, 2014. “Communication in Bargaining over Decision Rights.” Games and Economic Behavior, 85: 159-179. [31] Matthews, Steven, Masahiro Okuno-Fujiwara, Andrew Postlewaite, 1991. “Refining CheapTalk Equilibria.” Journal of Economic Thoery, 55: 247-273. [32] Rabin, Matthew. 1990. “Communication between Rational Agent.” Journal of Economic Theory, 51: 144-170. [33] Rabin, Matthew and Joel Sobel. 1996. “Deviations, Dynamics and Equilibrium Refinements.” Journal of Economic Theory, 68: 1-25. [34] Sánchez-Pagés, Santiago, and Marc Vorsatz. 2007. “An Experimental Study of Truth-Telling in Sender-Receiver Game.” Games and Economic Behavior, 61: 86-112. [35] Serra-Garcia, Marta, Eric van Damme, and Jan Potters. 2013. “Lying About What You Know or About What You Do?” Journal of the European Economic Association, 11(5), 1204-1229. [36] Sobel, Joel. 2013. “Ten Possible Experiments on Communication and Deception.” Journal of Economic Behavior and Organization, 93: 408-413. [37] Sopher Barry and Iñigo Zapater. 2000. “Communication and Coordination in Signalling Games: An Experimental Study.” Mimeo. [38] Wang, Joseph Tao-Yi, Michael Spezio, and Colin F. Camerer. 2010. “Pinocchio’s Pupil: Using Eyetracking and Pupil Dilation to Understand Truth Telling and Deception in SenderReceiver Games.” American Economic Review, 100: 984-1007. 45
© Copyright 2026 Paperzz