Agent-based model for the origins of scaling in human language

Javier Vera∗1 and Felipe Urbina†2

1,2 Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Avda. Diagonal Las Torres 2640, Peñalolén, Santiago, Chile
2 UAI Physics Center, Universidad Adolfo Ibáñez, Santiago, Chile

∗ (corresponding author) e-mail: [email protected]; telephone/fax number: +56 2 2331 1000
† [email protected]

Abstract

• Background/Introduction: Zipf's law states that if the words of a (large) text are ordered by decreasing frequency, the frequency versus the rank decays as a power law with exponent close to -1. Previous work has stressed that this pattern arises from a conflict between the interests of the participants in communication: speakers and hearers.

• Methods: The challenge here is to define a computational language game on a population of agents, in which the games are mainly based on a parameter that measures the participants' relative interests.

• Results: Numerical simulations suggest that at critical values of the parameter a human-like vocabulary, exhibiting scaling properties, appears.

• Conclusions: The appearance of an intermediate distribution of frequencies at some critical values of the parameter suggests that, in a population of artificial agents, the emergence of scaling partly arises as a self-organized process driven only by local interactions between agents.

Keywords: Language Games, Vocabularies, Naming Game, Zipf's Law

1 Introduction

Can artificial populations of agents develop vocabularies satisfying Zipf's law? This question, based on an earlier version proposed in [1], concerns the assumptions and the minimal interaction rules that allow agents to self-organize a language from scratch, a language in which, if the words are ordered by decreasing frequency, the frequency of the k-th word, P(k), decays as the power law $P(k) \sim k^{-\alpha}$, where α ≈ 1 [2].

One possible origin of this law is the conflict between the simultaneous interests of both speakers and hearers. At the lexical level of language organization, [3] stressed that each schematic conversational role faces a trade-off of lexical interests. The speaker will thus tend to choose the most frequent words, whose frequency is positively correlated with their ambiguity, understood as a higher number of meanings [3]. Put differently, in an idealized scenario the speaker will prefer to transmit the same unique word at each interaction. This behavior is opposed to the requirements of the hearer, who needs to minimize, given the ambiguity of the transmitted word, the effort of understanding. For the hearer, the preferred vocabulary is therefore a one-to-one word-meaning mapping. At an intermediate level of the participants' lexical interests, [3] described, within the framework of information theory, the drastic appearance of scaling, as expressed by Zipf's law, in the organization between words and meanings.

The main aim here is to describe the dynamics of a language game [4] on a population of agents that behave according to different levels of lexical interest in word ambiguity. Moreover, the hypothesis is that at some intermediate level of the participants' interests (as shown in [3]) agents will share a word-meaning mapping exhibiting some scaling properties.
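As an aside on how such scaling is measured in practice (and later in Fig. 2), the exponent α can be estimated by a least-squares fit of log P(k) against log k. The following minimal sketch illustrates this; the function name and the toy input are hypothetical and not part of the model:

```python
import collections
import math

def zipf_exponent(tokens):
    """Estimate alpha in P(k) ~ k**(-alpha) by least squares in log-log scale."""
    # Ranked relative frequencies: most frequent word first.
    counts = sorted(collections.Counter(tokens).values(), reverse=True)
    total = sum(counts)
    xs = [math.log(k) for k in range(1, len(counts) + 1)]
    ys = [math.log(c / total) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # alpha close to 1 signals a Zipf-like vocabulary

print(zipf_exponent("the quick fox sees the dog and the cat sees the fox".split()))
```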
The focus of this paper is a distributed solution in which agents collectively reach shared communication systems without any kind of central control influencing the formation of language, and only through local conversations between a few participants [5, 6, 7, 8]. The model proposed here is based on a prototypical agent-based model for computational studies of language formation, the naming game [5, 8, 7, 4], which considers a finite population of agents, each one endowed with a memory to store, in principle, an unlimited number of words. At each discrete time step, a pair of agents, one speaker and one hearer, negotiates words as hypotheses for naming one object. Under the typical dynamics of the naming game, after a finite amount of time the population will share a unique word for referring to the object.

This paper is organized as follows. Section "Methods" introduces the model of agent behavior and the quantitative measures used in the numerical simulations. Section "Results" describes how the consensus dynamics is strongly influenced by the participants' relative efforts. Finally, the "Conclusion" section presents consequences for language evolution and future work.

2 Methods

2.1 Basic model

The language game is played by a finite population of agents P = {1, ..., p}, sharing a set of words {1, ..., n} and a set of meanings {1, ..., m}. Each agent k ∈ P is associated with an n × m lexical matrix $L^k = (l^k_{ij})$, where $l^k_{ij} = 1$ if the i-th word is related to the j-th meaning, and $l^k_{ij} = 0$ otherwise. More generally, lexical matrices can be understood in terms of language networks [1]. Next, two technical terms are introduced. Consider one agent k ∈ P, a word w and a meaning m.

Definition 1 (known word) The agent k knows the word w if there is at least one meaning m such that $l^k_{wm} = 1$. Analogously, k knows the association between w and m if $l^k_{wm} = 1$.

Definition 2 (ambiguity) The ambiguity of the word w is defined as the sum of $l^k_{wj}$ over j ∈ {1, ..., m}. More precisely, the ambiguity of w is $\sum_j l^k_{wj}$.

The purpose of the game is twofold: the development of (i) a common vocabulary (a lexical matrix shared by the entire population), which exhibits (ii) scaling relations, as expressed by Zipf's law.

Basic interaction rules

The basic interaction rules therefore read:

(step 1) at each discrete time step, a pair of agents is selected uniformly at random: one plays the role of speaker and the other plays the role of hearer;

(step 2) first, the speaker chooses uniformly at random a topic of the conversation, assuming meaning transfer [9]: when a word is transmitted, the hearer knows what the associated meaning is. The speaker then selects one column (meaning) m∗ ∈ {1, ..., m}. Next, the speaker calculates a word associated with m∗, denoted w∗, and transmits it to the hearer. This calculation is addressed here solely on the basis of the speaker's lexical interest;

(step 3) finally, the hearer behaves as in the naming game. If the hearer does not know the association between w∗ and m∗, it establishes a repair strategy (in order to increase the chance of future agreements). Otherwise, mutual agreement implies alignment strategies [8]. More precisely, (i) if the hearer knows w∗, both speaker and hearer cancel all entries of the m∗-th column of their lexical matrices, except the row (word) w∗; (ii) otherwise, the hearer establishes a simple repair strategy: it adds 1 to the entry (w∗, m∗) of its lexical matrix.

In this paper, three strategies arising from the basic interaction rules are proposed; a sketch of one basic game round is given below.
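Before detailing the strategies, here is a minimal sketch of one speaker-hearer round, with the speaker's word-selection rule left abstract. The helper names are hypothetical; agreement is tested on the transmitted association w∗-m∗ (step 3 also phrases success as "the hearer knows w∗", and this sketch adopts the association reading):

```python
import random

P, N, M = 64, 64, 64  # population size, words, meanings (values of Section 2.5)

def random_lexicon():
    # One initial condition: each entry is 0 or 1 with probability 0.5.
    return [[random.randint(0, 1) for _ in range(M)] for _ in range(N)]

population = [random_lexicon() for _ in range(P)]

def game_round(choose_word):
    """One speaker-hearer interaction, following (steps 1-3).

    `choose_word(lexicon, m_star)` encodes the speaker's strategy and is
    deliberately left abstract; the strategies of Sections 2.2-2.4 fill it in.
    """
    speaker, hearer = random.sample(population, 2)   # (step 1): random pair
    m_star = random.randrange(M)                     # (step 2): random topic
    w_star = choose_word(speaker, m_star)            # (step 2): word choice
    if hearer[w_star][m_star] == 1:                  # (step 3): agreement
        # Alignment: clear column m* in both lexica, except row w*.
        for lex in (speaker, hearer):
            for w in range(N):
                lex[w][m_star] = 1 if w == w_star else 0
    else:
        # Repair: the hearer adopts the transmitted association.
        hearer[w_star][m_star] = 1
```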
The first strategy focuses on the maximization of the speaker's interest: speakers will thus prefer to transmit the most ambiguous words. This is opposed to the second strategy, which involves the minimization of the speaker's interest (or equivalently, the maximization of the hearer's interest): speakers will therefore prefer to transmit the least ambiguous words. Finally, the general case, involving the relative interests of both participants, is presented.

2.2 Speaker's interest maximization

What would be the minimal adaptation of the basic interaction rules that allows the game to focus on the speaker's interest? How can the population reach agreement on a vocabulary while the speaker's interest is maximized? One simple solution consists in defining the way speakers calculate the most ambiguous word. With this in mind, a new version of (step 2) is proposed:

(step 2S) (i) if the speaker does not know a word to transmit for m∗, it randomly chooses one word w∗ ∈ {1, ..., n}; (ii) otherwise, the speaker calculates w∗ as the most ambiguous word (among the words associated with m∗): w∗ is simply the word with the largest number of meanings.

2.2.1 Example: the most ambiguous word

At some time step, consider the following scenario: (i) the topic of the interaction is the meaning (column) m∗ = 2; and (ii) the speaker k ∈ P has the lexical matrix

$$L^k = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

Therefore, the speaker calculates the most ambiguous word (row) w∗ as

$$w^* = \operatorname*{argmax}_{\{w \,:\, l^k_{wm^*} \neq 0\}} \sum_{j=1}^{m} l^k_{wj} = \operatorname*{argmax}_{w \in \{2,3\}} \sum_{j=1}^{m} l^k_{wj} = 3$$

and transmits it to the hearer.

2.3 Speaker's interest minimization

What kind of language strategies do hearers need in order to focus on their own interests? Hearers want to minimize the effort of understanding and therefore tend to prefer the least ambiguous words, which is opposed to the speaker's interest. A second version of (step 2) is proposed:

(step 2H) (i) if the speaker does not know a word to transmit for m∗, it randomly chooses one word w∗ ∈ {1, ..., n}; (ii) otherwise, the speaker calculates w∗ as the least ambiguous word (among the words associated with m∗): w∗ is now the word with the lowest number of meanings.

2.3.1 Example: the least ambiguous word

As in the previous example, the topic of the interaction is the meaning (column) m∗ = 2, and the speaker's lexical matrix is $L^k$. Therefore, the speaker calculates the least ambiguous word (row) w∗ as

$$w^* = \operatorname*{argmin}_{\{w \,:\, l^k_{wm^*} \neq 0\}} \sum_{j=1}^{m} l^k_{wj} = \operatorname*{argmin}_{w \in \{2,3\}} \sum_{j=1}^{m} l^k_{wj} = 2$$

and transmits it to the hearer.

2.4 Relative interests of speakers and hearers

What would be the minimal adaptation of the basic interaction rules that includes the interests of both speakers and hearers at the same time? In order to define relative interests, one feasible solution is that speakers prefer to transmit words of a relative ambiguity, defined by a simple relationship between (step 2S) and (step 2H). The solution consists in (step 2R):

(step 2R) (i) if the speaker does not know a word to transmit for m∗, it randomly chooses one word w∗ ∈ {1, ..., n}; (ii) otherwise, the speaker calculates w∗ according to the ambiguity parameter ℘ ∈ [0, 1]. Let r ∈ [0, 1] be a random number. Then,

• if r > ℘, the speaker calculates w∗ as the least ambiguous word (as in rule 2H);

• otherwise, the speaker calculates w∗ as the most ambiguous word (as in rule 2S).

Notice that for ℘ = 0 and ℘ = 1 agents play (step 2H) and (step 2S), respectively; a sketch of this selection rule in code follows.
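A minimal sketch of (step 2R), which contains (2S) and (2H) as the extreme cases ℘ = 1 and ℘ = 0. Indexing is 0-based, and ties between equally ambiguous words, which the rules leave unspecified, are broken at random here as an assumption:

```python
import random

def ambiguity(lexicon, w):
    # Definition 2: the number of meanings associated with word w.
    return sum(lexicon[w])

def choose_word(lexicon, m_star, p_amb):
    """(step 2R): select a word for meaning m_star; p_amb is the parameter ℘."""
    n = len(lexicon)
    candidates = [w for w in range(n) if lexicon[w][m_star] == 1]
    if not candidates:
        # (i) the speaker knows no word for m*: choose uniformly at random.
        return random.randrange(n)
    # (ii) r <= ℘: most ambiguous word (2S); r > ℘: least ambiguous word (2H).
    pick = max if random.random() <= p_amb else min
    target = pick(ambiguity(lexicon, w) for w in candidates)
    return random.choice([w for w in candidates if ambiguity(lexicon, w) == target])

# The worked examples above, with rows/columns shifted to 0-based indices:
lex = [[0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
print(choose_word(lex, 1, 1.0))  # index 2, i.e. word 3 (most ambiguous)
print(choose_word(lex, 1, 0.0))  # index 1, i.e. word 2 (least ambiguous)
```

This plugs into the earlier `game_round` sketch by fixing ℘, e.g. `game_round(lambda lex, m: choose_word(lex, m, 0.5))`.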
For intermediate values ℘ ∈ (0, 1), agents face relative lexical interests while playing the role of speaker or hearer.

2.4.1 Example: relative interests

For (step 2R), the speaker calculates w∗ = 2 with probability 1 − ℘, or w∗ = 3 with probability ℘.

2.5 Measures

To explicitly describe the consensus dynamics under different participants' lexical interests, three measures are defined: the amount of global agreement of the population, D(t), defined as the normalized distance to the average lexical matrix,

Distance

$$D(t) = \frac{1}{mnp} \sum_{k=1}^{p} \left| \left\{ (i,j) : l^k_{ij} \neq \bar{l}_{ij} \right\} \right|$$

where | · | denotes cardinality and $\bar{l}_{ij}$ is the association between the word i and the meaning j in the average matrix $\bar{L}$; the size of the effective vocabulary [3],

Effective vocabulary

$$V(t) = \frac{1}{np} \sum_{k=1}^{p} \left| \left\{ i : \sum_{j=1}^{m} l^k_{ij} > 0 \right\} \right|$$

where $\sum_{j=1}^{m} l^k_{ij} > 0$ means that the i-th word of the lexical matrix $L^k$ is being occupied; and the energy-like function $E_{KL}$,

Energy-like function

$$E_{KL}(\wp) = d(P(\wp), P(0)) + d(P(\wp), P(1))$$

where d is the symmetric distance defined by the Kullback-Leibler divergence KL [10]: $d(P(\wp), P(0)) = KL(P(\wp), P(0)) + KL(P(0), P(\wp))$. Here, P(℘) denotes the decreasing distribution of meaning frequencies for the parameter ℘. In order to define the probability distribution P(℘), two properties are imposed on the ranked frequencies $p^{\wp}_i$: (i) $\sum_{i=1}^{n} p^{\wp}_i = 1$; and (ii) $p^{\wp}_i > 0.001$ for all i ∈ {1, ..., n}.

For V(t) and D(t), the focus here is on the values after 2p × 10^4 speaker-hearer interactions, ⟨V⟩ and ⟨D⟩, which average over 10 initial conditions and the last 2 × 10^3 steps. One initial condition supposes that each lexical matrix entry is 0 or 1 with probability 0.5. For these measures, the parameter ℘ is varied from 0 to 1 with an increment of 3%. For E_KL(℘), the average value over 10 initial conditions is reported, after 2p × 10^4 speaker-hearer interactions. For this measure, ℘ is varied from 0 to 1 with an increment of 1%. On a population of size p = 64, each agent is endowed with a lexical matrix formed by n = 64 words and m = 64 meanings.

3 Results

3.1 Speaker's interest optimization: (step 2S) and (step 2H)

First of all, for each value of ℘, ⟨D⟩ ≈ 0, as shown in the embedded plot of Fig. 1. With this in mind, the negotiation dynamics defined by (step 2S) and (step 2H) develops simpler vocabularies. As shown in Fig. 1, since ⟨V⟩ ≈ 0 the language game under (step 2S) leads to a vocabulary in which only one word is being used (for a schematic representation, see Fig. 3 (right)). By contrast, the dynamics under (step 2H) develops a vocabulary close to a one-to-one mapping between words and meanings, since ⟨V⟩ ≈ 0.9 (see Fig. 3 (left)).

3.2 Relative interests of speakers and hearers: (step 2R)

Several aspects are remarkable for the relative interests of speakers and hearers, as shown in Fig. 1 and Fig. 3 (center). Intermediate values of the parameter ℘ ∈ (0, 1) lead to drastic changes between the two idealized communication systems preferred respectively by speakers and hearers. Around the critical parameter ℘∗ ≈ 0.5, the dynamics establishes three phases in the behavior of ⟨V⟩ versus ℘. First, for ℘ < 0.4 the size of the effective vocabulary decreases slowly from the value ⟨V⟩ ≈ 0.9. Next, for ℘ ∈ (0.4, 0.6) a drastic decrease of ⟨V⟩ is found. Finally, for ℘ > 0.6 there is a slow decrease towards the value ⟨V⟩ ≈ 0.

One of the most interesting results is summarized by Fig. 2 (top): around the critical parameter ℘∗, the energy-like function E_KL is minimized.
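As a rough sketch of how E_KL can be evaluated from a simulation: the construction below pools meaning counts per word over the whole population, floors the ranked frequencies at 0.001 and renormalizes. The text only imposes conditions (i) and (ii) of Section 2.5, so this particular construction of P(℘) is an assumption:

```python
import math

def ranked_distribution(population, eps=0.001):
    """Ranked frequencies P(℘): meanings per word, pooled over the population,
    sorted decreasingly, floored at eps (condition (ii)) so KL stays finite,
    and renormalized (condition (i))."""
    n, m = len(population[0]), len(population[0][0])
    freq = [sum(lex[w][j] for lex in population for j in range(m)) for w in range(n)]
    freq.sort(reverse=True)
    total = sum(freq) or 1
    probs = [max(f / total, eps) for f in freq]
    z = sum(probs)
    return [q / z for q in probs]

def kl(p, q):
    # Kullback-Leibler divergence; both arguments are strictly positive here.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def energy(p_mid, p0, p1):
    # E_KL(℘) = d(P(℘), P(0)) + d(P(℘), P(1)), with d the symmetrized KL [10].
    d = lambda a, b: kl(a, b) + kl(b, a)
    return d(p_mid, p0) + d(p_mid, p1)
```

E_KL(℘) then follows as `energy(ranked_distribution(pop_p), ranked_distribution(pop_0), ranked_distribution(pop_1))` for populations evolved at ℘, 0 and 1.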
The distribution of meaning frequencies at ℘∗ thus seems to correspond to an intermediate communication system, sharing features with both idealized vocabularies emerging from (step 2S) and (step 2H) (see Fig. 3 (center)).

4 Conclusion

This work summarizes a decentralized agent-based approach to the origins of scaling properties in human language. The paper describes in particular the influence of a parameter that measures the agents' lexical interests during the language game dynamics. The appearance of an intermediate distribution of frequencies at some critical values of the parameter suggests that, in a population of artificial agents, the emergence of scaling partly arises as a self-organized process driven only by local interactions between agents endowed with intermediate levels of lexical interest (for stronger evidence of scaling, see Fig. 2 (bottom)). In some sense, if cooperation is understood as the capacity of selfish agents to forgo some of their potential to help one another [11], the emergence of scaling is crucially influenced by the cooperation between agents.

Many extensions of the proposed model should be studied in order to increase the complexity of the language emergence task. A first natural extension is to develop more extensive computational simulations involving large populations of agents. A second extension should describe other ways to define intermediate agents' interests.

Acknowledgments

The authors thank Fondequip AIC-34.

Figure 1: ⟨V⟩ and ⟨D⟩ versus ℘. On a population of p = 64 agents, each one endowed with a 64 × 64 lexical matrix, the measures ⟨V⟩ and ⟨D⟩ (small embedded plot) versus ℘ are shown. Vertical bars indicate the standard deviation of the data.

References

[1] Solé RV, Corominas-Murtra B, Valverde S, Steels L. Language networks: Their structure, function, and evolution. Complexity. 2010;15(6):20–26.

[2] Zipf G. Human Behaviour and the Principle of Least-Effort. Cambridge, MA: Addison-Wesley; 1949.

[3] Ferrer-i-Cancho R, Solé RV. Least Effort and the Origins of Scaling in Human Language. Proceedings of the National Academy of Sciences (USA). 2003;100:788–791.

[4] Loreto V, Baronchelli A, Mukherjee A, Puglisi A, Tria F. Statistical physics of language dynamics. Journal of Statistical Mechanics: Theory and Experiment. 2011;2011(04):P04006.

[5] Steels L. A Self-Organizing Spatial Vocabulary. Artificial Life. 1995;2(3):319–332.

[6] Steels L. Self-organizing vocabularies. In: Proceedings of Artificial Life V, Nara, Japan; 1996. p. 179–184.

[7] Baronchelli A, Felici M, Caglioti E, Loreto V, Steels L. Sharp Transition towards Shared Vocabularies in Multi-Agent Systems. J Stat Mech. 2006;(P06014).

[8] Steels L. Modeling the cultural evolution of language. Physics of Life Reviews. 2011;8(4):339–356.

[9] De Beule J, De Vylder B, Belpaeme T. A cross-situational learning algorithm for damping homonymy in the guessing game. In: Rocha LM, Yaeger LS, Bedau MA, Floreano D, Goldstone RL, Vespignani A, editors. Artificial Life X: Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems. International Society for Artificial Life. The MIT Press (Bradford Books); 2006. p. 466–472.

[10] Bishop CM. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer; 2006.

[11] Nowak M. Five Rules for the Evolution of Cooperation.
Science. 2006;314(5805):1560–1563.

Figure 2: Appearance of an intermediate frequency distribution in vocabularies. (top) On a population of p = 64 agents, each one endowed with a 64 × 64 lexical matrix, the energy-like function E_KL versus ℘ is shown. The minimization of E_KL occurs at ℘ ≈ 0.55. (bottom) P(k) versus k, for ℘ = 0.3, ℘ = 0.8 and the value of ℘ that gives the power-law exponent closest to −1 (℘∗ = 0.52, fitted slope α∗ = −1.08). The plot shows the distribution of the number of meanings associated with the k-ranked word of the effective vocabulary, P(k), versus k (log-log plot). Black lines indicate least-squares fits. The calculations average over ten initial conditions. At the critical parameter ℘∗, the distribution restricted to the words associated with at least one meaning follows $P(k) \sim k^{-\alpha^*}$, with α∗ = 1.08.

Figure 3: Lexical matrices for different values of ℘. After the final configurations are reached (as in Fig. 1), three lexical matrices of size 64 × 64 are chosen as examples of shared vocabularies, for ℘ = 0 (left), ℘ = 0.5 (center) and ℘ = 1 (right). Black squares represent ones; white spaces represent zeros.