Reading and Writing: An Interdisciplinary Journal 16: 41–59, 2003. © 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Phonology: An emergent consequence of memory constraints and sensory input

FRANCISCO LACERDA
Department of Linguistics, Stockholm University, Stockholm, Sweden

Abstract. This paper presents a theoretical model that attempts to account for the early stages of language acquisition in terms of interaction between biological constraints and input characteristics. The model uses the implications of stochastic representations of the sensory input in a volatile and limited memory. It is argued that phonological structure is a consequence of limited memory resources under the pressure of ecologically relevant multi-sensory information.

Key words: Emergent phonology, Language acquisition, Self-organizing processes

Introduction

Speech communication is probably the most complex natural communication system that humans possess. However, while recognizing the complexity of the process, one is at the same time struck by the apparent ease with which children develop the speech communication ability and the adults’ efficiency in using speech to communicate with each other. In the face of the puzzling discrepancy between the complex structure of the speech communication process and the spontaneous character of the language acquisition process, the notion that language was an innate human capacity emerged in the 1960s as a reaction to the strict behaviorist suggestion that language acquisition might be explained in terms of stimulus-response mechanisms. Nevertheless, whereas language is apparently a human activity that is unparalleled in other species, dismissing language acquisition from the linguistic agenda with the assumption that language is an innate human capacity (Chomsky, 1975) probably does not do justice to the explanatory power that biological and ecological factors may bring into the debate of the language acquisition issue.
Therefore, the current paper sketches a model of the early stages of the language acquisition process, which, albeit crudely, attempts to draw attention to how elementary, “purposeless” events may, in time, lead to emergent structures that are mainly determined by the constraints of the learning system itself, and its environment.

The current attempt to argue for phonological structure as an emergent consequence of memory constraints and sensory input will be organized as follows. First, to set the stage for the argument, a quick review of the nativist perspective will be presented in order to expose the challenges that have to be faced by the present alternative approach. General considerations on how information may be spontaneously represented by living organisms will then be sketched as a broad model of representation of sensory information. Finally, the model will be used to argue that the stored information becomes inevitably structured as a consequence of sensory exposure, the infant’s biases and limitations in memory resources.

The nativist perspective

The core of the nativist attitude towards language acquisition is that language is a genetically determined process, developing spontaneously, “given minimal conditions of exposure and care” (Chomsky, 1975: 147). Not surprisingly, the programmatic nativist attitude is to dismiss the problems of language acquisition as falling outside the scope of linguistics. If humans are born with a “universal grammar,” then the focus of research may be directed to how this universal schema is tuned to specific ambient languages. To be sure, the nativist argument feeds on numerous observations that humans tend to develop the ability to communicate with each other, overcoming a vast range of adverse conditions. This is the case of deaf children who spontaneously develop sign language in spite of being integrated in exclusively verbal communities.
Obviously, the necessity to communicate is an inestimable driving force, capable of putting young humans on the communication trail, even against all odds. With this in mind, Steven Pinker’s statement that “The belief that Motherese is essential for language development is part of the same mentality that sends yuppies to ‘learning centers’ to buy little mittens with bull’s eyes to help their babies to find their hands sooner” (Pinker, 1994: 40, italics added) is perfectly sound. Clearly, language acquisition can unfold even in the absence of motherese, as dramatically demonstrated by the cases of forced deprivation of language exposure during the first years of life (Davis, 1947). In fact, as Gleitman and Newport (1995) point out, evidence from language deprivation cases and other special conditions of language development, such as those occurring in congenitally blind or deaf children, makes a strong case for the importance of both biological components and adequate interaction with appropriate environmental conditions. Whereas it certainly is easy to accept their conclusion, the next issue becomes how innate knowledge may interact with ambient language constraints, from the onset of post-natal life. Chomsky’s famous argument of the “poverty of stimulus” strongly suggests that the young language learner must have innate linguistic knowledge to be able to make sense of the noisy and sparse linguistic input that infants and children are exposed to. But is the speech input available to the young language learner really poor and noisy? Does the infant need innate specific linguistic knowledge to acquire its ambient language, or can language acquisition be seen as an inevitable emergent consequence of non-specific biases and exposure to ambient language?
Modeling general aspects of stimulus representation

Let us start off by presenting a very broad theoretical model of how structure may arise through interaction between available organic structures and external variables. The goal is to suggest how “chaotic processes,” in the sense of “purely random,” disordered and meaningless events, may in fact lead to structured outcomes because of the interaction between a system’s random input and the system’s own history (Dennett, 1995).

Representing external variables

An organism’s representation of its external world is necessarily constrained by its ability to map physical stimuli into internal states. In a very broad sense, this mapping may include anything from an unspecific global process through which the organism’s internal state is modified in response to external stimuli to specialized representations achieved after specific processing of the input. External stimuli that are not intense enough to harm the organism’s internal structure can be represented by incremental global changes in the organism’s internal states. Such internal changes constitute a form of automatic encoding of the organism’s exposure to the stimuli. For instance, a bacterium may react to light or to a chemical agent by changing form or moving to another location. It achieves an internal representation of the stimulus and may produce a response based on that representation. At the same time, representing the stimulus involves changes in its detailed internal structure and thus that particular representation becomes part of the bacterium’s specific life history.1 In this sense, representing external information is constantly part of an organism’s natural interaction with its environment. And because the organism’s detailed internal structure is affected by exposure to external stimuli, the status of this structure is also an implicit and automatic record of the organism’s life history.
Admittedly, the overwhelming part of the encoded information is unspecific and most often cannot be retrieved in explicit form. Nevertheless, this purposeless information represents crucial general implicit knowledge since it encodes the organism’s life experience and can be used to plan or guide future actions (in the sense proposed by Tulving, 1998). In complex organisms, the fundamental aspects of the representation process are bound to be identical to those of simpler organisms, as a direct consequence of evolutionary tinkering (Jacob, 1982). The overall complexity arises from the large number of parallel elementary processes and their mutual interaction, along with their adaptive response to stimulation (which is also present in the simpler organisms). For instance, the sheer addition of elementary representation processes leads to exponentially growing combinatorial possibilities. In addition, the mutual interaction among these elementary processes and the plasticity of the individual components tend to result in specialized processing structures capable of more efficient mapping of the external information along certain dimensions. For example, instead of a general sensitivity to vibrations conveyed by a single nerve fiber, complex organisms develop ears, i.e., higher-level specialized systems capable of providing a more detailed analysis of the vibrations occurring in their ambient world. The organism is able to add frequency information to the lower-level representation of vibration amplitude alone. This more detailed processing increases the representation capacity dramatically because each of the representation levels available along the initial dimension will now be sub-divided into the number of representation levels available along the new dimension.
To be specific, an estimate of the total number of differences between tones that the human auditory system might be able to detect can be derived from the difference limens along the intensity and the frequency dimensions. If the auditory system could represent only intensity, the estimate would be about one hundred detectable level differences. However, when the estimates for intensity and frequency are combined, the result indicates that about 340,000 tones might be represented in the human frequency × intensity space (Stevens & Davis, 1938/1983: 152). Interestingly, as noted by Stevens and Davis (1938/1983), “when the total number of distinguishable colors is deduced from the known number of DL’s for hue, brightness, and saturation, the result is of the same order of magnitude” (p. 152). Thus, for each added dimension, the number of representation levels in the representation space is multiplied by the number of levels that the new dimension contributes, immediately increasing the complexity of the representation space. An estimate of the total audio-visual resolution capacity, based on the DL’s reported by Stevens and Davis, yields 115,600,000,000 potentially distinguishable audio-visual events! Including in this exercise the number of distinguishable olfactory, gustative and tactile events, based on the respective DL’s, the total will obviously increase very quickly. Still another source of complexity is the interdependence between the different sensory dimensions. This type of interdependence introduces non-linearity and time dependency in the representation behavior, but in a plastic system such as the sensory representation systems found in complex organisms, interdependencies and plasticity will actually lead to the emergence of structure in the system, making the system more specialized by reducing its degrees of freedom.
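The multiplicative growth of the representation space described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `combined_capacity` is hypothetical, and the visual count is assumed to equal the auditory count of 340,000. That equality is inferred from the quoted total (340,000 × 340,000 = 115,600,000,000) and from the remark that the two counts are "of the same order of magnitude"; the text does not state it explicitly.

```python
# Figures quoted in the text (Stevens & Davis, 1938/1983).
AUDITORY_EVENTS = 340_000  # distinguishable tones in frequency x intensity space

# Assumption: the number of distinguishable colors is taken to be equal to the
# auditory count; the text only says it is "of the same order of magnitude".
VISUAL_EVENTS = 340_000


def combined_capacity(*levels_per_dimension):
    """Multiply the number of distinguishable levels contributed by each dimension."""
    total = 1
    for levels in levels_per_dimension:
        total *= levels
    return total


audio_visual = combined_capacity(AUDITORY_EVENTS, VISUAL_EVENTS)
print(f"Audio-visual events: {audio_visual:,}")
```

Adding a further sensory modality simply adds another factor to the product, which is why the total grows so quickly.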
At first sight, given the variance of the external stimuli and the huge representation resources available in this space, it seems unlikely that the information represented might be structured without the help of pre-existent sorting mechanisms.2 In the context of language acquisition, for instance, something like a Universal Grammar seems to be needed in order to make sense of the noisy speech input and its poor information content. But can innate mechanisms be, in fact, manifestations of the ontogenetic evolution of general-purpose biological resources under the pressure of stimulus exposure? Again, how poorly specified is the speech input that the young language learner is exposed to and how can structure arise in the absence of pre-wired linguistic knowledge?

Structure emerges from random processes

A common spontaneous attitude towards the variance observed in natural phenomena is to treat it as an unwanted and meaningless disturbance of underlying deterministic processes. While this may be a useful strategy to focus on the core of the phenomena under study, there is a clear risk of missing very relevant information that is conveyed by the variance itself. In fact, the structure of the variance associated with natural phenomena is an extremely powerful source of information as demonstrated, for instance, by the inferential power of the analysis of variance: the possibility of drawing conclusions about a population in general, based on the variance structure observed in a specific (random) sample of that population. Specifically, such analysis of variance permits quantifying the risk involved in generalizing from the sample to the population. Another common spontaneous attitude is to view samples as timeless sets of data,3 leading to a dramatic mismatch with the ecologically relevant reality.
Since all the sensory dimensions are available simultaneously as the organism automatically represents natural events, external events are represented by sequences of points (actually continuous trajectories) in the representation space. Activity represented by a point in this space therefore encodes an observed relationship among the sensory dimensions, at a certain time. As time passes, the level of the activity at a point decreases unless reactivated by new activity represented at the same coordinates or specifically reactivated. In other words, the coordinates of a point in this n-dimensional representation space encode an instance of a specific relationship between the n sensory dimensions represented in this space. Because of the continuous decay of the stored activity, only representations that are frequently updated will tend to be maintained in this space. Also, coordinates shared by different points indicate that the events they represent share the sensory characteristics represented by the shared coordinates. Because events in the outside world are highly variable and the representation space is huge, the likelihood for two unrelated events to be represented in the same location (i.e., the chance of two random points having the same coordinates in the n-dimensional space) is extremely small. Therefore, if two events lead to representations that land on the same location, they implicitly convey the important information that they are very likely related to each other. In fact, given the variance of the input and the available representation resources, the likelihood that two events would be represented on the same location is practically zero, given the time frame of a living organism.
Thus, when all the dimensions of the space are considered there will be virtually no clusters of representations, because the events that are being represented will tend to differ in some detail and, as all the details are represented, the whole representation will appear as a blend of scattered random points. However, the chances of disclosing possible structures in this representation space increase dramatically if it is possible to “look” at the represented events from different viewpoints and “angles” from which several points representing events will appear to be clustered.4 Interestingly, reducing the dimensionality of the representation resources may expose hidden structures.

General aspects of the language acquisition process

Having sketched a general model of how external events may be represented by complex organisms, the challenge is now to apply the model to the specific case of language acquisition in infants. Poverty of stimulus is a reasonable argument when only the speech signal is considered to be the external component of the process of spoken language acquisition. Out of its multi-sensory context and from a strict behaviorist perspective, the speech input alone would very likely be insufficient to account for the language acquisition process within the normal time frame and in the absence of pre-programmed linguistic knowledge.
But in the infant’s ecological setting, events are multi-dimensional (multi-sensory), the infant does interact with the environment and the young language learner has relatively limited representation “needs.” In this scenario, the model sketched above critically diminishes the significance of the “poverty of stimulus”: if the stimulus is indeed poor, its lack of variance leads to a rapid association of the involved external variables; if the stimulus is after all rich enough, its effective linguistic use under adult-infant interaction will expose “spurious variance” and enable the infant to single out the principal components of the representation space.

Initial knowledge

The classical study by Eimas, Siqueland, Jusczyk and Vigorito (1971), showing that one- and four-month-old infants categorized the /ba/-/pa/ VOT continuum in adult-like fashion, provided strong experimental indication that a “linguistic mode” might be “part of the biological makeup of the organism” (Eimas et al., 1971: 306) but this view was subsequently abandoned after Kuhl and her collaborators’ demonstrations of categorical perception also in chinchillas and macaques (Kuhl & Miller, 1975; Kuhl & Padden, 1982, 1983). Current accounts of the infant’s initial propensity to focus on speech sounds are less dogmatic as to what mechanisms may underlie the observed infant behavior and Jusczyk (1997), for instance, suggested recently that “dedicated, hard-wired, specialized speech-processing mechanisms” (p. 78) do not have to be necessarily involved in the development of speech perception during the first year of life. Indeed, the experimental evidence suggests that the newborn infant orients towards speech, in particular the mother’s speech, because of prenatal exposure to speech stimuli rather than by hard-wired specialized processing mechanisms (de Casper & Fifer, 1980; de Casper & Prescott, 1984; Greenough & Alcantara, 1993; Turkewitz, 1993; Ecklund-Flores & Turkewitz, 1996).
The findings in the speech domain are paralleled by observations of perinatal olfactory preference for the mother’s amniotic liquid in which the fetus was immersed during its prenatal life (Varendi, Porter & Winberg, 1996, 1997). But if the newborn’s auditory and olfactory preferences are linked to the memory of prenatal exposures it is reasonable to expect that newborns would also show preference for the mother’s non-voluntary sounds, like sounds caused by bowel movements. Is it possible to account for an overall preference for speech sounds in this noisy scenario? In fact, according to the general model, the physiological correlates of speech production (breathing rhythm, diaphragm tension, hormonal discharges associated with the alertness required to speak, etc.) may be sufficient to help single out speech because of its natural correlation with other sensory dimensions that can also be perceived by the fetus. Of course, this speculative account is only relevant in the context of immediate post-birth preference for speech. Even if an initial bias towards language is likely to be advantageous in launching the newborn infant into its ecological setting, a normal infant who lacks that initial bias will, in a normal linguistic environment, nevertheless quickly focus its attention on speech. In all likelihood, the spoken language will become an inescapably salient acoustic component of the multi-sensory flood to which the infant is exposed, since it is consistently and widely used in the infant’s normal environment. The recent demonstration by Ramus, Hauser, Miller, Morris and Mehler (2000) that cotton-top tamarin monkeys parallel human newborns in their ability to pick up specific prosodic cues from speech sequences clearly suggests that sensitivity to prosodic properties of speech does not require innate linguistic capacity.
In this scenario, the newborn infant’s initial bias towards speech is due not to an innate propensity, as suggested by the universal grammar, but to an epiphenomenon created by the interaction between available neurophysiological and anatomic structures, on the one hand, and the statistical properties of the pre-birth multi-sensory exposure, on the other.

Post-natal development in a multidimensional perspective

The young infant’s language acquisition process is obviously influenced by a number of endogenous and exogenous factors. For instance, aspects like the infant’s anatomic and physiological development must have a relatively direct impact on the infant’s capacity to produce the speech sounds used in the ambient language; the infant’s auditory capacity will largely determine the characteristics of the speech sounds that the infant will discriminate successfully; the nature of the speech input to which the infant is exposed will be an exogenous component through which the phonetic characteristics of the ambient language become accessible to the infant; the adult ability to interact with the young infant and fine-tune to the infant’s needs and expectations is also likely to be an important exogenous component of the language development process. To appreciate the role that components like these may have in the early stages of the language acquisition process, let us select some of the aspects that are expected to have significant impact on the predictions of the current model.

The infant’s vocalic domain

To estimate the domain of the acoustic output produced by the infant during its first months of life, an acoustic model of the vocal tract was implemented using anatomic data available from comparative anatomy (Bosma, 1975; Aronson, 1990). One of the most conspicuous differences between the vocal tract anatomy of the adult and the young infant is the proportion of the pharyngeal tract to the oral cavity.
In the newborn infant the larynx is at the level of the 3rd cervical vertebra and the pharynx is therefore extremely short. As the infant matures, the pharynx length increases dramatically during the first years of life. By about five years of age the relation of the pharyngeal length to the oral cavity has practically reached adult proportions, although the larynx will continue to descend throughout life (Figure 1).

Figure 1. Larynx’s position relative to the cervical vertebrae, as a function of age (data from Aronson, 1990).

The acoustic model was developed according to Fant’s acoustic theory of speech (Fant, 1960). It describes the vocal tract as a series of 20 tubes with regions of articulatory mobility displaced to reflect the infant’s anatomy. For convenient comparison with typical adult values, the formants were computed as if the infant’s vocal tract were 17.5 cm long. The conversion between the adult-based values and the actual formant values for an infant was assumed to be approximately linear. Not surprisingly, the results indicate that opening and closing the jaw with the tongue resting on the jaw mainly affects F1. The first formant rises quickly as a consequence of the initial jaw openings but all the other formants tend to remain unchanged. This acoustic result is depicted in Figure 2a, where the formant trajectory in the F1 × F2 plane is sampled at constant time intervals, for a uniform opening gesture. The corresponding stylized spectrogram, showing the trajectories of the first four formants, is displayed in Figure 2b. According to this computation, the infant would tend to produce a series of central vowels differing mainly in vowel height, a prediction that is compatible with experimental observation (e.g. Davis & MacNeilage, 1990; MacNeilage & Davis, 2000).

Figure 2. F1 and F2 values resulting from opening the jaw with uniform opening and closing speed.
Note that the opening gesture affects mainly F1 and leaves F2 at approximately the value of a central vowel. This gesture results in a sequence that sounds roughly like a closant, followed by a vowel that becomes increasingly open, as the jaw is lowered. (a) Trajectory on the F1 × F2 plane. (b) Trajectory in a stylized spectrogram (frequency, in Hz, vs. time, on an arbitrary scale).

Figure 3. Same as Figure 2 but raising of the tongue dorsum towards the velum of the infant. Note that because of the non-linear transformation between the infant’s and the adult’s vocal tract, this movement results in a sound sequence evolving from a schwa vowel to an approximately pharyngeal consonant. (a) Trajectory on the F1 × F2 plane. (b) Trajectory in a stylized spectrogram (frequency, in Hz, vs. time, on an arbitrary scale).

A closure gesture, corresponding roughly to a velar place of articulation by reference to the infant’s articulatory structures (a constriction at about 1/4 of the vocal tract length), generates formant movements resembling a vowel + uvular or a vowel + pharyngeal sequence (see Figures 3a, b). Correspondingly, articulatory gestures engaging the young infant’s tongue dorsum would result in adult equivalents of velar articulations. From this perspective, the common notion that the infant’s babbling is initially characterized by pharyngeal and velar sounds (Figures 4a, b) clearly gains a coherent acoustic-articulatory explanation. The infant may, in fact, be activating the same structures that the adult uses to produce some of the most frequent speech sounds but because of anatomic differences, the resulting vocalizations sound as if they had places of articulation further back in the vocal tract.

Figure 4. Same as Figure 2, but raising of the tongue towards the hard palate. This movement results in a sound sequence evolving from a schwa vowel towards a velar consonant. (a) Trajectory on the F1 × F2 plane.
(b) Trajectory in a stylized spectrogram (frequency, in Hz, vs. time, on an arbitrary scale).

Adult feedback

Adult feedback in response to the infant’s vocalizations is, in terms of the emergent perspective presented here, an important component of the language acquisition process. Although, as discussed above, the vocal output produced by the infant does not necessarily involve adult-like articulatory-acoustic correspondences, adult listeners often tend to interpret the infant’s vocalizations in terms of speech sounds used in their ambient language. This adult interpretation can therefore be seen as a systematic bias (a “phonological filter,” e.g., Sundberg, 1998) that effectively structures the infant’s phonetic variations (Routh, 1967). In other words, by providing feedback to the infant’s spontaneous utterances, adults may help the infant to establish equivalence classes between babbled utterances and adult speech sound categories. An experimental study of the feedback spontaneously provided by adult listeners when listening to babbled utterances was reported by Lacerda and Ichijima (1995). They asked Japanese and Swedish adult listeners (phonetics students) to estimate the tongue positions used by infants when producing a series of babbled utterances. When the adult judgments were sorted according to the age at which the babbled utterances had been produced, the judgments of tongue height were surprisingly consistent for all ages, but the frontness judgments were consistent only for the late babbling. Interestingly, the outcome of this listening experiment is also compatible with the acoustic-articulatory predictions of high-low dominance, derived above.
In terms of the general representation model, Lacerda and Ichijima’s (1995) results suggest that adults may spontaneously provide more consistent feedback regarding height than frontness, a feedback that may elicit the infant’s bias towards the height contrasts that can easily be produced by the opening and closing gestures during vocalization. The adult feedback does not have to be explicit, in a Skinnerian fashion, nor does it have to be as repetitive as statistical learning per se would require. In real-life situations, the adult essentially reinterprets the infant’s utterances, lending a (fuzzy) structure and a (fuzzy) meaning to them, in cognitive-constructivistic terms. But this is not a process that demands a long series of repeated exposures to stimulus-response contingencies (Kelly, 1963). On the one hand there is an overall quality attached to the feedback (a sort of paralinguistic emotional validation); on the other hand the low likelihood of two unrelated events leading to the same representation lends high significance to a couple of similar occurrences. Besides, both the infant and the adult generate models of reality involving these contingencies, on the basis of rather little information. Obviously, jumping to conclusions before gathering enough statistical data is always a risky business. However, given the range of ecological settings in which language acquisition develops, perhaps the risks may not be very high after all and, at any rate, worth taking to gain communicative competence.

Anisotropies in the infant perceptual space

Experimental evidence from speech perception research with infants has shown that the perceptual space of the infant is altered by exposure to language (Kuhl, Williams, Lacerda, Stevens & Lindblom, 1992), suggesting that young infants may organize vowel perception around vowel prototypes, as described by Kuhl’s Native Language Magnet (NLM) Theory.
The vowel prototype acts as a magnet that attracts neighboring vowel representations towards it. Kuhl’s suggestion is that the perceptual space becomes structured because of the warping in the neighborhood of the vowel prototypes, as a consequence of exposure to the ambient language.5 Whereas the genesis of the vowel prototypes may be a matter of discussion (e.g., Frieda, Walley, Flege & Sloane, 1999; Lacerda, 1995), it is possible that the initial structure of the infant’s perceptual space is affected by other types of anisotropies. For instance, in line with the acoustic-articulatory observations, the infant’s ability to discriminate vowel contrasts also seems to favor distinctions of vowel height, relative to frontness, as indicated by experimental results obtained by the infant speech perception research group at Stockholm University. Both 2–3-month-old and 6–12-month-old infants, who were respectively tested with the High-Amplitude Sucking technique and with the Head-Turn technique, demonstrated better discrimination performance for vowel contrasts along F1 than along F2, for a set of synthetic vowels differing only in F1 or in F2 (Lacerda, 1993, 1994). The stimuli had equal differences, in Bark, along F1 and F2. In addition, to avoid providing intensity cues correlated with F1, the vowel stimuli that the older infants listened to were generated by a parallel speech synthesizer (Fant, 1960).

In summary, the infant’s ability to produce and perceive vowel-like sounds, along with the adult’s interpretation of infant babbling, suggests that the young infant may tend to favor vowel contrasts along the height dimension (contrasts in F1) rather than along the front-back dimension (contrasts in F2).6 Clearly, biases of this sort are likely to have a long-term shaping effect on the infant’s articulatory and perceptual representation of vowels.
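The remark above that the stimuli had equal differences in Bark along F1 and F2 can be made concrete with a small conversion sketch. It uses Traunmüller's (1990) analytical approximation of the Bark scale, which is an assumption here: the text does not say which Hz-to-Bark conversion was used in the experiments. The reference formant values (500 Hz and 1500 Hz) and the 1 Bark step are likewise hypothetical, chosen only for illustration.

```python
def hz_to_bark(f_hz):
    """Traunmueller's approximation: z = 26.81 f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def hz_step_for_bark_step(f_hz, bark_step):
    """Size in Hz of a step that is `bark_step` Bark wide, starting at f_hz."""
    return bark_to_hz(hz_to_bark(f_hz) + bark_step) - f_hz


# Hypothetical reference vowel and step size (not taken from the experiments).
REFERENCE_F1_HZ = 500.0
REFERENCE_F2_HZ = 1500.0
BARK_STEP = 1.0

step_f1 = hz_step_for_bark_step(REFERENCE_F1_HZ, BARK_STEP)
step_f2 = hz_step_for_bark_step(REFERENCE_F2_HZ, BARK_STEP)
print(f"1 Bark above F1 = {REFERENCE_F1_HZ:.0f} Hz spans {step_f1:.0f} Hz")
print(f"1 Bark above F2 = {REFERENCE_F2_HZ:.0f} Hz spans {step_f2:.0f} Hz")
```

Under this approximation, a step of equal auditory size spans more Hz in the F2 region than in the F1 region, which is why the contrasts are equated in Bark and not in Hz.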
Modeling the emergence of linguistic structure

This section will briefly review Lacerda and Lindblom’s model (Lacerda & Lindblom, 1997, 1998; Lacerda, 1998), illustrating how unstructured representations converge towards “implicit categories” that are specified by small persistent statistical regularities. In line with the notion of representation sketched above, the model assumes that acoustic input gives rise to activity at a point in the representation space. The coordinates of the point in the representation space encode the sensory input generated both by the acoustic signal itself and by all the other sensory inputs simultaneous with the acoustic signal. The activity level7 at a point in this space at a specific time is the system’s memory of the encoded event. The initial activity in the representation space is assumed to be zero everywhere. The activity generated by external stimuli is mapped onto the appropriate coordinates of the representation space and added to the activity level that might have been elicited by previous exposure mapping onto those coordinates. As stated earlier, the representation space is extremely vast. The immense representation resources associated with a high resolution of representation, together with the variance of natural external stimuli, lead to an extremely low likelihood of mapping two representations onto the same coordinates of the representation space. In other words, in case of unconstrained representation resources, the system will tend to represent the details associated with every single external stimulus but fails to capture the implicit overall structure of the stimuli in general. However, when the mapping of external stimuli is affected by memory diffusion (Edelman, 1987), or by sensory smearing, the situation becomes radically different because the system now performs a long-term running average of the activity levels generated by the external stimuli.
This running average automatically captures part of the structure implicit in the external stimuli (Elman, 1999). In Lacerda and Lindblom’s model the stimuli were vowels leading to two-dimensional representations on the F1 × F2 plane, and the memory diffusion was described as a two-dimensional Gaussian distribution centered at the “stimulus coordinates.” The activity levels were made proportional to the duration of the stimuli (see Note 8).

Figure 5. Representation of the areas assigned to /a/, /i/ and /u/ given a decision threshold of 0.01 (for details, see Lacerda & Lindblom, 1997). The left panel shows the assignments in terms of plateaux where each category is represented by a given plateau height. The right panel displays the same information, seen from the top.

The model, which had no prior explicit knowledge of the type of stimuli to which it was exposed, was applied to a set of 100 vowels. The simulated acoustic input consisted of the two formants, corresponding to the sound being “heard,” along with another dimension corresponding to a random variable associated with the formant values. This random variable was named “label” but it is, in fact, not a label in the proper sense. Rather, it is a variable that represents circumstantial sensory information co-occurring with the formant information. During the early stages of language acquisition, such circumstantial information tends to be statistically, not deterministically, related to the speech information. For instance, an adult introduces a teddy bear to an infant by showing the toy and saying its name. According to the present model, the infant may register the acoustic information corresponding to the sentences produced by the adult but may be staring at a light source behind the bear. In such a case, the light source, not the bear, will be represented along with the acoustic information.
What the model predicts is that although several “wrong labels” like this may be stored, in the long run the sentences referring to the bear will tend to appear along with the visual information representing the bear, and this consistency will eventually enable the infant to single out the common denominator between the acoustic and the visual information: sound strings involving “bear” and something looking like a teddy bear seen from a variety of angles and contexts.

To model the variance of the natural world, these “labels” were drawn from a random variable that could assume the value of any of a number of different categories, with the only constraint being that the probability of the “intended” category, i.e., the category from which the formant values had in fact been drawn, was slightly higher than those of the competing categories. Thus, although the “labels” were determined and limited a priori to make sure that the model converges within practical computational time, their random character actually captures the real-life “implicit labeling.”

The computations carried out by Lacerda and Lindblom (1997) indicate that, in spite of the “wrong” “label”-formant associations, the majority of the associations at any given location correspond to the intended ones. In other words, the model learns to associate labels with certain areas of the F1 × F2 plane simply by using the “label”-formant association with the highest activity level in that location. This is illustrated in Figure 5, where an arbitrary decision threshold was used. In biological systems, the local dominance of a certain type of label will tend to unbalance the system and enhance that dominance even more, driving the system towards specialized behavior (Zohary, Celebrini, Britten & Newsome, 1994).
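The mechanism described above (activity accumulated at the stimulus coordinates, spread by a two-dimensional Gaussian, weighted by stimulus duration, and paired with a noisy “label”) can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not Lacerda and Lindblom’s implementation: the vowel centres, the spread of the tokens, the diffusion width `sigma`, the label probability `p_intended`, the duration range, the grid and the way the 0.01 threshold is applied are all hypothetical choices made for the example.

```python
import math
import random

# Hypothetical vowel centres (F1, F2) in Bark; illustrative values only.
CENTRES = {"a": (7.0, 10.5), "i": (2.8, 14.0), "u": (3.0, 6.5)}
LABELS = sorted(CENTRES)


def draw_token(rng, p_intended=0.5, spread=0.6):
    """Simulate one vowel token with a noisy co-occurring 'label'."""
    intended = rng.choice(LABELS)
    c1, c2 = CENTRES[intended]
    f1, f2 = rng.gauss(c1, spread), rng.gauss(c2, spread)
    if rng.random() < p_intended:
        label = intended  # the "intended" co-occurrence
    else:
        label = rng.choice([l for l in LABELS if l != intended])  # a "wrong label"
    duration = rng.uniform(0.1, 0.3)  # seconds; assumed range
    return f1, f2, label, duration


def make_axis(lo, hi, step):
    n = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(n)]


def nearest(axis, value):
    return min(range(len(axis)), key=lambda n: abs(axis[n] - value))


class RepresentationSpace:
    """Activity over an F1 x F2 grid, one plane per label value."""

    def __init__(self, sigma=1.0, step=0.5):
        self.sigma = sigma  # width of the assumed Gaussian memory diffusion
        self.f1_axis = make_axis(1.0, 9.0, step)
        self.f2_axis = make_axis(5.0, 16.0, step)
        # Initial activity is zero everywhere.
        self.activity = {
            l: [[0.0] * len(self.f2_axis) for _ in self.f1_axis] for l in LABELS
        }

    def expose(self, f1, f2, label, duration):
        """Add duration-weighted, Gaussian-smeared activity for one token."""
        plane = self.activity[label]
        k = 2.0 * self.sigma ** 2
        for i, g1 in enumerate(self.f1_axis):
            for j, g2 in enumerate(self.f2_axis):
                d2 = (g1 - f1) ** 2 + (g2 - f2) ** 2
                plane[i][j] += duration * math.exp(-d2 / k)

    def dominant_label(self, f1, f2, threshold=0.01):
        """Label with the highest activity at the nearest grid point, or None."""
        i, j = nearest(self.f1_axis, f1), nearest(self.f2_axis, f2)
        levels = {l: self.activity[l][i][j] for l in LABELS}
        best = max(levels, key=levels.get)
        return best if levels[best] > threshold else None


def train(n_tokens=100, seed=0, **token_kwargs):
    rng = random.Random(seed)
    space = RepresentationSpace()
    for _ in range(n_tokens):
        space.expose(*draw_token(rng, **token_kwargs))
    return space


if __name__ == "__main__":
    space = train(n_tokens=1200, seed=1)
    for label, (f1, f2) in CENTRES.items():
        print(label, "->", space.dominant_label(f1, f2))
```

In this sketch half of the tokens carry a “wrong” label, yet the label with the highest accumulated activity near each centre is expected to be the intended one once enough tokens have been stored, which is the local dominance the text describes. With only 100 tokens, as in the original simulation, the outcome of this toy version depends more strongly on the random draw.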
Without constraints, like memory diffusion or sensory smearing, the structure of the representation space tends to disappear because, in the absence of local overlap between the representations of the stimuli, every event will be unique and recency will be the only determinant of the activity levels.

Conclusion

In general, any correlation along any of the involved dimensions can be used to establish a “labeling relationship” between two sensory inputs (the conventional stimulus and the sensory input representing its label). For instance, the infant learning the word “mama” stores all the available information associated with the word. According to the model, as the infant hears the word “mama,” it also stores other available circumstantial information, i.e., not only the details of the voice speaking (Locke, 1996), but also the image of the mother, her smell, her taste, etc., because these sensory inputs are simultaneously available. Of all these simultaneous sensory inputs, those that are statistically associated with the word may eventually emerge as (unintended) labels of the very word. Infants are good at picking up statistical relationships between events (e.g., Saffran, Aslin & Newport, 1996) and therefore, in the long run, even relationships between the olfactory, visual and gustatory representations of the word “mama” will emerge as reciprocal labels.

According to the present model, young language learners will probably start by storing acoustic information corresponding to the global characteristics of the speech they are exposed to. This has been shown by de Casper and his colleagues, as well as, indirectly, by the infant’s preferences for motherese (Fernald & Kuhl, 1987). As the number of stored multi-sensory representations increases, more finely detailed relationships between the acoustic and the other sensory inputs emerge spontaneously from the available correlations between sensory dimensions (Lindblom, 1992).
But this succession of correlations of more and more detailed subsets of the sound string stops when the non-auditory components no longer offer information that must be correlated with finer sound sub-strings. According to the model proposed here, this is probably why detailed phonological awareness tends to emerge in response to the demands posed by sophisticated word games or, more commonly, in association with the acquisition of reading and writing abilities. In general, however, the mechanisms underlying the emergence of phonological structure may be essentially the same as those involved in syntax (Anward & Lindblom, 2000). Indeed, because language’s combinatorial principles apply in fundamentally the same way to sentences, words and increasingly detailed parts of words, correlation between different kinds of sensory information may be a pervasive structuring component at any of these levels.

Acknowledgements

The author is indebted to Amanda Walley, an anonymous reviewer, and Ulla Sundberg for their comments on an earlier version of this paper. Research was supported by The Bank of Sweden Tercentenary Foundation (Grant 94-0435) and by Stockholm University.

Notes

1. In fact, repeated exposure to a stimulus must actually elicit different detailed responses, since new responses inevitably interact with the representations caused by the organism’s early history. Incidentally, because living organisms must continuously repair themselves so as not to succumb to the second law of thermodynamics, they are in a sense under continuous evolution, even in the absence of repeated exposure to explicit external stimuli.

2. The complexity of the mutual interactions of the sensory channels is also a structuring component.

3. Obviously, Markov models or ANOVA models with repeated measures do use an implicit time dimension but still tend to portray data sets as static arrays.

4. What is described here is essentially the basis of principal component analysis.

5. See Frieda et al. (1999) for a discussion of the phenomenon from the perspective of adult vowel perception, and Lacerda (1995) for a discussion of the genesis of the phenomenon.

6. Incidentally, it may be noted that natural vowel systems also tend to exploit more vowel height contrasts than frontness contrasts as the number of vowels in the system increases (Liljencrants & Lindblom, 1972; Lindblom & Maddieson, 1988). In addition, front-back contrasts in natural vowel systems generally involve rounding of the back vowels, as if the perceptual salience of front-back contrasts conveyed by F2 alone needed to be enhanced by a general lowering of all the formants.

7. “Activity” is, in fact, an extra dimension in the representation space.

8. This proportionality would not be necessary if the stimuli were represented by series of pairs of F1 and F2 values, sampled at a given sampling frequency, because in that case the cumulative activity levels would implicitly be linked to the stimulus durations.

References

Anward, J. & Lindblom, B. (2000). On the rapid perceptual processing of speech: From signal information to phonetic knowledge. Proceedings of the International Symposium on Language Processing and Interpreting, Stockholm University, Stockholm, February, 1997. http://lab1.isp.su.se/iis/Anward-Lindblom.PDF.

Aronson, A. (1990). Clinical voice disorders. New York: Thieme.

Bosma, J. (1975). Anatomic and physiologic development of the speech apparatus. In D. Tower (Ed.), The nervous system, vol. 3: Human communication and its disorders. New York: Raven Press.

Chomsky, N. (1975). Reflections on language. Glasgow: William Collins Sons.

Davis, B. & MacNeilage, P. (1990). Acquisition of correct vowel production: A quantitative case study. Journal of Speech and Hearing Research, 33, 16–27.

Davis, K. (1947). Final note on a case of extreme social isolation. American Journal of Sociology, 52, 432–437.

De Casper, A. & Fifer, W. (1980).
Of human bonding: Newborns prefer their mothers’ voices. Science, 208, 1174–1176.

De Casper, A. & Prescott, P. (1984). Human newborns’ perception of male voices: Preference, discrimination, and reinforcing value. Developmental Psychobiology, 17, 481–491.

Dennett, D. (1995). Darwin’s dangerous idea: Evolution and the meanings of life. New York: Touchstone.

Ecklund-Flores, L. & Turkewitz, G. (1996). Asymmetric headturning to speech and nonspeech in human newborns. Developmental Psychobiology, 29, 205–217.

Edelman, G. (1987). Neural darwinism: The theory of neuronal group selection. New York: Basic Books.

Eimas, P., Siqueland, E., Jusczyk, P. & Vigorito, J. (1971). Speech perception in infants. Science, 171, 303–306.

Elman, J. (1999). The emergence of language: A conspiracy theory. In B. MacWhinney (Ed.), The emergence of language (pp. 1–27). Mahwah, New Jersey: Erlbaum.

Fant, G. (1960). Acoustic theory of speech production. The Hague, The Netherlands: Mouton.

Fernald, A. & Kuhl, P. (1987). Acoustic determinants of infant preference of motherese speech. Infant Behavior and Development, 10, 279–293.

Frieda, E., Walley, A., Flege, J. & Sloane, M. (1999). Adults’ perception of native and nonnative vowels: Implications for the perceptual magnet effect. Perception and Psychophysics, 61, 561–577.

Gleitman, L. & Newport, E. (1995). The invention of language by children: Environmental and biological influences on the acquisition of language. In L. Gleitman, M. Liberman & D. Osherson (Eds.), Language, vol. 1: An invitation to cognitive science. Cambridge: MIT Press.

Greenough, W. & Alcantara, A. (1993). The roles of experience in different developmental information stage processes. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. McNeilage & J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 3–16). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Jacob, F. (1982).
The possible and the actual. Seattle: University of Washington Press.

Jusczyk, P. (1997). The discovery of spoken language. Cambridge: MIT Press.

Kelly, G. (1963). A theory of personality: The psychology of personal constructs. New York: W.W. Norton.

Kuhl, P. & Miller, J. (1975). Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar-plosive consonants. Science, 190, 69–72.

Kuhl, P. & Padden, D. (1982). Enhanced discriminability at the phonetic boundaries for the voicing feature in macaques. Perception and Psychophysics, 32, 542–550.

Kuhl, P. & Padden, D. (1983). Enhanced discriminability at the phonetic boundaries for the place feature in macaques. Journal of the Acoustical Society of America, 73, 1003–1010.

Kuhl, P., Williams, K., Lacerda, F., Stevens, K. & Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255, 606–608.

Lacerda, F. (1993). Sonority contrasts dominate young infants’ vowel perception. PERILUS XVII, 55–63, Stockholm University.

Lacerda, F. (1994). The asymmetric structure of the infant’s perceptual vowel space. Journal of the Acoustical Society of America, 95, 3016 (A).

Lacerda, F. (1995). The perceptual magnet-effect: An emergent consequence of exemplar-based phonetic memory. In K. Elenius & P. Branderud (Eds.), Proceedings of the International Congress of Phonetic Sciences 95, Vol. 2 (pp. 140–147). Stockholm: ICPhS.

Lacerda, F. (1998). An exemplar-based account of emergent phonetic categories. Journal of the Acoustical Society of America, 103, 2980–2981.

Lacerda, F. & Ichijima, T. (1995). Adult judgements of infant vocalizations. In K. Elenius & P. Branderud (Eds.), Proceedings of the International Congress of Phonetic Sciences 95, Vol. 1 (pp. 142–145). Stockholm: ICPhS.

Lacerda, F. & Lindblom, B. (1997). Modeling the early stages of language acquisition. In Å. Olofsson & S. Strömqvist (Eds.), Cross-linguistic studies of dyslexia and early language development (pp.
14–33). Brussels: European Commission/COST A8.

Lacerda, F. & Lindblom, B. (1998). Some remarks on Tallal’s transform in the light of emergent phonology. In C. von Euler, I. Lundberg & R. Llinás (Eds.), Basic mechanisms in cognition and language (pp. 263–283). Amsterdam: Elsevier.

Liljencrants, J. & Lindblom, B. (1972). Numerical simulation of vowel quality systems: The role of perceptual contrast. Language, 48, 839–862.

Lindblom, B. & Maddieson, I. (1988). Phonetic universals in consonant systems. In L.M. Hyman & C.N. Li (Eds.), Language, speech and mind: Studies in honor of Victoria Fromkin (pp. 62–78). London: Routledge.

Lindblom, B. (1992). Phonological units as adaptive emergents of lexical development. In C.A. Ferguson, L. Menn & C. Stoel-Gammon (Eds.), Phonological development (pp. 131–163). Timonium, Maryland: York Press.

Locke, J. (1996). Why do infants begin to talk? Language as an unintended consequence. Journal of Child Language, 23, 251–268.

MacNeilage, P. & Davis, B. (2000). On the origin of internal structure of word forms. Science, 288, 527–531.

Pinker, S. (1994). The language instinct. New York: Morrow.

Ramus, F., Hauser, M., Miller, C., Morris, D. & Mehler, J. (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science, 288, 349–351.

Routh, D. (1967). Conditioning of vocal response differentiation in infant. Developmental Psychology, 1, 219–226.

Saffran, J., Aslin, R. & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.

Stevens, S. & Davis, H. (1938). Hearing, its psychology and physiology. New York: John Wiley.

Sundberg, U. (1998). Mother tongue – Phonetic aspects of infant-directed speech. Unpublished Ph.D. thesis, PERILUS XXI, Stockholm University.

Tulving, E. (1998). Neurocognitive processes of human memory. In C. von Euler, I. Lundberg & R. Llinás (Eds.), Basic mechanisms in cognition and language (pp. 263–283). Amsterdam: Elsevier.

Turkewitz, G.
(1993). The origins of differential hemispheric strategies for information processing in the relationships between voice and face perception. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. McNeilage & J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 165–170). Dordrecht, The Netherlands: Kluwer Academic Publishers.

Varendi, H., Porter, R. & Winberg, J. (1996). Attractiveness of amniotic fluid odor: Evidence of prenatal olfactory learning? Acta Paediatrica, 85, 1223–1227.

Varendi, H., Porter, R. & Winberg, J. (1997). Natural odor preferences of newborn infants change over time. Acta Paediatrica, 86, 985–990.

Zohary, E., Celebrini, S., Britten, K. & Newsome, W. (1994). Neuronal plasticity that underlies improvement in perceptual performance. Science, 263, 1289–1292.

Address for correspondence: Francisco Lacerda, Department of Linguistics, Stockholm University, SE-106 91 Stockholm, Sweden
Phone: +46-8-162341; Fax: +46-8-155389; E-mail: [email protected]