Cognitive Science 34 (2010) 465–488
Copyright 2010 Cognitive Science Society, Inc. All rights reserved.
ISSN: 0364-0213 print / 1551-6709 online
DOI: 10.1111/j.1551-6709.2009.01091.x

Children’s Production of Unfamiliar Word Sequences Is Predicted by Positional Variability and Latent Classes in a Large Sample of Child-Directed Speech

Danielle Matthews (Department of Psychology, University of Sheffield) and Colin Bannard (Department of Linguistics, University of Texas at Austin)

Received 23 May 2009; received in revised form 18 November 2009; accepted 23 November 2009

Abstract

We explore whether children’s willingness to produce unfamiliar sequences of words reflects their experience with similar lexical patterns. We asked children to repeat unfamiliar sequences that were identical to familiar phrases (e.g., A piece of toast) but for one word (e.g., a novel instantiation of A piece of X, like A piece of brick). We explore two predictions—motivated by findings in the statistical learning literature—that children are likely to have detected an opportunity to substitute alternative words into the final position of a four-word sequence if (a) it is difficult to predict the fourth word given the first three words and (b) the words observed in the final position are distributionally similar. Twenty-eight 2-year-olds and thirty-one 3-year-olds were significantly more likely to correctly repeat unfamiliar variants of patterns for which these properties held. The results illustrate how children’s developing language is shaped by linguistic experience.

Keywords: Cognitive development; Language acquisition; Statistical learning; Syntax; Corpus analysis; Information theory; Latent classes; Usage-based models of language

1. Introduction

Faced with a stream of speech sounds and gestures, most infants begin to identify the units of their language and discover the potential for recombining them within the first 2 years.
Quite how this is achieved is one of the most challenging questions in cognitive science. In the last decade, a very large literature has explored a number of skills that might be useful. It has been reported that children can use basic ‘‘statistical learning’’ mechanisms to take such crucial developmental steps as segmenting the input into ‘‘word-like’’ units (e.g., Saffran, Aslin, & Newport, 1996), assigning sounds to ‘‘categories’’ based on their co-occurrence with other sounds (Gomez & Lakusta, 2004), and identifying nonadjacent dependencies (Gomez, 2002; Gomez & Maye, 2005). This research has been conducted using artificial stimuli—sequences of meaning-free sounds from which the infants are able to extract language-like structure using simple pattern detection. The use of such artificial stimuli is valuable in isolating specific input characteristics and learning mechanisms. However, it remains unclear whether these same mechanisms would be at work in a natural learning context. Natural language is of course far noisier than artificial stimuli and rarely displays patterns or statistical structure with the same clear consistency. Crucially, while infants seem to be able to observe patterns in synthetic data from a very young age, it is not clear that they are able to utilize these skills in communicative contexts until sometime later in development. There is thus some work to be done to bridge the gap between these extremely valuable findings and real language development (see Pelucchi, Hay, & Saffran, 2009, and Johnson & Tyler, in press, on word segmentation in natural language).

(Correspondence should be sent to Danielle Matthews, Department of Psychology, University of Sheffield, Western Bank, Sheffield S10 2TP, United Kingdom. E-mail: [email protected])
In this paper, we report on a study that examines children’s grammar learning by performing a statistical analysis of a large sample of real input data and using this to make predictions about children’s ability to produce particular sequences of words in a sentence repetition task. The sentence repetition task allows us to test young children, on the cusp of multiword speech, with a procedure that has been tried and tested by many researchers from differing theoretical backgrounds (e.g., Bannard & Matthews, 2008; Potter & Lombardi, 1990; Valian & Aubry, 2005). Using real English of course has some disadvantages, namely that it can be challenging to find sufficient stimuli (where the properties of interest are uncorrelated) while also controlling for other factors that would be presumed to affect production (e.g., word frequency, phonological complexity). However, we think that it is a vital complement to the artificial grammar learning work, and one of our objectives in this study is to show that it is possible to control for many potential confounds via computational analysis of the input data and the use of appropriate methods for statistical analysis of the children’s responses. The aim of this study is to test whether the detailed statistics of the input are reflected in children’s developing grammatical representations. We asked children to repeat unfamiliar sequences of words that were identical to familiar phrases but for one word (e.g., a novel instantiation of a frequent pattern like A piece of X, such as A piece of brick). These variants were unattested in a large child language corpus and thus likely to be novel to most young children or, at the least, unpracticed. We hypothesized that children’s ability to repeat such unattested sequences would reflect their exposure to the relevant pattern in the given lexical form. We thus rely on the assumption that children build lexically specific representations. 
This assumption has been supported in a recent study (Bannard & Matthews, 2008) where we found that 2- and 3-year-old children were significantly better at repeating the shared first three words of frequently occurring multiword sequences than matched, infrequent sequences (e.g., better at repeating ‘‘sit in your’’ when saying ‘‘sit in your chair’’ than when saying ‘‘sit in your truck’’). It is worth noting that lexical patterns of the kind we are studying here have been given a central role in so-called usage-based theories of development (e.g., Tomasello, 2003; Goldberg, 2006), where they are sometimes referred to as ‘‘constructions.’’ Because of the long history of the term construction in the linguistic literature and some minor differences in how the term is applied even within the usage-based literature, we prefer to use the terms schema or pattern in this article, but we nonetheless consider the phenomenon we are discussing as entirely consistent with such an approach. So how might the statistics of the input affect children’s ability to produce unfamiliar sequences of words that are similar to well-known phrases? One recurrent idea in the literature on the learning of linguistic patterns is that children will be affected by what has been called type frequency. The idea here is that children will identify a pattern in the input where some invariant structure is combined with a wide range of other material. For example, Gomez (2002) found that the ability of 18-month-olds to detect a nonadjacent dependency between two sounds was predicted by the extent to which the intervening element was varied in the artificial language they were exposed to. This idea has also been popular in the study of morphology and its development (e.g., Bybee, 1985; Kempe, Brooks, Mironova, Pershukova, & Fedorova, 2007).
Similar mechanisms have been proposed for the learning of basic lexical patterns of the kind we are discussing here (e.g., Braine, 1976; Edelman, 2007; Freudenthal, Pine, Aguado-Orea, & Gobet, 2007; Lieven, Pine, & Baldwin, 1997; Pine & Lieven, 1997). Tomasello (2003) has argued that children form the most basic of productive constructions through a process of schematization. This is achieved when children hear repeated uses of one form (e.g., ‘‘Throw’’) along with varied use of another form (e.g., noun phrases referring to whatever is thrown: ‘‘Throw the ball,’’ ‘‘Throw teddy,’’ and ‘‘Throw your bottle’’) in similar contexts. The outcome is a linguistic construction that contains a minimum of one lexical item and one ‘‘slot’’ (Throw X). Type frequency can thus be used to quantify how appropriate it is to generalize over a set of similar utterances. One problem with type frequency, however, is that it does not take into account the frequency distribution of the words filling a given slot. For example, if a child hears the sequence ‘‘Throw your bottle’’ 118 times and ‘‘Throw the ball’’ and ‘‘Throw teddy’’ only once each, then we might not expect the same degree of productivity with a potential ‘‘Throw X’’ construction as if all three sequences had been heard 40 times each (although the type frequency would have been three in both cases). In the former ‘‘unequal’’ case, the child will always expect to hear ‘‘your bottle’’ after ‘‘throw’’ and thus might not detect any potential for productivity. In the latter ‘‘equal’’ case, the child will be uncertain as to which of three possible options will occur and therefore might be more likely to form a productive slot. 
The intuitive difference between these situations can be quantified with a measure of the entropy (Shannon & Weaver, 1949) of the slot, an index of the uncertainty about which of all the possible words that could fill a slot is most likely to occur (see also Hale, 2006; Keller, 2004; Levy, 2008; Moscoso del Prado Martín, Kostić, & Baayen, 2004; Moscoso del Prado Martín, Kostić, & Filipović-Djurdjević, unpublished data). This entropy, which we will refer to as slot entropy, can be calculated as follows, where X is a slot, each x is a word that appears in that slot, and p(x) is the probability of seeing each x in that position:

H(X) = − Σ_{x∈X} p(x) log2 p(x)

In the above example, then, the entropy in the unequal case is 0.14 and in the equal case it is 1.58. Following the same reasoning as for type frequency, children should be more competent at producing an unfamiliar sequence when it is an instantiation of a pattern for which a concrete alternative is maximally unpredictable (a pattern with high slot entropy). For example, given two highly frequent utterances, ‘‘Back in the box’’ and ‘‘Let’s have a look,’’ that differ in the slot entropy for the final word position (in the corpus we used, the slot entropy for ‘‘Back in the X’’ was 5.31, for ‘‘Let’s have a X’’ it was 1.24), children should be more likely to accept unfamiliar versions of the sequence that has the greater slot entropy than the sequence with lower slot entropy (e.g., the unfamiliar sequence ‘‘Back in the town’’ should be easier to produce than the unfamiliar sequence ‘‘Let’s have a think’’). Thus, the degree to which children will be willing to extract and utilize an invariant pattern will depend on the entropy of its slot(s). We predict, then, that a child will extract a productive pattern (identify a frame and a slot) where there is high entropy. However, the problem is not as simple as determining where there is and is not a slot.
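As a concrete illustration, the slot entropy of the ‘‘Throw X’’ example above can be computed as follows (a minimal Python sketch, not the authors’ code):

```python
import math

def slot_entropy(counts):
    """Shannon entropy (in bits) of the distribution of words filling a slot.

    counts maps each word to the number of times it filled the slot.
    """
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# The "Throw X" example: same type frequency (3), different distributions.
unequal = {"your bottle": 118, "the ball": 1, "teddy": 1}
equal = {"your bottle": 40, "the ball": 40, "teddy": 40}

print(round(slot_entropy(unequal), 2))  # 0.14
print(round(slot_entropy(equal), 2))    # 1.58
```

Note that the two distributions have the same type frequency (three fillers) but very different entropies, which is precisely the contrast the measure is designed to capture.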
Children also face the problem of predicting what is allowed to appear there—forming expectations about not only the exact words seen in a particular position but also the kind of words to be seen. That is, children should have expectations concerning whether a given word or phrase will be seen in a particular position based on its similarity to the words that have been seen there before. Our target sequences were designed to investigate the effect of latent classes—grouping of similar words—on children’s developing knowledge. The idea that speakers have knowledge about how words are similar to other words is of course very widely accepted in linguistic theory—it is the basis for syntactic categories. How exactly they determine this similarity is, however, not so clear. One way in which words are similar to other words is in the similarity of the words or concepts to which they are used to refer. However, although we know that human infants are remarkably good at generalizing across stimuli that are similar (e.g., Shepard, 1987), gauging effects of semantic similarity is notoriously difficult because of the lack of a widely accepted theory of mental representation and semantic cognition. Another way in which words display similarities and dissimilarities is in their distribution relative to other words (Harris, 1964). Learners also seem to be able to exploit this information. For example, it has been shown in an artificial grammar learning study (Gomez & Lakusta, 2004) that children are able to infer similarity between words from the contexts in which they occur (see also Monaghan & Christiansen, 2008 for an extensive investigation of how children might cluster words together using a number of probabilistic phonological cues). In this study, we do not attempt to distinguish between these two sources of similarity. 
We employ distributional information and operationalize similarity between words by calculating the overlap in their contexts as they occur in a corpus of child-directed speech. However, we cannot be sure whether this measure is the basis that children use to infer similarity. It has long been acknowledged that distributional and semantic similarity are likely to be highly intercorrelated, and that words that have similar meanings will occur in similar contexts (see Landauer & Dumais, 1997 for a broad overview of the distributional approach to meaning). In this experiment, we are concerned simply with whether the children exploit the similarity in inferring lexical patterns from the input, and not with the origin of their detection of that similarity. Our second prediction, then, is that children will be more likely to detect the potential for productivity in a four-word sequence and be better at repeating novel instantiations of it when the relevant position has tended to be filled with (semantically or distributionally) similar items. We measure the similarity of the items that have been seen to go into particular slots by looking at how similar the contexts in which they appear are. For all words found in our slots we look at the words that occur two words before and two words after the item in a large corpus of child-directed speech. We record the number of times that each word in the vocabulary occurs within this window. This then gives us a co-occurrence vector for each word, with each entry in the vector representing a dimension in a multidimensional space (where the dimensions are the vocabulary of the language). The similarity between any two words is then taken to be the cosine of the angle between those two vectors (a value between 0 and 1 with higher values indicating greater similarity).
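The construction of co-occurrence vectors over a ±2-word window, and the cosine between them, can be sketched as follows (a simplified Python illustration using an invented three-utterance toy corpus, not the study’s corpus):

```python
import math
from collections import Counter

def cooccurrence_vectors(corpus, window=2):
    """For every word, count the words occurring within `window`
    positions before or after it (the +/-2-word window in the paper)."""
    vectors = {}
    for sentence in corpus:
        for i, word in enumerate(sentence):
            vec = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[sentence[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (0..1)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Invented toy corpus: "box" and "case" occur in identical contexts.
corpus = [
    "put it back in the box now".split(),
    "put it back in the case now".split(),
    "back in the town we went".split(),
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["box"], vecs["case"]))  # near 1: identical contexts
print(cosine(vecs["box"], vecs["went"]))  # 0.0: no shared context words
```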
In order to calculate the overall cohesiveness of a slot (i.e., the homogeneity or the semantic density of the words previously seen to fill it), we obtained the mean pairwise similarity of each word that occurred in that slot to each other word that occurred there. We call this measure slot semantic density and calculate it for the final position slot, X, of each sequence containing N different words as follows:

SemanticDensity(X) = (1 / (N² − N)) Σ_{x∈X} Σ_{y∈X, y≠x} cos(x, y)

If children are sensitive to the semantic density of a slot, then they might find it easier to produce unfamiliar versions of a four-word sequence if the final slot has both high entropy and high semantic density. For example, given two highly frequent utterances with high slot entropy, ‘‘Back in the box’’ and ‘‘A piece of toast,’’ that differ in the semantic density for the final word position (for ‘‘Back in the X’’ the semantic density is 0.63, for ‘‘A piece of X’’ the semantic density is 0.39), children might be more likely to accept unfamiliar versions of the sequence that has the greater semantic density than the sequence with lower semantic density. Of course, whether such an effect of semantic density holds may depend on the nature of the final word in the unfamiliar sequence. Thus, variants of ‘‘Back in the box’’ might only be easy to repeat if the final word is semantically similar to other words attested in that slot (e.g., containers like ‘‘case’’ or ‘‘fridge’’). In order to test this we selected items that were semantically similar (‘‘case’’) or dissimilar (‘‘town’’) to words seen in the relevant position in the corpus (see the Method section for details). We refer to the former kind of word as semantically ‘‘typical’’ and the latter kind as semantically ‘‘atypical.’’ We thus predicted that unfamiliar sequences would be easier to repeat if they were versions of a construction with high slot entropy and high semantic density and if the final word were semantically typical for that slot.
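A sketch of the density computation, using the same cosine measure (the word vectors here are invented toy values for illustration, not corpus-derived):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(a * a for a in u.values())) * \
           math.sqrt(sum(a * a for a in v.values()))
    return dot / norm if norm else 0.0

def semantic_density(slot_fillers, vectors):
    """(1 / (N^2 - N)) * sum over ordered pairs x != y of cos(x, y):
    the mean pairwise similarity of the N distinct words seen in a slot."""
    n = len(slot_fillers)
    total = sum(cosine(vectors[x], vectors[y])
                for x in slot_fillers for y in slot_fillers if y != x)
    return total / (n * n - n)

# Invented toy vectors: container words share dimensions; "town" does not.
vectors = {
    "box":    {"open": 3, "inside": 2},
    "case":   {"open": 2, "inside": 3},
    "fridge": {"open": 3, "inside": 1},
    "town":   {"visit": 4, "drive": 1},
}
dense = semantic_density(["box", "case", "fridge"], vectors)
sparse = semantic_density(["box", "case", "town"], vectors)
print(dense > sparse)  # True: the container-only slot is more cohesive
```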
Our predictions for these unfamiliar sequences rest on the expectation that the child should not have often uttered them before (if at all) and that they should accordingly be processed as generalizations. To further investigate this proposal, we also tested familiar sequences that could in principle be retrieved directly from memory. Our predictions here were more speculative. We have previously found (Bannard & Matthews, 2008) that children are better at repeating sequences of words that they have frequently encountered before, and it is not clear how having formed a generalization over similar sequences might affect their facility with such familiar instances. One might predict that highly frequent stored sequences will be unaffected by the presence of related items. On the other hand, the possibility of generalizations might actually inhibit the production of familiar word sequences, so that high-frequency items that instantiate low-entropy patterns might be expected to be more fluently produced than their high-entropy counterparts.1 The effect of semantic density on well-integrated familiar sequences could also plausibly be beneficial or detrimental, as having many semantically similar neighbors could presumably either inhibit or enhance production of the sequence (cf. Magnuson, Mirman, & Strauss, 2007). Note that, as the final word of a familiar sequence is likely to be semantically typical, we did not vary this factor for familiar sequences. To summarize, in the current study we analyzed the properties (slot entropy and semantic density) of four-word patterns that had a lexically specified three-word stem plus a final slot (we henceforth refer to these as schemas) as observed in a large database of British English child-directed speech.
We tested how these properties affected children’s ability to reproduce unfamiliar variants of these schemas and checked whether these effects were mediated by the semantics of the final word in the unfamiliar target. We also checked whether these same properties would affect the repetition of familiar sequences (although these could not be fully matched to the unfamiliar sequences for all control variables). We tested children’s ability to comprehend and produce the 27 sequences given in Table 1 by playing them recordings and asking them to repeat them.

2. Method

2.1. Participants

Fifty-nine normally developing, monolingual, British English-speaking children were included in the study (32 boys). There were twenty-eight 2-year-olds (range 2;3–2;10, mean age 2;7) and thirty-one 3-year-olds (range 3;1–3;7, mean age 3;4). A further 18 children were tested but not included because of fussiness or inaudible responding. The children were tested in a university laboratory in the United Kingdom or in a quiet room in their day care center.

2.2. Materials and design

The stimuli for each child consisted of nine triplets of four-word sequences. These sequences were selected using a child language corpus, the largest available to us,
containing the speech directed to one child, Brian, between the ages of 2 and 5 years, recorded in Manchester, UK (Max Planck Child Language Corpus: 1.72 million words of maternal speech).

Table 1
Stimulus sequences and their properties

Sequence               Familiarity   Slot Entropy   Semantic Density   Typicality of Fourth Word
Out of the water       High          6.17           0.58               Typical
Out of the liquid      Low           6.17           0.58               Typical
Out of the pudding     Low           6.17           0.58               Atypical
Back in the box        High          5.31           0.63               Typical
Back in the case       Low           5.31           0.64               Typical
Back in the town       Low           5.31           0.64               Atypical
A piece of toast       High          5.16           0.39               Typical
A piece of meat        Low           5.16           0.39               Typical
A piece of brick       Low           5.16           0.39               Atypical
Have a nice day        High          4.37           0.46               Typical
Have a nice hour       Low           4.37           0.46               Typical
Have a nice meal       Low           4.37           0.46               Atypical
It’s time for lunch    High          3.78           0.40               Typical
It’s time for soup     Low           3.78           0.40               Typical
It’s time for drums    Low           3.78           0.40               Atypical
A bowl of cornflakes   High          2.83           0.37               Typical
A bowl of biscuits     Low           2.83           0.37               Typical
A bowl of flowers      Low           2.83           0.37               Atypical
What a funny noise     High          2.11           0.46               Typical
What a funny sound     Low           2.11           0.46               Typical
What a funny cup       Low           2.11           0.46               Atypical
You bumped your head   High          2.10           0.60               Typical
You bumped your leg    Low           2.10           0.60               Typical
You bumped your toy    Low           2.10           0.60               Atypical
Let’s have a look      High          1.24           0.46               Typical
Let’s have a see       Low           1.23           0.46               Typical
Let’s have a think     Low           1.23           0.46               Atypical

We chose to look at four-word sequences because previous studies have demonstrated that these are sufficiently long to elicit variance in participants’ performance in a repetition task (Bannard & Matthews, 2008; Valian & Aubry, 2005). We extracted all repeated sequences of words from the corpus using the method described in Yamamoto and Church (2001) and discarded all sequences that formed a question (as children might be tempted to answer a question rather than repeat it).
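The grouping of four-word sequences by their shared three-word stem can be sketched as follows (a simplified Python illustration on an invented toy input; the study itself used the suffix-array method of Yamamoto and Church, 2001):

```python
from collections import Counter, defaultdict

def four_word_schemas(utterances):
    """Group all four-word sequences by their first three words (the stem),
    counting how often each fourth word fills the final slot."""
    schemas = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.split()
        for i in range(len(words) - 3):
            stem = tuple(words[i:i + 3])
            schemas[stem][words[i + 3]] += 1
    return schemas

# Invented toy child-directed input, for illustration only.
speech = [
    "put it back in the box",
    "back in the case it goes",
    "back in the box again",
]
schemas = four_word_schemas(speech)
print(schemas[("back", "in", "the")])  # Counter({'box': 2, 'case': 1})
```

From such a table of slot-filler counts, the slot entropy and semantic density defined in the introduction can be computed for each stem.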
Applying this filter meant that our most frequent item was ‘‘I don’t know what,’’ which occurred 260 times (a natural log frequency of 5.56). Our log frequency range was then taken to be 0–5.56. We next identified all sequences of four words that began with the same first three words (had the same schema). We identified all schemas for which at least one instantiation was in the top two-thirds of the log frequency range (so that we would have at least one familiar example for later use). For each of these schemas, we calculated the slot entropy and slot semantic density for the fourth word position, as outlined in the introduction. We then ordered these schemas according to slot entropy and identified items that spanned the range of observed values. The second key factor that we wish to explore in this paper is the impact of semantic density, and thus it was important that we cross this with slot entropy in our stimuli. For this purpose we put the items into high, medium, and low slot entropy bands (for the purposes of item selection only; slot entropy was treated as a continuous variable in all our analyses) and for each band we selected schemas that spanned the range of possible semantic density values as much as possible. Our need to meet all of these criteria meant that we had little freedom in selecting the stimuli. Thus, it was not possible to select schemas of a particular syntactic type or types. The effect on learning that we hypothesize the factors of slot entropy and semantic density to have might be expected to interact with the child’s developing knowledge of syntactic types or categories (they might, e.g., expect differing degrees of semantic flexibility for a slot in a noun phrase than in a verb phrase). Nonetheless, we would predict that their effect should be seen across syntactic types.
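The banding used for item selection (slot entropy remained a continuous predictor in all analyses) might be sketched like this, using the nine schemas and entropy values from Table 1:

```python
def entropy_bands(schemas, n_bands=3):
    """Sort schemas by slot entropy and split them into equal-sized
    high / medium / low bands (used for item selection only)."""
    ranked = sorted(schemas, key=lambda s: s["entropy"], reverse=True)
    size = -(-len(ranked) // n_bands)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

# The nine schemas and their slot entropies, as reported in Table 1.
schemas = [
    {"stem": "out of the", "entropy": 6.17},
    {"stem": "back in the", "entropy": 5.31},
    {"stem": "a piece of", "entropy": 5.16},
    {"stem": "have a nice", "entropy": 4.37},
    {"stem": "it's time for", "entropy": 3.78},
    {"stem": "a bowl of", "entropy": 2.83},
    {"stem": "what a funny", "entropy": 2.11},
    {"stem": "you bumped your", "entropy": 2.10},
    {"stem": "let's have a", "entropy": 1.24},
]
bands = entropy_bands(schemas)
print([[s["stem"] for s in band] for band in bands])
```

Within each band, schemas were then chosen to span as wide a range of semantic density values as possible.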
We therefore chose to select the items that maximize the spread of our key predictors, leaving the impact of syntactic type to be considered in our statistical analysis. The distribution of the items across our key predictor variables can be seen in Fig. 1. Our items reflect multiple syntactic types. One might, for example, divide the stimulus set into prepositional phrases (back in the X, out of the X), noun phrases (a bowl of X, a piece of X), and sentences (you bumped your X, what a funny X, let’s have a X, have a nice X, it’s time for X). While certain syntactic types appear to cluster together here (e.g., the prepositional phrases), there is no absolute correlation between syntactic type and our factors. We will later explore the impact of this grouping (which is detailed again in Appendix B for convenience of reference) in our data analysis.

Fig. 1. Distribution of test items.

Having identified our schemas, we then obtained one familiar sequence (seen in the corpus with reasonable frequency) and two unfamiliar sequences (plausible sequences that were nonetheless unseen in our corpus) for each schema. The familiar sequence was obtained from the top two-thirds of the overall log frequency range of four-word sequences. However, it is important to note that it was not always the most frequent instantiation of the schema, and that the schemas were rarely dominated by any one sequence (the highest frequency instantiations of each of our schemas accounted for a mean of 36% of instances). On average, our selected high-frequency items accounted for 31% of the instantiations of the schema.
In order to create two unfamiliar items for each schema, we used the WordNet Lexical database v2.1 (Fellbaum, 1998) to identify one word that was highly similar to the final word of the selected familiar sequence and one that was semantically dissimilar from it (all nouns cited in the appropriate sense in The Oxford English Dictionary, 1989). WordNet is an IS-A hierarchy (in the sense that an apple IS-A fruit), created in the psychology department at Princeton University, that represents semantic relations between English words. Within WordNet, our unseen typical words were in all cases a maximum of five nodes away from the seen words (the threshold on similar pairs proposed by Hirst & St-Onge, 1998). In two cases, the unseen word was a direct hypernym of the seen word (water => liquid) or vice versa (noise => sound). In another two cases, the two words were linked by a direct hypernym of both words (box => container <= case; day => time unit <= hour), and in all other cases except one (lunch => meal => nutriment <= dish <= soup) they were linked via a node that was an immediate hypernym of one of the pair (e.g., toast => bread => baked goods => food <= meat). We refer to the former, similar items as ‘‘typical’’ and the latter as ‘‘atypical.’’ In order to verify the typicality or otherwise of these words for each given schema, we obtained human judgments as to their similarity to the words seen in the schema over the corpus (see Appendix S3 for details). For all but one of the schemas the typical word was judged to be more similar (on average) to the items seen in the schema over the corpus than was the atypical word. Pairs of typical and atypical words were matched for their length in syllables and, as far as possible, their frequencies (see Appendix A). As mentioned above, for each of the nine schemas, we attempted to control for differences in the fourth word frequencies as far as possible (the first three words were identical).
However, it was not possible to match the frequency of the final word, bigram, or trigram of the unfamiliar items with the familiar items. Similarly, it was not possible to control the frequency of component words, bigrams, or trigrams across different schemas. As we would expect these component frequencies to affect children’s ability to repeat sequences, we factored their effect out by including them as predictors in all regression models. The 10 frequency counts for each four-word sequence (i.e., the frequency of the four-word sequence and its four component words, three component bigrams, and two component trigrams) are given in Appendix A. To allow us to evaluate the impact of all these separate frequencies without introducing multicollinearity into our models, we reduced the counts to orthogonal dimensions using principal components analysis. We did this separately for the familiar and unfamiliar items as they were intended to be used in separate analyses. We retained all factors with eigenvalues greater than 1, which left us with four components for the unfamiliar items (accounting for 95% of the total variance) and three components for the familiar items (accounting for 93% of total variance). A fuller description of this procedure and a discussion of the loadings for the selected components can be found in Appendix S1. To summarize, this procedure gave us, for each schema, one familiar (high-frequency) sequence and two unfamiliar (unseen) sequences. One of the unfamiliar sequences had a final word that was semantically similar to the familiar item observed in this position in our corpus (unfamiliar, typical) and the other had a dissimilar final word (unfamiliar, atypical). The final 27 stimulus sequences and their properties are presented in Table 1.
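The reduction of correlated frequency counts to orthogonal components, retaining those with eigenvalues greater than 1 (the Kaiser criterion), can be sketched as follows. This is a generic PCA illustration on invented data, not the study’s actual counts:

```python
import numpy as np

def pca_kaiser(freq_counts):
    """PCA on a matrix of correlated counts (rows = items, columns =
    frequency measures), keeping components with eigenvalue > 1."""
    X = np.asarray(freq_counts, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    # Eigendecomposition of the correlation matrix of the counts.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                       # Kaiser criterion
    scores = Z @ eigvecs[:, keep]              # orthogonal component scores
    explained = eigvals[keep].sum() / eigvals.sum()
    return scores, explained

# Invented toy data: 6 items x 4 highly correlated counts.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 1))
counts = np.hstack([base + 0.1 * rng.normal(size=(6, 1)) for _ in range(4)])
scores, explained = pca_kaiser(counts)
print(scores.shape[1], round(explained, 2))  # few components, high share
```

The retained component scores are uncorrelated with one another, so they can all enter a regression model without introducing multicollinearity.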
All sequences were read by a female British English speaker with normal declarative intonation and recorded in a soundproof booth onto a computer disk with a sampling frequency of 44,100 Hz using SoundStudio v.3 (Freeverse, New York, NY, USA). To ensure that the first three words of all matched sequences were identical, we took one sequence as a base and created the matched pair by splicing in the final word using the open-source Audacity software v.1.2.4. We used randomly selected familiar sequences, unfamiliar typical sequences, and unfamiliar atypical sequences as bases for a third of the items each. To ensure that sequences of the same schema type were not encountered in close succession, test items were presented in three blocks of nine items, with each block containing one of the variants of a schema, in one of two fixed orders (one the reverse of the other), such that each of the three sequences belonging to the same schema was always nine items apart. All three blocks contained an equal number of familiar and unfamiliar sequences and typical and atypical items. These blocks were presented in six orders, with order of presentation counterbalanced for each age group.

2.3. Procedure

The experimenter, E, sat with the child at a table in front of a computer (the child either sat alone or on a parent’s knee). E produced a picture of a tree with several stars in the branches and explained they would cover each star with a parrot sticker. E explained that, to get the stickers, they needed to listen to what the computer would say and then say the same thing. Every time they did so, part of a cartoon parrot would appear on the computer. Once they could see the whole parrot (which appeared every three trials), they would get a parrot sticker. E proposed to have a go first. She then clicked on a mouse to play the first of six example sequences and repeated the sequence.
She repeated this for the next two example sequences, at which point a full parrot was visible and so E awarded herself a sticker before offering the child a turn. The final three example sequences were used for the child to practice the procedure. E helped the child or replayed the practice sound files once each if necessary. Each time the child had attempted to repeat three sequences s/he was given a sticker. E then played the test sequences in exactly the same manner except that no help was given, no sound files were replayed, and E did not help the child repeat anything. If the child did not spontaneously repeat a sequence after a reasonable delay, E prompted the child once (saying Can you say that?). If the child did not then respond, or if anything other than this prompt came between the stimulus sequence and the repetition, the response was excluded. Responses were also excluded if the child did not hear the stimulus sequences (e.g., if the child spoke unexpectedly as the sound file played). In total, 148 of a possible 1,593 responses were excluded. The procedure continued until all 27 sentences were repeated. Responses were recorded onto computer disk using Audacity recording software.

2.4. Transcription and error coding

Each word in each sequence was coded for the presence or absence of the errors in Table 2. (The use of such criteria was found in previous studies to improve coder accuracy in comparison to a procedure where coders directly coded the accuracy of each whole three-word stem as correct or incorrect.) If the child did not make a single error on the first three words of the sequence, this sequence was coded as correctly repeated; otherwise it was incorrect. We did not consider errors made on the fourth word as our focus here was on the child’s competence with the schema and we wished to minimize the impact of the phonetic details of the novel item.
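The coding decision itself is simple: a response counts as correctly repeated iff no error code applies to any of the first three words. As a sketch (the 0/1 error flags here are invented for illustration):

```python
def stem_correct(word_errors):
    """Correct repetition iff no error on any of the first three words;
    errors on the fourth (novel) word are ignored."""
    return not any(word_errors[:3])

print(stem_correct([0, 0, 0, 1]))  # True: only the novel fourth word errs
print(stem_correct([0, 1, 0, 0]))  # False: error on a stem word
```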
If a child did not respond to an item, it was discarded along with the other items in that schema. Two research assistants blind to the hypotheses of the experiment transcribed and coded all the children's responses from audio files. Agreement between these coders was good (agreement: 82%, Cohen's kappa = 0.62). A third research assistant, also blind to the hypotheses of the experiment, checked all cases in which the first two coders did not yield identical coding for a word, listened to the relevant response, and resolved the discrepancy.

Table 2
Error codes used for children's responses

Code              Error
Repetition        Whole word or one syllable of the word is repeated.
Deletion          Whole word is missing.
Insertion         Insertion of a word or isolated phonetic material before the target word.
Substitution      Target word substituted for a different word.
Mispronunciation  Target word is missing a phoneme, has a phoneme inserted, or is a morphological variant of the target word (e.g., ''bump'' instead of ''bumped'' in ''you bumped your head'').

Note. Missing phonemes that yielded a pronunciation compatible with adult speech and regional dialect (e.g., ''back int box,'' which is acceptable in northern England) were not scored as errors. The pronunciation of ''the'' as ''de'' was also accepted.

3. Results

All of the children attempted to repeat the vast majority of items (1,445 observations in total). The 2-year-olds correctly repeated the first three words of 21% of the unfamiliar sequences and 30% of the familiar sequences. The 3-year-olds correctly repeated the first three words of 49% of the unfamiliar sequences and 54% of the familiar sequences. As noted in the method, this apparent frequency effect may stem from the frequency of the four-word sequences or of their component words, bigrams, or trigrams (because these counts are highly correlated).
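The intercoder agreement statistic reported above (Cohen's kappa = 0.62) corrects raw agreement for agreement expected by chance. The sketch below computes it from a coder-by-coder agreement table; the counts used are hypothetical and chosen only so that raw agreement matches the 82% reported, not the study's actual coding data.

```python
# Cohen's kappa from a 2x2 agreement table between two coders.
# The counts below are hypothetical, for illustration only.
import numpy as np

def cohens_kappa(table):
    """Kappa for an agreement (confusion) matrix between two raters."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_observed = np.trace(table) / n                          # proportion of agreements
    p_expected = (table.sum(0) * table.sum(1)).sum() / n**2   # chance agreement from marginals
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical example: coders agree on 82 of 100 codes
table = [[50, 8], [10, 32]]
print(round(cohens_kappa(table), 2))  # → 0.63
```

Kappa values near 0.6 are conventionally read as substantial agreement, consistent with the authors' characterization of reliability as good.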
We will not discuss frequency effects here but rather include in all models the four frequency scores derived through principal components analysis (see Appendix S1). Because these confounds need to be factored out before the effect of our factors of interest can be usefully observed, we do not present raw data here. To investigate the relationship between correct repetition of the first three words of a sequence and the factors of current interest, we fitted mixed effects logistic regression models to the data using the Laplace approximation (Baayen, 2008; Baayen, Davidson, & Bates, 2008; Dixon, 2008; Gelman & Hill, 2007; Jaeger, 2008). The outcome variable in all models was whether the first three words of a sequence were correctly repeated (1) or not (0). Child (N = 59) was added to all models as a random effect on the intercept in order to account for individual differences. We also ran models with additional random effects for the nine schema types and for the 27 final words of the sequences, but the variance for these factors was always extremely low (standard deviation always <0.001). We therefore did not include the schema and item variables in our reported analyses. Including these random effects did not change the statistical outcome of the results, and models with item and/or schema included as random effects provided a substantially poorer fit to the data (a substantially higher AIC score) than models including our selected fixed effect predictors. Taken together, these findings indicate that item differences other than the manipulated or controlled variables had minimal impact on the children's performance. Finally, for all models, we tried introducing the syntactic type of the schema into the model as a random effect. We discuss the impact of this on our models below. All noncategorical predictors were centered by calculating the mean for the variable and subtracting it from each value.
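The structure of the model just described (a fixed-effect part plus a per-child random intercept feeding a Bernoulli outcome) can be made concrete by simulating data from it. This is a sketch of the assumed data-generating process only, not the authors' fitting procedure; the coefficient values are borrowed loosely from Table 3 for illustration, and the predictor distributions are invented.

```python
# Sketch of the data-generating model behind a mixed-effects logistic
# regression: fixed effects plus a per-child random intercept, with a
# Bernoulli (0/1 repetition) outcome. Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_children, items_per_child = 59, 24
beta = {"intercept": -0.67, "age": 0.84, "slot_entropy": 1.00}

child_intercepts = rng.normal(0.0, 1.03, n_children)  # SD as reported for model 1
age = rng.integers(0, 2, n_children)                  # hypothetical age indicator
records = []
for c in range(n_children):
    entropy = rng.normal(0.0, 1.0, items_per_child)   # centered continuous predictor
    eta = (beta["intercept"] + child_intercepts[c]
           + beta["age"] * age[c] + beta["slot_entropy"] * entropy)
    p = 1.0 / (1.0 + np.exp(-eta))                    # inverse logit
    records.append(rng.random(items_per_child) < p)   # correct repetition (True/False)

outcomes = np.concatenate(records)
print(outcomes.shape)  # (1416,)
```

Fitting such a model in practice would use a dedicated mixed-models package (the paper's analyses follow the approach described in Baayen, 2008); the simulation above only shows where the child-level random intercept enters the linear predictor.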
In Appendix S2, we report an extensive analysis of the relationships between our predictors, looking for sources of multicollinearity, and conclude that we can be confident in the analyses presented here. Putting the control variables into our model allowed us to examine the effect of the following manipulated variables:

1. Age (2 or 3 years old)
2. Slot entropy (continuous)
3. Semantic density (continuous)
4. Final word typicality (typical or atypical)

The principal question of interest is whether these factors affect children's ability to repeat unfamiliar sequences. We therefore first fitted a model to the repetition data for the novel sequences. We added each of these variables to the model in order to examine their predictive value over and above our controls. We use likelihood ratio tests to compare nested models and Akaike's information criterion (AIC) values to compare nonnested models. We also report McFadden's log-likelihood ratio index (LLRI; McFadden, 1974) as a measure of the practical significance of the differences between models.2 First of all, age was found to lead to a significant improvement in fit when added to a model with only our controls as predictors (χ2(1) = 23.3, p < .0001, LLRI = 0.021). Further, adding slot entropy substantially improved the fit of the model (χ2(1) = 8.25, p < .005, LLRI = 0.008). Adding semantic density to a model containing our controls and age did not lead to a significant improvement in fit (χ2(1) = 1.15, p = .284, LLRI = 0.001), and the composite model had a higher AIC than the model including the controls, age, and slot entropy, indicating that slot entropy has greater predictive value than semantic density.
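The nested-model comparisons reported throughout this section rest on three quantities: the likelihood-ratio chi-square statistic, AIC, and a pseudo-R2 index. The sketch below computes them from hypothetical log-likelihood values (the LLRI shown is the common McFadden-style formulation computed against the reduced model, which may differ in detail from the paper's partial index).

```python
# Likelihood-ratio test, AIC, and a McFadden-style pseudo-R2 for nested
# model comparison. Log-likelihood values are hypothetical.
from scipy.stats import chi2

def lr_test(ll_reduced, ll_full, df_diff):
    """Chi-square statistic and p-value for comparing nested models."""
    stat = 2 * (ll_full - ll_reduced)
    return stat, chi2.sf(stat, df_diff)

def aic(ll, k):
    """Akaike's information criterion for a model with k parameters."""
    return 2 * k - 2 * ll

def llri(ll_reduced, ll_full):
    """Proportional reduction in log-likelihood (a partial pseudo-R2)."""
    return 1 - ll_full / ll_reduced

# Hypothetical fits: adding one predictor raises the log-likelihood
ll_base, ll_plus = -560.0, -555.875
stat, p = lr_test(ll_base, ll_plus, df_diff=1)
print(round(stat, 2))                 # chi2(1) = 8.25
print(round(llri(ll_base, ll_plus), 4))
```

With a chi-square statistic of 8.25 on 1 degree of freedom, the p-value falls below .005, matching the form of the slot-entropy comparison reported above.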
However, a model including the controls, age, slot entropy, and semantic density had a significantly better fit than a model containing only the controls, age, and slot entropy (χ2(1) = 4.6, p < .05, LLRI = 0.004), indicating that semantic density does have predictive value (once slot entropy is accounted for) and accounts for additional variance over and above that accounted for by slot entropy. The addition of the typicality of the test item as a predictor offered no significant improvement in fit over a model that contained only the controls and age (χ2(1) = 0.31, p = .578, LLRI < 0.001). Similarly, it did not improve fit for models that additionally contained slot entropy (χ2(1) = 0.58, p = .455, LLRI < 0.001), semantic density (χ2(1) = 0.25, p = .614, LLRI < 0.001), or both (χ2(1) = 0.54, p = .462, LLRI < 0.001), indicating that it had no predictive value. We similarly found that including the human typicality ratings (see Appendix S3) as a continuous predictor gave no improvement in fit when added to a model containing the controls plus age (χ2(1) = 0.17, p = .676, LLRI < 0.001) or when we additionally added slot entropy (χ2(1) = 0.67, p = .41, LLRI < 0.001), semantic density (χ2(1) = 0.29, p = .59, LLRI < 0.001), or both (χ2(1) = 1.59, p = .21, LLRI = 0.002). In Table 3, we report the parameters of a model (model 1) that contained all controls and experimental variables. This had a significantly better fit to the data than a baseline model that included only the random effect of child, the control principal components, and age as predictors (χ2(3) = 13.36, p = .004, LLRI = 0.013). For this model, the estimated intercepts for the children varied with a standard deviation of 1.03. Age, slot entropy, and semantic density were all significant (positive) predictors, whereas typicality (included here as a categorical predictor; the same pattern was obtained when including the mean human judgments) was not.
These results reflect the fact that 2-year-olds were more likely to make errors than 3-year-olds and that schemas with higher slot entropy and higher semantic density were more likely to be correctly repeated.

Table 3
Fixed effects in model 1 fitted to data for unfamiliar sequences

                                 HPD Intervals
                      B      Lower    Upper     SE      Z     p-Value
(Intercept)         -0.67    -1.28    -0.09    0.28   -2.37    .018
Frequency PC1       -0.11    -0.43     0.14    0.14   -0.81    .42
Frequency PC2        0.31     0.09     0.55    0.12    2.69    .007
Frequency PC3        0.99     0.45     1.52    0.26    3.74   <.001
Frequency PC4       -0.09    -0.30     0.13    0.10   -0.85    .397
Age                  0.84     0.54     1.22    0.16    5.27   <.001
Slot entropy         1.00     0.41     1.57    0.29    3.44   <.001
Semantic density     0.23     0.02     0.45    0.11    2.17    .030
Typicality          -0.12    -0.44     0.19    0.16   -0.75    .455

Note. Concordance between the predicted probabilities and the observed responses, C = 0.838. Somers' Dxy (rank correlation between predicted probabilities and observed responses) = 0.676 (cf. Baayen, 2008).

These results are thus consistent with the predictions that the ability to reproduce unseen forms will be greater when (a) children have less specific expectations about what should occur in the final word position and (b) the items previously attested in the final word position are more semantically homogeneous. In addition to our estimated maximum-likelihood parameters, we also report a Bayesian analysis (as recommended by Baayen et al., 2008) in which we approximate the full posterior distribution using Gibbs sampling. All model parameters were sampled from normal distributions with noninformative priors (see section 17.4 of Gelman & Hill, 2007, for BUGS code for a similar mixed-effects logistic regression model). We show the lower and upper bounds of the 95% highest posterior density (HPD) intervals for each model parameter. This interval covers 95% of the posterior probability and provides a measure of uncertainty.
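A 95% HPD interval of the kind reported in Table 3 can be approximated from MCMC draws as the narrowest interval containing 95% of the samples. The sketch below is generic and uses simulated draws (a normal posterior loosely resembling the age coefficient), not the paper's actual posterior samples.

```python
# 95% highest posterior density (HPD) interval from posterior samples:
# the narrowest interval that contains 95% of the draws. The draws here
# are simulated for illustration, not the paper's actual posterior.
import numpy as np

def hpd_interval(samples, mass=0.95):
    s = np.sort(np.asarray(samples))
    n = len(s)
    m = int(np.ceil(mass * n))           # number of draws the interval must cover
    widths = s[m - 1:] - s[: n - m + 1]  # widths of all candidate intervals
    i = np.argmin(widths)                # pick the narrowest one
    return s[i], s[i + m - 1]

rng = np.random.default_rng(0)
draws = rng.normal(loc=0.84, scale=0.16, size=20_000)  # hypothetical posterior
lo, hi = hpd_interval(draws)
print(lo > 0)  # True: the interval excludes zero, as for age in Table 3
```

For a symmetric posterior the HPD interval coincides with the central 95% interval; its advantage appears with skewed posteriors, where it remains the shortest interval of the required mass.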
That this interval does not cross 0 for age, slot entropy, or semantic density gives us further confidence that they are useful positive predictors of repetition performance. To test for possible interactions between experimental factors, we ran a more complex variant of model 1 adding all two-way interactions between age, slot entropy, semantic density, and typicality, again fitting the model to the data for the low-frequency sequences. This model was not a significant improvement on model 1 (χ2(6) = 8.44, p = .208, LLRI = 0.008) and did not reveal any significant interactions. Simpler variants of model 1, adding only the interaction between either age and slot entropy or age and semantic density, also did not give any significant improvement in fit over model 1 or reveal any significant interactions, suggesting that children from both age groups were similarly affected by these factors. Finally, we wanted to explore what impact the syntactic type of the frame might have. We did this by adding syntactic class into our model as a random effect on the intercept, using the classification found in Appendix B. Adding this to a baseline model including only the control variables and age resulted in a significant improvement in fit (χ2(1) = 5.1, p = .024, LLRI = 0.005). However, this model had a higher AIC value than model 1 (indicating that model 1 offers a better fit to the data). Furthermore, a model with child and syntactic type as random effects on the intercept plus the control variables, age, slot entropy, semantic density, and typicality as fixed effects gave a significant improvement in fit over a model including child and syntactic class as random effects with only the control variables and age as fixed effects (χ2(3) = 8.1, p < .05, LLRI = 0.008).
Revealingly, when we added syntactic class as a fixed effect to model 1, there was no improvement in fit (χ2(1) = 1.54, p = .22, LLRI = 0.001), suggesting that the variance accounted for by syntactic class is a subset of that accounted for by our predictors. In summary, our predictors were seen to have significant explanatory value over and above that provided by the pooling of variance by syntactic class, and the analysis offers strong support for the view that they apply across phrases of different syntactic types. Having considered how the properties of a four-word sequence affect the repetition of unfamiliar sequences, an additional question of interest is whether slot entropy and semantic density also affect the production of highly familiar word sequences. As it is very difficult to predict whether high-frequency items would benefit or not from high entropy (see Introduction), this analysis was more exploratory. We again investigated the value of the various predictors via model comparison. Adding age to a model including only the controls again resulted in a significant improvement in fit (χ2(1) = 16.59, p < .0001, LLRI = 0.029). Unlike for the unfamiliar sequences, adding slot entropy to the model including the controls and age resulted in no improvement in fit (χ2(1) = 1.54, p = .214, LLRI = 0.003). However, the addition of semantic density to the model did result in a significant improvement in fit (χ2(1) = 9.28, p < .005, LLRI = 0.016). Unlike for the unfamiliar sequences, this did not depend on the inclusion of slot entropy: A model containing slot entropy and semantic density in addition to the controls plus age offered no improvement in fit over one including the controls, age, and semantic density alone (χ2(1) = 0.19, p = .660, LLRI < 0.001).
Table 4 reports the parameters for a model (model 2) fitted to the data for the high-frequency items and including all the predictors (except, of course, typicality, which was not varied for high-frequency sequences). Age was again a significant positive predictor, with 2-year-olds being more likely to make mistakes in repetition. Semantic density was found to be a significant positive predictor, meaning that children were more likely to successfully reproduce a high-frequency sequence if the words that are typically seen in the last position of the schema are highly similar. Slot entropy was not found to be a significant predictor. A model including two-way interactions between age, slot entropy, and semantic density was not found to be an improvement over model 2 (χ2(3) = 3.88, p = .275, LLRI = 0.007), and no interactions were found to be significant. The same applied for simpler models including any combination of two-way interactions together or in isolation. We again also performed a Markov chain Monte Carlo analysis and report HPD intervals for the model's parameters. Finally, we again wanted to explore what impact the syntactic type of the frame might have. We did this once more by adding syntactic class into our model as a random effect on the intercept, using the classification found in Appendix B. Adding this to a baseline model including only the control variables and age did not result in a significant improvement in fit (χ2(1) = 0.008, p = .931, LLRI < 0.001). Adding syntactic class to model 2 as a random effect resulted in no change in fit. Furthermore, model 2 had a much smaller AIC score (569.6) than a model containing only the controls, age, and syntactic class as a random effect (576.2). Thus, unlike for the unfamiliar sequences, the syntactic class of the sequence seemed to have no effect on the children's ability to produce the sequence.
Table 4
Fixed effects in model 2 fitted to data for familiar sequences

                                 HPD Intervals
                      B      Lower    Upper     SE      Z     p-Value
(Intercept)         -0.46    -0.85    -0.12    0.17   -2.74    .006
Frequency PC1       -0.47    -0.75    -0.21    0.13   -3.56   <.001
Frequency PC2        0.54     0.32     0.80    0.11    4.81   <.001
Frequency PC3      <-0.01    -0.48     0.47    0.24    0.05    .969
Age                  0.72     0.37     1.15    0.17    4.30   <.001
Slot entropy         0.11    -0.38     0.64    0.25    0.45    .652
Semantic density     0.38     0.68     1.53    0.13    2.88    .004

Note. Concordance between the predicted probabilities and the observed responses, C = 0.845; Somers' Dxy = 0.689.

4. Discussion

The current experiment set out to test whether the distributional properties of simple four-word schemas (as estimated using a large corpus of child-directed speech) would affect how accurately unfamiliar versions of the schemas would be repeated by young children. One prediction was that the less certain a child is as to how a sequence will end given the statistics of maternal input (i.e., the greater the slot entropy), the more likely he or she will be to form a basic generalization and hence the easier he or she will find it to produce an unfamiliar sequence. This indeed appears to be the case. Children in both age groups were better able to reproduce unfamiliar sequences with higher slot entropy. The semantic properties of a slot also affect the ease of repetition of unfamiliar sequences. The more semantically similar the items that are likely to have been previously heard in a slot, the easier it was for children to repeat an unfamiliar variant of that schema. The patterns used in our experiment spanned syntactic phrase types, and we found in our statistical analysis that slot entropy and semantic density had predictive value over and above syntactic class, suggesting that they affect learning across phrase types.
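Slot entropy, as used here, measures uncertainty about the final-word slot of a schema: the Shannon entropy of the distribution of words observed in that position. The sketch below illustrates the quantity with hypothetical corpus counts; the paper's own estimation procedure (from the child-directed speech corpus) is not reproduced here.

```python
# Shannon entropy of the final-word slot of a three-word stem,
# estimated from corpus counts. The counts below are hypothetical.
import math

def slot_entropy(counts):
    """H = -sum p * log2(p) over the final-word distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A stem whose fourth word is nearly fixed...
fixed = {"toast": 95, "bread": 5}
# ...versus a stem with a genuinely open slot
open_slot = {"toast": 30, "cake": 25, "paper": 25, "cheese": 20}
print(slot_entropy(fixed) < slot_entropy(open_slot))  # True
```

On the account developed above, the higher-entropy schema is the one in which a child is more likely to have detected a substitutable slot, and hence the one whose unfamiliar variants should be easier to repeat.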
In contrast to our predictions, we observed no effect of the typicality of the final word in the unfamiliar sequences (assessed using both a categorical distinction based on the WordNet hierarchy and human judgments) and no interaction between the semantic density of the slots in our schemas and the typicality of our items (suggesting that producing a sequence that ended in a word that did not fit the semantics of the slot was apparently no harder when that slot was semantically very constrained). As an anonymous reviewer pointed out, this finding can be seen as consistent with a construction-based approach. That is, while the properties of the elements seen in a slot should affect the identification of a schema/construction at the point of learning, once a construction has been created, an open slot in a good construction should be able to take any word of the appropriate category. While such an explanation is plausible, we hesitate to explain our findings in this way. We see any sharp distinction between patterns in language that are constructions and those that are not, as might be implied in the usage-based literature, as an idealization for descriptive convenience rather than a strong claim about mental representation. We prefer to think of the learner as identifying very many patterns in the input which continue to compete for utilization, with the specific distributional properties of a pattern remaining an important part of the representation rather than being discarded once a decision has been made to put a given schema ''in the grammar.'' Additionally, we suspect that the lack of a typicality effect can be explained by aspects of our study design, as we now discuss. Our measure of typicality was based on evaluating the similarity of individual words not seen in the schema to individual words seen in the schema, without considering the impact of context.
It is possible, then, that our manipulation of typicality was not effective because, in creating items whose meaning matched particular observed items, we introduced a degree of unnaturalness to the ''typical'' stimuli that may have disguised any effects. In finding matches, we were also forced to use low-frequency words in some cases, so it could be that the children were not that familiar with the semantics and semantic relatedness of some of our items (e.g., the similarity of ''box'' to ''case'' and ''water'' to ''liquid'' might not be transparent for a 2-year-old). Alternatively, in judging typicality or atypicality with reference to the particular familiar item used, we may not have picked up on confounding similarities to, and distances from, other words that can appear in the schemas. In brief, much further work is required to develop developmentally plausible measures of semantics before we can draw any firm conclusions about typicality effects. Further work on semantics would also allow us to clarify the beneficial effect of semantic density, which is potentially controversial. In the terminology of the usage-based tradition, slot formation can be seen as an instance of category formation on the basis of functionally based distributional analysis (cf. Tomasello, 2003, p. 124). That is, children should have expectations concerning what words or phrases they are going to see in a particular position based on the functions of the words that have been seen there before. This led us to predict a positive effect of semantic density (and typicality). On the other hand, many theorists have proposed that semantic openness would benefit productivity (e.g., Bybee, 1995; Goldberg, 2006). Our current findings of an effect of semantic density, but not of typicality or of an interaction between the two, do not sit easily with either account.
Further investigation will be required to pull this apart, but the current results certainly suggest that a degree of semantic coherence aids repetition even in the absence of a semantic link between the target sentence and the construction semantics. While the main focus of our study was on generalization and hence on children’s repetition of the unfamiliar items, we also asked children to repeat a single instantiation of each schema that occurred with some frequency in our corpus and hence with which we could expect the children to be familiar. The purpose of this was to investigate whether schema properties affect processing even in circumstances where a sequence could in principle be retrieved directly from memory. There was no effect of slot entropy on the repetition of familiar items, suggesting that the children are employing a different route to production for such items. We did, however, observe an effect of semantic density on the repetition of familiar items, which would suggest the opposite. Further work will be required to clarify why we see an effect of one factor but not the other. As we noted in the introduction and explain further in the method, it could simply be that the relationship between the frequency of the familiar string and the entropy of the schema is not a straightforward one in our stimuli. Further testing with more items would of course help to clarify this, but, as we now discuss, expanding our list of items is not straightforward. In the current study, we were able to identify items that were dispersed over the range of slot entropy and semantic density values. However, doing so left us little freedom in choosing items. The fact that many of the factors that are considered to contribute to children’s language learning are difficult to isolate in this way is not only a practical problem. 
It also shows how different factors overlap in the input (sometimes supporting one another and sometimes conflicting with one another), and thus it emphasizes the gap between the kind of idealized problems children face in artificial grammar learning experiments and those they face in learning language. Bridging this gap will almost certainly require conducting more experiments of the current variety. Doing so will allow us to investigate phonological, syntactic, and semantic factors that we were not able to control in the current study. A final limitation of the current study stems from the nature of the repetition task. It is usually assumed that when asked to repeat an utterance children analyze the utterance and then generate it as they would in ordinary speech. The task thus draws on comprehension and production skills in turn. Failure to repeat the utterance might therefore reflect difficulty understanding it, difficulty articulating it, or both. Complementary methodologies are required to further clarify when in processing the effects we report take hold. Alternative methods would also allow us to explore the task specificity of the current effects. For example, it could be that the test situation leads the child to be more conservative or more careful than he or she would be in normal speech, which might explain, for example, our failure to find an effect of typicality. So what are the broader implications of the current study for language learning? In previous work (Bannard & Matthews, 2008) we showed that sequences of sounds that are heard with little variation in the input are likely (as predicted by many findings in the word segmentation literature) to be identified as units of language that are candidates for words or holophrases, with direct reuse of such sequences from the input being preferred where available and frequent.
In the present paper, we have shown that if such sequences occur with some points of variation, then the possibility of forming productive morphosyntactic slots arises and becomes more likely if the slot fillers form coherent categories. Unfamiliar sequences that match the resulting, partially abstract schemas will be processed more fluently (cf. Buchner, 1994; Pothos, 2007, for similar effects of fluency in artificial grammar learning). This proposal is in line with a growing literature on ''variation sets''—successive utterances in child-directed speech that have partial lexical overlap (Küntay & Slobin, 1996; Onnis, Waterfall, & Edelman, 2008). These studies suggest the effects observed in the current study arise because many of the three-word stems will have occurred in variant forms in quick succession in the input. The processes of learning we have sketched here are arguably most consistent with constructivist approaches to language development (e.g., Edelman, 2007; Goldberg, 2006; Tomasello, 2003). On such accounts, grammatical development occurs in a piecemeal fashion, with early knowledge consisting of sequences of words taken directly from the input with limited generalization across forms. In the present study, we have provided evidence that children's ability to produce novel sequences of words can be predicted from their previous experience with overlapping sequences, and that this holds for 3-year-olds as well as for 2-year-olds. We note, however, that this does not rule out the likely possibility that children this young might be quite adept at producing syntactic structures even in the absence of exposure to many directly overlapping forms. Rather, this finding demonstrates that children are sensitive to statistical regularities in their language that are plausibly relevant to learning about syntactic structure. We find this question of learnability more interesting than the question of when precisely children show abstraction of a given syntactic structure (cf.
Pulvermüller & Knoblauch, 2009, for a recent attempt at a neurally plausible account of the acquisition of a simple combinatorial grammar in which abstraction and learnability sit happily together). We should also note that even if highly abstract syntactic structures are in principle available to the child, it is not obvious that the child should prefer to store or use them. Indeed, we would suggest that lexically specific representations are unlikely to be just a ladder to abstract syntax, to be kicked away once learning is complete. Rather, they might be expected to form part of any rational agent's model of the language he or she is trying to learn. A rational learner will want to find the model that assigns a high probability to the exact data observed and reduces the probability of other possible sets of data (see chapter 28 of MacKay, 2003, for a detailed Bayesian approach to model comparison of this kind). Abstract models by their very nature are less tied to particular data and can be used to generate a larger set of possible data. All else being equal, we would expect a rational learner to reuse the input as much as possible even once he or she has acquired additional competence. Although it is by no means clear whether a psychologically plausible model of language learning will reveal children to be rational in this sense, this might explain why we see these kinds of lexically specific representations relatively late in development, even once more abstract representations can be expected to have emerged. So what linguistic theories might account for our data? The idea that speakers store and use sequences of specific words has been acknowledged by all models of syntax and is not exclusive to usage-based accounts. All theories, after all, need to account for the presence in language of idiomatic phrases. Where theories differ is in how such phrases fit into their account.
Early generative accounts regarded idioms as simply an extension of a lexicon that was very much separate from the core grammatical processes, and they were argued to obtain meaning ‘‘in the manner of a lexical item rather than as a projection from the meanings of its constituents in the manner of compositional complex constituents...’’ (Katz, 1973; p. 358). It has come to be acknowledged that the kind of phrases that reoccur with frequency and that appear not to be the result of a fully abstract generative process is rather larger than earlier theorists had supposed (Jackendoff, 1995). Furthermore, the distinction between grammar and lexicon has come to be regarded as unsustainable in many contemporary generative models where information about how words can combine is a part of lexical entries, with composition occurring via uniform operations (e.g., Bresnan, 2001; Croft, 2001; Goldberg, 1995; Pollard & Sag, 1994; I. A. Sag, unpublished data; Steedman, 2000). There has been a growing awareness that multiword sequences interact with syntactic and semantic phenomena in a way that makes a dual-route model in which they are stored separately untenable (e.g., Nunberg, Sag, & Wasow, 1994), and word sequences have come to be acknowledged as integrated with core grammatical processes (e.g., Culicover & Jackendoff, 2005; Jackendoff, 2002). While our findings are incompatible with an account of syntactic competence which draws a strict distinction between memory-based processing at the word level and procedural processing for grammar (Ullman, 2001), they could, it seems, be accounted for by any model of syntax in which sequence-specific processing is given a role. However, it is important to consider that accounting for the behavior observed here requires any such theory to be somewhat liberal in deciding what sequences will be stored. 
The integration of sequence- or construction-level representations and processes into theories of grammatical competence has been motivated by the observation that there are sentences that cannot otherwise be accounted for. Arguments for this have tended to be based on the syntactic or semantic nature of the phrase and its incompatibility with general compositional or productive processes. We see no reason to believe that the patterns which we use in our study are syntactically or semantically idiosyncratic. The explanation for the children's having pattern-specific representations seems rather to be purely a matter of their distribution. This fact is easiest to accommodate within a usage-based approach, where linguistic knowledge is made up of pairings of function with form at any point on the lexically specific/syntactically abstract continuum. If we agree that constructions have some psychological primacy across the life span, then this study makes a contribution in suggesting what factors would lead to their identification. However, regardless of how we want to characterize the end point of learning, the results here favor the acceptance of a model of syntactic competence in which lexically specific processing plays a substantial role at the ages of 2 and 3.

Notes

1. This prediction is complicated by the fact that familiar sequences may vary in the degree to which they are an expected completion of a known pattern. For some items lower entropy might be especially beneficial. This would be the case if one had a strong expectation about what would come next and the high-frequency sequence fulfilled that expectation. However, it is possible that, although highly frequent, some items would not be the most expected for a child, and for these there may be a degree to which higher entropy is better.

2.
The log likelihood ratio index indicates the proportion of the variance explained by the more complex model that is accounted for by the predictors of interest. It can be interpreted as a partial pseudo-R2 value (see Veall & Zimmermann, 1996).

Acknowledgments

The authors would like to thank Jess Butcher, Ellie O’Malley, Manuel Schrepfer, and Elizabeth Wills for help in data collection and coding; Harald Baayen and Roger Mundry for statistical advice; and Bruno Estigarribia, Adele Goldberg, and Julian Pine for helpful comments on the manuscript. This research was supported by postdoctoral fellowships awarded to both authors by the Max Planck Institute for Evolutionary Anthropology, Leipzig.

References

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics. Cambridge, England: Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Bannard, C., & Matthews, D. E. (2008). Stored word sequences in language learning: The effect of familiarity on children’s repetition of four-word combinations. Psychological Science, 19(3), 241–248.
Braine, M. (1976). Children’s first word combinations. Monographs of the Society for Research in Child Development, 41(1, Serial No. 164).
Bresnan, J. (2001). Lexical-functional syntax. Malden, MA: Blackwell.
Buchner, A. (1994). Indirect effects of synthetic grammar learning in an identification task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(3), 550–566.
Bybee, J. (1985). Morphology: A study of the relation between meaning and form. Amsterdam: John Benjamins.
Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10(5), 425–455.
Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective.
Oxford, England: Oxford University Press.
Culicover, P. W., & Jackendoff, R. (2005). Simpler syntax. Oxford, England: Oxford University Press.
Dixon, P. (2008). Models of accuracy in repeated-measures designs. Journal of Memory and Language, 59(4), 447–456.
Edelman, S. (2007). Behavioral and computational aspects of language and its acquisition. Physics of Life Reviews, 4, 253–277.
Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.
Freudenthal, D., Pine, J. M., Aguado-Orea, J., & Gobet, F. (2007). Modelling the developmental pattern of finiteness marking in English, Dutch, German and Spanish using MOSAIC. Cognitive Science, 31, 311–341.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel ⁄ hierarchical models. Cambridge, England: Cambridge University Press.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.
Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford, England: Oxford University Press.
Gomez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13(5), 431–436.
Gomez, R. L., & Lakusta, L. (2004). A first step in form-based category abstraction by 12-month-old infants. Developmental Science, 7(5), 567–580.
Gomez, R. L., & Maye, J. (2005). The developmental trajectory of nonadjacent dependency learning. Infancy, 7(2), 183–206.
Hale, J. (2006). Uncertainty about the rest of the sentence. Cognitive Science, 30(4), 609–642.
Harris, Z. (1964). Distributional structure. In J. Fodor & J. Katz (Eds.), The structure of language: Readings in the philosophy of language (pp. 33–49). Englewood Cliffs, NJ: Prentice Hall.
Hirst, G., & St-Onge, D. (1998). Lexical chains as representations of context for the detection and correction of malapropisms. In C. Fellbaum (Ed.), WordNet: An electronic lexical database (pp. 305–332). Cambridge, MA: MIT Press.
Jackendoff, R. (1995). The boundaries of the lexicon. In M. Everaert, E. Van der Linden, A. Schenk, & R. Schreuder (Eds.), Idioms: Structural and psychological perspectives (pp. 133–165). Hillsdale, NJ: Lawrence Erlbaum Associates.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford, England: Oxford University Press.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
Johnson, E. K., & Tyler, M. D. (in press). Testing the limits of artificial language learning. Developmental Science.
Katz, J. (1973). Compositionality, idiomaticity and lexical substitution. In S. Anderson & P. Kiparsky (Eds.), A Festschrift for Morris Halle (pp. 392–409). New York: Holt, Rinehart and Winston.
Keller, F. (2004). The Entropy Rate Principle as a predictor of processing effort: An evaluation against eye-tracking data. Paper presented at the Conference on Empirical Methods in Natural Language Processing, Barcelona.
Kempe, V., Brooks, P. J., Mironova, N., Pershukova, A., & Fedorova, O. (2007). Playing with word endings: Morphological variation in the learning of Russian noun inflections. British Journal of Developmental Psychology, 25(1), 55–77.
Küntay, A. C., & Slobin, D. (1996). Listening to a Turkish mother: Some puzzles for acquisition. In D. Slobin, J. Gerhardt, A. Kyratis, & T. Guo (Eds.), Social interaction, social context and language: Essays in honor of Susan Ervin-Tripp (pp. 265–286). Hillsdale, NJ: Erlbaum.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211–240.
Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3), 1126–1177.
Lieven, E. V. M., Pine, J. M., & Baldwin, G. (1997).
Lexically based learning and early grammatical development. Journal of Child Language, 24, 187–219.
MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. Cambridge, England: Cambridge University Press.
Magnuson, J. S., Mirman, D., & Strauss, T. (2007). Why do neighbors speed visual word recognition but slow spoken word recognition? 13th Annual Conference on Architectures and Mechanisms for Language Processing.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Monaghan, P., & Christiansen, M. H. (2008). Integration of multiple probabilistic cues in syntax acquisition. In H. Behrens (Ed.), Corpora in language acquisition research (pp. 139–163). Amsterdam: John Benjamins.
Moscoso del Prado Martín, F., Kostić, A., & Baayen, H. (2004). Putting the bits together: An information theoretical perspective on morphological processing. Cognition, 94(1), 1–18.
Nunberg, G., Sag, I. A., & Wasow, T. (1994). Idioms. Language, 70, 491–538.
Onnis, L., Waterfall, H. R., & Edelman, S. (2008). Learn locally, act globally: Learning language from variation set cues. Cognition, 109(3), 423–430.
The Oxford English Dictionary. (1989). Available at http://www.oed.com.
Pelucchi, B., Hay, J. F., & Saffran, J. (2009). Statistical learning in a natural language by 8-month-old infants. Child Development, 80(3), 674–685.
Pine, J. M., & Lieven, E. V. M. (1997). Slot and frame patterns and the development of the determiner category. Applied Psycholinguistics, 18, 123–138.
Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. Chicago: University of Chicago Press.
Pothos, E. M. (2007). Theories of artificial grammar learning. Psychological Bulletin, 133(2), 227–244.
Potter, M. C., & Lombardi, L. (1990). Regeneration in the short-term recall of sentences. Journal of Memory and Language, 29(6), 633–654.
Pulvermüller, F., & Knoblauch, A.
(2009). Discrete combinatorial circuits emerging in neural networks: A mechanism for rules of grammar in the human brain? Neural Networks, 22(2), 161–172.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237(4820), 1317–1323.
Steedman, M. (2000). The syntactic process. Cambridge, MA: MIT Press.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Ullman, M. T. (2001). The declarative ⁄ procedural model of lexicon and grammar. Journal of Psycholinguistic Research, 30(1), 37–69.
Valian, V., & Aubry, S. (2005). When opportunity knocks twice: Two-year-olds’ repetition of sentence subjects. Journal of Child Language, 32(3), 617–641.
Veall, M. R., & Zimmermann, K. F. (1996). Pseudo-R2 measures for some common limited dependent variable models. Journal of Economic Surveys, 10(3), 241–259.
Yamamoto, M., & Church, K. W. (2001). Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus. Computational Linguistics, 27(1), 1–30.
Appendix A

Log frequencies of stimulus sequences and their component words, bigrams, and trigrams in a 1.72 million-word child language corpus

Sequence                Seq.   W1     W2     W3     W4     B1     B2     B3     T1     T2
Back in the box         4.16   8.19   9.80  11.12   7.31   6.27   8.87   6.57   5.82   5.46
Back in the case        0.00   8.19   9.80  11.12   4.98   6.27   8.87   2.08   5.82   1.10
Back in the town        0.00   8.19   9.80  11.12   4.19   6.27   8.87   3.61   5.82   2.64
Out of the water        2.48   8.44   9.62  11.12   7.19   7.13   8.09   5.89   6.57   3.18
Out of the liquid       0.00   8.44   9.62  11.12   3.58   7.13   8.09   2.30   6.57   0.00
Out of the pudding      0.00   8.44   9.62  11.12   3.30   7.13   8.09   0.69   6.57   0.00
A piece of toast        3.61  10.74   6.94   9.62   6.50   6.03   6.59   4.70   5.89   4.34
A piece of meat         0.00  10.74   6.94   9.62   3.14   6.03   6.59   1.10   5.89   0.00
A piece of brick        0.00  10.74   6.94   9.62   4.11   6.03   6.59   0.69   5.89   0.00
It’s time for lunch     2.20   9.35   7.57   8.88   6.53   5.11   4.68   4.84   4.01   2.40
It’s time for soup      0.00   9.35   7.57   8.88   3.00   5.11   4.68   0.00   4.01   0.00
It’s time for drums     0.00   9.35   7.57   8.88   2.08   5.11   4.68   0.00   4.01   0.00
A bowl of cornflakes    2.20  10.74   6.28   9.62   6.19   4.54   4.36   3.93   3.81   2.56
A bowl of biscuits      0.00  10.74   6.28   9.62   5.97   4.54   4.36   3.26   3.81   0.00
A bowl of flowers       0.00  10.74   6.28   9.62   6.21   4.54   4.36   3.50   3.81   0.69
Have a nice day         2.64   9.48  10.74   8.54   7.32   7.92   7.06   4.50   4.76   4.23
Have a nice hour        0.00   9.48  10.74   8.54   4.36   7.92   7.06   0.00   4.76   0.00
Have a nice meal        0.00   9.48  10.74   8.54   5.12   7.92   7.06   2.08   4.76   1.95
You bumped your head    3.66  11.15   5.12   9.43   6.80   4.36   4.47   6.14   4.04   4.08
You bumped your leg     0.00  11.15   5.12   9.43   4.89   4.36   4.47   3.18   4.04   0.00
You bumped your toy     0.00  11.15   5.12   9.43   5.61   4.36   4.47   3.26   4.04   0.00
What a funny noise      2.77   9.70  10.74   6.62   6.88   6.16   5.60   4.80   3.33   4.62
What a funny sound      0.00   9.70  10.74   6.62   6.08   6.16   5.60   1.10   3.33   0.69
What a funny cup        0.00   9.70  10.74   6.62   6.47   6.16   5.60   0.00   3.33   0.00
Let’s have a look       5.56   7.80   9.48  10.74   9.08   5.90   7.92   6.80   5.80   6.74
Let’s have a see        0.00   7.80   9.48  10.74   8.86   5.90   7.92   1.61   5.80   0.69
Let’s have a think      0.00   7.80   9.48  10.74   9.21   5.90   7.92   1.39   5.80   1.10

Column key: Seq. = sequence frequency; W1–W4 = 1st–4th word frequencies; B1–B3 = 1st–3rd bigram frequencies; T1–T2 = 1st–2nd trigram frequencies. All values are log frequencies.

Note. 1 was added to all frequencies before taking the logarithms to accommodate frequencies of 0. For all models reported in the body of the paper we also conducted alternative analyses in which PCs were built using log frequencies in which we added values at intervals between 0.00000000001 and 1. The pattern of results (the outcome of all model comparisons) was found to be the same regardless of the value added.

Appendix B

Syntactic types of stimulus sentences

Prepositional phrases
Back in the box | case | town
Out of the water | liquid | pudding

Noun phrases
A bowl of cornflakes | biscuits | flowers
A piece of toast | meat | brick

Sentences
You bumped your head | leg | toy
What a funny noise | sound | cup
Let’s have a look | see | think
Have a nice day | hour | meal
It’s time for lunch | soup | drums

Supporting Information

Additional Supporting Information may be found in the online version of this article:
Appendix S1: Principal components analysis for stimuli frequencies
Appendix S2: Checking our models for collinearity
Appendix S3: Obtaining human similarity judgments for evaluating sequence typicality

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
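Three quantitative notions recur in the notes and appendices above: the add-one log transform applied to the Appendix A frequencies, the entropy of the distribution over final-slot completions (Note 1), and the pseudo-R2 comparison of models (Note 2). A minimal sketch of each in Python follows. This is an illustrative reconstruction, not the authors' analysis code; the completion counts are invented toy data, the function names are ours, and the pseudo-R2 shown is the standard McFadden form (the partial variant of Note 2 compares two nested models in the same spirit).

```python
import math
from collections import Counter

def log_freq(count, add=1.0):
    """Log frequency with a constant added to the raw count, so that
    counts of 0 are accommodated (the Appendix A convention)."""
    return math.log(count + add)

def completion_entropy(counts):
    """Shannon entropy (in bits) of the distribution over words observed
    in the final slot of a three-word frame such as 'a piece of X'.
    Higher entropy = the 4th word is harder to predict from the frame."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mcfadden_pseudo_r2(ll_full, ll_null):
    """McFadden's pseudo-R2: proportional reduction in log likelihood
    achieved by the fuller model (cf. Veall & Zimmermann, 1996)."""
    return 1.0 - (ll_full / ll_null)

# Toy counts of words seen after a frame (invented for illustration)
completions = Counter({"toast": 50, "paper": 30, "cake": 15, "bread": 5})
print(round(completion_entropy(completions), 3))  # → 1.648
print(log_freq(0))  # → 0.0, since log(0 + 1) = 0
```

A frame whose completions are spread evenly would score higher on `completion_entropy` than one dominated by a single word, which is the sense in which the paper's positional-variability predictor distinguishes open slots from fixed ones.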