Proceedings of 12th Chinese Lexical Semantics Workshop

Ontogenetic Lexical Development in Taiwan Southern Min

Jane Tsay, Chin-Chuan Cheng
Graduate Institute of Linguistics, National Chung Cheng University; Graduate Institute of Teaching Chinese as a Second Language, Taiwan Normal University; Taiwan
[email protected]; [email protected]

Abstract: This is a corpus-based study of young children's ontogenetic lexical development in Taiwan Southern Min. A corpus containing longitudinal data of 14 young children, approximately one and a half to four years old, acquiring Southern Min Chinese as their first language was used to study vocabulary size and growth, and word-sense induction for superordinate categories. The vocabulary growth rate varied a little across individuals, but all of them acquired about 2,000 words at age 4. With regard to word senses, the children were able to relate senses of similar words, but they did not learn or explicitly form superordinate categories.

Keywords: Language acquisition, database, Southern Min, vocabulary size and growth, lexical semantic organization, superordinate categories.

1. Introduction

This is a corpus-based study of young children's vocabulary development or word acquisition. When we say that a child has acquired a word, it usually means that the child knows how the word is pronounced, what it means, and how it is used in larger linguistic units such as phrases, sentences, and discourse. In this paper we are interested in finding out the vocabulary size of young children in the first four years of life, as compared to that of adults. We are also interested in the vocabulary growth of Chinese-speaking children, as compared to that of English-speaking children. Moreover, we will examine the development of word senses in terms of basic-level words and superordinate categories in a lexical semantic hierarchy.

Cheng (1996, 2000, 2002a) has found that human cognition of linguistic symbols has an upper bound of 8,000 units (basic words). This is attested in works by both English and Chinese writers. If this is the case for adults, what is the vocabulary size of young children? This question has motivated the current study. For English-speaking children, Susan Carey claims that by age six the average child has learned over 14,000 words (including inflected and derived words) or 8,000 root words (Carey 1978:264; see also Miller and Gildea 1991). A related issue is the rate of vocabulary growth. Clark (1993) has reported the rate of learning new words in young children. We will compare the rate of vocabulary growth of Taiwanese children with that of children acquiring other languages.

2. Data Source

The data on young children's vocabulary development used in this study are drawn from the Taiwan Child Language Corpus (TAICORP, Tsay 2005). The corpus contains longitudinal data of young children (approximately one and a half to four years old) acquiring Southern Min Chinese (also called Taiwan Southern Min or Taiwanese in the literature) as their first language. The construction of this corpus was funded by the National Science Council, Taiwan.

The corpus is based on spontaneous speech of young children at play. All of the fourteen children, nine boys and five girls, who participated in the corpus project came from Southern Min speaking families in Minxiong township in Jiayi County of southern Taiwan. The basic information is listed in Table 1 below.

Table 1. Children in TAICORP

Name    Sex   Age         Sessions   Duration (minutes)
YDA     M     3;11-4;4    9          540
YCX     M     3;10-4;0    6          285
LJX     M     3;9-4;2     8          530
ZQM     M     2;9-4;6     30         1584
LMC     F     2;8-5;3     50         2045
YJK     M     2;6-2;6     2          105
CEY     F     2;1-3;10    37         1728
HBL     M     2;1-4;0     45         1889
LWJ     F     2;1-3;7     36         1777
WZX     M     2;1-4;3     44         1757
YSW     M     1;7-2;7     21         1210
TWX     F     1;5-3;6     44         1829
HYS     M     1;2-3;4     51         2280
LYC     F     1;2-3;3     48         2255
Total   9 boys, 5 girls   431        approximately 330 hours

Recordings were made through home visits on a regular basis, every two weeks for children under three years old and every three weeks for children older than three. The participants, in addition to the child, included the caretaker (parents or grandparents) and the observing linguist. Each recording is approximately 40-60 minutes in length and was transcribed into both Chinese characters and Southern Min romanization by the observer. The transcribed files were double-checked by two linguists.

The total recording time of the corpus is about 330 hours, consisting of 431 text files and the corresponding 431 sound files. The whole corpus is in CHILDES format, i.e., the format of the Child Language Data Exchange System (MacWhinney and Snow 1985, MacWhinney 1995).

3. Cumulative Vocabulary

As mentioned above, Cheng has found that human cognition of linguistic symbols has an upper bound of 8,000 units (basic words). The lexical entries in TAICORP are word-based. The total number of words produced by the children as collected in the corpus is 8,250. This is the cumulative vocabulary of 14 children aged between 1;2 and 5;3. For comparison, the cumulative vocabulary of adults (parents/grandparents of the child and/or the observer) in the corpus is 12,882. The mean length of utterance measured in words (MLU) is also given in Table 2.

Table 2. Cumulative Vocabulary of Children and Adults in TAICORP

            Lines      MLU     Tokens      Words (cumulative vocabulary)
Children    161,253    2.695   434,557     8,250
Adults      336,173    3.605   1,211,946   12,882
Total       497,426    3.150   1,646,503   14,172

Note that the combined vocabulary of the adults and the children (14,172 words) is smaller than the sum of their separate vocabularies, i.e., 8,250 words for the children and 12,882 words for the adults. This is because the vocabulary of the children overlaps largely with that of the adults.

In Table 3 we show the vocabulary size of seven children.

Table 3. Children’s cumulative vocabulary

Child   Sex   Age          Sessions   Duration (minutes)   Words
LYC     F     1;2 – 3;3    48         2255                 1417
HYS     M     1;2 – 3;4    51         2280                 1743
TWX     F     1;5 – 3;6    44         1829                 1942
LWJ     F     2;1 – 3;7    36         1777                 1439
CEY     F     2;1 – 3;10   37         1728                 1213
HBL     M     2;1 – 4;0    45         1889                 1866
WZX     M     2;1 – 4;0    41         1670                 2156

Overall, the vocabulary size of these young children before age four is smaller than (or just a little above) 2,000.

TAICORP does not have data from six-year-old children. The oldest child, LMC, had a vocabulary size of 1,974 lexical items at the age of 5;3, and the second oldest, ZQM, had 1,476 at the age of 4;6. These numbers seem quite small considering Susan Carey's claim that by age six the average English-speaking child has learned over 14,000 words (including inflected and derived words) or 8,000 root words.

The cross-linguistic comparison is tricky here. It seems reasonable to assume that children across languages should have similar vocabulary sizes. Although the children in TAICORP are much younger than 6;0 (most of the recordings were made before 4 years of age), it is hard to imagine, for example, how LMC's vocabulary would increase from less than 2,000 words to 14,000 in nine months.
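
The arithmetic behind Table 2 and behind the gap just described can be checked with a short sketch. It is illustrative only and rests on two assumptions that the text does not state outright: that MLU is obtained as word tokens divided by utterance lines, and that the function and constant names below (all hypothetical) are a convenient way to organize the figures. The numbers themselves are copied from Table 2 and from the LMC and Carey figures quoted above.

```python
"""Arithmetic checks on figures reported in the text (illustrative sketch only)."""


def age_in_months(age: str) -> int:
    """Convert a 'years;months' age such as '5;3' into months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def shared_words(size_a: int, size_b: int, size_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return size_a + size_b - size_union


def tokens_per_line(tokens: int, lines: int) -> float:
    """Assumed MLU definition: word tokens divided by utterance lines."""
    return tokens / lines


def monthly_gain_needed(current: int, target: int, age_now: str, age_then: str) -> float:
    """Average number of new words per month needed to reach `target`."""
    months = age_in_months(age_then) - age_in_months(age_now)
    return (target - current) / months


# Figures copied from Table 2.
CHILD_WORDS, ADULT_WORDS, ALL_WORDS = 8250, 12882, 14172
CHILD_TOKENS, CHILD_LINES = 434557, 161253
ADULT_TOKENS, ADULT_LINES = 1211946, 336173

if __name__ == "__main__":
    overlap = shared_words(CHILD_WORDS, ADULT_WORDS, ALL_WORDS)
    print("words used by both groups:", overlap)
    print("share of children's words also used by adults:",
          round(overlap / CHILD_WORDS, 3))
    print("children MLU:", round(tokens_per_line(CHILD_TOKENS, CHILD_LINES), 3))
    print("adults MLU:", round(tokens_per_line(ADULT_TOKENS, ADULT_LINES), 3))
    # LMC had 1,974 words at 5;3; Carey's figure is 14,000 words by 6;0.
    print("words per month needed:",
          round(monthly_gain_needed(1974, 14000, "5;3", "6;0"), 1))
```

Under these assumptions the two groups share 6,960 words, and LMC would need to add roughly 1,336 words a month for nine months to reach 14,000 by age six. Computed the same way, the pooled MLU would be about 3.31; the 3.150 in the Total row of Table 2 matches the simple average of the two group values instead.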

At this point, it is not clear why this big gap exists between these two groups of children. One confounding factor is the different morphological systems of the two languages. However, this factor alone does not seem enough to explain the gap. More data, especially comparable data, are needed before we can draw any conclusions.

Clark (1993) discusses two young children's vocabulary growth. One child, Keren (reported in Dromi 1987), produced up to 337 new words by 1;5.23, while the other child, Damon, produced up to 337 new words by 1;9.24. We compared the vocabulary growth of these two children with that of the two Taiwanese children LYC and HYS. We found that Keren reached 337 new words around 1;5, Damon around 1;9, LYC between 1;5 and 1;6, and HYS between 1;6 and 1;7, as shown in Table 4.

Table 4. Cumulative vocabulary and vocabulary growth

Child   Sex    Language   1;2   1;3   1;4   1;5   1;6   1;7   1;8   1;9
Keren   girl   Hebrew                       337
Damon   boy    English                                              337
LYC     girl   TSM        45    76    153   240   358   367   468   518
HYS     boy    TSM        60    76    91    165   275   460   522   611

Among the four children, Damon's development at this stage is slower than that of the others. However, the individual differences among the children seem to be within a reasonable range. Moreover, the child that started with a higher growth rate did not keep the lead all the way. For example, LYC had a larger vocabulary than HYS at 1;6, but HYS took the lead starting at 1;7.

4. Acquisition of Word Senses

When an adult native speaker of a language knows the meaning of a word, the person also knows other words that are in some way related to the senses of that particular word. Near synonyms are examples of word relations. Furthermore, synonyms may belong to a category of senses. Such a category is the superordinate term and may often have a word for its name.
For example, WordNet (Princeton University Cognitive Science Lab 2005) gives a sense of the word “hand” as follows, with synonyms and superordinate terms:

hand, manus, mitt, paw -- (the (prehensile) extremity of the superior limb)
  => extremity -- (that part of a limb that is farthest from the torso)
    => external body part -- (any body part visible externally)
      => body part -- (any part of an organism)
        => part, piece -- (a portion of a natural object)
          => thing -- (a separate and self-contained entity)
            => physical entity -- (has physical existence)
              => entity -- (perceived or known or inferred to)

The basic-level words are “hand, manus, mitt, paw” as near synonyms. The superordinate categories up the hierarchy are “extremity”, “external body part”, “body part”, “part, piece”, “thing”, “physical entity” and “entity”. We are interested in how and when a child acquires near synonyms and superordinate concepts or words.

We now return to TAICORP to examine the learning of the Southern Min words 手 “hand” and 跤 “foot”. In Standard Mandarin or Modern Chinese, the word corresponding to Southern Min 跤 is 腳. In our database of Modern Chinese word senses, the words 手 and 腳 are each given a sense hierarchy of superordinate terms as follows:

手 雙手 手爪 / 手 / 四肢 臂 手 腿 腳 / 全身 / 物
腳 足 趾 腳丫子 腳鴨子 / 腳 / 四肢 臂 手 腿 腳 / 全身 / 物

The semantic structure for 手 has the basic-level words consisting of the near synonyms 手 雙手 手爪. The next superordinate term is appropriately 手. Further up the hierarchy are the superordinate terms 四肢 臂 手 腿 腳 as a category, then 全身, and 物. The higher-level terms for 腳 share the superordinate terms for 手.

Chinese dialects share many common words and senses. Mandarin and Southern Min share the word 手. The word 腳 is said as 跤 in Southern Min. Moreover, colloquial Southern Min uses 跤手 for literary 四肢, and 臂 is rarely heard.
Thus we will change the terms in the hierarchy to make the sense structure of these two words for Southern Min as follows:

手 雙手 手爪 / 手 / 跤手 手 腿 跤 / 全身 / 物
跤 足 趾 / 跤 / 跤手 手 腿 跤 / 全身 / 物

Our goal here is to find out from the TAICORP data whether children younger than age 5 can learn the higher superordinate terms and have a structure of categories. In the child development literature, Carey (1985) states that children increasingly develop an ability to use superordinate categories as the basis for induction between ages 4 and 10. Our children were younger and did not belong to this age group. However, we cannot readily conclude that these children before age 4 have not formed some structure of word senses. Gelman and O'Reilly’s (1988) experimental studies suggest that preschool children assume that basic-level words share internal parts. Arias-Trejo and Plunkett (2009) state that infants have begun to develop semantic-associative links between lexical items as early as 21 months of age. Many of the studies of superordinate terms involved experiments and referred to superordinates without a formal semantic structure. In contrast, we use TAICORP and an explicit Chinese lexical semantic organization to pursue the issue here.

The following is from a recording session with the child TWX at age 2;2.25. First let us explain the codes used in the presentation of the dialogue. Codes for participants: CHI: the target child, INV: investigator, GRM: the child's grandma (caretaker). Notations in the gloss: CLS – classifiers, PT – particles, [m] – speech in Mandarin, yyy – unrecognizable speech.

First the investigator asked the child, “How many legs does the tiger have?”. The child playfully replied “five”. The conversation continued:

*INV: 啥物?
      what
      "What?"

*CHI: 四 支
      four CLS
      "Four legs."

…

*GRM: 伊 四 支 跤 有 啥?
      it four CLS foot/leg have what
      "Its four legs include what?"

*CHI: 有 這.
      have this
      "Have this."

*INV: henn
      PT (confirmation)
      "Yes."

*CHI: 手.
      hand
      "Hand(s)."

*GRM: 有 手 a. oo.
      have hand PT PT (doubt)
      "Have hand(s)?"

*CHI: hm.
      PT (confirmation)
      "Yes."

*INV: hoo, 伊 四 支 攏 跤 la.
      PT it four CLS all foot/leg
      "Oh. Its four (CLS) should be all legs."

*CHI: 伊 四 支 攏 跤.
      PT (correction) it four CLS all foot/leg
      "Its four (CLS) are all legs."

The child said some of the feet were hands 手. Apparently the child related 手 “hand” and 跤 “foot” semantically. While these two words were grouped in one category, the superordinate term 跤手 never occurred in her speech. Her occurrences of words are listed in Table 5.

Table 5. Words of 手 and 跤 in TWX’s speech with some dates omitted

Words: 手 手工 手機仔 手錶仔 放手 洗手 牛跤 四跤仔 灶跤 架跤 塗跤 跤 跤踏 跤踏車 樓跤 豬跤 褪赤跤
Ages: 2;1 2;2 2;3 3;0 3;1 3;2 3;3 3;5 3;6 3;11 4;1 4;2 4;3
Counts (cell positions not preserved): 3 5 7 6 1 1 1 4 1 1 1 3 1 1 3 5 2 5 1 3 2 3 1 4 2 11 1 2 1 3 1 4 4 1 4 7 1 1 2 1 3 3 1 1 1 2 3 3 1 1 1 1

As can be seen from the table, both 手 and 跤 were learned early, in this case at age 2;1. But these two words never appeared as a compound word or phrase to refer to the superordinate “limb”, which in Southern Min adult speech is 跤手. The recording of this child ended at age 4;3. We can therefore say that before age 4 superordinate terms are not readily observable. However, the absence of such a term does not necessarily imply that children before age 4 lack the mental ability to perceive senses as related. The data from TWX presented above indicate that she already has the category of “limb” in her mind.

5. Conclusions

To conclude, the vocabulary growth rate for all the Southern Min children varied a little, but all of them acquired about 2,000 words at age 4.
With regard to word senses, the children were able to relate senses of similar words, but they did not learn or explicitly form superordinate categories before age 4.

References

[1] Arias-Trejo, N. and K. Plunkett. 2009. Lexical–semantic priming effects during infancy. Philosophical Transactions of the Royal Society B 364:3633–3647.
[2] Carey, Susan. 1978. The child as word learner. In Morris Halle, Joan Bresnan, and George A. Miller, eds., Linguistic Theory and Psychological Reality 264-293. Cambridge, MA: MIT Press.
[3] Carey, Susan. 1985. Conceptual change in childhood. Cambridge, MA: Bradford
[4] Cheng, Chin-Chuan. 1998. 從計量理解語言認知 (Quantification for understanding language cognition). In Benjamin K. T’sou, Tom B. Y. Lai, Samuel W. K. Chan, and William S-Y. Wang, eds., 漢語計量與計算研究 (Quantitative and Computational Studies on the Chinese Language) 15-30. City University of Hong Kong.
[5] Cheng, Chin-Chuan. 2000. Frequently-used Chinese characters and language cognition. Studies in the Linguistic Sciences 30(1):107-118.
[6] Cheng, Chin-Chuan. 2002a. Language cognition and vocabulary learning. Selected Papers from the Eleventh International Symposium on Language Teaching/Fourth Pan Asia Conference 54-62. Taipei: English Teachers Association.
[7] Clark, Eve V. 1993. The Lexicon in Acquisition. Cambridge: Cambridge University Press.
[8] Dromi, Esther. 1987. Early lexical development. Cambridge: Cambridge University Press.
[9] Gelman, Susan A. and Anne Watson O'Reilly. 1988. Children's inductive inferences within superordinate categories: The role of language and category structure. Child Development 59:876-887.
[10] MacWhinney, Brian and Catherine Snow. 1985. The Child Language Data Exchange System. Journal of Child Language 12:271-296.
[11] MacWhinney, Brian. 1995. The CHILDES Project: Tools for Analyzing Talk. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates Inc. Publishers.
[12] Miller, George A. and Patricia M. Gildea. 1991. How Children Learn Words. In William S-Y. Wang, ed., The Emergence of Language Development and Evolution 150-158. New York: W. H. Freeman.
[13] Princeton University Cognitive Science Lab. 2005. WordNet Browser. Princeton: Princeton University.
[14] Tsay, Jane S. 2005. Taiwan Child Language Corpus (First Edition). National Chung Cheng University.