Proceedings of 12th Chinese Lexical Semantics Workshop
Ontogenetic Lexical Development in Taiwan Southern Min
Jane Tsay
Chin-Chuan Cheng
Graduate Institute of Linguistics, National Chung Cheng University; Graduate Institute of Teaching Chinese as a Second Language, Taiwan Normal University; Taiwan
Abstract: This is a corpus-based study of young children's ontogenetic lexical development in Taiwan Southern Min. A corpus containing longitudinal data from 14 young children, approximately one and a half to four years old, acquiring Southern Min Chinese as their first language was used to study vocabulary size and growth, and word-sense induction for superordinate categories. The vocabulary growth rate varied slightly across individuals, but all of the children had acquired about 2,000 words by age 4. With regard to word senses, the children were able to relate the senses of similar words, but they did not learn or explicitly form superordinate categories.
Keywords: Language acquisition, database, Southern Min, vocabulary size and growth, lexical semantic
organization, superordinate categories.
1. Introduction
This is a corpus-based study of young children's vocabulary development, or word acquisition. When we say that a child has acquired a word, we usually mean that the child knows how the word is pronounced, what it means, and how it is used in larger linguistic units such as phrases, sentences, and discourse. In this paper we are interested in the vocabulary size of young children in the first four years of life, compared with that of adults. We are also interested in the vocabulary growth of Chinese-speaking children, compared with that of English-speaking children. Moreover, we examine the development of word senses in terms of basic-level words and superordinate categories in a lexical semantic hierarchy.
Cheng (1998, 2000, 2002a) has found that human cognition of linguistic symbols has an upper bound of 8,000 units (basic words), a bound attested in the works of both English and Chinese writers. If this is the case for adults, what is the vocabulary size of young children? This question motivated the current study. For English-speaking children, Susan Carey claims that by age six the average child has learned over 14,000 words (including inflected and derived words), or 8,000 root words (Carey 1978: 264; see also Miller and Gildea 1991). A related issue is the rate of vocabulary growth. Clark (1993) has reported the rate at which young children learn new words, and we compare the vocabulary growth of Taiwanese children with that of children acquiring other languages.
2. Data Source
The data on young children's vocabulary development used in this study are drawn from the Taiwan Child Language Corpus (TAICORP, Tsay 2005). The corpus contains longitudinal data from young children (approximately one and a half to four years old) acquiring Southern Min Chinese (also called Taiwan Southern Min or Taiwanese in the literature) as their first language.
The construction of this corpus was funded by the National Science Council, Taiwan. The corpus is based on the spontaneous speech of young children at play. All fourteen children who participated in the corpus project, nine boys and five girls, came from Southern Min-speaking families in Minxiong Township, Jiayi County, in southern Taiwan. Their basic information is listed in Table 1 below.
Table 1. Children in TAICORP
Name   Sex  Age        Sessions  Duration (minutes)
YDA    M    3;11-4;4   9         540
YCX    M    3;10-4;0   6         285
LJX    M    3;9-4;2    8         530
ZQM    M    2;9-4;6    30        1584
LMC    F    2;8-5;3    50        2045
YJK    M    2;6-2;6    2         105
CEY    F    2;1-3;10   37        1728
HBL    M    2;1-4;0    45        1889
LWJ    F    2;1-3;7    36        1777
WZX    M    2;1-4;3    44        1757
YSW    M    1;7-2;7    21        1210
TWX    F    1;5-3;6    44        1829
HYS    M    1;2-3;4    51        2280
LYC    F    1;2-3;3    48        2255
Total  9 boys, 5 girls 431       approximately 330 hours
Recordings were made through regular home visits, every two weeks for children younger than three and every three weeks for children older than three. In addition to the child, the participants included the caretaker (a parent or grandparent) and the observing linguist. Each recording is approximately 40-60 minutes long and was transcribed by the observer into both Chinese characters and Southern Min romanization. The transcribed files were double-checked by two linguists. The total recording time of the corpus is about 330 hours, comprising 431 transcript files and 431 corresponding sound files. The whole corpus is in CHILDES (Child Language Data Exchange System) format (MacWhinney and Snow 1985, MacWhinney 1995).
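As a rough illustration of what CHILDES format means in practice: CHAT transcripts put each utterance on a main-tier line that begins with an asterisk, a three-character speaker code, and a colon (as in *CHI: or *INV:), while header lines start with @ and dependent tiers with %. The Python sketch below extracts speaker codes and word tokens from such lines. It is not the TAICORP or CLAN tooling; it assumes that words on the main tier are separated by spaces and that terminators such as "." and "?" stand as separate tokens, and the transcript fragment is hypothetical.

```python
import re
from collections import Counter

# A CHAT main-tier line: "*" + three-character speaker code + ":" + utterance.
MAIN_TIER = re.compile(r"^\*([A-Z0-9]{3}):\s*(.*)$")
# Tokens not counted as words (an assumed, deliberately small subset of CHAT
# symbols): terminators, a comma, and codes for unintelligible or
# untranscribed speech.
NON_WORDS = {".", "?", "!", ",", "xxx", "yyy", "www"}


def parse_main_tiers(lines):
    """Yield (speaker, word_tokens) for each main-tier line.

    Header lines (@...) and dependent tiers (%...) do not match and are skipped.
    """
    for line in lines:
        match = MAIN_TIER.match(line)
        if match is None:
            continue
        speaker, text = match.groups()
        words = [tok for tok in text.split() if tok not in NON_WORDS]
        yield speaker, words


# Hypothetical fragment in CHAT style (not copied from a TAICORP file).
sample = [
    "@Begin",
    "*INV:\t啥物 ?",
    "*CHI:\t四 支 .",
    "%eng:\tFour legs.",
    "*CHI:\t手 .",
    "@End",
]

tokens_per_speaker = Counter()
for speaker, words in parse_main_tiers(sample):
    tokens_per_speaker[speaker] += len(words)

print(dict(tokens_per_speaker))  # {'INV': 1, 'CHI': 3}
```

Counting tokens per speaker in this way is the first step toward line, token, and word-type totals of the kind reported in Section 3.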
3. Cumulative Vocabulary
As mentioned above, Cheng found that human cognition of linguistic symbols has an upper bound of 8,000 units (basic words). The lexical entries in TAICORP are word-based. The total number of distinct words produced by the children in the corpus is 8,250; this is the cumulative vocabulary of the 14 children aged between 1;2 and 5;3. For comparison, the cumulative vocabulary of the adults in the corpus (the children's parents or grandparents and/or the observer) is 12,882. Table 2 also gives the mean length of utterance measured in words (MLU).
Table 2. Cumulative vocabulary of children and adults in TAICORP
           Lines     MLU     Tokens      Words (cumulative vocabulary)
Children   161,253   2.695   434,557     8,250
Adults     336,173   3.605   1,211,946   12,882
Total      497,426   3.310   1,646,503   14,172
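The columns of Table 2 are related by simple arithmetic: MLU is the number of word tokens divided by the number of lines (utterances), and the combined vocabulary is the size of the union of the children's and adults' word-type sets rather than their sum. The sketch below recomputes the MLU for the child and adult rows and applies inclusion-exclusion to estimate how many word types the two groups share. It assumes that the Total figure of 14,172 is exactly the union of the two vocabularies; the two small word sets at the end are hypothetical.

```python
# Figures copied from Table 2: lines are utterances, tokens are running words,
# and types are distinct words (cumulative vocabulary).
TABLE2 = {
    "Children": {"lines": 161_253, "tokens": 434_557, "types": 8_250},
    "Adults": {"lines": 336_173, "tokens": 1_211_946, "types": 12_882},
}
COMBINED_TYPES = 14_172  # assumed to equal |children's types ∪ adults' types|


def mlu(row):
    """Mean length of utterance in words: tokens divided by lines."""
    return row["tokens"] / row["lines"]


for group, row in TABLE2.items():
    print(f"{group}: MLU = {mlu(row):.3f}")  # 2.695 and 3.605

# Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.
shared = TABLE2["Children"]["types"] + TABLE2["Adults"]["types"] - COMBINED_TYPES
share_of_child_vocab = shared / TABLE2["Children"]["types"]
print(shared, f"{share_of_child_vocab:.0%}")  # 6960 84%

# The same relation on two tiny hypothetical word sets:
child_words = {"手", "跤", "洗手"}
adult_words = {"手", "跤", "跤手", "洗手", "樓跤"}
print(len(child_words | adult_words))  # 5, not 3 + 5
```

Under that assumption, about 84% of the children's word types (6,960 of 8,250) also occur in the adults' speech in the corpus.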
Note that the combined vocabulary of the adults and the children (14,172 words) is not the sum of their separate vocabularies (8,250 words for the children and 12,882 words for the adults), because the children's vocabulary largely overlaps with that of the adults. Table 3 shows the cumulative vocabulary of seven of the children.
Table 3. Children's cumulative vocabulary
Child  Sex  Age          Sessions  Duration (minutes)  Words
LYC    F    1;2 – 3;3    48        2255                1417
HYS    M    1;2 – 3;4    51        2280                1743
TWX    F    1;5 – 3;6    44        1829                1942
LWJ    F    2;1 – 3;7    36        1777                1439
CEY    F    2;1 – 3;10   37        1728                1213
HBL    M    2;1 – 4;0    45        1889                1866
WZX    M    2;1 – 4;0    41        1670                2156
Overall, the vocabulary of these young children before age four was smaller than, or only slightly above, 2,000 words. TAICORP has no data from six-year-old children. The oldest child, LMC, had a vocabulary of 1,974 lexical items at 5;3, and the second oldest, ZQM, had 1,476 at 4;6.
These numbers seem quite small in light of Susan Carey's claim that by age six the average English-speaking child has learned over 14,000 words (including inflected and derived words), or 8,000 root words. The cross-linguistic comparison is tricky. It seems reasonable to assume that children acquiring different languages should have vocabularies of similar size. Although the TAICORP children are much younger than 6;0 (most of the recordings were made before age four), it is hard to imagine, for example, LMC's vocabulary growing from fewer than 2,000 words to 14,000 in the nine months between 5;3 and 6;0. At this point it is not clear why such a large gap exists between the two groups of children. One confounding factor is the difference between the morphological systems of the two languages; however, this factor alone does not seem sufficient to explain the gap. More data, especially comparable data, are needed before any conclusions can be drawn.
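The size of the gap can be made concrete with back-of-the-envelope arithmetic. The sketch below computes how fast LMC's vocabulary would have to grow between 5;3 and 6;0 to reach either of Carey's figures. It uses only numbers given in the text, assumes 30-day months, and, like the comparison itself, treats Southern Min words and English words or root words as commensurable units.

```python
# LMC's vocabulary at 5;3 (Section 3) and the nine months left until 6;0.
LMC_WORDS_AT_5_3 = 1_974
MONTHS_TO_SIX = 9
DAYS_PER_MONTH = 30  # simplifying assumption


def monthly_rate(target, current=LMC_WORDS_AT_5_3, months=MONTHS_TO_SIX):
    """Words per month needed to grow from `current` to `target`."""
    return (target - current) / months


# Carey's (1978) two figures for six-year-olds: all words vs. root words.
for target in (14_000, 8_000):
    rate = monthly_rate(target)
    print(f"{target:,}: {rate:,.0f} words/month, ~{rate / DAYS_PER_MONTH:.0f} words/day")
# 14,000: 1,336 words/month, ~45 words/day
# 8,000: 670 words/month, ~22 words/day
```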
Clark (1993) discusses the vocabulary growth of two young children: Keren (reported in Dromi 1987) had produced 337 new words by 1;5.23, and Damon had produced 337 new words by 1;9.24. We compared the vocabulary growth of these two children with that of two Taiwanese children, LYC and HYS. Keren reached 337 words around 1;5, Damon around 1;9, LYC between 1;5 and 1;6, and HYS between 1;6 and 1;7, as shown in Table 4.
Table 4. Cumulative vocabulary and vocabulary growth
Child  Sex   Language  1;2  1;3  1;4  1;5  1;6  1;7  1;8  1;9
Keren  girl  Hebrew                   337
Damon  boy   English                                      337
LYC    girl  TSM       45   76   153  240  358  367  468  518
HYS    boy   TSM       60   76   91   165  275  460  522  611
(TSM = Taiwan Southern Min)
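Locating the age at which each child passes the 337-word mark used by Clark (1993) amounts to finding the first pair of adjacent monthly counts that straddle the threshold. The following minimal Python sketch does this for the two TAICORP rows of Table 4; reporting a one-month interval rather than interpolating a date is a choice made for the illustration, and the function name is hypothetical.

```python
# Monthly cumulative vocabulary of the two TAICORP children in Table 4.
AGES = ["1;2", "1;3", "1;4", "1;5", "1;6", "1;7", "1;8", "1;9"]
CUMULATIVE = {
    "LYC": [45, 76, 153, 240, 358, 367, 468, 518],
    "HYS": [60, 76, 91, 165, 275, 460, 522, 611],
}


def crossing_interval(counts, threshold=337):
    """Return the pair of ages between which `counts` first reaches `threshold`.

    Returns None if the table shows no such crossing.
    """
    for i in range(1, len(counts)):
        if counts[i - 1] < threshold <= counts[i]:
            return AGES[i - 1], AGES[i]
    return None


for child, counts in CUMULATIVE.items():
    print(child, crossing_interval(counts))
# LYC ('1;5', '1;6')
# HYS ('1;6', '1;7')
```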
Among the four children, Damon developed more slowly than the others at this stage. Nevertheless, the individual differences among the children seem to fall within a reasonable range. Moreover, the child who started with a higher growth rate did not keep the lead throughout: LYC had a larger vocabulary than HYS at 1;6 (358 vs. 275 words), but HYS took the lead from 1;7 onward (460 vs. 367 words at 1;7).
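The month-to-month picture behind this observation can be sketched by differencing each child's cumulative counts from Table 4 and recording who is ahead at each age. The helper names below are illustrative, and the monthly gains are only as precise as the monthly sampling of the table.

```python
# Cumulative vocabulary at ages 1;2 to 1;9, copied from Table 4.
AGES = ["1;2", "1;3", "1;4", "1;5", "1;6", "1;7", "1;8", "1;9"]
LYC = [45, 76, 153, 240, 358, 367, 468, 518]
HYS = [60, 76, 91, 165, 275, 460, 522, 611]


def monthly_gains(cumulative):
    """New words added between consecutive monthly observations."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def leader(lyc, hys):
    """Which child has the larger cumulative vocabulary at one age."""
    if lyc > hys:
        return "LYC"
    if hys > lyc:
        return "HYS"
    return "tie"


print("LYC gains:", monthly_gains(LYC))  # [31, 77, 87, 118, 9, 101, 50]
print("HYS gains:", monthly_gains(HYS))  # [16, 15, 74, 110, 185, 62, 89]
leaders = [leader(lyc, hys) for lyc, hys in zip(LYC, HYS)]
for age, who in zip(AGES, leaders):
    print(age, who)
```

In these figures LYC's gains exceed HYS's in each of the first four intervals (1;2 to 1;6), yet HYS is ahead from 1;7 onward.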
4. Acquisition of Word Senses
When adult native speakers know the meaning of a word, they also know other words that are related in some way to the senses of that word. Near synonyms are one example of such word relations. Furthermore, synonyms may belong to a broader category of senses; such a category is a superordinate term and often has a word as its name. For example, WordNet (Princeton University Cognitive Science Lab 2005) gives the following sense of the word "hand", with its synonyms and superordinate terms:
hand, manus, mitt, paw -- (the (prehensile) extremity of the superior limb)
   => extremity -- (that part of a limb that is farthest from the torso)
      => external body part -- (any body part visible externally)
         => body part -- (any part of an organism)
            => part, piece -- (a portion of a natural object)
               => thing -- (a separate and self-contained entity)
                  => physical entity -- (has physical existence)
                     => entity -- (perceived or known or inferred to
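The indented listing is a chain of hypernym ("is a kind of") links, which can be represented as a mapping from each term to its immediate superordinate. The minimal sketch below encodes only the chain quoted above, naming each synset by its first lemma, and walks it upward. Real WordNet synsets can have more than one hypernym (NLTK's WordNet interface, for example, exposes this through Synset.hypernym_paths()), so the single-parent dictionary here is a simplifying assumption.

```python
# Hypernym ("is a kind of") links for the sense of "hand" listed above; each
# synset is named by its first lemma, and each has exactly one parent here.
HYPERNYM = {
    "hand": "extremity",
    "extremity": "external body part",
    "external body part": "body part",
    "body part": "part",
    "part": "thing",
    "thing": "physical entity",
    "physical entity": "entity",
}
# The basic-level near synonyms grouped in the first synset.
HAND_SYNSET = ("hand", "manus", "mitt", "paw")


def superordinates(term):
    """Return the chain of superordinate terms from `term` up to the root."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain


# Prints the seven superordinates of "hand", from "extremity" up to "entity".
print(", ".join(HAND_SYNSET), "->", " -> ".join(superordinates("hand")))
```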
The basic-level words are the near synonyms "hand", "manus", "mitt", and "paw". The superordinate categories up the hierarchy are "extremity", "external body part", "body part", "part, piece", "thing", "physical entity", and "entity". We are interested in how and when a child acquires near synonyms and superordinate concepts or words. We now return to TAICORP to examine the learning of the Southern Min words 手 "hand" and 跤 "foot".
In Standard Mandarin (Modern Chinese), the word corresponding to Southern Min 跤 is 腳. In our database of Modern Chinese word senses, 手 and 腳 are each given the following hierarchy of superordinate terms:
手 雙手 手爪 / 手 / 四肢 臂 手 腿 腳 / 全身 / 物
腳 足 趾 腳丫子 腳鴨子 / 腳 / 四肢 臂 手 腿 腳 / 全身 / 物
In the semantic structure for 手, the basic-level words are the near synonyms 手, 雙手, and 手爪, and the next superordinate term is, appropriately, 手 itself. Further up the hierarchy are the category 四肢 臂 手 腿 腳 ("limbs"), then 全身 "whole body", and finally 物 "thing". 腳 shares these higher-level superordinate terms with 手.
Chinese dialects share many words and senses. Mandarin and Southern Min share the word 手, whereas Mandarin 腳 corresponds to 跤 in Southern Min. Moreover, colloquial Southern Min uses 跤手 for literary 四肢 "limbs", and 臂 "arm" is rarely heard. We therefore adjust the terms in the hierarchy to give the following sense structures for these two words in Southern Min:
手 雙手 手爪 / 手 / 跤手 手 腿 跤 / 全身 / 物
跤 足 趾 / 跤 / 跤手 手 腿 跤 / 全身 / 物
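The slash notation is effectively a small data structure: an ordered list of levels, each a group of words. A minimal Python sketch, assuming exactly the notation shown above, parses the two Southern Min hierarchies and finds the levels they share; the shared levels are the superordinate categories a speaker would need in order to group 手 and 跤 together. Treating the first word of a level as that level's name is an assumption about the notation.

```python
# The two Southern Min sense structures given above. Levels are separated by
# "/" and run from the basic-level near synonyms up to the most general term.
HIERARCHIES = {
    "手": "手 雙手 手爪 / 手 / 跤手 手 腿 跤 / 全身 / 物",
    "跤": "跤 足 趾 / 跤 / 跤手 手 腿 跤 / 全身 / 物",
}


def parse_levels(spec):
    """Split one hierarchy string into a list of levels (tuples of words)."""
    return [tuple(level.split()) for level in spec.split("/")]


def shared_levels(word_a, word_b):
    """Levels common to both words' hierarchies, lowest first."""
    levels_a = parse_levels(HIERARCHIES[word_a])
    levels_b = set(parse_levels(HIERARCHIES[word_b]))
    return [level for level in levels_a if level in levels_b]


common = shared_levels("手", "跤")
print(common)  # [('跤手', '手', '腿', '跤'), ('全身',), ('物',)]
# Name of the lowest shared level (first word of the level, by assumption):
print(common[0][0])  # 跤手
```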
Our goal here is to determine from the TAICORP data whether children younger than age 5 can learn the higher superordinate terms and have a structure of categories. In the child development literature, Carey (1985) states that children increasingly develop the ability to use superordinate categories as the basis for induction between ages 4 and 10. Our children were younger than this age group. Nevertheless, we cannot readily conclude that these children had not formed some structure of word senses before age 4. Gelman and O'Reilly's (1988) experimental studies suggest that preschool children assume that basic-level words share internal parts, and Arias-Trejo and Plunkett (2009) state that infants have begun to develop semantic-associative links between lexical items as early as 21 months of age. Many studies of superordinate terms involved experiments and referred to superordinate categories without a formal semantic structure. In contrast, we use TAICORP and an explicit Chinese lexical semantic organization to pursue the issue here.
The following excerpt is from a recording session with the child TWX at age 2;2.25. Participant codes: CHI, the target child; INV, the investigator; GRM, the child's grandmother (caretaker). Notations in the glosses: CLS, classifier; PT, particle; [m], speech in Mandarin; yyy, unintelligible speech. The investigator first asked the child, "How many legs does the tiger have?", and the child playfully replied "five". The conversation continued:
*INV: 啥物?
      what
      "What?"
*CHI: 四   支
      four CLS
      "Four legs."
…
*GRM: 伊  四   支   跤        有   啥   a?
      it  four CLS  foot/leg  have what PT
      "Its four legs include what?"
*CHI: 有   這.
      have this
      "Have this."
*INV: henn
      PT (confirmation)
      "Yes."
*CHI: 手.
      hand
      "Hand(s)."
*GRM: 有   手   oo.
      have hand PT (doubt)
      "Have hand(s)?"
*CHI: hm.
      PT (confirmation)
      "Yes."
*INV: hoo, 伊  四   支   攏   跤        la.
      PT   it  four CLS  all  foot/leg  PT (correction)
      "Oh. Its four (CLS) should be all legs."
*CHI: 伊  四   支   攏   跤.
      it  four CLS  all  foot/leg
      "Its four (CLS) are all legs."
The child said that some of the tiger's legs were hands (手). Apparently she related 手 "hand" and 跤 "foot/leg" semantically. Although she grouped these two words in one category, the superordinate term 跤手 never occurred in her speech. Her uses of words containing 手 and 跤 are listed in Table 5.
Table 5. Words containing 手 and 跤 in TWX's speech (some dates omitted)
Words: 手, 手工, 手機仔, 手錶仔, 放手, 洗手, 牛跤, 四跤仔, 灶跤, 架跤, 塗跤, 跤, 跤踏, 跤踏車, 樓跤, 豬跤, 褪赤跤
Ages: 2;1, 2;2, 2;3, 3;0, 3;1, 3;2, 3;3, 3;5, 3;6, 3;11, 4;1, 4;2, 4;3
Counts: 3 5 7 6 1 1 1 4 1 1 1 3 1 1 3 5 2 5 1 3 2 3 1 4 2 11 1 2 1 3 1 4 4 1 4 7 1 1 2 1 3 3 1 1 1 2 3 3 1 1 1 1
As can be seen from the table, both 手 and 跤 were learned early, in this case by age 2;1. Yet these two words never appeared together as the compound 跤手, which adult speakers of Southern Min use for the superordinate "limbs". The recordings of this child ended at age 4;3, so we can say that before age 4 superordinate terms are not readily observable. However, the absence of such a term does not necessarily imply that children before age 4 lack the mental ability to perceive senses as related. The data from TWX presented above indicate that she already had the category of "limb" in mind.
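The check behind this conclusion is a search of the child's word list for the two morphemes and for the compound that names their superordinate category. The sketch below runs that search over the word types in Table 5. It is a simplified illustration: a plain substring test also picks up compounds for objects, such as 手機仔 "mobile phone", so a real analysis would still need to inspect each occurrence, and absence from the list only shows that the compound was not recorded.

```python
# Word types from Table 5: TWX's words containing 手 "hand" or 跤 "foot/leg".
TWX_WORDS = [
    "手", "手工", "手機仔", "手錶仔", "放手", "洗手",
    "牛跤", "四跤仔", "灶跤", "架跤", "塗跤", "跤", "跤踏", "跤踏車",
    "樓跤", "豬跤", "褪赤跤",
]
SUPERORDINATE = "跤手"  # adult Southern Min term for the limbs


def words_containing(morpheme, words):
    """Words in which `morpheme` occurs anywhere (a plain substring test)."""
    return [w for w in words if morpheme in w]


hand_words = words_containing("手", TWX_WORDS)
foot_words = words_containing("跤", TWX_WORDS)
print(len(hand_words), len(foot_words))  # 6 11
print(SUPERORDINATE in TWX_WORDS)  # False
```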
5. Conclusions
To conclude, the vocabulary growth rate of the Southern Min children varied slightly across individuals, but all of them had acquired about 2,000 words by age 4. With regard to word senses, the children were able to relate the senses of similar words, but they did not learn or explicitly form superordinate categories before age 4.
References
Arias-Trejo, Natalia and Kim Plunkett. 2009. Lexical-semantic priming effects during infancy. Philosophical Transactions of the Royal Society B 364: 3633-3647.
Carey, Susan. 1978. The child as word learner. In Morris Halle, Joan Bresnan, and George A. Miller, eds., Linguistic Theory and Psychological Reality, 264-293. Cambridge, MA: MIT Press.
Carey, Susan. 1985. Conceptual Change in Childhood. Cambridge, MA: Bradford Books/MIT Press.
Cheng, Chin-Chuan. 1998. 從計量理解語言認知 (Quantification for understanding language cognition). In Benjamin K. T'sou, Tom B. Y. Lai, Samuel W. K. Chan, and William S-Y. Wang, eds., 漢語計量與計算研究 (Quantitative and Computational Studies on the Chinese Language), 15-30. City University of Hong Kong.
Cheng, Chin-Chuan. 2000. Frequently-used Chinese characters and language cognition. Studies in the Linguistic Sciences 30(1): 107-118.
Cheng, Chin-Chuan. 2002a. Language cognition and vocabulary learning. Selected Papers from the Eleventh International Symposium on Language Teaching/Fourth Pan Asia Conference, 54-62. Taipei: English Teachers Association.
Clark, Eve V. 1993. The Lexicon in Acquisition. Cambridge: Cambridge University Press.
Dromi, Esther. 1987. Early Lexical Development. Cambridge: Cambridge University Press.
Gelman, Susan A. and Anne Watson O'Reilly. 1988. Children's inductive inferences within superordinate categories: The role of language and category structure. Child Development 59: 876-887.
MacWhinney, Brian and Catherine Snow. 1985. The Child Language Data Exchange System. Journal of Child Language 12: 271-296.
MacWhinney, Brian. 1995. The CHILDES Project: Tools for Analyzing Talk. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.
Miller, George A. and Patricia M. Gildea. 1991. How children learn words. In William S-Y. Wang, ed., The Emergence of Language: Development and Evolution, 150-158. New York: W. H. Freeman.
Princeton University Cognitive Science Lab. 2005. WordNet Browser. Princeton: Princeton University.
Tsay, Jane S. 2005. Taiwan Child Language Corpus (First Edition). National Chung Cheng University.