Learning to Pluralize the Arabic Noun with a Self-Training Paradigm
Lisa Hesterberg & Janet Pierrehumbert
Northwestern University, Evanston IL
Morphology Days, December 22, 2011

• The noun plural system in Modern Standard Arabic is complex and difficult to predict
• The suffixed “sound” plural competes with at least 34 “broken” (non-concatenative) irregular plural patterns
• Modeling this system gives insight into the types of linguistic information that are relevant to processing
• How can we model learning of this morphological system?
  – Is the same information used for prediction in an adult system as for learning in a child system?
  – What is the rate of learning?
  – How much prior knowledge is needed to begin learning effectively?

Roadmap
• Arabic morphology & noun plural system
• Computational model for plural prediction
• Self-training paradigm
• Results
• Discussion

• Arabic words have morphemes on different tiers in the underlying representation
  – Example: madrasa (school)
• Prosodic structure is relevant to morphological processes (McCarthy & Prince 1990)
  – Plural types can be grouped by prosodic structure
  – The plural template is derived from the singular template, not from the root
    • Ex: bait -> buyuut (house -> houses)

• 2 plural types: sound (suffixed) and broken
  – Sound: [uun] (human masculine), [aat] (other)
  – Broken: at least 34 different productive patterns
    • Significant structural changes to template
    • Non-local processes
    • Ex: tˤaalib -> tˤullaab (student -> students)

• Large proportion of irregulars
  – 18-26% by type, 41% by token (Boudelaa & Gaskell 2002; Hesterberg & Pierrehumbert in prep.)
  – Cf. 2% of English noun types take an irregular plural
• Large number of possible plurals for a given singular

• What types of information could be important in plural selection?
  – Prosodic structure of CV template (McCarthy & Prince 1990):
    • Claimed to drive plural-by-analogy for new forms
  – Segmental features (e.g. Albright & Hayes 2003)
    • Similarity affected by shared segmental features, based on natural classes (Frisch, Pierrehumbert & Broe 1996)
  – Gang size effects (e.g. Rumelhart & McClelland 1986)
    • Strength of analogy is driven by number of forms in comparison gang

• Generalized Context Model (Nosofsky 1990) models similarity across classes of items
• Linguistic-analysis-specific modifications by Albright & Hayes (2003) and Nakisa, Plunkett & Hahn (2001)
• Analogical model based on similarity to gangs, where a gang is defined by its sing-plural template
  – E.g. CVCC -> CVCVVC
    kalb -> kilaab (dog -> dogs)
    bint -> banaat (girl -> girls)

• Human masculine (HM) nouns and other (OT) nouns should be considered separately
  – Choice of [-uun] vs. [-aat] is driven by semantics, not word form

Gender + Animacy
– Human masculine: -uun or broken (broken type)
– Other: -aat or broken (broken type)

• Similarity measured by string-edit distance
  • Minimum number of changes necessary to transform form A into form B
  • Gradient similarity based on segmental features (from Albright & Hayes 2003)
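The similarity measure above can be illustrated with a small sketch of a string-edit distance whose substitution cost is gradient rather than all-or-nothing. Everything specific here is an assumption made for illustration: the toy `FEATURES` table, the Jaccard-style `sub_cost`, the unit insertion/deletion cost, and the treatment of each character as one segment. It is a stand-in for, not a reproduction of, the feature-based similarity of Albright & Hayes (2003).

```python
# Hypothetical toy feature table (an assumption, not the slides' feature system).
FEATURES = {
    "b": {"labial", "stop", "voiced"},
    "t": {"coronal", "stop"},
    "d": {"coronal", "stop", "voiced"},
    "k": {"dorsal", "stop"},
    "l": {"coronal", "sonorant", "voiced"},
    "n": {"coronal", "sonorant", "nasal", "voiced"},
    "a": {"vowel", "low"},
    "i": {"vowel", "high", "front"},
    "u": {"vowel", "high", "back"},
}


def sub_cost(x, y):
    """Substitution cost in [0, 1]; lower when segments share more features."""
    if x == y:
        return 0.0
    # Segments missing from the table share nothing with any other segment.
    fx, fy = FEATURES.get(x, {x}), FEATURES.get(y, {y})
    return 1.0 - len(fx & fy) / len(fx | fy)


def edit_distance(a, b, indel=1.0):
    """Minimum total cost of insertions, deletions and substitutions turning a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel          # delete every segment of a
    for j in range(1, cols):
        d[0][j] = j * indel          # insert every segment of b
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel,                             # deletion
                d[i][j - 1] + indel,                             # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
            )
    return d[-1][-1]
```

With this table, replacing t by d (which differ only in voicing) costs less than replacing t by l, so `edit_distance("tab", "dab")` is smaller than `edit_distance("tab", "lab")`.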
Form-gang similarity = (summed similarity of members of gang) / (summed similarity of all forms)
• Similar to kNN with variable-sized k depending on gang size
• Not weighted for word token frequency, only type frequency; analogy is based on type counts (Hay & Baayen 2005)

Dataset:
– 1735 sing-plural pairs separated into HM and Other
– Split into gangs: ≥4 forms with the same sing-plural templates
  • HM = 25 (s = 21, b = 4), max = 34
  • NHM = 78 (s = 49, b = 29), max = 102
Cross-validation: 25% test set, 75% for comparison
– Results aggregated over 10 iterations
Achieved 87.7% accuracy for HM group and 81.4% for Other group
– Cf. baseline: random choice of plural type
  • 76.5% for HM, 68.6% for Other

• Self-training is usually used for large unlabeled datasets, incrementally labeling them based on a small labeled set
• Computer ‘learns’ relevant features for labeling
  – Unsupervised or semi-supervised
• Process similar to child-language learning:
  – Begin with small number of known forms
  – Generalize patterns to new forms
  – At first many mistakes, but as known n increases, so should accuracy
• Modeling learning with GCM:
  – How does accuracy increase as known n increases?
  – What is the rate of change?
  – How does seed set size affect this process?
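The form-gang similarity ratio used by the model can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain unit-cost edit distance instead of feature-weighted distance, an exponential decay `exp(-d / s)` with a hypothetical sensitivity parameter `s`, and a three-item toy `known` list built from the example words on the slides (with tˤ simplified to t). The gang labels are written as templates for readability only.

```python
import math
from collections import defaultdict


def levenshtein(a, b):
    """Unit-cost edit distance (stand-in for a feature-weighted distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a, b, s=1.0):
    """Similarity decays exponentially with distance; s is an assumed sensitivity."""
    return math.exp(-levenshtein(a, b) / s)


def gang_scores(form, known, s=1.0):
    """Share of summed similarity per gang; known is a list of (singular, gang) pairs."""
    totals = defaultdict(float)
    for singular, gang in known:
        totals[gang] += similarity(form, singular, s)
    grand_total = sum(totals.values())
    return {gang: total / grand_total for gang, total in totals.items()}


def predict(form, known, s=1.0):
    """Pick the gang with the largest share of summed similarity."""
    scores = gang_scores(form, known, s)
    return max(scores, key=scores.get)


# Toy comparison set (illustrative only).
known = [
    ("kalb", "CVCC->CVCVVC"),
    ("bint", "CVCC->CVCVVC"),
    ("taalib", "CVVCVC->CVCCVVC"),
]
scores = gang_scores("balb", known)   # "balb" is a nonce form
prediction = predict("balb", known)
```

Because similarity is summed over all members before normalizing, a gang with more members can win even when no single member is the closest neighbour, which is the gang size effect described earlier.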
• 25% of pairs for held-out test set (Other: 362; HM: 73)
• Randomly selected seed set (known forms)
• Learning across 8 rounds:
  – Calculate accuracy for test set against known set
  – Introduce new forms (25% of total unknown forms)
  – Determine easiest-to-predict of these
    • Most predictable = strongest correspondence to a gang
  – Most predictable 50% of new forms added to known set
  – Since the model does not learn from forms it sees but does not select, the remaining 50% of new forms are returned to the unknown set

Seed set selection
• Gang occurrence based on gang size: larger gangs are more likely to be in the seed set
• Variable size:
  – 10% of forms (Other = 95; HM = 21)
  – 25% of forms (Other = 238; HM = 54)
• Role of word frequency:
  – “Frequency-biased”: Exposure to forms is dependent on token count
  – “Unbiased”: Exposure to forms is independent of token count

Results, Pre-training
[Figures: Initial Round Model Performance for 'Human Masculine' Group; Initial Round Model Performance for 'Other' Group. Each panel plots Model Performance for the Baseline and for the 10% Biased, 25% Biased, 10% Unbiased, and 25% Unbiased seed sets.]
Results, Human Masculine Group
[Figure: Model Performance for 'Human Masculine' Group. Model Performance by round (rounds 0 to 8) for the 10% Biased, 25% Biased, 10% Unbiased, and 25% Unbiased seed sets, with Baseline and Full Model Performance marked.]
Results, Other Group
[Figure: Model Performance for 'Other' Group. Model Performance by round for the 10% Biased, 25% Biased, 10% Unbiased, and 25% Unbiased seed sets, with Baseline and Full Model Performance marked.]
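The by-round curves above come from the self-training loop described earlier: score the test set, introduce a quarter of the unknown forms, keep the most predictable half, and return the rest. A minimal sketch of that loop follows. Several details are assumptions rather than facts from the slides: plain unit-cost edit distance, exponential similarity, the function and parameter names (`self_train`, `intro`, `keep`), the abstract gang labels "A" and "B", the toy word lists, and in particular the choice to add selected forms to the known set with the model's own predicted gang.

```python
import math
import random
from collections import defaultdict


def distance(a, b):
    """Unit-cost edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_gang(form, known):
    """Return (gang, share): the gang with the largest share of summed similarity."""
    totals = defaultdict(float)
    for singular, gang in known:
        totals[gang] += math.exp(-distance(form, singular))
    gang = max(totals, key=totals.get)
    return gang, totals[gang] / sum(totals.values())


def accuracy(test, known):
    """Fraction of held-out (form, gang) pairs whose gang is predicted correctly."""
    return sum(best_gang(f, known)[0] == g for f, g in test) / len(test)


def self_train(seed, unknown, test, rounds=8, intro=0.25, keep=0.5, rng=None):
    """Run the loop and return the final known set plus accuracy per round."""
    rng = rng or random.Random(0)
    known, pool = list(seed), list(unknown)
    history = [accuracy(test, known)]            # round 0: seed set only
    for _ in range(rounds):
        if not pool:
            break
        rng.shuffle(pool)
        n_new = max(1, int(intro * len(pool)))   # introduce 25% of unknown forms
        batch, pool = pool[:n_new], pool[n_new:]
        # Rank the new forms by how strongly they correspond to a gang.
        ranked = sorted(((best_gang(f, known), f) for f in batch),
                        key=lambda item: item[0][1], reverse=True)
        n_keep = max(1, int(keep * len(ranked)))
        # Assumption: selected forms join the known set with the predicted gang.
        known += [(f, gang) for (gang, _), f in ranked[:n_keep]]
        pool += [f for _, f in ranked[n_keep:]]  # unselected forms go back
        history.append(accuracy(test, known))
    return known, history


# Toy data for illustration only; the unknown and test forms are arbitrary strings.
seed = [("kalb", "A"), ("bint", "A"), ("taalib", "B"), ("kaatib", "B")]
unknown = ["qalb", "dars", "bahr", "saakin", "baarid", "jaamid", "kabb", "naadir"]
test = [("balb", "A"), ("saalib", "B")]
known, history = self_train(seed, unknown, test)
```

Because selected forms are stored with predicted rather than true gangs, an early misclassification can propagate, which is one way accuracy can fall across rounds. The loop also never creates a gang that is absent from the seed set.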
• Above-baseline accuracy even with very small seed set (10% seed set = 21, 54 forms)
• Unbiased models perform better
  – Why? Lower number of broken plurals
• Frequency-biased models are more realistic
  – Broken plurals are very difficult for children (e.g. Ravid & Farah 1999)
  – Reflected in model: sound plural easily learned, broken plural more difficult

• Learning sometimes drives accuracy down
  – Small seed set results in small number of gangs
    • Ex: 10% has 11/25 gangs for HM, 39/75 for Other
  – Current model has no way to add new gangs
• Distribution of gangs also affects learning
  – Zipfian, so a few large gangs account for many of the forms
  – Large gangs drive learning in this model, but both broken and sound gangs play a role

Next steps
• Semi-supervised paradigm
  – Feedback is an important part of learning
• Collapse across HM and NHM?
  – Naïve learner lacks full knowledge of word gender
• Can we predict which plural patterns are learned first?
  – Comparison to corpus of Arabic child speech

Thank you!
• Doug Downey (EECS)
• Bryan Pardo and classmates in EECS 349 for comments and suggestions
• Matt Goldrick
• Phonatics Discussion Group

References
• Al-Sulaiti, L. (2009). Corpus of Contemporary Arabic. Retrieved from http://www.comp.leeds.ac.uk/eric/latifa/CCA_raw_utf8.txt
• Boudelaa, S., & Gaskell, M. (2002). A re-examination of the default system for Arabic plurals. Language and Cognitive Processes, 17(3), 321-343.
• Ernestus, M., & Baayen, H. (2003). Predicting the unpredictable: Interpreting neutralized segments in Dutch. Language, 79(1), 5-38.
• Frisch, S., Pierrehumbert, J., & Broe, M. (2004). Similarity avoidance and the OCP. Natural Language & Linguistic Theory, 22, 179-228.
• Hay, J. B., & Baayen, R. H. (2005). Shifting paradigms: gradient structure in morphology. Trends in Cognitive Sciences, 9, 342-348.
• McCarthy, J., & Prince, A. (1990). Foot and word in prosodic morphology: The Arabic broken plural. Natural Language & Linguistic Theory, 8(2), 209-283.
• Nakisa, R., Plunkett, K., & Hahn, U. (2001). A cross-linguistic comparison of single and dual-route models of inflectional morphology. In P. Broeder & J. Murre (Eds.), Models of language acquisition: Inductive and deductive approaches. Cambridge, MA: MIT Press.
• Nosofsky, R. (1990). Relations between exemplar-similarity and likelihood models of classification. Journal of Mathematical Psychology, 34, 393-418.
• Rumelhart, D., & McClelland, J. (1986). On learning the past tenses of English verbs: Implicit rules or parallel distributed processing? In J. McClelland, D. Rumelhart & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.