Computational modeling of the Arabic noun plural and

Learning to Pluralize the Arabic Noun with a Self-Training Paradigm
Lisa Hesterberg & Janet Pierrehumbert
Northwestern University, Evanston IL
Morphology Days, December 22, 2011

•  The noun plural system in Modern Standard Arabic is complex and difficult to predict
•  The suffixed “sound” plural competes with at least 34 “broken” (non-concatenative) irregular plural patterns
•  Modeling this system gives insight into the types of linguistic information that are relevant to processing
•  How can we model learning of this morphological system?
  –  Is the same information used for prediction in an adult system as for learning in a child system?
  –  What is the rate of learning?
  –  How much prior knowledge is needed to begin learning effectively?

Roadmap
•  Arabic morphology & noun plural system
•  Computational model for plural prediction
•  Self-training paradigm
•  Results
•  Discussion

Arabic words have morphemes on different tiers in the underlying representation
Example: madrasa (school)

Prosodic structure is relevant to morphological processes (McCarthy & Prince 1990)
  –  Plural types can be grouped by prosodic structure
  –  Plural template is derived from the singular template, not from the root
    •  Ex: bait -> buyuut (house -> houses)

2 plural types: sound (suffixed) and broken
  –  Sound: [uun] (human masculine), [aat] (other)
  –  Broken: at least 34 different productive patterns
    •  Significant structural changes to template
    •  Non-local processes
    •  Ex: tˤaalib -> tˤullaab (student -> students)

Large proportion of irregulars
  –  18-26% by type, 41% by token (Boudelaa & Gaskell 2002; Hesterberg & Pierrehumbert in prep.)
  –  cf. 2% of English noun types take irregular plurals
Large number of possible plurals for a given singular

What types of information could be important in plural selection?
  –  Prosodic structure of CV template (McCarthy & Prince 1990):
    •  Claimed to drive plural-by-analogy for new forms
  –  Segmental features (e.g. Albright & Hayes 2003)
    •  Similarity affected by shared segmental features, based on natural classes (Frisch, Pierrehumbert & Broe 1996)
  –  Gang size effects (e.g. Rumelhart & McClelland 1986)
    •  Strength of analogy is driven by number of forms in comparison gang

Generalized Context Model (Nosofsky 1990) models similarity across classes of items
Linguistic-analysis-specific modifications by Albright & Hayes (2003) and Nakisa, Plunkett & Hahn (2001)
Analogical model based on similarity to gangs, where a gang is defined by its singular-plural template
  –  E.g. CVCC -> CVCVVC
    •  kalb -> kilaab (dog -> dogs)
    •  bint -> banaat (girl -> girls)

Human masculine (HM) nouns and other (OT) nouns should be considered separately
  –  Choice of [-uun] vs. [-aat] is driven by semantics, not word form

Gender + Animacy
  –  Human masculine: [-uun] or broken (-> broken type)
  –  Other: [-aat] or broken (-> broken type)

Similarity measured by string-edit distance
•  Minimum number of changes necessary to transform form A into form B
•  Gradient similarity based on segmental features (from Albright & Hayes 2003)
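As an illustration of the similarity measure above, here is a minimal sketch of an edit distance whose substitution cost is graded by shared natural class. The class inventory, the 0.5 / 1.0 costs, the `sensitivity` parameter, and the exponential mapping from distance to similarity are all assumptions made for this example; the slides do not specify the feature chart or cost settings that were actually used.

```python
import math

# Hypothetical, deliberately tiny natural classes over an ASCII transliteration.
# A real model would use a full segmental feature chart.
NATURAL_CLASSES = {
    "vowel": set("aiu"),
    "labial": set("bmfw"),
    "coronal": set("tdsznlr"),
    "dorsal": set("kgqx"),
}


def sub_cost(x, y):
    """Assumed costs: 0 for identical segments, 0.5 if they share a class, else 1."""
    if x == y:
        return 0.0
    shares_class = any(x in c and y in c for c in NATURAL_CLASSES.values())
    return 0.5 if shares_class else 1.0


def weighted_edit_distance(a, b, indel=1.0):
    """Minimum total cost of insertions, deletions and substitutions turning a into b."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * indel]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + indel,               # delete x
                           cur[j - 1] + indel,            # insert y
                           prev[j - 1] + sub_cost(x, y)))  # substitute x -> y
        prev = cur
    return prev[-1]


def similarity(a, b, sensitivity=1.0):
    """Map distance to similarity with exponential decay (an assumed choice)."""
    return math.exp(-weighted_edit_distance(a, b) / sensitivity)
```

With these toy classes, `kalb` and `qalb` differ only by a within-class substitution, so they come out more similar to each other than `kalb` and `bint` do.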
Form-gang similarity = (summed similarity of members of gang) / (summed similarity of all forms)
•  Similar to kNN with variable-sized k depending on gang size
•  Not weighted for word token frequency, only type frequency; analogy is based on type counts (Hay & Baayen 2005)

Dataset:
  –  1735 singular-plural pairs separated into HM and Other
  –  Split into gangs: ≥4 forms with the same singular-plural templates
    •  HM = 25 gangs (sound = 21, broken = 4), max = 34
    •  NHM (Other) = 78 gangs (sound = 49, broken = 29), max = 102
Cross-validation: 25% test set, 75% for comparison
  –  Results aggregated over 10 iterations
Achieved 87.7% accuracy for HM group and 81.4% for Other group
  –  Cf. baseline: random choice of plural type
    •  76.5% for HM, 68.6% for Other

Self-training is usually used for large unlabeled datasets, incrementally labeling based on a small labeled set
Computer ‘learns’ relevant features for labeling
  –  Unsupervised or semi-supervised
Process similar to child language learning:
  –  Begin with small number of known forms
  –  Generalize patterns to new forms
  –  At first many mistakes, but as known n increases, so should accuracy
Modeling learning with GCM:
  –  How does accuracy increase as known n increases?
  –  What is the rate of change?
  –  How does seed set size affect this process?
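The self-training idea above (start from a small known set, let the model label the forms it is most confident about, and grow the known set) can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation: it uses plain unit-cost edit distance, an assumed exponential similarity, and the model's own predicted gang as the label for accepted forms. The function names and the `rounds`, `intro_frac` and `keep_frac` parameters are hypothetical.

```python
import math
from collections import defaultdict


def edit_distance(a, b):
    """Plain Levenshtein distance, standing in for a feature-weighted version."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def gang_scores(form, known):
    """known: non-empty list of (singular, gang) pairs.

    Returns each gang's summed similarity to `form`, divided by the
    summed similarity of all known forms (the form-gang similarity ratio).
    """
    totals = defaultdict(float)
    for singular, gang in known:
        totals[gang] += math.exp(-edit_distance(form, singular))
    grand_total = sum(totals.values())
    return {gang: s / grand_total for gang, s in totals.items()}


def predict(form, known):
    """Return (best gang, its form-gang similarity) for a new singular form."""
    scores = gang_scores(form, known)
    best = max(scores, key=scores.get)
    return best, scores[best]


def self_train(seed, unknown, rounds, intro_frac, keep_frac):
    """Grow the known set over several rounds, keeping only confident predictions."""
    known, pool = list(seed), list(unknown)
    for _ in range(rounds):
        if not pool:
            break
        n_new = max(1, int(len(pool) * intro_frac))
        batch, pool = pool[:n_new], pool[n_new:]
        # Rank the newly introduced forms by how strongly they match a gang.
        scored = sorted(((predict(f, known), f) for f in batch),
                        key=lambda item: item[0][1], reverse=True)
        n_keep = max(1, int(len(scored) * keep_frac))
        for (gang, _), form in scored[:n_keep]:
            known.append((form, gang))                   # accepted with predicted label
        pool.extend(form for _, form in scored[n_keep:])  # the rest stay unknown
    return known
```

Because accepted forms carry the model's own label, an early misclassification stays in the known set and influences later rounds. Note also that `predict` can only choose among gangs already present in the known set.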
25% of pairs for held-out test set (Other: 362; HM: 73)
Randomly selected seed set (known forms)
Learning across 8 rounds:
  –  Calculate accuracy for test set against known set
  –  Introduce new forms (25% of total unknown forms)
  –  Determine easiest-to-predict of these
    •  Most predictable = strongest correspondence to a gang
  –  Most predictable 50% of new forms added to known set
  –  Since the model learns nothing from forms it sees but does not select, the remaining 50% of new forms are returned to the unknown set

Seed set selection
Gang occurrence based on gang size: larger gangs more likely to be in seed set
•  Variable size:
  –  10% of forms (Other = 95; HM = 21)
  –  25% of forms (Other = 238; HM = 54)
•  Role of word frequency:
  –  “Frequency-biased”: Exposure to forms is dependent on token count
  –  “Unbiased”: Exposure to forms is independent of token count

Results, Pre-training
[Figure: Initial Round Model Performance for 'Other' Group and for 'Human Masculine' Group. y-axis: Model Performance. Conditions: Baseline, 10% Biased, 25% Biased, 10% Unbiased, 25% Unbiased.]
Results, Human Masculine Group
[Figure: Model Performance for 'Human Masculine' Group. x-axis: Performance by Round (0-8). y-axis: Model Performance. Series: 10% Biased, 25% Biased, 10% Unbiased, 25% Unbiased. Also labeled: Baseline, Full Model Performance.]
Results, Other Group
[Figure: Model Performance for 'Other' Group. x-axis: Performance by Round. y-axis: Model Performance. Series: 10% Biased, 25% Biased, 10% Unbiased, 25% Unbiased. Also labeled: Baseline, Full Model Performance.]
Above-baseline accuracy even with very small seed set (10% seed set = 21, 54 forms)
Unbiased models perform better
  –  Why? Lower number of broken plurals
Frequency-biased models more realistic
  –  Broken plurals are very difficult for children (e.g. Ravid & Farah 1999)
  –  Reflected in model: sound plural easily learned, broken plural more difficult

Learning sometimes drives accuracy down
  –  Small seed set results in small number of gangs
    •  Ex: 10% has 11/25 gangs for HM, 39/75 for Other
  –  Current model has no way to add new gangs
Distribution of gangs also affects learning
  –  Zipfian, so a few large gangs account for many of the forms
  –  Large gangs drive learning in this model, but both broken and sound gangs play a role

Next steps
•  Semi-supervised paradigm
  –  Feedback is an important part of learning
•  Collapse across HM and NHM?
  –  Naïve learner lacks full knowledge of word gender
•  Can we predict which plural patterns are learned first?
  –  Comparison to corpus of Arabic child speech

Thank you!
•  Doug Downey (EECS)
•  Bryan Pardo and classmates in EECS 349 for comments and suggestions
•  Matt Goldrick
•  Phonatics Discussion Group

References
•  Al-Sulaiti, L. (2009). Corpus of Contemporary Arabic. Retrieved from http://www.comp.leeds.ac.uk/eric/latifa/CCA_raw_utf8.txt
•  Boudelaa, S., & Gaskell, M. (2002). A re-examination of the default system for Arabic plurals. Language and Cognitive Processes, 17(3), 321-343.
•  Ernestus, M., & Baayen, H. (2003). Predicting the unpredictable: Interpreting neutralized segments in Dutch. Language, 79(1), 5-38.
•  Frisch, S., Pierrehumbert, J., & Broe, M. (2004). Similarity avoidance and the OCP. Natural Language & Linguistic Theory, 22, 179-228.
•  Hay, J. B., & Baayen, R. H. (2005). Shifting paradigms: gradient structure in morphology. Trends in Cognitive Sciences, 9, 342-348.
•  McCarthy, J., & Prince, A. (1990). Foot and word in prosodic morphology: The Arabic broken plural. Natural Language & Linguistic Theory, 8(2), 209-283.
•  Nakisa, R., Plunkett, K., & Hahn, U. (2001). A cross-linguistic comparison of single and dual-route models of inflectional morphology. In P. Broeder & J. Murre (Eds.), Models of language acquisition: Inductive and deductive approaches. Cambridge, MA: MIT Press.
•  Nosofsky, R. (1990). Relations between exemplar-similarity and likelihood models of classification. Journal of Mathematical Psychology, 34, 393-418.
•  Rumelhart, D., & McClelland, J. (1986). On learning the past tenses of English verbs: Implicit rules or parallel distributed processing? In J. McClelland, D. Rumelhart & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.