Motivation Compound Data and Ratings Vector Space Models

Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds
Sabine Schulte im Walde, Stefan Müller, Stephen Roller
Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart, Germany
*SEM, Atlanta, June 13-14, 2013

Sabine Schulte im Walde, Stefan Müller, Stephen Roller German Compound Nouns – VSMs and Compositionality

Overview
• Motivation and Background
• Description of Compositionality Ratings & Data Sets
• Evaluation & Baselines
• Predicting Compound-Constituent Ratings
• POS Feature Comparison
• Syntax Feature Comparison
• Predicting Compound Whole Ratings
• Conclusions

Motivation
• Vector Space Models (VSMs): explore the notion of “similarity” between a set of target objects within a geometric setting (Turney and Pantel, 2010; Erk, 2012).
• Distributional Semantics: exploit the distributional hypothesis (Firth, 1957; Harris, 1968) to determine co-occurrence features for vector space models that best describe the words, phrases, sentences, etc. of interest.
• Salient Distributional Features in VSMs: general knowledge about useful features, but not across phenomena.
• Linguist - Computational Linguist loop
• Phenomenon: German noun-noun compounds, such as Feuerwerk ‘fire works’ (Feuer ‘fire’ + Werk ‘opus’).

Hypotheses
1. Targets in the vector space models are nouns (compound nouns, modifier nouns, head nouns)
   → adjectives and verbs provide most salient features,
   → syntax-based outperforms window-based.
2. Contributions of modifier noun vs.
head noun: distributional properties of heads are more salient than distributional properties of modifiers in predicting the degree of compositionality of the compounds.

German Noun-Noun Compounds
• German noun-noun compounds:
  • combinations of two or more simplex nouns
  • grammatical head is a noun (German: rightmost constituent)
  • modifier is a noun
  • Examples: Ahornblatt ‘maple leaf’, Obstkuchen ‘fruit cake’
• Degree of Compositionality: semantic relatedness between compound meaning and meanings of constituents
  • Examples (T=transparent; O=opaque):
    TT Ahornblatt ‘maple+leaf’
    OO Löwenzahn ‘lion+tooth → dandelion’
    TO Feuerzeug ‘fire+stuff → lighter’
    OT Fliegenpilz ‘fly+mushroom → toadstool’
• Dataset: 244 two-part noun-noun compounds

Compositionality Ratings
Two collections:
1. Compound–Constituent Ratings
2. Compound Whole Ratings

Compound–Constituent Ratings
• Material: 450 concrete, depictable German noun compounds
  • (We use a subset of these)
• Participants: 30 per compound
• Task: degree of compositionality of the compounds with respect to their first as well as their second constituent
• Scale: 1 (definitely opaque) to 7 (definitely transparent)
• Mode: paper+pen
• Data: rating means and standard deviations

Compound Whole Ratings
• Material: 244 noun-noun compounds (subset of above)
• Participants: 27–34 per compound
• Task:
degree of compositionality of the compounds as a whole
• Scale: 1 (definitely opaque) to 7 (definitely transparent)
• Mode: Amazon Mechanical Turk (AMT)
• Data: rating means and standard deviations

Compositionality Ratings: Examples
Mean ratings and standard deviations:

Compound                   Literal meanings of constituents   whole         modifier      head
Ahornblatt ‘maple leaf’    maple + leaf                       6.03 ± 1.49   5.64 ± 1.63   5.71 ± 1.70
Löwenzahn ‘dandelion’      lion + tooth                       1.66 ± 1.54   2.10 ± 1.84   2.23 ± 1.92
Fliegenpilz ‘toadstool’    fly/bow tie + mushroom             2.00 ± 1.20   1.93 ± 1.28   6.55 ± 0.63
Feuerzeug ‘lighter’        fire + stuff                       4.58 ± 1.75   5.87 ± 1.01   1.90 ± 1.03

Compositionality Ratings: Distribution (1)

Vector Space Models: Setup
• Goal: use VSM to identify salient distributional features to predict the degree of compositionality of the compounds
• Corpora: two German web corpora
• Feature Values: local mutual information (Evert, 2005) of co-occurrence counts (between target nouns and features): $\mathrm{LMI} = O \times \log \frac{O}{E}$
• Measure of Relatedness: cosine ∼ degree of compositionality
• Evaluation: cosine against human ratings; Spearman Rank-Order Correlation Coefficient ρ (Siegel and Castellan, 1988)

Baseline and Upper Bound
Upper Bound: correlations between human ratings:
  whole ∼ compound–modifier; whole ∼ compound–head
  addition/multiplication: whole ∼ compound–modifier +/×
compound–head
Baseline: random assignment of rating values [1,7] to compound–modifier and compound–head pairs; correlation of random values against human ratings
  addition/multiplication: whole ∼ rand(compound–modifier) +/× rand(compound–head)

Baseline and Upper Bound

Function          ρ (Baseline)   ρ (Upper Bound)
modifier only     .0959          .6002
head only         .1019          .1385
addition          .1168          .7687
multiplication    .1079          .7829

Corpus Data: German Web Corpora
1. sdeWaC (Faaß et al., 2010)
  • cleaned and parsed version of the German web corpus deWaC created by the WaCky group (Baroni et al., 2009)
  • corpus cleaning: removing duplicates; disregarding syntactically ill-formed sentences; etc.
  • size: approx. 880 million words
  • disadvantage: sentences in the corpus are sorted alphabetically → window co-occurrence refers to x words to left and right BUT within the same sentence
2. WebKo
  • predecessor version of sdeWaC
  • size: approx. 1.5 billion words
  • disadvantage: less clean and not parsed

Window-based VSMs
• Hypothesis 1 (i): adjectives and verbs provide most salient features (for describing noun compounds)
• Task: compare parts-of-speech in predicting compositionality
• Setup:
  • specification of corpus, part-of-speech and window size
  • determine co-occurrence counts and calculate LMI values
  • parts-of-speech: common nouns, adjectives, main verbs
  • window sizes: 1, 2, 5, 10, 20 (, . . .
100)
  • basis: lemmas; no punctuation

Window-based VSMs: Results
• NN > NN+ADJ+VV > VV > ADJ (significant)
• window sizes: 100 = 50 ∼ 20 > 10 > 5 > 2 > 1
• WebKo > sdeWaC (significant; also with sentence-internal windows)
• best result: ρ = 0.6497 (WebKo, NN, window size: 20)

Hypothesis 1 (ii): syntax-based features outperform window-based features
Task: compare the two co-occurrence conditions
Setup:
• corpus choice: sdeWaC (parsed)
• specification of syntactic function
• determine co-occurrence counts and calculate LMI values
• syntactic functions (VS features):
  • nouns in verb subcategorisation:
    • transitive and intransitive subjects
    • concatenation of both trans/intrans features (all subjects)
    • direct objects
    • PP objects
  • noun-modifying adjectives
  • noun-modifying and noun-modified prepositions

Syntax-based VSMs: Results
• window-based > syntax-based
• noun-modifying adjectives ∼ adjectives in window 20
• verbs in window 20 > verb subcategorisation; best verb subcategorisation function: direct object
• abstracting over subject (in)transitivity > specific functions
• concatenation worse than the best individual functions

Role of Modifiers
vs. Heads (1)
• Hypothesis 2: distributional properties of heads are more salient than distributional properties of modifiers
• Perspective (i): salient features for compound–modifier vs. compound–head pairs
• Setup:
  • same as before (window-based and syntax-based)
  • distinguish evaluation of 244 compound–modifier predictions vs. 244 compound–head predictions (instead of abstracting over the constituent type, using all 488 predictions)

Role of Modifiers vs. Heads (1): Results for Windows
window-based:
• NN > NN+ADJ+VV > VV > ADJ (same as before)
• window sizes: 20 > 10 > 5 > 2 > 1 (same as before)
• small windows: compound–head > compound–modifier predictions
• larger windows: difference vanishes

Role of Modifiers vs. Heads (1): Results for Syntax
syntax-based:
• window-based > syntax-based (as before)
• compound–head > compound–modifier predictions (exception: transitive subjects)
• patterns with regard to function types vary (in comparison to previous models, and for modifiers vs. heads)

Role of Modifiers vs. Heads (2)
• Hypothesis 2: distributional properties of heads are more salient than distributional properties of modifiers
• Perspective (ii): contribution of modifiers vs.
heads to compound meaning
• Setup:
  • window-based, window 20, across parts-of-speech
  • correlate only one type of compound–constituent predictions with the compound whole ratings
  • apply addition/multiplication
  • correspondence to upper bound

Role of Modifiers vs. Heads (2): Results
• impact of distributional semantics: modifiers > heads
• multiplication ∼ modifiers only
• multiplication > addition

Summary
• Hypothesis 1 (i): against our intuition, not adjectives or verbs but nouns provided the most salient distributional information.
• Hypothesis 1 (ii): syntax-based predictions were all worse than or the same as predictions by the respective window-based parts-of-speech.
• Best Model: nouns within a 20-word window (ρ = 0.6497)

Summary
• Hypothesis 2 (i):
  • salient features to predict similarities between compound–modifier vs. compound–head pairs are different
  • small windows: distributional similarity between compounds and heads > compounds and modifiers; but difference vanishes in larger contexts
• Hypothesis 2 (ii): influence of modifier meaning on compound meaning is stronger than influence of head meaning
  • in human ratings
  • and in VSMs
• Future Work: learn more about the semantic role of modifiers vs. heads in noun-noun compounds (as do Gagné and Spalding, 2009; 2011, among others).
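
The setup summarised in these slides (LMI-weighted co-occurrence vectors, cosine as the predicted degree of compositionality, Spearman's ρ against human ratings, and addition/multiplication of the two constituent scores) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code. The function names are hypothetical, the co-occurrence counts are invented toy numbers, the expected frequency E is assumed to be (row total × column total) / grand total, and the natural logarithm is assumed. The rating values are the four example rows from the ratings table, so the resulting correlations are not comparable to the reported upper bound.

```python
import math
from collections import defaultdict


def lmi_vectors(counts):
    """Turn {target: {feature: observed count O}} into LMI-weighted vectors.

    Assumption: E = row_total * column_total / grand_total.
    """
    row_totals = {t: sum(feats.values()) for t, feats in counts.items()}
    col_totals = defaultdict(float)
    for feats in counts.values():
        for feature, observed in feats.items():
            col_totals[feature] += observed
    grand_total = sum(row_totals.values())
    vectors = {}
    for target, feats in counts.items():
        vectors[target] = {
            f: o * math.log(o / (row_totals[target] * col_totals[f] / grand_total))
            for f, o in feats.items() if o > 0
        }
    return vectors


def cosine(u, v):
    """Cosine between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else float("nan")


# Invented toy counts (NOT corpus data), only to show the data flow.
toy_counts = {
    "Ahornblatt": {"grün": 8, "Baum": 5, "fallen": 3},
    "Ahorn": {"Baum": 9, "grün": 4, "Sirup": 6},
    "Blatt": {"grün": 7, "fallen": 6, "Papier": 10},
}
vecs = lmi_vectors(toy_counts)
pred_modifier = cosine(vecs["Ahornblatt"], vecs["Ahorn"])
pred_head = cosine(vecs["Ahornblatt"], vecs["Blatt"])

# Mean human ratings of the four example compounds from the ratings table.
whole = [6.03, 1.66, 2.00, 4.58]
modifier = [5.64, 2.10, 1.93, 5.87]
head = [5.71, 2.23, 6.55, 1.90]
added = [m + h for m, h in zip(modifier, head)]
multiplied = [m * h for m, h in zip(modifier, head)]
for name, scores in [("modifier only", modifier), ("head only", head),
                     ("addition", added), ("multiplication", multiplied)]:
    print(name, round(spearman(whole, scores), 4))
```

In a model evaluation, the `modifier` and `head` lists would be replaced by the cosine predictions for each compound–constituent pair, while `whole` stays the human rating.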
Compositionality Ratings: Distribution (2)

Window-based VSMs: Results
Context windows: sentence-internal only (sdeWaC, nouns only) vs. sentence-external (WebKo, nouns only).
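
The sentence-internal window condition can be illustrated with a short sketch: context words are counted up to x tokens to the left and right of a target noun, but never across a sentence boundary, and only for the selected parts-of-speech. The input format (sentences as lists of `(lemma, pos)` pairs with punctuation already removed), the function name `window_counts`, and the toy sentences are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def window_counts(sentences, targets, window, pos_filter=frozenset({"NN"})):
    """Count context lemmas within `window` tokens of each target lemma.

    Windows are clipped at the sentence boundaries, so co-occurrence is
    strictly sentence-internal. Only context words whose POS tag is in
    `pos_filter` are counted.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for i, (lemma, _pos) in enumerate(sentence):
            if lemma not in targets:
                continue
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j == i:
                    continue
                context_lemma, context_pos = sentence[j]
                if context_pos in pos_filter:
                    counts[lemma][context_lemma] += 1
    return counts


# Two invented toy sentences (lemmatised, POS-tagged, no punctuation).
toy_sentences = [
    [("Ahornblatt", "NN"), ("fallen", "VV"), ("von", "APPR"), ("Baum", "NN")],
    [("Herbst", "NN"), ("kommen", "VV")],
]
noun_counts = window_counts(toy_sentences, {"Ahornblatt"}, window=20)
print(dict(noun_counts["Ahornblatt"]))
```

Even with a window of 20, `Herbst` is not counted as a context of `Ahornblatt` here, because it occurs in a different sentence. Counts of this kind would then be the observed frequencies that enter the LMI weighting.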