
Motivation
Compound Data and Ratings
Vector Space Models
Exploring Vector Space Models to Predict the Compositionality of German Noun-Noun Compounds
Sabine Schulte im Walde, Stefan Müller, Stephen Roller
Institut für Maschinelle Sprachverarbeitung (IMS)
Universität Stuttgart, Germany
*SEM, Atlanta
June 13-14, 2013
Sabine Schulte im Walde, Stefan Müller, Stephen Roller
German Compound Nouns – VSMs and Compositionality
Overview
• Motivation and Background
• Description of Compositionality Ratings & Data Sets
• Evaluation & Baselines
• Predicting Compound-Constituent Ratings
• Part-of-Speech Feature Comparison
• Syntax Feature Comparison
• Predicting Compound Whole Ratings
• Conclusions
Motivation
• Vector Space Models (VSMs): explore the notion of
“similarity” between a set of target objects within a geometric
setting (Turney and Pantel, 2010; Erk, 2012).
• Distributional Semantics: exploit the distributional hypothesis
(Firth, 1957; Harris, 1968) to determine co-occurrence
features for vector space models that best describe the words,
phrases, sentences, etc. of interest.
• Salient Distributional Features in VSMs: there is general knowledge about useful features, but not about their salience across phenomena.
• Linguist – Computational Linguist loop
• Phenomenon: German noun-noun compounds, such as Feuerwerk ‘fireworks’ (Feuer ‘fire’ + Werk ‘opus’).
Hypotheses
1. Targets in the vector space models are nouns (compound nouns, modifier nouns, head nouns):
→ adjectives and verbs provide the most salient features,
→ syntax-based features outperform window-based features.
2. Contributions of modifier noun vs. head noun: distributional properties of heads are more salient than distributional properties of modifiers in predicting the degree of compositionality of the compounds.
German Noun-Noun Compounds
• German noun-noun compounds:
• combinations of two or more simplex nouns
• grammatical head is a noun (German: rightmost constituent)
• modifier is a noun
• Examples: Ahornblatt ‘maple leaf’, Obstkuchen ‘fruit cake’
• Degree of Compositionality: semantic relatedness between
compound meaning and meanings of constituents
• Examples (T=transparent; O=opaque):
TT Ahornblatt ‘maple+leaf’
OO Löwenzahn ‘lion+tooth → dandelion’
TO Feuerzeug ‘fire+stuff → lighter’
OT Fliegenpilz ‘fly+mushroom → toadstool’
• Dataset: 244 two-part noun-noun compounds
Compositionality Ratings
Two collections:
1. Compound–Constituent Ratings
2. Compound Whole Ratings
Compound–Constituent Ratings
• Material: 450 concrete, depictable German noun compounds
• (We use a subset of these)
• Participants: 30 per compound
• Task: degree of compositionality of the compounds with
respect to their first as well as their second constituent
• Scale: 1 (definitely opaque) to 7 (definitely transparent)
• Mode: paper+pen
• Data: rating means and standard deviations
Compound Whole Ratings
• Material: 244 noun-noun compounds (subset of above)
• Participants: 27–34 per compound
• Task: degree of compositionality of the compounds as a whole
• Scale: 1 (definitely opaque) to 7 (definitely transparent)
• Mode: Amazon Mechanical Turk (AMT)
• Data: rating means and standard deviations
Compositionality Ratings: Examples
Compound                  Literal meanings       Mean ratings ± standard deviations
                          modifier     head      whole          modifier       head
Ahornblatt ‘maple leaf’   maple        leaf      6.03 ± 1.49    5.64 ± 1.63    5.71 ± 1.70
Löwenzahn ‘dandelion’     lion         tooth     1.66 ± 1.54    2.10 ± 1.84    2.23 ± 1.92
Fliegenpilz ‘toadstool’   fly/bow tie  mushroom  2.00 ± 1.20    1.93 ± 1.28    6.55 ± 0.63
Feuerzeug ‘lighter’       fire         stuff     4.58 ± 1.75    5.87 ± 1.01    1.90 ± 1.03
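As a small worked example, the four compounds in the table can be mapped back to the T/O classes introduced earlier. The Python sketch below assumes a hypothetical cut-off at the scale midpoint (4.0); the study reports graded mean ratings and does not define such a threshold.

```python
# Mean ratings (whole, modifier, head) of the four example compounds.
RATINGS = {
    "Ahornblatt": (6.03, 5.64, 5.71),
    "Löwenzahn": (1.66, 2.10, 2.23),
    "Fliegenpilz": (2.00, 1.93, 6.55),
    "Feuerzeug": (4.58, 5.87, 1.90),
}

# Hypothetical cut-off at the midpoint of the 1-7 scale (not part of the study).
MIDPOINT = 4.0


def transparency_label(modifier_mean: float, head_mean: float,
                       cutoff: float = MIDPOINT) -> str:
    """Two-letter label: T/O for the modifier, then T/O for the head."""
    def letter(mean: float) -> str:
        return "T" if mean > cutoff else "O"
    return letter(modifier_mean) + letter(head_mean)


labels = {compound: transparency_label(mod, head)
          for compound, (_whole, mod, head) in RATINGS.items()}
print(labels)
```

With this cut-off the sketch yields TT for Ahornblatt, OO for Löwenzahn, OT for Fliegenpilz and TO for Feuerzeug, matching the classes given for these compounds. Compounds with means close to the midpoint would be labelled arbitrarily, which is one reason to work with the graded ratings instead.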
Compositionality Ratings: Distribution (1)
Vector Space Models: Setup
• Goal: use VSM to identify salient distributional features to
predict the degree of compositionality of the compounds
• Corpora: two German web corpora
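• Feature Values: local mutual information (LMI; Evert, 2005) of co-occurrence counts between target nouns and features:
$\mathrm{LMI} = O \times \log \frac{O}{E}$,
where O is the observed and E the expected co-occurrence frequency
• Measure of Relatedness: cosine between compound and constituent vectors, taken as the degree of compositionality
• Evaluation: correlation of the cosine values with the human ratings, using Spearman's rank-order correlation coefficient ρ (Siegel and Castellan, 1988)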
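The setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `lmi_vectors`, `cosine`, `ranks` and `spearman` are hypothetical, marginal frequencies are computed from the count table itself rather than from the full corpus, negative LMI values are kept, and the natural logarithm is used (changing the log base rescales all LMI values by one constant, which leaves the cosines unchanged).

```python
import math
from collections import Counter, defaultdict


def lmi_vectors(counts):
    """Turn observed co-occurrence counts into LMI-weighted sparse vectors.

    counts maps (target, feature) -> observed frequency O. Marginals and the
    sample size N are taken from this table itself (a simplifying assumption).
    """
    n = sum(counts.values())
    target_freq, feature_freq = defaultdict(float), defaultdict(float)
    for (target, feature), o in counts.items():
        target_freq[target] += o
        feature_freq[feature] += o
    vectors = defaultdict(dict)
    for (target, feature), o in counts.items():
        if o > 0:
            # Expected frequency under independence of target and feature.
            e = target_freq[target] * feature_freq[feature] / n
            vectors[target][feature] = o * math.log(o / e)  # LMI = O * log(O / E)
    return dict(vectors)


def cosine(u, v):
    """Cosine of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    first = {}
    for idx, value in enumerate(sorted(xs)):
        first.setdefault(value, idx)
    freq = Counter(xs)
    return [first[x] + (freq[x] + 1) / 2 for x in xs]


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm if norm else float("nan")
```

For evaluation, the cosine of each compound–constituent pair goes into one list and the corresponding mean human rating into a parallel list; `spearman(cosines, ratings)` then gives ρ.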
Baseline and Upper Bound
Upper Bound: correlations between human ratings:
• whole ∼ compound–modifier
• whole ∼ compound–head
• addition/multiplication: whole ∼ compound–modifier +/× compound–head
Baseline: random assignment of rating values in [1,7] to compound–modifier and compound–head pairs; correlation of the random values against the human ratings:
• addition/multiplication: whole ∼ rand(compound–modifier) +/× rand(compound–head)
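The upper bound and the random baseline can be illustrated on the four example compounds from the ratings table. This Python sketch is hypothetical: it assumes continuous uniform random values in [1, 7] and a single seeded draw, since the slides do not say whether the random assignment was repeated or averaged; `evaluate` is an illustrative helper name.

```python
import math
import random
from collections import Counter

# Mean human ratings of Ahornblatt, Löwenzahn, Fliegenpilz, Feuerzeug.
WHOLE = [6.03, 1.66, 2.00, 4.58]
MODIFIER = [5.64, 2.10, 1.93, 5.87]
HEAD = [5.71, 2.23, 6.55, 1.90]


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    first = {}
    for idx, value in enumerate(sorted(xs)):
        first.setdefault(value, idx)
    freq = Counter(xs)
    return [first[x] + (freq[x] + 1) / 2 for x in xs]


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm if norm else float("nan")


def evaluate(mod_scores, head_scores, whole):
    """Correlate the four prediction functions with the whole-compound ratings."""
    return {
        "modifier only": spearman(mod_scores, whole),
        "head only": spearman(head_scores, whole),
        "addition": spearman([m + h for m, h in zip(mod_scores, head_scores)], whole),
        "multiplication": spearman([m * h for m, h in zip(mod_scores, head_scores)], whole),
    }


# Upper bound: the human compound-constituent ratings serve as predictors.
upper_bound = evaluate(MODIFIER, HEAD, WHOLE)

# Baseline: random values in [1, 7] per compound-constituent pair
# (continuous uniform values and a fixed seed are assumptions).
rng = random.Random(42)
rand_mod = [rng.uniform(1, 7) for _ in WHOLE]
rand_head = [rng.uniform(1, 7) for _ in WHOLE]
baseline = evaluate(rand_mod, rand_head, WHOLE)
print(upper_bound)
print(baseline)
```

On these four compounds the upper bound comes out at 0.6 (modifier only), 0.0 (head only) and 0.8 (addition and multiplication). With only four items these values are purely illustrative and not comparable to the results on the full data set.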
Baseline and Upper Bound
Function          Baseline ρ   Upper Bound ρ
modifier only     .0959        .6002
head only         .1019        .1385
addition          .1168        .7687
multiplication    .1079        .7829
Corpus Data: German Web Corpora
1. sdeWaC (Faaß et al., 2010)
• cleaned and parsed version of the German web corpus deWaC created by the WaCky group (Baroni et al., 2009)
• corpus cleaning: removing duplicates; disregarding syntactically ill-formed sentences; etc.
• size: approx. 880 million words
• disadvantage: sentences in the corpus are sorted alphabetically
→ window co-occurrence refers to x words to the left and right, BUT only within the same sentence
2. WebKo
• predecessor version of sdeWaC
• size: approx. 1.5 billion words
• disadvantage: less clean and not parsed
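The effect of the alphabetical sentence order on window extraction can be sketched in Python. The function name and the input format (one list of lemmas per sentence) are assumptions; with `sentence_internal=False` the sentences are treated as one running text, which only makes sense for a corpus that preserves the original sentence order.

```python
from collections import Counter


def window_cooccurrences(sentences, targets, window, sentence_internal=True):
    """Count (target, context) pairs within +/- `window` tokens.

    sentences: list of token lists (lemmas). With sentence_internal=True the
    window stops at sentence boundaries, as required for sdeWaC, whose
    alphabetically sorted sentences are unrelated to their neighbours.
    Otherwise all sentences are concatenated into one token stream.
    """
    streams = sentences if sentence_internal else [[tok for s in sentences for tok in s]]
    counts = Counter()
    for tokens in streams:
        for i, tok in enumerate(tokens):
            if tok not in targets:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(tok, tokens[j])] += 1
    return counts


# Hypothetical two-sentence corpus.
SENTENCES = [["Feuer", "brennen"], ["Werk", "fertig"]]
inside = window_cooccurrences(SENTENCES, {"Feuer"}, 2)
outside = window_cooccurrences(SENTENCES, {"Feuer"}, 2, sentence_internal=False)
print(inside, outside)
```

Only the sentence-external variant lets "Feuer" co-occur with "Werk" from the following sentence; for an alphabetically sorted corpus such cross-sentence pairs would be noise.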
Window-based VSMs
• Hypothesis 1 (i):
adjectives and verbs provide most salient features
(for describing noun compounds)
• Task: compare parts-of-speech in predicting compositionality
• Setup:
• specification of corpus, part of speech and window size
• determine co-occurrence counts and calculate LMI values
• parts of speech: common nouns (NN), adjectives (ADJ), main verbs (VV)
• window sizes: 1, 2, 5, 10, 20 (and larger windows up to 100)
• basis: lemmas; no punctuation
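A sketch of window-based feature extraction with part-of-speech filtering, in Python. It assumes lemmatised input tagged with the STTS tagset (NN for common nouns, ADJA/ADJD for adjectives, VV* for main verbs, $-tags for punctuation); the slides do not name the tagset, and the function names are hypothetical. Punctuation is removed before the window is applied, and each feature carries its coarse part of speech so that NN, ADJ and VV features can be combined into one NN+ADJ+VV space.

```python
from collections import Counter


def pos_class(tag):
    """Map an STTS tag to the coarse classes NN, ADJ, VV (None otherwise)."""
    if tag == "NN":
        return "NN"
    if tag.startswith("ADJ"):
        return "ADJ"
    if tag.startswith("VV"):
        return "VV"
    return None


def pos_window_counts(sentence, targets, window, classes):
    """sentence: list of (lemma, stts_tag) pairs; classes: e.g. {"NN", "ADJ", "VV"}."""
    # Drop punctuation first, so it does not occupy window positions.
    tokens = [(lemma, tag) for lemma, tag in sentence if not tag.startswith("$")]
    counts = Counter()
    for i, (lemma, _tag) in enumerate(tokens):
        if lemma not in targets:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            ctx_lemma, ctx_tag = tokens[j]
            cls = pos_class(ctx_tag)
            if cls in classes:
                counts[(lemma, f"{ctx_lemma}_{cls}")] += 1
    return counts


# Hypothetical tagged and lemmatised sentence.
SENTENCE = [("der", "ART"), ("Feuerwerk", "NN"), ("sein", "VAFIN"), ("laut", "ADJD"),
            (",", "$,"), ("Kind", "NN"), ("staunen", "VVFIN"), (".", "$.")]
print(pos_window_counts(SENTENCE, {"Feuerwerk"}, 3, {"NN", "ADJ", "VV"}))
```

Counts obtained this way for each window size would then be LMI-weighted before computing cosines. Note that auxiliaries such as "sein" (VAFIN) are not main verbs and are therefore ignored.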
Window-based VSMs: Results
• NN > NN+ADJ+VV > VV > ADJ (significant)
• window sizes: 100 = 50 ∼ 20 > 10 > 5 > 2 > 1
• WebKo > sdeWaC (significant; also with sentence-internal windows)
• best result: ρ = 0.6497 (WebKo, NN, window size: 20)
Hypothesis 1 (ii):
syntax-based features outperform window-based features
Task: compare the two co-occurrence conditions
Setup:
• corpus choice: sdeWaC (parsed)
• specification of syntactic function
• determine co-occurrence counts and calculate LMI values
• syntactic functions (VS features):
• nouns in verb subcategorisation: transitive and intransitive subjects; concatenation of both trans/intrans features (all subjects); direct objects; PP objects
• noun-modifying adjectives
• noun-modifying and noun-modified prepositions
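Syntax-based features can be represented by prefixing each co-occurring word with its syntactic function. The Python sketch below assumes that (function, noun, word) triples have already been extracted from the parses; the triple format, the function labels (`subj_trans`, `subj_intrans`, `obj`, `adj`) and the helper name are hypothetical. Selecting several functions at once yields a concatenated feature space in which the functions remain distinct dimensions.

```python
from collections import Counter


def syntax_counts(triples, functions):
    """Count syntax-based co-occurrences.

    triples: (function, target_noun, word) tuples extracted from parses
    (hypothetical format). Only the selected functions are kept; each feature
    is prefixed with its function, so combining several functions
    concatenates them as separate dimensions.
    """
    counts = Counter()
    for function, noun, word in triples:
        if function in functions:
            counts[(noun, f"{function}:{word}")] += 1
    return counts


# Hypothetical triples (function labels invented for illustration).
TRIPLES = [
    ("subj_trans", "Hund", "beißen"),
    ("subj_intrans", "Hund", "bellen"),
    ("obj", "Feuerwerk", "zünden"),
    ("obj", "Feuerwerk", "zünden"),
    ("adj", "Feuerwerk", "bunt"),
]

objects_only = syntax_counts(TRIPLES, {"obj"})
all_subjects = syntax_counts(TRIPLES, {"subj_trans", "subj_intrans"})
print(objects_only, all_subjects)
```

The resulting counts can be LMI-weighted exactly like the window-based counts.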
Syntax-based VSMs: Results
• window-based > syntax-based
• noun-modifying adjectives ∼ adjectives in window 20
• verbs in window 20 > verb subcategorisation;
best verb subcategorisation function: direct object
• abstracting over subject (in)transitivity > specific functions
• concatenation worse than the best individual functions
Role of Modifiers vs. Heads (1)
• Hypothesis 2:
distributional properties of heads are more salient than
distributional properties of modifiers
• Perspective (i): salient features for compound–modifier vs.
compound–head pairs
• Setup:
• same as before (window-based and syntax-based)
• evaluate the 244 compound–modifier predictions and the 244 compound–head predictions separately (instead of abstracting over the constituent type and using all 488 predictions)
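The difference between separate and pooled evaluation amounts to computing ρ on subsets of the predictions. In this Python sketch each prediction is a hypothetical (constituent type, cosine, mean rating) triple; the toy values and the helper name `evaluate_by_constituent` are invented for illustration.

```python
import math
from collections import Counter


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    first = {}
    for idx, value in enumerate(sorted(xs)):
        first.setdefault(value, idx)
    freq = Counter(xs)
    return [first[x] + (freq[x] + 1) / 2 for x in xs]


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return cov / norm if norm else float("nan")


def evaluate_by_constituent(predictions):
    """predictions: (constituent_type, cosine, mean_rating) triples,
    with constituent_type either "modifier" or "head"."""
    results = {}
    for ctype in ("modifier", "head"):
        subset = [(cos, rating) for t, cos, rating in predictions if t == ctype]
        results[ctype] = spearman([c for c, _ in subset], [r for _, r in subset])
    # Pooled evaluation over all predictions, ignoring the constituent type.
    results["all"] = spearman([c for _, c, _ in predictions], [r for _, _, r in predictions])
    return results


# Hypothetical toy predictions.
PREDICTIONS = [
    ("modifier", 0.1, 2.0), ("modifier", 0.5, 4.0), ("modifier", 0.9, 6.0),
    ("head", 0.2, 6.0), ("head", 0.4, 4.0), ("head", 0.6, 2.0),
]
results = evaluate_by_constituent(PREDICTIONS)
print(results)
```

In this toy case the two constituent-specific correlations are perfect but in opposite directions, while the pooled ρ is weak; separate evaluation exposes differences that a single pooled score can hide.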
Role of Modifiers vs. Heads (1): Results for Windows
window-based:
• NN > NN+ADJ+VV > VV > ADJ (same as before)
• window sizes: 20 > 10 > 5 > 2 > 1 (same as before)
• small windows: compound–head > compound–modifier predictions
• larger windows: difference vanishes
Role of Modifiers vs. Heads (1): Results for Syntax
syntax-based:
• window-based > syntax-based (as before)
• compound–head > compound–modifier predictions (exception: transitive subjects)
• patterns with regard to function types vary
(in comparison to previous models, and for modifiers vs. heads)
Role of Modifiers vs. Heads (2)
• Hypothesis 2:
distributional properties of heads are more salient than
distributional properties of modifiers
• Perspective (ii): contribution of modifiers vs. heads to compound
meaning
• Setup:
• window-based, window 20, across parts-of-speech
• correlate only one type of compound–constituent predictions (modifier or head) with the compound whole ratings
• combine both types by addition/multiplication
• compare the results to the corresponding upper bound
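For this perspective the compound–constituent cosines are used directly as predictors of the whole-compound ratings. The Python sketch below derives the four prediction lists from already weighted vectors; the toy vectors, feature names and the helper `prediction_lists` are hypothetical. Each resulting list would then be correlated with the whole-compound ratings using Spearman's ρ, in the same way as for the upper bound.

```python
import math


def cosine(u, v):
    """Cosine of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def prediction_lists(constituents, vectors):
    """constituents maps compound -> (modifier, head); returns one list per function."""
    preds = {"modifier only": [], "head only": [], "addition": [], "multiplication": []}
    for compound, (modifier, head) in constituents.items():
        c_mod = cosine(vectors[compound], vectors[modifier])
        c_head = cosine(vectors[compound], vectors[head])
        preds["modifier only"].append(c_mod)
        preds["head only"].append(c_head)
        preds["addition"].append(c_mod + c_head)
        preds["multiplication"].append(c_mod * c_head)
    return preds


# Toy LMI-weighted vectors (hypothetical values and features).
VECTORS = {
    "Ahornblatt": {"grün": 2.0, "Baum": 1.0},
    "Ahorn": {"Baum": 2.0, "Sirup": 1.0},
    "Blatt": {"grün": 2.0, "Papier": 1.0},
}
preds = prediction_lists({"Ahornblatt": ("Ahorn", "Blatt")}, VECTORS)
print(preds)
```

Multiplication rewards compounds that are similar to both constituents, whereas addition lets one high cosine compensate for a low one.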
Role of Modifiers vs. Heads (2): Results
• impact of distributional semantics: modifiers > heads
• multiplication ∼ modifiers only
• multiplication > addition
Summary
• Hypothesis 1 (i): against our intuition, not adjectives or verbs but
nouns provided the most salient distributional information.
• Hypothesis 1 (ii): syntax-based predictions were all worse than or equal to the predictions by the respective window-based parts of speech.
• Best Model: nouns within a 20-word window (ρ = 0.6497)
Summary
• Hypothesis 2 (i):
• salient features to predict similarities between compound–modifier
vs. compound–head pairs are different
• small windows: distributional similarity between
compounds and heads > compounds and modifiers;
but difference vanishes in larger contexts
• Hypothesis 2 (ii): influence of modifier meaning on compound
meaning is stronger than influence of head meaning
• in human ratings
• and in VSMs
• Future Work: learn more about the semantic role of modifiers vs.
heads in noun-noun compounds (as do Gagné and Spalding, 2009;
2011, among others).
Compositionality Ratings: Distribution (2)
Window-based VSMs: Results
Context windows only: sentence-internal windows on sdeWaC (nouns only) vs. sentence-external windows on WebKo (nouns only).