Deriving adjectival scales from continuous space word representations
Joo-Kyung Kim and Marie-Catherine de Marneffe
The Ohio State University
EMNLP 2013
Overview
Q: Was the movie good?
A: It was excellent.
Yes, the movie was good.
Q: Was the movie good?
A: It was okay.
No, the movie was not good.
okay < good < excellent
[Mikolov et al., 2013]: syntactic and semantic regularities in continuous word representations learned by a Recurrent Neural Network Language Model (RNNLM)
[Mikolov et al., 2010]: off-the-shelf continuous space word representations
http://www.fit.vutbr.cz/~imikolov/rnnlm/word_projections-1600.txt.gz
Can we derive adjectival scales from such representations?
Continuous word representations from RNNLM
Figure: RNNLM architecture [Mikolov et al., 2010]. Input layer w(t): 1-of-N encoding (N nodes); hidden layer s(t) (M nodes), fed by the input and by the recurrent copy s(t-1) (M nodes); output layer y(t) (N nodes).
$s(t) = \mathrm{sigmoid}(U w(t) + W s(t-1))$
$y(t) = \mathrm{softmax}(V s(t))$, with $V \in \mathbb{R}^{N \times M}$
From the matrix U, the M-dimensional column vector of each word is used as its continuous space word representation (M = 1,600 in our experiments).
Both the current word and the previous words (the context) influence training.
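A minimal sketch of one RNNLM time step in plain Python, written from the two equations above. The toy sizes and weights are hypothetical, and training (backpropagation through time) is not shown. It also shows why a word's representation is a column of U: with a 1-of-N input, U w(t) simply selects that column.

```python
import math

# Sketch of one RNNLM time step. Matrices are lists of rows; all sizes and
# values are hypothetical toy numbers (the real model uses M = 1,600).

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    m = max(z)                               # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def word_vector(U, word_index):
    """Column `word_index` of U (M x N): the word's M-dimensional representation."""
    return [row[word_index] for row in U]

def rnnlm_step(U, W, V, word_index, s_prev):
    """s(t) = sigmoid(U w(t) + W s(t-1)); y(t) = softmax(V s(t)).

    Since w(t) is a 1-of-N vector, U w(t) is just the word's column of U.
    """
    u_w = word_vector(U, word_index)
    w_s = matvec(W, s_prev)
    s = [sigmoid(a + b) for a, b in zip(u_w, w_s)]
    y = softmax(matvec(V, s))                # distribution over the N words
    return s, y

# Toy example: N = 3 words, M = 2 hidden units (hypothetical weights).
U = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]                        # M x N
W = [[0.0, 0.1],
     [0.1, 0.0]]                             # M x M
V = [[0.2, 0.1],
     [0.0, 0.3],
     [0.5, 0.5]]                             # N x M
s, y = rnnlm_step(U, W, V, word_index=1, s_prev=[0.0, 0.0])
print(word_vector(U, 2))                     # [0.3, 0.6]
```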
Syntactic/semantic regularities of continuous word representations in RNNLM
[Mikolov et al., 2013]
Figure panels: gender transformation; number transformation
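The gender and number transformations are usually illustrated with the vector offset method: "a is to b as c is to ?" is answered by the word closest (by cosine) to b - a + c. The sketch below uses hypothetical 3-d toy vectors, not RNNLM vectors, and excludes the three query words from the candidates.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" via the offset b - a + c,
    returning the most cosine-similar word other than a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hypothetical toy vectors: dim 1 ~ "royal", dim 2 ~ "male", dim 3 ~ "plural".
vectors = {
    "man":    [0.2, 0.9, 0.0],
    "woman":  [0.2, 0.1, 0.0],
    "king":   [0.8, 0.9, 0.1],
    "queen":  [0.8, 0.1, 0.1],
    "kings":  [0.8, 0.9, 0.9],
    "queens": [0.8, 0.1, 0.9],
}
print(analogy("man", "woman", "king", vectors))    # gender: queen
print(analogy("king", "kings", "queen", vectors))  # number: queens
```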
Deriving adjectival scales from continuous word representations
We assume that intermediate vectors between two word vectors in the continuous space represent some “middle” form or meaning.
Figure: intermediate points x1, x2, x3 on the segment between a = furious and b = calm; nearby words include angry, tense and quiet.
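The assumption above can be sketched as: compute points a + t(b - a) at the quarter points and the center, then report the word with the highest cosine similarity to each point. This is a sketch under stated assumptions: the 2-d vectors are hypothetical stand-ins for the 1,600-d RNNLM vectors, and the two input words are excluded from the candidates.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def interpolate(a, b, t):
    """Point a + t * (b - a): t = 0.5 is the center, 0.25/0.75 the quarter points."""
    return [x + t * (y - x) for x, y in zip(a, b)]

def intermediate_words(w1, w2, vectors, fractions=(0.25, 0.5, 0.75)):
    """For each intermediate point, the most cosine-similar word other than w1, w2."""
    a, b = vectors[w1], vectors[w2]
    result = []
    for t in fractions:
        p = interpolate(a, b, t)
        best = max((w for w in vectors if w not in (w1, w2)),
                   key=lambda w: cosine(p, vectors[w]))
        result.append((best, round(cosine(p, vectors[best]), 3)))
    return result

# Hypothetical 2-d toy vectors.
vectors = {
    "furious": [1.0, 0.0],
    "angry":   [0.95, 0.31],
    "tense":   [0.7, 0.7],
    "quiet":   [0.31, 0.95],
    "calm":    [0.0, 1.0],
}
print(intermediate_words("furious", "calm", vectors))
```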
Intermediate words between base forms and superlative forms
Figure: is the <comparative> located between the <base form> and the <superlative>?
Words with the highest cosine similarities to the center:
Base | Superlative | Nearest words to the center (cosine similarity)
good | best | better .738, strong .644, normal .619, less .609
bad | worst | terrible .726, great .678, horrible .674, worse .665
slow | slowest | slower .637, sluggish .614, steady .558, brisk .543
fast | fastest | faster .645, slower .602, quicker .542, harder .518
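The ranking in the table above can be sketched as: take the center (midpoint) of the base and superlative vectors and sort the remaining vocabulary by cosine similarity to it. The toy vectors are hypothetical, and excluding the two input words is an assumption inferred from the table (neither input appears in its own row).

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_to_center(base, superlative, vectors, k=4):
    """Top-k words (excluding the two inputs) by cosine similarity to the center."""
    center = [(x + y) / 2 for x, y in zip(vectors[base], vectors[superlative])]
    scored = [(w, cosine(center, v)) for w, v in vectors.items()
              if w not in (base, superlative)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 2-d toy vectors.
vectors = {
    "good":   [1.0, 0.2],
    "best":   [0.2, 1.0],
    "better": [0.6, 0.6],
    "strong": [0.9, 0.6],
    "normal": [1.0, -0.5],
}
for word, sim in nearest_to_center("good", "best", vectors, k=3):
    print(f"{word}: {sim:.3f}")
```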
Intermediate words between two semantically related adjectives
Word with the highest cosine similarity to each intermediate point:
1st input word | 1st quarter | half | 3rd quarter | 2nd input word
furious | angry .615 | tense .465 | quiet .560 | calm
furious | angry .632 | unhappy .640 | pleased .516 | happy
terrible | horrible .783 | incredible .714 | wonderful .772 | terrific
cold | mild .348 | warm .517 | sticky .424 | hot
ugly | nasty .672 | wacky .645 | lovely .715 | gorgeous
Evaluation: corpus of indirect answers to yes/no questions
[de Marneffe et al., 2010]
125 indirect question-answer pairs (IQAPs) in which both the question and the answer contain an adjective
Q: Is Obama qualified?
A: I think he's young.
Each pair is annotated via Mechanical Turk for whether the answer conveys:
- yes
- no
- uncertain
Classifying the IQAPs
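Q: Is Obama qualified?
A: I think he's young.
1. Get the antonym of the question word from WordNet (qualified → unqualified).
2. Draw a line connecting the question word and its antonym.
3. The decision boundary is the hyperplane perpendicular to this line that passes through the midpoint between the question word and the antonym.
4. Check which side of the boundary the answer word (young) falls on.
Figure: qualified and unqualified on either side of the decision boundary, with the answer word young.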
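Steps 2 to 4 above can be sketched as a sign test: the answer is on the question word's side when (w_a - c) · (w_q - w_anti) > 0, where c is the midpoint of w_q and w_anti. This is a sketch under stated assumptions: `classify_answer` is a hypothetical helper, the vectors are toy values, mapping the question's side to "yes" and the antonym's side to "no" is assumed, ties go to "no" arbitrarily, and the "uncertain" label is not modeled.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classify_answer(question, antonym, answer, vectors):
    """Hypothetical helper: 'yes' if the answer lies on the question word's side
    of the hyperplane perpendicular to (w_q - w_anti) through their midpoint."""
    q, n, a = vectors[question], vectors[antonym], vectors[answer]
    center = [(x + y) / 2 for x, y in zip(q, n)]
    normal = [x - y for x, y in zip(q, n)]           # points from antonym to question
    side = dot([x - c for x, c in zip(a, center)], normal)
    return "yes" if side > 0 else "no"               # ties -> "no" (arbitrary)

# Hypothetical 2-d toy vectors.
vectors = {
    "good":      [1.0, 0.0],
    "bad":       [-1.0, 0.0],
    "excellent": [1.5, 0.2],
    "okay":      [-0.2, 0.3],
}
print(classify_answer("good", "bad", "excellent", vectors))  # yes
print(classify_answer("good", "bad", "okay", vectors))       # no
```

This test is equivalent to asking whether the answer vector is closer, in Euclidean distance, to the question word than to the antonym, since |w_a - w_anti|² - |w_a - w_q|² = 2 (w_a - c) · (w_q - w_anti).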
Choosing the antonym
Q: Was the movie good?
A: It was excellent.
• A word can have multiple antonyms with different senses.
• We need to choose the antonym that is most related to both the question and the answer.
• We choose the antonym that is most collinear with the question and the answer in the continuous word space:
• $\arg\max_{\mathit{anti}} \cos(w_q - w_a,\, w_q - w_{\mathit{anti}})$
Figure: question word good, answer word excellent, candidate antonyms evil and bad.
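The selection criterion above can be sketched directly: score each candidate antonym by cos(w_q - w_a, w_q - w_anti) and keep the maximum. The candidates would in practice come from WordNet (for example through NLTK's WordNet interface); here they are passed in as a plain list, and the toy vectors are hypothetical. The question/answer pair good/terrible is taken from the "terrible idea" example later in the talk.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def choose_antonym(question, answer, candidates, vectors):
    """argmax over anti of cos(w_q - w_a, w_q - w_anti), as in the formula above."""
    wq, wa = vectors[question], vectors[answer]
    q_minus_a = [x - y for x, y in zip(wq, wa)]

    def score(anti):
        q_minus_anti = [x - y for x, y in zip(wq, vectors[anti])]
        return cosine(q_minus_a, q_minus_anti)

    return max(candidates, key=score)

# Hypothetical 3-d toy vectors.
vectors = {
    "good":     [1.0, 0.0, 0.0],
    "terrible": [-1.5, 0.1, 0.0],
    "bad":      [-1.0, 0.0, 0.0],
    "evil":     [0.0, 0.0, 1.0],
}
print(choose_antonym("good", "terrible", ["bad", "evil"], vectors))  # bad
```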
Deriving adjectival scales by [de Marneffe et al., 2010]
Movie reviews come with ratings.
$ER(w) = \sum_{r \in R} r \Pr(r \mid w)$ is used as the scale of the word $w$.
• Requires data with numerical labels (e.g., movie ratings).
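A small sketch of the expected rating ER(w) and of ordering words by it. It assumes P(r | w) is estimated by simple relative frequency from counts of reviews with rating r that contain w; the original work may normalize differently, and the counts below are hypothetical.

```python
def expected_rating(counts):
    """ER(w) = sum_r r * P(r | w), with P(r | w) estimated by relative frequency
    (an assumption) from counts[r] = number of reviews rated r containing w."""
    total = sum(counts.values())
    return sum(r * c / total for r, c in counts.items())

# Hypothetical review counts per rating (1-5) for three adjectives.
toy_counts = {
    "okay":      {1: 20, 2: 30, 3: 30, 4: 15, 5: 5},
    "good":      {1: 5, 2: 15, 3: 30, 4: 35, 5: 15},
    "excellent": {1: 2, 2: 3, 3: 10, 4: 35, 5: 50},
}
scale = sorted(toy_counts, key=lambda w: expected_rating(toy_counts[w]))
print(scale)  # ['okay', 'good', 'excellent']
```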
Deriving adjectival scales by [Mohtarami et al., 2011]
Figure: SVD, dimension reduction and reconstruction [Courtesy of Wang Houfeng, 2010]
$U_k \times D_k$ gives the $k$-dimensional continuous representations of the terms; relative positions in this latent semantic space are then used to derive the scales.
• Singular Value Decomposition (SVD) assumes normally distributed data.
• A bag-of-words model (word order within a document is ignored).
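A minimal NumPy sketch of the U_k × D_k step for a terms × documents matrix. The count matrix is hypothetical, and how Mohtarami et al. read scales off relative positions is not reproduced here.

```python
import numpy as np

def lsa_term_vectors(X, k):
    """Return U_k * D_k (k-dim term vectors) and V_k^T for a terms x documents X."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)  # d: singular values, descending
    return U[:, :k] * d[:k], Vt[:k, :]                # scaling columns == U_k @ diag(d_k)

# Hypothetical 4-term x 3-document count matrix.
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 0.0]])
terms_k, Vt_k = lsa_term_vectors(X, k=2)
print(terms_k.shape)         # (4, 2)
X_k = terms_k @ Vt_k         # rank-2 reconstruction of X
```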
Deriving adjectival scales by [de Melo & Bansal, 2013]
Find intensity scales using regular expressions on web documents
Weak-strong patterns | Strong-weak patterns
* (,) but not * | not * (,) (but) just *
* (,) if not * | not * (,) (but|although|though) still *
* (,) (al)though not * | * (,) or very *
* (,) (and|or) (even|almost) * |
not (only|just) * but * |
e.g., “good but not great” → good < great
Globally optimize the scales using Mixed Integer Linear Programming (MILP)
• Can find only a limited number of adjectival scales
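A few of the patterns above can be sketched as Python regular expressions that emit (weaker, stronger) pairs. This is a hypothetical subset: each "*" slot is assumed to be a single word, the text is lowercased, and the MILP step that combines pairs into global scales is not shown.

```python
import re

WORD = r"\b([a-z]+)"  # assumption: each "*" slot is a single lowercase word

WEAK_STRONG = [
    re.compile(WORD + r",? but not " + WORD),                    # * (,) but not *
    re.compile(WORD + r",? if not " + WORD),                     # * (,) if not *
]
STRONG_WEAK = [
    re.compile(r"\bnot " + WORD + r",? (?:but )?just " + WORD),  # not * (,) (but) just *
]

def extract_pairs(text):
    """Return (weaker, stronger) adjective pairs matched in `text`."""
    text = text.lower()
    pairs = []
    for pattern in WEAK_STRONG:
        pairs.extend(pattern.findall(text))                       # already (weak, strong)
    for pattern in STRONG_WEAK:
        pairs.extend((weak, strong) for strong, weak in pattern.findall(text))
    return pairs

print(extract_pairs("The movie was good but not great."))  # [('good', 'great')]
print(extract_pairs("It was not great, just good."))       # [('good', 'great')]
```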
Evaluation on the 125 IQAPs
Model | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%)
de Marneffe et al. (2010) | 60.00 | 59.72 | 59.40 | 59.56
Mohtarami et al. (2011) | - | 62.23 | 60.88 | 61.55
The RNN-based model | 72.80 | 69.78 | 71.39 | 70.58
• Precision, recall, and F1 score are macro-averaged over the yes and no classes.
• Our model achieved statistically significantly better scores than [de Marneffe et al., 2010] on the different metrics.
• [Mohtarami et al., 2011] also improved over [de Marneffe et al., 2010] by using synonyms; however, that approach does not learn scales.
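A sketch of the scoring: accuracy over all pairs, plus precision and recall macro-averaged over yes and no. It assumes that F1 is the harmonic mean of macro precision and macro recall (this reproduces every F1 value in the table; e.g. 69.78 and 71.39 give about 70.58) and that gold "uncertain" items simply count against accuracy. The labels below are hypothetical.

```python
def macro_scores(gold, pred, classes=("yes", "no")):
    """Accuracy, plus precision/recall macro-averaged over `classes`;
    F1 is taken as the harmonic mean of macro precision and macro recall."""
    accuracy = sum(g == y for g, y in zip(gold, pred)) / len(gold)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(g == c and y == c for g, y in zip(gold, pred))
        n_pred = sum(y == c for y in pred)
        n_gold = sum(g == c for g in gold)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_gold if n_gold else 0.0)
    p = sum(precisions) / len(classes)
    r = sum(recalls) / len(classes)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return accuracy, p, r, f1

# Hypothetical gold labels and predictions.
gold = ["yes", "yes", "no", "no", "uncertain"]
pred = ["yes", "no", "no", "no", "yes"]
print(macro_scores(gold, pred))
```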
Visualization of questions, antonyms and answers in 2D space by multidimensional scaling (MDS)
A: Do you think she'd be happy with this book?
B: I think she'd be delighted by it.
Figure: 2D MDS plot (axes: dim 1, dim 2) of good, bad, terrible, sure, confident, diffident, happy, unhappy, delighted, qualified, unqualified and young.
Visualization of questions, antonyms and answers in 2D space by multidimensional scaling (MDS)
A: Do you think that's a good idea?
B: It's a terrible idea.
Visualization of questions, antonyms and answers in 2D space by multidimensional scaling (MDS)
A: The president is promising support for Americans who have suffered from this hurricane. Are you confident you are going to be getting that?
B: I'm not so sure about my insurance company.
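The slides do not say which MDS variant or distance was used for these plots; as one common choice, here is a sketch of classical (Torgerson) MDS on Euclidean distances with NumPy. The random 50-d vectors are placeholders standing in for the RNNLM vectors of the plotted words.

```python
import numpy as np

def classical_mds(X, dims=2):
    """Classical (Torgerson) MDS on Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ D2 @ J                            # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]           # keep the largest `dims`
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale                   # n x dims coordinates

words = ["good", "bad", "terrible", "sure", "confident", "diffident",
         "happy", "unhappy", "delighted", "qualified", "unqualified", "young"]
rng = np.random.default_rng(0)
placeholder_vectors = rng.normal(size=(len(words), 50))  # not real RNNLM vectors
coords = classical_mds(placeholder_vectors, dims=2)
print(coords.shape)  # (12, 2)
```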
Conclusion
We give further evidence that relationships in the RNNLM continuous vector space are interpretable.
We successfully learn adjectival scales, as shown by the large improvement on the IQAP corpus (72.80% accuracy vs. 60.00% for [de Marneffe et al., 2010]).
Future work
Adjectives with modifying adverbs (so sure, quite good, quite a few, etc.)
What the British say | What the British mean | What foreigners understand
That's not bad | That's good | That's poor
Quite good | A bit disappointing | Quite good
Very interesting | This is clearly nonsense | They are impressed
I almost agree | I don't agree at all | He's not far from agreement
From http://www.telegraph.co.uk/news/newstopics/howaboutthat/10280244/Translationtable-explaining-the-truth-behind-British-politeness-becomes-internet-hit.html
Thank you!
We also thank Eric Fosler-Lussier and the anonymous reviewers for
their helpful comments.
Shifting the decision boundary
No accuracy gain from shifting the decision boundary
Boundary position | Accuracy (%)
Center | 72.8
+1% | 71.2
-1% | 70.4
-10% | 69.6
+10% | 64.0