Grounding distributional semantics into the visual world

Elia Bruni
Center for Mind/Brain Sciences
University of Trento
UW MSR Summer Institute 2013
Outline
1 Distributional semantics
2 Multimodal distributional semantics
3 What is this good for?
4 What comes next?
5 Appendix
The distributional hypothesis
Harris, Charles and Miller, Firth, Wittgenstein? . . .
• Definition: The meaning of a word is (can be approximated by,
derived from) the set of contexts in which it occurs in texts
• Consider this example:
– We found a little, hairy wampimuk sleeping behind the tree
• Even though wampimuk is an unknown word, its context alone suggests something like a small animal
Distributional semantics
Landauer and Dumais (1997), Turney and Pantel (2010), . . .
he curtains open and the   moon   shining in on the barely
ars and the cold , close   moon   " . And neither of the w
rough the night with the   moon   shining so brightly , it
made in the light of the   moon   . It all boils down , wr
 surely under a crescent   moon   , thrilled by ice-white
sun , the seasons of the   moon   ? Home , alone , Jay pla
m is dazzling snow , the   moon   has risen full and cold
un and the temple of the   moon   , driving out of the hug
 in the dark and now the   moon   rises , full and amber a
bird on the shape of the   moon   over the trees in front
 But I could n’t see the   moon   or the stars , only the
rning , with a sliver of   moon   hanging among the stars
 they love the sun , the   moon   and the stars . None of
the light of an enormous   moon   . The plash of flowing w
man ’s first step on the   moon   ; various exhibits , aer
 the inevitable piece of   moon   rock . Housing The Airsh
oud obscured part of the   moon   . The Allied guns behind
Distributional semantics
Distributional meaning as co-occurrence vector
          shadow   shine   full   planet   night   crescent
moon        10      22      43      16       29       12
sun         14      10       4      15       45        0
dog          0       4       2      10        0        0
Distributional semantics
The geometry of meaning
[Figure: moon, sun and dog plotted as points in a two-dimensional slice of the co-occurrence space; moon and sun lie close together, dog lies apart.]
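The relatedness intuition in code: a minimal Python sketch (numpy assumed) that builds the toy vectors from the table above and compares them with the cosine of the angle between them.

```python
import numpy as np

# Rows of the toy co-occurrence table above:
# contexts = shadow, shine, full, planet, night, crescent
vectors = {
    "moon": np.array([10, 22, 43, 16, 29, 12], dtype=float),
    "sun":  np.array([14, 10,  4, 15, 45,  0], dtype=float),
    "dog":  np.array([ 0,  4,  2, 10,  0,  0], dtype=float),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["moon"], vectors["sun"]))  # relatively high
print(cosine(vectors["moon"], vectors["dog"]))  # relatively low
```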
Outline
1 Distributional semantics
2 Multimodal distributional semantics
3 What is this good for?
4 What comes next?
5 Appendix
Lack of grounding
• Distributional semantics represents the meaning of a word entirely in
terms of connections to other words
• Humans have access to rich sources of perceptual knowledge when
learning the meaning of words
Drawbacks: motivation for multimodality
Lack of grounding
Examples of color statements that perceptual knowledge immediately rules out:
• clover is blue
• coffee is green
• crows are white
• deer are yellow
• flour is black
• grass is purple
• the sky is green
• violins are blue
The distributional hypothesis, generalized
• Definition: The meaning of a word is (can be approximated by,
derived from) the set of contexts in which it occurs, with "in texts"
now struck out: the contexts need no longer be textual
Visual dictionary
[Figure: descriptor space. The clusters partitioning the descriptor space; each cluster is called a "visual word".]
Distributional semantics from images

[Figure sequence across seven slides: images labeled "moon" are decomposed into local features; each feature is assigned to its nearest visual word; the visual word counts from all "moon" images are summed into a single count vector that represents the word, in analogy to textual co-occurrence counts.]
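A rough Python sketch of this bag-of-visual-words construction, assuming local descriptors (e.g., 128-dimensional SIFT) have already been extracted from each image; function names are illustrative, not the pipeline's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_visual_words=1000):
    """Cluster descriptors pooled from the whole collection;
    each centroid is one 'visual word'."""
    return KMeans(n_clusters=n_visual_words, n_init=4).fit(all_descriptors)

def bovw_histogram(image_descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and count."""
    words = vocabulary.predict(image_descriptors)
    return np.bincount(words, minlength=vocabulary.n_clusters)

def label_vector(descriptor_sets, vocabulary):
    """The vector for a label such as 'moon' is the sum of the
    histograms of all images carrying that label."""
    return sum(bovw_histogram(d, vocabulary) for d in descriptor_sets)
```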
Outline
1 Distributional semantics
2 Multimodal distributional semantics
3 What is this good for?
4 What comes next?
5 Appendix
Tasks
Task 1 Predicting human semantic relatedness judgments
Improved!
Task 2 Concept categorization, i.e. grouping words into
classes based on their semantic relatedness (car ISA
vehicle; banana ISA fruit)
Improved!
Task 3 Find the typical color of concrete objects (cardboard is
brown, tomato is red)
Improved!
Task 4 Distinguish literal vs. non-literal usages of color
adjectives (blue uniform vs. blue note)
Improved!
Outline
1 Distributional semantics
2 Multimodal distributional semantics
3 What is this good for?
4 What comes next?
5 Appendix
Can we use better fusion strategies?
[Diagram: a vision channel and a text channel feed into a fusion component.]
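The slide leaves the strategy open; one simple option, given here only as a hedged sketch, is weighted concatenation of the L2-normalized channels, with a hypothetical mixing weight alpha tuned on held-out data.

```python
import numpy as np

def fuse(text_vec, image_vec, alpha=0.5):
    """Weighted concatenation of the two channels; alpha balances
    the textual and visual evidence (an assumption, to be tuned)."""
    t = text_vec / np.linalg.norm(text_vec)
    v = image_vec / np.linalg.norm(image_vec)
    return np.concatenate([alpha * t, (1 - alpha) * v])
```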
Does localization help?
The meaning of a visually depicted concept is (can be approximated
by, derived from) the set of contexts in which it occurs in images
Do visual semantic models correlate with the neural
representation of concepts?
The end
Thank you!
http://clic.cimec.unitn.it/~elia.bruni
Outline
1 Distributional semantics
2 Multimodal distributional semantics
3 What is this good for?
4 What comes next?
5 Appendix
Text input
• Data
– large: almost 3 billion words
• ukWaC and Wackypedia corpora combined
• http://wacky.sslmit.unibo.it/
– local mutual information (association measure)
• negative values set to 0
– rows and columns: top 20K nouns, 5K adjectives, and 5K verbs
• Contexts
– Window2 and Window20 (co-occurrences counted within a 2-word and a 20-word window, respectively)
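The association step above, as a minimal sketch (the words-by-contexts matrix layout is an assumption): local mutual information is the raw count times pointwise mutual information, with negative values set to 0.

```python
import numpy as np

def local_mutual_information(counts):
    """counts: words x contexts matrix of raw co-occurrence frequencies."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # word marginals
    col = counts.sum(axis=0, keepdims=True)  # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    lmi = counts * pmi
    lmi[~np.isfinite(lmi)] = 0.0     # zero counts produce log(0)
    return np.maximum(lmi, 0.0)      # negative values set to 0
```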
Visual input
The ESP label-image dataset
• Invented by L. von Ahn (2003)
• 100K labeled images
• Labeled through a game:
– two people are partnered together
– both see the same image and have to agree on an appropriate word
label
– a word entered by both participants becomes a label for the image
• http://www.cs.cmu.edu/~biglou/resources/
The ESP label-image dataset
Task 1: Semantic relatedness data sets
• Data
– WordSim353 dataset
• 353 word pairs (coverage: 252)
• 16 subjects rate each pair on a 10-point scale, ratings averaged
• dollar/buck: 9.22, professor/cucumber: 0.31
– MEN dataset (created by us)
• 3,000 word pairs, sampled from image dataset tags
• crowdsourcing: subjects see two word pairs and pick the pair containing
the more related words
• each word pair is rated 50 times, score = times selected / 50
• cold/frost: 0.9, eat/hair: 0.1
• Method
– for each model, compute cosine between word vectors
– score: Spearman correlation against the human ratings
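The method in code, as a short sketch (scipy assumed; `model` is a hypothetical word-to-vector mapping, not the actual evaluation script).

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(model, pairs, human_ratings):
    """pairs: list of (word1, word2); human_ratings: averaged scores."""
    predicted = [cosine(model[w1], model[w2]) for w1, w2 in pairs]
    rho, p = spearmanr(predicted, human_ratings)
    return rho, p
```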
Task 1: Results
Bruni, N.K. Tran and Baroni (submitted)
Model     Window 2         Window 20
          MEN     WS       MEN     WS
Text      0.73    0.70     0.68    0.70
Image     0.43    0.36     0.43    0.36
Fusion    0.78    0.72     0.76    0.75

Table: Spearman correlation of the models on MEN and WordSim (all
coefficients significant with p < 0.001).
Task 2: Concept categorization data sets
• Data
– Battig (for training)
• 77 concepts from 10 different classes
• bird (eagle, owl...) vegetable (broccoli, potato...)
– Almuhareb-Poesio (for testing)
• 231 concepts from 21 different classes
• vehicle (airplane, car...) time (aeon, future...)
• Method
– cluster the words based on their pairwise cosines in the semantic
space (using the CLUTO toolkit)
– evaluate the clusters against the gold classes with percentage purity
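The slide's clustering uses the CLUTO toolkit; as a rough scikit-learn stand-in, this sketch clusters the word vectors with k-means and scores the result with percentage purity, the measure reported in the results.

```python
import numpy as np
from sklearn.cluster import KMeans

def purity(cluster_ids, gold_labels):
    """For each cluster, count its majority gold class; divide by N."""
    cluster_ids = np.asarray(cluster_ids)
    gold_labels = np.asarray(gold_labels)
    hits = 0
    for c in np.unique(cluster_ids):
        _, counts = np.unique(gold_labels[cluster_ids == c],
                              return_counts=True)
        hits += counts.max()
    return hits / len(gold_labels)

def categorize(word_vectors, gold_labels, n_classes):
    ids = KMeans(n_clusters=n_classes, n_init=10).fit_predict(word_vectors)
    return purity(ids, gold_labels)
```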
Task 2: Results
Bruni, N.K. Tran and Baroni (submitted)
Model     Window 2   Window 20
Text        0.73       0.65
Image       0.26       0.26
Fusion      0.74       0.69

Table: Percentage purities of the models on AP (Almuhareb-Poesio).
Task 3: Find typical color of concrete objects
• Data and task
– spot the typical color of 52 concrete objects: cardboard is brown, coal is
black, forest is green
– typical colors assigned by two judges by consensus
• Berlin and Kay (1969)’s basic color adjectives: black, blue, brown,
green, grey, orange, pink, purple, red, white, yellow
• Method
– rank the color adjective vectors by similarity to each noun vector
– a good model will rank the right color high
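The ranking step as a sketch; `model` is a hypothetical word-to-vector mapping, and the gold color comes from the judges' annotation.

```python
import numpy as np

BASIC_COLORS = ["black", "blue", "brown", "green", "grey", "orange",
                "pink", "purple", "red", "white", "yellow"]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_of_gold_color(model, noun, gold_color):
    """Rank 1 means the model's top color adjective is the right one."""
    ranked = sorted(BASIC_COLORS,
                    key=lambda c: cosine(model[c], model[noun]),
                    reverse=True)
    return ranked.index(gold_color) + 1
```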
Task 3: Results
Bruni, Boleda, Baroni and N.K. Tran 2012
• Median rank of correct color and # of top matches
Model           Median   Matches
TEXT30K            3        11
LAB128             1        27
SIFT40K            3        15
TEXT+LAB128        1        27
TEXT+SIFT40K       2        17
Task 3: Examples
word          gold    LAB     SIFT    TEXT
cauliflower   white   green   yellow  orange
cello         brown   brown   black   blue
deer          brown   green   blue    red
froth         white   brown   black   orange
gorilla       black   black   red     grey
grass         green   green   green   green
pig           pink    pink    brown   brown
sea           blue    blue    blue    grey
weed          green   green   yellow  purple
Task 4: Literal vs. non-literal
• Data and task
– distinguish literal and non-literal usages of color adjectives: blue
uniform, blue shark, blue note
– 342 adjective-noun pairs, 227 literal, 115 non-literal, as decided by
two judges by consensus
• Method
– compute cosine between color adjective vector and noun vector
– prediction: higher similarity of color and noun vectors for literal uses
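The method as a sketch, with a t-test over the two conditions as in the results table (`model` is again a hypothetical word-to-vector mapping; the normalization of the cosines mentioned in the results is omitted here).

```python
import numpy as np
from scipy.stats import ttest_ind

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def condition_scores(model, pairs):
    """pairs: list of (color_adjective, noun)."""
    return np.array([cosine(model[a], model[n]) for a, n in pairs])

def literal_vs_nonliteral(model, literal_pairs, nonliteral_pairs):
    lit = condition_scores(model, literal_pairs)
    non = condition_scores(model, nonliteral_pairs)
    t, p = ttest_ind(lit, non)        # literal cosines expected higher
    return lit.mean() - non.mean(), p
```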
Task 4: Results
• Average difference in normalized adj-noun cosines in literal vs.
non-literal conditions with t-test significance
Model           Score
TEXT30K         0.53***
LAB128          0.25*
SIFT40K         0.57***
TEXT+LAB128     0.36***
TEXT+SIFT40K    0.73***
Does localization help?
Current development
The meaning of a visually depicted concept is (can be approximated
by, derived from) the set of contexts in which it occurs in images
Results
• Word relatedness on the Pascal similarity dataset

Area               No localization   Manual   Automatic
Concept                  NA           0.39      0.36
Context                  NA           0.50      0.51
Concept+Context          0.47         0.54      0.54
Distributional semantics and the brain
[Figure: the image-based representation from earlier, applied to "moon": visual word counts extracted from labeled images.]
Distributional semantics and the brain
Distributional semantics and the brain
Pairwise Spearman similarity between image-based models and fMRI data
Area      rho     p
Global    0.49    0.0004
Concept   0.28    0.0281
Context   0.60    0.0002