A Mathematical Framework for a Distributional Compositional Model

A Mathematical Framework for a Distributional
Compositional Model of Meaning
Stephen Clark
University of Cambridge Computer Laboratory
King’s College London
22 November 2012
Intro
Lexical
Syntax
Sentences in Google
[Screenshot: Google search results for the query "A man killed his dog" (about 22,300,000 results). The top hits are:
• Dog shoots man (Metro.co.uk): "A man was killed after his dog stepped on a loaded shotgun in the back of a pick-up truck."
• man who killed his dog to survive in the amazon (community.discovery.com forum topic)
• The Man who Killed His Friend for Eating his Dog After it was Killed ... (notverycool.com)]
DisCo Models of Meaning
Motivation
• Two success stories:
• distributional vector-based models of lexical meaning (in practice)
• compositional logic-based models of sentence meaning (in theory)
• Can we combine these approaches to give a vector-based
semantic model of phrases and sentences?
• A fundamental problem in natural language semantics
Interdisciplinary Endeavour
• Collaboration with the Oxford Computational Linguistics and
Quantum groups
• B. Coecke, E. Grefenstette†, S. Pulman, M. Sadrzadeh†
• Linguistics, semantics, logic, category theory, quantum logic, . . .
† thanks to Ed and Mehrnoosh for some of the slides
A Hot Topic
• 1.5M 5-site†, 3-year EPSRC project
• 1M 5-year ERC Starting Grant (DisCoTex)
• 5-year ERC Starting Grant to Marco Baroni (COMPOSES)
• EPSRC Career Acceleration Fellowship to Mehrnoosh Sadrzadeh
• Lots of papers, workshops:
• ACL 2011: Distributional Semantics and Compositionality
• IWCS 2013: Towards a formal distributional semantics
• ...
† Cambridge, Edinburgh, Oxford, Sussex, York
Today’s Talk
• Distributional models of word meaning
• Categorial grammar syntax
• A compositional distributional model
• An example sentence space
• based on simple transitive verb sentences
• Related work in “Deep Learning”
Distributional and Semantic Similarity
• You shall know a word by the company that it keeps. (Firth, ’57)
• Distributional hypothesis: the meaning of a word can be
represented by the distribution of words appearing in its contexts
Distributional and Semantic Similarity
• dog and cat are related semantically:
dog and cat both co-occur with big, small, furry, eat, sleep
• ship and boat have similar meanings:
ship and boat both appear as the direct object of the verbs sail, clean,
buy, and are both modified by the adjectives large, clean, expensive
• Induce lexical relations automatically from large text collections
and distributional similarity
Window Methods
• In window methods the context is a fixed-word window either side
of the target word
• For each target word a vector is created where each basis vector
corresponds to a context word
• The coefficient for each basis vector is a (weighted) count of the
number of times the corresponding context word appears in the
context of the target word
• Our compositional framework is agnostic towards the word vectors
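The window method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the window size of 2, the unweighted counts and the use of cosine as the similarity measure are all choices made here, not details given in the slides.

```python
from collections import Counter, defaultdict
import math


def window_vectors(tokens, window=2):
    """Map each target word to a Counter of context-word counts.

    Each key of the Counter plays the role of a basis vector (a context
    word); each value is the raw, unweighted co-occurrence count.
    """
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen only to make the example self-contained.
corpus = "the furry dog sleeps . the furry cat sleeps . the big ship sails".split()
vecs = window_vectors(corpus, window=2)

sim_dog_cat = cosine(vecs["dog"], vecs["cat"])
sim_dog_ship = cosine(vecs["dog"], vecs["ship"])
print(sim_dog_cat, sim_dog_ship)
```

In this toy corpus dog and cat share exactly the same contexts, so their similarity is higher than that of dog and ship. A realistic system would use a large corpus and a weighting scheme for the counts.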
Vector Space for Window Method
[Figure: a three-dimensional vector space with axes furry, stroke and pet, in which the target words cat and dog are plotted as vectors]
Example Output
• introduction: launch, implementation, advent, addition,
adoption, arrival, absence, inclusion, creation, departure,
availability, elimination, emergence, use, acceptance, abolition,
array, passage, completion, announcement, . . .
• evaluation: assessment, examination, appraisal, review, audit,
analysis, consultation, monitoring, testing, verification, inquiry,
inspection, measurement, supervision, certification, checkup, . . .
From Words to Sentences
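[Figure: a sentence space with axes s1, s2 and s3, containing the sentences "man killed dog", "man murdered cat" and "man killed by dog", with a "?" marking the open question of where the sentence vectors should lie]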
What Semantics?!
• A semantics of similarity
• How to incorporate inference, logical operators, quantification,
etc. is an interesting question . . .
Categorial Grammar
interleukin-10    inhibits     production
NP                (S\NP)/NP    NP

inhibits production:                 (S\NP)/NP  NP  ⇒  S\NP
interleukin-10 inhibits production:  NP  S\NP  ⇒  S
A Simple CG Derivation
interleukin-10    inhibits     production
NP                (S\NP)/NP    NP

inhibits production:                 (S\NP)/NP  NP  ⇒  S\NP   (>, forward application)
interleukin-10 inhibits production:  NP  S\NP  ⇒  S           (<, backward application)
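The two application rules used in this derivation can be sketched as follows. The encoding of categories as nested tuples and the function names are assumptions made for this illustration; they are not taken from any particular parser.

```python
# Atomic categories are strings; a complex category is a tuple
# (result, slash, argument), so (S\NP)/NP becomes (("S", "\\", "NP"), "/", "NP").


def forward_apply(left, right):
    # Forward application (>):  X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None


def backward_apply(left, right):
    # Backward application (<):  Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


NP, S = "NP", "S"
inhibits = ((S, "\\", NP), "/", NP)  # (S\NP)/NP

verb_phrase = forward_apply(inhibits, NP)   # inhibits production
sentence = backward_apply(NP, verb_phrase)  # interleukin-10 inhibits production
print(verb_phrase, sentence)
```

Both functions return None when the rule does not apply, so a failed combination is easy to detect.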
Pregroup Grammar Derivation
Google    bought              Microsoft
NP        NP^r · S · NP^l     NP

bought Microsoft:         (NP^r · S · NP^l) · NP  →  NP^r · S
Google bought Microsoft:  NP · (NP^r · S)  →  S
Pregroup Reduction
Google    bought           Microsoft
NP        NP^r S NP^l      NP
[Figure: pregroup reduction diagram over these types]
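A pregroup reduction of this kind can be sketched as cancellation of adjacent types. The encoding below is an assumption made for illustration: each simple type is a pair (base, adjoint), with adjoint -1 for a left adjoint, 0 for a plain type and +1 for a right adjoint, so that a pair (x, n)(x, n+1) contracts to the unit. The single left-to-right pass with a stack is enough for simple sentences like this one, but it is not a complete pregroup parser.

```python
def reduce_pregroup(types):
    """Cancel adjacent pairs (x, n)(x, n + 1) in one left-to-right pass."""
    stack = []
    for base, adjoint in types:
        if stack and stack[-1] == (base, adjoint - 1):
            stack.pop()  # contraction: NP · NP^r -> 1  or  NP^l · NP -> 1
        else:
            stack.append((base, adjoint))
    return stack


NP, NP_r, NP_l, S = ("NP", 0), ("NP", 1), ("NP", -1), ("S", 0)

google = [NP]
bought = [NP_r, S, NP_l]
microsoft = [NP]

verb_phrase = reduce_pregroup(bought + microsoft)
sentence = reduce_pregroup(google + bought + microsoft)
print(verb_phrase, sentence)
```

The pass happens to cancel the subject before the object, whereas the derivation above combines the verb with its object first; the resulting type is the same.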
Semantics
Example
Learning
Predicate-Argument Semantics
man     bites                  dog
NP      NP^r · S · NP^l        NP
man′    λx.λy.bites′(x, y)     dog′

bites dog:      NP^r · S : λy.bites′(dog′, y)     (function application)
man bites dog:  S : bites′(dog′, man′)            (function application)
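The two function-application steps can be mimicked with curried functions. This is a small sketch: representing the logical form as a Python tuple is an assumption made here purely for illustration.

```python
def bites(x):
    # Plays the role of the lambda term λx.λy.bites′(x, y):
    # takes the first argument x and returns a function awaiting y.
    return lambda y: ("bites", x, y)


bites_dog = bites("dog")      # λy.bites′(dog′, y)
sentence = bites_dog("man")   # bites′(dog′, man′)
print(sentence)
```

The argument order in the resulting term follows the derivation above, where the verb consumes its object first.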
Vector-Space Semantics?
man     bites                  dog
NP      NP^r · S · NP^l        NP
man′    λx.λy.bites′(x, y)     dog′

bites dog:      NP^r · S : λy.bites′(dog′, y)
man bites dog:  S : bites′(dog′, man′)
• What are the semantic types of the vectors?
• What is the equivalent of function application?
Adjective Noun Combinations
red        car
N · N^l    N

red car:  (N · N^l) · N  →  N
• Adjective is a function
• How are functions represented in linear algebra? (B&Z, 2010)
• Functions are matrices (Linear Maps)
• How do functions combine with arguments in linear algebra?
• Matrix multiplication
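Matrix Multiplication
\[
\underbrace{\begin{pmatrix}
R_{11} & R_{12} & R_{13} & R_{14} & R_{15} \\
R_{21} & R_{22} & R_{23} & R_{24} & R_{25} \\
R_{31} & R_{32} & R_{33} & R_{34} & R_{35} \\
R_{41} & R_{42} & R_{43} & R_{44} & R_{45} \\
R_{51} & R_{52} & R_{53} & R_{54} & R_{55}
\end{pmatrix}}_{\mathrm{RED}}
\underbrace{\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{pmatrix}}_{\overrightarrow{car}}
=
\underbrace{\begin{pmatrix} rc_1 \\ rc_2 \\ rc_3 \\ rc_4 \\ rc_5 \end{pmatrix}}_{\overrightarrow{red\ car}}
\]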
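The adjective-noun composition above is an ordinary matrix-vector product. A minimal NumPy sketch, assuming a 5-dimensional noun space and random placeholder values (the slides give no numbers for RED or car):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder values only: a real model would estimate these from a corpus.
RED = rng.random((5, 5))   # adjective as a matrix (a linear map N -> N)
car = rng.random(5)        # noun as a vector in N

red_car = RED @ car        # composition by matrix multiplication

# The same result written out component by component: rc_i = sum_j R_ij * c_j
explicit = np.array([sum(RED[i, j] * car[j] for j in range(5)) for i in range(5)])
print(red_car)
```

The result lives in the same space as the noun, which matches the reduction of the adjective-noun phrase to type N.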
Matrix and Vector Types
                     RED        car    red car
vector-space type    N ⊗ N      N      N
syntactic type       N · N^l    N      N

\[
\mathrm{RED}\;\overrightarrow{car} = \overrightarrow{red\ car},
\qquad \mathrm{RED} = (R_{ij})_{i,j=1}^{5},\quad
\overrightarrow{car} = (c_1, \ldots, c_5)^{\top},\quad
\overrightarrow{red\ car} = (rc_1, \ldots, rc_5)^{\top}
\]
Syntactic Types to Tensor Spaces
                  man    bites               dog
syntactic type    NP     NP^r · S · NP^l     NP
tensor space      N      N ⊗ S ⊗ N           N
• Key question: what is the sentence space?
Meaning Vectors as Tensors
[Figure: diagram of word meanings as tensors, with wires labelled N and S]
The Distributional Meaning of a Verb
\[
\Psi = \sum_{ijk} C_{ijk}\, (\overrightarrow{n_i} \otimes \overrightarrow{s_j} \otimes \overrightarrow{n_k}) \in N \otimes S \otimes N
\]

basis vector          chases    eats
⟨fluffy,T,fluffy⟩     0.8       0.7
⟨fluffy,F,fluffy⟩     0.2       0.3
⟨fluffy,T,fast⟩       0.75      0.6
⟨fluffy,F,fast⟩       0.25      0.4
⟨fluffy,T,juice⟩      0.2       0.9
⟨fluffy,F,juice⟩      0.8       0.1
...
⟨tasty,T,juice⟩       0.1       0.1
• The verb tensor effectively encodes all the ways in which a verb
meaning can participate in the meaning of a sentence
• Key idea is that a verb is relational, or functional, in nature
2D Plausibility Space (S)
[Figure: a two-dimensional plausibility space with axes True and False, in which the sentences "dog chases cat" and "apple chases orange" are plotted as vectors]
Noun Space (N)
          fluffy  run   fast  aggressive  tasty  buy   juice  fruit
dog       0.8     0.8   0.7   0.6         0.1    0.5   0.0    0.0
cat       0.9     0.8   0.6   0.3         0.0    0.5   0.0    0.0
apple     0.0     0.0   0.0   0.0         0.9    0.9   0.8    1.0
orange    0.0     0.0   0.0   0.0         1.0    0.9   1.0    1.0
Verb Space (N ⊗ S ⊗ N)
basis vector          chases    eats
⟨fluffy,T,fluffy⟩     0.8       0.7
⟨fluffy,F,fluffy⟩     0.2       0.3
⟨fluffy,T,fast⟩       0.75      0.6
⟨fluffy,F,fast⟩       0.25      0.4
⟨fluffy,T,juice⟩      0.2       0.9
⟨fluffy,F,juice⟩      0.8       0.1
...
⟨tasty,T,juice⟩       0.1       0.1
The Meaning of a Sentence
dog chases cat

basis vector          chases    dog, cat
⟨fluffy,T,fluffy⟩     0.8       0.8, 0.9
⟨fluffy,F,fluffy⟩     0.2       0.8, 0.9
⟨fluffy,T,fast⟩       0.75      0.8, 0.6
⟨fluffy,F,fast⟩       0.25      0.8, 0.6
⟨fluffy,T,juice⟩      0.2       0.8, 0.0
⟨fluffy,F,juice⟩      0.8       0.8, 0.0
⟨tasty,T,juice⟩       0.1       0.1, 0.0
...

\[
\overrightarrow{dog\ chases\ cat}_T = 0.8 \cdot 0.8 \cdot 0.9 + 0.75 \cdot 0.8 \cdot 0.6 + 0.2 \cdot 0.8 \cdot 0.0 + 0.1 \cdot 0.1 \cdot 0.0 + \ldots
\]
\[
\overrightarrow{dog\ chases\ cat}_F = 0.2 \cdot 0.8 \cdot 0.9 + 0.25 \cdot 0.8 \cdot 0.6 + 0.8 \cdot 0.8 \cdot 0.0 + \ldots
\]
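\[
f(\overrightarrow{\pi} \otimes \Psi \otimes \overrightarrow{o})
= \sum_{ijk} C_{ijk}\, \langle \overrightarrow{\pi} \,|\, \overrightarrow{\pi_i} \rangle\, \overrightarrow{s_j}\, \langle \overrightarrow{o} \,|\, \overrightarrow{o_k} \rangle
= \sum_{j} \left( \sum_{ik} C_{ijk}\, \langle \overrightarrow{\pi} \,|\, \overrightarrow{\pi_i} \rangle \langle \overrightarrow{o} \,|\, \overrightarrow{o_k} \rangle \right) \overrightarrow{s_j}
\]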
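The formula can be checked numerically on the "dog chases cat" example. The sketch below makes several assumptions: the noun space is cut down to the four basis words that occur in the listed verb entries, the basis is treated as orthonormal (so each inner product is just a vector coordinate), and every verb entry not listed in the slides is set to zero. The output is therefore only the partial sum over the listed terms.

```python
import numpy as np

noun_basis = ["fluffy", "fast", "tasty", "juice"]
sent_basis = ["T", "F"]
n = {w: i for i, w in enumerate(noun_basis)}
s = {w: i for i, w in enumerate(sent_basis)}

# Noun vectors restricted to the four basis words above.
dog = np.array([0.8, 0.7, 0.1, 0.0])
cat = np.array([0.9, 0.6, 0.0, 0.0])

# Verb tensor C_ijk in N (x) S (x) N; unlisted entries are assumed to be 0.
chases = np.zeros((len(noun_basis), len(sent_basis), len(noun_basis)))
entries = {
    ("fluffy", "T", "fluffy"): 0.8,
    ("fluffy", "F", "fluffy"): 0.2,
    ("fluffy", "T", "fast"): 0.75,
    ("fluffy", "F", "fast"): 0.25,
    ("fluffy", "T", "juice"): 0.2,
    ("fluffy", "F", "juice"): 0.8,
    ("tasty", "T", "juice"): 0.1,
}
for (i, j, k), value in entries.items():
    chases[n[i], s[j], n[k]] = value

# sentence_j = sum_ik C_ijk * subject_i * object_k
sentence = np.einsum("ijk,i,k->j", chases, dog, cat)
print(dict(zip(sent_basis, sentence)))  # approximately T: 0.936, F: 0.264
```

The True coordinate is larger than the False coordinate, as expected for a plausible sentence in the two-dimensional plausibility space.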
Multi-Linear Algebra
[Figure: multi-linear algebra diagrams, with wires labelled N and S]
Type Reductions
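                  man    bites               dog
syntactic type    NP     NP^r · S · NP^l     NP
tensor space      N      N ⊗ S ⊗ N           N

bites dog:        NP^r · S     N ⊗ S
man bites dog:    S            S

The reductions are computed with inner products (which give real numbers, i.e. a space of dimension 1).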
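The two reduction steps can be followed on the tensor shapes. This is a sketch with assumed dimensions (a 4-dimensional noun space and a 2-dimensional sentence space) and random placeholder values; only the shapes and the order of contraction matter here.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_n, dim_s = 4, 2

verb = rng.random((dim_n, dim_s, dim_n))  # bites: N (x) S (x) N
subj = rng.random(dim_n)                  # man: N
obj = rng.random(dim_n)                   # dog: N

# Step 1 (NP^l · NP -> 1): contract the verb's last index with the object.
verb_phrase = np.tensordot(verb, obj, axes=([2], [0]))      # N (x) S

# Step 2 (NP · NP^r -> 1): contract the subject with the remaining noun index.
sentence = np.tensordot(subj, verb_phrase, axes=([0], [0]))  # S

# Both contractions in one call, for comparison.
one_shot = np.einsum("ijk,i,k->j", verb, subj, obj)
print(verb_phrase.shape, sentence.shape)
```

Each contraction removes one noun index, mirroring the way each syntactic reduction cancels an NP with its adjoint.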
Summary of Vector-Space Semantics
Meaning of a sentence $w_1 \cdots w_n$ with the grammatical structure $p_1 \cdots p_n \xrightarrow{\alpha} s$ is:
\[
\overrightarrow{w_1 \cdots w_n} := F(\alpha)(\overrightarrow{w_1} \otimes \cdots \otimes \overrightarrow{w_n})
\]
• F (α) is Montague’s homomorphic passage (Frege’s principle) in
the form of a linear map
Machine Learning
• Two crucial questions the framework does not answer:
1. what is the sentence space?
2. where do the tensor values C_{ijk} come from?
• We’d like ML to answer (2) (and maybe (1) as well)
• Objective function?
– probably depends on the task or application:
• parsing: optimise parsing performance on a treebank
• sentiment analysis: optimise accuracy on movie ratings prediction
• machine translation evaluation: optimise correlation between
vector similarities and human similarity scores
Neural Networks to the Rescue?
• Work by Socher, Manning and Ng addresses a similar problem
• Uses recursive neural networks and backpropagation through
structure to learn vectors and matrices
• But does not exploit the notion of syntactic type
Recursive Matrix-Vector Spaces
[Figure taken from the Socher, Bengio and Manning ACL 2012 tutorial]
• A softmax classifier can predict labels for each node in the tree
• Supervised labels can be used for training
• Objective functions based on minimising label errors . . .
• for the task of predicting sentiment labels for adverb-adjective
pairs (e.g. not great, pretty good)
• and also the Penn Treebank parsing task (for a simpler model and
short sentences)
Current Thoughts
• Can distributional semantic representations improve categorial
grammar parsing?
• Can the Socher method be applied to our type-driven framework?
• Should the sentence space be the same as the noun space?
– as it is in the Socher model
• Can we learn higher-order tensors? Or just matrices?
• the third-order tensor for the transitive verb could be approximated
with two matrices (one each for the subject and object)
References
• Type-Driven Syntax and Semantics for Composing Meaning Vectors,
Stephen Clark, a draft chapter to appear in Quantum Physics and Linguistics:
A Compositional, Diagrammatic Discourse, Heunen, Sadrzadeh and
Grefenstette Eds.
• Mathematical Foundations for a Compositional Distributional Model of
Meaning, Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark, Linguistic
Analysis: A Festschrift for Joachim Lambek, van Benthem and Moortgat
(eds), 2011
• Nouns are Vectors, Adjectives are Matrices: Representing
Adjective-Noun Constructions in Semantic Space, M. Baroni and R.
Zamparelli, Proceedings of EMNLP, Cambridge MA, 2010
• Semantic Compositionality through Recursive Matrix-Vector Spaces,
Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng,
Proceedings of EMNLP, Jeju, Korea, 2012