A Mathematical Framework for a Distributional
Compositional Model of Meaning
Stephen Clark
University of Cambridge Computer Laboratory
University of Groningen
10 April 2013
DisCo Models of Meaning
Motivation
• Two success stories:
  • distributional vector-based models of lexical meaning
  • compositional logic-based models of sentence meaning
• Can we combine these approaches to give a vector-based semantic model of phrases and sentences?
• A fundamental new problem in natural language semantics
Interdisciplinary Endeavour
• Collaboration with the Oxford Computational Linguistics and
Quantum groups
• B. Coecke, E. Grefenstette†, S. Pulman, M. Sadrzadeh†
• Linguistics, semantics, logic, category theory, quantum logic, . . .
† thanks to Ed and Mehrnoosh for some of the slides
A Hot Topic
• 1.5M, 5-site†, 3-year EPSRC project
• 1M 5-year ERC Starting Grant (DisCoTex)
• 5-year ERC Starting Grant to Marco Baroni (COMPOSES)
• Lots of papers, workshops:
• ACL 2011: Distributional Semantics and Compositionality
• IWCS 2013: Towards a formal distributional semantics
• ...
† Cambridge, Edinburgh, Oxford, Sussex, York
Today’s Talk
• Recap on set-theoretic approaches to semantics
• Distributional models of word meaning
• Categorial grammar syntax
• A compositional distributional model
  • currently applied only to simple transitive verb sentences
• Related work in “Deep Learning”
Formal (Montague) Semantics
• The dominant approach in linguistics and the philosophy of
language (Lewis, Montague, 1970s)
• Characterised by the use of logic as the semantic formalism
• A successful model of compositionality based on Frege’s principle
Formal (Montague) Semantics
S → NP VP : VP′(NP′)

The dog sleeps

• dog′ picks out an individual in some model
• sleep′ is a relation (the set of individuals who sleep in the model)
• (The dog sleeps)′ is true if dog′ is in sleep′ and false otherwise
Semantics in GOFAI (and Semantic Web)
• First-order predicate calculus
• Well-defined inference procedures
• Efficient theorem provers
• Knowledge encoded as ontologies
Shortcomings of the Traditional Approach
Regular coffee breaks diminish the risk of getting Alzheimer's and dementia in old age.

Three cups of coffee a day greatly reduce the chance of developing dementia or Alzheimer's later in life.
• Semantic similarity is difficult to model using traditional methods
• Similarity is at the heart of many NLP and IR problems
• Evidence from cognitive science that similarity is part of humans’
conceptual models
Distributional and Semantic Similarity
• You shall know a word by the company it keeps. (Firth, 1957)
• Distributional hypothesis: the meaning of a word can be
represented by the distribution of words appearing in its contexts
Distributional and Semantic Similarity
• dog and cat are related semantically:
  dog and cat both co-occur with big, small, furry, eat, sleep
• ship and boat have similar meanings:
  ship and boat appear as the direct object of the verbs sail, clean, bought, and are modified by the adjectives large, clean, expensive
• Induce lexical relations automatically from large text collections and distributional similarity
Window Methods
• In window methods the context is a fixed-size window of words either side of the target word
• For each target word a vector is created in which each basis vector corresponds to a context word
• The coefficient on each basis vector is a (weighted) count of the number of times the corresponding context word appears in the context of the target word
• Our compositional framework is agnostic about how the word vectors are built
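The window construction above can be sketched in a few lines of Python; the toy corpus and window size are illustrative assumptions, not part of the original slides:

```python
from collections import Counter

def window_vectors(tokens, window=2):
    """Map each target word to a Counter of context-word frequencies,
    using a fixed window of `window` words either side of the target."""
    vectors = {}
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        vectors.setdefault(target, Counter()).update(context)
    return vectors

corpus = "the small furry cat sleeps and the small furry dog sleeps".split()
vecs = window_vectors(corpus)
print(vecs["cat"])  # contexts shared with "dog": small, furry, sleeps
print(vecs["dog"])
```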
Vector Space for Window Method
[Figure: word vectors plotted in a space with basis vectors furry, stroke, and pet; the vectors for cat and dog lie close together.]
Example Output
• introduction: launch, implementation, advent, addition,
adoption, arrival, absence, inclusion, creation, departure,
availability, elimination, emergence, use, acceptance, abolition,
array, passage, completion, announcement, . . .
Example Output
• evaluation: assessment, examination, appraisal, review, audit,
analysis, consultation, monitoring, testing, verification, inquiry,
inspection, measurement, supervision, certification, checkup, . . .
From Words to Sentences
[Figure: the sentences man killed dog, man murdered cat, and man killed by dog plotted as points in a hypothetical sentence space with basis vectors s1, s2, s3; the first two lie close together, the third lies elsewhere.]
What Semantics?!
• A semantics of similarity
• How to incorporate inference, logical operators, quantification,
etc. is an interesting question . . .
Categorial Grammar
interleukin-10      inhibits       production
      NP           (S\NP)/NP          NP
                   ---------------------
                          S\NP
      -----------------------------------
                        S
A Simple CG Derivation
interleukin-10      inhibits       production
      NP           (S\NP)/NP          NP
                   --------------------->
                          S\NP
      ----------------------------------<
                        S

>  forward application
<  backward application
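The two application rules can be sketched as string operations on categories. The representation below is a simplifying assumption that handles only the non-nested argument types appearing in this example:

```python
def forward(fn, arg):
    """Forward application (>): X/Y  Y  =>  X."""
    head, sep, want = fn.rpartition("/")
    if sep and want == arg:
        return head.strip("()")  # drop brackets around the result, e.g. (S\NP)
    return None

def backward(arg, fn):
    """Backward application (<): Y  X\Y  =>  X."""
    head, sep, want = fn.rpartition("\\")
    if sep and want == arg:
        return head.strip("()")
    return None

# interleukin-10 : NP    inhibits : (S\NP)/NP    production : NP
vp = forward("(S\\NP)/NP", "NP")  # combine verb with its object
s = backward("NP", vp)            # combine subject with the verb phrase
print(vp, s)
```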
Pregroup Grammar Derivation
Google         bought          Microsoft
  NP       NP^r · S · NP^l        NP
           ---------------
              NP^r · S
--------------------------
              S
Pregroup Reduction
Google         bought          Microsoft
  NP        NP^r S NP^l           NP

[Diagram: reduction links connect NP with NP^r and NP^l with NP, cancelling them and leaving S.]
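The pregroup reduction itself is just repeated cancellation of adjacent x · x^r and x^l · x pairs, which can be sketched as rewriting over a list of type tokens (the string encoding of adjoints is an illustrative assumption):

```python
def reduce_pregroup(types):
    """Repeatedly cancel adjacent pairs (x, x^r) and (x^l, x)."""
    types = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(types) - 1):
            a, b = types[i], types[i + 1]
            if b == a + "^r" or a == b + "^l":
                del types[i:i + 2]  # cancel the pair to the unit
                changed = True
                break
    return types

# Google : NP    bought : NP^r S NP^l    Microsoft : NP
sentence = ["NP", "NP^r", "S", "NP^l", "NP"]
print(reduce_pregroup(sentence))  # ['S']
```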
Predicate-Argument Semantics
man             bites               dog
NP         NP^r · S · NP^l          NP
man′      λx.λy.bites′(x, y)       dog′
           ----------------------------
           NP^r · S : λy.bites′(dog′, y)
----------------------------------------
           S : bites′(dog′, man′)

Function application
Vector-Space Semantics?
man             bites               dog
NP         NP^r · S · NP^l          NP
man′      λx.λy.bites′(x, y)       dog′
           NP^r · S : λy.bites′(dog′, y)
           S : bites′(dog′, man′)
• What are the semantic types of the vectors?
• What is the equivalent of function application?
Adjective Noun Combinations
red       car
N · N^l    N
--------------
      N

• Adjective is a function
• How are functions represented in linear algebra? (Baroni and Zamparelli, 2010)
• Functions are matrices (linear maps)
• How do functions combine with arguments in linear algebra?
• Matrix multiplication
Matrix Multiplication

$$
\mathrm{RED} \times \overrightarrow{car} \;=\; \overrightarrow{red\ car}:
$$

$$
\begin{pmatrix}
R_{11} & R_{12} & R_{13} & R_{14} & R_{15}\\
R_{21} & R_{22} & R_{23} & R_{24} & R_{25}\\
R_{31} & R_{32} & R_{33} & R_{34} & R_{35}\\
R_{41} & R_{42} & R_{43} & R_{44} & R_{45}\\
R_{51} & R_{52} & R_{53} & R_{54} & R_{55}
\end{pmatrix}
\begin{pmatrix} c_1\\ c_2\\ c_3\\ c_4\\ c_5 \end{pmatrix}
=
\begin{pmatrix} rc_1\\ rc_2\\ rc_3\\ rc_4\\ rc_5 \end{pmatrix}
$$
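In code, the adjective-noun composition above is literally a matrix-vector product. A minimal sketch; the 5x5 values are random stand-ins, not learnt adjective matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

RED = rng.random((5, 5))  # the adjective as a linear map on the noun space N
car = rng.random(5)       # the noun vector in N

red_car = RED @ car       # composition is matrix multiplication
print(red_car.shape)      # (5,): the result is again a vector in N
```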
Matrix and Vector Types

$$
\mathrm{RED} \in N \otimes N \ (\text{pregroup type } N \cdot N^{l}),
\qquad \overrightarrow{car} \in N,
\qquad \overrightarrow{red\ car} = \mathrm{RED} \times \overrightarrow{car} \in N
$$
Syntactic Types to Tensor Spaces
man           bites           dog
NP       NP^r · S · NP^l      NP
N          N ⊗ S ⊗ N           N
• Key question: what is the sentence space?
Meaning Vectors as Tensors
[Diagram: meaning vectors drawn as tensors, with wires labelled by the spaces N and S.]
Multi-Linear Algebra
[Diagram: tensors of increasing order built from N and S, e.g. a vector in N, a matrix in N ⊗ S, and a rank-3 tensor in N ⊗ S ⊗ N.]
The Distributional Meaning of a Verb
$$
\Psi \;=\; \sum_{ijk} C_{ijk}\, \big(\vec{n}_i \otimes \vec{s}_j \otimes \vec{n}_k\big) \;\in\; N \otimes S \otimes N
$$
• The verb tensor effectively encodes all the ways in which a verb
meaning can participate in the meaning of a sentence
• Key idea is that a verb is relational, or functional, in nature
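Composing a full transitive-verb sentence then amounts to contracting the verb tensor with the subject and object vectors. A sketch with invented dimensions (a 4-dimensional noun space and a 3-dimensional sentence space), assuming the first index pairs with the subject and the last with the object:

```python
import numpy as np

rng = np.random.default_rng(0)

n_dim, s_dim = 4, 3
bites = rng.random((n_dim, s_dim, n_dim))  # C_ijk, a tensor in N (x) S (x) N
man = rng.random(n_dim)                    # subject vector in N
dog = rng.random(n_dim)                    # object vector in N

# Contract index i with the subject and index k with the object,
# leaving a vector indexed by j in the sentence space S.
man_bites_dog = np.einsum("i,ijk,k->j", man, bites, dog)
print(man_bites_dog.shape)  # (3,)
```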
Type Reductions
man           bites           dog
NP       NP^r · S · NP^l      NP
N          N ⊗ S ⊗ N           N
         NP^r · S  :  N ⊗ S
         S  :  S

Inner products (giving real numbers of dimension 1)
Summary of Vector-Space Semantics
The meaning of a sentence w_1 · · · w_n with the grammatical structure p_1 · · · p_n →_α s is:

$$
\overrightarrow{w_1 \cdots w_n} \;:=\; F(\alpha)\big(\vec{w}_1 \otimes \cdots \otimes \vec{w}_n\big)
$$

• F(α) is Montague’s homomorphic passage (Frege’s principle) in the form of a linear map
Lexical Category Sequence for Newspaper Sentence
[Figure: the CCG lexical category sequence for the newspaper sentence “In an Oct. 19 review of ‘The Misanthrope’ at Chicago’s Goodman Theatre (‘Revitalized Classics Take the Stage in Windy City,’ Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag.” Each word is assigned a lexical category, including complex types such as (S[dcl]\NP)/NP and ((S\NP)\(S\NP))/NP.]
Machine Learning
• Two crucial questions the framework does not answer:
  1. What is the sentence space?
  2. Where do the tensor values C_ijk come from?
• We’d like ML to answer (2) (and maybe (1) as well)
• Objective function? Probably depends on the task or application:
  • parsing: optimise parsing performance on a treebank
  • sentiment analysis: optimise accuracy on movie ratings prediction
  • machine translation evaluation: optimise correlation between vector similarities and human similarity scores
Neural Networks to the Rescue?
• Work by Socher, Manning and Ng addresses a similar problem
• Uses recursive neural networks and backpropagation through
structure to learn vectors and matrices
• But does not exploit the notion of syntactic type
Recursive Matrix-Vector Spaces
[Figure: recursive matrix-vector composition, taken from the Socher, Bengio, and Manning ACL 2012 tutorial.]
Current Thoughts
• Can we learn tensors for typed phrases?
• Can we learn higher-order tensors? Or just matrices?
• Should the sentence space be the same as the noun space?
• Should the sentence space be contextual?
• What should the learning mechanism be?
  • recursive neural networks
  • Baroni’s contextual method
  • ...
References
• Type-Driven Syntax and Semantics for Composing Meaning Vectors, Stephen Clark, draft chapter to appear in Quantum Physics and Linguistics: A Compositional, Diagrammatic Discourse, Heunen, Sadrzadeh and Grefenstette (eds)
• Mathematical Foundations for a Compositional Distributional Model of Meaning, Bob Coecke, Mehrnoosh Sadrzadeh and Stephen Clark, Linguistic Analysis: A Festschrift for Joachim Lambek, van Benthem and Moortgat (eds), 2011
• Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space, M. Baroni and R. Zamparelli, Proceedings of EMNLP, Cambridge MA, 2010
• Semantic Compositionality through Recursive Matrix-Vector Spaces, Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng, Proceedings of EMNLP, Jeju, Korea, 2012