A Mathematical Framework for a Distributional Compositional Model of Meaning

Stephen Clark
University of Cambridge Computer Laboratory

Stanford University, 10 May 2013


Motivation

• Two success stories:
  • distributional vector-based models of lexical meaning
  • compositional logic-based models of sentence meaning
• Can we combine these approaches to give a vector-based semantic model of phrases and sentences?
• A fundamental new problem in natural language semantics


Interdisciplinary Endeavour

• Collaboration with the Oxford Computational Linguistics and Quantum groups
• B. Coecke, E. Grefenstette†, S. Pulman, M. Sadrzadeh†
• Linguistics, semantics, logic, category theory, quantum logic, ...

† thanks to Ed and Mehrnoosh for some of the slides


Today's Talk

• Recap on set-theoretic approaches to semantics
• Distributional models of word meaning
• Categorial grammar syntax
• A compositional distributional model (in theory)
• Brief description of some empirical work


Formal (Montague) Semantics

• The dominant approach in linguistics and the philosophy of language (Lewis, Montague, 1970s)
• Characterised by the use of logic as the semantic formalism
• A successful model of compositionality based on Frege's principle


Formal (Montague) Semantics

S → NP VP : VP′(NP′)        "The dog sleeps"

• dog′ picks out an individual in some model
• sleep′ is a relation (the set of individuals who sleep in the model)
• "The dog sleeps"′ is true if dog′ is in sleep′, and false otherwise
• Meanings of words and sentences have different semantic types


Semantics in GOFAI (and the Semantic Web)

• First-order predicate calculus
• Well-defined inference procedures
• Efficient theorem provers
• Knowledge encoded as ontologies


Shortcomings of the Traditional Approach

"Regular coffee breaks diminish the risk of getting Alzheimer's and dementia in old age."
"Three cups of coffee a day greatly reduce the chance of developing dementia or Alzheimer's later in life."

• Semantic similarity is difficult to model using traditional methods
• Similarity is at the heart of many NLP and IR problems
• Evidence from cognitive science that similarity is part of humans' conceptual models


Distributional and Semantic Similarity

• "You shall know a word by the company it keeps." (Firth, 1957)
• Distributional hypothesis: the meaning of a word can be represented by the distribution of words appearing in its contexts


Distributional and Semantic Similarity

• dog and cat are related semantically: dog and cat both co-occur with big, small, furry, eat, sleep
• ship and boat have similar meanings: ship and boat appear as the direct object of the verbs sail, clean, bought, and are modified by the adjectives large, clean, expensive


Window Methods

• In window methods the context is a fixed-size window of words either side of the target word
• For each target word a vector is created where each basis vector corresponds to a context word
• The coefficient for each basis is a (weighted) frequency of how often the context word appears with the target word (a minimal sketch in code follows)
• Our compositional framework is agnostic about how the word vectors are built
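A minimal sketch of the window method in Python, assuming a toy corpus and raw co-occurrence counts (the function names and the corpus here are invented for illustration; a real system would use a large corpus and weighted counts such as PMI):

```python
from collections import Counter
import math

def window_vectors(tokens, window=2):
    """Build a co-occurrence vector (a Counter over context words)
    for every target word, using a +/- `window` word context."""
    vecs = {}
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vecs.setdefault(target, Counter()).update(context)
    return vecs

def norm(x):
    return math.sqrt(sum(c * c for c in x.values()))

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    return dot / (norm(u) * norm(v))

corpus = "the furry cat sleeps the furry dog sleeps the small dog eats".split()
vecs = window_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))  # high: cat and dog share contexts
```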
Vector Space for Window Method

[Figure: a 2-D slice of the vector space, with context-word bases such as "furry", "stroke" and "pet"; the cat and dog vectors point in similar directions.]


Example Output

• introduction: launch, implementation, advent, addition, adoption, arrival, absence, inclusion, creation, departure, availability, elimination, emergence, use, acceptance, abolition, array, passage, completion, announcement, ...


Example Output

• evaluation: assessment, examination, appraisal, review, audit, analysis, consultation, monitoring, testing, verification, inquiry, inspection, measurement, supervision, certification, checkup, ...


From Words to Sentences

[Figure: a putative sentence space containing s1 = "man killed dog", s2 = "man killed by dog", s3 = "man murdered cat"; where should these points lie relative to one another?]


What Semantics?!

• A semantics of similarity
• How to incorporate inference, logical operators, quantification, etc. is an interesting question ...


A Simple CG Derivation

interleukin-10      inhibits        production
NP                  (S\NP)/NP       NP
                    --------------------------- >
                    S\NP
------------------------------------------------ <
S

• > forward application
• < backward application


Pregroup Grammar Derivation

Google      bought              Microsoft
NP          NP^r · S · NP^l     NP
            -------------------------
            NP^r · S
-----------------------
S

[Figure: the same reduction drawn with under-links connecting NP with NP^r, and NP^l with NP.]


Pregroup Algebra

• A pregroup is a partially ordered monoid in which each element a has a left adjoint a^l and a right adjoint a^r such that

  a^l · a → 1,    a · a^r → 1


Pregroups for Linguistics

• The monoid is the set of grammatical types: NP, NP^r, NP^l, NP^rr, NP^ll, S, PP, ...
• The monoid operation (·) is just juxtaposition
• The unit of the monoid (1) is the empty string


Pregroups for Linguistics

• The partial order encodes the derivation relation; for the earlier derivation/reduction (mechanised in the sketch below):

  NP · (NP^r · S · NP^l) · NP → 1 · (S · NP^l) · NP → 1 · S · 1 = S

• At an abstract mathematical level (category theory), the algebra of pregroups and vector spaces can be seen as equivalent
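A small illustrative sketch, not from the talk, of pregroup reduction in Python: a type is a sequence of (basic type, adjoint degree) pairs, and greedy cancellation of adjacent x^l · x and x · x^r pairs is enough for a simple reduction like the one above:

```python
# A type is a list of (basic_type, adjoint_degree) pairs:
# degree -1 is a left adjoint (x^l), 0 is plain, +1 is a right adjoint (x^r).
def reduce_type(t):
    """Greedily cancel adjacent pairs x^l . x -> 1 and x . x^r -> 1."""
    t = list(t)
    changed = True
    while changed:
        changed = False
        for i in range(len(t) - 1):
            (a, m), (b, n) = t[i], t[i + 1]
            if a == b and n == m + 1:  # covers both cancellation rules
                del t[i:i + 2]
                changed = True
                break
    return t

google = [("NP", 0)]
bought = [("NP", 1), ("S", 0), ("NP", -1)]  # NP^r . S . NP^l
microsoft = [("NP", 0)]
print(reduce_type(google + bought + microsoft))  # [('S', 0)] -- i.e. S
```

Greedy leftmost cancellation suffices for this linear case; a full pregroup parser would have to search over alternative reductions.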
Category Theory

• Pregroups form a compact closed category, with the types as objects, derivation arrows as morphisms, juxtaposition as tensor, and the under-links as the 'cups' of composition
• Vector spaces form a compact closed category, with vector spaces as objects, linear maps as morphisms, the tensor product as tensor, and tensor contraction as the 'cups' of composition
• Similar pictures can be drawn for quantum protocols

[Figure: string diagrams for vectors v, w and a state Ψ, of the kind used to describe quantum protocols.]


Predicate-Argument Semantics

man             bites                   dog
NP              NP^r · S · NP^l         NP
man′            λx.λy.bites′(x, y)      dog′
                ---------------------------
                NP^r · S : λy.bites′(dog′, y)       (function application)
--------------------------------------------
S : bites′(dog′, man′)


Vector-Space Semantics?

• What are the semantic types of the vectors?
• What is the equivalent of function application?


Adjective-Noun Combinations

red         car
N · N^l     N
---------------
N

• The adjective is a function
• How are functions represented in linear algebra? Functions are matrices (linear maps) (Baroni and Zamparelli, 2010)
• How do functions combine with arguments in linear algebra? By matrix multiplication (sketched below)
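As a toy illustration (random numbers standing in for corpus-derived values, and a made-up 5-dimensional noun space), the adjective acts on the noun vector by matrix-vector multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-dimensional noun space N; real vectors would come from corpus counts.
car = rng.random(5)        # a noun vector in N
RED = rng.random((5, 5))   # the adjective as a linear map N -> N, i.e. in N (x) N

red_car = RED @ car        # function application = matrix-vector multiplication
print(red_car.shape)       # (5,) -- the result is again a noun vector
```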
Matrix Multiplication

$$\mathrm{RED} \times \overrightarrow{car} = \overrightarrow{red\ car}: \quad
\begin{pmatrix}
R_{11} & R_{12} & R_{13} & R_{14} & R_{15}\\
R_{21} & R_{22} & R_{23} & R_{24} & R_{25}\\
R_{31} & R_{32} & R_{33} & R_{34} & R_{35}\\
R_{41} & R_{42} & R_{43} & R_{44} & R_{45}\\
R_{51} & R_{52} & R_{53} & R_{54} & R_{55}
\end{pmatrix}
\begin{pmatrix} c_1\\ c_2\\ c_3\\ c_4\\ c_5 \end{pmatrix}
=
\begin{pmatrix} rc_1\\ rc_2\\ rc_3\\ rc_4\\ rc_5 \end{pmatrix}$$


Matrix and Vector Types

• RED ∈ N ⊗ N (pregroup type N · N^l)
• car ∈ N (pregroup type N)
• red car ∈ N (pregroup type N, via N · N^l · N → N)


Syntactic Types to Tensor Spaces

man         bites               dog
NP          NP^r · S · NP^l     NP
N           N ⊗ S ⊗ N           N

• What is the sentence space (different from the noun space)?


Meaning Vectors as Tensors

[Figure: diagrammatic representations of meaning tensors built from the spaces S and N.]


Tensors

• Rank 1 (vector): $\vec{v} \in A = \sum_i C^v_i\, \vec{a}_i$
• Rank 2 (matrix): $M \in A \otimes B = \sum_{ij} C^M_{ij}\, \vec{a}_i \otimes \vec{b}_j$
• Rank 3 (cuboid): $R \in A \otimes B \otimes C = \sum_{ijk} C^R_{ijk}\, \vec{a}_i \otimes \vec{b}_j \otimes \vec{c}_k$
• Rank n: $T \in V_1 \otimes \cdots \otimes V_n = \sum_{\alpha_1 \ldots \alpha_n} C^T_{\alpha_1 \ldots \alpha_n}\, \vec{\beta}^{\,1}_{\alpha_1} \otimes \cdots \otimes \vec{\beta}^{\,n}_{\alpha_n}$


Tensor Contraction

• Rank 0 × rank 0: field multiplication
• Rank 0 × rank n: scalar multiplication
• Rank 1 × rank 1: inner product (dot product)
• Rank 2 × rank 1: matrix-vector multiplication
• Rank 2 × rank 2: matrix multiplication
• ...
• The general reduction mechanism is just a generalisation of these familiar tensor contractions (sketched in code below)
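Continuing the toy setup (random numbers, invented dimensions), a transitive verb is a rank-3 tensor in N ⊗ S ⊗ N, and the sentence vector is obtained by contracting it with the subject and object vectors; np.einsum expresses the contraction directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 5, 3                    # noun-space and sentence-space dimensions

man = rng.random(n)            # subject vector in N
dog = rng.random(n)            # object vector in N
BITES = rng.random((n, s, n))  # transitive verb as a rank-3 tensor in N (x) S (x) N

# Contract the subject against the first index and the object against the
# last, leaving a vector in the sentence space S.
sentence = np.einsum("isj,i,j->s", BITES, man, dog)
print(sentence.shape)          # (3,)
```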
Multi-Linear Algebra

[Figure: diagrammatic depictions of tensor contraction for tensors of increasing rank over the spaces S and N.]


Type Reductions

man         bites               dog
NP          NP^r · S · NP^l     NP
N           N ⊗ S ⊗ N           N
            N ⊗ S
S

• Tensor contraction via inner products
• Objects 'get smaller' (as they do in formal semantics)
• Verbs only have operator semantics; nouns only have contextual semantics


Summary of Tensor Semantics

The meaning of a sentence $w_1 \cdots w_n$ with grammatical structure $p_1 \cdots p_n \to_\alpha s$ is:

$$\overrightarrow{w_1 \cdots w_n} := F(\alpha)(\overrightarrow{w_1} \otimes \cdots \otimes \overrightarrow{w_n})$$

• $F(\alpha)$ is Montague's homomorphic passage (Frege's principle) in the form of a linear map


A Real Example

[Figure: the full pregroup type assignment for a Wall Street Journal sentence, "In an Oct. 19 review of 'The Misanthrope' at Chicago's Goodman Theatre ('Revitalized Classics Take the Stage in Windy City,' Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag.", with types such as NP^r · S[dcl] · S[pss]^l · NP assigned to every word, illustrating the scale of real derivations.]


Learning Matrices

  red car       ⟨...⟩
  red balloon   ⟨...⟩      =⇒   RED = [R_ij]  (a 5 × 5 matrix in the toy example)
  red chair     ⟨...⟩
  red shoe      ⟨...⟩

• Use linear regression to learn the RED matrix (Baroni and Zamparelli)
• RED can now be applied to unseen pairs: RED × pantaloon ⇒ red pantaloon


Learning Matrices

  tiger sleeps  ⟨...⟩
  cat sleeps    ⟨...⟩      =⇒   SLEEPS = [S_ij]
  man sleeps    ⟨...⟩
  pig sleeps    ⟨...⟩

• Use linear regression to learn the SLEEPS matrix (Grefenstette et al.); the regression step is sketched below
• SLEEPS can now be applied to unseen pairs: SLEEPS × chihuahua ⇒ chihuahua sleeps
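A minimal sketch of the regression step on synthetic data (the noun vectors and the "true" matrix are random stand-ins; in practice the phrase vectors for "red car", "red balloon", ... are corpus-derived):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Synthetic stand-ins for the training data.
nouns = rng.random((20, n))        # one noun vector per row
true_red = rng.random((n, n))      # unknown matrix the data was generated from
phrases = nouns @ true_red.T       # row i is true_red @ nouns[i]

# Least-squares estimate of RED from (noun, phrase) training pairs.
X, *_ = np.linalg.lstsq(nouns, phrases, rcond=None)
RED = X.T

pantaloon = rng.random(n)          # an unseen noun
red_pantaloon = RED @ pantaloon
print(np.allclose(RED, true_red))  # True on this noise-free toy data
```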
Context and Compositionality

• Should the meanings of all units be contextual? (Baroni and Zamparelli; Clarke)
• If so, what role does compositionality play?
  • Is it just to combat sparse data?


Learning Tensors

[Figure: two-step regression for transitive verbs. Step 1: estimate VP matrices (EAT MEAT, EAT PIE) from subject-vector/sentence-vector pairs such as dogs ↦ "dogs eat meat" and boys ↦ "boys eat pie". Step 2: estimate the EAT tensor from object-vector/VP-matrix pairs such as meat ↦ EAT MEAT and pie ↦ EAT PIE. Thanks to Marco Baroni and Ed Grefenstette for the picture.]


Disambiguation Evaluation

Nouns and intransitive verbs (Mitchell and Lapata, 2008):

  Subject   Verb   Landmark (High)   Landmark (Low)
  face      glow   beam              burn
  fire      glow   burn              beam
  horse     draw   pull              sketch

Example judgements (score between 1 and 7):
• "the face glowed" vs. "the face beamed"
• "the face glowed" vs. "the face burned"
• "the fire glowed" vs. "the fire burned"


Disambiguation Evaluation

Nouns and transitive verbs (Grefenstette and Sadrzadeh, 2011):

  Subject    Object    Verb   Landmark (High)   Landmark (Low)
  people     door      try    test              judge
  tribunal   crime     try    judge             test
  poll       support   show   express           picture

Example judgements (score between 1 and 7):
• "the people tried the door" vs. "the people tested the door"
• "the people tried the door" vs. "the people judged the door"
• "the tribunal tried the crime" vs. "the tribunal judged the crime"


Experimental Details (Grefenstette et al.)

• Build context vectors using a large corpus and the window method (with dimensionality reduction)
• Calculate the similarity of each pair using the cosine:
  • Cosine(people try door, people test door)
  • Cosine(people try door, people judge door)
• Calculate the correlation coefficient between the human and cosine scores (the evaluation loop is sketched after the results)


Results (Grefenstette et al.)

First Evaluation:

  Model            ρ
  UpperBound       0.40
  Multiply.nmf     0.19
  Regression.nmf   0.18
  Add.nmf          0.13
  Verb.nmf         0.08
  Regression.svd   0.23
  Add.svd          0.11
  Verb.svd         0.06

Second Evaluation:

  Model            ρ
  UpperBound       0.62
  Regression.nmf   0.29
  Multiply.nmf     0.23
  Add.nmf          0.07
  Verb.nmf         0.04
  Regression.svd   0.32
  Add.svd          0.12
  Verb.svd         0.08
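A sketch of the evaluation loop, with random vectors and scores standing in for composed sentence vectors and the 1-7 human judgements (rank correlation via Spearman's ρ, as in the tables above):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Random stand-ins for composed sentence-vector pairs and human scores.
rng = np.random.default_rng(3)
pairs = [(rng.random(5), rng.random(5)) for _ in range(10)]
human = rng.integers(1, 8, size=10)

model = [cosine(u, v) for u, v in pairs]
rho, _ = spearmanr(model, human)
print(f"Spearman rho = {rho:.2f}")
```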
Current Thoughts I

• What should the sentence space be?
• Should the sentence space be contextual?
• What should the learning mechanism be?
• Can current learning methods be generalised to typed tensors and naturally occurring text?


Current Thoughts II

• How to deal with closed-class words (e.g. relative pronouns)?
• How to combine distributional and symbolic methods?
  • "find me all wild animals which might make good pets"


References

• Stephen Clark. Type-Driven Syntax and Semantics for Composing Meaning Vectors. In Quantum Physics and Linguistics: A Compositional, Diagrammatic Discourse, Heunen, Sadrzadeh and Grefenstette (eds), Oxford University Press, 2013.
• Bob Coecke, Mehrnoosh Sadrzadeh and Stephen Clark. Mathematical Foundations for a Compositional Distributional Model of Meaning. Linguistic Analysis: A Festschrift for Joachim Lambek, van Benthem and Moortgat (eds), 2011.
• Edward Grefenstette and Mehrnoosh Sadrzadeh. Experimental Support for a Categorical Compositional Distributional Model of Meaning. In Proceedings of EMNLP, Edinburgh, 2011.
• Marco Baroni and Roberto Zamparelli. Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space. In Proceedings of EMNLP, Cambridge MA, 2010.
• Daoud Clarke. A Context-Theoretic Framework for Compositionality in Distributional Semantics. Computational Linguistics, 38(1):41-71, 2012.
• Edward Grefenstette, Georgiana Dinu, Yao-Zhong Zhang, Mehrnoosh Sadrzadeh and Marco Baroni. Multi-Step Regression Learning for Compositional Distributional Semantics. In Proceedings of IWCS, Potsdam, 2013.