A Mathematical Framework for a Distributional Compositional Model of Meaning

Stephen Clark
University of Cambridge Computer Laboratory

University of Groningen, 10 April 2013

Motivation

• Two success stories:
  • distributional vector-based models of lexical meaning
  • compositional logic-based models of sentence meaning
• Can we combine these approaches to give a vector-based semantic model of phrases and sentences?
• A fundamental new problem in natural language semantics

Interdisciplinary Endeavour

• Collaboration with the Oxford Computational Linguistics and Quantum groups
• B. Coecke, E. Grefenstette†, S. Pulman, M. Sadrzadeh†
• Linguistics, semantics, logic, category theory, quantum logic, ...

† thanks to Ed and Mehrnoosh for some of the slides

A Hot Topic

• 1.5M 5-site†, 3-year EPSRC project
• 1M 5-year ERC Starting Grant (DisCoTex)
• 5-year ERC Starting Grant to Marco Baroni (COMPOSES)
• Lots of papers and workshops:
  • ACL 2011: Distributional Semantics and Compositionality
  • IWCS 2013: Towards a Formal Distributional Semantics
  • ...

† Cambridge, Edinburgh, Oxford, Sussex, York

Today's Talk

• Recap on set-theoretic approaches to semantics
• Distributional models of word meaning
• Categorial grammar syntax
• A compositional distributional model
  • currently applied only to simple transitive-verb sentences
• Related work in "Deep Learning"

Formal (Montague) Semantics

• The dominant approach in linguistics and the philosophy of language (Lewis, Montague, 1970s)
• Characterised by the use of logic as the semantic formalism
• A successful model of compositionality based on Frege's principle

Formal (Montague) Semantics

  S → NP VP : VP′(NP′)

  "The dog sleeps"

• dog′ picks out an individual in some model
• sleep′ is a relation (the set of individuals who sleep in the model)
• "The dog sleeps"′ is true if dog′ is in sleep′, and false otherwise
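A minimal sketch (not from the talk) of the model-theoretic evaluation just described: the toy model is invented for illustration, dog′ denotes an individual, sleep′ denotes the set of sleepers, and the rule S → NP VP is interpreted as function application VP′(NP′), realised here as set membership.

```python
# Toy model, invented for illustration (not from the talk).
dog_prime = "fido"                       # dog' picks out an individual in the model
sleep_prime = {"fido", "felix"}          # sleep' is the set of individuals who sleep

def apply_vp(vp_denotation, np_denotation):
    """S -> NP VP : VP'(NP') -- function application, here set membership."""
    return np_denotation in vp_denotation

print(apply_vp(sleep_prime, dog_prime))  # True: "The dog sleeps" holds in this model
```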
Semantics in GOFAI (and Semantic Web)

• First-order predicate calculus
• Well-defined inference procedures
• Efficient theorem provers
• Knowledge encoded as ontologies

Shortcomings of the Traditional Approach

  "Regular coffee breaks diminish the risk of getting Alzheimer's and dementia in old age."

  "Three cups of coffee a day greatly reduce the chance of developing dementia or Alzheimer's later in life."

• Semantic similarity is difficult to model using traditional methods
• Similarity is at the heart of many NLP and IR problems
• Evidence from cognitive science that similarity is part of humans' conceptual models

Distributional and Semantic Similarity

• "You shall know a word by the company that it keeps." (Firth, '57)
• Distributional hypothesis: the meaning of a word can be represented by the distribution of words appearing in its contexts

Distributional and Semantic Similarity

• dog and cat are related semantically: dog and cat both co-occur with big, small, furry, eat, sleep
• ship and boat have similar meanings: ship and boat appear as the direct object of the verbs sail, clean, bought, and are modified by the adjectives large, clean, expensive
• Induce lexical relations automatically from large text collections using distributional similarity

Window Methods

• In window methods the context is a fixed-size window of words either side of the target word
• For each target word a vector is created in which each basis vector corresponds to a context word
• The coefficient for each basis vector is a (weighted) frequency: the number of times the corresponding context word appears in the context of the target word
• Our compositional framework is agnostic about how the word vectors are built

Vector Space for Window Method

[Figure: a two-dimensional slice of the vector space, with basis vectors such as furry, stroke and pet; the vectors for cat and dog lie close together.]

Example Output

• introduction: launch, implementation, advent, addition, adoption, arrival, absence, inclusion, creation, departure, availability, elimination, emergence, use, acceptance, abolition, array, passage, completion, announcement, ...

Example Output

• evaluation: assessment, examination, appraisal, review, audit, analysis, consultation, monitoring, testing, verification, inquiry, inspection, measurement, supervision, certification, checkup, ...

From Words to Sentences

[Figure: a hypothetical sentence space containing vectors for "man killed dog", "man murdered cat" and "man killed by dog" — where should these sentences lie relative to one another?]

What Semantics?!

• A semantics of similarity
• How to incorporate inference, logical operators, quantification, etc. is an interesting question ...
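Before turning to syntax, here is a minimal sketch (not from the talk) of the window method and the similarity computation described above. The toy corpus, the window size of 2, and the unweighted counts are illustrative assumptions; real systems use large corpora and weighting schemes such as PMI or tf-idf.

```python
import numpy as np
from collections import Counter

# Toy corpus and parameters, invented for illustration (not from the talk).
corpus = "the furry cat sleeps the furry dog sleeps the dog eats".split()
window = 2                                       # words either side of the target

vocab = sorted(set(corpus))                      # context words = basis vectors
index = {w: i for i, w in enumerate(vocab)}

def window_vector(target):
    """Count how often each context word appears within the window of the target."""
    counts = Counter()
    for pos, word in enumerate(corpus):
        if word == target:
            lo, hi = max(0, pos - window), pos + window + 1
            counts.update(corpus[lo:pos] + corpus[pos + 1:hi])
    vec = np.zeros(len(vocab))
    for w, c in counts.items():
        vec[index[w]] = c                        # unweighted co-occurrence frequency
    return vec

cat, dog = window_vector("cat"), window_vector("dog")
cosine = cat @ dog / (np.linalg.norm(cat) * np.linalg.norm(dog))
print(round(float(cosine), 3))                   # distributional similarity of cat and dog
```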
Categorial Grammar

  interleukin-10: NP    inhibits: (S\NP)/NP    production: NP

A Simple CG Derivation

  inhibits production ⇒ S\NP                         (> forward application)
  interleukin-10 (inhibits production) ⇒ S           (< backward application)

Pregroup Grammar Derivation

  Google: NP    bought: NP^r · S · NP^l    Microsoft: NP
  bought Microsoft ⇒ NP^r · S                        (NP^l · NP → 1)
  Google (bought Microsoft) ⇒ S                      (NP · NP^r → 1)

Pregroup Reduction

[Figure: the same reduction drawn diagrammatically, with "cups" linking Google (NP) and Microsoft (NP) to the NP^r and NP^l parts of bought's type NP^r · S · NP^l, leaving S.]

Predicate-Argument Semantics

  man: NP, man′    bites: NP^r · S · NP^l, λx.λy.bites′(x, y)    dog: NP, dog′
  bites dog ⇒ NP^r · S, λy.bites′(dog′, y)           (function application)
  man bites dog ⇒ S, bites′(dog′, man′)              (function application)

Vector-Space Semantics?

• What are the semantic types of the vectors?
• What is the equivalent of function application?

Adjective Noun Combinations

  red: N · N^l    car: N    red car ⇒ N

• Adjective is a function
• How are functions represented in linear algebra? (B&Z, 2010)
• Functions are matrices (linear maps)
• How do functions combine with arguments in linear algebra?
• Matrix multiplication
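A minimal numpy sketch (not from the talk) of adjective-as-matrix composition in the style of Baroni & Zamparelli (2010): the dimensionality and the random values are placeholders for a matrix and vector that would be learned from corpus data.

```python
import numpy as np

noun_dim = 5                                # toy dimensionality of the noun space N
rng = np.random.default_rng(0)

RED = rng.random((noun_dim, noun_dim))      # adjective: a linear map N -> N (a matrix)
car = rng.random(noun_dim)                  # noun: a vector in N

red_car = RED @ car                         # function application = matrix-vector product
print(red_car.shape)                        # (5,): "red car" is again a vector in N
```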
Matrix Multiplication

  red car = RED × car, i.e. rc_i = Σ_j R_ij c_j
  (here a 5×5 adjective matrix RED = (R_ij) applied to a 5-dimensional noun vector car = (c_1, ..., c_5))

Matrix and Vector Types

  RED ∈ N ⊗ N    car ∈ N    red car ∈ N
  (syntactic types: red: N · N^l, car: N, red car: N)

Syntactic Types to Tensor Spaces

  man: NP ↦ N    bites: NP^r · S · NP^l ↦ N ⊗ S ⊗ N    dog: NP ↦ N

• Key question: what is the sentence space?

Meaning Vectors as Tensors / Multi-Linear Algebra

[Figures: meaning vectors drawn as tensors of increasing order over the noun space N and the sentence space S — a vector, a matrix, and a third-order tensor.]

The Distributional Meaning of a Verb

  Ψ = Σ_ijk C_ijk (n_i ⊗ s_j ⊗ n_k) ∈ N ⊗ S ⊗ N

• The verb tensor effectively encodes all the ways in which a verb meaning can participate in the meaning of a sentence
• The key idea is that a verb is relational, or functional, in nature

Type Reductions

  man: N    bites: N ⊗ S ⊗ N    dog: N
  bites dog ⇒ N ⊗ S                     (syntactically NP^r · S)
  man bites dog ⇒ S

• The reductions are inner products (giving real numbers of dimension 1)

Summary of Vector-Space Semantics

The meaning of a sentence w_1 · · · w_n with grammatical structure p_1 · · · p_n →_α s is:

  meaning vector of (w_1 · · · w_n) := F(α)(w_1 ⊗ · · · ⊗ w_n),

where each w_i on the right-hand side stands for the meaning vector of word w_i.

• F(α) is Montague's homomorphic passage (Frege's principle) in the form of a linear map
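A minimal numpy sketch (not from the talk) of the transitive-verb case just summarised: the verb is a third-order tensor in N ⊗ S ⊗ N, and the type reductions become tensor contractions (inner products) with the subject and object vectors. Dimensionalities and values are illustrative placeholders, not learned tensors.

```python
import numpy as np

noun_dim, sent_dim = 4, 3                            # toy dimensionalities of N and S
rng = np.random.default_rng(0)

bites = rng.random((noun_dim, sent_dim, noun_dim))   # C_ijk: the verb tensor in N (x) S (x) N
man = rng.random(noun_dim)                           # subject vector in N
dog = rng.random(noun_dim)                           # object vector in N

# "man bites dog": contract the object index k with dog and the subject index i with man.
man_bites_dog = np.einsum("ijk,i,k->j", bites, man, dog)
print(man_bites_dog.shape)                           # (3,): the sentence meaning lives in S
```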
Lexical Category Sequence for Newspaper Sentence

[Figure: the CCG lexical category sequence, one category per word, for the Wall Street Journal sentence "In an Oct. 19 review of 'The Misanthrope' at Chicago's Goodman Theatre ('Revitalized Classics Take the Stage in Windy City,' Leisure & Arts), the role of Celimene, played by Kim Cattrall, was mistakenly attributed to Christina Haag." The categories include (S/S)/NP, NP[nb]/N, N/N, N, (NP\NP)/NP, (S[dcl]\NP)/NP, (S[pss]\NP)/PP, PP/NP, ((S\NP)\(S\NP))/NP and (NP[nb]/N)\NP.]

Machine Learning

• Two crucial questions the framework does not answer:
  1. what is the sentence space?
  2. where do the tensor values C_ijk come from?
• We'd like machine learning to answer (2) (and maybe (1) as well)
• Objective function? — probably depends on the task or application:
  • parsing: optimise parsing performance on a treebank
  • sentiment analysis: optimise accuracy on movie-rating prediction
  • machine translation evaluation: optimise correlation between vector similarities and human similarity scores

Neural Networks to the Rescue?

• Work by Socher, Manning and Ng addresses a similar problem
• It uses recursive neural networks and backpropagation through structure to learn vectors and matrices
• But it does not exploit the notion of syntactic type

Recursive Matrix-Vector Spaces

[Figure: recursive matrix-vector composition, taken from the Socher, Bengio and Manning ACL 2012 tutorial.]

Current Thoughts

• Can we learn tensors for typed phrases?
• Can we learn higher-order tensors? Or just matrices?
• Should the sentence space be the same as the noun space?
• Should the sentence space be contextual?
• What should the learning mechanism be?
  • recursive neural networks
  • Baroni's contextual method
  • ...

References

• Stephen Clark. Type-Driven Syntax and Semantics for Composing Meaning Vectors. Draft chapter to appear in Quantum Physics and Linguistics: A Compositional, Diagrammatic Discourse, Heunen, Sadrzadeh and Grefenstette (eds).
• Bob Coecke, Mehrnoosh Sadrzadeh and Stephen Clark. Mathematical Foundations for a Compositional Distributional Model of Meaning. Linguistic Analysis: A Festschrift for Joachim Lambek, van Benthem and Moortgat (eds), 2011.
• M. Baroni and R. Zamparelli. Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space. Proceedings of EMNLP, Cambridge MA, 2010.
• Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng. Semantic Compositionality through Recursive Matrix-Vector Spaces. Proceedings of EMNLP, Jeju, Korea, 2012.