A Mathematical Framework for a Distributional Compositional Model of Meaning
Stephen Clark
University of Cambridge Computer Laboratory
King’s College London, 22 November 2012
DisCo Models of Meaning

Intro Lexical Syntax

Sentences in Google
[Screenshot: Google search for "A man killed his dog", about 22,300,000 results. Top results: "Dog shoots man" (Metro.co.uk: "A man was killed after his dog stepped on a loaded shotgun in the back of a pick-up truck."); "man who killed his dog to survive in the amazon" (community.discovery.com); "The Man who Killed His Friend for Eating his Dog After it was Killed ..." (notverycool.com).]

Motivation
• Two success stories:
  • distributional vector-based models of lexical meaning (in practice)
  • compositional logic-based models of sentence meaning (in theory)
• Can we combine these approaches to give a vector-based semantic model of phrases and sentences?
• A fundamental problem in natural language semantics

Interdisciplinary Endeavour
• Collaboration with the Oxford Computational Linguistics and Quantum groups
  • B. Coecke, E. Grefenstette†, S. Pulman, M. Sadrzadeh†
• Linguistics, semantics, logic, category theory, quantum logic, ...
† thanks to Ed and Mehrnoosh for some of the slides

A Hot Topic
• 1.5M 5-site†, 3-year EPSRC project
• 1M 5-year ERC Starting Grant (DisCoTex)
• 5-year ERC Starting Grant to Marco Baroni (COMPOSES)
• EPSRC Career Acceleration Fellowship to Mehrnoosh Sadrzadeh
• Lots of papers, workshops:
  • ACL 2011: Distributional Semantics and Compositionality
  • IWCS 2013: Towards a formal distributional semantics
  • ...
† Cambridge, Edinburgh, Oxford, Sussex, York

Today’s Talk
• Distributional models of word meaning
• Categorial grammar syntax
• A compositional distributional model
• An example sentence space
  • based on simple transitive verb sentences
• Related work in “Deep Learning”

Distributional and Semantic Similarity
• You shall know a word by the company that it keeps. (Firth, ‘57)
• Distributional hypothesis: the meaning of a word can be represented by the distribution of words appearing in its contexts
• dog and cat are related semantically: dog and cat both co-occur with big, small, furry, eat, sleep
• ship and boat have similar meanings: ship and boat appear as the direct object of the verbs sail, clean, bought, and are modified by the adjectives large, clean, expensive
• Induce lexical relations automatically from large text collections and distributional similarity

Window Methods
• In window methods the context is a fixed-size window of words either side of the target word
• For each target word a vector is created where each basis vector corresponds to a context word
• The coefficient for each basis vector is a (weighted) count of the number of times the corresponding context word appears in the context of the target word
• Our compositional framework is agnostic towards the word vectors

Vector Space for Window Method
[Figure: vectors for cat and dog in a space whose axes are the context words furry, stroke and pet]

Example Output
• introduction: launch, implementation, advent, addition, adoption, arrival, absence, inclusion, creation, departure, availability, elimination, emergence, use, acceptance, abolition, array, passage, completion, announcement, ...
• evaluation: assessment, examination, appraisal, review, audit, analysis, consultation, monitoring, testing, verification, inquiry, inspection, measurement, supervision, certification, checkup, ...

From Words to Sentences
[Figure: a sentence space containing vectors s1, s2 and s3, the sentences "man killed dog", "man murdered cat" and "man killed by dog", and a "?"]

What Semantics?!
• A semantics of similarity
• How to incorporate inference, logical operators, quantification, etc. is an interesting question ...
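The window method described on the Window Methods slide can be sketched in a few lines of Python. This is only an illustrative sketch: the toy corpus, the window size of 2, the use of raw (unweighted) counts and the function names `window_vectors` and `cosine` are all assumptions made for the example, not part of the talk.

```python
from collections import Counter
import math


def window_vectors(tokens, window=2):
    """Map each target word to a Counter of context-word counts.

    The context of a token is the `window` tokens on either side of it.
    Raw counts are used here; a real system would typically reweight them.
    """
    vectors = {}
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]
        vectors.setdefault(target, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen only so that dog and cat share contexts.
corpus = "the furry dog sleeps . the furry cat sleeps . the fast ship sails".split()
vecs = window_vectors(corpus, window=2)

print(cosine(vecs["dog"], vecs["cat"]))
print(cosine(vecs["dog"], vecs["ship"]))
```

On this toy corpus dog and cat occur in identical contexts, so they come out more similar to each other than dog is to ship, which is the behaviour the distributional hypothesis predicts.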
Categorial Grammar
  interleukin-10: NP    inhibits: (S\NP)/NP    production: NP

A Simple CG Derivation
  inhibits production: (S\NP)/NP  NP  ⇒  S\NP    (>, forward application)
  interleukin-10 inhibits production: NP  S\NP  ⇒  S    (<, backward application)

Pregroup Grammar Derivation
  Google: NP    bought: NP^r · S · NP^l    Microsoft: NP
  bought Microsoft: NP^r · S · NP^l · NP  →  NP^r · S
  Google bought Microsoft: NP · NP^r · S  →  S

Pregroup Reduction
[Figure: reduction diagram over Google (NP), bought (NP^r S NP^l) and Microsoft (NP)]

Semantics Example Learning

Predicate-Argument Semantics
  man: NP, man'    bites: NP^r · S · NP^l, λx.λy bites'(x, y)    dog: NP, dog'
  bites dog: NP^r · S, λy bites'(dog', y)    (function application)
  man bites dog: S, bites'(dog', man')    (function application)

Vector-Space Semantics?
  (same derivation of "man bites dog" as under Predicate-Argument Semantics)
• What are the semantic types of the vectors?
• What is the equivalent of function application?
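The pregroup reductions in the derivations above can be sketched as a small Python routine. This is a simplified sketch under stated assumptions: each simple type is encoded as a hypothetical pair (base, adjoint) with -1 for a left adjoint, 0 for a plain type and +1 for a right adjoint, and a greedy left-to-right stack is used to cancel adjacent pairs x^l · x and x · x^r. Greedy cancellation is enough for these examples but is not a complete pregroup parser.

```python
def reduce_types(types):
    """Greedily cancel adjacent pairs (x, n)(x, n + 1), i.e. x^l x and x x^r."""
    stack = []
    for base, adjoint in types:
        if stack and stack[-1][0] == base and stack[-1][1] + 1 == adjoint:
            stack.pop()  # the pair contracts to the unit and disappears
        else:
            stack.append((base, adjoint))
    return stack


# Illustrative encoding of the types used on the slides.
NP = ("NP", 0)
NP_R = ("NP", 1)   # NP^r, right adjoint
NP_L = ("NP", -1)  # NP^l, left adjoint
S = ("S", 0)

# Hypothetical toy lexicon for the example sentence.
LEXICON = {
    "Google": [NP],
    "bought": [NP_R, S, NP_L],
    "Microsoft": [NP],
}


def parse(words):
    """Concatenate the lexical types of the words and reduce them."""
    types = [t for word in words for t in LEXICON[word]]
    return reduce_types(types)


print(parse(["Google", "bought", "Microsoft"]))
print(parse(["bought", "Microsoft"]))
```

The full sentence reduces to the single type S, while the verb phrase "bought Microsoft" reduces to NP^r · S, matching the intermediate step of the derivation.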
Adjective Noun Combinations
  red: N · N^l    car: N    red car: N
• Adjective is a function
• How are functions represented in linear algebra? (B&Z, 2010)
• Functions are matrices (Linear Maps)
• How do functions combine with arguments in linear algebra?
• Matrix multiplication

Matrix Multiplication
\[
\mathrm{RED}\,\overrightarrow{car} = \overrightarrow{red\ car}:
\quad
\begin{pmatrix}
R_{11} & R_{12} & R_{13} & R_{14} & R_{15} \\
R_{21} & R_{22} & R_{23} & R_{24} & R_{25} \\
R_{31} & R_{32} & R_{33} & R_{34} & R_{35} \\
R_{41} & R_{42} & R_{43} & R_{44} & R_{45} \\
R_{51} & R_{52} & R_{53} & R_{54} & R_{55}
\end{pmatrix}
\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{pmatrix}
=
\begin{pmatrix} rc_1 \\ rc_2 \\ rc_3 \\ rc_4 \\ rc_5 \end{pmatrix}
\]

Matrix and Vector Types
  RED: syntactic type N · N^l, semantic type N ⊗ N
  car: syntactic type N, semantic type N
  red car: syntactic type N, semantic type N

Syntactic Types to Tensor Spaces
  man: NP, N    bites: NP^r · S · NP^l, N ⊗ S ⊗ N    dog: NP, N
• Key question: what is the sentence space?

Meaning Vectors as Tensors
[Figure: diagram with labels S and N]

The Distributional Meaning of a Verb
\[
\Psi = \sum_{ijk} C_{ijk}\,(\overrightarrow{n_i} \otimes \overrightarrow{s_j} \otimes \overrightarrow{n_k}) \in N \otimes S \otimes N
\]

  Basis vector            chases   eats
  ⟨fluffy, T, fluffy⟩     0.8      0.7
  ⟨fluffy, F, fluffy⟩     0.2      0.3
  ⟨fluffy, T, fast⟩       0.75     0.6
  ⟨fluffy, F, fast⟩       0.25     0.4
  ⟨fluffy, T, juice⟩      0.2      0.9
  ⟨fluffy, F, juice⟩      0.8      0.1
  ⟨tasty, T, juice⟩       0.1      0.1
  ...

• The verb tensor effectively encodes all the ways in which a verb meaning can participate in the meaning of a sentence
• Key idea is that a verb is relational, or functional, in nature

2D Plausibility Space (S)
[Figure: two-dimensional space with axes True and False, containing vectors for "dog chases cat" and "apple chases orange"]

Noun Space (N)

  Feature      dog   cat   apple   orange
  fluffy       0.8   0.9   0.0     0.0
  run          0.8   0.8   0.0     0.0
  fast         0.7   0.6   0.0     0.0
  aggressive   0.6   0.3   0.0     0.0
  tasty        0.1   0.0   0.9     1.0
  buy          0.5   0.5   0.9     0.9
  juice        0.0   0.0   0.8     1.0
  fruit        0.0   0.0   1.0     1.0

Verb Space (N ⊗ S ⊗ N)
  (same table of coefficients for chases and eats as under The Distributional Meaning of a Verb)

The Meaning of a Sentence
dog chases cat

  Basis vector            chases   dog, cat
  ⟨fluffy, T, fluffy⟩     0.8      0.8, 0.9
  ⟨fluffy, F, fluffy⟩     0.2      0.8, 0.9
  ⟨fluffy, T, fast⟩       0.75     0.8, 0.6
  ⟨fluffy, F, fast⟩       0.25     0.8, 0.6
  ⟨fluffy, T, juice⟩      0.2      0.8, 0.0
  ⟨fluffy, F, juice⟩      0.8      0.8, 0.0
  ⟨tasty, T, juice⟩       0.1      0.1, 0.0
  ...

\[
\overrightarrow{dog\ chases\ cat}_T = 0.8 \cdot 0.8 \cdot 0.9 + 0.75 \cdot 0.8 \cdot 0.6 + 0.2 \cdot 0.8 \cdot 0.0 + 0.1 \cdot 0.1 \cdot 0.0 + \ldots
\]
\[
\overrightarrow{dog\ chases\ cat}_F = 0.2 \cdot 0.8 \cdot 0.9 + 0.25 \cdot 0.8 \cdot 0.6 + 0.8 \cdot 0.8 \cdot 0.0 + \ldots
\]

In general:
\[
f(\overrightarrow{\pi} \otimes \Psi \otimes \overrightarrow{o})
= \sum_{ijk} C_{ijk}\,\langle \overrightarrow{\pi} \mid \overrightarrow{\pi_i} \rangle\,\overrightarrow{s_j}\,\langle \overrightarrow{o} \mid \overrightarrow{o_k} \rangle
= \sum_{j} \Big( \sum_{ik} C_{ijk}\,\langle \overrightarrow{\pi} \mid \overrightarrow{\pi_i} \rangle \langle \overrightarrow{o} \mid \overrightarrow{o_k} \rangle \Big)\,\overrightarrow{s_j}
\]

Multi-Linear Algebra
[Figure: tensor diagrams with labels N and S]

Type Reductions
  man: NP, N    bites: NP^r · S · NP^l, N ⊗ S ⊗ N    dog: NP, N
  bites dog: NP^r · S, N ⊗ S
  man bites dog: S, S
  Each reduction is an inner product (giving real numbers of dimension 1)

Summary of Vector-Space Semantics
Meaning of a sentence w_1 ··· w_n with the grammatical structure p_1 ··· p_n →_α s is:
\[
\overrightarrow{w_1 \cdots w_n} := F(\alpha)(\overrightarrow{w_1} \otimes \cdots \otimes \overrightarrow{w_n})
\]
• F(α) is Montague’s homomorphic passage (Frege’s principle) in the form of a linear map

Machine Learning
• Two crucial questions the framework does not answer:
  1 what is the sentence space?
  2 where do the tensor values C_ijk come from?
• We’d like ML to answer (2) (and maybe (1) as well)
• Objective function? Probably depends on the task or application:
  • parsing: optimise parsing performance on a treebank
  • sentiment analysis: optimise accuracy on movie ratings prediction
  • machine translation evaluation: optimise correlation between vector similarities and human similarity scores

Neural Networks to the Rescue?
• Work by Socher, Manning and Ng addresses a similar problem
• Uses recursive neural networks and backpropagation through structure to learn vectors and matrices
• But does not exploit the notion of syntactic type

Recursive Matrix-Vector Spaces
[Figure taken from Socher, Bengio, Manning ACL 2012 tutorial]
• Softmax classifier can predict labels for each node in the tree
• Supervised labels can be used for training
• Objective functions based on minimising label errors ...
  • for the task of predicting sentiment labels for adverb-adjective pairs (e.g. not great, pretty good)
  • and also the Penn Treebank parsing task (for a simpler model and short sentences)

Current Thoughts
• Can distributional semantic representations improve categorial grammar parsing?
• Can the Socher method be applied to our type-driven framework?
• Should the sentence space be the same as the noun space? (as it is in the Socher model)
• Can we learn higher-order tensors? Or just matrices?
  • the third-order tensor for the transitive verb could be approximated with two matrices (one each for the subject and object)

References
• Type-Driven Syntax and Semantics for Composing Meaning Vectors, Stephen Clark, a draft chapter to appear in Quantum Physics and Linguistics: A Compositional, Diagrammatic Discourse, Heunen, Sadrzadeh and Grefenstette (eds)
• Mathematical Foundations for a Compositional Distributional Model of Meaning, Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark, Linguistic Analysis: A Festschrift for Joachim Lambek, van Benthem and Moortgat (eds), 2011
• Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space, M. Baroni and R. Zamparelli, Proceedings of EMNLP, Cambridge MA, 2010
• Semantic Compositionality through Recursive Matrix-Vector Spaces, Richard Socher, Brody Huval, Christopher D. Manning and Andrew Y. Ng, Proceedings of EMNLP, Jeju, Korea, 2012
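As a supplementary sketch, the "dog chases cat" computation from The Meaning of a Sentence can be reproduced in plain Python. The noun and verb coefficients below are the ones listed on the slides; everything else is an assumption made for illustration: the basis vectors are treated as orthonormal (so each inner product is just a coefficient lookup), the verb tensor entries hidden behind the "..." on the slides are treated as zero, and the name `sentence_vector` is hypothetical.

```python
# Noun vectors over the context-word basis, values from the Noun Space slide.
dog = {"fluffy": 0.8, "run": 0.8, "fast": 0.7, "aggressive": 0.6,
       "tasty": 0.1, "buy": 0.5, "juice": 0.0, "fruit": 0.0}
cat = {"fluffy": 0.9, "run": 0.8, "fast": 0.6, "aggressive": 0.3,
       "tasty": 0.0, "buy": 0.5, "juice": 0.0, "fruit": 0.0}

# Sparse verb tensor C_ijk keyed by (subject basis, sentence basis, object basis).
# Only the entries shown on the slides are included (assumption: all others are 0).
chases = {
    ("fluffy", "T", "fluffy"): 0.8,
    ("fluffy", "F", "fluffy"): 0.2,
    ("fluffy", "T", "fast"): 0.75,
    ("fluffy", "F", "fast"): 0.25,
    ("fluffy", "T", "juice"): 0.2,
    ("fluffy", "F", "juice"): 0.8,
    ("tasty", "T", "juice"): 0.1,
}


def sentence_vector(subject, verb, obj):
    """Contract the verb tensor with the subject and object vectors.

    Computes sum_{ik} C_ijk * subject_i * object_k for each sentence
    basis vector j, returning a dict over the sentence space.
    """
    result = {}
    for (i, j, k), coefficient in verb.items():
        term = coefficient * subject.get(i, 0.0) * obj.get(k, 0.0)
        result[j] = result.get(j, 0.0) + term
    return result


meaning = sentence_vector(dog, chases, cat)
print(meaning)
```

With only the listed entries, the True component comes out at about 0.936 and the False component at about 0.264, so the sentence lies much closer to the True axis of the plausibility space.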