
Diagrammatic Reasoning about Meaning of Sentences
S. Clark (Cambridge), B. Coecke, M. Sadrzadeh (Oxford)¹

¹ Authors are in alphabetical order. Oxford = Oxford University Computing Lab, Cambridge = Cambridge University Computer Lab. Emails: firstname.lastname@{comlab.ox, cl.cam}.ac.uk
The symbolic [5] and distributional [8] theories of meaning are somewhat orthogonal, with competing pros and cons: the former is compositional but only qualitative, the latter is quantitative but non-compositional. Following [9], which addresses a similar tension between the connectionist and symbolic models of mind in Cognitive Science, [3] argued for the use of the tensor product of vector spaces, pairing the vectors of meaning with their syntactic roles.
This abstract summarizes the framework developed in [2], which builds on this idea and likewise uses tensor spaces and pairs vectors with their grammatical types, but in a way that overcomes the main theoretical shortcoming of [3]: there, the meanings of sentences with different grammatical structure live in different spaces, so their meanings cannot be compared.
Moreover, our framework admits a purely diagrammatic way of representing the meaning of sentences. The diagrams greatly simplify the linear-algebraic computation of sentence meanings and their comparison. Type-checking forms an essential fragment of our framework: the reduction scheme that verifies the grammatical correctness of a sentence not only provides a statement on its well-typedness, but also assigns to the sentence a vector in a vector space. Hence we obtain a theory with both grammatical analysis and vector space models as constituents, but which is inherently compositional and assigns a meaning to a sentence given the meanings of its words.
For the purpose of this abstract, we use a Pregroup type-logic for analysing the grammatical structure of sentences. Pregroups are a recent development of Lambek [6], building on his original Lambek calculus, where types are used to analyze the syntax of natural languages in a simple equational algebraic setting. Pregroups have been used to analyze the syntax of a range of languages, from English and French to Polish and Persian, and many more; for references see [7]. But what is particularly
interesting about Pregroups, and motivates their use here, is that they share a common structure with
vector spaces and tensor products, when passing to a category-theoretic perspective. Both the category
of vector spaces, linear maps and the tensor product, as well as Pregroups, are examples of so-called
compact closed categories. Concretely, Pregroups are posetal instances of the categorical logic of vector
spaces, where juxtaposition of types corresponds to the monoidal tensor of the monoidal category. The
mathematical structure within which we compute the meaning of sentences will be a compact closed
category which combines the two above in a product category. The meanings of words are vectors in vector spaces, their grammatical roles are types in a Pregroup, and the tensor product of vector spaces, paired with the Pregroup composition, is used to compose (meaning, type) pairs.
Compact closed categories admit a beautiful, purely diagrammatic calculus that greatly simplifies the meaning computations. They also provide reduction diagrams for typing sentences. This diagrammatic structure, for the case of vector spaces, was recently exploited to expose the flow of information within quantum information protocols [1, 4]. Here, it will expose the flow of information between the words that make up a sentence, in order to produce the meaning of the whole sentence. On
the application side, we demonstrate how to build the sentence space for truth-theoretic meaning and
leave practical considerations on how to extend it to non-truth-theoretic meaning to a sequel paper.
A gist of the formalism and two examples. Given a sentence $w_1 \cdots w_n$, we define the vector $\overrightarrow{w_1 \cdots w_n}$ of its meaning to be $f(\vec{w}_1 \otimes \cdots \otimes \vec{w}_n)$. We take $W_i$ to be the vector space of the meaning of word $w_i$, given by the distributional model of meaning, and $p_i$ to be its grammatical type, given by the type-logic.
The linear map $f$ is built by substituting each $p_i$ in the reduction $p_1 \cdots p_n \le s$ of the sentence with the corresponding $W_i$ in which the meaning of the word lives. Mathematically, the meaning of the sentence consists of the following pair of maps:
$$(W_1 \otimes \cdots \otimes W_n,\; p_1 \cdots p_n) \xrightarrow{(f,\,\le)} (S, s)$$
As an example, consider a positive sentence with a transitive verb. It has the Pregroup type $n (n^r s n^l) n$. This means that the subject has type $n$, the verb $n^r s n^l$, and the object $n$. In the type of the verb, $s$ is the type of a declarative sentence and the juxtaposition $(n^r s n^l)$ models a function type: one that inputs a word of type $n$ on its left, another word of type $n$ on its right, and outputs a sentence of type $s$. In the Pregroup notation, $n^l$ and $n^r$ are the left and right adjoints of $n$. This means that they cancel $n$ out in the following order:
$$n n^r \le 1 \qquad \text{and} \qquad n^l n \le 1$$
Here $1$ denotes the unit of juxtaposition, i.e. $1s = s1 = s$. Juxtaposition is modeled by the monoid multiplication of the Pregroup. The grammatical structure of a transitive sentence is reflected in the computation $n (n^r s n^l) n \le 1 s 1 = s$, depicted in the following diagram:
[Diagram: the reduction of $n\; (n^r s n^l)\; n$, with cups cancelling $n\,n^r$ and $n^l\,n$ and only the $s$ wire left open.]
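This cancellation scheme is straightforward to simulate. The following is a minimal sketch (our own illustration, not part of the framework), encoding a type as a list of (basic type, adjoint order) pairs; the greedy left-to-right strategy suffices for the simple reductions considered here.

```python
# Minimal sketch of pregroup type reduction (illustration only).
# A type is a list of (base, order) pairs: order -1 is a left adjoint
# (n^l), +1 a right adjoint (n^r), 0 the plain type n.

def reduce_types(types):
    """Cancel adjacent pairs via n n^r <= 1 and n^l n <= 1."""
    out = []
    for base, order in types:
        if out and out[-1][0] == base and out[-1][1] + 1 == order:
            out.pop()        # the adjacent pair cancels to the unit 1
        else:
            out.append((base, order))
    return out

# Transitive sentence: n (n^r s n^l) n
print(reduce_types([("n", 0), ("n", 1), ("s", 0), ("n", -1), ("n", 0)]))
# [('s', 0)] -- the juxtaposition reduces to the sentence type s
```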
The negation of a transitive sentence has type $n\, (n^r s j^l \sigma)\, (\sigma^r j j^l \sigma)\, (\sigma^r j n^l)\, n$, where $j$ stands for the infinitive of a verb and $\sigma$ is a dummy gluing type. A Pregroup calculation similar to the above shows that this juxtaposition also reduces to $s$; this is depicted as follows:
[Diagram: the analogous reduction of $n\; n^r s j^l \sigma\; \sigma^r j j^l \sigma\; \sigma^r j n^l\; n$ to the single open $s$ wire.]
On the semantic side, we assume that the meaning spaces of the subject, object, and verb of the sentence are given as $(V, n)$, $(W, n)$, and $(V \otimes S \otimes W,\; n^r s n^l)$. The meaning of the sentence is the following map:
$$\big(V \otimes (V \otimes S \otimes W) \otimes W,\; n (n^r s n^l) n\big) \xrightarrow{(f,\,\le)} (S, s),$$
where $f$ arises from the grammatical reduction map; in this case we obtain
$$f = \epsilon_V \otimes 1_S \otimes \epsilon_W : V \otimes (V \otimes S \otimes W) \otimes W \to S,$$
where the $\epsilon$-maps are "categorifications" of the type-logic reductions and, in a concrete vector space setting, correspond to taking inner products. When applied to the vectors of the meanings of the words, $\vec{v}$ for the subject, $\vec{w}$ for the object, and $\Psi$ for the verb, the meaning becomes $f(\vec{v} \otimes \Psi \otimes \vec{w}) \in S$, for $\vec{v} \otimes \Psi \otimes \vec{w} \in V \otimes (V \otimes S \otimes W) \otimes W$. This map corresponds to the following diagram:
[Diagram: the subject wire $\vec v$ and object wire $\vec w$ connected by cups to the verb triangle $\Psi$, leaving the $s$ wire as output.]
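For finite dimensions, $f$ can be realized as an explicit matrix. Below is a minimal sketch (our illustration, with arbitrarily chosen small dimensions), where the cup $\epsilon$ is the row vector obtained by flattening the identity matrix:

```python
# Sketch (illustration only): f = eps_V (x) 1_S (x) eps_W as a matrix
# acting on the flattened tensor space V (x) (V (x) S (x) W) (x) W.
import numpy as np

def eps(dim):
    """The cup eps: V (x) V -> R, sending v (x) v' to the inner product <v|v'>."""
    return np.eye(dim).reshape(1, dim * dim)

dV, dS, dW = 2, 2, 2
f = np.kron(np.kron(eps(dV), np.eye(dS)), eps(dW))  # shape (dS, dV*dV*dS*dW*dW)

# Check against the direct contraction of subject, verb, and object:
v, w = np.random.rand(dV), np.random.rand(dW)
Psi = np.random.rand(dV, dS, dW)
lhs = f @ np.kron(v, np.kron(Psi.reshape(-1), w))
rhs = np.einsum("i,ijk,k->j", v, Psi, w)
print(np.allclose(lhs, rhs))  # True
```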
The typical vector in the tensor space which represents the type of the verb is $\Psi = \sum_{ijk} c_{ijk}\, \vec{v}_i \otimes \vec{s}_j \otimes \vec{w}_k \in V \otimes S \otimes W$, hence the above diagram is represented by the map
$$f(\vec{v} \otimes \Psi \otimes \vec{w}) \;=\; \sum_{j} \Big( \sum_{ik} c_{ijk}\, \langle \vec{v} \mid \vec{v}_i \rangle \langle \vec{w} \mid \vec{w}_k \rangle \Big) \vec{s}_j$$
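In an orthonormal basis, where $\langle \vec{v} \mid \vec{v}_i \rangle$ is simply the $i$-th coordinate of $\vec{v}$, this map is a tensor contraction; a minimal numpy sketch (our illustration):

```python
# Sketch (illustration only) of
#   f(v (x) Psi (x) w) = sum_j ( sum_ik c_ijk <v|v_i> <w|w_k> ) s_j
# where c is the coefficient array of Psi, of shape (dim V, dim S, dim W).
import numpy as np

def sentence_vector(v, c, w):
    """Contract the subject and object vectors against the verb tensor."""
    return np.einsum("i,ijk,k->j", v, c, w)
```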
By the diagrammatic calculus we have the following equality:
[Diagram: by yanking the wires, the reduction applied to $\vec{v} \otimes \Psi \otimes \vec{w}$ is equal to the diagram in which $\vec{v}$ and $\vec{w}$ are plugged directly into the two input wires of $\Psi$.]
We assign truth-theoretic meaning to a transitive sentence by assuming that the sentence space $S$ is spanned by two vectors $|0\rangle$ and $|1\rangle$, which stand for false and true. The meaning map returns false iff the sentence component $c_{ijk}\,\vec{s}_j$ of the meaning of the verb takes the value $|0\rangle$, and true iff it takes the value $|1\rangle$. Work in progress on the practical aspects of the theory shows that in order to obtain a non-truth-theoretic meaning, the vector spaces of the subject and object, and more importantly that of the sentence, may need to be more structured.
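The following is a minimal sketch (our illustration, with a hypothetical two-individual universe) of such a truth-theoretic verb tensor:

```python
# Sketch (illustration only) of a truth-theoretic transitive verb:
# S is spanned by |0> (false) and |1> (true); the verb tensor puts |1>
# at position (i, :, k) iff individual i stands in the relation to k.
import numpy as np

FALSE, TRUE = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def verb_tensor(relation, dim_v, dim_w):
    """Build Psi in V (x) S (x) W from a set of (subject, object) pairs."""
    Psi = np.zeros((dim_v, 2, dim_w))
    for i in range(dim_v):
        for k in range(dim_w):
            Psi[i, :, k] = TRUE if (i, k) in relation else FALSE
    return Psi

# Toy model: basis vector 0 of V is John, basis vector 0 of W is Mary,
# and John loves Mary:
loves = verb_tensor({(0, 0)}, dim_v=2, dim_w=2)
john = np.array([1.0, 0.0])
mary = np.array([1.0, 0.0])
print(np.einsum("i,ijk,k->j", john, loves, mary))  # [0. 1.], i.e. |1> = true
```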
The negative transitive sentence has a much more involved typing and linear map, and we refrain from giving it here. Its diagram is as follows:
[Diagram: the wires of $\vec{v}$, $\overrightarrow{does}$, $\overrightarrow{not}$, $\Psi$, and $\vec{w}$, composed by the reduction cups.]
Here $\overrightarrow{does}$ and $\overrightarrow{not}$ are the vectors corresponding to the meanings of “does” and “not”. Since these are logical function words, we assign meaning to them without consulting a corpus of documents. To obtain a truth-theoretic meaning, we implement $\overrightarrow{not}$ as the linear map of the matrix representing the logical not and $\overrightarrow{does}$ as the linear map of the identity matrix. Diagrammatically we have:
[Diagram: $\overrightarrow{does}$ as the diagram of the identity map, and $\overrightarrow{not}$ as that of the not map.]
Substituting all of this into $f(\vec{v} \otimes \overrightarrow{does} \otimes \overrightarrow{not} \otimes \Psi \otimes \vec{w})$ we obtain:
[Diagram: the composite of $\vec{v}$, $\overrightarrow{does}$, $\overrightarrow{not}$, $\Psi$, and $\vec{w}$, which after yanking the wires is equal to the positive-sentence diagram with a not box applied to its output $s$ wire.]
This is true iff the meaning of the positive version of the sentence is false, and false otherwise.
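A minimal sketch (our illustration, reusing the toy model of the previous sketch in which John and Mary are basis vectors and John loves Mary):

```python
# Sketch (illustration only): truth-theoretic "does" and "not".
# "does" is the identity on the sentence space; "not" swaps |0> and |1>.
import numpy as np

DOES = np.eye(2)
NOT = np.array([[0.0, 1.0],
                [1.0, 0.0]])

def negated_meaning(v, Psi, w):
    """Meaning of 'v does not verb w': negate the positive sentence vector."""
    positive = np.einsum("i,ijk,k->j", v, Psi, w)
    return NOT @ (DOES @ positive)

# John (basis 0 of V) loves Mary (basis 0 of W):
john = mary = np.array([1.0, 0.0])
loves = np.zeros((2, 2, 2))
loves[:, 0, :] = 1.0          # every pair defaults to |0> = false ...
loves[0, :, 0] = [0.0, 1.0]   # ... except (John, Mary), which is |1> = true
print(negated_meaning(john, loves, mary))  # [1. 0.], i.e. |0> = false
```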
Comparing meaning of sentences. One of the advantages of our approach to compositional meaning
is that the meanings of sentences are all vectors in the same space, so we can use the inner product to
compare the meaning vectors. This measure has been referred to and widely used as a degree of similarity
between meanings of words in the distributional approaches to meaning [8]. It is obtained by taking the
cosine of the angle between the vectors of the meanings of the words. In our framework, this measure
is immediately extendable to meanings of sentences: we say $w_1 \cdots w_k$ and $w'_1 \cdots w'_l$ have degree of similarity $m$ iff
$$\cos\Big( f(\vec{w}_1 \otimes \cdots \otimes \vec{w}_k),\; f(\vec{w}\,'_1 \otimes \cdots \otimes \vec{w}\,'_l) \Big) = m.$$
As an example, consider the sentence “John likes Mary.” and its negation “John does not like Mary.”. To make the comparisons more interesting, we define likes to have degrees of love and hate as follows: $\overrightarrow{likes} = \frac{3}{4}\overrightarrow{loves} + \frac{1}{4}\overrightarrow{hates}$. The truth-theoretic meanings for love and hate are obtained as explained above for the transitive verb. These are then used to calculate
$$\cos\big(\overrightarrow{John\ loves\ Mary},\ \overrightarrow{John\ likes\ Mary}\big) = \frac{3}{4}$$
A similar calculation provides us with the following degrees of similarity:
$$\cos\big(\overrightarrow{John\ hates\ Mary},\ \overrightarrow{John\ likes\ Mary}\big) = \frac{1}{4} \qquad \cos\big(\overrightarrow{John\ loves\ Mary},\ \overrightarrow{John\ hates\ Mary}\big) = 0$$
$$\cos\big(\overrightarrow{John\ does\ not\ love\ Mary},\ \overrightarrow{John\ does\ not\ like\ Mary}\big) = \frac{3}{4}$$
$$\cos\big(\overrightarrow{John\ does\ not\ like\ Mary},\ \overrightarrow{John\ likes\ Mary}\big) = \frac{3}{8}$$
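These numbers are easy to reproduce; the following minimal sketch (our illustration) computes the inner products of the truth-theoretic sentence vectors, recovering the degrees of similarity above:

```python
# Sketch (illustration only): the degrees of similarity above, obtained as
# inner products of the truth-theoretic sentence vectors.
import numpy as np

false, true = np.array([1.0, 0.0]), np.array([0.0, 1.0])
NOT = np.array([[0.0, 1.0], [1.0, 0.0]])   # swaps false and true

john_loves_mary = true                     # John loves Mary holds ...
john_hates_mary = false                    # ... and he does not hate her
john_likes_mary = 0.75 * john_loves_mary + 0.25 * john_hates_mary

print(john_loves_mary @ john_likes_mary)                   # 0.75  = 3/4
print(john_hates_mary @ john_likes_mary)                   # 0.25  = 1/4
print(john_loves_mary @ john_hates_mary)                   # 0.0
print((NOT @ john_loves_mary) @ (NOT @ john_likes_mary))   # 0.75  = 3/4
print((NOT @ john_likes_mary) @ john_likes_mary)           # 0.375 = 3/8
```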
Regarding the last example, at first it might feel bizarre that a sentence and its negation have a non-zero degree of similarity. This is merely because of our (for the sake of the argument) naive conception of like as comprising degrees of both love and hate. If we calculate its negation by means of the swap matrix $not$, we obtain $not(\overrightarrow{like}) = \frac{1}{4}\overrightarrow{loves} + \frac{3}{4}\overrightarrow{hates}$, since it exchanges the roles of the two opposites love and hate. Hence, $not(\overrightarrow{like})$ is not orthogonal to $\overrightarrow{like}$. In a setting where like is interpreted as purely true or purely false over a 2-dimensional vector space with true and false as basis, it will always be the case that $not(\overrightarrow{like})$ and $\overrightarrow{like}$ are orthogonal, and hence their inner product is 0. But alternatively, one could also consider three degrees of truth for like, again resulting in a non-Boolean negation.
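Concretely, with $\overrightarrow{loves}$ and $\overrightarrow{hates}$ orthonormal, the overlap works out as:
$$\big\langle not(\overrightarrow{like}) \,\big|\, \overrightarrow{like} \big\rangle \;=\; \tfrac{1}{4} \cdot \tfrac{3}{4} + \tfrac{3}{4} \cdot \tfrac{1}{4} \;=\; \tfrac{3}{8}.$$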
Further Work. This abstract only presents the underlying mathematics of the framework and its pictorial computations of meaning. To implement these computations and put the formalism to the test, one needs to build concrete vector spaces for the meaning spaces of the words, e.g. for $V$, $W$, and more crucially for $S$. In a second abstract, to be presented in the same workshop, we have taken first steps towards implementation and demonstrated concrete constructions. We have used the setting to obtain both truth-theoretic and corpus-based meanings for example sentences and their degrees of similarity. We invite the reader to consult the second abstract for thoughts on evaluation and applications of this approach, and also for connections to other related vector composition methods. Its title is Concrete Sentence Spaces and it is written by E. Grefenstette and S. Pulman together with the current authors.
At the more theoretical end, the highly abstract nature of our framework enables one to also reason
about alternative models of meaning, other than vector spaces.
References
[1] S. Abramsky and B. Coecke. A categorical semantics for quantum protocols. In Proceedings of the 19th
Annual IEEE Symposium on Logic in Computer Science, pages 415–425. IEEE Computer Science Press,
2004. arXiv:quant-ph/0402130.
[2] B. Coecke, M. Sadrzadeh, and S. Clark. Mathematical Foundations for a Compositional Distributional Model of Meaning. Linguistic Analysis (Lambek Festschrift), 36, 2010. http://arxiv.org/abs/1003.4394.
[3] S. Clark and S. Pulman. Combining symbolic and distributional models of meaning. In Proceedings of AAAI
Spring Symposium on Quantum Interaction. AAAI Press, 2007.
[4] B. Coecke. Quantum picturalism. Contemporary physics, 51:59–83, 2010. arXiv:0908.1787.
[5] D.R. Dowty, R.E. Wall, and S. Peters. Introduction to Montague Semantics. Dordrecht, 1981.
[6] J. Lambek. Type grammar revisited. Logical Aspects of Computational Linguistics, 1582, 1999.
[7] J. Lambek and C. Casadio, editors. Computational algebraic approaches to natural language. Polimetrica,
Milan, 2006.
[8] H. Schütze. Automatic word sense discrimination. Computational Linguistics, 24(1):97–123, 1998.
[9] P. Smolensky and G. Legendre. The Harmonic Mind: From Neural Computation to Optimality-Theoretic
Grammar Vol. I: Cognitive Architecture Vol. II: Linguistic and Philosophical Implications. MIT Press, 2005.