Discourse Grammar and Verb Phrase Anaphora∗
Hub Prüst, Remko Scha and Martin van den Berg
August 1994
∗ This paper has appeared in Linguistics & Philosophy (17), 1994. Copyrights: Linguistics &
Philosophy. We thank Martin Stokhof, Nissim Francez and Jeff Pelletier for valuable comments
on earlier versions of this paper. We are especially grateful to Richard Oehrle for his detailed
comments which led to a number of significant improvements.
Contents

1 Verb Phrase Anaphora
1.1 A Discourse Perspective on VPA
1.2 Previous Analyses
1.3 VPA and Parallelism
1.3.1 Structural and Indexical Parallelism
1.3.2 Parallel Elements outside the VP
1.4 The Scope of VPA

2 Calculating the Common Denominator
2.1 Syntactic/Semantic Structures
2.1.1 The Logic
2.1.2 Structures
2.2 Most Specific Common Denominator
2.3 MSCD over Syntactic/Semantic Structures

3 Discourse Parsing
3.1 Introduction
3.2 Grammar Rules

4 A Discourse Analysis of VPA
4.1 Simple Antecedents
4.2 Complex Antecedents
4.3 Complex Parallelism

5 Conclusion
Abstract We argue that an adequate treatment of verb phrase anaphora
(VPA) must depart in two major respects from the standard approaches. First
of all, VP anaphors cannot be resolved by simply identifying the anaphoric
VP with an antecedent VP. The resolution process must establish a syntactic/semantic parallelism between larger units (clauses or discourse constituent
units) that the VPs occur in. Secondly, discourse structure has a significant influence on the reference possibilities of VPA. This influence must be accounted
for.
We propose a treatment which meets these requirements. It builds on a
discourse grammar which characterizes discourse cohesion by means of a syntactic/semantic matching procedure which recognizes parallel structures in discourse. It turns out that this independently motivated procedure yields the
resolution of VPA as a side effect.
Overview
This paper is composed as follows. The opening section discusses the phenomenon of verb phrase anaphora (VPA), viewed from a discourse perspective.
This perspective highlights some hitherto under-exposed aspects of the VPA
problem: parallelism effects and the scope of VPA in relation to the structure of
a discourse. A formal treatment of VPA which takes these phenomena into account needs a mathematical notion of structure sharing. Such a notion is worked
out in detail in section 2. Section 3 shifts the attention to the idea of discourse
parsing on the basis of a unification grammar, and it presents some relevant discourse grammar rules. In section 4, VPA resolution receives a treatment based
on the formal perspective on discourse described in the first part of the paper.
The last section presents conclusions and some ideas about future research.
1 Verb Phrase Anaphora

1.1 A Discourse Perspective on VPA
Each utterance in a discourse may be assumed to contribute something to the
meaning of that discourse as a whole. However, the meaning of a discourse cannot be regarded as the simple conjunction of the meanings of the utterances that
constitute it. Utterances, and discourse segments in general, are usually involved
in complex semantic dependencies. We believe that a proper understanding of
these dependencies in discourse is necessary for a more successful explanation of
linguistic phenomena, such as VPA, which have hitherto been studied mainly
at the sentence level.
Parallelism effects have considerable impact on the semantic contribution of
an utterance to a discourse. Parallelism effects have been studied, for instance, in
relation to coordination (Lang (1977)), pronoun resolution (Grober, Beardsley
and Caramazza (1978), Kameyama (1986)), and the structuring of discourse
(Polanyi (1985), Scha and Polanyi (1988)).
With regard to parallelism effects in coordination, Lang (1977) (pages 47-61) distinguishes two main types. Firstly, in the case of ambiguous conjuncts, the
resulting ambiguity in the coordination of these conjuncts is reduced because
coordination enforces the same reading for all conjuncts (‘Selektionseffekt’). Secondly, the reading of an unambiguous conjunct is forced onto other ambiguous or
vague conjuncts in the same coordination (‘Übertragungseffekt’). To illustrate
the first aspect, consider example (1).1
(1)
(a) John likes visiting relatives
(b) (and) Peter likes visiting friends.
Both sentences can independently either have an ‘active’ reading (for (a): John
likes to visit relatives) or a ‘passive’ reading (John likes relatives which visit).
However, the meaning of the sequence of (1)(a) and (1)(b) is not simply obtained
by all combinations of possible readings for these sentences (which would yield
four possible interpretations). They can only have parallel readings: either both
are interpreted ‘active’, or both are interpreted ‘passive’. Parallelism doesn’t
allow the ‘mixed’ interpretation.
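Both coordination effects amount to a simple filter on the cartesian product of the conjuncts' reading sets. The following Python sketch is purely illustrative; the reading labels 'active'/'passive' are our own ad hoc names, not part of Lang's formalism.

```python
from itertools import product

# A toy filter for Lang's two coordination effects. The reading labels
# are illustrative strings, one set of candidate readings per conjunct.
def parallel_readings(readings_per_conjunct):
    """Keep only those combinations of readings, one per conjunct, in
    which every conjunct receives the same reading."""
    return [combo for combo in product(*readings_per_conjunct)
            if len(set(combo)) == 1]

# Selektionseffekt: two ambiguous conjuncts yield two readings, not four.
conjuncts = [
    {"active", "passive"},   # (1)(a) John likes visiting relatives
    {"active", "passive"},   # (1)(b) Peter likes visiting friends
]
print(sorted(parallel_readings(conjuncts)))
# [('active', 'active'), ('passive', 'passive')]
```

With an unambiguous second conjunct, e.g. {"active"} for (2)(b), only ('active', 'active') survives: this is the Übertragungseffekt.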
The second aspect of parallelism is illustrated by (2), where sentence (b)
forces its ‘active’ reading onto (a).
(2)
(a) John likes visiting relatives
(b) (and) Peter likes visiting market places.
(Note that this parallelism effect does not depend on the order of the clauses:
if we turn them around, the effect remains the same.) Another example of this
effect is provided by (3), where the expression in (3)(b) will, in coordination with
(3)(a), primarily be interpreted non-idiomatically. (As a result, the idiomatic
interpretation, if that is the intended one, becomes a ‘joke’.)
(3)
(a) John kicked the dog
(b) (and) Peter kicked the bucket.
Another observation along the same lines in Lang (1977) is that in German
the sentences (4) may either all express a habit (John is a smoker, Peter is a
drinker and Roberto a nail-biter) or all express an action (John is smoking, Peter
is drinking and Roberto is biting his nails).
(4)
(a) John raucht, ‘John smokes,’
(b) Peter säuft, ‘Peter drinks,’
(c) Roberto kaut seine Nägel ‘Roberto bites his nails’

1 The parentheses around ‘and’ indicate that this explicit coordination marker is optional.
Sentences (a), (b) and (c) all become habitual when succeeded by (4)(d):
(d) ... und Harry ist drogensüchtig.
(d) ... ‘and Harry is a drug addict.’
but they all express an action when succeeded by (d)′:
(d)′ ... und Harry schreibt einen Brief.
(d)′ ... ‘and Harry writes a letter.’
These parallelism effects provide one of many reasons why the semantics of
a discourse can not be regarded as a simple conjunction of the semantics of
its utterances. They may reduce the number of readings, as in (1), so that
readings of ambiguous sentences can not be chosen independently. They may
also enforce part of the semantics of an utterance upon other utterances: part
of the semantic contribution of (d) to discourse (4) is that it enforces a habitual
reading upon (a) through (c).
Another well-known aspect of parallelism is that it plays a role in pronoun
resolution. In general, interpretations involving parallel structure are preferred
over interpretations which don’t involve parallel structure. Grosz, Joshi and
Weinstein (1983) and Kameyama (1986), for instance, show that in English and
Japanese, there is a tendency for a subject pronoun to refer to the subject of
the previous utterance, or in general that in a chain of pronominalized elements,
these elements preferably have the same grammatical function.
Parallelism may also be employed as a basis for organizing the segmentation of discourse (cf. Polanyi (1985), Scha and Polanyi (1988)). In general, the
semantic contribution of an utterance to a discourse depends on what it has in
common with the other utterances in the discourse segment that it belongs to.
As an example we may compare the contribution of the sentence Susan likes
flamenco-dancing to the contexts (5) and (6). In (5), part of its contribution
to this small discourse is the fact that it directs the topic of the discourse to
something like ‘girls liking some sort of dancing’.
(5)
(a) Maaike likes belly-dancing.
(b) Susan likes flamenco-dancing.
In (6), the contribution of this sentence to the discourse, however, is totally different. In this context Susan likes flamenco-dancing states yet another property
of Susan (though of a different nature than the first three).2
2 The sentence Susan likes flamenco-dancing is likely to have different intonational properties in these two discourse contexts to indicate that the contribution to the discourse in each
case is not the same, but we abstract away from that here.
(6)
(a) Susan is a blond.
(b) She weighs about 150.
(c) She’s got a nice disposition.
(d) Susan likes flamenco-dancing.
Polanyi (1985) would classify (6) as a ‘Topic Chain’ structure in which the local
discourse topic of (a) through (c) is something like ‘physical attributes of Susan’.
Whereas these first three sentences may be grouped together by this discourse
topic, sentence (6)(d) gives rise to a binary structure that consists of (6)(a)-(c)
and (6)(d) under the topic ‘generally known properties of Susan’.
Thus the semantic contribution of an utterance to the discourse depends on
what this utterance has in common with the other utterances in the discourse
segment that it belongs to, or, in other words, in what respect it parallels the
other utterances. Part of the contribution of the sentence Susan likes flamenco-dancing to (5) is the fact that it directs the local topic of the discourse to
something like ‘girls liking some sort of dancing’: it is in this respect that sentence (b) is parallel to (a). In (6), the contribution of the same sentence to
the discourse is different: it parallels all previous sentences in that it states yet
another property of Susan, but of a different kind than the first three.
The above considerations show that if we want to account for the mechanisms
that function in discourse, it is necessary to develop a device that can detect
what two adjacent segments in discourse have in common and what not (i.e., in
what respect they are parallel). In what follows, we shall specify such a device as
a matching process which yields a structured object (the Most Specific Common
Denominator) that indicates what part of its syntactic/semantic structure an
utterance has in common with the discourse segment it is attached to.
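The matching process described here is closely related to anti-unification (Plotkin's least general generalization): it computes the most specific term of which both inputs are instances, generalizing mismatching subterms to shared variables. A minimal sketch, with syntactic/semantic structures crudely encoded as nested tuples; the term encoding and variable names are our own illustrative choices, not the representation of section 2.

```python
def mscd(t1, t2, _vars=None):
    """Anti-unification sketch: return the most specific term that both
    t1 and t2 are instances of; mismatching subterms become variables,
    with the same variable reused for a repeated pair of mismatches."""
    if _vars is None:
        _vars = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        return tuple([t1[0]] +
                     [mscd(a, b, _vars) for a, b in zip(t1[1:], t2[1:])])
    key = (t1, t2)
    if key not in _vars:
        _vars[key] = f"?x{len(_vars)}"
    return _vars[key]

# (5)(a)/(b): what the two clauses share is 'x likes some sort of dancing'
a = ("likes", "maaike", ("dance", "belly"))
b = ("likes", "susan", ("dance", "flamenco"))
print(mscd(a, b))
# ('likes', '?x0', ('dance', '?x1'))
```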
The dependence on the context described above holds even more rigorously
for utterances which exhibit verb phrase anaphora. VP anaphors induce parallelism in a compelling way. The simplest instance of this phenomenon is illustrated by (7) and (8).
(7)
(a) Mary likes him
(b) (and) Susan likes him.
(8)
(a) Mary likes him
(b) (and) Susan does too.
In (7)(a) the pronoun him refers to some discourse referent not otherwise identified in this fragment. The preferred interpretation for (7)(b) is an interpretation
that parallels (a), i.e., one where both pronouns refer to the same entity. However, there is a possibility to escape this parallelism: when deictically interpreted,
the pronouns can in principle have different referents (for instance, when accompanied by different pointing gestures, in which case the pronouns are usually
stressed). In (8), however, it is impossible to escape parallelism because sentence
(b) contains a verb phrase anaphor (embodied by the auxiliary does). Whatever
interpretation the pronoun him receives in (8)(a), the interpretation of the VP
anaphor in (b) will have to incorporate the same pronoun interpretation. In
section 1.3.1, we discuss this aspect of parallelism more thoroughly, under
the heading of ‘indexical parallelism’.
Another important observation with regard to VPA is the fact that antecedents of VPA are not restricted to simplex sentences. A discourse is not just
a simple sequence of clauses, but consists of segments that bear specific structural relations to each other. These segments we shall call ‘discourse constituent
units’ (DCUs). VPA antecedents may consist of DCUs that are more complex
than a simplex sentence; they may also be embedded in complex DCUs. As an
example of a VP anaphor with a complex antecedent, consider (9):
(9)
(a) Maaike likes belly-dancing.
(b) She hates waltzing.
(c) Saskia does too.
The anaphoric clause (9)(c) is ambiguous in that the VP anaphor may be interpreted in at least two different ways. It may either have (9)(b) as antecedent
clause, which corresponds to something like ‘(x) hates waltzing’ as the interpretation of the VP anaphor, or it may have both (9)(a)+(b) as its antecedent,
which results in ‘(x) likes belly-dancing and (x) hates waltzing’ as the interpretation of the VP anaphor. Similarly in (10), the VP anaphor may refer to
a combination of the predicates in both (10)(a) and (10)(b). In that case, the
rhetorical relation between the clauses (10)(a) and (10)(b), namely rhetorical
subordination indicated by because, is preserved in the interpretation of the VP
anaphor (so that this anaphor refers to a predication like ‘(x) went to the dentist
because (x) needed a check-up’).
(10) (a) Fred went to the dentist
(b) because he needed a check-up.
(c) Sara did too.
Such ‘complex’ VPA interpretations cannot be freely construed. As observed in
Nash-Webber and Sag (1978), a change of agent (subject) usually curtails the
scope of VPA. In general, the structure of the preceding discourse is of crucial
importance (cf. section 1.4).
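The Nash-Webber and Sag observation suggests a first, deliberately naive approximation: candidate antecedents for a VP anaphor are the suffixes of the preceding discourse whose clauses all share one agent. The sketch below is a toy heuristic for exposition only; the actual mechanism proposed in sections 3 and 4 is the discourse grammar, not this filter.

```python
# A deliberately naive heuristic: the clauses passed in are those
# preceding the VP anaphor, and candidate antecedents are the suffixes
# whose clauses all share one agent, since a change of agent usually
# curtails the scope of VPA.
def antecedent_candidates(preceding_clauses):
    candidates = []
    agent = preceding_clauses[-1]["agent"]
    for start in range(len(preceding_clauses) - 1, -1, -1):
        if preceding_clauses[start]["agent"] != agent:
            break
        candidates.append(preceding_clauses[start:])
    return candidates

# (9)(a)-(b) share the agent Maaike, so (9)(c) has both a simple and a
# complex candidate antecedent.
preceding = [
    {"agent": "maaike", "pred": "likes belly-dancing"},  # (9)(a)
    {"agent": "maaike", "pred": "hates waltzing"},       # (9)(b), She = Maaike
]
for cand in antecedent_candidates(preceding):
    print(" and ".join(c["pred"] for c in cand))
# hates waltzing
# likes belly-dancing and hates waltzing
```

With a change of agent in the preceding clauses, only the single-clause antecedent survives, in line with the curtailment effect.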
Even more interesting are cases like (11), in which we encounter a VP
anaphor embedded in a complex DCU.
(11) (a) Fred went to the dentist
(b) because he needed a check-up.
(c) Sara did too,
(d) but she had to have her wisdom-tooth removed.
Observe that the contrastive but does not contrast (11)(c) with (11)(d); it contrasts (11)(b) with (11)(d), within a structure which coordinates (11)(a)+(b)
with (11)(c)+(d). This structure has its impact on the interpretation of the VP
anaphor in that it cannot refer to (a)+(b) as in example (10), but only to (a)
(cf. section 1.4 below).
The observations above give rise to the idea that the problem of VPA must
be viewed from a general discourse perspective. In fact, we shall show that an
adequate account of the structure of discourse and its semantics (taking into
account parallelism effects) yields VPA resolution as a side effect. Before we go
deeper into these matters, we first briefly discuss some previous approaches to
VPA.
1.2 Previous Analyses
The problem of verb phrase anaphora, exemplified in (12), is how to obtain a
correct interpretation of an underspecified clause such as (12)(b), which contains
an anaphoric auxiliary like does.3
(12) (a) Maaike likes belly-dancing
(b) (and) Brigitte does too.
As (12)(b) gets its interpretation from (12)(a), sequence (12) is interpreted as
meaning that both Maaike and Brigitte like belly-dancing. We shall refer to a
clause like (12)(b) as the ‘anaphoric clause (or rather DCU)’ and to a clause like
(12)(a) as the ‘antecedent clause (DCU)’, because, as will become apparent
further on, parallelism constraints bear upon the whole clauses (DCUs) involved.
It has been shown convincingly that VPA is not merely a syntactic but also
a semantic problem (cf. Sag (1977) and Williams (1977), which also discuss
syntactic constraints that are to be respected). The influence of syntactic parallelism is illustrated by (13) and (14). Unlike in (13), the intended antecedent
of the VP anaphor in (14) is not a syntactic VP (because of its passive voice),
which blocks the anaphoric reference.
(13) (a) Maaike wrote a song.
(b) (and) Saskia did too.
(14) (a) A song was written by Maaike.
(b) * (and) Saskia did too.
This difference may also be explained in terms of the parallelism of topic/comment
structure: topic/comment structure of the clauses in (13) is parallel, whereas in
(14) it is not. This explanation also covers cases like (15):
(15) (a) The music, Maaike played for him
(b) * (and) Saskia did too.
3 Although the role of the word too is important, we shall not deal with it in this paper (cf.
Kaplan (1984)).
A VP anaphor may be embedded in a sentence as observed by Williams
(1977):
(16) The man who didn’t leave knows the man who did.
In this ‘antecedent-contained VPA’, the anaphoric clause is contained within the
antecedent clause. Williams also observes, among other things, that VPA may occur
across utterances and across speakers. This is exemplified by (17).
(17) A: Who sold the painting?
B: Fred did, because Harry wouldn’t.
An intriguing phenomenon which has received much attention in the literature on VPA is the ambiguity between so-called strict and sloppy identity
readings. This phenomenon is exemplified by (18).
(18) (a) John likes his hat
(b) (and) Fred does too.
In the reading where his in (18)(a) refers to John, the anaphoric clause (18)(b)
may either be interpreted as ‘Fred likes Fred’s hat’ (the sloppy identity interpretation) or as ‘Fred likes John’s hat’ (the strict interpretation).
Previous approaches to VPA, whether formulated within a transformational
theory (Sag (1977) and Williams (1977)), a Montagovian framework (Partee
and Bach (1981)) or Discourse Representation Theory (van Eijck (1985) and
Klein (1987)), usually treat VP anaphora in terms of ‘identity of predication’.
This central notion has been expressed in two variants, either, as in Sag (1977),
as a deletion rule which largely deletes one of two semantically identical VPs
(leaving behind an auxiliary, under syntactic conditions comparable to those
involved in pronominal anaphora), or as an interpretive rule which interprets
an elliptical auxiliary in terms of the semantic properties of a syntactically
accessible antecedent VP, as in Williams (1977). Such a mechanism explains
cases like (12) by interpreting the VP does as identical to the VP likes belly-dancing. The notion of identity of predication, however, is too narrow to cover
clausal parallelism constraints and to explain VPA with complex antecedents.
Important constraints on VPA can therefore not be formulated in terms of this
notion.
In Sag (1977)’s inspiring and extensive study, VPA is characterized in a
transformational framework in terms of configurations that allow deletion of
identical VPs. VP identity is formulated in terms of ‘alphabetic variance’. Any
VP whose logical form is an alphabetic variant of that of a preceding VP can be
deleted. The notion of alphabetic variance is defined as follows: two formulas
φ and ψ are alphabetic variants of each other iff φ and ψ are identical up to the
renaming of bound variables, and any variable which is free in φ and ψ is bound by
the same external operator. This notion covers important parallelism aspects of
VPA. However, in section 1.3 below we shall see that parallelism effects are not
limited to the VPs involved.
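For concreteness, the definition of alphabetic variance can be operationalized as a recursive check; the term encoding (('lam', var, body) for binders, tuples for predication/application, strings for variables and constants) is our own illustrative choice, not Sag's notation.

```python
def alphabetic_variants(t1, t2, env1=(), env2=()):
    """Sketch of Sag-style alphabetic variance. Bound variables are
    compared by binder depth (de Bruijn style); a free variable must be
    the very same variable in both terms, approximating 'bound by the
    same external operator'."""
    if isinstance(t1, tuple) and t1[:1] == ('lam',):
        return (isinstance(t2, tuple) and t2[:1] == ('lam',)
                and alphabetic_variants(t1[2], t2[2],
                                        (t1[1],) + env1, (t2[1],) + env2))
    if isinstance(t1, str) and isinstance(t2, str):
        bound1, bound2 = t1 in env1, t2 in env2
        if bound1 or bound2:
            return bound1 and bound2 and env1.index(t1) == env2.index(t2)
        return t1 == t2    # free variables and constants must match exactly
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        return all(alphabetic_variants(a, b, env1, env2)
                   for a, b in zip(t1, t2))
    return False

# λx.sing(x) and λy.sing(y) are variants; distinct free variables block it.
print(alphabetic_variants(('lam', 'x', ('sing', 'x')),
                          ('lam', 'y', ('sing', 'y'))))       # True
print(alphabetic_variants(('lam', 'x', ('like', 'x', 'z')),
                          ('lam', 'y', ('like', 'y', 'w'))))  # False
```

The second case is exactly the configuration that blocks deletion in the wide scope universal reading discussed in footnote 5: free variables bound by different external operators.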
Williams (1977) articulates a different approach to VP anaphora in the transformational framework. He proposes a grammar that is composed of two distinct
subgrammars: a Sentence Grammar and a Discourse Grammar. Discourse rules
interpret the logical forms provided by the Sentence grammar. One of the rules
of the Discourse Grammar interprets a phonologically null VP as equivalent
with a previously interpreted antecedent VP. Beyond this, however, the notion
of Discourse Grammar is not worked out at all. Williams states that ”What
is urgently needed is a careful articulation of the discourse component of the
grammar, on a par with what has been done on Sentence Grammar.” (Williams
(1977), page 138). The proposal in this paper may be regarded as this discourse
component of the grammar.
Partee and Bach (1981) present a fragment of English, accounted for in a
version of Montague Grammar, and a proposal for an interpretive treatment
of VPA in this framework. They encounter a problem in providing a model-theoretic version of Sag (1977)’s identity of predication which they consider
fundamental, namely the problem of giving an appropriate interpretation to
variables that are free within the antecedent VP. This problem leads to the
conclusion ”that there is no semantic value that can be assigned to VPs such
that VP deletion can be characterized in terms of identity”. However, Partee
and Bach acknowledge that this problem is resolvable in a system that allows
principles to be stated in terms of the global properties of logical form, but
they consider this less desirable than a system in which any level of logical
form is dispensable (as in Montague Grammar). Whether the latter is possible,
however, remains an open question due to the problem indicated above which
needs ”a deeper understanding of the semantics of free and bound variables”
(Partee and Bach (1981), page 469). In what follows, we hope to show that
answering this question is not as crucial as it may seem. From a discourse point
of view, restrictions in terms of ‘the global properties of logical form’, or rather
restrictions on the structure and semantics of discourse, with regard to VPA,
are correlates of general discourse processes and therefore not ad hoc.
A proposal by Sag, Gazdar, Wasow and Weisler (1985) for the treatment of
VPA in the framework of Generalized Phrase Structure Grammar comes close
to the general idea of our approach: ”The interpretation of an elliptical construction is obtained by uniformly substituting its immediate constituents into
some immediately preceding structure, and computing the interpretation of the
results.” (Sag, Gazdar, Wasow and Weisler (1985), page 162). Sag et al. state
that ” . . . the treatment of verbal ellipsis sketched above is highly tentative and
incomplete. This is unavoidable, given the rudimentary state of current understanding of the sorts of discourse factors which play a role in these phenomena.”
Our treatment, however, tries to provide some of this understanding.
Dalrymple, Shieber and Pereira (1991) propose higher-order unification as
a formal means for the interpretation of elliptical constructions in general and
VPA in particular. The ellipsis resolution problem is subdivided into two separate parts: first, the determination of the parallelism between the antecedent
clause and the anaphoric clause; second, formation of the ‘implicit relation’ (the
‘missing predication’) that is needed to resolve the anaphoric clause. Dalrymple
et al. mainly discuss the formation of the missing predication. This is obtained
by an abstract formulation of the problem in terms of equations, and higher
order unification to solve these equations. As Dalrymple et al. point out, it is an
important advantage of their approach, that VP anaphor ambiguity, for instance
in case of strict versus sloppy identity, is placed in the process of recovering a
predication for the anaphoric clause. Most other approaches postulate otherwise unwanted ambiguity in the antecedent clause. To apply their unification
machinery, Dalrymple et al. assume that a parallelism between the elliptical
clause and an antecedent has been established, but they do not discuss in much
detail how this is done. This is exactly the topic of the present paper. But we
take a somewhat broader perspective, exploiting a unification mechanism as
a general means to establish parallelism and discourse cohesion; thereby VPA
resolution results as a side effect of establishing discourse cohesion.
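The abstraction step of the equational recipe can be illustrated with a toy version for example (18) (‘John likes his hat and Fred does too’). Everything below is our own simplification of Dalrymple et al.'s higher-order unification, which in general also handles quantifiers and multiple parallel elements.

```python
from itertools import product

def abstractions(term, source, var="x"):
    """Enumerate the ways of abstracting occurrences of `source` out of
    `term`: each occurrence is independently replaced by `var` or kept."""
    if term == source:
        return [var, term]
    if isinstance(term, tuple):
        return [tuple(c) for c in
                product(*(abstractions(t, source, var) for t in term))]
    return [term]

def apply_to(body, arg, var="x"):
    """Beta-reduce: substitute `arg` for `var` throughout `body`."""
    if body == var:
        return arg
    if isinstance(body, tuple):
        return tuple(apply_to(t, arg, var) for t in body)
    return body

# antecedent of (18) with his = John: 'John likes John's hat'
antecedent = ("like", "john", ("hat", "john"))
bodies = [b for b in abstractions(antecedent, "john") if b != antecedent]
for body in bodies:
    print(apply_to(body, "fred"))
# ('like', 'fred', ('hat', 'fred'))   sloppy
# ('like', 'fred', ('hat', 'john'))   strict
# ('like', 'john', ('hat', 'fred'))   excluded once the parallel (subject)
#                                     occurrence is required to be abstracted
```

As the comments indicate, Dalrymple et al.'s additional requirement that the parallel element's occurrence be abstracted rules out the third solution, leaving exactly the sloppy and strict readings; the ambiguity thus arises in recovering the predication, not in the antecedent clause.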
The theories mentioned so far try to account for VP anaphora at the sentence
level but in fact they all indicate that it is necessary to address some discourse
level. Both Klein (1987) and van Eijck (1985) give an account of VPA in Discourse Representation Theory (DRT) by introducing ‘predicate discourse representation structures’ (predicate DRSs) as representation of VPs. They place the
approach of Sag (1977) in the framework of DRT. (DRSs embody the ‘logical
form’.) VPA is treated as an anaphoric relation between a predicate DRS and a
discourse referent of the predicate type. Both studies show that the idea of Sag
(1977) can be incorporated within DRT, at the cost of a proliferation of predicate DRSs. Gardent (1991) in her turn shows that Klein (1987)’s treatment can
be recast in the framework of Dynamic Montague Grammar (Groenendijk and
Stokhof (1991)).
The notion of discourse structure in standard DRT is too limited to account
for the constraints on the resolution of VP anaphora which result from discourse
structure. Neither Klein (1987) nor van Eijck (1985) tries to enhance the notion
of discourse structure in this respect. An approach which does is Asher (1993).
This work implements a much richer notion of discourse structure and semantics
on top of the original DRS construction in DRT (in terms of ‘segmented DRSs’).
As for VPA, Asher acknowledges that it is wrong to view this as a phenomenon
concerning only adjacent clauses. As in Klein (1987) and van Eijck (1985), VPA
is treated as an anaphoric relation between a predicate DRS and a discourse
referent. But Asher (1993) takes a broader point of view: VPA is viewed as a form
of ‘concept anaphora’. A VP anaphor introduces a discourse referent for which
a suitable antecedent is to be found. The influence of discourse structure (in
particular relations like parallelism and contrast) on VPA resolution is accounted
for by treating VPA in terms of the richer notion of segmented DRSs. This
opens the possibility to treat discourse aspects of VPA within DRT in a more
substantial way.
1.3 VPA and Parallelism
VPA induces parallelism between the DCU that incorporates the anaphoric
clause and the DCU that serves as its antecedent. The parallelism induced by
VPA is very subtle and, among other things, extends to quantifier structures and
to the interpretation of pronouns contained in the antecedent. In this section
we shall have a closer look at these two interacting aspects of parallelism.
1.3.1 Structural and Indexical Parallelism
‘Structural parallelism’ concerns the quantifier structures of the clauses involved.
Example (19) illustrates the contribution of structural parallelism to the interpretation of the VP anaphor in question. Sentence (19)(a) may be given a
so-called ‘collective’ interpretation (the full sum that is donated is $ 10) or a
‘distributive’ reading (each of the three boys donated $ 10).
(19) (a) Three boys donated $ 10
(b) (and) two girls did too.
The interpretation of (19)(b) is partly determined by the interpretation of
(19)(a). A collective (or a distributive) interpretation of (19)(b) is strongly preferred if (19)(a) is assigned a collective (or a distributive) interpretation.
In general if several quantifiers are involved, it seems that the preferred
quantifier reading of a sentence is the one that corresponds to the syntactic
surface structure of the sentence (cf. Scha (1981)). Of course this is not always
the case. Consider for instance a reading of (20)(a) in which the quantifier
structure does not directly correspond to the syntactic surface structure, namely
a (perfectly acceptable) wide scope reading of at least two Dutch soccer players.
(20) (a) Every Italian knows at least two Dutch soccer players
(b) (and) every Dutchman does too.
The VP anaphor in (20)(b) seems to induce parallelism in such a way that an
interpretation of clause (b) must have a quantifier structure parallel to the quantifier structure in the interpretation of (a), even if the quantifier structure does
not correspond to the syntactic surface structure. Both clauses either have the
wide scope universal reading, or both have the wide scope existential reading.4
The fact that the latter reading is possible is made explicit in (21):
4 In a third interpretation of sequence (20), the wide scope existential quantifier may range
over both conjuncts (i.e. over both universal quantifiers). We do not discuss this interpretation
here; its treatment depends on the exact implementation of the discourse referential properties
of existential NPs.
(21) (a) Every Italian knows at least two Dutch soccer players: Marco van Basten and Ruud Gullit
(b) Every Dutchman does too: Johan Cruyff and Willem van Hanegem.
Another instance of structural parallelism is shown in (22).
(22) (a) Every woman likes a man
(b) (and) every girl does too.
Sentence (22)(a) is well-known for its semantic ambiguity: either the universal
quantifier has wide scope or the existential quantifier does. If sentence (22)(a) is
part of a discourse, the context will probably provide reasons to decide between
these two readings. However, lacking context, (22)(a) receives two different interpretations. Consider the continuation of (22)(a) by (22)(b). Whatever the
interpretation of (22)(a) may be, the interpretation of (22)(b) will correspond
with that interpretation of (22)(a): if (22)(a) has the ‘wide scope universal’ reading (respectively the ‘wide scope existential’ reading), (22)(b) has too (just as in
example (20) above). The two combinations of interpretations of these sentences
where the quantifier scopes differ are ruled out.5
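The constraint on (22) can be pictured as attaching a scoping label to the antecedent's reading and letting the anaphoric clause copy that label instead of re-introducing the scope ambiguity. The following sketch is our own toy illustration (the predicate names and the label device are ad hoc, and it deliberately ignores the third reading mentioned in footnote 4).

```python
# Readings of a doubly quantified clause, indexed by a scoping label.
def scoped_readings(restr1, restr2, matrix):
    return {
        "wide-universal":
            f"∀x[{restr1}(x) → ∃y[{restr2}(y) & {matrix}(x, y)]]",
        "wide-existential":
            f"∃y[{restr2}(y) & ∀x[{restr1}(x) → {matrix}(x, y)]]",
    }

def resolve_vpa(label, antecedent, anaphor):
    """Both clauses receive the reading carrying the same scoping label,
    so the mixed-scope combinations never arise."""
    return (scoped_readings(*antecedent)[label],
            scoped_readings(*anaphor)[label])

# (22): 'Every woman likes a man (and) every girl does too.'
for formula in resolve_vpa("wide-existential",
                           ("WOMAN", "MAN", "LIKE"),
                           ("GIRL", "MAN", "LIKE")):
    print(formula)
# ∃y[MAN(y) & ∀x[WOMAN(x) → LIKE(x, y)]]
# ∃y[MAN(y) & ∀x[GIRL(x) → LIKE(x, y)]]
```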
Another aspect of parallelism is what one might call ‘indexical parallelism’:
If the antecedent of a VP anaphor contains a pronoun, the pronoun including
its interpretation must be incorporated in the interpretation of the VP anaphor.
The sentences in (23) give a clear example of indexical parallelism. The interpreted pronoun of (23)(b) is in the antecedent of the VP anaphor. Interpreting
the anaphor just as likes him, with some free variable for the pronoun, is obviously wrong because it doesn’t provide the full VP anaphor interpretation. (In
that case the pronoun might even become bound by the universal term.) So the
VP anaphor must include the interpretation of the pronoun.
(23) (a) John is a nice guy.
(b) Mary likes him
(c) (and) every man thinks that Susan does.
As another example, consider (24). To obtain the correct interpretation of the
anaphoric clause (b), the binding structure of (24) needs to be considered. If
him in (24)(a) is bound by the existential term a man, this must also be so in
the interpretation of (b). However, if him in (a) is bound by the universal term
everyone, the pronoun in the VP interpretation of (b) is bound in a parallel
fashion, namely by John. (If him in (a) refers to some other discourse entity,
this interpretation is just ‘copied’ in the interpretation of the VP anaphor.)
5 Sag (1977) also observes that VPA enforces parallel quantifier structure, but doesn’t
discuss parallel quantified terms. Sag formulates a parallelism requirement on quantifier scope
involved in VPA, but his alphabetic variance approach would only allow one parallel pair,
namely the one where both sentences have a wide scope existential reading. In the wide
scope universal reading, the VPs contain free variables which are bound by different operators
outside the VP, which blocks deletion. These are the wrong predictions since both pairs of
readings are possible.
(24) (a) Everyone told a man that Mary likes him
(b) (and) John did too.
Crucial examples of clausal parallelism are found when both aspects of the
parallelism effects observed here are present at the same time. A paradigmatic
example is given below. In this example the VP anaphor induces both structural
and indexical parallelism, i.e. the interpretation of sentence (b) requires strict
parallelism with the interpretation of (25)(a) regarding both quantifier structure
and pronoun interpretation.
(25) (a) Everyone told a man that Mary likes him
(b) (and) everyone told a boy that Susan does.
Without any context to interpret it in, sentence (25)(a) has six different interpretations, which are listed below in two groups. The difference between the two
groups is in the quantifier structure of the formulas.
(25)(a) Everyone told a man that Mary likes him
a. ∀x[HUMAN(x) → ∃y[MAN(y) & TELL(LIKE(x)(MARY))(y)(x)]]
b. ∀x[HUMAN(x) → ∃y[MAN(y) & TELL(LIKE(y)(MARY))(y)(x)]]
c. ∀x[HUMAN(x) → ∃y[MAN(y) & TELL(LIKE(he)(MARY))(y)(x)]]
d. ∃y[MAN(y) & ∀x[HUMAN(x) → TELL(LIKE(x)(MARY))(y)(x)]]
e. ∃y[MAN(y) & ∀x[HUMAN(x) → TELL(LIKE(y)(MARY))(y)(x)]]
f. ∃y[MAN(y) & ∀x[HUMAN(x) → TELL(LIKE(he)(MARY))(y)(x)]]
Each group is subdivided with regard to the interpretation of the pronoun him,
which can be interpreted in three different ways: as a variable bound by everyone, as a variable bound by a man, or as a ‘referential’ pronoun referring to
some other discourse referent (not otherwise identified in the sentence). (In the
latter case, the pronoun is represented by a special variable: a wildcard (i.e. an
anonymous unification variable) ‘he’, left to be given an interpretation by the
discourse grammar on the basis of the context, cf. section 2).6 The interpretation of (b) strongly depends on the interpretation of (a). A first incomplete
description of the semantic content of (25)(b) will be twofold, depending on the
order of quantifiers.
(25)(b) (and) everyone told a boy that Susan does.
6 We leave aside what the exact translation of referential pronouns is. Given that these
pronouns refer back to previous utterances, one possibility would be to translate them into
discourse markers (cf. Kamp (1981)). Our preference, however, would be to use Dynamic Logic
methods and translate such pronouns into dynamically bound variables (cf. Groenendijk and
Stokhof (1991)). In that case, for all possible translations, the context-dependent interpretations (i.e. the interpretations after the discourse grammar has filled in the variables) have a homogeneous form: all pronouns are translated as logical variables; these variables are just not always locally bound.
A. ∀x[HUMAN(x) → ∃y[boy(y) & TELL(do(SUSAN))(y)(x)]]
B. ∃y[boy(y) & ∀x[HUMAN(x) → TELL(do(SUSAN))(y)(x)]]
Now it turns out that the interpretation of (25)(b) must be constructed on the
basis of its parallelism with (25)(a), no matter which interpretation(s) the first
clause has.
First of all, as in the foregoing examples, the VP anaphor induces structural
parallelism: the quantifier structure in the interpretations of the two sentences
must be the same. The first interpretation for (25)(b) is only possible if (25)(a)
has been assigned an interpretation from the first group; interpretation (25)(b)B
is possible only if (25)(a) has received an interpretation from the second group.
It is important to notice that, in this case, parallel quantifier structures cannot
be obtained by means of some notion of copying: both the antecedent clause
and the anaphoric clause have ambiguous quantifier structures so that these
structures will have to be inspected to ascertain the required parallelism; if
we simply copy an antecedent VP (‘like(x)’ with x some free variable,
or ‘like(he)’) into the position of the VP anaphor, there is no way to assure
this quantifier parallelism. Examples like these show that notions of copying or
deletion of (identical) verb phrases are too limited to account for VPA.
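The point can be made concrete with a small sketch (our own encoding, not the paper's formalism): if each reading of a clause is represented as a structured object that includes its quantifier prefix, parallelism becomes checkable, whereas a copied VP with a free variable carries no such information. All names below (readings_25a, parallel, etc.) are ours.

```python
# Illustrative sketch: each reading of a clause is represented by its
# quantifier prefix plus a schematic body. Structural parallelism
# requires identical prefixes; a bare copied VP such as like(x) has no
# prefix, so it cannot enforce this.

readings_25a = {
    "wide-forall": (("forall", "exists"), "TELL(LIKE(x)(MARY))"),
    "wide-exists": (("exists", "forall"), "TELL(LIKE(x)(MARY))"),
}
readings_25b = {
    "wide-forall": (("forall", "exists"), "TELL(do(SUSAN))"),
    "wide-exists": (("exists", "forall"), "TELL(do(SUSAN))"),
}

def parallel(ra, rb):
    """Structurally parallel iff the quantifier prefixes coincide."""
    return ra[0] == rb[0]

licensed = [(na, nb)
            for na, ra in readings_25a.items()
            for nb, rb in readings_25b.items()
            if parallel(ra, rb)]
print(licensed)   # only the two prefix-matching combinations remain
```

Only the two same-scope pairings survive, mirroring the observation that (25)(b)A requires a first-group reading of (25)(a) and (25)(b)B a second-group reading.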
Secondly it is important to notice that the interpretation of the pronoun
him in (25)(a) forces a parallel interpretation in (25)(b). For instance, if him
refers to some discourse entity ‘John’ (interpretation c. or f.), the pronoun in the
resolution of the VP anaphor must refer to the same entity ‘John’. Moreover, if in
(25)(a) the pronoun him is interpreted as being bound by a man (interpretation
b. or e.), the interpretation of (25)(b) will have to be parallel, binding the
pronoun in the interpretation of the VP anaphor to a boy. In order to account
for this indexical parallelism, the binding structures of the clauses involved have
to be parallel. It is clear that substituting a semantically identical VP such as
‘like(x)’, with x a free variable, for the VP anaphor will lead to wrong results
because the free variable may become bound by any quantifier. This shows that
the notion of VP identity is too narrow to account for VP anaphora and must
be replaced by some notion of clausal parallelism.
The instances of VPA that we have encountered in this section have clearly
shown that semantic constraints on VPA are not limited to VP identity but
must be stated in terms of parallelism between the anaphoric clause and its
antecedent. We provide more evidence for this point of view in the next section
where we look at parallel elements outside the VP.
1.3.2 Parallel Elements outside the VP
Mostly, only non-parallel elements in the antecedent contribute to a full interpretation of the anaphoric clause. For instance, in (26) the locative in the street
has no parallel in the anaphoric clause and contributes to the interpretation of
that clause (‘Bill jogs in the street’), whereas it is no part of the interpretation
of (27)(b), where it does have a parallel element in the anaphoric clause (in the
park).7
(26) (a) John jogs in the street,
(b) (and) Bill does too.
(27) (a) John jogs in the street,
(b) (and) Bill does in the park.
At least as important for the felicity of VPA as the parallelism between the
VPs involved, is the semantic relation between the elements of an anaphoric
clause and their contextual parallels in the antecedent. The elements in the
anaphoric clause are related to their antecedent(s) by what one might call ‘contrastive kinship’: they are semantically related and at the same time contrastive.8
For resolution, the element(s) of an anaphoric clause must, one way or another,
be coupled with their ‘parallel(s)’ in such a way that these elements stand in
this relation of contrastive kinship.
(28) (a) John jogs in the street,
(b) (and) Bill does in his pyjamas.
The sentences (28) provide an example of the fact that the relation of contrastive
kinship between the parallel elements must be properly realized for felicitous use
of VPA: (27) is more acceptable (at the least) because the semantic relation between the parallel elements (in the street, in the park), both locatives, is much
stronger than between those in (28) (in the street, in his pyjamas).9 The relevance of this relation for Gapping has already been argued for by Kuno (1976).
Its relevance for VPA has been underestimated.
In many cases the parallel elements are located outside the VP (subjects,
as in the previous examples, being the most obvious instance). Non-parallel
elements, for that matter, may also be located outside the VP. For instance, on
Monday in (29)(a) is, on many analyses, not part of the VP. However, it does
contribute to the interpretation of a VP anaphor in (29)(b).
7 In general, VPA does not require affixal agreement. However, the interaction of features such as tense with adjuncts is subtle and may, for instance, block adverbial contribution
of non-parallel elements to an anaphoric clause. In the following example, the adjunct last year
does not contribute to the interpretation of the anaphoric clause. John got married last year.
Bill never will. This provides yet another argument for the matching of semantic structures,
but we shall not go into these temporal aspects here.
8 Unlike Gapping, where the entities involved in the Gapping-string and its antecedent
must be different, the contrast between a VPA clause and its antecedent may also reside in
negation of the predicate and/or in the rhetorical relation between the anaphoric clause and
its antecedent, as exemplified by: John should have consulted a doctor but he didn’t.
9 Note that with a relatively long pause between does and in his pyjamas, (b) may be
perceived as the conjunction of two sentences. (Something like: ‘Bill jogs in the street. He
does so in his pyjamas.’)
(29) (a) John goes to the bank on Monday
(b) (and) Bill does too.
Apart from the agent, unique thematic relations in the antecedent-DCU
(such as theme, usually the object of a verb, and recipient) always seem
to be included in the interpretation of a VP anaphor, i.e. these elements are not
allowed to have parallels in the anaphoric clause (cf. examples (30) and (31)).
(30) (a) John drinks tea in the morning
(b) * (and) Fred does coffee.
(31) (a) John sold a client a table
(b) * (and) Fred did a chair.
As observed above, locatives and temporal adjuncts may or may not have parallel elements in the anaphoric clause. They are only part of the relevant predication if they are non-parallel (examples (26) versus (27)). The co-occurrence
of several locatives and/or temporal adjuncts may give rise to several possible
interpretations of an anaphoric clause, such as in examples (32) and (34). (Some
speakers seem to view example (33) as more acceptable than (32). This indicates
that prosodic accent (in capital letters) must also be considered.)
(32) (a) John jogs in the street in the morning
(b) (and) Mary does in the park.
(33) (a) John jogs in the STREET in the morning
(b) (and) Mary does in the PARK.
(34) (a) John jogs in the street in the morning
(b) (and) Mary does in the evening.
It will be clear that contrastive kinship strongly influences the interpretation
of the anaphoric clause in these cases. For instance, the interpretation of (32)
in which Mary ‘jogs in the park in the morning’ is strongly preferred over the
interpretation where Mary ‘jogs in the street in the park’.10
Some treatments of VPA use an indication for the deletion site, which marks
the place in the surface structure of the utterance where the ellipsis occurs, e.g.
∅ in (35). The examples above show that an approach which uses such a zero
element, or a variable, to mark the deletion site, is not adequate. In general, it
is not known beforehand (i.e. before resolution) whether the VP anaphor under
consideration occupies one or more deletion sites (e.g. example (36)).
(35) (a) John jogs in the street
(b) (and) Bill does ∅ too.
10 It is interesting to notice that in these examples the anaphoric clause may also be expressed
using Gapping, as, for instance, in the example below. (This is where VPA and Gapping meet.)
John jogs in the street in the morning (and) Mary in the park.
(36) (a) John jogs in the street in the morning
(b) (and) Mary does ∅ in the park ∅.
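The resolution pattern of examples (26) through (36) can be pictured in a small sketch (ours, not the paper's mechanism, which is developed in sections 2 through 4): each adjunct of the anaphoric clause is paired with its contextual parallel of the same semantic class, and unmatched antecedent adjuncts are inherited. The function resolve and the class tags are our own illustrative names.

```python
# Illustrative sketch: adjuncts are tagged with a semantic class
# (locative, temporal, ...). An adjunct in the anaphoric clause replaces
# the antecedent adjunct of the same class ('contrastive kinship');
# antecedent adjuncts without a parallel are carried over.

def resolve(antecedent_adjuncts, anaphoric_adjuncts):
    """Both arguments: dicts mapping semantic class -> phrase."""
    resolved = dict(antecedent_adjuncts)   # start from the antecedent
    resolved.update(anaphoric_adjuncts)    # parallel elements contrast
    return resolved

# (32): 'John jogs in the street in the morning / Mary does in the park'
ex32 = resolve({"loc": "in the street", "temp": "in the morning"},
               {"loc": "in the park"})
# -> Mary jogs in the park in the morning (the preferred reading)

# (26): 'John jogs in the street / Bill does too' - no parallel adjunct,
# so the locative is inherited.
ex26 = resolve({"loc": "in the street"}, {})
```

Note how the same code covers both cases: the locative is replaced when a parallel exists (32) and inherited when none does (26), which is exactly why no fixed deletion site can be marked in advance.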
To conclude this section on VPA and parallelism we can say that an adequate treatment of VPA requires the whole anaphoric clause to be matched
with its antecedent. VP anaphora embedded in ambiguous quantifier structures
induce clausal parallelism with regard to quantifier and binding structures. Furthermore, it is necessary to establish parallel and non-parallel elements, which
both may be located outside the VP, to determine the relevant predication on
the basis of parallelism. So far, we have only considered parallelism between
sentences. In the next section we look at instances of VPA which have complex
DCUs as their antecedents.
1.4 The Scope of VPA
In this section, we shall show that the antecedent of an anaphoric clause with
VPA does not have to be a clause, nor need it be immediately preceding. We
study instances of VPA which have complex discourse constituent units as their
antecedents and instances of VPA embedded in complex DCUs. Both types of
examples show that the structure of the preceding discourse is relevant for the
interpretation of the anaphoric clause and that a notion of parallelism between
(possibly complex) DCUs is needed to account for VPA.
Though it is well known that VPA may occur in rhetorical subordinations
as in (37), most previous analyses of VPA assume a sentence containing a VP
anaphor to be preceded by one simplex sentence, either coordinated or subordinating, which incorporates the antecedent.11
(37) (a) Maaike must write a song
(b) because Jeroen can’t.
Nash-Webber and Sag (1978), however, observe that, in general, the above assumption is not correct. This is illustrated by examples (9), repeated here as
(38), and (39) below. As observed in section 1.1, the anaphoric clause (38)(c)
is ambiguous in that the VP anaphor may be interpreted in at least two different ways. It may either have (38)(b) as antecedent clause, which corresponds
to something like ‘(x) hates waltzing’ as interpretation of the VP anaphor, or
it may have both (38)(a)+(b) as antecedent, which results in ‘(x) likes bellydancing and (x) hates waltzing’ as interpretation of the VP anaphor.
(38) (a) Maaike likes belly-dancing.
(b) She hates waltzing.
(c) Saskia does too.
11 We leave antecedent-contained VPA aside, but this is immaterial to the argument presented here.
(39) (a) John went to the library.
(b) He borrowed a book on computer science.
(c) Bill did too.
Similarly, the anaphor in (39) may refer to the VP in (39)(b) i.e. ‘(x) borrowed
a book on computer science’, or to a conjunction of the VPs in (39)(a) and (b)
respectively, i.e. ‘(x) went to the library and (x) borrowed a book on computer
science’. As it stands, the VP anaphor in (39) seems not to be able to refer to
the VP in (39)(a) only. This third interpretation may become relevant if the
context makes (39)(b) subordinate to (a). For instance, if (39)(a) and (b) form
an answer to the question Who went to the library?, with (b) offered as a side
comment. It is also called for in continuations such as in example (46) discussed
further on.
Thus successive (cohesive) utterances that exhibit topic chaining, i.e. series of predications about the same argument (which is usually the sentence
topic of the first sentence, cf. Polanyi (1985)), may constitute antecedents for
anaphoric clauses with VPA. Example (10), repeated here as (40), indicates
that in complex VPA antecedents that exhibit topic chaining, the rhetorical relations between the clauses in the antecedent (in this case a causal relation) are
preserved in the interpretation of the anaphoric clause.
(40) (a) Fred went to the dentist
(b) He needed a check-up.
(c) Sara did too.
In the preferred interpretation of (40), the anaphoric clause (c) has (a)+(b)
as its antecedent. In this interpretation, the rhetorical subordination between
the clauses in the antecedent seems to be preserved in the interpretation of the
anaphoric clause so that the VP anaphor is interpreted as something like ‘(x)
went to the dentist because (x) needed a check-up’.
A change of subject (agent) usually curtails the scope of VPA as illustrated
by (41) where the anaphor seems to refer to ‘(x) gives a concert’.
(41) (a) John goes to the theatre.
(b) Lia gives a concert.
(c) Brigitte does too.
However, in subordination structures it is not always the case that such a change
curtails the scope of a VP anaphor. In subordination structures, the subordinating DCU plays a crucial role with regard to the relation of the subordination
structure with other DCUs (cf. section 3). In this respect, the subordinated
DCU is less important. This also has its impact on VPA. The importance of the
rhetorical relation in a complex antecedent may be illustrated by (42).
(42) (a) John goes to the theatre.
(b) It’s cause Lia gives a concert.
(c) Brigitte does too.
The interpretation of the anaphoric clause in (42) seems ambiguous between
having (b) or (a)+(b) as antecedent. Its preferred interpretation, due to the
explicitly indicated rhetorical subordination in (42)(a)+(b), is ‘Brigitte goes to
the theatre because Lia gives a concert’, in which (a)+(b) is the antecedent DCU.
Thereby the interpretation of the subordinated DCU ((42)(b)) is included in the
interpretation of the anaphoric clause. If the subordination is not explicitly
stated (as in (41)), the ‘complex-antecedent’ interpretation is hard to get, since there is no topic chaining; it seems possible only if (41)(b) is interpreted
as subordinated to (a). Whereas the preferred antecedent of (41)(c) is (41)(b),
(42)(b) as antecedent of (42)(c) is less preferred.
It has been observed that a stative in an antecedent clause enforces a stative
in the anaphoric clause (cf. Sag (1977)). If an anaphoric clause has a subordination as its antecedent, such a constraint seems to hold only with respect to
the VP of the subordinator.
(43) (a) Fred went to the dentist
(b) because he was in terrible pain.
(c) Bill did too.
The VP anaphor in (43)(c) cannot refer to (b) because the latter is a stative
whereas (c) is not. Nevertheless, this sequence is well-formed and (43)(c) has
(a)+(b) as its antecedent.
The following examples (44) and (45) provide further evidence for the importance of the structure of the preceding discourse with regard to the interpretation of VPA. The anaphoric clause (44)(d) may have as an antecedent either
(44)(c) only, or (44)(a) through (c). In the latter case, the rhetorical structure
of the antecedent, (44)(b) and (44)(c) elaborating on (44)(a), is part of the
interpretation of the anaphor.
(44) (a) Harry went to the beach.
(b) He didn’t go for a swim, however.
(c) He just took a walk along the seashore.
(d) Fred did too.
(45) (a) John prepared his exam thoroughly:
(b) He decided not to go on holiday.
(c) He spent the whole week studying
(d) and every day he went to bed early.
(e) Mary did too.
Similarly in (45), the VP anaphor may either refer to the whole preceding text, or
to the VP in the directly preceding sentence, (45)(d), only. Again the structure
of this small discourse, (45)(b) through (45)(d) being an elaboration, obviously
limits the reference possibilities of the VP anaphor. The anaphor cannot, for
instance, refer to (45)(b) or (45)(c) only or only to both.
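The accessibility pattern just described can be pictured with a toy discourse tree (our sketch; the paper's discourse grammar is developed in section 3). Candidate antecedents are the complete constituents whose right edge is the clause preceding the anaphor, which excludes (45)(b) or (45)(c) in isolation. The encoding and the function candidates are ours.

```python
# Illustrative sketch: a DCU is either a clause label or a
# (relation, [daughters]) pair. Candidate antecedents for a VP anaphor
# that follows the unit are the constituents ending at the last clause.

def candidates(dcu):
    """Return the constituents on the right edge of the tree (the only
    complete units whose right edge is the clause before the anaphor)."""
    result = [dcu]
    while isinstance(dcu, tuple):
        dcu = dcu[1][-1]          # descend into the last daughter
        result.append(dcu)
    return result

# (45)(a)-(d): (a) elaborated by the sequence (b), (c), (d)
ex45 = ("elaboration", ["a", ("list", ["b", "c", "d"])])
print(candidates(ex45))
```

On this encoding the candidates are the whole unit, the elaborating sequence, and (d) alone; the non-final clauses (b) and (c) are never candidates by themselves, matching the text's observation.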
In all foregoing examples, the antecedent of an anaphoric clause was a contiguous structured part of the preceding text. Everything within that part was
in the interpretation of the anaphoric clause; there were no ‘holes’ in it. But example (46) illustrates the relevance of discourse structure for the interpretation
of VPA in another (maybe even more interesting) way because we encounter a
VP anaphor embedded in a complex DCU.
(46) (a) John went to the library.
(b) He borrowed a book on computer science.
(c) Bill did too.
(d) He borrowed a book on French.
In the intended interpretation, where He and French in (46)(d) are prosodically
prominent, sentences (46)(a)+(b) are somehow parallel with (46)(c)+(d). The
VP anaphor is ‘embedded’ in the somewhat more complex DCU that consists
of (46)(c)+(d). This DCU is parallel with (46)(a)+(b) which determines the
structure of the discourse and with that the possible interpretation of the VP
anaphor: it is a binary structure that consists of (46)(a)+(b) and (46)(c)+(d).
This leads to (46)(a) as antecedent clause for the anaphoric clause (46)(c),
interpreting the VP anaphor as referring to the embedded VP ‘go to the library’.
A VP anaphor may also be embedded in a complex DCU if the latter has a ‘one
clause’ subordination as its parallel, as illustrated by (11), repeated here as (47):
(47) (a) Fred went to the dentist
(b) because he needed a check-up.
(c) Sara did too,
(d) but she had to have her wisdom-tooth removed.
In summary we can say that whereas most treatments of VPA view this
phenomenon in too local a fashion (namely as involving only an anaphoric VP
and a non-complex antecedent VP), it was shown here that, in general, the scope
of VPA ranges outside of clauses. It turns out that the reference possibilities of
VP anaphora are constrained by the structure of discourse. This means that the
process of resolving a VP anaphor may involve the entire structure of a DCU
in which the VP anaphor occurs, as well as the entire structure of the possibly
complex antecedent. To put it in a different way: the traditional notion of ‘VP
identity’ needs to be broadened to a notion of ‘DCU parallelism’. We give an
account of these observations in section 4.
2 Calculating the Common Denominator
The previous discussion has made clear that in order to account for parallelism
and discourse semantics, we shall need some mechanism to establish what utterances and discourse units have in common and what not. In this section, we
shall articulate a device which accomplishes this by matching syntactic/semantic
structures and calculating their ‘Most Specific Common Denominator’ (MSCD).
In section 2.1, we first define the syntactic/semantic representations of sentences. These structures resemble semantic derivation trees in the Montague
tradition (cf. Dowty, Wall and Peters (1981)). They are similar to ‘structured
meanings’ in the sense of Cresswell (1985).
Subsequently we define, in section 2.2, the general notion of a Most Specific
Common Denominator, which is applicable to arbitrary structures. The notion
of a Most Specific Common Denominator of two structures is intended as a
formalization of the intuitive idea of ’what the two structures have in common’.
MSCD may be thought of as unifying those atomic terms that unify and generalizing the others. It is shown that MSCD can be defined along the same lines as
as the familiar notions of Most General Unification (MGU) and Most Specific
Generalization (MSG) (cf. Robinson (1965), Siekmann (1989)).
In section 2.3, we present some examples of calculating MSCDs over structures as defined in section 2.1. This application of the Most Specific Common
Denominator is employed as a characterization of discourse cohesion. As has already been noticed by Hobbs (1979), a discourse is not perceived as coherent just
because successive utterances are ‘about’ the same entities.12 The comprehensive notion of cohesion that we intend to formalize, is related to ideas of Polanyi
(1985) mentioned in section 1.1. Intuitively, this notion may be conceived as
‘making the appropriate generalizations over the meanings of successive discourse constituent units’. Consider, for instance, examples (48) and (49). The
two sentences in (48) are both instances of a common semantics like ‘some girl
likes some sort of dancing’. The semantic structures of (48)(a) and (48)(b) are
indicated by Sa and Sb respectively. The intended generalization is indicated by
Sa |c Sb , the Most Specific Common Denominator of Sa and Sb .
(48) (a) Maaike likes belly-dancing.
(b) Brigitte likes flamenco-dancing.
Sa: (like′(belly-dancing′))(maaike′)
Sb: (like′(flamenco-dancing′))(brigitte′)
Sa |c Sb: (like′(dancing′))(girl′)
(49) (a) Maaike likes belly-dancing.
(b) Fred hates it.
12 Hobbs (1979) observes that although in John took a train from Paris to Istanbul. He likes
spinach, ’he’ can refer only to John, this discourse is not coherent. In this case, the cohesion
between the predicates must be taken into account (and we shall see that the present approach
accomplishes this).
Sa: (like′(belly-dancing′))(maaike′)
Sb: (hate′(it))(fred′)
Sa |c Sb: (em-attitude′(belly-dancing′))(human′)
Similarly, the structure Sa |c Sb for (49) indicates that the two sentences in
(49) have in common that they both state some emotional attitude of a person
towards belly-dancing.
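The MSCD computations previewed in (48) can be sketched as recursive anti-unification over application structures (our illustrative re-implementation with a toy taxonomy; the names PARENT, lub and mscd are ours, and the official definitions follow in sections 2.1 and 2.2):

```python
# Illustrative sketch of MSCD as anti-unification: matching atoms are
# kept, differing atoms are generalized to their most specific common
# ancestor in a small sort taxonomy.

PARENT = {                       # toy sort taxonomy (child -> parent)
    "maaike": "girl", "brigitte": "girl", "girl": "female",
    "female": "human", "fred": "man", "man": "human",
    "belly-dancing": "dancing", "flamenco-dancing": "dancing",
    "dancing": "activity", "like": "em-attitude", "hate": "em-attitude",
}

def ancestors(s):
    chain = [s]
    while s in PARENT:
        s = PARENT[s]
        chain.append(s)
    return chain

def lub(s, t):
    """Most specific common sort of two atoms."""
    up = ancestors(t)
    return next(a for a in ancestors(s) if a in up)

def mscd(a, b):
    """Most Specific Common Denominator; a structure is an atom or a
    (function, argument) pair."""
    if a == b:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (mscd(a[0], b[0]), mscd(a[1], b[1]))
    return lub(a, b)

# (48): (like'(belly-dancing'))(maaike') vs (like'(flamenco-dancing'))(brigitte')
sa = (("like", "belly-dancing"), "maaike")
sb = (("like", "flamenco-dancing"), "brigitte")
print(mscd(sa, sb))   # (('like', 'dancing'), 'girl')
```

The result reproduces the display for (48): the shared predicate like′ survives, while the differing dance sorts generalize to dancing and the differing girls to girl.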
2.1 Syntactic/Semantic Structures
The structured semantic objects that are subject to the matching process resemble semantic derivation trees in the Montague tradition. One essential difference
with Montague Grammar is that there the level of semantic representation is
dispensable. In the current approach, however, the semantic representations are
used in an essential way in a matching process. So, contrary to Montague Grammar, the (syntactic) structure of the semantic expressions is important. One
inessential difference: Montague Grammar builds on intensional logic, whereas
we use an extensional logic here. This may be easily changed. However, decisions about the proper representation of modalities, propositional attitudes,
etc., which motivated Montague’s Intensional Logic, are independent of the issues discussed in the present paper.
The logic that is defined below is geared to assigning syntactic/semantic
structures to sentences and discourse constituent units in such a way that these
can be examined pairwise in order to determine common syntactic/semantic
structure. It is a standard extensional type logic with one extra syntactic notion:
next to logical variables, the logic provides wildcards (anonymous unification
variables) for the translation of context-dependent elements like VP anaphors
and pronouns. These wildcards are always of a specific sort. The sorts make
it possible to express that something is related to something else in that both
belong to the same subset of the domain. For instance, the constants MAAIKE and BRIGITTE, which are used in the translations maaike′ and brigitte′ in (48), are both of the sort girl. This restricts the possible interpretations to those that interpret MAAIKE and BRIGITTE as girls. A natural ordering of sorts also makes it possible to express that the constant MAAIKE is more specific than a wildcard of sort girl, and that a variable of sort girl is more specific than one of sort female, which is more specific than one of sort human, etc.
Formally, this natural ordering is given as part of the definition of the logic. We
also specify for all constants the unique most specific sort that they belong to.
2.1.1 The Logic
We start by defining the logic that is used. The syntax of the logic is that of
standard type logic, except that a notion of ‘wildcard of a given sort’ has been
added. This means that in addition to the standard basic ingredients of the
logic, there is, for every type α, a set of sorts Sorα that is constructed from sets of sortal constants CSα. For every sort σ, we add to the logic a wildcard σ. The sorts are ordered by ⊑α. They restrict possible interpretations. All sorts have a maximal element that is interpreted as the sort of all objects of the corresponding type. Furthermore, for all sorts σ and τ there is the sort of functions from σ to τ, written as (σ → τ). Finally, we employ one inessential but useful simplification: by providing enough sorts, we assure that two sorts always have a unique most specific sort in common.
Definition 1 (Syntax of Lw)
The many-sorted type logic Lw is defined by the following pair:

Lw = < Type, < Conα, Varα, CSα, Sorα, ⊑α, Sα >α∈Type >

in which Type is a set, and for every α ∈ Type: Conα, Varα and Sorα are disjoint sets and CSα ⊆ Sorα, ⊑α is a reflexive ordering of Sorα, and Sα is a function Sα: Conα → Sorα. Type, the set of types, is the smallest set such that:

1. e, t ∈ Type
2. if α, β ∈ Type, then <α, β> ∈ Type

For every α ∈ Type:

• Conα is a set of constants of type α (this includes the logical constants such as &, ∨ ∈ Con<t,<t,t>>, ¬ ∈ Con<t,t>)
• Varα is a (denumerably infinite) set of logical variables of type α
• CSα is a finite set of sortal constants of type α. In particular CSt = {tv}, and CSe is a set such that under the ordering ⊑e, CSe is a tree with a top element (root) entity
• Sorα is a set of sorts of type α. It is the smallest set such that
  1. the sortal constants are sorts: CSα ⊆ Sorα
  2. specifying input and output sorts defines a functional sort: if α = <γ, β>, σ ∈ Sorγ and τ ∈ Sorβ, then (σ → τ) ∈ Sor<γ,β>
• ⊑α is a partial order on Sorα satisfying
  1. in general, the top element of Sorα is called topα; entity is the top of Sore, tv is the top (and only element) of Sort
  2. if σ ⊑α σ′ and τ ⊑β τ′, then (σ → τ) ⊑<α,β> (σ′ → τ′)
  3. sorts always have a unique most specific sort in common: for any two sorts σ and σ′, there is a unique smallest sort ρ such that σ ⊑α ρ and σ′ ⊑α ρ. Note that, because of this, ⊑α is only definable if there are enough sorts.
• Sα is a function that assigns to every element of Conα its ‘closest sort’. This function will be extended to all expressions of type α below.

The set of meaningful expressions of type α, MEα, is defined by induction:

1. Conα, Varα ⊆ MEα
2. if φ ∈ ME<α,β> and ψ ∈ MEα, then φ(ψ) ∈ MEβ; φ(ψ) is also written as [apply, φ, ψ]
3. if φ ∈ MEβ and x ∈ Varα, then (λx.φ) ∈ ME<α,β>
4. if φ ∈ MEt and x ∈ Varα, then ∀x(φ), ∃x(φ) ∈ MEt
5. if σ ∈ Sorα, then σ ∈ MEα

End Definition
This definition is standard except for σ, which is a ‘wildcard of sort σ’.
The sort function Sα can be extended to a function sα which assigns a sort to all expressions as follows:

1. for c ∈ Conα, sα(c) = Sα(c)
2. for x ∈ Varα, sα(x) = the top element of ⊑α
3. if φ ∈ ME<α,β> and ψ ∈ MEα, then sβ(φ(ψ)) = σ, where σ is the smallest sort such that s<α,β>(φ) ⊑<α,β> (τ → σ) and sα(ψ) ⊑α τ
4. for x ∈ Varα and φ ∈ MEβ, s<α,β>(λx.φ) = (topα → sβ(φ)), where topα is the top element of Sorα.13
5. if φ ∈ MEt, then st(φ) = tv
6. sα(σ) = σ

13 Here and in the definition of the sort of a variable, we took the simplest solution. Note that if sα(φ) = σ → τ, and σ is not the top sort of its type, sα(φ) ≠ sα(λx.φ(x)). For our goal this is sufficient.
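Clause 3's application rule can be pictured in a drastically simplified sketch (ours: functional sorts are plain (input, output) pairs, and the search for the smallest σ is replaced by directly returning the functor's output sort whenever the argument's sort fits). The names PARENT, refines and app_sort are our own.

```python
# Illustrative sketch of the sort of an application: the result sort is
# the functor's output sort, provided the argument's sort is at least as
# specific as the functor's input sort.

PARENT = {"girl": "female", "female": "human", "human": "entity"}

def refines(s, t):
    """True iff sort s is at least as specific as sort t."""
    while True:
        if s == t:
            return True
        if s not in PARENT:
            return False
        s = PARENT[s]

def app_sort(fun_sort, arg_sort):
    inp, out = fun_sort
    if not refines(arg_sort, inp):
        raise TypeError("argument sort does not fit functor input sort")
    return out

# like' with (simplified) sort (human -> tv) applied to maaike' of sort girl:
print(app_sort(("human", "tv"), "girl"))
```

Applying a (girl → tv) functor to a mere human argument fails, mirroring the way the sort constraints restrict admissible interpretations.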
Lw has a standard semantics for a type logic, extended in a straightforward way to accommodate the extra components. Except for wildcards, the extra components do not extend the language, but serve to restrict the possible interpretations to ‘intended models’. A model M for the many-sorted type logic Lw is an ordered pair < E, I >, where E is a non-empty set of entities (the object domain) and I is an interpretation function. The range of I is defined as usual:

De = E
Dt = {0, 1}
D<α,β> = Dβ^Dα (the set of functions from Dα to Dβ)

The interpretation function assigns a denotation to each constant of Lw, i.e. I: Conα → Dα, and I: Sorα → D<α,t>, in such a way that

1. I(topα) = Dα, e.g. I(entity) = De and I(tv) = Dt
2. for all types α, β, σ ∈ Sorα, τ ∈ Sorβ: I(σ → τ) = {f ∈ D<α,β> | ∀d ∈ I(σ): f(d) ∈ I(τ)}
3. for all σ, τ ∈ Sorα, if σ ⊑α τ, then I(σ) ⊆ I(τ)
4. for all c ∈ Conα, I(c) ∈ I(sα(c))
5. for all τ ∈ Sorα, I(τ) = ⋃{I(σ) | σ ⊑α τ}
6. for all c ∈ Conα and all τ, if I(c) ∈ I(τ), then sα(c) ⊑α τ

This last condition assures that each constant is assigned a unique most specific sort.
To give the interpretation of the meaningful expressions of Lw, we need, next to standard assignments to interpret logical variables, assignments to interpret wildcards. It is essential that two occurrences of the same wildcard σ are interpreted independently of each other. We use a book-keeping trick to assure this (cf. Definition 2 below).

An assignment is a pair g = < gv, gw > such that for every type α:

gv: Varα → Dα
gw: Sorα → (N → Dα), such that, for every σ ∈ Sorα and i ∈ N: gw(σ)(i) ∈ I(σ)

In Definition 2 (clause 2) below, we shall use the latter condition to decouple wildcards and make any occurrence of a sorted wildcard unique by employing auxiliary assignments g1 and g2 in the recursion. (It may seem peculiar that sorts turn up both as arguments of the interpretation function and as arguments of the assignments. The difference is that for a sort of type α, the interpretation gives the meaning of that sort, which is an object of type <α, t>, whereas the assignment gives an arbitrary element of that sort (for every natural number), which is an object of type α.) The auxiliary assignments behave the same for logical variables and give ‘split’ values to wildcards. For every type α and every x ∈ Varα: g1(x) = g(x) and g2(x) = g(x). For every type α, for every σ ∈ Sorα and every i ∈ N, the assignments g1 and g2 are defined as follows:

g1(σ)(i) = g(σ)(2i) and g2(σ)(i) = g(σ)(2i+1).
The following semantic rules of Lw define recursively, for any expression φ, the extension of φ with respect to model M and value assignment g, denoted ‖φ‖M,g.

Definition 2 (Semantics of Lw)
(In clauses 3 through 5, g[v/d] is that value assignment exactly like g with the possible difference that g[v/d](v) is the object d.)

1. If φ ∈ Conα, then ‖φ‖M,g = I(φ)
   If σ ∈ Sorα, then ‖σ‖M,g = gw(σ)(0)
   If x ∈ Varα, then ‖x‖M,g = gv(x)
2. If φ ∈ ME<α,β> and ψ ∈ MEα, then ‖φ(ψ)‖M,g = ‖φ‖M,g1(‖ψ‖M,g2)
3. If φ ∈ MEα and v ∈ Varβ, then ‖λv.φ‖M,g is that function h ∈ Dα^Dβ such that for any object d in the domain Dβ, it holds that h(d) = ‖φ‖M,g[v/d]
4. If φ ∈ MEt and v ∈ Varα, then ‖∀v(φ)‖M,g = 1 if and only if for every d ∈ Dα it holds that ‖φ‖M,g[v/d] = 1
5. If φ ∈ MEt and v ∈ Varα, then ‖∃v(φ)‖M,g = 1 if and only if for at least one d ∈ Dα it holds that ‖φ‖M,g[v/d] = 1

End Definition
The following example illustrates the effect of clause 2 in this definition with regard to wildcards, which is to decouple wildcards so that different occurrences of the same wildcard are treated as if they were different variables:
‖(P(σ))(σ)‖M,g =
‖P(σ)‖M,g1(‖σ‖M,g2) =
‖P‖M,(g1)1(‖σ‖M,(g1)2)(‖σ‖M,g2) =
I(P)((g1)2(σ)(0))(g2(σ)(0)) =
I(P)(g1(σ)(1))(g(σ)(1)) =
I(P)(g(σ)(2))(g(σ)(1))
So, different occurrences of the same wildcard have their values assigned independently (i.e. they actually do function as wildcards). As a consequence, two occurrences of the same wildcard are treated as different objects, and two copies of a formula φ that contains a wildcard have to be regarded as two different objects. In particular, φ and φ&φ will have different meanings! However, we demand that context-dependent interpretations of discourse units do not contain any wildcards, so wildcards are only used as intermediate calculation devices.
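A small symbolic evaluator (our own sketch; only constants, wildcards and application are covered, and values are rendered as strings so the index bookkeeping stays visible) reproduces the derivation above:

```python
# Sketch of clause 2's effect on wildcards: constants evaluate to their own
# names, a wildcard reads index 0 of the current assignment, and every
# application splits the assignment before evaluating function and argument.

def ev(e, g):
    if e[0] == 'const':
        return e[1]
    if e[0] == 'wild':
        return g(e[1], 0)
    # e = ('app', f, a): evaluate f under g1 and a under g2
    g1 = lambda s, i: g(s, 2 * i)
    g2 = lambda s, i: g(s, 2 * i + 1)
    return f"{ev(e[1], g1)}({ev(e[2], g2)})"

# The base assignment returns symbolic values g(sigma, i).
base = lambda s, i: f"g({s},{i})"
expr = ('app', ('app', ('const', 'P'), ('wild', 'sigma')), ('wild', 'sigma'))
result = ev(expr, base)   # the two occurrences of sigma read different indices
```

Running this yields `P(g(sigma,2))(g(sigma,1))`, matching the last line of the calculation above.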
How sorts and wildcards are employed in the current theory is elucidated in section 2.2.
2.1.2 Structures
As for the sentence grammar, we assume a compositional approach to semantics
where each syntactic rule has a corresponding semantic rule which specifies the
translation of the output of the syntactic rule in terms of the translation of the
input to it. Thus the translation of a complex expression is determined by the
translations of its parts and the semantic rules that correspond to the syntactic
rules that are used to form the expression. In the following, some terminology
from the Montague framework is employed without explicit reference (cf. Dowty,
Wall and Peters (1981)).14
We adopt a set of syntactic rules and a set of semantic rules as exemplified
below. Corresponding to the syntactic rules,
(I)   φT + ψIV → [S φT ψIV]
(II)  φTV + ψT → [IV φTV ψT]
(III) φDET + ψCN → [T φDET ψCN]
Montague Grammar provides the following translations, respectively, which are all based on the semantic operation of function application (φ and ψ translate into φ′ and ψ′):
(I)′   Translation: ψ′(φ′)   (ψ ∈ PIV, φ ∈ PT)
(II)′  Translation: φ′(ψ′)   (φ ∈ PTV, ψ ∈ PT)
(III)′ Translation: φ′(ψ′)   (φ ∈ PDET, ψ ∈ PCN)
In these translations, PΞ denotes the set of all expressions of category Ξ. We use logical formulas as shorthand for the syntactic/semantic structures of sentences and DCUs. So φ(ψ)(χ) and (λy.λx.φ(x)(y))(χ)(ψ) are different structures. To reduce the confusion that may arise from the fact that we use formulas both as representations of syntactic/semantic structures on the one hand and as the meanings of these structures on the other, we shall use square bracket notation ([apply,φ,ψ]) whenever a formula should be read as a structure rather than as a meaning. The translations above correspond in a straightforward manner to syntactic/semantic structures:
14 In fact, our analysis is not very sensitive to the particular choice of grammar formalism. In principle, any sufficiently fine-grained formalism with a one-to-one correspondence of syntactic and semantic rules may be employed to account for the observations in this paper, as long as syntactic/semantic structures are assigned consistently.
(I)″   Translation as Structure: [apply,ψ′,φ′]
(II)″  Translation as Structure: [apply,φ′,ψ′]
(III)″ Translation as Structure: [apply,φ′,ψ′]
The table below gives a representative sample of translations of lexical elements. We use x, y, z as names for variables of type e; P, Q, R of type <e,t>; and S, T, U of type <<e,t>,t>.

Lexical Element   Cat   Translation as Structure             Type
maaike            T     [λP.P(MAAIKE)]                       <<e,t>,t>
a                 DET   [λPλQ.∃x(P(x)&Q(x))]                 <<e,t>,<<e,t>,t>>
every             DET   [λPλQ.∀x(P(x) → Q(x))]               <<e,t>,<<e,t>,t>>
woman             CN    [λx.WOMAN(x)]                        <e,t>
dance             IV    [λS.S(λx.DANCE(x))]                  <<<e,t>,t>,t>
like              TV    [λSλT.T(λy.S(λx.LIKE(x)(y)))],       <<<e,t>,t>,<<<e,t>,t>,t>>
                        [λSλT.S(λx.T(λy.LIKE(x)(y)))]
Note the type-raised translation of verbs (cf. Hendriks (1993)). The twofold
translation of transitive verbs, such as like, results from attaching quantifier
distribution to the verb; this is motivated in section 4.1. This twofold translation is only relevant if both arguments of the verb are quantified NPs. Otherwise,
i.e. if at least one of the arguments is not a quantified NP, both translations
yield the same semantic interpretation.
We shall occasionally abbreviate the translations by using primed versions of the lexical elements (such as maaike′ for λP.P(MAAIKE)).
In the translations of clauses, logical variables as well as wildcards (anonymous unification variables) are used. Wildcards are used for context-dependent
elements like VP anaphors, pronouns and WH-terms. (In the semantic representations they are underlined; the anaphor does, for instance, is represented by
‘do’.) In principle, a pronoun receives a twofold translation: it is translated into a wildcard (he, she, it, . . . ) and into a standard (logical) variable (x, y, z, . . . ). The latter is used, of course, only when the pronoun is bound by an operator such as a lambda-operator or a quantifier. This twofold translation reflects the dichotomy of ‘referential’
versus ‘bound variable’ pronouns. The overhead due to this twofold translation
of pronouns can be reduced substantially on the basis of syntactic constraints
on the distribution of referential versus bound variable pronouns (cf. Reinhart
(1983)).
We end this section with an illustration of actual translations as structures. Maaike is a lexical item. Its translation is λP.P(MAAIKE), of type <<e,t>,t>, so [λP.P(MAAIKE)<<e,t>,t>] is a basic syntactic/semantic structure. Here follow some examples of complex syntactic/semantic structures.
Maaike dances
[apply,
 [λS.S(λx.DANCE(x))<<<e,t>,t>,t>],
 [λP.P(MAAIKE)<<e,t>,t>]]

She sings
[apply,
 [λS.S(λx.SING(x))<<<e,t>,t>,t>],
 [she<<e,t>,t>]]

Every woman likes a man
in the reading ∀x(WOMAN(x) → ∃y(MAN(y)&LIKE(y)(x)))
[apply,
 [apply,
  [λSλT.T(λy.S(λx.LIKE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.∃y(MAN(y)&Q(y))<<e,t>,t>]],
 [λQ.∀x(WOMAN(x) → Q(x))<<e,t>,t>]]

Every woman likes a man
in the reading ∃y(MAN(y)&∀x(WOMAN(x) → LIKE(y)(x)))
[apply,
 [apply,
  [λSλT.S(λx.T(λy.LIKE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.∃y(MAN(y)&Q(y))<<e,t>,t>]],
 [λQ.∀x(WOMAN(x) → Q(x))<<e,t>,t>]]
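The distinction between a structure and its meaning can be illustrated with a small sketch (our own encoding; Python functions stand in for the lambda-terms, and meanings are rendered as strings):

```python
# Sketch: syntactic/semantic structures as nested ['apply', ...] lists;
# meaning() reduces a structure to its meaning by actually performing the
# function applications that the structure merely records.

maaike = lambda P: P('MAAIKE')                 # λP.P(MAAIKE)
dance = lambda S: S(lambda x: f"DANCE({x})")   # λS.S(λx.DANCE(x))

maaike_dances = ['apply', dance, maaike]       # a structure, not a meaning

def meaning(s):
    """Reduce a structure to its meaning by function application."""
    if isinstance(s, list) and s[0] == 'apply':
        return meaning(s[1])(meaning(s[2]))
    return s
```

Here `meaning(maaike_dances)` evaluates to `'DANCE(MAAIKE)'`, while the structure itself keeps the derivational history intact.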
The units that constitute a discourse may be complex units, constructed from
more than one simplex sentence. Syntactic/semantic structures of complex units
are simply constructed by considering the rhetorical relations that hold among
sentences and DCUs as primitive operators in the structures.
2.2 Most Specific Common Denominator
The definition of MSCD starts with the observation that in any unification-based formalism there is a notion of ‘most specific generalization’ (MSG), and this notion induces a pre-ordering on its objects according to the rule:
φ ⊑ ψ iff MSG(φ,ψ) = ψ
where φ ⊑ ψ is to be read as ‘φ is at least as specific as ψ’. We shall go the other way, and start with the ordering on the objects and subsequently define the notions of unification and generalization in terms of this ordering.
In the application that we have in mind, a specific consequence of the need to
talk about cohesion and relatedness of constants is that we want to use a many-sorted logic where relations hold among domains of the (non-disjoint) sorts.
This makes it possible to express that something is related to something else in
that both belong to the same subset of the domain. It also makes it possible to
express that the constant JOHN is more specific than a variable (or wildcard)
of sort male, and that a variable of sort male is more specific than one of sort
human, etc. The ordering of sorts is assumed to have a natural, Aristotelian,
thesaurus structure, as exemplified below.
[Thesaurus diagram. Constants with their most specific sorts: MAAIKE, BRIGITTE: girl; MARLIES, MARY: woman; JOHN, FRED: boy; HARRY, BILL: man; SOCCER: sport; COFFEE, TEA: drink; BRAZIL, SPAIN: country; LIKE, HATE: em-attitude (type entity→(human→tv)); EAT, DRINK: activity (type entity→(entity→tv)). Among the sorts, girl and woman fall under female, boy and man under male, female and male under human, human under animate, and drink under liquid; the top sort entity branches into animate and non-animate.]
We thus have, for every type α, a set of sorts Sorα, with a partial order ⊑α defined on it which has Dα itself as its top element. For instance, if the sort-structure depicted above is employed with the type logic defined in section 2.1, we identify the sort entity with De, and the sort relation with D<e,<e,t>>. The sort-function s maps any atomic expression of type α into an element of Sorα.
To exploit the sort-structure for articulating MSCDs, we need, for each sort, a wildcard (an anonymous variable) ranging over it. Therefore, every sort σ ∈ Sorα introduces a wildcard σ. We simply use underlined sort-names as names for wildcards (for instance, the wildcard boy ranges over the sort boy). Multiple occurrences of the same wildcard are treated independently of each other, cf. section 2.1.1.
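A minimal executable rendering of such a thesaurus (our own toy fragment, not the full structure above) might look as follows:

```python
# Sketch: SUB maps each sort to its immediate supersort, SORT gives each
# constant its most specific sort, and leq() decides the ordering u ⊑ v by
# walking up the supersort chain from u.

SUB = {'girl': 'female', 'woman': 'female', 'boy': 'male', 'man': 'male',
       'female': 'human', 'male': 'human', 'human': 'animate',
       'animate': 'entity'}
SORT = {'MAAIKE': 'girl', 'MARY': 'woman', 'JOHN': 'boy', 'HARRY': 'man'}

def leq(u, v):
    """u is at least as specific as v (u ⊑ v) in the sort ordering."""
    while u != v:
        if u not in SUB:       # reached a top sort without meeting v
            return False
        u = SUB[u]
    return True
```

For example, `leq(SORT['JOHN'], 'male')` holds, while `leq('girl', 'male')` does not.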
Before a definition of the ordering on the expressions of the language can be given, there is one complication that we have to deal with. We want it to be the case that, under this ordering, ∀x.MAN(x) and ∀y.MAN(y) are equivalent. The obvious way to achieve this is to allow for renaming of variables. But we do not want MAN(x) and MAN(y) to be equivalent. To achieve this, we carry a ‘variable renaming assignment’ ξ : Varα → Varα down into the recursion. We shall formulate a recursive definition over the complexity of the formulas:
φ(ψ) ⊑ξ χ(σ) iff φ ⊑ξ χ and ψ ⊑ξ σ
In the following definition, we shall allow quantifiers to change this assignment for the variables they bind, to allow arbitrary variables to match up. This definition specifies a relation between formulas such that if φ ⊑ξ ψ, then there is a substitution of the wildcards in ψ and a renaming of bound variables in ψ such that the result is φ.
Definition 3a (Specification)
For any type α and a,b ∈ Conα, u,v ∈ Sorα, x,y ∈ Varα and φ,ψ,χ,σ ∈ MEα, the relation φ ⊑ξ ψ (φ is a specification of ψ, relative to ξ) of expressions φ and ψ of the language is defined as follows:

Atomic Terms
a ⊑ξ b iff a = b
u ⊑ξ v iff u ⊑α v
a ⊑ξ u iff s(a) ⊑α u
x ⊑ξ y iff ξ(x) = ξ(y)

Logical Expressions
φ(ψ) ⊑ξ χ(σ) iff φ ⊑ξ χ and ψ ⊑ξ σ
Important special cases of this for φ,ψ,χ,σ ∈ MEt:
¬φ ⊑ξ ¬ψ iff φ ⊑ξ ψ
(φ&ψ) ⊑ξ (χ&σ) iff φ ⊑ξ χ and ψ ⊑ξ σ
For quantificational terms we have to define two auxiliary notions. For two assignments ξ and ζ, we write ξ ≈x,y ζ if the assignments differ only in the values they assign to x and to y, i.e. for all logical variables z not equal to x or y: ξ(z) = ζ(z). For two formulas φ and ψ, let bvar(φ,ψ) denote the set of logical variables that are bound by a quantifier or λ-operator in φ or ψ.

Quantificational Terms
For φ,ψ ∈ MEt and x,y ∈ Varα (for arbitrary α) and Q ∈ {∀,∃,λ}:
Qx.φ ⊑ξ Qy.ψ iff ∃ζ ≈x,y ξ such that ζ(x) = ζ(y) and ζ(x) ∉ bvar(φ,ψ) and φ ⊑ζ ψ
These rules express the normal recursion into a formula, allowing two formulas to be compared pointwise. To this we add one extra rule that is somewhat different.15 We shall use it below to express topic chaining. The set from which the relation R is taken is related to the set of rhetorical relations in discourse (cf. section 3.2).

Topic Chaining
φ(χ) R ψ(χ) ⊑ξ (λx.φ(x) R ψ(x))(χ), where R ∈ {and, but, because, . . . }
End Definition
In terms of this relation φ ⊑ξ ψ we can now define the ordering φ ⊑ ψ which will be the basis for the definition of unification and generalization. We start with the variable renaming id that renames no variable, which keeps MAN(x) and MAN(y) distinct: id(x) = x for all x.
Definition 3b (Ordering on the language)
For two arbitrary expressions φ and ψ of the same type, we write:
φ ⊑ ψ iff φ ⊑id ψ
φ ⊑ ψ is pronounced: φ is at least as specific as ψ, or: ψ is at least as general as φ.
End Definition
15 The fact that unification is defined in terms of the ordering has the advantage that checking the consistency of the induced unification structure amounts to checking whether the resulting structure is a pre-order. This formulation makes it possible to add other properties to the unification, as in this rule, without changing any of the objects defined, i.e. it makes extending this formalism beyond term unification as easy as adding extra axioms. Some examples might be:
φ&ψ ⊑ξ ψ&φ   (commutativity)
(φ&ψ)&σ ⊑ξ φ&(ψ&σ)   (associativity)
φ&ψ ⊑ξ σ iff φ ⊑ξ σ and ψ ⊑ξ σ   (left-distributivity)
σ ⊑ξ φ&ψ iff σ ⊑ξ φ and σ ⊑ξ ψ   (right-distributivity)
Another useful extension would make it possible to generalize over different quantifiers. This involves the notion of a ‘variable quantifier’, for which we use the notation Qx:
Qx.φ ⊑ξ Qx.φ   (quantifier abstraction)
Qx.φ ⊑ξ φ   (quantifier hiding)
In general, adding such axioms will make the computational complexity much higher than the
linear complexity of the term unification defined so far. (Note that the extra rule for Topic
Chaining also increases the complexity of the formalism.) The point is, however, that we do
not need to change any of the notions defined in terms of the order (like MGU, MSG and
MSCD), and that it allows us to add to the complexity in a piece-meal fashion rather than in
the all-or-nothing way that so-called higher order unification approaches employ.
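Definitions 3a/3b can be prototyped over a small first-order fragment as follows (our own encoding; the bvar side condition on bound variables is approximated by fresh names that object variables are assumed never to use):

```python
# Sketch: terms are ('const', a), ('wild', sort), ('var', x), ('app', f, arg)
# or ('q', Q, x, body) with Q in {'forall', 'exists', 'lambda'}. The renaming
# assignment xi is a dict; an absent key means the variable renames to itself,
# so calling spec with an empty dict gives the ordering relative to id.

SUB = {'girl': 'female', 'boy': 'male', 'female': 'human', 'male': 'human'}
SORT = {'MARY': 'girl', 'JOHN': 'boy'}

def leq(u, v):                      # the ordering on sorts
    while u != v:
        if u not in SUB:
            return False
        u = SUB[u]
    return True

def spec(p, q, xi=None, depth=0):   # p ⊑ξ q
    xi = xi or {}
    if p[0] == 'const' and q[0] == 'const':
        return p[1] == q[1]
    if p[0] == 'wild' and q[0] == 'wild':
        return leq(p[1], q[1])
    if p[0] == 'const' and q[0] == 'wild':
        return leq(SORT[p[1]], q[1])
    if p[0] == 'var' and q[0] == 'var':
        return xi.get(p[1], p[1]) == xi.get(q[1], q[1])
    if p[0] == 'app' and q[0] == 'app':
        return spec(p[1], q[1], xi, depth) and spec(p[2], q[2], xi, depth)
    if p[0] == 'q' and q[0] == 'q' and p[1] == q[1]:
        fresh = f'#v{depth}'        # assumed never to clash with object variables
        zeta = dict(xi, **{p[2]: fresh, q[2]: fresh})
        return spec(p[3], q[3], zeta, depth + 1)
    return False
```

With this, ∀x.MAN(x) ⊑ ∀y.MAN(y) holds while MAN(x) ⊑ MAN(y) fails, mirroring the distinction the variable-renaming assignment is designed to make.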
Note that the resulting structure is a pre-order, and not a weak ordering, which requires the antisymmetry condition (φ ⊑ ψ & ψ ⊑ φ) → φ = ψ, because there are expressions φ and ψ for which both φ ⊑ ψ and ψ ⊑ φ and yet φ ≠ ψ: for example, two terms with different bound variables. This means that anything defined in terms of this ordering will be unique at most modulo equivalence. This (rather innocent) complication holds for all unification algebras. In particular, it holds for the definitions of Most Specific Generalization and of Most General Unification. These definitions now amount to writing out the names of the operations:
Definition 4 (MSG)
The Most Specific Generalization (MSG) of φ and ψ (where φ and ψ are of the same type), written as φ ⊔ ψ, is an object χ (of that type) that is at least as general as both φ and ψ, such that any σ that is at least as general as both φ and ψ is not more specific than χ.
End Definition
Definition 5 (MGU)
The Most General Unification (MGU) of φ and ψ (where φ and ψ are of the same type), written as φ ⊓ ψ, is an object χ (of that type) that is at least as specific as both φ and ψ, such that any σ that is at least as specific as both φ and ψ is not more general than χ.
End Definition
If there is an object χ such that χ = φ ⊓ ψ, we say that φ and ψ ‘unify’. For elements of the same type, there is always an MSG: the top element of the pre-order, ⊤ (in our logic a wildcard), is at least as general as any element of that type. However, there is not always an MGU. If φ and ψ do not have a unifier, we sometimes write φ ⊓ ψ = ⊥, but this is only notation: ⊥ is not an element of the pre-order.
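For variable-free terms, MSG calculation can be sketched as follows (our own encoding and toy thesaurus, with sorts chosen to match example (50) in section 2.3):

```python
# Sketch of Definition 4: two different atoms generalize to a wildcard over
# their least common sort; applications generalize pointwise.

SUB = {'girl': 'female', 'boy': 'male', 'female': 'human', 'male': 'human',
       'human': 'entity'}
SORT = {'MARY': 'girl', 'SUSAN': 'girl', 'JOHN': 'boy', 'PETER': 'boy'}

def chain(u):                  # u and all its supersorts, bottom-up
    out = [u]
    while u in SUB:
        u = SUB[u]
        out.append(u)
    return out

def lcs(u, v):                 # least common supersort of u and v
    upper = set(chain(v))
    for a in chain(u):
        if a in upper:
            return a
    return 'entity'            # fall back to the top sort

def sort_of(t):
    return SORT[t[1]] if t[0] == 'const' else t[1]

def msg(p, q):                 # p ⊔ q for variable-free terms
    if p == q:
        return p
    if p[0] == 'app' and q[0] == 'app':
        return ('app', msg(p[1], q[1]), msg(p[2], q[2]))
    if p[0] != 'app' and q[0] != 'app':
        return ('wild', lcs(sort_of(p), sort_of(q)))
    return ('wild', 'entity')  # structurally different: only the top generalizes

sa = ('app', ('app', ('const', 'KISS'), ('const', 'JOHN')), ('const', 'MARY'))
sb = ('app', ('app', ('const', 'KISS'), ('const', 'PETER')), ('const', 'SUSAN'))
```

Here `msg(sa, sb)` yields the structure (KISS(boy))(girl), with boy and girl as sort wildcards.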
The definition of MSCD is very much like the definitions of MSG and MGU:
Definition 6 (MSCD)
The Most Specific Common Denominator (MSCD) of φ relative to ψ (where φ and ψ are of the same type), written as φ |c ψ, is an object χ (of that type) such that χ is at least as general as φ and unifies with ψ, and such that any σ which is also at least as general as φ and unifies with ψ is not more specific than χ.
End Definition
The term φ in φ |c ψ is sometimes called the default term (and ψ the non-default term).16
2.3 MSCD over Syntactic/Semantic Structures
The Most Specific Common Denominator of two syntactic/semantic structures indicates what these two structures have in common. It may also be viewed as a means by which a structure can inherit information, for instance the antecedents of its underspecified elements (such as pronouns and VP anaphors), from earlier structures. The MSCD is calculated over the structure of the incoming utterance and the structure of the DCU it is attached to. As an illustration of the application of Common Denominator calculation to syntactic/semantic structures, consider examples (50) and (51). (Whereas each verb in these examples has two translations, the examples show just one translation of the verb. The twofold translation is unimportant here because no quantifiers are involved in these examples.)
(50) (a) Mary kissed John
(b) (and) Susan kissed Peter.

Sa
[apply,
 [apply,
  [λSλT.T(λy.S(λx.KISS(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.Q(JOHN)<<e,t>,t>]],
 [λP.P(MARY)<<e,t>,t>]]

Sb
[apply,
 [apply,
  [λSλT.T(λy.S(λx.KISS(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.Q(PETER)<<e,t>,t>]],
 [λP.P(SUSAN)<<e,t>,t>]]
16 A peculiar property of the formalism as it is defined here is that there are no named unification variables. Apart from bound variables, which give rise to an innocent ambiguity, MSG, MGU and MSCD are all unique in this formalism. In a formalism that does have named unification variables, one has to make sure that they get the same value throughout a formula. Although MSG and MGU will still be unique, the MSCD will in general no longer be unique in such a formalism. A solution to this problem is given in van den Berg and Prüst (1990), which also discusses an application of such an extended formalism to unification of feature structures and a comparison of MSCD calculation with Default Unification (Bouma (1990)).
Sa |c Sb
[apply,
 [apply,
  [λSλT.T(λy.S(λx.KISS(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.Q(boy)<<e,t>,t>]],
 [λP.P(girl)<<e,t>,t>]]

The Most Specific Common Denominator reflects the fact that MARY and SUSAN generalize to an anonymous variable, or wildcard, girl (ranging over the sort girl), i.e. MARY ⊔ SUSAN = girl. Similarly, JOHN and PETER generalize to boy.
(51) (a) Maaike likes belly-dancing.
(b) Fred hates it.

Sa
[apply,
 [apply,
  [λSλT.T(λy.S(λx.LIKE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.Q(BELLY-DANCING)<<e,t>,t>]],
 [λP.P(MAAIKE)<<e,t>,t>]]

Sb
[apply,
 [apply,
  [λSλT.T(λy.S(λx.HATE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.Q(it)<<e,t>,t>]],
 [λP.P(FRED)<<e,t>,t>]]

Sa |c Sb
[apply,
 [apply,
  [λSλT.T(λy.S(λx.em-attitude(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.Q(BELLY-DANCING)<<e,t>,t>]],
 [λP.P(human)<<e,t>,t>]]

As a side effect of the calculation of the Common Denominator, the wildcard it unifies with λP.P(BELLY-DANCING)<<e,t>,t> (a simplified translation).
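The MSCD and default-unification steps of example (51) can be prototyped for a variable-free fragment (our own encoding and toy sorts; the full calculation of course runs over complete syntactic/semantic structures):

```python
# Sketch: mscd(d, n) generalizes the default term d just enough to unify with
# n (Definition 6); unify(p, q) computes a simple sorted MGU or None.

SUB = {'girl': 'female', 'boy': 'male', 'female': 'human', 'male': 'human',
       'human': 'entity', 'em-attitude': 'relation', 'activity': 'entity'}
SORT = {'MAAIKE': 'girl', 'FRED': 'boy', 'LIKE': 'em-attitude',
        'HATE': 'em-attitude', 'BELLY-DANCING': 'activity'}

def leq(u, v):
    while u != v:
        if u not in SUB:
            return False
        u = SUB[u]
    return True

def chain(u):
    out = [u]
    while u in SUB:
        u = SUB[u]
        out.append(u)
    return out

def lcs(u, v):
    upper = set(chain(v))
    for a in chain(u):
        if a in upper:
            return a
    return chain(u)[-1]

def sort_of(t):
    return SORT[t[1]] if t[0] == 'const' else t[1]

def unify(p, q):                       # MGU, or None if p and q do not unify
    if p[0] == 'app' and q[0] == 'app':
        f, a = unify(p[1], q[1]), unify(p[2], q[2])
        return ('app', f, a) if f and a else None
    if p[0] == 'wild' and q[0] == 'const':
        return unify(q, p)
    if p[0] == 'const' and q[0] == 'const':
        return p if p == q else None
    if p[0] == 'const' and q[0] == 'wild':
        return p if leq(SORT[p[1]], q[1]) else None
    if p[0] == 'wild' and q[0] == 'wild':
        if leq(p[1], q[1]):
            return p
        if leq(q[1], p[1]):
            return q
    return None

def mscd(d, n):                        # d |c n
    if unify(d, n):
        return d                       # d itself already unifies with n
    if d[0] == 'app' and n[0] == 'app':
        return ('app', mscd(d[1], n[1]), mscd(d[2], n[2]))
    if d[0] != 'app' and n[0] != 'app':
        return ('wild', lcs(sort_of(d), sort_of(n)))
    return ('wild', 'entity')          # shapes differ: give up to the top sort

c1 = ('app', ('app', ('const', 'LIKE'), ('const', 'BELLY-DANCING')),
      ('const', 'MAAIKE'))
s2 = ('app', ('app', ('const', 'HATE'), ('wild', 'entity')),   # 'it' as wildcard
      ('const', 'FRED'))

schema = mscd(c1, s2)        # (em-attitude(BELLY-DANCING))(human)
consem = unify(schema, s2)   # (HATE(BELLY-DANCING))(FRED)
```

The computed `schema` and `consem` match the MSCD and context-dependent interpretation derived for (51) above.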
When no confusion can arise, we shall write φ(ψ) as a shorthand for [apply,φ,ψ], and primed elements as abbreviations for full translations, as illustrated in (52).

(52) (a) Mary likes John
(b) Susan adores Peter

Sa        (like′(john′))(mary′)
Sb        (adore′(peter′))(susan′)
Sa |c Sb  (em-attitude(boy))(girl)

3 Discourse Parsing

3.1 Introduction
The grammar that we use to describe the hierarchical structure of discourse
is very similar to the one proposed in Scha and Polanyi (1988). A set of explicit rules assigns structural analyses (in the form of tree structures annotated
with labels and features including meaning representations) to discourses. The
composition of discourse structure out of so-called ‘discourse constituent units’
(DCUs) is described in a precise way by means of a unification grammar. The
rules describe how compound discourse constituent units are built up out of
their constituting DCUs.
The grammar comprises rules for building DCUs of different structural types.
Individual utterances are analyzed as basic DCUs whose feature structures may
be underspecified in several ways. As for complex DCUs, there are specific rules
for building different kinds of discourse subordinations (such as, for instance,
rhetorical subordinations, elaborations, interruptions) as well as for different
kinds of discourse coordinations (for instance, lists, question/answer pairs, narratives). Some relevant discourse grammar rules are articulated in section 3.2.
The grammar can be used for discourse parsing, i.e. for assigning structure
and semantics to an incoming discourse. Discourse structure is assigned on the
basis of rhetorical relations, discourse operators, parallelism and cohesion. Each
incoming clause is initially interpreted in a context-independent way. Every non-initial utterance in a discourse is assumed to be related by one (or possibly more)
of a fixed number of rhetorical relations to some earlier discourse constituent
unit. (cf. Talmy (1978), Hobbs (1979) for definitions of some rhetorical relations,
and Mann and Thompson (1988) for a more extensive taxonomy.) Rhetorical
relations obtain among larger discourse segments as well. Besides rhetorical relations, so-called ‘discourse operators’, like but, because, so, therefore, anyway,
play an important role in the recursive construction of DCUs. A crucial mechanism for the incorporation of a new utterance within the previous discourse is
the mechanism for establishing cohesion in discourse that we discussed in the
previous section (i.e. establishing what part of the syntactic/semantic structure
a DCU shares with the discourse unit it is attached to).
One difference with Scha and Polanyi (1988) is that we put more emphasis
on the view that discourse structure is built up through incremental construction. Accordingly, our grammar is formulated in terms of incremental parsing
rules rather than top-down generation rules. The discourse parsing process is
incremental, building discourse structure from left to right, one unit at a time.
A unit may be an elementary DCU, corresponding to an individual utterance,
or it may be a complex DCU that was built up locally by means of the same
rules.17
The incrementality of the parsing process entails that at any point in the process, only the right nodes (or vertices) in the ‘discourse parse tree’ are ‘open’ for expansion, i.e. available to form new (sub)structures, whereas the rest of the nodes are ‘closed’, as in the schematic example below. This means that only information on the right nodes of the existing discourse tree is addressed. In other words: a new DCU can only be added to a DCU that contains material up to the (right) end of the discourse so far. When a new DCU is added to an open node, all lower ones are closed off.
[Diagram: a schematic discourse parse tree in which the nodes on the right edge (the root and each rightmost daughter below it) are open; all other nodes are closed.]
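The right-edge restriction can be sketched directly (our own encoding of a parse tree as nested (label, children) tuples):

```python
# Sketch: the open nodes of a discourse parse tree are exactly those on the
# path from the root through each rightmost daughter.

def open_nodes(node):
    label, kids = node
    return [label] + (open_nodes(kids[-1]) if kids else [])

tree = ('root', [('A', [('A1', []), ('A2', [])]),
                 ('B', []),
                 ('C', [('C1', []), ('C2', [])])])
```

For this tree, `open_nodes(tree)` returns the root, its last daughter C, and C's last daughter C2; all other nodes are closed.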
Expansion of open nodes is carried out in accordance with the discourse grammar rules, which operate on DCUs. DCUs are represented by their category
and an n-tuple of attribute/value pairs. These attribute/value pairs represent
all relevant parameters of the DCU and its context. There are parameters to
represent context-independent semantics, context-dependent semantics, reference time, syntactic/semantic structure, a discourse referent set, spatial index,
modal index, and also indices indicating participants, speech event etc. Each
rule in the grammar may be used for expanding open nodes in a discourse
tree. Whether a rule actually applies depends on the constraints it poses on the
feature-values of the DCUs involved.
There are two kinds of rules: rules that construct a new DCU (construction
rules), and ones that extend an existing DCU (extension rules). Expansion of an
existing node N by means of a construction rule means: at the place of N , attach
the lefthand constituent L of the rule, node N becomes L’s leftmost daughter
(the first constituent on the right side of the rule), and the rest of the righthand
side constituents form the subsequent daughters of N . This is exemplified below.
17 It may be possible to re-formulate the grammar in a static way (so that in principle any convenient grammar formalism can be used).
Old Tree: the right edge of the discourse parse tree ends in the open node DCUN.
Construction Rule: DCUA −→ DCUN DCUB
New input instantiates DCUB
New Tree: at DCUN’s position a node DCUA appears, with the old DCUN as its leftmost daughter and DCUB as its second daughter.
Whereas DCUN was an open node in the old DPT, it is closed in the new tree.
In the latter tree, DCUA and DCUB are open nodes.
The other possible way to expand a DCUN , namely extending it by means
of an extension rule, is only possible if DCUN is n-ary with n ≥ 2. Extension
of a DCU comes down to adding a new daughter to that DCU. Like DCU
construction, DCU extension is subject to syntactic and semantic constraints
implemented in the grammar rules.
Old Tree: the open node DCUN has daughters DCU1 . . . DCUi.
Extension Rule: DCUN =⇒ DCUN DCUj
New input instantiates DCUj
New Tree: DCUN now has daughters DCU1 . . . DCUi DCUj, with the new DCUj added as the rightmost daughter.
Note that in an extension rule, the lefthand constituent is the same one as the
first constituent on the righthand side. For the sake of clarity, the arrows used
in extension rules are different from those in construction rules.
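The two expansion operations can be sketched on a dict-based tree (our own encoding; category and feature constraints are omitted):

```python
# Sketch: a construction rule DCU_A -> DCU_N DCU_B replaces the open node N
# by a new node A with the old N as leftmost daughter; an extension rule
# DCU_N => DCU_N DCU_j appends a new daughter to an n-ary (n >= 2) node.

def construct(node, parent_cat, new_dcu):
    old = {'cat': node['cat'], 'kids': node['kids']}
    node['cat'] = parent_cat          # node now plays the role of DCU_A
    node['kids'] = [old, new_dcu]     # the old node is the leftmost daughter

def extend(node, new_dcu):
    assert len(node['kids']) >= 2     # only n-ary DCUs can be extended
    node['kids'].append(new_dcu)

n = {'cat': 'dcu1', 'kids': []}
construct(n, 'list', {'cat': 'dcu2', 'kids': []})
extend(n, {'cat': 'dcu3', 'kids': []})
```

After these two steps the node has category list with daughters dcu1, dcu2, dcu3, mirroring first a construction and then an extension.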
The discourse grammar assumes a sentence grammar that assigns (possibly
underspecified) structured semantic representations to sentences. (This division
of labour was already proposed by Williams (1977).) Each incoming clause is
initially assigned a context-independent syntactic/semantic representation by a
sentence grammar as discussed in section 2. This results in the formation of a
basic DCU which has as one of its attribute-values a meaning representation
that is possibly underspecified for context-dependent elements. In the course
of the parsing process, the incoming DCU is placed in its structured discourse
context, thus forming a new structured discourse context for a next utterance.
In processing that utterance, the most recent open node that can be extended
by the incoming DCU is preferred. (Extension of a recent DCU is preferred
over constructing a new DCU because extension of a DCU shows that the input
is semantically strongly connected.) The context-dependent interpretation of a
DCU is determined by its integration in the structured discourse. (The grammar
rules describe how this context-dependent interpretation is formed.)
At every stage of the parsing process, the semantics of a new node in the
parse tree is built up out of the context-dependent semantics of its daughter
nodes. A return to a previously processed open DCU, possibly indicated by a
pop-marker (such as o.k., well, anyway), closes off the most recently processed
DCU and all intermediate DCUs. Note that the total semantics cannot be read off the top node; it must be collected from the individual nodes.
3.2 Grammar Rules
As indicated above, the discourse grammar comprises different rules for different structural types. The two main structural types are subordination and
coordination. Subordinations may be semantically related (so-called ‘semantic
subordinations’), or there may be no semantic relation as in the case of interruptions. Semantic subordinations are binary structures in which one DCU is
subordinate to the other: all or most of the relevant features (reference time, spatial index, modal index etc.) are inherited from the subordinating constituent.
Coordinations are n-ary (n ≥ 2) structures in which all elements have equal
status. In this section, we discuss a relevant sample of grammar rules. We have
not pursued completeness, but present a number of rules that are crucial for
discourse parsing in general and for an adequate treatment of VPA in particular. We first discuss some coordination structures. Most of these are essentially
characterized by parallelism, which is formalized by means of MSCD calculation. Though the computation of the semantics of all structures involves the
detection of parallelism, most subordination structures are not dependent on
the occurrence of parallelism. We refer to Scha and Polanyi (1988) and Prüst
(1992) for discourse grammar rules that cover additional structures.
In the grammar rules, the following operations, as defined in Section 2, are
used:
φ |c ψ   is the Most Specific Common Denominator (MSCD) of φ relative to ψ
φ ⊓ ψ    is the Most General Unification (MGU) of φ and ψ
φ ⊑ ψ    expresses that φ is at least as specific as ψ
An important application of these operations is (C1 |c S2) ⊓ S2: the (Most General) Unification of the structures (C1 |c S2) and S2, in which (C1 |c S2) is the (Most Specific) Common Denominator of C1 and S2. This (C1 |c S2) ⊓ S2 is also called the ‘default unification’ of C1 and S2 (cf. van den Berg and Prüst (1990)).
Lists
The first rule is the construction of a so-called List DCU out of other (possibly
complex) DCUs. Rule(I) formalizes the idea that a List structure may consist
of two DCUs of arbitrary types that have some non-trivial common semantics.
Rule(I) (List Construction)
list [sem: C1 R ((C1 |c S2) ⊓ S2), schema: C1 |c S2]
−→ DCU1 [sem: S1, consem: C1]
+ DCU2 [sem: R S2, consem: ((C1 |c S2) ⊓ S2)]
Conditions: C1 |c S2 is non-trivial, R ∈ {and, or, . . .}
Capital letters in a rule are used to indicate variables. Therefore the discourse
constituent units that constitute a List, ‘DCU1 ’ and ’DCU2 ’, may be of any
type. In the rules, R indicates the rhetorical relation which is taken as a primitive
operator in the syntactic/semantic structures, forming one new structure from
the component structures. In all rules, the overt realization of the rhetorical
relation R is optional. (Except for the disjunction realized by or in the List
rule, which cannot be implicit.) If the relation R is not overtly realized in List
structures, this indicates implicit coordination. As a parsing heuristic, explicitly
stated rhetorical relations should be treated first.
A List DCU comprises several attributes. For the sake of clarity, attributes
whose values do not constrain the formation of the DCU-structure under consideration will usually be omitted in the rules. Only the relevant attributes
are given in the rule: The sem attribute records the context-independent syntactic/semantic representation of the DCU. The attribute labelled ‘consem’ records the context-dependent syntactic/semantic representation. (The value of consem is determined by the context of the DCU; it is always at least as specific as the value of sem.) In all rules, except the rule for Question-Answer Units, the value of consem must always be grounded, i.e. it must not contain any wildcards.
Finally, the schema attribute records common structure (the MSCD). Other
attributes like discourse referent sets, reference time, spatial index, modal index
etc. are not considered here (but cf. Scha and Polanyi (1988)).
What the elements of a List have in common must be semantically significant. The constraint with regard to the common semantics of the elements of
a List is implemented as a condition on the MSCD. The List rule builds on
this calculation for the recognition of ’optimal’ parallel structure. In general,
important aspects of context-dependence (such as anaphoric links and dependence of lexical items) are detected by MSCD calculation. Applicability of the
rule depends on the value of the schema attribute, the MSCD, which must be
‘non-trivial’, i.e. a ‘characteristic generalization’ of the syntactic/semantic representations of the constituting DCUs. This is a property of the MSCD structure
that will be discussed further on; it amounts to the structure merely consisting
of marked elements from the thesaurus structure.
Rule(I) parses sequences such as the following:
(53) (a) Maaike likes belly-dancing.
(b) (and) Brigitte likes flamenco-dancing.
(54) (a) Maaike likes belly-dancing.
(b) Fred hates it.
(55) (a) Maaike likes belly-dancing.
(b) (and) Brigitte does too.
(56) (a) Maaike dances.
(b) Saskia sings.
Any incoming clause, possibly underspecified, is initially interpreted in a
context-independent way and a basic DCU is formed which has as one of its
attribute-values (sem) a meaning representation which may be underspecified
for context-dependent elements. The basic DCUs for the sentences in (54), for
instance, are the following (the syntactic/semantic structures are in fact more fine-grained than is shown here; the values in this example represent the top-level structures of derivation trees, cf. section 2):18
18 If the first constituting DCU is initial in the discourse, the interpretation of this DCU obviously does not depend on the preceding discourse. Therefore the context-dependent semantics (consem) of an initial DCU is by default taken to be identical to its context-independent semantics (sem). (Though the values of these features may be different, for instance on the basis of the non-linguistic context.)
dcu1 [sem: (like′(belly-dancing′))(maaike′),
      consem: (like′(belly-dancing′))(maaike′)]
dcu2 [sem: (hate′(it))(fred′), consem: ?]
Construction of a List DCU comes about as follows. The context-independent syntactic/semantic structure of the second DCU (S2) is matched with the contextually determined syntactic/semantic structure of the first one (C1). What these structures have in common, their MSCD (C1 |c S2), is required to embody a characteristic generalization in order to form a List.
C1 :        (like′(belly-dancing′))(maaike′)
S2 :        (hate′(it))(fred′)
C1 |c S2 :  (emotional-attitude(belly-dancing′))(human)
The MSCD is stored as the value of the schema attribute of the List, and it partly determines the syntactic/semantic structure that represents the context-dependent interpretation of DCU2, because the latter is taken to be the unification of the structures (C1 |c S2) and S2:

(C1 |c S2) ⊔ S2 :  (hate′(belly-dancing′))(fred′)
The (context-independent) semantics of the List is the conjunction of the context-dependent semantics of the constituting DCUs. The full structure that Rule(I) assigns to (54) is thus:
list1  sem: (like′(belly-dancing′))(maaike′) and (hate′(belly-dancing′))(fred′)
       schema: (emotional-attitude(belly-dancing′))(human)
  dcu1  sem: (like′(belly-dancing′))(maaike′)
        consem: (like′(belly-dancing′))(maaike′)
  dcu2  sem: (hate′(it))(fred′)
        consem: (hate′(belly-dancing′))(fred′)
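The matching and unification steps above can be made concrete in a small executable sketch (ours, not the authors' implementation; the nested-tuple encoding, the SORTS table and the '?'-wildcard convention are drastic simplifications of the syntactic/semantic structures and thesaurus of Section 2):

```python
# Toy sketch of MSCD calculation and resolution by unification.
# ('like', ('belly-dancing',), 'maaike') stands in for
# (like'(belly-dancing'))(maaike'); 'it' is an anaphoric wildcard.

SORTS = {  # toy thesaurus fragment: constant -> sort
    'maaike': 'human', 'fred': 'human',
    'like': 'emotional-attitude', 'hate': 'emotional-attitude',
}

def mscd(c1, s2):
    """Most Specific Common Denominator of antecedent c1 and incoming s2."""
    if s2 == 'it':                      # anaphoric wildcard: keep antecedent
        return c1
    if c1 == s2:
        return c1
    if isinstance(c1, tuple) and isinstance(s2, tuple) and len(c1) == len(s2):
        return tuple(mscd(a, b) for a, b in zip(c1, s2))
    if SORTS.get(c1) is not None and SORTS.get(c1) == SORTS.get(s2):
        return '?' + SORTS[c1]          # generalize to a common-sort wildcard
    return '?entity'                    # trivial generalization

def unify(schema, s2):
    """Instantiate s2 against the schema: wildcards in s2 take schema values."""
    if s2 == 'it' or (isinstance(s2, str) and s2.startswith('?')):
        return schema
    if isinstance(schema, tuple) and isinstance(s2, tuple):
        return tuple(unify(a, b) for a, b in zip(schema, s2))
    return s2

C1 = ('like', ('belly-dancing',), 'maaike')   # Maaike likes belly-dancing.
S2 = ('hate', 'it', 'fred')                   # Fred hates it.

schema = mscd(C1, S2)      # -> ('?emotional-attitude', ('belly-dancing',), '?human')
resolved = unify(schema, S2)   # -> ('hate', ('belly-dancing',), 'fred')
```

The schema mirrors the paper's (emotional-attitude(belly-dancing′))(human), and the unification step yields the context-dependent interpretation of the second DCU.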
As for the notion of characteristic generalizability: the idea is that two terms are characteristically generalizable if they are generalizable to an object which exhibits important characteristics of both terms. Of course, such a notion depends on linguistic factors like local and global context and genre, and on non-linguistic factors such as world knowledge. Formally, it may be covered by a refinement of the thesaurus structure which is used for MSCD calculation (cf. Section 2.2), as illustrated below.
MAAIKE, BRIGITTE ⊑ girl ⊑C female ⊑C human ⊑C animate ⊑ entity
JOHN, FRED ⊑ boy ⊑C male ⊑C human ⊑C animate ⊑ entity
COFFEE, TEA ⊑ drink ⊑C liquid ⊑ non-animate ⊑ entity

(⊑ : generalizes to; ⊑C : generalizes characteristically to)
A pair of terms which are characteristically generalizable is preferred over a pair of terms which are generalizable but not characteristically generalizable. Among other things, the above ordering implies that the constants JOHN and MAAIKE, via the sorts boy and girl that are assigned to them, are characteristically generalizable (indicated by ⊑C) to human (a wildcard ranging over the sort human, cf. Section 2). However, JOHN and COFFEE are only non-characteristically generalizable (to entity). In general, a structure is 'non-trivial' if it merely consists of terms that are marked in the thesaurus structure as being characteristic generalizations. A structure is 'trivial' if it contains some wildcard that embodies no characteristic generalization. In the current approach all semantically plausible matches within the antecedent clause are considered. In case of ambiguity, the match which exhibits the strongest semantic relation is preferred. The preference ordering of the different possibilities is not articulated in this paper.
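The role of the marked steps can be sketched as follows; the CHAINS encoding is our own hypothetical rendering of the thesaurus fragment above, with each step flagged for whether it is a characteristic generalization:

```python
# Each constant gets a chain of (sort, characteristic?) steps, most specific
# first; a shared sort reached through characteristic steps only yields a
# characteristic generalization.

CHAINS = {
    'maaike': [('girl', True), ('female', True), ('human', True),
               ('animate', True), ('entity', False)],
    'john':   [('boy', True), ('male', True), ('human', True),
               ('animate', True), ('entity', False)],
    'coffee': [('drink', True), ('liquid', True),
               ('non-animate', False), ('entity', False)],
}

def reachable(c):
    """Sorts reachable from constant c, mapped to whether the whole
    generalization path up to that sort is characteristic."""
    out, char = {}, True
    for sort, marked in CHAINS[c]:
        char = char and marked
        out[sort] = char
    return out

def generalize(c1, c2):
    """Most specific shared sort of c1 and c2, plus a flag saying whether
    the generalization is characteristic for both constants."""
    r1, r2 = reachable(c1), reachable(c2)
    for sort, _ in CHAINS[c1]:          # chains run most-specific first
        if sort in r2:
            return sort, r1[sort] and r2[sort]
    return None

print(generalize('maaike', 'john'))   # shared sort human, characteristic
print(generalize('john', 'coffee'))   # only entity, non-characteristic
```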
Extension
In order to build the List for (57), a rule for List extension is needed. Extension
rules extend DCUs that have already been built.
(57) (a) Maaike likes belly-dancing.
(b) Brigitte likes flamenco-dancing.
(c) (and) Matilda likes waltzing.
The following rule can be applied to extend an (n − 1)-ary List (n > 2) to an
n-ary List. (The rule may also be employed to extend Question-Answer Units,
discussed further on.)
Rule(II) (Extension)
DCUK [sem: S1 R ((S |c Sn) ⊔ Sn), schema: S]
  =⇒ DCUK [sem: S1, schema: S]
     + DCUn [sem: R Sn, consem: ((S |c Sn) ⊔ Sn)]
Conditions: S |c Sn ⊒ S, R ∈ {and, or, . . .}
The condition on Rule(II), S |c Sn ⊒ S, says that the Common Denominator of S and Sn must be at least as specific as S. This has the effect that the schema S is 'applicable' to Sn. This condition expresses a rigorous requirement on the extension of Lists, because a once established schema S is required to be the schema of all clauses of the List. (One might consider relaxing this restriction to allow more freedom in List extension, for instance by demanding the value of the schema attribute of the extended List to be a characteristic generalization of all its elements, irrespective of their order.) Thus the value of the schema attribute controls the expansion of List DCUs in the incremental parsing process.
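The applicability check behind Rule(II) — the incoming clause must instantiate the List's schema — can be sketched like this (our toy encoding again; SORT_OF is a stand-in for the thesaurus, and clauses are (predicate, subject) tuples):

```python
# Sketch of the Rule(II) applicability condition: an incoming clause can
# extend a List only if it is at least as specific as the List's schema.

SORT_OF = {'smoke': 'activity', 'drink': 'activity', 'write': 'activity',
           'dance': 'activity', 'john': 'boy', 'peter': 'boy',
           'harry': 'boy', 'maaike': 'girl'}

def instantiates(term, schema):
    """True iff term is at least as specific as schema."""
    if isinstance(schema, str) and schema.startswith('?'):
        # wildcard over a sort: the term's head must belong to that sort
        head = term[0] if isinstance(term, tuple) else term
        return SORT_OF.get(head) == schema[1:]
    if isinstance(schema, tuple) and isinstance(term, tuple) \
            and len(term) == len(schema):
        return all(instantiates(t, s) for t, s in zip(term, schema))
    return term == schema

def can_extend(list_schema, incoming):
    return instantiates(incoming, list_schema)

# schema (activity)(boy): 'Harry schreibt einen Brief' fits, 'Maaike dances'
# does not (wrong sort in subject position).
print(can_extend(('?activity', '?boy'), (('write', 'letter'), 'harry')))
print(can_extend(('?activity', '?boy'), ('dance', 'maaike')))
```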
(58) (a) John raucht. ('John smokes.')
(b) Peter säuft. ('Peter boozes.')
(c) (und) Harry schreibt einen Brief. ('(and) Harry is writing a letter.')
The rule extends Lists as in (58), which is assigned the following List structure:
list1  sem: (smoke′)(john′) and (drink′)(peter′) and (write′(letter′))(harry′)
       schema: (activity)(boy)
  dcu1  sem: (smoke′)(john′)
        consem: (smoke′)(john′)
  dcu2  sem: (drink′)(peter′)
        consem: (drink′)(peter′)
  dcu3  sem: (write′(letter′))(harry′)
        consem: (write′(letter′))(harry′)
The schema of the List that is constructed for John raucht. Peter säuft., namely (activity)(boy), is applicable to the third utterance Harry schreibt einen Brief. The Common Denominator of (activity)(boy) and (write′(letter′))(harry′) is simply (activity)(boy), which is at least as specific as itself because the specificity relation is reflexive. So, extension of the List is possible according to the rule above. In the discussion of this example in Section 1, we have argued that parallel aspect is required in these sentences. We shall not pursue this matter much further here, but it will be clear that the present approach provides the means to account for this, for instance by requiring parallel aspect via an aspect feature in the Extension rule.
For every List extension, if S1 and S2 are the context-independent syntactic/semantic structures of two successive constituting DCUs, the following condition holds:

S1 |c S2 ⊒ S2 |c S1 (i.e. S1 |c S2 is at least as specific as S2 |c S1)

This condition requires that what S1 can contribute to S2 is at least as much as what S2 can contribute to S1. It excludes cataphora in Lists as, for instance, in He walks. John whistles. and Bill doesn't. John walks. However, it does allow John got mad yesterday. He smashed a window. John never behaved aggressively before., but only as a structure which consists of two Lists (and not one tripartite List). This is in accordance with results in psycholinguistic research which show that renominalization may give rise to the construction of new DCUs, especially if the referent is not ambiguous, as in the last example (cf. Vonk (1991)).
Contrast Pairs
The following rule for Contrast Pairs parses sequences as exemplified by (59),
in which the DCUs involved give rise to a binary coordinative structure that is
somehow contrastive.
(59) (a) John went to university,
(b) but Bill started a business.
Rule(III) (Contrast Pairs)
con [sem: C1 R ((C1 |c S2) ⊔ S2), schema: C1 |c S2]
  −→ DCU1 [sem: S1, consem: C1]
     + DCU2 [sem: R S2, consem: ((C1 |c S2) ⊔ S2)]
Conditions: C1 |c S2 is non-trivial, R ∈ {but, however, conversely, on the other hand, . . .}
As before, the rhetorical relation R is taken as a primitive operator R. In the case of (59), the context-dependent interpretation of DCU2 (represented by ((C1 |c S2) ⊔ S2)) contrasts with its antecedent C1. The relation of contrast itself is not pursued here (cf. Asher (1993) for a definition).
Rhetorical Coordinations
Rule(IV) below parses Coordinations which involve a binary coordinating rhetorical relation R from {therefore, so, thus, . . .}, as exemplified by (60).
(60) (a) The weather was fine,
(b) so Fred went to the beach.
As before, the rhetorical relation may or may not be explicitly stated. Of course, implicit rhetorical relations like this are difficult to uncover; among other things, uncovering them requires world knowledge. Nevertheless, in order to account for discourse structure, uncovering the implicit rhetorical relations in discourse cannot be avoided. But the present state of the art is that implicit coordinative rhetorical relations must be "computed by abduction, magic, or a similar A.I. technique" (Scha and Polanyi (1988), p. 575).
Rule(IV) (Rhetorical Coordinations)
coo [sem: C1 R ((C1 |c S2) ⊔ S2), schema: (C1 |c S2)]
  −→ DCU1 [sem: S1, consem: C1]
     + DCU2 [sem: R S2, consem: ((C1 |c S2) ⊔ S2)]
Condition: R ∈ {therefore, so, thus, accordingly, . . .}
It is important to notice that the rule does not impose any condition on C1 |c S2, so the schema may be trivial. This means that, although parallel structure will be detected if it is present, absence of parallel structure does not prevent the application of the rule. (In fact, this reflects the view that this rhetorical relation cannot be captured in terms of structure sharing.)
At this point it should be noted that there is no one-to-one correspondence between connectors and rhetorical relations. For instance, the Rhetorical Coordination in (61), expressed by means of the connector therefore, may also be expressed by means of and, as in (62).
(61) (a) The brakes refused to act,
(b) therefore John crashed into a wall.
(62) (a) The brakes refused to act,
(b) and John crashed into a wall.
Unlike the previous Coordination structures, Rhetorical Coordination is not characterized by the occurrence of parallelism. The heuristics of the parsing process, however, prefer applying rules on the basis of parallel structure. This means that when an explicit coordinator like and is encountered, the rule for Lists is tried first. If this fails (as in (62)), the rule for Rhetorical Coordinations is tried.
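This preference order can be sketched as follows (a sketch under our toy encodings; toy_mscd and toy_nontrivial are crude stand-ins for the real MSCD calculation and the non-triviality test, which requires characteristic generalizations):

```python
# Rule-ordering heuristic: with a coordinator like 'and', the List rule
# (which needs a non-trivial MSCD, i.e. parallel structure) is tried before
# Rhetorical Coordination (which imposes no condition on the schema).

LIST_CONNECTORS = {'and', 'or'}
RHET_CONNECTORS = {'and', 'therefore', 'so', 'thus', 'accordingly'}

def parse_pair(c1, s2, connector, mscd_fn, nontrivial):
    schema = mscd_fn(c1, s2)
    if connector in LIST_CONNECTORS and nontrivial(schema):
        return ('List', schema)           # preferred: parallel structure found
    if connector in RHET_CONNECTORS:
        return ('RhetCoord', schema)      # fallback: schema may be trivial
    return None

# Toy MSCD over (predicate, subject) pairs: differing atoms collapse to '?'.
toy_mscd = lambda a, b: tuple(x if x == y else '?' for x, y in zip(a, b))
toy_nontrivial = lambda s: any(x != '?' for x in s)

# Parallel clauses form a List; (62)-style pairs fall through to RhetCoord.
print(parse_pair(('dance', 'maaike'), ('dance', 'saskia'), 'and',
                 toy_mscd, toy_nontrivial))
print(parse_pair(('fail', 'brakes'), ('crash', 'john'), 'and',
                 toy_mscd, toy_nontrivial))
```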
Question-Answer Units
Question-Answer Units are constructed as binary structures, but may be extended to n-ary (n > 2) structures in case of a number of successive answers to one direct question. Whereas in List structures the value of the schema attribute is determined by calculation of the Common Denominator, in Question-Answer Units the syntactic/semantic structure of a direct question functions as an explicit Common Denominator for the interpretation of the answer. Therefore, the syntactic/semantic structure of the question (SQ) is taken as the schema of the resulting structure in the following rule.
Rule(V) (Question-Answer Units)
qua [sem: ((SQ |c SA) ⊔ SA), schema: SQ]
  −→ DCU1 [mood: interrogative, consem: SQ]
     + DCU2 [sem: SA, consem: ((SQ |c SA) ⊔ SA)]
     (pop-marker)
Condition: ((SQ |c SA) ⊔ SA) contains only ground terms (i.e. no wildcards) in its terminal nodes
The condition on the applicability of the rule ensures that all WH constituents
are ‘answered’. The value of the schema attribute functions as before in the
interpretation of the compound structure.
The rule for Question-Answer Units assigns to (63) the structure below.
(‘A’ and ‘B’ indicate different speakers.) A direct question is taken to focus on
the answer. (Questions may enforce an exhaustivity effect, cf. Groenendijk and
Stokhof (1984)).
(63) A: Who sold the painting?
B: Fred did.
qua1  sem: (sell′(painting′))(fred′)
      schema: (sell′(painting′))(human)
  dcu1  mood: interrogative
        consem: (sell′(painting′))(human)
  dcu2  sem: (do)(fred′)
        consem: (sell′(painting′))(fred′)
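A minimal sketch of Rule(V)'s mechanics (ours, not the authors' implementation; '?'-prefixed strings play the role of WH wildcards, the atom 'do' that of the elliptical answer):

```python
# The question's structure serves directly as the schema; the answer's
# consem is the unification of that schema with the answer clause, and the
# rule applies only if the result is fully ground (all WH slots filled).

def unify(schema, ans):
    if isinstance(ans, str) and ans == 'do':      # ellipsis: copy schema part
        return schema
    if isinstance(schema, tuple) and isinstance(ans, tuple):
        return tuple(unify(x, y) for x, y in zip(schema, ans))
    return ans

def ground(t):
    """True iff the structure contains no wildcards in its terminal nodes."""
    if isinstance(t, tuple):
        return all(ground(x) for x in t)
    return not t.startswith('?')

question = (('sell', 'painting'), '?human')   # Who sold the painting?
answer   = ('do', 'fred')                     # Fred did.

consem = unify(question, answer)
print(consem, ground(consem))   # the WH slot is filled, so the rule applies
```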
The rule doesn’t distinguish between full and elliptical answers so that the
structure of (63) is comparable, for instance, to that of (64).
(64) A: Who sold the painting?
B: Fred sold the painting.
Example (65) illustrates that in cooperative conversation, an explicit question serving as an explicit Common Denominator is not always necessary to achieve the same effect.
(65) A: Someone must have sold the painting.
B: Fred did.
The discourse grammar rules explain the correspondence as well as the distinction between (63)/(64) and (65). Whereas in (63)/(64) the structure of the question functions as a Common Denominator for the interpretation of the answer, in (65) the interpretation of the second utterance is obtained by calculating the Common Denominator over the two structures involved.
Question-Answer Units may be extended, by employing Rule(II), to represent a direct question with a number of successive answers. In general, an existing DCU can only be extended by Rule(II) if the value of its schema attribute is instantiated by an incoming DCU, i.e. if the semantics that is common to all daughters is also present in the new DCU. In the case of Question-Answer Units, this ensures the right relation between the DCU that represents a direct question and its successive answers. Any extension of a Question-Answer Unit that does not instantiate its schema (so that the corresponding input does not constitute an answer to the direct question) is disallowed: attachment of the incoming DCU fails and must be tried elsewhere.
A Question-Answer Unit is optionally followed by a pop-marker. All discourse markers (push/pop-markers) and interruption-markers are treated as independent units, separate from the DCUs that they precede or follow.
Rhetorical Subordinations
Semantic Subordinations, i.e. Subordinations that are semantically related, consist of a subordinating DCU and a subordinated DCU. Both may be complex, though subordinated DCUs tend to be complex more often than subordinating ones. The two DCUs that constitute a subordination structure do not have equal status (as opposed to Lists, for instance). In the case of Rhetorical Subordinations, we shall assume (like Scha and Polanyi (1988)) that most of the structurally relevant features are inherited from the subordinating constituent (see Rule(VI) below). With regard to the semantic relation with other DCUs, the subordinating DCU plays a crucial role, whereas the subordinated DCU is less important.
Rule(VI) parses Semantic Subordinations which involve a rhetorical relation
such as ‘because’ or ‘since’. The variable R ranges over these subordinating
rhetorical relations.
Rule(VI) (Rhetorical Subordinations)
sub [f1:v1, . . . , fi:vi, index:k, sem: C1 R ((C1 |c S2) ⊔ S2), schema: C1]
  −→ DCU1 [f1:v1, . . . , fi:vi, index:k, sem: S1, consem: C1]
     + DCU2 [sem: R S2, consem: ((C1 |c S2) ⊔ S2)]
     (pop-marker)
Condition: R ∈ {because, since, . . .}
This rule assigns structure to sequences such as (66) and (67).
(66) (a) Fred went to the dentist
(b) because he needed a check-up.
(67) (a) Fred went to the beach
(b) because the weather was fine.
Example (67) illustrates that a common semantics established by structure sharing is not crucial to this rhetorical relation.
In some sense Rhetorical Coordinations and Rhetorical Subordinations may
be each other’s mirror image. However, the structural difference is important:
in Rhetorical Coordinations it is not the case that one part ‘dominates’ the
other.19
19 In fact, this structural difference may correspond to a semantic difference. Talmy (1978) argues that the main semantic difference between (1) and (2) below is that in the Rhetorical Coordination (2), the proposition 'the weather was fine' is independently asserted, whereas in (1), it is only expressed presuppositionally.
(1) Fred went to the beach because the weather was fine.
(2) The weather was fine, so Fred went to the beach.
4 A Discourse Analysis of VPA
In this section we discuss how VP anaphors with simple or complex antecedents are resolved. In general, anaphor resolution is viewed as a side effect of the process of establishing cohesion and discourse structure. In parsing discourse with a discourse grammar of the sort exploited here, VPA resolution is indeed a side effect of the general discourse parsing process. There is no need to stipulate an autonomous mechanism for VPA resolution, because the resolution mechanism is integrated in the formal account of discourse structure and discourse semantics.
There is no account of antecedent-contained VPA. From the point of view advocated here, antecedent-contained VPA involves a matching of smaller units than clauses. Though some ideas about this matching of smaller units are presented in Section 4.3, we shall delegate antecedent-contained VPA to sentence grammar. A clause that is underspecified by means of VPA is viewed as a complete discourse constituent unit. It is a complete DCU with an incomplete interpretation (just like most clauses which are added to an unfolding discourse). A DCU is 'complete' if it contains sufficient information to interpret it in the light of the foregoing discourse. We have seen that a crucial mechanism in the integration of a new utterance within the previous discourse is the mechanism for establishing the (Most Specific) Common Denominator of the syntactic/semantic structures of adjacent DCUs, which embodies a formalization of a notion of cohesion. Extraction of the Common Denominator is not only important for the articulation of cohesion, but also for a correct interpretation of VPA.
In some sense the treatment presented here links up with both Sag (1977) and Dalrymple, Shieber and Pereira (1991). Sag's (1977) notion of VP identity is incorporated in the broader notion of DCU parallelism, formalized by the Common Denominator, and the current treatment shares with Dalrymple, Shieber and Pereira (1991) the use of a unification mechanism for VPA resolution. In the current approach, however, the unification mechanism is exploited as a general mechanism to establish cohesion in discourse; contrary to Dalrymple, Shieber and Pereira (1991), it is not used as an ad hoc mechanism for the resolution of VPA. (Another difference is that Dalrymple, Shieber and Pereira (1991) use higher-order unification, while here first-order unification is used.) The current approach differs from both Sag (1977) and Dalrymple, Shieber and Pereira (1991) in that it essentially incorporates a notion of discourse structure.
4.1 Simple Antecedents
The ‘missing material’ of an anaphoric clause such as (68)(b) is contained in
the MSCD of the syntactic/semantic structures of the clauses involved, which
is given below.
(68) (a) Maaike dances
(b) (and) Brigitte does too.
Sa
[apply,
 [λS.S(λx.DANCE(x))<<<e,t>,t>,t>],
 [λP.P(MAAIKE)<<e,t>,t>]]

Sb
[apply,
 [do<<<e,t>,t>,t>],
 [λP.P(BRIGITTE)<<e,t>,t>]]

Sa |c Sb
[apply,
 [λS.S(λx.DANCE(x))<<<e,t>,t>,t>],
 [girl<<e,t>,t>]]

In the MSCD calculation, the wildcard do unifies with its antecedent λS.S(λx.DANCE(x)), and MAAIKE and BRIGITTE generalize to the wildcard girl (ranging over the sort girl). The full interpretation of the anaphoric clause is reflected by the structure that results from unifying the relevant MSCD with the structure of the (context-independent) anaphoric clause (Sb). This structure is given below:

(Sa |c Sb) ⊔ Sb
[apply,
 [λS.S(λx.DANCE(x))<<<e,t>,t>,t>],
 [λP.P(BRIGITTE)<<e,t>,t>]]
As described in Section 3.2, Rule(I) for List construction assigns the following List structure to (68).
list1  sem: (dance′)(maaike′) and (dance′)(brigitte′)
       schema: (dance′)(girl)
  dcu1  sem: (dance′)(maaike′)
        consem: (dance′)(maaike′)
  dcu2  sem: (do)(brigitte′)
        consem: (dance′)(brigitte′)
Thus the grammar rules for assigning structure and semantics to a discourse have VPA resolution as a side effect. (Note that the resolution mechanism covers Sag's (1977) idea of VP identity.)
In general, VPA induces parallel structure. As VPA may also occur in structures other than those characterized by the occurrence of parallelism, it seems appropriate to formulate this constraint separately:

VPA requires the MSCD of the syntactic/semantic structure of the antecedent DCU (say C1) and that of the anaphoric DCU (S2), i.e. C1 |c S2, to be non-trivial.

The general condition that C1 must consist merely of ground terms, i.e. not contain any non-logical variables, ensures that there are no 'holes' in the antecedent (just as there cannot be any 'holes' in a piece of discourse that has been interpreted in its context).
Resolution of an anaphoric clause which has no parallel syntactic/semantic structure in its antecedent fails because the requirement on parallelism is not met. This accounts for the following data:
(69) (a) A song was written by Maaike.
(b) * (and) Saskia did too.
(70) (a) The music, Maaike gave to him
(b) * (and) Saskia did too.
The syntactic/semantic structures of (69)(a) and (b), (λx.(write′(x))(maaike′))(a-song′) and (do)(saskia′), don't share any relevant structure. The same holds for (70). In general, the requirement that the structure of a DCU that incorporates a VP anaphor parallels that of its antecedent DCU follows from the requirement that the MSCD of the syntactic/semantic structures be non-trivial, i.e. a characteristic generalization of those structures.
It was argued in Section 1.3 that VPA induces structural parallelism. For instance, the interpretations of the sentences in (71), discussed before as (22), must incorporate parallel quantifier structures.
(71) (a) Every woman likes a man
(b) (and) every girl does too.
The manner in which we propose to account for structural parallelism is to incorporate quantifier distribution in the translation of the main verb of a clause. This idea goes back to the theory of 'Flexible Montague Grammar' (Hendriks (1993)). Flexible Montague Grammar (FMG) is a system of flexible category and type assignment that has been proposed to provide a better account (than original Montague Grammar) of quantifier scope ambiguities and a more adequate division of labour between the syntactic and semantic components of the
grammar. FMG deals with scope ambiguities as in (71)(a) by flexible typing of the verb like, so that the landing places of quantifiers are attached to the verb. In FMG, every expression is assigned a basic translation with a semantic type that is as 'low' as possible. General rules of raising and lowering derive translations with higher types from the basic translations. For instance, sentence (71)(a) needs the following basic translations:

every woman: λP.∀x(WOMAN(x) → P(x))
a man: λP.∃y(MAN(y) & P(y))
like: λyλx.LIKE(y)(x)
Two nonequivalent ways of raising the lexical translation of like lead to the following derived translations:

like: (a) λSλT.T(λy.S(λx.LIKE(x)(y)))
      (b) λSλT.S(λx.T(λy.LIKE(x)(y)))
In the present state of the theory, these 'lexical ambiguities' are incorporated directly in the translations of the verb, so that translations (a) and (b) are both stored in the lexicon. (This means that we need not incorporate the whole machinery of FMG, which would complicate the matching process.) So, quantifier distribution can be considered a property of the translation of the verb. Therefore structural parallelism may still be viewed as a property of the VPs involved. Combining translation (a) with the basic translations of a man and every woman results in the 'wide scope universal' reading; translation (b) leads to the 'wide scope existential' reading (S and T are variables of NP-type).
Now consider the following structures for (71)(a) and (b) and their MSCD:

Every woman likes a man
in the reading ∀x(WOMAN(x) → ∃y(MAN(y) & LIKE(y)(x)))

Sa
[apply,
 [apply,
  [λSλT.T(λy.S(λx.LIKE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.∃y(MAN(y) & Q(y))<<e,t>,t>]],
 [λQ.∀x(WOMAN(x) → Q(x))<<e,t>,t>]]
Sb
[apply,
 [do<<<e,t>,t>,t>],
 [λP.∀x(GIRL(x) → P(x))<<e,t>,t>]]

Sa |c Sb
[apply,
 [apply,
  [λSλT.T(λy.S(λx.LIKE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.∃y(MAN(y) & Q(y))<<e,t>,t>]],
 [λQ.∀x(female(x) → Q(x))<<e,t>,t>]]
The MSCD contains the translation of the verb, which incorporates the relevant quantifier distribution. As before, the full interpretation of the anaphoric clause is reflected by the structure that results from unifying the Common Denominator (Sa |c Sb) with the structure of the anaphoric clause (Sb):

(Sa |c Sb) ⊔ Sb
[apply,
 [apply,
  [λSλT.T(λy.S(λx.LIKE(x)(y)))<<<e,t>,t>,<<<e,t>,t>,t>>],
  [λQ.∃y(MAN(y) & Q(y))<<e,t>,t>]],
 [λQ.∀x(GIRL(x) → Q(x))<<e,t>,t>]]
This structure amounts to applying the mode of combination of the missing material (i.e. the non-parallel elements) to the (parallel) elements of the anaphoric clause. As the mode of combination is the same as in the antecedent clause, structural parallelism is accounted for. (The interpretation of (71)(a) and (b) in which the existential quantifier has wide scope is obtained in the same manner, but of course with the other translation of the verb.)
When combined with a proper name like John, as in (72)(a), both translations of the verb like result in the same verb phrase interpretation. This also holds for the VP anaphor in the second clause.
(72) (a) Every woman likes John
(b) (and) every girl does too.
(73) (a) Mary likes a man
(b) (and) every girl does too.
In (73) the two different interpretations of the VP like a man result in one and the same interpretation of the whole sentence: sentence (73)(a) has only one meaning (despite the two different VP interpretations of like a man). The second (anaphoric) sentence, however, has two readings: a wide scope and a narrow scope reading for a man. These two readings are obtained by preserving both syntactic/semantic representations of the antecedent clause, despite the fact that they give rise to the same sentence interpretation.
Note that this approach builds essentially on the compositionality of the semantics, i.e. on the fact that the meaning of a complex expression is a mathematical function of the meanings of its syntactic parts and their mode of combination (cf. Janssen (1983)). Usually, compositionality is a preliminary requirement for the elegance and credibility of a theory. In the current theory, compositionality is more than a philosophically and practically motivated principle; it is essential for an adequate treatment of VPA, because this requires taking into account the mode of combination of VPA antecedents to deal with parallelism.
The treatment of instances of VPA that involve indexical parallelism follows directly from the definition of the matching process (Section 2). For instance in (8), repeated here as (74), the context-dependent syntactic/semantic structure of (a), which incorporates the reference of him, is matched with the structure of the anaphoric clause, so that the interpretation of the pronoun is incorporated in the interpretation of the VP anaphor.
(74) (a) Mary likes him
(b) (and) Susan does too.
Let us consider again example (25), repeated here as (75), which was discussed in Section 1.3.1.
(75) (a) Everyone told a man that Mary likes him
(b) (and) everyone told a boy that Susan does.
As quantifier distribution is attached to the main verb, matching the main verbs of the clauses only yields a relevant Common Denominator in case of parallel quantifier distribution. Parallel interpretations in which the pronoun him is bound by one of the quantifiers are accounted for, because again MSCD calculation only yields a relevant Common Denominator in case of parallel binding. If him refers to some discourse entity, again only the parallel interpretation of the VP anaphor is obtained, because the syntactic/semantic structure of an incoming DCU is always matched with the structure that represents the context-dependent interpretation of the antecedent. This means that the interpretation of him is 'passed on' to the interpretation of the VP anaphor.
As for the phenomenon of strict versus sloppy identity, the account is a standard one: the dichotomy of 'bound variable' versus 'referential' pronouns is reflected by the twofold translation of pronouns, namely as a logical variable and as a wildcard. The discourse framework, however, accounts for the influence of the context in deciding which interpretation is relevant in a certain discourse. For instance, the strict interpretation is especially relevant if John's hat has somehow been established as part of the discourse topic in the previous discourse: if John's hat is physically present and under discussion, if it has explicitly been put forward as a discourse topic by Let's discuss John's hat. or What about John's hat? or Who likes John's hat?, or by topic chaining as in (76):
(76) (a) John’s hat is old-fashioned,
(b) (but) John likes his hat
(c) (and) Fred does too.
The anaphoric clause (76)(c) only has the strict interpretation in this context. Thus the discourse context may disambiguate a sequence which, in itself, i.e. without further context, is characterized by a sloppy/strict ambiguity. We refer to Prüst (1992) for a treatment of this observation within the discourse framework.
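The twofold translation of the pronoun can be illustrated in a toy sketch (ours; the tuple encoding and function names are hypothetical, not the paper's representations):

```python
# The pronoun 'his' translates either as a variable bound by the subject
# (sloppy reading) or as a fixed referent (strict reading); copying the VP
# to a new subject then yields the two interpretations of 'Fred does too'.

def like_his_hat(pronoun_ref=None):
    """VP translation; pronoun_ref is None for the bound-variable reading,
    or a constant for the referential reading."""
    if pronoun_ref is None:
        return lambda subj: ('like', ('hat-of', subj), subj)        # sloppy
    return lambda subj: ('like', ('hat-of', pronoun_ref), subj)     # strict

sloppy_vp = like_his_hat()          # bound variable
strict_vp = like_his_hat('john')    # referential, fixed by the context

print(sloppy_vp('fred'))   # Fred likes Fred's hat
print(strict_vp('fred'))   # Fred likes John's hat
```

In a context like (76), where John's hat is the established discourse topic, only the strict_vp translation survives the matching process.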
Instances of VPA in which several adjuncts are involved, such as (32), repeated below as (77), are treated in the same way as the ones above.
(77) (a) John jogs in the street in the morning,
(b) (and) Mary does in the park.
However, these cases demand a more sophisticated notion of unification. We
come back to this in section 4.3.
4.2 Complex Antecedents
It was shown in Section 1 that VP anaphors do not always refer to a single preceding VP; they may have complex antecedents. The exact reference possibilities are determined by the structural relations in the discourse. Formulating the restrictions on the reference possibilities of VP anaphora in terms of the parsing process, we may say that VP anaphora can only refer to open DCUs. If a VP anaphor does not refer to the immediately preceding sentence DCU, the parsing algorithm, following the general parsing strategy, recursively tries to attach the anaphoric clause to the previous DCU that is open for expansion.
We have seen in Section 1.4 that the reference possibilities of VP anaphora indicate that in case of topic chaining, complex predicates need to be available for VPA, whereas a change of the argument in subject position limits the range of VPA. For instance, in (78), in contrast with (79), the VP anaphor may refer to sing but also to dance and sing.
(78) (a) Maaike dances.
(b) She sings.
(c) Saskia does too.
(79) (a) Maaike dances.
(b) Brigitte sings.
(c) Saskia does too.
Topic chaining may occur in Lists. For instance, (80) (an example from Polanyi (1985), who classifies this structure as a 'Topic Chain') shows a series of predications about the same argument.
(80) (a) John is a blond.
(b) He weighs about 215.
(c) He’s got a nice disposition.
(d) He works as a guard at the bank.
In the first sentence of the Topic Chain, this argument is usually expressed by a full NP which occurs in subject position or as a preposed constituent. In the other clauses, it is usually a pronoun in subject position. Polanyi (1985) argues that Topic Chains are "conceived of as more restrictive than merely chains of clauses sharing a common sentential topic". The propositions in (80), for instance, not merely share the sentence topic John, but all express something like a "generally known and knowable property of John at the present time" (Polanyi (1985), page 309).
Topic chaining is relatively independent of the rhetorical relations that hold
among DCUs. Topic chaining may occur within different kinds of rhetorical
structures, as exemplified by (81) through (83), respectively a List, a Rhetorical
Subordination and a List of Question-Answer Units.
(81) (a) Maaike dances.
(b) She sings.
(82) (a) Maaike likes belly-dancing
(b) The reason is that she dances herself.
(83) (a) What does John like for dinner?
(b) Spinach.
(c) Does he eat meat?
(d) Yes.
Since topic chaining is independent of the rhetorical relations, it is accounted for in the ordering of the expressions of the language by means of an axiom (cf. definition 3a in Section 2.2):

φ(χ) R ψ(χ) ξ (λx.φ(x) R ψ(x))(χ), where R ∈ {and, but, because, . . .}

For instance, Rule(I) assigns to (78)(a)(b) a structure in which dance′(maaike′) and sing′(maaike′) is the syntactic/semantic structure of the List. The above axiom says that

dance′(maaike′) and sing′(maaike′) ξ (λx.dance′(x) and sing′(x))(maaike′)
In the MSCD calculation, this is exploited so that (do)(saskia′) from (78)(c)
matches with (λx.dance′(x) and sing′(x))(maaike′) and resolves to (λx.dance′(x)
and sing′(x))(saskia′) to obtain the complex-antecedent interpretation.
Note that the syntactic/semantic representation of the List of (79)(a)(b) does
not have a more specific version, because these clauses have no argument in
common. Therefore, the MSCD of (79)(c), namely (do)(saskia′), and (a)+(b),
namely (dance′)(maaike′) and (sing′)(brigitte′), is trivial (cf. section 3.2). So
(79)(c) cannot be attached to the DCU that represents the List of (79)(a)(b);
it can only be attached to the DCU that represents (79)(b): only the MSCD of
the syntactic/semantic structures of (c) and (b) gives an appropriate result.
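The interaction between the axiom and the MSCD calculation can be sketched in a few lines of Python. This is our own illustration, not the formal system of section 2: terms are nested tuples, mismatching subterms in the anti-unification simply become numbered wildcards, and the resolution step is omitted.

```python
# Sketch (not the paper's formalism): ('and', t1, t2) is a List,
# ('app', f, a) is application, ('lam', 'x', body) is abstraction.

def mscd(t1, t2):
    """Most specific common generalization of two terms; mismatching
    subterms become numbered wildcards ('?', n)."""
    counter = [0]
    def go(a, b):
        if a == b:
            return a
        if (isinstance(a, tuple) and isinstance(b, tuple)
                and len(a) == len(b) and a[0] == b[0]):
            return (a[0],) + tuple(go(x, y) for x, y in zip(a[1:], b[1:]))
        counter[0] += 1
        return ('?', counter[0])
    return go(t1, t2)

def factor_common_argument(term):
    """Topic-chaining axiom (sketch): rewrite  P(a) R Q(a)  as
    (lam x. P(x) R Q(x))(a); return None if the arguments differ."""
    r, (_, p, a1), (_, q, a2) = term
    if a1 == a2:
        return ('app', ('lam', 'x', (r, ('app', p, 'x'), ('app', q, 'x'))), a1)
    return None

# (78)(a)+(b): shared argument, so the factored form exists.
ex78 = ('and', ('app', 'dance', 'maaike'), ('app', 'sing', 'maaike'))
# (79)(a)+(b): no shared argument, so no factored form.
ex79 = ('and', ('app', 'dance', 'maaike'), ('app', 'sing', 'brigitte'))
```

For (78) the factored form matches (do)(saskia) while preserving the predicate-argument structure, whereas for (79) the MSCD with the unfactored List collapses to a single wildcard, i.e. it is trivial.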
We have argued that discourse (39), repeated here as (84), gives rise to two
possible structures which entail two different interpretations of the VP anaphor.
(84) (a) John went to the library.
(b) He borrowed a book on computer science.
(c) Bill did too.
The anaphoric clause (c) may have (b) as antecedent or it may have the conjunction of (a) and (b) as antecedent DCU. (Just (a) as antecedent is also possible
with certain continuations after (c); this is discussed further on.) According to
Rule(I) of the discourse grammar, the first two sentences are parsed into a List
structure. The Common Denominator and semantics of this List are indicated
in the structure below:
list1  sem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
       schema: (activity)(john′)
  dcu1  sem: (go-to-lib′)(john′)
        consem: (go-to-lib′)(john′)
  dcu2  sem: (borrow′(book-on′(computer-science′)))(he)
        consem: (borrow′(book-on′(computer-science′)))(john′)
Only the right edges of this tree (labelled ‘list1 ’ and ‘dcu2 ’) are open for expansion. The next (anaphoric) clause may either be attached to the DCU of
sentence (b) (i.e. dcu2 ) or to the DCU labelled list1 . Two possible interpretations of the VP anaphor in (c) result from the two possible ways to incorporate
this clause in the structure above.
A first possibility is to attach it by expanding the edge labelled dcu2 by
applying Rule(I). This results in the discourse structure below. The VP anaphor
is interpreted here as referring to ‘borrow a book on computer science’.
list1  sem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
       schema: (activity)(john′)
  dcu1  sem: (go-to-lib′)(john′)
        consem: (go-to-lib′)(john′)
  list2  sem: (borrow′(book-on′(computer-science′)))(john′)
              and
              (borrow′(book-on′(computer-science′)))(bill′)
         schema: (borrow′(book-on′(computer-science′)))(boy)
    dcu2  consem: (borrow′(book-on′(computer-science′)))(john′)
    dcu3  sem: (do)(bill′)
          consem: (borrow′(book-on′(computer-science′)))(bill′)
The other possible way to build a discourse structure according to the grammar
rules is by expanding the edge labelled list1 to a(nother) List, again by applying
Rule(I). This has as a side effect that the VP anaphor is interpreted as ‘go to
the library and borrow a book on computer science’.
list2  sem: (go-to-lib′ and borrow′(book-on′(computer-science′)))(john′)
            and
            (go-to-lib′ and borrow′(book-on′(computer-science′)))(bill′)
       schema: (go-to-lib′ and borrow′(book-on′(computer-science′)))(boy)
  list1  sem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
         schema: (activity)(john′)
         consem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
    dcu1  sem: (go-to-lib′)(john′)
          consem: (go-to-lib′)(john′)
    dcu2  sem: (borrow′(book-on′(computer-science′)))(he)
          consem: (borrow′(book-on′(computer-science′)))(john′)
  dcu3  sem: (do)(bill′)
        consem: (go-to-lib′ and borrow′(book-on′(computer-science′)))(bill′)
Thus in many cases, structural constraints are accounted for simply by the
notion of accessibility, indicated by open versus closed nodes in the discourse
parse tree. This is further illustrated by (85).
(85) (a) John prepared his exam thoroughly:
(b) He decided not to go on holiday.
(c) He spent the whole week studying
(d) and every day he went to bed early.
(e) Mary did too.
The fact that in (85), the VP anaphor may refer to the whole preceding text,
or to the VP in the directly preceding sentence, (85)(d), is partly an artifact
of the structure of this DCU. Schematically the discourse parse tree of (85)(a)
through (d) is:
                    DCUelaboration
                   /              \
              DCU(a)          DCUelaborated
                             /      |      \
                        DCU(b)  DCU(c)  DCU(d)
Only the rightmost nodes are open for expansion. So in this structure, three
nodes are open for expansion: the top node DCUelaboration, the node DCUelaborated
containing the elaborated part ((b) through (d)), and the node DCU(d) representing
clause (d). (There are probably pragmatic reasons to prefer one possibility over
the others.) The respective interpretations are: ‘Mary prepared her exam
thoroughly: She decided not to go on holiday. She spent the whole week studying
and every day she went to bed early.’, ‘Mary decided not to go on holiday. She
spent the whole week studying and every day she went to bed early.’, and ‘Every
day Mary went to bed early.’.
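The accessibility regime itself (only the root and, recursively, the rightmost daughter of each open node are open for expansion) is easy to state procedurally. The following Python sketch, with tree shapes and labels of our own choosing, computes the open nodes of the schematic tree for (85):

```python
# Sketch of right-edge accessibility in a discourse parse tree.
# An internal node is (label, [children]); a leaf is a plain string.

def open_nodes(tree):
    """Return the labels on the right frontier: the root plus,
    recursively, the rightmost daughter of every open node."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    return [label] + open_nodes(children[-1])

# Schematic tree for (85)(a)-(d): (a) is elaborated by the List (b)-(d).
tree85 = ('DCU_elaboration',
          ['DCU(a)',
           ('DCU_elaborated', ['DCU(b)', 'DCU(c)', 'DCU(d)'])])

print(open_nodes(tree85))  # ['DCU_elaboration', 'DCU_elaborated', 'DCU(d)']
```

The three open nodes correspond exactly to the three attachment sites, and hence the three interpretations of (85)(e); DCU(a) through DCU(c) are closed.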
As for embedded verb phrase anaphora, let’s reconsider (46), repeated here
as (86):
(86) (a) John went to the library.
(b) He borrowed a book on computer science.
(c) Bill did too.
(d) He borrowed a book on French.
In this case, the parallelism between (a)+(b) and (c)+(d) must be taken into
account, as discussed in section 1.4. Remember that parsing (86)(a) through (c)
gave rise to two possible structures. In one structure, the VP anaphor in (c) is
interpreted as ‘borrow a book on computer science’, and in the other structure
it is interpreted as ‘go to the library and borrow a book on computer science’.
Neither of these structures can simply be augmented incrementally by a DCU for
(86)(d) to account for the intended interpretation. It is because of examples like
this, that the parsing process cannot always proceed incrementally sentence by
sentence. We must allow the parser to build a complex DCU for (c)+(d) (with an
incomplete interpretation) locally, by means of the same grammar rules. Thus
clauses (c) and (d) are parsed into the following List:
list2  sem: (do)(bill′) and (borrow′(book-on′(french′)))(bill′)
       schema: (do)(bill′)
  dcu3  sem: (do)(bill′)
        consem: (do)(bill′)
  dcu4  sem: (borrow′(book-on′(french′)))(he)
        consem: (borrow′(book-on′(french′)))(bill′)
The antecedent of the VP anaphor is not found within this List, because the
semantic relation of contrastive kinship between (86)(c) and (86)(d), necessary
for VPA resolution, is not realized. Clauses (86)(a) and (86)(b) are, as before,
parsed into the following List structure (without further context, the value of
consem of this List is taken to be the same as the value of sem):
list1  sem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
       schema: (activity)(john′)
       consem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
  dcu1  sem: (go-to-lib′)(john′)
        consem: (go-to-lib′)(john′)
  dcu2  sem: (borrow′(book-on′(computer-science′)))(he)
        consem: (borrow′(book-on′(computer-science′)))(john′)
These DCUs of (86)(a)+(b) and (86)(c)+(d) form the constituents for the following structure, constructed by List rule(I):
list1  sem: ((go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′))
            and
            ((go-to-lib′)(bill′) and (borrow′(book-on′(french′)))(bill′))
       schema: (go-to-lib′)(boy) and (borrow′(book-on′(subject)))(boy)
  list2  sem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
         schema: (activity)(john′)
         consem: (go-to-lib′)(john′) and (borrow′(book-on′(computer-science′)))(john′)
    dcu1  sem: (go-to-lib′)(john′)
          consem: (go-to-lib′)(john′)
    dcu2  sem: (borrow′(book-on′(computer-science′)))(he)
          consem: (borrow′(book-on′(computer-science′)))(john′)
  list3  sem: (do)(bill′) and (borrow′(book-on′(french′)))(bill′)
         schema: (do)(bill′)
         consem: (go-to-lib′)(bill′) and (borrow′(book-on′(french′)))(bill′)
    dcu3  sem: (do)(bill′)
          consem: (do)(bill′)
    dcu4  sem: (borrow′(book-on′(french′)))(he)
          consem: (borrow′(book-on′(french′)))(bill′)
This structure justifies the relation of contrast. The VP anaphor is interpreted
as referring to ‘go to the library’.
4.3 Complex Parallelism
Until this point we have assumed that we are matching structures of full sentences
or bigger units, possibly with overt underspecified elements represented by
wildcards, in an unequivocal way. It will, however, be necessary to deal with
units smaller than clauses to account for Gapping (as in (87)) and (perhaps less
obviously) for some instances of VPA.
(87) (a) John drinks tea,
(b) and Fred coffee.
The relevant instances of VPA are cases like (32) and (34), repeated below as
(88) and (89).
(88) (a) John jogs in the street in the morning,
(b) (and) Mary does in the park.
(89) (a) John jogs in the street in the morning,
(b) (and) Mary does in the evening.
We have argued in section 1.3.2 that the semantic relation between elements of
the anaphoric clause and the elements of the antecedent strongly influences the
interpretation of the anaphoric clauses in these examples. Straightforward
first-order unification alone is not a sufficient device to cover these instances of
VPA. As in the case of utterances that exhibit Gapping, these instances of VPA
cannot be explained by some ‘pre-fabricated’ structure with holes/variables in it:
neither the number, nor the types, nor the places of the gaps (the ‘deletion
sites’) can be specified beforehand; they have to fall out of the interpretation
process.
The fact that real gaps do not involve any overt variables leads to a somewhat
different concept of the matching process that is needed, namely to a notion in
which the structure of the antecedent is used as a ‘default pattern’. The idea
is that new information is not only obtained by straightforward unification (for
overt underspecification in the case of ‘does’ and ‘jogs’ in (88)/(89)) but also by
some default inheritance (cf. the role of ‘in the morning’ in the interpretation of
(88)(b)). In general, the structure of a possible antecedent discourse unit may
be treated as a pattern in which the elements of the new utterance find their
parallels on the basis of structural accessibility, contrastive kinship and recency.
The context-dependent interpretation of the new utterance is then obtained
by substituting the new elements in place of their parallels in the pattern and
applying the same mode of combination as in the antecedent. In that case,
the syntactic/semantic representation of a preceding DCU is used as a default
pattern for the incoming clause, which means that there is no stipulation of
variables or empty structures on the basis of presumed gaps. This different
concept of the matching process needs further study.
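The default-pattern idea can be given a very rough procedural rendering. In the sketch below (our illustration, not a worked-out proposal) a clause is flattened into role-value pairs, and we simply assume that the parallels of the new elements have already been identified on the basis of contrastive kinship; finding those parallels is, of course, the hard part:

```python
# Sketch of the antecedent as 'default pattern': overt elements of the
# elliptical clause replace their parallels, and unmentioned material
# is inherited from the antecedent by default.

def interpret_with_default(pattern, new_elements):
    """pattern: role->value dict for the antecedent clause;
    new_elements: the overt material of the new clause, keyed by the
    role of its (already identified) parallel."""
    resolved = dict(pattern)       # start from the antecedent pattern ...
    resolved.update(new_elements)  # ... and overwrite the parallel slots
    return resolved

# (88)(a): John jogs in the street in the morning.
antecedent = {'subj': 'john', 'verb': 'jog', 'loc': 'street', 'time': 'morning'}

# (88)(b): (and) Mary does in the park.  'in the morning' is inherited.
interp88 = interpret_with_default(antecedent, {'subj': 'mary', 'loc': 'park'})

# (89)(b): (and) Mary does in the evening.  'in the street' is inherited.
interp89 = interpret_with_default(antecedent, {'subj': 'mary', 'time': 'evening'})
```

The contrast between (88) and (89) then comes out as a difference in which slot the new PP fills, exactly the decision that contrastive kinship is supposed to make.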
Another problem is to find the right notion of unification. Unrestricted higher
order unification (HOU) as proposed by Dalrymple, Shieber and Pereira (1991)
seems too powerful a mechanism. As long as the antecedent of a VP anaphor is a
single preceding VP, using HOU seems fine, but if we consider multiple reference
possibilities in a discourse like (78), repeated below as (90), it gives wrong results.
The
previous sections have shown extensively that the concept of a VP anaphor
referring to a single preceding VP is too limited and therefore wrong. The VP
anaphor in (90) may refer to ‘sing’ or to ‘dance and sing’.
(90) (a) Maaike dances.
(b) She sings.
(c) Saskia does too.
HOU will generate these interpretations depending on which element is considered parallel: if she in (90)(b) is the parallel element, HOU yields λx.sing(x) and
if Maaike in (a) is the parallel element, HOU yields λx.dance(x)&sing(x). In the
latter case, however, HOU also yields λx.dance(x)&sing(maaike) as a possible
outcome, resulting in ‘Saskia dances and Maaike sings’ as the interpretation of
the anaphoric clause. However, reference of the VP anaphor to ‘dance’ is not
always allowed. It is only possible with a continuation that parallels/contrasts
with (b) (e.g. ‘But she barks.’; compare example (86)). HOU yields this interpretation even without such a continuation, producing ‘Maaike sings’ for the second
conjunct. It is unclear how this problem can be solved without dislocating the
account of strict versus sloppy identity in Dalrymple, Shieber and Pereira (1991).
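The overgeneration can be made concrete. The sketch below (ours, not Dalrymple et al.'s algorithm) enumerates, for the antecedent of (90), the bodies obtainable by abstracting any subset of the occurrences of the parallel element maaike; unrestricted HOU admits, among others, the solution that leaves maaike in the second conjunct. (The vacuous case in which no occurrence is abstracted is included by the enumeration but is not a genuine HOU solution.)

```python
# Sketch of occurrence-wise abstraction: each occurrence of the parallel
# element may independently be kept or replaced by the variable 'x'.
from itertools import product

def abstractions(term, elem):
    """All bodies obtainable by replacing some subset of the occurrences
    of elem in term (a nested tuple) by the variable 'x'."""
    if term == elem:
        return [term, 'x']            # keep this occurrence, or abstract it
    if not isinstance(term, tuple):
        return [term]
    return [tuple(c) for c in product(*(abstractions(t, elem) for t in term))]

# Antecedent of (90)(c): dance(maaike) and sing(maaike).
body = ('and', ('dance', 'maaike'), ('sing', 'maaike'))
results = [('lam', 'x', b) for b in abstractions(body, 'maaike')]
for r in results:
    print(r)
```

Among the four candidates is λx.dance(x) and sing(maaike), which yields the unwanted ‘Saskia dances and Maaike sings’ reading discussed above.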
A type of example that indicates a possible direction for the notion of
unification that is needed concerns instances of VPA for which the unification
mechanism as it is currently formulated is not sufficient, because they show a more
complex parallelism. Consider the following example (from Tanya Reinhart,
p.c.).
(91) (a) Lucie tells everyone that she loves him,
(b) but no one ever believes that she does.
In this example, everyone and no one are parallel elements. This interesting
instance of indexical parallelism shows that for an overall treatment of VPA,
it must also be possible to match a clause containing a two-place predicate,
say ‘believe(P)(no one)’, with a clause containing a three-place predicate, say
‘tell(P)(everyone)(lucie)’. The two-place predicate (believes) must be allowed to
match with the three-place predicate plus one of its arguments, the one that has
no parallel in the anaphoric clause (Lucie tells), while maintaining indexical
parallelism. In order to handle this kind of parallelism, the system must exploit
the ordering of the meaningful expressions of the language further. In footnote
16 above, we have suggested an extension to the treatment of quantifiers that
can match different quantifiers (‘quantifier abstraction’ to deal with the parallelism between everyone and no one) and an extension that makes it possible
to ignore spurious wide scope operators (‘quantifier hiding’ to deal with Lucie).
Provided that we treat all arguments consistently as quantifiers, we can deal
with examples like (91) and a large number of related ones. However, under
what conditions ‘inexact’ matches of structures are wanted and/or allowed, and
how this complicates the treatment of other instances of VPA, will need further
study. Some notion of unification, somewhere between first-order unification and
the higher order unification employed in Dalrymple et al. (1991), is needed to
obtain the right empirical predictions.
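A first, very tentative rendering of quantifier hiding for (91): clauses are represented as curried applications, and outermost (wide-scope) arguments of the longer ‘spine’ that have no counterpart in the shorter one are hidden before the remaining heads and arguments are paired. The pairing of everyone with no one would then be handed over to quantifier abstraction; all names and data structures here are our own assumptions.

```python
# Sketch of matching a two-place clause against a three-place one by
# hiding the wide-scope argument that has no parallel.

def spine(term):
    """Unwind curried applications ('app', f, a) into (head, [args])."""
    args = []
    while isinstance(term, tuple) and term[0] == 'app':
        args.insert(0, term[2])
        term = term[1]
    return term, args

def match_with_hiding(t1, t2):
    """Pair up heads and arguments of two clauses, hiding the outermost
    arguments of the longer spine that have no parallel in the shorter."""
    h1, a1 = spine(t1)
    h2, a2 = spine(t2)
    n = min(len(a1), len(a2))
    hidden = a1[n:] + a2[n:]          # wide-scope arguments without parallels
    pairs = list(zip(a1[:n], a2[:n]))
    return (h1, h2), pairs, hidden

# (91)(a): tell(P)(everyone)(lucie); (91)(b): believe(P)(no one).
ante = ('app', ('app', ('app', 'tell', 'P'), 'everyone'), 'lucie')
anaph = ('app', ('app', 'believe', 'P'), 'no_one')
heads, pairs, hidden = match_with_hiding(ante, anaph)
print(heads, pairs, hidden)
```

Here lucie is hidden, tell is paired with believe, and everyone with no_one, which is the structural precondition for the quantifier-abstraction step suggested in footnote 16.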
5 Conclusion
We have seen that a discourse perspective on Verb Phrase Anaphora provides
important insights into this phenomenon. We have argued that VP anaphors do
not simply refer to a single preceding VP, as most prevailing theories assume:
a VP anaphor may refer to a complex DCU, i.e. the antecedent need not be a
single VP; also the antecedent of a VP anaphor may reside inside a complex
DCU, i.e. it need not be located in the directly preceding clause. In general, the
reference possibilities of VPA are constrained by the structural relations in the
discourse. VPA needs a notion of syntactic/semantic parallelism to be treated
adequately.
We have argued for the appropriateness of a unification grammar for assigning structure and semantics to discourse. The discourse grammar proposed
turned out to be a fruitful framework for the treatment of VPA. The process
that is needed for the interpretation of VPA is parasitic upon a mechanism for
establishing parallelism and cohesion in discourse, MSCD calculation. This formulation in terms of a general discourse mechanism provides an approach that
is more general than the treatments proposed before, and it covers data that
have not been addressed so far.
References
Asher, Nicholas: 1993, Reference to Abstract Objects in Discourse, Studies in
Linguistics (50), Kluwer Academic, Dordrecht.
van den Berg, Martin and Hub Prüst: 1990, ‘Common Denominator and
Default Unification’, Proceedings of the first meeting of Computer Linguists In
the Netherlands (CLIN), 1-16, Utrecht.
Bouma, Gosse: 1990, ‘Defaults in Unification Grammar’, Proceedings of the
28th Annual Meeting of the ACL, University of Pittsburgh, Pittsburgh, 165-172.
Cresswell, Max J.: 1985, Structured Meanings: The Semantics of Propositional Attitudes, Bradford Books, The MIT Press, Cambridge, Massachusetts.
Dalrymple, Mary and Stuart M. Shieber and Fernando C. N. Pereira: 1991,
‘Ellipsis and Higher-Order Unification’, Linguistics and Philosophy 14(4), 399-452.
Dowty, David and Robert Wall and Stanley Peters: 1981, Introduction to
Montague Semantics, Reidel Publishing Company, Dordrecht.
van Eijck, Jan: 1985, Aspects of Quantification in Natural Language, unpublished dissertation, University of Groningen.
Gardent, Claire: 1991, Gapping and VP Ellipsis in a Unification-Based Grammar, unpublished dissertation, University of Edinburgh.
Grober, E. and W. Beardsley and A. Caramazza: 1978, ‘Parallel Function
Strategy in Pronoun Assignment’, Cognition 6, 117-133.
Groenendijk, Jeroen and Martin Stokhof: 1984, Studies on the Semantics of
Questions and the Pragmatics of Answers, unpublished dissertation, University
of Amsterdam.
Groenendijk, Jeroen and Martin Stokhof: 1991, ‘Dynamic Predicate Logic’,
Linguistics and Philosophy 14, 39-100.
Grosz, Barbara and Aravind Joshi and Scott Weinstein: 1983, Providing a
Unified Account of Definite Noun Phrases in Discourse, SRI Technical Note 292.
Hendriks, Herman: 1993, Studied Flexibility - Categories and Types in Syntax and Semantics, ILLC Dissertation Series 3, Institute for Language, Logic
and Computation, Amsterdam.
Hobbs, Jerry R.: 1979, ‘Coherence and Coreference’, Cognitive Science 3,
67-90.
Janssen, Theo: 1983, Foundations and Applications of Montague Grammar,
CWI, Amsterdam.
Kameyama, Megumi: 1986, ‘A property sharing constraint in centering’, Proceedings of the 24th Annual Meeting of the ACL, 200-207.
Kamp, Hans: 1981, ‘A Theory of Truth and Semantic Representation’, in
J. Groenendijk, T. Janssen and M. Stokhof (eds), Formal Methods in the Study
of Language, Mathematisch Centrum, Amsterdam.
Kaplan, Jeff: 1984, ‘Obligatory TOO in English’, Language 60(3), 510-518.
Klein, Ewan: 1987, ‘VP Ellipsis in DR Theory’, in J. Groenendijk, D. de
Jongh and M. Stokhof (eds), Studies in Discourse Representation Theory and the
Theory of Generalized Quantifiers, Groningen-Amsterdam Studies in Semantics
(GRASS), Foris, Dordrecht.
Kuno, Susumu: 1976, ‘Gapping: A Functional Analysis’, Linguistic Inquiry
7, 300-318.
Lang, Ewald: 1977, Semantik der Koordinativen Verknüpfung, Studia Grammatica XIV, Akademie-Verlag, Berlin.
Lightfoot, D.W.: 1982, The Language Lottery: Toward a Biology of Grammar, The MIT Press, London.
Mann, William C. and Sandra A. Thompson: 1988, ‘Rhetorical Structure
Theory: Toward a Functional Theory of Text Organization’, Text 8, 243-281.
Nash-Webber, Bonny and Ivan Sag: 1978, ‘Under whose Control?’, Linguistic
Inquiry 9, 138-141.
Partee, Barbara and Emmon Bach: 1981, ‘Quantification, pronouns and VP
anaphora’, in J. Groenendijk, M. Stokhof and T. Janssen (eds), Formal methods
in the study of language, Mathematisch Centrum, Amsterdam.
Polanyi, Livia and Remko Scha: 1984, ‘A Syntactic Approach to Discourse
Semantics’, Proceedings of COLING, 413-419, Stanford, California.
Polanyi, Livia: 1985, ‘A Theory of Discourse Structure and Discourse Coherence’, Papers from the general session of the Chicago Linguistic Society, CLS
21, 306-322.
Polanyi, Livia: 1986, The Linguistic Discourse Model: Towards a Formal
Theory of Discourse Structure, BBN Report No. 6409, BBN Laboratories, Cambridge, Massachusetts.
Prüst, Hub: 1992, On Discourse Structuring, VP Anaphora and Gapping,
unpublished dissertation, University of Amsterdam.
Reinhart, Tanya: 1983, ‘Coreference and Bound Anaphora’, Linguistics and
Philosophy 6, 47-88.
Robinson, J.A.: 1965, ‘A Machine-Oriented Logic Based on the Resolution
Principle’, Journal of the ACM 12(1).
Rooth, Mats: 1985, Association with Focus, unpublished dissertation, University of Massachusetts, Amherst, Massachusetts.
Sag, Ivan: 1977, Deletion and Logical Form, unpublished dissertation, MIT,
Cambridge, Massachusetts.
Sag, Ivan and Gerald Gazdar and Thomas Wasow and Steven Weisler: 1985,
‘Coordination and How to Distinguish Categories’, Natural Language and Linguistic Theory 3, 117-172.
Scha, Remko: 1981, ‘Distributive, Collective and Cumulative Quantification’,
in J. Groenendijk, T. Janssen and M. Stokhof (eds), Formal Methods in the Study
of Language (Part 2), Mathematisch Centrum, Amsterdam.
Scha, Remko and Livia Polanyi: 1988, ‘An Augmented Context-free Grammar
for Discourse’, Proceedings of the 12th International Conference on Computational Linguistics (COLING), 22-27.
Siekmann, Jörg: 1989, ‘Unification Theory’, Journal of Symbolic Computation 7, 207-274.
Talmy, Leonard: 1978, ‘Relations Between Subordination and Coordination’,
in J.Greenberg, C.Ferguson and E.Moravcsik (eds), Universals of Human Language, Stanford University Press.
Vonk, Wietske: 1991, ‘Specificity of referring and the comprehension of text
structure’, Proceedings of the Workshop on Discourse Coherence, University of
Edinburgh.
Williams, Edwin: 1977, ‘Discourse and Logical Form’, Linguistic Inquiry
8(1), 101-139.