How grammatical and discourse factors may predict the forward
prominence of referents: two corpus studies
Henk Pander Maat & Ted Sanders
Utrecht institute of Linguistics OTS
Utrecht University
Trans 10
NL-3512 JK Utrecht
The Netherlands
Abstract
One of the ways in which discourse coheres is by means of repeated reference to
entities. Theoretical accounts of referential coherence propose heuristics for the
interpretation of referential expressions, which are especially important when there is
more than one potential antecedent. One of the most explicit accounts is provided by
Centering Theory (Grosz et al. 1995). Using features such as grammatical status,
expression type, and the referential relation with sentences still further back in the
discourse, it produces a ranking of discourse referents in terms of forward
prominence. We present two corpus studies of how these features, in combination
with discourse topichood, help to predict referential continuations in actual discourse.
In Study 1, we analyzed newspaper fragments in which he is preceded by a
sentence presenting two male singular participants. The factors Grammatical Role
(being a Subject), Backward Center Status and Discourse Topichood appear to
increase the chance that a referent is the intended one for a potentially ambiguous
pronoun, while Expression Type (noun or pronoun) makes no difference. In Study 2,
the continuations for sentences with two referents differing on the same four factors
were compared, assuming that the most prominent referent will reappear in the next
sentence. The study reveals that Grammatical Role only affects the form of
continuation: subject referents do not reappear more often, but when reappearing they
are more often realized as pronouns. Backward Center Status increases the chance of
subsequent references to a referent, and also decreases the chances for its competitor
of being referred to again. Discourse Topichood has the same double effect. In
conclusion, both global and local factors affect referential prominence, but in different
ways.
1. Determinants of forward prominence
Language users communicate by discourse. A crucial property of discourse is that the
utterances are related in various ways: they show coherence. We consider discourse
coherence to be a mental phenomenon, i.e. language users make a coherent
representation of the discourse under consideration (Gernsbacher & Givón 1995;
Garnham & Oakhill 1992; Graesser, Millis & Zwaan 1997; Sanders & Spooren 2001).
Still, the discourse itself contains more or less overt signals that direct this
interpretation process. Connectives like because, for instance, function as linguistic
markers of coherence relations (Hobbs 1979; Sanders, Spooren & Noordman 1992)
between utterances, which the reader uses as a cue to interpret the upcoming utterance
as the cause for a consequence stated in the first utterance (Noordman & Vonk 1997).
In addition to such relational coherence, there is another important type of coherence:
referential coherence, which results from the continuous reference to entities in the
discourse. The relevant linguistic signals are references in various forms: full NPs,
pronouns, zero anaphora, etc. (Ariel 1990, Cornish 1999, Givón 1983, Gundel,
Hedberg & Zacharski 1993). Over the last decades, linguistic and psycholinguistic
studies have shown the relevance of both types of coherence for the study of discourse
(Sanders & Spooren 2007). In this paper we focus on referential coherence.
Referential coherence is not just a matter of retrospectively linking current
discourse segments to earlier ones. It also includes preparing the reader for certain
continuations. This requires writers and readers to coordinate their individual models
of referential prominence. Arnold (2001: 154) explains why this coordination is
relevant for discourse processing:
I propose that the accessibility of a given referent, which can be modeled in
terms of activation of its representation, is driven by the likelihood that it will
be referred to again in the following discourse (…). When a referent is likely
to be referred to again, it behooves comprehenders to instantiate a relatively
activated representation of that referent in their model of the discourse. Then,
if the speaker does refer to that entity again, comprehension will be facilitated.
This notion of forward prominence can be illustrated by considering ‘ambiguous’
subject pronouns, for which the preceding utterance provides several candidate
antecedents. The most prominent of these is probably the intended referent for the
pronoun. But how does the reader determine which referent is meant to be the most
prominent one in fragments such as (1)?
(1)  George hit Al. He …
This referential configuration is not as uncommon as one might think. In fact, in our
Dutch newspaper corpus approximately 2.2% of the sentences preceding
the male singular pronoun hij (he) contain two potential antecedents (i.e. two
singular male participants). This suggests that language users deal with pronoun
ambiguity routinely.
How do theories of the referential structure of discourse deal with this
problem? Many prominent theories consider pronouns to indicate a high degree of
activation (Chafe 1994), importance (Givón 1995), or accessibility (Ariel 1990, 2001,
Givón 1992: 16) of their referents, as opposed to, for instance, full NPs. So, the
question seems to be which participant in (1) is highest in terms of activation. One of
the factors determining activation is the distance (in number of clauses) between the
anaphor and the preceding reference to a potential antecedent: the larger the distance,
the lower the activation. This factor does not help in the resolution of ambiguous
pronouns such as in (1), since both candidate referents are equally close here. For this
reason, we need to look for more subtle cues regarding the prominence of referents.
Centering Theory (see Grosz et al. 1995 and Walker et al. 1998 for overviews)
offers an attractive theoretical framework for this investigation. Given a multi-referent
utterance, the theory predicts how likely each of these referents is to be (still) a central
referent in the next utterance. The salience of a discourse entity is determined by a
combination of syntactic, lexical and contextual factors, such as its grammatical role
(subject or not) and expression type (zero, pronoun or NP), but also its occurrences in the
preceding discourse segments. Next, we will review these factors one by one.
For cases like (1), grammatical role seems a crucial factor. Brennan et al.
(1987) and Kameyama (1996) have suggested a ranking in which the subject precedes
the object(s) and in which the object(s) in turn rank higher than other grammatical
functions; that is, when a referent is placed in subject position, it stands a better
chance to be the central referent in the subsequent discourse than when it is an object
in the preceding utterance. In a corpus study of oral reports of a basketball game,
Brennan (1995) has shown that speakers tend to introduce participants as subject
referents more often when they expect this participant referent to remain in the center
of attention than when subsequent events are uncertain. She also points out a
difference between the ways in which referents of subject and object NPs are referred
to in subsequent utterances: subject participants were more often pronominalized than
object participants. She explains this by reasoning that object referents are not fully
established yet as the center of attention, and can more easily be pronominalized after
having filled the subject position once.
That subject referents are especially prominent has been suggested by other
theorists as well. For instance, Givón (1992, 10-11; 1995, 65-67) proposes a
grammatical hierarchy similar to the centering hierarchy for what he calls ‘topicality’,
and the corpus evidence presented in Givón (1995) supports the claim that subject
referents are more persistent in subsequent discourse. Arnold (1998) arrives at similar
conclusions. Tomlin (1997) even states that the subject role grammatically encodes
topic-status, and that the notion of topic by itself is superfluous.
Moreover, the subject preference in the interpretation of pronouns has received
empirical support in experimental work by centering theorists (Gordon et al. 1993,
Gordon & Scearce 1995, Hudson-D’Zmura & Tanenhaus 1998) and others
(Frederiksen 1981, Crawley et al. 1990). For instance, when a subject referent is
resumed by means of a name instead of a pronoun, this slows down reading. This
phenomenon has been called the ‘repeated name penalty’. In reading time
experiments by Gordon et al. (1993) and eye-tracking experiments by Kennison &
Gordon (1997), this penalty did not occur when the intended referent was in non-subject position.1
Another indicator for referential prominence relates to the expression type used for the
antecedent. For instance, Givón (1992, 1995) has stressed the difference between
definite and indefinite noun phrases, in that indefinite nominals typically introduce
referents that will be resumed in subsequent discourse. Kameyama (1998) suggests
that pronominalization indicates the forward saliency of a referent. For instance, the
object referent seems to stand a greater chance of being resumed in (1b) than in (1).
(1)  George hit Al. He …
(1b) George hit him. He …
In order to see why pronoun use could indicate forward prominence, we need to
introduce a few notions of Centering Theory (Grosz et al. 1995, Walker et al. 1998).
According to the theory, every non-initial utterance is taken to contain both a so-called
backward center (CB), serving as a link to the preceding utterance, and a set of
forward-looking centers (CF). It is assumed that there is only one CB; the CF set is
ranked according to preference. The highest-ranked member of the set, the preferred
center (CP), represents a prediction about the backward center of the following
utterance. In terms of this framework then, the issue to be decided in the resolution of
the anaphor he in (1) is: what referent must be taken as the preferred forward center
(CP) of the first utterance?
In (1b), the object pronoun suggests that the object referent is already salient;
in fact, Centering Theory posits that when there is only one pronoun in a segment, this
must refer to the backward center. It might be that readers expect a CB to be
continued within the subsequent discourse, because they assume that discourse tends
to have a central protagonist. This assumption would motivate a preference for
pronominal referents when searching antecedents for ambiguous pronouns. This
discussion implies that, in fact, it may be backward center status that confers
referential prominence on a referent, not its expression type. In that case, CBs in the
form of full NPs would also be strong candidates for resumption in subsequent
discourse. In our corpus studies, we will try to tease apart both factors.
Besides grammatical role, expression type and/or backward center status, we will
consider another potential determinant of forward prominence: the degree to which a
referent is the global discourse topic. For instance, fragment (1) may be preceded by a
story in which George is the main protagonist, but also by a story about Al. In a
reading time study, Hoover (1997) has shown that readers expect the main protagonist
to be pronominalized and a secondary character to appear in nominal form, not the
other way round. Similarly, quite a number of studies (such as Garrod et al. (1994);
see also Garrod & Sanford (1994) and Sanford & Garrod (1994) for overviews) have
shown that the interpretation of pronouns is affected by the global discourse topic, and
not just by the make-up of the immediately preceding utterance. We will further refer
to this factor as discourse topic status.
This study focuses on grammatical role, expression type / backward center status and
discourse topic status as possible factors affecting pronoun resolution and forward
prominence in general. This is not to say that they are the only relevant factors. Let us
briefly review other factors proposed in the literature.
The first one is order of mention. Gernsbacher (Gernsbacher & Hargreaves
1988, Gernsbacher, Hargreaves & Beeman 1989) has proposed that the first-mentioned
noun phrase in a sentence has a privileged status over other potential
antecedents. The first-mentioned antecedent is generally the subject referent in
languages like English, but in languages with a free word order, such as Finnish, the
two factors can be examined separately. In an eye-tracking study using Finnish texts,
Järvikivi et al. (2005) have shown that both being mentioned first and being the
subject referent independently increase the chance that a referent is considered the
probable antecedent of an ambiguous pronoun (see also Gordon et al. 1993). Since our
study uses Dutch corpus materials with canonical SVO word order, we cannot tease
apart order of mention and grammatical role. Henceforth we will focus on
grammatical role, without entirely being able to rule out the possibility that some of
the effects are due to a combined operation of role and order of mention.
Another factor influencing the resolution of ambiguous pronouns is the
thematic role of the antecedent being considered. For instance, when reading a
sentence starting like John amazed Bill because he … readers tend to interpret the
pronoun as coreferential with John; this is not because John is in subject position, but
because he has the thematic role of Stimulus, while Bill has the role of Experiencer.
The preference for Stimulus antecedents over Experiencer antecedents also holds
when the grammatical roles are changed, as in Bill admired John because he …, in
which John is still the preferred antecedent (Au 1986). Another thematic role effect
was shown in a continuation experiment (Arnold 2001) on transfer verbs (e.g. give,
offer, send). It was found that Goal participants were more often referred to first in the
continuation utterance than the source participants.
Another factor affecting pronoun resolution is structural parallelism between
the pronoun sentence and its predecessor. Chambers & Smyth (1998) have shown that
in certain contexts, pronouns tend to be seen as coreferential with the antecedent in
the same structural position. For instance, in Josh criticized Paul. Then Marie insulted
him the object pronoun is considered to refer to Paul, not Josh. This tendency has been
empirically supported for two utterances with the same global constituent structure,
and containing predicates that are semantically close to each other (e.g. criticized /
insulted, handed / passed). Moreover, in this parallel context the repeated name
penalty, which is normally restricted to expressions referring to the subject referent of
the preceding sentence, is equally valid for expressions with subject and object
antecedents.
Finally, referential coherence has been shown to be co-determined by
relational coherence. Following suggestions by Kehler (2002), Wolf & Gibson (2004)
examined pronoun reading times in the context of resemblance and causal coherence
relations. They hypothesized that the parallel preference strategy posited by Chambers
& Smyth (1998) only holds for certain coherence relations. In the context of
resemblance relations, parallel references are read faster than non-parallel ones, as in
the Josh & Paul-example above. But when the same segments are related by a causal
connective, the non-parallel version (Fiona defeated Craig and so James
congratulated her) is faster than the parallel one (Fiona defeated Craig and so James
congratulated him).
Although we do not question the relevance of the factors just reviewed, we believe
that thematic role and coherence relation cannot be central to people’s heuristics in
everyday language use, simply because they often do not clearly guide expectations.
Thematic roles, for instance, are undoubtedly important predictors of continuations in
the environment of verbs of transfer (Arnold 1998) and stimulus-experiencer
predicates (Au 1986), but many other predicates do not imply similar restrictions on
referential continuations (compare the verbs listed in Au (1986)). Also, the structural
parallelism factor has been shown to be considerably reduced when the constituent
structures of two adjacent utterances do not match exactly, as (2) and (3) illustrate.
(2)  Josh criticized Paul and then Marie insulted him.
(3)  Josh criticized Paul and then Marie asked him to leave.
We suspect that full parallelism as in (2) is quite rare in actual discourse.
In this study we will focus on the factors grammatical role, expression type, backward
center status and discourse topic status, because language users can use these
indicators in almost all contexts of referential ambiguity. But before proceeding to our
corpus studies, we will refine our research question in terms of Centering Theory.
2. Centering Theory views on forward prominence
2.1. Four types of transitions
In its most influential version, stemming from Brennan et al. (1987), Centering
Theory distinguishes between four kinds of referential transitions. The distinctions are
made in terms of the notions of preferred forward centers (CP) and backward centers
(CB) which were introduced above. We will illustrate them with small discourse
fragments consisting of three utterances: A context utterance (in parentheses),
followed by a first (U1) and a second utterance (U2). The transition we are concerned
with is the one between U1 and U2. Following the standard view in Centering Theory,
we will assume that the subject realizes the preferred forward center (CP).
The first type of transition is the so-called Continuous Sequence. Consider (4):
(4)  (George was not here yesterday.)
U1   He (CB1) was ill.
U2   He (CB2 = CB1 = CP2) is often ill at this time of year.
(4) realizes a continuous sequence, because the CB of the second utterance is identical
to both the CB of the first utterance (CB2 = CB1) and the preferred forward center of
the second utterance (CB2=CP2).
For the definition of continuous sequence it does not matter whether the backward
center of U1 and U2 (CB1/CB2) is also the preferred forward center of U1 (CP1). In
(5) another participant (Al) appears in subject position in U1, thereby constituting
CP1; but the fragment still represents a continuous sequence.
(5)  (George has not been in for two days now.)
U1   Al (CP1) has just called him (CB1).
U2   He (CB2 = CB1 = CP2) turned out to be ill.
The second type of shift is called Retain. Again, CB2 is identical to CB1 in this
sequence, but CB2 is not identical to CP2, see (6) and (7).
(6)  (George was not here yesterday.)
U1   He (CB1=CP) was exhausted.
U2   Doctor Stephens (CP2) has ordered him (CB2=CB1) to rest for a week.
     (He is very cautious in cases like this.)
(7)  (George is ill).
U1   Al (CP1) heard that from him (CB1).
U2   Maria (CP2) took a few days off to take care of him (CB2=CB1).
U3   (She has a very busy job, so there must be something seriously wrong).
In this type of shift, CB1 returns as CB2, but together with this element of continuity,
U2 contains a ‘new’ forward looking center (CP2), so that there is a slight
reorientation in the discourse.
The third type is the smooth shift. In this case, CB1 does not return. It is replaced by a
new CB2 that is also CP2. The shift is smooth because CP2 does not come as a surprise,
having been introduced earlier in U1. See examples (8) and (9).
(8)  (George has been ill since Monday.)
U1   He (CB1 = CP1) is being looked after by Maria (Y).
U2   She (CB2 = Y = CP2) has taken a few days off.
(9)  (George has not been here for two days.)
U1   Al (Y=CP1?) has called him (CB1).
U2   He (CB2 = Y = CP2) wanted to know what was going on.
Finally, there is the so-called rough shift, in which no centers are shared: CB2 is not
CB1, and CP2 is new as well, see (10).
(10) (George is ill.)
U1   The doctor (CP1) has sent him (CB1) to a specialist (Y).
U2   Maria (CP2) started a row with that man (CB2=Y).
CB2 can even be missing, as in (11).
(11) (George is ill.)
U1   The doctor (CP1) has sent him (CB1) home.
U2   Maria (CP2) did not like it at all.
Rule 2 in Centering Theory states the following preference order for transitions:
continue > retain > smooth shift > rough shift. In other words, the more continuous a
sequence is, the more it is to be preferred.
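The four transition types and the Rule 2 preference order can be summarized in a small Python sketch. It is an illustration only, not part of the original analysis: the function names are hypothetical, referents are represented as plain strings, CB1 is assumed to be known, and a missing CB2 is treated as a rough shift, following the grouping of example (11) above.

```python
from typing import Optional

# Preference order stated by Rule 2, most preferred first.
PREFERENCE = ["continue", "retain", "smooth shift", "rough shift"]


def classify_transition(cb1: str, cb2: Optional[str], cp2: str) -> str:
    """Classify the U1 -> U2 transition from CB1, CB2 and CP2.

    Hypothetical helper: referents are plain strings and None
    stands for a missing backward center in U2.
    """
    if cb2 is None:
        # Assumption: no CB2 at all counts as a rough shift, as in (11).
        return "rough shift"
    same_cb = cb2 == cb1    # is the backward center carried over?
    cb_is_cp = cb2 == cp2   # is CB2 also the preferred center of U2?
    if same_cb:
        return "continue" if cb_is_cp else "retain"
    return "smooth shift" if cb_is_cp else "rough shift"


def preferred(a: str, b: str) -> str:
    """Return whichever of two transition types Rule 2 ranks higher."""
    return min(a, b, key=PREFERENCE.index)


# Example (4): CB1 = CB2 = CP2 = George
print(classify_transition("George", "George", "George"))  # continue
# Example (8): CB2 = CP2 = Maria, CB1 = George
print(classify_transition("George", "Maria", "Maria"))    # smooth shift
```

Note that CP1 plays no role in the classification, which mirrors the observation above that (4) and (5) both count as continuous sequences.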
2.2. How Centering Theory deals with ambiguous pronouns
Let us return to our pronoun ambiguity problem. So far we have used George hit Al.
He … as an example, but by now it is clear that we need more context for a centering
analysis. Next, we discuss three scenarios, each of them starting with an utterance
referring to two characters by their names. Most often, the participants will have been
mentioned earlier on:
(12) (George and Al are twin brothers. They are often angry at each other.)
U1   Yesterday, George (CB1=CP1) punched Al (CB1) in the face.
U2a  He (CB2=CB1=CP2) had been pestered all day.
U2b  He (CB2=CB1=CP2) fell down with a bloody nose.
This example is problematic since it forces us to posit two backward centers in U1,
something which is not allowed in the current version of Centering Theory. If it were,
however, both continuations would constitute Continue transitions, since in both
continuations the CB remains in place. So there may be ‘symmetrical’ referential
constellations in which both referents are equally prominent, notwithstanding the fact
that only one of them fills the subject position. Here, the local factor is overridden by
the discourse structural factor.
Let us now inspect two cases in which one of the referents is pronominalized
and has already been introduced in the preceding discourse, while the other is new.
First assume that the subject referent is ‘old’ and is realized as a pronoun, while the
object referent is new and realized as a name. Resuming the old subject referent
constitutes a Continue transition (see U2a), while resuming the object referent is a
Smooth Shift (see U2b). U2a is clearly preferred:
(13) (George has not been in for two days now.)
U1   He (CB1= CP1) has just called Al (Y).
U2a  He (CB2 = CB1 = CP2) turned out to be ill.
U2b  He (CB2 = Y = CP2) was happy to receive the call.
Finally, consider the most complex case: an utterance in which the subject referent is
referred to by name and the object referent is ‘old’ and realized by pronoun.
Surprisingly, this does not change the transition analysis. Resuming the subject
referent still constitutes a Continue transition (see U2a), whereas resuming the object
referent is a Smooth Shift (see U2b):
(14) (George has not been in for two days now.)
U1   Al (Y=CP1) has just called him (CB1).
U2a  He (CB2 = CB1 = CP2) turned out to be ill.
U2b  He (CB2 = Y = CP2) wanted to know what was going on.
The principle that Continue transitions are preferred over Smooth Shifts embodies a
preference for global continuity, such as can be seen in U2a: the subject referent of
the context utterance George remains the backward center in U1 and U2, even though
there is another subject referent in U1. However, besides proposing a preference order
for transitions, Centering Theory holds the assumption that subject status is an
indication of CP-status (at least in English). This claim seems to embody a preference
for local continuity: the subject referent of U1 is CP and therefore the most likely
candidate for CB in U2. This would make U2b the preferred continuation in (14).
So far, we have sketched the rules in the standard framework of Centering
Theory. However, Kameyama (1998) presented a somewhat different proposal. First,
whereas the standard framework assumes that the set of forward Centers is always
ordered, so that there is always a single entity that is the most prominent, Kameyama
allows for the possibility that there is indeterminacy: different entities may compete
for prominence. This indeterminacy might account for the phenomenon of ‘real’
pronoun-ambiguity. Second, Kameyama does not regard the different transition-types
as primitives. Instead she introduces two linguistic hierarchies: the grammatical
function hierarchy (GF-ORDER) and the nominal expression type hierarchy (EXP-ORDER).
(a)  GF-ORDER: subject > object > object2 > others.
(b)  EXP-ORDER: zero pronominal > pronoun > definite NP > indefinite NP.
GF-ORDER states that a higher ranked phrase is normally more salient in the ranked
set of forward centers, which Kameyama refers to as the ‘output attentional state’ of
the utterance. EXP-ORDER states that an entity realized by a higher ranked
expression type is normally more salient in the ‘input attentional state’, that is the
prominence ranking passed on to the utterance by the preceding discourse. Kameyama
also claims, somewhat mysteriously, that EXP-ORDER predicts the relative salience
of entities in the output attentional state, “since these assumed salient levels are also
accommodated into the context” (o.c. 92). This might be viewed as a local way of
capturing the global preference for continue transitions in the standard framework.
After all, many pronouns refer to backward centers.
In Kameyama’s terms, it is the interaction between GF-ORDER and EXP-ORDER
which produces indeterminacy in cases like (14) above: in U1 the subject Al
and the pronoun him are both salient entities, one on the basis of GF-ORDER, the other
on the basis of EXP-ORDER.
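The interaction of the two hierarchies can be made concrete with a rough Python sketch. This is one possible reading rather than Kameyama's own formalization: the function names are hypothetical, proper names are assumed to count as definite NPs (the hierarchy as quoted does not mention names), and "indeterminacy" is modelled simply as the case in which no referent is top-ranked on both hierarchies.

```python
from typing import Dict, List, Set, Tuple

# The two hierarchies as quoted above, highest rank first.
GF_ORDER: List[str] = ["subject", "object", "object2", "other"]
EXP_ORDER: List[str] = ["zero pronominal", "pronoun", "definite NP", "indefinite NP"]

# (grammatical function, expression type) for one referent
Features = Tuple[str, str]


def top_ranked(referents: Dict[str, Features], slot: int, order: List[str]) -> Set[str]:
    """Return the referents sharing the best rank on one hierarchy."""
    best = min(order.index(feats[slot]) for feats in referents.values())
    return {name for name, feats in referents.items()
            if order.index(feats[slot]) == best}


def hierarchies_conflict(referents: Dict[str, Features]) -> bool:
    """True if no referent is top-ranked on both GF-ORDER and EXP-ORDER."""
    top_gf = top_ranked(referents, 0, GF_ORDER)
    top_exp = top_ranked(referents, 1, EXP_ORDER)
    return not (top_gf & top_exp)


# U1 of (13): "He has just called Al." (assumption: names are definite NPs)
u1_13 = {"George": ("subject", "pronoun"), "Al": ("object", "definite NP")}
# U1 of (14): "Al has just called him."
u1_14 = {"Al": ("subject", "definite NP"), "George": ("object", "pronoun")}

print(hierarchies_conflict(u1_13))  # False: George leads on both hierarchies
print(hierarchies_conflict(u1_14))  # True: Al leads on GF, George on EXP
```

Treating ties as non-conflicting is a design choice of this sketch: in (12), where both referents are names, the subject is top-ranked on GF-ORDER and shares the top rank on EXP-ORDER, so no conflict is reported.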
To sum up, where the standard Centering Theory framework produces
conflicting predictions - the global preference for continuous transitions conflicts with
the local principle that subjects realize preferred forward centers – Kameyama’s
approach predicts an indeterminacy resulting from the interaction of two local
mechanisms: GF-ORDER and EXP-ORDER.
As stated earlier, this paper investigates the contributions of grammatical role,
expression type and backward center status to forward prominence. The underlying
issue is to what degree local and contextual factors determine forward prominence.
Grammatical role is clearly a local factor, since it is only weakly constrained by the
preceding text. Expression type is different in that it has been shown to crucially
depend on input levels of activation; we might call it a local reflection of a contextual
factor. Backward center status is more clearly contextual in that it may be, but need
not be reflected in expression type. The backward center is simply the highest ranked
CF of the previous utterance that is realized in the current utterance (here we follow
the standard CB-definition of Brennan et al. 1987).
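The CB definition just given amounts to a simple lookup, sketched below in Python. The function name and the string representation of referents are illustrative assumptions, and the ranking of the previous utterance's forward centers (here subject before object) is supplied by hand rather than computed.

```python
from typing import List, Optional, Set


def backward_center(prev_cf_ranked: List[str], realized_now: Set[str]) -> Optional[str]:
    """Return the highest-ranked CF of the previous utterance that is
    realized in the current utterance, or None if there is none.

    prev_cf_ranked: forward centers of the previous utterance, most
                    prominent first (ranking assumed to be given).
    realized_now:   referents directly mentioned in the current utterance.
    """
    for referent in prev_cf_ranked:
        if referent in realized_now:
            return referent
    return None


# Example (14), U1 "Al has just called him": CF ranked as [Al, George].
cf_u1 = ["Al", "George"]
print(backward_center(cf_u1, {"George"}))  # George (continuation U2a)
print(backward_center(cf_u1, {"Al"}))      # Al (continuation U2b)
print(backward_center(cf_u1, {"Maria"}))   # None: no CB, cf. example (11)
```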
The Centering preference for continuous sequences embodies a contextual
constraint according to which the forward prominence of a referent depends on
backward center status, that is on having been mentioned prominently in the
preceding segment. However, the relevant context may be larger than just one
segment. For instance, it might be that the importance of a referent in the preceding
paragraph independently contributes to its forward prominence. Hence we will also
incorporate a global measure of referential importance in this study, the so-called
Discourse Topic Score, to be defined in section 3.3.4.
We close this section with a note on terminology. Poesio et al. (2004) have pointed
out that there are various ways in which the central notions and claims of Centering
Theory have been defined. Therefore, any study pretending to apply centering notions
should clearly state which version of the framework is being used. The main notions
that need to be explicated for this study are backward center, what to count as a
realization of a referent, and how to define an utterance.
Our definition of backward center has been given above. As for realization, we
will only take into account direct mentioning of referents, excluding bridging
inferences of various kinds (e.g. between items such as The room and the door in later
utterances).2 Our impression is that this restriction does not make too much of a
difference. Our study focuses on person references, and entities associated with persons
are often marked as such by possessive pronouns, producing a direct reference and
avoiding the need for bridging inferences. Finally, we will define utterances as finite
clauses, although, as we will see later, not every clause is equally important in
referential terms (see 3.2).
3. A corpus study of pronoun ambiguity
3.1. Research method
The issues raised in section 2.2 clearly require empirical investigation. Most empirical
work on referential coherence uses either corpus analysis (e.g. Ariel 1990, Givón
1992, Brennan 1995) or experiments (e.g. Gordon et al. 1993, Chambers & Smyth
1998). Both methods have their strengths and weaknesses. Experiments offer the
advantage of control: potentially distracting variables can be controlled for in the
experimental materials. On the other hand, corpus analysis presents a more realistic
view on the referential transitions actually encountered in certain discourse genres;
moreover, corpora offer richer contexts for referential expressions than the one or two
simplex sentences that usually precede experimental target sentences.
In this paper, we use corpus analysis to examine the role of four factors
contributing to referential prominence: subject status, expression type, backward
center status, and discourse topichood. Centering theorists have proposed the first
three factors as characteristics of preferred referential configurations. It must be kept
in mind that predictions such as these cannot be directly tested in a corpus analysis.
The Centering framework is about preferences that make discourse easier to process,
not about choices in actual language behavior, as is rightly pointed out by Poesio et al.
(2004). However, since discourse probably tends not to violate expectations of
average readers on a structural basis, the Centering predictions are at least relevant for
corpus data, and vice versa. That is, frequent referential progressions in actual
discourse are at least what readers are ‘used to’, and may even be what they actually
expect to encounter.
We hasten to add that the primary aim of our study is not to test the Centering
Theory as a framework. Such an attempt has been made by Poesio et al. (2004)
already. For instance, they present corpus results regarding (different versions of) the
central Centering claims that utterances tend to have a (single) backward center, and
that this center tends to be pronominalized. The present study is not so much about
Centering, as it is about the factors that affect the forward prominence of referents in
natural discourse. To explore this issue, we focus on a specific discourse environment:
a context in which two participants compete for prominence.
Our study investigates two continuation variables that, in our view, have been
insufficiently distinguished in theorizing on referential coherence: the factors may affect
which referents are expected to reappear in subsequent discourse, or they may affect
the form of continuation.
In the first corpus study, we studied forward prominence by means of a corpus
of ambiguous pronouns, reasoning that the intended referent of the pronoun will
generally be the most prominent one available in the context. In this study, the first
continuation variable is investigated: what participant is being taken up in subsequent
discourse. The form of resumption is not at stake here, because it is kept constant. The
second study uses the approach of Arnold (2001): it examines referential
continuations in a collection of fragments around a specific type of utterance. We
selected utterances that contained both a personal subject and a personal object and
compared the continuations for both kinds of participants. In this study, we examine
which referents reappear and what form the repeated reference takes.
The data were gathered from an electronic data-base of articles from the 1994
and 1997 issues of De Volkskrant, a Dutch quality newspaper. From this corpus, we
collected 100 passages in which a sentence containing the Dutch pronoun hij (he) was
preceded by a sentence in which two male singular participants are mentioned.3
The cases included in our corpus were of the type (1), in which two candidate
referents, in this case George and Al, are introduced in the first segment U1, while U2
contains the ambiguous pronoun he, which could refer to either referent.
(1)
George hit Al. He….
The segments in our corpus were often sentences, but could also be clauses making up
complex sentences. In (1c) for instance, the he-utterance is a dependent clause:
(1c)
George hit Al after he ….
In all corpus fragments, the subsequent text was then examined to determine which
referent was the Actual Referent (AR), i.e. the one intended by the writer. Thus, each
selected text fragment contained two candidate referents, of which one was identified
as the AR in retrospect. We refer to the other referent as the Potential Referent (PR).
In all cases, two analysts agreed on the interpretation.
3.2 Candidate referents in different clauses
So far, we have discussed cases of ambiguous reference in which the first simplex
sentence, S1, contained AR and PR, the ambiguous pronoun he was in S2, and both S1
and S2 were main clauses.
However, AR and PR may be in different clauses within S1. These clauses
may appear in three configurations: the two clauses are coordinated, the Main Clause
(MC) precedes the Subordinate Clause (SC) or MC follows SC. In all three
environments, AR may precede or follow PR; this produces six configurations, see
Table 1:
Clause configuration   Order of referents
                       Actual – Potential     Potential – Actual
Coordination           1. C1 (AR), C2 (PR)    2. C1 (PR); C2 (AR)
Main-subordinate       3. MC(AR); SC(PR)      4. MC(PR); SC(AR)
Subordinate-main       5. SC(AR); MC(PR)      6. SC(PR); MC(AR)
Table 1. Configurations for candidate referents in complex sentences
In the following illustrations we mark the expression containing the AR in bold face,
while the PR expression is italicized. Fragment (15) illustrates configuration 3, a main
clause with AR followed by a subordinate clause providing PR:
(15) D Van Eyck stapt op, maar keert terug als er een flinke reorganisatie is
doorgevoerd en een ontslagen hoogleraar weer is aangenomen.
Hij blijft tot zijn pensioen, al houdt hij een grote weerzin tegen de
bureaucratie op de TH.
E Van Eyck resigns, but returns when a radical restructuring has been
implemented and a dismissed professor has been re-appointed.
He stays on until his retirement, although he remains disgusted with the
bureaucracy at the Technical University.
Fragment (16) illustrates configuration 2 (coordination in PR-AR order), while (17)
illustrates configuration 4 (PR-clause ranks above AR-clause in PR-AR order). Note
that in (17), the he-clause itself is subordinated to the AR-clause.
(16) D (From a report on the prize-giving ceremony of a literary prize at which two
writers, Durlacher and Matsier, get the same number of votes:)
Wat leuk dat er twee winnaars zijn! Durlacher kan het niet geloven, Matsier
wel;
voor het eerst van de avond lacht hij onspannen.
E How funny that we have two winners! Durlacher can’t believe it, Matsier
can; for the first time this evening he shows a relaxed smile.
(17) D Grant beschrijft twee facetten van Delors die duidelijk maken waarom hij zo
kon slagen in Europa: …
E Grant discusses two characteristics of Delors that explain why he could be so
successful in European politics: …
In all these cases, the candidate referents are in different linear positions – one is
mentioned earlier than the other – but on top of that, one may be in a main clause and
the other in a subordinate clause. Does this syntactic variable help in determining AR?
When PR precedes AR, pronoun resolution is probably easy because AR is the nearest
candidate referent for he. Cases with the order AR-PR seem more complex, and here
it may be expected that syntax plays a role in indicating the relative importance of
referents. More specifically, in this order we expect AR to appear in the main clause
and PR in the subordinate clause. This is exactly what we find (see Table 2).
Clause configuration                                Referent order
                                                    AR-PR    PR-AR
Equal status *                                        1        6
Subordinate – main                                    0        3
Main – subordinate                                   13        0
Main – subordinate – he-clause subordinate again      0        7
Total                                                14       16
Table 2. Results of the subcorpus with PRs in different clauses (n=30).
* The equal status-category contains two cases in which the PR-clause was
interpolated in the AR-clause, following the mentioning of AR.
In PR-AR order, the PR-clause may well be a coordinated clause such as in (16). By
contrast, in AR-PR order the PR-clause is almost always subordinated, such as in (15)
above. In other words, the AR is either mentioned most recently, or is in a clause
dominating the clause with the PR. This result implies that readers looking for
intended referents may sometimes disregard a subordinate clause in their backward
search; only the referent in the main clause is a serious candidate. There is one
exception to this pattern: AR can occur in a subordinate clause when the he-clause in
its turn is subordinated to the (already subordinated) AR-clause, as in (17) (Chi2=
20.00, df=1, p=.000 for the two-by-two comparison between the two main-subordinate
configurations in Table 2). In other words, while independent he-utterances may refer
further back to the main clause of the preceding sentence, dependent he-clauses refer
back to the immediately preceding higher clause.
These results are interesting because of the discussion between centering
theorists about what should be defined as an ‘utterance’ when analyzing referential
progressions. Kameyama (1998) suggested that ‘referential calculation’ is not done
sentence by sentence, but clause by clause (at least for tensed subordinate clauses).
More specifically, she proposed to break up complex sentences into a set of “center
updating units”. These finite clauses would all be of equal weight. By contrast, Suri
& McCoy (1994) propose to see postposed subordinate clauses as ‘embedded’, and
not treat them as updating units. Poesio et al. (2004) demonstrate that both parameter
settings may produce violations of the constraint that utterances should have a
backward center, since they may either have a referential link to the immediately
preceding subordinate clause or to the main clause preceding this clause.
Our data cannot decide on this issue, because they only deal with pronominal
references to persons in quite specific ‘competition’-contexts. But they do show that
sentence-final subordinated clauses tend not to be seen as updating units by writers,
thus favouring Suri & McCoy’s side of the debate. Of course, this finding is also in
line with other work showing that subordinate clauses in final position tend to present
background information (Tomlin 1985, Matthiesen & Thompson 1987).
3.3 Candidate referents in the same clause
In our corpus of 100 cases of ambiguous he, 70 cases had the candidate referents in
the same finite clause. From now on, we will call this clause the AR-PR utterance. In
this part of the corpus (n=70), we investigated the influence of four factors that we
expected to guide the identification of AR:
1. the grammatical role of the candidate referents in the AR-PR utterance, the
hypothesis being that subject referents will more often be the AR than
non-subject referents;
2. the expression type of both referents, the hypothesis being that pronominal
referents will more often be the AR than nominal referents (noun phrases and
names);
3. the possible backward center status of the referent, expecting that backward
center referents are the AR more often than other referents;
4. the discourse topichood of both referents, expecting that ARs are more often
the global discourse topic than their competitors.
Most of these factors have already been defined above, with the exception of
discourse topichood, which will be discussed in 3.3.4.
3.3.1 Grammatical role
In our corpus, grammatical role always discriminates between candidate referents.
That is, we never find conjoined referents like the ones in the following fragment
(Givón 1992, 14):
(18)
*I saw Joe and Billy. He didn’t look well.
The distribution of grammatical roles over the candidate referents is summarized in
Table 3.
                         Actual referent    Potential referent
Subject                  53 (76%)           11 (16%)
(In)direct object         8 (11%)            5 (7%)
Prepositional phrase*     3 (4%)            52 (74%)
Other                     6 (8%)             2 (3%)
Table 3. Grammatical role of ARs and PRs in the AR-PR utterance.
* This category includes both prepositional phrases functioning as arguments in
the predicate frame and prepositional phrases functioning as adverbials.
Table 3 shows that the grammatical role hypothesis is supported: ARs are often realized
as subjects, while their competitors only rarely appear as subjects (Chi2= 50.77, df=1,
p=.000 for the two-by-two comparison between subjects versus other roles by actual
versus PRs). Instead, PRs are strikingly often realized as prepositional phrases (that
may function both as oblique arguments of the predicate and as adverbials). In 47
(67%) of the cases a subject-AR was combined with a prepositional phrase-PR. Since
prepositional phrases are lowest in the syntactic constituent hierarchy proposed in
centering theory, this implies that in most cases the difference in syntactic prominence
between actual and potential referents is maximal. (19) is a prototypical example:
(19) D Kalff was de logische opvolger van Hazelhoff. Hij maakt al 17 jaar deel uit
van de top, eerst bij de ABN en na de fusie ook bij ABN Amro.
E Kalff was the obvious successor of Hazelhoff. He was part of the top
management for 17 years already, first within the ABN and after the merger
also within ABN Amro.
The importance of grammatical role can be illustrated further by looking at cases in
which both referents are realized in identical expression types. Our corpus contained
thirty of these cases, four of which could not be used because the subject of the
preceding utterance did not refer to a person. Out of the 26 remaining cases, AR was
the subject of the preceding utterance in 24 cases (92%).
Now it could be argued that the preference for subject referents does not stem from
the grammatical function hierarchy but from grammatical parallelism: subject
pronouns prefer subject antecedents. If that were the case, object pronouns should
behave differently from subject pronouns. In order to check this possibility, we
collected 10 cases of ambiguous him. In all of these cases, him refers to the subject
referent of the preceding utterance, as in the following fragment:
(20)D Scholma is vertrouwd met de rol van secondant. Hij deed het in 1976 voor de
eerste keer ten behoeve van Sijbrands. Wiersma heeft hem gekozen omdat
hun ideeën over het damspel overeen komen.
E Scholma is familiar with the role of second. He did it first in 1976 for
Sijbrands. Wiersma chose him because they have the same view on draughts.
The subject preference is found for both subject and object pronouns. Therefore,
grammatical role seems to be the factor that guides the search for antecedents, rather
than grammatical parallelism.
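
The two-by-two comparison of subjects versus other roles reported for Table 3 can be recomputed from the cell counts. The following is a minimal Python sketch, assuming an uncorrected Pearson chi-square (an assumption on our part, although it does reproduce the reported value). The function name chi_square_2x2 is hypothetical; the counts for other roles (17 and 59) are obtained by subtracting the subject counts (53 ARs, 11 PRs) from the 70 cases.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: actual referents, potential referents.
# Columns: realized as subject, realized in another role.
statistic = chi_square_2x2(53, 17, 11, 59)
print(round(statistic, 2))  # 50.77
```

With one degree of freedom, a value of this size lies far beyond the conventional .05 criterion (3.84).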
3.3.2 Expression type
A second feature that could affect the interpretation of ambiguous pronouns is the
expression type used to refer to AR and PR: do they appear as pronouns (he, him),
Names (e.g. Kalff) or NP’s (e.g. a / the dismissed professor)?
            Actual referent    Potential referent
Pronoun     27 (39%)            5 (7%)
Name        38 (54%)           50 (72%)
NP           5 (7%)            15 (21%)
Table 4. Expression type for actual vs. potential referents in AR-PR utterances
Table 4 shows that the expression type hypothesis is supported: AR regularly occurs
as a pronoun, while its competitor only rarely does so (Chi2= 21.76, df=2, p=.000).
Note that the proportion of pronouns among ARs is not overwhelming. Also, we need
to add that expression type often does not discriminate between the referents. A
particularly regular configuration (n= 34; 48.5%) is the one in which both referents
are realized by names (sometimes accompanied by introductory NPs).
Although expression type does not appear to be a strong predictor of
continuation in itself, it might perform a secondary role, in that it adds weight to mark
the prominence of non-subject referents. Hence we checked whether non-subject ARs
are more often realized as pronouns than subject ARs. There is no quantitative support
for an interaction between grammatical role and expression type of the antecedents for
ambiguous pronouns.
3.3.3 Backward center status
Grammatical role and expression type are ‘local’ determinants of prominence, in that
they only pertain to the utterance with the candidate referents. We will now go one
step further back in the preceding discourse and examine the relation between the
AR-PR utterance and the utterance preceding it.
This relation determines the kind of Centering transitions realized by the
ambiguous pronoun utterance following the AR-PR utterance. For both actual and
potential antecedents, we coded whether or not they were the backward center of the
AR-PR utterance. Centering Theory predicts that actual antecedents of pronouns will
preferably be the backward center in their utterance, since in that case the pronoun
utterance meets the conditions for a Continue transition:
• the he-referent is generally the CB of the pronoun utterance
• it is identical to the CB of the AR-PR utterance
• the he-referent is also the preferred forward center, being in subject position.
When the pronoun antecedent is not the backward center in its utterance, there are two
possibilities. First, not the actual but the potential antecedent may be the backward
center in the preceding utterance. In that case, the pronoun utterance realizes a Smooth
Shift, as in fragment (9) above. Second, neither antecedent can be a backward center.
This may be the case when the two participants are both new to the discourse, or when
they are both mentioned in the same preceding utterance in the same syntactic role
and have the same expression type in the AR-PR utterance. In the last case, we
followed the centering theory assumption that there cannot be two backward centers.
When neither referent is a backward center, no predictions on pronoun resolution can
be derived from transition types.
When coding referents as backward centers, we adopted a liberal definition: a referent
could not only be a backward center when being mentioned in the preceding
utterance, but also when being mentioned two utterances earlier, provided of course
that the competitor has not been mentioned in the mean time. For instance, in (21) we
consider the referent Al in utterance (c) to be a backward center by virtue of its being
mentioned two utterances back:
(21)
a. George and Al are brothers, but they have a totally different attitude to life.
b. George spends all his spare time reading.
c. Al goes out hiking every weekend with his son Peter.
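
The liberal coding rule can be made explicit in a few lines of Python. This is only a sketch of the eligibility check under simplifying assumptions: each utterance is represented as the set of referents it mentions, the function name is_backward_center is hypothetical, and the special case in which both candidates are equally salient (so that neither is coded as CB) is not handled.

```python
def is_backward_center(utterances, index, referent, competitor):
    """Liberal CB eligibility of `referent` in utterances[index].

    utterances: list of sets, one set of mentioned referents per utterance.
    competitor: the other candidate referent in utterances[index].
    """
    # Standard case: the referent was mentioned in the preceding utterance.
    if index >= 1 and referent in utterances[index - 1]:
        return True
    # Liberal case: mentioned two utterances back, and the competitor
    # has not been mentioned in the intervening utterance.
    if index >= 2 and referent in utterances[index - 2]:
        return competitor not in utterances[index - 1]
    return False


# Example (21): a = {George, Al}, b = {George}, c = {Al, Peter}.
example_21 = [{"George", "Al"}, {"George"}, {"Al", "Peter"}]
print(is_backward_center(example_21, 2, "Al", "Peter"))     # True
print(is_backward_center(example_21, 2, "Peter", "Al"))     # False
```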
Table 5 shows that 71% of the pronoun utterances realize Continue transitions (with
AR as the backward center), while the proportion of Smooth Shifts (with PR being the
backward center) is about 10%; thus the backward center hypothesis is clearly
supported (Chi2= 32.44, df=1, p=.000 for the contrast between AR centers and PR
centers). The proportion of AR-centers is much higher than the proportion of
pronominal ARs (39%, see Table 4), which means that backward centers are not
necessarily realized as pronouns. For a much larger corpus than ours, Poesio et al.
(2004, 332 ff.) report that no more than about half of the CBs in their corpus are
pronominalized. Hence backward center status seems a stronger predictor for AR
status than expression type. At the same time, its predictive power is restricted by the
fact that there is no clear single CB referent in 18% of the cases.
CB configuration                         Totals
AR = CB (= Continue transition)          50 (71%)
PR = CB (= Smooth Shift transition)       7 (10%)
Both referents are equally salient        3 (4%)
Both referents are new                   10 (14%)
Total                                    70
Table 5. CB status for actual and potential referents
There is no difference between the CB configurations of subject and non-subject ARs.
That is, non-subject AR and subject AR-cases display the general preference for
Continue transitions to the same extent. This means that transition type does not
‘compensate’ for the irregularity of non-subject continuations: it does not make it any
easier to process these continuations.
3.3.4 Discourse topichood
The global factor of discourse topichood has not received a conventional definition so
far. Givón (1992: 16-17) mentions a forward-looking measure, topic persistence (the
number of times the referent persists as argument in a certain number of subsequent
clauses), as well as the overall frequency of a referent in the entire discourse. For our
purposes, we needed a backward looking measure that provides a global, but still
locally relevant level of importance for a particular referent. We chose to count the
number of occurrences of a referent in the 5 finite clauses preceding the AR-PR
utterance. Hence our measure of topichood – to which we will refer as the Discourse
Topic Score (DTS) - ranges from 0 to 5. An additional reason to restrict the relevant
context to 5 clauses was that most newspaper texts (unlike narrative texts) do not
continue to refer to central referents for more than one or two paragraphs. Chains of
10 or more references are hard to find in our corpus.
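
The Discourse Topic Score can be stated as a small counting procedure. The Python sketch below is an illustration only: it assumes that each finite clause is represented as the set of referents it mentions and that a referent is counted at most once per clause, which is consistent with the 0 to 5 range. The function name and the sample clauses are hypothetical.

```python
def discourse_topic_score(preceding_clauses, referent, window=5):
    """Count the clauses mentioning `referent` among the last `window` clauses.

    preceding_clauses: list of sets, ordered from earliest to latest, holding
    the referents mentioned in each finite clause before the AR-PR utterance.
    """
    recent = preceding_clauses[-window:]
    return sum(1 for clause in recent if referent in clause)


# Hypothetical context of six clauses; only the last five are scored.
context = [{"A"}, {"A", "B"}, set(), {"A"}, {"A"}, set()]
print(discourse_topic_score(context, "A"))  # 3
print(discourse_topic_score(context, "B"))  # 1
```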
First, we determined whether both referents had been mentioned at all in the
preceding text. We found that, prior to the AR-PR utterance, 67 % of the PRs had not
been mentioned at all in the text, compared to only 17 % for the ARs (Chi2= 37.80,
df=1, p=.000).
ARs had a mean Discourse Topic Score of 2.27 (SD 1.74), compared to 0.79 (SD
1.36) for PRs. That is, our expectation that ARs would be significantly 'older'
than PRs was supported (t=6.19, df=68, p=.000). For instance, the following fragment (from
1994) is clearly about the soccer player Vink, mentioned 4 times, not about his coach,
who is only mentioned once:
(22) D (From a story about a Dutch soccer player whose career is going down after he
left Ajax.)
Uitgerust met een radio en een totoformulier slaat Marciano Vink (23)
momenteel de wedstrijden van Genua gade. Hij is fit, maar (Ø) speelt niet.
Trainer Scoglio van de Italiaanse nummer veertien denkt middenvelder Vink
niet nodig te hebben. Zo goed als mogelijk is na een aantal succesvolle
seizoenen, probeert hij Ajax zo min mogelijk te koesteren: `Voetbal is een
leven van komen en gaan.'
E Equipped with a radio and a pool coupon Marciano Vink (23) is watching the
matches of Genoa. He is in condition, but (Ø) does not play. Coach Scoglio of
the Italian number 14 thinks he does not need midfielder Vink. Although this
is difficult after a number of successful years, he tries not to cherish his
memories to his Ajax years too much: “Soccer is a life of coming and going.”
Nevertheless, there was a considerable number of cases in which the two referents
were equally old (n =12), or in which the PR was even older (n =13). Hence the DTS
of the referents does not entirely determine the pronominal referent. As an example
consider (23), in which the AR only appears once, while the PR has been mentioned
three times:
(23) D (Dietz has published a poetry anthology, the introduction of which has been
criticized for quoting poems by Reve and Faverey without permission:)
Volgens Dietz passen de citaten in de `context van de inleiding' en zijn ze
gebruikt `om redenen van representativiteit', maar of het correct is, `daarover
kan men van mening verschillen.' Dietz heeft nog geen protest gehoord van
Reve, maar de directeur van De Bezige Bij, A. Vorster, heeft zich al wel bij
hem gemeld.
Hij wil een gesprek over het citeren van Faverey.
E According to Dietz the quotes ‘fit into the context of the introduction’ and
have been used ‘for reasons of representativeness’, but one may disagree about
whether their use has been entirely ‘correct’. Dietz did not receive any protests
from Reve yet, but the Bezige Bij-director, A. Vorster, already contacted
him.
He demanded a meeting on the quotes from Faverey.
Let us now look at the possible interaction of DTS with grammatical role. If the
two factors complement each other, we might expect that non-subject ARs
need to be “extra heavy” in terms of global prominence. Hence, in cases with a
non-subject AR, the difference in DTS between the candidate referents should be larger
than in cases with a subject AR. To test this possibility, a repeated-measures ANOVA
was carried out on DTS as the dependent variable with referent status (actual or
potential) as within-variable and the role of the AR as the between-variable. The
results are shown in Table 6.
                        DTS for AR     DTS for PR
Subject AR-cases        1.98 (1.62)    .87 (1.45)
Non-subject AR-cases    3.18 (1.81)    .53 (1.01)
Totals                  2.27 (1.74)    .79 (1.36)
Table 6. The interaction of grammatical role of AR (Subject or not) and discourse
topic score (DTS) for AR and PR
The main effect of referent status (F [1,67] = 39.76, p = .000) on DTS is qualified by a
significant interaction between referent status and grammatical role of the AR (F
[1,67] = 6.62, p = .012): the difference in DTS between actual and potential referents
is larger for non-subject referents than it is for subject referents, mainly because the
DTS for non-subject ARs is higher than that for subject ARs (t = 2.57, df = 68, p =
.012). In other words, when a non-subject becomes the AR, it is exceptionally well
established in the previous discourse.
We can look at the same phenomenon from a different point of view by
comparing DT scores within cases. In Table 7 we distinguish between three kinds of
cases: the AR is ‘older’ than the PR, the PR is older than the AR, or the two are
equally old. The frequencies show that non-subject ARs are almost always older than
their competitors (88%), while for subject ARs only 57% is older (Chi2= 5.61, df=1,
p=.018 for this 2x2 comparison). Alternatively, one may say that PR may be older for
subject AR-cases (24%), while this is impossible for non-subject AR-cases (Chi2=
5.12, df=1, p=.024 for this comparison); in other words, a subject referent can beat an
older competitor in terms of forward prominence, a non-subject referent cannot.
                         Subject AR-cases   Non-subject AR-cases   Totals
AR is older              30 (57%)           15 (88%)               45
AR and PR equally old    10 (19%)            2 (12%)               12
PR is older              13 (24%)            0                     13
Totals                   53                 17                     70
Table 7. DTS-configurations within fragments for subject and non-subject AR-cases
These results show that non-subject ARs are well established as the discourse topic in
the 5 clauses preceding the AR-PR utterance. This indicates that grammatical factors
and discourse topichood may work together in suggesting plausible antecedents for
ambiguous pronouns. ARs are either prominent because they are the subject of the
preceding utterance (grammatical role), or because they are mentioned more often in
the preceding discourse (discourse topichood). The resulting parsing principle could
be this: in case of an ambiguous pronoun, the intended referent is the subject referent
of the preceding utterance, except when another referent is well established as the
Discourse Topic; in that case, the situation is undecided.
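
The proposed principle can be written down as a simple decision rule. The Python sketch below is purely illustrative: the text does not quantify when a referent counts as well established, so the cut-off topic_threshold (here a DTS of 3 or more) is a hypothetical parameter, as are the function name and the sample scores.

```python
def resolve_ambiguous_pronoun(subject_referent, other_referent, dts,
                              topic_threshold=3):
    """Return the predicted referent, or None when the situation is undecided.

    dts: mapping from referent to its Discourse Topic Score (0 to 5).
    topic_threshold: hypothetical cut-off for being 'well established'.
    """
    # Exception: the non-subject referent is well established as Discourse Topic.
    if dts.get(other_referent, 0) >= topic_threshold:
        return None
    # Default: the subject referent of the preceding utterance.
    return subject_referent


print(resolve_ambiguous_pronoun("A", "B", {"A": 2, "B": 0}))  # A
print(resolve_ambiguous_pronoun("A", "B", {"A": 0, "B": 4}))  # None
```

Returning None rather than the non-subject referent reflects the wording of the principle, which leaves such cases undecided.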
3.3.5 Do the predictors contribute independently?
We have now discussed four predictors for AR-status: grammatical role, expression
type, backward center status and DTS. These variables are significantly correlated: for
instance, subjects are more often CB (r = .39, p < .01), pronouns are more often CB (r
= .49, p < .01), and CB referents have a higher DTS (r = .50, p < .01). We now need
to determine whether the variables contribute independently, or whether the
contribution of one variable is cancelled out by the other. We therefore used a
statistical method to compare several predictors in one and the same analysis: logistic
regression (see Van Hout & Rietveld 1993 for an introduction). This is a technique for
estimating to what extent the likelihood of occurrence of a certain event (in this case:
being the intended referent of the ambiguous pronoun) is affected by an independent
variable. Variables may be combined into models; different models including
different sets of predictors may also be compared to others in terms of their overall
performance. We determined the optimal model by successively adding new
predictors to the model, starting with a model that only includes a constant. First we
added grammatical role, then expression type, then backward center status and finally
DTS.
We used two criteria to accept or reject new predictors:
- the model performance, as measured by the model likelihood ratio, has to
improve significantly when adding a new predictor
- the new predictor itself needs to be significant at the .05 level, as indicated by
the Wald statistic of its B coefficient.
If these criteria are not met, a predictor is dropped from the model.
In this case, all predictors except for expression type were needed for the optimal
model (see Table 8). That is, grammatical role, CB-status and Discourse Topic Score
independently contribute to the chance of resumption as a referent. The model with
three variables explains a fair share of the variance, as indicated by its Nagelkerke R
square.
                                     Constant model                  Optimal model
Variables                            B (S.E.)      Wald   p-value    B (S.E.)      Wald    p-value
Constant                             .000 (.169)   .000   1.000      -2.98 (.55)   29.21   .000
Grammatical role (subject yes/no)                                    3.14 (.60)    27.29   .000
Backward center status (yes/no)                                      2.26 (.60)    14.07   .000
Discourse Topic Score (0-5)                                          .52 (.18)     8.83    .003
Model log-likelihood
(df for the model)                   194.08 (1)                      94.45 (4)
Log-likelihood improvement
(df for the comparison)                                              99.63 (3)             .000
Nagelkerke R square                                                  .679
Table 8. Logistic regression analysis of factors predicting whether a referent is the
intended referent of the pronoun in the next utterance
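
To illustrate what the coefficients of the optimal model imply, they can be converted into predicted probabilities with the logistic function. The Python sketch below is an illustration under assumptions: the yes/no predictors are taken to be coded 1/0, the function name is hypothetical, and the two referent profiles are invented examples rather than corpus cases.

```python
import math

# B coefficients of the optimal model as reported in Table 8.
COEFFICIENTS = {
    "constant": -2.98,
    "subject": 3.14,
    "backward_center": 2.26,
    "dts": 0.52,
}


def probability_of_being_ar(subject, backward_center, dts):
    """Predicted probability that a referent is the intended referent.

    subject, backward_center: booleans (assumed to be coded 1/0).
    dts: Discourse Topic Score, an integer from 0 to 5.
    """
    logit = (COEFFICIENTS["constant"]
             + COEFFICIENTS["subject"] * int(subject)
             + COEFFICIENTS["backward_center"] * int(backward_center)
             + COEFFICIENTS["dts"] * dts)
    return 1.0 / (1.0 + math.exp(-logit))


# A subject that is also the backward center, with a DTS of 2.
print(round(probability_of_being_ar(True, True, 2), 2))    # 0.97
# A non-subject, non-CB referent that is new to the discourse.
print(round(probability_of_being_ar(False, False, 0), 2))  # 0.05

# The model improvement is the difference between the two log-likelihood values.
print(round(194.08 - 94.45, 2))  # 99.63
```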
In sum, this first study provides support for local as well as contextual determinants of
forward prominence. With regard to local factors, we found a subject preference; with
regard to the local context, we found a preference for Continue transitions. The last
finding is newsworthy by itself, given the skepticism about the role of transition
preferences in pronoun resolution voiced by Kehler (1997) and Tetreault (2001).
Furthermore, it is clear that the global contextual factor of discourse topichood plays a
significant role too. Especially interesting is its interaction with grammatical role,
which points to a complementary relation between a local factor and a global
contextual factor.
In the final logistic regression analysis, we found no effects for expression
type; the crucial factor is probably backward center status, not expression type by
itself. This result seems to favor the standard centering theory preference rules for
transitions over the expression type hierarchy proposed by Kameyama (1996).
4. A study on continuations following subject-object utterances
4.1 Why we used a continuation corpus
The size of our corpus in the first study was modest. Moreover, the situation of
ambiguous pronouns is rather special in that speakers using such pronouns seem to be
confident that one of the referents in the AR-PR utterance is clearly more prominent
than the other. In order to examine the determinants of forward prominence in a larger
corpus of utterances in which the referential prominence configuration seems less
clear-cut, we assembled a corpus of utterances presenting two personal referents, one
of them in subject and the other in object position, at least one of which is referred to
in subsequent discourse.
The general assumption behind this study was already discussed in the
introduction: writers and readers need to coordinate their individual models of referent
accessibility in the discourse, so that readers are prepared for what comes next in the
discourse. We would expect then that the actual continuation of the discourse can, to a
degree, be predicted on the basis of characteristics of the discourse so far. This
reasoning is not only relevant for processing ambiguous pronouns, but also applies to
every discourse situation in which several referents have been introduced.
Now, in this larger corpus, we may distinguish between two aspects of
continuation. The first question is whether a referent reappears or not, and the second
one is in what expression type the referent reappears. The first rule in centering theory
states that, when subsequent utterances contain pronouns, CB will be realized by one
of them; this is meant to capture the intuition that pronominalization is a regular way
to indicate backward saliency of a discourse referent.
4.2 Description of the corpus
We gathered a new corpus of 200 newspaper fragments, again from De Volkskrant,
which consists of two parts:
1. The first 100 fragments were collected on the basis of (indirect) object
pronouns: the central utterance contained an occurrence of him and a subject
constituent referring to an identifiable, single third person;
2. Another 100 fragments were collected on the basis of subject pronouns: the
central utterance contained he and an (indirect) object constituent referring to
an identifiable, single third person.
This sampling procedure was chosen for two reasons. First, utterances containing two
single human referents are not very common, and searching for personal pronouns is a
way to computerize at least part of the sampling process. Second and importantly,
using both subject and object pronouns as search string promises a corpus in which
subject and object referents are comparable in terms of expression type, a crucial
requirement for our research in which we hope to disentangle different determinants
of forward prominence.
We restrict our sampling to human referents in (in)direct object position; in
Dutch, indirect objects can generally be realized without prepositions. That is, we
exclude person referents in prepositional phrases containing arguments of the clause
predicate (sometimes called ‘object-of-PP’). It may be that object participants are
more prominent than the PRs in the first study, most of which were in prepositional
phrases.
Furthermore, the following conditions were used when selecting corpus fragments:
• the central utterance was not a subordinate clause, because the first corpus
study showed (section 3.2) that the prominence of referents in a subordinate
clause is ‘down-graded’ in comparison to those in the main clause;
• in order to determine the relative prominence of both referents in the
preceding text, the central utterance had to be preceded by minimally five
clauses in which both referents could appear;
• in order to determine the relative prominence of both referents by their
re-occurrence, only those fragments were selected in which at least one of the
referents re-appears.
Fragment (24) illustrates an object pronoun fragment, (25) illustrates a subject
pronoun fragment. CU marks the central utterance.
(24)
(At the World Cup swimming tournament, the Russian Popov has set a new
world record for the 100 meters free style.
D
[1] Een Nederlands succes was er voor Ron Dekker [2] die zegevierde op zowel
de 50 als 100 meter schoolslag. [3] Popov ontbrak onlangs bij de eerste
wereldkampioenschappen op de 25 meter- baan, gehouden in Palma de Mallorca.
[4] Hij bereidde zich in Australië voor op het nieuwe seizoen. [5] Vorig jaar
werd zijn `oude' Russische trainer Toeretski ingehuurd door het Australische
Sportinstituut van Canberra
CU: en Popov volgde hem weldra.
De tweevoudig Olympisch kampioen (50 en 100 vrij) was in het Kowloon
Park-bad van Hongkong elf honderdste seconde sneller dan de Braziliaan
Gustavo Borges vorig jaar juli in Santos (47,94).
E
[1] A Dutch success was the performance of Ron Dekker [2] who won both the 50
and the 100 meter breast stroke. [3] Popov was recently absent at the first world
championships on a 25 meter-track, held in Palma de Mallorca. [4] He prepared
himself for the new season in Australia. [5] Last year his ‘old’ Russian coach
Toeretski was hired by the Australian Sport Institute of Canberra
CU: and Popov followed him soon.
In the Kowloon Park pool of Hong Kong, the two-time Olympic champion (50 and
100 free style) was faster by eleven hundredths of a second than the Brazilian
Gustavo Borges last year in Santos (47,94).
(25)
D
(It is their first, stammering telephone call.)
[1] Zij heeft een 06-advertentie gezet, [2] hij reageert. [3] Gaandeweg ontstaat
er contact. [4] Met niet meer dan een stem bouwen ze een relatie op via de
telefoon. [5] De spelregels liggen vast.
CU: Hij belt haar nooit,
zij belt hem. Zo krijgen ze een verhouding die alle fases doorloopt.
E
[1] She has placed a 0900-ad in the paper, [2] he responds. [3] Gradually a
connection between them evolves. [4] With no more than their voices, they
build a relationship by phone. [5] The rules of the game are set.
CU: He (subject pronoun) never calls her (object),
she calls him. In this way, they develop an affair that passes through all
phases.
As these fragments show, the central utterance may be coordinated with preceding or
following clauses. In the central utterance of some of these coordinations, the subject
is zero coded (objects cannot be elided in Dutch in this context). We decided to
include these fragments because ellipsis might be a characteristic feature of the
subject role that directly touches on the issue of forward activation that is under
scrutiny here.
Apart from ellipsis, the subjects and objects were realized in more or less the
same expression types in the corpus:
              Subject referents   Object referents
Ellipsis                      8
Pronoun                     122                122
Names                        54                 53
Full NP                      16                 25
Total                       200                200
Table 8. Subject and object expression types in the continuation corpus
We need to note that our sampling method led to a bias in the expression type
configurations, since all our central utterances contained at least one pronoun. That
is, our sample does not include fragments with central utterances in which both
referents are realized as names or full NPs (see table 9).
                 Object ET:
Subject ET:      Pronoun   Name   Full NP   Total
Ellipsis               8                        8
Pronoun               44     53        25     122
Name                  54                       54
Full NP               16                       16
Total                122     53        25     200
Table 9. How subject and object expression types are combined
Subject referents and object referents do not differ in their discourse topic scores
(M = 2.08, SD 1.57 for subject referents; M = 2.03, SD 1.58 for object referents). The
diversity of predicate types that we found in the corpus is such that we can conclude
that there is no particular thematic role bias in the corpus.
The analysis proceeded as follows. Starting from the central utterance, we checked for
each fragment whether one or both referents reappeared, and if so, whether they were
both mentioned in the same utterance, and in what grammatical role and what kind of
expression type. In addition, we looked at the discourse topic score, that is the number
of times a referent was mentioned in the 5 preceding clauses. For instance, in (24)
only the subject referent recurs in the first continuation utterance, in subject position
and in NP-form. The discourse topic scores are 3 for the subject referent (Popov) and
1 for the object referent (his trainer). In (25), both referents recur as pronouns in the
first continuation utterance. The topic score is 2 for both referents, counting the plural
they in utterance [4] as a reference for both referents.
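The scoring of examples (24) and (25) can be illustrated with a minimal Python sketch. It is not the coding procedure actually used in the study. It assumes that each preceding clause is represented as the set of referents it mentions, that a referent counts at most once per clause, and that the referent labels (including "she" and "he") are purely illustrative.

```python
def discourse_topic_score(preceding_clauses, referent, window=5):
    """Count how many of the last `window` clauses mention `referent`.

    Assumption: each clause is a set of referent labels, so a referent
    counts at most once per clause (score range 0-5 for window=5).
    """
    recent = preceding_clauses[-window:]
    return sum(1 for mentioned in recent if referent in mentioned)


# Clauses [1]-[5] of example (24); labels are hypothetical annotations.
fragment_24 = [
    {"Dekker"},              # [1]
    {"Dekker"},              # [2] relative pronoun 'die'
    {"Popov"},               # [3]
    {"Popov"},               # [4] 'Hij'
    {"Popov", "Toeretski"},  # [5] 'zijn ... trainer Toeretski'
]

# Clauses [1]-[5] of example (25); plural 'ze' in [4] counts for both.
fragment_25 = [{"she"}, {"he"}, set(), {"she", "he"}, set()]

popov_score = discourse_topic_score(fragment_24, "Popov")        # 3
trainer_score = discourse_topic_score(fragment_24, "Toeretski")  # 1
she_score = discourse_topic_score(fragment_25, "she")            # 2
he_score = discourse_topic_score(fragment_25, "he")              # 2
```

Under these assumptions the sketch reproduces the scores given in the text: 3 and 1 for (24), and 2 for both referents in (25).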
4.3 Results 1: Does the referent reappear in subsequent text?
4.3.1 Role effects
First we discuss whether grammatical role affects the probability of being mentioned
again in subsequent utterances. This continued reference can be considered in several
ways.
• First, we may ask absolute questions for both referents in isolation: does the
referent recur somewhere in the rest of the text or not, and if yes, in what
utterance, what role and what form?
• Second, we may ask relative questions regarding continuation configurations
concerning the two referents in a particular fragment. For instance, how often
do both referents recur and how often does the subject referent recur while the
object referent does not? And is the subject referent more often the first to be
referred to again (the object referent only being mentioned in a later
utterance)?
Let us take a first look at the different possibilities of continuation. Starting from the
central utterance, we will define the ‘next utterance’ as the first subsequent clause
presenting either the subject or object referent again. Most often this is the
immediately following utterance. The next utterance may only present the subject
participant, the subject and object participant together, or may present only the object
participant. A ‘subject-only’ utterance may be followed by a later utterance
reintroducing the object participant, or this participant may not return at all; the same
holds, mutatis mutandis, for an ‘object-only’ utterance. Figure 1 presents the numbers
of observations in these categories:
central utterance: subject + object
    only SR in next clause (n = 71)
        OR returns later (n = 19)
        OR does not return (n = 52)
    SR + OR in next clause (n = 54)
    only OR in next clause (n = 75)
        SR returns later (n = 38)
        SR does not return (n = 37)
Figure 1. Overview of the options for continuation following the central utterance.
SR = subject referent, OR = object referent
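The definition of the ‘next utterance’ and the three continuation types can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the annotation procedure used in the study: clauses following the central utterance are represented as sets of referent labels, and the function name and labels are hypothetical.

```python
def classify_continuation(following_clauses, subject, obj):
    """Classify a fragment by its 'next utterance'.

    The next utterance is the first following clause that mentions the
    subject referent, the object referent, or both. Returns one of
    'subject-only', 'both', 'object-only', or None if neither returns.
    """
    for mentioned in following_clauses:
        has_subject = subject in mentioned
        has_object = obj in mentioned
        if has_subject and has_object:
            return "both"
        if has_subject:
            return "subject-only"
        if has_object:
            return "object-only"
    return None


# Example (24): the continuation mentions Popov (and Borges), not Toeretski.
type_24 = classify_continuation([{"Popov", "Borges"}], "Popov", "Toeretski")

# Example (25): 'zij belt hem' mentions both referents.
type_25 = classify_continuation([{"she", "he"}, {"she", "he"}], "he", "she")

# Hypothetical fragment: an intervening clause mentions neither referent.
type_x = classify_continuation([set(), {"B"}], "A", "B")
```

Clauses that mention neither referent are skipped, which corresponds to the remark that the next utterance is usually, but not always, the immediately following one.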
We can now compare the frequencies for subject and object participants. As for the
absolute continuation issue, both referents reappear about equally often, see Table 10;
neither is there any difference in the position at which the referents are resumed: both
subject and object referents overwhelmingly reappear in the first or second utterance
following the central utterance, see Table 11.
                        Subject continued?
Object continued?       Yes            No            Totals
Yes                     111            37            148 (74%)
No                      52             0             52 (26%)
Totals                  163 (81.5%)    37 (18.5%)    200
Table 10. Whether subject and object referents return in subsequent discourse
                                    Subject referent   Object referent
Resumed in utterance 1                           100               106
Resumed in utterance 2                            38                25
Resumed in subsequent utterances                  25                23
Totals                                           163               148
Table 11. In what utterance subject and object referents are being referred to again
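The cell counts of Table 10 follow arithmetically from the categories in Figure 1. The short Python sketch below makes that bookkeeping explicit; the variable names are ours and serve only as illustration.

```python
# Counts taken from Figure 1.
only_sr_next = {"or_returns_later": 19, "or_does_not_return": 52}
both_next = 54
only_or_next = {"sr_returns_later": 38, "sr_does_not_return": 37}

# Both referents return: together in the next clause, or one of them later.
both_continued = (both_next
                  + only_sr_next["or_returns_later"]
                  + only_or_next["sr_returns_later"])

subject_only_continued = only_sr_next["or_does_not_return"]
object_only_continued = only_or_next["sr_does_not_return"]

subject_continued = both_continued + subject_only_continued
object_continued = both_continued + object_only_continued
total_fragments = both_continued + subject_only_continued + object_only_continued

print(both_continued, subject_continued, object_continued, total_fragments)
# prints: 111 163 148 200
```

The result matches the marginal totals of Table 10: 163 continued subject referents and 148 continued object referents out of 200 fragments.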
4.3.2 Effects of expression type, backward center status and discourse topichood
Subject and object referents reappear about equally often in subsequent text. What
other factors co-determine their reappearances? As in the previous study, we
examined three such factors: expression type, backward center status and discourse
topichood.
Again, discourse topichood was defined in terms of the number of times a
referent has been mentioned in the 5 clauses preceding the central utterance: the
Discourse Topic Score (DTS). Expression type was dichotomized into two values:
pronoun (coded 1) or other (coded 0). Backward center status was defined by asking
whether a referent was the most recently mentioned one in the preceding utterances,
or the most prominent one in case both referents came from the same utterance. For
instance, when the object referent had been mentioned in utterance –1 and the subject
referent had been mentioned in utterance –2 or had not been mentioned in the pretext
at all, the object referent was coded as being the backward center (coded 1) of the
central utterance and the subject referent was coded as a non-backward center (coded
0). When the two referents occurred in the same preceding utterance, Centering
Theory requires the analyst to identify the highest ranked referent which returns in the
central utterance, with the additional rule that when the central utterance contains
pronouns, one of them must be the CB. Generally, this leads to assigning CB status to
the subject referent of the previous utterance. For a few cases, the centering rules do
not produce a CB, e.g. when none of the participants is the previous subject referent
and when both are pronominalized in the central utterance.
Besides the conventional CB identification procedure we also tested a second,
less detailed measure of backward saliency. This measure only considers a referent as
salient when it is mentioned in a more recent utterance than its competitor. It does not
discriminate between referents from the same previous utterance. This crude recency
measure is tested too, in order to see whether the more specific notion of CB is a
stronger predictor.
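The crude recency measure can be sketched as follows. This is a minimal illustration, assuming that the last mention of each referent is recorded as a distance in utterances back from the central utterance (1 for utterance –1, 2 for utterance –2, and None when the referent does not occur in the pretext). The function name and this encoding are assumptions, not part of the original coding scheme.

```python
import math


def recency_codes(subject_distance, object_distance):
    """Return (subject_code, object_code) for the crude recency measure.

    A referent is coded 1 only if it was mentioned in a more recent
    utterance than its competitor. Referents last mentioned in the same
    utterance (or both absent from the pretext) are both coded 0.
    """
    s = math.inf if subject_distance is None else subject_distance
    o = math.inf if object_distance is None else object_distance
    if s < o:
        return (1, 0)
    if o < s:
        return (0, 1)
    return (0, 0)


# Object referent in utterance -1, subject referent in utterance -2.
codes_a = recency_codes(2, 1)      # (0, 1)
# Subject referent not mentioned in the pretext at all.
codes_b = recency_codes(None, 1)   # (0, 1)
# Both referents last mentioned in the same utterance: no discrimination.
codes_c = recency_codes(1, 1)      # (0, 0)
```

Unlike the CB procedure, this measure deliberately returns no winner when both referents come from the same previous utterance.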
We investigated how these independent variables affect two aspects of
referential continuation: does a referent reappear at all (absolute continuation), and is
it the first or the only one to be referred to again (relative continuation).
As expected, the independent variables correlated significantly. For instance,
pronominal referents tend to have a higher DTS (r = .57) and they refer to the most
recently mentioned participants more often (r = .50). Backward centers tend to have
higher DTS scores (r = .32; for all correlations p < .01). In a logistic regression
analysis, we added the local variables grammatical role and expression type, then the
backward center and finally the global factor DTS. If two models are superior to the
others but do not differ significantly in performance, we will report the model with the
lowest log-likelihood.
The analysis was run separately for subject and object referents, since they
may be affected in different ways by the additional determinants of prominence.
Moreover, this offers the possibility of not only using the characteristics of the
referent itself as predictors, but also those of the competitor.
Let us first turn to the question whether or not a referent returns in subsequent
discourse. In general, we may say that already being established in both the local and
global context increases the probability of reoccurrence of a referent. For object
referents, the model containing CB status and DTS was clearly superior to the rest
(see Table 14). For subject referents, we ended up with two models that did not differ
significantly: a model containing expression type and DTS and a model with CB
status and DTS. Since so far CB status proved a stronger predictor than expression
type, we consider the latter model the optimal one for theoretical reasons; this is
therefore the model reported in Table 13.
SUBJECT REFERENT, ABSOLUTE CONTINUATION

                               Constant model                   Optimal model
Independent variables          B coefficient   Wald    p-value  B coefficient   Wald    p-value
                               (S.E.)                           (S.E.)
Constant                       1.48 (.18)      66.31   .000     -.17 (.28)        .36   .548
Backward center (yes/no)                                        2.03 (.60)      11.64   .001
Discourse Topic Score (0-5)                                      .65 (.19)      12.09   .001
Model log-likelihood
  (df for the model)           191.56 (1)                       135.26 (3)
Log-likelihood improvement
  (df for the comparison)                                       56.30 (2)               .000
Nagelkerke R square                                             .398
Table 13. Logistic regression analysis for absolute continuation of subject referents
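The log-likelihood figures in the regression tables can be checked with a few lines of Python. The sketch below rests on two assumptions that are ours, not stated in the text: that the reported "model log-likelihood" is the −2 log-likelihood (deviance), and that the improvement is evaluated as a chi-square statistic with the reported degrees of freedom. The first assumption is supported by the fact that the constant-only values can be reproduced from the counts in Table 10.

```python
import math


def null_deviance(n_yes, n_no):
    """-2 log-likelihood of a constant-only logistic model."""
    n = n_yes + n_no
    p = n_yes / n
    return -2 * (n_yes * math.log(p) + n_no * math.log(1 - p))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with df = 2.

    For df = 2 the chi-square distribution is exponential with mean 2,
    so the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-x / 2)


# Subject referents: 163 continued, 37 not continued (Table 10).
subject_null = null_deviance(163, 37)   # about 191.56, as in Table 13
# Object referents: 148 continued, 52 not continued (Table 10).
object_null = null_deviance(148, 52)    # about 229.22, as in Table 14

# Improvement of the optimal subject model over the constant model.
improvement = 191.56 - 135.26           # 56.30, df = 2
p_value = chi2_sf_df2(improvement)      # far below .001
```

With df = 2 no statistics library is needed; for other degrees of freedom a library routine for the chi-square survival function would be the natural choice.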
OBJECT REFERENT, ABSOLUTE CONTINUATION

                               Constant model                   Optimal model
Independent variables          B coefficient   Wald    p-value  B coefficient   Wald    p-value
                               (S.E.)                           (S.E.)
Constant                       1.05 (.16)      42.10   .000      .05 (.25)        .05   .834
Backward center (yes/no)                                         .97 (.40)       5.79   .016
Discourse Topic Score (0-5)                                      .37 (.13)       8.70   .003
Model log-likelihood
  (df for the model)           229.22 (1)                       204.63 (3)
Log-likelihood improvement
  (df for the comparison)                                       24.59 (2)               .000
Nagelkerke R square                                             .170
Table 14. Logistic regression analysis for absolute continuation of object referents
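To see what the coefficients in Tables 13 and 14 imply, they can be converted into predicted probabilities with the standard logistic function. The Python sketch below is an illustration only. It assumes the usual coding (backward center 1/0, DTS 0-5) and ignores the standard errors, so the outputs are point estimates rather than results reported in the study.

```python
import math


def predicted_probability(constant, b_cb, b_dts, is_cb, dts):
    """Probability of continuation under a logistic regression model."""
    logit = constant + b_cb * is_cb + b_dts * dts
    return 1 / (1 + math.exp(-logit))


# Coefficients of the optimal models (Tables 13 and 14).
SUBJECT = (-0.17, 2.03, 0.65)
OBJECT = (0.05, 0.97, 0.37)

# Subject referent that is neither backward center nor discourse topic.
p_subject_weak = predicted_probability(*SUBJECT, is_cb=0, dts=0)   # about .46
# Same referent when it is the backward center.
p_subject_cb = predicted_probability(*SUBJECT, is_cb=1, dts=0)     # about .87

# Constant-only models recover the overall continuation rates.
p_subject_base = 1 / (1 + math.exp(-1.48))   # about .815 (163 of 200)
p_object_base = 1 / (1 + math.exp(-1.05))    # about .74 (148 of 200)

# Odds ratio associated with backward center status for subject referents.
odds_ratio_cb = math.exp(2.03)               # about 7.6
```

The constant-only probabilities agree with the marginal percentages in Table 10, which suggests the coefficients are on the usual log-odds scale.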
CB status turns out to be a better predictor here than recency alone. The influence of
CB status is illustrated in Tables 15 and 16, showing that CB referents are very likely
to be referred to again, while only two out of every three non-CB referents return in
subsequent discourse. CB status does not differentiate between subject and object
referents.
                    SR is not CB   SR is CB     Totals
SR not continued    33 (37%)       4 (4%)       37
SR continued        56 (63%)       107 (96%)    163
Totals              89             111          200
Table 15. How CB status affects continuation of subject referents
                    OR is not CB   OR is CB     Totals
OR not continued    41 (36%)       11 (13%)     52
OR continued        72 (64%)       76 (87%)     148
Totals              113            87           200
Table 16. How CB status affects continuation of object referents
We need to note that this effect of CB status does not yet constitute empirical support
for the Centering Theoretical claim that Continue transitions are preferred; such
support can only be derived from results regarding the second dimension of
continuation: is a referent the first or the only one to be referred to again? As is shown
in Tables 17 and 18, the predictors for this aspect of continuation cannot be found
in characteristics of the referent itself; instead, the strength of the competitor is
decisive.
For subject referents, the optimal model contained object backward center
status and object DTS (see Table 17). For object referents, the picture is slightly more
complicated. Three models were equally adequate in statistical terms: all models
contained subject DTS, but the second variable was either expression type, backward
recency or backward center status for the subject referent. We report the last model
here, because it fits best with other findings (see Table 18).
SUBJECT REFERENT, RELATIVE CONTINUATION

                               Constant model                   Optimal model
Independent variables          B coefficient   Wald    p-value  B coefficient   Wald    p-value
                               (S.E.)                           (S.E.)
Constant                       -.60            16.33   .000      .50 (.25)       3.88   .049
Object backward center status                                   -.77 (.35)       4.84   .028
Object referent DTS (0-5)                                       -.43 (.12)      13.89   .000
Model log-likelihood
  (df for the model)           260.20 (1)                       229.72 (2)
Log-likelihood improvement
  (df for the comparison)                                       30.48 (1)               .000
Nagelkerke R square                                             .194
Table 17. Logistic regression analysis of factors predicting whether subject referents
will be referred to again earlier than object referents
OBJECT REFERENT, RELATIVE CONTINUATION

                               Constant model                   Optimal model
Independent variables          B coefficient   Wald    p-value  B coefficient   Wald    p-value
                               (S.E.)                           (S.E.)
Constant                       -.55            14.21   .000      .62 (.27)       5.47   .019
Subject backward center status                                  -.98 (.34)       8.33   .004
Subject referent DTS (0-5)                                      -.35 (.11)       9.28   .002
Model log-likelihood
  (df for the model)           262.50 (1)                       231.37 (3)
Log-likelihood improvement
  (df for the comparison)                                       31.13 (2)               .000
Nagelkerke R square                                             .197
Table 18. Logistic regression analysis of factors predicting whether object referents
will be referred to again earlier than subject referents
Table 19 summarizes the four analyses presented so far, showing that the optimal
models invariably contain DTS as a global factor and a factor reflecting the local
context as well. Backward center status is the most reliable local predictor, but it may
be replaced by expression type or recency in some cases:
                            The optimal model contains one of the        And a single
                            local contextual factors:                    global factor:
Continuation variable       Expression   Recency   Backward center      Discourse topic
                            type                   status               score
SR reappears                    +                        +                    +
OR reappears                                             +                    +
SR the first to reappear                                 +                    +
OR the first to reappear        +           +            +                    +
Table 19. Summary of logistic regression analyses in Study 2.
The most consistent predictor emerging from all this is DTS. Tables 20 and 21
illustrate how the probabilities for continuation vary for different DTS values. The
subject referent DTS increases the chance of subject referent continuation and
decreases the chance that the object referent will reappear first (Table 20); object
referent DTS, conversely, increases the chance of object referent continuation and
decreases the chance that the subject referent will reappear first (Table 21).
DTS Subject referent   Subject referent      Object referent the
                       continued at all?     first to appear next?
0 (n=41)                     .49                   .63
1 (n=45)                     .78                   .42
2 (n=32)                     .88                   .40
3 (n=32)                    1.00                   .13
4 (n=39)                     .95                   .26
5 (n=11)                    1.00                   .09
Table 20. How subject referent DTS affects the continuation probabilities
DTS Object referent    Object referent       Subject referent the
                       continued at all?     first to appear next?
0 (n=48)                     .52                   .63
1 (n=33)                     .70                   .45
2 (n=40)                     .85                   .20
3 (n=33)                     .73                   .33
4 (n=36)                     .89                   .19
5 (n=10)                    1.00                   .00
Table 21. How object referent DTS affects the continuation probabilities
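The proportions in Tables 20 and 21 can be related back to the overall continuation counts by weighting each proportion with its group size. The Python sketch below illustrates this. It works from the rounded proportions as printed, so the implied counts are approximate.

```python
# (group size, proportion continued at all) per DTS value 0-5.
table_20_subject = [(41, .49), (45, .78), (32, .88), (32, 1.00), (39, .95), (11, 1.00)]
table_21_object = [(48, .52), (33, .70), (40, .85), (33, .73), (36, .89), (10, 1.00)]


def implied_count(rows):
    """Approximate number of continued referents implied by the rows."""
    return sum(n * proportion for n, proportion in rows)


def group_total(rows):
    """Total number of fragments across DTS groups."""
    return sum(n for n, _ in rows)


subject_implied = round(implied_count(table_20_subject))   # 163
object_implied = round(implied_count(table_21_object))     # 148
```

Both implied counts agree with the totals of Table 10 (163 and 148), and the group sizes in each table sum to 200.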
Let us finally illustrate the DTS effect by comparing the subject referent and object
referent DTS for different kinds of continuations:
                                     SR DTS        OR DTS        Significance
Subject and object referent
  reappear in the same utterance     2.44 (1.57)   2.29 (1.36)   n.s.
Only the subject referent
  reappears in next utterance        2.52 (1.49)   1.30 (1.40)   t = 4.32, df = 70, p = .000
Only the object referent
  reappears in next utterance        1.40 (1.44)   2.58 (1.61)   t = 4.03, df = 72, p = .000
Table 22. DTS configurations for three kinds of continuations
The most remarkable finding in Table 22 is that combined continuations occur when
both referents are equally highly activated. This may also help us to understand why
DTS of a referent predicts the failure of the competitor in terms of relative
continuation: when referent A has a relatively high DTS, this does not mean it will be
the first or only one to reappear. It may also reappear together with referent B in the
next relevant utterance. But given a strong A referent, the chances that referent B will
be the first or only one to appear in the next utterance are quite low.
In sum, whether or not a referent will reappear is not determined by its
grammatical role, but instead by contextual factors. Two of these are relevant: its
entrenchment in the local context, indicated alternatively by its expression type, CB
status or recency, and its being the discourse topic, indicated by its DTS. Whether it
will be the first or the only one of the competitors to reappear is primarily determined
by the contextual prominence of the competitor. There was no evidence for a preference
for Continue transitions: being the CB did not make a referent more likely to be the
first or only one to reappear in subsequent discourse.
Another finding of interest is that the determinants of prominence are virtually
identical for subject and object referents. In the first study, non-subject referents were
found to be more sensitive to contextual constraints than subject referents, but in the
second study there was no such interaction.
4.4 Results 2: In what form do subject and object referents reappear?
So far, this study found no evidence for a grammatical role effect on referential
continuation. But perhaps such an effect should not be sought by looking at the
reoccurrence of a referent, but rather at the form of the next reference. We will
concentrate on the expression type used on subsequent mention. Using various
methods, Centering theorists (Gordon et al. 1993, Brennan 1995, Kennison & Gordon
1997) have shown that subject referents – which are usually more prominent in the
discourse representation built so far – are preferably continued by pronouns while
this is not the case for object referents. When testing this in our data, we need to
control for the expression type used in the central utterance and for the distance
between first and second mention, since both factors may affect the expression type of
the second reference. Moreover, we need to realize that subject referents may
reappear as zeroes in the next utterance (ellipsis), while object referents cannot.
First, we compared subject and object pronoun referents on the expression
type in the subsequent reference, only taking into account references in the
immediately following utterance:
ET in subsequent   Subject     Subject pronoun,    Object
utterance          pronoun     excluding zeroes    pronoun
Zero               16 (21%)
Pronoun            44 (59%)    44 (76%)            46 (59%)
Name or NP         15 (20%)    15 (24%)            32 (41%)
Total              75          59                  78
Table 23. Form of continuations for subject and object pronouns
When zero continuations are ignored, the difference in Table 23 is only marginally
significant (Chi2 = 3.63, df = 1, p = .057); when they are counted as pronouns, there is
no doubt that subject referent continuations are more often reduced in form than
object referent continuations are (Chi2 = 7.94, df = 1, p = .005).
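The two chi-square values just reported can be reproduced from the counts in Table 23. The Python sketch below assumes a Pearson chi-square test on a 2 x 2 table without continuity correction. That assumption is ours, but it yields the values given in the text.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with its p-value for df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: subject pronoun, object pronoun. Columns: pronoun, name or NP.
# Zero continuations ignored:
chi2_excl, p_excl = chi_square_2x2(44, 15, 46, 32)
# Zero continuations counted as pronouns (44 + 16 = 60):
chi2_incl, p_incl = chi_square_2x2(60, 15, 46, 32)

print(round(chi2_excl, 2), round(p_excl, 3))  # prints: 3.63 0.057
print(round(chi2_incl, 2), round(p_incl, 3))  # prints: 7.94 0.005
```

The same function can be applied to the other 2 x 2 comparisons in this section by substituting the corresponding cell counts.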
Next we turn to the continuation forms of referents of names and NPs in both
grammatical roles. Here, it makes no difference whether ellipsis is included (there is
no ellipsis following nominal referents). Table 24 shows that subsequent references to
nominal subject referents are more often pronominal than subsequent references to
nominal object referents (Chi2 = 3.88, df = 1, p = .049).
                     Subject       Object
ET in utterance 1    name or NP    name or NP
Pronoun              16 (67%)      11 (39%)
Name or NP           8 (33%)       17 (61%)
Total                24            28
Table 24. Form of continuations for subject and object names and NPs
In order to understand what may be going on here, it is useful to distinguish between
two kinds of cases:
• cases in which both subject and object referent reappear in the next utterance
• cases in which only the subject or object referent reappears in the next
utterance
In combined continuations, we found no differences between the forms in which subject
and object referents are resumed. Both kinds of referent tended to be pronominalized,
sometimes even in the form of joint references to both participants (plural ze, ‘they’).
Recall that in these cases, both referents tend to be well-established in prior discourse
(Table 22).
We also compared subject-only and object-only continuations in the next
utterance. Note that for subject continuations, the resumed referent was the new
subject in 92% of the cases; but resumed object referents were also predominantly in
subject position in the next utterance (84%). The forms of continuations are given in Table
25. Subjects are more often continued as pronouns, both when we ignore zero
realizations (Chi2 = 4.21, df = 1, p = .040) and when we include them as pronouns
(Chi2 = 8.61, df = 1, p = .003).
                     Subject-only   Subject-only,       Object-only   Total
ET in utterance 1                   excluding zeroes
Zero                 10 (20%)                                         10
Pronoun              26 (52%)       26 (65%)            25 (43%)      51
Name or NP           14 (28%)       14 (35%)            32 (57%)      46
Total                50             40                  57            107
Table 25. Form of continuations for subject and object referents in subject-only and
object-only continuations, excluding subject ellipsis
We also analyzed how other factors co-determine the form of repeated reference to
subject and object referents in the next utterance, but neither expression type nor
backward center status nor discourse topic score had any effect.
How do we explain the tendency that object referents are less often resumed as
pronouns? Compare the following cases of nominal resumptions of object referents:
(26)
D Het verhaal over de oude Gepetto die een pop snijdt uit een levend stuk hout
behoort nog altijd tot de meest vertaalde boeken ter wereld.
Hij maakt zich een zoon, maar het joch is opstandig en wil van zijn vader af.
E The story of the old Gepetto, who cuts a puppet out of a living piece of wood, is
still one of the most often translated books in world literature. He makes himself
a son, but the boy is rebellious and wants to get rid of his father.
(27)
D Uiteindelijk zwichtte Efron, en (Ø) bood Abraham het veld en de spelonk voor
de exorbitante prijs van 400 sikkelen zilver. Zonder af te dingen, betaalde
Abraham hem deze prijs voor het stuk grond.
E Eventually, Ephron gave in, and (Ø) offered Abraham the field and the cave
for the exorbitant price of 400 shekels of silver. Without bargaining, Abraham
paid him this price for the piece of land.
(28) (Chairman Zoff of football club Lazio has fired the coach, Zeman.)
D Zoff (54) was lange tijd doelman van Juventus en keeper van het nationale team.
De preses verdedigde het doel in 1982 toen Italië de wereldtitel pakte. Hij was
aanvoerder van dat team.
Zoff, die ook Juventus trainde, coachte Lazio al eens, tussen 1990 en 1994.
Zeman volgde hem toen op, Zoff werd voorzitter.
E Zoff (54) has been the goalkeeper of Juventus and of the national team for a long
time. The chairman defended the goal in 1982 when Italy won the world
championship. He was the captain of that team.
Zoff, who also trained Juventus, has already been the coach of Lazio once,
between 1990 and 1994. Zeman then succeeded him, Zoff got to be chairman.
In these cases it would probably be confusing to use subject pronouns in the
continuation utterance, since they could easily be taken as referring to the previous
subject referent. This applies not only to the pronominal or elided subjects in (26) and
(27), but also to the nominal subject in (28). This last example is also noteworthy
because of the prominent status of the object referent (Zoff) in the preceding
sentences.
We might ask whether this tendency is not simply a parallelism phenomenon,
in the sense that subject pronouns preferably refer to subject antecedents. In that
case, object pronouns would behave differently. In the continuation utterances, there
are not enough object pronouns to test this assumption. But we may use the central
utterances in this corpus for this purpose, since half of them contain object pronouns.
We looked at all the object referents in central utterances that were also referred to in
the immediately preceding utterance. Many of them were realized as pronouns in the
central utterance, but some were names or NPs. It turns out that the subject referents
of the previous utterances, when returning as object referents in the central utterance,
are somewhat more often pronominalized than object referents of the previous
utterances (see Table 26, Chi2 = 3.96, df = 1, p = .047). This generalizes the finding
that subject referents are more easily pronominalized in the next utterance.
                        Subject           Object            Total
ET of object referent   antecedent in     antecedent in
in central utterance    utterance -1      utterance -1
Pronoun                 64 (90%)          32 (78%)          96
Name or NP              6 (10%)           9 (22%)           15
Total                   70                41                111
Table 26. Form of continuations for subject and object referents from previous
utterances returning as object referents in the central utterance
In sum, the subject-only and object-only continuations support the Centering Theory
claim that subjects are more easily continued as pronouns, even when zero form
continuations are not counted as pronouns. For the combined continuations, however,
no reliable differences were found. This suggests that there may be a situation of high
activation for two referents (see also Table 22 above), which is at odds with the
centering assumption that there is only one backward center.
5 Conclusions and discussion
Our findings are summarized in Table 27.
                          STUDY 1              STUDY 2
Continuation variables:   ‘being the           Absolute        Relative        Form of
                          referent of an       continuation    continuation    continuation
                          ambiguous
Predictors:               pronoun’
Grammatical role              +                    -               -               +
Expression type               -                    -               -               -
Backward center status        +                    +               +               -
Discourse topichood           +                    +               +               -
Table 27. Summary of findings in the two studies.
+/- = has / has no independent effect on the continuation variable in question
In our first study, we found that being realized as a subject, being the backward center
and being a discourse topic each increase the chance that a referent is the intended
antecedent of an ambiguous personal pronoun. However, the dependent variable in the
first study was complex, in that it combined several aspects of referential
continuation. In the second study, we have been able to examine three of these aspects
separately: absolute continuation (is a referent taken up again in subsequent discourse
at all), relative continuation (is it the first or only one of the competing referents to be
resumed) and form of continuation (is it pronominalized on subsequent mention).
The three variables that were significant in the first study were shown to
operate on different continuation aspects in the second study. Grammatical role only
affects the form of continuation. Backward center status increases the chances for a
referent of being referred to again (absolute continuation), while it also decreases the
chances for the competing referent of being the first or only one to be referred to
again (relative continuation). Discourse topichood has the same double effect. Of
course, this adverse effect on the competitor’s chances is only relevant in situations of
referential competition. Expression type turns out to be irrelevant to continuation in
the first study, but in the second study it could replace backward center status in the
optimal model in two out of four cases. We conclude that Expression type is best seen
as a symptom of backward center status or discourse topichood, because it has no
independent explanatory force.
This brings us to the three main theoretical implications of our findings. The
first thing to note is that studies of forward prominence of discourse referents need to
examine both local and contextual factors. Grammatical role is clearly local,
backward center status applies to the immediate context, and discourse topichood
pertains to the global context. The second point to make is that when discussing
forward prominence, we need to distinguish between the form and the content of
continuation. This distinction is missing from the current conceptualization of a
forward center as embodying a prediction regarding the backward center of the
following utterance. For instance, the grammatical role can be said to mark the
preferred forward center only in the sense of signaling which referent can most easily
be taken up in pronominal form, or even in zero form. It cannot be said to indicate
which referent is likely to reappear in the text. This brings us to the third point, which
incorporates the first two: there seems to be a division of labor between local
and contextual determinants, in that the local feature of grammatical role primarily
affects the form of continuation, while the contextual features backward center status
and discourse topichood help predict the content of continuation.
Given that a stretch of discourse often contains continuous reference to the
same entity, it makes sense that the linguistic structure of utterances enables certain
referents to be taken up efficiently in the next utterance, i.e. by zero anaphora and
unstressed pronouns. These devices have been dubbed ‘minimal-gap devices’ by
Givón (1992: 21), meaning that they (almost) only apply to close-by referents. We
may add now that these devices primarily apply to subject referents. This is evidently
the case for zero anaphora in most languages; pronouns are not entirely restricted to
subject antecedents, but subject referents are certainly more often pronominalized
than referents in other syntactic positions. It seems that subject status enables a
referent to be treated as highly activated, at least in the short term of the upcoming
utterance.
But all this concerns the form of continuation. Whether the potential topic
participants actually reappear is a different matter. Not all subject participants are
highly likely to remain in the center of attention. In our data, this probability was
related to contextual factors, especially to whether the participant had already been
referred to repeatedly in prior discourse.
Our findings in the second study contradict those of Givón (1992: 21) and
Arnold (2001), who report that subject referents tend to be more ‘persistent’ than
other referents. The difference seems to be due to the different corpora used. It is very
likely that in ‘ordinary’ discourse with only one topical participant, subject status
coincides with discourse topic status; that is, local and global determinants do not
vary independently. But in this paper we separated these factors by focusing on
situations of referential competition, in which object referents matched subject
referents in terms of contextual prominence. While in this environment subject status
still constrains the form of continuation, it loses its potential of marking who is the
most probable protagonist of the next utterance. Put differently, when two (more or
less) prominent participants are around, one of them may appear in object position
without thereby implying that it will be less persistent in the following discourse. It
might well be that only object participants are such ‘tough’ competitors, not referents
in other syntactic positions such as oblique constituents; our first study suggests that
referents in prepositional phrases are generally discarded as possible referents for
ambiguous pronouns.
In section 2 we noted that Centering Theory offers several predictions
regarding the forward prominence of ‘older’ non-subject referents. The standard
framework does not decide between the subject preference and the backward center
preference, and Kameyama (1998) explicitly predicts the possibility of an
indeterminacy resulting from the conflict between the grammatical role and
expression type hierarchies. Our study does not offer support for expression type
effects on forward prominence. With regard to subject and backward center status, it
suggests that both factors are relevant, but have a different function: while subject
status constrains the form of repeated references, backward center status may help
predict which participant will still be referred to at all.
Besides backward center status, our studies provide strong support for the
more global contextual factor of discourse topichood, which is absent from the
centering framework, although its relevance has been known from psycholinguistic
work (e.g. Garrod et al. 1994). Our first study showed that it contributed
independently to the prediction of the intended pronoun antecedent; moreover,
non-subject antecedents tended to be especially strong discourse topics. The second study
indicated how the discourse topic status of a referent increases the probability of it
being referred to again and decreases the chances for the competitors to be the first or
only one to reappear. In this study, we did not find an interaction between
grammatical role and DTS: whereas the non-subject antecedents in Study 1 needed
the help of a considerable DTS, this was not the case for the object referents in Study
2. Discourse topichood seems an autonomous factor here.
The motivation behind both contextual factors seems to be the presumption of
topic continuity, which may overrule the local prominence provided by subject status.
Discourse clearly needs more global predictors for prominence assignment, given that
it is impossible to keep the main protagonist in subject position all the time. This is
especially true when other prominent participants are around: in the second study,
64% of the object referents that were mentioned in the previous utterance filled the
subject position there.
A last theoretical comment concerns the Centering constraint that utterances
contain only one backward center, or put more generally, the assumption that
discourse tends to have a single topical participant. Our second study has shown that
two participants may be jointly salient as well. In these circumstances, they tend to
reappear together in the same utterance, often in the same expression type. It seems
that the CB-uniqueness constraint needs to be relaxed (see Gundel 1998 and Poesio et
al. 2004 for further discussion of this issue).
In conclusion, we would hesitate to say that our statistical predictors ‘mark’
forward referential prominence in any straightforward sense. The performance of the
logistic regression models in the second study was generally quite modest, compared
to the model proposed in the first study. There is a considerable amount of
unexplained variance here, which is more than just indeterminacy produced by
conflicting signals. Nevertheless, the corpora used in both our studies clearly show
recognizable referential patterns, and it seems safe to hypothesize that both local and
global linguistic factors affect language users’ expectations about what discourse
referent will reappear and in what form.
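As an illustration only, the kind of model referred to above can be sketched as follows. This is a minimal sketch, assuming that each referent is coded with four binary predictors (subject status, backward center status, discourse topichood, pronominal expression) and that the outcome is whether the referent reappears in the next utterance. The toy rows, the function names and the fitting procedure (plain gradient descent) are assumptions made for this example; they are not the corpus data or the statistical software used in either study.

```python
import math

# Hypothetical coding, one referent per row: 1 = feature present, 0 = absent.
# Predictor order: (subject, backward_center, discourse_topic, pronoun).
# Outcome: 1 = referent reappears in the next utterance, 0 = it does not.
# These rows are invented toy data, NOT the data of Study 1 or Study 2.
TOY_ROWS = [
    ((1, 1, 1, 0), 1), ((0, 1, 1, 1), 1), ((1, 1, 0, 1), 1),
    ((0, 1, 0, 0), 1), ((1, 0, 1, 0), 1), ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 0), ((0, 0, 0, 0), 0), ((1, 0, 0, 0), 0),
    ((0, 0, 0, 1), 0), ((1, 1, 0, 0), 0), ((0, 0, 1, 0), 0),
]


def sigmoid(z):
    """Map a log-odds value onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, epochs=2000, lr=0.1):
    """Fit logistic regression weights by batch gradient descent.

    Returns one weight per predictor, followed by the intercept.
    """
    n_features = len(rows[0][0])
    weights = [0.0] * (n_features + 1)  # last entry is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in rows:
            xs = list(x) + [1.0]  # constant 1.0 carries the intercept
            p = sigmoid(sum(w * v for w, v in zip(weights, xs)))
            for i, v in enumerate(xs):
                grad[i] += (p - y) * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_proba(weights, x):
    """Predicted probability that a referent coded as x reappears."""
    xs = list(x) + [1.0]
    return sigmoid(sum(w * v for w, v in zip(weights, xs)))


if __name__ == "__main__":
    fitted = fit_logistic(TOY_ROWS)
    names = ["subject", "backward_center", "discourse_topic", "pronoun", "intercept"]
    for name, w in zip(names, fitted):
        print(f"{name:16s} {w:+.2f}")
    # Compare an 'old', topical object referent with a 'new' subject referent.
    print(predict_proba(fitted, (0, 1, 1, 0)), predict_proba(fitted, (1, 0, 0, 0)))
```

In a sketch of this kind, the sign and size of each fitted weight indicate how strongly the corresponding factor shifts the predicted chance of reappearance, while the share of cases the model still misclassifies corresponds to the unexplained variance mentioned above.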
References
Ariel, Mira (1990). Accessing Noun Phrase Antecedents. London: Routledge.
Ariel, Mira (2001). Accessibility theory: An overview. In: Ted Sanders, Joost
Schilperoord & Wilbert Spooren (eds.), Text representation: Linguistic and
Psycholinguistic aspects, Amsterdam etc.: Benjamins, 29-88.
Arnold, Jennifer E. (1998). Reference form and discourse pattern. Stanford,
CA: Stanford University Dissertation.
Arnold, Jennifer E. (2001). The effect of thematic roles on pronoun use and frequency
of reference continuation. Discourse Processes 31 (2), 137-162.
Au, Terry Kit-Fong (1986). A verb is worth a thousand words: the causes and
consequences of interpersonal events implicit in language. Journal of Memory
and Language 25 (1), 104-122.
Brennan, Susan E. (1995). Centering attention in discourse. Language and Cognitive
Processes 10 (2), 137-167.
Brennan, Susan E., Marilyn W. Friedman & Charles J. Pollard (1987). A centering
approach to pronouns. In Proceedings of the 25th Annual Meeting of the
Association for Computational Linguistics, Stanford, CA, 155-162.
Chafe, Wallace L. (1994). Discourse, consciousness, and time. The flow and
displacement of conscious experience in speaking and writing. Chicago:
Chicago University Press.
Chambers, Craig G. & Ron Smyth (1998). Structural parallelism and discourse
coherence: a test of Centering Theory. Journal of Memory and Language 39 (4),
593-608.
Cornish, Francis (1999). Anaphora, discourse and understanding. Evidence from
English and French. New York: Oxford University Press.
Crawley, Rosalind A., Rosemary Stevenson & David Kleinman (1990). The use of
heuristic strategies in the interpretation of pronouns. Journal of Psycholinguistic
Research 19 (4), 245-264.
Frederiksen, John (1981). Understanding anaphor: rules used by readers in assigning
pronominal referents. Discourse Processes 4 (4), 323-347.
Garnham, Alan & Oakhill, Jane (1992, eds.). Discourse representation and text
processing. A special issue of Language and Cognitive Processes. Hove, UK:
Lawrence Erlbaum Associates.
Garrod, Simon C., Freudenthal, Daniel & Boyle, Elizabeth (1994). The role of
different types of anaphor in the on-line resolution of sentences in a discourse.
Journal of Memory and Language 33 (1), 39-68.
Garrod, Simon C. & Sanford, Anthony J. (1994). Resolving sentences in a discourse
context: How discourse representation affects language understanding. In:
Gernsbacher, M.A. (Ed.). Handbook of psycholinguistics (pp. 675-698). San
Diego etc.: Academic Press.
Gernsbacher, Morton A. & Hargreaves, David J. (1988). Accessing sentence
participants: the advantage of first mention. Journal of Memory and Language
27, 699-717.
Gernsbacher, Morton A., Hargreaves, David J. & Beeman, Mark (1989). Building and
accessing clausal representations: the advantage of first mention versus the
advantage of clause recency. Journal of Memory and Language 28 (6), 735-755.
Gernsbacher, Morton Ann, & Givón, Talmy (1995, eds.). Coherence in spontaneous
text. Amsterdam etc.: John Benjamins.
Givón, Talmy (1992). The grammar of referential coherence as mental processing
instructions. Linguistics 30 (1), 5-55.
Givón, Talmy (1995). Coherence in text vs. coherence in mind. In: Coherence in
spontaneous text, Morton A Gernsbacher & Talmy Givón (eds.), (pp. 59-115).
Amsterdam etc: Benjamins. Typological Studies in Language, 31.
Gordon, Peter C., Grosz, Barbara J. & Gilliom, Laura A. (1993). Pronouns, names
and the centering of attention in discourse. Cognitive Science 17, 311-347.
Grosz, Barbara J., Joshi, Aravind K. and Weinstein, Scott (1995). Centering: a
framework for modeling the local coherence of discourse. Computational
Linguistics 21 (2), 203-225.
Graesser, Art C., Millis, Keith K., & Zwaan, Rolf A. (1997). Discourse
comprehension. In Janet T. Spence, John M. Darley, and Donald J. Foss (eds.),
Annual Review of Psychology 48 (pp. 163-189). Palo Alto, CA: Annual Reviews
Inc.
Gundel, Jeanette, Nancy Hedberg & Ron Zacharski (1993). Cognitive status and the
form of referring expressions in discourse. Language 69 (2), 274-307.
Gundel, Jeanette (1998). Centering Theory and the Givenness Hierarchy: towards a
synthesis. In: Centering Theory in Discourse, Marilyn A. Walker, Aravind K.
Joshi & Ellen F. Prince (eds.), 183-198. Oxford: Clarendon Press.
Hobbs, Jerry R. (1979). Coherence and coreference. Cognitive Science 3, 67-90.
Hoover, Michael L. (1997). Effect of textual and cohesive structure on discourse
processing. Discourse Processes 23, 193-220.
Hudson-D’Zmura, Susan & Michael K. Tanenhaus (1998). Assigning antecedents to
ambiguous pronouns: the role of the center of attention as the default assignment.
In: Centering Theory in Discourse, Marilyn A. Walker, Aravind K. Joshi &
Ellen F. Prince (eds.), 199-228. Oxford: Clarendon Press.
Järvikivi, Juhani, Roger P.G. van Gompel, Jukka Hyönä & Raymond Bertram (2005).
Ambiguous pronoun resolution. Contrasting the first-mention and
subject-preference accounts. Psychological Science 16 (4), 260-264.
Kameyama, Megumi (1996). Indefeasible semantics and defeasible pragmatics. In
Quantifiers, deduction and context, Makoto Kanazawa, Christopher Piñón &
Henriette de Swart (eds.), 110-138. CSLI Publications.
Kameyama, Megumi (1998). Intrasentential Centering: a case study. In: Centering
Theory in Discourse, Marilyn A. Walker, Aravind K. Joshi & Ellen F. Prince
(eds.), 89-112. Oxford: Clarendon Press.
Kehler, Andrew (1997). Current theories of centering for pronoun interpretation: a
critical evaluation. Computational Linguistics 23 (3), 467-475.
Kehler, Andrew (2002). Coherence, reference and the theory of grammar. Chicago
etc.: Chicago University Press.
Kennison, Shelia M. & Gordon, Peter C. (1997). Comprehending referential
expressions during reading: evidence from eyetracking. Discourse Processes 24
(3), 229-252.
Matthiessen, Christian M.I.M. & Thompson, Sandra A. (1987). The structure of
discourse and “subordination”. In: Clause combining in discourse and
grammar, John R. Haiman & Sandra A. Thompson (eds.), 275-330. Amsterdam:
John Benjamins.
Poesio, Massimo, Di Eugenio, Barbara, Stevenson, Rosemary & Hitzeman, Janet
(2004). Centering: a parametric theory and its instantiations. Computational
Linguistics 30 (3), 309-363.
Sanders, Ted J. M., Spooren, Wilbert P. M., & Noordman, Leo G. M. (1992). Toward
a taxonomy of coherence relations. Discourse Processes 15 (1), 1-35.
Sanders, Ted & Wilbert Spooren (2001). Text representation as an interface between
language and its users. Text representation: Linguistic and Psycholinguistic
aspects, Ted Sanders, Joost Schilperoord & Wilbert Spooren (eds.), 1-25.
Amsterdam: Benjamins.
Sanders, Ted & Wilbert Spooren (in press). Discourse and text structure. In Handbook
of Cognitive Linguistics, Dirk Geeraerts & Hubert Cuyckens (eds.). Oxford:
Oxford University Press.
Sanford, Anthony J. & Garrod, Simon C. (1994). Selective processes in text
understanding. In Handbook of Psycholinguistics, Morton A. Gernsbacher (ed.),
699-720. San Diego etc.: Academic Press.
Suri, Linda Z. & McCoy, Kathleen F. (1994). RAFT/RAPR and Centering: a
comparison and discussion of problems related to processing complex
sentences. Computational Linguistics 20 (2), 301-317.
Tetreault, Joel R. (2001). A corpus-based evaluation of centering and pronoun
resolution. Computational Linguistics 27 (4), 507-520.
Tomlin, Russell S. (1985). Foreground and background information and the syntax of
subordination. Text 5, 85-122.
Tomlin, Russell S. (1997). Mapping conceptual representation into linguistic
representations: the role of attention in grammar. In Language and
Conceptualization, Jan Nuyts & Eric Pederson (eds.), 162-189. Cambridge:
Cambridge University Press.
Walker, Marilyn A., Aravind K. Joshi & Ellen F. Prince (1998, eds.). Centering
Theory in Discourse, Oxford: Clarendon Press.
Wolf, Florian & Gibson, Edward (2004). Discourse coherence and pronoun
resolution. Language and Cognitive Processes 19, 665-675.
Notes
1
Perhaps the subject preference only holds for ambiguous pronouns. Hudson-D’Zmura & Tanenhaus
(1998, experiment 2) did not find differences in reading times between pronouns having subject
antecedents and object antecedents when gender information could be used when processing pronouns.
2
Note however that when two participants were referred to earlier by a single expression (e.g. a plural
pronoun), singular references to these participants were counted as repeated reference to the same
participant.
3
There are no principled reasons for this choice of gender. We used the same procedure for fragments
with sentences containing the ambiguous female pronoun she (zij), but this did not give us enough cases
in the corpora.