How grammatical and discourse factors may predict the forward prominence of referents: two corpus studies

Henk Pander Maat & Ted Sanders
Utrecht institute of Linguistics OTS
Utrecht University
Trans 10
NL-3512 JK Utrecht
The Netherlands

Abstract

One of the ways in which discourse coheres is by means of repeated reference to entities. Theoretical accounts of referential coherence propose heuristics for the interpretation of referential expressions, which are especially important when there is more than one potential antecedent. One of the most explicit accounts is provided by Centering Theory (Grosz et al. 1995). Using features such as grammatical status, expression type, and the referential relation with sentences still further back in the discourse, it produces a ranking of discourse referents in terms of forward prominence. We present two corpus studies of how these features, in combination with discourse topichood, help to predict referential continuations in actual discourse. In Study 1, we analyzed newspaper fragments in which he is preceded by a sentence presenting two male singular participants. The factors Grammatical Role (being a Subject), Backward Center Status and Discourse Topichood appear to increase the chance that a referent is the intended one for a potentially ambiguous pronoun, while Expression Type (noun or pronoun) makes no difference. In Study 2, the continuations for sentences with two referents differing on the same four factors were compared, assuming that the most prominent referent will reappear in the next sentence. The study reveals that Grammatical Role only affects the form of continuation: subject referents do not reappear more often, but when reappearing they are more often realized as pronouns. Backward Center Status increases the chance of subsequent references to a referent, and also decreases the chances for its competitor of being referred to again.
Discourse Topichood has the same double effect. In conclusion, both global and local factors affect referential prominence, but in different ways.

1 Determinants of forward prominence

Language users communicate by discourse. A crucial property of discourse is that the utterances are related in various ways: they show coherence. We consider discourse coherence to be a mental phenomenon, i.e. language users make a coherent representation of the discourse under consideration (Gernsbacher & Givón 1995; Garnham & Oakhill 1992; Graesser, Millis & Zwaan 1997; Sanders & Spooren 2001). Still, the discourse itself contains more or less overt signals that direct this interpretation process. Connectives like because, for instance, function as linguistic markers of coherence relations (Hobbs 1979; Sanders, Spooren & Noordman 1992) between utterances, which the reader uses as a cue to interpret the upcoming utterance as the cause for a consequence stated in the first utterance (Noordman & Vonk 1997). In addition to such relational coherence, there is another important type of coherence: referential coherence, which results from the continuous reference to entities in the discourse. The relevant linguistic signals are references in various forms: full NPs, pronouns, zero anaphora, etc. (Ariel 1990, Cornish 1999, Givón 1983, Gundel, Hedberg & Zacharski 1993). Over the last decades, linguistic and psycholinguistic studies have shown the relevance of both types of coherence for the study of discourse (Sanders & Spooren 2007). In this paper we focus on referential coherence.

Referential coherence is not just a matter of retrospectively linking current discourse segments to earlier ones. It also includes preparing the reader for certain continuations. This requires writers and readers to coordinate their individual models of referential prominence.
Arnold (2001: 154) explains why this coordination is relevant for discourse processing:

    I propose that the accessibility of a given referent, which can be modeled in terms of activation of its representation, is driven by the likelihood that it will be referred to again in the following discourse (…). When a referent is likely to be referred to again, it behooves comprehenders to instantiate a relatively activated representation of that referent in their model of the discourse. Then, if the speaker does refer to that entity again, comprehension will be facilitated.

This notion of forward prominence can be illustrated by considering ‘ambiguous’ subject pronouns, for which the preceding utterance provides several candidate antecedents. The most prominent of these is probably the intended referent for the pronoun. But how does the reader determine which referent is meant to be the most prominent one in fragments such as (1)?

(1) George hit Al. He …

This referential configuration is not as uncommon as one might think. In fact, our Dutch newspaper corpus shows that approximately 2.2% of the sentences preceding the male singular pronoun hij (he) contain two potential antecedents (i.e. two singular male participants). This suggests that language users deal with pronoun ambiguity on a routine basis.

How do theories of the referential structure of discourse deal with this problem? Many prominent theories consider pronouns to indicate a high degree of activation (Chafe 1994), importance (Givón 1995), or accessibility (Ariel 1990, 2001, Givón 1992: 16) of their referents, as opposed to, for instance, full NPs. So, the question seems to be which participant in (1) is highest in terms of activation. One of the factors determining activation is the distance (in number of clauses) between the anaphor and the preceding reference to a potential antecedent: the larger the distance, the lower the activation.
This factor does not help in the resolution of ambiguous pronouns such as in (1), since both candidate referents are equally close here. For this reason, we need to look for more subtle cues regarding the prominence of referents.

Centering Theory (see Grosz et al. 1995 and Walker et al. 1998 for overviews) offers an attractive theoretical framework for this investigation. Given a multi-referent utterance, the theory predicts how likely each of these referents is to (still) be a central referent in the next utterance. The salience of a discourse entity is determined by a combination of syntactic, lexical and contextual factors, such as grammatical role (subject or not) and expression type (zero, pronoun or NP), but also its occurrences in the preceding discourse segments. Next, we will review these factors one by one.

For cases like (1), grammatical role seems a crucial factor. Brennan et al. (1987) and Kameyama (1996) have suggested a ranking in which the subject precedes the object(s) and in which the object(s) in turn rank higher than other grammatical functions; that is, when a referent is placed in subject position, it stands a better chance of being the central referent in the subsequent discourse than when it is an object in the preceding utterance. In a corpus study of oral reports of a basketball game, Brennan (1995) has shown that speakers tend to introduce participants as subject referents more often when they expect this participant to remain in the center of attention than when subsequent events are uncertain. She also points out a difference between the ways in which referents of subject and object NPs are referred to in subsequent utterances: subject participants were more often pronominalized than object participants. She explains this by reasoning that object referents are not fully established yet as the center of attention, and can more easily be pronominalized after having filled the subject position once.
That subject referents are especially prominent has been suggested by other theorists as well. For instance, Givón (1992, 10-11; 1995, 65-67) proposes a grammatical hierarchy similar to the centering hierarchy for what he calls ‘topicality’, and the corpus evidence presented in Givón (1995) supports the claim that subject referents are more persistent in subsequent discourse. Arnold (1998) arrives at similar conclusions. Tomlin (1997) even states that the subject role grammatically encodes topic status, and that the notion of topic by itself is superfluous. Moreover, the subject preference in the interpretation of pronouns has received empirical support in experimental work by centering theorists (Gordon et al. 1993, Gordon & Scearce 1995, Hudson-D’Zmura & Tanenhaus 1998) and others (Frederiksen 1981, Crawley et al. 1990). For instance, when a subject referent is resumed by means of a name instead of a pronoun, this slows down reading. This phenomenon has been called the ‘repeated name penalty’. In reading time experiments by Gordon et al. (1993) and eye-tracking experiments by Kennison & Gordon (1997), this penalty did not occur when the intended referent was in nonsubject position.1

Another indicator for referential prominence relates to the expression type used for the antecedent. For instance, Givón (1992, 1995) has stressed the difference between definite and indefinite noun phrases, in that indefinite nominals typically introduce referents that will be resumed in subsequent discourse. Kameyama (1998) suggests that pronominalization indicates the forward saliency of a referent. For instance, the object referent seems to stand a greater chance of being resumed in (1b) than in (1).

(1)  George hit Al. He …
(1b) George hit him. He …

In order to see why pronoun use could indicate forward prominence, we need to introduce a few notions of Centering Theory (Grosz et al. 1995, Walker et al. 1998).
According to the theory, every non-initial utterance is taken to contain both a so-called backward center (CB), serving as a link to the preceding utterance, and a set of forward-looking centers (CF). It is assumed that there is only one CB; the CF set is ranked according to preference. The highest-ranked member of the set, the preferred center (CP), represents a prediction about the backward center of the following utterance. In terms of this framework then, the issue to be decided in the resolution of the anaphor he in (1) is: what referent must be taken as the preferred forward center (CP) of the first utterance?

In (1b), the object pronoun suggests that the object referent is already salient; in fact, Centering Theory posits that when there is only one pronoun in a segment, this must refer to the backward center. It might be that readers expect a CB to be continued within the subsequent discourse, because they assume that discourse tends to have a central protagonist. This assumption would motivate a preference for pronominal referents when searching for antecedents of ambiguous pronouns. This discussion implies that, in fact, it may be backward center status that confers referential prominence on a referent, not its expression type. In that case, CBs in the form of full NPs would also be strong candidates for resumption in subsequent discourse. In our corpus studies, we will try to tease apart both factors.

Besides grammatical role, expression type and/or backward center status, we will consider another potential determinant of forward prominence: the degree to which a referent is the global discourse topic. For instance, fragment (1) may be preceded by a story in which George is the main protagonist, but also by a story about Al. In a reading time study, Hoover (1997) has shown that readers expect the main protagonist to be pronominalized and a secondary character to appear in nominal form, not the other way round.
Similarly, quite a number of studies (such as Garrod et al. (1994); see also Garrod & Sanford (1994) and Sanford & Garrod (1994) for overviews) have shown that the interpretation of pronouns is affected by the global discourse topic, and not just by the make-up of the immediately preceding utterance. We will further refer to this factor as discourse topic status.

This study focuses on grammatical role, expression type / backward center status and discourse topic status as possible factors affecting pronoun resolution and forward prominence in general. This is not to say that they are the only relevant factors. Let us briefly review other factors proposed in the literature.

The first one is order of mention. Gernsbacher (Gernsbacher & Hargreaves 1988, Gernsbacher, Hargreaves & Beeman 1989) has proposed that the first-mentioned noun phrase in a sentence has a privileged status over other potential antecedents. The first-mentioned antecedent is generally the subject referent in languages like English, but in languages with a free word order, such as Finnish, the two factors can be examined separately. In an eye-tracking study using Finnish texts, Järvikivi et al. (2005) have shown that both being mentioned first and being the subject referent independently increase the chance that a referent is considered the probable antecedent of an ambiguous pronoun (see also Gordon et al. 1993). Since our study uses Dutch corpus materials with canonical SVO word order, we cannot tease apart order of mention and grammatical role. Henceforth we will focus on grammatical role, without being able to rule out entirely the possibility that some of the effects are due to a combined operation of role and order of mention.

Another factor influencing the resolution of ambiguous pronouns is the thematic role of the antecedent being considered.
For instance, when reading a sentence starting like John amazed Bill because he … readers tend to interpret the pronoun as coreferential with John; this is not because John is in subject position, but because he has the thematic role of Stimulus, while Bill has the role of Experiencer. The preference for Stimulus antecedents over Experiencer antecedents also holds when the grammatical roles are changed, as in Bill admired John because he …, in which John is still the preferred antecedent (Au 1986). Another thematic role effect was shown in a continuation experiment (Arnold 2001) on transfer verbs (e.g. give, offer, send). It was found that Goal participants were more often referred to first in the continuation utterance than the source participants. Another factor affecting pronoun resolution is structural parallelism between the pronoun sentence and its predecessor. Chambers & Smyth (1998) have shown that in certain contexts, pronouns tend to be seen as coreferential with the antecedent in the same structural position. For instance, in Josh criticized Paul. Then Marie insulted him the object pronoun is considered to refer to Paul, not Josh. This tendency has been empirically supported for two utterances with the same global constituent structure, and containing predicates that are semantically close to each other (e.g. criticized / insulted, handed / passed). Moreover, in this parallel context the repeated name penalty, which is normally restricted to expressions referring to the subject referent of the preceding sentence, is equally valid for expressions with subject and object antecedents. Finally, referential coherence has been shown to be co-determined by relational coherence. Following suggestions by Kehler (2002), Wolf & Gibson (2004) examined pronoun reading times in the context of resemblance and causal coherence relations. They hypothesized that the parallel preference strategy posited by Chambers & Smyth (1998) only holds for certain coherence relations. 
In the context of resemblance relations, parallel references are read faster than non-parallel ones, as in the Josh & Paul example above. But when the same segments are related by a causal connective, the non-parallel version (Fiona defeated Craig and so James congratulated her) is faster than the parallel one (Fiona defeated Craig and so James congratulated him).

Although we do not question the relevance of the factors just reviewed, we believe that thematic role and coherence relation cannot be central to people’s heuristics in everyday language use, simply because they often do not clearly guide expectations. Thematic roles, for instance, are undoubtedly important predictors of continuations in the environment of verbs of transfer (Arnold 1998) and stimulus-experiencer predicates (Au 1986), but many other predicates do not imply similar restrictions on referential continuations (compare the verbs listed in Au (1986)). Also, the structural parallelism factor has been shown to be considerably reduced when the constituent structures of two adjacent utterances do not match exactly, as (2) and (3) illustrate.

(2) Josh criticized Paul and then Marie insulted him.
(3) Josh criticized Paul and then Marie asked him to leave.

We suspect that full parallelism as in (2) is quite rare in actual discourse. In this study we will focus on the factors grammatical role, expression type, backward center status and discourse topic status, because language users can use these indicators in almost all contexts of referential ambiguity. But before proceeding to our corpus studies, we will refine our research question in terms of Centering Theory.

2 Centering Theory views on forward prominence

2.1 Four types of transitions

In its most influential version, stemming from Brennan et al. (1987), Centering Theory distinguishes between four kinds of referential transitions.
The distinctions are made in terms of the notions of preferred forward centers (CP) and backward centers (CB) which were introduced above. We will illustrate them with small discourse fragments consisting of three utterances: a context utterance (in parentheses), followed by a first (U1) and a second utterance (U2). The transition we are concerned with is the one between U1 and U2. Following the standard view in Centering Theory, we will assume that the subject realizes the preferred forward center (CP).

The first type of transition is the so-called Continuous Sequence. Consider (4):

(4)       (George was not here yesterday.)
     U1   He (CB1) was ill.
     U2   He (CB2 = CB1 = CP2) is often ill at this time of year.

(4) realizes a continuous sequence, because the CB of the second utterance is identical to both the CB of the first utterance (CB2 = CB1) and the preferred forward center of the second utterance (CB2 = CP2). For the definition of continuous sequence it does not matter whether the backward center of U1 and U2 (CB1/CB2) is also the preferred forward center of U1 (CP1). In (5) another participant (Al) appears in subject position in U1, thereby constituting CP1; but the fragment still represents a continuous sequence.

(5)       (George has not been in for two days now.)
     U1   Al (CP1) has just called him (CB1).
     U2   He (CB2 = CB1 = CP2) turned out to be ill.

The second type of transition is called Retain. Again, CB2 is identical to CB1 in this sequence, but CB2 is not identical to CP2, see (6) and (7).

(6)       (George was not here yesterday.)
     U1   He (CB1 = CP1) was exhausted.
     U2   Doctor Stephens (CP2) has ordered him (CB2 = CB1) to rest for a week.
          (He is very cautious in cases like this.)

(7)       (George is ill.)
     U1   Al (CP1) heard that from him (CB1).
     U2   Maria (CP2) took a few days off to take care of him (CB2 = CB1).
     U3   (She has a very busy job, so there must be something seriously wrong.)
In this type of transition, CB1 returns as CB2, but together with this element of continuity, U2 contains a ‘new’ forward-looking center (CP2), so that there is a slight reorientation in the discourse.

The third type is the smooth shift. In this case, CB1 does not return. It is replaced by a new CB2 that is also CP2. The shift is smooth because CP2 does not come as a surprise, having been introduced earlier in U1. See examples (8) and (9).

(8)       (George has been ill since Monday.)
     U1   He (CB1 = CP1) is being looked after by Maria (Y).
     U2   She (CB2 = Y = CP2) has taken a few days off.

(9)       (George has not been here for two days.)
     U1   Al (Y = CP1?) has called him (CB1).
     U2   He (CB2 = Y = CP2) wanted to know what was going on.

Finally, there is the so-called rough shift, in which no centers are shared: CB2 is not CB1, but CP2 is new as well, see (10).

(10)      (George is ill.)
     U1   The doctor (CP1) has sent him (CB1) to a specialist (Y).
     U2   Maria (CP2) started a row with that man (CB2 = Y).

CB2 can even be missing, as in (11).

(11)      (George is ill.)
     U1   The doctor (CP1) has sent him (CB1) home.
     U2   Maria (CP2) did not like it at all.

Rule 2 in Centering Theory states the following preference order for transitions: continue > retain > smooth shift > rough shift. In other words, the more continuous a sequence is, the more it is to be preferred.

2.2 How Centering Theory deals with ambiguous pronouns

Let us return to our pronoun ambiguity problem. So far we have used George hit Al. He … as an example, but by now it is clear that we need more context for a centering analysis. Next, we discuss three scenarios, each of them starting with an utterance referring to two characters by their names. Most often, the participants will have been mentioned earlier on:

(12)      (George and Al are twin brothers. They are often angry at each other.)
     U1   Yesterday, George (CB1 = CP1) punched Al (CB1) in the face.
     U2a  He (CB2 = CB1 = CP2) had been pestered all day.
     U2b  He (CB2 = CB1 = CP2) fell down with a bloody nose.
This example is problematic since it forces us to posit two backward centers in U1, something which is not allowed in the current version of Centering Theory. If it were allowed, however, both continuations would constitute Continue transitions, since in both continuations the CB remains in place. So there may be ‘symmetrical’ referential constellations in which both referents are equally prominent, notwithstanding the fact that only one of them fills the subject position. Here, the local factor is overridden by the discourse structural factor.

Let us now inspect two cases in which one of the referents is pronominalized and has already been introduced in the preceding discourse, while the other is new. First assume that the subject referent is ‘old’ and is realized as a pronoun, while the object referent is new and realized as a name. Resuming the old subject referent constitutes a Continue transition (see U2a), while resuming the object referent is a Smooth Shift (see U2b). U2a is clearly preferred:

(13)      (George has not been in for two days now.)
     U1   He (CB1 = CP1) has just called Al (Y).
     U2a  He (CB2 = CB1 = CP2) turned out to be ill.
     U2b  He (CB2 = Y = CP2) was happy to receive the call.

Finally, consider the most complex case: an utterance in which the subject referent is referred to by name and the object referent is ‘old’ and realized by a pronoun. Surprisingly, this does not change the transition analysis. Resuming the old referent, now in object position, still constitutes a Continue transition (see U2a), whereas resuming the new referent, now the subject, is a Smooth Shift (see U2b):

(14)      (George has not been in for two days now.)
     U1   Al (Y = CP1) has just called him (CB1).
     U2a  He (CB2 = CB1 = CP2) turned out to be ill.
     U2b  He (CB2 = Y = CP2) wanted to know what was going on.
The principle that Continue transitions are preferred over Smooth Shifts embodies a preference for global continuity, such as can be seen in U2a: George, the subject referent of the context utterance, remains the backward center in U1 and U2, even though there is another subject referent in U1. However, besides proposing a preference order for transitions, Centering Theory holds the assumption that subject status is an indication of CP status (at least in English). This claim seems to embody a preference for local continuity: the subject referent of U1 is CP and therefore the most likely candidate for CB in U2. This would make U2b the preferred continuation in (14).

So far, we have sketched the rules in the standard framework of Centering Theory. However, Kameyama (1998) presented a somewhat different proposal. First, whereas the standard framework assumes that the set of forward centers is always ordered, so that there is always a single entity that is the most prominent, Kameyama allows for the possibility that there is indeterminacy: different entities may compete for prominence. This indeterminacy might account for the phenomenon of ‘real’ pronoun ambiguity. Second, Kameyama does not regard the different transition types as primitives. Instead she introduces two linguistic hierarchies: the grammatical function hierarchy (GF-ORDER) and the nominal expression type hierarchy (EXP-ORDER).

(a) GF-ORDER: subject > object > object2 > others.
(b) EXP-ORDER: zero pronominal > pronoun > definite NP > indefinite NP.

GF-ORDER states that a higher ranked phrase is normally more salient in the ranked set of forward centers, which Kameyama refers to as the ‘output attentional state’ of the utterance. EXP-ORDER states that an entity realized by a higher ranked expression type is normally more salient in the ‘input attentional state’, that is, the prominence ranking passed on to the utterance by the preceding discourse.
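The two hierarchies can be written down as simple rank tables. The following Python sketch is illustrative only: the names `GF_ORDER`, `EXP_ORDER`, `Mention`, `most_salient` and `is_indeterminate` are hypothetical labels introduced here, not part of Kameyama's proposal. Two simplifications are assumed: proper names are treated as definite NPs, and each hierarchy simply nominates its own top candidate. Under these assumptions, U1 of (14) comes out as a case in which the two hierarchies point to different referents, whereas U1 of (13) does not.

```python
from dataclasses import dataclass

# Rank tables for the two hierarchies; a lower index means more salient.
GF_ORDER = ["subject", "object", "object2", "other"]
EXP_ORDER = ["zero", "pronoun", "definite NP", "indefinite NP"]


@dataclass(frozen=True)
class Mention:
    referent: str
    role: str        # one of GF_ORDER
    expression: str  # one of EXP_ORDER


def most_salient(mentions, order, key):
    """Return the referents that rank highest on one hierarchy (ties are kept)."""
    best = min(order.index(key(m)) for m in mentions)
    return {m.referent for m in mentions if order.index(key(m)) == best}


def is_indeterminate(mentions):
    """True when the two hierarchies nominate different top candidates."""
    by_gf = most_salient(mentions, GF_ORDER, lambda m: m.role)
    by_exp = most_salient(mentions, EXP_ORDER, lambda m: m.expression)
    return by_gf.isdisjoint(by_exp)


# U1 of example (14): "Al has just called him."
# Assumption: a proper name counts as a definite NP.
u1_14 = [Mention("Al", "subject", "definite NP"),
         Mention("George", "object", "pronoun")]
# U1 of example (13): "He has just called Al."
u1_13 = [Mention("George", "subject", "pronoun"),
         Mention("Al", "object", "definite NP")]

print(is_indeterminate(u1_14))  # True: GF-ORDER favours Al, EXP-ORDER favours George
print(is_indeterminate(u1_13))  # False: both hierarchies favour George
```

The sketch deliberately keeps the two rankings separate instead of merging them into one score, since how the hierarchies should be weighed against each other is exactly the open question.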
Kameyama also claims, somewhat mysteriously, that EXP-ORDER predicts the relative salience of entities in the output attentional state, “since these assumed salient levels are also accommodated into the context” (o.c. 92). This might be viewed as a local way of capturing the global preference for continue transitions in the standard framework. After all, many pronouns refer to backward centers. In Kameyama’s terms, it is the interaction between GF-ORDER and EXP-ORDER which produces indeterminacy in cases like (14) above: in U1 the subject Al and the pronoun him are both salient entities, one on the basis of GF-ORDER, the other on the basis of EXP-ORDER.

To sum up, where the standard Centering Theory framework produces conflicting predictions (the global preference for continuous transitions conflicts with the local principle that subjects realize preferred forward centers), Kameyama’s approach predicts an indeterminacy resulting from the interaction of two local mechanisms: GF-ORDER and EXP-ORDER.

As stated earlier, this paper investigates the contributions of grammatical role, expression type and backward center status to forward prominence. The underlying issue is to what degree local and contextual factors determine forward prominence. Grammatical role is clearly a local factor, since it is only weakly constrained by the preceding text. Expression type is different in that it has been shown to crucially depend on input levels of activation; we might call it a local reflection of a contextual factor. Backward center status is more clearly contextual in that it may be, but need not be, reflected in expression type. The backward center is simply the highest ranked CF of the previous utterance that is realized in the current utterance (here we follow the standard CB-definition of Brennan et al. 1987).
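This definition of the backward center, together with the four transition types of section 2.1, can be sketched in a few lines of Python. The function names `backward_center` and `transition` are hypothetical, and the sketch assumes that each utterance is given as a list of referents already ranked from most to least prominent (subject first, following the standard view adopted above). It is meant as an illustration of the definitions, not as an annotation tool.

```python
from typing import Optional, Sequence


def backward_center(prev_cf: Sequence[str], realized: Sequence[str]) -> Optional[str]:
    """Highest-ranked CF of the previous utterance that is realized in the current one."""
    for referent in prev_cf:  # prev_cf is ordered from most to least prominent
        if referent in realized:
            return referent
    return None  # the two utterances share no referent


def transition(cb_prev: Optional[str], cb_cur: Optional[str], cp_cur: str) -> str:
    """Classify the U1-U2 transition using the four types described in section 2.1."""
    same_cb = cb_cur is not None and cb_cur == cb_prev
    cb_is_cp = cb_cur is not None and cb_cur == cp_cur
    if same_cb:
        return "continue" if cb_is_cp else "retain"
    return "smooth shift" if cb_is_cp else "rough shift"


# Example (14): context "George has not been in for two days now."
cf_context = ["George"]
cf_u1 = ["Al", "George"]  # "Al has just called him."
cb1 = backward_center(cf_context, cf_u1)

for label, cf_u2 in [("U2a", ["George"]), ("U2b", ["Al"])]:
    cb2 = backward_center(cf_u1, cf_u2)
    print(label, transition(cb1, cb2, cp_cur=cf_u2[0]))
# U2a continue
# U2b smooth shift
```

A missing CB2, as in example (11), is represented by `None` and falls out as a rough shift.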
The Centering preference for continuous sequences embodies a contextual constraint according to which the forward prominence of a referent depends on backward center status, that is, on having been mentioned prominently in the preceding segment. However, the relevant context may be larger than just one segment. For instance, it might be that the importance of a referent in the preceding paragraph independently contributes to its forward prominence. Hence we will also incorporate a global measure of referential importance in this study, the so-called Discourse Topic Score, to be defined in section 3.3.4.

We close this section with a note on terminology. Poesio et al. (2004) have pointed out that there are various ways in which the central notions and claims of Centering Theory have been defined. Therefore, any study claiming to apply centering notions should clearly state which version of the framework is being used. The main notions that need to be explicated for this study are backward center, what to count as a realization of a referent, and how to define an utterance. Our definition of backward center has been given above. As for realization, we will only take into account direct mentioning of referents, excluding bridging inferences of various kinds (e.g. between items such as The room and the door in later utterances).2 Our impression is that this restriction does not make too much of a difference. Our study focuses on person references, and entities associated with persons are often marked as such by possessive pronouns, producing a direct reference and avoiding the need for bridging inferences. Finally, we will define utterances as finite clauses, although, as we will see later, not every clause is equally important in referential terms (see 3.2).

3 A corpus study of pronoun ambiguity

3.1 Research method

The issues raised in section 2.2 clearly require empirical investigation. Most empirical work on referential coherence uses either corpus analysis (e.g.
Ariel 1990, Givón 1992, Brennan 1995) or experiments (e.g. Gordon et al. 1993, Chambers & Smyth 1998). Both methods have their strengths and weaknesses. Experiments offer the advantage of control: potentially distracting variables can be controlled for in the experimental materials. On the other hand, corpus analysis presents a more realistic view of the referential transitions actually encountered in certain discourse genres; moreover, corpora offer richer contexts for referential expressions than the one or two simplex sentences that usually precede experimental target sentences.

In this paper, we use corpus analysis to examine the role of four factors contributing to referential prominence: subject status, expression type, backward center status, and discourse topichood. Centering theorists have proposed the first three factors as characteristics of preferred referential configurations. It must be kept in mind that predictions such as these cannot be directly tested in a corpus analysis. The Centering framework is about preferences that make discourse easier to process, not about choices in actual language behavior, as is rightly pointed out by Poesio et al. (2004). However, since discourse probably tends not to violate expectations of average readers on a structural basis, the Centering predictions are at least relevant for corpus data, and vice versa. That is, frequent referential progressions in actual discourse are at least what readers are ‘used to’, and may even be what they actually expect to encounter.

We hasten to add that the primary aim of our study is not to test Centering Theory as a framework. Such an attempt has been made by Poesio et al. (2004) already. For instance, they present corpus results regarding (different versions of) the central Centering claims that utterances tend to have a (single) backward center, and that this center tends to be pronominalized.
The present study is not so much about Centering as it is about the factors that affect the forward prominence of referents in natural discourse. To explore this issue, we focus on a specific discourse environment: a context in which two participants compete for prominence. Our study investigates two continuation variables that, in our view, have been insufficiently distinguished in theorizing on referential coherence: the factors under study may affect which referents are expected to reappear in subsequent discourse, or they may affect the form of continuation.

In the first corpus study, we studied forward prominence by means of a corpus of ambiguous pronouns, reasoning that the intended referent of the pronoun will generally be the most prominent one available in the context. In this study, the first continuation variable is investigated: which participant is taken up in subsequent discourse. The form of resumption is not at stake here, because it is kept constant.

The second study uses the approach of Arnold (2001): it examines referential continuations in a collection of fragments around a specific type of utterance. We selected utterances that contained both a personal subject and a personal object and compared the continuations for both kinds of participants. In this study, we examine which referents reappear and what form the repeated reference takes.

The data were gathered from an electronic database of articles from the 1994 and 1997 issues of De Volkskrant, a Dutch quality newspaper. From this corpus, we collected 100 passages in which a sentence containing the Dutch pronoun hij (he) was preceded by a sentence in which two male singular participants are mentioned.3 The cases included in our corpus were of the type illustrated in (1), in which two candidate referents, in this case George and Al, are introduced in the first segment U1, while U2 contains the ambiguous pronoun he that could refer to either of them.

(1) George hit Al. He …
The segments in our corpus were often sentences, but could also be clauses making up complex sentences. In (1c) for instance, the he-utterance is a dependent clause:

(1c) George hit Al after he ….

In all corpus fragments, the subsequent text was then examined to determine which referent was the Actual Referent (AR), i.e. the one intended by the writer. Thus, each selected text fragment contained two candidate referents, of which one was identified as the AR in retrospect. We refer to the other referent as the Potential Referent (PR). In all cases, two analysts agreed on the interpretation.

3.2 Candidate referents in different clauses

So far, we have discussed cases of ambiguous reference in which the first simplex sentence, S1, contained AR and PR, the ambiguous pronoun he was in S2, and both S1 and S2 were main clauses. However, AR and PR may be in different clauses within S1. These clauses may appear in three configurations: the two clauses are coordinated, the Main Clause (MC) precedes the Subordinate Clause (SC), or MC follows SC. In all three environments, AR may precede or follow PR; this produces six configurations, see Table 1:

Clause configuration    Order of referents
                        Actual – Potential      Potential – Actual
Coordination            1. C1 (AR), C2 (PR)     2. C1 (PR); C2 (AR)
Main-subordinate        3. MC(AR); SC(PR)       4. MC(PR); SC(AR)
Subordinate-main        5. SC(AR); MC(PR)       6. SC(PR); MC(AR)

Table 1. Configurations for candidate referents in complex sentences

In the following illustrations we mark the expression containing the AR in bold face, while the PR expression is italicized. Fragment (15) illustrates configuration 3, a main clause with AR followed by a subordinate clause providing PR:

(15) D Van Eyck stapt op, maar keert terug als er een flinke reorganisatie is doorgevoerd en een ontslagen hoogleraar weer is aangenomen. Hij blijft tot zijn pensioen, al houdt hij een grote weerzin tegen de bureaucratie op de TH.
E Van Eyck resigns, but returns when a radical restructuring has been implemented and a dismissed professor has been re-appointed. He stays on until his retirement, although he remains disgusted with the bureaucracy at the Technical University.

Fragment (16) illustrates configuration 2 (coordination in PR-AR order), while (17) illustrates configuration 4 (PR-clause ranks above AR-clause in PR-AR order). Note that in (17), the he-clause itself is subordinated to the AR-clause.

(16) D (From a report on the prize-giving ceremony of a literary prize at which two writers, Durlacher and Matsier, get the same number of votes:) Wat leuk dat er twee winnaars zijn! Durlacher kan het niet geloven, Matsier wel; voor het eerst van de avond lacht hij onspannen.
E How funny that we have two winners! Durlacher can’t believe it, Matsier can; for the first time this evening he shows a relaxed smile.

(17) D Grant beschrijft twee facetten van Delors die duidelijk maken waarom hij zo kon slagen in Europa: …
E Grant discusses two characteristics of Delors that explain why he could be so successful in European politics: …

In all these cases, the candidate referents are in different linear positions (one is mentioned earlier than the other), but on top of that, one may be in a main clause and the other in a subordinate clause. Does this syntactic variable help in determining AR? When PR precedes AR, pronoun resolution is probably easy because AR is the nearest candidate referent for he. Cases with the order AR-PR seem more complex, and here it may be expected that syntax plays a role in indicating the relative importance of referents. More specifically, in this order we expect AR to appear in the main clause and PR in the subordinate clause. This is exactly what we find (see Table 2).

Clause configuration                                     Referent order
                                                         AR-PR    PR-AR
Equal status *                                             1        6
Subordinate – main                                         0        3
Main – subordinate                                        13        0
Main – subordinate – he-clause subordinate again           0        7
Total                                                     14       16

Table 2. Results of the subcorpus with PRs in different clauses (n=30).
* The equal status-category contains two cases in which the PR-clause was interpolated in the AR-clause, following the mentioning of AR.

In PR-AR order, the PR-clause may well be a coordinated clause such as in (16). By contrast, in AR-PR order the PR-clause is almost always subordinated, such as in (15) above. In other words, the AR is either mentioned most recently, or is in a clause dominating the clause with the PR. This result implies that readers looking for intended referents may sometimes disregard a subordinate clause in their backward search; only the referent in the main clause is a serious candidate.

There is one exception to this pattern: AR can occur in a subordinate clause when the he-clause in its turn is subordinated to the (already subordinated) AR-clause, as in (17) (Chi2= 20.00, df=1, p=.000 for the two-by-two comparison between the two main-subordinate configurations in Table 2). In other words, while independent he-utterances may refer further back to the main clause of the preceding sentence, dependent he-clauses refer back to the immediately preceding higher clause.

These results are interesting because of the discussion between centering theorists about what should be defined as an ‘utterance’ when analyzing referential progressions. Kameyama (1998) suggested that ‘referential calculation’ is not done sentence by sentence, but clause by clause (at least for tensed subordinate clauses). More specifically, she proposed to break up complex sentences into a set of “center updating units”. These finite clauses would all be of equal weight. By contrast, Suri & McCoy (1994) propose to see postposed subordinate clauses as ‘embedded’, and not to treat them as updating units. Poesio et al.
(2004) demonstrate that both parameter settings may produce violations of the constraint that utterances should have a backward center, since utterances may have a referential link either to the immediately preceding subordinate clause or to the main clause preceding this clause. Our data cannot decide on this issue, because they only deal with pronominal references to persons in quite specific ‘competition’-contexts. But they do show that sentence-final subordinated clauses tend not to be seen as updating units by writers, thus favouring Suri & McCoy’s side of the debate. Of course, this finding is also in line with other work showing that subordinate clauses in final position tend to present background information (Tomlin 1985, Matthiesen & Thompson 1987).

3.3 Candidate referents in the same clause

In our corpus of 100 cases of ambiguous he, 70 cases had the candidate referents in the same finite clause. From now on, we will call this clause the AR-PR utterance. In this part of the corpus (n=70), we investigated the influence of four factors that we expected to guide the identification of AR:

1. the grammatical role of the candidate referents in the AR-PR utterance, the hypothesis being that subject referents will more often be the AR than non-subject referents;
2. the expression type of both referents, the hypothesis being that pronominal referents will more often be the AR than nominal referents (noun phrases and names);
3. the possible backward center status of the referent, expecting that backward center referents are the AR more often than other referents;
4. the discourse topichood of both referents, expecting that ARs are more often the global discourse topic than their competitors.

Most of these factors have already been defined above, with the exception of discourse topichood, which will be discussed in 3.3.4.

3.3.1 Grammatical role

In our corpus, grammatical role always discriminates between candidate referents.
That is, we never find conjoined referents like the ones in the following fragment (Givón 1992, 14):

(18) *I saw Joe and Billy. He didn’t look well.

The distribution of grammatical roles over the candidate referents is summarized in Table 3.

                          Actual referent    Potential referent
Subject                   53 (76%)           11 (16%)
(In)direct object          8 (11%)            5 (7%)
Prepositional phrase*      3 (4%)            52 (74%)
Other                      6 (8%)             2 (3%)

Table 3. Grammatical role of ARs and PRs in the AR-PR utterance.
* This category includes both prepositional phrases functioning as arguments in the predicate frame and prepositional phrases functioning as adverbials.

Table 3 shows that the grammatical role hypothesis is supported: ARs are often realized as subjects, while their competitors only rarely appear as subjects (Chi2= 50.77, df=1, p=.000 for the two-by-two comparison between subjects versus other roles by actual versus potential referents). Instead, PRs are strikingly often realized as prepositional phrases (that may function both as oblique arguments of the predicate and as adverbials). In 47 (67%) of the cases a subject-AR was combined with a prepositional phrase-PR. Since prepositional phrases are lowest in the syntactic constituent hierarchy proposed in centering theory, this implies that in most cases the difference in syntactic prominence between actual and potential referents is maximal. (19) is a prototypical example:

(19) D Kalff was de logische opvolger van Hazelhoff. Hij maakt al 17 jaar deel uit van de top, eerst bij de ABN en na de fusie ook bij ABN Amro.
E Kalff was the obvious successor of Hazelhoff. He has been part of the top management for 17 years already, first within the ABN and after the merger also within ABN Amro.

The importance of grammatical role can be illustrated further by looking at cases in which both referents are realized in identical expression types. Our corpus contained thirty of these cases, four of which could not be used because the subject of the preceding utterance did not refer to a person.
Out of the 26 remaining cases, AR was the subject of the preceding utterance in 24 cases (92%).

Now it could be argued that the preference for subject referents does not stem from the grammatical function hierarchy but from grammatical parallelism: subject pronouns prefer subject antecedents. If that were the case, object pronouns should behave differently from subject pronouns. In order to check this possibility, we collected 10 cases of ambiguous him. In all of these cases, him refers to the subject referent of the preceding utterance, as in the following fragment:

(20) D Scholma is vertrouwd met de rol van secondant. Hij deed het in 1976 voor de eerste keer ten behoeve van Sijbrands. Wiersma heeft hem gekozen omdat hun ideeën over het damspel overeen komen.
E Scholma is familiar with the role of second. He did it first in 1976 for Sijbrands. Wiersma chose him because they have the same view on draughts.

The subject preference is found for both subject and object pronouns. Therefore, grammatical role seems to be the factor that guides the search for antecedents, rather than grammatical parallelism.

3.3.2 Expression type

A second feature that could affect the interpretation of ambiguous pronouns is the expression type used to refer to AR and PR: do they appear as pronouns (he, him), names (e.g. Kalff) or NPs (e.g. a / the dismissed professor)?

            Actual referent    Potential referent
Pronoun     27 (39%)            5 (7%)
Name        38 (54%)           50 (72%)
NP           5 (7%)            15 (21%)

Table 4. Expression type for actual vs. potential referents in AR-PR utterances

Table 4 shows that the expression type hypothesis is supported: AR regularly occurs as a pronoun, while its competitor only rarely does so (Chi2= 21.76, df=2, p=.000). Note that the proportion of pronouns among ARs is not overwhelming. Also, we need to add that expression type often does not discriminate between the referents.
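The chi-square value reported for Table 4 can be recomputed from the counts alone. The following is a minimal sketch in plain Python, assuming a Pearson chi-square test of independence without continuity correction (the text does not say which variant was used); the function name chi_square is our own and purely illustrative.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts. No continuity correction
    is applied (an assumption; the paper does not specify the variant).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Counts from Table 4: rows = actual vs. potential referent,
# columns = pronoun, name, NP.
table4 = [[27, 38, 5],
          [5, 50, 15]]
stat, df = chi_square(table4)
print(round(stat, 2), df)  # 21.76 2
```

The same function applies to the two-by-two comparisons reported elsewhere in this section, provided the relevant cell counts are entered as a 2 x 2 table.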
A particularly regular configuration (n= 34; 48.5%) is the one in which both referents are realized by names (sometimes accompanied by introductory NPs).

Although expression type does not appear to be a strong predictor of continuation in itself, it might perform a secondary role, in that it adds weight to mark the prominence of non-subject referents. Hence we checked whether non-subject ARs are more often realized as pronouns than subject ARs. There is no quantitative support for an interaction between grammatical role and expression type of the antecedents for ambiguous pronouns.

3.3.3 Backward center status

Grammatical role and expression type are ‘local’ determinants of prominence, in that they only pertain to the utterance with the candidate referents. We will now go one step further back in the preceding discourse and examine the relation between the AR-PR utterance and the utterance preceding it. This relation determines the kind of Centering transitions realized by the ambiguous pronoun utterance following the AR-PR utterance.

For both actual and potential antecedents, we coded whether or not they were the backward center of the AR-PR utterance. Centering Theory predicts that actual antecedents of pronouns will preferably be the backward center in their utterance, since in that case the pronoun utterance meets the conditions for a Continue transition:

• the he-referent is generally the CB of the pronoun utterance
• it is identical to the CB of the AR-PR utterance
• the he-referent is also the preferred forward center, being in subject position.

When the pronoun antecedent is not the backward center in its utterance, there are two possibilities. First, not the actual but the potential antecedent may be the backward center in the preceding utterance. In that case, the pronoun utterance realizes a Smooth Shift, as in fragment (9) above. Second, neither antecedent can be a backward center.
This may be the case when the two participants are both new to the discourse, or when they are both mentioned in the same preceding utterance in the same syntactic role and have the same expression type in the AR-PR utterance. In the last case, we followed the centering theory assumption that there cannot be two backward centers. When neither referent is a backward center, no predictions on pronoun resolution can be derived from transition types.

When coding referents as backward centers, we adopted a liberal definition: a referent could be a backward center not only when it was mentioned in the preceding utterance, but also when it was mentioned two utterances earlier, provided of course that the competitor had not been mentioned in the meantime. For instance, in (21) we consider the referent Al in utterance (c) to be a backward center by virtue of its being mentioned two utterances back:

(21) a. George and Al are brothers, but they have a totally different attitude to life.
b. George spends all his spare time reading.
c. Al goes out hiking every weekend with his son Peter.

Table 5 shows that 71% of the pronoun utterances realize Continue transitions (with AR as the backward center), while the proportion of Smooth Shifts (with PR being the backward center) is about 10%; thus the backward center hypothesis is clearly supported (Chi2= 32.44, df=1, p=.000 for the contrast between AR centers and PR centers). The proportion of AR-centers is much higher than the proportion of pronominal ARs (39%, see Table 4), which means that backward centers are not necessarily realized as pronouns. For a much larger corpus than ours, Poesio et al. (2004, 332 ff.) report that not more than about half of the CBs in their corpus are pronominalized. Hence backward center status seems a stronger predictor for AR status than expression type. At the same time, its predictive power is restricted by the fact that there is no clear single CB referent in 18% of the cases.
CB configuration                           Totals
AR = CB (= Continue transition)            50 (71%)
PR = CB (= Smooth Shift transition)         7 (10%)
Both referents are equally salient          3 (4%)
Both referents are new                     10 (14%)
Total                                      70

Table 5. CB status for actual and potential referents

There is no difference between the CB configurations of subject and non-subject ARs. That is, non-subject AR and subject AR-cases display the general preference for Continue transitions to the same extent. This means that transition type does not ‘compensate’ for the irregularity of non-subject continuations: it does not make it any easier to process these continuations.

3.3.4 Discourse topichood

The global factor of discourse topichood has not received a conventional definition so far. Givón (1992: 16-17) mentions a forward-looking measure, topic persistence (the number of times the referent persists as argument in a certain number of subsequent clauses), as well as the overall frequency of a referent in the entire discourse. For our purposes, we needed a backward-looking measure that provides a global, but still locally relevant level of importance for a particular referent. We chose to count the number of occurrences of a referent in the 5 finite clauses preceding the AR-PR utterance. Hence our measure of topichood, to which we will refer as the Discourse Topic Score (DTS), ranges from 0 to 5. An additional reason to restrict the relevant context to 5 clauses was that most newspaper texts (unlike narrative texts) do not continue to refer to central referents for more than one or two paragraphs. Chains of 10 or more references are hard to find in our corpus.

First, we determined whether both referents had been mentioned at all in the preceding text. We found that, prior to the AR-PR utterance, 67% of the PRs had not been mentioned at all in the text, compared to only 17% for the ARs (Chi2= 37.80, df=1, p=.000). ARs had a Discourse Topic Score of 2.27 (SD 1.74), compared to 0.79 (SD 1.36) for PRs.
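The Discourse Topic Score defined above can be sketched as a simple count. The example assumes that each preceding finite clause has already been annotated by hand with the set of referents it mentions; the function name discourse_topic_score, the window parameter and the toy clause annotations are hypothetical and serve only as illustration. Counting each clause at most once is our reading of the statement that the score ranges from 0 to 5.

```python
def discourse_topic_score(preceding_clauses, referent, window=5):
    """Count the clauses, among the last `window` finite clauses before the
    AR-PR utterance, in which `referent` is mentioned.

    `preceding_clauses` is a list of sets of referent labels, in text order.
    Each clause counts at most once, so the score ranges from 0 to `window`.
    """
    recent = preceding_clauses[-window:]
    return sum(1 for clause in recent if referent in clause)


# Hypothetical annotation of six finite clauses preceding an AR-PR utterance.
clauses = [
    {"George"},        # falls outside the 5-clause window
    {"George", "Al"},
    {"George"},
    set(),             # clause mentioning neither referent
    {"George"},
    {"Al"},
]
print(discourse_topic_score(clauses, "George"))  # 3
print(discourse_topic_score(clauses, "Al"))      # 2
```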
That is, our expectation that ARs would be significantly older than PRs was supported (t=6.19, df=68, p=.000). For instance, the following fragment (from 1994) is clearly about the soccer player Vink, mentioned 4 times, not about his coach, who is only mentioned once:

(22) D (From a story about a Dutch soccer player whose career is going down after he left Ajax.) Uitgerust met een radio en een totoformulier slaat Marciano Vink (23) momenteel de wedstrijden van Genua gade. Hij is fit, maar (Ø) speelt niet. Trainer Scoglio van de Italiaanse nummer veertien denkt middenvelder Vink niet nodig te hebben. Zo goed als mogelijk is na een aantal succesvolle seizoenen, probeert hij Ajax zo min mogelijk te koesteren: `Voetbal is een leven van komen en gaan.'
E Equipped with a radio and a pool coupon Marciano Vink (23) is watching the matches of Genoa. He is fit, but (Ø) does not play. Coach Scoglio of the Italian number 14 thinks he does not need midfielder Vink. Although this is difficult after a number of successful years, he tries not to cherish his memories of his Ajax years too much: “Soccer is a life of coming and going.”

Nevertheless, there was a considerable number of cases in which the two referents were equally old (n =12), or in which the PR was even older (n =13). Hence the DTS of the referents does not entirely determine the pronominal referent. As an example consider (23), in which the AR only appears once, while the PR has been mentioned three times:

(23) D (Dietz has published a poetry anthology, the introduction of which has been criticized for quoting poems by Reve and Faverey without permission:) Volgens Dietz passen de citaten in de `context van de inleiding' en zijn ze gebruikt `om redenen van representativiteit', maar of het correct is, `daarover kan men van mening verschillen.' Dietz heeft nog geen protest gehoord van Reve, maar de directeur van De Bezige Bij, A. Vorster, heeft zich al wel bij hem gemeld.
Hij wil een gesprek over het citeren van Faverey.
E According to Dietz the quotes ‘fit into the context of the introduction’ and have been used ‘for reasons of representativeness’, but one may disagree about whether their use has been entirely ‘correct’. Dietz did not receive any protests from Reve yet, but the Bezige Bij-director, A. Vorster, already contacted him. He demanded a meeting on the quotes from Faverey.

Let us now look at the possible interaction of DTS with grammatical role. If the two factors complement each other, we might expect that non-subject ARs need to be “extra heavy” in terms of global prominence. Hence, in cases with a non-subject AR, the difference in DTS between the candidate referents should be larger than in cases with a subject AR. To test this possibility, a repeated-measures ANOVA was carried out on DTS as the dependent variable with referent status (actual or potential) as within-variable and the role of the AR as the between-variable. The results are shown in Table 6.

                         DTS for AR      DTS for PR
Subject AR cases         1.98 (1.62)     .87 (1.45)
Non-subject AR cases     3.18 (1.81)     .53 (1.01)
Totals                   2.27 (1.74)     .79 (1.36)

Table 6. The interaction of grammatical role of AR (Subject or not) and discourse topic score (DTS) for AR and PR

The main effect of referent status (F [1,67] = 39.76, p = .000) on DTS is qualified by a significant interaction between referent status and grammatical role of the AR (F [1,67] = 6.62, p = .012): the difference in DTS between actual and potential referents is larger for non-subject referents than it is for subject referents, mainly because the DTS for non-subject ARs is higher than that for subject ARs (t = 2.57, df = 68, p = .012). In other words, when a non-subject becomes the AR, it is exceptionally well established in the previous discourse.

We can look at the same phenomenon from a different point of view by comparing DT scores within cases.
In Table 7 we distinguish between three kinds of cases: the AR is ‘older’ than the PR, the PR is older than the AR, or the two are equally old. The frequencies show that non-subject ARs are almost always older than their competitors (88%), while for subject ARs only 57% is older (Chi2= 5.61, df=1, p=.018 for this 2x2 comparison). Alternatively, one may say that PR may be older for subject AR-cases (24%), while this is impossible for non-subject AR-cases (Chi2= 5.12, df=1, p=.024 for this comparison); in other words, a subject referent can beat an older competitor in terms of forward prominence, a non-subject referent cannot.

                         AR is older    AR and PR equally old    PR is older    Totals
Subject AR cases         30 (57%)       10 (19%)                 13 (24%)       53
Non-subject AR cases     15 (88%)        2 (12%)                  0             17
Totals                   45             12                       13             70

Table 7. DTS-configurations within fragments for subject and non-subject AR-cases

These results show that non-subject ARs are well established as the discourse topic in the 5 clauses preceding the AR-PR utterance. This indicates that grammatical factors and discourse topichood may work together in suggesting plausible antecedents for ambiguous pronouns. ARs are either prominent because they are the subject of the preceding utterance (grammatical role), or because they are mentioned more often in the preceding discourse (discourse topichood). The resulting parsing principle could be this: in case of an ambiguous pronoun, the intended referent is the subject referent of the preceding utterance, except when another referent is well established as the Discourse Topic; in that case, the situation is undecided.

3.3.5 Do the predictors contribute independently?

We have now discussed four predictors for AR-status: grammatical role, expression type, backward center status and DTS. These variables are significantly correlated: for instance, subjects are more often CB (r = .39, p < .01), pronouns are more often CB (r = .49, p < .01), and CB referents have a higher DTS (r = .50, p < .01).
We now need to determine whether the variables contribute independently, or whether the contribution of one variable is cancelled out by the other. We therefore used a statistical method that compares several predictors in one and the same analysis: logistic regression (see Van Hout & Rietveld 1993 for an introduction). This is a technique for estimating to what extent the likelihood of occurrence of a certain event (in this case: being the intended referent of the ambiguous pronoun) is affected by an independent variable. Variables may be combined into models; different models including different sets of predictors may also be compared to others in terms of their overall performance.

We determined the optimal model by successively adding new predictors to the model, starting with a model that only includes a constant. First we added grammatical role, then expression type, then backward center status and finally DTS. We used two criteria to accept or reject new predictors:

- the model performance, as measured by the model likelihood ratio, has to improve significantly when adding a new predictor
- the new predictor itself needs to be significant at the .05 level, as indicated by the Wald statistic of its B coefficient.

If these criteria are not met, a predictor is dropped from the model. In this case, all predictors except for expression type were needed for the optimal model (see Table 8). That is, grammatical role, CB-status and Discourse Topic Score independently contribute to the chance of resumption as a referent. The model with three variables explains a fair share of the variance, as indicated by its Nagelkerke R square.

                                                      Constant model                      Optimal model
Variables                                             B (S.E.)       Wald    p-value      B (S.E.)       Wald     p-value
Constant                                              .000 (.169)    .000    1.000        -2.98 (.55)    29.21    .000
Grammatical role (subject yes/no)                                                         3.14 (.60)     27.29    .000
Backward center status (yes/no)                                                           2.26 (.60)     14.07    .000
Discourse Topic Score (0-5)                                                               .52 (.18)      8.83     .003
Model log-likelihood (df for the model)               194.08 (1)                          94.45 (4)
Log-likelihood improvement (df for the comparison)                                        99.63 (3)               .000
Nagelkerke R square                                                                       .679

Table 8. Logistic regression analysis of factors predicting whether a referent is the intended referent of the pronoun in the next utterance

In sum, this first study provides support for local as well as contextual determinants of forward prominence. With regard to local factors, we found a subject preference; with regard to the local context, we found a preference for Continue transitions. This last finding is noteworthy in itself, given the skepticism about the role of transition preferences in pronoun resolution voiced by Kehler (1997) and Tetreault (2001). Furthermore, it is clear that the global contextual factor of discourse topichood plays a significant role too. Especially interesting is its interaction with grammatical role, which points to a complementary relation between a local factor and a global contextual factor.

In the final logistic regression analysis, we found no effects for expression type; the crucial factor is probably backward center status, not expression type by itself. This result seems to favor the standard centering theory preference rules for transitions over the expression type hierarchy proposed by Kameyama (1996).

4. A study on continuations following subject-object utterances

4.1 Why we used a continuation corpus

The size of our corpus in the first study was modest. Moreover, the situation of ambiguous pronouns is rather special in that speakers using such pronouns seem to be confident that one of the referents in the AR-PR utterance is clearly more prominent than the other.
In order to examine the determinants of forward prominence in a larger corpus of utterances in which the referential prominence configuration seems less clear-cut, we assembled a corpus of utterances presenting two personal referents, one of them in subject and the other in object position, at least one of which is referred to in subsequent discourse.

The general assumption behind this study was already discussed in the introduction: writers and readers need to coordinate their individual models of referent accessibility in the discourse, so that readers are prepared for what comes next in the discourse. We would expect then that the actual continuation of the discourse can, to a degree, be predicted on the basis of characteristics of the discourse so far. This reasoning is not only relevant for processing ambiguous pronouns, but also applies to every discourse situation in which several referents have been introduced.

Now, in this larger corpus, we may distinguish between two aspects of continuation. The first question is whether a referent reappears or not, and the second one is in what expression type the referent reappears. The first rule in centering theory states that, when subsequent utterances contain pronouns, CB will be realized by one of them; this is meant to capture the intuition that pronominalization is a regular way to indicate backward saliency of a discourse referent.

4.2 Description of the corpus

We gathered a new corpus of 200 newspaper fragments, again from De Volkskrant, which consists of two parts:

1. The first 100 fragments were collected on the basis of (indirect) object pronouns: the central utterance contained an occurrence of him and a subject constituent referring to an identifiable, single third person;
2. Another 100 fragments were collected on the basis of subject pronouns: the central utterance contained he and an (indirect) object constituent referring to an identifiable, single third person.
This sampling procedure was chosen for two reasons. First, utterances containing two single human referents are not very common, and searching for personal pronouns is a way to computerize at least part of the sampling process. Second and importantly, using both subject and object pronouns as search strings promises a corpus in which subject and object referents are comparable in terms of expression type, a crucial requirement for our research, in which we hope to disentangle different determinants of forward prominence.

We restrict our sampling to human referents in (in)direct object position; in Dutch, indirect objects can generally be realized without prepositions. That is, we exclude person referents in prepositional phrases containing arguments of the clause predicate (sometimes called ‘object-of-PP’). It may be that object participants are more prominent than the PRs in the first study, most of which were in prepositional phrases.

Furthermore, the following conditions were used when selecting corpus fragments:

• the central utterance was not a subordinate clause, because the first corpus study showed (section 3.2) that the prominence of referents in a subordinate clause is ‘down-graded’ in comparison to those in the main clause;
• in order to determine the relative prominence of both referents in the preceding text, the central utterance had to be preceded by minimally five clauses in which both referents could appear;
• in order to determine the relative prominence of both referents by their reoccurrence, only those fragments were selected in which at least one of the referents re-appears.

Fragment (24) illustrates an object pronoun fragment, (25) illustrates a subject pronoun fragment. CU marks the central utterance.

(24) D (At the World Cup swimming tournament, the Russian Popov has set a new world record for the 100 meters free style.) [1] Een Nederlands succes was er voor Ron Dekker [2] die zegevierde op zowel de 50 als 100 meter schoolslag.
[3] Popov ontbrak onlangs bij de eerste wereldkampioenschappen op de 25 meter-baan, gehouden in Palma de Mallorca. [4] Hij bereidde zich in Australië voor op het nieuwe seizoen. [5] Vorig jaar werd zijn `oude' Russische trainer Toeretski ingehuurd door het Australische Sportinstituut van Canberra CU: en Popov volgde hem weldra. De tweevoudig Olympisch kampioen (50 en 100 vrij) was in het Kowloon Park-bad van Hongkong elf honderdste seconde sneller dan de Braziliaan Gustavo Borges vorig jaar juli in Santos (47,94).
E [1] A Dutch success was the performance of Ron Dekker [2] who won both the 50 and the 100 meter breast stroke. [3] Popov was recently absent at the first world championships on a 25 meter-track, held in Palma de Mallorca. [4] He prepared himself for the new season in Australia. [5] Last year his ‘old’ Russian coach Toeretski was hired by the Australian Sport Institute of Canberra CU: and Popov followed him soon. In the Kowloon Park pool of Hong Kong, the two-time Olympic champion (50 and 100 free style) was faster by eleven hundredths of a second than the Brazilian Gustavo Borges last year in Santos (47,94).

(25) D (It is their first, stammering telephone call.) [1] Zij heeft een 06-advertentie gezet, [2] hij reageert. [3] Gaandeweg ontstaat er contact. [4] Met niet meer dan een stem bouwen ze een relatie op via de telefoon. [5] De spelregels liggen vast. CU: Hij belt haar nooit, zij belt hem. Zo krijgen ze een verhouding die alle fases doorloopt.
E [1] She has placed a 0900-ad in the paper, [2] he responds. [3] Gradually a connection between them evolves. [4] With no more than their voices, they build a relationship by phone. [5] The rules of the game are set. CU: He (subject pronoun) never calls her (object), she calls him. In this way, they develop an affair that passes through all phases.

As these fragments show, the central utterance may be coordinated with preceding or following clauses.
In the central utterance of some of these coordinations, the subject is zero coded (objects cannot be elided in Dutch in this context). We decided to include these fragments because ellipsis might be a characteristic feature of the subject role that directly touches on the issue of forward activation that is under scrutiny here. Apart from ellipsis, the subjects and objects were realized in more or less the same expression types in the corpus:

                Subject referents   Object referents
Ellipsis                8
Pronoun               122                 122
Names                  54                  53
Full NP                16                  25
Total                 200                 200

Table 8. Subject and object expression types in the continuation corpus

We need to note that our sampling method led to a bias in the expression type configurations, since all our central utterances contained at least one pronoun. That is, our sample does not include fragments with central utterances in which both referents are realized as names or full NPs (see Table 9).

                       Object ET
Subject ET        Pronoun    Name    Full NP    Total
Ellipsis              8                             8
Pronoun              44       53       25         122
Name                 54                            54
Full NP              16                            16
Total               122       53       25         200

Table 9. How subject and object expression types are combined

Subject referents and object referents do not differ in their discourse topic scores (M = 2.08, SD = 1.57 for subject referents; M = 2.03, SD = 1.58 for object referents). The diversity of predicate types that we found in the corpus is such that we can conclude that there is no particular thematic role bias in the corpus.

The analysis proceeded as follows. Starting from the central utterance, we checked for each fragment whether one or both referents reappeared, and if so, whether they were both mentioned in the same utterance, and in what grammatical role and what kind of expression type. In addition, we looked at the discourse topic score, that is, the number of times a referent was mentioned in the 5 preceding clauses. For instance, in (24) only the subject referent recurs in the first continuation utterance, in subject position and in NP-form.
The discourse topic scores are 3 for the subject referent (Popov) and 1 for the object referent (his trainer). In (25), both referents recur as pronouns in the first continuation utterance. The topic score is 2 for both referents, counting the plural they in utterance [4] as a reference to both referents.

4.3 Results 1: Does the referent reappear in subsequent text?

4.3.1 Role effects

First we discuss whether grammatical role affects the probability of being mentioned again in subsequent utterances. This continued reference can be considered in several ways.
• First, we may ask absolute questions for both referents in isolation: does the referent recur somewhere in the rest of the text or not, and if yes, in what utterance, what role and what form?
• Second, we may ask relative questions regarding continuation configurations concerning the two referents in a particular fragment. For instance, how often do both referents recur and how often does the subject referent recur while the object referent does not? And is the subject referent more often the first to be referred to again (the object referent only being mentioned in a later utterance)?

Let us take a first look at the different possibilities of continuation. Departing from the central utterance, we will define the ‘next utterance’ as the first subsequent clause presenting either the subject or object referent again. Most often this is the immediately following utterance. The next utterance may present only the subject participant, the subject and object participant together, or only the object participant. A ‘subject-only’ utterance may be followed by a later utterance reintroducing the object participant, or this participant may not return at all; the same holds, mutatis mutandis, for an ‘object-only’ utterance.
Figure 1 presents the numbers of observations in these categories:

central utterance: subject + object
    only SR in next clause (n = 71)
        OR returns later (n = 19)
        OR does not return (n = 52)
    SR + OR in next clause (n = 54)
    only OR in next clause (n = 75)
        SR returns later (n = 38)
        SR does not return (n = 37)

Figure 1. Overview of the options for continuation following the central utterance. SR = subject referent, OR = object referent

We can now compare the frequencies for subject and object participants. As for the absolute continuation issue, both referents reappear about equally often (see Table 10); nor is there any difference in the position at which the referents are resumed: both subject and object referents overwhelmingly reappear in the first or second utterance following the central utterance (see Table 11).

                       Subject continued?
Object continued?      Yes             No             Totals
Yes                    111             37             148 (74%)
No                      52              0              52 (26%)
Totals                 163 (81.5%)     37 (18.5%)     200

Table 10. Whether subject and object referents return in subsequent discourse

                                     Subject referent   Object referent
Resumed in utterance 1                     100                106
Resumed in utterance 2                      38                 25
Resumed in subsequent utterances            25                 23
Totals                                     163                148

Table 11. In what utterance subject and object referents are being referred to again

4.3.2 Effects of expression type, backward center status and discourse topichood

Subject and object referents reappear about equally often in subsequent text. What other factors co-determine their reappearances? As in the previous study, we examined three such factors: expression type, backward center status and discourse topichood. Again, discourse topichood was defined in terms of the number of times a referent has been mentioned in the 5 clauses preceding the central utterance: the Discourse Topic Score (DTS). Expression type was dichotomized into two values: pronoun (coded 1) or other (coded 0).
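The two codings just defined can be illustrated with a small sketch. It is not the procedure used in the study, where fragments were coded by hand: the clause-by-clause mention sets below are a hand-made encoding of fragment (24), the function names are hypothetical, and the sketch assumes that a referent counts at most once per clause (which fits the reported 0-5 range of the DTS).

```python
# Illustrative sketch only: the mention sets are a hand-coded rendering of
# fragment (24); all names below are hypothetical, not taken from the study.

def discourse_topic_score(pretext_clauses, referent, window=5):
    """Count the clauses, among the last `window` clauses before the
    central utterance, in which `referent` is mentioned."""
    return sum(1 for clause in pretext_clauses[-window:] if referent in clause)


def code_expression_type(expression_type):
    """Dichotomize expression type: pronoun = 1, anything else = 0."""
    return 1 if expression_type == "pronoun" else 0


# Referents mentioned in clauses [1]-[5] of fragment (24).
fragment_24 = [
    {"Dekker"},              # [1] Ron Dekker
    {"Dekker"},              # [2] die (relative pronoun)
    {"Popov"},               # [3] Popov
    {"Popov"},               # [4] Hij
    {"Popov", "Toeretski"},  # [5] zijn ... trainer Toeretski
]

print(discourse_topic_score(fragment_24, "Popov"))      # 3
print(discourse_topic_score(fragment_24, "Toeretski"))  # 1

# Central utterance of (24): "Popov volgde hem": name subject, pronoun object.
print(code_expression_type("name"), code_expression_type("pronoun"))  # 0 1
```

The scores of 3 and 1 agree with the values given for fragment (24) in the text.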
Backward center status was defined by asking whether a referent was the most recently mentioned one in the preceding utterances, or the most prominent one in case both referents came from the same utterance. For instance, when the object referent had been mentioned in utterance –1 and the subject referent had been mentioned in utterance –2 or had not been mentioned in the pretext at all, the object referent was coded as being the backward center (coded 1) of the central utterance and the subject referent was coded as a non-backward center (coded 0). When the two referents occurred in the same preceding utterance, Centering Theory requires the analyst to identify the highest ranked referent which returns in the central utterance, with the additional rule that when the central utterance contains pronouns, one of them must be the CB. Generally, this leads to assigning CB status to the subject referent of the previous utterance. For a few cases, the centering rules do not produce a CB, e.g. when none of the participants is the previous subject referent and when both are pronominalized in the central utterance. Besides the conventional CB identification procedure we also tested a second, less detailed measure of backward saliency. This measure only considers a referent as salient when it is mentioned in a more recent utterance than its competitor. It does not discriminate between referents from the same previous utterance. This crude recency measure is tested too, in order to see whether the more specific notion of CB is a stronger predictor. We investigated how these independent variables affect two aspects of referential continuation: does a referent reappear at all (absolute continuation), and is it the first or the only one to be referred to again (relative continuation). As expected, the independent variables correlated significantly. 
For instance, pronominal referents tend to have a higher DTS (r = .57) and they refer to the most recently mentioned participants more often (r = .50). Backward centers tend to have higher DTS scores (r = .32; for all correlations p < .01).

In a logistic regression analysis, we added the local variables grammatical role and expression type, then the backward center and finally the global factor DTS. If two models are superior to the others but do not differ significantly in performance, we will report the model with the lowest log-likelihood. The analysis was run separately for subject and object referents, since they may be affected in different ways by the additional determinants of prominence. Moreover, this offers the possibility of not only using the characteristics of the referent itself as predictors, but also those of the competitor.

Let us first turn to the question whether or not a referent returns in subsequent discourse. In general, we may say that already being established in both the local and global context increases the probability of reoccurrence of a referent. For object referents, the model containing CB status and DTS was clearly superior to the rest (see Table 14). For subject referents, we ended up with two models that did not differ significantly: a model containing expression type and DTS and a model with CB status and DTS. Since so far CB status proved a stronger predictor than expression type, we consider the last model the optimal one for theoretical reasons. This is therefore the model reported in Table 13.

SUBJECT REFERENT, ABSOLUTE CONTINUATION

                                   Constant model                  Optimal model
Independent variables              B (S.E.)      Wald    p-value   B (S.E.)      Wald    p-value
Constant                           1.48 (.18)    66.31   .000      -.17 (.28)      .36   .548
Backward center (yes/no)                                           2.03 (.60)    11.64   .001
Discourse Topic Score (0-5)                                         .65 (.19)    12.09   .001
Model log-likelihood
(df for the model)                 191.56 (1)                      135.26 (3)            .000
Log-likelihood improvement
(df for the comparison)                                            56.30 (2)
Nagelkerke R square                                                .398

Table 13. Logistic regression analysis for absolute continuation of subject referents

OBJECT REFERENT, ABSOLUTE CONTINUATION

                                   Constant model                  Optimal model
Independent variables              B (S.E.)      Wald    p-value   B (S.E.)      Wald    p-value
Constant                           1.05 (.16)    42.10   .000       .05 (.25)      .05   .834
Backward center (yes/no)                                            .97 (.40)     5.79   .016
Discourse Topic Score (0-5)                                         .37 (.13)     8.70   .003
Model log-likelihood
(df for the model)                 229.22 (1)                      204.63 (3)            .000
Log-likelihood improvement
(df for the comparison)                                            24.59 (2)
Nagelkerke R square                                                .170

Table 14. Logistic regression analysis for absolute continuation of object referents

CB status turns out to be a better predictor here than recency alone. The influence of CB status is illustrated in Tables 15 and 16, showing that CB referents are very likely to be referred to again, while only two out of every three non-CB referents return in subsequent discourse. CB status does not differentiate between subject and object referents.

                     SR is not CB   SR is CB     Totals
SR not continued     33 (37%)       4 (4%)        37
SR continued         56 (63%)       107 (96%)    163
Totals               89             111          200

Table 15. How CB status affects continuation of subject referents

                     OR is not CB   OR is CB     Totals
OR not continued     41 (36%)       11 (13%)      52
OR continued         72 (64%)       76 (87%)     148
Totals               113            87           200

Table 16. How CB status affects continuation of object referents

We need to note that this effect of CB status does not yet constitute empirical support for the Centering Theoretical claim that Continue transitions are preferred; such support can only be derived from results regarding the second dimension of continuation: is a referent the first or the only one to be referred to again?
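To make the coefficients in Tables 13 and 14 easier to read, they can be converted into predicted probabilities. The sketch below does this under the assumption that the reported models are standard binary logistic regressions with a constant, a backward center dummy and the DTS as predictors; the function name and the example values of CB status and DTS are chosen for illustration and are not part of the original analysis.

```python
import math

# Coefficients as reported for the optimal models in Tables 13 and 14.
SUBJECT_MODEL = {"constant": -0.17, "backward_center": 2.03, "dts": 0.65}
OBJECT_MODEL = {"constant": 0.05, "backward_center": 0.97, "dts": 0.37}


def continuation_probability(model, is_backward_center, dts):
    """Predicted probability that a referent is mentioned again, assuming
    a standard logistic link: p = 1 / (1 + exp(-logit))."""
    logit = (model["constant"]
             + model["backward_center"] * int(is_backward_center)
             + model["dts"] * dts)
    return 1.0 / (1.0 + math.exp(-logit))


for label, model in (("subject", SUBJECT_MODEL), ("object", OBJECT_MODEL)):
    for is_cb in (False, True):
        for dts in (0, 3):
            p = continuation_probability(model, is_cb, dts)
            print(f"{label:7s} CB={int(is_cb)} DTS={dts}: p = {p:.2f}")

# The reported log-likelihood improvement is the difference between the
# log-likelihood values of the constant model and the optimal model.
subject_improvement = round(191.56 - 135.26, 2)
object_improvement = round(229.22 - 204.63, 2)
print(subject_improvement, object_improvement)
```

Read this way, a subject referent that is neither backward center nor mentioned in the pretext has a predicted continuation probability just below one half, while backward center status combined with a DTS of 3 pushes the prediction close to certainty.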
As is shown in Tables 17 and 18, the predictors for this aspect of continuation cannot be found in characteristics of the referent itself; instead, the strength of the competitor is decisive. For subject referents, the optimal model contained object backward center status and object DTS (see Table 17). For object referents, the picture is slightly more complicated. Three models were equally adequate in statistical terms: all models contained subject DTS, but the second variable was either expression type, backward recency or backward center status for the subject referent. We report the last model here, because it fits best with other findings (see Table 18).

SUBJECT REFERENT, RELATIVE CONTINUATION

                                   Constant model                  Optimal model
Independent variables              B (S.E.)      Wald    p-value   B (S.E.)      Wald    p-value
Constant                           -.60          16.33   .000       .50 (.25)     3.88   .049
Object backward center status                                      -.77 (.35)     4.84   .028
Object referent DTS (0-5)                                          -.43 (.12)    13.89   .000
Model log-likelihood
(df for the model)                 260.20 (1)                      229.72 (2)            .000
Log-likelihood improvement
(df for the comparison)                                            30.48 (1)
Nagelkerke R square                                                .194

Table 17. Logistic regression analysis of factors predicting whether subject referents will be referred to again earlier than object referents

OBJECT REFERENT, RELATIVE CONTINUATION

                                   Constant model                  Optimal model
Independent variables              B (S.E.)      Wald    p-value   B (S.E.)      Wald    p-value
Constant                           -.55          14.21   .000       .62 (.27)     5.47   .019
Subject backward center status                                     -.98 (.34)     8.33   .004
Subject referent DTS (0-5)                                         -.35 (.11)     9.28   .002
Model log-likelihood
(df for the model)                 262.50 (1)                      231.37 (3)            .000
Log-likelihood improvement
(df for the comparison)                                            31.13 (2)
Nagelkerke R square                                                .197

Table 18. Logistic regression analysis of factors predicting whether object referents will be referred to again earlier than subject referents

Table 19 summarizes the four analyses presented so far, showing that the optimal models invariably contain DTS as a global factor and a factor reflecting the local context as well. Backward center status is the most reliable local predictor, but it may be replaced by expression type or recency in some cases:

                             The optimal model contains one of the          And a single
                             local contextual factors:                      global factor:
Continuation variable        Expression type   Recency   Backward center    Discourse topic
                                                         status             score
SR reappears                       +                          +                  +
OR reappears                                                  +                  +
SR the first to reappear                                      +                  +
OR the first to reappear           +              +           +                  +

Table 19. Summary of logistic regression analyses in Study 2.

The most consistent predictor emerging from all this is DTS. Tables 20 and 21 illustrate how the probabilities for continuation vary for different DTS values. The subject referent DTS increases the chance of subject referent continuation and decreases the chance that the object referent will reappear first (Table 20); object referent DTS, conversely, increases the chance of object referent continuation and decreases the chance that the subject referent will reappear first (Table 21).

DTS subject referent   Subject referent       Object referent the first
                       continued at all?      to appear next?
0 (n=41)                    .49                     .63
1 (n=45)                    .78                     .42
2 (n=32)                    .88                     .40
3 (n=32)                   1.00                     .13
4 (n=39)                    .95                     .26
5 (n=11)                   1.00                     .09

Table 20. How subject referent DTS affects the continuation probabilities

DTS object referent    Object referent        Subject referent the first
                       continued at all?      to appear next?
0 (n=48)                    .52                     .63
1 (n=33)                    .70                     .45
2 (n=40)                    .85                     .20
3 (n=33)                    .73                     .33
4 (n=36)                    .89                     .19
5 (n=10)                   1.00                     .00

Table 21. How object referent DTS affects the continuation probabilities

Let us finally illustrate the DTS effect by comparing the subject referent and object referent DTS for different kinds of continuations:

                                          SR DTS        OR DTS        Significance
Subject and object referent reappear
in the same utterance                     2.44 (1.57)   2.29 (1.36)   n.s.
Only the subject referent reappears
in next utterance                         2.52 (1.49)   1.30 (1.40)   t = 4.32, df = 70, p = .000
Only the object referent reappears
in next utterance                         1.40 (1.44)   2.58 (1.61)   t = 4.03, df = 72, p = .000

Table 22. DTS configurations for three kinds of continuations

The most remarkable finding in Table 22 is that combined continuations occur when both referents are equally highly activated. This may also help us to understand why the DTS of a referent predicts the failure of the competitor in terms of relative continuation: when referent A has a relatively high DTS, this does not mean it will be the first or only one to reappear. It may also reappear together with referent B in the next relevant utterance. But given a strong A referent, the chances that referent B will be the first or only one to appear in the next utterance are quite low.

In sum, whether or not a referent will reappear is not determined by its grammatical role, but instead by contextual factors: both its entrenchment in the local context (indicated alternatively by its expression type, CB status or recency) and its status as discourse topic (indicated by its DTS) are relevant. Whether it will be the first or the only one of the competitors to reappear is primarily determined by the contextual prominence of the competitor. There was no evidence for a preference for Continue transitions: a referent's own CB status did not predict that it would be the first or only one to reappear in subsequent discourse.

Another finding of interest is that the determinants of prominence are virtually identical for subject and object referents.
In the first study, non-subject referents were found to be more sensitive to contextual constraints than subject referents, but in the second study there was no such interaction.

4.4 Results 2: In what form do subject and object referents reappear?

So far, this study found no evidence for a grammatical role effect on referential continuation. But perhaps such an effect should be sought not in the reoccurrence of a referent, but rather in the form of the next reference. We will concentrate on the expression type used on subsequent mention. Using various methods, Centering theorists (Gordon et al. 1993, Brennan 1995, Kennison & Gordon 1997) have shown that subject referents (which are usually more prominent in the discourse representation built so far) are preferably continued by pronouns, while this is not the case for object referents. When testing this in our data, we need to control for the expression type used in the central utterance and for the distance between first and second mention, since both factors may affect the expression type of the second reference. Moreover, we need to realize that subject referents may reappear as zeroes in the next utterance (ellipsis), while object referents cannot.

First, we compared subject and object pronoun referents on the expression type in the subsequent reference, only taking into account references in the immediately following utterance:

ET in subsequent utterance   Subject pronoun   Subject pronoun,     Object pronoun
                                               excluding zeroes
Zero                         16 (21%)
Pronoun                      44 (59%)          44 (76%)             46 (59%)
Name or NP                   15 (20%)          15 (24%)             32 (41%)
Total                        75                59                   78

Table 23. Form of continuations for subject and object pronouns

When zero continuations are ignored, the difference in Table 23 is only marginally significant (Chi2 = 3.63, df = 1, p = .057); when they are counted as pronouns, there is no doubt that subject referent continuations are more often reduced in form than object referent continuations are (Chi2 = 7.94, df = 1, p = .005).
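The two test statistics just reported can be recomputed from the counts in Table 23. The sketch below assumes a Pearson chi-square test on a 2 x 2 table without continuity correction; the text does not state which variant was used, but this one reproduces the reported values. The function name is hypothetical.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for the
    2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: subject pronouns, object pronouns.
# Columns: continued as pronoun, continued as name or NP.
excluding_zeroes = pearson_chi2_2x2(44, 15, 46, 32)

# Same table, with the 16 zero continuations of subject referents
# counted as pronouns.
including_zeroes = pearson_chi2_2x2(44 + 16, 15, 46, 32)

print(f"{excluding_zeroes:.2f}")  # 3.63
print(f"{including_zeroes:.2f}")  # 7.94
```

The same function applied to the nominal referents in Table 24 and to the antecedent counts in Table 26 yields the values reported there as well.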
Next we turn to the continuation forms of referents of names and NPs in both grammatical roles. Here, it makes no difference whether ellipsis is included (there is no ellipsis following nominal referents). Table 24 shows that subsequent references to nominal subject referents are more often pronominal than subsequent references to nominal object referents (Chi2 = 3.88, df = 1, p = .049).

ET in utterance 1    Subject name or NP   Object name or NP
Pronoun              16 (67%)             11 (39%)
Name or NP            8 (33%)             17 (61%)
Total                24                   28

Table 24. Form of continuations for subject and object names or NPs

In order to understand what may be going on here, it is useful to distinguish between two kinds of cases:
• cases in which both subject and object referent reappear in the next utterance
• cases in which only the subject or object referent reappears in the next utterance

In combined continuations, we found no differences between the forms in which subject and object referents are resumed. Both kinds of referent tended to be pronominalized, sometimes even in the form of joint references to both participants (plural ze, ‘they’). Recall that in these cases, both referents tend to be well-established in prior discourse (Table 22).

We also compared subject-only and object-only continuations in the next utterance. Note that for subject continuations, the resumed referent was the new subject in 92% of the cases; but resumed object referents were also predominantly in subject position in the next utterance (84%). The forms of continuations are given in Table 25. Subjects are more often continued as pronouns, both when we ignore zero realizations (Chi2 = 4.21, df = 1, p = .040) and when we include them as pronouns (Chi2 = 8.61, df = 1, p = .003).

ET in utterance 1    Subject-only   Subject-only,       Object-only   Total
                                    excluding zeroes
Zero                 10 (20%)                                          10
Pronoun              26 (52%)       26 (65%)            25 (43%)       51
Name or NP           14 (28%)       14 (35%)            32 (57%)       46
Total                50             40                  57            107

Table 25. Form of continuations for subject and object referents in subject-only and object-only continuations, excluding subject ellipsis

We also analyzed how other factors co-determine the form of repeated reference to subject and object referents in the next utterance, but neither expression type, nor backward center status, nor discourse topic score had any effect.

How do we explain the tendency that object referents are less often resumed as pronouns? Compare the following cases of nominal resumptions of object referents:

(26)
D Het verhaal over de oude Gepetto die een pop snijdt uit een levend stuk hout behoort nog altijd tot de meest vertaalde boeken ter wereld. Hij maakt zich een zoon, maar het joch is opstandig en wil van zijn vader af.
E The story of the old Gepetto, who cuts a puppet out of a living piece of wood, is still one of the most often translated books in world literature. He makes himself a son, but the boy is rebellious and wants to get rid of his father.

(27)
D Uiteindelijk zwichtte Efron, en (Ø) bood Abraham het veld en de spelonk voor de exorbitante prijs van 400 sikkelen zilver. Zonder af te dingen, betaalde Abraham hem deze prijs voor het stuk grond.
E Eventually, Ephron gave in, and (Ø) offered Abraham the field and the cavern for the exorbitant price of 400 shekels of silver. Without bargaining, Abraham paid him this price for the piece of land.

(28) (Chairman Zoff of football club Lazio has fired the coach, Zeman.)
D Zoff (54) was lange tijd doelman van Juventus en keeper van het nationale team. De preses verdedigde het doel in 1982 toen Italië de wereldtitel pakte. Hij was aanvoerder van dat team. Zoff, die ook Juventus trainde, coachte Lazio al eens, tussen 1990 en 1994. Zeman volgde hem toen op, Zoff werd voorzitter.
E Zoff (54) has been the goalkeeper of Juventus and of the national team for a long time. The chairman defended the goal in 1982 when Italy won the world championship. He was the captain of that team.
Zoff, who also trained Juventus, has already been the coach of Lazio once, between 1990 and 1994. Zeman then succeeded him, Zoff became chairman.

In these cases it would probably be confusing to use subject pronouns in the continuation utterance, since they could easily be taken as referring to the previous subject referent. This applies not only to the pronominal or elided subjects in (26) and (27), but also to the nominal subject in (28). This last example is also noteworthy because of the prominent status of the object referent (Zoff) in the preceding sentences.

We might ask whether this tendency is not simply a parallelism phenomenon, in the sense that subject pronouns preferably refer to subject antecedents. In that case, object pronouns would behave differently. In the continuation utterances, there are not enough object pronouns to test this assumption. But we may use the central utterances in this corpus for this purpose, since half of them contain object pronouns. We looked at all the object referents in central utterances that were also referred to in the immediately preceding utterance. Many of them were realized as pronouns in the central utterance, but some were names or NPs. It turns out that the subject referents of the previous utterances, when returning as object referents in the central utterance, are somewhat more often pronominalized than object referents of the previous utterances (see Table 26, Chi2 = 3.96, df = 1, p = .047). This generalizes the finding that subject referents are more easily pronominalized in the next utterance.

ET of object referent     Subject antecedent   Object antecedent   Total
in central utterance      in utterance -1      in utterance -1
Pronoun                   64 (90%)             32 (78%)             96
Name or NP                 6 (10%)              9 (22%)             15
Total                     70                   41                  111

Table 26. Form of continuations for subject and object referents from previous utterances returning as object referents in the central utterance

In sum, the subject-only and object-only continuations support the Centering Theory claim that subjects are more easily continued as pronouns, even when zero form continuations are not counted as pronouns. For the combined continuations, however, no reliable differences were found. This suggests that there may be a situation of high activation for two referents (see also Table 22 above), which is at odds with the centering assumption that there is only one backward center.

5 Conclusions and discussion

Our findings are summarized in Table 27.

                           STUDY 1                  STUDY 2
Predictors                 ‘Being the referent of   Absolute       Relative       Form of
                           an ambiguous pronoun’    continuation   continuation   continuation
Grammatical role                    +                    -              -              +
Expression type                     -                    -              -              -
Backward center status              +                    +              +              -
Discourse topichood                 +                    +              +              -

Table 27. Summary of findings in the two studies. +/- = has / has no independent effect on the continuation variable in question

In our first study, we found that being realized as a subject, being the backward center and being a discourse topic increase the chance that a referent is the intended antecedent of an ambiguous personal pronoun. However, the dependent variable in the first study was complex, in that it combined several aspects of referential continuation. In the second study, we have been able to examine three of these aspects separately: absolute continuation (is a referent taken up again in subsequent discourse at all), relative continuation (is it the first or only one of the competing referents to be resumed) and form of continuation (is it pronominalized on subsequent mention).

The three variables that were significant in the first study were shown to operate on different continuation aspects in the second study. Grammatical role only affects the form of continuation.
Backward center status increases the chances for a referent of being referred to again (absolute continuation), while it also decreases the chances for the competing referent of being the first or only one to be referred to again (relative continuation). Discourse topichood has the same double effect. Of course, this adverse effect on the competitor’s chances is only relevant in situations of referential competition.

Expression type turns out to be irrelevant to continuation in the first study, but in the second study it could replace backward center status in the optimal model in two out of four cases. We conclude that expression type is best seen as a symptom of backward center status or discourse topichood, because it has no independent explanatory force.

This brings us to the three main theoretical implications of our findings. The first thing to note is that studies of forward prominence of discourse referents need to examine both local and contextual factors. Grammatical role is clearly local, backward center status applies to the immediate context, and discourse topichood pertains to the global context.

The second point to make is that when discussing forward prominence, we need to distinguish between the form and the content of continuation. This distinction is missing from the current conceptualization of a forward center as embodying a prediction regarding the backward center of the following utterance. For instance, the grammatical role can be said to mark the preferred forward center only in the sense of signaling which referent can most easily be taken up in pronominal form, or even in zero form. It cannot be said to indicate which referent is likely to reappear in the text.
This brings us to the third point, which incorporates the first two: there seems to be a division of labor between local and contextual determinants, in that the local feature of grammatical role primarily affects the form of continuation, while the contextual features backward center status and discourse topichood help predict the content of continuation.

Given that a stretch of discourse often contains continuous reference to the same entity, it makes sense that the linguistic structure of utterances enables certain referents to be taken up efficiently in the next utterance, i.e. by zero anaphora and unstressed pronouns. These devices have been dubbed ‘minimal-gap devices’ by Givón (1992: 21), meaning that they (almost) only apply to close-by referents. We may add now that these devices primarily apply to subject referents. This is evidently the case for zero anaphora in most languages; pronouns are not entirely restricted to subject antecedents, but subject referents are certainly more often pronominalized than referents in other syntactic positions. It seems that subject status enables a referent to be treated as highly activated, at least in the short term of the upcoming utterance.

But all this concerns the form of continuation. Whether the potential topic participants actually reappear is a different matter. Not all subject participants are highly likely to remain in the center of attention. In our data, this probability was related to contextual factors, especially to whether the participant had already been referred to repeatedly in prior discourse.

Our findings in the second study contradict those of Givón (1992: 21) and Arnold (2001), who report that subject referents tend to be more ‘persistent’ than other referents. The difference seems to be due to the different corpora used.
It is very likely that in ‘ordinary’ discourse with only one topical participant, subject status coincides with discourse topic status; that is, local and global determinants do not vary independently. But in this paper we separated these factors by focusing on situations of referential competition, in which object referents matched subject referents in terms of contextual prominence. While in these environments subject status still constrains the form of continuation, it loses its potential for marking who is the most probable protagonist of the next utterance. Put differently, when two (more or less) prominent participants are around, one of them may appear in object position without thereby implying that it will be less persistent in the following discourse. It might well be that only object participants are such ‘tough’ competitors, not referents in other syntactic positions such as oblique constituents; our first study suggests that referents in prepositional phrases are generally discarded as possible referents for ambiguous pronouns.

In section 2 we noted that Centering Theory offers several predictions regarding the forward prominence of ‘older’ non-subject referents. The standard framework does not decide between the subject preference and the backward center preference, and Kameyama (1998) explicitly predicts the possibility of an indeterminacy resulting from the conflict between the grammatical role and expression type hierarchies. Our study does not offer support for expression type effects on forward prominence. With regard to subject and backward center status, it suggests that both factors are relevant, but have a different function: while subject status constrains the form of repeated references, backward center status may help predict which participant will still be referred to at all.
Besides backward center status, our studies provide strong support for the more global contextual factor of discourse topichood, which is absent from the centering framework, although its relevance has been known from psycholinguistic work (e.g. Garrod et al. 1994). Our first study showed that it contributed independently to the prediction of the intended pronoun antecedent; moreover, non-subject antecedents tended to be especially strong discourse topics. The second study indicated how the discourse topic status of a referent increases the probability of it being referred to again and decreases the chances for the competitors to be the first or only one to reappear. In this study, we did not find an interaction between grammatical role and DTS: whereas the non-subject antecedents in Study 1 needed the help of a considerable DTS, this was not the case for the object referents in Study 2. Discourse topichood seems an autonomous factor here.

The motivation behind both contextual factors seems to be the presumption of topic continuity, which may override the local prominence provided by subject status. Discourse clearly needs more global predictors for prominence assignment, given that it is impossible to keep the main protagonist in subject position all the time. This is especially true when other prominent participants are around: in the second study, 64% of the object referents that were mentioned in the previous utterance filled the subject position there.

A last theoretical comment concerns the Centering constraint that utterances contain only one backward center, or put more generally, the assumption that discourse tends to have a single topical participant. Our second study has shown that two participants may be jointly salient as well. In these circumstances, they tend to reappear together in the same utterance, often in the same expression type. It seems that the CB-uniqueness constraint needs to be relaxed (see Gundel 1998 and Poesio et al.
2004 for further discussion of this issue).
In conclusion, we would hesitate to say that our statistical predictors ‘mark’ forward referential prominence in any straightforward sense. The performance of the logistic regression models in the second study was generally quite modest, compared to the model proposed in the first study. There is a considerable amount of unexplained variance here, which is more than just indeterminacy produced by conflicting signals. Nevertheless, the corpora used in both our studies clearly show recognizable referential patterns, and it seems safe to hypothesize that both local and global linguistic factors affect language users’ expectations about which discourse referent will reappear and in what form.
References
Ariel, Mira (1990). Accessing Noun Phrase Antecedents. London: Routledge.
Ariel, Mira (2001). Accessibility theory: An overview. In: Ted Sanders, Joost Schilperoord & Wilbert Spooren (eds.), Text representation: Linguistic and Psycholinguistic aspects, Amsterdam etc.: Benjamins, 29-88.
Arnold, Jennifer E. (1998). Reference form and discourse pattern. Stanford, CA: Stanford University Dissertation.
Arnold, Jennifer E. (2001). The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes 31 (2), 137-162.
Au, Terry Kit-Fong (1986). A verb is worth a thousand words: the causes and consequences of interpersonal events implicit in language. Journal of Memory and Language 25 (1), 104-122.
Brennan, Susan E. (1995). Centering attention in discourse. Language and Cognitive Processes 10 (2), 137-167.
Brennan, Susan E., Marilyn W. Friedman & Charles J. Pollard (1987). A centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford, CA, 155-162.
Chafe, Wallace L. (1994). Discourse, consciousness, and time. The flow and displacement of conscious experience in speaking and writing. Chicago: Chicago University Press.
Chambers, Craig G. & Ron Smyth (1998). Structural parallelism and discourse coherence: a test of Centering Theory. Journal of Memory and Language 39 (4), 593-608.
Cornish, Francis (1999). Anaphora, discourse and understanding. Evidence from English and French. New York: Oxford University Press.
Crawley, Rosalind A., Rosemary Stevenson & David Kleinman (1990). The use of heuristic strategies in the interpretation of pronouns. Journal of Psycholinguistic Research 19 (4), 245-264.
Frederiksen, John (1981). Understanding anaphor: rules used by readers in assigning pronominal referents. Discourse Processes 4 (4), 323-347.
Garnham, Alan & Oakhill, Jane (1992, eds.). Discourse representation and text processing. A special issue of Language and Cognitive Processes. Hove, UK: Lawrence Erlbaum Associates.
Garrod, Simon C., Freudenthal, Daniel & Boyle, Elizabeth (1994). The role of different types of anaphor in the on-line resolution of sentences in a discourse. Journal of Memory and Language 33 (1), 39-68.
Garrod, Simon C. & Sanford, Anthony J. (1994). Resolving sentences in a discourse context: How discourse representation affects language understanding. In: Gernsbacher, M.A. (Ed.). Handbook of psycholinguistics (pp. 675-698). San Diego etc.: Academic Press.
Gernsbacher, Morton A. & Hargreaves, David J. (1988). Accessing sentence participants: the advantage of first mention. Journal of Memory and Language 27, 699-717.
Gernsbacher, Morton A., Hargreaves, David J. & Beeman, Mark (1989). Building and accessing clausal representations: the advantage of first mention versus the advantage of clause recency. Journal of Memory and Language 28 (6), 735-755.
Gernsbacher, Morton Ann, & Givón, Talmy (1995, eds.). Coherence in spontaneous text. Amsterdam etc.: John Benjamins.
Givón, Talmy (1992). The grammar of referential coherence as mental processing instructions. Linguistics 30 (1), 5-55.
Givón, Talmy (1995). Coherence in text vs. coherence in mind.
In: Coherence in spontaneous text, Morton A. Gernsbacher & Talmy Givón (eds.), (pp. 59-115). Amsterdam etc.: Benjamins. Typological Studies in Language, 31.
Gordon, Peter C., Grosz, Barbara J. & Gilliom, Laura A. (1993). Pronouns, names and the centering of attention in discourse. Cognitive Science 17, 311-347.
Grosz, Barbara J., Joshi, Aravind K. and Weinstein, Scott (1995). Centering: a framework for modeling the local coherence of discourse. Computational Linguistics 21 (2), 203-225.
Graesser, Art C., Millis, Keith K., & Zwaan, Rolf A. (1997). Discourse comprehension. In Janet T. Spence, John M. Darley, and Donald J. Foss (eds.), Annual Review of Psychology 48 (pp. 163-189). Palo Alto, CA: Annual Reviews Inc.
Gundel, Jeanette, Nancy Hedberg & Ron Zacharski (1993). Cognitive status and the form of referring expressions in discourse. Language 69 (2), 274-307.
Gundel, Jeanette (1998). Centering Theory and the Givenness Hierarchy: towards a synthesis. In: Centering Theory in Discourse, Marilyn A. Walker, Aravind K. Joshi & Ellen F. Prince (eds.), 183-198. Oxford: Clarendon Press.
Hobbs, Jerry R. (1979). Coherence and coreference. Cognitive Science 3, 67-90.
Hoover, Michael L. (1997). Effect of textual and cohesive structure on discourse processing. Discourse Processes 23, 193-220.
Hudson-D’Zmura, Susan & Michael K. Tanenhaus (1998). Assigning antecedents to ambiguous pronouns: the role of the center of attention as the default assignment. In: Centering Theory in Discourse, Marilyn A. Walker, Aravind K. Joshi & Ellen F. Prince (eds.), 199-228. Oxford: Clarendon Press.
Järvikivi, Juhani, Roger P.G. van Gompel, Jukka Hyönä & Raymond Bertram (2005). Ambiguous pronoun resolution. Contrasting the first-mention and subject-preference accounts. Psychological Science 16 (4), 260-264.
Kameyama, Megumi (1996). Indefeasible semantics and defeasible pragmatics. In Quantifiers, deduction and context, Makoto Kanazawa, Christopher Pinón & Henriette de Swart (eds.), 110-138.
CSLI Publications.
Kameyama, Megumi (1998). ‘Intrasentential Centering: A Case Study’. In: Centering Theory in Discourse, Marilyn A. Walker, Aravind K. Joshi & Ellen F. Prince (eds.), 89-112. Oxford: Clarendon Press.
Kehler, Andrew (1997). Current theories of centering for pronoun interpretation: a critical evaluation. Computational Linguistics 23 (3), 467-475.
Kehler, Andrew (2002). Coherence, reference and the theory of grammar. Chicago etc.: Chicago University Press.
Kennison, Shelia M. & Gordon, Peter C. (1997). Comprehending referential expressions during reading: evidence from eyetracking. Discourse Processes 24 (3), 229-252.
Matthiessen, Christian M.I.M. & Thompson, Sandra A. (1987). The structure of discourse and “subordination”. In: Clause combining in discourse and grammar, John R. Haiman & Sandra A. Thompson (eds.), 275-330. Amsterdam: John Benjamins.
Poesio, Massimo, Di Eugenio, Barbara, Stevenson, Rosemary & Hitzeman, Janet (2004). Centering: a parametric theory and its instantiations. Computational Linguistics 30 (3), 309-363.
Sanders, Ted J. M., Spooren, Wilbert P. M., & Noordman, Leo G. M. (1992). Toward a taxonomy of coherence relations. Discourse Processes 15 (1), 1-35.
Sanders, Ted & Wilbert Spooren (2001). Text representation as an interface between language and its users. Text representation: Linguistic and Psycholinguistic aspects, Ted Sanders, Joost Schilperoord & Wilbert Spooren (eds.), 1-25. Amsterdam: Benjamins.
Sanders, Ted & Wilbert Spooren (in press). Discourse and text structure. In Handbook of Cognitive Linguistics, Dirk Geeraerts & Hubert Cuyckens (eds.). Oxford: Oxford University Press.
Sanford, Anthony J. & Garrod, Simon C. (1994). Selective processes in text understanding. In Handbook of Psycholinguistics, Morton A. Gernsbacher (ed.), 699-720. San Diego etc.: Academic Press.
Suri, Linda Z. & McCoy, Kathleen F. (1994). RAFT/RAPR and Centering: a comparison and discussion of problems related to processing complex sentences.
Computational Linguistics 20 (2), 301-317.
Tetreault, Joel R. (2001). A corpus-based evaluation of centering and pronoun resolution. Computational Linguistics 27 (4), 507-520.
Tomlin, Russell S. (1985). Foreground and background information and the syntax of subordination. Text 5, 85-122.
Tomlin, Russell S. (1997). Mapping conceptual representation into linguistic representations: the role of attention in grammar. In Language and Conceptualization, Jan Nuyts & Eric Pederson (eds.), 162-189. Cambridge: Cambridge University Press.
Walker, Marilyn A., Aravind K. Joshi & Ellen F. Prince (1998, eds.). Centering Theory in Discourse, Oxford: Clarendon Press.
Wolf, Florian & Gibson, Edward (2004). Discourse coherence and pronoun resolution. Language and Cognitive Processes 19, 665-675.
Notes
1 Perhaps the subject preference only holds for ambiguous pronouns. Hudson-D’Zmura & Tanenhaus (1998, experiment 2) did not find differences in reading times between pronouns having subject antecedents and object antecedents when gender information could be used when processing pronouns.
2 Note however that when two participants were referred to earlier by a single expression (e.g. a plural pronoun), singular references to these participants were counted as repeated reference to the same participant.
3 There are no principled reasons for this choice of gender. We used the same procedure for fragments with sentences containing the ambiguous female pronoun she (zij), but this did not give us enough cases in the corpora.