Cognitive Brain Research 19 (2004) 59–73
www.elsevier.com/locate/cogbrainres

Research report

Seeing words in context: the interaction of lexical and sentence level information during reading

John C.J. Hoeks *, Laurie A. Stowe, Gina Doedens

BCN NeuroImaging Centre, Behavioral and Cognitive Neurosciences, University of Groningen, P.O. Box 716, 9700 AS Groningen, The Netherlands

Accepted 29 October 2003

Abstract

The ERP experiment reported here addresses some outstanding questions regarding word processing in sentential contexts: (1) Does only the ‘message-level’ representation (the representation of sentence meaning combining lexico-semantic and syntactic constraints) affect the processing of the incoming word [J. Exp. Psychol.: Learn. Mem. Cogn. 20 (1994) 92]? (2) Is lexically specified semantic relatedness between multiple words the primary factor instead [J. Exp. Psychol.: Learn. Mem. Cogn. 15 (1989) 791]? (3) Alternatively, do word and sentence level information interact during sentence comprehension? Volunteers read sentences (e.g., Dutch sentences resembling The javelin was by the athletes...) in which the (passive) syntactic structure and the semantic content of the lexical items together created a strong expectation of a specific final word (e.g., thrown), but also sentences in which the syntactic structure was changed from passive to active (e.g., Dutch sentences resembling The javelin has the athletes...), which altered the message level constraint substantially and strongly reduced the expectation of any particular completion. Half of the sentences ended in a final word with a good lexico-semantic fit relative to the preceding content words (e.g., thrown, fitting well with the preceding javelin and athletes). This creates very plausible sentences in the strong constraint context but semantically anomalous ones in the weakly constraining context (e.g., The javelin has the athletes thrown).
In the other half the final word had a poor lexico-semantic fit (e.g., summarized, which does not fit at all with javelin and athletes). Good lexico-semantic fit endings showed no difference in N400 amplitude in the strong and weak message-level constraint sentences, despite the fact that the latter were semantically anomalous. This result suggests that lexico-semantic fit can be more important for word processing than the meaning of the sentence as determined by the syntactic structure, at least initially. These conditions did differ, however, in the region of the P600 where the anomalous weak constraint version was much more positive, a pattern usually seen with ungrammatical sentences. The processing of poor lexico-semantic fit words showed a quite different pattern; in both strong and weak constraint sentences they elicited a substantial N400 effect, but N400-amplitude was significantly more negative following strong constraint contexts, even though both sentence contexts were equivalently anomalous. Taken together, these findings provide evidence for the importance of both message-level and lexico-semantic information during sentence comprehension. The implications for theories of sentence interpretation are discussed and an extension of the message-based hypothesis will be proposed.
© 2003 Elsevier B.V. All rights reserved.

Theme: Neural Basis of Behavior
Topic: Cognition
Keywords: Sentence comprehension; Syntax; Semantics; Message-level constraint; ERPs; N400; P600

* Corresponding author. Tel.: +31-50-363-7443; fax: +31-50-3636855. E-mail address: [email protected] (J.C.J. Hoeks).
0926-6410/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.cogbrainres.2003.10.022

1. Introduction

It is well established that the processing of a word is influenced by its preceding sentential context. Both lexico-semantic and syntactic aspects of the context appear to be crucially involved, but the mechanism by which context affects the processing of upcoming words is still not well understood, despite numerous behavioral and electrophysiological studies.

Quite a number of studies using the lexical decision paradigm (starting with Meyer and Schvaneveldt [26]) have shown that deciding that a given letter string is an existing word may be speeded up by the presence of a semantically related word, an effect which will be referred to here as ‘simple’ lexical priming. However, the claim that simple lexical priming plays a role in sentence context has been challenged. In an influential study using eye tracking, Duffy et al. [6] showed that this simple lexical priming between a pair of words does not appear to occur when the words are embedded in a sentence. They found a facilitatory effect for the target word mustache in sentence contexts containing two words related to the target (e.g., The BARBER TRIMMED the MUSTACHE), but failed to find such an effect for contexts containing only one of these words (e.g., The woman TRIMMED the MUSTACHE or The BARBER saw the MUSTACHE). On the basis of this result, Duffy et al. concluded that at least two mutually supportive content words are necessary to evoke activation levels high enough to cause priming effects during sentence processing (see also Ref. [34]). This view will be called the ‘combinatory’ account of lexical priming in sentence processing, because it assumes that priming effects depend on a specific combination of words, which is more than just a sum of the priming effects for each word separately.

Duffy et al. not only looked into lexico-semantic aspects of priming, but also investigated the possible role of the syntactic structure of the preceding context.
They did this by comparing sentences that were almost identical with respect to content words but differed in syntactic structure, such as: While talking to the BARBER, she TRIMMED the MUSTACHE versus While she talked to him, the BARBER TRIMMED the MUSTACHE. Somewhat surprisingly, they found that both sentence contexts showed an equal amount of facilitation for mustache (as compared to a neutral baseline), even though their syntactic structure was rather different, resulting in a different message. Duffy et al. concluded that syntactic structure of the preceding context was irrelevant for the processing of upcoming words, as long as it was intact. In their view, the processing of an upcoming word will only be facilitated when two or more words in the context bear a lexico-semantic relation with that word. Morris [27], however, arrived at a very different conclusion in a follow-up study, also using eye tracking. When looking at lexico-semantic facilitation effects in sentences such as The gardener talked as the BARBER TRIMMED the MUSTACHE, she found the expected reduction in reading times on mustache. However, in a sentence such as The gardener talked to the BARBER and TRIMMED the MUSTACHE, there was no facilitation at all for the target word. Morris argued that the representation at the so-called message-level (i.e., representing the ‘message’ that the sentence is meant to convey) changed so much as a result of changing the syntactic structure that no facilitation for mustache was obtained. In other words, the processing of a word that is congruent with the message-level representation of the preceding context will be facilitated, whereas an incongruent word will not. Does this result definitively demonstrate that every instance of word processing in context must be explained by the ‘message-level hypothesis’? Morris may have failed to find an effect of lexico-semantic relatedness partly because of the use of reading time as the dependent measure. 
Recall that the crucial sentence describes a gardener trimming a mustache, which is rather implausible, especially given that some other object, e.g., a hedge, might have been expected instead of a mustache. So a possible decrease in reading time due to lexico-semantic facilitation might have been canceled out by an increase in reading time due to implausibility. Additionally, Morris pointed out that there was evidence for simple lexical priming in the very same experiment: reading times on the verb trimmed were significantly faster when preceded by the related word barber than by an unrelated word, and this was true even for the sentence where the gardener was doing the trimming.

To sum up, at least three mechanisms by which context can affect the processing of upcoming words have been proposed in the literature: via simple lexical priming, via a combinatory lexical process or via the message-level, all of which have received only mixed support from behavioral studies. These same possibilities, with the exception of the combinatory lexical approach, have been under investigation in electrophysiological studies using ERPs (Event Related brain Potentials), with a focus on one particular component of the ERP waveform, namely the N400, which is a negative going deflection that peaks around 400 ms after stimulus onset. The results of Kutas and Hillyard [23], for instance, demonstrate the impact of the message-level representation on the processing of upcoming words, showing that the amplitude of the N400 is highly inversely correlated with the so-called ‘cloze probability’ or ‘expectedness’ of a word after a given context. Cloze probability is determined by asking volunteers to complete a sentence fragment and then calculating the proportion of volunteers who produce the same word.
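The cloze computation just described can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the scoring procedure of any of the cited studies: the function name `cloze_probabilities` and the ten responses are hypothetical, and normalising case and whitespace is an assumption about how completions would be pooled.

```python
from collections import Counter


def cloze_probabilities(completions):
    """Return, for each completion, the proportion of respondents who produced it."""
    # Assumption: completions differing only in case or surrounding
    # whitespace count as the same word.
    cleaned = [c.strip().lower() for c in completions if c.strip()]
    if not cleaned:
        return {}
    counts = Counter(cleaned)
    total = len(cleaned)
    return {word: n / total for word, n in counts.items()}


# Hypothetical responses from ten volunteers to the fragment
# "The javelin was by the athletes ..." (invented for illustration only).
responses = ["thrown"] * 8 + ["carried", "dropped"]
probs = cloze_probabilities(responses)

# The word with the highest cloze probability is the "best completion".
best_word = max(probs, key=probs.get)
print(best_word, probs[best_word])
```

A fragment is strongly constraining when a single completion has a high proportion, and weakly constraining when the responses are spread over many different words.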
Kutas and Hillyard showed that the stronger the message-level constraint, that is, the stronger the expectation for a specific final word, the smaller the amplitude of the N400 if that word is actually presented. Similarly, words that make the sentence implausible or semantically anomalous give rise to a larger N400; the poorer the fit with the semantic representation of the context, the larger the N400. Though their experiment was not set up to test this, the results of Kutas and Hillyard provide evidence for ‘simple’ lexical priming as well, since words that did not fit very well with the context but that were closely related to the best completion also showed attenuated N400s (see also Refs. [7,24]). Consider for instance “He liked lemon and sugar in his coffee”, where tea is the most expected word, but coffee is a close semantic relative. This has been called the ‘related anomaly’ effect [21]. Thus, the N400 was shown to be sensitive to lexico-semantic relatedness as well as to the message-level representation of the sentence, which incorporates both the aspect of constraint imposed by the sentential context (e.g., effects of cloze probability) and the aspect of the meaning of the sentence as a whole (i.e., effects of semantic anomaly).

Results from an important study by Van Petten [32] also suggest that the processing of upcoming words may be affected via both the message-level and the lexico-semantic route. Van Petten compared the N400 for words appearing in sentences that did or did not have a coherent message-level representation. She found that the second of a pair of semantically related words consistently showed a smaller N400 than the first word, both in coherent sentences (e.g., “When the MOON is full it is hard to see the STARS...”), and in anomalous sentences, where no coherent message-level representation can be formed (e.g., “When the MOON is rusted it is available to buy many STARS...”). Because the N400 difference between first and second words was significantly larger in the coherent condition, she concluded that, in congruent sentences, both lexico-semantic association and the presence of a coherent message-level representation had a beneficial effect on the processing of the second word.

However, this conclusion may not be entirely warranted if we also take the data from the unrelated words into account (e.g., “The biologist went to the desert every WEEK to collect a particular SPECIES...” vs. “The shirt went to the gun every WEEK to keep a good SPECIES...”). Note that the following discussion is largely based on a later paper by Van Petten [33, p. 521, Fig. 10] which summarizes the findings for related and unrelated word pairs in coherent and anomalous sentences. If we visually compare the (closely matched) second words of related and unrelated pairs (e.g., STARS vs. SPECIES, respectively) in the anomalous conditions, there is no difference in N400 amplitude, which is rather unexpected if lexical priming independently affects the processing of the second word of the related pair. Likewise, when directly comparing the patterns of results for related and unrelated pairs in coherent sentences it does not appear that the effect of coherence is different for related and unrelated pairs (though this specific comparison was not reported in the original article). This suggests that, even for related words in coherent sentences, simple lexical priming may not actually have had an effect over and above the effect of sentence coherence. In other words, though Van Petten’s data provide convincing evidence for the effect of coherence at the message level, there seems to be less solid evidence for word-level effects.
Taken together, the results of previous behavioral and electrophysiological studies are still inconclusive as to what roles message-level and lexico-semantic information play in the processing of words in sentence contexts.

1.1. Present experiment

The present experiment was designed to investigate the interaction between message-level information on the one hand and lexico-semantic information on the other during the processing of upcoming words. The dependent measure will be the amplitude of the N400, which has been shown to be sensitive to both message-level constraint and lexico-semantic priming [2,4,21,23,32,33]. Materials were used that differed along two orthogonal dimensions, namely message-level constraint (strong vs. weak) of the sentence context, and lexico-semantic fit (good fit vs. poor fit) of the critical final word with the preceding content words. We will first describe these two factors in some detail and then discuss the experimental hypotheses.

The first factor, message-level constraint, was manipulated by taking passive sentences that had a strong message-level constraint (i.e., average cloze probability for the final word = 77%) and transforming them into active sentences having a much weaker message-level constraint (i.e., average cloze probability for the final word = 21%; see Materials section for the results of cloze tasks on all materials). This is a much stronger version of the type of manipulation used by Duffy et al. [6] and Morris [27]. Consider, for example, sentence contexts 1a and 1b.

(1a) De speer werd door de atleten... (lit. The javelin was by the athletes...)
(1b) De speer heeft de atleten... (lit. The javelin has the athletes...)

As can be seen here, changing the syntactic structure from passive to active has important consequences for the message-level representation of these partial sentences because of the impact syntactic structure has on the assignment of thematic roles.
A thematic role can be described as the role an entity plays in the event described by a sentence (e.g., Refs. [8,12]). For example, the term agent refers to the entity that intentionally performs an action; the entity that undergoes this action is labelled patient or theme. Through the use of the passive construction in fragment 1a it is clear that the grammatical subject the javelin is the theme, and that the athletes play the role of agent. With such a thematic configuration it is very easy to come up with a completion, which in this case should be a verb describing something athletes can do with a javelin, of which the word thrown is produced most frequently in a cloze task (see below). In fragment 1b, however, the active syntactic structure in which the words are embedded rules out that the javelin is the theme and, instead, forces it to play the role of agent (or maybe effector is a more appropriate term here), with the athletes as the patient. In this configuration, then, the thematic roles that are prescribed by the syntactic structure are not the thematic roles that are ‘preferred’ on the basis of the semantic features of the entities concerned. As a result, sentence fragments such as 1b are difficult to complete; the most plausible completions are typically verbs such as amazed or worried which refer to emotional responses evoked by the inanimate entity (see Materials). Thus by changing the syntactic structure, a major change in the message-level representation was obtained both in terms of the meaning of the (partial) sentence and in terms of the degree of constraint that is imposed by the sentence fragment.

The second factor that was varied was the lexico-semantic fit of the final word with the preceding words.
The concept ‘lexico-semantic fit’ (for want of a better term) will refer to the extent to which the final word and the preceding content words of a given sentence fit together semantically. This does not necessarily mean that there is a strong associative or semantic relationship between all of the content words. For instance, in a sentence such as 2a (see Table 1) there is not much lexico-semantic association between athletes and thrown. Rather, words with a good lexico-semantic fit go together very well and easily conjure up a specific situation or event (e.g., javelin/athletes/thrown), in contrast to cases where the final words do not fit with the preceding words (e.g., javelin/athletes/summarized), as in 2c and 2d. Please note that the term ‘lexico-semantic fit’ does not refer to the semantic plausibility of the sentence as a whole. In effect, sentence 2b, for instance, is semantically implausible due to the particular syntactic structure that is used, but its content words have a good lexico-semantic fit.

Using these materials allows us to address the question of whether the aspects of processing an upcoming word (as reflected in the N400) are influenced by the message-level representation of the preceding context, by the lexico-semantic links that exist between the word itself and the words that precede it, or by both. We can distinguish four patterns of results that are expected if one or both factors affect the processing of the final words of our current experiment (for a summary, see Table 2).

Let us start with the simplest pattern, where neither message-level constraint provided by the preceding context nor lexico-semantic fit has a direct effect on the processing of the upcoming word; rather, only the plausibility of the incoming word is assumed to be important for the amplitude of the N400.
This is the view that the primary determinant of the N400 is semantic integration, unaffected by factors such as message level constraints of the sort manipulated here or semantic relationship when they do not affect the plausibility of the sentence. In that case, the smallest N400 amplitude will be found in the highly plausible strong constraint – good fit condition (“The javelin was by the athletes thrown”), and an equally large N400 for the three other conditions, which were rated equally implausible.

Table 1
Examples of the materials used in the experiment

Strong message-level constraint, Good Fit:
   2a. De speer werd door de atleten geworpen
       lit. The javelin was by the athletes thrown
Strong message-level constraint, Poor Fit:
?? 2c. De speer werd door de atleten opgesomd
       lit. The javelin was by the athletes summarized
Weak message-level constraint, Good Fit:
?? 2b. De speer heeft de atleten geworpen
       lit. The javelin has the athletes thrown
Weak message-level constraint, Poor Fit:
?? 2d. De speer heeft de atleten opgesomd
       lit. The javelin has the athletes summarized

?? = grammatically correct but semantically implausible; Good/Poor Fit = Good/Poor lexico-semantic fit.

Table 2
Expected patterns in N400-amplitude (from least negative to most negative) as a function of the presence or absence of effects of message-level constraint and lexico-semantic fit

Hypothesis 1: NO effect of Message-Level Constraint and NO effect of Lexico-Semantic Fit
Prediction: Only the plausibility of the sentence will influence the N400; the strong constraint – good fit condition (2a) is plausible, so N400 amplitude will be small; the other conditions are equivalently implausible, and thus cause equally large N400s
Predicted order of N400 amplitude:
    (2a) The javelin was by the athletes THROWN
  < (2b) The javelin has the athletes THROWN
  = (2d) The javelin has the athletes SUMMARIZED
  = (2c) The javelin was by the athletes SUMMARIZED

Hypothesis 2: NO effect of Message-Level Constraint, ONLY effect of Lexico-Semantic Fit
Prediction: There will be a lexical effect on top of the effect of plausibility (see Hypothesis 1): N400 amplitude decreases for words that fit lexico-semantically (2a and 2b); no difference is expected between sentences with poor lexico-semantic fit (2d and 2c)
Predicted order of N400 amplitude:
    (2a) The javelin was by the athletes THROWN
  < (2b) The javelin has the athletes THROWN
  < (2d) The javelin has the athletes SUMMARIZED
  = (2c) The javelin was by the athletes SUMMARIZED

Hypothesis 3: ONLY effect of Message-Level Constraint, NO effect of Lexico-Semantic Fit
Prediction: N400 amplitude will be smallest if the critical word matches the strong message-level constraint (2a); it will be greatest if the word does not match it (2c); in weak constraint sentences no specific word is expected and thus no mismatch will occur (2b and 2d)
Predicted order of N400 amplitude:
    (2a) The javelin was by the athletes THROWN
  < (2b) The javelin has the athletes THROWN
  = (2d) The javelin has the athletes SUMMARIZED
  < (2c) The javelin was by the athletes SUMMARIZED

Hypothesis 4: Effects of BOTH Message-Level Constraint AND Lexico-Semantic Fit
Prediction: Same as Hypothesis 3, except that N400 amplitude to words with a good lexico-semantic fit (2b) will be smaller than to words with a poor lexico-semantic fit (2d), other things being equal
Predicted order of N400 amplitude:
    (2a) The javelin was by the athletes THROWN
  < (2b) The javelin has the athletes THROWN
  < (2d) The javelin has the athletes SUMMARIZED
  < (2c) The javelin was by the athletes SUMMARIZED

“(2a)... < (2b)...” means that N400 amplitude for condition 2a is less negative than for 2b.
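To make explicit which pairwise contrasts separate the four hypotheses in Table 2, the predicted orderings can be encoded as ranks. The sketch below is an illustrative aid only: the rank encoding and the names `PREDICTED_RANK`, `relation` and `distinguishing_pairs` are assumptions introduced here, not part of the reported analysis.

```python
from itertools import combinations

CONDITIONS = ["2a", "2b", "2c", "2d"]

# Predicted rank of N400 negativity per condition, transcribed from Table 2
# (1 = least negative). Equal ranks mean "no difference predicted".
PREDICTED_RANK = {
    "H1": {"2a": 1, "2b": 2, "2d": 2, "2c": 2},  # plausibility only
    "H2": {"2a": 1, "2b": 2, "2d": 3, "2c": 3},  # lexico-semantic fit only
    "H3": {"2a": 1, "2b": 2, "2d": 2, "2c": 3},  # message-level constraint only
    "H4": {"2a": 1, "2b": 2, "2d": 3, "2c": 4},  # both factors
}


def relation(ranks, x, y):
    """Return -1 if x is predicted less negative than y, 0 if equal, 1 if more negative."""
    return (ranks[x] > ranks[y]) - (ranks[x] < ranks[y])


def distinguishing_pairs(h_a, h_b):
    """List the condition pairs on which two hypotheses make different predictions."""
    return [
        (x, y)
        for x, y in combinations(CONDITIONS, 2)
        if relation(PREDICTED_RANK[h_a], x, y) != relation(PREDICTED_RANK[h_b], x, y)
    ]


# Hypotheses 3 and 4 differ only on the two weak constraint conditions.
print(distinguishing_pairs("H3", "H4"))
```

Under this encoding, every hypothesis predicts that 2a is the least negative condition, so the informative contrasts all involve the three implausible conditions (2b, 2c and 2d).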
Now if message-level constraint but not lexico-semantic relatedness were to have an effect, then the smallest N400 should be found for the plausible strong constraint – good fit condition, and the largest N400 for strong constraint – poor fit sentences, because in the former condition the final word is also the most expected word, but in the latter there is a substantial mismatch between the strong message-level constraint and the word that is actually presented (e.g., “The javelin was by the athletes summarized”). The other two conditions, which do not have a strong message-level constraint but are implausible because their final words do not fit into the existing semantic representation of the context, will be expected to have an (equal) N400 somewhere in between. That is, we predict that a word which does not fit into a highly constraining context will be harder to process than a word which does not fit into a low constraint context.

If, on the other hand, lexico-semantic fit but not message level constraint is important for processing the next word, the N400 for the two good fit conditions should be comparable and quite small. We do expect a somewhat more negative N400 amplitude for the implausible weak constraint – good fit condition, however, somewhat similar to the ‘related anomaly’ effect mentioned above [21]. A much larger N400 is expected where the lexico-semantically poor fit conditions are concerned.

Finally, if both message-level constraint and lexico-semantic fit operate during the processing of a new word (which seems possible, given the conflicting evidence in the literature), the resulting interaction could take the following form. The smallest N400 will be expected, as before, for the plausible strong constraint – good fit condition.
A somewhat larger N400 is expected for the weak constraint – good fit condition, as the final word fits well with the preceding words but also makes the sentence semantically anomalous. The weak constraint – poor fit condition is expected to elicit a still larger N400, because it does not have a good fit ending, but the largest N400 is expected for strong constraint – poor fit sentences, where the upcoming word is neither compatible with the message-level constraint, nor does it fit with the preceding words. See Table 2 for a summary of these hypotheses and the associated predictions in terms of N400 amplitude.

It should be noted that in the present design main effects of either factor will not be easily interpretable. Only one of the four conditions is semantically plausible, so it will always be the case that one particular level of a main effect will contain an average over a plausible and an implausible condition, whereas for the other level two implausible conditions are averaged. However, since the aim of this experiment is to find evidence for one of the four specified forms of interaction between lexico-semantic fit and message-level constraint, we will not report main effects of either of these factors, but instead focus on interactions and planned comparisons.

One other aspect of the present design must also be taken into account when interpreting the results and that is the fact that the two levels of the factor message-level constraint differ not only in constraint but also in sentence structure, since the sentences with strong message-level constraint have a passive sentence structure, and the weak message-level constraint sentences an active structure. This difference in sentence structure by itself may have consequences for processing, as active sentences such as (2b) and (2d) are temporarily ambiguous in Dutch between a subject-initial and an object-initial reading.
In the sentences used in the current study, however, they are disambiguated by the number information on the second NP. We will discuss this issue further in the Materials section.

2. Materials and methods

2.1. Materials

A total of 112 sentences were selected from a list of high cloze probability Dutch passive sentences used by Gunter, Stowe, and Mulder [13]. As the cloze probability of the final words of the selected sentences was higher than 60% (average = 77%, see Table 3), the sentence contexts created from these sentences by leaving out the final word can be said to have a strong message-level constraint, as they are preferentially completed with one specific final word (e.g., the word thrown in the context of The javelin was by the athletes...). The weak message-level constraint version was derived from the strong message-level constraint sentences by changing the syntactic structure from passive to active (e.g., The javelin has the athletes...). Average cloze probability for the final words of the weak message-level constraint contexts was determined in a separate cloze-task pre-test (N = 26) and turned out to be 21% (SD = 12). Please note that in these contexts, the high cloze probability words from the strong constraint contexts were (almost) never generated (i.e., had a cloze probability of 1%, see Table 3). The average cloze probability for the weak constraint sentences would have been much lower than 21%, except that participants tended to choose a generally applicable verb for all these sentences instead of a specific completion. This can be illustrated by the fact that, for example, the verb verbaasd (‘amazed’) received the highest cloze probability in more than one third of all weak constraint sentence contexts. The difficulty of coming up with a suitable specific final verb is also reflected in a difficulty of completion rating which participants gave for all sentences before actually completing the sentence fragments.
On a scale of 1 (easy) to 5 (difficult) they had to rate how difficult it was to come up with a suitable completion. Average difficulty rating in weak constraint contexts was 3.4 (SD = 0.6), significantly higher than the 1.5 (SD = 0.4) for strong constraint contexts. Thus, two sets of 112 sentence contexts were obtained, one with strong message-level constraint and one with a weak message-level constraint.

Table 3
Length (in # characters), log-frequency, and cloze probability (in proportions) for the final words in the target sentences, and plausibility ratings for target sentences as a whole (standard deviations in parentheses)

                                          Good fit word    Poor fit word
Length                                    7.9 (1.4)        7.9 (1.3)
Log-frequency                             1.5 (0.7)        1.3 (0.8)
Cloze probability – strong constraint     0.77 (0.1)       0.00 (0.0)*
Cloze probability – weak constraint       0.01 (0.4)       0.00 (0.0)
Plausibility – strong constraint          4.5 (0.4)        1.6 (0.4)*
Plausibility – weak constraint            1.4 (0.3)        1.4 (0.4)

Constraint = message-level constraint; good/poor fit = good/poor lexico-semantic fit; plausibility ratings from 1 (highly implausible) to 5 (highly plausible).
* Significant difference (p < 0.05) between good fit and poor fit words on the variable given in the row; all differences were tested.

For the good lexico-semantic fit conditions, the original cloze probability words [13] were used to complete the sentences. These words had a good lexico-semantic fit with the other content words in the sentence. As was mentioned before, this does not necessarily mean that all content words were highly associatively related.
In a minority of sentences there was a lexico-semantic association between the final word and the immediately preceding word: In 74 of the 112 sentences the final word was not associatively related to the immediately preceding word, in the remaining 38 sentences there was a moderate to high associative relationship between final and prefinal word (as assessed by two independent judges). These sentences are rather typical of combinatory lexico-semantic relationships (cf. Ref. [6]), as the combination of the two context words provides a clear semantic setting for the final word. Finally, to create the poor lexico-semantic fit conditions, each of the good fit target words were replaced by a poor fit one. Thus, a total of 112 sets of four sentences were created. See Table 1 for an example set. When constructing the materials, the final words for the good fit and poor fit conditions were carefully matched on frequency of occurrence (using the CELEX lexical frequency database, see Ref. [1]) and on word length. Furthermore, plausibility ratings on a scale from 1 (highly implausible) to 5 (highly plausible) for sentences in all four conditions were obtained from 40 students of the University of Groningen who did not participate in the ERP-experiment. As expected, only strong constraint sentence contexts with a good fit final word were judged plausible; the sentences in the other three conditions were rated as highly implausible. There were no significant differences on any of the stimulus characteristics between good fit and poor fit words, except where such a difference was intended, that is, in cloze probability of the word with the good lexico-semantic fit as compared to the word with the poor fit (i.e., 77% vs. 0%) and in rated plausibility of good fit versus poor fit words in strong constraint contexts (i.e., 4.5 vs. 1.4, on a scale of 1 = highly implausible and 5 = highly plausible). Stimulus characteristics, including plausibility ratings are presented in Table 3. 
Four lists were created (see Design section for more details) containing 112 experimental sentences, 28 sentences for each of the four conditions, with no repetition of related sentences within a list. Since all of the weak message-level constraint sentences (i.e., 56) and half of the strong constraint sentences (i.e., 28) on any list were semantically implausible, 112 plausible filler items, all active sentences with a syntactic structure similar to the experimental items, were added to prevent a bias toward expecting implausible sentences. We should point out that this created a difference in proportion of plausible sentences between actives (112 out of 168 are plausible = 66%) and passives (28 out of 56 are plausible = 50%). Though this difference seems rather small, it might influence the results. We will come back to this issue in the discussion.

We have already discussed the fact that, since the active weak message-level constraint sentences have an ambiguous syntactic structure, readers could in principle take the first NP (e.g., the javelin) to be the grammatical object of the sentence. If they were to do so, the number information on the second NP (e.g., the athletes), which does not match the number of the auxiliary (e.g., has), would necessitate syntactic reanalysis, which might complicate the interpretation of effects at the subsequent target word (e.g., thrown). Since the object-initial structure is highly non-preferred [25], this seems unlikely; however, it is important to clearly demonstrate that readers do not consider the object-initial reading. To investigate exactly how non-preferred the object-initial reading is, we conducted a second paper-and-pencil plausibility judgment test. Participants (N = 20), who did not take part in the ERP experiment, were asked to judge the plausibility of a syntactically ambiguous sentence such as (3), where both NP arguments agree with the number information on the auxiliary.
Since both NPs match the auxiliary with respect to number, either one can be the subject of the sentence.

(3) De speer heeft de atleet geworpen (lit. The javelin[sing] has[sing] the athlete[sing] thrown)

A five-point rating scale was used, with 1 = highly implausible and 5 = highly plausible. If readers commonly entertain object-initial readings they will rate these sentences as semantically plausible. However, the rating data clearly show that readers disprefer the object-initial reading, as they rated the ambiguous sentences as highly implausible; participants gave an average plausibility rating of 1.9 (SD 1.1). In comparison, unambiguous implausible sentences such as De speer heeft de atleten geworpen (lit. The javelin has the athletes thrown) received an average rating of 1.5 (SD 0.9). This difference was not significant. The plausible sentence De speer werd door de atleten geworpen (lit. The javelin was by the athletes thrown) received an average of 5.0 (SD 0.0), significantly different from both other ratings. These results suggest that even when readers are completely free to choose the object-initial reading, they do not normally do so; it seems to escape readers that they have this possibility at all.

3.1. Participants

Twenty-four native speakers of Dutch took part in the experiment on a voluntary basis. During analysis, the data from one participant turned out to be unusable due to an error in the data-storage process. The data from the remaining 23 participants (18 female; mean age 25 years, age range 20–35) were used for analysis. All participants were currently receiving a university education or had recently completed a degree. All were right-handed, had normal or corrected-to-normal vision, and none of them reported having had neurological problems.

3.2. Design

In total 112 sets of four experimental sentences were constructed.
The four conditions of each set were obtained by crossing the factors message-level constraint (strong constraint vs. weak constraint) and lexico-semantic fit (good fit vs. poor fit). Four experimental lists were created using a Latin Square, with 28 items occurring in each condition on each list, and no list containing more than one version of a given item. The order in which experimental and filler items appeared was determined semirandomly (with the conditions spread evenly over the list) and was the same for all lists. Each list was presented to an equal number of participants, although for one list the data from only five instead of six participants were usable (see section on Participants). Each participant saw only one list.

3.3. Experimental procedure

Participants were tested in a dimly lit, sound-proof cabin. They sat facing a computer screen at approximately 50 cm distance; a chin-rest was used to minimize movement artefacts. The participant's index fingers rested on touch-sensitive response boxes, which recorded a response when either of the index fingers was lifted. Participants were instructed to read each sentence for comprehension, and to give a plausibility judgment at a cue after reading the complete sentence. They were further instructed to use their right finger to indicate that a sentence was plausible and their left finger to indicate that a sentence was implausible. At the beginning of each trial, a fixation mark (an asterisk) appeared for 1 s. After that, a sentence was presented word-by-word in the centre of the screen. Each word remained on screen for 240 ms, and was followed by a blank screen with a duration of 240 ms. After the presentation of the final word (marked by a period) and the subsequent blank screen (also 240 ms), the one-word question “Goed?” (= “Correct?”) appeared on screen for 3 s during which participants could make their plausibility judgement.
The word “Knipper” (= “Blink”) was then shown for another 3 s, giving participants the opportunity to blink; they were instructed to avoid blinking during the presentation of the sentence to avoid eye-movement and blink-related artefacts. The experimental and filler sentences, 224 items in total per list, were divided into six blocks, the first five of which contained 40 stimuli each, whereas the final block contained 24 stimuli. At the end of each but the last block participants were invited to take a short break before continuing with the experiment. The experiment took about 45 min.

3.4. EEG recording parameters

The EEG activity was recorded by means of tin electrodes mounted in an elastic cap (Electro-Cap International) from 21 electrode sites, based on the International 10–20 system, of which 15 will be used for further analysis: Fp1, Fp2, Fza, F3, F4, Fz, C3, C4, Cz, P3, P4, Pz, O1, O2, Oz. The base electrode was positioned at 10% of the nasion-inion distance anterior to Fz. Bipolar horizontal EOG was recorded between electrodes at the outer left and right canthus. Bipolar vertical EOG was recorded for both eyes. Electrode impedances were kept below 5 kΩ. All electrodes were referenced on-line to linked mastoids. EEG and EOG signals were sampled at 1000 Hz, amplified (EEG: 0.2 mV/V; EOG: 0.5 mV/V; time constant: 10 s), and digitally low-pass filtered with a cut-off frequency of 30 Hz; effective sample frequency was 100 Hz.

3.5. Data analysis

For the behavioral data, RT was defined as the interval between the onset of the cue (i.e., “Correct?”) and the participant's lifting of the left or right index finger. Average proportions of correct responses and average RTs for correct responses were computed for each participant and each condition (response time relative to word onset is 480 ms longer than the RT from the cue).
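The timing and count figures given in the procedure and recording sections can be checked with a few lines of arithmetic. The sketch below is only an illustration and not the authors' presentation or analysis code: every name in it (`WORD_MS`, `cue_onset_after_final_word`, `items_per_list`, and so on) is hypothetical, and the only inputs are the durations, block sizes, and sampling rates stated in the text.

```python
# Illustrative arithmetic only. All names are hypothetical; the numeric
# values are the ones reported in the procedure and recording sections.
WORD_MS = 240             # each word stays on screen for 240 ms
BLANK_MS = 240            # blank screen following each word
RAW_RATE_HZ = 1000        # sampling rate during recording
EFFECTIVE_RATE_HZ = 100   # effective sample frequency used for analysis


def cue_onset_after_final_word() -> int:
    """Delay in ms from onset of the final word to onset of the cue."""
    return WORD_MS + BLANK_MS


def rt_from_word_onset(rt_from_cue_ms: float) -> float:
    """Convert an RT measured from cue onset to one measured from word onset."""
    return rt_from_cue_ms + cue_onset_after_final_word()


def items_per_list(full_blocks: int = 5, block_size: int = 40,
                   last_block: int = 24) -> int:
    """Total number of items across the six presentation blocks."""
    return full_blocks * block_size + last_block


def downsampling_factor() -> int:
    """Ratio between the recording rate and the effective sample frequency."""
    return RAW_RATE_HZ // EFFECTIVE_RATE_HZ


if __name__ == "__main__":
    print("cue onset after final word (ms):", cue_onset_after_final_word())
    print("items per list:", items_per_list())
    print("downsampling factor:", downsampling_factor())
```

The first function reproduces the 480 ms offset mentioned for the behavioral data (240 ms word plus 240 ms blank), and the block arithmetic recovers the 224 items per list.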
Average ERPs were computed for each electrode site and for each participant in each condition for a time-interval starting at 100 ms before onset of the final verb and ending 1100 ms post-onset. Prior to averaging, trials with ocular or amplifier-related artefacts were excluded from analysis (on average 10% of all trials: 10.5% in condition Strong Constraint – Good Fit, 8.8% in Weak Constraint – Good Fit, 10.8% in Strong Constraint – Poor Fit, and 10% in Weak Constraint – Poor Fit). All averages were aligned to a 100-ms pre-stimulus baseline. For an exploratory analysis of the complete data set, averaged ERPs of each 1200 ms epoch (i.e., 100 ms pre-onset to 1100 ms post-onset of the final verb) were divided into 60 intervals of 20 ms. In each of these 60 intervals, mean amplitudes were statistically analyzed for each electrode separately using MANOVA. To guard against excessive type I errors, effects will be interpreted only when significant in three or more successive intervals. Cases where two successive intervals were significant will nevertheless be shown in Fig. 1 for reasons of comparability with previous studies (cf. Refs. [10,13,14]). The results of this ‘interval’ analysis will also be used to define relevant time-windows for evaluating the topographical distribution of the effects. For the analysis of average ERPs in a given time-window we will include a factor ‘laterality’ with three levels (i.e., left, midline, and right side of the scalp) and a factor ‘posteriority’ with five levels (i.e., prefrontal (Fp1, Fza, Fp2), frontal (F3, Fz, F4), central (C3, Cz, C4), parietal (P3, Pz, P4), and occipital (O1, Oz, O2)). The Huynh-Feldt correction will be applied to correct for violations of the sphericity assumption [31]. We will report the corrected p-values with the original degrees of freedom.

Fig. 1. Results of statistical analyses on single electrode ERP-data, time-locked to the presentation of the final verb. Electrodes on the Y-axis are grouped into left hemisphere (FP1-O1), midline (FZA-OZ), and right hemisphere electrodes (FP2-O2). Planned comparisons between strong and weak constraint sentences with good fit verbs (left-hand panel), and between strong and weak constraint sentences with poor fit verbs (middle panel) are shown, as well as interactions (right-hand panel). Light-gray bars represent cases where effects in two consecutive 20-ms intervals were statistically significant (p-value < 0.05). Dark-gray bars represent cases where effects in three or more consecutive intervals were significant.

4. Results

4.1. Behavioral data

Table 4 presents participant means (and standard deviations) for RT and performance accuracy for each of the experimental conditions. As can be seen from the accuracy data, over 85% of all sentences were correctly classified as either plausible or implausible, suggesting that participants read the sentences carefully and made only few mistakes in their plausibility judgment. An ANOVA on participant RT means with message-level constraint and lexico-semantic fit as within-participant variables yielded a highly significant interaction (F(1,22) = 24.47, MSe = 14,565, p < 0.001).
Reaction times for strong constraint sentences with a good fit final verb were significantly faster (i.e., 111 ms) than for the two weak constraint conditions taken together (t(22) = 2.68, p < 0.05). On the other hand, reaction times for strong constraint sentences with a poor fit final verb were significantly slower (i.e., 124 ms) than for weak constraint sentences (t(22) = 3.03, p < 0.01). The two conditions with weak message-level constraint did not differ (p > 0.20). Analyses on participant means for “proportion correct” revealed a very similar pattern. There was a highly significant interaction of message-level constraint and lexico-semantic fit (F(1,22) = 19.37, MSe = 0.01, p < 0.001). Strong constraint sentences with a good fit final verb showed a higher proportion correct (i.e., 0.95) than both weak constraint conditions taken together (i.e., 0.90; t(22) = 2.10, p < 0.05), whereas responses to strong constraint sentences with a poor fit final verb were more often incorrect (i.e., 0.83) as compared to sentences with a weak message-level constraint (i.e., 0.90; t(22) = 3.01, p < 0.01). Again, there was no significant difference between the two weak constraint conditions (p > 0.20).

Table 4
Average reaction time for correctly judged sentences (ms) and average proportion correct as a function of lexico-semantic fit and message-level constraint in the plausibility judgement task (standard deviations in parentheses)

Condition                  RT-correct    Proportion correct
Good fit final word
  Strong Constraint        716 (223)     0.95 (0.06)
  Weak Constraint          834 (225)     0.89 (0.08)
Poor fit final word
  Strong Constraint        960 (249)     0.83 (0.12)
  Weak Constraint          829 (210)     0.91 (0.10)

RT-Correct = RTs to correctly judged sentences.

4.2. ERP data

Fig. 1 shows the results of the statistical analyses on the average ERP-amplitudes (in 20-ms intervals) time-locked to the presentation of the final verb.
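The consecutive-interval criterion used in the interval analysis (effects interpreted only when significant in three or more successive 20-ms intervals, with runs of exactly two intervals displayed but not interpreted) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the output labels are hypothetical, and the p-values in the usage note are made up purely to exercise the logic.

```python
from typing import List, Tuple

ALPHA = 0.05            # significance threshold per interval (from the Fig. 1 caption)
INTERVAL_MS = 20        # width of each analysis interval
EPOCH_START_MS = -100   # the epoch starts 100 ms before final-verb onset


def significant_runs(p_values: List[float],
                     alpha: float = ALPHA) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutive p < alpha."""
    runs = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # a run that extends to the last interval
        runs.append((start, len(p_values) - start))
    return runs


def classify_runs(p_values: List[float],
                  alpha: float = ALPHA) -> List[Tuple[int, int, str]]:
    """Label runs as (onset_ms, offset_ms, label); single intervals are dropped.

    'interpreted' = three or more successive intervals (dark-gray bars),
    'shown_only'  = exactly two successive intervals (light-gray bars).
    """
    labelled = []
    for start, length in significant_runs(p_values, alpha):
        if length < 2:
            continue
        onset_ms = EPOCH_START_MS + start * INTERVAL_MS
        offset_ms = onset_ms + length * INTERVAL_MS
        label = "interpreted" if length >= 3 else "shown_only"
        labelled.append((onset_ms, offset_ms, label))
    return labelled
```

For one electrode, the input would be the 60 per-interval p-values of a given effect. With a made-up series in which intervals 20 to 24 and intervals 40 to 41 fall below 0.05, `classify_runs` returns one interpreted run from 300 to 400 ms and one shown-only run from 700 to 740 ms.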
As Fig. 1 shows, interactions were predominantly found in the region 300–600 ms post-onset (i.e., the N400 time-window) and, unexpectedly, at approximately 600–1100 ms post-onset. As can be seen from Fig. 2, which displays the averaged ERP responses to the four conditions, the responses in these two time-windows are quite different, as is the form taken by the interaction. We will discuss these time-windows separately, starting with the results of the interval analysis, followed by the topographical analyses. For this latter analysis, ERPs were averaged over the interval 300–600 ms post-onset for the analyses in the N400 time-window, and the interval 700–1000 ms post-onset for the analyses of the later time-window. Fig. 3 displays the difference waves obtained by subtracting the ERPs to the correct strong constraint – good fit condition from each of the other three conditions, which might be especially helpful to understand the pattern of results in the later time-window.

Fig. 2. Grand average ERP-waveforms as a function of message-level constraint and lexico-semantic fit.

4.2.1. Time-window 300–600 ms: the N400 time-domain

The interval analysis (see Fig. 1) revealed significant interactions of message-level constraint and lexico-semantic fit at central, parietal, occipital electrodes and at F4. Follow-up analyses showed that at the electrodes where an interaction was present, no significant differences in amplitude could be observed between strong and weak constraint sentences ending in lexico-semantically good fit final words (e.g., “The javelin was by the athletes thrown” vs.
“The javelin has the athletes thrown”); for final words that had a lexico-semantically poor fit (e.g., summarized), however, differences between strong and weak constraint sentences were significant; sentences with a strong message-level constraint (e.g., “The javelin was by the athletes summarized”) were more negative than weak constraint sentences (e.g., “The javelin has the athletes summarized”). Fig. 2 provides a graphical display of the grand average ERP waveforms for all four conditions. A topographical analysis was conducted subsequently and revealed a four-way interaction of constraint (2) × lexico-semantic fit (2) × laterality (3) × posteriority (5) that was marginally significant (F(8,176) = 1.95; MSe = 2491; p = 0.08). Follow-up analyses were conducted separately for each level of posteriority (prefrontal, frontal, etc.). The three-way interaction constraint × lexico-semantic fit × laterality did not reach significance anywhere except (marginally) at the frontal electrodes (F(2,44) = 2.72; MSe = 3013; p = 0.08), where the difference between strong and weak constraint for good fit words was smallest at F4 (though this difference was not significant at any of these electrodes), whereas the pattern for poor fit words did not differ across the frontal electrodes. Thus, the effects making up the interaction in the N400 time window appear not to be lateralized except for an effect at the frontal electrodes.

Fig. 3. Difference waveforms obtained by subtracting the grand average ERP-waveforms of the Strong Constraint – Good Fit condition from the grand averages of each of the three other conditions.

4.2.2. Time-window 600–1100 ms

The interval analysis showed significant interactions between message-level constraint and lexico-semantic fit, primarily at centro-parietal electrodes: C3, Cz, C4, P3, Pz, P4, but also at Fz and O2 (see Fig. 1).
Three sets of pairwise comparisons were conducted to follow up on those interactions. In the first set of comparisons all conditions were compared against the strong constraint sentences with a good fit ending (see Fig. 3 for difference waves). The positivity caused by the weak constraint – good fit condition was significant at all electrodes where an interaction was found; the weak constraint – poor fit condition was significantly more positive at all electrodes but C4. The positivity of the strong constraint – poor fit condition was somewhat less broadly distributed, as it reached significance at Fz, C3, Cz, and P4. The second set of comparisons showed that the effect associated with the weak constraint – good fit condition was significantly larger than the positivity elicited by the weak constraint – poor fit condition (at Cz, C4, P3, Pz, P4, and O2), and also significantly larger than the positivity in the strong constraint – poor fit condition (at P3, Pz, P4, and O2). The final comparison involved the weak constraint – poor fit condition, which showed a numerically larger positivity than the strong constraint – poor fit condition, a difference that was significant only at P3. A subsequent topographical analysis revealed a four-way interaction of constraint (2) × lexico-semantic fit (2) × laterality (3) × posteriority (5) which was highly significant (F(8,176) = 4.10; MSe = 6353; p < 0.005). Follow-up analyses at each level of posteriority showed (marginally) significant three-way interactions of constraint × fit × laterality at the prefrontal, frontal and central level, indicating that at those electrodes the form of the interaction between constraint and lexico-semantic fit differed with laterality (Prefrontal: F(2,44) = 3.11; MSe = 13047; p = 0.06; Frontal: F(2,44) = 3.28; MSe = 6718; p = 0.06; and Central: F(2,44) = 4.07; MSe = 8810; p < 0.05).
The constraint × lexico-semantic fit interaction did not reach significance at any of the prefrontal electrodes. The fact that at Fp1 and Fp2 the two poor fit conditions were actually numerically more negative than the plausible condition, in contrast to Fza where all implausible conditions were numerically more positive than the plausible condition, must have caused the marginally significant interaction. At the frontal and the central level the pattern was somewhat different. All electrodes showed positive effects of the three implausible conditions as compared to the plausible one, except for F4 where there were no significant differences, and except for C4 where only the weak constraint – good fit condition produced a significant positivity. The absence of significant interactions of constraint × lexico-semantic fit × laterality at the parietal and the occipital level indicated that the form of the constraint × lexico-semantic fit interaction did not change with laterality.

4.3. Summary of results

Significant interactions between message-level constraint and lexico-semantic fit were found in two time-windows. In the N400 time-window, from 300 until 600 ms post-onset, the interactions were broadly distributed over electrodes (though excluding most of the frontal ones), and were brought about by the absence of a difference between lexico-semantically good fit final verbs in strong and in weak message-level constraint contexts, and the presence of such a difference for poor fit verbs, with strong message-level constraint sentences being significantly more negative than weak constraint sentences with these poor fit words. In the second time-window, from 600 until 1100 ms post-onset, significant interactions between constraint and lexico-semantic fit were found, predominantly at centro-parietal electrodes.
These interactions were the consequence of the fact that, at most of the electrodes concerned, all three implausible conditions showed a significant positive shift as compared to the plausible condition. This positive shift was in general largest for the weak constraint – good fit condition, while the two conditions with poor fit words had a smaller positivity. When comparing the latter two conditions, the strong constraint – poor fit condition was almost always numerically smaller than the weak constraint – poor fit condition; a difference that reached significance, however, only at P3. Topographical analysis showed that the positive shift was slightly less pronounced on the right as compared to the left side of the scalp, with the effects at central and frontal electrodes on the right side (i.e., C4 and F4) being smaller and mostly not significant.

5. Discussion

The aim of this experiment was to find out if and how message-level information and lexico-semantic information interact to affect the processing of an upcoming word during on-line sentence comprehension. We obtained a significant interaction of message-level constraint and lexico-semantic fit in the N400 time-window (300–600 ms post-onset) and, unexpectedly, also in a later time window (600–1100 ms post-onset). The form of the interaction in the N400 time-window indicated that message-level constraint and lexico-semantic fit both must have had an impact on the processing of upcoming words. First, we found a significant difference in N400 amplitude between strong and weak message-level constraint sentences containing lexico-semantically poor fit words (e.g., summarized), reflecting the impact of message-level constraint on processing.
Both poor fit conditions were found to be significantly more negative than the correct (and plausible) strong constraint – good fit condition, which was what we had expected since both conditions were (equally) semantically implausible (for plausibility ratings, see Table 3). The largest N400 was obtained for the poor fit word following a strong constraint context, indicating that presenting a poor fit word in a context where another word is highly expected has an effect on the N400, over and above the N400 effect resulting from mere lexico-semantic fit. Importantly, as can be clearly seen in Fig. 3, this additional effect has the same distribution as the effect of lexico-semantic fit per se. This is an important finding, especially in the light of earlier contentions [23,24] that sentence constraint does not in itself influence the N400, but that the ‘cloze probability’ of the following words crucially determines N400 amplitude. As the cloze probability of the final words is 0% in both conditions, this hypothesis is not supported. The results from the two poor fit conditions thus indicate that message-level constraint can make the processing of an unexpected word more difficult, a finding that is compatible with the message-based model proposed by Morris [27]. However, the N400 elicited by lexico-semantically good fit words did not seem to be affected by message-level constraint at all. This is somewhat unexpected given the effect of message-level constraint that was observed for the poor fit words. It is also quite surprising given the fact that in the weak constraint contexts lexico-semantically good fit words were semantically anomalous (e.g., The javelin has the athletes thrown), which we predicted would cause some increase in N400 amplitude in this condition (see also Table 2).
It thus appears that the N400 to the anomalous words in the weak constraint – good fit sentence must have been reduced (i.e., made less negative) through the operation of lexico-semantic fit. Let us consider some possible mechanisms. For one thing, we know that the N400 to an anomalous word is sometimes attenuated as a result of it being closely lexico-semantically related to the most expected word in a strong constraint sentence, as reported in the early work of Kutas and Hillyard [23,24]. This kind of explanation seems rather unlikely, however, as the sentences with weak message-level constraint generally do not have a most expected word, and the words with a good lexico-semantic fit (e.g., thrown) are not even remotely related to any word that could be a plausible continuation of the sentence (e.g., amazed). So an explanation along the lines of ‘indirect’ semantic priming, that is, via the most expected word, can be effectively ruled out. Alternatively, the N400 could have been reduced through ‘simple’ lexico-semantic priming between the final word and one of the preceding content words, as suggested by Van Petten [32,33]. However, the fact that we did not find any difference between plausible and implausible good fit words seems to go against Van Petten's account, since she claimed that message-level and lexico-semantic effects were additive. It is difficult to see how the effect of lexico-semantics alone (i.e., in the weak constraint condition) could equal the effect of message-level constraint plus plausibility plus lexico-semantic fit (i.e., in the strong constraint condition). Rather, these results seem to be more compatible with the combinatory version of lexico-semantic priming as put forward by Duffy et al. [6], in which the two preceding content words (e.g., javelin and athletes) together facilitate the processing of the final verb.
This latter model actually predicts equal priming effects for sentences with the same content words, irrespective of their syntactic structure or message-level representation, which is the pattern seen here. However, both the simple and the combinatory lexico-semantic account clearly fail to provide an explanation for the pattern of results that was found for the poor fit words. In summary, then, the present results strongly suggest that both lexico-semantic fit and message-level constraint have a significant impact on word processing, though the mechanism by which they interact is not obvious. What we do know is that the current set of results cannot be explained by any account focusing solely on either of these sources of information, such as the simple and the combinatory account of lexico-semantic priming, or the message-level hypothesis (i.e., lexico-semantic facilitation will only occur when consistent with the message-level representation). We have also argued that the intuitively appealing ‘additive’ account suggested by Van Petten cannot be right, as message-level constraint did not seem to have an additive effect in the two good fit conditions. One way to think about a solution is to retain the message-based model, but with the extra assumption that under some circumstances, the complete and correct message-level representation may not be available to influence the processing of the upcoming word, but only an underspecified version of the correct representation (see, e.g., Ref. [30] for evidence on the use of underspecified representations in language understanding). For instance, in the weak constraint sentences, it may be hard to arrive at a coherent message-level representation because of the difficulty of correctly assigning thematic roles.
The active syntactic structure forces an inanimate entity (e.g., the javelin) to be the agent/effector of the event described and thus to do something to an animate entity (e.g., the athletes), which contrasts with their ‘preferred’ thematic roles (see Introduction). We cannot be sure what such an underspecified message-level representation looks like, but it seems it is not able to eliminate semantic activations based on the combined lexico-semantic information of the content words in the sentence, and could be much like the semantic representations proposed by Duffy et al. [6]. Normally, this ‘basic’ semantic representation would have been processed further into a specific message-level representation, apparently also deactivating incompatible semantic features along the way (cf. the results for the poor fit words). Thus, semantic activations in the weak constraint – good fit sentence could in fact be rather similar to those of the correct message-level representation of the strong constraint – good fit sentence! Or to put it differently, it is possible that for a very short period of time (i.e., a few hundred milliseconds) the processor reacts as if the implausible weak constraint – good fit sentence were perfectly plausible, in which case one could speak of a temporary semantic illusion. It seems unlikely, given its proposed brevity, that this temporary illusion will reach consciousness, quite analogous to processing difficulty in garden-path sentences which reaches awareness only in extreme cases. Thus, the message-based model might be extended to explain the present findings by assuming that strong and weak constraint sentences temporarily may have a very similar semantic representation and thus have similar effects on the processing of incoming words.
This extended model might also explain other findings in the literature, such as Duffy et al.'s [6] failure to find different outcomes in sentences that clearly differed in syntactic structure, which was taken as evidence for the combinatory lexical priming approach. Recall that the priming effect at mustache (i.e., as compared to a neutral control) did not differ between a sentence such as “While she talked to him, the barber trimmed the mustache” and the syntactically different “While talking to the barber, she trimmed the mustache”. The presence of the pronoun she, referring to a person that has not been introduced and that therefore does not have any real meaning, may make it difficult to come up with a complete and sensible sentence interpretation before reaching mustache. Now if a context had been provided in which a specific referent for the ‘she’ had been mentioned, for instance a female gardener, this would quite likely have had an impact on the priming effect on mustache in the sentence where the ‘she’ is doing the trimming. As shown by Morris, when a gardener is directly introduced as the subject of the sentence, the presence of barber and trimmed is not enough to prime mustache [27]. In addition, the model might explain why Fischler et al. ([9], see also Ref. [19]) did not find an N400 at the end of sentences such as A robin is not a bird (versus A robin is a bird), as it seems likely that computation of the correct message-level representation might have been rather difficult before the final word is read (e.g., A robin is not a). The proposed model is also consistent with the results from two recent ERP studies that investigated the processing of semantic anomaly [18,20]. Both studies failed to find an N400 effect to a semantic anomaly, and both used critical words that had good lexico-semantic fit but were semantically anomalous, in sentences with a problematic thematic structure.
Kuperberg et al. [20] did not find the expected N400 effect at the critical final word in anomalous sentences such as: ‘When she had a cold her TISSUE would SNEEZE’ as compared to ‘When she had a cold the GIRL would SNEEZE’. The subject NP of the main clause (e.g., tissue) is inanimate, which might cause thematic processing problems (i.e., if the reader assumes it will play the part of agent) and slow down the construction of a message-level representation, while the verb sneeze has close semantic ties to tissue. The results of the study by Kolk et al. [18] can be explained in much the same manner. Interestingly, as in the present experiment, both of the studies mentioned here found a pronounced late positivity following the condition that failed to show the N400 effect.

5.1. Late positivity

Besides the interaction in the N400 time-window there was a second pattern of interactions, occurring some 600–1100 ms after the final word was presented. In comparison to the strong constraint – good fit condition all conditions elicited a positive shift, which was particularly prominent for the lexico-semantic good fit sentences with weak message-level constraint. Given the predominantly centro-parietal scalp distribution of these differences they might be interpreted as involving the P600, the best known ERP component associated with sentence processing in this time-window, even though maximal amplitude was reached later than in the ‘standard’ case of the P600, that is, approximately 600 ms post-onset (e.g., Refs. [15,29]). P600 effects have been shown to occur in sentences that are ungrammatical, but also in correct sentences with a non-preferred syntactic structure (e.g., Refs. [15,29]), and have accordingly been argued to reflect (attempts at) syntactic reanalysis. In addition, other studies have indicated that the P600 might reflect processes of syntactic integration (e.g., Ref. [17]).
On the basis of this research, P600 amplitude is typically viewed as reflecting some kind of effortful syntactic processing that is likely to occur in sentences that are ungrammatical, syntactically ambiguous, or syntactically complex. In the present experiment, however, the sentences are not ungrammatical, and the most likely cause for the observed processing difficulty is not syntactic ambiguity or syntactic complexity, but, rather, thematic processing difficulty. For instance, the largest positivities (at least numerically) were observed in the two weak constraint conditions, where there is a conflict between the thematic role assignment prescribed by the grammatical structure (e.g., javelin is agent/effector and athletes are patient/experiencer) and the preferred assignment of thematic roles (e.g., javelin is theme and athletes are agent). The largest positive shift was elicited by the weak constraint – good fit condition, which might indicate that readers put considerable extra effort into syntactic (re-)analysis of the sentences in this condition as there is a highly plausible alternative thematic structure (e.g., the athletes throw the javelin) competing with the syntax-based thematic structure that leads to an implausible interpretation (e.g., the javelin throws the athletes). In the weak constraint – poor fit condition, on the other hand, there may be problems with thematic role assignment, but in the absence of a viable thematic alternative the effort put into syntactic (re-)processing in order to make sense of these implausible sentences may be much more limited.
Finally, the positive shift for the strong constraint – poor fit condition was found to be numerically smallest but still significant (as compared to the one plausible condition), presumably reflecting the effortful syntactic processing that is put into place when a reader tries to make sense of an implausible sentence, even if there is no conflict between prescribed and preferred thematic roles. Thus, the present results are compatible with the hypothesis that a P600 may be elicited in syntactically correct sentences by processing problems originating from semantic or thematic processing difficulties (e.g., Refs. [16,18,20], but see also Refs. [14,11]).

We should, however, consider some alternative interpretations of the late positive effect. It could be argued that the effect in this late time-window is actually a P300 elicited by the plausibility decision that was required for every sentence in the experiment. The P300 is very sensitive to task-relevant factors and its size correlates closely with task difficulty. Under this interpretation of the late positivity, the largest effect, then, should be observed in cases where it is most difficult to decide whether the sentence is plausible; as can be seen in Table 4, this is not the case: the most difficult condition in terms of increased RT and decreased proportion correct was the strong constraint – poor fit condition, which actually showed the smallest positivity. On a speculative note, the requirement to make a decision per se may have given rise to the generally observable (slow) positive shift at the end of the sentence for all conditions, including the correct one. In a similar vein it could be assumed that the specific pattern of results in the later time-window arises from motor responses or motor preparations associated with the plausibility decision responses.
This possibility was not supported by post-hoc analyses of the present data: no differences in waveforms were found between corresponding left and right electrode sites, although such differences are typically found when participants prepare to make a motor response (lateralized readiness potential, see, e.g., Ref. [22]). Thus, it seems unlikely that the preparation or execution of motor responses was responsible for the waveform pattern in the later time-window. In addition to the arguments given above, the possibility of decision- or motor-related processes interfering with the ERP results in the late positivity time-window has been explicitly addressed in the Kolk et al. study [18]. When they replicated their experiment without an added task, they found identical results, indicating that neither the absence of an N400 effect nor the presence of the late positive effect could be ascribed to task-related processes. Nevertheless, it is clear that more research is needed to verify whether the late positivity that was found in the present experiment actually belongs to the family of the P600, and whether it actually reflects effortful syntactic processing.

A final potential concern regarding the interpretation of the late positive effects comes from the difference in the percentage of correct sentences between the active weak constraint sentences (i.e., 66%) and the passive strong constraint sentences (i.e., 50%; see Materials section). The somewhat smaller probability of encountering an implausible active sentence might have given rise to a larger positivity for the experimental active sentences, which were all implausible. This positivity would then reflect a P300, which has been shown to be sensitive to (relatively large) differences in probability of irregular stimuli, rather than a P600 sensitive to syntactic processing (see Ref. [5], but also Ref. [28]).
The pattern of results in the late time-window does not seem incompatible with an explanation in terms of differing proportions. The active weak constraint sentences were clearly more positive than the passive strong constraint sentences containing good fit words, and there were also differences between active and passive sentences containing poor fit words (though significant only at P3). However, the whole pattern of results in this late time-window appears to be too complex to be caused by a small difference in proportions. In summary, then, it seems reasonable to interpret the positive shift as being a P600, reflecting effortful syntactic processing in order to obtain a semantically coherent and plausible sentence.

5.2. Conclusion

The present experiment was designed to investigate the interaction between message-level constraints on the one hand, and lexico-semantic information on the other during the processing of upcoming words. We have shown that both kinds of information are very important in processing an upcoming word, but not in an additive fashion. Lexico-semantic information was presumed to be responsible for the absence of an N400 effect in the comparison between strong constraint and weak constraint sentences with good fit words, even though these words were anomalous in the weak constraint context. In contrast, the highly specific message-level information in the strong constraint sentences affected primarily the processing of unrelated poor fit words, increasing the N400 amplitude, over and above the effect of semantic anomaly per se. This specific pattern of results was argued to be incompatible with current models of sentence processing, and an extension of the message-level hypothesis was suggested, in which the correct sentence interpretation may sometimes remain temporarily underspecified and thus unable to influence the processing of the upcoming word; an underspecified highly schematic version of it may have to be used instead.
Although the sentence eventually receives its correct interpretation, there appears to be a temporary semantic illusion of plausibility when the final word has a good lexico-semantic fit. We have suggested that the late positive component (most probably a P600) that was found in this experiment reflected the effortful syntactic processing invested in trying to make sense of an implausible sentence. It was also suggested that this effort will be maximal if there is a highly plausible alternative thematic structure (e.g., the athletes throw the javelin) competing with the syntax-based thematic structure that leads to the implausible interpretation (e.g., the javelin throws the athletes). Ongoing research in our laboratory is aimed at clarifying under which conditions an underspecified version of the correct message-level will be used, focussing on the time course of constructing a correct and coherent sentence representation.

Acknowledgements

This research was partly funded by a ‘PIONIER’ grant from the Dutch Organization for Scientific Research (NWO) for the project ‘The Neurological Basis of Language’ awarded to Dr. Laurie A. Stowe. We would like to thank Berry Wijers and David Atkinson for helpful remarks, Joop Clots, Hein van Schie, and Hans Veldman for technical assistance, Ingeborg Prinsen for help with the graphics, and Thom Gunter for allowing us to use his high cloze probability materials. Materials used in this study are available on-line via http://odur.let.rug.nl/~hoeks/BRESC2003_mat.pdf.

References

[1] H.R. Baayen, R. Piepenbrock, H. Van Rijn, The CELEX lexical database (CD-ROM), Linguistic Data Consortium, Philadelphia, PA, 1993.
[2] S. Bentin, M. Kutas, S.A. Hillyard, Electrophysiological evidence for task effects on semantic priming in auditory word processing, Psychophysiology 30 (1993) 161 – 169.
[3] C.M. Brown, P. Hagoort, The processing nature of the N400: evidence from masked priming, J. Cogn. Neurosci. 5 (1993) 34 – 44.
[4] C.M. Brown, P. Hagoort, D.J. Chwilla, An event-related brain potential analysis of visual word priming effects, Brain Lang. 72 (2000) 158 – 190.
[5] S. Coulson, J.W. King, M. Kutas, Expect the unexpected: event-related brain responses to morphosyntactic violations, Lang. Cogn. Processes 13 (1998) 21 – 58.
[6] S.A. Duffy, J.M. Henderson, R.K. Morris, Semantic facilitation of lexical access during sentence processing, J. Exper. Psychol., Learn., Mem., Cogn. 15 (1989) 791 – 801.
[7] K.D. Federmeier, M. Kutas, A rose by any other name: long-term memory structure and sentence processing, J. Mem. Lang. 41 (1999) 469 – 495.
[8] C.J. Fillmore, The case for case, in: E. Bach, R.T. Harms (Eds.), Universals in Linguistic Theory, Holt, Rinehart, and Winston, New York, 1968, pp. 1 – 88.
[9] I. Fischler, P.A. Bloom, D.G. Childers, S.E. Roucos, N.W. Perry Jr., Brain potentials related to stages of sentence verification, Psychophysiology 20 (1983) 400 – 409.
[10] A.D. Friederici, K. Steinhauer, S. Frisch, Lexical integration: sequential effects of syntactic and semantic information, Mem. Cogn. 27 (1999) 438 – 453.
[11] S. Frisch, M. Schlesewsky, The N400 reflects problems of thematic hierarchizing, NeuroReport 12 (2001) 3391 – 3394.
[12] J.S. Gruber, Lexical Structures in Syntax and Semantics, North-Holland, Amsterdam, 1976.
[13] T.C. Gunter, L.A. Stowe, G. Mulder, When syntax meets semantics, Psychophysiology 34 (1997) 660 – 676.
[14] T.C. Gunter, A.D. Friederici, H. Schriefers, Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction, J. Cogn. Neurosci. 12 (2000) 556 – 568.
[15] P. Hagoort, C.M. Brown, J. Groothusen, The Syntactic Positive Shift (SPS) as an ERP measure of syntactic processing, Lang. Cogn. Processes 8 (1993) 439 – 483.
[16] A. Hahne, J.D. Jescheniak, What’s left if the Jabberwock gets the semantics? An ERP investigation into semantic and syntactic processes during auditory sentence comprehension, Cogn. Brain Res. 11 (2001) 199 – 212.
[17] E. Kaan, A. Harris, E. Gibson, P.J. Holcomb, The P600 as an index of syntactic integration difficulty, Lang. Cogn. Processes 15 (2) (2000) 159 – 201.
[18] H.H.J. Kolk, D.J. Chwilla, M. van Herten, P.J.W. Oor, Structure and limited capacity in verbal working memory: a study with Event Related Potentials, Brain Lang. 85 (2003) 1 – 36.
[19] J. Kounios, P.J. Holcomb, Structure and process in semantic memory: evidence from event-related brain potentials and reaction times, J. Exper. Psychol., Learn., Mem., Cogn. 121 (1992) 459 – 479.
[20] G.R. Kuperberg, T. Sitnikova, D. Caplan, P. Holcomb, Electrophysiological distinctions in processing conceptual relationships within simple sentences, Cogn. Brain Res. 17 (2003) 117 – 129.
[21] M. Kutas, In the company of other words: electrophysiological evidence for single-word and sentence context effects, Lang. Cogn. Processes 8 (1993) 533 – 572.
[22] M. Kutas, E. Donchin, Preparation to respond as manifested by movement-related brain potentials, Brain Res., (1980) 95 – 115.
[23] M. Kutas, S.A. Hillyard, Brain potentials during reading reflect word expectancy and semantic association, Nature, (1984) 161 – 163.
[24] M. Kutas, T.E. Lindamood, S.A. Hillyard, Word expectancy and event-related brain potentials during sentence processing, in: S. Requin, J. Requin (Eds.), Preparatory States and Processes, Erlbaum, Hillsdale, NJ, 1984, pp. 217 – 237.
[25] M. Lamers, Sentence processing: using syntactic, semantic, and thematic information, Unpublished Doctoral thesis. University of Groningen, The Netherlands, 2001.
[26] D.E. Meyer, R.W. Schvaneveldt, Facilitation in recognising pairs of words: evidence of a dependence between retrieval operations, J. Exp. Psychol. 90 (1971) 227 – 235.
[27] R.K. Morris, Lexical and message-level sentence context effects on fixation times in reading, J. Exper. Psychol., Learn., Mem., Cogn. 20 (1994) 92 – 103.
[28] L. Osterhout, P. Hagoort, A superficial resemblance doesn’t necessarily mean you’re part of the family: Counterarguments to Coulson, King, and Kutas in the P600/SPS debate, Lang. Cogn. Processes 14 (1998) 1 – 14.
[29] L. Osterhout, P.J. Holcomb, Event-related brain potentials elicited by syntactic anomaly, J. Mem. Lang. 31 (1992) 785 – 806.
[30] A.S. Sanford, P. Sturt, Depth of processing in language comprehension: not noticing the evidence, Trends Cogn. Sci. 6 (2002) 382 – 386.
[31] J. Stevens, Applied Multivariate Statistics for the Social Sciences, Erlbaum, Hillsdale, NJ, 1992.
[32] C. Van Petten, A comparison of lexical and sentence-level context effects and their temporal parameters, Lang. Cogn. Processes 8 (1993) 485 – 532.
[33] C. Van Petten, Words and sentences: event-related brain potential measures, Psychophysiology 32 (1995) 511 – 525.
[34] J.N. Williams, Constraints upon semantic activation during sentence comprehension, Lang. Cogn. Processes 3 (1988) 165 – 206.