Cognitive Brain Research 19 (2004) 59 – 73
www.elsevier.com/locate/cogbrainres
Research report
Seeing words in context: the interaction of lexical and sentence level
information during reading
John C.J. Hoeks *, Laurie A. Stowe, Gina Doedens
BCN NeuroImaging Centre, Behavioral and Cognitive Neurosciences, University of Groningen, P.O. Box 716, 9700 AS Groningen, The Netherlands
Accepted 29 October 2003
Abstract
The ERP experiment reported here addresses some outstanding questions regarding word processing in sentential contexts: (1) Does only
the ‘message-level’ representation (the representation of sentence meaning combining lexico-semantic and syntactic constraints) affect the
processing of the incoming word [J. Exp. Psychol.: Learn. Mem. Cogn. 20 (1994) 92]? (2) Is lexically specified semantic relatedness between
multiple words the primary factor instead [J. Exp. Psychol.: Learn. Mem. Cogn. 15 (1989) 791]? (3) Alternatively, do word and sentence
level information interact during sentence comprehension? Volunteers read sentences (e.g. Dutch sentences resembling The javelin was by the
athletes. . .) in which the (passive) syntactic structure and the semantic content of the lexical items together created a strong expectation of a
specific final word (e.g., thrown), but also sentences in which the syntactic structure was changed from passive to active (e.g. Dutch sentences
resembling The javelin has the athletes. . .), which altered the message level constraint substantially and strongly reduced the expectation of
any particular completion. Half of the sentences ended in a final word with a good lexico-semantic fit relative to the preceding content words
(e.g. thrown, fitting well with the preceding javelin and athletes). This creates very plausible sentences in the strong constraint context but
semantically anomalous ones in the weakly constraining context (e.g., The javelin has the athletes thrown). In the other half the final word
had a poor lexico-semantic fit (e.g., summarized that does not fit at all with javelin and athletes). Good lexico-semantic fit endings showed no
difference in N400 amplitude in the strong and weak message-level constraint sentences, despite the fact that the latter were semantically
anomalous. This result suggests that lexico-semantic fit can be more important for word processing than the meaning of the sentence as
determined by the syntactic structure, at least initially. These conditions did differ, however, in the region of the P600, where the anomalous
weak constraint version was much more positive, a pattern usually seen with ungrammatical sentences. The processing of poor lexico-semantic fit words showed a quite different pattern; in both strong and weak constraint sentences they elicited a substantial N400 effect, but
N400 amplitude was significantly more negative following strong constraint contexts, even though both sentence contexts were equivalently
anomalous. Taken together, these findings provide evidence for the importance of both message-level and lexico-semantic information during
sentence comprehension. The implications for theories of sentence interpretation are discussed and an extension of the message-based
hypothesis will be proposed.
© 2003 Elsevier B.V. All rights reserved.
Theme: Neural Basis of Behavior
Topic: Cognition
Keywords: Sentence comprehension; Syntax; Semantics; Message-level constraint; ERPs; N400; P600
* Corresponding author. Tel.: +31-50-363-7443; fax: +31-50-3636855.
E-mail address: [email protected] (J.C.J. Hoeks).
0926-6410/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.cogbrainres.2003.10.022
1. Introduction
It is well established that the processing of a word is influenced by its preceding sentential context. Both lexico-semantic and syntactic aspects of the context appear to be crucially involved, but the mechanism by which context affects the processing of upcoming words is still not well understood, despite numerous behavioral and electrophysiological studies. Quite a number of studies using the lexical
decision paradigm (starting with Meyer and Schvaneveldt
[26]) have shown that deciding that a given letter string is an
existing word may be speeded up by the presence of a
semantically related word, an effect which will be referred
to here as ‘simple’ lexical priming. However, whether
simple lexical priming plays a role in sentence context has
been challenged. In an influential study using eye tracking,
Duffy et al. [6] showed that this simple lexical priming
between a pair of words does not appear to occur when the
words are embedded in a sentence. They found a facilitatory
effect for the target word mustache in sentence contexts
containing two words related to the target (e.g., The BARBER TRIMMED the MUSTACHE), but failed to find such an
effect for contexts containing only one of these words (e.g.,
The woman TRIMMED the MUSTACHE or The BARBER
saw the MUSTACHE). On the basis of this result, Duffy et
al. concluded that at least two mutually supportive content
words are necessary to evoke activation levels high enough
to cause priming effects during sentence processing (see
also Ref. [34]). This view will be called the ‘combinatory’
account of lexical priming in sentence processing, because it
assumes that priming effects depend on a specific combination of words, which is more than just a sum of the
priming effects for each word separately.
Duffy et al. not only looked into lexico-semantic aspects
of priming, but also investigated the possible role of the
syntactic structure of the preceding context. They did this by
comparing sentences that were almost identical with respect
to content words but differed in syntactic structure, such as:
While talking to the BARBER, she TRIMMED the MUSTACHE versus While she talked to him, the BARBER
TRIMMED the MUSTACHE. Somewhat surprisingly, they
found that both sentence contexts showed an equal amount
of facilitation for mustache (as compared to a neutral
baseline), even though their syntactic structure was rather
different, resulting in a different message. Duffy et al.
concluded that syntactic structure of the preceding context
was irrelevant for the processing of upcoming words, as
long as it was intact. In their view, the processing of an
upcoming word will only be facilitated when two or more
words in the context bear a lexico-semantic relation with
that word.
Morris [27], however, arrived at a very different conclusion in a follow-up study, also using eye tracking. When
looking at lexico-semantic facilitation effects in sentences
such as The gardener talked as the BARBER TRIMMED the
MUSTACHE, she found the expected reduction in reading
times on mustache. However, in a sentence such as The
gardener talked to the BARBER and TRIMMED the MUSTACHE, there was no facilitation at all for the target word.
Morris argued that the representation at the so-called message-level (i.e., representing the ‘message’ that the sentence
is meant to convey) changed so much as a result of changing
the syntactic structure that no facilitation for mustache was
obtained. In other words, the processing of a word that is
congruent with the message-level representation of the
preceding context will be facilitated, whereas an incongruent word will not.
Does this result definitively demonstrate that every
instance of word processing in context must be explained
by the ‘message-level hypothesis’? Morris may have failed
to find an effect of lexico-semantic relatedness partly
because of the use of reading time as the dependent
measure. Recall that the crucial sentence describes a gardener trimming a mustache, which is rather implausible,
especially given that some other object, e.g., a hedge, might
have been expected instead of a mustache. So a possible
decrease in reading time due to lexico-semantic facilitation
might have been canceled out by an increase in reading time
due to implausibility. Additionally, Morris pointed out that
there was evidence for simple lexical priming in the very
same experiment: reading times on the verb trimmed were
significantly faster when preceded by the related word
barber than by an unrelated word, and this was true even
for the sentence where the gardener was doing the trimming.
To sum up, at least three mechanisms by which context
can affect the processing of upcoming words have been
proposed in the literature: via simple lexical priming, via a
combinatory lexical process or via the message-level, all of
which have received only mixed support from behavioral
studies. These same possibilities, with the exception of the
combinatory lexical approach, have been investigated in electrophysiological studies using event-related brain potentials (ERPs), with a focus on one particular component of the ERP waveform, namely the N400, a negative-going deflection that peaks around 400 ms after
stimulus onset.
The results of Kutas and Hillyard [23], for instance,
demonstrate the impact of the message-level representation
on the processing of upcoming words, showing that the
amplitude of the N400 is highly inversely correlated with
the so-called ‘cloze probability’ or ‘expectedness’ of a word
after a given context. Cloze probability is determined by
asking volunteers to complete a sentence fragment and then
calculating the proportion of volunteers who complete it with that particular
word. Kutas and Hillyard showed that the stronger the
message-level constraint, that is, the stronger the expectation for a specific final word, the smaller the amplitude of
the N400 if that word is actually presented. Similarly, words
that make the sentence implausible or semantically anomalous give rise to a larger N400; the poorer the fit with the
semantic representation of the context, the larger the N400.
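As a minimal illustration of the cloze measure just described, the sketch below computes, for a single sentence fragment, the proportion of volunteers who produced each completion. The function name `cloze_probabilities` and the example responses are hypothetical; they only show the arithmetic and are not data from any study cited here.

```python
from collections import Counter

def cloze_probabilities(completions):
    """Return {word: proportion of volunteers who produced that word}.

    `completions` holds one completion per volunteer for a single
    sentence fragment (hypothetical input format).
    """
    if not completions:
        raise ValueError("need at least one completion")
    # Normalize case and surrounding whitespace so 'Thrown' and 'thrown' count together.
    counts = Counter(word.strip().lower() for word in completions)
    n = len(completions)
    return {word: count / n for word, count in counts.items()}

# Hypothetical completions of "The javelin was by the athletes ..." from 10 volunteers.
responses = ["thrown"] * 8 + ["carried", "polished"]
probs = cloze_probabilities(responses)
print(probs["thrown"])  # 0.8
```

A word that no volunteer produced (e.g., summarized in this hypothetical example) simply has a cloze probability of 0, which can be read off with `probs.get("summarized", 0.0)`.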
Though their experiment was not set up to test this, the
results of Kutas and Hillyard provide evidence for ‘simple’
lexical priming as well, since words that did not fit very well
with the context but that were closely related to the best
completion also showed attenuated N400s (see also Refs.
[7,24]). Consider for instance ‘‘He liked lemon and sugar in
his coffee’’, where tea is the most expected word, but coffee
is a close semantic relative. This has been called the ‘related
anomaly’ effect [21]. Thus, the N400 was shown to be
sensitive to lexico-semantic relatedness as well as to the
message-level representation of the sentence, which incorporates both the aspect of constraint imposed by the
sentential context (e.g., effects of cloze probability) and
the aspect of the meaning of the sentence as a whole (i.e.,
effects of semantic anomaly).
Results from an important study by Van Petten [32] also
suggest that the processing of upcoming words may be
affected via both the message-level and the lexico-semantic
route. Van Petten compared the N400 for words appearing
in sentences that did or did not have a coherent message-level representation. She found that the second of a pair of semantically related words consistently showed a smaller N400 than the first word, both in coherent sentences (e.g., ‘‘When the MOON is full it is hard to see the STARS. . .’’) and in anomalous sentences, where no coherent message-level representation can be formed (e.g., ‘‘When the MOON
is rusted it is available to buy many STARS. . .’’). Because
the N400 difference between first and second words was
significantly larger in the coherent condition, she concluded
that, in congruent sentences, both lexico-semantic association and the presence of a coherent message-level representation had a beneficial effect on the processing of the second
word.
However, this conclusion may not be entirely warranted
if we also take the data from the unrelated words into
account (e.g., ‘‘The biologist went to the desert every
WEEK to collect a particular SPECIES. . .’’ vs. ‘‘The shirt
went to the gun every WEEK to keep a good SPECIES. . .’’). Note that the following discussion is largely
based on a later paper by Van Petten [33, p. 521, Fig. 10]
which summarizes the findings for related and unrelated
word pairs in coherent and anomalous sentences. If we
visually compare the (closely matched) second words of
related and unrelated pairs (e.g., STARS vs. SPECIES,
respectively) in the anomalous conditions, there is no
difference in N400 amplitude, which is rather unexpected
if lexical priming independently affects the processing of
the second word of the related pair. Likewise, when
directly comparing the patterns of results for related and
unrelated pairs in coherent sentences it does not appear
that the effect of coherence is different for related and
unrelated pairs (though this specific comparison was not
reported in the original article). This suggests that, for related words in coherent sentences as well, simple lexical priming may not actually have had an effect over and above the effect of sentence coherence. In other
words, though Van Petten’s data provide convincing evidence for the effect of coherence at the message level,
there seems to be less solid evidence for word-level
effects.
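The argument above turns on a 2 × 2 comparison: whether the effect of sentence coherence is larger for related than for unrelated word pairs. One minimal way to express this is the interaction contrast (difference of differences) over the four cell means, sketched below. The function name and the cell values are hypothetical placeholders in arbitrary microvolt units, not Van Petten's data; they only illustrate what a zero interaction looks like.

```python
def interaction_contrast(cells):
    """Difference of differences for a 2 x 2 design.

    `cells` maps (relatedness, coherence) -> mean N400 amplitude.
    Returns (coherence effect for related pairs) - (coherence effect for unrelated pairs).
    A value near 0 means coherence affects both pair types equally.
    """
    related_effect = cells[("related", "coherent")] - cells[("related", "anomalous")]
    unrelated_effect = cells[("unrelated", "coherent")] - cells[("unrelated", "anomalous")]
    return related_effect - unrelated_effect

# Hypothetical mean amplitudes in microvolts (the N400 is negative-going).
hypothetical = {
    ("related", "coherent"): -1.0,
    ("related", "anomalous"): -4.0,
    ("unrelated", "coherent"): -1.5,
    ("unrelated", "anomalous"): -4.5,
}
print(interaction_contrast(hypothetical))  # 0.0
```

In practice such a contrast would be tested statistically, for example as the interaction term of a repeated-measures ANOVA; inspecting cell means, as in the discussion above, can only suggest its size.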
Taken together, the results of previous behavioral and
electrophysiological studies are still inconclusive as to what
roles message-level and lexico-semantic information play in
the processing of words in sentence contexts.
1.1. Present experiment
The present experiment was designed to investigate
the interaction between message-level information on the
one hand and lexico-semantic information on the other
during the processing of upcoming words. The dependent
measure will be the amplitude of the N400, which has
been shown to be sensitive to both message-level constraint and lexico-semantic priming [2,4,21,23,32,33].
Materials were used that differed along two orthogonal
dimensions, namely message-level constraint (strong vs.
weak) of the sentence context, and lexico-semantic fit
(good fit vs. poor fit) of the critical final word with the
preceding content words. We will first describe these two
factors in some detail and then discuss the experimental
hypotheses.
The first factor, message-level constraint, was manipulated by taking passive sentences that had a strong message-level constraint (i.e., average cloze probability for the final
word = 77%) and transforming them into active sentences
having a much weaker message-level constraint (i.e., average cloze probability for the final word = 21%; see Materials
section for the results of cloze tasks on all materials). This is
a much stronger version of the type of manipulation used by
Duffy et al. [6] and Morris [27]. Consider, for example,
sentence contexts 1a and 1b.
(1a) De speer werd door de atleten. . . (lit. The javelin was
by the athletes. . .)
(1b) De speer heeft de atleten. . . (lit. The javelin has the
athletes. . .)
As can be seen here, changing the syntactic structure
from passive to active has important consequences for the
message-level representation of these partial sentences
because of the impact syntactic structure has on the
assignment of thematic roles. A thematic role can be
described as the role an entity plays in the event described
by a sentence (e.g., Refs. [8,12]). For example, the term
agent refers to the entity that intentionally performs an
action; the entity that undergoes this action is labelled
patient or theme. Through the use of the passive construction in fragment 1a it is clear that the grammatical subject
the javelin is the theme, and that the athletes play the role
of agent. With such a thematic configuration it is very easy
to come up with a completion, which in this case should be
a verb describing something athletes can do with a javelin,
of which the word thrown is produced most frequently in a
cloze task (see below).
In fragment 1b, however, the active syntactic structure in
which the words are embedded rules out that the javelin is
the theme and, instead, forces it to play the role of agent (or
maybe effector is a more appropriate term here), with the
athletes as the patient. In this configuration, then, the
thematic roles that are prescribed by the syntactic structure
are not the thematic roles that are ‘preferred’ on the basis of
the semantic features of the entities concerned. As a result,
sentence fragments such as 1b are difficult to complete; the
most plausible completions are typically verbs such as
amazed or worried which refer to emotional responses
evoked by the inanimate entity (see Materials). Thus by
changing the syntactic structure, a major change in the
message-level representation was obtained both in terms
of the meaning of the (partial) sentence and in terms of
the degree of constraint that is imposed by the sentence
fragment.
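The role assignment described above can be made concrete with a deliberately simplified sketch. It assumes only two clause types: in a passive fragment the grammatical subject is the theme and the NP in the door-phrase is the agent; in an active fragment the subject is the agent (or effector) and the object NP is the patient. The function `assign_roles` and its inputs are hypothetical illustrations of this reasoning, not a parser for Dutch.

```python
def assign_roles(structure, subject, other_np):
    """Map the two NPs of a fragment to thematic roles (toy model).

    structure: "passive" (e.g. 'De speer werd door de atleten ...')
               or "active" (e.g. 'De speer heeft de atleten ...').
    """
    if structure == "passive":
        # The grammatical subject undergoes the action; the 'door' NP performs it.
        return {"theme": subject, "agent": other_np}
    if structure == "active":
        # The grammatical subject performs the action; the object NP undergoes it.
        return {"agent": subject, "patient": other_np}
    raise ValueError(f"unknown structure: {structure!r}")

print(assign_roles("passive", "javelin", "athletes"))
# {'theme': 'javelin', 'agent': 'athletes'}
print(assign_roles("active", "javelin", "athletes"))
# {'agent': 'javelin', 'patient': 'athletes'}
```

The second output shows the conflict the text describes: the syntax assigns the agent role to an inanimate entity, which is why fragments such as 1b are hard to complete.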
The second factor that was varied was the lexico-semantic fit of the final word with the preceding words. The
concept ‘lexico-semantic fit’ (for want of a better term) will
refer to the extent to which the final word and the preceding
content words of a given sentence fit together semantically.
This does not necessarily mean that there is a strong
associative or semantic relationship between all of the
content words. For instance, in a sentence such as 2a (see
Table 1) there is not much lexico-semantic association
between athletes and thrown. Rather, words with a good
lexico-semantic fit go together very well and easily conjure
up a specific situation or event (e.g., javelin/athletes/
thrown), in contrast to cases where the final words do not
fit with the preceding words (e.g., javelin/athletes/summarized), as in 2c and 2d. Note that the term ‘lexico-semantic fit’ does not refer to the semantic plausibility of the
sentence as a whole. In effect, sentence 2b, for instance, is
semantically implausible due to the particular syntactic
structure that is used, but its content words have a good
lexico-semantic fit.
Using these materials allows us to address the question of whether the processing of an upcoming word (as reflected in the N400) is influenced by the message-level representation of the preceding context, by the lexico-semantic links that exist between the word itself and the words that precede it, or by both. We can distinguish four patterns of results, depending on whether neither, one, or both of these factors affect the processing of the final words in the current experiment (for a summary, see Table 2).
Let us start with the simplest pattern, where neither
message-level constraint provided by the preceding context
nor lexico-semantic fit has a direct effect on the processing
of the upcoming word; rather, only the plausibility of the
incoming word is assumed to be important for the amplitude
of the N400. This is the view that the primary determinant
of the N400 is semantic integration, unaffected by factors
such as message-level constraints of the sort manipulated
here or semantic relatedness when these do not affect the
plausibility of the sentence. In that case, the smallest N400 amplitude will be found in the highly plausible strong constraint – good fit condition (‘‘The javelin was by the athletes thrown’’), and an equally large N400 for the three other conditions, which were rated equally implausible.
Table 1
Examples of the materials used in the experiment
Strong message-level constraint, good fit: 2a. De speer werd door de atleten geworpen (lit. The javelin was by the athletes thrown)
Strong message-level constraint, poor fit: ?? 2c. De speer werd door de atleten opgesomd (lit. The javelin was by the athletes summarized)
Weak message-level constraint, good fit: ?? 2b. De speer heeft de atleten geworpen (lit. The javelin has the athletes thrown)
Weak message-level constraint, poor fit: ?? 2d. De speer heeft de atleten opgesomd (lit. The javelin has the athletes summarized)
?? = grammatically correct but semantically implausible; good/poor fit = good/poor lexico-semantic fit.
Table 2
Expected patterns in N400 amplitude (from least negative to most negative) as a function of the presence or absence of effects of message-level constraint and lexico-semantic fit
Hypothesis 1: NO effect of Message-Level Constraint and NO effect of Lexico-Semantic Fit
Prediction: Only the plausibility of the sentence will influence the N400; the strong constraint – good fit condition (2a) is plausible, so N400 amplitude will be small; the other conditions are equivalently implausible, and thus cause equally large N400s
Predicted order of N400 amplitude: (2a) The javelin was by the athletes THROWN < (2b) The javelin has the athletes THROWN = (2d) The javelin has the athletes SUMMARIZED = (2c) The javelin was by the athletes SUMMARIZED
Hypothesis 2: NO effect of Message-Level Constraint, ONLY effect of Lexico-Semantic Fit
Prediction: There will be a lexical effect on top of the effect of plausibility (see Hypothesis 1): N400 amplitude decreases for words that fit lexico-semantically (2a and 2b); no difference is expected between sentences with poor lexico-semantic fit (2d and 2c)
Predicted order of N400 amplitude: (2a) The javelin was by the athletes THROWN < (2b) The javelin has the athletes THROWN < (2d) The javelin has the athletes SUMMARIZED = (2c) The javelin was by the athletes SUMMARIZED
Hypothesis 3: ONLY effect of Message-Level Constraint, NO effect of Lexico-Semantic Fit
Prediction: N400 amplitude will be smallest if the critical word matches the strong message-level constraint (2a); it will be greatest if the word does not match it (2c); in weak constraint sentences no specific word is expected and thus no mismatch will occur (2b and 2d)
Predicted order of N400 amplitude: (2a) The javelin was by the athletes THROWN < (2b) The javelin has the athletes THROWN = (2d) The javelin has the athletes SUMMARIZED < (2c) The javelin was by the athletes SUMMARIZED
Hypothesis 4: Effects of BOTH Message-Level Constraint AND Lexico-Semantic Fit
Prediction: Same as Hypothesis 3, except that N400 amplitude to words with a good lexico-semantic fit (2b) will be smaller than to words with a poor lexico-semantic fit (2d), other things being equal
Predicted order of N400 amplitude: (2a) The javelin was by the athletes THROWN < (2b) The javelin has the athletes THROWN < (2d) The javelin has the athletes SUMMARIZED < (2c) The javelin was by the athletes SUMMARIZED
‘‘(2a)... < (2b)...’’ means that N400 amplitude for condition 2a is less negative than for 2b.
Now if message-level constraint but not lexico-semantic
relatedness were to have an effect, then the smallest N400
should be found for the plausible strong constraint – good fit
condition, and the largest N400 for strong constraint – poor
fit sentences, because in the former condition the final word
is also the most expected word, but in the latter there is a
substantial mismatch between the strong message-level
constraint and the word that is actually presented (e.g.,
‘‘The javelin was by the athletes summarized’’). The other
two conditions, which do not have a strong message-level
constraint but are implausible because their final words do
not fit into the existing semantic representation of the
context, will be expected to have an (equal) N400 somewhere in between. That is, we predict that a word which
does not fit into a highly constraining context will be harder
to process than a word which does not fit into a low
constraint context.
If, on the other hand, lexico-semantic fit but not message-level constraint is important for processing the next word, the N400 for the two good fit conditions should be comparable and quite small. We do, however, expect a somewhat more negative N400 amplitude for the implausible weak constraint – good fit condition, analogous to the ‘related anomaly’ effect mentioned above [21]. A much larger N400 is expected for the two poor lexico-semantic fit conditions.
Finally, if both message-level constraint and lexico-semantic fit operate during the processing of a new word
(which seems possible, given the conflicting evidence in the
literature), the resulting interaction could take the following
form. The smallest N400 will be expected, as before, for the
plausible strong constraint – good fit condition. A somewhat
larger N400 is expected for the weak constraint – good fit
condition, as the final word fits well with the preceding
words but also makes the sentence semantically anomalous.
The weak constraint – poor fit condition is expected to elicit
a still larger N400, because it does not have a good fit
ending, but the largest N400 is expected for strong constraint – poor fit sentences, where the upcoming word is
neither compatible with the message-level constraint, nor
does it fit with the preceding words. See Table 2 for a
summary of these hypotheses and the associated predictions
in terms of N400 amplitude.
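The four predicted orderings in Table 2 can be encoded compactly and compared against a set of condition means. The sketch below is a hypothetical helper, not part of the original analysis: it assumes amplitudes are mean N400 values in microvolts (more negative = larger N400), treats differences no larger than a chosen tolerance `tol` as equal, and lists the hypotheses whose ordering the means satisfy. The condition means in the example are invented for illustration; a real test would rely on planned statistical comparisons rather than a fixed tolerance.

```python
# Orders from Table 2, least negative to most negative N400.
PREDICTIONS = {
    "H1": "2a < 2b = 2d = 2c",
    "H2": "2a < 2b < 2d = 2c",
    "H3": "2a < 2b = 2d < 2c",
    "H4": "2a < 2b < 2d < 2c",
}

def satisfies(order, amplitude, tol=0.5):
    """Check adjacent pairs of an ordering string against condition means."""
    tokens = order.split()
    conds, rels = tokens[0::2], tokens[1::2]
    for left, rel, right in zip(conds, rels, conds[1:]):
        diff = amplitude[left] - amplitude[right]  # > 0: 'left' is less negative
        if rel == "<" and not diff > tol:
            return False
        if rel == "=" and not abs(diff) <= tol:
            return False
    return True

def matching_hypotheses(amplitude, tol=0.5):
    return [name for name, order in PREDICTIONS.items()
            if satisfies(order, amplitude, tol)]

hypothetical = {"2a": -1.0, "2b": -2.5, "2d": -4.0, "2c": -6.0}
print(matching_hypotheses(hypothetical))  # ['H4']
```

Only adjacent conditions in each chain are compared, mirroring how the orderings in Table 2 are written.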
It should be noted that in the present design main effects
of either factor will not be easily interpretable. Only one of
the four conditions is semantically plausible, so it will
always be the case that one particular level of a main effect
will contain an average over a plausible and an implausible
condition, whereas for the other level two implausible
conditions are averaged. However, since the aim of this
experiment is to find evidence for either one of four
specified forms of interaction between lexico-semantic fit
and message-level constraint, we will not report main effects
of either of these factors, but instead focus on interactions
and planned comparisons.
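The point about main effects can be checked by listing which conditions each factor level averages over. In the sketch below, each condition is tagged with its plausibility according to the ratings described in the Materials section (only 2a was judged plausible); the helper name `plausible_per_level` is hypothetical. Counting plausible conditions per level shows that, for both factors, one level mixes a plausible with an implausible condition while the other averages two implausible ones.

```python
# Conditions of the 2 x 2 design: name -> (constraint, fit, plausible?)
CONDITIONS = {
    "2a": ("strong", "good", True),
    "2b": ("weak", "good", False),
    "2c": ("strong", "poor", False),
    "2d": ("weak", "poor", False),
}

def plausible_per_level(factor_index):
    """Count plausible conditions at each level of one factor.

    factor_index: 0 = message-level constraint, 1 = lexico-semantic fit.
    """
    counts = {}
    for cond in CONDITIONS.values():
        level = cond[factor_index]
        counts[level] = counts.get(level, 0) + int(cond[2])
    return counts

print(plausible_per_level(0))  # {'strong': 1, 'weak': 0}
print(plausible_per_level(1))  # {'good': 1, 'poor': 0}
```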
One other aspect of the present design must also be taken
into account when interpreting the results and that is the fact
that the two levels of the factor message-level constraint
differ not only in constraint but also in sentence structure,
since the sentences with strong message-level constraint
have a passive sentence structure, and the weak message-level constraint sentences an active structure. This difference in sentence structure by itself may have consequences
for processing, as active sentences such as (2b) and (2d) are
temporarily ambiguous in Dutch between a subject-initial
and an object-initial reading. In the sentences used in the
current study, however, they are disambiguated by the
number information on the second NP. We will discuss this
issue further in the Materials section.
2. Materials and methods
2.1. Materials
A total of 112 sentences were selected from a list of high
cloze probability Dutch passive sentences used by Gunter,
Stowe, and Mulder [13]. As the cloze probability of the final
words of the selected sentences was higher than 60%
(average = 77%, see Table 3), the sentence contexts created
from these sentences by leaving out the final word can be
said to have a strong message-level constraint, as they are
preferentially completed with one specific final word (e.g.,
the word thrown in the context of The javelin was by the
athletes. . .).
The weak message-level constraint version was derived
from the strong message-level constraint sentences by
changing the syntactic structure from passive to active
(e.g., The javelin has the athletes. . .). Average cloze probability for the final words of the weak message-level constraint contexts was determined in a separate cloze-task pre-test (N = 26) and turned out to be 21% (SD = 12). Note that in these contexts the high cloze probability words from the strong constraint contexts were (almost) never generated (i.e., they had a cloze probability of 1%; see Table 3). The average cloze probability for the weak constraint
sentences would have been much lower than 21%, except
that participants tended to choose a generally applicable
verb for all these sentences instead of a specific completion.
This can be illustrated by the fact that, for example, the verb
verbaasd (‘amazed’) received the highest cloze probability
in more than one third of all weak constraint sentence
contexts. The difficulty of coming up with a suitable
specific final verb is also reflected in a difficulty of
completion rating which participants gave for all sentences
before actually completing the sentence fragments. On a
scale of 1 (easy) to 5 (difficult) they had to rate how difficult
it was to come up with a suitable completion. Average
difficulty rating in weak constraint contexts was 3.4
(SD = 0.6), significantly higher than the 1.5 (SD = 0.4) for
strong constraint contexts. Thus, two sets of 112 sentence contexts were obtained, one with strong message-level constraint and one with a weak message-level constraint.
Table 3
Length (in number of characters), log frequency, and cloze probability (in proportions) for the final words in the target sentences, and plausibility ratings for the target sentences as a whole (standard deviations in parentheses)
Variable                                 Good fit word   Poor fit word
Length                                   7.9 (1.4)       7.9 (1.3)
Log frequency                            1.5 (0.7)       1.3 (0.8)
Cloze probability – strong constraint    0.77 (0.1)      0.00 (0.0)*
Cloze probability – weak constraint      0.01 (0.4)      0.00 (0.0)
Plausibility – strong constraint         4.5 (0.4)       1.6 (0.4)*
Plausibility – weak constraint           1.4 (0.3)       1.4 (0.4)
Constraint = message-level constraint; good/poor fit = good/poor lexico-semantic fit; plausibility ratings from 1 (highly implausible) to 5 (highly plausible).
* Significant difference (p < 0.05) between good fit and poor fit words on the variable given in the row; all differences were tested.
For the good lexico-semantic fit conditions, the original
cloze probability words [13] were used to complete the
sentences. These words had a good lexico-semantic fit with
the other content words in the sentence. As was mentioned
before, this does not necessarily mean that all content words
were highly associatively related. In a minority of sentences
there was a lexico-semantic association between the final
word and the immediately preceding word: in 74 of the 112 sentences the final word was not associatively related to the immediately preceding word, whereas in the remaining 38 sentences there was a moderate to high associative relationship between the final and prefinal words (as assessed by two independent judges). These sentences are rather typical of
combinatory lexico-semantic relationships (cf. Ref. [6]), as
the combination of the two context words provides a clear
semantic setting for the final word.
Finally, to create the poor lexico-semantic fit conditions,
each of the good fit target words was replaced by a poor
fit one. Thus, a total of 112 sets of four sentences were
created. See Table 1 for an example set. When constructing
the materials, the final words for the good fit and poor fit
conditions were carefully matched on frequency of occurrence (using the CELEX lexical frequency database, see
Ref. [1]) and on word length. Furthermore, plausibility
ratings on a scale from 1 (highly implausible) to 5 (highly
plausible) for sentences in all four conditions were obtained
from 40 students of the University of Groningen who did
not participate in the ERP-experiment. As expected, only
strong constraint sentence contexts with a good fit final
word were judged plausible; the sentences in the other
three conditions were rated as highly implausible. There
were no significant differences on any of the stimulus
characteristics between good fit and poor fit words, except
where such a difference was intended, that is, in cloze
probability of the word with the good lexico-semantic fit as
compared to the word with the poor fit (i.e., 77% vs. 0%)
and in rated plausibility of good fit versus poor fit words in
strong constraint contexts (i.e., 4.5 vs. 1.4, on a scale of
1 = highly implausible and 5 = highly plausible). Stimulus
characteristics, including plausibility ratings are presented
in Table 3.
Four lists were created (see Design section for more
details) containing 112 experimental sentences, 28 sentences
for each of the four conditions, with no repetition of related
sentences within a list. Since all of the weak message-level
constraint sentences (i.e., 56) and half of the strong constraint sentences (i.e., 28) on any list were semantically
implausible, 112 plausible filler items, all active sentences
with a syntactic structure similar to the experimental items,
were added to prevent a bias toward expecting implausible
sentences. We should point out that this created a difference
in the proportion of plausible sentences between actives (112
out of 168 plausible, i.e., 67%) and passives (28 out of 56
plausible, i.e., 50%). Though this difference seems rather
small, it might influence the results. We will come back to
this issue in the discussion.
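These proportions follow directly from the list composition. On each list, the strong constraint (passive) items contribute 56 sentences, of which 28 are plausible. The weak constraint (active) items and the 112 active fillers together contribute 168 sentences, of which only the fillers are plausible. The small Python check below reproduces that arithmetic; the function and parameter names are hypothetical.

```python
from fractions import Fraction


def plausible_shares(items_per_condition=28, n_fillers=112):
    """Return (active share, passive share, list length) of plausible sentences.

    Assumes, as described in the text, that strong constraint items are passive,
    weak constraint items are active, all fillers are active and plausible,
    and only the strong constraint-good fit items are plausible.
    """
    passive_total = 2 * items_per_condition               # SC-GF + SC-PF
    passive_plausible = items_per_condition               # SC-GF only
    active_total = 2 * items_per_condition + n_fillers    # WC-GF + WC-PF + fillers
    active_plausible = n_fillers                          # fillers only
    return (Fraction(active_plausible, active_total),
            Fraction(passive_plausible, passive_total),
            passive_total + active_total)


active, passive, total = plausible_shares()
print(active, passive, total)  # 2/3 1/2 224
```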
We have already discussed the fact that, since the active
weak message-level constraint sentences have an ambiguous
syntactic structure, readers could in principle take the first
NP (e.g., the javelin) to be the grammatical object of the
sentence. If they were to do so, the number information on
the second NP (e.g., the athletes), which does not match the
number of the auxiliary (e.g., has), would necessitate syntactic reanalysis, which might complicate the interpretation
of effects at the subsequent target word (e.g., thrown). Since
the object initial structure is highly non-preferred [25], this
seems unlikely; however, it is important to clearly demonstrate that readers do not consider the object-initial reading.
To investigate exactly how non-preferred the object-initial reading is, we conducted a second paper-and-pencil
plausibility judgment test. Participants (N = 20) who did not
take part in the ERP experiment were asked to judge the
plausibility of syntactically ambiguous sentences such as
(3), in which both NP arguments agree with the number
information on the auxiliary. Since both NPs match the
auxiliary with respect to number, either one can be the
subject of the sentence.
(3) De speer heeft de atleet geworpen (lit. The javelin[sing]
has[sing] the athlete[sing] thrown)
A five-point rating scale was used, with 1 = highly
implausible and 5 = highly plausible. If readers commonly
entertained object-initial readings, they would rate these sentences as semantically plausible. However, the rating data
clearly show that readers disprefer the object-initial reading,
as they rated the ambiguous sentences as highly implausible; participants gave an average plausibility rating of 1.9
(SD 1.1). In comparison, unambiguous implausible sentences such as De speer heeft de atleten geworpen (lit. The
javelin has the athletes thrown) received an average rating of
1.5 (SD 0.9). This difference was not significant. The
plausible sentence De speer werd door de atleten geworpen
(lit. The javelin was by the athletes thrown) received an
average of 5.0 (SD 0.0), significantly different from both
other ratings. These results suggest that even when readers
are completely free to choose the object-initial reading, they
do not normally do so; it seems to escape readers that they
have this possibility at all.
3.1. Participants
Twenty-four native speakers of Dutch took part in the
experiment on a voluntary basis. During analysis, the data
from one participant turned out to be unusable due to an
error in the data-storage process. The data from the remaining 23 participants (18 female; mean age 25 years, age range
20–35) were used for analysis. All participants were currently receiving a university education or had recently
completed a degree. All were right-handed, had normal or
corrected-to-normal vision, and none of them reported
having had neurological problems.
3.2. Design
In total 112 sets of four experimental sentences were
constructed. The four conditions of each set were obtained
by crossing the factors message-level constraint (strong
constraint vs. weak constraint) and lexico-semantic fit
(good fit vs. poor fit). Four experimental lists were created
using a Latin Square, with 28 items occurring in each
condition on each list, and no list containing more than
one version of a given item. The order in which experimental and filler items appeared was determined semi-randomly (with the conditions spread evenly over the list)
and was the same for all lists. Each list was presented to
an equal number of participants, although for one list the
data from only five instead of six participants were usable
(see section on Participants). Each participant saw only
one list.
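The Latin Square rotation can be sketched as follows. Item i is assigned to condition (i + l) mod 4 on list l, so every list contains 28 items per condition and every item appears in each condition on exactly one list. This is a minimal Python illustration: the condition labels and function name are our own, and the semi-random presentation order is not modeled.

```python
from collections import Counter

# Hypothetical labels: SC/WC = strong/weak constraint, GF/PF = good/poor fit
CONDITIONS = ("SC-GF", "WC-GF", "SC-PF", "WC-PF")


def latin_square_lists(n_items=112, conditions=CONDITIONS):
    """Return one list per condition; entry i of a list is item i's condition."""
    k = len(conditions)
    if n_items % k:
        raise ValueError("number of items must be a multiple of the number of conditions")
    return [[conditions[(item + shift) % k] for item in range(n_items)]
            for shift in range(k)]


lists = latin_square_lists()
for number, lst in enumerate(lists, start=1):
    print(number, Counter(lst))  # 28 items in each condition on every list
```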
3.3. Experimental procedure
Participants were tested in a dimly lit, sound-proof
cabin. They sat facing a computer screen at approximately
50 cm distance; a chin-rest was used to minimize movement artefacts. The participant’s index fingers rested on
touch-sensitive response boxes, which recorded a response
when either of the index fingers was lifted. Participants
were instructed to read each sentence for comprehension,
and to give a plausibility judgment at a cue after reading
the complete sentence. They were further instructed to use
their right finger to indicate that a sentence was plausible
and their left finger to indicate that a sentence was
implausible.
At the beginning of each trial, a fixation mark (an asterisk)
appeared for 1 s. After that, a sentence was presented word-by-word in the centre of the screen. Each word remained on
screen for 240 ms, and was followed by a blank screen with a
duration of 240 ms. After the presentation of the final word
(marked by a period) and the subsequent blank screen (also
240 ms), the one-word question ‘‘Goed?’’ (= ‘‘Correct?’’)
appeared on screen for 3 s, during which participants could
make their plausibility judgement. The word ‘‘Knipper’’
(= ‘‘Blink’’) was then shown for another 3 s, giving participants the opportunity to blink; they were instructed not to
blink during the presentation of the sentence in order to avoid eye-movement and blink-related artefacts.
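The trial timing described above can be laid out as a list of event onsets. The Python sketch below assumes that there is no gap between the fixation mark and the first word and ignores any inter-trial interval, since neither is specified. Event names are illustrative. The sketch makes explicit that the cue appears 480 ms after the onset of the final word, which is the offset mentioned in the Data analysis section.

```python
def trial_events(n_words, fixation_ms=1000, word_ms=240, blank_ms=240,
                 cue_ms=3000, blink_ms=3000):
    """Return ([(event, onset_ms), ...], trial_duration_ms) for one trial."""
    events, t = [("fixation", 0)], fixation_ms
    for i in range(1, n_words + 1):
        events.append((f"word {i}", t))
        t += word_ms
        events.append((f"blank {i}", t))
        t += blank_ms
    events.append(("cue 'Goed?'", t))
    t += cue_ms
    events.append(("blink prompt 'Knipper'", t))
    t += blink_ms
    return events, t


# "De speer werd door de atleten geworpen." has seven words
events, duration = trial_events(7)
onsets = dict(events)
print(onsets["cue 'Goed?'"] - onsets["word 7"])  # 480
print(duration)                                   # 10360
```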
The experimental and filler sentences, 224 items in total
per list, were divided into six blocks, the first five of
which contained 40 stimuli each, whereas the final block
contained 24 stimuli. At the end of each block except the last,
participants were invited to take a short break before continuing with the experiment. The experiment took about
45 min.
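As a check on the block structure, five blocks of 40 plus a final block of 24 account for all 224 items on a list. Below is a minimal Python sketch of such a split. The function name is hypothetical, and the sketch only partitions an already ordered list.

```python
def split_into_blocks(items, sizes=(40, 40, 40, 40, 40, 24)):
    """Partition an ordered stimulus list into consecutive blocks of the given sizes."""
    if sum(sizes) != len(items):
        raise ValueError(f"block sizes sum to {sum(sizes)}, but there are {len(items)} items")
    blocks, start = [], 0
    for size in sizes:
        blocks.append(items[start:start + size])
        start += size
    return blocks


blocks = split_into_blocks(list(range(224)))
print([len(b) for b in blocks])  # [40, 40, 40, 40, 40, 24]
```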
3.4. EEG recording parameters
The EEG activity was recorded by means of tin electrodes mounted in an elastic cap (Electro-Cap International)
from 21 electrode sites, based on the International 10–20
system, of which 15 will be used for further analysis: Fp1,
Fp2, Fza, F3, F4, Fz, C3, C4, Cz, P3, P4, Pz, O1, O2, Oz.
The base electrode was positioned at 10% of the nasion–inion distance anterior to Fz. Bipolar horizontal EOG was
recorded between electrodes at the outer left and right
canthi. Bipolar vertical EOG was recorded for both eyes.
Electrode impedances were kept below 5 kΩ. All electrodes
were referenced on-line to linked mastoids. EEG and EOG
signals were sampled at 1000 Hz, amplified (EEG: 0.2 µV/V;
EOG: 0.5 µV/V; time constant: 10 s), and digitally
low-pass filtered with a cut-off frequency of 30 Hz; the effective
sampling frequency was 100 Hz.
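The filter design is not specified here. Purely as an illustration of going from 1000 Hz recordings to an effective 100 Hz after a 30 Hz low-pass, the Python sketch below assumes a 101-tap Hamming-windowed sinc FIR filter followed by keeping every 10th sample. The function names and tap count are our assumptions; they are not taken from the recording software.

```python
import math


def design_lowpass(cutoff_hz, fs_hz, numtaps=101):
    """Hamming-windowed sinc FIR low-pass filter, normalized to unity gain at DC."""
    fc = cutoff_hz / fs_hz              # cut-off in cycles per sample
    mid = (numtaps - 1) / 2
    taps = []
    for n in range(numtaps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (numtaps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    re = sum(h * math.cos(2 * math.pi * freq_hz * n / fs_hz) for n, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * freq_hz * n / fs_hz) for n, h in enumerate(taps))
    return math.hypot(re, im)


def filter_and_decimate(signal, fs_hz=1000, cutoff_hz=30, target_fs_hz=100, numtaps=101):
    """Low-pass filter (valid part only, no edge padding), then keep every k-th sample."""
    taps = design_lowpass(cutoff_hz, fs_hz, numtaps)
    step = fs_hz // target_fs_hz        # 10 when going from 1000 Hz to 100 Hz
    # The taps are symmetric, so this sliding dot product equals convolution.
    filtered = [sum(h * signal[i + k] for k, h in enumerate(taps))
                for i in range(len(signal) - numtaps + 1)]
    return filtered[::step]


taps = design_lowpass(30, 1000)
print(gain_at(taps, 5, 1000) > 0.99)          # True (passband)
print(gain_at(taps, 200, 1000) < 0.01)        # True (stopband)
print(len(filter_and_decimate([1.0] * 1000)))  # 90 samples from 1 s of data
```

A linear-phase (symmetric) FIR filter was chosen for the sketch because it delays all frequencies equally, which avoids distorting ERP latencies.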
3.5. Data analysis
For the behavioral data, RT was defined as the interval
between the onset of the cue (i.e., ‘‘Correct?’’) and the
participant’s lifting of the left or right index finger. Average
proportions of correct responses and average RTs for correct
responses were computed for each participant and each
condition (response time relative to the onset of the final word is 480 ms
longer than the RT from the cue).
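The per-participant, per-condition summary described above is sketched below in Python. The trial record format (a dict with participant, condition, correct and RT-from-cue fields) is a hypothetical choice. The 480-ms shift to final-word onset is the one stated in the text: a 240-ms final word plus a 240-ms blank screen.

```python
from collections import defaultdict
from statistics import mean

FINAL_WORD_TO_CUE_MS = 480  # 240 ms final word + 240 ms blank screen


def summarize(trials):
    """Proportion correct and mean RT of correct trials per (participant, condition)."""
    groups = defaultdict(list)
    for trial in trials:
        groups[(trial["participant"], trial["condition"])].append(trial)
    summary = {}
    for key, group in groups.items():
        correct_rts = [t["rt_from_cue_ms"] for t in group if t["correct"]]
        rt = mean(correct_rts) if correct_rts else None
        summary[key] = {
            "prop_correct": sum(t["correct"] for t in group) / len(group),
            "mean_rt_from_cue_ms": rt,
            "mean_rt_from_word_ms": None if rt is None else rt + FINAL_WORD_TO_CUE_MS,
        }
    return summary


# Hypothetical trials for one participant in one condition
trials = [
    {"participant": 1, "condition": "SC-GF", "correct": True, "rt_from_cue_ms": 700},
    {"participant": 1, "condition": "SC-GF", "correct": True, "rt_from_cue_ms": 740},
    {"participant": 1, "condition": "SC-GF", "correct": False, "rt_from_cue_ms": 900},
    {"participant": 1, "condition": "SC-GF", "correct": True, "rt_from_cue_ms": 720},
]
# prop_correct 0.75; mean RT 720 ms from the cue, 1200 ms from final-word onset
print(summarize(trials)[(1, "SC-GF")])
```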
Average ERPs were computed for each electrode site and
for each participant in each condition for a time-interval
starting at 100 ms before onset of the final verb and ending
1100 ms post-onset. Prior to averaging, trials with ocular or
amplifier-related artefacts were excluded from analysis (on
average 10% of all trials: 10.5% in condition Strong
Constraint – Good Fit, 8.8% in Weak Constraint – Good Fit,
10.8% in Strong Constraint-Poor Fit, and 10% in Weak
Constraint-Poor Fit). All averages were aligned to a 100-ms
pre-stimulus baseline. For an exploratory analysis of the
complete data set, averaged ERPs of each 1200 ms epoch
(i.e., 100 ms pre-onset to 1100 ms post-onset of the final
verb) were divided into 60 intervals of 20 ms. In each of
these 60 intervals, mean amplitudes were statistically analyzed for each electrode separately using MANOVA. To
guard against excessive Type I errors, effects will be interpreted only when significant in three or more successive
intervals. Cases where two successive intervals were significant will nevertheless be shown in Fig. 1 for reasons of
comparability with previous studies (cf. Refs. [10,13,14]).
The results of this ‘interval’ analysis will also be used to
define relevant time-windows for evaluating the topographical distribution of the effects. For the analysis of average
ERPs in a given time-window we will include a factor
‘laterality’ with three levels (i.e., left, midline, and right side
of the scalp) and a factor ‘posteriority’, with five levels (i.e.,
prefrontal (Fp1, Fza, Fp2), frontal (F3, Fz, F4), central (C3,
Cz, C4), parietal (P3, Pz, P4), and occipital (O1, Oz, O2)).
The Huynh-Feldt correction will be applied to correct for
violations of the sphericity assumption [31]. We will report
the corrected p-values with the original degrees of freedom.
Fig. 1. Results of statistical analyses on single electrode ERP-data, time-locked to the presentation of the final verb. Electrodes on the Y-axis are grouped into left hemisphere (FP1-O1), midline (FZA-OZ), and right hemisphere electrodes (FP2-O2). Planned comparisons between strong and weak constraint sentences with good fit verbs (left-hand panel), and between strong and weak constraint sentences with poor fit verbs (middle panel) are shown, as well as interactions (right-hand panel). Light-gray bars represent cases where effects in two consecutive 20-ms intervals were statistically significant (p < 0.05). Dark-gray bars represent cases where effects in three or more consecutive intervals were significant.
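The interval procedure used here has three parts: 100-ms baseline correction, 60 mean amplitudes of 20 ms each over the 1200-ms epoch, and the criterion of three or more successive significant intervals. It can be sketched in Python as below. The sketch assumes 100 Hz data, so each 20-ms interval holds two samples. The per-interval MANOVA is not reproduced, and the p-values in the usage lines are hypothetical.

```python
FS_HZ = 100            # effective sampling rate after decimation
EPOCH_START_MS = -100  # epoch runs from -100 to 1100 ms relative to verb onset


def baseline_correct(epoch, baseline_ms=100, fs_hz=FS_HZ):
    """Subtract the mean of the pre-stimulus baseline from every sample."""
    n_base = baseline_ms * fs_hz // 1000
    base = sum(epoch[:n_base]) / n_base
    return [v - base for v in epoch]


def interval_means(epoch, interval_ms=20, fs_hz=FS_HZ):
    """Mean amplitude in consecutive, non-overlapping intervals."""
    n = interval_ms * fs_hz // 1000
    return [sum(epoch[i:i + n]) / n for i in range(0, len(epoch) - n + 1, n)]


def significant_windows(p_values, alpha=0.05, min_run=3, interval_ms=20,
                        start_ms=EPOCH_START_MS):
    """(onset_ms, offset_ms) of runs of >= min_run consecutive intervals with p < alpha."""
    windows, run_start = [], None
    for i, p in enumerate(list(p_values) + [1.0]):  # sentinel closes a final run
        if p < alpha and run_start is None:
            run_start = i
        elif p >= alpha and run_start is not None:
            if i - run_start >= min_run:
                windows.append((start_ms + run_start * interval_ms,
                                start_ms + i * interval_ms))
            run_start = None
    return windows


epoch = [2.0] * 10 + [5.0] * 110          # 1200 ms at 100 Hz = 120 samples
means = interval_means(baseline_correct(epoch))
print(len(means))                           # 60

# Hypothetical p-values: a 15-interval run and an isolated 2-interval run
p = [0.5] * 20 + [0.01] * 15 + [0.5] * 5 + [0.01, 0.02] + [0.5] * 18
print(significant_windows(p))               # [(300, 600)]
```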
4. Results
4.1. Behavioral data
Table 4 presents participant means (and standard deviations) for RT and performance accuracy for each of the
experimental conditions. As can be seen from the accuracy
data, over 85% of all sentences were correctly classified as
either plausible or implausible, suggesting that participants
read the sentences carefully and made only a few mistakes in
their plausibility judgment. An ANOVA on participant RT
means with message-level constraint and lexico-semantic fit
as within-participant variables yielded a highly significant
interaction (F(1,22) = 24.47, MSe = 14,565, p < 0.001). Reaction times for strong constraint sentences with a good fit
final verb were significantly faster (by 111 ms) than for the
two weak constraint conditions taken together (t(22) =
2.68, p < 0.05). On the other hand, reaction times for
strong constraint sentences with a poor fit final verb were
significantly slower (by 124 ms) than for weak constraint
sentences (t(22) = 3.03, p < 0.01). The two conditions with
weak message-level constraint did not differ (p > 0.20).
Table 4
Average reaction time for correctly judged sentences (ms) and average
proportion correct as a function of lexico-semantic fit and message-level
constraint in the plausibility judgement task (standard deviations in
parentheses)
Condition                 RT-correct (ms)    Proportion correct
Good fit final word
  Strong constraint       716 (223)          0.95 (0.06)
  Weak constraint         834 (225)          0.89 (0.08)
Poor fit final word
  Strong constraint       960 (249)          0.83 (0.12)
  Weak constraint         829 (210)          0.91 (0.10)
RT-correct = RTs to correctly judged sentences.
Analyses on participant means for ‘‘proportion correct’’
revealed a very similar pattern. There was a highly significant interaction of message-level constraint and lexico-semantic fit (F(1,22) = 19.37, MSe = 0.01, p < 0.001). Strong
constraint sentences with a good fit final verb showed a
higher proportion correct (i.e., 0.95) than both weak constraint conditions taken together (i.e., 0.90; t(22) = 2.10,
p < 0.05), whereas responses to strong constraint sentences
with a poor fit final verb were more often incorrect (i.e.,
0.83) as compared to sentences with a weak message-level
constraint (i.e., 0.90; t(22) = 3.01, p < 0.01). Again, there
was no significant difference between the two weak constraint conditions (p > 0.20).
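The planned comparisons above contrast one condition with the two weak constraint conditions taken together. One common way to run such a comparison is a paired t-test on participant means with n - 1 degrees of freedom (22 for 23 participants). The Python sketch below does this under the assumption that "taken together" means averaging each participant's two weak constraint means. The data in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for per-participant means x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 participants")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def pooled(cond_a, cond_b):
    """Average two conditions within each participant."""
    return [(a + b) / 2 for a, b in zip(cond_a, cond_b)]


# Hypothetical per-participant mean RTs (ms) for four participants
sc_gf = [700, 720, 690, 760]
wc_gf = [820, 850, 800, 840]
wc_pf = [810, 830, 815, 850]
t, df = paired_t(pooled(wc_gf, wc_pf), sc_gf)
print(round(t, 2), df)
```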
4.2. ERP data
Fig. 1 shows the results of the statistical analyses on the
average ERP-amplitudes (in 20-ms intervals) time-locked to
the presentation of the final verb. As can be seen, interactions were predominantly found in the region 300–600
ms post-onset (i.e., the N400 time-window) and, unexpectedly, in approximately 600–1100 ms post-onset. As can be
seen from Fig. 2, which displays the averaged ERP
responses to the four conditions, the responses in these
two time-windows are quite different, as is the form taken
by the interaction. We will discuss these time-windows
separately, starting with the results of the interval analysis,
followed by the topographical analyses. For this latter
analysis, ERPs were averaged over the interval 300–600
ms post-onset for the analyses in the N400 time-window,
and the interval 700–1000 ms post-onset for the analyses of
the later time-window. Fig. 3 displays the difference waves
obtained by subtracting the ERPs to the correct strong
constraint – good fit condition from each of the other three
conditions, which might be especially helpful to understand
the pattern of results in the later time-window.
Fig. 2. Grand average ERP-waveforms as a function of message-level constraint and lexico-semantic fit.
4.2.1. Time-window 300– 600 ms: the N400 time-domain
The interval analysis (see Fig. 1) revealed significant
interactions of message-level constraint and lexico-semantic
fit at central, parietal, and occipital electrodes and at F4. Follow-up analyses showed that at the electrodes where an interaction was present, no significant differences in amplitude
could be observed between strong and weak constraint
sentences ending in lexico-semantically good fit final words
(e.g., ‘‘The javelin was by the athletes thrown’’ vs. ‘‘The
javelin has the athletes thrown’’); for final words that had a
lexico-semantically poor fit (e.g., summarized), however,
differences between strong and weak constraint sentences
were significant; sentences with a strong message-level
constraint (e.g., ‘‘The javelin was by the athletes summarized’’) were more negative than weak constraint sentences
(e.g., ‘‘The javelin has the athletes summarized’’). Fig. 2
provides a graphical display of the grand average ERP
waveforms for all four conditions.
A topographical analysis was conducted subsequently
and revealed a four-way interaction of constraint
(2) × lexico-semantic fit (2) × laterality (3) × posteriority
(5) that was marginally significant (F(8,176) = 1.95;
MSe = 2491; p = 0.08). Follow-up analyses were conducted
separately for each level of posteriority (prefrontal, frontal,
etc.). The three-way interaction constraint × lexico-semantic
fit × laterality did not reach significance anywhere except
(marginally) at the frontal electrodes (F(2,44) = 2.72;
MSe = 3013; p = 0.08), where the difference between strong
and weak constraint for good fit words was smallest at F4
(though this difference was not significant at any of these
electrodes), whereas the pattern for poor fit words did not
differ across the frontal electrodes. Thus, the effects making
up the interaction in the N400 time window appear not to be
lateralized except for an effect at the frontal electrodes.
Fig. 3. Difference waveforms obtained by subtracting the grand average ERP-waveforms of the Strong Constraint – Good Fit condition from the grand averages of each of the three other conditions.
4.2.2. Time-window 600– 1100 ms
The interval analysis showed significant interactions
between message-level constraint and lexico-semantic fit,
primarily at centro-parietal electrodes: C3, Cz, C4, P3, Pz,
P4, but also at Fz and O2 (see Fig. 1). Three sets of pairwise
comparisons were conducted to follow up on those interactions. In the first set of comparisons all conditions were
compared against the strong constraint sentences with a
good fit ending (see Fig. 3 for difference waves). The
positivity caused by the weak constraint – good fit condition
was significant at all electrodes where an interaction was
found; the weak constraint – poor fit condition was significantly more positive at all electrodes but C4. The positivity
of the strong constraint – poor fit condition was somewhat
less broadly distributed, as it reached significance at Fz, C3,
Cz, and P4.
The second set of comparisons showed that the effect
associated with the weak constraint –good fit condition was
significantly larger than the positivity elicited by the weak
constraint –poor fit condition (at Cz, C4, P3, Pz, P4, and
O2), and also significantly larger than the positivity in the
strong constraint –poor fit condition (at P3, Pz, P4, and O2).
The final comparison involved the weak constraint – poor fit
condition, which showed a numerically larger positivity than
the strong constraint – poor fit condition; this difference
was significant only at P3.
A subsequent topographical analysis revealed a four-way interaction of constraint (2) × lexico-semantic fit
(2) × laterality (3) × posteriority (5), which was highly
significant (F(8,176) = 4.10; MSe = 6353; p < 0.005). Follow-up analyses at each level of posteriority showed
(marginally) significant three-way interactions of constraint × fit × laterality at the prefrontal, frontal and central
level, indicating that at those electrodes the form of the
interaction between constraint and lexico-semantic fit differed with laterality (Prefrontal: F(2,44) = 3.11; MSe =
13047; p = 0.06; Frontal: F(2,44) = 3.28; MSe = 6718; p =
0.06; and Central: F(2,44) = 4.07; MSe = 8810; p < 0.05).
At none of the prefrontal electrodes did the constraint × lexico-semantic fit interaction reach significance. The
marginally significant three-way interaction at this level must have been caused by the fact that
at Fp1 and Fp2 the two poor fit conditions were actually
numerically more negative than the plausible condition, in
contrast to Fza, where all implausible conditions were
numerically more positive than the plausible condition.
At the frontal and the central level the pattern was
somewhat different. All electrodes showed positive effects
of the three implausible conditions as compared to the
plausible one, except for F4, where there were no significant differences, and C4, where only the weak
constraint – good fit condition produced a significant positivity. The absence of significant constraint × lexico-semantic fit × laterality interactions at the parietal and
the occipital level indicated that the form of the constraint × lexico-semantic fit interaction did not change with
laterality.
4.3. Summary of results
Significant interactions between message-level constraint
and lexico-semantic fit were found in two time-windows. In
the N400 time-window, from 300 until 600 ms post-onset,
the interactions were broadly distributed over electrodes
(though excluding most of the frontal ones), and were
brought about by the absence of a difference between
lexico-semantically good fit final verbs in strong and in
weak message-level constraint contexts, and the presence of
such a difference for poor fit verbs, with strong message-level constraint sentences being significantly more negative
than weak constraint sentences with these poor fit words.
In the second time-window, from 600 until 1100 ms post-onset, significant interactions between constraint and lexico-semantic fit were found, predominantly at centro-parietal
electrodes. These interactions were the consequence of the
fact that, at most of the electrodes concerned, all three
implausible conditions showed a significant positive shift
as compared to the plausible condition. This positive shift
was in general largest for the weak constraint – good fit
condition, while the two conditions with poor fit words had
a smaller positivity. When comparing the latter two conditions, the strong constraint –poor fit condition was almost
always numerically smaller than the weak constraint – poor fit
condition; a difference that reached significance, however,
only at P3. Topographical analysis showed that the positive
shift was slightly less pronounced on the right as compared to
the left side of the scalp, with the effects at central and frontal
electrodes on the right side (i.e., C4 and F4) being smaller and
mostly not significant.
5. Discussion
The aim of this experiment was to find out if and how
message-level information and lexico-semantic information
interact to affect the processing of an upcoming word during
on-line sentence comprehension. We obtained a significant
interaction of message-level constraint and lexico-semantic
fit in the N400 time-window (300–600 ms post-onset) and,
unexpectedly, also in a later time window (600–1100 ms
post-onset). The form of the interaction in the N400 time-window indicated that message-level constraint and lexico-semantic fit both must have had an impact on the processing
of upcoming words. First, we found a significant difference
in N400 amplitude between strong and weak message-level
constraint sentences containing lexico-semantically poor fit
words (e.g., summarized), reflecting the impact of message-level constraint on processing. Both poor fit conditions were
found to be significantly more negative than the correct (and
plausible) strong constraint – good fit condition, which was
what we had expected since both conditions were (equally)
semantically implausible (for plausibility ratings, see Table
3). The largest N400 was obtained for the poor fit word
following a strong constraint context, indicating that presenting a poor fit word in a context where another word is
highly expected has an effect on the N400, over and above
the N400 effect resulting from mere lexico-semantic fit.
Importantly, as can be clearly seen in Fig. 3, this additional
effect has the same distribution as the effect of lexico-semantic fit per se. This is an important finding, especially
in the light of earlier contentions [23,24] that sentence
constraint does not in itself influence the N400, but that
the ‘cloze probability’ of the following words crucially
determines N400 amplitude. As the cloze probability of
the final words is 0% in both conditions, this hypothesis is
not supported. The results from the two poor fit conditions
thus indicate that message-level constraint can make the
processing of an unexpected word more difficult, a finding
that is compatible with the message-based model proposed
by Morris [27].
However, the N400 elicited by lexico-semantically good
fit words did not seem to be affected by message-level
constraint at all. This is somewhat unexpected given the
effect of message-level constraint that was observed for the
poor fit words. It is also quite surprising given the fact that
in the weak constraint contexts lexico-semantically good fit
words were semantically anomalous (e.g., The javelin has
the athletes thrown), which we predicted would cause some
increase in N400 amplitude in this condition (see also Table
2). It thus appears that the N400 to the anomalous words in
the weak constraint – good fit sentences must have been
reduced (i.e., made less negative) through the operation
of lexico-semantic fit. Let us consider some possible
mechanisms.
For one thing, we know that the N400 to an anomalous
word is sometimes attenuated as a result of it being closely
lexico-semantically related to the most expected word in a
strong constraint sentence, as reported in the early work of
Kutas and Hillyard [23,24]. This kind of explanation seems
rather unlikely, however, as the sentences with weak message-level constraint generally do not have a most expected
word, and the words with a good lexico-semantic fit (e.g.,
thrown) are not even remotely related to any word that could
be a plausible continuation of the sentence (e.g., amazed).
So an explanation along the lines of ‘indirect’ semantic
priming, that is, via the most expected word, can be
effectively ruled out.
Alternatively, the N400 could have been reduced through
‘simple’ lexico-semantic priming between the final word
and one of the preceding content words, as suggested by
Van Petten [32,33]. However, the fact that we did not find
any difference between plausible and implausible good fit
words seems to go against Van Petten’s account, since she
claimed that message-level and lexico-semantic effects were
additive. It is difficult to see how the effect of lexico-semantics alone (i.e., in the weak constraint condition) could
equal the effect of message-level constraint plus plausibility
plus lexico-semantic fit (i.e., in the strong constraint condition). Rather, these results seem to be more compatible with
the combinatory version of lexico-semantic priming as put
forward by Duffy et al. [6], in which the two preceding
content words (e.g., javelin and athletes) together facilitate
the processing of the final verb. This latter model actually
predicts equal priming effects for sentences with the same
content words, irrespective of their syntactic structure or
message-level representation, which is the pattern seen here.
However, both the simple and the combinatory lexico-semantic accounts clearly fail to provide an explanation for
the pattern of results that was found for the poor fit words.
In summary, then, the present results strongly suggest
that both lexico-semantic fit and message-level constraint
have a significant impact on word processing, though the
mechanism by which they interact is not obvious. What we
do know is that the current set of results cannot be explained
by any account focusing solely on either of these sources of
information, such as the simple and the combinatory account of lexico-semantic priming, or the message-level
hypothesis (i.e., lexico-semantic facilitation will only occur
when consistent with the message-level representation). We
have also argued that the intuitively appealing ‘additive’
account suggested by Van Petten cannot be right, as message-level constraint did not seem to have an additive effect
in the two good fit conditions.
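The argument against the additive account can be made concrete with the 2 × 2 interaction contrast (SG - WG) - (SP - WP). Here S and W denote strong and weak message-level constraint, and G and P denote good and poor lexico-semantic fit. If constraint and fit contributed additively, the constraint effect would be the same for good and poor fit words, and the contrast would be zero. The Python sketch below shows this with hypothetical effect sizes. As a worked example, it also applies the contrast to the RT condition means in Table 4, where the interaction was reliable.

```python
def interaction_contrast(sg, wg, sp, wp):
    """(Constraint effect for good fit words) minus (constraint effect for poor fit words)."""
    return (sg - wg) - (sp - wp)


def additive_cells(baseline, strong_effect, poor_fit_effect):
    """Cell means (SG, WG, SP, WP) under a purely additive model (no interaction term)."""
    sg = baseline + strong_effect
    wg = baseline
    sp = baseline + strong_effect + poor_fit_effect
    wp = baseline + poor_fit_effect
    return sg, wg, sp, wp


# Hypothetical effects (arbitrary units): additivity implies a zero contrast
print(interaction_contrast(*additive_cells(0.0, -1.0, -3.0)))  # 0.0

# Table 4 mean RTs (ms): SG 716, WG 834, SP 960, WP 829
print(interaction_contrast(716, 834, 960, 829))  # -249
```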
One way to think about a solution is to retain the
message-based model, but with the extra assumption that
under some circumstances, the complete and correct message-level representation may not be available to influence
the processing of the upcoming word, but only an underspecified version of the correct representation (see, e.g.,
Ref. [30] for evidence on the use of underspecified representations in language understanding). For instance, in the
weak constraint sentences, it may be hard to arrive at a
coherent message-level representation because of the difficulty of correctly assigning thematic roles. The active
syntactic structure forces an inanimate entity (e.g., the
javelin) to be the agent/effector of the event described
and thus to do something to an animate entity (e.g., the
athletes), which contrasts with their ‘preferred’ thematic
roles (see Introduction).
We cannot be sure what such an underspecified message-level representation looks like, but it seems that it is not able to
eliminate semantic activations based on the combined
lexico-semantic information of the content words in the
sentence, and could be much like the semantic representations proposed by Duffy et al. [6]. Normally, this ‘basic’
semantic representation would have been processed further
into a specific message-level representation, apparently also
deactivating incompatible semantic features along the way
(cf. the results for the poor fit words). Thus, semantic
activations in the weak constraint – good fit sentence could
in fact be rather similar to those of the correct message-level
representation of the strong constraint – good fit sentence!
Or to put it differently, it is possible that for a very short
period of time (i.e., a few hundred milliseconds) the
processor reacts as if the implausible weak constraint – good
fit sentence were perfectly plausible, in which case one
could speak of a temporary semantic illusion. It seems
unlikely, given its proposed brevity, that this temporary
illusion will reach consciousness, quite analogous to processing difficulty in garden-path sentences, which reaches
awareness only in extreme cases. Thus, the message-based
model might be extended to explain the present findings by
assuming that strong and weak constraint sentences may temporarily have a very similar semantic representation and
thus have similar effects on the processing of incoming
words.
This extended model might also explain other findings in
the literature, such as Duffy et al.’s [6] failure to find
different outcomes in sentences that clearly differed in
syntactic structure, which was taken as evidence for the
combinatory lexical priming approach. Recall that the
priming effect at mustache (i.e., as compared to a neutral
control) did not differ between a sentence such as ‘‘While
she talked to him, the barber trimmed the mustache’’ and
the syntactically different ‘‘While talking to the barber, she
trimmed the mustache’’. The presence of the pronoun she,
referring to a person that has not been introduced and that
therefore does not have any real meaning, may make it
difficult to come up with a complete and sensible sentence
interpretation before reaching mustache. Now if a context
had been provided in which a specific referent for the ‘she’
had been mentioned, for instance a female gardener, this
would quite likely have had an impact on the priming effect
on mustache in the sentence where the ‘she’ is doing the
trimming. As shown by Morris, when a gardener is directly
introduced as the subject of the sentence, the presence of
barber and trimmed is not enough to prime mustache [27].
In addition, the model might explain why Fischler et al. ([9],
see also Ref. [19]) did not find an N400 at the end of
sentences such as A robin is not a bird (versus A robin is a
bird), as it seems likely that computation of the correct
message-level representation might have been rather difficult before the final word is read (e.g., A robin is not a).
The proposed model is also consistent with the results
from two recent ERP studies that investigated the processing of semantic anomaly [18,20]. Both studies failed to find
an N400 effect to a semantic anomaly, and both used critical
words that had good lexico-semantic fit but were semantically anomalous, in sentences with a problematic thematic
structure. Kuperberg et al. [20] did not find the expected
N400 effect at the critical final word in anomalous sentences
such as: ‘When she had a cold her TISSUE would
SNEEZE’ as compared to ‘When she had a cold the GIRL
would SNEEZE’. The subject NP of the main clause (e.g.,
tissue) is inanimate, which might cause thematic processing
problems (i.e., if the reader assumes it will play the part of
agent) and slow down the construction of a message-level
representation, while the verb sneeze has close semantic ties
to tissue. The results of the study by Kolk et al. [18] can be
explained in much the same manner. Interestingly, as in the
present experiment, both of the studies mentioned here
found a pronounced late positivity following the condition
that failed to show the N400 effect.
5.1. Late positivity
Besides the interaction in the N400 time-window, there
was a second pattern of interactions, occurring some 600–
1100 ms after the final word was presented. In comparison
to the strong constraint–good fit condition, all conditions
elicited a positive shift, which was particularly prominent
for the lexico-semantic good fit sentences with weak
message-level constraint. Given the predominantly centro-parietal scalp distribution of these differences, they might be
interpreted as involving the P600, the best-known ERP
component associated with sentence processing in this time-window, even though maximal amplitude was reached later
than in the ‘standard’ P600, which peaks at approximately 600 ms post-onset (e.g., Refs. [15,29]). P600 effects
have been shown to occur in sentences that are ungrammatical, but also in correct sentences with a non-preferred
syntactic structure (e.g., Refs. [15,29]), and have accordingly been argued to reflect (attempts at) syntactic reanalysis. In addition, other studies have indicated that the P600
might reflect processes of syntactic integration (e.g., Ref.
[17]). On the basis of this research, P600 amplitude is
typically viewed as reflecting some kind of effortful syntactic processing that is likely to occur in sentences that are
ungrammatical, syntactically ambiguous, or syntactically
complex.
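The late effect described above is defined relative to the strong constraint–good fit condition and located in a 600–1100 ms window. As a minimal sketch of how such a window effect can be quantified, assuming the common mean-amplitude measure (the measure itself is not restated in this section), the Python below subtracts a reference waveform from a condition waveform and averages the difference over the window. The function names, the 10 ms sampling interval, and the synthetic waveforms are hypothetical and do not reproduce the recorded data.

```python
import numpy as np


def window_mean_amplitude(erp, times_ms, start_ms, end_ms):
    """Mean voltage of a waveform over [start_ms, end_ms] (inclusive)."""
    erp = np.asarray(erp, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    if erp.shape != times_ms.shape:
        raise ValueError("erp and times_ms must have the same shape")
    in_window = (times_ms >= start_ms) & (times_ms <= end_ms)
    if not in_window.any():
        raise ValueError("no samples fall inside the requested window")
    return float(erp[in_window].mean())


def late_positivity_effect(condition_erp, reference_erp, times_ms,
                           start_ms=600.0, end_ms=1100.0):
    """Hypothetical helper: condition minus reference, averaged over a window.

    A positive value means the condition waveform is more positive than the
    reference (here assumed to be the strong constraint-good fit baseline).
    """
    difference = (np.asarray(condition_erp, dtype=float)
                  - np.asarray(reference_erp, dtype=float))
    return window_mean_amplitude(difference, times_ms, start_ms, end_ms)


# Synthetic example: 10 ms sampling from -100 to 1490 ms (made-up values).
times = np.arange(-100, 1500, 10)
reference = np.zeros(times.shape)                                  # baseline condition
condition = np.where((times >= 600) & (times <= 1100), 2.0, 0.0)   # +2 µV late shift

effect = late_positivity_effect(condition, reference, times)
early = late_positivity_effect(condition, reference, times, 300, 500)
print(f"600-1100 ms effect: {effect:.2f} uV; 300-500 ms: {early:.2f} uV")
```

With these toy inputs the 600–1100 ms effect is 2 µV and the earlier 300–500 ms window shows no difference. In a real analysis the same measure would be computed per participant and electrode before any statistical testing.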
In the present experiment, however, the sentences are not
ungrammatical, and the most likely cause for the observed
processing difficulty is not syntactic ambiguity or syntactic
complexity, but, rather, thematic processing difficulty. For
instance, the largest positivities (at least numerically) were
observed in the two weak constraint conditions, where there
is a conflict between the thematic role assignment prescribed by the grammatical structure (e.g., javelin is agent/
effector and athletes are patient/experiencer) and the preferred assignment of thematic roles (e.g., javelin is theme
and athletes are agent). The largest positive shift was
elicited by the weak constraint–good fit condition, which
might indicate that readers put considerable extra effort into
syntactic (re-)analysis of the sentences in this condition as
there is a highly plausible alternative thematic structure
(e.g., the athletes throw the javelin) competing with the
syntax-based thematic structure that leads to an implausible
interpretation (e.g., the javelin throws the athletes). In the
weak constraint–poor fit condition, on the other hand, there
may be problems with thematic role assignment, but in the
absence of a viable thematic alternative the effort put into
syntactic (re)processing in order to make sense of these
implausible sentences may be much more limited.
Finally, the positive shift for the strong constraint–poor
fit condition was numerically the smallest but still
significant (as compared to the one plausible condition),
presumably reflecting the effortful syntactic processing
that is put into place when a reader tries to make sense
of an implausible sentence, even if there is no conflict
between prescribed and preferred thematic roles. Thus, the
present results are compatible with the hypothesis that a
P600 may be elicited in syntactically correct sentences by
processing problems originating from semantic or thematic
processing difficulties (e.g., Refs. [16,18,20], but see also
Refs. [14,11]).
We should, however, consider some alternative interpretations of the late positive effect. It could be argued that the
effect in this late time-window is actually a P300 elicited by
the plausibility decision that was required for every sentence
in the experiment. The P300 is very sensitive to task-relevant factors and its size correlates closely with task
difficulty. Under this interpretation of the late positivity,
the largest effect should be observed in the conditions where it is
most difficult to decide whether the sentence is plausible.
As can be seen in Table 4, this is not the case: the most
difficult condition in terms of increased RT and decreased
proportion correct was the strong constraint–poor fit condition, which actually showed the smallest positivity. On a
speculative note, the requirement to make a decision per se
may have given rise to the generally observable (slow)
positive shift at the end of the sentence for all conditions,
including the correct one.
In a similar vein, it could be argued that the specific
pattern of results in the later time window arises from motor
responses, or the preparation of motor responses, associated with the plausibility decision. This possibility was not supported
by post-hoc analyses of the present data, which revealed no differences between waveforms at corresponding left
and right electrode sites, whereas such asymmetries are typically found when
participants prepare to make a motor response (the lateralized
readiness potential; see, e.g., Ref. [22]). Thus, it seems
unlikely that the preparation or execution of motor
responses was responsible for the waveform pattern in the
later time-window.
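The post-hoc check described above compares waveforms at corresponding left and right electrode sites. Below is a minimal sketch of one way to run such a comparison, under stated assumptions: per-participant average waveforms at a left site and its right homologue (the C3/C4 pair is a hypothetical choice), the mean left-minus-right difference in the 600–1100 ms window, and a one-sample t statistic across participants. This is a simplified asymmetry test, not the double-subtraction procedure used to derive the lateralized readiness potential proper (which contrasts left- and right-hand responses), and not necessarily the analysis the authors ran. The function name and the synthetic data are illustrative only.

```python
import numpy as np


def asymmetry_test(left, right, times_ms, start_ms=600.0, end_ms=1100.0):
    """Hypothetical helper: left-minus-right mean amplitude and one-sample t.

    left, right: arrays of shape (n_participants, n_samples) holding each
    participant's average waveform at a left site and at its right-hemisphere
    homologue. Returns (mean difference in µV, t statistic with n - 1 df).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    times = np.asarray(times_ms, dtype=float)
    in_window = (times >= start_ms) & (times <= end_ms)
    # One value per participant: mean left-right difference inside the window.
    diffs = (left[:, in_window] - right[:, in_window]).mean(axis=1)
    n = diffs.size
    mean_diff = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(n)   # standard error of the mean
    return float(mean_diff), float(mean_diff / se)


# Synthetic data for a hypothetical C3/C4 pair: 24 participants, 10 ms sampling.
rng = np.random.default_rng(0)
times = np.arange(-100, 1500, 10)
shared = np.where(times >= 600, 4.0, 0.0)            # same late positivity on both sides
left_sym = shared + rng.normal(0, 1, (24, times.size))
right_sym = shared + rng.normal(0, 1, (24, times.size))
left_lat = shared + 3.0 + rng.normal(0, 1, (24, times.size))  # left site 3 µV more positive

mean_sym, t_sym = asymmetry_test(left_sym, right_sym, times)
mean_lat, t_lat = asymmetry_test(left_lat, right_sym, times)
print(f"symmetric:   {mean_sym:.2f} uV (t = {t_sym:.2f})")
print(f"lateralized: {mean_lat:.2f} uV (t = {t_lat:.2f})")
```

The symmetric case yields a mean difference close to zero, whereas the 3 µV offset built into the second case yields a large t statistic; an absent motor-related asymmetry corresponds to the first pattern.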
In addition to the arguments given above, the possibility
of decision or motor related processes interfering with the
ERP results in the late positivity time-window has been
explicitly addressed in the Kolk et al. study [18]. When they
replicated their experiment without an added task, they
found identical results, indicating that neither the absence
of an N400 effect nor the presence of the late positive effect
could be ascribed to task-related processes. Nevertheless, it
is clear that more research is needed to verify whether the
late positivity that was found in the present experiment
actually belongs to the family of the P600, and whether it
actually reflects effortful syntactic processing.
A final potential concern regarding the interpretation of
the late positive effects comes from the difference in the
percentage of correct sentences between the active weak
constraint sentences (66%) and the passive strong
constraint sentences (50%; see Materials section). The
somewhat smaller probability of encountering an implausible active sentence might have given rise to a larger
positivity for the experimental active sentences, which were
all implausible. This positivity would then reflect a P300,
which has been shown to be sensitive to (relatively large)
differences in probability of irregular stimuli, rather than a
P600 sensitive to syntactic processing (see Ref. [5], but also
Ref. [28]). At first sight, the pattern of results in the late time-window
appears compatible with an explanation in terms of
differing proportions. The active weak constraint sentences
were clearly more positive than the passive strong constraint
sentences containing good fit words, and there were also
differences between active and passive sentences containing
poor fit words (though significant only at P3). However, the
whole pattern of results in this late time-window appears to
be too complex to be caused by a small difference in
proportions.
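Assuming that ‘correct’ in the proportions given above means ‘plausible’, the P300 account rests on the following probabilities of encountering an implausible sentence in each structure:

```latex
\begin{aligned}
P(\text{implausible} \mid \text{active, weak constraint}) &= 1 - 0.66 = 0.34,\\
P(\text{implausible} \mid \text{passive, strong constraint}) &= 1 - 0.50 = 0.50.
\end{aligned}
```

Implausible sentences were thus rarer among the active sentences (roughly one in three versus one in two), a difference of 16 percentage points. This is the small difference in proportions that the argument above judges insufficient to explain the full pattern of late effects.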
In summary, then, it seems reasonable to interpret the
positive shift as being a P600, reflecting effortful syntactic
processing in order to obtain a semantically coherent and
plausible sentence.
5.2. Conclusion
The present experiment was designed to investigate the
interaction between message-level constraints on the one
hand, and lexico-semantic information on the other during
the processing of upcoming words. We have shown that
both kinds of information are important in processing
an upcoming word, but that they do not combine additively. Lexico-semantic information was presumed to be responsible for
the absence of an N400 effect in the comparison between
strong constraint and weak constraint sentences with good
fit words, even though these words were anomalous in the
weak constraint context. In contrast, the highly specific
message-level information in the strong constraint sentences primarily affected the processing of unrelated poor fit
words, increasing N400 amplitude over and above the
effect of semantic anomaly per se. This specific pattern of
results was argued to be incompatible with current models
of sentence processing, and an extension of the message-level hypothesis was suggested, in which the correct
sentence interpretation may sometimes be temporarily
unavailable to influence the processing
of the upcoming word, so that an underspecified, highly schematic
version of it may have to be used instead. Although the
sentence eventually receives its correct interpretation, there
appears to be a temporary semantic illusion of plausibility
when the final word has a good lexico-semantic fit. We
have suggested that the late positive component (most
probably a P600) that was found in this experiment
reflected the effortful syntactic processing invested in
trying to make sense of an implausible sentence. It was
also suggested that this effort will be maximal if there is a
highly plausible alternative thematic structure (e.g., the
athletes throw the javelin) competing with the syntax-based thematic structure that leads to the implausible
interpretation (e.g., the javelin throws the athletes). Ongoing research in our laboratory is aimed at clarifying
under which conditions an underspecified version of the
correct message-level representation will be used, focusing on the time course of constructing a correct and coherent sentence
representation.
Acknowledgements
This research was partly funded by a ‘PIONIER’ grant
from the Dutch Organization for Scientific Research (NWO)
for the project ‘The Neurological Basis of Language’
awarded to Dr. Laurie A. Stowe. We would like to thank
Berry Wijers and David Atkinson for helpful remarks, Joop
Clots, Hein van Schie, and Hans Veldman for technical
assistance, Ingeborg Prinsen for help with the graphics, and
Thom Gunter for allowing us to use his high cloze
probability materials. Materials used in this study are
available online via http://odur.let.rug.nl/~hoeks/BRESC2003_mat.pdf.
References
[1] H.R. Baayen, R. Piepenbrock, H. Van Rijn, The CELEX lexical database (CD-ROM), Linguistic Data Consortium, Philadelphia, PA, 1993.
[2] S. Bentin, M. Kutas, S.A. Hillyard, Electrophysiological evidence for task effects on semantic priming in auditory word processing, Psychophysiology 30 (1993) 161–169.
[3] C.M. Brown, P. Hagoort, The processing nature of the N400: evidence from masked priming, J. Cogn. Neurosci. 5 (1993) 34–44.
[4] C.M. Brown, P. Hagoort, D.J. Chwilla, An event-related brain potential analysis of visual word priming effects, Brain Lang. 72 (2000) 158–190.
[5] S. Coulson, J.W. King, M. Kutas, Expect the unexpected: event-related brain responses to morphosyntactic violations, Lang. Cogn. Processes 13 (1998) 21–58.
[6] S.A. Duffy, J.M. Henderson, R.K. Morris, Semantic facilitation of lexical access during sentence processing, J. Exper. Psychol., Learn., Mem., Cogn. 15 (1989) 791–801.
[7] K.D. Federmeier, M. Kutas, A rose by any other name: long-term memory structure and sentence processing, J. Mem. Lang. 41 (1999) 469–495.
[8] C.J. Fillmore, The case for case, in: E. Bach, R.T. Harms (Eds.), Universals in Linguistic Theory, Holt, Rinehart, and Winston, New York, 1968, pp. 1–88.
[9] I. Fischler, P.A. Bloom, D.G. Childers, S.E. Roucos, N.W. Perry Jr., Brain potentials related to stages of sentence verification, Psychophysiology 20 (1983) 400–409.
[10] A.D. Friederici, K. Steinhauer, S. Frisch, Lexical integration: sequential effects of syntactic and semantic information, Mem. Cogn. 27 (1999) 438–453.
[11] S. Frisch, M. Schlesewsky, The N400 reflects problems of thematic hierarchizing, NeuroReport 12 (2001) 3391–3394.
[12] J.S. Gruber, Lexical Structures in Syntax and Semantics, North-Holland, Amsterdam, 1976.
[13] T.C. Gunter, L.A. Stowe, G. Mulder, When syntax meets semantics, Psychophysiology 34 (1997) 660–676.
[14] T.C. Gunter, A.D. Friederici, H. Schriefers, Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction, J. Cogn. Neurosci. 12 (2000) 556–568.
[15] P. Hagoort, C.M. Brown, J. Groothusen, The Syntactic Positive Shift (SPS) as an ERP measure of syntactic processing, Lang. Cogn. Processes 8 (1993) 439–483.
[16] A. Hahne, J.D. Jescheniak, What’s left if the Jabberwock gets the semantics? An ERP investigation into semantic and syntactic processes during auditory sentence comprehension, Cogn. Brain Res. 11 (2001) 199–212.
[17] E. Kaan, A. Harris, E. Gibson, P.J. Holcomb, The P600 as an index of syntactic integration difficulty, Lang. Cogn. Processes 15 (2) (2000) 159–201.
[18] H.H.J. Kolk, D.J. Chwilla, M. van Herten, P.J.W. Oor, Structure and limited capacity in verbal working memory: a study with event-related potentials, Brain Lang. 85 (2003) 1–36.
[19] J. Kounios, P.J. Holcomb, Structure and process in semantic memory: evidence from event-related brain potentials and reaction times, J. Exper. Psychol., Gen. 121 (1992) 459–479.
[20] G.R. Kuperberg, T. Sitnikova, D. Caplan, P. Holcomb, Electrophysiological distinctions in processing conceptual relationships within simple sentences, Cogn. Brain Res. 17 (2003) 117–129.
[21] M. Kutas, In the company of other words: electrophysiological evidence for single-word and sentence context effects, Lang. Cogn. Processes 8 (1993) 533–572.
[22] M. Kutas, E. Donchin, Preparation to respond as manifested by movement-related brain potentials, Brain Res. (1980) 95–115.
[23] M. Kutas, S.A. Hillyard, Brain potentials during reading reflect word expectancy and semantic association, Nature (1984) 161–163.
[24] M. Kutas, T.E. Lindamood, S.A. Hillyard, Word expectancy and event-related brain potentials during sentence processing, in: S. Kornblum, J. Requin (Eds.), Preparatory States and Processes, Erlbaum, Hillsdale, NJ, 1984, pp. 217–237.
[25] M. Lamers, Sentence processing: using syntactic, semantic, and thematic information, Unpublished doctoral thesis, University of Groningen, The Netherlands, 2001.
[26] D.E. Meyer, R.W. Schvaneveldt, Facilitation in recognising pairs of words: evidence of a dependence between retrieval operations, J. Exp. Psychol. 90 (1971) 227–235.
[27] R.K. Morris, Lexical and message-level sentence context effects on fixation times in reading, J. Exper. Psychol., Learn., Mem., Cogn. 20 (1994) 92–103.
[28] L. Osterhout, P. Hagoort, A superficial resemblance doesn’t necessarily mean you’re part of the family: counterarguments to Coulson, King, and Kutas in the P600/SPS debate, Lang. Cogn. Processes 14 (1999) 1–14.
[29] L. Osterhout, P.J. Holcomb, Event-related brain potentials elicited by syntactic anomaly, J. Mem. Lang. 31 (1992) 785–806.
[30] A.J. Sanford, P. Sturt, Depth of processing in language comprehension: not noticing the evidence, Trends Cogn. Sci. 6 (2002) 382–386.
[31] J. Stevens, Applied Multivariate Statistics for the Social Sciences, Erlbaum, Hillsdale, NJ, 1992.
[32] C. Van Petten, A comparison of lexical and sentence-level context effects and their temporal parameters, Lang. Cogn. Processes 8 (1993) 485–532.
[33] C. Van Petten, Words and sentences: event-related brain potential measures, Psychophysiology 32 (1995) 511–525.
[34] J.N. Williams, Constraints upon semantic activation during sentence comprehension, Lang. Cogn. Processes 3 (1988) 165–206.