Perception, 2007, volume 36, pages 49-74
DOI:10.1068/p5582
Visual binding of English and Chinese word parts is limited
to low temporal frequencies
Alex O Holcombe
School of Psychology, Tower Building, Park Place, Cardiff University, Cardiff CF10 3AT, Wales, UK;
and Department of Psychology, University of California at San Diego, La Jolla, CA 92093, USA;
Jeff Judson
Department of Psychology, University of California at San Diego, La Jolla, CA 92093, USA
Received 19 August 2005, in revised form 26 January 2006; published online 5 January 2007
Abstract. Some perceptual mechanisms manifest high temporal precision, allowing reports of
visual information even when that information is restricted to windows smaller than 50 ms.
Other visual judgments are limited to much coarser time scales. What about visual information
extracted at late processing stages, for which we nonetheless have perceptual expertise, such as
words? Here, the temporal limits on binding together visual word parts were investigated. In one
trial, either the word 'ball' was alternated with 'deck', or 'dell' was alternated with 'back', with
all stimuli presented at fixation. These stimuli restrict the time scale of the word identities because
the two sets of alternating words form the same image at high alternation frequencies. Observers
made a forced choice between the two alternatives. Resulting 75% thresholds are restricted to
5 Hz or less for words and nonword letter strings. A similar result was obtained in an analogous
experiment with Chinese participants viewing alternating Chinese characters. These results support
the theory that explicit perceptual access to visual information extracted at late stages is limited
to coarse time scales.
1 Introduction
Physiological recordings show that initial coding in the visual system provides reliable
information at fine temporal scales, with neurons in primary visual cortex following
flicker frequencies well over 60 Hz (Gur and Snodderly 1997). Frequencies much higher
than this are completely lost to perception, but still an impressive degree of precision
is preserved in some cases. For example, when viewing a periodic pattern of bars
(a sinusoidal grating) moving within a window at 20 Hz, humans can perceive its direction of motion (Burr and Ross 1982) and depth (Morgan and Castet 1995). This occurs
despite the fact that for 20 Hz gratings, temporal integration over 50 ms or more yields
a uniform field or bars at every position, totally obliterating the motion and depth
information. The fact that we nonetheless perceive motion and depth at these rates
implies that the mechanisms underlying our perception of motion and depth operate at
a time scale better than 50 ms.
This high temporal precision is not restricted to perception of individual features:
the pairing of colour with orientation can also be perceived with over 20 Hz stimulus
alternation rates (Holcombe and Cavanagh 2001). In that experiment, participants
viewed a display rapidly alternating between a red leftward-tilted pattern and a green
rightward-tilted pattern. In other trials, the feature pairing was reversed, with green leftward tilt alternating with red rightward tilt. If the representation of the individual
features at the point of binding had a precision worse than 50 ms, or if the binding
process integrated over a period longer than 50 ms, then participants would not have
been able to distinguish the feature pairings of these stimuli. The fast threshold found
imposes strong constraints on the neural computations underlying the binding process, as do more recent studies of the binding of distributed shape elements (Clifford
et al 2004). The results of these latter studies reveal that scattered dot pairs are also
bound together with high, 20 Hz, temporal precision when they form certain regular
global patterns such as spirals and concentric circles (Clifford et al 2004). Hence, the
visual system is capable of combining spatially disparate features into global form with
a precision better than 50 ms. Mechanisms for symmetry perception may have similarly
impressive temporal resolution (Tyler et al 1995).
Interestingly, aside from the cases when disparate elements form certain global
patterns, the pairing of disparate elements generally appears to be limited to low alternation rates. Recall that colour and orientation can be bound at a temporal precision
better than 20 Hz when the features are spatially superposed. When these same
features are spatially separated, their pairing can only be perceived at about 3 Hz or
less (Holcombe and Cavanagh 2001; demonstration available from http://viperlib.com
or author AOH's website), suggesting temporal precision worse than 100 ms.
The contrast between some visual binding judgments that reflect a precision better
than 20 Hz and others limited to several hertz or worse sharpens some important
issues in temporal processing. In particular, at high temporal frequencies, what prevents
perceptual binding for some features but not others?
The perceptual experience of high-speed visual binding may hold a clue. Consider
the experience of a red and a green grating, perpendicular to each other, alternating at slow rates. One
experiences first one pattern, then the other, and so on. However, when the two gratings
are alternated at a rate faster than about 6-8 Hz, they no longer seem to be experienced as individuals. Instead, although one is aware that the stimuli are flickering,
the gratings seem to be experienced simultaneously, as if they were both continuously
available to cognition (Holcombe 2001; Holcombe and Cavanagh 2001; demonstrations
may be viewed at http://viperlib.org and AOH's website). This 'temporal transparency'
phenomenon (Holcombe 2001) also occurs with rapid alternation of Glass dot patterns
(Clifford et al 2004). Temporal transparency is accompanied by a loss of functional
access to the two stimuli as individuals (Holcombe 2001; Clifford et al 2004).
These phenomena suggest that, by the time visual signals reach awareness, they
have been combined over long intervals, preventing explicit judgments of fine time
scales. However, even when visual awareness does not seem to follow rapid alternations
(Holcombe 2001), not all high-frequency information is lost. Certain aspects of the visual
world, such as the identity of certain global forms and the pairing of local colour and
orientation, are extracted before awareness by mechanisms with high temporal resolution.
Those aspects that are not extracted early on, such as the pairing of spatially disparate
colour and orientation, apparently cannot be perceived at fast alternation rates. The
notion of a substantial loss in temporal resolution prior to cognitive stages suggests
that visual representations from a range of times are combined, as if squeezed together
through a bottleneck, to arrive as one to cognition and consciousness. To fully appreciate this theory, it is important to understand why low temporal resolution at cognition
and consciousness nevertheless does not preclude fine-time-scale access to all aspects
of the visual world. In the case of global form, the global-form mechanisms may
extract the presence of a left spiral and a right spiral at fine time scales, effectively
demodulating this aspect of the stimuli by sending constant labels on to higher levels,
signaling the presence of a left spiral and a right spiral. But in perceptual experience,
these signals have been combined, yielding the awareness of the presence of a left spiral
and a right spiral, but losing the representation of which occurred when. Conversely, if the global-form mechanism did not have high temporal resolution, the observer could not perceive
the spirals at all, for the dots from the successive patterns would be combined before the
global-form mechanism operated (Clifford et al 2004).
With visual information apparently integrated over a long interval for explicit perception, an obvious question is where in the visual system this integration occurs. Besides revealing
a fundamental aspect of the architecture of the visual system, the answer
would help to identify the neural correlates of visual awareness. The evidence so far
points to a site somewhere between the extrastriate areas responsible for global-form
extraction and the unknown areas that putatively correspond to the visual awareness
of successive events. This localisation follows the assumptions of the popular quest for the
neural correlates of consciousness (eg Crick and Koch 1995, 2003). But even if it is a
mistake to speak of brain areas corresponding to visual awareness, understanding the
bottleneck will nonetheless place informative constraints on future, perhaps more sophisticated, theories of visual awareness. In this paper, we seek to further localise the
aspects of perception for which substantial loss of resolution occurs by measuring
the visual time scale at which humans can access the identity of letter strings, English
words, and Chinese characters.
Linguistic material was chosen for three reasons. First, visual words and letter
strings are encoded very late in the visual processing hierarchy, apparently in the
temporal lobe (McCandliss et al 2003). If late visual mechanisms are generally limited
to coarse time scales, word perception should certainly show this limitation. A second
reason for choosing words is that, if any high-level visual judgment were to occur at
fine time scales, the recognition of words should be a top candidate. Words are processed automatically by adult human readers (Stroop 1935), which is no doubt related
to the decades of daily experience most adults have with reading. This has further led to
some unitisation in the processing of words (Tao et al 1997). A final advantage of
words is that the way they are processed in normal reading has some resemblance to
the way they are presented in the laboratory experiment. Indeed, most theories of
reading suggest that only one or two words are processed per fixation in the rapid
series of fixations made during normal reading (Brysbaert and Vitu 1998; Inhoff et al
2000; Rayner et al 2004). Potter and her colleagues have reported numerous experiments with rapid presentation of words. They find that even at presentation rates of
12 words per second, participants can recall most of the words when together they
form a sentence (Potter 1993), which might suggest high temporal resolution. However,
these experiments were not designed to isolate the duration of processing of individual
words. For example, in these rapid serial-presentation experiments, temporal integration of a word with the preceding and following words does not completely obliterate
the cues to the word's identity.
From the present point of view, the masked priming literature (Kinoshita and
Lupker 2003) suffers from the same limitation as the rapid serial-presentation
work. The finding of semantic priming from very briefly presented words does suggest
that word binding and processing are extremely rapid. However, such results do not
provide good constraints on the temporal precision of the binding of letters into a
word. In such experiments, typically a word is presented for a few dozen milliseconds
before a patterned post-mask is presented. In this paradigm, considerable imprecision
in the relative processing time of individual letters and binding of the letters together
might still yield semantic activation. For example, even if letter units are activated at
substantially different times, there might still be no ambiguity about which letters are to be
bound together, since no other letters are presented. Hence, the binding mechanism
could potentially integrate over a much longer period than the actual presentation time
of the word.
Temporal limits on word binding have not been previously investigated, but in a
number of studies binding errors have been revealed by brief, simultaneous presentation
of multiple words or letter strings. Mozer (1983) presented pairs of four-letter words for a
few hundred milliseconds and followed them with a post-mask. Letter migration errors
sometimes resulted. For example, when 'line' and 'lace' were presented, occasionally
participants reported seeing 'lice' or 'lane'. These errors occur more frequently than would
be expected by guessing (McClelland and Mozer 1986; Treisman and Souther 1986).
Harris and her colleagues showed that rapid presentation of letter strings can lead to
repetition blindness for letter strings, words, and parts of words (Harris and Morris
2001). The patterns of letter migrations and repetition blindness that occur in these
conditions are used for deciding among the various proposed models of how letter
strings are represented and how reading is accomplished (Davis and Bowers 2004).
Our main interest, the relationship of binding limits for linguistic material to those
for other visual materials, does not seem to have been directly investigated by anyone.
However, the reverse-hierarchy theory of perception by Hochstein and Ahissar (2002)
may appear relevant. It suggests that, when we view a scene, our initial conscious percept
reflects high-level visual processing, such as the identity of faces and words, whereas
perception of visual details, such as individual orientations, is slower and requires feedback from higher levels of the cortex. This reverse-hierarchy theory might be taken to
imply that visual characteristics extracted at later cortical stages, such as the identity
of a word, are processed more rapidly and have higher temporal resolution than those
extracted at earlier stages.
However, the results for which Hochstein and Ahissar's theory was developed do
not directly address this issue. The theory is largely based on patterns of response times
in visual-search experiments, which indicate that search for high-level properties, such
as object category, can be faster than search for more basic features. The present
interest is in the processing of an individual item rather than the apparently simultaneous processing of the many items of a search array. Furthermore, in visual search,
the temporal precision of processing is unlikely to be the most important factor determining reaction time (RT). Search RTs are critically dependent on factors such as the
heterogeneity of items in the display (Duncan and Humphreys 1989), and whether
the items are processed in parallel (Egeth et al 1972). Hence, the results of visual-search
experiments may reveal the characteristics of initial perception when viewing a cluttered
scene without indicating the temporal resolution of the processing of an individual
attended item.
Thorpe and his colleagues (Thorpe et al 1996; Fabre-Thorpe et al 2001; VanRullen
and Thorpe 2001; Rousselet et al 2002) have measured the speed of high-level visual
discrimination using evoked potential latencies. In one experiment, evoked potentials
recorded at the scalp discriminated scenes containing an animal from those that did
not. These potentials had latencies as short as 150 ms, suggesting fine temporal resolution
for scene and animal processing. However, follow-up work found that the latency of the
discriminating signal is quite variable, ranging from 150 ms to 300 ms (Johnson and
Olshausen 2003). Thus, processing of the various features involved in these visual discriminations may yet be temporally imprecise, despite the impressive overall performance.
To address the temporal precision of high-level visual processing more directly, we use
a task in which the participants must bind together linguistic features presented at the
same time: either the two halves of a four-letter string or, in the Chinese experiment,
two halves of a Chinese character. If word or letter-string binding mechanisms have
temporal properties similar to those for motion, global form, spatially superposed colour
and orientation, flicker, or stereopsis (Morgan and Castet 1995), thresholds should
be better than 15 Hz. On the other hand, if the task relies on slower, perhaps more
central mechanisms, with temporal characteristics like those of binding spatially separated colour and orientation features, we can expect that accurate performance will
occur only for slow rates (less than 8 Hz). As it turns out, threshold rates for binding
the linguistic materials do fall into this latter slow category. Different stimulus conditions were used for this investigation: Chinese characters, words, pseudowords, and
mostly unpronounceable nonwords. Although most studies that use varying linguistic
conditions such as these are designed to isolate the effect of various linguistic factors on
small differences in performance (eg Murray and Forster 2004), this study is an exception.
Here, the interest was not in small differences in performance, but rather only in any
dramatic differences in temporal threshold, essentially whether each of the stimulus
categories yielded slow (≤8 Hz) or fast (≥15 Hz) thresholds. Revealing smaller differences due to particular linguistic factors would require a much larger set of stimuli,
carefully balanced in a way that would be very difficult given the constraints imposed
by the paradigm (described below). As it turns out, a slow threshold is found in each
condition. Given this, the use of the varying conditions provides confidence that the
slow threshold is not an accident of the particular items chosen, as would be a worry
if only one condition were used.
Figure 1 schematises an example trial in which the word 'ball', presented at fixation,
is alternated with the word 'deck', also presented at fixation. In the experiment, this
stimulus train is paired with another in which 'dell' is alternated with 'back'. The utility
of using these particular words is seen when alternating the two sets of words at very
high rates. At rates exceeding the temporal resolution of the visual system, observers
will perceive the sum, which is identical for the two pairs (figure 1).
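The property that makes this design work can be checked mechanically for every stimulus set in Table 1. The Python sketch below is only an approximation: it assumes that the summed image can be stood in for by the multiset of letters occupying each of the four positions across the two alternating strings, whereas real summed images are luminance patterns. The names `pooled_letters`, `sums_match` and `all_match` are ours, for illustration.

```python
from collections import Counter

# Stimulus sets transcribed from Table 1: (A frame 1, A frame 2, B frame 1, B frame 2).
WORDS = [
    ("ball", "deck", "back", "dell"), ("pump", "hell", "pull", "hemp"),
    ("dent", "pull", "dell", "punt"), ("tank", "mope", "tape", "monk"),
    ("ball", "hunk", "bank", "hull"), ("pink", "jump", "pimp", "junk"),
    ("tame", "lick", "tack", "lime"), ("pike", "jock", "pick", "joke"),
    ("call", "port", "cart", "poll"), ("milk", "fore", "mire", "folk"),
]
NONWORDS = [
    ("llba", "ckde", "llde", "ckba"), ("mppu", "llhe", "mphe", "llpu"),
    ("ntde", "llpu", "ntpu", "llde"), ("nkta", "pemo", "nkmo", "peta"),
    ("llba", "nkhu", "llhu", "nkba"), ("nkpi", "mpju", "mppi", "nkju"),
    ("meta", "ckli", "ckta", "meli"), ("kepi", "ckjo", "ckpi", "kejo"),
    ("llca", "rtpo", "rtca", "llpo"), ("lkmi", "refo", "remi", "lkfo"),
]
PSEUDOWORDS = [
    ("rane", "moll", "rall", "mone"), ("runk", "pote", "rute", "ponk"),
    ("lort", "hent", "lont", "hert"), ("pime", "jurt", "pirt", "jume"),
    ("dert", "jomp", "demp", "jort"), ("hape", "miks", "haks", "mipe"),
    ("zate", "koms", "zams", "kote"), ("nive", "lank", "nink", "lave"),
    ("paln", "bisk", "pask", "biln"), ("ferm", "hont", "fent", "horm"),
]


def pooled_letters(first, second):
    """Multiset of letters at each position, pooled over two alternating strings."""
    return [Counter(letters) for letters in zip(first, second)]


def sums_match(a1, a2, b1, b2):
    """True if pairs A and B place the same letters at every position."""
    return pooled_letters(a1, a2) == pooled_letters(b1, b2)


def all_match(condition):
    """True if every stimulus set in a condition has matching sums."""
    return all(sums_match(*stimulus_set) for stimulus_set in condition)


if __name__ == "__main__":
    for name, condition in [("words", WORDS), ("nonwords", NONWORDS),
                            ("pseudowords", PSEUDOWORDS)]:
        print(name, all_match(condition))
```

All thirty sets pass, whereas an arbitrary pairing such as 'ball'/'deck' against 'back'/'pull' does not; under this proxy, only the temporal order of the letters, not the summed content, distinguishes the two alternatives.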
[Figure 1: alternating stimuli for pairs A and B, their common sum, and the two-alternative forced choice between A and B.]
Figure 1. A schematic representation of the possible stimuli for a trial in the English words
experiment and the Chinese character experiment. The top half of the figure corresponds to a
trial in which the participant was informed that either 'ball' and 'deck' would be presented in
rapid alternation at the fixation point, or 'dell' and 'back' would alternate. Because the sums of
these word pairs are the same, participants could discriminate between the two alternatives only
when the presentation rate was slow enough for word perception mechanisms to individuate the
two temporal intervals.
Thanks to the identical sums of the two stimuli, this presentation method systematically limits the binding information to a particular temporal interval. Varying the
alternation rate then varies this temporal interval, allowing an estimate of the resolution of the system. The method exploits the fact that all mechanisms, including those
underlying word recognition, have a non-zero interval over which they integrate or,
loosely, a temporal resolution. In other words, when two inputs to a mechanism are
alternated at a sufficiently high rate, the mechanism will passively integrate them as if
they were presented at the same time.
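The passive-integration argument can be illustrated with a toy model. The sketch below is hypothetical and is not the authors' model: it codes each letter string as a one-hot position-by-letter vector, assumes a boxcar (rectangular) integration window measured in 85 Hz frames, and compares the integrated 'ball'/'deck' and 'back'/'dell' trains. The helper names are ours.

```python
import string

CHANNELS = 4 * 26  # four letter positions x 26 lower-case letters (toy code)


def encode(word):
    """One-hot position-by-letter vector for a four-letter lower-case string."""
    vec = [0] * CHANNELS
    for pos, ch in enumerate(word):
        vec[pos * 26 + string.ascii_lowercase.index(ch)] = 1
    return vec


def frame_train(first, second, isi_frames, n_cycles=6, stim_frames=2):
    """Frame-by-frame input: each string for stim_frames, then a blank ISI."""
    blank = [0] * CHANNELS
    cycle = ([encode(first)] * stim_frames + [blank] * isi_frames +
             [encode(second)] * stim_frames + [blank] * isi_frames)
    return cycle * n_cycles


def boxcar(frames, window):
    """Sum of the input over every run of `window` consecutive frames."""
    return [[sum(channel) for channel in zip(*frames[i:i + window])]
            for i in range(len(frames) - window + 1)]


def max_difference(window, isi_frames):
    """Largest channel-wise gap between the two integrated trains."""
    a = boxcar(frame_train("ball", "deck", isi_frames), window)
    b = boxcar(frame_train("back", "dell", isi_frames), window)
    return max(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


if __name__ == "__main__":
    isi = 2  # one cycle = 2 * (2 + 2) = 8 frames
    for window in (2, 4, 8, 16):
        print(window, max_difference(window, isi))
```

With a 2-frame ISI one cycle lasts 2 × (2 + 2) = 8 frames. Windows of 8 or 16 frames leave no difference between the two trains, because summing a periodic input over a whole number of periods gives the same total whatever the phase; windows of 2 or 4 frames leave a difference of 2. A real mechanism need not integrate with a rectangular window; a smooth window would attenuate the difference gradually rather than cancel it at particular window lengths.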
In the experimental trial corresponding to the stimuli shown in figure 1, observers
are presented with one of the alternating pairs and asked to determine which was
presented. If the presentation rate is such that both words are presented within the
interval that the word recognition system averages over, then performance will be at
chance. By varying the temporal frequency (presentation rate) of the stimuli across
trials, the temporal resolution can be estimated. To compare the results to those with
methods used previously in the literature, temporal thresholds were also measured
with a single-presentation masked exposure.
The word 'binding' is used to refer to many different things in psychophysics and
neuroscience. In this paper, by binding we mean the ability to report the temporal
pairing of two aspects of a visual stimulus. The results of our binding experiments are
consistent with the idea that by the time visual signals reach awareness, they have
been combined over an interval of the order of 100 ms. This notion is related to the
'psychological moment' hypothesis, first advanced over one hundred years ago (von
Baer 1864). Historically, a variety of evidence has been marshaled in support of this
somewhat vague idea that psychological processes operate on about a 100 ms time scale
(Stroud 1956). A limitation of the previous evidence is that the temporal resolution of
different visual judgments was not compared, and as seen here, comparing these may
be quite informative for understanding visual mechanisms. The present work does not
address the psychological-moment theory claim that processing occurs in discrete
episodes (Geissler et al 1999; VanRullen and Koch 2003; but see Kline et al 2004,
2006). Instead, the issue here is the temporal precision of binding word parts, with the
question set aside for now whether these limits reflect continuous averaging or discrete
quantised processing.
Simplified movies of the stimuli give a sense of what it was like to be in the experiment
and can be viewed at AOH's web page, currently http://www.psych.usyd.edu.au/staff/alexh.
2 Methods
2.1 Participants
For the experiments with English words, sixteen participants were recruited from
local laboratories and undergraduate psychology classes at the University of California
at San Diego. All spoke English as their first language and were paid by the hour.
Six Chinese participants were recruited from campus Chinese associations and all were
adults under the age of 35 years, raised in mainland China, and all reported fluency
with the type of Chinese characters used in the study.
2.2 English language, repetitive-presentation experiments
In these experiments, participants were shown the stimuli at a variety of alternation
rates and made a forced choice between two alternatives. At the beginning of the trial,
two pairs of four-letter strings were presented. After studying the two pairs of strings
for as long as they wished, the observers fixated on the central dot and hit a key to
initiate the alternating presentation of one of the pairs of strings that they had just
studied.
During the alternating stimulus, observers were instructed neither to blink nor to move their
eyes, and their right eye was monitored as described in section 2.5. Immediately after
the alternation ended, the two pairs of letter strings were again arrayed on the screen
and observers hit a key to indicate which had been presented. Feedback indicated
whether a correct response had been made.
The lighting in the room was dim, and during the trial the background of the
screen was a uniform 84 cd m⁻². The observers viewed the screen from a chin-rest
placed 68 cm away. The letter strings appeared in a black (1 cd m⁻²) rectangular
region 85 pixels wide (2.41 deg) by 27 pixels high (0.76 deg). The letters were drawn
in lower-case Courier font with the letter 'l' 20 pixels (0.57 deg) high. Each letter was
drawn 22 pixels (0.62 deg) to the right of the previous letter.
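The visual angles quoted above follow from the pixel counts and the 68 cm viewing distance. The monitor's pixel pitch is not reported, so this Python sketch infers it from the statement that 85 pixels subtend 2.41 deg (an assumption) and then converts the other extents with the visual-angle formula θ = 2·atan(s / 2d).

```python
import math

VIEWING_DISTANCE_CM = 68.0  # chin-rest distance stated in the Methods

# Assumption: the pixel pitch is not reported, so it is inferred here from the
# statement that the 85-pixel-wide region subtended 2.41 deg at 68 cm.
PIXEL_CM = 2 * VIEWING_DISTANCE_CM * math.tan(math.radians(2.41 / 2)) / 85


def pixels_to_deg(n_pixels, pixel_cm=PIXEL_CM, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (deg) of an extent of n_pixels centred on the line of sight."""
    size_cm = n_pixels * pixel_cm
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


if __name__ == "__main__":
    for label, px in [("region width", 85), ("region height", 27),
                      ("height of 'l'", 20), ("letter spacing", 22)]:
        print(f"{label}: {px} px = {pixels_to_deg(px):.2f} deg")
```

Under this assumption the 20- and 22-pixel extents come out at about 0.57 and 0.62 deg, matching the text, while the 27-pixel height comes out near 0.77 deg rather than 0.76 deg, a difference small enough to reflect rounding or a slightly different pitch.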
The CRT monitor was set to a refresh rate of 85 Hz, and the stimulus began with
the presentation of the 'XXXX' pre-mask at 84 cd m⁻² for two frames (23 ms). The
alternating letter strings followed. Each trial had a particular slowest temporal frequency, the target rate. The goal of the experiment was to determine how narrow an
interval between successive words would still allow observers to accurately determine
the pairing of the letters, with a 75%-correct criterion. To achieve this, however, the
stimuli could not simply be alternated at different frequencies. Care must be taken to
prevent observers from picking off a single stimulus from the beginning or the end of
the stimulus train. Consider that the first stimulus in a train is not pre-masked in the
same way as the subsequent items and the last stimulus is not post-masked in the same
way as the previous items. This means that the observers' integration window would
at certain times include only a single word, obscuring any effect of alternation rate.
Indeed, previous results purporting to reveal high-frequency grouping (Usher and
Donnelly 1998) have since been explained by lack of adequate pre-masking (Beaudot
2002; Dakin and Bex 2002). The transient, high-resolution mechanism engaged when
pre-masking is inadequate (Beaudot 2002) may even extend to more than just the first
item (Holcombe et al 2001). This transient mechanism can obscure the effect of temporal frequency on performance by allowing even temporal differences finer than the
integration interval of the visual system to affect performance (Beaudot 2002). Use of
this transient mechanism can be excluded by beginning the stimulus with a period in
which the stimuli are much too rapid and at too low a contrast to perceive the feature
pairing, and then gradually increasing the contrast and stimulus exposure duration to
the target rate (Holcombe and Cavanagh 2001; Holcombe et al 2001; Dakin and Bex
2002; Clifford et al 2003, 2004). Hence, there were competing constraints on the manner of the stimulus presentation: the desire for a smooth increase in stimulus duration
at the beginning and decrease at the end, the desire for approximately equal total
stimulus train durations across target rates, and the goal of at least two successive
stimuli exposed at the target rate, all while keeping total trial length short enough that
participants could fixate well throughout. In practice, pilot data from the authors and
a few other observers indicated that total stimulus train duration was less important
than having a smooth ramp, so the procedure was designed accordingly.
In these repetitive-presentation conditions the stimuli were always presented for two
frames (23 ms), but the black (≤0.3 cd m⁻²) interstimulus interval (ISI) varied to achieve
various temporal frequencies. It is appropriate to think in terms of the stimulus onset
asynchrony (SOA), which is the two-frame stimulus duration plus the ISI. This corresponds to the amount of time the stimulus was available given temporal integration.
Pilot experiments indicated that the relative proportion of exposure duration versus ISI in the SOA was not critical. Because the presentations of two
successive stimuli form a cycle, one period of the temporal frequency spans
two letter-string exposures plus their ISIs, that is, two SOAs. As described in detail in the next paragraph,
within a given trial, the alternation of the letter strings began at a very high rate and
low luminance and gradually increased in luminance while slowing to the target
rate for that trial. After a few presentations at the target rate, the presentation rate
gradually increased and the luminance diminished until the trial ended with a 1-frame
ISI terminated by an 'XXXX' post-mask for 2 frames. This ensured that observers
could not make the discrimination based on the beginning or end of the stimulus train,
when the stimulus would not be fully pre- and post-masked. In other words, without
this procedure the presentations at the beginning and end would be available at lower
than the intended temporal frequency. A detailed description of the initial ramp-up
and terminating ramp-down of ISI and luminance for each target temporal frequency
is given below.
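The relation between ISI, SOA and temporal frequency is simple arithmetic at an 85 Hz refresh rate (one frame is about 11.76 ms). The Python sketch below assumes, as the text states, a fixed two-frame exposure and a cycle of two exposures plus two ISIs; the function names are illustrative.

```python
REFRESH_HZ = 85.0  # CRT refresh rate given in the Methods
STIM_FRAMES = 2    # every letter string was shown for two frames


def frame_ms(refresh_hz=REFRESH_HZ):
    """Duration of a single video frame, in milliseconds."""
    return 1000.0 / refresh_hz


def soa_ms(isi_frames):
    """Stimulus onset asynchrony: the two-frame exposure plus the blank ISI."""
    return (STIM_FRAMES + isi_frames) * frame_ms()


def temporal_frequency_hz(isi_frames):
    """One full cycle spans two exposures and two ISIs, i.e. two SOAs."""
    return 1000.0 / (2 * soa_ms(isi_frames))


if __name__ == "__main__":
    for isi in (0, 2, 6, 19, 30):
        print(f"ISI {isi:2d} frames: SOA {soa_ms(isi):6.1f} ms, "
              f"{temporal_frequency_hz(isi):5.2f} Hz")
```

For example, a 0-frame ISI gives an SOA of about 23.5 ms and an alternation frequency of 85/4 = 21.25 Hz, and a 6-frame ISI gives 85/16, about 5.3 Hz. The 30-frame ISI used as an example in the ramp description gives an SOA of about 376 ms, which may be the 376 ms figure quoted at the end of that description.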
The initial ramping-up of the exposure of the stimulus was carried out according
to the following rules. While the duration of the stimulus was constant at 2 frames,
the first three presentations had no ISI and the fourth was presented after a 1-frame
ISI. The first two of these presentations were within a few multiples of contrast threshold (≤1 cd m⁻²), with the third presented around 3 cd m⁻² and the fourth around
10 cd m⁻². The fifth was presented at about twice the luminance of the fourth, and all
subsequent stimuli were presented at about 25 cd m⁻², until the last two, which were
presented at 9 and about 1 cd m⁻², respectively. The duration of the ISIs after the
fourth presentation increased until the target ISI was reached: following the initial ISIs of 0, 0, and 1 frames, the successive ISIs were 2, 4, 9, and 19 frames. For those target ISIs that were
greater than 19 frames, such as 30 frames, the target rate began immediately after the
19-frame ISI. Target ISIs of less than 19 frames began immediately after the next-lowest ISI duration in the ramp. For example, if the target ISI was 6 frames, the
durations of the preceding ISIs were 0, 0, 1, 2, and 4 frames. The gradual ramp down
that began after the exposures at the target ISI included decrements of 10 frames until
the ISI was less than 10 frames, followed by decrements of 3 frames until 1 frame was
reached. Each stimulus train had at least one 1-frame ISI, with additional ones
if needed to yield at least three exposures of the stimulus during the ramp down.
For example, when the target ISI was 30 frames, the following ISIs were 20, 10, 7, 4, and 1
frames. For a target ISI of 13 frames, the ISIs of the ramp down were 3, 1, 1, and 1 frames.
The effect of the 1-frame and 0-frame ISI exposures at the end and beginning was to provide
a low-contrast mask which appeared as the sum of the two letter strings, hence camouflaging the letter pairing. The number of exposures at the target ISI was set to bring
the total duration of the stimulus train close to 1.6 s. These constraints yielded a range
of 1.6 to 1.9 s, mostly because the acceleration and deceleration at beginning and end
took longer for slower target rates. This variation is not very different from that in the
duration of the target rates (376 ms). Participants quickly grew accustomed to waiting
for the stimulus to slow and subsequently attempting to perform the task.
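The ISI ramp rules are compact but partly ambiguous; read literally, "decrements of 10 frames until the ISI was less than 10 frames" would take a 30-frame target past 10 to 0, which the worked example does not do. The Python sketch below encodes one reading that reproduces both worked examples (30 frames giving 20, 10, 7, 4, 1 and 13 frames giving 3, 1, 1, 1): subtract 10 while the ISI exceeds 10, then 3, never going below 1, and pad with 1-frame ISIs so that at least three precede the final 1-frame ISI before the post-mask. The function names and the padding rule are our assumptions, and the luminance schedule is omitted.

```python
# ISI ladder (frames) preceding the target rate, as listed in the Methods.
RAMP_UP_LADDER = [0, 0, 1, 2, 4, 9, 19]


def ramp_up(target_isi):
    """ISIs before the target rate: every ladder step shorter than the target."""
    return [isi for isi in RAMP_UP_LADDER if isi < target_isi]


def ramp_down(target_isi, min_before_final=3):
    """ISIs after the target rate, under one interpretation of the text.

    Subtract 10 frames while the ISI exceeds 10, then 3 frames, never going
    below 1. The last 1-frame ISI precedes the post-mask; the ISIs before it
    are padded with 1-frame ISIs until there are at least min_before_final.
    """
    isis = []
    isi = target_isi
    while isi > 1:
        isi = isi - 10 if isi > 10 else max(isi - 3, 1)
        isis.append(isi)
    if not isis:  # target already at or below 1 frame: still end on a 1
        isis = [1]
    body, final = isis[:-1], isis[-1]
    body += [1] * max(0, min_before_final - len(body))
    return body + [final]


def isi_schedule(target_isi, n_target):
    """Full ISI sequence with n_target ISIs at the target rate (n_target assumed)."""
    return ramp_up(target_isi) + [target_isi] * n_target + ramp_down(target_isi)


if __name__ == "__main__":
    for target in (6, 13, 30):
        print(target, isi_schedule(target, n_target=2))
```

Because the text sets the number of target-rate exposures by total train duration without giving the exact rule, `n_target` is left as a free parameter here.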
The letter strings were centred on the fixation point, which alternated between
bright white and red, changing each time a letter string was presented. This flicker
provided salient repeated events at the fovea, which helped the observer to maintain
fixation. At the end of each trial, a tone and short message informed the observer
whether the correct choice had been made.
There were three different trial types, tested in separate blocks: words, nonwords,
and pseudowords. The ten possible word sets for the trials of the words condition are
listed in table 1, as are the pairs of letter strings for the other conditions. In nonword
trials, none of the four letter strings formed words, and observers had to choose which
of the two pairs of nonwords had alternated. The nonword strings were formed by
transposing the first two and last two letters of the words in the words condition. In a
pseudoword condition, all the letter strings formed nonwords that were pronounceable,
in contrast to the nonword condition, in which most were not pronounceable.
Sessions lasted less than 1 h. After several practice trials, the span of interstimulus
intervals was set for that participant, guided by the participant's performance in the
practice trials. Each subject then participated in 40 trials per condition per interstimulus
interval. The three conditions (words, nonwords, pseudowords) were tested in different
sessions, whereas the interstimulus intervals were randomly intermixed. Some participants were also in earlier preliminary experiments. When it was seen that participant GE's data at the fastest SOA of the word condition departed from the pattern of other participants, he was run in an additional session of the word condition.

Table 1. The stimuli for each of the three conditions of the letter string experiment for English
readers. In each trial, participants discriminated between two possibilities. The two pairs of letter
strings in each cell represent the two possible alternating stimuli specified to the participant at
the beginning of a trial. The stimuli in the nonwords condition were created by spatially transposing the bigrams in the words condition, so that the same letters were used in the words and
nonwords conditions. In all cases, temporally averaging across successively presented letter strings
makes correct discrimination impossible.

Frame   Words         Nonwords      Pseudowords
        A      B      A      B      A      B
1       ball   back   llba   llde   rane   rall
2       deck   dell   ckde   ckba   moll   mone

1       pump   pull   mppu   mphe   runk   rute
2       hell   hemp   llhe   llpu   pote   ponk

1       dent   dell   ntde   ntpu   lort   lont
2       pull   punt   llpu   llde   hent   hert

1       tank   tape   nkta   nkmo   pime   pirt
2       mope   monk   pemo   peta   jurt   jume

1       ball   bank   llba   llhu   dert   demp
2       hunk   hull   nkhu   nkba   jomp   jort

1       pink   pimp   nkpi   mppi   hape   haks
2       jump   junk   mpju   nkju   miks   mipe

1       tame   tack   meta   ckta   zate   zams
2       lick   lime   ckli   meli   koms   kote

1       pike   pick   kepi   ckpi   nive   nink
2       jock   joke   ckjo   kejo   lank   lave

1       call   cart   llca   rtca   paln   pask
2       port   poll   rtpo   llpo   bisk   biln

1       milk   mire   lkmi   remi   ferm   fent
2       fore   folk   refo   lkfo   hont   horm

2.3 Chinese characters, repetitive presentation
Ten pairs of Chinese characters, shown in figure 2, were used. The Chinese characters
utilised were adopted in their final form by the People's Republic of China over forty
years ago and are the form used today in the overwhelming majority of written and
printed material in mainland China. Certain parts of these characters are called `radicals'. For the present experiment, the radicals of the two characters in each pair were exchanged to create a complementary pair of characters. As in the English case, this yields two pairs of characters that have the same sum (figure 1), which ensured that temporally integrating over successive characters would yield the same result for either pair. Each character pair was presented in the same way as the pairs of English letter strings described in the previous section. The only difference was the pre- and post-mask, which consisted of the sum of the two successive characters presented on that trial rather than the `XXXX' mask used in the English-language experiments.
After several practice trials, the span of interstimulus intervals was set, guided by participants' performance in the practice trials. Each participant (all were raised in mainland China) completed 40 trials at each of 5 alternation rates, randomly intermixed, for a total of 200 trials. Sessions lasted less than 1 h.
[Figure 2: the character images are not reproduced; the English translation, percent correct, and mean log percent frequency for each set are listed below.]
Frame  Translation A            Translation B            %C   Mean log % frequency
1      why? how?                live                     83   −1.04
2      pour                     river

1      mother                   excellent                76   −0.83
2      sand                     code

1      town                     trouble                  71   −1.59
2      meet                     sincere

1      slave                    prostitute               82   −1.10
2      actress                  only

1      ice rain                 cave                     78   −2.13
2      copper                   ring

1      burn                     cook                     72   −1.50
2      drink                    forgive

1      bridge                   rubber                   72   −1.80
2      photo                    alien

1      OK                       chirp                    66   −1.22
2      flesh                    fat

1      media                    beautiful                65   −1.35
2      burn                     coal

1      Mei (name of river)      flood                    73   −1.88
2      Lao (name of mountain)   Mei (name of mountain)
Figure 2. The pairs of Chinese characters used in the experiment with Chinese readers. As in the
experiment with English readers, 10 pairs of stimuli were used. The sum of each pair of successive
characters is equivalent to the sum of the complementary pair (A versus B). The meaning of most
of these characters is ambiguous without additional characters as context. One possible meaning
is listed for each. %C stands for percent correct.
2.4 English language, single presentation
Unlike in the repetitive-presentation experiments, only one letter string was shown, and it was presented just once. It was preceded by an `XXXX' pre-mask shown for 2 frames (23 ms), followed by a 1-frame ISI (12 ms). The letter string was always shown for 2 frames and was followed by an ISI whose duration varied across trials to determine threshold performance. The `XXXX' was then presented for 3 frames (35 ms) to post-mask the string. As in the other experiments, at the end of each trial a tone and short message informed the observer whether the correct choice had been made.
The experiment consisted of two conditions: words and nonwords. As in the repetitive-presentation experiment, the stimuli were those listed in table 1. The pre-trial screen and response screen were identical to those of the repetitive-presentation experiments. Four letter strings were shown in the preview screen, and the same four letter strings were presented again in the response screen. These sets of four were the same as those used in the repetitive-presentation condition. Because in this condition only one of the letter strings, randomly chosen, was presented as a stimulus, only one of the four alternatives was correct, so chance performance was 25% instead of the 50% chance level of the repetitive-presentation experiments.
Each participant completed 40 trials at each of 5 ISIs in each of the two conditions, for a total of 400 trials. Individual sessions lasted less than 1 h. Some of the participants (TM, TN, DB) had also participated in the repetitive-presentation experiment.
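The frame arithmetic behind these durations, and the two chance levels, can be made explicit. The sketch assumes an 85 Hz refresh rate (about 11.76 ms per frame). That rate is not stated in this section, but it is consistent with the 11.8 ms refresh interval and the 35.3 ms minimum exposure reported in section 3.3. The function names are illustrative.

```python
# Assumption: 85 Hz CRT refresh, i.e. 1000 / 85 ≈ 11.76 ms per frame.
FRAME_MS = 1000 / 85

def frames_to_ms(n_frames: int, frame_ms: float = FRAME_MS) -> float:
    return n_frames * frame_ms

def exposure_ms(isi_frames: int, string_frames: int = 2) -> float:
    """Letter-string onset to post-mask onset: the string frames plus the blank ISI."""
    return frames_to_ms(string_frames + isi_frames)

def chance_level(n_alternatives: int) -> float:
    return 1 / n_alternatives

print(f"{exposure_ms(1):.1f} ms")        # briefest exposure: 35.3 ms
print(chance_level(4), chance_level(2))  # single vs repetitive presentation: 0.25 0.5
```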
2.5 Eyetracking
Saccades and blinks might allow participants to reduce the intended masking by the
stimuli in the train. To detect in which trials participants blinked or moved their eyes,
the right eye was monitored during the experiment. In the experiments with English
readers, the eyetracking setup typically allowed rejection of trials in which observers made movements larger than 0.5 deg from fixation. For the Chinese observers, apparently because of the different average configuration of their eyelids, the eyetracker's performance was much more variable and typically far less precise. It reliably allowed exclusion of movements only when they reached a few degrees in magnitude or more. Nevertheless, the eyetracking had a similar psychological effect in the Chinese experiment as in the English experiment. Participants were constantly aware that their eye movements were being monitored and, in the experimenters' experience, maintained fixation much better than naive observers otherwise do.
Each participant was monitored for saccades and blinks with a Skalar Iris (http://
www.skalar.nl) infrared eyetracker (Reulen et al 1988). The output of the eyetracker,
representing the horizontal position of the right eye, was recorded throughout stimulus
presentation. With the observer's chin on a rest, the eyetracker was first calibrated by
manual adjustment to yield high gain and an approximately symmetric signal when
fixating two targets at opposite ends of the screen. Observers then participated in an
automated calibration session consisting of repeated saccades among five dots. Two
were horizontally spaced 1.2 deg on either side of fixation, and two more were set
11.6 deg from fixation. These calibration data were used to create a criterion for deciding when a saccade was made. In the case of the experiments with letter strings, the
criterion was exceeded when a saccade of approximately 0.5 deg or greater occurred.
To create this criterion, the position signals from the calibration trials were filtered with a first-order Chebyshev type II low-pass filter. The maximum piecewise velocity during the saccade was determined, and the mean maximum velocity from a sample of 1.2 deg calibration saccades was calculated. For the experiments with English letter strings, the 1.2 deg calibration saccade with maximum velocity closest to the mean was examined more closely. The Chebyshev-filtered signal was convolved with the kernel [1 – 0.2 – 1], and the criterion for rejecting a trial was set to be substantially smaller than the magnitude of the convolved 1.2 deg record. Specifically, the maximum magnitude of the convolved signal corresponding to noise around the time of the saccade was estimated, and the rejection threshold was set slightly higher than this noise-evoked signal. The criterion was the maximum noise-evoked convolution magnitude plus 20% of the difference between the saccade-evoked and noise-evoked convolution magnitudes.
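The rejection rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code. The Chebyshev low-pass filtering step is omitted (scipy.signal.cheby2 could design such a filter). The kernel in the example is a hypothetical step detector rather than the kernel given in the text, and the peak magnitudes are made-up numbers.

```python
from __future__ import annotations

def convolve(signal: list[float], kernel: list[float]) -> list[float]:
    """Valid-mode convolution, equivalent to numpy.convolve(signal, kernel, mode="valid")."""
    m = len(kernel)
    return [sum(kernel[j] * signal[i + m - 1 - j] for j in range(m))
            for i in range(len(signal) - m + 1)]

def peak_response(trace: list[float], kernel: list[float]) -> float:
    return max(abs(v) for v in convolve(trace, kernel))

def rejection_criterion(noise_peak: float, saccade_peak: float,
                        fraction: float = 0.2) -> float:
    """Noise-evoked peak plus 20% of the gap up to the calibration-saccade peak."""
    return noise_peak + fraction * (saccade_peak - noise_peak)

def reject_trial(trace: list[float], kernel: list[float], criterion: float) -> bool:
    return peak_response(trace, kernel) > criterion

# Hypothetical values: noise peak 0.5, 1.2 deg calibration-saccade peak 3.0.
criterion = rejection_criterion(0.5, 3.0)   # 0.5 + 0.2 * 2.5 = 1.0
# Hypothetical step-detecting kernel (an assumption, not the kernel in the text).
step_kernel = [1.0, 0.0, -1.0]
print(reject_trial([5, 5, 5, 7, 7, 7], step_kernel, criterion))            # True
print(reject_trial([0.1, -0.1, 0.05, 0.0, -0.05], step_kernel, criterion))  # False
```

Valid-mode convolution is used so that a constant offset in the position trace does not produce spurious responses at the edges.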
Apparently because of the smaller average eye opening of the Chinese participants,
a reliable estimate of the magnitude of the convolved signal evoked by the small calibration saccades could not be made; indeed, often the noise was as large as this signal.
The criterion for rejecting a trial was instead set at just above the noise seen in the
perisaccadic interval around the 11.6 deg saccade time.
3 Results
3.1 Letter strings, repetitive presentation
Figure 3 shows performance as a function of alternation rate for the English words,
nonwords, and pseudowords conditions, after trials with significant eye movement were
eliminated. Accuracy was highest and usually close to perfect for the slowest alternation rates, when exposure duration was longest. At higher alternation rates, performance
gradually declined. Pilot experiments had consistently shown that performance was near
chance at frequencies above 8 Hz, and this was borne out in the data. An apparent
exception was shown by participant GE for words at 8.5 Hz, but when he returned to
the lab 5 months later (data shown in inverted triangles), the anomaly was not replicated,
despite the benefit of practice evident at the slower rates. Hence, the one exception to
poor performance at higher temporal frequencies appears to have been due to random
chance.
A useful way to summarise performance in this task is with the 75% threshold of
each subject in each condition. As such thresholds are commonly used in psychophysics,
they have the further advantage of allowing rough comparisons with earlier psychophysical studies. In particular, a point near 75% is often used because it is typically
at a portion of the curve where the accuracy versus frequency slope is high, making it
sensitive to differences across conditions.
We extracted the 75% thresholds from the data by fitting percent correct (%C) to
cycle duration, using a class of curves that capture the main features of psychometric
functions. The curve class fitted, equation below, was the top half of a cumulative
Gaussian rescaled to go from 50% to 98.5% (98.5% used instead of 100% to allow for
occasional lapses), with a freely varying sigma (σ) and a freely varying exponent (r) on cycle duration in seconds (x). For convenience, the expression is given in terms of the MATLAB error function used (erf):
$$\%C = 0.5\left[1 + 0.985\,\mathrm{erf}\!\left(\frac{x^{r}}{\sqrt{2}\,\sigma}\right)\right].$$
The best-fitting curve was chosen by maximum likelihood using the Nelder–Mead simplex search method (MATLAB code available from the first author).
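The fitting procedure can be sketched in Python. This is an illustration, not the authors' MATLAB code. It implements the curve above and a binomial log-likelihood but, to stay dependency-free, it replaces the Nelder–Mead simplex search with a coarse grid search (scipy.optimize.minimize with method="Nelder-Mead" would be the closer analogue). The 75% threshold is then read off the fitted curve by bisection and converted from cycle duration to hertz. The grid ranges and the synthetic data are assumptions.

```python
from __future__ import annotations

import math

def proportion_correct(x: float, sigma: float, r: float) -> float:
    """%C / 100 for cycle duration x (s): 0.5 * [1 + 0.985 * erf(x**r / (sqrt(2) * sigma))]."""
    return 0.5 * (1 + 0.985 * math.erf(x ** r / (math.sqrt(2) * sigma)))

def neg_log_likelihood(sigma, r, durations, n_correct, n_trials) -> float:
    """Binomial negative log-likelihood of the observed counts."""
    nll = 0.0
    for x, k, n in zip(durations, n_correct, n_trials):
        p = min(max(proportion_correct(x, sigma, r), 1e-12), 1 - 1e-12)
        nll -= k * math.log(p) + (n - k) * math.log(1 - p)
    return nll

def fit_grid(durations, n_correct, n_trials) -> tuple[float, float]:
    """Grid search (an assumed stand-in for Nelder-Mead) over log-spaced sigma and linear r."""
    sigmas = [10 ** (-4 + 5 * i / 200) for i in range(201)]   # 1e-4 .. 10
    exponents = [0.5 + 0.05 * j for j in range(71)]            # 0.5 .. 4.0
    return min(((s, r) for s in sigmas for r in exponents),
               key=lambda p: neg_log_likelihood(p[0], p[1], durations, n_correct, n_trials))

def threshold_hz(sigma: float, r: float, target: float = 0.75) -> float:
    """Alternation rate (1 / cycle duration) at which the fitted curve crosses `target`."""
    lo, hi = 0.0, 100.0
    for _ in range(200):          # bisection on cycle duration in seconds
        mid = (lo + hi) / 2
        if proportion_correct(mid, sigma, r) < target:
            lo = mid
        else:
            hi = mid
    return 1 / hi

# Synthetic example: expected (non-integer) counts from sigma = 0.0911, r = 2,
# whose 75% threshold is close to 4 Hz. Not real data.
durations = [0.1, 0.15, 0.2, 0.3, 0.5, 0.7]
n_trials = [40] * len(durations)
n_correct = [n * proportion_correct(x, 0.0911, 2.0) for x, n in zip(durations, n_trials)]
sigma_hat, r_hat = fit_grid(durations, n_correct, n_trials)
print(round(threshold_hz(sigma_hat, r_hat), 1))
```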
The 75% thresholds in this 2-alternative forced-choice design were sometimes a bit
higher than the 3–4 Hz found previously for judging the pairing of spatially separated shapes and colours (Holcombe and Cavanagh 2001), but they were far below the
20 Hz rates achievable for binding local elements into the global forms of Glass patterns
(Clifford et al 2004).
Recall that the purpose of these experiments was not to contrast the different classes of stimuli. Instead, different classes of stimuli were used simply to determine whether the slow-threshold result was a general one. Because linguistic variables were not controlled across the different classes of stimuli, a number of factors could explain any differences between them. This should be kept in mind in the following report of the results in the various conditions.
[Figure 3: panels plotting percent correct (%C) against alternation rate (AR/Hz, bottom axis) and SOA (SOA/ms, top axis) for participants GE, JY, SH, DB, TL, TM, and TN in the words, pseudowords, and nonwords conditions. Each panel is labelled with that participant's 75% threshold in Hz.]
Figure 3. Each plot is percent correct (%C) as a function of alternation rate (AR) for a particular participant (inset initials). Alternation rate is expressed in hertz at bottom, with the corresponding stimulus onset asynchrony shown at top. Chance performance is 50%. Trials in which significant eye movement was detected were excluded. Standard error bars are shown where they exceed the symbol size. The thin line is the fitted psychometric function. The dashed line shows the 75% threshold. In the words condition, subject GE shows an anomalous data point for the fastest rate, which did not recur in a second session (inverted triangles).
For the different conditions of this experiment, as shown in table 2, performance was highest with the words (75% threshold = 4.3 Hz), somewhat lower for the pseudowords condition (3.6 Hz), and still lower for the nonwords (1.9 Hz). Paired-samples t-tests indicate that the difference between the words and pseudowords is not significant (t_6 = 0.82, p = 0.44), whereas all other differences are: for the words versus nonwords t_6 = 3.4, p = 0.01, and for the pseudowords versus nonwords t_6 = 3.1, p = 0.02.
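For readers reproducing these comparisons, the paired-samples t statistic is the mean of the per-participant differences divided by its standard error, with n − 1 degrees of freedom (6 here, for seven participants). A minimal sketch follows. The example values are hypothetical, and scipy.stats.ttest_rel would also return the two-tailed p value.

```python
from __future__ import annotations

import math
from statistics import mean, stdev

def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for matched per-participant values."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))   # sample SD uses the n - 1 denominator
    return mean(diffs) / se, len(diffs) - 1

# Hypothetical thresholds (Hz) for three participants in two conditions.
t, df = paired_t([4.0, 5.0, 6.0], [3.0, 3.0, 3.0])
print(f"t_{df} = {t:.2f}")   # t_2 = 3.46
```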
Table 2. 75% thresholds (cycles s⁻¹/ms SOA).
Repetitive presentation   Mean            SE/Hz   N
words                     4.3 Hz/116 ms   1.6     7
pseudowords               3.6 Hz/139 ms   2.0     7
nonwords                  1.9 Hz/263 ms   0.7     7
Chinese characters        2.9 Hz/172 ms   1.1     6
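Table 2 gives each threshold both as an alternation rate and as an SOA. Because one alternation cycle contains two successive stimuli, the SOA is half the cycle duration: SOA = 1000/(2f) ms. A short illustrative check:

```python
def soa_ms(alternation_hz: float) -> float:
    """SOA between successive stimuli: half of one alternation cycle, in ms."""
    return 1000 / (2 * alternation_hz)

TABLE_2_HZ = {"words": 4.3, "pseudowords": 3.6, "nonwords": 1.9, "Chinese characters": 2.9}
for condition, hz in TABLE_2_HZ.items():
    print(f"{condition}: {hz} Hz = {soa_ms(hz):.0f} ms")   # 116, 139, 263, 172 ms
```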
These results clearly indicate that temporal binding thresholds for the nonwords
were lower than for the pseudowords and the words. The main point of this paper is
that all the conditions yield slow thresholds, all below 5 Hz. Still, the difference between the words and nonwords is notable, especially since several strings in
the nonwords condition were pronounceable (this was a consequence of satisfying the
constraint that the same bigrams be used as in the words condition).
The effect of blocking can be examined by comparing the unpronounceable strings in
the nonwords condition to the four strings within that condition which were pronounceable
(`pemo', `meta', `kepi', `refo'). Excluding those pronounceable strings does not change
the thresholds by much: thresholds were lowered by 0.1055 Hz (not statistically significant, t_6 = 0.46, p = 0.67). That the pronounceable-pseudowords condition nonetheless yielded
much higher thresholds suggests some benefit of blocking pronounceable nonwords
together, perhaps because in the less-pronounceable-nonwords condition participants
were in the habit of using a less effective coding strategy for the letter strings.
In the analysis just described, thresholds from different-sized data sets were compared
(nonwords condition with and without pronounceable nonwords). Some data-fitting
algorithms can exhibit a bias dependent on the size of the data set. To ensure that the
exclusion of stimuli did not bias the threshold estimation procedure with our algorithm, thresholds were re-estimated after excluding random four-stimulus subsets of the
nonword condition. Considering five hundred random subsets drawn with replacement,
the average threshold difference was negligible (0.0048 Hz) and certainly not statistically
significant (t_499 = 0.57, p = 0.57).
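The subset check can be sketched as a resampling loop. This illustration rests on assumptions. `estimate_threshold` is a placeholder for whatever threshold-fitting routine is used (for instance a psychometric fit). Each data item is taken to be one stimulus set's trials. The toy estimator and values in the example are hypothetical.

```python
from __future__ import annotations

import math
import random
from statistics import mean, stdev
from typing import Callable, Sequence

def subset_exclusion_check(items: Sequence, estimate_threshold: Callable[[Sequence], float],
                           n_excluded: int = 4, n_draws: int = 500, seed: int = 0):
    """Mean threshold change when random n_excluded-item subsets are dropped, with t and df."""
    rng = random.Random(seed)
    full = estimate_threshold(items)          # placeholder estimator (assumption)
    diffs = []
    for _ in range(n_draws):                  # subsets drawn independently, i.e. with replacement
        dropped = set(rng.sample(range(len(items)), n_excluded))
        kept = [item for i, item in enumerate(items) if i not in dropped]
        diffs.append(full - estimate_threshold(kept))
    sd = stdev(diffs)
    t = mean(diffs) / (sd / math.sqrt(n_draws)) if sd > 0 else 0.0
    return mean(diffs), t, n_draws - 1

# Toy example: ten stimulus sets, "threshold" taken as the mean of per-set values.
per_set = [3.1, 2.8, 3.5, 2.9, 3.0, 3.3, 2.7, 3.2, 3.4, 2.6]
print(subset_exclusion_check(per_set, lambda xs: sum(xs) / len(xs)))
```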
For further examination of the effect of pronounceability within the nonwords
condition, one might compare thresholds for the four pronounceable strings with the
thresholds for the rest. Unfortunately this was not possible, because the small numbers
of trials with pronounceable strings made the thresholds unstable. However, some insight
can be had into the variation due to particular items by examining percent correct by
item, collapsed across duration, which is shown in table 3. The overall percent correct
for the trials of the nonwords condition containing strings which happen to be pronounceable is at the low end of the accuracy for the items in the pseudowords condition
containing exclusively pronounceable items.
Although the effect of blocking should be further explored, the difference between the nonwords condition and the other conditions appears reliable. The difference likely results from one of the various mental processes involved in determining which letters were presented simultaneously. It cannot be due to difficulty in identifying the individual letters at higher rates, as the letters used in corresponding trials of the words and nonwords conditions are identical (table 1). Possible reasons for the difference are discussed in section 5 below.
Table 3. Percent correct (%C) for each stimulus pairing, collapsed across alternation rates. Each cell gives the frame 1/frame 2 letter strings of one alternating pair.
Words                        Nonwords                      Pseudowords
A          B          %C     A          B          %C      A          B          %C
ball/deck  back/dell  73     llba/ckde  llde/ckba  59      rane/moll  rall/mone  78
pump/hell  pull/hemp  77     mppu/llhe  mphe/llpu  70      runk/pote  rute/ponk  77
dent/pull  dell/punt  75     ntde/llpu  ntpu/llde  54      lort/hent  lont/hert  77
tank/mope  tape/monk  75     nkta/pemo  nkmo/peta  74 a    pime/jurt  pirt/jume  65
ball/hunk  bank/hull  80     llba/nkhu  llhu/nkba  56      dert/jomp  demp/jort  65
pink/jump  pimp/junk  76     nkpi/mpju  mppi/nkju  58      hape/miks  haks/mipe  71
tame/lick  tack/lime  73     meta/ckli  ckta/meli  64 a    zate/koms  zams/kote  69
pike/jock  pick/joke  81     kepi/ckjo  ckpi/kejo  62 a    nive/lank  nink/lave  69
call/port  cart/poll  78     llca/rtpo  rtca/llpo  58      paln/bisk  pask/biln  70
milk/fore  mire/folk  80     lkmi/refo  remi/lkfo  67 a    ferm/hont  fent/horm  71
a Note: These instances of the `nonwords' condition contained pronounceable strings.
The present experiment was not designed to investigate the role of word frequency on
binding threshold. Still, its role in variation of performance within our words condition
can be tentatively examined by regressing performance on word frequency. Frequencies
were taken from the CELEX database (Baayen et al 1995; Davis 2005) and the mean
of the log frequencies for each set of four words was calculated. `Mope' was not present
in the frequency database, so it was assigned a nominal frequency of 0.6, lower than that
of all the other words, which ranged from 0.67 to 1233. When overall percent correct
for each four-word set is regressed on mean log frequency, the resulting r = 0.16 is small and not significant (p = 0.68). Unfortunately, the number of trials per four-word set per rate was too small (4) to allow separate estimation of temporal thresholds for each four-word set. The regression was performed separately for the five presentation rates but was far from significant in all cases, with r negative for two of them.
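The regression reported here amounts to a Pearson correlation between per-set percent correct and mean log frequency, tested with t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. Below is a dependency-free sketch. Python's statistics.correlation (Python 3.10 and later) and scipy.stats.pearsonr compute the same r, and the latter also returns p.

```python
from __future__ import annotations

import math
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# With ten four-word sets, r = 0.16 corresponds to t_8 of about 0.46.
print(round(r_to_t(0.16, 10), 2))
```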
In the context of the range of temporal thresholds reported in earlier literature,
thresholds in all conditions of this experiment were decidedly slow. But rather than being a general property of linguistic stimuli, these slow thresholds might depend on the particular words and nonwords chosen, the language used, or even the writing
system. To determine the generality of the slow thresholds found for these linguistic
stimuli, we decided to go as far afield as possible, and investigate a case of different
words from a different language with a different writing system.
3.2 Chinese characters (repeated presentation)
Figure 4 shows percent correct as a function of alternation rate for the Chinese characters, with one plot for each of the six native Chinese readers. As in the experiment with letter strings, performance fell rapidly with increasing temporal frequency. The mean 75% threshold was 2.9 Hz, at the slow end of the range found with the letter strings. A quantitative statistical comparison with the results of the letter strings experiment is not advisable, given the use of different participants and materials. Still, there is no question that thresholds are far lower than the 15–20 Hz frequencies found for binding local elements into regular global forms (Clifford et al 2004).
[Figure 4: panels plotting percent correct (%C) against alternation rate (AR/Hz, bottom axis) and SOA (SOA/ms, top axis) for participants JY, CL, DS, LH, JL, and NL. Each panel is labelled with that participant's 75% threshold in Hz.]
Figure 4. Each plot shows one subject's percent correct (%C) as a function of alternation rate
(AR) of the Chinese characters. The thin line shows the psychometric fit, and the inferred 75%
threshold is also shown. Standard error bars are shown where they exceed symbol size. All subjects
were expert readers of simplified Chinese. Chance performance is 50%.
Although in reading, as with English words, Chinese character corpus frequency
can have a large effect on performance (Seidenberg 1985; Hue 1992), in the present
task no significant effect was found. Simplified Chinese character frequencies were taken
from the combined corpus of Da (2004). The mean log percent frequency was calculated for each four-character set (figure 2). Regressing overall percent correct on log frequency yielded a small Pearson product-moment correlation (r = 0.049, p = 0.89). There were too few trials per character set at each rate to yield good estimates of the 75% threshold. However, we could examine whether a strong effect of character frequency might be confined to a particular difficulty level. To do so, we combined percent correct across observers for the fastest rate for each observer, the second fastest rate for each, and so on. Pearson correlations were low and never significant. From the slowest to the fastest rate, the r values were 0.34, 0.19, 0.18, −0.23, and −0.17, with corresponding p values of 0.34, 0.59, 0.62, 0.52, and 0.64.
3.3 Letter strings, single presentation
The results with letter strings described above reflect a repeated-presentation method
designed to test the temporal resolution of binding the parts together. This is in contrast
with earlier studies of the perception of masked letter strings, in which a stimulus was
presented only once in a trial. It was expected that, with this more traditional methodology, the items could be discriminated on the basis of much shorter exposure intervals
as, without repeated presentation, binding during the exposure time interval was not
necessary. The data in figure 5 show that, indeed, letter strings can be perceived on the basis of much briefer intervals when only one is shown, even when it is masked. The reasons for this are discussed in section 5. Chance performance was 25% rather than 50% as in the other experiments. Even so, performance was so good that, for many participants, even the SOA yielding 75% correct was briefer than our CRT could present. This SOA was almost always much briefer than the thresholds with repeated presentation. Because thresholds could not be estimated from these data, performance across conditions was compared with paired t-tests on percent correct.
Successive screen refreshes were separated by 11.8 ms, and the briefest stimulus exposure in the experiment consisted of two refreshes drawing the letter string and a single
blank frame before the post-mask, for a total exposure of 35.3 ms. At this shortest
SOA, participants provided the correct answer in 87% of trials in the words condition
against 71% of trials in the nonwords condition. A paired t-test showed this difference to be significant (t_10 = 2.9, p = 0.02). The difference was also significant at the next two longer durations, for which the advantage of the words condition was 10% and 13% (t_10 = 2.25, p = 0.048; and t_10 = 4.1, p = 0.002). It was not significant at the longest two durations, for which the advantage was 3% and 2%. The absence of a statistically
significant difference in the longest two durations is welcome, as it suggests that the
performance difference was caused by the manipulation of exposure duration rather
than always being present. The differences at the faster rates are comparable to, or
somewhat larger than, those found in earlier literature (Manelis 1974).
Accuracy in this experiment was not sensitive to mean log word frequency of each
four-word set. Regressing overall percent correct on mean log word frequency per four-word set yields an inverse correlation (Pearson r = −0.33, p = 0.35). Performing the
analysis separately by rate yields a slightly negative r for each rate, with none statistically
significant.
[Figure 5: panels plotting percent correct (%C) against SOA (SOA/ms, top axis) and equivalent alternation rate (AR/Hz, bottom axis) for participants AS, AZ, JW, LJ, CS, DB, RC, TM, TN, JO, and EL in the words and nonwords conditions.]
Figure 5. With this single-presentation experiment, the same stimuli and exposure durations were
used as in the repeated-presentation experiments. Overall performance and 75% thresholds are
much better. Each plot shows one subject's percent correct (%C) as a function of SOA of the two
successively presented words. Equivalent alternation rate (AR) is shown at bottom. Standard error
bars are shown where they exceed symbol size.
4 Eye movements and 75% thresholds
4.1 Letter strings, repetitive presentation
In this experiment one concern was that eye movements might, on some trials, allow
participants to circumvent the pre- and post-masking. Each successive stimulus in the
repetitive-presentation paradigm is meant to mask the others by falling on the same
patch of retina. This way, when presentation rate exceeds temporal resolution of the
neural processes that bind the letters into a string, successive unbound strings will
be combined and alternating strings with the same sum will not be discriminated.
However, if a saccade is made at particular times during the stimulus presentation,
then successive strings will not land on the same patch of retina. Potentially, the brain
could then combine visual information over long periods and still recover the string's
identity. Indeed, in the case of an eye movement, the string would be combined with
the empty background rather than a successively presented masking string. Hence,
saccades might yield inflated temporal thresholds; moreover, a different pattern of eye
movement in different conditions might change the pattern of thresholds across conditions. In the authors' experience, using saccades willfully to subvert the masking was
difficult but occasionally effective, especially after extensive practice. Strategic eye blinking could also subvert the stimulus masking. Fortunately, eyeblinks were easily detected
by the eyetracker and all but quite small saccades were also reliably detected.
Analysis of the data indicates that eye movements and blinks did not have a
substantial effect on thresholds. Thresholds estimated from the full data set were compared with those estimated from the data after discarding trials in which saccades
were detected (see section 2.5 for eyetracking details). The mean changes in threshold
caused by rejecting trials with eye movements in each condition were (mean ± standard error across the seven subjects) −0.6 ± 0.9 Hz in the words condition, 0.2 ± 0.5 Hz in the pseudowords condition, and −0.1 ± 0.2 Hz in the nonwords condition. None of these changes in estimated threshold was significant according to t-tests. The differences between conditions in these threshold changes were also nonsignificant (one-way ANOVA: F_{2,18} = 2.73, p = 0.092). Nonetheless, rejecting trials with eye movement decreased the threshold in the words condition (t_6 = 1.74, p = 0.13). Although statistically nonsignificant, this decrease is consistent with the possibility that, in a small number of trials, eye movements defeated the pre- or post-masking. The magnitude of threshold changes due to rejecting eye movements is nearly as large as the difference in threshold between the words and pseudowords conditions. For this reason, future studies of the difference between these conditions (nonsignificant in this study) should monitor eye movements.
The evidence also indicates that the number of eye movements and blinks detected
differs across the conditions. The differences across the conditions in number of trials
rejected were not very large, but they were statistically significant. Of 40 trials per
condition per subject per interstimulus interval, the mean number (± standard error) of trials rejected was 10.8 ± 1.14 for the words condition, 4.7 ± 0.88 for the pseudowords condition, and 8.6 ± 0.92 for the nonwords condition. In an ANOVA with condition, interstimulus interval, and their interaction as factors and number of trials rejected as the dependent variable, condition was significant (F_2 = 4.43, p = 0.015), whereas interstimulus interval was not (F_4 = 0.11, p = 0.98). It is not clear why participants apparently made a comparable number of eye movements in the words and
nonwords condition, but made fewer in the pseudowords condition. However, this
pattern of differences does not correlate with the differences in temporal threshold,
which provides further confidence that the measured differences in thresholds were not
related to differences in eye movements in the different conditions.
Some participants exhibited markedly more eye movements than others, with the average number of trials rejected per 40-trial block ranging from 2.3 for one participant to 18.3 for another. This was expected, as it is commonly observed in psychophysical experiments that some participants fixate better than others. If these eye movements allowed some subjects to inflate their thresholds, then one would expect a positive relationship between the number of eye movements and the 75% thresholds estimated when all trials, including eye-movement trials, are included. A small, statistically nonsignificant positive correlation was found (r = 0.2, p = 0.39). This already-small correlation diminished to nearly zero (r = 0.04, p = 0.86) when trials with detected eye movements were discarded. That decline further increases confidence that thresholds after screening by the eyetracker were not greatly inflated by eye movements.
4.2 Chinese characters (repetitive presentation)
The relationship of the data in the Chinese characters experiment to the detected eye
movements did not differ markedly from that found in the letter-strings experiment.
Chance performance was 50%, and the 75% thresholds were again estimated just as in
the English repetitive-presentation experiment. In the Chinese characters experiment,
of the 40 trials that each subject ran in each condition–ISI combination, an average of 11.9 were rejected owing to eye movements or blinks. Rejecting these trials caused a mean decrease in threshold of 0.4 Hz, which is statistically nonsignificant (t_5 = 1.65, p = 0.16). As in the experiment with letter strings, subjects differed in the number of eye movements made, with counts ranging from 6 to 17. Again, as in the experiment with letter strings, statistically nonsignificant correlations were found between the number of eye movements made and the threshold, both before and after trials with eye movements were discarded (r = 0.45, p = 0.37; r = 0.35, p = 0.49, respectively).
5 Discussion
Of all visual skills, reading is one of the most overlearned. Indeed, most literate adults
have decades of near-daily experience with the task of reading. Thanks to this practice,
word reading is quite automatic (Stroop 1935). If automaticity and general experience
determined temporal limits in perception, then we might expect thresholds for words
to be among the fastest of all visual thresholds. Instead, in the current study we found
that temporal binding thresholds for English words are comparable to those for nonword letter strings and arbitrary conjunctions of spatially disparate colour and shape
elements. We found a similar result for binding the parts of Chinese characters, even
though they may be processed more holistically than are English words (eg Tzeng et al
1979; Chen 1984). The enormous advantage of words over nonword stimuli in frequency of exposure, memorability, and other factors has not allowed their temporal binding thresholds to approach the fastest thresholds in the visual system.
Another result of the present work further distinguishes the slow-threshold linguistic
material from visual material that is bound at high rates: its phenomenology is
clearly different. In the case of alternating Glass patterns and alternating gratings, at
rates faster than several hertz observers report perceiving both patterns simultaneously,
yet the features presented at the same time remain grouped in perception (Holcombe
2001; Clifford et al 2004). In the case of the letter strings, as the rate of alternation
increases beyond 5 Hz, the successive letters eventually appear as if transparently
overlaid, but observers reported no strong grouping between particular letters. In other
words, binding is inaccurate not because of perceptual misbindings, but because no
particular binding is really perceived. Although this closely resembles the phenomenology
previously found for slow alternation thresholds, it is entirely different from the binding
problems reported with single exposure. In those cases, participants were often
confident that their erroneous bindings were correct (Shallice and McGill 1978; Mozer
1983). Further investigation will be needed to discover the reasons for this difference.
Frequency of experience with words has a large effect on response time in conventional
word-reading tasks. In a study of visual-duration thresholds for identifying words,
Howes and Solomon (1951) found large Pearson correlations with log frequency, of
between r = 0.5 and 0.8. Unlike in conventional identification tasks, in our forced-choice
methodology participants were informed of the options before each trial. The intention
was to minimise any time needed for lexical search and other linguistic processes,
in order to maximise the possibility of a fast threshold. The absence of a significant
frequency effect suggests that we succeeded. Previous studies of Chinese characters have
sometimes found an even larger frequency effect than is found in English (Seidenberg
1985; Hue 1992). In the present investigation with Chinese characters this frequency effect
was absent.
Our primary interest is what causes words and Chinese characters to be inaccessible
at high temporal frequencies, in contrast to simple motion direction (Burr and
Ross 1982), flicker, edges and texture boundaries (Forte et al 1999; Kandil and Fahle
2003; Ramachandran and Rogers-Ramachandran 1991), depth from binocular disparity
(Morgan and Castet 1995), and pairings of superposed local colour and orientation.
Perhaps the most striking contrast with the present result for pairing the spatially
separated forms of a word comes from displays in which many form elements are
arranged into simple shapes such as spirals or radial patterns. These are integrated with
high efficiency in both space and time (Wilson and Wilkinson 1998; Clifford et al 2004),
whereas letters and Chinese character parts apparently are not.
These results fit well into the present theoretical framework of lower temporal
resolution for central processes. First, consider the case of a series of bars drifting at
high temporal frequencies, resulting in a retinal patch and its corresponding cortical
cells receiving a rapidly fluctuating input. The motion direction of such stimuli is
extracted by areas relatively early in the processing hierarchy (MT or earlier) whose
neurons have high temporal resolution (Borghuis et al 2003). These mechanisms transform the rapidly varying stimulus information into a constant signal. That is, the rapidly
varying (high temporal frequency) pattern of light and dark as the bars pass by in a
particular direction is signaled by a relatively constant (low temporal frequency) firing
by direction-selective cells. Hence, any subsequent loss of high temporal frequencies,
such as may occur at more central stages, will preserve the now low-frequency motion
signal. In the case of words and Chinese characters, the poor temporal resolution may
result from the lexical recognition mechanisms being situated after a stage which averages
over several dozen milliseconds or more.
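The argument above can be illustrated with a toy simulation, sketched below under simplifying assumptions of our own: a central stage is modelled as a 50 ms moving average (the window length is hypothetical, chosen only to stand for "several dozen milliseconds"), the local luminance at one retinal patch as bars drift past at 20 Hz is a sinusoid, and the output of a direction-selective unit is idealised as a constant. Averaging removes the 20 Hz fluctuation almost entirely, passes a slow 2 Hz modulation nearly unchanged, and leaves the constant direction signal intact.

```python
import math

def boxcar_average(signal, window):
    """Causal moving average over the most recent `window` samples."""
    out, running = [], 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]
        out.append(running / min(i + 1, window))
    return out

def residual_amplitude(signal, window):
    """Peak |output| of the averaging stage once its window has filled."""
    return max(abs(v) for v in boxcar_average(signal, window)[window:])

FS = 1000      # samples per second (illustrative)
WINDOW = 50    # 50 samples = 50 ms averaging window (hypothetical)
t = [i / FS for i in range(FS)]

# Local luminance at one retinal patch as bars drift past at 20 Hz.
luminance_20hz = [math.sin(2 * math.pi * 20 * s) for s in t]
# A slow 2 Hz luminance modulation, for comparison.
luminance_2hz = [math.sin(2 * math.pi * 2 * s) for s in t]
# Idealised direction-selective output: constant while the bars drift.
direction_signal = [1.0] * FS

for name, sig in [("20 Hz luminance", luminance_20hz),
                  ("2 Hz luminance", luminance_2hz),
                  ("direction signal", direction_signal)]:
    print(f"{name}: {residual_amplitude(sig, WINDOW):.3f}")
```

A boxcar is used only because it is the simplest averaging stage; with a 50-sample window at 1000 samples per second it happens to null 20 Hz exactly, and any low-pass stage with a cutoff of a few hertz would make the same qualitative point.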
Unfortunately, extant models of word processing have yet to address these temporal
binding issues (McClelland and Rumelhart 1981; Brysbaert and Vitu 1998; Coltheart
et al 2001; Engbert et al 2002; Pollatsek et al 2003; Reilly and Radach 2003; Davis,
submitted). Indeed, usually no attempt is made to situate the efficiency of word
processing in the context of other visual judgments. Instead, word models in the literature
are designed primarily to explain differences in response time due to psycholinguistic
factors such as lexicality, word frequency, length effects, and priming effects. A partial
exception is the modeling work of Graboi and Lisman (2003), who grapple directly
with some neurophysiological constraints on processing time. Their model does not
address our repetitive-presentation method, but it appears to predict that, with a
two-alternative task like ours, recognition should occur in only a few processing cycles, or
about 50 ms or less. This is, of course, quite different from the thresholds found here,
which were uniformly greater than 100 ms. If word-perception models were changed
to accommodate temporal binding thresholds, this might help the models account for
normal and rapid serial-presentation reading as well.
Progress in understanding visual thresholds for words, as opposed to response
times, has been halting, and computational models that explain word perception do
not seem to address the recent psychophysical threshold findings. Analysis to date,
however, indicates that contrast and lateral-masking thresholds for words can be
explained by independent detection of letters, without holistic word processing (Pelli
et al 2003; Martelli et al 2005). This is quite different from contrast thresholds for
certain static forms and motions (Morrone et al 1995; Wilson and Wilkinson 1998),
in which detectors integrate efficiently over the constituent elements. Because the same
stimuli and visual judgments that pool efficiently over space also have precise temporal
thresholds, we see a parallel here. Perhaps a specialised mechanism in the visual system
confers both benefits, spatial efficiency and temporal precision, on the judgments it serves.
The fast thresholds found to date are robust, in that variants of the forced-choice
task and of the particular stimulus parameters chosen do not change the >15 Hz result
(Burr and Ross 1982; Holcombe and Cavanagh 2001; Clifford et al 2003, 2004), provided
that the total duration of the stimulus train is sufficient (Bodelon et al 2004).
The slow 3 Hz thresholds are also robust, in that different arrangements of the spatially
separated features had little effect on thresholds (Holcombe and Cavanagh 2001).
Given the robustness of previously found fast and slow thresholds to methodological
changes, it is unlikely that changing the particulars of the presentation method would
improve thresholds for pairing spatially separated letters by much. The present
results suggest, however, that high-level factors like pronounceability can modulate the
slow thresholds. This is in agreement with the present theory. To understand why, first
consider the robustness of the fast thresholds. In cases where early visual mechanisms
bind the relevant visual information, if the observer is to make the appropriate response,
the visual representation must subsequently be recognised by central processes and
consolidated into memory. These central processes will certainly take time, but thanks
to the temporal transparency phenomenon the relevant visual representation is
continuously available to central processes (Holcombe 2001). Hence, with a long train of
stimuli any reasonable duration requirement of central processing is eventually met, as
long as the temporal frequency does not exceed the resolution of the early visual-binding
mechanism. The situation is quite different when temporal transparency does not occur,
as with the present linguistic material. Then central processes no longer have the
uninterrupted, extended duration of the stimulus train available for recognition and
memory. Instead, without temporal transparency a particular stimulus is available
centrally only in a series of short episodes, each interrupted by a short episode of the
other stimulus. The duration of every episode corresponds to the interstimulus interval.
Because visual recognition and memory encoding and consolidation are both likely to
require a substantial amount of sustained processing time (Jolicoeur and Dell'Acqua
1998; Lawson and Jolicoeur 2003), factors that affect the duration of these processes
would also affect the temporal threshold in our tasks. In other words, when peripheral
processes do not transform the intermittent stimulus into a constant representation,
cognitive stages can be the rate-limiting step. The unpronounceable strings in our
experiment likely had longer central encoding times, and this may have yielded their
slower thresholds.
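The contrast between the two cases can be caricatured in a few lines. In the hypothetical model below, a stimulus is reported correctly only if central processes have it for an uninterrupted stretch at least as long as a required encoding time. With temporal transparency that stretch is the whole stimulus train, so only the early mechanism's resolution limits the rate; without it the stretch is a single episode, whose duration the text equates with the interstimulus interval. All parameter values (a 2 s train, a 30 ms early limit, 100 ms versus 150 ms encoding times) are illustrative assumptions, not estimates from our data.

```python
def available_ms(isi_ms, train_ms, transparent):
    """Longest uninterrupted stretch for which one stimulus is available centrally."""
    if transparent:
        # Early binding plus temporal transparency: the bound representation
        # is available for the whole stimulus train.
        return train_ms
    # No transparency: the stimulus is available only for one episode
    # before the other stimulus interrupts it.
    return isi_ms

def can_report(isi_ms, train_ms, encoding_ms, early_limit_ms, transparent):
    """True if the toy observer can encode the stimulus at this interval."""
    if transparent and isi_ms < early_limit_ms:
        return False  # faster than the early binding mechanism can resolve
    return available_ms(isi_ms, train_ms, transparent) >= encoding_ms

def shortest_isi(train_ms, encoding_ms, early_limit_ms, transparent,
                 candidates=range(10, 301, 10)):
    """Shortest candidate interval (ms) the toy observer can handle, or None."""
    ok = [isi for isi in candidates
          if can_report(isi, train_ms, encoding_ms, early_limit_ms, transparent)]
    return min(ok) if ok else None

TRAIN_MS, EARLY_LIMIT_MS = 2000, 30  # illustrative values only
for label, encoding_ms in [("faster central encoding", 100),
                           ("slower central encoding", 150)]:
    print(label,
          "| no transparency:", shortest_isi(TRAIN_MS, encoding_ms, EARLY_LIMIT_MS, False),
          "| with transparency:", shortest_isi(TRAIN_MS, encoding_ms, EARLY_LIMIT_MS, True))
```

In this caricature a longer central encoding time, as we suggest for the unpronounceable strings, translates directly into a longer threshold interval, whereas for early-bound stimuli it has no effect on the threshold.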
Interestingly, the advantage here for words over unpronounceable nonwords,
although small when compared to the full range of resolution limits exhibited in visual
performance, is sizeable by the standards of previous studies. In these previous studies
using brief presentation of linguistic material, the letter string was presented just once
in a trial, and usually post-masked. Duration identification thresholds were shorter
for words than for nonwords (eg Baron and Thurston 1973; Manelis 1974). Typically,
however, participants were not informed of the alternatives before the stimulus was
presented, allowing memory factors and bias to play a large role. Most published
results from forced-choice designs did not find an advantage for words (Bjork and
Estes 1973; Thompson and Massaro 1973). In the forced-choice study which did find a
word advantage (Smith and Haviland 1972), the advantage was quite small, much smaller
than that found with the repetitive-presentation paradigm of this paper. A 4%–7%
difference in accuracy was found in that study, whereas our study yielded an accuracy
difference of 16% at the shortest duration in the single-presentation case and an even
larger one in the repeated-presentation experiment (as much as 22%).
It is always difficult to be sure that a difference between words and nonwords has
not been caused by the greater ease with which words are remembered. In the present
study we attempted to minimise the role of memory by presenting the four possible
stimuli both before and immediately after each trial, but this does not guarantee that
the observers invested the time necessary to completely internalise the options. If they
did not, the poorer thresholds in the nonwords condition could be explained by a
difference in the degree to which the alternatives were held in working memory during
the stimulus viewing period. Even if the alternatives are learned well, differences in
memory consolidation and retention processes could still result in poorer thresholds
for nonwords. Determining the processing stage(s) at which words have an advantage
over nonwords will require significant further work. One likely reason that the difference
found here between words and unpronounceable strings is large by historical standards
is the novel repeated-presentation methodology. This method also provides some other
advantages over traditional methods, as discussed in the following section.
5.1 Masking, single presentation, and multiple presentation
In single-presentation conditions, the SOA between word and mask can be less than
40 ms and still the word can be reliably discriminated (Manelis 1974). This result was
replicated in the present work. But the result with the novel repetitive temporal-binding
paradigm was quite different: 75% thresholds of 116 ms, even though the
same items were used in both conditions. The present experiments, then, show that
differences between the single-presentation and repetitive-presentation literatures are
due to method rather than material.
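Thresholds in this paper are expressed both in milliseconds (116 ms here) and as alternation frequencies (5 Hz or less in the abstract). The helper below converts between the two under an assumed convention that is not spelled out in this section: one alternation cycle consists of word A followed by word B, each shown for the stated duration with no blank interval. Under that assumption 116 ms per word corresponds to about 4.3 Hz, and 5 Hz to exactly 100 ms per word, which agrees with thresholds being uniformly greater than 100 ms. The function names are hypothetical.

```python
def alternation_rate_hz(ms_per_word):
    """Full-cycle alternation rate in Hz.

    ASSUMED convention: one cycle = word A then word B, each shown for
    ms_per_word with no blank frames between them.
    """
    return 1000.0 / (2.0 * ms_per_word)

def presentation_ms(rate_hz):
    """Duration of each word (ms) for a given alternation rate, same assumption."""
    return 1000.0 / (2.0 * rate_hz)

print(f"116 ms per word -> {alternation_rate_hz(116):.1f} Hz")
print(f"5 Hz -> {presentation_ms(5):.0f} ms per word")
```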
A major factor distinguishing single from repetitive presentation is the difference
in the time scale over which the target's visual information is available. This difference
may go a long way towards explaining the much longer thresholds found in repetitive
presentation. Consider that the repetitive-presentation condition was designed to be limited
by the temporal precision with which the word parts are represented. In the
repetitive-presentation condition, if the system loses enough precision in its representation of
which letters were presented when, performance should be at chance. In the
single-presentation case, temporal imprecision is of little consequence: as long as each of the
letters is perceived, it does not matter whether the system represents their time of
occurrence precisely.
Furthermore, in single presentation, even when the letter strings are presented so
briefly that the visual system integrates the letter string with the conventional
`XXXX' masks, information may still be available to determine which letter string
was presented. That is, summing the mask and the letter string does not obliterate cues
to the identity of the letter string. This is in contrast to the repetitive-presentation case,
where summing successive stimuli obliterates all cues to which of the two alternatives
was presented.
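The point about summation can be made concrete by treating complete temporal summation, a simplifying assumption, as superimposing the letters at each position. In the sketch below, the alternating pair `ball'/`deck' produces exactly the same per-position superposition as `dell'/`back', so the summed image carries no cue to the trial type, whereas a single word summed with an `XXXX' mask still differs between `ball' and `dell'. The function name and the per-position representation are illustrative.

```python
from collections import Counter

def summed_image(*strings):
    """Superimpose equal-length strings position by position.

    Each position becomes a multiset (Counter) of the letters shown there,
    an idealisation of complete temporal summation of the displays.
    """
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all strings must have the same length")
    return [Counter(s[i] for s in strings) for i in range(length)]

# Repetitive presentation: the two trial types sum to identical images.
rep_ball_deck = summed_image("ball", "deck")
rep_dell_back = summed_image("dell", "back")
print("repetitive, summed images identical:", rep_ball_deck == rep_dell_back)

# Single presentation with a mask: the summed images still differ.
single_ball = summed_image("ball", "XXXX")
single_dell = summed_image("dell", "XXXX")
print("single + mask, summed images identical:", single_ball == single_dell)
```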
In efforts to determine the processing stages that lexical factors modulate,
researchers should attempt to minimise differences in the attention allocated to words
versus nonwords. The repetitive-presentation technique is likely to be less affected by the
dynamics of attention than is single presentation. Certain words may attract attention
more than do nonwords (Mack and Rock 1998), which may contribute to the advantage
of words in single presentation, where attention must be allocated at the right instant.
If attention is accidentally engaged at the wrong time, subsequent stimuli can easily
be missed (Duncan et al 1994). This is potentially a problem in simple masked displays,
whereas the consequences of occasionally engaging attention on the wrong stimulus are
less serious when targets are the only things presented, and are presented repeatedly.
As we have seen above, differences in the information available between the single-
and repetitive-presentation methods at a given SOA may explain the difference in
temporal thresholds, and this should be considered the most likely explanation of
the difference. Still, there remains the possibility of an actual difference in the way the
stimuli are treated in the two conditions. It is conceivable that repetitive
presentation itself changes the nature of the processing that occurs. Specifically,
the temporal integration window might expand to reflect the temporally
extended input of the repetitive-presentation condition. This could lead to the slow
alternation thresholds found here despite temporally precise processing when words
are presented in non-repetitive fashion. The accurate performance at high rates of
non-repetitive presentation found by Potter and others (Rubin and Turano 1992; Potter
1993), as well as by Sperling et al (1971), inspires this suggestion that single presentation
may indeed result in shorter integration intervals. Yet this possibility remains doubtful,
because all the reasons described above for better performance with non-repetitive
presentation apply to this literature as well.
5.2 Future directions
The present results argue for a model in which the temporal precision of binding is
fine for efficient, specialised visual mechanisms, but coarse for judgments that rely on
higher-level mechanisms, such as judgments of linguistic material. This is consistent with the
theory that high-level mechanisms subserving explicit judgments have access only to
coarse visual time scales.
In the tasks used in the present paper, only explicit judgments were solicited from
the study participants. It is possible that high-level mechanisms, such as those which
extract lexical information, process visual information with high temporal resolution,
but that this information does not become available to explicit judgments. The existence of semantic priming from words even with very brief exposures (eg Greenwald
et al 1996) is suggestive here, although the concerns described above for single presentation do apply. Still, Potter and her colleagues (Potter 1993; O'Connor and Potter
2002) have provided evidence that individual items are processed to a semantic level
when embedded in a rapid stream of items, and that these items can be recalled only
if they are conceptually related to other items in the stream. To test the temporal
precision of this sort of implicit word processing, a neuroimaging study or behavioural
study of priming should be carried out, using an appropriate methodology such as
the repetitive-presentation method introduced here.
Acknowledgments. We thank Liqiang Huang for assistance with the Chinese aspects of the
study, including his work to identify Chinese character pairs that could be used in our paradigm.
Bill Holcombe and Janice Lai also contributed. Tom Sanocki, John Jacobson, and Catherine Harris
commented extensively on the manuscript and Mark Elliott provided stimulating feedback.
Discussions with Colin Davis were very beneficial.
References
Baayen R H, Piepenbrock R, Rijn H van, 1995 The CELEX Lexical Database. Release 2 [CD-ROM]
(Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania)
Baer K V von, 1864 ``Welche Auffassung der lebendigen Natur ist die richtige? Und wie ist diese
Auffassung auf die Entomologie anzuwenden?'' [Which view of living nature is correct? And how
is this view to be applied to entomology?], in Reden gehalten in wissenschaftlichen Versammlungen
und kleinere Aufsätze vermischten Inhalts Ed. H Schmitzdorff (St Petersburg: Verlag der kaiserlichen
Hofbuchhandlung) pp 237–283
Baron J, Thurston I, 1973 ``An analysis of the word superiority effect'' Cognitive Psychology 4 207–228
Beaudot W H, 2002 ``Role of onset asynchrony in contour integration'' Vision Research 42 1–9
Bjork E, Estes W K, 1973 ``Letter identification in relation to linguistic context and masking
conditions'' Memory & Cognition 1 217–223
Bodelon C, Fallah M, Reynolds J H, 2004 ``Temporal resolution of orientation/color conjunctions'',
paper presented at the Society for Neuroscience Annual Meeting, San Diego, California
Borghuis B G, Perge J A, Vajda I, Wezel R J van, Grind W A van de, Lankheet M J, 2003 ``The
motion reverse correlation (MRC) method: a linear systems approach in the motion domain''
Journal of Neuroscience Methods 123 153–166
Burr D C, Ross J, 1982 ``Contrast sensitivity at high velocities'' Vision Research 22 479–484
Brysbaert M, Vitu F, 1998 ``Word skipping: Implications for theories of eye movement control
in reading'', in Eye Guidance in Reading and Scene Perception Ed. G Underwood (New York:
Elsevier) pp 125–147
Chen H C, 1984 ``Detecting radical component of Chinese characters in visual reading'' Chinese
Journal of Psychology 26 29–34
Clifford C W G, Arnold D H, Pearson J, 2003 ``A paradox of temporal perception revealed by
a stimulus oscillating in colour and orientation'' Vision Research 43 2245–2253
Clifford C W, Holcombe A O, Pearson J, 2004 ``Rapid global form binding with loss of associated
colors'' Journal of Vision 4 1090–1101 (http://journalofvision.org/4/12/8/, DOI:10.1167/4.12.8)
Coltheart M, Rastle K, Perry C, Langdon R, Ziegler J, 2001 ``DRC: a dual route cascaded model
of visual word recognition and reading aloud'' Psychological Review 108 204–256
Crick F, Koch C, 1995 ``Are we aware of neural activity in primary visual cortex?'' Nature 375
121–123
Crick F, Koch C, 2003 ``A framework for consciousness'' Nature Neuroscience 6 119–126
Da J, 2004 ``A corpus-based study of character and bigram frequencies in Chinese e-texts and its
implications for Chinese language instruction'', in Proceedings of the Fourth International
Conference on New Technologies in Teaching and Learning Chinese Eds Z Pu, T Xie, J Xu (Beijing:
Tsinghua University Press) pp 501–511
Dakin S C, Bex P J, 2002 ``Role of synchrony in contour binding: some transient doubts sustained''
Journal of the Optical Society of America A 19 678–686
Davis C J, 2005 ``N-Watch: A program for deriving neighborhood size and other psycholinguistic
statistics'' Behavior Research Methods 37 65–70
Davis C J, submitted ``The SOLAR (Self-Organizing Lexical Acquisition and Recognition) model
of visual word identification, Part 1: Orthographic input coding and lexical matching''
Davis C J, Bowers J S, 2004 ``What do letter migration errors reveal about letter position coding in
visual word recognition?'' Journal of Experimental Psychology: Human Perception and Performance 30 923–941
Duncan J, Humphreys G W, 1989 ``Visual search and stimulus similarity'' Psychological Review
96 433–458
Duncan J, Ward R, Shapiro K, 1994 ``Direct measurement of attentional dwell time in human
vision'' Nature 369 313–316
Egeth H, Jonides J, Wall S, 1972 ``Parallel processing of multielement displays'' Cognitive Psychology
3 674–698
Engbert R, Longtin A, Kliegl R, 2002 ``A dynamical model of saccade generation in reading based
on spatially distributed lexical processing'' Vision Research 42 621–636
Fabre-Thorpe M, Delorme A, Marlot C, Thorpe S, 2001 ``A limit to the speed of processing ultra-rapid
visual categorization of novel natural scenes'' Journal of Cognitive Neuroscience 13
171–180
Forte J, Hogben J H, Ross J, 1999 ``Spatial limitations of temporal segmentation'' Vision Research
39 4052–4061
Geissler H G, Schebera F U, Kompass R, 1999 ``Ultra-precise quantal timing: Evidence from
simultaneity thresholds in long-range apparent movement'' Perception & Psychophysics 61
707–726
Graboi D, Lisman J, 2003 ``Recognition by top–down and bottom–up processing in cortex: the
control of selective attention'' Journal of Neurophysiology 90 798–810
Greenwald A G, Draine S C, Abrams R L, 1996 ``Three cognitive markers of unconscious semantic
activation'' Science 273 1699–1702
Gur M, Snodderly M, 1997 ``A dissociation between brain activity and perception: Chromatically
opponent cortical neurons signal chromatic flicker that is not perceived'' Vision Research 37
377–382
Harris C L, Morris A L, 2001 ``Illusory words created by repetition blindness: a technique for
probing sublexical representations'' Psychonomic Bulletin & Review 8 118–126
Hochstein S, Ahissar M, 2002 ``View from the top: hierarchies and reverse hierarchies in the visual
system'' Neuron 36 791–804
Holcombe A O, 2001 ``A purely temporal transparency mechanism in the visual system'' Perception
30 1311–1320
Holcombe A O, Cavanagh P, 2001 ``Early binding of feature pairs for visual perception'' Nature
Neuroscience 4 127–128
Holcombe A O, Kanwisher N, Treisman A, 2001 ``The midstream order deficit'' Perception & Psychophysics 63 322–329
Howes D H, Solomon R L, 1951 ``Visual duration threshold as a function of word probability''
Journal of Experimental Psychology 41 401–410
Hue C W, 1992 ``Recognition processes in character naming'', in Language Processing in Chinese
Eds H C Chen, O J L Tzeng (Amsterdam: Elsevier) pp 93–107
Inhoff A W, Starr M, Shindler K L, 2000 ``Is the processing of words during eye fixations in
reading strictly serial?'' Perception & Psychophysics 62 1474–1484
Johnson J S, Olshausen B A, 2003 ``Timecourse of neural signatures of object recognition''
Journal of Vision 3 499–512 (http://journalofvision.org/3/7/4, DOI:10.1167/3.7.4)
Jolicoeur P, Dell'Acqua R, 1998 ``The demonstration of short-term consolidation'' Cognitive Psychology
36 138–202
Kandil F I, Fahle M, 2003 ``Mechanisms of time-based figure–ground segregation'' European
Journal of Neuroscience 18 2874–2882
Kinoshita S, Lupker S J (Eds), 2003 Masked Priming: The State of the Art (New York: Psychology
Press)
Kline K, Holcombe A O, Eagleman D M, 2004 ``Illusory motion reversal is caused by rivalry,
not by perceptual snapshots of the visual field'' Vision Research 44 2653–2658
Kline K, Holcombe A O, Eagleman D M, 2006 ``Illusory motion reversal does not imply discrete
processing: Reply to Rojas et al.'' Vision Research 46 1158–1159
Lawson R, Jolicoeur P, 2003 ``Recognition thresholds for plane-rotated pictures of familiar objects''
Acta Psychologica 112 17–41
McCandliss B D, Cohen L, Dehaene S, 2003 ``The visual word form area: expertise for reading
in the fusiform gyrus'' Trends in Cognitive Sciences 7 293–299
McClelland J L, Mozer M C, 1986 ``Perceptual interactions in two-word displays: familiarity and
similarity effects'' Journal of Experimental Psychology: Human Perception and Performance 12
18–35
McClelland J L, Rumelhart D E, 1981 ``An interactive activation model of context effects in letter
perception: Part 1. An account of basic findings'' Psychological Review 88 375–407
Mack A, Rock I, 1998 Inattentional Blindness (Cambridge, MA: MIT Press)
Manelis L, 1974 ``The effect of meaningfulness in tachistoscopic word perception'' Perception &
Psychophysics 16 182–192
Martelli M, Majaj N J, Pelli D G, 2005 ``Are faces processed like words? A diagnostic test for
recognition by parts'' Journal of Vision 5 58–70 (http://journalofvision.org/5/1/6/, DOI:10.1167/5.1.6)
Morgan M J, Castet E, 1995 ``Stereoscopic depth perception at high velocities'' Nature 378 380–383
Morrone M C, Burr D C, Vaina L M, 1995 ``Two stages of visual processing for radial and
circular motion'' Nature 376 507–509
Mozer M C, 1983 ``Letter migration in word perception'' Journal of Experimental Psychology: Human
Perception and Performance 9 531–546
Murray W S, Forster K I, 2004 ``Serial mechanisms in lexical access: the rank hypothesis'' Psychological Review 111 721–756
O'Connor K J, Potter M C, 2002 ``Constrained formation of object representations'' Psychological
Science 13 106–111
Pelli D G, Farell B, Moore D C, 2003 ``The remarkable inefficiency of word recognition'' Nature
423 752–756
Pollatsek A, Reichle E D, Rayner K, 2003 ``Modeling eye movements in reading: Extensions of the
E-Z Reader model'', in The Mind's Eye: Cognitive and Applied Aspects of Eye Movement
Research Eds J Hyona, R Radach, H Deubel (Oxford: Elsevier) pp 361–390
Potter M C, 1993 ``Very short-term conceptual memory'' Memory & Cognition 21 156–161
Ramachandran V S, Rogers-Ramachandran D C, 1991 ``Phantom contours: A new class of visual
patterns that selectively activates the magnocellular pathway in man'' Bulletin of the Psychonomic
Society 29 391–394
Rayner K, Ashby J, Pollatsek A, Reichle E D, 2004 ``The effects of frequency and predictability
on eye fixations in reading: implications for the E-Z Reader model'' Journal of Experimental
Psychology: Human Perception and Performance 30 720–732
Reilly R, Radach R, 2003 ``Foundations of an interactive activation model of eye movement control in reading'', in The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research
Eds J Hyona, R Radach, H Deubel (Oxford: Elsevier) pp 429–455
Reulen J P H, Marcus J T, Koops D, Vries F R de, Tiesinga G, Boshuizen K, Bos J E, 1988 ``Precise
recording of eye movement: the IRIS technique. Part 1'' Medical & Biological Engineering &
Computing 26 20–26
Rousselet G A, Fabre-Thorpe M, Thorpe S J, 2002 ``Parallel processing in high-level categorization of natural images'' Nature Neuroscience 5 629–630
Rubin G S, Turano K, 1992 ``Reading without saccadic eye movements'' Vision Research 32 895–902
Seidenberg M S, 1985 ``The time course of phonological code activation in two writing systems''
Cognition 19 1–30
Shallice T, McGill J, 1978 ``The origins of mixed errors'', in Attention and Performance VII
Ed. J Requin (Hillsdale, NJ: Lawrence Erlbaum Associates) pp 193–208
Smith E E, Haviland S E, 1972 ``Why words are perceived more accurately than nonwords:
inference versus unitization'' Journal of Experimental Psychology 92 59–64
Sperling G, Budiansky J, Spivak J, Johnson M C, 1971 ``Extremely rapid visual search: The maximum rate of scanning letters for the presence of a numeral'' Science 174 307–311
Stroop J R, 1935 ``Studies of interference in verbal reactions'' Journal of Experimental Psychology
18 643–662
Stroud J M, 1956 ``The fine structure of psychological time'', in Information Theory in Psychology
Ed. H Quastler (Glencoe, IL: Free Press) pp 140–207
Tao L, Healy A F, Bourne L E, 1997 ``Unitization in second-language learning: Evidence from
letter detection'' American Journal of Psychology 110 385–395
Thompson M C, Massaro D W, 1973 ``Visual information and redundancy in reading'' Journal
of Experimental Psychology 98 49–54
Thorpe S, Fize D, Marlot C, 1996 ``Speed of processing in the human visual system'' Nature 381
520–522
Treisman A, Souther J, 1986 ``Illusory words: the role of attention and of top–down constraints
in conjoining letters to form words'' Journal of Experimental Psychology: Human Perception
and Performance 12 3–17
Tyler C W, Hardage L, Miller R T, 1995 ``Multiple mechanisms for the detection of mirror
symmetry'' Spatial Vision 9 79–100
Tzeng O J, Hung D L, Cotton B, 1979 ``Visual internalisation effect in reading Chinese characters''
Nature 282 499–501
Usher M, Donnelly N, 1998 ``Visual synchrony affects binding and segmentation in perception''
Nature 394 179–182
VanRullen R, Koch C, 2003 ``Is perception discrete or continuous?'' Trends in Cognitive Sciences
7 207–213
VanRullen R, Thorpe S J, 2001 ``The time course of visual processing: From early perception to
decision-making'' Journal of Cognitive Neuroscience 13 454–461
Wilson H R, Wilkinson F, 1998 ``Detection of global structure in Glass patterns: implications for
form vision'' Vision Research 38 2933–2947
© 2007 a Pion publication
ISSN 0301-0066 (print)
ISSN 1468-4233 (electronic)
www.perceptionweb.com