The perception of spatial order at a glance

Vision Research 45 (2005) 1085–1090
www.elsevier.com/locate/visres
Brief Communication
The perception of spatial order at a glance
Ariella V. Popple *, Dennis M. Levi
School of Optometry and Helen Wills Neuroscience Institute, Berkeley, University of California at Berkeley, Gayley Road,
Berkeley, CA 94720-2020, USA
Received 25 August 2004; received in revised form 4 November 2004
Abstract
Spatial order is the organizing principle of the visual areas in the brain. But to what extent does this spatial mapping help us see
where things are? Observers trained to perfectly recall the spatial order of seven items presented simultaneously for 5 s were asked to
report their order when flashed for only 150 ms. We found that the capacity for perceiving the order of these brief stimuli was limited
by their spacing. Five or six widely-spaced stimuli were seen in the correct order, but only four crowded stimuli. Regardless of spacing and set-size, confusions between neighbors were unexpectedly frequent, suggesting there is positional as well as object
uncertainty.
Ó 2004 Elsevier Ltd. All rights reserved.
Keywords: Capacity; Noise; Crowding; Attention; Uncertainty
1. Introduction
The visual areas of the brain contain multiple mappings of the retinal images (Sereno et al., 1995). Some
overlap, like orientation and ocular dominance maps
in V1 (Hubel & Wiesel, 1962). Others occupy different
regions, such as the colour-pattern and spatio-temporal
maps in V4 and MT (DeYoe, Felleman, Van Essen, &
McClendon, 1994; Shipp & Zeki, 1985; Tootell, Tsao,
& Vanduffel, 2003). We understand how the individual
response properties of neurons in these cortical regions
contribute to the discrimination of visual stimuli (Barlow, Kaushal, Hawken, & Parker, 1987; Britten & Newsome, 1998; Skottun, Bradley, Sclar, Ohzawa, &
Freeman, 1987), but little is known about the influence
of their spatial organization on perception. For example, although both orientation and ocular dominance
are mapped in V1, when we see a line with one eye we
*
Corresponding author. Tel.: +1 510 643 8685; fax: +1 510 643
8733.
E-mail address: [email protected] (A.V. Popple).
0042-6989/$ - see front matter Ó 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.visres.2004.11.008
can tell if itÕs tilted, but not which eye is viewing it (Templeton & Green, 1968).
Early work on visual search suggested we need attention to bind features together in the same location (Treisman, 1998; Treisman & Gelade, 1980), a finding
inconsistent with the availability of precise location
information from the retinotopic maps in the visual cortex. Subsequent studies, however, showed that location
is readily available for individual features (Sagi & Julesz,
1985). More recently, in a task where either the identity
and then the location of a target were reported, or its
location and then its identity, the second property
reported was always less precise (Di Lollo, Kawahara,
Zuvic, & Visser, 2001).
Having two tasks instead of just one can limit the
amount of information available, and hence the precision in each task, because of limited attentional resources. Visual factors, such as crowding, can also
influence the availability of information concerning the
identity and location of stimuli. Groups of letters, or
other stimuli, are easier to recognize, especially in
peripheral vision, when their spacing is scaled to more
1086
A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090
than half their eccentricity (Bouma, 1970; Hess, Dakin,
& Kapoor, 2000; Levi, Hariharan, & Klein, 2002; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001).
Koenderink and Van Doorn (1999) proposed a model
in which the visual scene is coded imprecisely, with a
loose idea of what stimuli are present within each
coarse-grained region of space. This model suggests that
the low-level representation of items in visual space is
noisy, leading to localization errors even under conditions of full attention. To compare the two models,
noisy representation vs. limited resources, we studied
the perception of spatial order while varying the number
of items in the display, and their separation.
2. Methods
2.1. Observers
Three observers with normal visual acuity participated in the experiment, including author AP.
2.2. Stimuli
Our stimuli were colored discs (or black letters), each
chosen from a set of eight, which allowed us to examine
the perceived order and correct selection of up to seven
items. A subset of items with no repetition was selected
randomly on each trial. They were positioned along the
arc of an invisible circle centered on fixation (Fig. 1). To
offset changes in cortical magnification with eccentricity
(Levi, Klein, & Aitsebaomo, 1985; Van Essen, Newsome, & Maunsell, 1984) the angular separation of the
stimuli along the arc remained constant, at 15%, 30%
or 60% of the radius, while eccentricity varied from 1°
(central)–13° (parafoveal). Stimuli appeared randomly
above or below centre, making central fixation the optimal strategy, with attention spread over the visual field.
The 150 ms stimulus interval was too brief to initiate a
saccadic eye-movement (typical latency 200–250 ms),
or to attend to more than 3–5 items in sequence (assuming 30–50 ms per item).
Stimuli were scaled in diameter to 2/3 their centreseparation, so that eccentricity and separation effects
would not be confounded with changes in local contextual influences (i.e., the ratio of size to separation was
constant). Stimuli were presented on a grey background
(CIE coordinates x = 0.29, y = 0.32, lum = 35 cd m2),
with ambient light from an overhead neon strip bulb.
They were colored discs (CIE coordinates in x, y, lum/
cd m2: black = 0.27, 0.27, 1; white = 0.28, 0.30, 127;
red = 0.62, 0.34, 23; pink = 0.29, 0.14, 42; blue = 0.15,
0.07, 14; green = 0.29, 0.61, 50; yellow = 0.42, 0.51,
105; orange = 0.51, 0.43, 53) or black letters rotated to
lie along a circle centered on fixation (A, L, O, U, R,
T, X, Z in Zurich Extended true-type font http://
Fig. 1. Sample stimuli at set-size 7, spacing 30% eccentricity. (a)
Colors, (b) letters. Observers initiated 150 ms trials while attending the
central fixation marker. The perceived sequence of letters or colors was
then typed in as a response, and the correct sequence given as
feedback.
www.clipserver.de/Fonts/Z.htm). Viewing was monocular, and to avoid the blindspot parafoveal stimuli at
set-sizes 6–7 and 60% spacing were shifted by one position left or right depending on the viewing eye. In all
other conditions, stimuli were centered on the vertical
midline as shown in Fig. 1.
2.3. Design
The task was to report the colors (or letters) seen, by
typing them in the correct order. Responses to colors
were 1st letters of color names, except blue which was
coded as ÔuÕ to avoid confusion with black (ÔbÕ). We measured performance and errors as the set size was increased from 1–7 items in successive 40-trial blocks.
2.4. Procedure
Before the experiment, observers were trained to
remember seven items presented for 5 s with 100% accuracy (40 out of 40 trials correct), as short-term memory
capacity normally ranges from 4–9 items (Cowan, 2001;
Miller, 1956).
A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090
1087
the smaller set-size, expressed as ^y ðx 1Þ, over and
above chance, then you might also guess the xth element
correctly. There are n x + 1 possibilities for the identity of the xth element, and x positions where it could
be placed, in relation to the correctly ordered x 1 elements, giving the following expressions for c
^y ðx 1Þ cp ðx 1Þ
guess
:
cp ðxÞ ¼ cpguess ðxÞ þ 1 cpguess ðxÞ
xðn x þ 1Þ
ð4Þ
^y ðx 1Þ cc ðx 1Þ
guess
cc ðxÞ ¼ ccguess ðxÞ þ 1 ccguess ðxÞ
:
nxþ1
ð5Þ
Fig. 2. Sample data for author AP with parafoveal (13° eccentricity),
colored discs, spaced as in Fig. 1(a). Correct responses to permutations
(j) and combinations () are shown with psychometric model fits—
arrows indicate threshold capacity, where 50% of correct responses
were due to chance.
Results were fitted with psychometric functions to
find the threshold capacity for perceiving the correct
combination of items, regardless of their order, and
the correct permutation of ordered items (sample data
in Fig. 2).
Psychometric functions were fitted using Probit
Z x
ðtlÞ2
1
^y ðxÞ ¼ c þ ð1 cÞ pffiffiffiffiffiffi
e 2r2 dt:
ð1Þ
r 2p 1
Expected frequencies in the confusion matrices (E)
were computed from the observed confusion matrices
(O) based on the distribution of errors as follows, where
i and j are row and column indices, and k and l refer to
sums over columns and rows respectively,
P
P
Oil
k;k6¼j Okj
P l;l6¼i :
Eij;i6¼j ¼ P
ð6Þ
k;k6¼i
l;l6¼k Okl
This is the row total (excluding the diagonal, where
responses are always correct) multiplied by the column
total (excluding the diagonal), divided by the sum of
row totals, excluding the diagonal and excluding the
present row, where responses are presently correct.
Expectations are for off-diagonal (error) responses only,
and the total sum of expected errors equals the sum of
errors observed; only their distributions differ.
where x is the set size, ^y the predicted correct frequency
(fitted to obtained values of y), l is the threshold capacity (where half the time correct responses can be attributed to chance), and r is the standard deviation of the
underlying Gaussian curve. The values of l and r were
estimated from the data. In a capacity-limited model r
would be less than one, but we found it generally between 1 and 2 (of 52 fits, 43 were greater than 1, and
zero was included in the 95% confidence interval for r
only 5 times).
The probability of guessing the correct response,
cguess, is the inverse of the number of (ordered) permutations, or (unordered) combinations, given a set-size x of
n possible stimuli (in the experiment, n = 8)
cpguess ðxÞ ¼
x!
:
n!
ð2Þ
ccguess ðxÞ ¼
ðn xÞ!x!
:
n!
ð3Þ
However, there are two ways of getting a correct response by chance, at a given set-size x. First, you might
guess the correct response. Second, if you do not guess
the correct response, but you do correctly respond to
Fig. 3. Results (averaged across observers, eccentricity, and upper and
lower visual fields) show threshold capacities for reporting the correct
permutation and combination of items varied with crowding. Permutation thresholds (j) were somewhat smaller than combination
thresholds (). Crowding occurred at a larger separation for letters
(unfilled) than for colors (filled).
1088
A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090
3. Results
Results were fitted with psychometric functions to
find the threshold capacity for perceiving the correct
combination of items, regardless of their order, and
the correct permutation of ordered items (sample data
in Fig. 2, details in Methods). Threshold capacity varied
with crowding, but not eccentricity (central or parafoveal) or visual field (upper or lower), therefore results
were collapsed across these variables. Fig. 3 shows that,
on average, about 4 crowded items (spaced at 15–30%
eccentricity) were seen in the correct order, whereas be-
tween 5 and 6 uncrowded items (spaced at 60% eccentricity) could be perceived. Crowding occurred at a
larger separation for letters than colors, not surprising
since small-scale features were required to identify the
letters, whereas colors were discriminated based on a
global property. Combination capacity was slightly
greater than permutation capacity, suggesting some
positional uncertainty in the stimulus representation;
however this difference was rarely statistically significant
(error bars indicate 95% confidence intervals). Based on
the number of correct responses, we cannot be sure
whether errors were due to object uncertainty (confusion
Fig. 4. (a) Sample confusion matrix for set-size 7 at 15% spacing in observer AP. Responses in each position are plotted against stimulus position,
with color indicating the number of responses, from black (0/40) to white (39/40). Note the serial position effect, shown by the gradation of shading
along the main diagonal. Note the prevalence of neighbor errors, shown by the responses along the neighboring diagonals. (b) Expected confusion
matrix for the condition in (a). The serial position effect was taken into account in the computation of expected errors, as shown by the prevalence of
errors toward positions where correct responses were few (dark grey areas on the diagonal). Note there are clearly fewer neighbor errors than in (a).
(c) The ratio of observed to expected errors was significantly greater than 1 at all set-sizes where sufficient errors were made to perform this
computation (4–7), however there was little difference between the different set sizes. Error bars indicate 95% confidence intervals based on the
student-t distribution computed from 24 of the 26 observers and conditions (2) were excluded as zero errors were expected. (d) The ratio of observed
to expected errors was significantly greater than 1 at all separations, tested for the color data only as the letter condition was not sampled at 15%
separation. Error bars and exclusions as in (c).
A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090
between item identities) or positional uncertainty (confusions item locations).
In addition to modeling correct responses, we studied
the distribution of errors across the stimulus. The proportion of confusions between neighboring positions
was unexpectedly high. To quantify this effect, we computed the expected confusion matrix, based on the proportion of errors in each position (sample data in Fig.
4(a) and (b), see Methods for details). This takes into account the serial position effect along the diagonal of the
confusion matrix. The ratio of observed/expected neighbor errors was about 1.2, regardless of separation or setsize (Fig. 4(c) and (d)).
1089
Here we have shown that the number of things that
can be seen in order is limited by crowding. However,
even when the spacing between items is large, or the
number of items small, positional uncertainty influences
their perceived spatial order. The difference in performance between crowded and uncrowded items agrees
with established theories of coarse-to-fine visual processing (Robson, 1966), extending them from the detection
of isolated stimuli to the integration of information
across the visual scene. Our results are consistent with
an early, somewhat crude, visual representation of the
joint space between object identity and location, which
may later be refined by attention, aiding the perception
of spatial order in prolonged scenes or across eye
fixations.
4. Discussion
Previous studies have found different limits for counting crowded and uncrowded items, similar in magnitude
to the limits we found for ordering items (Atkinson,
Campbell, & Francis, 1976; Jevons, 1871). The spatial
resolution of visual attention is also said to be limited,
particularly in the upper visual field (He, Cavanagh, &
Intriligator, 1996). Poor performance has been attributed
to exceeding these capacity and resolution limits. We propose, instead, that errors are more parsimoniously attributed directly to a noisy stimulus representation, with
both object and positional uncertainty, possibly as a result of lateral interactions between nearby items (Levi
et al., 2002), when the scene is parsed into successively
smaller chunks. Either way, it seems the precisely ordered
detail of the perceived visual world is to some degree illusory, and perhaps only indirectly related to the detailed
topographic map in visual areas of the brain.
Pelli, Martelli, Majaj, Chattergee, and Thompson
(2004) argued for the idea of a Ôconjunction fieldÕ, a minimum area approximately 50% eccentricity in diameter,
in which only holistic recognition, rather than recognition by parts, can occur. Our data directly contradict
this notion. We found that observers were able to identify correctly and in order, most of the time, 3 colors
presented in this region (15% spacing). Additionally,
we found no evidence for the capacity limitation of visual attention or short-term memory suggested by
Verghese and Pelli (1994) and others (Alvarez & Cavanagh, 2004; Reeves & Sperling, 1986). Instead, we found
that perceived spatial order was characterized by neighbor errors, regardless of spacing and set-size (Fig. 4(c)
and (d)). Although there was a considerable serial position effect (diagonals in Fig. 4(a) and (b)), there was no
evidence that the kind of errors changed drastically after
some fixed capacity, or degree of proximity was reached.
The data are consistent with the idea that the underlying
representation of spatial order is noisy, and positions
beyond the first one were insufficiently sampled to fully
overcome this noise.
Acknowledgements
RO1EY01728 from the National Eye Institute, NIH
to D. Levi. With thanks to Peter Neri for comments
on an earlier draft of the manuscript.
References
Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual shortterm memory is set both by visual information load and by number
of objects. Psychological Science, 15, 106–111.
Atkinson, J., Campbell, F. W., & Francis, M. R. (1976). The magic
number 4 ± 0: a new look at visual numerosity judgements.
Perception, 5, 327–334.
Barlow, H. B., Kaushal, T. P., Hawken, M., & Parker, A. J. (1987).
Human contrast discrimination and the threshold of cortical
neurons. Journal of the Optical Society of America A, 4, 2366–2371.
Bouma, H. (1970). Interaction effects in parafoveal letter recognition.
Nature, 226, 177–178.
Britten, K. H., & Newsome, W. T. (1998). Tuning bandwidths for
near-threshold stimuli in area MT. Journal of Neurophysiology, 80,
762–770.
Cowan, N. (2001). The magical number 4 in short-term memory: a
reconsideration of mental storage capacity. Behavioral and Brain
Sciences, 24, 87–114.
DeYoe, E. A., Felleman, D. J., Van Essen, D. C., & McClendon, E.
(1994). Multiple processing streams in occipitotemporal visual
cortex. Nature, 371, 151–154.
Di Lollo, V., Kawahara, J., Zuvic, S. M., & Visser, T. A. (2001). The
preattentive emperor has no clothes: a dynamic redressing. Journal
of Experimental Psychology—General, 130, 479–492.
He, S., Cavanagh, P., & Intriligator, J. (1996). Attentional resolution
and the locus of visual awareness. Nature, 383, 334–337.
Hess, R. F., Dakin, S. C., & Kapoor, N. (2000). The foveal ÔcrowdingÕ
effect: physics or physiology. Vision Research, 40, 365–370.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular
interaction and functional architecture in the catÕs visual cortex.
Journal of Physiology, 160, 106–154.
Jevons, W. S (1871). The power of numerical discrimination. Nature, 3,
281–282.
Koenderink, J. J., & Van Doorn, A. J. (1999). The structure of locally
orderless images. International Journal of Computer Vision, 31,
159–168.
1090
A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090
Levi, D. M., Hariharan, S., & Klein, S. A. (2002). Suppressive and
facilitatory spatial interactions in peripheral vision: peripheral
crowding is neither size invariant nor simple contrast masking.
Journal of Vision, 2, 167–177.
Levi, D. M., Klein, S. A., & Aitsebaomo, A. P. (1985). Vernier acuity,
crowding and cortical magnification. Vision Research, 25, 963–
977.
Miller, G. A. (1956). The magical number seven, plus or minus two:
some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., & Morgan, M.
(2001). Compulsory averaging of crowded orientation signals in
human vision. Nature Neuroscience, 4, 739–744.
Pelli, D. G., Martelli, M., Majaj, N. J., Chattergee, G., & Thompson,
A. (2004). Object recognition. Parts, wholes, and the conjunction
index. Perception. ECVP2004 Abstracts, s583.(http://www.perceptionweb.com/ecvp04/0583.html).
Reeves, A., & Sperling, G. (1986). Attention gating in short-term
visual memory. Psychological Review, 93, 180–206.
Robson, J. G. (1966). Spatial and temporal contrast sensitivity
functions of the human eye. Journal of the Optical Society of
America, 46, 721–739.
Sagi, D., & Julesz, B. (1985). ‘‘Where’’ and ‘‘what’’ in vision. Science,
228, 1217–1219.
Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J.
W., Brady, T. J., et al. (1995). Borders of multiple visual areas in
humans revealed by functional magnetic resonance imaging.
Science, 268, 889–893.
Shipp, S., & Zeki, S. (1985). Segregation of pathways leading from
area V2 to areas V4 and V5 of macaque monkey visual cortex.
Nature, 315, 322–325.
Skottun, B. C., Bradley, A., Sclar, G., Ohzawa, I., & Freeman, R. D.
(1987). The effects of contrast on visual orientation and spatial
frequency discrimination: a comparison of single cells and behavior. Journal of Neurophysiology, 57, 773–784.
Templeton, W. B., & Green, F. A. (1968). Chance results in utrocular
discrimination. Quarterly Journal of Experimental Psychology, 20,
200–203.
Tootell, R. B., Tsao, D., & Vanduffel, W. (2003). Neuroimaging
weighs in: humans meet macaques in ‘‘primate’’ visual cortex.
Journal of Neuroscience, 23, 3981–3989.
Treisman, A. (1998). Feature binding, attention and object perception.
Philosophical Transactions of the Royal Society of London Series
B—Biological Sciences, 353, 1295–1306.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of
attention. Cognitive Psychology, 12, 97–136.
Van Essen, D. C., Newsome, W. T., & Maunsell, J. H. (1984). The
visual field representation in striate cortex of the macaque monkey:
asymmetries, anisotropies, and individual variability. Vision
Research, 24, 429–448.
Verghese, P., & Pelli, D. G. (1994). The scale bandwidth of visual
search. Vision Research, 34, 955–962.