Vision Research 45 (2005) 1085–1090 www.elsevier.com/locate/visres Brief Communication The perception of spatial order at a glance Ariella V. Popple *, Dennis M. Levi School of Optometry and Helen Wills Neuroscience Institute, Berkeley, University of California at Berkeley, Gayley Road, Berkeley, CA 94720-2020, USA Received 25 August 2004; received in revised form 4 November 2004 Abstract Spatial order is the organizing principle of the visual areas in the brain. But to what extent does this spatial mapping help us see where things are? Observers trained to perfectly recall the spatial order of seven items presented simultaneously for 5 s were asked to report their order when flashed for only 150 ms. We found that the capacity for perceiving the order of these brief stimuli was limited by their spacing. Five or six widely-spaced stimuli were seen in the correct order, but only four crowded stimuli. Regardless of spacing and set-size, confusions between neighbors were unexpectedly frequent, suggesting there is positional as well as object uncertainty. Ó 2004 Elsevier Ltd. All rights reserved. Keywords: Capacity; Noise; Crowding; Attention; Uncertainty 1. Introduction The visual areas of the brain contain multiple mappings of the retinal images (Sereno et al., 1995). Some overlap, like orientation and ocular dominance maps in V1 (Hubel & Wiesel, 1962). Others occupy different regions, such as the colour-pattern and spatio-temporal maps in V4 and MT (DeYoe, Felleman, Van Essen, & McClendon, 1994; Shipp & Zeki, 1985; Tootell, Tsao, & Vanduffel, 2003). We understand how the individual response properties of neurons in these cortical regions contribute to the discrimination of visual stimuli (Barlow, Kaushal, Hawken, & Parker, 1987; Britten & Newsome, 1998; Skottun, Bradley, Sclar, Ohzawa, & Freeman, 1987), but little is known about the influence of their spatial organization on perception. For example, although both orientation and ocular dominance are mapped in V1, when we see a line with one eye we * Corresponding author. Tel.: +1 510 643 8685; fax: +1 510 643 8733. E-mail address: [email protected] (A.V. Popple). 0042-6989/$ - see front matter Ó 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.visres.2004.11.008 can tell if itÕs tilted, but not which eye is viewing it (Templeton & Green, 1968). Early work on visual search suggested we need attention to bind features together in the same location (Treisman, 1998; Treisman & Gelade, 1980), a finding inconsistent with the availability of precise location information from the retinotopic maps in the visual cortex. Subsequent studies, however, showed that location is readily available for individual features (Sagi & Julesz, 1985). More recently, in a task where either the identity and then the location of a target were reported, or its location and then its identity, the second property reported was always less precise (Di Lollo, Kawahara, Zuvic, & Visser, 2001). Having two tasks instead of just one can limit the amount of information available, and hence the precision in each task, because of limited attentional resources. Visual factors, such as crowding, can also influence the availability of information concerning the identity and location of stimuli. Groups of letters, or other stimuli, are easier to recognize, especially in peripheral vision, when their spacing is scaled to more 1086 A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090 than half their eccentricity (Bouma, 1970; Hess, Dakin, & Kapoor, 2000; Levi, Hariharan, & Klein, 2002; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001). Koenderink and Van Doorn (1999) proposed a model in which the visual scene is coded imprecisely, with a loose idea of what stimuli are present within each coarse-grained region of space. This model suggests that the low-level representation of items in visual space is noisy, leading to localization errors even under conditions of full attention. To compare the two models, noisy representation vs. limited resources, we studied the perception of spatial order while varying the number of items in the display, and their separation. 2. Methods 2.1. Observers Three observers with normal visual acuity participated in the experiment, including author AP. 2.2. Stimuli Our stimuli were colored discs (or black letters), each chosen from a set of eight, which allowed us to examine the perceived order and correct selection of up to seven items. A subset of items with no repetition was selected randomly on each trial. They were positioned along the arc of an invisible circle centered on fixation (Fig. 1). To offset changes in cortical magnification with eccentricity (Levi, Klein, & Aitsebaomo, 1985; Van Essen, Newsome, & Maunsell, 1984) the angular separation of the stimuli along the arc remained constant, at 15%, 30% or 60% of the radius, while eccentricity varied from 1° (central)–13° (parafoveal). Stimuli appeared randomly above or below centre, making central fixation the optimal strategy, with attention spread over the visual field. The 150 ms stimulus interval was too brief to initiate a saccadic eye-movement (typical latency 200–250 ms), or to attend to more than 3–5 items in sequence (assuming 30–50 ms per item). Stimuli were scaled in diameter to 2/3 their centreseparation, so that eccentricity and separation effects would not be confounded with changes in local contextual influences (i.e., the ratio of size to separation was constant). Stimuli were presented on a grey background (CIE coordinates x = 0.29, y = 0.32, lum = 35 cd m2), with ambient light from an overhead neon strip bulb. They were colored discs (CIE coordinates in x, y, lum/ cd m2: black = 0.27, 0.27, 1; white = 0.28, 0.30, 127; red = 0.62, 0.34, 23; pink = 0.29, 0.14, 42; blue = 0.15, 0.07, 14; green = 0.29, 0.61, 50; yellow = 0.42, 0.51, 105; orange = 0.51, 0.43, 53) or black letters rotated to lie along a circle centered on fixation (A, L, O, U, R, T, X, Z in Zurich Extended true-type font http:// Fig. 1. Sample stimuli at set-size 7, spacing 30% eccentricity. (a) Colors, (b) letters. Observers initiated 150 ms trials while attending the central fixation marker. The perceived sequence of letters or colors was then typed in as a response, and the correct sequence given as feedback. www.clipserver.de/Fonts/Z.htm). Viewing was monocular, and to avoid the blindspot parafoveal stimuli at set-sizes 6–7 and 60% spacing were shifted by one position left or right depending on the viewing eye. In all other conditions, stimuli were centered on the vertical midline as shown in Fig. 1. 2.3. Design The task was to report the colors (or letters) seen, by typing them in the correct order. Responses to colors were 1st letters of color names, except blue which was coded as ÔuÕ to avoid confusion with black (ÔbÕ). We measured performance and errors as the set size was increased from 1–7 items in successive 40-trial blocks. 2.4. Procedure Before the experiment, observers were trained to remember seven items presented for 5 s with 100% accuracy (40 out of 40 trials correct), as short-term memory capacity normally ranges from 4–9 items (Cowan, 2001; Miller, 1956). A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090 1087 the smaller set-size, expressed as ^y ðx 1Þ, over and above chance, then you might also guess the xth element correctly. There are n x + 1 possibilities for the identity of the xth element, and x positions where it could be placed, in relation to the correctly ordered x 1 elements, giving the following expressions for c ^y ðx 1Þ cp ðx 1Þ guess : cp ðxÞ ¼ cpguess ðxÞ þ 1 cpguess ðxÞ xðn x þ 1Þ ð4Þ ^y ðx 1Þ cc ðx 1Þ guess cc ðxÞ ¼ ccguess ðxÞ þ 1 ccguess ðxÞ : nxþ1 ð5Þ Fig. 2. Sample data for author AP with parafoveal (13° eccentricity), colored discs, spaced as in Fig. 1(a). Correct responses to permutations (j) and combinations () are shown with psychometric model fits— arrows indicate threshold capacity, where 50% of correct responses were due to chance. Results were fitted with psychometric functions to find the threshold capacity for perceiving the correct combination of items, regardless of their order, and the correct permutation of ordered items (sample data in Fig. 2). Psychometric functions were fitted using Probit Z x ðtlÞ2 1 ^y ðxÞ ¼ c þ ð1 cÞ pffiffiffiffiffiffi e 2r2 dt: ð1Þ r 2p 1 Expected frequencies in the confusion matrices (E) were computed from the observed confusion matrices (O) based on the distribution of errors as follows, where i and j are row and column indices, and k and l refer to sums over columns and rows respectively, P P Oil k;k6¼j Okj P l;l6¼i : Eij;i6¼j ¼ P ð6Þ k;k6¼i l;l6¼k Okl This is the row total (excluding the diagonal, where responses are always correct) multiplied by the column total (excluding the diagonal), divided by the sum of row totals, excluding the diagonal and excluding the present row, where responses are presently correct. Expectations are for off-diagonal (error) responses only, and the total sum of expected errors equals the sum of errors observed; only their distributions differ. where x is the set size, ^y the predicted correct frequency (fitted to obtained values of y), l is the threshold capacity (where half the time correct responses can be attributed to chance), and r is the standard deviation of the underlying Gaussian curve. The values of l and r were estimated from the data. In a capacity-limited model r would be less than one, but we found it generally between 1 and 2 (of 52 fits, 43 were greater than 1, and zero was included in the 95% confidence interval for r only 5 times). The probability of guessing the correct response, cguess, is the inverse of the number of (ordered) permutations, or (unordered) combinations, given a set-size x of n possible stimuli (in the experiment, n = 8) cpguess ðxÞ ¼ x! : n! ð2Þ ccguess ðxÞ ¼ ðn xÞ!x! : n! ð3Þ However, there are two ways of getting a correct response by chance, at a given set-size x. First, you might guess the correct response. Second, if you do not guess the correct response, but you do correctly respond to Fig. 3. Results (averaged across observers, eccentricity, and upper and lower visual fields) show threshold capacities for reporting the correct permutation and combination of items varied with crowding. Permutation thresholds (j) were somewhat smaller than combination thresholds (). Crowding occurred at a larger separation for letters (unfilled) than for colors (filled). 1088 A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090 3. Results Results were fitted with psychometric functions to find the threshold capacity for perceiving the correct combination of items, regardless of their order, and the correct permutation of ordered items (sample data in Fig. 2, details in Methods). Threshold capacity varied with crowding, but not eccentricity (central or parafoveal) or visual field (upper or lower), therefore results were collapsed across these variables. Fig. 3 shows that, on average, about 4 crowded items (spaced at 15–30% eccentricity) were seen in the correct order, whereas be- tween 5 and 6 uncrowded items (spaced at 60% eccentricity) could be perceived. Crowding occurred at a larger separation for letters than colors, not surprising since small-scale features were required to identify the letters, whereas colors were discriminated based on a global property. Combination capacity was slightly greater than permutation capacity, suggesting some positional uncertainty in the stimulus representation; however this difference was rarely statistically significant (error bars indicate 95% confidence intervals). Based on the number of correct responses, we cannot be sure whether errors were due to object uncertainty (confusion Fig. 4. (a) Sample confusion matrix for set-size 7 at 15% spacing in observer AP. Responses in each position are plotted against stimulus position, with color indicating the number of responses, from black (0/40) to white (39/40). Note the serial position effect, shown by the gradation of shading along the main diagonal. Note the prevalence of neighbor errors, shown by the responses along the neighboring diagonals. (b) Expected confusion matrix for the condition in (a). The serial position effect was taken into account in the computation of expected errors, as shown by the prevalence of errors toward positions where correct responses were few (dark grey areas on the diagonal). Note there are clearly fewer neighbor errors than in (a). (c) The ratio of observed to expected errors was significantly greater than 1 at all set-sizes where sufficient errors were made to perform this computation (4–7), however there was little difference between the different set sizes. Error bars indicate 95% confidence intervals based on the student-t distribution computed from 24 of the 26 observers and conditions (2) were excluded as zero errors were expected. (d) The ratio of observed to expected errors was significantly greater than 1 at all separations, tested for the color data only as the letter condition was not sampled at 15% separation. Error bars and exclusions as in (c). A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090 between item identities) or positional uncertainty (confusions item locations). In addition to modeling correct responses, we studied the distribution of errors across the stimulus. The proportion of confusions between neighboring positions was unexpectedly high. To quantify this effect, we computed the expected confusion matrix, based on the proportion of errors in each position (sample data in Fig. 4(a) and (b), see Methods for details). This takes into account the serial position effect along the diagonal of the confusion matrix. The ratio of observed/expected neighbor errors was about 1.2, regardless of separation or setsize (Fig. 4(c) and (d)). 1089 Here we have shown that the number of things that can be seen in order is limited by crowding. However, even when the spacing between items is large, or the number of items small, positional uncertainty influences their perceived spatial order. The difference in performance between crowded and uncrowded items agrees with established theories of coarse-to-fine visual processing (Robson, 1966), extending them from the detection of isolated stimuli to the integration of information across the visual scene. Our results are consistent with an early, somewhat crude, visual representation of the joint space between object identity and location, which may later be refined by attention, aiding the perception of spatial order in prolonged scenes or across eye fixations. 4. Discussion Previous studies have found different limits for counting crowded and uncrowded items, similar in magnitude to the limits we found for ordering items (Atkinson, Campbell, & Francis, 1976; Jevons, 1871). The spatial resolution of visual attention is also said to be limited, particularly in the upper visual field (He, Cavanagh, & Intriligator, 1996). Poor performance has been attributed to exceeding these capacity and resolution limits. We propose, instead, that errors are more parsimoniously attributed directly to a noisy stimulus representation, with both object and positional uncertainty, possibly as a result of lateral interactions between nearby items (Levi et al., 2002), when the scene is parsed into successively smaller chunks. Either way, it seems the precisely ordered detail of the perceived visual world is to some degree illusory, and perhaps only indirectly related to the detailed topographic map in visual areas of the brain. Pelli, Martelli, Majaj, Chattergee, and Thompson (2004) argued for the idea of a Ôconjunction fieldÕ, a minimum area approximately 50% eccentricity in diameter, in which only holistic recognition, rather than recognition by parts, can occur. Our data directly contradict this notion. We found that observers were able to identify correctly and in order, most of the time, 3 colors presented in this region (15% spacing). Additionally, we found no evidence for the capacity limitation of visual attention or short-term memory suggested by Verghese and Pelli (1994) and others (Alvarez & Cavanagh, 2004; Reeves & Sperling, 1986). Instead, we found that perceived spatial order was characterized by neighbor errors, regardless of spacing and set-size (Fig. 4(c) and (d)). Although there was a considerable serial position effect (diagonals in Fig. 4(a) and (b)), there was no evidence that the kind of errors changed drastically after some fixed capacity, or degree of proximity was reached. The data are consistent with the idea that the underlying representation of spatial order is noisy, and positions beyond the first one were insufficiently sampled to fully overcome this noise. Acknowledgements RO1EY01728 from the National Eye Institute, NIH to D. Levi. With thanks to Peter Neri for comments on an earlier draft of the manuscript. References Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual shortterm memory is set both by visual information load and by number of objects. Psychological Science, 15, 106–111. Atkinson, J., Campbell, F. W., & Francis, M. R. (1976). The magic number 4 ± 0: a new look at visual numerosity judgements. Perception, 5, 327–334. Barlow, H. B., Kaushal, T. P., Hawken, M., & Parker, A. J. (1987). Human contrast discrimination and the threshold of cortical neurons. Journal of the Optical Society of America A, 4, 2366–2371. Bouma, H. (1970). Interaction effects in parafoveal letter recognition. Nature, 226, 177–178. Britten, K. H., & Newsome, W. T. (1998). Tuning bandwidths for near-threshold stimuli in area MT. Journal of Neurophysiology, 80, 762–770. Cowan, N. (2001). The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–114. DeYoe, E. A., Felleman, D. J., Van Essen, D. C., & McClendon, E. (1994). Multiple processing streams in occipitotemporal visual cortex. Nature, 371, 151–154. Di Lollo, V., Kawahara, J., Zuvic, S. M., & Visser, T. A. (2001). The preattentive emperor has no clothes: a dynamic redressing. Journal of Experimental Psychology—General, 130, 479–492. He, S., Cavanagh, P., & Intriligator, J. (1996). Attentional resolution and the locus of visual awareness. Nature, 383, 334–337. Hess, R. F., Dakin, S. C., & Kapoor, N. (2000). The foveal ÔcrowdingÕ effect: physics or physiology. Vision Research, 40, 365–370. Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the catÕs visual cortex. Journal of Physiology, 160, 106–154. Jevons, W. S (1871). The power of numerical discrimination. Nature, 3, 281–282. Koenderink, J. J., & Van Doorn, A. J. (1999). The structure of locally orderless images. International Journal of Computer Vision, 31, 159–168. 1090 A.V. Popple, D.M. Levi / Vision Research 45 (2005) 1085–1090 Levi, D. M., Hariharan, S., & Klein, S. A. (2002). Suppressive and facilitatory spatial interactions in peripheral vision: peripheral crowding is neither size invariant nor simple contrast masking. Journal of Vision, 2, 167–177. Levi, D. M., Klein, S. A., & Aitsebaomo, A. P. (1985). Vernier acuity, crowding and cortical magnification. Vision Research, 25, 963– 977. Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63, 81–97. Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., & Morgan, M. (2001). Compulsory averaging of crowded orientation signals in human vision. Nature Neuroscience, 4, 739–744. Pelli, D. G., Martelli, M., Majaj, N. J., Chattergee, G., & Thompson, A. (2004). Object recognition. Parts, wholes, and the conjunction index. Perception. ECVP2004 Abstracts, s583.(http://www.perceptionweb.com/ecvp04/0583.html). Reeves, A., & Sperling, G. (1986). Attention gating in short-term visual memory. Psychological Review, 93, 180–206. Robson, J. G. (1966). Spatial and temporal contrast sensitivity functions of the human eye. Journal of the Optical Society of America, 46, 721–739. Sagi, D., & Julesz, B. (1985). ‘‘Where’’ and ‘‘what’’ in vision. Science, 228, 1217–1219. Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., Brady, T. J., et al. (1995). Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science, 268, 889–893. Shipp, S., & Zeki, S. (1985). Segregation of pathways leading from area V2 to areas V4 and V5 of macaque monkey visual cortex. Nature, 315, 322–325. Skottun, B. C., Bradley, A., Sclar, G., Ohzawa, I., & Freeman, R. D. (1987). The effects of contrast on visual orientation and spatial frequency discrimination: a comparison of single cells and behavior. Journal of Neurophysiology, 57, 773–784. Templeton, W. B., & Green, F. A. (1968). Chance results in utrocular discrimination. Quarterly Journal of Experimental Psychology, 20, 200–203. Tootell, R. B., Tsao, D., & Vanduffel, W. (2003). Neuroimaging weighs in: humans meet macaques in ‘‘primate’’ visual cortex. Journal of Neuroscience, 23, 3981–3989. Treisman, A. (1998). Feature binding, attention and object perception. Philosophical Transactions of the Royal Society of London Series B—Biological Sciences, 353, 1295–1306. Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136. Van Essen, D. C., Newsome, W. T., & Maunsell, J. H. (1984). The visual field representation in striate cortex of the macaque monkey: asymmetries, anisotropies, and individual variability. Vision Research, 24, 429–448. Verghese, P., & Pelli, D. G. (1994). The scale bandwidth of visual search. Vision Research, 34, 955–962.
© Copyright 2026 Paperzz