Bottom-up and top-down processing in visual perception
Thomas Serre
Brown University, Department of Cognitive & Linguistic Sciences
Brown Institute for Brain Science, Center for Vision Research

The ‘what’ problem: monkey electrophysiology
Willmore Prenger & Gallant ’10; Yamane Carlson Bowman Wang & Connor ’08; Hung Kreiman et al ’05
[Figure: single-cell tuning curves (response, spikes/s) over light angle, rotation angle, depth, size, x/y position, and angle on the yz plane, under stereo vs. non-stereo and shading/texture/none conditions; example fitted response surface: Response = 0.4A + 0.0B + 49.0AB + 0.0. The caption fragment refers to combinatorial sampling of the intractably large space of three-dimensional shapes.]

The ‘what’ problem: human psychophysics
Thorpe et al ’96; VanRullen & Thorpe ’01; Fei-Fei et al ’02 ’05; Evans & Treisman ’05; Serre Oliva & Poggio ’07

Vision as ‘knowing what is where’
Aristotle; Marr ’82

‘What’ and ‘where’ cortical pathways
Ungerleider & Mishkin ’84
- ventral ‘what’ stream
- dorsal ‘where’ stream
How does the visual system combine information about the identity and location of objects?
➡ Central thesis: visual attention (see also Van Der Velde & De Kamps ’01; Deco & Rolls ’04)
Part I: A ‘Bayesian’ theory of attention that solves the ‘what is where’ problem
• Predictions for neurophysiology and human eye movement data
• Sharat Chikkerur • Cheston Tan • Tomaso Poggio
Part II: Readout of IT population activity
• Attention eliminates clutter
• Ying Zhang • Ethan Meyers • Narcisse Bichot • Tomaso Poggio • Bob Desimone

Perception as Bayesian inference
Mumford ’92; Knill & Richards ’96; Dayan & Zemel ’99; Rao ’02 ’04; Kersten & Yuille ’03; Kersten et al ’04; Lee & Mumford ’03; Dean ’05; George & Hawkins ’05 ’09; Hinton ’07; Epshtein et al ’08; Murray & Kreutz-Delgado ’07
- S: visual scene description; I: image measurements
- P(S|I) ∝ P(I|S) P(S)
- The scene is a set of object identities and object locations:
  S = {O_1, O_2, ..., O_n, L_1, L_2, ..., L_n}
  P(S, I) = P(O_1, L_1, O_2, L_2, ..., O_n, L_n, I)

Assumption #1: attentional spotlight
Broadbent ’52 ’54; Treisman ’60; Treisman & Gelade ’80; Duncan & Desimone ’95; Wolfe ’97; and many others
- To recognize and localize objects in the scene, the visual system selects objects one at a time, reducing the inference to a single pair: P(O, L, I)

Assumption #2: ‘what’ and ‘where’ pathways
- Object location L and identity O are independent: P(O, L, I) = P(O) P(L) P(I|L, O)
- O is carried by the ventral ‘what’ stream and L by the dorsal ‘where’ stream; a toy numerical sketch of this factorization follows below
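To make Assumptions #1 and #2 concrete, here is a minimal numerical sketch of the factorization P(O, L, I) = P(O) P(L) P(I | O, L) for a single attended object; the object set, location grid, and all probability values are made up for illustration (Python):

# Toy 'what is where' inference under Assumptions #1-#2: one attended
# object, with identity O and location L independent a priori.
import numpy as np

objects = ["car", "pedestrian"]           # identities O
P_O = np.array([0.5, 0.5])                # prior over identity, P(O)
P_L = np.full(4, 0.25)                    # prior over 4 locations, P(L)

# Likelihood P(I | O, L): how well the image evidence at each location
# matches each object (hypothetical numbers).
P_I_given_OL = np.array([
    [0.1, 0.7, 0.1, 0.1],                 # car
    [0.2, 0.1, 0.6, 0.1],                 # pedestrian
])

# Joint posterior P(O, L | I) ∝ P(O) P(L) P(I | O, L)
joint = P_O[:, None] * P_L[None, :] * P_I_given_OL
posterior = joint / joint.sum()

# 'what' readout marginalizes over L; 'where' readout marginalizes over O
print("P(O|I):", dict(zip(objects, posterior.sum(axis=1).round(3))))
print("P(L|I):", posterior.sum(axis=0).round(3))

Because O and L are independent a priori, the ventral readout P(O|I) and the dorsal readout P(L|I) come from the same joint by marginalization, which is exactly the division of labor the theory assigns to the two pathways.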
Assumption #3: universal features
see Riesenhuber & Poggio ’99; Serre et al ’05 ’07
- Objects are encoded by collections of ‘universal’ features of intermediate complexity
  - either present or absent
  - conditionally independent given the object and its location
- Graphical model: object O and location L jointly drive the position- and scale-tolerant features F^i, which drive the retinotopic features X^i; image measurements I enter at the bottom, with i = 1 … N feature types
- Proposed correspondence: X^i with V2/V4 (Serre et al ’05; David et al ’06; Cadieu et al ’07; Willmore et al ’10) and F^i with IT (Serre et al ’07)

Predicting IT readout
Model: Serre et al ’05; experimental data: Hung* Kreiman* et al ’05
[Figure: classification performance of model vs. IT readout; classifier trained on objects at size 3.4°, centered, and tested at 3.4° center, 1.7° center, 6.8° center, and 3.4° shifted 2° and 4° horizontally.]

Bayesian inference and attention
behavior: Serre Oliva & Poggio ’07
- Goal of visual perception: to estimate posterior probabilities of visual features, objects and their locations in an image
- Attention corresponds to conditioning on high-level latent variables representing particular features or locations (as well as on sensory input), and doing inference over the other latent variables
- The feedforward (bottom-up) sweep computes, for each feature map X^i:

  P(X^i | I) = P(I | X^i) Σ_{F^i,L} P(X^i | F^i, L) P(L) P(F^i) / Σ_{X^i} [ P(I | X^i) Σ_{F^i,L} P(X^i | F^i, L) P(L) P(F^i) ]

(A numerical sketch of this posterior follows below.)
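A hedged numerical sketch of this feedforward posterior in the notation above; the conditional table P(X^i | F^i, L), the feature and location priors, and the feedforward evidence P(I | X^i) are all illustrative assumptions rather than fitted model components:

# Posterior over one retinotopic feature map X^i, taking values in
# {location 0..3, silent}:
#   P(X^i|I) ∝ P(I|X^i) * Σ_{F^i,L} P(X^i|F^i,L) P(F^i) P(L)
import numpy as np

n_loc, SILENT = 4, 4                       # X^i in {0,1,2,3, silent}
P_F = np.array([0.5, 0.5])                 # P(F^i): feature absent/present
P_L = np.full(n_loc, 1.0 / n_loc)          # uniform spatial prior P(L)

# P(X^i | F^i, L): feature present -> X^i at the selected location;
# feature absent -> unit silent (deterministic, for simplicity).
P_X_given_FL = np.zeros((2, n_loc, n_loc + 1))
P_X_given_FL[0, :, SILENT] = 1.0
P_X_given_FL[1, np.arange(n_loc), np.arange(n_loc)] = 1.0

P_I_given_X = np.array([0.2, 0.9, 0.3, 0.2, 0.1])   # feedforward evidence

def posterior(P_F, P_L):
    prior_X = np.einsum("flx,f,l->x", P_X_given_FL, P_F, P_L)
    post = P_I_given_X * prior_X
    return post / post.sum()

print("bottom-up:        ", posterior(P_F, P_L).round(3))
# Top-down spatial attention: sharpen P(L) toward location 1
print("spatial attention:", posterior(P_F, np.array([0, 1.0, 0, 0])).round(3))
# Top-down feature-based attention: raise the prior that F^i is present
print("feature attention:", posterior(np.array([0.1, 0.9]), P_L).round(3))

Sharpening P(L) or P(F^i) multiplies extra probability into the corresponding entries of the posterior, previewing the spatial and feature-based attentional modulation discussed next.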
Bayesian inference and attention
- The graphical model maps onto the two pathways: dorsal / ‘where’ carries L; ventral / ‘what’ carries O and the F^i
- Feedforward input enters through P(I | X^i); the priors act as top-down signals: P(L) as top-down spatial attention and P(F^i) as top-down feature-based attention
- The normalizing denominator, which sums over the other feature maps X^k, acts as a suppressive drive, yielding attentional modulation of the posterior

  P(X^i | I) = P(I | X^i) Σ_{F^i,L} P(X^i | F^i, L) P(L) P(F^i) / Σ_{X^i} [ P(I | X^i) Σ_{F^i,L} P(X^i | F^i, L) P(L) P(F^i) ]

- This divisive form parallels the normalization model of attention (Reynolds & Heeger ’09):

  R(x, θ) = A(x, θ) E(x, θ) / (S(x, θ) + σ)

[Slide reproduces the first page of Reynolds & Heeger, ‘The Normalization Model of Attention’, Neuron 2009 (DOI 10.1016/j.neuron.2009.01.002): a divisive-normalization model in which the stimulus drive E(x, θ) is multiplied by an attention field A(x, θ) and divided by a pooled suppressive drive S(x, θ) plus a constant σ. Depending on the stimulus conditions and the spread of the attention field, the model reproduces response-gain scaling (McAdams & Maunsell ’99; Treue & Martinez-Trujillo ’99), contrast-gain changes (Reynolds et al ’00; Martinez-Trujillo & Treue ’02; Li & Basso ’08), receptive-field shrinkage (Moran & Desimone ’85), biased competition (Desimone & Duncan ’95), feature-similarity gain (Martinez-Trujillo & Treue ’04), and tuning-curve sharpening (Spitzer et al ’88).]

Bayesian inference and attention: model predictions
- Multiplicative scaling of tuning curves by spatial attention (McAdams & Maunsell ’99; see also Reynolds & Heeger ’09): attending into the receptive field corresponds to a sharp spatial prior, P(L = x) ≈ 1; attending away corresponds to a uniform prior, P(L = x) = 1/|L|
- Contrast vs. response gain (Martinez-Trujillo & Treue ’02; McAdams & Maunsell ’99; see also Reynolds & Heeger ’09): both gain regimes follow from the same posterior

  P(X^i = x | I) ∝ Σ_{F^i,L} P(X^i = x | F^i, L) P(I | X^i) P(F^i) P(L)

[Figure: normalized tuning/response curves, model vs. the McAdams & Maunsell ’99 and Martinez-Trujillo & Treue ’02 data.]
A minimal sketch of the normalization model follows below.
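Since the slide leans on Reynolds & Heeger ’09, here is a minimal sketch of their normalization equation R(x, θ) = A(x, θ) E(x, θ) / (S(x, θ) + σ), simplified to one spatial dimension (no orientation), with made-up Gaussian stimulus drive, attention field, and pooling width:

# Normalization model of attention (Reynolds & Heeger '09), space only.
import numpy as np

x = np.linspace(-10, 10, 201)                        # RF centers (deg)
E = np.exp(-0.5 * (x / 2.0) ** 2)                    # stimulus drive E(x)
A = 1.0 + 2.0 * np.exp(-0.5 * (x / 3.0) ** 2)        # attention field A(x)

def pool(v, width=8.0):
    # Suppressive spatial pooling with a Gaussian kernel
    k = np.exp(-0.5 * (np.arange(-30, 31) / width) ** 2)
    return np.convolve(v, k / k.sum(), mode="same")

sigma = 0.1
S = pool(A * E)                                      # suppressive drive S(x)
R = A * E / (S + sigma)                              # population response R(x)

As the excerpt above notes, the gain regime depends on the interplay of stimulus and attention field: a broad attention field with a small stimulus produces contrast-gain-like changes, while a narrow attention field with a large stimulus produces response-gain-like changes.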
Feature-based attention
neural data from Bichot et al ’05
[Figure: normalized spikes (0–3) for preferred/non-preferred stimulus × preferred/non-preferred cue conditions; model modulation via the feature prior P(F^i) compared against V4 responses from Bichot et al ’05.]

Learning to localize cars and pedestrians
- Object priors: the feature priors P(F^i) are learned for each target category
- Location priors: P(L) is learned from global contextual cues (see Torralba, Oliva et al); a toy sketch of combining the two priors follows below

The experiment
- Eye movements as proxy for attention
- Dataset: 100 street-scene images with cars & pedestrians and 20 without
- Experiment:
  - 8 participants asked to count the number of cars/pedestrians
  - block design for cars and pedestrians
  - eye movements recorded using an infra-red eye tracker
- Evaluation: ROC area of model predictions against the first three human fixations, comparing uniform priors (bottom-up), feature priors, and feature + contextual (spatial) priors with human inter-observer agreement
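As flagged in the list above, here is a toy sketch of one way the two learned priors could combine into a fixation-priority map: feature-driven evidence weighted multiplicatively by a context-based location prior. The multiplicative shortcut and all map values are illustrative simplifications of the full sum-product inference:

# Combining feature priors with a contextual location prior (toy version).
import numpy as np

h, w = 48, 64
rng = np.random.default_rng(0)
feature_map = rng.random((h, w))           # stand-in for feature evidence

# Context prior: e.g., pedestrians concentrate near a horizontal band
rows = np.arange(h)[:, None]
context_prior = np.exp(-0.5 * ((rows - 30) / 6.0) ** 2) * np.ones((h, w))
context_prior /= context_prior.sum()

priority = feature_map * context_prior     # P(L)-weighted evidence
priority /= priority.sum()

# Predicted fixations: the top peaks of the priority map
top = np.argsort(priority.ravel())[::-1][:3]
fixations = np.column_stack(np.unravel_index(top, priority.shape))
print(fixations)                           # (row, col) of predicted fixations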
[Figure: ROC area (0.5–1) over the first three fixations, for cars and for pedestrians, comparing uniform priors (bottom-up), feature priors, and feature + contextual (spatial) priors against human inter-observer agreement.]
The full model explains 92% of the inter-subject agreement! (The ROC-area metric is sketched below.)
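The ROC-area numbers above can be computed as follows under one common convention: fixated pixels are positives, randomly sampled pixels are negatives, and the priority value is the detector score. The talk does not specify the exact analysis, so treat this as an assumed reconstruction:

# ROC area for fixation prediction: P(priority at a fixation > priority
# at a random pixel), with ties counted as half.
import numpy as np

def fixation_roc_area(priority, fixations, n_neg=10000, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.array([priority[r, c] for r, c in fixations])
    rr = rng.integers(0, priority.shape[0], n_neg)
    cc = rng.integers(0, priority.shape[1], n_neg)
    neg = priority[rr, cc]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties

# Toy check: a map with a bright patch and fixations inside that patch
rng = np.random.default_rng(1)
pmap = rng.random((48, 64))
pmap[20:24, 30:34] += 1.0
print(round(fixation_roc_area(pmap, [(21, 31), (22, 32), (23, 33)]), 3))

An ROC area of 0.5 is chance; the human inter-observer bar plays the role of a noise ceiling, which is the sense in which a model can ‘explain 92% of the inter-subject agreement’.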
*similar (independent) results by Ehinger Hidalgo Torralba & Oliva ’10

Bottom-up saliency and free-viewing
human eye data from Bruce & Tsotsos
- With the object node O removed (F^i, L, X^i, N, I only), the model reduces to a bottom-up saliency map and predicts free-viewing fixations:

  Method                 ROC area
  Bruce & Tsotsos ’06    72.8%
  Itti et al ’01         72.7%
  Proposed               77.9%

Summary I
• Goal of vision:
  - to solve the ‘what is where’ problem
• Key assumptions:
  - ‘attentional spotlight’ → recognition done sequentially, one object at a time
  - ‘what’ and ‘where’ independent pathways
• Attention as the inference process implemented by the interaction between ventral and dorsal areas
  - integrates bottom-up and top-down (feature-based and context-based) attentional mechanisms
  - seems consistent with known physiology
• Main attentional effects in the presence of clutter
  - spatial attention reduces uncertainty over location and improves object recognition performance over the first bottom-up (feedforward) pass

Part II: Readout of IT population activity
• Attention eliminates clutter
• Ying Zhang • Ethan Meyers • Narcisse Bichot • Tomaso Poggio • Bob Desimone

The ‘readout’ approach
- Record from neurons 1 … n; feed the population response to a pattern classifier; compare the classifier’s prediction against the correct label

IT readout improves with attention
Zhang Meyers Bichot Serre Poggio Desimone, unpublished data
- Train the readout classifier on the isolated object
- Test generalization in clutter: unattended vs. attended vs. attend-away conditions
A simplified sketch of this readout protocol follows below.
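A simplified, self-contained sketch of that protocol: a maximum-correlation (nearest-centroid) decoder is trained on synthetic ‘IT’ population responses to isolated objects, then tested on cluttered displays. The synthetic response model, and treating attention as a reduction of clutter mixing, are illustrative assumptions standing in for the actual recordings:

# Readout of a synthetic IT population: train on isolated objects,
# test generalization to clutter with and without 'attention'.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_objects, n_train = 100, 4, 50
templates = rng.gamma(2.0, 5.0, (n_objects, n_neurons))  # mean rates

def simulate(obj, clutter=0.0):
    # Clutter mixes in a random distractor's response pattern
    distractor = templates[rng.integers(n_objects)]
    return rng.poisson((1 - clutter) * templates[obj] + clutter * distractor)

X = np.array([simulate(o) for o in range(n_objects) for _ in range(n_train)])
y = np.repeat(np.arange(n_objects), n_train)
centroids = np.array([X[y == o].mean(axis=0) for o in range(n_objects)])

def decode(resp):
    # Maximum correlation coefficient classifier
    return int(np.argmax([np.corrcoef(resp, c)[0, 1] for c in centroids]))

for clutter, label in [(0.0, "isolated"), (0.5, "unattended clutter"),
                       (0.1, "attended clutter")]:
    acc = np.mean([decode(simulate(o, clutter)) == o
                   for o in range(n_objects) for _ in range(100)])
    print(f"{label:18s} accuracy: {acc:.2f}")

The qualitative pattern, near-ceiling on isolated objects, degraded in unattended clutter, and partially restored when attention reduces the effective clutter, mirrors the result summarized on the slides.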
Serre Poggio Desimone, unpublished data
[Figure series: readout classification accuracy for the isolated-object, attended-clutter, and unattended-clutter conditions (Zhang Meyers Bichot Serre Poggio Desimone, unpublished data).]

Thank you!
• Robert Desimone (MIT) • Christof Koch (CalTech) • Laurent Itti (USC) • Tomaso Poggio (MIT) • David Sheinberg (Brown)