Chapter 3: Object and face recognition

Throughout the waking day we are bombarded with information from the visual environment. Mostly we make sense of that information, which usually involves identifying or recognising the objects and faces that surround us. Object and face recognition typically occurs so effortlessly that it is hard to believe it is actually a rather complex achievement. In spite of the complexities of object recognition, we can generally go beyond simply identifying objects in the visual environment. For example, we can normally describe what an object would look like if viewed from a different angle, and we know its uses and functions. All in all, there is more to object and face recognition than might initially be supposed.

Pattern recognition

Navon (1977) demonstrated that participants tend to process scenes globally before processing their local components – the global precedence effect. To show this, Navon developed hierarchical stimuli (e.g., a large letter made up of small letters).

INTERACTIVE EXERCISE: Navon

Hubel and Wiesel (e.g., 1962) won a Nobel Prize for their work using single-unit recordings of visual neurons. They discovered simple and complex cells in V1, which is organised as a retinotopic map (Bruce et al., 2003). A simple cell responds strongly only to stimuli of a particular orientation. Complex cells also respond maximally to straight-line stimuli, but they have larger receptive fields and respond to moving contours.

WEBLINK: Hubel and Wiesel's (1962) article on receptive fields, binocular interaction and functional architecture in the cat's visual cortex

Perceptual organisation

A basic issue in vision is perceptual segregation – our ability to work out which parts of the visual information belong together and thus form objects. A fundamental principle of the Gestaltists (a group of German psychologists working between the world wars) is the law of Prägnanz: of the possible geometric organisations, the one that actually occurs has the best, simplest and most stable shape. Other Gestalt principles include:

the law of proximity – elements tend to be grouped together if they are close to each other;
the law of similarity – similar elements are grouped together;
the law of good continuation – elements requiring the fewest changes or interruptions in straight or smoothly curving lines are grouped together;
the law of closure – illustrated when we mentally fill in the missing parts of a figure (e.g., a circle);
the law of common fate – elements moving together are grouped together.

WEBLINK: Examples of Gestalt laws of perceptual organisation

The Gestaltists emphasised the importance of figure–ground segregation – the figure has a form or shape that is distinct from the ground or background. They assumed that no learning is needed for newborns to use the principles of perceptual organisation (e.g., Bhatt and Quinn, 2011).

WEBLINK: Figure–ground segregation

Geisler et al. (2001) used realistic stimuli rather than artificial figures. They found that contour grouping followed principles derived from the statistics of natural scenes, such as the relative orientation and separation of nearby edge segments.

Palmer and Rock (1994) proposed a principle called uniform connectedness – a connected region with uniform visual properties is organised as a single perceptual unit. They showed that uniform connectedness could dominate over proximity and similarity. Han and Humphreys (2003) found that grouping by proximity was as fast as grouping by uniform connectedness, but that uniform connectedness may be more important when multiple objects are present.
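The law of proximity lends itself to a simple computational reading: grouping as distance-based clustering. The following Python sketch is my own illustration rather than anything from the chapter, and the threshold value is arbitrary. It groups elements that are linked by chains of sufficiently close neighbours.

import numpy as np

def group_by_proximity(points: np.ndarray, threshold: float) -> list[set[int]]:
    """Toy analogue of the Gestalt law of proximity: two elements fall
    in the same perceptual group if a chain of neighbours closer than
    `threshold` links them."""
    n = len(points)
    parent = list(range(n))                    # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < threshold:
                parent[find(i)] = find(j)      # merge the two groups

    groups: dict[int, set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

# Two columns of dots: proximity alone splits the display in two.
dots = np.array([[0, 0], [0, 1], [0, 2],       # left column
                 [5, 0], [5, 1], [5, 2]])      # right column
print(group_by_proximity(dots, threshold=1.5))
# e.g. [{0, 1, 2}, {3, 4, 5}]

Nearness here is purely bottom-up information, which is why demonstrations of this kind fit the Gestaltists' view discussed next.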
The Gestaltists argued that the laws of grouping typically operate in a bottom-up fashion. Vecera et al. (2004) found that attention is not always necessary for figure–ground segregation.

WEBLINK: Original article by Max Wertheimer on perceptual organisation

Gestaltism has provided much inspiration for research on perceptual organisation, and the laws of grouping have withstood the test of time. However, the Gestalt approach has many limitations. The Gestaltists de-emphasised the importance of past experience, and they mostly produced descriptions rather than explanations. Most of their evidence was based on 2-D drawings and may not apply to 3-D objects. The Gestaltist view is also too inflexible.

In summary, the Gestaltists put forward several laws of perceptual organisation, including the laws of proximity, similarity, good continuation, closure and common fate. These laws assist in figure–ground segregation. The Gestaltists provided descriptions rather than explanations, and they overemphasised the role of bottom-up processing. Perceptual grouping can involve top-down processing, and the newer principle of uniform connectedness seems to be important in perceptual grouping.

Approaches to object recognition

Hegdé (2008) emphasised the importance of spatial frequency in the analysis of visual scenes, arguing that visual processing progresses from coarse (low spatial frequency) to fine-grained (high spatial frequency). Flevaris et al. (2014) found evidence for this coarse-to-fine account using the Navon task, in which participants focus on either the global or the local level. (A code sketch of coarse-to-fine filtering appears later in this section.)

Marr (1982) proposed a computational theory in which a series of representations is constructed:

Primal sketch: a 2-D description of the main light-intensity changes in the image.
2½-D sketch: adds the depth and orientation of visible surfaces; viewpoint-dependent.
3-D model representation: describes the shapes of objects and their relative positions; viewpoint-invariant.

WEBLINK: An outline of Marr's theory

Biederman (1987) put forward a theory according to which objects consist of basic shapes called "geons" (about 36 in total). The first step in object recognition is edge extraction, followed by a decision about how the visual object should be segmented into components. Biederman described five invariant properties of edges: curvature, parallelism, cotermination, symmetry and collinearity. The geons of a visual object are constructed from these properties. The theory predicts that object recognition is typically viewpoint-invariant, because geons can be identified from a wide variety of viewpoints.

The non-accidental principle states that regularities in the visual image reflect actual (or non-accidental) regularities in the world. Use of this principle helps object recognition but can also lead to error: for example, a bike viewed from the front may look like a straight edge. When conditions are sub-optimal, invariant properties can still be detected and missing parts of a contour can be restored. Furthermore, there is generally much redundant information available for recognising complex objects.

Biederman and Gerhardstein (1993) found evidence that object recognition is viewpoint-invariant: object naming could be primed by different views as well as by identical ones. However, when Tarr and Bülthoff (1995) used novel objects, they found that object recognition was viewpoint-dependent. Biederman (1987) used line drawings and found that recognition was much harder when concavities were omitted from the contours. Vogels et al. (2001) found evidence that some neurons in monkey cortex are sensitive to geons.
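As promised above, Hegdé's (2008) coarse-to-fine idea can be made concrete with standard image filtering: a Gaussian blur passes the low spatial frequencies, and subtracting the blurred image from the original leaves the high ones. The sketch below is an illustrative stand-in rather than an analysis from Hegdé's work; the sigma value and the random "scene" are invented.

import numpy as np
from scipy.ndimage import gaussian_filter

def coarse_and_fine(image: np.ndarray, sigma: float = 4.0):
    """Split a greyscale image into a coarse (low spatial frequency)
    layer and a fine (high spatial frequency) residual.  The Gaussian
    blur acts as a low-pass filter; `sigma` is an illustrative value."""
    coarse = gaussian_filter(image.astype(float), sigma=sigma)
    fine = image - coarse
    return coarse, fine

# Toy "scene": random texture standing in for a natural image.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
coarse, fine = coarse_and_fine(scene)
# A coarse-to-fine system would analyse `coarse` before `fine`.

Roughly speaking, the global level of a Navon stimulus is carried mainly by the low spatial frequencies and the local level by the high ones, which is why the Navon task bears on this account.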
Returning to Biederman's theory: Sanocki et al. (1998) pointed out that edge-extraction processes are less likely to lead to accurate object recognition when objects are presented in the context of other objects. Top-down processes are also important for object recognition.

Biederman's theory is a reasonably plausible account of object recognition, and there is much evidence that the identification of concavities and edges is important for object recognition. However, the theory has several limitations: it focuses on bottom-up processes and de-emphasises top-down processes; it accounts only for fairly unsubtle perceptual discriminations; and it assumes that object recognition is viewpoint-invariant, whereas object recognition is more flexible than the theory allows.

WEBLINK: Biederman's theory

Viewpoint-invariant theories, such as Biederman's, argue that ease of recognition is not affected by the observer's viewpoint. Viewpoint-dependent theories assume that changes in view affect the speed or accuracy of recognition. Evidence suggests the two mechanisms are used at different times: viewpoint-invariant mechanisms for easy category decisions, and viewpoint-dependent mechanisms for more difficult, within-category decisions (Milivojevic, 2012). Zimmermann and Eimer (2013) demonstrated that face recognition was viewpoint-dependent during the first presentation of faces; thereafter, as the faces became increasingly familiar through learning, face recognition became more viewpoint-invariant. Thus, the extent to which object recognition is viewpoint-dependent or viewpoint-invariant depends on whether the discrimination is between- or within-category, and also on task complexity.

Many traditional accounts of object recognition focus solely on a feedforward hierarchy of processing stages progressing from early visual cortex through to the inferotemporal cortex. However, anatomical evidence suggests this is a considerable oversimplification: there are approximately equal numbers of forward- and backward-projecting neurons throughout most of the visual system (Wyatte et al., 2012; Gilbert & Li, 2013). In essence, the backward-projecting neurons are associated with top-down processing.

Face recognition

The ability to recognise faces is very important, and face recognition differs in important ways from other forms of object recognition. In particular, face recognition involves more holistic processing – processing that involves strong integration of information across the whole object. This matters because specific features of a face may be shared by different individuals, or may be subject to change.

In the inversion effect, faces are harder to identify when presented upside-down than when upright. In the part–whole effect, memory for a face part is more accurate when it is presented within the whole face rather than on its own. In the composite effect, performance is impaired only when the two halves of different faces are aligned. The inversion, part–whole and composite effects all provide evidence that faces are subject to holistic processing.

Most people have much more experience of processing faces than of other objects, and thus have special expertise in face processing. It is therefore possible that holistic processing is found for any category of objects for which an individual possesses expertise.

WEBLINK: Thatcher illusion

Prosopagnosia

Prosopagnosia is a condition in which familiar faces cannot be recognised consciously although common objects can be recognised. People with prosopagnosia often show covert recognition (processing of faces without conscious awareness). Covert recognition can become overt if the task is made very easy.
Prosopagnosia is a diverse condition in which the problems vary from patient to patient, and the origins of the condition also vary. One complication is that face recognition is much harder than object recognition, so poor face recognition could in principle simply reflect task difficulty. However, neuropsychological studies demonstrate a double dissociation between face and object recognition. Busigny et al. (2010a) reviewed previous research suggesting that at least 13 prosopagnosics had essentially normal levels of object recognition in spite of very poor face recognition. Conversely, Moscovitch et al. (1997) found that a patient, CK, with object agnosia performed as well as controls on face recognition tasks. This double dissociation may indicate that different processes or brain areas underlie face and object recognition.

WEBLINK: Prosopagnosia

WEBLINK: Video of a prosopagnosic

Fusiform face area

The fusiform face area, in the lateral fusiform gyrus, appears to be specialised for face processing. This area is frequently damaged in patients with acquired prosopagnosia, and it responds more strongly to faces than to other objects. However, other areas are also involved in face processing, such as the occipital face area and the superior temporal sulcus.

The fusiform face area is more complicated than generally assumed. Using fMRI, Downing et al. (2006) found that more voxels in the fusiform face area were selective for faces than for other objects, but the differences were not great. In addition, patients with prosopagnosia often have damage to the occipital face area as well as (or instead of) the fusiform face area (Gainotti & Marra, 2011).

Expertise

There is major theoretical controversy over whether the fusiform face area is face-selective. Gauthier and Tarr (2002) claimed that the brain mechanisms used in face recognition are also involved in recognising any object category for which we possess expertise. On this view, findings interpreted as being specific to faces may apply to any object category for which we possess expertise. Four predictions follow:

1. Holistic processing occurs for any object category for which observers possess expertise.
2. The fusiform face area should be activated for any object category for which observers possess expertise.
3. Young children should show less evidence of holistic processing of faces than older children and adults.
4. If the processing of faces and of objects of expertise involves similar processes, then objects of expertise should interfere with face processing.

Gauthier and colleagues have found some support for these predictions in their studies involving the recognition of "Greebles". However, each hypothesis has been supported only to some extent, which suggests that faces may after all have special and unique characteristics not shared by other objects.

Theoretical approaches

Bruce and Young's (1986) model is the most influential theoretical approach to face recognition. The model has eight components: structural encoding, expression analysis, facial speech analysis, directed visual processing, face recognition units, person identity nodes, name generation and a cognitive system. The model predicts that: familiar and unfamiliar faces are processed differently; facial identity and facial expression are processed separately; and, for a familiar face, familiarity information is accessed before information about identity or name (this serial ordering is sketched in code below). Malone et al. (1982) provided evidence for a double dissociation between the recognition of familiar and unfamiliar faces.
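Because Bruce and Young's model is a staged architecture, its serial-access prediction can be illustrated in code. The sketch below is a loose, hypothetical rendering of three of the eight components (face recognition units, person identity nodes and name generation); the stored "people" are invented, and the real model is considerably richer.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PersonRecord:
    """Hypothetical entry linking the model's stages for one person."""
    face_code: str                  # output of structural encoding
    identity: Optional[str] = None  # semantic info (person identity node)
    name: Optional[str] = None      # name generation stage

# Invented store standing in for the face recognition units.
KNOWN_FACES = {
    "face-A": PersonRecord("face-A", identity="actor", name="J. Smith"),
    "face-B": PersonRecord("face-B", identity="politician"),  # name unavailable
}

def recognise(face_code: str) -> str:
    """Stage order mirrors the model's prediction: familiarity is
    available first, identity second and the name only last, so a
    name can never be retrieved without identity information."""
    record = KNOWN_FACES.get(face_code)
    if record is None:
        return "unfamiliar face"                  # no familiarity signal
    if record.identity is None:
        return "familiar, but identity unknown"   # familiarity only
    if record.name is None:
        return f"familiar {record.identity}, name unavailable"
    return f"{record.name}, the {record.identity}"

print(recognise("face-C"))  # unfamiliar face
print(recognise("face-B"))  # familiar politician, name unavailable
print(recognise("face-A"))  # J. Smith, the actor

Note that recognise() can never return a name without identity information – precisely the strict ordering that some of the findings discussed below put under pressure.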
Much research supports the assumption that there are separate routes for processing facial identity and facial expression. Fox et al. (2011) found that patients with damage to the face-recognition network had impaired identity perception but intact expression perception. Patients with impaired recognition of facial expression may have other emotional impairments. Different brain regions may be involved in processing facial expressions and identity: processing facial identity is associated with the fusiform face area, while processing expressions activates the superior temporal sulcus.

Young et al. (1985) found that people decided more quickly whether a face was familiar than who the person was. Young et al. (1985) also found evidence that a person's name cannot be accessed without other information about the person also being available. However, Brédart et al. (2005) found evidence contrary to the model: the recall of names was faster than the recall of personal information when the faces were those of personal friends.

Bruce and Young's model is deservedly influential and many of its predictions have received empirical support. However, the model also has limitations, mostly because it is oversimplified. The assumption that facial identity and expression involve separate processing routes may be too extreme, and the assumption that name processing always occurs after the processing of other personal information may be too rigid.

CASE STUDY: Bruce and Young's model

INTERACTIVE EXERCISE: Bruce and Young's model

In summary, several kinds of information can be extracted from faces, with important differences existing between familiar and unfamiliar faces. It is very rare for anyone to put a name to a face without knowing anything else about the person. There is good evidence for configural processing of faces. Prosopagnosic patients do not recognise familiar faces overtly, but often show evidence of covert recognition. It has often been argued that faces are special because they involve holistic or configural processing, because there is a brain area (the fusiform face area) specifically associated with face processing, and because prosopagnosic patients have recognition problems only with faces. However, it has also been suggested that faces only appear special because we have so much expertise with them.

Visual imagery

Visual imagery occurs when a visual short-term memory representation is present but the stimulus is not actually being viewed. Individuals with Charles Bonnet syndrome experience hallucinations in which images and perceptions are confused; when hallucinating, these patients show increased activity in brain areas specialised for visual processing. We normally do not confuse images with perceptions because we are aware that we have deliberately constructed the images, and because images typically contain less detail than percepts.

WEBLINK: Website of the Kosslyn Laboratory

The essence of Kosslyn's theory (e.g., 1994, 2005) is that there are close similarities between visual imagery and visual perception, with images being depictive representations. According to Kosslyn and Thompson (2003), depictive representations are created in early visual cortex; the visual buffer is where the depictive representations of imagery are formed. In contrast, the propositional theory proposed by Pylyshyn (e.g., 2002, 2003) states that mental imagery tasks depend not on depictive representations but on tacit knowledge. (The sketch below contrasts the two kinds of representation.)
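The contrast between depictive and propositional representations can be illustrated by encoding the same toy scene in two data structures. This is my own illustrative sketch, not an example from either theorist; the scene and the relation labels are invented.

import numpy as np

# Depictive (quasi-pictorial) representation: a spatial array in which
# distance and layout are preserved implicitly by the format itself,
# as in Kosslyn's visual buffer.
buffer = np.zeros((5, 7), dtype=int)
ball_col, box_col = 1, 5
buffer[2, ball_col] = 1   # "ball" on the left
buffer[2, box_col] = 2    # "box" on the right

# Propositional representation: abstract, language-like assertions in
# which spatial relations are explicit symbols, as in Pylyshyn's account.
propositions = {
    ("ball", "left-of", "box"),
    ("ball", "same-height-as", "box"),
}

# In the depictive format, "scanning" from ball to box takes time
# proportional to the represented distance (here, 4 cells): the classic
# mental-scanning prediction.  In the propositional format, the relation
# is retrieved in one step regardless of distance.
print(f"cells to scan: {box_col - ball_col}")

This distance effect is exactly what the mental-scanning exercise below explores.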
INTERACTIVE EXERCISE: Kosslyn: mental scanning

Imagery resembles perception

If visual perception and visual imagery both depend on the same visual buffer, we would expect perception and imagery to influence each other. Pearson et al. (2008) demonstrated a facilitation effect between a stimulus that was initially perceived or imagined and what was later consciously perceived in a binocular rivalry task. Baddeley and Andrade (2000) found evidence of an interference effect between a visual imagery task and a spatial tapping task, both of which involve the visual buffer.

Kosslyn and Thompson (2003) considered 59 brain-imaging studies and found that more than half associated visual imagery tasks with activation of early visual cortex. Studies in which participants had to inspect high-resolution details, or in which there was an emphasis on shape-based processing, were more likely to show activation of early visual cortex. Kosslyn et al. (1999) found that applying rTMS to V1 impaired performance on a visual imagery task, indicating that early visual cortex is necessary for visual imagery. Ganis et al. (2004) found that the brain areas activated during imagery formed a subset of those activated during perception.

RESEARCH ACTIVITY: Kosslyn: mental imagery

Imagery does not resemble perception

There is also evidence suggesting important differences between imagery and perception (S.H. Lee et al., 2012). Images often consist of simplified structural descriptions that omit important aspects of the imagined object, and Slezak (1991, 1995) found that images can be deficient compared to visual percepts.

If visual perception and imagery involved the same mechanisms, we would expect brain damage to have similar effects on both. However, some brain-damaged patients have intact visual perception but impaired visual imagery (Bartolomeo, 2002). How can we account for intact perception with impaired imagery? Kosslyn argued that these patients have problems generating the image from information in long-term memory. Conversely, in Anton's syndrome, a blind person is unaware that he or she is blind and may mistake imagery for actual perception – an example of intact imagery with impaired perception.

The central assumption of Kosslyn's perceptual anticipation theory has attracted much support, and its predictions regarding facilitation and interference effects have also been supported. However, the evidence from brain-damaged patients is hard to evaluate, and we need a better understanding of why dissociations occur between perception and imagery. There is convincing evidence that different brain areas are involved in imagery for object shapes and in imagery for movement/spatial relationships.

In summary, according to Kosslyn's perceptual anticipation theory, there are close similarities between visual imagery and perception, with images being depictive or "quasi-pictorial" representations. Pylyshyn (2002) put forward a propositional theory, according to which people asked to form images make use of tacit knowledge in the form of propositions. Evidence from brain-damaged patients and from brain-imaging studies supports Kosslyn's theory, but the existence of dissociations between perception and imagery poses problems for it.

Additional references

Brédart, S., Brennen, T., Delchambre, M., McNeill, A. & Burton, A.M. (2005). Naming very familiar people: When retrieving names is faster than retrieving semantic biographical information. British Journal of Psychology, 96: 205–14.
Kosslyn, S.M., Pascual-Leone, A., Felician, O., Camposano, S., Keenan, J.P., Thompson, W.L., Ganis, G., Sukel, K.E. & Alpert, N.M. (1999). The role of Area 17 in visual imagery: Convergent evidence from PET and rTMS. Science, 284: 167–70.

Malone, D.R., Morris, H.H., Kay, M.C. & Levin, H.S. (1982). Prosopagnosia: A double dissociation between the recognition of familiar and unfamiliar faces. Journal of Neurology, Neurosurgery, & Psychiatry, 45: 820–2.