Roland Fleming & Bart Anderson: Perceptual Organization of Depth

The Perceptual Organization of Depth

Roland Fleming and Bart Anderson
Department of Brain and Cognitive Sciences, MIT

Running Head: Perceptual Organization of Depth

Correspondence: Roland Fleming, MIT Room NE20-451, 77 Mass. Ave., Cambridge, MA 02139. Email: [email protected]

Introduction

The goal of depth perception is to identify the spatial layout of the objects and surfaces that constitute our surroundings. One important observation about the world around us that influences the way we see depth is that physical matter is not distributed randomly, with arbitrary depths at every location. On the contrary, the environment is generally organized: the world consists mainly of tightly bound objects in a discernible layout. This order results from countless forces and processes that tend to organize matter into objects and to place those objects in particular spatial relations.

The central thesis of this chapter is that our perception of depth mirrors this organization. We argue that because the world consists of objects and surfaces, our representation of depth should likewise be organized in terms of the functionally valuable units of the environment, namely surfaces and objects. As we shall see, this has profound consequences for the processing of depth information. In particular, there is more to depth perception than simply measuring the distance from the observer of every location in the visual field. Rather, the perception of depth is the active organization of depth estimates into meaningful bodies. Depth constrains the formation of perceptual units, and, reciprocally, the figural relations between depth measurements allow the visual system to parse its representation of depth into ecologically valuable structures.

There are many sources of information about depth, ranging from ‘pictorial’ perspective to motion parallax.
An exhaustive review of all these sources of information is beyond the scope of this chapter (although see Bruce et al., 1996 and Palmer, 1999 for introductory reviews). Instead, we discuss three key domains in which the visual system “organizes” our perception of depth into meaningful units, to emphasise the intimate relationship between depth processing and perceptual unit formation.

In the first section we discuss how the visual system infers the layout of surfaces from local measurements of depth. We will argue that local estimates of depth are ambiguous, but that the geometry of occlusion critically constrains the legal interpretations. Occlusion occurs when one opaque object partly obscures the view of a more distant object, as happens frequently under normal viewing conditions. Occlusion is important because it occurs at object boundaries, and therefore the depth discontinuities introduced by occlusion provide ideal locations for the segmentation of depth into objects. Moreover, as we will show in section 1, the geometry of occlusion causes relatively near and relatively far depths to play different roles in the inference of surface structure.

In the second section we discuss the visual representation of environmental structures that are hidden from view. If the visual system is to organize depth into meaningful bodies, it must represent whole objects and not only those fragments that happen to be visible. In order to do this, the visual system must interpolate across gaps in the image to complete its representation of form. We argue that by considering the particular environmental conditions under which structures become invisible (specifically occlusion and camouflage) we can make predictions about the mechanisms underlying visual completion. We also discuss how visual completion influences the representation of depth.
Finally, we discuss what happens when the scene contains transparent surfaces, so that multiple depths are visible along a single line of sight. We argue that this introduces a second segmentation problem in the perceptual organization of depth. The visual system not only needs to segment “perpendicular” to the image plane, such that neighbouring locations are assigned to different objects; with transparency, it also has to segment depth “parallel” to the image plane, by separating a single image intensity into multiple layers at different depths, a process known as ‘scission’ (Koffka, 1935). We discuss the conditions under which the visual system performs scission, and how the ordering of the surfaces in depth is resolved.

We argue that the ambiguity of local depth measurements, the representation of missing structure, and the depiction of multiple depth planes are three of the major problems a visual system faces if it is to organize depth into surfaces and objects. Through systematic analysis of example stimuli, we discuss some of the ways in which the visual system overcomes these problems.

1. Interpreting local depth measurements: the contrast depth asymmetry principle

In this section we discuss how occlusion constrains the interpretation of local depth estimates. Specifically, we show that occlusion enforces a crucial asymmetry between relatively near and relatively distant structures, which can have profound implications for the representation of surface layout. Although the principles are discussed in terms of binocular disparity, the fundamental logic derives from the geometry of occlusion and therefore applies to any local estimate of depth.

1.1 Binocular stereopsis and the correspondence problem.

Binocular stereopsis is the most thoroughly studied source of information about depth.
Binocular depth perception relies on the fact that the two eyes receive slightly different views of the same scene. Because of the horizontal separation between the eyes, a given feature in the world often projects to two slightly different locations on the two retinae (see Figure 1). These small differences in retinal location, or binocular “disparities”, vary systematically with distance in depth from the point of convergence and can thus be used to triangulate depth. For a thorough treatment of stereopsis see Howard and Rogers (1995) and chapters [chapter numbers for Ohzawa, Schor and Shimojo] of this volume.

In order to determine the disparity of a feature in the world, the visual system must localize that feature in both retinal images. Once it has identified matching image features, the difference in their retinal locations is the binocular disparity, which can then be scaled to estimate depth. The visual system must not measure the disparity between features that do not belong together; otherwise it will derive spurious depth estimates (see Figure 1). The accuracy of the matching process is therefore critical to binocular depth perception. The problem of identifying matching features in the two eyes’ views (that is, features that originate from a common source in the world) is known as the “correspondence problem”. If the features that the visual system localizes in the two images are very simple, such as raw intensity values (or ‘pixels’), then in principle there could be many distracting features that do not in reality share a common origin in the world. Under these conditions the correspondence problem would be difficult, as the visual system would have to identify the one true match from among a large number of false targets.
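To make the false-target problem concrete, the following Python sketch counts candidate matches in a one-dimensional binary random-dot stereogram when raw pixel values are matched. It is a toy under stated assumptions, not a model of human stereopsis: the binary rows, the row length, the true shift of 3 pixels and the ±8-pixel search range are all hypothetical choices, and the sketch says nothing about how the visual system actually resolves the ambiguity.

```python
import random


def make_rows(n=200, shift=3, seed=0):
    """Binary random-dot rows; the right row is the left row displaced by `shift` pixels."""
    rng = random.Random(seed)
    left = [rng.randint(0, 1) for _ in range(n)]
    # right[i] == left[i + shift], so a dot at left position x appears at right position x - shift
    right = left[shift:] + [rng.randint(0, 1) for _ in range(shift)]
    return left, right


def candidates(left, right, x, max_d):
    """Disparities d in [-max_d, max_d] for which right[x - d] has the same value as left[x]."""
    return [d for d in range(-max_d, max_d + 1)
            if 0 <= x - d < len(right) and right[x - d] == left[x]]


left, right = make_rows()
max_d = 8
counts = [len(candidates(left, right, x, max_d)) for x in range(max_d, len(left) - max_d)]
# Mean number of candidate matches per pixel; only one of them is the true disparity (3).
print(sum(counts) / len(counts))
```

With binary pixels, roughly half of the positions in the search range match any given pixel by chance, so the single true disparity is buried among several false targets.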
However, there is considerable debate about what types of image features the visual system matches to determine disparity (Julesz, 1960, 1971; Sperling, 1970; Marr and Poggio, 1976, 1979; Pollard, Mayhew and Frisby, 1985; Prazdny, 1985; Jones and Malik, 1992). Psychophysically, at least, it now seems unlikely that the visual system matches raw luminances. Rather, it seems to match local contrast signals, that is, localizable variations in intensity such as luminance edges (Anderson and Nakayama, 1994; Smallman and McKee, 1995). This seems an almost inevitable consequence of early visual processing, which maximises sensitivity to contrast rather than to absolute luminance (Hartline, 1940; Wallach, 1948; Ratliff, 1965; Cornsweet, 1970). By the time binocular information converges in V1, the visual field appears to be represented in terms of local measurements of oriented contrast energy (Hubel and Wiesel, 1962; DeValois and DeValois, 1988), and thus it is likely that these are the features from which disparity is computed.

If this is true, then the image features that carry disparity information are local contrasts, such as luminance edges. However, this poses a problem for the visual system: in order to capture the functional units of the environment, the visual representation of depth should be tied to surfaces and objects, not to local image features. There is, therefore, a potential discrepancy between the image features that carry disparity information (i.e. local contrasts) and the perceptual structures to which depth is assigned (i.e. regions) in the ultimate representation of environmental layout. This discrepancy plays a critical role in the theoretical discussion that follows.

A local image feature, such as an edge, has only one true match in the other eye’s image. Therefore, the edge carries only one disparity.
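The idea that disparity is carried by local contrasts rather than raw luminances, with one disparity per edge, can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions, not a model of V1: luminance is a one-dimensional profile, "local contrast" is taken to be the Michelson contrast between neighbouring samples, and the 0.1 threshold and ±5-sample search range are hypothetical. Because Michelson contrast is a ratio, scaling the whole profile by an illumination factor leaves the detected edges unchanged.

```python
def edges(lum, threshold=0.1):
    """Return (position, polarity) for each local contrast edge in a 1D luminance profile.

    Contrast between neighbouring samples a, b is the Michelson contrast (b - a) / (b + a),
    which is unchanged if the whole profile is multiplied by a positive constant.
    Luminances are assumed to be positive.
    """
    found = []
    for i in range(len(lum) - 1):
        a, b = lum[i], lum[i + 1]
        c = (b - a) / (b + a)
        if abs(c) > threshold:
            found.append((i, 1 if c > 0 else -1))
    return found


def match_edges(left, right, max_d=5):
    """Pair each left-eye edge with a same-polarity right-eye edge within +/- max_d samples.

    Returns {left_position: disparity} for edges with exactly one candidate partner;
    ambiguous or unmatched edges are left out.
    """
    right_edges = edges(right)
    matches = {}
    for x, polarity in edges(left):
        cands = [x - xr for xr, pr in right_edges if pr == polarity and abs(x - xr) <= max_d]
        if len(cands) == 1:
            matches[x] = cands[0]
    return matches


# A bright bar on a dark ground, displaced by 2 samples between the eyes.
left = [10] * 10 + [40] * 5 + [10] * 10
right = [10] * 8 + [40] * 5 + [10] * 12

print(match_edges(left, right))                      # {9: 2, 14: 2}
print(edges([3 * v for v in left]) == edges(left))   # True: brighter illumination, same edges
```

Each of the bar's two edges receives exactly one disparity, which is precisely the situation that raises the question of how depth is then assigned to the regions on either side.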
However, depth is ultimately assigned to the two regions that meet to form the edge. This results in a problem: in order to represent surface structure, the visual system must assign depth to both sides of an edge, even though the edge carries only one disparity (see Figure 2). How does the visual system infer the depths of two regions from each local disparity signal? We will show that the geometry of occlusion imposes an inviolable constraint on the interpretation of local disparity-carrying features. To anticipate, we show that the simple fact that near surfaces can occlude more distant ones, but not vice versa, has profound consequences for the assignment of depth to whole regions.

1.2 Asymmetries in depth: a demonstration.

By way of motivation for the theoretical discussion that follows, consider figure 3, which is based on a figure developed by Takeichi, Watanabe and Shimojo (1992). The figure consists of a Kanizsa “illusory” triangle and three diamonds. When disparity places the diamonds nearer to the observer than the triangle and inducers (by cross-fusing the stereopair on the left of figure 3), the diamonds appear to float independently in front of the background, and the Kanizsa triangle tends to be seen as a figure in front of the circular inducers; this percept is schematised in figure 3b. The disparities in the display can be inverted simply by swapping the left and right eyes’ views, as can be seen by cross-fusing the stereopair on the right of figure 3. In this case what was previously distant becomes near and vice versa, such that the diamonds are placed behind the plane of the inducers. In both versions of the display, the triangle itself carries no disparity relative to the circular inducers; only the disparity of the diamonds changes from near to far.
This simple inversion leads to a change in surface representation that is more complex than a mere reversal in the depth ordering of the perceptual units (as schematised in figure 3b). When the diamonds recede, they drag their background back with them, such that the triangle appears as a hole through which the observer can see a white surface; the three black diamonds lie embedded in this more distant white surface. This recession of the background has the secondary effect of increasing the strength of the illusory contour (the border of the triangle).

The observations that matter for the theory are the following. First, when the diamonds are in front, they are freely floating and separate, whereas when they recede, they drag the background with them. Second, when the diamonds are in front, the Kanizsa triangle tends to be seen as a figure (rather than ground), but when the diamonds are more distant, the triangle is seen as a hole. And yet all that changed in the display was the disparity of the diamonds. Why does this simple reversal in depth lead to an asymmetric change in the surface representation? Why does the disparity of the diamonds influence the appearance of the triangle? These are the asymmetries of depth to which the following discussion pertains.

1.3 From features to surfaces: interpretation of local disparity signals.

Let us assume that the visual system has located a luminance edge and derived a disparity, d0, from that edge. What possible surface configurations are consistent with this local disparity measurement? Broadly, the legal interpretations fall into two classes, as shown in figure 4. The first class consists of surface events in which both sides of the edge meet at the depth of the edge, d0. There are many surface events for which this is the case: reflectance edges, cast shadows, and creases in the surface, to name just three.
When the feature originates from a continuous manifold, as in these cases, interpretation is simple, as both sides of the edge are assigned the same depth, d0. The second class of interpretations occurs when the edge corresponds to an object boundary, and therefore represents a depth discontinuity (see figure 4). In this case, one side of the edge lies at the depth of the occluding object, and the other side lies at the depth of the background. Therefore, the visual system must assign different depths to the two sides of the edge. How can it assign two depths when it is given only one disparity, d0? The answer is that it assigns a unique depth only to the occluding side. The critical insight is the following: the depth measurement acquired at an occluding edge specifies only the depth of the occluding surface. The visual system assigns depth d0 to the occluding surface. All that it knows about the other side is that it must be more distant than the occluding surface. If the more distant surface is untextured, then it could be at any depth behind the occluder and the local image data would remain the same. By contrast, if the depth of the occluding surface varies, the disparity carried by the object boundary must also change, because the occluding surface “owns” the contour (Koffka, 1935; Nakayama, Shimojo and Silverman, 1989) and is therefore responsible for the disparity associated with the edge. Although the visual system cannot uniquely derive the depth of the occluded side (i.e. the background) from the local disparity computation, there is one critical piece of information that it does have: the occluded side is more distant than the occluder. There is no way for an occluding object to be more distant than the background that it occludes.
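The claim that an occluding edge's disparity specifies only the occluder's depth follows from simple projective geometry. The Python sketch below assumes two pinhole eyes with parallel optical axes, separated by a hypothetical interocular distance of 0.064 m, with image positions expressed as tangents of visual angle; an opaque occluder ending at lateral position `edge_x` sits at depth `z_occ` in front of an untextured background at depth `z_bg`. Under these assumptions the edge's disparity works out to `iod / z_occ`, so it can be inverted (triangulated) to recover `z_occ`, while `z_bg` never enters the computation beyond the requirement that it lie behind the occluder.

```python
IOD = 0.064  # hypothetical interocular distance (m)


def edge_image_position(eye_x, edge_x, z_occ):
    """Image coordinate (tangent of visual angle) of the occluder's boundary for an eye at eye_x."""
    return (edge_x - eye_x) / z_occ


def edge_disparity(edge_x, z_occ, z_bg, iod=IOD):
    """Disparity of the luminance edge where an opaque occluder at depth z_occ ends
    in front of an untextured background at depth z_bg.

    The edge is the occluder's boundary, so only z_occ enters the projection; z_bg is
    checked solely to make sure the scene is an occlusion at all.
    """
    if z_bg <= z_occ:
        raise ValueError("the background must lie behind the occluder")
    x_left = edge_image_position(-iod / 2, edge_x, z_occ)
    x_right = edge_image_position(+iod / 2, edge_x, z_occ)
    return x_left - x_right  # algebraically equal to iod / z_occ


def depth_from_disparity(disparity, iod=IOD):
    """Triangulate: invert disparity = iod / z to recover the depth of the edge's owner."""
    return iod / disparity


# Moving the untextured background leaves the edge's disparity untouched ...
print(edge_disparity(0.0, 1.0, 2.0) == edge_disparity(0.0, 1.0, 5.0))  # True
# ... whereas moving the occluder changes it, and triangulation recovers the occluder's depth.
print(depth_from_disparity(edge_disparity(0.1, 1.5, 3.0)))
```

Only the depth of the surface that owns the contour can be read off the edge; for the background, all the geometry supplies is the inequality `z_bg > z_occ`.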
If the background is brought closer than the object, then the background becomes the occluding surface, and carries the edge with it. In this way, occlusion introduces a fundamental asymmetry into the interpretation of disparity-carrying edges: the occluded side of the edge can be at any distance greater than d0, but neither side can be nearer than d0. We can summarise the possible depth assignments (from the occlusion and non-occlusion classes just described) in the form of a constraint on the interpretation of local disparity-carrying contrasts, which is termed the “contrast depth asymmetry principle” (Anderson, submitted; see also Anderson, Singh and Fleming, 2002):

Both sides of an edge must be situated at a depth that is greater than or equal to the depth carried by that edge.

Although this geometric fact is simple in form, it can have pronounced effects on the global interpretation of images when the constraint applies to all edges simultaneously. We will now work through an example to show how the principle can explain the asymmetric changes in perceived surface structure that occur when near and far disparities are inverted.

1.4 Application of the contrast depth asymmetry principle.

In order to demonstrate the explanatory power of the contrast depth asymmetry principle (hereafter CDAP), we now use it to account for the demonstration in figure 3. Recall that when the diamonds carry near disparity, they float freely in front of the background, and the illusory triangle tends to be seen as figure. When the disparity is reversed, however, the diamonds drag the background back with them, and the triangle appears as a hole. This asymmetry in surface layout is depicted in figure 3b.

Let us first consider the case in which the diamonds appear to float in front. The visual system has to interpret the disparity signals carried by the edges of the diamonds.
The CDAP requires that both sides of the diamonds’ edges (i.e. the black inside and the white outside of the diamonds) be at least as distant as the edges. Now consider the inducers, which are more distant than the diamonds. The constraint requires both sides of these edges to be at least as distant as the edges themselves. This means that all of the black interior of the inducers must be at least this distant and, more importantly, that all of the white background must be at least this distant, which is farther than the disparity of the diamonds. If all of the white background is farther than the diamonds, then the edges of the diamonds must be occluding edges, and the black interior of the diamonds must be an occluding surface. This explains why the diamonds are seen as independent occluders, floating in front of the large white background and black inducers: the edges of the inducers drag the white background back, leaving the diamonds floating in front.

Now consider the case in which the diamonds are more distant than the inducers. Again, the CDAP requires that both the inside and the outside of the diamonds be at least as far back as their disparity dictates. This means that both the diamonds and their white background are dragged back to the more distant disparity. Now consider the inducers, which carry a relatively near disparity. Because the white background around the diamonds has been dragged back with the diamonds, the inducers and the white background around them must be occluding surfaces. This means that the background immediately surrounding the diamonds must be visible through a hole in the occluding surface. The edges of this hole are the illusory contours of the Kanizsa figure. Note again that the requirement that both sides of every edge be at least as far as the edge leads to asymmetrical surface structures when disparities are inverted.
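The walk-through above can be expressed as a small constraint-propagation sketch in Python. It is a toy under explicit assumptions, not the authors' model: the region names are hypothetical labels, depth is a plain number (larger means farther), the white background is treated as a single region, and each region is placed at the nearest depth the CDAP allows, i.e. the largest depth among the edges it borders. It does not model the formation of the illusory contour, so it captures which edges become occluding boundaries but not the splitting of the white surface into a near surface with a triangular hole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    a: str        # region on one side of the edge (hypothetical label)
    b: str        # region on the other side
    depth: float  # depth carried by the edge's disparity; larger = farther


def nearest_legal_depths(edges):
    """Place every region at the nearest depth the CDAP permits.

    The CDAP requires both sides of an edge to lie at or beyond the edge's depth, so a
    region's depth is bounded below by the largest depth of any edge it borders.
    Choosing exactly that bound is a simplifying assumption of this sketch.
    """
    depth = {}
    for e in edges:
        for region in (e.a, e.b):
            depth[region] = max(depth.get(region, float("-inf")), e.depth)
    return depth


def interpret(edge, depth):
    """Classify an edge given region depths, using the two classes of interpretation."""
    da, db = depth[edge.a], depth[edge.b]
    if min(da, db) < edge.depth:
        return "violates CDAP"
    if da == db == edge.depth:
        return "surface marking"              # both sides meet at the edge's depth
    if da == edge.depth:
        return f"{edge.a} occludes {edge.b}"  # a owns the edge; b lies beyond it
    if db == edge.depth:
        return f"{edge.b} occludes {edge.a}"
    return "no surface at edge depth"         # neither side can own the edge's disparity


near = [Edge("diamond", "background", 1.0), Edge("inducer", "background", 2.0)]
far = [Edge("diamond", "background", 3.0), Edge("inducer", "background", 2.0)]

for name, scene in (("diamonds near", near), ("diamonds far", far)):
    depths = nearest_legal_depths(scene)
    print(name, [interpret(e, depths) for e in scene])
# diamonds near ['diamond occludes background', 'surface marking']
# diamonds far ['surface marking', 'inducer occludes background']
```

Reversing a single disparity changes which edges are occluding boundaries, not merely the depth order: the diamonds own their edges when near, but become markings on the far background when far, while the inducers take over as occluders.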
This is just one example of how the CDAP can account for the asymmetrical effects of relatively near and relatively far disparities on perceived surface layout. Because the CDAP is derived from the geometry of occlusion, it can account for a very large number of displays, and can be used to generate surprising new displays (see Anderson, 1999; Anderson, submitted).

2. Occlusion and camouflage: hallucinating the invisible

The central thesis of this chapter is that the visual system does not merely record depth at each location in the visual field; rather, it actively organizes its depth measurements into functionally valuable units. In the previous section, we discussed how occlusion plays a key role in this organization. In this section, we discuss how the visual system handles what is arguably the hardest problem posed by occlusion: the visual representation of structures that are hidden and are therefore completely invisible. If seeing depth is about representing the actual layout of objects in the environment, then all portions of the objects must be represented, even those that are hidden from view: hidden portions do not disappear from the environment just because they do not appear in the image. Therefore, the visual system has to go beyond local image data to construct representations of hidden structures. We will now discuss how the environmental conditions of occlusion and camouflage predict properties of this construction process.

2.1 Modal and amodal completion

We will consider two major ways in which parts of the scene can become invisible. The first is simple occlusion, when an opaque object obscures part of a more distant object. When this happens, the occluded structures of the more distant object have no corresponding features in the image, and thus the visual system must somehow ‘reconstruct’ the missing data.
The second way that viewing conditions can lead to invisible structures is through camouflage. In camouflage it is the nearer, occluding surface that is rendered invisible, because it happens to match the color of its background. Because the boundaries of the camouflaged object do not project any contrast, they have no corresponding features in the image, and thus the nearer object is effectively invisible. Under these circumstances, the visual system must actively ‘hallucinate’ the invisible structures. In both cases, the visual system interpolates missing data, a process known as “visual completion”. This process is important to depth perception because it is one of the means by which the visual system organizes its depth measurements into meaningful bodies. We argue that depth perception and unit formation are intimately intertwined, for depth constrains the perceptual units that are formed, and perceptual organization influences the interpretation of local depth measurements.

The phenomenal quality of completed structures differs depending on whether near (camouflaged) or far (occluded) structures are interpolated. In the case of camouflage, interpolation leads to a distinct impression of a contour or surface across the region of missing data. This is referred to as “modal completion” (Michotte, Thines and Crabbe, 1991/1964) because the experience is of the same phenomenal modality as ordinary visual experience. An illusory contour, for example, is crisp, and subjectively similar to a real contour, as can be seen in figure 5a. In contrast, the sense of completion experienced with occluded structures is less distinct. The black form in figure 5b tends to be seen as a single object, part of which is hidden, rather than as two distinct objects whose boundaries coincide with the boundary of the grey occluder.
There is a compelling sense that the two visible portions of the black form belong to the same object, and that this object continues in the space behind the occluder. However, this impression, although visual in origin, is not of the same phenomenal mode as real or modally completed contours, and is therefore referred to as “amodal completion” (Michotte et al., 1991/1964). In general, the visible regions of the image that lead to visual completion are referred to as “inducers”.

2.2 The identity hypothesis.

There is a vast literature on visual completion, and a thorough discussion of all the issues is beyond the scope of this chapter. One important issue, discussed in greater detail in chapter [chapter number for Shimojo], is whether visual completion occurs relatively early or late in the putative processing hierarchy. However, the perceptual organization of depth has a direct bearing on another current debate, specifically, the extent to which modal and amodal completion are the consequence of a single process. This issue is intimately bound to depth perception because it determines the extent to which depth processing and perceptual organization are independent.

The debate runs roughly as follows. On the one hand, there has been the strong claim that a single completion mechanism is responsible for both modal and amodal completion. According to this account, perceptual organization (including visual completion) produces perceptual units, and an independent process places those units in depth. The theory states that psychological differences between modal and amodal completion result from the final depth ordering of the completed forms (Kellman and Shipley, 1991; Shipley and Kellman, 1992; Kellman, Yin and Shipley, 1998) rather than from a difference between the completion processes themselves. This is known as the “identity hypothesis”.
On the other hand, the two processes could be largely independent, subject to different constraints and subserved by distinct neural mechanisms. The strong form of this “dual mechanism” hypothesis would be that the two processes are of fundamentally different kinds, for example, that modal completion is largely data-driven, while amodal completion is essentially “cognitive”. To anticipate, although we do not subscribe to the strongest form of the dual-mechanism hypothesis, we will provide evidence that modal and amodal completion follow different constraints, and argue that they are subserved by distinct neural processes. Central to our arguments are the geometric and photometric conditions under which occlusion and camouflage actually occur in the environment.

The principal evidence for the identity hypothesis has been that subjects perform similarly with modally and amodally completed figures in a variety of tasks. In one task, Shipley and Kellman (1992) varied the spatial alignment of the inducing elements in both modally and amodally completed squares. Such misalignment is known to weaken the sense of completion, as the completed boundary is forced to undergo an inflection. Subjects were asked to rate the subjective strength of visual completion as a function of the degree of misalignment for modal and amodal versions of the display. Shipley and Kellman (1992) found that ratings declined at the same rate as a function of misalignment for both modal and amodal figures. This has been interpreted as evidence that a single mechanism is responsible for both forms of completion. Using a more rigorous method, Ringach and Shapley (1996) performed a shape discrimination task with modal and amodal versions of a Kanizsa figure.
By rotating the inducing elements, the vertical contours of the completed square can be made to bow out (creating a “Fat” Kanizsa) or curve in (creating a “Thin” Kanizsa). Subjects were asked to discriminate between Fat and Thin versions of the display while the angle through which the inducers were rotated was varied. Ringach and Shapley found that discrimination performance as a function of rotation was nearly identical for modal and amodal versions of the display, a finding that is consistent with the identity hypothesis.

One problem with this type of evidence is that it relies on negative results, that is, a failure to detect a difference, which could be due to the method rather than to a fundamental property of the system being studied. Should positive evidence show that modal and amodal completion are subject to different constraints, or result in different perceptual units, then the identity hypothesis would no longer be tenable.

There are two major reasons for believing that modal and amodal completion should be subject to different constraints, both of which are related to the environmental conditions under which occlusion and camouflage occur. First, occlusion can span greater distances across the image, because it only requires that one object lie in front of another. Camouflage, on the other hand, requires a perfect match in color between the near surface and its background, and thus occurs less frequently in general. This difference is reflected in a constraint on the image distances over which modal and amodal completion occur, which was first documented by Petter (1956). Petter used a class of stimuli now known as spontaneously splitting objects (SSOs), which consist of a single homogeneously colored shape, such as the one shown in figure 5c, that tends to be interpreted as two independent shapes, one behind the other.
Which shape is seen in front tends to oscillate with prolonged viewing. However, which shape is seen in front first, and which is seen in front for a greater proportion of the time, can be predicted rather well from the lengths of the contours that must be interpolated. Petter’s rule states that longer contours tend to be completed amodally, while shorter contours tend to be completed modally. Thus, which figure is seen in front can be predicted from the lengths of the contours that must be completed. If the two types of completion are subject to different constraints on the distances over which they occur, this opens the possibility that they are subserved by different mechanisms.

A second reason for believing that modal and amodal completion are subject to different constraints relates to the color conditions required for occlusion and camouflage to occur. Again, occlusion can happen between objects of any color. The reflectance of the near object is unrelated to the fact that it hides the more distant one from view. This suggests that amodal completion should not be sensitive to the luminance relations between the image regions involved. Camouflage, by contrast, requires a perfect match in luminance between the near and far surfaces. This implies that modal completion should be sensitive to the luminance relations between the image regions involved.

Recent experimental work has shown that this luminance sensitivity can lead to large differences between modal and amodal displays (Anderson, Singh and Fleming, 2002). Anderson et al. created displays consisting of two vertically separated circles filled with light and dark stripes, as shown in figure 6. The binocular disparity of the circles was kept constant, but the disparity of the light/dark contours inside the circles was altered to place the stripes behind or in front of the circular boundaries.
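The two environmental constraints described above can be written as toy predicates. The Python sketch below is illustrative only: the function names, the unit-free gap lengths and luminance values, and the `tol` parameter are hypothetical, and the text describes camouflage as requiring a perfect match (a tolerance of zero). The first function applies Petter's rule to a spontaneously splitting object: the figure whose interpolated contours would be shorter is predicted to be completed modally and seen in front. The second encodes the claim that modal completion of a near surface requires it to match the luminance of what lies behind it, whereas amodal completion carries no such requirement.

```python
def petter_front(gaps_a, gaps_b):
    """Petter's rule for a spontaneously splitting object made of figures A and B.

    gaps_a / gaps_b: lengths of the contour segments each figure would need interpolated
    where the two figures cross. The figure needing the shorter total is predicted to be
    completed modally (in front); the other is completed amodally (behind).
    Returns "A", "B", or None when the totals are equal and the rule is silent.
    """
    total_a, total_b = sum(gaps_a), sum(gaps_b)
    if total_a == total_b:
        return None
    return "A" if total_a < total_b else "B"


def completion_possible(mode, near_lum, far_lum, tol=0.0):
    """Luminance gate on completion, following the conditions for occlusion and camouflage.

    Amodal completion (occlusion) places no constraint on the occluder's luminance;
    modal completion (camouflage) requires the near surface to match the luminance behind it.
    """
    if mode == "amodal":
        return True
    if mode == "modal":
        return abs(near_lum - far_lum) <= tol
    raise ValueError(f"unknown mode: {mode!r}")


print(petter_front([2.0, 2.0], [5.0, 5.0]))      # A: shorter gaps, predicted in front
print(completion_possible("modal", 0.9, 0.9))    # True: near surface matches its background
print(completion_possible("modal", 0.9, 0.5))    # False: luminance mismatch blocks camouflage
print(completion_possible("amodal", 0.9, 0.5))   # True: luminance is irrelevant to occlusion
```

In an SSO both figures share one homogeneous color, so the luminance gate is satisfied either way and, under these assumptions, only the length constraint decides which figure is seen in front.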
When the stripes were farther than the circles, the top and bottom stripes tended to complete amodally to form a single continuous dark and light surface, which appeared to be visible through two circular holes, as schematised in figure 6d. This percept occurred irrespective of the luminance of the region surrounding the circles. By contrast, when the disparity placed the contours in front of the circles, the dark and light stripes separated into different depth planes, and the way in which they separated depended on the luminance of the surround. When the surround was the same color as the light stripes, the light stripes appeared to float in front and completed modally across the gap between the two circles. In this condition, the dark stripes completed amodally underneath the light stripes to form complete circles. This led to an impression of light vertical stripes in front of dark circles, as schematised in figure 6e. However, when the surround had the same luminance as the dark stripes, the percept inverted, such that the dark stripes appeared to float in front of light disks. This demonstrates a fundamental dependence on luminance that was not present in the amodal version of the display. Furthermore, if the surround was an intermediate grey, then the display was not consistent with camouflage, as neither the light nor the dark stripes perfectly matched the luminance of the background. Under these conditions, there was no modal completion across the gap, and the percept was difficult to interpret. This demonstrates that modal completion is sensitive to luminance relations, while amodal completion is not.

Anderson et al. showed that this luminance sensitivity could affect performance on basic visual tasks such as vernier acuity. The stripes in the top and bottom circles can be horizontally offset (i.e. misaligned slightly) without destroying the sense of completion.
Subjects were asked to report in which of two displays the contours were slightly misaligned. Both modal and amodal completion facilitated performance in this task. However, in the amodal case performance was unaffected by the luminance of the surround, while in the modal case performance was much worse when the luminance of the surround was an intermediate grey (the condition in which the stripes do not complete across the gap). Thus, modal and amodal completion are subject to different constraints, both in the distances over which they occur and in the luminance conditions required to induce them. This positive evidence for a difference between modal and amodal completion uses essentially the same types of task as the negative evidence that had previously been used to support the identity hypothesis.
2.3 Visual completion and the perceptual organization of depth.
The geometric and photometric differences between modal and amodal completion are derived directly from the environmental conditions of occlusion and camouflage. Because occlusion and camouflage occur under different circumstances, they have different consequences for the organization of depth into meaningful bodies. In fact, the differences can be exploited to generate stimuli in which modal and amodal completion lead to different shapes. This is important as it shows that unit formation is intimately bound to the placement of structures in depth. The greater “promiscuity” of amodal completion is the key to generating these displays. Figure 7 is a recently developed stereoscopic variant of the Kanizsa configuration in which the inducing elements are rotated outwards (Anderson et al., 2002). When the straight segments (the “mouths” of the “pacmen”) are placed in front of the circular portions of the inducers, the impression is of five independent illusory fragments that float in front of five black disks on a white background.
However, when the two eyes’ views are interchanged, and thus the straight contours are placed behind the circular segments, the impression is rather dramatically altered. With the disparity inverted, the impression is of a single amodally completed, irregularly shaped black figure on a white background, which is visible through five holes in a white surface (these percepts are schematised in Figures 7b and 7c). Thus, the former case consists of a total of 11 surfaces (5 fragments + 5 disks + white background), while the latter case consists of 3 (1 white surface with 5 holes + 1 black shape + white background). Clearly, the placement in depth has a considerable effect on what perceptual units are formed. Anderson et al. also provided evidence that differences between modal and amodal interpolation can lead to differences in the very shapes of the completed contours themselves. When the left-hand stereopair in Figure 8a is uncross-fused, the resulting percept consists of six circular disks that are partly occluded by a jagged white surface on the right-hand side, as schematised in Figure 8b. However, when the disparities are inverted (by uncross-fusing the right pair of Figure 8a), the modal completion across the regions between the four black blobs tends to take the form of a continuous wavy contour that runs down the center of the display. This percept is schematised in Figure 8c. The importance of this demonstration is that it shows that modal and amodal completion can result not only in different surface structures but even in differently shaped contours. It is difficult to see what the concept of a single completion mechanism serves to explain if the two processes can result in different completed forms. Ultimately, the identity hypothesis is a claim about mechanism and can therefore be assessed physiologically.
There is a considerable body of evidence for extrastriate units that are sensitive to illusory, but not to amodally completed, contours (see chapter [chapter number for von der Heydt], this volume, for a review). A critical additional piece of evidence was provided recently by Sugita (1999), who found cells in V1 that respond to amodal completion across their receptive fields, but not to modal completion. Cells responded weakly when presented with two unconnected edges; with holes and occluding surfaces on their own; and with stimuli in which two unconnected edges were separated by a hole. However, when the cells were presented with two edge fragments separated by an occluder (a stimulus that leads to amodal completion of the edge), the cells responded vigorously. This shows that at the earliest stages of cortical processing, there is a double dissociation between the representations of modal and amodal structures, a conclusion that supports the dual mechanism hypothesis.
3. Transparency, scission, and the representation of multiple depth planes.
Transparency poses a particularly interesting problem in the perceptual organization of depth. With transparency, one object is visible through another, and thus two distinct depths lie along the same line of sight (see Figure 9). If the visual system is to represent depth in terms of the actual surfaces of the environment, it has to represent two distinct depths at a single location in the visual field. The process of projection compresses the light arriving from the transparent surface and the light arriving from the more distant surface into a single image intensity on the retina. In order to represent both surfaces, the visual system has to separate a single luminance value into multiple contributions, a process known as scission (Koffka, 1935).
We argue that scission is a type of perceptual segmentation, because it parses the representation of depth into distinct surfaces. However, rather than segmenting neighbouring locations into distinct objects, scission separates depth into layers, or planes, and thus operates “parallel” to the image plane. Scission poses the visual system with two principal problems. The first is to identify when a single luminance results from two distinct depths. The second is to assign surface properties correctly at the two depths. By studying when and how we see transparency, we can learn how the visual system scissions depth into layers. Much of the seminal work on perceptual transparency was conducted by Metelli (1970, 1974a,b; see also Metelli et al., 1985), who provided a thorough quantitative analysis of the color mixing that occurs when one surface is visible through another. When a background is visible through a transparent sheet, only certain geometrical and luminance relations can hold between the various regions of the display (see Figure 9). From these relations Metelli derived constraints that determine whether a region will look transparent or not, and how opaque it will appear if it does look transparent. This is important as it determines the conditions under which the visual system scissions a single image intensity into multiple layers, and thus how the visual system stratifies its representation of depth. Broadly, the conditions required for perceptual scission fall into two classes. The first comprises the photometric conditions for transparency, which specify the relations between the light intensities of neighbouring regions that are necessary for scission. The second set of conditions is geometrical, or figural: depth only separates into layers when the appropriate geometrical relations hold between the various regions of the display.
3.1 Photometric conditions for scission.
Consider the display shown in Figure 9a, which tends to be seen as a bipartite background that is visible through a transparent filter. The vivid separation of the central region into two depths only occurs when certain luminance relations hold. Metelli derived two constraints on the photometric conditions required for perceptual scission. The intuition behind the first constraint, which we refer to as the “magnitude constraint”, is that a transparent medium cannot increase the contrast of the structures visible through it. The consequence of this constraint is that the central diamond must be lower in contrast than its surround in order to appear transparent, as shown in Figure 9a. This constraint is important as it restricts the conditions under which scission occurs: a region can only scission if its contrast is less than or equal to the contrast of its flanking regions. As can be seen from Figure 9c, infringement of this constraint with respect to the central diamond prevents the diamond from undergoing scission. However, in this display the constraint is satisfied for the region surrounding the diamond, and thus the display can be seen as a bipartite background viewed through a transparent filter with a diamond-shaped hole in the centre. The intuition behind the second luminance constraint, which we refer to as the “polarity constraint”, is that a transparent medium cannot alter the contrast polarity of the structures visible through it. Put another way, if a dark-light edge passes underneath a transparent medium, the dark side will remain darker than the light side, no matter what the absolute luminances are. As can be seen from Figure 9d, infringement of this constraint prevents perceptual scission, demonstrating that the visual system respects this optical consequence of transparency. This constraint is particularly important in determining the depth ordering in transparent displays.
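The two constraints can be made concrete with Metelli's episcotister (linear mixing) model. If a layer with transmittance α and its own luminance t lies over an edge whose sides have luminances a and b, the sides seen through the layer have luminances p = αa + (1 − α)t and q = αb + (1 − α)t, so that α = (p − q)/(a − b). The polarity and magnitude constraints then correspond to 0 < α ≤ 1: the luminance difference inside the candidate layer keeps the sign of the difference outside it and is no larger. The Python below is a minimal sketch under those assumptions: it works with luminance differences rather than a particular contrast metric, and the function names and luminance values are hypothetical. Applying the same test to the borders of two overlapping squares gives the three cases of Figure 10 described next.

```python
from __future__ import annotations


def metelli_alpha(a: float, b: float, p: float, q: float) -> float:
    """Transmittance implied by Metelli's episcotister (linear mixing) model.

    a, b: luminances on the two sides of an edge outside the candidate layer.
    p, q: luminances of the same two sides seen through the candidate layer.
    Solves p = alpha*a + (1 - alpha)*t and q = alpha*b + (1 - alpha)*t for alpha.
    """
    if a == b:
        raise ValueError("the edge needs nonzero contrast outside the layer")
    return (p - q) / (a - b)


def can_scission(a: float, b: float, p: float, q: float) -> bool:
    """True if (p, q) could be the edge (a, b) seen through a transparent layer.

    alpha > 0  <=> polarity constraint (the inner difference keeps its sign).
    alpha <= 1 <=> magnitude constraint (the inner difference is not larger).
    """
    alpha = metelli_alpha(a, b, p, q)
    return 0.0 < alpha <= 1.0


def transparent_squares(bg: float, a_only: float, c_only: float,
                        overlap: float) -> set[str]:
    """Which of two overlapping squares, A and C, can be seen as a transparent layer.

    A over C: C's border (c_only vs bg outside A) continues as (overlap vs a_only) inside A.
    C over A: A's border (a_only vs bg outside C) continues as (overlap vs c_only) inside C.
    """
    layers = set()
    if can_scission(c_only, bg, overlap, a_only):
        layers.add("A")
    if can_scission(a_only, bg, overlap, c_only):
        layers.add("C")
    return layers


if __name__ == "__main__":
    # Hypothetical luminances: light background, mid-grey A, dark-grey C.
    for overlap in (0.2, 0.45, 0.95):
        print(overlap, sorted(transparent_squares(0.9, 0.6, 0.3, overlap)))
```

Run as a script, this reports both squares for an overlap of 0.2 (the bistable case), only A for 0.45 (one square transparent), and neither for 0.95 (polarity reversed on both borders). A contrast-based version would need a choice of contrast metric, which the constraints as stated here do not fix.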
The polarity constraint enforces certain restrictions on the ordinal relationships between the luminances of neighbouring regions. This means that, in principle, we can classify the locations where neighbouring regions meet to determine whether scission is or is not possible in each region. This provides the visual system with a local signature of transparency. Beck and Ivry (1988) noted that if one draws a series of lines running progressively from the brightest to the darkest regions, there are three possible shapes that result, as shown in Figure 10. The only difference between the three figures is the luminance of the region of overlap between the two squares. In the first instance (Figure 10a), the image is bistable, as either square can be seen as a transparent overlay. In these circumstances the lines linking regions of increasing luminance form a Z-configuration. When the lines form a C-shape (Figure 10b), only one of the squares is seen as transparent, and when the lines criss-cross (Figure 10c), the polarity constraint is infringed for all regions, and neither square scissions. Adelson and Anandan (1990) provided a similar taxonomy based on the number of polarity reversals. A number of lightness illusions demonstrate that scission can be predicted from the class of X-junctions in the display, and that these X-junctions can have powerful effects on many qualities of our experience (see, for example, Adelson, 1993, 1999). The magnitude and polarity constraints can be unified as a single rule that describes a powerful local cue to scission. Anderson (1997) phrased the rule as follows: “When two aligned contours undergo a discontinuous change in contrast magnitude, but preserve contrast polarity, the lower-contrast region is decomposed into two causal layers”. There are two valuable consequences of this rule. The first is that it unifies the two Metelli constraints.
The second is that it provides a local signature of transparency that can be applied to any meeting of contours. This includes those T-junctions that are in fact degenerate X-junctions, that is, those in which two neighbouring regions happen to have exactly the same luminance. Anderson (1997) also demonstrated that a number of traditional lightness phenomena, including White’s effect and its variants, and neon color spreading, can be accounted for as cases of scission, rather than as the consequence of traditional “contrast” or “assimilation” processes. Having identified that a location contains two surfaces, the visual system has to partition the luminance at that location between the two depths. How much of the light is due to the reflectance of the underlying surface, and how much is due to the properties of the overlying layer? The opacity of the overlying layer determines how the luminance is divided between the two depths. Metelli’s model makes explicit predictions about the perceived opacity and lightness of the transparent layer. The equations predict that two surfaces with identical transmittance should look equally opaque irrespective of their lightness. However, Metelli himself noted that dark filters tend to look more transparent than light filters with the same transmittance. Why does the visual system confuse lightness and transmittance in partitioning luminance between two depths? In a series of matching experiments, Singh and Anderson (in press) recently resolved this issue. Subjects adjusted the opacity of one filter until it matched the perceived opacity of another filter with a different lightness. Singh and Anderson found that perceived transmittance is predicted almost perfectly by the ratio of Michelson contrasts inside and outside the transparent region, even though such a measure is actually inconsistent with the optics of transparency.
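The difference between the two accounts can be illustrated numerically. The Python sketch below renders two hypothetical filters with the same Metelli transmittance (α = 0.5) but different luminances (t = 0.1 and t = 0.9) over the same bipartite background, using the episcotister model, and then computes both Metelli's α and the ratio of Michelson contrasts, (p − q)/(p + q) divided by (a − b)/(a + b), inside and outside the filter. All values and function names are illustrative assumptions, not stimulus values or code from Singh and Anderson's experiments.

```python
from __future__ import annotations


def filter_view(a: float, b: float, alpha: float, t: float) -> tuple[float, float]:
    """Luminances (p, q) of background luminances (a, b) seen through a filter.

    Episcotister model: each background luminance is mixed with the filter's
    own luminance t in proportions alpha and (1 - alpha).
    """
    return alpha * a + (1 - alpha) * t, alpha * b + (1 - alpha) * t


def metelli_alpha(a: float, b: float, p: float, q: float) -> float:
    """Transmittance recovered from the episcotister equations."""
    return (p - q) / (a - b)


def michelson(x: float, y: float) -> float:
    """Michelson contrast of two luminances."""
    return abs(x - y) / (x + y)


def michelson_ratio(a: float, b: float, p: float, q: float) -> float:
    """Michelson contrast inside the filter divided by that outside it."""
    return michelson(p, q) / michelson(a, b)


if __name__ == "__main__":
    a, b = 0.8, 0.4  # hypothetical bipartite background
    for label, t in (("dark filter", 0.1), ("light filter", 0.9)):
        p, q = filter_view(a, b, alpha=0.5, t=t)
        print(f"{label}: alpha = {metelli_alpha(a, b, p, q):.2f}, "
              f"Michelson ratio = {michelson_ratio(a, b, p, q):.2f}")
```

Metelli's α is 0.50 for both filters, whereas the Michelson ratio is about 0.86 for the dark filter and 0.40 for the light one. If perceived transmittance tracks the Michelson ratio, as Singh and Anderson report, the dark filter should look more transparent, which is the pattern Metelli observed.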
As discussed above, there is a general consensus that early visual processing tends to optimise sensitivity to contrast, rather than to absolute luminance. Hence, in assigning transmittance, the visual system appears to use the readily available contrast measurements, even though they are not strictly accurate measures of opacity.
3.2 Figural conditions for scission.
In addition to the luminance conditions, certain geometrical relations must hold between the various regions of the display in order for depth stratification to occur (Metelli, 1974; Kanizsa, 1979/1955). These figural conditions fall into two broad classes. The first class requires good continuation of the underlying layer. Specifically, the contours that are in plain view should be continuous with the contours viewed through the region of presumed transparency. As can be seen from Figure 9e, infringement of this condition interrupts the percept of transparency. The second figural condition requires good continuation of the transparent layer. Figure 9f shows that infringement of this condition weakens or eliminates the percept of transparency. There are conditions in which the figural cues to transparency are so strong that they can override the luminance cues. Beck and Ivry (1988) showed subjects displays like the one shown in Figure 10c, in which the region of overlap between the two figures has the wrong contrast polarity for either figure to be seen as transparent. Despite this, naïve subjects did occasionally report seeing such figures as transparent, demonstrating that the sense of figural overlap is a central aspect of the percept of transparency. Certainly most observers are willing to agree that the region of overlap in Figure 10c appears to belong to two figures simultaneously, an impression that can be enhanced with stereo and relative motion.
However, it should be noted that the grey of the overlap region does not appear to scission into two distinct sources, at least not in the same way as the overlap of a normal transparency display does (as in Figures 10a and 10b). This raises the possibility of two distinct neural processes in the perception of transparency. One is driven by relatively local cues and leads to phenomenal color scission. The other is driven by more global geometrical relations, and leads to stratification in depth. Under normal conditions of transparency, the two processes operate in concert to produce the full impression of transparency. However, using carefully designed cue-conflict stimuli, such as those used by Beck and Ivry, these two factors in the representation of transparent surfaces can be distinguished. It remains an open question how these processes are instantiated neurally. All we can conclude is that the representation of depth is much more sophisticated than a mere 2D map of depth values.
3.3 Scission and the perceptual organization of depth.
Scission can have pronounced effects on perceptual organization. For example, Stoner, Albright and Ramachandran (1990) demonstrated that perceived transparency can alter the integration of motion signals into coherent moving objects. When a plaid is drifted at constant velocity across the visual field, it is typically seen as a single coherent pattern that moves at the velocity of the intersections between the two component gratings. However, with prolonged viewing the plaid appears to separate into two component gratings that slide across each other, each of which appears to move in the direction perpendicular to its orientation. When the plaid is coherent, it appears to occupy a single depth plane, but when it separates into its components, the gratings tend to appear at different depths. Stoner et al.
varied the intensity of the intersections of the plaids and measured the proportion of time for which the plaid was seen as coherent. They found that when the color of the intersections was consistent with one grating being seen through the other (i.e., when the junctions were consistent with transparency), the proportion of time for which the plaid appeared to separate into gratings was greatly increased. By contrast, when the color of the intersections infringed the polarity constraint, such that neither grating could be seen as transparent, the pattern tended to be seen as a coherent plaid, rather than undergoing scission into distinct layers. This demonstrates that scission has important consequences for the representation of visual structure. When an image region scissions, the effects can spread to regions distant from the local cues to scission. Scission acts as a nexus between depth and other visual attributes. Scission of depth can cause regions to change in apparent lightness, and conversely, changes in luminance can cause changes in depth stratification. Figure 11 (taken from Anderson, 1999) demonstrates this close relationship between luminance, scission and the perceptual organization of depth. Three circular patches of a random texture were placed on a uniform background. Critical to the demonstration is that disparity is introduced between the circular boundaries and the texture inside the circles. When the disparity places the texture behind the circular boundaries, the circles appear as holes, through which the texture is visible. The texture tends to appear as a single plane with continuously, stochastically varying lightness. However, when the disparity places the texture in front of the circular boundaries, the percept changes considerably.
The texture separates into two distinct layers: a near layer made of clouds with spatially varying transmittance, and a far layer that is visible through the clouds, which consists of uniform disks on a uniform background. Another interesting property of this display is that the lightness and spatial structure of the clouds and disks reverse completely when the luminance of the surround varies. In Figure 11, the top and bottom displays are identical except for the lightness of the surround. When the surround is dark, the texture scissions into dark, smoke-like clouds in front of white disks. However, when the surround is white, it is the light portions of the texture that move forward, floating like mist in front of dark disks. One final observation about the display is that when the texture carries near disparity, and thus undergoes scission, the clouds that float in front tend to complete modally across the gaps between the disks. This is in part because the conditions for camouflage are satisfied, as discussed in section 2. When the depth is reversed in the display, two asymmetries occur. The first is geometrical, in that it alters the structure of the depths in the scene. In the near case the texture scissions into two layers, while in the far case the texture appears relatively uniform in depth by comparison. The second asymmetry that occurs with depth inversion is photometric, in that it is driven by the luminance of the surround and determines the lightness of the clouds and disks. When the texture is distant, the percept changes very little with changes to the luminance of the surround; by contrast, when the texture is near, the luminance of the surround critically determines how the scission occurs, as well as the lightness of the clouds and disks.
In what follows, we will use the contrast depth asymmetry principle (CDAP) discussed in section 1 and the concept of scission to explain these asymmetries. For a more thorough discussion see Anderson (submitted). Let us first consider the case in which the texture carries far disparity relative to the circular boundaries. Because the texture varies continuously in luminance, it carries localizable disparity signals at almost every location. Put another way, if disparity is carried by contrast, as argued in section 1, then patterns that are richly structured bear the densest distribution of disparities. Recall that the CDAP requires both sides of every contrast to be at least as distant as the disparity carried by the contrast. This means that when the texture is given far disparity (or more precisely, when the contrasts of the texture are given far disparity), both the light and dark matter in the texture recede to this depth. In turn, the depth placement of the texture uniquely determines the border ownership of the boundaries of the disks, which carry relatively near disparity. If the insides of the disks (i.e., the texture) carry far disparity, then, because at least one side of the circular boundary must lie at the depth that the boundary specifies, the outsides (i.e., the region surrounding the disks) must be at the depth carried by the circular boundary. Thus, the circles are seen as holes in the surrounding surface; it is through these holes that the texture is visible. The situation is more complex when the depth is reversed, i.e., when the contrasts of the texture are nearer than the contrast of the circular boundaries. Crucial to the following argument is that it is contrasts that carry disparity, while it is the light and dark regions that make up the contrasts to which depth is assigned. First let us consider the circular boundary between the surround and the texture.
When the surround is light, it is the dark portions of the texture (inside the circles) that contrast with the surround. Thus, the disparity of the circular boundary is carried by the contrast between the light matter of the surround and the dark portion of the disk. The CDAP requires both of these regions to be at least as distant as the disparity carried by the boundary. This means that the light surround is dragged back to this depth, and the dark matter of the texture is also dragged back to this depth. Now consider the contrasts between the dark and light portions within the texture. These contrasts carry relatively near disparity. But the contrast between the dark matter and the surround has already constrained the dark matter to be at least as distant as the circular boundary. This means that it must be the light matter of the texture that is responsible for the near disparity of the texture; that is, the light matter is a near surface that partly obscures the dark matter. This explains why the texture splits into two depths: the dark matter is dragged back by forming a contrast that carries far disparity (i.e., the boundary of the disk), and the light matter floats in front, as its boundaries with the dark matter carry near disparity. The final logical step in the explanation involves scission. The texture does not consist of only two luminances, but of a continuous range of luminances from light to dark. How can we explain the appearance of the intermediate luminances in the texture? Scission makes it possible to separate the intermediate luminances into two distinct components, dark “stuff” and light “stuff”, which have been compressed into a single luminance by the process of projection onto the retina. These two components lie in different depth planes. Put another way, scission allows the visual system to interpret the grey regions as dark matter viewed through light matter.
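The depth-assignment step of this argument can be sketched as a small constraint problem in Python. The sketch makes several simplifying assumptions that are not part of the account above: depth is a single number that grows with distance, the texture is reduced to two kinds of matter (light and dark), and each region is given the nearest depth the CDAP allows, namely the farthest depth of any contrast it takes part in. The function names and depth values are hypothetical, and the scission of intermediate greys is not modelled here.

```python
from __future__ import annotations

from collections import defaultdict


def assign_depths(contrasts: list[tuple[str, str, float]]) -> dict[str, float]:
    """Give each region the nearest depth consistent with the CDAP.

    contrasts: (region_1, region_2, depth) triples, where depth is the depth
    specified by the disparity of that edge (larger = farther).
    CDAP: both regions forming a contrast lie at least as far as the contrast,
    so each region is pushed back to the farthest contrast it takes part in.
    """
    depth: dict[str, float] = defaultdict(lambda: float("-inf"))
    for r1, r2, d in contrasts:
        depth[r1] = max(depth[r1], d)
        depth[r2] = max(depth[r2], d)
    # At least one side of each edge (the occluding side) must sit at its depth.
    for r1, r2, d in contrasts:
        if d not in (depth[r1], depth[r2]):
            raise ValueError(f"no side of the {r1}/{r2} edge lies at depth {d}")
    return dict(depth)


def figure11_layout(surround: str, texture_near: bool) -> dict[str, float]:
    """Depths for a Figure-11-style display (hypothetical depth units).

    The circular boundary (depth 1.0) is formed by the surround and the texture
    matter of the opposite lightness; the texture's internal light/dark
    contrasts lie at 0.0 (near) or 2.0 (far).
    """
    partner = "dark" if surround == "light" else "light"
    texture_depth = 0.0 if texture_near else 2.0
    return assign_depths([
        ("surround", partner, 1.0),        # circular boundary
        ("dark", "light", texture_depth),  # contrasts inside the texture
    ])


if __name__ == "__main__":
    for surround in ("light", "dark"):
        for near in (True, False):
            where = "near" if near else "far"
            print(f"{surround} surround, {where} texture:",
                  figure11_layout(surround, near))
```

With a light surround and a near texture, the light matter is assigned depth 0.0 and the dark matter 1.0, so the light matter floats in front; with a dark surround the assignment flips. When the texture is far, both kinds of matter sit at 2.0 and the surround at 1.0 whatever its lightness, so the circles read as holes and the surround has little effect, in line with the asymmetries described above.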
The critical insight is that it is the dark “stuff” in the texture that forms the contrast with the surround. Therefore, all of the dark stuff belongs to the more distant depth, including the dark stuff “in” the greys. All of the “remaining” lightness in the greys belongs to the transparent clouds that float in front of the disks. In this way, the intermediate luminances are interpreted as varying degrees of transmittance of the overlying layer: the lighter the grey, the thicker the cloud; the darker, the sparser. This explains why the disk appears as a uniform black disk: all of the black is “sucked out” of the intermediate regions and is dragged back to form the disk. The “left-over” lightness is attributed to the transparent clouds. The whole argument reverses when we change the surround from light to dark. When the surround is dark, it is the light portions of the texture that contrast with the surround, and therefore it is the light portions of the texture that are dragged back. The near disparity of the texture must therefore be due to the dark regions, and thus dark clouds are seen to float in front of white disks. Again, as it is the whiteness of the texture that is dragged back, all of the whiteness in the intermediate luminances is attributed to the more distant disks. The “remaining” darkness in the greys is attributed to the dark clouds that float in front. In this way, changing the luminance of the surround changes which contrasts carry the disparities, and thus which regions are dragged back by virtue of the CDAP. Scission enables the visual system to separate luminances into multiple contributions and thus segment the intermediate greys into two distinct depth planes. This demonstration and others like it are important as they show how multiple processes interface to determine our percepts of depth and material quality.
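The scission of the intermediate greys can also be sketched numerically. The Python below assumes, beyond what the text specifies, that the two layers mix linearly as in Metelli's model (observed luminance = τ · disk + (1 − τ) · cloud, with τ the cloud's local transmittance) and that the disk and cloud take the darkest and lightest luminances present in the texture. Which extreme goes to the far disks is set by the surround, following the argument above: dark matter for a light surround, light matter for a dark one. The function name and sample values are hypothetical.

```python
from __future__ import annotations


def decompose_texture(luminances: list[float],
                      surround: str) -> tuple[float, float, list[float]]:
    """Split texture luminances into a far disk and a near cloud of varying opacity.

    Assumes linear mixing, L = tau*disk + (1 - tau)*cloud, where tau is the
    cloud's local transmittance; the returned opacity is 1 - tau.
    The matter that contrasts with the surround is dragged back to form the
    disks: the darkest value for a light surround, the lightest for a dark one.
    """
    lo, hi = min(luminances), max(luminances)
    if lo == hi:
        raise ValueError("a uniform texture has nothing to decompose")
    disk, cloud = (lo, hi) if surround == "light" else (hi, lo)
    # 1 - tau = (L - disk) / (cloud - disk); abs() keeps both cases non-negative.
    opacities = [abs(lum - disk) / abs(cloud - disk) for lum in luminances]
    return disk, cloud, opacities


if __name__ == "__main__":
    texture = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical grey levels
    for surround in ("light", "dark"):
        disk, cloud, opacity = decompose_texture(texture, surround)
        print(surround, "surround: disk", disk, "cloud", cloud,
              "opacity", [round(o, 2) for o in opacity])
```

With a light surround the cloud's opacity rises with the grey level (0.0 for the darkest sample up to 1.0 for the lightest), so lighter greys read as thicker light cloud over a dark disk; with a dark surround the order reverses, so darker greys read as thicker dark cloud over a light disk.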
It is through the CDAP and scission that the visual system interprets local variations in luminance as meaningful surfaces located in depth. Depth stratification complements traditional segmentation as an important process through which the visual system organizes its representation of depth into ecologically valid structures.
Conclusions
It is common to think that depth perception involves little more than determining the depth at each location in the visual field. We have argued, to the contrary, that the visual system mirrors the structural organization of the environment by tying its representation of depth to surfaces and objects. Thus, depth perception is an active process of perceptual organization, as well as a passive process of acquiring depth estimates. We have argued that luminance, disparity and contrast are some of the basic image features that carry local information about depth, while scission, visual completion and the CDAP are some of the means by which depth is organized into surfaces. In the first section we introduced the CDAP and argued that:
(1) Disparity is carried by local contrasts (e.g., luminance edges) but assigned to the regions that meet to form the contrasts.
(2) Occlusion introduces a critical constraint on the interpretation of local disparity signals, the CDAP. This constraint requires either that both sides of a contrast lie at the depth specified by the contrast, or that one side is a more distant, occluded surface. In the latter case, the disparity determines the depth of the occluding side.
(3) The CDAP imposes a fundamental asymmetry between near and far structures. When simultaneously applied to all edges in a display, the CDAP can explain a number of asymmetrical changes in perceived surface layout that occur with simple inversion of the disparity field.
In the second section, we discussed how the visual system deals with structures that are invisible because they are hidden by occlusion or camouflaged against their background. We argued that: (1) The visual system has to actively complete the missing data if it is to accurately segment depth into objects. (2) Consideration of the environmental conditions of occlusion and camouflage predicts (a) that modal completion is sensitive to luminance, while amodal completion is not, and (b) that modal completion tends to occur over shorter distances than amodal completion. (3) As predicted from the environmental differences, distinct mechanisms are responsible for the two types of completion. The differences can be used to generate displays in which the completed forms differ when the disparity field is inverted. Roland Fleming & Bart Anderson: Perceptual Organization of Depth 35 Finally, in the third section, we discussed how scission allows the visual system to represent two depths along the same line of sight, and thus organize depth into layer. We argued that: (1) Certain luminance and figural relations must obtain in order for a region to undergo scission. (2) Scission can have pronounced effects on perceptual organization in regions distant from the local signatures of transparency. Roland Fleming & Bart Anderson: Perceptual Organization of Depth 36 References Adelson, E.H. & Anandan, P. (1990). Ordinal characteristics of transparency. AAAI-90 Workshop on Qualitative Vision, July 29, 1990, Boston, MA. Adelson, E.H. (1999). Lightness perception and lightness illusions, in The new cognitive neurosciences, (M. Gazzaniga, Editor-in-chief), Cambridge, MA: MIT Press. Anderson, B.L. (1997). A theory of illusory lightness and transparency in monocular and binocular images: The role of contour junctions. Perception, 26: 419-453. Anderson, B.L. (1999). Stereoscopic surface perception. Neuron, 24: 919-928. Anderson, B.L. 
Stereoscopic surface perception: Contrast, disparity and perceived depth. Submitted to Psychological Review. Anderson, B.L., Singh, M. & Fleming, R.W. (2002). The Interpolation of Object and Surface Structure. Cognitive Psychology, 44, 148-190. Anderson, B.L. & Nakayama, K. (1994). Towards a general theory of stereopsis: Binocular matching, occluding contours and fusion. Psychological Review, 101: 414-445. Beck, J. & Ivry, R. (1988). On the role of figural organization in perceptual transparency. Perception & psychophysics, 44: 585-594. Bruce, V., Green, P.R. & Georgeson, M.A. (1996). Visual Perception (3rd Edition). Hove, East Sussex, UK: Psychology Press. Cornsweet, T.N. (1970). Visual Perception. New York: Academic Press. DeValois, R.L. & DeValois, K.K. (1988). Spatial Vision. New York: Oxford University Press. Roland Fleming & Bart Anderson: Perceptual Organization of Depth 37 Hartline, H.K. (1940). The Receptive Fields of Optic Nerve Fibres. American Journal of Physiology, 130: 690-699. Howard, I.P. & Rogers, B.J. (1995). Binocular vision and stereopsis., New York: Oxford University Press. Hubel, D.H. & Wiesel, T.N. (1962). Receptive fields, binocular interaction and functional architecture of monkey striate cortex. Journal of Physiology, 160: 106154. Jones, J. & Malik, J. (1992). A computational framework for determining stereo correspondence from a set of linear spatial filters. Image and Vision Computing, 10: 699-708. Julesz, B. (1960). Binocular depth perception of computer generated patterns. Bell System Technical Journal, 39: 1125-1162. Julesz, B. (1971). Foundations of cyclopean perception., Chicago, IL: University of Chicago Press. Kellman, P.J. & Shipley, T.F. (1991). A theory of visual interpolation in object perception. Cognitive Psychology, 23: 141-221. Kellman, P.J., Yin, C. & Shipley, T.F. (1998). A common mechanism for illusory and occluded object completion. Journal of Experimental Psychology: Human Perception & Performance, 24: 859-869. 
Koffka, K. (1935). Principles of Gestalt Psychology. Cleveland: Harcourt, Brace and World.
Kanizsa, G. (1979/1955). Organization in Vision. New York: Praeger.
Marr, D. & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194: 283-287.
Marr, D. & Poggio, T. (1979). A computational theory of human stereo vision. Proceedings of the Royal Society of London (B), 204: 301-328.
Metelli, F. (1970). An algebraic development of the theory of perceptual transparency. Ergonomics, 13: 59-66.
Metelli, F. (1974a). The perception of transparency. Scientific American, 230: 90-98.
Metelli, F. (1974b). Achromatic color conditions in the perception of transparency. In R.B. MacLeod & H.L. Pick (Eds.), Perception: Essays in Honor of J.J. Gibson. Ithaca, NY: Cornell University Press.
Metelli, F., da Pos, O. & Cavedon, A. (1985). Balanced and unbalanced, complete and partial transparency. Perception & Psychophysics, 38: 354-366.
Michotte, A., Thines, G. & Crabbe, G. (1991/1964). Amodal completion of perceptual structures. In G. Thines, A. Costall & G. Butterworth (Eds.), Michotte's Experimental Phenomenology of Perception (pp. 140-167). Hillsdale, NJ: Erlbaum.
Nakayama, K., Shimojo, S. & Silverman, G.H. (1989). Stereoscopic depth: Its relation to image segmentation, grouping, and the recognition of occluded objects. Perception, 18: 55-68.
Palmer, S.E. (1999). Vision Science. Cambridge, MA: MIT Press.
Petter, G. (1956). Nuove ricerche sperimentali sulla totalizzazione percettiva. Rivista di Psicologia, 50: 213-227.
Pollard, S.B., Mayhew, J.E.W. & Frisby, J.P. (1985). A stereo correspondence algorithm using a disparity gradient limit. Perception, 14: 449-470.
Prazdny, K. (1985). Detection of binocular disparities. Biological Cybernetics, 52: 93-99.
Ratliff, F. (1965). Mach Bands: Quantitative studies on neural networks in the retina. San Francisco, CA: Holden-Day.
Ringach, D.L. & Shapley, R. (1996). Spatial and temporal properties of illusory contours and amodal boundary completion. Vision Research, 36: 3037-3050.
Singh, M. & Anderson, B.L. (in press). Toward a perceptual theory of transparency. Psychological Review.
Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature, 401: 269-272.
Shipley, T.F. & Kellman, P.J. (1992). Perception of partly occluded objects and illusory figures: Evidence for an identity hypothesis. Journal of Experimental Psychology: Human Perception and Performance, 18: 106-120.
Smallman, H.S. & McKee, S.P. (1995). A contrast ratio constraint on stereo matching. Proceedings of the Royal Society of London (B), 260: 265-271.
Sperling, G. (1970). Binocular vision: A physiological and neural theory. American Journal of Psychology, 83: 461-534.
Stoner, G.R., Albright, T.D. & Ramachandran, V.S. (1990). Transparency and coherence in human motion perception. Nature, 344: 153-155.
Takeichi, H., Watanabe, T. & Shimojo, S. (1992). Illusory occluding contours and surface formation by depth propagation. Perception, 21: 177-184.
Wallach, H. (1948). Brightness constancy and the nature of achromatic colors. Journal of Experimental Psychology, 38: 310-324.

Figure Captions

Figure 1. (a) The two eyes converge by angle α on a point P. Therefore, by definition, P projects to the foveae of both eyes (P’). The Vieth-Müller circle is one of the geometrical horopters; that is, it traces a locus of points in space that project to corresponding retinal locations in the two eyes, and thus carry no interocular disparity. Point Q is closer to the observer than P (as it falls inside the horopter). Therefore, it projects to non-corresponding locations on the two retinae (Q’). The difference between the two locations of Q’ is the binocular disparity, which can be scaled by the vergence angle, α, to derive depth.
(b) When the visual field contains many points, it is ambiguous which image features correspond in the two eyes. Correct matches yield correct depth estimates, such as dA. (c) By contrast, false matches yield erroneous depth estimates. Here, the image of point A has been incorrectly matched with the image of point B, leading to an incorrect depth estimate, d*.

Figure 2. (a) The image of a square occluding a diamond. A receptive field of limited extent (the ellipse) captures only local information about the scene, here a vertical luminance edge. This local information is ambiguous, because many different scenes could have produced the same image feature. (b) If disparity is calculated by matching local contrasts, then the edge carries only a single disparity. However, in this case the light and dark sides of the edge arise from two distinct objects, so different depths must be assigned to the two sides of the edge.

Figure 3. Asymmetries in depth interpolation, adapted from Takeichi et al. (1992). (a) When the left stereopair is cross-fused, the diamonds appear to float independently in front of the Kanizsa triangle, as schematised in (b). When the disparity of the diamonds is inverted (by cross-fusing the right stereopair), the diamonds drag their background with them, creating the percept of a triangular hole, even though only the disparity of the diamonds has changed. This asymmetrical change in surface structure can be explained by the contrast depth asymmetry principle (see main text).

Figure 4. Adapted from Anderson et al. (2002). A contour that carries a depth signal (e.g., disparity) is inherently ambiguous. Two main classes of world states could have given rise to the contour: the contour could have originated from a single continuous surface (e.g., a reflectance edge or cast shadow), or it could have originated from an occlusion event.
In the occlusion case, the border ownership of the contour (i.e., which side is the occluder) is ambiguous. Nonetheless, in all configurations, both sides of the contour are constrained to be at least as far as the depth signal carried by the contour. This introduces a fundamental asymmetry in the roles of near and far contours in determining surface structure (see text for details).

Figure 5. (a) Modal completion. Most observers report seeing a vivid white triangle in front of three disks and a black triangular outline. The contours of the white triangle are subjectively distinct, resembling real contours, even though there is no corresponding image contrast; hence the triangle is “illusory”. (b) Amodal completion. Most observers report seeing a single continuous black shape, part of which is hidden from view by the grey occluder, even though the hidden parts are, by definition, invisible. (c) A self-splitting object (SSO). Even though the shape is uniformly black, it tends to be seen as two forms, one in front of the other. Which form completes modally, and which amodally, depends in part on the distance that must be spanned by the completion: by Petter’s law, the form whose completion spans the shorter gaps tends to be seen in front, completing modally.

Figure 6. Adapted from Anderson et al. (2002). Demonstration of the dependence of modal completion on surround luminance. When the left stereopairs of (a), (b), and (c) are cross-fused, the stripes tend to complete amodally across the gap between the two circular holes, creating the impression of a single striped surface (like wallpaper) viewed through two apertures, as depicted in (d). This occurs irrespective of the luminance of the surround. However, when the right stereopairs are cross-fused, thus inverting the disparity, only two of the stripes appear to complete modally, and which stripes complete depends critically on the surround luminance, as depicted in (e).
When the surround is dark, as in (a), the dark stripes complete modally; when the surround is light, as in (b), the light stripes complete modally; and when the surround is intermediate, as in (c), no modal completion is visible. This demonstrates that modal completion is luminance dependent, while amodal completion is not.

Figure 7. Adapted from Anderson et al. (2002). (a) Relative depth alters perceptual organization. When the left stereopair is cross-fused, the figure tends to appear as five disks occluded by five distinct image fragments, as depicted in (b); the transparency in (b) is included only so that both depth planes can be depicted simultaneously. When the depth ordering is reversed by cross-fusing the right stereopair, a single irregular black “star” appears to lie on a continuous white background, which is visible through five holes in a continuous overlying layer. In this depth ordering, the black shape tends to appear as figure.

Figure 8. The serrated-edge illusion, adapted from Anderson et al. (2002). When the left stereopair in (a) is uncross-fused, the resulting percept consists of six circular disks that are partly occluded by a jagged white surface on the right, as depicted in (b). When the right stereopair is uncross-fused, the modal completion of these four black blobs tends to take the form of a single wavy contour that runs vertically down the center of the display, as depicted in (c). Although other percepts are possible, this is an existence proof that depth inversion alone can alter the shape of modally and amodally completed contours.

Figure 9. Perceptual transparency. The figure in (a) tends to be seen as a light grey transparent surface in front of a bipartite background, as depicted in (b); thus two distinct surfaces are visible along the same line of sight. Transparency is seen only when certain relations hold between the various regions of the display.
In (c), the central region has higher contrast than its surround and is therefore not seen as transparent. In (d), the polarity of the contrasts is reversed, and again transparency is not seen. In (e), the contour of the underlying layer is not continuous inside and outside the central region, which eliminates the percept of transparency. In (f), the contour of the overlying layer is not continuous, which also reduces the percept of transparency.

Figure 10. Adapted from Beck and Ivry (1988). The polarity constraint means that transparency manifests itself in distinctive local ordinal relations in luminance. The only difference between the three figures is the luminance of the region of overlap. In (a), the overlap region is dark, and the image is bistable, because either square can be seen in front. In this case, a line that passes progressively from brighter to darker regions forms a Z-shape. In (b), the overlap is intermediate, so the line that joins regions of decreasing brightness is C-shaped. In this case, exactly one of the surfaces appears transparent. In (c), the overlap is light, creating a criss-cross pattern. Here, neither square appears transparent, because the polarity constraint is violated for both squares.

Figure 11. Scission and the perceptual organization of depth; adapted from Anderson (1999). The top and bottom figures are identical apart from the brightness of the surround. When the right stereopair is cross-fused, the figure appears as a single textured plane that is visible through three circular holes. This is seen irrespective of the luminance of the surround. However, when the disparity is reversed (by cross-fusing the left stereopair), the texture appears to separate into two depth planes. The near layer contains clouds that vary spatially in thickness or opacity.
Through these clouds, three more distant disks can be seen, which appear more or less uniform in lightness. With this depth ordering, the structure reverses completely when the surround luminance changes: in the top case, the dark portions of the texture form the clouds; in the bottom case, the light portions of the texture form the clouds. Scission makes these percepts possible by allowing the visual system to separate the intermediate greys into two distinct contributions.

[Figure pages not reproduced. Recoverable panel labels: Figure 1(a), Vieth-Müller circle; Figure 2(b), world and image; Figure 4, local image data, matching and disparity computation, possible depth interpretations (continuous surfaces, occluding surfaces); Figure 8(b), serrated edge near; Figure 8(c), serrated edge far.]
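Figure 11 attributes these percepts to scission, the splitting of an intermediate grey into two contributions. One classical way to make that split explicit is Metelli's episcotister model (Metelli, 1970, 1974a); the sketch below is offered only as an illustration of that model, not as the account the authors themselves adopt. It assumes luminances normalised to [0, 1], a bipartite background whose two regions are seen directly as a and b and through the putative layer as p and q, so that p = αa + (1 − α)t and q = αb + (1 − α)t, with α the layer's transmittance and t its own lightness. The function name `metelli_scission` is hypothetical. Under this model α lies strictly between 0 and 1 only if contrast polarity is preserved and contrast is reduced inside the region, which parallels the failures of transparency in Figure 9(c) and (d).

```python
from typing import Optional, Tuple


def metelli_scission(a: float, b: float, p: float, q: float) -> Optional[Tuple[float, float]]:
    """Split two 'seen-through' luminances into a transmittance and a layer lightness.

    Illustrative sketch of Metelli's episcotister model (an assumption, not the
    chapter's own model). All luminances are assumed normalised to [0, 1].
      a, b: the two background regions seen directly (outside the layer)
      p, q: the same two regions seen through the putative transparent layer
    Model: p = alpha * a + (1 - alpha) * t and q = alpha * b + (1 - alpha) * t.
    Returns (alpha, t), or None when no physically valid layer exists.
    """
    if a == b:
        return None  # no background contrast, so alpha is undefined

    # Subtracting the two model equations eliminates t: p - q = alpha * (a - b).
    alpha = (p - q) / (a - b)

    # alpha <= 0: contrast polarity reversed inside the region (cf. Figure 9d).
    # alpha >= 1: contrast was not reduced inside the region (cf. Figure 9c).
    if not 0.0 < alpha < 1.0:
        return None

    # Back-substitute into the first equation to recover the layer's own lightness.
    t = (p - alpha * a) / (1.0 - alpha)
    if not 0.0 <= t <= 1.0:
        return None  # would require a layer lighter than white or darker than black
    return alpha, t


if __name__ == "__main__":
    # Light/dark background (0.8 / 0.2) seen through a layer as 0.6 / 0.3.
    print(metelli_scission(0.8, 0.2, 0.6, 0.3))  # approximately (0.5, 0.4)
    # Same regions with the polarity reversed inside: no transparent layer.
    print(metelli_scission(0.8, 0.2, 0.3, 0.6))  # None
```

Note that the two outputs, α and t, are exactly the "two distinct contributions" the caption refers to: the same intermediate grey p can be produced by many combinations of transmittance and layer lightness, and the second region q is what pins the split down.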
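The caption of Figure 1(a) states that binocular disparity must be scaled by the vergence angle to yield depth. The sketch below makes that concrete for the simplest case only, assuming symmetric fixation on a point P straight ahead, a second point Q also on the midline, angles in radians, and an assumed interocular distance of 6.5 cm; the function names are hypothetical. Under these assumptions Q subtends the binocular angle α + η, where η is its disparity relative to P (signed so that crossed disparity, i.e. Q inside the Vieth-Müller circle, is positive), so Q's distance follows from the same geometry that relates vergence to fixation distance.

```python
import math


def distance_from_vergence(vergence: float, iod: float) -> float:
    """Distance (same units as iod) of a midline point subtending `vergence` radians."""
    # Symmetric convergence: each eye rotates by vergence / 2 toward the midline.
    return iod / (2.0 * math.tan(vergence / 2.0))


def vergence_for_distance(distance: float, iod: float) -> float:
    """Inverse of distance_from_vergence: binocular angle subtended by a midline point."""
    return 2.0 * math.atan(iod / (2.0 * distance))


def depth_from_disparity(vergence: float, disparity: float, iod: float) -> float:
    """Distance of a midline point Q, given its disparity relative to the fixated point P.

    Crossed disparity (Q nearer than P, inside the Vieth-Müller circle) is positive.
    Q subtends vergence + disparity, so the same disparity maps onto different
    depths at different vergence angles.
    """
    return distance_from_vergence(vergence + disparity, iod)


if __name__ == "__main__":
    IOD = 0.065  # assumed interocular distance in metres (a typical adult value)
    alpha = vergence_for_distance(1.0, IOD)        # fixate P at 1 m
    eta = vergence_for_distance(0.9, IOD) - alpha  # disparity of Q at 0.9 m
    print(f"vergence {alpha:.5f} rad, disparity {eta:.5f} rad")
    print(f"recovered distance of Q: {depth_from_disparity(alpha, eta, IOD):.3f} m")
    # Read out with a nearer fixation, the same disparity implies a smaller depth step.
    near_alpha = vergence_for_distance(0.5, IOD)
    print(f"same disparity at 0.5 m fixation: {depth_from_disparity(near_alpha, eta, IOD):.3f} m")
```

The last line illustrates why the caption insists on the vergence term: a fixed disparity corresponds to roughly a 10 cm depth step when fixating at 1 m, but to a much smaller step when fixating at 0.5 m, so disparity alone does not specify depth.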