Transparency and Occlusion

Barton L. Anderson
University of New South Wales
One of the great computational challenges in recovering scene structure from images
arises from the fact that some surfaces in a scene are partially obscured by nearer surfaces
or media. Both occluding and transparent surfaces interrupt the projection of more
distant surfaces, and may be considered two ends of a continuum. Occluding surfaces
completely obscure the surfaces that they occlude, whereas transparent surfaces only
partially obscure the surfaces they overlay. The degree to which a transparent surface
obscures an underlying surface depends on its transmittance (i.e., the proportion of light
from the underlying layer that it lets through). Thus, when the transmittance is zero, the
near surface is an opaque occluder; when it is greater than zero, some light from the
underlying layer is transmitted. In order for the visual system to recover scene structure
in contexts in which the transmittance of a near layer falls between zero and one, it must
decompose the image into a “layered” representation that specifies the presence of
multiple surfaces (or a surface and intervening media) along the same line of sight. This
form of decomposition has been termed scission.
In addition to the relationship between transparency and occlusion, the physical
transformations induced by transparent surfaces are intimately related to the physical
transformations that are caused by changes in illumination. This suggests the possibility
that the decomposition that underlies the perception of transparency may also underlie the
separation of illumination from surface reflectance, and hence, the computation of
surface lightness and/or color. Thus, the topic of transparency is intimately related to two
apparently distinct domains: the computation of occlusion relationships; and the
computation of surface lightness. In this paper, I will describe some recent evidence that
reveals the intimate relationship between scission and the perception of surface opacity,
lightness, and depth, and the impact this research has on theoretical frameworks in vision
more broadly.
The computation of transparency
Transparency emerged as a fundamental topic in vision research with the seminal
work of the Italian psychologist Metelli (1970; 1974a,b).
Metelli developed a model of transparency based on the physical setup he used to
generate transparent images, namely a disk with a missing sector (episcotister) that
rotated in front of a two-toned background (see Fig. 1). Metelli derived a simple set of
equations that described the relationship between the reflectance of the underlying
background surfaces, the transmittance of the transparent filter (i.e., the size of the
missing sector), and the reflectance of the transparent layer. Metelli argued that
perceived transparency was well predicted by the physical constraints that must be
satisfied to produce a transparent surface, and thus, embraced an “inverse optics”
approach to modeling visual perception. In particular, Metelli used his episcotister
display to derive a physical model for the transmittance (α) and reflectance (t) of the
transparent surface, which was just a weighted sum of the contributions of the light
transmitted from the underlying layer and that reflected by the episcotister:
p = αa + (1-α)t    (1)
q = αb + (1-α)t    (2)
where p is the luminance of the region in which the transparent layer overlaps
background a, q is the luminance of the region in which it overlaps background b, and α
is the transmittance of the transparent layer (i.e., the proportion of the episcotister
taken up by the open sector). These equations can be solved to derive separate expressions for the
transmittance and reflectance of the transparent surface:
α = (p-q)/(a-b)    (3)
t = (aq-bp)/(a+q-b-p)    (4)
To make physical sense, α is restricted to be between 0 and 1, which imposes two basic
constraints on the images that are consistent with transparency: the luminance difference
between the regions of transparency (p-q) must have the same sign as the luminance
difference between the regions in plain view (a-b) (to ensure that α is positive); and the
magnitude of the luminance difference of the regions of transparency (p-q) must be less
than or equal to the luminance difference of the surface in plain view (a-b) (to ensure that
the transmittance falls between 0 and 1). Perhaps the most salient restriction of the
applicability of these equations is that they can only be used to describe transparent filters
containing a uniform reflectance and transmittance. They make no predictions of
displays containing “unbalanced” transparent filters or media, i.e., forms of transparency
that are not uniform in reflectance and/or transmittance. Note that Metelli’s model is a
purely generative model, i.e., his equations described the (simplified) physics of his
episcotister display. Thus, the issue of whether such equations could be used to predict
when, whether, and how transparency was perceived is an issue that required
psychological experimentation.
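As a rough illustration of equations (1)-(4) and the two constraints on α, the generative model and its inversion can be sketched in a few lines of Python. The function names and the numerical values are hypothetical choices made for this example; they are not taken from Metelli's papers.

```python
def metelli_forward(a, b, alpha, t):
    """Equations (1)-(2): the two regions seen through the transparent layer."""
    p = alpha * a + (1 - alpha) * t
    q = alpha * b + (1 - alpha) * t
    return p, q


def metelli_inverse(a, b, p, q):
    """Equations (3)-(4): recover transmittance alpha and reflectance t.

    Returns None when the values are inconsistent with transparency,
    i.e. when alpha would fall outside the interval [0, 1].
    """
    if a == b:
        return None  # no difference in plain view, so alpha is undefined
    alpha = (p - q) / (a - b)
    if not 0 <= alpha <= 1:
        return None  # polarity reversal (alpha < 0) or increased difference (alpha > 1)
    if alpha == 1:
        return alpha, None  # fully transmissive layer: t cannot be recovered
    t = (a * q - b * p) / (a + q - b - p)
    return alpha, t


# Hypothetical values: backgrounds a, b and a filter with alpha = 0.5, t = 0.5.
p, q = metelli_forward(a=0.75, b=0.25, alpha=0.5, t=0.5)
print(p, q)
print(metelli_inverse(0.75, 0.25, p, q))
```

Inverting the forward model returns the α and t that generated p and q, while inputs that reverse polarity or increase the difference are rejected.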
Nearly three decades of research into transparency perception seemed to provide
compelling evidence that Metelli’s model successfully predicted when transparency was
and was not perceived (Metelli, 1974a,b, 1985; Metelli, Da Pos, & Cavedon, 1985;
Gerbino, Stultiens, Troost, & de Weert, 1990; Kasrai & Kingdom, 2001). However, we
have recently argued that this apparent success stems from the particular
methodologies employed to assess his model. Although Metelli derived separate
expressions for the transmittance and reflectance of a transparent surface, until recently,
no experiments assessed whether these expressions accurately captured human perception
of transparent surfaces. Rather, most experiments typically required observers to adjust
(or otherwise judge) the luminance of a test patch in a display until it appeared to form a
uniform transparent filter; they did not measure whether Metelli’s equations could
quantitatively predict the perceived transmittance and reflectance of a transparent filter.
To provide a more direct quantitative test of Metelli’s model, we (Singh &
Anderson, 2002) recently performed a number of experiments to directly assess whether
his equations correctly predicted the perception of transparency. In these experiments,
observers were required to separately match the transmittance and reflectance of a
transparent filter. Observers viewed displays containing a central sinewave grating
surrounded by a higher contrast sinewave grating of the same frequency, orientation and
phase. To enhance the perception of transparency, binocular disparity was added to the
edges of the central patch, giving rise to a clear percept of a homogeneous transparent
filter lying on top of a high contrast grating. Metelli’s model predicts that the
perceived transmittance of a transparent filter should only depend on the ratio of
luminance differences (i.e., the luminance range) of the regions of transparency to the
regions in plain view (see equation 3). In one of our experiments, the luminance range of
the region in plain view (the high contrast grating) and the luminance difference of the
region of transparency were held constant; only the mean luminance of the region of
transparency was varied. Observers adjusted the luminance range of a matching pattern
to match the perceived transmittance of the transparent filter with varying mean
luminance values. In this experiment, the luminance range of the region of transparency
and the region in plain view are constant, so Metelli’s equations predict that the
transmittance settings in this experiment should be independent of mean luminance. This
was not at all what was observed experimentally. Rather, observers’ matches are very
strongly dependent on the mean luminance of the transparent region (see Fig. 2). In
particular, observers significantly underestimate the transmittance of light filters, and
systematically overestimate the transmittance of dark filters.
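The prediction under test can be made concrete with a small sketch. The luminance values below are hypothetical and are not the stimulus values used in the experiment; they only show that equation (3) yields the same transmittance whenever the two luminance ranges are fixed, whatever the mean luminance of the region of transparency.

```python
def metelli_alpha(a, b, p, q):
    """Equation (3): transmittance as a ratio of luminance differences."""
    return (p - q) / (a - b)


# Hypothetical luminances (arbitrary units).
a, b = 80.0, 20.0      # peak and trough of the grating in plain view
half_range = 15.0      # half the luminance range of the region of transparency

for mean in (30.0, 50.0, 70.0):
    p, q = mean + half_range, mean - half_range
    # The predicted transmittance is the same for every mean luminance.
    print(mean, metelli_alpha(a, b, p, q))
```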
The results of this experiment reveal that human observers are not simply
inverting the physics of transparency to compute the properties of transparent surfaces.
The theoretical importance of this finding should not be underestimated, as one of the
main theoretical views of visual processing assumes that the visual system extracts the
properties of the world by performing computations that invert the image formation
process. These findings provide a striking example where human observers have a very
clear sense of a surface property (the transmittance of a transparent layer) that almost
always generates the physically incorrect answer. Why, then, does the visual system err
in this manner? What information is the visual system using to compute these properties
that causes it to give such incorrect responses?
One of the main problems with Metelli’s model is that it assumes that
transmittance is computed on the basis of the ratio of luminance differences between the
regions of transparency and the regions in plain view.1 However, the earliest stages of
cortex do not seem to have access to raw luminance values. Rather, the information about
image structure seems to be largely transformed into a contrast code. Indeed, when the
data from this experiment are plotted in terms of Michelson contrast, it becomes evident
that the visual system uses contrast to scale the transmittance of a transparent surface,
even though this yields the physically incorrect answer. In our sinusoidal displays,
Michelson contrast provides a good measure of perceived contrast, and so works well for
these displays. However, Michelson contrast fails to
provide an adequate measure of contrast in more complex spatial patterns containing,
e.g., random patches of achromatic surfaces. Despite these shortcomings, recent work
has shown that observers’ transparency judgments can be well described by the perceived
contrast of images (Robillotto, Khang, & Zaidi, 2002; Robillotto & Zaidi, 2004),
suggesting that transparency computations are indeed based on some (yet to be
discovered) measure of image contrast.
1 Strictly speaking, Metelli’s model is formulated on the basis of reflectance differences.
However, Gerbino et al. (1990) showed that Metelli’s equations could be rewritten in terms of
differences in luminance values, and that the form of these equations was identical to equations (1-4).
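To illustrate why a contrast code departs from Metelli's prediction, the sketch below compares equation (3) with a ratio of Michelson contrasts. Using the contrast ratio as the transmittance estimate is an assumption made for this example, and the luminance values are hypothetical.

```python
def michelson(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)


def contrast_ratio(a, b, p, q):
    """Assumed contrast-based estimate of transmittance: contrast in the
    region of transparency divided by contrast in plain view."""
    return michelson(max(p, q), min(p, q)) / michelson(max(a, b), min(a, b))


# Hypothetical luminances (arbitrary units); both luminance ranges are fixed.
a, b = 80.0, 20.0
half_range = 15.0

for mean in (30.0, 50.0, 70.0):
    p, q = mean + half_range, mean - half_range
    metelli = (p - q) / (a - b)  # equation (3), identical for every mean
    print(mean, metelli, round(contrast_ratio(a, b, p, q), 3))
```

With these values the contrast ratio falls as mean luminance rises: it lies above the equation (3) value for the dark filter and below it for the light filter, which is the direction of the errors described above.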
In sum, there is now clear evidence that the visual system uses the relative
contrast of image regions to compute the opacity of transparent surfaces. In what
follows, I will consider the implications that this discovery has had in understanding
when the decomposition into a layered representation is initiated, and the consequences
that this decomposition can have on not just the properties of the transparent surface, but
the underlying surface as well.
Anchoring perceived transmittance
In addition to the quantitative failures of Metelli’s model, it was also not easily
generalized to a variety of naturally occurring forms of transparency. Metelli’s model
only described conditions of “balanced” transparency, i.e., conditions where the
transmittance and reflectance of the transparent layer are uniform. However, many
naturally occurring forms of transparency – such as that induced by smoke and fog – are
typically unbalanced, particularly in transmittance. Clearly, a more general framework is
needed to understand the wide variety of ways that transparent surfaces and media can
transform image structure. How does the visual system determine whether a scene is in
plain view or viewed through a transparent layer or medium when the properties of the
transparent layer vary continuously?
If image contrast is the currency through which the properties of transparent
surfaces are computed, then any general theory of transparency perception will have to
use image contrast as one of its main ingredients in the computation of transparency.
However, all natural scenes generate variations in image contrast, and the properties of
the underlying surfaces are unknown. So how can the visual system determine whether a
given image arose from a scene in plain view or a higher contrast scene viewed through a
(contrast reducing) transparent layer? The image data is always consistent with both
possibilities, so some additional principle is needed to explain how the visual system determines
when transparency is and is not perceived. I have recently proposed that the visual
system employs a transmittance anchoring principle to determine when transparency is
(and is not) inferred (Anderson, 1999, 2003a). The intuitive content of this principle is
that the visual system assumes the fewest surfaces necessary to account for the image
data. More specifically, it states that the visual system treats the highest contrast region
along surfaces and contours as transmittance anchors that are in plain view. All other
contrast values along such contours and/or surfaces are compared to this anchor region,
and decreases in contrast along surfaces are used to infer the presence of transparency.
In particular, this theory asserts that if there are reductions in contrast along surfaces
or contours that are geometrically continuous, then scission is initiated and transparency
is perceived. The magnitude of the contrast reduction is used to compute the
transmittance of the overlying (transparent) layer (in proportion to this reduction; i.e., the
greater the contrast reduction, the more opaque the transparent surface will appear). In
the limit, where the contrast of the underlying surface goes to zero, the transmittance of
the overlying surface goes to zero, and the near layer becomes an occluder. Note also
that if a change in mean luminance occurs without a corresponding reduction in contrast,
the transformation is consistent with a change in illumination, and hence both regions of
the image would be correctly predicted to appear in plain view. Thus, the
contrast relationships along surfaces and contours could be used to provide information
about both transparent media and changes in illumination.
We have recently shown that this theory correctly predicts the perception of
transparency in both balanced displays and spatially inhomogeneous media
(Anderson, Singh, & Meng, 2006). In addition, we have shown that transmittance
anchoring also has a temporal component: if the contrast of a texture is modulated in
time, the highest contrast region in the spatio-temporal sequence is treated as a region in
plain view, and the spatio-temporal reductions in contrast appear as the intrusion of
transparent media. Thus, the transmittance anchoring principle appears to provide a
foundation on which to predict when transparency is and is not perceived, and the
perceived transmittance of the transparent layer.
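The anchoring principle can be sketched as a simple computation over contrast samples taken along one geometrically continuous surface or contour (or over time, for a temporally modulated texture). The linear scaling of transmittance with contrast used here is an assumption made for illustration; the theory only states that perceived opacity grows with the contrast reduction. The function name and sample values are hypothetical.

```python
def anchor_transmittance(contrasts):
    """Treat the highest-contrast sample as lying in plain view and express
    every other sample as a fraction of that anchor.

    `contrasts` holds non-negative contrast magnitudes. Returns the index of
    the anchor and one transmittance estimate per sample (1.0 = plain view,
    0.0 = opaque occluder).
    """
    peak = max(contrasts)
    if peak <= 0:
        raise ValueError("need at least one sample with positive contrast")
    anchor = contrasts.index(peak)
    return anchor, [c / peak for c in contrasts]


# Hypothetical contrast samples along a contour.
samples = [0.2, 0.8, 0.4, 0.0]
anchor, estimates = anchor_transmittance(samples)
print(anchor, estimates)
```

The sample with zero contrast maps to a transmittance of zero, which corresponds to the limiting case in which the near layer becomes an occluder.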
Scission and the perception of lightness
The decomposition of an image region into multiple layers can also have a
significant impact on perceived lightness. In the preceding, I have focused on how
variations in image contrast can be used to determine whether transparent surfaces are
present, and if so, how the opacity of the transparent layer is computed. However, when
multiple surfaces are present along the same line of sight, the visual system must also
compute properties of the underlying surface, such as its lightness. Given the close
relationship between the physical transformations induced by transparent surfaces and the
transformations induced by changes in illumination, there are good a priori reasons to
suspect that a process of scission may play a critical role in the perception of surface
lightness. Although a number of authors have suggested the possibility that scission may
play a critical role in lightness perception (Anderson, 1997; Bergström, 1977; Gilchrist,
1977, 1979; Adelson, 1993), such authors have questioned whether an explicit
decomposition actually underlies lightness perception. More recently, we have been
developing a new paradigm that explicitly reveals the role that scission can play in the
perception of surface lightness (Anderson, 1999; 2003a,b; Anderson & Winawer, 2005).
Traditional lightness studies typically employ homogeneous targets to be judged, and
measure the effect different contexts have on their perceived lightness or brightness. In
such paradigms, it can often be difficult to determine whether scission is occurring, and if
so, whether it is playing any causal role in the perceived lightness of a figure. However,
in the paradigm we have developed, if scission occurs, it is phenomenologically very
explicit, and the effects it has on perceived lightness can be directly experienced.
To see the role that scission can play in lightness perception, consider the image
depicted in Fig. 3. This figure appears to contain white chess pieces viewed through dark
smoke, and black chess pieces viewed through white fog. However, the image regions
containing the chess pieces in the top and bottom of the image are actually absolutely
identical; the only difference between the top and bottom images is the overall lightness
of the surrounds. These images were carefully constructed to satisfy the constraints of
transparency. In the top image, all of the boundaries separating the chess pieces from the
surround are lighter inside the chess pieces than outside (so that the boundary separating
the regions is of constant polarity), whereas the opposite polarity holds for the bottom
figure. In addition to the shift in polarity between the top and bottom figure, there are
also significant differences in the way the magnitude of contrast varies along the borders
separating the chess pieces and their surrounds in the two images. Consider, e.g., the
king in the two images. In the top figure, the greatest contrast between the king and the
surround occurs along the bottom right of the piece, whereas the lowest contrast occurs
along the top. Thus, the transmittance anchoring principle states that the bottom right of
this figure should appear in plain view, which is white; whereas the reductions in contrast
that occur along the boundaries separating the king from its surround should signal the
presence of a transparent medium that varies in opacity (being most opaque where the
contrast of the boundary is lowest, which here, occurs along the top of the king). A
similar analysis holds for the bottom figure, except now the polarity and magnitude
relationships are reversed. In this image, the highest contrast region along the border
separating the king from its surround occurs along its top, and hence, the transmittance
anchoring principle predicts that this portion of the king (which is dark) should appear in
plain view, and lower contrast regions along the contour should appear partially obscured
by transparent media. In the bottom image, the lowest contrast region of the
king-surround border occurs along the lower right of the image, and thus, the opacity of the
transparent layer should be greatest in this region. This is consistent with what observers
report.
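The analysis of the king in Fig. 3 can be summarized in a short sketch that combines the polarity constraint with transmittance anchoring. The signed contrast values are hypothetical (they were not measured from the figure), and treating a polarity reversal as ruling out scission, together with the linear opacity scale, are assumptions made for illustration.

```python
def analyze_border(signed_contrasts):
    """Analyze contrast samples along a figure's border.

    Each sample is a signed contrast (positive when the figure is lighter
    than its surround). Returns None if polarity reverses along the border;
    otherwise returns the anchor index, the lightness assigned to the figure
    ("light" or "dark"), and a relative opacity for each sample.
    """
    nonzero = [c for c in signed_contrasts if c != 0]
    if not nonzero:
        return None  # no border contrast at all
    if any(c > 0 for c in nonzero) and any(c < 0 for c in nonzero):
        return None  # polarity is not constant along the border
    magnitudes = [abs(c) for c in signed_contrasts]
    peak = max(magnitudes)
    anchor = magnitudes.index(peak)  # highest contrast: treated as plain view
    figure = "light" if nonzero[0] > 0 else "dark"
    opacity = [1 - m / peak for m in magnitudes]  # lowest contrast: most opaque
    return anchor, figure, opacity


# Hypothetical samples ordered from the top of the king to its bottom right.
top_image = [0.1, 0.3, 0.6]       # king lighter than surround everywhere
bottom_image = [-0.6, -0.3, -0.1]  # king darker than surround everywhere

print(analyze_border(top_image))
print(analyze_border(bottom_image))
```

For the top image the anchor falls at the bottom right and the figure is labeled light, with the greatest opacity at the top; for the bottom image the anchor falls at the top and the figure is labeled dark, with the greatest opacity at the bottom right.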
In sum, the perception of transparency and lightness in Fig. 3 reveals the close
relationship between the perception of transparency, occlusion, and lightness. The
perception of transparency involves the decomposition of an image into multiple layers,
and the properties of the layers depend critically on exactly how luminance is partitioned
between them. There is a growing body of data suggesting that the contrast relationships
that occur along surfaces and contours play a critical role in determining when scission is
initiated, as well as determining how surface properties such as lightness and opacity are
attributed to the layers that are formed when this decomposition occurs. Such results
reveal severe limitations on inverse optics models of perception, since the computed
values of properties such as the transmittance of transparent surfaces are almost always
physically incorrect. Moreover, there is currently no single measure of image contrast that
adequately captures perceived contrast in arbitrary images, which impedes the ability to
predict the precise conditions that lead to scission and the quantitative consequences that
scission should have on perceptual experience. It is therefore of critical importance to
develop a physical measure of contrast that captures the human experience of contrast, so
that it can be used to develop and assess theories of transparency perception. Finally,
although phenomena such as those depicted in Fig. 3 reveal that scission can have a
dramatic effect on perceived lightness, more research is needed to determine if an explicit
decomposition of images into layers is responsible for the perception of lightness (and
color) in all images.
References
Adelson, E.H. (1993) Perceptual organization and the judgment of brightness. Science,
262, 2042-2044.
Anderson, B. L. (1997) A theory of illusory lightness and transparency in monocular and
binocular images. Perception 26, 419–453.
Anderson, B. L. (1999) Stereoscopic surface perception. Neuron, 26, 919–928.
Anderson, B. L. (2003a) The role of occlusion in the perception of depth, lightness, and
opacity. Psychological Review, 110, 762–784.
Anderson, B.L. (2003b) The role of perceptual organization in White's illusion.
Perception, 32, 269-284.
Anderson, B.L., Singh, M., & Meng, J. (2006) The perceived opacity of inhomogeneous
surfaces and media. Vision Research, 46, 1982-1995.
Bergström, S.S. (1977) Common and relative components of reflected light as
information about the illumination, colour, and three-dimensional form of objects.
Scandinavian Journal of Psychology, 18, 180-186.
Gerbino, W., Stultiens, C. I., Troost, J. M., & de Weert, C. M. (1990). Transparent layer
constancy. Journal of Experimental Psychology: Human Perception and
Performance, 16, 3–20.
Gilchrist, A.L. (1977) Perceived lightness depends on perceived spatial arrangement.
Science, 195(4274), 185-187.
Gilchrist, A.L. (1979) The perception of surface blacks and whites. Scientific American,
240, 112-123.
Kasrai, R., & Kingdom, F. (2001). Precision, accuracy, and range of perceived
achromatic transparency. Journal of the Optical Society of America A, 18, 1–11.
Metelli, F. (1970). An algebraic development of the theory of perceptual transparency.
Ergonomics, 13, 59–66.
Metelli, F. (1974a). Achromatic color conditions in the perception of transparency. In:
MacLeod, R. B., Pick, H. L. (Eds.), Perception: Essays in Honor of J. J. Gibson.
Cornell University Press, Ithaca, NY.
Metelli, F. (1974b). The perception of transparency. Scientific American, 230, 90–98.
Metelli, F., Da Pos, O., & Cavedon, A. (1985). Balanced and unbalanced, complete and
partial transparency. Perception and Psychophysics, 38, 354-366.
Robillotto, R., Khang, B., & Zaidi, Q. (2002). Sensory and physical determinants of
perceived achromatic transparency. Journal of Vision, 2, 388-403.
Robillotto, R., & Zaidi, Q. (2004). Perceived transparency of neutral density filters across
dissimilar backgrounds. Journal of Vision, 4, 183–195.
Singh, M., & Anderson, B.L. (2002) Perceptual assignment of opacity to translucent
surfaces: The role of image blur. Perception, 31, 531-552.
Figure Captions:
Fig. 1: The episcotister setup used by Metelli to derive his transparency model. A disc
with a missing sector of size α is rotated at high speed over a two toned background. The
proportion of the open to solid regions of the disc determines the transmittance of the
episcotister, and the reflectance of the solid portion of the episcotister (1-α) contributes
additional luminance in regions of transparency.
Fig. 2: Results from Experiment 1 in Singh and Anderson (2002). Observers adjusted a
matching pattern to appear equal in transmittance to a test patch with a fixed luminance
range and mean luminance. Dashed lines in the left figure indicate the predictions of
Metelli’s model; the matches made to four different luminance ranges with varying mean
luminance are given by the filled symbols for a typical observer. The data exhibit strong
and consistent departures from these predictions. When the data are re-plotted in terms
of Michelson contrast (right figure), the data are independent of mean luminance,
demonstrating that observers use perceived contrast to determine the transmittance of
transparent layers.
Fig. 3: A figure demonstrating how the contrast relationships along contours can be used
to determine the perceived transmittance and lightness of surfaces in a complex scene.
The regions within the boundaries of the chess pieces are composed of identical textures
in the top and bottom image, but are decomposed in complementary ways. In the top
figure, the texture is decomposed into dark clouds that obscure light chess pieces, and in
the bottom, the texture appears as light mist that obscures dark chess pieces. Note that
the highest contrast contour segments in both the top and bottom figure appear in plain
view, which occur in different regions for the two images. See text for details.