
© Springer 2005
Axiomathes (2005) 15:399–486
DOI 10.1007/s10516-004-5445-y
DHANRAJ VISHWANATH
THE EPISTEMOLOGICAL STATUS OF VISION
AND ITS IMPLICATIONS FOR DESIGN
ABSTRACT. Computational theories of vision typically rely on the analysis of two aspects of human visual function: (1) object and shape recognition, and (2) co-calibration of sensory measurements. Both these approaches are usually based on an inverse-optics model, where visual perception is viewed as a process of inference from a 2D retinal projection to a 3D percept within a Euclidean space schema. This paradigm has had great success in certain areas of vision science, but has been relatively less successful in understanding perceptual representation, namely, the nature of the perceptual encoding. One of the drawbacks of inverse-optics approaches has been the difficulty in defining the constraints needed to make the inference computationally tractable (e.g. regularity assumptions, Bayesian priors, etc.). These constraints, thought to be learned assumptions about the nature of the physical and optical structures of the external world, have to be incorporated into any workable computational model in the inverse-optics paradigm. But inference models that employ an inverse optics plus structural assumptions approach inevitably result in a naïve realist theory of perceptual representation. Another drawback of inference models for theories of perceptual representation is their inability to explain central features of the visual experience. The one most evident in the process and visual understanding of design is the fact that some visual configurations appear, often spontaneously, as perceptually more coherent than others. The epistemological consequences of inferential approaches to vision indicate that they fail to capture enduring aspects of our visual experience. Therefore they may not be suited to a theory of perceptual representation, or useful for an understanding of the role of perception in the design process and product.
KEY WORDS: 3D shape and space perception, aesthetics, Bayesian inference,
computational vision, design, epistemology, visual perception and cognition
1. INTRODUCTION
‘‘When it comes to deriving suitable and rigorous concepts and designations for
the various characteristics of our sensations, the first requirement is that these
concepts should be derived entirely out of the sensations themselves. We must
rigorously avoid confusing sensations with their physical or physiological causes,
or deducing from the latter any principle of classification’’
Ewald Hering 1878
A standard refrain in the introduction to most undergraduate
textbooks on perception is that vision is not the result of a simple
camera-like process in which the external world is imaged – faithfully – onto the mind's eye. Instead, it is often claimed that the first
step towards an understanding of perception is to discard the
notion that what we perceive is an objective view of the external
world. For example, in their highly regarded textbook, Sekuler and
Blake (1986, p. 3) suggest that a distinction has to be made
between ‘‘one’s perception of the world and the world itself’’. What
we perceive should be more correctly thought of as the mind’s
reconstructed 3D representation of the world generated from a
meager 2D image impinging on the retina. Perception textbooks
typically go on to say that dispelling the naïve realist view that ‘‘the world is exactly as it appears’’ (Figure 1) has historically taken two
opposing approaches: (1) Empiricism, which is best exemplified by
Helmholtz’s theory of unconscious inference. (2) Nativism, which is
best exemplified by Hering and the Gestalt school (see for example
Rock (1984), Sekuler and Blake (1986), Palmer (1999), Turner
(1994)).
We find out that the empiricist believes that our perceptions are
the result of our extensive experience and interaction with the
world, while the nativist believes that our perceptions are entirely
due to the mind's innate predisposition to organize the sensory
stimulation in a particular way. The underlying motivation for
both these theories, we are told, is what is known as the poverty of
Figure 1. Naïve realism.
the stimulus argument: the retinal image highly underdetermines the structures that give rise to it in our percepts.
Take the example of two possible images of a cube (Figure 2).
Evidently, we perceive A as a cube while we perceive B as a square.
But, B is also consistent with an image of a cube. In fact, both
images, assuming Euclidean projective geometry, are consistent
with an infinite class of 3-D shapes. The empiricist’s reasoning for
our stable and unitary percepts in A and B might be as follows:
Through experience we have noted that in the preponderance of
situations that we have encountered a cube, it has appeared to us
as image A. We have rarely been in the position to view it ‘‘head-on’’ as shown in B and have only experienced such an image when
encountering a square. We have thus learned to recognize a cube in
A and a square in B. Obviously, the actual story is more complicated. It may, for example, entail the fact that we support these
claims through association with our other senses such as touch. In
more quantitative analysis of such inferential approaches, notions
such as non-accidentalness or generic viewpoint assumptions may
also be brought to bear (see, for example, Barlow, 1961; Nakayama
and Shimojo, 1996; Richards et al., 1996).
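The projective ambiguity behind this argument can be made concrete with a short numerical sketch (purely illustrative; the pinhole projection model and the numbers are our own, not drawn from the works cited above): under perspective projection, any scene scaled about the eye along the lines of sight produces exactly the same image.

```python
import numpy as np

# Pinhole perspective projection: a scene point (X, Y, Z) maps to
# the image point (f*X/Z, f*Y/Z).
def project(points, f=1.0):
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]

# A unit square painted on a wall at depth Z = 2 ...
square = np.array([[x, y, 2.0] for x in (0, 1) for y in (0, 1)])

# ... and the front face of a larger object at depth Z = 4, scaled by 2
# so that every vertex lies on the same line of sight through the eye.
scaled_face = square * 2.0

# Both scenes yield the identical retinal image.
assert np.allclose(project(square), project(scaled_face))
```

Any positive scale factor gives the same result, so the image alone is consistent with an infinite family of scenes, which is precisely the underdetermination at issue.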
The Gestaltist, on the other hand, might say that the reason we
perceive A and B the way we do, should be attributed entirely to
the mind’s innate predisposition to organize each image. There are
no cubes, squares, or surfaces in the world in the folk sense of the
terms – i.e. in exactly the way we perceive them. Rather, what we
see is the result of the spontaneous cortical organization of the sensory flux. Naturally, this does not preclude the possibility for the
organized image to be correlated non-trivially with the physical
structure of environment that gave rise to that image.
Figure 2.
The aforementioned textbooks will usually inform us that the
last half century of research has found both these models – taken
by themselves – lacking as theories of perception. Therefore, a compromise between the two must be struck. This new approach, which
is usually a sophisticated variant on the classical theory of constructivism originating from Helmholtz’s notion of unconscious inference, one might generically call Neoconstructivism.1 It is best
characterized in the classic text by Marr (1982) (cf. Palmer, 1999).
The theory is an attempt at an amalgamation of empirical findings
in visual neurophysiology, and computational theories of vision
originating in artificial intelligence, which view perception as a
problem of inference in an inverse-optics framework. In other
words, perception is the inversion of the optical process that generates the 2D image from a 3D environment. It is usually argued that
Neoconstructivism has the appropriate combination of elements
from both empiricist and nativist theories of knowledge. The Neoconstructivist rejects a purely empiricist notion of perception because it can be shown that inverse optics, as well as concept
learning, is impossible unless the visual system has pre-specified
constraints for determining how the image must be processed. To the
Neoconstructivist the Gestaltist’s or nativist position is also deemed
unattractive because it seems in danger of slipping into a kind of
solipsism (if it’s all in the cortex, where does the real world come
into play?). A perfect compromise for the Neoconstructivist would
be to assume, as a nativist might, that there are indeed innate constraints for processing the image – constraints which capture objective properties and behavior of the world that are learnt from interacting with the external environment. A few examples might be: that there exist surfaces, lines, parallel lines, and common object shapes; that light impinges from above; that the observer is not viewing the environment from a special vantage point; and so on and so forth. These constraints, along with some tractable form of
learning, are combined with the outputs of early perceptual processes that measure properties of the objects and environment such
as brightness, illumination, distance, direction, size and orientation.
The task of the visual system, then, is to detect, recover or infer
from the 2D retinal image the simplest environmental configuration
that is consistent with these various measurements and constraints.
The requirement for simplicity arises from the well-regarded notion
that nature abhors unnecessary complexity. This principle – of
Occam’s Razor – has been expressed in the perception literature in
such terms as the minimum principle (see Hochberg and McAllister,
1953; Hatfield and Epstein, 1985), minimum length encoding (e.g.
Boselie and Leeuwenberg, 1986), homogeneity and isotropy assumptions (e.g. Knill, 1998), regularity assumptions (e.g. Horn, 1986), and
genericity (e.g. Richards et al., 1996). In much of the literature, these
simplicity assumptions are taken to be a direct reflection of the well-structured behavior of the physical world.
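How such regularity assumptions enter a probabilistic inference can be sketched with a toy Bayesian calculation (the hypotheses and numbers below are invented for illustration and are not taken from any of the cited models): the square image of Figure 2B is consistent both with a frontoparallel square and with a cube seen exactly head-on, and a generic-viewpoint prior decides between them.

```python
# Two hypothetical scene interpretations that both predict the same
# square image; likelihoods and priors are made-up illustrative values.
hypotheses = {
    "flat square, frontoparallel": {
        "likelihood": 1.0,   # a square projects to a square from this view
        "prior": 0.1,        # frontoparallel views of squares are common
    },
    "cube, viewed exactly head-on": {
        "likelihood": 1.0,   # a cube also projects to a square head-on
        "prior": 0.001,      # but only from an accidental viewpoint
    },
}

# Bayes' rule: posterior ∝ likelihood × prior, normalized over hypotheses.
unnorm = {h: v["likelihood"] * v["prior"] for h, v in hypotheses.items()}
total = sum(unnorm.values())
posterior = {h: p / total for h, p in unnorm.items()}

# The generic-viewpoint prior makes the square interpretation win.
best = max(posterior, key=posterior.get)
assert best == "flat square, frontoparallel"
```

The likelihoods are identical, so the inference is settled entirely by the prior; this is the formal sense in which the "structure-in-the-world" assumptions carry the explanatory weight in such models.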
A cursory glance at Neoconstructivism may make it appear to
have achieved, simultaneously, a successful rejection of naı̈ve realism and a perfect compromise between a nativist and empiricist
theory of knowledge; an achievement that for many renders moot
any discussion of theories of knowledge. On closer inspection
though, it appears such a conclusion may be premature, because
such a theory usually leads to the question about where the
assumptions or constraints about structure-in-the-world come from,
and how they are encoded. The Neoconstructivist will typically say
that it happens through evolution. In essence, the claim is that
these assumptions have to be hardwired into the system through
phylogenetic interaction with the objective external world (see
Pinker (1997) for a popular scientific account of this notion; also
see Behavioral and Brain Sciences, Volume 24 for related analysis).
The story might go: different computational ‘‘tricks’’ or ‘‘rules’’
could compete with each other through evolution, until those that
most effectively ‘‘detect’’ the objective external structure are the
ones that are incorporated into the phenotype. But this, one would
submit, betrays an empiricist theory of knowledge applied to the
bootstrapping of hardwired assumptions or constraints.
Any empiricist theory of knowledge, as Hume admirably demonstrated, has to either reach the inexorable conclusion of an idealistic world, or cling to its initial mistaken (naı̈ve realist) belief that
experience can provide objective knowledge of a real world. Since
the central claim of Neoconstructivism is non-idealistic (i.e. the
constraints and assumptions actually reflect something objective about the real world) it reduces, by its own claims, to a naïve realist theory. In other words, we find that despite the view espoused
in the introductory paragraphs of the perception texts alluded to
earlier, the theoretical basis for current approaches to perception is
essentially an empiricist, naı̈ve-realist one.
The foundational issues that afflict Neoconstructivist approaches
do not by any means bear on the whole research enterprise of
human and computer vision. Many areas of research can indeed
remain agnostic to epistemic assumptions within the theory and, at
least to a point, implicitly assume a naı̈ve realist model. Table 1 is
a partial classification of areas of research in visual science and
perception based on whether or not epistemological issues are critical to such research. Where the foundational issues do raise a red
flag is in any theoretical or empirical research that falls in category
‘‘B’’ which involves the issue of perceptual representation.2 In these
areas of research, the representational scheme that is assumed,
explicitly or implicitly, has a direct bearing on whether the theory
is plausible or not. Note that we only refer to the theory's plausibility as a theory of human perception. Undoubtedly, many of the
approaches in column B are quite suited to applications in machine
vision.
The Neoconstructivist approach aligns itself with a theory of
perceptual representation where the fruits of perception are, more
or less, an objective 3D description of the external world; the heavy
lifting of perceptual processes is the inference from a 2D retinal
image to just such a 3D description. Neoconstructivist theories are
ultimately aligned with a notion of representation that involves
‘‘symbolic’’ tokens that signal external world measurements, properties or entities, such as orientation, size, shape, color, surface,
object, part, a face, etc. This symbolic token may take the form of
the firing pattern of an individual neuron or groups of neurons.
The critical assumption is that the properties being signaled are
properties of the real external world that have been learned
through experience, and are not synthetic constructs of perception.
The existence of a symbolic form signaling a property in the brain
indicates that such and such a property, measure, or entity, has
been successfully (or perhaps erroneously) detected from the available sensorium. The informational content of the symbolic form is
necessarily parasitic on the objective information contained in the
external objects that they signal, and thus the representation is a
direct mapping, indeed a faithful image up to some resolution and
hardware limitations, of objective properties in the world.
From an epistemological standpoint, it is perhaps the issue of
information content that has been most overlooked by contemporary theories of perception. The critical importance of defining the
nature of the information content of perception was first broached
by the Gestaltist, and has been most forcefully and elegantly put
forward by Leyton (1992, 1999). Leyton’s theory makes two crucial
points regarding the informational structure of perception: (1) The
TABLE I.

A: Research topics that can be neutral to epistemological assumptions
- Forward optics
- Physiological optics
- Sensory physiology
- Front-end properties of sensory apparatus (e.g. thresholds, adaptation)
- Estimation of spatial properties (direction, distance, slant, size, etc.)
- Sensor co-calibration (spatial estimation across multiple sources of information), including probabilistic approaches
- Spatial acuities and sensitivities (e.g. vernier, stereo acuities)
- Spatial localization and capacity
- Attentional allocation, and limitations, in perception
- Correlations between perceptual and visuo-motor estimates of space

B: Research topics affected by epistemological assumptions
- Inverse optics; shape recovery as inverse optics
- Perception as shape/object recognition
- Shape perception as probabilistic inference
- Shape recovery from multiple ‘‘cues’’
- Shape recovery and representation via primitives (e.g. geons)
- Shape recovery via heuristics, biases, ‘‘bags of tricks’’, minima principles, non-accidentalness, regularity, genericity, etc.
- Application of ecological statistics to shape recovery and representation; image correlation approaches to shape recovery and representation (cross-correlation, eigenvector, etc.)
- Perceptual organization; grouping; grouping principles; shape recovery and representation via grouping
- Perceptual completion; figure-ground; lightness and brightness; parts and wholes
- Feature binding; object perception as feature binding
- Perceived stability of visual world across eye movements, blinks, etc.
information content of a percept (it’s causal structure) is constituted internal to the perceptual schema and does not reside in the
external world. (2) The entities and relations used to construct a
representational model cannot be parasitic on entities identified in
the perceptual product, such as lines, surfaces, etc., but rather such
entities have to derive from the representational scheme itself. This
is achieved in Leyton’s model through a purely abstract, nested,
algebraic (group-theoretic) representational schema.
In contrast, let us try to understand the informational content of
a percept as proposed in Neoconstructivist theories by considering
the example of an observer looking at a bend on a road. Under a
Neoconstructivist theory, a bend perceived in the road is the activation of some set of signals that a bend in the road exists, and the
bend exists descriptively in the world in more or less the way that
those perceptual signals or symbols specify. For example, certain
measures that we may specify the bending road to have, such as
length, curvature, width, distance and direction as well as any of
the ontological categories that we might ascribe to it, such as line
or surface, provide an objective spatial description of the bend in
the road that exists externally, independent of perception, and also
specify the content of those signals or symbols that make up the
percept of the bending road. In other words, both the physical
thing that exists that is the bending road, as well as its percept,
should naturally be described using these descriptors. Under a Neoconstructivist theory, we use these spatial and ontological descriptors not because that is the format in which our perceptions specify
the world, but because our perceptions are (more or less) faithful
descriptions of such objective physical properties and entities. In
other words, describing such objective properties and entities in the
world in terms of these ontological categories and spatial attributes
is the only objective way that they can be described. And the
descriptions that can be applied to the fruit of our percepts are also
exactly those descriptions that apply to the physical thing out there
that is the road; perhaps something less (resolution and hardware
limitations), but certainly nothing more. Thus, the epistemology and
ontology of Neoconstructivism appear, even at first blush, to be at least weakly naïve realist.
Putting aside any generic distaste for naı̈ve realism, what else
might possibly be wrong with a theory of perception like Neoconstructivism? One of the most enduring questions an inferential theory such as Neoconstructivism raises is the following: if a percept is
either an objective measure or an indicator of the existence of a property or entity in the world, then how is this indication psychologically experienced? This question has vexed perceptual
researchers from Mach, Hering, the Gestalt school, through to
Gibson. Yet perhaps the most penetrating analysis on the question
of perceptual experience and it’s relationship to the information
content of the percept has been put forth by Leyton (1992). The
question his theory raises and answers is the following: for the
example of the bend in the road given earlier, how is it that we
have a phenomenological sense of the bend itself if the ‘‘indication’’
is merely specifying certain static properties or quantities?
Let us look more closely at Leyton's question, using the
example of the sculpture in Figure 3. The sculpture does not represent any familiar object, and the most immediate markers of familiarity are that it is carved out of stone, and is a solid rigid object.
Yet the most phenomenologically striking aspect of it, as Leyton
would point out, is that we can perceptually sense the forces, the
bending, and the bulging. Yet all our direct familiarity cues should
be telling us that such processes are not at work in the object, and
that it is instead a static, stress-free object. One might argue that those perceived forces arise merely because the object
‘‘resembles’’, say, a rolled toothpaste tube, or clasped hand, and so
we are not ‘‘experiencing’’ the bending, but merely experiencing the
lighting up of a hierarchical neural symbolic linkage, that might be
Figure 3. Carving #11, Barry Flanagan, 1981 (from Beal and Jacob, 1987).
as follows: ‘‘like a rolled paste tube → rolling requires force and action → that force and action produce internal stresses → internal stresses cause stretching of the external membrane → excessive external stress can cause disruption of the membrane’’. What would a Neoconstructivist theory predict sculptor Barry Flanagan3 would see
when looking at his own finished product? Presumably, since he is a
sculptor, and has himself carved the object, his experience should
not make his visual system light up in the above symbolic hierarchy
(even though he may intend such trompe l'oeil in his observers). Instead, since the shape only weakly invokes some sort of familiar
object, while his experience should strongly evoke what the object
itself really is, he should just have activation of the symbolic set that
simply says ‘‘solid, hard, carved, roundish object’’ (let us ignore the
fact that even these have to be cashed out experientially). Indeed, if
he hires an assistant to carve a multitude of the same shape through
his lifetime, that assistant should cease to phenomenally experience
any of the bending and bulging, and his very percept of the object
should change. Any cognitive understanding of it’s similarity to a
rolled toothpaste tube must be post-perceptual. Indeed, an animal
with a visual system comparable to the human visual system, should
have a completely neutral perceptual experience with respect to the
object, since it is neither similar to a known object nor created with a
familiar procedure. The entire informational structure (the sensed
forces, deformations, etc., that Leyton enumerates) is under a Neoconstructivist theory either non-existent or the result of a simple
application of our cognitive experience with objects.
This line of thinking leads to another question that is definitely
a more relevant issue to the theme of this volume. It arises when
we consider what a Neoconstructivist theory of perception has to
say about aesthetics and design. Do we reflexively perceive qualitative differences above and beyond the objective spatial and recognition measures when we view different visual configurations? In
other words, is there a natural reflexive qualitative evaluation that
occurs at the level of the perceptual understanding of a visual
configuration, which is prior to any application of cognitive factors
such as memory, experience, etc.? The ubiquitous perceptual evaluation that seems integral to the process of designing and the
experience of a designed product, as well as common visual
phenomenology, suggests that the answer is yes. That such direct
perceptual evaluation is at some level central to the aesthetic experience in art and architecture has been of great interest historically
both in psychology (e.g. analysis by Kant, Klee, the Gestalt
theorists, Arnheim, etc.) as well as artistic movements (e.g. Abstract
Expressionism and Minimalism). More recently, Leyton’s theory of
perceptual representation has taken as its central charge the ability
to explain fundamental aspects of aesthetics.
An inferential theory of perception such as Neoconstructivism
implies that all physically plausible visual configurations are, at the
perceptual level, psychologically equivalent. This assumption of
psychological equivalence in inferential theories of perception is reflected in the fact that qualitative aspects of perception are usually
judiciously sidestepped in favor of measurable ones. The implicit
assumption is that since a functioning perceptual system only faithfully infers what is out in the world (up to limits on hardware), and
does not inject any non-trivial informational structure of its own,
all physically plausible configurations should yield the same perceptual quality; or perhaps, no perceptual quality. Of course cognitive
factors, such as memory, appetite, or experience, might color the
cognitive experience of the object that perception delivers, but the
perceptual act remains neutral, since all it does is indicate that such
and such a thing is out there, in the way that it is out there. Since
the way that it is out there is physically valid (we have already
made this caveat), there is nothing else that can be said about it in
terms of quality. Obviously, sometimes the recovery may be erroneous, but since there is no marker on the percept telling us this, the
erroneous percept is, from perception’s point of view, just as valid.
Any judgment on the appropriateness of a configuration must
come from extra-perceptual considerations (memory, appetite, aversion, experience, etc.), what we refer to in this paper as cognitive aesthetics. It is interesting to note that for the aesthetician who wants
to claim that all perceptual preferences are learned, a Neoconstructivist theory works very well, since rather than being a result of the
very act of perception, aesthetic preference is cognitively applied
onto the neutral product of perception.
Yet, the nature of the process and product of design (and art) –
as well as common phenomenology – are convincing evidence that
such perceptual neutrality is not what we typically experience. The
very acts of painting and designing involve choices and manipulations of physical configurations that are deeply connected to perceiving differences in the quality of the configurations. Such differences
are inexplicable within a naı̈ve-realist theory of perception (which
we will hopefully show Neoconstructivism to be). Our experience
of what one might call perceptual aesthetics suggests that for a
workable perceptual theory, the differences in perceptual quality
should be deducible from the representational schema that embodies our perceptual system. The notion that the representational
scheme of perception reveals its signature in our perceptual
phenomenology is implicit, historically, in the work of several
researchers (e.g. Hering) and particularly the Gestaltists. Leyton
(1992, 2001) in his theory has rigorously raised and answered many
of the epistemological, phenomenological and aesthetic criteria
implicit in Gestalt theory. Yet surprisingly, these central observations of Gestalt theory are precisely the ones that have been
jettisoned from contemporary theories of representation aligned
with Neoconstructivism.
Generically, contemporary vision science has shied away from
tackling the enduring but difficult puzzles of perception that are
tied to phenomenology, epistemology and aesthetics. Much of this
might be attributed to the current lack of resources on the historical lineage of the epistemological and phenomenological problems,
and how they apply to contemporary scientific research in perception. None of the introductory or survey texts used for pedagogy
provide a sustained critique of current approaches and their consequences. This paper is an attempt at filling this gap by bringing
together issues within an epistemological framework that have been
sometimes explicit and sometimes implicit in prior research, and
applied to current approaches to understanding perception within
vision research.
There are six sections to this paper. Through these sections we
will attempt to communicate a range of ideas. Most, if not all, have
been expressed before in the literature, starting from natural philosophy of the 18th century, empirical and theoretical research in
vision (notably Hering, the Gestalt theorists and Gibson), and most
particularly Leyton’s theory of shape.4 We will generously borrow
ideas from the analyses provided in these works to weave an
argument that consists of the following observations:
1. In empiricist theories of vision such as Neoconstructivism (perception as inference, inverse optics, etc.) the critical informational and causal distinction between the 2D image and the 3D
percept, specified by the theory, is erased by the computational
rendering of the theory.
2. The result of Neoconstructivist theories is a computational
model of perception where the percept itself is largely noninformative. In such theories, the percept contains no non-metric
information about the perceived world. The only non-metric
information is generic rather than percept-specific (e.g. the fact
that surfaces are continuous) and such information is entirely
the property of the inferential device. The remaining metric
information is itself not informative outside of the purview of
inter- and intra-sensory calibration, and is especially not, as
often assumed, an objective measure on the external world. All
other information is rendered to be properties of the outside
world; properties which are merely symbolically instantiated in
the inferential device.
3. Theories of perception-as-inference always involve positing
objective measures, attributes and entities to both the sensory
stimulation and the external world. On closer inspection such
attributes (‘‘features’’), measures (‘‘cues’’) and entities (lines,
surfaces, objects) turn out to be subjective descriptors parasitic
on the very perceptual structures that they are used to explain.
This results, inexorably, in such theories becoming naı̈ve realist
ones.
4. Standard computational renderings of Neoconstructivist theories conflate sensor co-calibration and object recognition with
perceptual representation. Both calibration and object recognition exhibit characteristics of learning, which are usually taken
by such theories to support an empiricist or constructivist epistemology for perceptual representation.
5. A restricted model of perception as inverse optics that deals
with only inter-sensory and intra-sensory co-calibration issues
is a viable model for a range of empirical research studies in
vision, particularly 3D space perception. Such a model is viable because it takes a strictly behaviorist approach to the
notion of perceptual estimation of spatial attributes, where
relationships are restricted to predictions between output and
input; and can usually remain agnostic to explicit representational structures.
6. Although the notion of ‘‘cues’’, and combinations thereof, is a very useful construct for understanding how inter- and intra-sensory calibration occurs, the use of the notion of
‘‘cues’’ is problematic for areas of research that are aimed at
understanding the nature of perceptual representation, because
cues are merely ways in which to specify measurements within the perceptual output, and are not, as is commonly assumed, objective descriptors of either the external stimulus or the internal image.
7. Recent Neoconstructivist approaches (e.g. perceptual organization, grouping, figure-ground) embrace Gestalt principles as important factors in the generation of the visual percept. Yet, most of these approaches are contrary to the basic epistemological and functional proposals implied by Gestalt theory.
8. Theories of inference introduce spurious problems to the understanding of the perceptual process. One such red herring is the puzzle of how a stable percept is maintained despite the constant changes in the retinal image across saccadic eye movements and blinks.
9. Neoconstructivist theories cannot explain why our percepts seem to provide greater information content than appears to be ‘‘objectively’’ present in the external array. This is an argument implicit in Gestalt theory and central to Leyton's generative theory of shape.
10. Neoconstructivist theories cannot explain how the percept is experienced – a fundamental charge of the theories put forth by Hering, Gestalt theory, Gibson and Leyton.
11. Neoconstructivist theories cannot explain the phenomenological reality of the reflexive qualitative judgments of perceived visual configurations that appear pre-cognitively in art, design and everyday visual experience. Leyton is among the few who have argued how the understanding of aesthetics is central to any computational theory of shape.
In Section 2 we briefly review the basic epistemological arguments in modern philosophy stretching from Descartes
to Kant. This is important because the notion of the distinction
between contingent and necessary connections between events will
be crucial for understanding why all constructivist theories of shape
representation/recovery ultimately reduce to untenable naı̈ve realist
ones.
In Section 3 we review the two basic approaches to shape representation/recovery in modern research: (1) standard computational
vision5 and (2) shape perception as Bayesian probabilistic inference.
We reiterate that the methodologies in both these approaches have
important and wide application to many problems in human and
computer vision, and are irreplaceable in the development of artificial systems, as well as the assessment of visuo-motor capacities of
humans. The intent here is to show why they cannot be successful theories of human perceptual representation. Many
other ad hoc approaches to shape representation suffer similar
problems, but in addition, they do not provide any useful quantitative framework for other basic aspects of vision research. In that
sense, an important distinction must be made between ad hoc theories and the sound quantitative frameworks of computer-vision and
probabilistic approaches.
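The Bayesian approach mentioned above can be sketched in miniature. In the following illustrative sketch (the scene hypotheses, prior values, and function name are assumptions for exposition, not examples from the text), two 3D interpretations fit the same 2D image equally well, and the percept is selected entirely by a "regularity" prior, which is precisely the kind of structural assumption at issue:

```python
def posterior(hypotheses, likelihood, prior):
    """Return the normalized posterior P(scene | image) for each scene hypothesis."""
    unnorm = {h: likelihood[h] * prior[h] for h in hypotheses}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Two scene hypotheses that project to the same 2D image (the inverse-optics ambiguity):
hypotheses = ["slanted_rectangle", "frontal_trapezoid"]
likelihood = {"slanted_rectangle": 1.0, "frontal_trapezoid": 1.0}  # equal fit to the image
prior = {"slanted_rectangle": 0.9, "frontal_trapezoid": 0.1}       # learned regularity assumption

p = posterior(hypotheses, likelihood, prior)
```

Since the likelihoods are identical, the posterior simply reproduces the prior: the inference contributes nothing beyond the assumptions built into it, which is the epistemological point pressed later in the paper.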
In Section 4 we assess two key theories of perception that have
heavily influenced current research, namely Gibson’s theory of perception and Gestalt theory. For the latter we mention only the theory and approach of the Berlin school of Gestalt (e.g. Wertheimer
and Köhler), which is the one most familiar to researchers
in perception. There are many important and crucial ideas that
come out of the early Gestalt theorists such as Brentano, Von
Ehrenfels, Mach, as well as other philosophers and psychologists of
the Austrian and Italian schools of Gestalt theory. The reader is
directed to the extensive reviews and analyses of their application
to contemporary perceptual science by Albertazzi et al. (1996),
Albertazzi (2000, 2001, 2002).
Section 5 analyses the shortcomings of inferential approaches and outlines diagrammatic frameworks for understanding the various approaches one might take to a theory of perception. Specifically, we outline three of them: (1) shape perception as inference from 2D image to 3D world (naïve realism); (2) shape perception as a calibration map (here we will also argue that shape or object recognition can be thought of as a form of calibration); (3) shape perception as the presentation6 of sensory flux.
Section 6 discusses the implications of each approach for perceptual experience. Section 7 discusses the implications of theories of perception for aesthetics and design. Here the notion of representational conflict in perception is introduced. This section will, by design, be of a speculative nature.
Since the paper is quite long, a first reading might be possible
by skipping Section 3, and for those familiar with the basic philosophical arguments, Section 2. A short reading of the paper might
include the introduction, and Sections 5, 6 and 7.
6.5. Summary
Let us now summarize the essential problems with the Neoconstructivist, inverse-optics approach to perceptual representation:
(1) The 2D image and 3D percept are defined to be distinct entities: the former existing in the external world and the latter in
the psychological domain. Perception is supposed to provide
the causal inferential link between the two. Yet, as we have
seen, an inferential model of perception – even when placed
within a sophisticated quantitative framework – removes the
very distinction that makes the inferential link informative. Any
model of perception as inference reduces inexorably to one of two trivial models: (a) a naïve realist model (Figure 15), where the perceptual system does not involve an inductive (inferential) step between the external and the psychological, but instead has an all-knowing homunculus examining an objective image on a camera; or (b) an idealist model in which the external world does not exist. This essential problem is nothing more than
Hume’s argument against any empiricist theory of knowledge
that wants to retain the notion of an objective physical world.
In all such theories we end up losing the very distinction that
we are trying to explain, and with it most information content
in the percept.
(2) The external world, the 2D image, and the 3D percept are all
defined in terms of the same descriptive parameters, which are
erroneously considered to be objective parameters independent
of perception. Thus, geometric descriptors like length, position,
orientation, are taken to be objective measures or properties of
the world, in the same way that objects and surfaces are
erroneously taken to be objective things in the world. Similarly,
the image is thought to have objective descriptors as well as
objective ‘‘cues’’ which are assumed to have informational content distinct from the perceptual machinery and the final percept.
In these models, perception is assumed to be a process that involves the successful detection of objects that have distinct
objective existences in the external world, which have been imaged onto an objective sensory field. A central aspect of these
models is the assumption that spatial descriptions are objective
from an informational point of view.
(3) Inference requires a re-presentational scheme, and re-presentation precludes direct experience; thus any re-presentational scheme that specifies the existence of an external world reduces to a naïve realist scheme.
(4) Though many recent Neoconstructivist approaches claim to be
aligned with Gestalt theories of perception, nearly all maintain
epistemological positions exactly opposite to those espoused by
Gestalt theory.
(5) Neoconstructivist theories conflate the notion of calibration
with perceptual representation. One way they do so is by creating an unnecessary distinction between what we call measurement-as-motor-action and device-based measurement. In these models, the device-based measurement is erroneously considered to be objective and informationally distinct from perception and motor action.
(6) Limiting the notion of inverse optics to a calibration domain
allows for workable inverse-optics models for exploring a range
of issues in vision. Such approaches, in which shape is viewed
simply as a calibration map, implicitly posit that the only information contained in the percept is spatial estimates of points in
the visual field. Note that such a model is epistemologically a
behaviorist model where there is only an ‘‘idealized’’ world of
stimulus and response.
(7) The non-metric informational content of a percept, such as the connectedness of points, the continuity of surfaces, etc., is assumed to be encoded as "learned" biases, constraints, priors, etc. There are two problems with this: (a) such constraints cannot be learned, since a unique percept (on which the learning could take place) is impossible without these biases already in place; (b) it implies that the only information content in the percept itself is metric; all non-metric information is not part of the percept, but part of the inferential device.
(8) Neoconstructivist theories do not provide a satisfactory explanation of perceptual experience. This is because Neoconstructivism regards perception as a re-presentation of properties of
the external world. Though we have been using the term perceptual representation throughout the text, we did introduce
the notion of perception as a presentation of the sensory flux.
This is the most critical aspect of any theory of perception: it is most correctly defined not as a system of re-presentation but as one of presentation. In these terms, color is not thought of
as a re-presentation of a property in the external world; it is the presentation of chromatic differences, where "chromatic difference" is a property of the percept and not of the external world.28 In other words, the information content of color is parasitic on the percept and not on objective properties of the external world. It is certainly not a re-presentation of differences in electromagnetic wavelength; any correlation between wavelength and perceived color is a synthetic description of relations between perceptual states and measurements in visuo-motor space.
Figure 16 illustrates what an epistemologically correct model of perception would look like, where the information content of the percept (both the image description and the perceived-object description) is part of the perceptual apparatus and not the external world. There is naturally a correlation between the states of the external world and the percept, as determined by the flux at the sensory interface. Figure 17 illustrates the correct construal of the calibration domain in perception; again, any metric descriptions of this domain are based on perceptual entities and metrics.
Figure 16. Perception as the presentation of an external sensory flux.
Figure 17. Calibration domain in an epistemologically correct model of
perception.
7. PERCEPTUAL EXPERIENCE, AESTHETICS AND DESIGN
Objects project possibilities for action as much as they project that they themselves
were acted upon – the former allows for certain subtle identifications and orientations, the latter if emphasized is a recovery of the time that welds together ends
and means... The work is such that materials are not so much brought into
alignment with static a priori forms, as that the material itself is being probed
for openings that allow the artist behavioral access. Robert Morris, Sculptor.29
Perhaps the most enduring puzzle in the history of perception is its
relationship to visual aesthetics. The range of phenomena that can
be ascribed to visual aesthetics is very broad and often involves personal, sociological, religious, ritualistic, and convention-based aspects. Though many of these are best discussed within the
framework of art history and criticism, a much more enduring puzzle in aesthetics has been the question of what the purely perceptual
dimensions of aesthetics are. This has been a particularly vexing
and important question in the domain of the design of artifacts,
namely architecture and product design. A central feature of perception, one that is most evident in the process and products of design,