Axiomathes (2005) 15: 399–486
DOI 10.1007/s10516-004-5445-y
© Springer 2005

DHANRAJ VISHWANATH

THE EPISTEMOLOGICAL STATUS OF VISION AND ITS IMPLICATIONS FOR DESIGN

ABSTRACT. Computational theories of vision typically rely on the analysis of two aspects of human visual function: (1) object and shape recognition; (2) co-calibration of sensory measurements. Both approaches are usually based on an inverse-optics model, in which visual perception is viewed as a process of inference from a 2D retinal projection to a 3D percept within a Euclidean space schema. This paradigm has had great success in certain areas of vision science, but has been relatively less successful in understanding perceptual representation, namely, the nature of the perceptual encoding. One drawback of inverse-optics approaches has been the difficulty of defining the constraints needed to make the inference computationally tractable (e.g. regularity assumptions, Bayesian priors, etc.). These constraints, thought to be learned assumptions about the nature of the physical and optical structures of the external world, have to be incorporated into any workable computational model in the inverse-optics paradigm. But inference models that employ an inverse-optics-plus-structural-assumptions approach inevitably result in a naïve realist theory of perceptual representation. Another drawback of inference models for theories of perceptual representation is their inability to explain central features of visual experience. The one most evident in the process and visual understanding of design is the fact that some visual configurations appear, often spontaneously, perceptually more coherent than others. The epistemological consequences of inferential approaches to vision indicate that they fail to capture enduring aspects of our visual experience. They may therefore not be suited to a theory of perceptual representation, or useful for an understanding of the role of perception in the design process and product.

KEY WORDS: 3D shape and space perception, aesthetics, Bayesian inference, computational vision, design, epistemology, visual perception and cognition

1. INTRODUCTION

"When it comes to deriving suitable and rigorous concepts and designations for the various characteristics of our sensations, the first requirement is that these concepts should be derived entirely out of the sensations themselves. We must rigorously avoid confusing sensations with their physical or physiological causes, or deducing from the latter any principle of classification."
– Ewald Hering, 1878

A standard refrain in the introduction to most undergraduate textbooks on perception is that vision is not the result of a simple camera-like process in which the external world is imaged – faithfully – onto the mind's eye. Instead, it is often claimed that the first step towards an understanding of perception is to discard the notion that what we perceive is an objective view of the external world. For example, in their highly regarded textbook, Sekuler and Blake (1986, p. 3) suggest that a distinction has to be made between "one's perception of the world and the world itself". What we perceive should more correctly be thought of as the mind's reconstructed 3D representation of the world, generated from a meager 2D image impinging on the retina.
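The underdetermination at stake here is easy to exhibit numerically. The following minimal sketch assumes a simple pinhole-projection model; the coordinates and scale factor are invented purely for illustration and do not come from the paper. It shows that an object scaled in both size and distance by a common factor projects to exactly the same image, so the image alone cannot fix the scene.

```python
import numpy as np

def project(points_3d, f=1.0):
    """Pinhole projection: (X, Y, Z) -> f * (X/Z, Y/Z)."""
    pts = np.asarray(points_3d, dtype=float)
    return f * pts[:, :2] / pts[:, 2:3]

# Four corners of a unit square at distance 2 (hypothetical values).
square = np.array([[x, y, 2.0] for x in (0.0, 1.0) for y in (0.0, 1.0)])

# The same square scaled by k and pushed k times farther away.
k = 3.0
scaled = square * k

# Both scenes produce exactly the same image.
print(np.allclose(project(square), project(scaled)))  # True
```

Any number of such one-parameter families can be constructed, which is one way of stating the poverty-of-the-stimulus argument that the textbooks invoke below.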
Perception textbooks typically go on to say that dispelling the naïve realist view that "the world is exactly as it appears" (Figure 1) has historically taken two opposing approaches: (1) empiricism, best exemplified by Helmholtz's theory of unconscious inference; and (2) nativism, best exemplified by Hering and the Gestalt school (see, for example, Rock (1984), Sekuler and Blake (1986), Palmer (1999), Turner (1994)). We find out that the empiricist believes that our perceptions are the result of our extensive experience and interaction with the world, while the nativist believes that our perceptions are entirely due to the mind's innate predisposition to organize sensory stimulation in a particular way. The underlying motivation for both these theories, we are told, is what is known as the poverty of the stimulus argument: the retinal image highly underdetermines the structures that it gives rise to in our percepts.

Figure 1. Naïve realism.

Take the example of two possible images of a cube (Figure 2). Evidently, we perceive A as a cube while we perceive B as a square. But B is also consistent with an image of a cube. In fact, both images, assuming Euclidean projective geometry, are consistent with an infinite class of 3D shapes. The empiricist's reasoning for our stable and unitary percepts in A and B might be as follows: through experience we have noted that, in the preponderance of situations in which we have encountered a cube, it has appeared to us as image A. We have rarely been in a position to view it "head-on" as shown in B, and have only experienced such an image when encountering a square. We have thus learned to recognize a cube in A and a square in B. Obviously, the actual story is more complicated. It may, for example, entail the fact that we support these claims through association with our other senses, such as touch. In more quantitative analyses of such inferential approaches, notions such as non-accidentalness or generic viewpoint assumptions may also be brought to bear (see, for example, Barlow, 1961; Nakayama and Shimojo, 1996; Richards et al., 1996). The Gestaltist, on the other hand, might say that the reason we perceive A and B the way we do should be attributed entirely to the mind's innate predisposition to organize each image. There are no cubes, squares, or surfaces in the world in the folk sense of the terms – i.e. in exactly the way we perceive them. Rather, what we see is the result of the spontaneous cortical organization of the sensory flux. Naturally, this does not preclude the possibility that the organized image is correlated non-trivially with the physical structure of the environment that gave rise to it.

Figure 2.

The aforementioned textbooks will usually inform us that the last half century of research has found both these models – taken by themselves – lacking as theories of perception. Therefore, a compromise between the two must be struck. This new approach, usually a sophisticated variant of the classical theory of constructivism originating in Helmholtz's notion of unconscious inference, one might generically call Neoconstructivism.1 It is best characterized in the classic text by Marr (1982) (cf. Palmer, 1999).
The theory is an attempt at an amalgamation of empirical findings in visual neurophysiology and computational theories of vision originating in artificial intelligence, which view perception as a problem of inference in an inverse-optics framework. In other words, perception is the inversion of the optical process that generates the 2D image from a 3D environment. It is usually argued that Neoconstructivism has the appropriate combination of elements from both empiricist and nativist theories of knowledge. The Neoconstructivist rejects a purely empiricist notion of perception because it can be shown that inverse optics, as well as concept learning, is impossible unless the visual system has pre-specified constraints for determining how the image must be processed. To the Neoconstructivist, the Gestaltist or nativist position is also deemed unattractive because it seems in danger of slipping into a kind of solipsism (if it's all in the cortex, where does the real world come into play?). A perfect compromise for the Neoconstructivist would be to assume, as a nativist might, that there are indeed innate constraints for processing the image – constraints that capture objective properties and behavior of the world, learned from interacting with the external environment. A few examples might be: that there exist surfaces, lines, parallel lines, and common object shapes; that light impinges from above; that the observer is not viewing the environment from a special vantage point; and so on and so forth. These constraints, along with some tractable form of learning, are combined with the outputs of early perceptual processes that measure properties of the objects and environment such as brightness, illumination, distance, direction, size and orientation. The task of the visual system, then, is to detect, recover or infer from the 2D retinal image the simplest environmental configuration that is consistent with these various measurements and constraints. The requirement for simplicity arises from the well-regarded notion that nature abhors unnecessary complexity. This principle – Occam's Razor – has been expressed in the perception literature in such terms as the minimum principle (see Hochberg and McAlister, 1953; Hatfield and Epstein, 1985), minimum length encoding (e.g. Boselie and Leeuwenberg, 1986), homogeneity and isotropy assumptions (e.g. Knill, 1998), regularity assumptions (e.g. Horn, 1986), and genericity (e.g. Richards et al., 1996). In much of the literature, these simplicity assumptions are assumed to be a direct reflection of the well-structured behavior of the physical world. A cursory glance at Neoconstructivism may make it appear to have achieved, simultaneously, a successful rejection of naïve realism and a perfect compromise between a nativist and an empiricist theory of knowledge; an achievement that for many renders moot any discussion of theories of knowledge. On closer inspection, though, such a conclusion appears premature, because the theory inevitably leads to the question of where the assumptions or constraints about structure-in-the-world come from, and how they are encoded. The Neoconstructivist will typically say that it happens through evolution.
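Before asking where the constraints come from, it is worth seeing schematically how they function. In Bayesian renderings of inverse optics the percept is the scene S maximizing P(S|I) ∝ P(I|S)P(S), and the prior P(S) is where regularity or simplicity assumptions live. The following toy sketch is illustrative only: the candidate scenes, likelihoods and complexity scores are invented, not taken from any actual model in the literature.

```python
import numpy as np

# Three hypothetical 3D interpretations of one and the same image.
candidates = {
    "cube":            {"likelihood": 0.9, "complexity": 1.0},
    "skewed box":      {"likelihood": 0.9, "complexity": 3.0},
    "flat Y-junction": {"likelihood": 0.9, "complexity": 5.0},
}

def posterior(scene):
    # The prior encodes a simplicity/regularity assumption: simpler
    # scenes are taken to be a priori more probable (one common way to
    # cash out minimum-principle or MDL ideas in Bayesian terms).
    prior = np.exp(-scene["complexity"])
    return scene["likelihood"] * prior  # unnormalized P(scene | image)

best = max(candidates, key=lambda name: posterior(candidates[name]))
print(best)  # "cube"
```

Because all three candidates explain the image equally well, the prior alone selects the percept – which is precisely the explanatory burden the constraints are made to carry.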
In essence, the claim is that these assumptions have to be hardwired into the system through phylogenetic interaction with the objective external world (see Pinker (1997) for a popular scientific account of this notion; also see Behavioral and Brain Sciences, Volume 24 for related analysis). The story might go: different computational "tricks" or "rules" could compete with each other through evolution, until those that most effectively "detect" the objective external structure are the ones incorporated into the phenotype. But this, one would submit, betrays an empiricist theory of knowledge applied to the bootstrapping of hardwired assumptions or constraints. Any empiricist theory of knowledge, as Hume admirably demonstrated, has to either reach the inexorable conclusion of an idealistic world, or cling to its initial mistaken (naïve realist) belief that experience can provide objective knowledge of a real world. Since the central claim of Neoconstructivism is non-idealistic (i.e. the constraints and assumptions actually reflect something objective about the real world), it reduces, by its own claims, to a naïve realist theory. In other words, we find that despite the view espoused in the introductory paragraphs of the perception texts alluded to earlier, the theoretical basis for current approaches to perception is essentially an empiricist, naïve-realist one. The foundational issues that afflict Neoconstructivist approaches do not by any means bear on the whole research enterprise of human and computer vision. Many areas of research can indeed remain agnostic to epistemic assumptions within the theory and, at least to a point, implicitly assume a naïve realist model. Table I is a partial classification of areas of research in visual science and perception based on whether or not epistemological issues are critical to such research. Where the foundational issues do raise a red flag is in any theoretical or empirical research that falls in category "B", which involves the issue of perceptual representation.2 In these areas of research, the representational scheme that is assumed, explicitly or implicitly, has a direct bearing on whether the theory is plausible or not. Note that we refer only to the theory's plausibility as a theory of human perception. Undoubtedly, many of the approaches in column B are quite suited to applications in machine vision. The Neoconstructivist approach aligns itself with a theory of perceptual representation where the fruits of perception are, more or less, an objective 3D description of the external world; the heavy lifting of perceptual processes is the inference from a 2D retinal image to just such a 3D description. Neoconstructivist theories are ultimately aligned with a notion of representation that involves "symbolic" tokens that signal external-world measurements, properties or entities, such as orientation, size, shape, color, surface, object, part, a face, etc. This symbolic token may take the form of the firing pattern of an individual neuron or groups of neurons. The critical assumption is that the properties being signaled are properties of the real external world that have been learned through experience, and are not synthetic constructs of perception. The existence of a symbolic form signaling a property in the brain indicates that such and such a property, measure, or entity has been successfully (or perhaps erroneously) detected from the available sensorium.
The informational content of the symbolic form is necessarily parasitic on the objective information contained in the external objects that it signals, and thus the representation is a direct mapping – indeed a faithful image, up to some resolution and hardware limitations – of objective properties in the world.

TABLE I.

A: Research topics that can be neutral to epistemological assumptions
– Forward optics
– Physiological optics
– Sensory physiology
– Front-end properties of the sensory apparatus (e.g. thresholds, adaptation)
– Estimation of spatial properties (direction, distance, slant, size, etc.)
– Sensor co-calibration (spatial estimation across multiple sources of information), including probabilistic approaches
– Spatial acuities and sensitivities (e.g. vernier, stereo acuities)
– Spatial localization and capacity
– Attentional allocation, and limitations, in perception
– Correlations between perceptual and visuo-motor estimates of space

B: Research topics affected by epistemological assumptions
– Inverse optics; shape recovery as inverse optics
– Perception as shape/object recognition
– Shape perception as probabilistic inference
– Shape recovery from multiple "cues"
– Shape recovery and representation via primitives (e.g. geons)
– Shape recovery via heuristics, biases, "bags of tricks", minima principles, non-accidentalness, regularity, genericity, etc.
– Application of ecological statistics to shape recovery and representation
– Image-correlation approaches to shape recovery and representation (cross-correlation, eigenvector, etc.)
– Perceptual organization; grouping; grouping principles; shape recovery and representation via grouping
– Perceptual completion; figure-ground
– Lightness and brightness; parts and wholes
– Feature binding; object perception as feature binding
– Perceived stability of the visual world across eye movements, blinks, etc.

From an epistemological standpoint, it is perhaps the issue of information content that has been most overlooked by contemporary theories of perception. The critical importance of defining the nature of the information content of perception was first broached by the Gestaltists, and has been most forcefully and elegantly put forward by Leyton (1992, 1999). Leyton's theory makes two crucial points regarding the informational structure of perception: (1) the information content of a percept (its causal structure) is constituted internal to the perceptual schema and does not reside in the external world; (2) the entities and relations used to construct a representational model cannot be parasitic on entities identified in the perceptual product, such as lines, surfaces, etc., but rather such entities have to derive from the representational scheme itself. This is achieved in Leyton's model through a purely abstract, nested, algebraic (group-theoretic) representational schema. In contrast, let us try to understand the informational content of a percept as proposed in Neoconstructivist theories by considering the example of an observer looking at a bend in a road. Under a Neoconstructivist theory, a bend perceived in the road is the activation of some set of signals that a bend in the road exists, and the bend exists descriptively in the world in more or less the way that those perceptual signals or symbols specify.
For example, certain measures that we may specify the bending road to have, such as length, curvature, width, distance and direction, as well as any of the ontological categories that we might ascribe to it, such as line or surface, provide an objective spatial description of the bend in the road that exists externally, independent of perception, and also specify the content of those signals or symbols that make up the percept of the bending road. In other words, both the physical thing that exists – the bending road – and its percept should naturally be described using these descriptors. Under a Neoconstructivist theory, we use these spatial and ontological descriptors not because that is the format in which our perceptions specify the world, but because our perceptions are (more or less) faithful descriptions of such objective physical properties and entities. In other words, describing such objective properties and entities in the world in terms of these ontological categories and spatial attributes is the only objective way that they can be described. And the descriptions that can be applied to the fruit of our percepts are also exactly those descriptions that apply to the physical thing out there that is the road; perhaps something less (resolution and hardware limitations), but certainly nothing more. Thus, the epistemology and ontology of Neoconstructivism appear, even at first blush, to be at least weakly naïve realist. Putting aside any generic distaste for naïve realism, what else might possibly be wrong with a theory of perception like Neoconstructivism? One of the most enduring questions an inferential theory such as Neoconstructivism raises is the following: if a percept is either an objective measure or an indicator of the existence of a property or entity in the world, then how is this indication psychologically experienced? This question has vexed perceptual researchers from Mach, Hering and the Gestalt school through to Gibson. Yet perhaps the most penetrating analysis of the question of perceptual experience and its relationship to the information content of the percept has been put forth by Leyton (1992). The question his theory raises and answers is the following: for the example of the bend in the road given earlier, how is it that we have a phenomenological sense of the bend itself if the "indication" is merely specifying certain static properties or quantities? Let us look more closely at Leyton's question, using the example of the sculpture in Figure 3. The sculpture does not represent any familiar object, and the most immediate markers of familiarity are that it is carved out of stone and is a solid, rigid object. Yet the most phenomenologically striking aspect of it, as Leyton would point out, is that we can perceptually sense the forces, the bending, and the bulging. Yet all our direct familiarity cues should be telling us that such processes are not at work in the object, and that it is instead a static, stress-free object. One might argue that those perceived forces arise merely because the object "resembles", say, a rolled toothpaste tube or a clasped hand, and so we are not "experiencing" the bending, but merely experiencing the lighting up of a hierarchical neural symbolic linkage, that might be
as follows: "like a rolled toothpaste tube → rolling requires force and action → that force and action produce internal stresses → internal stresses cause stretching of the external membrane → excessive external stress can cause disruption of the membrane".

Figure 3. Carving #11, Barry Flanagan, 1981 (from Beal and Jacob, 1987).

What would a Neoconstructivist theory predict sculptor Barry Flanagan3 would see when looking at his own finished product? Presumably, since he is a sculptor and has himself carved the object, his experience should not make his visual system light up the above symbolic hierarchy (even though he may intend such trompe l'oeil in his observers). Instead, since the shape only weakly invokes some sort of familiar object, while his experience should strongly evoke what the object itself really is, he should just have activation of the symbolic set that simply says "solid, hard, carved, roundish object" (let us ignore the fact that even these have to be cashed out experientially). Indeed, if he hires an assistant to carve a multitude of the same shape over his lifetime, that assistant should cease to phenomenally experience any of the bending and bulging, and his very percept of the object should change. Any cognitive understanding of its similarity to a rolled toothpaste tube must be post-perceptual. Indeed, an animal with a visual system comparable to the human visual system should have a completely neutral perceptual experience with respect to the object, since it neither resembles a known object nor was created with a familiar procedure. The entire informational structure (the sensed forces, deformations, etc., that Leyton enumerates) is, under a Neoconstructivist theory, either non-existent or the result of a simple application of our cognitive experience with objects. This line of thinking leads to another question, one that is more directly relevant to the theme of this volume. It arises when we consider what a Neoconstructivist theory of perception has to say about aesthetics and design. Do we reflexively perceive qualitative differences above and beyond the objective spatial and recognition measures when we view different visual configurations? In other words, is there a natural reflexive qualitative evaluation that occurs at the level of the perceptual understanding of a visual configuration, prior to any application of cognitive factors such as memory, experience, etc.? The ubiquitous perceptual evaluation that seems integral to the process of designing and the experience of a designed product, as well as common visual phenomenology, suggests that the answer is yes. That such direct perceptual evaluation is at some level central to the aesthetic experience in art and architecture has historically been of great interest both in psychology (e.g. analysis by Kant, Klee, the Gestalt theorists, Arnheim, etc.) and in artistic movements (e.g. Abstract Expressionism and Minimalism). More recently, Leyton's theory of perceptual representation has taken as its central charge the ability to explain fundamental aspects of aesthetics. An inferential theory of perception such as Neoconstructivism implies that all physically plausible visual configurations are, at the perceptual level, psychologically equivalent. This assumption of psychological equivalence in inferential theories of perception is reflected in the fact that qualitative aspects of perception are usually judiciously sidestepped in favor of measurable ones.
The implicit assumption is that since a functioning perceptual system only faithfully infers what is out in the world (up to limits on hardware), and does not inject any non-trivial informational structure of its own, all physically plausible configurations should yield the same perceptual quality; or perhaps, no perceptual quality. Of course cognitive factors, such as memory, appetite, or experience, might color the cognitive experience of the object that perception delivers, but the perceptual act remains neutral, since all it does is indicate that such and such a thing is out there, in the way that it is out there. Since the way that it is out there is physically valid (we have already made this caveat), there is nothing else that can be said about it in terms of quality. Obviously, sometimes the recovery may be erroneous, but since there is no marker on the percept telling us this, the erroneous percept is, from perception's point of view, just as valid. Any judgment on the appropriateness of a configuration must come from extra-perceptual considerations (memory, appetite, aversion, experience, etc.) – what we refer to in this paper as cognitive aesthetics. It is interesting to note that for the aesthetician who wants to claim that all perceptual preferences are learned, a Neoconstructivist theory works very well, since rather than being a result of the very act of perception, aesthetic preference is cognitively applied onto the neutral product of perception. Yet the nature of the process and product of design (and art) – as well as common phenomenology – is convincing evidence that such perceptual neutrality is not what we typically experience. The very acts of painting and designing involve choices and manipulations of physical configurations that are deeply connected to perceiving differences in the quality of the configurations. Such differences are inexplicable within a naïve-realist theory of perception (which we will hopefully show Neoconstructivism to be). Our experience of what one might call perceptual aesthetics suggests that, for a workable perceptual theory, the differences in perceptual quality should be deducible from the representational schema that embodies our perceptual system. The notion that the representational scheme of perception reveals its signature in our perceptual phenomenology is implicit, historically, in the work of several researchers (e.g. Hering) and particularly the Gestaltists. Leyton (1992, 2001) in his theory has rigorously raised and answered many of the epistemological, phenomenological and aesthetic criteria implicit in Gestalt theory. Yet surprisingly, these central observations of Gestalt theory are precisely the ones that have been jettisoned from contemporary theories of representation aligned with Neoconstructivism. Generally, contemporary vision science has shied away from tackling the enduring but difficult puzzles of perception that are tied to phenomenology, epistemology and aesthetics. Much of this might be attributed to the current lack of resources on the historical lineage of the epistemological and phenomenological problems, and how they apply to contemporary scientific research in perception. None of the introductory or survey texts used for pedagogy provides a sustained critique of current approaches and their consequences.
This paper is an attempt at filling this gap by bringing together, within an epistemological framework, issues that have been sometimes explicit and sometimes implicit in prior research, and applying them to current approaches to understanding perception within vision research. Six sections follow this introduction. Through these sections we will attempt to communicate a range of ideas. Most, if not all, have been expressed before in the literature, starting from the natural philosophy of the 18th century, through empirical and theoretical research in vision (notably Hering, the Gestalt theorists and Gibson), and most particularly Leyton's theory of shape.4 We will borrow generously from the analyses provided in these works to weave an argument that consists of the following observations:

1. In empiricist theories of vision such as Neoconstructivism (perception as inference, inverse optics, etc.), the critical informational and causal distinction between the 2D image and the 3D percept, specified by the theory, is erased by the computational rendering of the theory.

2. The result of Neoconstructivist theories is a computational model of perception where the percept itself is largely non-informative. In such theories, the percept contains no non-metric information about the perceived world. The only non-metric information is generic rather than percept-specific (e.g. the fact that surfaces are continuous), and such information is entirely the property of the inferential device. The remaining metric information is itself not informative outside the purview of inter- and intra-sensory calibration, and is especially not, as often assumed, an objective measure on the external world. All other information is rendered to be properties of the outside world; properties which are merely symbolically instantiated in the inferential device.

3. Theories of perception-as-inference always involve positing objective measures, attributes and entities for both the sensory stimulation and the external world. On closer inspection, such attributes ("features"), measures ("cues") and entities (lines, surfaces, objects) turn out to be subjective descriptors parasitic on the very perceptual structures that they are used to explain. This results, inexorably, in such theories becoming naïve realist ones.

4. Standard computational renderings of Neoconstructivist theories conflate sensor co-calibration and object recognition with perceptual representation. Both calibration and object recognition exhibit characteristics of learning, which are usually taken by such theories to support an empiricist or constructivist epistemology for perceptual representation.

5. A restricted model of perception as inverse optics that deals only with inter-sensory and intra-sensory co-calibration issues is a viable model for a range of empirical research studies in vision, particularly 3D space perception. Such a model is viable because it takes a strictly behaviorist approach to the notion of perceptual estimation of spatial attributes, where relationships are restricted to predictions between output and input, and can usually remain agnostic to explicit representational structures.

6. Although the notion of "cues", and their combination, is a very useful construct for understanding how inter- and intra-sensory calibration occurs, the notion of "cues" is problematic for areas of research aimed at understanding the nature of perceptual representation, because
cues are merely ways in which to specify measurements within the perceptual output, and are not, as is commonly assumed, objective descriptors of either the external stimulus or the internal image.

7. Recent Neoconstructivist approaches (e.g. perceptual organization, grouping, figure-ground) embrace Gestalt principles as important factors in the generation of the visual percept. Yet most of these approaches are contrary to the basic epistemological and functional proposals implied by Gestalt theory.

8. Theories of inference introduce spurious problems into the understanding of the perceptual process. One such red herring is the puzzle of how a stable percept is maintained despite the constant changes in the retinal image across saccadic eye movements and blinks.

9. Neoconstructivist theories cannot explain why our percepts seem to provide greater information content than appears to be "objectively" present in the external array. This is an argument implicit in Gestalt theory and central to Leyton's generative theory of shape.

10. Neoconstructivist theories cannot explain how the percept is experienced – a fundamental charge of the theories put forth by Hering, Gestalt theory, Gibson and Leyton.

11. Neoconstructivist theories cannot explain the phenomenological reality of the reflexive qualitative judgments of perceived visual configurations that appear pre-cognitively in art, design and everyday visual experience. Leyton is among the few who have argued that the understanding of aesthetics is central to any computational theory of shape.

In Section 2 we give a rudimentary review of the basic epistemological arguments in modern philosophy stretching from Descartes to Kant. This is important because the notion of the distinction between contingent and necessary connections between events will be crucial for understanding why all constructivist theories of shape representation/recovery ultimately reduce to untenable naïve realist ones. In Section 3 we review the two basic approaches to shape representation/recovery in modern research: (1) standard computational vision5 and (2) shape perception as Bayesian probabilistic inference. We reiterate that the methodologies in both these approaches have important and wide application to many problems in human and computer vision, and are irreplaceable in the development of artificial systems, as well as in the assessment of visuo-motor capacities of humans. The intent here will be to try to show why they cannot be successful theories of human perceptual representation. Many other ad hoc approaches to shape representation suffer similar problems, but in addition, they do not provide any useful quantitative framework for other basic aspects of vision research. In that sense, an important distinction must be made between ad hoc theories and the sound quantitative frameworks of computer-vision and probabilistic approaches. In Section 4 we assess two key theories of perception that have heavily influenced current research, namely Gibson's theory of perception and Gestalt theory. For the latter we mention only the theory and approach of the Berlin school of Gestalt (e.g. Wertheimer and Köhler), which is the one most familiar to researchers in perception. There are many important and crucial ideas that come out of the early Gestalt theorists such as Brentano, von Ehrenfels and Mach, as well as other philosophers and psychologists of the Austrian and Italian schools of Gestalt theory.
The reader is directed to the extensive reviews and analyses of their application to contemporary perceptual science by Albertazzi et al. (1996) and Albertazzi (2000, 2001, 2002). Section 5 analyses the shortcomings of inferential approaches and outlines diagrammatic frameworks for understanding the various approaches one might take for a theory of perception. Specifically, we will outline three of them: (1) shape perception as inference from 2D image to 3D world (naïve realism); (2) shape perception as a calibration map (here we will also argue that shape or object recognition can be thought of as a form of calibration); (3) shape perception as the presentation6 of sensory flux. Section 6 discusses the implications of each approach for perceptual experience. Section 7 discusses the implications of theories of perception for aesthetics and design. Here the notion of representational conflict in perception is introduced. This section will, by design, be of a speculative nature. Since the paper is quite long, a first reading might be possible by skipping Section 3, and, for those familiar with the basic philosophical arguments, Section 2. A short reading of the paper might include the introduction and Sections 5, 6 and 7.

6.5. Summary

So, let us now summarize the essential problems with the Neoconstructivist, inverse-optics approach to perceptual representation:

(1) The 2D image and 3D percept are defined to be distinct entities: the former existing in the external world and the latter in the psychological domain. Perception is supposed to provide the causal inferential link between the two. Yet, as we have seen, an inferential model of perception – even when placed within a sophisticated quantitative framework – removes the very distinction that makes the inferential link informative. Any model of perception as inference reduces inexorably to one of two trivial models: (a) a naïve realist model (Figure 15), where the perceptual system doesn't involve an inductive (inferential) step between the external and the psychological, but has instead an all-knowing homunculus examining an objective image on a camera; or (b) an idealistic model in which the external world does not exist. This essential problem is nothing more than Hume's argument against any empiricist theory of knowledge that wants to retain the notion of an objective physical world. In all such theories we end up losing the very distinction that we are trying to explain, and with it most of the information content in the percept.

(2) The external world, the 2D image, and the 3D percept are all defined in terms of the same descriptive parameters, which are erroneously considered to be objective parameters independent of perception. Thus, geometric descriptors like length, position and orientation are taken to be objective measures or properties of the world, in the same way that objects and surfaces are erroneously taken to be objective things in the world. Similarly, the image is thought to have objective descriptors as well as objective "cues", which are assumed to have informational content distinct from the perceptual machinery and the final percept. In these models, perception is assumed to be a process that involves the successful detection of objects that have distinct objective existences in the external world, and that have been imaged onto an objective sensory field. A central aspect of these models is the assumption that spatial descriptions are objective from an informational point of view.
(3) Inference requires a re-presentational scheme, and re-presentation precludes direct experience; so any re-presentational scheme that specifies the existence of an external world reduces to a naïve realist scheme.

(4) Though many recent Neoconstructivist approaches claim to be aligned with Gestalt theories of perception, nearly all maintain epistemological positions exactly opposite to those espoused by Gestalt theory.

(5) Neoconstructivist theories conflate the notion of calibration with perceptual representation. One way they do so is by creating an unnecessary distinction between what we call measurement as motor action and device-based measurement. In these models, the device-based measurement is considered, erroneously, to be objective and informationally distinct from perception and motor action.

(6) Limiting the notion of inverse optics to a calibration domain allows for workable inverse-optics models for exploring a range of issues in vision. Such approaches, in which shape is viewed simply as a calibration map, implicitly posit that the only information contained in the percept is spatial estimates of points in the visual field. Note that such a model is epistemologically a behaviorist model in which there is only an "idealized" world of stimulus and response.

(7) The non-metric informational content of a percept, such as connectedness of points, continuity of surfaces, etc., is assumed to be encoded as "learned" biases, constraints, priors, etc. There are two problems with this: (a) such constraints cannot be learned, since a unique percept (on which the learning could take place) is impossible without these biases already in place; (b) it implies that the only information content in the percept itself is metric – all non-metric information is not part of the percept, but part of the inferential device.

(8) Neoconstructivist theories do not provide a satisfactory explanation of perceptual experience. This is because Neoconstructivism regards perception as a re-presentation of properties of the external world. Though we have been using the term perceptual representation throughout the text, we did introduce the notion of perception as a presentation of the sensory flux. This is the most critical aspect of any theory of perception; it is most correctly defined not as a system of re-presentation, but of presentation. Thinking in such terms, color is not thought of as a re-presentation of a property in the external world; it is the presentation of chromatic differences, where "chromatic difference" is a property of the percept and not the external world.28 In other words, the information content of color is parasitic on the percept and not on objective properties of the external world. It is certainly not a re-presentation of differences in electromagnetic wavelength; any correlation between wavelength and perceived color is a synthetic description of relations between perceptual states and measurements in visuo-motor space.

Figure 16 illustrates what an epistemologically correct model of perception would look like, where the information content of the percept (both the image description and the perceived-object description) is part of the perceptual apparatus and not the external world. There is naturally a correlation between the states of the external world and the percept, as determined by the flux at the sensory interface.
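Point (6) above grants inverse optics a legitimate role within a calibration domain. For concreteness, the standard probabilistic tool in that domain – reliability-weighted combination of sensory estimates, the textbook maximum-likelihood rule for independent Gaussian cues – can be sketched as follows. The cue names, estimates and variances are invented purely for illustration.

```python
# Reliability-weighted combination of two slant estimates: each cue is
# weighted by its inverse variance (its reliability), and the fused
# estimate has lower variance than either cue alone.
cues = {
    "stereo":  {"estimate": 1.00, "variance": 0.04},
    "texture": {"estimate": 1.20, "variance": 0.16},
}

weights = {name: 1.0 / c["variance"] for name, c in cues.items()}
total_weight = sum(weights.values())

fused = sum(weights[n] * cues[n]["estimate"] for n in cues) / total_weight
fused_variance = 1.0 / total_weight

print(fused, fused_variance)  # 1.04, 0.032
```

Nothing in such a computation requires the estimates to be objective measures of the external world; they are, as argued above, measurements within the perceptual output, co-calibrated against one another.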
Figure 17 illustrates the correct construal of the calibration domain in perception; again, any metric descriptions of this domain are based on perceptual entities and metrics.

Figure 16. Perception as the presentation of an external sensory flux.

Figure 17. Calibration domain in an epistemologically correct model of perception.

7. PERCEPTUAL EXPERIENCE, AESTHETICS AND DESIGN

"Objects project possibilities for action as much as they project that they themselves were acted upon – the former allows for certain subtle identifications and orientations, the latter if emphasized is a recovery of the time that welds together ends and means... The work is such that materials are not so much brought into alignment with static a priori forms, as that the material itself is being probed for openings that allow the artist behavioral access."
– Robert Morris, sculptor.29

Perhaps the most enduring puzzle in the history of perception is its relationship to visual aesthetics. The range of phenomena that can be ascribed to visual aesthetics is very broad, and often involves personal, sociological, religious, ritualistic, and convention-based aspects. Though many of these are best discussed within the framework of art history and criticism, a much more enduring puzzle in aesthetics has been the question of what the purely perceptual dimensions of aesthetics are. This has been a particularly vexing and important question in the domain of the design of artifacts, namely architecture and product design. A central feature of perception, one that is most evident in the process and products of design,