How The Vision Works

Nariman Varahram (1228406)
Vienna University of Technology

Abstract: Computer vision is a field that deals with acquiring, processing, analyzing and understanding real-world data. Its goal is to extract information from real-world input images. These input images can take different forms, such as video sequences, multiple camera views or multi-dimensional data from medical scanners. However, to reproduce human vision in a computer, one must first understand how human vision works, because that is what reveals the challenges of implementing computer vision. Furthermore, to understand human vision one needs to know how the mind works when it sees the real world. This paper focuses on the mind, human vision and computer vision from the point of view of the psychologist Steven Pinker, author of "How the Mind Works".

Keywords: Vision, Mind, Brain, Eye, Human, Computer

I. INTRODUCTION

In most movies about robots, when the filmmakers show the world from the robot's point of view, they show video images of the world decorated with contrivances such as cross-hairs, pull-down menus, fish-eye distortion or a red tint. This is a misleading portrait of vision. If it were possible to see the world through a robot's eyes, or from the point of view of the human brain, it would not look like a video image with cross-hairs. Instead it would be millions of values corresponding to the intensity of light at the various locations on the retina, obtained by a two-dimensional projection of the three-dimensional world in front of the eye or the camera.

To understand how computer vision works, we first need to understand how human vision works, and to understand how human vision works we need to know how the human brain and the human eyes work together.

The human mind is one of the last great frontiers of science. It is a truly magnificent organ.
It allows us to walk on the moon, to discover the secrets of life and of the physical universe, and to invent advanced and complex devices. But the mind raises many paradoxes. On the one hand, the human mind is an engineering masterpiece: we can see, move and use common sense better than any existing or foreseeable computer or robot. On the other hand, it struggles to answer seemingly simple questions. For instance, why is the thought of eating worms as a source of protein disgusting? Why do people believe in ghosts and spirits? Why do people fall in love?

One of the ideas that helps to explain both kinds of phenomena is computation. Just as the function of the heart is to pump blood and the function of the kidney is to filter it, the function of the brain is information processing, or computation. However, much of the information in the world is colorless, odorless, weightless and tasteless. Consider explaining a person's behavior, such as why he took a taxi. This question cannot be answered by solving mathematical functions or by simulating the neural networks in his brain. Yet if we simply ask the person why he took the taxi, he can give an explanation, for instance that he wanted to go to his hotel. In this case his belief and desire concern reaching the hotel, and they lead to his behavior, which is taking the taxi.

The computational theory of mind solves part of this problem. Beliefs and desires are colorless, odorless and tasteless, yet we know that they can cause behavior. Information, as a mathematical concept, is likewise colorless, odorless and tasteless, yet physical devices that carry information can, simply by obeying the laws of physics, cause systematic patterns of change that are well characterized in the abstract language of information.

The computational theory of mind also guides the direct study of the physiology of the brain. Consider an old pseudo-question sometimes heard in introductory psychology.
A diagram of the eyeball shows the image of the world projected onto the retina upside down. If the image on the retina is upside down, does that mean that some part of the brain turns the image right side up so that we see the world as it is? This is a pseudo-question. There need not be any such process in the brain, because whether the image is upside down or right side up makes no difference to how the brain processes the information coming from it. And information is the only property of activity in the brain that is relevant to explaining the mind.

II. VISIONS

For centuries we have known that the body is a complex device. For instance, the human eye has many parts that are intelligently arranged to accomplish an outcome, namely focusing an image onto a layer of light-sensitive tissue. We explain the various parts of the eye by saying that they are, in some sense, designed to form an image.

The punch line is that the mind, too, is a complex device, so complex that we cannot yet duplicate even its simple functions, such as seeing or retrieving information, in a computer or robot. This leads to the idea that to understand the mind we have to reverse-engineer it. In forward engineering we start with a goal that we want a device to accomplish, and that goal leads us to build the device. In reverse engineering we start with the device and have to figure out what it was designed to do.

The mind is a complicated device that has to solve many different kinds of engineering problems, such as seeing in three dimensions, moving arms and legs, understanding the physical world and many others. These problems are different, and the tools for solving them are different. We know that specialization is ubiquitous in biology in general. The heart and the eye have different shapes because the heart is designed to pump blood and the eye is designed to see the three-dimensional world. Heart cells differ from eye cells. This paper focuses on the eyes, vision, and how the brain processes and reasons about the information gathered by the eyes.

A. Human Vision

As described in the Introduction, the image of the world from the point of view of the eyes is millions of values corresponding to the intensity of light at the various locations on the retina. The task of the brain is to process these numbers and to recover and understand the three-dimensional structure of the world from the intensities. The brain has evolved many tricks for doing this, but the task is far from simple.

Objects in the real world are not always easy to make out. For instance, Figure 1 shows an object against a dark background, so the foreground object cannot easily be distinguished from the background. Figure 2 shows the same object against a white background. Spotting the shape of the object is clearly harder in the first figure: how easy the task is depends on the surroundings.

Fig. 1. Object against a black background.
Fig. 2. Same object against a white background.

Another problem in seeing objects is solved by a trick called "shape from shading", which rests on a simple law of physics. Imagine a light source with a surface in front of it. The steeper the angle of the surface relative to the light, the less light is reflected back. Hence, as the surface is rotated with respect to the light source, the glow of light reflected from it goes from bright to dim. That is the physics. The brain exploits this law by running it backward: the dimmer the image on the retina, the steeper the angle of the surface in the world. The brain can therefore reconstruct the shape from the angles of the thousands of facets that collectively define the three-dimensional shape of the surface.

The only problem with the "shape from shading" trick is that the brain interprets brightness as angle and therefore assumes a uniformly, or at least randomly, colored world. It assumes that any difference in lightness or darkness on the retina comes from a difference in angle, and ultimately in shape, in the world.
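The forward law and its backward use can be sketched in a few lines. This is only an illustration under stated assumptions: the text says that steeper surfaces reflect less light but does not give a formula, so the sketch assumes Lambert's cosine law for a matte surface, and the names `reflected_brightness`, `tilt_from_brightness` and `assumed_albedo` are hypothetical.

```python
import math


def reflected_brightness(tilt_deg, albedo=1.0, light=1.0):
    """Forward physics (assumed Lambertian): brightness of a matte facet
    tilted tilt_deg away from the light. albedo is the surface colour
    (1.0 = white, 0.0 = black); light is the source strength."""
    return albedo * light * max(0.0, math.cos(math.radians(tilt_deg)))


def tilt_from_brightness(brightness, assumed_albedo=1.0, light=1.0):
    """Backward step: recover the tilt from the brightness, ASSUMING the
    surface has a known, uniform albedo."""
    ratio = brightness / (assumed_albedo * light)
    ratio = max(0.0, min(1.0, ratio))  # guard acos against rounding error
    return math.degrees(math.acos(ratio))


# A white facet tilted 60 degrees: the backward step recovers the tilt.
tilted_white = reflected_brightness(60.0)
recovered_tilt = tilt_from_brightness(tilted_white)

# A grey facet (albedo 0.5) facing the light head-on sends the same
# brightness to the eye, so the backward step reports a tilt that is not there.
flat_grey = reflected_brightness(0.0, albedo=0.5)
fooled_tilt = tilt_from_brightness(flat_grey)

print(f"white, tilted 60 degrees: recovered tilt {recovered_tilt:.1f}")
print(f"grey, facing the light:   recovered tilt {fooled_tilt:.1f}")
```

Both facets yield the same recovered tilt, although only one of them is actually tilted. The inversion is only as good as its assumption that the surface is uniformly colored.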
This assumption is obviously not true in general. It predicts that surfaces colored in clever ways should fool the shape-from-shading module and cause us to see things that are not there. In fact, that is exactly what happens, and many contrivances of modern life take advantage of it. Television, for instance, is a kind of illusion. People spend hours and hours staring at a pane of glass. Why? Because it is designed to display patterns of shading that the shape-from-shading analyzer interprets as coming from three-dimensional objects. We stare at the pane of glass because it is engineered to defeat this part of our brain and cause us to hallucinate a real world behind the glass.

Another example is makeup. People who are skilled in applying makeup know that if a person's nose is too big, it can be made to look smaller by putting a little rouge on the sides of the nose: the brain interprets dark as a steeper angle, so the nose looks narrower. More generally, many illusions, fallacies and odd behaviors are like makeup and television. They come from a mismatch between the assumptions about the world that are built into our mental modules and the structure of the current world.

Now consider comparing two objects, for instance milk and cola. If large numbers come from bright regions and small numbers come from dark regions, can we conclude that a large number means white and corresponds to milk, and that a small number means black and corresponds to cola? No. The amount of light received by the retina depends not only on how pale or dark the object is, but also on how bright or dim the light illuminating it is. Yet we see the milk as white even in a dark place and the cola as black even in a bright place. This means that human consciousness matches the world as it is rather than the world as it presents itself to the eye.
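The milk and cola example can be made concrete with a toy calculation. It is a sketch under assumptions, not a model of the retina: the light reaching the eye is taken to be the product of the surface's reflectance and the illumination, and the reflectance and illumination values, in arbitrary units, are invented for illustration, as are the names `retinal_intensity` and `reflectance_given_light`.

```python
import math

# Assumed, illustrative reflectances (fraction of light sent back).
MILK_REFLECTANCE = 0.8
COLA_REFLECTANCE = 0.1


def retinal_intensity(reflectance, illumination):
    """Light reaching the eye, assumed to be reflectance times illumination."""
    return reflectance * illumination


def reflectance_given_light(intensity, illumination):
    """With the illumination known (an added premise), the surface colour
    can be recovered from the intensity."""
    return intensity / illumination


# Milk in dim light and cola in bright light send the same number to the eye.
milk_in_shade = retinal_intensity(MILK_REFLECTANCE, illumination=10.0)
cola_in_sun = retinal_intensity(COLA_REFLECTANCE, illumination=80.0)
same_signal = math.isclose(milk_in_shade, cola_in_sun)

# The intensity alone cannot say which is which; an estimate of the light can.
looks_like_milk = reflectance_given_light(milk_in_shade, illumination=10.0)
looks_like_cola = reflectance_given_light(cola_in_sun, illumination=80.0)

print(f"same retinal intensity: {same_signal}")
print(f"recovered reflectances: {looks_like_milk:.2f} and {looks_like_cola:.2f}")
```

One number on the retina is compatible with many pairs of surface and lighting, so a large number cannot simply be read as "white".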
The harmony between how the world looks and how the world is must be an achievement of our neural wizardry, because black and white do not simply announce themselves on the retina.

The impressive part of this process is that the human brain deduces an object's shape and substance from its two-dimensional projection. Why is that impressive? Because problems of this kind, known as "ill-posed problems", are in general unsolvable: they do not have a unique solution. A patch of grey received by the eye can be either milk in shade or cola under bright light. Vision evolved to convert this unsolvable problem into a solvable one by adding premises and assumptions. It follows that if we travel to another world where those assumptions no longer hold, or if unlucky and unpredictable coincidences violate them, we fall prey to an illusion.

The next problem is depth. The human eye projects the three-dimensional world onto a two-dimensional image on the retina, so the third dimension has to be reconstructed in the brain. The information about how far away the real object is never reaches the retina directly.

Another important fact about human vision is that humans are binocular: each eye receives its own image. The image projected onto the left retina is not exactly the same as the one projected onto the right. This fact explains how stereograms work. It also explains why it is impossible for a painter to paint a nearby solid object so that the painting cannot be distinguished from the real object. The slight difference between the views of the two eyes is called "binocular parallax". Imagine looking at a soccer ball on a table, with a rugby ball behind it and a tennis ball in front of it. Aim your eyes at the soccer ball: its image falls at six o'clock on both retinas. Now look at the projection of the tennis ball, which is in front of the soccer ball.
In the left eye it sits at seven o'clock, but in the right eye it sits at five o'clock. Similarly, the image of the rugby ball, which is farther away, sits at five o'clock in the left eye and at six thirty in the right. When the brain detects that the two images correspond to a single object in the real world, the mind combines these two views (each also known as a "Leonardo's window") into the single image that we see. That is why a painter cannot make a painting indistinguishable from a nearby real object: a painting projects two similar pictures onto the two eyes, whereas a real object projects two dissimilar ones.

What happens when we look at a stereogram is not complicated. The image is captured through two Leonardo's windows or, more generally, by two cameras, each positioned where one of the eyes would be. The left image is placed in front of the viewer's left eye and the right image in front of the right eye. The brain assumes that the two eyes are looking at the same three-dimensional scene, with differences in view caused only by binocular parallax. At that point the brain is fooled by the pictures: it combines them into one, and parts of the image appear at different depths.

Fig. 3. Stereograms.
Fig. 4. Position of the images in each eye when looking at a stereogram.

The brain also adjusts the eyes physically, by controlling muscles, in two ways, and this is the reason why some people cannot see stereograms. In the first adjustment the brain controls the fatness of the eye's lens. The lens receives light from the world and focuses it at a point on the retina. For a distant object, a muscle inside the eyeball makes the lens thin; for a close object it makes the lens fat, to avoid a blurry image. Figure 5 illustrates how the muscle changes the thickness of the lens. The goal of the second adjustment is to aim the two eyes, which are separated from each other by about two and a half inches, at the same object in the world.
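The geometry behind the second adjustment can be sketched with basic trigonometry. This is a simplified illustration, not a model of the eye muscles: it assumes the object lies straight ahead, midway between the eyes, and it treats the eye separation as a parameter whose default of 0.0635 m is an assumed typical adult value. The function names are hypothetical.

```python
import math

# Assumed typical adult eye separation in metres (about 2.5 inches).
EYE_SEPARATION_M = 0.0635


def vergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two lines of sight when both eyes are aimed at a
    point straight ahead at distance_m."""
    half_angle = math.atan((eye_separation_m / 2.0) / distance_m)
    return math.degrees(2.0 * half_angle)


def distance_from_vergence_m(angle_deg, eye_separation_m=EYE_SEPARATION_M):
    """Inverse: how far away the fixated point is for a given crossing angle."""
    half_angle = math.radians(angle_deg) / 2.0
    return (eye_separation_m / 2.0) / math.tan(half_angle)


# The crossing angle shrinks quickly as the object moves away.
for distance in (0.25, 1.0, 10.0):
    angle = vergence_angle_deg(distance)
    print(f"object at {distance:5.2f} m -> eyes crossed by {angle:.2f} degrees")
```

Under these assumptions the crossing angle is large for near objects and almost zero for far ones, so the aiming of the eyes can only carry distance information at close range.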
The aiming is carried out by muscles attached to the sides of the eyes. The closer the object, the more the eyes must cross. Figure 6 illustrates how the brain controls the orientation of the eyes.

Fig. 5. Adjusting the thickness of the lens by controlling a muscle with brain signals.
Fig. 6. Adjusting the orientation of the eyes by controlling muscles with brain signals.

B. Computer Vision

One of the clearest definitions of the goal of vision comes from the artificial intelligence researcher David Marr. He said: "Vision is a process that produces from images of the external world a description that is useful to the viewer and not cluttered with irrelevant information."

If vision did not produce a description, then every organ and mental faculty, such as moving, talking and planning, would need its own procedure to deduce meaning, which is not what happens. When the retina receives a pattern in two dimensions, vision deduces the shape of the object from the retinal image. All parts of the mind then contribute to producing a description of it. Finally, the mind attaches this description, in a format readable by the mental modules, to the object in three-dimensional coordinates.

Let us pretend that we have somehow built a robot that can see and move. What will it do with what it sees? How should it decide how to act? An intelligent being cannot treat every object it sees as a unique entity unlike anything else in the universe. It has to put objects in categories so that it may apply its hard-won knowledge about similar objects, encountered in the past, to the object at hand. But whenever one tries to program a set of criteria to capture the members of a category, the category disintegrates, even leaving aside slippery concepts like "beauty" or "dialectical materialism".

Fig. 7. The squares marked A and B are the same shade of grey.
In fact, the most challenging part of computer vision is understanding and reasoning: giving the image meaning and a description, as the mind does. Soon autonomous robots of all shapes and sizes, from cars to hospital helpers, will be a familiar sight in public. But for that to happen, the machines need to learn to navigate our environment, and that requires a lot more than a good pair of eyes.

There are some images that our brains consistently put together incorrectly; these are what we call optical illusions. Optical illusions are interesting because if a mathematical model of vision can predict new ones, that is a useful indicator that the model reflects human vision accurately. Optical illusions are intrinsically fascinating magic tricks from nature, but they are also a way to test how good a model is. For instance, most robots would not be fooled by the Adelson checkerboard illusion, in which humans see two identical grey squares as different shades (Figure 7). Although it might seem that the machine wins this round, robots have problems recognizing shadows and accounting for the way they change the landscape. Computer vision suffers badly under variations in lighting conditions and under occlusions, and shadows are very often taken to be real objects. This is why autonomous vehicles need more than a pair of suitably advanced cameras. Radar and laser scanners are necessary because machine intelligence needs much more information to recognize an object than we do.

It is not just places and objects that robots need to recognize. To be faithful assistants and useful workers, they need to recognize people and our intentions. Military robots need to correctly distinguish enemy soldiers from frightened civilians, and care robots need to recognize not just people but their emotions. The contextual awareness needed to navigate the world safely is not to be taken lightly. Imagine a plastic ball rolling into the road.
Most human drivers would expect that a child might follow it, and would slow down accordingly. A robot can too, but distinguishing between a ball and a plastic bag is difficult, even with all of its sensors and algorithms. And that is before we start thinking about people who might set out to intentionally distract or confuse a robot, tricking it into driving onto the pavement or falling down a staircase. Could a robot recognize a fake road diversion that might be a prelude to a theft or a hijacking?

An intelligent being has to deduce the implications of what it knows, but only the relevant implications. Dennett points out that this requirement poses a deep problem not only for robot design but for epistemology, the analysis of how we know. The problem escaped the notice of generations of philosophers, who were left complacent by the illusory effortlessness of their own common sense.

III. CONCLUSION

There is a big difference between seeing the world and understanding it. Seeing is just a starting point for understanding the world. The hard part is getting a robot to intelligently identify what it has detected. We take for granted what goes into creating our own view of the road. We tend to think of the world falling onto our retinas like a picture through a camera lens, but sight is much more complicated. The whole visual system shreds images, breaks them up into maps of color, maps of motion, and so on, and then somehow manages to reintegrate them. How the brain performs this trick is still a mystery. I conclude, and predict, that for the foreseeable future a human will have to monitor the system. It is not realistic to say that computers will take over any time soon and make all decisions on behalf of the driver.

REFERENCES

[1] S. Pinker, How the Mind Works. London, England: Penguin Group, 1997.
[2] S. Pinker, Interview at Cambridge University 1997. London, England: Penguin Group, 1997.
[3] Wikipedia, "Computer vision," https://en.wikipedia.org/wiki/Computer_vision