Reciprocal Modelling of Active Perception of 2-D Forms in a Simple Tactile-Vision Substitution System

JOHN STEWART and OLIVIER GAPENNE
Université de Technologie de Compiègne, COSTECH, Dept. TSH, Centre P. Guillaumat, BP 309, 60203 Compiègne Cedex, France

Abstract. The strategies of action employed by a human subject in order to perceive simple 2-D forms on the basis of tactile sensory feedback have been modelled by an explicit computer algorithm. The modelling process has been constrained and informed by the capacity of human subjects both to consciously describe their own strategies, and to apply explicit strategies; thus, the strategies effectively employed by the human subject have been influenced by the modelling process itself. On this basis, good qualitative and semi-quantitative agreement has been achieved between the trajectories produced by a human subject, and the traces produced by a computer algorithm. The advantage of this “reciprocal modelling” option, besides facilitating agreement between the algorithm and the empirically observed trajectories, is that the theoretical model provides an explanation, and not just a description, of the active perception of the human subject.

Key words: active perception, computer modelling, sensory-motor invariants, visual tactile sensory substitution

1. Introduction

1.1. MODELLING PERCEPTION AS A SENSORI-MOTOR INVARIANT

The theme of “active perception”, which has attracted a growing volume of research over the last 20 years, comprises two major currents. The first current, which is relatively classical, assumes that the object of perception is a pre-given referent, and that the aim of perception is to actively elaborate an accurate representation of this object.
In this objectivist perspective, the role of action is to increase the amount of information available to the subject; the question then becomes that of the efficient integration of this information in order to derive a coherent representation. The prime references in this direction are Marr (1982), Poggio (1983) and Ullman (1980). This approach continues to be the object of substantial work, both in the area of simulation (Crowley and Christensen, 1995) and with respect to the identification of the underlying neuronal mechanisms (Kiper and Carandini, 2002). The second current dispenses with the notions of a pre-given object and of representations. Rather, the aim of perception is conceived as the identification of invariants directly in terms of the ongoing sensory-motor dynamics. In this perspective, the role of action is not just to provide a supplement of information, but more fundamentally to constitute the very object of perception. The classical work in this direction is that of Gibson (1979); in the “ecological approach to visual perception”, the relevant information is conceived as existing directly in the “optic flow”. This approach has been characteristically developed in the field of autonomous robotics, where Brooks (1987), for example, speaks quite explicitly of “intelligence without representations”. More generally, work in this constructivist perspective centers on the genesis of perceptive invariants (Piaget, 1967; Varela et al., 1993; Stewart, 1996; O’Regan and Noë, 2001). The work to be described in this paper is situated quite clearly in the second orientation. The aim of this paper is to model the action strategies deployed by human subjects performing a perceptive task.

Minds and Machines 14: 309–330, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.
The experimental setup derives from the pioneering work of Bach-y-Rita (1972) on Tactile Visual Sensory Substitution (TVSS), in which the output from a video camera was transduced to a 20 × 20 array of tactile stimulators placed on the chest or back of blind subjects. This work established three fundamental results: (i) if the camera remained stationary, form recognition by the subjects was poor, even after prolonged learning; (ii) however, if the subjects were able to act by moving the camera themselves (left to right, up and down, zoom and so on), their perceptive capacities improved remarkably: after only a few hours of learning they were able to recognize simple shapes, and after 2 weeks of daily practice their perception developed to the point of recognizing faces; (iii) concomitantly, there was a qualitative shift in the nature of their perception. In the static phase (i), the subjects reported feeling a tingling at the site of tactile stimulation of the skin (on chest or back). However, in the active phase, after learning, the sensory stimulation as such dropped out of their consciousness (although it could be recovered by deliberately refocusing their attention on this aspect), and was replaced by “direct perception” of objects, which were identified as being situated “in depth” in a three-dimensional space in front of them. It is also interesting to note that if, unbeknown to the subject, the “zoom” was activated so that the tactile “image” expanded, the subject instinctively recoiled; in other words, expansion of the image unrelated to an action of the subject was interpreted as the rapid and dangerous approach of a real object. In order to investigate further the fundamental issues in perceptual cognition that are involved, we have renewed these experiments by deliberately simplifying the sensory input to its simplest possible expression – i.e.
a single point of tactile stimulation, which is either active (xxx) or inactive (ooo).¹ The perceptual task we have chosen is the identification of simple two-dimensional forms: broken lines and curves (Lenay et al., 1999). The sensory input is provided not by an optical camera, but by a magnetic pen in conjunction with a graphic tablet connected to a computer, which furnishes a virtual image (black or white pixels, which can be displayed on a screen visible to the scientific observer but are of course not visible to the experimental subjects). In the experiments to be reported here, the subjects are sighted but blindfolded. When the subject moves the pen so that the tip is over a black pixel of the virtual image, the tactile stimulator (placed under the index finger of the other hand) is activated; if the tip is over a white pixel, there is no activation. This experimental setup has been described in previous publications (Lenay et al., 1997, 1999; Hanneton et al., 1999). Under these conditions, the sensory input is reduced to a temporal sequence: “ooooooxxooooxxoxxxxxx”, etc. It is thus immediately evident that no conceivable “information processing” of the input signal can convert it into the perception of a 2-D line or curve. In other words, we have quite deliberately employed an experimental setup which illustrates, paradigmatically, the thesis of “active perception”, i.e. that there is no perception in the absence of actions on the part of the subject. One may ask whether the sensory input has not been impoverished to such an extent that perception is impossible even if the subject can act; the answer is that the subjects do indeed succeed in perceiving 2-D forms, and are able to demonstrate their perception by drawing the figures as they have perceived them.
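To make concrete how impoverished this input is, the following minimal Python sketch reduces a pen trajectory over a virtual bitmap to the x/o sequence the subject receives. Representing the image as a set of black pixel co-ordinates, the helper names and the example figure (a vertical line 2 pixels thick) are illustrative assumptions, not the software actually used in the experiments.

```python
import math
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]


def vertical_line(x0: int, y_min: int, y_max: int, thickness: int = 2) -> Set[Pixel]:
    """Illustrative figure: the black pixels of a vertical line `thickness` pixels wide."""
    return {(x0 + k, y) for k in range(thickness) for y in range(y_min, y_max + 1)}


def stimulus(black: Set[Pixel], x: float, y: float) -> str:
    """'x' if the pen tip lies over a black pixel (stimulator active), else 'o'."""
    return "x" if (math.floor(x), math.floor(y)) in black else "o"


def sensory_sequence(black: Set[Pixel], trajectory: Iterable[Tuple[float, float]]) -> str:
    """Everything the subject receives: one symbol per sampled pen position, in time order."""
    return "".join(stimulus(black, x, y) for x, y in trajectory)


line = vertical_line(10, 0, 40)                     # occupies columns 10 and 11
left_to_right = [(x, 20) for x in range(5, 16)]
print(sensory_sequence(line, left_to_right))        # oooooxxoooo
```

By itself, the string says nothing about where the line lies in the field; that can only be recovered by relating each stimulation to the movement that produced it.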
A major advantage of this experimental setup is that it forces an externalisation of the actions of the subjects, so that they can be recorded in the form of a trace of the successive positions of the tip of the pen. These trajectories, together with the subjects’ drawings of the figures as perceived, constitute a rich set of empirical data. A major aim of this paper is to develop a conceptual framework and a set of qualitative categories, which are an essential prerequisite for the fine quantitative analysis of such empirical data.

1.2. MODELLING PERCEPTION THROUGH A RECIPROCAL PROCESS

Human subjects, unlike robots and animals, are able both to consciously describe their own strategies and to apply explicit strategies; the originality of the work presented here consists in exploiting this feature both to inform and to constrain the modelling process. In general, when modelling a natural phenomenon, it is usual to consider that the phenomenon is fixed and given, and that the aim of the theoretical model is to conform as closely as possible to this pre-given referent. Epistemologically, it may be remarked that there is no such thing as a complete, neutral description of a natural phenomenon. An experimental setup is usually designed precisely in order to exhibit with particular clarity those features of a phenomenon that are theoretically relevant; and scientific descriptions generically take the form of measurements which presuppose abstract categories (e.g. “length” or “weight”) that derive not from “the thing in itself” (this is the empiricist illusion), but from a theoretical model. However, subject to this caveat, the natural phenomenon is not usually manipulated in order to conform to the model.
In the present case, modelling of the “strategy of action” deployed by a human subject has been subject to two constraints: (a) the characterization of the “strategy” employed by a human subject should be sufficiently “complete” and precise to generate a functional computational algorithm which produces traces qualitatively (and, ideally, quantitatively) comparable to those actually produced by the human subject; (b) concomitantly, and reciprocally, the computational algorithm should employ only strategies that can demonstrably be realized by the human subject. To the extent that this double constraint can be respected, the scientific goal of fully articulating theory and experimental data will be achieved. Now these twin constraints do not in themselves prohibit modification of the human strategy in order to conform to the model. In the work to be presented here, we have deliberately taken advantage of this leeway by modifying the human strategy in relation to the modelling process itself. In other words, the strategy employed by the human subject in this study was itself informed by knowledge of strategies that the computer algorithm showed to be effective. We denote this option, which consists of deliberately modifying the empirical results in order to conform to a theoretical model, by the term “reciprocal modelling”. This option clearly facilitates the achievement of convergence between a human strategy and a computational algorithm, but we hope to show that it remains far from trivial and has additional advantages.

2. Methods

2.1. STRATEGIES OF ACTION: RELIABILITY AND PLURALITY

As we have already emphasized, under the experimental conditions that we have deliberately chosen here, the sensory input alone cannot possibly give rise to the perception of a form. In fact, the sensory feedback is used rather to guide the actions of the subject, in such a way that the sensory input tends towards an “ideal” form (e.g.
a temporal sequence in which the tactile stimulation is alternately active and inactive: “xxxoooxxxoooxxx”, etc.; a regular sequence of this type contains a maximum amount of information). This way of looking at things is closely akin to the “Perceptual Control Theory” of Powers (1988). It will be noted that the sensory regularity in question is rhythmic and temporal rather than immediately spatial. In fact, what the subject “perceives” is not the sensory input as such (just as in Bach-y-Rita’s experiments, this tends to fade from consciousness in experienced subjects), but rather the actions which are necessary in order to produce the “ideal” input. Indeed, even a cursory examination of the traces produced by human subjects in previous experiments reveals that already in the “perceptive” phase (i.e. before actually drawing the figure to demonstrate what they have perceived), their actions amount to “drawing” the figure. This being so, the prime object of modelling active perception must be to characterize the actions of the subjects. More precisely, the aim is to identify the strategies of action deployed by the subjects. A model, in this sense, consists of making fully explicit the process by which a subject generates his actions, and in particular of specifying how the sensory feedback is used to guide future actions. In this context, computer simulation is an invaluable tool, since it guarantees that the specification of the way in which actions are generated (including their modification according to sensory feedback) corresponds directly to a functional computational algorithm. We have observed in previous experiments (Lenay et al., 1999) that the traces produced by relatively naive human subjects are highly erratic, and in fact not very reliable in terms of effectively perceiving the 2-D figures; it would be not only difficult, but arguably not very pertinent, to try to model such chaotic flounderings.
After a suitable period of learning (repeated sessions of an hour or more, over several weeks), the “strategy of action” deployed by a given human subject stabilizes into a regular, reliable form. The aim of this paper is to characterize one of these strategies. As explained above, we have adopted a “reciprocal modelling” approach in which the learning process of the human subject was itself informed by knowledge of strategies that the computer algorithm showed to be effective. In general, even in the simplest cases, there is no single solution to the problem of specifying a functionally adequate strategy. The task of specifying a strategy is, typically, an inverse problem. Once a strategy is fully specified, it is a matter of straightforward deduction to determine whether or not the strategy is successful in reliably achieving perception; but there is no direct, deductive path leading from the problem to its solution. All that can be done is to imagine a hypothetical solution speculatively, and then to test deductively whether it actually is a solution. However, inverse problems generically have an awkward property: there is no a priori guarantee that even a single solution exists; but if one solution does exist, then there is usually a plurality (often an unlimited plurality) of solutions. The scope of the present paper is limited to a pragmatic existence proof, i.e. demonstrating by example that there is at least one functionally adequate strategy of action. Exploration of the full plurality of action strategies employed by human subjects, including an attempt to organize them categorically, will be the object of future work.

2.2. THE SYSTEM OF CO-ORDINATES

Another methodological consideration, which conditions the very possibility of writing down a computational algorithm, concerns the system of co-ordinates in which the actions are to be described and generated.
An exhaustively realistic model of a human subject would be based on the anatomical articulations of the fingers, hand, wrist, elbow, shoulder and torso, and on the multiple muscles which determine their movements. The control of movement of the poly-articulated human body is a fascinating subject; but in the present case, a model of this sort would be excessively complicated. An appropriate simplification stems from the fact that all the muscular actions of the human subjects are resolved into displacements of the tip of the pen. Empirical support for this simplification is provided by the finding that composite movements in a trajectory modification task were better described by a superposition scheme applied to pen-tip kinematics than by one applied to joint kinematics (Schillings et al., 1996). Thus, in order to calculate the sensory feedback consequent upon any given action, it is necessary and sufficient to specify the resultant position of the tip of the pen in the discrete Cartesian co-ordinates, {ix, iy}, of the bitmap in which the pixels of the figure to be perceived are stored in the computer. This suggests the very simple possibility of employing just these co-ordinates in order to describe the actions of the subject. However, upon consideration it rather rapidly becomes evident that this possibility does not respect constraint (b) above, i.e. human subjects are quite incapable of functioning in this system of co-ordinates as such. In “motor” mode, a human subject is quite incapable of placing the tip of the pen on, for example, the point {37, 71}; and in “proprioceptive” mode, equally incapable of specifying the Cartesian co-ordinates of the tip of the pen at any given moment. The difficulty here is not reducible simply to an unknown but constant “scaling factor” between the external, objective referent {ix, iy} and an internal subjective representation of this referent.
The fact is that human subjects simply do not have any subjective representation of this kind, at least not with a precision sufficient to be functionally operational in the experimental situation as set up here. This is shown rather conclusively by the following observation. Suppose that the tip of the pen is placed at a certain position, {ix, iy}, which is part of the figure, so that the sensory feedback is positive. If the subject now inadvertently moves the tip more than a few pixels away, he is quite incapable of reliably returning to the initial position in order to recover the stimulus. In other words (as any novice who has tried the experiment will amply testify), it is fatally easy to become “lost”, i.e. to find oneself in a situation where, after an initial “contact” with the figure, there is no sensory feedback and one has little or no idea how to recover it; and even if, after some desperate flailing around, one does succeed in again obtaining a positive sensory stimulation, there is no recognizable relation to the previous contact, so that the task of systematically perceiving the figure has to start all over again. We will come back to this observation, since it places a major constraint on functionally effective strategies of action. For the moment, we retain simply the conclusion that human subjects do not function directly in {ix, iy} co-ordinates. If not {ix, iy}, what then is the system of co-ordinates employed by human subjects who succeed in the perceptual task? For the purposes of modelling, we have employed the hypothesis (not so far refuted by empirical evidence) that human subjects possess a sense (i) of their absolute orientation (relative to the torso, which can be held steady), which we designate by the symbol “θ”; and (ii) of the relative distance of any given movement with respect to their previous position, which we designate by the symbol “d”.
This sensory-motor capacity is, of course, subject to error; if it were not, the subjects would be able, by path integration, to function correctly (according to all external appearances) in the {ix, iy} co-ordinate system. We will address the question of the quantitative estimation of the error involved in the next section (2.3). Two points are worthy of note here. The first is that in this subject-centered co-ordinate system human subjects have built up, via a developmental process since birth, a reliable two-way mapping between their motor actions and their proprioceptive sensations (Gullaud and Vinter, 1996). In other words, if a human subject makes a movement, he knows what that movement is; and conversely, if a movement is induced passively (in the present case, if the tip of the pen is moved by an external force), the subject not only perceives the movement, but is immediately able to reproduce it by a voluntary motor action. It is an open question how this two-way mapping is implemented neurophysiologically. There are three general options here. The first is that the contribution of action to perception is exhausted by proprioceptive feedback in the form of sensory feedback from the muscles. The second is that the contribution of action is exhausted by efferent copies of the motor commands that determine where the tip of the pen was commanded to go. The third is that the contribution of action is a combination of the first two. These options are described at greater length in Mandik (1999). The present paper makes no claim to contribute to the identification of these underlying mechanisms; we simply observe, empirically, that human subjects do have functional knowledge of the movements of the tip of the pen when these are expressed in {θ, d} co-ordinates (but not when expressed in {ix, iy} co-ordinates).
As already remarked, this involves a higher-level integration of the many muscles involved in finger, hand, wrist, elbow and shoulder movements. It is likely that muscular proprioception and efferent copies of motor commands are both integrated in this process (i.e. the third option above is plausible), but we have no evidence on this point. To sum up, the work presented here neither makes nor requires a commitment to one of the three options outlined above. The second point is even more important and instructive. Maturana and Varela (1987) have emphasized the importance, in cognitive science generally, of making a clear distinction between (i) the “objective” situation, as it can be described in a third-person perspective by an external observer; and (ii) the “subjective” situation, as it is accessible in a first-person perspective to the cognitive subject in his own terms. According to the tenets of the computational-representational paradigm, cognition consists of establishing a bijective isomorphism between these two terms such that (ii) is a “representation” of (i); but this Maturana and Varela vigorously deny. Although there must be some sort of relationship between the two terms, in order to satisfy a very general reality principle, they are not prima facie commensurable, so that the question of a simple mapping relationship between them simply does not arise. This thematic opposition, between the situation as expressed (i) in the terms of an external observer and (ii) in the terms of the cognitive subject, is tellingly illustrated in the present context by the contrast between the two co-ordinate systems, {ix, iy} on the one hand and {θ, d} on the other.
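The contrast between the two co-ordinate systems can be illustrated with a small simulation. The Python sketch below integrates a sequence of intended {θ, d} moves twice: once exactly (the subject’s own account of where the pen should be) and once with each move perturbed by a relative error (where the pen actually goes, in {ix, iy}). The heading convention (degrees clockwise from North, with screen y growing downwards), the uniform noise model and the function names are assumptions made for illustration; the value 0.3 simply echoes the order of magnitude estimated in Section 2.3.

```python
import math
import random


def displacement(theta_deg: float, d: float) -> tuple:
    """Displacement of a move of length d at heading theta (degrees clockwise from North;
    screen co-ordinates, so y grows downwards)."""
    rad = math.radians(theta_deg)
    return d * math.sin(rad), -d * math.cos(rad)


def dead_reckoning(moves, rel_error: float, rng: random.Random):
    """Return (believed, actual) end points of a sequence of intended (theta, d) moves.

    believed: exact path integration of the intended moves.
    actual:   each move perturbed by a uniform error of up to +/- rel_error
              (radians on the heading, relative on the distance); an assumed noise model.
    """
    bx = by = ax = ay = 0.0
    for theta, d in moves:
        dx, dy = displacement(theta, d)
        bx, by = bx + dx, by + dy
        noisy_theta = theta + math.degrees(rng.uniform(-rel_error, rel_error))
        noisy_d = d * (1.0 + rng.uniform(-rel_error, rel_error))
        ex, ey = displacement(noisy_theta, noisy_d)
        ax, ay = ax + ex, ay + ey
    return (bx, by), (ax, ay)


square = [(90, 10), (180, 10), (270, 10), (0, 10)]   # right, down, left, up
exact, actual = dead_reckoning(square, 0.0, random.Random(0))
print(math.dist(exact, actual))                      # 0.0: identical without noise
exact, actual = dead_reckoning(square * 5, 0.3, random.Random(1))
print(math.dist(exact, actual))                      # positive: the two have drifted apart
```

Without noise the two accounts coincide exactly; with noise, the gap between them is not corrected by anything in the {θ, d} description itself.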
The existence of some relationship is exemplified by their identity under path integration; but the absence of any intrinsic mapping is illustrated by the fact that, in the presence of “noise” due to the limited precision of θ and d, the two tend inexorably to drift apart.

2.3. ESTIMATION OF PRECISION

As a final methodological point prior to the specification of a particular strategy of action, it will be useful to deal with the question of a quantitative estimation of the error in θ and d in typical human subjects. These errors can be estimated from previous experiments using the same experimental setup, in which human subjects drew their perceptions of simple figures composed of one or more straight segments (Hanneton et al., 1999; Lenay et al., 1999; Gapenne et al., 2001). For convenience, the angles are expressed in radians, and the standard deviation is multiplied by 100 to obtain a “percentage” error, δθ. Similarly, the precision of d is expressed as a percentage error of the mean length, δd.² The results (for 166 segments) are the following: δθ = ±18%; δd = ±25%; the total error is thus ±31%, consistent with combining the two in quadrature, √(18² + 25²) ≈ 31 (unpublished results). These values confirm the unreliability of “dead reckoning” by path integration as a functional equivalent of co-ordinates in {ix, iy}: for displacements greater than 10 pixels, with respect to a line 2 pixels thick, the subject cannot be sure of “recovering” the line (an error of about 30% on a 10-pixel displacement already amounts to some 3 pixels, more than the thickness of the line). This degree of precision also sets a constraint on “viable” strategies, which must be robust with respect to errors of this order of magnitude. In all the simulations reported below, the movements of the tip of the pen are subject to a pseudo-random error of ±30%.

3. Definition of a Specific Model Strategy

The complexity of a strategy adequate to achieve active perception of a given form depends on the complexity of the form in question. The approach adopted here is to proceed progressively.
The first stage is to achieve perception of a single straight line, and then to pass, in order, to curved lines, broken lines with obtuse angles, and broken lines with acute angles. The strategy to be presented here successfully copes with this range of complexity. Even greater complexity arises with the introduction of topological singularities: T-junctions and X-crosses. Such figures lie beyond the scope of the present paper, and will be addressed in future work.

3.1. A BASIC STRATEGY FOR A SINGLE STRAIGHT LINE

We are now in a position to specify a particular strategy, starting with the perception of a single straight line. Even here, the strategy consists of a number of successive sub-components. For each of these steps, we will systematically distinguish: (i) a description from the point of view of an external observer, i.e. in the co-ordinate system {ix, iy}; this is not only the framework for Figures 1–8 illustrating the traces produced, it is also useful for intuitively formulating the sub-task to be achieved at each stage; and (ii) an operational description of the “strategy” in terms of {θ, d}, which is the basis for the computational algorithm actually employed to generate the traces, and which is putatively a description of the strategy deployed by the human subject.

3.1.1. The Broad Scan

In terms of {ix, iy}, the first task is to “find” the figure, i.e. (for the subject) to obtain a positive sensory feedback. Starting (conventionally) in the top left-hand corner of the total field, the strategy implemented by the computer algorithm consists of “scanning” the field by a series of near-horizontal but gradually descending sweeps. Operationally, expressed in terms of θ and d, the algorithm is the following.
The first action is a movement of distance d₁ equal to the field width, at an angle of 90° + 10° (measured in degrees, clockwise from due North as 0°); this is a near-horizontal sweep, descending at an angle close to the probable error. If no positive sensory stimulation is encountered, the next sweep is at an angle of (270 − 10)°, i.e. horizontally in the reverse direction, with the same angle of descent. This procedure is repeated until a positive sensory stimulation is obtained. The upper part of Figure 1 illustrates a typical result.

3.1.2. The Fine Scan

In terms of {ix, iy}, the second task is to get close to the figure. This is achieved, in operational terms, by reversing the direction of the previous sweep but without descending (i.e. an orientation of 270° or 90°, as the case may be); the distance, however, is reduced to (about) 50% of the previous distance (i.e. dᵢ₊₁ = 0.5 dᵢ). If a positive sensory stimulation is encountered, the direction is again reversed by 180°; if not, the direction is maintained; in either case, the distance is again reduced by 50%. This procedure is iterated until dₙ is less than some constant preset value, designated by the parameter “amp”. A suitable value for amp is the radius of the circle (see 3.1.3 below), taken as being the same as the amplitude of the micro-sweep (see 3.1.4 below). As a precaution, if the last micro-sweep (with a value of dₙ which has fallen below amp) did not itself already cross the figure, it is prudent to prolong the procedure for one final iteration, but with an unreduced value of dₙ; logically, this micro-sweep should effectively encounter the figure. If it does not, the strategy has failed. From the point of view of an external observer, when this iterative algorithm terminates, the point of the pen will be close to a segment of the figure; for what follows, it will be convenient to label this “point B”.
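The broad and fine scans can be sketched in Python as follows. The sketch assumes that the figure is a set of black pixel co-ordinates, that each broad sweep is carried to completion (a contact merely being registered en route), and that contact is tested every half pixel along a movement; headings are taken as degrees clockwise from North with screen y growing downwards, so that 100° from the top left-hand corner points right and slightly down. The function names, the sampling step and the uniform noise model are illustrative choices, not the authors’ code. Seen this way, the fine scan is a bisection: after each sweep the line lies within the last distance travelled, behind the pen if the sweep crossed it and ahead of it otherwise.

```python
import math
import random


def displacement(theta_deg, d):
    """Move of length d at heading theta (degrees clockwise from North; y grows downwards)."""
    rad = math.radians(theta_deg)
    return d * math.sin(rad), -d * math.cos(rad)


def sweep(black, pos, theta, d, rng, rel_error=0.0, sample=0.5):
    """Carry out one intended (theta, d) movement; return (end point, contact?)."""
    if rel_error:  # assumed noise model: uniform error on heading (radians) and on distance
        theta += math.degrees(rng.uniform(-rel_error, rel_error))
        d *= 1.0 + rng.uniform(-rel_error, rel_error)
    dx, dy = displacement(theta, d)
    n = max(1, math.ceil(d / sample))
    contact = any((math.floor(pos[0] + dx * k / n), math.floor(pos[1] + dy * k / n)) in black
                  for k in range(1, n + 1))
    return (pos[0] + dx, pos[1] + dy), contact


def broad_scan(black, width, height, rng, rel_error=0.0, tilt=10.0):
    """Full-width sweeps at 90+tilt and 270-tilt degrees until a contact occurs."""
    pos, heading = (0.0, 0.0), 90.0 + tilt
    while pos[1] < height:
        pos, contact = sweep(black, pos, heading, width, rng, rel_error)
        if contact:
            return pos, heading
        heading = 360.0 - heading          # 100 -> 260 -> 100 ...
    return None                            # reached the bottom without finding the figure


def fine_scan(black, pos, last_heading, last_d, amp, rng, rel_error=0.0):
    """Horizontal half-length sweeps, reversing after each contact; returns point B."""
    heading = 270.0 if last_heading < 180.0 else 90.0   # reverse, without descending
    d = last_d / 2.0
    while True:
        pos, contact = sweep(black, pos, heading, d, rng, rel_error)
        if d < amp:
            if not contact:                # precaution: one more sweep, distance not reduced
                pos, contact = sweep(black, pos, heading, d, rng, rel_error)
            return pos if contact else None  # None: the strategy has failed
        if contact:
            heading = (heading + 180.0) % 360.0
        d /= 2.0


rng = random.Random(0)
line = {(x, y) for x in (60, 61) for y in range(30, 91)}   # a vertical figure
pos, heading = broad_scan(line, 100, 100, rng)
print(fine_scan(line, pos, heading, 100, 4.0, rng))       # a point on columns 60-61
```

Replacing `line` by a horizontal band of pixels reproduces the failure discussed in the text: the descending broad scan does meet it, but the horizontal fine-scan sweeps never find it again, and `fine_scan` returns `None`.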
Typical traces for both steps 3.1.1 and 3.1.2 are shown in the upper part of Figure 1. We may make here a remark which we shall have occasion to repeat: an important quality of this strategy is that at any given point in time, the subject “knows” (from the observer’s near-omniscient point of view) whether he is to the left or to the right of the figure, so that he is never “lost”. It might be remarked that this procedure will fail if the figure consists of a single near-horizontal straight line: either the broad scan (3.1.1) will arrive at the bottom of the field without ever having encountered the figure, or (if one such encounter has occurred) in the fine scan (3.1.2) the figure will never be found again. In simulations where the figure was deliberately set to this singular form, that is indeed what occurred. In this case, there is nothing for it but to begin again with step 3.1.1 from the top left-hand corner, but this time with near-vertical sweeps, i.e. angles θ of successively (180 − 10)° and (0 + 10)°.

3.1.3. The Circle

Having “found” the figure, the next task will be to “track” the line or curve. However, in order to do this (see 3.1.4 below), it will be necessary to have a current estimate of the orientation of the line at time t, denoted by θ(t). This estimate will be continuously updated by the tracking procedure itself, but in order to start the process an initial estimate is necessary. In the present strategy, this is achieved by tracing a circle of radius amp, centred at point B. To be quite explicit: this involves augmenting the repertoire of actions of which subjects are presumed capable, to include the capacity of drawing a circle around a given point {cx, cy}. This is not in contradiction with the supposed incapacity of subjects to function in {ix, iy} co-ordinates, since the latter are absolute (i.e. do not drift over time), whereas the {cx, cy} co-ordinates (expressed as real numbers in the algorithm) are always relative, and thus do drift over time.
The question as to whether human subjects do or do not have the capacity of drawing local figures (here, the circle; in step 3.1.4 below, a sine curve) with respect to a relative, local point {cx, cy} is an empirical one. Although most human beings, unlike Leonardo da Vinci, are not able to trace perfect circles, the approximate circles centred on a given point that are within their capacity prove quite adequate in practice. Normally, the circle thus drawn will produce two positive sensory stimulations per revolution. If the temporal rhythm of the two sensory stimulations “limps” (i.e. in terms of the external observer, the two points of contact with the figure are not diametrically opposite each other), the “circle” is traced again, with its centre shifted towards the side where the two points are closer together (spatially closer for the external observer, temporally closer for the subject). This procedure is iterated until the two points are diametrically opposite (i.e. the rhythm is regular, to a prespecified degree of precision). These two points then indicate the orientation of the line, thus providing the requisite initial estimate of θ(t); the strategy has the additional advantage that the centre of the circle is itself “centred” on the line. If a complete revolution produces just one positive sensory stimulation, this indicates that the position is at the end of the line (cf. 3.1.5 below); in this case, another circle is traced, with its centre at the previous point of contact; by iteration (if necessary), this leads to the previous case with two points of contact. If the circle makes no contact with the figure, the strategy has failed; in the simulations, the strategy is robust and this practically never happens. Typical traces generated by this algorithm are also illustrated in Figure 1. This “circle” strategy provides a clear example of the reciprocal modelling option described in 1.2.
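The circle procedure can be sketched in Python as below. As in the scanning sketch, the figure is assumed to be a set of black pixels and phases are in degrees clockwise from North with screen y growing downwards; noise is omitted for brevity. The names, the sampling of 360 points per revolution, the 10° tolerance used to decide that the rhythm no longer “limps”, and the rule that the centre is shifted by an amount proportional to the departure of the contacts from diametrical opposition are all assumptions chosen for illustration.

```python
import math


def on_circle(c, r, phase_deg):
    """Point at the given phase on a circle of radius r centred on c (clockwise from North)."""
    rad = math.radians(phase_deg)
    return c[0] + r * math.sin(rad), c[1] - r * math.cos(rad)


def contact_phases(black, c, r, n=360):
    """Trace one revolution; return the phase (degrees) at the middle of each contact."""
    hits = [(math.floor(x), math.floor(y)) in black
            for x, y in (on_circle(c, r, 360.0 * k / n) for k in range(n))]
    if all(hits) or not any(hits):
        return []
    start = hits.index(False)              # begin in a gap so no contact is split in two
    phases, run = [], []
    for j in range(n):
        k = (start + j) % n
        if hits[k]:
            run.append(k)
        elif run:
            phases.append(360.0 * run[len(run) // 2] / n)
            run = []
    if run:
        phases.append(360.0 * run[len(run) // 2] / n)
    return phases


def circle_estimate(black, c, r, tol=10.0, gain=0.5, max_iter=20):
    """Return (centre, line orientation in degrees modulo 180), or None on failure."""
    for _ in range(max_iter):
        phases = contact_phases(black, c, r)
        if len(phases) == 1:               # end of the line: recentre on the single contact
            c = on_circle(c, r, phases[0])
            continue
        if len(phases) != 2:               # no contact (or an unexpected pattern): failure
            return None
        a, b = phases
        sep = (b - a) % 360.0
        if abs(sep - 180.0) <= tol:        # regular rhythm: contacts diametrically opposite
            return c, a % 180.0
        # limping rhythm: move the centre towards the side where the contacts are closer
        towards = a + sep / 2.0 if sep < 180.0 else a + sep / 2.0 + 180.0
        c = on_circle(c, gain * r * abs(sep - 180.0) / 180.0, towards)
    return None


line = {(x, y) for x in (60, 61) for y in range(30, 91)}
print(circle_estimate(line, (61.0, 92.0), 4.0))   # ((61.0, 84.0), 0.0)
```

In the usage example the first circle lies beyond the end of the line and produces a single contact, so the centre is moved twice onto the line before two diametrically opposite contacts give the (vertical, 0°) orientation.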
In the case of the traces spontaneously produced by human subjects, the circle strategy is rarely employed, and even then only sporadically. However, the computer algorithm demonstrates the usefulness and robustness of this strategy, since it is successfully employed not only here, but also at subsequent points in the strategy (see 3.1.5 and 3.2.2 below). In general terms, it is a useful diagnostic strategy at moments of uncertainty. Moreover, it is well within the capabilities of a human subject to deploy this strategy systematically, as illustrated in Figures 4, 6 and 8.

3.1.4. Tracking
From the point of view of the external observer, it might be thought that once the tip of the pen is placed on the line, with a knowledge of its local orientation, all that remains to do is to follow the line by seeking to remain directly on it (i.e. to produce a continuous tactile stimulation as sensory feedback). This strategy is indeed attempted by some novice human subjects, but it has a serious drawback: when the sensory feedback ceases (as inevitably happens, and in practice sooner rather than later), the subject does not know why (from the external observer’s point of view, the subject does not know on which side of the line the deviation has occurred). The subject is therefore in great danger of being “lost”, and of having to start all over again. The strategy presented here therefore consists of quite deliberately crossing the line. Operationally, the action consists of drawing a sine curve about the point {cx, cy}. More specifically, movements orthogonal to the current orientation θ(t) are the same as those involved in drawing a circle (cf. 3.1.3 above); movements in the direction of θ(t) are produced by advancing the current point {cx, cy} at constant velocity in this direction; the composition of these two movements effectively produces a sine curve with amplitude amp. We define a “unit of action” as a single micro-sweep crossing the line (i.e. advancing 180° in the phase of the sine curve).

Due to imprecision in the initial estimate of θ(t), and even more importantly due to inevitable “noise” (i.e. drift in the position of {cx, cy} with respect to perfect path integration; recall that noise is set at +30% in these simulations), the external observer sees that the sine curve becomes progressively decentred with respect to the line. For the subject, this is reflected by the fact that the positive tactile stimulation does not occur at the mid-point of the sine curve. In the algorithm, the discrepancy between the actual occurrence of the positive tactile stimulation and its ideal mid-point position is used to generate two corrections, applied after each unit of action: (i) a lateral shift of {cx, cy} in a direction orthogonal to θ(t); and (ii) an adjustment to θ(t), calculated as the lateral shift from (i) divided by the distance that {cx, cy} has advanced in the direction of θ(t) in the course of the “unit of action”. The gains on each of these corrections are variable parameters; for suitable values of the gains, the algorithm is robust in maintaining an advancing sine curve centred on the line of the figure to be perceived. The non-triviality of this result is illustrated by the fact that if the value of either parameter is too low or too high, in the presence of noise the sine curve “loses” the line. A typical trace with optimal values for the gain parameters is shown in Figure 1. The crucial importance of these feedback correction procedures is illustrated in Figure 2a, b. In Figure 2a, the gain parameter for the lateral shift is set to zero. The problem here is similar to that of a novice car driver: if no lateral shift correction is possible, and corrections to orientation are made solely as a function of lateral deviation, the corrections “overshoot” and give rise to a divergent series.
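A minimal simulation may help to make the two corrections concrete. It follows only the lateral offset y and the orientation error θ of the centre {cx, cy} with respect to a straight test line (the x-axis, an assumption), each unit of action being a half-period of the sine curve. The contact phase u ∈ [0, 1] is found numerically; the lateral offset of the pen at that moment (zero when contact falls at the mid-point u = 0.5) drives correction (i), scaled by g_shift, and correction (ii), scaled by g_theta and divided by the advance per unit. Using the measured offset, rather than the already-scaled shift, in (ii) is our reading of the text, chosen so that g_shift = 0 still leaves orientation corrections active, as in Figure 2a. All names, the gain values and the Gaussian drift_sd term are hypothetical.

```python
import math
import random


def contact_phase(y, theta, side, amp, step, samples=200):
    """Fraction u in [0, 1] of a micro-sweep at which the pen crosses the
    test line y = 0, or None if the sweep yields no tactile stimulation.

    The centre starts at lateral offset y with heading theta and advances
    by `step`; the pen swings along the normal to theta from +amp to -amp
    (side = 1) or from -amp to +amp (side = -1), following a half cosine.
    """
    def pen_y(u):
        lateral = side * amp * math.cos(math.pi * u)
        return y + u * step * math.sin(theta) + lateral * math.cos(theta)

    prev_u, prev_v = 0.0, pen_y(0.0)
    for i in range(1, samples + 1):
        u = i / samples
        v = pen_y(u)
        if prev_v == 0.0 or prev_v * v < 0:
            lo, hi = prev_u, u  # refine the first sign change by bisection
            for _ in range(50):
                mid = (lo + hi) / 2
                if pen_y(lo) * pen_y(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return (lo + hi) / 2
        prev_u, prev_v = u, v
    return None


def track(y0, theta0, units, amp=1.0, step=1.0, g_shift=1.0, g_theta=0.5,
          drift_sd=0.0, seed=0):
    """Run `units` micro-sweeps; return (y, theta) after each one, stopping
    early if a unit of action produces no sensory return."""
    rng = random.Random(seed)
    y, theta, side = y0, theta0, 1
    history = []
    for _ in range(units):
        u = contact_phase(y, theta, side, amp, step)
        if u is None:
            break  # silent micro-sweep: see 3.1.5
        # Pen offset along the normal at contact; zero at the mid-point.
        lateral = side * amp * math.cos(math.pi * u)
        y += step * math.sin(theta)               # advance along theta
        y += g_shift * lateral * math.cos(theta)  # correction (i)
        theta += g_theta * lateral / step         # correction (ii)
        y += rng.gauss(0.0, drift_sd)             # hypothetical motor noise
        side = -side                              # next sweep goes back
        history.append((y, theta))
    return history
```

With the default gains and no noise, the linearised error dynamics have complex eigenvalues of modulus 1/2, so offset and orientation errors roughly halve with each unit of action. With g_shift = 0 the same linearisation gives a modulus of √(1 + g_theta/2), greater than 1, consistent with the divergent series described for Figure 2a; with g_theta = 0 one eigenvalue equals 1, so an orientation error is never corrected, as in Figure 2b.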
In Figure 2b, the gain parameter for the adjustment to current orientation is set to zero; the degradation in tracking capacity is even worse, thus emphasizing the crucial importance of the current estimate of θ(t) which, if not continually corrected, will drift due to noise. It is clear that in both cases the robustness of the “tracking” procedure is severely compromised. This correction procedure provides another illustration of the reciprocal modelling option. Spontaneously, human subjects performing micro-sweeps during tracking produce functionally equivalent corrections without lifting the pen from the surface, i.e. the trace during successive micro-sweeps is continuous. The correction procedures as implemented by the computer algorithm, however, give rise to a trace which displays “breaks” between each “micro-sweep” unit of action. This is illustrated schematically in Figure 3, and can be seen in the results of actual simulations in Figures 1–3, 5 and 8. At this point in the work presented here, the question arose quite concretely as to whether the computer algorithm should be modified to model the spontaneous traces of human subjects accurately, or whether the strategy, and hence the traces, of the human subject should be modified in order to conform to the algorithmic model. The former possibility would not have been radically impossible, although quite difficult and “messy”; in practice, it was deemed simpler to adopt the latter approach, i.e. to modify the spontaneous human strategy in order to conform to the model. As illustrated in Figures 4, 6 and 8, this was indeed well within the capabilities of the human subject. Finally, to complete the “tracking” module of the strategy, it is supposed in the algorithm that the subject can store in memory a record of θ(t) and d(t) (i.e. the cumulative distance that {cx, cy} has advanced along the line) each time the trace crosses the figure: a pair of values (θi, di) is thus recorded for each “unit of action” i.

3.1.5. End of Line
The “tracking” module 1.4 is iterated as long as each “unit of action” produces one and only one positive sensory stimulation. We will deal with the case of more than one positive sensory stimulation later. Here, the case to be dealt with is when a unit of action does not produce any positive sensory stimulation in return. From the point of view of an external observer, this will normally occur because the trace has reached the end of the line to be perceived; but it may also be because the tracking sine curve has inadvertently drifted to one side or other of the line (or, as we shall see in 3.2.2, because the figure consists of a broken line with a relatively sharp angle of less than about 150°). Operationally, the procedure in this case is: (i) “backtrack” the last (empty) unit of action; since the total distance involved is small, of the order of 2·amp, the error involved is small. (ii) From this point, draw a circle centred on the current value of {cx, cy}. If a complete revolution produces just a single positive tactile stimulation, this is a topologically reliable indication that the end of the line has indeed been reached. (The case in which the circle produces two distinct positive tactile stimulations will be dealt with below in 3.2.2.)

3.1.6. Return Tracking
If the hypothesis “end of line” is confirmed by the previous step (3.1.5), the tracking procedure is repeated in the reverse direction, using the same procedure as in 3.1.4. The sole difference is that when storing values of (θi, di), the cumulative distances di must be incrementally decreased instead of increased as in 3.1.4. This procedure is continued until an “end of line” is reached as in 3.1.5. The whole procedure of “return tracking” can be repeated as often as desired.

3.1.7. Drawing What Has Been Perceived
Using the stored values of (θi, di), the final step consists of drawing what has been “perceived”. In the computer simulations, the figure is drawn as many times as there are return trackings. This set of procedures, 3.1.1–7, specifies the basic strategy for the perception of simple lines. A typical result produced by the algorithm as specified is illustrated in Figure 1. This figure is to be compared with Figure 4, which shows the analogous traces produced by a human subject employing the same qualitative strategy.

Figure 1. The traces produced by the computer algorithm specified in the text, 3.1.1–7. The top part of the figure shows the broad scan 1.1 until a first sensory return (small open circle) is obtained; the fine scan (1.2); the circle (1.3), the tracking procedure (1.4) and the “end of line” circle (1.5). The return trackings (1.6) are now shown. The middle part of the figure shows a drawing of the figure as perceived by the algorithm (1.7), based on four successive return trackings. The lower part of the figure shows the data stored in memory, i.e. a record of current orientation “Theta” as a function of cumulative distance.

3.2. COMPLEX FIGURES
3.2.1. Curves and Wide Angles
For smooth curves, the strategy described in 3.1.4, i.e. (i) a lateral shift of {cx, cy} in a direction orthogonal to θ(t) and (ii) an adjustment to θ(t), with appropriate values for the gain parameters, is sufficiently robust to ensure the success of the “tracking” procedure without any additional sophistication. This is the case even if the radius of curvature is quite small, resulting in a tight curve. A typical result produced by the computer algorithm is shown in Figure 5; this can be compared with the analogous traces produced by a human subject shown in Figure 6. The same algorithm also works without modification for broken lines if the angle is very wide, approximately in the range 150°–180° (not shown).
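The drawing step of 3.1.7 can be read as a path integration over the stored record: each increment of cumulative distance is laid down in the direction of the corresponding orientation. The sketch below assumes that every θi is expressed relative to the direction of increasing d, so that the decreasing di of a return tracking simply retrace the figure backwards, and that the drawing starts from an arbitrary origin; draw_from_memory is a hypothetical name. It also shows why a curve needs no special treatment in memory: an arc of a circle is just a record in which θ changes linearly with d, as noted in the legend to Figure 5.

```python
import math


def draw_from_memory(record):
    """Rebuild a drawing from stored (theta_i, d_i) pairs.

    Each step from d_{i-1} to d_i is drawn as a straight segment of length
    d_i - d_{i-1} in direction theta_i. The origin is arbitrary: nothing in
    the record refers to absolute {ix, iy} co-ordinates.
    """
    x = y = 0.0
    points = [(x, y)]
    for (_, d_prev), (theta, d) in zip(record, record[1:]):
        step = d - d_prev  # negative during a return tracking
        x += step * math.cos(theta)
        y += step * math.sin(theta)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # A right angle: 5 units along theta = 0, then 5 along theta = pi/2.
    print(draw_from_memory([(0.0, 0.0), (0.0, 5.0), (math.pi / 2, 10.0)])[-1])
    # A quarter circle of radius 2 traversed clockwise: theta decreases
    # linearly with d, and the end point comes out close to (2, 2).
    n, r = 1000, 2.0
    arc = [(math.pi / 2 - i * (math.pi / 2) / n, i * r * (math.pi / 2) / n)
           for i in range(n + 1)]
    print(draw_from_memory(arc)[-1])
```

Because each drawing is integrated from its own record, small errors in the stored θi accumulate along the figure; this is one way of seeing why successive drawings in Figures 5 and 7 drift relative to one another.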
Figure 2. (a) The situation is the same as in Figure 1, but the gain parameter specifying the lateral shift correction (see 3.1.4) has been set to zero. It can be seen that the “tracking” procedure is severely compromised; the algorithm actually ends by “losing” the line altogether. (b) The situation is the same as in (a), but here it is the parameter specifying the corrections to current orientation Theta that is set to zero. The degradation of the tracking procedure is even worse than in (a), as seen in the lower part of the figure, where the spread of values of Theta is clearly inadequate (cf. the closely grouped values of Theta in Figure 1, where the correction procedures are in operation).

Figure 3. A schematic illustration of the effects of the algorithmic correction procedures (lateral shift and adjustment of orientation) described in 3.1.4. During the first unit of action, on the left, the point {cx, cy} moves from C1 to C2, producing a sine-wave trace from T1 to T2. If the positive tactile stimulation is encountered well before the midpoint of the trace (indicated here by the open circle), the correction procedure will introduce a lateral shift upwards, together with an adjustment of the orientation, so that the point {cx, cy} will move from C3 to C4 with a corresponding trace T3–T4. This procedure clearly introduces a discontinuity between T2 and T3.

Figure 4. The traces produced by a human subject employing a strategy qualitatively similar to that underlying the computer algorithm illustrated in Figure 1. The upper part of the figure shows the broad scan, the fine scan, the initial circle, tracking, and the “end of line” circle (3.1.1–5). The next part of the figure illustrates a “return tracking” (1.6); then a drawing of the figure as perceived by the subject; finally, at the bottom, the “true” figure to be perceived.

Figure 5. The traces produced when the computer algorithm specified in 3.1.1–7 is applied, without further modification, to a figure which is a fairly tight three-quarter arc of a circle. Upper left: the traces produced on one of the “return trackings” (3.1.6); upper right: drawing of the figure “as perceived” for a total of five return trackings. The “drift” due to the fact that the algorithm has no internal representation of absolute position in {x, y} co-ordinates is clearly visible. The lower part of the figure shows the stored memory values in {θ, d} co-ordinates; the arc of a circle appears in these co-ordinates as a linearly decreasing function.

Figure 6. The traces produced by a human subject in a situation analogous to that in Figure 5. The upper part of the figure shows a “return tracking” (1.6); lower left is the “true” figure to be perceived, and lower right a drawing of the figure as perceived.

3.2.2. Sharp Angles
For sharper angles, roughly in the range 60°–150°, the same strategy 3.1.4 can occasionally work, depending on the contingencies of the approach to the angle. However, with increasing frequency as the angle becomes sharper, the change of direction in the line gives rise to a situation in which the micro-sweep unit of action described in 3.1.4 fails to produce a sensory return. The algorithmic procedure in this case is the same as described in 3.1.5 (as indeed it must be, since the subject does not know what the situation is as seen by an external observer; from his point of view it is quite possible that an “end of line” has been reached). In other words, the procedure is again: (i) “backtrack” the last (empty) unit of action, and (ii) from this point, draw a circle centred on the current value of {cx, cy}.
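Because the subject cannot tell an end of line from a change of direction before tracing the circle, the decision after a silent micro-sweep can be written so that it depends only on the number of contacts felt during one revolution. In the sketch below the figure is assumed, for illustration, to be a polyline given as a list of vertices; circle_segment_hits and count_contacts stand in for the sensory feedback and belong to the external observer's side, while diagnose uses nothing but the resulting count. All names and verdict labels are hypothetical.

```python
import math


def circle_segment_hits(centre, r, p0, p1):
    """Number of points at which a circle of radius r meets segment p0-p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    fx, fy = p0[0] - centre[0], p0[1] - centre[1]
    # Solve |p0 + t*(p1 - p0) - centre|^2 = r^2 for t, keep 0 <= t <= 1.
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return 0
    sq = math.sqrt(disc)
    roots = {(-b - sq) / (2 * a), (-b + sq) / (2 * a)}  # set merges tangents
    return sum(1 for t in roots if 0.0 <= t <= 1.0)


def count_contacts(centre, r, polyline):
    """Stimulations felt in one revolution (a contact exactly on a shared
    vertex would be counted once per segment in this simple sketch)."""
    return sum(circle_segment_hits(centre, r, polyline[i], polyline[i + 1])
               for i in range(len(polyline) - 1))


def diagnose(centre, theta, step, r, polyline):
    """After a silent micro-sweep: (i) backtrack the empty unit of action,
    (ii) trace a circle and interpret the number of contacts."""
    cx = centre[0] - step * math.cos(theta)
    cy = centre[1] - step * math.sin(theta)
    n = count_contacts((cx, cy), r, polyline)
    verdict = {0: "lost",           # strategy failed: start again
               1: "end of line",    # 3.1.5: begin return tracking
               2: "two contacts"}.get(n, "ambiguous")  # 3.2.2
    return (cx, cy), verdict
```

For a straight segment from (0, 0) to (10, 0), a silent sweep ending at (11.5, 0.1) with θ = 0, a step of 1 and a circle of radius 1 leads back to (10.5, 0.1), where the circle meets the figure once: “end of line”. Adding a second segment from (10, 0) to (10, 10) turns the same situation into two contacts, the case taken up next.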
However, if the situation as perceived by an external observer is that of a broken line, the “circle” will now produce two distinct sensory returns. Both algorithmically and for human subjects, it is possible to identify which positive tactile stimulation corresponds to the segment of line from which one has come, and which to the new segment of line at an angle to the previous one; in addition, an estimate of the orientation θ of the new segment can be obtained from the (angular) difference between the two positive tactile stimulations. “Tracking” as in 3.1.4 can then resume on the new segment. This sequence is illustrated in the upper right part of Figure 7.

3.2.3. Acute Angles
For very sharp angles, approximately in the range 5°–60°, it can happen that before encountering a “silent” micro-sweep as in 3.2.2, a micro-sweep produces not one sensory return (as is normally the case when “tracking”), but two distinct returns. For an external observer, this can occur in the case of a sharp acute angle. In a manner similar to 3.2.2, it is possible to identify which positive tactile stimulation corresponds to the original segment, and which to the new segment (usually, the latter will be the second positive tactile stimulation, encountered towards the end of the micro-sweep). The procedure here is to verify this hypothesis by continuing tracking until a micro-sweep with no sensory return is achieved (for the external observer, this corresponds to an advance to a point just beyond the point of the acute angle), and then to return to the point at which a micro-sweep produced two distinct positive tactile stimulations. Tracking can then resume on the segment identified as the “new” one; an estimate (in radians) of the acute angle, and hence of the new orientation, can be obtained by dividing the distance between the two positive tactile stimulations (usually a little less than a) by the distance to the point of the acute angle (for small angles, the separation between two lines diverging from a common point is approximately the angle multiplied by the distance from that point).
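The geometry behind 3.2.2 and 3.2.3 can be sketched in a few lines. For the sharp angle, we assume that the backtracked centre lies close enough to the vertex for the phase of the “new” contact to approximate the orientation of the new segment, and that the contact nearest the backward direction θ + π belongs to the segment just tracked. For the acute angle, the estimate follows the text (separation divided by distance to the apex); tracking then resumes on the new segment, heading back away from the apex. The side argument and all function names are hypothetical; angles are in radians, measured like θ.

```python
import math


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def sharp_angle_new_theta(theta, contact_phases):
    """3.2.2: from the current orientation theta and the phases of the two
    contacts on the diagnostic circle, return the new orientation. The
    contact nearest the backward direction theta + pi is taken to be the
    segment just tracked; the other contact's phase approximates the new
    orientation if the circle is centred near the vertex (an assumption)."""
    back = theta + math.pi
    _old, new = sorted(contact_phases, key=lambda p: abs(wrap(p - back)))
    return wrap(new)


def acute_angle_new_theta(theta, separation, distance_to_apex, side):
    """3.2.3: estimate the acute angle alpha as separation / distance
    (small-angle approximation) and return (new_theta, alpha), where
    new_theta heads along the new segment away from the apex.
    side = +1 if the new segment lies to the left of the current heading,
    -1 if it lies to the right."""
    alpha = separation / distance_to_apex
    return wrap(theta + math.pi - side * alpha), alpha
```

For a corner with an interior angle of 100° approached along θ = 0, the two contacts lie at phases π and 80°, and sharp_angle_new_theta(0.0, [math.pi, math.radians(80)]) returns 80° expressed in radians. For an acute angle, two contacts 0.5 apart at a distance of 2 from the apex give α = 0.25 rad.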
This sequence is also illustrated in Figure 7, in the lower right part of the figure. The analogous traces produced by a human subject are shown in Figure 8. This completes the specification of the model strategy.

Figure 7. Upper left: the results of the strategies described in 3.2.2 (upper right part of the trace, where a “sharp angle” of approximately 100° is negotiated by a “circle”) and in 3.2.3 (lower right part of the trace, where an “acute angle” is negotiated by recognizing that a single micro-sweep produces two distinct sensory returns). On the right is a drawing of the figure “as perceived” for a total of four return trackings; the “drift” noted in the legend to Figure 5 is again clearly visible. The lower part of the figure represents the values stored in memory.

Figure 8. The traces produced by a human subject in a situation analogous to that in Figure 7. Left: the traces produced on a “return tracking”; it can be seen that the subject employs the “circle” strategy to negotiate the sharp angle on the upper part of the trace, and that a micro-sweep produces two distinct sensory returns on approaching the acute angle on the lower part of the trace. Centre: a drawing of the figure as perceived; right: the original figure.

4. Discussion
As can be seen in Figures 1–8, the goal of articulating theory and experiment has been realized: there is good qualitative and semi-quantitative agreement between the traces produced by a human subject and the traces produced by a theoretical model in the form of a computer algorithm. This positive result has the value of an existence proof: there is at least one strategy, within the demonstrated capabilities of human subjects, that can be successfully modelled by a detailed, explicit computer algorithm.
As mentioned in 2.3, the task of perceiving simple figures in the experimental setup described here is an “inverse problem” which possesses a plurality of possible solutions. Previous experiments (Hanneton et al., 1999; Lenay et al., 1999) show that after a period of familiarisation sufficient for each subject to stabilize a preferred strategy, the “strategies of action” deployed by human subjects fall into one of a plurality of regular, reliable forms. It will be the aim of future work to characterize these strategies by grouping them into a small number of major categories, possibly with a number of variants within each category. The achievement of convergence between a human strategy and a computer algorithm has undoubtedly been facilitated by the reciprocal modelling option adopted here. In other words, the strategy effectively employed by the human subject has been influenced by the modelling process itself. This influence could have been eliminated by basing the modelling solely on examination of the empirical traces produced, without questioning the subjects as to the strategy employed. However, although this approach would be more stringent from one point of view, it would actually be less rigorous from another: we would then deprive ourselves of a resource for checking whether the algorithm as modelled in the computer really does correspond to the strategy actually employed by the human subject in order to generate the trajectories. Indeed, it would not even be possible to verify that the procedure as specified by the computer algorithm is concretely feasible for human subjects. The important point here is the following: the fact that a computer algorithm is capable of producing trajectories indistinguishable from those produced by human subjects is not, in itself, a proof that the mechanisms generating the trajectories are the same (this is a corollary of the nature of inverse problems, in which different mechanisms can produce the same observable result).
In other words, an algorithm based solely on observation of the trajectories would provide a description of these trajectories; this description would have the merit of being economical and mathematically tractable, but it would not necessarily correspond to an explanation of the trajectories. In a sense, then, the “reciprocal modelling” approach adopted here is actually more ambitious than the more conventional type of modelling, since it aims at explanation and not just description. For this reason, we propose to pursue this approach in future work. Certainly, every effort should and will be made to maximize the fit of the model to the “spontaneous” empirical observations, and to minimize the extent to which the strategies effectively employed by the human subjects are influenced by the modelling process itself. However, even if it is minimized, it is unlikely that the latter effect can ever be reduced to zero, if only because the very act of requesting a human subject to render explicit what he has hitherto been successfully doing without thinking consciously about it will almost inevitably have a retroactive effect on the performance itself.

Notes
1 In the basic experimental setup, the sensory feedback is a single point of tactile stimulation. However, preliminary experiments indicate that the sensory feedback can take the form of an auditory signal, or a visual signal (a stationary square on the computer screen which is either black or white), with little or no modification in the performance of the subjects.
2 These units have the convenient property that the total error is $\sqrt{\delta\theta^{2} + \delta d^{2}}$, with the additional property that this error is the same in {ix, iy} and {θ, d} co-ordinates.
3 The code implementing the algorithm described below, written in Visual C++, is available on request from: [email protected]

References
Bach-y-Rita, P. (1972), Brain Mechanisms in Sensory Substitution, New York: Academic Press.
Brooks, R.A. (1987), Intelligence Without Representation, Boston: MIT Artificial Intelligence Report.
Crowley, J.L. and Christensen, H.I. (1995), Vision as Process. Basic Research on Computer Vision Systems, Berlin: Springer.
Gapenne, O., Lenay, C., Stewart, J., Bériot, H. and Meidine, D. (2001), Prosthetic Device and 2D Form Perception: The Role of Increasing Degrees of Parallelism, in Proceedings of the Conference on Assistive Technology for Vision and Hearing Impairment (CVHI’2001), Castelvecchio Pascoli, Italy.
Gibson, J.J. (1979), The Ecological Approach to Visual Perception, Boston: Houghton Mifflin Press.
Gullaud, L. and Vinter, A. (1996), The Role of Visual and Proprioceptive Information in Mirror-drawing Behavior, in M.L. Simmel, C.G. Leedham and A.J.W.M. Thomassen, eds., Handwriting and Drawing Research: Basic and Applied Issues, Amsterdam: IOS Press, pp. 99–113.
Hanneton, S., Gapenne, O., Genouel, C., Lenay, C. and Marque, C. (1999), Dynamics of Shape Recognition Through a Minimal Visuo-Tactile Sensory Substitution Interface, in Proceedings of the Third International Conference on Cognitive and Neural Systems, Boston, pp. 26–29.
Kiper, D.C. and Carandini, M. (2002), The Neural Basis of Pattern Vision, London: Macmillan.
Lenay, C., Cannu, S. and Villon, P. (1997), Technology and Perception: The Contribution of Sensory Substitution Systems, in Proceedings of the Second International Conference on Cognitive Technology, Aizu, Japan, Los Alamitos: IEEE, pp. 44–53.
Lenay, C., Gapenne, O., Hanneton, S. and Stewart, J. (1999), Perception et Couplage Sensori-Moteur: Expériences et Discussion Epistémologique, in A. Drogoul and J.-A. Meyer, eds., Intelligence Artificielle Située (IAS’99), Paris: Hermes, pp. 71–86.
Mandik, P. (1999), Qualia, Space and Control, Philosophical Psychology 12(1), pp. 47–60.
Marr, D. (1982), Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, San Francisco: W.H. Freeman.
Maturana, H. and Varela, F.J. (1987), The Tree of Knowledge, Boston: Shambhala.
O’Regan, J.K. and Noë, A. (2001), A Sensorimotor Account of Vision and Visual Consciousness, Behavioral and Brain Sciences 24, pp. 939–1031.
Piaget, J. (1967), Biologie et connaissance: Essai sur les relations entre les régulations organiques et les processus cognitifs, Paris: Gallimard.
Poggio, T. (1983), Visual Algorithms, in O.J. Braddick and A.C. Sleigh, eds., Physical and Biological Processing of Images, Berlin: Springer, pp. 128–153.
Powers, W.T. (1988), An Outline of Control Theory, in The Control Systems Group Inc., Living Control Systems, Kentucky, USA, pp. 253–293.
Schillings, J.J., Meulenbroek, G.J. and Thomassen, A.J.W.M. (1996), Decomposing Trajectory Modifications: Pen-Tip Versus Joint Kinematics, in M.L. Simmel, C.G. Leedham and A.J.W.M. Thomassen, eds., Handwriting and Drawing Research: Basic and Applied Issues, Amsterdam: IOS Press, pp. 71–85.
Stewart, J. (1996), Cognition = Life: Implications for Higher-Level Cognition, Behavioural Processes 35, pp. 311–326.
Ullman, S. (1980), Against Direct Perception, Behavioral and Brain Sciences 3, pp. 373–415.
Varela, F., Thompson, E. and Rosch, E. (1993), The Embodied Mind, Boston: MIT Press.