Reciprocal Modelling of Active Perception of 2-D Forms in a Simple Tactile-Vision Substitution System
JOHN STEWART and OLIVIER GAPENNE
Université de Technologie de Compiègne, COSTECH, Dept. TSH, Centre P. Guillaumat, BP 309,
60203 Compiègne Cedex, France; E-mail: [email protected]
Abstract. The strategies of action employed by a human subject in order to perceive simple 2-D
forms on the basis of tactile sensory feedback have been modelled by an explicit computer algorithm.
The modelling process has been constrained and informed by the capacity of human subjects both to
consciously describe their own strategies, and to apply explicit strategies; thus, the strategies effectively employed by the human subject have been influenced by the modelling process itself. On this
basis, good qualitative and semi-quantitative agreement has been achieved between the trajectories
produced by a human subject, and the traces produced by a computer algorithm. The advantage
of this “reciprocal modelling” option, besides facilitating agreement between the algorithm and the
empirically observed trajectories, is that the theoretical model provides an explanation, and not just
a description, of the active perception of the human subject.
Key words: active perception, computer modelling, sensory-motor invariants, visual tactile sensory
substitution
1. Introduction
1.1. MODELLING PERCEPTION AS A SENSORI-MOTOR INVARIANT
The theme of “active perception”, which has attracted a growing volume of research over the last 20 years, comprises two major currents. The first current, which
is relatively classical, assumes that the object of perception is a pre-given referent,
and that the aim of perception is to actively elaborate an accurate representation of
this object. In this objectivist perspective, the role of the action is to increase the
amount of information available to the subject; the question then becomes that of
the efficient integration of this information in order to derive a coherent representation. The prime references in this direction are those of Marr (1982), Poggio (1983)
and Ullman (1980). This approach continues to be the object of substantial work,
both in the area of simulation (Crowley and Christensen, 1995), and with respect
to the identification of the underlying neuronal mechanisms (Kiper and Carandini,
2002).
Minds and Machines 14: 309–330, 2004.
© 2004 Kluwer Academic Publishers. Printed in the Netherlands.
The second current dispenses with the notions of a pre-given object and representations. Rather, the aim of perception is conceived as the identification of
invariants directly in terms of the ongoing sensory-motor dynamics. In this perspective, the role of the action is not just to provide a supplement of information,
but more fundamentally to constitute the very object of perception. The classical
work in this direction is that of Gibson (1979); in the “ecological approach to
visual perception”, the relevant information is conceived as existing directly in the
“optic flow”. This approach has been characteristically developed in the field of
autonomous robotics, where Brooks (1987), for example, speaks quite explicitly of
“intelligence without representations”. More generally, work in this constructivist
perspective centers on the genesis of perceptive invariants (Piaget, 1967; Varela et
al., 1993; Stewart, 1996; O’Regan and Noë, 2001). The work to be described in this
paper is situated quite clearly in the second orientation.
The aim of this paper is to model the action strategies deployed by human
subjects performing a perceptive task. The experimental setup derives from the
pioneering work of Bach-y-Rita (1972) on Tactile Visual Sensory Substitution
(TVSS), in which the output from a video-camera was transduced to a 20 × 20
array of tactile stimulators placed on the chest or back of blind subjects. This work
established three fundamental results: (i) If the camera remained stationary, form
recognition by the subjects was poor, even after prolonged learning; (ii) however,
if the subjects were able to act by moving the camera themselves – left to right,
up and down, zoom and so on – their perceptive capacities improved remarkably.
After only a few hours of learning, they were able to recognize simple shapes, and
after 2 weeks of daily practice their perception developed to the point of recognizing faces. (iii) Concomitantly, there was a qualitative shift in the nature of their
perception. In the static phase (i), the subjects reported feeling a tingling at the site
of tactile stimulation of the skin (on chest or back). However, in the active phase,
after learning, the sensory stimulation as such dropped out of their consciousness
(although it could be recovered by deliberately refocusing their attention on this
aspect), and was replaced by “direct perception” of the objects which were identified as being situated “in depth” in a three-dimensional space in front of them. It is
also interesting to note that if, unbeknown to the subject, the “zoom” was activated
so that the tactile “image” expanded, the subject instinctively recoiled – in other
words, expansion of the image unrelated to an action of the subject was interpreted
as the rapid and dangerous approach of a real object.
In order to further investigate the fundamental issues in perceptual cognition
that are involved, we have renewed these experiments by deliberately simplifying
the sensory input to its simplest possible expression – i.e. a single point of tactile
stimulation, which is either active (xxx) or inactive (ooo).1 The perceptual task we
have chosen is the identification of simple two-dimensional forms – broken lines
and curves (Lenay et al., 1999). The sensory input is provided not by an optical
camera, but by a magnetic pen in conjunction with a graphic tablet connected to
a computer which furnishes a virtual image (black or white pixels, which can be
displayed on the screen visible to the scientific observer but is of course not visible
to the experimental subjects). In the experiments to be reported here, the subjects
are sighted but blindfolded. When the subject moves the pen so that the tip is over
a black pixel on the virtual image, the tactile stimulator (placed under the index
finger of the other hand) is activated; if the tip is over a white pixel, there is no
activation. This experimental setup has been described in previous publications
(Lenay et al., 1997, 1999; Hanneton et al., 1999).
Under these conditions, the sensory input is reduced to a temporal sequence:
“ooooooxxooooxxoxxxxxx”, etc. It is thus immediately evident that there is no
conceivable “information processing” of the input signal that can convert it into
the perception of a 2-D line or curve. In other words, we have quite deliberately
employed an experimental setup which illustrates, paradigmatically, the thesis of
“active perception”, i.e. there is no perception in the absence of actions on the part
of the subject. The question arises as to whether the sensory input has not been
impoverished to such an extent that perception is impossible even if the subject
can act; but the answer to this question is that the subjects do indeed succeed in
perceiving 2-D forms, and are able to demonstrate their perception by drawing the
figures as they have perceived them.
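A minimal Python sketch of this input channel follows; it is our illustration, not the experimental software. The bitmap (a list of rows of 0/1 values), the functions `sense` and `sensory_sequence`, and the one-sample-per-pixel trajectory are all hypothetical. The point it makes is that what reaches the subject is only a string of stimulations, with no explicit spatial co-ordinates in it.

```python
def sense(bitmap, ix, iy):
    """Tactile feedback at pixel (ix, iy): 'x' if black (1), 'o' if white (0) or off-field."""
    if 0 <= iy < len(bitmap) and 0 <= ix < len(bitmap[0]):
        return "x" if bitmap[iy][ix] else "o"
    return "o"


def sensory_sequence(bitmap, trajectory):
    """The only input the subject receives: a temporal string of stimulations."""
    return "".join(sense(bitmap, ix, iy) for ix, iy in trajectory)


# Hypothetical figure: a 2-pixel-thick vertical line (columns 4 and 5) in a 10 x 10 field.
bitmap = [[1 if ix in (4, 5) else 0 for ix in range(10)] for _ in range(10)]
# A left-to-right sweep along row 3, followed by the return sweep.
trajectory = [(ix, 3) for ix in range(10)] + [(ix, 3) for ix in range(9, -1, -1)]
print(sensory_sequence(bitmap, trajectory))  # prints ooooxxooooooooxxoooo
```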
A major advantage of this experimental setup is that it forces an externalisation
of the actions of the subjects so that it is possible to record them, in the form of a
trace of the successive positions of the tip of the pen. These trajectories, together
with the drawings by the subjects of the figures as perceived, constitute a rich set
of empirical data. A major aim of this paper is to develop a conceptual framework
and a set of qualitative categories which are an essential prerequisite for the fine
quantitative analysis of such empirical data.
1.2. MODELLING PERCEPTION THROUGH A RECIPROCAL PROCESS
Human subjects, unlike robots and animals, are able both to consciously describe
their own strategies, and to apply explicit strategies; the originality of the work
presented here consists of exploiting this feature both to inform and to constrain
the modelling process.
In general, when modelling a natural phenomenon, it is usual to consider that
the phenomenon is fixed and given, and that the aim of the theoretical model is
to conform as closely as possible to this pre-given referent. Epistemologically, it
may be remarked that there is no such thing as a complete, neutral description
of a natural phenomenon. An experimental setup is usually designed precisely
in order to exhibit with particular clarity those features of a phenomenon that
are theoretically relevant; and scientific descriptions generically take the form of
measurements which presuppose abstract categories (e.g. “length” or “weight”)
which derive not from “the thing in itself” (this is the empiricist illusion), but from
a theoretical model. However, subject to this caveat, the natural phenomenon is not
usually manipulated in order to conform to the model.
In the present case, modelling of the “strategy of action” deployed by a human
subject has been subject to two constraints: (a) the characterization of the “strategy”
employed by a human subject should be sufficiently “complete” and precise to
generate a functional computational algorithm which produces traces qualitatively
(and, ideally, quantitatively) comparable to those actually produced by the human
subject; (b) concomitantly, and reciprocally, the computational algorithm should
employ only strategies that can demonstrably be realized by the human subject.
To the extent that this double constraint can be respected, the scientific goal of
fully articulating theory and experimental data will be achieved. Now these twin
constraints do not in themselves prohibit modification of the human strategy in order to conform to the model. In the work to be presented here, we have deliberately
taken advantage of this leeway by modifying the human strategy in relation to the
modelling process itself. In other words, the strategy employed by the human subject in this study was itself informed by knowledge of strategies that the computer
algorithm showed to be effective. We denote this option, consisting of deliberately
modifying the empirical results in order to conform to a theoretical model, by
the term “reciprocal modelling”. This option clearly facilitates the achievement
of convergence between a human strategy and a computational algorithm, but we
hope to show that it remains far from trivial and has additional advantages.
2. Methods
2.1. STRATEGIES OF ACTION: RELIABILITY AND PLURALITY
As we have already emphasized, under the experimental conditions that we have
deliberately chosen here, the sensory input alone cannot possibly give rise to the
perception of a form. In fact, the sensory feedback is used rather to guide the
actions of the subject, in such a way that the sensory input tends towards an “ideal”
form (e.g. a temporal sequence in which the tactile stimulation is alternately active
and inactive: “xxxoooxxxoooxxx”, etc.; a regular sequence of this type contains a
maximum amount of information). This way of looking at things is closely akin to
the “Perceptual Control Theory” of Powers (1988). It will be noted that the sensory
regularity in question is rhythmic and temporal rather than immediately spatial. In
fact, what the subject “perceives” is not the sensory input as such (just as in Bach-y-Rita’s experiments, this tends to fade from consciousness in experienced subjects),
but rather the actions which are necessary in order to produce the “ideal” input.
Indeed, even a cursory examination of the traces produced by human subjects in
previous experiments reveals that already in the “perceptive” phase (i.e. before
actually drawing the figure to demonstrate what they have perceived), their actions
amount to “drawing” the figure.
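One way of making the "ideal" rhythmic input operational is sketched below in Python: compress the sequence into runs of identical symbols and check that the runs of each kind have near-constant length. This regularity criterion, its `tolerance` parameter and the helper names are our own illustrative choices, not a measure used by the authors, and the sketch makes no claim about information content.

```python
from itertools import groupby


def run_lengths(seq):
    """Compress e.g. 'xxxooox' into [('x', 3), ('o', 3), ('x', 1)]."""
    return [(symbol, len(list(group))) for symbol, group in groupby(seq)]


def is_regular(seq, tolerance=1):
    """Hypothetical regularity test: the interior runs of each symbol differ in
    length by at most `tolerance` samples. The first and last runs are ignored
    because the observation window may cut them short."""
    runs = run_lengths(seq)[1:-1]
    for symbol in "xo":
        lengths = [n for s, n in runs if s == symbol]
        if lengths and max(lengths) - min(lengths) > tolerance:
            return False
    return True


print(is_regular("xxxoooxxxoooxxx"))        # True: the "ideal" alternation
print(is_regular("ooooooxxooooxxoxxxxxx"))  # False: runs of 'o' of length 4 and 1
```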
This being so, the prime object of modelling active perception must be to characterize the actions of the subjects. More precisely, the aim is to identify the strategies
of action deployed by the subjects. A model, in this sense, consists of making fully
explicit the process by which a subject generates his actions, and in particular of specifying
how the sensory feedback is used to guide future actions. In this context,
computer simulation is an invaluable tool, since it guarantees that the specification
of the way in which actions are generated (including their modification according
to sensory feedback) corresponds directly to a functional computational algorithm.
We have observed in previous experiments (Lenay et al., 1999) that the traces
produced by relatively naive human subjects are highly erratic, and in fact not very
reliable in terms of effectively perceiving the 2-D figures; it would be not only
difficult, but arguably not very pertinent, to try and model such chaotic flounderings. After a suitable period of learning (repeated sessions of an hour or more,
over several weeks), the “strategy of action” deployed by a given human subject
stabilizes into a regular, reliable form. The aim of this paper is to characterize one
of these strategies. As explained above, we have adopted a “reciprocal modelling”
approach in which the learning process of the human subject was itself informed
by knowledge of strategies that the computer algorithm showed to be effective. In
general, even in the simplest cases, there is no single solution to the problem of
specifying a functionally adequate strategy. The task of specifying a strategy is,
typically, an inverse problem. Once a strategy is fully specified, it is a matter of
straightforward deduction to determine whether or not the strategy is successful in
reliably achieving perception; but there is no direct, deductive path leading from
the problem to its solution. All that can be done is to speculatively imagine a hypothetical solution; and then to test deductively whether this is actually a solution.
However, generically, inverse problems have an awkward property: there is no a
priori guarantee that there exists even a single solution; but if one solution does
exist, then there is usually a plurality (often an unlimited plurality) of solutions. The
scope of the present paper is limited to a pragmatic existence proof, i.e. demonstrating by example that there is at least one functionally adequate strategy of action.
Exploration of the full plurality of action strategies employed by human subjects,
including an attempt to organize them categorically, will be the object of future
work.
2.2. THE SYSTEM OF CO-ORDINATES
Another methodological consideration, which conditions the very possibility of
writing down a computational algorithm, concerns the system of co-ordinates in
which the actions are to be described and generated. An exhaustively realistic
model of a human subject would be based on the anatomical articulations of the
fingers, hand, wrist, elbow, shoulder and bust and the multiple muscles which determine their movements. The control of movement of the poly-articulated human
body is a fascinating subject; but in the present case, a model of this sort would
be excessively complicated. An appropriate simplification stems from the fact that
all the muscular actions of the human subjects are resolved into displacements of
the tip of the pen. Empirical support for this simplification is provided by the fact
that composite movements in a trajectory modification task were actually better
described by a superposition scheme applied to pen-tip kinematics than applied
to joint kinematics (Schillings et al., 1996). Thus, in order to calculate the sensory
feedback consequent upon any given action, it is necessary and sufficient to specify
the resultant position of the tip of the pen in the discrete Cartesian co-ordinates, {ix,
iy}, of the bitmap in which the pixels of the figure to be perceived are stored in the
computer.
This suggests the very simple possibility of employing just these co-ordinates
in order to describe the actions of the subject. However, upon consideration it
rather rapidly becomes evident that this possibility does not respect constraint (b)
above, i.e. human subjects are quite incapable of functioning in this system of coordinates as such. In “motor” mode, a human subject is quite incapable of placing
the tip of the pen on, for example, the point {37, 71}; and in “proprioceptive” mode,
equally incapable of specifying the cartesian co-ordinates of the tip of the pen at
any given moment. The difficulty here is not reducible simply to an unknown but
constant “scaling factor” between the external, objective referent {ix, iy} and an
internal subjective representation of this referent. The fact is that human subjects
simply do not have any subjective representation of this kind – at least not with
a precision sufficient to be functionally operational in the experimental situation
as set up here. This is shown, rather conclusively, by the following observation.
Suppose that the tip of the pen is placed at a certain position, {ix, iy}, which is part
of the figure so that the sensory feedback is positive. If, now, the subject inadvertently moves the tip more than a few pixels away, he is quite incapable of reliably
returning to the initial position in order to recover the stimulus. In other words –
as any novice who has tried the experiment will amply testify – it is fatally easy
to become “lost”, i.e. to find oneself in a situation where, after an initial “contact”
with the figure, there is no sensory feedback and one has little or no idea how to
recover one; and even if, after some desperate flailing around, one does succeed
in again obtaining a positive sensory stimulation, there is no recognizable relation
to the previous contact, so that the task of systematically perceiving the figure has
to start all over again. We will come back to this observation, since it places a
major constraint on functionally effective strategies of action. For the moment, we
retain simply the conclusion that human subjects do not function directly in {ix,
iy} co-ordinates.
If not {ix, iy}, what then is the system of co-ordinates employed by human
subjects who succeed in the perceptual task? For the purposes of modelling, we
have employed the hypothesis (not so far refuted by empirical evidence) that human
subjects possess a sense (i) of their absolute orientation (relative to the bust, which
can be held steady), which we designate by the symbol “θ”; and (ii) of the relative
distance of any given movement with respect to their previous position, which we
designate by the symbol “d”. This sensory-motor capacity is, of course, subject to
error; if it were not, the subjects would be able, by path integration, to function
correctly (according to all external appearances) in the {ix, iy} co-ordinate system.
We will address the question of the quantitative estimation of the error involved in
the next section (2.3).
Two points are worthy of note here. The first is that, in this subject-centered
co-ordinate system human subjects have built up, via a developmental process
since birth, a reliable two-way mapping between their motor actions and their
proprioceptive sensations (Gullaud and Vinter, 1996). In other words, if a human
subject makes a movement, he knows what that movement is; and conversely,
if a movement is induced passively (in the present case, if the tip of the pen is
moved by an external force), the subject not only perceives the movement, but
is immediately able to reproduce it by a voluntary motor action. It is an open
question as to how this two-way mapping is implemented neurophysiologically.
There are three general options here. The first is that the contribution of action to
perception is exhausted by proprioceptive feedback in the form of sensory feedback
from the muscles. The second option is that the contribution of action is exhausted
by efferent copies of the motor commands that determine where the tip of the
pen was commanded to go. The third option is that the contribution of action is a
combination of the first two options. These options are described at greater length
in Mandik (1999). The present paper makes no claim to identify these underlying mechanisms; we simply observe, empirically, that human
subjects do have functional knowledge of the movements of the tip of the pen
when these are expressed in {θ, d} co-ordinates (but not when expressed in {ix,
iy} co-ordinates). As already remarked, this involves a higher level integration of
the many muscles involved in finger, hand, wrist, elbow and shoulder movements.
It is likely that muscular proprioception and efferent copies of motor commands
are also integrated in this process (i.e. the third option above is plausible), but we
have no evidence on this point. To sum up, the work presented here neither makes
nor requires a commitment to one of the three options outlined above.
The second point is even more important and instructive. Maturana and Varela
(1987) have emphasized the importance, in cognitive science generally, of making
a clear distinction between (i) the “objective” situation, as it can be described in a
third-person perspective by an external observer; and (ii) the “subjective” situation,
as it is accessible in a first-person perspective to the cognitive subject in his own
terms. According to the tenets of the computational-representational paradigm, cognition consists of establishing a bijective mapping between these two terms
such that (ii) is a “representation” of (i); but this, Maturana and Varela vigorously
deny. Although there must be some sort of relationship between the two terms,
in order to satisfy a very general reality principle, they are not prima facie commensurable, so that the question of a simple mapping relationship between the two
simply does not arise. This thematic opposition, between the situation as expressed
(i) in the terms of an external observer, and (ii) in the terms of the cognitive subject,
is tellingly illustrated in the present context by the contrast between the two coordinate systems, {ix, iy} on the one hand and {θ, d} on the other. The existence
of some relationship is exemplified by their identity under path integration; but the
absence of any intrinsic mapping is illustrated by the fact that, in the presence of
“noise” due to the limited precision of θ and d, the two tend inexorably to drift
apart.
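The drift can be illustrated numerically. The Python sketch below assumes that θ is a bearing measured clockwise from North (screen-up), that the bitmap's iy axis points downwards, and that the errors are Gaussian with standard deviations of 0.18 rad on θ and 25% of the distance on d (our reading of the precision estimates reported in 2.3); the function names are hypothetical. A subject intending to trace a closed square returns exactly to the start under noiseless path integration, but not once noise is present.

```python
import math
import random


def step(x, y, theta_deg, d):
    """Move of length d at bearing theta_deg (0 = North, clockwise),
    in bitmap co-ordinates where iy increases downwards (assumed convention)."""
    t = math.radians(theta_deg)
    return x + d * math.sin(t), y - d * math.cos(t)


def integrate(moves, sigma_theta=0.0, sigma_d=0.0, rng=None):
    """Path-integrate intended (theta_deg, d) moves into {ix, iy}.
    sigma_theta (radians) and sigma_d (fraction of d) are hypothetical noise levels."""
    rng = rng or random.Random(0)
    x, y = 0.0, 0.0
    for theta_deg, d in moves:
        theta = theta_deg + math.degrees(rng.gauss(0.0, sigma_theta))
        length = d * (1.0 + rng.gauss(0.0, sigma_d))
        x, y = step(x, y, theta, length)
    return x, y


square = [(90, 20), (180, 20), (270, 20), (0, 20)]       # intended closed path
print(integrate(square))                                 # back at the start (up to rounding)
print(integrate(square, 0.18, 0.25, random.Random(1)))   # noisy: ends elsewhere
```

The two co-ordinate systems coincide only while the noise is zero; averaged over many trials, the noisy endpoint lies several pixels from the start.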
2.3. ESTIMATION OF PRECISION
As a final methodological point prior to the specification of a particular strategy
of action, it will be useful at this point to deal with the question of a quantitative
estimation of the error in θ and d in typical human subjects. These errors can be
estimated from previous experiments using the same experimental setup, in which
human subjects draw their perceptions of simple figures composed of one or more
straight segments (Hanneton et al., 1999; Lenay et al., 1999; Gapenne et al., 2001).
For convenience, the angles are expressed in radians, and the standard deviation
multiplied by 100 to obtain a “percentage” error, δθ. Similarly, the precision of
d is expressed as a percentage error of the mean length, δd.2 The results (for
166 segments) are the following: δθ = ±18%; δd = ±25%; the total error is
thus ±31% (unpublished results). These values confirm the unreliability of “dead
reckoning” by path integration to achieve a functional equivalent of co-ordinates in
{ix, iy}: for displacements greater than 10 pixels, with respect to a line 2 pixels
thick, the subject cannot be sure of “recovering” the line. This degree of precision
also sets a constraint on “viable” strategies, which must be robust with respect to
errors of this order of magnitude.
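The quoted total is consistent with combining the angular and distance errors in quadrature, as one would for independent errors (this reading is ours; the combination rule is not stated in the text):

```latex
\delta_{\mathrm{total}} = \sqrt{(\delta\theta)^{2} + (\delta d)^{2}}
  = \sqrt{18^{2} + 25^{2}}\,\% = \sqrt{949}\,\% \approx 30.8\,\% \approx 31\,\%
```

For a 10-pixel displacement, an error of this size amounts to roughly 3 pixels, which already exceeds the 2-pixel thickness of the line.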
In all the simulations to be reported below, the movements of the tip of the pen
are affected by a pseudo-random error of ±30%.
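The distribution of this error is not specified. A minimal Python sketch, assuming a uniform relative error of up to ±30% on the distance d and, by analogy, a uniform additive error of up to ±0.30 rad on θ (both our assumptions), is given below, together with a count of how often a 10-pixel move intended to run along a 2-pixel-thick line ends off it. The function name `noisy_move` is hypothetical.

```python
import math
import random


def noisy_move(x, y, theta_deg, d, rng, err=0.30):
    """Execute an intended move (bearing theta_deg clockwise from North, length d)
    with a pseudo-random error of up to +/- err (assumed uniform)."""
    theta = math.radians(theta_deg) + rng.uniform(-err, err)   # angular error, radians
    length = d * (1.0 + rng.uniform(-err, err))                # relative distance error
    return x + length * math.sin(theta), y - length * math.cos(theta)


rng = random.Random(42)
# Moves due East along a horizontal line occupying |y| <= 1 (2 pixels thick).
landings = [noisy_move(0.0, 0.0, 90, 10, rng) for _ in range(1000)]
misses = sum(1 for _, y in landings if abs(y) > 1.0)
print(f"{misses / len(landings):.0%} of 10-pixel moves end off the line")
```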
3. Definition of a Specific Model Strategy
The complexity of a strategy adequate to achieve active perception of a given form
depends on the complexity of the form in question. The approach adopted here is
to proceed progressively. The first stage is to achieve perception of a single straight
line; and then to pass, in order, to curved lines; broken lines with obtuse angles; and
broken lines with acute angles. The strategy to be presented here successfully copes
with this range of complexity. Even greater complexity arises with the introduction
of topological singularities: T-junctions and X-crosses. Such figures lie beyond the
scope of the present paper and will be addressed in future work.
3.1. A BASIC STRATEGY FOR A SINGLE STRAIGHT LINE
We are now in a position to specify a particular strategy; to start with, for the
perception of a single straight line. Even here, the strategy consists of a number
of successive sub-components. For each of these steps, we will systematically and
thematically distinguish:
(i) a description from the point of view of an external observer, i.e. in the coordinate system {ix, iy} – this is not only the framework for the Figures 1–8
illustrating the traces produced, it is also useful for intuitively formulating the
sub-task to be achieved at each stage; and
(ii) an operational description of the “strategy” in terms of {θ, d}, which is the
basis for the computational algorithm actually employed to generate the traces,
and which is putatively a description of the strategy deployed by the human
subject.
3.1.1. The Broad Scan
In terms of {ix, iy}, the first task is to “find” the figure, i.e. (for the subject) to obtain
a positive sensory feedback. Starting (conventionally) in the top left-hand corner
of the total field, the strategy implemented by the computer algorithm consists of
“scanning” the field by a series of near-horizontal but gradually descending sweeps.
Operationally, expressed in terms of θ and d, the algorithm is the following. The
first action is a movement of distance $d_1$ equal to the field width, at an angle
90° + 10° (measured in degrees, clockwise from due North as 0°); this is a near-horizontal sweep, descending at an angle close to the probable error. If no positive
sensory stimulation is encountered, the next sweep is at an angle of (270 − 10)°,
i.e. horizontally in the reverse direction, with the same angle of descent. This
procedure is repeated until a positive sensory stimulation is obtained. The upper
part of Figure 1 illustrates a typical result.
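The broad scan can be sketched as follows in Python. The sketch assumes bearings measured clockwise from North (the convention under which the first sweep, at 100°, heads right from the top-left corner), a hypothetical predicate `on_figure(x, y)` standing in for the tactile feedback, completion of the sweep in which contact occurs (so that the subject ends up on a known side of the figure), and no motor noise; the sampling `step` is an arbitrary choice.

```python
import math


def broad_scan(on_figure, width, height, descent_deg=10.0, step=0.5):
    """Broad scan (cf. 3.1.1): sweeps of length `width` from the top-left corner,
    alternating bearings 90 + descent and 270 - descent (clockwise from North,
    iy growing downwards: an assumed convention).
    Returns (end point of the last sweep, number of sweeps, touched?)."""
    x, y, sweeps = 0.0, 0.0, 0
    bearings = (90.0 + descent_deg, 270.0 - descent_deg)
    while y < height:
        t = math.radians(bearings[sweeps % 2])
        dx, dy = math.sin(t), -math.cos(t)       # unit displacement per step
        touched = False
        for _ in range(int(width / step)):       # sample the sweep every `step` pixels
            x, y = x + step * dx, y + step * dy
            touched = touched or on_figure(x, y)
        sweeps += 1
        if touched:                              # finish the sweep, then stop scanning
            return (x, y), sweeps, True
    return (x, y), sweeps, False                 # bottom reached: the strategy failed


end, sweeps, touched = broad_scan(lambda x, y: abs(x - 50) <= 1, 100, 100)
print(end, sweeps, touched)
```

With a vertical line at x = 50, the first sweep already touches it and ends on the far (right-hand) side, which is the situation the fine scan starts from.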
3.1.2. The Fine Scan
In terms of {ix, iy}, the second task is to get close to the figure. This is achieved,
in operational terms, by reversing the direction of the previous sweep but without
descending (i.e. an orientation of 270° or 90° as the case may be); the distance,
however, is reduced to (about) 50% of the previous distance (i.e. $d_{i+1} = 0.5\,d_i$).
If a positive sensory stimulation is encountered, the direction is again reversed by
180°; if not, the direction is maintained; in either case, the distance is again reduced
by 50%. This procedure is iterated until $d_n$ is less than some constant preset value,
designated by the parameter “amp”. A suitable value for amp is the radius of the
circle (see 3.1.3 below), taken as being the same as the amplitude of the micro-sweep (see 3.1.4 below). As a precaution, if the last micro-sweep (with a value of
$d_n$ which has fallen below amp) did not itself already cross the figure, it is prudent to
prolong the procedure for one final iteration, but with an unreduced value of $d_n$;
logically, this micro-sweep should effectively encounter the figure. If it does not,
the strategy has failed. From the point of view of an external observer, when this
iterative algorithm terminates, the point of the pen will be close to a segment of the
figure; for what follows, it will be convenient to label this “point B”. Typical traces
for both steps 3.1.1 and 3.1.2 are shown in the upper part of Figure 1.
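Because the fine scan is driven only by whether each move produced a stimulation, it amounts to a bisection and can be sketched in one dimension. In the hypothetical Python version below, the pen moves horizontally, the figure is the band `|x - x_line| <= half_width`, the crossing test stands in for the tactile feedback, and motor noise is omitted; names and parameters are ours.

```python
def fine_scan(x, d, x_line, half_width, amp):
    """One-dimensional fine scan (cf. 3.1.2).
    x: pen position at the end of the broad sweep that touched the figure;
    d: signed length of that sweep (positive = rightwards);
    the figure is the band |x - x_line| <= half_width (hypothetical)."""
    def crossed(a, b):
        # True if the move from a to b passed over (or started on) the band.
        return min(a, b) <= x_line + half_width and max(a, b) >= x_line - half_width

    direction = -1.0 if d > 0 else 1.0   # the first move reverses the previous sweep
    d = abs(d)
    while True:
        d *= 0.5                         # halve the distance at every iteration
        new_x = x + direction * d
        hit = crossed(x, new_x)
        x = new_x
        if hit:
            direction = -direction       # stimulation felt: reverse by 180 degrees
        if d < amp:
            break
    if not hit:
        # Precaution: one final micro-sweep with the unreduced distance d_n.
        new_x = x + direction * d
        hit = crossed(x, new_x)
        x = new_x
    return x, hit                        # hit is False if the strategy failed


print(fine_scan(100.0, 100.0, 80.0, 1.0, 5.0))  # prints (78.125, True)
```

Starting at the end of a 100-pixel rightward sweep that crossed a line at x = 80, the pen ends within amp of the line, at the position that plays the role of point B. At every step the sign of `direction` records on which side of the figure the pen lies, which is why the subject is never "lost".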
We may make, here, a remark which we shall have occasion to repeat: an
important quality of this strategy is that at any given point in time, the subject
“knows” (from the observer’s near-omniscient point of view) whether he is to the
left or to the right of the figure, so that he is never “lost”. It might be remarked that
this procedure will fail if the figure consists of a single near-horizontal straight
line: either the broad scan (3.1.1) will arrive at the bottom of the field without ever
having encountered the figure, or (if one such encounter has occurred), in phase 3.1.2
the figure will never be found again. In simulations where the figure is deliberately
set to this singular form, that is indeed what occurred. In this case, there is nothing
for it but to begin over again with step 3.1.1 from the top left-hand corner, but in
this case with near-vertical sweeps, i.e. at angles θ of successively (180 − 10)°
and (0 + 10)°.
3.1.3. The Circle
Having “found” the figure, the next task will be to “track” the line or curve. However, in order to do this (see 3.1.4 below), it will be necessary to have a current
estimation of the orientation of the line at time t, denoted by θ(t). This estimate
will be continuously updated by the tracking procedure itself, but in order to start
the process an initial estimate is necessary. In the present strategy, this is achieved
by tracing a circle of radius amp centred on point B. To be quite explicit:
this involves augmenting the repertoire of actions of which subjects are presumed
capable, to include the capacity of drawing a circle around a given point {cx, cy}.
This is not in contradiction with the supposed incapacity of subjects to function
in {ix, iy} co-ordinates, since the latter are absolute (i.e. do not drift over time),
whereas the {cx, cy} co-ordinates (expressed as real numbers in the algorithm) are
always relative, and thus do drift over time. The question as to whether human
subjects do or do not have the capacity of drawing local figures (here, the circle; in
step 3.1.4 below, a sine curve) with respect to a relative, local point {cx, cy} is an
empirical one. Although most human beings, unlike Leonardo da Vinci, are
not able to trace perfect circles, the approximate circles centred on a given point
that are within their capacity prove quite adequate in practice.
Normally, the circle thus produced will produce two positive sensory stimulations per revolution. If the temporal rhythm of the two sensory stimulations “limps”
(i.e. in terms of the external observer, the two points of contact with the figure are
not diametrically opposite each other), the “circle” is traced again, with a centre
shifted to the side where the two points are closer together (spatially closer for
the external observer, temporally closer for the subject). This procedure is iterated
until the two points are diametrically opposite (i.e. the rhythm is regular, to a prespecified degree of precision). These two points then indicate the orientation of
the line, thus providing the requisite initial estimate of θ(t); the strategy has the
additional advantage that the centre of the circle is itself “centred” on the line. If a
complete revolution produces just one positive sensory stimulation, this indicates
that the position is at the end of the line (cf. 3.1.5 below); in this case, another
circle is traced, with its centre at the previous point of contact; by iteration (if
necessary), this leads to the previous case with two points of contact. If the circle
makes no contact with the figure, the strategy has failed; in the simulations, the
strategy is robust and this practically never happens. Typical traces generated by
this algorithm are also illustrated in Figure 1.
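The centring procedure can be sketched as follows, in Python, purely as an illustration. The figure is a hypothetical straight line seen only through a `touches(x, y)` predicate (the external observer's view); the subject side uses nothing but the phases at which stimulation occurs during one revolution. The function names, the size of the centre shift (a fraction `gain` of the offset implied by the shorter arc), the tolerance and the sampling density are all assumptions rather than values from the text, and path-integration noise is ignored.

```python
import math

def contact_phases(cx, cy, r, touches, n=720):
    """Trace one circle of radius r about {cx, cy} and return, for each run of
    positive tactile stimulation, the phase (angle in radians) at its middle."""
    hits = [touches(cx + r * math.cos(2 * math.pi * k / n),
                    cy + r * math.sin(2 * math.pi * k / n)) for k in range(n)]
    if all(hits) or not any(hits):
        return []
    start = hits.index(False)          # begin on a silent sample so no run wraps
    runs, run = [], []
    for j in range(n):
        if hits[(start + j) % n]:
            run.append(start + j)      # unwrapped sample index
        elif run:
            runs.append(run)
            run = []
    if run:
        runs.append(run)
    return [(math.pi * (s[0] + s[-1]) / n) % (2 * math.pi) for s in runs]

def centre_on_line(cx, cy, r, touches, gain=0.5, tol=0.02, max_iter=50):
    """Repeat the circle, shifting its centre towards the side where the two
    contacts are closer together, until they are diametrically opposite.
    Returns (cx, cy, theta) with theta in [0, pi), or None otherwise."""
    for _ in range(max_iter):
        phases = contact_phases(cx, cy, r, touches)
        if len(phases) != 2:
            return None                      # 0 or 1 contact: handled elsewhere
        a1, a2 = sorted(phases)
        gap = a2 - a1                        # counter-clockwise arc a1 -> a2
        if abs(gap - math.pi) < tol:         # regular rhythm: contacts opposite
            theta = math.atan2(math.sin(a2) - math.sin(a1),
                               math.cos(a2) - math.cos(a1))
            return cx, cy, theta % math.pi   # chord direction = line orientation
        short = min(gap, 2 * math.pi - gap)  # the "limping" shorter interval
        mid = (a1 + a2) / 2 if gap < math.pi else (a1 + a2) / 2 + math.pi
        # Assumed shift rule: a fraction of the centre-to-line offset implied
        # by the shorter arc, in the direction of that arc's mid-point.
        step = gain * r * math.cos(short / 2)
        cx, cy = cx + step * math.cos(mid), cy + step * math.sin(mid)
    return None
```

With the line y = 0.3x + 0.5 and a first circle of radius 3 centred on (0, 2), the loop should converge within a few iterations to a centre lying on the line, with θ close to atan(0.3).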
This “circle” strategy provides a clear example of the reciprocal modelling option described in 2.2. In the case of the traces spontaneously produced by human
subjects, this strategy is rarely employed, and even then only sporadically. However, the computer algorithm demonstrates the usefulness and robustness of this
strategy, since it is successfully employed not only here, but at subsequent points
in the strategy (see 3.1.5 and 3.2.2 below). In general terms, it is a useful diagnostic
strategy at moments of uncertainty. Moreover, it is well within the capabilities of a
human subject to deploy this strategy systematically, as illustrated in Figures 4, 6
and 8.
3.1.4. Tracking
From the point of view of the external observer, it might be thought that once the
tip of the pen is placed on the line, with a knowledge of its local orientation, all
that remains to do is to follow the line by seeking to remain directly on it (i.e.
to produce a continuous tactile stimulation as sensory feedback). This strategy is
indeed attempted by some novice human subjects, but it has a serious drawback:
when (as inevitably happens, and in practice sooner rather than later) the sensory
feedback ceases, the subject does not know why (from the external observer’s point
of view, the subject does not know on which side of the line the deviation has
occurred). The subject is therefore in great danger of being “lost”, and having to
start all over again. The strategy presented here therefore consists of quite deliberately crossing the line. Operationally, the action consists of drawing a sine curve
about the point {cx, cy}. More specifically, movements orthogonal to the current
orientation θ(t) are the same as the action of drawing a circle (cf. 3.1.3 above);
movements in the direction of θ(t) are produced by advancing the current point {cx,
cy} at constant velocity in this direction; the composition of these two movements
effectively produces a sine curve with amplitude amp. We define a “unit of action”
as a single micro-sweep crossing the line (i.e. advancing 180° in the phase of the
sine curve).
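A unit of action, as defined above, can be sketched as follows (Python, illustrative only). The function name `micro_sweep`, the sampling density `n` and the sign convention `side` (which extreme of the sweep the pen starts from) are assumptions. The orthogonal component is written as amp·cos(phase), so that each unit runs from one extreme to the other and crosses the course at its mid-point, which is where the stimulation should fall when the trace is centred on the line.

```python
import math

def micro_sweep(cx, cy, theta, amp, advance, side, n=40):
    """One unit of action: half a period of a sine curve about the moving point
    {cx, cy}. Returns the sampled pen positions and the new centre."""
    ux, uy = math.cos(theta), math.sin(theta)   # along the current orientation
    nx, ny = -uy, ux                            # orthogonal (left-hand) direction
    pts = []
    for k in range(n + 1):
        phi = math.pi * k / n                   # phase advances by 180 degrees
        along = advance * k / n                 # constant-velocity advance
        lateral = side * amp * math.cos(phi)    # circle-like orthogonal movement
        pts.append((cx + along * ux + lateral * nx,
                    cy + along * uy + lateral * ny))
    return pts, (cx + advance * ux, cy + advance * uy)
```

Successive units alternate `side` (+1, -1, +1, ...) and each starts from the centre returned by the previous one, so that without corrections the concatenated points form one continuous sine curve.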
320
JOHN STEWART AND OLIVIER GAPENNE
Due to imprecision in the initial estimate of θ(t), and even more importantly
due to inevitable “noise” (i.e. drift in the position of {cx, cy} with respect to perfect
path integration – recall that noise is set at +30% in these simulations), the external
observer sees that the sine curve becomes progressively decentred with respect
to the line. For the subject, this will be reflected by the fact that the positive tactile
stimulation will not occur at the mid-point of the sin curve. In the algorithm, the
discrepancy between the actual occurrence of the positive tactile stimulation and
its ideal mid-point position is used to generate two corrections, applied after each
unit of action: (i) a lateral shift of {cx, cy} in a direction orthogonal to θ(t); and (ii)
an adjustment to θ(t), calculated as the lateral shift from (i) divided by the distance
that {cx, cy} has advanced in the direction of θ(t) in the course of the “unit of
action”. The gains on each of these corrections are variable parameters; for suitable
values of the gains, the algorithm is robust in maintaining an advancing sine curve
centred on the line of the figure to be perceived. The non-triviality of this result
is illustrated by the fact that if the values of either parameter are too low or too
high, in the presence of noise the sine curve “loses” the line. A typical trace with
optimal values for the gain parameters is shown in Figure 1. The crucial importance
of these feedback correction procedures is illustrated in Figure 2a, b. In Figure 2a,
the gain parameter for the lateral shift is set to zero. The problem here is similar to
that of a novice car driver; if no lateral shift correction is possible, and corrections
to orientation are made solely as a function of lateral deviation, the corrections
“overshoot” and give rise to a divergent series. In Figure 2b, the gain parameter
for the adjustment to current orientation is set to zero; the degradation in tracking
capacity is even worse, thus emphasizing the crucial importance of the current
estimation of θ(t) which, if not continually corrected, will drift due to noise. It
is clear that in both cases, the robustness of the “tracking” procedure is severely
compromised.
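The two corrections can be tried out in a toy simulation (Python; a sketch, not the authors' Visual C++ code). Assumptions, none of them from the text: the figure is the line y = 0; the moment of stimulation is located by interpolating where the pen's path changes side of the line (a zero-width stand-in for the real tactile field); the orientation adjustment uses the measured lateral offset `e` itself, so that it still acts when the lateral gain is zero, as in Figure 2a; and “noise” is modelled as a uniform random drift added to θ after each unit. The function name and all parameter values are illustrative.

```python
import math
import random

def track_line(cy0, theta0, g_lat, g_theta, units=60, amp=1.0, advance=0.5,
               noise=0.05, n=40, seed=1):
    """Track the hypothetical figure y = 0 with successive micro-sweeps.
    Returns (cx, cy, theta) after `units` units of action, or None if a
    micro-sweep produces no stimulation (the line has been lost)."""
    rng = random.Random(seed)
    cx, cy, theta, side = 0.0, cy0, theta0, 1
    for _ in range(units):
        ux, uy = math.cos(theta), math.sin(theta)    # current orientation theta(t)
        nx, ny = -uy, ux                             # orthogonal direction
        # External observer's view: y-coordinate of the pen at each sample.
        ys = [cy + advance * k / n * uy + side * amp * math.cos(math.pi * k / n) * ny
              for k in range(n + 1)]
        phase = None
        for k in range(n):
            if ys[k] == 0.0:
                phase = math.pi * k / n
            elif ys[k] * ys[k + 1] < 0:
                t = ys[k] / (ys[k] - ys[k + 1])      # interpolate the crossing
                phase = math.pi * (k + t) / n
            if phase is not None:
                break
        if phase is None:
            return None
        # Lateral offset of the line from the course; 0 if contact is at mid-sweep.
        e = side * amp * math.cos(phase)
        cx, cy = cx + advance * ux, cy + advance * uy        # advance {cx, cy}
        cx, cy = cx + g_lat * e * nx, cy + g_lat * e * ny    # (i) lateral shift
        theta += g_theta * e / advance                       # (ii) orientation
        theta += rng.uniform(-noise, noise)                  # hypothetical drift
        side = -side                                         # next sweep reverses
    return cx, cy, theta
```

In this toy model, with `g_lat = 0.8` and `g_theta = 0.3` the centre settles onto the line despite the drift. Linearising the update around the line gives a per-unit error matrix whose determinant is 1 + g_theta/2 when `g_lat = 0`, so the error then grows from one unit to the next, in the spirit of the divergence shown in Figure 2a.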
This correction procedure provides another illustration of the reciprocal modelling option. Spontaneously, human subjects performing micro-sweeps during tracking produce functionally equivalent corrections without lifting the pen from the
surface, i.e. the trace during successive micro-sweeps is continuous. The correction
procedures as implemented by the computer algorithm, however, give rise to a
trace which displays “breaks” between each “micro-sweep” unit of action. This
is illustrated schematically in Figure 3, and can be seen in the results of actual
simulations in Figures 1–3, 5 and 8. At this point in the work presented here,
the question arose quite concretely as to whether the computer algorithm should
be modified to model accurately the spontaneous traces of human subjects; or
whether the strategy and hence the traces of the human subject should be modified
in order to conform to the algorithmic model. The former would not have been
impossible in principle, although it would have been difficult and “messy”; in practice,
it was deemed simpler to adopt the latter approach, i.e. to modify the spontaneous
human strategy in order to conform to the model. As illustrated in Figures 4, 6 and
8, this was indeed well within the capabilities of the human subject.
Finally, to complete the “tracking” module of the strategy, it is supposed in the
algorithm that the subject can store in memory a record of θ(t) and d(t) (i.e. the
cumulative distance that {cx, cy} has advanced along the line) each time the trace
crosses the figure: a pair of values (θi, di) is thus recorded for each “unit of action” i.
3.1.5. End of Line
The “tracking” module 3.1.4 is iterated as long as each “unit of action” produces
one and only one positive sensory stimulation. We will deal with the case of more
than one positive sensory stimulation later. Here, the case to be dealt with is when a
unit of action does not produce any positive sensory stimulation in return. From the
point of view of an external observer, this will normally occur because the trace has
reached the end of the line to be perceived; but it may also be because the tracking
sine-curve has inadvertently drifted to one side or other of the line (or, as we shall
see in 3.2.2, because the figure consists of a broken line with a relatively sharp angle
less than about 150°). Operationally, the procedure in this case is: (i) “backtrack”
the last (empty) unit of action; since the total distance involved is small, of the
order of 2 × amp, the error involved is small. (ii) From this point, draw a circle
centred on the current value of {cx, cy}. If a complete revolution produces just a
single positive tactile stimulation, this is a topologically reliable indication that the
end of the line has indeed been reached. (The case in which the circle produces two
distinct positive tactile stimulations will be dealt with below in 3.2.2.)
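The decision taken after backtracking can be sketched as follows (Python, illustrative). The figure is given as a list of hypothetical line segments, stimulation occurs when the pen is within `width / 2` of a segment, and the function names are assumptions. Only the one-contact and two-contact outcomes are specified in the text, so anything else is returned as undecided here.

```python
import math

def dist_to_segment(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to segment AB."""
    vx, vy = bx - ax, by - ay
    t = ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))                  # clamp to the segment
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

def count_contacts(cx, cy, r, segments, width=0.1, n=720):
    """Number of distinct runs of stimulation on one circle about {cx, cy}."""
    hits = []
    for k in range(n):
        a = 2 * math.pi * k / n
        x, y = cx + r * math.cos(a), cy + r * math.sin(a)
        hits.append(any(dist_to_segment(x, y, *s) < width / 2 for s in segments))
    # A run starts wherever a hit follows a miss (cyclically: hits[-1] precedes hits[0]).
    return sum(1 for k in range(n) if hits[k] and not hits[k - 1])

def after_silent_unit(cx, cy, r, segments):
    """Step (ii) of 3.1.5, once the empty unit of action has been backtracked."""
    c = count_contacts(cx, cy, r, segments)
    if c == 1:
        return "end of line"     # proceed to a return tracking (3.1.6)
    if c == 2:
        return "two contacts"    # possibly a broken line (3.2.2)
    return "no decision"         # 0 or more than 2 contacts: not covered here
```

Along a segment from (0, 0) to (10, 0), a circle of radius 1 centred on the end point gives one contact, one centred on the middle gives two, and one centred well beyond the end gives none.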
3.1.6. Return Tracking
If the hypothesis “end of line” is confirmed by the previous step (3.1.5), the tracking
procedure is repeated in the reverse direction using the same tracking procedure as
in 3.1.4. The sole difference is that when storing values of (θi, di), the cumulative
distances di must be decremented rather than incremented as in 3.1.4. This
procedure is continued until an “end of line” is reached as in 3.1.5. The whole
procedure of “return trackings” can be repeated as often as desired.
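The memory assumed in 3.1.4 and 3.1.6 amounts to very little state. A minimal Python sketch, assuming one list of (θi, di) pairs per tracking; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TrackMemory:
    """Hypothetical store for the (theta_i, d_i) pairs recorded at each unit of action."""
    d: float = 0.0                 # cumulative distance advanced along the line
    direction: int = 1             # +1 on the first tracking, flipped at each reversal
    trackings: list = field(default_factory=lambda: [[]])  # one list per tracking

    def record(self, theta, advance):
        """Store one pair after a unit of action that advanced {cx, cy} by `advance`."""
        self.d += self.direction * advance   # d_i grows outward, shrinks on return
        self.trackings[-1].append((theta, self.d))

    def start_return(self):
        """Called once "end of line" is confirmed: reverse and open a new record."""
        self.direction = -self.direction
        self.trackings.append([])
```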
3.1.7. Drawing What Has Been Perceived
Using the stored values of (θi, di), the final step consists of drawing what has been
“perceived”. In the computer simulations, the figure will be drawn as many times
as there are return trackings.
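Drawing from the stored pairs is a matter of path integration: successive pairs are joined by a step of length d(i+1) - d(i) in the direction of the circular mean of θi and θ(i+1). This Python sketch (hypothetical function name) places the first stored point at an arbitrary origin and assumes that θi is expressed relative to the outward direction of travel, since the text does not say how orientation is stored during a return tracking. Sorting by d lets the pairs from a return tracking, where d decreases, be drawn the same way.

```python
import math

def draw_perceived(records, x0=0.0, y0=0.0):
    """Rebuild a polyline from (theta_i, d_i) pairs by path integration."""
    pts = sorted(records, key=lambda rec: rec[1])   # order by cumulative distance
    xs, ys = [x0], [y0]
    for (t1, d1), (t2, d2) in zip(pts, pts[1:]):
        step = d2 - d1
        # Circular mean of the two orientations (safe across the 0 / 2*pi wrap).
        heading = math.atan2(math.sin(t1) + math.sin(t2),
                             math.cos(t1) + math.cos(t2))
        xs.append(xs[-1] + step * math.cos(heading))
        ys.append(ys[-1] + step * math.sin(heading))
    return list(zip(xs, ys))
```

For an arc of radius 2, stored as θ = d / 2 (the linear relation noted in the legend to Figure 5), 101 pairs covering a quarter turn reconstruct an end point within 0.001 of the true point (2, 2).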
This set of procedures, 3.1.1–7, specifies the basic strategy for the perception of
simple lines. A typical result produced by the algorithm as specified is illustrated in
Figure 1. This figure is to be compared with Figure 4, which shows the analogous
traces produced by a human subject employing the same qualitative strategy.
Figure 1. The traces produced by the computer algorithm specified in the text, 3.1.1–7. The
top part of the figure shows the broad scan (1.1) until a first sensory return (small open circle) is
obtained; the fine scan (1.2); the circle (1.3), the tracking procedure (1.4) and the “end of line”
circle (1.5). The return trackings (1.6) are not shown. The middle part of the figure shows
a drawing of the figure as perceived by the algorithm (1.7), based on four successive return
trackings. The lower part of the figure shows the data stored in memory, i.e. a record of current
orientation “Theta” as a function of cumulative distance.
3.2. COMPLEX FIGURES
3.2.1. Curves and Wide Angles
For smooth curves, the strategy described in 3.1.4 – i.e. (i) a lateral shift of {cx,
cy} in a direction orthogonal to θ(t) and (ii) an adjustment to θ(t) with appropriate
values for the gain parameters – is sufficiently robust to ensure success of the
“tracking” procedure without any additional sophistication. This is the case even
if the radius of curvature is quite small, resulting in a tight curve. A typical result
produced by the computer algorithm is shown in Figure 5; this can be compared
with the analogous traces produced by a human subject shown in Figure 6.
The same algorithm also works without modification for broken lines if the
angle is very wide – approximately in the range 150°–180° (not shown).
3.2.2. Sharp Angles
For sharper angles – roughly in the range 60°–150° – the same strategy 3.1.4 can
occasionally work, depending on the contingencies of the approach to the angle.
However, with increasing frequency as the angle becomes sharper, the change of
Figure 2. (a) The situation is the same as in Figure 1, but the gain parameter specifying the
lateral shift correction (see 3.1.4) has been set to zero. It can be seen that the “tracking”
procedure is severely compromised; the algorithm actually ends by “losing” the line altogether.
(b) The situation is the same as in (a), but here it is the parameter specifying the corrections to
current orientation Theta that is set to zero. The degradation of the tracking procedure is even
worse than in (a), as seen in the lower part of the figure where the values of Theta are widely
scattered (cf. the closely grouped values of Theta in Figure 1 where the correction
procedures are in operation).
Figure 3. A schematic illustration of the effects of the algorithmic correction procedures (lateral shift and adjustment of orientation) described in 3.1.4. During the first unit of action, on
the left, the point {cx, cy} moves from C1 to C2, producing a sine-wave trace from T1 to T2.
If the positive tactile stimulation is encountered well before the midpoint of the trace (indicated here by the open circle), the correction procedure will introduce a lateral shift upwards,
together with an adjustment of the orientation, so that the point {cx, cy} will move from C3
to C4 with a corresponding trace T3–T4. This procedure clearly introduces a discontinuity
between T2 and T3.
Figure 4. The traces produced by a human subject employing a strategy qualitatively similar
to that underlying the computer algorithm illustrated in Figure 1. The upper part of the figure
shows the broad scan, the fine scan, the initial circle, tracking, and the “end of line” circle
(3.1.1–5). The next part of the figure illustrates a “return tracking” (1.6); then a drawing of the
figure as perceived by the subject; finally, at the bottom, the “true” figure to be perceived.
Figure 5. The traces produced when the computer algorithm specified in 3.1.1–7 is applied,
without further modification, to a figure which is a fairly tight three-quarters arc of a circle.
Upper left: the traces produced on one of the “return trackings” (3.1.6); upper right: drawing of
the figure “as perceived” for a total of five return trackings. The “drift” due to the fact that the
algorithm has no internal representation of absolute position in {x,y} co-ordinates is clearly
visible. The lower part of the figure shows the stored memory values in {θ, d} co-ordinates;
the arc of a circle appears in these co-ordinates as a linearly decreasing function.
Figure 6. The traces produced by a human subject in a situation analogous to that in Figure 5.
The upper part of the figure shows a “return tracking” (1.6); lower left is the “true” figure to
be perceived, and lower right a drawing of the figure as perceived.
direction in the line gives rise to a situation in which the micro-sweep unit of action
described in 3.1.4 fails to produce a sensory return. The algorithmic procedure in
this case is the same as described in 3.1.5 (as indeed it must be, since the subject
does not know what the situation is as seen by an external observer; from his point
of view it is quite possible that an “end of line” has been reached). In other words,
the procedure is again: (i) “backtrack” the last (empty) unit of action, and (ii) from
this point, draw a circle centred on the current value of {cx, cy}. However, if the
situation as perceived by an external observer is that of a broken line, the “circle”
will now produce two distinct sensory returns. Both algorithmically and for human
subjects, it is possible to identify which positive tactile stimulation corresponds to
the segment of line from which one has come, and which to the new segment of
line at an angle to the previous one; and in addition, to obtain an estimation of
the orientation θ of the new segment from the (angular) difference between the
two positive tactile stimulations. “Tracking” as in 3.1.4 can then resume on the new
segment. This sequence is illustrated in the upper right part of the trace in Figure 7.
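The orientation update after a two-contact circle can be sketched as below (Python, hypothetical names). It assumes that the contact belonging to the incoming segment is the one nearest the direction θ + π from which the trace came and, following the text, takes the new orientation from the angular difference between the two stimulations. This is exact only if the circle is centred on the vertex and approximate otherwise.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b, wrapped to (-pi, pi]."""
    d = (a - b) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d

def resume_after_corner(theta, contact_angles):
    """Given the current orientation and the phases (angles about the circle's
    centre) of the two stimulations, return (incoming_contact, new_theta)."""
    back = theta + math.pi                  # direction the trace came from
    a, b = contact_angles
    if abs(angle_diff(a, back)) <= abs(angle_diff(b, back)):
        incoming, new = a, b
    else:
        incoming, new = b, a
    # New orientation: back direction plus the angular difference between contacts.
    new_theta = back + angle_diff(new, incoming)
    return incoming, new_theta % (2 * math.pi)
```

For example, with θ = 0 and stimulations at phases π and 1.264 rad, the first is attributed to the incoming segment and the new orientation is 1.264 rad, whichever order the two phases are given in.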
3.2.3. Acute Angles
For very sharp angles – approximately in the range 5°–60° – it can happen that
before encountering a “silent” micro-sweep as in 3.2.2, a micro-sweep produces
not one sensory return (as is normally the case when “tracking”), but two distinct
returns. For an external observer, this can occur in the case of a sharp acute angle. In
a manner similar to 3.2.2, it is possible to identify which positive tactile stimulation
corresponds to the original segment, and which to the new segment (usually, the
latter will be the second positive tactile stimulation encountered towards the end
of the micro-sweep). The procedure here is to verify this hypothesis by continuing
tracking until a micro-sweep with no sensory return is achieved (for the external
observer, this corresponds to an advance to a point just beyond the point of the acute
angle); and then to return to the point at which a micro-sweep produces two distinct
positive tactile stimulations. Tracking can then resume on the segment identified as
the “new” one; an estimation (in radians) of the acute angle, and hence of the
new orientation, can be obtained by dividing the distance between the two positive
tactile stimulations (usually a little less than a) by the distance to the point of the
acute angle. This sequence is also illustrated in Figure 7, in the lower right part of
the figure.
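The angle estimate is a simple ratio. The Python sketch below (hypothetical names) also returns the resulting heading for the return along the new segment, assuming the sweep is orthogonal to the incoming segment and that `side` records whether the second stimulation fell to the left (+1) or right (-1) of the course. Under that geometry the exact angle would be atan(separation / distance), of which the ratio is the small-angle approximation and a slight overestimate.

```python
import math

def acute_angle_estimate(separation, dist_to_apex):
    """Estimate (radians) of the acute angle: distance between the two
    stimulations divided by the distance to the point of the angle."""
    if dist_to_apex <= 0:
        raise ValueError("distance to the apex must be positive")
    return separation / dist_to_apex

def new_orientation(theta, alpha, side):
    """Heading for tracking back along the new segment (assumed geometry)."""
    return (theta + math.pi - side * alpha) % (2 * math.pi)
```

With stimulations 0.2 apart at a distance of 2 from the apex, the estimate is 0.1 rad against an exact atan(0.1) of about 0.0997 rad.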
The analogous traces produced by a human subject are shown in Figure 8.
This completes the specification of the model strategy.
4. Discussion
As can be seen in Figures 1–8, the goal of articulating theory and experiment has
been realized: there is good qualitative and semi-quantitative agreement between
Figure 7. Upper left: the results of the strategies described in 3.2.2 (upper right part of the
trace, where a “sharp angle” (approximately 100°) is negotiated by a “circle”); and in 3.2.3
(lower right part of the trace, where an “acute angle” is negotiated by recognizing that a single
micro-sweep produces two distinct sensory returns). On the right is a drawing of the figure
“as perceived” for a total of four return trackings; the “drift” noted in the legend to Figure 5 is
again clearly visible. The lower part of the figure represents the values stored in memory.
Figure 8. The traces produced by a human subject in a situation analogous to that in Figure
7. Left: the traces produced on a “return tracking”; it can be seen that the subject employs
the “circle” strategy to negotiate the sharp angle on the upper part of the trace, and that a
micro-sweep produces two distinct sensory returns on approaching the acute angle on the
lower part of the trace. Centre: a drawing of the figure as perceived; on the right the original
figure.
the traces produced by a human subject, and the traces produced by a theoretical
model in the form of a computer algorithm. This positive result has the value of
an existence proof: there is at least one strategy, within the demonstrated capabilities of human subjects, that can be successfully modelled by a detailed, explicit
computer algorithm.
As mentioned in 2.3, the task of perceiving simple figures in the experimental
setup described here is an “inverse problem” which possesses a plurality of possible solutions. Previous experiments (Hanneton et al., 1999; Lenay et al., 1999)
show that after a period of familiarisation sufficient for each subject to stabilize
a preferred strategy, the “strategies of action” deployed by human subjects fall
into one of a plurality of regular, reliable forms. It will be the aim of future work
to characterize these strategies by regrouping them in a small number of major
categories, with possibly a number of variants within each category.
The achievement of convergence between a human strategy and a computer
algorithm has undoubtedly been facilitated by the reciprocal modelling option adopted here. In other words, the strategy effectively employed by the human subject
has been influenced by the modelling process itself. This influence could have been
eliminated by basing the modelling solely on examination of the empirical traces
produced, without questioning the subjects as to the strategy employed. However,
although this approach would be more stringent from one point of view, it would
actually be less rigorous from another point of view; the reason being that we
would then deprive ourselves of a resource for checking whether the algorithm as
modelled in the computer really does correspond to the strategy actually employed
by the human subject in order to generate the trajectories. Indeed, it would not
even be possible to verify that the procedure as specified by the computer algorithm is
concretely feasible for human subjects.
The important point here is the following: the fact that a computer algorithm is
capable of producing trajectories indistinguishable from those produced by human
subjects is not, in itself, a proof that the mechanisms generating the trajectories are
the same (this is a corollary feature of inverse problems). In other words, an algorithm based solely on observation of the trajectories would provide a description
of these trajectories; this description would have the merit of being economical and
mathematically tractable; but it would not necessarily correspond to an explanation
of the trajectories.
In a sense, then, the “reciprocal modelling” approach adopted here is actually
more ambitious than the more conventional type of modelling, since it aims at
explanation and not just description. For this reason, we propose to pursue this
approach in future work. Certainly, every effort should and will be made to maximize the adjustment of the model to the “spontaneous” empirical observations,
and to minimize the extent to which the strategies effectively employed by the
human subjects are influenced by the modelling process itself. However, even if it
is minimized, it is unlikely that the latter effect can ever be reduced to zero: if only
because the very act of requesting a human subject to render explicit what he has
hitherto been successfully doing without thinking consciously about it will almost
inevitably have a retroactive effect on the performance itself.
Notes
1 In the basic experimental setup, the sensory feedback is a single point of tactile stimulation. However, preliminary
experiments indicate that the sensory feedback can take the form of an auditory signal, or a visual signal (a
stationary square on the computer screen which is either black or white), with little or no modification in the
performances of the subjects.
2 These units have the convenient property that the total error is sqrt(δθ² + δd²), with the additional property that
this error is the same in {ix, iy} and {θ, d} coordinates.
3 The code implementing the algorithm described below, written in Visual C++, is available on request from:
[email protected]
References
Bach-y-Rita, P. (1972), Brain Mechanisms in Sensory Substitution, New York: Academic Press.
Brooks, R.A. (1987), Intelligence Without Representation, Boston: MIT Artificial Intelligence
Report.
Crowley, J.L. and Christensen, H.I. (1995), Vision as Process. Basic Research on Computer Vision
Systems, Berlin: Springer.
Gapenne, O., Lenay, C., Stewart, J., Bériot, H. and Meidine, D. (2001), Prosthetic Device and 2D
Form Perception: The Role of Increasing Degrees of Parallelism, in Proceedings of the Conference on Assistive Technology for Vision and Hearing Impairment (CVHI’2001), Castelvecchio
Pascoli, Italy.
Gibson, J.J. (1979), The Ecological Approach to Visual Perception, Boston: Houghton Mifflin Press.
Gullaud, L. and Vinter, A. (1996), The Role of Visual and Proprioceptive Information in Mirror-drawing Behavior, in M.L. Simner, C.G. Leedham and A.J.W.M. Thomassen, eds., Handwriting
and Drawing Research: Basic and Applied Issues, Amsterdam: IOS Press, pp. 99–113.
Hanneton, S., Gapenne, O., Genouel, C., Lenay, C. and Marque, C. (1999), Dynamics of Shape Recognition Through a Minimal Visuo-Tactile Sensory Substitution Interface, in Proceedings of the
Third International Conference on Cognitive and Neural Systems, Boston, pp. 26–29.
Kiper, D.C. and Carandini, M. (2002), The Neural Basis of Pattern Vision, London: Macmillan.
Lenay, C., Canu, S. and Villon, P. (1997), Technology and Perception: The Contribution of Sensory Substitution Systems, in Proceedings of the Second International Conference on Cognitive
Technology, Aizu, Japan, Los Alamitos: IEEE, pp. 44–53.
Lenay, C., Gapenne, O., Hanneton, S. and Stewart, J. (1999), Perception et Couplage Sensori-Moteur:
Expériences et Discussion Epistémologique, in A. Drogoul and J-A. Meyer, eds., Intelligence
Artificielle Située (IAS’99), Paris: Hermes, pp. 71–86.
Mandik, P. (1999), Qualia, Space and Control, Philosophical Psychology 12(1), pp. 47–60.
Maturana, H. and Varela, F.J. (1987), The Tree of Knowledge, Boston: Shambhala.
Marr, D. (1982), Vision: A Computational Investigation into the Human Representation and
Processing of Visual Information, San Francisco: W.H. Freeman.
O’Regan, J.K. and Noë, A. (2001), A Sensorimotor Account of Vision and Visual Consciousness,
Behavioral and Brain Sciences 24, pp. 939–1031.
Piaget, J. (1967), Biologie et connaissance: Essai sur les relations entre les régulations organiques
et les processus cognitifs, Paris: Gallimard.
Poggio, T. (1983), Visual Algorithms, in O.J. Braddick and A.C. Sleigh, eds., Physical and Biological
Processing of Images, Berlin: Springer, pp. 128–153.
Powers, W.T. (1988), An Outline of Control Theory, in The Control Systems Group Inc., Living
Control Systems, Kentucky, USA, pp. 253–293.
Schillings, J.J., Meulenbroek, R.G.J. and Thomassen, A.J.W.M. (1996), Decomposing Trajectory
Modifications: Pen-Tip Versus Joint Kinematics, in M.L. Simner, C.G. Leedham and A.J.W.M.
Thomassen, eds., Handwriting and Drawing Research: Basic and Applied Issues, Amsterdam:
IOS Press, pp. 71–85.
Stewart, J. (1996), Cognition = Life: Implications for higher-level cognition, Behavioural Processes
35, pp. 311–326.
Ullman, S. (1980), Against Direct Perception, Behavioral and Brain Sciences 3, pp. 373–415.
Varela, F., Thompson, E. and Rosch, E. (1993), The Embodied Mind, Boston: MIT Press.