
Face distinctiveness in recognition across
viewpoint: An analysis of the statistical structure
of face spaces
Alice J. O'Toole
School of Human Development
The University of Texas at Dallas
Richardson, TX 75093-0688, USA
[email protected]

Shimon Edelman
Department of Applied Math. and CS
The Weizmann Institute of Science
Rehovot 76100, Israel
[email protected]
Abstract
We present an analysis of the effects of face distinctiveness on the performance of a computational model of recognition over viewpoint change. In the first stage of the model, the face stimulus is normalized by being mapped to an arbitrary standard view. In the second stage, the normalized stimulus is mapped into a "face space" spanned by a number of reference faces, and is classified as familiar or unfamiliar. We carried out experiments employing a parametrically generated family of face stimuli that vary in distinctiveness. The experiments show that while the "view-mapping" process operates more accurately for typical than for distinctive faces, the base-level distinctiveness of the faces is preserved in the face space coding. These data provide insight into how the psychophysically well-established inverse relationship between the typicality and recognizability of faces might operate for recognition across changes in viewpoint.
1 Psychophysical background
The recognition of familiar faces is something
that people do very well. This is true even with
relatively dramatic changes in faces that can occur daily (e.g., hair style), and over longer periods of time as faces age. Somewhat distinct
from the problem of recognizing faces that have
changed in various ways is the problem of generalizing this recognition ability across varying
viewpoints. In this case, the 3D structure of a
face/head is relatively constant. The problem is
to determine whether or not we know a face when
we see it from different, even completely novel,
viewpoints. Notably, the problem of recognition
requires the ability to represent the information
in individual faces that makes them unique. The
problem of generalization across views entails the
additional requirement that this unique information be accessible across viewpoint variations.
It is evident that individual faces vary in the quality of the uniqueness information they provide for a face recognizer, whether human or computational. More simply stated, individual faces vary in how "distinctive" or unusual they
are, and hence in how likely they are to be mistaken for other faces. The relationship between
the distinctiveness of a face (as rated by human
subjects) and the accuracy with which human
observers recognize the face has been well established in the psychological literature: not surprisingly, distinctive or unusual faces are more
accurately recognized than are typical faces [9,
10, 15].
This finding has implications both for theoretical accounts of human memory for faces and for more applied issues concerning the factors that affect, e.g., the accuracy of eyewitness identification. From a theoretical perspective, many psychological and computational models of face processing have posited a representation of faces in a "face space," with a prototype/average face at the center, e.g., [2, 14, 15]. By this account, individual faces are encoded in terms of their deviation from the prototype face: typical faces are harder to recognize than unusual faces, because the face space is more "crowded" close to the prototype, making it easier to confuse typical faces with other (un)familiar faces. While these data are well established, they have been collected and applied almost exclusively to the problem of face recognition from a single viewpoint (though see [12] for an exception).
These data suggest that human performance depends on the statistical structure of the set of faces to which the observer has been exposed. This observation serves as the main guiding principle behind the model we describe next. This computational model builds on the basic psychological findings and extends them to consider the effects of face distinctiveness on recognition over viewpoint change.
2 Computational background
The central role of the statistics of the stimuli in
our model is motivated both by the psychological considerations surveyed above, and by the
growing importance attributed to the statistical
structure of the visual world in current theories
of visual processing. A number of researchers
have attempted to derive the shapes of the receptive fields found at the early stages of the visual system from the statistics of natural images
([6]; see [13] for a review). More recently, it has
been suggested that a similar approach may be
productive at the higher levels of vision, which
should be tuned to the statistics of natural objects (such as faces), rather than random scenes
[16].
Our model relies on the statistics of a collection of face shapes in two ways. First, the
common manner in which images of faces change
with viewpoint (due to the common 3D structure
of faces) is exploited at the initial stage of the
model, which performs normalization of the input image to a "standard" view of the face. The
normalized image is then compared to a number
of reference faces, which span our version of the
face space. At this second stage, the statistics
of the collection of faces with respect to a set of
reference faces constitutes the system's internal
representation of the face space. The rest of this
section describes the two stages of the model in
some detail.
2.1 The view space

Figure 1: The view-mapper. The way in which known faces change across viewpoint is exploited in deriving a normalized representation of a novel face seen from a familiar orientation.

As noted frequently in the vision literature, the human visual system is usually able to make sense of a 2D image, even when the object to which it corresponds was never before encountered under that particular combination of viewing conditions. A possible solution to this difficult computational problem is via class-based processing: assuming that the stimulus belongs to a familiar class, the visual system can take advantage of its prior experience with other members of that class in processing the image of a new member. For example, a normalizing transformation that brings familiar members of the class of faces into a normal form can be used to estimate the appearance of a less familiar face from some standard viewpoint, facilitating subsequent recognition of that face [8].

2.2 The shape space

Figure 2: The entire model. Following normalization of the stimulus image by the view-mapper [7], it is projected into a view-specific face space spanned by a set of reference faces [5].

At the recognition stage, the system must deal with a stimulus that may have been normalized (e.g., by class-based processing), but may still turn out to be unfamiliar, i.e., may not match any of the stimuli for which internal representations are available in long-term memory. Just as the problem of making sense of an unfamiliar viewpoint can be dealt with by exploiting the similarity of the view space of a given face to those of other members of the class of faces, we treat the problem of making sense of unfamiliar shapes by exploiting the similarity structure of the shape space, to which all the members of a certain class of objects (such as faces) belong. This is done by representing explicitly the similarities of the stimulus to a number of reference shapes [4]. The resulting scheme constitutes a useful method for class-based dimensionality reduction [5], and can support the representation of a potentially infinite variety of shapes from a given class.

3 Experiments

We carried out two sets of simulations, the first of which assessed the effects of face distinctiveness on the performance of the normalization procedure, and the second, its effects on the quality of the resulting face-space representations.

3.1 The stimuli

To characterize the effect of face distinctiveness on the functioning of the model, we had (1) to quantify the distinctiveness itself, and (2) to obtain a series of faces varying along the distinctiveness dimension. For this latter purpose, one may use synthetic parametrically controlled shapes [3, 8], or derive the parameter space from a set of real faces.
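The two-stage scheme described above can be summarized in a small sketch. This is an illustrative reconstruction, not the authors' code: the dimensions, the random placeholder matrix W standing in for a learned view-mapper, the Gaussian similarity function, and the familiarity threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_REF = 64, 5  # hypothetical image dimensionality and reference count

# Stage 1: a linear view-mapper W estimates the standard (full-face)
# view from a rotated view; here W is a random placeholder for a
# mapping that would be learned from familiar faces.
W = rng.standard_normal((D, D)) * 0.1

# Stage 2: a set of reference faces spanning the face space.
refs = rng.standard_normal((N_REF, D))

def normalize_view(x):
    """Map a rotated-view image vector to an estimated standard view."""
    return W @ x

def face_space_code(x, sigma=5.0):
    """Represent x by its Gaussian similarities to the reference faces."""
    d2 = ((refs - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

def is_familiar(code, theta=0.9):
    """Classify as familiar if some reference module responds strongly."""
    return bool(code.max() > theta)

stimulus = rng.standard_normal(D)
code = face_space_code(normalize_view(stimulus))
print(code.shape)  # one similarity value per reference face
```

The key design point is that the second stage never stores the stimulus itself: a face is represented only by its vector of similarities to the reference faces.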
Figure 3: The nine faces used in the generation of the stimulus set.

Because synthetic faces offer only a crude approximation to the rich 3D structure of the human face (unless a large investment is made in computer graphics), we decided to derive the dimensions of the shape space from a principal component analysis (PCA) of nine 3D laser scans of human faces (see Figure 3; three of these are distributed with SGI systems, and the other six are available over the Internet, courtesy of Cyberware Inc., as a part of their demonstration software).¹

¹ A similar approach to the generation of parametrically controlled face stimuli has been recently proposed in [1]. Also, a low-dimensional PCA-based representation of faces has been shown to be useful for quantifying gender, a part of the categorical information available in faces [11].

Figure 4: The weights of the nine faces used in the generation of the stimulus set, in the space of the first two eigenfaces.

This approach to the parameterization of the face space leads to a natural quantification of distinctiveness in terms of the parameter-space distance between a given face and the mean face, the parameters of a face being its projections onto the eigenfaces obtained by the PCA. The locations of the nine faces used in the PCA in the subspace spanned by the first two eigenfaces appear in Figure 4. We used eight of the faces² to generate 80 face stimuli, in the following manner. For each of the eight points in the face space, 10 versions were generated, corresponding to 10 equally spaced locations along the line connecting that point to the origin. For convenience and for later reference, faces numbered 1, 11, 21, ..., 71 were the least distinctive versions of the eight faces, while faces 10, 20, ..., 80 were the most distinctive versions of these faces. Each of the 80 faces was rendered from four viewpoints, starting at a full-face orientation, and proceeding by rotation around the vertical axis (in 22.5° increments) to 67.5°.

² Omitting face P, whose direction relative to the origin in the face space nearly coincided with that of Ha.
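The parameterization and stimulus generation can be sketched as follows. The data below are random stand-ins for the nine laser scans and the dimensionality is arbitrary; only the PCA recipe and the distance-from-the-mean distinctiveness measure follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)
faces = rng.standard_normal((9, 50))  # stand-ins for the nine 3D scans

# PCA: center on the mean face; the eigenfaces are the principal
# directions of the centered data.
mean_face = faces.mean(axis=0)
centered = faces - mean_face
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt  # each row is an eigenface

# A face's parameters are its projections onto the eigenfaces;
# distinctiveness is its parameter-space distance from the mean face.
params = centered @ eigenfaces.T
distinctiveness = np.linalg.norm(params, axis=1)

def versions(p, n=10):
    """n equally spaced points on the line from the origin (the mean
    face) to p; version n is the original face, version 1 the most
    typical variant."""
    return np.array([(k / n) * p for k in range(1, n + 1)])

v = versions(params[0])
print(v.shape)  # 10 versions, one per distinctiveness level
```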
3.2 Procedure
Our procedure was designed to assess the effects of distinctiveness at both stages of the computational model. Intuitively, the quality of the view-mapped faces is expected to be the best for typical faces (i.e., the ones that are closest to the average face). Typical faces are expected, however, to be the most difficult to recognize, while distinct faces, due to their location in the face space, should have the advantage at the second stage of the model's representation.
3.2.1 Face distinctiveness and the
view-mapper
Separate linear view-mappers were trained to produce estimates of the full-face view from each of three other views: 22.5°, 45°, and 67.5°. To test the generalization performance of the view-mappers, we employed standard "leave-one-out" cross-validation: a view-mapper was trained with all 10 distinctiveness versions of seven faces and was tested with all 10 distinctiveness versions of the "left-out" face. This procedure was repeated for all eight faces, resulting in view-mapped full-face estimates for all eight faces from each of the three views.
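A minimal sketch of this cross-validation loop, assuming the faces are represented as plain image vectors fit with least squares (the actual view-mappers of [7, 8] are richer; the random data and dimensionality here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N_FACES, N_VER = 30, 8, 10  # placeholder dimensions
X = rng.standard_normal((N_FACES, N_VER, D))  # e.g. 22.5-degree views
Y = rng.standard_normal((N_FACES, N_VER, D))  # full-face views

def fit_linear_map(A, B):
    """Least-squares W such that B is approximated by A @ W."""
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W

# Leave-one-out: train on all versions of seven faces, test on the
# left-out face; repeat for each of the eight faces.
estimates = np.empty_like(Y)
for i in range(N_FACES):
    train = [j for j in range(N_FACES) if j != i]
    W = fit_linear_map(X[train].reshape(-1, D), Y[train].reshape(-1, D))
    estimates[i] = X[i] @ W

print(estimates.shape)  # a full-face estimate for every test stimulus
```

Because the left-out face never contributes to training, its estimates measure generalization from the statistics of the other faces, not memorization.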
Figure 5: The performance of the view-mapper
declines with face distinctiveness and with the
disparity between the input and normal views.
Analysis 1. We first assessed the quality of the view-mapped face estimates as a function of face distinctiveness. View-map quality was measured as the cosine of the angle between the original full-face view and the view-mapper's estimate of this view (both defined as vectors). The results (Figure 5) show that: (1) view-map quality declines as the view-map angle increases; and (2) view-map quality declines as face distinctiveness increases (i.e., typical faces were better preserved than distinct faces in the normalization process, as expected).
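The quality measure is ordinary cosine similarity between the two vectorized images; a minimal implementation:

```python
import numpy as np

def view_map_quality(original, estimate):
    """Cosine of the angle between the true full-face view and the
    view-mapper's estimate, both flattened to vectors (1.0 means the
    estimate points in exactly the right direction)."""
    o, e = np.ravel(original), np.ravel(estimate)
    return float(o @ e / (np.linalg.norm(o) * np.linalg.norm(e)))

a = np.array([1.0, 0.0, 1.0])
print(round(view_map_quality(a, 2.0 * a), 6))  # scaling leaves the angle unchanged
print(round(view_map_quality(a, np.array([0.0, 1.0, 0.0])), 6))  # orthogonal vectors
```

Note that the cosine is insensitive to overall contrast (vector length), so it isolates errors in the pattern of the estimate rather than its magnitude.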
Analysis 2. Recognition of faces across viewpoint depends not only on the quality of the normalized (view-mapped) face estimate, but also, critically, on the extent to which the structure of face space is preserved across the normalization transformations. We examined this latter issue by analyzing the Procrustes distortion between the original full-face views and their view-mapped versions. This was done by applying Procrustes transformations³ to compare the similarity of original and view-mapped configurations, in which each face was represented by its coordinates in the space of the two leading eigenvectors derived from the face images. The Procrustes distance (the residual that remains after the application of the optimal transformation, and measures the discrepancy between the two configurations) was 2.91 for the 22.5° view-map condition, 3.183 for the 45° view-map condition, and 4.04 for the 67.5° view-map condition, all significantly better than estimates of the expected random distance, obtained by bootstrap, indicating the preservation of the original similarity structure of the face space by the view-mappers.⁴

³ The optimal combination of scale, rotation, translation, and reflection that minimizes the sum of squared distances between the corresponding points of two configurations.

⁴ Note that the above analysis was concerned with the preservation of the information in face images, rather than in the 3D head data. Procrustes analysis of the relationship between the similarity space of the 3D head data and that of its 2D representation (a full-face view) indicated that the 3D head and 2D view face spaces did not match well. In other words, view-based and 3D face codes make rather different predictions about the distinctiveness of individual faces; cf. [11].

Analysis 3. Finally, we examined the extent to which face distinctiveness influenced the distortion of the face space under view-mapping, by comparing Procrustes distances between the original frontal views and view-mapped versions of the faces for different levels of distinctiveness (see Figure 6). We found that the face space distortion increased with the size of the view change. In the two smaller view change conditions, the distortion was lower than the expected random distortion, estimated by bootstrap, in all 10 distinctiveness cases.⁵ Moreover, there was a relatively consistent relationship between face-space distortion and distinctiveness, with the lowest distortion for the least and the most distinct faces. Thus, while Figure 5 shows that view-map quality declines with increasing distinctiveness, the extent to which the structure of the similarity space is preserved does not follow a similar decline. Note that the rise in the distortion with distinctiveness suggests that the view-mapper loses more information from the distinct faces than from the typical faces. There is, however, more uniqueness information in the distinct faces to begin with; this effect, apparently, more than cancels the previous one, resulting in a downward trend in the Procrustes distortion as the distinctiveness continues to grow.
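The Procrustes machinery of footnote 3 can be sketched with a standard SVD-based formulation; the configurations below are synthetic stand-ins, and normalizing both configurations to unit norm (so the disparity falls in [0, 1]) is one common convention, not necessarily the exact one used in the paper.

```python
import numpy as np

def procrustes_distance(A, B):
    """Residual after optimally translating, scaling, and
    rotating/reflecting B onto A (smaller = more similar
    point configurations)."""
    # Remove translation: center both configurations.
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # Remove overall scale: normalize to unit Frobenius norm.
    A0 /= np.linalg.norm(A0)
    B0 /= np.linalg.norm(B0)
    # Optimal rotation/reflection via SVD (orthogonal Procrustes).
    U, s, Vt = np.linalg.svd(A0.T @ B0)
    R = Vt.T @ U.T
    scale = s.sum()
    # Disparity: sum of squared residuals after the optimal transform.
    return float(np.sum((A0 - scale * B0 @ R) ** 2))

# A configuration and a rotated, scaled, shifted copy of it should be
# at (numerically) zero Procrustes distance.
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 2))          # 8 faces in a 2-d eigenspace
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 3.0 * A @ R + 5.0
print(procrustes_distance(A, B) < 1e-9)  # True
```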
Figure 6: Procrustes distance between original and view-mapped faces as a function of face distinctiveness version and view-map condition.

⁵ The largest view change condition was different, so we will not interpret it further. The apparent difference between this result and that reported for the large view change condition in Analysis 2 is likely to be due to the loss of statistical power incurred in analyzing 10 faces, grouped by distinctiveness, as opposed to 80 faces.

3.2.2 Distinctiveness and the view-space

The effects of face distinctiveness on the face space representation were examined by projecting novel faces onto a set of reference faces and analyzing the resulting representations. We used 40 faces to train a Radial Basis Function (RBF) network. These reference faces were interleaved by distinctiveness (i.e., every other face: 1, 3, 5, ..., 11, 13, etc.), comprising 5 out of the 10 distinctiveness versions of the 8 original faces. The remaining 40 faces served for testing and were projected into the face space spanned by the responses of the reference-face RBF modules.

To assess the effects of face distinctiveness on the discriminability of novel faces projected into the face space, we plotted the corresponding projections directly, for different levels of distinctiveness (Figure 7). As expected, the face projections show maxima along the diagonals, due to the fact that these novel test faces were "neighbors" in the distinctiveness space to the learned faces. The extent to which there is activation off the diagonals is an indication that the model projections are confusable with other "nontarget" faces. The plotted data can be seen, therefore, to represent a confusion table of sorts. Note, first, that the relatively higher activation levels on the diagonal indicate that the similarity of the test faces to their neighbors in the learned set was sufficient to activate the RBF nodes of the learned neighbors. Of more direct interest, however, is the decrease in off-diagonal activation in the projection patterns for our parametrically more distinct face versions, effectively indicating lesser confusability of the distinct faces with other faces.

Figure 7: Face space projections for four levels of face distinctiveness, top left least distinctive, top right second most distinctive, etc. (the plot for the fifth level of distinctiveness, omitted to save space, was similar to the fourth one).
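The reference-face modules can be sketched as Gaussian RBF units centered on the learned faces, with a test face's face-space projection being the vector of module responses; the data, dimensionality, and RBF width below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 9
learned = rng.standard_normal((40, D))  # stand-in learned reference faces

def rbf_projection(x, centers, sigma=2.0):
    """Vector of Gaussian RBF responses of the learned-face modules.
    Off-center responses are the analogue of the off-diagonal
    activation (confusability with nontarget faces) in Figure 7."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

# A test face near learned face 7 activates module 7 most strongly,
# mirroring the diagonal maxima in the face-space projections.
test = learned[7] + 0.05 * rng.standard_normal(D)
resp = rbf_projection(test, learned)
print(int(resp.argmax()))  # 7
```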
4 Summary
We have presented a computational model of face recognition that is sensitive to the statistical characterization of faces on a number of levels, mirroring a similar sensitivity of human observers, and of numerous other models. In spite of the importance of this issue and its potential for bringing together human and computational data on face processing, the effects of individual face distinctiveness on the accuracy/nature of face processing have been little investigated in the context of computational models. Our preliminary investigation into this matter indicates that in recognizing faces over changes in viewpoint, the effects of face distinctiveness seem to operate paradoxically. The normalization process we apply to standardize viewpoint, while useful for preserving the richness of the perceptual information in faces, operates most efficiently for faces that are lacking in highly distinct perceptual information. The coding of faces in terms of their similarity structure with respect to a set of reference faces, while operating at a level of abstraction beyond the richness of the perceptual representation, retains information about the distinctiveness of faces. At this level, the confusability of a face with other faces is directly dependent on the statistical characteristics of the entire set of faces, and can be used to make psychophysical predictions about individual faces.
References

[1] J. J. Atick, P. A. Griffin, and A. N. Redlich. The vocabulary of shape: principal shapes for probing perception and neural response. Network, 7:1-5, 1996.

[2] M. Bichsel and A. Pentland. Human face recognition and the face image set's topology. Computer Vision, Graphics, and Image Processing: Image Understanding, 59:254-261, 1994.

[3] S. Edelman. Representation of similarity in 3D object discrimination. Neural Computation, 7:407-422, 1995.

[4] S. Edelman, F. Cutzu, and S. Duvdevani-Bar. Similarity to reference shapes as a basis for shape representation. In G. Cottrell, editor, Proceedings of COGSCI'96, San Diego, CA, July 1996. To appear.

[5] S. Edelman, D. Reisfeld, and Y. Yeshurun. Learning to recognize faces from examples. In G. Sandini, editor, Proc. 2nd European Conf. on Computer Vision, Lecture Notes in Computer Science, volume 588, pages 787-791. Springer Verlag, 1992.

[6] D. J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America, A 4:2379-2394, 1987.

[7] M. Lando and S. Edelman. Generalization from a single view in face recognition. CS-TR 95-02, Weizmann Institute of Science, 1995.

[8] M. Lando and S. Edelman. Receptive field spaces and class-based generalization from a single view in face recognition. Network, 6:551-576, 1995.

[9] L. L. Light, F. Kayra-Stuart, and S. Hollander. Recognition memory for typical and unusual faces. Journal of Experimental Psychology: Human Learning and Memory, 5:212-228, 1979.

[10] A. O'Toole, K. Deffenbacher, D. Valentin, and H. Abdi. Structural aspects of face recognition and the other-race effect. Memory and Cognition, 22:208-224, 1994.

[11] A. O'Toole, T. Vetter, N. Troje, and H. Bülthoff. Sex classification is better with three-dimensional head structure than with image intensity information. Perception, accepted.

[12] A. J. O'Toole and S. Edelman. Modeling face recognition across viewpoint. MPIK TR 21, Max Planck Institut für biologische Kybernetik, Tübingen, Germany, October 1995.

[13] D. L. Ruderman. The statistics of natural images. Network, 5:517-548, 1994.

[14] M. Turk and A. Pentland. Eigenfaces for recognition. J. of Cognitive Neuroscience, 3:71-86, 1991.

[15] T. Valentine and V. Bruce. The effects of distinctiveness in recognising and classifying faces. Perception, 15:525-535, 1986.

[16] Y. Weiss and S. Edelman. Representation of similarity as a goal of early visual processing. Network, 6:19-41, 1995.
5 Acknowledgements
This work was supported by a grant from the National Institute of Mental Health (1R29MH5176501A1) to A.O'T.