An estimate of the prevalence of developmental phonagnosia

Brain & Language 159 (2016) 84–91
Bryan E. Shilowich a, Irving Biederman a,b,⇑
a Department of Psychology, University of Southern California, United States
b Neuroscience Program, University of Southern California, United States
Article info
Article history:
Received 2 October 2015
Accepted 7 May 2016
Available online 1 July 2016
Keywords:
Phonagnosia
Voice recognition
Voice imagination
Prosopagnosia
Celebrity familiarity
Speaker familiarity
Famous voices
Imagery ratings
Abstract
A web-based survey estimated the distribution of voice recognition abilities with a focus on determining
the prevalence of developmental phonagnosia, the inability to identify a familiar person based on their
voice. Participants matched clips of 50 celebrity voices to 1–4 named headshots of celebrities whose
voices they had previously rated for familiarity. Given a strong correlation between rated familiarity
and recognition performance, a residual was calculated based on the average familiarity rating on each
trial, which thus constituted each respondent’s voice recognition ability that could not be accounted
for by familiarity. Of the respondents, 3.2% (23 of 730 participants) had residual recognition scores at least 2.28
SDs below the mean (whereas 8, or 1.1%, would have been expected from a normal distribution). Participants also
rated how well they could imagine the voices of five familiar celebrities. Individuals who had difficulty in
imagining voices were also generally below average in their accuracy of recognition.
© 2016 Elsevier Inc. All rights reserved.
1. Introduction
Prosopagnosia, or ‘‘face blindness,” is a well-studied phenomenon in which individuals cannot recognize the faces of people
with whom they are familiar (Behrmann & Avidan, 2005; Susilo &
Duchaine, 2013). Phonagnosia, the voice equivalent of prosopagnosia, is the inability to identify a familiar speaker from his or
her voice (Kreiman & Sidtis, 2011). As with prosopagnosia, this
condition can be ‘‘acquired,” as the result of a lesion (Duchaine
et al., 2010; Hailstone, Crutch, Vestergaard, Patterson, & Warren,
2010; Van Lancker & Canter, 1982) or ‘‘developmental,” likely congenital (Garrido et al., 2009; Kennerknecht et al., 2006). In broad
terms, we can consider the inability to identify a familiar face or
a voice as arising from (a) a poorly defined perceptual representation, or (b) a failure at matching a well-defined perceptual representation to previously stored representations. If the latter, we
would expect normal levels of performance in discriminating simple visual or auditory stimuli as well as discriminating unfamiliar
faces or voices. The evidence, although limited, suggests that the
majority of cases of prosopagnosia and, quite likely, phonagnosia
as well, are developmental and arise from deficits in matching current percepts to previously stored representations (Duchaine,
2011; Duchaine & Nakayama, 2006; Eimer, Gosling, & Duchaine,
2012; Susilo & Duchaine, 2013). Acquired conditions are more
likely to present deficits in low-level perceptual discriminations,
e.g., Xu and Biederman’s (2014) prosopagnosic MJH, and the
phonagnosic cases reviewed by Kreiman and Sidtis (2011);
although such deficits, of course, may be present in some individuals with developmental conditions, as in the two phonagnosics
identified by Roswandowitz et al. (2014).
⇑ Corresponding author at: University of Southern California, Neuroscience
Program, Hedco Neurosciences Bldg., 3641 Watt Way, Los Angeles, CA 90089-2520, United States.
E-mail address: [email protected] (I. Biederman).
http://dx.doi.org/10.1016/j.bandl.2016.05.004
0093-934X/© 2016 Elsevier Inc. All rights reserved.
Thousands of participants have taken tests for prosopagnosia on
faceblind.org, yielding an estimate of the incidence of developmental prosopagnosia of approximately 2% (Holden, 2006;
Kennerknecht et al., 2006). The present study provides an estimate
of the prevalence of developmental voice recognition deficits. Only
five developmental cases of phonagnosia have been reported in the
literature in which the primary deficit was in matching familiar
voices (Biederman et al., 2013, which included the phonagnosic
reported by Garrido et al., 2009; and Roswandowitz et al., 2014).
However, this should not be taken as an indicator of its rarity;
phonagnosics may either be unaware they have a deficit or, if they
are, may simply not have been discovered by researchers. The present investigation was not designed to provide a theoretical characterization of phonagnosia. Xu et al. (2015) investigated two
broad potential bases for phonagnosia, (a) an inability to develop
a perceptual representation sufficient to distinguish similar voices,
and (b) an inability to match a clear perceptual representation to
previously stored representations of voices. Their intensive investigation of phonagnosic AN clearly established that her deficit was in
matching familiar voices over the long term. She showed no deficit in distinguishing highly similar voices over the short term (e.g., over 20 s).
An estimate of the prevalence of phonagnosia (0.2%) was
reported by Roswandowitz et al. (2014) with a web-based test in
which speakers were taught to recognize new voices and only later,
based on a selection of performance on this voice-learning test,
were a subset of these participants invited to the laboratory for
testing on familiar voice recognition. The present study, in its initial assessment of the recognition of familiar celebrity voices (as
opposed to newly taught voices), would thus appear to provide a more direct estimate of the core capacity
for voice individuation, in the same manner that the more direct
measure of prosopagnosia would be a deficit in the recognition
of familiar faces. In addition, our assessment avoided a possible
selection bias of willingness to engage in a lab-based investigation.
The challenge in our assessment was to achieve the estimate in the
face of varying familiarity of the participants with the target voices.
2. Materials and methods
Voice recognition ability was assessed through three highly
similar web-based Celebrity Voice Recognition surveys (Fig. 1).
The largest pool of respondents (n = 534) consisted of undergraduates at the University of Southern California who received
credit for their participation in courses in psychology. They were
tested between August and December of 2013. A subsequent sample (n = 376) of undergraduates in the Introduction to Psychology
course at USC was acquired between January and February of
2014 as part of ‘‘prescreening” testing to determine individual
characteristics for follow-up experiments. The final survey was
conducted from August 2013 to February 2014 on USC’s Image
Understanding Laboratory’s website (http://www.geon.usc.edu/)
where 67 participants either voluntarily found or were directed
to the test out of interest in the phenomenon. All three surveys
were identical with the exception of the imagination section and
phrasing of some of the personal history and judged capacity questions, noted in later sections. The surveys each began with basic
descriptive items asking for age, sex, handedness, and occupation.
Next were self-judgments of voice and face recognition abilities,
rated on a three point scale from below to above average. Last, participants provided information on how long they had lived in
America and/or been immersed in American culture as the recognition test required familiarity with American celebrities. 95 of
these respondents also took a celebrity face recognition test
(https://usc.qualtrics.com/SE/?SID=SV_1UiJND20qc6h0a1).
Our test was modeled after the one on faceblind.org in which the participants had to provide some identifying information about the
celebrity, e.g., profession, rate the celebrity’s familiarity, and self-score
the accuracy of the identification.
After the three surveys described above, an additional, fourth
survey was conducted with 423 participants. This survey included
two neural history questions and self-assessments of basic auditory and visual functioning (with deficits expected to be quite rare
in this young, largely college-attending population) which are
included in Table 1. There were sufficient differences in the familiarity ratings and voice recognition test formats for these participants that their recognition data are not included in the present
results although their data are highly similar to those of the prior
surveys. They are noted here for their contribution to the self-assessment and personal history items in Table 1.
2.1. Pretest familiarity assessment
Participants first completed a survey of their familiarity with
celebrity speaking voices. They were presented with a list of 100
celebrities–entertainers, politicians, and public figures–popular in
mainstream culture. Each celebrity was shown with their name
and a headshot. This list was originally generated by phonagnosic
subject, AN, a 20-year-old female student, well steeped in popular
culture, as part of a prior study (Xu et al., 2015). In the first three
surveys the participants indicated by mouse click those celebrities
whose speaking voices were unfamiliar to them. Many of the participants indicated that they were familiar with all or almost all of
the 100 celebrities.
2.2. Recognition Task
On each of 50 trials, participants listened to two 6–8 s voice
clips extracted from interviews found on the internet, chosen
specifically not to convey any information about identity or profession. Although the original comparison (Xu et al., 2015) of a
phonagnosic with 20 controls used 100 trials on the recognition test, initial testing on the larger web-based surveys indicated that many respondents did not complete the full hour-long
test so the test was reduced to the 50 most familiar celebrities,
which required approximately 30 min for completion. Respondents were to match each clip to 1, 2, or 4 celebrity targets specified by name and photo (Fig. 1). The restriction in the number of
alternatives on each trial was incorporated into the design because
prior research (e.g., Legge, Grosman, & Pieper, 1984), as well as a
pilot study in our own laboratory, had shown that voice recognition accuracy among a large and unconstrained set is extremely
low. One of the clips was the voice of a pictured celebrity and
the other was of a non-famous person. The foils matched the celebrity target in sex, and generally in race, accent, and approximate
age. Upon listening to the voice clips, the participants chose the
voice that they believed matched one of the pictured celebrities.
Participants clicked on a five-point confidence rating scale following both their choice as to which clip was the celebrity and (for the
two- and four-choice conditions), which celebrity matched the
clip. (In the one-target condition, the selection of the celebrity by
default designated the particular target so only the question as to
which one was the celebrity was posed.) There were a total of 50
trials: 16 each with one and four targets and 18 with two targets, randomly ordered. The test, including the Pretest Familiarity Assessment and Imagination Ratings, described below, took
approximately 30 min to complete.
2.3. Imagination ratings
After the recognition trials, the participants were instructed to
name five celebrities who were not on the test whose speaking
voices were familiar to the subject. For each celebrity named, the
participants then rated on a 1–5 scale how well they could imagine
that celebrity’s voice. The participants in the first survey and from
the lab website (see below) were provided the guidelines ‘‘[1] I
simply could not imagine the celebrity’s voice; [3] I only had a
vague image of their voice that was not particularly distinctive
from other voices; [5] I was able to clearly imagine the celebrity’s
voice.” The second survey provided the following, more specific
instructions for voice imagination ratings (adapted from Marks,
1973): ‘‘[1] No auditory imagery at all, you only know that you
are thinking of the person’s voice; [2] Vague and dim; [3] Moderately clear and vivid; [4] Clear and reasonably vivid; [5] Perfectly
clear and as vivid as normal hearing.” The mean imagination ratings between the two different wordings of the instructions were
virtually identical (mean ratings for the first pool set = 4.38; second
set = 4.36, two-sample t-test t(771) < 1.00, d = 0.02). Thus the data
from all three pools were combined for further analysis. No respondent run in the lab ever complained of being unable to judge such
imagery. All participants, even those who could not imagine
human voices, gave ratings near ceiling for non-voice sounds such
as inanimate objects (‘‘glass breaking”), natural sounds (‘‘ocean
waves crashing”), or animal sounds (‘‘pig oinking;” ‘‘frog ribbiting”)
(Xu et al., 2015).
Fig. 1. Sample screen shot of a four-celebrity trial. (The headshots on the test were in color.) Participants listened to the clips by pushing the ‘‘play” buttons and then selected
the bubble for Voice 1 or Voice 2 to choose which voice was one of the celebrities. The specific identity of that choice was chosen with the bubble options under the headshots.
Confidence ratings for both the voice and identity choices were made with the five-star scale. Trials with two celebrities were exactly the same format but with only two
celebrity pictures. Trials with one celebrity displayed only the top half – the voice choice and confidence – as the identity was defined as the one pictured celebrity.
Table 1
Sample characteristics, self-reports of neural history, and subjective judgments of
quality of sensory systems and face and voice recognition ability.

Item                                                                  Value
Mean age [SD] (n = 1153)                                              20.34 [3.51]
% Right handed (n = 1153)                                             90.95
% Female (n = 1153)                                                   74.28%
Have you had brain trauma that may affect face perception? (n = 422)
    Yes                                                               1.18%
    No                                                                96.92%
    Unsure                                                            1.90%
Have you had brain trauma that may affect voice perception? (n = 423)
    Yes                                                               0.24%
    No                                                                98.35%
    Unsure                                                            1.42%
How would you rate your sense of sight? (n = 422)
    Above average                                                     33.65%
    Average                                                           54.50%
    Below average                                                     11.85%
How would you rate your sense of hearing? (n = 423)
    Above average                                                     28.37%
    Average                                                           64.30%
    Below average                                                     7.33%
How would you rate your ability to recognize faces?
    Excellent (n = 423)                                               15.60%
    Above average (n = 1148)                                          44.64%
    Average (n = 1148)                                                45.51%
    Below average (n = 1148)                                          4.01%
    Poor (n = 423)                                                    0.24%
How would you rate your ability to recognize voices?
    Excellent (n = 423)                                               5.44%
    Above average (n = 1148)                                          21.25%
    Average (n = 1148)                                                70.47%
    Below average (n = 1148)                                          6.1%
    Poor (n = 423)                                                    0.47%
Note: Different sample sizes are a consequence of different questions and different
alternative answer possibilities being asked on different surveys.

2.4. Data analysis inclusion criteria
977 respondents started the survey but the data from a total of
247 individuals were excluded, 150 because they did not finish the
test. Most (121) of these were people who logged on and might
have answered a few of the demographic or personal judgment
questions but never made it to the first trial; the other 29 reached
the first trial but quit before the end. It is possible that some of
the 29 who started the test but then did not finish did so because
they were having difficulty and thus their absence from the data
could have had some effect on the results. However, given that
121 participants abandoned the test for other reasons (losing interest, technical issues, competing activities, etc.), these reasons
could plausibly apply to the 29 as well. Ultimately, there is no
way to be certain; however, we can say that maximally 3% of the
pool opted out.
There were also 66 people who reached the end but appeared to
not give the test a reasonable effort. Individuals in this last group
were primarily undergraduates taking the online survey for credit
in their Introductory Psychology course. Detection flags for poor
effort included skipping questions, taking the test too quickly, scoring at or below chance, and having equal confidence ratings for
correct and incorrect answers. Another criterion was the removal
of any of the remaining participants who answered that they were
unfamiliar with Barack Obama’s voice. Given that the President has
a distinct and ubiquitous voice, a resident of the U.S. rating it as
‘‘unfamiliar” seemed suspect with respect to responsiveness to
the test. Lastly, because the test was based on English-speaking
celebrities recognizable to those steeped in American culture, 31
respondents who lived in America for less than five years were also
removed from further analysis. Applying these criteria trimmed
the set from 977 original respondents to 730 participants, 560
female, mean age 20.6 (SD = 4.0), 658 right handed.
3. Results
3.1. Subject characteristics
Table 1 shows the breakdown of the participants according to
the initial demographic, neural history, and self-assessment questions. The primary criterion for phonagnosia is the inability to recognize familiar voices. Of course, one must rule out lower level
auditory deficits, which would be expected to be rare in the sampled population (generally, 18–22 year-old college students) and
the selection of participants with stated familiarity to celebrity
voices. The 423 respondents in the most recent survey were asked:
‘‘How would you rate your sense of hearing? (If hearing is corrected, rate as corrected [by bubble click].)” 120 rated their hearing
as above average, 272 as average, and 31 as below average. The 31
participants who rated their sense of hearing as below average
scored 72.8% on the recognition test, slightly, but nonsignificantly,
below the recognition performance (75.7%) of participants with
self-reported average and above average hearing, t(421) = 0.92,
p = 0.35, d = 0.19. Only one subject reported a history of brain
trauma. That subject’s voice recognition accuracy corrected for
familiarity (as discussed below) was below average but not statistically deviant (a score residual of 10.2).
3.2. Celebrity recognition test
A trial was scored as correct if both the voice of the celebrity
and the correct celebrity were selected. (On trials with only a single celebrity, the selection of the correct voice was all that was
required.) The overall mean voice recognition score was 76.7%
(SD = 13.5%, chance = 29%). Effects of differences in sex (79.4% for
males vs. 75.9% for females, t(728) = 2.68, p = 0.008, d = 0.25)
and handedness (right 76.4% vs. left 79.8%, t(728) = 1.836,
p = 0.0675, d = 0.17) were small but reliable or nearly reliable given
the very large sample size. There was a positive correlation, r(728)
= 0.570, p < 0.001, between the number of celebrities whose speaking voice was judged to be familiar on the pretest and the recognition accuracy on the test (Fig. 4). (As described later, this advantage
for those familiar with the celebrities held even when individual
trials were ‘‘corrected” for unfamiliarity.) Age had a small but reliable positive correlation with accuracy, r(728) = 0.17, p < 0.001,
likely due to greater target exposure as age increased (more closely
matching the age of the then 20 year old who generated the list
four years ago). Slight positive, but reliable, correlations were
found between performance on the test and self-ratings of voice
recognition and face recognition abilities, r(728) = 0.14 and r
(728) = 0.15, both ps < 0.001. The correlation between the imagination ratings and recognition accuracy was also reliable, r(696)
= 0.26, p < 0.001, and would likely have been higher except that a
large portion of the participants rated their imagery at or near ceiling as shown in Fig. 8. There was a positive correlation between
face and voice recognition accuracy, r(93) = 0.28, p < 0.01, for those
95 participants who took both tests. This correlation may be
inflated by the overlap in familiarity of the celebrities as the same
set of celebrities served as stimuli in both tests.
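The scoring rule and the overall chance level quoted above can be checked with a short calculation. The following is a minimal sketch, not the authors' scoring code: the function names are hypothetical, and the two-stage chance model (one of two clips, then one of the pictured celebrities) is the one described in the caption of Fig. 3, applied to the trial mix given in Section 2.2.

```python
from fractions import Fraction

# Trial mix reported in Section 2.2: pictured celebrities -> number of trials.
TRIALS_PER_CONDITION = {1: 16, 2: 18, 4: 16}


def chance_level(n_celebrities: int) -> Fraction:
    """Chance of a correct trial under two-stage guessing.

    Stage 1: choose which of the two clips is the celebrity (1/2).
    Stage 2: choose which pictured celebrity it is (1/n).
    """
    return Fraction(1, 2) * Fraction(1, n_celebrities)


def overall_chance(trials=TRIALS_PER_CONDITION) -> Fraction:
    """Trial-weighted chance level across all conditions."""
    total = sum(trials.values())
    return sum(n * chance_level(k) for k, n in trials.items()) / total


def trial_correct(chose_celebrity_clip: bool,
                  chose_right_celebrity: bool,
                  n_celebrities: int) -> bool:
    """A trial is correct only if both choices are correct; on
    one-celebrity trials only the clip choice is required."""
    if n_celebrities == 1:
        return chose_celebrity_clip
    return chose_celebrity_clip and chose_right_celebrity


if __name__ == "__main__":
    print(float(overall_chance()))
```

Under these assumptions the weighted chance level works out to (16 × 0.5 + 18 × 0.25 + 16 × 0.125) / 50, which agrees with the 29% reported in the text.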
The overall familiarity rating for the 730 participants who satisfied the criteria for test completion was 0.73 (SD = 0.17). Fig. 2
shows that there was a positive correlation between each subject’s
average familiarity rating (across all trials) and recognition accuracy, r(728) = 0.569, p < 0.001.
Fig. 2. Voice recognition performance as a function of the rated familiarity of the
options on each trial. This scatterplot, r(728) = 0.569, contains only those participants, (n = 730), who satisfied all criteria for test completion.
Fig. 3. Recognition as a function of the number of celebrity options on each trial
and the participants’ familiarity with the celebrity voices (0 = unfamiliar; 1 = familiar) on each trial. Given the two-stage scoring of first judging which of two voices is
the celebrity and then selecting which celebrity’s (among two or four choices) voice
was heard, the chance levels are 0.5, 0.25, and 0.125 for 1, 2 and 4 celebrity trials,
respectively. The number of trials at each level ranged from 433 (0 familiarity, 4
Celeb trial) to 3600 (1 familiarity, 1 Celeb trial). The average number of trials
represented by the other points (removing the ends of the range) is 1777 (SD = 784).
To assess the effect of familiarity on recognition performance on
individual trials, for each trial a mean familiarity rating (0 = unfamiliar; 1 = familiar) for the celebrity options was calculated using
the data from the pretest. Familiarity ratings on a given trial could
thus range from 0 to 1, with greater numbers of celebrity choices
leading to greater familiarity possibilities (e.g., one-celebrity trials
were binary, 0 or 1, but ratings on four celebrity trials were in
increments of 0.25). The effect of within-trial familiarity on recognition ability is shown in Fig. 3. The performance above chance for
the 0 familiarity cases for the one and four celebrity trials is perhaps due to participants having some familiarity to celebrities’
voices that they marked as ‘‘unfamiliar” but were not confident
in their memory of the exposure. The distinct monotonic functions
for the remainder of the cases document the benefit of voice familiarity on recognition.
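The per-trial familiarity measure described above is simple enough to sketch. This is an illustration under stated assumptions rather than the authors' code: the function names and the placeholder celebrity labels are hypothetical, and the input is taken to be the set of celebrities a participant marked as unfamiliar on the pretest.

```python
def trial_familiarity(options, unfamiliar):
    """Mean 0/1 familiarity over the celebrity options shown on one trial.

    options    -- names of the celebrities pictured on the trial
    unfamiliar -- set of names the participant marked as unfamiliar
    """
    ratings = [0 if name in unfamiliar else 1 for name in options]
    return sum(ratings) / len(ratings)


def possible_levels(n_options):
    """All familiarity values a trial with n_options celebrities can take."""
    return [k / n_options for k in range(n_options + 1)]


if __name__ == "__main__":
    # Placeholder labels, not celebrities from the actual test.
    marked_unfamiliar = {"Celebrity B", "Celebrity C", "Celebrity D"}
    four_choice = ["Celebrity A", "Celebrity B", "Celebrity C", "Celebrity D"]
    print(trial_familiarity(four_choice, marked_unfamiliar))
    print(possible_levels(1), possible_levels(2), possible_levels(4))
```

One-celebrity trials can only take the values 0 or 1, whereas four-celebrity trials move in steps of 0.25, as noted in the text.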
For the trials with two and four celebrity choices, the data were
further analyzed according to whether the participants were
specifically familiar with the target’s (i.e., correct) voice rather than
the foil on each trial. On a two choice trial, being familiar with the
foil’s but not the matching celebrity’s voice might have yielded
recognition scores equal to the case where it was the target but
not the foil’s voice that was familiar, as they both allow elimination
of one of the two alternative celebrities. But this was not the case.
Fig. 4 shows that there is a marked benefit in recognition accuracy
conferred by familiarity with the target voice. Having a positive
confirmation of the target voice led to better performance than a
negative elimination model, i.e., one that is not the target, even
though (most simply in the 0.50–0.50 case), they would be
expected to be equally informative.
A possible reason for this effect is that those participants with
high familiarity scores to a greater extent live in the same neighborhood of ‘‘celebrity space” with AN, the original generator of
the celebrity set, than those who were not as familiar with the
celebrities. To analyze this potential neighborhood effect, we split
the data into high and low familiarity groups (each n = 365). The
mean performance of each group was assessed for each specific
within-trial familiarity level (Fig. 5). Independent of the familiarity
level on a given trial, there was an advantage of simply knowing
more of the celebrities on the test. The high familiarity group performed better than the low familiarity group at every individual-trial familiarity level at p < 0.0001 (t-values and Cohen’s d for each
familiarity level - 0: t(668) = 7.53, d = 0.60; 0.25: t(450) = 31.65,
d = 2.84; 0.5: t(714) = 30.58, d = 2.28; 0.75: t(698) = 38.58, d = 2.82;
1: t(728) = 28.78, d = 2.13).
Fig. 4. Effect of being familiar versus unfamiliar with the target voice across the individual familiarity levels. With the exception of the nonsignificant 0.25 Trial Familiarity
value, all target familiar-unfamiliar comparisons are significant at p < 0.001.
Fig. 5. Individual Trial Performance based on Total Familiarity. Participants were
split into high and low total familiarity groups (each n = 365) based on the total
number of celebrities rated as being familiar in the pretest. At equal individual trial
familiarity levels, participants who were familiar with a greater number of
celebrities performed better than participants who were less familiar with those
celebrities.
Fig. 6. Ranking of 730 participants by their recognition residuals based on their
familiarity with the target voices on individual trials.
Fig. 7. Distribution of residuals on the voice recognition test (n = 730) with actual
proportions in black and expected proportions from a normal distribution in gray.
The ordinate is the percentage of the total sample. The values for the bins on the
abscissa represent the highest (most positive) value for that bin, either positive or
negative. The lowest bin was 2.28 SDs below the mean. 23 participants (3.2%) were
in that bin; 8 (1.1%) would have been expected from a normal distribution, 99.5%
Confidence Interval = ±0.012, z_prop1-prop2, p < 0.001.
Given that performance on the recognition task was a function
of how familiar each subject was with the target celebrities, subsequent analyses of performance were based on each subject’s residual in the regression function of recognition accuracy as a function
of mean per trial familiarity shown in Fig. 3. The distribution of
these residuals is shown in Fig. 6.
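The residual measure can be illustrated with an ordinary least-squares fit of accuracy on familiarity. The sketch below assumes a simple one-predictor linear regression and a cutoff expressed in standard deviations of the residuals; the function names and the synthetic data are hypothetical and are not the study's data or necessarily the exact model the authors fitted.

```python
import statistics


def fit_line(x, y):
    """Least-squares slope and intercept for y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Observed accuracy minus the accuracy predicted from familiarity."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def flag_low(res, cutoff_sd=2.28):
    """Indices of residuals at or below cutoff_sd SDs under the mean."""
    mean, sd = statistics.fmean(res), statistics.stdev(res)
    return [i for i, r in enumerate(res) if (r - mean) / sd <= -cutoff_sd]


if __name__ == "__main__":
    # Synthetic example: accuracy rises with familiarity, except for one
    # participant who scores far below what familiarity would predict.
    familiarity = [i / 29 for i in range(30)]
    accuracy = [50 + 40 * f for f in familiarity]
    accuracy[15] -= 30
    print(flag_low(residuals(familiarity, accuracy)))
```

Because the residual removes what familiarity predicts, a participant is flagged only when recognition is poor relative to others with similar familiarity, not merely because few of the celebrities were known.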
The residuals constitute people’s voice recognition abilities that
cannot be accounted for by their familiarity with the targets. The
standard deviation of the residuals was 11.4% on the recognition
task. A distribution of the residuals (Fig. 7) into 15 bins yielded a
distribution that was unimodal and fairly symmetric but departed
from a Gaussian, χ²(14) = 77.14, p < 0.001, primarily with more
peakedness (lower kurtosis) but a higher than expected frequency
in the lowest bin. That bin was 2.28 SDs below the mean with 23
participants (19 female), 3.2% of the total, whereas only 8 (1.1%)
would have been expected from a normal distribution. The difference between the obtained and expected proportions was highly
reliable, 99.5% C.I. = ±1.2%, p < 0.001.
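The expected count in the lowest bin follows directly from the normal distribution. The sketch below reproduces that expectation and, as an additional check that is not the test reported in the paper, computes an exact binomial upper-tail probability for observing 23 or more cases; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_low_tail(n, cutoff_sd):
    """Proportion and count expected at or below -cutoff_sd under normality."""
    p = normal_cdf(-cutoff_sd)
    return p, n * p


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    lower = sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
                for i in range(k))
    return 1.0 - lower


if __name__ == "__main__":
    n, observed, cutoff = 730, 23, 2.28
    p_tail, expected = expected_low_tail(n, cutoff)
    print(f"expected proportion: {p_tail:.4f}, expected count: {expected:.2f}")
    print(f"observed proportion: {observed / n:.4f}")
    print(f"binomial P(X >= {observed}): {binom_upper_tail(observed, n, p_tail):.2e}")
```

With n = 730 and a cutoff of 2.28 SDs, the expected count rounds to the 8 participants (1.1%) cited in the text, against the 23 (3.2%) observed.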
Fig. 8. ROC curves generated from the confidence ratings of the highest vs. lowest
scoring participants (23 in each group) on the recognition test. Mean d′ (across all
confidence levels) for the high recognition scorers was 1.89; for the low recognition
scorers it was 1.08.
3.3. Confidence when correct
We analyzed the confidence ratings on the voice and identity
choices to determine whether, when participants answered correctly, they were confident that they were correct (thus rating a
confidence of 5) and whether this was true for high scoring participants as it was for low scoring participants.
Separate ROC curves were generated using the confidence ratings of the 23 participants with the lowest and 23 with highest
residual recognition scores (Fig. 8). Each point on the curve represents the mean accuracy of the participants’ assessments of their
own correctness at each confidence level. The accurate discrimination of performance among the lower recognition scorers serves as
an indication that these participants were taking the test questions
as seriously as the better performers.
3.4. Imagery ratings
In contrast to acquired prosopagnosics, developmental
prosopagnosics evidence an inability to imagine familiar faces
(Michelon & Biederman, 2003; Tree & Wilkie, 2010) although there
is some question as to whether this inability is confined to faces or
reflects a general deficit in imagery (Grüter, Grüter, Bell, & Carbon,
2009; Grüter, Grüter, & Carbon, 2008). Xu et al. (2015) reported
that the three confirmed developmental phonagnosics that they
tested were unable to imagine familiar voices but could readily
imagine non-voice sounds of animals and inanimate objects. Given
a possible theoretical parallel between developmental prosopagnosia and developmental phonagnosia, we assessed the relation
between recognition performance and imagination ratings. A complicating factor to such an analysis was that there was a strong
propensity for participants, somewhat independent of their performance on the recognition test, to give the highest imagery rating, a
5 (‘‘Perfectly clear and as vivid as normal hearing”) to the celebrities that they had individually selected to be the targets on the
imagery test. In part to minimize the influence of this bias, in addition to comparing the imagery ratings of those (the ‘‘outliers”) with
the highest and lowest residuals on the recognition test (±2.28 SDs
from the mean), we assessed the recognition performance of participants whose imagery scores were at or below 2.28 SDs below
the mean, as those participants would not have the bias to rate
their imagery at ceiling.
Voice imagery ratings of high vs. low scoring participants on the
recognition test. Of the 23 low-recognition outliers, three were
removed from this analysis due to non-completion of the imagination section. Of the 20 that remained, two participants had voice
imagination scores on the five imagination trials that were 2 SDs
below the mean (M = 4.42, SD = 0.62), with ratings of 2.4 and 2.8.
With respect to their ratings of their own voice recognition ability,
fifteen rated themselves as average and three above average. Two
participants, one of whom had the second-lowest voice recognition
score in the distribution of residuals (38.65%), rated their voice
recognition abilities as below average. Ten rated their ability at
face recognition to be above average; nine at average, one below
average. Looking specifically at the recognition outliers at the
low end of the distribution and the twenty best recognition performers, the two groups’ imagination ratings were distinct, with
a mean of 4.6 (SD = 0.52) for the high recognition group and 4.2
(SD = 0.71) for the low group, t(38) = 2.23, p = 0.031, d = 0.64.
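The effect size for this group comparison can be recovered from the reported summary statistics. The following sketch assumes the conventional pooled-SD form of Cohen's d and an independent-samples t statistic with equal-variance pooling; the function names are hypothetical. Because the published means and SDs are rounded, the t value computed this way need not match the reported one exactly.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t and its degrees of freedom from summaries."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2


if __name__ == "__main__":
    # Imagination ratings: high-recognition vs. low-recognition groups.
    high = (4.6, 0.52, 20)
    low = (4.2, 0.71, 20)
    print(f"d = {cohens_d(*high, *low):.2f}")
    t, df = t_from_summary(*high, *low)
    print(f"t({df}) = {t:.2f}")
```

The degrees of freedom (38) follow from the 20 participants in each group.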
Fig. 9. Scatterplot of Imagery Ratings and Recognition Test Score Residuals for 698
participants (out of the 730) who completed the imagination ratings. The
correlation between score residual and rated voice imagery ability was r(696)
= 0.21, p < 0.001. The correlation likely would have been higher had it not been that
many higher scoring participants on the recognition test rated their voice imagery
at ceiling. Each X is a single subject with the regression function depicted.
Recognition performance of participants with low imagery ratings.
23 participants scored at or below 2.28 SDs below the mean
(Fig. 9). 18 of those 23 participants (78.2%) had negative recognition score residuals, zprop = 4.72, p < 0.001. That is, low voice imagery scores are more likely to come from participants with low
(i.e., negative) residual recognition scores. Four of these low-scoring voice recognition and voice imagery participants also took
the face recognition test. Although they tended to be below the
mean, none of them fell beyond 2.28 SDs below the mean.
4. Discussion
To the authors’ knowledge, this is the largest survey ever conducted
on the accuracy of the recognition of familiar voices to provide an
assessment of the prevalence of phonagnosia. The web assay of
Roswandowitz et al. (2014) focused on discriminating newly
learned voices. It began with a web-based test (n = 1047) in which
participants learned to recognize new voices. Those who
had difficulty on that phase (n = 233) were asked to complete a
written questionnaire to which 55 participants responded. Of
those, five participants were contacted by phone for further interviews and a lab study. Ultimately only two of these participants
passed more thorough neurological screening and were administered a battery of tests, one of which assessed recognition of celebrity voices. Roswandowitz et al.’s estimate of phonagnosia
prevalence of 0.2% was thereby based on the 233 participants
selected after the initial selection of difficulty in learning new
voices (followed by an interview phase with a 23.6% response rate).
One of the two participants judged to be phonagnosic, AS, was reliably below average in discrimination of newly learned voices but
scored above average in naming celebrity voices.
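The arithmetic of this selection funnel can be laid out explicitly. The sketch below uses only the counts given above; treating the 0.2% prevalence as the two confirmed cases divided by the 1047 initial participants is an assumption made for illustration, and the stage labels are paraphrases.

```python
# Recruitment stages as described in the text for Roswandowitz et al. (2014).
stages = [
    ("completed web-based test", 1047),
    ("had difficulty learning new voices", 233),
    ("returned written questionnaire", 55),
    ("contacted for interview and lab study", 5),
    ("judged phonagnosic after screening", 2),
]

def retention_rates(stages):
    """Percentage of each stage's participants who reached the next stage."""
    return [
        (later_name, 100.0 * later_n / earlier_n)
        for (_, earlier_n), (later_name, later_n) in zip(stages, stages[1:])
    ]

rates = retention_rates(stages)
for name, pct in rates:
    print(f"{name}: {pct:.1f}% of previous stage")

# ASSUMPTION: prevalence = confirmed cases / initial sample.
prevalence_pct = 100.0 * stages[-1][1] / stages[0][1]
print(f"prevalence: {prevalence_pct:.1f}%")
```

Each stage removes most of the remaining participants, so the final estimate depends heavily on who dropped out at the questionnaire and interview phases.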
Given that the core deficit in phonagnosia, from our perspective,
is an inability to recognize familiar voices, it would seem that
familiar voice recognition should be the primary basis for the estimate, with associated characteristics, such as the learning or discrimination of new voices, potentially providing an explanatory
basis of the deficit. AN, the phonagnosic intensively studied by
Xu et al. (2015), showed normal discrimination and short-term
memory of voices, and so would likely not have been recruited for additional testing in the Roswandowitz et al. study. We acknowledge
that different investigators may define phonagnosia in a manner
that differs from our own as is evident in Roswandowitz et al.’s
definition.
The positive correlation between imagination ratings and voice
recognition-test accuracy suggests that the strong relation
between these two variables found in Biederman et al. (2013) with
three phonagnosics and 20 controls is likely a general characteristic of the population at large. In that study, AN was comparable to
controls on basic auditory and voice discrimination and short-term
voice-memory tasks. She differed from controls in her poor performance on the recognition task and a self-reported inability to
imagine voices but not non-voice sounds of objects and animals.
An fMRI scan revealed that controls showed significant activation
of the ventromedial prefrontal cortex (vmPFC) when imagining
familiar voices (vs. non-voice sounds). AN did not. Given the
greater activation of the vmPFC when identifying personally familiar faces, with known biographies, as compared to faces merely
familiarized as a function of repetition in the course of the experiment (Leveroni et al., 2000), it is likely that AN was unable to associate, long term, the Voice Individuating Cues (VICs) of a familiar
person with a node in vmPFC providing access to that person’s
identity. [The Leveroni et al. (2000) result would suggest that the
Roswandowitz et al. (2014) survey for phonagnosia was not directed toward the same deficit in recognizing personally familiar
voices, with known traits and biographies, as explored in the
present study.] Xu et al. (2015) conjectured that developmental
prosopagnosics have a disruption in the white matter fibers connecting temporal lobe areas, which are normal with respect to
selectivity to faces, to prefrontal cortex. This would imply that they
have an intact face representation but are unable to associate it
with what Bruce and Young (1986) termed a ‘‘person identity node
(PIN).” Our research suggests that the PIN, at least for voices, may
be partially localized in the vmPFC, a region that is also activated
when thinking about the traits of familiar people (Ma et al.,
2014). (The presumed localization of the PIN is qualified with ‘‘partially” insofar as it is not unlikely that it is a distributed network
that includes the anterior temporal regions.) Whether participants
who score low in voice recognition accuracy but do not report an
inability to imagine voices would also show an absence of activation of vmPFC remains to be determined. To the extent that voice
imagination deficits are characteristic of developmental phonagnosics, both our fMRI study of AN and the present correlation
between our behavioral task and imagination self-reports suggest
that a deficit in the ability to imagine familiar voices may be a
behavioral marker for some cases of developmental phonagnosia.
Although there can be some arbitrariness in selecting any statistical cutoff for a diagnostic criterion, we have provisionally
adopted 2.28 SDs as a criterion for phonagnosia in that: (a) there
is a detectable increase in the incidence of poor recognition performance below that value from what would be expected from a
Gaussian distribution, especially given that the distribution of
recognition residuals had lower kurtosis than a Gaussian, and (b)
below that cutoff on the imagination ratings there was a disproportionately higher likelihood of individuals with deficits in voice
recognition.
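The Gaussian expectation behind criterion (a) can be worked out directly. The sketch below computes the proportion of a normal distribution falling below the 2.28 SD cutoff and compares it with the 23 of 730 respondents observed; it is an illustrative calculation rather than the authors' code.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

CUTOFF_SD = -2.28   # provisional criterion, in SD units
N_RESPONDENTS = 730
N_OBSERVED = 23     # respondents below the cutoff, as reported

expected_prop = normal_cdf(CUTOFF_SD)
expected_count = N_RESPONDENTS * expected_prop
observed_prop = N_OBSERVED / N_RESPONDENTS

print(f"expected: {100 * expected_prop:.1f}% (about {expected_count:.0f} people)")
print(f"observed: {100 * observed_prop:.1f}% ({N_OBSERVED} people)")
```

The observed count is close to three times the count expected if the residuals were normally distributed.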
At this stage in the research, one cannot rule out some heterogeneity in the characterization of phonagnosia. Although phonagnosic AN in the Xu et al. (2015) study did not evidence a deficit
in voice discrimination, one of Roswandowitz et al.’s two phonagnosics
(AS) did. And in the present survey, not all respondents who scored
low on familiar voice recognition gave low ratings of their ability to
imagine such voices.
4.1. Conclusions
3.2% (23 individuals) of a sample of 730 respondents scored
more than 2.28 SDs below the mean of the residual recognition scores
on a celebrity voice recognition test (with 1.1% expected from a
normal distribution) and therefore could qualify for being phonagnosic. The residuals served to correct for differences in the participants’ familiarity with the voices in the task. A second criterion,
that of the inability to imagine voices, was generally true of these
23 participants. It was also strongly apparent in all 3 phonagnosics
reviewed by Xu et al. (2015). A deficit in recognition accuracy of
familiar voices when coupled with the inability to mentally imagine those voices can thus serve as dual indices for phonagnosia.
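The logic of a familiarity-corrected residual criterion can be sketched as follows. This is a simplified stand-in, not the scoring code used in the study: it regresses each respondent's recognition accuracy on a single mean familiarity rating, standardizes the residuals, and flags those below the 2.28 SD cutoff. The data and the names `ols_residuals` and `flag_low_residuals` are hypothetical.

```python
import statistics

def ols_residuals(x, y):
    """Residuals of y after a simple least-squares regression on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def flag_low_residuals(residuals, cutoff_sd=-2.28):
    """Indices of respondents whose standardized residual is below the cutoff."""
    mean_r = statistics.fmean(residuals)
    sd_r = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if (r - mean_r) / sd_r < cutoff_sd]

# HYPOTHETICAL data: 12 respondents whose accuracy tracks familiarity,
# except respondent 5, who scores far lower than familiarity predicts.
familiarity = [float(i) for i in range(1, 13)]
accuracy = [0.30 + 0.05 * f for f in familiarity]
accuracy[5] -= 0.40

residuals = ols_residuals(familiarity, accuracy)
print(flag_low_residuals(residuals))
```

Because the residual removes what familiarity predicts, a respondent who knows few of the celebrities is not flagged merely for a low raw score.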
Acknowledgments
We are grateful to Dr. Xiaokun Xu for his assistance with
the formulation of the recognition test and the code used to score
the responses, and for his careful and critical reading of the paper.
Dr. Lúcia Garrido provided excellent guidance for creating identity-neutral
voice clip stimuli and she, Claudia Roswandowitz, and
Sarah Herald provided valuable comments on an earlier version
of this ms. We are grateful for Dr. Brad Duchaine’s guidance
regarding web survey logistics and for his ongoing insights into
both prosopagnosia and phonagnosia. An earlier report of this
research was presented by Biederman et al. (2013). Supported by
the Dornsife Research Fund.
References
Behrmann, M., & Avidan, G. (2005). Congenital prosopagnosia: Faceblind from birth.
Trends in Cognitive Sciences, 9, 180–187.
Biederman, I., Xu, X., Herald, S. B., Shilowich, B. E., Amir, O., & Allen, N. E. (2013).
Developmental phonagnosia implicates a neural correlate for perceiving
speaker identity. Paper presented at the annual meeting of the Society for
Neuroscience, San Diego, November.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of
Psychology, 77, 305–327.
Duchaine, B. (2011). Developmental prosopagnosia: Cognitive, neural, and
developmental investigations. In Oxford handbook of face perception
(pp. 821–838). Oxford, UK: Oxford University Press.
Duchaine, B., Garrido, L., Fox, C., Iaria, G., Sekunova, A., & Barton, J. (2010). Face
detection in acquired prosopagnosia. Journal of Vision, 10(7), 589.
Duchaine, B. C., & Nakayama, K. (2006). Developmental prosopagnosia: A window
to content-specific face processing. Current Opinion in Neurobiology, 16,
166–173. http://dx.doi.org/10.1016/j.conb.2006.03.003.
Eimer, M., Gosling, A., & Duchaine, B. (2012). Electrophysiological markers of covert
face recognition in developmental prosopagnosia. Brain: Journal of Neurology,
135(2), 542–554.
Garrido, L., Eisner, F., McGettigan, C., Stewart, L., Sauter, D., Hanley, J. R., ... Duchaine,
B. (2009). Developmental phonagnosia: A selective deficit of vocal identity
recognition. Neuropsychologia, 47, 123–131.
Grüter, T., Grüter, M., Bell, V., & Carbon, C. C. (2009). Visual mental imagery in
congenital prosopagnosia. Neuroscience Letters, 453, 135–140.
Grüter, T., Grüter, M., & Carbon, C. C. (2008). Neural and genetic foundations of face
recognition and prosopagnosia. Journal of Neuropsychology, 2(1), 79–97.
Hailstone, J., Crutch, S., Vestergaard, M., Patterson, R., & Warren, J. (2010).
Progressive associative phonagnosia: A neuropsychological analysis.
Neuropsychologia, 48(4), 1104–1114.
Holden, C. (2006). Have we met? Science Magazine, 312, 1449.
Kennerknecht, I., Grüter, T., Welling, B., Wentzek, S., Horst, J., Edwards, S., &
Grüter, M. (2006). First report of prevalence of non-syndromic hereditary
prosopagnosia (HPA). American Journal of Medical Genetics, 140A, 1617–1622.
Kreiman, J., & Sidtis, D. (2011). Foundations of voice studies: An interdisciplinary
approach to voice production and perception. West Sussex, UK: Wiley-Blackwell.
Legge, G. E., Grosman, C., & Pieper, C. M. (1984). Learning unfamiliar voices. Journal
of Experimental Psychology: Learning, Memory, & Cognition, 10, 298–303.
Leveroni, C. L., Seidenberg, M., Mayer, A. R., Mead, L. A., Binder, J. R., & Rao, S. M.
(2000). Neural systems underlying the recognition of familiar and newly
learned faces. The Journal of Neuroscience, 20, 878–886.
Ma, N., Baetens, K., Vandekerckhove, M., Kestemont, J., Fias, W., & Overwalle, F. V.
(2014). Traits are represented in the medial prefrontal cortex: An fMRI
adaptation study. Social Cognitive and Affective Neuroscience, 9(8), 1185–1192.
Marks, D. F. (1973). Visual imagery differences in the recall of pictures. British
Journal of Psychology, 64(1), 17–24.
Michelon, P., & Biederman, I. (2003). Less impairment in face imagery than face
perception in prosopagnosia. Neuropsychologia, 41, 421–441.
Roswandowitz, C., Mathias, S., Hintz, F., Kreitewolf, J., Schelinski, S., & von
Kriegstein, K. (2014). Two cases of selective developmental voice-recognition
impairments. Current Biology, 24(19), 2348–2353.
Susilo, T., & Duchaine, B. (2013). Advances in developmental prosopagnosia
research. Current Opinion in Neurobiology, 23(3), 423–429.
Tree, J. J., & Wilkie, J. (2010). Face and object imagery in congenital prosopagnosia: A
case series. Cortex, 46, 1189–1198.
Van Lancker, D., & Canter, G. J. (1982). Impairment of voice and face recognition in
patients with hemispheric damage. Brain and Cognition, 1, 185–198.
Xu, X., & Biederman, I. (2014). Neural correlates of face detection. Cerebral Cortex,
24, 1555–1564.
Xu, X., Biederman, I., Shilowich, B. E., Herald, S. B., Amir, O., & Allen, N. E. (2015).
Developmental phonagnosia: Neural correlates and a behavioral marker. Brain
& Language, 149, 106–117.