Brain & Language 159 (2016) 84–91
Contents lists available at ScienceDirect
Brain & Language
journal homepage: www.elsevier.com/locate/b&l

An estimate of the prevalence of developmental phonagnosia

Bryan E. Shilowich a, Irving Biederman a,b,⇑
a Department of Psychology, University of Southern California, United States
b Neuroscience Program, University of Southern California, United States

Article info
Article history: Received 2 October 2015; Accepted 7 May 2016; Available online 1 July 2016
Keywords: Phonagnosia; Voice recognition; Voice imagination; Prosopagnosia; Celebrity familiarity; Speaker familiarity; Famous voices; Imagery ratings

Abstract
A web-based survey estimated the distribution of voice recognition abilities with a focus on determining the prevalence of developmental phonagnosia, the inability to identify a familiar person based on their voice. Participants matched clips of 50 celebrity voices to 1–4 named headshots of celebrities whose voices they had previously rated for familiarity. Given a strong correlation between rated familiarity and recognition performance, a residual was calculated based on the average familiarity rating on each trial, which thus constituted each respondent’s voice recognition ability that could not be accounted for by familiarity. 3.2% of the respondents (23 of 730 participants) had residual recognition scores 2.28 SDs below the mean (whereas 8 or 1.1% would have been expected from a normal distribution). They also judged whether they could imagine the voice of five familiar celebrities. Individuals who had difficulty in imagining voices were also generally below average in their accuracy of recognition.
© 2016 Elsevier Inc. All rights reserved.

1. Introduction
Prosopagnosia, or “face blindness,” is a well-studied phenomenon in which individuals cannot recognize the faces of people with whom they are familiar (Behrmann & Avidan, 2005; Susilo & Duchaine, 2013).
Phonagnosia, the voice equivalent of prosopagnosia, is the inability to identify a familiar speaker from his or her voice (Kreiman & Sidtis, 2011). As with prosopagnosia, this condition can be “acquired,” as the result of a lesion (Duchaine et al., 2010; Hailstone, Crutch, Vestergaard, Patterson, & Warren, 2010; Van Lancker & Canter, 1982) or “developmental,” likely congenital (Garrido et al., 2009; Kennerknecht et al., 2006).

In broad terms, we can consider the inability to identify a familiar face or a voice as arising from (a) a poorly defined perceptual representation, or (b) a failure at matching a well-defined perceptual representation to previously stored representations. If the latter, we would expect normal levels of performance in discriminating simple visual or auditory stimuli as well as discriminating unfamiliar faces or voices. The evidence, although limited, suggests that the majority of cases of prosopagnosia and, quite likely, phonagnosia as well, are developmental and arise from deficits in matching current percepts to previously stored representations (Duchaine, 2011; Duchaine & Nakayama, 2006; Eimer, Gosling, & Duchaine, 2012; Susilo & Duchaine, 2013). Acquired conditions are more likely to present deficits in low-level perceptual discriminations, e.g., Xu and Biederman’s (2014) prosopagnosic MJH, and the phonagnosic cases reviewed by Kreiman and Sidtis (2011), although such deficits, of course, may be present in some individuals with developmental conditions, as in the two phonagnosics identified by Roswandowitz et al. (2014).

⇑ Corresponding author at: University of Southern California, Neuroscience Program, Hedco Neurosciences Bldg., 3641 Watt Way, Los Angeles, CA 90089-2520, United States. E-mail address: [email protected] (I. Biederman).
http://dx.doi.org/10.1016/j.bandl.2016.05.004
0093-934X/© 2016 Elsevier Inc. All rights reserved.
Thousands of participants have taken tests for prosopagnosia on faceblind.org, yielding an estimate of the incidence of developmental prosopagnosia of approximately 2% (Holden, 2006; Kennerknecht et al., 2006). The present study provides an estimate of the prevalence of developmental voice recognition deficits. Only five developmental cases of phonagnosia have been reported in the literature in which the primary deficit was in matching familiar voices (Biederman et al., 2013, which included the phonagnosic reported by Garrido et al., 2009; and Roswandowitz et al., 2014). However, this should not be taken as an indicator of its rarity; phonagnosics may either be unaware they have a deficit or, if they are aware, may simply not have been discovered by researchers.

The present investigation was not designed to provide a theoretical characterization of phonagnosia. Xu et al. (2015) investigated two broad potential bases for phonagnosia: (a) an inability to develop a perceptual representation sufficient to distinguish similar voices, and (b) an inability to match a clear perceptual representation to previously stored representations of voices. Their intensive investigation of phonagnosic AN clearly established that her deficit was in matching, long term, familiar voices. She showed no deficit in distinguishing, short term (e.g., over 20 s), highly similar voices.

An estimate of the prevalence of phonagnosia (0.2%) was reported by Roswandowitz et al. (2014) with a web-based test in which participants were taught to recognize new voices. Only later, selected on the basis of their performance on this voice-learning test, was a subset of these participants invited to the laboratory for testing on familiar voice recognition.
The present study, in its initial assessment of the recognition of familiar celebrity voices (as opposed to newly taught voices), would thus appear to provide a more direct estimate of the core capacity for voice individuation, in the same manner that the more direct measure of prosopagnosia would be a deficit in the recognition of familiar faces. In addition, our assessment avoided a possible selection bias of willingness to engage in a lab-based investigation. The challenge in our assessment was to achieve the estimate in the face of varying familiarity of the participants with the target voices.

2. Materials and methods
Voice recognition ability was assessed through three highly similar web-based Celebrity Voice Recognition surveys (Fig. 1). The largest pool of respondents (n = 534) was comprised of undergraduates at the University of Southern California who received credit for their participation in courses in psychology. They were tested between August and December of 2013. A subsequent sample (n = 376) of undergraduates in the Introduction to Psychology course at USC was acquired between January and February of 2014 as part of “prescreening” testing to determine individual characteristics for follow-up experiments. The final survey was conducted from August 2013 to February 2014 on USC’s Image Understanding Laboratory’s website (http://www.geon.usc.edu/), where 67 participants either voluntarily found or were directed to the test out of interest in the phenomenon. All three surveys were identical with the exception of the imagination section and the phrasing of some of the personal history and judged capacity questions, noted in later sections.

The surveys each began with basic descriptive items asking for age, sex, handedness, and occupation. Next were self-judgments of voice and face recognition abilities, rated on a three-point scale from below to above average.
Last, participants provided information on how long they had lived in America and/or been immersed in American culture, as the recognition test required familiarity with American celebrities. 95 of these respondents also took a celebrity face recognition test (https://usc.qualtrics.com/SE/?SID=SV_1UiJND20qc6h0a1). Our test was modeled after the one on faceblind.org in which the participants had to provide some identifying information about the celebrity, e.g., profession, rate the celebrity’s familiarity, and self-score the accuracy of the identification.

After the three surveys described above, an additional, fourth survey was conducted with 423 participants. This survey included two neural history questions and self-assessments of basic auditory and visual functioning (with deficits expected to be quite rare in this young, largely college-attending population), which are included in Table 1. There were sufficient differences in the familiarity ratings and voice recognition test formats for these participants that their recognition data are not included in the present results, although their data are highly similar to those of the prior surveys. They are noted here for their contribution to the self-assessment and personal history items in Table 1.

2.1. Pretest familiarity assessment
Participants first completed a survey of their familiarity with celebrity speaking voices. They were presented with a list of 100 celebrities (entertainers, politicians, and public figures) popular in mainstream culture. Each celebrity was shown with their name and a headshot. This list was originally generated by the phonagnosic subject AN, a 20-year-old female student well steeped in popular culture, as part of a prior study (Xu et al., 2015). In the first three surveys the participants indicated by mouse click those celebrities whose speaking voices were unfamiliar to them. Many of the participants indicated that they were familiar with all or almost all of the 100 celebrities.

2.2. Recognition Task
On each of 50 trials, participants listened to two 6–8 s voice clips extracted from interviews found on the internet, chosen specifically not to convey any information about identity or profession. Although the original comparison (Xu et al., 2015) of a phonagnosic with 20 controls used 100 trials on the recognition test, initial testing on the larger web-based surveys indicated that many respondents did not complete the full hour-long test, so the test was reduced to the 50 most familiar celebrities, which required approximately 30 min for completion. Respondents were to match each clip to 1, 2, or 4 celebrity targets specified by name and photo (Fig. 1). The restriction in the number of alternatives on each trial was incorporated into the design because prior research (e.g., Legge, Grosman, & Pieper, 1984), as well as a pilot study in our own laboratory, had shown that voice recognition accuracy among a large and unconstrained set is extremely low.

One of the clips was the voice of a pictured celebrity and the other was of a non-famous person. The foils matched the celebrity target in sex, and generally in race, accent, and approximate age. Upon listening to the voice clips, the participants chose the voice that they believed matched one of the pictured celebrities. Participants clicked on a five-point confidence rating scale following both their choice as to which clip was the celebrity and (for the two- and four-choice conditions) which celebrity matched the clip. (In the one-target condition, the selection of the celebrity by default designated the particular target, so only the question as to which one was the celebrity was posed.) There were a total of 50 trials: 16 each with one and with four targets and 18 with two targets, randomly ordered. The test, including the Pretest Familiarity Assessment and Imagination Ratings, described below, took approximately 30 min to complete.

2.3.
Imagination ratings
After the recognition trials, the participants were instructed to name five celebrities who were not on the test whose speaking voices were familiar to them. For each celebrity named, the participants then rated on a 1–5 scale how well they could imagine that celebrity’s voice. The participants in the first survey and from the lab website (see above) were provided the guidelines “[1] I simply could not imagine the celebrity’s voice; [3] I only had a vague image of their voice that was not particularly distinctive from other voices; [5] I was able to clearly imagine the celebrity’s voice.” The second survey provided the following, more specific instructions for voice imagination ratings (adapted from Marks, 1973): “[1] No auditory imagery at all, you only know that you are thinking of the person’s voice; [2] Vague and dim; [3] Moderately clear and vivid; [4] Clear and reasonably vivid; [5] Perfectly clear and as vivid as normal hearing.” The mean imagination ratings between the two different wordings of the instructions were virtually identical (mean ratings for the first pool set = 4.38; second set = 4.36, two-sample t-test, t(771) < 1.00, d = 0.02). Thus the data from all three pools were combined for further analysis. No respondent run in the lab ever complained of being unable to judge such imagery. All participants, even those who could not imagine human voices, gave ratings near ceiling for non-voice sounds such as inanimate objects (“glass breaking”), natural sounds (“ocean waves crashing”), or animal sounds (“pig oinking;” “frog ribbiting”) (Xu et al., 2015).

Fig. 1. Sample screen shot of a four-celebrity trial. (The headshots on the test were in color.) Participants listened to the clips by pushing the “play” buttons and then selected the bubble for Voice 1 or Voice 2 to choose which voice was one of the celebrities. The specific identity of that choice was chosen with the bubble options under the headshots. Confidence ratings for both the voice and identity choices were made with the five-star scale. Trials with two celebrities were exactly the same format but with only two celebrity pictures. Trials with one celebrity displayed only the top half (the voice choice and confidence), as the identity was defined as the one pictured celebrity.

Table 1. Sample characteristics, self-reports of neural history, and subjective judgments of quality of sensory systems and face and voice recognition ability.

Sample characteristics (n = 1153)
    Mean age [SD]                 20.34 [3.51]
    % Right handed                90.95
    % Female                      74.28%

Have you had brain trauma that may affect face perception? (n = 422)
    Yes                           1.18%
    No                            96.92%
    Unsure                        1.90%

Have you had brain trauma that may affect voice perception? (n = 423)
    Yes                           0.24%
    No                            98.35%
    Unsure                        1.42%

How would you rate your sense of sight? (n = 422)
    Above average                 33.65%
    Average                       54.50%
    Below average                 11.85%

How would you rate your sense of hearing? (n = 423)
    Above average                 28.37%
    Average                       64.30%
    Below average                 7.33%

How would you rate your ability to recognize faces?
    Excellent (n = 423)           15.60%
    Above average (n = 1148)      44.64%
    Average (n = 1148)            45.51%
    Below average (n = 1148)      4.01%
    Poor (n = 423)                0.24%

How would you rate your ability to recognize voices?
    Excellent (n = 423)           5.44%
    Above average (n = 1148)      21.25%
    Average (n = 1148)            70.47%
    Below average (n = 1148)      6.1%
    Poor (n = 423)                0.47%

Note: Different sample sizes are a consequence of different questions and different alternative answer possibilities being asked on different surveys.

2.4. Data analysis inclusion criteria
977 respondents started the survey but the data from a total of 247 individuals were excluded, 150 because they did not finish the test. Most (121) of these were people who logged on and might have answered a few of the demographic or personal judgment questions but never made it to the first trial; the remaining 29 reached the first trial but quit before the end. It is possible that some of the 29 who started the test but then did not finish did so because they were having difficulty, and thus their absence from the data could have had some effect on the results. However, given that 121 participants abandoned the test for other reasons (losing interest, technical issues, competing activities, etc.), these reasons not implausibly could apply to the 29 as well. Ultimately, there is no way to be certain; however, we can say that maximally 3% of the pool (29 of 977) opted out.

There were also 66 people who reached the end but appeared to not give the test a reasonable effort. Individuals in this last group were primarily undergraduates taking the online survey for credit in their Introductory Psychology course. Detection flags for poor effort included skipping questions, taking the test too quickly, scoring at or below chance, and having equal confidence ratings for correct and incorrect answers. Another criterion was the removal of any of the remaining participants who answered that they were unfamiliar with Barack Obama’s voice. Given that the President has a distinct and ubiquitous voice, a resident of the U.S. rating it as “unfamiliar” seemed suspect with respect to responsiveness to the test. Lastly, because the test was based on English-speaking celebrities recognizable to those steeped in American culture, 31 respondents who lived in America for less than five years were also removed from further analysis. Applying these criteria trimmed the set from 977 original respondents to 730 participants, 560 female, mean age 20.6 (SD = 4.0), 658 right handed.

3. Results

3.1. Subject characteristics
Table 1 shows the breakdown of the participants according to the initial demographic, neural history, and self-assessment questions. The primary criterion for phonagnosia is the inability to recognize familiar voices. Of course, one must rule out lower level auditory deficits, which would be expected to be rare given the sampled population (generally, 18–22 year-old college students) and the selection of participants with stated familiarity with celebrity voices. The 423 respondents in the most recent survey were asked: “How would you rate your sense of hearing? (If hearing is corrected, rate as corrected [by bubble click].)” 120 rated their hearing as above average, 272 as average, and 31 as below average. The 31 participants who rated their sense of hearing as below average scored 72.8% on the recognition test, slightly, but nonsignificantly, below the recognition performance (75.7%) of participants with self-reported average and above average hearing, t(421) = 0.92, p = 0.35, d = 0.19. Only one subject reported a history of brain trauma. That subject’s voice recognition accuracy corrected for familiarity (the score residual discussed below) was below average but not statistically deviant (a residual of −10.2).

3.2. Celebrity recognition test
A trial was scored as correct if both the voice of the celebrity and the correct celebrity were selected. (On trials with only a single celebrity, the selection of the correct voice was all that was required.) The overall mean voice recognition score was 76.7% (SD = 13.5%, chance = 29%). Effects of differences in sex (79.4% for males vs. 75.9% for females, t(728) = 2.68, p = 0.008, d = 0.25) and handedness (right 76.4% vs. left 79.8%, t(728) = 1.836, p = 0.0675, d = 0.17) were small but reliable or nearly reliable given the very large sample size.
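The 29% chance figure quoted above follows from the trial mix described in Section 2.2 (16 one-celebrity, 18 two-celebrity, and 16 four-celebrity trials) together with the per-trial chance levels of 0.5, 0.25, and 0.125 given in the caption of Fig. 3. The following is a minimal Python sketch of that weighted average; the function and variable names are illustrative assumptions and are not taken from the study's materials.

```python
from fractions import Fraction

# Number of trials for each count of pictured celebrities (Section 2.2).
TRIALS = {1: 16, 2: 18, 4: 16}


def chance_for(n_celebrities: int) -> Fraction:
    """Chance of a correct trial: right clip (1/2), then right celebrity (1/n)."""
    return Fraction(1, 2) * Fraction(1, n_celebrities)


def overall_chance(trials: dict[int, int]) -> Fraction:
    """Trial-weighted average of the per-trial chance levels."""
    total = sum(trials.values())
    return sum(count * chance_for(n) for n, count in trials.items()) / total


if __name__ == "__main__":
    print(float(overall_chance(TRIALS)))  # 0.29
```

Exact fractions are used here only so that the result is free of floating-point rounding; ordinary floats would serve equally well for this check.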
There was a positive correlation, r(728) = 0.570, p < 0.001, between the number of celebrities whose speaking voice was judged to be familiar on the pretest and the recognition accuracy on the test (Fig. 4). (As described later, this advantage for those familiar with the celebrities held even when individual trials were “corrected” for unfamiliarity.) Age had a small but reliable positive correlation with accuracy, r(728) = 0.17, p < 0.001, likely due to greater target exposure as age increased (more closely matching the age of the then 20 year old who generated the list four years ago). Slight positive, but reliable, correlations were found between performance on the test and self-ratings of voice recognition and face recognition abilities, r(728) = 0.14 and r(728) = 0.15, both ps < 0.001. The correlation between the imagination ratings and recognition accuracy was also reliable, r(696) = 0.26, p < 0.001, and would likely have been higher except that a large portion of the participants rated their imagery at or near ceiling, as shown in Fig. 9. There was a positive correlation between face and voice recognition accuracy, r(93) = 0.28, p < 0.01, for those 95 participants who took both tests. This correlation may be inflated by the overlap in familiarity of the celebrities, as the same set of celebrities served as stimuli in both tests.

The overall familiarity rating for the 730 participants who satisfied the criteria for test completion was 0.73 (SD = 0.17). Fig. 2 shows that there was a positive correlation between each subject’s average familiarity rating (across all trials) and recognition accuracy, r(728) = 0.569, p < 0.001.

Fig. 2. Voice recognition performance as a function of the rated familiarity of the options on each trial. This scatterplot, r(728) = 0.569, contains only those participants (n = 730) who satisfied all criteria for test completion.

Fig. 3. Recognition as a function of the number of celebrity options on each trial and the participants’ familiarity with the celebrity voices (0 = unfamiliar; 1 = familiar) on each trial. Given the two-stage scoring of first judging which of two voices is the celebrity and then selecting which celebrity’s (among two or four choices) voice was heard, the chance levels are 0.5, 0.25, and 0.125 for 1, 2 and 4 celebrity trials, respectively. The number of trials at each level ranged from 433 (0 familiarity, 4 Celeb trial) to 3600 (1 familiarity, 1 Celeb trial). The average number of trials represented by the other points (removing the ends of the range) is 1777 (SD = 784).

To assess the effect of familiarity on recognition performance on individual trials, for each trial a mean familiarity rating (0 = unfamiliar; 1 = familiar) for the celebrity options was calculated using the data from the pretest. Familiarity ratings on a given trial could thus range from 0 to 1, with greater numbers of celebrity choices leading to greater familiarity possibilities (e.g., one-celebrity trials were binary, 0 or 1, but ratings on four celebrity trials were in increments of 0.25). The effect of within-trial familiarity on recognition ability is shown in Fig. 3. The performance above chance for the 0 familiarity cases for the one and four celebrity trials is perhaps due to participants having some familiarity with celebrities’ voices that they marked as “unfamiliar” because they were not confident in their memory of the exposure. The distinct monotonic functions for the remainder of the cases document the benefit of voice familiarity on recognition. For the trials with two and four celebrity choices, the data were further analyzed according to whether the participants were specifically familiar with the target’s (i.e., correct) voice rather than the foil on each trial.
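The per-trial familiarity measure described above (the mean of the 0/1 pretest ratings across the celebrity options shown on a trial) can be sketched as follows. This is an illustrative Python sketch only: the function names, the data layout, and the placeholder celebrity names are hypothetical and are not the authors' analysis code.

```python
from fractions import Fraction


def trial_familiarity(options: list[str], familiar: dict[str, bool]) -> Fraction:
    """Mean of the 0/1 pretest familiarity ratings over the options on one trial."""
    if not options:
        raise ValueError("a trial needs at least one celebrity option")
    return Fraction(sum(familiar[name] for name in options), len(options))


def possible_levels(n_options: int) -> list[Fraction]:
    """Every familiarity value that a trial with n_options celebrities can take."""
    return [Fraction(k, n_options) for k in range(n_options + 1)]


if __name__ == "__main__":
    # Hypothetical pretest responses for one participant (True = familiar).
    pretest = {"Celebrity A": True, "Celebrity B": False,
               "Celebrity C": True, "Celebrity D": True}
    print(float(trial_familiarity(list(pretest), pretest)))  # 0.75
    print([float(x) for x in possible_levels(1)])            # [0.0, 1.0]
    print([float(x) for x in possible_levels(4)])            # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The two `possible_levels` calls reproduce the point made in the text: one-celebrity trials are binary, whereas four-celebrity trials move in increments of 0.25.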
On a two choice trial, being familiar with the foil’s but not the matching celebrity’s voice might have yielded recognition scores equal to the case where it was the target but not the foil’s voice that was familiar, as they both allow elimination of one of the two alternative celebrities. But this was not the case. Fig. 4 shows that there is a marked benefit in recognition accuracy conferred by familiarity with the target voice. Having a positive confirmation of the target voice led to better performance than a negative elimination of a celebrity who is not the target, even though (most simply in the 0.50–0.50 case) the two would be expected to be equally informative.

A possible reason for this effect is that those participants with high familiarity scores to a greater extent live in the same neighborhood of “celebrity space” with AN, the original generator of the celebrity set, than those who were not as familiar with the celebrities. To analyze this potential neighborhood effect, we split the data into high and low familiarity groups (each n = 365). The mean performance of each group was assessed for each specific within-trial familiarity level (Fig. 5). Independent of the familiarity level on a given trial, there was an advantage of simply knowing more of the celebrities on the test. The high familiarity group performed better than the low familiarity group at every individual-trial familiarity level at p < 0.0001 (t-values and Cohen’s d for each familiarity level: 0: t(668) = 7.53, d = 0.60; 0.25: t(450) = 31.65, d = 2.84; 0.5: t(714) = 30.58, d = 2.28; 0.75: t(698) = 38.58, d = 2.82; 1: t(728) = 28.78, d = 2.13).

Fig. 4. Effect of being familiar versus unfamiliar with the target voice across the individual familiarity levels. With the exception of the nonsignificant 0.25 Trial Familiarity value, all target familiar-unfamiliar comparisons are significant at p < 0.001.

Fig. 5. Individual Trial Performance based on Total Familiarity. Participants were split into high and low total familiarity groups (each n = 365) based on the total number of celebrities rated as being familiar in the pretest. At equal individual trial familiarity levels, participants who were familiar with a greater number of celebrities performed better than participants who were less familiar with those celebrities.

Fig. 6. Ranking of 730 participants by their recognition residuals based on their familiarity with the target voices on individual trials.

Fig. 7. Distribution of residuals on the voice recognition test (n = 730) with actual proportions in black and expected proportions from a normal distribution in gray. The ordinate is the percentage of the total sample. The values for the bins on the abscissa represent the highest (most positive) value for that bin, either positive or negative. The lowest bin was 2.28 SDs below the mean. 23 participants (3.2%) were in that bin; 8 (1.1%) would have been expected from a normal distribution, 99.5% Confidence Interval = ±0.012, z test of prop1 − prop2, p < 0.001.

Given that performance on the recognition task was a function of how familiar each subject was with the target celebrities, subsequent analyses of performance were based on each subject’s residual in the regression function of recognition accuracy as a function of mean per trial familiarity shown in Fig. 3. The distribution of these residuals is shown in Fig. 6. The residuals constitute people’s voice recognition abilities that cannot be accounted for by their familiarity with the targets. The standard deviation of the residuals was 11.4% on the recognition task. Sorting the residuals (Fig. 7) into 15 bins yielded a distribution that was unimodal and fairly symmetric but departed from a Gaussian, χ²(14) = 77.14, p < 0.001, primarily with more peakedness (lower kurtosis) but a higher than expected frequency in the lowest bin. That bin was 2.28 SDs below the mean with 23 participants (19 female), 3.2% of the total, whereas only 8 (1.1%) would have been expected from a normal distribution. The difference between the obtained and expected proportions was highly reliable, 99.5% C.I. = ±1.2%, p < 0.001.

3.3. Confidence when correct
We analyzed the confidence ratings on the voice and identity choices to determine whether, when participants answered correctly, they were confident that they were correct (thus rating a confidence of 5) and whether this was as true for low scoring participants as it was for high scoring participants. Separate ROC curves were generated using the confidence ratings of the 23 participants with the lowest and the 23 with the highest residual recognition scores (Fig. 8). Each point on the curve represents the mean accuracy of the participants’ assessments of their own correctness at each confidence level. The accurate discrimination of performance among the lower recognition scorers serves as an indication that these participants were taking the test questions as seriously as the better performers.

Fig. 8. ROC curves generated from the confidence ratings of the highest vs. lowest scoring participants (23 in each group) on the recognition test. Mean d′ (across all confidence levels) for the high recognition scorers was 1.89; for the low recognition scorers it was 1.08.

3.4.
Imagery ratings
In contrast to acquired prosopagnosics, developmental prosopagnosics evidence an inability to imagine familiar faces (Michelon & Biederman, 2003; Tree & Wilkie, 2010), although there is some question as to whether this inability is confined to faces or reflects a general deficit in imagery (Grüter, Grüter, Bell, & Carbon, 2009; Grüter, Grüter, & Carbon, 2008). Xu et al. (2015) reported that the three confirmed developmental phonagnosics that they tested were unable to imagine familiar voices but could readily imagine non-voice sounds of animals and inanimate objects. Given a possible theoretical parallel between developmental prosopagnosia and developmental phonagnosia, we assessed the relation between recognition performance and imagination ratings.

A complicating factor to such an analysis was that there was a strong propensity for participants, somewhat independent of their performance on the recognition test, to give the highest imagery rating, a 5 (“Perfectly clear and as vivid as normal hearing”), to the celebrities that they had individually selected to be the targets on the imagery test. In part to minimize the influence of this bias, in addition to comparing the imagery ratings of those (the “outliers”) with the highest and lowest residuals on the recognition test (±2.28 SDs from the mean), we assessed the recognition performance of participants whose imagery scores were at or below 2.28 SDs below the mean, as those participants would not have the bias to rate their imagery at ceiling.

Voice imagery ratings of high vs. low scoring participants on the recognition test. Of the 23 low-recognition outliers, three were removed from this analysis due to non-completion of the imagination section. Of the 20 that remained, two participants had voice imagination scores on the five imagination trials that were more than 2 SDs below the mean (M = 4.42, SD = 0.62), with ratings of 2.4 and 2.8.
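As a quick check on the last statement, the two ratings can be expressed as z-scores relative to the reported sample mean and standard deviation (M = 4.42, SD = 0.62). The Python sketch below uses only those reported values; the helper name `z_score` is an illustrative assumption.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Distance of a value from the mean, in standard-deviation units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (value - mean) / sd


if __name__ == "__main__":
    MEAN, SD = 4.42, 0.62        # reported imagination-rating mean and SD
    for rating in (2.4, 2.8):    # the two low ratings reported in the text
        print(rating, round(z_score(rating, MEAN, SD), 2))
```

Under these values both ratings fall more than 2.5 SDs below the mean, consistent with their description as being beyond 2 SDs.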
With respect to their ratings of their own voice recognition ability, fifteen rated themselves as average and three above average. Two participants, one of whom had the second-lowest voice recognition score in the distribution of residuals (38.65%), rated their voice recognition abilities as below average. Ten rated their ability at face recognition to be above average; nine at average, one below average. Looking specifically at the recognition outliers at the low end of the distribution and the twenty best recognition performers, the two groups’ imagination ratings were distinct, with a mean of 4.6 (SD = 0.52) for the high recognition group and 4.2 (SD = 0.71) for the low group, t(38) = 2.23, p = 0.031, d = 0.64.

Fig. 9. Scatterplot of Imagery Ratings and Recognition Test Score Residuals for 698 participants (out of the 730) who completed the imagination ratings. The correlation between score residual and rated voice imagery ability was r(696) = 0.21, p < 0.001. The correlation likely would have been higher had it not been that many higher scoring participants on the recognition test rated their voice imagery at ceiling. Each X is a single subject with the regression function depicted.

Recognition performance of participants with low imagery ratings. 23 participants had imagery ratings at or below 2.28 SDs below the mean (Fig. 9). 18 of those 23 participants (78.2%) had negative recognition score residuals, zprop = 4.72, p < 0.001. That is, low voice imagery scores are more likely to come from participants with low (i.e., negative) residual recognition scores. Four of these low-scoring voice recognition and voice imagery participants also took the face recognition test. Although they tended to be below the mean, none of them fell beyond 2.28 SDs below the mean.

4. Discussion
To the authors’ knowledge, this is the largest survey ever conducted on the accuracy of the recognition of familiar voices to provide an assessment of the prevalence of phonagnosia.
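The prevalence comparison reported in Section 3.2 (23 observed versus about 8 expected among 730 participants at 2.28 SDs below the mean) can be reproduced from the standard normal distribution. The following Python sketch uses only the standard library; the function names are illustrative assumptions, and it checks the expected count only, not the authors' confidence interval or significance test.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_below(n: int, z_cutoff: float) -> float:
    """Expected number of n normally distributed scores falling below z_cutoff."""
    return n * normal_cdf(z_cutoff)


if __name__ == "__main__":
    N, CUTOFF, OBSERVED = 730, -2.28, 23   # values reported in the text
    print(f"expected proportion: {normal_cdf(CUTOFF):.3f}")        # 0.011
    print(f"expected count: {expected_below(N, CUTOFF):.1f}")       # 8.3
    print(f"observed proportion: {OBSERVED / N:.3f}")               # 0.032
```

The expected count of roughly 8 and the observed proportion of about 3.2% match the figures given in the abstract and in the caption of Fig. 7.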
The web assay of Roswandowitz et al. (2014) focused on discriminating newly learned voices. It began with a web-based test (n = 1047) in which participants learned to recognize newly learned voices. Those who had difficulty on that phase (n = 233) were asked to complete a written questionnaire to which 55 participants responded. Of those, five participants were contacted by phone for further interviews and a lab study. Ultimately only two of these participants passed more thorough neurological screening and were administered a battery of tests, one of which assessed recognition of celebrity voices. Roswandowitz et al.’s estimate of phonagnosia prevalence of 0.2% was thereby based on the 233 participants selected after the initial selection of difficulty in learning new voices (followed by an interview phase with a 23.6% response rate). One of the two participants judged to be phonagnosic, AS, was reliably below average in discrimination of newly learned voices but scored above average in naming celebrity voices. Given that the core deficit in phonagnosia, from our perspective, is an inability to recognize familiar voices, it would seem that familiar voice recognition should be the primary basis for the estimate, with associated characteristics, such as the learning or discrimination of new voices, potentially providing an explanatory basis of the deficit. AN, the phonagnosic intensively studied by Xu et al. (2015), showed normal discrimination and short term memory of voices so would likely not have been recruited for additional study in the Roswandowitz et al. study. We acknowledge that different investigators may define phonagnosia in a manner that differs from our own as is evident in Roswandowitz et al.’s definition. The positive correlation between imagination ratings and voice recognition-test accuracy suggests that the strong relation between these two variables found in Biederman et al. 
(2013) with three phonagnosics and 20 controls is likely a general characteristic of the population at large. In that study, AN was comparable to controls on basic auditory and voice discrimination and short-term voice-memory tasks. She differed from controls in her poor performance on the recognition task and a self-reported inability to imagine voices but not non-voice sounds of objects and animals. An fMRI scan revealed that controls showed significant activation of the ventromedial prefrontal cortex (vmPFC) when imagining familiar voices (vs. non-voice sounds). AN did not. Given the greater activation of the vmPFC when identifying personally familiar faces, with known biographies, as compared to faces merely familiarized as a function of repetition in the course of the experiment (Leveroni et al., 2000), it is likely that AN was unable to associate, long term, the Voice Individuating Cues (VICs) of a familiar person with a node in vmPFC providing access to that person’s identity. [The Leveroni et al. (2000) result would suggest that the Roswandowitz et al. (2014) survey for phonagnosia was not directed toward the same deficit in recognizing personally familiar voices, with known traits and biographies, as explored in the present study.]

Xu et al. (2015) conjectured that developmental prosopagnosics have a disruption in the white matter fibers connecting temporal lobe areas, which are normal with respect to selectivity to faces, to prefrontal cortex. This would imply that they have an intact face representation but are unable to associate it with what Bruce and Young (1986) termed a “person identity node (PIN).” Our research suggests that the PIN, at least for voices, may be partially localized in the vmPFC, a region that is also activated when thinking about the traits of familiar people (Ma et al., 2014).
(The presumed localization of the PIN is qualified with “partially” insofar as it is not unlikely that it is a distributed network that includes the anterior temporal regions.) Whether participants who score low in voice recognition accuracy but do not report an inability to imagine voices would also show an absence of activation of vmPFC remains to be determined.

To the extent that voice imagination deficits are characteristic of developmental phonagnosics, both our fMRI study of AN and the present correlation between our behavioral task and imagination self-reports suggest that a deficit in the ability to imagine familiar voices may be a behavioral marker for some cases of developmental phonagnosia.

Although there can be some arbitrariness in selecting any statistical cutoff for a diagnostic criterion, we have provisionally adopted 2.28 SDs as a criterion for phonagnosia in that: (a) there is a detectable increase in the incidence of poor recognition performance below that value from what would be expected from a Gaussian distribution, especially given that the distribution of recognition residuals had lower kurtosis than a Gaussian, and (b) below that cutoff on the imagination ratings there was a disproportionately higher likelihood of individuals with deficits in voice recognition.

At this stage in the research, one cannot rule out some heterogeneity in the characterization of phonagnosia. Although phonagnosic AN in the Xu et al. (2015) study did not evidence a deficit in voice discrimination, one of Roswandowitz et al.’s two phonagnosics (AS) did. And in the present survey, not all respondents who scored low on familiar voice recognition gave low ratings of their ability to imagine such voices.

4.1. Conclusions

3.2% (23 individuals) of a sample of 730 respondents scored more than 2.28 SDs below the mean of the residual recognition scores on a celebrity voice recognition test (with 1.1% expected from a normal distribution) and therefore could qualify for being phonagnosic. The residuals served to correct for differences in the participants’ familiarity with the voices in the task. A second criterion, that of the inability to imagine voices, was generally true of these 23 participants. It was also strongly apparent in all 3 phonagnosics reviewed by Xu et al. (2015). A deficit in recognition accuracy of familiar voices when coupled with the inability to mentally imagine those voices can thus serve as dual indices for phonagnosia.

Acknowledgments

We are grateful to Dr. Xiaokun Xu for his assistance with the formulation of the recognition test and the code used to score the responses, and for his careful and critical reading of the paper. Dr. Lúcia Garrido provided excellent guidance for creating identity-neutral voice clip stimuli, and she, Claudia Roswandowitz, and Sarah Herald provided valuable comments on an earlier version of this manuscript. We are grateful for Dr. Brad Duchaine’s guidance regarding web survey logistics and for his ongoing insights into both prosopagnosia and phonagnosia. An earlier report of this research was presented by Biederman et al. (2013). Supported by the Dornsife Research Fund.

References

Behrmann, M., & Avidan, G. (2005). Congenital prosopagnosia: Face-blind from birth. Trends in Cognitive Sciences, 9, 180–187.
Biederman, I., Xu, X., Herald, S. B., Shilowich, B. E., Amir, O., & Allen, N. E. (2013). Developmental phonagnosia implicates a neural correlate for perceiving speaker identity. Paper presented at the annual meeting of the Society for Neuroscience, San Diego, November.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327.
Duchaine, B. (2011). Developmental prosopagnosia: Cognitive, neural, and developmental investigations. In Oxford handbook of face perception (pp. 821–838). Oxford, UK: Oxford University Press.
Duchaine, B., Garrido, L., Fox, C., Iaria, G., Sekunova, A., & Barton, J. (2010). Face detection in acquired prosopagnosia. Journal of Vision, 10(7), 589.
Duchaine, B. C., & Nakayama, K. (2006). Developmental prosopagnosia: A window to content-specific face processing. Current Opinion in Neurobiology, 16, 166–173. http://dx.doi.org/10.1016/j.conb.2006.03.003.
Eimer, M., Gosling, A., & Duchaine, B. (2012). Electrophysiological markers of covert face recognition in developmental prosopagnosia. Brain: Journal of Neurology, 135(2), 542–554.
Garrido, L., Eisner, F., McGettigan, C., Stewart, L., Sauter, D., Hanley, J. R., ... Duchaine, B. (2009). Developmental phonagnosia: A selective deficit of vocal identity recognition. Neuropsychologia, 47, 123–131.
Grüter, T., Grüter, M., Bell, V., & Carbon, C. C. (2009). Visual mental imagery in congenital prosopagnosia. Neuroscience Letters, 453, 135–140.
Grüter, T., Grüter, M., & Carbon, C. C. (2008). Neural and genetic foundations of face recognition and prosopagnosia. Journal of Neuropsychology, 2(1), 79–97.
Hailstone, J., Crutch, S., Vestergaard, M., Patterson, R., & Warren, J. (2010). Progressive associative phonagnosia: A neuropsychological analysis. Neuropsychologia, 48(4), 1104–1114.
Holden, C. (2006). Have we met? Science Magazine, 312, 1449.
Kennerknecht, I., Grüeter, T., Welling, B., Wentzek, S., Horst, J., Edwards, S., & Grüeter, M. (2006). First report of prevalence of non-syndromic hereditary prosopagnosia (HPA). American Journal of Medical Genetics, 140A, 1617–1622.
Kreiman, J., & Sidtis, D. (2011). Foundations of voice studies: An interdisciplinary approach to voice production and perception. West Sussex, UK: Wiley-Blackwell.
Legge, G. E., Grosman, C., & Pieper, C. M. (1984). Learning unfamiliar voices. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 298–303.
Leveroni, C. L., Seidenberg, M., Mayer, A. R., Mead, L. A., Binder, J. R., & Rao, S. M. (2000). Neural systems underlying the recognition of familiar and newly learned faces. The Journal of Neuroscience, 20, 878–886.
Ma, N., Baetens, K., Vandekerckhove, M., Kestemont, J., Fias, W., & Overwalle, F. V. (2014). Traits are represented in the medial prefrontal cortex: An fMRI adaptation study. Social Cognitive and Affective Neuroscience, 9(8), 1185–1192.
Marks, D. F. (1973). Visual imagery differences in the recall of pictures. British Journal of Psychology, 64(1), 17–24.
Michelon, P., & Biederman, I. (2003). Less impairment in face imagery than face perception in prosopagnosia. Neuropsychologia, 41, 421–441.
Roswandowitz, C., Mathias, S., Hintz, F., Kreitewolf, J., Schelinski, S., & von Kriegstein, K. (2014). Two cases of selective developmental voice-recognition impairments. Current Biology, 24(19), 2348–2353.
Susilo, T., & Duchaine, B. (2013). Advances in developmental prosopagnosia research. Current Opinion in Neurobiology, 23(3), 423–429.
Tree, J. J., & Wilkie, J. (2010). Face and object imagery in congenital prosopagnosia: A case series. Cortex, 46, 1189–1198.
Van Lancker, D., & Canter, G. J. (1982). Impairment of voice and face recognition in patients with hemispheric damage. Brain and Cognition, 1, 185–198.
Xu, X., & Biederman, I. (2014). Neural correlates of face detection. Cerebral Cortex, 24, 1555–1564.
Xu, X., Biederman, I., Shilowich, B. E., Herald, S. G., Amir, O., & Allen, N. E. (2015). Developmental phonagnosia: Neural correlates and a behavioral marker. Brain & Language, 149, 106–117.