Brain & Language 128 (2014) 18–24
Short Communication
Cross-linguistic sound symbolism and crossmodal correspondence:
Evidence from fMRI and DTI
Kate Pirog Revill a,*, Laura L. Namy b, Lauren Clepper DeFife b,1, Lynne C. Nygaard b
a Center for Advanced Brain Imaging, Georgia Institute of Technology, Atlanta, GA, USA
b Department of Psychology, Emory University, Atlanta, GA, USA
Article history:
Accepted 1 November 2013
Available online 15 December 2013

Keywords:
Sound symbolism
Crossmodal correspondences
Spoken language

Abstract
Non-arbitrary correspondences between spoken words and categories of meanings exist in natural language, with mounting evidence that listeners are sensitive to this sound symbolic information. Native
English speakers were asked to choose the meaning of spoken foreign words from one of four corresponding antonym pairs selected from a previously developed multi-language stimulus set containing both
sound symbolic and non-symbolic stimuli. In behavioral (n = 9) and fMRI (n = 15) experiments, participants showed reliable sensitivity to the sound symbolic properties of the stimulus set, selecting the consistent meaning for the sound symbolic words at above-chance rates. There was increased activation for
sound symbolic relative to non-symbolic words in left superior parietal cortex, and a cluster in left superior longitudinal fasciculus showed a positive correlation between fractional anisotropy (FA) and an individual’s sensitivity to sound symbolism. These findings support the idea that crossmodal
correspondences underlie sound symbolism in spoken language.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
One of the most basic and enduring assumptions regarding natural language is that the relationship between linguistic form and
meaning is fundamentally arbitrary (Hockett, 1977; Jackendoff,
2002; Pinker, 1999; Saussure, 1959). Indeed, arbitrary connections
between linguistic form and meaning are thought to be a necessary
design characteristic of language, granting language its compositional power, referential flexibility, and productivity (Gasser,
2004; Hockett, 1977; Monaghan, Christiansen, & Fitneva, 2011;
Saussure, 1959). Despite the apparent arbitrariness of the relationship between linguistic signs and their meaning, both historical
and recent evidence suggests that non-arbitrary correspondences
between linguistic structure and categories of meaning exist in
natural language, and that language users are sensitive to these
correspondences (Kohler, 1947; Kovic, Plunkett, & Westermann,
2010; Maurer, Pathman, & Mondloch, 2006; Monaghan, Christiansen, & Chater, 2007; Nygaard, Cook, & Namy, 2009; Ohala, 1984;
Perniss, Thompson, & Vigliocco, 2010; Ramachandran & Hubbard,
2001; Sapir, 1929; Sereno & Jongman, 1990). These correspondences, dubbed sound symbolism, include special classes of words
* Corresponding author. Present address: Facility for Education and Research in Neuroscience, Emory University, 36 Eagle Row, Atlanta, GA 30322, USA. Fax: +1 404 727 0372.
E-mail address: [email protected] (K.P. Revill).
1 Present address: Communication Sciences and Disorders, Georgia State University, Atlanta, GA, USA.
such as onomatopoeia, Japanese mimetics (Imai, Kita, Nagumo, &
Okada, 2008; Kita, 1997) and phonesthemes (Bergen, 2004) in
which the structure of spoken word forms either resembles or reliably predicts characteristics of the referents. Although these examples suggest that non-arbitrary mappings exist in natural language,
these cases often reflect conventions within particular languages.
There is mounting evidence that listeners are also sensitive to
cross-linguistic sound symbolism (Nuckolls, 1999), enabling listeners to match unfamiliar foreign words to their correct meanings at
rates above chance (Berlin, 1994; Brown, Black, & Horowitz, 1955;
Kunihira, 1971). This evidence suggests that sound-to-meaning
mappings may display consistency across languages, allowing native speakers of one language to recruit these correspondences in
the service of inferring the meaning of words in another language.
Recent findings also suggest a facilitative effect of sound symbolic
correspondences on foreign word learning. Following Kunihira
(1971), Nygaard et al. (2009) taught native English-speaking adults
the English equivalents of Japanese antonyms. At test, listeners
were more accurate and responded more quickly when the
Japanese items had been paired with the actual English equivalent
during learning than when paired with a mismatched meaning.
1.1. Sound symbolism, crossmodal correspondence, and synesthesia
One account of the underlying mechanisms of cross-linguistic
sound symbolism is based on cross-modal correspondences.
Ramachandran and Hubbard (Hubbard, Brang, & Ramachandran,
2011; Ramachandran & Hubbard, 2001) suggest that sound symbolism is a product of cross-modal integration, whereby motor aspects of speech production or acoustic aspects of the speech signal
elicit activation of corresponding properties in other sensory
modalities and direct attention to aspects of the physical objects
to which the words refer. In signed languages, visual-spatial linguistic forms often directly correspond to perceptual properties
of a sign’s referent (Perniss et al., 2010) and signers are faster to
match iconic signs with pictures when features of the referent corresponding to the iconic form of the sign are emphasized in the
picture (Thompson, Vinson, & Vigliocco, 2010). Although the iconicity of the mapping between a visual property and its auditory
analog may not be as literal or as readily recognized, reliable crossmodal correspondences have been reported, for example, between
auditory pitch and visual size, with participants preferring pairings
between high pitch and small size or low pitch and large size
(Spence, 2011). Some (but not all) of these crossmodal correspondences may arise from the correlation of physical properties in the
real world, for example where smaller or faster objects have higher
resonating frequencies than larger or slower ones (see, e.g., Ohala,
1983; Spence, 2011).
That these cross-modal links extend across languages and operate independently of specific language experience suggests a potential basis for cross-linguistic sound-to-meaning mappings. The
results of a recent categorization experiment by Kovic and colleagues (Kovic et al., 2010) support this hypothesis. Participants
trained with sound symbolic labels categorized objects more
quickly than participants trained with non-symbolic labels, and
event related potentials (ERP) recorded during the final categorization test showed an increased early negativity when participants
viewed objects in the presence of sound symbolic relative to
non-symbolic labels. This ERP component has been linked to
cross-modal integration and stimulus binding in other tasks
(Molholm et al., 2002). These findings are also consistent with current neural models of synesthetic crossmodal integration, which
typically hypothesize co-activation of early sensory areas as well
as activation of parietal areas associated with stimulus binding
(Brang, Hubbard, Coulson, Huang, & Ramachandran, 2010).
Although there are important differences between the perceptual
experiences of true synesthetes and the sensitivity to crossmodal
correspondences that most individuals exhibit, several research
groups have drawn connections between crossmodal integration
and synesthesia (Brang, Williams, & Ramachandran, 2012; Martino
& Marks, 2001; Spector & Maurer, 2009). Relative to normal controls, synesthetes show enhanced crossmodal integration effects
(Brang et al., 2012) as well as changes in the functional and anatomical properties of parietal areas known to be involved in
cross-modal integration (Neufeld, Sinke, Dillo, et al., 2012; Neufeld,
Sinke, Zedler, et al., 2012; Rouw & Scholte, 2007; van Leeuwen, den
Ouden, & Hagoort, 2011).
In previous work (Namy, DeFife, Mathur, & Nygaard, submitted
for publication; Tzeng, Nygaard, & Namy, submitted for
publication), we developed a multi-language stimulus set to investigate the extent to which native monolingual speakers of English
display sensitivity to sound-to-meaning correspondences in words
drawn from other natural languages from distinct language families. Native speakers of Albanian, Dutch, Gujarati, Indonesian, Korean, Mandarin, Romanian, Tamil, Turkish, and Yoruba nominated
multiple synonyms for each member of 9 antonym pairs. In a
2AFC paradigm, native English speakers were asked to choose the
meaning of foreign words from the antonym pairs. Although listeners identified the correct meanings at greater than chance levels
across semantic domains and languages, how consistently listeners
selected a particular meaning varied across individual items. This
variation indicates that words differed with respect to their degree
of sound symbolism within each language and semantic domain
and underscores the probabilistic nature of sound symbolism.
From these data, we identified a set of sound symbolic words,
items that were judged to mean a particular antonym by at least
80% of participants, as well as a set of non-symbolic words for
which there was no consensus on the meaning. In this study, we
investigate the cross-modal integration account of sound symbolism by examining whether activation in parietal or perceptual
areas differs when participants guess the meanings of sound
symbolic versus non-symbolic foreign words, as well as whether
an individual participant’s sensitivity to sound symbolism is
related to structural connectivity in multi-modal regions.
2. Results and discussion
2.1. Behavioral replication and results
In the stimulus selection work (DeFife, Nygaard, & Namy, in
preparation; Namy et al., submitted for publication), all items for
a single antonym pair were presented in a single block. In order
to accommodate fMRI task constraints, the current paradigm intermixes short blocks of five trials from each of four antonym pairs
(big/small, round/pointy, still/moving, fast/slow), with sound symbolism classification held constant within each block. To determine
whether this adapted paradigm elicited the same sensitivity to
sound symbolism, we replicated our previous results with nine
participants in a behavioral pilot experiment. Participants heard
a foreign word and were asked to guess its meaning from its corresponding English antonym pair. We calculated the proportion of
antonym 1 responses for sound symbolic words previously classified as meaning antonym 1, sound symbolic words previously classified as meaning antonym 2, and non-symbolic words that were
previously equally likely to be paired with each antonym. Participants showed clear effects of sound symbolism (Fig. 1, left panel).
A repeated measures ANOVA with two levels of meaning (form/
motion) and three levels of ‘‘sounds-like’’ category (sounds like
antonym 1, sounds like antonym 2, sounds like neither) showed
a significant main effect of ‘‘sounds-like’’ category, F(2,16) = 75.3,
p < 0.001, but no main effect of meaning type or interaction between category and meaning type (both F < 1). Pairwise comparisons (collapsed across meaning type) show that all three word
sets differed reliably from each other, t’s (8) = 10.2, 7.1, and 8.8
for antonym 2 vs. neither, neither vs. antonym 1, and antonym 1
vs. 2 respectively, all p’s < 0.001. Consistent with our previous
work, participants showed similar sensitivity to sound symbolic properties of foreign words despite changes in how the materials were presented.

Fig. 1. Mean proportion of antonym 1 responses for words previously chosen to sound like antonym 1, words equally likely to be paired with either antonym, and words previously chosen to sound like antonym 2 for the behavioral and fMRI experiments. Error bars represent standard error of the mean.
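The response-proportion measure and pairwise comparisons described above can be sketched as follows. This is an illustrative reconstruction: the data layout and helper names are hypothetical, and the paired t statistic is computed directly from the subject-level proportions rather than with a stats package.

```python
import numpy as np

def proportion_antonym1(responses, categories):
    """Proportion of antonym-1 responses within each 'sounds-like' category.

    responses:  1 = chose antonym 1, 0 = chose antonym 2 (one entry per trial)
    categories: previous classification per trial: 'ant1', 'neither', or 'ant2'
    (hypothetical labels for the three word sets described in the text)
    """
    responses = np.asarray(responses, float)
    categories = np.asarray(categories)
    return {c: float(responses[categories == c].mean())
            for c in ("ant1", "neither", "ant2")}

def paired_t(a, b):
    """Paired (repeated measures) t statistic with n - 1 degrees of freedom,
    as used for the pairwise comparisons collapsed across meaning type."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

In this layout, each participant contributes one proportion per category, and `paired_t` is applied to the per-participant proportions for each pair of categories.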
Participants in the imaging study (n = 15) also showed sensitivity to sound symbolism in their pattern of responses (Fig. 1, right
panel). A repeated measures ANOVA with two levels of meaning
(form/motion) and three levels of ‘‘sounds-like’’ category showed
a significant main effect of ‘‘sounds-like’’ category, F(2,28) = 71.4,
p < 0.001, but no main effect of meaning type, F(1,14) = 1.76,
p > 0.2, or interaction between category and meaning type,
F(1,14) = 1.01, p > 0.2. Pairwise comparisons (collapsed across
meaning type) show that all three word sets differed reliably from
each other, t’s (14) = 7.7, 8.2, and 8.7 for antonym 2 vs. neither, neither vs. antonym 1, and antonym 1 vs. 2 respectively, all
p’s < 0.001.
Relative to rest, task-related activation was seen in a network of
brain areas (Table 1, Fig. 2a) including bilateral superior temporal
gyrus, bilateral (but predominantly left-lateralized) inferior frontal
gyrus, and supplementary motor area. These regions are frequently
identified as important components of the network that processes
spoken language (Hickok & Poeppel, 2007; Scott, Blank, Rosen, &
Wise, 2000), particularly of the dorsal speech perception pathway
associated with the integration of auditory and motor information.
Despite participants’ reports of guessing on all stimuli and their
lack of awareness of the sound symbolic manipulation, participants did show sensitivity to the manipulation in both their behavior (Fig. 1) and brain activity (Fig. 2a). The contrast between sound
symbolic and non-symbolic word blocks revealed an area of significant activation in the intraparietal sulcus in left superior parietal
lobe (Fig. 2a, Table 1). No areas surviving correction for multiple
comparisons were more reliably active for non-symbolic words
than symbolic words, or for the contrast of meaning type (motion
versus form). Previous results from several studies of synesthesia
show that synesthetic perception is associated with increased activation or volumetric differences in superior and inferior parietal
lobes and along the intraparietal sulcus, often in the left hemisphere. The cluster identified here is located near the region showing more activity in audio-visual synesthetes than nonsynesthetes
during auditory stimulus processing by Neufeld and colleagues
(Neufeld, Sinke, Dillo, et al., 2012). Parietal cortex is known to be
involved in multisensory integration (Calvert, 2001; Robertson,
2003), and permanent or temporary lesions to cortex along the
intraparietal sulcus can lead to difficulty with stimulus binding
in patients (Robertson, 2003) and synesthetes (Esterman,
Verstynen, Ivry, & Robertson, 2006).
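The cluster-extent correction applied to these contrasts (a voxelwise uncorrected threshold of p < 0.001, then a minimum cluster size of 31 voxels) can be illustrated with a minimal sketch. This is a generic connected-components implementation, not AFNI's actual code, and the array shape and threshold values below are arbitrary.

```python
import numpy as np
from scipy import ndimage

def cluster_threshold(tmap, t_crit, min_voxels=31):
    """Keep only supra-threshold voxels belonging to sufficiently large clusters.

    tmap:       3D array of voxelwise t statistics
    t_crit:     t value corresponding to the voxelwise uncorrected p threshold
    min_voxels: cluster-extent cutoff (31 voxels in the analysis above)
    """
    supra = np.abs(tmap) > t_crit            # voxelwise threshold
    labels, n = ndimage.label(supra)         # face-connected components
    out = np.zeros_like(tmap)
    for i in range(1, n + 1):
        mask = labels == i
        if mask.sum() >= min_voxels:         # extent threshold
            out[mask] = tmap[mask]
    return out
```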
Table 1
Talairach coordinates of the center of mass, volume, and peak t value of significant activation (cluster-based FWE corrected p < 0.05; uncorrected p < 0.001, cluster size 31 voxels) for each contrast of interest. BA = Brodmann's Area.

All words > rest: right superior temporal gyrus; left superior temporal gyrus; left inferior frontal gyrus (BA 13/45); medial frontal gyrus; left middle frontal gyrus (BA 44/45); right insula, right inferior frontal gyrus.
Sound symbolic words > non-symbolic words: left superior parietal lobe.

Fig. 2. (A) Warm colors: significant activation for task relative to baseline. Cool colors: area showing significant activation for sound symbolic words relative to non-symbolic words (collapsed across meaning dimension) in left superior parietal lobe. Activation is thresholded at a corrected p < 0.05 (uncorrected p < 0.001, cluster size 31 voxels). (B) Significant correlations between sensitivity to sound symbolism (% correct on sound symbolic words) and FA were found in two clusters (red) within the left superior longitudinal fasciculus. The FA skeleton (green) is overlaid on the group mean FA image in standard space. The relationship between participants' performance on the sound symbolic trials and mean FA extracted from the significant clusters is shown for illustrative purposes only.

2.3. ROI analyses

Prior evidence has shown that making categorical decisions about object form or motion properties or encountering language stimuli referring to aspects of object form or motion can activate occipitotemporal regions associated with perceiving those properties (Chao, Weisberg, & Martin, 2002; Revill, Aslin, Tanenhaus, & Bavelier, 2008; Willems & Casasanto, 2011). We were able to use independently defined regions of interest to further investigate whether regions involved in perceiving object form (for all participants) and motion (for a subset of the participants) were activated while guessing meanings for words relating to those properties and whether activation in these areas varied based on the sound symbolic status of the words. We observed greater activation for intact abstract shapes relative to scrambled abstract shapes bilaterally in all participants from the visual form localizer data. Mean peak voxel coordinates across all participants were (−47, −75, −2) and (48, −74, −5). The contrast of moving versus static dots revealed bilateral activation in visual areas including area MT+, with average Talairach coordinates of (−50, −73, 5) and (48, −70, 2) in a subset of 11 participants for whom MT localizer data were available. Each individual's peak coordinates were used as the centers of independent ROIs for a targeted analysis of the sound symbolic word task data. Mean beta values, scaled as percent signal change, were extracted for each participant. Despite the small sample size, a repeated measures ANOVA revealed a significant effect of word meaning in the left MT ROI, F(1,10) = 6.29, p < 0.05, with less activation for motion words than form words. No other ROIs showed effects of word meaning. Although the relative reduction in left MT activity during processing of motion antonym blocks was not predicted, decreased activation in perceptual regions for linguistic stimuli has been observed in other studies (Aziz-Zadeh et al., 2008) and may indicate interference between linguistic stimuli and normal perceptual processing (Landau, Aziz-Zadeh, & Ivry, 2010; Meteyard, Zokaei, Bahrami, & Vigliocco, 2008). We did not observe significant effects of or interactions with sound symbolism in any ROI (all F < 1). Planned contrasts did not reveal significant activation differences between symbolic and non-symbolic motion words in the MT ROIs (left: t(10) = 0.9, p > 0.2; right: t(10) = 1.5, p > 0.1) or between symbolic and non-symbolic form words in the LOC ROIs (left: t(14) = 0.7, p > 0.2; right: t(14) = 0.7, p > 0.2). While crossmodal activation theories of synesthesia posit direct connections between and activation of perceptual areas during processing of synesthetic stimuli, we did not observe evidence for direct activation of visual sensory areas by sound symbolic stimuli, though caution is warranted when drawing conclusions about null effects, particularly with a relatively small sample size. However, we do see increased activation for sound symbolic items in integrative areas associated with crossmodal binding of stimuli, an important component of current models of synesthetic processing (Brang et al., 2010; Neufeld, Sinke, Dillo, et al., 2012; Neufeld, Sinke, Zedler, et al., 2012; Rouw & Scholte, 2007; van Leeuwen et al., 2011).

2.4. DTI analyses

As a group, participants were sensitive to the sound symbolic properties of the words, pairing sound symbolic words with the
‘correct’ antonyms (the meaning agreed upon by more than 80% of
the participants in the initial stimulus set construction) at a rate
well above chance (67.7% of trials), but not all participants were
equally likely to choose the correct meanings (range:
47.5–85.0%). We used each individual’s sound symbolic accuracy
score to perform a regression against the fractional anisotropy
(FA) skeleton defined by TBSS. Initial whole-brain analyses did
not reveal any clusters in which an individual’s sensitivity to sound
symbolism was significantly correlated with FA after permutationbased correction for multiple comparisons. However, previous research has suggested that FA in parietal/temporal white matter,
including the superior longitudinal fasciculus (SLF), correlates with
behavior on cross-modal integration tasks (Brang, Taich, Hillyard,
Grabowecky, & Ramachandran, 2013) and language tasks, particularly language tasks involving phonological processing (Vandermosten et al., 2012; Wong, Chandrasekaran, Garibaldi, & Wong,
2011). SLF masks from the JHU white matter tractography atlas
(Hua et al., 2008) were combined with the group FA skeleton using
a region of interest approach. Within this limited search volume,
two clusters in the left superior longitudinal fasciculus (−26, −43, 28; −39, −41, 28; Fig. 2b) survived TFCE correction for multiple comparisons (p < 0.05) and show a positive correlation between FA and accuracy on sound symbolic words. Similar
clusters show correlations between FA and crossmodal integration
in nonsynesthetes (Brang et al., 2013) and with sound to meaning
mapping in word learners (Wong et al., 2011). Synesthetes also
show increased FA in this area compared to healthy controls (Rouw
& Scholte, 2007).
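At its core, this analysis is a voxelwise association between skeletonized FA and per-subject behavioral accuracy. A minimal sketch of that core step, for illustration only: the actual analysis used FSL's TBSS skeleton and permutation-based TFCE correction, not the simple parametric correlation shown here, and the data shapes are assumptions.

```python
import numpy as np

def fa_behavior_correlation(fa_skeleton, accuracy):
    """Voxelwise Pearson correlation between FA and behavioral accuracy.

    fa_skeleton: (n_subjects, n_voxels) FA values sampled on the white-matter
                 skeleton (hypothetical layout)
    accuracy:    (n_subjects,) per-subject accuracy on sound symbolic words
    Returns an (n_voxels,) array of correlation coefficients.
    """
    fa = np.asarray(fa_skeleton, float)
    acc = np.asarray(accuracy, float)
    fa_c = fa - fa.mean(axis=0)              # center each voxel across subjects
    acc_c = acc - acc.mean()                 # center the behavioral scores
    num = acc_c @ fa_c                       # covariance numerator per voxel
    den = np.sqrt((acc_c ** 2).sum() * (fa_c ** 2).sum(axis=0))
    return num / den
```

In practice the resulting statistic map would then be corrected for multiple comparisons with permutation testing before any cluster is reported.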
3. General conclusions
These data provide support for cross-modal activation as a
mechanism by which sound symbolism facilitates word-to-meaning mappings. Heightened activation in left superior parietal lobe
for sound symbolic relative to non-symbolic words suggests that
sound symbolic foreign words engage cross-modal sensory integration processes to a greater extent than non-symbolic words.
DTI analysis revealed that individual differences in performance
on the behavioral task reliably predicted FA in the left superior longitudinal fasciculus, which has previously been linked to individual
differences in cross-modal integration (Brang et al., 2013; Rouw &
Scholte, 2007). We did not observe the predicted differences between sound symbolic and non-symbolic stimuli in lower level sensory ROIs involved in the perception of object form or motion, possibly due to our small sample size. Future studies with a larger sample size and a stimulus set constructed to maximize sound symbolic properties will be needed to fully explore a direct co-activation hypothesis.
While these findings are consistent with research linking increased activity and structural integrity in intraparietal regions
with cross-modal integration, these areas are also part of an important dorsal pathway involved in phonological processing (Hickok &
Poeppel, 2007; Wong et al., 2011). An important issue to address in
Poeppel, 2007; Wong et al., 2011). An important issue to address in
future work is whether the relationship between FA and sensitivity
to sound symbolism can be explained by individual differences in
cross-modal integration or by variability in listeners’ phonological
processing skills. Although there are no gross phonological differences between the sound symbolic and non-symbolic words in this
stimulus set, there are phonological regularities in the sound symbolic stimulus set that provide reliable cues to meaning (Namy
et al., submitted for publication). Better attunement to phonological information or sensitivity to phonological regularities, particularly when listening to words from unfamiliar phonologies, might
have enabled some participants to capitalize more readily than
others upon the cross-modal sound-to-meaning correspondences.
In sum, these findings suggest that cross-modal correspondences between particular auditory stimuli and particular visuospatial features of objects account for at least some aspects of
sound symbolism. That these correspondences appear to transcend
language families suggests that the associations are not dependent
upon language experience, but rather on a general sensitivity to
relations across auditory and visual domains including natural correlations between physical features of objects and their auditory
consequences. Recent phonological analyses (Namy et al.,
submitted for publication) have confirmed that there are common
phonological properties associated with particular meanings
across languages, and that the prevalence of these features is correlated with accuracy in guessing the meanings of these foreign
words. A critical question, of course, is why these particular
sound-to-meaning correspondences exist at all. These reliable correspondences may reflect underlying acoustic or articulatory properties of natural language that are non-arbitrarily related to
features in other sensory modalities, perhaps through an abstract
form of iconicity or embodiment. An additional question is why
sound-to-meaning correspondences continue to exist in natural
languages, given the clear advantages of arbitrariness in language.
Perhaps cross-modal correspondences between sound and meaning persist because they ease the formation of sound to meaning
mappings in first- or second-language learners (Maurer et al.,
2006; Nygaard et al., 2009) or because they render semantic retrieval or categorization faster or more efficient in skilled language
users (Kovic et al., 2010; Thompson et al., 2010). These will be
important directions for future research.
4. Methods
4.1. Stimuli
Four antonym pairs were employed for this study: two pairs
relating to object motion (still/moving and fast/slow) and two pairs
relating to object form (big/small and round/pointy). Stimuli were
derived by asking native speakers of 10 foreign languages (6 M,
ages 21–29, mean age of first exposure to English 9.7 years, all currently living in the US) to nominate as many synonyms as they
could think of for each word and to accept or reject additional synonyms from language-to-English dictionaries (DeFife et al., in
preparation). The final list of synonyms was recorded by the same
native speaker using neutral, list-like prosody. The ten languages
come from seven different language families (four are Indo-European, with one each from Austronesian, Korean, Sino-Tibetan, Dravidian, Altaic, and Niger-Congo language families) with a range of
phonological and morphological properties. Two of the languages
are tone languages. Vowel and consonant inventories range from
moderately small to large, syllable structures range from simple
to complex, and morphologies range from isolating to synthetic
(see Supplemental materials). However, all participants in the
experiments described here were native speakers of English with
no exposure to or knowledge of any of these languages. After the
stimuli were recorded by the native speakers, groups of 13–15 native English speakers heard each word and guessed which member
of the antonym pair it referred to. From these ratings, we identified
a subset of words for which at least 80% of listeners assigned the
word to a single member of the antonym pair and a subset for
which there was no consensus on meaning. Twenty of the sound
symbolic (high consensus) items and twenty non-symbolic items
were selected from each antonym pair as materials for the current
study. For each antonym pair, equal numbers of sound symbolic
and non-symbolic words were chosen from each language.
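The consensus-based item selection described above can be sketched as a simple classification rule. The 80% criterion comes from the text; the "no consensus" band (0.4-0.6) and the function name are illustrative assumptions, since the paper does not state an exact cutoff for non-symbolic items.

```python
def classify_items(choice_props, hi=0.8):
    """Label items by listener consensus, as in the stimulus-selection step.

    choice_props: dict mapping word -> proportion of listeners who assigned
                  it to antonym 1 of its pair (hypothetical data layout)
    'symbolic':     at least 80% of listeners agreed on one member of the pair
    'non-symbolic': no consensus (assumed here to mean a 0.4-0.6 split)
    'excluded':     intermediate items used in neither set
    """
    labels = {}
    for word, p in choice_props.items():
        if p >= hi or p <= 1 - hi:
            labels[word] = "symbolic"
        elif 0.4 <= p <= 0.6:
            labels[word] = "non-symbolic"
        else:
            labels[word] = "excluded"
    return labels
```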
4.2. Participants
Twenty-four young adults from the Emory University and Georgia Institute of Technology communities participated in the study,
nine in the pilot experiment (8 female, mean age 19.2, SD 0.42, age
range 18–20 years), and fifteen in the imaging paradigm (5 female,
mean age 22.7, SD 3.8, age range 19–33 years). All participants
gave informed consent in accord with Emory and Georgia Tech
IRB protocols. Per self-report, all participants were right-handed
with normal hearing, normal or corrected-to-normal vision, and
no history of language or neurological disorders. All were native
speakers of English and none had prior experience with any of
the ten languages comprising the stimulus set. Participants were
paid for their participation.
4.3. Procedures
At the beginning of each task block, an instruction screen displayed the antonym pair that would be the basis for the next set
of trials. The antonym pair remained on the screen throughout
the block, with one word presented on each side of a central fixation cross. On each trial, participants heard a single word and indicated which antonym they thought corresponded with the
meaning of the spoken word by depressing the first or second button of a response box affixed to the participant's right leg using
their right index and middle fingers. No feedback was provided.
To signal the beginning of a new trial, the central fixation cross
turned from a dark grey to a light grey color 200 ms prior to the onset of the target word. The fixation cross remained light grey until
the participant made a response or until 3.3 s had elapsed and responses were no longer accepted. 500 ms after the time-out, the
next trial began. Each block contained 5 trials, for a total block
duration of 24 s (4 s instruction screen plus 5 × 4 s trials). In the
fMRI experiment, task blocks were separated by a 12 s rest interval
where only the fixation cross was present. Trials were blocked by
antonym pair (fast/slow, moving/still, pointy/round, big/small),
and sound symbolism status (symbolic/non-symbolic). Participants were not told about sound symbolism or informed of the
sound symbolic blocking prior to the experiment, and post-experiment questioning indicated that participants were unaware of this
manipulation. Participants completed 32 blocks of trials, four
blocks each for every combination of antonym pair and sound symbolism level. In the fMRI experiment, eight blocks (two for each
antonym pair, one at each level of sound symbolism) comprised
a single functional run lasting 4:54. Participants completed four
functional runs. The order of blocks was counterbalanced across
participants and runs.
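The block structure above (32 blocks = 4 antonym pairs × 2 symbolism levels × 4 repetitions, split into 4 runs of 8 blocks with one block per pair at each symbolism level) can be sketched as follows. The counterbalancing is simplified here to a seeded per-run shuffle; the paper's exact counterbalancing scheme is not specified.

```python
import random

def build_runs(seed=0):
    """Hypothetical block schedule for the fMRI experiment: four runs, each
    containing one block per antonym pair at each sound symbolism level."""
    pairs = ["fast/slow", "moving/still", "pointy/round", "big/small"]
    levels = ["symbolic", "non-symbolic"]
    rng = random.Random(seed)
    runs = []
    for _ in range(4):                                    # 4 functional runs
        blocks = [(p, s) for p in pairs for s in levels]  # 8 blocks per run
        rng.shuffle(blocks)                               # simplified ordering
        runs.append(blocks)
    return runs
```

Across the four runs, each of the eight pair-by-symbolism combinations appears exactly four times, matching the 32-block design.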
Following completion of the main task, imaging participants
also viewed stimuli designed to separately localize form- and
motion-sensitive areas of visual cortex. An object form localizer
was used to select regions in lateral occipital cortex (LOC) responsive to intact versus scrambled objects or shapes. The localizer
consisted of alternating blocks of abstract shapes and scrambled
abstract shapes presented centrally. Each image was displayed for
800 ms with a 200 ms ISI between images, with 20 images per
block. A blank screen was displayed for 12 s between shape and
scrambled shape blocks. To ensure attention to the stimuli, participants performed a 1-back task, pressing a button to repeated
shapes or scrambled shape images (10% of trials). Six complete cycles of intact and scrambled blocks were presented. During the motion localizer task, participants passively fixated a central cross
while twelve 20-s blocks of moving or stationary dots were presented. Dots moved radially at 7°/s in an annulus ranging from
1° to 14° during the motion intervals. Due to equipment malfunction, only 11 of the 15 participants viewed the motion localizer
stimuli; all participants viewed the form localizer.
4.4. Image acquisition
All MRI data were collected on a Siemens 3T Trio scanner with a
12-channel RF-receive head coil. Functional data were collected
using an EPI pulse sequence with the following scan parameters:
repetition time (TR) 2000 ms, echo time (TE) 30 ms, flip angle
90°, 64 × 64 matrix, 192 × 192 mm field of view (FoV), GRAPPA
parallel imaging with acceleration factor PE = 2, and isotropic voxel
size of 3 mm. Thirty-seven axial slices aligned with the A–P plane
were collected in an ascending interleaved order. For the main
task, we collected a total of 572 volumes (143 in each of 4 runs).
Object form and motion localizers consisted of 192 and 160 functional volumes, respectively. Diffusion tensor images (DTI) were acquired using a diffusion-weighted EPI sequence with a TR of 7700 ms, TE of 90 ms, matrix size 102 × 102, and FoV 204 × 204 mm, with a voxel size of 2 × 2 × 2 mm. Two repetitions of
30 directions were collected, along with a reference B0 image. In
addition, a 3D anatomical image was acquired for each participant
using a T1-weighted MP-RAGE sequence at a voxel size of 1 × 1 × 1 mm with a TR of 2300 ms, TE of 3.02 ms, TI of 1100 ms, 256 × 256 matrix, 256 × 256 mm FoV, 192 slices, and GRAPPA parallel imaging with acceleration factor PE = 2.
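As a quick sanity check on these parameters, the reported volume counts and TRs imply the approximate acquisition times below (a back-of-envelope sketch that ignores dummy scans and gaps between sequences; it assumes the single reference B0 mentioned above).

```python
# Back-of-envelope scan durations from the reported parameters (sketch only;
# ignores dummy scans and inter-sequence gaps).
TR_FUNC_S = 2.0   # functional TR
TR_DTI_S = 7.7    # diffusion TR

main_task_vols = 4 * 143            # 572 volumes across 4 runs
func_secs = main_task_vols * TR_FUNC_S

dti_vols = 2 * 30 + 1               # two repetitions of 30 directions + one B0
dti_secs = dti_vols * TR_DTI_S

print(main_task_vols, func_secs)    # 572 volumes, 1144 s (~19 min)
print(dti_vols, round(dti_secs))    # 61 volumes, ~470 s (~8 min)
```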
4.5. fMRI data analysis
Following the removal of three initial volumes, functional data
were slice-time corrected, motion-corrected, aligned to each subject’s anatomical image, normalized to the colin27 template in
Talairach space, and smoothed at 8 mm FWHM using AFNI (Cox,
1996). Data were analyzed using multiple linear regression via AFNI’s 3dDeconvolve tool. The regression model included head movement vectors as regressors of no interest. Five task regressors were
modeled with gamma variate functions convolved with stimulus
timing and duration. The four conditions of interest were sound
symbolic words referring to motion antonym pairs, sound symbolic words referring to object form antonym pairs, non-symbolic
motion pairs, and non-symbolic form pairs. The initial instruction
screen was modeled separately. Beta weights from the subject-level analysis were submitted to a whole-brain group-level analysis.
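The construction of a single task regressor can be sketched as follows, assuming AFNI's documented default GAM impulse response, h(t) = (t/(p·q))^p · exp(p − t/q) with p = 8.6 and q = 0.547, which peaks near 4.7 s with unit amplitude. The onset and duration values here are illustrative, not the experiment's actual timing.

```python
import math

# Sketch: gamma variate HRF (AFNI's default GAM parameters, an assumption)
# convolved with a stimulus boxcar, sampled at each TR. Illustrative timing.
P, Q = 8.6, 0.547
TR = 2.0

def gamma_hrf(t, p=P, q=Q):
    """AFNI-style GAM impulse response; zero at and before stimulus onset."""
    if t <= 0:
        return 0.0
    return (t / (p * q)) ** p * math.exp(p - t / q)

def regressor(onsets, dur_s, n_vols, tr=TR, dt=0.1):
    """Convolve a stimulus boxcar with the HRF on a fine grid, sample at TRs."""
    n_fine = int(n_vols * tr / dt)
    box = [any(on <= i * dt < on + dur_s for on in onsets) for i in range(n_fine)]
    fine = [sum(box[j] * gamma_hrf((i - j) * dt) * dt for j in range(i + 1))
            for i in range(n_fine)]
    return [fine[int(v * tr / dt)] for v in range(n_vols)]

reg = regressor(onsets=[4.0, 30.0], dur_s=2.0, n_vols=25)
```

The fine-grid convolution followed by TR sampling mirrors what 3dDeconvolve does internally when a basis function is convolved with stimulus timing and duration.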
Data were thresholded at a cluster-level family-wise error (FWE) corrected p = 0.05 (a minimum cluster size of 31 voxels at an uncorrected p < 0.001)
using AFNI’s cluster-based Monte Carlo simulation method. Data
from the motion and form localizers were pre-processed and
analyzed separately using identical methods. Due to group-level
overlap between LOC and MT/MST (Kourtzi, Bulthoff, Erb, & Grodd,
2002), individual ROIs for visual motion processing and visual form
processing were defined for each participant separately (Saxe,
Brett, & Kanwisher, 2006) by drawing a sphere with radius
4.4 mm around each individual’s peak voxel in each hemisphere
for the motion (when present) and form localizer contrasts.
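The sphere-based ROI definition amounts to selecting every voxel whose center lies within 4.4 mm of the individual's peak voxel. A minimal sketch on the 3 mm isotropic functional grid reported above (the peak coordinate is illustrative):

```python
import math

# Sketch of the subject-specific ROI definition: all voxels whose centers
# fall within a 4.4 mm radius of the individual's peak localizer voxel,
# on the 3 mm isotropic functional grid. Peak coordinate is illustrative.
VOXEL_MM = 3.0
RADIUS_MM = 4.4

def sphere_roi_voxels(peak_ijk, voxel_mm=VOXEL_MM, radius_mm=RADIUS_MM):
    """Voxel indices (i, j, k) whose centers are within radius_mm of the peak."""
    px, py, pz = peak_ijk
    r = int(radius_mm // voxel_mm) + 1  # search window half-width, in voxels
    roi = []
    for i in range(px - r, px + r + 1):
        for j in range(py - r, py + r + 1):
            for k in range(pz - r, pz + r + 1):
                if math.dist((i, j, k), peak_ijk) * voxel_mm <= radius_mm:
                    roi.append((i, j, k))
    return roi

roi = sphere_roi_voxels((30, 20, 15))
print(len(roi))  # 19: the peak, 6 face neighbors, 12 edge neighbors (3*sqrt(2) ~ 4.24 mm)
```

Note that with 3 mm voxels a 4.4 mm radius captures face- and edge-adjacent neighbors but excludes corner neighbors (3·√3 ≈ 5.2 mm), yielding a compact 19-voxel ROI.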
4.6. DTI data analysis
DTI data analysis was performed in FSL using tract-based spatial
statistics (TBSS) (Smith et al., 2006). Data were eddy-current corrected before fitting a diffusion tensor model to calculate fractional
anisotropy (FA) values at each voxel in the brain. The FA images
were aligned to MNI standard space and the mean FA map across
all participants was thresholded at an FA value of 0.2 to define an
FA skeleton representing the centers of all tracts common to the
group. Individual subject FA values were projected onto the group
skeleton for further analyses. We examined the relationship between behavioral performance and FA using the threshold-free
cluster enhancement (TFCE) technique available in FSL's randomise permutation-testing program. To maintain consistency between the fMRI and DTI analyses, all coordinates of significant
clusters from the TBSS analysis are reported in Talairach space following the icbm_fsl2tal transformation (Lancaster et al., 2007).
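The per-voxel FA values that TBSS operates on follow from the standard definition in terms of the diffusion tensor's eigenvalues, sketched below.

```python
import math

# Sketch of fractional anisotropy (FA), computed per voxel from the three
# eigenvalues of the fitted diffusion tensor (standard definition).
def fractional_anisotropy(l1, l2, l3):
    """FA from tensor eigenvalues: 0 = isotropic, approaching 1 = fully anisotropic."""
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0
    return math.sqrt(0.5 * num / den)

print(fractional_anisotropy(1.0, 1.0, 1.0))  # 0.0 (isotropic diffusion)
print(fractional_anisotropy(1.7, 0.2, 0.2))  # high FA, as in coherent white matter
```

The 0.2 threshold on the mean FA map serves to restrict the skeleton to white matter, since gray matter and CSF voxels typically fall below that value.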
Acknowledgments

This work was supported by a GSU/GT Center for Advanced
Brain Imaging seed grant (KPR) and an Emory College Instrumentation, Bridge, Instruction, and Seed grant (LCN & LLN).
Appendix A. Supplementary material
Supplementary data associated with this article can be found, in
the online version, at
References

Aziz-Zadeh, L., Fiebach, C. J., Naranayan, S., Feldman, J., Dodge, E., & Ivry, R. B. (2008).
Modulation of the FFA and PPA by language related to faces and places. Social
Neuroscience, 3(3–4), 229–238.
Bergen, B. K. (2004). The psychological reality of phonaesthemes. Language, 80(2),
Berlin, B. (1994). Evidence for pervasive synaesthetic sound symbolism in
ethnozoological nomenclature. In L. Hinton, J. Nichols, & J. Ohala (Eds.), Sound
symbolism (pp. 77–93). New York: Cambridge University Press.
Brang, D., Hubbard, E. M., Coulson, S., Huang, M., & Ramachandran, V. S. (2010).
Magnetoencephalography reveals early activation of V4 in grapheme-color
synesthesia. Neuroimage, 53(1), 268–274.
Brang, D., Taich, Z., Hillyard, S. A., Grabowecky, M., & Ramachandran, V. S. (2013).
Parietal connectivity mediates multisensory facilitation. Neuroimage.
Brang, D., Williams, L. E., & Ramachandran, V. S. (2012). Grapheme-color
synesthetes show enhanced crossmodal processing between auditory and
visual modalities. Cortex, 48(5), 630–637.
Brown, R. W., Black, A. H., & Horowitz, A. E. (1955). Phonetic symbolism in natural
languages. Journal of Abnormal Psychology, 50(3), 388–393.
Calvert, G. A. (2001). Crossmodal processing in the human brain: Insights from
functional neuroimaging studies. Cerebral Cortex, 11(12), 1110–1123.
Chao, L. L., Weisberg, J., & Martin, A. (2002). Experience-dependent modulation of
category-related cortical activity. Cerebral Cortex, 12(5), 545–551.
Cox, R. W. (1996). AFNI: software for analysis and visualization of functional
magnetic resonance neuroimages. Computers and Biomedical Research, 29(3),
DeFife, L. C., Nygaard, L. C., & Namy, L. L. (in preparation). Cross-linguistic
consistency and within-language variability of sound symbolism in natural language.
Esterman, M., Verstynen, T., Ivry, R. B., & Robertson, L. C. (2006). Coming unbound:
disrupting automatic integration of synesthetic color and graphemes by
transcranial magnetic stimulation of the right parietal lobe. Journal of
Cognitive Neuroscience, 18(9), 1570–1576.
Gasser, M. (2004). The origins of arbitrariness in language. Paper presented at the
Proceedings of the Cognitive Science Society.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing.
Nature Reviews Neuroscience, 8, 393–402.
Hockett, C. F. (1977). The view from language: selected essays, 1948–1974. Athens:
University of Georgia Press.
Hua, K., Zhang, J., Wakana, S., Jiang, H., Li, X., Reich, D. S., et al. (2008). Tract
probability maps in stereotaxic spaces: Analyses of white matter anatomy and
tract-specific quantification. Neuroimage, 39(1), 336–347.
Hubbard, E. M., Brang, D., & Ramachandran, V. S. (2011). The cross-activation theory
at 10. Journal of Neuropsychology, 5(2), 152–177.
Imai, M., Kita, S., Nagumo, M., & Okada, H. (2008). Sound symbolism facilitates early
verb learning. Cognition, 109(1), 54–65.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution.
Oxford; New York: Oxford University Press.
Kita, S. (1997). Two-dimensional semantic analysis of Japanese mimetics.
Linguistics, 35(2), 379–415.
Kohler, W. (1947). Gestalt psychology, an introduction to new concepts in modern
psychology. New York: Liveright Pub. Corp.
Kourtzi, Z., Bulthoff, H. H., Erb, M., & Grodd, W. (2002). Object-selective responses in
the human motion area MT/MST. Nature Neuroscience, 5(1), 17–18.
Kovic, V., Plunkett, K., & Westermann, G. (2010). The shape of words in the brain.
Cognition, 114(1), 19–28.
Kunihira, S. (1971). Effects of expressive voice on phonetic symbolism. Journal of
Verbal Learning and Verbal Behavior, 10(4), 427–429.
Lancaster, J. L., Tordesillas-Gutierrez, D., Martinez, M., Salinas, F., Evans, A., Zilles, K.,
et al. (2007). Bias between MNI and Talairach coordinates analyzed using the
ICBM-152 brain template. Human Brain Mapping, 28(11), 1194–1205.
Landau, A. N., Aziz-Zadeh, L., & Ivry, R. B. (2010). The influence of language on
perception: listening to sentences about faces affects the perception of faces.
Journal of Neuroscience, 30(45), 15254–15261.
Martino, G., & Marks, L. E. (2001). Synesthesia: strong and weak. Current Directions
in Psychological Science, 10(2), 61–65.
Maurer, D., Pathman, T., & Mondloch, C. J. (2006). The shape of boubas: Sound–shape correspondences in toddlers and adults. Developmental Science, 9(3),
Meteyard, L., Zokaei, N., Bahrami, B., & Vigliocco, G. (2008). Visual motion interferes
with lexical decision on motion words. Current Biology, 18(17), R732–R733.
Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J.
(2002). Multisensory auditory-visual interactions during early sensory
processing in humans: A high-density electrical mapping study. Brain
Research. Cognitive Brain Research, 14(1), 115–128.
Monaghan, P., Christiansen, M. H., & Chater, N. (2007). The phonological–
distributional coherence hypothesis: Cross-linguistic evidence in language
acquisition. Cognitive Psychology, 55(4), 259–305.
Monaghan, P., Christiansen, M. H., & Fitneva, S. A. (2011). The arbitrariness of the
sign: Learning advantages from the structure of the vocabulary. Journal of
Experimental Psychology: General, 140(3), 325–347.
Namy, L. L., DeFife, L. C., Mathur, N. M., & Nygaard, L. C. (submitted). Cross-linguistic
sound symbolism: Phonetic determinants of word meaning.
Neufeld, J., Sinke, C., Dillo, W., Emrich, H. M., Szycik, G. R., Dima, D., et al. (2012a).
The neural correlates of coloured music: A functional MRI investigation of
auditory-visual synaesthesia. Neuropsychologia, 50(1), 85–89.
Neufeld, J., Sinke, C., Zedler, M., Dillo, W., Emrich, H. M., Bleich, S., et al. (2012b).
Disinhibited feedback as a cause of synesthesia: evidence from a functional
connectivity study on auditory-visual synesthetes. Neuropsychologia, 50(7),
Nuckolls, J. B. (1999). The case for sound symbolism. Annual Review of Anthropology,
28, 225–252.
Nygaard, L. C., Cook, A. E., & Namy, L. L. (2009). Sound to meaning correspondences
facilitate word learning. Cognition, 112(1), 181–186.
Ohala, J. (1983). Cross-language use of pitch: An ethological view. Phonetica, 40(1),
Ohala, J. J. (1984). An ethological perspective on common cross-language utilization
of F0 of voice. Phonetica, 41(1), 1–16.
Perniss, P., Thompson, R. L., & Vigliocco, G. (2010). Iconicity as a general property of
language: Evidence from spoken and signed languages. Frontiers in Psychology, 1, 227.
Pinker, S. (1999). Words and rules: The ingredients of language (1st ed.). New York:
Basic Books.
Ramachandran, V. S., & Hubbard, E. M. (2001). Synesthesia – A window into
perception, thought, and language. Journal of Consciousness Studies, 8(12), 3–34.
Revill, K. P., Aslin, R. N., Tanenhaus, M. K., & Bavelier, D. (2008). Neural correlates of
partial lexical activation. Proceedings of the National Academy of Sciences USA, 105(35),
Robertson, L. C. (2003). Binding, spatial attention and perceptual awareness. Nature
Reviews Neuroscience, 4(2), 93–102.
Rouw, R., & Scholte, H. S. (2007). Increased structural connectivity in grapheme-color synesthesia. Nature Neuroscience, 10(6), 792–797.
Sapir, E. (1929). A study in phonetic symbolism. Journal of Experimental Psychology,
12, 225–239.
Saussure, F. d. (1959). Course in general linguistics. New York: Philosophical Library.
Saxe, R., Brett, M., & Kanwisher, N. (2006). Divide and conquer: a defense of
functional localizers. Neuroimage, 30(4), 1088–1096; discussion 1097–1099.
Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. (2000). Identification of a pathway for
intelligible speech in the left temporal lobe. Brain, 123(Pt 12), 2400–2406.
Sereno, J. A., & Jongman, A. (1990). Phonological and form class relations in the
lexicon. Journal of Psycholinguistic Research, 19(6), 387–404.
Smith, S. M., Jenkinson, M., Johansen-Berg, H., Rueckert, D., Nichols, T. E., Mackay, C.
E., et al. (2006). Tract-based spatial statistics: Voxelwise analysis of multisubject diffusion data. Neuroimage, 31(4), 1487–1505.
Spector, F., & Maurer, D. (2009). Synesthesia: A new approach to understanding the
development of perception. Developmental Psychology, 45(1), 175–189.
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics, 73(4), 971–995.
Thompson, R. L., Vinson, D. P., & Vigliocco, G. (2010). The link between form and
meaning in British sign language: Effects of iconicity for phonological decisions.
Journal of Experimental Psychology. Learning, Memory, and Cognition, 36(4),
Tzeng, C., Nygaard, L. C., & Namy, L. L. (submitted for publication). The specificity of
sound symbolic correspondences in spoken language.
van Leeuwen, T. M., den Ouden, H. E., & Hagoort, P. (2011). Effective connectivity
determines the nature of subjective experience in grapheme-color synesthesia.
Journal of Neuroscience, 31(27), 9879–9884.
Vandermosten, M., Boets, B., Poelmans, H., Sunaert, S., Wouters, J., & Ghesquiere, P.
(2012). A tractography study in dyslexia: neuroanatomic correlates of
orthographic, phonological and speech processing. Brain, 135(Pt 3), 935–948.
Willems, R. M., & Casasanto, D. (2011). Flexibility in embodied language
understanding. Frontiers in Psychology, 2, 116.
Wong, F. C., Chandrasekaran, B., Garibaldi, K., & Wong, P. C. (2011). White matter
anisotropy in the ventral language pathway predicts sound-to-word learning
success. Journal of Neuroscience, 31(24), 8780–8785.