Foundations of Intonational Meaning

Topics in Cognitive Science 8 (2016) 425–434
Copyright © 2016 The Authors. Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf
of Cognitive Science Society
ISSN:1756-8757 print / 1756-8765 online
DOI: 10.1111/tops.12197
Foundations of Intonational Meaning: Anatomical
and Physiological Factors
Carlos Gussenhoven
Department of Linguistics, Radboud University Nijmegen
Received 17 January 2014; received in revised form 17 July 2015; accepted 6 August 2015
Abstract
Like non-verbal communication, paralinguistic communication is rooted in anatomical and
physiological factors. Paralinguistic form-meaning relations arise from the way these affect speech
production, with some fine-tuning by the cultural and linguistic context. The effects have been
classified as “biological codes,” following the terminological lead of John Ohala’s Frequency
Code. Intonational morphemes, though arguably arbitrary in principle, are in fact heavily
biased toward these paralinguistic meanings. Paralinguistic and linguistic meanings for four biological codes are illustrated. In addition to the Frequency Code, the Effort Code, and the Respiratory Code, the Sirenic Code is introduced here, which is based on the use of whispery phonation,
widely seen as being responsible for the signaling and perception of feminine attractiveness and
sometimes used to express interrogativity in language. In the context of the evolution of language,
the relations between physiological conditions and the resulting paralinguistic and linguistic meanings will need to be clarified.
Keywords: Intonational meaning; Paralinguistics; Biological codes; Effort Code; Frequency Code;
Respiratory Code; Sirenic Code
1. Introduction
Human vocal communication proceeds through two systems, language and paralanguage
(Ladd, 1996). The simultaneous use of these systems is particularly evident in vocal fold
vibration, and tone and intonation therefore provide a unique platform for seeing them in
action. This contribution has two goals. First, it argues that the form–meaning relations in
expressive uses of vocal pitch are grounded in anatomical and physiological effects on vocal
fold vibration and that the intonation systems of languages are biased toward the form–meaning
relations based on these “biological codes.” Second, it introduces the Sirenic Code
as an explanation for the use of breathy or whispery voice in questions.
Correspondence should be sent to Carlos Gussenhoven, Afdeling Taalwetenschap, Faculteit der Letteren,
Radboud University Nijmegen, Postbus 9103 6500HD Nijmegen, The Netherlands. E-mail: c.gussenhoven@let.ru.nl
2. Paralinguistics
Paralinguistics preceded the origin of language. Just as non-verbal gestures like the nose
wrinkle for social distancing and the eyebrow flash for surprise or fear are “grounded in our
common anatomical-emotional history” (Segerstråle & Molnar, 1997; in reference to a
chapter by Wolf Schiefenhövel in their volume), so paralinguistic form–meaning relations
are based on metaphorical interpretations of the effects of anatomical and physiological factors on vocal fold vibration. John Ohala pointed out that differences in larynx size and the
resulting differences in vocal fold size, notably those between men and women, are responsible for the interpretation of high pitch as “small” meanings and low pitch as “big” meanings, a relation he termed the Frequency Code (Ohala, 1983, 1984, 1996). In retrospect,
there are additional codes besides the size/frequency relation. Gussenhoven (2002, 2004)
extended the idea to the Effort Code and the Respiratory Code. (The latter term is from
Nolan [2006].) A fourth code to be discussed below is the Sirenic Code. Paralinguistic
meanings can be divided into those that relate to the message (“informational” meanings)
and those that relate to the speaker (“affective” meanings). An example of an informational
meaning is “interrogative,” while an example of an affective meaning is “cooperative.”
2.1. The Frequency Code
The association of “small” meanings with high pitch yields interpretations like submission, friendliness, uncertainty, and vulnerability, while “big” meanings are their opposites:
dominance, aggressiveness, certainty, and protectiveness. Placing his account within the
general complex of agonistic signals in mammals and birds (Morton, 1977), Ohala (1983)
noted that the association of large larynxes with large creatures is pre-determined in
mammalian species through sexual dimorphism. Puberty in human males is marked by a
disproportionate growth of the larynx, increasing vocal fold mass, as well as a lowering
of the larynx, increasing the length of the vocal tract, in addition to the onset of peripheral facial hair. These adjustments serve to create the impression of a large creature,
which will discourage potential aggressors.
2.2. The Effort Code
The Effort Code is based on the effect that more careful pronunciation has on the pitch
range and the precision in the execution of pitch movements. Wider pitch range and more
precise realizations of pitch events are associated with greater significance. In addition to
greater insistence, wider pitch range can signal cooperativeness, as in infant-directed
speech. For some languages, it has been shown that pitch falls tend to be steeper and
timed earlier when used to express focus (Smiljanic & Hualde, 2000). A less expected
use of the Effort Code is pitch range compression toward mid pitch to express negation,
intended to effect the retraction as opposed to the addition of information (Gussenhoven,
2004, p. 88). The prediction was tested by Reckling and Kügler (2011) for German.
Speakers compressed their pitch to the mid range in negative utterances and listeners
associated that adjustment with negation.
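The compression of pitch range toward mid pitch can be illustrated numerically. The following is a minimal sketch only, not the procedure used in the study cited above: it assumes that f0 is handled on a semitone scale, that "mid" is the midpoint of the contour's range on that scale, and that compression is a uniform scaling by a hypothetical `factor`. All function names are illustrative.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Distance of f0_hz from ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def semitones_to_hz(st, ref_hz):
    """Inverse of hz_to_semitones."""
    return ref_hz * 2.0 ** (st / 12.0)


def compress_toward_mid(contour_hz, factor=0.5):
    """Scale each f0 value's distance from mid pitch by `factor`.

    Assumption: mid pitch is the geometric mean of the lowest and highest
    f0 value, i.e. the midpoint of the range on a logarithmic scale.
    factor=1.0 leaves the contour unchanged; factor=0.0 flattens it.
    """
    if not contour_hz:
        return []
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie between 0 and 1")
    mid_hz = math.sqrt(min(contour_hz) * max(contour_hz))
    return [
        semitones_to_hz(factor * hz_to_semitones(f0, mid_hz), mid_hz)
        for f0 in contour_hz
    ]


def pitch_range_semitones(contour_hz):
    """Span between the lowest and highest f0 value, in semitones."""
    return hz_to_semitones(max(contour_hz), min(contour_hz))
```

With a one-octave contour such as `[100.0, 200.0, 150.0, 100.0]` and `factor=0.5`, high values are lowered and low values are raised, so the span is halved while the mid pitch stays where it was.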
2.3. The Respiratory Code
The Respiratory Code, earlier the “Production Code” (Gussenhoven, 2002), relates the
declining f0 caused by diminishing subglottal air pressure to a phrasal profile beginning
with high pitch and ending with low pitch. The physiological effect applies to the exhalation phase, and thus to speech produced during a breath group. However, even though
breath intakes do not reliably coincide with phrase boundaries, the association is with the
beginnings and ends of phonological units, like the intonational phrase and the utterance.
In fact, declination is itself an exploitation of the Respiratory Code, being largely under
the control of the speaker.1 At the beginning of these phonological constituents, a high
pitch signals new topics and low pitch the continuation of a topic. At the end, high pitch
signals turn maintenance, while low pitch signals closure and thus potentially a turn shift.
In English, for instance, higher and later peaks have been reported in topic-initiating
sentences (Wichmann, House, & Rietveld, 1997).
2.4. The Sirenic Code
Breathy or husky voice is associated with feminine sexiness, anecdotal evidence for
which is found in the British English term “bedroom voice” as well as in informal
reports that breathy voice is associated with female prostitution in South Korea.
Breathiness is more common in female speech, as concluded by van Bezooijen (1984,
p. 25) on the basis of studies like Henton and Bladon (1988), Trittin and de Santos y
Lleó (1995), and Mendoza, Valencia, Muñoz, and Trujillo (1996). From the perspective of regular phonation, breathy voice is typically created by slack vocal folds and
weak adduction, so that no firm closure stage is achieved. The escaping air that
causes the glottal friction cannot fully contribute to the force that pushes the closed
vocal folds up and apart, and because of the larger width of the glottis, the vocal
folds are less affected by the lowered air pressure between them when air flow is
increased (thus diminishing the Bernoulli effect). Both low arousal and femininity
have been associated with breathiness in perception research. First, breathy voice is
associated with meanings like relaxed, intimate, friendly, and timid, as found on the
basis of synthetic stimuli by Gobl and Ní Chasaide (2003). Second, voice quality, and
breathy voice in particular, have different perceptual effects in male and female
speech. Addington (1968) found that female breathy voice was perceived as feminine,
small, slim, good-looking, immature, and humorous, while the only unique attribute
for male breathy voice was artisticity. Xu, Lee, Wu, Liu, and Birkholz (2013) showed
that male listeners preferred a female voice that signals a small body size, a voice
with high pitch, wide formant dispersion, and breathy voice. Female listeners preferred
a male voice with the opposite qualities for the first two factors; breathy voice was
positively evaluated, which Xu et al. (2013) explain as a softening effect on the
aggressiveness associated with a large body size, in agreement with the low arousal
attribute referred to above.
An important question for our topic is why a breathy or husky female voice should
have these attributes. Laver (1991) points out that voice quality characteristics may
derive from hormonal conditions, “where, for example, these result in changes in the
copiousness and consistency of the supply of lubricating mucus to the larynx, and in
the characteristics of the mucous membrane covering the actual vocal cords. Such
changes occur in the pregnant and pre-menstrual states in women (Perello, 1962).”
These conditions, Laver goes on, may cause “slight harshness and whispery or breathy
voice.” Possibly, such states may also occur during sexual arousal as a result of
increased hydration of the sexual organs at the expense of other organs.2 Evidence that
the attractiveness of a woman’s voice varies across the menstrual cycle is provided by Pipitone and Gallup (2008).
3. “Biological codes” and language structure
Parallel to segmental structure, intonational structure consists of a set of tonal morphemes, a phonological grammar describing the way they are combined and integrated into
the sentence, plus language-specific phonetic realization rules (Pierrehumbert, 1980). In
many languages, single tones (i.e., H for high or L for low) or complexes of tones (HL,
etc.) are morphemes located at the edges of prosodic constituents (“boundary tones”) or in
locations inside these constituents (“pitch accents”). In this section, examples are given of
intonational morphemes whose tonal composition reflects one of the four codes outlined
above. The non-arbitrariness of intonational morphemes is akin to segmental sound symbolism (Dingemanse, 2011; Ohala, 1996). For instance, the frequent occurrence of palatal consonants and front high vowels for diminutive morphemes is explained by the interpretation
of a “small” meaning suggested by the forward position of the tongue body and the spread
position of the lips, gestures that mimic a short forward section of the vocal tract (Ohala,
1996), acoustically inversely related to the second vowel formant. These discrete interpretations of gradient paralinguistic forms need not be present in all languages. Interestingly,
while paralinguistic meanings of pitch shapes may relate to states of the speaker (affective
meanings), this is rarely the case for intonational morphemes, which by and large have
informational meanings. (See Supplementary Information S1.)
3.1. The Frequency Code in grammars
In (1a), the British English Would you like some coffee? combines with the pitch
accent H*L on coffee, the initial boundary tone %L, and the final boundary tone H%.
(The * after a tone marks it as synchronizing with the accented syllable, while % indicates the boundary of the intonation phrase.) The phonetic realization is given above the
sentences, where the bullets indicate the timing and pitch height of the pronunciation of
each tone, with connecting lines forming the pitch contour.
Structure (1a) is discretely permutable with other tone options. Instead of %L we
could have %H, instead of H*L we could have L*H, and so on. Replacing H*L with
H* will lead to the intonation contour in (1b) (where the mid pitch on cof- is directly
followed by high pitch on -fee, without first dipping down), but there is no third contour with a semi-dip and an intermediate meaning. Structure (1a) is commonly used
for questions in England, while (1b) is the usual form in the United States and
Canada. These rising contours are representative of interrogative intonations found
world-wide. Although there are many languages that have falling question intonation
(Bolinger, 1978), the number of languages with rising question intonations is well
above chance.
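The discrete permutability of tonal structures like (1a) can be made concrete by enumerating the combinations of a small tone inventory. The sketch below is illustrative only. The inventory is an assumption limited to the tones named in this section, with L% added from Section 3.3, and the enumeration makes no claim that every resulting combination is attested or meaningful in English.

```python
from itertools import product

# Assumed inventory: only the tones mentioned in the text, not a full grammar.
INITIAL_BOUNDARY = ["%L", "%H"]
PITCH_ACCENTS = ["H*L", "L*H", "H*"]
FINAL_BOUNDARY = ["H%", "L%"]


def enumerate_tunes(initial=INITIAL_BOUNDARY,
                    accents=PITCH_ACCENTS,
                    final=FINAL_BOUNDARY):
    """Return every initial-boundary / pitch-accent / final-boundary string."""
    return [" ".join(choice) for choice in product(initial, accents, final)]


tunes = enumerate_tunes()
for tune in tunes:
    print(tune)
```

Structure (1a) corresponds to `%L H*L H%` and (1b) to `%L H* H%`. Because the choices are categorical, the list contains no entry lying between these two, which mirrors the absence of a contour with a semi-dip.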
Grammaticalization of secondary uses of the Frequency Code (see Supplementary
Information S1) is found in languages with pitch accents defining earlier peaks for
statements and later peaks for questions, like Neapolitan Italian (D’Imperio & House,
1997). Increased final syllable length is found in West Greenlandic, where questions
are formed by adding a mora (a phonological timing unit) to the end of the utterance
(Rischel, 1974). As a result, the intonation contour gets shifted rightward, creating different shapes on the final two syllables.
3.2. The Effort Code in grammars
The prosodic expression of focus is typically achieved with the help of intonational
structures that enhance the prominence of the focus constituent. West Germanic languages have no pitch accents on words that come after the focus of the sentence, as
in coffee in Would you like still MORE coffee? Other languages may contrast H-toned
pitch accents for focused words with L-toned ones after the focus (Frota, 2000). Withdrawal of information, that is, negation, through pitch range reduction can also
be expressed discretely. In Engenni, a tone language, high tones are lowered and
low tones are raised in negative sentences from the verb onwards. This feature may
be the only structural way negation is expressed in the language (Thomas, 1978,
p. 67).
3.3. The Respiratory Code in grammars
Many languages use a high final boundary tone (H%) as a marker of non-finality, often
referred to as “comma-intonation.” An iterative lowering of H-tones in an utterance, a feature known as Downstep, is a second type of grammaticalization of the Respiratory Code.
Its meaning (“there is no room for further discussion”; Gussenhoven, 2004, p. 107) is a
version of the “finality” meaning of final low pitch. It explains the preference for the
downstepped pronunciation of titles of stories in English, a point where the listener is
expected not to interrupt the reader. Observe that this account interprets the fact that H%
may mean both “non-finality” and “interrogativity” in the same language as accidental,
since they derive from different codes. Central Swedish keeps them separate: L% is used
in questions and statements, while H% predominantly occurs in non-final intonational
phrases and more rarely in questions (Riad, 2014, p. 266).
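The iterative lowering of H-tones under Downstep can be pictured as a simple recurrence. The sketch below is a toy model under stated assumptions, not an analysis of any particular language: it assumes a fixed baseline, a constant lowering `ratio` applied to each successive H-tone's height above that baseline, and arbitrary Hz values. The function and parameter names are hypothetical.

```python
def downstep_targets(n_high_tones, first_peak_hz=200.0,
                     baseline_hz=100.0, ratio=0.7):
    """Return f0 targets for a sequence of downstepped H-tones.

    Assumption: each H-tone's height above the baseline is `ratio` times
    that of the preceding H-tone, so the peaks fall step by step without
    ever reaching the baseline.
    """
    if n_high_tones < 0:
        raise ValueError("n_high_tones must be non-negative")
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    if first_peak_hz <= baseline_hz:
        raise ValueError("first_peak_hz must exceed baseline_hz")

    targets = []
    height = first_peak_hz - baseline_hz
    for _ in range(n_high_tones):
        targets.append(baseline_hz + height)
        height *= ratio  # the next H-tone is lowered relative to this one
    return targets
```

Calling `downstep_targets(4)` yields four successively lower peaks. The lowering is a property of the tone sequence as a whole, which is consistent with treating Downstep as a grammaticalized feature and not as passive declination.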
3.4. The Sirenic Code in grammars
If breathy voice signals femininity, as suggested in Section 2.4, it may be usable to
signal interrogativity, the informational meaning of the Frequency Code based on “smallness” associated with the female larynx. It might moreover be expected to be a feature in
languages with reduced availability of pitch cues, as in tone languages. “Lax voice” was
reported to occur as a question cue in the last syllable of phrases in 36 of 78 African languages listed by Rialland (2007), many of which are tone languages. Rialland describes
this type of question marker as a complex of lengthening, vowel lowering, and low pitch.
Increased final lengthening is identified as a secondary feature of the Frequency Code in
Supplementary Information S1, and low pitch is a natural accompaniment of breathy
voice. The function of vowel lowering, which may increase the amplitude of the first harmonic, is not clear. Breathy or whispered termination, a phrase-final [+spread glottis] feature, can be independent of the lexical tone value (low, high, falling) on the last syllable.
In (2a) (from Ikaan, a Benue-Congo language spoken in Nigeria), a final whispered vowel
co-occurs with H and in (2b) with L (Salffner, 2010). In addition, questions have
expanded pitch range, here speculatively indicated by the initial %H boundary tone. (An
acute accent indicates H, a grave accent L, while acute-grave and grave-acute indicate
falling and rising tone, respectively.)
The Sirenic Code may dispense with the breathy voice and have low tone as a secondary cue. For instance, in Basaa (Benue-Congo, Cameroon), polar questions end in
low-toned [e], which assimilates to a preceding adjacent vowel, causing the vowel in final
open syllables to lengthen by one low-toned mora (Makasso, 2008, p. 54), as illustrated
in (4a,b,c). WH-words have a final floating H-tone that attracts an extra mora in phrase-final, though not in phrase-medial position (Hamlaoui & Makasso, 2011). These examples
again show that secondary grammaticalizations (final syllable lengthening) may exist in
the absence of primary grammaticalizations (H% to mark questions).
4. Conclusion
The phonological forms of a considerable proportion of the intonation contours in languages derive from paralinguistic form-meaning relations that have arisen through
metaphorical interpretations of effects that anatomical and physiological conditions have
on vocal fold vibration. One question here concerns the extent to which these observations provide a perspective on the possible continuity between phylogenetically older systems of communication and language. Ohala (1983), in fact, provides ample parallels
between the paralinguistic meanings of the Frequency Code as used in human communication and agonistic signaling in animals. This connection exists independently of the
degree to which animal signals encode referential meanings as opposed to involuntarily
reflecting biological states, which provide information to other animals (cf. Rendall,
Owren, & Ryan, 2009). A second question concerns the extent to which paralinguistic
meanings are reflected in early development of human communication (cf. Oller, 2000).
This issue is considered in Supplementary Information S3.
Another position defended here is that final lax voice to mark interrogativity, as
attested in many African languages (Rialland, 2007), indirectly supports the conception
behind the Frequency Code by Ohala (1983), which holds that vocal features that express
femininity may be used to express interrogativity. The Sirenic Code was introduced here
to account for the interrogativity meaning of breathy voice. Varying hydration of the larynx as a function of menstrual and sexual activity and the resulting attractiveness of a
breathy or husky voice were proposed as the basis for the metaphor in this case. Ohala
(1983, 1984) derives the interrogativity meaning of the Frequency Code from the
submissiveness and dependence signaled by high pitch. It could also be a referential
transfer of the “uncertainty” meaning as relating to the speaker to one relating to the message (Gussenhoven, 2004). In either interpretation, the interrogativity meaning of whispery voice may have been mediated by the shared femininity attribute of high pitch and
whispery phonation.
Acknowledgments
I am grateful to John Laver and Kim Oller for discussing the issues with me and
amply providing me with references, and to Annie Rialland and two anonymous reviewers for commenting on a previous version.
Notes
1. The contribution of falling subglottal pressure is hard to establish (Ohala, 1990;
Strik & Boves, 1995). Subglottal pressure variation is largely due to impedance
effects of the articulation of speech sounds. An indication of the role of subglottal
pressure is given by the finding by van Katwijk (1974) that the onset of speech
coincides with a sudden increase in subglottal air pressure, while “at the endings of
utterances a gradual shapeless diminishing of pressure occurred.”
2. Laver (1991) mentions sexual arousal along with the other reproductive functions, but
he observes in a personal communication (2013) that he has found no confirmation in
the medical literature. On the connection between hormonal states and singing performance, see Lã and Davidson (2005), who also give a survey of observations and
research on the connections between the larynx and hormonal states.
References
Addington, D. (1968). The relationship of selected vocal characteristics to personality perception. Speech
Monographs, 35, 492–503.
van Bezooijen, R. (1984). Characteristics and recognizability of vocal expressions of emotion. Dordrecht, the
Netherlands: Foris.
Bolinger, D. (1978). Intonation across languages. In J. H. Greenberg (Ed.), Universals of human language,
Vol. 2 (Phonology) (pp. 471–524). Stanford, CA: Stanford University Press.
D’Imperio, M., & House, D. (1997). Perception of questions and statements in Neapolitan Italian.
Proceedings of EUROSPEECH 1997, 1, 251–254.
Dingemanse, M. (2011). The meaning and use of ideophones in Siwu. Nijmegen, the Netherlands: Max
Planck Institute for Psycholinguistics.
Frota, S. (2000). Prosody and focus in European Portuguese. New York: Garland.
Gobl, C., & Ní Chasaide, A. (2003). The role of voice quality in communicating emotion, mood and attitude.
Speech Communication, 40, 189–212.
Gussenhoven, C. (2002). Intonation and interpretation: Phonetics and phonology. In Speech Prosody 2002
(pp. 47–57). ProSig and Université de Provence, Laboratoire de Parole et Langage: Aix-en-Provence.
Gussenhoven, C. (2004). The phonology of tone and intonation. Cambridge, UK: Cambridge University
Press.
Hamlaoui, F., & Makasso, E. M. (2011). Basaa Wh-questions and prosodic structuring. ZAS Papers in
Linguistics, 55, 47–63.
Henton, C., & Bladon, A. (1988). Creak as a socio-phonetic marker. In L. M. Hyman & C. N. Li (Eds.),
Language, speech and mind: Studies in honor of Victoria A. Fromkin (pp. 3–29). London: Croom Helm.
van Katwijk, A. F. V. (1974). Accentuation in Dutch. Assen: van Gorcum.
Lã, F., & Davidson, J. W. (2005). Investigating the relationship between sexual hormones and female
Western classical singing. Research Studies in Music Education, 24, 75–87.
Ladd, D. R. (2008 [1996]). Intonational phonology (2nd ed.). Cambridge, UK: Cambridge University Press.
Laver, J. (1991). Voice quality and indexical information. In J. Laver (Ed.), The gift of speech: Papers in the
analysis of speech and voice (pp. 147–161). Edinburgh: Edinburgh University Press.
Makasso, E.-M. (2008). Intonation et mélismes dans le discours oral spontané en Bàsàa. Ph.D. thesis,
Aix-Marseille Université.
Mendoza, E., Valencia, N., Muñoz, J., & Trujillo, H. (1996). Differences in voice quality between men and
women: Use of the Long-Term Average Spectrum (LTAS). Journal of Voice, 10, 59–66.
Morton, E. W. (1977). On the occurrence and significance of motivation-structural rules in some bird and
mammal sounds. The American Naturalist, 111, 855–869.
Nolan, F. (2006). Intonation. In B. Aarts & A. McMahon (Eds.), Handbook of English linguistics (pp. 433–
456). Oxford, UK: Blackwell.
Ohala, J. J. (1983). Cross-language use of pitch: An ethological view. Phonetica, 40, 1–18.
Ohala, J. J. (1984). An ethological perspective on common cross-language utilization of f0 in voice. Phonetica,
41, 1–16.
Ohala, J. J. (1990). Respiratory activity in speech. In W. J. Hardcastle & A. Marchal (Eds.), Speech
production and speech modeling (pp. 23–53). Dordrecht, the Netherlands: Kluwer.
Ohala, J. J. (1996). The frequency code underlies the sound symbolic use of voice pitch. In L. Hinton, J.
Nichols, & J. J. Ohala (Eds.), Sound symbolism (pp. 325–347). Cambridge, UK: Cambridge University
Press.
Oller, D. K. (2000). The emergence of the speech capacity. Mahwah, NJ: Lawrence Erlbaum.
Perelló, J. (1962). La disfonia premenstruel. Acta Oto-rino-laringologica Ibero-Americana, 23, 561–563.
Pierrehumbert, J. B. (1980). The Phonetics and Phonology of English Intonation. Ph.D. thesis, MIT.
Distributed by Indiana University Linguistics Club.
Pipitone, R. N., & Gallup, G. G., Jr. (2008). Women’s voice attractiveness varies across the menstrual
cycle. Evolution and Human Behavior, 29, 268–274.
Reckling, F., & Kügler, F. (2011). Pitch range in negative and positive connoted sentences in German.
Rendall, D., Owren, M. J., & Ryan, M. J. (2009). What do animal signals mean? Animal Behaviour, 78, 233–240.
Riad, T. (2014). The phonology of Swedish. Oxford, UK: Oxford University Press.
Rialland, A. (2007). Question prosody: An African perspective. In T. Riad & C. Gussenhoven (Eds.), Tones
and tunes. Volume I: Typological and comparative studies on tone and intonation (pp. 35–56). Berlin:
Mouton de Gruyter.
Rischel, J. (1974). Topics in West Greenlandic phonology. Copenhagen: Akademisk Forlag.
Salffner, S. (2010). Intonation and phonation type as markers in Ikaan yes/no questions. Paper presented at
the Fourth Conference on Tone and Intonation (TIE4), Stockholm.
Segerstråle, U., & Molnar, P. (1997). Non-verbal communication: Crossing the boundary between culture and
nature. In U. Segerstråle & P. Molnar (Eds.), Non-verbal communication: Where nature meets culture (pp.
1–26). Mahwah, NJ: Lawrence Erlbaum.
Smiljanic, R., & Hualde, J. I. (2000). Lexical and pragmatic functions of tonal alignments in two Serbo-Croatian dialects. In A. Okrent & J. Boyle (Eds.), Proceedings from the Main Session of the 36th
Regional Meeting of the Chicago Linguistic Society, Vol. 36–1 (pp. 469–482). Chicago: CLS.
Strik, H., & Boves, L. (1995). Downtrend in f0 and psb. Journal of Phonetics, 23, 203–220.
Thomas, E. (1978). A grammatical description of the Engenni language. Dallas, TX: Summer Institute of
Linguistics.
Trittin, P. J. T., & de Santos y Lleó, A. (1995). Voice quality analysis of male and female Spanish speakers.
Speech Communication, 16(4), 359–368.
Wichmann, A., House, J., & Rietveld, T. (1997). Peak displacement and topic structure. In A. Botinis, G.
Kouroupetroglou, & G. Carayannis (Eds.), Intonation: Theory, models and applications. Proceedings of an
ESCA workshop (pp. 329–332). Athens (Greece): ESCA and University of Athens, Department of
Informatics.
Xu, Y., Lee, A., Wu, W.-L., Liu, X., & Birkholz, P. (2013). Human vocal attractiveness as signaled by body
size projection. PLoS ONE, 8, 1–9. doi:10.1371/journal.pone.0062397
Supporting Information
Additional Supporting Information may be found in
the online version of this article:
Supporting Information S1. Biological codes.
Supporting Information S2. On intonational meaning.
Supporting Information S3. Intonation in phylogeny
and ontogeny.
Supporting Information S4. Sound files for examples
(2) and (3),
courtesy of Sophie Salffner, and (1).