
ITRW on Speech and Emotion
Newcastle, Northern Ireland, UK
September 5-7, 2000
ISCA Archive
http://www.isca-speech.org/archive
THE ATTITUDINAL EFFECTS OF PROSODY, AND HOW THEY RELATE
TO EMOTION
Anne Wichmann
Department of Cultural Studies, University of Central Lancashire, Preston PR1 2HE, UK
ABSTRACT
The aim of this paper is to contribute to a theoretical framework
for the study of affective intonation. I draw a distinction
between 'attitude' and 'emotion', suggesting that only the latter is
likely to be reflected directly in the speech signal, while
'attitude' is reflected indirectly, and can only be explained by a
process of linguistic analysis. The term 'attitude', as applied to
intonation and prosody, is a problematic one. It has been used
differently in different fields, such as social psychology and
linguistics, and is not made any clearer by the proliferation of
'attitudinal' labels in the intonation literature. I suggest that
while there are clearly prosodic signals in speech which
contribute to the impression of 'attitude', this perceived meaning
should be treated as a pragmatic implicature or a pragmatic
inference. This means that it can only be explained by taking
into account contextual features, such as speaker-hearer
relationship, and the text itself. The same intonational feature
can be attitudinally neutral, or signal positive or negative
attitudes depending on a complex interaction between prosody,
text and context.
1. INTRODUCTION
One of the most important yet elusive functions of intonation is
its so-called 'attitudinal' function. There seems to be no dispute
over the fact that we are able, simply by how we say something
(in everyday terms, our 'tone of voice'), to convey meanings in
conversation which are different from, or go beyond, what we
say. There is a long tradition of trying to identify these
different nuances of meaning which we intuitively feel can be
conveyed by intonation, and there have been many attempts to
describe the intonational or other prosodic features which give
rise to these meanings. However, since each observed example
of perceived meaning tends to be given an ad hoc label, one of
the main problems we face is the profusion of labels, along with
the assumption that every label must refer to something
different. I am by no means the first to suggest that this mass of
labels is unhelpful. O'Connor made the same point over a
quarter of a century ago:
"This topic (attitudinal intonation) is bedevilled by the lack of
agreed categories and terms for dealing with attitudes; they have
been characterised here by terms such as 'conspiratorial, awe,
concern, perfunctory' etc., etc., more with an eye to identifying
them to the reader than to classifying them in an orderly
scheme, and until some method of dealing with attitudes is
developed along very much more scientific lines than is possible
at present, we shall not even be able to tell whether this
language and that are similar or different in the number or
nature of attitudes they mark" (O'Connor 1973: 270).
If we look at the most recent publications on intonation (e.g.
Tench 1996) there seems to have been little progress on this
front. And yet we know intuitively that intonation can express
something which we loosely term attitude.
2. METHODOLOGICAL PROBLEMS
One of the problems of researching this issue lies in deciding
what we mean by 'attitude'. It appears to be difficult to
distinguish 'attitude' from other kinds of affective meaning such
as 'emotion'. In fact, many accounts of intonation assume the
two to be synonymous, or at least overlapping.
Crystal notes that an important function of intonation is
'attitudinal', but he does so under the heading of 'emotion', thus
unhelpfully conflating the two: "Emotion: Intonation's most
obvious role is to express attitudinal meaning - sarcasm,
surprise, reserve, impatience, delight, shock, anger, interest, and
thousands of other semantic nuances" (1995: 249). It seems
therefore that the most important prerequisite for studying the
intonational (or other prosodic) correlates of attitude and
emotion is to consider carefully what we mean by the terms. In
the following I will first examine the way in which the
terms 'attitude' and 'emotion' appear to be used by psychologists.
I will then turn to the way in which they are used by linguists.
2.1. Attitude and Human Behaviour
In social psychology the attempt to define 'attitude' is part of the
desire to construct a theory of social behaviour, in other words
to explain why people behave as they do. The scientific study of
attitude constituted an important part of the work in social
psychology in the 20th century. An early view of attitude was
that it was a 'behavioural disposition', a general tendency
towards or against certain social actions. Methods of measuring
attitude, usually on unidimensional, evaluative scales, were
developed (e.g. Osgood's semantic differential), but empirical
studies failed to show a clear relationship between measured
attitudes and subsequent actions.
Subsequently a more complex, multi-dimensional view of
attitude emerged - composed of affect, cognition and volition.
Attitudes were thus seen as "complex systems comprising the
person's beliefs about the object, his feelings towards the object,
and his action tendencies with respect to the object" (Ajzen &
Fishbein 1980: 19). These components appear to be highly
interrelated, and not necessarily in a consistent way. Some
studies have found, for example, a high degree of correlation
between affect and volition, while others have found that the
strength of correlation depended on the type of intended
behaviour. The differences between findings such as these may
be due to different kinds of intentions being considered in each
study. Little distinction appears to have been drawn, when
comparing these studies, between, for example, a general
intention to interact with a person or group, and specific
intentions such as 'giving friendship' or 'showing admiration'.
It is hardly surprising that the latter correlate more highly with
affect than the former.
The fact that beliefs, feelings and intentions may interact in a
complex way in determining behaviour does not remove the
need to draw distinctions between them. Ajzen and Fishbein
restrict the term 'attitude' to the cognitive
component of the model: "a person's general evaluation or
overall feeling of favourableness or unfavourableness towards
the behaviour in question" (1980: 55). Thus while some use the
term 'attitude' to describe the complex whole, i.e. as an inclusive
term encompassing evaluation, belief, affect and volition, Ajzen
and Fishbein use the term to refer to only one component of that
whole.
In summary, from the social psychologist's point of view,
'attitude' is a term for a determinant of intention (to act) and is
itself determined by beliefs (e.g. whether the intended behaviour
is good or bad). Alternatively, 'attitude' may also be found as a
superordinate term for the complex system which includes
beliefs, and, in addition, emotions and willingness to act, which
together determine a person's intention to act.
2.2. Attitude and Emotion
Recent work on prosody and emotion (e.g. Cowie et al. 1999)
has departed from what Cowie et al. term the 'classical'
approach, which restricts the study of emotion and speech to
extreme, primary emotions, and attempts instead to deal with more
subtle variation and less extreme emotions. They define emotion
on two dimensions: positive/negative and active/passive. In
their study of perceived changes in emotional state in the course
of dialogues they elicit judgements from listeners on these two
dimensions, but also supply "selected emotion words ... at that
point in the (two-dimensional) space where their reported coordinates indicate that they lie" (1999: 43). The two dimensions
seem to reflect a combination of 'feeling' (the active/passive
axis) and evaluation (the positive/negative axis). The latter
closely resembles Ajzen and Fishbein's definition of 'attitude'.
If one examines the 'emotion words' used in Cowie et al.'s study,
rather than the description of the dimensions themselves, it
becomes clear that many of them suggest a meaning which not
only contains a conflation of these two dimensions but also
additional meaning relating more closely to a specific context.
'Trusting', for example, is located in the space defined as
moderately positive and moderately passive. This may of course
capture very well the abstract affective and evaluative elements
in 'trust', but it involves a further element, namely the belief that
the other person or information can be relied on. This goes
beyond simple affect and attitude. Other examples, too, such as
hopeful, suspicious, obedient and contemptuous, convey much
more meaning than is reflected by their position in the two-dimensional 'emotional' space. Labelling an emotion, as
opposed to identifying it in two-dimensional space, seems in
most cases to provide additional meaning which goes beyond
the underlying feelings and evaluation, and captures beliefs
relating to a specific situational context.
The problem with labels is therefore this: words which we use
every day to describe behaviour, linguistic or otherwise, such as
'friendly', 'trusting', 'suspicious', do not lend themselves readily
to the labelling of single attitudinal or affective parameters.
Thus while psychologists have clearly made every effort to
define what they see as the determinants of social action, the
words used to label individual examples of 'emotion' or 'attitude'
do not always successfully capture what their users intend, but
are more complex, and more importantly, bound to a specific
context of situation.
2.3. Attitude and Language
The subtle and complex nuances of attitudinal meaning
expressed by such words are not, of course, conveyed by
intonation alone. Ladd rightly points out that "... there are
universal signals in tone of voice which color our reading of a
speaker's attitude ... (but) these are a more basic means of
expressing attitude than linguistic - including intonational - choice" (Ladd 1978: 128) (my emphasis). Here Ladd makes a
crucial distinction between non-linguistic and probably
universal settings of the voice (his use of 'tone of voice' is
similar to what is now generally referred to as 'voice quality'),
and intonation as part of a linguistic system. The former may
correlate reliably with human emotions and attitudes (in the
sense of evaluation) and can be measured acoustically.
However, as Ladd himself argues, it makes little sense to seek
direct acoustic correlates of linguistic meaning. Much of what I
have to say here relates to the kind of linguistic analysis which
is required to account for perceived attitudinal meanings, in
particular the attitudinal cues of intonation, which are known to
be language-specific.
While psychologists have been at pains to define precisely what
they mean by emotion and attitude, linguists have paid little
attention to the definition or categorisation of 'attitudinal' or
'affective' labels. These have been used in an ad hoc way,
arising mainly from individual examples used by different
writers to illustrate how intonation can convey meaning
which goes beyond the text. Far more attention has been paid,
albeit in an equally unsystematic way, to the intonational
correlates of these perceived meanings. A first attempt at
categorisation was made by Couper-Kuhlen (1986: 185-7). She
suggests a way of distinguishing between emotion and attitude
by defining emotion as a speaker state ('he is feeling
happy/sad...') and attitude as a kind of behaviour ('he is being
condescending, friendly...'). This confuses the issue slightly:
while psychologists define attitude as a possible predictor of
social behaviour, many of the attitudinal labels used in
intonation research actually refer to the behaviour itself. This
model also omits precisely that kind of cognitive attitude which
is so central to the work of social psychologists, namely
attitudes which are functions of opinions, beliefs or knowledge.
These have been described elsewhere as 'propositional attitudes'
as they are attitudes towards propositions, a "psychological
attitude towards a state of affairs" (Leech 1983: 106). A
propositional attitude is affectively neutral, but the kind of
labels we associate with it reflect the emotions aroused by that
opinion, belief or knowledge. The labels associated with the
expression of propositional attitude seem therefore to be closely
allied to those associated with emotion. They also have in
common the fact that emotions can be felt and opinions held,
without resulting in any behaviour at all. In terms of
conversational interaction this means that they can exist without
another person being present. Attitude defined as speaker
behaviour, on the other hand, cannot be present independently
of interaction. To put it simply, you can be happy on your own,
and disapproving on your own, but you cannot be
condescending on your own.
I will categorise as 'expressive' intonation those intonational
characteristics which appear to convey pure emotion, and
emotions arising from or closely linked to beliefs, knowledge
and opinion. I use 'attitudinal' intonation to mean any
intonational cue which, together with non-linguistic information
in the voice, reflects speaker behaviour in a given situation,
either as intended by the speaker, or as inferred by the receiver,
or both. I believe this kind of attitude is best approached as a
part of speaker meaning. To understand it and to establish the
role of intonation in communicating it, we must see it as part of
pragmatics.
Expressive intonation reflects emotion and propositional attitude
(opinion, belief, knowledge about a person or issue).
    Frame: "He is (feeling)..." / "I am..."
    Examples (emotion): happy, angry, sad…
    Examples (propositional attitude): critical, impressed, disapproving…
Attitudinal intonation reflects speaker behaviour
(intended and/or perceived in a given context).
    Frame: "You are being..."
    Examples: condescending, friendly, rude…
Figure 1: The distinction between expressive and attitudinal meaning, with examples of labels which might
apply in each case.
3. INTONATIONAL CUES TO ATTITUDE
First of all I will consider the way in which prosody may relate
to emotion. Speaker emotions are usually assumed to affect
utterance-level settings of paralinguistic features. A wide pitch
range is commonly associated with strong emotions (both
positive and negative), while a compressed range is associated
with low emotional involvement ('passive' on the active/passive
dimension), again both positive and negative. Thus a stretch of
speech with a low, narrow pitch range together with indications
of positive orientation, e.g. smiles, might reflect contentment,
and with negative signals it might indicate boredom or gloom.
The same is true of backchannel responses, which normally,
assuming cooperative behaviour on the part of the interlocutors,
reflect the mood indicated by the other speaker. A nasalised
backchannel ('hmm') with a wide range might be an appropriate
response to an animated comment, while a low, narrow version
would be more appropriate if the current talk is low and
narrow in range.
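The pitch-range reasoning above treats two cues separately: pitch range stands in for the active/passive dimension, while orientation signals such as smiles supply the positive/negative one. The following Python sketch makes that logic explicit as a toy quadrant lookup. It assumes, purely for illustration, that each cue is binary (wide/narrow range, positive/negative orientation); the enum and function names are hypothetical, and the readings are the examples discussed in this paper, not measured categories.

```python
from enum import Enum


class Range(Enum):
    """Hypothetical binary simplification of utterance-level pitch range."""
    WIDE = "wide"
    NARROW = "narrow"


class Orientation(Enum):
    """Hypothetical binary orientation cue (e.g. smile vs. no smile)."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


# Example readings taken from the discussion in this paper.
READINGS = {
    ("active", Orientation.POSITIVE): "pleasure",
    ("active", Orientation.NEGATIVE): "strong negative emotion",
    ("passive", Orientation.POSITIVE): "contentment",
    ("passive", Orientation.NEGATIVE): "boredom or gloom",
}


def emotional_quadrant(pitch_range: Range, orientation: Orientation) -> tuple[str, str]:
    """Place a stretch of speech in the active/passive x positive/negative space.

    Pitch range is read as the activation dimension; the orientation cue
    supplies the evaluative dimension. Returns (quadrant, example reading).
    """
    activation = "active" if pitch_range is Range.WIDE else "passive"
    quadrant = f"{activation}/{orientation.value}"
    return quadrant, READINGS[(activation, orientation)]
```

For instance, `emotional_quadrant(Range.NARROW, Orientation.POSITIVE)` returns `("passive/positive", "contentment")`. The sketch deliberately stops at emotion: on the account developed here, attitudinal labels cannot be read off these cues without context.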
There are very few linguistic choices, such as the choice of local
contour (nuclear tone), that are typically associated with
emotion. In the British system, the choice between a high fall
and a low fall is said to indicate affect. "The low fall is
generally more uninterested, unexcited, and dispassionate
whereas the high fall is more interested, more excited, more
involved ..." (Cruttenden 1986: 100). It should be noted, however, that
the high/low fall is often regarded as a single phonological
choice and the high/low distinction a difference of degree. It is
therefore a matter for debate whether this is a case of
paralinguistic setting or phonological choice. Less
controversially, the rise-fall (sometimes analysed as a falling
tone with a delayed F0 peak) is said to carry affective
meaning(s). Cruttenden suggests two - 'impressed' or 'surprised',
and 'challenging', depending on the type of clause and the
proposition contained in it. The very fact, however, that the
same contour can carry different meanings shows that even
these kinds of local meanings depend on context and are
therefore governed by pragmatic factors. "The 'challenging' as
opposed to the 'impressed' meaning seems to arise where the
speaker is disagreeing with the listener in some way; ... Another
local meaning sometimes ascribed to the rise-fall is 'ironic' or
'sarcastic'; but these meanings arise in a rather different way.
Like many types of humour, they depend on a mismatch, in this
case between situation and linguistic expression. If you do
something stupid and I say ^clever! there is firstly a mismatch
between the obvious stupidity of your action and my use of the
word clever; but the sarcasm is made even more telling by my
use of a tone which would normally indicate that I was
'impressed'" (Cruttenden 1986: 102).
Knowles (1987: 205-6) suggests that "it is extremely unlikely
that there are any attitudes which are conveyed uniquely by
intonation. It is possible that intonation patterns that are
regarded as attitudinally marked use the intonation system in an
unexpected way, and possibly in conjunction with other
linguistic patterns". This view is consistent with a pragmatic
approach to meaning, the kind of approach which is context-dependent. Speaker meaning, in so far as it deviates from the
surface meaning of an utterance, is generated in a systematic
way, usually when there is a perceived mismatch between the
content of an utterance and the context in which it is conveyed. I
assume that at least some intonationally conveyed attitudes are
also generated by some kind of mismatch, for example between
the intonation and the message, or between the intonation and
the context.
Some attitudes, of course, are not conveyed by mismatch.
Sometimes we say what we mean. The response 'wonderful!'
with a smile and wide pitch range to a speaker's good news,
such as having just become a father, needs little explanation.
The wide pitch range is a typical expression of strong feeling.
Here, given the meaning of 'wonderful', we can assume that it
conveys pleasure; the same pitch contour on 'No!' might
convey a very different emotion, located in a different part of the
two-dimensional space described above (active and negative rather
than active and positive). The expression of pleasure can be
interpreted as a positive evaluation of the proposition contained
in the message, and this in turn is likely to be interpreted as a
positive evaluation of the speaker. The interactional behaviour
is therefore likely to be seen as 'being friendly' or 'being
supportive'.
Many intonationally conveyed attitudes, however, especially
negative ones, are the result of some kind of mismatch. In the
situation described above, the response 'wonderful!' with a
narrow pitch range and no smile would indicate that the speaker
meant something else, although what precise meaning is
inferred would depend on the context, i.e. on who the
participants were and the situation they were in. Clearly, if we
are to investigate mismatches we must have a clear idea of what
constitutes a match. It must be possible to assign to a particular
intonation pattern, or some aspect of intonational behaviour, a
normal or expected meaning, which then has the potential to be
exploited in an unexpected way, either intentionally, generating
a prosodic implicature, or unintentionally, generating an
inference on the part of the hearer. These implied or inferred
meanings are in my view the key to many cases of perceived
'tone of voice' or attitudinal intonation. However, until we are
able to identify a normal association between intonation, text
and context we are not in a position to identify any deviation
from that norm.
4. A PRAGMATIC APPROACH TO ATTITUDE
If we take a subset of attitudinal labels to reflect implicatures,
rather than a direct expression of meaning through prosody, it
no longer makes sense to try to systematise those labels; instead,
this vindicates the subjective, ad hoc choice of label in any one
situation. Linguists working in pragmatics no longer attempt to make a list of
all possible implicatures. After some unsuccessful attempts (e.g.
Searle 1975) the preferred approach is simply to identify the
mechanisms whereby implicatures and inferences are generated.
Many of the attitudes which refer to interactive behaviour are in
fact descriptions of inferences made by the listener rather than
implicatures intended by the speaker, and linguists are just as
unlikely to consider listing all possible inferences on which
listeners can draw. In the same way, I regard it as futile to
look for the acoustic correlates of these labels.
In fact the majority of 'attitudinal' meanings can only be
explained using pragmatic analysis. If we accept that these
meanings are implied or inferred speaker meaning, we can
begin to re-evaluate some of the impressionistic comments
found in the intonation literature. For example, Brown et al.
(1980) consider the low endpoint of a terminal contour (falling
tone) to be an important finality signal. Not-low terminals (e.g.
rises, fall-rises) on the other hand are associated with 'more to
come'. Brown claims that "Not-low is also associated with a
range of affective meanings including deference, politeness,
vulnerability" (1980: 30). This is not difficult to account for.
Extreme finality means closure - 'this is the end of the matter' - a
powerful statement which is only appropriate in situations
where the speaker has the 'right' to control the discourse. The
expression of non-finality or non-closure, on the other hand,
effectively places the power to control the discourse, or at least to
participate in it equally, with another participant - hence the
perceived 'deference' which, if thought to be appropriate by both
participants in a dialogue, will be seen as polite, but if not, may
be reinterpreted as unassertive, weak or uncertain. The meaning
thus depends to some extent on the power relationship between
speaker and hearer. Vulnerability also reflects an unequal power
relationship, but on the dimension of weak to strong rather
than high status to low status. When power lies with the
stronger, the weaker is by definition vulnerable. Low falls are a
typical feature of child-directed speech. Here the sense of
closure signals control: 'It's all \right' with a low falling nuclear
tone, in conjunction with signals of positive affect (breathy
voice, smile), sounds comforting ('I am strong and I can protect
you'). The same tone may be used between adults if the
participants agree that there is a similar care-giving, and hence
temporarily unequal, relationship between them (such as
between a doctor and a worried patient). If this perception of the
relationship is not shared, however, the tone is likely to be
perceived (and possibly intended) as patronising or sarcastic.
All these 'meanings' simply refer to the effect in different
situational contexts of the same intonational behaviour, namely
signalling closure or non-closure. Closing things down or
leaving things open means different things at different times and
to different people.
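The point that one intonational behaviour yields different perceived meanings can be illustrated with a toy lookup in which the cue is held constant and only the context varies. The Python sketch below is a hypothetical illustration, not a model proposed in this paper: the two boolean context flags are simplifying assumptions, and the returned strings are the perceived meanings discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical, deliberately coarse description of the situation."""
    # Do both participants accept the relationship (deference, care-giving)
    # that the chosen contour implies?
    relationship_shared: bool
    # Is this a care-giving relationship (parent and child, doctor and
    # worried patient), with signals of positive affect such as a smile?
    care_giving: bool = False


def perceived_meaning(terminal: str, context: Context) -> str:
    """Return a likely perceived meaning for a terminal contour in context.

    terminal: "low" (closure, e.g. a low fall) or
              "not-low" (non-closure, e.g. a rise or fall-rise).
    """
    if terminal == "not-low":
        # Non-closure hands (some) control of the discourse to the hearer.
        if context.relationship_shared:
            return "polite (deferential)"
        return "unassertive, weak or uncertain"
    if terminal == "low":
        # Closure signals control of the discourse.
        if context.care_giving:
            if context.relationship_shared:
                return "comforting"
            return "patronising or sarcastic"
        return "closure: 'this is the end of the matter'"
    raise ValueError(f"unknown terminal contour: {terminal!r}")
```

Calling `perceived_meaning("low", Context(relationship_shared=True, care_giving=True))` gives `"comforting"`, while the same cue with `relationship_shared=False` gives `"patronising or sarcastic"`. A real account would need far richer contextual information; the sketch only shows that the mapping is a function of context, not of the contour alone.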
There are as many nuances of speaker meaning as there are
speakers and conversations. They can only be explained by
means of linguistic analysis. The intonational cues to speaker
meaning must be accounted for by linguistic analysis too. As
Ladd points out: "because intonation is assumed to be
peripheral, many ordinary assumptions and techniques of
linguistic analysis are discarded without discussion, and the
resulting confusion is taken to reflect the nature of intonation
rather than any shortcomings in the investigation" (1978: 120).
A non-linguistic approach to the intonation of emotion and
attitude can only hope to reveal those characteristics of the
human voice which signal universal human responses. Many of
the labels used in such work, however, reflect meaning which
has been linguistically encoded. To understand how intonation
contributes to meaning in this way, the normal process of
linguistic analysis is essential.
5. REFERENCES
Ajzen, I. and Fishbein, M. Understanding Attitudes
and Predicting Social Behaviour. Prentice Hall,
London, 1980
Couper-Kuhlen, E. English Prosody. Edward Arnold,
London, 1986
Cowie, R., Douglas-Cowie, E. and Romano, A.
Changing emotional tone in dialogue and its prosodic
correlates. Proc ESCA International Workshop on
Dialogue and Prosody. Veldhoven, The Netherlands,
1999
Crystal, D. Encyclopedia of the English Language.
CUP, Cambridge, 1995
Knowles, G. Patterns of Spoken English. Longman,
London, 1987
Ladd, D.R. The Structure of Intonational Meaning.
Indiana University Press, London, 1978
Leech, G.N. Principles of Pragmatics. Longman,
London, 1983
Murray, I. and Arnott, J. Synthesising emotions in
speech: is it time to get excited? Proc 4th Internat
Conf Spoken Language Processing. 1816-1819,
Philadelphia, 1996
O'Connor, J.D. Phonetics. Penguin Books, Middlesex,
1973
Searle, J.R. A classification of illocutionary acts.
Language in Society 5: 1-23, 1975
Tench, P. The Intonation Systems of English. Cassell,
London, 1996
Wichmann, A. Intonation in Text and Discourse.
Pearson Education, London, 2000