ITRW on Speech and Emotion
Newcastle, Northern Ireland, UK
September 5-7, 2000
ISCA Archive http://www.isca-speech.org/archive

ATTITUDES AND YES-NO QUESTIONS IN STANDARD FRENCH: TESTING TWO HYPOTHESES

Olivier Piot
Institut de Phonétique de l'Université Paris III-La Sorbonne Nouvelle, CNRS UPRESA 7018, FRANCE
e-mail: [email protected]

ABSTRACT

Two hypotheses are tested here, one about the use of F0 to convey an attitude of ignorance, the other about the expression of motivation by means of F0, vocal intensity and speech rate. Sixteen Klatt-synthesised prosodic variations of a yes-no question « Natacha? » (in French) are used for this test. Results show that greater ignorance is expressed by higher final F0 (and not by speech rate, final /a/ initial F0 or intensity rise), and greater desire to know by both higher speech rate and higher final vowel pitch register. Both hypotheses are satisfactorily verified, and this experiment allows them to be stated more precisely, so it may contribute to the theoretical study of the expression of emotions and attitudes by prosody.

INTRODUCTION

There are many studies on the expression of emotions by prosody, and also many attempts to understand the linguistic role of intonation: the former generally take into account only average prosodic values (average pitch, loudness, speech rate, etc.), while the latter very often merely mention the paralinguistic aspect of prosody. Few of them consider the interplay between pitch contours and attitudes or emotions. The aim of this study is to test two hypotheses regarding the expression of attitudes and emotions by prosody, applied to a yes-no question rising pitch contour in Standard French: the first is derived from the issue of size-sound symbolism, the second comes from the classical concept of arousal.

1.
FIRST HYPOTHESIS

In the field of the expression of attitudes in speech, there are few theoretical proposals that try to give a biologically based explanation for prosodic phenomena which are nevertheless broadly believed to be universal. One of them comes from Ohala (1984), who calls it the "frequency code" hypothesis: based on ethological studies by Morton (1977), it claims that humans and animals use a lower pitch to convey « the primary meaning of ‘large vocalizer’ and such secondary meanings as ‘dominant, aggressive, threatening, etc.’ », and conversely a higher pitch to convey « the primary meaning of ‘small vocalizer’ and such secondary meanings as ‘subordinate, submissive, non threatening, desirous of the receiver’s goodwill, etc.’ ». Thus the often-reported cross-language similarities in the intonation contours of yes-no questions [1] (cf. Bolinger, 1978, who reported that about 70% of a sample of nearly 250 languages were said to use a rising terminal to signal questions, and that the remaining 30% use a higher overall pitch for questions than for non-questions) are explained as a consequence of the universal use of high pitch to express one’s « desire of the receiver’s goodwill ». More precisely, questioning means asking someone for information, and raising pitch is a way of doing so with well-adapted politeness: expressing inferiority (and therefore a need for assistance) on this one particular point makes it (more) acceptable for the receiver to provide information without any comparable « reward » in return.

Recently we proposed (Piot, 1999) another psychologically grounded explanation for the same linguistic data, based on an intra-individual axis instead of the inter-species one proposed by Ohala.
It is a widely accepted and well-documented fact that the average pitch of each individual decreases from birth to adulthood, mainly because of the growth in length and weight of the vocal folds, and that it stabilises at the age of about thirty (cf. for instance Titze, 1993). This age axis thus constitutes an intra-individual scale that we believe is « symbolically » at stake in the final rise of yes-no questions, to express a feeling of low « maturity » on one particular point. This view takes subjective experience as a basic morphogenetic principle, rather than inferred knowledge of the acoustic and physical properties of vibrating folds, as is implied by Ohala’s proposal. Ohala’s view is also more difficult to defend, as body size alone does not allow the pitch of animals to be predicted with any accuracy across species, or even within a species. By contrast, the relationship between age and pitch does not suffer from any such disparity, and the curves of men and women show great similarities (Titze, 1993). This is consistent with the observation that pitch excursions in general (and those of yes-no questions in particular) are described as independent of the average pitch of the speaker, while Ohala’s hypothesis would predict similar pitch targets, hence greater excursions (toward the high pitch values) for men than for women.

As a consequence, what we propose is that high pitch basically conveys a feeling of ignorance or helplessness. Hence in the case of a yes-no question, a greater final rise in pitch is expected to convey greater ignorance. To be more precise, great ignorance means having no clue for guessing what the answer is, while low ignorance means having an opinion.

[1] i.e. questions whose answer is basically - and may be limited to - yes or no.

In his paper, Ohala describes two experiments showing that both a lower pitch register and a sharper terminal pitch fall are judged « more dominant » (Ohala, 1984).
These results are in agreement with our proposal, but they only provide an evaluation of a gross psychological parameter, not of the pragmatic use of pitch for the purpose of communication. We deal with this aspect in the experiment presented here.

2. SECOND HYPOTHESIS

The concept of arousal, coming from research on emotions, is now widely considered to be the primary parameter in the field of their expression in speech. It can be glossed as the automatic preparation of the body for action in response to « pregnant » stimuli (i.e. stimuli of importance for the survival of the individual and his caste). Alongside autonomic responses (increase in blood pressure, respiratory activity, etc.), one of its main skeletal motor consequences is to increase tonus. It is involved in the vocal expression of anger or joy for instance, increasing the mean values of pitch, intensity, speech rate, pitch range, etc. (cf. Scherer, 1986).

This idea was already suggested by Aristotle, who claimed that fear and attraction were the two basic emotions at the source of movement. But Darwin is undoubtedly the first author to put it into a clear and deterministic form, that of his third principle (Darwin, 1872): the excitation of the sensory system, due to a painful sensation or the perception of a pregnant stimulus, is redirected to the motor system in a diffuse and undifferentiated way. The motor aspect of arousal is described, for its rapidly reacting part, as a tonic response of the somatic nervous system (Scherer, 1986). Arousal thus facilitates movement, and therefore tends to enhance speech rate (greater articulatory force means greater accelerations of the articulators, and so a greater overall speed of articulation), vocal loudness (through an increase in the subglottal pressure Ps, due to the increase in respiratory and laryngeal activity, cf. Titze, 1993), and pitch (an increase in global laryngeal activity - or even in Ps alone - raises pitch, cf. Farley, 1994).
This does not mean, of course, that greater arousal necessarily increases all three prosodic parameters, because there is a process of control behind the realisation of any motor program (cf. Fonagy, 1983, who provides examples of high arousal emotions such as anger expressed with lower pitch and loudness through active laryngeal constriction), even though tonus itself may escape voluntary control. But an increase in one or several of these parameters constitutes a possible index of greater arousal, and thus may appear (either spontaneously, or through mimicry) in the expression of communicative attitudes such as a desire to get information on an important topic. This too is what we tested in our experiment.

3. YES-NO QUESTIONS IN FRENCH

All stimuli used in this study are yes-no question rising contours. This contour is the most common and unambiguous one (see Grundstrom, 1973) for communicating that a « syntactic assertion » is in fact a question (e.g. in English: he is gone?). It has been shown that the only important feature for this contour to be perceived as a question is a high rise in the last syllable (see for instance Faure, 1973), and that before it pitch may be rising, falling or staying at a constant value. We chose the last of these options (see figure 1), which may be considered as unmarked, for the following reasons:
1) it is one of the most studied, simple and unambiguous contours in Standard French,
2) both our hypotheses can be tested in a simple manner on this contour, by modifying either speech rate or final vowel loudness and pitch contours,
3) a great many intonation languages use a final syllable pitch rise for yes-no questions (see part 1, this paper),
4) this latter observation is certainly the most immediate issue to be accounted for by theories of a universal meaning of intonation,
5) ignorance and desire to know are two important psychological features of the speaker’s situation with regard to the subject of his questioning.
Applied to the one-word phrase « Natacha » (see below for a justification of this choice), it could have the meaning « Is it Natacha that you’re talking about? », or « Who has won this time, Natacha? » when the context and centre of interest is clearly the counting of points at the end of a game.

4. EXPERIMENTAL PROCEDURE

We thus limited ourselves here to synthesising one particular type of yes-no question contour: the pitch stays at a flat level until the last syllable, where it rises sharply. We saw in part 3 above that this is the most representative contour for yes-no questions in French, and that it is particularly well suited to testing both our hypotheses. We used the « Compost » software, an ergonomic and high quality Klatt synthesiser; synthesis is the only known way to modify one speech parameter while leaving all the others unchanged. We used the one-word phrase « Natacha », which was chosen for the following reasons:
1) 3 syllables are enough for the variations of all three prosodic parameters to be clearly perceived (which is confirmed by the results),
2) the use of 3 /a/ vowels neutralises intrinsic pitch and intensity effects,
3) the use of a small number of syllables reduces the non-informative input to the subjects’ ears, while allowing them to listen to the stimuli several times in a shorter space of time.

Figure 1 shows the method and values we chose [2] for the synthesis of our stimuli: they are based on the cross-variation of two different speech rates (the lower of the two corresponds to the initial synthesis, which was made after a recording at a rather low speech rate), two different loudness contours on « cha » (the lower of the two corresponds to an unmarked yes-no question, see Fonagy & Bérard, 1973, p. 78, and the higher one to a version perceived as emphatic), and four final pitch contours: the four values 154, 192, 240 and 300 Hz constitute a geometric progression, which means that they are perceptually equally spaced.
This implies here only two different final /a/ pitch rises from a perceptual point of view: one going from 154 to 192 and from 192 to 240 Hz, and a steeper one going from 154 to 240 and from 192 to 300 Hz. This was designed to allow the comparison of as many influences as possible: that of final syllable mean pitch (the perceptual steepness of the rise being held constant), of the initial value of the rise (with a constant final value), of the final value of the rise (with a constant initial value), and hence of the steepness of the rise (with a constant initial or final value). In a preliminary experiment, all 16 stimuli sounded like possible yes-no questions to our subjects, which was later confirmed during the main experiment (see also for example Faure, 1973, p. 14, for an earlier experimental justification).

[2] This choice is based on recordings: we recorded numerous realisations of the yes-no question « Natacha? » as described in the text, varying all three prosodic values while remaining within the scope of a perceived yes-no question.

[Figure 1. Prosodic cross-variation of pitch, loudness and speech rate, used to synthesise 16 yes-no question stimuli on the one-word phrase « Natacha? ». Values shown in the figure: pitch 123, 154, 192, 240 and 300 Hz; vocal intensity +2, +4 and +7 dB; speech rate: original phoneme durations, or durations multiplied by 0.8.]

The output of the synthesiser is at a 5 kHz - 16 bits format, so the stimuli were bandpass filtered between 75 and 5700 Hz. Silences of 200 and 0 milliseconds were placed at the beginning and end (resp.) of each file, the 200 ms ensuring that the mouse click did not interfere with the hearing of the stimuli, and the final 0 ms allowing Ss to listen to the stimuli again as quickly as they liked.
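The perceptual spacing of the four pitch targets can be checked numerically. The following is a minimal sketch, assuming the usual semitone conversion 12·log2(f2/f1) as the perceptual measure (the paper itself only states that the values form a geometric progression); the function and variable names are illustrative.

```python
import math

# Final-vowel pitch targets given in the text, in Hz.
PITCH_TARGETS_HZ = [154, 192, 240, 300]

# The four final rises used in the stimuli, as (initial, final) pairs in Hz.
RISES_HZ = [(154, 192), (192, 240), (154, 240), (192, 300)]


def semitones(f_start, f_end):
    """Interval between two frequencies in semitones (assumed perceptual scale)."""
    return 12 * math.log2(f_end / f_start)


# Ratios between successive targets: roughly constant for a geometric progression.
step_ratios = [b / a for a, b in zip(PITCH_TARGETS_HZ, PITCH_TARGETS_HZ[1:])]

# Size of each rise in semitones.
rise_sizes = {rise: semitones(*rise) for rise in RISES_HZ}

if __name__ == "__main__":
    print("step ratios:", [round(r, 3) for r in step_ratios])
    for (start, end), size in rise_sizes.items():
        print(f"{start} -> {end} Hz: {size:.2f} semitones")
```

Under this assumption the two smaller rises both come out just under 4 semitones and the two steeper ones just under 8, which matches the statement that only two perceptually different rise sizes are involved.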
The experiment was designed as a HyperCard stack [3], containing general instructions on the first card, followed by 8 test cards (4 for each of the two tested hypotheses) for preliminary practice (ensuring that the instructions were clearly understood, and displaying the entire range of prosodic variations appearing in the stimuli); then came two series (a) and (b) (one for each tested hypothesis) of 32 test cards, each series being made of two successive randomised series of all 16 stimuli, and being preceded by one additional instruction card.

[3] A stack is made of a series of cards following each other.

Each test card contained one stimulus (which could be listened to one or several times, by pressing a button), one attitude, and a popup box for the ratings. In part (a) of the experiment the attitude was « ignorance EXPRIMEE » (EXPRESSED ignorance), and in part (b) it was « envie de savoir EXPRIMEE » (EXPRESSED desire to know). The word « exprimée » was highlighted because we wanted the subjects to base their judgement only on what was directly expressed, and so to prevent them from making any inference about the speaker’s thoughts. For instance we had to avoid such underlying interpretations as « the speaker is restraining his anger »: the prosody was the target of the judgement. In both parts (a) and (b) the rating choice was between « très grande » (very high), « grande » (high), « moyenne » (medium), and « faible » (low). This « literal » judgement was found, in a preliminary experiment, to be easier for the subjects to use than the arithmetical one (rating from 1 to 4). The lowest rating level was chosen not to be « absent » (or 0), because both attitudes are implicitly present in this particular context of a yes-no question.
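As an illustration of the design described above, the sketch below builds the 16 stimuli as the cross of two speech rates, two loudness contours and four final pitch contours, and then a 32-card series made of two successive randomised orders of all 16. It is not the original HyperCard implementation; the names, the labels and the fixed random seeds are assumptions made for the example.

```python
import itertools
import random

# Design factors taken from the text; the string labels are illustrative.
SPEECH_RATES = ["slow", "fast"]
LOUDNESS_CONTOURS = ["unmarked", "emphatic"]
PITCH_RISES_HZ = [(154, 192), (154, 240), (192, 240), (192, 300)]


def all_stimuli():
    """Full cross of the three factors: 2 x 2 x 4 = 16 stimuli."""
    return list(itertools.product(SPEECH_RATES, LOUDNESS_CONTOURS, PITCH_RISES_HZ))


def build_series(stimuli, rng):
    """Two successive randomised orders of all stimuli, concatenated."""
    series = []
    for _ in range(2):
        block = list(stimuli)
        rng.shuffle(block)  # each block is an independent permutation
        series.extend(block)
    return series


stimuli = all_stimuli()
series_a = build_series(stimuli, random.Random(0))  # part (a); seed is arbitrary
series_b = build_series(stimuli, random.Random(1))  # part (b); seed is arbitrary

if __name__ == "__main__":
    for card_number, stimulus in zip(range(len(series_a), 0, -1), series_a):
        print(card_number, stimulus)
```

Shuffling within each block, rather than shuffling all 32 cards at once, guarantees that every stimulus is heard once before any stimulus is repeated.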
Each additional instruction card had two roles: the first was to indicate that part (a) or part (b) was about to start, and the second was to warn the subjects against the usual positive correlation between ignorance and desire to know, by giving them examples where the former is high while the latter is low, and vice versa. The division of the experiment into two successive parts was made to facilitate both the experimental task and the consistency of the judgements, by helping the subjects to elaborate their own rating scale. They could go back to the additional instruction card whenever they wanted to, simply by pressing a button. On each card of each part, a number (decrementing from 32 to 1) showed them their progress.

The experiment took place in an anechoic room, using professional quality earphones. The loudness (vocal intensity) at the subjects’ ears was between 65 and 72 dBA, which provided comfortable listening without causing noticeable auditory fatigue, all the more so as the experiment was rather short (between 10 and 20 minutes). The subjects were 20 students in linguistics, all native speakers of Standard French. They were between 18 and 25 years old; 14 were female and 6 male.

5. RESULTS AND DISCUSSION

The results, averaged over all 20 Ss, are shown in table 1 for part (a) and table 2 for part (b), where « très grande » (very high), « grande » (high), « moyenne » (medium), and « faible » (low) ratings were converted into 4, 3, 2 and 1 (resp.). The stimuli are coded in the following way: the first letter is for the speech rate (R: rapid, L: slow), and the following digits are, from left to right, for final /a/ initial pitch (1: 154 Hz, 2: 192 Hz), final pitch (2: 192 Hz, 3: 240 Hz, 4: 300 Hz), and final loudness (0: rise from 0 to 2 dB, 1: rise from 4 to 7 dB).

            Table 1, part (a):            Table 2, part (b):
            « ignorance exprimée »        « envie de savoir exprimée »
            (expressed ignorance)         (expressed desire to know)
stimulus    mean value    st. dev.        mean value    st. dev.
L120        1.88          1.02            1.40          0.55
L121        1.88          0.99            1.48          0.64
L130        2.20          0.72            2.05          0.68
L131        2.25          0.71            2.08          0.69
L230        2.35          0.74            1.95          0.64
L231        2.15          0.77            2.20          0.72
L240        3.10          0.78            3.23          0.77
L241        3.20          0.65            3.38          0.67
R120        1.88          0.82            1.83          0.75
R121        1.85          0.74            1.98          0.77
R130        2.23          0.73            2.33          0.73
R131        2.38          0.81            2.58          0.71
R230        2.15          0.74            2.63          0.74
R231        2.60          0.87            2.83          0.71
R240        3.10          0.84            3.50          0.55
R241        3.18          0.93            3.75          0.44

table 1: Mean ratings and standard deviation (expressed ignorance) for each yes-no question stimulus, averaged over twice-repeated ratings from 20 Ss (for a description of the stimuli’s naming, see part 5 above). Rating choice was between 1 (lowest rating), 2, 3 and 4 (highest rating).

table 2: Mean ratings and standard deviation (expressed desire to know) for each yes-no question stimulus, averaged over twice-repeated ratings from 20 Ss (for a description of the stimuli’s naming, see part 5). Rating choice was between 1 (lowest rating), 2, 3 and 4 (highest rating).

For all our statistical comparisons we used the two-tailed paired t-test.

Part (a): « ignorance EXPRIMEE »

We did not find any influence of speech rate, either globally (p=0.44), or considering each of the 8 R/L pairs of stimuli (p>0.23, except for L/R231: p=0.012). As we have p=0.23 for L/R230, which sounds very much like L/R231, we decided not to give too much importance to the L/R231 exception, and used p=0.01 as the threshold value, i.e. we retained only highly significant results. We therefore grouped together the results of R/L stimuli pairs. In the same way, no influence of the loudness contour could be found, either globally (p=0.14), or considering each of the 4 pitch contours (p>0.28). We therefore constituted 4 groups of 4 stimuli, each group differing from the others by its pitch contour.
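The grouping step can be illustrated with the mean ratings of Table 1. The sketch below only averages the published per-stimulus means within each pitch contour; it does not reproduce the paired t-tests, which require the individual ratings that are not given in the paper. The helper names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Mean ratings for part (a), copied from Table 1 (stimulus code -> mean rating).
TABLE_1_MEANS = {
    "L120": 1.88, "L121": 1.88, "L130": 2.20, "L131": 2.25,
    "L230": 2.35, "L231": 2.15, "L240": 3.10, "L241": 3.20,
    "R120": 1.88, "R121": 1.85, "R130": 2.23, "R131": 2.38,
    "R230": 2.15, "R231": 2.60, "R240": 3.10, "R241": 3.18,
}


def contour_of(code):
    """Pitch contour of a stimulus code, e.g. 'L241' -> '24' (initial, final pitch)."""
    return code[1:3]


def group_means(table):
    """Average the per-stimulus means over speech rate and loudness, per contour."""
    groups = defaultdict(list)
    for code, value in table.items():
        groups[contour_of(code)].append(value)
    return {contour: mean(values) for contour, values in sorted(groups.items())}


contour_means = group_means(TABLE_1_MEANS)

if __name__ == "__main__":
    for contour, value in contour_means.items():
        print(f"contour {contour}: {value:.2f}")
```

The contours « 13 » and « 23 », which share the same final pitch, come out close together (about 2.27 and 2.31), while « 12 » (about 1.87) and « 24 » (about 3.15) lie clearly below and above them.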
We then could not find any influence of the final /a/ initial pitch when comparing the « 13 » and « 23 » pitch contours (which have the same final pitch value): p=0.50. But the influence of final pitch is highly significant, stimuli being judged to express more ignorance when final pitch is higher (p=1.16E-06 for « 12 » vs. « 13 », and p=7.83E-20 for « 23 » vs. « 24 »).

As predicted, the final pitch value has a great influence on ratings for « ignorance EXPRIMEE ». Final /a/ average pitch cannot represent this tendency as well as the final pitch value, because final /a/ initial pitch alone has no influence on the ratings. The purpose of the additional instructions, that is the dissociation between ignorance and desire to know, has been successfully achieved: parts (a) and (b) appear to have very different results.

Part (b): « envie de savoir EXPRIMEE »

Loudness has a positive influence on all mean values (see table 2), but none of these differences is highly significant (p>0.02), while globally the effect is (p=3.13E-04). This suggests that loudness could have a small influence, which would need more Ss to appear as significant. However, we grouped together the results of all pairs of stimuli differing by their intensity contour. The global influence of speech rate was high (p=6.87E-16), and was attested in all 4 cases (p<2E-03). Final /a/ initial pitch had no influence at low speech rate (p=0.90), but had a slight one at high speech rate (p=9.66E-03): a stimulus whose final rise begins higher was judged to express a little more desire to know at high speech rate (2.33 and 2.58 vs. 2.63 and 2.83 resp.). This small effect could perhaps be explained by the need for a fast increase in laryngeal tension, at high speech rate, in order to go from 123 Hz before the « ch » sound to 192 Hz immediately after it. This fast increase could therefore be perceived as an indication of higher arousal. The influence of final pitch was highly significant, both globally (p=4.43E-46) and for each of the 4 pairs (p<1E-06).
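The size of the initial pitch effect on expressed desire to know can be read off the Table 2 means. The sketch below compares the « 13 » and « 23 » contours (same final pitch, different initial pitch) at each speech rate; it works on the published means only, so it illustrates the size of the difference and says nothing about its significance. The function name is illustrative.

```python
from statistics import mean

# Mean ratings for part (b), copied from Table 2, for the contours that end at 240 Hz.
TABLE_2_MEANS = {
    "L130": 2.05, "L131": 2.08, "L230": 1.95, "L231": 2.20,
    "R130": 2.33, "R131": 2.58, "R230": 2.63, "R231": 2.83,
}


def initial_pitch_effect(table, rate):
    """Mean rating with a 192 Hz start minus mean rating with a 154 Hz start.

    Both contours end at 240 Hz; the two loudness contours are averaged.
    """
    low_start = mean(table[f"{rate}13{loudness}"] for loudness in "01")
    high_start = mean(table[f"{rate}23{loudness}"] for loudness in "01")
    return high_start - low_start


effects = {rate: initial_pitch_effect(TABLE_2_MEANS, rate) for rate in "LR"}

if __name__ == "__main__":
    for rate, effect in effects.items():
        print(f"speech rate {rate}: {effect:+.3f} rating points")
```

On these means the difference is about 0.01 rating points at the slow rate and about 0.28 at the fast rate, in line with the observation that initial pitch matters only at high speech rate.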
Thus vocal intensity has a small positive effect on « envie de savoir EXPRIMEE », although it is significant only in a global comparison. Speech rate also has an influence, but it is larger (with the prosodic variations we used in this experiment) and always highly significant. The same is true of the influence of final pitch, but given the influence of final /a/ initial pitch at high speech rate, the question is which of final pitch or final /a/ pitch register best accounts for this tendency. If we further compare the « 12 » vs. « 23 », and also the « 13 » vs. « 24 » pitch contours, for both high and low speech rate, we find that the positive influence of final /a/ pitch register is even more significant than that of final pitch, both globally (p=2.90E-57, to be compared to p=4.43E-46) and for each of the 4 pairs (p<1E-10, to be compared to p<1E-06), while mean ratings show a comparable effect for « L » stimuli, but a higher effect of final /a/ pitch register for « R » ones [4]. It could be that a high speech rate reveals, maybe for the reason we suggested above, the critical influence of final /a/ initial pitch, and hence that final /a/ pitch register and speech rate are the two key parameters explaining the results of part (b).

CONCLUSION

Both hypotheses tested here are quite satisfactorily verified: in yes-no questions, greater ignorance may be expressed by a steeper final pitch rise, and greater desire to know by a higher final vowel pitch register and a faster speech rate. In this latter case final vocal loudness has a systematic positive influence on the ratings, but it does not reach significance except in a global comparison. This study, being at the interplay between attitudes and emotions, may contribute to theoretical work on the expression of emotions in speech, as well as to a better understanding of the interplay between linguistic and paralinguistic aspects of intonation.
It may be followed by a comparable study based on assertive contour(s).

[4] A simple calculation shows that this follows directly from the difference we found in the influence of final /a/ initial pitch.

REFERENCES

Bolinger, D. (1978): « Intonation across languages », in J. H. Greenberg et al. (Eds.) Universals of human language, Phonology, 2: 471-523.
Darwin, C. (1872): The expression of the emotions in man and animals, London: Murray.
Farley, G. R. (1994): "A quantitative model of voice F0 control", JASA 95 (2): 1017-1029.
Faure, G. (1973): « La description phonologique des systèmes prosodiques », in A. Grundstrom & P. Léon (Eds.) Interrogation et intonation en Français standard et Français Canadien (Studia Phonetica, 8), Montreal: Didier.
Fonagy, I., Bérard, E. (1973): « Questions totales simples et implicatives en français parisien », in A. Grundstrom & P. Léon (Eds.) Interrogation et intonation en Français standard et Français Canadien (Studia Phonetica, 8), Montreal: Didier.
Fonagy, I. (1983): La vive voix, Bibliothèque scientifique Payot.
Morton, E. W. (1977): "On the occurrence and significance of motivation-structural rules in some birds and mammal sounds", Am. Nat., 111: 855-869.
Ohala, J. J. (1984): "An ethological perspective on common cross-language utilization of F0 of voice", Phonetica, 41: 1-16.
Piot, O. (1999): « Une approche morphogénétique des ‘clichés mélodiques’ du français standard », Faits de Langue, 13: 26-34.
Scherer, K. R. (1986): "Vocal affect expression: a review and a model for future research", Psychol. Bulletin, 99: 141-165.
Titze, I. R. (1993): Principles of voice production, Prentice Hall.