Language, Cognition and Neuroscience ISSN: 2327-3798 (Print) 2327-3801 (Online) Journal homepage: http://www.tandfonline.com/loi/plcp21 Prosody and information structure in a tone language: an investigation of Mandarin Chinese Iris Chuoying Ouyang & Elsi Kaiser To cite this article: Iris Chuoying Ouyang & Elsi Kaiser (2015) Prosody and information structure in a tone language: an investigation of Mandarin Chinese, Language, Cognition and Neuroscience, 30:1-2, 57-72, DOI: 10.1080/01690965.2013.805795 To link to this article: http://dx.doi.org/10.1080/01690965.2013.805795 Published online: 14 Jun 2013. Submit your article to this journal Article views: 403 View related articles View Crossmark data Citing articles: 1 View citing articles Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=plcp21 Download by: [USC University of Southern California] Date: 29 May 2016, At: 13:44 Language, Cognition and Neuroscience, 2015 Vol. 30, Nos. 12, 5772, http://dx.doi.org/10.1080/01690965.2013.805795 Prosody and information structure in a tone language: an investigation of Mandarin Chinese Iris Chuoying Ouyang* and Elsi Kaiser Department of Linguistics, University of Southern California, Los Angeles, CA 90089, USA Downloaded by [USC University of Southern California] at 13:44 29 May 2016 (Received 2 April 2012; final version received 8 May 2013) Prosody conveys discourse-level information, but the extent to which prosodic cues distinguish different kinds of information-structural concepts remains unclear. The prosodic encoding of information structure is even more complicated in tone languages, where acoustic cues such as F0, intensity and duration also distinguish lexical items (e.g. Mandarin). Prior work on Mandarin led to divergent findings regarding whether and what prosodic cues mark the distinctions between information-structural types. We conducted a production study on Mandarin to investigate whether (1) the presence/absence of corrective focus and (2) the distinction between new/given information are encoded prosodically. Our results show that correctiveness was reflected in all three acoustic parameters: corrective words had longer durations, larger F0 ranges and larger intensity ranges than non-corrective words. The new-given distinction was reflected only in lengthening and F0 range expansion, and only in the absence of correction (correctiveness-by-givenness interaction). This suggests that new information is encoded differently from corrective focus in Mandarin: only corrective focus is associated with intensity range expansion. Our results provide further evidence for the multi-functionality of acoustic-prosodic dimensions. Even in a language with lexical tones, which differ in F0, intensity and duration, all these dimensions also encode information structure. Furthermore, not only can prosodic cues indicate discourse importance, they also distinguish different types of information structure in Mandarin. Our findings highlight the fine-grained ability of the language production system to utilise different aspects of acoustic dimensions with great efficiency. Keywords: corrective focus; new-information focus; prosodic encoding; intensity range expansion Introduction Across languages, it is widely accepted that prosodic structure can convey discourse-level information (e.g. Gussenhoven, 1983; Ladd, 1980; Watson, 2010). Acoustically, there are three dimensions that are commonly regarded as providing cues about information structure: duration, fundamental frequency (F0) and intensity. In English, signals of prosodic prominence such as longer duration, changes in F0 movement and greater intensity appear on elements that are semantically or pragmatically prominent (e.g. Breen, Fedorenko, Wagner, & Gibson, 2010; Cooper, Eady, & Mueller, 1985). For instance, consider the sentence ‘he bought mangos’ in the following contexts: (1) a. What did Peter buy at the market? b. He bought mangos (narrow new-information focus). (2) a. What did Peter buy at the market? Did he buy mangos? b. Yes, he bought mangos (given information). (3) a. What fruits did you and Peter buy at the market? *Corresponding author. Email: [email protected] # 2013 Taylor & Francis b. I bought apples and oranges; he bought mangos (contrastive information). In response to (1a), mangos in (1b) is new information that cannot be inferred from the preceding discourse context. In contrast, the same word mangos in (2b) in response to (2a) is given information because it has been mentioned in preceding discourse. When responding to question (3a), mangos in (3b) is contrastive information: it belongs to the same set (i.e. fruits sold in the market) as some other information provided in the previous utterances (i.e. apples and oranges) and hence creates a contrast. Prior work on English indicates that prosody can distinguish between different information-structural properties: Katz and Selkirk (2011) recently show that contrastive focus has stronger effects than new information on words’ duration, F0 movement and intensity. While most researchers agree that contrastive focus and new-information focus are associated with increased duration and intensity, the relationship between F0 contours and different focus types is less well understood. It has traditionally been argued that Downloaded by [USC University of Southern California] at 13:44 29 May 2016 58 I.C. Ouyang and E. Kaiser different types of discourse information occur with distinct pitch accents. Pierrehumbert and Hirschberg (1990) conclude that the stressed syllables of words in new-information focus receive an H* pitch accent, whereas the stressed syllables of words in contrastive focus receive an LH* pitch accent. However, recent evidence suggests that the mapping between informationstructural types and pitch accents is not a straightforward one-to-one relation. Watson, Tanenhaus, and Gunlogson’s (2008b) visual-world eye-tracking study reveals that listeners look towards contrastive referents when they hear an LH* pitch accent, whereas hearing an H* accent leads listeners to consider both new and contrastive referents. While these findings confirm that the patterns of F0 movement provide cues for new-information focus and contrastive focus in English, Watson et al.’s results also show that newinformation focus and contrastive focus do not map straightforwardly onto different pitch accents. Information structure and prosody in Mandarin Chinese The question of how prosodic cues map on to different information-structural types becomes even more complex when we consider tone languages, where duration, F0 and intensity also distinguish between lexical items (e.g. African languages: Zerbian, Genzel, & Kugler, 2010; Cantonese: Bauer, Cheung, Cheung, & Ng, 2004; Vietnamese: Jannedy, 2007; cross-linguistic discussion: Hartmann, 2007). In Mandarin Chinese, for example, four pitch patterns commonly referred to as ‘tones’ function as phonemes: high (Tone 1), rising (Tone 2), low (Tone 3) and falling (Tone 4). They can alter lexical meaning, as shown in (4). In addition to the four-way distinction based on F0 movement, lexical tones in Mandarin also differ in amplitude and length. Tone 2, Tone 3 and Tone 4 are perceptible solely on the basis of their amplitude contours (Whalen & Xu, 1992), and Tone 3 is 1.5 times longer than the other tones when produced in isolation (Xu, 1997). (4) Tone Tone Tone Tone 1 2 3 4 ma ma ma ma [High] ‘mother’ [Rising] ‘hemp’ [Low] ‘horse’ [Falling] ‘scold’ Given that intensity, duration and F0 are used to distinguish lexical items in Mandarin, we are faced with the question of whether these acoustic dimensions also function as signals to information structure, and if so, how they accomplish this dual role. As will become clear in the rest of this section, existing research on different information-structural categories in Mandarin has led to divergent results regarding the specifics of which prosodic cues distinguish different informationstructural categories. Regarding F0, most researchers agree that in Mandarin, information structure is conveyed not by the shapes of F0 contours (as is the case in English), but rather by their ranges (e.g. Chen & Braun, 2006; Jin, 1996). For example, Chen and Gussenhoven (2008) found that the F0 shapes specified by different lexical tones are still distinct from one another within an information-structural type. This makes sense given that the shapes of F0 contours are the major cue for lexical tones in Mandarin, and thus need to be maintained to ensure successful spoken word recognition. However, if it is the ranges and not the shapes of the F0 contours that mark information structure in Mandarin, this means that the approaches based on data from English (e.g. Pierrehumbert & Hirschberg’s view that LH* and H* map onto different informationstructural categories) cannot be applied directly to Mandarin, raising fundamental questions for our understanding of how human languages represent the interface between prosody and information structure. Let us now review existing work on how prosody encodes information structure in Mandarin, starting with claims regarding new-information focus, as mangos in (1b), repeated here as (5b). Researchers have compared words that are new information in narrow focus to those in broad focus, exemplified in (6). Here, the entire response in (6b) can be construed as new information. (5) a. What did Peter buy at the market? b. He bought mangos (narrow new-information focus). (6) a. What happened? b. He bought mangos (broad new-information focus). Prior research has found that words that are new information in narrow focus have longer duration (Jin, 1996), larger F0 ranges (Jin, 1996; Xu, 1999) and higher mean F0 (Chen, Wang, & Xu, 2009), when compared to words that are new information in broad focus. The results for intensity in this domain are less clear: Chen et al. (2009) claim that new information in narrow focus has a higher mean intensity than new information in broad focus, but this is not found by Jin (1996). It is important to note that in general, these studies compared narrow new-information focus (5b) and broad new-information focus (6b), rather than new-information focus and non-focus (i.e. given information). Consequently, their results shed light on the differences between narrow new-information focus and broad new-information focus, but do not allow us to draw conclusions regarding the nature of the Downloaded by [USC University of Southern California] at 13:44 29 May 2016 Language, Cognition and Neuroscience prosodic differences between focused material and nonfocused (given) material. The only existing experimental work in Mandarin that we have seen directly comparing new and given information (termed ‘rheme’ and ‘theme’ in their study) is Chen and Braun (2006). They found that new information has longer duration and larger F0 ranges than given information. However, their data were elicited through a reading task where the participants read questionanswer pairs; to the best of our knowledge, there is no prior work investigating this issue with a more natural design. Turning now to contrastive focus, we first need to point out that most prior work has focused on corrective focus, as exemplified in (7): (7) a. What fruit did Peter buy in the market? Did he buy apples? b. No, he bought mangos. Here, mangos in (7b) is corrective information that is intended to replace the wrong information in a previous utterance (i.e. apples). Correction is considered to be a subtype of contrastive focus, since it provides an alternative to some information in the context (e.g. Dik, 1997). We will refer to this subtype of contrastive focus as ‘corrective focus’ henceforth. A number of studies on corrective focus realisation in Mandarin compared correctively-focused words to non-focused (given) words embedded in an utterance that contains some kind of focus, e.g. new-information focus as in (8), such as Chen (2006), or corrective focus as in (9), such as Chen and Gussenhoven (2008). Thus, these studies compared corrective ‘mangos’ in a context like (7b) to unfocused/presupposed ‘mangos’ in contexts like (8b) or (9b). (8) a. What did Peter do to the mangos? b. He bought mangos (new-information focus on ‘bought’). (9) a. Did John buy mangos? b. Peter bought mangos (corrective focus on ‘Peter’). As a whole, these studies found that correctively-focused words have longer durations (Chen, 2006; Chen & Gussenhoven, 2008) and larger F0 ranges (Chen & Gussenhoven, 2008) than non-focused words. This result fits well with data from other languages as well as the general intuition that corrective words tend to be more prominent (in various ways) than words that refer to already mentioned or presupposed information. However, there seems to be no prior work investigating the intensity of correctively-focused words in Mandarin. Some existing Mandarin studies have also compared new-information focus and corrective focus on 59 the other hand, but with conflicting results. Greif (2010)1 found that correctively-focused words had longer durations than words in new-information focus, but did not differ reliably in terms of their F0 ranges. In contrast, when Chen and Braun (2006)2 compared corrective focus to new-information focus, they found differences in F0 ranges but no differences in duration. Neither Greif (2010) nor Chen and Braun (2006) looked for differences in intensity between correctively-focused words and new-information words. In sum, while existing findings for Mandarin generally agree that (1) narrow new-information focus involves increases in F0 displacement and duration when compared to broad new-information focus and that (2) corrective focus similarly involves increases in F0 displacement and duration when compared to unfocused words, there is as of yet no consensus about how the two focus types (new-information focus and corrective focus) differ from each other, nor about how constituents in new-information focus differs from non-focused constituents. In light of the outcomes of Greif (2010) and Chen and Braun (2006), one might conclude that perhaps new-information focus and corrective focus do not differ reliably in their acoustic encoding. However, because the details of the target sentences, the designs and the information-structural manipulations in these two studies differ from each other (because their general aims were different), we should be careful in comparing them directly. In essence, then, existing work has not led to a conclusion about whether and how corrective focus and new-information focus differ from each other. It is quite striking that the prosodic properties of two major types of information structure that have received the most attention in research on English (e.g. Calhoun, 2006; Katz & Selkirk, 2011; Pierrehumbert & Hirschberg, 1990; Watson et al., 2008b) new information and contrastive focus are not yet well understood in Mandarin. An understanding of this question in a cross-linguistic context is important for theories of information structure. This question relates to fundamental issues about whether new-information focus and corrective focus are semantically distinct categories or variants of the same category, and whether their prosodic realisations are categorically different or variants on a continuum (see Watson et al., 2008b for related discussion). Aims of this study, predictions As we saw in the preceding section, it remains unclear to what extent prosodic cues differentiate one type of discourse information from another in Mandarin. To Downloaded by [USC University of Southern California] at 13:44 29 May 2016 60 I.C. Ouyang and E. Kaiser shed light on this issue, and more generally on the question of how information structure is realised prosodically in Mandarin, a tone language where all three prosodic dimensions duration, F0 and intensity already serve lexical purposes, we conducted a psycholinguistic production study that investigates three main questions. First, do words in new-information focus vs. corrective focus differ from each other in terms of their prosodic realisation in Mandarin, and if so, which parameters (e.g. duration, mean F0, F0 range or intensity) encode these differences? Second, we also investigated whether and how the distinction between ‘new’ and ‘given’ influences the acoustic realisation of both corrective and non-corrective words. As we saw in the preceding section, prior work does not yield a clear picture. Broadly speaking, we expected new information to be prosodically more prominent than given information in Mandarin, based on existing work in other languages (Brown, 1983; Fowler & Housum, 1987) as well as Chen and Braun (2006)’s findings regarding increased duration and larger F0 ranges. However, the question of whether and how corrective/contrastive focus differs from new-information focus is more open. As we saw above, existing work on Mandarin has not reached a consensus (Greif, 2010 vs. Chen & Braun, 2006). A comprehension experiment on English by Watson et al. (2008b) found that F0 shapes in English do not map neatly onto contrastive vs. new-information focus, and a production study by Katz and Selkirk (2011) further suggests that contrastive focus and new-information focus differ in prosodic prominence. To contribute to our understanding of this phenomenon in a cross-linguistic setting, we wanted to see how and whether the two focus types differ in Mandarin, a language where discourse-driven prosodic cuing is constrained by the existence of lexical tone. The third key aim of our study is a more inclusive analysis of the acoustic parameters relevant to information structure, to ensure that potentially crucial distinctions are not inadvertently overlooked. This aim has two sub-parts: First, we analysed not only duration and F0 but also intensity. Prior studies on Mandarin mostly focused only on duration and F0. The two studies that did look at intensity (Chen et al., 2009; Jin, 1996) analysed mean intensity but did not look at potential effects of information structure on intensity ranges (i.e. difference between maximum and minimum intensity) and even in the domain of mean intensity, their results do not agree with each other. Given that intensity contours, as well as F0 contours, are associated with lexical tones in Mandarin, we expected that intensity ranges could reflect discourse-level information just like F0 ranges do. Thus, we investigated how and whether all three prosodic dimensions encode information structure. The second sub-part of this third aim has to do with how F0 range and intensity expansion is accomplished. Theoretically, there are three possible ways of expanding the range of an excursion: (1) raising the maximum and lowering the minimum, (2) only raising the maximum and (3) only lowering the minimum, as illustrated schematically in Figure 1. If one finds, say, F0 range expansion for both new-information focus and corrective focus, then in order to assess whether the phenomena are really the same or not, we need to investigate the underlying source of the range expansion, i.e. which of the three ‘strategies’ is being used. Failing to do this could result in incorrectly grouping together two phenomena that are underlyingly different. Thus, we conducted detailed analyses not only of F0 and intensity ranges but also maxima, minima and means. Production study: method To investigate the prosodic encoding of information structure in a tone language, we conducted a production study on Mandarin. Participants produced instructions based on pictures and arrows shown on a computer screen. They were told to imagine that another person in another room would be listening to their instructions and moving the objects on the screen accordingly (though this person might sometimes make mistakes), and that the movements made by the listener would be visible on the participants’ computer screen. Using pictures allowed us to avoid presenting participants with written sentences which can result in unnatural ‘reading’ intonation. In the following subsections, we will first discuss the experimental design and the stimuli, and then go over the procedure. In this study, we focus on two major distinctions between information-structural types: (1) Do new and corrective elements differ from each other in terms of their prosodic realisation in Mandarin? (2) Are new and given elements marked differently in prosody? Although these kinds of issues have been investigated (a) (b) Figure 1. Possible ways of expanding ranges. (c) Language, Cognition and Neuroscience Downloaded by [USC University of Southern California] at 13:44 29 May 2016 in prior work, the existing results do not yield a clear picture. Design and stimuli Participants saw coloured pictures on the computer screen. Objects were presented in circles, each with its name shown below. There were six pictures on each screen. Arrows were used to indicate the commands participants should produce. For example, in Figure 2a, the arrow points from the cigarette to the lounge chair, so participants should say: ‘Move the cigarette next to the lounge chair’. After they produced the instruction, participants saw a moving event on the computer screen that responded to the instruction either correctly or incorrectly. For example, in Figure 2b, the cigarette is moved next to the lounge chair, which is a correct response. To examine discourse-level intonation across lexical tones, we manipulated the information structure of the target words and controlled their tonal combinations. Specifically, a repeated-measures within-subjects design with two independent variables were used: (1) correctiveness (with two levels: presence or absence of correction) and (2) givenness (with two levels: new or given information). Target words were bisyllabic, with one of the three tonal combinations: HighHigh (HH), HighLow (HL) or LowHigh (LH). A third of the target words were HH, a third were HL and a third were LH. All sentences were produced in the frame illustrated in (10).3 For instance, the sentence ‘ba xiangyan (cigarette) fangdao/fangzai tangyi (lounge chair) pangbian’ would be produced for the display in Figure 2. (10) ba OBJECT fang-dao/-zai LOCATION pangbian BA OBJECT put-PREP LOCATION side ‘Move the OBJECT next to the LOCATION’ Figure 2. 61 A target word always appeared in the OBJECT role in a sentence. Table 1 shows the summary of the four conditions. Target sentences were the last two sentences in a trial, i.e. the sentences in bold in Table 1. Next, let us consider the four conditions in more detail. Broadly speaking, there were two types of target trials: new-information trials and given-information trials. The new-information trials were composed of three spoken instructions (sentence (a), (b) and (c) in Table 1). First, a participant saw an image with an arrow and produced the corresponding sentence, e.g. Move the cigarette next to the lounge chair (sentence (a)). The object moved correctly in the display, and after that, another arrow appeared. To convey the information represented by the second arrow, the participant produced another sentence, e.g. Move the juice next to the crow (sentence (b)). After the second sentence was uttered, an incorrect object moved to the location. For example, instead of the juice, the pacifier moved next to the crow. To correct the moving event, the participant repeated the instruction, e.g. Move the JUICE next to the crow (sentence (c)). This time, the correct object moved on the screen. The corrective sentence was the last sentence in a trial. Note that on these new-information trials, the target word (juice in this case) had not been mentioned until sentence (b) was uttered for the first time, i.e. neither of the two nouns mentioned in sentence (a) (cigarette and lounge chair in this case) was the target word in sentence (b). Thus, ‘juice’ was new information when it was mentioned in sentence (b), which we refer to as the non-corrective new information condition. Later, when the participant repeated the instruction in order to correct the incorrect moving event, thus producing sentence (c), we refer to this as the corrective new information condition since the target word (juice) here was uttered in a corrective context. As a result, the distinction between new and given in this study is defined from the hearer’s perspective. From the Sample display. 62 Table 1. I.C. Ouyang and E. Kaiser Structure of target trials. Trial type Given information 1st sentence 1st visual event (a) Move A next to B (Correct moving) (d) Move A next to TARGET (Correct moving) 2nd sentence [Non-corrective new] (b) Move TARGET next to C (Wrong object is moved next to C) [Non-corrective given] (e) Move TARGET next to C (Wrong object is moved next to C) [Corrective new] (c) Move TARGET next to C (Correct moving) [Corrective given] (f) Move TARGET next to C (Correct moving) 2nd visual event 3rd sentence 3rd visual event Downloaded by [USC University of Southern California] at 13:44 29 May 2016 New information perspective of the speaker, by the time they got to sentence (c), ‘juice’ was already given information because the speaker had already uttered it in sentence (b). However, from the perspective of the hearer, ‘juice’ was presumably still new information at the point where sentence (c) was uttered, since the hearer apparently did not hear sentence (b) correctly and moved an incorrect object instead of the ‘juice’. Thus, the fact that a wrong object was moved implies that the hearer did not pay attention to sentence (b) and misheard the object to be moved i.e. it was still new information to the hearer in sentence (c) (we discuss this more below). Having considered the new-information trials, let us now turn to the given-information trials. The giveninformation trials had the same structure as the newinformation trials, except for the information-structural properties of the target words. Specifically, in the given trials, the target word had already been mentioned in the LOCATION role of the first sentence (sentence (d) in Table 1). In other words, it had already been involved in an earlier moving event, as shown in Table 1. Thus, in the given-information trials, the second spoken instruction (sentence (e) in Table 1) is in the noncorrective given information condition, and the third sentence (sentence (f) in Table 1) is in the corrective given information condition. In our design, we used the distinction between the speaker’s perspective and the hearer’s perspective to differentiate the corrective given and corrective new conditions. Importantly, there is a considerable body of work showing that speakers’ prosodic realisations are indeed sensitive to the hearer’s perspective and attentional state. For example, in instruction-giving tasks where participants gave instructions to confederates about where to place objects (e.g. ‘the teapot goes on red’), longer pronunciations were used for the determiner when the addressees were not anticipating them (Arnold, Kahn, & Pancani, 2012) and for the object to be moved when the addressees were multitasking (Rosa, Finch, Bergeson, & Arnold, 2013). In addition, Ito and Speer (2006) showed that if the addressee makes a mistake, the correction by the speaker is produced with a contrastive accent. Other studies using interactive designs also found that listener-oriented processes affect prosodic prominence in the speaker’s output in various ways (e.g. Lam & Watson, 2010; Watson, Arnold, & Tanenhaus, 2008a). In addition, corpus work on word order patterns and non-canonical constructions (in English and other languages) shows that people are very sensitive to whether or not something has been mentioned in preceding discourse/successfully introduced into the discourse model (e.g. Birner & Ward, 1998, see also Prince, 1992). Based on these findings, it seems reasonable to assume that speakers understand that in our corrective new condition, the critical target noun is new to the hearer, whereas in the corrective given condition, the critical target noun has already been introduced to the discourse (and the listener’s mental model of the discourse) by virtue of being successfully moved after the first sentence (sentence (d)). Definition of new/given. It is important to note that in our design, the distinction between ‘given’ and ‘new’ information is drawn in terms of discourse-status, i.e. whether the noun has already been mentioned in the preceding discourse or not (e.g. Birner & Ward, 1998; Kaiser & Trueswell, 2004; Prince, 1992). Thus, any entity that has already been mentioned in the prior discourse, regardless of grammatical position, is by definition discourse-old/given information. In particular, the target word in the given conditions first occurs in the LOCATION role and then in the OBJECT role. Prior work has identified connections between prosodic cues and discourse-status (e.g. Féry, Kaiser, Hörnig, Weskott, & Kliegl, 2009 on F0 contours in German) as well as word order patterns and discourse-status (e.g. Birner & Ward, 1998 on English). This definition of old vs. new differs from Terken and Hirschberg (1994) and Schwarzschild (1999), who note that a change in syntactic role (and/or surface position) can render something ‘new’ for purposes of accent assignment. According to their view, the target nouns in our given conditions should in fact count as ‘new’ for purposes of Downloaded by [USC University of Southern California] at 13:44 29 May 2016 Language, Cognition and Neuroscience prosodic prominence because their syntactic role has changed. However, an eye-tracking study by Dahan, Tanenhaus, and Chambers (2002) shows that when listeners hear temporarily ambiguous discourse-old nouns realised in a new syntactic position with prosodic prominence, they do not interpret these prosodic cues as potentially referring to discoursenew referents. In other words, Dahan et al.’s results suggest that discourse-old entities mentioned again in a different syntactic position do not pattern in the same way that discourse-new nouns do, seemingly in contrast to the claims of Terken and Hirschberg (1994). Pre-empting our findings somewhat, let us note already that our results indicate that Mandarin speakers are indeed sensitive to the given vs. new distinction defined in terms of discourse-information status and independently of grammatical role or surface position. Although the given information manipulation in our experiment may not represent the ‘highest degree’ of givenness (due to the change in grammatical role), the key observation (discussed more in the results section) is that the difference between our new and given information conditions is reflected in participants’ prosody. Each condition contained nine items, three in each of the three tone combinations. (See Appendix for the list of target words.) There were 18 target trials, each including a non-corrective target sentence and a corrective target sentence. The dependent variables that we measured were the duration, F0 range and intensity range of the target word region in a target sentence. In the target trials, either three or four of the objects were mentioned in a particular trial. The extra two to three pictures in a display were presented to ensure that participants could not predict which picture was going to be involved in the next instruction. The experiment also included 36 filler trials, which differed from the target trials along one or more of the following parameters: the number of sentences in a trial, whether and in which sentence a wrong moving event occurs, on which noun the correction needs to be made in a corrective sentence, and the lexical tones of the nouns. Among all the trials, any two of the nouns cooccurred less than four times, and any three of the nouns co-occurred less than once in a trial. The tonal combinations used in target words (HH, HL and LH) appeared 56 times each, and the tonal combination which was only used in filler words (LL) appeared 48 times. All the words (i.e. names of the pictures) were concrete nouns that denoted movable objects and had a frequency no higher than 15.23 counts per million according to Cai and Brysbaert (2010). We controlled for various factors including the number of syllables, the combination of tones, semantic properties and word frequency. This placed severe constraints on the choice of words, and thus voiceless and sibilant 63 consonants in the words could not be entirely avoided. The positions and directions of arrows on the displays were counterbalanced across trials. Procedure and participants Participants were told to give instructions to move objects based on the pictures and arrows on the computer screen, to check whether their instructions were carried out correctly, and to provide a correction if their instructions were not followed. They were asked to only use the sentence frame in (10) during the entire experiment, and to speak as naturally as possible. Participants were told to imagine that they were speaking to a person in another room, in front of another computer connected to the participants’ computer, and that the listener, who might sometimes get distracted and make mistakes, would move the objects according to the participants’ instructions. This was done in order to make the task as natural as possible; our assumption was that people would be most likely to mark information-structural cues in their prosody in a communicative situation. Ten adult native speakers of Mandarin, five women and five men were participated. All were either born in Beijing or had lived in Beijing since age 13 or younger. All of them were students or visiting scholars at University of Southern California who left Beijing no longer than 2 years before. The participants received $10 for their participation. Data analysis Acoustic analyses were done using the Praat software with the ProsodyPro script (Xu, 20052011). Duration, F0 and intensity were extracted by the script. Repeated measure ANOVAs and paired t-tests were conducted on the duration, F0 ranges (maximum F0 minus minimum F0) and intensity ranges (maximum intensity minus minimum intensity) of target words.4 All ANOVAs presented in this paper had correctiveness (correction or non-correction) and givenness (given or new) as independent variables. In the by-subject analyses, the tonal combination of each target word (HH, HL or LH) was also included as a control variable.5 To make sure that our data or conclusions are not distorted by the occurrence of creaky voice, we manually removed the markings for aperiodic pulses (resulting from irregular vibration of vocal folds) before the F0 values were computed. This is because pitch tracking for aperiodic waveforms is often inaccurate, which could potentially impact the analysis of F0, as low tones prevalently bring about creakiness in tone languages (e.g. Yoruba: Welmers, 1973, p. 109; Downloaded by [USC University of Southern California] at 13:44 29 May 2016 64 I.C. Ouyang and E. Kaiser appear in the corrective conditions. The observations are confirmed by statistical analyses: ANOVAs show a main effect of correctiveness (F(1,9) 20.020, pB0.01), with corrective conditions showing significantly longer duration than non-corrective conditions, and no main effect of givenness (F(1,9) 2.189, p 0.173). There is a significant interaction between correctiveness and givenness (F(1,9) 6.260, p B0.05). More specifically, planned comparisons reveal that the correctiveness effect on duration occurs in both the new information conditions (corrective new has longer duration than non-corrective new: t(9) 4.177, p B0.01) and the given information conditions (corrective given has longer duration than non-corrective given: t(9) 4.641, pB0.01), but the givenness effect on duration emerges only when the words are non-corrective (noncorrective new has longer duration than non-corrective given: t(9) 3.333, pB0.01) and not when the words are corrective (corrective new does not differ from corrective given: t(9)0.331, p 0.748). In sum, while non-corrective words show an effect of givenness, no effect of givenness is detected on corrective words. Cantonese: Vance, 1977; Mandarin: Belotel-Grenié & Grenié, 1994). To minimise this problem, all of our F0 analyses focus on the non-creaky portions. Since the minimum F0 in the HL and LH tonal combinations appears during the low tone component, our data might not accurately reflect the actual F0 ranges. In a HL or LH word that contained a creaky region, the actual minimum F0 could be lower than what was measured, and if so, the F0 range calculated by deducting minimum F0 from maximum F0 would be smaller, and the effects of information structure on F0 ranges would be consequently underestimated. Thus, creaky voice could potentially obscure effects of information structure on F0 ranges because it may cause the ranges to be underestimated. Nevertheless, as will become clear in the rest of the paper, our results clearly indicate that F0 ranges do play a significant role signalling information structure (see the following subsections). Thus, we do not regard the occurrence of creaky voice as a problem for our findings, as it did not obscure the patterns of F0 ranges in this study. Main results: duration, F0 ranges and intensity ranges In this section, we first present the results for duration of target words. Then we turn to F0 and intensity measures, which we analysed in three different ways: (1) the range of F0 and intensity measures on the target words (difference between maximum and minimum on a given word), (2) the maximum and minimum measures for F0 and intensity and (3) the mean values for F0 and intensity on the target word. F0 ranges Having considered duration, let us now turn to the findings for F0 ranges. Overall, F0 ranges show a similar pattern as duration, as can be seen in Figure 4. Mirroring the results of duration, words in the corrective conditions have larger F0 ranges than words in the non-corrective conditions; given information has smaller F0 ranges than new information in the noncorrective conditions, but this given/new distinction is not present in the corrective conditions. These observations are again confirmed statistically: ANOVAs show a main effect of correctiveness (F(1,9) 22.232, pB 0.01) but no main effect of givenness (F(1,9) 0.749, p0.409). There is a significant interaction between correctiveness and givenness (F(1,9) 5.892, pB0.05). Planned comparisons reveal that the correctiveness effect on F0 ranges occurs in both the new information conditions (corrective new has a larger range than Duration Overall, as can be seen in Figure 3, words in the corrective conditions (the two bars on the left) are longer than words in the non-corrective conditions (the two tall bars on the right). Within the non-corrective conditions, words that are given information have shorter durations than words that are new information, but this distinction between given and new does not Duration (ms) 500 450 400 350 300 Corrective New Figure 3. Corrective Given NonCorrective NonCorrective New Given Average duration of the target words in each condition (error bars show91 SE). Language, Cognition and Neuroscience 65 F0 Ranges (HZ) 80 70 60 50 40 Corrective New Downloaded by [USC University of Southern California] at 13:44 29 May 2016 Figure 4. Corrective Given NonCorrective NonCorrective New Given Average F0 ranges of the target words in each condition (error bars show91 SE). non-corrective new: t(9) 3.536, p B0.01) and the given information conditions (corrective given has a larger range than non-corrective given: t(9) 5.059, p B0.01). However, the givenness effect on F0 ranges emerges only when the words are non-corrective (noncorrective new has a bigger range than non-corrective given: t(9)3.348, pB0.01) and not when they are corrective (non-corrective new does not differ from non-corrective given: t(9) 0.669, p 0.521). Intensity ranges Finally, let us move on to the findings for our third parameter, intensity ranges, shown in Figure 5. Interestingly, the intensity range patterns differ from the patterns observed for F0 ranges and duration. On one hand, the distinction between corrective and noncorrective conditions remains: words in the corrective conditions have larger intensity ranges than words in the non-corrective conditions. However, new information does not differ from given information on intensity ranges, in either the corrective or non-corrective conditions. ANOVAs show a main effect of correctiveness (F(1,9) 9.659, pB0.05). There is no main effect of givenness (F(1,9) 0.130, p 0.727) and no interaction between correctiveness and givenness (F(1,9) 0.563, p 0.472). Further analyses of F0 and intensity ranges: F0 and intensity maxima and minima In contrast to prior work where the distinction between correction and new information was only found in one prosodic dimension (duration in Greif, 2010; F0 ranges in Chen & Braun, 2006), the results presented in the preceding sections show that corrective focus and newinformation focus differ in all three prosodic dimension. However, in order to better understand the underlying nature of the F0 and intensity ranges, further analyses are needed. As mentioned in the ‘aims’ section, F0 ranges or intensity ranges are not a single parameter in and of themselves. To understand how F0 and intensity ranges are altered acoustically, we need to examine their ‘components’ the maxima and minima of F0 and intensity. Inspecting the maxima and minima allows us to see how range expansion is accomplished, e.g. (1) by raising the maximum, (2) lowering the minimum or (3) both? (Figure 1). We aim to answer two questions: First, are F0 range expansion and intensity range expansion achieved in the same way? Second, do new-information focus and contrastive focus involve different ‘strategies’ of range expansion for either F0 or intensity? In this section, we present the results of maximum and minimum F0, and maximum and minimum intensity. Intensity Ranges (dB) 23 21 19 17 15 Corrective New Figure 5. Corrective Given NonCorrective NonCorrective New Given Average intensity ranges of the target words in each condition (error bars show91 SE). 66 I.C. Ouyang and E. Kaiser Max & Min F0 (Hz) 230 Max Min 210 190 170 150 130 Corrective New Downloaded by [USC University of Southern California] at 13:44 29 May 2016 Figure 6. Corrective Given NonCorrective NonCorrective New Given Average maximum and minimum F0 of the target words in each condition (error bars show91 SE). Maximum and minimum F0 Breaking F0 ranges down into maximum and minimum F0, we see that different types of information structure enlarge F0 ranges through different means. As shown in Figure 6, words in the corrective conditions have overall higher maximum F0 and lower minimum F0 than words in the non-corrective conditions in other words, the increased F0 range that we observe for correction is accomplished by both raising the maximum and lowering the minimum. Taking a closer look at the conditions, we see that the corrective given and corrective new conditions do not differ from each other in terms of their F0 maxima or minima (jt(9)js B 0.643, p’s0.536). However, when we look at the noncorrective conditions, we see that non-corrective given and non-corrective new differ in terms of their maximum F0 (higher for new information) but not in terms of their minimum F0. Thus, the F0 range expansion that we reported above for words that are new information in non-corrective contexts is accomplished by raising the maximum F0 without changing the minimum F0. The statistical analyses confirm these observations to a large extent. ANOVAs show a main effect of correctiveness on maximum F0 (F(1,9) 27.844, pB 0.01) and minimum F0 (F(1,9) 7.415, pB0.05), with maximum F0 being significantly higher and minimum F0 significantly lower in the corrective words than the non-corrective words. Somewhat unexpectedly, there is no significant interaction between correctiveness and givenness in either maximum F0 (F(1,9) 2.994, p 0.118) or minimum F0 (F(1,9) 0.802, p0.394). Nevertheless, given and new information in the noncorrective conditions differ in maximum F0 by 5 Hz, which is in magnitude the same as the statistically significant difference in minimum F0 between corrective and non-corrective words. Maximum and minimum intensity Turning now to the question of intensity ranges, we find a very different pattern: in contrast to F0, maximum intensity stays the same among different types of information structure, as indicated in Figure 7. The presence and absence of correction is reflected only in minimum intensity: minimum intensity is lower in corrective words than non-corrective words. ANOVAs show a main effect of correctiveness on minimum intensity (F(1,9) 9.013, pB0.05) but not on maximum intensity (F(1,9) 0.059, p0.813). Consistent with the patterns of intensity ranges, there is no interaction between correctiveness and givenness in Max & Min Intensity (dB) Max 80 Min 75 70 65 60 55 50 Figure 7. Corrective New Corrective Given NonCorrective NonCorrective New Given Average maximum and minimum intensity of the target words in each condition (error bars show91 SE). Language, Cognition and Neuroscience Downloaded by [USC University of Southern California] at 13:44 29 May 2016 either minimum intensity (F(1,9) 1.119, p0.318) or maximum intensity (F(1,9) 0.802, p 0.394). In sum, we find that the intensity range expansion for F0 and for intensity is accomplished in different ways, with intensity range expansion accomplished simply by lowering the minimum intensity and F0 range expansion showing a more complex pattern: the F0 range expansion observed for corrective focus is accomplished by both raising the maximum and lowering the minimum, whereas the F0 range expansion observed for non-corrective new-information focus is accomplished by raising the maximum F0 without changing the minimum F0. Mean F0 and mean intensity Having inspected the ranges of F0 and intensity and their maxima and minima, we now look at the means of F0 and intensity. In the preceding section, we saw that only in some contexts did range expansion involve both raising maxima and lowering minima (i.e. F0 range expansion for corrective focus). In other contexts where the ranges were expanded by only raising the maxima (i.e. F0 range expansion for new-information focus) or only lowering minima (i.e. intensity range expansion for new-information focus), means could potentially also reflect information-structure. Somewhat surprisingly, mean F0 and mean intensity do not robustly differ between information-structural types. Despite a main effect of correctiveness on mean F0 (F(1,9) 12.244, pB0.01), the difference in mean F0 is significant only between the given information conditions (i.e. corrective given is significantly higher than non-corrective given: t(9)4.307, p B0.01) but marginal between the new information conditions (i.e. corrective new is marginally higher than non-corrective new: t(9)1.858, p 0.096). There is no main effect of givenness on mean F0 (F(1,9) 0.616, p0.453). Also, neither correctiveness (F(1,9) 0.110, p 0.748) nor givenness (F(1,9) 0.486, p 0.503) has a main effect on mean intensity. In sum, we find that unlike ranges, maxima and minima means of F0 and intensity do not provide reliable cues about information status in Mandarin. This finding highlights the importance of a comprehensive analysis of prosodic features; in our case, the major cues for information-structural distinctions would have been overlooked if one had only analysed the means of F0 and intensity. Results regarding lexical tone combinations As mentioned in the design section, we used bisyllabic target words in one of three lexical tonal combinations: 67 HH, HL or LH. These tonal combinations were equally distributed among the target words and the different conditions, to ensure that our conclusions would not be restricted to a single tonal combination type. Because we controlled for factors such as word frequency and semantic properties, the identity of the segments in target words was not controlled across tonal combinations. This means that prosodic differences between tonal combinations may come from segmental variance rather than tonal properties. Thus, any comparison between tonal combinations must be viewed cautiously. The analyses in this section are included for the sake of completeness, but the reader should keep in mind that these are post-hoc analyses and the study was not designed with these analyses in mind (due to the variability in the segmental properties of the target words). When we look at the prosodic features within each tonal combination, we see that the majority of the patterns discussed in the results section emerges within each tonal combination as well. We first consider the corrective vs. non-corrective manipulation, and then the given vs. new manipulation. Corrective vs. non-corrective manipulation Paired t-tests were conducted comparing corrective new vs. non-corrective new for all three tonal combinations, and corrective given vs. non-corrective given for all three tonal combinations. Overall, the effects of correctiveness in both new and given information that we observed in the preceding sections appear in all tonal combinations, although some of them do not reach significance. The absence of significance is not surprising, given (1) the fact that identity of the segments in target words was not controlled across tonal combinations and (2) the reduction in power that comes from looking at a third of the entire dataset (since there are three tonal combinations), and even less when we split it into given vs. new and corrective vs. non-corrective. Nonetheless, the distribution of significance among the conditions shows compatible patterns with prior work that examines the dependence of focusdriven prosody on lexical tones (Chen & Gussenhoven, 2008). In fact, while all tonal combinations are significantly lengthened in corrective focus, only the HL combination robustly differs in the F0 and intensity dimensions between corrective and noncorrective conditions. The properties of tones seem to impose greater restrictions on LH and HH, than on HL, in terms of the extent to which F0 and intensity can be altered to encode information structure (see Chen & Gussenhoven, 2008 for relevant further discussion). Downloaded by [USC University of Southern California] at 13:44 29 May 2016 68 I.C. Ouyang and E. Kaiser Duration: corrective words are longer (t(9)s3.707, p’sB0.01) in all tonal combinations than non-corrective words. F0 ranges: corrective words have larger F0 ranges (t(9)s2.659, p’sB0.05) than non-corrective words (except corrective new vs. non-corrective new in LH, which is marginal, p0.083, and corrective new vs. non-corrective new in HH which is ns). Intensity ranges: corrective words have larger intensity ranges than non-corrective words (t(9)s 2.599, p’s B0.05; except for corrective given vs. non-corrective given in LH which is marginal, p0.057, and corrective new vs. non-corrective new in HH which is ns). F0 maxima: corrective words have higher F0 maxima than noncorrective words (t(9)s2.986, p’sB0.05, except for corrective new vs. corrective given in LH which is ns). F0 minima: corrective words have numerically lower F0 minima than non-corrective words (ns). Intensity maxima: corrective words have numerically higher intensity maxima (corrective new vs. non-corrective new and corrective given vs. non-corrective given for HH reach significance, p’sB0.05). Intensity minima: corrective words have lower intensity minima (t(9)s B2.522, p’sB0.05; except for corrective new vs. non-corrective new LH which is marginal, p 0.052, and corrective new vs. non-corrective new HH which is ns). Given vs. new manipulation Paired t-tests were conducted comparing corrective new vs. corrective given for all three tonal combinations, and non-corrective new vs. non-corrective given for all three tonal combinations. Numerically, the prosodic properties of the three tonal combinations largely mirror the data pattern presented in the preceding section, although the analyses do not reach significance6, which as discussed above for the corrective manipulation is not surprising. The descriptive statistics are largely consistent with our previous observations: within a tonal combination, most of the numerical differences between non-corrective new and non-corrective given conditions are towards the same tendencies as tested in the main analyses. Discussion The study presented in this paper investigates the prosodic cues for two kinds of distinctions between discourse-information structures in Beijing Mandarin: the presence or absence of corrective focus, and the new vs. given distinction. Although existing findings on Mandarin prosody generally agree that new-information focus and corrective focus both involve increases in F0 displacement and duration (relative to given words or words in broad focus), there is as of yet no consensus about whether and how the two focus types new- information focus and corrective focus differ from each other. The role of intensity is also not well understood. A better understanding of these issues is important because they are involved in fundamental questions regarding the relationship of information structure and prosody, such as how the prosodic system represents different information-structural categories and lexical contrasts at the same time. In this section, we discuss how our results relate to our three key aims sketched out at the start of the paper, as well as their broader implications. Our first aim was to investigate whether words in new-information focus vs. corrective focus differ from each other in terms of their prosodic realisation in Mandarin, and if so, which parameters (e.g. duration, mean F0, F0 range or intensity) encode these differences. Our second aim was to explore whether and how the distinction between ‘new’ and ‘given’ influences the acoustic realisation of both corrective and noncorrective words. As regards the first aim, our results suggest that the cues for corrective focus do differ from those signalling the new vs. given distinction. Corrective focus and newinformation focus are distinguished from each other both by the degrees of prominence they induce along the same acoustic dimensions and by the different acoustic dimensions they occupy. On one hand, corrective focus and new-information focus both affect duration and F0, but corrective focus has a stronger impact on these prosodic features than new-information focus. On the other hand, intensity cues only appear for corrective focus, not for new-information focus. Despite the fact that we did not focus on the same kinds of information-structural distinctions, our findings are consistent with Chen and Braun (2006) on the conceptual level. This indicates the full complexity of prosodic structure that allows many informationstructural categories to be encoded distinctively. As regards the second aim, we found that the prosodic distinction between new and given information only emerged in non-corrective words in our study. There are several possible reasons for why correctively focused words do not show a distinction between given and new. Cognitively, correction might be more salient than ‘newness’ (Kaiser, 2011), which could in some sense ‘overwhelm’ the distinction between new and given information in a context where the words are also corrective. In principle, such a ‘ceiling effect’ could also be caused by physiological constraints. Since correction yields extremely strong prosodic prominence even when the information is given, it might be difficult or inefficient to further increase the prominence for new information. However, a prior study has found duration lengthening for corrective focus in two different emphatic degrees (Chen & Gussenhoven, 2008). This Downloaded by [USC University of Southern California] at 13:44 29 May 2016 Language, Cognition and Neuroscience suggests that physiology is unlikely to be hindering the prosodic realisation of different degrees of emphasis (for related work on speech perception, see Ladd & Morton, 1997). Another explanation is that speakers in our experiment might define givenness from their own perspective (rather than from the hearer’s perspective as intended), which would then effectively remove the distinction between new and given in the corrective focus conditions in our experiment. More specifically, recall that target sentences in the corrective conditions had been uttered by the speaker, although the listener apparently did not hear them properly the first time. If speakers fail to keep a log of the listener’s knowledge state, then the prosodic differences between new and given information in non-corrective conditions might actually reflect whether the words had been uttered more than once, rather than whether the listener had heard the words (Lam & Watson, 2010; Watson et al., 2008a). Lastly, as discussed in the method section, existing work has shown that given information is prosodically prominent when it appears in a different syntactic role (Schwarzschild, 1999; Terken & Hirschberg, 1994). In our study, a target word in the given-information conditions first occurs in the LOCATION role and then in the OBJECT role. Although we did find prosodic differences between given information and new information when there is no correction, the degree of givenness might not be large enough for given information to be substantially de-accented in corrective focus due to the change in syntactic roles. These are intriguing questions that deserve to be investigated further in future work. Our third aim was to provide a more inclusive analysis of the acoustic parameters relevant to information structure, including analysis of intensity as well as a closer look at how F0 range and intensity expansion is accomplished (e.g. lowering minima, raising maxima or both). Regarding intensity, our results show that it can provide information-structural cues, though to a limited extent: while contrastive focus results in increased intensity range expansion, newinformation focus does not do so. Regarding range expansion, we found that intensity range expansion for F0 and for intensity were accomplished via multiple routes. As illustrated in Figure 1, the two parameters involved in the expansion of ranges maximum and minimum potentially form three ways of achieving range expansion: (1) raising the maximum and lowering the minimum, (2) only raising the maximum and (3) only lowering the minimum. Intuitively, one might expect that extending both the upper and lower bound of a range (option (a)) would require the least articulatory effort while accomplishing the largest range expansion, so this pattern might be 69 the most widely used. However, analyses of the production data from our experiment reveal that both maximum and minimum were employed, and all three possible ways of expanding ranges emerged in different portions of our data. Recall that F0 ranges were extended for both corrective focus and new-information focus, whereas intensity ranges were extended only for corrective focus. We found that corrective focus expands F0 ranges by raising the maximum and lowering the minimum (option (a)), new-information focus expands F0 ranges by only raising the maximum (option (b)), and corrective focus expands intensity ranges by only lowering the minimum (option (c)). Our findings rule out a simple hypothesis that the semantically or pragmatically prominent words are merely spoken slower, louder and with higher pitch. Indeed, the duration of the words does become longer, but F0 range expansion during a focused word results from extending not only the upper bound but also the lower bound of the range. Moreover, the expansion of intensity ranges is due to a decrease in intensity during some part of the word, rather than an increase. In earlier work, Chen and Gussenhoven (2008) found that the expansion of F0 ranges for information structure in Mandarin is mainly accomplished by raising maximum F0, which is somewhat inconsistent with our findings. As pointed out by Chen and Gussenhoven (2008), this might have to do with the fact that F0 lowering in a low lexical tone leads to serious creakiness, which makes it difficult to assess the effect of emphasis on F0 minimum. In the section on ‘Results regarding lexical tone combinations’, our analyses of F0 minimum also show no significant difference between information-structural conditions within a tonal combination. However, a possibility that cannot be excluded is that the cues for range expansion do not necessarily exist at both sides of the range; we leave the question open for future work. Let us now briefly consider the fact that in a tone language like Mandarin, discourse-level intonation and lexical tones potentially occupy the same acoustic dimensions in tone languages. Existing work on Mandarin has found that all three prosodic dimensions duration, F0 and intensity provide cues for discourse-level information, as well as make contrasts (i.e. lexical tones) between word meanings. Largely consistent with prior studies (Chen, 2006; Chen & Braun, 2006; Chen & Gussenhoven, 2008; Greif, 2010; Jin, 1996; Xu, 1999)7, we found lengthening and F0 range expansion in corrective focus and new-information focus. Furthermore, our results show that intensity ranges may also be expanded to emphasise words in an utterance: intensity excursions become larger when the speakers express a correction. In other words, there is no evidence for specialised functions where some Downloaded by [USC University of Southern California] at 13:44 29 May 2016 70 I.C. Ouyang and E. Kaiser prosodic dimensions mark information structure and others mark lexical items, e.g. it is not the case that intensity ranges are used to mark only lexical contrasts, whereas F0 ranges are used to mark only informationstructural distinctions. All three prosodic dimensions are multi-functional. Our findings are also compatible with earlier observations regarding the manner in which prosodic cues in Mandarin encode discourse-information structure and lexical distinctions in particular, the idea that these two kinds of information are encoded differently: the latter has to do with the shapes of F0 movement and intensity movement, and the former with the ranges of their movement. Earlier work has pointed out that, for different lexical tones, the shapes of F0 contours clearly differ, whereas with informationstructural types, what vary are the ranges of F0 contours (Chen & Gussenhoven, 2008; Xu, 1997). Whalen and Xu (1992) suggest that F0 and intensity are positively correlated in lexical tones, which enables Mandarin speakers to perceive tones without the presence of contrastive F0 patterns. Indeed, if one inspects the contours of F0 and intensity in our data, both their shapes considerably differ between tonal combinations while staying similar across different conditions of information structure. Given our results showing that intensity ranges are used to differentiate information-structural types, there appear to be parallels between F0 and intensity in the specialisation of parameters. Lexical information is encoded by the shapes of F0 and intensity contours, whereas discourse information is marked by the ranges of F0 excursions and, as indicated by our findings, the ranges of intensity excursions. This highlights the fine-grained ability of the language production system to utilise different aspects of acoustic dimensions. Conclusions On the basis of the production study reported in this paper, we can draw two main conclusions. First, our findings provide further evidence for the multifunctionality of acoustic-prosodic dimensions. Even in a language with lexical tones, which differ in F0, intensity and duration, all these dimensions nevertheless also encode information structure (see also Chen & Gussenhoven, 2008). Nevertheless, since the shapes of F0 and intensity contours mark lexical items (Whalen & Xu, 1992), these two prosodic dimensions are modulated in another way to mark discourse importance. Parallel to what has been found in F0, the ranges of which provide cues for focus (e.g. Jin, 1996), our results clearly show that the ranges of intensity encode discourse information as well. Second, not only can prosodic cues indicate discourse importance, they also distinguish different kinds of information-structural distinctions in Mandarin (see also Chen & Braun, 2006). Lengthening and F0 range expansion occur in both corrective focus and new-information focus, whereas intensity range expansion only appears on correctively-focused words, regardless of their givenness. Taken together, our results show that even in a tone language where the dimensions of F0, duration and intensity are used to mark lexical distinctions prosodic information is nevertheless a rich source of cues about information structure, including different subtypes of focus. Our findings highlight the fine-grained ability of the language production system to utilise different aspects of acoustic dimensions with great efficiency. Acknowledgements Earlier version of this work were presented at AMLaP (Architectures and Mechanisms for Language Processing) 2011, ETAP (Experimental and Theoretical Advances in Prosody) 2, WECOL (Western Conference on Linguistics) 2011 and the Annual Meeting of the Linguistic Society of America in 2012; we would like to thank the audience members for the valuable comments and suggestions. Thanks also go to the USC Language Processing Lab group for feedback during the development of this project. We would also like to thank the editors of ‘Language and Cognitive Processes’ and two anonymous reviewers for their helpful comments. Notes 1. 2. Greif (2010) compared narrow new-information focus with two types of corrective focus: semantic correction and pragmatic correction. Due to length reasons, we will not discuss Greif (2010)’s findings for what he calls ‘pragmatic correction’, since only semantic correction is similar to the corrective focus that we investigated. Chen and Braun (2006)’s study is important, as it is the first prosodic investigation of information structure in Mandarin that investigated different informationstructural categories in a way that closely tied them to theoretical work. Their information-structural notions and terminology are based on Steedman (2000): Theme, Rheme, Background and Focus. In our discussion, we refer to their ‘normal rheme focus’ and ‘corrective rheme focus’ as ‘narrow focus’ and ‘corrective focus’ for the sake of consistency. More specifically, Chen and Braun (2006) examine four categories of discourse information: theme background, theme focus, rheme background and rheme focus. Following Steedman (2000)’s framework, these four categories are based on two layers of information structure: a primary distinction between rheme and theme, and a secondary distinction between focus and background. While ‘rheme’ and ‘theme’ Downloaded by [USC University of Southern California] at 13:44 29 May 2016 Language, Cognition and Neuroscience 3. 4. 5. 6. 7. roughly correspond to the new and given information as defined in our study, the division between ‘focus’ and ‘background’ is based on prosody focus is intonationally marked and background is not. Chen and Braun (2006) find that both rheme and focus are marked by lengthening and F0 range expansion, and that the distinction between rheme and theme is prosodically more prominent than the distinction between focus and background. Additionally, and most relevantly for us they look into two subtypes of rheme focus (essentially, new-information focus and corrective focus), and find that correctively-focused rhemes have bigger F0 ranges than non-correctively-focused rhemes, but do not differ in duration a finding which contrasts with Greif (2010). As a whole, Chen and Braun (2006)’s findings suggest that different information-structural distinctions can be encoded in the same prosodic dimensions with different degrees of prominence, as well as reflected in different prosodic dimensions. For the verb ‘put’, the variant fang is also possible, in addition to fang-dao and fang-zai. These forms are interchangeable across speakers in this context. Participants were asked to use the one most natural to them; only one participant used the short form fang. Three sentences are missing from the recordings due to technical problems, and two sentences were misspoken. They amount to 1.39% of the data. In this paper, we focus on the by-subject analyses because the design of the study does not allow for by-item analyses of all targets, as the nine target words were ‘cycled’ through the 36 target items. However, if one analyses the nine target words in all four conditions, the statistical patterns closely resemble the by-subject analyses. The analysis comparing given vs. new words in the noncorrective condition do not reach significance, except for the analysis of intensity range: for words in the HH tone group, non-corrective new information has a larger intensity range than non-corrective given information (t(9)2.519, p B0.05). For corrective focus, Chen and Braun (2006) did not find lengthening and Greif (2010) did not find F0 range expansion, as discussed in the introduction. References Arnold, J. E., Kahn, J. M., & Pancani, G. (2012). Audience design affects acoustic reduction via production facilitation. Psychological Bulletin & Review, 19(3), 505512. doi:10.3758/s13423-012-0233-y Bauer, R. S., Cheung, K.-H., Cheung, P.-M., & Ng, L. (2004). Acoustic correlates of focus-stress in Hong Kong Cantonese. Papers from the Eleventh Annual Meeting of the Southeast Asian Linguistics Society. Arizona State University, Program for Southeast Asian Studies. Belotel-Grenié, A., & Grenié, M. (1994). Phonation types analysis in Standard Chinese. Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP 1994), Yokohama, Japan. Birner, B. J., & Ward, G. (1998). Information status and noncanonical word order in English. Amsterdam/Philadelphia: John Benjamins. Breen, M., Fedorenko, E., Wagner, M., & Gibson, E. (2010). Acoustic correlates of information structure. Language and Cognitive Processes, 25, 10441098. doi:10.1080/01690965.2010.504378 Brown, G. (1983). Prosodic structure and the given/new distinction. In A. Cutler & R. Ladd (Eds.), Prosody: Models and measurements (pp. 6777). Berlin: Springer Verlag. 71 Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. Plos ONE, 5(6), e10729. doi:10.1371/journal.pone.0010729 Calhoun, S. (2006). Information structure and the prosodic structure of English: A probabilistic relationship. Edinburgh: University of Edinburgh dissertation. Chen, Y. (2006). Durational adjustment under corrective focus in Standard Chinese. Journal of Phonetics, 34(2), 176201. doi:10.1016/ j.wocn.2005.05.002 Chen, Y., & Braun, B. (2006). Prosodic realization in information structure categories in standard Chinese. In R. Hoffmann & H. Mixdorff (Eds.), Speech prosody 2006. Dresden: TUD Press. Chen, Y., & Gussenhoven, C. (2008). Emphasis and tonal implementation in Standard Chinese. Journal of Phonetics, 36, 724746. doi:10.1016/j.wocn.2008.06.003 Chen, S.-W., Wang, B., & Xu, Y. (2009). Closely related languages, different ways of realizing focus. Proceedings of Interspeech 2009, Brighton, UK. Cooper, W., Eady, S., & Mueller, P. R. (1985). Acoustical aspects of contrastive stress in question-answer contexts. Journal of the Acoustical Society of America, 77, 21422156. doi:10.1121/1.392372 Dahan, D., Tanenhaus, M. K., & Chambers, C. G. (2002). Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language, 47, 292314. doi:10.1016/S0749-596X(02) 00001-3 Dik, S. (1997). The theory of functional grammar. Berlin: Mouton De Gruyter. Féry, C., Kaiser, E., Hörnig, R., Weskott, T., & Kliegl, R. (2009). Perception of intonational contours on given and new referents: A completion study and an eye-movement experiment. In P. Boersma & S. Hamann (Eds.), Phonology in perception (pp. 235266). Berlin: Mouton de Gruyter. Fowler, C. A., & Housum, J. (1987). Talkers’ signaling of ‘‘new’’ and ‘‘old’’ words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language, 26, 489504. doi:10.1016/0749-596X(87)90136-7 Greif, M. (2010). Contrastive focus in Mandarin Chinese. Proceedings of Speech Prosody 2010, Chicago. Gussenhoven, C. (1983). Focus, mode, and nucleus. Journal of Linguistics, 19, 377417. doi:10.1017/S0022226700007799 Hartmann, K. (2007). Focus and tone. In C. Féry, G. Fanselow, M. Krifka (Eds.), Interdisciplinary studies on information structure 6 (pp. 221235). Potsdam: Universitätsverlag Potsdam. Ito, K., & Speer, S. R. (2006). Using interactive tasks to elicit natural dialogue. In S. Sudhoff et al. (Eds.), Methods in empirical prosody research (pp. 229258). Berlin: Mouton de Gruyter. Jannedy, S. (2007). Prosodic focus in Vietnamese. In S. Ishihara, S. Jannedy, & A. Schwarz. (Eds.), Interdisciplinary studies on information structure (Vol. 8, pp. 209230). Potsdam: Universitätsverlag Potsdam. Jin, S. (1996). An acoustic study of sentence stress in Mandarin Chinese (Doctoral dissertation). Columbus, OH: Ohio State University. Kaiser, E. (2011). Focusing on pronouns: Consequences of subjecthood, pronominalization and contrastive focus. Language and Cognitive Processes, 26, 16251666. doi:10.1080/01690965.2010. 523082 Kaiser, E., & Trueswell, J. (2004). The role of discourse context in the processing of a flexible word-order language. Cognition, 94(2), 113 147. doi:10.1016/j.cognition.2004.01.002 Katz, J., & Selkirk, E. (2011). Contrastive focus vs. discourse-new: Evidence from phonetic prominence in English. Language, 87, 771 816. doi:10.1353/lan.2011.0076 Ladd, D. R. (1980). The structure of intonational meaning: Evidence from English. Bloomington: Indiana University Press. Downloaded by [USC University of Southern California] at 13:44 29 May 2016 72 I.C. Ouyang and E. Kaiser Ladd, D. R., & Morton, R. (1997). The perception of intonational emphasis: Continuous or categorical? Journal of Phonetics, 25, 313 342. doi:10.1006/jpho.1997.0046 Lam, T. Q., & Watson, D. G. (2010). Repetition is easy: Why repeated referents have reduced prominence. Memory and Cognition, 38, 11371146. doi:10.3758/MC.38.8.1137 Pierrehumbert, J., & Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. R. Cohen, J. Morgan, & M. E. Pollack (Eds.), Intentions in communication (pp. 271311). Cambridge, MA: MIT Press. Prince, E. (1992). The ZPG letter: Subjects, definiteness, and informationstatus. In S. Thompson & W. Mann. (Eds.), Discourse description: Diverse analyses of a fund raising text (pp. 295325). Philadelphia/ Amsterdam: John Benjamins. Rosa, E. C., Finch, K., Bergeson, M., & Arnold, J. E. (2013). The effects of addressee attention on prosodic prominence. Language and Cognitive Processes. doi:10.1080/01690965.2013.772213 Schwarzschild, R. (1999). GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics, 7(2), 141 177. doi:10.1023/A:1008370902407 Steedman, M. (2000). Information structure and the syntaxphonology interface. Linguistic Inquiry, 31, 649689. doi:10.1162/ 002438900554505 Terken, J., & Hirschberg, J.(1994). Deaccentuation of words representing ‘given’ information: Effects of persistence of grammatical function and surface position. Language and Speech, 37(2), 125145. doi:10.1177/002383099403700202 Vance, T. J. (1977). Tonal distinctions in Cantonese. Phonetica, 34, 93 107. doi:10.1159/000259872 Watson, D. G. (2010). The many roads to prominence: Understanding emphasis in conversation. The Psychology of Learning and Motivation, 52, 163183. doi:10.1016/S0079-7421(10)52004-8 Watson, D., Arnold, J. A., & Tanenhaus, M. K. (2008a). Tic Tac TOE: Effects of predictability and importance on acoustic prominence in language production. Cognition, 106, 15481557. doi:10.1016/ j.cognition.2007.06.009 Watson, D. G., Tanenhaus, M. K., & Gunlogson C. A. (2008b). Interpreting pitch accents in online comprehension: H* vs. LH*. Cognitive Science, 32, 12321244. doi:10.1080/03640210802138755 Welmers, W. E. (1973). African language structures. Berkeley: University of California Press. Whalen, D. H., & Xu, Y. (1992). Information for Mandarin tones in the amplitude contour and in brief segments. Phonetica, 49, 2547. Xu, Y. (1997). Contextual tonal variations in Mandarin. Journal of Phonetics, 25, 6183. Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics, 27, 55105. Xu, Y. (20052011). ProsodyPro.praat. Retrieved from http://www. phon.ucl.ac.uk/home/yi/ProsodyPro/. Zerbian, S., Genzel, S., & Kugler, F. (2010). Experimental work on prosodically-marked information structure in selected African languages (Afroasiatic and Niger-Congo). Proceedings of Speech Prosody 2010, Chicago. Appendix 1. Target words Tone HH (highhigh) HL (highlow) LH (lowhigh) Word xiang.yan wu.ya qing.wa qiu.yin ying.wu ban.ma gui.wu yu.yi hai.ou ‘cigarette’ ‘crow’ ‘frog’ ‘earthworm’ ‘parrot’ ‘zebra’ ‘ghost house’ ‘raincoat’ ‘seagull’
© Copyright 2025 Paperzz