Protocol for manual verification of automatically generated word segmentations
D. Binnenpoorte, 29-01-02 (+ additions 15-01-04)

A. General

A.1 Aim

The aim is to separate words from each other in continuous speech by placing boundaries in such a way that the individual words sound acceptable acoustically. This means that the words concerned should be auditively recognizable as such. Experience shows that short words such as ‘de’, ‘en’, ‘ik’, ‘met’ can barely be recognized when isolated from continuous speech. Such words will not meet the criterion of ‘acoustically acceptable’, but you should nevertheless always try to place the boundaries as well as possible.

The word boundaries serve as anchor points in time. The final user of the corpus should be able to give a search instruction in which, for example, a word is specified. The anchor points mark the beginning and the end of the stretch of signal that belongs to that word.

The task of the manual aligner is to check the automatically generated word boundaries according to the following principles and correct them where necessary.

A.2 Basic assumptions

All speech material has been both orthographically and phonetically transcribed. Both transcriptions have been produced by people and form the basis for the word boundaries, which have already been placed automatically by means of automatic speech recognition. The boundaries found by automatic speech recognition must be checked manually and adjusted where necessary.

The program used is Praat. Figure 1 shows an example of a work screen from the Praat program, in which the following terms occur:
- The orthographic transcription with boundaries (upper tier) gives the spelling form of the words which have been uttered.
- The phonetic transcription with boundaries (second tier) shows how the words have been realized, using a set of phonetic symbols which indicate their pronunciation.
- The vertical lines between the words in both tiers are the segment boundaries, or word boundaries. If necessary, these boundaries must be moved by a manual aligner.
- Above the tiers is the oscillogram, a representation of the speech signal.
- The selection buttons determine how much of the signal is shown on the screen. In Figure 1 just over a minute of a longer utterance has been selected.

Figure 1 Praat work screen for verification of word alignment (labelled parts: oscillogram, word boundaries, orthographic tier with boundaries, phonetic tier with boundaries, duration of the screen, selection buttons)

A.3 What are words?

Words are all entities which are separated from each other by a space in the orthographic transcription. The orthography of the CGN also contains codes, such as ‘ggg’ for speaker sounds or ‘xxx’ for incomprehensible speech. These, too, are words if they are separated from other entities by a space. However, these codes can also occur within words. In that case they should not be regarded as words. You can read more about this in C.5 and C.6.

A rule of thumb for the verification of automatically detected word segmentations is that all words are separated from each other by boundaries. There are two exceptions to this rule.
1. A pause or silence can occur between words. This pause is then preceded and followed by a boundary, see paragraph C.4.
2. Two successive words can share a plosive sound at their boundary, i.e. the first word ends in a plosive sound and the following word begins with the same plosive sound. In that case the shared plosive is also enclosed by boundaries. You can read more about this in section D.

B. Types of word boundaries

Automatic word boundaries are created when the automatic speech recognizer (ASR) links the sound symbols from the phonetic transcription to the corresponding parts of the sound signal.
For each phoneme (sound symbol) the small part of the signal to which it belongs is found. The final word boundaries are derived from the phoneme segmentations. These word boundaries are shown by vertical blue lines. We need to check if these boundaries are in the right place.

We distinguish the following types of word boundaries, according to what happens perceptually and phonetically at the place where they occur:
• The most frequently occurring boundaries are the normal flowing transitions between words. The end boundary of the first word coincides with the start boundary of the following one.
• There can be a pause between two words. This is best shown as a separate segment, to prevent a selected word from being surrounded by too long a pause.
• Between words, sounds (phonemes) can be inserted to ensure flowing speech. For example in: Daarom doe [w] ik het nu. This is shown in the phonetic transcription as: darOm du-w-Ik @t ny. Section E covers this in more depth.
• Words can share phonemes at their boundaries. This is found where word1 ends with the same sound as word2 begins. For example: …naar Rotterdam…. Usually in such cases the [r] sound is not pronounced twice. In the phonetic transcription these cases are shown as nar_rOt@rdAm. Section D gives more examples and describes where the word boundaries should be marked in such cases.
• The other word boundaries can show assimilation phenomena. This means that sounds influence each other across word boundaries. In the example: Is dit goed? the [s] is influenced by the [d] and pronounced as a [z], as is shown phonetically in the following: Iz dIt Gut? More explanation of this phenomenon can be found in the section on phonetic background information.

C. Basic principles for verification

The most important basic principle for verifying the word boundaries is that it should be done consistently. This can be achieved by applying a few rules.
C.1 Consistency in orthographic transcription

Word alignment must stay consistent with the orthographic transcription. This transcription is in the top transcription tier, see paragraph A.2. For every word (including ‘xxx’, ‘ggg’) in the orthographic transcription a segment must be defined in the alignment.

Keep the alignment consistent with the orthographic transcription.

Hesitation, stuttering and all other interjections are orthographically transcribed. Consider these phenomena as separate words, surrounded by word boundaries. Hesitation within a word should never lead to extra word boundaries, as it does not change the number of orthographic units.

Hesitation, for example uh, and other interjections should be separated by segment boundaries. If these speech sounds occur IN a word they should not be separated by segment boundaries.

You are not allowed to make any changes to the orthographic transcription. Any errors or oddities should be reported to your project manager.

NEVER make any changes to the orthographic transcription.

In the orthographic transcription words can be marked with so-called star codes. These codes are used in the orthography to differentiate between the following phenomena:
*v → words from a foreign language
*d → dialect words
*z → strong dialect words
*n → ‘new’ words
*t → new interjections
*a → broken off words
*u → strange pronunciation or mispronunciations
*x → difficult to understand words

Never change anything in these codes, but use the extra information when checking the word boundaries. The code and the word it is attached to form a single entity.

C.2 Consistency in phonetic transcription

The phonetic transcription tier, see paragraph A.2, contains a great deal of information on the phenomena, listed in section B, which can occur at word boundaries. For example:

(1) a. zE+ hAd@n nits [zij hadden niets]
    b. zE+ hAd@n_nits [zij hadden niets]
    c. zE+ hAd@ nits [zij hadden niets]

These examples are different versions of the sentence ‘zij hadden niets’, as they have been transcribed by expert listeners from the phonetic transcription group. Example (1)a shows a pronunciation of the sentence in which both the final /n/ in ‘hadden’ and the first /n/ in ‘niets’ are pronounced. The two /n/s are easy to separate. This situation will not occur very often in joined-up speech. Example (1)b is a case of a split phoneme, degemination. The idea is that the phoneme is audible in both words, see section D. Example (1)c shows that the /n/ in ‘hadden’ is not pronounced at all. The boundary between the words here lies after the /@/ and before the /n/ of ‘niets’.

The phonetic and orthographic transcriptions given are decisive for the auditive demands placed on the segmented words. The transcriptions have been made by trained transcribers who also used context information when producing them.

Keep the alignment consistent with the phonetic transcription.

The same applies here as with the orthographic transcription: you are not allowed to make any changes to the phonetic transcription. Oddities or errors should be reported to the project manager.

NEVER change anything in the phonetic transcription.

C.3 Do not move the boundaries unless really necessary

The word boundaries which have been generated by the automatic speech recognizer have to be checked. For this, each segmented word has to be listened to. If, on the basis of the auditive impression, you feel a word boundary has to be moved, this can only be done by listening to the word before the boundary and the word after it. Once again, the criterion for moving a boundary is always the auditive impression. For long words the rule of thumb is: if the word sounds acoustically acceptable, then the boundary is acceptable.
For short words it is often the case that moving the boundary makes no audible difference compared with the original position. In these cases the boundaries should be left where they are. It is possible to use the ‘undo’ function in Praat to put the boundary back in its previous position. NB: if the boundary has been moved twice from its original position, the ‘ongedaan maken’ (‘undo’) function only restores the position before the last move, i.e. not the original position.

When carrying out these checks be pragmatic: ‘Good is good’. Don’t move the boundaries unnecessarily over ‘small distances’.

C.4 Pauses and other non-transcribed sounds

The automatic segmentation also contains ‘silences’ marked as a separate segment. Only move these boundaries if the word before or after the silence doesn’t sound right, or when the so-called ‘silence’ definitely contains a piece of transcribed speech. In exceptional cases it is possible that a ‘silence’ has to be removed completely. If the automatic segmentation has not marked annoying ‘silences’, add them yourself using an empty transcription.

Change the automatically identified pauses as little as possible. If annoying non-transcribed sounds have not yet been automatically segmented, add the boundaries for these sounds.

Figure 2 illustrates how pauses can occur between words and can be enclosed by segment boundaries. To prevent too many pause segments from being enclosed by boundaries, we have created the rule that pauses shorter than 50 ms (0.050 s) are not shown as a separate segment in automatic segmentation. If, as a result of moving boundaries, a silence or pause becomes shorter than 50 ms, it must be removed.

Pauses or silences shorter than 50 ms are not marked. A pause or silence starts once the previous word has been completed.
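The 50 ms rule can be checked mechanically after boundaries have been moved. The following is only a minimal sketch: the `Segment` structure, the use of an empty label for a pause, and the function name are assumptions made for illustration and are not part of Praat or of the CGN tools. It only lists the pauses that have become too short; how the freed time is assigned to the neighbouring words remains a matter of listening.

```python
from typing import NamedTuple

MIN_PAUSE_S = 0.050  # section C.4: pauses shorter than 50 ms are not marked


class Segment(NamedTuple):
    label: str    # assumption: an empty label stands for a pause or silence
    start: float  # start time in seconds
    end: float    # end time in seconds


def pauses_to_remove(segments: list[Segment]) -> list[Segment]:
    """Return the pause segments whose duration is below 50 ms."""
    return [
        s for s in segments
        if s.label == "" and (s.end - s.start) < MIN_PAUSE_S
    ]


# Hypothetical tier: one pause of 30 ms and one of 300 ms.
tier = [
    Segment("ggg", 0.40, 1.00),
    Segment("", 1.00, 1.03),
    Segment("duidelijk", 1.03, 1.60),
    Segment("", 1.60, 1.90),
]

for pause in pauses_to_remove(tier):
    print(f"pause at {pause.start:.3f} s is shorter than 50 ms")
```

In this made-up tier only the first pause is reported; the 300 ms pause stays as a separate segment.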
C.5 Segmenting speaker sounds

Speaker sounds, such as laughing, yelling and coughing, are indicated in the orthographic transcription tier as ‘ggg’, whilst in the phonetic transcription the ‘#’ sign is used. They must be seen as words if they are enclosed by spaces. Naturally, they must also be verified. Always determine the start boundary by listening to the previous word and only change its position if this word does not sound acceptable. To determine the end boundary of the speaker sound, the following word must sound good. The speaker sound ‘ggg’ is therefore of minor importance in relation to the surrounding words, so do not move the boundaries enclosing these sounds unnecessarily.

Speaker sounds outside of the words are separated by segment boundaries.

C.6 Segmenting ‘xxx’

In transcriptions ‘xxx’ codes also occur; in the phonetic transcription these are indicated as ‘[]’. They indicate one or more successive words which have not been understood properly. Such parts must be interpreted as one word, which should be separated by segment boundaries.

Segment incomprehensible speech as one word enclosed by segment boundaries.

‘xxx’ codes can also occur within words. Words in which this occurs must be treated as normal words and be enclosed by boundaries in their entirety.

C.7 Example

Figure 2 contains a fragment in which most of the phenomena described above occur. Important points are:
• The end boundary of word1 coincides with the start boundary of word2. The boundary is in between the two words;
• Incomprehensible speech (‘xxx’) is enclosed by word boundaries;
• Speaker sounds (‘ggg’) are also enclosed by word boundaries. Both the incomprehensible speech and the speaker sounds are transcribed entities;
• A pause (either silence or annoying background noise) starts where the previous word (in this case: ‘ggg’) ends. A pause ends where the next word (in this case: ‘duidelijk’) starts.

Figure 2 Examples of pauses and speaker sounds

D. Shared phonemes at word boundaries

In section B we briefly discussed which phenomena can occur at word boundaries in joined-up speech. To repeat: in joined-up speech words are not separated by pauses, like spaces in written language, but the sounds continue in an uninterrupted flow. There are, of course, pauses for breathing, or for other reasons which contribute to enhancing communication.

The word boundary types which could be problematic are those at which shared phonemes (section D) or, conversely, inserted phonemes (section E) occur. Below we describe how to check word boundaries for each phenomenon. Every type has its own notations and codes, which have already been indicated automatically and are therefore easier to trace. The extra codes serve as pointers; they should not be changed or added.

D.1 What are shared phonemes?

In joined-up speech assimilation, or ‘adaptation’, of sounds occurs, both within words and at word boundaries. A special case of assimilation at word boundaries is called degemination. This can occur when the final sound of one word is the same as the initial sound of the following word, so that the sound is shared by both words. For example:

(2) a. Ik wil naar Rotterdam. [shared sonorant]
    b. Hij is zeker vandaag gevallen. [shared fricative]
    c. Hij komt terug. [shared voiceless plosive]
    d. Ik heb belachelijk veel te doen. [shared voiced plosive]
    e. Hij wil vast stoppen. [multiple shared phoneme sequence]
    f. Het liefst sta ik vroeg op. [multiple shared phoneme sequence]

Such sounds are often pronounced as a single sound and will therefore probably be difficult to separate. Guidelines for such cases are given below.

D.2 Non-plosives as shared phoneme

In the case of example (2)b, ‘Hij is zeker vandaag gevallen’, you can simply use roughly the middle of the ‘g’ as the boundary between /vandaag/ and /gevallen/. This is also the case with (2)a: ‘Ik wil naar Rotterdam’.
This type of phoneme, the non-plosive, is particularly suitable for this as, unlike plosives, it does not consist of two phases; see also paragraph D.3. Since such word pairs share a phoneme, they will not sound the same after the boundary has been inserted as they would when pronounced in isolation. The shared phoneme of the two words will not be longer than when it is pronounced within a word. The words must nevertheless be separated, so that they will not sound ‘complete’, but are still easily recognizable. If more than one energy maximum occurs in a phoneme, which is possible with an /r/, for example, as illustrated in Figure 3, an energy minimum close to the middle must be selected.

Figure 3 Non-plosive as shared phoneme: approximate middle

Figure 3 gives an example from sample sentence (2)a. It is clearly visible that roughly the middle of the /r/ can constitute a word boundary. In the tier with the phonetic transcription this boundary is marked with a ‘=’ sign to indicate that in this case the /r/ is a shared phoneme, so that the words sharing the /r/ do not sound complete.

In the case of shared phonemes which are non-plosives, make sure that the shared phoneme is audible in both words.

D.3 Plosives (/p/, /t/, /k/, /b/, /d/, /g/) as shared phoneme

The shared phonemes in examples (2)c and (2)d are plosives. The plosives in Dutch are /p/, /t/, /k/ (voiceless plosives) and /b/, /d/, /g/ (voiced plosives). These phonemes are categorized as such because a type of explosion takes place when they are pronounced. An enclosed space is created with one of the articulators, see paragraph G.2, which is then suddenly opened, so that the air inside it is released. The location of the enclosed space, and whether or not the vocal cords vibrate, determine which plosive is realized. Before the air is released, air pressure first builds up in the enclosed space. This is called the occlusion.
Then the air is released, the so-called burst, producing the sound. If two words share a plosive at their boundary, as in (2)c and (2)d, the build-up of air pressure and its release do not occur twice in joined-up speech. If the two words are to be separated, and the middle of the duration of the plosive is taken as a guideline for the point of separation, you will usually hear nothing of the identity of the plosive in the first word. After all, in that part the build-up of air pressure is taking place, and the plosive will only be audible in the second word.

Figure 4 clearly shows the problem. Here @kAn is shown as part of the word pair ‘de kantine’. The plosive /k/ has been circled. We can see how the plosive sound is made up of a part where the air pressure builds up, the occlusion, and a part in which the occlusion is opened and the air released, the burst. This is an example of a voiceless plosive, the /k/. A similar situation occurs with the other voiceless plosives and with the voiced plosives /b/, /d/ and /g/. The difference between voiced and voiceless plosives is vibration of the vocal cords, which in voiced plosives is present both during the build-up of air pressure and during the burst. The occlusion stage of voiced plosives will show vibration with small amplitude, instead of a flat line. However, the transition to the burst is usually still clearly visible.

Figure 4 Voiceless plosive, /k/

Strictly speaking, shared plosives cannot be separated, so we will not attempt to do so. Instead the shared plosive is seen as an independent segment.

D.3.1 Singular shared plosives

Figure 5 is the visualization of example (2)c, in which the /t/ occurs as a shared plosive in the sentence ‘Hij komt terug’. As you can see, the /t/ as a whole is treated as a separate segment and is shown in the work screen as an underscore, ‘_’. When listening to it, only one /t/ will be audible.
In order to listen to the word ‘komt’ two segments must be selected, both the part /kOm/ and the segment containing the complete /t/, and listened to in succession. To listen to the word ‘terug’, start the selection with the /t/ segment and add the /@rYx/ segment.

Figure 5 Shared voiceless plosive

Figure 6 is an example of a shared voiced plosive. It shows a detail from example (2)d: ‘Ik heb belachelijk veel te doen’. Here, too, the word ‘heb’ must be made audible by selecting both the segment in which /hE/ can be heard and the segment which only contains the /b/, and listening to them in succession.

Figure 6 Shared voiced plosive

The same applies again to the word ‘belachelijk’: first the segment with the /b/ and then the segment with /@lAx@l@k/.

Shared plosives are placed in a separate segment, represented by a ‘_’ in both the phonetic and the orthographic tier, and must be joined together with the previous or the next word to be listened to.

D.3.2 Shared plosive as a word

As a result of quick pronunciation, words are frequently reduced. With short words, as little as one sound may remain. For example: ‘het is’ is reduced to ‘’t is’. It is, of course, possible for such a reduced word to share a plosive with another word, or to be shared in its entirety with another word, as shown in the examples below:

(3) a. dat ’t niet uitmaakt [orthography]
    b. dAt_t nit Y+tmakt [phonetic representation]
    c. dus ’k keek naar hem [orthography]
    d. dYs k_kek nar hEm [phonetic representation]

The words ‘’t’ (het) and ‘’k’ (ik) are realized in such brief forms that their entire duration is overlapped by the last (or first) plosive of another word. Nevertheless, there is enough evidence for the orthographic and phonetic transcribers to see them as separate words. We have chosen to treat such cases separately. Words which consist of a plosive that is shared are not divided into two parts, the word and the ‘_’ sign, but retain the status of a word and remain a single segment.
To make clear that such a word is not only a word ‘in its own right’ but, as shown in the examples in (3), also a shared plosive, this must be marked in both the orthographic and the phonetic tier with a ‘_’ sign on the side where the phoneme is shared within the segment. The examples in (3) are represented in the orthographic tier as follows:

(4) a. dat _’t niet uitmaakt [ _ to the left of the ’t]
    b. ’k_ keek naar hem [ _ to the right of the ’k]

Figure 7 clarifies this solution. In this figure part of example (3)c/d is shown. In order to be able to listen to the word ‘keek’, both the segment containing ‘’k_’ and the ‘keek’ segment must be listened to.

Figure 7 Short plosive words: as one segment

D.3.3 Double shared plosives

We talk about double shared plosives when a plosive shared by two words is itself a word. For example:

(5) a. dat ’t toch niet kan [orthography]
    b. dAt_t_tOx nit kAn [phonetic representation]
    c. dan maak ’k kip met rijst [orthography]
    d. dAn mak_k_kIp mEt rE+st [phonetic representation]

Both example (5)b and (5)d show that the words ’t and ’k respectively have been reduced to such an extent that they overlap with the words preceding and following them. In that case we have a so-called double shared plosive. The plosive is shared with both the previous and the following word. For such cases a separate annotation, ‘_t_’, has been chosen in order to indicate that this is a double shared plosive in this particular segment.

Figure 8 is a visualization of example (5)a/b. To make things clear: in order to be able to listen to the word ‘dat’, the segment with the ‘_t_’ label must also be selected. For the word ‘’t’ only the exception segment ‘_t_’ must be selected and, finally, to listen to the word ‘toch’, both the segment with ‘_t_’ and the segment ‘toch’ must be selected and played.

Figure 8 Double shared plosive

D.3.4 Multiple shared phonemes

In the case of multiple shared phonemes a series of phonemes is shared at the word boundaries.
Examples (6)a and (6)c (from section D.1) show two phonemes which are shared.

(6) a. Hij wil vast stoppen. [multiple shared phoneme sequence]
    b. hE+ wIl vAst_stOp@n [phonetic transcription]
    c. Het liefst sta ik vroeg op. [multiple shared phoneme sequence]
    d. @t lifst_sta Ik vrux Op. [phonetic transcription]

It is impossible to say with certainty where the pronunciation of the first word ends and that of the second starts, i.e. to which word this shared phoneme sequence belongs. This phenomenon must be treated along the same lines as singular shared plosives. The sequence of shared phonemes must be seen as one entity and placed in a separate segment. In both the orthographic and the phonetic tier it is represented by the ‘_’ sign.

In Figure 9 the sentence of example (6)c/d is shown with the correct segment boundaries.

Figure 9 Multiple shared phonemes

E. Inserted sounds

Another phenomenon is the insertion of sounds, as in:

(7) a. Daarom doe /w/ ik het nu.
    b. Toen belde /n/ ie naar huis.

In such cases a sound is inserted to enhance the flow of speech. It is a type of linking sound, which is the result of the articulators having to move from one extreme position to the other. This phenomenon is optional. If two words joined like this are separated, part of the inserted phoneme, like the /w/ in (7)a, will remain audible when both words are listened to.

The guideline for the boundary position in cases of a linking sound is roughly the middle of the linking sound. In contrast with shared phonemes, where audibility of the shared phoneme is very important, this is less important for inserted phonemes. The most important criterion is that the words on either side of it sound acceptable. This also applies if the inserted phoneme is a plosive. In the latter case, this plosive is not treated as a separate word.
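Because shared and inserted phonemes carry their own codes in the phonetic transcription, the word pairs that need extra attention can be located before listening. The sketch below is illustrative only: the function name is hypothetical, and it assumes that an inserted sound is always written as word-sound-word (as in du-w-Ik in section B) and that any token containing ‘_’ holds shared material (as in nar_rOt@rdAm). It does not decide where a boundary goes; that remains an auditive judgement.

```python
import re

# Assumed pattern for an inserted linking sound: word-sound-word, e.g. "du-w-Ik".
INSERTED = re.compile(r"[^_\-]+-[^_\-]+-[^_\-]+")


def classify_boundaries(phonetic: str) -> list[tuple[str, str]]:
    """Return (token, kind) for every token that carries a boundary code."""
    found = []
    for token in phonetic.split():
        if INSERTED.fullmatch(token):
            found.append((token, "inserted"))
        elif "_" in token:
            found.append((token, "shared"))
    return found


print(classify_boundaries("darOm du-w-Ik @t ny"))
print(classify_boundaries("hE+ wIl vAst_stOp@n"))
```

Tokens without a code, such as those in zE+ hAd@n nits, are not reported, since they are ordinary flowing transitions.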
Figure 10 Inserted linking sound: approximate middle (labelled parts: /w/, oscillogram, orthography with boundaries, phonetic tier with boundaries)

Figure 10 illustrates the phenomenon of insertion with a /w/. Note both the phonetic and the orthographic transcription. In the phonetic transcription inserted phonemes are represented by a ‘-’ between the word and the inserted sound.

Figure 11 gives an example in which the inserted phoneme is a plosive, a /t/. As already said, these are treated in the same way as inserted non-plosives. Since there is no middle in a plosive, the criterion here is that the words on either side of the plosive must sound acceptable.

Figure 11 Inserted plosive: both words acceptable (labelled parts: /t/, oscillogram, orthography with boundaries, phonetic tier with boundaries)

F. Modus operandi

Summarizing, checking automatic word boundaries is carried out as follows:
• Do not change the transcriptions, but always report possible errors. This goes for orthographic as well as phonetic transcriptions;
• Use the sound, orthography and phonetic transcriptions as a guideline;
• Listen word for word. If something does not sound acceptable, listen to the word pairs on either side of that specific boundary;
• Do not move boundaries unnecessarily: good is good;
• Only use the graphical information from the oscillogram to supplement the auditive impression;
• For a workable set-up select a frame size of 1 to no more than 2 seconds;
• Apply the above rules for moving boundaries.

G. Background information

G.1 Continuous speech

The material compiled in the Corpus Gesproken Nederlands (CGN) consists of a multitude of different speech types. These forms of speech vary from spontaneous dialogues to speech read out loud (monologues). Speech can be characterized as joined-up speech. This means that no pauses are inserted between the words, so that words are not spoken in isolation. In joined-up speech the final sound of the first word and the initial sound of the second are often joined together.
Below you will find a brief phonetic explanation of the phenomena found in relation to boundaries, which are relevant to the placement of word boundaries.

G.2 Phonetics

When producing speech you use your articulators. These include your lips, tongue, jaw, vocal cords, etc. Together these articulators help to produce speech. They are attuned to each other when it comes to timing (your lips are opened at the right time when you want to say the word /maar/ and proceed from the /m/ to the /a/-sound). But they are also in tune with each other when it comes to place. Place here refers to the location where the narrowing or complete closing off of the opening between the tongue and the palate takes place, at the front or the back of the oral cavity. Each position of the tongue results in a specific sound. For example, a /x/-sound or a /G/-sound is produced when the closure is at the back, whereas an /l/-sound can be produced when the tip of the tongue is placed against the inside of the upper teeth. Naturally, the position of the other articulators at that point is decisive for the specific sound produced.

Apart from changes in the position of the articulators, there is also diversification as a result of vibration or non-vibration of the vocal cords. The difference between the /s/ and the /z/ lies in the vibration of the vocal cords, which is present in the /z/-sound, but absent with the /s/. The position of the articulators remains unchanged in this instance. The /z/-sound is a voiced sound, the /s/ is a voiceless sound.

An utterance, a sentence or a word, consists of a sequence of different sounds (phonemes) each of which is produced in its own way (positions of the articulators). The changes in the position of the articulators are rapid. When sounds are produced in rapid succession, they will influence each other. The articulators are probably unable to take their characteristic position when the next sound requires another extreme position.
This influencing phenomenon is called coarticulation. It is also audible. An extreme form of coarticulation, whereby even phoneme identity changes, is called assimilation.

If words are extracted from isolated speech, i.e. when there is a clear pause between the words, they will sound different to words extracted from joined-up speech. In the case of words pronounced in isolation, no assimilation will occur at the start or end sounds (phonemes) of the word. This does occur with words extracted from joined-up speech. The start and end phonemes have been influenced audibly by the surrounding sounds.

Example:

(8) a. is dit alles [orthographic]
    b. Iz dIt Al@s [phonetic]

The voiced /d/ at the beginning of dit influences the realization of the /s/ in is. Instead of being voiceless, the /s/ has a voiced pronunciation, as in a /z/.

(9) a. dat die nog loopt [orthographic]
    b. dAt ti nOx lopt [phonetic]

In (9)b the voiced /d/ in die changes under the influence of the voiceless /t/ at the end of dat and also turns voiceless. Both (8) and (9) show examples of the plosive of one word transferring its voiced or voiceless quality to the adjacent phoneme of a neighbouring word.

(10) a. ik schreef veel in bad
     b. Ik sxref fel im bAt

The first example in (10)b, fel, shows the transition of the voiced /v/ to a voiceless phoneme, the /f/. This happens under the influence of the preceding fricative /f/ in schreef. The second example in (10)b, im, shows that the place of articulation of the /n/ approximates that of the /b/, so that the /n/ changes into an /m/. The lips are closed both when producing the /m/ and the /b/.

H. Additional information

H.1 Double shared plosives (September 3, 2003 by W. Goedertier)

Paragraphs D.3.2 (Shared plosive as a word) and D.3.3 (Double shared plosives) explain how shared plosives in word alignment are dealt with. Meanwhile, however, another special case has been found, which combines both phenomena. A practical example (11):

(11) a. dan*z gaat de*d ’t zeker weten [orthographic]
     b. tA Gat_t_t sek@r wet@n [phonetic]

Word alignment should be as follows:

ORT-tier: | dan*z | gaat | _de*d_ | ’t | zeker | weten |
FON-tier: | tA    | Gat  | _t_    | t  | sek@r | wet@n |

The position of the boundaries (timing information of the segments) should then be as follows:
2nd segment (gaat/Gat): contains phonemes /G/ and /a/
3rd segment (_de*d_/_t_): contains phoneme /t/
4th segment (’t/t): contains NOTHING !!

Theoretically the 4th segment should therefore have ZERO duration. However, as a segment with ZERO duration is problematic in Praat, we will allocate a VERY SHORT duration (5 to 10 milliseconds) to this segment. (In automatic word alignment such segments are allocated a duration of exactly 1 millisecond.)

A second practical example (12) of the same phenomenon, which can be solved by analogy:

(12) a. je hebt ’t de hele tijd zo gezegd [orthographic]
     b. j@ Ept_t_t hel tE+t so x@z@xt [phonetic]

H.2 Two words as one (January 15, 2004 by W. Goedertier)

Paragraph D.3.4 (Multiple shared phonemes) explains how to proceed when more than one phoneme is shared by two words. Meanwhile it has become clear that the following case frequently occurs, e.g. (13):

(13) a. heb je je computer al geïnstalleerd? [orthographic]
     b. hEp j@_j@ kOmpjut@r A G@Int@lert [phonetic]

The transcriber who produced the orthographic transcription wrote ‘je’ twice. Grammatically speaking, this is the only thing that makes sense. The transcriber who produced the phonetic transcription, however, only heard /j@/ once. According to the protocol for phonetic transcription this can be noted as /j@_j@/. This way the ‘word for word’ principle is still respected.

When rules D.3.4 and D.3.2 are strictly adhered to, word alignment should be as follows:

ORT-tier: | heb | je | _ | je | computer  | al | geïnstalleerd? |
FON-tier: | hEp | j@ | _ | j@ | kOmpjut@r | A  | G@Inst@lert    |

The position of the boundaries (timing information for the segments) should then be as follows:
2nd segment (je/j@): contains NOTHING, ZERO duration (in practice therefore 5 to 10 milliseconds)
3rd segment (_/_): contains the phonemes /j@/
4th segment (je/j@): contains NOTHING, ZERO duration (in practice therefore 5 to 10 milliseconds)

Here, too, segments with ZERO duration must be used.

However, there is an alternative for handling this situation. Instead of /j@_j@/ you can also write /j_j@/ in the phonetic transcription. According to the protocol for phonetic transcription the interpretation is the same, i.e. only one /j/ and one /@/ are heard. Word alignment is then as follows:

ORT-tier: | heb | je | je  | computer  | al | geïnstalleerd? |
FON-tier: | hEp | j= | =j@ | kOmpjut@r | A  | G@Inst@lert    |

The position of the boundaries (timing information) is then:
2nd segment (je/j=): contains the first half of the phoneme /j/
3rd segment (je/=j@): contains the second half of the phoneme /j/ and the whole of phoneme /@/

In this alternative there is no need to use segments with ZERO duration, which makes it the preferable solution.

A second example (14) of the same phenomenon, which can be solved by analogy, i.e. from (14)b to (14)c:

(14) a. Anders loop je jezelf een beetje… [orthographic]
     b. Anz lop j@_j@zelv @m betj@ [phonetic]
     c. Anz lop j_j@zelv @m betj@ [as in word alignment]
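The preferred alternative in H.2 amounts to a small relabelling step, which can be sketched as follows. This is an illustration under stated assumptions, not part of the protocol: the function name is hypothetical, and the first phoneme of the repeated word is assumed to be written as a single character (true for /j/ in the examples, but not for every symbol in the phoneme set).

```python
def split_repeated_word(token: str) -> tuple[str, str]:
    """Turn 'j@_j@' (or 'j_j@') into the two FON-tier labels ('j=', '=j@').

    Assumption: the material before '_' is repeated at the start of the
    material after it, and its first phoneme is a single character.
    """
    first, sep, second = token.partition("_")
    if not sep or not first or not second.startswith(first):
        raise ValueError(f"not a repeated-word token: {token!r}")
    onset = first[0]  # the phoneme whose two halves are split over both words
    return onset + "=", "=" + second


print(split_repeated_word("j@_j@"))      # example (13)
print(split_repeated_word("j@_j@zelv"))  # example (14)
```

The first label covers the first half of the /j/ and the second label covers the rest, so neither segment needs a zero duration.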