Protocol for manual verification of automatically generated word segmentations
D. Binnenpoorte, 29-01-02 (+ additions 15-01-04)
A. General
A.1 Aim
The aim is to separate words from each other in continuous speech by placing boundaries in
such a way that the individual words sound acceptable acoustically. This means that the words
concerned should be auditively recognizable as such. Experience shows that short words such
as ‘de’, ‘en’, ‘ik’, ‘met’, isolated in continuous speech can barely be recognized. These kinds
of words will then not meet the criterion of ‘acoustically acceptable’, but you should
nevertheless always try to place the boundaries as well as possible.
The purpose of the word boundaries is to serve as anchor points in time. The final user of the
corpus should be able to give a search instruction in which, for example, a word is specified.
The anchor points then mark the beginning and the end of that word in the signal.
The task of the manual aligner is to check the automatically generated word boundaries
according to the following principles and correct them where necessary.
A.2 Basic assumptions
All speech material has been both orthographically and phonetically transcribed. Both
transcriptions have been produced by people and form the basis for the word boundaries which
have already been placed automatically. In order to generate these, automatic speech
recognition has been used. The boundaries found by automatic speech recognition must
therefore be checked manually and adjusted where necessary. The program used is Praat. In
Figure 1 an example of a work screen from the Praat program is shown, in which the following
terms occur:
- Orthographic transcription with boundaries (in upper tier) gives the spelling form of the
words which have been uttered;
- Phonetic transcription with boundaries (second tier) shows how the words have been
realized, using a set of phonetic symbols which indicate their pronunciation.
- The vertical lines between the words in both tiers are the segment boundaries, or word
boundaries. If necessary, these boundaries must be moved by a manual aligner.
- Above the tiers is the oscillogram, a representation of the speech signal.
- With the help of the selection buttons screen size can be determined. In Figure 1 just over a
minute of a longer utterance has been selected.
Figure 1 Praat work screen for verification of word alignment (labelled parts: oscillogram, word boundaries, orthographic tier with boundaries, phonetic tier with boundaries, duration of the screen, selection buttons)
A.3 What are words?
Words are all entities which in orthography and spelling have been separated from each other
by means of a space. The orthography of the CGN also contains entities, codes, such as ‘ggg’
for speaker sounds, or ‘xxx’ for incomprehensible speech. These, too, are words, if they can be
separated from other entities by a space. However, these codes can also occur within words. In
that case, they should not be regarded as words, obviously. You can read more about this in
C.5 and C.6.
A rule of thumb for the verification of automatically detected word segmentations is that all
words are separated from each other by boundaries. Therefore there are boundaries between all
words. There are two exceptions to this rule.
1. Firstly, a pause or silence can occur between words. This pause is then preceded and
followed by a boundary, see paragraph C.4.
2. The second exception applies when two successive words share a plosive sound at their
boundary, i.e. the first word ends in a plosive sound and the following word begins with
the same plosive sound. In that case the shared plosive is also enclosed by boundaries. You
can read more about this in section D.
B. Types of word boundaries
Automatic word boundaries are created when the ASR links the sound symbols from the
phonetic transcription to the corresponding parts of the sound signal. For each phoneme
(sound symbol) a corresponding small part of a signal is found to which it belongs. The final
word boundaries are derived from the phoneme segmentations. These word boundaries are
shown by vertical blue lines. We need to check if these boundaries are in the right place.
We differentiate between different types of word boundaries or places where they occur
perceptively and phonetically:
• The most frequently occurring boundaries are the normal flowing transitions between
the words. The end boundary of the first word coincides with the start boundary of the
following one.
• There can be a pause between two words. This is best shown as a separate
segment, to prevent a selected word from being surrounded by too long a pause.
• Between words, sounds (phonemes) can be inserted to ensure flowing speech. For
example in: Daarom doe [w] ik het nu. This is shown in the phonetic transcription as:
darOm du-w-Ik @t ny. Section E covers this in more depth.
• Words can share phonemes at their boundaries. This is found where word1 ends with
the same sound as word2 begins. For example: …naar Rotterdam… Usually in such
cases the [r] sound is not pronounced twice. In the phonetic transcription these cases
are shown as nar_rOt@rdAm. Section D gives more examples and describes
where the word boundaries should be marked in such cases.
• The other word boundaries can demonstrate assimilation phenomena. This means that
sounds influence each other across word boundaries. In the example: Is dit goed? the
[s] is influenced by the [d] and pronounced as a [z], as is shown phonetically in the
following: Iz dIt Gut? More explanation of this phenomenon can be found in the
section on phonetic background information (G.2).
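The notational cues mentioned in this list can also be spotted mechanically. The following Python sketch is an illustration only and not part of the protocol or of Praat: it assumes that the phonetic transcription is available as a plain string with words separated by spaces, and the function name boundary_cues is hypothetical. It flags chunks containing ‘_’ (shared phoneme) or ‘-’ (inserted sound). Assimilation leaves no marker in the notation and cannot be found this way.

```python
from typing import List, Tuple


def boundary_cues(phonetic: str) -> List[Tuple[str, str]]:
    """Label each space-separated chunk of a phonetic transcription.

    Assumption: '_' marks a shared phoneme and '-' an inserted sound,
    as in the examples nar_rOt@rdAm and du-w-Ik.
    """
    cues = []
    for chunk in phonetic.split():
        if "_" in chunk:
            kind = "shared phoneme"
        elif "-" in chunk:
            kind = "inserted sound"
        else:
            kind = "plain"
        cues.append((chunk, kind))
    return cues


if __name__ == "__main__":
    for chunk, kind in boundary_cues("darOm du-w-Ik @t ny"):
        print(chunk, "->", kind)
```

Such a scan only points at the places that deserve extra attention; the decision about the boundary position still rests on listening.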
C. Basic principles for verification
The most important basic principle for verifying the word boundaries is that it should be
consistent. This can be achieved by applying a few rules.
C.1 Consistency in orthographic transcription
Word alignment must stay consistent with the orthographic transcription. This transcription is
in the top transcription tier, see paragraph A.2. For every word (including ‘xxx’, ‘ggg’) in the
orthographic transcription a segment must be defined in the alignment.
Keep the alignment consistent with the orthographic transcription.
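As an illustration of this rule, the check below compares the words of an orthographic transcription with the labels of the alignment segments. It is a minimal sketch under stated assumptions, not an existing tool: the segment labels are assumed to be available as a list of strings, pauses are assumed to carry an empty label, and the ‘_’ conventions for shared phonemes (sections D and H) are handled by stripping underscores. The function name is hypothetical.

```python
from typing import List


def alignment_matches_orthography(transcription: str, segment_labels: List[str]) -> bool:
    """Return True if every orthographic word has exactly one segment, in order.

    Assumptions: words are separated by spaces; a pause segment has the
    label ""; a separate shared-phoneme segment has the label "_"; a word
    that is itself a shared plosive carries "_" on one or both sides.
    """
    words = transcription.split()
    # Strip the shared-phoneme markers, then drop pauses and bare "_" segments.
    stripped = [label.strip("_") for label in segment_labels]
    aligned = [label for label in stripped if label]
    return aligned == words


if __name__ == "__main__":
    labels = ["heb", "je", "_", "je", "computer", "al", "geïnstalleerd?"]
    print(alignment_matches_orthography("heb je je computer al geïnstalleerd?", labels))
```

A mismatch only signals that something needs attention. It does not say whether the alignment or the transcription is at fault; transcription errors are reported, never corrected by the aligner.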
Hesitation, stuttering and all other interjections are orthographically transcribed. Consider
these phenomena as separate words, surrounded by word boundaries. Hesitation in a word
should never lead to extra word boundaries, as they do not change the number of orthographic
units.
Hesitation, for example uh, and other interjections should be separated by segment
boundaries. If these speech sounds occur IN a word they should not be separated by
segment boundaries.
You are not allowed to make any changes to the orthographic transcription. Any errors or
oddities should be reported to your project manager.
NEVER make any changes to the orthographic transcription.
In the orthographic transcription words can be enhanced with so-called star codes. These codes
are used in the orthography to differentiate between the following phenomena:
*v → words from a foreign language
*d → dialect words
*z → strong dialect words
*n → ‘new’ words
*t → new interjections
*a → broken off words
*u → strange pronunciation or mispronunciations
*x → difficult to understand words
Never change anything in these codes, but use the extra information when checking the word
boundaries. The code and the word it is attached to form a single entity.
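To show how a star code and its word form one unit, the sketch below splits a segment label such as dan*z into the word and the code. It is illustrative only: the function name is hypothetical, and it assumes that the code is the single letter after the last ‘*’ in the label, as in the examples dan*z and de*d that appear later in this protocol.

```python
from typing import Optional, Tuple

# Star codes as listed in the protocol.
STAR_CODES = {
    "v": "word from a foreign language",
    "d": "dialect word",
    "z": "strong dialect word",
    "n": "'new' word",
    "t": "new interjection",
    "a": "broken off word",
    "u": "strange pronunciation or mispronunciation",
    "x": "difficult to understand word",
}


def split_star_code(token: str) -> Tuple[str, Optional[str]]:
    """Split 'dan*z' into ('dan', 'z'); return (token, None) if there is no code."""
    word, sep, code = token.rpartition("*")
    if sep and code in STAR_CODES:
        return word, code
    return token, None


if __name__ == "__main__":
    word, code = split_star_code("dan*z")
    print(word, code, STAR_CODES[code])
```

The split is for inspection only: in the alignment the label stays exactly as transcribed.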
C.2 Consistency in phonetic transcription
The phonetic transcription tier, see paragraph A.2, contains a great deal of information on the
phenomena, listed in section B, which can occur on word boundaries. For example:
(1)
a. zE+ hAd@n nits [zij hadden niets]
b. zE+ hAd@n_nits [zij hadden niets]
c. zE+ hAd@ nits [zij hadden niets]
These examples are different versions of the sentence ‘zij hadden niets’, as they have been
transcribed by expert listeners from the phonetic transcription group. Example (1)a shows a
pronunciation of the sentence in which both the final /n/ in ‘hadden’ and the first /n/ in ‘niets’
are pronounced. The two /n/s are easy to separate. This situation will not occur very often in
joined-up speech. Example (1)b is a case of a split phoneme, degemination. The idea is that
the phoneme is audible in both words, see section D. Example (1)c shows that the /n/ in
‘hadden’ is not pronounced at all. The boundary between the words here lies after the /@/ and
before the /n/ of ‘niets’. The phonetic and orthographic transcriptions given are decisive for
the auditive demands placed on the segmented words. The transcriptions have been made by
trained transcribers who also have used context information during the generation of the
transcriptions.
Keep the alignment consistent with the phonetic transcription.
The same applies here as with the orthographic transcription; you are not allowed to make any
changes to the phonetic transcription. Oddities or errors should be reported to the project
manager.
NEVER change anything in the phonetic transcription.
C.3 Do not move the boundaries unless really necessary
The word boundaries which have been generated by the automatic speech recognizer have to
be checked. For this each segmented word has to be listened to. If, on the basis of the auditive
impression, you feel a word boundary has to be moved, this can only be done by listening to
the word before the boundary and the word after it. Once again the criterion for moving a
boundary is always the auditive impression. For long words the rule of thumb is: if the word
sounds acoustically acceptable, then the boundary is acceptable. For short words it is often the
case that moving the boundary makes no difference to the original position. In these cases the
boundaries should be left where they are. It is possible to use the ‘undo’ function in Praat to
put the boundary back in its previous position. NB: if the boundary has been moved twice from
its original position, the ‘ongedaan maken’ (‘undo’) function only restores the position before
the last move, i.e. not the original position.
When carrying out these checks be pragmatic: ‘Good is good’.
Don’t move the boundaries unnecessarily over ‘small distances’.
C.4 Pauses and other non-transcribed sounds
The automatic segmentation also contains ‘silences’ marked as a separate segment. Only move
these boundaries if the word before or after the silence doesn’t sound right, or when the
so-called ‘silence’ definitely contains a piece of transcribed speech. In exceptional cases it is
possible that a ‘silence’ has to be removed completely. If the automatic segmentation has not
marked annoying ‘silences’, add them yourself using an empty transcription.
Change the automatically identified pauses as little as possible.
If annoying non-transcribed sounds have not yet been automatically segmented, add the
boundaries for these sounds.
Figure 2 illustrates how pauses can occur between words and can be enclosed by segment
boundaries.
To prevent too many pause segments from being enclosed by boundaries, we have created the
rule that pauses shorter than 50 ms (0.050 s) are not shown as a separate segment in automatic
segmentation. If, as a result of moving boundaries, a silence or pause becomes shorter than
50 ms, it must be removed.
Pauses or silences shorter than 50 ms are not marked.
A pause or silence starts once the previous word has been completed.
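The 50 ms rule lends itself to a simple check after boundaries have been moved. The sketch below is an illustration under stated assumptions, not a Praat function: segments are assumed to be given as (start, end, label) tuples in seconds, a pause is assumed to carry an empty label, and the names are hypothetical. It only reports which pause segments have become too short; removing them remains a manual step.

```python
from typing import List, Tuple

# (start in seconds, end in seconds, label); an empty label marks a pause (assumption)
Segment = Tuple[float, float, str]

MIN_PAUSE = 0.050  # seconds: pauses shorter than this are not marked


def short_pause_indices(segments: List[Segment], min_pause: float = MIN_PAUSE) -> List[int]:
    """Return the indices of pause segments shorter than min_pause."""
    indices = []
    for index, (start, end, label) in enumerate(segments):
        if label == "" and (end - start) < min_pause:
            indices.append(index)
    return indices


if __name__ == "__main__":
    tier = [
        (0.00, 0.40, "dat"),
        (0.40, 0.43, ""),    # 30 ms: too short to keep as a pause
        (0.43, 0.90, "is"),
        (0.90, 1.10, ""),    # 200 ms: a regular pause
        (1.10, 1.50, "goed"),
    ]
    print(short_pause_indices(tier))
```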
C.5 Segmenting speaker sounds
Speaker sounds, such as laughing, yelling, coughing, are indicated in the orthographic
transcription tier as ‘ggg’, whilst in phonetic transcription the ‘#’ sign is used. They must be
seen as words, if they are enclosed by spaces. Naturally, they must also be verified. Always
determine the start boundary by listening to the previous word and only change its position, if
this word does not sound acceptable. In order to determine the end boundary of the speaker
sound, the following word must sound good. Therefore, the speaker sound, ‘ggg’, is of minor
importance in relation to the surrounding words. So do not move the boundaries enclosing
these sounds unnecessarily.
Speaker sounds outside of the words are separated by segment boundaries.
C.6 Segmenting ‘xxx’
In transcriptions ‘xxx’ codes also occur, in phonetic transcription these are indicated as ‘[]’.
They indicate one or more successive words which have not been understood properly. Such
parts must be interpreted as one word, which should be separated by segment boundaries.
Segment incomprehensible speech as one word enclosed by segment boundaries.
‘xxx’ codes can also occur within words. Words in which this occurs must be treated as normal
words and be enclosed by boundaries in their entirety.
C.7 Example
Figure 2 contains a fragment in which most of the phenomena described above occur.
Important points are:
• The end boundary of word1 overlaps the start boundary of word2. The boundary is in
between the two words;
• Incomprehensible speech (‘xxx’) is enclosed by word boundaries;
• Speaker sounds (‘ggg’) are also enclosed by word boundaries. Both the
incomprehensible speech and the speaker sounds are transcribed entities;
• A pause (either silence or annoying background noise) starts where the previous word
(in this case: ‘ggg’) ends. A pause ends where the next word (in this case: ‘duidelijk’)
starts.
Figure 2 Examples of pauses and speaker sounds
D. Shared phonemes at word boundaries
In section B we have briefly discussed which phenomena can occur at word boundaries in
joined-up speech. To repeat; in joined-up speech words are not separated by pauses, like
spaces in written language, but the sounds continue in an uninterrupted flow. There are, of
course, pauses for breathing, or for other reasons which contribute to enhancing
communication. The word boundary types which could be problematic are those at which
shared phonemes (section D) or, the reverse, inserted phonemes (section E) occur. Below we describe how
to check word boundaries for each phenomenon. Every type has its own notations and codes,
which have already been indicated automatically and are therefore easier to trace. The extra
codes serve as pointers; they should not be changed or added.
D.1 What are shared phonemes?
In joined-up speech assimilation, or ‘adaptation’, of sounds occurs, both within words and at
word boundaries. A special case of assimilation at word boundaries is called degemination.
This can occur when the final sound of one word is the same as the initial sound of the
following word, so that the sound is shared by both words. For example:
(2)
a. Ik wil naar Rotterdam. [shared sonorant]
b. Hij is zeker vandaag gevallen. [shared fricative]
c. Hij komt terug. [shared voiceless plosive]
d. Ik heb belachelijk veel te doen. [shared voiced plosive]
e. Hij wil vast stoppen. [multiple shared phoneme sequence]
f. Het liefst sta ik vroeg op. [multiple shared phoneme sequence]
Such sounds are often pronounced as a single sound and will therefore probably be difficult to
separate. Guidelines for such cases are given below.
D.2 Non-plosives as shared phoneme
In the case of example (2)b: ‘Hij is zeker vandaag gevallen’ you could easily use roughly the
middle of the ‘g’ as the boundary between /vandaag/ and /gevallen/. This is also the case with
(2)a: ‘Ik wil naar Rotterdam’. This type of phoneme, the non-plosives, is particularly suitable
for this, as, contrary to plosives, they do not consist of two phases, also see paragraph D.3.
Since such word pairs share a phoneme, they will not sound the same after the boundary has
been inserted as they would when pronounced in isolation. The shared phonemes of the two
words will not be longer than when they are pronounced within a word. They must
nevertheless be separated, so that words will not sound ‘complete’, but are still easily
recognizable.
If in a phoneme more than one energy maximum occurs, which is possible with an /r/, for
example, as illustrated in figure 3, an energy minimum close to the middle must be selected.
Figure 3 Non-plosive as shared phoneme: approximate middle
Figure 3 gives an example, from sample sentence (2) a. It is clearly visible that roughly the
middle of the /r/ can constitute a word boundary. In the tier with the phonetic transcription this
boundary is marked with a ‘=‘-sign to indicate that in this case the /r/ is a shared phoneme so
that the words sharing the /r/ do not sound complete.
In the case of shared phonemes, which are non-plosives, make sure that the shared
phoneme is audible in both words.
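The guideline of taking roughly the middle of the shared phoneme, or an energy minimum close to the middle when there are several energy maxima, can be sketched as follows. This is an illustration only: it assumes that a short-time energy value per analysis frame is already available for the stretch of the shared phoneme, and the function name is hypothetical. It does not replace the auditive check described above.

```python
from typing import List


def boundary_frame(energies: List[float]) -> int:
    """Pick the frame index at which to split a shared non-plosive phoneme.

    Assumption: 'energies' holds one energy value per frame, covering only
    the shared phoneme. If there are local energy minima, the one closest
    to the middle is returned; otherwise the middle frame itself.
    """
    n = len(energies)
    if n == 0:
        raise ValueError("no frames given")
    middle = (n - 1) / 2
    # A local minimum is strictly lower than both of its neighbours.
    minima = [
        i for i in range(1, n - 1)
        if energies[i] < energies[i - 1] and energies[i] < energies[i + 1]
    ]
    if not minima:
        return n // 2
    return min(minima, key=lambda i: abs(i - middle))


if __name__ == "__main__":
    # Two dips (frames 1 and 5); frame 5 lies closer to the middle (frame 4).
    print(boundary_frame([0.9, 0.1, 0.9, 0.8, 0.7, 0.2, 0.9, 0.9, 0.9]))
```

The returned frame index still has to be converted to a time and judged by ear: the shared phoneme must remain audible in both words.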
D.3 Plosives (/p/, /t/, /k/, /b/, /d/, /g/) as shared phoneme
The shared phonemes in examples (2)c and (2)d are plosives. The plosives in Dutch are
/p/, /t/, /k/ (voiceless plosives) and /b/, /d/, /g/ (voiced plosives). These phonemes are
categorized as such, because when pronouncing them a type of explosion takes place. An
enclosed space is created with one of the articulators, see paragraph G.2, which is then
suddenly opened, so that the air inside it is released. The location of the enclosed space and
whether or not there is vibration of the vocal cords, determines which plosive is realized.
Before the air is released, air pressure first builds up in the enclosed space.
This is called the occlusion. Then the air is released, the so-called burst, producing the sound.
If two words share a plosive at their boundary, as in (2) c and (2) d, the build-up of air pressure
and its release do not occur twice in joined-up speech.
If the two words are to be separated, and the middle of the duration of a plosive is taken as a
guideline for the point of separation, you will usually hear nothing of the identity of the
plosive in the first word. After all, in that part the build-up of air-pressure is taking place, and
the plosive will only be audible in the second word. Figure 4 clearly shows the problem. Here
@kAn is shown as part of the word pair ‘de kantine’. The plosive /k/ has been circled. We can
see how the plosive sound is made up of a part where air-pressure build up takes place, the
occlusion, and a part in which the occlusion is opened and the air released, the burst. This is an
example of a voiceless plosive, the /k/. A similar situation occurs with the other voiceless
plosives and the voiced plosives, /b/, /d/ and /g/. The difference between voiced and
voiceless plosives is vibration of the vocal cords, which is present both during the build up of
air-pressure and the burst in voiced plosives. The occlusion stage of voiced plosives will show
vibration with small amplitude, instead of a flat line. However, the transition to the burst is
usually still clearly visible.
Figure 4 Voiceless plosive, /k/
Strictly speaking, shared plosives cannot be separated, so we will not attempt to do so. Instead
the shared plosive is seen as an independent segment.
D.3.1. Singular shared plosives
Figure 5 is the visualization of example (2) c in which the /t/ occurs as a shared plosive in the
sentence ‘Hij komt terug’. As you can see, the /t/ as a whole is treated as a separate segment
and is shown in the work sheet as underscore, ‘_’. When listening to it, only one /t/ will be
audible. In order to listen to the word ‘komt’ two segments must be selected, both the part
/kOm/ and the segment containing the complete /t/, and listened to in succession. To listen to
the word ‘terug’, start the selection with the /t/ segment and add the /@rYx/ segment.
Figure 5 Shared voiceless plosive
Figure 6 is an example of a shared voiced plosive. It shows a detail from example (2)d: ‘Ik heb
belachelijk veel te doen’. Here, too, the word ‘heb’ must be made audible by selecting both the
segment in which /hE/ can be heard and the segment which only contains the /b/, and listening
to them in succession.
Figure 6 Shared voiced plosive
The same applies again for the word ‘belachelijk’; first the segment with the /b/ and then the
segment with /@lAx@l@k/.
Shared plosives are placed in a separate segment, represented by a ‘_’ in both the
phonetic and the orthographic tier and must be joined together with the previous or
the next word to be listened to.
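The selection needed to listen to a word next to a shared plosive can be expressed as a small helper. The sketch below is a minimal illustration under stated assumptions, not part of Praat: segments are assumed to be (start, end, label) tuples in seconds, the separate shared-plosive segment is assumed to carry exactly the label ‘_’, and the function name and the times in the example are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start in seconds, end in seconds, label)


def listening_span(segments: List[Segment], index: int) -> Tuple[float, float]:
    """Return the time span to play in order to hear the word at 'index'.

    A neighbouring segment labelled "_" (a shared plosive) is added to the
    selection, on the left and/or on the right.
    """
    start, end, _ = segments[index]
    if index > 0 and segments[index - 1][2] == "_":
        start = segments[index - 1][0]
    if index + 1 < len(segments) and segments[index + 1][2] == "_":
        end = segments[index + 1][1]
    return start, end


if __name__ == "__main__":
    # 'Hij komt terug' with the shared /t/ as a separate segment (invented times)
    tier = [(0.00, 0.20, "hij"), (0.20, 0.45, "komt"), (0.45, 0.52, "_"), (0.52, 0.90, "terug")]
    print(listening_span(tier, 1))  # 'komt': /kOm/ plus the /t/ segment
    print(listening_span(tier, 3))  # 'terug': the /t/ segment plus /@rYx/
```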
D.3.2 Shared plosive as a word
As a result of quick pronunciation, words are frequently reduced. With short words, as little as
one sound may remain. For example: het is is reduced to ’t is.
It is, of course, possible for a reduced word to share a plosive with another word, or share
itself with another word, as shown in the examples below:
(3)
a. dat ’t niet uitmaakt [orthography]
b. dAt_t nit Y+tmakt [phonetic representation]
c. dus ’k keek naar hem [orthography]
d. dYs k_kek nar hEm [phonetic representation]
The words ‘’t’ (het) and ‘’k’ (ik) are shown as such brief forms that their entire duration is
overlapped by the first (or last) plosive of another word. Nevertheless, there is enough
evidence for the orthographic and phonetic transcribers to see them as separate words.
We have chosen to treat such cases separately. The words which are made up of a plosive that
is shared are not divided into two parts, the word and the ‘_’-sign, but retain the status of a
word and remain a single segment. To distinguish a word ‘in its own right’ from a word
that, as in the examples in (3), is also a shared plosive, the latter must be marked
in both the orthographic and the phonetic tier with a ‘_’ sign on the side where the phoneme is
shared within the segment. The examples in (3) are represented in the orthographic tier as
follows:
(4)
a. dat _’t niet uitmaakt [ _ to the left of the ’t]
b. ’k_ keek naar hem [ _ to the right of the ’k]
Figure 7 clarifies this solution. In this figure part of example (3) c/d is shown. In order to be
able to listen to the word ‘keek’ both the segment containing ‘k_’ and the ‘keek’ segment must
be listened to.
Figure 7 Short plosive words: as one segment
D.3.3 Double shared plosives
We talk about double shared plosives when a shared plosive of two words is itself a word. For
example:
(5)
a. dat ’t toch niet kan [orthography]
b. dAt_t_tOx nit kAn [phonetic representation]
c. dan maak ’k kip met rijst [orthography]
d. dAn mak_k_kIp mEt rE+st [phonetic representation]
Both example (5)b and (5)d show that the words ’t and ‘k respectively have been reduced to
such an extent that they overlap with the words preceding and following them. In that case we
have a so-called double shared plosive. The plosive shares itself with the previous and the
following words.
For such cases a separate annotation, ‘_t_’, has been chosen in order to indicate that this is a
double shared plosive in this particular segment. Figure 8 is a visualization of example (5) a/b.
To make things clear, in order to be able to listen to the word ‘dat’, the segment with the ‘_t_’ label must also be selected. For the word ‘’t’ only the exception segment ‘_t_’ must be
selected, and, finally, to listen to the word ‘toch’, both the segment with ‘_t_’ and the segment
‘toch’ must be selected and played.
Figure 8 Double shared plosive
D.3.4 Multiple shared phonemes
In the case of multiple shared phonemes a series of phonemes is shared at the word
boundaries. Examples (6) a and (6) c (from section D.1), show two phonemes which are
shared.
(6)
a. Hij wil vast stoppen. [multiple shared phoneme sequence]
b. hE+ wIl vAst_stOp@n [phonetic transcription]
c. Het liefst sta ik vroeg op. [multiple shared phoneme sequence]
d. @t lifst_sta Ik vrux Op. [phonetic transcription]
It is impossible to say with certainty where the pronunciation of the first word ends and that of
the second starts, i.e. to part of which word this shared phoneme sequence belongs. This
phenomenon must be treated along the same lines as singular shared plosives. The sequence of
shared phonemes must be seen as one entity and placed in a separate segment. In both the
orthographic and the phonetic tier it is represented by the ‘_’ sign.
In Figure 9 the sentence of example (6) c/d is shown with the correct segment boundaries.
Figure 9 Multiple shared phonemes
E. Inserted sounds
Another phenomenon is the insertion of sounds, as in:
(7)
a. Daarom doe /w/ ik het nu.
b. Toen belde /n/ ie naar huis.
In such cases a sound is inserted to enhance the flow of speech. It is a type of linking sound,
which is the result of the articulators having to move from one extreme position to the other.
This phenomenon is optional. If two words joined like this are separated, part of the inserted
phoneme, like the /w/ in (7) a, will remain audible, when both words are listened to. The
guideline for determining the boundary position in cases of a linking sound is roughly in the
middle of the linking sound. In contrast with shared phonemes, where audibility of the shared
phoneme is very important, this is less important for inserted phonemes. The most important
criterion is that the words on either side of it sound acceptable. This also applies if the inserted
phoneme is a plosive. In the latter case, this plosive is not treated as a separate word.
Figure 10 Inserted linking sound: approximate middle (labelled parts: inserted /w/, oscillogram, orthography with boundaries, phonetic tier with boundaries)
Figure 10 illustrates the phenomenon of insertion with a /w/. Note both the phonetic and the
orthographic transcription. In the phonetic transcription inserted phonemes are represented by
a ‘-‘, between the word and the inserted sound.
Figure 11 gives an example in which the inserted phoneme is a plosive, a /t/. As already said,
these are treated in the same way as inserted non-plosives. Since there is no middle in a
plosive, the criterion here is that the words on either side of the plosive must sound acceptable.
Figure 11 Inserted plosive: both words acceptable (labelled parts: inserted /t/, oscillogram, orthography with boundaries, phonetic tier with boundaries)
F. Modus operandi
Summarizing, checking automatic word boundaries is realized as follows:
• Do not change the transcriptions, but always report possible errors. This goes for
orthographic as well as phonetic transcriptions;
• Use the sound, orthography and phonetic transcriptions as a guideline;
• Listen word for word. If something does not sound acceptable, listen to the word pairs
on either side of that specific boundary;
• Do not move boundaries unnecessarily: good is good;
• Only use the graphical information from the oscillogram to supplement the auditive
impression;
• For a workable set-up select a frame size of 1 to no more than 2 seconds;
• Apply the above rules for moving boundaries.
G. Background information
G.1 Continuous speech
The material compiled in the Corpus Gesproken Nederlands (CGN) consists of a multitude of different speech types.
These forms of speech vary from spontaneous dialogues to speech read out loud (monologues). Speech can be
characterized as joined-up speech. This means that no pauses are inserted between the words, so that words are not
spoken in isolation. In joined-up speech the final sound of the first word and the initial sound of the second are often
joined together. Below you will find a brief phonetic explanation of the phenomena found in relation to boundaries,
which are relevant to the placement of word boundaries.
G.2 Phonetics
When producing speech you use your articulators. These include your lips, tongue, jaw, vocal cords, etc. Together these
articulators help to produce speech. They are attuned to each other when it comes to timing (your lips are opened at the
right time when you want to say the word /maar/ and proceed from the /m/ to the /a/-sound). But they are also in tune
with each other when it comes to place. Place here refers to the location where the narrowing or complete closing off of
the opening between the tongue and the palate takes place, at the front or the back of the oral cavity. Each position of
the tongue results in a specific sound. For example, a /x/-sound or a /G/-sound is produced when the closure is at the
back, whereas an /l/-sound can be produced when the tip of the tongue is placed against the inside of the upper teeth.
Naturally, the position of the other articulators at that point is decisive for the specific sound produced. Apart from
changes in the position of the articulators, there is also diversification as a result of vibration or non-vibration of the
vocal cords. The difference between the /s/ and the /z/ lies in the vibration of the vocal cords, which is present in the
/z/-sound, but absent with the /s/. The position of the articulators remains unchanged in this instance. The /z/-sound is a
voiced sound, the /s/ is a voiceless sound.
An utterance, a sentence or a word, consists of a sequence of different sounds (phonemes) each of which is produced in
its own way (positions of the articulators). The changes in the position of the articulators are rapid. When sounds are
produced in rapid succession, they will influence each other. The articulators are probably unable to take their
characteristic position when the next sound requires another extreme position. This influencing phenomenon is called
coarticulation. It is also audible. An extreme form of coarticulation, whereby even phoneme identity changes, is called
assimilation.
If words are extracted from isolated speech, i.e. when there is a clear pause between the words, they will sound
different to words extracted from joined speech. In the case of words pronounced in isolation, no assimilation will
occur at the start or the end sounds, phonemes, of the word. This does occur with words extracted from joined-up
speech. The start and end phonemes have been influenced audibly by the surrounding sounds.
Example:
(8)
a. is dit alles [orthographic]
b. Iz dIt Al@s [phonetic]
The voiced /d/ at the beginning of dit influences the realization of the /s/ in is. Instead of being voiceless the /s/ also has
a voiced pronunciation, as in a /z/.
(9)
a. dat die nog loopt [orthographic]
b. dAt ti nOx lopt [phonetic]
In (9) b the voiced /d/ in die becomes voiceless under the influence of the voiceless /t/ at the end of dat.
Both (8) and (9) show examples of a plosive in one word transferring its voiced or voiceless quality to the adjacent
phoneme of the neighbouring word.
(10)
a. ik schreef veel in bad
b. Ik sxref fel im bAt
The first example in bold print in (10)b shows the transition of the voiced /v/, to a voiceless phoneme, the /f/. This
happens under the influence of the preceding fricative /f/ in schreef. The second example in (10)b shows that the place
of articulation of the /n/ approximates that of the /b/ so that the /n/ changes into an /m/. The lips are closed both when
producing the /m/ and the /b/.
H. Additional information
H.1 Double shared plosives
(September 3, 2003 by W. Goedertier)
Paragraphs D.3.2 (Shared plosive as a word) and D.3.3 (Double shared plosives) explain how shared plosives in word
alignment are dealt with. Meanwhile, however, another special case has been found, which combines both phenomena.
A practical example (11):
(11)
a. dan*z gaat de*d ’t zeker weten [orthographic]
b. tA Gat_t_t sek@r wet@n [phonetic]
Word alignment should be as follows:
ORT-tier: | dan*z | gaat | _de*d_ | ’t | zeker | weten |
FON-tier: | tA    | Gat  | _t_    | t  | sek@r | wet@n |
The position of the boundaries (timing information of the segments) should then be as follows:
2nd segment (gaat/Gat): contains phoneme /G/ and /a/
3rd segment (_de*d_/_t_): contains phoneme /t/
4th segment (’t/t): contains NOTHING !!
Theoretically the 4th segment should therefore have ZERO duration. However, as a segment with ZERO duration is
problematic in Praat, we will allocate VERY SHORT duration (5 to 10 milliseconds) to this segment. (In automatic
word alignment such segments are allocated a duration of exactly 1 millisecond.)
A second practical example (12) of the same phenomenon which can be solved by analogy:
(12)
a. je hebt ’t de hele tijd zo gezegd [orthographic]
b. j@ Ept_t_t hel tE+t so x@z@xt [phonetic]
H.2 Two words as one
(January 15, 2004 by W. Goedertier)
Paragraph D.3.4 (Multiple shared phonemes) explains how to proceed when more than one phoneme is shared by two
words. Meanwhile it has become clear that the following case frequently occurs, e.g. (13):
(13)
a. heb je je computer al geïnstalleerd? [orthographic]
b. hEp j@_j@ kOmpjut@r A G@Inst@lert [phonetic]
The transcriber who produced the orthographic transcription wrote "je" twice. Grammatically speaking, this is the only
thing that makes sense. The transcriber who produced the phonetic transcription, however, only heard /j@/ once.
According to the protocol for phonetic transcription this can be noted as /j@_j@/. This way the ‘word for word’ principle is still respected. When rules D.3.4 (and D.3.2) are strictly adhered to, word alignment should be as follows:
ORT-tier: | heb | je | _ | je | computer  | al | geïnstalleerd? |
FON-tier: | hEp | j@ | _ | j@ | kOmpjut@r | A  | G@Inst@lert    |
The position of the boundaries (timing information for the segments) should then be as follows:
2nd segment (je/j@): contains NOTHING, ZERO duration (in practice therefore 5 to 10 milliseconds)
3rd segment (_/_): contains the phonemes /j@/
4th segment (je/j@): contains NOTHING, ZERO duration (in practice therefore 5 to 10 milliseconds)
Here, too, segments with ZERO duration must be used.
However, there is an alternative for handling this situation. Instead of /j@_j@/ you can also write /j_j@/ in the phonetic
transcription. According to the protocol for phonetic transcription the interpretation is the same, i.e. only one /j/ and one
/@/ is heard. Word alignment is then as follows:
ORT-tier: | heb | je | je  | computer  | al | geïnstalleerd? |
FON-tier: | hEp | j= | =j@ | kOmpjut@r | A  | G@Inst@lert    |
The position of the boundaries (timing information) is then:
2nd segment (je/j=): contains the first half of the phoneme /j/
3rd segment (je/=j@): contains the second half of the phoneme /j/ and the whole of phoneme /@/
In this alternative there is no need to use segments with ZERO duration, which therefore makes this concept preferable.
A second example (14) of the same phenomena, which can be solved by analogy, i.e. from (14)b to (14)c:
(14)
a. Anders loop je jezelf een beetje… [orthographic]
b. Anz lop j@_j@zelv @m betj@ [phonetic]
c. Anz lop j_j@zelv @m betj@ [as in word alignment]