CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
SPEECH SYNTHESIZER FOR THE IBM PC
A graduate project submitted in partial satisfaction of the
requirements for the degree of Master of Science in
Engineering
by
Raphael Kfir
January 1988
The graduate project of Raphael Kfir is approved:
Prof.
Prof.
Prof.
California State University, Northridge
To my parents
TABLE OF CONTENTS
DEDICATION
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
Chapter
1. INTRODUCTION TO SYNTHETIC SPEECH
1.1 Electronic Voice
1.2 The Formation of Speech Waves
1.3 The Elements of Speech Sounds
    Phonemes
1.4 The Techniques of Synthesis
    Amplitude Sampling
    LPC
    Non-Linear PCM
    ADPCM
    Parametric Coding
    Formant Synthesizer
    Inevitable Tradeoffs
2. THE IBM PERSONAL COMPUTER
2.1 Introduction
2.2 System Description
2.3 I/O Channel
3. THE SSI263A PHONEME SPEECH SYNTHESIZER
3.1 Introduction
3.2 The SSI263A Functional Description
    Production of Speech
    Modes of Operation
    The Speech Attribute Registers
    Phoneme Selection
    Duration Adjustment
    Articulation Adjustment
    Amplitude Adjustment
    Inflection Adjustment (pitch)
    Rate of Speech
    Filter Frequency
3.3 Description of Pin Function
4. HARDWARE DESIGN
4.1 System Overview
4.2 Interface and Address Decoding
    Input Buffering
    Data Direction Decoding
    Address Decoder
4.3 Speech Processor and Supporting Circuitry
    Reset/Power Down
    Clock Input
4.4 Audio Output Circuit
    The Filter
    Power Amplifier
    Speaker Interface
    Overall Audio Output Performance
4.5 Power Supplies Decoupling
5. SOFTWARE DESIGN
5.1 General Description
5.2 Keyboard Input Mode
5.3 Text File Input Mode
5.4 Talk File Input Mode
5.5 Dictionary Mode
    Search Method
    Vocabulary
6. SPEECH DEVELOPMENT
6.1 Introduction
6.2 Phoneme Discussion
    Consonant Sound
    Vowel Sounds
6.3 Phonetic Programming Methodology
6.4 Text Entry Format
6.5 Phonetic Transcription Development Sheet
6.6 An Example of Phonetic Transcription Development
7. TEST AND CALIBRATION
7.1 Calibration
    Hardware Calibration
    Software Calibration
7.2 Test
8. CONCLUSIONS
REFERENCES
APPENDIX
A. Speech Card Drawings
B. IBM Memory and I/O Map
C. Phonetic Speech Development Sheet
D. Program Listing and Flow Charts
E. Phoneme Chart
F. Overall Audio Stage Analysis Using Spice
LIST OF TABLES
Table 1. Examples of Phonemes and Their Formant Frequencies
Table 2. SSI263A Modes of Operation
Table 3. SSI263A Registers
Table 4. Classification of Attributes
Table 5. Phoneme Duration Chart
Table 6. Articulation Adjustment Chart
Table 7. Amplitude Setting
Table 8. Target Inflection Frequencies (Hz)
Table 9. Slope of Inflection
Table 10. Speech Rate
Table 11. Address Decoding
Table 12. Filter's Second Section as a Function of Volume Control Setting
Table 13. Program Speech Attributes
Table 14. Phonetic Dictionary Basic Dictionary
LIST OF FIGURES
Figure 1. I/O Channel Connector
Figure 2. Simplified Block Diagram of Speech Production
Figure 3. The SSI263A Block Diagram
Figure 4. Acoustic Spectrum of the Phonemes /AW/, /E/, and /I/
Figure 5. System Block Diagram
Figure 6. Speech Card Block Diagram
Figure 7. Buffers and Data Direction Decoder Block Diagram
Figure 8. Address Decoder Circuit Diagram
Figure 9. Reset/Power Down Circuit Diagram
Figure 10. Clock Driver Circuit Diagram
Figure 11. Audio Output Filter Circuit Diagram
Figure 12. Filter's Second Section Gain and Roll Off Frequency as a Function of Volume Control Setting
Figure 13. Audio Output Filter Frequency Response
Figure 14. LM386 Power Dissipation and THD Level as a Function of Output Power
Figure 15. Speaker Interface Circuit Diagram
Figure 16. Overall Audio Stage Frequency Response (wide band)
Figure 17. Overall Audio Stage Frequency Response (narrow band)
Figure 18. Program Block Diagram
Figure 19. Phonetic Speech Development Methodology
Figure 20. Phonetic Transcription Development of the Word "HELLO", Step 1
Figure 21. Phonetic Transcription Development of the Word "HELLO", Step 2
Figure 22. Phonetic Transcription Development of the Word "HELLO", Step 3
Figure 23. Phonetic Transcription Development of the Word "HELLO", Step 4
Figure 24. Phonetic Transcription Development of the Word "HELLO", Step 5
Figure 25. Phonetic Transcription Development of the Word "HELLO", Step 6
Figure 26. Phonetic Transcription Development of the Word "HELLO", Step 7
Figure 27. Phonetic Transcription Development of the Word "HELLO", Step 8
Figure 28. Wideband Spectrograph of the Word "hello" Generated by the Author's Voice
Figure 29. Wideband Spectrograph of the Word "hello" Produced by the Speech Synthesizer
ABSTRACT
SPEECH SYNTHESIZER FOR THE IBM PC
by
Raphael Kfir
Master of Science in Engineering
The objective of this project is to design, build, and program a phonetic speech synthesizer to be used with an IBM Personal Computer. It involves hardware and software development, and the final product may be conceived as a phonetic transcription development system for continuous speech. In addition, the project contains an overview of synthetic speech and a user's guide for speech development.
Speech is made possible with a Voice Synthesizer Processor integrated circuit manufactured by Silicon Systems Incorporated. Speech is generated by combining basic speech elements called phonemes, and by controlling parameters such as duration, pitch, amplitude, and others. The circuit is built on a card which plugs into any of the peripheral connector slots inside the IBM PC. An 8-ohm speaker is used to reproduce the output speech. The software was written in Pascal and includes the utility program TALK.COM and the phonetic dictionary DICT.PHM. It provides a complete tool by which speech may be created, stored, and retrieved.
Speech synthesis adds a new dimension to many systems. The intent of the project was to develop the necessary tools for its production. The product presented here is an inexpensive, flexible, and easy to use development system for speech generation. The system was designed and fabricated. Its performance was found to be acceptable, as indicated by the test results.
CHAPTER 1
INTRODUCTION TO SPEECH SYNTHESIZING
1.1 Electronic Voice
When a computer or other product talks to you, it is said to have voice output. This simply means that the device outputs a synthesized word or phrase. There are different technologies that one can apply to generate electronic voice. High quality systems can generate quite natural voices which sound nearly as good as a human speaker. Voice output offers several distinct advantages over other types of output:
• It is omnidirectional.
• It reduces the operator's visual workload.
• It offers better comprehension than other means.
• It can be transmitted over a telephone without additional special equipment.
However, synthesized speech lacks the visual cues of human conversation. Experiments by Dr. David Pisoni of Indiana University indicate that synthetic speech imposes greater demands on human memory than does natural speech, simply because visual cues such as facial expression, gestures, and lip reading are missing. Yet, in many applications, a simple synthesized message can be valuable to warn an operator of an imminent emergency.
1.2 The Formation of Speech Waves
If a signal in the frequency range of 20 Hz to 20 kHz is connected to a loudspeaker, a sound is produced, and the air surrounding the ear is perturbed. Through the mechanism of hearing, this perturbation is perceived as weak or strong, pulsative or repetitive, and periodic or nonperiodic.
For the human speaker, the vocal tract causes the outgoing air stream to pulsate. When the air is pushed out from the lungs, it must flow through the glottis, a passage between the vocal cords, through the pharynx cavity, and then through the oral and nasal cavities before reaching the outside air. It is at the glottis and oral cavity that the air stream is mostly modulated at a temporal rate high enough to make it audible. The vocal cords make the glottis vibrate. The jaw, tongue, and lips change the size and shape of the oral cavity for each type of sound generated. The constrictions are created by the tongue, the teeth, the lips, the soft palate, and the hard palate to produce the perturbation of the speech waves.
For the electronic speech synthesizer, elaborate techniques are needed to achieve sound quality similar to that of a human speaker.
1.3 The Elements of Speech Sound
The speech sound is divided into voiced and unvoiced sounds. In the former group, the vibration of the vocal cords is the first step in generating the oscillatory components of the air stream. In the latter group, the vocal cords do not vibrate and the speech results from the excitation of the vocal tract by a turbulent air stream flowing through a point of constriction or past other sharp edges in the tract.
The voiced sound is of great importance as it is one of the elements by which a speaker can be identified or a word can be recognized.
Phonemes
Phonemes are the smallest distinguishable units of speech. The phonetic description of speech is an attempt to classify speech sounds in terms of fundamental elements or units which cannot be further subdivided. These elements, or phonemes, are combined to produce morphemes, the simplest meaningful sounds. Morphemes are then arranged to make up words and phrases.
The phonemes may be thought of as a code relating specific vocal tract configurations and excitations to specific speech sounds. Such a viewpoint suggests that speech can be analyzed, synthesized, and characterized.
1.4 The Techniques of Synthesis
Although numerous speech synthesis techniques exist, only two methods are practical. The first begins with a human voice sample that is analyzed and regenerated using a synthesizer algorithm. The second method uses minimum speech units, phonemes, that may or may not have originated from a human voice.
Amplitude Sampling
The simplest way to represent a waveform digitally is to store its amplitude values at uniform time intervals. The more samples, the higher the likelihood of an accurate approximation of the original waveform. The trade-off is between quality and the amount of memory required to store the speech. A 128k ROM may hold only one second of speech data at a high sampling rate.
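The storage trade-off can be checked with simple arithmetic. The sketch below is illustrative only: the text does not say whether "128k" means bits or bytes, nor which sampling rate is meant, so the 128 Kbit ROM, the 16 kHz rate, and the 8 bits per sample are all assumptions.

```python
def seconds_of_speech(rom_bits: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Return how many seconds of raw amplitude samples fit in a ROM of rom_bits."""
    return rom_bits / (sample_rate_hz * bits_per_sample)

# Assumed figures (not from the text): 128 Kbit ROM, 16 kHz sampling, 8-bit samples.
print(seconds_of_speech(128 * 1024, 16_000, 8))
```

Under these assumptions the ROM holds roughly one second of speech, in line with the order of magnitude quoted above.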
Linear Predictive Coding (LPC)
Lower bit rate systems, such as those used in linear predictive coding (LPC), store perceptually significant features about an original waveform and then output a synthetic waveform that sounds like the original but is not necessarily similar in appearance. In LPC, glottis, vocal tract, and lip movements are modeled as a filter which usually has 10 poles for a 4 kHz bandwidth and 12 poles for a 5 kHz bandwidth. Parameters that correspond to the linear prediction codes are K (filter), pitch, and gain (amplitude). They are usually extracted in 20 millisecond segments or "frames".
Because sudden changes in parameter values may create extraneous noises in the output, the values are interpolated between two consecutive frames, typically using a linear interpolation method. However, other methods are used too. AMI's S3620 speech synthesizer, for example, uses a pseudolinear interpolation method with an interpolation period of 5 milliseconds.
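As a rough illustration of frame interpolation, the sketch below steps each parameter linearly from one frame to the next. The parameter names and the choice of four sub-steps (a 20 ms frame divided into 5 ms periods) are assumptions made for the example; this is plain linear interpolation, not the pseudolinear scheme of any particular chip.

```python
def interpolate_frames(prev: dict, nxt: dict, steps: int) -> list:
    """Linearly move every parameter from prev to nxt in `steps` equal sub-steps."""
    return [
        {key: prev[key] + (nxt[key] - prev[key]) * i / steps for key in prev}
        for i in range(1, steps + 1)
    ]

# Hypothetical frames: gain and pitch only, four 5 ms sub-steps per 20 ms frame.
frame_a = {"gain": 0.0, "pitch_hz": 100.0}
frame_b = {"gain": 1.0, "pitch_hz": 120.0}
for step in interpolate_frames(frame_a, frame_b, 4):
    print(step)
```

The last sub-step always equals the target frame, so consecutive frames join without a jump.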
Non-Linear PCM
In pulse code modulation (PCM), a band-limited waveform is sampled at twice its highest frequency (the Nyquist rate). Non-linear PCM stores samples in a manner which minimizes quantization errors. Non-linear PCM is used by the telephone company because it handles voice signals well even in a noisy environment. Low bit-rate speech encoders also use PCM as the database for speech analysis. Typical PCM analog-to-digital conversion takes data samples at 8 or 10 kHz sampling rates with 12 bits per sample. A system with high storage capacity, such as a VAX 11/780 minicomputer, is used to handle the resulting 96 Kbit to 120 Kbit per second data rate.
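The quoted data rates are the product of sampling rate and word length. A minimal check, assuming sampling rates of 8 kHz and 10 kHz with 12 bits per sample:

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample

def nyquist_rate(highest_frequency_hz: int) -> int:
    """Minimum sampling rate for a signal band-limited to highest_frequency_hz."""
    return 2 * highest_frequency_hz

print(pcm_bit_rate(8_000, 12))   # data rate at 8 kHz sampling
print(pcm_bit_rate(10_000, 12))  # data rate at 10 kHz sampling
print(nyquist_rate(4_000))       # sampling rate needed for a 4 kHz speech band
```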
ADPCM
Adaptive differential pulse code modulation (ADPCM) essentially takes a PCM data word and compresses it to 3 or 4 bits. The ADPCM data bits describe the change from one data point to the next rather than the PCM sample words themselves.
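The idea of coding changes instead of sample values can be sketched as follows. This is deliberately simplified: it shows only the differential step and omits the adaptive quantizer that lets real ADPCM fit each difference into 3 or 4 bits. The sample values are made up for the example.

```python
def delta_encode(samples: list) -> list:
    """Replace each sample by its difference from the previous one (first vs. 0)."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def delta_decode(deltas: list) -> list:
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples

pcm_words = [100, 104, 109, 111, 108]   # hypothetical PCM samples
print(delta_encode(pcm_words))          # small differences after the first word
print(delta_decode(delta_encode(pcm_words)) == pcm_words)
```

Because neighboring speech samples are usually close in value, the differences span a much smaller range than the samples, which is what makes the short code words possible.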
Parametric Coding
This technique extracts information about the waveform rather than using the waveform itself. When speech is reconstructed from stored data, the resulting waveform may not even look like the original one, but it will sound very much like it.
A parametric model assumes that either a voiced or an unvoiced sound is spoken. A voiced sound (usually a vowel sound) is reproduced by a train of impulses that are separated by the pitch period. The unvoiced sound is usually generated by a random noise generator.
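The two excitation sources can be sketched in a few lines. The function name, the unit impulse height, the uniform noise distribution, and the seed parameter are assumptions chosen for the illustration.

```python
import random

def excitation(voiced: bool, n_samples: int, pitch_period: int, seed: int = 0) -> list:
    """Return an impulse train (voiced) or uniform random noise (unvoiced)."""
    if voiced:
        # One unit impulse every pitch_period samples, zeros in between.
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

print(excitation(True, 8, 4))
print(excitation(False, 4, 4))
```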
Formant Synthesizer
In a formant synthesizer, the glottal, vocal tract, and lip-radiation models are implemented as sections of second-order filter resonators. Two parameters are required to specify the input-output characteristics of a resonator: the resonant (formant) frequency, f, and the resonance bandwidth.
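A second-order resonator specified by these two parameters can be sketched digitally as follows. This uses the common two-pole difference equation y(n) = A x(n) + B y(n-1) + C y(n-2); the sampling rate and the example formant values are assumptions, and the sketch is not a description of how any particular chip realizes its filters.

```python
import math

def resonator_coefficients(f_hz: float, bw_hz: float, fs_hz: float):
    """Coefficients A, B, C of a two-pole resonator with unity gain at DC."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(x: list, a: float, b: float, c: float) -> list:
    """Apply y(n) = a*x(n) + b*y(n-1) + c*y(n-2) to the input sequence."""
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

# Assumed example: 500 Hz formant, 60 Hz bandwidth, 8 kHz sampling rate.
a, b, c = resonator_coefficients(500.0, 60.0, 8000.0)
impulse_response = resonate([1.0] + [0.0] * 15, a, b, c)
print(impulse_response[:4])
```

Cascading several such sections, one per formant, gives the vocal tract model described in the following paragraphs.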
The voiced sound in a formant synthesizer is typically implemented as a second-order cascade resonator, whereas the unvoiced sound is implemented in a parallel configuration. In addition, the glottal model is implemented as a second-order low-pass filter.
Vocal tract parameters are defined by the formant frequencies and their related bandwidths. The number of formant frequencies is dependent on the bandwidth of the speech signal. For example, four formants are typically present in a 4 kHz band-limited signal and five are present in a 5 kHz band-limited signal.
The extracted and coded parameters in the formant synthesizer represent formant frequency, bandwidth, pitch, and gain values. The parameters are usually extracted at 5 or 10 millisecond intervals in order to reduce noise introduced by abrupt parameter changes.
Inevitable Tradeoffs
The different methods of speech synthesis carry with them tradeoffs in total storage requirements and achievable voice quality. ADPCM synthesizers can reproduce voice, music, and sound effects of high quality using the human voice sampling method. The penalty is a very large data rate of 12 to 32 Kbits per second.
Like ADPCM, LPC and partial autocorrelation (PARCOR) are also based on human voice samples. Both offer intelligible speech but with lower data rates (1 Kbit to 9.6 Kbits per second) and lower storage requirements. However, these techniques result in lower speech quality than that of ADPCM.
Phoneme-based synthesis, sometimes referred to as synthesis by rule, offers a so-called unlimited vocabulary at very low bit rates of 70 to 500 bits per second. Unfortunately, the generated speech lacks human quality.
CHAPTER 2
THE IBM PERSONAL COMPUTER
2.1 Introduction
The IBM Personal Computer is a microcomputer system for hobbyists and professionals. It has been one of the most popular microcomputers for the last several years and it is very well supported by IBM-compatible software vendors. Its large software base makes it an attractive computer with which to add and process speech.
2.2 System Description
The computer system consists of the system board that fits horizontally in the base of the system unit, two double-sided double-density disk drives, a monochrome monitor, and a standard typewriter-style keyboard. Three of the five expansion slots are normally occupied by a disk drive controller card, a display driver card, and a memory expansion card. The speech synthesizer card may occupy either of the two remaining expansion slots.
The heart of the system board is the Intel 8088 microprocessor. This is an 8-bit external-bus version of Intel's 16-bit 8086 microprocessor, and is software compatible with the 8086. The microprocessor operates at a 4.77-MHz rate; a bus operation takes four 210-ns clocks, or 840 ns, and an I/O operation takes five 210-ns clocks, or 1.05 us.
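The cycle times quoted above follow directly from the clock frequency. A quick check, assuming the nominal 4.77-MHz figure (the text rounds one clock period to 210 ns, so the results differ slightly from 840 ns and 1.05 us):

```python
def cycle_time_ns(clock_hz: float, clocks: int) -> float:
    """Duration in nanoseconds of a bus cycle lasting `clocks` clock periods."""
    return clocks * 1e9 / clock_hz

CLOCK_HZ = 4.77e6  # nominal system clock taken from the text

print(round(cycle_time_ns(CLOCK_HZ, 1), 1))  # one clock period
print(round(cycle_time_ns(CLOCK_HZ, 4), 1))  # memory bus cycle (four clocks)
print(round(cycle_time_ns(CLOCK_HZ, 5), 1))  # I/O bus cycle (five clocks)
```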
The system board contains both Read Only Memory (ROM) and Random Access Memory (RAM). It has space for up to 512k of ROM and 256k of RAM. An additional 384k of RAM, for a total of 640k, is provided through the I/O channel.
Additional features of the system include the optional
8087 coprocessor, four channels of direct-memory access
(DMA), three programmable timer/counters, and eight prioritized levels of interrupt, six of which are bussed to the
system expansion slots.
The IBM PC memory and I/O channel assignments are
shown in appendix B.
2.3 I/O Channel
Each expansion slot has a 62-pin card-edge connector as shown in Figure 1. The following is a description of the I/O channel signals.
Figure 1
I/O Channel Connector
(Source: IBM Technical Reference Manual)
Signal            I/O   Description
A0-A19            O     Address Bits 0 to 19: These lines are used to address memory and I/O devices within the system. A0 is the least significant bit (LSB) and A19 is the most significant bit (MSB). They are active high. The speech card uses the signals A0 through A9.
AEN               O     Address Enable: This line is used to de-gate the microprocessor and other devices from the I/O channel to allow DMA transfers to take place. When this line is active (high), the DMA controller has control of the address bus, data bus, Read command lines (memory and I/O), and the Write command lines (memory and I/O).
ALE               O     Address Latch Enable: This line is used on the system board to latch valid addresses from the microprocessor. It is available to the I/O channel as an indicator of a valid microprocessor address. It is not used by the speech card.
CLK               O     System Clock: It has a period of 210-ns (4.77-MHz) and a 33% duty cycle.
D0-D7             I/O   Data Bits 0 to 7: These lines provide data bus bits 0 to 7 for the microprocessor, memory, and I/O devices. D0 is the least significant bit (LSB) and D7 is the most significant bit (MSB). These lines are active high.
-DACK0 to -DACK3  O     -DMA Acknowledge 0 to 3: These lines are used to acknowledge DMA requests (DRQ1-DRQ3) and refresh system dynamic memory (-DACK0). They are active low. These signals are not used by the speech card.
DRQ1 to DRQ3      I     DMA Request 1 to 3: These lines are channel requests used by peripheral devices to gain DMA service. These lines are not used by the speech card.
-I/O CH CK        I     -I/O Channel Check: This line provides the microprocessor with parity (error) information on memory or devices in the I/O channel. A low signal indicates a parity error. This line is not used by the speech card.
I/O CH RDY        I     I/O Channel Ready: This line, normally high (ready), is pulled low (not ready) by a memory or I/O device to lengthen I/O or memory cycles. It allows slower devices to attach to the I/O channel with a minimum of difficulty. This line is not used by the speech card.
-IOR              O     -I/O Read Command: This command line instructs an I/O device to drive its data onto the data bus. It may be driven by the microprocessor or the DMA controller. This line is used by the speech card and is active low.
-IOW              O     -I/O Write Command: This command line instructs an I/O device to read the data on the data bus. It may be driven by the microprocessor or the DMA controller. This signal is used by the speech card and is active low.
IRQ2 to IRQ7      I     Interrupt Request 2 to 7: These lines are used to signal the microprocessor that an I/O device requires attention. They are prioritized with IRQ2 as the highest priority and IRQ7 as the lowest. These signals are not used by the speech card.
-MEMR             O     -Memory Read Command: This command line instructs the memory to drive its data onto the data bus. It may be driven by the microprocessor or the DMA controller. This signal is active low and is not used by the speech card.
-MEMW             O     -Memory Write Command: This command line instructs the memory to store the data present on the data bus. It may be driven by the microprocessor or the DMA controller. This signal is active low and is not used by the speech card.
OSC               O     Oscillator: High-speed clock with a 70-ns period (14.31818-MHz). It has a 50% duty cycle and is not used by the speech card.
RESET DRIVE       O     Reset Drive: This line is used to reset or initialize system logic upon power-up or during a low line-voltage outage. This signal is synchronized to the falling edge of CLK and is active high. This signal is used by the speech card.
T/C               O     Terminal Count: This line provides a pulse when the terminal count for any DMA channel is reached. This signal is active high and is not used by the speech card.
CHAPTER 3
THE SSI263A PHONEME SPEECH SYNTHESIZER
3.1 Introduction
The Silicon Systems SSI263A is a versatile phoneme speech synthesizer, packaged as a single 24-pin monolithic CMOS (complementary metal-oxide semiconductor) integrated circuit. It provides an analog output for music, sound effects, and continuous speech of a large vocabulary at low data rates.
Speech is synthesized by combining phonemes in the appropriate sequence. The SSI263A speech synthesizer contains 64 different phonemes, each with four different duration settings, giving an equivalent of 256 phonemes.
It operates on a 5-volt supply. Its interface circuitry contains five 8-bit internal registers that allow software control of speech rate, pitch, pitch movement rate, amplitude, articulation rate, vocal tract filter response (useful for sound effects), and phoneme selection and duration.
3.2 The SSI263A Functional Description
The description in this section provides the SSI263A
features, capabilities, and control information.
Production of Speech
The production of speech phonemes is done by the principle of formant synthesis. An electronic model of the human vocal tract is constructed with five cascaded programmable low pass filter sections. The filter sections are programmed internally by a digital controller. Either a glottal (pitch) or a pseudo-random noise source is used to excite the vocal tract, depending on whether a voiced or non-voiced phoneme is selected. The model mimics the natural resonances of the vocal tract, and the audio output contains bands of resonant frequencies called formants. Figure 2 shows a simplified block diagram of formant speech production, and Figure 3 shows the full block diagram of the SSI263A.
The formant frequencies, of which three are normally required for adequate speech synthesis, may range in frequency from 200 Hz for the first formant in the male speaker to 2000 Hz for the third formant in the female speaker. In addition, two more filters are employed: one to produce the fricative and stop consonants, and a nasal resonator to simulate the nasal consonants. The SSI263A employs switched-capacitor filters.
The exact placement of the formant frequencies within the audio spectrum determines the sound that we interpret as speech. Figure 4 shows the acoustic spectrum of the phonemes /AW/, /E/, and /I/, and Table 1 shows some examples of phonemes and their formant frequencies.
Figure 2
Simplified Block Diagram of Speech Production
(Blocks shown: noise source, fricative resonator, glottal pulse oscillator with pitch input, formant resonators F1 to F3, nasal resonator, and formant speech output.)
Figure 3
The SSI263A Block Diagram
(Blocks shown include: data latches, address logic, registers, inflection ramping logic, phoneme timing logic, clock timing logic, phoneme characteristics ROM, transition RAM and transition controller, vocal tract control, glottal source, noise generator, noise shaping filter, fricative source, closure ramp timing, vocal tract filters, high-pass filter, and analog output.)
Figure 4
Acoustic Spectrum of the Phonemes /AW/, /E/, and /I/
Table 1
Examples of Phonemes and Their Formant Frequencies
Phoneme  Example  F1(Hz)  F2(Hz)  F3(Hz)  F4(Hz)  F5(Hz)
AW       Office   629     928     2078    2820    4138
E        MEET     307     1948    2364    3139    4134
ER       BIRD     451     1162    1428    2180    4144
I        SIX      419     1672    2221    2982    4138
S        SAME     -       -       -       -       4162
Modes of Operation
The device can operate in four different modes. The different modes include a choice of timing response between "frame" and "phoneme", transitioned or immediate inflection response, and setting the A/R (Acknowledge/Request Not) pin for active or disabled operation. The four modes of operation are shown in Table 2.
Table 2 - SSI263A Modes of Operation
DR1  DR0  FUNCTION
Hi   Hi   A/R NOT active; phoneme timing response; transitioned inflection.
Hi   Lo   A/R NOT active; phoneme timing response; immediate inflection.
Lo   Hi   A/R NOT active; frame timing response; immediate inflection.
Lo   Lo   Disabled A/R NOT output only; does not change previous A/R NOT response.
The state of the Duration/Phoneme Register bits DR1 and DR0 determines the operating mode of the device. The mode is selected when the Control bit (CTL) is changed from a logic one to a logic zero.
The Speech Attribute Registers
Speech is produced by programming speech attribute (characteristic) data into five eight-bit registers. These internal registers allow selection of phonemes and speech characteristics. The registers are addressed or selected via the three register select lines (RS2, RS1, and RS0) as shown in Table 3.
Table 3 - SSI263A Registers
RS2  RS1  RS0  NAME                         SYM  D7   D6   D5   D4   D3   D2   D1   D0
Lo   Lo   Lo   Duration/Phoneme             DP   DR1  DR0  P5   P4   P3   P2   P1   P0
Lo   Lo   Hi   Inflection                   IS   N4   N3   N2   N1   N0   S2   S1   S0
                                                 I10  I9   I8   I7   I6   I5   I4   I3
Lo   Hi   Lo   Rate/Inflection              RE   R3   R2   R1   R0   I11  I2   I1   I0
Lo   Hi   Hi   Control/Tran Rate/Amplitude  TA   CTL  T2   T1   T0   A3   A2   A1   A0
Hi   X    X    Filter Frequency             FF   F7   F6   F5   F4   F3   F2   F1   F0
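The register selection in Table 3 amounts to a small decode of the three select lines. The sketch below is an illustration only: the function name and the use of 0/1 integers for Lo/Hi are assumptions, and the register names are the ones used in the table.

```python
def selected_register(rs2: int, rs1: int, rs0: int) -> str:
    """Return the register addressed by RS2, RS1, RS0 (0 = Lo, 1 = Hi)."""
    if rs2:
        # RS1 and RS0 are "don't care" when RS2 is high.
        return "Filter Frequency"
    names = (
        "Duration/Phoneme",             # Lo Lo Lo
        "Inflection",                   # Lo Lo Hi
        "Rate/Inflection",              # Lo Hi Lo
        "Control/Tran Rate/Amplitude",  # Lo Hi Hi
    )
    return names[(rs1 << 1) | rs0]

for lines in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 1)]:
    print(lines, selected_register(*lines))
```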
The SSI263A has two general classes of attribute data: "control" and "target". Table 4 shows the attributes controlled in each class.
Table 4 - Classification of Attributes
CONTROL DATA                 TARGET DATA
Speech Rate                  Phoneme Selection
Filter Frequency             Audio Amplitude
Phoneme Articulation Rate    Transitioned Inflection
Phoneme Duration
Immediate Inflection
Inflection Movement Rate
The SSI263A responds immediately upon loading "control" data. Upon loading "target" data, the device begins to move towards that target at the prescribed transition rates. The fully internal linear transitioning between target values, done in a manner similar to that found in normal speech, is a key factor in reducing the control data rate without sacrificing speech quality.
When the device is set in a speech mode (i.e., phoneme timing and transitioned inflection), the phoneme may be accompanied by eight (8) different attributes. They are: duration, amplitude, target inflection, slope of inflection, rate of speech, extension and range pitch, and filter frequency. The attributes and the phoneme selection are now explained in detail.
Phoneme Selection
Phonemes are selected by the six least significant bits (D0-D5) of the Phoneme/Duration (DP) register.
[ X X P5 P4 P3 P2 P1 P0 ]
The 64 different symbols constitute the SSI263A phonetic alphabet. Of the 64 symbols, 34 represent sounds basic to the pronunciation of American English. The remaining 30 symbols fall into 2 groups: the ALLOPHONE group and the NO-SOUND group. The basic sound symbols are:
A, AE, AH, AW, B, D, E, EH, ER, F, HF, I, J, K, KV, L, M, N, NG, O, OO, P, R, S, SCH, T, TH, THV, U, UH, V, W, Y, Z.
Symbols in the allophone group represent speech sounds that vary in pronunciation from one of the basic sounds. They may be used in transcribing word segments (syllables or morphemes) whose pronunciations are not satisfied by the basic phonemes alone, such as words rooted in a foreign language or words adapted by a regional dialect. The ALLOPHONE symbols are:
A1, AE1, AH1, AY, E1, E2, EH1, HN, HV, IE, IU, IU1, L1, LB, LF, OU, R1, R2, U1, UH1, UH2, UH3, Y1, :A, :OH, :U, :UH.
The no-sound symbols represent silent states. The /PA/ symbol represents a "pause" state. It is used to separate phoneme sequences into phrase-like segments, which assists in more closely imitating the natural pausing in human speech for breathing or for delayed emphasis. The pause is subject to all the attributes used in programming the other phonemes. Other no-sound symbols represent "hold" states. They are used in combination with basic phonemes or allophones to generate articulation variations on their pronunciations. The NO-SOUND symbols are:
FC, HVC, PA.
Appendix E presents the 64 different hexadecimal phoneme codes accompanied by their symbols.
Duration Adjustment
A phoneme specified by the six least significant bits
of the register DP may be set to four different durations
by the two most significant bits, DR1 and DR0, of the same
register.
[ DR1 DR0 X X X X X X ]
Table 5 shows the timing of phonemes based on a default
rate of a hexadecimal "A".
Table 5 - Phoneme Duration Chart
DR1    DR0    DURATION
Hi     Hi     24.576 ms
Hi     Lo     49.152 ms
Lo     Hi     73.728 ms
Lo     Lo     98.304 ms
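The bit layout of the DP register described above can be illustrated with a short Python sketch. It is only an illustration, not the thesis program (which is written in Turbo Pascal): the function names are hypothetical, the phoneme code 2Ah is an arbitrary placeholder rather than an entry from Appendix E, and the durations apply to the default rate "A" of Table 5.

```python
# Frame time for one duration unit at the default rate ("A"), from Table 5.
FRAME_MS = 24.576


def pack_dp(phoneme_code, dr):
    """Combine a 6-bit phoneme code and the 2-bit DR1/DR0 field into a DP byte."""
    if not 0 <= phoneme_code <= 0x3F:
        raise ValueError("phoneme code must fit in six bits (0-63)")
    if not 0 <= dr <= 3:
        raise ValueError("duration field must fit in two bits (0-3)")
    return (dr << 6) | phoneme_code


def duration_ms(dr):
    """Phoneme duration for a DR1/DR0 pair (Hi=1, Lo=0), following Table 5."""
    return (4 - dr) * FRAME_MS


# 0x2A is an arbitrary placeholder code, not taken from Appendix E.
dp = pack_dp(0x2A, 0b11)
print(f"DP = {dp:02X}h, duration = {duration_ms(0b11):.3f} ms")
```

With DR1 and DR0 both high the sketch gives the shortest duration of Table 5, and with both low the longest.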
Articulation Adjustment
Articulation is the transition rate from one set of
formant positions to the new set determined by a newly
selected phoneme. The three bits (T2, T1, and T0) of the
register TA select one of eight transition rates (this
transition rate, which is used for articulation purposes,
should not be confused with the inflection transition rate
discussed later).
[ X T2 T1 T0 X X X X ]
This rate is relative in that articulation is not
affected by speech rate and is given in terms of a duration
period. Table 6 gives the eight different articulation
settings.
Table 6 - Articulation Adjustment Chart
T2,T1,&T0        DURATION
(in decimal)
7                1
6                2
5                3
4                4
3                5
2                6
1                7
0                8
Amplitude Adjustment
The overall audio output level is set with the four
least significant bits of the register TA.
[ X X X X A3 A2 A1 A0 ]
An amplitude adjustment may be used to enhance the
speech quality and add emphasis. It is transitioned
linearly at a rate dependent on the phoneme duration setting.
Of the 16 amplitude levels, the one set by the
hexadecimal code 0 is of special value. At that level, any
instantaneous setting at the output is delayed until the
excitation source into the vocal tract ramps down to zero
excitation. It is used for silent phonemes and provides more
resolution for amplitude decays. Table 7 gives the 16
amplitude settings.
Table 7 - Amplitude Setting
A3-A0    AMPLITUDE           A3-A0    AMPLITUDE
         V       dB                   V       dB
0        .00                 8        .53     -5.51
1        .07     -23.10      9        .60     -4.44
2        .13     -17.72      10       .67     -3.48
3        .20     -13.98      11       .73     -2.73
4        .27     -11.37      12       .80     -1.94
5        .33     -9.63       13       .87     -1.21
6        .40     -7.96       14       .93     -0.63
7        .47     -6.56       15       1.00    0
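As a cross-check of the amplitude table, the following Python sketch packs the articulation and amplitude fields into one TA byte and recomputes the dB column. Two points are assumptions rather than statements from the text: the voltage column appears to equal the code divided by 15, rounded to two places, and the dB column appears to be 20 log10 of that rounded voltage. The function names are hypothetical, and bit 7 of TA is simply left at zero.

```python
import math


def pack_ta(articulation, amplitude):
    """Place T2-T0 in bits 6-4 and A3-A0 in bits 3-0 of the TA byte (bit 7 left at 0)."""
    if not 0 <= articulation <= 7:
        raise ValueError("articulation must be 0-7")
    if not 0 <= amplitude <= 15:
        raise ValueError("amplitude must be 0-15")
    return (articulation << 4) | amplitude


def amplitude_db(code):
    """Level in dB relative to full scale, using the voltage rounded as in Table 7."""
    volts = round(code / 15, 2)      # assumed relation: V = code / 15
    if volts == 0:
        return float("-inf")         # code 0 is the silent level
    return 20 * math.log10(volts)


for code in (1, 8, 15):
    print(code, f"{amplitude_db(code):.2f} dB")
```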
Inflection Adjustment (pitch)
There are two types of inflection control, depending on
the mode of operation: the immediate inflection mode, used
for music, and the transitioned inflection mode, used for
speech.
In the immediate inflection mode, there are 12 control
bits (I11-I0) spread over registers IS and RE.
[ I10 I9 I8 I7 I6 I5 I4 I3 ]      IS
[ X X X X I11 I2 I1 I0 ]          RE
The total of 12 bits of inflection gives seven octaves
on an even tempered scale.
It is useful for singing, musi-
cal sound effects, and for duplicating human inflection on
a frame basis.
The inflection period cannot be less than
five times the vocal tract clock period set by the filter
frequency register.
In the transitioned mode, the five most significant
bits of the IS register (N4-N0) select one of 32 target
values of inflection. The three least significant bits
(S2-S0) determine the rate (slope) at which the inflection
will approach its new target position.
[ N4 N3 N2 N1 N0 S2 S1 S0 ]
Table 8 shows the 32 levels of target frequencies.
Level 0 is associated with the hex code IS=$00 and level 31
with IS=$F8.
Table 9 shows the 8 slope-of-inflection levels while
in the transitioned inflection mode. A frame is defined as
one duration period, and the duration is set in the register
DP.
Table 8 - Target Inflection Frequencies (Hz)
LEVEL  FREQUENCY     LEVEL  FREQUENCY     LEVEL  FREQUENCY
0      61            11     93            22     195
1      63            12     98            23     217
2      65            13     103           24     244
3      67            14     108           25     279
4      70            15     115           26     325
5      72            16     122           27     391
6      75            17     130           28     488
7      78            18     140           29     651
8      81            19     150           30     997
9      85            20     163           31     1953
10     89            21     178
NOTE: S0=S1=S2=I0=I1=I2=low; I11=high.
Table 9 - Slope of Inflection
Hex Code        Number of
S2-S0           Frames
0               8
1               7
2               6
3               5
4               4
5               3
6               2
7               1
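The transitioned-mode layout of the IS register, together with Tables 8 and 9, can be summarized in a few lines of Python. This is a sketch under stated assumptions: the names are hypothetical, the frequency list is copied from Table 8 as printed, and the frame count is taken from Table 9 as eight minus the slope code.

```python
# Target frequencies in Hz for levels 0-31, copied from Table 8 as printed.
TARGET_HZ = [61, 63, 65, 67, 70, 72, 75, 78, 81, 85, 89,
             93, 98, 103, 108, 115, 122, 130, 140, 150, 163, 178,
             195, 217, 244, 279, 325, 391, 488, 651, 997, 1953]


def pack_is(level, slope):
    """Place the target level N4-N0 in bits 7-3 and the slope S2-S0 in bits 2-0."""
    if not 0 <= level <= 31:
        raise ValueError("inflection level must be 0-31")
    if not 0 <= slope <= 7:
        raise ValueError("slope must be 0-7")
    return (level << 3) | slope


def frames_to_target(slope):
    """Number of frames taken to reach the target, following Table 9."""
    return 8 - slope


is_byte = pack_is(16, 3)
print(f"IS = ${is_byte:02X}: target {TARGET_HZ[16]} Hz "
      f"reached in {frames_to_target(3)} frames")
```

Level 31 with slope 0 packs to $F8, which agrees with the hex code quoted in the text for the highest level.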
Rate of Speech
The 4 most significant bits of the register RE are
designated to select one of 16 overall settings of speech
rates.
[ R3 R2 R1 R0 X X X X ]
With level 10 being the normal setting, Table 10 shows
the different rates of speech compared to the normal
setting. Level 0 corresponds to R=0 (hex code 0) and
level 15 to R=15 (hex code F).
Table 10 - Speech Rate
Level   Rate (%)          Level   Rate (%)
0       38                8       75
1       40                9       86
2       43                10      100 (Normal)
3       46                11      120
4       50                12      150
5       55                13      200
6       60                14      300
7       67                15      Apprx. 2 ms.
Filter Frequency
The eight bits of the register FF are used to set or
shift the frequency of all of the vocal tract filters
(formants), similar to slowing down or speeding up a record
player, but with a greater range and without affecting
inflection or other timing. Compared to a normal setting for
speech, the filter frequency register can lower the vocal
tract filter frequencies by more than a factor of 10 and
increase them by a factor of 12.5, which gives good
sound effects capabilities.
The normal vocal tract filter frequency setting is 20
KHz, which is a hexadecimal 'E7' or decimal 231. The
following is a formula to calculate the vocal tract filter
frequency setting:
$$\text{Frequency} = \frac{0.5 \times 10^{6}}{256 - DC} \tag{1}$$
where
DC = decimal equivalent of the FF register setting
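Equation (1) is easy to check numerically. The Python sketch below evaluates it and also inverts it to find the nearest register setting for a wanted frequency; the function names are hypothetical and the inverse is only a convenience added here, not something described in the text.

```python
def filter_frequency_hz(dc):
    """Vocal tract filter frequency for an FF register value, per equation (1)."""
    if not 0 <= dc <= 255:
        raise ValueError("FF register value must be 0-255")
    return 0.5e6 / (256 - dc)


def nearest_setting(frequency_hz):
    """FF register value whose frequency is closest to the requested one."""
    dc = round(256 - 0.5e6 / frequency_hz)
    return max(0, min(255, dc))      # clamp to the 8-bit register range


print(filter_frequency_hz(0xE7))     # normal setting, hexadecimal E7
print(nearest_setting(20000))
```

For the normal setting E7 (decimal 231) the denominator is 25, which gives the 20 KHz quoted in the text.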
3.3 Description of Pin Function
Pin 1   AO (Audio) - Analog output, D.C. biased at VDD/2.
Pin 2   AG (Analog Ground) - Provides the option to connect
        the analog section to a good ground to eliminate
        noise in the output signal.
Pin 3   TP1 (Test Point One) - This pin low will allow
        external glottal signals to be put into Pin 5.
Pin 4   A/R NOT (Acknowledge/Request Not) - Digital open
        collector output. When forced low it requests new
        data. It can respond on a frame or phoneme boundary
        condition or be deactivated. This signal can also be
        read on "D7" in an inverted state.
Pin 5   TP2 (Test Point Two) - Normally not used. When
        Pin 3 is low, Pin 5 can be used to input an
        external glottal pulse signal.
Pin 6   RS2 (Register Select Two) - Used in conjunction
        with RS1 and RS0; these three register select
        lines are used to select one of five internal
        registers for writing.
Pin 7   RS1 (Register Select One)
Pin 8   RS0 (Register Select Zero)
Pin 9   D0 (Data Zero) - Input only, tied to microprocessor
        data bus.
Pin 10  D1 (Data One)
Pin 11  D2 (Data Two)
Pin 12  Vss (Ground)
Pin 13  D3 (Data Three)
Pin 14  D4 (Data Four)
Pin 15  D5 (Data Five)
Pin 16  D6 (Data Six)
Pin 17  D7 (Data Seven) - MSB of eight bit data bus.
        Data seven is bi-directional. When read, high is
        an active request for new data and low is an
        acknowledge that data has been received (A/R NOT).
Pin 18  PD/RST NOT (Power Down Not/Reset Not) - This input
        is active in the low state. Activating this
        signal will directly set the internal CTL bit to
        a one. This will silence the audio output and
        reduce the power consumption of the analog
        circuitry. The CTL bit can be reset to zero with a
        TA register load.
Pin 19  CS0 (Chip Select Zero) - Control input which
        selects the SSI263 on a microprocessor
        control/mapping bus. Active high state.
Pin 20  CS1 NOT (Chip Select One Not) - See CS0; active
        low state.
Pin 21  R/W NOT (Read/Write Not) - Control input; write
        is an active low state (writes into D0-D7), read
        is an active high state (reads D7 only).
Pin 22  XCLK (External Clock Input) - Input for externally
        supplied clock. Normal frequency input is
        2.0 MHz or 1.0 MHz dependent upon level of "DIV2"
        input.
Pin 23  DIV2 (External Clock Divide by Two Input) - When
        this input is high, XCLK=2 MHz. When this input
        is low, XCLK=1 MHz.
Pin 24  Vdd (Positive Voltage Supply) - Operating range
        is +4.5 VDC to 6.6 VDC.
CHAPTER 4
HARDWARE DESIGN
4.1 System overview
The speech system consists of the host computer (the
IBM PC), the speech board, and the external speaker and
reset/power down switch. The heart of the system is the
speech card, which performs all the speech processing. The
host computer contains the software to control the speech
board and to provide the operator with a utility program
necessary for speech development. Figure 5 shows the
system block diagram.
The Speech Card plugs directly into an I/O Channel
connector slot inside the IBM PC chassis. An 8-ohm external
speaker mounted in a plastic box and an ON/OFF switch are
placed outside the chassis. The switch serves the
RESET/POWER DOWN purpose when in the OFF position, and is
mounted on the side wall of the speaker box. Speaker
cables are fed through access holes in the rear panel to the
card.
The circuit board is a Vector Electronics "prototype"
card made for the IBM PC. It has a card-edge connector to
mate to the interface slot, and provides plated-through
holes on 0.10-inch centers for installing I.C. sockets.
There are ten integrated circuits; they are the speech
processor, interface buffers, amplifier, 1-to-8 decoder,
clock driver, and gates. Two component carriers are used
to allow resistors and capacitors to be wire-wrapped to
the circuitry. Two larger capacitors are mounted directly
on the circuit board. A block of 20 test points, as well
as the volume control potentiometer, is located at the top
of the board.
[Figure 5 - System Block Diagram. Blocks shown: monitor,
keyboard, IBM PC, buffers, address decoder, clock driver,
speech processor, audio filter and amplifier, speaker,
reset/power down switch.]
The board is powered through the I/O Channel connector
and uses the +5 Volt and the +12 Volt power supplies.
The circuit consists of three main sections; they are
the IBM PC interface, including the address decoder; the
speech processor including the support circuitry; and the
audio output that includes the filter, power amplifier, and
the speaker interface.
A block diagram of the card is
shown in figure 6. The circuit board schematic is shown in
Appendix A.
4.2 Interface and Address Decoding
Since the SSI263A is compatible with the 8088, the interface requirements are minimal.
The speech card inter-
face to the IBM PC includes input buffering, data direction
decoder and address decoder.
Input Buffering
Input buffering is required in order to protect the
IBM PC from any hazardous condition on the speech card, and
to provide higher fan out to some of the signals.
Out of the sixty-two (62) signals available on the I/O
Channel, only twenty-seven (27) are used on the speech
card; they are eight data bits (D0-D7); ten address bits
(A0-A9); four control bits (IOR NOT, IOW NOT, RESET DRV,
AEN); and five power lines (+5V,+5V,+12V,GND,GND,GND).
[Figure 6 - Speech Card Block Diagram. Blocks shown: edge
connector, data buffer, data direction logic, address
buffer, address decoder, control buffer, clock driver,
reset logic, SSI263A speech processor, audio filter, power
amplifier, volume control, speaker interface, speaker
connector.]
The data signals are bi-directional, and are buffered
with a 74LS245 device.
The address and control bits are
unidirectional and are buffered with 74LS244 devices.
Data direction decoding
The data buffer U1 is controlled by two signals: PORT
SEL NOT and IORB NOT. The buffer is disabled until the
speech card is selected and an I/O action is requested
(PORT SEL NOT is true). Once enabled, its direction is
controlled by IORB NOT.
Figure 7 shows the detailed block diagram of the
buffers and data direction decoder.
Address Decoder
The speech card uses three address lines to access its
five registers.
Hence, the address decoder logic provides
a total of eight addresses once the card is selected.
The SSI263A has two chip select signals: CS0 and CS1
NOT. The speech card uses CS0 for chip selecting, and makes
CS1 NOT always selected (grounded).
The speech card uses addresses 300H-307H, which are in
the range of addresses used for prototype cards.
The heart of the decoder is a 74LS138 3-to-8 decoder.
Addresses A6, A7, and A8 are used as the chip enables, and
addresses A3, A4, and A9 are used to select the second
output of the device (output 1). The output of U4 is then
AND'ed with AEN and A5.
[Figure 7 - Buffers and Data Direction Decoder Block Diagram]
[Figure 8 - Address Decoder Circuit Diagram]
The address decoder table is given in Table 11 and the
circuit diagram is given in figure 8.
AEN (Address Enable)
signal is used in the decoder so that the card is degated
during DMA transfer.
Table 11 - Address Decoding
SSI REG.   ADDRESS   AEN  A9  A8  A7  A6  A5  A4  A3  A2  A1  A0
D/P        300       0    1   1   0   0   0   0   0   0   0   0
I          301       0    1   1   0   0   0   0   0   0   0   1
R/I        302       0    1   1   0   0   0   0   0   0   1   0
C/A/A      303       0    1   1   0   0   0   0   0   0   1   1
F          304       0    1   1   0   0   0   0   0   1   0   0
not used   305       0    1   1   0   0   0   0   0   1   0   1
not used   306       0    1   1   0   0   0   0   0   1   1   0
not used   307       0    1   1   0   0   0   0   0   1   1   1
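The decoding rule expressed by Table 11 can be modelled in software for checking purposes. The Python sketch below is a behavioural model only, not a description of the gate-level circuit: it treats the card as selected when AEN is low and address bits A9-A3 match 300H, and it uses bits A2-A0 to pick the register. The function and dictionary names are hypothetical.

```python
# Register labels from Table 11, indexed by address bits A2-A0.
REGISTERS = {0: "D/P", 1: "I", 2: "R/I", 3: "C/A/A", 4: "F"}


def decode(address, aen=0):
    """Return the register label selected by a 10-bit I/O address, or None."""
    # The card responds only when AEN is low (no DMA cycle in progress)
    # and address bits A9-A3 equal those of 300H.
    selected = aen == 0 and (address & 0x3F8) == 0x300
    if not selected:
        return None
    return REGISTERS.get(address & 0x7, "not used")


for port in (0x300, 0x304, 0x307, 0x308):
    print(f"{port:03X}h ->", decode(port))
```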
4.3 Speech Processor and Supporting Circuitry
The SSI263A is the phoneme speech synthesizer. It is
addressed by three address lines to access its five
registers, and data is written to it by presenting the 8-bit
data bus to its data inputs and latching it with the WR NOT
signal. Before a new phoneme is sent out, the MSB of the DP
register is read in order to determine whether the processor
is ready for a new phoneme. All other registers may be
written while the processor is still executing a phoneme.
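The handshake described above (read the MSB of the DP register, then write the next phoneme) can be sketched in Python as follows. This is not the Turbo Pascal code of the thesis: the port access is passed in as two hypothetical callables, `read_port` and `write_port`, and a small `FakeChip` class stands in for the hardware so that the sketch can run anywhere. The address 300H and the meaning of D7 are taken from Table 11 and the pin description.

```python
DP_PORT = 0x300       # D/P register address from Table 11
REQUEST_BIT = 0x80    # D7 reads high when the chip requests new data


def send_phonemes(dp_bytes, read_port, write_port, max_polls=100000):
    """Write each DP byte only after the chip signals a request on D7."""
    for value in dp_bytes:
        for _ in range(max_polls):
            if read_port(DP_PORT) & REQUEST_BIT:
                break
        else:
            raise TimeoutError("speech processor never requested new data")
        write_port(DP_PORT, value)


class FakeChip:
    """Stand-in for the hardware: busy for a few polls after every write."""

    def __init__(self, busy_polls=2):
        self.busy_polls = busy_polls
        self.remaining = 0
        self.written = []

    def read(self, port):
        if self.remaining > 0:
            self.remaining -= 1
            return 0x00               # still executing the last phoneme
        return REQUEST_BIT            # ready for new data

    def write(self, port, value):
        self.written.append((port, value))
        self.remaining = self.busy_polls


chip = FakeChip()
send_phonemes([0xEA, 0x2A], chip.read, chip.write)
print(chip.written)
```

Passing the port routines as parameters keeps the polling logic separate from the machine-specific I/O instructions, which is why a fake device is enough to exercise it.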
The supporting circuitry includes the clock and the
Reset/Power Down circuit.
Reset/Power Down
Activating the PD/RST NOT input in the low state will
directly set the internal CTL bit to a one. This will
silence the audio output and reduce the power consumption of
the analog circuitry. An internal switch is used in order
to keep the speech processor in this mode while not in
use. This line is OR'd with the RESET signal presented
from the host computer. Figure 9 shows the RESET/PD NOT
circuit diagram.
Clock Input
The speech processor is designed to operate with a
clock frequency of 1 MHz or 0.895 MHz. The DIV2 signal is
provided as an option to allow the use of an internal
divide-by-two circuit, to accommodate a 2 MHz or 1.79 MHz
(TV color burst frequency) clock. The speech board uses a
4 MHz clock driver integrated circuit and one flip-flop to
reduce the clock to 2 MHz. The DIV2 signal is held high in
order to produce the internal 1 MHz operation. Figure 10
shows the clock driver circuit diagram.
4.4 Audio Output
The audio output section of the speech board includes
the filter, the power amplifier, and the speaker interface.
The quality of this section is of great value to the overall
system performance.
[Figure 9 - Reset/Power Down Circuit Diagram]
[Figure 10 - Clock Driver Circuit Diagram]
The filter
The filter consists of a two-pole, simple RC, low-pass
filter with a dual pole frequency at 1 KHz, and a large
capacitor to block the DC bias of the speech processor
output. Figure 11 shows the filter circuit diagram.
The first section of the filter has the transfer
function:
$$\frac{V_2}{V_1} = \frac{1}{1 + R_1 C_1 s} \tag{2}$$
This section creates a "pole" at the frequency 1.69 KHz.
The second section of the filter has the transfer
function:
$$\frac{V_3}{V_2} = \frac{R_{3B}}{R_2 + R_{3A} + R_{3B}} \times \frac{1}{1 + \dfrac{(R_2 + R_{3A})\,R_{3B}}{R_2 + R_{3A} + R_{3B}}\, C_2 s} \tag{3}$$
Note that the volume control is R3A + R3B and its
setting is represented by the ratio of these two values. The
equation may also be written as:
$$\frac{V_3}{V_2} = G\,\frac{1}{1 + R' C_2 s} \tag{4}$$
This transfer function creates a pole which is shifted
as a function of the volume control. Table 12 shows the
gain values as well as the second pole frequency as a
function of the volume control setting. Figure 12 presents
those values graphically.
[Figure 11 - Audio Output Filter Circuit Diagram. Component
values shown: R1 = 2K, R2 = 3.3K, C1 = C2 = .047 uF; R3A
and R3B represent the volume control.]
[Figure 12 - Filter's Second Section Gain and Roll-off
Frequency as a Function of Volume Control Setting]
Table 12 - Filter's Second Section as a Function of
Volume Control Setting
Vol.      R' (K)    G       G (dB)    fp (Hz)
10%       0.92      .075    -22.5     3680
12.5%     1.13      .094    -20.5     3000
25%       2.03      .19     -14.5     1670
50%       3.12      .38     -8.5      1085
75%       3.27      .56     -5        1040
100%      2.48      .75     -2.5      1370
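The entries of Table 12 can be reproduced, to within rounding, from the second-section transfer function. The Python sketch below does so under assumptions that are not stated in the text: R2 = 3.3K and C2 = .047 uF are read from the labels of Figure 11, and a total volume control resistance of 10K is inferred because it makes the computed values agree with the table. The gain is taken as the divider ratio R3B / (R2 + R3A + R3B). All names are hypothetical.

```python
import math

R2 = 3300.0           # ohms, as labelled in Figure 11
R3_TOTAL = 10000.0    # ohms, assumed total of R3A + R3B (inferred from Table 12)
C2 = 0.047e-6         # farads, as labelled in Figure 11


def second_section(volume):
    """Return (gain, R' in ohms, pole frequency in Hz) for a volume fraction in (0, 1]."""
    if not 0 < volume <= 1:
        raise ValueError("volume must be greater than 0 and at most 1")
    r3b = volume * R3_TOTAL           # part of the control between wiper and ground
    r3a = R3_TOTAL - r3b              # part of the control in series with R2
    total = R2 + r3a + r3b
    gain = r3b / total
    r_prime = (R2 + r3a) * r3b / total
    pole_hz = 1.0 / (2.0 * math.pi * r_prime * C2)
    return gain, r_prime, pole_hz


for volume in (0.10, 0.25, 0.50, 0.75, 1.00):
    gain, r_prime, pole_hz = second_section(volume)
    print(f"{volume:4.0%}  R'={r_prime / 1000:.2f}K  G={gain:.3f}  "
          f"G={20 * math.log10(gain):.1f} dB  fp={pole_hz:.0f} Hz")
```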
Figure 13 shows the filter frequency response analysis
done with Spice circuit simulator.
The simulation was done
with the volume control set at 50%.
Power Amplifier
The output power is provided by the LM386 Low Voltage
Audio Amplifier.
The gain is internally set to 20 (26 dB)
and can be increased up to 200 (46dB) with the addition of
a resistor and a capacitor between pins 1 and 8 of the device.
The speech board uses the amplifier at 26dB gain.
The inputs to the amplifier are ground referenced with
an internal 50K resistor while the output is automatically
biased to one half of the supply voltage (6V).
The device output power is 0.7 W with a total harmonic
distortion (THD) of 3%, or 0.8 W with a THD of 10%. Figure
14 shows the device power dissipation and THD level as a
function of output power.
[Figure 13 - Audio Output Filter Frequency Response]
Speaker Interface
A 100-uF capacitor is used in series with the speaker
in order to block the DC bias and cut off the low
frequencies at the output of the power amplifier. The RC
network produces a high-pass roll-off frequency at:
$$f = \frac{1}{2 \pi R C} = \frac{1}{2 \pi \times 8 \times 100 \times 10^{-6}} = 199\ \text{Hz} \tag{5}$$
This roll-off frequency is a desired value for the
overall audio output characteristic, and it eliminates the
need of low frequency filtering in the filter section.
The
R-C low pass filter from the amplifier output to ground is
the frequency compensation recommended by the manufacturer.
Figure 15 shows the speaker interface circuit.
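The roll-off frequency of the speaker coupling network, and the pole of the first filter section, both follow from the same first-order RC relation. The Python sketch below evaluates it; the function name is hypothetical, and the values R1 = 2K and C1 = .047 uF used in the second call are read from the labels of Figure 11.

```python
import math


def corner_hz(resistance_ohms, capacitance_farads):
    """Corner (pole) frequency of a first-order RC network: 1 / (2 pi R C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * capacitance_farads)


# Speaker interface: 8-ohm speaker in series with the 100-uF capacitor.
print(f"speaker high-pass corner: {corner_hz(8, 100e-6):.0f} Hz")

# First filter section, using the component values labelled in Figure 11.
print(f"first filter pole: {corner_hz(2000, 0.047e-6):.0f} Hz")
```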
Overall Audio Output Performance
An overall Audio Stage performance has been analyzed
with the SPICE circuit simulator.
Appendix F contains the
circuit module along with analysis output file.
Figure 16
shows the overall transfer function over a wide frequency
band (.1Hz-100KHz).
Figure 17 shows the same function
over the significant frequency band (100Hz-10KHz).
It should be noted that since the input voltage to the
circuit model is 1 VRMS (0 dB), the output voltage across
the speaker (V12) is actually a representation of the
overall gain, expressed in dB.
[Figure 14 - LM386 Power Dissipation and THD Level as a
Function of Output Power (Source: National Semiconductor,
Linear Data Book)]
[Figure 15 - Speaker Interface Circuit Diagram]
[Figure 16 - Overall audio stage frequency response over
the wide band (0.1 Hz to 100 KHz); traces VDB(10) and
VDB(12) versus frequency]
[Figure 17 - Overall Audio Stage Frequency Response (narrow
band)]
It may be established that with the volume control set
at 50%, the audio stage has a maximum gain of approximately
14.5 dB at 400 Hz with a bandwidth of 1KHz, centered at 600
Hz.
From the results of Table 12, when the volume control is
set at 100% (full volume), the maximum gain is 20.5 dB.
4.5 Power Supplies Decoupling
The purpose of the power supplies decoupling is to reject
noise injected into the IBM PC power supply lines.
The noise is generated on the resistive path from the power
supply module to the speech card. By adding a capacitor on
the speech card we create a low-pass filter which introduces
a "pole" at a frequency depending on the resistive path and
the capacitor value. Any signal at a frequency higher than
the pole frequency will be attenuated at a rate of -20
dB/decade.
The filtering is especially essential on the +12V line
used to power the output signal.
CHAPTER 5
SOFTWARE DESIGN
5.1 General Description
The program TALK.PAS is designed for phonetic speech
development. It is written in the PASCAL language, using
the TURBO PASCAL software package. The compiled version of
the program is called TALK.COM and may be run by entering
the command 'TALK' at the DOS command level. The usual
preceding disk drive and sub-directory specifiers may be
applied. The program is operated from a main menu, and each
option then applies its own menu or options. An effort was
made to make the program as "user friendly" as possible.
This is accomplished by allowing a flexible format and
providing extensive error messages. Figure 18 shows the
program block diagram.
The options available under the main menu are:
1. Keyboard
2. Text file
3. Talk file
4. Phonemes
5. Dictionary
Selecting option 1 from the main menu puts the program
in a keyboard input mode. Under this mode the operator is
allowed to enter one line only. The line may contain all
the options and attributes that a file may contain. The
purpose of the line mode is for practicing, experimenting,
and developing short phrases, diphthongs, or words.
[Figure 18 - Program Block Diagram]
Selecting option 2 from the main menu puts the program
in a Text file input mode.
Under this mode, a previously
prepared file ('filename.TXT') is read, processed, and sent
out to the speaker.
The output speech data may be saved
under a different file name (.TLK).
The latter may be exe-
cuted by selecting option 3 from the main menu.
Option 2, being the most important feature of the
program, is used for phonetic transcription development.
Using a text editor, a phonetic transcription is written,
following certain rules. The transcription may contain
explanatory portions that are ignored by the program,
phonemes, eight different attributes, and words accessed
from the phonetic dictionary.
Any text editor may be used to prepare the file, although
special control characters should be excluded. WORDSTAR
users should use the editor in a 'Non Document' mode.
Users of other text editors (like Multimate) may need to use
other techniques in order to create a non-document file.
(One way of doing this is by printing the file to a disk.)
A very "handy" editor to use is the desk top utility
program Side Kick. The program contains a 'NOTE' option
as one of its utilities. This option is actually a limited
version of Wordstar in a non-document mode. The editor may
be accessed at any time without terminating the running
program (TALK.COM), a feature that saves a lot of time and
makes the editor appear almost as if it were part of the
program.
Selecting option 3 from the main menu puts the program
in a TALK file input mode.
A file with the extension .TLK
is a previously processed file that contains the speech
output data only.
This option allows the user to listen to
the output speech without waiting for the analysis to complete.
This may save time for a long file, but more impor-
tantly, enables the user to write a separate program that
merges selected .TLK files for a continuous speech.
Selecting option 4 from the main menu puts the program
in a Phonemes input mode.
Under this mode, the user is al-
lowed to access one phoneme at a time.
A delay is added to
the phoneme "normal" duration so that its characteristic
can be better studied.
Should the user wish to hear the
particular phoneme in a normal mode (with or without attribute), option 1 must be used.
Selecting option 5 from the main menu puts the program
in the Phonetic dictionary mode. Under this mode a word
from the dictionary may be selected and announced. Its
phonetic transcription may be altered, or a new word may be
added. Words in the dictionary may be accessed from a text
file (filename.TXT) by placing the word between exclamation
marks. (Example: !HELLO!).
Appendix D contains the program listing and flow
charts.
5.2 Keyboard mode
Selecting this mode allows the user to enter one line
of text from the keyboard.
The line may contain commentary
text (English) as well as phonetic text.
The phonetic text
is contained between a pair of slash marks '/' and is the
only portion of the line that will be processed.
This por-
tion is referred to as a Section and is processed by a procedure having the same name.
A line may contain more than
one section.
A section may contain any number of phonemes separated
by at least one blank. Each phoneme may be followed by an
attribute portion contained between a pair of parentheses.
The attribute portion may contain one to eight different
attributes, separated by at least one blank.
The following is an example of a line format.
xxxxx / H(D2 N5) EH(D3 N12 S3) L(S2) O / xxxxxx
x = any text; ignored by the program.
Table 13 shows the different attributes along with
their associated range, and their associated register.
The phoneme entries must each be one of the 64 valid
phonemes listed in Appendix E.
By the time the line process is finished, the output
speech data array contains the data to be sent out to the
speech processor. The content is first sent to the
display, preceded by a status line containing the present
settings of the eight attributes, and then to the speech
processor for continuous speech.
The display format consists of the register symbol name
(DP, IS, TA, RE, FF) followed by the data in hexadecimal
format. This information may be useful to verify the data
that is being sent out to the speech processor.
Table 13 - Program Speech Attributes
Symbol   Name            Range      Associated
                                    Register
D        Duration        0 - 3      DP
A        Amplitude       0 - 15     TA
N        Inflection      0 - 31     IS
S        Slope Of I.     0 - 7      IS
T        Articulation    0 - 7      TA
R        Rate            0 - 15     RE
E        Ext. Pitch      0 - 15     RE
F        Filter Freq.    0 - 255    FF
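The line format and the attribute ranges of Table 13 can be illustrated with a small parser. The Python sketch below is not the Section procedure of TALK.PAS (whose listing is in Appendix D); it is a minimal illustration with hypothetical names. It extracts the sections between slash marks, splits each section into phonemes with optional parenthesized attributes, and checks every attribute value against the range in Table 13. Checking the phoneme symbols against Appendix E is left out, and the sample line is only an illustrative one in the described format.

```python
import re

# Largest allowed value for each attribute symbol, from Table 13.
ATTRIBUTE_MAX = {"D": 3, "A": 15, "N": 31, "S": 7,
                 "T": 7, "R": 15, "E": 15, "F": 255}

# A phoneme symbol, optionally followed by "( ... )" holding its attributes.
TOKEN = re.compile(r"\s*([^\s()]+)\s*(?:\(([^()]*)\))?")


def parse_attributes(text):
    """Turn 'D2 N5' into {'D': 2, 'N': 5}, checking the ranges of Table 13."""
    attributes = {}
    for item in text.split():
        symbol, digits = item[0].upper(), item[1:]
        if symbol not in ATTRIBUTE_MAX or not digits.isdigit():
            raise ValueError(f"unrecognized attribute: {item}")
        value = int(digits)
        if value > ATTRIBUTE_MAX[symbol]:
            raise ValueError(f"attribute out of range: {item}")
        attributes[symbol] = value
    return attributes


def parse_line(line):
    """Return [(phoneme, attributes), ...] for every section between slash marks."""
    parts = line.split("/")
    if len(parts) % 2 == 0:              # an odd number of slash marks
        raise ValueError("unpaired slash mark")
    phonemes = []
    for section in parts[1::2]:          # text outside the slashes is ignored
        section = section.strip()
        position = 0
        while position < len(section):
            match = TOKEN.match(section, position)
            if match is None:
                raise ValueError(f"cannot parse section at column {position}")
            phonemes.append((match.group(1),
                             parse_attributes(match.group(2) or "")))
            position = match.end()
    return phonemes


print(parse_line("comment / H(D2 N5) EH(D3 N12 S3) L(S2) O / more comment"))
```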
5.3 Text File Mode
In this mode a text file containing a large number of
lines is processed. Each line must be of the same format
as a Keyboard line input discussed earlier. The process
ends when the end of file (EOF) mark is read.
On selecting this mode, the user is first asked to enter
the name of the file to be read. Secondly, the user is
asked whether or not he or she wishes the output data to be
displayed. Upon answering Y (yes), the actual data sent to
the speech processor will be displayed. Should the file
not be found, a run-time error occurs.
The text file is then read from the disk into a memory
buffer, and is processed one line at a time. The line
process is identical to the one discussed in the keyboard
input mode. When an error occurs within the line process,
the file process is terminated, and the error message
indicates the line number as well as the column number of
the error location.
By the time the process is finished, the output data
array contains all the speech data, which is then sent to
the speech processor for continuous speech output. After
the file has been "spoken out", the user has the following
options:
Q - Quit,   <F3> - Repeat,   S - Save,   Any key - new file
Q       - Terminate the Text File input mode and return
          to the main menu.
<F3>    - Repeat the previous speech output. The data
          already in memory is used.
S       - Save the output speech data to a Talk file
          (SameFileName.TLK).
Any key - Stay in the same mode and ask for a new
          filename.
5.4 Talk file input mode
A talk file contains speech output data ready to be
sent to the speech processor. Usually, this file is
created after a successful process of a text file. The file
may be used in order to save the process time of the text
file. It may also be merged with some other Talk files in
order to create a long output speech.
The user is asked to enter up to five file names
(defaulting to extension .TLK). The program then reads them
into a memory buffer and sends them to the speech processor
in the same order for a continuous output speech.
After the files have been "spoken out", the user has the
following options:
Q - Quit,   <F3> - Repeat,   Any key - new file(s)
Q       - Terminate this mode and return to the main menu.
<F3>    - Repeat the same file(s). The data already in
          memory is used. If the file has been modified
          since the last load, the change will not
          be reflected here.
Any key - Stay in this mode and request new file
          name(s).
5.5 Dictionary mode
In this mode the user may retrieve, or modify the phonetic transcription of a word already existing in the phonetic dictionary.
Should a word not exist in the dictio-
nary, the user will be given the option to add it and later
to save the new version of the dictionary.
Search method
The dictionary search is done very efficiently by a
successive approximation method. The word to be searched
is compared with the dictionary word pointed to by a
dictionary pointer. Initially, the pointer is set to the
center of the dictionary. If the value of the word is higher
than that of the dictionary word, the pointer is moved
halfway up the remaining range. If the word value is lower
than the dictionary word value, the pointer is moved halfway
down. If the length of the dictionary is between 2**(n-1)
and 2**n, the total number of search cycles is at most n.
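The successive approximation method described above is what is commonly called a binary search. The Python sketch below illustrates it and counts the search cycles; it is not the dictionary code of TALK.PAS, and the function name and the generated word list are hypothetical. The list must be sorted, and ordinary string comparison stands in for the "value" of a word.

```python
def find_word(dictionary, word):
    """Binary search over a sorted list; returns (index or None, cycles used)."""
    low, high = 0, len(dictionary) - 1
    cycles = 0
    while low <= high:
        cycles += 1
        middle = (low + high) // 2       # pointer at the center of the range
        if dictionary[middle] == word:
            return middle, cycles
        if dictionary[middle] < word:
            low = middle + 1             # continue in the upper half
        else:
            high = middle - 1            # continue in the lower half
    return None, cycles


# A stand-in dictionary of 100 sorted entries (64 < 100 < 128, so n = 7).
words = sorted(f"word{i:03d}" for i in range(100))
index, cycles = find_word(words, "word042")
print(index, cycles)
```

For a 100-word dictionary such as the basic vocabulary, no search in this sketch takes more than 7 cycles.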
Vocabulary
The basic vocabulary of the speech synthesizer
dictionary consists of the most common words used within the
spoken English language. It is important that the speech
synthesizer should, at the very least, pronounce each of
these words accurately because they are the ones which will
be spoken most often.
The list of words shown in Table 14 was compiled
many years ago by Godfrey Dewey at Harvard University. The
words are listed in descending order. The most frequently
spoken word is given at the top, with the relative frequency
of occurrence beside each word. The proper way to
interpret this table would be, for instance, to say that the
word "the" occurs slightly more than 7 times out of 100
words. The next most frequent word, "of", occurs almost 4
times out of 100 words, and so on.
Table 14 - Phonetic Dictionary Basic Vocabulary

  #  Word       %      #  Word       %      #  Word       %      #  Word       %
  1  the     7.31     26  his      .51     51  when     .23     76  may      .16
  2  of      3.99     27  but      .50     52  him      .23     77  man      .15
  3  and     3.28     28  they     .47     53  them     .22     78  about    .15
  4  to      2.92     29  all      .46     54  her      .22     79  over     .15
  5  a       2.12     30  or       .45     55  am       .21     80  some     .15
  6  in      2.11     31  which    .45     56  your     .21     81  these    .15
  7  that    1.34     32  will     .44     57  any      .21     82  two      .14
  8  it      1.21     33  from     .43     58  more     .21     83  very     .14
  9  is      1.21     34  had      .41     59  now      .21     84  before   .13
 10  I       1.15     35  has      .39     60  its      .20     85  great    .13
 11  for     1.03     36  one      .36     61  time     .20     86  could    .13
 12  be       .84     37  our      .33     62  up       .20     87  such     .13
 13  was      .83     38  an       .33     63  do       .20     88  first    .13
 14  as       .78     39  been     .32     64  out      .20     89  upon     .12
 15  you      .77     40  no       .32     65  can      .19     90  every    .12
 16  with     .72     41  their    .31     66  than     .19     91  how      .12
 17  he       .68     42  there    .30     67  only     .18     92  come     .12
 18  on       .64     43  were     .30     68  she      .18     93  us       .12
 19  have     .61     44  so       .30     69  made     .17     94  shall    .12
 20  by       .60     45  my       .29     70  other    .16     95  should   .11
 21  not      .58     46  if       .26     71  into     .16     96  then     .11
 22  at       .58     47  me       .25     72  men      .16     97  like     .11
 23  this     .57     48  what     .25     73  must     .16     98  well     .11
 24  are      .54     49  would    .26     74  people   .16     99  little   .11
 25  we       .55     50  who      .24     75  said     .16    100  say      .11
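As a small worked illustration of how the percentages of Table 14 can be read, the following Python sketch adds up the relative frequencies of the ten most frequent words and estimates how often a word is expected in a text of a given length. The names top_ten and expected_count are hypothetical and are not part of the utility program.

```python
# Relative frequencies (percent of running words) of the ten most
# frequent words, copied from Table 14.
top_ten = {
    "the": 7.31, "of": 3.99, "and": 3.28, "to": 2.92, "a": 2.12,
    "in": 2.11, "that": 1.34, "it": 1.21, "is": 1.21, "I": 1.15,
}


def expected_count(word, text_length):
    """Expected number of occurrences of `word` in `text_length` words."""
    return top_ten[word] * text_length / 100.0


# Combined share of the ten words, in percent.
coverage = round(sum(top_ten.values()), 2)
```

The sum comes to 26.64, so these ten words alone account for slightly more than a quarter of all words spoken, which is why their accurate pronunciation matters most.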
CHAPTER 6
SPEECH DEVELOPMENT
6.1 Introduction
This chapter presents a method for phonetic speech
development. It is assumed that the user is familiar with
the utility program TALK.COM and its various options.
Although the phonetic aspect of speech is beyond the scope
of this work, a short discussion of speech sounds, and of
the SSI263 phonemes in particular, is presented. Equally
important are the phonetic programming methodology, as it
relates to this particular speech processor, and the text
entry format, which must be adhered to precisely.
A Phonetic Speech Development Sheet was developed to aid
the user in speech development; it is presented, followed
by an example.
6.2 Phoneme Discussion
Section 3.2 describes the SSI263 phonetic alphabet as
divided into 3 groups for the purpose of differentiating
between phonemes and allophones. Another way of dividing
the alphabet is according to usage. The most familiar
division is a split into two sections: CONSONANT SOUNDS and
VOWEL SOUNDS. Within each of these sections, sounds may be
further subdivided according to the distinctive features
that best describe the sounds phonetically or acoustically.
Consonant Sounds
There are 22 consonant phonemes, subdivided into 6
categories according to their manner of production in the
human speech mechanism. They may be voiced or unvoiced (no
vocal source during air flow).
Stops
   voiced: B,D,KV
   unvoiced: P,T,K
Fricatives
   voiced: Z,V,J,THV
   unvoiced: S,F,SCH,TH,HF
Affricates
   voiced: D,J
   unvoiced: T,SCH
Semi-Vowels
   voiced: R,L
Glides
   voiced: W,Y
Nasals
   voiced: M,N,NG
Consonant sounds are selected for a sequence in much the
same manner as an alphabet character would be selected for
the spelling of a word. A particular consonant is selected
by distinguishing it from all other consonants.
Vowel Sounds
There are 12 basic vowel phonemes. Vowels are subdivided
according to the manner in which they are produced. All
vowels are voiced sounds, but each has a different output
based on the degree of obstruction created by the opening
of the mouth (closed to open) and changes in the tongue
position (front, medial or back).
The 14 allophones are divided the same way as the basic
vowels. The sounds they emit vary slightly from the basic
vowels that occupy the same positions (in the list below,
the allophones are listed in parentheses next to their
basic phoneme).
Front Vowels (closed to open mouth):
E (YI,E1,IE), I (AY), A (A1), EH (EH1), AE (AE1)
Medial Vowels (closed to open mouth):
(E2), UH (UH1), ER (UH2), (UH3)
Back Vowels (closed to open mouth):
U (U1), OO (IU,IU1), O (OU), AW, AH (AH1)
6.3 Phonetic Programming Methodology
Due to the great variety of phoneme and parameter
(attribute) choices, as well as the different effects the
parameter selections have on the speech sounds, a
systematic approach to selecting the variables is
necessary.
The first step is to transcribe the target word or phrase
into its basic phonetic components. The Keyboard Input mode
may be selected to experiment at this stage and to enter
the selected sounds into the speech synthesizer. The output
is checked by ear and modified. The default values for the
attributes are already programmed in, and may cause the
first trial to sound somewhat misarticulated. Phoneme
adjustment is done next, until an adequate pronunciation of
the target is established.
The second step is the parameter adjustment. First, the
articulation (T), pitch extension (E), and filter frequency
(F) should be maintained at their nominal values.
Adjustment should be made in the levels of only one of the
remaining 4 parameters at a time, beginning with the
duration (D), and moving on to the inflection (N), rate of
speech (R), and amplitude (A) (in that order) once the
specific effect that the parameter can make has been
achieved. Since the phonetic transcription at this stage
may be lengthy, it is advisable to use the Text File Input
mode.
Figure 19 presents graphically (in flow chart form) the
phonetic speech development methodology.
[Figure 19 - Phonetic Speech Development Methodology. Flow
chart: evaluate the target word in terms of vowels and
consonants; select/adjust phonemes; adjust Duration
(attribute D); adjust Inflection (attribute N); adjust Slope
of Inflection (attribute S); adjust Amplitude (attribute A).]

6.4 Text Entry Format
Whether the Keyboard Input or the Text File Input mode is
used, the text is processed on a line by line basis.
Therefore, it is essential to keep the appropriate line
format throughout the file.
A line consists of English text, phonetic transcription,
and dictionary words. English text is completely ignored by
the program and is used solely for the convenience of the
user. It is basically used to identify the target word or
phrase. A phonetic transcription is enclosed between two
slash marks '/ /' and is called a section. A section is
processed by the program and may contain phonemes and
attributes. The phonemes must be separated by at least one
blank. The attributes are enclosed between two parentheses
'( )' and may or may not follow the phoneme. The attribute
duration (D), however, must follow the phoneme since it is
directly related to the phoneme. A line may contain any
number of sections. A section may contain any number of
phonemes and attributes. Any number of blanks may be used
among the different line elements.
A dictionary word may be entered outside a section and must
be enclosed between two exclamation marks '! !'. Should the
word be included in the phonetic dictionary, its phonetic
transcription will be processed.
The following is an example of a typical line input.
hello there /EH(N7 R13) HF(D1 N10 A4 R7) EH1(N8 A13 R14)/
/UH3(D2 N9 A10 R13) LF(D0 N8) UH3(D2 N8 A13 R9) O/
/OU(D0 N6 R10) U(D2 N7 A10 R13) U(D3 N10 A0 R7)/ !THERE!
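The line format can be illustrated with a short Python sketch that separates sections, phonemes, attributes and dictionary words. It is a simplified reading of the rules above and not the parser of TALK.COM: the function name parse_line and the regular expressions are hypothetical, and the sketch assumes that every attribute group is written directly after the phoneme it belongs to.

```python
import re

SECTION = re.compile(r"/([^/]*)/")      # text between two slash marks
DICT_WORD = re.compile(r"!([^!]*)!")    # text between two exclamation marks
# A phoneme name, optionally followed by an attribute group in parentheses.
TOKEN = re.compile(r"([A-Z][A-Z0-9]*)\s*(?:\(([^)]*)\))?")


def parse_line(line):
    """Split one input line into phoneme entries and dictionary words.

    Returns (phonemes, words); phonemes is a list of
    (name, {attribute_letter: value}) tuples.
    """
    phonemes = []
    for section in SECTION.findall(line):
        for name, attrs in TOKEN.findall(section):
            values = {a[0]: int(a[1:]) for a in attrs.split()}
            phonemes.append((name, values))
    # Dictionary words are only looked for outside the sections;
    # everything else outside a section is plain English text.
    outside = SECTION.sub(" ", line)
    words = [w.strip() for w in DICT_WORD.findall(outside)]
    return phonemes, words


line = "hello there /EH(N7 R13) HF(D1 N10 A4 R7)/ /LF(D0 N8) O/ !THERE!"
phonemes, words = parse_line(line)
```

For the sample line, the English text "hello there" is ignored, four phonemes are returned with their attribute values (an empty set for O), and THERE is reported as a dictionary word to be looked up.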
6.5 Phonetic Transcription Development Sheet
The Phonetic Transcription Development Sheet is a document
which was developed in order to aid the user in the process
of speech development. It may also be used to save and
store phonetic transcriptions of words of special interest.
The Phonetic Transcription Sheet is a table consisting of
nine rows and fourteen columns. Each column represents a
phoneme with its associated parameters. The top row
indicates the phoneme, followed by eight additional rows
dedicated to the eight different attributes. Each attribute
entry box shows, in its top portion, the default value
already programmed in. When a particular box is left empty,
the default value is assumed. The top line includes
additional information: the Word (for which the phonetic
transcription is developed), the Step No. (the particular
sheet may be one of several used to achieve the final
result), and a check mark to indicate whether or not the
particular sheet is the final result.
Figure 27 shows an example of the phonetic transcription of
the word 'hello'. Appendix C contains a blank form of the
Phonetic Transcription Sheet which may be reproduced and
used by the user.
6.6 An Example of Phonetic Transcription Development
The following example illustrates, step by step, the
procedure for the development of the phonetic transcription
of the word 'hello'. The eight steps used to develop this
particular word are illustrated in Figure 20 through Figure
27. The result of each step is stored in a disk file named
HELLO.TXT which may be processed and 'spoken' by the
program TALK.COM.
The entry forms of the eight steps are as follows:
Step 1 (original phoneme entry):
/PA PA HF EH L O PA PA/
Step 2 (phoneme selection refinement):
/PA PA HF EH UH3 LF UH3 O OU U PA PA/
Step 3 (Duration adjustment):
/PA PA HF(D1) EH(D0) UH3(D2) LF(D0) UH3(D2) O OU(D0)/
/U(D3) PA(D0) PA/
Step 4 (Phoneme and Duration adjustment):
/PA PA HF(D1) EH1 UH3(D2) LF(D0) UH3(D2) O OU(D0)/
/U(D2) PA(D0) PA/
Step 5 (Inflection adjustment):
/PA(N11) PA HF(D1 N10) EH1(N8) UH3(D2 N9) LF(D0 N8)/
/UH3(D2 N5) O OU(D0 N6) U(D2 N7) PA(D0 N10) PA(N11)/
Step 6 (Phoneme, Duration, Inflection, and Rate of Speech
adjustment):
/PA(N11) PA HF(D1 N10 R7) EH1(N8 R13) UH3(D2 N9 R12)/
/LF(D0 N8) UH3(D2 N5 R9) O OU(D0 N6 R10)/
/U(D2 N7 R12) U(D3 N10 R7) PA(D0 N11 R10) PA(N10)/
Step 7 (Phoneme, Duration, Inflection, Rate of Speech, and
Amplitude adjustment):
/PA(N11) PA EH(N7 A0 R13) HF(D1 N10 R7 A4)/
/EH1(N8 R13 A12) UH3(D2 N9 R12 A10) LF(D0 N8)/
/UH3(D2 N5 R9 A12) O OU(D0 N6 R10)/
/U(D2 N7 R12 A10) U(D3 N10 R7 A0) PA(D0 N11 R10 A12)/
/PA(N10)/
Step 8 (final adjustment for personal preference):
/PA(N13) PA EH(R13 A0) HF(D1 N7 R8 A2)/
/EH1(N9 S2 R13 A12) UH3(D2 N9 S4 R12 A10) LF(D0 S0)/
/UH3(D2 N7 S7 R9 A12) O(N6 S4) OU(D1 N5 S2 R10)/
/U(D2 N6 S3 R8 A3) U(D3 N7 S4 R12 A0) PA(D0 N5 A12)/
/PA(N1)/
[Figures 20 through 27 - Phonetic Transcription Development
Sheets for the word 'hello', Steps 1 through 8. Each sheet
lists the phonemes of the step together with the entries for
Duration (D), Inflection (N), Slope of Inflection (S),
Amplitude (A), Rate of Speech (R), Articulation (T),
Extension Pitch (E), and Filter Frequency (F).]
CHAPTER 7
TEST AND CALIBRATION
7.1 Calibration
A speech board calibration is necessary in order to set the
default values for the output speech. There is a hardware
calibration, which consists of the volume control setting,
and a software calibration, which consists of setting the
speech parameters to their default values.
Hardware Calibration
The hardware calibration consists of the volume control
setting. The phoneme /AW/ is generated and the volume is
adjusted to the desired level. The initial setting should
be around the 50% position.
Software Calibration
The software calibration consists of the attribute default
settings. There are two methods to perform that
calibration. The first is to set some values, check the
output by ear, and make the desired changes. The second is
to analyze a human voice with a spectrograph and to derive
the desired parameter values from the spectrogram.
In this work, the first method was performed first. Next,
the result was recorded and analyzed along with the
author's voice. The word 'hello' was used, and the results
are given in Figures 28 and 29.
The following are the final settings for the default
values.
Duration - D0 (98.3 ms)
Inflection - N10 (85 Hz)
Slope of Inflection - S0 (8 phoneme durations)
Rate of Speech - R10 (normal setting)
Amplitude - A12 (-1.94 dB from the maximum setting)
Articulation - T5 (3 phoneme durations)
Filter Frequency - F233 (21.7 kHz)
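The amplitude figure in this list can be checked with a short Python sketch. It assumes, which the text does not state, that the output amplitude is proportional to the register value with 15 as the maximum setting; the names DEFAULTS, amplitude_db and with_defaults are hypothetical and only illustrate how empty entries fall back to the calibrated defaults.

```python
import math

# Default attribute values taken from the calibration list.
DEFAULTS = {"D": 0, "N": 10, "S": 0, "R": 10, "A": 12, "T": 5, "F": 233}


def amplitude_db(a, full_scale=15):
    """Level of amplitude setting `a` relative to the maximum, in dB.

    Assumption: output amplitude is proportional to the register value.
    """
    return 20.0 * math.log10(a / full_scale)


def with_defaults(overrides):
    """Return a full attribute set; unspecified values come from DEFAULTS."""
    merged = dict(DEFAULTS)
    merged.update(overrides)
    return merged


level = round(amplitude_db(DEFAULTS["A"]), 2)
```

Under this assumption, setting A12 lies 20 log10(12/15) = -1.94 dB below the maximum, in agreement with the value quoted for the default amplitude.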
7.2 Test
The purpose of the test is to establish whether or not an
acceptable output speech may be achieved in a reasonable
time.
Experiments with short words (2-3 syllables), using the
methodology described in Figure 19, demonstrated that
phoneme selection/adjustment alone takes a short time, in
the range of 1-5 minutes. The voice is intelligible but
rather monotonic and lacks human quality. Programming the
speech parameters improves the speech quality considerably
but may take from 10 to 30 minutes, depending on the
quality required.
The speech quality is found to be acceptable. The
development time is reasonable, owing to the easy-to-use
utility program.
[Figure 28 - Wideband Spectrograph of the Word "hello"
Generated by the Author's Voice]
[Figure 29 - Wideband Spectrograph of the Word "hello"
Produced by the Speech Synthesizer]
CHAPTER 8
CONCLUSIONS
A phonetic speech synthesis development system was designed
and developed for use with the IBM PC or compatible
machines. A speech board was designed to interface the
SSI263 speech processor with the IBM PC and to deliver an
adjustable voice output to an external speaker. An
extensive utility program was developed to allow a user to
create effective speech output using the advanced features
of this particular speech processor. These features include
a practice mode (the keyboard input mode), a text file
mode, which permits the user to create a phonetic
transcription using any word processor, and an expandable
phonetic dictionary. Other features include the Phonetic
Speech Development Sheet, a user friendly, menu oriented
program, and an extensive error message utility.
The system performance was tested and found to be
acceptable. The output speech, although somewhat "robotic",
sounds relatively intelligible, even with no parameter
control. This fact is significant, since the time it takes
to develop elementary transcriptions (phoneme selection
only) is minimal.
The addition of parameter control improves the output
speech quality significantly. A certain characteristic may
be "tailored" to a word or sentence. Such characteristics
are intonation, personality (man's or woman's voice), and
speed of speech. The time required to achieve good results
is generally extensive, and depends on the speech quality
required. A time period of 15-30 minutes per word was found
to be sufficient to achieve satisfying results.
In addition to the system design and development, this work
also presents an overview of speech characteristics,
different techniques of speech synthesis, and a method for
speech development using the software developed in this
project.
The power of a phonetic speech synthesizer lies in the fact
that it has a large vocabulary, it is inexpensive to build
(less than $100), and a relatively simple program is needed
to develop speech. The limitations of the phonetic speech
synthesizer are the speech quality, the time it takes to
develop intelligible speech, and the expertise in speech
phonetics that must be developed in order to achieve good
results.
A potential extension of this project might be a text to
phoneme conversion program. Such a program uses a set of
rules to develop a phonetic transcription from English
text. In fact, the prime advantage of using a phonetic
synthesizer over other types of synthesizers is for such an
application.
APPENDIX A
SPEECH CARD DRAWINGS
[Schematic drawings of the speech card. The drawings cannot be
reproduced in this text. Labels that remain legible include
74LS244 buffers (U2, U3), a 74LS138 decoder, 74LS00 and 74LS02
gates, a 74LS74 flip-flop, an LM386 audio amplifier (U10), a
4 MHz clock oscillator, test points, connections for SPEAKER,
SPEAKER RTN, POWER DOWN and POWER DOWN RTN, and the card
address range 300-307 hex.]
NOTES
1. ALL RESISTORS ARE 1/4W, 5%.
2. ALL .1UF CAPACITORS ARE 50V, 20%.
APPENDIX B
IBM PC MEMORY AND I/O MAP

Start Address
Decimal                   Hex

64 to 256K Read/Write Memory on System Board:
0     16K   32K   48K     00000  04000  08000  0C000
64K   80K   96K   112K    10000  14000  18000  1C000
128K  144K  160K  176K    20000  24000  28000  2C000
192K  208K  224K  240K    30000  34000  38000  3C000

Up to 384K Read/Write Memory in I/O Channel:
256K  272K  288K  304K    40000  44000  48000  4C000
320K  336K  352K  368K    50000  54000  58000  5C000
384K  400K  416K  432K    60000  64000  68000  6C000
448K  464K  480K  496K    70000  74000  78000  7C000
512K  528K  544K  560K    80000  84000  88000  8C000
576K  592K  608K  624K    90000  94000  98000  9C000

System Memory Map for 64/256K System Board (Part 1 of 2)
(Source: IBM Technical Reference Manual)
Start Address
Decimal   Hex      Function

128K Reserved:
640K      A0000
656K      A4000
672K      A8000
688K      AC000
704K      B0000    Monochrome
720K      B4000
736K      B8000    Color/Graphics
752K      BC000

192K Read Only Memory Expansion and Control:
768K      C0000
784K      C4000
800K      C8000    Fixed Disk Control
816K      CC000
832K      D0000
848K      D4000
864K      D8000
880K      DC000
896K      E0000
912K      E4000
928K      E8000
944K      EC000

960K      F0000    Reserved
976K      F4000    48K Base System ROM
992K      F8000
1008K     FC000

System Memory Map for 64/256K System Board (Part 2 of 2)
(Source: IBM Technical Reference Manual)
Hex Range*     Usage
000-00F        DMA Chip 8237A-5
020-021        Interrupt 8259A
040-043        Timer 8253-5
060-063        PPI 8255A-5
080-083        DMA Page Registers
0AX**          NMI Mask Register
200-20F        Game Control
210-217        Expansion Unit
2F8-2FF        Asynchronous Communications (Secondary)
300-31F        Prototype Card
320-32F        Fixed Disk
378-37F        Printer
380-38C***     SDLC Communications
380-389***     Binary Synchronous Communications (Secondary)
390-393        Cluster
3A0-3A9        Binary Synchronous Communications (Primary)
3B0-3BF        IBM Monochrome Display/Printer
3D0-3DF        Color/Graphics
3F0-3F7        Diskette
3F8-3FF        Asynchronous Communications (Primary)
790-793        Cluster (Adapter 1)
B90-B93        Cluster (Adapter 2)
1390-1393      Cluster (Adapter 3)
2390-2393      Cluster (Adapter 4)

*   These are the addresses decoded by the current set of adapter
    cards. IBM may use any of the unlisted addresses for future use.
**  At power-on time, the Non Mask Interrupt into the 8088 is masked off.
    This mask bit can be set and reset through system software as follows:
    Set mask: Write hex 80 to I/O Address hex A0 (enable NMI)
    Clear mask: Write hex 00 to I/O Address hex A0 (disable NMI)
*** SDLC Communications and Secondary Binary Synchronous Communications
    cannot be used together because their hex addresses overlap.

I/O Address Map
(Source: IBM Technical Reference Manual)
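The speech card's registers sit inside the Prototype Card range of the I/O map. The Python sketch below checks this for the five register addresses used in the program listing ($300 to $304). The `card_selected` function is an assumption: it models an address decoder that compares bits A9 to A3 with the base address, which would give the eight-address window 300-307 hex noted on the schematic. The function and constant names are illustrative only.

```python
# Register addresses taken from the constants in the program listing.
CARD_REGISTERS = {"DP": 0x300, "IS": 0x301, "RE": 0x302, "TA": 0x303, "FF": 0x304}

# "Prototype Card" entry of the I/O address map.
PROTOTYPE_CARD_RANGE = (0x300, 0x31F)

CARD_BASE = 0x300  # assumed base address of the card


def in_prototype_range(address):
    """True if the address lies in the Prototype Card I/O range."""
    low, high = PROTOTYPE_CARD_RANGE
    return low <= address <= high


def card_selected(address):
    """Assumed decoder: compare address bits A9..A3 with the card base."""
    return (address & 0x3F8) == CARD_BASE
```

With this model, addresses 300 through 307 hex select the card and 308 hex does not.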
APPENDIX C
PHONETIC TRANSCRIPTION DEVELOPMENT SHEET

DATE:          WORD:          PREPARED BY:
STEP NO.:      FINAL: YES __  NO __

PHONEME
D: DURATION            (S) 3-0 (F)
N: INFLECTION          (S) 0-15 (F)
S: SLOPE OF INFL.      (S) 0-7 (F)
A: AMPLITUDE           (L) 0-15 (H)
R: RATE OF SPEECH      (S) 0-15 (F)
T: ARTICULATION        (S) 0-7 (F)
E: EXTENSION PITCH     0-7 LOW, 8-15 HIGH
F: FILTER FREQUENCY    (L) 0-255 (H)

[On the original sheet each row is followed by a line of blank
cells in which the values are entered.]
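The attribute values entered on the sheet end up packed into five register bytes. The Python sketch below mirrors the arithmetic used in the program listing (duration times 64 plus phoneme code, inflection times 8 plus slope, rate times 16 plus extension, articulation times 16 plus amplitude). The function name and the `LIMITS` table are illustrative; the limits follow the range checks in the listing, which accepts inflection values up to 31 although the sheet shows 0-15.

```python
# Upper limits as checked by the attribute procedures in the listing.
LIMITS = {"phoneme": 63, "D": 3, "N": 31, "S": 7,
          "A": 15, "R": 15, "T": 7, "E": 15, "F": 255}


def pack_registers(phoneme, D=0, N=10, S=0, A=12, R=10, T=5, E=8, F=233):
    """Pack attribute values into the five register bytes."""
    values = {"phoneme": phoneme, "D": D, "N": N, "S": S,
              "A": A, "R": R, "T": T, "E": E, "F": F}
    for name, value in values.items():
        if not 0 <= value <= LIMITS[name]:
            raise ValueError(f"{name} must be 0 to {LIMITS[name]}")
    return {
        "DP": D * 64 + phoneme,  # duration in bits 7-6, phoneme in bits 5-0
        "IS": N * 8 + S,         # inflection in bits 7-3, slope in bits 2-0
        "RE": R * 16 + E,        # rate in bits 7-4, extension in bits 3-0
        "TA": T * 16 + A,        # articulation in bits 6-4, amplitude in bits 3-0
        "FF": F,                 # filter frequency uses the whole byte
    }
```

Called with only a phoneme code, the function uses the default attribute values of the listing's Default procedure.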
APPENDIX D
PROGRAM LISTING AND FLOW CHARTS
[Flow charts. The charts cannot be reproduced in this text. Charts
are given for the main program and for the procedures
KeyboardMode, TextFileMode, FileProcess, DictMode, SaveDict,
DictInit, DictSearch, AddToDict, SpeakDictWord, LineProcess,
Section, CatchPhonem, CheckPhonem, Attribute, Duration,
Amplitude, Inflection, StoreData, Default, and Speak. The
corresponding code appears in the program listing that follows.]
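The dictionary procedures keep the word list in alphabetical order, search it by repeated halving, and report the location at which a missing word should be inserted. The Python sketch below shows the same idea using the standard `bisect` module rather than the LoPtr/HiPtr loop of the DictSearch procedure, so it is a substitute technique and not a transcription of the author's code. The function names are illustrative, and locations are 1-based to match the listing.

```python
import bisect


def dict_search(words, target):
    """Return (found, location); location is 1-based.

    If the word is missing, location is where it should be inserted
    to keep the list sorted.
    """
    index = bisect.bisect_left(words, target)
    found = index < len(words) and words[index] == target
    return found, index + 1


def add_to_dict(words, target):
    """Insert a missing word at its sorted location and return the location."""
    found, location = dict_search(words, target)
    if not found:
        words.insert(location - 1, target)
    return location
```

Inserting at the reported location plays the role of the AddToDict procedure, which shifts the later entries up by one before storing the new word.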
(*********************************************************)
(*                                                       *)
(* Program Name: TALK.PAS                                *)
(* Author      : RAPHAEL KFIR                            *)
(* Mod. Date   : October 20, 87                          *)
(* Description : Generate Speech Output to be            *)
(*               used with the Speech Card               *)
(*                                                       *)
(*********************************************************)
program Words(input, output);
{$R+}
const DP=$300; (* phoneme and duration register *)
      IS=$301; (* inflection and inflection slope register *)
      RE=$302; (* rate and extension register *)
      TA=$303; (* articulation/amplitude register *)
      FF=$304; (* filter frequency register *)
      NameOfDict = 'DICT.PHM';
      MaxDictEntry = 100;
type Workstring = string[3];
     LinetoProcess = string[200];
     Register = string[5];
     DictRange = 1..MaxDictEntry;
     RegData = record
       Reg : char;
       Code : integer;
     end;
     DictWord = record
       Eng: string[15];
       Phm: string[200];
     end;
var
  ReadData, PhonemTblPtr, PhonemCode, LinePtr,
  OutSpchPtr, DPRegValue, ISRegValue, RERegValue,
  FFRegValue, TARegValue, TempBuffLine,
  TempBuffLastLine, DValue, AValue, SValue,
  TValue, RValue, EValue, FValue, Result, Mask,
  Index, Code, FirstTmpDig, SecondTmpDig, NValue,
  DictWordPtr, DictLastWord, I, LoPtr, HiPtr, WordLoc
                        : integer;
  PhonemValid, EndofLoop, EndofLineLoop,
  EndofFileLoop, KeyboardInput, FileInput, Error,
  PhonemSet, SameFile, DPRegSet, TARegSet, ISRegSet,
  RERegSet, FFRegSet, DisOutData, EndofDictLoop,
  WordFound, LastLoop   : boolean;
  Line, SourceFileLine, DictFileLine : LinetoProcess;
  Key                   : string[1];
  HexCode, FirstHexDig, SecondHexDig : string[2];
  TempStr               : string[3];
  RegVar                : string[5];
  TextFile              : string[10];
  WordToSearch          : string[15];
  Phonetic              : string[200];
  PhonemArr             : array[1..64] of WorkString;
  Phonem                : WorkString;
  PhonemCodeArr         : array[1..64] of integer;
  SourceFile, DictFile  : text;
  TempBuff              : array[1..50] of LinetoProcess;
  OutSpchReg            : array[1..200] of Register;
  OutSpchCode           : array[1..200] of integer;
  OutSpch               : array[1..200] of RegData;
  Dict                  : array[DictRange] of DictWord;
{---------------------------------------------------------}
procedure init;
begin
(* initialize the phoneme symbol and phoneme code tables *)
PhonemArr[1]  := 'PA';   PhonemCodeArr[1]  := $00;
PhonemArr[2]  := 'E';    PhonemCodeArr[2]  := $01;
PhonemArr[3]  := 'E1';   PhonemCodeArr[3]  := $02;
PhonemArr[4]  := 'Y';    PhonemCodeArr[4]  := $03;
PhonemArr[5]  := 'YI';   PhonemCodeArr[5]  := $04;
PhonemArr[6]  := 'AY';   PhonemCodeArr[6]  := $05;
PhonemArr[7]  := 'IE';   PhonemCodeArr[7]  := $06;
PhonemArr[8]  := 'I';    PhonemCodeArr[8]  := $07;
PhonemArr[9]  := 'A';    PhonemCodeArr[9]  := $08;
PhonemArr[10] := 'AI';   PhonemCodeArr[10] := $09;
PhonemArr[11] := 'EH';   PhonemCodeArr[11] := $0A;
PhonemArr[12] := 'EH1';  PhonemCodeArr[12] := $0B;
PhonemArr[13] := 'AE';   PhonemCodeArr[13] := $0C;
PhonemArr[14] := 'AE1';  PhonemCodeArr[14] := $0D;
PhonemArr[15] := 'AH';   PhonemCodeArr[15] := $0E;
PhonemArr[16] := 'AH1';  PhonemCodeArr[16] := $0F;
PhonemArr[17] := 'AW';   PhonemCodeArr[17] := $10;
PhonemArr[18] := 'O';    PhonemCodeArr[18] := $11;
PhonemArr[19] := 'OU';   PhonemCodeArr[19] := $12;
PhonemArr[20] := 'OO';   PhonemCodeArr[20] := $13;
PhonemArr[21] := 'IU';   PhonemCodeArr[21] := $14;
PhonemArr[22] := 'IU1';  PhonemCodeArr[22] := $15;
PhonemArr[23] := 'U';    PhonemCodeArr[23] := $16;
PhonemArr[24] := 'U1';   PhonemCodeArr[24] := $17;
PhonemArr[25] := 'UH';   PhonemCodeArr[25] := $18;
PhonemArr[26] := 'UH1';  PhonemCodeArr[26] := $19;
PhonemArr[27] := 'UH2';  PhonemCodeArr[27] := $1A;
PhonemArr[28] := 'UH3';  PhonemCodeArr[28] := $1B;
PhonemArr[29] := 'ER';   PhonemCodeArr[29] := $1C;
PhonemArr[30] := 'R';    PhonemCodeArr[30] := $1D;
PhonemArr[31] := 'R1';   PhonemCodeArr[31] := $1E;
PhonemArr[32] := 'R2';   PhonemCodeArr[32] := $1F;
PhonemArr[33] := 'L';    PhonemCodeArr[33] := $20;
PhonemArr[34] := 'L1';   PhonemCodeArr[34] := $21;
PhonemArr[35] := 'LF';   PhonemCodeArr[35] := $22;
PhonemArr[36] := 'W';    PhonemCodeArr[36] := $23;
PhonemArr[37] := 'B';    PhonemCodeArr[37] := $24;
PhonemArr[38] := 'D';    PhonemCodeArr[38] := $25;
PhonemArr[39] := 'KV';   PhonemCodeArr[39] := $26;
PhonemArr[40] := 'P';    PhonemCodeArr[40] := $27;
PhonemArr[41] := 'T';    PhonemCodeArr[41] := $28;
PhonemArr[42] := 'K';    PhonemCodeArr[42] := $29;
PhonemArr[43] := 'HV';   PhonemCodeArr[43] := $2A;
PhonemArr[44] := 'HVC';  PhonemCodeArr[44] := $2B;
PhonemArr[45] := 'HF';   PhonemCodeArr[45] := $2C;
PhonemArr[46] := 'HFC';  PhonemCodeArr[46] := $2D;
PhonemArr[47] := 'HN';   PhonemCodeArr[47] := $2E;
PhonemArr[48] := 'Z';    PhonemCodeArr[48] := $2F;
PhonemArr[49] := 'S';    PhonemCodeArr[49] := $30;
PhonemArr[50] := 'J';    PhonemCodeArr[50] := $31;
PhonemArr[51] := 'SCH';  PhonemCodeArr[51] := $32;
PhonemArr[52] := 'V';    PhonemCodeArr[52] := $33;
PhonemArr[53] := 'F';    PhonemCodeArr[53] := $34;
PhonemArr[54] := 'THV';  PhonemCodeArr[54] := $35;
PhonemArr[55] := 'TH';   PhonemCodeArr[55] := $36;
PhonemArr[56] := 'M';    PhonemCodeArr[56] := $37;
PhonemArr[57] := 'N';    PhonemCodeArr[57] := $38;
PhonemArr[58] := 'NG';   PhonemCodeArr[58] := $39;
PhonemArr[59] := ':A';   PhonemCodeArr[59] := $3A;
PhonemArr[60] := ':OH';  PhonemCodeArr[60] := $3B;
PhonemArr[61] := ':U';   PhonemCodeArr[61] := $3C;
PhonemArr[62] := ':UH';  PhonemCodeArr[62] := $3D;
PhonemArr[63] := 'E2';   PhonemCodeArr[63] := $3E;
PhonemArr[64] := 'LB';   PhonemCodeArr[64] := $3F;
end; (* init *)
{---------------------------------------------------------}
Procedure Default;
begin
EndofLoop := false;
port[TA] := $80; (* CTL is HI *)
port[DP] := $C0; (* mode is HH *)
port[TA] := $00; (* load the mode *)
DValue := 0;
TValue := 5;
AValue := 12;
NValue := 10; SValue := 0;
RValue := 10; EValue := 8;
FValue := 233;
DPRegValue := $00; (* set registers default values *)
TARegValue := $5C; (* T = 5, A = 12 *)
FFRegValue := $E9; (* F = 233 *)
ISRegValue := $50; (* N = 10, S = 0 *)
RERegValue := $A8; (* R = 10, E = 8 *)
port[DP] := DPRegValue; (* initialize registers *)
port[FF] := FFRegValue;
port[TA] := TARegValue;
port[RE] := RERegValue;
port[IS] := ISRegValue;
end;
{---------------------------------------------------------}
procedure CheckPhonem;
begin
PhonemTblPtr :=1;
PhonemValid := false;
repeat
if Phonem = PhonemArr[PhonemTblPtr] then
begin
PhonemCode := PhonemCodeArr[PhonemTblPtr];
PhonemValid := true
end;
PhonemTblPtr := PhonemTblPtr + 1
until (PhonemTblPtr=65) or PhonemValid;
if not PhonemValid then
begin
writeln;
writeln('ERROR*** Phoneme ',phonem,' is not valid!');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
Error := true;
end;
end;
{---------------------------------------------------------}
procedure CatchPhonem;
begin
Phonem := '';
while (LinePtr <= length(Line)) and
(Line[LinePtr] <> ' ') and
(Line[LinePtr] <> '(') and
(Line[LinePtr] <> '/') do
begin
Phonem := Phonem + Line[LinePtr];
LinePtr := LinePtr + 1;
end;
end;
{---------------------------------------------------------}
function PhonemDone:boolean;
(* Read D7 and loop until HI is read *)
begin
repeat
ReadData := port[DP];
(* loop until D7 is HI *)
until ReadData >= $80;
PhonemDone := true;
end;
(* PhonemDone *)
{---------------------------------------------------------}
Procedure HexConv;
(* Convert Code (0..255) to a two-digit hex string in HexCode. *)
(* The case statements select on the integer digit, since a    *)
(* Pascal case selector must be of an ordinal type.            *)
begin
FirstTmpDig := Code div 16;
SecondTmpDig := Code mod 16;
HexCode := '';
str(FirstTmpDig,FirstHexDig);
case FirstTmpDig of
10: FirstHexDig := 'A';
11: FirstHexDig := 'B';
12: FirstHexDig := 'C';
13: FirstHexDig := 'D';
14: FirstHexDig := 'E';
15: FirstHexDig := 'F';
end; (* case *)
str(SecondTmpDig,SecondHexDig);
case SecondTmpDig of
10: SecondHexDig := 'A';
11: SecondHexDig := 'B';
12: SecondHexDig := 'C';
13: SecondHexDig := 'D';
14: SecondHexDig := 'E';
15: SecondHexDig := 'F';
end; (* case *)
HexCode := FirstHexDig + SecondHexDig
end;
{---------------------------------------------------------}
Procedure Speak;
begin
writeln; writeln;
write('The Registers setting is: D',DValue,' A',AValue);
write(' T',TValue,' N',NValue,' S',SValue,' R',RValue);
writeln(' E',EValue,' F',FValue);
{ Display output Data }
if DisOutData then
begin
Index := 1;
while index <= OutSpchPtr -1 do
begin
Code := OutSpch[Index].code;
Hexconv;
write(Index,'.',OutSpch[Index].reg,'-$',HexCode,'  ');
Index := Index + 1;
end;
end;
{ Send Output Data to Processor }
Index := 1;
while index <= outspchPtr -1 do
begin
Key:= OutSpch[Index].reg;
case Key of
'D' : begin
repeat until PhonemDone;
port[DP] := Outspch[Index].Code;
end;
'T' : port[TA] := OutSpch[Index].Code;
'R' : port[RE] := OutSpch[Index].Code;
'I' : port[IS] := OutSpch[Index].Code;
'F' : port[FF] := OutSpch[Index].Code;
end; (* case *)
Index := Index + 1;
end; (* while *)
repeat until PhonemDone;
Port[DP] := $00;
end;
{---------------------------------------------------------}
Procedure StoreData;
begin
if DPRegset then
begin
outspch[OutspchPtr].Reg := 'D';
outspch[OutspchPtr].Code := DPRegValue;
outspchPtr := OutspchPtr + 1;
end;
if TARegSet then
begin
OutSpch[OutSpchPtr].Reg := 'T';
outspch[OutspchPtr].Code := TARegValue;
outSpchPtr := outspchPtr + 1;
end;
if ISRegSet then
begin
outspch[OutspchPtr].Reg :='I';
OutSpch[OutspchPtr].Code := ISRegValue;
outspchPtr := outspchPtr + 1;
end;
if RERegSet then
begin
outspch[OutspchPtr].Reg := 'R';
outspch[OutSpchPtr].Code := RERegValue;
outspchPtr := outSpchPtr + 1;
end;
if FFRegSet then
begin
OutSpch[OutSpchPtr].Reg := 'F';
outSpch[OutspchPtr].Code := FFRegValue;
OutspchPtr := outspchPtr + 1;
end;
end;
{---------------------------------------------------------}
Procedure Duration;
begin
LinePtr := LinePtr + 1;
val(Line[LinePtr],DValue,Result);
if DValue < 4 then
begin
if PhonemSet
then Mask := DPRegValue and $3F
else Mask := DPRegValue and $00;
DPRegValue := Mask + DValue * 64;
DPRegSet := true;
end
else
begin
writeln;
writeln('ERROR*** Duration may only be set 0 to 3');
writeln('Location: Row ',TempBuffLine,' Column',
LinePtr-1);
Error := true;
end;
{LinePtr is now pointing at the end of the D value }
end; (* D *)
{---------------------------------------------------------}
Procedure Amplitude;
begin
LinePtr := LinePtr + 1;
TempStr := ' ' ;
TempStr := Line[LinePtr];
LinePtr := LinePtr + 1;
if (Line[LinePtr] = ' ') or (Line[LinePtr] = ')')
then
begin
val(TempStr,AValue,Result);
LinePtr := LinePtr - 1;
end
else
begin
Tempstr := TempStr + Line[LinePtr];
val(TempStr,AValue,Result);
end;
if AValue > 15 then
begin
writeln;
write('ERROR*** the maximum value for attribute');
writeln('A is 15!');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
error := true;
end
else
begin
Mask := TARegValue and $F0;
TARegValue := Mask + AValue;
TARegSet := true;
end;
(* if *)
{ LinePtr is now pointing at the end of the A value }
end; (* A *)
{---------------------------------------------------------}
Procedure Inflection;
begin
LinePtr := LinePtr + 1;
TempStr := ' ' ;
TempStr := Line[LinePtr];
LinePtr := LinePtr + 1;
if (Line[LinePtr] = ' ') or (Line[LinePtr] = ')')
then
begin
val(TempStr,NValue,Result);
LinePtr := LinePtr - 1;
end
else
begin
Tempstr := Tempstr + Line[LinePtr];
val(Tempstr,NValue,Result);
end;
if NValue > 31 then
begin
writeln;
write('ERROR*** the maximum value for Target Inflection ');
writeln('(N) is 31!');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
error := true;
end
else
begin
(* set the new register value *)
Mask := ISRegValue and $07; (* keep the slope bits 2-0 *)
ISRegValue := Mask + NValue * 8;
ISRegSet := true;
end;
(* if *)
{ LinePtr is now pointing at the end of the N value }
end; (* N *)
{---------------------------------------------------------}
Procedure Slopinflection;
begin
LinePtr := LinePtr + 1;
val(Line[LinePtr],SValue,Result);
if SValue < 8 then
begin
Mask := ISRegValue and $F8;
ISRegValue := Mask + SValue;
ISRegSet := true;
end
else
begin
writeln;
writeln('ERROR*** Slope of inflection (S) may only be 0 to 7');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
Error := true;
end;
{LinePtr is now pointing at the end of the S value }
end; (* s *)
{---------------------------------------------------------}
Procedure arTiculation;
begin
LinePtr := LinePtr + 1;
val(Line[LinePtr],TValue,Result);
if TValue < 8 then
begin
Mask := TARegValue and $8F; (* keep CTL (bit 7) and amplitude (bits 3-0) *)
TARegValue := Mask + TValue * 16;
TARegSet := true;
end
else
begin
writeln;
writeln('ERROR*** Articulation (T) may only be 0 to 7');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
Error := true;
end;
{LinePtr is now pointing at the end of the T value }
end; (* T *)
{---------~-----------------------------------------------}
Procedure RateofSpeech;
begin
LinePtr := LinePtr + 1;
TempStr := ' ';
TempStr := Line[LinePtr];
LinePtr := LinePtr + 1;
if (Line[LinePtr] = ' ') or (Line[LinePtr] = ')')
then
begin
val(Tempstr,RValue,Result);
LinePtr := LinePtr - 1;
end
else
begin
TempStr := TempStr + Line[LinePtr];
val(TempStr,RValue,Result);
end;
if RValue > 15 then
begin
writeln;
write('ERROR*** the maximum value for Rate of Speech ');
writeln('(R) is 15!');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
error := true;
end
else
begin
Mask := RERegValue and $0F;
RERegValue := Mask + RValue * 16;
RERegSet := true;
end; (* if *)
{ LinePtr is now pointing at the end of the R value }
end; (* R *)
{---------------------------------------------------------}
Procedure ExtensionPitch;
begin
LinePtr := LinePtr + 1;
Tempstr := ' ' ;
Tempstr := Line[LinePtr];
LinePtr := LinePtr + 1;
if (Line[LinePtr] = ' ') or (Line[LinePtr] = ')')
then
begin
val(Tempstr,EValue,Result);
LinePtr := LinePtr - 1;
end
else
begin
TempStr := Tempstr + Line[LinePtr];
val(TempStr,EValue,Result);
end;
if EValue > 15 then
begin
writeln;
write('ERROR*** the maximum value for Extension of Range Pitch ');
writeln('(E) is 15!');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
error := true;
end
else
begin
Mask := RERegValue and $F0;
RERegValue := Mask + EValue;
RERegSet := true;
end;
(* if *)
{ LinePtr is now pointing at the end of the E
value }
end; (* E *)
{---------------------------------------------------------}
Procedure FilterFrequency;
begin
LinePtr := LinePtr + 1;
TempStr := ''; (* start empty: TempStr holds at most three digits *)
repeat
TempStr := TempStr + Line[LinePtr];
LinePtr := LinePtr + 1;
until (Line[LinePtr] = ' ') or (Line[LinePtr] = ')');
val(TempStr,FValue,Result);
if FValue > 255 then
begin
writeln;
write('ERROR*** the maximum value for Filter Frequency ');
writeln('(F) is 255!');
writeln('Location: Row ',TempBuffLine,' Column ',
LinePtr-1);
error := true;
end
else
begin
FFRegValue := FValue;
FFRegSet := true;
end;
(* if *)
LinePtr := LinePtr - 1;
{ LinePtr is now pointing at the end of the F value }
end; (* F *)
{---------------------------------------------------------}
Procedure Attribute;
begin
LinePtr := LinePtr + 1;
while (Line[LinePtr] <> ')') and not Error do
begin
while Line[LinePtr] = ' ' do LinePtr := LinePtr + 1;
Key := Line[LinePtr];
case Key of
'D' : Duration;
'A' : Amplitude;
'N' : Inflection;
'S' : Slopinflection;
'T' : arTiculation;
'R' : RateofSpeech;
'E' : ExtensionPitch;
'F' : FilterFrequency;
')' : LinePtr := LinePtr - 1;
else
writeln;
writeln('ERROR*** Illegal attribute (may only use D,A,N,S,T,R,E,F)');
writeln('Location: Row ',TempBuffLine,' Column ',LinePtr);
Error := true;
end; (* case *)
LinePtr := LinePtr + 1;
end;
(* while *)
LinePtr := LinePtr + 1;
end; (* Attribute *)
{--------------------------------------------------------}
Procedure Section;
begin
repeat
LinePtr := LinePtr + 1;
DPRegset := false;
TARegSet := false;
ISRegSet := false;
RERegSet := false;
FFRegSet := false;
Phonemset := false;
while Line[LinePtr] = ' ' do LinePtr := LinePtr + 1;
if Line[LinePtr] = '(' then Attribute
else if Line[LinePtr] <> '/' then
begin
CatchPhonem; (*get phoneme from the input line*)
CheckPhonem; (*check if the phoneme is valid *)
Mask := DPRegValue and $C0;
DPRegValue := Mask + PhonemCode;
DPRegset := true;
Phonemset :=true;
if Line[LinePtr] = '('then Attribute;
end;
StoreData;
until (Line[LinePtr] = '/') or Error;
end;
{---------------------------------------------------------}
Procedure LineProcess;
begin
Error := false;
repeat
while (LinePtr <= length(Line)) and (Line[LinePtr] <> '/') do
begin
LinePtr := LinePtr + 1;
end;
if Line[LinePtr] = '/' then Section;
LinePtr := LinePtr + 1;
until (LinePtr >= length(Line)) or error;
end;
{---------------------------------------------------------}
procedure FileProcess;
begin
assign(SourceFile, TextFile);
reset(SourceFile);
clrscr;
writeln('Reading FILE1.TXT');
writeln;
TempBuffLine := 1;
while not eof(SourceFile) do
begin (* reading the selected file *)
readln(SourceFile, SourceFileLine);
Tempbuff[TempBuffLine] := sourceFileLine;
TempBuffLine := TempBuffLine + 1;
end;
close(SourceFile);
TempBuffLastLine := TempBuffLine - 1;
writeln('File has total of', TempBuffLastLine,' lines');
writeln;
TempBuffLine :=1;
repeat
(* writing file to screen *)
begin
writeln(TempBuff[TempBuffLine]);
TempBuffLine := TempBuffLine + 1;
end;
until TempBuffLine = TempBuffLastLine + 1;
TempBuffLine :=1;
repeat
(* process lines *)
begin
Line := TempBuff[TempBuffLine];
LinePtr := 1;
LineProcess;
TempBuffLine := TempBuffLine + 1;
end;
until TempBuffLine = TempBuffLastLine + 1;
end; (* FileProcess *)
{--------------------------------------------------------}
Procedure KeyboardMode;
begin
DisOutData := True;
clrscr;
writeln('KEYBOARD INPUT'); writeln;
EndofLineLoop := false;
repeat
writeln; writeln;
writeln('Enter Line Please.... (Q to quit)');
read(Line);
if Line = 'DEF' then Default else
if Line = '' then Speak else
if Line = '=' then Speak else
if Line = 'Q' then EndofLineLoop := True else
if Line = 'q' then EndofLineLoop := True
else
begin
OutspchPtr := 1;
LinePtr := 1;
LineProcess;
if not Error then Speak;
end;
until endofLineLoop
end;
{--------------------------------------------------------}
Procedure TextFileMode;
begin
DisoutData := false;
repeat
(* until EndofFileLoop *)
EndofFileLoop := false;
outSpchPtr := 1;
clrscr;
writeln('READING TEXT FILE');writeln;
writeln('Would you like to display output data (Y/N)?');
repeat until keypressed;
read(kbd,Key);
if Key = 'Y' then DisOutData := True;
writeln('Enter name of Text File (.txt)');
repeat
readln(Textfile);
until Textfile <> '';
TextFile := TextFile + '.TXT';
FileProcess;
if not Error then speak;
repeat (* SameFile *)
begin
Default; (* reset all attributes to default *)
SameFile := false;
writeln;
writeln;write('(Enter Q to quit, F3 to repeat,');
writeln(' any key to continue)');
repeat until keypressed;
read(Key);
if (Key= 'q') or (Key= 'Q') then
endofFileLoop := true;
if Key = '=' then
begin
speak;
SameFile := true;
end;
end;
until not Samefile
until EndofFileLoop
end;
{--------------------------------------------------------}
Procedure Dictsearch;
begin
Wordfound := false;
LastLoop := false;
LoPtr := 0; HiPtr := DictLastWord;
repeat
if HiPtr - Loptr = 1 then LastLoop := true;
DictwordPtr := (LoPtr + HiPtr) div 2;
if (LoPtr + hiPtr) mod 2 = 1 then
DictWordPtr := DictWordPtr + 1;
if WordToSearch > Dict[DictWordPtr].Eng then
begin
LoPtr := DictWordPtr;
WordLoc := DictWordPtr +1 ;
end
else
if WordToSearch < Dict[DictWordPtr].Eng then
begin
HiPtr := DictWordPtr;
WordLoc := DictwordPtr;
end
else
begin
WordFound := True;
WordLoc := DictWordPtr;
LastLoop := True;
end
until LastLoop;
end;
{---------------------------------------------------------}
Procedure AddToDict;
begin
Writeln('Enter the Phonetic Transcription of the word...');
readln(Phonetic);
for I := DictLastWord downto WordLoc do
begin
Dict[I+1].Eng := Dict[I].Eng;
Dict[I+1].Phm := Dict[I].Phm;
end;
DictLastWord := DictLastWord + 1;
Dict[WordLoc].eng := WordToSearch;
Dict[WordLoc].phm :=phonetic;
end;
{-----------------------------------------~--------------}
Procedure Dictinit;
begin
writeln('Reading Phonetic Dictionary....');
assign(DictFile, NameOfDict);
reset(DictFile);
clrscr;
DictWordPtr:=1;
while not eof(DictFile) do (* reading dictionary *)
begin
readln(DictFile, DictFileLine);
Dict[DictWordPtr].Eng := DictFileLine;
readln(DictFile, DictFileLine);
Dict[DictWordPtr].Phm := DictFileLine;
DictWordPtr := DictWordPtr + 1;
end;
close(DictFile);
DictLastWord := DictWordPtr - 1;
end;
{--------------------------------------------------------}
Procedure SaveDict;
begin
assign(DictFile,'DICT.BAK');
erase(DictFile);
assign(DictFile, Nameofdict);
rename(DictFile, 'DICT.BAK');
assign(DictFile, Nameofdict);
rewrite(DictFile);
for DictwordPtr := 1 to DictLastWord do
begin
DictFileLine := Dict[DictWordPtr].Eng;
writeln(DictFile, DictFileLine);
DictFileLine := Dict[DictWordPtr].Phm;
writeln(DictFile, DictFileLine);
end;
close(DictFile);
end;
{--------------------------------------------------------}
Procedure SpeakDictWord;
begin
Line := Dict[DictWordPtr].Phm;
Default;
LinePtr := 1;
outSpchPtr :=1;
LineProcess;
if not Error then Speak;
end;
{---------------------------------------------------------}
Procedure DictMode;
begin
EndofDictLoop := false;
repeat (* until EndofDictLoop *)
clrscr; writeln('DICTIONARY MODE');
for DictwordPtr := 1 to DictLastWord do
writeln(DictwordPtr,'.',dict[DictWordPtr].Eng);
Writeln('enter word to search ... ');
readln(WordToSearch);
Dictsearch;
if WordFound then
begin
writeln('Word was found in location ', WordLoc);
SpeakDictword;
end
else
begin
Write('Word was not found! should be inserted');
writeln(' in location ',WordLoc);
writeln; writeln('Would you like to add that word? (Y/N) ');
repeat; read(kbd,Key);
until (Key= 'Y') or (Key= 'N');
if key = 'Y' then AddToDict;
end;
writeln; writeln; writeln('Want to quit? (Y/N) ');
read(kbd,Key);
if (Key = 'Y') or (Key = 'y') then
EndofDictLoop := True;
until EndofDictLoop;
writeln; writeln('Would you like to save changes? (Y/N)');
read(kbd,key);
if (Key = 'Y') or (Key = 'y') then SaveDict;
end;
{--------------------------------------------------------}
begin (* main *)
Default;
Init;
port[DP] := $00;
clrscr;
EndofLoop := false;
Dictinit;
repeat
OutSpchPtr := 1;
TempBuffLine := 1;
Default;
(* Reset the registers to default values *)
clrscr;
writeln('Main Menu');
writeln;
writeln('1. Keyboard');
writeln('2. Text File');
writeln('3. Talk File');
writeln('4. Phonemes');
writeln('5. dictionary');
writeln('Q. Quit');
writeln;
write('Enter Selection');
repeat until keypressed;
read(kbd,Key);
case Key of
'1' : KeyboardMode;
'2' : TextFileMode;
'5' : DictMode;
'Q' : EndofLoop := True;
'q' : EndofLoop := True;
end; (* case *)
until EndofLoop or Error
end.
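
The dictionary routines in the listing above keep the Dict array sorted on its Eng field: the search halves the interval on each pass and reports where a missing word belongs, and AddToDict opens that slot by shifting the tail of the array up by one. The program below is a minimal, self-contained sketch of that idea only, not the project's own code. The names SmallDict, Find, and InsertAt, the array size, and the sample words are hypothetical, and the sketch uses a conventional lo/hi binary search rather than the pointer arithmetic of the listing.

```pascal
program DictSketch;
{ Hypothetical sketch: sorted array with binary search and shifting insert. }
{ No check is made against MaxWords; a real program would need one.        }
const
  MaxWords = 10;
type
  WordStr = string[20];
var
  SmallDict : array[1..MaxWords] of WordStr;
  LastWord  : integer;
  Loc       : integer;

{ Returns True if W is present. Loc receives its index, or the index }
{ at which W should be inserted to keep the array sorted.            }
function Find(W : WordStr; var Loc : integer) : boolean;
var
  Lo, Hi, Mid : integer;
  Found : boolean;
begin
  Lo := 1;
  Hi := LastWord;
  Mid := 1;
  Found := false;
  while (Lo <= Hi) and not Found do
  begin
    Mid := (Lo + Hi) div 2;
    if W > SmallDict[Mid] then
      Lo := Mid + 1
    else if W < SmallDict[Mid] then
      Hi := Mid - 1
    else
      Found := true;
  end;
  if Found then Loc := Mid else Loc := Lo;
  Find := Found;
end;

{ Opens a slot at Loc by moving entries Loc..LastWord up one place. }
procedure InsertAt(W : WordStr; Loc : integer);
var
  I : integer;
begin
  for I := LastWord downto Loc do
    SmallDict[I + 1] := SmallDict[I];
  SmallDict[Loc] := W;
  LastWord := LastWord + 1;
end;

begin
  SmallDict[1] := 'HELLO';
  SmallDict[2] := 'WORLD';
  LastWord := 2;
  if not Find('SPEECH', Loc) then
    InsertAt('SPEECH', Loc);
  for Loc := 1 to LastWord do
    writeln(Loc, '. ', SmallDict[Loc]);
end.
```

Because the shift runs from the last entry downward, no entry is overwritten before it has been copied, which is the same reason AddToDict counts down from DictLastWord to WordLoc.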
APPENDIX E
PHONEME CHART
PHONEME CHART
(Listed by Code)
Hex Code   Symbol   Example Word
=======================================
00         PA       (pause)
01         E        mEEt
02         E1       bEnt
03         Y        bEfore
04         YI       Year
05         AY       plEase
06         IE       anY
07         I        sIx
08         A        mAde
09         AI       cAre
0A         EH       nEst
0B         EH1      After
0C         AE       dAd
0D         AE1      After
0E         AH       gOt
0F         AH1      fAther
10         AW       Office
11         O        stOre
12         OU       bOAt
13         OO       lOOk
14         IU       yOU
15         IU1      cOUld
16         U        tUne
17         U1       cartOOn
18         UH       wOnder
19         UH1      lOve
1A         UH2      whAt
1B         UH3      nUt
1C         ER       biRd
1D         R        Roof
1E         R1       Rug
1F         R2       muttER (German)
20         L        Lift
21         L1       pLay
22         LF       faLL (final)
23         W        Water
24         B        Bag
25         D        paiD
26         KV       taG (glottal stop)
27         P        Pen
28         T        Tart
29         K        Kit
2A         HV       (hold vocal)
2B         HVC      (hold vocal closure)
2C         HF       Heart
2D         HFC      (hold fricative closure)
2E         HN       (hold nasal)
2F         Z        Zero
30         S        Same
31         J        meaSure
32         SCH      SHip
33         V        Very
34         F        Four
35         THV      THere
36         TH       wiTH
37         M        More
38         N        NiNe
39         NG       raNG
3A         :A       mArchen (German)
3B         :OH      lOwe (French)
3C         :U       fUnf (German)
3D         :UH      menU (French)
3E         E2       bittE (German)
3F         LB       Lube
PHONEME CHART
(Listed by Phonemes)
Hex Code   Symbol   Example Word
=======================================
3A         :A       mArchen (German)
3B         :OH      lOwe (French)
3C         :U       fUnf (German)
3D         :UH      menU (French)
08         A        mAde
0C         AE       dAd
0D         AE1      After
0E         AH       gOt
0F         AH1      fAther
09         AI       cAre
10         AW       Office
05         AY       plEase
24         B        Bag
25         D        paiD
01         E        mEEt
02         E1       bEnt
3E         E2       bittE (German)
0A         EH       nEst
0B         EH1      After
1C         ER       biRd
34         F        Four
2C         HF       Heart
2D         HFC      (hold fricative closure)
2E         HN       (hold nasal)
2A         HV       (hold vocal)
2B         HVC      (hold vocal closure)
07         I        sIx
06         IE       anY
14         IU       yOU
15         IU1      cOUld
31         J        meaSure
29         K        Kit
26         KV       taG (glottal stop)
20         L        Lift
21         L1       pLay
3F         LB       Lube
22         LF       faLL (final)
37         M        More
38         N        NiNe
39         NG       raNG
11         O        stOre
13         OO       lOOk
12         OU       bOAt
27         P        Pen
00         PA       (pause)
1D         R        Roof
1E         R1       Rug
1F         R2       muttER (German)
30         S        Same
32         SCH      SHip
28         T        Tart
36         TH       wiTH
35         THV      THere
16         U        tUne
17         U1       cartOOn
18         UH       wOnder
19         UH1      lOve
1A         UH2      whAt
1B         UH3      nUt
33         V        Very
23         W        Water
03         Y        bEfore
04         YI       Year
2F         Z        Zero
APPENDIX F
OVERALL AUDIO STAGE ANALYSIS USING SPICE
[Figure: AUDIO OUTPUT STAGE, SPICE MODEL. Schematic not reproduced; the component values and node numbers appear in the netlist that follows.]
* AUDIO.CIR
**** CIRCUIT DESCRIPTION
**********************************************************
** Analysis of the Audio Out of the Speech Synthesizer Card
*.OPT ACCT LIST NODE OPTS
.WIDTH OUT=80
.PROBE
.PROBE VDB(8) VP(8) VDB(10) VP(10) VDB(12) VP(12)
.AC DEC 10 .1 100K
V1 1 0 AC 1
V2 3 1 DC 2.5
C1 3 4 10U
R1 4 5 2K
C2 5 0 .047U
R2 5 6 3.3K
R3 6 8 5K
R4 8 0 5K
C3 8 0 .047U
E1 9 0 8 0 20
V3 10 9 DC 6
R5 10 11 10
C5 11 0 0.1U
C6 12 10 100U
R6 12 0 8
.PRINT AC VDB(12) VP(12)
.END
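
As a rough hand check on the AC sweep, the output coupling stage can be evaluated directly from the netlist values. Node 10 is driven by the ideal sources E1 and V3, so the R5-C5 branch does not affect its voltage, and C6 (100 uF) with the 8 ohm load R6 forms a first-order high-pass between nodes 10 and 12. The controlled source E1 contributes a fixed gain of 20. This is a sketch under the idealized model only; reading R6 as the loudspeaker load is an assumption.

```latex
\frac{V_{12}}{V_{10}} = \frac{s R_6 C_6}{1 + s R_6 C_6},
\qquad
f_c = \frac{1}{2\pi R_6 C_6}
    = \frac{1}{2\pi \,(8\ \Omega)(100\ \mu\mathrm{F})}
    \approx 199\ \mathrm{Hz}

20 \log_{10}(20) \approx 26.0\ \mathrm{dB}
```

On these figures, VDB(12) should sit about 3 dB below VDB(10) near 199 Hz and track it closely well above that frequency.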