The mental representation of speech sounds

PSYCH 150 / LIN 155, Psychology of Language (01.10.13)
Prof. Jon Sprouse, syn lab, UCI Cognitive Sciences
A logical organization
For clarity’s sake, we’ll organize our investigations around the direction of comprehension, from sensory to conceptual:
1. sounds
2. words
3. sentences
We’ll also organize our investigation around the logical order of questions we
saw last time:
1. What are the properties of the mental representations that must be
constructed?
2. What processes does the brain deploy to construct those mental
representations?
3. How do the physical parts of the brain implement those processes?
What is the mental representation of speech sounds?
Physical signals and Percepts
When a person speaks, they emit a physical signal (e.g., “ah”).
What are the physical properties of the signal?
You perceive this physical signal; the mental object that you perceive is often called a percept.
What is the mental representation of the percept?
The simplest hypothesis
The simplest possible hypothesis is that the mental representation of speech sounds is the physical properties of the signal itself:
signal “ah” → percept “ah” = the physical properties of the signal
This suggests a very straightforward workflow:
1. Identify all of the speech sounds that make up a language (or all languages)
2. Identify the physical properties of the speech sounds
3. Use those physical properties as the definition of the mental representations
Identifying the speech sounds of a language (or all languages)
Identifying the set of speech sounds of a language (or all languages)
Speech is a continuous stream of sound. Although there are breaks in the acoustic signal, they probably don’t correspond to your intuitions about where one speech sound ends and the next begins.
What we need, then, is a principled way to identify the set of speech sounds in a language.
The phoneme approach
The phoneme approach says that the units of speech are called phonemes, and
are defined as follows:
phoneme:
The smallest segment of speech that leads to meaningful
contrasts between words.
In other words, two speech sounds are distinct phonemes if replacing one with the other is the smallest change that you can make that results in a different word (a minimal pair):
pat / bat = p and b are distinct phonemes
sat / sad = t and d are distinct phonemes
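The minimal-pair procedure is mechanical enough to sketch in code. The Python below is a minimal sketch, assuming each word is already available as a sequence of segments. Here plain spelling stands in for a phonemic transcription, which only works for a toy lexicon like this one (English spelling is far from one letter per phoneme). The function names are hypothetical, chosen for illustration.

```python
from itertools import combinations

def is_minimal_pair(word_a, word_b):
    """True if two words have the same length and differ in exactly one segment."""
    if len(word_a) != len(word_b):
        return False
    return sum(1 for a, b in zip(word_a, word_b) if a != b) == 1

def contrasting_segments(word_a, word_b):
    """Return the two segments that distinguish the words of a minimal pair."""
    for a, b in zip(word_a, word_b):
        if a != b:
            return (a, b)
    return None

def find_minimal_pairs(lexicon):
    """Return (word_a, word_b, contrast) for every minimal pair in the lexicon."""
    return [
        (a, b, contrasting_segments(a, b))
        for a, b in combinations(lexicon, 2)
        if is_minimal_pair(a, b)
    ]

# Toy lexicon: ordinary spelling stands in for a phonemic transcription.
lexicon = ["pat", "bat", "sat", "sad", "bad"]
for a, b, (x, y) in find_minimal_pairs(lexicon):
    print(f"{a} / {b}: {x} and {y} are distinct phonemes")
```

Because the functions only compare positions in a sequence, the same code works unchanged if each word is given as a tuple of IPA symbols instead of a string.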
The phonemes of English
If you follow this procedure for English, you will end up with the set of English phonemes, written with symbols from the International Phonetic Alphabet (IPA).
The IPA was created because many of the letters of the Roman alphabet are ambiguous (for example, the letter c is pronounced differently in cat and cent). In a phonemic transcription using the IPA, each symbol represents a single phoneme.
Identifying the physical properties of phonemes
The vocal tract: source and filter
Much like a musical instrument, the vocal tract consists of a source of the
sound and a filter that alters the properties of the source sound.
[Figure: a reed instrument (source: reed; filter: body) beside the human vocal tract (source: vocal folds; filter: pharynx and oral cavity)]
Basic properties of the source
The basics of sound: longitudinal waves
Sound is a disturbance in pressure that travels through a medium, such as air (a gas).
Gas molecules spread out to fill whatever space they are given: if the same molecules are given twice the volume, they end up at half the density.
You can push certain air molecules closer together for a time, increasing the air pressure; this is called compression.
When you push air molecules closer together (compression), you also create a less dense region behind the compression, called rarefaction.
Compression and rarefaction are opposites that always occur together: every compression leaves a rarefaction behind it.
And because gas molecules tend to equalize their density in a given space, compressing one set of molecules causes a “wave” of compression to travel through the space as the compressed molecules try to “get away” from each other.
The best way to visualize the way that a compression wave travels through space
is with a slinky:
If you push one end of a stretched slinky, you can see the first set of coils
compress, and watch the compression wave spread across the slinky as the coils
try to equalize their density.
www.youtube.com/watch?v=f66syH8B9D8
The basics of sound: visualizing waves
To visualize the temporal properties of sound, we focus on a single point in
space (i.e., there is no spatial information at all).
We then draw the compressions and rarefactions that occur over time at that
single point in space.
So the x-axis represents time, and the y-axis represents the changes in pressure (the force applied per unit area of air) over time.
A crest marks higher air pressure and coincides with compression; a trough marks lower air pressure and coincides with rarefaction. The horizontal line through the middle of the wave marks normal pressure.
The basics of sound: amplitude
Once we have a representation of a wave, we can describe its physical
properties.
Amplitude is a measure of the force applied to an area of air during compression and rarefaction. It is a way of measuring the energy that a wave has.
Amplitude is represented by the height of the wave between the normal pressure line and a crest (or trough).
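To make the definition of amplitude concrete, here is a minimal Python sketch, assuming an idealized pure tone in arbitrary pressure units: it samples the pressure at a single point in space over time (as in the time-pressure plot described above) and measures amplitude as the largest distance from the normal-pressure line. The helper names, the sampling rate, and the choice of 0 for normal pressure are illustrative assumptions.

```python
import math

def pressure_samples(amplitude, frequency_hz, duration_s=1.0, sample_rate=8000, phase=0.0):
    """Pressure deviation from normal pressure at one point in space, sampled over time.

    Assumes an idealized pure tone (a single sine wave) in arbitrary pressure units.
    """
    n = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate + phase)
        for i in range(n)
    ]

def measured_amplitude(samples, normal_pressure=0.0):
    """Largest distance between the normal-pressure line and a crest or trough."""
    return max(abs(s - normal_pressure) for s in samples)

louder = pressure_samples(amplitude=2.0, frequency_hz=3)
quieter = pressure_samples(amplitude=0.5, frequency_hz=3)
print(measured_amplitude(louder), measured_amplitude(quieter))
```

Both waves have the same frequency; only the height of their crests and troughs differs, which is what we perceive as a difference in volume.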
The basics of sound: frequency
Frequency is a measure of the number of cycles that a wave completes in a
given unit of time.
A complete cycle consists of compression, rarefaction, and return to normal.
For example, one wave might complete 3 cycles per second while another completes 12 cycles per second.
Frequency is measured in Hertz (Hz): cycles per second.
Humans can hear from roughly 20 Hz to 20,000 Hz... but we lose the upper frequencies as we age:
http://www.noiseaddicts.com/2009/03/can-you-hear-this-hearing-test/
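Since frequency is just cycles per unit time, it can be estimated by counting cycles. The Python below is a minimal sketch, assuming a clean pure tone: every complete cycle crosses the normal-pressure line going upward exactly once, so counting upward crossings over one second gives cycles per second. Counting over a finite window can be off by one at its edges; the small phase offset here is an arbitrary choice that places every upward crossing strictly inside the window. Helper names and parameters are hypothetical.

```python
import math

def pressure_samples(amplitude, frequency_hz, duration_s=1.0, sample_rate=8000, phase=0.0):
    """Pressure deviation from normal pressure at one point in space (a pure sine wave)."""
    n = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate + phase)
        for i in range(n)
    ]

def estimate_frequency(samples, sample_rate=8000):
    """Cycles per second: count upward crossings of normal pressure, divide by duration."""
    upward_crossings = sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)
    return upward_crossings / (len(samples) / sample_rate)

# The phase offset keeps crossings away from the window edges (an arbitrary choice).
slow = pressure_samples(1.0, 3, phase=0.1)
fast = pressure_samples(1.0, 12, phase=0.1)
print(estimate_frequency(slow), estimate_frequency(fast))
```

Real speech is not a pure tone, so counting crossings is only a teaching device here, not a practical pitch tracker.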
The basics of sound: fundamental frequency
The fundamental frequency (F0) is the frequency at which sound sources
(oscillators) vibrate:
Guitar strings (standard tuning, lowest to highest): 82.4 Hz, 110 Hz, 146.8 Hz, 196 Hz, 246.9 Hz, 329.6 Hz
Human vocal folds: for males, F0 is around 130 Hz (close to the note C3); for females, it is around 220 Hz (the note A3).
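The guitar-string values follow from the standard musical convention of 12-tone equal temperament tuned to A4 = 440 Hz, where each semitone multiplies the frequency by 2^(1/12). The Python below is a sketch under that assumption; the MIDI note numbering (A4 = 69) is also an assumption, since the slide gives only the frequencies.

```python
A4_HZ = 440.0  # standard concert pitch; A4 is MIDI note 69

def note_frequency(midi_note):
    """Frequency of a note in 12-tone equal temperament tuned to A4 = 440 Hz."""
    return A4_HZ * 2 ** ((midi_note - 69) / 12)

# Open strings of a guitar in standard tuning, as MIDI note numbers.
guitar_strings = {"E2": 40, "A2": 45, "D3": 50, "G3": 55, "B3": 59, "E4": 64}
for name, midi in guitar_strings.items():
    print(f"{name}: {note_frequency(midi):.1f} Hz")

# The typical vocal-fold F0 values sit near C3 (MIDI 48) and at A3 (MIDI 57).
print(f"C3: {note_frequency(48):.1f} Hz, A3: {note_frequency(57):.1f} Hz")
```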
The basics of sound: harmonics
In addition to the fundamental frequency (F0), vibrating sound sources
(oscillators) also produce a series of additional frequencies called harmonics.
Harmonics are always integer multiples of the F0:
Harmonic   Tone 1    Tone 2    Tone 3
F0         100 Hz    200 Hz    400 Hz
2nd        200 Hz    400 Hz    800 Hz
3rd        300 Hz    600 Hz    1200 Hz
4th        400 Hz    800 Hz    1600 Hz
5th        500 Hz    1000 Hz   2000 Hz
We call a tone that consists of an F0 and its harmonics a complex tone.
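The harmonic table can be generated directly from the rule "the nth component is n times the F0". The Python below is a minimal sketch; the helper names are hypothetical, and giving every harmonic the same amplitude in `complex_tone` is a simplifying assumption (real sources are not that uniform).

```python
import math

def harmonic_series(f0_hz, count=5):
    """F0 followed by its harmonics: the nth component is exactly n times the F0."""
    return [n * f0_hz for n in range(1, count + 1)]

def complex_tone(t, f0_hz, count=5):
    """Pressure at time t of a complex tone built from equal-amplitude harmonics."""
    return sum(math.sin(2 * math.pi * f * t) for f in harmonic_series(f0_hz, count))

for f0 in (100, 200, 400):
    print(f0, harmonic_series(f0))
```

Because every component completes a whole number of cycles in 1/F0 seconds, the summed waveform of a complex tone repeats with period 1/F0: `complex_tone(t, 200)` and `complex_tone(t + 1/200, 200)` agree up to floating-point rounding.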
The physics behind the existence of harmonics is complex, so let’s wait a bit
before talking about it.
The basics of sound: wavelength
Wavelength is a measure of the distance between identical locations in the cycle of a wave (e.g., from one crest to the next).
The wavelength of a sound is related to the frequency and velocity of the
wave:
wavelength = velocity / frequency
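Here is the wavelength formula as a short Python sketch. The speed of sound is an assumption: about 343 m/s, the usual figure for dry air at roughly 20 °C; in a different gas the velocity, and therefore the wavelength, would change.

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s, approximate value for dry air at about 20 °C

def wavelength_m(frequency_hz, velocity_m_s=SPEED_OF_SOUND_AIR):
    """wavelength = velocity / frequency"""
    return velocity_m_s / frequency_hz

for f in (130, 220, 1000):
    print(f"{f} Hz -> {wavelength_m(f):.2f} m")
```

With the velocity fixed by the medium, doubling the frequency halves the wavelength, which is why frequency is the only handle a speaker has on wavelength.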
Are the properties of the source critical to the representation of phonemes?
Experiment 1: Amplitude and Phonemes
Here is a simple experiment to determine whether the amplitude of the source is critical to the difference between speech sounds.
Step 1: say “ah”
Step 2: say “ah” with high amplitude
Step 3: say “ah” with low amplitude
Remember, amplitude is a measure of the size of the disturbance: the force applied to the air to cause it.
Question: Did varying the amplitude result in a different phoneme (e.g., ‘ee’)?
Alternative experiment: say ‘ah’ and ‘ee’ with the same amplitude...
Conclusion: Varying the amplitude does not result in changes in the
phonemes, only changes in volume, so amplitude is not critical to the
representation of speech sounds.
Experiment 2: F0 and Phonemes
Here is a simple experiment to determine whether the F0 of the source is critical to the difference between speech sounds.
Step 1: say “ah”
Step 2: say “ah” with a high F0
Step 3: say “ah” with a low F0
Remember, F0 is a measure of the number of cycles the wave completes in a given time.
Question: Did varying the F0 result in a different phoneme (e.g., ‘ee’)?
Alternative experiment: say ‘ah’ and ‘ee’ with the same F0...
Conclusion: Varying the F0 does not result in changes in the phonemes, only changes in pitch, so F0 is not critical to the representation of speech sounds.
Experiment 3: Harmonics and Phonemes
Here is a simple experiment to determine whether the harmonics of the source are critical to the difference between speech sounds.
Step 1: say “ah”
Step 2: say “ah” with high harmonics
Step 3: say “ah” with low harmonics
Remember, harmonics are integer multiples of F0.
Problem: you can’t vary harmonics independently. You can only vary the
harmonics by varying F0 (which we already tried).
Conclusion: Varying the harmonics is the same as varying F0, which only changes the pitch, not the phoneme.
Experiment 4: Wavelength and Phonemes
Here is a simple experiment to determine whether the wavelength of the source is critical to the difference between speech sounds.
Step 1: say “ah”
Step 2: say “ah” with a short wavelength
Step 3: say “ah” with a long wavelength
Remember, wavelength is a measure of the distance between identical parts of the wave cycle.
Problem: you can’t vary wavelength independently. You can vary frequency
(which we already tried), or you can vary the velocity of the wave, but the
latter is dependent on the medium that the wave travels through...
Conclusion: Because we can’t control the velocity of sound waves with our
body (it is dependent on the gas they travel through), we can only control
wavelength through frequency. And frequency is not critical to the
representation of speech sounds.
So what is critical? The filter!
Recall that, much like a musical instrument, the vocal tract consists of a source of the sound (the vocal folds) and a filter (the pharynx and oral cavity) that alters the properties of the source sound. Since none of the experiments on the source changed the phoneme, the differences between speech sounds must come from the filter.
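[Figure: the Exploratorium “Vocal Vowels” exhibit. A duck call serves as the source, and plastic tubes modeled on the pharynx and oral cavity serve as the filter; differently shaped tubes produce “ah”, “ee”, “eh”, and “oh”.]
http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html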
To understand how the filter works, we need to learn more about harmonics...
Reflection and Interference
When a wave is reflected back on itself, the crests and troughs interact in a
process known as interference.
Constructive interference occurs when the crests align with other crests (and
troughs with other troughs), which is also called being “in phase”.
When two waves are in phase, their crests add to crests and their troughs add to troughs, so two waves of equal amplitude combine into a wave with double the amplitude.
Destructive interference occurs when the crests align with troughs (and troughs
with crests), which is also called being “out of phase”.
When two waves are out of phase, the crests of one wave align with the troughs of the other. If two waves of equal amplitude are perfectly out of phase, they cancel out and the resulting amplitude is 0.
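Interference is just the addition of pressure values at each moment, so it is easy to check numerically. The Python below is a minimal sketch, assuming two waves of equal amplitude and the same (arbitrary) 100 Hz frequency; it adds them with different phase differences and reports the largest amplitude the sum reaches over one cycle. Helper names are hypothetical.

```python
import math

FREQUENCY_HZ = 100.0  # arbitrary illustrative frequency

def wave(t, phase=0.0, amplitude=1.0):
    """Pressure at time t of a sine wave with the given phase offset (radians)."""
    return amplitude * math.sin(2 * math.pi * FREQUENCY_HZ * t + phase)

def peak_of_sum(phase_difference, samples_per_cycle=1000):
    """Largest amplitude reached, over one cycle, by the sum of two equal waves."""
    period = 1 / FREQUENCY_HZ
    times = [period * i / samples_per_cycle for i in range(samples_per_cycle)]
    return max(abs(wave(t) + wave(t, phase=phase_difference)) for t in times)

print("in phase:", round(peak_of_sum(0.0), 3))
print("a quarter cycle apart:", round(peak_of_sum(math.pi / 2), 3))
print("perfectly out of phase:", round(peak_of_sum(math.pi), 3))
```

Equal waves in phase reach twice the original amplitude, waves a quarter cycle apart reach about 1.41 times it, and perfectly out-of-phase waves cancel to zero (up to floating-point rounding).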
A wave traveling through a string of finite length will be reflected back after it
hits the end.
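Crucially, when the end of the string is fixed (e.g., where a guitar string is attached), the reflected wave will have the same speed and (ideally) the same amplitude, but it will be inverted, that is, completely out of phase with the original wave.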
An oscillator is an object, like a string, that is continually vibrating. This means
that waves are continually traveling down the string.
If there is a wave traveling one direction on a string, and a reflected wave
traveling the other direction on a string, some form of interference will occur.
The interference pattern in oscillators creates standing waves: even though the original wave and the reflected wave are both traveling, the resulting interference creates a wave that doesn’t appear to move.
An animation illustrates this much better than a static picture can: www.phys.unsw.edu.au/jw/strings.html
Standing waves are interesting because there are certain points along the string, called nodes, that define the boundaries of the standing waves. These are the points where the traveling waves are completely out of phase, resulting in zero amplitude.
Crucially, the nodes always divide the string into equal segments, so their spacing is always an integer fraction (1/2, 1/3, 1/4, ...) of the full length of the string.
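The node rule can be written down directly for the idealized string on these slides. The Python below is a minimal sketch, assuming a string fixed at both ends: the nth standing-wave mode fits n half-wavelengths on the string, so its nodes sit at 0, 1/n, 2/n, ..., 1 of the length, and its frequency is f_n = n × v / (2L). The string length (0.5 m) and wave speed (200 m/s) are hypothetical values chosen so that the lowest mode is 200 Hz; vocal folds are not literally strings.

```python
from fractions import Fraction

def node_positions(mode):
    """Node positions of the nth standing-wave mode on a string fixed at both ends,
    as fractions of the string's length: 0, 1/n, 2/n, ..., 1."""
    return [Fraction(k, mode) for k in range(mode + 1)]

def mode_frequency(mode, string_length_m, wave_speed_m_s):
    """The nth mode fits n half-wavelengths on the string, so f_n = n * v / (2 * L)."""
    return mode * wave_speed_m_s / (2 * string_length_m)

# Hypothetical string: 0.5 m long, waves travel along it at 200 m/s.
for n in (1, 2, 3, 4):
    positions = ", ".join(str(x) for x in node_positions(n))
    print(f"mode {n}: nodes at {positions}; frequency {mode_frequency(n, 0.5, 200.0)} Hz")
```

Because the allowed modes are indexed by whole numbers, their frequencies come out as whole-number multiples of the lowest one.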
Harmonics are the result of standing waves
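Because the string is fixed at both ends, it can only sustain standing waves that fit a whole number of half-wavelengths into its length. The frequencies of these standing waves are therefore integer multiples of the lowest one, which is why harmonics are always integer multiples of the F0: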
F0: 200 Hz
H2: 400 Hz
H3: 600 Hz
H4: 800 Hz
H5: 1000 Hz
Vocal folds produce a broad spectrum of frequencies simultaneously
Because of harmonics, the vocal folds produce a broad spectrum of frequencies
simultaneously:
We can represent this with a line spectrum graph: frequencies are on the x-axis and the amplitude of each frequency is on the y-axis.
[Figure: line spectrum of the source (vocal folds); x-axis: frequency, marked at 1000, 2000, and 3000 Hz; y-axis: amplitude]
Each harmonic is represented in a line spectrum graph by a vertical line. Note that the amplitude of the harmonics decreases as their frequency increases. The solid line that traces the tops of the harmonics is called the envelope of the spectrum.
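A line spectrum is just a list of (frequency, amplitude) pairs, one per harmonic. The Python below is a minimal sketch of an idealized source spectrum. The rate at which amplitude falls off is an arbitrary assumption (each harmonic at half the amplitude of the one below it), chosen only to reproduce the qualitative pattern described above; it is not a measured property of the vocal folds.

```python
def source_spectrum(f0_hz, max_hz=3000, rolloff=0.5):
    """Idealized line spectrum of a harmonic source as (frequency, amplitude) pairs.

    Assumption: each harmonic's amplitude is a fixed fraction (rolloff) of the one
    below it, so amplitude decreases as frequency increases.
    """
    spectrum = []
    amplitude = 1.0
    n = 1
    while n * f0_hz <= max_hz:
        spectrum.append((n * f0_hz, amplitude))
        amplitude *= rolloff
        n += 1
    return spectrum

# A sideways text "graph": one bar per harmonic, bar length proportional to amplitude.
for frequency, amplitude in source_spectrum(200, max_hz=1200):
    print(f"{frequency:>5} Hz | {'#' * round(amplitude * 40)}")

# The envelope follows the tops of the lines: the amplitudes in frequency order.
envelope = [amplitude for _, amplitude in source_spectrum(200, max_hz=1200)]
```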
The Fundamental is Special
Recall that vibrating objects have a fundamental frequency and an associated
set of harmonics that are integer multiples of the fundamental frequency:
F0: 200 Hz
H2: 400 Hz
H3: 600 Hz
H4: 800 Hz
H5: 1000 Hz
H6: 1200 Hz
We call a tone that contains multiple frequencies a
complex tone, and a tone that contains a single
frequency a simple tone.
In the case of complex tones that have a harmonic
structure, we perceive the pitch of the tone as
being equal to the pitch of the fundamental
frequency.
We perceive the harmonics as overtones, which lead to a richer sound.
Now, let’s ask ourselves why we treat the fundamental differently from the harmonics: why do we perceive the pitch of the complex tone as equal to the fundamental, and not to any of the harmonics?
Why is the fundamental special?
Hypothesis 1: It is simply because the fundamental has the highest amplitude (i.e., it is the loudest).
Component   Frequency   Amplitude
F0          200 Hz      100 dB
H2          400 Hz      50 dB
H3          600 Hz      25 dB
H4          800 Hz      12 dB
H5          1000 Hz     6 dB
H6          1200 Hz     3 dB
This hypothesis makes an interesting
prediction:
If the crucial property is amplitude,
then taking away the fundamental
should change the pitch of the tone:
the pitch should now be the frequency
of H2!
Similarly, if we take away both the F0 and H2, the pitch should be based on H3.
Schematically, the test presents a series of complex tones, each with the lowest remaining frequency removed: first the F0, then the F0 and H2, and so on.
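Hypothesis 1's prediction can be spelled out as a tiny model. The Python below is a minimal sketch: it uses the schematic amplitudes from the table above, removes the lowest components one at a time, and predicts the pitch as the frequency of the loudest remaining component. The functions are hypothetical helpers that encode the hypothesis, not a model of hearing.

```python
# Schematic amplitudes from the table above (frequency in Hz -> amplitude in dB).
AMPLITUDE_DB = {200: 100, 400: 50, 600: 25, 800: 12, 1000: 6, 1200: 3}

def remove_lowest(components, how_many):
    """Drop the lowest-frequency components from a complex tone."""
    return sorted(components)[how_many:]

def pitch_if_loudest_wins(components):
    """Hypothesis 1: the perceived pitch is the frequency of the loudest component."""
    return max(components, key=AMPLITUDE_DB.get)

tone = list(AMPLITUDE_DB)
for removed in range(3):
    remaining = remove_lowest(tone, removed)
    print(f"removed {removed}: {remaining} -> predicted pitch {pitch_if_loudest_wins(remaining)} Hz")
```

Under this hypothesis the predicted pitch climbs from 200 Hz to 400 Hz to 600 Hz as components are removed; that is the prediction the experiment tests.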
Surprisingly, removing the lowest tone in these complexes does not change the
pitch that we perceive. How can this be?
The answer seems to be that the brain
restores the missing fundamental from a
complex tone if that tone appears to have
harmonic structure.
If this were just an illusion, it wouldn’t tell us much.
But if the auditory cortex can actually reconstruct the fundamental from the
harmonics, then it tells us something about the abilities of the auditory cortex:
1. The auditory cortex may be able to perform calculations on the incoming
signal in order to create new information that is not transparently available in
the signal. So we can go beyond the simplest hypothesis for the representation
of speech sounds!
2. The auditory cortex may be able to do some sort of mathematical factoring (or perhaps division) to figure out the common divisor of the frequencies in the tone complex: the largest frequency that all of the components are integer multiples of.
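The "common divisor" idea can be illustrated with the greatest common divisor of the component frequencies. The Python below is a toy arithmetic sketch, not a claim about what the auditory cortex actually computes: it assumes exact integer frequencies in Hz, whereas real signals are noisy and practical pitch estimators use tolerance-based methods (such as autocorrelation) instead of an exact GCD.

```python
from functools import reduce
from math import gcd

def common_fundamental(frequencies_hz):
    """Largest frequency that divides every component evenly (integer Hz assumed)."""
    return reduce(gcd, frequencies_hz)

tone = [200, 400, 600, 800, 1000, 1200]
for removed in range(3):
    remaining = tone[removed:]
    print(f"{remaining} -> common fundamental {common_fundamental(remaining)} Hz")
```

Unlike the loudest-component rule, the common divisor stays at 200 Hz no matter how many of the lowest components are removed, which matches what listeners report.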
Telephone companies are cheap!
Telephones only transmit a narrow band of frequencies, roughly 300 Hz to 3400 Hz.
This is partly because small speakers can’t reproduce low frequencies well. But
telephone companies seized on this limitation as a way to save money on data
transfer (both landlines and cell networks).
Recall that the F0 of the human voice is below 300 Hz: the male average is 130 Hz, the female average is 220 Hz.
This means that telephones do not transmit the F0 of our voices!
One reason we can still tell the gender of the people we are talking to is that our brains can restore the fundamental from the harmonics!
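The telephone case combines the two previous ideas: a band-pass filter removes the F0, and the common divisor of what survives recovers it. The Python below is a minimal sketch, assuming the conventional landline voice band of roughly 300 to 3400 Hz (an upper limit of 4000 Hz gives the same result, since only the lower cutoff removes the F0), an idealized harmonic voice, and exact integer frequencies. The helper names are hypothetical.

```python
from functools import reduce
from math import gcd

# Assumed pass band: the conventional landline voice band of roughly 300-3400 Hz.
PHONE_LOW_HZ, PHONE_HIGH_HZ = 300, 3400

def harmonic_series(f0_hz, max_hz=8000):
    """All components of an idealized harmonic voice source up to max_hz."""
    return [n * f0_hz for n in range(1, max_hz // f0_hz + 1)]

def through_phone(components):
    """Keep only the components that fall inside the telephone pass band."""
    return [f for f in components if PHONE_LOW_HZ <= f <= PHONE_HIGH_HZ]

for label, f0 in (("typical male voice", 130), ("typical female voice", 220)):
    transmitted = through_phone(harmonic_series(f0))
    restored = reduce(gcd, transmitted)
    print(f"{label}: F0 transmitted? {f0 in transmitted}; restored F0 = {restored} Hz")
```

Neither F0 survives the filter, yet both can be recovered from the transmitted harmonics (for the 130 Hz voice, the lowest surviving component is 390 Hz, the 3rd harmonic).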