
CReSS LLC Working Papers
Working Paper 2
An essay on physical and geometric factors in speech development
R. S. McGowan
CReSS LLC
1 Seaborn Place, Lexington MA 02420
[email protected]
April 2, 2014
Introduction
Physical and geometric aspects of children’s vocal tracts help determine the sounds that
children produce as they learn to speak. In the age range of 6 months to 30 months, the
vocal tract goes through substantial physical change. It is a well-accepted notion that
many of the speech problems that appear at the beginning of elementary school are simply
continuations of earlier speech behaviors. Our view is that many of these earlier, preserved
behaviors first arise because of physical/geometric (roughly physiological/anatomic) factors
that constrain children as they begin to produce speech. If this thesis is correct, then
some children preserve earlier behaviors that 1) are not adult-like, and 2) may no longer
be necessary, because the physical constraints that gave rise to the non-adult-like behavior
no longer exist.
This essay proposes reasons that certain speech sounds may not be produced in an
adult-like manner as children begin to talk. The essay does not consistently reference
previous scientific work in speech development, other than work that we have done at
CReSS LLC. (We do reference two other works that do not appear in the bibliographies of
our own published works.) This is partly because we want to synthesize our
knowledge of particular aspects of speech development that have emerged from research
at CReSS LLC. The CReSS LLC personnel whose work is included are Margaret Denny,
Michel Jackson, Rebecca McGowan, and Richard McGowan. We also have had an ongoing
collaboration in speech development with Dr. Susan Nittrouer. We are proposing to study
children’s tongue curvature using ultrasound with Professor Diana Archangeli.
We make one underlying assumption, which we believe many researchers share: children try
to be like adults in their speech. (The desire to be like adults may simply be a motivation
to imitate or a motivation to be understood, or both. We are neutral as to the reason in
the present essay.) We take the operational consequence of this assumption to be that
children attempt to produce speech that is a proportionately scaled version of adult speech.
Specifically, the proportionate scaling should hold in the Fourier frequency domain of the
acoustic signal. This is a fairly strong statement, but it is
necessary to have something like it in order to proceed. There are many possible scalings
that could be employed when comparing adults and children. For example, a variant of the
present assumption would apply the scaling to a perceptually weighted frequency scale instead.
We consider four kinds of English speech sounds: 1) [ɹ], 2) sibilant fricatives, 3) alveolar
and velar stops, and 4) back vowels. We will discuss tongue surface shape with regard to
1), 2), and 4). All four kinds of speech sounds involve the issue of vocal tract length scaling.
Next, we provide some background on the relevant geometric concepts and state hypotheses
in terms of these concepts.
Curvature, scaling, and hypotheses
We will characterize functions and surface shapes using something called curvature. In
particular, we examine curvature for area functions, tongue surfaces, and outer vocal tract
wall surfaces.
An area function is a plot of the cross-sectional area of an acoustic tube, such as a vocal
tract, as a function of axial position along the tube. Figure 1 shows an area function with
two circles “touching” the function at points (x1 , A1 ) and (x2 , A2 ). These circles illustrate
the definition of curvature at these points. For example, at (x1, A1) the arc of the circle
near this point approximates the plot of the function. The more “bent” the curve is at
(x1, A1), the smaller the radius, r1, of the circle, and the higher the curvature of the
curve at (x1, A1). Therefore, we define the curvature κ1 at (x1, A1) to be plus or minus
the reciprocal of the radius of the approximating circle. We assign a positive sign if the
approximating circle touches the function plot from above, and a negative sign if it
touches the function plot from below. Referring to Figure 1, κ1 = −1/r1 and κ2 = 1/r2,
where r1 and r2 are radii of curvature. So if r1 = 2 cm and r2 = 4 cm, then κ1 = −0.5 cm−1
and κ2 = 0.25 cm−1. Note that |κ1| > |κ2| when r1 < r2.
We will be referring to changes in curvature, for example, |κ2 − κ1|. With the numerical
values given above, |κ2 − κ1| = 0.75 cm−1. Note that when curvatures have opposite signs,
the magnitude of their difference is the sum of the absolute values of the curvatures.
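The sign convention for area-function curvature can be checked numerically. The sketch below is our own illustration, not a CReSS LLC tool: it assumes the area function is available as samples A(x), treats both axes in commensurate units as the worked example does, and uses the standard formula κ = A″/(1 + A′²)^(3/2), whose sign agrees with the convention above (positive where the approximating circle touches from above). The function name `signed_curvature` is hypothetical.

```python
import numpy as np


def signed_curvature(x, a):
    """Signed curvature of a sampled plot a(x), such as an area function.

    Uses kappa = a'' / (1 + a'**2) ** 1.5. The sign is positive where the plot
    is concave up (approximating circle touching from above, as at a
    constriction) and negative where it is concave down (circle touching from
    below, as in an expansion). Derivatives come from np.gradient, which uses
    central differences in the interior and one-sided differences at the ends.
    """
    da = np.gradient(a, x)
    d2a = np.gradient(da, x)
    return d2a / (1.0 + da**2) ** 1.5


# Check: the lower arc of a circle of radius 2 lies below the circle's centre,
# so the approximating circle touches from above and kappa should be +1/2.
x = np.linspace(-1.0, 1.0, 2001)
a = 5.0 - np.sqrt(4.0 - x**2)
print(round(float(signed_curvature(x, a)[1000]), 4))  # prints 0.5
```

Replacing the lower arc with the upper arc, 5 + sqrt(4 − x²), flips the sign to about −0.5, matching the negative curvature of an expansion.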
Given a surface, such as the outer tongue surface, we can cut that surface with a
perpendicular plane, such as the mid-sagittal plane or a coronal plane. The intersection
of that plane and the tongue surface produces a two-dimensional curve. Figure 2 shows
these resulting two-dimensional curves in bold. The idea of curvature also applies to these
two-dimensional curves.
Figure 3 shows the curves that result from the mid-sagittal cut of the tongue and outer
surface of the vocal tract (e.g. palate and rear pharyngeal wall). Curvature can be defined
for each point on that curve in the same way that it is defined for the area function.
Again, any point on either the tongue surface curve or the outer vocal tract curve has an
approximating circle. The radius of that circle is the radius of curvature of the curve at
that point, and the inverse of that radius is the curvature, to within a sign. We will use a
negative sign if the approximating circle touches the curve from inside the
vocal tract. Otherwise the curvature is positive. This is consistent with the sign convention
for the curvature of the area function plot. Thus, the curvature is positive if the curve is
“bulging” into the vocal tract (e.g. circles numbered 1, 3, and 5), and it is negative if it is
“cupped away” or “arched away” from the vocal tract (e.g. circles numbered 2 and 4).
Figure 4 shows a coronal section of a highly grooved tongue surface curve and the hard
palate curve. The same sign conventions apply for these curves.
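For the mid-sagittal and coronal curves, a parametric version of the same computation can be used. This sketch rests on our own assumption about orientation: the sampled curve is traversed with the vocal tract interior on its right-hand side. With that ordering, the standard signed curvature of a plane curve is positive where the surface bulges into the tract and negative where it is cupped or arched away, matching the convention just stated. The function name `curve_curvature` is hypothetical.

```python
import numpy as np


def curve_curvature(x, y):
    """Signed curvature along a sampled plane curve (x[i], y[i]).

    Standard formula kappa = (x' y'' - y' x'') / (x'**2 + y'**2) ** 1.5 with
    derivatives taken with respect to the sample index (curvature does not
    depend on the parametrization). Assumption: the samples are ordered so
    that the vocal tract interior lies on the right-hand side of the direction
    of travel; then positive values mean the surface bulges into the tract
    and negative values mean it is cupped or arched away from it.
    """
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5


# A tongue "dome" of radius 2 cm bulging up into a tract that lies above it,
# sampled from right to left so that the tract interior is on the right.
t = np.linspace(np.pi / 4, 3 * np.pi / 4, 501)
x, y = 2.0 * np.cos(t), 2.0 * np.sin(t)
print(round(float(curve_curvature(x, y)[250]), 4))  # about +0.5 (cm^-1)
```

Reversing the sample order (so that the interior is on the left) flips every sign, which is why the orientation assumption has to be fixed before curvatures are compared.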
There is one more measure to consider: the average spatial rate of change of curvature,
κ̇, which we now define. Consider a smooth two-dimensional curve, and let s = s(x, y) be
the arc length of the curve from its left-most terminal point to the point (x, y) on the
curve. For two points on the curve, (x1, y1) and (x2, y2), |s2 − s1| = |s(x2, y2) − s(x1, y1)|
is the distance along the curve between the points (x1, y1) and (x2, y2). The average
spatial rate of change of curvature is defined as

$$\dot{\kappa} \equiv \frac{\kappa_2 - \kappa_1}{s_2 - s_1} \qquad (1)$$

where κ1 and κ2 are the curvatures at (x1, y1) and (x2, y2), respectively. With r1 = 2 cm,
κ1 = −0.5 cm−1, r2 = 4 cm, κ2 = 0.25 cm−1, and s2 − s1 = 3 cm, κ̇ = 0.25 cm−2. κ̇ is a
measure of how rapidly curvature changes along the curve. We often refer to κ̇ as the
spatial rate of change of curvature, without the term “average”.
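Equation (1) and the arc length s can be computed from sampled curve points as follows. This is a minimal sketch with hypothetical helper names. The arc length is approximated by summing the straight-line distances between successive samples.

```python
import numpy as np


def arc_length(x, y):
    """Cumulative arc length s along a sampled curve, with s = 0 at the first
    (e.g. left-most) sample; straight segments approximate the curve."""
    seg = np.hypot(np.diff(x), np.diff(y))
    return np.concatenate(([0.0], np.cumsum(seg)))


def mean_curvature_rate(kappa1, kappa2, s1, s2):
    """Average spatial rate of change of curvature, Equation (1)."""
    return (kappa2 - kappa1) / (s2 - s1)


# Worked example: r1 = 2 cm, r2 = 4 cm, s2 - s1 = 3 cm.
kappa1, kappa2 = -1.0 / 2.0, 1.0 / 4.0
print(mean_curvature_rate(kappa1, kappa2, 0.0, 3.0))  # 0.25 (cm^-2)

# Arc length of a straight 5 cm segment sampled at three points.
s = arc_length(np.array([0.0, 1.5, 3.0]), np.array([0.0, 2.0, 4.0]))
print(s[-1])  # 5.0
```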
Vocal tract length scaling will often be employed when comparing a child’s and an adult
male’s vocal tracts. (When speaking about hypotheses and simulations, we will be speaking
of a typical child and a typical male adult. Only when we are discussing previous results do
we refer to groups or populations of children and adults.) The scale factor, S, is the ratio
of the adult’s vocal tract length to the child’s vocal tract length. It will often be applied to
the child’s vocal tract to make it easier to compare shapes of the tracts. With the scaled
child’s vocal tract, the child’s speech output in terms of the Fourier frequency domain
should match that of the adult when our criterion for the child to produce adult-like speech
is met. Note that S > 1 and, typically, S < 2. We will use S = 3/2 in examples. The
child’s lengths can be scaled up using S for direct comparison between child and adult area
functions and vocal tract shapes. Thus, if ℓ is a length for the child, the scaled length is ℓ′,
with ℓ′ = Sℓ.
Because curvature, κ, has units of inverse length (e.g. cm−1) and κ̇ has units of inverse
length squared (e.g. cm−2), the corresponding scaled quantities are given by

$$\kappa' = \frac{\kappa}{S} \quad \text{and} \quad \dot{\kappa}' = \frac{\dot{\kappa}}{S^2} \qquad (2)$$
Thus, scaling length up with S > 1 means that curvature and spatial rate of change of
curvature are scaled down. Further, the percentage difference between scaled and unscaled
values is greater for κ̇ than for κ: with S = 3/2, curvature is reduced by 1 − 1/S ≈ 33%,
whereas κ̇ is reduced by 1 − 1/S² ≈ 56%. Using the values from the example after
Equation (1), we obtain κ′1 = −0.333 cm−1, κ′2 = 0.167 cm−1, and κ̇′ = 0.111 cm−2 from
κ1 = −0.5 cm−1, κ2 = 0.25 cm−1, and κ̇ = 0.25 cm−2, respectively, when S = 3/2.
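A short sketch of Equation (2) reproduces the scaled values quoted above. S = 3/2 is the example value from the text, and the function names are ours.

```python
def scale_curvature(kappa, S):
    """Scaled curvature, kappa' = kappa / S (units of 1/length)."""
    return kappa / S


def scale_curvature_rate(kappa_dot, S):
    """Scaled spatial rate of change of curvature, kappa_dot' = kappa_dot / S**2."""
    return kappa_dot / S**2


S = 1.5  # example scale factor used in the text
print(round(scale_curvature(-0.5, S), 3))       # -0.333
print(round(scale_curvature(0.25, S), 3))       # 0.167
print(round(scale_curvature_rate(0.25, S), 3))  # 0.111
# Relative reduction produced by scaling: 1 - 1/S for kappa, 1 - 1/S**2 for kappa_dot.
print(f"{1 - 1 / S:.0%} vs {1 - 1 / S**2:.0%}")
```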
We are now ready to state two plausible hypotheses regarding tongue surface curvature
and spatial rates of change of curvature. These hypotheses are in terms of maximum values
of these quantities.
The first hypothesis is that the maximum absolute values of the child’s scaled curvature
and scaled spatial rate of change of curvature do not exceed the same quantities for the
adult. Symbolically,

$$\max|\kappa_{\mathrm{adult}}| \ge \max|\kappa'_{\mathrm{child}}| \quad \text{and} \quad \max|\dot{\kappa}_{\mathrm{adult}}| \ge \max|\dot{\kappa}'_{\mathrm{child}}| \qquad (3)$$

Note that the first inequality can be stated in terms of radius of curvature: the minimum
scaled radius of curvature of the child is no smaller than the minimum radius of curvature
of the adult. Further, the hypothesis can be restricted to a particular region of the tongue.
Even if the child’s maximum scaled values in Equation (3) are less than those of the adult,
his or her maximum unscaled values could be greater than those of the adult. For instance,
the child may be able to produce a maximum unscaled curvature of |κchild| = 0.6 cm−1, which
corresponds to a minimum unscaled radius of curvature of r = 1.667 cm. Suppose that the
child can attain a maximum spatial rate of change of curvature of κ̇child = 0.25 cm−2. The
adult could have a maximum curvature of |κadult| = 0.5 cm−1, which corresponds to a
minimum radius of curvature of 2 cm, and a maximum spatial rate of change of curvature of
κ̇adult = 0.15 cm−2. With a scaling factor of S = 3/2, the child’s maximum scaled curvature
is |κ′child| = 0.4 cm−1 < 0.5 cm−1 = |κadult|, and the child’s maximum scaled spatial rate of
change of curvature is κ̇′child = 0.111 cm−2 < 0.15 cm−2 = κ̇adult. In this example Equation (3)
holds even though the child’s unscaled values exceed the adult’s. Because S > 1, any child
who satisfies the following hypothesis regarding the child’s unscaled quantities also satisfies
Equation (3), but not vice versa; thus it is more restrictive than the hypothesis expressed
in Equation (3):

$$\max|\kappa_{\mathrm{adult}}| \ge \max|\kappa_{\mathrm{child}}| \quad \text{and} \quad \max|\dot{\kappa}_{\mathrm{adult}}| \ge \max|\dot{\kappa}_{\mathrm{child}}| \qquad (4)$$
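The two hypotheses can be written as simple checks on maximum values. The sketch below uses hypothetical helper names, takes maxima of absolute values as its inputs, and uses the numbers from the example above. It shows that this child satisfies Equation (3) but not Equation (4).

```python
def hypothesis_scaled(k_adult, kd_adult, k_child, kd_child, S):
    """Equation (3): the adult's maximum |kappa| and |kappa_dot| are at least
    the child's maxima after the child is scaled up by S (Equation (2))."""
    return abs(k_adult) >= abs(k_child) / S and abs(kd_adult) >= abs(kd_child) / S**2


def hypothesis_unscaled(k_adult, kd_adult, k_child, kd_child):
    """Equation (4): the same comparison using the child's unscaled maxima."""
    return abs(k_adult) >= abs(k_child) and abs(kd_adult) >= abs(kd_child)


# Maxima from the example above, in cm^-1 and cm^-2, with S = 3/2.
args = dict(k_adult=0.5, kd_adult=0.15, k_child=0.6, kd_child=0.25)
print(hypothesis_scaled(**args, S=1.5))  # True: 0.4 <= 0.5 and 0.111 <= 0.15
print(hypothesis_unscaled(**args))       # False: 0.6 > 0.5
```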
What are the reasons to choose one or the other hypothesis? The author has no more than
an intuitive idea that a more highly curved tongue, and/or a tongue shape that requires
rapid spatial changes in curvature, is more difficult for a speaker to produce. By difficult,
we mean that the motor units need to be more highly spatially differentiated and
simultaneously controlled. Here we are concerned mostly with tongues performing speech
acts, and not other acts such as swallowing. (Margaret Denny points out that the distinction
can probably be made between learned behavior and behavior that is present at birth, or
between cortically mediated behavior and bulbar control.) Further, except in the case of
sibilant fricatives, we are concerned with an unbraced tongue. Because the young child
learning to speak does not possess the experience that the adult speaker does, we expect
that the hypothesis expressed in Equation (3) is true. That is, we expect that the child
cannot produce articulations that, once the child’s vocal tract is scaled up, are more
difficult than those of the adult. The more restrictive
hypothesis in Equation (4), involving unscaled values, could be true, but we do not have
physiological evidence in terms of, say, the independence of control of motor units in the
tongues of the child versus the adult (Denny & McGowan, 2012a). Thus, in the remainder of
this essay we invoke the less restrictive hypothesis expressed in Equation (3).
[ɹ] production
In a longitudinal acoustic study of a small group of children, McGowan, Nittrouer, and
Manning (2004) showed that, for young Americans growing up in a rhotic dialect area,
stressed syllable-initial [ɹ] was the last of all the rhotic sounds to develop. For adults,
stressed syllable-initial [ɹ] is the strongest [ɹ] in comparison with [ɹ] in other contexts,
including medial and syllable-final [ɹ]. Here, the term “strong” is used in the sense that
the first three formants attain very low values. In particular, the third formant frequency,
F3, is very low for strong [ɹ]. We believe that stressed, syllable-initial [ɹ] is also the most
extreme of all the rhotic articulations, in the sense that its oral constrictions are the
tightest and its area function expansions are the greatest of all rhotic articulations. This
idea is now explored
with area function-to-acoustic simulations.
We can compute formant frequencies from area functions (McGowan, ongoing). In the
present simulations we use a vocal tract length of 17.3 cm for both the adult and the scaled
child. Thus, we are using an adult male vocal tract length, and consider the child’s vocal
tract to be scaled up by the factor of the ratio of the adult’s vocal tract length to the child’s
vocal tract length, i.e. by the factor S throughout this section, except in its final paragraph.
We do not consistently use the term “scaled” until this final paragraph, but it should be
understood implicitly. While [ɹ] is glide-like in its articulation, we are considering the glide
at its most extreme articulation, which should correspond to the lowest values of the first
three formant frequencies F1, F2, and F3.
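The author’s own area-function-to-formant computation (McGowan, ongoing) is not described here. As a rough illustration of the general idea only, the sketch below uses a standard lossless concatenated-tube (chain-matrix) model. It ignores wall losses, lip radiation, and the nasal tract, treats the glottis as an ideal volume-velocity source and the lips as a pressure release, and assumes a sound speed of 35,000 cm/s. The function names are hypothetical, and no area data from Figure 5 are reproduced.

```python
import numpy as np

SOUND_SPEED = 35000.0  # cm/s; assumed value for warm, humid air


def k22(freq, areas, lengths, c=SOUND_SPEED):
    """K22 element of the lossless chain (ABCD) matrix from glottis to lips.

    areas, lengths: tube sections ordered from glottis to lips (cm^2, cm).
    With zero acoustic pressure at the lips, U_lips / U_glottis = 1 / K22,
    so the formants are the frequencies where K22 crosses zero. The
    characteristic impedance rho*c is set to 1 because it cancels in K22.
    """
    k = 2.0 * np.pi * freq / c  # wavenumber, rad/cm
    m = np.eye(2, dtype=complex)
    for area, length in zip(areas, lengths):
        cs, sn = np.cos(k * length), np.sin(k * length)
        section = np.array([[cs, 1j * sn / area],
                            [1j * area * sn, cs]])
        m = m @ section
    return m[1, 1].real  # K22 is purely real for a lossless tube


def formants(areas, lengths, n=3, fmax=5000.0, df=1.0):
    """First n zero crossings of K22 below fmax, refined by linear interpolation."""
    freqs = np.arange(df / 2, fmax, df)  # half-step offset avoids exact grid hits
    vals = np.array([k22(f, areas, lengths) for f in freqs])
    found = []
    for i in np.nonzero(vals[:-1] * vals[1:] < 0)[0]:
        f0, f1, v0, v1 = freqs[i], freqs[i + 1], vals[i], vals[i + 1]
        found.append(f0 - v0 * (f1 - f0) / (v1 - v0))
        if len(found) == n:
            break
    return found


print([round(f) for f in formants([2.0], [17.3])])
```

As a sanity check, a uniform tube gives the quarter-wavelength resonances (2n − 1)c/(4L); for L = 17.3 cm and the assumed sound speed, these are roughly 506, 1517, and 2529 Hz. Piecewise-constant approximations to area functions could be passed as `areas` and `lengths`.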
The purpose of the simulations is to examine the changes of the formant frequencies
when there is expansion in the area function behind the palatal tongue constriction with
a simultaneous constriction in the pharyngeal region. Lip rounding and a palatal tongue
constriction are presumed to be part of the articulation of [ɹ] for all the area functions
examined here. Figure 5 shows a series of area functions, in which increasing area
(expansion) behind the palatal constriction and decreasing area (constriction) in the
pharynx are indicated by vertical arrows. The area functions vary along the vocal tract
axis from 0 cm at the larynx to about 8.6 cm, and are all the same from 8.6 cm to the
lips at 17.3 cm. The area function with uniform area from 0 cm to 8.6 cm is shown by the
thick dashed line, and the area function with the largest expansion behind the palatal
constriction and the greatest degree of constriction in the pharyngeal region is shown by
the thick solid line in the rear of the vocal tract in Figure 5. Intermediate degrees of
expansion and constriction are indicated by narrower lines in Figure 5.
We examine these different area functions in terms of curvature in the rear of the vocal
tract. When the area in the rear of the vocal tract is a constant 2 cm², the area function
curvature there is zero throughout. (Recall that we are speaking of the area function here
and not of the tongue surface, so approximately zero curvature is physically plausible.)
The other four area functions exhibit changes in area function curvature in the rear of the
vocal tract: from negative curvature in the region of increased cross-sectional area right
behind the palatal constriction (the expansion), to positive curvature where the pharyngeal
constriction is formed. As the volume of the expansion increases, the constriction in the
pharyngeal region becomes tighter, and the maximum absolute values of the curvatures,
|κ|, in these regions also increase. Further, the maximum absolute value of the spatial rate
of change of curvature, κ̇, along the curve between the maximum area behind the palatal
constriction and the maximum pharyngeal constriction also increases.
The formant frequencies corresponding to these area functions are shown below the plot
in Figure 5. The formant frequencies from top to bottom go from the least curved to the
most curved rear vocal tract area function, as indicated by the vertical arrow to the left
side of the table. All of the formant frequencies are well below their values for a straight
acoustic tube (i.e. 500 Hz, 1500 Hz, and 2500 Hz, for F1, F2, and F3, respectively) for
all the area functions depicted in Figure 5. As the area function in the rear of the vocal
tract goes from having zero curvature to maximum absolute curvature and spatial rate
of change of curvature, F1 and F2 increase, and F3 decreases. F1 goes from 316 Hz to
321 Hz, a 1.6% increase, F2 goes from 910 Hz to 913 Hz, a 0.3% increase, and F3 goes
from 1975 Hz to 1749 Hz, an 11% decrease. Thus, with negligible changes in F1 and F2, we
obtain a substantial decrease in F3 in going from the configuration with no curvature to the
maximally curved configuration of the area function in the rear of the vocal tract. To
attain a strong [ɹ] with very low F3, it is advantageous to use the configuration with the
maximum absolute curvature and spatial rate of change of curvature in the area function in
the rear of the vocal tract.
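The percentage changes quoted above follow directly from the formant values. This small sketch (hypothetical names) makes the arithmetic explicit. Rounded to one decimal place, the changes are about +1.6% for F1, +0.3% for F2, and −11.4% for F3.

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Formant values (Hz) quoted above: zero-curvature -> maximally curved rear area function.
formant_pairs = {"F1": (316, 321), "F2": (910, 913), "F3": (1975, 1749)}
for name, (flat, curved) in formant_pairs.items():
    print(f"{name}: {percent_change(flat, curved):+.1f}%")
```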
Are there particular anatomical, or geometric, factors that make the production of a
strong [ɹ] particularly challenging for young children? We believe that several factors
impede the production of strong [ɹ] by children of, say, 18 months of age. Factors making
strong [ɹ] production difficult for very young children are discussed in Denny and
McGowan (2012a). We briefly review the factors in the Denny and McGowan work, and
add an additional factor in the present essay.
Before proceeding, we need to relate area functions to the vocal tract as viewed in the
mid-sagittal plane. We are most interested in comparing the adult with the child in the
region behind the palatal constriction. For this purpose, we make the crude assumption
that the lateral dimensions (perpendicular to the mid-sagittal plane) of the child’s scaled
vocal tract are approximately the same as those of the adult (i.e. when the factor S is
applied to the child). This means that the distance from the upper tongue surface to the
outer vocal tract surface of the hard and soft palates and rear pharyngeal wall is
proportional to the vocal tract cross-sectional area, so that child-adult differences in area
correspond to differences in this distance. For example, the distance from the upper
tongue surface to the outer vocal tract surface could be measured along the grid-lines
shown in Figure 6.
The relevant factors in comparing adult and young children’s articulation of [ɹ] are, from
Denny and McGowan (2012a): 1) the orientation of the axes of the oral and pharyngeal
cavities, 2) the relative axial lengths of the oral and pharyngeal cavities, 3) the size of
adenoidal tissue, and 4) the orientations of external tongue muscles. We add an additional
factor here, factor 5), and this is the fact that young palates tend to be less “domed” and
flatter than adult palates in the mid-sagittal plane (Hiki & Itoh, 1986). Here we concentrate
on factors 1), 2), and 5) and leave 3) and 4) to be found by the interested reader in Denny
and McGowan (2012a). (The growth of adenoidal tissue, factor 3), is probably not a factor
in the youngest of children learning to speak.) We will show that factors 1), 2), and 5)
combine to require more extreme tongue surface curvature and spatial rate of change of
curvature for the young child in order to attain the changes in area function curvature to
produce an adult-like strong [ɹ].
Figure 7 shows mid-sagittal configurations for strong [ɹ] production for the adult (Figure
7a) and the child (Figure 7b). As with the area function, the child’s vocal tract has been
scaled by a factor that is equal to the ratio of the length of the adult vocal tract to the
length of the child’s vocal tract, S. We show a tongue-tip-up [ɹ] production, but the
argument below also applies to bunched-tongue [ɹ] production.
The ratio of the length of the child’s pharyngeal cavity to that of its oral cavity is less
than the same ratio for the adult. Thus, for the child to attain the same area function as
the adult, he or she must place the palatal and pharyngeal constrictions farther forward in
relation to the palate and rear pharyngeal wall than the adult does, as shown schematically
in Figure 8. This means that the child makes a palatal constriction that is lower and a
pharyngeal constriction that is higher than the adult’s. As a consequence, the heights of
the two constrictions are more nearly the same for the child than for the adult.
This also means that the expansion behind the palatal constriction contains a greater
proportion of pharynx compared to mouth for the adult than it does for the child. This, in
turn, requires that the expansion be made with a larger proportion of the surface of the
child’s tongue opposite the hard palate. It takes both the tongue surface and the outer vocal
tract wall to form expansions and constrictions. The reader can refer back to Figure 3 to
see that an expansion is best formed with both the tongue surface and the outer vocal tract
wall possessing large negative curvatures. However, a less domed palate for the child
compared to the adult means that this portion of the outer surface has less negative
curvature for the child. Another factor that diminishes the negative curvature of the child’s
outer vocal tract wall in the expansion is that the axis of the mouth and the axis of the
pharynx intersect at a more obtuse angle for the child than for the adult, whose axes are
closer to perpendicular.
What does the child need to do in order to attain an area function similar to the adult’s
in the expansion region? Together, the facts that the two tongue constrictions are more
nearly at the same height for the child than for the adult, and that there is less negative
curvature in the outer wall in the expansion for the child than for the adult, mean that
the child’s tongue surface must possess more negative curvature than the adult’s tongue
surface in the expansion region. A slight curvature is shown in the adult’s tongue surface
in this region in Figure 7(a), but even this is probably not necessary: the adult can have
an essentially flat tongue surface with zero curvature in this region and still produce the
area function in Figure 5.
Not only is the maximum negative curvature of the child’s tongue surface greater in
magnitude than that of the adult, but the average spatial rate of change from the negative
curvature to the positive curvature of the tongue surface in the pharyngeal constriction
region is also larger. We have not considered the curvatures of the surfaces in the region
below the pharyngeal constriction. That region could require another change in tongue
surface curvature, from positive near the pharyngeal constriction to negative in the lower
pharynx.
It may be that the adult could produce an even more negative tongue surface curvature
behind his palatal constriction while, at the same time, retaining the positive tongue
surface curvature necessary to create the pharyngeal constriction. That is, the first part
of the hypothesis expressed in Equation (3), max|κadult| ≥ max|κ′child|, could be met in the
region between, and inclusive of, the palatal and pharyngeal constrictions.
On the other hand, because the tongue surface curvature must change from negative in
the expansion behind the palatal constriction to positive at the pharyngeal constriction
over a small distance, it is likely that the second part of the hypothesis expressed in
Equation (3) is violated if the child is to produce an adult-like strong [ɹ]. Because the
spatial rate of change of tongue surface curvature scales as 1/S², and not 1/S, the second
part of the hypothesis in Equation (3) could be violated while the first part is not. We
write

$$\max|\dot{\kappa}'_{\mathrm{child}}| > \max|\dot{\kappa}_{\mathrm{adult}}| \quad \text{and, perhaps,} \quad \max|\kappa'_{\mathrm{child}}| > \max|\kappa_{\mathrm{adult}}| \qquad (5)$$

where “and, perhaps” is, logically, an inclusive or. Equation (5) contradicts the hypothesis
expressed in Equation (3). Therefore we offer the hypothesis expressed in Equation (3) as a
possible explanation for the inability of young children to produce an adult-like strong [ɹ].
The situation appears severe if both inequalities in Equation (5) are satisfied. We revert
to the child’s unscaled dimensions to find what the child must do to produce strong [ɹ].
Equation (2) gives the unscaled quantities as |κchild| = S|κ′child| and κ̇child = S²κ̇′child. If a
reasonable value for S is taken to be 3/2, then the child’s tongue surface curvature in
absolute terms is at least 150% that of the adult, and the average spatial rate of change of
the tongue surface curvature is at least 225% that of the adult. These are lower bounds;
the actual values could be much larger. If the maximum curvature of the adult’s tongue in
the region of interest is |κadult| = 0.5 cm−1, which corresponds to a radius of curvature,
radult, of 2 cm, then the child’s maximum curvature must be at least |κchild| = 0.75 cm−1,
which corresponds to a radius of curvature, rchild, of 1.333 cm. Thus, the child is required
to attain very small radii of curvature in order to produce a strong [ɹ], and changes
between small radii of curvature (i.e. large curvatures of opposite sign) need to occur in
2/3 of the adult distance along the tongue surface.
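The lower bounds in the preceding paragraph follow from Equation (2) and the first inequality of Equation (5). The sketch below uses a hypothetical helper name and the text's example adult curvature of 0.5 cm−1. It reproduces the 150% and 225% factors, the 1.333 cm radius, and the 2/3 distance factor.

```python
def child_unscaled_bounds(kappa_adult, S):
    """Lower bounds on the child's unscaled curvature and the corresponding
    radius when the scaled child at least matches the adult:
    |kappa_child| = S * |kappa'_child| >= S * |kappa_adult| (Equation (2))."""
    kappa_child = S * abs(kappa_adult)
    return kappa_child, 1.0 / kappa_child


S = 1.5
k_child, r_child = child_unscaled_bounds(0.5, S)
print(k_child, round(r_child, 3))  # 0.75 cm^-1 and 1.333 cm
print(f"curvature factor {S:.0%}, rate factor {S**2:.0%}, distance factor {1 / S:.3f}")
```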
Sibilant fricatives
Our earliest work on children’s speech production was on sibilant fricatives spoken by
American children, aged 3 to 7 years, and by American adults (McGowan & Nittrouer,
1988). We found that the amplitudes of the second formants were relatively high for the
children compared to the adults during sibilant production. Because the great majority
of the second formant energy is behind the tongue constriction for sibilant fricatives, we
argued that the children’s tongue constrictions are generally not as tight during sibilant
production as those for adults. In the notation of the present essay, we expect the scaled
constriction cross-sectional area of the child, A′child, to be greater than the constriction
cross-sectional area of the adult, Aadult. Because area has units of length squared, it
scales as S²:

$$A'_{\mathrm{child}} = S^2 A_{\mathrm{child}} > A_{\mathrm{adult}} \qquad (6)$$
Because S > 1, it is possible for the inequality in Equation (6) to be true and still have
Achild < Aadult . We cannot speak to that possibility at the present time. We believe that
the relation expressed in Equation (6) is the result of the relation expressed in Equation
(3). This cannot be shown with mathematical certainty, but a plausibility argument is
presented.
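The possibility noted above, that Equation (6) holds while the child’s unscaled area is still smaller than the adult’s, can be made concrete. In this sketch the adult constriction area of 0.10 cm² and the child area of 0.06 cm² are hypothetical values chosen only for illustration, and the function names are ours.

```python
def scaled_area(a_child, S):
    """Scaled constriction area A'_child = S**2 * A_child (area ~ length^2)."""
    return S**2 * a_child


def unscaled_range_satisfying_eq6(a_adult, S):
    """Unscaled child areas that satisfy Equation (6) while still being
    smaller than the adult's area: a_adult / S**2 < A_child < a_adult."""
    return a_adult / S**2, a_adult


S = 1.5
a_adult = 0.10  # cm^2, hypothetical adult constriction area
lo, hi = unscaled_range_satisfying_eq6(a_adult, S)
print(f"{lo:.4f} < A_child < {hi:.2f} cm^2")
a_child = 0.06  # cm^2, hypothetical child area inside that range
print(scaled_area(a_child, S) > a_adult)  # True: 0.135 > 0.10
```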
We are not certain about the tongue surface curvatures near the tongue tip during
sibilant production, because the constriction is formed using three-dimensional
deformation of the tongue near the tip, and can be affected by individual variation in
palate morphology. However, we can consider aspects of tongue surface shape in a coronal
plane near the tip. The doming of the child’s palate in coronal planes near the sibilant
constriction is not as great as that of the adult (Hiki & Itoh, 1986). Again, we consider
the child’s vocal tract scaled up by the factor S.
It can be imagined that the adult’s tongue and palate in a coronal plane near the
maximum constriction would look like the sketch shown in Figure 9a. For the scaled child
with A′child = Aadult and his or her shallower palate, the corresponding sketch should be like
the one shown in Figure 9b. The doming of the palate in the coronal plane for the adult
means that it has a large negative curvature, which in turn means that the tongue surface
can have a positive curvature through this coronal plane, and still form a constriction of
small area, Aadult .
What would be the case if the child were to attempt to attain A′child = Aadult? Figure 9b
shows that the tongue needs to form most of three sides of the constriction channel,
because the negative curvature of the child’s palate in the coronal plane has a small
magnitude. This, in turn, requires a rapid change in large-magnitude tongue surface
curvature in the coronal plane from positive, to negative, and back to positive. (The
possibility of a channel that is very narrow and long in the coronal plane is discounted,
because the jet of air emanating from the constriction needs to be directed toward the
incisors.) The large curvatures and rapid changes in curvature could well contradict the
relations expressed in Equation (3) for a coronal plane near the tip of the tongue. Here the
tongue is braced on the sides for both the adult and the child, so we would expect these
quantities to be able to attain larger values than in unbraced configurations. However,
there must be limitations, even in the braced conditions.
In the next section, issues of tongue surface curvature only appear as an aside. We will return to tongue surface curvature as a major factor in speech development in the final section.
Alveolar and velar stops
Fronting of velar consonants is a well-attested phenomenon in young children’s speech.
We believe that this phenomenon, at least for the youngest children, is partly the result of a
scaling mismatch between the physics of acoustic propagation and the physics of the noise
source at the teeth during stop release. Formant frequencies scale according to the ratio of
adult vocal tract length to child vocal tract length, or S, if the child produces adult-like
speech. However, the relation between acoustic sources and the resulting acoustics has
more complicated scaling properties. For instance, if the length of an acoustic tube is
scaled down by a factor of two, we would expect the resonance frequencies to increase by a
factor of two. This is not true of acoustic noise source properties. In fact, we examine an
instance where the predominant noise source property, intensity in higher frequency bands,
depends on the unscaled distance from the tongue tip constriction to the teeth. We do
not expect that S has any effect on the intensity in the higher-frequency bands. (What
constitutes the higher frequency band does depend on S, but not the intensity of sound in
that band.) Thus, intensities in the higher frequency bands are the same for the child and
adult if the distance from the tongue tip constriction to the teeth is the same, all else being
equal.
Here we concentrate on stressed syllable-initial voiceless velar and alveolar stops, which
are aspirated in American English. First, we consider some empirical measures of the noise
after the release bursts for some adults and children. We then argue that a young child’s
intended velar noise source patterns do not resemble those of the adult, because the child
has the impossible task of simultaneously matching the adult’s formant pattern and noise
source pattern at release. The result is that the child produces an intended velar release
with ambiguous acoustic cues for adult listeners.
Another aspect of velar fronting is what has been termed “undifferentiated tongue
gesture”, where the tongue covers a large area of the palate during closure. Indeed, tongue
contact with the palate is a function of tongue surface curvature. We do not address this
aspect of velar fronting in this essay. However, the reader can imagine that limitations on
tongue surface curvature can lead to relatively large areas of palate covered by the tongue
during stop closure.
We performed a small study of the noise spectra of voiceless, aspirated velar and alveolar
stop releases as they evolve in time, with one adult male subject and one adult female subject.
We used a multi-scale spectral analysis proposed by Stockwell, where the bandwidth of the
analysis increases with frequency, while the time resolution also increases with frequency
(Stockwell, 1996). The Stockwell analysis shares this property with wavelet analysis, but it
is more easily interpretable in terms of standard Fourier analysis than are wavelets. (We
thank M.T-T. Jackson for finding Stockwell’s method and computer code on the World
Wide Web.)
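Readers who want to experiment with this kind of analysis can use the following minimal sketch of the discrete S transform. It is written from the FFT-based formulation in Stockwell et al. (1996). It is not the code that was obtained from the Web, and the function name `s_transform` is illustrative. Each row n of the result is formed in three steps: the Fourier spectrum is shifted by n bins, multiplied by a Gaussian whose width is proportional to n, and inverse transformed. This is what makes the analysis bandwidth grow with frequency.

```python
import numpy as np


def s_transform(x):
    """Discrete S (Stockwell) transform computed with the FFT.

    Returns a complex array of shape (N // 2 + 1, N). Row n corresponds to
    frequency n * fs / N (for sampling rate fs); column j to time j / fs.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x)
    # Frequency offsets m = 0, 1, ..., -2, -1 in FFT (wrapped) order.
    m = np.fft.fftfreq(N) * N
    S = np.empty((N // 2 + 1, N), dtype=complex)
    S[0, :] = x.mean()  # the zero-frequency row is the signal mean
    for n in range(1, N // 2 + 1):
        # Gaussian window in frequency whose width is proportional to n, so
        # the bandwidth grows (and time resolution improves) with frequency.
        G = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # Shift the spectrum so that bin n sits at offset 0, window, invert.
        S[n, :] = np.fft.ifft(np.roll(X, -n) * G)
    return S


# Usage: a cosine at bin 8 of a 64-sample frame.
x = np.cos(2 * np.pi * 8 * np.arange(64) / 64)
S = s_transform(x)
# Summing each row over time recovers the FFT coefficient at that frequency.
print(np.allclose(S.sum(axis=1), np.fft.fft(x)[:33]))  # True
```

The final check reflects a defining property of the S transform: it is a time-localized decomposition of the ordinary Fourier spectrum.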
The results for the two adults producing velar and alveolar releases in four different vowel
contexts are shown in Figures 10-13. These figures show the amplitude of different octave
bands for about 5 ms before and 15 ms after the release of the consonant. The center
frequencies of the bands are given in the legends of these figures.
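The essay does not specify how the octave-band amplitudes were derived from the Stockwell analysis. The sketch below assumes one plausible reading. It averages the time-frequency magnitude over the bins that fall within each octave, taken here as c/√2 to c·√2 around a center frequency c. It then keeps the samples from 5 ms before to 15 ms after the release. The function name `octave_band_amplitudes` and the averaging choice are hypothetical.

```python
import numpy as np


def octave_band_amplitudes(tf_mag, freqs, times, centers, t_release,
                           pre=0.005, post=0.015):
    """Mean magnitude in octave bands, in a window around a stop release.

    tf_mag    : (n_freqs, n_times) magnitude of a time-frequency analysis
    freqs     : (n_freqs,) frequency of each row, in Hz
    times     : (n_times,) time of each column, in s
    centers   : octave-band center frequencies, in Hz
    t_release : release time, in s
    Returns (t_rel, bands): times relative to the release and an array of
    shape (len(centers), len(t_rel)). A band containing no bins yields NaN.
    """
    tf_mag, freqs, times = map(np.asarray, (tf_mag, freqs, times))
    in_win = (times >= t_release - pre) & (times <= t_release + post)
    bands = np.empty((len(centers), int(in_win.sum())))
    for i, c in enumerate(centers):
        # Octave band: from one half-octave below to one half-octave above c.
        in_band = (freqs >= c / np.sqrt(2)) & (freqs < c * np.sqrt(2))
        bands[i] = tf_mag[np.ix_(in_band, in_win)].mean(axis=0)
    return times[in_win] - t_release, bands


# Usage with synthetic data: energy only between 3.2 and 6.2 kHz.
fs = 20000.0
times = np.arange(1000) / fs
freqs = np.arange(0.0, 10000.0, 100.0)
tf_mag = np.outer((freqs >= 3200) & (freqs <= 6200), np.ones(times.size))
t_rel, bands = octave_band_amplitudes(tf_mag, freqs, times,
                                      centers=[4400.0, 8900.0],
                                      t_release=0.02)
# bands[0] (4.4 kHz band) is all ones; bands[1] (8.9 kHz band) is all zeros.
```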
In general, the amplitude of the band centered at 4.4 kHz, represented by the thick,
light gray line, remains high after the alveolar burst, but not after the velar burst. For the
front vowel contexts /i/ and /æ/, the next higher band, centered at 8.9 kHz and represented
by thick dashed lines, shows the same place-of-articulation difference, particularly for the
male (Figures 10 and 11). The exceptions to these general trends are the velar releases for
the female in the front vowel contexts /i/ and /æ/. We cannot discount the possibility that
her releases were somewhat fronted in these contexts. However, the amplitude of her
4.4 kHz band does not remain high during the entire 15 ms interval after the burst,
suggesting that multiple releases of the /k/ may have occurred.
These results are perhaps not what one would expect, given that velar releases are known
to be generally slower than alveolar releases. We interpret the results as follows. The
sustained strong 4.4 kHz and 8.9 kHz bands after alveolar release are due to the strong
airflow from the glottis during aspiration being directed toward the teeth by the tongue
blade as the tongue tip moves away from the palate. This creates a strong noise source at
the teeth that persists for a relatively long time after the burst. No such directing of
airflow toward the teeth occurs after velar release, so the high intensities in the 4.4 kHz
and 8.9 kHz bands are brief.
What about the same analysis performed on the speech of young children? Figure 14
shows the same analysis applied to alveolar releases of a child who was 30 months old.
Because of the shorter vocal tract, it is the band with a center frequency of 8.9 kHz that has
the highest sustained amplitude. Figure 15 shows the acoustic evolution of three intended
velar releases. These plots are part of a larger sample that we have examined for six
children from 30 to 48 months of age. The plots indicate that young children’s intended velar
releases have post-release noise characteristics that are more like those of alveolar releases.
The fact that a young child’s vocal tract is shorter means that the release for an intended
velar would be closer to the teeth than the adult’s release, even if the child’s release
were at the velum. The noise after release would therefore have more sustained intensity
in the higher frequency bands. Further, we hypothesize that the young child’s intended velar
releases are indeed fronted in many cases, which would make the noise source after release
even more like the noise source produced after an alveolar release. We now explore a
reason that children are likely to front their velar releases.
There are at least two important perceptual cues for place-of-articulation for the types of
stops that we are considering: 1) formant transition from the release to the following vowel,
and 2) the spectral composition of the noise after release as a function of time. We argue
that the young child cannot simultaneously produce both adult-like perceptual cues for a
/k/ release due to his or her anatomy.
One of the expected features of the formant frequency trajectories at /k/ release is what
has been termed the “F2-F3 pinch”: the second and third formant frequencies appear to
emerge from nearly the same frequency at the time of an adult’s velar release. The explanation
for this pinch is that the constriction at the velum produces a rear acoustic tube that
is approximately twice the length of the front tube. The rear tube is approximately a
half-wave resonator and the front tube a quarter-wave resonator. If the front tube has
length l and c is the speed of sound, the lowest resonance of the front tube is c/(4l), and
that of the rear tube, with length 2l, is c/(2 · 2l) = c/(4l). The two tubes therefore have
nearly identical resonant frequencies, which are ascribed to F2 and F3.
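The arithmetic of the pinch can be checked with idealized uniform tubes. The sketch below rests on several assumptions:

- a speed of sound of 35,000 cm/s;
- the length of the constriction itself and any acoustic coupling between the two tubes are ignored;
- a hypothetical 17 cm adult vocal tract.

The function names are illustrative.

```python
C_CM_PER_S = 35000.0  # assumed speed of sound (cm/s)


def front_resonance(l_front_cm, c=C_CM_PER_S):
    """Lowest resonance of the front tube: quarter-wave (closed at the
    constriction, open at the lips)."""
    return c / (4.0 * l_front_cm)


def rear_resonance(l_rear_cm, c=C_CM_PER_S):
    """Lowest nonzero resonance of the rear tube: half-wave (closed at the
    glottis and at the constriction)."""
    return c / (2.0 * l_rear_cm)


def pinch_front_length(total_cm):
    """Front-tube length at which the two resonances coincide, which
    requires a rear tube twice as long as the front tube."""
    return total_cm / 3.0


L = 17.0  # hypothetical adult vocal tract length (cm)
lf = pinch_front_length(L)
print(round(front_resonance(lf)), round(rear_resonance(L - lf)))  # 1544 1544
```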
The ratio of pharyngeal cavity length to oral cavity length is smaller for the young child
than for the adult. This means that the child needs to place his or her constriction
relatively farther forward, toward the teeth, than the adult does in order to produce the
pinch in F2 and F3 at the release of the intended velar. Figure 16 illustrates this phenomenon.
The top panel shows a velar release for the adult, and the middle panel a velar release for
the young child. For the young child to attain the F2-F3 pinch, it is necessary to move the
tongue closure forward along the hard palate. Not only is the velum closer to the teeth for
the child, but the child has a reason to move the intended /k/ constriction even closer to
the teeth. It is plausible that the child’s /k/ release occurs at a distance from the teeth
comparable to that of the adult’s /t/ release. This diminished distance has substantial
consequences for the other acoustic cue: the spectral properties of the noise source as a
function of time. Roughly speaking, moving a tongue constriction substantially closer to
the teeth can produce sibilance where there would otherwise be none.
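The ratio argument can be made concrete under an idealization: uniform tubes, with the length of the constriction ignored. The pinch requires the front tube to be one third of the total length. So whenever the pharyngeal-to-oral length ratio at the velum falls below 2, the constriction must move forward of the velum. The function name `pinch_offset_cm` is illustrative, and the lengths in the usage lines are hypothetical, not measurements.

```python
def pinch_offset_cm(pharynx_cm, oral_cm):
    """Distance (cm) forward of the velum at which the constriction must sit
    so that the rear tube is twice the front tube (idealized F2-F3 pinch).

    The velum is taken to divide the tract into a rear (pharyngeal) length
    and a front (oral) length; the pinch needs a front length of total / 3.
    A positive result means the constriction must move toward the teeth.
    """
    total = pharynx_cm + oral_cm
    return oral_cm - total / 3.0


# Hypothetical lengths, chosen only to illustrate the ratio argument.
print(round(pinch_offset_cm(11.0, 5.5), 2))  # ratio 2: 0.0
print(round(pinch_offset_cm(5.0, 4.0), 2))   # ratio 1.25: 1.0
```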
We conclude that what adults hear as a fronted velar release is indeed fronted. However,
although the noise source indicates a fronted release, further investigation could show that
the formant trajectories of an intended velar contain the F2-F3 pinch.
Back vowels
We return to issues of tongue surface curvature in relation to the curvature of the outer
vocal tract wall in this most speculative section. Here we address the observed large
variability in the acoustics of children’s back vowel production. McGowan, McGowan,
Denny, and Nittrouer (2014) observed that young children’s (aged 30 to 48 months) back
vowel formant frequencies were more variable than those of other vowels. This variability
was found in terms of the vowels’ mean position in the F1-F2 space as a function of age, as
they were measured about every six months (see their Figure 6). In other words, the back
vowels appear to be acoustically less stable as a function of the age of the young children
than do other vowels. In a study of synthetic speech of a four-year-old child, McGowan
(2006) showed that the vowel pronunciations preferred by adult listeners had longer rear
tube lengths relative to front tube lengths than the comparable adult pronunciations did.
(Note that the rear tube for /ɑ/ is a constriction, and for /u/ it is an expansion.) This
indicates that the constrictions and expansions in the rear of the vocal tract are less well
spatially localized for children than for adults. This is related to notions regarding the
maximum spatial rate of change of tongue surface curvature, κ̇, because tongue curvature
must change rapidly along the length of the tongue to obtain a rapid spatial change
between a constriction and an expansion. According to the hypothesis expressed in the
second inequality in Equation (3) applied to the rear surface of the tongue, we expect the
scaled child’s sagittal tongue surface curvatures to change less rapidly than adult tongue
surface curvatures along much of the tongue body.
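One way to quantify κ̇ from data is sketched below; the data could be, for example, midsagittal tongue contours traced from ultrasound images. The sketch estimates signed curvature and its rate of change with respect to arc length from an ordered set of points, using finite differences. The function name and the finite-difference approach are illustrative choices, not a method described in this essay. In practice, traced contours would need smoothing first, because second differences amplify measurement noise.

```python
import numpy as np


def curvature_profile(x, y):
    """Signed curvature and its spatial rate of change along a sampled contour.

    x, y : coordinates (cm) of points ordered along the tongue surface.
    Returns (s, kappa, kappa_dot): arc length (cm), curvature (1/cm), and
    d(kappa)/ds (1/cm^2). The sign of kappa depends on the direction in which
    the points are ordered (positive for counterclockwise turning).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cumulative chord length approximates arc length.
    s = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(x), np.diff(y)))])
    # First and second derivatives with respect to s (finite differences).
    dx, dy = np.gradient(x, s), np.gradient(y, s)
    ddx, ddy = np.gradient(dx, s), np.gradient(dy, s)
    kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    kappa_dot = np.gradient(kappa, s)
    return s, kappa, kappa_dot


# Usage: half of a circle of radius 3 cm has constant curvature 1/3 per cm,
# so kappa_dot should be close to zero away from the end points.
theta = np.linspace(0.0, np.pi, 400)
s, kappa, kappa_dot = curvature_profile(3 * np.cos(theta), 3 * np.sin(theta))
```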
Could there be a relation between the decreased localization of acoustic tubes and the
observed variability of back vowels at different ages? Of course, the developing vocal tract
changes its morphology rapidly with age (Denny & McGowan, 2012b). This, alone, could
cause substantial acoustic variability. However, we want to determine whether differences
between the child’s and adult’s tongue surface curvature, coupled with outer vocal tract
wall curvature, could contribute to the observed acoustic variability. We assume that the
net effect of limitations on tongue surface curvature is similar to that of limitations on the
curvature of the area function in the rear of the vocal tract. We use a simulation to show
that the observed acoustic variability of children’s back vowels with age could be due to
limitations on the maximum spatial rate of change of tongue surface curvature.
In this section we work with the child’s vocal tract scaled up by the factor S. We examine
area functions that have a constant area in the front and in the rear of the vocal tract.
These are regions of zero area function curvature. (In these regions, the curvature of the
tongue matches the curvature of the outer vocal tract wall, except that the curvatures have
the opposite signs.) The length of the back region of zero curvature of the adult is assumed
to be 2/3 that of the scaled child. Between the regions of zero area function curvature there
is a region of positive curvature that connects them. The connecting region has an axial
length that is 1/4 the length of the vocal tract for the adult, and 3/8 the length of the
scaled vocal tract for the child. These differences in length are consistent with the idea that
maximum scaled rates of change of curvature of the child are less than those of the adult.
Figure 17a shows an area function for the adult, and Figure 17b shows an area function for
the child. The child’s and adult’s area functions have the same basic shape. These area
functions are appropriate for a low, back vowel.
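The essay does not say how the formants were computed. What follows is a minimal sketch of such a simulation under stated assumptions:

- a lossless concatenated-tube model, with a closed glottis and zero acoustic pressure at the lips (no radiation or wall losses);
- a speed of sound of 35,000 cm/s;
- a raised-cosine shape for the connecting region.

The essay specifies only the relative lengths of the regions. The 16 cm length, the region lengths and the areas below are hypothetical values chosen to satisfy those ratios, and the function names are illustrative. Under this model, the formants are the zeros of the (2,2) element of the chain matrix from glottis to lips.

```python
import numpy as np

C_CM_PER_S = 35000.0  # assumed speed of sound in the vocal tract (cm/s)


def area_function(total_cm, rear_cm, conn_cm, rear_area, front_area, dx=0.25):
    """Three-region area function sampled in sections of length dx.

    Sections are ordered from the glottis (index 0) to the lips. The rear
    region has constant area rear_area over rear_cm, the connecting region
    is a raised-cosine transition over conn_cm (an assumed shape), and the
    front region has constant area front_area over the remaining length.
    total_cm should be a multiple of dx.
    """
    n = int(round(total_cm / dx))
    x = (np.arange(n) + 0.5) * dx  # section midpoints, measured from the glottis
    u = np.clip((x - rear_cm) / conn_cm, 0.0, 1.0)  # 0 in rear, 1 in front
    areas = rear_area + (front_area - rear_area) * 0.5 * (1.0 - np.cos(np.pi * u))
    return np.full(n, dx), areas


def k22(freqs, lengths, areas):
    """(2,2) element of the lossless chain matrix from glottis to lips.

    Each section is represented by the real matrix [[cos, sin/A], [-A sin, cos]],
    which drops the factors j and rho*c; neither changes where K22 is zero.
    """
    k = 2.0 * np.pi * np.asarray(freqs, dtype=float) / C_CM_PER_S
    a, b = np.ones_like(k), np.zeros_like(k)
    c, d = np.zeros_like(k), np.ones_like(k)
    for length, area in zip(lengths, areas):
        cs, sn = np.cos(k * length), np.sin(k * length)
        # Running product [[a, b], [c, d]] @ [[cs, sn/area], [-area*sn, cs]].
        a, b, c, d = (a * cs - b * area * sn, a * sn / area + b * cs,
                      c * cs - d * area * sn, c * sn / area + d * cs)
    return d


def formants(lengths, areas, fmax=5000.0, df=1.0, n=3):
    """First n resonances: zeros of K22 (closed glottis, zero pressure at lips)."""
    f = np.arange(df, fmax, df)
    d = k22(f, lengths, areas)
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0]  # sign changes between grid points
    # Refine each root by linear interpolation between neighbouring grid points.
    roots = f[idx] - d[idx] * df / (d[idx + 1] - d[idx])
    return roots[:n]


# Hypothetical dimensions respecting the stated ratios for a 16 cm tract:
# connecting region 1/4 (adult) or 3/8 (scaled child) of the length, and
# an adult rear region 2/3 as long as the child's.
adult = area_function(16.0, rear_cm=4.0, conn_cm=4.0, rear_area=0.5, front_area=3.5)
child = area_function(16.0, rear_cm=6.0, conn_cm=6.0, rear_area=0.5, front_area=3.5)
print("adult F1-F3 (Hz):", np.round(formants(*adult)))
print("child F1-F3 (Hz):", np.round(formants(*child)))
```

A uniform tube is a quick check of the formant finder. With a length of 17.5 cm it should return resonances near 500, 1500 and 2500 Hz, the odd multiples of c/(4L).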
Variability in the area function was simulated by changing the length of the rear
constriction from a minimum length to twice the minimum length in 10% increments. The
cross-sectional area of the constriction was also varied, from 0.3 cm² to 0.7 cm², and every
constriction area was combined with every rear constriction length. Figure 18a shows the
extreme positions for the adult and Figure 18b shows the extreme positions for the child.
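The perturbation design amounts to a full factorial grid over constriction length and constriction area. The sketch below makes three assumptions:

- "10% increments" means steps of 10% of the minimum length, giving 11 lengths.
- The area step is a hypothetical 0.1 cm², since the essay gives only the 0.3 to 0.7 cm² range.
- The minimum length is a hypothetical 4 cm.

The function name is illustrative. Each pair would then be used to rebuild the area function and recompute its formants.

```python
import itertools

import numpy as np


def perturbation_grid(min_len_cm, areas_cm2):
    """All (rear constriction length, constriction area) pairs.

    Lengths run from min_len_cm to 2 * min_len_cm in steps of 10% of
    min_len_cm (11 values); every length is paired with every area.
    """
    lengths = min_len_cm * (1.0 + 0.1 * np.arange(11))
    return list(itertools.product(lengths, areas_cm2))


# Hypothetical values: 4 cm minimum length, 0.1 cm^2 area steps.
grid = perturbation_grid(4.0, [0.3, 0.4, 0.5, 0.6, 0.7])
print(len(grid))  # 55
```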
For the adult, the mean F1 and F2 across all the perturbations were 677 Hz and 826 Hz,
respectively; the range of F1 was 141 Hz, or 21% of the mean, and the range of F2 was
139 Hz, or 17%. For the child, the mean F1 and F2 across all the perturbations were 723 Hz
and 1040 Hz, respectively. The range of F1 for the child was 160 Hz, or 22%, and the range
of F2 was 296 Hz, or 28%. Thus, while the child’s range was comparable to the adult’s range
in F1, the child’s range in F2 was substantially larger than the adult’s.
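Each percentage above is the range divided by the corresponding mean. The short check below recomputes the percentages from the reported values; the function name is illustrative.

```python
def percent_range(mean_hz, range_hz):
    """Range of a formant across perturbations as a percentage of its mean."""
    return 100.0 * range_hz / mean_hz


reported = {
    "adult F1": (677, 141), "adult F2": (826, 139),
    "child F1": (723, 160), "child F2": (1040, 296),
}
for name, (mean_hz, range_hz) in reported.items():
    print(f"{name}: {percent_range(mean_hz, range_hz):.0f}%")
# adult F1: 21%, adult F2: 17%, child F1: 22%, child F2: 28%
```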
We believe that non-back vowels behave differently from back vowels because the rear
portion of the tongue body is more limited in terms of surface curvature. In particular, the
child is less limited by maximum curvatures and spatial rates of change of curvature in the
production of non-back vowels.
Conclusion
In this essay we have concentrated on physical and geometric factors that could limit
a child’s ability to produce adult-like speech in certain instances. Much recent research
has focused on dynamic factors in young speakers’ articulation. We believe that both
geometric and dynamic factors play a role in speakers’ maturation. We have relied on the
unproven idea that highly curved tongues and high spatial rates of change in curvature are
motorically difficult. This idea needs to be investigated.
So why do some children, who apparently have grown out of physical and geometric
limitations, continue to behave as they did months previously? We do not know. However,
one possibility is that some children get along well enough in their limited community
with whatever subphonemic distinctions they possess and are, further, naturally
conservative. These children, for one reason or another, are not motivated to go through
periods of high variability with certain speech motor acts that are necessary to attain new,
more adult-like pronunciations. This merits investigation, if it is not understood already.
Acknowledgments
We thank Margaret Denny and Rebecca McGowan for improving this essay. This essay
is dedicated to all present and past employees of CReSS LLC.
References
Denny, M. and McGowan, R.S. (2012a). Implications of peripheral muscular and
anatomical development for the acquisition of lingual control of speech production: a
review. Folia Phoniatrica et Logopaedica, 64, 105-15.
Denny, M. and McGowan, R.S. (2012b). Sagittal area of the vocal tract in young female
children. Folia Phoniatrica et Logopaedica, 64, 297-303.
Hiki, S. and Ito, H. (1986). Influence of palate shape on lingual articulation. Speech
Communication, 5, 141-58.
McGowan, R.S. (2006). Perception of synthetic vowel exemplars of 4 year old children
and estimation of their corresponding vocal tract shapes. J. Acoust. Soc. Am.,
120, 2850-58.
McGowan, R.S. (ongoing). Lectures in Acoustics for the Speech Sciences. CReSS LLC,
Lexington, MA. (www.cressllc.net).
McGowan, R.S., Denny, M., and Jackson, M.T-T. (2011). Alveolar and velar stop
releases during speech development. J. Acoust. Soc. Am., 129, 2597 (abstract).
McGowan, R.S. and Nittrouer, S. (1988). Differences in fricative production between
children and adults: evidence from an acoustic analysis of /ʃ/ and /s/. J. Acoust.
Soc. Am., 83, 229-36.
McGowan, R.S., Nittrouer, S., and Manning, C.J. (2004). Development of [ɹ] in young,
Midwestern, American children. J. Acoust. Soc. Am., 115, 871-84.
McGowan, R.W., McGowan, R.S., Denny, M., and Nittrouer, S. (2014). A longitudinal
study of children’s vowel production. J. Speech Lang. Hear. Res., 57, 1-15.
Stockwell, R.G., Mansinha, L., and Lowe, R.P. (1996). Localization of the complex
spectrum: the S transform. IEEE Trans. Sig. Proc., 44, 998-1001.
Figure 5. Area functions: cross-sectional area (cm²) versus axis position (cm), with the lips marked. Legend values: F1 316 to 321 Hz, F2 910 to 913 Hz, F3 1749 to 1975 Hz.
Figures 10 to 15. Octave-band amplitudes around stop releases (images not reproduced).
Figure 17. Area functions, areas (cm²) versus tube axis position (cm), with the lips marked: (a) adult, (b) child.
Figure 18. Extreme perturbed area functions, areas (cm²) versus tube axis position (cm): (a) adult, (b) child.