CReSS LLC Working Papers
Working Paper 2

An essay on physical and geometric factors in speech development

R. S. McGowan
CReSS LLC
1 Seaborn Place, Lexington MA 02420

April 2, 2014

1 Introduction

Physical and geometric aspects of children’s vocal tracts help determine the sounds that children produce as they learn to speak. Between the ages of 6 and 30 months, the vocal tract undergoes substantial physical change. It is a well-accepted notion that many of the speech problems that appear at the beginning of elementary school are simply continuations of earlier speech behaviors. Our view is that many of these earlier, preserved behaviors first arise because of physical/geometric (roughly, physiological/anatomical) factors that constrain children as they begin to produce speech. If this thesis is correct, then some children preserve earlier behaviors that 1) are not adult-like and 2) may no longer be necessary, because the physical constraints that required the non-adult-like behavior no longer exist. This essay proposes reasons why certain speech sounds may not be produced in an adult-like manner as children begin to talk. The essay does not consistently reference previous scientific work in speech development, other than work that we have done at CReSS LLC. (We do reference two other works that do not appear in the bibliographies of our own published works.) This is partly because we want to synthesize the knowledge of particular aspects of speech development that has emerged from research at CReSS LLC. The CReSS LLC personnel whose work is included are Margaret Denny, Michel Jackson, Rebecca McGowan, and Richard McGowan. We have also had an ongoing collaboration in speech development with Dr. Susan Nittrouer. We are proposing to study children’s tongue curvature using ultrasound with Professor Diana Archangeli.
We make one underlying assumption that, we believe, many researchers share: children try to be like adults in their speech. (The desire to be like adults may simply reflect a motivation to imitate, a motivation to be understood, or both. We are neutral as to the reason in the present essay.) We take the operational consequence of this assumption to be that children attempt to produce speech that is a proportionately scaled version of adult speech. Specifically, the proportionate scaling should hold in the Fourier frequency domain of the acoustic signal. This is a fairly strong statement, but something like it is necessary in order to proceed. There are many possible scalings that could be employed when comparing adults and children; one variant of the present assumption would use a perception-weighted frequency scale instead. We consider four kinds of English speech sounds: 1) [ɹ], 2) sibilant fricatives, 3) alveolar and velar stops, and 4) back vowels. We will discuss tongue surface shape with regard to 1), 2), and 4). All four involve the issue of vocal tract length scaling. Next, we provide some background on the geometric concepts and state hypotheses using these concepts.

Curvature, scaling, and hypotheses

We characterize functions and surface shapes using a quantity called curvature. In particular, we examine curvature for area functions, tongue surfaces, and outer vocal tract wall surfaces. An area function is a plot of the cross-sectional area of an acoustic tube, such as a vocal tract, as a function of axial position along the tube. Figure 1 shows an area function with two circles “touching” the function at points (x₁, A₁) and (x₂, A₂). These circles illustrate the definition of curvature at these points. For example, near (x₁, A₁) the arc of the circle approximates the plot of the function.
The more “bent” the curve at (x₁, A₁), the smaller the radius, r₁, of the circle, and the higher the curvature of the curve at that point. We therefore define the curvature κ₁ at (x₁, A₁) to be plus or minus the reciprocal of the radius of the approximating circle. We assign a positive sign if the approximating circle touches the function plot from above, and a negative sign if it touches the function plot from below. Referring to Figure 1, κ₁ = −1/r₁ and κ₂ = 1/r₂; both r₁ and r₂ are radii of curvature. So if r₁ = 2 cm and r₂ = 4 cm, then κ₁ = −0.5 cm⁻¹ and κ₂ = 0.25 cm⁻¹. Note that |κ₁| > |κ₂| when r₁ < r₂. We will refer to changes in curvature, for example |κ₂ − κ₁|. With the numerical values given above, |κ₂ − κ₁| = 0.75 cm⁻¹. Note that when two curvatures have opposite signs, the absolute difference between them is the sum of their absolute values.

Given a surface, such as the outer tongue surface, we can cut that surface with a plane, such as the mid-sagittal plane or a coronal plane. The intersection of that plane and the tongue surface is a two-dimensional curve. Figure 2 shows these resulting two-dimensional curves in bold. The idea of curvature also applies to these curves. Figure 3 shows the curves that result from the mid-sagittal cut of the tongue and the outer surface of the vocal tract (e.g., the palate and rear pharyngeal wall). Curvature can be defined at each point on these curves in the same way that it is defined for the area function: any point on either the tongue surface curve or the outer vocal tract curve has an approximating circle, the radius of that circle is the radius of curvature of the curve at that point, and the reciprocal of that radius is the curvature, up to a sign. We use a negative sign if the approximating circle touches the curve from inside the vocal tract.
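The sign convention for area-function plots can be checked numerically. The sketch below is our own illustration, not a method from this essay: it assumes the plotted curve is the graph of a smooth function y = f(x) and estimates its signed curvature with central finite differences, using the standard graph formula κ = f″/(1 + f′²)^(3/2). That formula is positive where the curve is concave up (the approximating circle touches from above) and negative where it is concave down, which matches the convention just stated. The helper names `signed_curvature`, `cap`, and `cup` and the step size `h` are hypothetical.

```python
import math


def signed_curvature(f, x, h=1e-3):
    """Signed curvature of the graph y = f(x) at x (hypothetical helper).

    Uses central differences for f' and f'' and the standard graph formula
    kappa = f'' / (1 + f'**2) ** 1.5. The result is positive where the curve
    is concave up (approximating circle touches from above) and negative
    where it is concave down (circle touches from below).
    """
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return d2 / (1.0 + d1 * d1) ** 1.5


# Arcs with the radii of the worked example (lengths in cm, curvature in 1/cm).
def cap(x):
    """Top of a circle of radius 2: concave down, like the curve at (x1, A1)."""
    return math.sqrt(4.0 - x * x)


def cup(x):
    """Bottom of a circle of radius 4: concave up, like the curve at (x2, A2)."""
    return 10.0 - math.sqrt(16.0 - x * x)


kappa1 = signed_curvature(cap, 0.0)   # close to -0.5
kappa2 = signed_curvature(cup, 0.0)   # close to 0.25
print(round(kappa1, 4), round(kappa2, 4), round(abs(kappa2 - kappa1), 4))
# prints -0.5 0.25 0.75
```

Finite differences are used only to keep the sketch free of symbolic algebra; for measured tongue or area-function data, the choice of smoothing and step size would matter far more than it does for these exact arcs.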
Otherwise the curvature is positive. This is consistent with the sign convention for the curvature of the area function plot. Thus, the curvature is positive if the curve “bulges” into the vocal tract (e.g., circles numbered 1, 3, and 5), and negative if it is “cupped away” or “arched away” from the vocal tract (e.g., circles numbered 2 and 4). Figure 4 shows a coronal section of a highly grooved tongue surface curve and the hard palate curve. The same sign conventions apply to these curves.

We consider one more measure: the average spatial rate of change of curvature, κ̇, which we now define. Consider a smooth two-dimensional curve, and let s = s(x, y) be the arc length along the curve from its left-most endpoint to the point (x, y). For two points on the curve, (x₁, y₁) and (x₂, y₂), |s₂ − s₁| = |s(x₂, y₂) − s(x₁, y₁)| is the distance along the curve between them. The average spatial rate of change of curvature is defined as

$$\dot{\kappa} \equiv \frac{\kappa_2 - \kappa_1}{s_2 - s_1} \qquad (1)$$

where κ₁ and κ₂ are the curvatures at (x₁, y₁) and (x₂, y₂), respectively. With r₁ = 2 cm, κ₁ = −0.5 cm⁻¹, r₂ = 4 cm, κ₂ = 0.25 cm⁻¹, and s₂ − s₁ = 3 cm, κ̇ = 0.25 cm⁻². κ̇ measures how rapidly curvature changes along the curve. We often refer to κ̇ as the spatial rate of change of curvature, omitting the term “average”. Vocal tract length scaling will often be employed when comparing a child’s and an adult male’s vocal tracts. (When speaking about hypotheses and simulations, we will be speaking of a typical child and a typical adult male. Only when we discuss previous results do we refer to groups or populations of children and adults.) The scale factor, S, is the ratio of the adult’s vocal tract length to the child’s vocal tract length. It will often be applied to the child’s vocal tract to make it easier to compare the shapes of the tracts.
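Equation (1) and the arc-length distance it relies on can be computed directly. In this sketch, which is an illustration and not the author’s code, the curve is assumed to be available as a list of sampled (x, y) points, so s₂ − s₁ is approximated by the length of the polyline between two samples. The names `arc_length` and `kappa_dot` are hypothetical.

```python
import math


def arc_length(points):
    """Length of the polyline through (x, y) samples: approximates s2 - s1."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def kappa_dot(kappa1, kappa2, s1, s2):
    """Average spatial rate of change of curvature, Equation (1)."""
    return (kappa2 - kappa1) / (s2 - s1)


# Worked example from the text: kappa1 = -0.5 1/cm, kappa2 = 0.25 1/cm, 3 cm apart.
print(kappa_dot(-0.5, 0.25, 0.0, 3.0))  # prints 0.25 (units: 1/cm^2)

# Arc-length check on a quarter circle of radius 2 cm (exact length pi cm).
n = 1000
quarter = [(2 * math.cos(t), 2 * math.sin(t))
           for t in (0.5 * math.pi * k / (n - 1) for k in range(n))]
print(round(arc_length(quarter), 4))  # close to 3.1416
```

Because κ̇ divides a curvature difference by a distance, errors in the sampled distance propagate directly into κ̇; dense sampling keeps the polyline approximation close to the true arc length.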
When the child’s vocal tract is scaled in this way and our criterion for adult-like speech is met, the child’s speech output in the Fourier frequency domain should match that of the adult. Note that S > 1 and, typically, S < 2. We will use S = 3/2 in examples. The child’s lengths can be scaled up by S for direct comparison between child and adult area functions and vocal tract shapes. Thus, if ℓ is a length for the child, the scaled length is ℓ′ = Sℓ. Because curvature, κ, has units of inverse length (e.g., cm⁻¹) and κ̇ has units of inverse length squared (e.g., cm⁻²), the corresponding scaled quantities are

$$\kappa' = \frac{\kappa}{S} \quad\text{and}\quad \dot{\kappa}' = \frac{\dot{\kappa}}{S^2} \qquad (2)$$

Thus, scaling length up with S > 1 scales curvature and spatial rate of change of curvature down. Further, the percentage difference between scaled and unscaled values is greater for κ̇ than for κ. Using the values from the example after Equation (1), we obtain κ′₁ = −0.333 cm⁻¹, κ′₂ = 0.167 cm⁻¹, and κ̇′ = 0.111 cm⁻² from κ₁ = −0.5 cm⁻¹, κ₂ = 0.25 cm⁻¹, and κ̇ = 0.25 cm⁻², respectively, when S = 3/2.

We are now ready to state two plausible hypotheses regarding tongue surface curvature and spatial rate of change of curvature. These hypotheses are stated in terms of maximum values of these quantities. The first hypothesis is that the maximum absolute values of the child’s scaled curvature and scaled spatial rate of change of curvature do not exceed the same quantities for the adult. Symbolically,

$$\max|\kappa_{\text{adult}}| \ge \max|\kappa'_{\text{child}}| \quad\text{and}\quad \max|\dot{\kappa}_{\text{adult}}| \ge \max|\dot{\kappa}'_{\text{child}}| \qquad (3)$$

Note that the first inequality can be stated in terms of radius of curvature: the child’s minimum scaled radius of curvature is no smaller than the adult’s minimum radius of curvature. Further, the hypothesis can be restricted to a particular region of the tongue.
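A short sketch of the scaling in Equation (2), reproducing the numbers quoted above. The function names are hypothetical and S = 3/2 is the essay’s example value. A radius of curvature is a length and so scales up by S, which is exactly why curvature, its reciprocal, scales down by S.

```python
S = 1.5  # scale factor used in the essay's examples


def scale_length(length, S):
    """l' = S * l: a child's length scaled up to adult size."""
    return S * length


def scale_curvature(kappa, S):
    """kappa' = kappa / S, Equation (2): curvature has units of 1/length."""
    return kappa / S


def scale_kappa_dot(kappa_dot, S):
    """kappa_dot' = kappa_dot / S**2, Equation (2): units of 1/length**2."""
    return kappa_dot / S ** 2


# Values from the example after Equation (1).
print(round(scale_curvature(-0.5, S), 3),   # -0.333 1/cm
      round(scale_curvature(0.25, S), 3),   # 0.167 1/cm
      round(scale_kappa_dot(0.25, S), 3))   # 0.111 1/cm^2
```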
Even if the child’s maximum scaled values in Equation (3) are less than the adult’s, the child’s maximum unscaled values could be greater than the adult’s. For instance, the child may be able to produce a maximum unscaled curvature of |κ_child| = 0.6 cm⁻¹, which corresponds to a minimum unscaled radius of curvature of r = 1.667 cm. Suppose that the child can attain a maximum spatial rate of change of curvature of κ̇_child = 0.25 cm⁻². The adult could have a maximum curvature of |κ_adult| = 0.5 cm⁻¹, which corresponds to a minimum radius of curvature of 2 cm, and a maximum spatial rate of change of curvature of κ̇_adult = 0.15 cm⁻². With a scaling factor of S = 3/2, the child’s maximum scaled curvature is |κ′_child| = 0.4 cm⁻¹ < 0.5 cm⁻¹ = |κ_adult|, and the child’s maximum scaled spatial rate of change of curvature is κ̇′_child = 0.111 cm⁻² < 0.15 cm⁻² = κ̇_adult, even though both unscaled child values exceed the adult values. Thus, the following hypothesis regarding the child’s unscaled quantities is more restrictive than the hypothesis expressed in Equation (3):

$$\max|\kappa_{\text{adult}}| \ge \max|\kappa_{\text{child}}| \quad\text{and}\quad \max|\dot{\kappa}_{\text{adult}}| \ge \max|\dot{\kappa}_{\text{child}}| \qquad (4)$$

What are the reasons to choose one hypothesis or the other? The author has no more than an intuition that a more highly curved tongue, or a tongue shape that requires rapid spatial changes in curvature, is more difficult for a speaker to produce. By “difficult,” we mean that the motor units need to be more highly spatially differentiated and simultaneously controlled. Here we are concerned mostly with tongues performing speech acts, and not other acts such as swallowing. (Margaret Denny points out that the distinction can probably be made between learned behavior and behavior that is present at birth, or between cortically mediated behavior and bulbar control.) Further, except in the case of sibilant fricatives, we are concerned with an unbraced tongue.
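The difference between Equations (3) and (4) can be made concrete by checking both against the numerical example above. The dictionary layout and the function names are hypothetical, and the values are the illustrative maxima from the text, not measurements.

```python
def hypothesis_3(adult, child, S):
    """Eq. (3): adult maxima bound the child's *scaled* maxima."""
    return (adult["kappa"] >= child["kappa"] / S
            and adult["kappa_dot"] >= child["kappa_dot"] / S ** 2)


def hypothesis_4(adult, child):
    """Eq. (4): adult maxima bound the child's *unscaled* maxima."""
    return (adult["kappa"] >= child["kappa"]
            and adult["kappa_dot"] >= child["kappa_dot"])


# Maximum |curvature| in 1/cm and maximum |spatial rate of change| in 1/cm^2.
child = {"kappa": 0.6, "kappa_dot": 0.25}
adult = {"kappa": 0.5, "kappa_dot": 0.15}

print(hypothesis_3(adult, child, 1.5))  # True: 0.4 <= 0.5 and 0.111 <= 0.15
print(hypothesis_4(adult, child))       # False: 0.6 > 0.5
```

Because κ′ = κ/S ≤ κ and κ̇′ = κ̇/S² ≤ κ̇ whenever S > 1, any case that satisfies Equation (4) also satisfies Equation (3), but not conversely; that is the sense in which Equation (4) is more restrictive.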
Because the young child learning to speak lacks the experience of the adult speaker, we expect the hypothesis expressed in Equation (3) to be true. That is, we expect that the child cannot produce articulations that, once the child’s vocal tract is scaled up, are more difficult than those of the adult. The more restrictive hypothesis in Equation (4), involving unscaled values, could be true, but we do not have physiological evidence in terms of, say, the independence of control of motor units in the tongues of the child versus the adult (Denny & McGowan, 2012a). Thus, in the remainder of this essay we invoke the less restrictive hypothesis expressed in Equation (3).

[ɹ] production

In a longitudinal acoustic study of a small group of young American children growing up in a rhotic dialect area, McGowan, Nittrouer, and Manning (2004) showed that stressed syllable-initial [ɹ] was the last of the rhotic sounds to develop. For adults, stressed syllable-initial [ɹ] is the strongest [ɹ] in comparison with [ɹ] in other contexts, including medial and syllable-final [ɹ]. Here, the term “strong” means that the first three formant frequencies attain very low values; in particular, the third formant frequency, F3, is very low for strong [ɹ]. We believe that stressed, syllable-initial [ɹ] is also the most extreme of all rhotic articulations, in the sense that its oral constrictions are the tightest and its area function expansions are the greatest. We now explore this idea with area-function-to-acoustics simulations. We can compute formant frequencies from area functions (McGowan, ongoing). In the present simulations we use a vocal tract length of 17.3 cm for both the adult and the scaled child. Thus, we use an adult male vocal tract length and consider the child’s vocal tract to be scaled up by S, the ratio of the adult’s vocal tract length to the child’s, throughout this section except in its final paragraph. We do not consistently use the term “scaled” before this final paragraph, but scaling should be understood implicitly. While [ɹ] is glide-like in its articulation, we consider the glide at its most extreme articulation, which should correspond to the lowest values of the first three formant frequencies, F1, F2, and F3. The purpose of the simulations is to examine the changes in the formant frequencies when an expansion of the area function behind the palatal tongue constriction is accompanied by a simultaneous constriction in the pharyngeal region. Lip rounding and a palatal tongue constriction are presumed to be part of the articulation of [ɹ] for all the area functions examined here.

Figure 5 shows a series of area functions in which increasing cross-sectional area (expansion) behind the palatal constriction and decreasing cross-sectional area (constriction) in the pharynx are indicated by vertical arrows. The area functions vary along the vocal tract axis from 0 cm at the larynx to about 8.6 cm, and are all the same from 8.6 cm to the lips at 17.3 cm. The area function with constant area from 0 cm to 8.6 cm is shown by the thick dashed line, and the area function with the largest expansion behind the palatal constriction and the tightest pharyngeal constriction is shown by the thick solid line in the rear of the vocal tract. Narrower lines indicate intermediate degrees of expansion and constriction. We examine these area functions in terms of curvature in the rear of the vocal tract. When the area function in the rear of the vocal tract is a constant 2 cm², its curvature is zero throughout. (Recall that we are speaking of the area function here and not of the tongue surface, so approximately zero curvature is realistic.)
The other four area functions exhibit changes in area function curvature in the rear of the vocal tract: from negative curvature in the region of increased cross-sectional area right behind the palatal constriction (the expansion) to positive curvature where the pharyngeal constriction is formed. As the volume of the expansion increases, the pharyngeal constriction becomes tighter, and the maximum absolute values of the curvature, |κ|, in these regions also increase. Further, the maximum absolute value of the spatial rate of change of curvature, κ̇, along the curve between the point of maximum area behind the palatal constriction and the point of tightest pharyngeal constriction also increases. The formant frequencies corresponding to these area functions are shown in the table below the plot in Figure 5. Its rows, from top to bottom, go from the least curved to the most curved rear vocal tract area function, as indicated by the vertical arrow to the left of the table. For all the area functions depicted in Figure 5, all of the formant frequencies are well below their values for a uniform acoustic tube of the same length (approximately 500 Hz, 1500 Hz, and 2500 Hz for F1, F2, and F3, respectively). As the area function in the rear of the vocal tract goes from zero curvature to maximum absolute curvature and spatial rate of change of curvature, F1 and F2 increase and F3 decreases. F1 goes from 316 Hz to 321 Hz, a 1.6% increase; F2 goes from 910 Hz to 913 Hz, a 0.3% increase; and F3 goes from 1975 Hz to 1749 Hz, an 11% decrease. Thus, with negligible changes in F1 and F2, we obtain a substantial decrease in F3 in going from the configuration with no curvature to the maximally curved configuration of the area function in the rear of the vocal tract. To attain a strong [ɹ] with very low F3, it is advantageous to use the configuration with the maximum absolute curvature and spatial rate of change of curvature in the rear area function.
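The simulations above compute formant frequencies from area functions (McGowan, ongoing). As a hedged illustration of how such a computation can work, and not a reproduction of that code, the sketch below implements a standard lossless concatenated-tube (transmission-line) model: the tract is closed at the glottis and ideally open at the lips, and its resonances are the zeros of one element of the chain matrix linking the two ends. It ignores wall losses, radiation impedance, and nasal coupling, and it assumes a speed of sound of 35,000 cm/s. The function names, the 1 Hz scan step, and that speed of sound are assumptions. For a uniform 17.3 cm tube it reproduces the reference values quoted above.

```python
import math

SPEED_OF_SOUND = 35000.0  # cm/s (assumed)


def k22(freq, areas, lengths, c=SPEED_OF_SOUND):
    """(2,2) element of the chain matrix relating glottis to lips.

    areas (cm^2) and lengths (cm) list the tube sections from glottis to lips.
    With the glottis closed (volume velocity 0) and the lips ideally open
    (pressure 0), resonances occur where this element is zero. Each section
    uses the real-valued form [[cos t, -z sin t], [sin t / z, cos t]] with
    z = 1 / area; the constant rho*c is dropped because it does not move the
    zeros.
    """
    m11, m12, m21, m22 = 1.0, 0.0, 0.0, 1.0
    for area, length in zip(areas, lengths):
        t = 2.0 * math.pi * freq * length / c
        cs, sn, z = math.cos(t), math.sin(t), 1.0 / area
        a11, a12, a21, a22 = cs, -z * sn, sn / z, cs
        # multiply the running product by this section's matrix on the right
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    return m22


def formants(areas, lengths, fmax=4000.0, step=1.0, c=SPEED_OF_SOUND):
    """Resonance frequencies (Hz) below fmax: grid scan for sign changes, then bisection."""
    found = []
    f_prev, v_prev = step, k22(step, areas, lengths, c)
    f = 2.0 * step
    while f <= fmax:
        v = k22(f, areas, lengths, c)
        if v_prev * v < 0.0:
            lo, hi, v_lo = f_prev, f, v_prev
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                v_mid = k22(mid, areas, lengths, c)
                if v_lo * v_mid <= 0.0:
                    hi = mid
                else:
                    lo, v_lo = mid, v_mid
            found.append(0.5 * (lo + hi))
        f_prev, v_prev = f, v
        f += step
    return found


# Uniform 17.3 cm tube: quarter-wave resonances c / (4 L) * (1, 3, 5).
print([round(f, 1) for f in formants([2.0], [17.3], fmax=3000.0)])
# prints [505.8, 1517.3, 2528.9]
```

An area function like those in Figure 5 would be passed as parallel lists of section areas and lengths ordered from glottis to lips; with realistic losses and lip radiation added, the computed values would shift somewhat from this idealized model.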
Are there particular anatomical, or geometric, factors that make a strong [ɹ] especially challenging for young children to produce? We believe that several factors impede the production of strong [ɹ] by children of, say, 18 months of age. Such factors are discussed in Denny and McGowan (2012a). We briefly review them here and add one further factor. Before proceeding, we need to relate area functions to the vocal tract as viewed in the mid-sagittal plane. We are most interested in comparing the adult with the child in the region behind the palatal constriction. For this purpose, we make the crude assumption that the anatomical features of the child’s scaled vocal tract in the dimension lateral to the mid-sagittal plane are approximately the same as those of the adult (i.e., once the factor S is applied to the child). This means that differences between the child and the adult in the distance from the upper tongue surface to the outer vocal tract surface (hard and soft palates and rear pharyngeal wall) are reflected proportionally in the vocal tract cross-sectional areas. For example, the distance from the upper tongue surface to the outer vocal tract surface could be measured along the grid lines shown in Figure 6. The relevant factors in comparing adults’ and young children’s articulation of [ɹ] are, from Denny and McGowan (2012a): 1) the orientation of the axes of the oral and pharyngeal cavities, 2) the relative axial lengths of the oral and pharyngeal cavities, 3) the size of adenoidal tissue, and 4) the orientations of the external tongue muscles. We add factor 5) here: young palates tend to be less “domed,” or flatter, than adult palates in the mid-sagittal plane (Hiki & Itoh, 1986).
Here we concentrate on factors 1), 2), and 5) and refer the interested reader to Denny and McGowan (2012a) for factors 3) and 4). (The growth of adenoidal tissue, factor 3), is probably not relevant for the youngest children learning to speak.) We will show that factors 1), 2), and 5) combine to require more extreme tongue surface curvature and spatial rate of change of curvature of the young child in order to attain the changes in area function curvature needed for an adult-like strong [ɹ]. Figure 7 shows mid-sagittal configurations for strong [ɹ] production for the adult (Figure 7a) and the child (Figure 7b). As with the area function, the child’s vocal tract has been scaled by S, the ratio of the length of the adult vocal tract to the length of the child’s vocal tract. We show a tongue-tip-up [ɹ] production, but the argument below also applies to bunched-tongue [ɹ] production. The ratio of pharyngeal cavity length to oral cavity length is smaller for the child than for the adult. Thus, for the child to attain the same area function as the adult, he or she must place the palatal and pharyngeal constrictions farther forward relative to the palate and rear pharyngeal wall than the adult does, as shown schematically in Figure 8. This means that the child’s palatal constriction is lower, and the pharyngeal constriction higher, than the adult’s. As a consequence, the heights of the two constrictions are closer to each other for the child than for the adult. It also means that the expansion behind the palatal constriction contains a greater proportion of pharynx relative to mouth for the adult than it does for the child. This, in turn, requires that a larger proportion of the child’s expansion be formed by the surface of the tongue opposite the hard palate. Expansions and constrictions are formed jointly by the tongue surface and the outer vocal tract wall.
The reader can refer back to Figure 3 to see that an expansion is best formed when both the tongue surface and the outer vocal tract wall possess large negative curvatures. However, because the child’s palate is less domed than the adult’s, this portion of the outer surface has less negative curvature for the child. Another factor that diminishes the negative curvature of the child’s outer vocal tract wall in the expansion is that the axes of the mouth and the pharynx intersect at a more obtuse angle for the child than for the adult, whose axes are closer to perpendicular. What does the child need to do to attain an area function similar to the adult’s in the expansion region? Because the two tongue constrictions are more nearly at the same height for the child than for the adult, and because the outer wall in the expansion has less negative curvature for the child than for the adult, the child’s tongue surface must possess more negative curvature than the adult’s tongue surface in the expansion region. A slight curvature is shown in the adult’s tongue surface in this region in Figure 7(a), but even this is probably not necessary: the adult can produce the area function in Figure 5 with an essentially flat tongue surface, of zero curvature, in this region. Not only is the maximum negative curvature of the child’s tongue surface greater in magnitude than that of the adult, but the average spatial rate of change of tongue surface curvature, from negative in the expansion to positive in the pharyngeal constriction region, is also larger. We have not considered the curvatures of the surfaces in the region below the pharyngeal constriction. That region could require another change in tongue surface curvature, from positive near the pharyngeal constriction to negative in the lower pharynx.
It may be that the adult could produce an even greater negative tongue surface curvature behind the palatal constriction while retaining the positive tongue surface curvature necessary to create the pharyngeal constriction. That is, in the region between, and inclusive of, the palatal and pharyngeal constrictions, the first part of the hypothesis expressed in Equation (3), max|κ_adult| ≥ max|κ′_child|, could be met. On the other hand, because the tongue surface curvature must change from negative in the expansion behind the palatal constriction to positive in the pharyngeal constriction over a short distance, it is likely that the second part of the hypothesis expressed in Equation (3) is violated when the child produces an adult-like strong [ɹ]. It is because the spatial rate of change of tongue surface curvature scales as S⁻², and not S⁻¹, that the second part of the hypothesis in Equation (3) could be violated while the first part is not. We write

$$\max|\dot{\kappa}'_{\text{child}}| > \max|\dot{\kappa}_{\text{adult}}| \quad\text{and, perhaps,}\quad \max|\kappa'_{\text{child}}| > \max|\kappa_{\text{adult}}| \qquad (5)$$

where “and, perhaps” is, logically, an inclusive or. Equation (5) contradicts the hypothesis expressed in Equation (3). Therefore, we offer the truth of the hypothesis expressed in Equation (3) as a possible explanation for the inability of young children to produce an adult-like strong [ɹ]. The situation is more severe if both inequalities in Equation (5) are satisfied. We now revert to the child’s unscaled dimensions to find what the child must do to produce strong [ɹ]. Equation (2) gives the corresponding unscaled quantities. Taking S = 3/2 as a reasonable value, the child’s unscaled tongue surface curvature must be at least 150% of the adult’s in absolute value, and the average spatial rate of change of the child’s tongue surface curvature must be at least 225% of the adult’s. These are lower bounds; the actual values could be much larger.
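The lower bounds follow from inverting Equation (2): if the child’s scaled values must at least match the adult’s, the unscaled child values must be at least S and S² times the adult values, and distances along the tongue shrink by the factor 1/S. In this minimal sketch the function name is hypothetical, the adult curvature of 0.5 cm⁻¹ is the essay’s illustrative value, and the adult spatial rate of change of 0.15 cm⁻² is a hypothetical value borrowed from the earlier example.

```python
def unscaled_bounds(kappa_adult, kappa_dot_adult, S):
    """Minimum unscaled child values implied by |kappa'| >= |kappa_adult| and
    |kappa_dot'| >= |kappa_dot_adult|, using kappa = S kappa' and
    kappa_dot = S**2 kappa_dot' (inverse of Equation (2))."""
    return S * abs(kappa_adult), S ** 2 * abs(kappa_dot_adult)


S = 1.5
print(S, S ** 2)  # 1.5 2.25 -> at least 150% and 225% of the adult values

kappa_min, kappa_dot_min = unscaled_bounds(0.5, 0.15, S)  # 0.15 is hypothetical
print(kappa_min, round(1.0 / kappa_min, 3))  # 0.75 1/cm; radius at most 1.333 cm
print(round(kappa_dot_min, 4))               # 0.3375 1/cm^2
print(round(1.0 / S, 3))                     # 0.667 of the adult distance
```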
If the maximum curvature of the adult’s tongue in the region of interest is |κ_adult| = 0.5 cm⁻¹, which corresponds to a radius of curvature, r_adult, of 2 cm, then the child’s maximum curvature must be at least |κ_child| = 0.75 cm⁻¹, which corresponds to a radius of curvature, r_child, of at most 1.333 cm. Thus, the child must attain very small radii of curvature to produce a strong [ɹ], and changes between small radii of curvature (i.e., large curvatures of opposite sign) must occur within 2/3 of the adult’s distance along the tongue surface.

Sibilant fricatives

Our earliest work on children’s speech production was on sibilant fricatives spoken by American children, aged 3 to 7 years, and by American adults (McGowan & Nittrouer, 1988). We found that second-formant amplitudes were relatively high for the children compared to the adults during sibilant production. Because, for sibilant fricatives, the great majority of the second-formant energy is behind the tongue constriction, we argued that the children’s tongue constrictions are generally not as tight during sibilant production as those of adults. In the notation of the present essay, we expect the child’s scaled constriction cross-sectional area, A′_child, to be greater than the adult’s constriction cross-sectional area, A_adult:

$$A'_{\text{child}} = S^2 A_{\text{child}} > A_{\text{adult}} \qquad (6)$$

Because S > 1, it is possible for the inequality in Equation (6) to be true and still have A_child < A_adult. We cannot speak to that possibility at present. We believe that the relation expressed in Equation (6) is a consequence of the relation expressed in Equation (3). This cannot be shown with mathematical certainty, but we present a plausibility argument.
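Equation (6) scales areas by S² because area has units of length squared. The sketch below uses hypothetical constriction areas, chosen only for illustration and not taken from any measurement, to show how A′_child can exceed A_adult even when the unscaled A_child is smaller than A_adult. The function name is also hypothetical.

```python
def scaled_area(area_child, S):
    """A' = S**2 * A, Equation (6): area has units of length squared."""
    return S ** 2 * area_child


S = 1.5
a_adult = 0.10   # hypothetical adult constriction area, cm^2
a_child = 0.08   # hypothetical unscaled child constriction area, cm^2

print(scaled_area(a_child, S) > a_adult, a_child < a_adult)  # prints True True
# The scaled child area exceeds the adult's whenever a_child > a_adult / S**2.
print(round(a_adult / S ** 2, 4))  # 0.0444 cm^2
```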
We are not certain about the tongue surface curvatures near the tongue tip during sibilant production, because the constriction is formed by three-dimensional deformation of the tongue near the tip and can be affected by individual variation in palate morphology. However, we can consider aspects of tongue surface shape in a coronal plane near the tip. The doming of the child’s palate in coronal planes near the sibilant constriction is not as great as that of the adult’s (Hiki & Itoh, 1986). Again, we consider the child’s vocal tract scaled up by the factor S. The adult’s tongue and palate in a coronal plane near the maximum constriction might look like the sketch shown in Figure 9a. For the scaled child, with A′_child = A_adult and a shallower palate, the corresponding sketch would resemble the one shown in Figure 9b. The doming of the adult’s palate in the coronal plane gives it a large negative curvature, which in turn means that the tongue surface can have a positive curvature in this coronal plane and still form a constriction of small area, A_adult. What would be the case if the child were to attempt to attain A′_child = A_adult? Figure 9b shows that the tongue would need to form most of three sides of the constriction channel, because the negative curvature of the child’s palate in the coronal plane has a small magnitude. This, in turn, requires a rapid change in large-magnitude tongue surface curvature in the coronal plane, from positive to negative and back to positive. (We discount the possibility of a channel that is very narrow and long in the coronal plane, because the jet of air emanating from the constriction needs to be directed toward the incisors.) The large curvatures and rapid changes in curvature could well contradict the relations expressed in Equation (3) for a coronal plane near the tip of the tongue.
Here the tongue is braced at the sides for both the adult and the child, so larger values of these quantities might be attainable than in unbraced configurations. However, there must be limits, even in braced conditions. In the next section, issues of tongue surface curvature appear only as an aside. We will return to tongue surface curvature as a major factor in speech development in the final section.

Alveolar and velar stops

Fronting of velar consonants is a well-attested phenomenon in young children’s speech. We believe that this phenomenon, at least for the youngest children, is partly the result of a scaling mismatch between the physics of acoustic propagation and the physics of the noise source at the teeth during stop release. If the child produces adult-like speech, formant frequencies scale by S, the ratio of adult vocal tract length to child vocal tract length. However, the relation between acoustic sources and the resulting acoustics has more complicated scaling properties. For instance, if the length of an acoustic tube is scaled down by a factor of two, we would expect its resonance frequencies to increase by a factor of two. Acoustic noise source properties do not scale in this simple way. In fact, we examine an instance where the predominant noise source property, intensity in the higher-frequency bands, depends on the unscaled distance from the tongue tip constriction to the teeth. We do not expect S to have any effect on the intensity in the higher-frequency bands. (What constitutes the higher-frequency band does depend on S, but the intensity of sound in that band does not.) Thus, all else being equal, intensities in the higher-frequency bands are the same for the child and the adult if the distance from the tongue tip constriction to the teeth is the same. Here we concentrate on stressed syllable-initial voiceless velar and alveolar stops, which are aspirated in American English.
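The resonance half of this contrast can be illustrated with the simplest resonator. Assuming a uniform tube closed at one end and open at the other (quarter-wave resonances (2n − 1)c/4L) and a speed of sound of 35,000 cm/s, shortening the tube by S multiplies every resonance by exactly S. Nothing in this formula involves the noise source, so it says nothing about band intensities. The names and the speed of sound are assumptions.

```python
C = 35000.0  # cm/s, assumed speed of sound


def quarter_wave_resonances(length_cm, n=3, c=C):
    """First n resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, n + 1)]


S = 1.5
adult = quarter_wave_resonances(17.3)
child = quarter_wave_resonances(17.3 / S)
print([round(fc / fa, 6) for fc, fa in zip(child, adult)])  # [1.5, 1.5, 1.5]

half = quarter_wave_resonances(17.3 / 2)
print([round(fh / fa, 6) for fh, fa in zip(half, adult)])   # [2.0, 2.0, 2.0]
```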
First, we consider some empirical measures of the noise after the release bursts for some adults and children. We then argue that a young child’s intended velar noise source patterns do not resemble those of the adult, because the child faces the impossible task of matching the adult noise source pattern and the adult formant pattern at release simultaneously. The result is that the child produces an intended velar release with ambiguous acoustic cues for adult listeners. Another aspect of velar fronting is what has been termed “undifferentiated tongue gesture,” where the tongue covers a large area of the palate during closure. Tongue contact with the palate depends on tongue surface curvature. We do not address this aspect of velar fronting in this essay. However, the reader can imagine that limitations on tongue surface curvature can lead to relatively large areas of palate being covered by the tongue during stop closure. We performed a small study of how the noise spectra of voiceless, aspirated velar and alveolar stop releases evolve in time, with one adult male subject and one adult female subject. We used a multi-scale spectral analysis proposed by Stockwell, in which both the bandwidth of the analysis and its time resolution increase with frequency (Stockwell, 1996). The Stockwell analysis shares this property with wavelet analysis, but it is more easily interpreted in terms of standard Fourier analysis than wavelets are. (We thank M.T-T. Jackson for finding Stockwell’s method and computer code on the World Wide Web.) The results for the two adults producing velar and alveolar releases in four different vowel contexts are shown in Figures 10-13. These figures show the amplitude of different octave bands for about 5 ms before and 15 ms after the release of the consonant. The center frequencies of the bands are given in the legends of these figures.
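For readers unfamiliar with the Stockwell (S-) transform, here is a minimal sketch of the discrete version computed with FFTs, following the frequency-domain algorithm of Stockwell (1996) as we understand it. For each frequency bin n, the spectrum is shifted by n, multiplied by a Gaussian whose width is proportional to n, and inverse-transformed, so higher frequencies get wider bandwidth and finer time resolution. This is not the code referred to above. The use of NumPy, the function names, and the octave-band averaging (band edges at fc/√2 and fc·√2) are our assumptions.

```python
import numpy as np


def stockwell(h):
    """Discrete S-transform of a real signal h of length N.

    Returns a complex array of shape (N // 2 + 1, N): row n is frequency
    n * fs / N, column j is time sample j. Row 0 is the signal mean.
    """
    h = np.asarray(h, dtype=float)
    N = h.size
    H = np.fft.fft(h)
    m = np.fft.fftfreq(N) * N  # signed frequency offsets in FFT order
    S = np.empty((N // 2 + 1, N), dtype=complex)
    S[0, :] = h.mean()
    for n in range(1, N // 2 + 1):
        gauss = np.exp(-2.0 * np.pi ** 2 * m ** 2 / n ** 2)  # width grows with n
        S[n, :] = np.fft.ifft(np.roll(H, -n) * gauss)        # H[m + n] * G(m, n)
    return S


def octave_band_amplitude(S, fs, fc):
    """Mean |S| over bins in the octave [fc / sqrt(2), fc * sqrt(2)), per time sample."""
    freqs = np.arange(S.shape[0]) * fs / S.shape[1]
    band = (freqs >= fc / np.sqrt(2.0)) & (freqs < fc * np.sqrt(2.0))
    return np.abs(S[band, :]).mean(axis=0)


# Check on a pure tone: |S| at the tone's bin is half the amplitude at all times.
N, k0, amp = 256, 20, 2.0
tone = amp * np.cos(2 * np.pi * k0 * np.arange(N) / N)
St = stockwell(tone)
print(np.allclose(np.abs(St[k0]), amp / 2))  # True
```

A useful property of this construction is that summing each row over time recovers the ordinary DFT coefficient at that frequency, which is one way to see why the S-transform stays close to standard Fourier analysis.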
In general, the amplitude of the band centered at 4.4 kHz, represented by the thick, light gray line, remains high after the alveolar burst but not after the velar burst. For the front vowel contexts, /i/ and /æ/, the next higher band, centered at 8.9 kHz and represented by thick dashed lines, shows the same place-of-articulation difference, particularly for the male (Figures 10 and 11). The exceptions to these general trends are the female speaker’s velar releases in the front vowel contexts /i/ and /æ/. We cannot discount the possibility that her releases were somewhat fronted in these contexts. However, the amplitude of her 4.4 kHz band does not remain high during the entire 15 ms interval after the burst, indicating that multiple releases of the /k/ may have occurred. These results are perhaps not what one would expect, since velar releases are generally known to be slower than alveolar releases. We interpret the results as follows. The sustained strong 4.4 kHz and 8.9 kHz bands after alveolar release are due to the strong airflow from the glottis during aspiration being directed toward the teeth by the tongue blade as the tongue tip moves away from the palate. This creates a strong noise source at the teeth that is sustained for a relatively long time after the burst. Such directing of airflow toward the teeth does not occur after velar release, so the high intensities in the 4.4 kHz and 8.9 kHz bands are brief. What about the same analysis performed on the speech of young children? Figure 14 shows the same analysis applied to alveolar releases of a 30-month-old child. Because of the child’s shorter vocal tract, it is the band centered at 8.9 kHz that has the highest sustained amplitude. Figure 15 shows the acoustic evolution of three intended velar releases. These plots are part of a larger sample of plots that we have examined for six children from 30 to 48 months of age.
They indicate that young children’s intended velar releases have post-release noise characteristics that are more like those of alveolar releases. Because a young child’s vocal tract is shorter, the release for an intended velar would be closer to the teeth than the adult’s release, even if the child’s release were at the velum. The noise after release would then have more sustained intensity in the higher frequency bands. Further, we hypothesize that the young child’s intended velar releases are indeed fronted in many cases, which could make the noise source after release even more like that produced after an alveolar release. We now explore why children are likely to front their velar releases. There are at least two important perceptual cues to place of articulation for the types of stops we are considering: 1) the formant transitions from the release to the following vowel, and 2) the spectral composition of the noise after release as a function of time. We argue that, because of his or her anatomy, the young child cannot simultaneously produce both adult-like perceptual cues for a /k/ release. One of the expected features of the formant frequency trajectories at /k/ release is what has been termed the “F2-F3” pinch: the second and third formant frequencies appear to emerge near the same frequency at the time of an adult’s velar release. The explanation for this pinch is that the constriction at the velum produces a rear acoustic tube that is approximately twice the length of the front tube. The rear tube is approximately a half-wave resonator and the front tube a quarter-wave resonator, so they have nearly identical resonance frequencies, which are ascribed to F2 and F3. (The lowest resonance of a rear tube of length L_b, closed at both ends, is c/(2L_b), and that of a front tube of length L_f, closed at the constriction and open at the lips, is c/(4L_f), where c is the speed of sound; the two coincide when L_b = 2L_f.) The ratio of pharyngeal cavity length to oral cavity length is smaller for the young child than for the adult.
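The pinch condition can be checked with the same decoupled-tube reasoning. The sketch below assumes c = 35000 cm/s, ignores the length of the constriction and the acoustic coupling between the tubes, and adds a hypothetical helper, `forward_shift`, that reduces the pharyngeal-to-oral argument to one parameter: the ratio of tract length behind the velum to tract length in front of it (a ratio of 2 puts the pinch at the velum).

```python
C = 35000.0  # assumed speed of sound (cm/s)


def rear_half_wave(l_back, n=1):
    """n-th resonance (Hz) of the rear tube, closed at the glottis and at the constriction."""
    return n * C / (2.0 * l_back)


def front_quarter_wave(l_front, n=1):
    """n-th resonance (Hz) of the front tube, closed at the constriction, open at the lips."""
    return (2 * n - 1) * C / (4.0 * l_front)


def pinch_front_length(total_len):
    """Front-tube length at which the lowest rear and front resonances coincide.
    C / (2 * Lb) == C / (4 * Lf) gives Lb == 2 * Lf, and Lb + Lf == total_len."""
    return total_len / 3.0


def forward_shift(total_len, rear_to_front_at_velum):
    """Hypothetical: distance (cm) the closure must move forward of the velum for the
    front tube to reach the pinch length, given the ratio of tract length behind
    the velum to tract length in front of it. Zero when no forward shift is needed."""
    front_at_velum = total_len / (1.0 + rear_to_front_at_velum)
    return max(0.0, front_at_velum - pinch_front_length(total_len))


L = 17.5  # cm, hypothetical tract length
lf = pinch_front_length(L)
print(rear_half_wave(L - lf), front_quarter_wave(lf))  # both about 1500 Hz
```

With a ratio below 2, `forward_shift` is positive, and the pinch constriction always lies L/3 from the lips, which is a shorter absolute distance for a shorter vocal tract.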
Consequently, the child must place his or her constriction farther forward relative to the teeth than the adult does in order to produce the F2-F3 pinch at the release of an intended velar. Figure 16 illustrates this phenomenon. The top panel shows velar release for the adult, and the middle panel shows velar release for the young child. For the young child to attain the F2-F3 pinch, the tongue closure must move forward along the hard palate. Not only is the velum closer to the teeth for the child, but the child also has a reason to move the intended /k/ constriction even closer to the teeth. It is plausible that the child’s /k/ release occurs at a distance from the teeth comparable to that of the adult’s /t/ release. This diminished distance has substantial consequences for the other acoustic cue: the spectral properties of the noise source as a function of time. Roughly speaking, moving a tongue constriction substantially closer to the teeth can produce sibilance where there would otherwise be none. We conclude that what adults hear as a fronted velar release is indeed fronted. However, although the noise source indicates a frontal release, further investigation may show that the formant trajectories contain the F2-F3 pinch expected of an intended velar.

Back vowels

We return to issues of tongue surface curvature in relation to the curvature of the outer vocal tract wall in this most speculative section. Here we address the large variability observed in the acoustics of children’s back vowel production. McGowan, McGowan, Denny, and Nittrouer (2014) observed that, for young children aged 30 to 48 months, back vowel formant frequencies were the most variable of all the vowels. This variability appeared in the vowels’ mean positions in the F1-F2 plane as a function of age, measured about every six months (see their Figure 6).
In other words, across the ages studied, the back vowels appear to be acoustically less stable than the other vowels. In a study of synthetic speech of a four-year-old child, McGowan (2006) showed that the vowel pronunciations preferred by adult listeners had longer rear tubes relative to front tubes than the comparable adult pronunciations. (Note that the rear tube for /ɑ/ is a constriction, and for /u/ it is an expansion.) This indicates that constrictions and expansions in the rear of the vocal tract are less spatially localized for children than for adults. This relates to the maximum spatial rate of change of tongue surface curvature, κ̇, because a rapid spatial transition between a constriction and an expansion requires tongue curvature to change rapidly along the length of the tongue. According to the hypothesis expressed in the second inequality in Equation (3), applied to the rear surface of the tongue, we expect the scaled child’s sagittal tongue surface curvature to change less rapidly than the adult’s along much of the tongue body. Could there be a relation between the decreased localization of acoustic tubes and the observed variability of back vowels at different ages? Of course, the developing vocal tract changes its morphology rapidly with age (Denny & McGowan, 2012b), and this alone could cause substantial acoustic variability. However, we want to determine whether differences between the child’s and the adult’s tongue surface curvature, coupled with outer vocal tract wall curvature, could contribute to the observed acoustic variability. We assume that the net effect of limitations on tongue surface curvature is similar to that of limitations on the curvature of the area function in the rear of the vocal tract.
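To make κ and κ̇ concrete, the sketch below estimates signed curvature and its arc-length derivative for a sampled planar contour, such as a traced sagittal tongue surface. It uses central differences with respect to the sample index and assumes roughly even sampling; this numerical scheme is our choice for illustration, not a method from the studies cited here.

```python
import math


def arc_lengths(points):
    """Cumulative chord length s at each sample of a planar curve."""
    s = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        s.append(s[-1] + math.hypot(x1 - x0, y1 - y0))
    return s


def curvature(points):
    """Signed curvature (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) at samples 1 .. n-2,
    with derivatives taken by central differences in the sample index."""
    kappa = []
    for i in range(1, len(points) - 1):
        (xa, ya), (xb, yb), (xc, yc) = points[i - 1], points[i], points[i + 1]
        dx, dy = (xc - xa) / 2.0, (yc - ya) / 2.0
        ddx, ddy = xc - 2.0 * xb + xa, yc - 2.0 * yb + ya
        kappa.append((dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5)
    return kappa


def curvature_rate(points):
    """Central-difference estimate of d(kappa)/ds at samples 2 .. n-3."""
    s = arc_lengths(points)
    k = curvature(points)  # k[j] belongs to sample j + 1
    return [(k[j + 1] - k[j - 1]) / (s[j + 2] - s[j]) for j in range(1, len(k) - 1)]
```

For a circle of radius R the estimates approach κ = 1/R and κ̇ = 0. Scaling a contour by a factor S divides κ by S and κ̇ by S², which matters when a child’s tongue, scaled up by S, is compared with an adult’s.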
We use a simulation to show that the observed variability of children’s back vowel acoustics with age could be due to limitations on the maximum spatial rate of change of tongue surface curvature. In this section we work with the child’s vocal tract scaled up by the factor S. We examine area functions that have a constant area in the front and in the rear of the vocal tract. These are regions of zero area function curvature. (In these regions, the tongue curvature matches the curvature of the outer vocal tract wall in magnitude but has the opposite sign.) The length of the back region of zero curvature of the adult is assumed to be 2/3 that of the scaled child. The two regions of zero area function curvature are joined by a region of positive curvature. This connecting region has an axial length of 1/4 of the vocal tract length for the adult and 3/8 of the scaled vocal tract length for the child. These differences in length are consistent with the idea that the child’s maximum scaled rates of change of curvature are lower than those of the adult. Figure 17a shows an area function for the adult, and Figure 17b shows an area function for the child. The child’s and adult’s area functions have the same basic shape, appropriate for a low back vowel. Variability in the area function was simulated by changing the length of the rear constriction from a minimum length to twice the minimum length in 10% increments. The cross-sectional area of the constriction was also varied from 0.3 cm² to 0.7 cm², so that every constriction area was combined with every rear constriction length. Figure 18a shows the extreme positions for the adult, and Figure 18b shows those for the child.
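A simulation of this kind can be sketched with a lossless chain-matrix (transmission-line) model of concatenated uniform tube sections. This is our illustration, not the simulation code behind the results that follow: the speed of sound, the section length, the quadratic connecting ramp (chosen because its area-function curvature is positive throughout), the fixed total length (the front region absorbs changes in constriction length), and all numerical parameters are assumptions, and radiation and wall losses are ignored.

```python
import math

C = 35000.0  # assumed speed of sound (cm/s)


def formants(areas, dx, fmax=4000.0, df=5.0):
    """Resonances (Hz) below fmax of a lossless tube of uniform sections, each dx cm long.

    areas[0] is the glottis end (closed) and areas[-1] the lip end (ideal open
    end, zero pressure).
    """
    def d_element(f):
        # Chain matrix [[a, j*b], [j*c, d]] from glottis to lips with rho*c = 1
        # (rho*c cancels in d). With zero lip pressure, the glottal volume
        # velocity equals d, so a closed glottis gives resonances at the zeros of d.
        k = 2.0 * math.pi * f / C
        cs, sn = math.cos(k * dx), math.sin(k * dx)
        a, b, c, d = 1.0, 0.0, 0.0, 1.0
        for area in areas:
            b2, c2 = sn / area, area * sn
            a, b, c, d = a * cs - b * c2, a * b2 + b * cs, c * cs + d * c2, d * cs - c * b2
        return d

    found, f_prev, d_prev = [], 0.0, 1.0  # d(0) == 1
    for i in range(1, int(fmax / df) + 1):
        f = i * df
        d_cur = d_element(f)
        if d_cur == 0.0:
            found.append(f)
        elif d_prev * d_cur < 0.0:  # sign change brackets a resonance
            lo, hi, d_lo = f_prev, f, d_prev
            for _ in range(40):  # bisection
                mid = 0.5 * (lo + hi)
                d_mid = d_element(mid)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo, d_lo = mid, d_mid
            found.append(0.5 * (lo + hi))
        f_prev, d_prev = f, d_cur
    return found


def back_vowel_area(total_len, back_len, ramp_len, a_back, a_front, dx):
    """Hypothetical area function, glottis first: constant rear constriction,
    a quadratic ramp (positive curvature), then a constant front region."""
    if back_len + ramp_len > total_len:
        raise ValueError("back region and ramp do not fit in the tract")
    areas = []
    for i in range(round(total_len / dx)):
        x = (i + 0.5) * dx  # section midpoint, measured from the glottis
        if x < back_len:
            areas.append(a_back)
        elif x < back_len + ramp_len:
            t = (x - back_len) / ramp_len
            areas.append(a_back + (a_front - a_back) * t * t)
        else:
            areas.append(a_front)
    return areas


def sweep(total_len, min_back, ramp_len, a_front, constriction_areas, dx=0.5):
    """Vary the rear constriction length from min_back to 2 * min_back in 10 % steps,
    pairing every length with every constriction area. Returns
    [(mean, range, range as % of mean) for F1, (...) for F2]."""
    f1s, f2s = [], []
    for step in range(11):
        back_len = min_back * (1.0 + 0.1 * step)
        for a_c in constriction_areas:
            fs = formants(back_vowel_area(total_len, back_len, ramp_len, a_c, a_front, dx), dx)
            f1s.append(fs[0])
            f2s.append(fs[1])
    summary = []
    for values in (f1s, f2s):
        mean = sum(values) / len(values)
        spread = max(values) - min(values)
        summary.append((mean, spread, 100.0 * spread / mean))
    return summary
```

An adult-like case would pass a connecting length of L/4 and the scaled child L·3/8; the tract length, minimum constriction length, front area, and constriction-area steps are placeholders, so the output will not reproduce the values reported below unless the area functions of Figure 17 are matched.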
For the adult, the mean F1 and F2 across all the perturbations were 677 Hz and 826 Hz, respectively; the range of F1 was 141 Hz, or 21% of the mean, and the range of F2 was 139 Hz, or 17%. For the child, the mean F1 and F2 across all the perturbations were 723 Hz and 1040 Hz, respectively. The range of F1 for the child was 160 Hz, or 22%, and the range of F2 was 296 Hz, or 28%. Thus, while the child’s F1 range was comparable to the adult’s, the child’s F2 range was substantially larger. We believe that non-back vowels behave differently from back vowels because the rear portion of the tongue body is more limited in terms of surface curvature. In particular, the child is less limited by maximum curvatures and spatial rates of change of curvature in the production of non-back vowels.

Conclusion

In this essay we have concentrated on physical and geometric factors that could limit a child’s ability to produce adult-like speech in certain instances. Much recent research has focused on dynamic factors in young speakers’ articulation. We believe that both geometric and dynamic factors play a role in speakers’ maturation. We have relied on the unproven idea that highly curved tongue surfaces and high spatial rates of change of curvature are motorically difficult. This idea needs to be investigated. So why do some children, who have apparently grown out of physical and geometric limitations, continue to behave as they did months previously? We do not know. However, one possibility is that some children get along well enough in their limited community with whatever subphonemic distinctions they possess and, further, are naturally conservative. These children, for one reason or another, are not motivated to go through periods of high variability in the speech motor acts that are necessary to attain new, more adult-like pronunciations. This merits investigation, if it is not already understood.
Acknowledgments

We thank Margaret Denny and Rebecca McGowan for improving this essay. This essay is dedicated to all present and past employees of CReSS LLC.

References

Denny, M. and McGowan, R.S. (2012a). Implications of peripheral muscular and anatomical development for the acquisition of lingual control of speech production: a review. Folia Phoniatrica et Logopaedica, 64, 105-15.
Denny, M. and McGowan, R.S. (2012b). Sagittal area of the vocal tract in young female children. Folia Phoniatrica et Logopaedica, 64, 297-303.
Hiki, S. and Ito, H. (1986). Influence of palate shape on lingual articulation. Speech Communication, 5, 141-58.
McGowan, R.S. (2006). Perception of synthetic vowel exemplars of 4 year old children and estimation of their corresponding vocal tract shapes. J. Acoust. Soc. Am., 120, 2850-58.
McGowan, R.S. (ongoing). Lectures in Acoustics for the Speech Sciences. CReSS LLC, Lexington, MA. (www.cressllc.net).
McGowan, R.S., Denny, M., and Jackson, M.T-T. (2011). Alveolar and velar stop releases during speech development. J. Acoust. Soc. Am., 129, 2597 (abstract).
McGowan, R.S. and Nittrouer, S. (1988). Differences in fricative production between children and adults: evidence from an acoustic analysis of /ʃ/ and /s/. J. Acoust. Soc. Am., 83, 229-36.
McGowan, R.S., Nittrouer, S., and Manning, C.J. (2004). Development of [ɹ] in young, Midwestern, American children. J. Acoust. Soc. Am., 115, 871-84.
McGowan, R.W., McGowan, R.S., Denny, M., and Nittrouer, S. (2014). A longitudinal study of children’s vowel production. J. Speech Lang. Hear. Res., 57, 1-15.
Stockwell, R.G., Mansinha, L., and Lowe, R.P. (1996). Localization of the complex spectrum: the S transform. IEEE Trans. Sig. Proc., 44, 998-1001.
Figure 5: Area functions, cross-sectional area (cm²) versus axis position (cm), lips marked. Formant frequencies of the plotted area functions:
F1 (Hz)   F2 (Hz)   F3 (Hz)
316       910       1975
317       910       1909
318       911       1854
320       912       1802
321       913       1749
Figures 10-15: octave-band amplitudes before and after stop release.
Figure 17: area functions, (a) adult and (b) child; areas (cm²) versus tube axis position (cm), lips marked.
Figure 18: extreme area functions, (a) adult and (b) child; areas (cm²) versus tube axis position (cm).