Frequency of Stress Patterns in English: A Computational Analysis
Cynthia G. Clopper
Indiana University
Abstract
Word reduction by weak syllable omission is a fairly common
phenomenon in several populations, including normally developing
children, children with Specific Language Impairment, and adults with
aphasia. A recent study (Carter & Clopper, submitted) has shown that
normal adults reduce words using strategies similar to those used by the
above-mentioned populations and that these strategies vary to some degree
with the number of syllables and the location of the primary stress of the
target word. A computational analysis of several frequency measures was
conducted on the entire set of English words contained in the Hoosier
Mental Lexicon (Luce & Pisoni, 1998) that follow nine different patterns
of syllable number and primary stress location (syllable-stress patterns).
The frequency measures considered were total word count within each
syllable-stress pattern, sum frequency of all words within each syllable-stress pattern, and mean, median, and mode frequencies of all words within
each syllable-stress pattern. Results indicated that words in English are
not evenly distributed across syllable-stress patterns. Specifically, shorter
words tend to be more frequent in terms of word count, sum frequency,
and mean frequency than longer words. Additionally, some syllable-stress
patterns, such as three-syllable and four-syllable words with stress on the
final syllable, are much less frequent than other words with the same
number of syllables. The implications of these results include possible
explanations for the different strategies found in word reduction studies for
different syllable-stress patterns.
Introduction
Lexical stress assignment in English is free, in that stress can be assigned to a
syllable in any position in a word. Every content word in English contains a single
syllable bearing primary stress and, optionally, other syllables bearing secondary stress.
In general, stress is assigned from right to left to form trochaic (Strong-weak) feet,
resulting in an alternating stress pattern in multisyllabic words, such as ánecdòte¹ and
Mìnnesóta (Hammond, 1999; Hayes, 1982).
Despite the fact that stress can be assigned to any syllable in a multisyllabic word
in English, there are strong tendencies for stress to occur in certain positions more than
others. For example, Cutler and Carter (1987) reported that in a corpus of over 20,000
English words, 90% of the content words began with a stressed syllable. Cutler and
Norris (1988) provided experimental evidence that naïve listeners are sensitive to the
tendency for English words to begin with a stressed syllable. Listeners were presented
with disyllabic nonsense words and were asked to identify monosyllabic real words
embedded in them. Cutler and Norris found that listeners could more easily identify the
real words when they were embedded in Strong-weak nonsense words (e.g., míntesh) than
in Strong-Strong nonsense words (e.g., míntàyve). Cutler and Norris argued that these
results reveal the tendency for listeners to identify word onsets with stressed syllables.
The present computational analysis was carried out to further explore the
frequency of different stress patterns in English. In particular, we examined the
differences between various combinations of syllable number and primary stress location
(syllable-stress patterns) in English in terms of frequency of occurrence. Nine
different syllable-stress patterns were selected for investigation and several measures
of frequency were calculated based on the lexical frequency² data provided by Kucera and
Francis (1967) from the Brown Corpus. The results of the present analysis may
contribute to our understanding of the role of stress in speech perception by directly
comparing the relative frequency of occurrence of various syllable-stress patterns in
English.
Given the established role of lexical frequency in spoken word recognition (Luce &
Pisoni, 1998), syllable-stress pattern frequency might also be expected to impact some
linguistic tasks where stress and word length are important. For example, Carter and
Clopper (submitted) conducted a study of word reduction behavior under laboratory
conditions in normal adults and found that reduction strategies were related to the
syllable-stress patterns of the stimuli. Carter and Clopper presented Indiana University
undergraduates with a list of words auditorily and asked them to reduce each word in
some way. Subjects were given examples such as “Hippo is a reduction of
hippopotamus,” but most of the 160 stimulus items were not normally reduced in
everyday speech (e.g., tàpióca, ártichòke, and máple). However, all of the words in the
stimulus set were highly familiar, based on familiarity ratings from another set of Indiana
University undergraduates (Nusbaum, Pisoni, & Davis, 1984).
¹ Primary stress is marked with an acute accent (´) and secondary stress is marked with a grave accent (`)
throughout this manuscript.
² Lexical frequency is defined as the number of occurrences of a given word per million words. In this
case, the written Brown Corpus was used as the basis for the occurrences per million count (Kucera &
Francis, 1967).
Studies on word reduction behavior in several populations, including normally
developing children, children with Specific Language Impairment, and adults with aphasia,
have shown that these populations often reduce words in normal language situations
(Carter, 1999; Gerken, 1996; Smith, 1973). For example, banána might be reduced to
nána or giráffe to ráffe in the continuous speech of any of these three populations.
Studies of syllable omission have revealed a set of strategies that recur in reduction
processes across the different populations: unstressed syllables are omitted in reductions
more often than stressed syllables (Allen & Hawkins, 1980; Carter, 1999; Gerken, 1996)
and prominent syllables such as initial and final syllables are retained in reduced forms
more often than omitted (Carter, 1999; Echols & Newport, 1992; Kehoe, 1999).
The results reported by Carter and Clopper (submitted) indicated that normal
adults tend to employ the same strategies as other populations when reducing words.
These strategies include retaining the primary stressed syllable in the reduced form;
reducing the word to a good prosodic foot (i.e., either a single stressed syllable or a
disyllabic trochee; Hammond, 1999); and retaining other perceptually and cognitively
prominent syllables in the reduction, such as initial syllables (Brown & McNeill, 1966;
Grosjean, 1980; Taft, 1979). However, the results did suggest systematic differences
within the normal adult population in reduction strategies based on syllable-stress
patterns. For example, primary stressed syllables were preserved in the reduced forms
more often than not for all syllable-stress patterns except three syllable words with
primary stress on the final syllable (e.g., càbarét). In addition, word reductions took the
form of a good prosodic foot for all but the four syllable words with primary stress on
the second syllable (e.g., aquárium). The results of the present computational analysis
may provide some new insights into the strategies used in word reduction tasks. In
particular, unexpected reduction strategies may be explained by the relatively high or
relatively low frequency of a given syllable-stress pattern.
Purpose
The main purpose of this study was to explore the possible interaction between
syllable-stress patterns and lexical frequency in English. Specifically, an examination of
the number of different words that follow each pattern (Word Count) will reveal which
patterns are more common in the English lexicon when all words are weighted equally.
The Sum Frequency of all words within each pattern will reveal the frequency with which
each pattern is encountered. This measure of Sum Frequency is equivalent to the
frequency of occurrence of the syllable-stress pattern per million words. Finally,
measures of average frequency (mean, median, and mode) of all words within a single
pattern will reveal the “typical” frequency of occurrence of words with a given pattern.
These average frequency measures allow us to compare the syllable-stress patterns in
terms of the kinds of words that are contained in each group.
Intuitively, it seems that some syllable-stress patterns are more common than
others. To the extent that this is true, these trends should be evident in all of the
frequency measures computed in this study. In terms of overall word
length, there should be a general trend for shorter words to have higher frequency than
longer words. Both in terms of Word Count and Sum Frequency, we expect to find that
the longer words are less common than shorter words.
In terms of specific syllable-stress patterns, we expect that primary stress on the
first syllable will be more common than primary stress on the final syllable for two and
three syllable words, given the rules of trochaic stress assignment in English (Hammond,
1999; Hayes, 1982). Many disyllabic verbs in English appear with primary stress on the
second (and final) syllable, however, so there might be some attenuation of the frequency
difference between the words with initial syllable and final syllable stress for two syllable
words. Also based on the rules of English stress assignment, primary stress should be
found more frequently on the second or third syllable of four syllable words than on the
first or fourth syllable (Hammond, 1999).
Methods and Procedures
A series of searches was conducted of the Hoosier Mental Lexicon (HML; Luce &
Pisoni, 1998), an online version of Webster’s Pocket Dictionary that includes
orthographic and phonetic transcriptions (including syllable and stress markings) of
20,000 words, as well as lexical frequency (Kucera & Francis, 1967) for each entry. The
searches located and extracted all of the words in the HML that fell into one of nine
different syllable-stress patterns. These nine patterns are shown in Table 1.
Syll-Stress Pattern³   Number of Syllables   Primary Stress Location
2syl-1pri              2                     1st
2syl-2pri              2                     2nd
3syl-1pri              3                     1st
3syl-2pri              3                     2nd
3syl-3pri              3                     3rd
4syl-1pri              4                     1st
4syl-2pri              4                     2nd
4syl-3pri              4                     3rd
4syl-4pri              4                     4th
Table 1. Syllable-stress patterns searched in the HML.
³ The notation used in the first column of Table 1 indicates the number of syllables (e.g., 2syl, 3syl, or 4syl) and the
location of the primary stressed syllable (e.g., 1pri, 2pri, 3pri, or 4pri).
Each of the nine lists of words was then analyzed in several ways to obtain
frequency information for each of the syllable-stress patterns. First, the total number of
words in each syllable-stress pattern was tallied (Word Count) to provide some indication
of how many different words in the HML followed each of the syllable-stress patterns.
Then, the frequency counts for all of the words in each syllable-stress pattern were
summed (Sum Frequency) to provide some indication of how frequently a given stress
pattern is encountered per million words. Finally, the mean, median, and mode
frequencies for each syllable-stress pattern were calculated to provide several descriptive
measures of the average⁴ frequency of words with a given syllable-stress pattern. The
Mean Frequency is equal to the Sum Frequency divided by the Word Count for each
syllable-stress pattern. Along with the Median and Mode Frequencies, the Mean
Frequency represents the central tendency of the frequency of occurrence of words
within a given syllable-stress pattern.
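The five measures described above can be illustrated with a minimal Python sketch. It is not the procedure actually run against the HML: the entries below are hypothetical (pattern, frequency) pairs invented for illustration, the function name summarize is an assumption, and ties for the mode are resolved by taking the smallest value, which is also an assumption.

```python
from collections import defaultdict
from statistics import median, multimode

# Hypothetical (syllable-stress pattern, frequency per million) entries.
# These are invented for illustration and are NOT taken from the HML.
entries = [
    ("2syl-1pri", 1), ("2syl-1pri", 1), ("2syl-1pri", 40),
    ("2syl-2pri", 1), ("2syl-2pri", 12),
    ("3syl-3pri", 1), ("3syl-3pri", 1), ("3syl-3pri", 2), ("3syl-3pri", 4),
]


def summarize(entries):
    """Compute the five frequency measures for each syllable-stress pattern."""
    by_pattern = defaultdict(list)
    for pattern, freq in entries:
        by_pattern[pattern].append(freq)

    summary = {}
    for pattern, freqs in by_pattern.items():
        summary[pattern] = {
            "word_count": len(freqs),                   # Word Count
            "sum_frequency": sum(freqs),                # Sum Frequency
            "mean_frequency": sum(freqs) / len(freqs),  # Sum Frequency / Word Count
            "median_frequency": median(freqs),
            # Assumption: when several values tie for the mode, report the smallest.
            "mode_frequency": min(multimode(freqs)),
        }
    return summary


for pattern, measures in summarize(entries).items():
    print(pattern, measures)
```

Grouping the entries by pattern first keeps each measure a one-line computation over a single list of frequencies.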
Results
The main results of this computational analysis are shown in Table 2. Word
Count (the total number of different words in the HML that follow each of the nine
syllable-stress patterns) is shown in the second column. It is clear from this column that
words in English are not evenly distributed in terms of syllable-stress pattern. In just the
nine patterns considered here, the range of Word Counts spans two orders of magnitude
with the smallest count (37) for four syllable words with primary stress on the final
syllable and the largest (3624) for two syllable words with primary stress on the first
syllable.
Syll-Stress   Word    Sum         Mean        Median      Mode
Pattern       Count   Frequency   Frequency   Frequency   Frequency
2syl-1pri     3624    67693       18.68       1.00        1.00
2syl-2pri      995    19881       19.98       1.00        1.00
3syl-1pri     2619    24558        9.38       1.00        1.00
3syl-2pri     1510    15278       10.12       1.00        1.00
3syl-3pri      369     1398        3.79       1.00        1.00
4syl-1pri      497     3549        7.14       1.00        1.00
4syl-2pri     1331     9014        6.77       1.00        1.00
4syl-3pri     1017     6831        6.72       1.00        1.00
4syl-4pri       37       97        2.62       1.00        1.00
Table 2. Frequency information calculated for each of the nine syllable-stress patterns.
The Sum Frequency (the frequency of each of the nine syllable-stress patterns per
million words) is shown in the third column. This column reveals that some syllable-stress patterns are more frequent than others in English. In the nine patterns considered
here, the range of frequency of syllable-stress patterns spans three orders of magnitude
with the smallest frequency for four syllable words with primary stress on the final
syllable (97) and the greatest for two syllable words with primary stress on the first
syllable (67693).
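The spans quoted for Word Count and Sum Frequency can be checked with a short calculation. The sketch below takes the extreme values from Table 2 and expresses each range as the base-10 logarithm of the ratio of largest to smallest value. The function name orders_of_magnitude is an assumption introduced here.

```python
import math

# Extreme values copied from Table 2:
# 2syl-1pri has the largest values, 4syl-4pri the smallest.
LARGEST = {"word_count": 3624, "sum_frequency": 67693}
SMALLEST = {"word_count": 37, "sum_frequency": 97}


def orders_of_magnitude(largest, smallest):
    """Express a range as log10 of the ratio of its extremes."""
    return math.log10(largest / smallest)


word_count_span = orders_of_magnitude(LARGEST["word_count"], SMALLEST["word_count"])
sum_frequency_span = orders_of_magnitude(LARGEST["sum_frequency"], SMALLEST["sum_frequency"])

print(f"Word Count span: {word_count_span:.2f} orders of magnitude")
print(f"Sum Frequency span: {sum_frequency_span:.2f} orders of magnitude")
```

The ratios are roughly 98 to 1 for Word Count and roughly 698 to 1 for Sum Frequency, so the two ranges fall just short of two and three full orders of magnitude, respectively.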
⁴ Average is used here in the technical sense encompassing mean, median, and mode.
The Mean Frequency (Sum Frequency divided by Word Count) of words in each
syllable-stress pattern is shown in the fourth column. These data are also presented in
Figure 1. It is clear from this figure that words in different syllable-stress patterns are not
equivalent when it comes to their Mean Frequency. There is a general trend for shorter
words to be more frequent than longer words. However, three syllable words with
primary stress on the final syllable are, on average, less frequent than longer, four syllable
words with primary stress on any of the first three syllables. In addition, while three
syllable words with primary stress on either the first or second syllable are relatively
similar in their mean frequency, three syllable words with primary stress on the final
syllable are much less frequent than other three syllable words. Similarly, four syllable
words with primary stress on the fourth syllable are less frequent, on average, than four
syllable words with primary stress on any of the first three syllables.
[Figure 1: Mean Frequency (y-axis, 0.00 to 25.00) for each Syll-Stress Pattern (x-axis, 2syl-1pri through 4syl-4pri).]
Figure 1. Mean frequency of occurrence for words in nine different syllable-stress
patterns.
The Median and Mode Frequencies are shown in the last two columns in Table 2.
It is interesting to note that in both cases, all values are equal to 1.00. That is, at least half
of all words in each syllable-stress pattern have a frequency of 1 occurrence per million
(Median Frequency) and the most common frequency of the words in each of the nine
syllable-stress patterns is 1 (Mode Frequency). Even for the two syllable words with
primary stress on the first syllable pattern, which occurs most frequently in terms of
Word Count and Sum Frequency, at least half of the words in the group have a lexical
frequency of 1 per million.
The results confirm the predictions about the relative frequency of words of
different lengths. In particular, shorter words are more frequent in terms of Word Count,
Sum Frequency, and Mean Frequency than longer words. Within the different syllable
number categories, the expected results were also found. For two and three syllable
words, the Word Count and the Sum Frequency of words with stress on the first syllable
are greater than the Word Count and the Sum Frequency, respectively, of words with
stress on the second or, for three syllable words, third syllable. In addition, for four
syllable words, primary stress on the second or third syllable is more common than
primary stress on the first or final syllable, both in terms of Word Count and Sum
Frequency.
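The comparisons summarized above can be reproduced directly from the Word Count and Sum Frequency columns of Table 2. The following sketch is a check on the reported values rather than a reanalysis of the HML. The helper names exceeds and totals_by_length are assumptions introduced for illustration.

```python
# (Word Count, Sum Frequency) for each pattern, copied from Table 2.
TABLE2 = {
    "2syl-1pri": (3624, 67693),
    "2syl-2pri": (995, 19881),
    "3syl-1pri": (2619, 24558),
    "3syl-2pri": (1510, 15278),
    "3syl-3pri": (369, 1398),
    "4syl-1pri": (497, 3549),
    "4syl-2pri": (1331, 9014),
    "4syl-3pri": (1017, 6831),
    "4syl-4pri": (37, 97),
}


def exceeds(a, b):
    """True if pattern a is larger than pattern b on BOTH measures."""
    return all(x > y for x, y in zip(TABLE2[a], TABLE2[b]))


def totals_by_length():
    """Add up Word Count and Sum Frequency over patterns with the same syllable count."""
    totals = {}
    for pattern, (count, freq) in TABLE2.items():
        n_syllables = int(pattern[0])  # labels start with the syllable count, e.g. "3syl-2pri"
        c, f = totals.get(n_syllables, (0, 0))
        totals[n_syllables] = (c + count, f + freq)
    return totals


predictions = {
    "2syl: 1st > 2nd": exceeds("2syl-1pri", "2syl-2pri"),
    "3syl: 1st > 2nd": exceeds("3syl-1pri", "3syl-2pri"),
    "3syl: 1st > 3rd": exceeds("3syl-1pri", "3syl-3pri"),
    "4syl: 2nd > 1st": exceeds("4syl-2pri", "4syl-1pri"),
    "4syl: 2nd > 4th": exceeds("4syl-2pri", "4syl-4pri"),
    "4syl: 3rd > 1st": exceeds("4syl-3pri", "4syl-1pri"),
    "4syl: 3rd > 4th": exceeds("4syl-3pri", "4syl-4pri"),
}

for label, holds in predictions.items():
    print(label, holds)
print(totals_by_length())
```

Every comparison holds on both measures. Added up within each word length, the totals fall from 4619 words (Sum Frequency 87574) for two syllable words to 4498 (41234) for three syllable words and 2882 (19491) for four syllable words.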
Discussion
The results of this computational analysis demonstrate that some syllable-stress
patterns are more frequent than others in English. That is, some of these syllable-stress
patterns are encountered more frequently by native speakers of English than others.
Specifically, longer words are less common than shorter words in general and three and
four syllable words with primary stress on the final syllable are less common than other
words of the same length.
One of the major implications of these findings is their explanatory power for the
word reduction study reported by Carter and Clopper (submitted). The results in Carter
and Clopper suggested that native speakers of English are sensitive to the prominence of
primary stress because they rarely omit stressed syllables in their reductions. The mean
retention rate of the primary stressed syllable from the original stimulus in the reduction
response across all syllable-stress patterns was 67%. However, for the three syllable
words with final syllable primary stress, the final syllable was retained in reductions only
37% of the time. The rarity of this syllable-stress pattern (Sum Frequency = 1398)
relative to the other patterns with the same number of syllables (Sum Frequency = 24558
for first syllable stress and Sum Frequency = 15278 for second syllable stress) might be
one reason why there was a change in reduction strategy for this group of words.
The four syllable pattern with primary stress on the second syllable was found to
be more common than the other four syllable patterns in terms of Word Count and Sum
Frequency. Carter and Clopper (submitted) found that their participants’ reductions
took the form of a good foot (either a monosyllabic foot or a disyllabic trochaic foot) in
83% of reduction responses over all stimulus patterns. However, for the four syllable
words with primary stress on the second syllable, the reductions were in the form of
other prosodic patterns, such as disyllabic iambs or trisyllables. The mean proportion of
good foot reductions for this pattern was only 37%. The relatively high frequency of this
stimulus pattern might be one cause of this change in reduction strategy for this group of
words.
Unexpectedly, the Median and Mode Frequencies for all nine syllable-stress
patterns were equal to 1.00. These results reflect the distribution of the lexicon: the vast
majority of English words have a lexical frequency of only 1 per million words. An
interesting follow-up analysis would be to consider the median and mode frequencies of
the words in each of the nine syllable-stress patterns after removing all words with a
lexical frequency of 1. Such an analysis might be expected to reveal a pattern of
results similar to that of the Mean Frequency measure taken in this study.
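The proposed follow-up could be sketched as follows. This analysis was not carried out in the present study, so the frequencies below are hypothetical values chosen only to show the computation, and the function name central_tendency_without_ones is an assumption.

```python
from statistics import median, multimode


def central_tendency_without_ones(freqs):
    """Median and mode(s) of the frequencies after dropping words with frequency 1."""
    kept = [f for f in freqs if f != 1]
    if not kept:
        return None  # every word in the pattern had a frequency of 1
    return {"n": len(kept), "median": median(kept), "modes": multimode(kept)}


# Hypothetical frequencies for one syllable-stress pattern (NOT HML data).
hypothetical_freqs = [1, 1, 1, 1, 2, 2, 5, 9, 40]

print(central_tendency_without_ones(hypothetical_freqs))
```

All tied modes are returned as a list, since a filtered distribution with few repeated values may not have a single most common frequency.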
The important role of lexical frequency in spoken word recognition has been
understood for some time (see Luce & Pisoni, 1998 for a review). Cutler and her
colleagues have been arguing for more than a decade for consideration of the role of stress
in speech perception and spoken word recognition (Cutler, 1990; Cutler & Carter, 1987;
Cutler & Norris, 1988). The results of the present computational analysis on the
frequency of syllable-stress patterns and their explanatory value for word recognition
research provide further evidence for the important roles that frequency and stress play in
human spoken language processing. In particular, relatively rare and relatively common
stress patterns for words of a given length seem to elicit different reduction strategies in
the laboratory task with adults than the general reduction strategies seen for words with a
more typical syllable-stress pattern.
Conclusion
The results of this computational analysis of syllable-stress patterns in English
confirmed our intuitions about the most common stress patterns of words with a given
number of syllables. In particular, two and three syllable words are more likely to have
primary stress on the first syllable than on any other syllable. Four syllable words,
however, are more likely to have primary stress on the second or third syllable than on
the initial or final syllable. In addition, the results have implications for the study of such
phenomena as word reduction because they reveal the inherent imbalance in the
distribution of stress over syllables in multisyllabic words in English.
References
Allen, G. & Hawkins, S. (1980). Phonological rhythm: definition and development. In G.
Yeni-Komshian, J. Kavanaugh, & C. Ferguson (Eds.), Child Phonology, Volume I:
Production (pp. 227-256). New York: Academic Press.
Brown, R. & McNeill, D. (1966). The ‘tip-of-the-tongue’ phenomenon. Journal of
Verbal Learning and Verbal Behavior, 5, 325-337.
Carter, A. (1999). An Integrated Acoustic and Phonological Investigation of Weak Syllable
Omissions. Doctoral dissertation. University of Arizona.
Carter, A. K. & Clopper, C. G. (submitted). Prosodic effects on word reduction.
Language and Speech.
Cutler, A. (1990). Exploiting prosodic probabilities in speech segmentation. In G. T. M.
Altmann (Ed.), Cognitive Models of Speech Processing: Psycholinguistic and
Computational Perspectives (pp. 105-121). Cambridge, MA: MIT Press.
Cutler, A. & Carter, D. M. (1987). The predominance of strong initial syllables in the
English vocabulary. Computer Speech and Language, 2, 133-142.
Cutler, A. & Norris, D. (1988). The role of strong syllables in segmentation for lexical
access. Journal of Experimental Psychology: Human Perception and
Performance, 14, 113-121.
Echols, C. & Newport, E. (1992). The role of stress and position in determining first
words. Language Acquisition, 2, 189-220.
Gerken, L. (1996). Prosodic structure in young children’s language production.
Language, 72, 683-712.
Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm.
Perception and Psychophysics, 28, 267-283.
Hammond, M. (1999). The Phonology of English. Oxford: Oxford University Press.
Hayes, B. (1982). Extrametricality and English stress. Linguistic Inquiry, 13, 227-276.
Kehoe, M. (1999). Truncation without shape constraints: the latter stages of prosodic
acquisition. Language Acquisition, 8, 23-67.
Kucera, H. & Francis, W.N. (1967). Computational Analysis of Present-Day American
English. Providence, RI: Brown University Press.
Luce, P. A. & Pisoni, D. B. (1998). Recognizing spoken words: the neighborhood
activation model. Ear and Hearing, 19, 1-36.
Nusbaum, H. C., Pisoni, D. B., & Davis, C. K. (1984). Sizing up the Hoosier Mental
Lexicon: measuring the familiarity of 20,000 words. In Research on Speech
Perception Progress Report No. 10 (pp. 357-376). Bloomington, IN: Speech
Research Laboratory, Indiana University.
Smith, N. V. (1973). The Acquisition of Phonology. Cambridge: Cambridge University
Press.
Taft, M. (1979). Lexical access via an orthographic code: the Basic Orthographic Syllable
Structure (BOSS). Journal of Verbal Learning and Verbal Behavior, 18, 21-39.