Linguistic Society of America
Annual Meeting
January 8, 2010
Baltimore, MD
Gerard Manley Hopkins’s Sprung Rhythm:
Corpus study and stochastic grammar
Bruce Hayes
UCLA
Claire Moore-Cantwell
University of Massachusetts, Amherst
1. Sprung rhythm
• A poetic meter invented by the Victorian poet Gerard Manley Hopkins (1844-1899), the
basis of some of his most admired poetry.
• Until recently, nobody understood how it worked. Hopkins’s own explanations give us
only hints.
• Some scholars have even charged Hopkins with being deluded in imagining that he was
composing in a real meter.
2. Kiparsky (1989)
• Kiparsky claims to have cracked the code.
◦ Essential ingredient, neglected by earlier analysts: syllable quantity.
3. Revisiting Kiparsky’s work
• We returned to the same data, using newly available tools:
◦ electronic corpus
◦ machine scansion
◦ stochastic grammar frameworks
4. Goals
• Validate the original analysis, using digital technology for thoroughness.
• Render the analysis more complete by adding constraints and couching it in the
framework of stochastic grammar.
5. Outline of the talk
• Summarize the Kiparskian analysis
• Add two new inviolable constraints
• Outline the problem of indeterminate scansions
• Offer a solution: violable constraints and stochastic grammar
Hayes/Moore-Cantwell
p. 2
Sprung Rhythm: Corpus study and stochastic grammar
THE KIPARSKYAN ANALYSIS OF SPRUNG RHYTHM
6. Notation for syllable weight
–   heavy
⏑   light
⏓   ambiguous: heavy or light, as the scansion requires
7. How syllable weight works in sprung rhythm
• Basically, the normal Latin-like syllable weight criterion, with closed and long-voweled
syllables counting as heavy.
• With three added complications:
◦ You can optionally ignore a single word-final consonant (cf. “consonant
extrametricality”; Hayes 1982). So havoc [ˈhæ.vək] is /⏑ ⏓/.
◦ You can optionally count a word-final stressless non-low vowel as short (cf.
Chomsky and Halle 1968). So they [eɪ] is /⏓/, but I [aɪ] is /–/.
◦ You can collapse a stressless vowel plus a coda sonorant into a single short
syllabic sonorant; hence dandled /ˈdændəld/ can be /– ⏑/, interpreted as
[ˈdændl̩d].
8. Framework
• Kiparsky’s work falls in the mainstream research tradition of generative metrics (Halle
and Keyser 1969, 1971; Kiparsky 1975, 1977, et seq.)
• Meter is a sequence of strong and weak slots, e.g. W S W S W S W S W.
• There are rules for how you can fill an S or a W slot (cf. Hanson and Kiparsky 1996).
• Here is a case of such filling:
/w () /s To- /w wery /s city /w and /s bran- /w chy be- /s tween /w () /s to- /w wers;
DO¹
• These rules refer to stress, syllable weight, phrasing, and word boundaries.
9. Meters of sprung rhythm
• Each is an alternating sequence of S (Strong) and W (Weak) positions, beginning and
ending with W.
• e.g. W S W S W S W S W, tetrameter
• 2 S’s = dimeter, 3 S’s = trimeter, and so on.
• Possible meters: 2, 3, 4, 5, 6, or 8 S positions per line, or lines of varying length in a fixed stanza type.
¹ For abbreviations of poem titles, see Appendix A below.
RULES FOR FILLING S AND W POSITION
10. Preliminary definition
• Resolved sequence = a stressed light syllable followed by a stressless light syllable in the same word
• Words that embody resolved sequences: dapple [ˈdæpl̩], level [ˈlɛvl̩]
11. The rules for filling W and S position
• W may be filled with any of the following:
a. A single stressless syllable
b. A stressed monosyllable
c. A sequence of stressless light syllables
d. A resolved sequence
e. Null
• S may be filled with any of the following:
a. A single stressed syllable
b. A resolved sequence
c. A single stressless syllable, provided it is not light.
• For examples of all of these rules, see Appendix B below.
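The filling rules in (11) can be stated as two predicates over a stretch of syllables. This is a sketch, not Chopkins’s code. The `Syl` record is hypothetical, with weight given as H, L, or A (ambiguous, ⏓). An ambiguous syllable is assumed to count as light or as non-light, whichever the rule requires.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Syl:
    text: str
    stressed: bool
    weight: str              # "H" heavy, "L" light, "A" ambiguous
    word: int                # index of the word containing the syllable
    monosyllable: bool = False

def can_be_light(s): return s.weight in ("L", "A")
def can_be_heavy(s): return s.weight in ("H", "A")

def is_resolved(seq):
    """Stressed light + stressless light syllable in the same word (section 10)."""
    return (len(seq) == 2 and seq[0].word == seq[1].word
            and seq[0].stressed and not seq[1].stressed
            and can_be_light(seq[0]) and can_be_light(seq[1]))

def can_fill_w(seq):
    if not seq:
        return True                                        # e. null
    if len(seq) == 1:
        return not seq[0].stressed or seq[0].monosyllable  # a., b.
    if all(not s.stressed and can_be_light(s) for s in seq):
        return True                                        # c.
    return is_resolved(seq)                                # d.

def can_fill_s(seq):
    if len(seq) == 1:
        return seq[0].stressed or can_be_heavy(seq[0])     # a., c.
    return is_resolved(seq)                                # b. (empty S fails)

ci, ty = Syl("ci", True, "L", 0), Syl("ty", False, "A", 0)
print(can_fill_s([ci, ty]), can_fill_w([ci, ty]))
```

For example, city (a resolved sequence) may fill either position. A stressless light article such as a may fill W but not S, as in Appendix B.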
OUTRIDES
12. Outrides
• Extra syllables not affiliated with either W or S position.
• They are Hopkins’s extension of “extrametrical syllables,” a common phenomenon in
ordinary verse (Kiparsky 1977, 230-232).
13. Conditions on outrides
• Content of outride: same as any W position
• Required context for outrides:
◦ Must precede a phonological break (end of P-phrase, I-phrase; Selkirk 1980)
◦ Stress on an outride must be weaker than stress on the preceding S.
◦ Normally allowed only in tetrameter or longer lines
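The contextual conditions on outrides in (13) amount to three checks. Several details of this sketch are assumptions: the numeric break ranks (1 = Word up to 4 = Intonational Phrase), the argument names, and the treatment of “normally” as a hard cutoff. It also does not re-check that the outride’s content is a legal W filler.

```python
# Hypothetical numeric ranks for the break that follows a syllable
# (Word < Clitic Group < P-phrase < I-phrase).
WORD, CG, PP, IP = 1, 2, 3, 4

def outride_licensed(outride_stresses, preceding_s_stress, break_after, n_strong):
    """Check the contextual conditions of section 13 for one outride.

    outride_stresses: stress levels (1-4) of the syllables in the outride
    preceding_s_stress: stress level of the syllable filling the preceding S
    break_after: rank of the phonological break right after the outride
    n_strong: number of S positions in the line
    """
    if break_after < PP:                             # needs a P- or I-phrase break
        return False
    if max(outride_stresses) >= preceding_s_stress:  # must be weaker than the S
        return False
    return n_strong >= 4                             # tetrameter or longer
```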
14. Example of an outride
/w () /s Shares /w their /s best /w gifts /s sure- /o ly, /w fall /s how /w things /s will),
BC
DATA CORPUS
15. Purpose
• In order to test Kiparsky’s analysis (and our revisions) by our method, we must first code
the entire sprung rhythm corpus in digital form.
16. Poems included
• All 25 sprung rhythm poems in which the number of S positions per line is known.
• 583 lines, 6127 syllables
17. Phonological coding
For every syllable we coded:
• Stress: levels 1–4, with 4 the greatest. We follow the phrasal stress rules in the
literature (e.g. Chomsky and Halle 1968, Selkirk 1984, Hayes 1995).
• Weight: light, heavy, or ambiguous
• Phonological phrasing: Selkirk’s Prosodic Hierarchy (1980, 1986), with levels of
phrasing (Word, Clitic Group, Phonological Phrase, Intonational Phrase) and rules for
formation as in Hayes (1989).
18. Example of a coded line
syllables:  To-  we-  ry   ci-  ty   and  bran-  chy  be-  tween  to-  wers;
stress:     3    1    1    4    1    1    3      1    1    2      4    1
weight:     –    ⏓    ⏓    ⏑    ⏓    ⏓    –      ⏓    ⏑    –      –    ⏓
phrasing²:  [tree over IP, PP, CG, and Word levels; not recoverable from this copy]
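The phrasing row records, for each syllable, the rank of the highest constituent of which it is the rightmost syllable (see note 2). The sketch below computes this under two assumptions. The prosodic tree is encoded (hypothetically) as nested (label, children) pairs, and the ranks are Word = 1, CG = 2, PP = 3, IP = 4, with 0 for syllables that end no constituent. The toy tree is for illustration only and is not the coding of the line above.

```python
# Hypothetical numeric ranks for the prosodic levels.
RANK = {"Word": 1, "CG": 2, "PP": 3, "IP": 4}

def rightmost_ranks(tree):
    """tree: (label, children) nodes whose leaves are syllable strings.

    Returns [(syllable, rank)], where rank is that of the highest constituent
    of which the syllable is the rightmost syllable (0 if none)."""
    out = []

    def walk(node):
        label, children = node
        for child in children:
            if isinstance(child, str):
                out.append([child, 0])
            else:
                walk(child)
        # After visiting all children, out[-1] is this node's rightmost syllable.
        out[-1][1] = max(out[-1][1], RANK[label])

    walk(tree)
    return [tuple(pair) for pair in out]

toy = ("IP", [("PP", [("CG", [("Word", ["ha", "voc"])]),
                      ("CG", [("Word", ["un", "selve"])])])])
print(rightmost_ranks(toy))
```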
MACHINE SCANSION
19. Chopkins.exe
• This program knows all the options for filling S, W, and O (outride) positions.
• It finds all the possible scansions of a line, or where appropriate tells the user that no
legal scansions exist.
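Chopkins itself is not reproduced here. As a rough sketch of the kind of search it performs, the function below enumerates every way of parsing a line into the template W (S W)ⁿ by trying every possible span for each position. It uses simplified versions of the filling rules in (11) and ignores outrides, the additional constraints, and phrasing. The `Syl` representation and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Syl:
    text: str
    stressed: bool
    weight: str            # "H", "L", or "A" (ambiguous)
    word: int              # index of the containing word
    monosyllable: bool = False

def _light(s): return s.weight in ("L", "A")

def _resolved(seq):
    return (len(seq) == 2 and seq[0].word == seq[1].word and seq[0].stressed
            and not seq[1].stressed and _light(seq[0]) and _light(seq[1]))

def fits(slot, seq):
    """Simplified versions of the W and S filling rules of section 11."""
    if slot == "S":
        if len(seq) == 1:
            return seq[0].stressed or seq[0].weight in ("H", "A")
        return _resolved(seq)
    if len(seq) <= 1:
        return not seq or not seq[0].stressed or seq[0].monosyllable
    return all(not s.stressed and _light(s) for s in seq) or _resolved(seq)

def scansions(line, n_strong):
    """Enumerate every parse of `line` into W (S W) x n_strong (no outrides)."""
    template = ["W"] + ["S", "W"] * n_strong
    found = []

    def extend(i, pos, parse):
        if pos == len(template):
            if i == len(line):               # all syllables consumed
                found.append(list(parse))
            return
        slot = template[pos]
        last = min(len(line), i + 2) if slot == "S" else len(line)
        for j in range(i, last + 1):         # try every span line[i:j]
            if fits(slot, line[i:j]):
                parse.append((slot, [s.text for s in line[i:j]]))
                extend(j, pos + 1, parse)
                parse.pop()

    extend(0, 0, [])
    return found

A = Syl("A", True, "H", 0, True)
B = Syl("B", False, "A", 1, True)
C = Syl("C", True, "H", 2, True)
print(len(scansions([A, B, C], 2)))   # 3 scansions
```

An empty result corresponds to Chopkins reporting that no legal scansion exists, as with `scansions([A, B, C], 4)`.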
² We do not code the actual tree. Instead, for each syllable we record the rank of the highest constituent of which it is
the rightmost syllable.
20. Procedure: first step
• We inspected the outputs of Chopkins and used them to discover new constraints to add
to Kiparsky’s system, in order to make the grammar more restrictive.
PROPOSED ADDITIONAL INVIOLABLE CONSTRAINTS
21. FINAL FALL
Assess a violation when the rightmost S is filled by a syllable that does not have more stress
than what fills the following W.
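FINAL FALL can be checked on a candidate scansion as follows. The sketch makes three assumptions. It uses a hypothetical encoding of a scansion as (slot, stress levels) pairs, and it represents a resolved S by its first syllable. It also requires every syllable of the following W to be less stressed than the S, so an empty final W trivially satisfies the constraint.

```python
def final_fall_ok(parse):
    """parse: list of (slot, [stress levels]) pairs in line order, slots "W"/"S"."""
    last_s = max(k for k, (slot, _) in enumerate(parse) if slot == "S")
    s_stress = parse[last_s][1][0]                       # first syllable of the S
    following = parse[last_s + 1][1] if last_s + 1 < len(parse) else []
    return all(w < s_stress for w in following)          # S must be more stressed
```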
22. *EMPTY W INSIDE LEVEL I
• Empty W is rare inside a word.
• Moreover, all 5 attested cases put the empty W between a Level II (Kiparsky 1982) affix
and the stem, like this:
/w () /s Strokes /w of /s ha- /w voc /s un- /w () /s selve
• So: we impose an inviolable ban on empty W internal to a Level I domain.
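The ban on empty W inside a Level I domain requires knowing what kind of boundary an empty W sits on. In this sketch the boundary labels (`word`, `level2`, `level1`) and the parse encoding are hypothetical. Line-initial and line-final empty W positions are not checked, since they do not fall between two syllables. The usage example encodes the line above (“Strokes of havoc unselve”).

```python
def empty_w_level1_ok(parse, boundaries):
    """parse: list of (slot, [syllable indices]) in line order.
    boundaries[i]: boundary between syllables i and i+1, one of
    "word", "level2" (Level II affix + stem), "level1" (inside a Level I domain)."""
    for k, (slot, sylls) in enumerate(parse):
        if slot != "W" or sylls or k in (0, len(parse) - 1):
            continue                      # only line-internal empty W positions
        left = parse[k - 1][1][-1]        # last syllable of the preceding S
        if boundaries[left] == "level1":
            return False
    return True

# Strokes(0) of(1) ha(2) voc(3) un(4) selve(5)
unselve = [("W", []), ("S", [0]), ("W", [1]), ("S", [2]), ("W", [3]),
           ("S", [4]), ("W", []), ("S", [5]), ("W", [])]
print(empty_w_level1_ok(unselve, ["word", "word", "level1", "word", "level2"]))
```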
23. Modify the grammar
• We altered Chopkins to respect these two new constraints.
TESTING THE ANALYSIS
24. The question
• How well does the grammar work?
◦ Might counterexamples have slipped by Kiparsky in his earlier inspection of the
data?
◦ Do our new constraints still permit the whole corpus to be scanned?
25. Results
• A fair amount of discussion and fine-tuning of individual examples is needed (see full
paper), but the upshot is that about 2 lines (out of 583) remain unmetrical:
Forward-like, but however, and like favourable heaven heard these.
BC
A heart’s-clarion! Away grief’s gasping, joyless days, dejection.
HF
26. Query
• How meaningful is it that there are just 2 exceptions?
• One way to check: do a comparison with prose:
◦ How many “exceptions” would we get if we tried to scan lines of ordinary
English as sprung rhythm?
◦ Earlier uses of this method: Tarlinskaja and Teterina (1974), Tarlinskaja (1976),
Biggs (1996)
27. Sources of prose
• We used Hopkins’s own writings:
◦ unpublished “Author’s preface”
◦ a few of his letters
28. Forming the sample
• Separated these texts into “pseudo-lines”: sequences delimited by punctuation marks
• Selected 155 of them to match the corpus distribution of line lengths in syllables
• For each: randomly assigned a meter (trimeter, tetrameter, etc.), matching the statistics
of the corpus (e.g., lines of n syllables occur with m S positions x% of the time)
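The sampling procedure can be sketched as follows. Several elements are assumptions for illustration: the splitting regex, the syllable counts (assumed to be computed separately), and the `meter_dist` table of meter frequencies by line length. `random.Random.choices` draws one meter with probability proportional to its frequency in the verse corpus.

```python
import random
import re

def pseudo_lines(text):
    """Split prose into stretches delimited by punctuation marks."""
    return [p.strip() for p in re.split(r"[.,;:!?()]+", text) if p.strip()]

def assign_meter(n_syllables, meter_dist, rng):
    """meter_dist (hypothetical format):
    {line length in syllables: {number of S positions: relative frequency}}."""
    meters, freqs = zip(*sorted(meter_dist[n_syllables].items()))
    return rng.choices(meters, weights=freqs, k=1)[0]

rng = random.Random(1)  # fixed seed for a reproducible sample
print(pseudo_lines("The rhythm is sprung, not running; it counts stresses."))
print(assign_meter(10, {10: {4: 0.5, 5: 0.5}}, rng))
```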
29. Result
• About 10% of the prose lines (16 of 155) are unscannable with the meter that was randomly assigned
to them.
• This proportion is significantly higher than the proportion of unscannable real
lines.
           metrical   unmetrical
verse        581          2
prose        139         16
Fisher’s exact test: p < 10⁻⁹
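Fisher’s exact test for a 2×2 table can be computed directly from the hypergeometric distribution. The sketch below uses only Python’s `math.comb` and implements the usual two-sided version: it sums over all tables with the same margins that are no more probable than the observed one. Applied to the table above, it gives a value below 10⁻⁹, in line with the reported result. In practice a library routine such as SciPy’s `scipy.stats.fisher_exact` would be the usual choice.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that floating-point ties are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

p = fisher_exact_two_sided(581, 2, 139, 16)  # verse/prose x metrical/unmetrical
print(f"p = {p:.2e}")
```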
30. Summing up so far
• Kiparsky’s system can be slightly tightened with two inviolable constraints.
• Thus modified, it suffers very few counterexamples and stands up to statistical testing
with the prose model method.
• But we think there is still a problem with it: insufficient restrictiveness.
THE PROBLEM OF TOO MANY LEGAL SCANSIONS
31. Defining the problem
• If a metrical analysis allows a great number of scansions, it is unrestrictive.
• An insufficiently restrictive system would be scientifically uninteresting; it would make
scansion “as indeterminate as slicing cucumbers” (Kiparsky 1989, 308)
32. How many scansions does the (modified) Kiparskian analysis allow?
Chopkins can tell us:
• Lines with a unique scansion: just 47 (out of 583)
• Lines with more than 10 scansions: 211
• Lines with more than 100 scansions: 12
• Average number of scansions per line: 14.8
• Median: 6
• We think this is probably too many.
33. Our proposed solution
• We think Kiparsky’s work only found a subset of the constraints under which Hopkins
wrote—the inviolable ones.
• We can and should add additional violable constraints.
• We can deploy these constraints rigorously by using a framework of stochastic grammar.
• … and we can test our proposal, because Hopkins left testimony about which scansions
he felt were best.
USING HOPKINS’S DIACRITICS
AS A DIAGNOSTIC FOR HIS PREFERRED SCANSIONS
34. The diacritics
• Hopkins added them because his friends couldn’t scan his poems.
• They mark outrides, empty W positions, and syllables that should be scanned in S or in W.
• We can use these diacritics to single out a “Hopkins preferred” scansion from the many
logically possible scansions.
35. The informative subset of the corpus
• The lines in which only one scansion is compatible with the diacritics.
• This is true for 311 of the 583 lines.
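The selection of the informative subset can be sketched as a filter over the candidate scansions. Two parts of the representation are hypothetical: the scansion encoding, and the encoding of Hopkins’s marks as a map from syllable index to the position that syllable must occupy. The sketch ignores marks for empty W positions.

```python
def compatible(parse, marks):
    """parse: list of (slot, [syllable indices]); marks: {syllable index: slot}."""
    slot_of = {i: slot for slot, idxs in parse for i in idxs}
    return all(slot_of.get(i) == required for i, required in marks.items())

def hopkins_scansion(candidates, marks):
    """Return the unique diacritic-compatible candidate, or None if 0 or 2+ fit."""
    fits = [p for p in candidates if compatible(p, marks)]
    return fits[0] if len(fits) == 1 else None

p1 = [("W", []), ("S", [0]), ("W", []), ("S", [1]), ("W", [2])]
p2 = [("W", []), ("S", [0]), ("W", [1]), ("S", [2]), ("W", [])]
p3 = [("W", [0]), ("S", [1]), ("W", []), ("S", [2]), ("W", [])]
print(hopkins_scansion([p1, p2, p3], {1: "W"}) == p2)
```

A line with no marks, or with marks that leave several candidates, yields `None` and falls outside the informative subset.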
36. Goal
• Construct a stochastic grammar that maximally favors the same scansions that Hopkins
preferred, according to his diacritics.
CONSTRUCTING A STOCHASTIC GRAMMAR I: CONSTRAINT SET
37. Source
• The literature on generative metrics, including Kiparsky (1989).
38. Avoiding multiply-filled positions
a. *2 SYL IN W
Assess a violation for W positions filled by 2 or more syllables.
b. *3 SYL IN W
Assess a violation for W positions filled by 3 or more syllables.
c. *RESOLUTION IN S
Assess a violation for each resolved sequence in S position.
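Counting violations of the constraints in (38) only requires the number of syllables in each position. The sketch assumes a hypothetical encoding of a scansion as (slot, syllables) pairs. Under the rules in (11) a two-syllable S can only be a resolved sequence, so any two-syllable S is counted as a resolution. The example encodes the opening of the HF line from Appendix B.

```python
def multiple_filling_violations(parse):
    """parse: list of (slot, syllables) pairs; slot is "W", "S" or "O"."""
    v = {"*2 SYL IN W": 0, "*3 SYL IN W": 0, "*RESOLUTION IN S": 0}
    for slot, fill in parse:
        if slot == "W" and len(fill) >= 2:
            v["*2 SYL IN W"] += 1
        if slot == "W" and len(fill) >= 3:
            v["*3 SYL IN W"] += 1
        if slot == "S" and len(fill) == 2:  # a two-syllable S is a resolved sequence
            v["*RESOLUTION IN S"] += 1
    return v

hf = [("W", ["A"]), ("S", ["bea"]), ("W", ["con", "an", "e"]),
      ("S", ["ter"]), ("W", ["nal"]), ("S", ["beam"])]
print(multiple_filling_violations(hf))
```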
39. Matching the stress pattern to the S’s and W’s
• Match SW to a “fall” in stress contour (Magnuson and Ryder 1970; Tarlinskaja 1976)
MATCH SW
Assess a violation if the (first) syllable occupying W position has
more stress than the (first) syllable occupying the preceding S.
• We tried MATCH WS, but it did not improve the model fit and so we omit it.
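Here is a sketch of MATCH SW, using a hypothetical (slot, stress levels) encoding with stress levels 1–4 as in the corpus coding. Outride (O) slots are skipped, so the “preceding S” is taken to be the nearest S to the left (an assumption).

```python
def match_sw_violations(parse):
    """parse: list of (slot, [stress levels]) pairs in line order.
    One violation per nonempty W whose first syllable is more stressed than
    the first syllable of the preceding S."""
    count, last_s = 0, None
    for slot, fill in parse:
        if slot == "S":
            last_s = fill[0]
        elif slot == "W" and fill and last_s is not None and fill[0] > last_s:
            count += 1
    return count
```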
40. Flanking empty W positions with stressed syllables
*NO-CLASH EMPTY W
Assess a violation for an empty W position if the S positions that flank it are not both filled
by a stressed syllable.
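Here is a sketch of *NO-CLASH EMPTY W. Two points are assumptions not stated above. First, stress levels of 2 or more count as “stressed”. Second, empty W positions at the line edges, which have only one flanking S, are not assessed.

```python
STRESSED_MIN = 2  # assumption: stress levels >= 2 count as "stressed"

def no_clash_empty_w_violations(parse):
    """parse: list of (slot, [stress levels]) pairs in line order."""
    count = 0
    for k in range(1, len(parse) - 1):       # line-edge W positions are skipped
        slot, fill = parse[k]
        if slot == "W" and not fill:
            left, right = parse[k - 1], parse[k + 1]
            clash = (left[0] == "S" and right[0] == "S"
                     and left[1][0] >= STRESSED_MIN and right[1][0] >= STRESSED_MIN)
            if not clash:
                count += 1
    return count
```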
41. Constraints on outrides
a. *OUTRIDE
Assess a violation for every outride.
b. *OUTRIDE-WEAK BREAK
Assess a violation for every outride that is only at the end of a Clitic Group.
c. *OUTRIDE-SHORT LINE
Assess a violation for every outride in a line with 4 or fewer S positions.
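The three outride constraints can be counted per line, given the rank of the break that follows each outride. The numeric ranks and the input format are hypothetical.

```python
CG, PP, IP = 2, 3, 4   # hypothetical ranks: Clitic Group < P-phrase < I-phrase

def outride_violations(outride_breaks, n_strong):
    """outride_breaks: rank of the break after each outride in the line."""
    return {
        "*OUTRIDE": len(outride_breaks),
        "*OUTRIDE-WEAK BREAK": sum(1 for b in outride_breaks if b == CG),
        "*OUTRIDE-SHORT LINE": len(outride_breaks) if n_strong <= 4 else 0,
    }
```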
42. All these constraints are violable
Constraint              Number of violations in the 311 “Hopkins scansions”
*2 SYL IN W             211
*3 SYL IN W              23
*RESOLUTION IN S         21
MATCH SW                117
*NO-CLASH EMPTY W        21
*OUTRIDE                 88
*OUTRIDE-WEAK BREAK       3
*OUTRIDE-SHORT LINE       2
CONSTRUCTING A STOCHASTIC GRAMMAR II: FORMAL FRAMEWORK
43. Goal
• A system that calculates output probabilities from constraint violation profiles in a
principled way.
• Here, for “principled”, we use the maximum likelihood criterion: maximize the
predicted probability of Hopkins’s own preferred scansions.
◦ This is a pretty standard criterion for fitting models.
• Maxent grammars do this in a fairly simple way, and are backed by solid mathematics.
44. Maxent grammars in recent phonological work
• Goldwater and Johnson 2003, Wilson 2006, Hayes and Wilson 2008, Hayes, Zuraw,
Siptár and Londe (in press)
45. Basis of maxent grammars
• They are a subspecies of harmonic grammars (Legendre, Miyata, and Smolensky 1990;
Pater 2009; Potts et al., in press)
• Each constraint has a weight, a non-negative number, expressing how much it lowers the
output probability of candidates that violate it.
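Concretely, in the standard maxent formulation a candidate’s harmony is the weighted sum of its violations, H(x) = Σᵢ wᵢCᵢ(x). Its probability is exp(−H(x)) divided by the sum of exp(−H) over all candidate scansions of the same line. The sketch below is that textbook computation (as in Goldwater and Johnson 2003 and Hayes and Wilson 2008), not code from the Maxent Grammar Tool. The function name and input format are hypothetical.

```python
import math

def maxent_probs(weights, candidates):
    """weights: {constraint: weight}; candidates: list of {constraint: violations}.
    Returns P(candidate) = exp(-H) / Z, with H = sum of weight * violations."""
    harmonies = [sum(weights[c] * n for c, n in cand.items()) for cand in candidates]
    best = min(harmonies)                       # shift for numerical stability
    scores = [math.exp(-(h - best)) for h in harmonies]
    z = sum(scores)
    return [s / z for s in scores]

weights = {"*2 SYL IN W": 1.75, "*OUTRIDE": 2.22}   # values from (48)
print(maxent_probs(weights, [{"*2 SYL IN W": 1}, {"*OUTRIDE": 1}]))
```

For example, with the weights reported in (48), suppose one scansion’s only violation is a single *2 SYL IN W (1.75) and its rival’s only violation is a single *OUTRIDE (2.22). The first then receives about 0.62 of the probability and the second about 0.38.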
46. Finding the weights
• This is done by fitting them to the data.
• For a presentation of the algorithm involved, see Hayes and Wilson 2008.
• Software used: the Maxent Grammar Tool (Colin Wilson/Ben George),
http://www.linguistics.ucla.edu/people/hayes/MaxentGrammarTool/
47. Our maxent simulation
• Feed the software:
◦ the 583 sprung rhythm lines of the corpus
◦ every legal candidate scansion of these lines (8633 in total)
◦ the violation profiles for all the candidate scansions
◦ for the 311 lines in which an unambiguous Hopkins scansion was determinable, a
designation of this scansion as the “winning” one, for purposes of training the
weights.
• The algorithm tries to maximize the probability assigned to these winning scansions.
• Program output:
◦ the best-fit weights
◦ the predicted probability of every scansion
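The fitting step can be sketched as projected gradient ascent on the log-likelihood of the winning scansions. For each constraint, the gradient is the expected number of violations under the current grammar minus the winner’s violations, and weights are clipped at zero. This is a simple stand-in, not the optimization procedure of the Maxent Grammar Tool. Lines without a designated winner are simply left out of `lines`. The function name, learning rate, and step count are arbitrary choices.

```python
import math

def fit_weights(lines, constraints, rate=0.1, steps=2000):
    """lines: list of (candidates, winner) pairs; candidates is a list of
    {constraint: violations} dicts, winner the index of the attested scansion."""
    w = {c: 0.0 for c in constraints}
    for _ in range(steps):
        grad = {c: 0.0 for c in constraints}
        for candidates, winner in lines:
            h = [sum(w[c] * v.get(c, 0) for c in constraints) for v in candidates]
            best = min(h)
            e = [math.exp(-(x - best)) for x in h]
            z = sum(e)
            for c in constraints:
                expected = sum(ei / z * v.get(c, 0) for ei, v in zip(e, candidates))
                # d log P(winner) / d w_c = E[C_c] - C_c(winner)
                grad[c] += expected - candidates[winner].get(c, 0)
        for c in constraints:
            w[c] = max(0.0, w[c] + rate * grad[c])   # keep weights non-negative
    return w

toy = [([{"A": 1}, {"B": 1}], 0)]   # the winner violates A, the loser B
print(fit_weights(toy, ["A", "B"]))
```

On perfectly separable data, as in this toy case, the weights keep growing slowly without a regularizing prior. Practical implementations typically add such a prior term.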
48. Weights obtained
Constraint              Weight
MATCH SW                 1.05
*NO-CLASH EMPTY W        1.74
*RESOLUTION IN S         1.44
*2 SYL IN W              1.75
*3 SYL IN W              1.75
*OUTRIDE                 2.22
*OUTRIDE-SHORT LINE     11.48
*OUTRIDE-WEAK BREAK      1.69
EVALUATING THE MACHINE-LEARNED GRAMMAR
49. Guesses needed to find right answer
• Procedure: for each line, sort the candidates in descending order of the probability assigned by the
maxent grammar.
• Count ties as the larger value (2 candidates tied for 1st = “2”).
• How far down the list (i.e., how many such “guesses”) did we need to go to find the Hopkins-preferred
scansion?
Guesses   Number of lines   Cumulative fraction of total
1         260               0.836
2          24               0.913
3          10               0.945
4           8               0.971
5           2               0.977
6           2               0.984
8           1               0.987
10          1               0.990
15          1               0.994
54          1               0.997
158         1               1.000
• The average rank of the correct guess is 2.02.
• Without the violable constraints, i.e. treating all scansions permitted by the Kiparskian analysis as
equally probable, the comparable figure is 7.5 guesses.
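The ranking measure can be sketched as follows. `guesses_needed` (a hypothetical name) counts ties pessimistically, as described above. The second part recomputes the cumulative column of the table from its counts.

```python
from itertools import accumulate

def guesses_needed(probs, correct):
    """Rank of the correct candidate when candidates are sorted by descending
    probability, counting ties as the larger value."""
    return sum(1 for p in probs if p >= probs[correct])

# Guesses needed -> number of lines, from the table above.
counts = {1: 260, 2: 24, 3: 10, 4: 8, 5: 2, 6: 2, 8: 1, 10: 1, 15: 1, 54: 1, 158: 1}
total = sum(counts.values())
cumulative = dict(zip(counts, (n / total for n in accumulate(counts.values()))))
print(total, round(cumulative[1], 3), round(cumulative[2], 3))
```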
50. Other models (work in progress)
• We plan to test our data with similar stochastic grammar models: Stochastic OT (Boersma
1997, Boersma and Hayes 2001) and Noisy Harmonic Grammar (Pater 2009, Potts et al. in
press).
• Point of interest: we think there may be harmonically bounded winners in the corpus
(Appendix C), which these theories predict to have zero probability.
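A candidate is (simply) harmonically bounded if some other candidate for the same line has no more violations on every constraint and strictly fewer on at least one. Such candidates are the ones at issue in the point above. A maxent grammar still gives them a nonzero probability. The sketch checks only this one-on-one bounding and ignores bounding by combinations of candidates. The function name and input format are hypothetical.

```python
def harmonically_bounded(candidates, k, constraints):
    """True if candidate k is simply bounded by another candidate of the line."""
    x = candidates[k]
    for j, y in enumerate(candidates):
        if j == k:
            continue
        no_worse = all(y.get(c, 0) <= x.get(c, 0) for c in constraints)
        better = any(y.get(c, 0) < x.get(c, 0) for c in constraints)
        if no_worse and better:
            return True
    return False

cands = [{"*OUTRIDE": 1}, {"*OUTRIDE": 1, "MATCH SW": 1}, {"*2 SYL IN W": 1}]
cons = ["*OUTRIDE", "MATCH SW", "*2 SYL IN W"]
print([harmonically_bounded(cands, k, cons) for k in range(3)])
```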
DISCUSSION
51. Hopkins’s metrical experiment
• He wasn’t deluded about his meter. His system is restrictive, and even more so than
Kiparsky claimed.
• With our additions, the system now normally exhibits a strong preference for one single
scansion (one dominates the others in its probability), which in a large majority of
testable cases coincides with the one Hopkins preferred.
52. Quasi-prediction in linguistics
• Although all the options that obey the inviolable constraints are in principle legal, the
choice among these legal options is far from random and can be “quasi-predicted” by a
stochastic grammar.
• Compare Bresnan et al. (2007), who use a stochastic model to quasi-predict which kind
of dative construction (V NP PP or V NP NP) speakers use in dative sentences.
53. Methodology of metrics
• We think doing metrics with a corpus and machine search puts you on safer ground, helps
you discover new constraints, and helps in verifying you’re on the right track.
Appendix A: poems studied with title abbreviations
AB  Ashboughs
BC  The Bugler’s First Communion
BP  Binsey Poplars
BR  Brothers
CC  Carrion Comfort
CS  The Caged Skylark
DO  Duns Scotus’s Oxford
FR  Felix Randal
HF  That Nature is a Heraclitean Fire and of the Comfort of the Resurrection
HH  Hurrahing in Harvest
HP  Harry Ploughman
HR  Henry Purcell
ID  Inversnaid
KF  As Kingfishers Catch Fire
LE  The Loss of the Eurydice
MM  The May Magnificat
NW  No Worst
PB  Pied Beauty
RB  Ribblesdale
SD  The Soldier
SF  Spring and Fall
SS  Spelt from Sibyl’s Leaves
TG  Tom’s Garland
WH  The Windhover
WM  At the Wedding March
Appendix B: Examples of how S and W positions are filled
1. W can be any single stressless syllable
• This is the norm in stress-based poetry; many examples.
2. W can be any sequence of stressless light syllables
/w A /s bea- /w con, an e- /s ter- /w nal /s beam. /w Flesh /s fade, /w and /s mor- /w tal /s trash
HF
[kən ən ə]
3. W can be any stressed monosyllable
/w Young /s John: /w then /s fear, /w then /s joy
BR
4. W can be a resolved sequence
/w Her /s fond /w yellow /s horn- /w light /s wound /w to the /s west, /w her /s wild
[»jlU]
/w hollow /s hoar- /w light /s hung /w to the /s height
[»hɒlU]
SS
5. W can be Null
• This is what Hopkins meant by “sprung”.
/w () /s Áll /w () /s félled, /w () /s félled, /w are /s áll /w () /s félled;
BP
6. S can be filled by a single stressed syllable
• This is the norm in stress-based poetry.
7. S can be filled by a resolved sequence
• Resolved sequence as defined in (10) above.
• Not as common as in W position.
/w This /s very /w very /s day /w came /s down /w to us /s af- /w ter a /s boon /w he on
[»v̆rɪ ̆]
BC
8. S can be filled by a single stressless non-light syllable
• This one is ok, because [ənd] can be counted as heavy:
/w Till a /s life- /w belt /s and /w God’s /s will
[ənd]
LE
• But this constructed line is impossible, since /C₀ə/ is never heavy:
*/w Till it /s streng- /w then /s a /w man’s /s will
[ə]
References
Biggs, Henry (1996) A Statistical Analysis of the Metrics of the Classic French Decasyllable and
Classic Alexandrine. Ph.D. dissertation, Program in Romance Linguistics, UCLA, Los
Angeles, CA.
Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and Harald Baayen (2007) Predicting the dative
alternation. In G. Bouma, I. Kraemer, and J. Zwarts, eds., Cognitive Foundations of
Interpretation. Amsterdam: Royal Netherlands Academy of Science, pp. 69–94.
Chomsky, Noam and Morris Halle (1968) The Sound Pattern of English. New York: Harper
and Row.
Goldwater, Sharon and Mark Johnson (2003) Learning OT constraint rankings using a maximum
entropy model. In Jennifer Spenader, Anders Eriksson, and Östen Dahl, eds., Proceedings of
the Stockholm Workshop on Variation within Optimality Theory, pp. 111–120.
Halle, Morris and S. Jay Keyser (1969) Chaucer and the study of prosody. College English 28,
187–219.
Halle, Morris and S. Jay Keyser (1971) English Stress: Its Form, Its Growth, and Its Role in
Verse. New York: Harper and Row.
Hanson, Kristin and Paul Kiparsky (1996) A parametric theory of poetic meter. Language 72,
287–335.
Hayes, Bruce (1982) Extrametricality and English stress. Linguistic Inquiry 13, 227–276.
Hayes, Bruce (1989) The prosodic hierarchy in meter. In Paul Kiparsky and Gilbert Youmans,
eds., Rhythm and Meter. Orlando, FL: Academic Press, pp. 201–260.
Hayes, Bruce (1995) Metrical Stress Theory: Principles and Case Studies. Chicago: University of
Chicago Press.
Hayes, Bruce, Kie Zuraw, Péter Siptár, and Zsuzsa Londe (in press) Natural and unnatural
constraints in Hungarian vowel harmony. To appear in Language.
Hayes, Bruce and Colin Wilson (2008) A maximum entropy model of phonotactics and
phonotactic learning. Linguistic Inquiry 39, 379–440.
Kiparsky, Paul (1975) Stress, syntax, and meter. Language 51, 576–616.
Kiparsky, Paul (1977) The rhythmic structure of English verse. Linguistic Inquiry 8, 189–247.
Kiparsky, Paul (1982) Lexical phonology and morphology. In In-Seok Yang, ed., Linguistics in
the Morning Calm. Seoul: Hanshin, pp. 3–91.
Kiparsky, Paul (1989) Sprung rhythm. In Paul Kiparsky and Gilbert Youmans, eds., Rhythm and
Meter. San Diego: Academic Press.
Legendre, Géraldine, Yoshiro Miyata, and Paul Smolensky (1990) Harmonic grammar: A formal
multi-level connectionist theory of linguistic well-formedness: An application. In COGSCI
1990, pp. 884–891.
MacKenzie, Norman H. (1990) The Poetical Works of Gerard Manley Hopkins. Oxford:
Clarendon Press.
MacKenzie, Norman H. (1991) The Later Poetic Manuscripts of Gerard Manley Hopkins. New
York and London: Garland.
Magnuson, Karl and Frank G. Ryder (1970) The study of English prosody: An alternative
proposal. College English 31, 789–820.
Pater, Joe (2009) Weighted constraints in generative linguistics. Cognitive Science 33, 999–1035.
Potts, Christopher, Joe Pater, Karen Jesney, Rajesh Bhatt, and Michael Becker (in press)
Harmonic grammar with linear programming. To appear in Phonology.
Selkirk, Elisabeth O. (1980) Prosodic domains in phonology: Sanskrit revisited. In Mark Aronoff
and Mary-Louise Kean, eds., Juncture. Saratoga, CA: Anma Libri.
Selkirk, Elisabeth O. (1984) Phonology and Syntax: The Relation between Sound and Structure.
Cambridge, MA: MIT Press.
Selkirk, Elisabeth O. (1986) On derived domains in sentence phonology. Phonology Yearbook 3,
371–405.
Tarlinskaja, Marina (1976) English Verse: Theory and History. The Hague: Mouton.
Tarlinskaja, Marina and L. M. Teterina (1974) Verse–prose–metre. Linguistics 129, 63–86.
Wilson, Colin (2006) Learning phonology with substantive bias: An experimental and
computational investigation of velar palatalization. Cognitive Science 30, 945–982.