Rhyme graphs, sound change, and perceptual similarity
Morgan Sonderegger
University of Chicago
LSA 83rd Annual Meeting
January 9, 2009
Introduction
Corpus
Graphs
Applications
  Diachronic
  Synchronic
Discussion
Introduction
- Half vs. full rhymes.
- Half-rhymes (HRs) in (synchronic) phonology:
  - Poetry: Jakobson (1960), Holtman (1996), Hanson (2002), Steriade (2003), ...
  - Lyrics: Zwicky (1976), Kawahara (2007), Katz (2008)
  - HRs reflect similarity; recent interest (Steriade, 2001; Frisch et al., 2004)
  - Q: Focus on sequences (≥ 2) of half-rhymes... only part of the picture?
- Historical phonology:
  - Rhymes v. important in reconstruction (other sources have problems).
  - HRs in English verse 1500s–present (Wyld, 1923)
  - Problem: Any one rhyme is unreliable; need corroboration.
- Today: Higher-level structure of the corpus – the rhyme graph.
  - Qualitatively different from rhyming sequences.
  - Claim: Rhyme graph structure reflects phonological change and similarity.
  - Rhyme graphs mentioned once before: J. Joyce (1979). "Re-Weaving the Word-Web: Graph Theory and Rhymes." BLS 5.
Example (c. 1700): [figure]
Rhyme corpus
- 52k rhymes, English poetry, British (mostly English) poets born ≈ 1555–1890.
- Subcorpora: 1600, 1650, 1700, 1750, 1800, 1900, with poets grouped by period:
  - 1600: Jonson, Shakespeare, Drayton, 6 minor
  - 1650: Milton, Lovelace
  - 1700: Pope, Swift, Dryden, A. Finch, M. Prior
  - 1750: Goldsmith, Turner
  - 1800: Wordsworth, Shelley, Byron
  - 1900: Brooke, Chesterton, Housman, Kipling, others
- 1900: assume pronunciation known, ≈ current British English (CELEX).
- Rhyme domain: primary-stressed nucleus and all following segments of the last word of the line.
Graphs
- A graph G is a set of:
  1. Vertices V(G)
  2. Edges E(G)
  3. (here) Weights w(e), e ∈ E(G)
- Connected component: "piece" of G.
- Degree of v, d(v): # neighbors of v.
- Path between v, w ∈ V(G): set of edges connecting v, w.
Rhyme graphs
- Given a corpus of rhymes:
  - Vertices ⇐⇒ words
  - (v, w) is an edge ⇐⇒ "v/w rhyme observed"
  - Weight ⇐⇒ number of times observed
- Rhyme data parsed using Perl scripts, → diagrams using GraphViz¹ (see the sketch below).

¹ http://www.research.att.com/sw/tools/graphviz/
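A minimal Python sketch of this construction (not the original Perl/GraphViz pipeline), using the networkx library and a made-up handful of rhymes:

```python
# Minimal sketch: build a weighted rhyme graph from observed rhyme pairs
# and list its connected components.
from collections import Counter
import networkx as nx

def build_rhyme_graph(rhyme_pairs):
    """rhyme_pairs: iterable of (word1, word2), one tuple per observed rhyme."""
    counts = Counter(frozenset(pair) for pair in rhyme_pairs)
    G = nx.Graph()
    for pair, times_observed in counts.items():
        if len(pair) == 2:                      # skip a word rhymed with itself
            u, v = sorted(pair)
            G.add_edge(u, v, weight=times_observed)
    return G

# Hypothetical mini-corpus of rhymes
G = build_rhyme_graph([("sea", "free"), ("sea", "free"), ("sea", "tea"),
                       ("love", "move")])
print(G["sea"]["free"]["weight"])               # 2: observed twice
print([sorted(c) for c in nx.connected_components(G)])
```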
Example components
- From the 1700 corpus (colors = current pronunciations of final nuclei): [figure]
- From the 1900 corpus: [figure]
Diachronic application: the meat/meet merger
- Present-day English (PDE) /i:/ is (roughly) the merger of Late Middle English (LME) /e:/ and /i:/.
- Whether PDE /i:/ was ∗e or ∗i (usually) follows from current spelling:
  - ∗e: meat, complete, receive
  - ∗i: meet, believe, me
- Abstracting away from other issues:
  - Roughly concurrent meat/mate (near) merger (Labov, 1994)
  - Some exceptions: threat, dread, ...
- Question: What was the time-course of the merger?
- We know ∗e and ∗i were largely distinct in 1600 (e.g. Gill, 1619; Lass, 1992) and merged today.
- Strategy: Use the 1600 and 1900 graphs as endpoints; compare the relative likelihood of the hypotheses that a given poet's rhymes are consistent with each graph.
- We already ≈ know the answer, but can we get it using just rhyme graphs?
Background: Bayesian inference
- Statistical approach: knowledge about the variable θ of interest is probabilistic.
- Combine prior knowledge about θ with observation of data.
- From
  1. Prior distribution P(θ)
  2. Data x
  3. Conditional distribution P(x | θ)
  get the posterior distribution:

      P(θ | x) = P(x | θ) P(θ) / P(x) = P(x | θ) P(θ) / ∫ P(x | θ′) P(θ′) dθ′
Bayes factors
- Hypothesis test.
- Hypotheses H1 & H2 define priors P(θ | H1), P(θ | H2).
- The Bayes factor of the data is its relative probability under H1 & H2:

      BF(x; H1, H2) = P(x | H1) / P(x | H2) = ∫ P(x | θ) P(θ | H1) dθ / ∫ P(x | θ) P(θ | H2) dθ
Methodology
- BF for 19 poets (1k–5k rhymes) born 1536–c. 1890.
- For poet i, take the PDE /i:/ words subgraph:
  - Data: xi = (xi,1, . . . , xi,6), where xi,j = # observed type-j rhymes for poet i.
  - Parameters: θi = (θi,1, . . . , θi,6), where θi,j = underlying fraction of type-j rhymes for poet i (so θi,1 + · · · + θi,6 = 1).
  - Types j = 1, . . . , 6: nuclei × codas:
    - Nuclei: {∗e/∗e, ∗i/∗i, ∗e/∗i}
    - Codas: {Same, Diff}
- Given θi, xi is multinomially distributed (see the sketch below), i.e.

      P(xi | θi) = (ni choose xi,1, . . . , xi,6) · θi,1^xi,1 · · · θi,6^xi,6,  with ni = xi,1 + · · · + xi,6.

- Intuitively:
  - There are ni rhymes.
  - θi,j: prob. that any one rhyme is type j.
  - ⇒ prob. of choosing xi,1 of type 1, . . . , xi,6 of type 6.
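A minimal Python sketch of this likelihood for one poet's six rhyme-type counts; the counts and probabilities in the example are hypothetical:

```python
# Minimal sketch: multinomial log-likelihood log P(x_i | theta_i) for one
# poet's six rhyme-type counts, using log-gamma for numerical stability.
from math import lgamma, log

def multinomial_loglik(x, theta):
    """x: counts (xi,1..xi,6); theta: type probabilities summing to 1."""
    n = sum(x)
    loglik = lgamma(n + 1) - sum(lgamma(xj + 1) for xj in x)   # log (ni choose xi,1..xi,6)
    loglik += sum(xj * log(tj) for xj, tj in zip(x, theta) if xj > 0)
    return loglik

# Hypothetical poet: mostly *e/*e and *i/*i rhymes, a few *e/*i, same codas
print(multinomial_loglik([100, 300, 20, 0, 0, 0],
                         [0.24, 0.70, 0.05, 0.004, 0.003, 0.003]))
```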
Priors
- Data vectors for the 1600 and 1900 /i:/ graphs:

      x1600 = (148, 691, 8, 2, 3, 1)
      x1900 = (132, 503, 320, 4, 1, 4)

- Translate these to priors P(θ | 1600), P(θ | 1900).
- Dirichlet priors: easiest for multinomial data (Gelman et al., 2004).
- A weighting factor is included, varying the relative importance of data vs. prior (see the sketch below).
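With Dirichlet priors and a multinomial likelihood, each marginal likelihood P(x | H) in the Bayes factor has the closed Dirichlet-multinomial form, so the poet-by-poet BF can be computed directly. The following is a sketch under stated assumptions: the weighting factor w and the +1 offset are hypothetical stand-ins for the talk's actual weighting scheme, and the example poet's counts are invented.

```python
# Sketch: Bayes factor of one poet's counts under Dirichlet priors built from
# the 1600 and 1900 count vectors. The weighting w (and the +1 offset that
# keeps the Dirichlet parameters positive) are hypothetical.
from math import exp, lgamma

def log_marginal(x, alpha):
    """log of the Dirichlet-multinomial marginal likelihood, dropping the
    multinomial coefficient (it cancels in the Bayes factor)."""
    n, a_sum = sum(x), sum(alpha)
    out = lgamma(a_sum) - lgamma(a_sum + n)
    for xj, aj in zip(x, alpha):
        out += lgamma(aj + xj) - lgamma(aj)
    return out

x_1600 = [148, 691, 8, 2, 3, 1]      # 1600 /i:/ graph counts (from the slide)
x_1900 = [132, 503, 320, 4, 1, 4]    # 1900 /i:/ graph counts (from the slide)
w = 0.1                              # hypothetical data/prior weighting factor

def bayes_factor_1600_vs_1900(x_poet):
    alpha_1600 = [w * c + 1 for c in x_1600]
    alpha_1900 = [w * c + 1 for c in x_1900]
    return exp(log_marginal(x_poet, alpha_1600) - log_marginal(x_poet, alpha_1900))

# Hypothetical poet with almost no *e/*i rhymes: BF >> 1 (looks pre-merger)
print(bayes_factor_1600_vs_1900([60, 310, 4, 1, 0, 0]))
```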
Results
- BF for 1600 vs. 1900 for each poet: [figure; y-axis on log scale]
- BF < 1 after ≈ birth year 1700 ⇒ the distinction is disappearing in poetry written around 1725.
- Lass (1992):
  - "The change-over begins in the late 17th century."
  - By ≈ 1750 distinct ∗e & ∗i are rare, seen as rustic.
Synchronic application: Graph structure correlates of similarity
- The corpus rhyme graph is made up of components.
- Which contain all identical rhyme stems, which mostly identical, which other (spelling rhymes) ... ?
- Hypothesis: Component structure is correlated with the degree of the vertices' (phonological) similarity.
Component goodness
- For a graph G with components C1, . . . , Cn, need a measure of the similarity of the rhyme stems (RS) of a component's vertices. Want:
  - Best: all vertices have the same RS.
  - Worst: all vertices have different RS.
  - A nonlinear measure: two distinct RS much worse than 1, slightly better than 3 ...
- Simple: If the words in C have k distinct RS's, component goodness is γ(C) = 1/k (see the sketch below).
- Similarity measure is binary; no segmental information.
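A minimal sketch of γ(C), assuming a hypothetical rhyme_stem() lookup (for the 1900 subgraph the talk takes rhyme stems from CELEX):

```python
# Sketch: component goodness gamma(C) = 1/k, where k is the number of distinct
# rhyme stems among the component's words. rhyme_stem() is a hypothetical
# lookup function (e.g. backed by a pronouncing dictionary such as CELEX).
def component_goodness(component_words, rhyme_stem):
    distinct_stems = {rhyme_stem(word) for word in component_words}
    return 1.0 / len(distinct_stems)

# Toy example: "sea"/"free"/"tea" share a stem; "obey" does not
stems = {"sea": "i:", "free": "i:", "tea": "i:", "obey": "eI"}
print(component_goodness({"sea", "free", "tea"}, stems.get))          # 1.0
print(component_goodness({"sea", "free", "tea", "obey"}, stems.get))  # 0.5
```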
Graph structure variables (GSVs)
- A GSV is a quantitative property of the graph correlated with goodness.
- Hypothesized correlates of goodness, with corresponding GSVs (let n = |V(C)|, m = |E(C)|, N = total rhymes; see the sketch after this list):
- Connectedness: "more connected" components are less likely to contain half-rhymes. ("If a rhyme can occur, eventually it will.") GSVs:
  - Normalized maximum clique size: the size of the largest set of words ∈ C for which all rhymes are observed (a clique), divided by the maximum possible.
  - Edge ratio: ratio of observed to possible edges.
  - Diameter: maximum shortest-path distance between words (length = # edges).
- Existence of a good partition: a component made up of two internally well-connected pieces with few edges between them ⇒ half-rhyme. GSV:
  - Eigenvalue gap: a measure of how easily V can be cut in two, ∈ [0, 1], higher = more cuttable (1 − λ2, where λ2 is the second-largest eigenvalue of the Laplacian of G).
- Size: bigger components are more likely to contain a half-rhyme. GSVs: log(n), log(# rhymes).
- Other GSVs:
  - Rhymes per vertex
  - Maximally-connected vertex: maximum of degree(v)/(n − 1).
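A minimal sketch of several of these GSVs for one component C (a weighted networkx graph as built earlier, with ≥ 2 words). The spectral quantity here is 1 − λ2 of the normalized Laplacian, which behaves as described above (higher = more easily cut); the talk's exact normalization may differ.

```python
# Sketch of several graph-structure variables for one component C.
import math
import networkx as nx
import numpy as np

def gsvs(C):
    n, m = C.number_of_nodes(), C.number_of_edges()
    total_rhymes = sum(d["weight"] for _, _, d in C.edges(data=True))
    max_clique = max(len(c) for c in nx.find_cliques(C))
    # Unweighted normalized Laplacian; eigenvalues sorted ascending, lam[0] = 0
    lam = np.sort(np.linalg.eigvalsh(
        nx.normalized_laplacian_matrix(C, weight=None).toarray()))
    return {
        "max_clique_normalized": max_clique / n,
        "edge_ratio": m / (n * (n - 1) / 2),
        "diameter": nx.diameter(C),                      # longest shortest path
        "log_size": math.log(n),
        "log_rhymes": math.log(total_rhymes),
        "eigenvalue_gap": 1 - lam[1],                    # see hedge above
        "max_degree_normalized": max(d for _, d in C.degree()) / (n - 1),
        "rhymes_per_vertex": total_rhymes / n,
    }
```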
Results: GSV/Component goodness correlations
- Correlation coefficient r for each GSV with γ(C).
- 1900 subgraph, components with ≥ 4 words (135 components); rhyme stems from CELEX.

  Variable         r        p(r)
  max clq nzd      0.67     0
  edge rat         0.67     0
  diameter        -0.56     0
  log(size)       -0.69     0
  log(rhymes)     -0.65     0
  ev gap          -0.48     < 10⁻³
  maxdegs nzd      0.55     0
  rhy per vert    -0.41     < 10⁻²
- All correlations highly significant.
- Covariance is high for GSVs with the same motivation.
- Connectedness & size are the best predictors of component goodness.
- Eigenvalue gap does v. well considering that "existence of a good partition" is ≈ binary.
- Overall: Component graph structure reflects the similarity of rhyme stems.
Discussion
- Rhyme graphs give linguistic evidence at a different level than individual rhymes: the ensemble.
- Accessible using computational methods and graphical representation.
- Rhyme graph structure reflects:
  - Change: similarity to pre- vs. post-merger rhymes → merger trajectory.
  - Similarity: component structure ↔ (phonological) similarity of the included words.
- Potential applications:
  - Reconstruction for languages with less data, non-phonetic.
  - Graph spectrum → can say what the good partition is; reconstruct pre-merger classes.
  - (Synchronic) phonological significance of components?
- To do:
  - Everything here is very simplified.
  - Binary similarity → phonetic similarity.
  - Better statistical models: individual components instead of the whole graph; reconstruct mergers in different phonetic environments.
  - Consider spelling.
Thanks!