Rhyme graphs, sound change, and perceptual similarity

Morgan Sonderegger
University of Chicago
LSA 83rd Annual Meeting
January 9, 2009
Outline:
- Introduction
- Corpus
- Graphs
- Applications: diachronic, synchronic
- Discussion
Introduction

- Half vs. full rhymes.
- Half-rhymes (HRs) in (synchronic) phonology:
  - Poetry: Jakobson (1960), Holtman (1996), Hanson (2002), Steriade (2003), ...
  - Lyrics: Zwicky (1976), Kawahara (2007), Katz (2008)
  - HRs reflect similarity; recent interest (Steriade, 2001; Frisch et al., 2004)
  - Q: Focus on sequences (≥ 2) of half-rhymes... only part of the picture?
- Historical phonology:
  - Rhymes very important in reconstruction (other sources have problems).
  - HRs in English verse, 1500s–present (Wyld, 1923)
  - Problem: any one rhyme is unreliable; corroboration is needed.
- Today: the higher-level structure of a rhyme corpus – its rhyme graph.
  - Qualitatively different from rhyming sequences.
- Claim: rhyme graph structure reflects phonological change and similarity.
- Rhyme graphs have been mentioned once before: J. Joyce (1979). "Re-Weaving the Word-Web: Graph Theory and Rhymes." BLS 5.
Example (c. 1700)

[Figure: an example rhyme graph, c. 1700]
Rhyme corpus

- 52k rhymes, English poetry; British (mostly English) poets born ≈ 1555–1890.
- [Figure: timeline of poets by approximate period]
  - 1600: Jonson, Shakespeare, Drayton, 6 minor poets
  - 1650: Milton, Lovelace
  - 1700: Pope, Swift, Dryden, A. Finch, M. Prior
  - 1750: Goldsmith, Turner
  - 1800: Wordsworth, Shelley, Byron
  - 1900: Brooke, Chesterton, Housman, Kipling, others
- Subcorpora: 1600, 1650, 1700, 1750, 1800, 1900
- 1900: assume pronunciation known, ≈ current British (CELEX).
- Rhyme domain: the primary-stressed nucleus and all following segments of the last word of a line.
Graphs

- A graph G consists of:
  1. vertices V(G)
  2. edges E(G)
  3. (here) weights w(e), for e ∈ E(G)
- Connected component: a "piece" of G.
- Degree of v, d(v): number of neighbors of v.
- Path between v, w ∈ V(G): a set of edges connecting v and w.
Rhyme graphs

- Given a corpus of rhymes:
  - vertices ⇐⇒ words
  - edge (v, w) ⇐⇒ "v/w rhyme observed"
  - weight ⇐⇒ number of times the rhyme is observed
- Rhyme data parsed using Perl scripts; diagrams produced with GraphViz.¹

¹ http://www.research.att.com/sw/tools/graphviz/
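As a minimal illustration of this construction, a sketch in Python/networkx (rather than the Perl/GraphViz pipeline above); the toy rhyme pairs are made up:

    import networkx as nx

    def build_rhyme_graph(rhyme_pairs):
        """Vertices = words; edge (v, w) = 'v/w rhyme observed';
        edge weight = number of times that rhyme is observed."""
        G = nx.Graph()
        for v, w in rhyme_pairs:
            if G.has_edge(v, w):
                G[v][w]["weight"] += 1
            else:
                G.add_edge(v, w, weight=1)
        return G

    # Hypothetical toy data, not from the corpus:
    pairs = [("meet", "complete"), ("meet", "street"),
             ("complete", "street"), ("meat", "great")]
    G = build_rhyme_graph(pairs)
    components = list(nx.connected_components(G))  # the "pieces" of G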
Example components

[Figure: components from the 1700 corpus; colors = current pronunciations of final nuclei]
[Figure: components from the 1900 corpus]
Diachronic application: the meat/meet merger

- Present-day English (PDE) /i:/ is (roughly) a merger of Late Middle English (LME) /e:/ and /i:/.
- Whether a PDE /i:/ word was *e or *i (usually) follows from current spelling:
  - *e: meat, complete, receive
  - *i: meet, believe, me
- Abstracting away from other issues:
  - roughly concurrent meat/mate (near-)merger (Labov, 1994)
  - some exceptions: threat, dread, ...
- Question: what was the time course of the merger?
- We know *e and *i were largely distinct in 1600 (e.g. Gill, 1619; Lass, 1992) and are merged today.
- Strategy: use the 1600 and 1900 graphs as endpoints; compare the relative likelihood of the hypotheses that a given poet's rhymes are consistent with each graph.
- We already ≈ know the answer, but can we get it using just rhyme graphs?
Background: Bayesian inference

- Statistical approach: knowledge about the variable of interest θ is probabilistic.
- Combine prior knowledge about θ with observation of data.
- From
  1. a prior distribution P(θ)
  2. data x
  3. a conditional distribution P(x | θ)
  we get the posterior distribution:

    P(θ | x) = P(x | θ) P(θ) / P(x) = P(x | θ) P(θ) / ∫ P(x | θ′) P(θ′) dθ′
Bayes factors

- A hypothesis test.
- Hypotheses H1 & H2 define priors P(θ | H1), P(θ | H2).
- The Bayes factor of the data is its relative probability under H1 & H2:

    BF(x; H1, H2) = P(x | H1) / P(x | H2)
                  = ∫ P(x | θ) P(θ | H1) dθ / ∫ P(x | θ) P(θ | H2) dθ
Methodology

- BF for 19 poets (1k–5k rhymes each) born 1536–c. 1890.
- For poet i, take the subgraph of PDE /i:/ words:
  - Data: x_i = (x_{i,1}, ..., x_{i,6}), where x_{i,j} = number of observed type-j rhymes for poet i.
  - Parameters: θ_i = (θ_{i,1}, ..., θ_{i,6}), where θ_{i,j} = underlying fraction of type-j rhymes for poet i.²
  - Rhyme types j = 1, ..., 6 are nuclei × codas:
    - Nuclei: {*e/*e, *i/*i, *e/*i}
    - Codas: {Same, Diff}

² So θ_{i,1} + · · · + θ_{i,6} = 1.
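As an illustration, a sketch of how a rhyme might be binned into one of the six types; the lookup tables PREMERGER_CLASS and CODA and the particular numbering of the types are hypothetical, chosen only for this example (in the slides, the *e/*i class usually follows from spelling):

    # Hypothetical lookup tables: LME class of each PDE /i:/ word
    # (usually recoverable from spelling) and the coda of its rhyme stem.
    PREMERGER_CLASS = {"meat": "*e", "complete": "*e", "meet": "*i", "believe": "*i"}
    CODA = {"meat": "t", "complete": "t", "meet": "t", "believe": "v"}

    # One possible ordering of the six types:
    # nuclei {*e/*e, *i/*i, *e/*i} x codas {Same, Diff}
    NUCLEUS_PAIRS = [("*e", "*e"), ("*i", "*i"), ("*e", "*i")]

    def rhyme_type(v, w):
        """Return j in 1..6 for the rhyme (v, w)."""
        pair = tuple(sorted((PREMERGER_CLASS[v], PREMERGER_CLASS[w])))
        nucleus_index = NUCLEUS_PAIRS.index(pair)      # 0, 1, or 2
        same_coda = (CODA[v] == CODA[w])               # Same vs. Diff coda
        return 2 * nucleus_index + (1 if same_coda else 2)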
- Given θ_i, x_i is multinomially distributed, i.e.

    P(x_i | θ_i) = [n_i! / (x_{i,1}! · · · x_{i,6}!)] ∏_{j=1}^{6} θ_{i,j}^{x_{i,j}},

  with n_i = x_{i,1} + · · · + x_{i,6}.
- Intuitively:
  - there are n_i rhymes;
  - θ_{i,j} is the probability that any one rhyme is of type j;
  - ⇒ the probability of choosing x_{i,1} rhymes of type 1, ..., x_{i,6} of type 6.
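A quick numerical check of this multinomial likelihood, with made-up θ and counts (not corpus values):

    from scipy.stats import multinomial

    theta = [0.15, 0.70, 0.05, 0.04, 0.03, 0.03]   # hypothetical theta_i (sums to 1)
    x = [14, 72, 4, 5, 3, 2]                       # hypothetical counts x_i
    n = sum(x)                                     # n_i
    likelihood = multinomial.pmf(x, n=n, p=theta)  # P(x_i | theta_i)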
Priors

- Data vectors for the 1600 and 1900 /i:/ graphs:

    x_1600 = (148, 691, 8, 2, 3, 1)
    x_1900 = (132, 503, 320, 4, 1, 4)

- These are translated into priors P(θ | 1600), P(θ | 1900).
- Dirichlet priors: the easiest choice for multinomial data (Gelman et al., 2004).
- A weighting factor is included, which varies the relative importance of data vs. prior.
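As a sketch of the resulting Bayes factor computation: it assumes Dirichlet priors whose parameters are the 1600/1900 count vectors scaled by a weighting factor (the `weight` value and the +1 pseudocount below are assumptions, since the exact weighting is not specified here), and uses the closed-form Dirichlet-multinomial marginal likelihood:

    import numpy as np
    from scipy.special import gammaln

    def log_marginal(x, alpha):
        """Log Dirichlet-multinomial marginal likelihood, log P(x | alpha):
        the multinomial likelihood integrated over a Dirichlet(alpha) prior."""
        x, alpha = np.asarray(x, float), np.asarray(alpha, float)
        n = x.sum()
        log_coef = gammaln(n + 1) - gammaln(x + 1).sum()
        return (log_coef
                + gammaln(alpha.sum()) - gammaln(alpha).sum()
                + gammaln(alpha + x).sum() - gammaln(alpha.sum() + n))

    # Dirichlet priors built from the 1600 and 1900 data vectors; `weight` and
    # the +1 pseudocount stand in for the weighting factor mentioned above.
    x_1600 = np.array([148, 691, 8, 2, 3, 1])
    x_1900 = np.array([132, 503, 320, 4, 1, 4])
    weight = 0.1
    alpha_1600 = 1 + weight * x_1600
    alpha_1900 = 1 + weight * x_1900

    def log_bayes_factor(x_poet):
        """log BF(x; 1600, 1900) = log P(x | 1600) - log P(x | 1900)."""
        return log_marginal(x_poet, alpha_1600) - log_marginal(x_poet, alpha_1900)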
Results

- [Figure: BF (1600 vs. 1900) for each poet, plotted by birth year; y-axis on log scale]
- BF < 1 for poets born after ≈ 1700 ⇒ the distinction is disappearing from poetry written around 1725.
- Lass (1992):
  - "The change-over begins in the late 17th century."
  - By ≈ 1750, distinct *e & *i are rare and perceived as rustic.
Synchronic application: graph structure correlates of similarity

- The corpus rhyme graph is made up of components.
- Which components contain all-identical rhyme stems, mostly identical ones, or something else (spelling rhymes, ...)?
- Hypothesis: component structure is correlated with the degree of (phonological) similarity among its vertices.
Component goodness

- For a graph G with components C_1, ..., C_n, we need a measure of the similarity of the rhyme stems (RSs) of a component's vertices. Desiderata:
  - best case: all vertices have the same RS;
  - worst case: all vertices have different RSs;
  - nonlinear measure: two distinct RSs is much worse than one, and only slightly better than three...
- Simple measure: if the words in C have k distinct RSs, component goodness is γ(C) = 1/k.
- The similarity measure is binary; no segmental information is used.
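As an illustration of γ(C); the rhyme_stem lookup is a hypothetical stand-in for the CELEX rhyme stems used for the 1900 subgraph:

    def component_goodness(component_words, rhyme_stem):
        """gamma(C) = 1/k, where k = number of distinct rhyme stems
        among the words of component C."""
        k = len({rhyme_stem[w] for w in component_words})
        return 1.0 / k

    # Hypothetical rhyme stems (CELEX transcriptions would be used in practice):
    rhyme_stem = {"meet": "i:t", "complete": "i:t", "street": "i:t", "great": "eIt"}
    component_goodness({"meet", "complete", "street"}, rhyme_stem)  # -> 1.0
    component_goodness({"meet", "great"}, rhyme_stem)               # -> 0.5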
Graph structure variables (GSVs)

- A GSV is a quantitative property of a graph hypothesized to correlate with goodness.
- Hypothesized correlates of goodness and their corresponding GSVs (let n = |V(C)|, m = |E(C)|, N = total rhymes); a code sketch of several GSVs follows this list.
  - Connectedness: "more connected" components are less likely to contain half-rhymes ("if a rhyme can occur, eventually it will"). GSVs:
    - Normalized maximum clique size: the size of the largest set of words in C for which all rhymes are observed (a clique), divided by the maximum possible.
    - Edge ratio: ratio of observed to possible edges.
    - Diameter: maximum shortest-path distance between words (length = # of edges).
  - Existence of a good partition: a component made up of two internally well-connected pieces with few edges between them ⇒ half-rhyme. GSV: eigenvalue gap, a measure of how easily V can be cut in two, ∈ [0, 1]; higher = more cuttable.³
  - Size: bigger components are more likely to contain a half-rhyme. GSVs: log(n), log(# rhymes).
  - Other GSVs: rhymes per vertex; maximally-connected vertex, the maximum of degree(v)/(n − 1).

³ 1 − λ₂, where λ₂ is the second-largest eigenvalue of the Laplacian of G.
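As a sketch of several GSVs for one component (networkx/numpy). Edge weights default to 1 if absent; normalizing the maximum clique size by n, and reading footnote 3's λ₂ as the second-smallest eigenvalue of the normalized Laplacian, are both assumptions made only for this sketch:

    import networkx as nx
    import numpy as np

    def gsvs(C):
        """A few graph structure variables for a connected component C (networkx Graph)."""
        n, m = C.number_of_nodes(), C.number_of_edges()
        rhymes = sum(d.get("weight", 1) for _, _, d in C.edges(data=True))
        out = {
            "max_clique_nzd": max(len(c) for c in nx.find_cliques(C)) / n,
            "edge_ratio": m / (n * (n - 1) / 2),        # observed / possible edges
            "diameter": nx.diameter(C),                 # longest shortest path
            "log_size": np.log(n),
            "log_rhymes": np.log(rhymes),
            "max_degree_nzd": max(d for _, d in C.degree()) / (n - 1),
            "rhymes_per_vertex": rhymes / n,
        }
        # Eigenvalue gap (assumption: lambda_2 = second-smallest eigenvalue of the
        # normalized Laplacian, so values near 1 mean C is easily cut in two).
        L = nx.normalized_laplacian_matrix(C).toarray()
        out["ev_gap"] = 1 - np.sort(np.linalg.eigvalsh(L))[1]
        return out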
Results: GSV/component-goodness correlations

- Correlation coefficient r for each GSV with γ(C).
- 1900 subgraph, components with ≥ 4 words (135 components); rhyme stems from CELEX.

    Variable        r       p(r)
    max clq nzd     0.67    0
    edge rat        0.67    0
    diameter       -0.56    0
    log(size)      -0.69    0
    log(rhymes)    -0.65    0
    ev gap         -0.48    < 10⁻³
    maxdegs nzd     0.55    0
    rhy per vert   -0.41    < 10⁻²
- All correlations are highly significant.
- Covariance is high for GSVs with the same motivation.
- Connectedness & size are the best predictors of component goodness.
- The eigenvalue gap does very well, considering that "existence of a good partition" is ≈ binary.
- Overall: component graph structure reflects the similarity of rhyme stems.
Discussion

- Rhyme graphs give linguistic evidence at a different level than individual rhymes: the ensemble.
- This evidence is accessible using computational methods and graphical representation.
- Rhyme graph structure reflects:
  - Change: similarity to pre- vs. post-merger rhymes → merger trajectory.
  - Similarity: component structure ↔ (phonological) similarity of the included words.
- Potential applications:
  - reconstruction for languages with less data, non-phonetic writing;
  - graph spectrum → can say what a good partition is, reconstruct pre-merger classes;
  - (synchronic) phonological significance of components?
- To do:
  - Everything here is very simplified.
  - Binary similarity → phonetic similarity.
  - Better statistical models: individual components instead of the whole graph; reconstruct mergers in different phonetic environments.
  - Consider spelling.
Thanks!