The Sounds of the Psalter: Computational Analysis of Soundplay
Drayton C. Benner
University of Chicago, Chicago, IL, USA
Correspondence: University of Chicago, Near Eastern Languages and Civilizations, 1155 East 58th Street, Chicago IL 60637, USA.
E-mail: [email protected]
Abstract
This article presents computational techniques for analyzing soundplay in a corpus and applies them to a corpus of Biblical Hebrew poetry, namely, the Book of Psalms. Evidence is presented to show that there is soundplay in the Book of Psalms, and computational techniques are presented to evaluate a poetic passage proposed by a scholar as having soundplay. That is, the computational techniques, though not definitive, help to distinguish between artistic soundplay and the results of chance and a limited phonemic inventory. In addition, visualization tools are presented to aid the researcher in finding soundplay in a corpus.
1 Introduction
In discussions of poetic alliteration and other
soundplay, intuition has usually been the sole
guide. Soundplay is sometimes sufficiently striking
that an author’s artistry is apparent, but other times,
poets are more subtle. With a limited phonemic
inventory, clusters of particular sounds are inevitable in a passage of poetry. How can a scholar
decide whether a cluster of sounds in a poetic passage is artistic or merely the result of chance?
Intuition alone is an insufficient guide to problems
involving complex probability and statistics.
Computational techniques can aid the critical eye
and ear of a reader. These computational techniques
do not replace the human critic, but they are tools
that can improve the scholar’s reading of poetry.
This article presents computational techniques
for analyzing soundplay in a corpus and applies them
to a corpus of Biblical Hebrew poetry, namely, the
Book of Psalms. There are three major goals of the
computational techniques presented here. First,
when a scholar has posited that there is artistic
soundplay in a poetic passage, computational techniques ought to be capable of assessing the plausibility of the scholar’s assertion. They may not decide
the issue definitively, but they can provide independent support for a scholarly proposal. The computational techniques involved need to be firmly
founded in mathematical theory and make use of
as much data as possible in making that determination. Second, computational techniques ought to
assist the scholar in finding statistically anomalous
uses of sound in a poetic corpus so as to present
possible instances of soundplay to the scholar.
Third, computational techniques should be enlisted
to determine whether a poetic corpus contains
soundplay at all.
Following a discussion of past scholarly attempts
to quantify alliteration in poetry and an identification of the source of data used along with how it
was segmented, the computational approaches
adopted here are presented in three major sections,
one for each of the three main goals listed above.
Literary and Linguistic Computing, Vol. 29, No. 3, 2014. © The Author 2014. Published by Oxford University Press on behalf of EADH. All rights reserved. For Permissions, please email: [email protected]
doi:10.1093/llc/fqu024
Advance Access published on 14 June 2014
Finally, one poetic passage in the psalms containing
soundplay is briefly discussed.
2 Previous Attempts to Quantify
Alliteration in Poetry
Attempts at quantifying alliteration in ancient
Northwest Semitic poetry have been non-existent,
with almost all of the work on identifying alliteration being subjective and intuitive.1 However, there
is a tradition of quantifying alliteration in other
poetic corpora, especially English (of various time
periods) and German poetry. This tradition has
used valuable concepts and has sometimes been at
least partially grounded in mathematical theory,
though none of the approaches has been completely
satisfactory.
The renowned psychologist B. F. Skinner wrote
two articles on quantifying alliteration (Skinner,
1939, 1941). In his first article, Skinner examines
Shakespeare’s sonnets. He restricts the consonants
under consideration to important syllable-initial
consonants, where a determination of importance
involves a consideration of stress, whether the
word is a content word, and subjective factors. For
Skinner, alliteration consists of the repetition of a
particular phoneme in a single line. Though he does
not use the term, his reliance on the line introduces
the useful concept of a window, a contiguous
sequence of words in the poem. Skinner uses the
binomial distribution to calculate the expected
number of lines with a given number of occurrences
of any phoneme. In his second article, Skinner includes poetry by Algernon Swinburne and examines
windows of 2–10 syllables, regardless of line breaks.
He also considers sets of multiple phonemes as well
as single phonemes. Finally, he produces a measure
of alliteration in a corpus by returning to his technique from his first article, essentially treating lines
that are alliterative due to repeated words as one-third as important as other alliterative lines. He
finds no evidence of alliteration in Shakespeare
but does find Swinburne alliterative.
Skinner’s studies are, in many ways, fine studies.
The window is a useful concept in delineating the
bounds of a passage that might contain alliteration,
and his approach is rightly grounded in probability
theory. That said, there are weaknesses. First, is alliteration in Shakespeare really limited to a small
handful of the consonants (cf. Wright, 1974)?
Would it not be better to treat certain consonantal
phonemes as more important than others rather
than excluding some altogether? Second, Skinner
does not know how to treat repeated words responsibly, an issue that has plagued computational studies of alliteration. Third, Skinner’s first study limits
itself to windows of precisely one line in length. Was
Shakespeare constrained to a single line in producing alliteration (cf. Wright, 1974)? In his second
study, why did he only examine the first and last
syllables in his new windows rather than every
phoneme in those windows?
The later work of Karl Magnuson and David
Chisholm on German verse incorporated many of
Skinner’s insights but also many of the weaknesses
of his approach (Magnuson, 1962, 1966; Chisholm,
1976, 1981). Direct challenges to Skinner’s work,
however, fared worse. Elizabeth Jackson compares
the proportion of lines that are alliterative by different poets, wherein a line is considered alliterative if
at least two consonantal phonemes of important
syllables are identical (Jackson, 1942). Jackson
dismantles Skinner’s mathematical foundation,
ignoring his use of the binomial distribution. She
does not even take into account differences in line
lengths, an issue even for the tiny samples of iambic
pentameter she uses. Finally, Jackson dismisses
Skinner’s legitimately uneasy conscience concerning
how he was handling repeated words. N. B. Wright’s
later response to Skinner correctly noted some of
the problems with Skinner’s studies, but he, like
Jackson, does not recognize how having repeated
words complicates matters and explicitly bases his
analysis on English orthography rather than phonology (Wright, 1974).
A variety of independent attempts to quantify alliteration in poetry have been relatively unsuccessful. Richard Bailey tries a variety of techniques for quantifying alliteration in poetry and comparing poetic texts to prose texts, but by his own admission, he has no success (Bailey, 1971). Jay Leavitt proposes a technique by which to rank texts relative to one another based on their use of alliteration, but there is no theoretical basis for his measure of alliteration, and his samples are far too short to be useful, each only 300–500 phonemes in length (Leavitt, 1976).
Marc Plamondon has perhaps worked on quantifying sound patterning in poetry more persistently than anyone else (Plamondon, 2001, 2005, 2009). Plamondon maintains sensibly that a phoneme leaves an impression on the hearer, and the impression of this phoneme decreases over time. He posits that the effect of repeating a particular phoneme is additive. This approach yields a graph of the phonemic accumulations for a phoneme or set of phonemes. He then runs a Discrete Fourier Transform on it to try to distinguish the signal from the noise. He can also plot the ratio of the results of the Fourier Transform between two sets of phonemes (e.g. stops and fricatives) (Plamondon, 2009).
Plamondon's techniques will likely be helpful, but I sought other techniques for identifying a passage that contains alliteration for three reasons. First, there is no theoretical justification for Plamondon's function for how the impression of a phoneme declines over time, nor is there theoretical or empirical justification for treating the effect of phonemes as additive. Second, while phonemes could theoretically be weighted differently in Plamondon's system, he does not do so. The final issue for my purposes is that Plamondon's system is designed to identify a particular moment in the poem wherein the effect of alliteration on a hearer is strongest. A poet, however, uses alliteration over a particular poetic passage. Techniques designed to identify passages containing alliteration will likely produce overlapping results with techniques designed to identify peak effects on a reader, but where the latter is successful, it is likely to identify a peak near the end of an alliterative section and provide less information concerning where the alliterative section began.
In sum, despite the positive characteristics of past work,2 there is a need for a new approach. Any technique for quantifying alliteration in poetry ought to be built from the base of a solid mathematical framework and use all of the data available, weighting different evidence appropriately. Much of the rest of this article presents such a technique.
3 Source of data
This project requires a morphologically tagged electronic edition of the Hebrew Bible. It uses the Westminster Leningrad Codex (WLC) and Westminster Hebrew Morphology (WHM), both version 4.14. WLC is a diplomatic edition of Codex Leningradensis, the oldest complete manuscript of the Hebrew Bible in the Tiberian tradition. That is, it follows Codex Leningradensis faithfully even when there is an occasional scribal error. WHM provides lemmas and morphology codes for each Hebrew segment. These have been developed since the 1980s by a variety of scholars and are currently maintained by the J. Alan Groves Center for Advanced Biblical Research. Though perfection is impossible to attain, particularly given the many legitimate scholarly disputes concerning how to analyze particular passages, these are mature sources of good data.
4 Segmenting the corpus
The primary corpus of study is the Hebrew Bible, and the language is Biblical Hebrew. From WLC, a few parts must be excised: principally the Aramaic parts of the Hebrew Bible but also some nonlinguistic symbols. In addition, WLC provides the text for the kethib and qere forms, the occasional instances in which the Massoretes preserved the consonants of their written tradition but provided the vowels to a different word that belonged to their oral tradition. To use both would be redundant, so I opted to follow the kethib text.
Working computationally with a corpus requires tokenizing, or segmenting, the corpus. Suppose we have a corpus C in language L consisting of a set of texts $T = \{t_1, t_2, \ldots, t_a\}$, which form a partition over the corpus. I divided the corpus into texts at book boundaries, psalm boundaries, and after the language shifts from Aramaic to Hebrew. Each text t contains a sequence of words $\{w_1, w_2, \ldots, w_b\}$. Word divisions are generally indicated orthographically by the presence of a space or a Massoretic maqqeph. However, following WHM, I allow a word to span across a space or maqqeph for certain
proper nouns. Each word w consists of a sequence of segments $\{s_1, s_2, \ldots, s_c\}$. For example, the word mhmlk ('from the king') consists of three segments corresponding to the lexemes mn ('from'), h ('the'), and mlk ('king'), but the word jktb ('he will write') consists of only one segment. Each segment s consists of a sequence of consonantal phonemes $\{p_1, p_2, \ldots, p_d\}$, where $p_i \in P$, the set of consonantal phonemes in language L. Matres lectionis (vowels indicated via consonant letters before the development of a full system for representing vowels in Hebrew) and Massoretic symbols were removed from WLC. I limit my investigation to the consonants on account of the difficulties in reconstructing the precise vowels of the biblical period, particularly in the Book of Psalms, an anthological corpus with compositions from many different centuries. For the consonants, I assume that the consonantal orthography represents the phonology well, with three exceptions: the representation of /ʕ/ and /ɣ/ by ע (ayin), the representation of /ħ/ and /x/ by ח (ḥet), and the occasional quiescence of א (aleph). I also assume that the Massoretes correctly distinguished between two phonemes in marking ש as שׁ (shin) and שׂ (sin). The articulation of ר (resh) is uncertain. Each token also has a part of speech and a lexeme, which are provided by WHM.
5 Measuring soundplay in a passage
The central issue in computationally validating a particular window (a sequence of contiguous words in a single text) as containing a cluster of soundplay is that one needs a way in which to determine whether a set of phonemes is overrepresented in a given passage. In particular, this determination needs to be made in a mathematically rigorous fashion.
5.1 Binomial distribution and its cdf
We want to know whether the number of phonemes in a window W belonging to a set of phonemes $F \subseteq P$, where P is the set of consonantal phonemes in the language, is abnormally high under the assumption that the consonantal phonemes in W are selected at random according to their distribution in corpus C. Toward this end, the binomial distribution provides a probability distribution of the number of phonemes in W that belong to F if each consonantal phoneme is chosen from P independently with some probability p, where p is the relative frequency in the corpus of the consonantal phonemes in F. Figure 1 shows p for each consonantal phoneme in Biblical Hebrew.
Fig. 1 Consonantal phoneme probabilities in the Hebrew Bible
From the cumulative distribution function (cdf) of the binomial distribution, for a given window containing n consonantal phonemes, the probability that it will contain at most k consonantal phonemes in F is:
$$\Pr(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1 - p)^{n - i}$$
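The cdf above can be evaluated directly for an unweighted window. The following is a minimal sketch rather than the article's implementation; the function name `binomial_cdf` and the window figures in the usage lines are illustrative assumptions.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical window (illustrative numbers, not taken from the article):
# 40 consonantal phonemes, 9 of which belong to F, where the phonemes
# of F make up 10% of the consonantal phonemes in the corpus.
value = binomial_cdf(k=9, n=40, p=0.10)
print(f"cdf value for the window: {value:.4f}")
```

A value close to 1 indicates that a window drawn at random from the corpus distribution would rarely contain more phonemes from F than this one does.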
5.2 Weighting the phonemes of a
segment
The foregoing use of the binomial distribution is
quite powerful and grounded nicely in mathematical
theory, but it assumes that all consonantal phonemes
are of equal importance to the poet in producing
soundplay, which is simplistic. Poets working
within a particular poetic tradition may make more
use of certain sounds than others. Consider an example of alliteration from Shakespeare:
From forth the fatal loins of these two foes
A pair of star-cross’d lovers take their life.3
In these two lines, /f/ appears five times and /l/ four
times, but in seven of the nine times it is the first
sound of the word, and in the remaining two cases it
is the final sound in the word. A model that
weighted all phonemes equally might miss this
pattern.
5.3 Weighting phonemes in Biblical
Hebrew
From my observations of Biblical Hebrew poetry,
the location of the phoneme within the word does
not appear to be significant. However, three factors
are important: the relative frequency of lexemes, the
repetition of lexemes, and the parts of speech of
segments. A list of nouns with the definite article
ה (h) does not represent a cluster of the ה (h) phoneme. By contrast, a rare lexeme may have been
chosen over a more common one precisely on account of its sound. Soundplay using content words
is more effective than that which uses function
words. The repetition of a particular lexeme does
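a(f, d) = [equation not recoverable from the source; surviving terms: 1, .2, 4f/d, 1.2]

$$b(pos) = \begin{cases} 1, & pos \in \{\text{verb, common noun, adjective, number}\} \\ 0.8, & pos = \text{pronoun} \\ 0.6, & pos = \text{particle} \\ 0.5, & pos = \text{proper or gentilic noun} \end{cases}$$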
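The part-of-speech weight b(pos) is a simple lookup, and the weight of a phoneme in this section is the product of the frequency weight, the part-of-speech weight, and the repetition weight. The sketch below illustrates that product; the label strings and function names are assumptions, and the values of a and c are passed in by the caller rather than computed here.

```python
# Part-of-speech weights b(pos) as given in the article. The label strings
# are hypothetical; WHM morphology codes would need to be mapped onto them.
POS_WEIGHTS = {
    "verb": 1.0,
    "common noun": 1.0,
    "adjective": 1.0,
    "number": 1.0,
    "pronoun": 0.8,
    "particle": 0.6,
    "proper noun": 0.5,
    "gentilic noun": 0.5,
}


def pos_weight(pos: str) -> float:
    """Return b(pos) for one of the part-of-speech labels above."""
    return POS_WEIGHTS[pos]


def segment_weight(a: float, pos: str, c: float) -> float:
    """Combine the three factors as w = a * b(pos) * c.

    `a` (frequency weight) and `c` (repetition weight) are supplied by the
    caller; both are expected to lie in [0, 1].
    """
    for name, factor in (("a", a), ("c", c)):
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {factor}")
    return a * pos_weight(pos) * c


# Illustrative values only: a particle with a = 0.5 and no repetition penalty.
print(segment_weight(a=0.5, pos="particle", c=1.0))
```

Because every factor lies in [0, 1], the product does as well, which matches the stated range of the weight.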
Figures 2 and 3 show the frequency weights a(f, d) for the lexemes of the Hebrew Bible.
Unlike the functions a and b, one can define c(r, m, f, d) in a manner that is theoretically grounded. The binomial distribution provides a probability distribution concerning the number of times we would expect a given lexeme to appear in a window containing m segments. If
When this calculation produces a value that is
greater than some threshold, it is consistent with
the hypothesis that the poet used the set of phonemes F in W for artistic effect.
not signal that the poet was specifically seeking to employ phonological parallelism, or soundplay, over against another type of parallelism. A clause such as pħd wpħt wpħ ʕlk ('terror, pit, and snare are upon you'; Isaiah 24:17; Jeremiah 48:43) is rhetorically effective on account of its use of different lexemes with similar phonemes.
Initially, when each consonantal phoneme was weighted equally, each consonantal phoneme counted as a single Bernoulli trial. However, if one weights the consonantal phonemes of each segment according to the relative frequency of their lexemes, their parts of speech, and the repetition of lexemes in the window, then each consonantal phoneme counts as a trial with weight in the range [0.0, 1.0]. It is calculated as follows:
$$w(f, d, pos, r, m) = a(f, d) \cdot b(pos) \cdot c(r, m, f, d),$$
where f is the number of occurrences of the lexeme in the corpus, d is the number of segments in the corpus, pos is the part of speech of the segment's lexeme, r is the number of times the lexeme associated with the segment appears in the window, and m is the window size in segments. There is no theoretical guidance for how to define the functions a and b, but they can be defined in an ad hoc but straightforward and reasonable manner as follows:
Fig. 2 Frequency weight a(f, d) for all lexemes in the Hebrew Bible
Fig. 3 Frequency weight a(f, d) of lexemes occurring 100 times in the Hebrew Bible
the cdf of the binomial distribution is very high, then the lexeme is overrepresented in the window, and the weight of each occurrence ought to be lower. Let $F(k; n, p)$ be the cdf of the binomial distribution, where k is the number of successes in n trials and each trial has a probability of success p. Let $F'(y; n, p)$ be the inverse cdf of the binomial distribution.
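Let
$$c(r, m, f, d) = \begin{cases} 1.0, & \text{if } F\left(r - 1;\ m - 1,\ \tfrac{f}{d}\right) \le 0.9 \\[1ex] \dfrac{\max\left(1,\ F'\left(0.45 + \tfrac{1}{2} F\left(r - 1;\ m - 1,\ \tfrac{f}{d}\right);\ m - 1,\ \tfrac{f}{d}\right)\right)}{r}, & \text{if } F\left(r - 1;\ m - 1,\ \tfrac{f}{d}\right) > 0.9 \end{cases}$$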
In words, when a lexeme appears in a window, if by
chance it would appear at least as many times
6 Discovering soundplay in a text
Three visualization tools to aid the researcher in
discovering soundplay will be presented. The first
two are independent of the computational techniques for discovering soundplay described above,
while the third makes full use of them. The third has
been the most productive.
6.1 Two preliminary visualizations
The first two visualization techniques both rely on
mapping from phonemes to color. This mapping is
designed such that similar-sounding phonemes will
have similar colors, as shown in Figure 4. The two
velar fricatives in Biblical Hebrew were not distinguished in the orthography from their pharyngeal
counterparts, so they are colored as though they
were pharyngeal.
6.1.1 Colored text
With this mapping from consonantal phoneme to
color in hand, it is possible to display the text of the
Hebrew Bible with each consonantal phoneme colored accordingly, as shown in Figure 5. As this visualization tool was written in XHTML and JavaScript
and none of the major web browsers could handle
the placement of vowels and cantillation marks
Fig. 4 Mapping from Hebrew consonantal phonemes to colors
elsewhere in the window as it actually does at
least one-tenth of the time, then the repetition
weight is 1; it receives full weight. However, if the
lexeme appears more times than that in the rest of
the window, then the weight is scaled down. The
function c is defined in such a way that it is
continuous.
Without weighting phonemes, a window contained k successes in n Bernoulli trials, where $k, n \in \mathbb{Z}$, so the discrete binomial distribution was appropriate. Once phonemes are weighted differently, $k, n \in \mathbb{R}$, so a continuous distribution is required. The cdf of the binomial distribution can be represented in terms of the continuous regularized incomplete beta function, so that representation is used.
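One way to realize this continuous extension is the standard identity $F(k; n, p) = I_{1-p}(n - k, k + 1)$, where $I_x(a, b)$ is the regularized incomplete beta function. The article does not spell out its exact formulation, so the sketch below assumes that identity. It evaluates $I_x(a, b)$ with a textbook continued-fraction expansion using only the standard library; all function names are hypothetical.

```python
from math import comb, exp, lgamma, log


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even-numbered term of the continued fraction.
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 + num * d
        c = 1.0 + num / c
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # Use the symmetry relation so the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def continuous_binomial_cdf(k: float, n: float, p: float) -> float:
    """F(k; n, p) = I_{1-p}(n - k, k + 1), defined for real 0 <= k < n."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return reg_inc_beta(n - k, k + 1.0, 1.0 - p)


def exact_binomial_cdf(k: int, n: int, p: float) -> float:
    """Discrete cdf, used only to check the continuous version at integers."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Weighted counts need not be integers (illustrative numbers only).
print(continuous_binomial_cdf(k=6.4, n=31.7, p=0.10))
```

At integer arguments the continuous version agrees with the discrete sum, which is what makes it a natural replacement once k and n become weighted, real-valued totals.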
Weighting the phonemes also affects the baseline probability of a success in the weighted trials. If each phoneme has a constant weight, this presents little challenge. However, the weights, as we have defined them, are dependent on the selection of the window because they take into account repetition of lexemes within the window. A window-independent weight is required for the phoneme. This is calculated by finding the average weight of the phoneme for all windows of which it is a part, only requiring that the window's length be in the range $[w_{min}, w_{max}]$ words. In all work described henceforth, $w_{min} = 5$ and $w_{max} = 25$.
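The averaging step can be sketched as an enumeration of every admissible window that contains a given word. This is an illustration under assumptions: positions are word indices within a single text, and `weight_in_window` is a hypothetical callback standing in for the window-dependent weight.

```python
from typing import Callable, Iterator, Tuple


def windows_containing(
    i: int, text_len: int, w_min: int = 5, w_max: int = 25
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) word-index pairs, end exclusive, for every window
    of w_min..w_max words within the text that contains word i."""
    for length in range(w_min, w_max + 1):
        first = max(0, i - length + 1)
        last = min(i, text_len - length)
        for start in range(first, last + 1):
            yield start, start + length


def average_weight(
    i: int,
    text_len: int,
    weight_in_window: Callable[[int, int, int], float],
    w_min: int = 5,
    w_max: int = 25,
) -> float:
    """Average a window-dependent weight over all windows containing word i."""
    total = 0.0
    count = 0
    for start, end in windows_containing(i, text_len, w_min, w_max):
        total += weight_in_window(i, start, end)
        count += 1
    if count == 0:
        raise ValueError("text is shorter than w_min words")
    return total / count


# Toy callback (an assumption for illustration): the weight shrinks as the
# window grows, so the average falls between the extremes.
print(average_weight(10, 60, lambda i, start, end: 5 / (end - start)))
```

Averaging once per corpus position keeps the baseline probability independent of whichever window is later tested.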
while having them a separate color from the consonant to which they are attached, the vowels and
cantillation marks surrounding a consonantal phoneme are colored. The chief benefit of this visualization tool is that it allows the researcher to read the
text at the same time as seeing the colors, performing a close reading of the text. The same is true of
other visualization tools that other scholars have
created to aid in the close reading of poetry with
particular attention to sound: AnalysePoems,
ProseVis, and PoemViewer (Plamondon, 2005;
Clement et al., 2013; Coles and Lein, 2013).
6.1.2 Coloring with one vertical strip per
phoneme
The second visualization tool allows the researcher to
view a large swath of text at one time. In this visualization tool, the phonemes themselves are not shown.
Rather, each phoneme is given a box that is only one
pixel wide and several pixels high. The box is filled
with the color corresponding to the phoneme.
Importantly, one can fill the boxes for any subset of
the phonemes while leaving the rest black. Figures 6
and 7 show this visualization tool, which was written
in Java, with particular phonemes selected for viewing. The white boxes at the end of a line indicate the
end of a chapter. On mouseover, the researcher
learns the location in the text via a tooltip.
6.2 Visualization with the computational
techniques described above
The third and final visualization tool has been the
most helpful at finding soundplay because it uses
the computational techniques described above.
Once the user selects a phoneme or group of phonemes, it tests every possible window in each text $t = \{w_1, w_2, \ldots, w_b\}$ in the range $[w_{min}, w_{max}]$ words long. I exclude all windows that have fewer than $p_{min}$ phonemes in F. In all work described henceforth, $p_{min} = 3$.
For each word, it records the highest cdf value of
any window containing that word. These values can
then be color-coded and shown in a visually compact manner. Each word receives a box one pixel in
width and several pixels in height. The colors are
chosen according to the value v for each word using
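a rainbow color scale:
$$\text{Color} = \begin{cases} \text{Red}, & v \ge 0.999 \\ \text{Orange}, & 0.99 \le v < 0.999 \\ \text{Yellow}, & 0.98 \le v < 0.99 \\ \text{Green}, & 0.97 \le v < 0.98 \\ \text{Blue}, & 0.95 \le v < 0.97 \\ \text{Indigo}, & 0.5 \le v < 0.95 \\ \text{Violet}, & v < 0.5 \end{cases}$$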
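The per-word peak and the color scale can be sketched as follows. This is a simplified illustration, not the article's tool: it treats every consonantal phoneme as an unweighted trial, assumes each word is given as a string with one character per consonantal phoneme, and uses hypothetical function names.

```python
from math import comb
from typing import List, Set


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def word_peaks(
    words: List[str], target: Set[str], p: float,
    w_min: int = 5, w_max: int = 25, p_min: int = 3,
) -> List[float]:
    """For each word, the highest cdf value of any window containing it."""
    hits = [sum(ch in target for ch in w) for w in words]
    sizes = [len(w) for w in words]
    peaks = [0.0] * len(words)
    for start in range(len(words)):
        k = n = 0
        for end in range(start, min(start + w_max, len(words))):
            k += hits[end]   # phonemes of the window that belong to F
            n += sizes[end]  # all consonantal phonemes of the window
            if end - start + 1 < w_min or k < p_min:
                continue     # window too short, or too few phonemes in F
            value = binomial_cdf(k, n, p)
            for j in range(start, end + 1):
                peaks[j] = max(peaks[j], value)
    return peaks


# Lower bounds of the rainbow scale, highest first; below 0.5 is violet.
COLOR_SCALE = [
    (0.999, "red"), (0.99, "orange"), (0.98, "yellow"),
    (0.97, "green"), (0.95, "blue"), (0.5, "indigo"),
]


def color_for(v: float) -> str:
    """Map a peak value to the color of its one-pixel-wide box."""
    for lower, name in COLOR_SCALE:
        if v >= lower:
            return name
    return "violet"


# Toy text (not Hebrew data): five words dense in "r", then five without it.
toy = ["rr"] * 5 + ["ab", "cd", "ef", "gh", "ij"]
print([color_for(v) for v in word_peaks(toy, {"r"}, p=0.05)])
```

Words that belong to no qualifying window keep a peak of 0.0 and are therefore drawn in violet.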
Figures 8 and 9 show this visualization tool for two different sets of phonemes. The location in the corpus is provided in a tooltip on mouseover.
Fig. 5 Psalm 119:1-13 with color mapping of Hebrew consonantal phonemes
Fig. 6 Psalm 1ff. (one psalm per line) showing ר (resh)
Fig. 7 Psalm 1ff. (one psalm per line) showing the dental stops ת (t), ד (d), and ט (t')
7 Validating the existence of
soundplay in the psalms
One can use statistical and computational techniques to validate that a poetic corpus does in fact
contain soundplay. This requires a two-step process.
The first step is to produce a corpus that is comparable to the poetic corpus yet has little or no artistic soundplay. The second step is to quantify the
amount of soundplay in each of these corpora and
compare them. Given the discussion above, the
second step is more straightforward and will thus
be discussed first. The discussion of developing
comparable corpora will be presented next, followed
Literary and Linguistic Computing, Vol. 29, No. 3, 2014
369
D. C. Benner
Fig. 9 This shows Psalms 84-113 (one psalm per line) for the consonantal phonemes in the word Jerusalem.
finally by the results of the comparisons, which do
validate that there is soundplay in the psalms.
7.1 Measuring soundplay in a corpus
To define a metric of relative soundplay in a corpus,
one can make use of the computational techniques
described above. In particular, the third visualization technique above described finding a peak value
for each word. One can set a threshold for this peak
value and count how many words in the corpus
exceed that threshold. For example, if the threshold is 0.999, this is equivalent to counting the number of red, one-pixel by several-pixel boxes in the visualization tool. This number can be compared to the number of red boxes for another corpus, dividing each by the corpus size in words if the two corpora are of different lengths.
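The metric reduces to a count and, for corpora of different sizes, a rate. A minimal sketch, assuming the per-word peak values are already available as a list of floats; the function names are hypothetical, and the comparison uses "at or above" so that it matches the red band of the color scale.

```python
from typing import Sequence


def peaks_at_or_above(peaks: Sequence[float], threshold: float = 0.999) -> int:
    """Number of words whose peak value reaches the threshold."""
    return sum(1 for v in peaks if v >= threshold)


def peak_rate(peaks: Sequence[float], threshold: float = 0.999) -> float:
    """Peaks at or above the threshold per word, for comparing corpora of
    different lengths."""
    if not peaks:
        raise ValueError("corpus has no words")
    return peaks_at_or_above(peaks, threshold) / len(peaks)


# Illustrative values only.
sample = [0.9995, 0.5, 0.999, 0.99]
print(peaks_at_or_above(sample), peak_rate(sample))
```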
7.2 Producing a comparable corpus
There are two methods by which one can produce a corpus comparable to the original poetic corpus $C_{\text{poetic}}$ yet lacking a significant amount of soundplay. The first is to generate a test corpus by rearranging the words of the poetic corpus at random. The second involves comparing the poetic corpus to a corpus of prose texts. Each of these ways will be discussed in turn.
Fig. 8 This shows Psalms 23-37 (one psalm per line) for the phoneme ר (resh)
7.2.2 Using a prose corpus as a test corpus
The second way to produce a corpus for comparison
to $C_{\text{poetic}}$ is to use an already-existing prose corpus
in the same language. Prose can contain soundplay
as well, but it is likely that soundplay is not as prominent in prose as it is in poetry in most literary
traditions. In the case of the Hebrew Bible,
the following predominantly prose books were
selected: Genesis, Exodus, Leviticus, Numbers,
Deuteronomy, Joshua, Judges, Ruth, 1-2 Samuel,
1-2 Kings, 1-2 Chronicles, Ezra, Nehemiah, and
Esther.
There are some potential differences between the
two corpora. The language is a bit more formulaic
in prose, and word order is less fluid. There is more
varied vocabulary in poetry than in prose. Certain
particles that are extremely common in prose are
less common in poetry, and uncommon words
appear more frequently in poetry than in prose.
Moreover, content words are likely to be distributed
in prose in a manner that less resembles a Poisson
distribution than in poetry. This is especially the
case for proper nouns, which, excluding divine
7.2.1 Generating a test corpus by rearranging
the words of the poetic corpus
To generate a corpus that shares most of the characteristics of $C_{\text{poetic}}$ yet has no artistic uses of soundplay, one can rearrange the words of the poetic
corpus at random. In this manner, the distributions
of phonemes within a word can be preserved, and
the text boundaries in $C_{\text{poetic}}$ can be preserved as
well. It is important to rearrange the words at
random, not the phonemes, because the phonemes
within a token or word are not independent of one
another. This is true of languages generally, but it is
especially true of Semitic languages like Biblical
Hebrew, wherein nearly every word derives from a
tri-consonantal root. There are phonological constraints on the selection of the three consonants of
the roots (Greenberg, 1950; cf. Weitzman, 1987;
Frisch, 2004; Frisch et al., 2004; Vernet, 2011).
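The rearrangement can be sketched as a shuffle of whole words followed by a re-cut at the original text lengths. The article does not say whether words are shuffled within each text or across the whole corpus; the sketch below assumes a corpus-wide shuffle, and the function name and toy data are hypothetical.

```python
import random
from typing import List, Optional


def rearrange_corpus(
    texts: List[List[str]], seed: Optional[int] = None
) -> List[List[str]]:
    """Shuffle whole words across the corpus, then re-cut the shuffled
    sequence into texts of the original lengths.

    Words stay intact, so the phoneme sequence inside each word is
    preserved; only the order of the words changes.
    """
    rng = random.Random(seed)
    pool = [word for text in texts for word in text]
    rng.shuffle(pool)
    rearranged = []
    position = 0
    for text in texts:
        rearranged.append(pool[position:position + len(text)])
        position += len(text)
    return rearranged


# Toy corpus of three "texts"; each word is a string of consonants.
corpus = [["mlk", "jktb", "h"], ["mn", "mlk"], ["h", "jktb", "mn", "mlk"]]
print(rearrange_corpus(corpus, seed=1))
```

Calling the function repeatedly with different seeds yields a family of comparison corpora that share the original's vocabulary and text lengths.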
This generated corpus cannot possibly share every characteristic of $C_{\text{poetic}}$ without itself being $C_{\text{poetic}}$. The differences in characteristics between the generated corpus and $C_{\text{poetic}}$, however, are minor and unlikely to be of consequence in measuring soundplay.
If the language in $C_{\text{poetic}}$ is fairly formulaic, then one would expect some n-grams (for n > 1) at the word level to be quite frequent in $C_{\text{poetic}}$, whereas one would not expect this in the generated corpus. This could affect a measure of soundplay in the corpus if the commonly adjoining words happen to contain similar phonemes. However, the psalms are not particularly formulaic; their vocabulary is more varied than the vocabulary of the prose
books. Additionally, there is no reason to expect
frequent n-grams to contain the same phonemes.
The one area that has been argued to be formulaic
in the psalms is fixed word-pairs.4 That is, particular
pairs of words are commonly used in parallel lines,
whether the two words be synonyms, antonyms,
etc., but this is unlikely to produce a significant
effect. It consists of only two words, usually separated by several words that vary. Moreover, there is
no reason to expect these word-pairs to contain the
same phonemes an abnormal proportion of the
time.
After rearranging the words of the poetic corpus,
we expect the distances between segments corresponding to the same lexemes to follow a Poisson
distribution. This is generally also the case in natural
language for function words, but occurrences of
particular content words tend to bunch together.
That is, the variance of the distances between the
occurrences of a content word is larger than the
variance of a Poisson distribution. This can affect
a measure of soundplay, but its effect is mitigated by
two factors. First, the more varied vocabulary of the
Book of Psalms lessens the effect of this phenomenon. Second, the measure of soundplay that I am
using is intentionally chosen to mitigate the effects
of the repetition of lexemes by weighting those repeated words lower. Thus, I do not anticipate that
these differences between Cpoetic and the generated
corpus pose any problems.
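The bunching of content words described above can be illustrated with a minimal sketch. It compares the variance of the gaps between occurrences of one word in a corpus with the same variance after the words are rearranged. The function names, the toy corpus, and the choice of population variance are assumptions made for illustration; they are not taken from the study's implementation.

```python
import random
from statistics import pvariance


def gap_variance(tokens, word):
    """Variance of the distances between successive occurrences of `word`."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    # Fewer than two gaps gives no meaningful spread.
    return pvariance(gaps) if len(gaps) > 1 else 0.0


def rearrange(tokens, seed):
    """Return a shuffled copy of the corpus (same words, random order)."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Hypothetical toy corpus: "king" is bunched into two tight clusters.
    fillers = ["w%d" % i for i in range(190)]
    tokens = ["king"] * 5 + fillers[:95] + ["king"] * 5 + fillers[95:]
    observed = gap_variance(tokens, "king")
    shuffled = [gap_variance(rearrange(tokens, s), "king") for s in range(100)]
    print("bunched corpus:", observed)
    print("mean over 100 rearrangements:", sum(shuffled) / len(shuffled))
```

A gap variance well above what the rearranged corpora produce would indicate the kind of clustering that the lexeme weighting is meant to dampen.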
7.2.3 Results of the comparisons
7.2.3.1 Comparing psalms to rearranged psalms
I generated 100 corpora from the psalms, each with the
words rearranged. Let us call them
Crearranged = Crearranged1, ..., Crearranged100. Using my
metric of counting the number of peaks above a
certain threshold t, we do indeed find a difference
between the psalms and these rearranged corpora,
provided that t is set sufficiently high.
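The baseline construction can be sketched as follows. This is only an outline under stated assumptions: `make_rearranged` and `count_peaks_above` are hypothetical names, the per-word peak values are assumed to have been computed already by the measure defined earlier in the article, and "above" is read here as strictly greater than t.

```python
import random


def make_rearranged(words, n=100, seed=0):
    """Build n corpora, each a random rearrangement of the same words."""
    rng = random.Random(seed)
    corpora = []
    for _ in range(n):
        shuffled = list(words)
        rng.shuffle(shuffled)
        corpora.append(shuffled)
    return corpora


def count_peaks_above(peak_values, t):
    """Count word peaks strictly above threshold t (an assumed reading)."""
    return sum(1 for value in peak_values if value > t)
```

Each rearranged corpus would then be scored with the same measure as Cpoetic, and the resulting counts compared at a given t.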
For t = 0.999, Figure 10 compares the number of
peaks above the threshold for each consonantal
phoneme in Cpoetic over against Crearranged. The average value in Figure 10 is 74. If each corpus in
Crearranged were compared against the remaining 99
corpora in Crearranged, the average value on a comparable graph would be in the range [35, 64].
Similarly, if one totals the peaks above the threshold
for all of the consonantal phonemes, this sum for
the corpora in Crearranged is in the range [2444, 3573].
By contrast, the sum for Cpoetic is 4340, as shown in
Figure 11. Thus, above the 0.999 threshold, it is
reasonable to expect approximately 56–82% of
the peaks to be the result of chance and a limited
phonemic inventory, while the remaining 18–44%
are the result of the artistic use of sound.
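The percentages above follow from dividing the range of totals for the rearranged corpora by the total for Cpoetic. A minimal sketch of that arithmetic, using the figures reported in the text (the function name `chance_share` is an assumption for illustration):

```python
def chance_share(baseline_low, baseline_high, observed):
    """Share of observed peaks that the shuffled baseline can account for."""
    return baseline_low / observed, baseline_high / observed


low, high = chance_share(2444, 3573, 4340)
# Chance share: roughly 56% to 82% of the peaks.
print(round(low * 100), round(high * 100))
# Remaining share attributed to artistic use: roughly 18% to 44%.
print(round((1 - high) * 100), round((1 - low) * 100))
```

The width of the interval reflects the spread among the 100 rearranged corpora rather than a formal confidence interval.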
When the threshold is lowered to 0.99, Cpoetic still
looks to have more soundplay than the corpora in
Fig. 10 Word peaks above threshold 0.999 in psalms as percentile of word peaks above the same threshold in Crearranged
Fig. 11 Number of peaks above threshold 0.999 for all consonantal phonemes
Literary and Linguistic Computing, Vol. 29, No. 3, 2014
names, are much more common in Biblical Hebrew
prose than poetry. All of these are mostly mitigated
by my weighting strategies. If there is any residual
effect, it would be to skew measures of soundplay in
the direction of prose. Additionally, the texts are
longer in prose than in poetry. As a result, a
higher proportion of the words in prose are part
of as many windows as possible. The more windows
of which a word is a part, the more likely it is to
have a high maximum value. Again, this difference
tilts my measure toward favoring prose. Thus, if my
measure of soundplay results in showing more
soundplay in poetry despite these differences and
despite the fact that there is probably some soundplay in prose as well, then one can be even surer of
the result that the poetic corpus contains soundplay.
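The effect of text length on window membership can be made concrete with a small sketch. It assumes a single fixed window size that slides one word at a time, which is a simplification for illustration and not necessarily the windowing scheme used in the study; the function names are hypothetical.

```python
def windows_containing(position, text_length, window_size):
    """Number of sliding windows of `window_size` words that include `position`."""
    if window_size > text_length:
        return 0
    first_start = max(0, position - window_size + 1)
    last_start = min(position, text_length - window_size)
    return last_start - first_start + 1


def share_fully_covered(text_length, window_size):
    """Share of word positions that belong to the maximum number of windows."""
    full = sum(
        1
        for p in range(text_length)
        if windows_containing(p, text_length, window_size) == window_size
    )
    return full / text_length


# A short text of 20 words versus a long text of 200 words, window of 5 words.
print(share_fully_covered(20, 5))   # 0.6
print(share_fully_covered(200, 5))  # 0.96
```

Under these assumptions, words near the edges of a text belong to fewer windows, so short texts have a smaller share of words with the maximum number of chances to register a high value.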
7.2.3.2 Comparing psalms to prose
Cprose consists of seventeen books of the Hebrew
Bible primarily consisting of prose. Figures 16
and 17 show the peak word values by consonantal
phoneme for the thresholds 0.999 and 0.99, respectively. Figure 18 shows the peak word values for all
consonantal phonemes for a variety of threshold
values in one graph. It shows that the psalms’
word peaks are substantially higher than the prose
word peaks for the rightmost tail of the distribution,
where the threshold is 0.99 or higher, indicating that
there is more soundplay in the psalms than in the
prose books.
7.2.4 Conclusion
The comparison between the psalms and Crearranged
and the comparison between the psalms and
Cprose both show that there is indeed soundplay in the
psalms. Moreover, they show that a peak value
below the threshold 0.99 should not be taken as
significant. Regardless of the threshold used, there
will be peaks that are simply the result of chance and
a limited phonemic inventory. Computational techniques can show that sound is plausibly used for
artistic purposes in a passage, but they cannot prove
that any given instance of soundplay is artistic and
not the result of chance; the human critic has to
look for additional lines of evidence to distinguish
between the two or humbly accept uncertainty.
Fig. 12 Word peaks above threshold 0.99 in psalms as percentile of word peaks above the same threshold in Crearranged
Fig. 13 Number of peaks above threshold 0.99 for all consonantal phonemes
Crearranged, but the gap is much narrower on a percentage basis. Figures 12 and 13 are the two comparable charts with t = 0.99. This suggests that by
the time the threshold is lowered to t = 0.99, most of the peaks
correspond to chance. However, perhaps
3–12% of them are artistic uses of sound, some
of which satisfied the more rigorous threshold of
t = 0.999.
When one lowers the threshold further to 0.98,
one finds no distinction between Cpoetic and
Crearranged. Thus, in looking for artistic uses of
sound in the psalms, one should look only at
values above 0.99. Figures 14 and 15 show the
peaks in the range [0.98, 0.99].
Fig. 15 Number of peaks in range [0.98, 0.99] for all consonantal phonemes
Fig. 16 Comparison of word peaks above threshold 0.999 in psalms and prose books by consonantal phoneme
8 Results
The computational techniques described above have
been successful in uncovering soundplay in the
psalms. One example is presented here. In Psalm
37:34-36, the r ( ) phoneme appears precisely one
time in each of fifteen consecutive words. Psalm 37
is an acrostic poem, and these fifteen words span the
Fig. 14 Word peaks in the range [0.98, 0.99] in psalms as percentile of word peaks in the same range in Crearranged
Fig. 18 Comparison of peak word values for the psalms and prose books
q (k') and r ( ) sections. This soundplay helps to
reinforce the acrostic structure of the poem and also
serves to bind the q (k') and r ( ) sections together.
Figure 19 shows the text using the first visualization
tool presented above, along with a transliteration of
its consonants and a translation. The second and
third visualizations for Psalm 37 are also shown in
Figures 20 and 21. With one notable exception
(Levine, 2003, p. 78–9), modern exegetes, following
the lead of the LXX, the ancient Greek translation
of the Hebrew Bible, have not noticed this
phonological pattern and have sought to smooth
the sense of verse 35 via emendation: ( ;
‘high-spirited, arrogant’) in place of ( ; ‘violent’),5
( ; ‘one who elevates himself’) in place of
( ; ‘spreading’), and ( ; ‘cedar’) in place of
( ; ‘native’). While the
final of these is plausible, the first two miss the
poet’s phonological artistry. The poet’s word
choice in verse 35 might be sub-optimal when considered semantically, but that is because sound, not
semantics, was driving his word choice. Exegetes’
Fig. 17 Comparison of word peaks above threshold 0.99 in psalms and prose books by consonantal phoneme
Fig. 20 Psalm 37 with Visualization Tool 2
Fig. 21 Psalm 37 with Visualization Tool 3
emendations represent an aesthetically appealing alternative artistry available to the ancient poet, but it
is not the artistry that the ancient poet chose.
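A pattern like the one in Psalm 37:34-36, where a phoneme occurs exactly once in each of many consecutive words, can be searched for with a simple scan. The sketch below is illustrative only: `longest_exact_run` is a hypothetical helper, and the toy transliterations are invented rather than taken from the psalm.

```python
def longest_exact_run(words, phoneme, count=1):
    """Find the longest run of consecutive words that each contain
    `phoneme` exactly `count` times. Returns (start_index, run_length)."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, word in enumerate(words):
        if word.count(phoneme) == count:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


# Invented consonantal transliterations, one string per word.
toy_words = ["dbr", "rsh", "ars", "mlk", "rr", "br"]
print(longest_exact_run(toy_words, "r"))  # (0, 3)
```

Words may also be given as lists of phoneme symbols, since `list.count` behaves the same way. A long run found this way is only a candidate; whether it exceeds what chance would produce is the question the thresholds above address.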
9 Conclusion
Computational techniques are necessary for evaluating soundplay in a corpus. Past discussions of
soundplay in Northwest Semitic poetry have
mostly been driven by intuition, but human critics
are not good at distinguishing between artistic uses
of sound and the results of chance and a limited
phonemic inventory. Computational techniques developed for other poetic corpora have been more
advanced, but none of them has been completely
satisfactory. The approach described in this article
is grounded firmly in mathematical theory, makes
maximum use of the available data, and recognizes
that some phonemes are more important to a poem
than others. Computational techniques are helpful
not only in evaluating soundplay that a scholar has
proposed but also in aiding a scholar in finding
soundplay. Among the three visualization tools presented herein, the third visualization tool has been
the most useful in finding soundplay because it uses
the computational techniques designed for evaluating whether soundplay exists in a poetic passage.
Searching a poetic corpus for soundplay is fruitless
if there is not actually artistic soundplay in that
corpus. As a result, I presented techniques that
show that there is soundplay in the psalms.
Fig. 19 Psalm 37 with Visualization Tool 1
This work has been specifically geared toward
Biblical Hebrew poetry, but the techniques are general enough to apply to other poetic corpora as well
with only minor changes. The method of segmenting the data might vary from corpus to corpus; for
example, line and stanza divisions might be more
important to soundplay in other poetic corpora.
The way in which phonemes are weighted might
also vary; for example, the location of a phoneme
within a word or syllable might be significant in
other poetic corpora.
Computational techniques are critical to a study
of sound in poetry, but they are not sufficient in and
of themselves. They do not replace the human critic.
When computational techniques suggest that there
may be soundplay in a poetic passage, a human
critic can bring other lines of evidence to try to
determine whether the soundplay is artistic.
Certainty is elusive, but an able critic working
with computational techniques will produce the
best readings of poetic soundplay.
References
Bailey, R. W. (1971). Statistics and the sounds of poetry.
Poetics, 1: 16–37.
Barquist, C. R. (1987). Phonological Patterning in
Beowulf. Literary and Linguistic Computing, 2: 19–23.
Barquist, C. R. and Shie, D. L. (1991). Computer analysis
of alliteration in Beowulf using distinctive feature
theory. Literary and Linguistic Computing, 6: 274–80.
Berlin, A. (1985). The Dynamics of Biblical Parallelism.
Bloomington, Ind.: Indiana University Press.
Chisholm, D. (1976). Phonological patterning in German
Verse. Computers and the Humanities, 10: 5–20.
Chisholm, D. (1981). Phonology and style: a computer-assisted
approach to German Verse. Computers and the
Humanities, 15: 199–210.
Clement, T., Auvil, L., Tcheng, D., Capitanu, B.,
Monroe, M., and Goel, A. (2013). Sounding for meaning:
using theories of knowledge representation to analyze
aural patterns in texts. Digital Humanities
Quarterly, 7. http://digitalhumanities.org/dhq/vol/7/1/000146/000146.html
(accessed 13 May 2014).
Coles, K. and Lein, J. (2013). Finding and figuring
flow: notes toward multidimensional poetry
visualization. iConference 2013 Proceedings [Online].
http://hdl.handle.net/2142/38940 (accessed 18 October
2013).
Frisch, S. A. (2004). Language Processing and Segmental
OCP. In Hayes, B., Kirchner, R., and Steriade, D. (eds),
Phonetically-based phonology. Cambridge: Cambridge
University Press.
Frisch, S. A., Pierrehumbert, J. B., and Broe, M. B.
(2004). Similarity avoidance and the OCP. Natural
Language and Linguistic Theory, 22: 179–228.
Greenberg, J. H. (1950). The patterning of root morphemes
in Semitic. Word, 6: 162–81.
Hidley, G. R. (1986). Some Thoughts concerning the
application of software tools in support of Old English
poetic studies. Literary and Linguistic Computing, 1:
156–62.
Jackson, E. (1942). The quantitative measurement of
assonance and alliteration in Swinburne. American
Journal of Psychology, 55: 115–23.
Leavitt, J. A. (1976). On the measurement of alliteration
in poetry. Computers and the Humanities, 10: 333–42.
Lessard, G. and Hamm, J.-J. (1991). Computer-aided
Analysis of repeated structures: the case of Stendhal’s
Armance. Literary and Linguistic Computing, 6:
246–52.
Levine, N. (2003). Vertical poetics: interlinear phonological
parallelism in psalms. Journal of Northwest
Semitic Languages, 29: 65–82.
Logan, H. M. (1988). Computer analysis of sound and
meter in poetry. College Literature, 15: 19–24.
Magnuson, K. (1962). Consonant Repetition in the lyric
of Georg Trakl. Germanic Review, 37: 263–81.
Magnuson, K. (1966). Phonological Investigations into the
Structure of German Verse. Ph.D. dissertation,
University of Michigan.
Margalit, B. (1975). Studia Ugaritica I: introduction to
Ugaritic prosody. Ugarit Forschungen, 7: 289–313.
Pardee, D. (1973). A Restudy of the commentary on
Psalm 37 from Qumran Cave 4 (Discoveries in the
Judaean Desert of Jordan, vol. V, no 171). Revue de
Qumran, 8: 163–94.
Plamondon, M. R. (2001). The Musical Aesthetics of the
Poetry of Tennyson and Browning. Ph.D. dissertation,
University of Toronto.
Plamondon, M. R. (2005). Computer-assisted phonetic
analysis of English poetry: a preliminary case study
of Browning and Tennyson. TEXT Technology, 14:
153–75.
Plamondon, M. R. (2009). Poetic waveforms, discrete
Fourier transform analysis of phonemic accumulations,
and love in the garden of Tennyson’s Maud. Digital
Studies / Le champ numérique (Online), 1.
http://www.digitalstudies.org/ojs/index.php/digital_studies/article/view/179/228
(accessed 26 July 2013).
Shirley, C. G. (1979). Alliteration as Evidence in dating a
poem of Thomas Churchyard: an exploratory computer-aided
study. Modern Philology, 76: 374–7.
Skinner, B. F. (1939). The alliteration in Shakespeare’s
sonnets: a study in literary behavior. Psychological
Record, 3: 186–92.
Skinner, B. F. (1941). A Quantitative Estimate of Certain
Types of Sound-Patterning in Poetry. American Journal
of Psychology, 54: 64–79.
Vernet, E. L. (2011). Semitic root incompatibilities and
historical linguistics. Journal of Semitic Studies, 56:
1–18.
Watson, W. G. E. (1984). Classical Hebrew Poetry:
A Guide to its Techniques. Sheffield, England: JSOT
Press.
Weitzman, M. (1987). Statistical Patterns in Hebrew and
Arabic roots. Journal of the Royal Asiatic Society of Great
Britain and Ireland, 1: 15–22.
Wright, N. B. (1974). Measuring Alliteration: A Study in
Method. In Mitchell, J. L. (ed.), Computers in the
Humanities. Edinburgh: Edinburgh University Press.
Notes
1 Baruch Margalit does attempt to delineate rules for
alliteration in Ugaritic poetry, but his approach is statistically meaningless (Margalit, 1975, p. 311).
2 Several other studies deserve mention, though their
methods are either rather simplistic or not given in as
much detail as one might like: Shirley (1979); Hidley
(1986); Barquist (1987); Logan (1988); Barquist and
Shie (1991); Lessard and Hamm (1991).
3 Romeo and Juliet, Act I, Prologue, lines 5-6.
4 For an overview of research on word pairs in Biblical
Hebrew poetry, see Watson (1984, p. 128–44). For a
more skeptical treatment that explains fixed word
pairs in Biblical Hebrew in terms of psycholinguistics
rather than a culturally specific stock vocabulary of
word-pairs, see Berlin (1985, p. 65–83).
5 Note that 4Q Psalm 37 preserves עריץ (‘violent’)
but is broken such that the other two words are unknown. See Pardee (1973, p. 166).