Computers and the Humanities 35: 315–331, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.
Stephen Crane and the New-York Tribune:
A Case Study in Traditional and Non-Traditional
Authorship Attribution
DAVID I. HOLMES, MICHAEL ROBERTSON and ROXANNA PAEZ
The College of New Jersey, USA
Abstract. This paper describes how traditional and non-traditional methods were used to identify
seventeen previously unknown articles that we believe to be by Stephen Crane, published in the
New-York Tribune between 1889 and 1892. The articles, printed without byline in what was at the
time New York City’s most prestigious newspaper, report on activities in a string of summer resort
towns on New Jersey’s northern shore. Scholars had previously identified fourteen shore reports as
Crane’s; these possible attributions more than double that corpus. The seventeen articles confirm how
remarkably early Stephen Crane set his distinctive writing style and artistic agenda. In addition, the
sheer quantity of the articles from the summer of 1892 reveals how vigorously the twenty-year-old
Crane sought to establish himself in the role of professional writer. Finally, our discovery of an article
about the New Jersey National Guard’s summer encampment reveals another way in which Crane
immersed himself in nineteenth-century military culture and helps to explain how a young man who
had never seen a battle could write so convincingly of war in his soon-to-come masterpiece, The Red
Badge of Courage. We argue that the joint interdisciplinary approach employed in this paper should
be the way in which attributional research is conducted.
Key words: authorship, New York Tribune, Stephen Crane, stylometry
1. Introduction
The past forty years have witnessed a revolution in authorship attribution. When
Erdman and Fogel (1966) assembled their massive collection of the best work in
the field to date, not one of the articles they selected employed computer-assisted
statistical methodologies. Even fifteen years later, a guide to literary research that
is still regarded as a standard in the field (Altick, 1981) devoted its entire chapter on
“Problems in Authorship” to the traditional methods treated by Erdman and Fogel’s
contributors: the use of “external” evidence such as letters and other contemporary
testimony and the “internal” evidence provided by a work’s content and style.
However, two years before Erdman and Fogel published their collection, Mosteller
and Wallace (1964) completed a groundbreaking study of the vexed problem of
authorship in The Federalist Papers, using sophisticated statistical methodology.
The example of Mosteller and Wallace, combined with the late twentieth-century
revolution in computing, inaugurated a new era for “non-traditional” statistically
based studies of authorship; Holmes (1998) offers a comprehensive survey of the
flood of non-traditional scholarship that followed Mosteller and Wallace.
The best-known studies of authorship attribution, both traditional and non-traditional, have centered on a relatively limited body of texts, notably British
works from the Renaissance through the eighteenth century. However, Stephen
Crane, the nineteenth-century American writer best known for The Red Badge of
Courage, affords an interesting case study in attribution. Crane’s early unsigned
journalism, written from the New Jersey shore, has been studied by a number of
scholars using traditional methods (Berryman, 1950; Bowers, 1973; Elconin, 1948;
Kwiat, 1953; Williams and Starrett, 1948). In addition, O’Donnell (1966) used
computer-aided discriminant analysis in his non-traditional study of the posthumously published novel The O’Ruddy, begun by Crane and finished by Robert Barr.
However, no one had combined traditional and non-traditional methods in determining Crane’s authorship of disputed texts. This essay, a collaboration between a
literary scholar and two statisticians, is the first to do so.
2. Stephen Crane’s New Jersey Shore Journalism
Stephen Crane began his career as a professional writer in the summer of 1888,
when he was sixteen (Wertheim and Sorrentino, 1988). His assignment was to
assist his brother J. Townley Crane, Jr., almost twenty years older than Stephen,
who had established Crane’s New Jersey Coast News Bureau in 1880 when he
arranged to serve as correspondent for the Associated Press and the New-York
Tribune. For three-quarters of the year, Townley Crane’s duties must have been
light as he ferreted out news in the sparsely populated shore towns of Monmouth
County. However, during the summer months the news bureau’s duties exploded.
New York City newspapers of the 1880s and 1890s devoted remarkable amounts
of space to chronicling the summer vacations of the city’s upper and upper-middle
classes. Every Sunday edition of most New York newspapers and, during July and
August, most daily editions as well carried news articles from the summer resorts
popular with the more affluent citizens of Gilded Age New York: Saratoga Springs,
Newport, the Adirondacks, Cape May, and the northern New Jersey shore. The
format of these articles was standardized: a lead proclaimed the resort’s unique
beauties and the unprecedented success of the current summer season; a few brief
paragraphs recounted recent events, such as a fund-raising carnival or the opening
of a new hotel; and the article concluded with a lengthy list of names of recent
arrivals and where they were staying.
Stephen Crane’s best-known New Jersey shore article, published in the Tribune
on August 21, 1892, explodes this traditional format. His assignment was to report
on a parade of the Junior Order of United American Mechanics, a working-class
nativist organization that came annually to Asbury Park for a patriotic fest known
as “American Day.” Other newspapers, mindful of the group’s political power,
covered the parade with a few flattering sentences. Crane saw it as an opportunity
for satire. He began by observing that the spectacle of an Asbury Park
crowd confronting the working-class marchers was “an interesting sight,” then
proceeded to juxtapose ironically the three groups brought together by the scene:
the marchers, “bronzed, slope-shouldered, uncouth and begrimed with dust”; the
spectators, “composed of summer gowns, lace parasols, tennis trousers, straw hats
and indifferent smiles”; and the native Asbury Parker, “a man to whom a dollar,
when held close to his eye, often shuts out any impression he may have had that
other people possess rights” (Bowers, 1973, pp. 521–522). Crane, who always
reserved his sharpest barbs for his own class, admired the “sun-beaten honesty”
in the faces of the marchers; however, it was the United American Mechanics who
wrote a letter of complaint to the Tribune, which led the newspaper to fire both
Stephen and Townley Crane (Wertheim and Sorrentino, 1994).
This ignominious episode in the early career of one of America’s greatest
writers was commented upon in letters and memoirs by many of his contemporaries, providing ample external evidence for Crane’s authorship of the article.
In the 1940s, literary scholars Elconin (1948) and Williams and Starrett (1948)
examined the files of the New-York Tribune for the summer of 1892, searching for
additional articles by Crane. Using internal evidence of both content and style, they
attributed eight other articles to Crane. The fact that these articles were strikingly
different in content and tone from the Tribune’s usual New Jersey shore articles
and their close resemblance in subject matter and style to the fiction Crane wrote
in 1892 – plus their identification by two different sets of Crane scholars, working
independently – made these attributions so convincing that they have been accepted
without question for over fifty years.
Kwiat (1952) found internal evidence as solid and compelling as that used by
Elconin and Williams and Starrett to attribute one additional 1892 Tribune article
to Crane. Berryman (1950) used definitive external evidence from a Crane contemporary to attribute an 1891 article. Thus, when the highly respected textual scholar
Fredson Bowers began to assemble his complete edition of Stephen Crane’s works,
there were a total of eleven articles in the canon of Stephen Crane’s New Jersey
shore journalism. Convinced that there were more to be found, Bowers set his corps
of graduate student assistants to work combing the files of the Tribune. They found
three articles which treated topics that Crane later developed into lengthy signed
articles; Bowers sensibly regarded this evidence as sufficient for attribution. His
edition of Crane’s journalism (1973) thus established the canon of Jersey shore
articles at a total of fourteen. In addition, Bowers’ researchers flagged twenty-eight articles that, on the basis of internal evidence of style and content, seemed
to be by Stephen Crane. Bowers reprinted these articles in his edition as “Possible
Attributions.”
3. Discovery and “Traditional” Attribution
The eleven articles definitively attributed to Crane in the 1940s and 1950s bore
datelines from three adjoining towns on the New Jersey shore: Asbury Park, Ocean
Grove, and Avon-by-the-Sea. When Bowers set his researchers to work to find
possible attributions, he evidently decided to limit his search to articles with
datelines from those three towns. No scholar questioned his decision. However,
during research for a book on Stephen Crane’s journalism (Robertson, 1997), we
came across an item in the Schoberlin Collection at the Syracuse University Library
that revealed limitations in Bowers’ search. In a folder labeled “Crane–1891,”
part of the materials that Melvin Schoberlin assembled for his never published
biography, a one-page prospectus for Crane’s New Jersey Coast News Bureau was
found, evidence of an attempt by Townley Crane to expand his business. The document’s subheading, printed just below the news bureau’s name, is “Sandy Hook
to Barnegat Bay.” The body of the prospectus lists the shore towns bounded by
those two prominent geographical features, including some of the most prominent
resorts on the Jersey shore – notably Long Branch, which was visited by every
U.S. President from Grant to Harrison and vied with Cape May for the distinction
of being New Jersey’s most fashionable summer destination; and Spring Lake, a
small but elegant resort.
With this new external evidence of the Crane news bureau’s wide geographical
range, we questioned Bowers’ decision to limit his search for possible attributions
to articles originating from Asbury Park and the two towns just south of it. Would
it not make sense for Townley to send his teenaged brother to cover news in the
resorts a few miles distant from their home base of Asbury Park and save himself
the trouble? Wouldn’t he need Stephen’s help to cover the news at Long Branch,
which was even larger and livelier than Asbury Park?
Shortly after finding the prospectus, we came across an article from Spring Lake
in the New-York Tribune of June 26, 1892. It begins:
This town has taken on its usual garb of lurid summer hue. The beach, the hotel
verandas and the lakeside are now all alive with the red and white and gold of
the town’s summer revellers, who make merry in a nice, mild sort of way. The
hotel proprietors have removed the sackcloth and ashes which is said to be their
dress during the dreary winter months, and have appeared in gentle, expansible
smiles and new clothes, for everything points to a most prosperous season.
Surely this was by the same author who wrote a week later from Asbury Park:
Pleasure seekers arrive by the avalanche. Hotel-proprietors are pelted with hailstorms of trunks and showers of valises. To protect themselves they do not put
up umbrellas, nor even prices. They merely smile copiously. The lot of the
baggageman, however, is not an easy one. He manipulates these various storms
and directs them. He is beginning to swear with a greater enthusiasm. It will be
a fine season. (Bowers, 1973, p. 509)
The second article was attributed to Stephen Crane by both Elconin (1948) and
Williams and Starrett (1948). We had little doubt that the first was his also. Both
passages are marked throughout by Crane’s distinctive ironic tone; both contain
witty hyperbole; and both employ striking lexical juxtapositions, such as the hotel
proprietors who wear “expansible smiles and new clothes” in the first passage and
who refrain in the second from putting up either umbrellas or prices.
It seemed likely that the Tribune contained additional Stephen Crane articles
from Spring Lake, Long Branch, and other locations not examined by Bowers and
other scholars. We determined to search for them. However, our first step was to
analyze Townley Crane’s prose. We searched the New-York Tribune for the summer
of 1886, when Crane’s New Jersey Coast News Bureau was already well established but Stephen had not yet begun his journalistic career, and collected articles
with a dateline from the New Jersey shore towns named in Townley’s prospectus.
We found a total of twenty-two articles. Although in accordance with journalistic
practice of the time none of the articles was signed, all bore an identical byline:
“From the Regular Correspondent of the Tribune.” In addition, the relatively small
number of articles published that summer – a fraction of the total published each
summer during the early 1890s – made it likely that Townley wrote all the articles
himself. Their style is remarkably consistent. Townley Crane seems to have been a
completely straightforward writer, an unimaginative but sincere booster of the New
Jersey shore towns where he made his living. In contrast, Stephen Crane is noted
for his gleefully scorching irony, evident throughout his journalism and fiction.
To locate articles that might be by Stephen, we searched the New-York Tribune
for the summers of 1888, when Stephen claimed he began assisting Townley,
through 1892, when he was fired. We read every issue from the last Sunday in
May, the earliest date when resort news was likely to appear, through the second
Sunday in September, when the last of the summer visitors departed, searching for
articles with a dateline from the New Jersey shore towns named in Townley Crane’s
prospectus.
The results of our search were striking. The 1886 articles were uniformly pallid
and inoffensive in their style. However, in 1889, when Stephen was seventeen, a
distinctive new voice suddenly emerged in the Tribune. On July 30 the newspaper
published an article that takes ironic aim at the visitors to a summer institute for
Protestant clergy:
After spending half a day in discussing the question “Is There Any Other
Science Than Physical Science? If So, What & Why?” it was a curious sight
to see a number of the reverend intellectual giants of the American Institute of
Christian Philosophy seated in a boat fishing for crabs and gravely discussing
the question “Is there any better bait for crabs than fish tails? If so, what and
where is it to be found?” Other eminent lecturers went in bathing, and as they
bobbed up and down in the waves they solemnly argued about immersion.
The internal evidence of its playfully ironic style strongly suggested that this article
was Stephen’s. Content provided additional evidence for the attribution; Stephen
wrote about the American Institute of Christian Philosophy the following summer
in an article definitively attributed and reprinted by Bowers (1973).
Using the traditional attributional tools of content and style, we found sixteen
other articles published between 1889 and 1892 that we identified as possibly by
Stephen Crane. As a whole, the seventeen possible attributions that we identified,
written when Crane was seventeen to twenty years old, confirm how remarkably
early he set his distinctive writing style and artistic agenda; more than a century
after their original newspaper publication they remain delightful reading. In addition, the sheer quantity of articles from the summer of 1892 – fourteen of our
seventeen attributions, which supplement dozens of other articles and short stories
that he wrote in 1892 – reveals how vigorously the twenty-year-old Crane sought to
establish himself in the role of professional writer. Finally, our discoveries include
an 1892 article about the New Jersey National Guard summer encampment at Sea
Girt. Like all of Crane’s work, the article is witty and ironic. Its larger significance
is that it shows Crane was familiar with the military culture of his state’s national
guard; thus, it constitutes an important piece in completing the puzzle of how a
young man who had never seen war could write so convincingly about it in The
Red Badge of Courage, which Crane began the year after he left the Tribune.
Our initial attributions were limited to articles that were so stylistically
distinctive in their irony and verbal inventiveness that they clearly looked to
be from Stephen’s hand rather than Townley’s. For an alternative and objective
statistical analysis, we turned to the science of stylometry.
4. “Non-Traditional” Attribution: Stylometry
4.1. SAMPLING AND TEXTUAL PREPARATION
The stylometric task facing us was to examine the seventeen articles and attribute
them to either Stephen or Townley Crane, who so far as is known were the only
writers contributing New Jersey shore articles to the Tribune. Suitable control
samples in more than one genre are required, so, within the genre of fiction,
several textual samples of about 3,000 words were obtained from The Red Badge
of Courage and Joseph Conrad’s The Nigger of the “Narcissus”, the latter being
chosen because we know that Crane and Conrad read and admired each other’s
novels. For journalistic controls, we turned to Richard Harding Davis and Jacob
Riis, who were, along with Crane, the most prominent American journalists of
the 1890s. We know that Crane was familiar with their work, which paralleled his
own war correspondence (in the case of Davis) and New York City journalism (in
Riis’s case). Accordingly, samples of text were taken from Davis’s A Year from a
Reporter’s Notebook and Riis’s How the Other Half Lives.
Examples of Stephen Crane’s New Jersey shore reports, his signed New York
City journalism, and his war correspondence, also signed, were taken from the
University of Virginia edition of Crane’s work; samples of Townley Crane’s journalism
were taken from the New-York Tribune. The seventeen anonymous articles
were first merged, the resultant text then being split into two halves of approximately
1800 words each. All samples were either typed, scanned or downloaded
from an internet resource. The following table lists the texts and samples used in
this investigation along with their dates of composition.

Table I. Textual samples

Author                 Title                              Date       Sample  Number of words
Stephen Crane          The Red Badge of Courage           1895       1       3022
                                                                     2       3036
                                                                     3       3037
                                                                     4       3009
                                                                     5       3006
Joseph Conrad          The Nigger of the “Narcissus”      1897       1       3000
                                                                     2       3000
                                                                     3       2999
                                                                     4       2996
                                                                     5       3014
Richard Harding Davis  A Year from a Reporter’s Notebook  1897       1       3000
                                                                     2       3000
                                                                     3       2999
Jacob Riis             How the Other Half Lives           1890       1       3000
                                                                     2       2992
                                                                     3       3032
Townley Crane          Journalism                         1886       1       1660
                                                                     2       1660
                                                                     3       1658
Stephen Crane          New York City journalism           1894       1       3000
                                                                     2       3000
                                                                     3       3000
Stephen Crane          Shore journalism                   1890–1892  1       2304
                                                                     2       2304
                                                                     3       2306
Stephen Crane          War correspondence                 1897–1898  1       2888
                                                                     2       3447
                                                                     3       3406
Anonymous articles                                        1889–1892  1       1814
                                                                     2       1802
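The merge-and-split preparation of the anonymous articles can be illustrated with a minimal sketch. The function name, the toy article strings and the choice to cut at the exact word midpoint are assumptions made for illustration; the paper does not state how the split point was chosen, and the unequal sample sizes it reports (1814 and 1802 words) suggest the cut was not made at the exact midpoint.

```python
def split_into_halves(articles):
    """Merge article texts and split the word sequence into two near-equal halves.

    Assumption: whitespace tokenization and a cut at the word midpoint.
    """
    words = []
    for text in articles:
        words.extend(text.split())
    mid = (len(words) + 1) // 2  # the first half gets the extra word if the total is odd
    return words[:mid], words[mid:]


# Hypothetical stand-ins for the anonymous articles, not the real texts.
articles = [
    "the beach is alive",
    "hotel proprietors smile copiously",
    "it will be a fine season",
]
first, second = split_into_halves(articles)
print(len(first), len(second))
```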
4.2. STYLOMETRIC METHODOLOGY
A number of studies have recently appeared in which the features used as indicators
are not imposed by the prior judgement of the analyst but are found by straightforward procedures from the texts under scrutiny (see Burrows, 1989, 1992; Binongo,
1994; Burrows and Craig, 1994; Holmes and Forsyth, 1995; Forsyth and Holmes,
1996; Tweedie et al., 1998; Forsyth et al., 1999). Such textual features have been
used not only in authorship attribution but also to distinguish among genres. This
approach involves finding the most frequently used words and treating the rate of
usage of each such word in a given text as a feature. The exact number of common
words used varies by author and application but generally lies between 50 and 75,
the implication being that they should be among the most common in the language,
and that content words should be avoided. Multivariate statistical techniques are
then applied to the vector of occurrence rates to search for patterns.
Each phase of the analysis (see below) employs different text selections, so
only the most frequently occurring non-contextual function words for those particular texts under consideration are used. Special computer software identifies these
words from the corpus of texts and computes their occurrence rates for each
individual text in that corpus.
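The feature extraction described above can be sketched as follows. This is not the specialist software used in the study; the tokenizer, the function names, the per-1,000-token scaling and the optional `exclude` set (a stand-in for the manual removal of content words) are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; straight apostrophes are kept inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_words(texts, n, exclude=()):
    """The n most frequent words across the whole corpus, skipping excluded words."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in tokenize(text) if w not in exclude)
    # Sort by descending count, then alphabetically so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def occurrence_rates(text, vocabulary):
    """Occurrences of each vocabulary word per 1,000 tokens of the text."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    return [1000.0 * counts[w] / len(tokens) for w in vocabulary]


# Hypothetical two-text corpus, far smaller than the samples in the study.
corpus = {
    "A": "the youth looked at the men and the guns of the regiment",
    "B": "the sea and the ship and the crew and the wind",
}
vocab = top_words(corpus.values(), 3)
rates = {name: occurrence_rates(text, vocab) for name, text in corpus.items()}
print(vocab)
print(rates)
```

Each text is thereby reduced to one vector of rates, with one entry per high-frequency word, and these vectors form the rows of the data array passed to the multivariate analysis.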
4.3. HIERARCHY OF ANALYSES
(a) Fiction only: Stephen Crane and Joseph Conrad
The first phase in the investigation was designed to establish the validity of
the technique discussed above, within the context of this research. Known texts
should appear to be internally consistent within author but distinct from those by
other authors. Using the textual samples from Stephen Crane’s The Red Badge of
Courage and Conrad’s The Nigger of the “Narcissus”, the fifty most frequently
occurring words were identified and the occurrence rates of these words used as
input to a principal components analysis. The positions of the samples in the space
of the first two principal components are plotted in Figure 1.
Figure 1 shows that the five Crane text samples are tightly clustered, having
positive values on the first principal component, whereas the five Conrad text
samples all lie to the left of the plot with negative values on the first principal
component. The horizontal axis (PC1) is the dominant axis, explaining 39.2% of
the variation in the original data, with the vertical axis (PC2) explaining only an
additional 15.3%. In looking for patterns, therefore, it is in order to project the
points downwards onto this first axis. We can see which words are highly associated with Crane and Conrad by looking at the associated scaled loadings plot in
Figure 2, which helps to explain the clusterings observed in the main plot. We may
imagine this to be superimposed on top of Figure 1. Words on the right of this
plot such as “himself”, “youth” and “from” have high usages by the author on the
right of the previous plot, namely Crane, while words to the left such as “on”, “up”
and “out” are words favored by Conrad. These plots confirm the validity of the
“Burrows” technique within this context, showing the Crane and Conrad samples
to be clearly distinguishable from each other.
Figure 1. PCA fiction: Crane vs. Conrad.
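A principal components analysis of this kind can be sketched in plain Python. The sketch below is an illustration under stated assumptions, not the authors' procedure: it works on the covariance matrix of the occurrence rates (the paper does not say whether covariances or correlations were used), extracts the first two components by power iteration with deflation, and runs on a small hypothetical data array. The returned component vectors are the unscaled weights on which a loadings plot would be based.

```python
import math


def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    vec = [float(j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # no variance left to extract
            return 0.0, [0.0] * p
        vec = [v / norm for v in nxt]
    image = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    return sum(v * w for v, w in zip(vec, image)), vec  # Rayleigh quotient


def pca_two_components(rows):
    """Scores, component vectors and explained-variance shares for PC1 and PC2."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the word rates (assumption: covariance, not correlation).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total = sum(cov[i][i] for i in range(p))
    components, shares = [], []
    for _ in range(2):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        shares.append(value / total)
        # Deflate: remove the variance already explained by this component.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
              for c in centered]
    return scores, components, shares


# Hypothetical rates for three words in four samples (two per "author").
rows = [[10.0, 2.0, 5.0], [11.0, 1.0, 5.5], [2.0, 10.0, 5.0], [1.0, 11.0, 4.5]]
scores, components, shares = pca_two_components(rows)
for label, (pc1, pc2) in zip(["A1", "A2", "B1", "B2"], scores):
    print(label, round(pc1, 2), round(pc2, 2))
print([round(s, 3) for s in shares])
```

The sign of each component is arbitrary, so only the relative positions of the samples are meaningful: samples by the same hypothetical author fall on the same side of the first axis, and words whose weights share that sign are the ones that author uses more often.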
(b) Genre comparison: Crane’s fiction and journalism
In this phase, we discard the Conrad samples and bring in the textual samples of
Stephen Crane’s journalism both from the shore (labeled S) and from New York
City (labeled N). The samples from The Red Badge of Courage are labeled R.
Using the fifty most frequently occurring words from this corpus, Figure 3 shows
the textual samples plotted in the space of the first two principal components, which
together explain 54.5% of the variation in the original data set.
This plot clearly shows that Crane’s shore journalism differs markedly from his
fiction writing in its use of function words. Projection onto the first principal
component also reveals that his New York City journalism has a style that differs
from his shore journalism but is similar in word usage to the style of his fiction.
Looking at the dates of composition of these textual samples, it is interesting to
note that the New York City journalism is also closer in chronological terms to his
novel than are the textual samples from the shore. It is not impossible, therefore,
that the first principal component may have captured date of composition and not
genre, but the time scale here spans just five years and date of composition may
not be an important factor. The associated scaled loadings plot in Figure 4, which,
again, may be superimposed on Figure 3, tells us that words such as “and”, “is”,
“which”, “of”, “on” and “are” occur more frequently in his shore journalism than
in his other writings.
Figure 2. Scaled loadings plot fiction: Crane vs. Conrad.
(c) Stephen Crane’s journalism
Having noted the stylometric difference between Crane’s New York City journalism and his shore journalism, we can now discard the genre of fiction, which has
served its purpose as a control, and add Crane’s third mode of journalism to the
analysis, namely his war correspondence. Accordingly the three textual samples
obtained from his war dispatches from the Greco-Turkish War (1897) and from the
Spanish-American War (1898) were added to the other samples of his journalism,
and a principal components analysis run on the occurrence rates of the fifty most
frequently occurring words in this corpus, in the usual manner. Figure 5 shows the
samples plotted in the space of the first two principal components, which together
explain 50% of the variation in the data set.
This plot clearly illustrates how even Crane’s non-contextual function words
differ in their rate of usage among the three sub-genres of his journalism, along the
first principal component. Examination of the dates of composition of the textual
samples indicates that this principal component may once again be capturing “time”,
although there is a maximum span of just eight years between his earliest shore
journalism and his latest war correspondence. Clearly, when looking at the disputed
texts in a forthcoming analysis, we must be careful to compare them only against
the appropriate mode of journalism from our known writings and we must also be
aware of possible chronological factors.
Figure 3. PCA Crane: Journalism vs. Fiction.
(d) Journalism controls
We now proceed to the next phase by bringing in the samples of journalistic writing
from Townley Crane, Richard Harding Davis and Jacob Riis, and discarding the
samples of Stephen Crane’s war journalism, which have served their purpose.
By comparing writing styles solely within the genre of journalism, we hope to
add further weight to the validation of the method of analysis. Figure 6 shows
these textual samples plotted in the space of the first two principal components
derived from the occurrence rates of the fifty most frequently occurring words.
The groupings are very evident, the most interesting being the tight clustering of
the three Townley Crane samples (labeled T), which all lie well to the left along
the first principal component, which explains 32.7% of the variation in the original
data set. It is the second principal component, which explains an additional 17.0%
of the variation, that separates out the Davis (labeled D) and Riis (labeled R)
textual samples from the others, although it is hard to distinguish between these
two writers with just three samples from each. Nevertheless, the clear distinction
between Townley’s shore journalism and Stephen’s shore journalism means that
we may now confidently proceed to the final stage of the investigation involving
the anonymous articles from the New Jersey shore.
Figure 4. Scaled loadings plot Crane: Journalism vs. Fiction.
(e) The Crane brothers and the anonymous articles
Having validated the technique on the control samples, we may now focus exclusively on the main task, namely the attribution of the seventeen anonymous articles
in the New-York Tribune, assumed to be from the hand of either Stephen or
Townley Crane. The only textual samples used in this final phase of analysis are
the shore journalism extracts from both Stephen and Townley, and, of course, the
two samples containing the anonymous articles. The samples of Stephen Crane’s
New York City journalism will be discarded, since we are now looking solely at
journalism originating from the shore. These shore textual samples are also closest
in chronological terms to the anonymous articles.
Figure 5. PCA Stephen Crane journalism.
The number of high-frequency function words used in this attributional phase
was maintained at 50. The occurrence rates of these words for the texts under
consideration were computed and, once again, a principal components analysis
conducted on the data array. Figure 7 shows the textual samples plotted in the
space of the first two principal components, which together explain 53.7% of the
variation in the data set.
Projection onto the first principal component in Figure 7 shows the two disputed
samples (labeled D) to be remarkably internally consistent and to lie clearly on the
left of the axis, the “Stephen” side. They do, however, appear to be somewhat
distinctive since they are pulled away by the second principal component (which
explains 16.6% of the variation). It is possible that this distinction in vocabulary between Crane’s previously published shore articles and the newly attributed
articles arises because all of the latter are short news articles, whereas the previously identified pieces include both news reports and several long feature articles
that have a somewhat different generic status.
Since the evidence provided by Figure 7 is not compelling, an alternative
analysis may be made using the technique of cluster analysis. Dendrograms
represent a more reliable depiction of the data since we do not lose a significant proportion of the original variability when using cluster analysis. Figure 8
shows the resulting dendrogram, using the occurrence rates of the 50 words as
raw variables, squared Euclidean distance as the metric and average linkage as the
clustering algorithm.
Figure 6. PCA all journalism controls.
Looking at the clustering, we can see that the two disputed samples first merge
together, then join into the “Stephen” cluster. The “Townley” cluster remains
distinct. The results of the cluster analysis and principal components analysis
are now mutually supportive, confirming the “traditional” attribution of these
seventeen articles to the youthful ironist Stephen Crane.
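The clustering step can be illustrated with a minimal sketch of agglomerative clustering that uses squared Euclidean distance and average linkage, the two settings named above. The function names, the two-dimensional toy vectors and the sample labels are hypothetical; they are chosen only to mimic the pattern described and are not the study's data.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_linkage(labels, vectors):
    """Merge history as (members_a, members_b, distance) tuples, closest pair first."""
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between members of the two clusters.
                total = sum(squared_euclidean(vectors[i], vectors[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((tuple(labels[i] for i in clusters[a]),
                       tuple(labels[i] for i in clusters[b]), dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical samples: S = Stephen, T = Townley, D = disputed.
labels = ["S1", "S2", "S3", "D1", "D2", "T1", "T2", "T3"]
vectors = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.2), (1.6, 1.8), (1.7, 1.7),
           (5.0, 5.0), (5.2, 4.9), (4.9, 5.2)]
merges = average_linkage(labels, vectors)
for left, right, dist in merges:
    print(left, "+", right, round(dist, 3))
```

Reading the merge history from top to bottom corresponds to reading a dendrogram from its leaves upward: in this toy data the two disputed samples join each other first, later join the S cluster, and the T cluster is attached only at the final merge.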
5. Conclusion
The “non-traditional” analysis has supplied objective, stylometric evidence that
supports the “traditional” scholarship on the problem of authorship of these seventeen articles. However, we do not wish to claim that our dual approach to attribution
offers proof positive of Stephen Crane’s authorship of each of the articles; indeed,
we regard such assertions of authorship of disputed texts, in the absence of
conclusive external evidence, as remnants of an outmoded positivist epistemology.
Postmodern inquiry suggests that we be sceptical of truth claims in authorship
attribution. In this, it agrees with poet John Keats, who argued that the mark of
the highest intellect is “negative capability,” the capacity to accept the limits of our
knowledge and to remain in “uncertainties, Mysteries, doubts, without any irritable
reaching after fact and reason” (Rollins, 1958).
Figure 7. PCA journalism and the disputed articles.
Figure 8. Dendrogram Crane brothers and the disputed articles.
A postmodern approach to authorship attribution avoids positivist claims, yet
it need not remain adrift in a sea of signifiers. If, in the absence of definitive
external evidence, no attributional claim can be absolute, some methodologies will
nevertheless be more reliable than others. In blending a traditional approach to the
attribution of these seventeen articles with a non-traditional, stylometric approach,
we agree with the viewpoint of Hänlein (1999), who argues that the most reliable
results in authorship recognition studies take into account both “intuitive” findings – i.e., the traditional scholar’s inherently subjective recognition of an author’s
distinctive style – and computational methods. A sequential approach to attribution is recommended by Rudman (1998), who stresses, “Any non-traditional study
should only be undertaken after an exhaustive traditional study. The non-traditional
is a tool for the traditional authorship scholar, not a proving ground for statisticians
and others to test statistical techniques.” We believe that this joint interdisciplinary
approach should be the way in which attributional research is conducted.
Acknowledgements
Michael Robertson’s research was supported by a FIRSL grant from The College
of New Jersey. David Holmes’ and Roxanna Paez’s research was supported by
the New Jersey Minority Academic Career fellowship program. We wish to thank
Dr Richard Forsyth of the University of Luton, UK, for the use of his specialist
computer software in the analysis phase of this investigation.
References
Altick, R.D. The Art of Literary Research, 3rd edn. New York: Norton, 1981.
Berryman, J. Stephen Crane: A Critical Biography. New York: William Sloane, 1950.
Binongo, J.N.G. “Joaquin’s Joaquinesquerie, Joaquinesquerie’s Joaquin: A Statistical Expression of
a Filipino Writer’s Style”. Literary and Linguistic Computing, 9 (1994), 267–279.
Bowers, F., ed. Tales, Sketches and Reports. Vol. 8 of The University of Virginia Edition of the Works
of Stephen Crane. Charlottesville: University Press of Virginia, 1973.
Burrows, J.F. “ ‘An Ocean Where each Kind . . .’: Statistical Analysis and Some Major Determinants
of Literary Style”. Computers and the Humanities, 23 (1989), 309–321.
Burrows, J.F. “Not Unless You Ask Nicely: The Interpretive Nexus Between Analysis and Information”. Literary and Linguistic Computing, 7 (1992), 91–109.
Burrows, J.F. and D.H. Craig. “Lyrical Drama and the ‘Turbid Mountebanks’: Styles of Dialogue in
Romantic and Renaissance Tragedy”. Computers and the Humanities, 28 (1994), 63–86.
Elconin, V.A. “Stephen Crane at Asbury Park”. American Literature, 20 (1948), 275–289.
Erdman, D.V. and E.G. Fogel, eds. Evidence for Authorship: Essays on Problems of Attribution.
Ithaca: Cornell University Press, 1966.
Forsyth, R.S. and D.I. Holmes. “Feature-Finding for Text Classification”. Literary and Linguistic
Computing, 11 (1996), 163–174.
Forsyth, R.S., D.I. Holmes and E.K. Tse. “Cicero, Sigonio and Burrows: Investigating the Authenticity of the ‘Consolatio’ ”. Literary and Linguistic Computing, 14 (1999), 1–26.
Hänlein, H. Studies in Authorship Recognition – A Corpus-based Approach. European University
Studies, Series XIV, Vol. 352. Frankfurt am Main: Peter Lang, 1999.
Holmes, D.I. “The Evolution of Stylometry in Humanities Scholarship”. Literary and Linguistic
Computing, 13 (1998), 111–117.
Holmes, D.I. and R.S. Forsyth. “The ‘Federalist’ Revisited: New Directions in Authorship Attribution”. Literary and Linguistic Computing, 10 (1995), 111–127.
Kwiat, J.J. “The Newspaper Experience: Crane, Norris, and Dreiser”. Nineteenth-Century Fiction, 8
(1953), 99–117.
Mosteller, F. and D.L. Wallace. Applied Bayesian and Classical Inference: The Case of the Federalist
Papers. Reading, MA: Addison-Wesley, 1964.
O’Donnell, B. “Stephen Crane’s ‘The O’Ruddy’: A Problem in Authorship Discrimination”. In The
Computer and Literary Style. Ed. Jacob Leed. Kent, OH: Kent State University Press, 1966.
Robertson, M. Stephen Crane, Journalism, and the Making of Modern American Literature. New
York: Columbia University Press, 1997.
Rollins, H.E., ed. The Letters of John Keats, Vol. 1. Cambridge: Harvard University Press, 1958.
Rudman, J. “Non-Traditional Authorship Attribution Studies in the Historia Augusta: Some
Caveats”. Literary and Linguistic Computing, 13 (1998), 151–157.
Tweedie, F.J., D.I. Holmes and T.N. Corns. “The Provenance of ‘De Doctrina Christiana’, Attributed
to John Milton: A Statistical Investigation”. Literary and Linguistic Computing, 13 (1998), 77–87.
Wertheim, S. and P. Sorrentino, eds. The Correspondence of Stephen Crane, 2 Vols. New York:
Columbia University Press, 1988.
Wertheim, S. and P. Sorrentino. The Crane Log: A Documentary Life of Stephen Crane. New York:
G. K. Hall, 1994.
Williams, A.W. and V. Starrett. Stephen Crane: A Bibliography. Glendale, CA: John Valentine, 1948.