
Speaking My Mind
Robo-Grading and Writing Instruction: Will the Truth Set Us Free?
Dave Perrin
Rochelle Township High School
Rochelle, Illinois
[email protected]
When I was a graduate student in English literature in the mid-1990s, I wrote a paper on Kenneth
Branagh’s (then recent) production of Much Ado
about Nothing. My chosen topic was discrepancies
between Shakespeare’s text and the seduction scene
involving Borachio and Margaret as portrayed in
Branagh’s film. When the professor returned the
paper to me, I noticed that he had commented on
the front page something to the effect that mine
was one of the most finely written papers he had
received. This comment is not why I remember
the paper. His comment on the back of the paper
is what has stuck with me all these years. The comment began with a question: “So what?” He went
on to posit, quite rightly, that my thesis did nothing to shed light on our understanding of Shakespeare or his play. My essay was, itself, “much ado
about nothing” because although well written, it
was vacuous and void of substantial critical insight.
I was reminded of this incident last year as
I read of a competition sponsored by the William
and Flora Hewlett Foundation to see how algorithms could be developed to assess student writing. Hewlett’s education program director, Barbara
Chow, concluded, “We had heard the claim that the
machine algorithms are as good as human graders,
but we wanted to create a neutral and fair platform
to assess the various claims of the vendors. It turns
out the claims are not hype” (qtd. in Stross). Earlier
that year, a University of Akron study found that an
analysis of 16,000 middle school and high school
essay tests previously graded by humans indicated that, as the researchers stated, robo-graders
“achieved virtually identical levels of accuracy,
with the software in some cases proving to be more
reliable” (qtd. in Bienstock). Dr. Mark Shermis,
Akron’s College of Education dean, conceded that
“automatic grading doesn’t do well on very creative
kinds of writing. But this technology works well
for about 95 percent of all the writing that’s out
there” (qtd. in Bienstock).
Regardless of the claims of the proponents
of robo-graders, the clear winners in the standards movement thus far have been those in the
billion-dollar-a-year testing industry. Within the
language arts curriculum, the standards movement has until now focused almost exclusively on reading and its concomitant multiple-choice testing, but as the Common Core
and other movements place renewed emphasis on
writing and critical thinking, the test-prep industry will undoubtedly keep churning out “breakthroughs” in these areas of composition curriculum
and assessment.
There are skeptics. Les Perelman, a retired
director of writing at the Massachusetts Institute
of Technology (MIT), has set out to expose the
flaws and foibles of robo-grading. After studying
the Educational Testing Service’s e-Rater, Perelman
found that the qualities e-Rater privileges as "good" writing (sentence structure; the length of words, sentences, and paragraphs; and conjunctive adverbs) serve as its evidence of complexity of thought. Where it fails is in recognizing facts, logic, and truth.
“E-Rater doesn’t care if you say the War of 1812
started in 1945,” Perelman was quoted as saying in
a 2012 New York Times column by Michael Winerip.
In fact, Perelman has manufactured essays based on
obviously faulty premises and received e-Rater’s top
score of six. One such argument reads, “The average
teaching assistant makes six times as much money
as college presidents. In addition, they often receive
a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures” (qtd. in Winerip). While the e-Rater fails to
grasp the obviously faulty logic of this argument, the
indisputable advantage of the robo-grader over the
classroom teacher is speed. E.T.S. researcher David
Williamson boasts that e-Rater can grade 16,000
essays in 20 seconds (qtd. in Winerip).1
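To see concretely what Perelman is describing, consider how little such surface features require. The sketch below is a hypothetical illustration in Python, not ETS's actual e-Rater; the feature list, weights, and function names are invented for demonstration, but they mirror the qualities Perelman says the software privileges.

```python
import re

# Conjunctive adverbs of the sort Perelman says robo-graders reward.
CONJUNCTIVE_ADVERBS = {"however", "moreover", "therefore", "furthermore",
                       "consequently", "nevertheless", "thus", "hence"}

def surface_features(essay):
    """Compute shallow features; nothing here checks facts, logic, or truth."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z'-]+", essay)
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "conjunctive_adverbs": sum(w.lower() in CONJUNCTIVE_ADVERBS
                                   for w in words),
    }

def toy_score(essay):
    """Map features to a 1-6 score with made-up weights (purely illustrative)."""
    f = surface_features(essay)
    raw = (0.01 * f["word_count"] + 0.1 * f["avg_sentence_length"]
           + 0.5 * f["conjunctive_adverbs"])
    return max(1, min(6, round(raw)))

# A factually absurd claim still earns credit on every feature measured:
print(toy_score("The War of 1812, which began in 1945, was, moreover, "
                "a consequential conflict; therefore, historians agree."))
```

Nothing in such a function knows when the War of 1812 began; lengthening sentences and sprinkling in "moreover" raises the score, which is precisely Perelman's complaint.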
A more recent target of Perelman’s ire is EdX,
a nonprofit joint venture of Harvard and MIT that offers online courses and promotes essay-grading software as an integral part of its instructional platform.
EdX has partnered with other institutions, including Stanford. Grading software is also being used
by other “massive open online courses” (MOOCs),
including two start-ups founded by Stanford faculty members. Daphne Koller, a computer scientist and founder of one of these MOOCs, Coursera,
says of grading software, “It allows students to get
immediate feedback on their work, so that learning
turns into a game, with students naturally gravitating toward resubmitting the work until they get it
right” (qtd. in Markoff). Perelman states that his
major concern with the EdX software is that it “did
not have any valid statistical test comparing the
software directly to human graders” (qtd. in Markoff). Perelman has joined an online petition to stop
robo-grading called Professionals Against Machine
Scoring of Student Essays in High-Stakes Assessment, which states flatly in its mission, “Computers
cannot ‘read’” (qtd. in Markoff). In spite of its critics, EdX has aligned itself with a dozen universities
and expects to expand its influence even further.
So what?
What’s at Stake in Robo-Grading?
What is most vitally at stake here is not writing
assessment or the validity of standardized writing tests. What is most at stake is instruction. In
the standards movement, testing drives instruction because the stakes are so high for teachers and
schools. When facts, logic, and truth become dispensable in the assessment of writing, then writing
instruction, ostensibly, will become focused solely on
the mechanics of writing. So much for the short-lived
return to critical thinking that the Common Core
State Standards initiative promises to bring back to
the English curriculum. Although the e-Rater and
its brethren may not be interested in the truth, the
truth is that writing teachers always have been.
Good writing teachers encourage fact checking,
ensure that material is asserted and quoted within
its intended context, challenge the logical assumptions of student writing, make personal connections,
encourage a student’s development of voice and linguistic ingenuity, and perform a thousand other tasks
in responding to student writing that are simply
above the pay grade of robo-graders. But will teachers continue to offer rich responses in a high-stakes
testing world proliferated with robo-graders, or will
they succumb to the pressure to teach to a test that so
narrowly defines what “good” writing is?
As a writing teacher for the past 20 years, I
have often found myself borrowing my professor’s
“So what?” question, using it as a jumping-off point
for my responses to student writing whenever a student fails to make a logical connection or, as I did in
my Much Ado paper, labors for pages on a banality.
As writing teachers know, this type of critical commentary is necessary in teaching students to be critical thinkers. Moreover, “good” writing is always
subjective. English teachers are notorious for their
pet peeves and personal opinions of what is “good.”
Over time, astute student writers will collect these hallmarks of good writing from various teachers, stack them against one another and their own, reject some and embrace others, and eventually develop their own style and criteria for good writing. The adoption of such a narrowly defined concept of writing in which, for instance, each sentence in a student essay must be at least 15 words long or contain a conjunctive adverb, threatens this process. As Patricia
O’Connor points out in the May 2012 English Journal, “I know from my teaching experience that the
nature of writing is not as linear as the data miners would lead us to believe. Learning how to write
well is a complex, recursive process, and despite
years of research, it still resists being broken neatly
into discrete segments for assessment” (105).
Writing teachers like O’Connor develop what
Elliot Eisner refers to as “connoisseurship,” or “the
art of appreciation” (215). Of educational connoisseurship, in general, Eisner states, “One must have
a great deal of experience with classroom practice to
be able to distinguish what is significant about one
set of practices or another” (216). Writing teachers
develop this connoisseurship of the practices of writing and writing instruction over time through their
own reading, writing, and interaction with other
readers and writers, as well as through thoughtful
reflection on the processes of writing and the teaching of writing. The proponents of robo-grading laud
it precisely because it provides some sort of objective quantification of writing, but writing teachers
know that a certain degree of subjectivity is inescapable, and indeed even essential to the assessment
of writing, as the self cannot be removed from the
act of reading (or grading) any more than it can be
removed from the act of writing. Students must be
taught to read and write in a world where facts matter, where logic is challenged, and where the “truth”
is often not only subjective but also subject to nearly
inscrutable nuances. In short, they must be taught to
write for diverse audiences, not algorithms.
Editor’s Note
1. For more information, see the “NCTE Position Statement on Machine Scoring: Machine Scoring Fails the Test,”
prepared by the NCTE Task Force on Writing Assessment and
adopted by the NCTE Executive Committee in April 2013
(http://www.ncte.org/positions/statements/machine_scoring).
Works Cited
Bienstock, Jordan. “Grading Essays: Human vs. Machine.” Schools of Thought. CNN.com. 10 May 2012. Web. 7 July 2012. <http://schoolsofthought.blogs.cnn.com/2012/05/10/grading-essays-human-vs-machine/>.
Eisner, Elliot. The Educational Imagination. 3rd ed. Upper
Saddle River: Prentice, 1994. Print.
Markoff, John. “Essay-Grading Software Offers Professors a Break.” New York Times. 4 Apr. 2013. Web. 14 Apr. 2013. <http://www.nytimes.com/2013/04/05/science/new-test-for-computers-grading-essays-at-college-level.html>.
O’Connor, Patricia. “Waving the Yellow Flag on Data-Driven Assessment of Writing.” English Journal 101.5 (2012): 104–05. Print.
Stross, Randall. “The Algorithm Didn’t Like My Essay.” New York Times. 9 June 2012. Web. 12 June 2012. <http://www.nytimes.com/2012/06/10/business/essay-grading-software-as-teachers-aide-digital-domain.html>.
Winerip, Michael. “Facing a Robo-Grader? Just Keep Obfuscating Mellifluously.” New York Times. 22 Apr. 2012. Web. 7 July 2012. <http://www.nytimes.com/2012/04/23/education/robo-readers-used-to-grade-test-essays.html>.
Dave Perrin has been teaching high school literature and composition for more than 20 years. He currently teaches dual-credit composition courses at Rochelle Township High School through Kishwaukee College. He holds a master’s degree in English literature as well as a doctoral degree in curriculum leadership, both from Northern Illinois University.
The Teacher
He said This line is loose, and tugged
at it and the whole poem began to unravel,
and Why this word and not another?,
and the foundation seemed about to
crumble as if a brick had been removed.
Then he said Ah, this line is good, and
the ground steadied again, and What an interesting
image, and an oak sprouted from the ground.
—Matthew J. Spireng
© 2013 by Matthew J. Spireng
Matthew J. Spireng’s book What Focus Is was published in 2011 by Word Press. His book Out of Body won the 2004
Bluestem Poetry Award and was published in 2006 by Bluestem Press. He has several chapbooks including Clear Cut, a
signed and numbered limited edition of his poems paired with photographs by Austin Stracke on which the poems are based.
Email him at [email protected].