
TEST-RETEST AND INTER-ANALYST RELIABILITY
OF THE AUTOMATED READABILITY INDEX,
FLESCH READING EASE SCORE, AND THE FOG COUNT
Georgelle Thomas^a
Georgia Southern College

R. Derald Hartley
South Carolina Vocational Rehabilitation Department

J. Peter Kincaid^b
Georgia Southern College
Abstract. Using six analysts, test-retest and inter-analyst reliabilities were
determined for the Automated Readability Index (ARI), the Flesch Reading Ease
Score, and the Fog Count. All coefficients, with the exception of one Flesch
measure, were above .94.
Analysis of variance applied to measured working times indicated that the
Flesch takes significantly longer to use than the ARI and Fog.
A number of readability formulas have been designed to assess the difficulty
level of written material. Generally, readability formulas include at least two
components: (1) some measure of sentence difficulty (usually sentence length) and
(2) some measure of word difficulty. These two measures are typically put into a
regression equation which assigns a numerical value denoting ease or difficulty of
the material in the form of a total formula score or a grade level.
The Flesch formula (Flesch, 1948; Farr & Jenkins, 1949; Flesch, 1951) and
the Fog Count (Gunning, 1952) have been widely used in the last two decades.
More recently, a new readability formula, the Automated Readability Index (ARI)
has been introduced (Smith & Senter, 1967). With the ARI, reading material is
typed on a slightly modified IBM Selectric typewriter, the Readability Index
Tabulator.^c The modification consists of attaching three microswitches which
cumulatively record the formula factors: number of strokes (recorded each time the
ball of the typewriter goes forward), number of words (recorded each time the
space-bar is pressed), and number of sentences (recorded each time the = is
pressed). This last operation, recording sentences, is the primary change the typist
must make in his usual typing technique; he is simply instructed to punctuate the
sentence as usual and follow each end-of-sentence punctuation mark with a =.

^a Reprints may be requested from Dr. Thomas, Department of Psychology, Division of
Social Sciences, Georgia Southern College, Statesboro, Georgia 30458.

^b This research was supported by Grant #OEG-4-71-0069 from the United States Office
of Education, Department of Health, Education, and Welfare, and by a Georgia Southern
College Faculty Research Committee Grant.

^c A complete description, pictures, and a wiring diagram are in the appendix of the
Kincaid, Van Duesen, Thomas, Lewis, Anderson and Moody (1972) report.
The ARI prediction equation is as follows:
GL = 0.50 (wd/sn) + 4.71 (st/wd) - 21.43

where  GL    = assigned grade level
       wd/sn = words per sentence (sentence length)
       st/wd = strokes per word (word length)
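As an illustration of how the recorded factors enter the equation, the following sketch (ours, not the Tabulator or the authors' FORTRAN IV program; the counting rules are simplifying assumptions) counts whitespace-delimited words, non-space characters as a stand-in for strokes, and end-of-sentence marks, and then applies the prediction equation above.

```python
def ari_grade_level(passage: str) -> float:
    """Apply GL = 0.50(wd/sn) + 4.71(st/wd) - 21.43 to a typed passage.

    Simplified counting assumptions (the Tabulator records key strokes,
    space-bar presses, and "=" presses as the passage is typed):
      strokes   -- non-space characters
      words     -- whitespace-delimited tokens
      sentences -- end-of-sentence punctuation marks (. ! ?)
    """
    words = passage.split()
    wd = len(words)
    st = sum(len(w) for w in words)                 # non-space characters
    sn = sum(passage.count(p) for p in ".!?") or 1  # guard against zero sentences
    return 0.50 * (wd / sn) + 4.71 * (st / wd) - 21.43

# Example: a short two-sentence passage.
print(round(ari_grade_level("The cat sat on the mat. It purred softly."), 1))
```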
There are two published validation studies of the ARI (Smith & Kincaid, 1970;
Kincaid & Delionbach, 1973), which used military technical material with military
personnel serving as subjects. Kincaid et al. (1972) validated the ARI using adult
basic reading material and subjects enrolled in a federally sponsored job training
program.
Reliability studies of the Flesch Reading Ease Score (Hayes, Jenkins, &
Walker, 1950; England, Thomas, & Paterson, cited in Klare, 1963) indicated
test-retest and between-analyst reliabilities of about +.90. Kincaid et al. (1972)
reported very high test-retest reliabilities for the ARI formula factors (each
exceeding +.98). A search of the literature failed to locate reliability studies on the
Fog Count.
The present investigators obtained reliability coefficients for the ARI, the
Flesch Reading Ease Score, and the Fog Count. Test-retest and inter-analyst
reliabilities were determined for the separate formula factors, total formula scores,
and grade level. Additionally, time measures were recorded for each analyst on all
three formulas.
METHOD
Subjects
Six paid female volunteers served as analysts. They were selected on the basis
of a standardized typing test. Group mean typing speed was 56.7 words per minute
with a mean error rate of .57 errors per minute.
Materials
The written material consisted of 20 paragraphs of the Minnesota Reading
Examinations for College Students, Forms A and B (Haggerty & Eurich, 1930).
Instructions for calculating all formulas were compiled and edited for
simplicity and clarity. A combined tabulation and computational sheet was devised
for each formula.
The Readability Index Tabulator and electronic calculators were the only
other materials.
Procedure
Prior to the study, the analysts attended a training session in which they were
familiarized with all three readability formulas, and were given practice in applying
the formulas, using the tabulation sheets, the Readability Index Tabulator and the
calculator.
In the study, analysts worked independently. Each analyst was given the 20
paragraphs and 20 tabulation sheets and worked with a particular formula until all
20 selections were completed. First, analysts examined (either manually or with the
Readability Index Tabulator) each selection for the formula factors required to
compute the total score or grade level. When these values were recorded on a
tabulation sheet, that sheet was given to the examiner who recorded the counting
time spent on that paragraph. After the 20 selections were completed in this
manner, all the tabulation sheets were returned to the analysts so that the necessary
mathematical computations could be completed. As computations for each
paragraph were completed, the tabulation sheets were returned to the examiner
who recorded computation time.
The analysts completed all three formulas in the above manner. Order of
presentation of the three readability formulas to each analyst was determined by a
table of random numbers.
After an interval of two weeks, the analysts reapplied all the formulas in the
same order as the first session.
RESULTS
The test-retest reliability coefficients are found in Table 1.
Inter-analyst reliability was determined by the intraclass procedure described
by Guilford (1954); this statistic yields an average intercorrelation based on all
possible pairs of analysts. Results are in Table 2.
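As an illustration of the averaging idea (a sketch only; Guilford's intraclass statistic is actually obtained from an analysis of variance, and the scores below are hypothetical), the following fragment averages the Pearson correlations over all possible pairs of analysts.

```python
from itertools import combinations
from statistics import correlation, mean  # statistics.correlation needs Python 3.10+

def inter_analyst_r(scores_by_analyst):
    """Average Pearson r over all possible pairs of analysts.

    scores_by_analyst: one equal-length list per analyst, giving that
    analyst's value of a formula factor for each passage.
    """
    pairs = combinations(scores_by_analyst, 2)
    return mean(correlation(a, b) for a, b in pairs)

# Hypothetical data: three analysts' sentence counts for five passages.
analysts = [
    [6, 9, 7, 11, 8],
    [6, 9, 8, 11, 8],
    [5, 9, 7, 12, 8],
]
print(round(inter_analyst_r(analysts), 3))
```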
Total time data are presented in Table 3.
An analysis of variance was performed on the total time data with Test-Retest
and Formula as factors. A significant Test-Retest factor, F(1,25) = 16.39, p < .01,
indicates the expected practice effect in the use of the formulas. The Formula
factor result, F(2,25) = 21.18, p < .01, was further examined by a Duncan's New
Multiple Range Test. Significant differences (p < .01) were found between the ARI
and the Flesch, and between the Fog Count and the Flesch.
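For readers who wish to repeat this kind of analysis, a minimal sketch follows (an illustration only: the data values, the reduced set of three analysts, and the error term are assumptions, not the study's records). It fits Test-Retest occasion and Formula as factors on long-format time data, with analysts entering as a blocking factor.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data: one total-time observation (minutes)
# per analyst x formula x occasion (test vs. retest).
data = pd.DataFrame({
    "analyst":  ["A1", "A1", "A2", "A2", "A3", "A3"] * 3,
    "formula":  ["ARI"] * 6 + ["Flesch"] * 6 + ["Fog"] * 6,
    "occasion": ["test", "retest"] * 9,
    "minutes":  [34, 29, 36, 30, 33, 28,     # ARI
                 62, 47, 65, 49, 60, 44,     # Flesch
                 44, 28, 46, 30, 43, 27],    # Fog
})

# Two-factor ANOVA on time; analysts are treated as a blocking factor here.
model = ols("minutes ~ C(occasion) + C(formula) + C(analyst)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```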
Table 1

Pearson Product Moment Test-Retest Correlations
for ARI, Flesch Reading Ease Score, and Fog Count

Readability Measure                       Correlation Coefficient

ARI
  Strokes                                 .990
  Words                                   .994
  Sentences                               .991
  Total ARI                               .987
  Grade Level                             .989

Flesch Reading Ease Score
  Syllables                               .983
  Words                                   .997
  Sentences                               .945
  Total Flesch Reading Ease Score         .795

Fog Count
  Easy Elements                           .977
  Polysyllables                           .980
  Sentences                               .958
  Total Fog Count                         .941
  Grade Level                             .963


Table 2

Inter-Analyst Correlations
for ARI, Flesch Reading Ease Score, and Fog Count

Readability Measure                       Correlation Coefficient

ARI
  Strokes                                 .994
  Words                                   .999
  Sentences                               .998
  Total ARI                               .987
  Grade Level                             .998

Flesch Reading Ease Score
  Syllables                               .995
  Words                                   .999
  Sentences                               .979
  Total Flesch Reading Ease Score         .969

Fog Count
  Easy Elements                           .988
  Polysyllables                           .995
  Sentences                               .986
  Total Fog Count                         .990
  Grade Level                             .994
Table 3

Summary Table of Total Time Measures (in Minutes)
of Test-Retest and Formula Factors

                              Formula
                     Fog      ARI      Flesch    Total Time    X̄ (Per Passage)

Test                 44.2     34.0      62.2       140.4            9.3
Retest               28.1     29.2      46.8       104.1            6.9

Total Time           72.2     63.3     109.0
X̄ (Per Passage)       7.2      6.3      10.9
DISCUSSION
The test-retest and inter-analyst reliability coefficients indicate the consistently
high reliability of all three formulas. Only one r (the Flesch Reading Ease Score
test-retest r = .795) was below .94. This lower r might have resulted from the
comparatively more difficult application of the Flesch regression equation and,
hence, a larger practice effect between the two sessions.
Analysis of variance of total time measures revealed that the Flesch formula
takes significantly longer to use than the ARI and the Fog Count. It is entirely
possible that the use of one of the two available Flesch charts, the Farr-Jenkins
Table (Farr & Jenkins, 1949) or the Flesch Nomograph (Flesch, 1951) would
decrease total time required for computation. The total application time for the
ARI is, of course, dependent upon the efficient use of a typewriter. The analysts in
the present study were fairly skilled (56.7 words per minute, with a per-minute error
rate of .57). It appears, then, that the ARI can be applied about as rapidly as the
Fog Count if the typist achieves a speed of 55-60 words a minute.
Aside from time considerations, the ARI has several advantageous qualities.
As pointed out by Smith and Kincaid (1970), the Tabulator accomplishes two
things simultaneously: it provides a typed copy of the material, and it automatically
tabulates the necessary formula factors, thus eliminating the need for manual
counting.
All three formulas have been programmed for computer use, thus permitting
the evaluation of larger quantities of material by a reduced number of personnel. Of
the formulas, the ARI lends itself best to computer programming as strokes, words,
and sentences can be perfectly counted by a computer. The report by Kincaid et al.
(1972) contains a FORTRAN IV program for the ARI. Both the Fog Count and
Flesch Reading Ease Score require the counting of syllables and this is somewhat
more difficult to program for computer computation. However, Klare, Rowe, St.
John, and Stolurow (1969) reported a program for the Flesch formula that counts
syllables with over 99% accuracy. Because any readability formula gives only an
indication of reading difficulty, this accuracy is quite satisfactory. These same
investigators (Klare et al., 1969) also developed a computer program for the Fog
Count. Coke and Rothkopf (1970) have developed three algorithms for estimating
syllables from vowels per word, consonants per word, and letters per word. The
vowels-per-word algorithm gives the best estimate, and this can be used to calculate
Flesch Reading Ease scores.
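As an illustration of the vowels-per-word idea (a rough sketch only, not Coke and Rothkopf's algorithm nor the Klare et al. program), the fragment below estimates syllables by counting groups of consecutive vowels and enters the counts into the Flesch Reading Ease equation, RE = 206.835 - 1.015 (words per sentence) - 84.6 (syllables per word).

```python
import re

def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels (incl. y)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(passage: str) -> float:
    """RE = 206.835 - 1.015 (words/sentence) - 84.6 (syllables/word)."""
    words = re.findall(r"[A-Za-z']+", passage)
    sentences = len(re.findall(r"[.!?]+", passage)) or 1
    syllables = sum(estimate_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("The cat sat on the mat. It purred softly."), 1))
```

Vowel-group counting miscounts words with silent vowels, which is why the reported accuracy of such programs, rather than exactness, is the relevant criterion.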
REFERENCES
COKE, E. V., & ROTHKOPF, E. Z. Note on an algorithm for a computer-produced
reading ease score. Journal of Applied Psychology, 1970, 54, 208-210.
FARR, J. N. & JENKINS, J. J. Tables for use with the Flesch Readability Formula.
Journal of Applied Psychology, 1949, 33, 275-278.
FLESCH, R. A new readability yardstick. Journal of Applied Psychology, 1948, 32,
221-233.
FLESCH, R. How to test readability. New York: Harper & Brothers, 1951.
GUILFORD, J. P. Psychometric methods. New York: McGraw-Hill, 1954.
GUNNING, R. The technique of clear writing. New York: McGraw-Hill, 1952.
HAGGERTY, M. E., & EURICH, A. C. Minnesota reading examinations for college
students, Form A and Form B. Minneapolis, Minnesota: University of
Minnesota Press, 1930.
HAYES, P. M., JENKINS, J. J., & WALKER, B. J. Reliability of the Flesch
Readability Formulas. Journal of Applied Psychology, 1950, 34, 22-26.
KINCAID, J. P. & DELIONBACH, L. J. Validation of the Automated Readability
Index: A follow-up, Human Factors, 1973, 15, 17-20.
KINCAID, J. P., VAN DUESEN, J., THOMAS, G., LEWIS, R., ANDERSON, P. T.
and MOODY, L. Use of the Automated Readability Index for evaluating
peer-prepared material for use in adult reading education. OEG-4-71-0069.
Statesboro, Georgia: Georgia Southern College, 1972 (ERIC file # ED-068814).
KLARE, G. R. The measurement of readability. Ames, Iowa: Iowa State University
Press, 1963.
KLARE, G. R., ROWE, P. P., ST. JOHN, M. G., & STOLUROW, L. M. Automation
of the Flesch "Reading Ease" Readability Formula, with various options.
Reading Research Quarterly, 1969, 4, 550-559.
SMITH, E. A. & KINCAID, J. P. Derivation and validation of the Automated
Readability Index for use with technical materials. Human Factors, 1970, 12,
457-464.
SMITH, E. A., & SENTER, R. J. Automated Readability Index. AMRL-TR-66-22.
Wright Patterson AFB, Ohio: Aerospace Medical Division, 1967.