
International Comparisons and
the Condition of American Education
GERALD W. BRACEY
Educational Researcher, Vol. 25, No. 1, pp. 5-11
JANUARY-FEBRUARY 1996
GERALD W. BRACEY is an educational researcher, writer, and Executive Director of The Alliance for Curriculum Reform, 3333 Helen Street, Alexandria, VA 22305. His areas of interest are assessment and policy analysis.
Downloaded from http://er.aera.net at PENNSYLVANIA STATE UNIV on September 12, 2016
International comparisons of achievement levels across various curriculum subject areas have received a considerable amount of publicity. Indeed, some argue that trends on domestic achievement tests are no longer relevant because the salient reference groups are students in other nations. In articles outside of educational research, the reports on international comparisons have accentuated the negative, even when the results have been positive. For instance, the November 1992 issue of The American School Board Journal carried an unbylined report on How in the World Do Students Read? under the headline, "Good News: Our 9-Year-Olds Read Well; Bad News: Our 14-Year-Olds Don't." In fact, as discussed in more detail below, 14-year-olds finished eighth among 31 nations, and only one of the seven countries ahead of the United States actually had significantly higher scores.
On the pages of educational research journals, such comparisons have also been the subject of debate: Is the United States an underachiever in comparison to Japan or not? (Baker, 1993; Westbury, 1992, 1993). Stedman (1994a) addressed the issue in a more balanced manner than most commentators, but even his treatment detailed problems while giving short shrift to positive results (the reading study mentioned previously received only a one-sentence mention, for instance).
As noted, though various researchers have made various interpretations of the studies, elsewhere there has been little ambiguity about what the results show:
    American students are performing at much lower levels than students in other industrialized nations (Shanker, 1992a).
    International examinations designed to compare students from all over the world usually show American students at or near the bottom (Shanker, 1992b).
    Yet even America's best high school students, as international comparisons reveal, rank far behind students in countries challenging us in the multinational marketplace (Gray & Kemp, 1993).
    In mathematics and science, American high schoolers finish last or next to last in virtually every international measure (Gerstner, Semerad, Doyle, & Johnston, 1994).
    We have gotten used . . . to coming in dead last in international math comparisons (Krauthammer, 1994).
    An 'F' in International Competition (Newsweek, 1992).
Given that the results of at least one study were debated with vigor, given the importance that international studies have assumed, and given the strong negative pronouncements emanating from a variety of sources in the general media, it seems appropriate to summarize the findings of these various reports.
Because some have challenged the methodology of early studies (Rotberg, 1990), I begin with the Second International Mathematics Study (SIMS; 1987). This study is not without its problems, especially at the 12th grade. At this level, SIMS administered tests of advanced algebra and calculus, but only to those students still taking science and advanced mathematics. Such an attempt to measure apples against apples fails because the proportion of students in the mathematics classes varies enormously among countries. For instance, in Hong Kong only 2% of the students were still enrolled in mathematics courses, whereas in Japan 12% were and in Hungary 50% were. This difference no doubt accounts for changes in the relative rankings from 8th grade to 12th grade. Hong Kong moved from an average ranking in Grade 8 to number 1 in Grade 12, whereas Hungary fell from number 2 to near the bottom. Japan finished number 1 at the 8th-grade level, number 2 at the 12th-grade level. Such vagaries cloud the meaning of different countries' average scores.
Such selection biases are likely one major reason why no study since SIMS has attempted to measure achievement at the senior high school level (the Third International Mathematics Study will try). Another is that students at this level, in this country at least, have no reason to perform well on a test that has no relevance to their life. Such omissions carry their own risks, of course. Because of tracking programs in the high school, curricula may differ; for some nations, eighth grade is the last time that all students in a cohort will be studying the same material.
The United States did not finish "dead last" in any of the comparisons. At the 12th grade, U.S. students were 14th of 15 nations in advanced algebra, 12th of 15 in geometry, and 12th of 15 in calculus. These finishes are quite in line with the SIMS measurement of Opportunity To Learn (OTL).
Stedman (1994b) makes much of the SIMS comparisons of the top 1% and 5% of 12th-grade students in calculus, thinking, incorrectly, that this equated nations for selectivity. Says Stedman, "The researchers found that the United States ranked last or next to last in each analysis, scoring significantly lower than many countries." In fact, no comparisons of significance were made. Although the U.S. did achieve the low ranks Stedman claims, the U.S. top 1% scored at 61 and the top-ranked Japanese scored at 70. Differences among the scores of the top 5% in various countries were also small except in comparison to those of the Japanese.
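The selection problem in the 12th-grade SIMS samples can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every country has the same normal distribution of mathematics achievement and that only the strongest students remain enrolled; neither assumption comes from SIMS itself, and the function name `mean_of_top_fraction` is hypothetical. Under those assumptions, the enrollment shares quoted earlier (2%, 12%, and 50%) are enough to produce large gaps in the tested groups' averages.

```python
from statistics import NormalDist


def mean_of_top_fraction(p, mu=0.0, sigma=1.0):
    """Mean of the top fraction p of a normal(mu, sigma) population.

    Uses the standard truncated-normal result E[X | X > cutoff] =
    mu + sigma * pdf(z) / p, where z is the cutoff in standard units.
    """
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)  # cutoff that leaves a fraction p above it
    return mu + sigma * std.pdf(z) / p


# Enrollment shares quoted in the text for 12th-grade mathematics.
# ASSUMPTION: identical underlying populations in every country.
shares = {"Hong Kong": 0.02, "Japan": 0.12, "Hungary": 0.50}

for country, p in shares.items():
    mean_sd = mean_of_top_fraction(p)
    print(f"{country}: top {p:.0%} averages {mean_sd:+.2f} SD above the population mean")
```

Even with identical populations, the 2% group would average well over a standard deviation above the 50% group in this sketch, which is one way to see why rankings built on such differently selected samples say little about school systems.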
At the eighth-grade level, U.S. students finished in the middle in arithmetic (10th of 20 nations), 12th of 20 in algebra, 16th of 20 in geometry, and 18th of 20 in measurement. Many countries have quite similar scores. For instance, in geometry U.S. students averaged 38% correct to finish in 16th place, whereas Scotland attained a 4th-place ranking with an average of 44% correct. America's average score of 51 in arithmetic garnered 10th place, whereas an average of 59 would have meant a third-place finish. One wonders if the differences among the averages are suitably large to form a basis for making policy decisions about the state of education in any nation. As will be seen later, average scores are also quite misleading given the enormous variability about the average.
The slightly below-average finish in algebra seems unusually high given that most eighth-grade students in this country do not take algebra as a formal course and most students in other nations do. Indeed, Westbury has contended that only the comparison of arithmetic scores is a like-against-like comparison, at least when looking at American scores versus those of the Japanese, because the curricular sequences differ so greatly in other curricular areas (Westbury, 1993).
It was the concern of comparability that led Westbury to disaggregate U.S. scores according to the kind of class that students were in: remedial, typical, prealgebra, or algebra. With such a disaggregation, U.S. students actually taking algebra scored somewhat higher than Japanese students, whose algebra average score was the highest of any nation. Of course, comparing a group of 20% of American students, probably the most academically able 20%, against an entire Japanese cohort introduces a selection bias favoring the American students. However, when Westbury compared the scores of those Americans taking algebra against the top 20% of Japanese students, they still scored as well as this Japanese elite (Westbury, 1992). Such results, not uncommon, run counter to claims emanating from the work of Stevenson (Stevenson et al., 1990; Stevenson, 1992) that only a tiny fraction of American students score as well in mathematics as the median Japanese student.
The 1992 Second International Assessment of Educational Progress (IAEP-2) produces results similar to those found in Westbury's analyses (Lapointe, Askew, & Mead, 1992; Lapointe, Mead, & Askew, 1992). In mathematics, the U.S. ranks low, 13th of 15 nations for 9-year-olds and 14th of 15 for 13-year-olds. Ranks, of course, obscure performance, and when one looks at the actual scores, the U.S. average is only slightly behind the international average, as shown in Table 1. Table 2 shows the comparable results for science.
Table 2 shows that in science U.S. 9-year-olds score just above the international average (they ranked 3rd among 15 nations), whereas 13-year-olds have an average score identical to the international average.
Table 1
U.S. and International Averages for Mathematics on the Second International Assessment of Educational Progress
Average (percent correct)
                 U.S.   International
9-year-olds       58         63
13-year-olds      55         58
Another way of looking at these results is to compare the distance from the U.S. score to that of the top-ranked country in terms of number of items. These results are shown in Table 3.
The picture that emerges here is clearly mixed, with U.S. 9-year-olds averaging only one item fewer correct in science than the top-ranked nation, while American 13-year-olds missed 14 more items in mathematics, on average, than those in the top-ranked nation.
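The item-count comparison can be reproduced directly from the raw scores. The sketch below is only an illustration: it assumes the column assignment of Table 3 read here (top-ranked nation, United States, maximum possible), and the names `TABLE_3` and `item_gaps` are hypothetical.

```python
# Raw scores as read from Table 3:
# (top-ranked nation, United States, maximum possible)
TABLE_3 = {
    "Mathematics, 9-year-olds": (46, 35, 61),
    "Mathematics, 13-year-olds": (55, 41, 75),
    "Science, 9-year-olds": (39, 38, 58),
    "Science, 13-year-olds": (57, 48, 72),
}


def item_gaps(table):
    """Return {test: (gap in items, gap as a percent of the maximum score)}."""
    gaps = {}
    for test, (top, us, maximum) in table.items():
        gap = top - us
        gaps[test] = (gap, round(100 * gap / maximum, 1))
    return gaps


for test, (gap, pct) in item_gaps(TABLE_3).items():
    print(f"{test}: U.S. trails by {gap} items ({pct}% of the test)")
```

Read this way, the one-item gap for 9-year-olds in science and the 14-item gap for 13-year-olds in mathematics are the two extremes of the four comparisons.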
As with SIMS, scores among countries in IAEP-2 are often tightly bunched, and small differences in scores make large differences in ranks. If U.S. 13-year-olds had scored 72% correct in science instead of 67, they would have finished 5th rather than 13th. Similarly, if the third-ranked 9-year-olds had scored 60 instead of 65, they would have finished 12th. These results point to an important and often ignored aspect of international comparisons: Most countries score close together such that small differences in scores make large differences in ranks. It is difficult to imagine drawing strong conclusions or framing major policy decisions from such small differences.
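How a five-point change can move a country eight places is easy to see with a toy ranking. The sketch below uses HYPOTHETICAL scores for 14 other nations, constructed only so that they reproduce the two ranks quoted for U.S. 13-year-olds in science (13th at 67, 5th at 72); they are not the actual IAEP-2 country scores, and the helper `rank_of` is likewise an illustration.

```python
def rank_of(score, other_scores):
    """1-based rank of `score` among `other_scores` (higher is better).

    Ties share the better rank, so the rank is one more than the number
    of strictly higher scores.
    """
    return 1 + sum(1 for s in other_scores if s > score)


# HYPOTHETICAL percent-correct scores for 14 other nations, tightly
# bunched between 68 and 71 as the text describes. Not real IAEP-2 data.
others = [78, 76, 74, 73, 71, 71, 70, 70, 69, 69, 68, 68, 64, 62]

print(rank_of(67, others))  # rank at the reported U.S. score
print(rank_of(72, others))  # rank if the score had been five points higher
```

Because eight of the hypothetical scores sit within the five-point band just above 67, a small change in score crosses all of them at once.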
Yet another picture emerges, perhaps the most meaningful picture, if one compares the results from IAEP-2 with results from the 1992 NAEP mathematics assessment, which included state-level data for 41 states. Some such comparisons are made in the NCES report Education in States and Nations, in which the 13-year-old IAEP-2 data are transformed into NAEP scales (NCES, 1993). Using other NAEP reporting categories (ETS, 1992), I found the following situation among the top scorers (Bracey, 1994):
Asian students, U.S. schools 287
Taiwan 285
Korea 283
Advantaged urban students, U.S. schools 283
Iowa, North Dakota 283
White students, U.S. schools 277
Hungary 277
Table 2
U.S. and International Averages for Science on the Second International Assessment of Educational Progress
Average (percent correct)
                 U.S.   International
9-year-olds       65         62
13-year-olds      67         67
Table 3
Raw Scores of U.S. and Top-Ranked Nations, Second International Assessment of Educational Progress
Score
                             Top-ranked nation   United States   Maximum possible
Mathematics, 9-year-olds            46                 35                61
Mathematics, 13-year-olds           55                 41                75
Science, 9-year-olds                39                 38                58
Science, 13-year-olds               57                 48                72
That Asian American students outscore all other nations,
states, or other NAEP reporting groups at the very least
raises questions about the sources of their achievement.
That White students tie for third place with Hungary (comparing only countries) means that a large majority of
American students are scoring at the highest levels: White
and Asian students together comprise more than 70% of
the K-12 population.
With a near-average overall score and some American groups scoring high, some must be scoring low. At the lower end of the distribution these results appear (Bracey, 1994):
Jordan 246
Mississippi 246
Hispanic students, U.S. schools 245
Disadvantaged urban students, U.S. schools 239
Black students, U.S. schools 236
There is no NAEP category of "disadvantaged rural students," else we would likely find another group at this lower end. Research by the Population Reference Bureau reveals a rural underclass about half the size of the urban underclass (O'Hare & Curry-White, 1992). These rural dwellers are in even more dire straits than poor people in the cities.
These data are reported by ethnicity because ethnicity is a commonly used system in this nation for categorizing results.1 However, other data, such as those reported by NCES (1990, p. 26), Jaeger (1992), and Robinson and Forsyth (1994), strongly convey that much of the difference is actually attributable to social factors. Robinson and Forsyth, for instance, noted that 83% of the variance in state-level NAEP scores could be accounted for by four variables: number of parents in the home, parental educational level, type of community, and state poverty rates at ages 5-17.
Another IAEP-2/NAEP table in Education in States and
Nations strongly suggests that comparison of nations'
school systems on the basis of average scores is not meaningful. As one moves from Jordan, the lowest nation at 246,
to Taiwan, the highest nation at 285, one traverses 39
NAEP scale points. Within Taiwan, though, as one moves
from the 5th percentile to the 95th percentile, one traverses
123 NAEP scale points. If the distance were from the 1st to
the 99th percentile, another 20-odd points would be added.
Many other states and nations also show similar patterns. Thus, the within-country and within-state variance is much greater than the between-country or between-state variance. Though some critics of American schools have stated or strongly implied that students in other nations are monolithically better than U.S. students (e.g., Gray & Kemp, 1993; Shanker, 1992a, 1992b), this is not true. Indeed, the variabilities are such that one must question how representative the average scores are. In IAEP-2, for instance, in a number of comparisons, the 95th percentile of the United States is higher than the 95th percentile of some countries with higher average scores.
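The size of the within-country spread relative to the between-country spread follows from the NAEP-scale figures already quoted. The short sketch below only restates that arithmetic; the variable names and the helper `spread_ratio` are hypothetical, and ranges are used as a rough stand-in for variance.

```python
# NAEP-scale values quoted in the text.
between_nations = 285 - 246    # Taiwan (highest) minus Jordan (lowest)
within_taiwan = 123            # 5th to 95th percentile within Taiwan
between_us_groups = 287 - 236  # Asian students minus Black students, U.S. schools


def spread_ratio(within, between):
    """How many times larger the within-group range is than the between-group range."""
    return within / between


print(f"Between-nation range: {between_nations} points")
print(f"Range across U.S. reporting groups: {between_us_groups} points")
print(f"Within-Taiwan range is {spread_ratio(within_taiwan, between_nations):.2f} times the between-nation range")
```

On these figures, the gap between U.S. reporting groups alone is wider than the gap between the highest- and lowest-scoring nations, and the spread inside Taiwan is roughly three times as wide.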
The variability of scores, somewhat incidentally, raises important questions for endeavors such as the New Standards Project and other standard-setting activities. Where could one set a standard that would be perceived as credibly high without failing large percentages of students? And what would one do then?
The most damaging assessments of the condition of U.S. schools have come from a series of studies conducted on a much smaller scale than SIMS or IAEP-2 by Harold Stevenson and James Stigler and various colleagues at the University of Michigan (Stevenson, 1992; Stigler & Stevenson, 1991; Stevenson, Lee, Chen, & Lummis, 1990; Stigler, Lee, & Stevenson, 1987; Stevenson, Lee, & Stigler, 1986; and summarized in the book, The Learning Gap, Stevenson & Stigler, 1992). This work has led to the widespread citation of statistics purportedly showing that only 1% of American students perform as well in mathematics as the median Japanese student (Matthews, 1992).
It should be noted that the cover of The Learning Gap reads, in its entirety, "Why American Schools Are Failing and What We Can Learn From Japanese and Chinese Education." However, the book does not examine "American schools." It contains some information from kindergarten, but the bulk of the data are from Grades 1 and 5 only and almost exclusively from mathematics. One wonders how much such data from two elementary grades and one curriculum subject can say about the entire American educational system. Or the Japanese system.
While the cover wording might be attributed to the publisher's marketing needs, there are methodological problems in the Stevenson et al. studies. The various articles do not reveal how the schools were selected or how representative they are. It would be naive in the extreme to believe that a nation as closed, a nation as obsessed with its public image as The People's Republic of China (demonstrated vividly in the recent two women's conferences held there) would give an American researcher free access to a random sample of schools.
Kazuo Ishizaka, a former teacher and member of the National Institute for Educational Research, has called attention to selection problems in Japan (Ishizaka, 1994). He
contends that Stevenson—and other visitors—are gaining
entrance only to the schools the government wishes them
to see. Ishizaka reports that though the curriculum in Japan
is demanding, achievement on that curriculum is low. This
outcome is important when coupled with a selection bias:
"I took (visiting) teachers to an ordinary level high school
and they said 'Oh it is terrible. Why did you guide us to
such schools?'"
Ishizaka's observations, empirical but not systematic, need to be followed up, for they have implications far beyond the Stevenson work. Stedman (1994a) and others have assumed either that countries obtained a representative sample of students and/or that, by selecting only the top 1% or 5% in the study, selectivity was countered. Says Stedman (1994a), "The point is that the researchers who conducted the assessments explicitly dealt with the sampling problems." But such dealings were constrained to the samples presented, and Ishizaka's assertions strongly imply that such dealings were bound to fall short.
Similarly, as regards the sampling in the Stevenson et al. studies, the education of most Chinese is very poor, and the hope of the PRC is to attain universal ninth-grade education sometime around 2000. Yet the Chinese parents in the Stevenson studies had attended school on average more than 11 years.
Stevenson and colleagues have also not presented much
demographic information about the Minneapolis, Taiwanese, and Japanese samples, but one report comparing
students in Chicago and Beijing contains some important
data (Stevenson, Lee, Chen, & Lummis, 1990). In that sample, 13% of the families in Chicago had incomes below
$10,000, and the sample seems generally weighted toward
low incomes. Black and Hispanic students, as shown earlier, do not perform well on academic tests, but the Chicago
sample contained 39% Black and Hispanic children. Nationally, by 1986, after the Stevenson et al. data-gathering
had been completed, Blacks and Hispanics constituted 26%
of the K-12 population (NCES, 1990).
Over 20% of the Chicago children did not speak English at home. The Chicago sample was thus not a representative sample of the United States, nor was it comparable to the Beijing sample on many important demographic variables.
The Chicago sample is heavily weighted with variables associated with low achievement.
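The effect of sample composition on an average can be sketched with a weighted mean. In the example below, only the two population shares (39% in the Chicago sample, 26% nationally) come from the text. The two subgroup means are HYPOTHETICAL values chosen for illustration, and the function `sample_mean` is not taken from any of the studies discussed.

```python
def sample_mean(share_low, mean_low, mean_high):
    """Overall mean of a sample made up of two subgroups.

    share_low is the fraction of the sample in the lower-scoring subgroup.
    """
    return share_low * mean_low + (1 - share_low) * mean_high


# HYPOTHETICAL subgroup means on an arbitrary test scale.
MEAN_LOW, MEAN_HIGH = 240.0, 280.0

chicago_like = sample_mean(0.39, MEAN_LOW, MEAN_HIGH)   # share in the Chicago sample
national_like = sample_mean(0.26, MEAN_LOW, MEAN_HIGH)  # share nationally (NCES, 1990)

shift = national_like - chicago_like
print(f"Composition alone shifts the average by {shift:.1f} points")
```

With the subgroup means held fixed, the difference in composition by itself moves the sample average, before any difference in schooling is considered.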
There are other variables that are difficult to quantify in terms of their effects but that render these studies difficult to interpret. For instance, the children in the Chicago sample lived in homes with, on average, two siblings. Because of the population control policies in China, however, most of the Beijing progeny were only children. To reduce a nation that used to count children by the dozen to a nation of single children is to produce jangling cultural changes. The new children are a brood sometimes referred to as "The Spoiled Brats of Beijing." Parents and grandparents alike lavish attention on them. In addition, some 10% of the Chicago households contained at least one grandparent, whereas 50% of the Beijing households had at least one. How would one enter this "granny factor" into a regression equation? In any case, the Chinese children are in a much more adult-intensive environment than their Chicago peers.
We can note in passing that in an early study, American students were found to do as well as Asian students in reading and better in vocabulary (Stevenson, Lee, Lucker, & Stevenson, 1982). These areas were dropped from later research. Early studies, too, explained the differences in achievement exclusively in terms of time:
    If the classrooms visited in this study are representative of American elementary school classrooms, we must conclude that American children fail to receive sufficient instruction. They spend less time each year in school, less time each day in classes, less time in the day in mathematics classes, and less time in each class receiving instruction (Stigler, Lee, & Stevenson, 1987).
If we deem it important to close the mathematics "learning gap," doing so should not be difficult. To do so, though, might be to lower our ranking in international studies in reading which, as mentioned earlier, is quite high. In any case, Stevenson et al. have never explained why they have needed more recently to invoke explanatory variables other than time.
From the work of Stevenson and Stigler has also
emerged the notion that Americans believe in ability
whereas Asians believe in effort. One section of The Learning Gap is titled "The American Emphasis on Ability,"
whereas another is headlined "The Asian Emphasis on Effort" (Stevenson, Lee, & Stigler, 1986; Stevenson & Stigler,
1992). This interpretation of Stevenson and Stigler by Merseth (1993) is typical:
    A predominant view in America is that one either "has it" or one doesn't. Effort receives little credit for contributing to successful learning in mathematics, or, for that matter, in any subject. For example, American, Japanese, and Chinese mothers were asked what factors among ability, task difficulty, and luck made their children successful in school. American mothers ranked ability the highest, while Asian mothers gave high marks to effort. This led the researchers to conclude that "the willingness of Japanese and Chinese children to work so hard in school may be due, in part, to the stronger belief on the part of their mothers in the value of hard work."
The researchers that Merseth refers to and cites are
Stevenson, Lee, and Stigler, 1986. But in this instance
Stevenson, Lee, and Stigler, and Merseth after them, appear to have misinterpreted the data.
As can be seen in Figure 1, it is true that American mothers rank ability as more important than Asian mothers do. But as the figures show, they rank effort as more important than ability. And, in fact, they rank it almost as highly as Asian mothers. The data create the pictorial impression that the scores are on a 5-point scale, but as the legend shows, they are on a 10-point scale. It is hard to imagine that these differences in the value assigned to effort produce the kinds of dedication to hard work that Merseth and Stevenson, Lee, and Stigler believe they are observing.
FIGURE 1. Number of points (out of ten) that mothers assigned to the relative importance of factors that affect academic achievement [from Stevenson & Stigler, 1992]. (Factors shown: Effort, Ability, Task difficulty, Luck.)
As Figure 2 shows, American children have the same belief structure as American mothers: They think ability is more important than do Chinese children, but they think that effort is more important than ability. They rank it virtually as important as Chinese children do. It appears that American children and their mothers are still taken with the notion of "an 'A' for effort."
FIGURE 2. Children's evaluation of the importance of effort and ability for success in school. (Scale: 5 = very important; 1 = not at all important.) [From Stevenson & Stigler, 1992]. (Groups shown: Chicago, Beijing.)
One study by Mayer, Tajika, and Stanley (1991) puts mathematics achievement in Japan and America in a somewhat different light. Mayer et al. administered an achievement test consisting largely of computation along with a problem-solving test to Japanese and American students. The Japanese students scored considerably higher than the American students. The researchers then divided the groups into six subgroups based on scores on the achievement test, and looked at the problem-solving test scores of each group. For five of the groups, the American problem-solving scores were higher, and for the remaining group the American and Japanese scores were the same. These results agree with allegations that instruction in mathematics (and, for that matter, other subjects) in Japan emphasizes the rote aspects of the subject (Boylan, 1993; Schooland, 1990; Van Wolferen, 1987).
Stigler and Miller (1993) challenged these findings, claiming that they suffered from the "matched group fallacy," where matching on one variable systematically unmatches on other important variables. This was especially important, they contended, because so many American students scored at the lowest level. This meant that Mayer et al. (1991) had obtained a group of very intelligent students at the upper end. Mayer and Tajika retorted that the matching fallacy, and its concomitant regression to the mean, might apply if the match were only on one extreme group at the high end, but that the differences had shown up for all groups (Mayer & Tajika, 1993). They also noted that Stigler and Miller failed to define "intelligence" and were thus begging the question.
In looking at why so many American students had scored at the lowest level, Mayer and Tajika found that their achievement test did screen students on one variable: place of birth.
    Almost all of the U.S. students who [scored at the lowest levels] were nonnative speakers of English who had been identified as Limited English Proficient by their local schools. Most were recent immigrants from Spanish-speaking countries, many of whom had had limited exposure to formal education.
When studies leave mathematics, the most easily cross-culturally tested subject, for the area of reading, they find that U.S. students have consistently performed well. In the most recent and sophisticated of these studies, conducted on students in 31 nations by the International Association for the Evaluation of Educational Achievement (IEA), American 9-year-olds finished second only to Finland, a small, homogeneous nation that does not concern itself a lot with teaching Finnish as a second language (Elley, 1992). The 9-year-olds were 22 points out of first place on a 600-point scale identical to that of the SAT. American 14-year-olds tied for eighth place. Ranks, as noted earlier, obscure performance. The 14-year-olds were only three points further away from first place, 25 points, than the 9-year-olds. The high-ranking countries were again tightly bunched, such that the distance from second-place France to eighth-place America was only 14 points (the scale is a 600-point one, identical to that of the SAT). An analysis by NCES (OECD, 1993) found that there were, in fact, no significant differences among the second- through eleventh-place nations.
As with the mathematics studies, there was considerable variance within countries in reading scores. The variability was such that American students of both ages had the highest scores at their 90th, 95th, and 99th percentiles (NCES, 1994, pp. 217-218).
It is worth noting that though the German education system has been considered by some as worthy of emulation, the German students' reading scores were very close to the median. The German Research Service reported these findings with some anxiety, especially after noticing that "German standards were exceeded . . . even in the USA." Even (German Research Service, 1992). German authorities, called on to explain Germany's poor showing, promptly put the blame on the family for neglecting books.
It is also worth noting, for what it says about the media and the control of perceptions, that when IAEP-2 was released in February of 1992, it received wide media coverage. "An 'F' in World Competition" was the headline at Newsweek, a typical reaction (1992). When the IEA reading study was released in July of 1992, no media outlet carried it. The study surfaced only when a European friend of Education Week reporter Robert Rothman sent him a copy when it was published in Germany. Education Week then carried the story on page one (Rothman, 1992). USA Today picked up the report from Education Week and also ran the story on page 1 (Manning, 1992). No other print or electronic medium covered the event. The USA Today story featured a comment from Francie Alexander, then Deputy Assistant Secretary of Education, dismissing the study: "This is OK for the '80s, but for the '90s and beyond, kids are going to have to do better." Even today, when I speak around the country and ask audiences of administrators and professors for a show of hands from those who have
from Iowa is important partly because, in some ways, Iowa is frozen in time. It has no large cities and is still 98% White. Most state-level then-and-now studies are difficult to interpret because states have changed on so many variables. Iowa has changed, too, of course, but less than most. In addition, the Iowa testing program is part of the air in Iowa schools, having been in existence for almost 60 years. It is not a recently imposed, high-stakes program. As a long-standing low-stakes program, it is less subject than many other programs to testing's "Lake Wobegon Effect."
One can ask why people feel obliged to portray U.S. schools in such a negative light in international comparisons or on domestic indicators. For those approaching the situation from a political or ideological stance, an answer is not hard to find. Some answers are presented in "The Right's Data-Proof Ideologues" (Bracey, 1995b). In addition, some observers have pointed to disingenuousness among business and industry (Bracey, 1994a; Cuban, 1994; Rothstein, 1994). In addition, some observers have noted the lack of K-12 support in the university research community (Bracey, 1994b; Myrdal, 1969). After all, almost all of the papers commissioned by the people who produced "A Nation At Risk" were written by academics. A lot of university faculty have staked their reputations on research projects that assume the system is in crisis. Shortly after the first "Bracey Report" appeared, I received a letter from one professor that read, in part, "The American common school is an endangered species. The wimps in education won't defend it. They're afraid of losing their money or access to the corridors of power."
While it is common currently to date criticism of the school from the 1983 publication of "A Nation At Risk," the schools have actually been subject to similar criticisms from the public and critics for over a century (Bracey, 1995b; Kent, 1987; Newman, 1978). These criticisms rose to a crescendo in the years just after World War II, were validated to the critics by the launch of Sputnik in 1957, and have never experienced a diminuendo since. Beginning in the period of Reconstruction and continuing through today, whenever faced with a pressing social problem, the nation has turned to its schools for a solution while failing to provide sufficient resources to make the solution a genuine possibility. The reader is referred to Final Exam: A Study of the Perpetual Scrutiny of American Public Schools (Bracey, 1995b) for an elaborated discussion of this position.
Independent of what the findings actually say, the value of international comparisons is a subject of much debate and perhaps the topic of a separate paper at a later time. Suffice it to say here that when the data are analyzed with no prior position as to what they say, the nation's schools are seen to be performing at a much higher level than has been presented by many U.S. Department of Education officials or in the popular press.2
International Comparisons and Domestic Indicators
These high performances should not really come as a surprise because these results confirm those from domestic indicators. We are often unaware of them because of the ready mindset to perceive failure. We have already noted The American School Board Journal headline on the IEA reading study, where an 8th-place finish of 14-year-olds in a comparison of 31 nations was seen as "bad news." Sometimes the distortion of results seems deliberate. The reader is referred to The Manufactured Crisis for discussion of this possibility (Berliner & Biddle, 1995).
Still other domestic indicators show trends similar to those of international comparisons. For example, in spite of demographic changes in the SAT test-taking pool weighted against math achievement, the proportion of students scoring above 650 is at a record level. The average score on the SAT was set at 500 on 10,654 students living in the Northeast. Ninety-eight percent of these students were White, 60% were male, and 40% had attended private high schools. Currently, the SAT test-taking population is 30% minority, 52% female, and 31% of the students report family incomes of less than $30,000 a year. Yet the SAT mathematics "national average" has fallen only 22 points and not at all if one controls for these demographic changes (Bracey, 1990).
Currently, the proportion of students scoring above 650 on the SAT mathematics is at an all-time high, something that cannot be accounted for by Asian students. Though Asian students score higher than other ethnic groups, they constitute only 8% of all test takers, up from 4% a decade ago, far too few to produce the numbers seen (College Board, 1992, 1993; Jackson, 1976). When the standards were set on the SAT, 6.68% of the test takers scored above 650. Currently 12% score above 650.
Since 1978, the number of students taking the College Board's Advanced Placement tests has increased more than fourfold, from 98,000 in 1978 to 448,000 in 1994 (College Board, 1994). Yet the average score has fallen only 10 points on the AP's five-point scale. Certainly some increase in the number of test takers is occasioned by economic concerns: AP tests provide an economical way of obtaining college credit. Still, with such an increase in test takers, one might have expected more of a drop in average scores.
Some achievement test scores are at record levels, also
(H. D Hoover, personal communication, June, 1994)
Scores on the ITBS and ITED, both in Iowa and the nation,
dropped from the 1960s to the mid-1970s. They have been
Notes
rising consistently since. By 1990, all grades were at record
•About 10 years ago, NAEP moved to cease reporting data by ethhighs.
nicity NAEP officials were persuaded to keep these categories, howLike the SAT but unlike most commercial achievement
ever, largely by Mary Hatwood Futrell, then president of the National
tests, new forms of the ITBS and ITED are equated to preEducation Association. Futrell argued that it was important to mainvious forms. Thus longitudinal trend data are possible to
tain ethnic categories to measure progress—or the lack of it—in attaining equality of educational outcomes. She felt the value of this
track back to the 1930s. In addition, that we have such data
heard of the study, only perhaps 1% of the group knows
of it.
In this summary, many of the results in international
comparisons have been for mathematics, a subject in which
it is typically reported that American students, in general,
perform terribly. The data, taken together, however, reveal
that many students perform quite well relative to students
even in mathematics and in even the highest scoring nations in mathematics.
20
EDUCATIONAL RESEARCHER
Downloaded from http://er.aera.net at PENNSYLVANIA STATE UNIV on September 12, 2016
measurement outweighed the danger that the results were amenable
to racist interpretations
2
I have become identified with having a "position" on what the
data say However, when I was first led to analyze the data in the fall
of 1990, it was quite by accident and I held the position that is consistently found m the annual Phi Delta Kappa /Gallup poll' Parents think
the local schools are OK and that there is a crisis in the nation's schools
(Rose & Elam, 1995). The conclusion that this condition was, indeed,
a "manufactured" crisis was initially quite surprising.
References
Advanced Placement Report, 1993. (1994). New York: The College Board.
Good news: Our 9-year-olds read well; Bad news: Our 14-year-olds don't. (1992, November). American School Board Journal.
An 'F' in World Competition. (1992, February 17). Newsweek, p. 57.
Baker, D. P. (1993). Compared to Japan, the U.S. is a low achiever ... really. Educational Researcher, 22(3), 18-20.
Bracey, G. W. (1990, November 21). SATs: Miserable or miraculous? Education Week, p. 36.
Bracey, G. W. (1992). The Second Bracey Report on the Condition of Public Education. Phi Delta Kappan, 74, 104-117.
Bracey, G. W. (1994a). First world, third world, all right here at home. Phi Delta Kappan, 75, 649-651.
Bracey, G. W. (1994b, July). The greatly exaggerated death of our schools. Presentations to the IDEA Summer Fellows Program, Ontario (CA), Denver, and Appleton, WI.
Bracey, G. W. (1995a). Final exam: A study of the perpetual scrutiny of American public schools. Bloomington, IN: Agency for Instructional Technology.
Bracey, G. W. (1995b, January 25). The right's data-proof ideologues. Education Week, p. 44.
Cuban, L. (1994, June 27). The great school scam. Education Week, p. 44.
Education in States and Nations. (1992). Washington, DC: Center for Education Statistics.
Elam, S. M., & Rose, L. C. (1995). The 27th Annual Phi Delta Kappa/Gallup Poll of the public's attitudes toward public schools. Phi Delta Kappan, 77, 41-56.
Elley, W. B. (1992). How in the world do students read? Hamburg: International Association for the Evaluation of Educational Achievement.
German Research Service. (1992). Special Science Reports, VIII, November, 12-14.
Gerstner, L. V., Jr., Semerad, R. D., Doyle, D. P., & Johnston, W. B. (1994). Reinventing education. New York: Dutton Books.
Gray, C. B., & Kemp, E. J., Jr. (1993). Flunking testing: Is too much fairness unfair to school kids? The Washington Post, September 19.
International Mathematics and Science Assessments: What Have We Learned? (1992). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Ishizaka, K. (1994). Japanese education: The myths and the realities. In Different Visions of the Future of Education. Ottawa, Ontario: Canadian Teachers Foundation.
Jackson, R. (1976). An examination of declining numbers of high-scoring SAT candidates. New York: The College Board.
Jaeger, R. M. (1992). Weak measurement serving presumptive policy. Phi Delta Kappan, 74, 118-128.
Kent, J. D. (1987). A not too distant past: Echoes of the calls for reform. Educational Forum, Winter, 137-150.
Krauthammer, C. (1994). Save the border collie. The Washington Post, July 15, A21.
Lapointe, A. E., Askew, J. M., & Mead, N. A. (1992). Learning mathematics. Princeton, NJ: Educational Testing Service.
Lapointe, A. E., Mead, N. A., & Askew, J. M. (1992). Learning science. Princeton, NJ: Educational Testing Service.
Manning, A. (1992). U.S. kids near top of class in reading. USA Today, September 29, 1992. (The USA Today article is dated before the Education Week article on the same topic, but Education Week typically appears two days before its formal publication date.)
Matthews, J. (1992). Lessons from Asian schools. The Washington Post, November 30, A23.
Mayer, R. E., Tajika, H., & Stanley, C. (1991). Mathematical problem solving in Japan and the United States: A controlled comparison. Journal of Educational Psychology, 83, 69-72.
Mayer, R. E., & Tajika, H. (1993). Conducting and comprehending cross-cultural comparisons: Reply to Stigler and Miller. Journal of Educational Psychology, 85, 560-565.
Merseth, K. K. (1993). How old is the shepherd? An essay about mathematics education. Phi Delta Kappan, 74, 548-554.
Myrdal, G. (1969). Objectivity in social science. New York: Pantheon Books.
NAEP Mathematics Report Card, 1992. (1992). Princeton, NJ: Educational Testing Service.
National Center for Education Statistics. (1994). The Condition of Education 1994. Washington, DC: U.S. Department of Education.
Newman, A. J. (Ed.). (1978). In defense of the American public school. Berkeley, CA: McCutchan.
O'Hare, W. P., & Curry-White, B. (1992). The rural underclass: Examination of multiple-problem populations in urban and rural settings. Washington, DC: Population Reference Bureau.
O'Neill, B. (1994). Anatomy of a hoax. New York Times Sunday Magazine, March 6, 46-49.
Profiles of College Bound Seniors, 1992, 1993. (1994). New York: The College Board.
Robinson, G., & Forsyth, J. (1994). NAEP test scores: Should they be used to compare and rank state educational quality? Arlington, VA: Educational Research Service.
Rotberg, I. (1990). I never promised you first place. Phi Delta Kappan, 72, 296-303.
Rothman, R. (1992, September 30). U.S. ranks high in international study of reading. Education Week, p. 1.
Rothstein, R. (1994, June). Presentation to the President's Professional Development Workshop, AASA, Arlington, VA.
Schooland, K. (1990). Shogun's ghosts: The dark side of Japanese education. New York: Bergin and Garvey.
Shanker, A. (1992a, July 4). World class standards. New York Times, Section 4, p. 7.
Shanker, A. (1992b, July 11). The wrong message. New York Times, Section 4, p. 7.
Shanker, A. (1994, July 24). Making time. New York Times, Section 4, p. 7.
Stedman, L. C. (1994a). Incomplete explanations: The case of U.S. performance in the international assessments of education. Educational Researcher, 23(7), 24-32.
Stedman, L. C. (1994b). The Sandia Report and U.S. achievement: An assessment. Journal of Educational Research, 87, 133-146.
Stevenson, H. W. (1992). Learning from Asian schools. Scientific American, December, 70-76.
Stevenson, H. W., & Stigler, J. W. (1992). The learning gap. New York: Summit Books.
Stevenson, H. W., Lee, S.-y., & Stigler, J. W. (1986). Mathematics achievement of Chinese, Japanese, and American children. Science, 231, 693-699.
Stevenson, H. W., Lee, S.-y., Chen, C., & Lummis, M. (1990). Mathematics achievement of children in China and the United States. Child Development, 61, 1053-1066.
Stevenson, H. W., Stigler, J. W., Lucker, G. W., & Lee, S.-y. Reading disabilities: The case of Chinese, Japanese, and English. Child Development, 53, 1164-1182.
Stigler, J. W., & Stevenson, H. W. (1991). How Asian teachers polish each lesson to perfection. American Educator, Spring, 12-47.
Stigler, J. W., Lee, S.-y., & Stevenson, H. W. (1987). Mathematics classrooms in Japan, Taiwan, and the United States. Child Development, 58, 1272-1285.
Stigler, J. W., Lee, S.-y., Lucker, G. W., & Stevenson, H. W. (1982). Curriculum and achievement in mathematics: A study of elementary school children in Japan, Taiwan, and the United States.
Van Wolferen, K. (1989). The enigma of Japanese power. New York: Knopf.
Westbury, I. (1992). Comparing American and Japanese achievement: Is the United States really an underachiever? Educational Researcher, 21(5), 18-24.
Westbury, I. (1993). American and Japanese achievement ... again. Educational Researcher, 22(3), 21-26.
Yano, H. (1993). What can we learn from the learning gap? Educational Researcher, 22(1), 36-37.

Received May 25, 1995
Revision received August 3, 1995
Accepted August 29, 1995
JANUARY-FEBRUARY 1996