International Studies in Education

EDUCATIONAL ACHIEVEMENTS OF THIRTEEN-YEAR-OLDS IN TWELVE COUNTRIES

Results of an international research project, 1959-61, reported by:

ARTHUR W. FOSHAY
ROBERT L. THORNDIKE
FERNAND HOTYAT
DOUGLAS A. PIDGEON
DAVID A. WALKER

Unesco Institute for Education, Hamburg, 1962
CONTENTS

Foreword
Arthur W. Foshay: THE BACKGROUND AND THE PROCEDURES OF THE TWELVE-COUNTRY STUDY
Robert L. Thorndike: INTERNATIONAL COMPARISON OF THE ACHIEVEMENT OF 13-YEAR-OLDS
Fernand Hotyat: INTERPRETATIONS FROM INTERNATIONAL AND BELGIAN NATIONAL DATA
Douglas A. Pidgeon: A COMPARATIVE STUDY OF THE DISPERSIONS OF TEST SCORES
David A. Walker: AN ANALYSIS OF THE REACTIONS OF SCOTTISH TEACHERS AND PUPILS TO ITEMS IN THE GEOGRAPHY, MATHEMATICS AND SCIENCE TESTS

The national data pooled for purposes of this study derive from the work of various national centres for educational research and are used with their kind permission. The opinions expressed in the various sections of this report, which is sponsored by the Unesco Institute for Education, are those of their authors and do not necessarily represent the views of the Unesco Institute for Education, of Unesco, Paris, or of the research institutions to whose staffs the authors belong.

Foreword
The present study may well be described as an unusual addition to the literature of education. The results of the project here reported suggest that both empirical educational research and comparative education can gain new dimensions, the one by extending its range over various educational systems, the other by including empirical methods among its instruments. In the minds of its authors the project had the double purpose of throwing light on the possibilities of such research, and of obtaining actual results which would not so much evaluate educational performances under different educational systems in absolute terms as discern patterns of intellectual functioning and attainment in certain basic subjects of the school curriculum under varying conditions. This would be a first step towards bringing into profile the relative merits of various learning processes and procedures.
If the results so far, because of limitations on their validity which the authors freely admit, are little more than suggestive, at least they offer real encouragement for believing that such researches can, in the future, lead to more significant results and begin to supply what Anderson has lamented as “the major missing link in comparative education”, which in his view is crippled especially by the scarcity of information about the outcomes or products of educational systems*.
It does not detract from the achievement of this exploratory study to say that, as international empirical research methods are further refined, it will be essential for the researcher also to explore ways of relating the data more closely to the specific educational principles and objectives which underlie the various national educational systems.
Certainly the international group itself was sufficiently encouraged by the results of its first exploratory study to embark on a more ambitious one during which, at several key points in the secondary school cycle, as comparable samples of schoolchildren as can be obtained will be subjected to tests which bear close reference to curricula and educational aims in all the participating countries.
The project here reported could not have been accomplished without substantial contributions from twelve different national centres for educational research (listed on pp. 8 and 9), which were responsible for the technical and scientific aspects of the study, whilst the Unesco Institute contributed from its experience in international administration and coordination of research. Each national centre was involved in the expense of providing tests and the local organisation and administration of field work, whilst the Unesco Institute substantially underwrote the expenses arising at the international level. Beyond this the project depended on the goodwill and cooperation of a large number of teachers in whose schools the tests were administered and the background information collected. I am very glad that the importance of this support, which for unavoidable reasons usually remains anonymous, is brought out clearly in Dr. Walker’s final section of this volume.
In conclusion, I should like to express the particular debt of thanks which is owed to Professor Foshay, who directed the project, to Professor Thorndike, who acted as chief test editor and was primarily responsible for analysing the international data, and to the authors who have contributed to this volume. It is hoped that subsequently it may prove possible to add further analyses of the international data to those contained in the present report. Our thanks are due no less to Mr. Cobb who, following his responsibility for day-to-day coordination of the project, has undertaken the compilation and arrangement of the present report.
Hamburg, August 1962
Saul B. Robinsohn

* C. Arnold Anderson: “Methodology of Comparative Education”, International Review of Education, Vol. VII, 1961, No. 1, pp. 7 and 8.

Arthur W. Foshay
THE BACKGROUND AND THE PROCEDURES OF THE TWELVE-COUNTRY STUDY
In the general orientation with which this report opens, Professor Foshay describes the process of setting up an international project for achievement testing in 12 countries, and the characteristics of the tests that were administered and of the samples to which they were given.

If custom and law define what is educationally allowable within a nation, the educational systems beyond one’s national boundaries suggest what is educationally possible. The field of comparative education exists to examine these possibilities.
The present exploratory study, like all studies in comparative education, has its roots in the desire of each of us to know more about educational systems other than our own. It differs from others, however, in that it seeks to introduce prominently an empirical approach into the methodology of comparative education, a field that has in the main relied on cultural analysis as its chief mode of inquiry.
Beginning in June, 1959, and ending in June, 1961, research agencies from twelve countries cooperated in a pilot study of school achievement, under the general sponsorship of the Unesco Institute for Education in Hamburg. The purpose of the present report is to describe this effort and to indicate some of the results.
Background and need
The number of cross-country comparisons of school achievement is small, and their findings are severely limited. Typically, such studies have involved two populations somewhere in the middle of their respective school careers. Achievement has been compared in one school subject, such as mathematics. Such studies, useful as they have been, have been limited in scope. In some cases, the comparisons have been inappropriate because of differences in curriculum, a difficulty compounded by the “mid-career” point at which comparison was made.
What has not heretofore been attempted, even on a limited basis, is a comparison that would take the school population near a terminal point, and involve many countries from the same general world culture. Such a large-scale effort would seem difficult to administer and exceedingly expensive to carry out. Such an effort, however, would have advantages: the results could be examined with one’s mind on the fact that they arose from many apparently different conceptions of the nature and meaning of education; since the students were near the end of their formal education one might take the test responses as representing the outcome of the educational system as a whole, rather than catching a student in mid-career, before the curriculum had been completed. Since a large-scale attempt has not been made, one could find by trying it whether such a project was in fact feasible. More important than these considerations, however, is the possibility that comparisons could be made more analytically than has so far been attempted.
The function of the academic curriculum is to teach children to think in ways appropriate to the subject matter being learned. It would be very useful if short-answer tests, with their desirable attributes of definiteness and objectivity, could be used to discern patterns of intellectual functioning in several standard school subjects. Such patterns, if they exist, would shed light on an ancient pedagogical problem - the problem created by the fact that we know much more about the measurement of the results of educational effort than we know about the effort itself.
Purposes of the present study
The present exploratory study was intended to show whether such needs could be met. In general terms, the purposes of the study can be stated as follows:
1. To see whether some indications of the intellectual functioning behind responses to short-answer tests could be deduced from an examination of the patterning of such responses from many countries.
2. To discover the possibilities and the difficulties attending a large-scale international study.
In the papers that follow in this volume, several reports are presented of the results so far
achieved. Here I shall describe the procedures we have used, the populations tested, and the
tests we have employed.
Early planning: the participants
In 1958, the Governing Board of the Unesco Institute for Education accepted the present writer’s proposal that an “international study of intellectual functioning” be undertaken. The officers of the Institute invited several directors of educational research organizations to meet in Hamburg in June, 1959, to consider the proposal further, and to decide whether they wished to participate in such a study. As it happened, the second meeting of representatives of European Centres of Educational Research was scheduled to take place near London a week later, and the proposal was described there also, with the result that some centers not represented at the Hamburg meeting also joined in the project.
The Hamburg meeting of June, 1959 lasted for five days. During this time, the participants considered and adopted the proposal that a study be designed, and then proceeded to make the
design, to prepare preliminary tests, and to arrange a schedule.
Each of the participants was to bear the costs of test administration within his own country. The Unesco Institute paid the cost of travel and maintenance for the European participants at the three meetings finally held, and furnished extensive coordinative services. The participants who finally took part in the study were the following:
Belgium: Fernand Hotyat, Directeur du Centre des Travaux, Institut de Pédagogie, Morlanwelz.
England: Dr. W. D. Wall, Director, and D. A. Pidgeon, Senior Research Officer, National Foundation for Educational Research in England and Wales, London.
Finland: Professor Martti Takala, Professor of Psychology and Director, Centre for Educational Research, Jyväskylä.
France: Professor Gaston Mialaret, Professor of Psychology and Pedagogy, University of Caen (President of the International Association of Experimental Education of the French-Speaking Countries).
Federal Republic of Germany: Professor Dr. Walter Schultze, Director, and Dr. Rudolf Raasch, Research Assistant, Hochschule für Internationale Pädagogische Forschung, Frankfurt am Main.
Israel: Dr. Moshe Smilansky, Pedagogical Adviser and Director of Research, Ministry of Education and Culture; Director, Henrietta Szold Institute for Child Welfare, Jerusalem.
Poland: Professor Jan Konopnicki, University of Wroclaw.
Scotland: Dr. D. A. Walker, Director, Scottish Council for Research in Education, Edinburgh.
Sweden: Professor Torsten Husén, Research Professor of Educational Psychology, and L.-M. Björkquist, Research Assistant, Institute of Educational Research, Teachers College, University of Stockholm.
Switzerland: Professor Dr. S. Roller, Institute of Sciences and Education, University of Geneva.
USA: Professors Arthur W. Foshay, A. H. Passow, and D. L. Super, Horace Mann-Lincoln Institute, and Robert L. Thorndike, Institute of Psychological Research, Teachers College, Columbia University, New York. Professors Benjamin S. Bloom and C. Arnold Anderson, Center for Comparative Education, University of Chicago.
Yugoslavia: Dr. Vladimir Mužić, Institute of Education, University of Zagreb.
In addition to Mr. D. J. Cobb, who served as the continuing coordinator of the project, the Unesco
Institute for Education, under the direction of Dr. S. B. Robinsohn (and prior to his appointment
the Acting Director, M. R. E. Hennion), put the services of its excellent staff at the disposal of
the project.
Under the supervision of Professor Thorndike, Dr. and Mrs. Leonard Burgess were responsible for the tabulations of data.
The procedure as planned
When the group described here met in June, 1959, they reached a number of agreements about
procedure, first having considered
and accepted the general proposal. Before stating them,
however, it is necessary to state the caveat that applies to this study.
This is an exploratory study. The participants in this study were working with no extra funds, no extra allotment of time, and without the benefit of a previously developed set of procedures. It will therefore be apparent that both the tests and the sampling procedures do not meet the standards that might otherwise be required. For these shortcomings we are not apologetic; it was necessary to accept them, and hence to restrict the statements based on the data gathered, if the study was indeed to be undertaken. Since not all of the sample populations are comparable, we shall not report total scores here as if they could be compared. The most interesting analyses involve patterns of responses among items and sub-scores, not comparisons of total scores, and it is this kind of analysis that is reported here.
The following procedural agreements were made and acted on by the participants in the study:
1. The sample
a. The students to be tested would all be aged from 13 years to 13 years 11 months on the first
day of the school year whatever might be the school level (grade) at which they were found.
b. The sample population in each country would be between 600 and 1000 in number.
c. The sample to be tested would be all the children of both sexes residing in a community
or communities selected to yield a population of the designated size.
d. The community or communities selected for testing would be as representative as possible of the total population of the country, according to whatever data were available to the participant in the study. If (as was true in some countries) no data were available to aid the participant in his selection, he was to use his own judgment.
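
The age rule in item 1a can be read as a simple test on completed months of age at the start of the school year. The following is a minimal sketch of that reading, not a procedure described by the participants; the function names and the 1 September 1960 start date are assumptions made for illustration.

```python
from datetime import date

def age_in_months(birth: date, on: date) -> int:
    """Completed months of age on the reference date."""
    months = (on.year - birth.year) * 12 + (on.month - birth.month)
    if on.day < birth.day:
        months -= 1  # the current month of age is not yet complete
    return months

def is_eligible(birth: date, school_year_start: date) -> bool:
    """True if the pupil is aged 13 years 0 months to 13 years 11 months."""
    return 13 * 12 <= age_in_months(birth, school_year_start) <= 13 * 12 + 11

# Hypothetical school year starting on 1 September 1960 (an assumption).
start = date(1960, 9, 1)
print(is_eligible(date(1947, 9, 1), start))  # exactly 13 years old -> True
print(is_eligible(date(1946, 9, 1), start))  # exactly 14 years old -> False
```

Under this reading the school level (grade) plays no part in eligibility, which matches the wording of the agreement.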
2. Data about the children to be tested
a. Background data on each of the children to be tested would be gathered, as follows:
1) birth date
2) sex
3) number of siblings
4) place in birth order
5) home language (if different from school language)
6) location of home (city of 20,000-100,000; 2,000-20,000; under 2,000 inhabitants)
7) years in school
8) kindergarten (attended, not attended)
9) size of class (by 10’s, from 10 or less to 61 or more)
10) father’s education
11) mother’s education
12) interest of parent (much, moderate, little or no)
13) father’s occupation
14) mother’s occupation
15) score on non-verbal intelligence test
3. The tests
a. Tests would be administered in the following fields: reading comprehension, mathematics, science, geography.
b. A non-verbal test would be administered to all of the children, the score to be added to the background information.
c. The working languages of the study would be French and English. Translation of the test
items would be done by each participant
into his home language. Copies of the translated
tests would be deposited with the Unesco Institute for Education.
d. Trial forms of the tests (except the non-verbal, which had been developed by the National Foundation for Educational Research in England and Wales) would be developed by the participants working together at Hamburg. (The items for the tests as finally constructed were, in the main, taken from existing tests originally developed in England, France, Germany, Israel and the U.S.A.)
e. The trial forms of the tests would be pre-tested with a small number of children in each country, and criticisms and suggestions sent to a test editor for consideration.
f. The tests as finally approved would be duplicated and circulated to the participants by the Unesco Institute.
g. Alterations in the substance of items would be permissible, provided they were approved by the test editor. (A typical alteration involved the change in units of measure to conform with the custom of the country.)
h. The tests would be held to approximately 30 items, in the hope that each of them could be completed in less than 45 minutes. (Pre-testing and the later administration of the tests confirmed this as an adequate length of time.)
i. No time limit would be imposed on the students.
j. A practice test would be constructed which included examples of all the kinds of items included in the tests to be scored. During the practice session students would be encouraged to ask any question that occurred to them about the practice test and about the project as a whole. Teachers were requested to answer all questions fully, including giving the answers to the practice test.
4. The schedule
The tests were to be administered in November, 1960, the data sent to New York for processing by February 1, 1961, and the results of the first data processing were to be made available to the participants by June, 1961. (Chiefly because of the excellent coordination by the Unesco Institute, this schedule was met virtually to the minute.)
5. Organization
The administrative center for the project was the Unesco Institute in Hamburg. The participants met there three times, each time for one week: in June, 1959, to plan the project and construct trial forms for the tests; in October, 1960, to take a final look at the project before testing; and in June, 1961, to examine the data and to plan for interpretation and publication.
Certain persons accepted special responsibilities for the conduct of the project, as follows:
Arthur W. Foshay (U.S.A.), project director; editor of the geography test,
Robert L. Thorndike (U.S.A.), test editor,
A. Harry Passow (U.S.A.), editor of the science test,
Gaston Mialaret (France), editor of the mathematics test,
Walter Schultze (Germany), editor of the reading test,
D. A. Pidgeon (England), editor of the non-verbal test,
D. J. Cobb (Unesco Institute), coordinator of the project.
The procedure as executed
The procedure as described above was as uncomplicated and as realistic as the participants could make it. Future planners of such projects as this, however, will be interested in the variations from the plan that developed as it was actually carried out. There were several of these: the sampling procedure varied from the plan in a number of details; the times at which the tests were administered varied somewhat because of differences in the academic calendar; a few test items had to be changed, even after the pre-testing. In order that the limits of the comparisons may be known explicitly, we shall present here descriptions of the population samples furnished by the participants, descriptions of the tests, and some comments on the translation of the tests.
The samples
The plan called for samples of from 600 to 1000 children between 13 and 14 years of age, these being all of the children in a representative community. The samples actually ranged from 300 (Switzerland) to 1,732 (Israel). The total number of children tested in all the countries was 9,918. In the section that follows, descriptions of each sample as provided by the participant are reproduced.
Belgium
The area over which the work of the Institut Supérieur de Pédagogie du Hainaut extends consists of localities with between 500 and 2,500 inhabitants (a small industrial city and its environs). Pupils at the post-primary level are scattered over a number of schools and not concentrated in any one town. Due to this fact, it was necessary to use statistical information relating to the whole of the French-speaking region of Belgium as the basis for forming the sample. According to statistics for the school year 1956-59, the school population aged 13-14 (counting only pupils who were in the grade appropriate to their age or not retarded more than 1 year) was divided up as follows:
Boys:  general secondary schools ............................................ 55 %
       vocational schools or “quatrième degré”* of the primary school ........ 45 %
Girls: general secondary schools ............................................ 47 %
       vocational schools or “quatrième degré” of the primary school ......... 53 %
* 4e degré de l’école primaire: two top classes (7th and 8th years) of the primary school providing a suitable terminal course generally with vocational bias for pupils who have not transferred to general or vocational secondary education at the end of the sixth year. (World Survey of Education, Vol. III, Unesco, Paris, p. 236)

In order to obtain representative groups, complete schools were taken, but these were later reduced by random sampling methods to conform to the percentages given in the official statistics. The procedure adopted is set out in the following table:
                                          BOYS                        GIRLS
                                   Tested   Retained in        Tested   Retained in
                                            samples for                 samples for
                                            analysis                    analysis
Vocational schools and 4e degré     145        145              205        170
General secondary schools           214        175              193        150

The sample finally submitted to analysis thus contained 640 subjects, 320 boys and 320 girls. A check was made to ensure that the eliminating process did not affect the mean scores.
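
The reduction just described can be sketched as random sampling without replacement within each school type, followed by a comparison of the retained proportions with the official ones. This is only an illustration under stated assumptions: the counts and percentages are those given above, while the pupil identifiers, the fixed seed and the function names are hypothetical, and the report does not say how the random selection was actually carried out.

```python
import random

# Official proportions (per cent) by school type, as quoted in the text.
OFFICIAL = {
    "boys": {"general": 55, "vocational": 45},
    "girls": {"general": 47, "vocational": 53},
}
# Pupils tested and pupils retained for analysis, as given in the table.
TESTED = {
    "boys": {"general": 214, "vocational": 145},
    "girls": {"general": 193, "vocational": 205},
}
RETAINED = {
    "boys": {"general": 175, "vocational": 145},
    "girls": {"general": 150, "vocational": 170},
}

def reduce_stratum(pupil_ids, keep, rng):
    """Randomly retain `keep` pupils from one stratum, without replacement."""
    return sorted(rng.sample(pupil_ids, keep))

def retained_shares(counts):
    """Percentage share of each school type among the retained pupils."""
    total = sum(counts.values())
    return {school: 100 * n / total for school, n in counts.items()}

rng = random.Random(0)  # fixed seed: an assumption, for reproducibility only
for sex in ("boys", "girls"):
    for school, n_tested in TESTED[sex].items():
        ids = list(range(n_tested))  # placeholder pupil identifiers
        kept = reduce_stratum(ids, RETAINED[sex][school], rng)
        assert len(kept) == RETAINED[sex][school]
    for school, pct in retained_shares(RETAINED[sex]).items():
        print(sex, school, round(pct, 1), "official:", OFFICIAL[sex][school])
```

With these counts the retained shares fall within one percentage point of the official figures for both sexes.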
No attempt was made to provide any representation of the Flemish-speaking part of Belgium. Children who were retarded two or more years in school were excluded. These are estimated (by M. Hotyat) to be about 10 % of the total. One or two of the sections in the vocational schools included quite retarded children. In the vocational schools about 26 % of the pupils were children of foreign workers (primarily miners), and the corresponding percentage in the general course was 6 %.
England
The sample from England consisted of 1,181 pupils, 607 boys and 574 girls. The pupils were all the 13-year-olds attending school under one Local Education Authority in central England. This particular area had been chosen since other evidence had shown that, on tests given at the age of 11, the authority was quite representative of the whole country both with respect to mean score (100) and standard deviation (15). Although the proportions of children from urban and rural administrative areas were also similar to those in the country as a whole, the authority was predominantly rural in character and contained no large industrial town.
The numbers of schools and pupils were as follows:

                          Schools    Boys    Girls    Both
Grammar                      4       115     105      220 (18.6 %)
Modern                       6       461     443      904 (76.5 %)
Unreorganised all-age        3        31      26       57 ( 4.8 %)

Three of the grammar schools, each containing boys and girls, were controlled by the local authority concerned and the fourth was a Direct Grant boys’ school which volunteered to assist in the enquiry. All six secondary modern schools were coeducational, as were the other three small unreorganised rural schools containing pupils of both primary and secondary school age.
Finland
The sample from Finland included 727 pupils, 386 boys and 361 girls. These came from about 50 classes in schools widely distributed over Finland. The choice of schools and numbers of pupils are such as to make the sample closely representative of Finland as a whole with respect to grade and type of school and with respect to the percentages of urban and rural pupils. The proportions for the country and for the sample were reported as follows:

                                         Whole country    Sample
In primary school (Grades IV-VIII)           68 %          67 %
In secondary school (Grades I-V)             32 %          33 %
In rural community (7-15 yrs.)               64 %          62 %
In urban community (7-15 yrs.)               36 %          38 %
Rural students in primary school             78 %          76 %
Rural students in secondary school           22 %          24 %
Urban students in primary school             46 %          45 %
Urban students in secondary school           54 %          55 %

No pupils were tested in Finland who were in classes below Grade VII of the primary school. This meant that in primary schools about 2 % of children in the age group, most of them retarded, were not included in the sample. In the secondary schools all 13-year-old children were included, irrespective of their grade level (actual proportions: 55 % in Grade II, 20 % in Grade I, and 25 % in Grade III) and the secondary school sample was thus as representative as possible. No pupils were included from Swedish-speaking districts in Finland (about 8 % of the population). Many of the classes in rural areas included mentally retarded children.
France
The sample from France was a relatively small one of 451 pupils, 181 of whom were boys and 270 girls. The small size was accounted for in part by bad weather, which reduced school attendance at the time of testing. The sample was drawn from one small city near Caen, together with the adjoining rural area. The sample corresponded approximately with the total French population with respect to percent of urban residence, occupation of father and size of family, as these were determined in an earlier extensive survey by Heuyer, Piéron and Sauvy*.
The sample was chosen to represent the different types of schools in the following numbers:

                       Boys    Girls    Total
Primary school          117     142      259
Vocational school        20      20       40
Secondary school         52      98      150

Those pupils who were retarded more than one year in school were excluded. This is believed to be about 5 % of the total age group.
Federal Republic of Germany
The sample consisted of 811 pupils, 403 boys and 408 girls, who were attending schools in the city of Darmstadt in Hessen, or in the adjoining rural districts. These three districts are believed to correspond well with the country as a whole in socio-economic structure, in distribution by types of occupation, and in education. Furthermore, previous experience in setting up test norms has shown this region to be close to the national average.
Within the region, the sample was chosen so that the proportions corresponded closely with the national average with respect to type of school attended, and within the Volksschule representativeness was sought with respect to grade level reached and, within the eighth grade, with respect to size of school. The proportions are shown below for both sexes combined.
* Heuyer, G., Piéron, H., and Sauvy, A. Le Niveau Intellectuel des Enfants d’Âge Scolaire. Paris, Presses Universitaires de France, 1950.

                              Country as a whole (%)    Sample (%)
E.S.N. 8th grade                       2.0                  2.1
Mittelschule 8th grade                10.7                 11.0
Gymnasium 8th grade                   18.1                 18.5
Volksschule total                     69.2                 68.4
   6th grade                           1.7                  1.7
   7th grade                           7.3                  7.4
   8th grade                          60.2                 59.3
      1-3 classes                     15.0                 15.4
      4-7 classes                     18.1                 16.2
      8-9 classes                     27.1                 27.7

No data were available concerning retardation in the Mittelschule or Gymnasium, and so no cases of this type were included.
As in all other countries, except Scotland, the tests were administered in Germany in November, 1960, to pupils aged 13.0 to 13.11 on the first day of the school year. As Germany was in the unique position of having a school year beginning in April, not September as in other countries, the German children were in fact aged between approximately 13.7 and 14.6 at the actual time of testing, as opposed to approximately 13.2 to 14.1 in the other countries. This difference may, however, have been partly offset by the long summer and other holidays which considerably reduced the German pupils’ effective period in school prior to testing.
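
The age ranges quoted above follow from adding the months elapsed since the start of the school year to the qualifying ages of 13.0 and 13.11 (years.months). A short sketch of that arithmetic, assuming whole-month offsets of seven months (April to November) and two months (September to November); the function names are illustrative only.

```python
def fmt(months: int) -> str:
    """Format an age in months in the report's 'years.months' notation."""
    return f"{months // 12}.{months % 12}"

def age_range_at_testing(months_since_year_start: int) -> tuple[str, str]:
    """Age range at testing for pupils aged 13.0 to 13.11 when the school year began."""
    youngest = 13 * 12 + months_since_year_start
    oldest = 13 * 12 + 11 + months_since_year_start
    return fmt(youngest), fmt(oldest)

# Assumption: whole-month offsets between the start of the school year and testing.
print(age_range_at_testing(7))  # April start (Germany): ('13.7', '14.6')
print(age_range_at_testing(2))  # September start: ('13.2', '14.1')
```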
Israel
Because of local interest in certain types of sub-groups, Israel tested a relatively large sample, almost 1,900. Data were analyzed for 1,873, 930 coded as girls and 942 coded as boys. These were from a number of different schools in different localities. The basic classification is into town and city schools, schools in Moshavim (collective agricultural settlements), and schools in Kibbutzim (communities with communal housing, dining, and child-rearing). Schools in Moshavim were all located in well-established settlements of early immigrants. Schools in Kibbutzim were chosen in equal proportions to represent three ideological trends, but otherwise at random. Town and city schools were chosen so as to include essentially equal numbers of schools that had scored high, above average, and average on the national eighth grade survey.
The Israeli data are based on all eighth grade pupils in the schools which were tested. No 13-year-olds who were not in the eighth grade were included in the Israeli sample. The result is that 177, or 9.5 % of the group were 14 years of age or over at the beginning of the school year, while 514 or 27.4 % were less than 13 years of age. At the same time, all 13-year-olds who were in the 7th or earlier grades were excluded. These are estimated to be 10 % of the total age group. Likewise about 4 % of the 13-year-olds who had progressed beyond the eighth grade were excluded. Furthermore, the sample was limited to schools in which the pupils were primarily of the early group of immigrants to Israel, and thus largely of European origin. Thus, of the total sample only 193 or 10.3 % had fathers born in an African or Asian country compared to about 30 % in the eighth grade nationally.
(It is to be noted that of the fathers 166, or 8.9 %, are classified as professional or managerial, but Dr. Smilansky indicates this is not unusually high for the European section of the country’s population.)
(It is also to be noted that the date of testing in Israel was February-March rather than November, so that pupils had an additional three or four months of growth and schooling as compared with most countries.)
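
The percentages quoted for the Israeli sample can be checked against the analysed total of 1,873. The sketch below simply recomputes them from the counts given in the text; the labels and the helper name are illustrative.

```python
TOTAL = 1873  # pupils whose data were analysed, as stated in the text

counts = {
    "aged 14 or over at start of school year": 177,
    "aged under 13 at start of school year": 514,
    "father born in an African or Asian country": 193,
    "father professional or managerial": 166,
}

def pct(n: int, total: int = TOTAL) -> float:
    """Percentage of the analysed sample, rounded to one decimal place."""
    return round(100 * n / total, 1)

for label, n in counts.items():
    print(f"{label}: {n} of {TOTAL} = {pct(n)} %")
```

Each recomputed figure agrees with the percentage given in the text (9.5, 27.4, 10.3 and 8.9).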
Poland
The total population tested in Poland was 1,000, consisting of 346 boys and 654 girls. Children were selected from five different environments. The rural children were tested in large villages possessing at least the first classes of secondary school. Children from small towns were tested at Milicz, from mixed agricultural-industrial surroundings at Dzierzoniow, from industrial areas at Walbrzych (where there are coal mines and heavy industry), and children from a favored cultural environment in one of the districts of Wroclaw, 80 % of whose inhabitants were reported by the Polish research agency as being highly educated people.
Scotland
The Scottish sample consisted of 991 pupils, 515 boys and 476 girls, drawn from two of the educational administrative units in Scotland, the city of Aberdeen and the county of Stirlingshire. These were chosen because each had been found in the past to be representative of educational achievement in Scotland as a whole. In Aberdeen, one-sixth of the 13-year-old pupils were tested, the sampling being based upon the day of the month on which they were born. In Stirlingshire, a sample was drawn from half of the schools in the county in such a way that each school course was represented in the same proportions as in the county as a whole. Thirteen schools were involved in Aberdeen and ten in Stirlingshire.
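
Sampling by day of birth, as used in Aberdeen, can be illustrated as follows. The report does not state which days of the month were chosen, so the five days below are a purely hypothetical choice that yields roughly one-sixth of pupils; the names `SELECTED_DAYS` and `in_sample` are likewise assumptions.

```python
from datetime import date

# Assumption: the report does not say which days of the month were used.
# Five of the (roughly thirty) days in a month give about one-sixth of pupils.
SELECTED_DAYS = {3, 9, 15, 21, 27}  # hypothetical choice

def in_sample(birth: date, days=SELECTED_DAYS) -> bool:
    """Select a pupil if the day of the month of birth is one of the chosen days."""
    return birth.day in days

# Expected sampling fraction over one non-leap year of birth dates.
first = date(1947, 1, 1).toordinal()
all_days = [date.fromordinal(first + i) for i in range(365)]
fraction = sum(in_sample(d) for d in all_days) / len(all_days)
print(round(fraction, 3))  # 60 of 365 days, about 0.164
```

Because the day of the month on which a child is born is unrelated to schooling, a rule of this kind behaves much like a simple random sample of the age group.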
The testing in most of the countries took place in November of 1960, it having been agreed that this was a good time for those countries in which the school term begins early in October or during the month of September. The school year would have been well started by this time, and the children would have had a substantially similar number of days since the school year had begun. In Scotland, however, the testing was conducted in June of 1960 as certain reorganisations were due to take place in Scottish secondary schools during the autumn of 1960. It was therefore necessary to give the test at that time, and to adjust the selection procedure so that the Scottish children were at the appropriate age when they took the tests.
Sweden
The sample in Sweden consisted of 567 pupils in all, 284 boys and 283 girls. These were drawn
from about 30 classes in the middle part of the country. The classes were chosen from schools
which in previous national surveys had given results with an average score and variability close
to the national average. Testing was limited to seventh year classes in the various types of
schools. Those pupils in the classes who were above or below the age limits for the study were
excluded from the sample.
No attempt was made to test those 13-year-olds who were in classes either above or below the
seventh. The number of 13-year-old pupils in classes above the seventh is estimated to be about
10 % and the number in the sixth or lower classes is estimated to be about 5 %.
Switzerland
The Swiss sample consisted of only 314 pupils, 153 boys and 161 girls. These were drawn from
the city of Geneva, no attempt being made to represent a wider geographical
region. Dr. Roller
points out that a national sample in Switzerland
would have had to be drawn from each of the
cantons in the country, there being no other unit that is appropriate to the cultural and population
distribution
in Switzerland.
Since the total 13-year-old population of Geneva is about 2,000, the
sample is considered adequate to represent them.
Within Geneva, the sample was set up so as to include appropriate
numbers in the different
types of schools in grades 7 and 8. The total numbers tested were as follows:
7th Grade
boys
8th Grade
girls
boys
girls
Ecole primaire
Collège de Genève
Ecole primaire
Collège de Genève
Collège moderne
Ecole supérieure lat.
Ecole supérieure mod.
Ecole ménagère
101
72
203
72
94
52
77
76
Those pupils among this total group who were 13 years of age were included in the final sample.
No attempt was made to test 13-year-olds who fell below the seventh grade. These are estimated
to comprise 5 % of the age group.
U.S.A.
In the United States the total group tested was 2,254, but in order to reduce the burden of
statistical analysis, only every other pupil was included in the sample finally analyzed, which
comprised 1,127 pupils, 568 boys and 559 girls.
The United States sample consisted of all the 13-year-olds in the public school systems in three
different educational administrative
units. One was an industrial city that is part of the Boston,
Massachusetts
metropolitan
area. A second was an industrial city of about 50,000 in southern
Ohio, not far from Cincinnati. The third was a rural county in south central Illinois. The three
units were chosen because they had been found to give results close to the national average
when they were used in the nationwide standardization
testing for the Metropolitan
Achievement
Test published by the then World Book Company.
Yugoslavia
The Yugoslav sample consisted of 685 children, distributed as indicated in the following table:

                            Boys    Girls   Combined
Urban                        202      206        408
Rural                        135      142        277
Multiple class teaching     (28)     (31)       (59)
Total                        337      348        685

685 subjects in 24 classes, of 13 schools in 10 localities.
Some previously determined localities were replaced by others with the same general situation
(general cultural level of the population, distance from principal lines of communication, etc.).
The localities where testing actually occurred were: Bukevje, Klara, Lomnica, Novo Čiče, Odra,
Veleševec, Velika Gorica, Velika Mlaka, Vukovina, Zagreb.
In the localities tested, all 13-year-olds attend school. Only handicapped children educated in
special institutions for the handicapped were excluded from the sample. Testing was also carried
out with pupils above and below the seventh grade. All classes are co-educational.
The tests
Four tests of academic achievement were given. Since our general purpose was to gather data
that could yield inferences about intellectual functioning, or reasoning, an attempt was made in
each test to include items that called for reasoning, but did not require previous knowledge of
the field. The ideal item was one that presented all the information required for a correct answer.
The typical item was multiple-choice. Here are very brief descriptions of the tests.

Mathematics test:
5 items requiring simple computation
5 “basic concept” items, e. g.:
“Which is the largest of the following numbers?
a) 94512
b) 19542
c) 95421
d) 59241”
7 verbal problems, e. g.:
1) “A train leaves Rome at 1:00 p. m., traveling at 60 miles an hour. A second train leaves
along the same line 30 minutes later, traveling at 60 miles an hour. At what time does the
second train overtake the first?”
2) “This table gives readings of maximum and minimum temperature in degrees Fahrenheit, of
rainfall in inches, and of sunshine recorded for each month of the year.
a) In which month did the highest temperature occur?
b) In which month was the difference between the maximum and minimum temperatures greatest?
c) Which was the wettest month?”
9 problem sequences, e. g.:
“We know that the altitude of a triangle is a line drawn from one vertex perpendicular
to the opposite side. We are given the triangle in Figure 2.
a) What is the altitude from vertex B?
b) What is the altitude from vertex C?”
[FIGURE 2: a triangle with vertices A, B and C and points marked H, H’ and H”.]
Total: 26 items, some with subdivisions; 29 responses.

Science test:
16 items of the form:
“When you enter a movie theater on a sunny day, you do not see well at first because the
a) pupils of the eyes are still large
b) lenses in the eyes will focus the light in front of the retina
c) pupils of the eyes are still small
d) lenses in the eyes will focus the light behind the retina.”
5 items to be marked as either “definitely true, probably true, impossible to determine,
probably false, or definitely false,” e. g.:
“One can tell the approximate age of a tree from the rings on a cross-section of its trunk.”
Total: 21 items.

Geography test:
12 multiple-choice items depending on information, e. g.:
“The Danube flows into the
a) Red Sea
b) Mediterranean Sea
c) Dead Sea
d) Black Sea.”
16 items involving drawing inferences from a set of hypothetical maps.
4 items requiring that generalizations be stated as supported or not supported by a bit of text.
“Statements
a) Many mountain stream beds are narrow, steep, and full of rapids.
b) People who live in rugged mountain areas tend to depend for their livelihood more upon
animal products than upon the growing of crops.
c) Mountain dwellers are often fine craftsmen.
d) In mountainous areas, water power can be used to produce electricity for manufacturing.
Generalizations
1) Beautiful carved leather articles are made in the Himalayan area.
2) In mountainous areas, export trade is restricted to articles of high value and small bulk.
3) Mountain people do not build large boats.
4) Life in many mountainous areas has become considerably more comfortable during the present century.”
Total: 32 items.

Reading comprehension:
5 reading passages, each followed by 6 or 7 comprehension items, e. g.:
“According to the text, considerate driving means
a) greeting other people in a friendly way
b) giving help if there has been an accident
c) assisting if there has been a breakdown
d) watching out for old and infirm people and children.”
Total: 33 items.

Non-verbal test:
74 items, requiring either perception of analogies among abstract figures, completion of series,
perception of differences, or perception of relationships.

Validity of the tests
The multiple-choice form of testing is more familiar to students in Scandinavia, the United Kingdom,
and Germany than it is in the other participating countries. A practice test which included
examples of all the forms of test items was provided for this reason. There is no evidence that
the scores actually achieved bore any relationship to previous experience with tests of this kind.
Reliability of the tests
The reliability of the tests is discussed by Professor Thorndike in the second chapter of this
report. The general estimates of reliability of the tests are as follows:

Mathematics    .81
Reading        .81
Geography      .70
Science        .62
A note on translation
The tests were originally prepared in either English, French, or German. They had to be translated
into eight languages: English, Finnish, French, German, Hebrew, Polish, Serbo-Croatian and
Swedish. The problem of translation was, of course, of great concern. Since the participants did
not mean to make this the main problem of the study (any more than they meant to make sampling
or test construction the main problems), they agreed to leave the translation of the items into
their own languages to each participant. They did not, for example, test the translation by having
it translated back into its original language in order to compare the re-translation with the original.
This led to occasional differences in items. A striking example of this, as might be expected, was
in the translation of a passage in the reading comprehension test, in which the literary quality of
the passage in its original French was its main characteristic.
“Elle sort d’une touffe d’herbe qui l’avait cachée pendant la chaleur. Elle traverse l’allée de
sable à grandes ondulations.”
“A caterpillar emerges from a tuft of grass where it has been concealing itself during the warm
weather. It crosses the gravel path, moving in a series of large ripples.”
“Quelle belle chenille, grasse, velue, fourrée, brune, avec des points d’or et ses yeux noirs!”
“What a beautiful caterpillar: fat, hairy, furry and brown, with golden spots and black eyes!”
A different translation problem appeared at one point in the mathematics test. A question in the
English original read: “How would omitting the decimal point in 18.52 change the number?”
One of the answers from which the examinee could choose read: “Makes it 1/10 as large”.
The French translation reads: “Il devient 10 fois plus petit”, an entirely different problem.
Such difficulties in translation apparently were so small in number and so scattered as to be
insignificant. There is no evidence that they seriously influenced the national scores.
Taking this exploratory study as a whole, we think we have demonstrated certain matters of
importance in the field of comparative education:
1. A large scale project, which depends on similarities in technical and philosophical assumptions
in education and in measurement, can be done.
2. The data obtained, even under the restrictions that inevitably arise, can be analyzed fruitfully.
And, by extension, we think we have shown that it is possible to introduce a large empirical
element into comparative education, an element only slightly present in the field until now.
Robert L. Thorndike
INTERNATIONAL COMPARISON OF THE ACHIEVEMENT OF 13-YEAR-OLDS
In the second section of this report Professor Thorndike presents and discusses certain aspects
of the results. He explains the reason for deciding to restrict comparisons between country and
country to patterns of achievement (national profiles), omitting comparisons of levels, and he
presents findings on the reliability of the tests. The results are analysed in relationship to sex
and certain background variables. Further analyses of variations between countries in the relative
difficulty experienced with selected test items, combined with a study of item content, lead the
author to investigate a number of hypotheses with results which demonstrate the possibilities of
international evaluations of achievement.
Because the present project was a pilot enterprise, carried out with limited resources, it was not
practical to try to get a truly representative
sample of the 13-year-old population in each country.
Sampling procedures varied from country to country, as described by Foshay, but in most instances sampling was limited to one or a few communities or regions that were thought to be representative of the country as a whole. In a few countries (England, Scotland, Sweden) there had
previously been fairly complete national testing surveys, and communities or regions could be
chosen which had been found on these to correspond to the country as a whole. In some countries (Switzerland,
Israel) the sample was intentionally
restricted to a place or to a fraction of
the population that were fairly clearly not representative
of the country as a whole. In most of
the countries an attempt was made to achieve representativeness,
but the evidence upon which
communities or schools were chosen was rather meager and impressionistic.
Because of these limitations on the representativeness
of the national samples, there seems to
be little value in comparing the absolute level of achievement in one country with that in other
countries. For this reason, no country by country tables of mean scores are reported. We will
turn our attention instead to an examination of the magnitude of the differences
between countries, and to the differences in patterns of achievement from country to country.
Statistical characteristics of the tests
The test battery consisted of four short achievement tests and a non-verbal measure of scholastic
aptitude. Three of the achievement tests yielded separate part scores as well as a total score.
The nature of the several tests and sub-tests is described by Foshay (pp. 16 to 18). At this point
we shall merely supplement that description by a brief table (Table I) showing for each test and
sub-test (1) the number of items, (2) a general estimate of the mean, obtained by averaging the
means for 11¹ national groups, (3) a general estimate of the standard deviation, obtained as an
average of the 11 standard deviations within countries, and (4) a general estimate of reliability,
obtained by Kuder-Richardson
Formula No. 20 from the average standard deviation and the
average item difficulty over 11 national groups.
As might be expected, the reliabilities
of many of the sub-tests, consisting of from 4 to 10 items,
are quite low. However, the total test reliabilities
are fairly satisfactory,
the estimated values
ranging from a low of .62 for the 21-item science test through .70 for the geography test, .81
for the reading test and the mathematics test, to .89 for the considerably
longer non-verbal test.
The estimates are rough, since the assumptions underlying the Kuder-Richardson
formulas are
not completely
met. However, the general order of magnitude is indicated. Though the tests
would be of no use for the study of single individuals, they appear adequate for comparisons of
groups
of several hundred, and these are the comparisons
with which this study is primarily
concerned.
¹ Results from Yugoslavia became available too late for inclusion in these and certain other analyses.
TABLE 1
Statistical Parameters of Tests

                           No. of    Average    Average        K-R No. 20
                           items     Mean       Stand. Dev.    Reliability
Non-verbal Aptitude          75      33.66      12.19          .89*
Mathematics
 - Part 1                     5       3.63       1.14          .51
 - Part 2                     5       4.19       1.03          .58
 - Part 3                     7       3.91       1.60          .51
 - Part 4                     9       3.05       2.02          .73**
 - Total                     26      14.98       4.40          .81
Reading Comprehension        33      21.36       5.27          .81
Geography
 - Part 1                    12       7.01       2.16          .49
 - Part 2                    16       7.82       2.77          .57
 - Part 3                     4       1.59       1.07          .38
 - Total                     32      16.42       4.65          .70
Science
 - Part 1                    16       7.77       2.85          .59
 - Part 2                     5       1.99       1.17          .28
 - Total                     21       9.76       3.39          .62

* By K-R Formula No. 21, but spuriously high because of speed factor.
** Inflated, because items are not independent.
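
As a rough check on the footnoted non-verbal figure, Kuder-Richardson Formula No. 21 can be evaluated from the number of items, the mean and the standard deviation alone. The sketch below is illustrative and is not the authors' own computation; the function name `kr21` is hypothetical, and the formula itself assumes that all items are of equal difficulty.

```python
def kr21(n_items: int, mean: float, sd: float) -> float:
    """Kuder-Richardson Formula 21: reliability from test length, mean and SD.

    Assumes every item has the same difficulty, which is an assumption
    of the formula rather than something the report verifies.
    """
    variance = sd ** 2
    length_factor = n_items / (n_items - 1)
    return length_factor * (1 - mean * (n_items - mean) / (n_items * variance))


# Non-verbal aptitude row of Table 1: 75 items, mean 33.66, SD 12.19.
nonverbal = kr21(75, 33.66, 12.19)
print(round(nonverbal, 2))  # 0.89
```

The result agrees with the tabled value of .89. The same formula applied to the other rows gives lower values than those tabled, as would be expected if those rows were obtained by Formula No. 20.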
Within and between countries variance
From the raw scores on each test, raw score means and standard deviations were obtained. A
crude average of the variances in the 11 separate countries provided an estimate of the average
variability of performance of pupils within a single country (the “within-countries” variance).
The mean of the means for 11 countries was used as a grand “total mean”. Variance of the 11
national means around this average value provided an estimate of variability from country to
country (the “between-countries” variance). A comparison of the two variance estimates, the
one for variability within a country and the other for variability from country to country, provides
an index of the magnitude of international differences. Table 2 expresses the variance between
countries as a percent of typical variance within a country. The results are shown for boys and
girls separately, and for the total group of all pupils.
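
The index just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name is hypothetical, the input figures are invented and are not data from the study, and the variance of the national means is taken with divisor n, since the report does not say whether n or n - 1 was used.

```python
from statistics import mean, pvariance


def between_as_percent_of_within(national_means, national_sds):
    """Between-countries variance as a percent of average within-country variance."""
    # "Within-countries" variance: crude average of the national variances.
    within = mean(sd ** 2 for sd in national_sds)
    # "Between-countries" variance: spread of the national means around their
    # unweighted grand mean. Dividing by n (pvariance) is an assumption.
    between = pvariance(national_means)
    return 100 * between / within


# Hypothetical figures for three countries (not data from this study).
example = between_as_percent_of_within([10.0, 12.0, 14.0], [5.0, 5.0, 5.0])
print(round(example, 1))  # 10.7
```

With national means of 10, 12 and 14 and a common standard deviation of 5, the between-countries variance is 8/3 and the within-countries variance is 25, giving about 10.7 percent.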
TABLE 2
Variance Between Countries Expressed as a
Percent of Average Within-Country Variance

                           Boys and    Boys     Girls
                           Girls       Only     Only
Non-verbal Aptitude         12.1 %     11.5 %   11.7 %
Mathematics
 - Part 1                    9.9       13.9      7.5
 - Part 2                    9.4       11.6      8.8
 - Part 3                   11.8       14.0     11.7
 - Part 4                   14.3       18.4     12.2
 - Total                    16.2       21.2     13.4
Reading Comprehension        6.2        8.1      5.6
Geography
 - Part 1                   35.8       39.3     37.4
 - Part 2                    7.1        7.9      7.8
 - Part 3                    5.6        4.4      7.1
 - Total                    15.4       17.7     15.9
Science
 - Part 1                    6.1        7.3      8.0
 - Part 2                    2.3        1.5      3.3
 - Total                     5.2        5.9      7.1
It is clear that the variation between national means is small in relation to the variability of
scores within any one country. National differences represent a minor rather than a major
component in these results, and the probability is that they are over-estimated
rather than under-estimated, because the countries that did relatively well on the tests were in several instances those
that were known to have tested an up-graded sample of their populations. We suspect that with
truly representative national samples, the differences would have been reduced. Of course, the
participants in this survey were all countries with a basically European culture, and with
well-developed educational systems. A greater heterogeneity in national cultures and educational
levels would very probably increase the national differences, perhaps substantially.
A comparison of the different tests with respect to magnitude of international
differences brings
out some rather dramatic and surprising results. With these tests and samples, the tests that show
the smallest variations from country to country are the tests of science and of reading comprehension. The presumably relatively culture-free
non-verbal aptitude test shows about twice as much
country-to-country
variation as the reading and science tests, and the geography and mathematics
tests about two-and-a-half
times as much. It must be remembered that all the tests had been
translated into eight different national languages - English, Finnish, French, German, Hebrew,
Polish, Serbo-Croatian,
and Swedish. The fact that a reading test, which would appear to be
especially susceptible
to changes in difficulty
with translation,
remained so uniform is rather
unexpected. The findings suggest that the nearest thing we have to a culture-fair
test may be a
carefully translated reading test, and that level of reading ability is the feature with respect to
which different educational programs are most nearly uniform.
Several of the tests had sub-tests and a comparison of the variability between nations on these is
of some interest. In the case of mathematics it is the verbal problems (Part 3) and the inductive
series (Part 4) that showed the largest international differences. Geographical
information (Part 1)
showed by a large margin the widest international variation of any test, whereas in map reading
(Part 2) the differences were much smaller and in drawing generalizations
(Part 3) smaller still.
Scientific judgment (Part 2) showed very little variation between countries, and scientific information somewhat more. In some measure, the above results are an outcome of differences
in reliability of the sub-tests. If the within-group
variance is inflated by measurement
errors, the
between-group
variance will necessarily
look small in comparison.
However, this accounts for
only part of the results, and the major differences appear to arise from more genuine factors.
Generally speaking, the boys varied more from country to country than did the girls. However, the
sex differences in this respect were neither large nor entirely consistent.
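
The remark above about sub-test reliability can be made concrete under classical test theory, in which reliability is the share of within-group variance that is not measurement error. On that assumption (ours, not one stated in the report), and treating the error in a mean of several hundred pupils as negligible, dividing the observed percentage by the reliability gives the percentage an error-free test would show. The sketch applies this to two sub-tests, using the combined-sexes figures of Table 2 and the reliabilities of Table 1; the function name is hypothetical and the reliabilities are themselves rough estimates.

```python
def correct_for_unreliability(observed_percent: float, reliability: float) -> float:
    """Between/within variance percent after removing within-group error variance.

    Classical test theory assumption: true within variance equals the
    observed within variance multiplied by the reliability.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    return observed_percent / reliability


# Science Part 2: 2.3 percent observed, reliability .28.
science_part2 = correct_for_unreliability(2.3, 0.28)
# Geography Part 1: 35.8 percent observed, reliability .49.
geography_part1 = correct_for_unreliability(35.8, 0.49)
print(round(science_part2, 1), round(geography_part1, 1))  # 8.2 73.1
```

Even after this correction the geographical information sub-test remains far more variable between countries than the scientific judgment sub-test, which is in line with the statement that reliability accounts for only part of the results.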
National profiles
Though variations in the sampling procedure from country to country make comparisons of level
of achievement of questionable
value, comparisons of patterns of achievement from country to
country seem sound and of a good deal of interest. By pattern of achievement we mean a country’s achievement on the specific tests and sub-tests, relative to its own over-all level of achievement.
Patterns of achievement were arrived at through the following steps:
(1) For each test a crude average of the national means was computed, and also a crude
average of the national standard deviations.
(2) On any one test, such as the test of reading comprehension,
each country’s average score
was converted into a “standard score”, by subtracting from it the average score for all countries
and dividing the result by the average standard deviation for that test.
(3) The “average standard score” for the five tests (i. e., the total scores) was computed for
each country.
(4) This “average standard score” was subtracted from the standard score on each test and
sub-test. That is, each specific standard score was expressed as a deviation from the country’s
“average standard score”. In this way, each national group was reduced to a common and comparable base line. It is then possible to examine and compare directly the peaks and hollows of
achievement in the different countries.
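
The four steps above can be sketched as follows. This is an illustration only: the function and variable names are hypothetical, the input data are invented, and the sketch handles total test scores alone. Sub-test scores would be standardized in the same way in step (2) but left out of the average in step (3).

```python
from statistics import mean


def national_profiles(means, sds):
    """Return each country's profile in hundredths of a standard deviation.

    means, sds: dict mapping country -> dict mapping test -> value.
    """
    countries = list(means)
    tests = list(means[countries[0]])
    # Step 1: crude (unweighted) averages of national means and SDs per test.
    avg_mean = {t: mean(means[c][t] for c in countries) for t in tests}
    avg_sd = {t: mean(sds[c][t] for c in countries) for t in tests}
    profiles = {}
    for c in countries:
        # Step 2: standard score of the country's mean on each test.
        z = {t: (means[c][t] - avg_mean[t]) / avg_sd[t] for t in tests}
        # Step 3: the country's average standard score over the tests.
        z_bar = mean(z.values())
        # Step 4: deviation from that average, in hundredths of an SD.
        profiles[c] = {t: round(100 * (z[t] - z_bar)) for t in tests}
    return profiles


# Hypothetical two-country, two-test data (not from the study).
means = {"A": {"reading": 10.0, "science": 20.0},
         "B": {"reading": 14.0, "science": 20.0}}
sds = {"A": {"reading": 4.0, "science": 5.0},
       "B": {"reading": 4.0, "science": 5.0}}
print(national_profiles(means, sds))
```

In this invented case country A comes out 25 hundredths below its own average on reading and 25 above on science, and country B shows the reverse. Because each profile is centred on the country's own average, its entries over the tests used in step (3) sum to zero apart from rounding.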
TABLE 3
National Patterns of Achievement
Expressed as standard score deviations from national average on all 5 tests
Belgium
Non-verbal
Aptitude
Mathematics
-Part1
-Part2
- Part 3
-Part4
-Total
-16
-36
9
-18
-6
-17
-31
-16
-16
-19
21
-40
-7
c46
-38
28
32
23
-18
6
-24
-11
- ia
3
-18
Science
-Part1
-Part2
-Total
-16
5
- 14
7
4
28
43
30
-8
11
19
-58
-12
27
-33
ta
11
-7
4
3
23
7
15
25
16
-15
26
47
-a
-29
20
13
23
16
16
4
9
- 29
13
-21
24
-8
24
-14
-24
-14
Aptitude
-Part1
-Part2
-Part3
- Part 4
-Total
-18
25
-9
2
43
30
Scotland
Sweden
Swik.
-a
6
12
11
33
19
-16
3
0
0
10
-4
16
-25
-26
-27
-58
-23
-43
23
-45
12
20
92
-31
-65
16
-33
-12
33
-16
-20
12
ia
-5
-9
9
0
24
7
12
20
12
-43
-39
-43
-44
-3
23
-1
-24
-Part1
- Part 2
-Total
Yugosl.
25
Geography
-Part1
-Part2
-Part3
-Total
U.S.A.
- 28
-43
- 29
- 19
-39
Reading Comprehension
Science
3
Israel
Combined
Poland
Non-verbal
Germany
-51
-42
- 20
-9
-40
-Part1
- Part 2
-Part3
-Total
Mathematics
France
12
Geography
and Girls
Finland
25
40
34
42
44
Reading Comprehension
c = Boys
England
-1
4
4
-3
7
-9
6
30
-54
15
35
-16
27
10
16
5
28
24
27
28
24
21
The complete set of national profiles is presented in Table 3. These show results for boys and
girls combined. All entries in the table are expressed in hundredths of a standard deviation. That
is, the entry 12 for Belgium on the non-verbal test means that Belgium’s standard score on that
test was twelve hundredths of a standard deviation higher than Belgium’s average standard
score on all five of the tests. Thus, if we look at the results for Belgium, we see that the pupils in
Belgium were most outstanding, relative to their over-all level of performance,
in mathematics.
Here they show a peak of almost half a standard deviation. They are slightly above their own
over-all average on the non-verbal aptitude test, and they do relatively least well on the test of
reading comprehension.
The sub-test scores show only minor deviations from the total scores.
England, by contrast, performs especially well on the non-verbal aptitude test and is especially
weak in mathematics and geography. The geography sub-test dealing with geographical
information is notably lower than the map-reading or inference tests.
A similar analysis could be made of the pattern for each country, pointing out points of relative
strength or weakness in each. Or the results can be examined from the point of view of each
test in turn. This has been done in Figure 1, in which the strength or weakness of each country
(relative to its own over-all mean) has been plotted on a common scale. We see that on the
non-verbal test the country that does especially well is England. Since the test was English in
origin, this result may possibly reflect some degree of previous familiarity
with the test, and
acceptance of the task as a reasonable and sensible one. Scotland also does well on the test,
while Germany and Finland perform poorly on it.
On the mathematics test, all the French-speaking
countries are superior performers, with Belgium
leading the way. Poland also shows up to advantage. The English-speaking
countries are consistently poor. One wonders what part of this is contributed
by their complex system of denominate numbers. Yugoslavia also has marked difficulty with this test.
National differences
in reading comprehension
are relatively small. It is on this test that Yugoslavia shows up to best advantage, followed
by Scotland and Finland, while Belgium and
Poland do relatively poorly.
On the geography test we find Germany, Israel and Poland leading the way, and their superiority is especially
marked in that section of the test dealing with geographic
information.
The
English-speaking
countries do notably poorly on the geography test as a whole, and especially
on the sub-test dealing with geographical
facts and information.
This is an area in which the
different national curricula appear to have produced distinctly different results.
Science is an area in which the French-speaking
countries are relatively weak. Here the leaders
in relative achievement are the United States and Germany, with Yugoslavia and England following in that order.
Some countries show rather marked peaks and hollows in their profiles. Thus, England is very
high on non-verbal aptitude and very low in mathematics and geography. Belgium is high on
mathematics and quite low in reading. Others show a notably even pattern of performance. The
best example is Sweden, which performs at almost the same level of excellence on all the tests.
The patterns of relative strength and weakness provide a picture of achievement
under the
different educational
systems. They provide no explanation
of how the differences
come into
being. This must be contributed by the investigator who is intimately acquainted with the educational systems in the several countries. However, the data presented here must still be considered
quite tentative. They are limited by (1) the local and only partially representative
character of
many of the national samples, (2) the brevity of the tests and especially the sub-tests, and (3)
the limited opportunity
to plan test content so as to assure the most balanced and appropriate
representation
of content and objectives.
The results reported so far are for boys and girls combined. It is of some interest to look at
the results for the sexes taken separately, and this is done for the five total tests in Table 4.
Scores for boys and girls are each expressed as deviations from the average of all five tests for
that sex in that country. That is, the score of 11 for Belgian boys on the non-verbal test means that
the Belgian boys were eleven-hundredths
of a standard deviation higher on that test than they
were on the average of the five tests.
FIGURE 1
Relative Achievement of National Groups on Tests
[Figure: for each of the five tests (Non-verbal Aptitude, Mathematics, Reading Comp., Geography,
Science), the twelve countries are plotted on a common scale according to their strength or
weakness relative to their own over-all mean.]
TABLE 4
Sex Differences in Pattern and Level
of Achievement
Average
B-G
32
22
24
15
32
18
37
14
6
24
- -11
17
Belgium
England
Finland
France
Germany
Israel
Poland
Scotland
Sweden
Switzerland
U.S.A.
Yugoslavia
Average
19
Non-Verbal
6
G
11
42
-47
-7
-39
3
-22
16
-21
3
12
-12
-5
13
50
- 28
- 22
- 33
17
-17
33
9
21
11
-6
Geography
Science
B
G
B
G
B
G
B
G
42
-42
-2
34
-18
-10
22
-44
-4
17
-35
-56
46
-37
13
27
-14
-4
34
-34
6
22
-18
-34
-35
-2
11
-27
-4
-17
-34
14
-8
-8
-8
25
-12
25
27
6
10
-3
-18
33
7
16
20
35
-16
-30
-21
-36
-25
7
5
29
20
15
-11
1
6
-14
8
0
23
23
21
16
-23
-11
9
-18
1
-1
31
31
-5
32
5
20
22
33
-16
47
37
12
2
Mathematics
4
-8
Reading Comp.
1
-8
-1
20
0
-14
-32
16
-32
-15
-11
-9
-68
7
4
-15
The first column of Table 4 shows a different kind of finding. In this column, the average
standard score on all five tests for boys and for girls is compared, country by country. Thus, in
Belgium on the average of all five tests the boys fell 0.32 standard deviation units above the girls.*
An examination of this column, headed “Average B-G”, shows the extent of male superiority in
a pooled average performance, country by country. There is only one country in which
the girls surpassed the boys in total performance, the United States. Differences between the
two sexes were small in Sweden and Scotland. Largest differences were in Poland, Germany and
Belgium. These results provide some clue as to the comparability of educational opportunity and
motivation for the two sexes in different countries. On average, over all countries and tests,
the boys fall about a fifth of a standard deviation above the girls.
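
The summary figures in this paragraph can be checked against the “Average B-G” column of Table 4. The short sketch below is only an arithmetic check; it assumes that the twelve values in that column pair with the countries in the order in which both are printed, and the variable names are hypothetical.

```python
from statistics import mean

# "Average B-G" column of Table 4, in hundredths of a standard deviation.
# Pairing values with countries in printed order is an assumption.
average_b_minus_g = {
    "Belgium": 32, "England": 22, "Finland": 24, "France": 15,
    "Germany": 32, "Israel": 18, "Poland": 37, "Scotland": 14,
    "Sweden": 6, "Switzerland": 24, "U.S.A.": -11, "Yugoslavia": 17,
}

overall = mean(average_b_minus_g.values())
girls_ahead = [c for c, d in average_b_minus_g.items() if d < 0]
print(round(overall), girls_ahead)  # 19 ['U.S.A.']
```

The unweighted mean of the twelve entries is about 19 hundredths, matching the tabled average and the "about a fifth of a standard deviation" quoted in the text, and the United States is the only country with a negative entry.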
An examination of results for the different tests shows that girls perform best, relatively speaking, on the reading test, and least well on the test of science. This pattern is a universal one,
appearing in each one of the 12 countries. We appear to have here a universal and quite stable
sex characteristic. There is also a small, but rather consistent tendency for the girls to do relatively
better on the non-verbal test (10 countries) and the mathematics test (11 countries). Differences
in the geography test were small and inconsistent. All of these differences in relative performance on specific tests appear, of course, after adjusting for the 0.19 standard deviation difference
in average performance of boys and girls.
Achievement in relation to parental education or occupation
An attempt was made to get information on parental education or father’s occupation for the
children in each country. However, there was very real difficulty in getting comparable data for
different countries. Pressures and sensitivities
differ from country to country, so that in some
it is possible to get information about education and in others it is possible to get information
about occupation, but it is rarely possible to get both. Furthermore, the differences in educational
structure in different countries make it difficult to establish classifications
that will be comparable
from country to country. However, the operation was carried out as well as could be done, and
some comparisons are presented in Tables 5-10.
* The basic unit is the average standard deviation of boys and girls combined, averaged for 11 countries.
TABLE 5
Percent of Pupils with Fathers at Different
Levels of Education or Occupation
B
Level of father’s educ.
Belgium
Elementary only
Some secondary
Secondary completed
Some college
College
England
45
32
14
*
t
completed
0
College
33
44
15
*
*
completed
B
Germany
Father’s occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & managerial
*
77
11
7
24
55
19
37
27
14
.
74
,*
9
78
14
l
l
l
*
+
l
.
l
Too few to justify computation
I
R
21
60
14
*
22
34
30
11
l
l
*
0
10
43
27
9
10
Y
Sweden
L
t
76
12
7
Israel
25
37
9
9
16
S
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
Y
France
50
26
15
5
+
Yugoslavia
5
58
21
9
4
9
S
4
l
78
55
23
11
c
78
l
10
16
*
+
G
S
Scotland
U.S.A.
Switz.
Germany
22
30
18
*
23
26
42
10
5
15
I
Israel
R
8
L
S
Scotland
Switz.
8
45
27
49
30
14
21
35
17
10
l
l
9
l
17
of means.
Table 5 shows the percent of cases with fathers at different levels of education or occupation.
It is clear from this table that the different national samples were not comparable with respect
to distribution of education or occupation of fathers. What is not clear is the extent to which
these sample differences reflect similar differences in the total national population and the
extent to which they reflect biases in the specific sample tested in that country. Thus the Scottish
sample showed twice as many unskilled and semi-skilled workers as the German sample, two-and-a-half
times as many as the Swiss, and five times as many as the sample from Israel. How
shall we understand this? Examination of the sampling procedure brings out that the Israeli
sample was limited to that segment of the population who were of European origin, that the
Swiss sample was limited to Geneva, while several fee-paying schools were excluded from the
Scottish sample. Thus, in part at least the differences between nationalities appear to reflect
differences in sampling. However, it is also probably true that differences reflect in part actual
national differences, especially in the amount of schooling. Thus, the differences in educational
level of parents in Sweden and in Yugoslavia are certainly at least in part a reflection of the
past educational level prevailing in the two countries. And to come from a family in which the
father has completed secondary education certainly signifies a less outstanding experience in the
United States than in most European countries, where such a level of education is still the exception
rather than a typical event. However, in education also the figures suggest that the sample may
be non-representative of the total population in some countries. Thus, a Polish sample in
which 40% of fathers have completed secondary education hardly seems representative of the
total Polish population in the age range 40-50.
Comparisons of national sub-groups in which the level of parental education or occupation is
uniform from country to country are almost certainly more meaningful than comparisons of total
national groups, though they have some limitations as pointed out in the preceding paragraphs.
Table 6 shows comparative data for the non-verbal aptitude test. Average score is quite clearly
related to average level of fathers’ education, and to a somewhat lesser degree to fathers’
occupation. Some fairly substantial differences between countries remain, when comparisons
are restricted to those of a common level of parental education or occupation. However, it should
be noted that the countries that show up least favorably in such a comparison, as of those with
some high school education, are countries like Sweden and the United States in which almost
all parents had received at least that much education - i.e., where education had been most
nearly universal and non-selective.
TABLE 6
Non-Verbal Test Averages by Level of
Father’s Education or Occupation
B
Level of father’s educ.
Elementary only
Some secondary
Secondary completed
Some college
College completed
France
Poland
-6
23
50
-
33
96
115
- 24
-6
-
0
23
23
40
-34
-14
9
-
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
0
I
-60
-33
18
69
49
B
Father’s occupation
Y
England
G
Elementary only
Some secondary
Secondary completed
Some college
College completed
0
Belgium
Sweden
-93
-59
-12
23
- 26
24
-
S
- 26
16
-
1
-
-65
7
L
Yugoslavia
U.S.A.
-52
-
-25
-19
3
Y
R
S
12
S
-76
- 14
24
-
G
Germany
Israel
Scotland
Switz.
Germany
- 1
-8
15
24
59
- 3
36
48
74
79
-12
28
59
58
-
29
41
77
53
- 28
-34
-18
-13
39
-92
-70
-63
-25
I
Israel
4
33
55
50
86
R
L
S
Scotland
0
41
37
-
Switz.
24
60
39
53
29
TABLE 7
Mathematics Test Averages by Level of
Father’s Education or Occupation
B
Level
of father’s
educ.
Belgium
England
23
56
88
-
-58
38
43
-
Elementary only
Some secondary
Secondary completed
Some college
College completed
0
France
11
52
-
5
10
55
-
College
-
completed
-64
-8
-48
occupation
0
Y
Sweden
I
R
L
U.S.A.
-37
IO
-
-107
-124
22
33
51
-
-31
13
-
56
-
S
Israel
Scotland
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
15
11
26
61
-7
25
34
62
-74
-31
-7
15
Prof. & Managerial
87
71
-
Switz.
-148
-86
-43
-9
-70
-30
-143
- 79
- 59
-
-43
8
-
G
Germany
Yugoslavia
S
-12
19
32
-
B
Father’s
S
46
70
69
71
G
Elementary only
Some secondary
Secondary completed
Some college
Y
Poland
I
6
R
L
S
Germany
Israel
Scotland
77
52
72
-
-7
-14
30
3
-22
12
37
49
-71
-28
-23
-
44
61
38
-
88
42
55
-
36
Switz.
TABLE 8
Reading Test Averages by Level of Father’s
Education or Occupation
B
Level
of father’s
educ.
Elementary only
Some secondary
Secondary completed
Some college
College completed
Belgium
England
-47
-25
-12
-
France
-53
-25
-
-
-20
53
91
-
College
completed
-3
39
20
-61
-39
-11
-
S
Sweden
-13
12
17
18
I
R
L
occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
30
Germany
26
22
57
90
96
Israel
-11
21
28
55
67
-65
4
38
-45
22
-
40
S
-
-
-
12
-
4
BOYS
Father’s
Yugoslavia
-96
-
-37
-19
2
7
U.S.A.
-38
-4
-
-30
24
-48
-
-
Y
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
0
-80
1
Switz.
Germany
-13
21
58
78
-
42
28
38
63
19
7
32
47
86
-8
51
6
-
G
Scotland
-66
-16
I
Israel
-20
19
35
56
62
R
57
L
Scotland
-2
48
41
-
S
Switz.
7
49
53
51
TABLE 9
Geography Test Averages by Level of
Father’s Education or Occupation
B
Level
of father’s
educ.
Elementary only
Some secondary
Secondary completed
Some college
College completed
Belgium
England
-33
-2
18
-
-44
31
52
-
College
completed
-60
-58
- 21
-
-68
-17
-23
-
Father’s
occupation
Germany
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
0
-25
-
-
-14
-
50
15
56
71
97
95
-
2
12
42
-
-51
-1
-113
31
-
-40
-3
39
74
TABLE
32
-92
-53
-40
-
-46
7
-
Germany
52
50
69
53
-
S
G
Switz.
5
0
-
L
-72
-34
-47
-
R
Yugoslavia
-132
-
S
Scotland
Israel
61
59
83
88
139
I
U.S.A.
Sweden
45
63
70
Y
S
14
8
-
-22
18
25
-
B
Y
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
0
France
I
Israel
23
25
64
35
94
-18
R
L
S
Scotland
2
41
54
75
76
Switz.
-53
-22
-18
-
8
46
32
46
-
TABLE 10
Science Test Averages by Level of
Father’s Education or Occupation
B
Level
of father’s
educ.
Elementary only
Some secondary
Secondary completed
Some college
College
completed
Belgium
England
-19
18
12
-
16
95
99
-
-
-70
-57
-14
-
-35
-36
16
15
B
Father’s
occupation
Unskilled, semi-skilled
Skilled, farmer
Clerical, sales
Sub-professional
Prof. & Managerial
Germany
0
Israel
S
-
-
Yugoslavia
4
-
6
50
69
-
70
-
-
-41
-
-71
I
R
L
-38
-34
-16
8
15
9
-
70
-73
-80
-56
-15
- 19
5
-
22
-
G
Switz.
-44
-7
23
-
S
S
Scotland
U.S.A.
Sweden
56
66
60
-
-66
-38
-44
Y
Y
Poland
G
Elementary only
Some secondary
Secondary completed
Some college
College completed
0
France
I
R
L
S
Germany
Israel
Scotland
Switz.
-42
-5
-42
87
61
96
90
1
45
57
73
5
32
58
50
40
17
42
-
25
30
26
25
102
87
-
40
54
1
32
-6
-
-62
-25
-35
-
9
-
-41
2
31
Tables 7-10 show results for the mathematics, reading, geography and science tests respectively.
The relative performance
of different countries, and of boys and girls, differs on the different
tests, reflecting the national and sex differences in patterns of achievement previously discussed
in connection with Tables 3 and 4. However, the patterns of achievement in relation to education
or occupation of father are much the same from test to test. A crude pooling of results from
different tests and countries yields the following results:
                              Boys   Girls
Father’s education
  Elementary only              -36    -58
  Some secondary                -2    -27
  Secondary completed           36      1
  Some college                  36     12
  College completed             41     13
Father’s occupation
  Unskilled, semi-skilled       14    -10
  Skilled, farmer               28     18
  Clerical, sales               51     25
  Sub-professional              66     36
  Professional                  79     50
There is a difference of roughly three-fourths of a standard deviation in average score between
the lowest educational category and the highest, and a range of about two-thirds between extreme
occupational groupings. Though the gradient seems to be a little less steep for girls than for
boys, the general pattern is quite similar for the two sexes. The gradient is common to most of
the countries studied, but there are one or two exceptions. Poland shows relatively little
difference associated with level of father’s education, Switzerland relatively little associated with
father’s occupation. One can only speculate on whether this represents some peculiarity in the
specific sample or whether school achievement in these countries is less related to parental
education or occupation.
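The size of these gradients can be checked directly from the pooled figures. The following is a minimal Python sketch, assuming the pooled tabulation is read column-wise (boys first, girls second within each classification); the dictionary keys and the function name `gradient` are illustrative and do not come from the report.

```python
# Pooled averages in hundredths of a standard deviation, listed from the lowest
# to the highest category. ASSUMPTION: the figures are read column-wise,
# boys first and girls second within each classification.
pooled = {
    ("education", "boys"): [-36, -2, 36, 36, 41],
    ("education", "girls"): [-58, -27, 1, 12, 13],
    ("occupation", "boys"): [14, 28, 51, 66, 79],
    ("occupation", "girls"): [-10, 18, 25, 36, 50],
}


def gradient(scores):
    """Spread between the highest and the lowest category, in standard deviations."""
    return (scores[-1] - scores[0]) / 100.0


gradients = {key: gradient(values) for key, values in pooled.items()}
for (classification, sex), spread in gradients.items():
    print(f"{classification:<10} {sex:<5} {spread:.2f} SD")
```

Under that reading, the spreads for education fall near three-fourths of a standard deviation and those for occupation near two-thirds, with the girls' spread slightly smaller in both cases.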
Additional tabulations were carried out by size of community in which the pupil resided.
Communities were classified into those of under 2,000, those of 2,000 to 20,000, and those of over
20,000 inhabitants. However, there were certain ambiguities in this coding from country to country.
It was not entirely clear whether place of residence meant the community in which the
school was located, or the immediate community in which the pupil had his home. Thus, in some
countries, farm children living in quite rural areas were apparently coded as coming from
communities of two to twenty thousand because that is where they attended school. There was
also no systematic attempt to have the rural areas in the sample be representative of all rural
areas in the country and the urban areas be representative of all urban areas. Thus, in the United
States the primarily rural area was from a rather prosperous mid-western farming area, whereas
the two urban communities were primarily rather undistinguished industrial centers.
The average result over all tests is summarized in Table 11. In general, though with some exceptions, those in very small communities did slightly less well (by about one-fifth of a standard
deviation, on the average) than those in larger communities, but no differences were found between the two categories of larger communities. A comparison of the different tests, as averaged
over all countries, suggests that the differences associated with size of community are greatest
in the case of mathematics and reading, least for the non-verbal test. Results for the separate
tests are shown in Table 12.
TABLE 11
Average Score by Size of Community
Pooled Results on All Tests
B
Under
2000
0
Y
2000 - 20 000
for
S
G
Over
20 000
England
-15
Finland
France
Germany
Poland
Scotland
Sweden
U.S.A.
Yugoslavia
31
21
-4
-44
- 34
- 94
-22
56
42
43
2
- 20
-37
-
4
-11
58
58
-8
-13
- 33
- 30
Crude Average
- 20
8
4
Under
2000
I
2000
R
L
- 20 000
S
Over
20 000
-
-32
- 20
4
2
- 23
- 33
-48
-6
-99
-51
18
11
6
2
-18
-15
-50
-11
-
-34
-13
-14
- 24
22
26
-9
- 29
- 26
-58
No data on size of community available for Belgium.
Different system of categories used for Israel.
TABLE 12
Average Score by Size of Community for Each Test
Based on Pooled Data from All Countries
                         BOYS                          GIRLS
               Under   2000 -     Over       Under   2000 -     Over
                2000   20 000   20 000        2000   20 000   20 000
Non-Verbal       -17       -2      -17         -26      -16      -20
Mathematics      -44        0       -8         -54      -20      -13
Reading          -27       -3        4         -31        3        0
Geography        -22        5       -3         -28      -11      -14
Science           15       42       27         -31      -25      -15
Item difficulties: resemblance between countries
In addition to analyzing scores on tests and sub-tests, it was also possible to study responses to
specific items country by country. The original tabulations showed the frequency with which each
wrong option to an item was selected as well as the frequency of correct response. However,
most of the analyses to be reported here deal only with the correct responses. These are studied
from two points of view. First, correlations are presented showing the degree of consistency of
item difficulty from country to country. Secondly, certain special groups of items are examined to
throw more light on certain elements of content that are especially easy or difficult in different
countries.
Tables 13-16 show the correlations of item difficulty among eleven countries (excluding Yugoslavia).
The correlations are over the population of items. A high correlation signifies that the
same items are difficult and the same ones easy for the pair of countries in question.
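A correlation "over the population of items" treats each item as one observation and pairs its difficulty in one country with its difficulty in another. The sketch below illustrates the idea in Python with a plain Pearson coefficient; the percent-correct figures are hypothetical, and the report does not state which difficulty index or correlation formula was actually used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL percent-correct figures for the same six items in two countries.
country_a = [82, 64, 45, 71, 30, 55]
country_b = [78, 60, 50, 69, 25, 58]

r = pearson(country_a, country_b)
print(f"item-difficulty correlation: {r:.2f}")
```

A value near 1 means the two countries rank the items in much the same order of difficulty, whatever their overall level of performance.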
The first thing that impresses one as one scans Tables 13-16 is the generally substantial
correlations across countries. The average correlation is .87 for mathematics, .87 for reading,
.68 for geography, and .72 for science. The high correlations for mathematics and reading are
especially impressive. A difficult item is a difficult item in these two tests, regardless of the
school system in which the pupil has been educated or the language in which his schooling has
been couched. The reading test is particularly noteworthy, because the differences between
countries are small both in level of average score and in the relative difficulty of different items.
TABLE 13
Correlations* of Item Difficulties Between Countries
Mathematics Test
                     1    2    3    4    5    6    7    8    9   10   11
 1. Belgium          -   86   92   95   92   90   84   84   90   97   80
 2. England         86    -   89   78   93   88   69   98   92   89   90
 3. Finland         92   89    -   90   95   96   76   89   90   95   92
 4. France          95   78   90    -   85   86   75   76   84   91   75
 5. Germany         92   93   95   85    -   93   78   94   95   95   91
 6. Israel          90   88   96   86   93    -   75   87   86   93   89
 7. Poland          84   69   76   75   78   75    -   71   76   83   60
 8. Scotland        84   98   89   76   94   87   71    -   93   88   92
 9. Sweden          90   92   90   84   95   86   76   93    -   92   91
10. Switzerland     97   89   95   91   95   93   83   88   92    -   86
11. U.S.A.          80   90   92   75   91   89   60   92   91   86    -
* Average of correlation for boys and for girls.
TABLE 14
Correlations* of Item Difficulties Between Countries
Reading Test
                     1    2    3    4    5    6    7    8    9   10   11
 1. Belgium          -   88   89   98   83   87   87   85   92   96   89
 2. England         88    -   88   86   86   91   82   98   92   85   96
 3. Finland         89   88    -   84   88   84   86   85   91   85   89
 4. France          98   86   84    -   80   84   85   83   91   94   86
 5. Germany         83   86   88   80    -   84   81   87   89   81   88
 6. Israel          87   91   84   84   84    -   88   90   85   83   90
 7. Poland          87   82   86   85   81   88    -   82   82   82   86
 8. Scotland        85   98   85   83   87   90   82    -   88   84   94
 9. Sweden          92   92   91   91   89   85   82   88    -   89   91
10. Switzerland     96   85   85   94   81   83   82   84   89    -   85
11. U.S.A.          89   96   89   86   88   90   86   94   91   85    -
* Average of correlation for boys and for girls.
TABLE 15
Correlations* of Item Difficulties Between Countries
Geography Test
                     1    2    3    4    5    6    7    8    9   10   11
 1. Belgium          -   63   69   93   67   62   56   70   67   90   68
 2. England         63    -   67   61   74   54   26   95   82   60   89
 3. Finland         69   67    -   67   84   77   55   74   82   77   72
 4. France          93   61   67    -   64   55   54   67   65   86   70
 5. Germany         67   74   84   64    -   84   54   77   80   76   74
 6. Israel          62   54   77   55   84    -   66   64   71   62   58
 7. Poland          56   26   55   54   54   66    -   35   40   49   31
 8. Scotland        70   95   74   67   77   64   35    -   85   65   88
 9. Sweden          67   82   82   65   80   71   40   85    -   70   84
10. Switzerland     90   60   77   86   76   62   49   65   70    -   66
11. U.S.A.          68   89   72   70   74   58   31   88   84   66    -
* Average of correlation for boys and for girls.
TABLE 16
Correlations* of Item Difficulties Between Countries
Science Test
                     1    2    3    4    5    6    7    8    9   10   11
 1. Belgium          -   73   73   94   89   84   46   68   70   90   72
 2. England         73    -   70   68   78   84   37   95   72   75   83
 3. Finland         73   70    -   70   78   81   64   65   79   77   75
 4. France          94   68   70    -   84   79   39   63   67   84   63
 5. Germany         89   78   78   84    -   88   48   72   78   88   80
 6. Israel          84   84   81   79   88    -   51   82   88   79   84
 7. Poland          46   37   64   39   48   51    -   37   57   47   53
 8. Scotland        68   95   65   63   72   82   37    -   71   68   80
 9. Sweden          70   72   79   67   78   88   57   71    -   64   79
10. Switzerland     90   75   77   84   88   79   47   68   64    -   78
11. U.S.A.          72   83   75   63   80   84   53   80   79   78    -
* Average of correlation for boys and for girls.
The correlations in Tables 13-16 appear to show a certain amount of clustering. In order to bring
out the structure more clearly, a factor analysis of the correlation tables was carried out. The
rotated factor loadings (by varimax rotation) are shown in Tables 17-20. Countries have been
rearranged in the tables to bring out most clearly the clusters.
TABLE 17
Mathematics Test
Loadings of Rotated Factors
                          Factor
                   1      2      3      4      5
Belgium           96    -18     17     07    -11
France            90    -09     27     00    -14
Switzerland       98    -10     12     04    -01
Poland            80    -38     06     02     10
Finland           97     08     12    -11     06
Israel            95     05     08    -21     02
Germany           98     00    -08    -02     07
Sweden            96     03    -09     17     06
England           94     08    -26    -02    -12
Scotland          94     09    -29     04    -03
U.S.A.            91     37    -05     06     11
% of variance   87.8    3.2    2.8    0.9    0.7
TABLE 18
Reading Test
Loadings of Rotated Factors
                          Factor
                   1      2      3      4      5
Belgium           96    -25     05    -01     04
France            94    -31    -02    -01     08
Switzerland       93    -26    -03     05    -10
Poland            90     00     10    -26     05
Israel            93     09    -10    -19    -02
Germany           90     17     12     01    -12
Finland           93     07     23     00     05
Sweden            95    -01     11     16     04
U.S.A.            96     13    -06     05     09
England           96     16    -14     12     06
Scotland          95     18    -22     05    -07
% of variance   87.9    3.1    1.6    1.4    0.5
TABLE 19
Geography Test
Loadings of Rotated Factors
                          Factor
                   1      2      3      4      5
Belgium           81    -11    -53    -04    -05
France            79    -06    -53     02     09
Switzerland       83    -05    -38     27    -13
Poland            50    -60    -17    -03     04
Israel            78    -48     16     05    -06
Germany           89    -20     14     12    -18
Finland           88    -19     08     22     06
Scotland          92     16     10    -28    -02
England           89     29     13    -25    -03
U.S.A.            89     25     04    -08     15
% of variance   62.0    7.8    7.4    2.6    0.8
TABLE 20
Science Test
Loadings of Rotated Factors
                          Factor
                   1      2      3      4      5
Belgium           92     25    -13    -14    -14
France            87     29    -17    -09    -18
Switzerland       91     11     00    -32     06
Germany           94     13    -06    -03     11
Israel            95    -02    -02     21     04
Poland            57     07     48     05    -04
Finland           86     06     33     00     06
Sweden            86    -01     16     37     04
U.S.A.            87    -21     12     05     12
England           89    -40    -12    -06     01
Scotland          85    -44    -09     04    -09
% of variance   75.3    5.3    4.1    2.9    0.9
The most striking feature of Tables 17-20 is the large proportion of variance accounted for by
the first factor. This can be thought of as a general factor of difficulty determined by the content
of the item and independent of country. Loadings on later factors, corresponding to sub-groups
of countries, are quite small and account for only a minor fraction of the variance. The later factors
seem to be bi-polar factors in most instances, discriminating one sub-group of countries from
another.
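The "% of variance" rows can be related to the loadings themselves. The sketch below assumes, as is conventional for orthogonal factors of a correlation matrix, that a factor's share of the variance is the mean of its squared loadings over the countries analysed; the report does not state its formula, so this is an interpretation. The loadings are the first-factor values for the science test as read from Table 20 (with the garbled entries "a7" read as 87), and the function name is illustrative.

```python
# First-factor loadings for the science test, in hundredths, as read from Table 20.
# ASSUMPTION: the two entries printed as "a7" are read as 87.
factor1_science = [92, 87, 91, 94, 95, 57, 86, 86, 87, 89, 85]


def percent_of_variance(loadings_in_hundredths):
    """Mean squared loading, expressed as a percentage of total variance."""
    loadings = [value / 100.0 for value in loadings_in_hundredths]
    return 100.0 * sum(l * l for l in loadings) / len(loadings)


pct = percent_of_variance(factor1_science)
print(f"first factor accounts for about {pct:.1f}% of the variance")
```

The result comes out close to the figure printed under Factor 1 in Table 20, which supports reading the first factor as a general difficulty factor shared by all countries.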
In mathematics, Factor 2 involves primarily a difference between Poland and the USA, while Factor
3 involves a difference between the French-speaking and the English-speaking countries. The other
factors are of no consequence.
The only factor beyond the first that seems to amount to anything in the reading test is Factor 2,
which again discriminates French-speaking from English-speaking countries.
Factors 2, 3 and possibly 4 appear to amount to something in the case of the geography test.
Factor 2 contrasts a cluster composed of Israel, Poland, Germany and Finland with the
English-speaking countries. Factor 3 groups together the French-speaking countries. Factor 4 contrasts
England and Scotland with Switzerland and Finland.
On the science test, Factor 2 seems once again to be the French-speaking versus English-speaking
factor. Factor 3 links together Poland and Finland, while Factor 4 separates Switzerland from Sweden.
The language groupings represented in these factors after the first make sense at least in that
the test remains completely uniform for all countries using the same language. It is also quite
possible that educational patterns are more alike within language groups. A knowledgeable
person may see some rationale underlying the other groupings and polarities that is not apparent
to the author.
Item difficulty: special groups of items
Examination of item content suggests certain hypotheses concerning groups of items that can
be tested by examining item difficulty in different countries. In this section, several of these
hypotheses will be stated and examined. The basic data in terms of which these hypotheses are
examined are item difficulty deviation values. The procedures for deriving these values are
stated below.
1. The percent right on each item in each country was first transformed into a normal deviate,
using tables of the normal curve.
2. The average scaled value was obtained for each item over all countries, and the scaled value
in any one country was expressed as a deviation from this. This procedure brought all items to
a common base line, so that the deviation value for a given country had comparable meaning
from item to item.
3. The average deviation value over all items on a single test was computed for each country,
and this average deviation value was subtracted from the single item deviation values. This
procedure eliminated differences in average level of performance between countries, and left
a residual deviation value, expressed in standard-score units, that indicated how much harder or
easier that item was for that country than would be expected on the basis of average difficulty
of the item and average performance level in the country.
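The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' actual computation: the percent-right figures are hypothetical, the inverse normal function from the standard library stands in for "tables of the normal curve", and a positive residual is taken to mean that the item was relatively easier in that country.

```python
from statistics import NormalDist, mean


def residual_deviations(percent_right):
    """percent_right maps country -> list of percent-right values, one per item.

    Returns country -> list of residual deviation values in standard-score units.
    Percentages must lie strictly between 0 and 100.
    """
    inverse_normal = NormalDist().inv_cdf
    # Step 1: transform each percent right into a normal deviate.
    deviates = {c: [inverse_normal(p / 100.0) for p in ps]
                for c, ps in percent_right.items()}
    n_items = len(next(iter(deviates.values())))
    # Step 2: express each value as a deviation from the item's average over countries.
    item_means = [mean(deviates[c][i] for c in deviates) for i in range(n_items)]
    deviations = {c: [zs[i] - item_means[i] for i in range(n_items)]
                  for c, zs in deviates.items()}
    # Step 3: subtract each country's average deviation over all items.
    return {c: [d - mean(ds) for d in ds] for c, ds in deviations.items()}


# HYPOTHETICAL data: three countries, three items.
data = {"A": [80, 60, 40], "B": [70, 65, 20], "C": [90, 50, 45]}
residuals = residual_deviations(data)
for country, values in residuals.items():
    print(country, [round(100 * v) for v in values])  # hundredths of a standard deviation
```

By construction the residuals sum to zero over the items within each country and over the countries for each item, which is why they isolate relative rather than absolute difficulty.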
Hypothesis 1
Decimal fractions will be relatively easy in the French-speaking countries, and common fractions
in the English-speaking countries.
The mathematics test includes four items dealing directly with the manipulation or understanding
of decimal fractions and three dealing with common fractions. The residual difficulty indices for
these were averaged over the items and for boys and girls, and are shown below for eleven
countries. We show the average residual for decimal fraction items, the average residual for
common fraction items, and the difference between them.
               Decimals   Fractions   Dec.-Fract.
Belgium           18         -10          28
England          -46          10         -56
Finland           13         -10          23
France            30         -71         101
Germany           01          14         -13
Israel           -11          13         -24
Poland            08         -07          15
Scotland         -42          19         -61
Sweden            18         -03          21
Switzerland       10          03          07
U.S.A.            01          44         -43
From the tabulation, we see that for Belgian children the difference in residuals favors the decimal
items to the extent of about 28/100 of a standard deviation on our normal deviate scale, on the
average. By contrast, for English children, the fractions items are relatively easier by 56/100 of
a standard deviation. In general, our hypothesis is supported because the large differences
favoring fractions are all for the English-speaking countries, and all of the French-speaking
countries show a difference favoring decimals. However, the difference between France on the
one hand and Belgium and Switzerland on the other is also quite striking. French children of
this age appear to be especially weak on items dealing with fractions.
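The comparison by language group can be made explicit with a short calculation. The Python sketch below takes the decimal and fraction residuals from the tabulation above (read as three blocks of eleven figures in country order, which is an assumption about the layout) and averages the differences within the French-speaking and English-speaking groups; the grouping lists and the helper name are illustrative.

```python
countries = ["Belgium", "England", "Finland", "France", "Germany", "Israel",
             "Poland", "Scotland", "Sweden", "Switzerland", "U.S.A."]
# Average residuals in hundredths of a standard deviation, in country order.
decimals = [18, -46, 13, 30, 1, -11, 8, -42, 18, 10, 1]
fractions = [-10, 10, -10, -71, 14, 13, -7, 19, -3, 3, 44]

# Positive values favor decimals, negative values favor common fractions.
difference = {c: d - f for c, d, f in zip(countries, decimals, fractions)}

french_speaking = ["Belgium", "France", "Switzerland"]
english_speaking = ["England", "Scotland", "U.S.A."]


def group_mean(names):
    """Average decimal-minus-fraction difference for a group of countries."""
    return sum(difference[name] for name in names) / len(names)


print("French-speaking :", round(group_mean(french_speaking), 1))
print("English-speaking:", round(group_mean(english_speaking), 1))
```

The French-speaking average is positive and the English-speaking average negative, although the French-speaking figure is pulled upward mainly by France.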
The major differences which we find in these groups of items are explicable in terms of the
systems of weights and measures in the countries involved, and a corresponding curricular
emphasis. The English, Scots and Americans have so many measures that go by 3’s, 4’s, or 12’s
that they must spend instructional time on denominate numbers and the fractions that go with
them. The continental countries, relying almost entirely on a decimal metric system, can
concentrate on decimal fractions, and give other types of fractions only limited emphasis. The results
suggest that this is what has taken place.
Hypothesis 2
National differences will be greater in the relative difficulty of items dealing with the geography
of specific places than in geographical information relating to the world as a totality.
It was possible to identify in the geography test five items dealing with facts about specific
places and five others dealing with concepts of latitude, longitude, and time. Average residual
values were computed for each of these sets, and are shown below.
               Specific Places   Latitude & Longitude
Belgium             -16                   9
England             -31                  -5
Finland               3                  -4
France              -18                   6
Germany              10                  -5
Israel               50                  -9
Poland               82                  54
Scotland            -11                 -10
Sweden              -13                 -17
Switzerland         -12                 -11
U.S.A.              -45                  -7
In general, the hypothesis is supported by the data. The largest average residuals tend to relate
to specific place geography. Israel and Poland perform notably well in these items, and England
and the United States relatively poorly. The items on latitude and longitude show generally
smaller residuals, only Poland performing on these items somewhat better than would be
expected in the light of her performance on the total test. The results suggest fairly wide national
differences in emphasis on the teaching of specific geographic facts; at least they are learned to
different degrees.
Hypothesis 3
Reading test items will be easier for pupils of the country and language from which the passage
and items originally come than for other countries and especially those speaking a different
language.
The reading test was composed of five passages, two of which had originally been in French
(one from Belgium and one from France), two in German, and one in English (from the United
States). It seems plausible, at least, that these might be easier in the language in which they
had been written and for the national culture for which they had been designed. Therefore, the
residuals were computed separately for each passage for each of the English-speaking,
French-speaking and German-speaking countries. (In Belgium and Switzerland testing was limited to
French-speaking parts of the country.) Results are shown below.
               Passage No. 1   Passage No. 2   Passage No. 3   Passage No. 4   Passage No. 5
                 (Belgium)       (France)        (Germany)         (USA)         (Germany)
Belgium              00              07              05             -07             -06
England             -06             -04              01             -03              13
France               12              13             -07             -10             -09
Germany             -08             -19              04              10              12
Scotland            -11             -13              04              07              11
Switzerland          35             -09              04             -15             -10
U.S.A.              -13             -06              05              11              02
Examination of the table shows that the hypothesis is in general supported. More generally, the
passages of French language origin seem to be slightly easier for the French-speaking countries
and the passages of either English or German origin for the English- and German-speaking
countries. Thus, the precaution of choosing original passages from different language sources appears
to have been a sound one. However, even though the differences appear in a fairly consistent
pattern, they are generally of very small size. The task difficulty seems to transcend language in
considerable measure. Thus, once again, the universality of the reading task is affirmed.
Hypothesis 4
National groups show consistent differences in willingness to express certainty about a conclusion.
The second part of the science test consists of five statements, for each of which the pupils must
pick one of the five choices: Definitely True, Probably True, Impossible to Determine, Probably
False, and Certainly False. The keying of each item was based on the pooled judgment of several
faculty members at Teachers College, Columbia University, New York, where the test was constructed.
Our current interest is in the nature of the erroneous answers. Over the set of items, an answer
can be wrong in any one of these ways:
(1) An examinee can be too sure. That is, he can mark an item definitely true or false when
it is keyed only probably true or false, or he can choose one of the four other alternatives when
he should mark the item indeterminate.
(2) An examinee can be too cautious. That is, he can mark an item as probably true or false
or as indeterminate when he should have marked it definitely true or false, or he can mark the
item indeterminate when he should have marked some other choice.
(3) He can be grossly in error. That is, he can mark an item on the true side when he should
have marked it on the false side, or vice versa.
For the set of 5 items, there were 20 wrong response options. Of these, 6 were of Type 1, 6 of
Type 2, and 8 of Type 3. We have examined the results for the different countries to see what
proportion of the choices fell on each of these types of error each time the opportunity offered.
That is, we have divided the total number of errors of a given type by the total number of
opportunities to make that category of error. We have also determined the ratio of too sure
errors to too cautious errors, providing an index of readiness or reluctance to jump to
conclusions. The results are shown by sex for 10 national groups*.
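The two quantities described here, the rate of each type of error per opportunity and the ratio of too-sure to too-cautious rates, can be sketched in Python as below. The sketch assumes that the number of opportunities for a type of error is the number of wrong options of that type multiplied by the number of pupils; the report does not spell this out. The error counts are hypothetical, chosen only so that the rates resemble those reported for one group.

```python
# Number of wrong response options of each type over the five items (from the text).
OPTIONS = {"too_sure": 6, "too_cautious": 6, "gross": 8}


def error_rates(error_counts, n_pupils):
    """Percent of opportunities on which each type of error was made.

    ASSUMPTION: opportunities = wrong options of that type x number of pupils.
    """
    return {kind: 100.0 * error_counts[kind] / (OPTIONS[kind] * n_pupils)
            for kind in OPTIONS}


def sureness_index(rates):
    """Ratio of the too-sure rate to the too-cautious rate."""
    return rates["too_sure"] / rates["too_cautious"]


# HYPOTHETICAL error counts for a group of 1,000 pupils.
counts = {"too_sure": 1314, "too_cautious": 600, "gross": 864}
rates = error_rates(counts, n_pupils=1000)
index = sureness_index(rates)
print({kind: round(rate, 1) for kind, rate in rates.items()}, round(index, 2))
```

An index above 1 indicates a group that errs more often by over-sureness than by over-caution, and an index below 1 the reverse.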
                      % Too Sure   % Too Cautious   % Gross Error   Index of Sureness
Belgium     Boys         21.9           10.0             10.8              2.19
            Girls        23.7           14.2             10.6              1.68
England     Boys         13.8           15.0             11.3              0.92
            Girls        15.8           17.3             11.6              0.92
Finland     Boys         13.9           17.2             14.0              0.81
            Girls        15.6           20.5             14.3              0.76
France      Boys         22.3           11.1              9.5              2.00
            Girls        26.7           12.2              7.3              2.19
Germany     Boys         24.0            9.8              7.1              2.46
            Girls        21.1           12.0              9.7              1.75
Israel      Boys         16.9           14.7              9.8              1.15
            Girls        16.3           16.8              9.8              0.97
Poland      Boys         19.9           19.7             14.3              1.01
            Girls        20.6           22.4             14.1              0.92
Scotland    Boys         14.6           14.2             10.2              1.03
            Girls        14.6           18.4             10.9              0.80
Sweden      Boys         14.5           22.6              6.7              0.64
            Girls        13.4           25.9              7.5              0.52
U.S.A.      Boys         14.0           18.7             12.7              0.75
            Girls        12.7           21.5             12.6              0.59
Clearly, there are substantial differences between national groups in their tendency to be too
assured or too cautious on these items. When Belgian, French or German children made errors,
they were much more likely to be errors of over-sureness. When Finnish, Swedish or American
pupils made errors, they were likely to be errors of over-caution. We are dealing, of course, with
a very limited sample of items, and the differences we find may be peculiar to this item sample
rather than a more general characteristic of the national educational system or cultural
background. However, the differences are certainly suggestive, and open up a line of inquiry that
might be pursued with profit using a larger and more varied sample of tasks.
* Frequencies of choice of the separate response options were not available for Switzerland and Yugoslavia.
Concluding statement
The preceding pages have shown some of the kinds of comparisons that are possible when
academic achievement is examined at the same time in a number of countries with the same set of
tests. These results are of some value in themselves for the international differences and
similarities that were found and described. They are certainly also of interest as exhibiting a model
of international cooperation and of empirical comparative education that may be further developed in the future.
Fernand Hotyat
INTERNATIONAL AND NATIONAL INTERPRETATIONS
FROM BELGIAN DATA
Following Professor Thorndike’s overall view of the results, we publish two accounts in which the
authors have used the data for the interpretation mainly of national performances. In the first of
these, M. Hotyat concerns himself particularly with some striking analogies between patterns
of achievement in French-speaking countries and contrasts these with patterns of achievement
on the same groups of test items in English-speaking countries. M. Hotyat also uses the international
results to throw light on certain aspects of the Belgian situation, and in the second half of his article
analyses the Belgian results according to school type, sex, regularity of pupils’ promotion through
the grades, and nationality of parents. In the course of his paper he suggests a number of interesting
methods which can be used to interpret data of this kind.
The Belgian share in the research was carried out by a team from the Centre de Travaux de
l’Institut Supérieur de Pédagogie du Hainaut (Mme. Delepine, MM. Hotyat, Lowyck, Rousseaux,
Manouvrier). In accordance with the decision taken by the international research group, the
analyses made by the Belgian participants aim at profiles of achievement which provide fruitful
opportunities for interpretation. In order to do this, we have constructed profiles in such a
manner that in each country the average of the scores on the five tests is reduced to zero. The
following statistical approach was adopted with this in mind:
1. We calculated for each test the mean and standard deviation of all the test scores, and
expressed the mean for each country as a standard score*.
2. We established for each country the mean of its 5 standard scores.
3. We subtracted the number thus obtained from each of the 5 scores so that, for each country,
the mean of the scores equals zero.
Here is an example based on the standard scores from one of the countries:
Non-verbal               +28
Mathematics              -30
Reading Comprehension    +25
Science                    0
Geography                 -8
Their mean is: (28 - 30 + 25 + 0 - 8) / 5 = +3
By deducting 3 from each of the standard scores we obtain scores with a mean of 0. We thus
arrive at the following adjusted scores: Non-verbal, +25; mathematics, -33; reading comprehension,
+22; science, -3; geography, -11.
This process finally enables us to establish a profile of the scores obtained by each sample in
relation to its own mean.
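The three steps and the worked example above can be sketched in a few lines of Python. This is only an illustration: the function name `centre_profile` and the dictionary layout are assumptions made here, not part of the original study; the input figures are the standard scores quoted in the example.

```python
def centre_profile(standard_scores):
    """Subtract a country's mean standard score from each of its test scores.

    `standard_scores` maps test name -> standard score (hundredths of an S.D.).
    The returned profile has a mean of zero.
    """
    country_mean = sum(standard_scores.values()) / len(standard_scores)
    return {test: score - country_mean for test, score in standard_scores.items()}


# Standard scores from the worked example in the text.
example = {
    "Non-verbal": 28,
    "Mathematics": -30,
    "Reading Comprehension": 25,
    "Science": 0,
    "Geography": -8,
}

profile = centre_profile(example)
for test, adjusted in profile.items():
    print(f"{test}: {adjusted:+.0f}")
```

Because every score is shifted by the same amount, the shape of the profile is preserved; only its level is moved so that comparisons between countries no longer depend on overall achievement.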
For example, Table 1 and Figure 1 show the results from the groups in the three French-speaking countries which took part in the study. The samples were taken from Lisieux in France, from Geneva in Switzerland, and from a central eastern region of Hainaut in Belgium.
* The standard scores express the positive or negative value of a test score in hundredths of a standard deviation from the mean.
TABLE 1
                        Non-verbal   Maths   Reading Comprehension   Geography   Science
France (Lisieux)           -20        +31            -6                 +17        -20
Switzerland (Geneva)        +9        +21            +4                  +9        -42
Belgium (Hainaut)           +8        +45           -21                 -17        -12
FIGURE 1. Profiles of the Lisieux, Geneva and Hainaut samples on the five tests (Non-verbal, Maths, Reading Comprehension, Geography, Science).
These three profiles show some interesting analogies, in particular a peak in mathematics and troughs in reading comprehension and in science which are common to all three.
A detailed analysis of the profiles obtainable from the scores of each of the twelve participating countries would be beyond the scope of this article, and we shall therefore limit ourselves in Table 2 and Figure 2 to comparing the profiles of the means of the three French-speaking samples with those obtained similarly from the three English-speaking groups in England, Scotland and the USA.
The way in which the two profiles appear to complement each other is striking. (It should be remembered that the two language groups which are here being compared represent only half of the national samples which took part in the research.)
TABLE 2
                            Non-verbal   Maths   Reading Comprehension   Geography   Science
Fr.-speaking countries          -1        +32            -8                  +3        -25
Engl.-speaking countries       +23        -34           +15                 -21        +17
FIGURE 2. Profiles of the results from the English-speaking samples and from the French-speaking samples on the five tests (Non-verbal, Maths, Reading Comprehension, Geography, Science).
Of course, even if the research had related to true national samples, it would still be rash to draw any conclusions from these findings about the quality of the respective school systems; other factors would have to be considered, such as curricula, time-tables, and the form of the tests. But we have now developed a method which will enable us to make comparisons based on real national samples when subsequent research has been carried out.
Relative difficulty of items
We asked ourselves, too, to what extent the order of difficulty of items in each of the subjects tested was the same from country to country when their results were compared.
1. In our preliminary analysis we applied Yule's* formula to the percentage of success on each item obtained by each national group. The correlation coefficients are spread out in the following way (220 coefficients for 11 samples and 4 subjects): in mathematics, from 0.75 to 1, with a mean of 0.94; in reading comprehension, from 0.60 to 1, with a mean of 0.89; in geography, from 0 to 0.92, with a mean of 0.68; in science, from 0.05 to 0.95, with a mean of 0.54.
Assuming that the order of difficulty of the items within each test was equivalent, these data would mean that there existed a closer relationship between the ways in which mathematics was taught in the various countries than there was in the teaching of geography and science.
2. The table of percentages of items correctly answered enables us to make comparisons which are of definite educational interest. Let us take, for example, on the one hand the mean percentages for three items involving fractions, and on the other those for four items involving decimal numbers, and compare the results obtained in Belgium with those from the Anglo-Saxon countries, where the system of weights and measures requires early and intensive teaching of fractions. The percentages are set out in Table 3.
* If a = the number of items for which the results are higher than the mean in the two samples E1 and E2; if b = the number of items superior to the mean in E1 and inferior in E2; if c = the number of items inferior to the mean in E1 and superior in E2; and if d = the number of items inferior to the mean in both samples, we have Q = (ad - bc) / (ad + bc).
TABLE 3
                            Belgium   Anglo-Saxon countries
Fractions (mean %)             76              61
Decimal numbers (mean %)      78.5            75.5
All other things being equal, it would appear that pupils in the Anglo-Saxon countries are at a disadvantage because of the need to start learning fractions at an early age.
3. By converting the percentages of items passed into standard deviations from the various national means, we are able to establish national profiles of the relative difficulty of items in each of the tests, leaving out of account the absolute levels of national achievement.
These results, quantified in this way, are more precise and more flexible than those provided by the ranking of items according to the degree of success achieved on them, and they offer very interesting possibilities of analysis.
Thus, a comparison of these indices for a particular country enables us to examine whether the order of results corresponds closely to the hierarchy of aims set up by the authors of school curricula there, and, if this is not so, to study the teaching methods which are being employed with a view to making whatever improvements seem desirable.
Table 4, for example, shows the standard deviations relating to some of the mathematical items, obtained from the Belgian sample's scores:
TABLE 4
                                                               Boys      Girls
2 written calculations (whole numbers)                        -0.145    -0.16
2 written calculations (decimal numbers)                      +0.126    +0.02
4 calculations of areas and volumes                           +0.25     +0.355
1 problem regarding meeting point of moving objects           -0.21     -0.32
3 items requiring pupil to extract information from a table   -0.28     -0.20
Is there any close correspondence between these relative percentages of success (which are independent of the absolute scores in mathematics) and the order of priority in which we would place these topics? The authorities would be faced with an educational problem to solve if they thought, for example, that accuracy in calculating with whole numbers had not been given a sufficiently prominent ranking in the list of aims set for the teaching of mathematics.
A comparison of these deviations or of the profiles with those deriving from results obtained in the other countries could also lead to interesting observations.
Thus, the mean deviations (in standard deviation units) for Belgium and Country X on three parts of the geography test are shown in Table 5:
TABLE 5
                                            Belgium   Country X
12 items of information                       -1.2       +19
Interpretation of maps (16 items)             -2.1      -11.8
Relating facts to generalised statements      +15       -11.3
Taking into account the age of the pupils who are being tested, do these relative results correspond to the hierarchy of aims which the education authorities assign to geography teaching? If not, there are indications that we should study the way in which the curriculum, teaching methods and procedures have been conceived in a country where, it seems, the results achieved reveal a more satisfactory balance.
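The coefficient applied in item 1 of this section can be illustrated with a short sketch. It assumes that the formula in the footnote is Yule's coefficient of association, Q = (ad - bc) / (ad + bc), computed from a fourfold classification of items as above or below each sample's mean percentage of success. The function name, the treatment of items lying exactly on the mean (counted here as not above it), and the item percentages are all hypothetical and are not taken from the study.

```python
from statistics import mean


def yule_q(pct_sample_1, pct_sample_2):
    """Yule's Q for two lists of per-item success percentages (same item order)."""
    mean_1, mean_2 = mean(pct_sample_1), mean(pct_sample_2)
    a = b = c = d = 0
    for p1, p2 in zip(pct_sample_1, pct_sample_2):
        above_1, above_2 = p1 > mean_1, p2 > mean_2
        if above_1 and above_2:
            a += 1          # above the mean in both samples
        elif above_1:
            b += 1          # above in sample 1, below in sample 2
        elif above_2:
            c += 1          # below in sample 1, above in sample 2
        else:
            d += 1          # below the mean in both samples
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Q is undefined when ad + bc = 0")
    return (a * d - b * c) / denominator


# Hypothetical success percentages on six items in two national samples.
sample_e1 = [90, 80, 70, 40, 30, 20]
sample_e2 = [85, 75, 35, 65, 25, 15]
q = yule_q(sample_e1, sample_e2)
print(round(q, 2))
```

A value near 1 indicates that items easy in one sample tend to be easy in the other; a value near 0 indicates little agreement in the order of difficulty.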
Results according to sex
If we first convert the boys' scores to a standard-score scale with a mean of 0, the mean scores of the group of Belgian girls, expressed as standard scores on that scale, give the following profile for purposes of comparison (the vertical scale of the diagram runs from -0.60 to +0.60):
Non-verbal               -0.115
Maths                    -0.23
Reading Comprehension    +0.015
Geography                -0.24
Science                  -0.54
The table of frequencies of mean plus and minus scores for girls from eleven countries
confirms this profile when compared with the boys’ results.
Non-verbal
Belgium
England
Finland
France
Germany
Israel
Poland
Scotland
Sweden
Switzerland
U.S.A.
Total
Maths
TABLE 6
Reading
Comprehension
Geography
Science
+
+
+
+
8/l 1
-I-
-
+
+
-
-
+
+
+
-
+
+
+
9111
5/l 1
9/l 1
-
-
ll/ll
(Table 6)
No. of negative
results
5
4
5
3
5
5
5
3
2
4
1
42155
Over the whole range of subjects tested we are able to conclude that, with the exception of reading comprehension, the girls' results are clearly inferior to those of the boys and the gap is particularly marked in science. Various hypotheses could be advanced to explain this situation, which seems all the more surprising considering that common programmes of instruction are followed by both boys and girls in most of the countries concerned. Is the relative weakness of the girls' results to be explained by educational factors - for example, teaching which is biased towards literary subjects - or rather by the way in which the tests themselves have been conceived, so that they call especially for the types of intellectual functioning which come more naturally to boys? Since the profiles of scores vary very greatly from country to country, we are faced with the hypothesis (which needs to be verified experimentally) that a combination of social and educational factors possibly lies at the bottom of these differences*.
The most striking feature is the general inferiority of the girls' average scores in science. It would be very valuable if research could be done on this problem. The question has real social significance at a time when more room is being found for sciences in educational programmes, and when the technical aspects of life require that schoolchildren receive a basic training in which the experimental sciences play an important part.
Correlations between different types of test
The non-verbal test which we used contains three types of test item requiring the following kinds of reasoning:
- choosing a fourth figure which has the same relationship to the third as the first two have to each other;
- extending a series of numbers or of letters in accordance with a pre-set pattern;
- picking out from a group the odd figure which does not conform to the principle governing the others.
Comparisons have been made of the correlations between this test and those parts of the mathematics and geography tests which depend on reasoning rather than information for an answer. The correlations obtained (Bravais-Pearson formula) are as follows:
Mathematics:
Section with items involving information and calculation                   0.26
Section with items involving arithmetical reasoning                        0.50
Section with items involving geometrical reasoning                         0.30
Geography:
Section with items involving information                                   0.23
Section with items involving interpretation of maps                        0.17
Section with items requiring facts to be related to general statements     0.64
Comparison of these results does not confirm the existence of a close link between the types of reasoning brought into play in the non-verbal test and those required by the scholastic tests. By way of contrast, Table 7 shows the correlations between the three sections of the mathematics test.
TABLE 7
                                          Maths. II                  Maths. III
                                          (arithmetical reasoning)   (geometrical reasoning)
Maths. I (information and calculations)            0.72                       0.81
Maths. II (arithmetical reasoning)                  -                         0.72
To sum up, in a subject like mathematics the intellectual activity involved depends on a whole range of information and skills.
The dispersion of test scores
We have taken as the coefficient of variation the ratio of the standard deviation to the mean. Table 8 shows these coefficients in the form of percentages for each of the tests and each of the national samples.
* This hypothesis has already been found to have some foundation in experimental studies in the field of arithmetic. Thus Toivo Vahervuo studied the performance of 30 classes in the 4th school year in Helsinki, all using the same textbook, and discovered that the results of the boys' classes were superior to those of girls' classes, and significantly so at the 2 % level; but he also observed a slight superiority of the girls in mixed classes, although not at a significant level.
TABLE 8
ABCDEFGHIIK
Countries
Means
Tests
Non-verbal
26
30
36
31
41
37
48
Mathematics
Reading comprehension
Geography
Science
19
19
21
21
29
24
26
20
27
24
26
24
30
29
36
40
46
28.9
26
26
28
26
28
24.1
24
20
24
23
28
30
28
33
35
33
36
28
33
33
29
40
36
34
36
36
35
36
40
35.2
Means
23.7
23.7
26.5
27.5
28.2
28.5
30
31
33.5
33.7
37.5
43
35
34
36
36.9
Two striking observations emerge from this table:
1. The coefficients of variation vary quite noticeably according to subject. In the scholastic tests, they are almost constantly lowest in reading comprehension and highest in science. Two chief hypotheses suggest themselves to explain these differences. The first to occur to us is that the tests are not all equally discriminating. However, it is also quite possible that the teaching methods at present in use lead to children producing less homogeneous results in natural sciences than in other subjects. Experimental research is needed to determine to what extent the second factor influences the scores.
2. These coefficients vary considerably from country to country (the difference between the extremes is over 50 %). Assuming that the same care had been taken when drawing the samples in every case, this would mean - not taking into account the absolute value of the means - that the national systems produced far from homogeneous results. Some questions of profound importance emerge from this: Is the dispersion of results satisfactory in relation to the general educational aims in a given country? If not, what sort of curricula and methods have led in other school systems to a variability which could be considered more desirable?
INTERPRETATIONS OF DIFFERENCES IN PERFORMANCE WITHIN THE BELGIAN SAMPLE
1. General situation
The reader should be reminded (see description of Belgian sample, p. 10) that the following analysis does not concern all Belgian schools but is only a regional study. Table 9 gives the mean scores obtained by the Belgian sample (scores in hundredths of a standard deviation, after the general mean has been converted to 0).
TABLE 9
          Non-verbal   Maths   Reading Comprehension   Geography   Science
Total        +10        +45            -27                -15        -25
Boys         +25        +58            -23                 +3         +2
Girls         -5        +32            -31                -34        -53
These results are particularly high in mathematics and particularly low in reading comprehension.
Going beyond this, we can compare the scores on the various sections of the tests. In mathematics, the Belgian scores, which are barely satisfactory for arithmetical calculations, are particularly high for items which require reasoning and the use of concepts. At the other extreme, in reading comprehension they are especially low for literary, historical and economic texts. One wonders whether our present teaching methods demand sufficient individual participation from pupils when texts are being studied, especially in literature and history. It would be worthwhile to study this problem, since silent reading - which is true reading according to our official primary school syllabus - plays a highly important cultural role.
In geography, Belgian scores are average on questions which involve the direct use of verbal information, but they are rather weak on items calling for the interpretation of maps.
An evaluation of the results in science seems inappropriate, since curricula in this subject differ too much to permit valid comparisons to be made between the mean scores of different countries.
2. Differences according to types of school
Table 10 and Figure 4 below show pupils' achievements expressed in fractions of standard deviations according to types of school and sex. Since in this case we are concerned with comparisons within one country, the figures relate to the norms of the total Belgian sample.
TABLE
Type
of school
Non-verbal
A. Vocational
schools - boys
B. Vocational
schools - girls
C. General secondary
schools-boys
D. General secondary
schools - girls
10
Reading
Comprehension
Maths
Science
- 26
-32
-44
- 24
-44
-55
- 34
-55
-48
+52
+55
+52
+55
+57
+24
Jr34
+11
+22
-6
FIGURE
Non-verbal
C. Mean-general,
boys
Geography
Maths
-3
4
Reading
Compr.
Geography
Science
-
D. Mean-general,
girls
Mean
of total
sample
Mean
_
total
A. Mean-vocational,
boys
-
B. Mean-vocational,
girls
.
-1
of
sample
(I
a. Assuming that scores on the non-verbal test can be taken as a valid criterion of the pupils' capabilities, these data lead us to the following conclusions:
- in the sample of boys in vocational schools, the mean performance is rather low in mathematics, very low in reading comprehension, but high in science, when compared with the result of the non-verbal test;
- among girls in vocational schools the mean is higher than might have been expected in reading comprehension, but particularly low in mathematics and geography;
- among boys in general secondary schools, the mean is high in all subjects;
- among girls in general secondary schools, the mean is high in mathematics, a little low in reading comprehension and very low in science.
b. Both for boys and for girls the means in general secondary schools are superior to those for either sex in vocational schools.
If the scores for both sexes are combined, the differences are, respectively: 71 % of a standard deviation on the non-verbal test, 93 % in mathematics, 72 % in reading comprehension, 81 % in geography and 57 % in science.
It would, of course, be quite erroneous to judge the relative merits of these two types of schools merely by comparing these results. In the first place the study covered only subjects to which more importance is attached in general secondary schools than in vocational schools, and in addition to this, the gap between mean scores achieved on the non-verbal test entitles us to assume that the differences already existed at the time the pupils started their post-primary education*.
The means give us only a rough idea of the differences, but the following table (Table 11) of quartile distributions provides us with a more differentiated picture. (These distributions have been obtained by finding the score point at the quartiles for the total group and then calculating the percentage falling within the different quartile ranges in the general and vocational schools separately.)
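The procedure described in parentheses above can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented for the example, the function name is hypothetical, a score lying exactly on a cut point is assigned to the lower range, and the quartile points are obtained with Python's `statistics.quantiles` (default "exclusive" method), which may differ slightly from the graphical or tabular method the authors used.

```python
from statistics import quantiles


def quartile_distribution(total_scores, group_scores):
    """Percentage of a group's scores falling in each quartile range of the total group."""
    q1, q2, q3 = quantiles(total_scores, n=4)  # cut points taken from the total group
    counts = [0, 0, 0, 0]  # lowest, lower intermediate, upper intermediate, highest
    for score in group_scores:
        if score <= q1:
            counts[0] += 1
        elif score <= q2:
            counts[1] += 1
        elif score <= q3:
            counts[2] += 1
        else:
            counts[3] += 1
    return [100 * count / len(group_scores) for count in counts]


# Hypothetical raw scores, invented for illustration only.
vocational = [8, 10, 11, 12, 13, 14, 15, 17]
general = [13, 15, 16, 17, 18, 19, 20, 22]
total = vocational + general

vocational_pct = quartile_distribution(total, vocational)
general_pct = quartile_distribution(total, general)
print("Vocational:", vocational_pct)
print("General:   ", general_pct)
```

If the two types of school had identical score distributions, each would show about 25 % in every range; departures from that figure show where the superior and inferior scorers are concentrated.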
TABLE 11
Reading
Comprehension
Maths
Gfl”.
Lowest
VOC.
Gen.
Geography
VOC.
Gen.
Science
voc.
GC?“.
voc
Quartile
Lower Intermediate
Quartile
2
l.5
pq
:;‘::
pj
ii?
m
F
pq
Upper Intermediate
Quartile
Highest
Quartile
* Here, for example, are the mean percentages of success on the arithmetic test taken during entrance examinations to post-primary schools, based on samples of N=150+ in each type of school.
                                     Boys: General       Girls: General      Boys: Vocational   Girls: Vocational
                                     Secondary Schools   Secondary Schools   Schools            Schools
Whole and decimal numbers                74.7 %              74.7 %              70 %               62 %
Mental arithmetic                        59                  56.4                38.6               29.8
Fractions                                74                  73.1                54.1               50.6
Metric system                            74.3                73.4                56.9               54.1
Calculation of areas and volumes         54.5                50.7                35.5               25.8
Geometrical figures                      71.3                70.5                58.5               44.7
Problems                                 64.8                64                  53.2               37.4
From: Epreuves Analytiques d'Arithmétique (Publications de l'Institut Supérieur de Pédagogie du Hainaut, 1961).
The differences proceed, in short, from an unequal distribution of superior and inferior scorers between the different types of schools, the superior ones being more numerous in general secondary schools, and the inferior ones in vocational schools.
c. The dispersion of scores is wider in vocational schools. Below, in Table 12, we give the scores on the mathematics test in the two types of school:
TABLE 12
                            Mean    S. D.   Coefficient of variation (%)
Vocational schools          14.4    4.04              28
General Secondary Schools   18.4    3.59              19
The greater homogeneity of results in general secondary schools arises from the fact that they are very selective, whereas in the large vocational schools, the spreading of pupils over different parallel courses makes it possible for weaker pupils to stay on in classes leading to leaving certificates at lower levels of achievement.
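The coefficients in Table 12 follow directly from the definition given earlier (standard deviation expressed as a percentage of the mean). A minimal sketch, with a hypothetical function name, using the means and standard deviations printed in the table:

```python
def coefficient_of_variation(mean_score, standard_deviation):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * standard_deviation / mean_score


# Means and standard deviations on the mathematics test, as printed in Table 12.
cv_vocational = coefficient_of_variation(14.4, 4.04)
cv_general = coefficient_of_variation(18.4, 3.59)

print(f"Vocational schools:        {cv_vocational:.1f} %")
print(f"General secondary schools: {cv_general:.1f} %")
```

The first value comes out at about 28.1 and the second at about 19.5, which the table gives as 28 and 19. Dividing by the mean makes the two spreads comparable even though the two types of school differ by four points in average score.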
3. Differences according to sex
According to Table 10 and Figure 4 the boys' means are higher than those of the girls in general secondary as well as in vocational schools, except in vocational schools where reading comprehension is concerned.
Because of the small number of schools covered by our present enquiry, we are not entitled, however, to draw any general conclusions from the results obtained. It would be interesting to carry out research on a larger scale into this problem as it exists in Belgium in order to determine the causes for these differences, should they appear significant.
4. Differences according to regularity of promotion
The pupils included in the sample were distributed among the various grades as shown in Table 13:
TABLE 13
                               Vocational: boys   Vocational: girls   General: boys   General: girls
In grade appropriate to age           62                 84                114              97
Repeated one year                     78                 86                 59              53
Vocational schools clearly contain a higher number of 13- and 14-year-old pupils who have repeated a grade, but this repetition has usually already taken place before entry into the vocational schools, as their intake includes numbers of pupils who have already repeated classes at primary school or who have been diverted to technical schools after failure in general secondary schools.
In order to study the extent to which repetition of a grade corresponds to lower levels of ability or educational performance, we present in Table 14 figures for the degree of significance of the difference between the means for each of the tests and for each type of school.
52
3
E
p’
VOCATIONAL:
TEST
SCHOOL
MSWl
Non-verbal
Mathematics
SOYS
Regularly promoted
Repeaters
Regularly promoted
Repeaters
32.24
29.65
15.69
14.37
s. D.
13.7
10.5
t
GIRLS
Regularly promoted
Repeaters
17.80
16.97
Mean
30
s. D.
4.6
1.06
14.32
Science
Regularly promoted
Repeaters
Regularly
26.93
10.5
14.77
3.5
BOYS
13.17
4.1
18.75
16.90
3.8
GENERAL:
12.80
GIRLS
4.1
19.94
9.37
2.7
8.42
3.05
10
2.9
3.8
8.29
2.5
16.84
2.6
23.07
4.3
2.15
37.72
11.3
31.67
9.6
18.82
14.77
16.32
13.17
17.80
4.4
19.44
3.8
16.97
4.05
18.47
4
17.74
3.8
12.98
3.5
9.11
2.8
7.86
2.6
4.17
2.05
4.73
15.44
3.9
11.50
2.7
7.8
4.78
9.30
2.95
t
3.47
5.67
4.83
6.55
S. D.
7.2
1
12.23
Meall
5.73
33.93
3.6
t
11.3
3.24
0.67
13.83
S. D.
2.75
2.63
Repeaters
8
promoted
4.4
Meall
43.57
3.7
Geography
GENERAL:
1.88
1.85
4.8
4.3
t
10.8
1.22
3.6
Reading
Comprehension
VOCATIONAL:
HISTORY
4.16
According to this table of t values, the differences are significant in general secondary schools at the 1 % level for all tests, except for reading comprehension in the case of girls, where the level is 5 %; on the other hand, the t values are particularly low for boys in vocational schools and the only significant one is that for science. The same trend is present, but to a less marked extent, in girls' vocational schools, where the degree of significance of the difference is higher in reading comprehension and in science.
These observations confirm that general secondary schools are more severely selective where the subjects examined by the international tests are concerned.
5. Differences according to nationality of parents
The sample of boys in vocational schools included a particularly high percentage - more than 35 % - of children of foreign workers, mostly Italians. A recent study* has shown that these pupils are subject to a great deal of educational failure due to the absence of teaching methods which would help them to adapt progressively to our language and school system. It seemed to us interesting to examine, on the one hand, whether or not the presence of this group had caused a lowering of mean scores for boys in vocational schools, and on the other hand, to study the extent to which children of foreign parents, who had overcome the initial obstacles facing them in elementary school, still remained handicapped in one direction or another.
To this end, we divided the results of the 145 subjects into two sub-groups: children with foreign parents (N=52), and children of Belgian parents (N=93). We then calculated the means and standard deviations separately for each group and estimated the degree of significance of differences between the means.
Table 15 gives the results obtained.
TABLE 15
                               Means              Standard Deviations
                          Foreign   Belgian       Foreign   Belgian
                          parents   parents       parents   parents      Differences
Non-verbal                 31.1      30.2          12.3      11.9        not significant
Mathematics                14.8      15             5.28      4.17       not significant
Reading Comprehension      16.7      18.6           4.82      4.62       significant at 5 % level
Geography                  13.8      14.1           4.59      3.97       not significant
Science                     8.5       9.1           3.11      2.81       not significant
A study of Table 15 enables us to conclude:
a. that the mean obtained by children with foreign parents on the non-verbal test is slightly superior to that of Belgian children. There are probably two reasons which chiefly explain this. In the first place they have already overcome a more severe selection process than the Belgian children and it is therefore likely that their mental capacities are higher. In the second place, fewer of them have chosen to enter general secondary schools.
b. that in spite of this advantage, their mean scores on the scholastic tests are all inferior to those of Belgian children. These differences are not high, except in reading comprehension where they represent 40 % of a standard deviation and are significant at the 5 % level. This higher difference is explained by the fact that in silent reading knowledge of the language is the essential element, while it is only one of the factors coming into play in the other tests.
c. that the standard deviations are all higher in the group of children with foreign parents. This group is altogether more heterogeneous: while the best pupils remain at a level close to that of the best Belgian-born children, the weaker ones appear to fall further behind their Belgian counterparts.
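The significance statement for reading comprehension in conclusion b can be checked approximately from the figures quoted in the text (means 18.6 and 16.7, standard deviations 4.62 and 4.82, N=93 and N=52). The sketch below is an assumption-laden illustration: the report does not say which formula was used, so an unequal-variance (Welch-type) standard error is assumed here, and the function name is hypothetical.

```python
from math import sqrt


def t_for_difference_of_means(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t statistic for two independent means, using an unequal-variance standard error."""
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Reading comprehension: children of Belgian parents vs children of foreign parents.
t_reading = t_for_difference_of_means(18.6, 4.62, 93, 16.7, 4.82, 52)

# The same difference expressed as a fraction of the average of the two S.D.s.
difference_in_sd = (18.6 - 16.7) / ((4.62 + 4.82) / 2)

print(round(t_reading, 2), round(difference_in_sd, 2))
```

Under these assumptions t is roughly 2.3, which exceeds the two-tailed 5 % critical value of about 1.98 for samples of this size, and the difference is about 0.40 of a standard deviation, in line with the figures given in the text.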
* De Coster, S., and Derume: 'Retard pédagogique et situation sociale dans la région du Centre et du Borinage' - Institut de Sociologie, U. L. B., 1961.
Figures 5 and 6 below illustrate this situation: they present in diagrammatic form the two groups' results in reading comprehension and mathematics in terms of 5-point normalised scales.
FIGURES 5 and 6. Distributions of scores in mathematics and in reading comprehension for children with Belgian parents (B) and children with foreign parents (F), on 5-point normalised scales with divisions at -3/2 σ, -1/2 σ, +1/2 σ and +3/2 σ from the mean.
D. A. Pidgeon
A COMPARATIVE STUDY OF THE DISPERSIONS OF TEST SCORES
In the second of the two articles which serve to demonstrate the usefulness of international comparative studies in shedding further light on problems occurring in particular educational systems, Mr. Pidgeon compares the standard deviation on the five tests in the twelve countries and draws conclusions about the effect which "streaming" (the form of class organisation extensively practised in England and Scotland) has on the dispersion of test scores. The author discusses the different approach to teaching which "streaming" may encourage and which could account for the phenomenon he has noted.
Introduction
In an earlier study (Pidgeon, 1958) in which the performances of eleven-year-old children from Queensland, Australia, and from England and Wales were compared on tests of non-verbal ability, reading and arithmetic, it was noted that in each of the tests used, the standard deviation of test scores obtained by the English and Welsh children was considerably greater than that of the Australian sample. It was thought that this finding might, in part, reflect the effects of "streaming" as practised in England - it being sometimes alleged that one of the results of streaming is that the brighter children make more rapid progress than they would otherwise achieve, and also that the duller children, when assigned to a "C" stream, become more backward, partly as a result of poor morale, and partly because there is a tendency in some schools for the more experienced teachers to be put in charge of the abler streams. It was noted, however, that while there were larger proportions of high scorers on all tests in the English and Welsh sample, there were differences between the tests with respect to the low scorers; on the two arithmetic tests there were considerably fewer children in this category from the Queensland sample, while on the reading and non-verbal tests there were slightly more. It was cautiously concluded, therefore, that the results might be due to differences in the methods and approach employed in teaching the different subjects and not to any overall differences in the organisation of classes.
Since that study, however, other evidence has been reported which suggests that perhaps the larger variance of test scores may after all be due in part to the system of class organisation employed in England. An investigation (Lloyd and Pidgeon, 1961) in which a non-verbal test, standardised in England, was given to groups of African, European and Indian children in Natal, South Africa, revealed considerably lower standard deviations in that country. It was observed that the system of class organisation employed in Natal meant that "no child can proceed from one class to another unless he can pass the examination set for that class. Such a system inevitably has repercussions on the methods of teaching employed and the concern of a teacher in Natal is to get as many through the examination as possible, since too many failures might reflect on his efficiency. This leads to mass methods of teaching and to a complete lack of recognition of individual differences". It might be mentioned here that the very opposite occurs in England. Teachers are trained to recognise individual differences and to adjust their teaching accordingly, and indeed, the system of streaming is a further aid to this end. The system employed in Natal is used with all ethnic groups but "its effect is more pronounced with the African and Indian children in view of their larger school classes". The results reported reflected this, in that the standard deviations tended to be smaller with these groups, particularly with the African children.
A further study, particularly relevant to the question of streaming in England, is that of Daniels (1961). Daniels compared the performances of children in two large primary schools that streamed by ability, with those of children in two similar schools in the same type of neighbourhood that did not stream. Daniels stressed that the "non-streaming" in the latter schools was a "consistently thought-out policy" or, in other words, the teachers in these schools did not believe in streaming and in fact "felt it was educationally wrong to do so". His results revealed a consistent trend towards smaller dispersions of test scores in the un-streamed schools; the standard deviations in 22 of his 24 separate comparisons were smaller with un-streamed children, although only five of these reached statistical significance.
There would appear to be some evidence, then, that the dispersion of test scores in England tends to be rather larger than in other countries and that this larger dispersion may be associated with the method of class organisation employed. The number of studies providing this evidence is, however, fairly small, and it would need to be confirmed before any generalised conclusions could be drawn.
The present study
It seemed clear that, apart from its other main purpose, the international study sponsored by the Unesco Institute for Education, Hamburg, would also provide an excellent opportunity for making further comparisons of the dispersions of test scores in England and other countries. This article, therefore, is concerned with presenting such results as are relevant for this purpose and with discussing their implications.
A full description of the investigation has been given by Foshay in the opening chapter of this report. It is only necessary to note here that attempts were made to make the tests appropriate for each country, despite the necessity to translate into eight different languages. It can be reasonably claimed that this was successfully accomplished for, although a translated test is, of course, a different test, as Foshay points out there is no evidence that difficulties in translation "seriously influenced the national scores".
Since this was in the nature of a pilot study in this field, it had been agreed that it was unnecessary to procure a strictly random sample of the school population of the stated age in each
country, but that attempts should be made to obtain a representative
sample that was as typical
as possible of the total population at least with respect to mean test score and standard deviation.
The detailed descriptions of the sample tested in each country given by Foshay suggest that, with
a few exceptions, this had been achieved. It is, of course, particularly
important, if inferences
are to be drawn about the dispersions of test scores in different countries, to ensure that samples
fully representative
of the countries concerned are obtained. Further comment, therefore, will be
made on this point in the discussion of the results below.
Results
The obvious statistic for measuring the spread of scores on a test in any sample is the standard
deviation. Using the raw S. D. for each country on a particular test, comparisons can legitimately
be made between countries on that test provided it can be shown that there is no direct relationship existing between S. D. and mean. As an indication of how successful the tests were in
each country, in no case was the mean score either too high or too low to be seriously influenced
by a “ceiling”
or “floor” effect, and in only one test (science) was the rank order correlation
between S. D. and mean positive. Hence the standard deviation has been used for making comparisons in this study. Since, however, five different tests were used, each with a different number
of items, in order that comparisons could be made between tests, the raw S. D.s for each country
have been converted into a standard score. In Table 1, which gives the relevant figures, a high
positive value indicates a standard deviation well above the average for all countries on that
test, and a negative value a standard deviation below average for all countries.
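The conversion just described can be sketched in a few lines. The raw standard deviations used here are hypothetical figures chosen only for illustration (the report does not list them), the function name is illustrative, and the use of the population form of the standard deviation is an assumption.

```python
from statistics import mean, pstdev

def to_standard_scores(raw_sds):
    """Express each country's raw S.D. on one test as a standard score."""
    values = list(raw_sds.values())
    centre = mean(values)
    spread = pstdev(values)  # assumption: population form (divide by n)
    return {country: (sd - centre) / spread for country, sd in raw_sds.items()}

# Hypothetical raw S.D.s for one test; these are not figures from the report.
raw_sds = {"Country A": 6.0, "Country B": 8.0, "Country C": 7.0, "Country D": 7.0}

scores = to_standard_scores(raw_sds)
for country, z in scores.items():
    print(f"{country}: {z:+.2f}")
```

A positive value marks a country whose spread of scores is above the all-country average on that test, a negative value one whose spread is below it. Because every test is put on the same scale, the values can then be compared or averaged across tests despite the different numbers of items.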
Table 1
Standard Deviations, expressed in Standard Score Form, on Five Tests from Twelve Countries

Country        Non-Verbal   Mathematics   Reading   Geography   Science   Average
Belgium            0.43        -0.26       -0.85      -0.19      -1.34     -0.44
England            1.47         2.23        1.44       1.23       2.61      1.80
Finland            0.29        -0.38        0.48      -0.69      -0.49     -0.16
France             0.78        -0.06       -0.55       0.18      -0.87     -0.10
Germany            0.66         0.14        0.84       0.53       0.27      0.49
Israel             0.23        -0.18       -1.16      -0.47      -0.06     -0.33
Poland            -1.74        -1.21       -1.39      -2.37       0.08     -1.33
Scotland           0.20         1.27        1.21       1.21       0.99      0.97
Sweden             0.81        -0.66        0.11       0.76       0.18      0.24
Switzerland       -1.72        -1.68       -1.49      -0.49      -0.77     -1.23
U.S.A.            -0.06         0.14        0.89       1.03       0.23      0.45
Yugoslavia        -1.34         0.65        0.46      -0.72      -0.82     -0.35
It will be seen from Table 1 that, consistently
on every test, England has by far the largest
dispersion of test scores. To illustrate these figures graphically,
the average value for each
country on all five tests is depicted in Figure 1. (To obviate negative values 2.0 has been added
in each case.)
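As a small worked check, the averaging and the offset used for Figure 1 can be reproduced for one country. The five values are England's entries in Table 1; the variable names are illustrative only.

```python
# England's standard scores on the five tests, as listed in Table 1.
england = [1.47, 2.23, 1.44, 1.23, 2.61]

average = sum(england) / len(england)   # the "Average" entry for England
plotted = average + 2.0                 # offset applied in Figure 1 to avoid negative values

print(round(average, 2), round(plotted, 2))
```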
FIGURE 1
The Average Value of the Standard Deviation on Five Tests for each of Twelve Countries
(Chart with one value per country: B, E, Fi, Fr, G, I, P, Sco, Swe, Swi, U, Y; vertical scale to 4.0.)
Some comments on Table 1 and Figure 1 are clearly necessary. Firstly, it must be emphasised
that no significant
relationship
exists between the standard deviations
and the mean scores
obtained in each country. Secondly, the fact that the pupils tested in each country were not
strictly random samples must to some extent detract from the significance
to be attached to
these findings. Samples of schools selected by subjective judgment to be representative,
are
more likely than not to yield smaller standard deviations than random samples, owing to the
human tendency to under-represent
the very bad. Also in some instances a restriction
was
acknowledged.
The description
provided by Switzerland
of its sample clearly indicates that it
was hardly representative
of the whole country, since it was taken exclusively from a prosperous
middle-class town. In Israel the sample excluded recent immigrants from under-developed
areas;
it was also chosen from 8th grade pupils, that is, although the mean age was 13½, it contained
some pupils younger than 13 and older than 14. Both these factors clearly affected the dispersion
of test scores. In other countries, however, the sample was chosen by methods similar to that
employed in England, namely, the testing of all pupils falling within the stated age range attending
all types of secondary school in a selected area - the selection of this area being based on
other test evidence, which suggested that it was reasonably typical of the whole country both
as regards mean and spread of test scores. In Scotland, two areas were chosen - a city and
a county - but the sample was deficient in children from the professional
and skilled worker
classes, probably resulting in some restrictions of the dispersion of test scores.
With these reservations
the data from Table 1 must be viewed with caution. Nevertheless,
there
would seem to be fairly clear indications that the dispersion of test scores in England is large
compared with that in other countries, thus supporting the previous evidence cited. It would not
seem an idle occupation, therefore, to speculate upon the reasons for this.
Discussion
All countries concerned in this study, apart from England and Scotland, employ some variant of
what might be called the “grade placement” system. In such a system, children are assigned to
grades initially according to age, but subsequently according to their ability to assimilate successfully the work covered in the grade in which they were the previous year. It is possible for some
children to be “accelerated”,
that is, to miss a grade, and for others to be “retarded”,
that is, to
repeat a grade. Thus, at any given point in time, say after five years’ schooling, while the majority
of pupils will be found in Grade 6, some will be in Grade 7, and others in Grade 5 or possibly,
having “repeated”
twice, in Grade 4. The numbers of pupils accelerated or retarded will depend
upon the limits accepted as constituting a successful “pass” in the previous grade’s work, and
this, in turn, upon the standard of work demanded in each grade.
In England and Scotland, however, yearly promotion is primarily based on age, and if numbers
necessitate, pupils within one age group will be divided into separate classes. In both countries,
it is the general practice in such circumstances to “stream” these separate classes by ability and
attainment, even in the junior school i.e. between age 7 and 11. Owing to the relatively larger
number of small primary schools in Scotland, the proportion of children in streamed classes is
somewhat less.
These descriptions
do not do justice to the differences
that exist in the various countries; they
are probably sufficient, however, for the present purpose, since it is contended that it is not the
difference between the two systems that is important so much as the general aims and beliefs
of the teachers practising within the systems. It is argued that the major objective of the grade
class teacher is to ensure that as many of his pupils as possible complete the work of the grade
successfully.
His class is, however, heterogeneous
with regard to ability and it is not unreasonable to suppose that the brighter children will require much less effort from the teacher
than the duller ones. Hence, while it is possible that the brighter children within such a group will
not be unduly extended, the duller ones will presumably receive every encouragement
to achieve
the grade pass and the net result will be a tendency for achievement test scores, if not to cluster
closely around the mean, at least to be relatively unrepresented
at the extremes. In England,
however, there is no such general acceptance of a similar curriculum for all children in a particular class and, certainly in streamed schools, the work achieved by “A” stream children at any
given age will be considerably
more advanced than that expected of “C” stream children.
This introduces the notion of “expectancy”
or the standard of work expected by a teacher of
his pupils. What is expected will clearly be determined in the first place by any curriculum that
is defined, but secondly, it will be influenced by the philosophical
beliefs of the teacher. In
many countries employing the grade placement system, the curriculum for any grade will be
defined quite clearly even to the provision of “state” text books; in others a greater degree of
fluidity within a school or class may be found. In England and also Scotland individual schools
have a higher degree of autonomy and what is taught at any age will depend to a greater degree
upon what the head teacher or even the class teacher thinks is right, although external examinations even in the primary school control this to a certain extent. But in all countries what a teacher
expects from individual pupils in his class must also depend upon his own particular beliefs.
Different patterns of achievement would be obtained, for example, by two grade class teachers,
one of whom believed that all children in his class were perfectly capable of covering adequately
the work of the grade, given sufficient effort on his part with the duller ones, and the other
who believed that, since achievement was necessarily
limited by innate ability, there must be
some children in his class incapable of completing the year’s work successfully.
Such difference
in beliefs will also have an influence on the work of brighter children. The teacher who strives to
match attainment with ability will also be aware that children of high ability are capable of work
more advanced than that demanded by the grade syllabus, and although in the grade system this
matching will be achieved to some extent by jumping a grade at promotion time, it is clear it
will also influence what the teacher expects from the brighter pupils within his own class.
When considering the effects of these different beliefs upon the spread of achievement at any
given age, it would seem clear that the teacher who stresses innate individual differences to the
extent of regarding innate ability as the major factor in determining achievement, will expect and
hence tend to obtain a wider dispersion of attainment than the teacher who, although accepting
the fact of individual ability differences, nevertheless believes that the environmental influences
of the teaching situation play an equal if not more important part.
It is maintained that this belief in the relative importance of innate individual ability differences
is predominantly held in England and indeed has led to the general acceptance of the practice of
streaming. Burt (1959) has said “... it is plainly imperative that both teachers and local authorities
should take full account of such differences in their efforts to provide an education which
will (in the words of the Act)* be adapted to each child’s ‘ability and aptitude’”. To match attainment
to innate ability presupposes, of course, that the ability can be measured. Some popular
misunderstandings regarding attempts to do this have been described elsewhere (Pidgeon, 1961).
Burt himself has stressed many times the difficulty of ascertaining a child’s “I. Q.” accurately
(e.g., Burt, 1959), but nevertheless insists that, because of the wide dispersion of measured
intelligence at age 10 or 11, “some kind of provisional selection is absolutely essential: in my
view it should start much earlier, ... indeed, as soon as possible after a child has entered school”.
The acceptance of this view led not only to the separation of children of greater and less ability
into different types of secondary school (Board of Education, 1926) but also to the practice of
streaming in junior schools (Board of Education, 1937). However, and Burt himself stresses this,
if children are to be separated for differential instruction from an early age, it is essential that
the child is “free to swim from one stream to another” as his “capacities develop or decline”
(Burt, 1959) and also, it must be added, to allow for inaccuracies in the original measurement. But,
as Daniels (1961a) has shown, there is far less fluidity in streaming than teachers themselves
imagine, or, as Vernon (1955) has demonstrated, is necessary.
The concern here is not whether streaming is, in itself, good or bad, but with its effect on the
dispersion of achievement. It is argued that the expectancy of “A” stream teachers for relatively
high attainment helps in itself to lead to this result being obtained, just as the expectancy of “C”
stream teachers for relatively low attainment helps to produce this result. Also, of course, the
belief that attainment can and should be matched to ability, made easier perhaps in homogeneous
ability classes, while it would tend to result in the stretching of brighter children, would not have
this effect with duller ones, since, for many at least, the limit of their capacity would apparently
have been reached. The effect this has on increasing the dispersion of achievement scores might,
perhaps, be enhanced where teachers use tests of “intelligence”
and attainment to help measure
the success of their efforts, for ordinary regression effects will tend to make “dull” “C” stream
children, subsequently tested for attainments, appear to be “working up to capacity” and bright
“A” stream children appear to have room for further improvement.
It should perhaps be added
here, that the more successfully
children’s attainments are matched to their ability, the more
successful will any initial streaming appear to be - the “self-fulfilling
prophecy”
described by
Daniels (1959).
There would appear, therefore, to be a number of factors affecting the dispersion of achievement
at any given age. In the first place, the general aim of the grade class teacher may tend to result
in a relatively smaller dispersion. Perhaps exerting a greater influence, however, is the belief
a teacher may have that innate ability is of paramount importance in determining the level of
attainment to be expected from a child. Streaming by ability, which is viewed as an administrative
device resulting from the acceptance of this belief, will merely tend to enhance its effects. When
all these factors act in the same direction the effect will clearly be greatest and this is what happens
in England. Here, it is claimed, the aims and, more especially, the beliefs of most teachers
and educational administrators lead them to expect wide differences in performance, and this is
what is therefore achieved. Where, on the other hand, the grade placement system operates and
especially where, within such a system, teachers do not attempt to measure innate ability and
therefore do not expect their pupils’ attainments to be matched to it, then the dispersion of
achievement will be much less.
* The 1944 Education Act.
While possible explanations
for the relatively wide dispersion of test scores in England can be
offered, clearly a value judgment is involved in answering the question as to whether this is,
educationally,
good or bad. Some further relevant evidence can be given from the present study,
however, by examining the proportions
of pupils obtaining relatively
high and low scores.
Obviously, the average levels of performance will influence these proportions,
but the contrast
between England and other countries
is shown in Table 2, which gives the percentage
of
pupils scoring outside the limits of raw score, approximating
to plus and minus one standard
deviation.
TABLE 2
Percentage of Pupils scoring beyond ± 1 S. D. on each of Five Tests

                     Below -1 S. D.                 Above +1 S. D.
Test              Average of      England        Average of      England
                  12 countries                   12 countries
Non-Verbal           20.7           15.9            20.3           28.7
Mathematics          20.9           39.1            22.7           15.5
Reading              19.5           25.1            18.2           20.8
Geography            21.9           34.7            18.3           11.4
Science              18.4           24.2            16.0           18.1
It will be observed from Table 2 that, in three of the five tests (non-verbal,
reading and science)
England has a larger percentage than the average scoring above plus one standard deviation,
but that in all four attainment tests, it also has a larger percentage than the average scoring below
minus one standard deviation. Some concern might be felt for the 39.1 % of pupils obtaining
low scores on the mathematics test.
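The two statements drawn from Table 2 can be checked mechanically. The sketch below simply re-enters the percentages from the table and lists the tests on which England exceeds the twelve-country average at each end of the scale; the variable names are illustrative only.

```python
# Percentages from Table 2, as (average of 12 countries, England).
below = {
    "Non-Verbal": (20.7, 15.9),
    "Mathematics": (20.9, 39.1),
    "Reading": (19.5, 25.1),
    "Geography": (21.9, 34.7),
    "Science": (18.4, 24.2),
}
above = {
    "Non-Verbal": (20.3, 28.7),
    "Mathematics": (22.7, 15.5),
    "Reading": (18.2, 20.8),
    "Geography": (18.3, 11.4),
    "Science": (16.0, 18.1),
}

# Tests on which England has a larger percentage than the average.
more_high_scorers = [test for test, (avg, eng) in above.items() if eng > avg]
more_low_scorers = [test for test, (avg, eng) in below.items() if eng > avg]

print("Above +1 S.D.:", more_high_scorers)
print("Below -1 S.D.:", more_low_scorers)
```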
Bibliographical references
Board of Education, 1926 - Report of the Consultative Committee on the Education of Adolescent Children, H. M. S. O., London.
Board of Education, 1937 - Handbook of Suggestions for Teachers, H. M. S. O., London.
Burt, C., 1959 - General Ability and Special Aptitudes, Educational Research, Vol. I, No. 2.
Daniels, J. C., 1959 - Some effects of sex segregation and streaming on the intellectual and scholastic development of junior school children. Unpublished thesis, Nottingham University.
Daniels, J. C., 1961a - The effects of streaming in the Primary School, I - What Teachers Believe, Brit. Jour. Educ. Psych. 31, 69-78.
Daniels, J. C., 1961b - The effects of streaming in the Primary School, II - A Comparison of Streamed and Unstreamed Schools, Brit. Jour. Educ. Psych. 31, 119-127.
Lloyd, F. and Pidgeon, D. A., 1961 - An Investigation into the Effects of Coaching on Non-Verbal Test Material with European, Indian and African Children, Brit. Jour. Educ. Psych. 31, 145-151.
Pidgeon, D. A., 1958 - A Comparative Study of Basic Attainments, Educational Research, Vol. I, No. 1.
Pidgeon, D. A., 1961 - The interpretation of test scores, Educational Research, Vol. IV, 33-43.
David A. Walker
AN ANALYSIS OF THE REACTIONS OF SCOTTISH TEACHERS AND PUPILS TO ITEMS
IN THE GEOGRAPHY, MATHEMATICS AND SCIENCE TESTS
To establish the relevance of test items to pupils’ learning opportunities is important both from
the point of view of measuring achievement and from that of maintaining the goodwill of teachers
whose pupils undergo the tests. In the following article Dr. Walker shows how this need can be
fulfilled and seeks further to establish the extent to which in- and out-of-school learning
opportunities, ability and other factors appeared from the international tests to be determinants
of success.
When the same test, or series of tests, is administered to pupils of different countries with different
educational systems, it is unlikely that the items will be equally acceptable or equally useful in
all of the countries concerned. The present inquiry was intended in the first place to assess the
reactions of the teachers of the classes concerned to the items used in the tests of geography,
mathematics and science, and secondly to estimate, if possible, the contributions made by stress
on the topics in the curriculum and by the environment to the accuracy with which pupils answered
the questions.
Rating the items
To obtain the opinion of the teachers concerned, copies of the test booklets were sent to each
school taking part in the original investigation.
The following
instructions were issued with the
booklets.
“Instructions for rating test items
The attached test in Geography (Mathematics, Science) was taken by a group of 13-year-old pupils
in your school last year. You are asked to rate the items of the test in two ways:
(a) in relation to the curriculum used for these pupils;
(b) in relation to the environment of the pupils.
(1) Read each test item. Consider the degree to which the knowledge and skills required by the
question are typically covered in the instruction of pupils in classes like yours. Give a rating
on the following three point scale.
Rating 1 Stressed: well covered in class and in homework (if any).
Rating 2 Included but not stressed; touched on but not dealt with extensively, intensively or repeatedly.
Rating 3 Not included.
Enter the rating 1, 2 or 3 in the answer space provided in the test booklet.
(2) Consider next the extent to which pupils have opportunity in the home and in the community
to use or encounter knowledge or skills such as those involved in the test question. Decide whether
there is considerable, some, or little exposure to such experiences. Use the following rating scale.
Rating A Considerable exposure
Rating B Some exposure
Rating C Little or no exposure
Enter the rating A, B or C in the answer space provided in the test booklet.
(3) Each item will thus have a double rating, e.g., 1A, 2C.
(4) Indicate clearly on the front of the test booklet the school and course to which the ratings
refer. At the time of the tests these courses were described as five-year, three-year with one
foreign language, three-year with no foreign language and three-year modified. In some schools
the courses for boys and girls differ and separate booklets will be required for each sex. The
booklets should be labelled accordingly.
(5) Any comment which teachers may wish to make on the tests will be welcomed by the Council.”
All schools taking part in the original investigation
co-operated
in this supplementary
inquiry. A
very small part of the data had to be rejected because some teachers did not give a definite rating,
e.g., an item was rated “1 or 2” or “B/C”.
Of the 864 curriculum ratings given to the 32 items of the geography test 28 % were 1, 44 %
were 2, and 28 % were 3. In the opinion of the Scottish teachers this test had only a moderate
relation to the topics covered in school work. The environment ratings A, B and C occurred in
the proportions 9 %, 34 % and 57 %, i.e., the teachers felt that only a minor amount of help came
from the environment.
In mathematics
the position was similar though in this subject there was a higher proportion
given to rating 1. Of the 1,066 curriculum ratings for the 26 items, 42 % were 1, 34 % were 2 and
24 % were 3. The help expected from the environment was even less in this subject, the rating A
occurring in only 6 % of the replies, B in 28 % and C in 65 %.
In science, the curriculum ratings were similar to those in the other subjects. Of 730 ratings on
21 items, 30 % were 1, 30 % were 2 and 40 % were 3. Greater help was thought to be available
from the environment in this subject, the rating A being given in 20 % of the cases, B in 40 % and
C in 40 %.
The ratings differed greatly from item to item in each test as is shown in the list given in the
Appendix.
It must not be assumed from the figures quoted above or from the data in the Appendix that the
topics with adverse curriculum ratings are not covered in the school courses. In many cases they
occur at a later stage in the curriculum. The pupils tested were mostly in the second year of a
course which is of three to six years’ duration and different schools have different schemes for
covering the work.
Agreement
among the ratings
It would have been possible to calculate a mean curriculum rating and standard error of the
mean for all teachers, using the values 1, 2 and 3 for the three ratings. This might, however, have
given a misleading picture. For example, an item rated 1 by 20 teachers, 2 by none, and 3 by 20
teachers would then be given a mean rating of 2, which was not actually given by any teacher,
and a standard error of about 0.16, which is relatively small because of the number of teachers
involved. For this reason the table in the Appendix gives the most frequently occurring rating
for each item and not the mean.
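The example in the preceding paragraph can be reproduced directly. The sketch below builds the 40 ratings described (20 teachers rating 1, none rating 2, 20 rating 3) and computes the mean, its standard error and the most frequent ratings; the choice of the population form of the standard deviation is an assumption, and either form rounds to the figure quoted.

```python
from math import sqrt
from statistics import mean, multimode, pstdev

# The ratings described in the text: 20 teachers rate 1, none rate 2, 20 rate 3.
ratings = [1] * 20 + [3] * 20

mean_rating = mean(ratings)
# Standard error of the mean; assumption: population S.D. divided by sqrt(n).
standard_error = pstdev(ratings) / sqrt(len(ratings))
# The most frequently occurring rating(s); here two ratings tie.
most_frequent = multimode(ratings)

print(mean_rating, round(standard_error, 2), most_frequent)
```

The mean of 2 corresponds to a rating no teacher gave, and the small standard error conceals a complete split of opinion, which is the reason given for reporting the most frequent rating instead.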
One possible factor causing disagreement
among ratings is the variation in course to suit the
ability and sex of the pupils. In any assessment of the extent of agreement among teachers in
rating the items it is therefore advisable to deal separately with different types of course. The
main types in Scotland are (a) the five-year course for the more gifted pupil, (b) the three-year
course with one foreign language for the pupil a little above the average, (c) the three-year
course with no foreign language for the average pupil and (d) the modified course for the pupil
within the lowest 10 to 20 % of the ability range. In some schools it is also necessary to differentiate between courses for boys and those for girls.
Within each of these types of course we can assess the extent of agreement among the teachers’
ratings by calculating their variance. If the curriculum ratings are valued at 1, 2 and 3, as given
by the teacher, the variance of the distribution will be zero when all teachers agree, 2/3 when
the ratings are distributed evenly over all three values, and 1 when half of the teachers select
rating 1 and the other half select rating 3, showing maximum disagreement.
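The three reference values quoted above follow from the ordinary variance formula with division by the number of ratings. A minimal sketch, with an illustrative function name and made-up groups of six teachers:

```python
def rating_variance(ratings):
    """Variance of a set of ratings valued 1, 2 and 3 (dividing by n)."""
    n = len(ratings)
    centre = sum(ratings) / n
    return sum((r - centre) ** 2 for r in ratings) / n

# Illustrative groups of six teachers each.
all_agree = rating_variance([2, 2, 2, 2, 2, 2])     # every teacher gives the same rating
evenly_spread = rating_variance([1, 1, 2, 2, 3, 3]) # ratings spread evenly over 1, 2 and 3
split = rating_variance([1, 1, 1, 3, 3, 3])         # half select 1, half select 3

print(all_agree, round(evenly_spread, 2), split)
```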
These variances were calculated for all the items of the three tests, using the categories previously
described, and the results are summarised in Table 1.
TABLE
Variances
of Curriculum
9
z
E
5
2
Items
Highest
variance
Lowest
variance
Average
variance
0
0.30
1
0
0.15
0
0.39
0.36
0.20
1
0.63
0.67
Five-year
6
11
6
4
0
0.29
0.75
Three-year
one foreign language
no foreign language (boys)
no foreign language (girls)
G
11
12
0.14
0
0
0.39
0.25
0.25
0.80
0.52
0.67
0
0.18
0.75
0
0.13
0
0
0.43
0.53
0.43
0.36
0.80
0.98
0.86
1
A
Five-year
Three-year
one foreign language
no foreign language
modified
4
Five-year
E
z
z
cn
for Different
Number of
teachers
COURX
2
$
m
:
(3
1
Rating Distributions
Three-year
one foreign language
no foreign language (boys)
no foreign language (girls)
no foreign language (boys and girls)
-
It will be observed that even within the main types of course the curriculum ratings of particular
items were in perfect agreement for some items and in complete disagreement
for others. A
similar pattern was obtained from the environment ratings. These patterns indicate that the extent
of agreement among the teachers, even within a course, was only moderate when averaged over
all the items of each test. They throw little or no light on the reliability of each teacher’s ratings.
The relation between facility of item, teachers’ ratings and ability of class
Although the reliabilities
of the teachers’ ratings were not established by the results of the preceding section, an effort was made to measure the extent to which the stressing of a topic in
the curriculum, as assessed by the rating, or having help from the environment, similarly assessed, determined the pupils’ proficiencies
in that topic.
The proficiency
of each group (which might comprise all pupils in a particular course in one
school) was measured by the percentage of correct answers to the appropriate
item. This percentage was converted to the corresponding
probit (i.e., normal deviate plus five) for statistical
reasons. The curriculum ratings 1, 2 and 3 were replaced by 1, 0 and -1 and the environment
ratings A, B and C were changed to the numerical scale 1, 0 and - 1 also.
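As a sketch of this conversion, the probit can be obtained from the inverse of the standard normal distribution function, and the double ratings recoded with two small look-up tables. The function and table names are illustrative only, and the use of Python's `statistics.NormalDist` is simply one convenient way of obtaining the normal deviate.

```python
from statistics import NormalDist

def facility_probit(correct, pupils):
    """Probit of the proportion correct: normal deviate plus five.

    Groups with 0 % or 100 % correct have no finite probit, and
    inv_cdf raises an error for them.
    """
    return NormalDist().inv_cdf(correct / pupils) + 5

# Recoding of the teachers' ratings to the numerical scales 1, 0 and -1.
CURRICULUM = {1: 1, 2: 0, 3: -1}
ENVIRONMENT = {"A": 1, "B": 0, "C": -1}

def recode(double_rating):
    """Split a double rating such as '1B' into its two numerical values."""
    return CURRICULUM[int(double_rating[0])], ENVIRONMENT[double_rating[1]]

# A group in which 35 of 49 pupils answer correctly, with the item rated 1B.
print(round(facility_probit(35, 49), 2), recode("1B"))
```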
It is likely that the proficiency of a group is affected by the general ability of the group, whatever
the stress in the curriculum or the help provided by the environment. A simple and rough measure
of this ability for each group was obtained by rating five-year courses as 2, three-year courses
with one foreign language as 1, three-year courses with no foreign language as 0, and modified
courses as -1. The objection may be raised that it is not the ability of the pupils that is here
being assessed but their exposure to a given type of course. It would have been possible, by
seeking further information from schools, to establish that the average abilities, as measured by
tests of verbal reasoning, are in the descending order of the numbers given. Readers who prefer
to do so may replace the word “ability” in the following discussion by the phrase “type of
course followed”.
For the geography test there were then available 27 different groups, each containing at least
seven pupils, and for each of these groups and for each item of the test there were available (a)
the facility probit; (b) the curriculum rating and (c) the environment rating for the item; and (d)
the ability level of the group. For example, the 49 pupils in the 5-year course in one school gave
35 correct answers to item 2 of the geography test, rated 1 B by the teachers in that school.
Thus the facility percentage for curriculum rating 1, environment rating 0 and ability level 2 was
71.4 % for this group, giving a facility probit of 5.57. The 27 groups then provided the data to set
up for each item the regression equation
facility probit = b1 × curriculum rating + b2 × environment rating + b3 × ability level
As a first approximation
each group was given equal weight, i.e., the differences
in the numbers
of pupils in the groups were ignored.
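A regression of this kind can be sketched with ordinary least squares, each group contributing one equally weighted row. Everything in the example is hypothetical: the eight groups, their ratings and their probits are invented so that the fit can be checked against known coefficients, and the constant term is an assumption, since the equation as printed shows none.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical groups: (curriculum rating, environment rating, ability level).
groups = [(1, 0, 2), (1, 1, 1), (0, -1, 0), (-1, 0, -1),
          (0, 0, 1), (-1, -1, 0), (1, -1, 2), (0, 1, -1)]

# Hypothetical probits generated from known coefficients (constant, b1, b2, b3).
true_coefficients = [5.0, 0.1, 0.2, 0.3]
design = [[1.0, c, e, a] for c, e, a in groups]  # leading 1.0 is the assumed constant term
probits = [sum(b * x for b, x in zip(true_coefficients, row)) for row in design]

constant, b1, b2, b3 = fit_least_squares(design, probits)
print(round(constant, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

Weighting each group by its number of pupils would require a weighted fit; the equal weighting here mirrors the first approximation described in the text.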
This technique was applied to three items in the geography test, three in the mathematics test
and four in the science test. The items were chosen partly for their relevance to the present
inquiry and partly because results from the main inquiry had suggested points of interest.
A summary of the results is shown in Table 2 in which coefficients which are statistically
significant are marked *. It will be observed that for no item was the regression coefficient for the
curriculum rating significantly different from zero, and for only one item was the regression
coefficient for the environment rating significantly different from zero. On the other hand, for all
items save one the regression coefficient for the ability rating was significantly greater than zero.
In other words, the proficiency of a group in answering an item is directly related to the ability
level of the group, but appears to have little relation to the amount of stress given by the teacher
to the topic tested by the item. The fraction of the whole variance of the probits accounted for by
the regression equation varied from a non-significant 12 % for Item 9 of the science test to 57 %
for Item 22 of the mathematics test.
Table 2
Regression
for Selected
Items
Percentage 0
Test
Geography
Item
Curric.
Errors
Envt.
Ability
varmnce
accounted
2
- 0.12
0.30
0.36*
0.19
0.18
0.11
50
0.06
- 0.04
0.34"
0.11
0.14
0.08
46
0.26
0.31*
0.16
0.15
0.11
45
0.16
0.30*
0.21
0.21
0.15
24
0.54*
0.13
0.13
0.09
57
0.31*
0.19
0.10
0.17
0.11
52
0.1 1
0.08
43
0.14
0.17
0.13
0.14
0.10
0.1 1
23
5
12
22
Science
Standard
8
14
Mathematics
Regresslon
coefficients
Curric.
EfM.
Ability
1
- 0.11
0.22
see discussion..
0.21
-0.11
-0.04
7
0.16
9
0.1 1
ia
0.11
0.44*
0.07
-0.02
0.31*
0.17
0.12
0.26*
for
12
Item 2 of the geography test was of information
type, asking what use was made of Tundra
regions. Item 8 was purely factual, asking into which sea the River Danube flowed. In either item
it might have been thought that degree of stress in the curriculum would have markedly affected
the proportion of correct answers. This was not so, the accuracy of the pupils’ answers being
related only to their general ability and not to the stress given in the curriculum or to the help of
the environment. The position was very similar in response to Item 14, which was an exercise
in interpreting a map.
The first mathematics item to be examined was number 5, which was “multiply 9.04 by 0.4”. The
performances of the groups varied from 0 % to 100 %, but the regression accounted for only
24 % of the variance. One reason may be that 31 of the 40 teachers gave the curriculum rating 1,
showing that this type of calculation was stressed in the curriculum. Within that rating, degrees
of stress would no doubt vary.
The second mathematics item examined was number 12, which was a problem involving the calculation
of the area of a triangle with base 40 yards and altitude 37 yards. This question proved
very difficult for Scottish pupils, twenty-three of the forty groups scoring zero, and the mean
score of all groups being only 19 %. As there was so large a number of zero scores, the regression technique was not applied. It was, however, noted that the percentages of correct answers
for the three curriculum ratings were 23 %, 18 % and 17 %, while those for the four ability
groups were 49 %, 24 %, 12 % and 0 %.
Item 22 of the mathematics test was again a calculation of areas, but in this case an example
was given. Scottish pupils fared better on this item; every group contained pupils giving correct
answers and the mean score of all groups was 59 %. The percentage of the variance accounted
for by the regression equation was 57, the highest of all the items examined.
Items 1, 7 and 9 of the science test were selected partly because sex differences were shown in
the percentages of correct answers, boys being superior in all three. The first referred to the
force required to push an object up an inclined plane, and the teachers, while not stating that this
type of question was stressed more frequently with boys than with girls, appeared to be of the opinion
that it was the kind of problem more likely to occur in a boy’s environment than in a girl’s. This was
the only item in which the environment rating was significant.
Boys were also superior in their replies to Item 7, which dealt with the principle of flotation, but
neither curriculum rating nor environment rating appeared to affect the regression equation.
The responses to Item 9, on the principle of the lever, provided some surprises. Not one of the
regression coefficients was significant, nor was the contribution of all three together, the
percentage of the variance attributable to regression being only 12 %. The percentage correct over
all groups was 36 %, and the percentage for the various groups ranged from 9 % to 75 %. As the
item was a multiple-choice one, with four possible answers, there is a suggestion here that a fair
amount of guessing had occurred. This idea is supported by the fact that the curriculum rating for
26 of the 36 groups indicated that the topic had not been referred to by those teachers.
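The guessing argument can be made concrete. With four possible answers, pure guessing would be expected to yield 25 % correct. The sketch below applies the classical correction for guessing to the percentages quoted above. It assumes that every pupil attempted the item and that wrong answers were spread evenly over the three wrong options. Neither assumption is stated in the report, and the function names are hypothetical.

```python
def chance_level(num_options):
    """Expected percentage correct from blind guessing."""
    return 100.0 / num_options


def corrected_for_guessing(percent_correct, num_options):
    """Rescale a percentage so that the chance level maps to 0 and 100 stays 100.

    Assumes no omitted answers; this is the classical correction
    expressed in percentage terms.
    """
    chance = chance_level(num_options)
    return 100.0 * (percent_correct - chance) / (100.0 - chance)


# Percentages quoted in the text for science Item 9:
# lowest group, all groups together, highest group.
for observed in (9, 36, 75):
    adjusted = corrected_for_guessing(observed, 4)
    print(f"observed {observed:3d} %  ->  corrected {adjusted:6.1f} %")
```

On these assumptions the overall figure of 36 % lies only a little above the chance level, and the lowest group falls below it, which is consistent with the suggestion that much of the responding was guesswork.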
Finally, Item 18 of the science test, which referred to the usual method of estimating the age of a
tree, produced a good response from the Scottish pupils, but once again the only factor associated with success was the ability rating, and the proportion of the total variance accounted for
by all three factors was only 23 %.
The results of this analysis may be disappointing to teachers in that so little difference seems to
be made to the proportion of correct answers by their stressing or not stressing particular topics.
It must be borne in mind, however, that the measures used were comparatively coarse and that
the analysis has been made as simple as possible. With these reservations, it would appear that,
at the age at which the tests were administered, ability is a greater determinant of success than
stress by teacher or help from environment, and that other factors are, in most cases, having
greater effect than all three together.
APPENDIX
ON PAGE 66
APPENDIX
Most frequently given ratings for each item
Item   Geography   Mathematics   Science
1
1B
1B
3C
2
2B
1C
3C
3
2C
1C
3C
4
1C
5
1C
2B
1C
1C
3B
6
2C
2B
3C
7
1C
1B
3C
8
2C
16
1B
9
1C
1C
3C
10
2C
1C
3C
11
2C
1C
3C
3A
1B
12
2C
13
2C
2C
1B
14
2C
1A
15
2C
3C
2B
16
2C
2C
2C
17
2C
2C
2B
18
2C
3C
25
19
2C
3C
2C
20
2C
3C
3B
21
2C
1B
1C
22
2C
1C
23
2C
3C
24
2C
36
25
2C
3C
26
2C
3C
27
2C
28
2C
29
3C
30
3C
31
3C
32
3C
1C