I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
•e
I
THE PREDICTIexl OF COLIEGE GRADES FROM COLLEGE BOARD SCORES
AND HIGH SCHOOL GRADES
by
Richard F. Potthoff
University of North Carolina
Institute of Statistics Mimeo Series No. 419
December 1964
The adjustment of high school grades for differences
among high schools in erading standards and practices is a
problem Which I:DJ.st be dealt with in any prediction system
which utilizes 'hiel1 SCl1001 grades in predicting academic success in colleee. Similarly, the need for adjustment of college
grades for differences among colleges "'1i.ll also enter into the
picture in the operation of a central prediction system. This
report emmines and evaluates different possible approaches
for the prediction of college grades usirl['; College Board scores
and high school erades as the predictors, conziders what assumptions are made under the different approaches, and obtains the
rna.x:imwn-likelil10od estimators of the pa.ra.meters which are used
to adjust the lligh school grades and college crades under a
large number of different possible rodels. The variances of
the predicted college grades, as well as confidence intervals
associated with these predicted grades, are considered.
This research ,ms supported by Educational. Testing Service, and
was also supported in part by the Mathematics Division of the
Air Force Office of Scientific Research.
DEPARTMENT OF STATISTICS
UNIVERSI'l'Y OF NORTH CAROLINA
Chapel Hill, N. C•
I
I
.I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
Ie
I
TABIE OF CONTENTS
Section 1:
Introduction and summary ••••••••••••••••••••••••••••••
1
Section 2: Notation, assumptions, and understandings •••••••••••••
3
Section 3: Previous related studies ••••••••••••••••••••••••••••••
8
Section 4:
The use of different prediction equations- for
different groups •••••••••••••••••••••••••••••••••••••
10
Section 5: The first approach: equating the college grades
by means of the T variable', 1'o11owed: by e-qua'ting
of the high SCllOOl grades ••••••••••••••••••••••••••••
12
Section 6: Evaluation of the first approach ••••••••••••••••••••••
17
Section 7:
The second approach: equating the high school
grades by Il¥!a.IlS of the T variable, 1'oliowed: ~J
equating of the college grades •••••••••••••••••••••••
21
Section 8: Evaluation of the second approach •••••••••••••••••••••
29
Section 9:
The third approach: equating the high school
grades and equating -tbecollege grades'
simultaneously •••••••••••••••••••••••••••••••••••••••
35
Section 10: Evaluation of the third approach, and "C'on~:son
with the second approach •••••••••••••••••••••••••••••
40
Section 11: Determination of the predicted college grades,
and of confidence intervals for these grades •••••••••
48
l'hthematical Appendix
Note 1
(referred to in Section 5)
••••••••••••••••••••••••••••
54
Note 2
(referred to in Section 5) • •••••••••••••••••••••••••••
56
Note :5
(referred to in Section 5)
••••••••••••••••••••••••••••
57
Note 4
(referred to in Section 5)
••••••••••••••••••••••••••••
61
Note 5
(referred to in Section 7) ••••••••••••••••••••••••••••
64
Note 6
(referred to in Section 7)
••••••••••••••••••••••••••••
72
Note 7
(referred to in Section 7) • •••••••••••••••••••••••••••
73
(referred to in Section 9) • •••••••••••••••••••••••••••
76
(referred to in Section 9)
78
Note
0
u
Note 9
i
••••••••••••••••••••••••••••
I
I
Note 10
(referred to in Section 9) • ••••••••••••••••••••••••
81
Note 11
(referred to in Section 9)
·.........................
83
Note 12
(referred to in Section 9) • ••••••••••••••••••••••••
88
Note 13
(referred to in Section
9)
• ••••••••••••••••••••••••
91
11~
(referred to in Section 9) • ••••••••••••••••••••••••
92
Irote 15
(referred to in Section 10) • ••••••••••••••••••••••••
94
Irote 16
(referred to in Section 11) • ••••••••••••••••••••••••
97
Irate
Acl:no'tTle~~nts
••••••••••••••••••••••••••••••••••••••••••••••••••
99
TIc:crcncez •••••••••••••••••••••••••••••••••••••••••••••••••••••••
100
-,
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-,
ii
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
•e
I
THE PREDICTION OF COLLEGE GRADES FROM COLIEGE BOARD SCORES
. *
AIID HIGH SCHOOL GRADES
by
Ricl~
F. Potthoff
University of North Carolina
1.
Introduction and summa.ry.
Certain statistical problems will arise
in any program to set up a centraJ. prediction system under "lhich the college
grades (C) of a large number of students are to be predicted using both their
College Board test scores (T) and their high school grades (H) as predictors.
'!'he purpose of this report is to explore different possible statistical
methods and models for e.ffectine; such predictions, and to make appropriate
evaluations of the different aJ.ternatives.
A major reason for the existence of statistical complications in this
area is the fact that grades in one high school cannot be assumed to be equivalent to erades in any other high school and grades in one college likewise
cannot be asswned to be equivaJ.ent to grades in another college.
'!'hus something
must be done w'hich will result in equating the grades from the different high
SC1100ls and equating the grades from the different colleges.
How to effect
this equating is a basic statistical problem which must be faced in connection
with the establishment of a central prediction system.
One idea whiCh has been suggested is to equate tiE college grades by
usine tIle College Board test scores, and then, once all the college grades are
on a common basis so that
tl1e~l
are comparable, the high school grades can be
*This research vm.s supported by EducationaJ. TestinG Service, and was
also supported in part by the Hathematics Division of the Air Force Office of
Scientific Research•
eq~uated
through the use of standard methods.
less reverses this process:
on
t~le
A second suggestion more or
the grades of different hie;h schools are eqU&ted
basis of the College Board test scores, and then these equated high
SC1100l grades are used (along ,lith the test scores) for equating the college
Still a third possibility is to use a technique winch does the equat-
grades.
ing for the higll school grades and for the college grades simultaneously.
All three of these possible approaches will be considered in some detail
(see Sections 5, 7, and 9L and certain basic models will be formulated and
studied.
For all of the formulations, we will show
110W
to obtain the maxirrwn-
li1:elL1000 estimates of tile equating parameters.
All three approaches will be evaluated (see Sections 6, 8, and 10).
The third one is tentatively recommended over the other two, since it would
seem to be less vulnerable to sneaky systematic biases by virtue of the fact
tnat the assumptions upon ,-Thic}l it is based appear to be mare realistic and
less restrictive.
As an alternative choice, a special form of the second.
apl)rOaCll is reconnnended, for t:lere mir;bt exist certain conditions under which
t~is
choice could result in a bit more efficient estimation and prediction
t~lan
the third approach, and also the calculations are noticeably simpler.
The thir<i approach requires much lengthier computations than the other two:
a1"'1;11011[;.1 ti:e recommendation in its favor is contingent (for one thing) upon
t:,1C feasibility of perfol-;;ti.ng t-nese computations, the equation systems to be
solved ::ave been carefully exanrl.ned and it appears that t'heir solution is
entij:-el~r
practicable provi(le(L that a little time can be obta.1IEd on a suffi-
ciently laree computer.
O~ce
the
maximum-li~;:elihood estimates
of the equatinc; parameters have
becn found: the predicted college grades can be obtained.
It might be desir-
able to abtain for each stuc1.ent not only his predicted college grade, but also
I
I
-,
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-,
I
I
I,
,I
I
I
I
I
I,
Ie
I
I
I
I
I
I
I
.I
a confidence
inte~:"V'a1
thi~
for
predicted grade.
These l.'1atters are covered
after the discussion of the third approach (see Section JJ_).
The conf:klenee
intervals on the predicted. :;rades will tend to be na:rro't'rer for the students
fr:>ln the lar'ger high
SChOO~8,
since the estimation of the eCluating parameters
far a iliGll school will tend t:> be more accurate the larger the high seno::>l.
Tl1is report consists of two parts.
The second part is a Mathematical.
Appendix (consisting of a series of notes) to which we have relegated some of
the more teclmical details and ",18.tilematical derivations.
Here in the first
part, 'fIlich is less technical, '\'Ie present and discuss the principal. findincs
of the report, and at the same tiille we exhibit the importe.ntforImllas.
the reference numbers (
1
,
2
All of
,etc.) here in the first part "Till refer to tIle
notes in the Mathematical Appendix.
2. Notation, assWlIJ?tions, and understandings.
es sentially with three t~"Pes of variables:
test scores (T), and hiBIl school grades (H).
We will be concerned
college grades (C), College Boar<.1
The obj ect of the central
prediction system will be to predict the value of the va:c:ta'ble C for each of
a large number of students for
,mOID-
the values of
T and
H are given.
This
predicting is to be done by examining data on all three variables from an
earlier group of students for whom data on
C (as well as on
T and
already available, and then finding an optimal way of predicting
basis of
in
T and
H.
inG
C on the
For ey..a.mple, students who graduated from high school
1964 will have received
T, and
H) is
SOllle
college grades by earl~r 1965; the data on C,
H from these students could be employed to develop a way of predict-
C given T and
H, to be used on students who will graduate in 1965
and for whom only . T and
H data will be available.
Most of our development "Till deal with the earlier group (1. e., the
I
group for ''1lliCh
C data as well as
T
and
H data is available), since it
is this group which is utilized to estimate the parroneters of the prediction
system w.i.llch is to be applied to the later group.
Witil res:pect to the earlier
group in particular, we adOl)t J.;,he following notation.
of students in the group.
and let
Let them be distributed among
N •• be the nUI.'lber
n
different coller;es,
m be the number of d.ifferent high schools from 'lhich these N••
students came.
,·;ho
Let
Ca:.1}C
Let
denote the number of students in ti.1e j-th co11ece
N ..
~J
from the i-th
11i[~1
school.
students in the j-th college, and
frO:.l ti.le i-th hi6h school.
We also define
N.
~.
Tlms
m
, lI. J' =.1: Nij , and
~=l
~.
to be the number:>f
to be the number of students who came
m
n.
N. j
N••
n
m n
= 1: Ni = 1: IT
i=l
. = 1:: 1:: N .
j=l· J i=l j=l i J
•
refers to the hie12 school from which he ca.me (i = 1, 2, ••• , m),
i
j l'efers to the college l-Illich he is attending
k (1:=:1,2, ... ,N
(j = 1,2, ... ,n), and the index
) is used to distinguish the N different s1..-udents who are
ij
ij
in tile j-th college and came from the i-th high school.
ano.
H"
J.J k
T11en
C
, T
,
ijk
ijk
,.;ill represent respectively the college grade a.verage, College
Board test score, and hig}1 school grade averaee of the student with identification (i, j, k);
C
i~ in terms of the grading scale of colicee j, and
ijk
H. 'k is in terms of the grading scale of high school
J.J
(2.1)
N
ij
n
C
1::
=
Cijk ' Ci. = 1: Cij '
ij
j=l
k=l
Cij
= Ci/Nij ,
C.
~.
C. j =
= Ci.!Ni • '
4
~
m
1: C ,
ij
i=l
j = C.
/~
i.
1fe also define
C•• =
j'
c..
m n
1: 1: C
ij
i=l j=l
= C• • /N••
e.
Ii
I'
I;
I:
I:
I
Out notatiol1 will identif~/ each student by means of a triple of indices (i, ;j, k),
wilere
I:
,
•
el
I
I
I
I
I
I
I
,e.
I
I
I
,e
By substituting the letter
Ie
I
I
I
I
I
I
I
.e
I
C wherever tile latter appears
in (2.1), we define eight m::>re expressions; likewise, by substituting
C,
I
I
I
I
I
I
T for the letter
1'Te
H for
define another eight expressions.
Perj:)8.ps our terminology should be explained m:>re precisely.
Tne term
"high sCl1ool" embraces what might otherwise be called "secondary scmols",
"preparatory schools", or
s~ly
embraces "universities".
The term "college grade" (C
"schools".
The term "college" of course
ijk
) refers to a college
grade average over some specified period of time (or possibly to some closely
related measurement); e.g.,
Cijk
might be the average of all the individual's
grades for the entire fresl1man year.
By "high school grade" (H
ijk
) we mean
the average of the student t s high school grades over a designated period of
time (Which might be an~'Where from one to four years), or possibly some related
measurement Such as one based on his rank in class.
By "College Board test
score" (Took) we mean a single score (which might be a sum. or weighted combinaJ.J
tLon of other scores) received by the student in a common testing program "mich
was administered to all N•• students.
We will not consider the possibility of using IIm"e than one
H-variable
as a predictor (e. g., one might attempt to use hi gh school grade averages in
several different subject areas as predictors, instead of using the single
predictor consisting of t11e over-all grade average).
We likewise "Till not
consider the possibility of trying to predict more than one C-variable (e.g.,
one cculd try to set up a system with multiple criterion variables consisting
of college grades in several different SUbject areas, rather than restricting
oneself to the single over-all college grade average).
Such refinements as
these would complicate the statistical analysis of the system considerably,
and so at this stage in development it appears best to concentrate on the more
ba.sic lOOdels.
We will, however, consider at various points in this report
5
the use of more than one
T-variable (i.e., more than one t-J'Pe of test score
arising from a common testing program administered to all. H.. students) as a
predictor,since such a generalization does not cause as much complication.
For some purposes, such as judging the extent and cost of the computation that will be required wi t1l different prediction schemes, it wU1 be helpf'ul to know the approximate values of
m and
n.
m,
It is anticipated that
the number of high schools encompassed in the system, may be as high as 4000
or 5000, and that
n, the number of colleges in the system, will be roughly
400 or 500.
For carrying out the equating for the college grades and for the high
school grades, we will assUL'le tb.a.t a linear transformation of the grades of
ee.ch college or each high school will be a
transformation.
sufficient~t
Thus we associate with college
j
general type of
a pair of constants
CX
j
and I3 , so determined that the variable
j
(2.2)
cijk'S (2.2) fran
represents an "equated" college grade; in other words, two
any t·uo different colleges are always comparable, whereas
ent colleges are of course not comparable.
high school a pair of equating parameters
Cijk 's from differ-
Similarly, we associate with each
a.
~
and b.
~
SUCll
that the variable
represents an equated high school grade, and thereby compensates for differences in grading standards among the high schools.
In the m:xiels which we 1'11.11
treat, a regression parameter will ordinarily be considered to be absorbed
I
I
-,
I
I
I
I
I
I
_I
I
I
I
I'
I
I
I
-.
in the ai'S and bits.
6
I
I
I
,e
Actually, the
f3j
<lj'S,
parameters) are all unknOim.
'S,
ai's, and bi's (as vrell as a couple of other
It is the estimation of these parameters which
constitutes our principal statistical problem.
I
I
I
I
I
I
With the
we will adopt the convention that these variables are all scored in such a
way that a better performance is reflected in a higher (rather than a lower)
I
I
I
I
I
,e
I
This implies, incidentally, that all f3 's [see (2.2)]
value of the variable.
j
and all b. 's [see (2.3)] are
l-
> O.
If we wish, we can twe only one rather than two parameters for equating,
and eliminate the term
f3 j in (2.2) and/or the term b i in (2.3).
The result-
ing set-ups would be less complicated, but at the same time less general and
presumably less accurate.
He shall consider such set-ups at various points
in tirl.s report, however.
Ie
I
I
cijk'S, the Cijk's, the hijk'S, the Hijk's,and the Tijk's,
Although we will nat introduce all of our notation at this stage, "re
define the following expressions before closing this section:
(2.4)
SHHi.
=
I: I: It:' - II. ~
J.. J..
k J.J 1.:
j
STHi.
=
I: I:Ti--l.H··k - Ni
.J .. J.J
j k
T.
J. •
•
H.J..
Tf
.. - Ili
STT'J.. = I: kI: ~l.JK
••
j
STT
=
I: I: I:~
- rl..
i j k ijk
te. j
=
I: I: Cfjk i k
P..
~ j c:a j
C. 'l-T. '1- - N j
SCT.j = I: I:
•
i k l.J .. J.J..
STT.j
r
j
=
I: I: T:;'1 - Ii .
•J
i k J.J;;:
=
SCT ,/(Scc ,STT j)2
C• j T• j
T2• j
1..
•J
•J
•
7
I
I
SCHij = ~ Cijk Hijk - Nij
d
-,
~ CijkTijl;: - Nij Cij Tij
SCTij =
Cij Hij
.. H.
ij = Hij - IT~J
~.
=
e ij
3. Previous related studies.
''Ie consider
brien~l
Tucker [9], af'ter mention-
which is related to the material of this report.
ing sO.me
ear~ier
work in the areas of equating of grades and prediction of
college grades, examines tln-ee different
system.
some previous work
The first two are canonical
m:>d~
for a
corre~ation
does not favor for prediction purposes.
centra~
prediction
models, which Tucker himself
The third one, which is
c~ed
the
"predictive mdel", is extremely general (as are the ot.i1.er two), and, in
fact, is more general than our formulations in a number of respects, including
the aJ.lowance for provisions to take care of more than one C-variable and
more than one H-variable; i,ritil this greater generality, however, are associa.ted
m:>re serious computational complications.
IlIOdel seems to be
~
At the same tiJ1te, Tucker's third
general than our formulations in ale respect : it
apparently makes no provision for anything similar to our Pj'S (see (2.2»,
and so .a problem develops a.s to how to weightcertain error terms (see [9],
p. 55).
As Tucker [9, p.2] points out, one method "lhich has been used to
predict college grades is for a given college to use data on its own current
students to set up a regression system for predicting the grades of its pros~)ective
students.
This
n~thod
is not only computational ly simple but also
rests on a minimum of ass'U11IPtions.
it uses only a
relativel~l
However, its fault lies in the fact that
small &m::lunt of data, which causes the estiDBtes of
8
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
the equating parameters, and consequently also the predictions, to be comparatively inefficient and inaccurate; in fact, the grades frau those high schools
which send only a few students to the college can
har~
be used at all in any
prediction, since the estimates of the equating parameters for such high schools
would be so unstable.
With a central. prediction system, on the other hand, the
entire mass of data encompassing all n
colleges, all
TIl
high schools, and
a.U H•• students is utilized sitIultaneously to estimate the various equatinG
parameters, and this sllould result in much better estimation and improved
predictions.
Gullilesen, in a section entitled "Equating two forI!lS of a test given
to different groups" [5, p. 299ff.], presents a theoreticaJ. development (which
he attribut~s to Tucker) t:1B.t seer.lS to be appl.icable to the problem of equating college grades (e) of different col.l.eges on the basis of a common test
(T) administered to all students.
The development treats explicit1y the
special case which woul.d correspcnd to only n = 2
to
11
> 2 is iJllllediate.
colleges, but generalization
In any event, though, the development deals essentiaJ.J.y
with the population parameters themselves rather than with the estimation o'f
these parameters, whereas in thi.s rePOrt all of' our methodDl.ogy will be based
on max:i.muu-likelihood estitBtion of' unknown parameters.
The Gulliksen-Tucker
development assumes that there has been selection on the basis of the equating
variable (T) and only on the basis of T (i.e., not on
e or
on some third
variable such as H); the case ,mere there is selection on the basis of the
variable to be equated rat11er than on the equating variable (such as might be
assumed to exist if' T is
t~le
equating variabl.e and H the variable to be e(luated,
e. g.) is not considered by Gulliksen, but simlar tools might be applied to
this case to obtain relations between the population parmmters.
~le
effect of sel.ection receives considerable attention in Gulliksents
9
book [5] in various connections; it turns out that the sometimes tricky influence of selection will also ma.l"e itself felt in various phases of this
report.
I
I
-,
One basic principle 'uhich we will be relying upon is the fact that,
if X and Y are two variables such that selection is made on X but not
on Y, tllen the conditionaJ. distribution of Y given X is unaltered by the
selection on X (whereas tile joint distribution of X and Y is ~ generally
unaJ.tered, and neither is the conditional distribution of X given Y).
4. The use of different prediction equations for different groups.
In his study, Tucker [9] goes to some length to provide for the possibility
of usinG different regression equations, or predictive composites, for the
sradc predictions for different groups of colleges.
exatlPle, he explores a system in
~Thich
In particular, for
grades at liberal arts colleges are
predicted via one set of rec;ression weights, and grades at engineering and
teclmica.l colleges are predicted via a second set of regression weights.
argument is that the grades at the two different
t~'J?es
The
of collEg es essentially
constitute two different t:'''Pes of criterion variables, so that it is more
realistic to have two separate sets of regression iveights for predicting them.
Such refinements as these have the advantage of generalizing the basic
model, but at the same tim.e they create certain complications.
The computations
seem to be comparatively curnbersome in relation to the sort of computaticnal
requirements that are proposed here in this report.
Also, Tucker [9, see
especially pp. 54-55] takes note of several unsolved nathe::18,tical problems
Which arise
~li.th
his "predictive nx>del".
The latter include questions as to
the uniqueness of the solution for the estimates of the parameters, as well
as, perll8.ps, the convergence of the iterative process leading to this solution; the cr~ice of a certain integer which he calls
10
np
(ifllich is the number
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
,.
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
••I
of predictive composites, and is
~
the weightingof the squared errors.
developrnen~,
1 but
::s
n, the number of colleges); and
It appears that, at the present stage of
it might be best, so far as practical application is concerned,
not to move right away into a relatively sophisticated model 'ihich allows for
splitting the colleges into eroups with different basic prediction equations
for the di:f:ferent groups.
Nevertheless, though, a model like Tucker's provides
st:tmuJ..ating food-for-thoue-.,ht for future stages of development.
In this report, we will assume that the criterion variable (college
grade average) is basicalJ.y the same variable for all
n
colleges (but with
linear transformations being required to equate grades of different colleges),
so that for all
n
colleges the same regression weights can be applied to
the predictor variables.
This assumption would seem to be considerably lOOre
reasonable for freshman grades thaa for post-freshman grades, since curriculums
would tend to become less hOllDgeneous the more advanced the students are in
college.
In case it is felt) however, that the regression weights for the
eneineering (technical) colleges really should be different fran those for
the liberal arts colleges, then the engineering colleges (ulu.ch would probably
not account for too large a fraction of the total) could be taken as excluded
from the group of n
the students at the
colleges.
n
The estimates of the
ai's and bi'S based on
liberal arts colleges might then be utilized
Someh01'T
in setting up a prediction system for the (presumably) sm.a.ll number of engineering colleges.
Finally, we mention a type of grouping which is different from grouping
by colleges.
The question r.1ight arise as to whether
lTe
should have one set of
reo-ession weights for predicting girls' grades and a second set for predicting boys' grades.
It is to be hoped that it will be satisfactory to assume
the same set for both, or that, at worst, the appending to the model of a
11
single extra predictor (essentially a second T variable, but able to assume
on.ly t't'1O values) will suffice to take care of any sex differences.
5. The first approach: equating the college grades by means of the
!!!.:table, followed by equating of the high school grades..
T
i'le are now ready to
t\l.rn to a detailed consideration of the three approaches to equating which were
mentioned in Section 1.
In the first approach, the first step is to equate the
college grades using the T-scores as the equating variable.
mod.el in l'lhich the conditional distribution of
c
ijk
We will assume a
(2.2) given
T
is
ijk
normal vlith variance equal to 1 and with mean equal to a linear function of
T
•
ijk
Such a model, which w'e 'tdll designate by CiT, seems to be the most
realistic (i.e., more realistic than a model which specifies a bivariate normal
distribution for
C and
bution of T given
T, or one which specifies tlJat the conditional distri-
C is normal), since there certainly is selection on the
ba.sis of
T whereas there is no selection based
bution of
C and
T before selection
cJnditional distribution of
al
T
011
C.
was bivariate normal, then the
C given T would be unaltered by selection on
after the selection.
Under our model which ",re just specified, the conditional distribution
T
ijk
(23()
where
J-l
and
--k
v are
may be vlritten in the form
e
1. (
-2 Cijl~
unkn01-n1
parameters (and v "Till be
>0
positive correlation between college grades and test scores).
because of
It is the Cijl: 's
rather than the cijk's which are the observed quantities, however.
aPIll~/
the transformation
12
-,
I
I
I
I
I
I
_I
If the joint distri-
T, i. e. ; it would be normaJ. 't'ri th the same mean and variance both before and
given
I
I
If we
I
I
I
I
I
I
I
-.
I
I
I
.-
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.I
c ijk
OJ' + f3, C,
:=
J
'k
~J-"
to (5.1), then, after first getting
from (5.;":), w~ use (5.1 - 5.3) to find that the conditional distribution of
C
given
ijk
T
is
ijk
(21t)
-t
I~ j
I
where
OJ
For convenience, we denoted the additive term by
than by
Q:
j
as in (2,.2)], so that we could save the <lj
which absorbs both
aj
va.!"1ance being equal to
and
1
(J..
in (5.2) [rather
notation for (5.5)
Note also that the assUlllption about the
is not really an arbitra.r:r &smmption, but rather
it still allows for complete generality since, in effect,
i'Te
can think of the
standard deviation parameter as being absorbed in the <l,' s, the ~.' s, and
J
J
v in (5.4); or, to look at it another way, we may observe that the formulation (2.2) [or (5.2)] is not actual1~r unique (since it '\Inll still be valid if
the <lj'S
and
~j 's are IlUlltipl1ed by any positive constant), and the assump-
tion tll8.t the variance is 1
can simply be thought of as a means of making the
formulation unique.
It is (5.4) which ,,·re use in getting the maximum-likelihood estimates
of the Ctj's
and
~j 's
ting presence of the
(and also, incidentally, of v).
~j'S,
Thanks to the complica-
the obtaining of these ma.ximum-likelihood estimates
I
I
in this case is not merely a standard problem in least-squares estimation
involving simply the solution of a linear equation system.
do is to take the logarith.."n. of the product over
i,j,k
(5.4); differentiate it ,'lith respect to the CXj's, the
Uhat we have to
of the expressions
~j's,
and
II ;
set the
resulting derivatives equal to 0; and solve this equation system for the
CX
j
's, tile
~ IS,
j
and
The formulas for the estimates (the details of
II •
derivinc; '.fl1ich are given
'ttTe
1
in the Mathematical Appendix) are as follows.
First
solve the equation
(5.6)
~,by using (e.g.) the Newton-Raphson metood2 •
for
slu.a.re root of this solution ~
to get
we take the positive
¢ , the estinBte of v.
Then we
~.
=
J
CT.j
(5.8)
cxj
=
'\_
li
A-
T J' - ~.e .
•
J .J
the estimates of the ~j f S and CXj ·s.
Note that
A
~j(5.7) will be
>0
so long
I
I
I
Scr.j
>0
(5.9) merely requires
•
C and T to have a positive
Sa.nI,Ple
correlation
fcrr each college, a condition which must certainJ.y be satisfied if the grades
of tile college are to have any mes.ningtul relation to
unlikely event that
SeT. j
is
I
I
I
as
NOll
I
I
2See • j
and
/\
I
I
el
calculate
~
-.
T at all; in the
< 0 for some college, such a college should
probably be thrown out of the system anyway.
14
I
I
I
-.
I
I
I
,e
l'le might wish to consider the problem of how' to eCluate college grades
1'Then tl1ere is more than one T-variabJ.e upon which to base the equating.
'1'he
ma:dnu.lll-J.ikeJ.ihood estimates can still be obtained for this case, but the
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
caJ.cuJ.a.tions may be noticeably more conu>licated than those IIllich are associated
with the relatively simple formulas (5.6 - 5.8, A2.2).
So;re details for tillS
case are given in the A;ppendii".
He observe that, if all 13 's
j
(5.2-5.4),
then
(5.4)
(21(02)
-t
are eliminated (i.e., set equal to 1) in
becomes
e
--:\(c.J.J;:
'1 + a.- v TJ.'jk)2/~
'"
J
af'ter including a parameter a2
for the variance.
Thus (5.10) represents a
simplified model with on1.y one equating parameter (a.) instead of t"TO
J
(Cljl13) for each college.
The estimation of OJ
under the model (5.J.0) is
nothing but a standard problem in least squares analysis:
''.
1\_
• v T. j - C. j
OJ
the esti.mator is
,
where
This completes the discussion of the first step (eCluating of college
grades via the T-scores) of our first approach to equating.
We now consider
the equating of the high school grades, l'41ich is the second step.
We suppose
tilat) to start with, we have for each student an equated college grade, to be
denoted by
*J.J ;:
*
cljk'
which is such that a
a. c. '1. from any other college.
'1'hus, if
*
c ijk for one college is com,parable to
'We
use the technique for eClUating
college grades which was described in the first part of this section, then the
* ,
c"
s 't'rould be calculated by the f ormula
J.J 1~
15
I
I
•
Note that
c
*ijk
(5.]3) is not emctly the same as
based on the true (unknown) values of OJ
on
est~nates
of OJ
and
~j.
and
c
ijk
(2.2); the latter is
~j' y,hile the former is based
We may assume a model in 'mich
c
ijk
has
conditional expectation (given T and H) of the form
E(C
and
unl'".nO'lV'll
hijk
ai
=
a' + b'hijk + b Tijk
=
a' + b'(aI+bIHijk )+ b Tijk
=
a i + biHijk + b Tijk
variance independent of
= 0.1 + bI
= a'
ijk )
In (5.14) ,re are writing
Hijk [essentially the same thing as (2.3)], and we define
+ b'aI and b i = b'b;'.
bi's,and b
(i,j,k).
Although the estimation
or
the ai's, the
under the mdel (5.14) involves nothing more tmn the applicatiCl'l
of standard least-squares theory, the solution of the normal equations is a
bit tricl{y; therefore we are treating the matter in some detail, but in the
Appendix".
The material in the APpendix4- is developed in terms of the cijk'S
rather than the
*
cijk's, but in reality it is of course the
which l'lill have to be used in e.ll the calculations.
.
*
cijk's (5.13)
The assumption is made
* ,
"OJ t s
tnat the c ijk s are reasonably close to the c ijk t
s, ·J..e., thatthe
A
to
anc, ". t S are reasonably close/the 0.' s and ~j t s; by "reasonably close II is
J
J
meant, roughly speaking, that the discrepancies (errors) here are small rela..
tive to other pertinent errors in the prediction system.
The assunq>tion
should n.::>t be too unrealistic, in view ot the fact that relatively large
numbers of students from ea.ch college (as indicated by the
presu.me.bly have been utilized for estimating the
16
OJ's and
N. J 's) would
~jts (note that
-•
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
,e
the N . 's will tend to be quite large in comparison '\'r.tth the
• J
This section, while
e~laini.ng
N. 's) •
L
the mechanics of the first approach and
the assumptions upon which it rests, has essentially not attempted any critical
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
Tne latter '\'r.ill be the task of Section 6.
evaluation.
6.
Evaluation of the first approach.
The first step in the first
approach involves equatinG of the college grades by neans of the
Unfortunately, though, the a.ssumptions required by tins
c IT
ciT
model.
model will
Mainly for this reason, it appears that the
probably fail to be satisfied.
first approach should not be recommended for use.
Nevertheless, certain
Sl:ecialized situations may arise for which the first approach can be
proper~
e~loyed.
If all colleges select their students on the basis of T and
ciT
then the assumptions of thc
T alone,
model would presumably be fully satisfied.
In
reality, though, the selections of mst colleges would be heavil.Y influenced
by both
H and T; further!X)re, admissions decisions are likely to lean re-
latively nx>re heavily on II in the future than at present, if a central prediction
s~rstem
becomes available "Thereby grades from different high schools
can be made comparable wi til each other.
The distortion and bias which the
C IT model leads to, when there is selection on the basis of
H as well as
T,
is perhaps best dem:::mstrated by some illustrations.
Consider two colleges,
j
and J.
To make matters simple, suppose
tha.t their grading standards are actually the same, so that a j • C%J and
~j
• ~J.
If both colleges select their students on the basis of T
(a.)
tllen the
C
IT
model should be appropriate.
Thus there should essentially be
no distortion or bias in the estimation of the a's and
tendency for
1\
A
a j to exceed a
J
or vice versa, or for
17
alone,
1\
~'s,
~j
and no built-in
to exceed
1\
~J
or
I
I
vice versa..
(b)
lThile
til!Lt
J
J
Suppose now tha.t college
utilizes both
H and
T for selection.
accepts each stud.ant lfilOse
j
accepts each student for
~vllom
selects on the basis of T alone,
j
More specifically, suppose
T-score is above a ceruain mininn.un, wiule
a certain linear conbination of
(actuaJJ.y, something like (2.,) should be used for
mum..
Since
consider
j
nOll
not.
J
~j
and
what happens i'Tith college J.
students in
colleGe
selects on T alone,
J
~j
T and
H
H] is above a certain tlinil·rill not be distorted.
But
For each fixed value of T, the
will tend to have higher
H-scores than those in
j, since
has a mininum II-score for each value of T w'l1ereas college
Tilus, since the students in
value of T than the
students in
also tend to have higher
tend to have higher
J
j,
J does
H-scores for a fixed
it follows that the students in
J
1'1ill
C-scores for a fixed value of T(remember that w'e
J
than at
j, li!1ereas in fact they receive the same grades.
The trouble lies in the fact
tha.t students witil a given T-score at college
do
ability as students with the very same
J
~
have the same average
T-score at college
j •
Rather
the~r
have a hiGher ability (tha.nks to the superior admissions policy of college
and this higher ability is reflected in higher grades.
C IT
But unfortunately,
Thus the
C IT model leads to a distorted estimate
of C!J (essentially the esti."na.te tends to be too loW), and perhaps also to a
distorted estimate of
~J.
Consider next a situatioo where
basis of both
J),
model, these higher grades are mis-interpreted as indicating
lOvTer grading standards.
(c)
I
I
I
I
I
C IT model, this condition will of course force us to the false con-
clusion that students of equal ability receive higher grades at
under the
I
_I
are assuminG that the grading standards are identical at the t'WO colleges).
Under the
-,
T and H.
Suppose that
18
j
j
and
J both select on the
accepts each student for Whom a
I
I
I
I
I
I
I
_I
I
I
I
.e
certain linear combination of
suppose that
tion of
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
~rlll
H is above a certain mini:mum, and
accepts eacll student for whom the
J
T and
YeIjr
same linear combina.-
H is above a certain minimum, but suppose that
higher minimum than j.
J
T and
J
uses a
Even in this case, students uitll a given T-score at
tend to have higher
H-scores, and hence
with the very same T-score a.t
C-scores, than students
hi~ler
Thus it is clear that, j'L1.St as in (b) a.bove,
j.
thc CIT ~del will lead to the false conclusion that gradinG standards are
toucher at
than at
j
J.
This distortion will be reflected in the estimtes
of tile equating parameters.
(d)
Suppose that
in otiler ,,,ords,
and J
j
both select on tile basis of
T is not used for selection at all.
a minirmun H-score which is higher than that used by
it is easy to see that the
C IT
model leads
US
H alone;
Suppose that
j.
J
uses
Then, here again,
to tile ver'./ same kind of dis-
tortion which troubled us in (b) and (c) above.
If the first Part of the first approach ends up by giving us distorted
values to use for the ex.·s and t3.'s in (5.13), then tIlls distortion will
J
J
certainl.Y tend to be carried over into the estimation of the
a.'
s and b.'
s
].
].
in the second part of the first approach, although its effect will probably
But, in general, if ex.
be diluted.
J
cular college
tile
8.
i
and t3. are badly distorted for a partiJ
j, this will tend to exert a strong distortive influence on
's and b ' s
i
of those high schools which send a proportionately 1a.J.·ge
number of students to colleGe
j.
Thus the built-in bias ,·ihich is evidently present in the first approach
would seem to be sufficient reason for recommending against the use of this
approach.
However, it is always possible that, in practice, this bias miGht
be denx>nstrated to be of a smaJ.l enough magnitude tllB.t it i'Tould not be considered serious; but if such be the case, the burden of proof, for safety's
19
saJ-;:e, should probably rest on those who feel that tile bias is too small to
be important.
One way of assessinG the potential importance of this bias would be
to examine the sample distribution of
H given T for different colleges.
In order to do this, though, one would either have to work with equated Hvalues, or else deal with students from a single high school at a time.
linear regression of H on
given
T
(representing the estimated mean value of H
T
as a linear function of T)
these linear regressions
sh~~
The
could be computed for each college; it
significant differences among colleges, then
1il is should constitute adequate grounds for avoiding the use of the first
approaCh.
But if no
SUCll
differences appear, then one mieht consider uti liz-
inc; tIle first approach after all.
We now mention
approach.
il
second but lesB important
drawbacJ~
with the first
When the college equating parameters are estiI18ted umer the CiT
I
I
-,
I
I
I
I
I
I
_I
model in the first step of the first approach, only the information on T
(and
C) is utilized, and not the information on H.
the estimates of the O:j'S and
13j
It irould appear that
'S would be more efficient (i.e., would
have smaller variance) if the infornation on both. T and H were utilized,
as is done in the third approach
distribution of
C given
T
~There
and
H.
If the first step of the first approach
results in estimates of t:ile Ctj's and 13
the~l
the model is based on the conditiona!
j
'S
which are not quite as efficient as
might be, then this loss of efficiency (probably relatively small) would
be expected to carry over into the estimaticn of the
ai's and
bits in the
second step.
One reason for presenting the formulas and teclmiques of the first
approach in Section 5 was that c:>nditions might sometimes eT..ist under Which
the first approach would be valid and could be applied in its entirety.
20
A
I
I
I
I
I
I
I
-,
I
I
I
.I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.I
second reason, however,
vlaS
the fact that the separate parts of Section 5
will find applications in several other contexts.
We shall see that formulas
,from tIle first part of tne first approach will form the basis of the second
part of the second approach.
Also, an important use of the material of
t~e
first part of the first approach will be to provide an approximate solution
(to serve as a starting point) for a system of equations Which will arise in
ccnnection with the third approach and which will have to be solved by iterative procedures.
Note finally that the second part of the first approach (by
itself) is wl1&t would be used in the case of a single collec;e which wants to
utilize the data of its
T and
Olm
students for predicting
C
on the basis of both
H.
7.
The second approach:
~uating
the high school e;rades by means of
tile T variable z followed by eq,uating of the college grades.
of
t~le
second approach is to equate the high school grades using the
as the equating variable.
it
'UaJ3
evident that the
models, inasllDlch as there
C.
The first step
T-scores
In the case of the first step of the first approach,
C IT
model was the most reasonable of the possible
1'10.8
clearly selection on the basis of T but not
In tile present case, 11m·rever, the situation is not at all Clear-cut,
since it is not so evident "Vi'hat role the selection is plaYiIlG.
consider
One might
(n a rodel in wiu.cll the conditional distribution of H given T is
nor;:Jal (to be called the H IT model); (ii) a model Which specifie s tba.t the
joint distribution of
T and
H is bivariate normal (to be called t~ T, H
nodel); or (iii) a zoodel in lThicl1 the conditional distribution of T given
H is normal (to be called tIle
T IH rodel).
If the HIT rodel is er.1Ployed, then we use exactly the same procedure
for estirll8,ting the equatinG parameters as in the first step of the first
2l
approach (see Section 5); i're need onJ.y replace
a.nd alter the subscripts.
However, the
C by
H i·merever
C appears I
H IT model may not be too appealing;
this i'lOUld particularly be the case if' all individuals "'110 took the test are
entered into the calculations, thereby avoiding any apparent selection on the
basis of
T.
If the T, H model is Valid, then that automaticaJ.]s" means that both
the H IT model and the T IH mdel are valid, since a normal joint distributioo.
inplies that all conditional distributions are normal.
Hence the T, H model
ma.::es l-:lOre assum;ptions than either the H IT model or the
TIII model,
miC;lt often be considered a disadvantage.
so~~
Tin
formulas for the T, II
l~odel
and this
Nevertheless, ,·re shall present
later in this section, after we consider the
nodel.
ti:ree, since the high SCllool students wro end up taldng the test (T) may have
grades would feel more optir:dstic about college and would be more likely to
register for the College Board exa.minations) and through the influence of
tea.chers and counsellors, 'fil0 vlOuld be more lilcely to enco1.'Xage students with
joint distribution of
If the
T and H for the entire (unselected) population of
high scllool students would be bivariate normal if they were to take the test,
then it follows that the conditional distribution of
T given
H for the
selected group (i. e., the group that actually takes the test) will be nomal
(with the conditional mean
of T being equal to a linear function of H) I pro-
vided tllat the selection is on the basis of
22
H alone.
I,
I
I:
I:
I:
_I
Such
selection could occur both through self-selection (i.e., students with better
hiVler grades to try to go to college and to register for tIle test.
-.
I
It might be argued t;1B.t the T IH model is the nx>st reasonable of the
been, at least to some extent, selected on the basis of their grades (H).
I
I
I
I
I
I
I
I
I
-,
I
I
I
,I
I
I
The argument in favor of this T IH model is mucil norc convincing if the
data fron
&!
students wIlo tool:: tl1e test, regardless of what they ended up
doine about college, is utilized in theEq,uating.
If only those students uho
go to college, and to a college within the system, are utilized, so that all
students with T-scores (and H-scores) but no C-scores are omitted from the
data., then there will aJ.nx)st certainly be selection on the basis of T as
well as
This would caD. into question the validity of the T IH model,
H.
as well as of the HIT and T, H models.
I
I
I
Ie
I
I
I
I
I
I
I
.I
We now consider the estinBtion of the equating parameters under the
Tin i"lOdel.
h
Under this mdel, the conditional distribution of Tijk given
iji : is of the form
(7.1)
(2~a2)
-~
e
-i
(Tijlc-a t -b thijk)2/ a2
Alternatively, instead of specifying our model by the distribution (7.1),
lie
can sin;?ly W1"ite the IOOdel expectation equation
,
which is in the form customarily used for analysis of variance problems involving linear nx:xiels.
Since the Hijk's and not the h ijk t s are what is observed,
we substitute
into (7.2) and obtain
E(T
)
ijk
=
=
at + b l (ai + bi Hijk )
a i + b i Hijk
'
23
where
a i = a' + b'ai
parameters
a
and b
i
and b
= b'bi.
i
Thus the estinBtion of the equating
is of course nothing but an elementary problem in
i
analysis of variance, for 'uhich tIle well-known soluticn is given by
and
•
(Obviously
SHHi. Imlst be
>0
for all i; any high school 't'r.i.th SHHi.;; 0
1'1OUld have to be thrown out.)
In case the model is made simpler by considering all bi's to be
equal to
1
in (7.3) and (7.4), then (7.4) would be COIlE
Under the lIDdel (7.7), the equating parameters are of course estimated by
1\
_ -
a i - Ti
•
1\,_
- b H.
,
J..
where
A
b' = Z STH! / Z SHHi
i
•
i
•
•
If there are two or more T-variables instead of just one, then the
estir:lB.tes of the equating ~ters under the T IH model become distinct1;y
more complicated than the simple formulas (7.5-7.6).
This case is covered
in the Appendix5 •
We turn now to the T, H model.
Although this IJX)del may nat find f're-
quent use, we include it here for the sake of completeness.
The model might
be appropriate if a situation should arise where there is no selection on
the basis of either T or
test
H; such a situation might occur, e.g., if a
T is given to everJ student in all the schools and then the
24
I
I
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
_I
T and
I
I
I
,e
H data
fro~n
every student is entered into the calcuJ.a.tions.
We will restrict
our consideration of the T, H model
~r
to the case where there is only one
T-variable, and will not attempt
to treat the case of two or l!Dre T-variab1es except far the simplified model
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
in 1'lhich the b ' s are dropped.
i
The T, H model assumes & bivariate normal joint distribution for
Let!J. and (1/ e)2 denote respective~ the mean and variance of
Tijk and hijk •
Tijk , and let p be the correlation coefficient betl'reen Tijk and hijko
uijk (2.3)
arbitrari1y specify that
has mean 0
joint distribution of Tijk and h
ijk
Then the
is given by
(7.10)
•
Upon app~ing the transformation (2.3) to (7.10), we find that the joint distriHijl~
bution of Tijk and
(27t)-1
I~ fl-i
Olb
is given by
i
I
e-J (E7rijk-e!J.l&i+biHijk)c; i]-1 ('E7r ijk -G!J. )
ai+biHijk •
In the Appendix 6 it is shown that the JMDlIDJT:l-lil:elihood estimates of
the eq).1a.ting parameters
&i
and b
i
under the
T,H model (7.11) are given
by
and
1\ 1\
P
2SHHi •
where
0
and
'P
.
e STHi..
r1 + /1 + 4·i • (1-
L
are the values of
1\2
p
~~S2TRi.
\
e and
Ie
I
and variance 1.
We can
25
'J.l
)SHHi.)
2j
,
p which ma.x:imize the expression
I
I
-.
•
We "rill make no attempt to eXj?lore in detail the problem of how to find tIle
max:L~Ul~
attacl~ed
of tins complicated fUncticn
L(e"p) (7.14), but the problem might be
by techniques of numerical analysis" such as the method of steepest
descent.
In case
e and
p are considered to be known" then the known values of
e and p should be used in (7.J2-7.13) and of course the problem of maximizinG
(7.14) i'rould no longer exist.
vented via other avenues.
e=
1
exact ,:BXimizing value
value of
e
~
~rield a value of
e
1'Those difference from the
'Hould be negligible for lare-e
!~..
•
If this
't-rere substituted into (7.14), then it would remain only to maxi-
mize (7.l1~) with respect to the single variable ~
Again, it might be possible
to find a relatively straightforward formula which, far large
N•• , would
yield a. value of p that would be only negligibly different from the one
(~)
which [';ives the e:x:act maxiDDJI:l of (7.14).
vIe consider now the sll'!pler but more restrictive
all the
bi'S in (2.3) are set equal to 1.
distribution of T
ijk
and
_I
Tile maxi.m1zation of (7.14) miGht also be circ1.Un-
For example" the simple and obvious formula
(n,,/sr:rr p " would probabJ..:;
I
I
I
I
I
I
T"H mdel in which
For our purposes, the joint
H
under this model is most conveniently written
ijk
in the form
26
I
I
I
I
I
I
I
_I
I
I
I
,e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
Ie
I
,
where
!:
We show in the Appendix 7 that the ma:d.-
is the 2x2 variance matrix.
mum-1i1:e1ihood estimate of &i under this rode1 (7.15) is
-iti.
1: S.THi I .
+ if
.
1: Smno f
i ...... J. •
'
(
T -T
i...
)
•
Thus the estimation of the equating parameters turns out to be Imlch easier
under the model (7.15) than under the Il¥)del (7.11).
If the IlDdel (7.15) is
generalized to allOY for tl'lO or more T-variables, then the estimation of the
a
i
IS sti~
presents little difficulty, although the inversion of a matrix is
required ~
Tilis completes our discussion of the first step (eq.uating of high
school erades via the T-scores) of the second approach.
vIe now turn to the
second step, which consists of the equating of the college
of tIle T-scores and the eq.ua.ted H-scores.
~des
on the basis
We suppose that, at the start of
the second step, we have for each student an equated higll school grade, to be
denoted
b~l
ll!jk' Which is such tl1at an
to an 11'" Ok from any other high 8coool.
J.J
h!jk for one high school is comparable
Thus the
ht 01
J.J ;:
f
S
,muld be calculated
by a forraula. of the form
if any of tIle methods presented earlier in this section are utilized for equating the high school grades.
h
ijk
Now htjk (7.17) is nat quite the same thing as
(2.3), since the latter is based on the exact but unknOtm values of the
equating parameters.
In what follows, we shall present our development in
27
terms of the hijk ' s in order to keep everything rigorous.
Hm·rever, for practical
purposes the unknown hijk's cOtlld not, of course, be used, and the h!jk'S would
have to be used instead with the assumption that they would be reasonably close
to the 11, '1
J.J \:
IS.
l'le :ma.y assume a model in vthic:i1 the conditional distribution of the
equated college grade (2.2) Gi,ven
i trariJ.;y taken to be equal to 1.
T
and
ijk
l1ij1t is normaJ. vli th variance arb-
Then this conditiona! distribution, when
expressed in terms of the observed quantity
C
(rather than in terms of
ijk
cijl)' is of the form
•
Note, in
fL~ct,
that this model is formally identical 1'1ith the one covered in
Note 3 of the Appendix, and t;'JB.t (7.18) is the same thine as (A3.l) except
that (7.18) has
hi 'k where (A.3.l) uses the notation T~ 'J
J
~\:
out tl1at the problem of estiL-ating the O:j' sand
second approach is
essenti~·
•
Thus it turns
f3 j 's in the second step of t'ile
the same problem as one 't'lhich was previously
encountered and dealt with in Section 5 in connection witIl t'ile first step of
tne first approach.
Because of tlrl.s, we need not consider the problem further,
except to point out again its ,solution: the esti:ma.tes of the
O:j'S and
~j'S
are calculated by formulas (.A.3.7) and (A3.9) respective1¥, but only af'ter the
rather complicated system (.A3.l0 - A,;.ll) has been solved for v and
,,0.
In case there is IOOre than just the one T-variable represented in (7.18) (i.e.,
more tl1BJl just the two T-variables represented in (A.3.l)], then the theory of
Note 3 generalizes in a straightfonTaTd manner.
It mig11t be argued tl1at the estimation of the a j ' s and
~j
second step of the second approach could be omitted altogether.
28
's in the
If it is
I
I
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-,
I
I
I
,e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
desired to predict only the college grades on the
COmmal
measuring scale
(tile cij1/s) rather than tIle grades at individual colleces (the Cijk's),
then for this purpose there lTOuld be no need to estinate
t~le
CXj's and I3 's.
j
A collee;e could compare its applicants with each other on tIle basis of their
predicted
cijk's just as effectively as on the basis of
~leir
predicted
Cijl: 's for that college, since one is just a linear 1'unction of the other.
On
the other hand, though, thc college might like to Imol1 the relationship
betl,;een the
cijk's and its Oi·m grades (Cijk's); in other lTOrds, it might
A
1\
like to find out What its Ol-m CX and I3j are (and it mLgllt also, incidentally,
j
1\
1\
be curious about the CXj'S and I3 j 'S of other colleges).
We might also note
that; even for the purposes of predicting just the cijk's, it would still be
necessa.ry to obtain estimates of v and v O , so that we 11Ould.
apparent~
have
to solve the system (A3.l0-A..?ll) to get ~ and ~o in any event.
8. Evaluation of the second approach.
approach involves estimation of the
The first step of the second
ai's and
for the high school grades, by means of the H
T IH tl:>del.
a.n:r
bi'S, the equating parameters
IT model,
the
T,
H model, or the
In Section 7 we indicated doubt that the assumptions underlyinc;
of these three models irould be satisfied So long as the calculations in
the first step were based onl~l on those students for i'1l1om C-scores (as well
as T-scores and H-scores) irere a.vailable, since
SUCll
a group of students
would probably have been selected on the basis of both T and H.
On
the
other lland, it was suggested that the T IH model migYt well be valid if all
studcnts who took the test i'rere included in the calculations, since it could
then be argued that there 1'1B.S selection on
however, there is still
beloi'T.
SOr.1e
H but not on T; even then,
possibility of selection on T, as we shall see
Finally, we will want to consider the question of hmf the
29
~i' s
1\
and b i •s of the second approa.ch compare in efficienc:r 'd. tIl the correspondinG
estinB.tors under the third approach, but we shall defer this matter to
I
I
-,
Section 10.
Our evaluation of tile second approach is based on
mentioned.
t~le
factors just
Thus it appears tllat the second approach should not be reconunended
at all if the data used in the first step is restricted to tllOse students for
"Thon C-scores are reported.
But if the data from all students who took the
test is used in the first step, and if the calculations are based on the Till
model, then the results of the second approach may be reasonably satisfactor;y.
Even so" it appears tllat tIlis special form of the second approach (i. e., usinc tile T IH model and includi:nc; all students who tool\.: tIle test in the first
step) still rests on less solid assumptions than the third a.pproach, and
therefore SllOuld not be preferred over the third approach unless we are tr.ring
to aV"Oid the latter because of some reason such as excessive calculation costs.
On
the other hand, the variance of the estimators of the
appa.rent~
a.'s and b.·s
1.
1.
can easily favor our special form of the second approach, depending
on conditions (see Section 10).
But, as we shall see later in this section,
it is possible for a certain t:"'Pe of systematic bias to creep in if we use
the special form of the second approach; since this twe of bias cannot arise
witIl the third approach, the third approach therefore appears safer.
l'le
now turn to a more detailed explanation of a couple of the points
which were summarized in tile first paragraph of this section.
We first con-
sider ,·tbat happens if only those students with C-scores arc utilized in the
first step of the second approacll, so that, of those students who took the
test (T), the ones are excluded "r110 either failed to go to college at all or
else ,rent to a college outside the
s~rstem.
tllat tile students who faileu to
to college at all received relatively
[';0
;0
Now it is rather safe to assume
10ir
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-,
I
I
I
,I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,I
T-scores and H-scores.
'!'he students who did go to college, but to a college
outside the system, would also have relatively low T-scores and H-scores if
the colleges outside the system tend to be lower-quality institutions than
Thus it is logical to expect that there will
the colleges inside the system.
be some selection on the basis of both
T and
H with respect to the determin-
ing of i'T!lether or not a student becomes part of the eroup receiving C-scores.
Now note that, if the unselected group has a bivariate normal distribution,
then neither the HIT IOOdel nor the T,H mdel will be valid if there is selection on
H, and neither the
T IH model nor the T, H mdel will be valid if
there is selection on T; thus none of the three models will be valid if there
is selec~ion on both
H and T.
In fact, in the idealized situation where
selection occurs strictly on the basis of 'Whether a certain linear combination
of a student's T-score and II-score exceeds a certain minimum, the average value
of H given T will no lOIlGer even be a lineS' function of T, and the average
value of T
given
H will
IX>
longer be a linear function of H.
It would
apPear that distortion in the estimation of the ai's and bi'S woold be Parti-
cularl~l marked
with respect to the relation between the
~i 's
and
~i 's
of,
say, ti'lO high schools Which differed substantially in their average T-scores
and H-scores, or Which migl1t differ ,,11th respect to the infJ.uence of the
selection for the different values of T and H.
Consider the following
SupPOse tha.t two high schools
in their average
H-scores, but that they have exactly the sa.oe grading
standards (i.e., a
i
=
&x
and
b
i
= b ).
I
exactJo;{ the same basis for both schools.
i
and
I
differ
sUbstanti~·
special exatr!Ple.
Suppose that the selection is on
If we use the T IH model, then the
estir.ation process will essentially try to estimate, for e&Cll high school, the
average value of T given
val-ue of T given
H, as a linear function of H.
But the true average
H will be a curvilinear function of H, since there is
31
selection on both T and It.
Thus the estimation for schools
i
and I
will essentiaJ.J.y produce two linear approx1.nBtions of the same curvilinear
function.
However, these tllO linear approximations will tend to be rather
different from each other, since the
H data from the two schools will be
concentrated on different parts of the H-axis, for which the best linear approxiDations of the curvilinear function wou1d of course be different.
A
A
could easily end up with (ai' b i ) radical.1y different from
A
(a:r,
Thus we
A
b I ), when in
fact they should be aJ.nx)st the same since the two grading standards are
identical.
We consider next w'bat 118.ppens if, in the first step of the second
approach, all students who took the test (T) are entered into the calculations,
rather than just the students for Whom C-scores were reported.
Such a scheme
might or might not produce additional administrative problems, but in any
event it should result in substantially less distortion in the
when the second approach is used (with the T IH model).
~i 's
and
~i 's
I f all students who
toolc t11e test are included in the calculations, then it would appear, superficially, at least, that there was no selection on the basis of T.
Now there
would still seem to be selection on the basis of H, for reasons previously
mentioned in the earlier part of Section 7:
not all high scmol students sign
up to take the test, and those who do are probably partiaJ..ly selected on the
basis of It.
T,
Hr.IOd.el
ion on
If there is selection on
H, then both the HIT model and the
are invalid;but the T IH model is still valid, so long as no select-
T ha.s crept in.
We might feel that it would be reasonable to assume that there i8 no
selection on T, so that the T IH model 'WOuld be fully valid.
Nevertheless,
it would perhaps be best to try to examine this assumption d.osely.
We shall
now try to present an argument l'mch advances the point of view tl18.t there
32
I
I
-,
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-,
I
I
I
.I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.I
might indeed be selection on T, and 1h at such selection could distort the
~ 's
and
~i IS.
Suppose that high school students are able to form some rough
idea of how well they would do on the test (T) if they ~lere to take it.
For
example, consider a bright individual who has loafed all through high school
and received grades (H) Which are poor in relation to his ability.
Such an
individual might easily recognize that he would scot'e qut-te well on the test
in comparison with other persons having the same H-score as his, and, for this
reason, he and others like him might be lJI)re likely to register for the test
than individuals who would expect to have average or below-average T-scores in
relation to their H-scores.
Thus the conditional distribution of T given
H would be different in the selected population (those 'Who s:ign up for the
test) than in the unselected population (all high school students).
If the
latter distribution were normal (with the mean being a linear functim of H),
then the former distribution would not be, except perhaps by sheerest coincidence.
Thus selection on the basis of T would be present, and 'WQlld render
invalid the
(Whether this kind of selection is really occurr-
TIH model.
ing, incidentally, might be checked by an experiment.
Af'ter the test (T) has
been administered to those who register for it, it could be administered again,
necessarily free of charge, to all students in a few selected high schools who
did not take it previously.
The distribution of T given H for the latter
group could then be compared with the distribution of
former group, to see if a differenc+'eauy existed.)
distortion in the
ing situation.
1\
ai's
and "bi'S
T given
H for the
To see how a serious
could possibly occur, consider the follow-
Suppose we have two high schools, i and
I,
such that
i
is
in .a low-income area and I is in a. high-income area. Suppose t11at i and I have the
same grading standards (i.e.) ai=&:r and bi=bI .) NOloT it "rould mt be surprising if,
for any
/ given H-score, more students in I than in
i register far the test, 1oas-
33
much as students who can better a:f'f'ord college
to try to go to college.
would
probab~r
be Jl¥)re likel;y
Furthernx>re, the relatively few students in
i
who
do register for the test might be mainly the ones who WOlld be c.a.pable of getting
such high T-scores (in relation to their H-scores) that the;'l "I'1Ould be
scholarShips, and could thereby afford to go to college.
hand, it "I'lOuld be not merely the potential scholarsl1ip
In
awarded
I, on the other
a"lmrdees who would
register for the test, but also a large number of students of lesser ability
(i. e., with lower anticipated T-scores in relation to iheir H-scores) Who could
afford to Be to college even without a scholarship.
registering for the test,
higher in
i
than in
tl~
Thus, among the individuals
average T-score for any given H-score would be
I, due to the differential selecticn involving both
T
A
A
a
would generally tend to be higher than ~, thereby indicai
ting (falsely) that i has tougher grading standards than I. It cannot be
and H.
Hence
occur in actual practice, or . . r hether it is merely a slim theoretical possibility;
unfortunately, though, a bias 'tlhich has an economic or sociological basis is
probabl;'l one of the most undesirable tY]!es of biases that could be built into
a precliction system of this Idnd, and so the possibility of such a bias would
evidently have to be carefully investigated and ruled out before the bias could
safely be assumed not to exist.
In closing this section, we point out sane potential difficulties in
In the first place,
it should be apparent that, if' systematic biases creep into the
~i'S
and
~i's
in one way or another during the first step of the second approach, then these
biases "ldll be carried over into the
1\
<lj I S
and
mined in the second step of the second approach.
A
f3 j 's W'hen the latter are deterAlthough the effect of the
distortion in the "&i' s and 1\
b 's may be sanewhat diluted by the time it
i
34
-,
I
I
I
I
I
I
el
foretold with any certainty w'llether this type of distortion is likely to
connection with the second step of the second approach.
I
I
I
I
I
I
I
I
I
-,
I
I
I
,e
reaches the
the
a
.1\,
j'S and ~j IS, l.t may still make itself felt rather strongly on
"
aj's and
/to
~j 's
of those colleges Which draw a proportionately large number
of students from high schools whose
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
I_
I
A
ai's
/to
and bi's
"rere badJ.y distorted.
A second diffi culty in connection wi. th the second step of the second
approach becomes evident when we nate that hoooscedasticity (equality of
variance) is assumed in the rDdel (7.18).
(7.17) rather than the
Ac~, it is reaJ.1¥ the
hijk's '\'lhich are utilized in the calculations.
h!jk'S
In
general, lltjk wU1 tend to be closer to h ijk the larger Ni • is and the
smaller IHijk - Hi. I is.
and lrlljl:
Tlms the conditional variance of
presumably 'WOUld be smaller the larger
c ijk given Tijk
N • is and the smaller
i
IH. '1. - Hi I is. This means that the homoscedasticity assumption is not
l.J""
•
strici:-1.y satisfied, since the conditional varian ce is not the same for all
(i,j,k).
But the resulting effect on the estfmates of the aj's and ~j'S
and of v and
V
O
would probably not be too serious, because of the la:rge
number of students upon
whom
these estimates are based.
Hmrever, it seems
desirable to at least call attention to this difficulty, even though it appears
to be only a minor one.
There will apparently aJ.so be other minor difficul.ties
related to the fact that the conditicnal distribution of Cijk given Tijk
and h. 'k (7.18) is not emct1.;y- the same thing as the conditional distribution
l.J
.
of Cij1:: given Tijk and h!jk.
9.
T'ne third approach:
equating the lrlgh school grades and e9!l!ting
the college grades simultaneously.
In this third approach, the high school
equating parameters and the college equating parameters are estinated simultaneals~r
in a single step, rather than in two separate steps as was the case with
each of the first two approaches.
distribution of
c
ijk
l'1e assume a model in Which the conditional.
given T
snd h ijk
ijk
is normal l'r.i.th variance equal
to 1 and with mean equal to a linear function of Tijk and llijk , so that
35
I
I
-•
the distribution can be written in the form
•
The observed data, though, is in tel'ilS of the C
t S and H :' s rather than
ijk
ij1
the c.~J01cfS and h.jk·S.
~
(5.2-5.3)
In (9.1)
and the substitution
(21f)
-~
thus apply the transformatim of variable
'\-Ie
(7.3)
I~ .I e ~ao~oCiol-v.r·jk-ai-biHijk)2
J J JC
~
J
as the conditional distribution of
we are defining
and b!
~
to obtain
Cijk
given Tijk
= btai ' and
(5.2) and (7.3).
a j = aj-l-1, ,ai
are as indicated in
could. just as well be absorbed in
a
i
as
in
and
b i = b'bi,
H •
ijk
'tThere
In (9.2)
aj, ai,
Note that the parameter 1-1
OJ ; this reflects a certain
in-
determinacy which is present.
A model of the form (9.2) we will call the
distribution of
C,T, and H in
C IT,H mdeJ..
If the joint
the unse1ected population (of all high school
students) is trivariate normal, and if selection is IIIl.de in a.ny fashion whatever on the basis of T and
H and of
distribution of C given T and
function of T and
T and
H alone, then the conditional
H will be normal (with the mean a linear
H) for tIle selected population (i.e., individuals for whom
C-scores are available) as "'rell as for the unse1ected population.
reason for utilizing the
TIus is the
CiT, H mdeJ..
The calculations for obtaining the estimates of the equating parameters
under the ciT, H JOOdel (9.2) are sonew'hat involved.
8
l'1e present here all the
basic formulas, and in the Appendix we give their derivation.
Let us recollect the notation
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
_I
I
I
I
,I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,I
•
The estinBtes of the high sc1x>ol equating parameters are given by
(9.4)
and
/\
bi
= (1I SHHi.)
1\"
v, the a j 's,
where
1\
A
-
1\
[ ~ dij aj + ~ ~j(SCHij+ Cijdij ) -v STHi.]
,
and the 1\
~j 's are determined by the means outlined below.
At this point we need to define some matrices.
matrix "'Those general element in the
Let G( [n+l] x n) be a
j-th row and J-th coltunn is
dij(SCHiJ + CiJ~J)
SHHi.
for the first
n
rows, and
E(S
+ C. -e ) _ E
e vJ = i CTiJ
:w iJ
i
(9.7)
for the (n+l)-th row, where
if
j
FJ
and 8
jJ
,J
=1
if j
THi.
= J.
is the Kronecker delta; that is, 8 jJ = 0
Next we introduce a sJlIl1lletric ([n+l]x[n+l])
matri.::\:
fll
(9.8)
F =
fJ2
• • • fln f lv
f 22 ' ., ., ., f 2n f 2v
• • • • • ,eo ,• • • • • • •
f
f
f
f
n2 • • • nn nv
nl
f
fl-Q • • • f vn f vv
Vl
f
+ ~ ~. )
CHiJ
i~J.J
SHHi.
(S
use the subscript v rather tl18.Il (n+l) to refer
'VTe
In (9.6), o'J
to the bottom row.
S
21
37
I
I
-.
whose elements are defim d by the equaticns
( =f
I
I
I
I
I
I
Vj) ,
and
f
J STTi
•
0
vv
.
_ S2 THi . )
SHHi.
1\
1\
A
TIle formulas for the ex.' s and v will be in terms of the 13 j 's: we
J
solve the equation system
=
F
(9.12)
••
for
<1.'
... ,
~,
belmv] , and so F
G
NOH
, v •
~
_1
If:)
F
does not exist.
is not of tne full raru~ [see (9.l4)
F
the mtrix F* ([n+l] x [n + 1]) denotes
will generally be
9
9
of rank
n.
If
a conditional inverse of F, then a
solution of (9.12) is
(9.13)
A
~•
=
~
F*G
~
•
•
•
•
~n
~
The Appendix
9
•
indicates a "Ivay of obtaining
F*.
Incidentally, the solution
of (9.12) for the aj's will not be unique; if the same constant is added to
each member of a set of a.'s satisfying (9.12), then the new set of values
J
will also constitute a solution, inasmuch as
n
L:
J=l
= 0
f
vJ
38
•
_I
I
I
I
I
I
I
I
-.
I
I
I
,e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
To obtain the ~j IS, we have to solve a non-linear equation system.
We
define a matrix U(n x n) whose general element is
Then
~'re
define
A(n x n) = U - G'F*G
We use ~ (n x 1)
•
to denote the vector
the vector (N•
l' •
N 2' ••• , Ii! ) "
.n
whose diagonal elements are
(~l' ~2' ... , ~n)',
!:!
(n x 1) to denote
and DQ.~ (n x n) to denote a diagonal na trix
~l' 13
2, ••• , 13n •
The
A
~j' s
are foum by solving
the system
A ~ = D
_1
13
for
~.
•
The system (9.l7a) can be written alternatively in the form
n
J:l a jJ
where
!
~J
=
(j
N. j /13j
=1,2,
•• , n)
a jJ is the general element of A(9.l6) •
This system (9.17) bas no more than one solution for the 13 's such that
j
10 Possible iterative techniques for solving the
all 13.·s are positive.
J
system (9.17) are indicated in the Appendix; the generalized llewton-Raphson
method and the method of steepest descent are considered
11
•
Having just presented all the details, we now bring everything together
and summarize what has to be done to obtain the estimates of the equating
parameters under the
ciT,
H model (9.2).
After calculating the elements of
F(9.8 - 9.11), we obtain a conditional inverse F*, which
essenti~
involves
9
the inversion of an n x n matrix •
U(9.l5), and A(9.l6).
Then we calculate G(9.6 - 9.7),
Next we solve the system (9.17) to get the estimates
39
f3 j ·s.
of the
These estimates are plugged into the right-hand side of (9.13)
A
1\
stit'Q.ted into (9.4
-
1\
1\
1\
Finally, the a.' s, the f3.' s, and v are subJ
J
J
1\,
1\,
9.5 ) in order to get the a. s and b. s.
in order to obtain the a.' s and v.
J.
J.
Evidently tbe two mat troublesome st"eps in thi s procedure will be to
find F* and to solve the systeu (9.17).
In assessing the ma.r;nitude of the
computing problem which we "Till face in these two steps, we should recall tllat
n (the number of colleges) is anticipated to be about 400 or 500.
If the CiT, H oode1 is altered so that the bi's of (7.3) are eliminated
(i.e., set equal to 1), then the formulas for estimating the equating parametem
are a bit different from the a.bove, although the calculations will be aJJoost as
1:2
1et1{,rtl~
•
If the
ciT, H rodel is altered so that the f3 's are eliminated
j
(set equal to 1), then the procedure for getting the estimates will be virt~
the same as the one given above except that we no longer will have the burden
13
of solving tIle non-linear system (9.17)
one T-variab1e in connection
in. tIl
the
•
If it is desired to use more than
CiT, H model, due to there being se1ect-
ion on the basis of more than one T-variab1e, then arrangements for the additional T-variab1e(s) can be incorporated into the calculation procedure with
~4
a minimum of difficulty
10.
approach.
•
Evaluation of tIle third approach, and comparison ,·71th the second
In this section, "le com;pare the third approach llith what seems to
be its leading competitor, viz., that special form of tile second approach in
Which (see Sections 7-8) the first step utilizes all students who took the
test and is based on the T IH node1.
The third approach apparently rests on
more plausible assumptions than the special form of the second approach, but
at tile same time requires greater computational effort.
The third approach
utilizes data from a smaller Group of students; this factor lnay have both its
40
I
I
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-,
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
dra'ubacks and its advantage s.
of tlle
~. s
approach.
and
~i •s
Finally, we will need to compare the variances
under the third approach and the special form of the second
We now consider t11ese various points in detail.
As we noted in Section 8, the assumptions underlying the special fom
of the second approach may be open to some questicn, because of the possibility
of selection on
T even when all
~dents Who
the calculations of the first step.
Such a
took the test are included in
danger cannot arise with'the third
approach, however: the third approach will not be invalidated by there being
selection on T as well as
H.
This is because the third approach is based
on a distribution (the conditional distribution of
C given
T and H) which
holds both T and H fixed.
Although the third approach tj;ms seems to rest on Sanewhat more reasonable assumptions than the special farm of the second approach, the third
approach is still not completely i.mr:mJne from conditions of selection which
might cause the assumptions of the ciT, H model to be violated.
if tllere is selection based on
cause trouble.
addition to
For example,
C(which seems unlikely), this could obviously
Again, i f some colleges are using a third predictor (in
T and H), and if this third predictor is successful in improv-
ing tIle prediction of
C, then the assumptions underlying the third approach
would no longer hold.
However, if same college(s) should indeed discover on
tl1eir own such a third predictor Which genuinely does improve appreciably the
prediction of
C, then SUC:il a third predictor could and probabl y would be quicldy
included among the predictors employed by the central prediction system.
This
third predictor would probably be in the form of an additional T-variable, and,
as "re have aJ.ready seen in Note 14 of the Appendix, the introduction of an
additional T-variable causes only a relatively small increase in the computational
effort required for the third approach.
All in all, we conclude that the third approach appears to be a some-
41
what safer choice, because of the ereater plausibility of the assumptions
underlying it.
At the same time, 1-re have to recognize that the computational
burden is much heavier for the third approach than for the special form of
the second approach.
For tile third approach, the calculation of the various
elements (9.6 - 9.7, 9.9-9.11) of G and F
evident~·
is no small matter in itself, but
the two biggest computational tasks are to invert the large matrix
F (see Note 9) and to solve the system (9.17) (see Note 11).
11
Each of these
tasl\:s involves an (n x n) matrix (F
and A respectively), and we are antic111
patine; that n, the number of colleges, will be about 400 or 500. However, in
inve::.~ting
F11' . we essentially have the job of inverting a matrix which hal
large positive diagonal elements and smll off-diagonal elenents, a condition
Whicl1 may greatly simplify the inversion problem; and, in solving the nonlinear system (9.17), we may be able to employ a relatively quick and simple
iterative technique by making use of the suggestions in Note li.
In any
I
I
-.
I
I
I
I
I
I
_I
event) tIle final assessment of hou great a disadvantage the heavy computational
burden of the third approach is
~Till
have to be made in terms of the estimated
total cost of the required computations.
For some phases of the computing for
the third approach, it might be mre economical to purchase a small amount of
time on a rather large computer.
However, decisions such as these, as well
as the cost estimates, would have to be left to computer specialist••
Data :from fewer students is utilized in the third approach than in
the first step of the special. form of the second approach.
From high school
i, let there by Ni. students 'Who took the test (T) (and for Whom H-scores are
assur~d
to be available).
Let there be Ni • out of these
Nt
students tor
C-scores are also available; in other words, there are (Hi. - Ni .) out at
whom
the H!
J..
students who either didn't get to college at all or else went to a
colleee outside the system.
TIlen the third approach utilizes data (on C,!,
42
I
I
I
I
I
I
I
-.
I
I
I
,e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
and H) just 1'rom the N•• = ~ N • students who received C-scores, whereas tIle
i
l.
first step of the special form of the second approach estimtes the ai's and
b i 's by using T and H data 1'rom all N:. = ~ Ni. students (although the second
step of the second approach necessarily uses
data only from. the N.. students).
The fact that the third approach utilizes data from a smaller group of
students nay have both its advantages and disadvantages.
it is possible,
particular~
In the first place,
¥Then the central prediction system is first being
started up, that there might be some extra administrative problems involved
in getting the H-scores of the (N:. - N•• ) individuals far: "t';hom no C-scores
are reported.
If this should be the case, then the third approach would have
the advantage of avoiding these extra administrative problems and their associated costs.
On
the other hand, the third approach may be at a disadvantage for
certain other reasons related to the differences mentioned above.
For one
thing, the sm&ller the ratio Ni •/Ni.' the less favorably the variance of
~
~
A
A
a i ( or bi) under the third approach compares wi tIl the variance of a i (or bi )
under the special form of the second approach, as we shall. see in mre detail
later in tlus section. There is a second possible disadvantage for the third
approach whicll relates to the variables which are used rather than to the
Since aJ.l three variables (C, T, and H)
numbers of students which are used.
are used in estimating the a
i
ts
and b
i
t
s u.'lder the third approach, the estimates
of the ai's and bi'S will necessarily have to be based on data from students
who alrea.dy graduated 1'ran lrlgh school the previous year (or earlier), inasmuch as we have to wait until the students have received their C-scores before
~
A
we can calculate the ai's and bi'S.
With the secorn. a.pproach, though, only
the t,·TO variables T and H are used in getting the
~.l. 's and ~.l. 'so We may, if
Ao
1\
we wish, base our calculations of the ai's and bi'S in the second approach on
43
the same class of students (prabably the college f':resl1Ill8.n class) that would be
used irith t"lle third approacl1.
But, on the other lland, i'Te also have the option
I
I
-,
of using a later (younger) class of students, inasmuch as ue do not need to
i'18.it for their C-scores; this i'1Ould result in more up-to-date estimates of the
ai's and bi'S, which might or might not be an important advantage dependirlG
on the extent to Which the ai's and bi'S tend to change in the course of tire.
[The parameters v and v 0 of (7.18), as i'lell as the exj 's and f3 j
's,
would of
course still have to be estima.ted from the earlier data, 110\-TeVer.]
At this point, we pause to mention some rather s1L"!Plc types of checks
whicll can be run to determine whether a prediction system if> operating as it
should in certain respects, or to compare tifO or mare different predictive
techniques.
and
~thods.
These checks i-rill be applicable to all the different approaches
Let
~ijk
denote the predicted college grade for individual.
(i,j,1;:), and let Cijk denote (as always) his actual grade.
Tne
~ijk'S
_I
are
based on the Tijk'S and Hijl~ IS, and on parameters estimted probably fran the
previous year's students; tIle Cij1-;: IS, of course, do not becone available until
A
some tilre a.f'ter the Cijk ' s.
Some checks whicl1 can be made are as follows.
l'le
calculate
(10.1)
for each student.
We then croup the Dijk ' s (10.1) according to tIle high
schools (i), and again accordinG to the colleges (j).
We ta~y the number of
positive and negative D..1 's '\"r.i.thin each high school, and again i'Tithin each
J.J
~
collec;e; if there are high SCHools
or colleges for Wllich t:ile ratio of positive
to negative Dijk's differs radically from fifty-fifty, tIns mny indicate
trouble with respect to the
~.J. 's
and
~i's
1\
and t3 's of these colleges respectively.
j
44
of tllese high schools or the
~.J 's
For identif'ying trouble spots in
I
I
I
I
I
I
I
I
I
I
I
I
I
-,
I
I
I
,e
the high school, it might also hell> to arrange the students of
Ie
I
I
I
I
I
I
I
,e
I
high school
in order according to their T-scores, according to their H-scores, and/or
according to a linear combination of T and
I
I
I
I
I
I
a
H, and then perform a run test
on the signs of the Dijk f S for any or all of these three arrangements.
The
Dij1: f S ~rithin a high school (and, to a lesser extent, w.i.tllin a college) will
of course not be independent since the
1\
J\
and b i (a cammon a j
~jk
f
S
are affected b"J a common
~i
A
and f3 j in the case of a college), but the, checks suggested
above roa:y nevertheless provide scme useful indications.
A different type of checl: can be based
al
the D~jk f S •
For example,
we r.UGllt calculate
(10.2)
(lIN. j )
1: 1:
i k
n:;).J...
.'1_
for ea.ch college, and compare the quantities (10.2) with their eJqlectations.
If there are certain colleges for which (10.2) is inordinately large, this
might indicate trouble with their
native explanation as well:
~j f S
and
~j f S ,
but there could be an alter-
the criterion variable (C) at such colleges miC;ht
be something essentially different from the criterion varia.ble at the vas't
majority of colleges, as miGht happen, e. g., as a result of unconventional or
different curriculums.
A check r.rl.gl1t also be made by calculating a
quantit~r
similar to (10.2) for each high scl"X)ol, and making appropriate comparisons.
If there are certain high schools for which the quantity is inordinately larGe,
this ::ucht indicate trouble with their
~i f S
and
~i
f
S,
or it
~ht
possibly
reflect poorly-controlled or non-uniform grading techniques in these high
schools.
Cllecl:s such as the ones just mentioned might be verJ helpful in reaching a decision between the t:;1ird approach and the special form of the second
approach.
Both approaches could be a.pplied to the
45
Sa.IIe
body of data (taken,
I
I
-.
perilaps, from a relatively s:c.1B.ll number of high SChools and colleges), and
the results of the above cllec::s for tile tim approaches cO'Lud 1)e compared.
~'1e
turn now to a comparison of the variances of tile
~i 's
~.'
s
J.
obtained ll.nder the third approo.cll versus the variances of tJ1C
obt::dned under the special i'orr.l of the second approacll.
and
~i
and of
~i.
J.
(~.J. +~.J.H)
~.J. 's
(~.].
+
~.J.H)
After necessar:' adjustments for
tlle rec;ressi.:m parameters tilat are absorbed in tile a. 's and b.
of t.lle variance of
and
ActnaJ.ly, imat i'le irill
do irill be to make tIle compar:taon on the bas is of the variance of
rather t.1B.n tIle variance of
~i 's
J.
IS,
the ratio
1\
uncler tJlC third approacll to tile variance of (a
i
A
+ b .H) under the special form of the second approach is 15 a}?Ilroximately
J.
+
,
(10.3)
+
(H-H! )2
J..
smu.
vThere (JCT' PCH' and
Pr.rH
1
j
1
denote the correlation coefficients aTllOng C, T, and
H in tile unselected population.
SHHi
•
Sirm.. in (10.3) denotes tile same thing as
e:ccept that it is fiB,"LU'ecl iTitll respect to all N!
J..
than just over tl1e
individuals.
Ni •
individuals, and
Hi.
individuals rather
denotes t~le mean over all
Ni.
The variance for:nuJ.a.s for the third approach ii.dch ifere used in
obtaininG (10.3) were figured on the basis of a simplified nndel in whicl1 tae
ex ' s ano.'
j
I=l ,
1-'.
J
S
were assumed to be known exactly15; for tIlis reason, the ratio
(10• .5) is presumably sJightl~l snaJ.ler than it should be, although the discrepancy
1\
1\
is probabl~'l not great since tile exj ''''~ and 13.'
J s are based on SUCll large numbers
(N .'s) of students.
.J
One might perhaps suspect intuitively tl18.t the third approach should
produce estimators having smaller variance than those produced by the second
approach, inasmuc11 as the farner approach utilizes information from all three
46
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
••
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
varia.bles (C, T, and H) whereas the latter utilizes only the T and H information.
Thus, except for the factors enclosed by the square bra.ckets, one might
suspect tll8.t the ratio (10.5) ought
this uill often not be the ca.se.
to be < 1.
However, it turns out that
The explanation lies in the different
assm;Ij?tions and distributions upon 1'Thich the CiT, H model and the T IH model
are based.
The ratio of the two factors in square brackets in(10. 3) should be
rougj~r
n! IN
J..
i•
in most cases.
If these two factors are omitted, what remains
of (10.3) is
(10.4)
,
whicll is the ratio of the ccnditional variance of
conditional variance of
T c;iven
C gi. ven
T and H to the
H (under a trivariate normal distribution,
which is assumedl. 5), nn1l.tiplied by a factor which adjusts for the different
regression coefficients that are absorbed in a
i
and b
i
under the wo approaches.
It may be instructive to calculate (10.4) for a couple of numerical
examples.
Suppose
(10.4) is 18.0.
PCT = .70, PCH = .50,
Again, suppose
time (10.4) is equal to 0.81.
PCT
= .70,
and PTH = • 60.
PCH
= .65,
Then the value of
and
A.rH
Note hOlo1 a small change in the
= .50.
Tins
p's can produce
a vio1en-t cl'l8.Ilge in (10. 4).
Thus we see that (10.4) can easily favor the special form of the second
approach rather strongly (by being rmlch greater than 1), although it can also
favor the third approach with certain values of the p's.
The factor (Ni' • IN.J.. )
obviously "dll always favor the special form of the second approach.
Hence,
with respect to the criterion (10.3), the third approach nay not show up at aJ.l
favorably.
On
the other hand, if the special form of the second approach
restuts in biased estimates of the ai'S and bi'S whereas the third approach
47
does not, tllen the magnitude of such biases crold
actinG advantage accruing from (10.3) being
> 1.
possibl~'-
dHarf any counter-
AlsO, lTe ldll be able to see
from some formulas in the next section that, insofar as the expectation of
D~jk
(Qi +
[see (10.1)] is concerned,. the contribution which tl1e variance of
~iH)
rneU{es to this expectation will be relatively small even if the third
approacll is used wilen (10.3) is r;mch larger than 1.
In closing this section, ire pose the question of l'111ether the a. I sand
1.
b. 's miC;11t be estimated by sone approacn which wculd utilize both the informal.
tion from the T IH model and the information from the CiT, H nodel, and thereby
produce ~.
1.
I
S
and ~. 's whose
1.
variances would be better tha.n tile correspondinG
vario.nces under either the third approach or the special forrt of the second
apprJacll.
We have made no aJctempt to explore this
question.
It is possible, t:louC;h, that lve might end up b:y running into forr:rl.d-
potentia~'-
complicated
able difficulties of either a t"lleoretical or computational nature.
11. Determination of tne predicted college grades,and of confidence
interv:aJ-s for these grades.
After the estimates of all of the equating para-
meters and regression parameters have been calculated, it still remains to ob1R in predicted college grades, and perhaps confidence intervals therefor, for
the current applicants for collec:;e a.dm:i.ssion.
We will use
A
c ijk to denote the
predicted value of the equated college grade c" k (2.2), and our presentation
l.J
A
here will eA';plicitly be only in terms of the ~ijk 's rather tll8.n the Cijk's.
BOl-leVer, tIle latter can be obtained from the former via the formuJ.a
(11.1)
•
Pres'll!"Jab~/
the N. j I s will be large enough so tllat the "OJ I S and A
13 j I s will be
relativel~r
quite close to tlle true parameters.
48
I
I
...
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
Tile mst obviros fornro.la for determining
\'1rold be
(11.2)
If the third approach was used to estimate the equating para.meters, then the
A
a 's, the 1\
b 's, and "v of (11.2) will be respectively the estimates of the
i
i
a 's,
i
v under the model (9.2) as calculated by the metb:>ds
the b 's, and
i
of Section 9.
If the special form of the second approach was used to estimate
1\
the eq,uating parameters, then the v of (11.2) will be the estimate of v under
A
1\
tile model (7.18), and the a.·s and b.
J.
J.
f
S
of (11.2) will be respectively the
estimates (7.5) and (7.6) multiplied by the estimate of
O
V
under (7.18).
Appropriate llDdifications in (11.2) can be made if there is more tbLn one Tvariable or if the bi'S have been eliminated.
Hov1
,
(ll.3)
where v , ai' and b
i
are tIle parameters appearing in (9.2).
Hence the expecta-
tion of
(U.4)
conditional upon T ijk and
1\
Hij}~ should for practical purPOses
be O[since
~i
A
and b. as well as v in (1l.2) ought to be virtually unbia.sed estimates of the
J.
parameters 't"1hich they estimate].
Tmroximatelyl
and H••,
is a orr
J.Jli:
1
The variance of (1l.4) conditional upon Tijk
6
n:J "') = 1 + +
~lc-c
N~i).
(Hijk-H(i).)2
-SHH(i).
::"1' tile third approach was used; the corresponding formula for the special
form of the second approach is treated in the Appendir 6.
The parentheses
around the i' s in (ll. 5) indicate that the associated quantities refer to the
data
t~at v~s
used to estimate the ai's and bits, not to the data of the stu-
dents for i'mom the ~ijk' s are to be obtained.
Finally, we see that n 95~ confidence interval for Co 0k( the equated
~J
college [,Tacle Which is beiIl[j predicted) is given approximate1;'l by
,
(11.6)
~ijl: is specified by (11.2) and CT(c_~) is obtained from (11.5) [or fran
"There
(A16.5].
Note from (11.5) tlJat, in general, the interval (11.6) will be wider
the sr~er N(i). is.
For students from SC11001s i·1ith very small N(i). 's, (11.5) may becane
rat~ler
larGe, and we might i-mnt to consider Whether it i-/Ould be better in SUCll
cases to base ~ijk on Tijk alone rather than on both Tij1: and Hijk • In fact,
o(c-~) (11.5) is taken to be infinite if N(i). is 1 or O.
We investigate the l;lB.tter of basing ~ijk on Too, alone.
~J~;:
tlla.t tIle conditional distribution of c
ijk
Note first
given Tijk has mean of the form
(11.7)
[or,
e~uiva1ently,
(ll.C)
"There the notation of (11. 8) is explained in Note 15 of the Appendix], and
variance
var(cijklTijk)
= var(cIT)
= ~(1 -P~T)
(l-~T)(l-~)
=------..;;:.-----(l-p~ -P~H-~
P.,m)
+ 2PCT PCH
[In obtaining the second line of (11.9), we assume that tIle c ijk ' s of the
model (11.7-11.8) are based on the same ~j'S as in (9.2), and then we utilize
50
I
I
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
the fact tl1B.t (Al5.5) is ec.ual t.o 1, solve for ~ , anti substitute into the
2 ) to get t.·,lC final expression in (11. 9).
expression a2(1-PCT
c
In this ~ray,
lre
have a formula w'hicl1 may be 2'.101'e suitable for any comparison involving (11. 5)].
He suppose that l'le can obtain estimates of fJ and
of
V'e.i.'(c
IT)(ll. 9).
VI
in (11.7), and
[Such estir.1B.tes under the model (11. 7) i-Tould have to be
calculated from data for whi cJ: 1 tllere uas no selection on tlle basis of
H; in
fact, the data of students fron the high scb:)ols having ver:l small N(i). 's
and tllerefore presumably uninterpretable H-scores might be used, since such
uninterpretable H-scores should not have been utilized far selectioo..
starting
Before
a.m' calculating for estin13.tes under the rodel (11.7), values of the
equatinc paraJl1eters for tl1e college grades could be taken from the results of
the main estimtion procedure and used to transform the C, '1 's in the data to
c
ijk
's.]
J.J ~
A
A
Once the estimates fJ and v' of fJ and v'have been found, we can
write
(ll.lO)
as tJ:le formula. for predictinG Cij1;: on the basis of
If
Tijk alone.
A
A
c ij1;: is given by (11.10), we see tmt tl1e expectation of (cijk-Cijl}
conditional upon T 'k(but not the expectation conditional upcn T, jk and H, '1 )
iJ
J.
J.J i:
Then, if we asswne tl18.t the sample that
should. for practical purposes be O.
was used for estimating fJ ane:.
VI
was large enough so tl1at tlle differences be-
A
ll.
tween Il and Il and between v' o.nd
VI
are of relatively minor ma.cnitude, we find
A
that tJ:le variance of (cijk-Cijli:) conditional upon Tijk is approximately
(11.11)
~( c-c
A)
= var(cIT)
,
\-There the right-hand side of(l1.11) may also be written in the alternate
foTr.1S Given by (ll.9).
Finally, "re can use (11.10) and (11.11) to obtain a
confidence interval like (11.6).
51
If' 'Ire 'uant to choose beti'reen the two prediction forr;nl1as (11.2) and (11.10),
,.,e Lucht compare (11.5) [or (Al6.5)] with (11.11) and decide according to '\'ThiCll
is s::nller.
However, we probably should use the same
~ij1: formula far
all
the students of a given hiG!l SCilOO1, rather than (e. g.) usinc (11.2) :ror those
students whose Hijk's are su:f'f'iciently moderate that (11.11) exceeds (11.5)
while using (11.10) for students 111'Dse Hijk's are extreme enough that (11.5)
exceeds (11.11).
If we used the latter procedure, then we might create a
situation l111ere t'\\U students fron the same high school would l1ave identical
T-scores but the student with the higher H-score would have the lower ~ijk;
it v101.1.1d be hopeless to try to explam such an outcome to a college administrator.
Instead of making the comparison between (11.5) and (11.11) for each
student ind.i.vidu.a.1ly, we could, as one possibility, compute the average of
(11.5)
acro~s all
Ni • students in a given high school i, and then choose
forr.IU1o. (ll.2) or (11.10) for all students in high school i according as
tI1is average value of (11.5) is (respectively) smaller or larger tmn (ll.ll).
A different possibility wou1d be to use an approximaticn to tnis average
ratllcr than cong;>uting its emct value:
the ratio of the numerator of the
third terr! on the right-hand side of (ll.5) to its denominator should, on the
avera.c;c, be roughly [l+(l/W(i)'>] to [N(i). -1], so tmt
(ll.12)
r1'
(c-~)
l+(l/N(i).)
+ N. -1
(1.).
(1.).
1
= 1 +
No
represents an approximation to the average value of (11.5) for high school
i.
UOii note that if we choose (11.2) or (ll.lO) accordmg as (11.12) is
(respectively) smaller or larger than (11.11), then our choice will be deter-
52
I
I
-.
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-.
I
I
I
,e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
mined strictly by N(i).' in SUCll a vray that we use (ll.lO) for all high schools
with H(i). 's below a certa:i.n number and we use (ll.2) for all high schools lr.i.th
larc;(J." Ii( i ). 's.
TIle question might neJ::t be raised as to whether
vie
could get a more
sophisticated ;jk by some l:ind of joint utilization of (ll.2) and (ll.lO),
rather t:llan by using one or tile other alone.
For example, i'le might consider a
prediction formu1a. of the fOrLl
(11.13)
"thicll is a linear combination of (ll.2) and (11.10).
(ll.15) is to be between 0 and 1.
of H(.)
, in which case
1 •
p
'-Ie
might determine
The coefficient p in
p
strictJ..y on the basis
Sllould
increase as H(.)
increases.
.
1 •
If
a ~( c-cA)
can be determined for (11.13), then this 1c-~) s110uld be a quadratic function
in p, l'ihose minimum with respect to
p
would be easily obtainable.
Howevel~,
sane complications seem to arise i·then certain expectations pertaining to c ..,"
1Js.
A
and c ij1: are taken.
that, if
~ijk
In particular, ife are faced in the beginning with the fact
is given by (ll.l;), then
E[(cijk-~ijk)ITijl):,Hijk]
is a linear
function of T
and H
rather than being equal (approx:tmately) to 0, and
ijk
1jk
"
incidentally E[(cijk-Cijk) I Tijl:] is not 0 either unless we use for
E(Hijl: I Tijh) the same forml1la. i'l1lich holds in the unselected population (Whicll
probably i'louJ.d not be reasonable).
If tllis paradox am other difficulties can
be resolved, then the prediction formula (11.1') WOlld appear to be a logical
type of formula; but here in this report we will not attempt to explore its
possibilities any further.
For practical purposes, it vlOUld appear tmt (U.2)
should be adequate for all students except a few coming from high schools with
very sr.18J.1 N(i).
IS,
and for these latter students (ll.lO) can be used.
5'
I
I
HATHEMATlCAL APPENDIX
Note 1
The logarithm of the product over
(Al.l)
L
= constant
+ Z N ~logl~jl
j
.J
i, j, k of the expressions (5.4) is
-t Ei Ej kE (aj+~jei'k-v.r'jk)2
J
~
•
Differentiating L(Al.l) with respect to a~, ~j' and v , and setting the derivatives equal to 0, we obtain respectively
,
(Al.2)
(Al.3 )
and
(Al.4)
1: a.T .1+ 1: ~jE E
j
J.
.1
i k
Upon solving (Al.2) for
e·j1.T. 'l- v 1: I: E ~jk = 0
J.
~ ~J C
i j k
~
a., lre get (5.8) (after putting on the hats). After
J
substituting this solution for a
j
into (Al.), we end up ,nth the quadratic
equation
(Al.5)
in f3 j , which of course has tuo solutions.
N. j are both always
other negative.
> 0, one solution for
Since we can assume that Sec. j and
~j
will always be positive and the
One solution of (Al.5) is given by (5.7), and the second solu-
tion is the same thing except i-rith the first plus in (5.7) replaced by a minus.
The root (5.7) may be either positive or negative (according as SeT.j is positive or negative), but in eitlwr case it can be shown that (5.7) results in a
larger value of L(AJ..l) than does the other root, no matter what the value of
v( v > 0).
For all practical purposes we can assume tlJ8,t (5.9) holds for each
-,
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-,
I
I
I
,e
A
(5.7) will be > O.
j, so that all ~j's
Substitution of the solutials for Qj and ~j
[see (5.8) and (5.7)]
into (Al. 4) gives
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
,
S2
~Z ~+lv1:
SCC.j
j
j
S~.j
SCC.j
4N. j
,_
_
2
(1 + S
r J!)
,,1: 8ft j - 0,
ft.j j
j.
which, after multiplication bJ.T (-2/") and application of the formula. for rj
[see (2.4)], reduces to (5.6).
Which satisfies (5.6).
Now there is one and only one value of .j2
'this is evident if we note that the left-hand side of
(5.6) is a strictly increasing function ot
(since
rj
~,
becoming positive as ~-+ flO
is oS 1, and can be assumed < 1), and becoming (infinitely) negative
After finding tIle solution~ of (5.6) (see Note 2), we take its
as ". -+0.
positive square root to get ~ • It can be shown that, mathematically speaking,
no generality is gained if
~1e
permi t negative values of
".
Consider now the (2n+l)x (2n+1) matrix ot second derivatives of L (Al.1)
with respect to
~,Q2'
~1' ~2'
••• ,an'
... , ~n' ".
We differentiate the 1e'tt-
hand sides of (Al.2), (Al.3), and (n.4) to find the elements of this matrix.
It can readily be seen that this matrix, after mu1tiplication by -1, can be
written a.s the sum of a matrix 'td.th the element N./~ in the (n+j,n+j)
diagonal position and zeroes every\mere else, plus a second ma.trix which is
equal to the product of the (2n+l)x N.. matrix
-
~.
..
o
• • •
o
o
1
• ••
1
• ,• • o
0
.•
• • •
•
•
C
0
C
• • •
1
..
o
o
(Al.6)
•
•
0
•
mlN
••• o '~ml
•
••• 0
C'Jlll • • •
o
o
•
•o
121 • • •
•
0
•••
0
o
o
C
:
0
m2R
m2
o
o
• • •
.
• • •
1
.• ••
o
o
o
o
o
••
1
.. ,. . o
,
• • •
•
C3Jll;o o.
o
·
•
C
mnR
--.r-----,----..----+wr-------.r---+---+'1IIr'----_"iilIll~
-Tlll • '. • -TmJ]l' -T12l • • • -Tm2N
cl
~
-
55
-Tlnl° ••
.l.
IIID
DIDBm
-
by its transpose.
Thus, since
1-Te
can certainly assume tll8.t the last row of
(Al.6) is nat a linear combino.tion of the first
n rows, it follo1'18 tlat our
matri:: of second derivatives is negative definite for a.ll. values of the a. 's,
J
the f3. '5, and
J
v, including in particular those veJ.ues which satisfy (Al.2-Al.4).
'1'his is sufficient to establisJl tllat any solution of the system (Al. 2-Al. 4)
constitutes a relative maximum of the function L (Al.l) (see, e.g., Apostol
[2, pp.l5l-152]).
Furthermore, it can be shown that our solution based on
(5.6-5.8) 1dll constitute an absolute maxirmun (the second to the last paragra.ph
of llote 3 below is partia.ll.y relevant here).
the enstence of this solution, and
"Ie
We have previously established
bave established tlJat, if (5.9) holds
for all j, then it is a unique solution of (Al.2-Al.4) under the restrictiCll
that
v and all f3j'S are > O.
Note 2
i'l1e Ne,'lton-Raphson metl'X)d (see, e. g., [7, p.l92 if.]) is an iterative
method of solving an equation of the form fez) .. 0: it uses the iteration
forn1l..lla.
(A2.1)
znew
= zold
to fjl1d a new (and usually better) approximation to the root of f(z}. 0
at ea.ch step.
Applying this formula (A2.l) to the equation (5.6), with
z .. .j'-, we obtain the iteration formula
4N
(A2.2)
va
.j'-
new" old -
1
·e2\i2)1!]
1: ST'r j[2-rj-rj(1+s
j.
TT.j j old
N
2
1:
.j
(va
old
)2
j
(
4N. j r2\J2· ')"k
TT.j j old
1+ S
I
I
..
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-.
for finding the root of (5.6).
56
I
I
I
,.
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
Ie
I
There are certain sufficient conditions for the Nmrt.on-Rapl1son procedure
(~.l)
f'(z)
and
to converge to the root.
>0
.:s tile
In particular ,convergence is assured if
z ~ the value used in the first iteration
and f"(z) < 0 for all
root (it being assuned that a starting value can
to t'ile left of the root).
HOlT
'j)C
found "'hicll is
the denominator of the last term of (A2.2) is
the first derivative of the left-hand side of (5.6) witl1 respect to
it is clearly positive for all 11':
.;z
>
.;z,
and
the second derivative vTith respect to
0;
> O. Hence convergence of the
is ea.sily shown to be neca.tive for all v 2
procedure (A2.2) is assured,oncei're find a (positive) value of V" imich is
smaller than the root.
d.
Note
'o1e treat the special case where there are just t1'ro T-variables; this
"'ill be sufficient to put across the general idea for larQer numbers of Tvaria.bles.
Let the two T-variables be denoted by Tijl~ and T~jl~.
assume a nodel in which the conditional distribution
and Tijl:
0
f
cijl:
We
given T. 'k
~J
is of the form
,
so tl1B.t [if ire proceed ana.locousJ.:,r to (5.2-5.5)] the conditional distribution
of C•.". given Tijk and T o ' k is of the form
i J ..
~J~'"
(A3.l)
(21t)
-! 1f3.1
e
-i(o:.. +f3.. C ,
J
J
'1,. -Vl,
J ~J.'"
The logarithm of the product over
'k-V~~jk)2
~J
i,j,k
~
•
of the expressions (A3.l) is
L = const + EN ..1oglf3.I--k E E E (a.+f3,ci'k-v.r'j,.-v~iOjk)2
j
•J
J
i j k
J
J
J...
~
••
Differentiating L (A3.2) vTitll respect to the O:j 's, the f3 j 's, ", and
57
O
V
,
I
I
-.
and settinc the derivatives cQual to 0, we obtain the ss'S ten
,
(A3.5)
L: a ..T j+ E ~jE E C...,.T ..'l.-v E E E ~jk- vOE E E Ti"'1,T~j1'
j
J.
j
i k J.J"- J.J '.
i j k J.
i j l~
J •. J. "
(A3.6)
E a.To .+E f3jE E C... T~ ., -v E E E TijkT~ .1 -voE E E T o:: = 0,
J J •. J j
i k J.JJ~ J.J~:
i j k
J.J {
i j k i J l:
=
°,
TO. is defined ana.locousl:l to T.. Let TO j , SCTo ., SToTo ., and
.J
.J..J
.J
STTo.j also be defined in analoGlJ with previous notatioo. [see (2.1, 2.4)].
where
Solvinc
(A3.3) for a. gives us
J
1\
aj
J\ =V
+"
1\
T j
•
-0
_I
1\ _
T . - ~.C .
.J
J.J
•
If "Ire substitute this solution for CX. into (A3.4),
J
'\'le
'tor.ill obtain the qua.d-
ratic cquation
(A3.3)
in f3;1.
It can be
J\
f3 j
4N
~i]
[ 1 + ( 1 + (¢ S ~~"S ° )~
CT.j
CT .J
~
lcc .
=
of (A3. G) 'Hill result in a la.rcer value of
that
i·re
estimate f3
j
by (A3.9).
4
.j
.
L (A3. 2) than tile otl1er root, so
Finally, if we plug (A3.7) and then (A3.9)
into (A5. 5) and (A3. 6), we ,lill end up wi tIl a system of tv10 equations in the
58
I
I
I
I
I
I
I
I
I
I
I
I
I
-.
I
I
I
,I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.I
(A3.l0)
('
v~~STT.j- ~ S2
SCT·j)+V02~S
J
CC.j
j
~
0
_
TT.j
j
SCT.jCTo.j
S
)'
SCC.j
j
and
(A3.ll)
[ 1 +
Once (A3.l0-A3.11) are solved for v and
i\
v
and
1\ O
V
yO ,
1t = 0
4N. j SCC • j
.
O·
o
1\SCT oJ.+V SCT oJ.. ) 2 J
1re
0
can substitute the solutioos
1\
1\
into (A3.9) and (A3.7) to obtain the ~j's and CXj 's.
We will not
attCn[>t to ex;plore tne details of solving (A3.l0-A3.11) for v and
V
O
,
but
this problem can be attackecl tl1rougll metb:lds of numerical analysis, such
as
t~lC
ceneralized Newton-Rapllson mathod or the method of steepest descent.
1\
.
In practical applications, all ~.'s (A3.9) and eJ.1 SCT .'s and SCT o .• 's
J
.J
oJ
will !>resu.rna.b~ be > O.
Furthermore,
helpful to realize that, if all
Y
andv o should be >
o.
SCT. j • s and SC'l0. j • s are > 0, then tllere
can be no tlOre than one solution ,-:>f (A3.l0-A3.11) such tlmt
both> O.
It may be
v
and
yO
are
This is an immediate consequence of (A3.9) and of a proposition
which Ire nO'H present.
Ue stall prove tlmt tllere can be no more than one solution of (A3.3A3.6) iihicIl lies in the set
(A3.J2)
f3 j > 0 for all
j,
-c:J
< O:j <
co
59
for all j,
-co
<v, ,,0<
co
Tllis .'lill follO'" from the continuit:- of
I
II
L (A3.2) ane"'.. its ci.erivatives
in the set (A3.12), and from t:nc fact tInt the matrix of second o.crivatives of
L
is necntive definite througuo'XG (A3.12).
For., let
Z ([2n+2]::1) denote a
Suppose that
2:.1 ([2nK:]xl) and Z2( [2n+2]xl) are tHO points in (A3.l2) both satisfying
(A3. 3-A3. 6); Ire
~ ([2n+::~]Y..l) =
SI1OV,
that this leads to a contradiction.
Z2 - Zl'
Define
Define !!l (Z) ([2n+2]xl) to be tIle vector of first
derivatives of L(Z) (A3.2)[as ghren by tlle left-hand sides of (A3.3-A3.6)], and
define
L:
L (Z)
2
2
(z) ([.:2n+2]x[2n+2]) to be tIle matrix of second derivatives of L(Z).
can be sllo'l-m to be negative definite by using the same
tl1at was used in Note 1.
is a sca.lar.
Now define a function
g( A)
= L(ll
tj1)e
of argument
+ A Q), I,rhere A
Since L and its clerivatives are continuous in 1 tlrroughout
(A3.l2), eft-.) and its derivatives In.ll be continuous for 0 < A < 1.
g'(A)
= QI!!l(Zl
Also g"(A)
o :5
= .l?,'L2 (Zl+A £)£
A :5 1 since
g I(0)
+ A£)' so that
CI(O)
= .£1~l(Zl) = 0
= .£'~1(~)=0.
, so tllat G"(A) < 0 for all A in the interval
L is negative definite.
2
= C1(1) = 0
and 13 1(1)
We find
implies that
But, by the law' of tne mean,
e" (A) must be 0 for some A bet"reen 0 and
1.
Hence the contradiction.
Solving ~le system (A3.l0-A).11) is distinctly more complicated than
solving t~le camparatively simple equation (5. 6) via the iteration procedure
(A2.2).
Furtllermore, the complications increase as tile number of' T-variables
becomes larcer.
For the salce of c,:n;tpleteness, we mention an alternative method
of solvin:~ tIle system (A3.3-A3.6).
solution for a
j
After obtaining (A3. 7), He substitute this
into (A3. 5-A3. 6) and get a linear system in v and vo,
(A3.l3)
-,
I:
I
I
II
I
I
_I
I
I
I
I
I
I
I
_I
60
I
I
I
.e
(A3.l4)
I
I
I
I
I
I
'-Thicll rna,:,' be solved for v and vo.
Ie
T-variable, ue use a model it11.th tllO T-variables,
I
I
I
I
I
I
I
I_
I
TIle resulting formulas for v and v O, which
"Till be linear combinations of tIle 1',' s, may be substituted [along with (A3. 7) ]
J
into (A5.4) to obtain an equation system in the I'j' s, which vTill be linear
except for tIle (N. / I'j) terms.
He 1'11.11 not dwell here on tIle solution of this
system in tIle 13.' s, except to sa;,;' that a similar system vTill arise later in
J
connection ,-lith our third approach (Section 9 of the paper) and 'dil be consi dered in detail.
Note 4
ActuaJ.ly, instead of worl~in::.; ':.i. th the model (5.14) which llas only one
(A4.l)
E(c,J.J'1)
-:
= ai
+ b.H.
'1- + b~Oijk
,. + bT.J.J.'
1. 1. j ••
in order to provide a more general development.
,
We shall use notation for tre
ci.l'S (and T? .,.'s) which is in analogy wit11 (2.1) and (2.4); ren.ember, though,
J -:
J.J."
t'lat in on application of the type being considered in this report, it is
really t:i.le cf.jIC'S (5.13) rather than the cijk's
w.hich will have to be entered
into tIle caJ.culations.
Our object is to obtain tIle least-squares estimates (whic~l are the same
as the rna::i.mum-li};:elihood estinntes in this case, if the distributicn is normal)
of the ai'S, the bi'S, b, and bOo
(A4.2)
First
"re re-write (A4.l) in the form
E(c"I'> =Ai +b i (H·.·1.-ii. ) +b(Tijk-T. ) +bO(T~ ..,.-T?),
J.J "
J.J ' .~.
1...
J.J"- J..
where
61
(A4.3)
= a.
A.;
~
~
.
+ biB + b
i
I
I
T.
~.
-.
+ bOT?
~.
Then the normal equations are easil,j' found to be
H
1. •
·•
0
0
0
(A4.4)
•
·
0·
•
-
0
~
0
0
A
c
1.
•
•
c
STom.
b
S
STHm.
·
STID.•• • • STHm. I.S
i TT ~.
ST°m.. • • STORm. D3 o.
. TT ~.
~
0
0
0
STID..
0
0
·• •0
·• ·
0
A
···
•
0
N
• · • 0
m.
• 0
SHIn•• · •
•
·• ··· • • S.•
· HHr.1.
··
··
0
0
0
A.~. = c i .' so that
Thus
1\
1\-
1\-
a. = c.~. - biH.~. - bT.~. _
~
(A4.5)
in vieiI of
(A4. 3).
1\
1
STomn.
··•
b
n3
b
..
i TT
m.
m
:'1
o.
~.
D3
°.
i TOtr ~.
cm..
•
=
bO
S
cHm.
PcTi.
fJcToi
-
bOT o
i.
1\ 1\1\
(A4. 1,.),
1
'-
m
and then muJ.tiply the resulting inverse b:'l the (m+2)xl sUb-vector
"'hich cO::I]?rises tl1e bottom (m+2) positions of the vector on the right-hand side
of
V.
(A4.lJ.).
This (m+2)x(m+2) matrix i11licl1 has to be inverted ire nay denote by
We then define certain a.dditional matrices by tIle identifications
(A4.6)
_I
To obtain (b , b", ••• ,b ,b, b ° ), we need to invert the
(m+2)x(n+2) sub-matrix appearinc; in the lo~r right-band corner of tile matrix
in
I
I
I
1\
1\
I
I
I
I
I
I
I
I
I
I
v=
and
(A4.7)
_I
62
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
Now
(A4.8a)
,
(A4.8b)
and
,;m = (I - vmvTH)V~
Where
I
,
denotes the (J!l>4Il) identity matrix.
Formulas (A4.8) are standard
formulas 1finch are used in cormection 'tdth the inversion of partitioned matrices;
their c:)rrectness can easily be verified directly, however.
Observe that these
formulas (A4.8) do not require tIle inversion of any matrices of 111gller order
than 2 (except for the matrix VIm' but V is diagonal).
HH
T'llUS
-1
V
is found
rather easily by performing the "tiu ee calculations (A4.8) and tIlen plugging
1
tile resulting matrices into (A4. '0.
We can then obtain all the desired estin8tes.
In case we are willing to simplify our roodel by eliminating the bi's
(i.e., setting them equal to 1) in (5.14), then (5.14) is altered by replacing
b. with b'.
v1ith two T-variables [as in (A4.l)], the DDdel tall:es the form
J.
The para...neters are then estimated b Jr tile formulas
(A4.l0)
"a. = ci
and
'"
b'
(AL~.11)
J.
1\
b
bO
'"
•
A _
- b'H.J..
rSHHL
=
f STIli.
1: SToHL
~STHi
J.
•
~STT'
J.
J..
I:
i S~.J..
-1
r SToHL
STToL
~
SToTo.J..
J.
r
~S
J.
cHi.
fS cTi.
~
1. SC TOi •
•
I
I
\'le consider explicitly just tlle case where there are eJmctly two T-
variables: Tij1i: and Tijl~'
two T-varial)les
~vill
The extension of tIle tlleory to tIle case of IlX)re than
be obvious,
~lO'lrever.
If tlle joint distribution of
T
ijk , T1jk,
and h
ijk
is tri-variate normal
in tile unselected population, tllel1 the conditional joint distribution of T
ijk
and Ti j l:
Given l\jk' for either t';le unselected or the selected population,
~dll be (see, e.c., [1], p.29, Tlleorem 2.5.1) a bivariate normal distribution
of tl1e fora
(A5.l)
after l'Te have substituted for h ijk in accordance with (2.3).
!lote that, in
(A5.l), tlle (ai,bi)'s are not unique, in the sense that, if all ai's and bits
are multiplied by the same constant, this constant can be absorbed in b and
b O , and if all a.'s are increased or decreased by the same constant,this adjust~
ment can be absorbed in Il and 1.1°.
mates of' these equating parameters a
(B, BO)
(A5.2)
1\
a
i
= N••
We 'till stlOw that maximwn-li1:elihood estii
and b
are given by
(~;. -_:-)
Ti • -T••
(B, BO)
i
1\
- b.~
(Wl-H)(~o)
ii.~.
and
(A5.3)
"b. =
~
,
N••
SHHi.
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
64
I
I
I
,I
I
I
I
I
I
1:
H(~)= i
S2
THi-=-+1: N (To -T )2
SHHl. i i. J.. ••
S
1:
i
STHi. STom.
(- ) (-0
-0)
1: -S·---- +i1l
Ti -T..
T. -T••
i
i
HHi.
i··
J..
S~oHi
S 0Hi
THi.
(iii.
-0)
T•• ) (-0
S T . +mJ.o ""i..Ti -T••
HHi.
i·
.
1: ""
i
SHHi.
• +nr. (TO _T~.)2
i J.. i.
and
Ie
I
I
I
I
I
I
I
and 't-there (B, BOP is a chara.cteristic vector of the Ills:trix (Wl-~'l)-lW correspondinc to tIle largest characteristic root of (W -W)-:J,'1.
l
se.tisf"J the equation
,
~lhere A is the maximum root of (Wl-W) -~.
He now shoW hm.,. these estimators were derived.
Tile logarithm of the
product over (i.,j,k) of the ~"q)ressions (A5.l) is
,I
O
In other words, B and B
(fornmla. continued on following po.Ge)
65
I
I
(p.,5.8) (cont.)
~
,
° -1 bbO 1:1:1:(a +b.H.. ) 2 +1: 1: 1:(a +b.H. j,J (b,b ° )1:-1 ijk -J..L)
....~~{b,1)1:
( )
i j J~ i ~ ~J k
i j k i ~ ~ ...
T:jl -IJ. °
HOlT
~
•
:;:
L(A5.8) ,·lith respect to the elements of 1: (2x2) and
if lTe differentiate
tl1el1 set tile resulting derivatives equal to 0, we obtain the equation
,
,-tlic~l
is tile condition for L to be maximized 'dth respect to 1: (see, e. g., [lt
pp.4G-1.~7,
The other first derivatives of L(A5.8) are
Lemma 3.2.2 for details).
Given b:"
(1'.5.10)
2:L
(
~=-b,b
ca..
J.
-1 (b~(
( 0) -l(T.TO -lot)°
boN.a.+b.Hi )+b,b
0) 1:
~.
l
~
J.
1:
•
co.
J.
(1'.5.12)
(
cL J
dil
.=
CL
r.- l
b
i.
J.
(T~.
-I;•• \-1 0
) -r.i
T.. -I.•• p.
J. .,.
JJ'"
J.jt
(N. a +b.H. ) r.- l ( b
J.. i
J. J..
b
W
and
2.J
L
c b =-r.-1 (bbO~ r.r.r.
"'c. L
~'-e"
• "'1J.J -
(a.+b.Hi .} ).--+r.EE
:' (ai+b.H ..,.)
J.
J.
J
ijJC
I:
-J..L
1
>T = -(b, bO)1:- 1 /b 0) (a.H +b.n:: H:; J+(b, bO):Er
(1'.5.ll) ~
J. J.Jl..
0b
Settinc (1'.5.10) equal to 0 and. solving for ai' we obtain
LZr
J
•. ,.Hi J'J" -iJH.J..
j1: J.J."
~ T0
H
0.,,.
~
••,. • 'J- -\-1 n.
,
jJ;: J.Jr.. J.J J..
0)
I
I
I
I
I
I
_I
~.
•
J..
-.
I
I
I
I
I
I
I
-.
66
I
I
I
,I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
(A5.l4)
set (A5.ll) equal to 0, substi1llte from (A5.l4) for ai' and solve for
Next
't're
b .
i
We get
b.].
=
•
° and then plug in(A5.l4),
If lTe set both rows of (A5.12) equal to
(A5.16)
IJ
= T.. ,
IJ
o
satisfies the resulting equation system.
-0
•
T••
(The solution for
unique, due to reasons mentioned previous1y.)
IJ
and 1J0 is not
Now we set both rows of (A5.13)
equaJ. to 0, pre-multiply by Z, and substitute from (A5.l4-A5.l6) to obtain
Where
W is given by (A5.4).
If we substitute (A5.14-A5.16) into (A5.9),
then (A5.9) reduces to
(A5.l8)
.I
we see tll8.t
67
where
\'11
is given by (A5. 5).
I
I
-.
Now (A5 .18) reduces furtl1er to
upon substitution of the relation (A5.l7).
I
I
I
I
I
I
Thus we obtain
(A5.20)
upon post-multiplying (A5.l9) by
1:-1 (b, bO) I, and
(A5.2l)
a.:fter pre-multiplYing (A5.20) by (b,bO)1:- l and then substituting
_I
,
ltnere (A5.~>2) defines
Band BO.
After application of t~le relations
(A5.20-A5.22), equation (A5.l7) will finally reduce to
(A5.23)
NOll tile only way for (B,BO)' to satisfy (A5.23) is for it to be a characteristic
vector of the mtrix (Wl-H)-~'1.
a
cl~acteristic
of (Hl-W) -~'"
In order to maximize L(A5.3), we must choose
vector corresponding to the largest characteristic root
as indicated by (A5. 7).
A
Note finally t:il8.t, upon substitution
of (A5.l6), (A5.21), and (A5.22) into (A5.l'-A5.l5), we obtain (A5.2-A5.3).
We might wisb to consider what happens under a simplified model in
which tlle b.'s in (A5.l) are all eliminated and set equal to 1.
1.
68
It turns out
I
I
I
I
I
I
I
-,
I
I
I
.e
I
I
I
I
I
I
tha.t estimation under this nx>del is not really too Imlch easier than under t'ne
model (A5.l).
(A5.24)
waere
is eiven by the forlTllla
,
- H.1..
= (I: SHHi' )
i'
•
the largest root of this matrix.
Here we are defining
1
=r.
Ie
I
I
I
I
I
I
I"
(A5.;:~6)
I
tl..
1.
(B,BO), is a characteristic vector of the matrix W;J1i4 correspondinc to
and
•e
~i
We will ShO't'T that the estinator for
"There
'I'll
is given by (A5.5) and
~
- )2
I: N. (-T.-T
i 1.. 1.. • •
W2 (2x2) = I: N. (T. -T.• )(T~ -T~.)
i 1.. 1..
1..
Thus
I: N (-T. -T..
Ti -T••
- )(-0
-0
i i . 1.
•
-0
J
-0)2
I: N ( T. -T••
i . 1..
i
B and B satisfy the equation
O
,
-1
wllCl"e 1\' is the maxinum characteristic root of W
3
W4•
The derivation of tlle estimators is sanewhat similar to the derivation
69
E~uations (A5.1), (A5.8-A5.10), (A5.12-A5.14), and.
for tile previous model.
(A5.16) go through just as before; except with all bits replaced by 1.
the solution for a
i
~nls
[corresponding to (A5 .14)] becomes
l
(A5.29)
a. =
~
(b, b O)I:- (' ~~. -~:.)
T. -T••
1
~.
(b, bO)I:- (~o)
after substitution of (A5.16).
- Hi
.
After this point, the solutioo of the maxitlu.m-
lil:elL1ood equations takes a sOlilew11B.t different course.
Instead of' (A5.17)
we obtain
(b,bO)I:-J,'12I: -l(b,bO)t
(A5.50)
-
+
[ [(b,bO)I:- l (b,bO)t]2
1
(b,b·)I:-l(~1
1'1
E- l
2
+
I b 0)
\b
+(rf ST"Hij
STili. \
I
I
-.
I
I
I
I
I
I
_I
= (0 )
0
'
and instead of (A5.18) we llave
If t~le relation (A5.30) is applied to (A5.31), then (A5.31) reduces to
I
I
I
I
I
I
I
-.
(A5.52)
70
I
I
I
.I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
If (A5. 30) is pre-multipliecl b:' (B, BI», u'l"lere
b:!
(A5.;~2), 'VTe
,.rill arrive at tjle equation
Ho,.. if (A5.32) is post-muJ:'dpliecl by (B,BI»"
then ue "rill end up 1'rith tile
relat::on
(A5.34)
a:f.'ter substituting (A5.33) and dividing by N••
Obse?Ve that fran (A5. 26),
(A5. 33) -' and (A5. 34) it follm1S that
(A5.35)
T;lUS
(A5. 30) becOIIes
(A5.36)
[ W4 -
(B,BO) Hl~ (B,BO)'
(B,BO) H (B,BO)'
'toT
J(
3
B)
BO =
(0)
0
3
upon r.ro.ltipl:ying by (b, bO) 1:-1 (b, bl», and then appl~/ine (A5.33 -A5.35).
From (A5.36) it follows tlm.t (B,BO)' must be a characteristic vector of
U3J,'Tl~ ; in order to maximize tIle lil';.e1ihood, we choose a c~18Xacteristic vector
correspondiIlG to tIle largest root
A',
as indicated by (A5.28).
Tins deter-
mines (B,BO) except for a nm1tip1icative constant, but it is obvi:JUsly not
necessary to find this nm1tiplicative constant for purposes of using torr.n.tla
(A5.24), 'VThich is obtained from (A5.29) by substituting (A5.33).
.I
B and B° are a.e;ain definecl
71
I
I
Note 6
The logarithm of the product over (i, j,k) of tJ1e expressions (7.11) is
(AG.l)
We differentiate
L (A6.1) with respect to ai' bi' and J.L, and obtain
dL
-=
~i
(AG.3)
dL
~
QUi
i •
= -Nb
i
Y..2
+ 1-12 [(
- a.H. +b r.t n-;i 'k)+ pe(r.t T. '1_Hijk-~i )
i jk
P
1. 1..
J•
jl;: l.J..
•
J
,
and
If
l1e
set (A6.2) equal to 0.. solve for (Ni •ai+biHi.)'
S'U.r.1
over
i, and substi-
tl:t.C tile result into (A6.4) after setting (A6.4) equal to 0, then we find
1\
-
J.L = T••
upon solvine for J.L •
1'1e
After setting (A6.2) equal to
solve for a. and end up with (7.12).
1.
° and substituting (A6.5),
Next we put (A6.:?) equal to 0, substi-
tute (7.12) and (A6.5), and obtain eventually the quadratic equation
(AG.G)
-.
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
.e
in hi.
One solution of (AG.G) must be positift and the other negative.
solution is Given by (7.13)., and the other solution is t~le same thing except
,·riJ';;l t:le first plus in (7.13) replaced by a minus.
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
One
In any case, the solution
(7.13) "Till result in a larGer value of L (A6.l) tllan the ot!1er solution.
Note tha.t (7.13) will be positive so long as STHi.
> 0 (~
being assumed> 0).
The (2m+l)x(2m+l) matrix of second derivatives of L (A6.l) with respect
~
to a..,a,),
••• ,am, b ,b , ••• ,b,
.L ,_
I'1
l 2
is negative definite for all values of these
para.meters and all values of e and p.
This is easily esta.blis11ed by an argu-
ment sir.ti.la.r to the one used in Note 1.
for any fL"ted values of
e end
p
J
Thus we are able to conclude tl1&t,
L (A6.l) takes on an absolute maxinnun't'tIlen
tJlC o.i's, bi'S, and ~ are as Given by (7.12), (7.13), and (A6.5) •
If .re substitute (7.32), (7.13), and (A6.5) into the formula. for L(A6.l),
and
o.lso utilize (A6.6), then (~·r.l.th the constant terms omitted) L becomes the
inv'Olvec1 function of 9 and p 'tilu.ch is given by (7.14).
lil::elillood estimates of
,9
t~lesc
tlle;)~
values are found,
Thus the maximu..n-
a.nd p are the values WhicIl L1BJd..mize (7.14).
After
are plugged into (7.12) and (7.13) in order to cet
the naxirJtUn1-li1l:elihood estinates of the a. I s and b. 's.
J.
110te
J.
7
Actually, we will consider explicitly just the case 't.f:1ere there are
exactly two T-variables.
Fram
our development for tyro T-variables, the exten-
sion of tIle theory to the ca.se of m::>re t11an two T-variables 'tdll be obvious,
md at the same time it "riJ~ be apparent how to prove fornn.u.a. (7.16), which is
for tile case of just one T-variahle.
The analogue of (7.16) far two T-variables
is Given by (A7.13) belo"".
lTith two T-variables, 'tre work 'tdth the tri-variate normal distribution
(A7.l)
73
in place of the distribution (7.15).
I
I
-.
We define certain natrices by the
equations
(A7.2)
Z(3x3) =
[-~(=) 4.m(2xl.~,
r.,ffi(:uc2)
~(lxl~
= [~~=)
E-1 (3):3)
~(2xl.)J
I
I
I
I
I
I
.
Z-- (1x2) l;--(lxl)
TIle 1.o13a.rithm of the product over (i, j, k) of the expre ssions (A7.1) is then
L = const
-i N.. 1OS Izl-;~
+ (ai +Hijk )2
n:E
ijk
r(Tijk-J.1,Tijk-J.1°)~(Tijk-J.1)
L
TO
°
ijk-J.1
rm+2(Tijk-~Tijk-J.1°) z TH(ai+Hijk )
]
•
L (A7.3) with respect to the elements of I; and then.
settinc the resulting derivatives equal to 0, we obtain the equation (see e. g.,
After differentiating
[1], pp.46-47, Lemma. 3.2.2 for details)
--I:r.t(T
ijk
(A7.4)
H•• I;
=
ijk
(
n:E(Tijk-J.1
ijk
n:E(Tijk -J.1) (ao+Hi ok)
ijk
J.
J
n:E(Tijk-J.1°)(ai+Hijk) r.r.t(a i +Hijk )2
ijk
ijk
0
CJ L
di?
T.. -N.. J.1
mi' ai+H••
i
•
and
.2&.. = _(~t, J!IH)
Cai
)
L (A7.3) are given by
~)I: (r,TT, i1'H) (~:.• -N•• J.1 °
;
(A7.6)
2
0
I:r.tJ.
jk(Tijk -J.1) (Tijk-J.1°)
The other first derivatives of
I
_I
_J.1)2
Ti • -Ni • J.1
TO -N J.10
i.. i.
Ni • &i+Hi •
r.r.t(Ti j k-J.1°) (ai+Hijk)
ijk
•
I
I
I
I
I
I
I
-.
I
I
I
,e
I
I
I
I
I
I
Nmr 1f we set (A7.5) equal to (0,0)', obtain a third equation
I:,
then ,te "1ill arrive at the saJ.utions
"Il =-T•• ,
fer Il and Ilo.
-0
T••
By setting (A7.6) equal
to 0 and solving for ai' we get
(A7.8)
after substituting (A7.7).
l-Jext we plug (A7.7) and (A7.8) into (A7.4) and
obtain the reJ.ations
Ie
I
I
I
I
I
I
I
(A7.l0)
I
?J../?Ai = 0
by using (A7.6), end pre-mul.tiply the resulting set of tl1r'ee equations by
and
.e
-1:
i
~c
N..
(rr
r..m =
STHi•. ) STom..
1m
W2
l;--
~
,
end W are as defined by (A5.5) and (A5.27) respectively.
2
(A7.2) and i'rom the fact the 1:E-J. = I, it follows that
1'l1
'1-r
~+~
= 0 (null ms.trix)
From
,
so that
(A7.ll)
•
Non 'ue substitute (A7.ll) and then (A7.9) into the left-hand side of (A7.l0),
and find tl1at
(A7.12)
1
- 1: HH
~ = (w1 -w2 ) -J. ( I:f SSmi. )
1
75
Tom.
upon solving for
~.
Finally, we plug (A7.12) into (A7.8) and end up with
as the max:Lmum-like11hood estimator for the equating parameter
Note
a •
i
8
The logarithm of the product over (i,j,k) of the expressions (9.2) is
(A8.l)
L • const + ~ N.jlogl~j
I
-1 EEE(aj+~jcijk-v.rijk-ai-biHijk)2
ijk
•
and
~~ (N./~j)- ajc'.j-~j~ C~jk+ v ~ CijkTijk+~ aiCij+~ bi~ CijkHijk.
=:
If we set (A8.2) equal. to 0 and solve for
-.
I
I
I
I
I
I
_I
Ta.king derivatives, we find
(A8.6)
I
I
ai' we get (9.4).
Next we set
I
I
I
I
I
I
I
-.
I
I
I
(I
(A8.3) e,qual to 0" JlBke the substituticm (9.4)" and so~ve far b ; this gives
i
us (9.5). Now it we substitute first (9.4) and then (9.5) into (A8.4-A8.5)
I
I
I
I
I
I
after setting the latter equal to 0" we wind up eventually nth the system
(9.12).
For this system (9.12) we obtain a solution of the form. (9.13)" which will
generaJ.ly be a unique solution except for the fact that a constant can be
added to each
1\
j (see Note 9 below for details).
CZ
After settiag (A8.6) equal
to 0" we plug in first (9.4) and then (9.5) to obtain
'r
(A8. 7a) 1: B 1.I: ~ -I:
J - jJ1k ijk i
cij ciJ _ I: (sCHij~
~ d)(SCH1J~.1J-iJ
--d )
Ni •
i
HHi.
]
~
J
Ie
I
I
I
I
I
I
I
,.
note that the system (A8.7) is unaffected if each o.:J is altered by the same
additive constant" inasDDlch as r. gJj = 0. Upon substimting (9.13) into
.
J
(A8. 7b)" we end up with the system (9.17), the solution of ,,4tich is discussed
A
below (see Notes 10 and 11).
Cklce the ~j's are determined, the estinates of the
other parameters are of ccurse obtained via formuJ.as (9.13)" (9.5), and (9.4).
,e
I
77
Note
I
I
9
Up through the point of determining the estimates of the D: ' S, v,
j
the ai's, and the bi'S in terms of the I3j 's, we are dealing with nothing more
than a strictly linear analysis of variance rodel.
Up to this point, the
problem may be thought of as tl1B.t of maximizing L (A8.l) for fixed values of
the I3 j 's.
This is equivalent to the problem of finding the least squares
estimtes of the ai's, the bi's, the D:j 's, and v under the linear model
(A9.l)
•
Consequently, we may apply some of the broad theoretical results for the
general linear IOOdel (as Given, e.g., by Bose [3]) in examining certain facets
of the equation system (9.12).
For this purpose, we consider that the 13.' s
J
on the right-hand side of (9.12) and on the left-hand side of (A9.l) are
fixed.
The model (A9.l) actually bears some resemblance to the IOOdel for the
incomplete block design, or for the two-way layout with unequal numbers in
the cells.
Consequently, t11ere is some similarity in the formulas, the normal
equations, and the theoretical development.
the ra.nk of the matrix F (9.8).
virtue
of (9.14).
designs,
n.
'He
We first examine the matter of
Now the rank of F
C1\nnot exceed n, by
By using methods akin to those used for incomplete blocl~
will develop a. set of conditions for the rank of F
to be exactly
TIllS set of conditions irlll be sufficient but not necessary;
it should be
adequate for practical purposes.
First we make the trivial assumption that v is estir.la.ble [3] under the
mdel (A9.l).
A sufficient condition for this to be true is that there exist
three students, identified by (i,j,~),(i,j,~), and (i,j,~), who are from
,the
s~
high school and ilho go to the same college, and for whom
78
..
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-,
I
I
I
('
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
1
TijY'l
H
ijk
1
1
Tio y
J '2
Hij~
1
Tijl~
Hijl~
Another assumption which
''Ie
I
0
make, of course, is that
•
SHHi.
> 0 for every
high school.
NO~T
it can be shown that the rank of F will be as high as
only if the contrast (CtJ-o:j ) is estimable for all pairs (j,J).
n
if and
We now give
a sufficient set of conditions for all such contrasts to be estimable.
let
m <.s D.) be the number of different high schools (i-valUes) such that, far
o
each of these m i •s, there exis,t,s at least one j -value (college) such that
o
Nij
>1
and such that Hijl,Hij2' ... , HijNij
m0
that bois estimable for each of these
1
are nat all alike.
high schools.)
(This implies
Consider the
strix W'hose general element is H , wi tb the rows of the natrix reij
o
ferring to the m specified high schools and the columns to the totality of
m x n
p
the
n
If this
colleges.
for a connected
m x n
o
matrix constitutes the incidence matrix
incomplete block design [i.e., if, for every pair (j,J),
there exists a connecting chain Hi j,Ni j , Hi j , Ni j , No j , ••• ,
1
1 1
2 1
2 2 13 2
Niu j
, N j
, Hi J' all of whose elements cane from the natrix anti are
i
r-l r-l
r r-l
r
> 0], then (CtJ-<Xj )
consequent~,
F
will be estimable far aJ.l pairs of colleges (j,J), and,
will be of rank
n.
We have not supplied all of the fine
deta.i1s in the argument we bave used in this paragraph, but the argument bears
SOIJE
resemblance to certain standard
devel~nts
in incam;plete block design
theory.
Thus we have a sufficient condition for
condition
~lOU1d
F
to be of rank
not appear to be too difficult to check for.
79
n.
The
Offhand, there
I
I
would appear to be little doo.bt of the condition being satisfied under a
J.e.:rlje central prediction system..
If F
were not of rank n, then not all differences (aJ-a ) 'WOuld be
j
estinE.ble, 'Which would mean there would be no basis for comparing grades from
certain high schools.
The calculations for estimating the parameters wou.lcl
tIlen become a bit more conq>licated.
the case where F
has
However, we ,·1111 nat consider further
ran!.. smsJ.1.er than n, since
such a case should not
arise in practice with a large system, and. the sufficient condition of the
previous. paragraph ought to be satisfied without any trouble.
we ,-r.Ul assume that F
If
(a.t,~,
is of rank n.
.•., Cb' v
) t and
(aI 'ct2' ..., a~,
ent solutions of (9.12), then it follows that
a.-aj
J.
t
From now on,
=v
t
) t
represent two differ-
and that the difference
To show this, we consider tre two equations
is the same for all j.
(9.12), one with (a.t,~,
v
vt
..., an'
v )t and the other with
(c)"et2' ... ,
a~,
v')'.
We subtract the latter frau the former, and obtain the null vector on the
riGht-hand side and F
side.
T"nus
times
(a.t -<Xi.,~~, ..., an ..a~,v-v')' on
(a.t -eti.'~ -<X2' ••• '~n~, v-v')
to the rO't'iS (or colwnns) of F.
lies in the vector space orthogonaJ.
But, since F
space orthogonal to the rOll'S of F
the left-hand
is of rank
n, the vector
must be of rank 1, and in fact wst have
as its basis the vector (1, 1, ••• , 1, 0)' in view of (9.14).
TIlis is suf"fi-
cient to COmPlete the proof, and we conclude tba.t the solution of (9.12) is
uniql.le except for the fact tbat the Qj'S may all be altered by the same additive constant.
NOlf
(9.13) will be a solution of (9.32) if F* is any conditional in-
verse (3] of F.
M:>re particulArly, we will indicate here how to obtain a
specific F*([n+l] x [n+l]) Which can satisfactorily be used.
We consider
the matter simply in terms of finding a solution of the system (9.12).
80
BJ.-
..
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-,
I
I
I
{'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.I
virtue of (9.14), one of the first
n rows of (9.12) is superfluous; accord-
ingly, we arbitrarily eJ.iminate the first row of the system (9.12).
arbitrarUy decide to picl:: the solution for which 0).
= 0;
Next, we
accordingly, we
ms:y Imock out the first colUL1l1 of "mat remains of F, and at the same time
remve CX:t from the vector.
We
~
which "Till have a unique solution.
left with n equaticns in n unknowns,
What remains of F
is the nm matrix in
the lower right-hand corner of F, which we call F
from the fact that
F
(nxn). From (9.14) and
U
is of ranlc n, it foUows that F11 is of rank n.
Hence F;i (nxn) exista. t'1e rrm;y thus take F*[(n+l)x(n+l)] to be ~ matrix
-1
conta.ining F11
in its 10tler right-hand corner and zeroes elsewhere. It may
aJ.so be verified directly that, "lath such an F*, (9.1.3) satisfies (9.12).
Thus the problem of finding this F* is essentiaJ.ly 'the problem of inve.rti~ Fl,l.
F U (nm) is of course a huge matrix to invert.
Note from
(9.8,9.9,9.14), however, that (except for the last row and last column) the
diae;onaJ. elements of F
'trill be large and positive "I'rhile the off-diagonal
U
elenEnts lr.Ul be small and apparently nearly all negative. Because of this,
F11 Sllould be much easier to invert than would othenTise be the case if the
main diagonaJ. elements were not relatively large.
laSt column of F
As far the last row and
, it might possibly help, in inverting the natrix, to
ll
partition off this row and column and then utilize formulas similar to (A4.8).
Note 10
lie prove the uniqueness of any solution of (9.17) such that all ~j's
are
> O. We note first tl:e.t this system (9.17) can
different manner.
be
arrived at in a
In line "Inth the discussion of Notes 8 and 9, let us
observe first that, if (9.4), (9.5), and (9.13) are plueged into (A8.l), this
"Till result in maximizing L (A8.l) for any fixed values of the ~j ·s.
81
After
these substitutions, the e:q>ression within the parentheses in (A8.l) will
beco~
a linear function of the t3 j 's, of the form 1: Zijl#J' say, and L
J
(AS.l) itself will become
(AlO.l)
= const + 1:
j
Ii.J.log lt3j l-t ~'Z'z~
't·nlere the matrix Z (N.. x n) contains the generaJ. elemerrt
(i,j.,l~)-th row and J-th colunn.
ZijkJ in its
To find the values of the t3
j
'S
'Which naximize
(AlO.l), we differentiate (AlO.l) and set the derivatives equaJ. to 0:
(AlO.2)
NOVT
~ = D-~
~
~-
- Z'Z
t3 =
--
(null vector)
0
•
the system (AlO. 2) must necessarily be tIle same as the system (9.17); tl1us
ztz = A
(AlO.;)
•
We next use (AlO.2) to find tIle matrix of second derivatives of L (A10.l):
where
D /
N
t32 (n x n) denotes a diagonal matrix with the elements N. i~
along the nain diagona1..
for all values of the
How this mtrix (Alo.4) is clear1y negative definite
t3j 's. FurtherIl'X)re, L(AlO.J.) and its derivatives are
continuous throughout the set of points for Which
(AlO.5)
t3j > 0 for all
j
•
TI1US, by using the same line of argument that was set forth in the next to
the J.ast paragraph of Note 3, we conclude that there can be no nore than one
soJ.uticm of (AlO.2) which lies in the set (AlO.5).
82
Hence, since (9.17) is
I
I
..
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-.
I
I
I
{'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
•e
I
the same system as (AlO.2), e.ny solution ~ of (9.17) l~licll satisfies (Alo.5)
must
necessar~
be unique in the set (Alo. 5).
Note
11
The details of selecting a technique for solving the system (9.17)
l-lOUld best be left to a specia.list in computers and numerical analysis.
Hen'T"
ever, ire "Till indicate here sorne avenues of approacl1 't'thich may be promisinl3.
First we consider the generaJ.ized Ne'Wton-Raphson method (see, e.g.,
[7, p.203 ff.], [6, p. 135 fi.], or [10, p. 171 ff.]).
t~rine
to solve a system of n
In eeneral, i f we are
siI!D.1ltaneous equations in n unknowns,of the
fom
(j = 1,2, ... , n)
(All.la)
,
i.e.,
(~...w-.lb)
.! (Q)
=
Q
is an (n x 1) vector :runction, then the generalized Newton-RaphSon
where
$
~thod
uses the iteration forrm.lla
,
(All. 2)
where [<!I (IV] denotes an (n x n) mtrix Whose generaJ. ~_ler.2nt in the j f -tIl
l
renT and j-th column is d~(j')/d/3 '.
Sonet1mes [eII (!;,)]-l , where ~ denotes
j
l
the vaJ.ue of ~ in the init~ iteration, my be used 1n place of [ \ (~ld) ]-1
in (All.2) (see, e.g., [10, p.l72J );this spares us from ha.ving to invert an
n x n matrix at each iteration, and hence only a sinele
n x n matrix has
to be inverted•
He turn our attention to the specific system (9.17).
83
Taking
I
I
..
,
(All. 3)
we find
(m..4)
•
Thus, l·re obtain our i teration fornnl1a by substituting (All. 3) and (All. 4) into
(All. 2)
AJ.ternatively, we couJ.d taJ;:e
•
to be
,
(All. 5)
for ,.thicl1
,
(All. 6)
DA~
where
denotes a diagonal mtrix whose diagonal elements are the elements
of the vector
(~).
The other method which vie will consider for solving (9.17) is the method
of steepest descent (see, e. G. , [4], [6, p.132 ff.], or [10, p. 175 ff.]).
Strictly speaking, the metllod of steepest descent is used for finding the point
at 'H'ilich a function of n variables assumes a realtive minimum, rather than
for solving a system of the form (All.l); however, this latter problem can be
handled as a special case (see [6, pp. 132-133]).
which is to be minimized.
01,!/?f3..
J
Let
'lr<1!)
denote the function
Let oIl (,ID denote an nxl vector whose j -th element
Then the iteration formu1a for tre method of steepest descent is
,
(All. 7)
where
i\(~)
_I
Oft118.tld} though, this formulaticn (All.5) would not
appear to offer any advantage ClVer (All. 3 ).
is
I
I
I
I
I
I
is a scalar ,those determination
84
'We
now consider.
Ideally,
I
I
I
I
I
I
I
-,
I
I
I
('
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
I_
I
A(~)
shouJ.d be (see [10, p.176, equation (6.27)] or [4, p.260, equation (5)])
the sI1aJ.lest positive root of the equation
(All. 8)
,,'(A) =
°
,
'Where
(All. 9)
He'uever, for use in (All.7) it is adequate to obtain an approximation to tl1i.s
ideaJ. value of 'A.
In cases 't-there the actual minimwn. vaJ.ue of
'If is 0, we can
get the Ne'\'rton-Raphson first approximation to the (supposed) root of r( A)
=°
rather tha.n to the root of (All.8), so that the fornnla
ShOl:t1.d Cive a. satisfactory vaJ..ue far A (see [10, p.176, formu.l.a. (6.28)] or
[4, p.259, last paragraph]).
However, in cases where the miniImnn value of
t
is not 0, the formula (All.10) may lead to trouble (see [4, p.259, footnote
7]), and it i'1ouJ.d appear to be better (if feasible) to use the Newton-Raphson
first approximation to the solution of (All.8) itself, which is
,
(All. 11)
,mere V'2(~)
We
nOvT
is the
nxn matrix of the second derivatives of
'If(~).
consider how the method of steepest descent can be applied to the
system (9.17).
If we take
,
(All.J2)
then
85
I
I
,
(All. 13)
so that the problem of solving (9.17) is equivalent to tl:e problem of finding
a point at 't'mich
(All. 7).
W(~)
(All.J2) assumes a minimum.
Thus we plug (All.13) into
For A(~) we wou1cl probably use (All.ll), in 'Hinch case we need the
•
There are also other i'laYs in which the mthod of steepest descent can be
applied for solving (9.17).
For instance, let us set
I
where
.!(~)
is as given by (All.3).
Then
value (of 0) if and only if ~ satisfies
t<~)
(9.17).
t<~)
steepest descent to find tIle point 't.mere
(All.15) assumes its minimur.1
Thus "re use the method of
(All.15) is minimal.
Fran
(All. J5) and (All.:3) we get
(All. 16)
•
Hence
(All. 17)
.II (~)
I:
2(A +
DN/~)(A~
ano. i'1e plug (All.17) into (All. 7).
D~~)
-
For
A(~)
,
this +'"i..me 't're sl'X)u1d probably
use (All.lO), for 'toThich we need only (All.16) and (All.17).
Notice that no matrix inversions are required in either of the iteration
procedures Which are based on the Irethod of steepest descent.
This is in con-
trast to our two iteration procedures based on the generalized Newton-Rapnson
met'llod, both of which require matrix inversion; hcrwever, in (All.2) we could
86
..
I
I
I
I
I
I
el
I
I
I
I
I
I
I
_I
I
I
I
f
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
use an approY..ima.tion to the inverse of
inverse.
It is possible that still other techniques for solvine (9.17) should be
considered; our treatment here is not intended to be eXha.ustive.
to otller methods for solvine a
S~lstem
of n
equations in n
For references
unknowns, see,
e.c., [8, p.215 ff.] and [10, Chapter 6].
In order to utilize any iterative procedure for solv:inC (9.17), it is
neCesSa.17 to have an initiaJ. value of ~ (which we call ~) to start off "Tith.
Of course, the "closer
we should be.
tl
~ is
to the true solution of (9.17), the better off
What we propose is to use for
~
the maximum-likelihood esti-
nates of tIre 13 j 's tinder the C IT model whicll was covered in Section 5.
Thus
the elements of ~ could be caJ.culated from (5.7) after (5.6) has been solved
far v.
Such a
~
should not be violently different from the exact solu-
tion of (9.17) 'Which we are aiming toward.
that
'We
Incidentally, it would appear
sl10uld thrO'\'T out any college for which SeT. j is < 0 [this causes
(5.7) to be negative aJ.so], but it would not seem to be too likely that such
a condition 1rould ever crop up in the first place.
An aJ.ternative possibility for choosing
~
l'TOUld be to use the estimates
of the f3.' s which are obtained by going through both steps of the second
J
approach
.
as des cribed in Section 7.
A
~
so chosen migh";; in many case s be
"closer" to the solution of (9.17) tl1a.n the ~ which was described in the
previous paragraph, but the additional computational labor u'hich 'WOUld be
required miglrt or might not be 1'1Orth it.
It is hoped that the mteriaJ. presented here in Note 11 1d1l provide an
adequate foundation for findi.ne a means of solving (9.17) at a reasonable cost•
•e
I
'1 (~) rather than caJ.culate the exact
87
I
I
Note
12
Under this sin;?lified nodel in which
'We
omit bi in the formula for h
(7.3), the distribution of Cijk Given Tijk and Hij1c is of
(21t)
(A12.l)
-ll~j I e-cJ{aj+PjCij1: -Yrijk-ai -bHijk)
ij1
:
~le form
2
The tlleoretical development for this model (Al2.l) will be quite similar to
that for tile mdel (9.2).
The loearithm of the product over (i,j,k) of the
eJC;Pressions (A12.l) is the same as (A8.l), except with b
i
replaced by b.
Let
us use (A8.10),(A8.2°), (AS.4°), (A8.5°), and (A8.6°) to designate the same
eCJ.U8,tions as (A8.l), (A8.2), (AS.!~), (A8.5), and (AB.6) respectively, except
"lith b
i
replaced by b.
to ai' v, a j , and ~j
and (AS. 6°).
Then the partial derivatives of L(AS.1 0 ) '\'lith respect
are Given res~ctivelY by (A8.2°), (A8.4°), (A8.5°),
dL
Upon setting (A8.2°) equal to 0 and solving for ai' we find
&.
~
Next
'He
(AJ2.3).
A
= (l/Ni • )(Ej Ni.a~
J J
1\
"-
11._
+ E ~,Ci' ) - Yr. -bH.
j
J
J
~.~.
set (A8. 5 0), (A8. 4 0), and (.AJ2.2) equal to 0 and nake the substitution
Then, after the terms in the ~j 's are isolated on the right-hand side,
this Get of equations becomes
(AJ2.1~ )
F0
~
a2
•
••
an
v
=G
~l
0
,
~2
•
••
I
I
I
I
I
I
el
Also we find from (A8.l 0 ) that
""="" =!: a.H j+!: ~J'lZ Ci ·1~Hi 'k-vm::rijkHi'1 -E a.H. -bl:EI: H2ij1 •
00
j J. j
11:
J. J
ijk
J C i ~~. ijk
C
A.
..
I
I
I
I
I
I
I
-.
~n
b
88
I
I
I
f'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
where Fo ([n+2] x [n+2]) and G0 ([n+2]m)
general element of G
o
for the first
are defined as follows.
The
in the j-th row and J-th column is
n rows ,
for the (n+l)-th row, and
(Al2.7)
for the (n+2)-th row [e iJ and diJ being defined by (9.3)]. F is symmetric
o
and ha.s the form
f
f
foll
oln f olv f Olb
o12 •• • •
f
f
f
f
f o2b
•
022 •
02n
02l
02v
,
(Al2.8)
F =
• • • • • • • • • • • • •
0
f
f
f
f
•
•
fon!
on2 •
onn
onv onb
f
f
f
f
f
ovv
ovb
ovl
om
ov2 • • •
f
f
f
f
f
obv obb
ob2 • • •
obn
obl
·
where
,
(A12.9)
(A12.l0)
f
-4
0""
v
= -T j+ ~ N••
•
i J.J
T.1..
(=fovj)
,
(AJ2.ll)
,
(AJ.2.12)
,
I
I
(A12.13)
f
:10.
oVu
= 1: STHi.
i
..
,
•
and
(A12.l4)
•
If ,'re make the trivial assumption that v and b
are estimable UIXier the linear
model with fixed ~j's which is analogous to (A9.l), tl'en it follows immediately
(by appealing to incomplete block design theory) that F 0 (A12.8) is of rank
n , so long as the
of a connected
"connected").
mm matrix of the Nij'S constitutes the incidence matrix
incomplete block design (see Note 9 for the definition of
By using virtually the same technique which was indicated in
tile latter part of Note 9 for determining F*, we can obtain a matrix F* such
o
that
el
A '
~
CX
(AJ2.l5)
=F*G
o 0
.•2
~
V
1\
b
is a solution of (A12.4) for the
CX
j
's, v, and b in terms of the ~j ·s.
Note 9 for certain remarks "ihich apply also in the present development.)
tbis point it remains only to solve for the ~j 's:
(See
At
we set (A8.6°) equal to 0,
plug in first (A12. 3) and then (A12.l5), and. wind up finally with the system
(A12.l6)
where
(A12.l7)
I
I
I
I
I
I
,
I
I
I
I
I
I
I
-.
I
I
I
{'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
and U in (Al2.l7) is an nxn matrix Whose general element is
o
(AJ2.l8)
•
Since the system (AJ2.l6) is of the sane form as (9.17), we may refer at
tIns point to Notes 10 and II for pertinent information concerning the solution of (AJ2.l6).
TIms we must first obtain 1* and A, then solve (AJ2.J.6) to get the
o
0
A
13j 's, and fin&1ly obtain the estimates of the remaining parameters via (AJ2.l5)
As we can see, it turns out that the ma.ximum-likelihood estima-
and (AJ2.3).
tion procedure is formally a.l.nx>st the same for the model (A12.1) as far the
mdel (9.2).
Although many of the formu1.as are somewhat sin1;>ler under the
model (AJ2.1) [compare (AJ2.5), (Al2.9), and (Al2.l8) with their counterparts,
e. g.], the computational labor required for the two most difficult parts of
the calculations [i.e., obtaining F~ or F*, and solving (AJ2.l6) or (9.17)]
does not appear to be reduced at all by simplifying the model so as to eliminate tlle b! ·s.
J.
Note 13
If the mdel (9.2) is simplified by eliminating the 13-, 's, tllen the
J
estimation procedure is exactly the same as that indicated in Section 9,
.
A
except that we stop just before (9.15), and we replace with 1'8 all ~j 's or
f3j's i'1hich appear in fornmlas (9. 4), (9.5), (9.12), and (9.13).
Note that
we no longer have to solve the non-linear system (9.17), so that the canputational burden is reduced considerably by using a s:i.tIplified model from.
which the
~j
•s have been eliminated.
Tile distribution of C
given 'rijk and Hijk under this simplified model
ij1c
91
I
I
..
is, of course, of the form
•
Thus this JIDdel (m.l) is actually a linear rodel; in fact, it is essentially
the same thing as (A9.l).
If the nxxiel (Al2.l) is simplified by eliminating the ~j t s, then the
developnent proceeds exactly as in Note
"
(A12.l6), and we replace with 1 t s all ~ j
(A12.3),
]2,
t
S
except that we stop just before
or ~ j
t
S
wbi cll appear in formuJ.as
(A12.4), and (Al2.l5).
Note
14
If the m:>del (9.2) is aJ.tered to include mare than one T-variable, then
there 'Hill be no substantial cllange either in the theoreticaJ. development or
in the amount of computational labor required to obtain the est1mB.tes.
We
treat the case of exactly ttvo T-variables, since this will be sufficient to
indicate 'tillat happens with a general number of T-variables.
The ccnditiona.l
distribution of Cijk given Tij1:, T~jk' and Hijk is then
(A14.l)
(2n)
-i I~ I e -M a:J•+~J_. Ci J'k- vr.1 jk- v °T1~jk-a1. -b 1•H.1J'kF~
j
•
The manipulations that are involved in getting tIle formuJ..as for the
estimates under the m:>del (A14.l) are practically the same as those outlined
in Note 8 above.
What we will do here will be simply to indica.te how the
various computing formulas of Section 9 are altered when there are two Tvariables instead of one:
(i) In the fornmla
for
(ii) In the fornmla for
A
8.
A
A
.
i (9.4), we include an a.dditional term (- v~i'>.
A
b i (9.5), we include an additional term (-voSTom)
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-,
I
I
I
{'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
Ie
I
inside the square brackets.
(iii) In the matrix G,
11e
add an extra row, so that G
1d.ll
be (n+2)x n.
The elenents of this (n+2) -th row are specified by the formula
,
(A14.2)
where . ei,j = Ti j - NijTi.
The first (n+l) rows of G are the same as before
(9.6-9.7).
(iv) In the mtrix F (9.8), we add another row and another column, so
tl1at F will be (n-t2) x (n-fJ2) and still symmetric.
of F
The additional elements
are specified by
(A14.3)
f. ° =
Jv
_To .. + t N .
• oJ
i
iJ
Tio'
+ t
i
•
dijSwt°Hi.
~s---.=;;
HHi.
(=fo)
v v
,
and
(Al4.5)
•
The original elements of F
(v)
are the same as before (9. 9-9.11).
At the bottom of the vectors appearing on t~1e left-hand sides of
(9.12) and (9.1.3), we include a
V
O
and a
(9.:G) is now (n+2) x (n-fJ2), of course.
~o respectively.
The matrix F* in
No change is mde in formulas
If' it is desired to use nxxre than one T-variable in any of the models
covered by Note 12 and Note :G above, then the various relevant formulas can
be altered in an obvious nmmer to accommodate the additional T-variable(s).
93
As 'Has the case with the mdel (9.2), the relative increase in computational
labor 'l.il1ich results from the additional T-variab1e(s) will not be substantial.
Hote 15
He suppose that C ijk (c0.u ated college gra.cle) , Tijl~' ana 1\jk (e'luated Iligll
sC~lool ero.de) follow a trivo.riate normal
distribution in the unse1ected population.
Let (J..1c'~' '1.1) ,
denote the mean
vector and
a2c
(A15.l)
PCTO'cO'T
PClfcO'h
PCTO'cO'T
~
AmO'TO"h
PCHO"CO"h
Pr.rnO"TO'h
~
T
the variance matrix of this distribution.
(Note that P
and PCT' P and
CH
By appealing (e. g. )
CT
., and a..... and a.... can be used interchangeably.)
and pCll
. '.r.tt
. '.rn
to [1, p.29, Theorem 2.5.1], utilizing (A15.1), and fina.lly making the substitution (7.3), we find that the conditional distribution of Tijk given H
ijk
has mean
a:
E(Tijk IHijk )
= ~ -I- A.rH ~
(ai + biHijk-I-b)
and variance
,
and tIle conditional distribution of
(Al5.4)
c ijk given Tijk and Hij1;: has mean
I
I
..
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-.
I
I
I
{'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
and variance
~(l-p~-P~H-~+2PCT PCHAm)
l-~
[IncidentaJ.1y, (Al5.5) is also equal to 1 the way the m::>de1 (9.2) is set up,
but this fact need not concern us here.]
Hcm the second approach is based on the mdel (7.4), "lhich we repeat
here for convenience,
but "re append an asterisk to a
i
and bi :
•
The tl1ird approach, strictly speaking, is based on the IOOdel (9.2), but, as
we indicated in Section 10, we "\'liD. figure the variances for the third approach
on tIle basis of a siJr;;>lified model "lhich assumes that we have the exact ratl1er
tll8.l1 just the estimted values of the OJ'S and ~j 's in (9.2).
1'his simplified
model, "I,t.hich is of the form
,
(Al5.7)
is linear, of course, whereas (9.2) is not; tlms the determinaticn of the
variances is facilitated considerably.
Although the variances of the estimates
of tile parameters as figured on the basis of the JJDdel (Al5.7) will presumably
be sliGhtly smaller than the true variances based on (9.2), the difference
apparently shouJ.d not be ereat, because the largeness of the N ,,'s should
1\
.J
1\
result in the OJ'S and t3 j 'S being relatively quite close to the OJ'S and
~j 's
respectively.
HatT "Ire can use results from the theory of the general linear model to
obtain the variances of the estimators under (Al5.6) and (A15.7).
ultirately at the formulas
95
We arrive
I
I
..
for the special form of the second approach, and
for the third approach.
TIle "terms pertaining to v" in (Al5. 9) relate ,to the
discrepancy between v and ~, and can be ignored for practical purposes.
Comparing (AJ.5.6) With (AJ.5.2) and (Al5.7) with (Al5.4), we see that the
at onc1 bI of (Al5.6) are not the same thing as the a i and b i of (Al5. 7),
since (.ai + bi Hijk ) is nnu.tiplied by a factor
(AJ.5.l0)
in (A15. 2) , and by a factor
in (A15. 4) •
In order to ma.ke (AJ.5. 8) cort.q)arable wit h (Al5. 9) , we multiply
"
(~~ + b~ H) in (Al5.8) by tl~ ratio of (AL5.l1) to (AJ.5.l0) :
J.
~
FinaJ.ly, in order to obtain (10.3), we divide (Al5.9) (~iith the "terms
perta.i.ninG to vlt omitted) by (AJ.5.12), after first substituting (Al5.3) into
(Al5.J2) and (Al5.5) into (A15.9).
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I
i'
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
Note 16
Since the conditionaJ. expectation of (11.4) is only neGligibly different from 0, the conditional variance of (11.4) is essentially
(AlG.l)
E[(Cijk-~ijk)2ITijl~,Hijl,;;J= E[ (c ijk ..E(c ijk ITijk,Hijk ) }21 Tijk,Hijl~J
+2E[ (cijk-E(cijl~ ITijk,Hijk)} LE(cijkITijk,Hijk)"~ijk}ITijk,Hijk]
-+E[ LE(c ijk ITijl:::,HijlC)~ijlC}2ITijk,HijkJ
We consider individually the three terms on the riglrt ..hand side or (Al6.l).
The first term is the same thine as (Al5.5), Which is equal to 1 under eitller
of tile formulations
(9.2)
or (7.18).
The second term is
0,
inasmuch as c
ijk
and ~ijl;: will be independent (in the conditional distribution, at least) no
matter 'uhat technique
was used for obtaining the
~i 's
and
~i 's.
In evalua..
tine the third term, lie phlC in (ll.3) and (11.2), and first of all 'We
sir.lplii'lJ l:18.ttersby putting
practical difference).
~
=:
v
(an approximation Which should ma.ke no
Then 'tilmt lre Im.ve is
,
which under the third approach is essentially (A15.9), i.e., essentially
(Hijk-rr(i)'. )2
SHH(i).
(AlG.3)
since var(c
ijk
ITijk,Hijk)
= 1.
side of (AlG.l)] to (A16.3),
,'Te
Adding 1 [the first term on the right..hand
get (11.5).
In considering (Al6.2) for the special form of the second approach, we
A
1\
will treat only the case l.filcre the ai's and hi's of (7.5) and (7.6) are
97
obtained from a set of (T, H) data 'Which is different fi'om that for 'Which the
(If the two sets of data are the same, then the
c. "I 's are to be predicted.
J.J :
fOrrmllas ''1'mch follow may be sOInevihat off, particularly so if Ni. is small.)
We can thus assume that neither
~i
nor
bi
is a function of Tij1t or Hijk •
Hence (.Al6.2) will be appro.~tely the same thing as (.Al5. J2), 'mich can
aJJ30 be ''1l'itten as
(.Al6.4)
~.l:... +
(H-HL )2 ] (,,0)2 var(T IH)
N'i.
•
S'HHi.
The v°in (.Al6.4) is of course tL'1e parameter which appears in (7.18); var (T IH)
is the variance under the model (7.4), and of course does nat depend on H
[see oJ.so (Al5.3) J.
We can assume that
VO
and var(T IH) can both be estinated
with neGligible error.
In (Al6. 4)
'We
write H••
J.J 1t
instead of H, and
'We
put parentheses around
the i' s every1'lhere else to indicate reference to a different set of data.
After then adding 1,
cfi "
(c-c)
=1
+
'We
~
end up loTith
..l:.N'·
(i).
+
-ill
(H •
)2 ]
iJki).
S'
HH(i).
as the approximate conditionaJ.. variance of (1l.4) for the special form of the
second approach.
I
I
..
I
I
I
I
I
I
_I
I
I
I
I
I
I
II
-.
I
I
I
e-.
I
I
I
1J CRNOl'lIEOOMENTS
The author wiSl1e.s to tlw.nk Dr. Riclw.rd S. Levine and Dr. Richard W.
Watldns of EducationaJ. Testinc Serivce, who were responsible for posing the
problem to hiIn.
The author also benefited from helpfUl discussions with
Professor R. Darrell Bock of the Psychometric Laboratory at- the University
of North Carolina.
I
I
I
Ie
I
I
I
I
I
I
.e
I
99
I
I:
REFERENCES
[1]
Anderson, T. W. (1958). An Introduction to M.1J.tivariate Statiltical
Analysis. John Wiley and Sons, New York.
[2]
Apostol, Tom M. (1957). Mathematical Analysis.
ing Co., Reading, }.Bssachusetts.
[3]
Bose, R. C. )timeographed notes on least squares and anaJ.ysis of variance.
Department of Statistics, University of North Carolina.
[4]
Curry, Haskell B. (l9lJ,4)."The M!thod of Steepest Descent far Bon-1Lnear
Minimization Problems". Quarterly of Applied Mathematics, Vol. 2,
pp. 258-261.
[5]
Gulliksen, Harold (1950).
New York.
[6]
Householder, Alston S. (1953). Principles of Numerical Analysis.
McGraw-Hill Book Co.. New York.
[7]
Sca.:rborougl'l,James B. (1955). Numerical Mathematical. Analysis.
Johns Hopkins Press, Balt!lDOre.
[8]
TraUb, J. F. (1964). Iterative Methods for the SoJ.ution of Equatiol1l.
Prentice-Hall, Englewood Cliffs, New Jersey.
[9]
Tucker, Ledyard R. (1960). Formal !blels for a Central Predicticm
System. Research Bulletin RB=60-l4, Educaticnal. Testing Service,
Princeton, New Jersey.
[10]
Addison-Wesley Publish-
Theory of Mental Tests.
Jolm Wiley and Sons,
The
Z8.guskin, V. L. (1961). Handbook of 1I'umerical Methods for the Solution
of Algebraic and Transcendentallquations. PergMal Press, K. .
York.
100
-II
II
II
I
II
II
I,
ell
II
II
I
I
I
I
I
-I
I
© Copyright 2026 Paperzz