Decomposing Differences between Groups: A Cautionary Note on Measuring Discrimination

Somewhat divergent approaches to finding the components of group differences (for example, the difference in average earnings across gender or racial groups) have grown up in the literatures of economics and sociology. Both, however, are variants of the same underlying approach. This article illustrates different approaches and draws attention to the conditions under which two, three, or four components of the income gap between groups can be usefully distinguished. The two central issues are which standard one uses to evaluate endowment differences between groups, and whether the scales used to operationalize underlying concepts have arbitrary zero points or not. This latter difficulty has often been neglected in empirical applications using variables such as education, region, occupation, industry, and marital status. In such cases, the choice of a comparison group is inherently arbitrary, and nothing other than an arbitrary decomposition of the residual group difference can result.
F. L. JONES
JONATHAN KELLEY
Australian National University

SOCIOLOGICAL METHODS & RESEARCH, Vol. 12 No. 3, February 1984, 323-343
© 1984 Sage Publications, Inc.

In the literature of economics and sociology dealing with discrimination, parallel but comparable applications of regression techniques have been developed to partition observed differences between groups (for example, differences in income between men and women, or blacks and whites). What follows is an attempt to bring these approaches into a common framework, to describe some additional interpretations of the components, and to note some difficulties often neglected in the empirical literature.¹ To illustrate the argument, several simple models relating hours worked to gender differences in income are estimated.

The theoretical concept of interest is discrimination. In its simplest formulation, the discrimination model assumes that for
each income earner there is a vector of characteristics related to
productivity and thereby to earnings. Discrimination is said to
exist when the market values the same bundle of productivity-related characteristics differently for one group (say, women) than for another (say, men). One question of interest is how much of the observed gap in earnings between groups is due to compositional differences between them (different endowments or abilities, for example differences in number of hours worked each week by men and women) and how much is due to discrimination (different returns or treatments, for example differences in pay for men and women who work the same number of hours). Discrimination is measured as a residual left after controlling for differences in endowments. This residual will be less than the observed gap when the low earning group has inferior endowments to, and lower rates of return than, the high earning group, and greater if their endowments are in fact superior.² It is important to recognize that because "discrimination" is defined as a residual, its measured size will depend on how well the model is specified and on the level of measurement error. We do not treat these two important issues here, because they apply generally to all analytic operations in the social sciences and do not affect the methodological (as distinct from substantive) issues that are the primary focus of this article. Thus, our examples use a restricted range of explanatory variables and we make no adjustment for measurement error.
Some writers in the discrimination literature go further than the simple bipartite division outlined above, and offer a three-fold decomposition (e.g., Blinder, 1972) or even a four-fold division (Winsborough and Dickenson, 1971; Althauser and Wigler, 1972; Iams and Thornton, 1975). To illustrate differences (and similarities) in approach and interpretation we begin with the four-fold division using Iams and Thornton as a basis, although the approach seems to have been introduced into the sociological literature by Winsborough and Dickenson. The notation throughout largely follows that of Iams and Thornton, where
\(Y\) stands for mean annual income;
\(X_i\) is the mean of the \(i\)th explanatory variable;
\(a\) is the regression constant; and
\(b_i\) is the partial regression coefficient for the \(i\)th explanatory variable.
A FOUR-FOLD DIVISION:
THE "INTERACTION" MODEL
We first estimate a regression relating some vector of productivity-related characteristics of the worker to his or her income.
The general form is \(Y = a + \sum_i b_i X_i\), and we estimate two separate
regressions, one for the high earning group and another for the
low earning group (the superscript H is for high, L for low; for
notational simplicity, we have suppressed explicit mention of the
subscript i):
\[ Y^H = a^H + \sum b^H X^H \qquad [1] \]
\[ Y^L = a^L + \sum b^L X^L \qquad [2] \]
The basic idea is to express the difference in average incomes, which is the gap to be explained, in terms of the difference in predicted incomes from equations 1 and 2:
\[ Y^H - Y^L = \left(a^H + \sum b^H X^H\right) - \left(a^L + \sum b^L X^L\right) \qquad [3] \]
The hypothetical example shown graphically in Figure 1 helps to
make more concrete the general approach. Assume that men, the
high-earning group, have weekly incomes estimated as $40 plus
$3 per hour worked; whereas women, the disadvantaged group,
have incomes of $15 plus $2.50 per hour worked. If men work 40
hours a week on the average and women work only 30 hours, then
men's incomes will average $160 and women's $90. It is the net $70
difference between these means that we seek to decompose via
equation 3:
\[ \$160 - \$90 = (\$40 + \$3 \cdot 40 \text{ hours}) - (\$15 + \$2.50 \cdot 30 \text{ hours}) \qquad [3a] \]
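Equation 3 works because an ordinary least squares line fitted with an intercept passes through the point of means, so inserting a group's mean endowments into its own equation returns that group's mean income (see note 3). The following is a minimal sketch of that identity, assuming a single regressor and a handful of hypothetical workers whose hours and incomes were invented to lie near the Figure 1 lines; the helper name `ols_simple` is ours, not the authors'.

```python
from statistics import mean


def ols_simple(xs, ys):
    """Return (intercept, slope) of a one-regressor OLS fit with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: weekly hours and weekly income for three men and three women.
men_hours, men_income = [35, 40, 45], [146, 159, 175]
women_hours, women_income = [25, 30, 35], [78, 89, 103]

aH, bH = ols_simple(men_hours, men_income)
aL, bL = ols_simple(women_hours, women_income)
xH, xL = mean(men_hours), mean(women_hours)

# Each fitted line, evaluated at its own group's mean hours, returns that
# group's mean income, so the difference in predictions is the observed gap.
predicted_gap = (aH + bH * xH) - (aL + bL * xL)
observed_gap = mean(men_income) - mean(women_income)
print(aH, bH, aL, bL)
print(predicted_gap, observed_gap)
```

The identity holds for any data set fitted by OLS with an intercept; it does not depend on the particular numbers chosen here.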
Equation 3 above can be expanded in various ways to obtain
the Winsborough-Dickenson (and other) decompositions. One
obvious strategy is to express the regression coefficient for the high-earning group, \(b^H\), as the sum of two parts, the regression coefficient for the low earning group plus the difference between coefficients for each group: \(b^H = b^L + (b^H - b^L)\). Similarly, we can express the mean for the explanatory variable for the high group as the sum of the mean for the low group and the difference between groups: \(X^H = X^L + (X^H - X^L)\). Using the above example we can express men's wages ($3 per hour) as the sum of women's wages and the gap between men's and women's wages ($2.50 + $0.50), and men's hours of work (40 hours) as women's hours plus the gap between them (30 hours + 10 hours). These expressions for \(b^H\) and \(X^H\) can then be substituted in 1, and the terms can be rearranged to obtain an alternative way of expressing the high group's regression equation:
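\[
\begin{aligned}
Y^H &= a^H + \sum \left[b^L + (b^H - b^L)\right]\left[X^L + (X^H - X^L)\right] \\
    &= a^H + \sum b^L X^L + \sum b^L (X^H - X^L) + \sum X^L (b^H - b^L) \\
    &\quad + \sum (b^H - b^L)(X^H - X^L) \qquad [4]
\end{aligned}
\]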
By substituting 4 into 3 and rearranging terms, we derive
Winsborough and Dickenson's four-fold decomposition:
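\[
\begin{aligned}
(Y^H - Y^L) &= (a^H - a^L) + \sum X^L (b^H - b^L) + \sum b^L (X^H - X^L) + \sum (b^H - b^L)(X^H - X^L) \qquad [5] \\
\text{GAP} &= \text{membership} + \text{coefficients per se} + \text{endowments per se} + \text{interaction}
\end{aligned}
\]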
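The four terms of equation 5 can be computed directly from the two fitted equations and the group means. The sketch below uses a hypothetical helper (`fourfold` is our name, not the authors'), written for any number of explanatory variables passed as equal-length lists; applied to the Figure 1 example it yields membership $25, coefficients per se $15, endowments per se $25, and interaction $5, which together make up the $70 gap.

```python
def fourfold(aH, bH, xH, aL, bL, xL):
    """Equation 5: split Y^H - Y^L into membership, coefficients per se,
    endowments per se, and interaction. bH, xH, bL, xL are equal-length lists."""
    membership = aH - aL
    coefficients = sum((bh - bl) * xl for bh, bl, xl in zip(bH, bL, xL))
    endowments = sum(bl * (xh - xl) for bl, xh, xl in zip(bL, xH, xL))
    interaction = sum(
        (bh - bl) * (xh - xl) for bh, bl, xh, xl in zip(bH, bL, xH, xL)
    )
    return {
        "membership": membership,
        "coefficients": coefficients,
        "endowments": endowments,
        "interaction": interaction,
    }


# Figure 1: men earn $40 + $3.00 per hour and average 40 hours;
# women earn $15 + $2.50 per hour and average 30 hours.
parts = fourfold(40, [3.00], [40], 15, [2.50], [30])
gap = (40 + 3.00 * 40) - (15 + 2.50 * 30)
print(parts)
print(sum(parts.values()), gap)
```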
The first term, \((a^H - a^L)\), is interpreted as an "unexplained" part of the difference in incomes due to group membership. In the example, this is the difference between the income (from, for example, savings or transfer payments) of men who do not work and women who do not work (namely, $40 - $15). Graphically, it is the difference between the intercepts of the men's and women's equations (Figure 1).⁴ Note that if our theory specified that gender per se did not directly influence earnings, the membership term should approach zero. To the extent it did not, it would imply that the model was misspecified and that relevant productivity-related attributes of workers associated with gender had been omitted from the model. Alternatively, it would imply measurement error in the included variables.
The second term in 5 represents the part of the gap that is due to differences in the coefficients per se, evaluated in terms of the endowments of the low earning group. It estimates how much of the income gap results from the difference between how the low group's endowments are actually valued in the market and how they would be valued if they enjoyed the rates of return of the high income group. In the example in Figure 1, this term is the $15 increase in income women would get if their pay rate were raised by $0.50 to equal the male rate and women continued to work 30 hours per week ($0.50 · 30 hours = $15). The third term, endowments per se, can be interpreted as that part of the gap in income attributable to endowment differences valued at the discriminatory rates of return under which the low earning group labors. It estimates the amount by which their average income is depressed compared to the high earning group because of an endowments deficit. In the example, it is the part of the income difference due to the fact that men work 10 hours more than women; if women worked the extra 10 hours at their usual rate of pay, they would get $25 more ($2.50 · 10 = $25). Because this 10-hour difference is evaluated at the women's rate of pay rather than the men's, it reflects how much more women would earn if they worked as many hours as men but nothing else changed. We will
see below that an alternative decomposition values this 10-hour difference at the male rate of pay, and so reflects how much less men would earn if they worked as few hours as did women.

[Figure 1: Illustrative Example of Decomposing Differences Between Groups. Weekly income (vertical axis) is plotted against hours worked for pay each week (horizontal axis). Men: \(Y^H = a^H + b^H X^H\), $160 = $40 + $3.00 · 40 hours. Women: \(Y^L = a^L + b^L X^L\), $90 = $15 + $2.50 · 30 hours. The gap is marked as due to membership, \((a^H - a^L) = (40 - 15) = 25\); due to coefficients per se, \((b^H - b^L)X^L = (3 - 2.50)30 = 15\); due to endowments per se, \(b^L(X^H - X^L) = 2.50(40 - 30) = 25\); and due to interaction, \((b^H - b^L)(X^H - X^L) = (3 - 2.50)(40 - 30) = 5\).]
The fourth term in equation 5 reflects that part of the income gap due to what might be called an "interaction" between differences in endowments and differences in coefficients. In the example, it is the difference in men's and women's rates of pay ($3.00 - $2.50) times the difference in hours worked (40 hours - 30 hours), giving a total of $5. This is the amount that women would gain if they worked as long as men and if those extra hours attracted a differential currently paid only to men. It is a consequence of that pay differential and would disappear if there were no such difference. But it is equally a consequence of the difference in hours worked, and would disappear if men and women worked the same hours. So it is an interaction term in the sense of depending jointly on both differences, and there is no unambiguous way of allocating it to either rates of return or endowments. We will see that one difference between alternative decompositions lies in the allocation of this interaction term. As it appears separately only in model 5, we will refer to 5 as the "interaction" model.
BLINDER'S "PRIVILEGE" MODEL
We could have taken a different approach to evaluating how a difference in endowments contributes to the income gap. Instead of taking the difference in endowments and valuing it, as in the interaction model, at the (low) rate of return that the low-earning group receives, we could value it at the (high) rates of return of the high-earning group. Instead of asking how much we would raise the average income of the low-earning group by increasing their endowments to equal those of the high earning group (but valuing that increase at the low earning group's rate of return), we could ask how much the high income group's average would be reduced if we decreased their endowments to those of the low income group and then valued that drop at the high income group's rate of return. The endowments term would then be \(\sum b^H (X^H - X^L)\) instead of \(\sum b^L (X^H - X^L)\). In our example, this would value the 10-hour difference in hours worked at the men's rate of pay ($3), giving an endowments difference of $30 rather than the $25 endowment difference computed using women's rate of pay.
Using the men's rate of pay to value the difference in hours
worked is reasonable if the implicit policy (or mental experiment)
one has in mind is to eliminate privilege by reducing men's hours
of work rather than increasing women's (which is why we thought
for the present example, but there are other, equally cogent,
examples for which it is.⁵
Blinder's treatment of the differences due to group membership
and differences in coefficients is the same as the interaction model
of 5. But he has no interaction term, so the full decomposition
consists of three terms (shown in the second panel of Table 1).
What he has done in effect is to add the term for endowments per
se in 5 to the interaction term, obtaining a new, larger endowment
term:
\[ \sum b^H (X^H - X^L) = \sum b^L (X^H - X^L) + \sum (b^H - b^L)(X^H - X^L) \qquad [6] \]
In the example, Blinder's endowment term, $30, is equal to the
endowment per se term with a value of $25, plus the interaction
term of $5.
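The privilege model differs from equation 5 only in valuing the endowments gap at the high group's coefficients. A minimal sketch (the helper `privilege` is a hypothetical name of ours) reproduces the three terms for the Figure 1 example: membership $25, coefficients $15, and an endowments term of $30 that equals the $25 endowments per se plus the $5 interaction, as in equation 6.

```python
def privilege(aH, bH, xH, aL, bL, xL):
    """Three-term "privilege" decomposition: the endowments gap is valued at
    the high group's coefficients, which absorbs the interaction term."""
    membership = aH - aL
    coefficients = sum((bh - bl) * xl for bh, bl, xl in zip(bH, bL, xL))
    endowments = sum(bh * (xh - xl) for bh, xh, xl in zip(bH, xH, xL))
    return membership, coefficients, endowments


# Figure 1 example: men $40 + $3.00/hour at 40 hours; women $15 + $2.50/hour at 30.
m, c, e = privilege(40, [3.00], [40], 15, [2.50], [30])
print(m, c, e, m + c + e)
```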
THE "DEPRIVATION" MODEL
Another model (e.g., Oaxaca, 1972; Blinder, 1976) departs from the interaction model of 5, not in augmenting the endowments term, but in augmenting the coefficients term. It values the difference in rates of return at the mean endowment of the high group, which gives a gap due to differences in the coefficients of \(\sum (b^H - b^L) X^H\). In the example in Figure 1, we apply the difference in men's and women's rates of pay not to the 30 hours that women average but to the 40 hours that men work. This is a logical procedure if the implicit policy (or mental experiment) is to eliminate discrimination by raising women's hours of work rather than reducing men's. In practice that will often be the case. Proposals to end discrimination against women and blacks, for example, typically envision raising the income and endowments of women and blacks to the levels of white males, not lowering white male incomes to those of women or blacks. The view is that women and blacks are deprived, earning less than their just
deserts, rather than that white males are privileged, earning more than they should. That is why we refer to it as the "deprivation" model.

TABLE 1
Alternate Decompositions of the Difference Between Two Groups
(Columns 1 and 2 together are the unexplained differences.)

| Model | (1) Due to group membership | (2) Due to differences in the coefficients | (3) Due to difference in endowments | (4) Due to interaction between differences in coefficients and endowments |
|---|---|---|---|---|
| 1. "Interaction" model¹ | \((a^H - a^L)\) | \(\sum (b^H - b^L) X^L\) | \(\sum b^L (X^H - X^L)\) | \(\sum (b^H - b^L)(X^H - X^L)\) |
| (name of term) | "membership" | "coefficients per se" | "endowments per se" | "interaction" |
| 2. "Privilege" model² | \((a^H - a^L)\) | \(\sum (b^H - b^L) X^L\) | \(\sum b^H (X^H - X^L)\) | (none) |
| (in terms of line 1) | = membership | = coefficients per se | = endowments per se + interaction³ | |
| 3. "Deprivation" model⁴ | \((a^H - a^L)\) | \(\sum (b^H - b^L) X^H\) | \(\sum b^L (X^H - X^L)\) | (none) |
| (in terms of line 1) | = membership | = coefficients per se + interaction⁵ | = endowments per se | |
| 4. The recommended model⁶ | \((a^H + \sum b^H X^L) - (a^L + \sum b^L X^L)\) | (combined with column 1) | \(\sum b^L (X^H - X^L)\) | \(\sum (b^H - b^L)(X^H - X^L)\) |
| (in terms of line 1) | = membership per se + coefficients per se (= unexplained differences) | (combined with column 1) | = endowments per se | = interaction |

1. Althauser and Wigler, 1972; Iams and Thornton, 1975; Winsborough and Dickenson, 1971, using the second line of their equation 5.
2. Blinder, 1973.
3. See equation 6 in the text.
4. Oaxaca, 1972; cf. also Blinder, 1976.
5. See equation 7 in the text.
6. Winsborough and Dickenson, 1971, using the first line of their equation 5.
In effect, this view of the matter has just added the interaction term of model 5 to its coefficients term, obtaining a new and larger coefficients term:
\[ \sum (b^H - b^L) X^H = \sum (b^H - b^L) X^L + \sum (b^H - b^L)(X^H - X^L) \qquad [7] \]
The remainder of the model is the same as the interaction model, as can be seen from panel 3 of Table 1.
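The deprivation model can be sketched the same way; `deprivation` is again a hypothetical helper name. For the Figure 1 example the coefficients term becomes $20 (the rate difference of $0.50 applied to men's 40 hours), which equation 7 splits into the $15 coefficients per se term plus the $5 interaction.

```python
def deprivation(aH, bH, xH, aL, bL, xL):
    """Three-term "deprivation" decomposition: the coefficient gap is valued at
    the high group's means, which absorbs the interaction term (equation 7)."""
    membership = aH - aL
    coefficients = sum((bh - bl) * xh for bh, bl, xh in zip(bH, bL, xH))
    endowments = sum(bl * (xh - xl) for bl, xh, xl in zip(bL, xH, xL))
    return membership, coefficients, endowments


m, c, e = deprivation(40, [3.00], [40], 15, [2.50], [30])

# Check equation 7 for the single-variable example.
coef_per_se = (3.00 - 2.50) * 30          # valued at women's mean hours
interaction = (3.00 - 2.50) * (40 - 30)   # joint effect of both differences
print(m, c, e, m + c + e)
print(c, coef_per_se + interaction)
```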
COMPARISON OF THE THREE MODELS
These three decompositions are thus very closely related both conceptually and algebraically. Their only difference is in the treatment of the interaction term. The main substantive effect is to increase or decrease the "unexplained" part of the income gap (the part usually attributed to "discrimination") according to how the interaction term is treated. If the interaction term is excluded from "discrimination," then "discrimination" is represented by the membership term of the interaction model plus the coefficients per se term. This sum represents the difference between the actual income of the low earning group and the estimated income the high earning group would receive if it had the endowments of the low income group but converted them into income according to its own equation. That is to say, it represents the difference between the high and low income groups' equations evaluated at the mean endowment of the low income group:
\[ \left(a^H + \sum b^H X^L\right) - \left(a^L + \sum b^L X^L\right) = (a^H - a^L) + \sum (b^H - b^L) X^L \]
In the example this is the difference between what men and
women would earn if both worked 30 hours a week but everything
else remained unchanged. Geometrically, it is the distance
between the male and female regression lines at 30 hours per week
(see Figure 1). This is the approach taken by the interaction and
privilege models.
If instead the interaction term is treated as part of "discrimination" (as it is in the deprivation model), our estimate of "discrimination" will be larger. It is now the difference between the actual income of the high-income group and the income that the low-income group would be expected to earn if it had the same endowments as the high-income group but everything else remained unchanged:
\[ \left(a^H + \sum b^H X^H\right) - \left(a^L + \sum b^L X^H\right) = (a^H - a^L) + \sum (b^H - b^L) X^H \]
In the example, this is the difference between what men and women would earn if both worked 40 hours per week. Geometrically, it is the difference between the male and female regression lines at 40 hours per week (Figure 1).
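The two comparisons can be read off Figure 1 as vertical distances between the regression lines. A minimal sketch (the helper `regression_line` is our own, hypothetical name) evaluates both lines at women's mean of 30 hours and at men's mean of 40 hours; the two distances, $40 and $45, differ by exactly the $5 interaction term.

```python
def regression_line(a, b):
    """Return predicted weekly income as a function of hours worked."""
    return lambda hours: a + b * hours


men = regression_line(40, 3.00)
women = regression_line(15, 2.50)

at_women_mean = men(30) - women(30)  # membership + coefficients per se
at_men_mean = men(40) - women(40)    # membership + coefficients per se + interaction
print(at_women_mean, at_men_mean, at_men_mean - at_women_mean)
```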
So the difference between models has largely to do with where the comparison between the high and the low earning groups is best made.⁶ If it is made at the mean of the low earning group, as would be reasonable if the policy or mental experiment at hand envisioned reducing the income of the high earning group to that of the low earning group, then Blinder's privilege model seems logical. But if it is made at the mean of the high earning group, as would be reasonable if the policy or mental experiment envisioned increasing the income of the low earning group to that of the high earning group, then one of the other two models seems most reasonable. The choice between them depends on whether or not there is a clear argument for including the interaction term as an aspect of "discrimination." If there is, then the deprivation model seems appropriate. If not, then the most reasonable procedure is probably to keep the interaction term separate, as in the interaction model. In any event, the interaction term simply tells us the dollar difference involved in making one or the other choice in evaluating the endowments difference.
SOME DIFFICULTIES
All the above models suffer from a weakness that effectively
vitiates the distinction between the membership term and the
coefficients term. The problem is that the relative size of these
terms (but not their joint contribution) depends on the location of
the zero point of each independent variable in the model. As the zero points of at least some of the variables used in most sociological and economic analyses of discrimination are inherently arbitrary (in the sense that several alternative zero points are equally logical and lead to empirically identical predictions), the substantive results of the decomposition are arbitrary, depending on the happenstance of model selection rather than the realities of discrimination. This weakness has been recognized in some theoretical writings (e.g., Winsborough and Dickenson, 1971: 7) but not in others (e.g., Blinder, 1973), and has sometimes been ignored in the empirical literature (cf. Kehrer, 1976).
The difficulty can be conveniently illustrated with recent Australian data on annual income, gender, and schooling.⁷ Let us first express the relation between income and schooling in terms of a conventional human capital specification using "years of schooling." This gives
\[ Y^H = \$2{,}576 + \$829 \cdot \text{Years of schooling} \qquad \text{(for men)} \]
\[ Y^L = -\$664 + \$615 \cdot \text{Years of schooling} \qquad \text{(for women)} \]
An alternative specification, commonly used in the British tradition, is in terms of "age left school." This is no less plausible than the "years of schooling" specification and, of course, leads to mathematically identical predictions for the income of any particular person. But the regression equations for this specification have different intercepts:
\[ Y^H = -\$2{,}399 + \$829 \cdot \text{Age left school} \qquad \text{(for men)} \]
\[ Y^L = -\$4{,}354 + \$615 \cdot \text{Age left school} \qquad \text{(for women)} \]
The decomposition implied by these equations is given in Table 2 (panel 1).
Using the years of schooling formulation, we would conclude
that the amount of the income gap due to group membership was
$3,240. But using the age left school formulation, this falls by 40%
to $1,955, a very considerable difference. Similarly, the part of the
income gap due to differences in the coefficients (i.e., in the rates
of return to education) is $2,390 in the years of schooling
formulation but $3,675 (over 50% higher) in the age left school
formulation. This is again a considerable difference. These
differences are large enough to be substantively important. The
years of schooling formulation leads to the conclusion that the
main explanation of the income gap between men and women has
to do with gender per se, while the age left school formulation
leads to the conclusion that it is due mainly to differences in the
returns to education. Conclusions as different as these would
suggest different foci for remedial policies. That such large
differences in interpretation arise from a trivial difference in the
way the model is specified shows that something is fundamentally
wrong with the decomposition.
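The switch from years of schooling to age left school amounts to moving the zero point of the schooling scale. Comparing the two pairs of equations above, the intercepts are consistent, up to rounding, with age left school being years of schooling plus six; that offset is our inference from the reported intercepts, not a figure the article states. The sketch below applies such a shift to the reported years-of-schooling equations, using a hypothetical mean schooling level for women (the group means are not reported here), and shows that exactly ($829 - $615) × 6 = $1,284 moves from the membership term to the coefficients term while their sum is unchanged. The reported figures ($3,240 versus $1,955) differ by $1,285, consistent with rounding.

```python
def membership_and_coefficients(aH, bH, aL, bL, xL):
    """Membership term (aH - aL) and coefficients-per-se term (bH - bL) * xL."""
    return aH - aL, (bH - bL) * xL


def shift_zero_point(a, b, c):
    """Re-express y = a + b*x in terms of x' = x + c, i.e. y = (a - b*c) + b*x'."""
    return a - b * c, b


# Years-of-schooling equations reported in the text (men = H, women = L).
aH, bH = 2576, 829
aL, bL = -664, 615

SHIFT = 6        # assumed: age left school = years of schooling + 6
xL_years = 11.0  # hypothetical mean years of schooling for women

m1, c1 = membership_and_coefficients(aH, bH, aL, bL, xL_years)

aH_age, _ = shift_zero_point(aH, bH, SHIFT)  # -2398, vs -2399 reported
aL_age, _ = shift_zero_point(aL, bL, SHIFT)  # -4354, as reported
m2, c2 = membership_and_coefficients(aH_age, bH, aL_age, bL, xL_years + SHIFT)

print("years of schooling:", m1, c1, m1 + c1)
print("age left school:   ", m2, c2, m2 + c2)
```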
These difficulties are not confined to continuous variables but
are even more serious in dummy variable formulations. For
example, suppose that instead of measuring education in years we
distinguished five categories (say, primary schooling only, some
secondary schooling, secondary level completed, some university,
university graduate), created five corresponding dummy variables, and used these in the analysis of income (Table 2, panel 2).
TABLE 2
Decompositions of the Income Difference Between Australian Men and Women¹ (dollars)

| Independent variables | (1) Group membership | (2) Coefficients per se | (3) = (1) + (2) | (4) Endowments per se | (5) Interaction |
|---|---|---|---|---|---|
| 1. Education (continuous) | | | | | |
| (a) Years of schooling | 3,240 | 2,390 | (5,620) | -123 | -43 |
| (b) School leaving age | 1,955 | 3,675 | (5,620) | -123 | -43 |
| 2. Education (5 dummy variables)² | | | | | |
| (a) University graduate is the omitted category | 7,267 | -1,603 | (5,664) | -77 | -121 |
| (b) Primary schooling is the omitted category | 3,972 | 1,692 | (5,664) | -77 | -121 |
| 3. Capitalist/worker | | | | | |
| (a) Capitalist = 1 and worker = 0 | 4,987 | 51 | (5,038) | 241 | 187 |
| (b) Worker = 1 and capitalist = 0 | 6,753 | -1,715 | (5,038) | 241 | 187 |
| 4. Education, occupation, farmer, immigrant, Catholic, capitalist/worker | | | | | |
| (a) Education in years; occupation in status points; farmer = 1 and non-farm = 0; immigrant = 1 and native born = 0; Catholic = 1 and non-Catholic = 0; capitalist = 1 and worker = 0 | 3,201 | 2,129 | (5,330) | 84 | 51 |
| (b) As in (a), but worker = 1 and capitalist = 0 | 5,312 | 18 | (5,330) | 84 | 51 |

1. The models in each panel are identical conceptually and predictively, but produce different membership and coefficient terms.
2. Primary schooling only; some secondary; secondary level completed; some university; university graduate.
Of course, one dummy variable must be left out of the equation.⁸ The omitted category effectively sets the zero point. Suppose that university graduates are the omitted category. Decomposition of the income gap between men and women indicates that group membership (i.e., gender per se) accounts for some $7,000 of the gap while women actually have an advantage of $1,600 due to the coefficients (i.e., due to the returns to education). But if the omitted category is changed to primary education, the results change markedly. Group membership accounts for only some $4,000 of the gap while men have an advantage (rather than a disadvantage) of some $1,700 due to the coefficients. Other choices of the omitted category lead, with equal arbitrariness, to other decompositions.
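With a full set of category dummies, each group's fitted equation reproduces its mean income in every category: the intercept is the mean of the omitted category and each dummy coefficient is a category mean minus that intercept. The sketch below uses three hypothetical categories with invented category means and shares (none of these numbers come from the Australian data; `dummy_decomposition` is our own helper) and shows the membership and coefficients terms swinging, and the coefficients term changing sign, as the omitted category changes, while their sum stays fixed.

```python
def dummy_decomposition(means_H, means_L, shares_L, ref):
    """Membership and coefficients-per-se terms for a model with one dummy per
    category except the omitted category `ref`.

    means_H, means_L: mean income of each group in each category.
    shares_L: proportion of the low group in each category (the dummy means X^L).
    """
    aH, aL = means_H[ref], means_L[ref]  # intercepts = omitted-category means
    membership = aH - aL
    coefficients = sum(
        ((means_H[k] - aH) - (means_L[k] - aL)) * shares_L[k]
        for k in range(len(shares_L))
        if k != ref
    )
    return membership, coefficients


# Hypothetical categories: 0 = primary only, 1 = secondary, 2 = university.
men_means = [10000, 14000, 20000]
women_means = [7000, 9000, 15000]
women_shares = [0.5, 0.3, 0.2]

m_univ, c_univ = dummy_decomposition(men_means, women_means, women_shares, ref=2)
m_prim, c_prim = dummy_decomposition(men_means, women_means, women_shares, ref=0)
print(f"university omitted: membership={m_univ}, coefficients={c_univ:.0f}")
print(f"primary omitted:    membership={m_prim}, coefficients={c_prim:.0f}")
print(f"sums: {m_univ + c_univ:.0f} and {m_prim + c_prim:.0f}")
```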
Similar difficulties arise if, as is often the case in social research,
the direction in which a variable is scored is arbitrary. For example, if we consider the effect of owning the means of production, it is as logical to score capitalists 1 (and workers 0) as it is to score workers 1 (and capitalists 0). Yet this choice leads to substantially different decompositions (panel 3 of Table 2). Scoring capitalists high implies that the income gap between men and women is almost entirely due to group membership (i.e., to
gender per se) with a negligible difference in the coefficients. But
scoring workers high implies an even larger effect of group
membership, offset by a substantial advantage to women due to
the coefficients.
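Reversing the scoring of a dichotomy replaces \(x\) by \(1 - x\), so each equation \(a + bx\) becomes \((a + b) - b(1 - x)\): the intercept absorbs the old slope and the slope changes sign. A small sketch with invented coefficients (hypothetical values, not the estimates behind Table 2, panel 3) shows the membership term moving by \((b^H - b^L)\) and the coefficients term moving by the same amount in the opposite direction.

```python
def rescore(a, b):
    """Re-express y = a + b*x as y = a' + b'*x' with x' = 1 - x."""
    return a + b, -b


def membership_and_coefficients(aH, bH, aL, bL, xL):
    """Membership term (aH - aL) and coefficients-per-se term (bH - bL) * xL."""
    return aH - aL, (bH - bL) * xL


# Hypothetical equations with capitalist = 1, worker = 0.
aH, bH = 5000, 3000  # men
aL, bL = 1000, 4000  # women
xL = 0.1             # hypothetical share of women who are capitalists

m1, c1 = membership_and_coefficients(aH, bH, aL, bL, xL)

# The same model scored the other way: worker = 1, capitalist = 0.
aH2, bH2 = rescore(aH, bH)
aL2, bL2 = rescore(aL, bL)
m2, c2 = membership_and_coefficients(aH2, bH2, aL2, bL2, 1 - xL)

print("capitalist = 1:", m1, c1, m1 + c1)
print("worker = 1:    ", m2, c2, m2 + c2)
```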
Finally, it should be noted that these difficulties arise if the zero
point of any variable in a complex model is arbitrary or open to a
reasonable alternative specification. Suppose that income is
estimated from a model including education, occupation, farm
ownership, an immigrant versus native-born distinction, religion,
and ownership of the means of production (Table 2, panel 4).
Under one straightforward coding of these variables, we would
conclude that some $3,000 of the income gap is due to group
membership and $2,000 to differences in the coefficients. But
changing the scoring of the capitalist-worker variable while
leaving the rest of the model unchanged leads to the very different
conclusion that the gap is due entirely to group membership and
that differences in the coefficients have a trivial effect. In practice,
the models used to analyze discrimination in sociology and
economics almost always include a wide range of variables, many
of them with essentially arbitrary zero points. As a consequence,
it will rarely if ever be wise to attempt to distinguish the effects of
group membership from those due to differences in the coefficients.
A RECOMMENDED DECOMPOSITION
These difficulties undermine only the distinction between the
group membership and coefficients per se terms. But while the division between them may be arbitrary, the joint total is not. That total reflects differences not due to endowment differences and is unaffected by the choice of zero point (Table 2, column 3). The endowments per se and "interaction" terms (columns 4 and 5) are not affected by the choice of zero point either.⁹
The income gap between groups can sensibly be decomposed
into only three components (see Table 1, line 4). The first is the
unexplained difference between groups. This is equal to the
difference between the actual mean income of the low earning
group and the income the high earning group would be estimated
to have if it had the same endowments as the low earning group
but nothing else changed (see Figure 1). The second component is
the difference due to endowments per se. The third is the interaction between the difference in endowments and the difference in coefficients, which (as already indicated) represents the dollar difference between valuing the endowments differences at the lower earning group's rather than the higher earning group's rates of return. It may be that theory will in some cases dictate unambiguous zero points, and that a cogent case can be advanced for interpreting the coefficients per se term. But we have not yet found any such examples in the income discrimination literature.
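The three recommended components can be computed as below. This is a sketch, not the authors' code: `recommended` and `move_origin` are hypothetical helpers. Measuring hours from 20 rather than from zero (an arbitrary relocation chosen only for illustration) changes the membership term for the Figure 1 example from $25 to $35, but leaves the unexplained difference ($40), the endowments per se term ($25), and the interaction term ($5) unchanged.

```python
def recommended(aH, bH, xH, aL, bL, xL):
    """Unexplained difference, endowments per se, and interaction (Table 1, line 4)."""
    high_at_low_means = aH + sum(b * x for b, x in zip(bH, xL))
    low_at_low_means = aL + sum(b * x for b, x in zip(bL, xL))
    unexplained = high_at_low_means - low_at_low_means
    endowments = sum(bl * (xh - xl) for bl, xh, xl in zip(bL, xH, xL))
    interaction = sum(
        (bh - bl) * (xh - xl) for bh, bl, xh, xl in zip(bH, bL, xH, xL)
    )
    return unexplained, endowments, interaction


def move_origin(a, b, x, origin):
    """Measure each X_i from origin[i] instead of zero: X' = X - origin and
    a' = a + sum(b_i * origin_i), so every prediction is unchanged."""
    new_a = a + sum(bi * oi for bi, oi in zip(b, origin))
    new_x = [xi - oi for xi, oi in zip(x, origin)]
    return new_a, new_x


# Figure 1 example with hours measured from zero.
original = recommended(40, [3.00], [40], 15, [2.50], [30])

# The same example with hours measured from 20 (an arbitrary alternative origin).
aH2, xH2 = move_origin(40, [3.00], [40], [20])
aL2, xL2 = move_origin(15, [2.50], [30], [20])
shifted = recommended(aH2, [3.00], xH2, aL2, [2.50], xL2)

print(original, shifted)
print("membership:", 40 - 15, "vs", aH2 - aL2)
```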
CONCLUSION
While most of our points have been made one way or another in the existing literature, the caveats have not always been recognized in much of the theoretical or empirical literature. Given disagreement in large parts of social science about how specific theoretical constructs should be operationalized and the arbitrary nature of many measurement decisions, we argue that any attempt to go beyond the simple partitioning of group differences into three components (one unexplained, or "residual," component, one due to differences in endowments, and one interaction component) should usually be resisted. Unless all measurement scales can be shown with a reasonable degree of plausibility to have appropriate and unique zero points, any further decomposition of the unexplained component will be inherently arbitrary. Certainly, the comparison of studies that use different specifications and different operationalizations of underlying concepts should be treated very sceptically indeed.¹⁰ To quote Althauser and Wigler (1972: 118),

A case can probably be made for confounding differences in regression coefficients and y intercepts. Both potentially reflect the operation of discrimination.

Indeed they do.
The appropriate treatment of the interaction term is a matter to be decided on substantive rather than statistical grounds. If the income gap comes about because the high earning group is privileged (earning more than the appropriate return on their endowments), or if the policy envisioned is to reduce the returns of the high earning group to those of the low earning group, then the interaction term can most reasonably be added to the endowments term (as in the "privilege" model). The endowments term then will reflect the extra income the high earning group gets because of their privilege or the drop in their income that will be produced by the policy change (cf. Duncan, 1968). But if the income gap comes about because the low earning group is deprived (earning less than the appropriate return on their endowments), or if the policy envisioned is to increase their returns to match those of the high earning group, then the interaction term can most reasonably be added to the "discrimination" component, as in the "deprivation" model. The endowments term then reflects the increase in the income of the low earning group that would come about by equalizing endowments without changing anything else. There are of course intermediate solutions for which reasonable arguments can be advanced.¹¹
If there are no clear substantive grounds for allocating the interaction term one way or another, it is reasonable to leave it separate. The endowments term will then reflect the increase in the income of the low earning group that would come about by equalizing endowments but changing nothing else, as in the deprivation model. The interaction term is the dollar difference
implied by adopting the deprivation rather than the privilege
perspective, between evaluating the income gap at the mean
endowment of the low earning group and evaluating it at the
mean endowment of the high earning group. So far as gender
differences in earnings are concerned, we would argue that the
principle of "equal pay for work of equal value" has generally
meant an attempt to ensure that the returns women get for their
endowments are raised to equal those of men. So the deprivation
model is the appropriate decomposition of the income gap. Other
problems may demand different decompositions appropriate to
the theory and issues under scrutiny.
NOTES
1. The origins of this note lie in an attempt to apply what seemed to be a well-accepted method of decomposition of the income gap between two groups, much used in
the economics literature. However, the estimation of various regressions in which
predictors were systematically varied (mainly as a check on the effects of collinearity)
produced results which, while undoubtedly algebraically correct, could not sensibly be
made to support the interpretation suggested for the difference in intercepts versus slopes.
An anonymous reader of the substantive paper referred to in the references (Jones, 1982a)
drew attention to a fourth component of difference, the interaction term referred to in the
two articles published in this journal. We are not aware of other work in this area of
application pointing to the arbitrary behavior of intercepts when dummies with more than
two categories are used in substantive analysis, although Blinder (1973: 443) discusses the
dichotomous case. Coleman et al. (1972) offer what they claim as a solution to the
zero-point problem: evaluating differences not in average endowments but in standard
deviations of those endowments. While it is clear that the standard deviation is
invariant under an arbitrary (linear) scale change and that their algebra is correct, we find
it impossible to give any substantive sociological or economic meaning to the terms they
thereby generate. Perhaps that is why other researchers in this area do not seem to have
adopted their approach.
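A small Python sketch may make the arbitrariness concrete. It assumes a single hypothetical three-category variable and hypothetical category (cell) means for each group. With one categorical predictor, OLS on an intercept plus dummies reproduces the cell means, so the intercept is the omitted category's mean and each dummy coefficient is a difference from it. Discrimination is evaluated at the low group's category shares (its mean values on the dummies). With these numbers the intercept part is 5,000, 4,000, or 6,000 depending on which category is omitted, while the intercept and slope parts together always come to 4,900.

```python
from math import isclose

CATEGORIES = ("city", "town", "rural")  # hypothetical three-category variable

# Hypothetical mean income in each category, for each group.
HIGH_CELL_MEANS = {"city": 15000.0, "town": 13000.0, "rural": 12000.0}
LOW_CELL_MEANS = {"city": 10000.0, "town": 9000.0, "rural": 6000.0}
# Hypothetical share of the low earning group in each category; these
# shares are the group's mean values on the dummy variables.
LOW_SHARES = {"city": 0.5, "town": 0.3, "rural": 0.2}


def dummy_fit(cell_means, reference):
    """Intercept and dummy coefficients when `reference` is the omitted category.

    With a single categorical predictor, OLS on an intercept plus dummies
    reproduces the cell means: the intercept is the reference category's
    mean and each coefficient is a category mean minus that reference mean.
    """
    a = cell_means[reference]
    b = {k: m - a for k, m in cell_means.items() if k != reference}
    return a, b


def split_discrimination(reference):
    """(intercept part, slope part) of discrimination at the low group's shares."""
    a_h, b_h = dummy_fit(HIGH_CELL_MEANS, reference)
    a_l, b_l = dummy_fit(LOW_CELL_MEANS, reference)
    intercept_part = a_h - a_l
    slope_part = sum((b_h[k] - b_l[k]) * LOW_SHARES[k] for k in b_h)
    return intercept_part, slope_part


results = {ref: split_discrimination(ref) for ref in CATEGORIES}

# The same total computed without choosing any reference category.
direct_total = sum(
    LOW_SHARES[k] * (HIGH_CELL_MEANS[k] - LOW_CELL_MEANS[k]) for k in CATEGORIES
)

for ref, (icpt, slope) in results.items():
    assert isclose(icpt + slope, direct_total)  # the combined total is invariant
    print(f"omitted={ref:<5} intercept={icpt:8.1f} slopes={slope:8.1f}")
```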
2. If both their endowments and rates of return were superior, they would not be a
low earning group.
3. The predicted income obtained by inserting the means of the explanatory variables
into each regression equation is, of course, the average income. Hence, the difference in
predicted incomes is equal to the actual difference in mean incomes.
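Note 3 rests on a standard property of least squares: when the equation includes an intercept, its normal equation forces the fitted line through the point of means. A minimal check in Python, using a single predictor and hypothetical hours and incomes (the same property holds with several predictors as long as an intercept is included):

```python
from math import isclose
from statistics import fmean


def ols(xs, ys):
    """Least-squares fit of y = a + b*x, including an intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx  # the intercept's normal equation puts the line through the means
    return a, b


# Hypothetical weekly hours and annual incomes for two groups.
hours_h = [35.0, 40.0, 45.0, 50.0]
income_h = [11000.0, 13500.0, 14500.0, 17000.0]
hours_l = [20.0, 30.0, 35.0, 40.0]
income_l = [6000.0, 8500.0, 9500.0, 12000.0]

a_h, b_h = ols(hours_h, income_h)
a_l, b_l = ols(hours_l, income_l)

pred_h = a_h + b_h * fmean(hours_h)  # predicted income at the group's mean hours
pred_l = a_l + b_l * fmean(hours_l)

assert isclose(pred_h, fmean(income_h))
assert isclose(pred_l, fmean(income_l))
assert isclose(pred_h - pred_l, fmean(income_h) - fmean(income_l))
```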
4. In both the example of Figure 1 and the discussion in the text, we assume that, as is
usually the case, the low earning group suffers from both inferior endowments and lower
rates of return. But if one or the other were actually higher, then it would produce an
advantage that would partially offset the disadvantage of the other. The mathematical
definition and conceptual meaning of the decomposition would remain unchanged. For
simplicity, we will confine our discussion to the usual case.
5. For example, in some American jurisdictions women are at a disadvantage in
gaining civil service jobs that are allocated to the candidates scoring highest on a
standardized exam, as (mostly male) veterans are given bonus points on the exam. In this
case it is natural to ask what would happen if the veterans' bonus were eliminated.
6. The choice of an appropriate comparison point is usually referred to in the
economics literature as the index number problem (cf. Blinder, 1973: 438; Brown et al.,
1980: 26; and Kahne, 1975: 1259).
7. These data are from the 1979 Australian National Political Attitudes Survey.
There are 2,016 cases, but this analysis is restricted to those in the labor force with complete
information on the included variables, resulting in a final sample of 714 men and 555
women. We thank Professor Don Aitkin for making these data available. A more fully
specified model based on census data and employing a wider range of productivity-related
characteristics is reported elsewhere (Jones, 1982a; 1982b).
8. A conventional regression equation with all dummy variables included would be
underidentified. But an alternative specification including all dummy variables while
omitting the intercept term is identified. This effectively sets the zero point at the mean.
Decomposing differences from such equations is subject to the same difficulties described
in the text.
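The identification point can be checked with a few hypothetical records and a single three-category variable. In this Python sketch the dummies in every row add up to one, which is exactly a column of ones, so they cannot be combined with an intercept; without the intercept, X'X is diagonal with the category counts on the diagonal, and each least-squares coefficient reduces to that category's mean income.

```python
from math import isclose
from statistics import fmean

# Hypothetical individual records: (category, annual income).
records = [
    ("city", 15500.0), ("city", 14500.0),
    ("town", 13000.0), ("town", 12600.0), ("town", 13400.0),
    ("rural", 12000.0),
]
categories = ["city", "town", "rural"]

# One dummy per category, for every record.
X = [[1.0 if cat == c else 0.0 for c in categories] for cat, _ in records]
y = [income for _, income in records]

# Every row's dummies sum to 1, i.e. to an intercept column, so an equation
# with an intercept and all the dummies has linearly dependent regressors.
assert all(sum(row) == 1.0 for row in X)

# Without the intercept, X'X is diagonal (category counts on the diagonal),
# so solving the normal equations reduces to X'y / count for each dummy.
coef = {}
for j, c in enumerate(categories):
    xtx_jj = sum(row[j] * row[j] for row in X)
    xty_j = sum(row[j] * yi for row, yi in zip(X, y))
    coef[c] = xty_j / xtx_jj

# Each coefficient is the mean income of its category.
for c in categories:
    assert isclose(coef[c], fmean(yi for cat, yi in records if cat == c))

print(coef)  # {'city': 15000.0, 'town': 13000.0, 'rural': 12000.0}
```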
9. These are the same for models that differ only in the choice of zero point (e.g., lines
1a and 1b). But of course models that include different variables can lead to different
conclusions, reflecting the different effect of variables in generating income differences.
10. That such comparisons have been attempted, with curious results arising from
both different specifications and different operationalizations, can be seen from Kehrer's
attempt (1976: 536-8) to reconcile her results with those of Blinder (1973) and Oaxaca
(1973); and from Davis and Hubbard's (1979: 287) observation that estimates of the
proportion of the earnings gap due to discrimination range from 10% to 100%.
11. A common approach to solving the problem of choosing a standard for evaluation
has been to use both the "privilege" and "deprivation" models and then to take an
unweighted average of the two. A more attractive solution might be to recognize that the
comparison groups typically represent different proportions of the total labor force and
therefore take a weighted, rather than unweighted, average of the results (the weights
being the proportion each group contributes to this total labor force, in this example 0.56
for men and 0.44 for women). Such an approach might be justified in terms of the fact that
it implies a redistribution of the existing wages bill. Note that using equation 5 typically
implies a substantial addition to the wages bill; using equation 7 usually implies a
substantial reduction; and using an unweighted average of both also usually implies a
substantial reduction, as the high earning group is typically a majority group and the low
earning group a minority group in the labor force. Using a weighted average implies a
redistribution of the amount currently available to pay for wages and salaries. Whatever
standard is adopted, the important point is that all such decompositions are essentially
mental, not social, experiments. Any policy to eliminate discrimination will typically
lead to unanticipated consequences for the regimes governing how endowments are
converted into earnings within and between groups.
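The averaging rules in this note amount to different weights on the same pair of estimates. In the sketch below (Python), the weights come from the sample sizes in note 7, the two discrimination estimates are hypothetical, and attaching the men's share to the privilege model (which uses the men's returns as the standard) is an assumption of the sketch, since the note does not say which model each weight goes with.

```python
N_MEN, N_WOMEN = 714, 555  # analysis sample sizes given in note 7


def blended_discrimination(d_privilege, d_deprivation, w_privilege):
    """Weighted average of the discrimination estimates from the two models.

    w_privilege is the weight on the privilege model, i.e. the one that uses
    the high earning group's returns as the standard. Pairing this weight
    with the high earning group's share is an assumption of this sketch.
    """
    return w_privilege * d_privilege + (1.0 - w_privilege) * d_deprivation


share_men = N_MEN / (N_MEN + N_WOMEN)

# Hypothetical discrimination estimates (dollars), not results from the article.
d_priv, d_dep = 2000.0, 2500.0

unweighted = blended_discrimination(d_priv, d_dep, 0.5)
weighted = blended_discrimination(d_priv, d_dep, share_men)

print(round(share_men, 2), round(1 - share_men, 2))  # 0.56 0.44
print(unweighted, round(weighted, 1))                # 2250.0 2218.7
```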
REFERENCES
ALTHAUSER, R. P. and M. WIGLER (1972) "Standardization and component
analysis." Soc. Methods and Research 1 (August): 97-135.
BLINDER, A. S. (1973) "Wage discrimination: reduced form and structural estimates." J. of
Human Resources 8 (Fall): 436-55.
--- (1976) "On dogmatism in human capital theory." J. of Human
Resources 11 (Winter): 8-22.
BROWN, R. S., M. MOON, and B. S. ZOLOTH (1980) "Incorporating occupational
attainment in studies of male-female earnings differentials." J. of Human Resources 15
(Winter): 3-28.
COLEMAN, J. S., Z. D. BLUM, A. B. SORENSEN, and P. H. ROSSI (1972) "White
and black careers during the first decade of labor force experience, part I:
Occupational status." Social Sci. Research 1 (September): 243-270.
DAVIS, J. C. and C. M. HUBBARD (1979) "On the measurement of discrimination
against women." Amer. J. of Economics and Sociology 38 (July): 287-92.
DUNCAN, O. D. (1968) "Inheritance of poverty or inheritance of race?" pp. 85-110 in
D. P. Moynihan (ed.) On Understanding Poverty: Perspectives from the Social
Sciences. New York: Basic Books.
IAMS, H. M. and A. THORNTON (1975) "Decomposition of differences: a cautionary
note." Soc. Methods and Research 3 (February): 341-52.
JONES, F. L. (1982a) "Sources of gender inequality in income: what the Australian census
says." Social Forces.
--- (1982b) "On decomposing the wage gap: a critical comment on Blinder's method."
J. of Human Resources.
KAHNE, H. with A. I. KOHEN (1975) "Economic perspectives on the roles of women in
the American economy." J. of Economic Literature 13 (September): 1249-1292.
KEHRER, B. H. (1976) "Factors affecting the income of men and women physicians: an
exploratory analysis." J. of Human Resources 11 (Fall): 527-45.
OAXACA, R. (1973) "Sex discrimination in wages," pp. 124-151 in O. Ashenfelter and A.
Rees (eds.) Discrimination in Labor Markets. Princeton, NJ: Princeton Univ. Press.
WINSBOROUGH, H. H. and P. DICKENSON (1971) "Components of negro-white
income differences." Proceedings of the American Statistical Association, Social
Statistics Section: 6-8.
F. L. Jones is Professor of Sociology in the Research School of Social Sciences in
the Institute of Advanced Studies of the Australian National University. He has a
Ph.D. from the same university. His most recent book, coauthored with Broom,
McDonnell, and Williams, is The Inheritance of Inequality. He is currently
completing a manuscript on women and the Australian labor market in the
twentieth century.
Jonathan Kelley is Senior Fellow in the Institute of Advanced Studies of the
Australian National University. He did his B.A. at Cambridge University, England
(in philosophy and mathematical logic) and his Ph.D. at the University of California,
Berkeley (in sociology). He taught at Columbia University and Yale University
before coming to ANU. Kelley's main interests are in social stratification. His book
with Herbert S. Klein, Revolution and the Rebirth of Inequality: A Theory
Applied to the National Revolution in Bolivia was recently published by the
University of California Press; he is just completing a book with Ian McAllister,
Australian Political Behaviour in Comparative Perspective.