Econometrics Journal (2004), volume 7, pp. 366–388.
Response error in a transformation model with an application
to earnings-equation estimation∗
JASON ABREVAYA† AND JERRY A. HAUSMAN‡
†Purdue University, Department of Economics, 403 W. State St., W. Lafayette, IN 47907-2056, USA
E-mail: [email protected]
‡Massachusetts Institute of Technology, Department of Economics, 50 Memorial Drive, Cambridge, MA 02142-1347, USA
E-mail: [email protected]
Received: May 2004
Summary This paper considers estimation of a transformation model in which the
transformed dependent variable is subject to classical measurement error. We consider cases
in which the transformation function is known and unspecified. In special cases (e.g. log
and square-root transformations), least-squares or non-linear least-squares estimators are
applicable. A flexible approximation approach (based on Taylor expansion) is proposed for
a parametrized transformation function (like the Box–Cox model), and a semi-parametric
approach (combining a semi-parametric linear-index estimator and non-parametric regression)
is proposed for the case of an unspecified transformation function. The methods are applied
to the estimation of earnings equations, using wage data from the Current Population Survey
(CPS).
Keywords: Measurement error, Transformation model, Box–Cox model, Semi-parametric
estimation, Local polynomial estimation.
1. INTRODUCTION
Classical measurement error (i.e. additive error uncorrelated with the covariates) in the dependent
variable is generally ignored in regression analysis because it simply gets absorbed into the error
residual. However, when a non-linear transformation of the dependent variable appears in the
regression model, classical measurement error (in the original variable) no longer gets absorbed
into the residual. As a result, standard estimation techniques can lead to biased and inconsistent
coefficient estimates.
∗ Seminar participants at Harvard, MIT, University College London, and Statistics Canada provided valuable feedback.
Comments by Pravin Trivedi and two anonymous referees greatly improved this paper. Mark Rainey provided excellent
research assistance.
C Royal Economic Society 2004. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main
Street, Malden, MA, 02148, USA.
To fix ideas, assume that the following model relates the ('latent') dependent variable y* and the covariates x (K × 1 vector),

h(y*) = α + x′β + ε,   (1)

where h is some strictly increasing 'transformation' function, ε is an i.i.d. random variable (independent of x, with E[ε | x] = 0), and y* is the 'latent' dependent variable (which may be
observed with measurement error). The model in (1) is known as the transformation model and
has become increasingly used in economics and studied in the econometrics literature. Special
cases of (1), when h is parametrized, include the Box–Cox model and parametric duration models
such as the Weibull. The Box–Cox model, in particular, has been used in many applications and is
covered in most graduate econometrics textbooks (e.g. Judge et al. 1985; Greene 2002; Amemiya
1985). More recently, the case of h unknown (and not parameterized) has received great attention
in the literature, including Chen (2002), Horowitz (1996) and Klein and Sherman (2002).
To include measurement error in the transformation model, assume that y* is observed with classical measurement error (call the observed dependent variable y),

y = y* + η,  where E(η | x, ε) = 0,   (2)

or, equivalently, E(y | x, y*) = y*. If h were a linear function, least-squares regression would consistently estimate β because the conditional expectation of y given x would be unaffected by the classical measurement error (i.e. E(y | x) = E(y* | x)). If h is a (known) non-linear function, however, the conditional expectations E(h(y) | x) and E(h(y*) | x) are not necessarily the same and, as a result, least-squares regression of h(y) on x can lead to inconsistent estimates of β. If h
is continuous and differentiable, equations (1) and (2) can be combined to yield

h(y − η) = α + x′β + ε,   (3)

or, after applying the mean-value theorem,

h(y) = α + x′β + (ε + η h′(ȳ)),   (4)

where ȳ is a value between y and y*. Inconsistency of the least-squares regression of h(y) on x is caused by the correlation between x and η h′(ȳ).¹ Similarly, the standard methods of estimating
parametrized transformation models like the Box–Cox model (maximum likelihood estimation
or non-linear two-stage least squares (Amemiya and Powell 1981)) are generally inconsistent in
the presence of dependent-variable measurement error.
To deal with the measurement error, we consider methods that base estimation upon the conditional distribution of y given x (rather than the conditional distribution of h(y) given x). In particular, inverting the transformation in (1) and combining with (2) yields

y = h⁻¹(α + x′β + ε) + η.   (5)
We consider various estimation strategies based on the ‘inversion’ idea embodied in equation (5).
First, for certain transformation functions (including the log transformation and square-root
transformation), the expression in (5) simplifies in such a way that least-squares techniques are
directly applicable.2 Second, for other transformations (including parametrized transformations
1 This inconsistency was discussed by Abrevaya and Hausman (1999) in the application of transformation models to
duration data.
2 We note that this result is well known for the log-transformation model but not for other models.
like the Box–Cox transformation), a Taylor expansion can be used to approximate the conditional
mean of y (given x) and provides the basis for a non-linear least-squares estimation strategy.
Finally, in the case where h is allowed to be an unspecified (increasing) transformation function,
existing semi-parametric and non-parametric estimators can be combined in order to consistently
estimate the conditional expectation of y and the marginal effects of x on E[y | x].
To illustrate our techniques, we consider estimation of a standard wage equation (where y is
observed hourly wage) using data from the Current Population Survey (CPS). The type of earnings
equation that we estimate is common in empirical labour economics and has its theoretical basis
in human capital theory (e.g. Mincer 1974; Becker 1975). The log-wage specification is adopted
widely in the literature, and the use of a flexible transformation specification originated from
Heckman and Polachek (1974), who reported the first results from estimation of a Box–Cox
transformation model of the earnings equation. Wage data gathered in large cross-sectional surveys
are known to be poorly measured. For a variety of reasons, including poor recall, rounding error,
and proxy response, respondents misreport their wages.
Theoretical treatment of measurement error in the dependent variable has focused primarily on the situation in which 'calibration' data are available and can be used to estimate parameters in a model of measurement error; see, e.g. Buonaccorsi (1996), which considers response error in a general linear model under several alternative models of response error. In the case of economic
data, the ‘calibration’ data are usually called ‘cross-validation’ data. ‘Cross-validation’ data are
constructed from more reliable sources than surveys (e.g. employer records, Social Security
records) so that the accuracy of survey data can be assessed. Various studies have used cross-validation data to highlight the extent of measurement error in income and wages from various
surveys—for example, the Panel Study of Income Dynamics (Pischke 1995; Brownstone and
Valletta 1996), the National Longitudinal Survey of Youth (Cole and Currie 1994), and the CPS
(Bound and Krueger 1991). Brownstone and Valletta (1996) use a multiple-imputation approach
to estimate a wage equation based on cross-validation data. Lee and Sepanski (1995) propose a
different approach to the same problem.
In this paper, we consider a situation in which calibration or cross-validation data are not
available to the researcher. The primary interest is determining what can be inferred about
the parameters and marginal effects under a model of response error (in particular, classical
measurement error) when such data are unavailable. Previous studies have examined models of
response error in the binary choice model (Hausman et al. 1998), count models (Li et al. 2003),
and the proportional hazards model (Abrevaya and Hausman 1999) when cross-validation data
are unavailable.
In Section 2, we consider some specific transformation functions h(·) for which standard
least-squares methods can be used to estimate the coefficient vector β and, in turn, predicted
values and marginal effects. These methods apply to log transformations (h(y) = log(y)) and root transformations (h(y) = y^{1/k} for k = 2, 3, . . .). In Section 3, we consider the use of a Taylor
expansion for approximating the conditional expectation in (5) as the basis for using non-linear
least squares. We focus on the Box–Cox model, which is the most common parameterized
transformation used by researchers and a model that we use in our empirical example. In
Section 4, we consider semi-parametric estimation of a flexible linear-index model (which includes
(1) as a special case). The semi-parametric approach has the advantage that the measurement
error can be of a more general form than classical measurement error. Section 5 examines the
practical performance of the Taylor-approximation strategy for the Box–Cox model. Monte Carlo
simulations compare the non-linear least squares estimators of Section 3 to the standard Box–
Cox maximum likelihood estimator. Some suggestions for practitioners are given based upon
C Royal Economic Society 2004
Response error in a transformation model
369
our results. In Section 6, the techniques are applied to wage data from the CPS. The results
which take into account measurement error in wages differ from those obtained using traditional
estimation methods. In particular, the non-linear least-squares methods yield higher estimates of
the incremental effects of education on wages. The semi-parametric estimates are consistent with
these higher estimates. Finally, Section 7 concludes.
2. SPECIAL CASES
We consider some specific forms of the transformation function for which least-squares or non-linear least-squares regressions can be used to consistently estimate β. The three examples that we consider are the log transformation, the square-root transformation, and the cube-root transformation. (Analogous results for h(y) = y^{1/k}, k > 3 would follow immediately.) We assume that the distribution of ε is independent of x.
Example 1 h(y) = log(y)

For the log-transformation model, equation (5) is

y = e^{α + x′β + ε} + η,   (6)

or, equivalently,

E(y | x) = E(e^{α + x′β + ε} | x) = C e^{x′β},   (7)

where

C = e^α E(e^ε).   (8)

If ε is normally distributed, ε ∼ N(0, σ²), the well-known form of (8) is³

C = e^{α + σ²/2}.   (9)

Non-linear least squares (NLS) based on (7) can be used to estimate β (and C), as long as the first moment of e^ε exists.⁴ This result is very different from the case of classical measurement error in x, for which additional information (e.g. instrumental variables) is required for identification. The marginal effects of a covariate x_j on E(log(y*) | x) and E(y* | x) are consistently estimated by β̂_j and β̂_j Ĉ e^{x′β̂}, respectively.
The NLS estimates are consistent with or without measurement error, whereas ordinary least squares (OLS) estimates obtained from a regression of log(y) on x may be inconsistent if there is measurement error. A Hausman (1978) specification test can be used to test for measurement error, as the OLS estimator of β is efficient when there is no measurement error and inconsistent in the presence of measurement error.
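The moment condition in (7) is easy to check by simulation. The sketch below (hypothetical parameter values; numpy assumed available) generates data from the log-transformation model with additive classical measurement error on y and verifies that the log of the conditional means is linear in x with slope β, which is the moment the NLS estimator exploits; note that a naive log(y) regression is not even feasible here, since additive error can make y negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
alpha, beta, sigma = 1.0, 0.5, 0.3               # hypothetical true values

# Binary covariate, so conditional means are just cell averages.
x = rng.integers(0, 2, size=n).astype(float)
eps = rng.normal(0.0, sigma, size=n)
y_star = np.exp(alpha + beta * x + eps)          # latent variable from eq. (6)
eta = rng.uniform(-2.0, 2.0, size=n)             # classical measurement error
y = y_star + eta                                 # observed variable

# Eq. (7): E(y | x) = C exp(x*beta), so log E(y | x=1) - log E(y | x=0) = beta.
beta_hat = np.log(y[x == 1].mean()) - np.log(y[x == 0].mean())
print(beta_hat)                                  # close to beta = 0.5

print((y <= 0).any())                            # True: log(y) undefined for some obs.
```

The same logic extends to NLS on C e^{x′β} with continuous covariates; the point of the sketch is only that the conditional mean of y, unlike that of log(y), is undisturbed by the measurement error.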
Example 2 h(y) = √y

For the square-root transformation model,

y = (α + x′β + ε)² + η = (α + x′β)² + ε² + 2(α + x′β)ε + η,   (10)

³The assumption that ε is normal is standard for the Box–Cox model, of which this model is a special case.
⁴The parameter α is not separately identified (e.g. separately from σ in the normally distributed case).
which yields the conditional expectation

E(y | x) = (α + x′β)² + σε²,   (11)

where

σε² ≡ E(ε²).   (12)

An NLS estimator based on (11) can be used to estimate (α, β, σε) as long as standard regularity conditions hold (including, of course, existence of the second moment of ε).
Interestingly, the NLS estimator is equivalent to a restricted OLS estimator. To see this equivalence, multiply out the first term on the right-hand side of (11) to yield

E(y | x) = α² + σε² + Σ_k (2αβ_k) x_k + Σ_k (β_k)² x_k² + Σ_{k<ℓ} (2β_k β_ℓ) x_k x_ℓ.   (13)

A least-squares regression based upon (13) would have (1 + 2K + K(K − 1)/2) independent variables with (K − 1 + K(K − 1)/2) restrictions. (These restrictions are readily testable using a χ² test (or F test).) In the case of a single x covariate (K = 1), there are no restrictions; in this situation, equation (13) becomes

E(y | x) = α² + σε² + 2αβx + β²x².   (14)

If the OLS estimator of y on (1, x, x²) yields the estimates (γ̂₀, γ̂₁, γ̂₂), the estimates of (α, β, σε) are given by

β̂ = √γ̂₂;  α̂ = γ̂₁/(2√γ̂₂) = γ̂₁/(2β̂);  σ̂ε = √(γ̂₀ − α̂²) = √(γ̂₀ − γ̂₁²/(4γ̂₂)).   (15)
As in the log-transformation model, the NLS estimates of (α, β) (based on (11)) can be compared with the standard OLS estimates (regressing √y on x) as a specification test for measurement error.
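The mapping in (15) can be verified numerically. A minimal sketch (hypothetical parameter values; numpy assumed) simulates the square-root model (10) with measurement error, runs the unrestricted OLS of y on (1, x, x²), and backs out (α, β, σε):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
alpha, beta, sigma_eps = 2.0, 0.5, 0.5   # hypothetical true values

x = rng.uniform(0.0, 10.0, size=n)
eps = rng.normal(0.0, sigma_eps, size=n)
eta = rng.uniform(-1.0, 1.0, size=n)     # classical measurement error
y = (alpha + beta * x + eps) ** 2 + eta  # eq. (10)

# Unrestricted OLS of y on (1, x, x^2); polyfit returns [gamma2, gamma1, gamma0].
g2, g1, g0 = np.polyfit(x, y, 2)

# Eq. (15): recover the structural parameters from the reduced-form coefficients.
beta_hat = np.sqrt(g2)
alpha_hat = g1 / (2.0 * beta_hat)
sigma_hat = np.sqrt(g0 - alpha_hat ** 2)
print(beta_hat, alpha_hat, sigma_hat)    # close to (0.5, 2.0, 0.5)
```

With K > 1 covariates the same regression would carry the cross-product restrictions noted in the text, which is why NLS on (11) is the natural general-purpose estimator.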
Example 3 h(y) = ∛y

For the cube-root transformation model,

y = (α + x′β + ε)³ + η = (α + x′β)³ + 3(α + x′β)²ε + 3ε²(α + x′β) + ε³ + η,   (16)

which yields the conditional expectation

E(y | x) = (α + x′β)³ + 3σε²(α + x′β) + μ₃,   (17)

where

μ₃ ≡ E(ε³).   (18)

An NLS estimator based on (17) can be used to estimate (α, β, σε, μ₃). If ε is also assumed to be symmetric, note that μ₃ = 0 and can be dropped from (17).
3. A TAYLOR-APPROXIMATION APPROACH
In this section, we consider an approach to estimation when the inversion in (5) does not lead to a
simple closed form as in the examples of the previous section. To allow for transformation models
in which the transformation is parametrized and estimated (e.g. the Box–Cox transformation), we re-write the transformation function as h(y) = h(y, λ). If h is assumed to be known, λ can be ignored in what follows.
To simplify notation, let g(v, λ) denote the inverse of the transformation function (i.e. defined so that g(v, λ) = y ⇔ h(y, λ) = v) and g^(j)(v, λ) denote the jth derivative of g(v, λ) with respect to v. Taking the expectation of (5) yields

E[y | x] = E[g(α + x′β + ε, λ) | x].   (19)
Assuming that g is (d + 1)-times continuously differentiable, Taylor's theorem implies that there exists a function t(ε) for which

E[y | x] = g(α + x′β, λ) + Σ_{j=1}^{d} (E[ε^j]/j!) g^(j)(α + x′β, λ)
           + E[ (ε^{d+1}/(d + 1)!) g^(d+1)(α + x′β + t(ε)ε, λ) ]   (20)

and t(ε) ∈ [0, 1] ∀ε. If the last term (the 'remainder term') is negligible, then (20) provides a way to approximate the conditional expectation:

E[y | x] ≈ g(α + x′β, λ) + Σ_{j=1}^{d} (E[ε^j]/j!) g^(j)(α + x′β, λ) ≡ g*_d(α + x′β, λ).   (21)
Under the additional assumption that ε is symmetric, E(ε^j) = 0 for odd j so that every other term simply drops out of the expansion in equations (20) and (21). The stronger assumption of normality (which is used almost always for the Box–Cox model) is especially helpful since ε ∼ N(0, σ²) implies E(ε^j) = σ^j (j − 1)(j − 3) · · · 3 · 1 for even j. In this case, the moments in the expansion terms of (21) are functions of the same parameter σ so that additional nuisance parameters are not required with additional expansion terms. Under the assumption of normality, equation (21) can be re-written as

E[y | x] ≈ g(α + x′β, λ) + Σ_{j=1}^{d/2} [σ^{2j} (2j − 1)(2j − 3) · · · 3 · 1 / (2j)!] g^(2j)(α + x′β, λ)   (22)

for even d.
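The even-moment formula E(ε^j) = σ^j (j − 1)(j − 3) · · · 3 · 1 for normal ε can be spot-checked by simulation. The sketch below (arbitrary σ; numpy assumed) compares the analytic moments against Monte Carlo averages for j = 2, 4, 6:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.8
eps = rng.normal(0.0, sigma, size=2_000_000)

def odd_double_factorial(m):
    """Product m(m-2)...3*1 for odd m; returns 1 for m < 1."""
    out = 1
    while m > 1:
        out *= m
        m -= 2
    return out

for j in (1, 2, 3):
    analytic = sigma ** (2 * j) * odd_double_factorial(2 * j - 1)
    simulated = np.mean(eps ** (2 * j))
    print(2 * j, analytic, simulated)   # the two columns agree closely
```

Because every even moment is a known function of the single parameter σ, lengthening the expansion in (22) adds approximation accuracy without adding parameters.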
The remainder of this section focuses on the transformation model of Box and Cox (1964), where for λ ∈ (0, 1)

h(y, λ) = (y^λ − 1)/λ   (23)

and

g(v, λ) = (1 + λv)^{1/λ}.   (24)

For the Box–Cox model, the derivatives of g(v, λ) are given by

g^(j)(v, λ) = (1 + λv)^{1/λ − j} Π_{k=0}^{j−1} (1 − kλ).   (25)
To make maximum-likelihood estimation of the Box–Cox model possible, a normality assumption is by far the most common approach in empirical work. Combining the normality assumption with (22) and (25) yields

g*_0(v, λ) = (1 + λv)^{1/λ}
g*_2(v, λ) = g*_0(v, λ) + (σ²/2)(1 + λv)^{1/λ − 2}(1 − λ)
. . .
g*_d(v, λ) = g*_{d−2}(v, λ) + [σ^d / (d(d − 2) · · · 4 · 2)] (1 + λv)^{1/λ − d} Π_{k=0}^{d−1} (1 − kλ),

as the successive Taylor-approximating functions. For a given choice of d, the Box–Cox Taylor-approximation NLS estimator minimizes the objective function Σ_{i=1}^{n} (y_i − g*_d(a + x_i′b, l))².
When d = 0 (one-term Taylor expansion), the NLS estimator turns out to be equivalent to an estimator of Wooldridge (1992), who directly models the conditional expectation of y as g_0(α + x′β, λ) (i.e. without ε entering into the non-linear function). For larger d, the additional Taylor-expansion terms may offer a better approximation to the conditional expectation. It is important to note again that, under the assumption of normality, the additional terms in the Taylor expansion do not require additional parameters to be estimated. As it turns out, low-order expansions are adequate for the wage-equation application discussed in Section 6. In order to provide some theoretical justification of the Taylor-approximation approach for the Box–Cox model, the Appendix contains a proof that the remainder term in (20) becomes negligible (i.e. goes to zero) as d gets large.
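The successive approximating functions are straightforward to compute. A minimal sketch (pure Python, hypothetical parameter values) implements the recursion; as a check, for λ = 0.5 the two-term approximation g*₂ reproduces the exact conditional expectation of the square-root model from Section 2, and for λ = 0.2 (where 1/λ = 5, so g is a fifth-degree polynomial in v) the expansion terminates at the d = 4 term:

```python
def g_star(d, v, lam, sigma):
    """Truncated Taylor approximation g*_d(v, lam) for the Box-Cox inverse
    g(v, lam) = (1 + lam*v)^(1/lam), under normal errors (even d)."""
    approx = (1.0 + lam * v) ** (1.0 / lam)          # g*_0
    for j in range(2, d + 1, 2):
        coef = sigma ** j                            # sigma^j / (j(j-2)...4*2)
        m = j
        while m > 0:
            coef /= m
            m -= 2
        deriv = (1.0 + lam * v) ** (1.0 / lam - j)   # (1 + lam*v)^(1/lam - j)
        for k in range(j):                           # product of (1 - k*lam)
            deriv *= 1.0 - k * lam
        approx += coef * deriv
    return approx

# lambda = 0.5: g*_2 equals the exact E(y | v) = (1 + 0.5v)^2 + 0.25*sigma^2.
v, sigma = 12.0, 3.0
exact = (1.0 + 0.5 * v) ** 2 + 0.25 * sigma ** 2
print(abs(g_star(2, v, 0.5, sigma) - exact))     # 0 (up to rounding)

# lambda = 0.2: 1/lambda = 5, so the expansion is exact at d = 4.
print(g_star(4, 15.0, 0.2, 1.0))                 # 1049.696
```

In an NLS routine, g_star would be evaluated at a + x_i′b with trial values of (a, b, l, s), exactly as in the objective function above.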
Before ending our discussion of the Box–Cox model, we should note that there has been a
vast literature concerning the Box–Cox model (and its estimation) since the introduction of the
model. Issues such as proper interpretation of the Box–Cox parameters, the normality assumption
and appropriate asymptotics for the model are considered and reviewed in a recent paper (and the
subsequent comments) by Chen et al. (2002). In terms of least-squares based estimation methods
(like that proposed in this section), the most relevant related work is Breiman and Friedman
(1985), who suggest a least-squares based strategy (called alternating conditional expectations)
for estimating transformation functions. The approach of Breiman and Friedman (1985) applies
to the more general case of an unspecified transformation function (though it could be used in the
specific context of the Box–Cox model), but differs from our inversion approach because it is not
robust to dependent-variable classical measurement error.
4. SEMI-PARAMETRIC ESTIMATION
In this section, we introduce a more flexible linear-index model in which the transformation h(·)
is left unspecified. The semi-parametric approach estimates β without a distributional assumption on ε and with weaker restrictions on the form of the measurement error. With an estimate of β, marginal effects of the covariates on E(y* | x) are identified under the assumption of classical measurement error. The semi-parametric model is given by

E(y* | x) = G(x′β),   (26)
where G is an unspecified strictly increasing function. Note that the transformation model in (1) with unknown h and ε independent of x is a special case of (26). As G is unspecified, β is only identified up-to-scale and normalization of the coefficient vector is required. The location parameter α has also been omitted because it is not identified if G is unspecified.
Under the assumption of classical measurement error, note that

E(y | x) = G(x′β),   (27)
meaning that several semi-parametric index-model estimators (including Han 1987; Powell et al.
1989; Ichimura 1993; Cavanagh and Sherman 1998) could be used to estimate β up-to-scale. Any
type of response error which preserves the linear-index nature of the model will allow consistent
estimation of β up-to-scale. As Abrevaya and Hausman (1999) contains a general discussion of
sufficient conditions for consistency in linear-index models, we refer the reader there for further
details.
To identify the marginal effects, we assume that G is differentiable and return to the assumption of classical measurement error so that (27) holds. Then,

∂E(y* | x)/∂x_j = ∂E(y | x)/∂x_j = g(x′β) β_j,   (28)
where g(v) = dG(v)/dv. Non-parametric regression can be used to consistently estimate G(·) and g(·). Combined with a consistent estimate of β, the semi-parametric marginal effect from (28) can be consistently estimated for any x. Unlike the parametric cases discussed in Section 2, the marginal effects of the covariates on other transformations of y cannot be estimated under the assumption of classical measurement error. For instance, E(log(y*) | x) ≠ E(log(y) | x) in general, so non-parametric regressions using log(y) cannot be used for inference about log(y*). We also note that recent non-parametric estimators of the h function (such as Horowitz 1996; Chen 2002; Klein and Sherman 2002) will generally be inconsistent in the presence of measurement error in y*.
To summarize the estimation procedure for the unspecified h case, β is first estimated (up-to-scale) by a semi-parametric linear-index estimator. Then, in a second stage, the conditional
expectation E(y | x) and/or the marginal effects of x on this conditional expectation are estimated
by some non-parametric estimator (e.g. kernel estimator, spline estimator or local polynomial
estimator). For the wage-equation application in the next section, the maximum rank correlation
estimator of Han (1987) is used in the first stage to estimate β. This estimator has two nice
features: (1) no bandwidth selection is required for estimation and (2) transformation of the
dependent variable has no effect on the estimation results (i.e. estimation based on log(y) or y
would yield the same estimates). In the second stage, local polynomial estimators are used to
estimate the conditional expectation of wages as well as the marginal effect of education on
this conditional expectation. A local linear regression estimator and a local quadratic regression
estimator are used, respectively, to estimate these functions.5 These estimators are extremely
easy and fast to implement as they require only a series of ordinary least-squares estimations.
In addition, as estimation of β in the first stage has resulted in dimension reduction, the
5 As Fan and Gijbels (1996, pp. 76ff) discuss, it is better to use an odd number of extra terms for local polynomial
estimators. To estimate the intercept (for the conditional expectation), then, one extra term is used, resulting in a local
linear estimator. Likewise, to estimate the slope (for the marginal effect), one extra term results in a local quadratic
estimator.
local polynomial regressions involve a single-dimensional covariate and avoid any curse-of-dimensionality problems.
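The second-stage non-parametric fit is just kernel-weighted least squares on the estimated index. A minimal local linear sketch (hypothetical index function G(v) = v² and an illustrative bandwidth; numpy assumed) estimates E(y | v) at a point by regressing y on (1, v − v₀) with Gaussian kernel weights; the weighted intercept is the estimate:

```python
import numpy as np

def local_linear(v, y, v0, h):
    """Local linear estimate of E(y | v = v0): weighted OLS of y on
    (1, v - v0) with Gaussian kernel weights and bandwidth h."""
    w = np.exp(-0.5 * ((v - v0) / h) ** 2)
    X = np.column_stack([np.ones_like(v), v - v0])
    XtW = X.T * w                         # weight each observation
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[0]                        # intercept = fitted value at v0

rng = np.random.default_rng(3)
n = 50_000
v = rng.uniform(0.0, 2.0, size=n)         # stand-in for the index x'beta_hat
y = v ** 2 + rng.normal(0.0, 0.1, size=n) # hypothetical truth G(v) = v^2

print(local_linear(v, y, 1.0, 0.1))       # close to G(1) = 1
```

Estimating the slope (the marginal effect) works the same way with one more polynomial term, per the local quadratic estimator described above.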
5. BOX–COX APPROXIMATIONS AND MONTE CARLO SIMULATIONS
This section considers the Box–Cox model in somewhat more detail, as it is easily the most widely
used parametrized transformation model in empirical economics. First, we consider how well the
Taylor-approximation approach described in Section 3 approximates the conditional expectation
when the Box–Cox model holds. This exercise does not require simulation because it only requires
evaluation of equation (22) for a given choice of the length of the Taylor expansion. True values
for σ and λ can be used in order to compare these approximated conditional expectations with
the true conditional expectation E(y | x). Second, we consider the performance of the Taylor-approximation (NLS) estimator (for different choices of the number of expansion terms) in relation to the Box–Cox maximum-likelihood estimator. For this comparison, we conduct a series of Monte Carlo simulations both when the Box–Cox model holds and when it does not, using specifications that either have measurement error in y or are free of measurement error.
5.1. Approximating the conditional expectation
The Box–Cox model (for λ ≠ 0) can be re-written as

y = (λ(v + ε) + 1)^{1/λ},   (29)

where v ≡ α + x′β. The conditional expectation of interest is

E(y | x) = E(y | v) = E[(λ(v + ε) + 1)^{1/λ}].   (30)

The ability of the Taylor approximation in equation (22) to approximate (30) will depend upon v, σ² (the variance of ε) and λ (the non-linearity parameter).
Figures 1 and 2 graph four different approximating functions (starting with one term in the expansion and going up to four terms in the expansion) on the range v ∈ [10, 35]. Each figure considers four different values for λ (0.10, 0.33, 0.50 and 0.75), with Figure 1 having a lower value for σ (σ = 3) as compared to Figure 2 (σ = 6). A few general things can be said about the patterns found in these figures (which would also extend to other choices for the range of v or for the parameters λ and σ). It requires more terms to approximate the conditional expectation when there is a large degree of non-linearity (e.g. λ = 0.10 in the two figures). The predictions of the theory concerning the 'special cases' of Section 2 are evident in the figures as well. The square-root transformation (λ = 0.50) requires only a one-term expansion to get the conditional expectation exactly right; adding additional terms gives curves right on top of the one-term curve. Similarly, the cube-root transformation (λ = 0.33) requires two terms in the expansion to get the conditional expectation exactly right; only two curves are visible in Panel B of both figures. As expected, a larger value for σ makes it more difficult to approximate the conditional expectation with fewer terms. For each of the λ values (except of course λ = 0.50), the approximation curves are more spread out in Figure 2 than they are in Figure 1. Despite this fact, notice that the approximations for λ = 0.10 have basically 'converged' with only four terms (though not shown, the five-term expansion is very close to the four-term expansion). The theory implies that 10 terms are required in order to get the expansion exactly right, but both figures indicate that fewer terms may be needed (depending upon the relevant range of v).
Figure 1. Taylor approximations of the Box–Cox model (σ = 3). [Four panels (λ = 0.10, 0.33, 0.50, 0.75), each plotting the one-, two-, three- and four-term Taylor approximations over v ∈ [10, 35].]
5.2. Simulation results

For the Monte Carlo simulations, we consider three different Box–Cox designs corresponding to λ = 0.2, λ = 0.5 and λ = 0.8. Specifically, the latent dependent variable y* is related to a single covariate x by the following model:

y* = (λ(10 + 0.5x + ε) + 1)^{1/λ},   (31)

where x is evenly spaced on the range [0, 30] and ε is normally distributed. To make the designs somewhat comparable to each other, the standard deviation of ε is chosen to be proportional to λ (specifically, values of 0.4, 1.0 and 1.6 for the three designs). In addition to the Box–Cox designs, we consider another transformation model for which the Box–Cox model would be misspecified. In particular, the design takes the following linear-spline specification:

y* = z + 4 · 1(z > 18) · (z − 18),  where z ≡ 10 + 0.5x + ε,   (32)

with x evenly spaced on the range [0, 30] (as in the Box–Cox designs) and ε ∼ N(0, 4).
For each of the four designs, we consider four possible specifications for the dependent-variable measurement error:

Specification 1: No measurement error (y = y*)
Figure 2. Taylor approximations of the Box–Cox model (σ = 6). [Four panels (λ = 0.10, 0.33, 0.50, 0.75), each plotting the one-, two-, three- and four-term Taylor approximations over v ∈ [10, 35].]
Specification 2: y = y* with prob. 1/3; y = (5/4)y* with prob. 1/3; y = (3/4)y* with prob. 1/3
Specification 3: y = u y*, where u ∼ U[3/4, 5/4]
Specification 4: y = y* + u, where u ∼ U[−c, c] (with c = (unconditional) standard deviation of y*).

Each specification satisfies the classical measurement-error assumption, E(y | x, y*) = y*. Specifications 2 and 3 have measurement errors whose variance depends upon the level of the latent variable y*, whereas Specification 4 has measurement error with variance independent of the level of y*.
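The designs are straightforward to generate. The sketch below (numpy assumed) builds the linear-spline design (32), applies Specification 2, and checks the classical measurement-error property that contamination leaves the mean of y unchanged (since E(y | x, y*) = y*):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Linear-spline design, eq. (32): the Box-Cox model is misspecified here.
x = np.linspace(0.0, 30.0, n)
eps = rng.normal(0.0, 2.0, size=n)          # eps ~ N(0, 4), i.e. variance 4
z = 10.0 + 0.5 * x + eps
y_star = z + 4.0 * (z > 18.0) * (z - 18.0)

# Specification 2: multiply y* by 1, 5/4, or 3/4, each with probability 1/3.
mult = rng.choice([1.0, 1.25, 0.75], size=n)
y = mult * y_star

# E(y | x, y*) = y*, so contamination should not move the overall mean.
print(abs(y.mean() - y_star.mean()))        # small
```

Specifications 3 and 4 are generated analogously with `rng.uniform`, and the same mean-preservation check applies to each.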
The combination of the four model designs (three Box–Cox and one non-Box–Cox) with
the four measurement-error specifications gives 16 total designs. For each of the 16 designs, we
conducted 200 simulations using a sample size of 2000 observations; for each simulation, four
different estimators were used—the Box–Cox maximum likelihood estimator (MLE) and three
Taylor-approximation NLS estimators (using one, two and three terms in the Taylor expansion).
Results from the series of simulations are gathered together in Table 1. The Taylor-approximation
NLS estimators are denoted by BCT1, BCT2 and BCT3 (with the number indicating the number
of terms used in the expansion). For each design, the mean and root-mean-squared error (rmse)
for λ̂ is reported. (Note that only the mean is reported for the spline specification in which there
is no ‘true’ value for λ.) In addition, since our focus in this paper is the estimation of conditional
© Royal Economic Society 2004

[Table 1. Box–Cox simulation results. For each design (Specifications 1–4 crossed with λ = 0.2, 0.5 and 0.8, plus the 'spline' design), the table reports Mean(λ̂), rmse(λ̂) and the 'rmse ratios' for Ê(y | x) at x = 5, 10, 15, 20 and 25, for the MLE estimator and the BCT1, BCT2 and BCT3 estimators; the ratios equal one for MLE by construction.]
expectations, we report 'rmse ratios' for estimation of the conditional expectation at five different
points on the support of x. The 'rmse ratio' is defined as the root-mean-squared error of a given
estimator divided by the root-mean-squared error of the MLE estimator. As a result, the 'rmse
ratios' are all equal to one for the MLE estimator. A 'rmse ratio' below (above) one indicates that
an estimator has more (less) accurate predictions than the MLE estimator (using the root-mean-squared error criterion).
For the three Box–Cox specifications, there are several interesting findings from the results
in Table 1 which are summarized below:
• In the case of no measurement error (Specification 1), the Box–Cox MLE estimator
outperforms the BCT estimators as one would expect. The MLE estimator has lower RMSE
associated with both estimation of λ̂ and the various conditional expectations. For many of
the conditional expectations, the RMSE of the BCT estimators are about 40% higher than
the RMSE of the MLE estimator. The bulk of this difference is explained by the higher
variance of the BCT estimators (although not reported in the table, the bias in the BCT
estimates of the conditional expectations is minimal). Thus, in cases where measurement
error is either minimal or nonexistent, the MLE estimator may perform better than the BCT
estimators even in cases where the MLE is slightly biased.
• The results for the BCT estimators are quite similar across the three different choices of
expansion length. For λ = 0.2 and λ = 0.8, the lower-order expansion estimators (BCT1
and BCT2) seem to exhibit some bias in the estimation of λ. This bias stems from an
omitted-variables problem because the extra expansion terms have been ‘left out’ of the
regression. The bias of the lower-order BCT estimators would be expected to worsen for
larger choices of σ² (for a given λ), as suggested by Figures 1 and 2 in Section 5.1. As
expected, this omitted-variables bias is absent from the λ = 0.5 case because only a single
term is required in the Taylor expansion. The λ̂ estimates for this case are very close to 0.5
for each of the BCT estimators.
• Across the four measurement-error specifications, the 'rmse ratios' seem to generally
improve as more terms are added to the BCT estimation. As the BCT estimators are explicitly
defined to minimize least-squares criteria, the performance of in-sample prediction will
improve as additional terms are added to the regression function. The ‘rmse ratios’ in Table 1
relate to out-of-sample prediction (i.e. how well the estimators estimate the true conditional
expectation). The usual concern with an estimator that uses successively more terms in
the regression function is the variance-bias tradeoff, where the variance gets larger with
more terms and the bias gets smaller (better in-sample prediction). For the BCT estimators,
however, the additional terms beyond the second-order expansion do not add more estimable
parameters to the model. This feature appears to minimize the usual variance-bias tradeoff
that one would find in other contexts like series estimation (where the additional terms entail
additional parameters). The simplification afforded by the normality assumption seems quite
important here since estimation based upon the original expansion in (21) would be more
likely to suffer from bias problems.
• The measurement-error specifications cause the MLE estimator to be inconsistent. The
RMSE for the MLE estimates of λ̂ are larger in most of the cases. Specifications 2 and 3,
where the measurement-error variance depends upon the level of y∗ , have a greater effect
on the MLE estimator. For these specifications, the ‘rmse ratios’ for the BCT estimators
indicate that they significantly outperform the MLE estimator in estimating the conditional
expectations. The heteroskedasticity induced by the measurement error causes problems
for MLE because MLE assumes homoskedasticity and essentially attempts to undo the
heteroskedasticity by its (biased) choice of λ̂. The MLE results are comparable to the BCT
results in the case of Specification 4 for λ = 0.5 and λ = 0.8. In the most non-linear case
(λ = 0.2), however, even homoskedastic measurement error (Specification 4) causes
problems for the MLE estimator; the ‘rmse ratios’ for the BCT estimators in this case
are extremely low (indicating 3–10 times greater efficiency in estimating the conditional
expectations).
• Because of the non-linear nature of the models being considered, there can be considerable
variation in relative performance of the estimators based upon the x values of interest. For
instance, in Specification 1 with λ = 0.2, the MLE estimator performs much better than the
BCT estimator at each of the x values except for x = 25, where the estimators are nearly
equivalent in efficiency terms. As another example, in Specification 3 with λ = 0.5, the
MLE is outperformed by the BCT estimators to a greater extent at the interior x values and
to a lesser extent at x = 5 and x = 25.
For the ‘spline’ specification (where the Box–Cox model does not hold), the results in the
bottom panel of Table 1 indicate different behaviour for the BCT estimators:
• Unlike the correctly specified Box–Cox specifications, the BCT estimates vary quite a bit as
additional terms are added to the expansion. For this particular design, the ‘rmse ratios’ tend
to be lowest for BCT1 and highest for BCT2 (with the ratios for BCT3 being in between
the ratios for BCT1 and BCT2).
The ‘rmse ratios’ for the BCT estimators are significantly below one for this design (across
all four measurement-error specifications), suggesting much greater efficiency than MLE
in estimation of the conditional expectations. Though it is difficult to generalize to other
misspecified models, one would expect that (i) the flexibility allowed by the BCT method
(relative to MLE) and (ii) the explicit focus upon minimizing a least-squares criterion would
lead to similar findings for other models that are significantly different from the Box–Cox
model. In cases where the Box–Cox model is ‘close’ to the true model and where there is
no measurement error in y∗ , one might expect the MLE to perform better.
Although no series of Monte Carlo simulations can consider all the possible scenarios that
a researcher is likely to encounter, the findings of this section lead us to make the following
remarks to aid practitioners in the estimation of Box–Cox models with potential dependent-variable measurement error:
• If there is dependent-variable measurement error, the MLE will tend to perform worse
if (i) the transformation function is very non-linear (λ values closer to zero) and (ii) the
dependent-variable measurement error is heteroskedastic (variance depends upon level of
y∗ ).
• Differences between the BCT estimators (of different expansion length) are more likely to
occur when either (i) the Box–Cox model is correctly specified and σ 2 is larger or (ii) the
Box–Cox model is incorrectly specified.
• If dependent-variable measurement error is suspected, the researcher should estimate the
model using both the MLE and the BCT estimators. Significant differences between the
MLE and the BCT estimates (of λ̂ and/or conditional expectations) would suggest that either
dependent-variable measurement error is a problem or some other model mis-specification
has caused a problem.
• It is a good idea to try several BCT estimators, especially if the researcher sees differences
among lower-order BCT estimators. As mentioned above, the difference among BCT
estimators could result from particular parameter values in the Box–Cox framework (in
which case more expansion terms are needed) or from the Box–Cox model being incorrectly
specified. As such, it is recommended that the semi-parametric approach of Section 4 be
used in conjunction with BCT estimation.
6. APPLICATION
The data come from the March 1991 wave of the Current Population Survey (CPS). The sample
consists of 5137 private-sector, full-time male employees aged 25 to 64. The wage variable
(WAGE) is weekly earnings divided by hours worked per week. Education (EDUC) is the number
of years of schooling completed, and experience (EXPER) is defined as age minus EDUC
minus six. In the standard Mincer (1974) framework, the logarithm of WAGE is modeled in
a linear regression with education, experience and experience squared. This specification is called
Specification 1 in the empirical results. Murphy and Welch (1990) find that higher-order terms
(third- and fourth-order polynomials) in experience yield a significantly better fit for log-wage.6
To incorporate higher-order terms, Specification 2 includes experience cubed and an interaction
term (education times experience). (Fourth-order polynomials and other interactions yielded no
significant improvements in the fits obtained by any of the estimation methods.)
To compare the different alternatives discussed in this paper, we consider three different
treatments of the transformation function h applied to the WAGE variable. The first assumes
that h(·) ≡ log(·), for which the OLS and NLS estimators discussed in Example 1 of Section 2
are applicable. The second assumes that h(·) belongs to the Box–Cox transformation family, for
which the Box–Cox maximum likelihood estimator (denoted by MLE) and the Box–Cox Taylor
approximation estimator of Section 3 (denoted BCT) are applicable. The third assumes that h(·)
is an unspecified increasing function, for which the semi-parametric approach of Section 4 is
applicable.
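For the Box–Cox treatment, the MLE can be illustrated by a grid search over λ of the Gaussian log-likelihood concentrated over the regression coefficients and error variance. The sketch below is illustrative only (the function names and grid are ours, not the estimation code used in the paper):

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transformation (y**lam - 1)/lam, with log(y) as the lam -> 0 limit."""
    if abs(lam) < 1e-8:
        return np.log(y)
    return (y**lam - 1.0) / lam

def profile_loglik(lam, y, X):
    """Gaussian log-likelihood of h(y; lam) = X b + e, concentrated over b and
    the error variance; includes the Jacobian term (lam - 1) * sum(log y)."""
    h = boxcox(y, lam)
    b = np.linalg.lstsq(X, h, rcond=None)[0]
    resid = h - X @ b
    n = len(y)
    sigma2 = resid @ resid / n
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.log(y).sum()

def boxcox_mle_lambda(y, X, grid=None):
    """Grid-search Box-Cox MLE for lambda (illustrative sketch only)."""
    if grid is None:
        grid = np.linspace(-1.0, 2.0, 301)
    return max(grid, key=lambda lam: profile_loglik(lam, y, X))
```

In practice one would refine the grid maximizer with a derivative-based step, but the grid version makes the profiling logic explicit.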
Table 2 reports parameter estimates from the estimation of the log-linear and Box–Cox models
under both specifications for the covariates. Standard errors are reported in parentheses. The OLS
estimates are obtained from a regression of log(WAGE) on the covariates, and the NLS estimates
minimize a least-squares objective function based on (7). For both specifications, the λ parameter
from the Box–Cox MLE is statistically significant, meaning that the log-linear model is rejected in
a statistical sense. However, the magnitudes of the λ estimates are quite close to zero in an absolute
sense, which indicates that inference from the OLS and the MLE parameters may be similar. The
results from Specification 2 are similar to those for Specification 1. The two additional covariates
are statistically significant in the OLS and the MLE columns, whereas the interaction term is
statistically insignificant for NLS.
Table 3 reports the estimated effects of education on expected log-wage, with standard errors
reported in parentheses. For Specification 1, the estimated effects for OLS and NLS are the EDUC
coefficient estimates reported in Table 2. The marginal effect in Specification 1 is estimated to
6 Card (1999) provides a more complete discussion of the wage-equation specification.
Table 2. Parametric estimates.

                           Specification 1                    Specification 2
                      log model        Box–Cox          log model        Box–Cox
                     OLS       NLS       MLE           OLS       NLS       MLE
const               0.9342    0.7413    0.8676        0.5234    0.4957    0.3695
                   (0.0431)  (0.0560)  (0.0568)      (0.0980)  (0.1239)  (0.1336)
EDUC                0.0904    0.1088    0.1081        0.1101    0.1190    0.1324
                   (0.0023)  (0.0031)  (0.0066)      (0.0057)  (0.0071)  (0.0103)
EXPER               0.0272    0.0318    0.0326        0.0603    0.0558    0.0727
                   (0.0025)  (0.0029)  (0.0035)      (0.0076)  (0.0095)  (0.0109)
EXPER² (×100)      −0.0344   −0.0427   −0.0412       −0.1294   −0.1275   −0.1558
                   (0.0052)  (0.0063)  (0.0066)      (0.0303)  (0.0368)  (0.0401)
EXPER³ (×100)                                         0.0011    0.0011    0.0013
                                                     (0.0004)  (0.0005)  (0.0005)
EDUC × EXPER (×100)                                  −0.0780   −0.0410   −0.0943
                                                     (0.0223)  (0.0309)  (0.0270)
λ                                       0.0709                            0.0691
                                       (0.0226)                          (0.0212)
Table 3. Incremental effects of education on expected log-wage.

                 Specification 1            Specification 2 (EXPER = 10)   Specification 2 (EXPER = 20)
Quantile      20%       50%       80%        20%       50%       80%        20%       50%       80%
OLS         0.0904    0.0904    0.0904     0.1023    0.1023    0.1023     0.0945    0.0945    0.0945
           (0.0023)  (0.0023)  (0.0023)   (0.0038)  (0.0038)  (0.0038)   (0.0025)  (0.0025)  (0.0025)
MLE         0.0922    0.0909    0.0896     0.1045    0.1030    0.1016     0.0965    0.0951    0.0938
           (0.0025)  (0.0024)  (0.0023)   (0.0063)  (0.0061)  (0.0059)   (0.0064)  (0.0061)  (0.0059)
NLS         0.1088    0.1088    0.1088     0.1146    0.1146    0.1146     0.1107    0.1107    0.1107
           (0.0031)  (0.0031)  (0.0031)   (0.0046)  (0.0046)  (0.0046)   (0.0032)  (0.0032)  (0.0032)
be 9.0% and 10.9% based on OLS and NLS, respectively. For the Box–Cox MLE, the estimated
marginal effect depends on covariate values at which it is evaluated. The table reports estimates at
three quantiles (20%, 50% and 80%) of the estimated index values x β̂MLE. The MLE estimates
are close to the OLS estimates and slightly decline at higher quantiles. To offer some more detail,
Figure 3 provides a plot of estimated marginal effects against observation number (where the
observations are sorted by their estimated index values x β̂). For Specification 2, Table 3 reports
marginal effects evaluated at experience values of 10 and 20. Across the three estimators, the
marginal effect of education on expected log-wage falls with the level of experience (as indicated
by the negative coefficient estimates on the interaction term in Table 2). As in Specification 1, the
OLS and MLE results are similar while NLS yields higher estimated marginal effects than both
[Figure 3. Incremental effects of education on expected log-wage. The figure plots the OLS, NLS and MLE estimates (roughly 0.080 to 0.125) against observation number (0 to 5000), with observations sorted by estimated index value.]
OLS and MLE. Across the columns, the NLS estimates are 12–20% higher than the OLS and
MLE estimates.
Next, we consider inference on expected wage (rather than expected log-wage). Under a
classical measurement error assumption on WAGE, the Taylor-approximation estimator and the
semi-parametric estimator can be used for inference as discussed in the previous two sections.
For the first stage of the semi-parametric procedure discussed in Section 4, the maximum rank
correlation (MRC) estimator of Han (1987) was used to estimate β. This estimator maximizes
the objective function
    S(b) = Σ_{i≠j} 1(x_i b > x_j b) · 1(y_i > y_j)                    (33)
over the (normalized) parameter space.7 Table 4 reports the normalized coefficient estimates for
all of the estimators. For the Box–Cox Taylor-approximation method, BCT1 and BCT2 denote
one-term and two-term expansions, respectively. Additional expansion terms gave nearly identical
results and are thus omitted. The MRC coefficient-ratio estimates are statistically significant in
Specification 1, but the interaction term is insignificant in Specification 2. For both specifications,
the estimated ratios are quite similar across all the estimators. In particular, the estimated ratios
themselves do not provide evidence of misspecification (e.g. through a Hausman specification
7 The algorithm of Abrevaya (1999) was used in order to evaluate S(b) in O(n log n) operations rather than O(n²)
operations. Standard errors were computed using the kernel-estimation method proposed by Cavanagh and Sherman
(1998), with window width of σ̂ n^(−1/6) (where σ̂ is the standard deviation of the estimated index values).
Table 4. Comparison of normalized coefficient estimates.

Specification 1:
            EDUC      EXPER     EXPER² (×100)
OLS           1       0.301       −0.381
                     (0.029)     (0.059)
MLE           1       0.302       −0.381
                     (0.031)     (0.062)
NLS           1       0.292       −0.392
                     (0.027)     (0.059)
BCT1          1       0.289       −0.393
                     (0.030)     (0.064)
BCT2          1       0.288       −0.393
                     (0.030)     (0.064)
MRC           1       0.321       −0.483
                     (0.028)     (0.061)

Specification 2:
            EDUC      EXPER     EXPER² (×100)   EXPER³ (×100)   EDUC × EXPER (×100)
OLS           1       0.548       −1.176           0.010            −0.708
                     (0.061)     (0.279)          (0.004)          (0.170)
MLE           1       0.549       −1.178           0.010            −0.711
                     (0.066)     (0.294)          (0.004)          (0.176)
NLS           1       0.468       −1.078           0.010            −0.325
                     (0.068)     (0.308)          (0.004)          (0.239)
BCT1          1       0.446       −0.912           0.008            −0.451
                     (0.071)     (0.331)          (0.005)          (0.238)
BCT2          1       0.444       −0.902           0.007            −0.453
                     (0.071)     (0.334)          (0.005)          (0.240)
MRC           1       0.440       −1.081           0.009            −0.053
                     (0.089)     (0.390)          (0.005)          (0.378)
test). Because of the difference in the models, however, similarities in estimated ratios do not
indicate that estimated marginal effects will be similar across the models.
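As a sketch, the rank-correlation objective in (33) can be evaluated naively in O(n²) operations; the Abrevaya (1999) algorithm used in the paper reduces this to O(n log n), but the illustrative version below keeps the double loop for clarity:

```python
import numpy as np

def mrc_objective(b, x, y):
    """Naive O(n^2) evaluation of the maximum rank correlation objective
    S(b) = sum over i != j of 1(x_i'b > x_j'b) * 1(y_i > y_j)."""
    idx = np.asarray(x) @ np.asarray(b)   # estimated index values x_i'b
    y = np.asarray(y)
    n = len(y)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j and idx[i] > idx[j] and y[i] > y[j]:
                total += 1
    return total
```

Because the objective is a step function of b, it is maximized over a normalized parameter space by direct search rather than gradient methods.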
To obtain non-parametric estimates of the expected wage function under Specification 1, a
local linear regression estimator was applied to the estimated index values x β̂MRC (normalized
as in Table 4 to have the education coefficient equal to one).8 Figure 4 shows the estimated
expected-wage function (versus index value) along with 90% confidence bands. The rule-of-thumb bandwidth from Fan and Gijbels (1996, p. 111) was used.9 An index-value density estimate
is also provided in order to show the relevant regions of the expected-wage function.10
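The second-stage smoother can be sketched as a kernel-weighted least-squares fit at each evaluation point. The code below is an illustrative local linear regression with an Epanechnikov kernel; the bandwidth argument is left to the user (the Fan and Gijbels rule-of-thumb constant is not implemented here):

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear regression estimate of E[y | index = x0] using an
    Epanechnikov kernel with bandwidth h (weighted least squares)."""
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)   # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])             # local intercept and slope
    XtW = X.T * w                                              # X'W with W = diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]   # the local intercept is the fitted value at x0
```

A local quadratic fit (as used for the marginal effects below) simply adds a (x − x0)² column, and its second coefficient estimates the local slope.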
Table 5 reports estimated marginal effects of education on expected wage for all of the
estimators (with standard errors in parentheses). The semi-parametric approach (labelled
MRC/LQR) in the table uses a local quadratic regression in the second stage. As expected, the
non-parametric approach has larger standard errors, reflecting the inherent tradeoff between
flexibility and efficiency. The estimates at the 20% quantile in both specifications are quite similar
across the methods. At the 50% and 80% quantiles, the methods based on the inversion idea (NLS,
BCT1, BCT2, MRC/LQR) yield higher estimates than OLS and MLE. Figure 5 provides graphs
of the estimated marginal effects for Specification 1. For comparison purposes, the x-axis is
observation number sorted by estimated index. To avoid cluttering, the non-parametric estimates
are compared with the OLS and MLE estimates in the first graph and with NLS and BCT1 in the
second graph. For the higher index values, the OLS and MLE estimates are somewhat lower than
8 Other non-parametric estimators could certainly be used in this second stage. The authors have also tried both kernel
regression and isotonic regression with very similar results.
9 In the notation of Fan and Gijbels (1996), w_0(x) = 1 and C_{ν,p}(K) = 1.719 (ν = 0, p = 1) were used for the local
linear regression. For the local quadratic regression considered later, w_0(x) = 1 and C_{ν,p}(K) = 2.275 (ν = 1, p = 2)
were used.
10 For this density estimate, we used an Epanechnikov kernel and the associated Silverman (1986) rule-of-thumb bandwidth (equal to 2.34σ̂ n^(−1/5), where σ̂ is the standard deviation of the estimated index values).
Table 5. Incremental effects of education on expected wage.

                 Specification 1           Specification 2 (EXPER = 10)   Specification 2 (EXPER = 20)
Quantile      20%      50%      80%         20%      50%      80%         20%      50%      80%
OLS          1.004    1.223    1.502       1.128    1.381    1.688       1.042    1.276    1.559
            (0.023)  (0.032)  (0.047)     (0.036)  (0.053)  (0.077)     (0.025)  (0.034)  (0.054)
MLE          0.923    1.112    1.347       1.039    1.258    1.515       0.959    1.162    1.399
            (0.024)  (0.029)  (0.041)     (0.038)  (0.052)  (0.074)     (0.026)  (0.033)  (0.052)
NLS          1.159    1.454    1.865       1.174    1.459    1.890       1.091    1.356    1.757
            (0.028)  (0.041)  (0.068)     (0.046)  (0.070)  (0.087)     (0.034)  (0.048)  (0.076)
BCT1         1.013    1.346    1.892       1.062    1.412    2.031       1.012    1.346    1.935
            (0.037)  (0.037)  (0.065)     (0.055)  (0.083)  (0.088)     (0.045)  (0.064)  (0.074)
BCT2         1.001    1.340    1.896       1.051    1.409    2.038       1.001    1.343    1.941
            (0.041)  (0.037)  (0.067)     (0.057)  (0.083)  (0.090)     (0.048)  (0.064)  (0.076)
MRC/LQR      0.875    1.516    2.075       0.917    1.537    2.080       0.912    1.529    2.069
            (0.100)  (0.095)  (0.155)     (0.105)  (0.097)  (0.168)     (0.104)  (0.096)  (0.167)
[Figure 4. Non-parametric estimates of expected wage and index density. Top panel: the estimated expected-wage function plotted against the index value (roughly 4 to 22), with 90% confidence bands. Bottom panel: the estimated index density.]
[Figure 5. Incremental effects of education on expected wage. Top panel: the non-parametric (NP) estimates compared with OLS and MLE. Bottom panel: the NP estimates compared with NLS and BCT. The x-axis is observation number (0 to 5000), sorted by estimated index value.]
the non-parametric estimates (consistent with the numbers in Table 5), whereas the NLS and BCT
estimates track closer to the non-parametric curve at the higher index values.
To summarize the findings of this section, the methods that are robust to classical measurement
error in wages yield higher marginal effects of education on both expected log-wage and expected
wage. Of course, several caveats should be kept in mind. First, the sample considered here is just
a single year and may not generalize to other samples. Second, without cross-validation data, it is
impossible to determine whether the reported differences in estimates arose from measurement
error in wages or from some other specification issue. In any case, it is interesting to note that
accounting for classical measurement error in wages seems to indicate a downwards bias in the
OLS estimate of the marginal effect of education. Previous research on returns to education
(see Harmon and Walker (1995) and Card (1999) for recent surveys) has focused on the effect
of omitted variables and mismeasured education on the estimated return to education. Through
the use of additional information (e.g. panel data, instrumental variables), almost all of these
studies have found that the OLS estimate of returns to education is biased downwards. Future
research might attempt to combine the approach taken in this paper with previous approaches in
the literature.
7. CONCLUSION
In this paper, we have considered methods of dealing with the problem of classical measurement
error in the dependent variable of a transformation model. The basic strategy has been to re-write
the transformation model directly in terms of y and to estimate its conditional expectation, which is
not affected by classical measurement error. Of course, these methods are not substitutes for actual
data on the measurement-error process (e.g. cross-validation data) because their applicability relies
upon the maintained measurement-error assumption(s). Future work might examine the usefulness
of these methods in other economic applications of the transformation model, such as the use of
the Box–Cox model to estimate hedonic pricing equations or the use of duration models on
complete-spell data.
There are also some interesting possibilities for future research that have not been considered
in the current paper. First, we have restricted the latent-variable transformation model (see
equation (1)) to have homoskedastic errors. It would be interesting to relax this assumption
and allow for heteroskedastic errors. Second, as the BCT estimators are based upon least-squares
criteria, they have some of the usual drawbacks associated with least-squares estimators. In
particular, the BCT estimators may be influenced by outliers and may be relatively inefficient.
Perhaps median-based or weighted least-squares methods might offer valuable alternatives to the
proposed BCT estimators.
REFERENCES
Abrevaya, J. (1999). Computation of the maximum rank correlation estimator. Economics Letters 62, 279–
85.
Abrevaya, J. and J. A. Hausman (1999). Semiparametric estimation with mismeasured dependent variables:
An application to duration models for unemployment spells. Annales d’Economie et de Statistique 55/56,
243–75.
Amemiya, T. (1985). Advanced Econometrics. Cambridge, MA: Harvard University Press.
Amemiya, T. and J. L. Powell (1981). A comparison of the Box-Cox maximum likelihood estimator and the
non-linear two-stage least squares estimator. Journal of Econometrics 17, 351–81.
Becker, G. S. (1975). Human Capital. New York: Columbia University Press.
Bound, J. and A. B. Krueger (1991). The extent of measurement error in longitudinal earnings data: Do two
wrongs make a right? Journal of Labor Economics 9, 1–24.
Box, G. E. P. and D. R. Cox (1964). An analysis of transformations (with discussion). Journal of the Royal
Statistical Society (Series B) 26, 211–52.
Breiman, L. and J. H. Friedman (1985). Estimating optimal transformations for multiple regression and
correlation. Journal of the American Statistical Association 80, 580–98.
Brownstone, D. and R. G. Valletta (1996). Modeling earnings measurement error: A multiple imputation
approach. Review of Economics and Statistics 78, 705–17.
Buonaccorsi, J. P. (1996). Measurement error in the response in the general linear model. Journal of the
American Statistical Association 91, 633–42.
Card, D. (1999). The causal effect of education on earnings. In O. C. Ashenfelter and D. Card (Eds), Handbook
of Labor Economics, Vol. 3A, pp. 1801–63. Amsterdam: Elsevier Science.
Cavanagh, C. and R. P. Sherman (1998). Rank estimators for monotonic index models. Journal of
Econometrics 84, 351–81.
Chen, G., R. A. Lockhart, and M. A. Stephens (2002). Box-Cox transformations in linear models: Large
sample theory and tests of normality (and comments). Canadian Journal of Statistics 30, 177–234.
Chen, S. (2002). Rank estimation of transformation models. Econometrica 70, 1683–97.
Cole, N. and J. Currie (1994). Reported income in the NLSY: Consistency checks and methods for cleaning
the data. NBER Technical Paper no. 160.
Fan, J. and I. Gijbels (1996). Local Polynomial Modelling and Its Applications. London: Chapman & Hall.
Greene, W. H. (2002). Econometric Analysis. New Jersey: Prentice Hall.
Han, A. K. (1987). Non-parametric analysis of a generalized regression model: The maximum rank
correlation estimator. Journal of Econometrics 35, 303–16.
Harmon, C. and I. Walker (1995). Estimates of the economic return to schooling for the United Kingdom.
American Economic Review 85, 1278–86.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46, 1251–71.
Hausman, J. A., J. Abrevaya and F. Scott-Morton (1998). Misclassification of the dependent variable in a
discrete-response model. Journal of Econometrics 87, 239–69.
Heckman, J. and S. Polachek (1974). Empirical evidence on the functional form of the earnings-schooling
relationship. Journal of the American Statistical Association 69, 350–54.
Horowitz, J. L. (1996). Semiparametric estimation of a regression model with an unknown transformation
of the dependent variable. Econometrica 64, 103–37.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index
models. Journal of Econometrics 58, 71–120.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl and T. Lee (1985). The Theory and Practice of
Econometrics. New York: John Wiley & Sons.
Klein, R. W. and R. P. Sherman (2002). Shift restrictions and semiparametric estimation in a generalized
transformation model. Econometrica 70, 663–91.
Lee, L. and J. H. Sepanski (1995). Estimation of linear and nonlinear errors-in-variables models using
validation data. Journal of the American Statistical Association 90, 130–40.
Li, T., P. K. Trivedi and J. Guo (2003). Modeling response bias in count: A structural approach with an
application to the National Crime Victimization Survey data. Sociological Methods and Research 31,
514–44.
Mincer, J. (1974). Schooling, Earnings, and Experience. New York: Columbia University Press.
Murphy, K. M. and F. Welch (1990). Empirical age-earnings profiles. Journal of Labor Economics 8, 202–29.
Pischke, J. S. (1995). Measurement error and earnings dynamics: Some estimates from the PSID validation
study. Journal of Business and Economic Statistics 13, 305–14.
Poirier, D. J. (1978). The use of the Box-Cox transformation in limited dependent variable models. Journal
of the American Statistical Association 73, 284–87.
Powell, J. L., J. H. Stock and T. M. Stoker (1989). Semiparametric estimation of index coefficients.
Econometrica 57, 1403–30.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. New York: Chapman and
Hall.
Wooldridge, J. M. (1992). Some alternatives to the Box-Cox regression model. International Economic
Review 33, 935–55.
APPENDIX
We consider the remainder term in the Taylor expansion for the Box–Cox model (see (20)) and show that it
tends to zero as d gets large. It has long been understood that the normality assumption cannot be strictly
true if the Box–Cox model is to be well-defined. See, for instance, Poirier (1978). We first assume
that the linear index α + x β is bounded from below by a constant L. Then, assume that ε is distributed as a
(symmetric) truncated normal distribution with bounds [−M, M] where L > M. In the types of economic
applications that the Box–Cox model has been used (e.g. wage equations and hedonic pricing equations),
the constants L and M would be extremely large positive numbers so that, for estimation purposes, the use
of a (non-truncated) normal distribution is not troublesome.
In the case of the Box–Cox model, the remainder term can be written as R_d ≡ E[R_d(ε)], where

    R_d(ε) ≡ [(1 − λ)(1 − 2λ) · · · (1 − (d − 1)λ) / d!] · ε^d · (1 + λ(α + x β + t(ε)))^(1/λ − d).        (34)
Choosing d large enough so that 1/λ − d < 0, we have

    |R_d(ε)| ≤ [(1 − λ)(1 − 2λ) · · · (1 − (d − 1)λ) / d!] · |ε|^d · (1 + λ(α + x β − M))^(1/λ − d)
             = (1 + λ(α + x β − M))^(1/λ) · [(1 − λ)(1 − 2λ) · · · (1 − (d − 1)λ) / d!] · (|ε| / (1 + λ(α + x β − M)))^d

for positive λ. Then, using the fact that b^d/d! → 0 as d → ∞ for positive b, we have lim_{d→∞} |R_d(ε)| = 0
for 0 < λ < 1. With the boundedness of |ε|, the same holds for |R_d| by taking expectations.
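As a quick numerical illustration of the final step (not part of the proof itself), the factor b^d/d! that drives the bound to zero can be checked to decay for large d:

```python
from math import factorial

def ratio_term(b, d):
    """b**d / d!, the dominant factor in the remainder bound; for any fixed
    positive b this eventually decreases to zero as d grows."""
    return b**d / factorial(d)
```

The term can grow for small d (roughly until d exceeds b) before the factorial takes over, which is why the argument only needs the limit as d → ∞.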