Journal of Econometrics 59 (1993) 99-123. North-Holland
Identification and estimation of dynamic
models with a time series of
repeated cross-sections*
Robert Moffitt*
Brown University, Providence, RI 02912, USA
Repeated cross-sectional data contain information on independent cross-sections of individual
units at two or more points in time. Estimation of dynamic models with such data is made difficult
by the general lack of information on lagged dependent and independent variables and the
consequent unobservability of the intertemporal covariances needed to identify and estimate
dynamic models. It is demonstrated here that the parameters of such models, both linear and
nonlinear, both with and without fixed individual effects, are identified and can be consistently
estimated with the imposition of certain restrictions. The paper includes an examination of the
identification and estimation with repeated cross-sectional data of dynamic discrete dependent
variable models, which can be parameterized in terms of transition rates between the different
cross-sections.
1. Introduction
Repeated cross-sectional (RCS) data contain information from independently
drawn sets of cross-sections of a population at two or more points in time. While
such data obviously provide more information than data from a single cross-section, RCS data are generally regarded as inferior to true panel data - that is,
data on the same cross-sectional units over time - for the estimation of dynamic
models. However, it has been pointed out previously that at least one class of
models - linear with fixed effects - is identified and can be consistently estimated
with RCS data [Browning et al. (1985), Deaton (1985)].1 The present paper
extends this point in several ways: (1) the identification conditions for the
Correspondence to: Robert Moffitt, Department of Economics, Brown University, Providence, RI 02912, USA.
*The author would like to thank Christopher Flinn, James Heckman, Franco Peracchi, Marno
Verbeek, and the participants of seminars at several universities for comments. The comments of
three anonymous referees were also helpful. This paper was presented at the August 1990 World
Congress of the Econometric Society in Barcelona and the CIDE Conference on 'The Econometrics
of Panels and Pseudo Panels' in Venice in October 1990.
1 It has also been shown by Heckman and Robb (1985) that certain classes of models for the
impact of interventions can be consistently estimated with RCS data.
© 1993 Elsevier Science Publishers B.V. All rights reserved
linear fixed effects model are stated explicitly; (2) estimation methods for
the linear fixed effects model are demonstrated which make use of the individual
micro data and which economize on parameters; (3) autoregressive linear
models are considered; and (4) the methods are extended to models with discrete
dependent variables, both with and without fixed effects. It is shown that
the dynamic models in all cases are identified and that consistent estimating
techniques are available from RCS data with the imposition of certain restrictions.
Efficiency questions are not considered in the paper, nor is the relative efficiency of true panel data and RCS data examined directly.
That is an important topic but one left for future work. The analysis raises
a number of other important questions as well which are left for future investigation; these are discussed at the end of the paper.
The availability of methods for the estimation of dynamic models with RCS
data is important for applied work in countries without panel data, as is the
situation in many Western European countries and many developing countries.
However, it is also important even in countries, such as the U.S., where panel data
are often available, for the available panels are often inferior to the available
cross-sections in some respects. For example, the U.S. Current Population Survey
(CPS) has larger samples, more representative samples over time because they are
unaffected by attrition, and more consistently-defined questions over time than
the available U.S. panels. While a direct comparison of this type will not be made in
the paper, it is worth noting that the relative desirability of the particular panels
and RCS data sets that may be available in a country may not be clear-cut.
The analysis of RCS data is also of interest because such data provide
a connecting link between micro and aggregate data. As will become clear
below, the methods for estimating dynamic models with RCS data are explicitly
or implicitly grouping methods akin to those used to generate aggregate data
series. A comparison of estimates from RCS data and from true panel data
would shed light on, for example, whether the commonly observed differences in
parameter estimates from aggregate data and panel data in some areas of
research (e.g., of life cycle models) are the result of the panel nature of the latter
or simply their individual micro nature.
Section 2 of the paper outlines the classes of models to be considered. Linear
models are discussed in section 3 and discrete-dependent-variable models are
discussed in section 4. Section 5 reports two empirical illustrations, and a summary and set of conclusions follows in section 6.
2. Classes of models considered
The classes of models considered in the paper are specializations of the general model:
$$y^*_{it} = \alpha y_{i,t-1} + X_{it}'\beta + f_i + \varepsilon_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{1}$$
$$E(\varepsilon_{it}^2) = \sigma^2, \qquad E(\varepsilon_{it}\varepsilon_{j\tau}) = 0, \qquad \forall\, i, j,\ t \neq \tau, \tag{2}$$
where $y^*_{it}$ is a latent endogenous continuous variable for cross-sectional unit $i$ at
time $t$, $y_{it}$ is its (possibly dichotomous) realization, $X_{it}$ is a ($K \times 1$) vector of
strictly exogenous (w.r.t. $\varepsilon_{it}$) variables for $i$ at $t$, $\beta$ is its associated coefficient
vector, $f_i$ is a fixed effect orthogonal to $\varepsilon_{it}$ but not to $X_{it}$ and $y_{i,t-1}$, and $\varepsilon_{it}$ is an
error term with scalar covariance matrix. The dynamics in the model arise both
from the presence of the fixed effect and from the lagged endogenous variable.
Fixed effects rather than random effects are considered because the latter, at
least if orthogonal to $X_{it}$, are less likely to raise issues of consistency (e.g., if $\alpha = 0$,
then consistent estimates in the presence of orthogonal random effects can be
obtained by pooling the cross-sections).2 Only first-order lags in $y_{it}$ are considered for simplicity; generalization to higher-order lags is straightforward.
A nonscalar covariance matrix of $\varepsilon_{it}$ is also not considered because serial
correlation parameters in that matrix will generally not be identified with RCS
data, and identification of the other parameters in the model will generally not
hinge on the absence of such correlation in any case. Finally, it is assumed for
simplicity that the cross-sections at each $t$ are of the same size ($N$), that the
population is closed with respect to in- and out-migration, and that there are no
births or deaths.
Letting $y_{it}$ denote the observed value of the endogenous variable for unit $i$ at
time $t$, the critical characteristic of RCS data is that $y_{it}$ is observed but $y_{i,t-1}$ is
not. Thus no estimate of $\mathrm{cov}(y_{it}, y_{i,t-1})$ is available in the data. Covariances are
needed to estimate (1) not only because of the presence of $y_{i,t-1}$ as a regressor
but also because of the presence of the fixed effect. The question of the paper is
the nature of the restrictions that must be imposed to estimate such models in
the absence of information on the covariances.
2 The 'fixed effect' in this model can be considered to be either a fixed unknown constant or
a stochastic term possibly correlated with $X_{it}$ (i.e., a 'correlated random effect'). The distinction will
have no operational content for the analysis below.
Four special cases will be separately considered:
I. Linear Models ($y_{it} = y^*_{it}$)
(A) $\alpha = 0$ (fixed effects model),
(B) $f_i = 0$ (autoregressive model).
II. Binary Choice Limited Dependent Variable Models ($y_{it} = 1$ if $y^*_{it} \geq 0$; $y_{it} = 0$ otherwise)
(A) $\alpha = 0$ (binary choice fixed effects model),
(B) $f_i = 0$ (Markov model).
Extensions to cases which are combinations of these are straightforward and not
discussed.3
3. Linear models
3.1. Fixed effects models
To keep matters simple we shall initially consider a model with a single
regressor. In addition, we shall henceforth make explicit the nonpanel nature
of the data by indexing individuals in each cross-section by $i(t)$, for in RCS
data the individuals are potentially different in each cross-section. We therefore have
$$y_{i(t)t} = \beta_0 + \beta_1 x_{i(t)t} + f_{i(t)} + \varepsilon_{i(t)t}, \qquad i(t) = 1, \ldots, N, \quad t = 1, \ldots, T. \tag{3}$$
Estimation of (3) by least squares on the pooled RCS data set will yield
inconsistent estimates for $\beta_1$ if $x_{i(t)t}$ and $f_{i(t)}$ are correlated, since the latter is
unobserved and hence must be omitted from the regression. Considering identification indirectly by search for a consistent estimator, we seek an instrumental
variables (IV) estimator that yields a consistent estimate of $\beta_1$, and hopefully an
estimator for which identification conditions are well-known. For this we
require an instrument for $x_{i(t)t}$ that will, among other things, be asymptotically
uncorrelated with $f_{i(t)}$.
The most important class of instruments is that consisting of functions of $t$.
Let us specify a linear projection of $x_{i(t)t}$ onto $K$ such functions as follows:
$$x_{i(t)t} = \sum_{k=1}^{K} \delta_k g_k(t) + \omega_{i(t)t}, \tag{4}$$
where $g_k$ are known functions and where $\omega_{i(t)t}$ is a residual orthogonal to those
functions by construction. Using a function of $t$ as an instrument is sensible
because $x_{i(t)t}$ presumably varies with $t$; if it did not, $\beta_1$ could not be identified
even with true panel data.4 Also, functions of $t$ should be uncorrelated with true
fixed effects because the latter are time-invariant and hence must be uncorrelated with functions of $t$.
3 In the binary choice model the lagged dependent variable could be in latent form rather than in
the form of the observable. The analysis would be simpler in that case because an analytic reduced
form could be derived and because instrumental variables could be used, as they cannot be in the case
actually considered here (see below).
Formally, let $\hat{x}_t$ be the least squares prediction from (4). Then consistency of
the IV estimator of $\beta_1$ requires that at least one element of the set $\{\delta_k\}$ be
nonzero and that $\mathrm{plim}\,[(1/NT)\sum_{i(t),t} \hat{x}_t f_{i(t)}] = 0$.5 The first of these conditions
should hold if $\hat{x}_t$ varies with $t$, as just noted. Whether the second condition will
hold is more questionable, for the nature of the RCS data implies that the fixed
effects in the sample will differ in each cross-section ($t$) because there are
potentially different individuals in each.
The fixed effect $f_{i(t)}$ can be decomposed into its sample mean at each $t$ ($\bar{f}_t$) and
its deviation from that mean ($v_{i(t)}$):
$$f_{i(t)} = \bar{f}_t + v_{i(t)}. \tag{5}$$
The former can be further decomposed into the true mean fixed effect in the total
population ($f^*$), which is time-invariant by necessity - here the assumption of
a closed population is important - and a deviation from that mean arising from
sampling error in the data ($v_t$):
$$\bar{f}_t = f^* + v_t. \tag{6}$$
Thus we have
$$f_{i(t)} = f^* + v_t + v_{i(t)}. \tag{7}$$
Consistency of the IV estimator requires that $\hat{x}_t$ be asymptotically uncorrelated
with $f^*$, $v_t$, and $v_{i(t)}$. Since $f^*$ is a constant, this condition must be met for it. However,
it is met for the latter two variables as well as $N \to \infty$, though not as $T \to \infty$. As
$N \to \infty$, the sampling error $v_t$ goes to 0; further, the distribution of $v_{i(t)}$ becomes
independent of $t$ (and hence of $\hat{x}_t$) because the sample of individuals in each
successive cross-section becomes identical when the total population is sampled,
even if their identities are not known.
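The argument above can be illustrated with a small simulation. The sketch below is not from the paper: the data-generating process, the parameter values, and every variable name are assumptions chosen for illustration. It draws a fresh set of units in each period, so no unit is observed twice, and compares pooled least squares on (3) with the IV estimator that uses the fitted value from a projection of x on functions of t, as in (4).

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 5, 20_000             # periods and units per cross-section (assumed values)
beta0, beta1 = 1.0, 2.0      # assumed true parameters of (3)

t = np.repeat(np.arange(1, T + 1), N).astype(float)   # period of each observation
f = rng.normal(size=T * N)                  # fixed effect of each freshly drawn unit
x = 0.5 * t + f + rng.normal(size=T * N)    # regressor: trends in t, correlated with f
y = beta0 + beta1 * x + f + rng.normal(size=T * N)

# Pooled least squares omits f and is therefore inconsistent for beta1.
X = np.column_stack([np.ones_like(x), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# First stage: project x on functions of t (here a constant and t itself).
G = np.column_stack([np.ones_like(x), t])
x_hat = G @ np.linalg.lstsq(G, x, rcond=None)[0]

# IV estimator using [1, x_hat] as instruments for [1, x].
Z_iv = np.column_stack([np.ones_like(x), x_hat])
b_iv = np.linalg.solve(Z_iv.T @ X, Z_iv.T @ y)

print("pooled OLS slope:", b_ols[1])
print("IV slope:        ", b_iv[1])
```

In this design the pooled slope is pulled away from the assumed value of 2 by the correlation between x and f, while the IV slope should settle near it as N grows with T held fixed.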
Remark 1. The two conditions for consistency are stronger than what is
required for true panel data. With panel data, $\beta_1$ can be identified even if
$\delta_k = 0\ \forall k$ because $\omega_{i(t)t}$ still provides variation over $t$ conditional on $i(t)$.
Furthermore, since $f_{i(t)} = f_i\ \forall t$, unbiased estimates of $\beta_1$ can be obtained even at
finite $N$ by conditioning on $f_i$.
4 However, variation in $x_{i(t)t}$ over $t$ could arise from $\omega_{i(t)t}$ only; see below.
5 By assumption, $\hat{x}_t$ is asymptotically uncorrelated with $\varepsilon_{i(t)t}$.
Remark 2. The restrictions necessary for identification and estimation with
RCS data are needed to circumvent what is a type of missing data problem,
namely, that the past and future values of $y_{it}$ and $x_{it}$ for the same individual $i$ are
missing. But the lack of data on $x_{it}$ at different $t$ for the same $i$ is more serious
than the lack of data on similar $y_{it}$. With true panel data the conventional within
estimator for $\beta_1$ can be written as an instrumental variables estimator with
$(x_{it} - \bar{x}_{i\cdot})$ as instrument ($\bar{x}_{i\cdot}$ = mean over $t$ for each $i$):
$$\hat{\beta}_1(\text{within}) = \frac{\sum_i \sum_t y_{it}(x_{it} - \bar{x}_{i\cdot})}{\sum_i \sum_t x_{it}(x_{it} - \bar{x}_{i\cdot})}. \tag{8}$$
If the available RCS data contained information on histories of the $x_{it}$ of the
individuals in the sample, even if not the $y_{it}$ histories, the right-hand side of (8)
could be computed even for such data. The resulting estimator would be
consistent under the same conditions as is the within estimator because they are
identical.
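The algebraic identity behind (8) is easy to check numerically. The following sketch uses an invented balanced panel (all names and parameter values are assumptions) and compares the textbook within estimator, which demeans both variables, with the IV form in which only the instrument is demeaned.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 4                      # assumed panel dimensions
f = rng.normal(size=(n_units, 1))                # unit fixed effects
x = f + rng.normal(size=(n_units, n_periods))    # regressor correlated with f
y = 1.5 * x + f + rng.normal(size=(n_units, n_periods))

# Deviations from unit means over t.
x_dev = x - x.mean(axis=1, keepdims=True)
y_dev = y - y.mean(axis=1, keepdims=True)

# Conventional within estimator: both y and x demeaned.
b_within = (x_dev * y_dev).sum() / (x_dev ** 2).sum()

# IV form of (8): raw y and x, with (x_it - mean_i x) as the instrument.
b_iv_form = (y * x_dev).sum() / (x * x_dev).sum()

print(b_within, b_iv_form)
```

The two numbers agree up to floating-point error because the unit means of y and x drop out when multiplied by deviations that sum to zero within each unit.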
The model thus far requires time series aggregation - the instrument $\hat{x}_t$ varies
only with $t$, not with $i(t)$.6 More efficient estimates can be obtained if the micro
data contain information on exogenous time-invariant characteristics which are
determinants of $x$. In many models of individual behavior, year of birth qualifies
as such a variable. If there are 'cohort' effects in $x$, then an instrument for
$x_{i(t)t}$ can be based on a linear projection onto functions of both $t$ and $c_{i(t)}$, the
year of birth of individual $i(t)$:
$$x_{i(t)t} = \sum_{k=1}^{K} \delta_{1k} g_{1k}(t) + \sum_{m=1}^{M} \delta_{2m} g_{2m}(c_{i(t)}, t) + \sum_{j=1}^{J} \delta_{3j} g_{3j}(c_{i(t)}) + \omega_{i(t)t}, \tag{9}$$
where $g_{1k}$, $g_{2m}$, and $g_{3j}$ are known functions. Unfortunately, an instrument
based on (9) is likely to be correlated with $f_{i(t)}$ because the fixed effect is likely to
be correlated with cohort as well (indeed, this may be one source of the
covariation between $x_{i(t)t}$ and $f_{i(t)}$). Denoting $\bar{f}_{c(t)}$ as the mean fixed effect for
those sampled at $t$ who were born in year $c$; $v_{i(t)}$ as the deviation of $f_{i(t)}$ from that
mean; $f^*_c$ as the true mean for cohort $c$ in the total population; and $v_{c(t)}$ as the
deviation of $\bar{f}_{c(t)}$ from $f^*_c$, we have
$$f_{i(t)} = f^*_c + v_{c(t)} + v_{i(t)}, \tag{10}$$
in analogy with (7). Since $f^*_c$ can be represented with cohort dummies explicitly
in the main equation (3), consistency again requires that $\hat{x}_{i(t)t}$ be asymptotically
uncorrelated with $v_{c(t)}$ and $v_{i(t)}$. Consistency follows in this case as $N \to \infty$,
holding $T$ and the number of cohorts fixed. The reasoning is identical to that
given previously with the modification that the number of observations in each
cohort-$t$ cell goes to infinity.7
6 The left-hand side need not be grouped w.r.t. (i.e., aggregated over) $t$ for consistent estimation.
See footnote 8 below.
Remark 3. Browning et al. (1985) computed group means of $y_{i(t)t}$ and
$x_{i(t)t}$ within cohort-age cells and regressed the $y$ means on the $x$ means and
cohort dummies. This procedure is a special case of the one outlined here, with
a full set of age, cohort, and cohort-age interaction dummies appearing in (9). It
is well-known that grouping methods are instrumental variables methods, with
cohort and age serving as the grouping criteria in this case. Only age ($t$) is
excluded from the main equation and hence it serves as the identifying grouping
variable, just as in the simpler model.8
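The equivalence between grouping and instrumental variables noted in Remark 3 can be seen in a stripped-down numerical sketch. The example is an illustration only: it assumes equal cell sizes (with unequal cells the grouped regression would need to be weighted by cell size), omits the cohort dummies from the main equation for brevity, and uses invented names and parameter values.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells, n_per_cell = 12, 500                       # assumed cell structure
cell = np.repeat(np.arange(n_cells), n_per_cell)    # cell label of each unit
x = 0.3 * cell + rng.normal(size=cell.size)
y = 1.0 + 0.8 * x + rng.normal(size=cell.size)

# Cell dummies and cell means.
D = (cell[:, None] == np.arange(n_cells)).astype(float)
counts = D.sum(axis=0)
x_bar = (D.T @ x) / counts
y_bar = (D.T @ y) / counts

# (a) Grouping estimator: regress cell means of y on cell means of x.
A = np.column_stack([np.ones(n_cells), x_bar])
b_grouped = np.linalg.lstsq(A, y_bar, rcond=None)[0]

# (b) IV on the micro data with a full set of cell dummies as instruments;
#     the first-stage fitted value is simply the cell mean of x.
x_hat = D @ x_bar
X_micro = np.column_stack([np.ones(cell.size), x])
Z_micro = np.column_stack([np.ones(cell.size), x_hat])
b_iv = np.linalg.solve(Z_micro.T @ X_micro, Z_micro.T @ y)

print(b_grouped, b_iv)
```

The two coefficient vectors coincide because, with equal cell sizes, the IV normal equations are the grouped least squares normal equations multiplied by the common cell size.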
It may be noted that a full set of cohort dummies may not be required in the
main equation. The equation
$$y_{i(t)t} = \beta_0 + \beta_1 x_{i(t)t} + \sum_{j=1}^{J} \gamma_j h_j(c_{i(t)}) + \varepsilon_{i(t)t}, \tag{11}$$
where $h_j$ are known functions, has as a special case equation (3) with (10)
substituted in and with cohort dummies used for the $f^*_c$. In most applications, it
is likely that $y$ will vary smoothly with cohort effects and, hence, those effects will
be representable with fewer parameters than would be required with a full set of
cohort dummies. Efficiency gains could thus result. An example of the degree to
which such parsimony can be achieved in one application is given in the
empirical section below.9
Remark 4. Deaton (1985) noted an errors-in-variables problem with
the Browning et al. procedure that arises because the means of $x_{i(t)t}$ for each
cohort-age cell are error-filled measures of true cohort-age means. In the model
here that problem appears in the terms $v_{c(t)}$ and $v_{i(t)}$. Those terms represent errors
in the cohort dummies for $f^*_c$ which arise because those dummies imperfectly
proxy the cross-section-specific means $\bar{f}_{c(t)}$. Those errors may be correlated with
$\hat{x}_{i(t)t}$ in small samples, but this correlation disappears as $N \to \infty$, as noted
previously.
7 See Verbeek and Nijman (1992) and Angrist (1991) for a discussion of this condition as well.
8 Grouping the dependent variable, $y_{i(t)t}$, is unnecessary for identification and point estimation. It
is straightforward to show that a least squares regression of the cell means of $y$ on the cell means of
$x$ yields estimates identical to those of a regression of the individual $y_{i(t)t}$ on the cell means of $x$.
However, standard errors may be more accurate if $y$ is grouped. The efficiency and optimality of IV
methods is not discussed here, only their identification.
9 Parsimony in (9) is not particularly desirable asymptotically. There is no large-sample efficiency
loss from including all possible functions of $t$ and $c_{i(t)}$ in the instrument. However, there may be
small-sample losses.
Moving to the general case, new issues are raised. The general case can be
written
$$y_{i(t)t} = X_{i(t)t}'\beta + f_{i(t)} + \varepsilon_{i(t)t}, \tag{12}$$
where $X_{i(t)t}$ is now a ($K \times 1$) vector of regressors potentially correlated with $f_{i(t)}$.
Assume that there exists an ($L \times 1$) vector $Z_{i(t)}$ of time-invariant variables [for
individual $i(t)$] including not only cohort but also, for example, sex, race, years
of education (if schooling has been completed), and residential location (if
mobility can be ignored). Also let $W_{i(t)t}$ be an ($M \times 1$) vector of time-varying
variables uncorrelated with $f_{i(t)}$, which may consist only of functions of $t$. Then
the linear projections upon which the IV method is based can be written as
$$X_{i(t)t} = \delta_1 W_{i(t)t} + \delta_2 Z_{i(t)} + \omega_{i(t)t}, \tag{13}$$
$$f_{i(t)} = Z_{i(t)}'\gamma + v_{i(t)}, \tag{14}$$
where $\delta_1$ is a ($K \times M$) matrix of coefficients, $\delta_2$ is a ($K \times L$) matrix of coefficients,
$\omega_{i(t)t}$ is a ($K \times 1$) vector of errors, $\gamma$ is a ($L \times 1$) vector of coefficients, and
$v_{i(t)}$ represents the remaining individual effect conditional on $Z_{i(t)}$. Eq. (14) is
a generalization of (10), with $Z_{i(t)}'\gamma$ capturing the effects of all time-invariant
variables, not just cohort, and with $v_{i(t)}$ in (14) representing both errors in (10)
combined. Letting $y$, $X$, $Z$, and $v$ be the stacked $NT$ vectors for all $i$ and $t$ and
letting $U = [X\ Z]$ and $\hat{U} = [\hat{X}\ Z]$, where $\hat{X}$ is the matrix of least squares
predictions from (13), the IV estimator for $\beta$ and $\gamma$ is $(\hat{U}'U)^{-1}\hat{U}'y$ and consistency requires that $\mathrm{plim}\,[(1/NT)\,\hat{U}'v] = 0$ and that $\hat{U}$ be of full column rank,
($K + L$). As in the simpler case, consistency is achievable only as $N \to \infty$ holding
$T$ fixed.10
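A minimal sketch of the estimator just described follows. The function name `rcs_iv`, the simulated design, and all parameter values are assumptions made for illustration; the function simply forms the first-stage predictions from (13) and then computes the IV estimator from the stacked matrices, checking the rank condition along the way.

```python
import numpy as np

def rcs_iv(y, X, Z, W):
    """IV estimate of (beta, gamma): X is projected on [W, Z], then (U_hat'U)^{-1} U_hat'y."""
    first_stage = np.column_stack([W, Z])
    coef = np.linalg.lstsq(first_stage, X, rcond=None)[0]
    X_hat = first_stage @ coef
    U = np.column_stack([X, Z])
    U_hat = np.column_stack([X_hat, Z])
    if np.linalg.matrix_rank(U_hat) < U_hat.shape[1]:
        raise ValueError("rank condition fails: U_hat is not of full column rank")
    return np.linalg.solve(U_hat.T @ U, U_hat.T @ y)

# Illustrative repeated cross-sections (all values assumed).
rng = np.random.default_rng(3)
T, N = 4, 10_000
n = T * N
t = np.repeat(np.arange(1, T + 1), N).astype(float)
cohort = rng.integers(0, 5, size=n).astype(float)
v = rng.normal(size=n)                        # part of f not explained by Z
x = 0.5 * t + 0.3 * cohort + v + rng.normal(size=n)
f = 1.0 + 0.5 * cohort + v                    # f = Z'gamma + v with gamma = (1.0, 0.5)
y = 2.0 * x + f + rng.normal(size=n)

Z = np.column_stack([np.ones(n), cohort])     # time-invariant variables (with constant)
W = t[:, None]                                # time-varying instrument: t itself
estimate = rcs_iv(y, x[:, None], Z, W)        # order: beta, gamma_0, gamma_1
print(estimate)
```

Passing only functions of cohort in `W` would trip the rank check, since the predicted regressor would then be a linear combination of the columns of `Z`.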
Remark 5. This generalization permits 'grouping' conditional on any set of
time-invariant variables, not just cohort. The sample sizes in most data sets
would not permit literal grouping by cells of $Z_{i(t)}$. The regression-based procedure proposed here is one method for dealing with this problem (literal grouping
by sufficiently large cells is still an alternative).11
10 Also required is $\mathrm{plim}\,[(1/NT)\,\hat{U}'\varepsilon] = 0$; this is assumed throughout. If this assumption fails, e.g.,
because of time-specific shocks, different instruments may be needed. See below.
11 Time-specific variables such as the aggregate unemployment rate may also be entered and used
to 'group' in this regression sense.
Remark 6. There may be difficulties in meeting the rank condition if the
dimension of $X_{i(t)t}$ is high and if $W_{i(t)t}$ only contains functions of $t$, for identification in this case requires an additional independent function of $t$ for each
additional variable in $X_{i(t)t}$. The problem is likely to be especially severe if
functions of $t$ are included themselves in $X_{i(t)t}$, for, while they can serve as their
own instruments, they will make it more difficult to identify the coefficients of
other variables in $X_{i(t)t}$ because additional functions of $t$ will be required.
Remark 7. If $X_{i(t)t}$ is redefined to include time-invariant variables for which
'structural' coefficients in the $\beta$ vector are defined, their coefficients cannot be
identified if those variables are allowed to be correlated with $f_{i(t)}$. This result is
similar to the nonidentification of the coefficients on time-invariant variables in
the panel data model with fixed effects.
Remark 8. If $X_{i(t)t}$ is not strictly exogenous w.r.t. $\varepsilon_{i(t)t}$, then additional restrictions on the choice of instruments may result. For example, if aggregate or
individual shocks affect both $X_{i(t)t}$ and $\varepsilon_{i(t)t}$, instruments which use only variables in an information set at some prior time may appear in (13). More
generally, whatever orthogonality conditions are suggested by theory may be
used to generate the instruments. See Blundell et al. (1989) for a discussion of this
issue in the context of the Browning et al. grouping estimator.
3.2. Autoregressive models
The autoregressive model is
$$y_{i(t)t} = \alpha y_{i(t),t-1} + X_{i(t)t}'\beta + Z_{i(t)}'\gamma + \varepsilon_{i(t)t}, \qquad i = 1, \ldots, N, \quad t = 2, \ldots, T, \tag{15}$$
where all variables are as defined previously, but where $X_{i(t)t}$ is defined to
include only time-varying variables distinct from the vector of time-invariant
variables $Z_{i(t)}$. Estimation of the autoregressive model with RCS data can be
achieved with methods similar to the IV methods used for the fixed effects model
if an instrument for $y_{i(t),t-1}$ can be constructed, though here it must be 2SLS
since the true value of $y_{i(t),t-1}$ is not observed. Consider a linear projection using
the observations at $t - 1$ on $y_{i(t-1),t-1}$:
$$y_{i(t-1),t-1} = W_{i(t-1),t-1}'\delta_1 + Z_{i(t-1)}'\delta_2 + \omega_{i(t-1),t-1}, \tag{16}$$
where $W_{i(t-1),t-1}$ is a vector of time-varying variables. The vector of time-invariant variables is assumed to be identical in (15) and (16) although it is
conceivable that they may differ.
Inserting a predicted variable $\hat{y}_{i(t),t-1}$ obtained from least-squares estimation
of (16) (using $W_{i(t),t-1}$ and $Z_{i(t)}$) into (15) in place of $y_{i(t),t-1}$ and applying least
squares will yield consistent estimates of $\beta$ and $\gamma$ provided that $\hat{y}_{i(t),t-1}$ is
asymptotically uncorrelated with $\varepsilon_{i(t)t}$. Identification of the coefficients requires
that the stacked matrix $[\hat{y}\ X\ Z]$ be of full column rank.
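The two-step procedure can be sketched on simulated repeated cross-sections. Everything in this example is an assumption made for illustration: the model has no time-varying regressors, a single time-invariant variable z, and a process that starts from zero, and the first stage is run separately in each period (equivalent to letting the time-varying instruments be period dummies and their interactions with z). Each unit's history is simulated but only its current outcome is kept, so the lagged outcome is never observed for the units used in the second stage.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 5, 50_000                      # periods and units per cross-section (assumed)
alpha, g0, g1 = 0.5, 1.0, 1.0         # assumed true parameters

def draw_cross_section(t):
    """Draw N fresh units; return their z and their outcome at period t only."""
    z = rng.normal(size=N)
    y = np.zeros(N)                   # process assumed to start from y_0 = 0
    for _ in range(t):
        y = alpha * y + g0 + g1 * z + rng.normal(size=N)
    return z, y

samples = {t: draw_cross_section(t) for t in range(1, T + 1)}

rows_X, rows_y = [], []
for t in range(2, T + 1):
    z_prev, y_prev = samples[t - 1]
    z_now, y_now = samples[t]
    # First stage on the t-1 sample: project y on a constant and z.
    d = np.linalg.lstsq(np.column_stack([np.ones(N), z_prev]), y_prev, rcond=None)[0]
    # Predicted lagged outcome for the (different) units sampled at t.
    y_lag_hat = d[0] + d[1] * z_now
    rows_X.append(np.column_stack([y_lag_hat, np.ones(N), z_now]))
    rows_y.append(y_now)

# Second stage: pooled least squares with the predicted lag in place of the true lag.
X2 = np.vstack(rows_X)
y2 = np.concatenate(rows_y)
alpha_hat, g0_hat, g1_hat = np.linalg.lstsq(X2, y2, rcond=None)[0]
print(alpha_hat, g0_hat, g1_hat)
```

Identification here comes entirely from the fact that the first-stage coefficients differ across periods; if they were constant in t, the predicted lag would be collinear with the constant and z and the rank condition would fail.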
Remark 1. As in the simple case considered in the fixed effects model,
the $Z_{i(t)}$ could consist of cohort functions and the $W_{i(t)t}$ vector could consist
of functions of $t$ alone. The model would then be clearly identified by
the variation in $t$ in the instrument. Inserting the mean $\bar{y}_{i(t-1),t-1}$ of the
sample at $t - 1$ in the same cohort as that at $t$ is a simple example of such an
instrument. But (15) and (16) generalize that method to one permitting regression-based 'grouping' on any other set of time-invariant variables $Z_{i(t)}$ and any
other time-varying variables that may be available. Indeed, even a single
cross-section may be used to estimate the model if there are no cohort effects, if
$t$ is defined as age, and if $\hat{y}_{i(t),t-1}$ is constructed from the units aged $t - 1$ in the
same cross-section.
Remark 2. Construction of an instrument $\hat{y}_{i(t),t-1}$ using variables in $W_{i(t)t}$
other than $t$ itself requires knowing the history of those variables for
the cross-sectional units at $t$. This is a strong data requirement unlikely
to be satisfied by most RCS data sets. Variables that may be thought to be
in $W_{i(t)t}$ but which are not observed must themselves be projected onto functions
of $t$ and whatever time-varying variables are observed in much the same manner
as $x_{i(t)t}$ was projected in the fixed effects model.12 The rank condition for
identification may be difficult to meet in this circumstance if $X_{i(t)t}$ itself contains
time-varying functions of $t$ or functions of $t$ itself.
12 It should be noted that some time-varying variables like the number and ages of children may
be backcast with considerable accuracy, and that aggregate variables such as the unemployment rate
will presumably also be measurable in the past.
Remark 3. A more formal approach to the problem may be taken by imposing
the structure (15) on all previous $t$ and by constructing an instrument for
$y_{i(t),t-1}$ from the reduced form of the equation. For illustration, assume that the
process has a finite start date and that the initial value of $y_{i(t)t}$ is determined by
the function
$$y_{i(t)1} = X_{i(t)1}'\beta + Z_{i(t)}'\gamma + \varepsilon_{i(t)1}, \tag{17}$$
which is simply (15) evaluated at $y_{i(t)0} = 0$. A more general formulation could be
given in which the variables, coefficients, and error variance in (17) are less
strictly tied to those in (15). The reduced form of (15) can be shown to be
$$y_{i(t),t-1} = \sum_{\tau=1}^{t-1} \alpha^{t-1-\tau} X_{i(t)\tau}'\beta + Z_{i(t)}'\gamma \left( \frac{1 - \alpha^{t-1}}{1 - \alpha} \right) + v_{i(t),t-1}, \tag{18}$$
where
$$v_{i(t),t-1} = \sum_{\tau=1}^{t-1} \alpha^{t-1-\tau} \varepsilon_{i(t)\tau}, \tag{19}$$
which can be estimated on the $t - 1$ sample and used to construct an instrument
for the $t$ sample. Note that in this case the absence of any time-varying variables
$X_{i(t)t}$ in the model still permits identification because the autoregressive structure implies that the cumulative effects of $Z_{i(t)}$ on $y_{i(t),t-1}$ vary with $t$, thus
permitting the construction of an instrument not linearly dependent on $Z_{i(t)}$ in
(15). More flexible specifications of the autoregressive structure of the model
would generate less rigid forms of the $t$-dependence of the effect of $Z_{i(t)}$ in (18).
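The reduced form in Remark 3 can be checked by iterating the autoregressive equation forward from a zero start and comparing the result with the closed-form expression. The sketch below does this for one hypothetical unit with a scalar regressor; all numerical values and names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, gamma = 0.6, 1.2, 0.7     # assumed parameters
t = 6                                  # period at which the unit is sampled
z = 1.5                                # time-invariant characteristic
x = rng.normal(size=t)                 # x[tau - 1] is the regressor at period tau
eps = rng.normal(size=t)               # eps[tau - 1] is the shock at period tau

# Iterate the structural equation from y_0 = 0 up to period t - 1.
y_lag = 0.0
for tau in range(1, t):
    y_lag = alpha * y_lag + beta * x[tau - 1] + gamma * z + eps[tau - 1]

# Closed form: geometric weights alpha**(t-1-tau) on past x and past shocks,
# and a cumulated effect of z equal to gamma * z * (1 - alpha**(t-1)) / (1 - alpha).
taus = np.arange(1, t)
weights = alpha ** (t - 1 - taus)
y_lag_closed = (
    beta * (weights * x[: t - 1]).sum()
    + gamma * z * (1 - alpha ** (t - 1)) / (1 - alpha)
    + (weights * eps[: t - 1]).sum()
)
print(y_lag, y_lag_closed)
```

The multiplier on z depends on t through the term in alpha, which is the source of identification described in the remark when no time-varying regressors are present.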
Remark 4. It may be of interest to note that least squares estimates of (15) on
true panel data using the observed values of $y_{i(t),t-1}$ will generate inconsistent
estimates of the parameters if there is individual-specific serial correlation in
$\varepsilon_{i(t)t}$ (e.g., from 'unobserved heterogeneity'). But this problem does not appear
with RCS data because IV methods are used by necessity. These methods are
consistent in the presence of such heterogeneity.13
13 Note that one cannot use the lagged $y_{i(t),t-1}$ as an instrument - that is not observable. Using it
in true panel data would be erroneous if there is first-order serial correlation. Instead, lagged
$y_{i(t-1),t-1}$ must be used, but that variable is identical for all individuals $i(t)$ who are of the same age
or cohort.
4. Discrete dependent variables
4.1. Fixed effects models
Only binary choice models will be considered; extensions to other discrete
models and to limited dependent variable models are straightforward. Write the
fixed effects model as
$$y^*_{i(t)t} = X_{i(t)t}'\beta + f_{i(t)} + \varepsilon_{i(t)t}, \tag{20}$$
$$y_{i(t)t} = \begin{cases} 1 & \text{if } y^*_{i(t)t} \geq 0, \\ 0 & \text{otherwise.} \end{cases} \tag{21}$$
Assuming $\varepsilon_{i(t)t} \sim N(0, 1)$, the issue is whether the IV or 2SLS methods discussed
in section 3.1 can be applied here. 2SLS methods for limited dependent variable
models without fixed effects were first used by Nelson and Olsen (1978), who
simply estimated a Tobit model with predicted regressors. Amemiya (1979)
demonstrated the consistency of the estimator and derived its asymptotic
covariance matrix. A number of other consistent 2SLS and IV methods for
probit and Tobit have since been developed; Newey (1987) considers their
relative efficiencies. Thus consistency of 2SLS in such models has been established. Identification requires meeting the same conditions as in the linear
model.
Relatively little is altered when the fixed effects model in (20)-(21)
is considered, although an additional normality assumption is required. In the
case where there are no cohort effects, insertion of (7) into (20) generates
a composite error term $v_t + v_{i(t)} + \varepsilon_{i(t)t}$. As $N \to \infty$, $v_t \to 0$, as before, and,
further, the distribution of the within-period individual fixed effects $v_{i(t)}$
converges to a distribution uncorrelated with the instruments for $X_{i(t)t}$. With
the additional (not necessarily innocuous) assumption of normality of
$v_{i(t)} + \varepsilon_{i(t)t}$, the consistency results in the articles cited in the previous paragraph
are immediately applicable. Unlike the case with true panel data, where
large $T$ is required for consistent estimation of logit and probit fixed effects
models [Chamberlain (1980), Heckman (1981a)], here the individual fixed
effects are not estimated and hence no inconsistency in their estimates at finite
$T$ is transmitted to $\beta$.
If cohort fixed effects are estimated, consistency of the 2SLS-estimator-plus-cohort-dummies requires that $N \to \infty$ while the number of cohorts is held
fixed, or, alternatively, that $N/N_c \to \infty$, where $N_c$ is the number of cohorts.
However, this requirement is present even in the linear model, where the cohort
dummies are already error-filled proxies for true mean cohort fixed effects in
finite-$N$ samples. Thus we have consistency only with $N \to \infty$ in both the linear
and probit models. The same applies for the general case shown in (20) using (13)
to construct instruments.
4.2. Markov models
A first-order autoregressive model between binary outcome variables is
equivalent to a first-order Markov model.14 That model can be characterized by
two transition rates, one each for the probabilities of inflow and outflow from
each of the two states. Define the model as follows:15
$$p_{it} = \mathrm{Prob}(y_{it} = 1), \tag{22}$$
$$\mu_{it} = \mathrm{Prob}(y_{it} = 1 \mid y_{i,t-1} = 0), \tag{23}$$
$$\lambda_{it} = \mathrm{Prob}(y_{it} = 0 \mid y_{i,t-1} = 1). \tag{24}$$
14 If the lagged endogenous variable is $y^*_{i,t-1}$ rather than $y_{i,t-1}$, simpler methods more directly
analogous to those in the linear autoregressive model can be used.
Then we have the accounting identity
$$p_{it} = \mu_{it}(1 - p_{i,t-1}) + (1 - \lambda_{it})\,p_{i,t-1} = \mu_{it} + \eta_{it}\,p_{i,t-1}, \tag{25}$$
where $\eta_{it} = 1 - \lambda_{it} - \mu_{it}$. Eq. (25), a standard flow equation in the literature on
Markov processes, relates the two marginal probabilities at $t$ and $t - 1$ to the
two transition rates.
That the parameters of (25) are identified with panel data in cases in which
they are not with RCS data is intuitive and forms the basis for a fundamental
nonidentification result for RCS data. For example, reinterpreting the cross-sectional index $i$ as indexing groups of cross-sectional units within which
frequency estimates of $\mu$ and $\lambda$ can be obtained with a given panel sample,
nonidentification in the RCS case can be seen from (25) by merely noting that
the two parameters $\mu_{it}$ and $\lambda_{it}$ for group $i$ at period $t$ cannot be identified solely
from knowledge of $p_{it}$ and $p_{i,t-1}$. Thus a model with $\mu_{it}$ and $\lambda_{it}$ completely
unrestricted with respect to $i$ and $t$ cannot be estimated with RCS data.16
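The nonidentification result can be made concrete with a two-line calculation. In the sketch below (the function name and the numerical values are illustrative assumptions), two quite different pairs of transition rates carry the same marginal probability at t - 1 into the same marginal probability at t, so the two marginals alone cannot distinguish them.

```python
import math

def next_marginal(p_prev, mu, lam):
    """Flow identity: P(y_t = 1) from P(y_{t-1} = 1), inflow rate mu and outflow rate lam."""
    return mu * (1.0 - p_prev) + (1.0 - lam) * p_prev

p_prev = 0.4                                      # assumed marginal at t - 1
p_low = next_marginal(p_prev, mu=0.3, lam=0.2)    # low-turnover process
p_high = next_marginal(p_prev, mu=0.5, lam=0.5)   # high-turnover process
same_marginal = math.isclose(p_low, p_high)
print(p_low, p_high, same_marginal)
```

A panel would separate the two cases immediately, because the share of units changing state between the two periods differs sharply between them.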
15 The notation for $i(t)$ is not used in this section for simplicity.
16 More generally, in a transition model with $L$ possible states rather than 2, the $L(L - 1)$ unique
elements of a $L \times L$ transition matrix cannot be estimated from the marginal probabilities at $t - 1$
and $t$ alone.
Identification is possible with restrictions imposed over $i$ and/or $t$, however.
Consider two examples.
Example 1: Time-homogeneous and unit-homogeneous hazards. If $\mu_{it} = \mu$ and
$\lambda_{it} = \lambda$ for all $i$ and $t$, then, letting $\eta = 1 - \lambda - \mu$, we have
$$p_{it} = \mu + \eta\,p_{i,t-1}, \tag{26}$$
with reduced form
$$p_{it} = \mu\,\frac{1 - \eta^{t}}{1 - \eta}, \tag{27}$$
assuming $p_{i0} = 0$.17 Thus the profile of $p_{it} = p_t$ over $t$ is a two-parameter
function; hence both $\mu$ and $\lambda$ are identified provided estimates of $p_t$ at two
different $t$ are available. This result obviously generalizes to cases in which the
hazards are homogeneous w.r.t. $t$ but only within groups defined over $i$.
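A small sketch shows how two points on the profile pin down both hazards in Example 1. It assumes the process starts from a zero initial probability, takes the two observed periods to be t = 1 and t = 2 so that the inversion is available in closed form, and uses hypothetical function names and values; other pairs of periods would require a numerical solution.

```python
def marginal(t, mu, lam):
    """Profile implied by iterating p_t = mu + eta * p_{t-1} from p_0 = 0."""
    eta = 1.0 - lam - mu
    return mu * (1.0 - eta ** t) / (1.0 - eta)

def hazards_from_p1_p2(p1, p2):
    """Invert the profile at t = 1 and t = 2, where p_1 = mu and p_2 = mu * (1 + eta)."""
    mu = p1
    eta = p2 / p1 - 1.0
    return mu, 1.0 - eta - mu          # (mu, lambda)

true_mu, true_lam = 0.2, 0.1           # assumed hazards
p1 = marginal(1, true_mu, true_lam)
p2 = marginal(2, true_mu, true_lam)
mu_rec, lam_rec = hazards_from_p1_p2(p1, p2)
print(mu_rec, lam_rec)
```

With sample frequencies in place of the exact marginals, the same inversion gives estimates whose sampling error shrinks as the cross-sections grow.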
Models with this type of homogeneity imposed have been studied extensively
in the statistical literature, where the predominant approach considers least
squares estimation of (26) using aggregate frequency estimates of the $p_{it} = p_t$,
thus identifying the hazards from an estimated slope and intercept [Miller
(1952), Madansky (1959), Lee et al. (1970), Lawless and McLeish (1984),
Kalbfleisch and Lawless (1985)]. These studies typically ignore the boundedness
of the variable $p_t$ as well as the presence of sampling error in the frequency
estimates and consequent errors-in-variables difficulties in least squares estimation.18
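The least squares approach described above can be sketched as a regression of p_t on p_{t-1}, with the intercept read as the entry rate mu and the slope as eta = 1 - lam - mu. The sketch uses an exact, noise-free profile generated from hypothetical hazards, so it recovers them exactly; with sampled frequencies the regressor would carry sampling error, which is the errors-in-variables difficulty noted in the text. All names and values are illustrative assumptions.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def hazards_by_least_squares(p_series):
    """Regress p_t on p_{t-1}; intercept = mu, slope = 1 - lam - mu."""
    intercept, slope = ols_line(p_series[:-1], p_series[1:])
    mu = intercept
    lam = 1.0 - slope - mu
    return mu, lam


def exact_profile(mu, lam, periods):
    """Noise-free marginals p_1, ..., p_T generated from p_0 = 0."""
    p, out = 0.0, []
    for _ in range(periods):
        p = mu + (1.0 - lam - mu) * p
        out.append(p)
    return out


# Hypothetical hazards used to build the aggregate series.
series = exact_profile(0.20, 0.10, 6)
mu_hat, lam_hat = hazards_by_least_squares(series)
print(mu_hat, lam_hat)
```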
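Example 2: Linear probability model for hazards. Let $\mu_{it} = X_{it}\theta_1$ and
$\lambda_{it} = 1 - X_{it}(\theta_1 + \theta_2)$, where $X_{it}$ is a vector of observables for unit i at time t (perhaps
including independent functions of t). Then, since $\eta_{it} = 1 - \lambda_{it} - \mu_{it} = X_{it}\theta_2$, we have
$p_{it} = X_{it}\theta_1 + (X_{it}\theta_2)\, p_{i,t-1}$.   (28)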
Estimation of (28) can be considered using methods analogous to the 2SLS
methods proposed for the autoregressive linear model. Construction of a predicted
value $\hat{p}_{i,t-1}$, possibly by explicit calculation of the reduced form for
(28), and substitution of the dichotomous $y_{it}$ and $y_{i,t-1}$ for $p_{it}$ and $p_{i,t-1}$ leads to
the same identification condition as that for the linear model, namely, that the
predicted value $\hat{p}_{i,t-1}$ have sufficient variation independent of $X_{it}$ to meet the
rank condition, which will often mean variation over t.19 The rank condition
will involve interactions of $\hat{p}_{i,t-1}$ and $X_{it}$ as well since the parameters $\theta_2$ are
identified from such interactions.20
This example demonstrates that the proportionality restrictions inherent in
the index functions $X_{it}\theta_1$ and $X_{it}\theta_2$ are themselves sufficient for identification of
transition rates from RCS data. However, while such restrictions are not
17The value of $p_{i0}$ is not the first observed outcome of the process - that is, $p_{i1}$ - but is instead the
value of the state prior to the beginning of the process. As most outcome variables are defined,
$p_{i0}$ will be zero - an individual is unemployed at the beginning of an unemployment spell, not
working prior to entering the labor force, unmarried at the beginning of the lifetime, and so on.
18However, some authors consider ML estimation to address the first problem, though still taking
the sample estimate of $p_{t-1}$ as exogenous and error-free.
19Note that this implies that the parameters of transition rates cannot be identified for steady state
processes.
20As in the linear case, it can also be shown that the predicted value $\hat{p}_{i,t-1}$ will vary with t even if
all the $X_{it}$ are time-invariant.
necessary with true panel data, in practice most studies using panel data
nevertheless impose such index function restrictions in the specification of their
transition rates.
Imposing the index function restriction in a proper model of binary choice
requires leaving the linear probability model. The specification and estimation
of a probit model illustrates such a proper probability model identifiable with
RCS data. Let
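$y_{it}^{*} = X_{it}\theta_1 + (X_{it}\theta_2)\, y_{i,t-1} + \varepsilon_{it}$,   (29)
$y_{it} = 1$ if $y_{it}^{*} \geq 0$, $y_{it} = 0$ otherwise,   (30)
where $\varepsilon_{it} \sim N(0, 1)$. The hazard rates are
$\mu_{it} = F(X_{it}\theta_1)$,   (31)
$\lambda_{it} = 1 - F[X_{it}(\theta_1 + \theta_2)]$,   (32)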
where F is the unit normal c.d.f. Consistent instrumental variable and two-stage
estimation methods are not available for this model because of the nonlinear
errors-in-variables problem that would be created by instrumenting $y_{i,t-1}$.
However, the reduced form can be estimated directly. Letting
$p_{it} = \mathrm{Prob}(y_{it} = 1)$, the reduced form for $p_{it}$ can be shown to be
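$p_{it} = \mu_{it} + \sum_{r=1}^{t-1} \mu_{i,t-r} \prod_{s=t-r+1}^{t} \eta_{is}$,   (33)
where $\eta_{is} = 1 - \lambda_{is} - \mu_{is}$. Consistent estimates of $\theta_1$ and $\theta_2$ can be obtained by
maximization of
$L = \sum_i \sum_t [\, y_{it} \log(p_{it}) + (1 - y_{it}) \log(1 - p_{it}) \,]$,   (34)
using (31) and (32) in (33).21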
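The following sketch shows one way the reduced form and the log likelihood could be computed. It is an illustration under stated assumptions, not the estimation code used for the results reported later: the covariate history (a constant and age/10 for ages 20 to 24), the parameter values, and all function names are hypothetical. The marginal probability is computed both by iterating the flow equation from an initial state of zero and by the sum-of-products form of (33); the two agree.

```python
import math


def norm_cdf(z):
    """Standard normal c.d.f. F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hazards(x, theta1, theta2):
    """Entry rate mu = F(x theta1), exit rate lam = 1 - F(x (theta1 + theta2))."""
    idx1 = sum(a * b for a, b in zip(x, theta1))
    idx2 = sum(a * b for a, b in zip(x, theta2))
    return norm_cdf(idx1), 1.0 - norm_cdf(idx1 + idx2)


def marginal_recursive(x_hist, theta1, theta2):
    """Iterate p_s = mu_s + eta_s * p_{s-1} over the history, from p_0 = 0."""
    p = 0.0
    for x in x_hist:
        mu, lam = hazards(x, theta1, theta2)
        p = mu + (1.0 - lam - mu) * p
    return p


def marginal_closed_form(x_hist, theta1, theta2):
    """Sum-of-products form: mu_t + sum_r mu_{t-r} * prod_{s=t-r+1..t} eta_s."""
    rates = [hazards(x, theta1, theta2) for x in x_hist]
    mus = [mu for mu, _ in rates]
    etas = [1.0 - lam - mu for mu, lam in rates]
    t = len(x_hist)
    total = mus[-1]
    for r in range(1, t):
        prod = 1.0
        for s in range(t - r, t):  # zero-based indices of periods t-r+1, ..., t
            prod *= etas[s]
        total += mus[t - r - 1] * prod
    return total


def log_likelihood(sample, theta1, theta2):
    """Sum of y*log(p) + (1-y)*log(1-p) over (y, covariate history) pairs."""
    ll = 0.0
    for y, x_hist in sample:
        p = marginal_recursive(x_hist, theta1, theta2)
        ll += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return ll


# Hypothetical history: constant and age/10 for ages 20..24.
history = [(1.0, age / 10.0) for age in range(20, 25)]
theta1, theta2 = (-1.0, 0.2), (1.5, 0.1)

p_rec = marginal_recursive(history, theta1, theta2)
p_sum = marginal_closed_form(history, theta1, theta2)
ll = log_likelihood([(1, history), (0, history[:3])], theta1, theta2)
print(p_rec, p_sum, ll)
```

In practice the log likelihood would be passed to a numerical optimizer over theta1 and theta2; that step is omitted here.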
Computing pit by means of (33) is equivalent to integrating out all possible
histories for each individual i at time t to derive an expression for the marginal
probabilities that are observed. This integration over histories is identical to the
formal solution to the initial conditions problem in panel data [Heckman
21Including fixed effects in the model cannot be addressed with predicted values of X as discussed
previously because of similar nonlinear errors-in-variables
problems. In this case the function in (13)
could be estimated jointly with (29)-(30) using FIML.
(1981b, pp. 181-185)]. The problem in that case arises when several periods of
a process are observed for an agent but the beginning of the process is not; hence
the unobserved history must be integrated out. The situation here is identical
except that there is only a single observed time period.
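The equivalence between computing the marginal probability recursively and integrating out all possible histories can be checked by brute force for a short process. The sketch below enumerates every 0/1 path, keeps those ending in state 1, and sums their probabilities; the hazard values and function names are hypothetical, and the state before the first period is assumed to be 0.

```python
from itertools import product


def marginal_by_enumeration(mus, lams):
    """Sum the probabilities of all histories that end in state 1."""
    total = 0.0
    for path in product((0, 1), repeat=len(mus)):
        if path[-1] != 1:
            continue
        prob, prev = 1.0, 0  # the state prior to the first period is 0
        for y, mu, lam in zip(path, mus, lams):
            p_one = (1.0 - lam) if prev == 1 else mu
            prob *= p_one if y == 1 else (1.0 - p_one)
            prev = y
        total += prob
    return total


def marginal_by_recursion(mus, lams):
    """Iterate p_t = mu_t * (1 - p_{t-1}) + (1 - lam_t) * p_{t-1} from p_0 = 0."""
    p = 0.0
    for mu, lam in zip(mus, lams):
        p = mu * (1.0 - p) + (1.0 - lam) * p
    return p


# Hypothetical time-varying entry and exit rates over five periods.
mus = [0.30, 0.25, 0.20, 0.20, 0.15]
lams = [0.10, 0.15, 0.20, 0.25, 0.30]

p_enum = marginal_by_enumeration(mus, lams)
p_rec = marginal_by_recursion(mus, lams)
print(p_enum, p_rec)
```

The enumeration grows as 2^t, so the recursion is the practical route; the enumeration only serves to confirm that the two computations coincide.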
It may be noted that the assumption of a scalar covariance matrix for $\varepsilon_{it}$ is less
innocuous in this model than in the linear autoregressive model. Whereas the
presence of serial correlation in the linear model, such as that generated by
unobserved heterogeneity, only affects the variance of the reduced form error
term in (18), in the probit model it affects the form of the reduced form
expression (33), where the elements of the covariance matrix of $\varepsilon_{it}$ would enter
nonlinearly with the regressor variables in $X_{it}$ because the complete distribution
of $\varepsilon_{it}$ would have to be integrated out. No simple identification restrictions can
be derived guaranteeing separation of the elements of the covariance matrix and
the parameters of the transition function in this case.22
5. Empirical illustrations
The major issue raised by the discussion is whether the identification restrictions necessary for the estimation of the various dynamic models with RCS data
are indeed met in common areas of application. That exercise requires the use of
panel data and therefore will be conducted in future work. Instead, this section
provides two illustrations of the estimation of dynamic models with RCS data,
one a linear fixed effects model and one a Markov model. The first is treated
only briefly since it is less novel than the latter.
5.1. Linear fixed effects
To illustrate the IV method for the linear fixed effects model, the life cycle
model of labor supply of Browning et al. (1985) is estimated with U.S. RCS
data (Browning et al. used U.K. data). In the model, $y_{it}$ is hours of work and $x_{it}$ is
the log of the real discounted hourly wage rate. The fixed effects are given
a specific interpretation, as including the marginal utility of wealth, and the
coefficient on $x_{it}$ is interpreted as an intertemporal substitution effect of this
wage. The data set is the U.S. Current Population Survey (CPS) and the sample
includes white males 20-59 from 21 annual waves, 1968 to 1988.23
22A less severe problem also arises with incomplete panel data when the covariance matrix
estimated for the included periods must be extrapolated to unobserved pre-sample periods.
23Only the March files are used. Hours of work are measured as the annual amount over the year
prior to the survey and the wage rate is measured as the ratio of earnings in the prior year to hours
worked. The wage is discounted with annualized three-month T-bill rates using a 1978 base. To keep
the estimation problem manageable and to make the sample size roughly the same as that used by
Browning et al., the data are randomly subsampled down to a total of 15,500 over the years.
Table 1 shows IV estimates of eq. (3) with $f_i$ projected onto cohort and
individual characteristics for education, number of children, family size, and
regional location, and with $x_{it}$ projected onto these same variables plus age.24
Year dummies are also included in both hours and wage equations for consistency with Browning et al. Columns (1) and (2) show wage and hours estimates,
respectively, with a relatively unrestricted age and cohort specification, and
columns (3) and (4) show estimates after a specification search on the form of the
age and cohort effects. F-tests on the age, cohort, year effects, and on the seven
individual characteristics are also shown.
Two conclusions can be drawn from the results. First, a considerable amount
of parsimony is achieved in the specification of age and cohort effects. The
unrestricted specification in the first two columns contains 27 parameters for age
and cohort effects in the wage equation and 12 parameters for cohort effects in
the hours equation. Although this is much more restricted than a full specification of all single-year cohort and age effects and interactions (the sample size is
insufficient for such a specification in any case), most of the age and cohort
parameters are insignificant. The more restrictive model shown in the table,
which has quadratic age and cohort effects in the wage equation and cubic
cohort effects in the hours equation, cannot be rejected at the 10 percent level
but only contains seven age-cohort parameters in the wage equation and three
cohort parameters in the hours-worked equation.
Second, the relative sizes of the F-statistics imply that the presence of the
individual characteristics is considerably more important than either age, cohort, or year effects. Thus the IV procedure proposed in section 3 above, which
makes full use of the individual micro data and of the cross-sectional variation in
individual characteristics within age-cohort cells, significantly improves the fit
of the model.25
5.2. Markov model
To illustrate the Markov model the CPS data are used to estimate a model of
female labor supply. The dependent variable $y_{it}$ is defined to equal 1 if the
individual is employed and 0 if not. The model in (29)-(30) is estimated with ML
by maximizing the likelihood function in (34). The sample includes white
married women 20-59 in the 21 years 1968-1988.
24Some of the individual characteristics are time-varying, inconsistent with an interpretation as
fixed effects. They may alternatively be considered as time-varying variables in $X_{it}$ in (12) for which
no instruments are needed.
25The wage coefficient in column (4) implies an intertemporal wage elasticity of 0.68, considerably
larger than that obtained by Browning et al. or, for example, MaCurdy (1981). It appears to be
a result neither of the parsimony of the age and cohort effects nor of the inclusion of the individual
characteristics; the difference seems to arise from the differences in the data sets used. However, the
year effects are still significant.
Table 1
Labor supply and wage estimates for white males; n = 15,500.^a

                            Unrestricted age and              Restricted age and
                            cohort effects^b                  cohort effects^c
                            Log discounted   Annual hours     Log discounted   Annual hours
                            hourly wage      worked           hourly wage      worked
                            (1)              (2)              (3)              (4)

Log discounted              -                964.6^e          -                1037.3^e
  hourly wage^d                              (139.1)                           (142.7)

Individual characteristics
Education                   0.056^e          -28.0^e          0.056^e          -32.3^e
                            (0.002)          (8.1)            (0.002)          (8.3)
No. children < 6            0.011            6.7              0.010            4.9
                            (0.012)          (11.2)           (0.012)          (11.2)
No. children 6-17           0.002            12.8             -0.001           12.9
                            (0.010)          (9.3)            (0.010)          (9.3)
Family size                 0.006            -6.0             0.008            -6.4
                            (0.008)          (7.9)            (0.008)          (7.9)
Northeast                   0.116^e          -159.0^e         0.116^e          -167.9^e
                            (0.016)          (22.2)           (0.016)          (22.5)
North central               0.062^e          -38.0^e          0.063^e          -43.4^e
                            (0.015)          (16.8)           (0.015)          (17.0)
West                        0.074^e          -119.2^e         0.075^e          -125.6^e
                            (0.015)          (18.1)           (0.015)          (18.3)

F-statistics
Age effects                 6.4^e (8)        -                10.0^e (5)       -
Cohort effects              4.2^e (12)       2.7^e (12)       7.3^e (5)        [illegible]
Year effects                1.9^e (20)       5.5^e (20)       1.8^e (20)       [illegible]
Individual characteristics  151.1^e (7)      14.3^e (7)       151.8^e (7)      15.1^e (7)

R-squared                   0.123            0.032            0.122            0.031

^a Standard errors in parentheses; for F-statistics, number of parameters in parentheses.
^b Wage equation: Piecewise-linear age splines at five-year intervals from age 20 to 59; piecewise-linear
cohort splines at five-year intervals from 1908 to 1967; interactions of linear cohort variable
with all age splines; year dummies. Labor supply equation: Same but without age-related variables.
^c Wage equation: Age, age squared, cohort, cohort squared, age x cohort, age x cohort squared,
cohort x age squared, year dummies. Labor supply equation: cohort, cohort squared, cohort cubed,
year dummies.
^d Predicted from wage equation.
^e Significant at 10% level.
Table 2 shows the results of three specifications of the hazards of varying
degrees of simplicity. Column (1) presents estimates of a model with constant
hazards and implies transition rates of $\mu = 0.43$ and $\lambda = 0.35$ [see (31)-(32) for
definitions]. These annual transition rates are implausibly high and exceed those
typically found in panel data, for example, by Heckman and Willis (1977) who
found $\mu = 0.139$ and $\lambda = 0.144$.26 However, a model with constant hazards is
a gross violation of the data, for it implies a monotonically rising lifecycle profile
of employment rates [see (27)], contrary to the actual shapes of such profiles (see
below).
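The monotonicity claim for constant hazards can be checked with a short calculation. The sketch below iterates the flow equation from an initial employment probability of zero using the transition rates reported for column (1), an entry rate of 0.43 and an exit rate of 0.35; the function name and the ten-period horizon are assumptions made for illustration. Because 1 - 0.35 - 0.43 = 0.22 is positive, the profile rises at every step toward the steady state mu/(mu + lam).

```python
def employment_profile(mu, lam, periods):
    """Marginal employment probabilities p_1..p_T under constant hazards."""
    p, profile = 0.0, []
    for _ in range(periods):
        p = mu + (1.0 - lam - mu) * p
        profile.append(p)
    return profile


mu, lam = 0.43, 0.35          # transition rates reported for column (1)
profile = employment_profile(mu, lam, 10)
steady_state = mu / (mu + lam)

for t, p in enumerate(profile, start=1):
    print(f"t={t:2d}  p_t={p:.4f}")
print(f"steady state = {steady_state:.4f}")
```

A profile of this kind cannot reproduce a life cycle pattern in which employment rates fall over some age range.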
The second column shows an expanded version of this model with two
time-invariant variables, education and cohort, and one time-varying variable,
the U.S. unemployment rate for prime-age males, which can be backcast over
the past periods of the life cycle for each observation in the data.27 The
parameters are again well-determined, with entry rates positively affected by
education and cohort and with exit rates negatively affected by education and
positively affected by cohort. Higher unemployment rates lower entry rates, but
lower exit rates as well, a result to be discussed more momentarily.
Column (3) shows the most well-specified model, including quadratic age
terms in both hazards (age can be perfectly backcast), variables for children
(backcast from current numbers and ages of children in the household), and
three initial conditions variables. The initial conditions variables permit the first
transition in the process, $\mu_{i1}$ (i.e., entry starting from $p_{i0} = 0$ at age 19 to some
positive value at age 20) to differ from the rest of the profile of entry rates by an
amount varying by education and cohort. Most of the parameters of the model
are fairly precisely determined. Moreover, the results are consistent with findings from panel data: both entry and exit are concave in age and are affected by
children of different ages in the expected directions. The unemployment rate
decreases the exit rate, consistent with added-worker behavior.
Fig. 1 shows the actual and fitted employment probabilities in the data as well
as the estimated transition rates. The model fits the actual profile reasonably
well except at the very beginning of the life cycle, where an initial rise and
subsequent fall in actual employment rates is not fully captured; the introduction of more age terms would improve this fit. More interesting are the estimated transition rates shown in the figure, which ‘explain’ the employment
profile by the pattern of entry and exit rates shown. Both entry and exit rates rise
initially though the former eventually dominates and employment probabilities
rise, but entry rates are forced down and exit rates forced up (by rising numbers
26These estimates are averages over the four transition rates between their five years, 1967-1971,
shown in their fig. 2, pp. 46-47. The estimates are based on a sample of 1583 white women in the
Panel Study of Income Dynamics.
27Given the ages of the women in the data, this requires the use of unemployment rates back to
1928.
Table 2
Markov estimates for employment status for white females; n = 19,892.
(1)
(2)
(3)
- 2.649’
(0.137)
- 3.649’
(0.566)
Education
0.041’
(0.011)
0.043’
(0.008)
Cohort
0.043’
(0.002)
0.013’
(0.005)
- 0.185’
(0.026)
- 0.012
(0.019)
$\theta_1$ ($\mu$)
Intercept
- 0.308’
(0.070)
Unemployment
rate
0.085’
(0.026)
Age
Age squared/l00
- 0.120’
(0.035)
No. children < 6
- 0.774’
(0.052)
No. children
6-17
0.099’
(0.022)
- 0.273
(0.504)
$D_{20}$^b
$D_{20}$ x Education
0.026’
(0.007)
$D_{20}$ x Cohort
0.035’
(0.021)
$\theta_2$ ($\lambda$)
3.986’
(0.347)
5.570’
(1.121)
Education
- 0.007
(0.027)
- 0.025
(0.016)
Cohort
- 0.103’
(0.007)
- 0.023’
(0.013)
0.647’
(0.073)
0.134’
(0.045)
0.476’
(0.149)
Intercept
Unemployment
rate
- 0.113’
(0.047)
Age
Age squared/100
0.142’
(0.062)
No. children < 6
0.366’
(0.072)
No. children
6-17
0.091’
(0.050)
Log likelihood
- 13734.3
^a Asymptotic standard errors in parentheses.
^b Equals 1 if age = 20, 0 if not.
^c Significant at 10% level.
- 13263.9
- 12606.5
Fig. 1. Actual and fitted employment probabilities, and fitted transition rates, by age.
of young children), eventually reaching levels that begin pulling down the
employment rate. As the children age, entry rates are pulled back up slowly and
exit rates are pushed down, but the direct effects of age in the hazards slow this
process by exerting an upward effect on exit and a downward effect on entry. As
a result, the exit effect continues to dominate and employment rates continue to
decline. These life cycle profiles of entry and exit, and the reasons for their
patterns, are quite consistent with those found in the literature using panel data
even though transitions are not directly observed in the RCS data used here.
It is of some interest in the application at hand to note the difference between
the profiles shown in fig. 1, which confound true within-cohort profiles with
across-cohort profiles, and those for a single woman or group of women from
the same cohort and with the same characteristics. Fig. 2 shows the simulated
profile for a woman aged 45 and with the mean values of the other variables in
the data. The employment profile has the same general shape as that in fig. 1 but
shows much more curvature, implying that the quasi-cross-sectional profiles in
fig. 1 were too flat if taken as true cohort profiles. This is again a finding
commonly reported in other such comparisons in the literature. The shapes of
all three curves in fig. 2 are explained by the same forces discussed for the
profiles in fig. 1, although the trends are more dramatic in fig. 2.
Finally, table 3 provides a comparison of the fitted model with a conventional
cross-sectional probit model in which employment status is made a function of
only contemporaneous characteristics of the observation. To date, such probit
Fig. 2. Simulated life cycle profile of employment and transition rates for women with mean values
of the characteristics.
models have been the predominant type of model estimated with RCS data. The
probit coefficients are shown in the first column and the effects of a unit change
in each of the regressors on the employment probability are shown in the next two
columns for both the cross-sectional and the Markov model (standard errors are
obtained with the delta method). For the Markov model, the effects shown
represent those resulting from a unit change in each regressor variable in both
the $X_{it}\theta_1$ and $X_{it}\theta_2$ vectors, and thus represent the combined effect of each
variable on the employment rate working through the lagged distribution of
entry and exit. As the table shows, the probit and Markov estimates are close for
the estimates of education and children but less so for the other variables. The
age effect in the Markov model is lower than that in the probit model, but this
Table 3
Comparisons with cross-sectional probit.^a

                                              $\partial p/\partial X$^b
                      Cross-sectional         Markov          Cross-sectional
                      probit coefficients     model           probit
                      (1)                     (2)             (3)

Intercept             -3.297^d                -               -
                      (0.216)
Education             0.072^d                 0.030^d         0.029^d
                      (0.004)                 (0.002)         (0.001)
Cohort                0.026^d                 0.003           0.010^d
                      (0.003)                 (0.002)         (0.001)
Unemployment rate     -0.007                  [illegible]     -0.003
                      (0.008)                 (0.010)         (0.003)
Age                   0.079^d                 -0.001          0.005^d
                      (0.008)                 (0.002)         (0.001)
Age squared/100       -0.087^d                -^c             -^c
                      (0.010)
No. children < 6      -0.516^d                -0.172^d        -0.197^d
                      (0.015)                 (0.009)         (0.005)
No. children 6-17     -0.101^d                -0.053^d        -0.040^d
                      (0.009)                 (0.005)         (0.004)

Log likelihood        -12635.7                -               -

^a Standard errors in parentheses.
^b p = probability of employment. In the Markov model, $\partial p/\partial X$ is the derivative of eq. (33) w.r.t.
elements of X common to $\mu$ and $\lambda$. In the cross-sectional probit model, $\partial p/\partial X = f(X\beta)\beta$, where $X\beta$
is the probit index. All effects are evaluated at the means of the data.
^c In columns (2) and (3), a unit change in age squared is not separately shown; its effect is captured
in the prior row.
^d Significant at 10% level.
difference arises because a one-year increase in the age of an observation in the
Markov model implies a different history of the children variables. In particular,
if the same number and ages of children are observed for the older woman, the
children must have been born later in her life cycle and, in this case, must
have pushed down her employment rate history and therefore her contemporaneous employment rate.28 The unemployment rate shows no effect in the
28The age effects for the Markov model could be estimated, alternatively, by holding the children
history fixed, although some assumption about new births at the higher age would have to be
made. However, this would still not be comparable to the cross-sectional probit coefficients
because the contemporaneous numbers and ages of children would not be held constant. Thus it is
not possible to obtain completely comparable age effects in the two models. The same obtains for the
unemployment rate (see below).
probit model but a positive one in the Markov model, consistent with an
added-worker effect for married women. The difference is in part a result of the
differing amounts of information used in the two estimation procedures, for the
Markov model uses the entire life cycle history of unemployment rates for each
observation while the probit model uses only the contemporaneous unemployment rate. However, there is also a lack of strict comparability between the two
estimated effects, for the Markov coefficient is obtained by shifting the entire
history of unemployment rates up by one unit.29
The differences in the two models are not surprising since the Markov model
contains a formal lag structure. Also not surprisingly, comparison of the likelihood functions for the two models indicates that the Markov model provides
a better fit even after adjustment for the number of parameters using the Akaike
criterion. Nevertheless, it does confirm that a dynamic model fit to RCS data is
superior in many respects to the cross-section models usually estimated.
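The Akaike comparison mentioned above can be sketched as follows. The form AIC = -2 log L + 2k is the common definition and is assumed here, since the text does not state which form was used. The log likelihoods are those reported for column (3) of Table 2 and for the probit in Table 3; the parameter counts (19 and 8) are counts of the coefficient rows in those tables and should be treated as assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion in its common form: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Log likelihoods as reported; parameter counts are assumed from the table rows.
aic_markov = aic(-12606.5, 19)
aic_probit = aic(-12635.7, 8)

print(f"Markov model AIC:          {aic_markov:.1f}")
print(f"Cross-sectional probit AIC: {aic_probit:.1f}")
print("Markov model preferred" if aic_markov < aic_probit else "Probit preferred")
```

Under these assumptions the Markov model has the lower criterion value, in line with the statement in the text.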
6. Summary and conclusions
The results of this analysis suggest that many dynamic models are identified
with repeated cross-sectional data under what appear to be relatively mild
restrictions. This raises a number of questions and issues for future work. The
most important is whether the identification conditions do indeed hold, an issue
that can be investigated with panel data. Such an investigation is not entirely
straightforward, however, if the panel data suffer from measurement error with
well-known deleterious consequences for fixed-effects estimators. Conditional
on identification, the relative efficiencies of panel and RCS estimators of dynamic models could be considered as a second issue. Other issues worth
examination are (1) the importance of serial correlation and unobserved heterogeneity in the Markov model and (2) means of incorporating the effects of
migration, mortality, and birth rates into RCS estimates.
References
Amemiya, T., 1979, The estimation of a simultaneous-equation Tobit model, International Economic Review 20, 169-181.
Angrist, J., 1991, Grouped-data estimation and testing in simple labor supply models, Journal of
Econometrics 47, 243-266.
Blundell, R., M. Browning, and C. Meghir, 1989, A microeconometric model of intertemporal
substitution and consumer demand, Discussion paper no. 89-11 (University College London,
London).
Browning, M., A. Deaton, and M. Irish, 1985, A profitable approach to labor supply and commodity
demands over the life cycle, Econometrica 53, 503-544.
29The Markov model permits the estimation of the effect of any alteration in the time pattern of
historical unemployment rates.
Chamberlain, G., 1980, Analysis of covariance with qualitative data, Review of Economic Studies
47, 225-238.
Deaton, A., 1985, Panel data from time series of cross sections, Journal of Econometrics 30, 109-126.
Heckman, J., 1981a, Statistical models for discrete panel data, in: C. Manski and D. McFadden, eds.,
Structural analysis of discrete data with econometric applications (MIT, Cambridge, MA)
114-178.
Heckman, J., 1981b, The incidental parameters problem and the problem of initial conditions in
estimating a discrete time-discrete data stochastic process, in: C. Manski and D. McFadden, eds.,
Structural analysis of discrete data with econometric applications (MIT, Cambridge, MA)
179-195.
Heckman, J. and R. Robb, 1985, Alternative methods for evaluating the impact of interventions, in:
J. Heckman and B. Singer, eds., Longitudinal analysis of labor market data (Cambridge
University, Cambridge) 156-245.
Heckman, J. and R. Willis, 1977, A beta-logistic model for the analysis of sequential labor force
participation by married women, Journal of Political Economy 85, 27-58.
Kalbfleisch, J.D. and J.F. Lawless, 1985, The analysis of panel data under a Markov assumption,
Journal of the American Statistical Association 80, 863-871.
Lawless, J.F. and D.L. McLeish, 1984, The information in aggregate data from Markov chains,
Biometrika 71, 419-430.
Lee, T.C., G.C. Judge, and A. Zellner, 1970, Estimating the parameters of the Markov probability
model from aggregate time series data (North-Holland, Amsterdam).
MaCurdy, T., 1981, An empirical model of labor supply in a life-cycle setting, Journal of Political
Economy 89, 1059-1085.
Madansky, A., 1959, Least squares estimation in finite Markov processes, Psychometrika 24,
137-144.
Miller, G., 1952, Finite Markov processes in psychology, Psychometrika 17, 149-167.
Nelson, F.D. and L. Olson, 1978, Specification and estimation of a simultaneous-equation model
with limited dependent variables, International Economic Review 19, 695-709.
Newey, W., 1987, Efficient estimation of limited dependent variable models with endogenous
explanatory variables, Journal of Econometrics 36, 231-250.
Verbeek, M. and T. Nijman, 1992, Can cohort data be treated as genuine panel data?, Empirical
Economics 17, 9-23.