Endogeneity

Dr. Stefan Wuyts
Associate Professor Marketing
Koç University
[email protected]
Common Method Variance:
Single-method error variance due to the use of a single
measurement approach (e.g. survey) (Podsakoff et al. 2003)
The amount of spurious covariance shared among variables because
of the common method used in collecting data (Malhotra et al.
2006)
◦ CMV is often raised by reviewers, but not always a concern
◦ It is lower when concepts are more concrete
◦ Caused by contextual, respondent, and measurement influences
First think about how you can prevent it:
◦ Variety of scale formats or anchors
◦ Spread over multiple respondents or over time
Cote & Buckley (1987); Podsakoff et al. (2003):
Calculate the impact of common method variance on
the observed correlation between measures of
different types of constructs:
$$R_{x,y} = \left(\text{true } R_{t_i,t_j}\,\sqrt{t_x}\,\sqrt{t_y}\right) + \left(\text{true } R_{m_k,m_l}\,\sqrt{m_x}\,\sqrt{m_y}\right)$$
where $R_{x,y}$ is the observed correlation between
measure x and measure y, true $R_{t_i,t_j}$ is the average
correlation between trait i and trait j, $t_x$ is the percent
of trait variance in measure x, $t_y$ is the percent of trait
variance in measure y, true $R_{m_k,m_l}$ is the average
correlation between method k and method l, $m_x$ is the
percent of method variance in measure x, and $m_y$ is the
percent of method variance in measure y.
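A minimal sketch of the arithmetic behind this decomposition, assuming the variance shares are entered as proportions between 0 and 1. The function name and all input values are hypothetical and serve only to show how method variance can inflate an observed correlation.

```python
import math

def observed_correlation(true_r_traits, t_x, t_y, true_r_methods, m_x, m_y):
    """Cote & Buckley decomposition of an observed correlation.

    t_x, t_y: proportion of trait variance in measures x and y.
    m_x, m_y: proportion of method variance in measures x and y.
    """
    trait_part = true_r_traits * math.sqrt(t_x) * math.sqrt(t_y)
    method_part = true_r_methods * math.sqrt(m_x) * math.sqrt(m_y)
    return trait_part + method_part

# Hypothetical inputs: true trait correlation 0.50, 36% trait variance and
# 25% method variance in each measure, same method used for both measures.
r_obs = observed_correlation(0.50, 0.36, 0.36, 1.00, 0.25, 0.25)
print(round(r_obs, 3))  # prints 0.43
```

In this made-up case the trait component contributes 0.18 and the shared method contributes 0.25, so most of the observed correlation would be method-driven.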
Podsakoff et al. 2003 (JAP)
Note: reverse coding creates biases
Malhotra et al. 2006:
◦ Multitrait-multimethod (MTMM): are monomethod-heterotrait
correlations higher than heteromethod-heterotrait correlations?
◦ MTMM using CFA (variance in a measure = true
variance + method variance + random error)
◦ Harman’s single-factor test (EFA: how much does the
first factor explain? Or a variant: two CFAs, one
with and one without a common method factor)
◦ Marker variable technique (correct all correlations
by partialling out the “correlation between a
special variable and a theoretically unrelated
variable from the model to be estimated”; or
alternatively “the second-lowest correlation
observed in the correlation matrix”)
Slide callouts on these techniques (in slide order): not very
realistic; not very sensitive; quite effective
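The marker variable correction can be sketched as follows. The slide does not spell out the formula; this sketch assumes the partial-correlation adjustment commonly attributed to Lindell and Whitney (2001), r_adj = (r_obs - r_marker) / (1 - r_marker). The correlation matrix and function names are hypothetical.

```python
def second_lowest_correlation(corr_matrix):
    """Second-lowest off-diagonal correlation, used as a proxy for CMV."""
    k = len(corr_matrix)
    off_diagonal = [corr_matrix[i][j] for i in range(k) for j in range(i + 1, k)]
    return sorted(off_diagonal)[1]

def cmv_adjusted_correlation(r_observed, r_marker):
    """Partial the marker correlation out of an observed correlation."""
    if r_marker >= 1.0:
        raise ValueError("marker correlation must be below 1")
    return (r_observed - r_marker) / (1.0 - r_marker)

# Hypothetical correlation matrix of four survey constructs.
corr = [
    [1.00, 0.40, 0.10, 0.30],
    [0.40, 1.00, 0.05, 0.25],
    [0.10, 0.05, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
]
r_marker = second_lowest_correlation(corr)              # 0.10 here
r_adjusted = cmv_adjusted_correlation(corr[0][1], r_marker)
print(round(r_adjusted, 3))  # prints 0.333
```

If the adjusted correlations keep their sign and significance, common method variance is less likely to be driving the results.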

What are the problems Poppo & Zenger
identify and address?
◦ Sample selection: all sample firms chose to
outsource; solution: calculate and include the inverse
Mills ratio.
◦ Endogeneity of explanatory variables: relational
governance and customized contracts are
endogenous; solution: use instrumented values.
◦ Correlated errors across equations; solution: system
estimation (combined with the endogeneity correction: 3SLS)




Endogeneity: a regressor is correlated with the error term
The sample selection problem is very important in
management (e.g., strategic decisions are not random,
they are endogenous)
Endogeneity renders the OLS estimator biased and inconsistent
(the estimate does not converge to the population parameter,
even as the sample grows)
Three instances:
◦ Errors-in-variables (measurement error, correlated with
regressor or with error term of equation, see Bascle 2008)
◦ Omitted variables (e.g. self-selection)
◦ Simultaneous causality (e.g. diversity-performance linkage)
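A small simulation can make the inconsistency concrete for the omitted-variable case. This is a sketch under assumed, made-up parameter values: the unobserved factor u drives both the regressor x and the outcome y, so leaving it out pushes it into the error term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# u is unobserved by the analyst, so it ends up in the error term.
u = rng.normal(size=n)
x = 0.8 * u + rng.normal(size=n)                   # regressor correlated with u
y = 1.0 + 2.0 * x + 1.5 * u + rng.normal(size=n)   # true slope on x is 2.0

def ols(X, y):
    """OLS coefficients for design matrix X (first column is the constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
b_omitted = ols(np.column_stack([const, x]), y)    # u left out
b_full = ols(np.column_stack([const, x, u]), y)    # infeasible benchmark

print("slope with u omitted: ", round(b_omitted[1], 3))
print("slope with u included:", round(b_full[1], 3))
```

With these assumed values the omitted-variable slope converges to 2 + 1.5 * 0.8 / 1.64, roughly 2.73, and a larger sample does not move it toward 2.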

Are the explanatory variables exogenous? Durbin-Wu-Hausman test
Imagine the following two equations:
z = a0 + a1*x1 + a2*x2 + ε1
y = b0 + b1*z + b2*x3 + ε2
Estimate first:
z = c0 + c1*x1 + c2*x2 + c3*x3 + ε3 and compute the residuals ε̂3
Then perform an augmented regression:
y = d0 + d1*z + d2*x3 + d3*ε̂3 + ε4
If d3 is significantly different from zero, then z is endogenous and OLS is not consistent.
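The two regressions of the Durbin-Wu-Hausman procedure can be sketched on simulated data. Everything numeric here is an assumption for illustration: the errors of the z and y equations share a common shock, which makes z endogenous by construction. The `ols` helper is a bare-bones routine with conventional standard errors, not a full econometrics package.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, conventional standard errors, and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, se, resid

rng = np.random.default_rng(1)
n = 5_000
x1, x2, x3 = rng.normal(size=(3, n))
common = rng.normal(size=n)            # shock shared by both error terms
e1 = common + rng.normal(size=n)
e2 = common + rng.normal(size=n)

z = 1.0 + x1 + x2 + e1                 # z = a0 + a1*x1 + a2*x2 + e1
y = 0.5 + 2.0 * z + x3 + e2            # y = b0 + b1*z + b2*x3 + e2

const = np.ones(n)
# Step 1: regress z on all exogenous variables and keep the residuals.
_, _, e3_hat = ols(np.column_stack([const, x1, x2, x3]), z)
# Step 2: augmented regression of y on z, x3 and the step-1 residuals.
d, se, _ = ols(np.column_stack([const, z, x3, e3_hat]), y)
t_d3 = d[3] / se[3]

print("d3 =", round(d[3], 3), " t-statistic =", round(t_d3, 1))
```

A large t-statistic on d3 signals endogeneity. As a side effect, the coefficient on z in the augmented regression equals the instrumental-variable estimate, so it should sit near the assumed value of 2.0.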

Heckman two-step procedure
◦ First estimate a sample selection model (probit);
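◦ Calculate the inverse Mills ratio on the basis of the estimated
parameters (ratio of the pdf to the cdf of the standard normal
distribution, evaluated at the fitted probit index z:
$\lambda(z) = \phi(z) / \Phi(z)$, where the cdf is
$\Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt$)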
◦ Estimate regression equation of interest with OLS, including
the inverse Mills ratio.
◦ Assumptions: there are several, e.g. error terms are
bivariate normal; also need to be able to specify stage-1
equation.
Key paper: Heckman, James J. (1979). Sample selection bias
as a specification error. Econometrica, 47 (1):
153-162.
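The two steps can be sketched on simulated data. All parameter values are assumptions, the probit is fitted with a hand-written Fisher-scoring loop rather than a library routine, and the variable q is a hypothetical exclusion restriction that affects selection but not the outcome. The second-step standard errors would need a correction that this sketch omits.

```python
import math
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def probit(W, s, iterations=25):
    """Probit coefficients by Fisher scoring (bare-bones ML routine)."""
    gamma = np.zeros(W.shape[1])
    for _ in range(iterations):
        index = W @ gamma
        cdf = np.clip(norm_cdf(index), 1e-10, 1 - 1e-10)
        pdf = norm_pdf(index)
        score = W.T @ ((s - cdf) * pdf / (cdf * (1 - cdf)))
        info = (W * (pdf**2 / (cdf * (1 - cdf)))[:, None]).T @ W
        gamma = gamma + np.linalg.solve(info, score)
    return gamma

rng = np.random.default_rng(2)
n, rho = 20_000, 0.7
x, q = rng.normal(size=(2, n))
u = rng.normal(size=n)                                    # selection error
e = rho * u + math.sqrt(1 - rho**2) * rng.normal(size=n)  # outcome error

selected = (0.5 + x + q + u) > 0      # y is observed only when selected
y = 1.0 + 2.0 * x + e                 # true slope on x is 2.0

const = np.ones(n)
W = np.column_stack([const, x, q])

# Step 1: probit selection model on the full sample.
gamma_hat = probit(W, selected.astype(float))
# Inverse Mills ratio at the fitted index, for the selected observations.
index_sel = (W @ gamma_hat)[selected]
imr = norm_pdf(index_sel) / norm_cdf(index_sel)

# Step 2: OLS on the selected sample, without and with the IMR.
b_naive = ols(np.column_stack([const, x])[selected], y[selected])
b_heckman = ols(np.column_stack([const[selected], x[selected], imr]), y[selected])

print("naive slope:  ", round(b_naive[1], 3))
print("Heckman slope:", round(b_heckman[1], 3))
```

The coefficient on the inverse Mills ratio estimates the covariance between the two error terms, so its sign indicates the direction of the selection effect.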

Application: Shaver 1998. Effect of entry-mode
choice on survival of FDI
◦ Acquisition is endogenous strategy variable, DV of
dichotomous choice model;
◦ Acquisition in turn is regressor in performance model;
◦ The errors of acquisition and performance models are likely
correlated, assume bivariate normal;
◦ If the correlation between both errors, ρ, is positive, then
the estimated performance effect of acquisition is positively
biased;
◦ Application of Heckman 2-step approach: see pages 572-575.


Iyer and Seetharaman 2003
See figure 1: you need to correct for self-selection
because the correct comparison is not between the
y’s of observed occurrences of alternative
strategies, but between the y that results from a
given strategy and the y that would have occurred
if the alternative strategy were selected.

We face two decisions: station-type and pricing
(where pricing is conditional on the first decision)
◦ A criterion function determines allocation to one of two
regimes (see equations (4) and (5) in paper)
◦ Binary probit model is estimated, two probabilities
◦ Linear regression that includes self-selectivity correction
variables, the influence of which is determined by
covariance between error terms of stage 1 and stage 2
decisions (see p169 for interpretation).

Garen’s approach
y1 = a0 + a1*x1 + a2*x2 + ε1 (eq1)
y2 = b0 + b1*y1 + b2*x3 + ε2 (eq2)
First estimate eq1 and compute the residuals ε̂1
Then include ε̂1 as well as ε̂1 * y1 in eq2 (ε̂1 * y1 accounts for unobserved
heterogeneity over the range of the continuous variable y1; for example,
selection bias may be more pronounced at higher levels of y1)
Key paper:
Garen, John (1984). The returns to schooling: A
selectivity bias approach with a continuous choice
variable. Econometrica, 52 (5): 1199-1218.
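Garen's two steps can be sketched on simulated data. The data-generating process is an assumption made for illustration: the unobservable behind y1 shifts both the level of y2 and the return to y1, which is the kind of heterogeneity the interaction term is meant to absorb. All coefficient values are made up.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 20_000
x1, x2, x3 = rng.normal(size=(3, n))
e1 = rng.normal(size=n)
y1 = 1.0 + x1 + x2 + e1                          # eq1

# Unobservables behind y1 also shift y2 and the return to y1.
slope_shift = 0.5 * e1 + 0.5 * rng.normal(size=n)
e2 = 0.8 * e1 + rng.normal(size=n)
y2 = 0.5 + (2.0 + slope_shift) * y1 + x3 + e2    # eq2, average effect 2.0

const = np.ones(n)
# Step 1: estimate eq1 and compute the residuals.
X1 = np.column_stack([const, x1, x2])
e1_hat = y1 - X1 @ ols(X1, y1)

# Step 2: add the residuals and residuals * y1 to eq2.
b_naive = ols(np.column_stack([const, y1, x3]), y2)
b_garen = ols(np.column_stack([const, y1, x3, e1_hat, e1_hat * y1]), y2)

print("naive effect of y1:", round(b_naive[1], 3))
print("Garen effect of y1:", round(b_garen[1], 3))
print("coefficients on residual terms:", np.round(b_garen[3:], 3))
```

A clearly nonzero coefficient on the interaction term would suggest that the selection bias varies over the range of y1.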


Instrumental variables techniques (see Bascle 2008):
try to capture the variation in X that is uncorrelated
with the error term
Two-stage least squares (2SLS)
◦ Stage 1: regress the endogenous regressor X on the
instruments and exogenous covariates; this isolates the
variation in X that is not correlated with the error term
◦ Stage 2: use the resulting fitted (predicted) value X̂ instead
of the endogenous regressor in the equation of interest
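The two stages can be written out by hand on simulated data. This is a sketch with assumed parameter values and a hypothetical helper name; it returns point estimates only, because the standard errors from a manual second stage are not valid and a dedicated IV routine should be used for inference.

```python
import numpy as np

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_least_squares(y, x_endog, instruments, exog):
    """2SLS point estimates for y = b0 + b1*x_endog + b2*exog + error."""
    const = np.ones(len(y))
    # Stage 1: endogenous regressor on instruments and exogenous covariates.
    Z = np.column_stack([const, instruments, exog])
    x_hat = Z @ ols(Z, x_endog)
    # Stage 2: replace the endogenous regressor by its fitted value.
    return ols(np.column_stack([const, x_hat, exog]), y)

rng = np.random.default_rng(4)
n = 20_000
z, w = rng.normal(size=(2, n))          # z: instrument, w: exogenous covariate
u = rng.normal(size=n)
e = 0.8 * u + rng.normal(size=n)        # outcome error, correlated with u

x = 0.5 + 1.0 * z + 0.3 * w + u         # endogenous regressor
y = 1.0 + 2.0 * x + 0.5 * w + e         # true effect of x is 2.0

b_ols = ols(np.column_stack([np.ones(n), x, w]), y)
b_2sls = two_stage_least_squares(y, x, z, w)

print("OLS effect of x: ", round(b_ols[1], 3))
print("2SLS effect of x:", round(b_2sls[1], 3))
```

The instrument z moves x but is unrelated to e by construction, which is why the second stage recovers the assumed effect while OLS overstates it.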

What constitutes a good set of instruments?
◦ Are the instruments relevant? (“sufficient correlation” with the endogenous regressor; F-test on the instruments in the first stage)
◦ In case of more instruments than endogenous regressors: do the over-identifying restrictions hold? (Hansen J test)
◦ Are the instruments exogenous? (several tests are available)
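The relevance check can be sketched as an F-test on the excluded instruments in the first-stage regression. The data and helper names are hypothetical. The threshold mentioned in the comment is the widely used rule of thumb that a first-stage F below about 10 signals weak instruments; it is not stated on the slide.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid

def first_stage_f(x_endog, instruments, exog):
    """F-statistic for the joint significance of the instruments."""
    n = len(x_endog)
    const = np.ones(n)
    restricted = np.column_stack([const, exog])                 # no instruments
    unrestricted = np.column_stack([const, exog, instruments])  # with instruments
    q = unrestricted.shape[1] - restricted.shape[1]
    rss_r, rss_u = rss(restricted, x_endog), rss(unrestricted, x_endog)
    return ((rss_r - rss_u) / q) / (rss_u / (n - unrestricted.shape[1]))

rng = np.random.default_rng(5)
n = 500
w, z_strong, z_weak = rng.normal(size=(3, n))
x = 0.3 * w + 1.0 * z_strong + 0.01 * z_weak + rng.normal(size=n)

f_strong = first_stage_f(x, z_strong, w)
f_weak = first_stage_f(x, z_weak, w)

# Rule of thumb (assumption, not from the slide): F < 10 suggests weak instruments.
print("F with strong instrument:", round(f_strong, 1))
print("F with weak instrument:  ", round(f_weak, 1))
```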
2SLS solves the endogeneity problem, but in the case of multiple
equations it does not solve the problem of correlated errors.


Three-stage least squares (3SLS) is a systems method that
also takes care of correlated errors across equations of a
system of equations (full information method).
3SLS (see Poppo & Zenger):
◦ First, produce the instrumented values;
◦ Second, estimate the covariance matrix of equation disturbances
(after 2SLS);
◦ Third, GLS estimation using covariance matrix and instrumented
values.
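The three stages can be sketched for a two-equation system. This is an illustrative implementation under assumptions, not the estimation code of Poppo & Zenger: the system, its coefficients, and the function name are hypothetical, and the errors of the two equations share a common shock so that the system step has something to exploit.

```python
import numpy as np

def three_stage_least_squares(ys, Xs, Z):
    """3SLS point estimates; ys and Xs list each equation's outcome and regressors."""
    n, m = len(ys[0]), len(ys)
    # Stage 1: instrumented values = projection of the regressors on Z.
    X_hats = [Z @ np.linalg.lstsq(Z, X, rcond=None)[0] for X in Xs]
    # Stage 2: 2SLS per equation, then covariance of the equation disturbances.
    resids = []
    for y, X, X_hat in zip(ys, Xs, X_hats):
        b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
        resids.append(y - X @ b_2sls)
    R = np.column_stack(resids)
    sigma_inv = np.linalg.inv(R.T @ R / n)
    # Stage 3: GLS on the stacked system, weighted by the inverse covariance.
    lhs = np.block([[sigma_inv[i, j] * (X_hats[i].T @ X_hats[j])
                     for j in range(m)] for i in range(m)])
    rhs = np.concatenate([sum(sigma_inv[i, j] * (X_hats[i].T @ ys[j])
                              for j in range(m)) for i in range(m)])
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(6)
n = 20_000
x1, x2, x3 = rng.normal(size=(3, n))
common = rng.normal(size=n)             # makes errors correlated across equations
e1 = common + rng.normal(size=n)
e2 = common + rng.normal(size=n)

a0, a1, a2 = 1.0, 0.5, 1.0              # y1 = a0 + a1*y2 + a2*x1 + e1
b0, b1, b2, b3 = 0.5, -0.4, 1.0, 1.0    # y2 = b0 + b1*y1 + b2*x2 + b3*x3 + e2

# Solve the simultaneous system for y1 (reduced form), then build y2.
y1 = (a0 + a1 * (b0 + b2 * x2 + b3 * x3 + e2) + a2 * x1 + e1) / (1.0 - a1 * b1)
y2 = b0 + b1 * y1 + b2 * x2 + b3 * x3 + e2

const = np.ones(n)
Z = np.column_stack([const, x1, x2, x3])          # all exogenous variables
Xs = [np.column_stack([const, y2, x1]), np.column_stack([const, y1, x2, x3])]
estimates = three_stage_least_squares([y1, y2], Xs, Z)

print(np.round(estimates, 3))   # order: a0, a1, a2, b0, b1, b2, b3
```

The first equation excludes x2 and x3 and the second excludes x1, which is what identifies the system in this sketch. When every equation is exactly identified, 3SLS gives the same point estimates as 2SLS.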