On the Use of Instrumental Variables in Accounting Research

David F. Larcker
The Wharton School
University of Pennsylvania
1332A Steinberg-Dietrich Hall
Philadelphia, PA, 19104
Tjomme O. Rusticus∗
The Wharton School
University of Pennsylvania
1300 Steinberg-Dietrich Hall
Philadelphia, PA, 19104
Revised November 29, 2004
Abstract
The use of instrumental variables is becoming an increasingly common method for
dealing with the econometric problems caused by endogeneity in accounting research.
While instrumental variables estimation is the standard textbook solution to mitigating
the inconsistency in parameter estimates caused by endogeneity, the appropriateness of
instrumental variable methods in real settings is not obvious. We provide conditions
under which instrumental variables methods are likely to be preferred over regular OLS.
Our survey of the use of instrumental variables methods in accounting research raises
doubt about whether these conditions are generally met. We illustrate these concerns with
two examples from contemporary accounting research: the effect of disclosure on cost of
capital and the effect of insider power on CEO compensation levels. We conclude with
some recommendations on the preferred approach for applying instrumental variables
estimation.
∗ We appreciate helpful comments from Ted Goodman, Jeffrey Ng, Scott Richardson and Rodrigo Verdi.
We gratefully acknowledge financial support from the Wharton School. Tjomme Rusticus is also grateful
for financial support from the Deloitte & Touche Foundation.
1. Introduction
The econometric problems caused by the endogeneity of predictor variables in a
regression model are among the most difficult issues in empirical accounting research.
Endogeneity exists when some of the right-hand-side (RHS) variables in an equation are
correlated with the true (but unobserved) error term in the equation. This problem
commonly occurs when the RHS variables are choice variables and some of the
determinants of these choice variables also affect the dependent variable. If these
determinants of the RHS variables are not included in the regression equation being
estimated, the resulting ordinary least squares (OLS) parameter estimates will be
inconsistent due to the well-known correlated omitted variables problem.
In order to mitigate the econometric problems caused by endogeneity, it has
become commonplace in accounting research to implement some type of instrumental
variables (IV) estimation procedure. In particular, the researcher first selects a set of
variables that are assumed to be exogenous and then two-stage least squares (2SLS)
estimation is used to estimate the coefficients in the regression model.1 This standard
textbook solution to endogeneity works if the researcher can find instrumental variables
that are correlated with the endogenous regressor but uncorrelated with the error in the
structural equation. However, as Maddala (1977, p. 154) asks, “Where do you get
such a variable?” Equally important for applied research, what happens to the
statistical properties of IV estimates when the instrumental variables do not precisely
conform to the textbook definition of these variables?
1 Although our discussion focuses on 2SLS estimation, the same concerns are inherent in treatment effects
models (Heckman-type models), 3SLS, full information maximum likelihood (FIML), and other estimation
methods that rely on instrumental variables.
The purpose of this paper is to synthesize the extensive literature in statistics and
econometrics that examines the properties of IV estimators and provide accounting
researchers with a framework to appropriately evaluate and interpret their application of
instrumental variables. Three aspects of IV estimation are considered in our analysis.
First, we examine both asymptotic and finite sample properties of OLS and IV
estimators. Second, the importance of problems with “weak” instruments (i.e.,
instruments that explain only a modest proportion of the variation in the endogenous
variable) is examined. Finally, we analyze the situation where the selected instrumental
variables are not completely exogenous (i.e., instruments that are somewhat
correlated with the error term in the structural model, or “semi-endogenous” instruments).
Our analytical results and numerical simulations indicate that the IV approach
typically used in accounting research is in many cases unlikely to produce estimates with
desirable econometric properties. It can easily be the case that IV estimates are more
biased than simple OLS estimates that make no explicit correction for endogeneity. In
addition, we examine the results produced by OLS and IV estimators for two typical
accounting research studies where there is substantial reason to suspect that the primary
regressor is endogenous. In both studies, we conclude that OLS estimation is preferred to
IV estimation. The primary implication of our analysis is that accounting researchers
need to be much more careful in applying IV estimation. In particular, accounting
researchers should generally refrain from making broad claims that endogeneity problems
have been effectively mitigated by the use of IV methods.
The remainder of the paper is composed of five sections. Section 2 provides a
summary evaluation of the instrumental variables applications in accounting. The
asymptotic and finite sample properties of IV estimators are developed in Sections 3 and
4, respectively. Section 5 compares the OLS and IV results for two contemporary topics
in accounting research – whether the cost of capital is an increasing function of corporate
disclosure and whether chief executive officer (CEO) compensation is a function of the
power of corporate insiders. A summary of our analysis and recommendations regarding
the use of IV methods in accounting research is provided in Section 6.
2. Instrumental Variable Applications in Accounting Research
In order to ascertain the use of IV estimation, we conducted an electronic search
for the term “2SLS” in Accounting, Organizations and Society, Contemporary
Accounting Research, Journal of Accounting and Economics, Journal of Accounting
Research, Journal of Financial Economics, and The Accounting Review. This search
produced the 35 articles identified in Table 1. Most of these articles have been published
after 2000, indicating that IV estimation is a commonly used methodology in
contemporary accounting research.
Accounting researchers generally use instrumental variables in order to mitigate
endogeneity of the predictor variables or to identify a simultaneous system of endogenous
variables (Table 2).2 Most of the studies have reasonably large sample sizes with the
mean (median) number of observations being 2,393 (654). The typical study has fewer
than three endogenous regressors and uses between five and seven instruments (Table 3).
The mean (median) R² from the first-stage regression (reported by only 26 of the 35
articles) is 31 (26) percent. However, one problem with these R² measures is that they
are generally not the partial explanatory power for the instruments that are unique to the
first-stage regression.3 Thus, the strength of the instrumental variables in the first-stage
regression is likely substantially overstated.
2 Instrumental variables are also used to mitigate measurement error in the independent variables. This has
a long history in accounting research and dates back to at least Beaver, Kettler and Scholes (1970). We do
not consider these applications in our discussion because sophisticated latent variable models exist to
address measurement error issues when there are multiple indicators for the same construct.
It is also instructive to examine the stated reasons and econometric claims made
by these researchers using IV methods. Rather than discussing individual papers, it is
perhaps more useful to paraphrase the typical discussion in published papers.
Accounting researchers clearly understand that endogeneity is a serious econometric
problem that confounds their interpretation of coefficient estimates:
… ignoring the simultaneous endogeneity of a and b
engenders problems of … estimation bias, which prevents
meaningful interpretation of variable coefficients.
Our econometric approach enables us to relax the
somewhat implausible assumptions of prior research
regarding the exogenous nature of either a or b.
… we investigate whether previously documented
associations between a and b are the result of biased
estimation induced by using endogenous variables in
single-equation models.
However, one serious concern across these studies is that there is little attempt to
develop a model that explicitly identifies and justifies the endogenous (choice) variables
and the exogenous and instrumental variables (i.e., those variables that are assumed to be
either pre-determined or are outside the model being examined).4 Most accounting
research estimates some type of “convenient representation” that is assumed to be the
reduced form, but there is almost never any discussion of the underlying structural
equation model. Moreover, there is almost no discussion regarding the choice of specific
variables for instruments. For example, it is not clear why these instrumental variables
are assumed to be exogenous (i.e., uncorrelated with the error term in the structural
model) and whether the instrumental variables exhibit a lower correlation with the
structural equation error term than the endogenous regressor variable. Despite these
issues, accounting researchers typically make rather bold claims about the ability of IV
methods in addressing endogeneity:
Our findings are robust to controlling for endogeneity
among our variables.
We address potential endogeneity problems by estimating a
2-stage least squares (2SLS) model.
The results of the 2SLS suggest that the relation is not
driven by the potential endogeneity of our variables.
Although IV methods may mitigate the econometric problems induced by
endogeneity, we believe that the typical instruments used in accounting studies are not
well justified and are likely to be inadequate.5
3 The typical analysis in applied research involves an endogenous y that is a function of an endogenous x
variable and a set of exogenous control variables (z1). In addition, there are multiple instruments,
exogenous variables (z2) that are not included in the equation describing y. In this case the proper measure
of the strength of the instrument is the partial R². We can easily compute the partial R² as the following:
$(R_{y,z}^2 - R_{y,z_1}^2)/(1 - R_{y,z_1}^2)$, where z is the combined set of z1 and z2.
4 The ideal approach is perhaps as follows: develop an economic theory of the decision making process
regarding the relation of interest, translate the theory into a set of structural equation models that describe
the decision setting, precisely identify the endogenous and exogenous variables, develop the reduced form
equations where the endogenous variables are only functions of exogenous variables, and estimate the
reduced form model parameters. Assuming that the model is identified, the structural equation parameters
of interest can then be derived from the estimates obtained from the reduced form equations.
5 Some papers attach various caveats to the econometric approach, such as “Only a and b are treated as
endogenous, other firm-specific variables are assumed to be exogenous or pre-determined variables. This
may not be appropriate.” Thus, the success of the IV method appears to be ambiguous even to the
researcher.
In order to get a feel for the difficulty in obtaining a good instrument, consider the
following example. Suppose we are interested in estimating the effect of disclosure
quality on firms’ cost of capital. To get a meaningful variation in disclosure quality we
use an international dataset. In this case we are obviously worried that there are
unidentified factors that affect both cost of capital and disclosure quality. We might
consider using a country's legal origin (English common law, French code law, etc.) as an
instrument for disclosure quality (several international accounting papers use legal origin
as either instrument or key independent variable, e.g., Ball, Kothari, and Robin, 2000;
Bushman, Piotroski and Smith, 2005; Leuz, Nanda, and Wysocki, 2003). Unlike many
other instruments that are typically used in accounting research, legal origin seems
clearly predetermined. That is, the company does not determine the legal origin, so we
are not worried about reverse causality. Even here, there are some cases where
the company can adopt a different legal origin through a cross listing, although this can
be controlled for in the regression analysis. More importantly, the fact that legal origin is
predetermined does not necessarily make it a good instrument. What we should worry
about is that legal origin affects other institutions (such as property rights and investor
protection laws) that could also affect a firm’s cost of capital. This is graphically
displayed in Figure 1. The relation we are ultimately interested in is the relation between
disclosure and cost of capital, (a). We can use legal origin as an instrument through
relation (b). However, this only gives the correct inference if either (c) or (d) (or both) are
equal to zero. This condition is unlikely to be satisfied. Thus there is some reason to be
skeptical about this instrument.
In order to provide insight into the choice of estimation methods, it is necessary to
identify the situations where the use of instrumental variables produces better estimates
than ordinary least squares (OLS). In the remainder of the paper, we focus on two
aspects of this econometric question. First, we examine the bias of the estimates if the
instruments are strictly exogenous, but they have weak explanatory power for explaining
the endogenous variable. Second, we examine the bias of the estimates when the
“instruments” are not completely exogenous. The impact of these two fundamental
issues on the bias of instrumental variables estimators is examined in Section 3
(asymptotic results) and Section 4 (finite sample results).
3. Asymptotic Properties of Instrumental Variable Estimators
3.1. Basic Structure
The typical model in describing the impact of the predictor variable (x) on the
outcome variable (y) is the following:
$$y = \beta x + u \tag{1}$$
It is well known that a consistent estimate of β can be obtained using OLS as long as the
correlation between x and u is equal to zero. This can be seen from the probability limit
(plim) of the OLS estimator:
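$$\operatorname{plim}\, b_{OLS} = \beta + \frac{\operatorname{cov}(x,u)}{\operatorname{var}(x)} = \beta + \frac{\sigma_u}{\sigma_x}\,\operatorname{corr}(x,u) \tag{2}$$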
If x and u are uncorrelated, the second term will go to zero and the OLS estimator is a
consistent estimator of the true coefficient. However, this assumption will not be
satisfied when there are determinants of x that are also correlated with y and these other
determinants are not included in equation (1). When x and u are correlated, the second
term in equation (2) will not go to zero and the OLS estimate of β will be inconsistent.
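The inconsistency described by equation (2) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it takes x = λu + ε with u and ε independent standard normal draws, so that cov(x,u) = λ and var(x) = λ² + 1. The function names (`ols_slope`, `simulate_ols`, `plim_ols`) are hypothetical and do not come from the paper.

```python
import random


def ols_slope(x, y):
    """No-intercept OLS slope, sum(x*y) / sum(x*x); adequate for mean-zero data."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def simulate_ols(beta, lam, n, seed=0):
    """Draw y = beta*x + u with x = lam*u + eps and return the OLS slope.

    Assumption (illustrative only): u and eps are independent standard
    normals, so cov(x, u) = lam and var(x) = lam**2 + 1.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [lam * ui + rng.gauss(0.0, 1.0) for ui in u]
    y = [beta * xi + ui for xi, ui in zip(x, u)]
    return ols_slope(x, y)


def plim_ols(beta, lam):
    """Probability limit from equation (2) under the assumptions above."""
    return beta + lam / (lam ** 2 + 1.0)


if __name__ == "__main__":
    # Compare the simulated slope with its probability limit for true beta = 0.
    for lam in (0.0, 0.5, 1.0):
        print(lam, round(simulate_ols(0.0, lam, 100_000), 3), round(plim_ols(0.0, lam), 3))
```

With λ = 0 the regressor is exogenous and the estimate settles near the true β; with λ ≠ 0 it settles near β + λ/(λ² + 1) no matter how large the sample becomes.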
If x and u are correlated, the typical textbook prescription is to use instrumental
variables (e.g., Wooldridge, 2002; Greene, 2002). That is, it is necessary to incorporate a
variable (z) that is correlated with x but not with u. If such a variable exists, equation (1)
can be estimated using instrumental variables estimation (IV). The resulting estimator
will be consistent, as can be seen from the probability limit of the IV estimator:
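$$\operatorname{plim}\, b_{IV} = \beta + \frac{\operatorname{cov}(z,u)}{\operatorname{cov}(z,x)} = \beta + \frac{\sigma_u}{\sigma_x}\,\frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)} \tag{3}$$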
If z is correlated with x but not with u, the second term goes to zero as the sample size
increases and the IV estimator is a consistent estimator of the true coefficient. This holds
even for small corr (z,x), as long as corr (z,x) ≠ 0. However, as we discuss in Section 4,
the size of corr (z,x) can cause serious problems in finite samples.
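Equation (3) can be checked in the same spirit. The following sketch assumes a single instrument and unit-variance, mutually independent normal shocks u, ε, and δ, with x = λu + ε and z = γε + θu + δ; under those assumptions cov(z,u) = θ and cov(z,x) = γ + θλ. The function names are hypothetical and serve only as illustration.

```python
import random


def iv_slope(z, x, y):
    """Single-instrument IV estimator: sum(z*y) / sum(z*x) for mean-zero data."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))


def simulate_iv(beta, lam, gamma, theta, n, seed=0):
    """Simulate x = lam*u + eps, z = gamma*eps + theta*u + delta, y = beta*x + u.

    Assumption (illustrative only): u, eps, delta are independent standard normals.
    """
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        u, eps, delta = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xi = lam * u + eps
        x.append(xi)
        z.append(gamma * eps + theta * u + delta)
        y.append(beta * xi + u)
    return iv_slope(z, x, y)


def plim_iv(beta, lam, gamma, theta):
    """Equation (3) with unit variances: cov(z,u) = theta, cov(z,x) = gamma + theta*lam."""
    return beta + theta / (gamma + theta * lam)


if __name__ == "__main__":
    # theta = 0: valid instrument; theta = 0.3: instrument correlated with u.
    for theta in (0.0, 0.3):
        print(theta, round(simulate_iv(1.0, 0.5, 1.0, theta, 100_000), 3),
              round(plim_iv(1.0, 0.5, 1.0, theta), 3))
```

When θ = 0 the estimate converges to β even though x is endogenous; when θ ≠ 0 the second term of equation (3) does not vanish.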
Bartels (1991) also derives the asymptotic mean square error of the instrumental
variable parameter estimate as:
$$\left[\frac{\sigma_u^2}{n\sigma_x^2}\right] \cdot \left[\frac{1}{R_{xz}^2}\right] \cdot \left[1 + nR_{zu}^2\right] \tag{4}$$
where $R_{ij}^2$ is the squared population correlation between variables i and j, and n is the
sample size.
The first term in equation (4) is the asymptotic mean squared error of OLS and
the second term (which is greater than one) is related to the loss in efficiency caused by
using an IV estimate as opposed to an OLS estimate. The third term is related to the bias
in the IV estimator caused by the use of inappropriate instruments. This analysis
demonstrates that even if corr (z,u) is equal to zero, the actual asymptotic standard error
for the IV estimator will be greater than the OLS standard error by the square root of
$1/R_{xz}^2$. Although asymptotic bias in the IV estimate is not an issue (assuming that
corr (z,u) = 0), the associated standard error for the IV estimate is substantially larger than the
standard error for the OLS estimate. For example, if $R_{xz}^2 = 0.25$ (about the median
first-stage R², see Table 3), the estimated IV standard error will be larger than the OLS
standard error by a factor of 2 ($= \sqrt{1/0.25}$). Thus, ignoring the impact of bias in the
estimate, the power associated with IV estimation may be substantially less than that for
OLS.
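The arithmetic behind equation (4) and the standard error comparison can be written out as a short sketch. The function names below are hypothetical; the formulas are transcribed directly from equation (4) and the square-root-of-1/R² statement in the text.

```python
import math


def iv_se_inflation(r2_xz):
    """Ratio of the asymptotic IV standard error to the OLS standard error
    when corr(z,u) = 0: the square root of 1 / R^2_xz."""
    if not 0.0 < r2_xz <= 1.0:
        raise ValueError("r2_xz must lie in (0, 1]")
    return math.sqrt(1.0 / r2_xz)


def asymptotic_mse_iv(sigma_u2, sigma_x2, n, r2_xz, r2_zu):
    """Equation (4): [sigma_u^2 / (n sigma_x^2)] * [1 / R^2_xz] * [1 + n R^2_zu]."""
    return (sigma_u2 / (n * sigma_x2)) * (1.0 / r2_xz) * (1.0 + n * r2_zu)


if __name__ == "__main__":
    print(iv_se_inflation(0.25))                        # the example in the text
    print(asymptotic_mse_iv(1.0, 1.0, 100, 0.25, 0.0))  # exogenous instrument
    print(asymptotic_mse_iv(1.0, 1.0, 100, 0.25, 0.01)) # slightly endogenous instrument
```

Because the third factor is 1 + nR²zu, even a small squared correlation between z and u is scaled by the sample size, so larger samples do not rescue a semi-endogenous instrument.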
3.2. Semi-Endogenous Instruments
Finding a truly exogenous variable that is also correlated with x is a daunting
task for applied researchers.6 As discussed by Bartels (1991), it is useful to understand
whether a “semi-endogenous” variable (i.e., an instrument that is “somewhat” correlated
with the error term in the structural equation) will produce IV estimates that are preferred
to OLS estimates. We know from equation (3) that the resulting IV estimator will not be
consistent, but the IV estimate may still have an asymptotic bias that is smaller than the
bias in the OLS estimate.
It is possible to identify the circumstances where the bias in the IV estimator is
smaller by comparing the bias terms in equations (2) and (3).7 The IV estimator has
smaller bias if the following holds:
$$\frac{\sigma_u^2}{\sigma_x^2}\,\frac{R_{zu}^2}{R_{xz}^2} < \frac{\sigma_u^2}{\sigma_x^2}\,R_{xu}^2 \tag{5}$$
Rearranging and simplifying equation (5) yields the following condition for the
superiority of the IV estimator over the OLS estimator:
$$R_{zu}^2 < R_{xz}^2 \cdot R_{xu}^2 \tag{6}$$
As can be seen from equation (6), the “relative endogeneity” of x and z, and the
correlation between x and z are the critical determinants of whether IV estimators are
preferred to OLS estimators. For example, if $R_{xz}^2 = 0.10$, then the squared correlation between z
and u can be no more than 10% of the squared correlation between x and u for IV estimation to be
preferred over OLS. If the instrument selected by the researcher is moderately to highly
correlated with the x variable (which can be tested) and a compelling theoretical or
practical argument can be made regarding why the instrument is considerably more
exogenous than the x variable, the IV estimator will be preferred to the OLS estimator.
However, if the correlation between the instrument and the x variable is low and the
researcher has some concern about whether the instrument is truly exogenous, it can
easily be the case that the OLS estimator is preferred to the instrumental variable
estimator.
6 Determining whether corr (z,u) is equal to zero is especially problematic because u is not observable.
This means that it is not possible to directly estimate the correlation between z and u. We can estimate the
second stage residual, û. This can be used in a test of overidentifying restrictions, see Section 3.3.
7 We compare the squared bias terms to avoid problems with “sign flips.”
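The condition in equation (6) is easy to evaluate for any assumed set of population correlations. The sketch below uses hypothetical function names and takes the three correlations as inputs; since corr(z,u) and corr(x,u) are unobservable in practice, the inputs represent the researcher's assumptions rather than estimates.

```python
def iv_has_smaller_bias(r_zu, r_xz, r_xu):
    """Equation (6): IV has the smaller asymptotic squared bias when
    R^2_zu < R^2_xz * R^2_xu."""
    return r_zu ** 2 < (r_xz ** 2) * (r_xu ** 2)


def max_instrument_endogeneity(r_xz, r_xu):
    """Largest |corr(z,u)| that still satisfies equation (6):
    |corr(x,z)| * |corr(x,u)|."""
    return abs(r_xz) * abs(r_xu)


if __name__ == "__main__":
    # Illustrative inputs: instrument strength 0.4, instrument endogeneity 0.1.
    for r_xu in (0.1, 0.4, 0.7):
        print(r_xu, iv_has_smaller_bias(0.1, 0.4, r_xu), max_instrument_endogeneity(0.4, r_xu))
```

In correlation (rather than squared-correlation) terms, the bound on |corr(z,u)| is |corr(x,z)| times |corr(x,u)|, so a weak instrument leaves very little room for any endogeneity in z.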
3.3. Testing the Appropriateness of Instruments
Several tests have been developed in connection with the use of instrumental
variables estimation (e.g., Chapter 6 in Wooldridge, 2002). The most common is the
Hausman test (Hausman, 1978) which provides a formal test on whether the IV estimator
is significantly different from the OLS estimator. Under the assumption of the
appropriateness of the instruments, this test can be used to determine the existence of an
endogeneity problem and thus the appropriateness of using OLS. This test statistic can
also easily be computed by including both the observed x and the predicted x variable
from the first stage regression into an OLS version of the second stage regression. If the
coefficient on the predicted x is significant, the Hausman test rejects the null of no
endogeneity problem. Variations of this test are applied in the majority of the papers
investigated in our survey (see Table 4).
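The regression-based version of the Hausman test described above can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not part of the paper: one endogenous regressor, one instrument, no control variables, mean-zero data (so intercepts are omitted), and a conventional homoskedastic t-statistic. The function names and the data-generating helper `draw` are hypothetical.

```python
import math
import random


def hausman_t_stat(y, x, z):
    """t-statistic on the first-stage fitted value in an OLS regression of
    y on the observed x and the predicted x (no intercept)."""
    n = len(y)
    pi = sum(a * b for a, b in zip(z, x)) / sum(a * a for a in z)  # first stage
    x_hat = [pi * a for a in z]
    s11 = sum(a * a for a in x)
    s12 = sum(a * b for a, b in zip(x, x_hat))
    s22 = sum(a * a for a in x_hat)
    t1 = sum(a * b for a, b in zip(x, y))
    t2 = sum(a * b for a, b in zip(x_hat, y))
    det = s11 * s22 - s12 * s12
    b_x = (s22 * t1 - s12 * t2) / det          # coefficient on observed x
    b_xhat = (s11 * t2 - s12 * t1) / det       # coefficient on predicted x
    rss = sum((yi - b_x * xi - b_xhat * hi) ** 2 for yi, xi, hi in zip(y, x, x_hat))
    se_xhat = math.sqrt(rss / (n - 2) * s11 / det)
    return b_xhat / se_xhat


def draw(lam, n, seed=0):
    """Illustrative data: beta = 0, x = lam*u + eps, z = eps + delta (valid instrument)."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [lam * a + b for a, b in zip(u, eps)]
    z = [b + rng.gauss(0.0, 1.0) for b in eps]
    y = list(u)
    return y, x, z


if __name__ == "__main__":
    print(hausman_t_stat(*draw(lam=1.0, n=5000)))  # endogenous x
    print(hausman_t_stat(*draw(lam=0.0, n=5000)))  # exogenous x
```

A large absolute t-statistic on the predicted x points to a difference between the OLS and IV estimates. As the text stresses, this reading is only meaningful if the instrument itself is valid.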
Ideally we would like to have data on the structural error terms, so that we can
test whether the instruments and the second stage error are uncorrelated.8 Unfortunately,
the unobservability of this structural equation error term renders this test impossible.
However, it is possible to correlate the instruments with the estimated error term in the
second stage equation. In case of over-identified models (the number of instruments
exceeds the number of endogenous regressors), we can use this test to determine the
appropriateness of the instruments under the assumption that at least one of the
instruments is valid (see also Hausman, 1978).9 This test should be performed before the
Hausman test, as the latter is not valid if the over-identifying restrictions test rejects the
appropriateness of the instruments (e.g., Godfrey and Hutton, 1994). The over-identifying
restriction test statistic can be obtained by a regression of the second stage residuals on
all exogenous variables. If the instruments are valid, the coefficients on the instruments
should be close to zero. The formal test is based on the R² from this model being close to
zero. In particular, nR² is distributed χ² with K − L degrees of freedom, where K is the
number of exogenous variables unique to the first stage and L is the number of
endogenous explanatory variables. It is important to note that the test requires that at least
one of the instruments is valid (i.e., exogenous). If this does not hold, we can have a
situation where the instruments have similar bias, so that the test will not reject (even in
large samples), even though the coefficient can be severely biased.
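The nR² statistic described above can be sketched for the simplest over-identified case. The assumptions here are illustrative and not taken from the paper: one endogenous regressor (L = 1), two excluded instruments (K = 2), no control variables, and mean-zero data so that intercepts are omitted and R² is computed in uncentered form. All function names are hypothetical.

```python
import random


def ols2(a, b, y):
    """No-intercept OLS of y on two regressors a and b (2x2 normal equations)."""
    saa = sum(p * p for p in a)
    sab = sum(p * q for p, q in zip(a, b))
    sbb = sum(q * q for q in b)
    say = sum(p * r for p, r in zip(a, y))
    sby = sum(q * r for q, r in zip(b, y))
    det = saa * sbb - sab * sab
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det


def overid_statistic(y, x, z1, z2):
    """n * R^2 from regressing the 2SLS residuals on both instruments.

    With K = 2 instruments and L = 1 endogenous regressor, the statistic is
    compared with a chi-square distribution with K - L = 1 degree of freedom.
    """
    n = len(y)
    p1, p2 = ols2(z1, z2, x)                                   # first stage
    x_hat = [p1 * a + p2 * b for a, b in zip(z1, z2)]
    b_2sls = (sum(h * v for h, v in zip(x_hat, y))
              / sum(h * v for h, v in zip(x_hat, x)))
    resid = [v - b_2sls * w for v, w in zip(y, x)]             # uses observed x
    c1, c2 = ols2(z1, z2, resid)
    rss = sum((r - c1 * a - c2 * b) ** 2 for r, a, b in zip(resid, z1, z2))
    tss = sum(r * r for r in resid)                            # uncentered
    return n * (1.0 - rss / tss)


def draw(theta, n, seed=0):
    """Illustrative data: beta = 0, x = u + eps, z1 = eps + d1 (valid),
    z2 = eps + theta*u + d2 (invalid when theta != 0)."""
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        u, eps = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y.append(u)
        x.append(u + eps)
        z1.append(eps + rng.gauss(0.0, 1.0))
        z2.append(eps + theta * u + rng.gauss(0.0, 1.0))
    return y, x, z1, z2


if __name__ == "__main__":
    print(overid_statistic(*draw(theta=0.5, n=5000)))  # one invalid instrument
    print(overid_statistic(*draw(theta=0.0, n=5000)))  # both instruments valid
```

The 5 percent critical value of a χ² distribution with one degree of freedom is approximately 3.84. In this design the test has power only because z1 is valid while z2 is not; if both instruments were contaminated in a similar way, the statistic could remain small, which is the caveat raised in the text.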
8 In our survey of the accounting literature, we found several instances where the authors correlated the
instrument(s) with the dependent variable (with or without controls) instead of the error term. Upon
finding an insignificant relation, they concluded that they had a valid instrument. This is a completely
inappropriate procedure.
9 Obviously, this will be zero by construction for just-identified models (the number of instruments equals
the number of endogenous regressors).
In our survey, we investigate whether authors perform a test of over-identifying
restrictions. Since these restrictions can only be tested when the model is over-identified, we split
the sample into just-identified and over-identified models and report the results separately
(Table 4). In contrast to the Hausman test, we find that very few papers utilize this test
even though most use over-identified models. Finally, in very large samples both tests
will always reject, and therefore it is useful to supplement the formal test with a
sensitivity analysis that examines whether the use of different instrumental variables
yields very different results.
4. Finite Sample Properties of Instrumental Variable Estimators
The analysis in Section 3 focused strictly on asymptotic (very large sample)
results. While this asymptotic analysis is straightforward to compute and provides
important limiting results, it does not produce insight into the properties of OLS and IV
estimators applied in finite samples. Richardson (1968) and Sawa (1969) provide the
exact finite sample properties of a class of IV estimators. They show that the finite
sample bias of the IV estimator is in the same direction as the bias in the OLS estimator.
Moreover, Nelson and Startz (1990a, 1990b) find the asymptotic distribution of the IV
estimator is a very poor approximation to the finite sample distribution when the
instrument is only weakly correlated with the regressor. In addition to the above, an
extensive literature has evolved around the problems with weak instruments (e.g., Bound,
Jaeger, and Baker (1995), Staiger and Stock (1997), Hahn and Hausman (2003)).
4.1. Basic Structure
We examine the finite sample properties of IV estimators using the following
model:
$$\begin{aligned} x &= \lambda u + \varepsilon \\ z &= \gamma\varepsilon + \theta u + \delta \\ y &= \beta x + u \end{aligned} \tag{7}$$
The actual endogenous variable (y) is assumed to be a linear function of the predictor
variable (x) and the random structural equation error (u). The estimate of primary interest
is the structural equation parameter (β). The predictor variable (x) is assumed to be a
function of a random variable (ε) plus a function of the random structural equation error
(u) via the λ parameter. If λ is equal to zero, x is strictly exogenous.
The instrumental variable (z) is composed of three components. First, z is
assumed to be a function of random variable (ε) via the γ parameter. Since ε is the
exogenous part of x, the parameter γ partially determines the strength of the instrumental
variable. Second, the instrumental variable also allows z to be “semi-endogenous” via
the relation to the structural equation error (u). If the parameter θ is equal to zero, z will
be strictly exogenous in large samples. Third, z is a function of random error (δ). We
assume that ε, δ, and u have a normal distribution with a population mean of zero and
variances of $\sigma_\varepsilon^2$, $\sigma_\delta^2$, and $\sigma_u^2$, respectively. Finally, we assume that the population covariance
matrix of ε, δ, and u is diagonal. Although this structure is simple, it is sufficiently
complex to illustrate the finite sample issues with IV estimation.
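To make the structure in equation (7) concrete, the sketch below draws one sample from the model and reports the population correlations implied by a given (λ, γ, θ). It assumes that ε, δ, and u all have unit variance, which is a simplification chosen for illustration rather than a parameterization taken from the paper; the function names are hypothetical.

```python
import math
import random


def population_correlations(lam, gamma, theta):
    """(corr(x,u), corr(z,u), corr(x,z)) implied by model (7) when eps,
    delta, and u are independent with unit variance (an assumption)."""
    var_x = lam ** 2 + 1.0
    var_z = gamma ** 2 + theta ** 2 + 1.0
    corr_xu = lam / math.sqrt(var_x)
    corr_zu = theta / math.sqrt(var_z)
    corr_xz = (gamma + lam * theta) / math.sqrt(var_x * var_z)
    return corr_xu, corr_zu, corr_xz


def draw_sample(n, beta, lam, gamma, theta, seed=0):
    """Draw (x, z, y, u) from model (7) with standard normal shocks."""
    rng = random.Random(seed)
    x, z, y, u = [], [], [], []
    for _ in range(n):
        ui, eps, delta = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xi = lam * ui + eps
        x.append(xi)
        z.append(gamma * eps + theta * ui + delta)
        y.append(beta * xi + ui)
        u.append(ui)
    return x, z, y, u


def sample_corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    print(population_correlations(1.0, 1.0, 0.0))
    x, z, y, u = draw_sample(50_000, 0.0, 1.0, 1.0, 0.0)
    print(sample_corr(x, u), sample_corr(z, u), sample_corr(x, z))
```

Setting θ = 0 gives corr(z,u) = 0 in the population, while the sample correlation between z and u is only approximately zero, which is the source of the finite sample issues discussed in this section.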
For this model, the IV estimator can be characterized as follows:
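$$b_{IV} = \beta + \frac{\frac{1}{n}\sum zu}{\frac{1}{n}\sum xz} = \beta + \frac{m_{zu}}{\gamma m_{\varepsilon\varepsilon} + m_{\delta\varepsilon} + \theta m_{\varepsilon u} + \lambda m_{zu}}, \tag{8}$$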
where the $m_{ij}$ denote sample second moments or sample cross products. When we
set β = 0 and θ = 0, equation (8) simplifies to10:
$$b_{IV} = \frac{m_{zu}}{\gamma m_{\varepsilon\varepsilon} + m_{\delta\varepsilon} + \lambda m_{zu}} \tag{9}$$
This expression will asymptotically approach zero because it is the expression for
the bias in the IV estimator. However, in finite samples, the estimator is not well-behaved
because, even if θ = 0, $m_{zu}$ is not equal to zero in finite samples. Moreover,
similar to the analysis in Nelson and Startz (1990a), there is a discontinuity in $b_{IV}$ at the
point where $m_{zu} = -(\gamma m_{\varepsilon\varepsilon} + m_{\delta\varepsilon})\lambda^{-1}$. The bias in $b_{IV}$ when $m_{zu}$ is either a large negative or
positive number approaches $\lambda^{-1}$. However, when $m_{zu}$ approaches $-(\gamma m_{\varepsilon\varepsilon} + m_{\delta\varepsilon})\lambda^{-1}$ from the
left (right), $b_{IV}$ approaches positive (negative) infinity. Thus, the moments for $b_{IV}$ do not
exist.
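The discontinuity in equation (9) can be seen by evaluating the expression on either side of the point where its denominator vanishes. The sketch below treats the sample moments as given numbers; the particular values (γ = 1, m_εε = 1, m_δε = 0, λ = 0.5) are illustrative assumptions, chosen so that γm_εε + m_δε and λ are both positive, and the function names are hypothetical.

```python
def b_iv(m_zu, m_ee, m_de, gamma, lam):
    """Equation (9): the IV estimate when beta = 0 and theta = 0."""
    return m_zu / (gamma * m_ee + m_de + lam * m_zu)


def pole(m_ee, m_de, gamma, lam):
    """Value of m_zu at which the denominator of equation (9) is zero."""
    return -(gamma * m_ee + m_de) / lam


if __name__ == "__main__":
    m_ee, m_de, gamma, lam = 1.0, 0.0, 1.0, 0.5  # illustrative moment values
    p = pole(m_ee, m_de, gamma, lam)
    for m_zu in (p - 1e-6, p + 1e-6, -1e9, 1e9):
        print(m_zu, b_iv(m_zu, m_ee, m_de, gamma, lam))
```

Just to the left of the pole the estimate is a very large positive number, just to the right it is a very large negative number, and far from the pole in either direction it settles near 1/λ.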
Obviously, the OLS estimator is also biased in finite samples and the bias (recall
β = 0) is equal to:
$$b_{OLS} = \frac{m_{xu}}{m_{\varepsilon\varepsilon} + \lambda^2 m_{uu}} \tag{10}$$
Although bIV and bOLS are biased in finite samples, it is mathematically difficult to
compute the exact finite sample distribution for the estimators in order to compare their
statistical properties. As a result, we develop our finite sample results using the type of
numerical simulations that is common in applied econometric research.
10 We do not make the assumption that δ and ε are uncorrelated in finite samples as in Nelson and Startz
(1990a and 1990b). Making this assumption would imply that u is the only random variable in the system
causing a singular covariance matrix (see the comment by Maddala and Jeong (1992) regarding the impact
of this on the results in Nelson and Startz (1990a)).
4.2. Simulation Results
The specific approach used for the simulation is discussed in the Appendix. In
particular, our results are based on 144 independent simulations where we vary the
sample size (n = 100, 200, 500, and 1,000), endogeneity in the regressor x (corr(x,u) =
0.1, 0.4, and 0.7), endogeneity in the instrumental variable z (corr(z,u) = 0.0, 0.1, 0.4, and
0.7), and the strength of the instrumental variable (corr(x,z) = 0.1, 0.4, and 0.7). These
parameter values were selected to roughly correspond to the statistics reported in typical
accounting studies with regard to sample size and the explanatory power for the first-stage
regressions. For each simulation we generate 1,000 independent samples with the
indicated population moments. The percentiles for the distribution of the OLS and 2SLS
parameter estimates when the true β = 0 are presented in Tables 5A, 5B, and 5C for
corr(x,u) = 0.1, 0.4, and 0.7, respectively. The shaded cells in these tables represent cases
where 2SLS is preferred to OLS based on median bias, that is, in those cells the median
of 2SLS estimates is closer to zero than the median of OLS estimates.11 However, even in
those cases it is not clear that 2SLS should be the preferred estimator, since the dispersion
is (much) higher than the dispersion of OLS estimates, especially when the instruments
are weak.
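A single cell of a simulation of this kind can be sketched in a few lines. The code below is not the procedure from the Appendix; it is a simplified illustration that parameterizes model (7) directly through (λ, γ, θ) with unit-variance normal shocks and β = 0, uses one instrument (so that 2SLS coincides with the simple IV estimator), and summarizes the estimates by their 5th percentile, median, and 95th percentile. All names are hypothetical.

```python
import random
import statistics


def one_draw(rng, n, lam, gamma, theta):
    """One sample of size n from model (7) with beta = 0; returns (b_OLS, b_IV)."""
    sxx = sxy = szx = szy = 0.0
    for _ in range(n):
        u, eps, delta = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = lam * u + eps
        z = gamma * eps + theta * u + delta
        y = u  # beta = 0, so y equals the structural error
        sxx += x * x
        sxy += x * y
        szx += z * x
        szy += z * y
    return sxy / sxx, szy / szx


def percentile(values, p):
    """Simple order-statistic percentile (no interpolation)."""
    s = sorted(values)
    return s[int(round(p * (len(s) - 1)))]


def monte_carlo(n, lam, gamma, theta, reps=1000, seed=0):
    """Distribution summaries of OLS and IV estimates across reps samples."""
    rng = random.Random(seed)
    draws = [one_draw(rng, n, lam, gamma, theta) for _ in range(reps)]
    summary = {}
    for name, col in (("OLS", [d[0] for d in draws]), ("IV", [d[1] for d in draws])):
        summary[name] = {
            "p05": percentile(col, 0.05),
            "median": statistics.median(col),
            "p95": percentile(col, 0.95),
        }
    return summary


if __name__ == "__main__":
    # Strongly endogenous x, valid instrument with corr(x,z) = 0.5.
    print(monte_carlo(n=200, lam=1.0, gamma=1.0, theta=0.0, reps=500))
    # Same design, but with a semi-endogenous instrument.
    print(monte_carlo(n=200, lam=1.0, gamma=1.0, theta=0.5, reps=500))
```

Comparing the medians shows the bias of each estimator, while the gap between the 5th and 95th percentiles shows the dispersion; both need to be considered when choosing between OLS and IV.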
In situations where there is low endogeneity in the x variable (Table 5A), OLS
estimates generally dominate 2SLS estimates in terms of median bias. As might be
expected, asymptotically, 2SLS is preferred only when the instrument is perfect (i.e.,
corr(z,u) = 0.0). However, even in these cases, the variability of the 2SLS estimates
tends to be much larger than that for OLS, although this outcome is somewhat mitigated
for larger sample sizes and strong instrumental variables. The results in Table 5A
indicate that the use of 2SLS in situations where there is minimal endogeneity will
produce estimates for β with undesirable properties relative to OLS. This is especially
true when the selected instruments are weak and semi-endogenous. Thus, the
indiscriminate application of 2SLS is not desirable in accounting research.
11 This is consistent with the asymptotic results, that is, in those cells the inequality in equation (6) holds
for the population moments.
As endogeneity in the x variable increases (Tables 5B and 5C), the median bias
for OLS estimates is generally smaller than the median bias for 2SLS estimates when the
instruments are weak. Even in situations where the instruments have a strong association
with the x variable, OLS estimates are preferred to 2SLS estimates when the z exhibits
moderate to strong endogeneity. The 2SLS estimates exhibit lower median bias when the
instruments are strong, z has modest endogeneity, and the endogeneity in x is large.
However, it is also important to highlight that the variability of the 2SLS estimates is
generally larger than the variability of OLS estimates (although the variability for 2SLS
estimates is a decreasing function of sample size).
The results in Table 5 have important implications for the econometric methods
applied in accounting research. For example, consider the case where the researcher
suspects moderate to high endogeneity in the x variable (e.g., corr(x,u) = 0.4 or 0.7) and
the explanatory power for the first-stage regression is similar to that reported in our
survey of accounting applications (i.e., corr(z,x) = 0.4 or 16 percent explanatory power).
OLS estimates will be preferred to the 2SLS estimates unless the researcher can provide a
compelling argument that the endogeneity in the instrumental variable z is very small
(i.e., corr(z,u) = 0.1 or one percent explanatory power). For most empirical accounting
research, it is difficult to believe that the variables selected as instruments exhibit a level
of endogeneity that is this low. Although the simulation results are specific to the model
in equation (7) and selected parameter values, the analysis presented in Table 5 raises
serious concern about the desirability of IV estimation methods to correct for endogenous
regressors.12
4.3. Testing the Appropriateness of Instruments
Similar to the asymptotic case, it is important to test the appropriateness of the
instrumental variables model by performing a Hausman test and a test of over-identifying
restrictions. In addition to the issues discussed in Section 3.3, there is also the problem
that the small sample distribution of these test statistics can differ quite dramatically from
the assumed asymptotic distribution (e.g., Hahn and Hausman, 2003). Absent knowledge
about the exact finite sample properties of these tests, simulation can be used to estimate
the approximate size of the test statistics for a specific study. For example, Abernethy,
Bouwens and Van Lent (2004) perform such a simulation analysis and report that the
empirical distribution of the Hausman test in their sample is different from the asymptotic
distribution. Finally, it is useful to supplement the formal tests with a sensitivity analysis
where the researcher identifies the impact of using different instruments on the IV
estimates.
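As an illustration of the simulation idea (it is not the procedure used by Abernethy, Bouwens and Van Lent), the following sketch estimates the empirical size of a Hausman test in a deliberately simple design: one regressor, one instrument, no intercept, homoskedastic normal errors, and a regressor that is exogenous by construction. The function name, the design, and the parameter values are our assumptions. A study-specific analysis would instead mimic the sample size, instrument strength, and error structure of the application at hand.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of the chi-squared(1) distribution


def hausman_rejection_rate(n, corr_zx, reps=2000, seed=0):
    """Share of simulated samples in which the Hausman test rejects at the nominal 5% level.

    x is generated independently of the error u, so the null of exogeneity is true
    and the returned share estimates the finite-sample size of the test.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        z = rng.normal(size=n)
        x = corr_zx * z + np.sqrt(1.0 - corr_zx**2) * rng.normal(size=n)
        u = rng.normal(size=n)   # drawn independently of x and z
        y = u                    # true beta is zero (illustrative choice)
        b_ols = (x @ y) / (x @ x)
        b_iv = (z @ y) / (z @ x)
        # Use the OLS error variance for both estimators so that the
        # variance difference below is non-negative (Cauchy-Schwarz).
        s2 = np.sum((y - b_ols * x) ** 2) / (n - 1)
        v_ols = s2 / (x @ x)
        v_iv = s2 * (z @ z) / (z @ x) ** 2
        h = (b_iv - b_ols) ** 2 / (v_iv - v_ols)
        rejections += int(h > CHI2_1_CRIT_5PCT)
    return rejections / reps


for n in (25, 100, 500):
    print(n, hausman_rejection_rate(n, corr_zx=0.4))
```

Comparing the printed rejection rates with the nominal 5% level indicates how far the asymptotic critical value can be trusted in a design of that size.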
12
Although not reported in tables, we also examined the rejection percentages based on the t-statistics for
the OLS and IV estimators. If β = 0, we would desire that the rejection frequency is very low at
conventional levels of statistical significance. Unsurprisingly, we find that the rejection frequency goes to 1
as the bias and the sample size increase. Since the IV standard errors are (much) larger than the OLS
standard errors, we are less likely to reject the null using IV. In this case that is a good thing, because the
null happens to be the true value. However, in more general settings the true value of β is likely not zero.
The higher dispersion of the IV estimator then inhibits the detection of a non-zero parameter. This is
consistent with the findings of some of the papers in Table 1, where the authors find little difference
between the IV estimate and the OLS estimate, but the IV estimate is not significant whereas the OLS
estimate is. In those cases the Hausman test would not reject and one should adopt the OLS estimate.
Unfortunately, this is not always done.
5. Applications of OLS and IV Estimation Methods
In order to illustrate the theoretical issues discussed above, we apply OLS and IV
estimation methods in two contemporary accounting research settings. The first example
examines the association between the cost of capital and corporate disclosure and the
second example assesses the association between chief executive officer (CEO)
compensation and the power of corporate insiders. In each analysis, we use instrumental
variables that are commonly selected by prior researchers. We also apply the over-identifying
restrictions and Hausman (1978) tests for assessing the extent of endogeneity
in the regressors.13
5.1 Cost of Capital and Corporate Disclosure
The effect of voluntary disclosure on cost of capital has received considerable
interest from accounting researchers, but this topic remains controversial from both a
theoretical and econometric perspective. We rely on several papers that have attempted to
address the potential endogeneity of the disclosure choice. Leuz and Verrecchia (2000),
Hail (2002), and Brown and Hillegeist (2003) find the expected statistically positive
relation between their disclosure proxy and selected measures for the cost of capital after
using IV estimation. In contrast, Cohen (2003) finds that the relation between reporting
quality and cost of capital is no longer significant after taking into account the
endogeneity of the choice of reporting quality. The purpose of our analysis is not to
resolve this issue but rather to demonstrate common pitfalls in the application of
instrumental variables estimation.
13
A variety of alternative tests have been suggested (e.g., a graphical diagnostic developed by De Luna
and Johansson, 2004). We plan to evaluate these tests in future revisions.
Our measure of cost of capital is based on several measures of implied cost of
capital. Rather than arbitrarily picking one metric, we use the average of the four implied
cost of capital measures investigated by Guay, Kothari and Shu (2003). These are the
Gebhardt, Lee and Swaminathan measure, the Claus and Thomas measure, the Gordon
growth model, and the Gode and Mohanram measure. Our disclosure measure is based on
the prior work of Francis, Olsson, LaFond and Schipper (2005). Specifically, we estimate,
by industry and year, a regression of current accruals on sales growth, PPE, and past,
current and future cash flows, and save the residual for each firm-year. For each firm we
then calculate the standard deviation of these residuals over the past five years and call
this measure AccrualsQuality (AQ). Higher scores on this measure mean poorer
information quality.
For instrumental variables, we simply pick a set of variables previously used for
this purpose in other papers. Unfortunately, in most accounting studies on voluntary
disclosure, the instrumental variables are selected in an arbitrary manner and this violates
one of the most important principles in instrumental variables estimation (i.e., the careful
selection and justification of instruments). Our instrumental variables are the natural log
of the number of owners, one-year sales growth, capital intensity (PPE/assets), litigation
risk (dummy, equal to 1 if the firm is in a high litigation industry), operating margin,
length of operating cycle (in days), and the presence of a Big-six auditor. In addition we
use the following control variables: log of market value of equity, book-to-market equity,
number of analysts, leverage (total debt/market value of equity), and return on assets.
The cost of capital measures are computed as of July 1 for each year from 1982 to
2000. We restrict the sample to firms with December fiscal year end and all of our
independent variables are measured using data of six months before the computation of
the cost of capital measure. In order to reduce the influence of “outliers,” we truncate all
variables at the 1st and 99th percentile. The regressions are estimated in a pooled
time-series cross section setting after subtracting the year specific mean from each variable in
order to remove temporal effects. For ease of interpretation, we also divide the disclosure
measure (AccrualsQuality) by its year specific standard deviation. This allows an easy
interpretation of the coefficient of Accruals Quality, since the coefficient is the effect on
cost of capital (in percentage points) of a one standard deviation change in AQ. The
descriptive statistics are displayed in Table 6.
In our analysis, we first estimate an OLS regression of cost of capital on AQ and
the control variables. The results are displayed in Table 7. We find that a one standard
deviation increase in AQ is associated with a statistically significant 0.44% increase in
cost of capital. This is consistent with prior literature that finds that better disclosure is
associated with lower cost of capital (recall that higher AQ means lower quality
disclosure). We then estimate a 2SLS regression using the previously discussed
instruments. Consistent with econometric literature (e.g., Chapter 5 from Wooldridge,
2002) we include all exogenous variables in the first stage, not just the assumed
instruments. The R2 of this first-stage model is 27%. However, this overstates the true
explanatory power of the instruments as the control variables also contribute to this R2.
After removing the contribution of the control variables, the partial R2 is approximately
16%.
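The distinction between the first-stage R2 and the partial R2 of the instruments can be sketched as follows. The data are synthetic and the function names are ours. The sketch assumes a single endogenous regressor and removes the controls (and a constant) from both the regressor and the instruments before measuring explanatory power.

```python
import numpy as np


def r_squared(y, X):
    """Centered R^2 from an OLS regression of y on X (X must contain a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def partial_r_squared(x, instruments, controls):
    """R^2 of the instruments for x after partialling out the controls and a constant."""
    n = len(x)
    C = np.column_stack([np.ones(n), controls])
    # Residualize x and every instrument on the controls (Frisch-Waugh-Lovell).
    x_res = x - C @ np.linalg.lstsq(C, x, rcond=None)[0]
    Z_res = instruments - C @ np.linalg.lstsq(C, instruments, rcond=None)[0]
    fitted = Z_res @ np.linalg.lstsq(Z_res, x_res, rcond=None)[0]
    return (fitted @ fitted) / (x_res @ x_res)


# Synthetic first stage (all coefficients are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
controls = rng.normal(size=(n, 2))
instruments = rng.normal(size=(n, 3))
x = (controls @ np.array([0.5, -0.3])
     + instruments @ np.array([0.3, 0.2, 0.1])
     + rng.normal(size=n))

full_r2 = r_squared(x, np.column_stack([np.ones(n), controls, instruments]))
part_r2 = partial_r_squared(x, instruments, controls)
print(f"first-stage R2 = {full_r2:.3f}, partial R2 of instruments = {part_r2:.3f}")
```

The partial R2 can never exceed the full first-stage R2, which is why the latter overstates the strength of the instruments whenever the controls have explanatory power of their own.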
At this point, it is necessary to qualitatively evaluate whether the instrumental
variables estimation is likely to improve over the OLS estimate. From equation (6), we
know that the squared correlation of the instruments with the structural error term has to
be less than 16% (the partial R2) of the comparable squared correlation between AQ and
the structural error for 2SLS to provide better estimates than OLS. Thus the selected
instruments must be substantially more exogenous than AQ for the 2SLS estimates to
dominate the OLS estimates. Without a rigorous justification of the instruments there is
no way to evaluate this critical inequality. Although somewhat subjective, we believe
that variables such as operating margin, capital intensity, and growth rates are
endogenous since they are caused by the same variables as the determinants of voluntary
disclosure. Thus, our intuition is that OLS estimates are likely to dominate 2SLS
estimates.
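To make the hurdle concrete, the short calculation below applies the inequality exactly as stated above: 2SLS is asymptotically preferred only if the squared correlation between the instruments and the structural error is less than the partial R2 times the squared correlation between the endogenous regressor and the structural error. The function name and the grid of values for corr(x,u) are our assumptions. The correlations themselves are unobservable, so the output is a thought experiment rather than a test.

```python
import math


def max_instrument_endogeneity(partial_r2: float, corr_xu: float) -> float:
    """Largest |corr(z, u)| for which 2SLS still beats OLS asymptotically.

    Implements corr(z, u)^2 < partial_r2 * corr(x, u)^2, solved for |corr(z, u)|.
    """
    if not 0.0 < partial_r2 <= 1.0:
        raise ValueError("partial_r2 must lie in (0, 1]")
    return math.sqrt(partial_r2) * abs(corr_xu)


# Partial R2 of 16%, evaluated over a few hypothetical levels of endogeneity in x.
for corr_xu in (0.1, 0.4, 0.7):
    bound = max_instrument_endogeneity(0.16, corr_xu)
    print(f"corr(x,u) = {corr_xu:.1f}: 2SLS preferred only if |corr(z,u)| < {bound:.2f}")
```

With a partial R2 of 16%, the instruments must be at most 40% as correlated with the structural error as the regressor they replace, since the square root of 0.16 is 0.4.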
In the second stage results, we find that the effect of AQ on cost of capital has
increased relative to the OLS estimate. A one standard deviation increase in AQ now
leads to a statistically significant 1.1% increase in cost of capital. At this point, it would
be useful to determine whether it is conceivable that increases in disclosure can cause this
level of change in the cost of capital. Unrealistically high or low estimates would cause
suspicion about the quality of the instruments. In this case the estimate does not seem
unreasonable. After presenting the 2SLS results, accounting researchers typically then
perform a Hausman test (done by the majority of the papers we investigated). In this case
the Hausman test strongly rejects the exogeneity of AQ, and this leads the researcher to
conclude that the 2SLS estimate is preferable to the OLS estimate. However, the validity
of this conclusion critically depends on the appropriateness of the instruments (i.e., that
the instrumental variables are actually exogenous).
If multiple instruments are available for the endogenous variable, as we have in
this case, it is necessary to compute a test of over-identifying restrictions. If this test
rejects the appropriateness of the instruments, it is not appropriate to proceed to the
Hausman test (e.g., Godfrey and Hutton, 1994). Equivalently, it is possible to examine
the sensitivity of second stage estimates for the instrumented AQ to the use of different
(sets of) instruments. The intuition of this test is that if the instruments are valid, then
each should give us the true coefficient. Thus the estimates produced by different
instruments should be similar. This test is implemented in the last columns in Table 7
(i.e., “unconstrained” second stage). The model is an OLS regression of cost of capital on
all the independent variables. However, for ease of comparison, we replaced each
independent variable by the product of its original value and its first stage coefficient.
This facilitates the interpretation, because the coefficient on each instrument is equal to
the second stage coefficient on AQ in a model where that instrument is the only
instrument and the rest of the so-called instruments are treated as control variables (i.e.
they are included explicitly in the second stage). If the instruments are valid, the resulting
coefficients for the instruments should be close to each other and therefore close to the
2SLS estimate (which is the weighted average of these estimates).
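The equivalence described in this paragraph can be checked numerically. The sketch below uses synthetic data of our own design (three candidate instruments, one of which is invalid by construction, and one control variable). It is not the specification estimated in Table 7. It compares the coefficient on each scaled instrument in the "unconstrained" second stage with the just-identified IV estimate obtained when that instrument is the only excluded variable.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(1)
n = 2000
u = rng.normal(size=n)                       # structural error
Z = rng.normal(size=(n, 3))                  # three candidate instruments
Z[:, 2] += 0.5 * u                           # the third is invalid by construction
w = rng.normal(size=n)                       # one control variable
x = Z @ np.array([0.5, 0.4, 0.3]) + 0.3 * w + 0.6 * u + rng.normal(size=n)
y = 1.0 * x + 0.5 * w + u                    # true coefficient on x is 1

exog = np.column_stack([Z, w, np.ones(n)])   # instruments, control, constant
pi = ols(x, exog)                            # first-stage coefficients

# "Unconstrained" second stage: scale each variable by its first-stage
# coefficient (the constant is left unscaled) and regress y on the result.
scale = pi.copy()
scale[-1] = 1.0
delta = ols(y, exog * scale)


def iv_single_instrument(j):
    """Just-identified IV estimate using column j as the only excluded instrument."""
    others = [k for k in range(exog.shape[1]) if k != j]
    X = np.column_stack([x, exog[:, others]])            # x plus included exogenous variables
    W = np.column_stack([exog[:, j], exog[:, others]])   # instrument j plus the same variables
    return np.linalg.solve(W.T @ X, W.T @ y)[0]


for j in range(3):
    print(f"instrument {j}: scaled coefficient = {delta[j]:.4f}, "
          f"single-instrument IV = {iv_single_instrument(j):.4f}")
```

The two columns agree for every instrument, so dispersion among the scaled coefficients is dispersion among the single-instrument IV estimates.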
The results in Table 7 illustrate that the coefficients on the assumed exogenous
variables vary considerably. For example, if number of owners had been used as
the sole instrument, we would have found a negative and statistically insignificant
coefficient on AQ. However, if operating margin or operating cycle had been
used as instruments, we would have obtained implausibly high estimates of 10%-20%
increases in cost of capital for a one standard deviation increase in AQ. Not surprisingly,
we find that a formal test rejects the equivalence of these coefficients (χ2 = 148.1,
p < 0.0001). Note that the fact that sales growth and capital intensity have mid range
coefficients that are reasonably close to the 2SLS coefficient does not make them any
better or worse than the other instruments. The sensitivity analysis and the formal tests
indicate that our set of instruments is dubious, and unlikely to produce better estimates
than OLS.
A cautionary note on the use of the over-identifying restrictions test is that
research in econometrics (e.g., Hahn and Hausman, 2003) has shown that the size of the
test in finite samples can differ significantly from the asymptotic size, leading to false
rejections. In addition, it is necessary to consider the power of this test. In very large
samples the test may be so powerful that economically small deviations lead to rejections
(even though 2SLS is much better than OLS), whereas in small samples the test may lack
power to reject even economically important deviations (even though 2SLS might well be
worse than OLS). Thus, it is important to supplement the formal test with some
sensitivity analysis such as our “unconstrained” second stage to assess the similarities (or
dissimilarities) in the coefficient estimates obtained when using different sets of variables
as instruments. A final note of caution is that neither this test nor the sensitivity analysis
will pick up problems when all instruments exhibit similar problems. That is, if the
instruments all lead to a bias in the same direction with comparable magnitude, this test
will not reject (even in large samples), but the coefficient on the primary endogenous
variable of interest can be severely biased. Therefore the test should be used as a check
on instruments justified by economic theory and should not be used to select instruments,
unless one has a proper instrument to benchmark the coefficients against. Overall, the test
on over-identifying restrictions is a useful tool in evaluating the desirability of 2SLS
versus OLS methods. However, it cannot replace the careful selection and justification of
the variables used as instruments.
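One common form of the over-identifying restrictions test (the Sargan statistic) can be sketched as follows. We do not know which variant was computed for Table 7, so the code should be read as an illustration under our own assumptions: a single endogenous regressor, homoskedastic errors, and synthetic data in which one of three instruments is optionally made invalid. The statistic is the sample size times the R2 from regressing the 2SLS residuals on all exogenous variables.

```python
import numpy as np

CHI2_2_CRIT_5PCT = 5.991  # 5% critical value of chi-squared(2): 3 instruments - 1 regressor


def sargan_statistic(y, x, instruments, controls):
    """n * R^2 from regressing 2SLS residuals on all exogenous variables."""
    n = len(y)
    const = np.ones(n)
    exog = np.column_stack([instruments, controls, const])
    # First stage: fitted values of x from all exogenous variables.
    x_hat = exog @ np.linalg.lstsq(exog, x, rcond=None)[0]
    # Second stage: coefficients from regressing y on fitted x and the controls.
    beta = np.linalg.lstsq(np.column_stack([x_hat, controls, const]), y, rcond=None)[0]
    # 2SLS residuals are computed with the actual x, not the fitted x.
    resid = y - np.column_stack([x, controls, const]) @ beta
    fitted = exog @ np.linalg.lstsq(exog, resid, rcond=None)[0]
    return n * (fitted @ fitted) / (resid @ resid)


def simulate(invalid_strength, n=2000, seed=2):
    """Sargan statistic for synthetic data; invalid_strength > 0 contaminates instrument 3."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    Z = rng.normal(size=(n, 3))
    Z[:, 2] += invalid_strength * u
    w = rng.normal(size=(n, 1))
    x = Z @ np.array([0.5, 0.4, 0.3]) + 0.3 * w[:, 0] + 0.6 * u + rng.normal(size=n)
    y = 1.0 * x + 0.5 * w[:, 0] + u
    return sargan_statistic(y, x, Z, w)


stat_valid = simulate(0.0)
stat_invalid = simulate(0.5)
print(f"all instruments valid: {stat_valid:.2f}; one invalid instrument: {stat_invalid:.2f}")
```

Contaminating all three instruments in the same way, rather than only one, is a simple way to reproduce the caution above that the test has little power when the instruments share a common bias.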
5.2 CEO Compensation and Insider Power
Our second study investigates the impact of insider power on CEO compensation.
Prior literature such as Core, Holthausen, and Larcker (1999) has documented an
association between certain board characteristics and excess CEO compensation levels. A
potential concern is that there are many variables that affect both insider power and CEO
pay levels and which have not been properly controlled in the regression. Recent research
into the determinants of corporate governance has fueled this concern (e.g., Gillan,
Hartzell, and Starks, 2003; Doidge, Karolyi and Stulz, 2004b; Black, Jang, and Kim,
2004).14
Our sample is primarily developed from data provided by Equilar, Inc. Similar to
Larcker, Richardson and Tuna (2004), we obtain CEO compensation and board of
director data for 2,106 companies with fiscal year ends between June, 2002, and May,
2003. The dependent variable for our analysis is total compensation for the CEO,
measured as the natural logarithm of total remuneration (i.e., the sum of base salary,
annual bonus and the expected values for stock options, performance plans, and restricted
stock). The primary independent variable is “insider power.” We measure this construct
by adding the standardized scores of board size, percentage of outside directors older than
70, percent of outside directors on at least 4 boards, and whether an insider is chairman;
from this sum we subtract the standardized score of the average percentage share ownership
by outside directors. In order to increase interpretability of this measure, we standardize
it to have a zero mean and a standard deviation of one. Based on prior
research and a variety of institutional conjectures (e.g., Levitt, 2004), we expect that
corporate insiders will have more power where the chairman is also the CEO, the board is
large, and the outside members are old and/or busy with other commitments. Holding
aside issues related to endogeneity, we expect insider power to have a positive
association with the level of CEO compensation.
14
Another potential concern is that governance and compensation levels are jointly determined which
would make single equation methods inappropriate. The simultaneous equation or nonrecursive structure is
substantially more difficult to solve because instruments are required for both the CEO compensation and
insider power endogenous variables in order to identify the system. In our example, we will model only the
impact of insider power on CEO compensation.
Our control variables are the standard economic variables that have been used in
many prior studies of executive compensation: firm size (natural logarithm of market
value), book-to-market ratio, return on assets, stock return, standard deviation of return
on assets, and standard deviation of stock returns. We expect that the level of
compensation is increasing in firm size, extent of investment opportunities, stock market
and operating performance, and risk or volatility in the performance measures. As
instruments for insider power, we use the following (somewhat predetermined) CEO
characteristics: natural logarithm of CEO age, natural logarithm of CEO tenure and
whether the CEO is the founder of the company (measured as an indicator variable). We
expect that insider power will have a positive association with variables related to either
longevity of the CEO or the role of firm founder. Since different stock exchanges impose
different governance requirements on the firm, we incorporate an indicator variable for
whether the firm is listed on NYSE/AMEX. Since institutional shareholders can limit
insider power, we include the percentage holdings by blockholders and the percentage
holdings by activist institutions (e.g., public pension funds) as instruments (these data
were obtained from Spectrum filings). Finally, the external auditor also has the potential
to mitigate insider power and we measure this factor using an indicator variable for Big
Four auditor and the ratio of non-audit fees to audit fees. We expect that insider power
will be a decreasing function of the use of higher quality auditors and lower fees from
non-audit services. The final sample with complete data consists of 1,483 firms. The
descriptive statistics for these variables are displayed in Table 8.
The OLS regression of CEO compensation on insider power and control variables
is presented in Table 9. Most of the economic control variables have the expected signs,
although firm size is the primary determinant for CEO compensation. We also find that a
one standard deviation increase in insider power is associated with an 8% increase in total
CEO compensation. Whether this estimate is interpretable depends on the extent of
endogeneity in the insider power variable. The first stage regression for IV estimation
regresses insider power on all exogenous variables, and firm size also appears to be the
most important determinant of insider power. The statistical significance and signs for
the coefficient estimates are mixed (e.g., CEO age and tenure have the expected positive
relation, but founder has an unexpected negative relation with insider power). Although
the adjusted R2 for the entire model is 29%, the partial R2 for determining the strength of
the instrument is only about 5%. Given this result, the hurdle for instrumental variable
estimation to be preferred is very high. That is, we know from equation (6) that the
squared correlation between our set of instruments and the structural error has to be less
than 5% of the corresponding squared correlation between insider power and the
structural error. Unless the researcher is highly confident in the choice of instruments
(which is clearly open to question here), OLS is likely to be the preferred estimator. It is
important to note that the OLS estimator may have considerable bias, but it is still likely
to exhibit better statistical properties than the 2SLS estimator.
The second stage results indicate that there is a negative relation between insider
power and CEO compensation (i.e., a one standard deviation increase in insider power is
now associated with a 4% decrease in CEO remuneration). However, this estimate is not
statistically significant at conventional levels. Not surprisingly, we find that the Hausman
test cannot reject the equivalence of the OLS and the 2SLS estimates (χ2 = 1.25, p=
0.99). The appropriateness of the instrument can be assessed by examining the
‘unconstrained’ second stage. As in the disclosure study, we regress the dependent
variable (total compensation) on all the exogenous variables, where the independent
variables have been transformed into the product of the original value and the first stage
coefficient. The coefficient on each of the instruments can be interpreted as the 2SLS
estimate if that variable would have been the unique excluded variable, and all other
variables treated as control variables. Similar to the results from the disclosure study, the
coefficients exhibit substantial variability in both size and sign. As would be expected,
the formal test of over-identifying restrictions rejects the null of no relation between the
instruments and the error term (χ2 = 41.1, p < 0.0001). Thus, the results from the
compensation analysis provide even stronger evidence than the disclosure analysis that an
IV estimator does not represent an improvement over the OLS estimator in typical
accounting research studies.
6. Conclusion and Recommendations
There is little doubt that endogeneity causes substantial econometric problems in
virtually all non-experimental empirical accounting research. Accounting researchers are
knowledgeable about these econometric problems and it has become popular to use
instrumental variables in the hope of mitigating the inconsistency in parameter
estimates. However, as we have shown in our synthesis and extension of the
contemporary econometrics literature, many of the instrumental variable applications in
accounting are likely to produce highly misleading parameter estimates and inferential
tests. We agree with the insightful comments by Hamermesh (2000, p. 371) regarding IV
estimators:
… its proponents are often too quick to assume that the chosen instrument
is exogenous and generates a consistent estimate of the population
parameter.
One must be able to argue that the instrument is beyond the decisionmakers’ control and that it describes behavior that is randomly distributed
in the population one wishes to describe.
The ultimate question is whether IV estimation is useful for typical empirical
accounting research. While it is impossible to answer this question completely, there are
several analyses that researchers should always report in order to help the reader assess
the usefulness of an IV application. First, accounting researchers must provide a
justification for their choice of instrumental variables. In particular, the correlation
between the instruments and the structural error has a critical impact on the usefulness of
the IV estimators and researchers must use economic theory and/or intuition to convince
the reader that the size of this correlation is small enough so that IV estimators are
superior to OLS estimators. Second, researchers must report the full results of the first
stage regression, including the F-statistic and partial R2. At that point in the analysis, the
researcher should reevaluate the appropriateness of the instruments. This evaluation
should be based on whether the instruments are “sufficiently exogenous” relative to the
strength of their relation with the endogenous variable and the degree of endogeneity in
the regressor of interest (i.e., the inequality in equation (6) should be evaluated to justify
the use of IV estimators over OLS estimators). Third, in addition to the standard second
stage results, researchers should report sensitivity analyses similar to the ‘unconstrained’
second stage. This should be supplemented with a formal test of the appropriateness of
the instruments (i.e., over-identifying restriction test) and a test on the difference between
the IV estimator and the OLS estimator (i.e., Hausman test). Finally, it can be desirable to
use simulation to investigate properties of the test statistic used to test the null hypothesis
of interest.
The application of any textbook solution to a difficult applied research problem is
generally a very complicated task. The discussion in this paper provides some insight into
the difficulties encountered with the use of instrumental variables estimation for dealing
with endogeneity problems. In addition, our analysis provides some structure for
assessing whether IV estimation is likely to provide better estimates than OLS estimation.
Since endogeneity is an important problem in much accounting research, knowing when
and when not to use instrumental variable estimation is vital to making empirical
progress on many important accounting issues.
Appendix
Simulation Approach for Analyzing Finite Samples of OLS and IV Estimators
We are interested in examining the properties of OLS and IV estimators for the
following model:
x = λu + ε
z = γε + θu + δ
y = βx + u
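The population moments (correlations) of interest are:
\[
\rho_{x,u} = \frac{\lambda\sigma_u}{\sqrt{\lambda^2\sigma_u^2 + \sigma_\varepsilon^2}}
\]
\[
\rho_{z,u} = \frac{\theta\sigma_u}{\sqrt{\gamma^2\sigma_\varepsilon^2 + \theta^2\sigma_u^2 + \sigma_\delta^2}}
\]
\[
\rho_{x,z} = \frac{\lambda\theta\sigma_u^2 + \gamma\sigma_\varepsilon^2}{\sqrt{\lambda^2\sigma_u^2 + \sigma_\varepsilon^2}\,\sqrt{\gamma^2\sigma_\varepsilon^2 + \theta^2\sigma_u^2 + \sigma_\delta^2}}
\]
For our simulations, we want to pick the parameters $\sigma_\varepsilon^2$, $\sigma_\delta^2$, $\sigma_u^2$, $\lambda$, $\gamma$, and $\theta$ in order to
achieve the desired population moments.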
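The parameter $\lambda$ is the key input into the value of $\rho_{x,u}$. Squaring the equation for
$\rho_{x,u}$ results in:
\[
\rho_{x,u}^2 = \frac{\lambda^2\sigma_u^2}{\lambda^2\sigma_u^2 + \sigma_\varepsilon^2}
\]
and this can be used to solve for $\lambda$ as a function of $\sigma_\varepsilon^2$, $\sigma_u^2$, and $\rho_{x,u}$:
\[
\lambda = \sqrt{\left(\frac{1}{\rho_{x,u}^2} - 1\right)^{-1}\frac{\sigma_\varepsilon^2}{\sigma_u^2}}
\]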
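If we compute the ratio $\rho_{x,z}/\rho_{z,u}$ and simplify, this results in:
\[
\frac{\rho_{x,z}}{\rho_{z,u}} = \frac{\gamma}{\theta}\,\frac{\sigma_\varepsilon^2}{\sigma_u\sqrt{\lambda^2\sigma_u^2 + \sigma_\varepsilon^2}} + \frac{\lambda\sigma_u^2}{\sigma_u\sqrt{\lambda^2\sigma_u^2 + \sigma_\varepsilon^2}}
\]
After rearranging terms, we obtain the following:
\[
\gamma = \theta\left(\frac{\sigma_u\sqrt{\lambda^2\sigma_u^2 + \sigma_\varepsilon^2}}{\sigma_\varepsilon^2}\,\frac{\rho_{x,z}}{\rho_{z,u}} - \frac{\lambda\sigma_u^2}{\sigma_\varepsilon^2}\right) = \theta K
\]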
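After squaring $\rho_{z,u}$, we get the following expression:
\[
\rho_{z,u}^2 = \frac{\theta^2\sigma_u^2}{\gamma^2\sigma_\varepsilon^2 + \theta^2\sigma_u^2 + \sigma_\delta^2}
\]
and we can then substitute in $\gamma = \theta K$ and solve for $\theta$. This gives us an expression for $\theta$ in
terms of $\sigma_\varepsilon^2$, $\sigma_\delta^2$, $\sigma_u^2$, $\lambda$ (computed above), and $\rho_{z,u}$ and $\rho_{x,z}$ (specified to the values of
interest):
\[
\theta = \sqrt{\left(\frac{\sigma_u^2}{\rho_{z,u}^2} - K^2\sigma_\varepsilon^2 - \sigma_u^2\right)^{-1}\sigma_\delta^2}
\]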
Finally, we can compute $\gamma$ as $\theta K$. Using the above formulae and specified values
for $\sigma_\varepsilon^2$, $\sigma_\delta^2$, $\sigma_u^2$, we can compute the $\lambda$, $\gamma$, and $\theta$ required to achieve the desired population
correlations of $\rho_{x,u}$, $\rho_{z,u}$ and $\rho_{x,z}$.15 In the simulation ε, δ, and u are drawn from a
standard normal distribution. For each sequence of random observations for ε, δ, and u,
the assumed regression parameter β, and the computed parameters λ, γ, and θ, we can
compute y, x, and z such that we achieve the desired population correlations $\rho_{x,u}$, $\rho_{z,u}$,
and $\rho_{x,z}$.
15
Since $\rho_{z,u}$ is in the denominator of the expression for $\theta$, it must be different from zero. To avoid having
to derive a separate set of formulae for this special case, we set the minimum $\rho_{z,u}$ to be $1 \times 10^{-13}$.
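The calibration above can be sketched in a few lines of code. This is an illustration rather than the program used to produce Table 5. The function names, the default unit variances, the separate handling of a zero correlation between x and u, and the feasibility check are our assumptions, and taking positive square roots restricts the sketch to non-negative target correlations. The second function recomputes the population correlations from the calibrated parameters as a consistency check, and the third draws one sample and returns no-intercept OLS and IV estimates of β.

```python
import math
import random


def calibrate(rho_xu, rho_zu, rho_xz, var_e=1.0, var_d=1.0, var_u=1.0):
    """Return (lam, gamma, theta) that deliver the requested population correlations."""
    if rho_xu == 0.0:
        lam = 0.0  # our shortcut: x = epsilon when x is exogenous
    else:
        lam = math.sqrt((var_e / var_u) / (1.0 / rho_xu**2 - 1.0))
    sd_u = math.sqrt(var_u)
    sd_x = math.sqrt(lam**2 * var_u + var_e)
    K = (sd_u * sd_x / var_e) * (rho_xz / rho_zu) - lam * var_u / var_e
    denom = var_u / rho_zu**2 - K**2 * var_e - var_u
    if denom <= 0.0:
        raise ValueError("requested correlations are not jointly attainable")
    theta = math.sqrt(var_d / denom)
    return lam, theta * K, theta


def population_correlations(lam, gamma, theta, var_e=1.0, var_d=1.0, var_u=1.0):
    """Population values of corr(x,u), corr(z,u), corr(x,z) implied by the model."""
    sd_u = math.sqrt(var_u)
    sd_x = math.sqrt(lam**2 * var_u + var_e)
    sd_z = math.sqrt(gamma**2 * var_e + theta**2 * var_u + var_d)
    rho_xu = lam * sd_u / sd_x
    rho_zu = theta * sd_u / sd_z
    rho_xz = (lam * theta * var_u + gamma * var_e) / (sd_x * sd_z)
    return rho_xu, rho_zu, rho_xz


def simulate_once(n, beta, lam, gamma, theta, rng):
    """Draw one sample with standard normal shocks; return (OLS, IV) estimates of beta."""
    sxy = sxx = szy = szx = 0.0
    for _ in range(n):
        e, d, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = lam * u + e
        z = gamma * e + theta * u + d
        y = beta * x + u
        sxy += x * y
        sxx += x * x
        szy += z * y
        szx += z * x
    return sxy / sxx, szy / szx


lam, gamma, theta = calibrate(rho_xu=0.4, rho_zu=0.1, rho_xz=0.4)
print("implied correlations:", population_correlations(lam, gamma, theta))
print("one draw (OLS, IV):", simulate_once(500, 0.0, lam, gamma, theta, random.Random(0)))
```

Repeating `simulate_once` many times and taking the median of each estimator would give the kind of median-bias comparison discussed in Section 4.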
References
Abernethy, Margaret A., Jan Bouwens, and Laurence van Lent, 2004, Determinants of
control system design in divisionalized firms, Accounting Review 79, 545-570.
Aboody, David, 1996, Market valuation of employee stock options, Journal of
Accounting and Economics 22, 357-391.
Al-Tuwaijri, Sulaiman A., Theodore E. Christensen, and K.E. Hughes II, 2004, The
relations among environmental disclosure, environmental performance, and
economic performance: a simultaneous equations approach, Accounting,
Organizations, and Society 29, 447-471.
Anderson, Ronald C., Sattar A. Mansi and David M. Reeb, 2004, Board characteristics,
accounting report integrity, and the cost of debt, Journal of Accounting and
Economics 37, 315-342.
Ball, Ray, S.P. Kothari, and Ashok Robin, 2002, The effect of international institutional
factors on properties of accounting earnings, Journal of Accounting and
Economics 29, 1-51.
Bartels, Larry M., 1991, Instrumental and ‘quasi-instrumental’ variables, American
Journal of Political Science 35, 777-800.
Barton, Jan, 2001, Does the use of financial derivatives affect earnings management
decisions?, Accounting Review 76, 1-26.
Beatty, Anne, Sandra L. Chamberlain, Joseph Magliolo, 1995, Managing Financial
Reports of Commercial Banks: The Influence of Taxes, Regulatory Capital, and
Earnings, Journal of Accounting Research 33, pp. 231-261.
Beaver, William H., Paul Kettler, and Myron Scholes, 1970, The association between
market determined and accounting determined risk measures, Accounting Review
45, 654-682.
Beaver, William H., Mary Lea McAnally, and Christopher H. Stinson, 1997, The
information content of earnings and prices: A simultaneous equations approach,
Journal of Accounting and Economics 23, 53-81.
Black, Bernard S., Hasung Jang, and Woochan Kim, 2004, Predicting firms’ corporate
governance choices: Evidence from Korea, working paper.
Bound, John, David A. Jaeger, and Regina M. Baker, 1995, Problems with instrumental
variables estimation when the correlation between the instruments and the
endogenous explanatory variable is weak, Journal of the American Statistical
Association 90, 443-450.
Brown, Stephen, and Stephen A. Hillegeist, 2003, Disclosure quality and information
asymmetry, working paper.
Bushman, Robert M, Joseph D. Piotroski, and Abbie J. Smith, 2005, What determines
corporate transparency?, Journal of Accounting Research, forthcoming.
Bushee, Brian J., Dawn A. Matsumoto and Gregory S. Miller, 2003, Open versus closed
conference calls: the determinants and effects of broadening access to disclosure,
Journal of Accounting and Economics 34, 149-180.
Cohen, Daniel A., 2003, Quality of financial reporting choice: Determinants and
economic consequences, working paper.
Copley, Paul A., Edward B. Douthett Jr, 2002, The association between auditor choice,
ownership retained, and earnings disclosure by firms making initial public
offerings, Contemporary Accounting Research 19, 49-75.
Copley, Paul A., Jennifer J. Gaver, Kenneth M. Gaver, 1995, Simultaneous Estimation of
the Supply and Demand of Differentiated Audits: Evidence from the Municipal
Audit Market, Journal of Accounting Research 33, pp. 137-155.
Core, John E., Robert W. Holthausen, and David F. Larcker, 1999, Corporate
governance, chief executive compensation, and firm performance, Journal of
Financial Economics 51, 371-406.
DeFond, Mark L, K Raghunandan, K.R Subramanyam, 2002, Do non-audit service fees
impair auditor independence? Evidence from going concern audit opinions,
Journal of Accounting Research 40, 1247-1274.
De Luna, Xavier, and Per Johansson, 2001, Graphical diagnostics of endogeneity,
Working paper, Umeå University.
Demers, Elizabeth, and Katharina Lewellen, 2003, The marketing role of IPOs: evidence
from internet stocks, Journal of Financial Economics 68, 413-437.
Doidge, Craig, G. Andrew Karolyi and René M. Stulz, 2004a, Why are foreign firms
listed in the U.S. worth more?, Journal of Financial Economics 71, 205-238.
Doidge, Craig, G. Andrew Karolyi and René M. Stulz, 2004b, Why do countries matter
so much for corporate governance?, working paper.
D'Souza, Julia M.,1998, Rate-Regulated Enterprises and Mandated Accounting Changes:
The Case of Electric Utilities and Post-Retirement Benefits Other Than Pensions
(SFAS No. 106), The Accounting Review 73, pp. 387-410.
Francis, Jennifer, Ryan LaFond, Per Olsson, and Katherine Schipper, 2005, The market
pricing of accruals quality, Journal of Accounting and Economics 39,
forthcoming.
Gillan, Stuart L., Jay C. Hartzell, and Laura T. Starks, 2003, Explaining corporate
governance: Boards, bylaws and charter provisions, working paper.
Godfrey, L.G., and J.P. Hutton, 1994, Discriminating between errors-in-variables /
simultaneity and misspecification in linear regression models, Economics Letters
44, 359-364.
Greene, William H., 2002, Econometric Analysis, 5th edition, Prentice Hall.
Guay, Wayne R., S.P. Kothari, and Susan Shu, 2003, Properties of implied cost of capital
using analysts’ forecasts, working paper.
Hahn, Jinyong, and Jerry A. Hausman, 2003, Weak instruments: diagnoses and cures in empirical
econometrics, American Economic Review 93, 118-125.
Hail, Luzi, 2002, The impact of voluntary disclosure on the ex ante cost of capital for
Swiss firms, European Accounting Review 11, 741-773.
Hamermesh, Daniel S., 2000, The craft of labormetrics, Industrial and Labor Relations
Review 53, 363-380.
Hansen, Stephen C., John S. Watts, 1997, Two models of auditor-client interaction: Tests
with United Kingdom data, Contemporary Accounting Research 14, 23-50.
Hausman, Jerry A., 1978, Specification tests in econometrics, Econometrica 46, 1251-1271.
Hodder, Leslie, Mark Kohlbeck, Mary Lea McAnally, 2002, Accounting choices and risk
management: SFAS no. 115 and U.S. bank holding companies, Contemporary
Accounting Research 19, 225-270.
Holthausen, Robert W., David F. Larcker and Richard G. Sloan, 1995, Business unit
innovation and the structure of executive compensation, Journal of Accounting
and Economics 19, 279-313.
Hunt, Alister, Susan E. Moyer, and Terry Shevlin, 1996, Managing interacting
accounting measures to meet multiple objectives: A study of LIFO firms, Journal
of Accounting and Economics 21, 339-374.
Kasznik, Ron, 1999, On the Association between Voluntary Disclosure and Earnings
Management, Journal of Accounting Research 37, pp. 57-81.
Keating, A. Scott, 1997, Determinants of divisional performance evaluation practices,
Journal of Accounting and Economics 24, 243-273.
Lang, Mark H., Karl V. Lins, Darius P. Miller, 2004, Concentrated control, analyst
following, and valuation: Do analysts matter most when investors are protected
least?, Journal of Accounting Research 42, 589-623.
Larcker, David F., Scott A. Richardson, A. Irem Tuna, How important is corporate
governance?, working paper.
Leuz, Christian, Dhananjay Nanda and Peter D. Wysocki, 2003, Earnings management
and investor protection: an international comparison, Journal of Financial
Economics 69, 505-527.
Leuz, Christian, Robert E. Verrecchia, 2000, The Economic Consequences of Increased
Disclosure, Journal of Accounting Research 38, pp. 91-124.
Levitt Jr., Arthur, 2004, “Money, Money, Money”, The Wall Street Journal, November
22, p. A14.
Libby, Theresa, Robert Mathieu, and Sean W. G. Robb, 2002, Earnings announcements
and information asymmetry: An intra-day analysis, Contemporary Accounting
Research 19, 449-472.
Loudder, Martha L., Inder K. Khurana, James R. Boatsman, 1996, Market Valuation of
Regulatory Assets in Public Utility Firms, The Accounting Review 71, pp. 357-373.
Maddala, G.S., 1977, Econometrics, New York: McGraw-Hill.
Maddala, G.S., and Jinook Jeong, 1992, On the exact small sample distribution of the
instrumental variable estimator, Econometrica 60, 181-183.
Mensah, Yaw M., Judith M. Considine, Leslie Oakes, 1994, Statutory Insolvency
Regulations and Earnings Management in the Prepaid Health-Care Industry, The
Accounting Review 69, pp. 70-95.
Murphy, Kevin J., and Jerold L. Zimmerman, 1993, Financial performance surrounding
CEO turnover, Journal of Accounting and Economics 16, 273-315.
Nelson, Charles R., and Richard Startz, 1990a, The distribution of the instrumental
variables estimator and its t-ratio when the instrument is a poor one, Journal of
Business 63, s125-s140.
Nelson, Charles R., and Richard Startz, 1990b, Some further results on the exact small
sample properties of the instrumental variable estimator, Econometrica 58, 967-976.
O'Brien, Patricia C., Ravi Bhushan, 1990, Analyst Following and Institutional
Ownership, Journal of Accounting Research 28, pp. 55-76.
Rajgopal, Shivaram, and Terry Shevlin, 2002, Empirical evidence on the relation
between stock option compensation and risk taking, Journal of Accounting and
Economics 33, 145-171.
Rajgopal, Shivaram, Mohan Venkatachalam, Suresh Kotha, 2003, The value relevance of
network advantages: The case of e-commerce firms, Journal of Accounting
Research 41, 135-162.
Richardson, David H., 1968, The exact distribution of a structural coefficient estimator,
Journal of the American Statistical Association 63, 1214-1226.
Roulstone, Darren T., 2003, The relation between insider-trading restrictions and
executive compensation, Journal of Accounting Research 41, 525-551.
Roulstone, Darren T., 2003, Analyst following and market liquidity, Contemporary
Accounting Research 20, 551-?.
Sawa, Takamitsu, 1969, The exact sampling distribution of ordinary least squares and
two-stage least squares estimators, Journal of the American Statistical Association
64, 923-937.
Staiger, Douglas, and James H. Stock, 1997, Instrumental variables regression with weak
instruments, Econometrica 65, 557-586.
Vafeas, Nikos, 1999, Board meeting frequency and firm performance, Journal of
Financial Economics 53, 113-142.
Whisenant, Scott, Srinivasan Sankaraguruswamy, K. Raghunandan, 2003, Evidence on
the joint determination of audit and non-audit fees, Journal of Accounting
Research 41, 721-744.
Willenborg, Michael, 1999, Empirical Analysis of the Economic Demand for Auditing in
the Initial Public Offerings Market, Journal of Accounting Research 37, pp. 225-238.
Wooldridge, Jeffrey M., 2002, Econometric analysis of cross section and panel data, MIT
Press, Cambridge
Table 1
Articles utilizing instrumental variables estimation
This table lists the papers we examine to determine current practice in instrumental variables
estimation. The list was obtained by an electronic search for the term "2sls" in the text of each article. We
searched the following journals: Accounting, Organizations and Society (AOS), Accounting Review (AR), Journal
of Accounting and Economics (JAE), Journal of Financial Economics (JFE), Journal of Accounting Research (JAR)
and Review of Accounting Studies (RAST). Articles that use instrumental variables solely to deal
with measurement error are not included. This list is undoubtedly incomplete, and suggestions for additional
papers are greatly appreciated.
Financial Accounting
Aboody (JAE, 1996)
Al-Tuwaijri, Christensen, and Hughes (AOS, 2004)
Barton (AR, 2001)
Beatty, Chamberlain and Magliolo (JAR, 1995)
Beaver, McAnally, and Stinson (JAE, 1997)
Bushee, Matsumoto, and Miller (JAE, 2003)
Demers and Lewellen (JFE, 2003)
Doidge, Karolyi and Stulz (JFE, 2004)
D'Souza (AR, 1998)
Hodder, Kohlbeck and McAnally (CAR, 2002)
Hunt, Moyer, and Shevlin (JAE, 1996)
Kasznik (JAR, 1999)
Lang, Lins, and Miller (JAR, 2004)
Leuz and Verrecchia (JAR, 2000)
Leuz, Nanda, and Wysocki (JFE, 2003)
Libby, Mathieu, and Robb (CAR, 2002)
Loudder, Khurana, and Boatsman (AR, 1996)
Mensah, Considine, and Oakes (AR, 1994)
O'Brien and Bhushan (JAR, 1990)
Rajgopal, Venkatachalam, and Kotha (JAR, 2003)
Roulstone (CAR, 2003)
Management Accounting
Abernethy, Bouwens, and Van Lent (AR, 2004)
Holthausen, Larcker and Sloan (JAE, 1995)
Keating (JAE, 1997)
Audit
Copley and Douthett (CAR, 2002)
Copley, Gaver and Gaver (JAR, 1995)
DeFond, Raghunandan, and Subramanyam (JAR, 2002)
Hansen and Watts (CAR, 1997)
Whisenant, Sankaraguruswamy and Raghunandan (JAR, 2003)
Willenborg (JAR, 1999)
Corporate Governance
Anderson, Mansi, and Reeb (JAE, 2004)
Murphy and Zimmerman (JAE, 1993)
Rajgopal and Shevlin (JAE, 2002)
Roulstone (JAR, 2003)
Vafeas (JFE, 1999)
Table 2
Types of instrumental variables models
This table shows the frequency of the different types of applications of instrumental variables, classified by research
area. We distinguish between three types of IV models: 'standard' two-stage least squares, Heckman-type models, and
simultaneous equations models. The research areas are defined in Table 1.
                                             FA    MA   AUD    CG   Total
Recursive models
  Standard two-stage least squares            4     0     1     2       7
  First stage is a probit model (Heckman)     3     0     0     1       4
Non-recursive models
  Simultaneous equations models              14     3     5     2      24
Table 3
Descriptive statistics
This table shows the distribution of sample size across the articles we investigate. We also report the R-squared
from the first stage equation (this is only available for a subset of 26 out of 35 articles). In addition we
report the distribution of the number of endogenous regressors and the average number of instruments per
category.
                                            Mean     Min      Q1  Median      Q3     Max
Number of observations                      2393      31     162     654    1380   40386
First-stage R-squared (26 observations)     0.31    0.01    0.20    0.26    0.36    0.80

Number of endogenous regressors                1       2       3      >3
Articles per category                          9      17       6       3
Average number of instruments                4.7     5.8     6.2     7.3
Table 4
Incidence of specification tests
This table shows the incidence of specification testing. We report whether the authors perform a Hausman
test (or a test on the inverse Mills ratio in the case of a Heckman model), and whether the authors perform a test
of over-identifying restrictions. Since the latter can only be performed when the model is over-identified (more
instruments than endogenous variables), we split the sample into just-identified and over-identified models and
report the results separately. Two papers do not report the number of instruments and are treated separately.
                                                          Just identified   Over identified   Not enough info
Articles in category                                                    5                28                 2
Authors perform Hausman test                                            2                18                 1
Authors perform test of over-identifying restrictions                  NA                 4                 0
Table 5A
Simulation results for OLS and IV estimators for corr(x,u) = 0.1
This table shows the results of the simulation. Panel A investigates the situation where x has a small
endogeneity problem, corr(x,u) = 0.1; Panel B (C) investigates the situation where x has a medium (large)
endogeneity problem, corr(x,u) = 0.4 (corr(x,u) = 0.7). Each panel is divided into sub-panels based on
sample size (n = 100, 200, 500, 1000). Each sub-panel shows the effect of varying the quality of the
instrument, which drops from top (corr(z,u) = 0.0) to bottom (corr(z,u) = 0.7), and the strength of the
instrument, which increases from left (corr(x,z) = 0.1) to right (corr(x,z) = 0.7). For each combination of
instrument strength and quality we draw 1000 independent samples of size n and run both OLS and
2SLS regressions. We then display the 5th, 25th, 50th (median), 75th and 95th percentiles of
the distribution of the estimated β (for comparison, the true β = 0) for both OLS and 2SLS. The
shaded cells indicate results where 2SLS has smaller median bias.
                               corr(x,z)=0.1            |           corr(x,z)=0.4            |           corr(x,z)=0.7
                         p5    p25    p50    p75    p95 |     p5    p25    p50    p75    p95 |     p5    p25    p50    p75    p95
n=100
corr(z,u)=0.0  OLS    -0.06   0.04   0.10   0.16   0.26 |  -0.06   0.03   0.09   0.16   0.26 |  -0.06   0.04   0.11   0.18   0.26
               2SLS   -2.89  -0.55   0.08   0.72   4.02 |  -0.49  -0.19   0.00   0.16   0.42 |  -0.23  -0.09   0.02   0.11   0.26
corr(z,u)=0.1  OLS    -0.07   0.03   0.10   0.17   0.26 |  -0.07   0.03   0.09   0.16   0.26 |  -0.07   0.04   0.11   0.17   0.26
               2SLS   -3.64  -0.03   0.63   1.59   5.56 |  -0.18   0.08   0.24   0.42   0.69 |  -0.12   0.05   0.14   0.24   0.38
corr(z,u)=0.4  OLS    -0.06   0.04   0.10   0.17   0.26 |  -0.08   0.03   0.10   0.17   0.27 |  -0.07   0.03   0.10   0.17   0.28
               2SLS  -15.50   1.48   2.74   5.34  16.41 |   0.56   0.79   1.00   1.26   1.79 |   0.34   0.47   0.57   0.68   0.86
corr(z,u)=0.7  OLS    -0.08   0.03   0.09   0.16   0.26 |  -0.06   0.03   0.10   0.17   0.26 |  -0.06   0.03   0.10   0.17   0.28
               2SLS  -26.49   3.09   4.99   8.91  39.14 |   1.14   1.45   1.72   2.12   3.04 |   0.70   0.87   0.99   1.13   1.40
n=200
corr(z,u)=0.0  OLS    -0.01   0.06   0.10   0.15   0.22 |  -0.02   0.05   0.10   0.15   0.21 |  -0.01   0.05   0.10   0.14   0.22
               2SLS   -1.98  -0.42   0.06   0.56   2.53 |  -0.30  -0.12   0.00   0.12   0.30 |  -0.17  -0.07  -0.01   0.06   0.18
corr(z,u)=0.1  OLS    -0.02   0.05   0.10   0.15   0.21 |  -0.02   0.05   0.10   0.15   0.22 |  -0.01   0.06   0.10   0.15   0.21
               2SLS   -2.59   0.38   0.85   1.54   5.23 |  -0.04   0.13   0.25   0.38   0.56 |  -0.03   0.07   0.14   0.21   0.31
corr(z,u)=0.4  OLS    -0.01   0.05   0.10   0.15   0.22 |  -0.02   0.05   0.10   0.15   0.21 |  -0.02   0.05   0.10   0.15   0.22
               2SLS   -8.97   2.29   3.58   6.00  21.08 |   0.65   0.83   0.99   1.17   1.50 |   0.40   0.50   0.57   0.65   0.76
corr(z,u)=0.7  OLS    -0.02   0.05   0.10   0.14   0.21 |  -0.02   0.05   0.10   0.15   0.21 |  -0.03   0.06   0.10   0.15   0.22
               2SLS  -23.38   4.05   6.23  10.77  41.27 |   1.29   1.54   1.72   1.99   2.54 |   0.79   0.90   1.00   1.10   1.24
n=500
corr(z,u)=0.0  OLS     0.03   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.17
               2SLS   -1.35  -0.34  -0.03   0.29   1.00 |  -0.20  -0.07   0.00   0.08   0.19 |  -0.11  -0.04   0.00   0.04   0.10
corr(z,u)=0.1  OLS     0.03   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.17
               2SLS    0.19   0.65   1.00   1.54   3.28 |   0.07   0.18   0.25   0.33   0.45 |   0.03   0.09   0.14   0.18   0.24
corr(z,u)=0.4  OLS     0.02   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.17
               2SLS    2.15   2.95   3.86   5.41  11.68 |   0.79   0.90   1.00   1.10   1.29 |   0.46   0.52   0.57   0.62   0.69
corr(z,u)=0.7  OLS     0.02   0.07   0.10   0.13   0.17 |   0.02   0.07   0.10   0.13   0.17 |   0.03   0.07   0.10   0.13   0.18
               2SLS    3.81   5.32   6.84   9.67  23.93 |   1.44   1.62   1.75   1.89   2.15 |   0.87   0.94   1.00   1.06   1.14
n=1000
corr(z,u)=0.0  OLS     0.05   0.08   0.10   0.12   0.15 |   0.05   0.08   0.10   0.12   0.15 |   0.05   0.08   0.10   0.12   0.15
               2SLS   -0.60  -0.20   0.02   0.23   0.61 |  -0.12  -0.05   0.00   0.06   0.13 |  -0.07  -0.03   0.00   0.03   0.07
corr(z,u)=0.1  OLS     0.05   0.08   0.10   0.12   0.16 |   0.05   0.08   0.10   0.12   0.15 |   0.05   0.08   0.10   0.12   0.15
               2SLS    0.45   0.72   1.00   1.33   2.29 |   0.11   0.19   0.25   0.30   0.38 |   0.07   0.11   0.14   0.17   0.22
corr(z,u)=0.4  OLS     0.05   0.08   0.10   0.12   0.15 |   0.05   0.08   0.10   0.12   0.15 |   0.05   0.08   0.10   0.12   0.15
               2SLS    2.60   3.25   4.01   5.03   8.37 |   0.83   0.93   1.00   1.07   1.20 |   0.49   0.54   0.57   0.60   0.65
corr(z,u)=0.7  OLS     0.05   0.08   0.10   0.12   0.15 |   0.04   0.07   0.10   0.12   0.15 |   0.05   0.08   0.10   0.12   0.15
               2SLS    4.68   5.87   6.97   8.99  13.62 |   1.52   1.66   1.75   1.86   2.02 |   0.90   0.96   1.00   1.04   1.10
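The simulation design described in the notes to Table 5A can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code: it assumes that x, z and u are jointly normal with unit variances, that there is a single instrument (so 2SLS reduces to the simple IV ratio), and the function names `simulate` and `percentiles` are hypothetical. Because the data-generating process is an assumption, the output should not be expected to reproduce the tabulated values exactly.

```python
import numpy as np


def simulate(rho_xu, rho_zu, rho_xz, n, reps=1000, beta=0.0, seed=0):
    """Return arrays of OLS and IV slope estimates over `reps` samples of size n."""
    rng = np.random.default_rng(seed)
    # Assumed correlation matrix for (x, z, u); unit variances are an assumption.
    corr = np.array([[1.0, rho_xz, rho_xu],
                     [rho_xz, 1.0, rho_zu],
                     [rho_xu, rho_zu, 1.0]])
    draws = rng.multivariate_normal(np.zeros(3), corr, size=(reps, n))
    x, z, u = draws[..., 0], draws[..., 1], draws[..., 2]
    y = beta * x + u
    # Demean within each sample so that both regressions include an intercept.
    xd = x - x.mean(axis=1, keepdims=True)
    zd = z - z.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    ols = (xd * yd).sum(axis=1) / (xd * xd).sum(axis=1)
    # With one instrument and one endogenous regressor, 2SLS equals this ratio.
    iv = (zd * yd).sum(axis=1) / (zd * xd).sum(axis=1)
    return ols, iv


def percentiles(estimates):
    """5th, 25th, 50th, 75th and 95th percentiles, as reported in the tables."""
    return np.percentile(estimates, [5, 25, 50, 75, 95])


if __name__ == "__main__":
    ols, iv = simulate(rho_xu=0.1, rho_zu=0.4, rho_xz=0.4, n=500)
    print("OLS :", np.round(percentiles(ols), 2))
    print("2SLS:", np.round(percentiles(iv), 2))
```

Looping `simulate` over the grid of correlations and sample sizes gives one pair of percentile rows per cell of the table.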
Table 5B
Simulation results for OLS and IV estimators for corr(x,u) = 0.4
                               corr(x,z)=0.1            |           corr(x,z)=0.4            |           corr(x,z)=0.7
                         p5    p25    p50    p75    p95 |     p5    p25    p50    p75    p95 |     p5    p25    p50    p75    p95
n=100
corr(z,u)=0.0  OLS     0.23   0.31   0.37   0.42   0.50 |   0.22   0.31   0.37   0.42   0.51 |   0.23   0.32   0.37   0.42   0.50
               2SLS   -3.32  -0.49   0.09   0.66   3.06 |  -0.52  -0.17  -0.01   0.14   0.34 |  -0.23  -0.08   0.00   0.09   0.20
corr(z,u)=0.1  OLS     0.23   0.31   0.37   0.43   0.50 |   0.23   0.30   0.36   0.42   0.50 |   0.22   0.30   0.36   0.42   0.52
               2SLS   -2.40   0.17   0.69   1.39   4.73 |  -0.23   0.07   0.21   0.35   0.54 |  -0.09   0.04   0.13   0.21   0.33
corr(z,u)=0.4  OLS     0.23   0.31   0.37   0.43   0.52 |   0.22   0.31   0.37   0.43   0.52 |   0.22   0.31   0.37   0.42   0.49
               2SLS  -15.61   1.76   2.78   4.97  19.90 |   0.56   0.77   0.92   1.10   1.50 |   0.34   0.44   0.53   0.60   0.72
corr(z,u)=0.7  OLS     0.23   0.31   0.37   0.43   0.52 |   0.22   0.32   0.37   0.43   0.52 |   0.23   0.31   0.37   0.42   0.51
               2SLS  -25.31   2.88   4.50   8.19  26.57 |   1.13   1.36   1.58   1.89   2.60 |   0.71   0.82   0.91   1.01   1.19
n=200
corr(z,u)=0.0  OLS     0.26   0.32   0.36   0.40   0.47 |   0.27   0.33   0.37   0.41   0.47 |   0.26   0.32   0.36   0.40   0.46
               2SLS   -2.62  -0.40   0.10   0.48   2.68 |  -0.33  -0.12   0.00   0.10   0.26 |  -0.17  -0.07   0.00   0.06   0.14
corr(z,u)=0.1  OLS     0.27   0.33   0.37   0.41   0.46 |   0.27   0.33   0.37   0.41   0.47 |   0.27   0.32   0.36   0.40   0.46
               2SLS   -1.63   0.42   0.83   1.39   4.01 |  -0.05   0.13   0.23   0.33   0.47 |  -0.02   0.07   0.13   0.19   0.27
corr(z,u)=0.4  OLS     0.27   0.33   0.36   0.41   0.47 |   0.26   0.32   0.36   0.40   0.46 |   0.27   0.32   0.36   0.40   0.46
               2SLS  -11.27   2.22   3.26   5.11  16.51 |   0.66   0.80   0.90   1.05   1.28 |   0.38   0.46   0.52   0.58   0.66
corr(z,u)=0.7  OLS     0.27   0.33   0.36   0.40   0.46 |   0.27   0.32   0.37   0.41   0.47 |   0.28   0.33   0.37   0.40   0.47
               2SLS  -14.62   4.03   5.69   9.48  28.71 |   1.26   1.43   1.63   1.81   2.26 |   0.76   0.85   0.92   0.99   1.09
n=500
corr(z,u)=0.0  OLS     0.31   0.34   0.37   0.39   0.43 |   0.31   0.34   0.37   0.39   0.43 |   0.31   0.34   0.37   0.39   0.43
               2SLS   -1.31  -0.30   0.00   0.24   0.67 |  -0.20  -0.07   0.00   0.07   0.16 |  -0.10  -0.04   0.00   0.04   0.09
corr(z,u)=0.1  OLS     0.30   0.34   0.36   0.39   0.43 |   0.30   0.34   0.37   0.39   0.43 |   0.30   0.34   0.37   0.39   0.43
               2SLS    0.21   0.63   0.91   1.27   2.54 |   0.05   0.15   0.22   0.29   0.37 |   0.04   0.09   0.13   0.17   0.22
corr(z,u)=0.4  OLS     0.30   0.34   0.37   0.39   0.43 |   0.31   0.34   0.37   0.39   0.43 |   0.30   0.34   0.37   0.39   0.43
               2SLS    2.18   2.88   3.63   5.00  11.71 |   0.75   0.84   0.91   0.99   1.11 |   0.43   0.48   0.52   0.56   0.62
corr(z,u)=0.7  OLS     0.30   0.34   0.36   0.39   0.43 |   0.30   0.34   0.37   0.39   0.42 |   0.30   0.34   0.36   0.39   0.43
               2SLS    3.71   4.93   6.42   9.28  19.09 |   1.35   1.49   1.60   1.72   1.94 |   0.82   0.88   0.92   0.96   1.03
n=1000
corr(z,u)=0.0  OLS     0.32   0.35   0.37   0.39   0.41 |   0.32   0.35   0.37   0.38   0.41 |   0.32   0.35   0.37   0.38   0.41
               2SLS   -0.63  -0.21   0.01   0.18   0.45 |  -0.13  -0.05   0.00   0.05   0.12 |  -0.07  -0.03   0.00   0.03   0.06
corr(z,u)=0.1  OLS     0.32   0.35   0.37   0.38   0.41 |   0.32   0.35   0.37   0.38   0.41 |   0.32   0.35   0.37   0.38   0.41
               2SLS    0.49   0.74   0.93   1.16   1.78 |   0.12   0.19   0.23   0.27   0.35 |   0.06   0.10   0.13   0.16   0.20
corr(z,u)=0.4  OLS     0.32   0.35   0.37   0.39   0.41 |   0.33   0.35   0.37   0.39   0.41 |   0.32   0.35   0.37   0.38   0.41
               2SLS    2.47   3.03   3.71   4.54   7.03 |   0.81   0.87   0.92   0.98   1.06 |   0.46   0.50   0.53   0.55   0.59
corr(z,u)=0.7  OLS     0.32   0.35   0.37   0.38   0.41 |   0.32   0.35   0.37   0.38   0.41 |   0.32   0.35   0.37   0.38   0.41
               2SLS    4.31   5.29   6.23   7.82  12.71 |   1.42   1.53   1.61   1.69   1.82 |   0.84   0.89   0.92   0.95   0.99
Table 5C
Simulation results for OLS and IV estimators for corr(x,u) = 0.7
                               corr(x,z)=0.1            |           corr(x,z)=0.4            |           corr(x,z)=0.7
                         p5    p25    p50    p75    p95 |     p5    p25    p50    p75    p95 |     p5    p25    p50    p75    p95
n=100
corr(z,u)=0.0  OLS     0.42   0.46   0.50   0.53   0.58 |   0.41   0.46   0.50   0.53   0.59 |   0.41   0.46   0.50   0.53   0.58
               2SLS   -2.83  -0.27   0.20   0.55   2.62 |  -0.43  -0.14  -0.01   0.10   0.22 |  -0.22  -0.08   0.00   0.06   0.14
corr(z,u)=0.1  OLS     0.41   0.47   0.50   0.53   0.59 |   0.42   0.47   0.50   0.54   0.58 |   0.41   0.46   0.50   0.53   0.58
               2SLS   -1.34   0.28   0.61   1.00   2.52 |  -0.14   0.08   0.18   0.28   0.40 |  -0.09   0.02   0.10   0.16   0.24
corr(z,u)=0.4  OLS     0.41   0.46   0.50   0.54   0.59 |   0.41   0.46   0.50   0.53   0.59 |   0.42   0.47   0.50   0.53   0.58
               2SLS  -11.68   1.38   2.12   3.60  12.42 |   0.50   0.62   0.70   0.81   0.99 |   0.28   0.36   0.41   0.45   0.53
corr(z,u)=0.7  OLS     0.41   0.46   0.50   0.53   0.58 |   0.42   0.47   0.50   0.53   0.58 |   0.42   0.46   0.50   0.53   0.58
               2SLS  -20.45   2.30   3.65   6.53  26.89 |   0.97   1.10   1.24   1.41   1.79 |   0.59   0.66   0.71   0.77   0.88
n=200
corr(z,u)=0.0  OLS     0.44   0.47   0.50   0.52   0.56 |   0.44   0.48   0.50   0.52   0.56 |   0.44   0.48   0.50   0.53   0.56
               2SLS   -2.16  -0.34   0.07   0.35   1.66 |  -0.26  -0.09   0.01   0.09   0.18 |  -0.15  -0.06   0.00   0.04   0.11
corr(z,u)=0.1  OLS     0.44   0.48   0.50   0.52   0.56 |   0.44   0.47   0.50   0.52   0.56 |   0.44   0.48   0.50   0.52   0.56
               2SLS   -0.50   0.42   0.64   0.90   1.74 |  -0.03   0.12   0.18   0.25   0.34 |  -0.02   0.06   0.10   0.15   0.20
corr(z,u)=0.4  OLS     0.44   0.47   0.50   0.52   0.56 |   0.44   0.48   0.50   0.53   0.56 |   0.44   0.47   0.50   0.52   0.56
               2SLS   -5.57   1.92   2.66   3.89  14.65 |   0.56   0.66   0.72   0.79   0.90 |   0.32   0.37   0.41   0.44   0.49
corr(z,u)=0.7  OLS     0.44   0.47   0.50   0.53   0.56 |   0.44   0.48   0.50   0.52   0.56 |   0.44   0.48   0.50   0.52   0.56
               2SLS  -11.76   3.18   4.48   6.71  25.05 |   1.03   1.15   1.24   1.37   1.60 |   0.63   0.68   0.72   0.76   0.81
n=500
corr(z,u)=0.0  OLS     0.46   0.49   0.50   0.52   0.54 |   0.46   0.48   0.50   0.51   0.54 |   0.46   0.49   0.50   0.51   0.53
               2SLS   -1.27  -0.24   0.00   0.18   0.41 |  -0.16  -0.06   0.00   0.05   0.12 |  -0.07  -0.03   0.00   0.03   0.07
corr(z,u)=0.1  OLS     0.46   0.48   0.50   0.52   0.54 |   0.46   0.48   0.50   0.51   0.54 |   0.46   0.48   0.50   0.51   0.54
               2SLS    0.27   0.55   0.72   0.92   1.60 |   0.06   0.13   0.18   0.22   0.28 |   0.03   0.07   0.10   0.13   0.17
corr(z,u)=0.4  OLS     0.46   0.49   0.50   0.52   0.54 |   0.46   0.48   0.50   0.52   0.54 |   0.46   0.48   0.50   0.51   0.54
               2SLS    1.77   2.24   2.82   3.77   8.30 |   0.62   0.68   0.72   0.76   0.83 |   0.35   0.38   0.41   0.43   0.46
corr(z,u)=0.7  OLS     0.47   0.49   0.50   0.51   0.54 |   0.46   0.49   0.50   0.52   0.54 |   0.47   0.49   0.50   0.51   0.54
               2SLS    2.93   3.93   4.94   6.55  13.24 |   1.10   1.18   1.25   1.32   1.44 |   0.66   0.69   0.71   0.74   0.78
n=1000
corr(z,u)=0.0  OLS     0.47   0.49   0.50   0.51   0.53 |   0.47   0.49   0.50   0.51   0.53 |   0.47   0.49   0.50   0.51   0.53
               2SLS   -0.62  -0.17   0.01   0.15   0.29 |  -0.11  -0.04   0.00   0.04   0.09 |  -0.06  -0.02   0.00   0.02   0.05
corr(z,u)=0.1  OLS     0.47   0.49   0.50   0.51   0.53 |   0.47   0.49   0.50   0.51   0.53 |   0.47   0.49   0.50   0.51   0.53
               2SLS    0.44   0.59   0.71   0.85   1.12 |   0.09   0.15   0.18   0.21   0.25 |   0.05   0.08   0.10   0.12   0.15
corr(z,u)=0.4  OLS     0.47   0.49   0.50   0.51   0.53 |   0.47   0.49   0.50   0.51   0.52 |   0.47   0.49   0.50   0.51   0.53
               2SLS    2.03   2.46   2.90   3.61   5.50 |   0.64   0.68   0.71   0.74   0.79 |   0.37   0.39   0.41   0.42   0.45
corr(z,u)=0.7  OLS     0.47   0.49   0.50   0.51   0.53 |   0.47   0.49   0.50   0.51   0.53 |   0.48   0.49   0.50   0.51   0.52
               2SLS    3.45   4.24   5.11   6.15   9.51 |   1.14   1.20   1.25   1.30   1.38 |   0.68   0.70   0.72   0.73   0.76
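The large-sample medians in Tables 5A through 5C can be read against the standard probability limits for a simple regression y = βx + u with a single instrument z. The expressions below are textbook results stated here as a reading aid, not a derivation taken from the tables; σ_x and σ_u denote the standard deviations of x and u.

```latex
\begin{align*}
\operatorname{plim}\hat{\beta}_{OLS} - \beta
  &= \frac{\operatorname{cov}(x,u)}{\operatorname{var}(x)}
   = \operatorname{corr}(x,u)\,\frac{\sigma_u}{\sigma_x}, \\
\operatorname{plim}\hat{\beta}_{IV} - \beta
  &= \frac{\operatorname{cov}(z,u)}{\operatorname{cov}(z,x)}
   = \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)}\,\frac{\sigma_u}{\sigma_x}.
\end{align*}
% IV has the smaller asymptotic bias only if
% |corr(z,u)| < |corr(z,x)| * |corr(x,u)|.
```

For example, with corr(z,u) = 0.1 and corr(x,z) = 0.4 the ratio is 0.25. This exceeds corr(x,u) = 0.1, so OLS has the smaller bias in Table 5A, but it is below corr(x,u) = 0.7, so 2SLS has the smaller bias in Table 5C.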
Table 6
Descriptive statistics
This table shows the descriptive statistics of the disclosure
example. The sample is based on 9372 firm year observations from
the period 1982-2000.
Variable                       Mean      Std      Min      Max
Cost of capital               11.28     3.37     4.33    49.88
AccrualsQuality                0.00     1.00    -1.35     5.74
Log (# owners)                 2.11     1.57    -2.45     6.96
Sales growth                   0.11     0.19    -0.45     2.08
Capital intensity              0.44     0.24     0.03     0.92
Litigation risk                0.13     0.33        0        1
Operating margin               0.34     0.15     0.03     0.85
Operating cycle (in days)     123.0     64.1      9.7    497.4
Big-six auditor                0.95     0.23        0        1
Log market value               6.65     1.53     2.61    11.79
Book-to-Market                 0.61     0.32     0.04     2.32
# Analysts                     6.58     4.73     1.00    25.00
Leverage                       0.47     0.50     0.00     4.25
Return on assets               0.11     0.06    -0.15     0.34
Table 7
The effect of disclosure on cost of capital
This table shows results from the disclosure study. The first set of results is from a standard OLS regression of
cost of capital on accrual quality and control variables. The second set of results is from a 2SLS regression where
accrual quality is treated as endogenous. The third set of results is from an alternative second stage regression in
which all independent variables have been replaced by the product of the first stage regression coefficient and the
original variable. Unlike the standard 2SLS the coefficients on the instruments are not constrained to be the same.
The lower part of the table shows the partial R-squared from the first stage regression and the values for the two
specification tests: the test of over-identifying restrictions and the Hausman test.
OLS
AccrualsQuality
coef
t-stat
0.44
16.69
Instruments
Log (# owners)
Sales growth
Capital intensity
Litigation risk
Operating margin
Operating cycle (in days)
Big-six auditor
Control variables
Log market value
Book-to-Market
# Analysts
Leverage
Return on assets
Adjusted R-squared
-0.02
3.27
-0.05
0.63
-1.38
-0.89
28.80
-5.80
10.13
-2.85
2SLS
First stage
coef
t-stat
1.10
-0.02
0.42
-1.66
0.15
-0.24
0.00
0.06
-1.65
8.11
-32.1
4.96
-3.22
0.67
1.59
-0.14
-0.33
0.01
-0.07
-2.25
-11.7
-7.83
2.66
-2.88
-12.3
0.24
0.27
Partial R-squared
Over-identifying restrictions
Hausman test
Second stage
coef
t-stat
0.16
χ2 = 148.1 (p < 0.0001)
χ2 = 113.0 (p < 0.0001)
0.13
3.61
-0.05
0.86
-0.17
0.23
Unconstrained
Second stage
coef
t-stat
16.25
4.03
29.68
-6.02
12.72
-0.33
-1.43
1.29
0.74
4.08
10.68
23.33
8.97
-0.86
3.81
8.60
7.38
12.54
5.72
5.19
0.41
-9.00
-5.56
-11.00
0.44
1.77
-25.69
-5.38
-11.53
1.96
0.25
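The diagnostics named in the notes to Table 7 (the partial R-squared from the first stage, the test of over-identifying restrictions, and the Hausman test) can be computed along the following lines. This is a hedged sketch rather than the procedure used for the table: the function name `tsls_diagnostics` and its arguments are hypothetical, a single endogenous regressor is assumed, the over-identification test is written in the Sargan n times R-squared form, and the Hausman test is the regression-based (Durbin-Wu-Hausman) variant, which yields a t-statistic rather than the χ2 statistic reported in the table.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def r_squared(X, y):
    """Centered R-squared from regressing y on X (X should contain a constant)."""
    _, resid = ols(X, y)
    yd = y - y.mean()
    return 1.0 - (resid @ resid) / (yd @ yd)


def tsls_diagnostics(y, x_endog, Z, W):
    """2SLS with one endogenous regressor.

    y: outcome (n,); x_endog: endogenous regressor (n,);
    Z: excluded instruments (n, k); W: exogenous controls (n, m), no constant.
    """
    n = len(y)
    exog = np.hstack([np.ones((n, 1)), W])
    first_X = np.hstack([exog, Z])

    # First stage: regress x on controls and instruments.
    _, v = ols(first_X, x_endog)
    x_hat = x_endog - v

    # Partial R-squared: share of the variation in x left after the controls
    # that is explained by the excluded instruments.
    _, x_res = ols(exog, x_endog)
    partial_r2 = 1.0 - (v @ v) / (x_res @ x_res)

    # Second stage: replace x with its fitted value.
    beta, _ = ols(np.hstack([x_hat[:, None], exog]), y)
    # Structural residuals use the actual x, not the fitted value.
    u_hat = y - np.hstack([x_endog[:, None], exog]) @ beta

    # Sargan statistic: n * R2 of residuals on all exogenous variables,
    # chi-squared with (k - 1) degrees of freedom under the null.
    sargan = n * r_squared(first_X, u_hat)

    # Regression-based Hausman test: t-statistic on the first-stage residual
    # when it is added to the OLS regression.
    aug = np.hstack([x_endog[:, None], v[:, None], exog])
    b_aug, e = ols(aug, y)
    sigma2 = (e @ e) / (n - aug.shape[1])
    cov = sigma2 * np.linalg.inv(aug.T @ aug)
    hausman_t = b_aug[1] / np.sqrt(cov[1, 1])

    return {"beta_x": beta[0], "partial_r2": partial_r2, "sargan": sargan,
            "sargan_df": Z.shape[1] - 1, "hausman_t": hausman_t}
```

A large Sargan statistic relative to its χ2 reference distribution casts doubt on the validity of at least one instrument, while a large Hausman statistic indicates that OLS and 2SLS estimates differ by more than sampling error would suggest.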
Table 8
Descriptive statistics
This table shows the descriptive statistics of the governance example. The
sample is based on a single cross section of 1438 firms from the year
2002.
Variable                              Mean      Std      Min      Max
Log total pay                        14.85     1.08    11.37    18.58
Insider power                         0.00     1.00    -3.41     4.20
Log of CEO age                        3.98     0.14     3.47     4.48
Log of tenure                         2.08     0.88     0.00     3.85
Founder                               0.10     0.30        0        1
NYSE/AMEX                             0.59     0.49        0        1
Percentage held by activist           0.02     0.01     0.00     0.06
Percentage held by block holders      0.16     0.13     0.00     0.50
Big 4 auditor                         0.97     0.17        0        1
Fee ratio                             0.59     0.80     0.00    11.10
Log market value                      7.01     1.49     2.88    10.34
Book-to-market                        0.64     0.51    -1.10     2.73
Return on assets                      0.00     0.15    -0.77     0.21
Stock return                         -0.14     0.38    -0.91     1.26
Standard deviation ROA                0.07     0.10     0.00     0.58
Standard deviation returns            0.65     0.67     0.11     3.84
Table 9
The effect of insider power on CEO pay levels
This table shows the effect of insider power on CEO pay. The first set of results is from a standard OLS regression
of CEO pay on insider power and control variables. The second set of results is from a 2SLS regression where
insider power is treated as endogenous. The third set of results is from an alternative second stage regression in
which all independent variables have been replaced by the product of the first stage regression coefficient and the
original variable. Unlike the standard 2SLS the coefficients on the instruments are not constrained to be the same.
The lower part of the table shows the partial R-squared from the first stage regression and the values for the two
specification tests: the test of over-identifying restrictions and the Hausman test.
OLS
Insider power
2SLS
coef
t-stat
0.08
3.32
First stage
coef t-stat
-0.04
Instruments
Log of CEO age
Log of tenure
Founder
NYSE/AMEX
Percentage held by activist
Percentage held by block holders
Big 4 auditor
Fee ratio
Control variables
Log market value
Book-to-market
Return on assets
Stock return
Standard deviation ROA
Standard deviation returns
Adjusted R-squared
0.49
0.05
0.11
0.33
0.79
0.13
27.94
1.16
0.60
4.92
2.91
3.88
0.52
0.05
-0.17
0.26
0.41
-0.46
0.23
-0.09
3.08
1.88
-2.18
4.83
0.22
-2.58
1.70
-3.36
0.28
0.08
-0.21
0.12
-0.29
-0.07
14.08
1.73
-1.04
1.62
-0.97
-1.98
0.47
0.29
Partial R-squared
Over-identifying restrictions
Hausman test
Second stage
coef
t-stat
0.047
χ2 = 41.13 (p < 0.001)
χ2 = 1.25 (p = 0.99)
0.53
0.07
0.08
0.36
0.72
0.12
Unconstrained
Second stage
coef
t-stat
-0.38
13.23
1.43
0.45
4.96
2.55
3.08
0.46
0.45
-2.20
-1.15
0.26
5.58
-1.19
0.02
0.26
1.51
-4.48
-2.71
1.35
1.31
-3.34
0.04
0.93
1.82
0.46
-0.50
3.14
-2.55
-1.62
27.84
0.87
-0.56
5.49
-2.68
-3.47
0.48
Figure 1
Using legal origin as an instrument for disclosure quality
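[Diagram: legal origin is linked to disclosure by path (b), and disclosure to cost of capital by path (a).
Legal origin is also linked by path (c) to other institutional factors (e.g., property rights, investor
protection, corporate governance, reorganization laws), which are linked to cost of capital by path (d).]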
This figure graphically depicts the problems with instrumental variables estimation using legal origin as
an instrument. The relation we are ultimately interested in is the relation between disclosure and cost of
capital, (a). We can use legal origin as an instrument through relation (b). However, this only gives the
correct inference if either (c) or (d), or both, are equal to zero.