
J Bus Psychol (2010) 25:325–334
DOI 10.1007/s10869-010-9181-6
What Reviewers Should Expect from Authors Regarding
Common Method Bias in Organizational Research
James M. Conway • Charles E. Lance
Published online: 22 May 2010
© Springer Science+Business Media, LLC 2010

J. M. Conway (corresponding author), Department of Psychology, Central Connecticut State University, 1615 Stanley Street, New Britain, CT 06050, USA; e-mail: [email protected]
C. E. Lance, University of Georgia, Athens, GA, USA; e-mail: [email protected]
Abstract We believe that journal reviewers (as well as
editors and dissertation or thesis committee members) have
to some extent perpetuated misconceptions about common
method bias in self-report measures, including (a) that
relationships between self-reported variables are necessarily
and routinely upwardly biased, (b) other-reports (or other
methods) are superior to self-reports, and (c) rating sources
(e.g., self, other) constitute measurement methods. We argue
against these misconceptions and make recommendations
for what reviewers (and others) should reasonably expect
from authors regarding common method bias. We believe it
is reasonable to expect (a) an argument for why self-reports
are appropriate, (b) construct validity evidence, (c) lack of
overlap in items for different constructs, and (d) evidence
that authors took proactive design steps to mitigate threats of
method effects. We specifically do not recommend post hoc
statistical control strategies; while some statistical strategies
are promising, all have significant drawbacks and some have
shown poor empirical results.
Keywords: Common method bias · Method variance · Self-report measures · Reviewing
The focus of this article is on common method bias from a
journal reviewer’s perspective. According to Chan (2009)
and Spector (2006), concerns about common method bias
are often raised during the review process. While there is a substantial literature about controlling common method bias that targets researchers and authors (e.g., Cote and Buckley 1987; Crampton and Wagner 1994; Lance et al., in press; Podsakoff et al. 2003; Richardson et al. 2009; Spector 2006; Williams et al. 1989), there is very little that advises reviewers (one exception is Richardson et al. 2009). This gap matters because there are misconceptions about common method bias (as we explain later) that reviewers, dissertation committees, editors, and other "gatekeepers" play an important role in perpetuating (Lance and Vandenberg 2009). We intend to fill this gap in
the literature by describing misconceptions regarding
common method bias and what it is reasonable for
reviewers to expect from authors to minimize common
method bias. By extension, we hope to help authors be
better prepared to (a) design studies that minimize common
method bias and (b) present their research in a way that
proactively addresses concerns with method bias.
The literature on common method bias has dealt with a
variety of types of alleged measurement methods including
self-reports (e.g., self-ratings of job satisfaction and work
behaviors), rater effects (e.g., performance ratings by
multiple sources), and assessment center exercises. Our
main focus will be on the use of self-reports as a measurement method because:

(a) It is widely assumed that common method bias inflates relationships between variables measured by self-reports. For example, Podsakoff and Todor (1985) stated, "Invariably, when self-report measures obtained from the same sample are utilized in research, concern over same-source bias or general method variance arises" (p. 65). A second example comes from Organ and Ryan's (1995) meta-analysis of correlates of organizational citizenship behavior (OCB), in which they stated that "studies that use self-ratings of OCB along with self-reports of dispositional and attitudinal variables invite spuriously high correlations confounded by common method variance" (p. 779). As a final example, and as part of his commentary as outgoing editor of the Journal of Applied Psychology, Campbell (1982) wrote:

    With perhaps one or two exceptions there has been very little opportunity to exercise any professional biases (I think). One possible exception pertains to the use of a self-report questionnaire to measure all the variables in a study. If there is no evident construct validity for the questionnaire measure or no variables that are measured independently of the questionnaire, I am biased against the study and believe that it contributes very little. Many people share this bias. (p. 692)

(b) As a result of this apparently widely held belief, authors often report reviewer and editorial concerns about inflation in self-report relationships (Brannick et al., in press; Spector 2006). Given the tendency for reviewer criticisms to center on self-reported variables, this will be the main focus of our article. We will not emphasize alleged "method" effects due to rating sources (e.g., multisource performance ratings), assessment center exercises, or measurement occasions because, as we explain later, it is a misconception that these measurement facets should necessarily be thought of as simply different methods of measurement (Lance et al. 2009).
We believe that there are widely held misconceptions
about the effects of common method bias in self-report
variables and that reviewers sometimes base criticisms of
manuscripts on these misconceptions. We will therefore
include a section dealing with misconceptions. This is not
to say that common method bias is never a problem. There
are situations in which common method bias should be an
important concern, and we will follow the discussion of
misconceptions with a section in which we lay out the
kinds of things reviewers should look for and reasonably
expect from authors to minimize common method bias.
Misconceptions About Common Method Bias—What
Should Reviewers Not Assume
We suggest that there are three prevailing misconceptions
about common method bias that can impede the progress
of research: (a) that relationships between self-reported
variables are necessarily and routinely upwardly biased,
(b) that other-reports (or other methods) are superior to
self-reports, and (c) that rating sources (e.g., self, other)
constitute mere alternative measurement methods.
Misconception #1: Relationships Between
Self-Reported Variables are Routinely Upwardly
Biased
This is the most fundamental misconception. An example
of this belief in the published literature comes from Organ
and Ryan’s (1995) meta-analysis of correlates of OCB.
Organ and Ryan stated that "The most notable moderator of these correlations appears to be the use of self- versus other-rating of OCB; self-ratings are associated with higher correlations, suggesting spurious inflation due to common method variance" (p. 775). Spector (2006) provided indirect evidence against inflation, observing that correlations among self-report variables are at least sometimes near-zero and that different-method correlations are sometimes higher than same-method correlations. However, there is a degree of truth to this belief: common (or correlated) method effects can upwardly bias relationships. But the situation is complicated by several issues, one of which is the important distinction between method variance and common method bias. Method variance is systematic error variance due to the method of measurement; common method bias is inflation of relationships by shared method variance.
Under a simple extension of classical test theory (Lord and Novick 1968), the measurement equation for an observed score ($X_{ijk}$) represents the kth realization of the ith individual's true score on trait (construct) $T$, as measured by the jth measurement method ($M_j$), and non-systematic measurement error ($E_{ijk}$):

$$X_{ijk} = \lambda_{T_{ij}} T_i + \lambda_{M_{ij}} M_j + E_{ijk} \quad (1)$$

Under the traditional assumptions that $T_i$ and $M_j$ are uncorrelated with $E_{ijk}$ and each other (e.g., Widaman 1985), and that $X_{ijk}$, $T_i$, and $M_j$ have unit variances (i.e., that the $\lambda$s are standardized regression coefficients or factor loadings), the variance in observed scores, $\sigma^2_{X_{ijk}}$, reflects three independent components: individuals' true score variability, $\lambda^2_{T_{ij}}$ (this can also be interpreted as a squared reliability index), variance attributable to the method of measurement, $\lambda^2_{M_{ij}}$, and non-systematic error score variance, $\sigma^2_{E_{ijk}}$:

$$\sigma^2_{X_{ijk}} = \lambda^2_{T_{ij}} + \lambda^2_{M_{ij}} + \sigma^2_{E_{ijk}} \quad (2)$$
It is in this sense that method variance is interjected into observed scores, and the primary implication is compromised construct validity to the extent that method variance dominates over true score variance. That is, to the extent that $\lambda^2_{M_{ij}}$ dominates over $\lambda^2_{T_{ij}}$ in Eq. 2, $X$ reflects less of the intended construct than it does the method employed to measure it. But compromised construct validity does not necessarily lead to common method bias, or upward distortion in measured relationships. In fact, if two traits (i.e., $T_X$ and $T_Y$) are measured by two different, uncorrelated methods, the expected correlation will be attenuated as compared to the no-method-variance situation. This is because each measure will contain a source of variance (due to the measurement method) that is unshared with the other measure, decreasing the resulting correlation value (the appendix provides a mathematical description of why the attenuation occurs).
However, inflationary common method bias can occur if different traits are measured by the same method. In the case in which X and Y are measured by the same method M, the observed correlation can be represented as:

$$r_{XY} = \lambda_{XT_X} \lambda_{YT_Y} \rho_{T_X T_Y} + \lambda_{XM} \lambda_{YM} \quad (3)$$

where $\lambda_{XT_X}$ and $\lambda_{YT_Y}$ are X's and Y's reliability indexes, respectively, $\rho_{T_X T_Y}$ represents the X–Y true score correlation, and $\lambda_{XM}$ and $\lambda_{YM}$ represent the effect of the common method M on X and Y, respectively. Equation 3 shows that $r_{XY}$ is inflated by the additive term $\lambda_{XM}\lambda_{YM}$, the common causal effect of method M used to measure both X and Y, and this is the source of common method bias. Thus, common method bias causes observed score correlations to be inflated as compared to their true score counterparts. Note, however, that in Eq. 3 $r_{XY}$ is also simultaneously attenuated by unreliability in both X and Y (i.e., both $\lambda_{XT_X}$ and $\lambda_{YT_Y}$ are likely to be considerably less than 1.0) as compared to $\rho_{T_X T_Y}$. As such, and perhaps counterintuitively, correlations between variables measured using a common method are simultaneously attenuated due to unreliability and inflated due to common method bias. Thus, the net effect could be that observed correlations are (a) higher than $\rho_{T_X T_Y}$ (if inflation is greater than attenuation), (b) lower than $\rho_{T_X T_Y}$ (if attenuation is greater than inflation), or (c) equal to $\rho_{T_X T_Y}$ (if inflation and attenuation offset each other).
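To make these offsetting effects concrete, the following minimal sketch computes the observed same-method correlation implied by Eq. 3 for a few loading values; the trait and method loadings below are assumed for illustration only, not estimates from any study.

```python
# Illustrative sketch of Eq. 3: the observed same-method correlation is the sum of an
# attenuated true-score component and a common-method component.
# All numeric values below are assumed for illustration only.

def same_method_r(lam_xt, lam_yt, rho_t, lam_xm, lam_ym):
    """Observed correlation under Eq. 3: r_XY = lam_xt*lam_yt*rho_t + lam_xm*lam_ym."""
    return lam_xt * lam_yt * rho_t + lam_xm * lam_ym

rho_t = 0.40  # assumed true score correlation
for lam_t, lam_m in [(0.9, 0.2), (0.8, 0.4), (0.7, 0.5)]:
    r_obs = same_method_r(lam_t, lam_t, rho_t, lam_m, lam_m)
    print(f"trait loading={lam_t}, method loading={lam_m}: "
          f"observed r={r_obs:.3f} vs true rho={rho_t}")
# Depending on the loadings, the observed correlation can fall below, near, or above
# the true score correlation: attenuation and inflation offset to varying degrees.
```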
In a recent study, Lance et al. (in press) re-analyzed data
from 18 published multitrait-multimethod (MTMM)
matrices based on a variety of constructs and methods to
evaluate the possible offsetting effects of unreliability and
common method effects. Same-method correlations (rXY in
Eq. 3) were compared to trait factor correlations from
confirmatory factor analysis of the MTMM matrices; the
trait factor correlations can be regarded as unbiased estimates of true correlations, qTX TY in Eq. 3. Overall, they
found that the average same-method correlation was .342,
which was actually slightly lower than the mean trait factor
correlation of .371. These results suggest (a) that the attenuating effects of measurement error offset (in fact, were somewhat larger than) the inflationary effects of common method bias, (b) that same-method observed score correlations are actually quite accurate representations of their true-score counterparts, and (c) that the widespread belief that common method bias inflates same-method correlations relative to their true-score counterparts is substantially a myth.
Misconception #2: Other-Reports (or Other Methods)
are Superior to Self-Reports
A corollary to misconception #1 is that other-reports or
other methods (e.g., objective measures) are superior to
self-reports. This idea is illustrated by the earlier quotations
from Campbell (1982), Organ and Ryan (1995), and Podsakoff and Todor (1985). These quotations imply that
including a non-self-reported variable can improve the
quality of the study and/or reduce inflation of relationships.
More directly, Chan (2009) provided quotations from
reviewer comments indicating the assumption that non-self-reports are superior to self-reports (p. 325).
One reason self-reports might be considered inferior is method effects that decrease construct validity (e.g., the influence of response biases on ratings of job characteristics). Self-reports certainly can be subject to method effects, but literature reviews have indicated that suspected biasing factors, including social desirability (Ones et al. 1996) and negative affect and acquiescence (Spector 2006), do not have strong, consistent effects (e.g., Williams and Anderson 1994). In addition, other methods
are also subject to method effects (e.g., supervisor ratings
of job characteristics are subject to response biases as well,
as are peer ratings of job performance). In fact, if all
constructs are measured by the same measurement method
(e.g., supervisory ratings, peer ratings), the measurement
implications are identical to those discussed earlier in
connection with Eq. 3 and the case of self-reports.
Another reason for considering self-reports to be inferior
is the belief that if two variables are self-reported, the
relationship will be inflated (see misconception #1). It
might be argued that it is better to measure at least one of
the variables with an other-report. However, if a relationship is assessed between a self-reported variable (e.g., job
satisfaction) and an other-reported variable (e.g., supervisor report of job characteristics), the result could be an
attenuated relationship due to unshared method effects (see
the appendix and Conway 2002) and/or random error (see
Eq. 3 and Lance et al., in press), or even an inflated correlation. In the case in which X and Y are measured by different methods $M_j$ and $M_{j'}$, the observed correlation can be represented as:

$$r_{XY} = \lambda_{XT_X} \lambda_{YT_Y} \rho_{T_X T_Y} + \lambda_{XM_j} \lambda_{YM_{j'}} \rho_{M_j M_{j'}} \quad (4)$$

where $\lambda_{XM_j}$ and $\lambda_{YM_{j'}}$ represent the effects of the methods $M_j$ and $M_{j'}$ on X and Y, respectively, $\rho_{M_j M_{j'}}$ reflects the correlation between the different methods, and all other terms are defined as before. Note that, as in Eq. 3, attenuation occurs to the extent that measures are less than perfectly reliable (i.e., $\lambda_{XT_X}$ and $\lambda_{YT_Y}$ are less than 1.00), but that covariance distortion (usually inflation) still occurs to the extent that the methods are positively correlated (i.e., $\rho_{M_j M_{j'}} > 0$). If the different methods are uncorrelated, then the observed correlation is attenuated due to both unreliability and uncorrelated method variance, as shown in the appendix. In the event that the methods are negatively correlated, even further attenuation would occur because the product $\lambda_{XM_j}\lambda_{YM_{j'}}\rho_{M_j M_{j'}}$ in Eq. 4 would be negative, causing $r_{XY}$ to be even lower than $\rho_{T_X T_Y}$ attenuated by measurement error alone.
We noted earlier that Lance et al. (in press) found even same-method correlations to slightly underestimate true relationships, and this effect was found to be even greater for correlations between different methods: the mean different-method, different-trait correlation was only .248 when compared to the mean trait factor correlation (the estimate of $\rho_{T_X T_Y}$) of .371. In fact, using results from Lance et al.'s (in press) Table 2, 61% of the mean different-method correlation was due to attenuated true-score correlation (.151/.248), but 39% (.097/.248) was due to effects of measurement methods that were correlated $\rho_{M_j M_{j'}} = .525$, on average. Thus, rather than providing a more accurate estimate of true relationships among constructs, relationships estimated using different methods tend to be more attenuated and less accurate as compared to same-method correlations. This is true despite the fact that different-method correlations reflect some inflation due to effects of correlated methods. In fact, had the different methods been uncorrelated, the average different-method correlation would have been $r_{XY} = .151$, rather severely attenuated as compared to its true-score counterpart ($\rho_{T_X T_Y} = .371$).
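The arithmetic behind this decomposition can be sketched as follows, using only the averages reported above; the implied loading products are back-calculated here for illustration and are not figures reported by Lance et al.

```python
# Back-of-envelope decomposition of the mean different-method correlation reported by
# Lance et al. (in press), following Eq. 4:
# r_XY = (lam_xt * lam_yt) * rho_T + (lam_xm * lam_ym') * rho_M.
rho_T = 0.371        # mean trait factor (true score) correlation
rho_M = 0.525        # mean correlation between different methods
true_part = 0.151    # attenuated true-score component of the mean observed correlation
method_part = 0.097  # component due to correlated methods

reliability_product = true_part / rho_T        # implied lam_xt * lam_yt, about .41
method_loading_product = method_part / rho_M   # implied lam_xm * lam_ym', about .18

r_observed = true_part + method_part           # about .248, well below rho_T = .371
r_if_methods_uncorrelated = true_part          # about .151 if rho_M were zero

print(reliability_product, method_loading_product, r_observed, r_if_methods_uncorrelated)
```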
Misconception #3: Rating Sources (e.g., Self, Other)
Constitute Measurement Methods
Underlying the preference for other-reports over self-reports (see "Misconception #2") is an assumption that
different rating sources, for example, supervisor, peer,
subordinate, and self-ratings in multisource rating systems,
represent different methods for measuring ratee performance. For example, Mount et al. (1998) wrote that
"[s]tudies that have examined performance rating data using multitrait-multimethod matrices (MTMM) or multitrait-multirater (MTMR) matrices usually focus on the proportion of variance in performance ratings that is attributable to traits and that which is attributable to the methods or raters" (p. 559) and that these studies have "documented the ubiquitous phenomenon of method effects in performance ratings" (p. 568) (see also, Becker and Cote 1994; Doty and Glick 1998; Scullen et al. 2000; Scullen et al. 2003, for similar attributions). Combined with the belief that "method variance can bias results when researchers investigate relations among constructs with the common method… method variance produces a potential threat to the validity of empirical findings" (Bagozzi and Yi 1990, p. 547), the pervasive and strong rater (source)
effects that are routinely found in multisource ratings
(Mount et al. 1998; Scullen et al. 2000, 2003; Hoffman
et al. 2010) create the impression that these ratings are
seriously flawed, biased, and inaccurate measures of ratees’
actual performance. Note that despite misconception #2
that other-ratings are superior to self-ratings, beliefs about
rating sources as methods could lead to criticism of other-reports as well as self-ratings.
We argue that the mere existence of rater source effects does not undermine the construct validity of either self-reports or other-reports. In a series of studies, Hoffman,
Lance and colleagues (Hoffman and Woehr 2009; Lance
et al. 2006, 1992) conducted confirmatory factor analysis
(CFA) of multisource performance ratings (MSPRs). As is
often the case, they found support for both rating dimension and rater source factors and found that rater source
factors accounted for substantially more variance in ratings
than did dimension factors. Note that from a traditional
perspective, this pattern of evidence would be interpreted
as representing the presence of substantial rater (qua
method) bias. However, Hoffman, Lance, and colleagues also correlated rater source factors with additional job-related variables external to the "core" CFA, including personality, aptitude, experience, and objective performance measures, to test whether source factors represented mere performance-irrelevant rater (i.e., method) bias factors or whether, as an ecological perspective on ratings (see Lance et al. 2006) and MSPR practitioners (e.g., Tornow 1993) suggest, rater source factors represent alternative and complementary valid perspectives on ratee performance. That rater source factors evidenced significant relationships with other performance-related variables
supported the latter position. For example, Lance et al.
(1992) showed that a self-rating factor correlated with
mechanical aptitude and job experience, and did so more
strongly than did supervisor or peer ratings. These results
(a) support the construct validity of self-ratings, and (b)
more generally show that different rater sources, including
self-ratings, represent valid performance-related variance
and not mere measurement method bias (see reviews by
Lance et al. 2008, 2009).
What Reviewers Should Reasonably Expect
from Authors Regarding Common Method Bias
To a certain extent it is easier to say what we think
reviewers and other gatekeepers should not do (i.e., base
criticisms on the misconceptions described earlier) than it
is to prescribe proper reviewer behavior. It is not possible
to ensure that method effects do not influence results
(indeed, it is widely accepted that the act of measurement
can and does change the phenomenon being studied; Heisenberg 1927), but it is reasonable for reviewers to expect
authors to take certain steps to reduce the likelihood of
common method bias. We recommend four things
reviewers can look for, but we emphasize that it is not our
intent to define hurdles for authors that reviewers and other
gatekeepers should feel bound to enforce during the review
process. Overreliance on rules of thumb without careful
consideration of the origin and rationale for the rules and
their application is very problematic, and is instrumental in
creating and perpetuating what Vandenberg (2006) and
Lance and Vandenberg (2009) have referred to as "statistical and methodological myths and urban legends." With
this in mind, we believe it is reasonable for reviewers to
expect the following:
An Argument for Why Self-Reports are Appropriate
Self-reports are clearly appropriate for job satisfaction and
many other private events (Chan 2009; Skinner 1957), but
for other constructs such as job characteristics or job performance, other types of measures might be appropriate or
even superior. If reviewers have a specific concern about
the use of self-reports (e.g., a concern about social desirability in self-reports on sensitive topics), they should expect authors/researchers to be able to provide a solid rationale for their choice.
An excellent example of a rationale is provided by Shalley et al. (2009), who studied self-reported creative performance as predicted by growth need strength and work context variables. Their rationale for using self-reports of creativity included an argument for why self-reports are particularly appropriate, stating "it has been argued that employees are best suited to self-report creativity because they are the ones who are aware of the subtle things they do in their jobs that make them creative"
(p. 495). They also cited evidence from another study that
self-ratings of creativity correlated substantially (r = .62)
with supervisor ratings.
Another best-practice example is provided by Judge et al.
(2000), who studied job characteristics as a mediator of the
relationship between core self-evaluations and job satisfaction. In describing their theoretical model, Judge et al. explicitly focused on perceived job characteristics (i.e., as perceived by the incumbent), making self-reports the theoretically most relevant measurement method. First, they used the term "perceived job characteristics" throughout the
article. Second, their rationale was tied to perceptions of job
characteristics rather than objective job characteristics. For
example, they argued that those who dispositionally experience positive affect may respond more favorably to situations (i.e., perceive a given job more positively) than those
who tend to experience negative affect.
Construct Validity Evidence
One way to rule out substantial method effects is to demonstrate construct validity of the measures used. Authors
should be able to make an argument that the measures they
chose have construct validity, and it is reasonable for
reviewers to ask for it if it is not provided. On the other
hand, we do not mean to imply that authors should always
be required to use the most widely cited and most thoroughly researched measures as we believe that this could
retard creative research toward refinement of theoretical
constructs and their operationalizations. We also do not
want to prescribe a specific set of criteria that authors
should meet because (a) establishing construct validity
depends on a constellation of types of evidence (Putka and
Sackett 2010), (b) it is not reasonable for researchers to
provide extensive evidence in every case, especially when
the area of research is relatively new and innovative, and
(c) the availability and most appropriate type of evidence
may vary case by case. Rather, we list several types of
evidence likely to be useful for establishing construct
validity; whether the evidence in a given case is adequate is
a matter of judgment.
The kinds of evidence likely to be useful include: (a)
appropriate reliability evidence (Feldt and Brennan 1989),
(b) factor structure, (c) establishment of a nomological
network including relationships with theoretically relevant
variables (or lack of relationships where this is expected
theoretically), (d) mean differences between relevant
groups, (e) convergent and discriminant validity based on a
multitrait-multimethod matrix, or (f) other evidence
depending on the situation (Messick 1989). Oftentimes,
this type of evidence is available in sources that describe
the initial development and evaluation of the particular
measure (see, e.g., Dooley et al. 2007; Mallard and Lance
1998; Netemeyer et al. 1996). In these cases, reviewers
could reasonably expect authors to acknowledge the scale
development citation and perhaps provide supplementary
corroborative evidence of the scale’s internal consistency
and factor structure.
For cases in which extensive construct validity evidence has not been presented elsewhere, reviewers might
reasonably ask for additional evidence. Examples of
reasonable construct validity evidence include Shalley
et al. (2009) and MacKenzie et al. (1998). Shalley et al.
justified self-ratings of creativity by noting that self-ratings
had been shown in other research to correlate substantially
(r = .62) with supervisor ratings, in addition to providing
evidence of internal consistency reliability. MacKenzie
et al. (1998) used instruments with long track records such
as the Minnesota Satisfaction Questionnaire, and cited
multiple sources providing construct validity evidence.
Lack of Overlap in Items for Different Constructs
Conceptual overlap in items used to measure different
constructs can bias relationships (Brannick et al., in press).
Reviewers should watch for such overlap. Detecting it can be difficult because, understandably, authors usually do not provide all items for their measures. Still, we believe reviewers should stay alert to the possibility and can ask authors about potential overlap when the constructs are conceptually similar. Possible remedies include rewording or dropping items that do overlap.
One research area for which this issue has been investigated is the relationship between organizational commitment and turnover intentions. Bozeman and Perrewé
(2001) had expert judges rate items from the Organizational Commitment Questionnaire (OCQ; Mowday et al. 1979) and concluded that several OCQ items were retention-related (e.g., "It would take very little change in my present circumstances to cause me to leave this organization"). They used confirmatory factor analysis to show that
these items loaded significantly on a turnover-cognition
factor (along with a set of turnover cognition items), and
that the OCQ-turnover cognition correlation decreased
when the overlapping OCQ items were excluded.
While Bozeman and Perrewé’s (2001) explicit purpose
was to study item overlap, Chan (2001) acknowledged and
dealt with the issue in a substantive study of the organizational commitment (OC)–turnover intentions (QU) relationship. Chan stated that "OC was measured using a six-item scale adapted from Mowday et al. (1979). The six items made no reference to intent to quit (QU) to avoid artifactual inflation of OC–QU correlation due to item overlap" (p. 81).
We suspect there are other constructs for which items
are likely to overlap. One example involves measures of
contextual performance and the personality trait conscientiousness, two constructs that have been theoretically
linked (Motowidlo et al. 1997). Van Scotter and Motowidlo's (1996) contextual performance measure (specifically the job dedication subscale) included the item "pay close attention to important details," and an item from a conscientiousness measure in the International Personality Item Pool (2010) is "Pay attention to details." We want to
emphasize that ruling out conceptual overlap should not be
a new hurdle that all authors are expected to clear. Rather,
it makes sense to raise the issue if a reviewer has a specific
reason to believe overlap exists.
Evidence that Authors Proactively Considered
Common Method Bias
It is reasonable for reviewers to look for evidence that
authors considered common method bias in the design of
their study. This evidence can take many forms, and we do
not recommend that authors be required to use any particular one—merely that the judicious use of design techniques can demonstrate an a priori consideration of method
effects. Podsakoff et al. (2003) provided examples of
design techniques, including "temporal, proximal, psychological, or methodological separation of measurement," "protecting respondent anonymity and reducing evaluation apprehension," "counterbalancing question order," and "improving scale items" (pp. 887–888).
We believe any of these can demonstrate a priori consideration of the possibility of common method bias,
though it might be demonstrated in other ways. The
appropriate technique(s) will depend on the situation. One
good example of a priori consideration comes from Westaby and Braithwaite (2003), who studied reemployment
self-efficacy. Westaby and Braithwaite counterbalanced
sets of items, and clearly explained the procedure and
rationale. Specifically, their purposes were to "(a) mitigate order effects…, (b) reduce the negative effect of item order on theoretical testing…, and (c) reduce the potential for response sets" (p. 425). They used four different item
orders with random assignment of respondents to orders.
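As a minimal illustration of that kind of counterbalancing (the item blocks and the four orders below are hypothetical placeholders, not Westaby and Braithwaite's actual materials), respondents can be randomly assigned to pre-defined presentation orders:

```python
# Minimal sketch of counterbalancing item order with random assignment.
# The item blocks and the four orders are hypothetical placeholders.
import random

orders = [                      # four pre-defined presentation orders of item blocks
    ["A", "B", "C", "D"],
    ["B", "D", "A", "C"],
    ["C", "A", "D", "B"],
    ["D", "C", "B", "A"],
]

def assign_order(respondent_id: int, seed: int = 42) -> list:
    """Randomly assign a respondent to one of the counterbalanced orders."""
    rng = random.Random(seed + respondent_id)  # reproducible per respondent
    return rng.choice(orders)

for rid in range(5):
    print(rid, assign_order(rid))
```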
Another example might be measuring the constructs
believed to contribute to common method bias, and
incorporating them into a structural equation model, as
Williams and Anderson (1994) did. However, as we note
later, while this approach is theoretically sound it has not
been demonstrated to reduce common method bias.
A Note on Post Hoc Statistical Control of Common
Method Bias
One recommendation that we did not include in our list was
post hoc statistical control of common method bias. A
number of techniques have been proposed and thoroughly
reviewed by Podsakoff et al. (2003). One of these is Lindell
and Whitney’s (2001) partial correlation approach in which
the minimum correlation between some "marker variable" and a study's focal variables is subtracted from the correlations among the focal variables to adjust for common method bias. According to Richardson et al. (2009), this approach has been frequently adopted (48 times as of June 2008), but usually to demonstrate that method bias was not a problem in the first place.
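For concreteness, a minimal sketch of this kind of marker-variable adjustment is shown below. It assumes the common form of the correction in which a marker correlation r_M is partialled from an observed correlation r, i.e., r_adjusted = (r − r_M) / (1 − r_M); the choice of r_M as the smallest marker–focal correlation is an assumption here, and readers should consult Lindell and Whitney (2001) for the recommended procedure and inference. We note again that we do not recommend relying on this correction.

```python
# Sketch of a marker-variable partial correlation adjustment in the spirit of
# Lindell and Whitney (2001). r_m is taken here as the smallest absolute correlation
# between the marker variable and the focal variables (an assumption; see the
# original article for the recommended choice and significance test).

def marker_adjusted_r(r_xy: float, r_m: float) -> float:
    """Adjust an observed correlation for an assumed common method component r_m."""
    return (r_xy - r_m) / (1.0 - r_m)

marker_with_focal = [0.12, 0.08, 0.15]       # hypothetical marker-focal correlations
r_m = min(abs(r) for r in marker_with_focal)
print(marker_adjusted_r(0.40, r_m))          # e.g., .40 adjusted downward to about .35
```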
A second approach involves loading all study measures
onto an unmeasured "method" factor and then analyzing
relationships among the residualized variables. According
to Richardson et al. (2009), at least 49 studies through June
2008 have used the unmeasured method factor approach.
We see this approach as logically indefensible as it may
easily remove trait variance when multiple traits have a
common cause (e.g., generalized psychological climate;
James and James 1989).
Both the partial correlation and the unmeasured method
techniques were evaluated in a simulation by Richardson
et al. (2009) and both produced disappointing results. In
fact, they produced less accurate estimates of correlations
than did no correction at all when (a) method variance was
present and (b) true correlations were greater than zero—
conditions that are typical of organizational research. We
therefore discourage reviewers from suggesting that
authors use either of these commonly used remedies for
common method bias.
A third approach is Williams and Anderson’s (1994)
technique in which all focal latent variables’ manifest
indicators are loaded onto one or more substantive method
latent variables (e.g., acquiescence, social desirability,
positive/negative affectivity) on which also load the
method factor’s manifest indicators. One limitation of this
technique is that it presumes that a researcher knows the
source of method variance. The approach appears to have
been seldom applied and to have little effect on estimates
among the focal constructs (Richardson et al. 2009).
Two other approaches appear promising, though each
has important limitations. Use of a multitrait-multimethod
(MTMM) matrix to estimate relationships can be very
effective for controlling common method bias (if methods
are chosen carefully; see our "Misconception #3"). But the substantial demands of carrying out an MTMM study make
it difficult or impossible in many research situations.
Although there are thousands of citations to Campbell and
Fiske’s (1959) original article on the MTMM methodology
and hundreds of applications of the MTMM matrix (Lance
et al., in press) we know of only one application in which
researchers tested directional relationships (i.e., a structural
model) among self-reported constructs while controlling
for method effects in the MTMM matrix (Arora 1982).
A second promising approach is Lance et al.’s (in press)
simultaneous correction for common method bias and
attenuation. The correction for same-method correlations is:
$$\hat{\rho}_{T_X T_Y} = \frac{r_{XY} - \lambda_{XM}\lambda_{YM}}{\lambda_{XT_X}\lambda_{YT_Y}} \quad (5)$$

and for different-method correlations is:

$$\hat{\rho}_{T_X T_Y} = \frac{r_{XY} - \lambda_{XM_j}\lambda_{YM_{j'}}\rho_{M_j M_{j'}}}{\lambda_{XT_X}\lambda_{YT_Y}} \quad (6)$$

where all terms are defined as in Eqs. 3 and 4 above. As such, these corrections simultaneously adjust observed correlations ($r_{XY}$) for attenuation due to unreliability ($\lambda_{XT_X}\lambda_{YT_Y}$ in the denominator) and for inflation due to method effects ($\lambda_{XM}\lambda_{YM}$ or $\lambda_{XM_j}\lambda_{YM_{j'}}\rho_{M_j M_{j'}}$ in the numerator). Lance et al. (in press) also present a multivariate extension of these corrections and discuss approaches to obtaining estimates of the quantities needed to effect the corrections. While this approach appears to be psychometrically sound, there has been no empirical work at all evaluating its effectiveness.
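The corrections in Eqs. 5 and 6 are simple to compute once the loadings are available; a minimal sketch follows, with assumed, illustrative values for the loadings and the method correlation (in practice these quantities must be estimated, e.g., from a measurement model).

```python
# Sketch of Lance et al.'s (in press) simultaneous corrections (Eqs. 5 and 6).
# The loading and correlation values passed in below are assumed for illustration.

def correct_same_method(r_xy, lam_xt, lam_yt, lam_xm, lam_ym):
    """Eq. 5: remove common-method inflation, then disattenuate for unreliability."""
    return (r_xy - lam_xm * lam_ym) / (lam_xt * lam_yt)

def correct_different_method(r_xy, lam_xt, lam_yt, lam_xmj, lam_ymj2, rho_m):
    """Eq. 6: same idea, with the method term weighted by the method correlation."""
    return (r_xy - lam_xmj * lam_ymj2 * rho_m) / (lam_xt * lam_yt)

# With these assumed values, an observed same-method r of .42 corrects to about .41,
# and an observed different-method r of .30 corrects to about .34.
print(correct_same_method(0.42, 0.8, 0.8, 0.4, 0.4))            # (.42 - .16) / .64
print(correct_different_method(0.30, 0.8, 0.8, 0.4, 0.4, 0.5))  # (.30 - .08) / .64
```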
Thus, the state of the practice in post hoc statistical control of common method bias is unfortunate: some well-known and easily applied approaches appear to be ineffective; use of the MTMM matrix is routinely resource intensive, even if multiple measurement methods can be identified for focal constructs; and the efficacy of Lance et al.'s (in press) statistical control procedure is as yet untested. Consequently, we cannot recommend any of these approaches.
A Counter-Argument
Two readers of an earlier version of our manuscript independently suggested that, while our argument about attenuation in relationships due to use of different methods may
be valid, use of different methods is still preferable to the use
of the same method for different constructs. The argument is that different- versus same-method studies may lead to different types of errors: using different methods will tend to underestimate relationships, while using the same method may tend to overestimate relationships (however, see our "Misconception #1"). Thus, according to this argument, it is more impressive to find a significant relationship with different methods because it is harder to do so. In this regard, one reader made an analogy to Type-I errors being more often associated with same-method correlations (same-method correlations may overestimate relationships) versus Type-II errors being more often associated with different-method correlations (different-method correlations may underestimate relationships).
We acknowledge this analogy and we agree that in
general it is more difficult to detect a significant relationship between constructs measured by different methods
(i.e., higher probability of a Type-II error). However, we do
not agree that this is a virtue; problems with protecting
against Type-I error at the expense of a greater Type-II error rate have been discussed, for example, by Hunter (1997) and Cohen (1994). It is desirable to have high power in research (i.e., to have a low Type-II error rate); attenuated relationships reduce power by increasing the likelihood of Type-II errors; and Lance et al.'s (in press) review
indicates that different-method correlations are routinely
attenuated as compared to their true-score counterparts.
We do acknowledge the potential in some cases for
inflation due to common method bias (though Lance et al.’s
[in press] review indicates that same-method correlations
are not routinely upwardly biased), and we are certainly not
opposed to the use of different methods to estimate relationships. But, we believe that if reviewers wish to criticize a
study for using a same-method design, there should be a
rationale for why the use of the same method would be
problematic. Reviewers (as well as authors) should take a
thoughtful approach to the issue of measurement methods;
each research situation’s characteristics (e.g., potential for
social desirability, availability of valid measures, etc.)
should be considered and a determination made based on
those characteristics, rather than on a general notion that same-method correlations are necessarily problematic.

Summary

We believe that journal reviewers and editors, and other gatekeepers, such as dissertation or thesis committee members, have to some extent perpetuated misconceptions about common method bias in self-report measures, for example, that relationships between self-reported variables are necessarily and routinely upwardly biased. We argued against these misconceptions and recommended what gatekeepers should reasonably expect regarding common method bias. Reasonable expectations include (a) an argument for why self-reports are appropriate, (b) a case for the construct validity of the measures, (c) lack of overlap in items for different constructs, and (d) proactive measures on the part of authors to minimize threats of common method bias. No post hoc statistical correction procedure can be recommended until additional research evaluates the relative effectiveness of those that have been proposed.

Appendix: Attenuation of Correlations by Unshared Method Variance

Method variance is usually assumed to inflate correlations, but in a fairly common research situation, method variance will actually attenuate correlations as compared to the no-method-variance situation. First consider a situation with no method variance in measures of two traits ($T_X$ and $T_Y$) which are measured by two different, uncorrelated methods ($M_j$ and $M_{j'}$). The equation for the correlation coefficient, expressed in terms of variances and covariance, when no method variance is present, is:

$$r_{XY} = \frac{\mathrm{COV}(T_X, T_Y)}{\sqrt{\mathrm{VAR}(T_X) + \mathrm{VAR}(E_X)}\,\sqrt{\mathrm{VAR}(T_Y) + \mathrm{VAR}(E_Y)}} \quad (A1)$$

Now consider what happens when each measure contains method variance that is unshared with the other measure. The variance terms for each measure's method effect, $\mathrm{VAR}(M_j)$ and $\mathrm{VAR}(M_{j'})$, are added to the denominator:

$$r_{XY} = \frac{\mathrm{COV}(T_X, T_Y)}{\sqrt{\mathrm{VAR}(T_X) + \mathrm{VAR}(M_j) + \mathrm{VAR}(E_X)}\,\sqrt{\mathrm{VAR}(T_Y) + \mathrm{VAR}(M_{j'}) + \mathrm{VAR}(E_Y)}} \quad (A2)$$

Because additional method variance appears in the denominator when there are method effects (Eq. A2) but not when there are no method effects (Eq. A1), the Eq. A2 denominator will be larger while the numerators are the same. The correlation value in Eq. A2 will therefore be attenuated.
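A quick numeric illustration of Eqs. A1 and A2, with variance components assumed purely for illustration:

```python
# Numeric illustration of Eqs. A1 and A2: unshared method variance attenuates r.
# All variance/covariance values below are assumed for illustration.
from math import sqrt

cov_t = 0.40               # COV(T_X, T_Y)
var_t, var_e = 1.00, 0.50  # trait and error variance for each measure
var_m = 0.60               # unshared method variance added to each measure

r_no_method = cov_t / (sqrt(var_t + var_e) * sqrt(var_t + var_e))                     # Eq. A1
r_with_method = cov_t / (sqrt(var_t + var_m + var_e) * sqrt(var_t + var_m + var_e))   # Eq. A2

print(round(r_no_method, 3), round(r_with_method, 3))  # ~0.267 vs ~0.190: attenuation
```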
References

Arora, R. (1982). Validation of an S-O-R model for situation, enduring and response components of involvement. Journal of Marketing Research, 19, 505–516.
Bagozzi, R. P., & Yi, Y. (1990). Assessing method variance in
multitrait-multimethod matrices: The case of self-reported affect
and perceptions at work. Journal of Applied Psychology, 75,
547–560.
Becker, T. E., & Cote, J. A. (1994). Additive and multiplicative
method effects in applied psychological research: An empirical
assessment of three models. Journal of Management, 20, 625–
641.
Bozeman, D., & Perrewé, P. (2001). The effect of item content
overlap on Organizational Commitment Questionnaire–turnover
cognitions relationships. Journal of Applied Psychology, 86,
161–173.
Brannick, M. T., Chan, D., Conway, J. M., Lance, C. E., & Spector, P. E. (in press). What is method variance and how can we cope with it? A panel discussion. Organizational Research Methods.
Campbell, J. (1982). Editorial: Some remarks from the outgoing
editor. Journal of Applied Psychology, 67, 691–700.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant
validation by the multitrait-multimethod matrix. Psychological
Bulletin, 56, 81–106.
Chan, D. (2001). Method effects of positive affectivity, negative
affectivity, and impression management in self-reports of work
attitudes. Human Performance, 14, 77–96.
Chan, D. (2009). So why ask me? Are self-report data really that bad?
In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and
methodological myths and urban legends: Doctrine, verity and
fable in the organizational and social sciences (pp. 311–338).
New York: Routledge.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Conway, J. M. (2002). Method variance and method bias in I/O
psychology. In S. G. Rogelberg (Ed.), Handbook of research
methods in industrial and organizational psychology (pp. 344–
365). Oxford: Blackwell Publishers.
Cote, J. A., & Buckley, M. R. (1987). Estimating trait, method and
error variance: Generalizing across 70 construct validation
studies. Journal of Marketing Research, 26, 315–318.
Crampton, S., & Wagner, J. (1994). Percept-percept inflation in
microorganizational research: An investigation of prevalence
and effect. Journal of Applied Psychology, 79, 67–76.
Dooley, W. K., Shaffer, D. R., Lance, C. E., & Williamson, G. M.
(2007). Informal care can be better than adequate: Development
and evaluation of the exemplary care scale. Rehabilitation
Psychology, 52, 359–369.
Doty, D. H., & Glick, W. H. (1998). Common method bias: Does
common methods variance really bias results? Organizational
Research Methods, 1, 374–406.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.),
Educational measurement (3rd ed., pp. 105–146). Washington,
DC: American Council on Education and National Council on
Measurement in Education.
Heisenberg, W. (1927). Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. [About the graphic
content of quantum theoretic kinematics and mechanics].
Zeitschrift für Physik, 43, 172–198.
Hoffman, B. J., Lance, C. E., Bynum, B. H., & Gentry, W. A. (2010).
Rater source effects are alive and well after all. Personnel
Psychology, 63, 119–151.
Hoffman, B. J., & Woehr, D. J. (2009). Disentangling the meaning of
multisource performance rating source and dimension factors.
Personnel Psychology, 62, 735–765.
Hunter, J. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3–7.
International Personality Item Pool. (2010). International Personality
Item Pool: A scientific collaboratory for the development of
advanced measures of personality traits and other individual
differences (http://ipip.ori.org/). http://ipip.ori.org/newNEOKey.htm#Conscientiousness. Retrieved February 5, 2010.
James, L. A., & James, L. R. (1989). Integrating work environment
perceptions: Explorations into the measurement of meaning.
Journal of Applied Psychology, 74, 739–751.
Judge, T., Bono, J., & Locke, E. (2000). Personality and job
satisfaction: The mediating role of job characteristics. Journal of
Applied Psychology, 85, 237–249.
Lance, C. E., Baranik, L. E., Lau, A. R., & Scharlau, E. A. (2009). If
it ain’t trait it must be method: (Mis)application of the multitraitmultimethod design in organizational research. In C. E. Lance &
R. J. Vandenberg (Eds.), Statistical and methodological myths
and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 339–362). New York: Routledge.
Lance, C. E., Baxter, D., & Mahan, R. P. (2006). Multi-source
performance measurement: A reconceptualization. In W. Bennett, C. E. Lance, & D. J. Woehr (Eds.), Performance
measurement: Current perspectives and future challenges (pp.
49–76). Mahwah, NJ: Erlbaum.
Lance, C. E., Dawson, B., Birklebach, D., & Hoffman, B. J. (in press).
Method effects, measurement error, and substantive conclusions.
Organizational Research Methods.
Lance, C. E., Hoffman, B. J., Baranik, L. E., & Gentry, W. A. (2008).
Rater source factors represent important subcomponents of the
criterion construct space, not rater bias. Human Resource
Management Review, 18, 223–232.
Lance, C. E., Teachout, M. S., & Donnelly, T. M. (1992).
Specification of the criterion construct space: An application of
hierarchical confirmatory factor analysis. Journal of Applied
Psychology, 77, 437–452.
Lance, C. E., & Vandenberg, R. J. (2009). Introduction. In C. E.
Lance & R. J. Vandenberg (Eds.), Statistical and methodological
myths and urban legends: Doctrine, verity and fable in the
organizational and social sciences (pp. 1–4). New York:
Routledge.
Lindell, M. K., & Whitney, D. J. (2001). Accounting for common
method variance in cross-sectional research designs. Journal of
Applied Psychology, 86, 114–121.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental
test scores. Reading, MA: Addison-Wesley Publishing
Company.
MacKenzie, S. B., Podsakoff, P. M., & Ahearne, M. (1998). Some
possible antecedents and consequences of in-role and extra-role
salesperson performance. Journal of Marketing, 62, 87–98.
Mallard, A. G. C., & Lance, C. E. (1998). Development and
evaluation of a parent–employee interrole conflict scale. Social
Indicators Research, 45, 343–370.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational
measurement (pp. 13–103). Washington, DC: American Council
on Education and National Council on Measurement in
Education.
Motowidlo, S., Borman, W., & Schmit, M. (1997). A theory of
individual differences in task and contextual performance.
Human Performance, 10, 71–83.
Mount, M. K., Judge, T. A., Scullen, S. E., Sytsma, M. R., & Hezlett,
S. A. (1998). Trait, rater, and level effects in 360-degree
performance ratings. Personnel Psychology, 51, 557–576.
Mowday, R. T., Steers, R. M., & Porter, L. W. (1979). The
measurement of organizational commitment. Journal of Vocational Behavior, 14, 224–247.
Netemeyer, R. G., Boles, J. S., & McMurrian, R. (1996).
Development and validation of work-family conflict and
family-work conflict scales. Journal of Applied Psychology,
81, 400–410.
Ones, D., Viswesvaran, C., & Reiss, A. (1996). Role of social
desirability in personality testing for personnel selection: The red
herring. Journal of Applied Psychology, 81, 660–679.
Organ, D., & Ryan, K. (1995). A meta-analytic review of attitudinal
and dispositional predictors of organizational citizenship behavior. Personnel Psychology, 48, 775–802.
Podsakoff, P., MacKenzie, S., Lee, J., & Podsakoff, N. (2003).
Common method biases in behavioral research: A critical review
of the literature and recommended remedies. Journal of Applied
Psychology, 88, 879–903.
Podsakoff, P., & Todor, W. (1985). Relationships between leader
reward and punishment behavior and group processes and
productivity. Journal of Management, 11, 55–73.
Putka, D. M., & Sackett, P. R. (2010). Reliability and validity. In J. L.
Farr & N. T. Tippins (Eds.), Handbook of employee selection
(pp. 9–49). New York: Routledge.
Richardson, H., Simmering, M., & Sturman, M. (2009). A tale of
three perspectives: Examining post hoc statistical techniques for
detection and correction of common method variance. Organizational Research Methods, 12, 762–800.
Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the
latent structure of job performance ratings. Journal of Applied
Psychology, 85, 956–970.
Scullen, S. E., Mount, M. K., & Judge, T. A. (2003). Evidence of the
construct validity of developmental ratings of managerial
performance. Journal of Applied Psychology, 88, 50–66.
Shalley, C., Gilson, L., & Blum, T. (2009). Interactive effects of growth need strength, work context, and job complexity on self-reported creative performance. Academy of Management Journal, 52, 489–505.
Skinner, B. F. (1957). Verbal behavior. New York: Appleton-Century-Crofts.
Spector, P. (2006). Method variance in organizational research: Truth
or urban legend? Organizational Research Methods, 9, 221–232.
Tornow, W. W. (1993). Perceptions or reality: Is multi-perspective
measurement a means or an end? Human Resource Management,
32, 221–229.
Van Scotter, J. R., & Motowidlo, S. J. (1996). Interpersonal
facilitation and job dedication as separate facets of contextual
performance. Journal of Applied Psychology, 81, 525–531.
Vandenberg, R. J. (2006). Statistical and methodological myths and
urban legends: Where, pray tell, did they get this idea?
Organizational Research Methods, 9, 194–201.
Westaby, J., & Braithwaite, K. (2003). Specific factors underlying
reemployment self-efficacy: Comparing control belief and
motivational reason methods for the recently unemployed.
Journal of Applied Behavioral Science, 39, 415–437.
Widaman, K. (1985). Hierarchically nested covariance structure
models for multitrait-multimethod data. Applied Psychological
Measurement, 9, 1–26.
Williams, L. J., & Anderson, S. E. (1994). An alternative approach to
method effects by using latent-variable models: Applications in
organizational behavior research. Journal of Applied Psychology, 79, 323–331.
Williams, L. J., Cote, J. A., & Buckley, M. R. (1989). Lack of method
variance in self-reported affect and perceptions at work: Reality
or artifact? Journal of Applied Psychology, 74, 462–468.