J Bus Psychol (2010) 25:325–334
DOI 10.1007/s10869-010-9181-6

What Reviewers Should Expect from Authors Regarding Common Method Bias in Organizational Research

James M. Conway • Charles E. Lance

Published online: 22 May 2010
© Springer Science+Business Media, LLC 2010

J. M. Conway, Department of Psychology, Central Connecticut State University, 1615 Stanley Street, New Britain, CT 06050, USA; e-mail: [email protected]
C. E. Lance, University of Georgia, Athens, GA, USA; e-mail: [email protected]

Abstract: We believe that journal reviewers (as well as editors and dissertation or thesis committee members) have to some extent perpetuated misconceptions about common method bias in self-report measures, including (a) that relationships between self-reported variables are necessarily and routinely upwardly biased, (b) that other-reports (or other methods) are superior to self-reports, and (c) that rating sources (e.g., self, other) constitute measurement methods. We argue against these misconceptions and make recommendations for what reviewers (and others) should reasonably expect from authors regarding common method bias. We believe it is reasonable to expect (a) an argument for why self-reports are appropriate, (b) construct validity evidence, (c) lack of overlap in items for different constructs, and (d) evidence that authors took proactive design steps to mitigate threats of method effects. We specifically do not recommend post hoc statistical control strategies; while some statistical strategies are promising, all have significant drawbacks and some have shown poor empirical results.

Keywords: Common method bias • Method variance • Self-report measures • Reviewing

The focus of this article is on common method bias from a journal reviewer's perspective. According to Chan (2009) and Spector (2006), concerns about common method bias are often raised during the review process. While there is a substantial literature about controlling common method bias that targets researchers and authors (e.g., Cote and Buckley 1987; Crampton and Wagner 1994; Lance et al., in press; Podsakoff et al. 2003; Richardson et al. 2009; Spector 2006; Williams et al. 1989), there is very little that advises reviewers (one exception is Richardson et al. 2009). This is important because there are misconceptions about common method bias (as we explain later) that reviewers, dissertation committees, editors, and other "gatekeepers" play an important role in perpetuating (Lance and Vandenberg 2009). We intend to fill this gap in the literature by describing misconceptions regarding common method bias and what it is reasonable for reviewers to expect from authors to minimize common method bias. By extension, we hope to help authors be better prepared to (a) design studies that minimize common method bias and (b) present their research in a way that proactively addresses concerns with method bias.

The literature on common method bias has dealt with a variety of types of alleged measurement methods, including self-reports (e.g., self-ratings of job satisfaction and work behaviors), rater effects (e.g., performance ratings by multiple sources), and assessment center exercises. Our main focus will be on the use of self-reports as a measurement method because:

(a) It is widely assumed that common method bias inflates relationships between variables measured by self-reports.
For example, Podsakoff and Todor (1985) stated "Invariably, when self-report measures obtained from the same sample are utilized in research, concern over same-source bias or general method variance arises" (p. 65). A second example comes from Organ and Ryan's (1995) meta-analysis of correlates of organizational citizenship behavior (OCB), in which they stated that "studies that use self-ratings of OCB along with self-reports of dispositional and attitudinal variables invite spuriously high correlations confounded by common method variance" (p. 779). As a final example, and as part of his commentary as outgoing editor of the Journal of Applied Psychology, Campbell (1982) wrote:

With perhaps one or two exceptions there has been very little opportunity to exercise any professional biases (I think). One possible exception pertains to the use of a self-report questionnaire to measure all the variables in a study. If there is no evident construct validity for the questionnaire measure or no variables that are measured independently of the questionnaire, I am biased against the study and believe that it contributes very little. Many people share this bias. (p. 692)

(b) As a result of this apparently widely held belief, authors often report reviewer and editorial concerns about inflation in self-report relationships (Brannick et al., in press; Spector 2006).

Given the tendency for reviewer criticisms to center on self-reported variables, this will be the main focus of our article. We will not emphasize alleged "method" effects due to rating sources (e.g., multisource performance ratings), assessment center exercises, or measurement occasions because, as we explain later, it is a misconception that these measurement facets should necessarily be thought of as simply different methods of measurement (Lance et al. 2009).

We believe that there are widely held misconceptions about the effects of common method bias in self-report variables and that reviewers sometimes base criticisms of manuscripts on these misconceptions. We will therefore include a section dealing with misconceptions. This is not to say that common method bias is never a problem. There are situations in which common method bias should be an important concern, and we will follow the discussion of misconceptions with a section in which we lay out the kinds of things reviewers should look for and reasonably expect from authors to minimize common method bias.

Misconceptions About Common Method Bias—What Should Reviewers Not Assume

We suggest that there are three prevailing misconceptions about common method bias that can impede the progress of research: (a) that relationships between self-reported variables are necessarily and routinely upwardly biased, (b) that other-reports (or other methods) are superior to self-reports, and (c) that rating sources (e.g., self, other) constitute mere alternative measurement methods.

Misconception #1: Relationships Between Self-Reported Variables are Routinely Upwardly Biased

This is the most fundamental misconception. An example of this belief in the published literature comes from Organ and Ryan's (1995) meta-analysis of correlates of OCB. Organ and Ryan stated that "The most notable moderator of these correlations appears to be the use of self- versus other-rating of OCB; self-ratings are associated with higher correlations, suggesting spurious inflation due to common method variance" (p. 775).
Spector (2006) provided indirect evidence against inflation, observing that correlations among self-report variables are at least sometimes near zero, and that sometimes different-method correlations are higher than same-method correlations. However, there is a degree of truth to this belief—common (or correlated) method effects can upwardly bias relationships—but the situation is complicated by several issues. One is the important distinction between method variance and common method bias: method variance is systematic error variance due to the method of measurement, whereas common method bias is inflation of relationships by shared method variance.

Under a simple extension of classical test theory (Lord and Novick 1968), the measurement equation for an observed score \(X_{ijk}\) represents the kth realization of the ith individual's true score on trait (construct) \(T\), as measured by the jth measurement method \(M_j\), plus non-systematic measurement error \(E_{ijk}\):

\[
X_{ijk} = \lambda_{T_{ij}} T_i + \lambda_{M_{ij}} M_j + E_{ijk} \qquad (1)
\]

Under the traditional assumptions that \(T_i\) and \(M_j\) are uncorrelated with \(E_{ijk}\) and with each other (e.g., Widaman 1985), and that \(X_{ijk}\), \(T_i\), and \(M_j\) have unit variances (i.e., that the \(\lambda\)s are standardized regression coefficients or factor loadings), the variance in observed scores, \(\sigma^2_{X_{ijk}}\), reflects three independent components: individuals' true score variability, \(\lambda^2_{T_{ij}}\) (which can also be interpreted as a squared reliability index), variance attributable to the method of measurement, \(\lambda^2_{M_{ij}}\), and non-systematic error score variance, \(\sigma^2_{E_{ijk}}\):

\[
\sigma^2_{X_{ijk}} = \lambda^2_{T_{ij}} + \lambda^2_{M_{ij}} + \sigma^2_{E_{ijk}} \qquad (2)
\]

It is in this sense that method variance is interjected into observed scores, and the primary implication is compromised construct validity to the extent that method variance dominates over true score variance. That is, to the extent that \(\lambda^2_{M_{ij}}\) dominates over \(\lambda^2_{T_{ij}}\) in Eq. 2, X reflects less of the intended construct than it does the method employed to measure it. But compromised construct validity does not necessarily lead to common method bias, or upward distortion in measured relationships. In fact, if two traits (i.e., \(T_X\) and \(T_Y\)) are measured by two different, uncorrelated methods, the expected correlation will be attenuated as compared to the no-method-variance situation. This is because each measure will contain a source of variance (due to the measurement method) that is unshared with the other measure, decreasing the resulting correlation value (the appendix provides a mathematical description of why the attenuation occurs).

However, inflationary common method bias can occur if different traits are measured by the same method. In the case in which X and Y are measured by the same method M, the observed correlation can be represented as:

\[
r_{XY} = \lambda_{X T_X} \lambda_{Y T_Y} \rho_{T_X T_Y} + \lambda_{XM} \lambda_{YM} \qquad (3)
\]

where \(\lambda_{X T_X}\) and \(\lambda_{Y T_Y}\) are X's and Y's reliability indexes, respectively, \(\rho_{T_X T_Y}\) represents the X–Y true score correlation, and \(\lambda_{XM}\) and \(\lambda_{YM}\) represent the effect of the common method M on X and Y, respectively. Equation 3 shows that \(r_{XY}\) is inflated by a factor of \(\lambda_{XM}\lambda_{YM}\), the common causal effect of method M used to measure both X and Y, and this is the source of common method bias. Thus, common method bias causes observed score correlations to be inflated as compared to their true score counterparts. Note, however, that in Eq. 3 \(r_{XY}\) is also simultaneously attenuated by unreliability in both X and Y (i.e., both \(\lambda_{X T_X}\) and \(\lambda_{Y T_Y}\) are likely to be [considerably] less than 1.0) as compared to \(\rho_{T_X T_Y}\); a brief simulation sketch following this paragraph illustrates these two opposing forces.
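To make Eq. 3's two opposing forces concrete, the short simulation below is our own illustrative sketch (not an analysis from this article); the loading values are hypothetical. It generates two measures that share a method factor and compares the observed correlation with the true-score correlation.

```python
# Sketch: simulate Eq. 3 with hypothetical values -- true correlation .50,
# trait loadings .80 on each measure, and a common method loading of .30.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

rho_t = 0.50   # true-score correlation between T_X and T_Y
lam_t = 0.80   # trait loadings (reliability indexes) for X and Y
lam_m = 0.30   # loading of the shared method factor M on X and Y

# Standardized true scores with correlation rho_t, a shared method factor,
# and unique errors scaled so each observed score has unit variance.
tx, ty = rng.multivariate_normal([0, 0], [[1, rho_t], [rho_t, 1]], n).T
m = rng.standard_normal(n)
err_sd = np.sqrt(1 - lam_t**2 - lam_m**2)
x = lam_t * tx + lam_m * m + err_sd * rng.standard_normal(n)
y = lam_t * ty + lam_m * m + err_sd * rng.standard_normal(n)

observed = np.corrcoef(x, y)[0, 1]
expected = lam_t * lam_t * rho_t + lam_m * lam_m   # Eq. 3
print(f"observed r_XY ~ {observed:.3f}, Eq. 3 prediction = {expected:.3f}, "
      f"true rho = {rho_t:.2f}")
# With these values Eq. 3 gives .64 * .50 + .09 = .41: attenuation (.50 -> .32)
# outweighs the method inflation (+.09), so the observed r falls below rho.
```

Different choices of loadings tip the balance the other way, which is the point of the paragraph that follows.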
As such, and perhaps counterintuitively, correlations between variables measured using a common method are simultaneously attenuated due to unreliability and inflated due to common method bias. Thus, the net effect could be that observed correlations are (a) higher than \(\rho_{T_X T_Y}\) (if inflation is greater than attenuation), (b) lower than \(\rho_{T_X T_Y}\) (if attenuation is greater than inflation), or (c) equal to \(\rho_{T_X T_Y}\) (if inflation and attenuation offset each other).

In a recent study, Lance et al. (in press) re-analyzed data from 18 published multitrait-multimethod (MTMM) matrices based on a variety of constructs and methods to evaluate the possible offsetting effects of unreliability and common method effects. Same-method correlations (\(r_{XY}\) in Eq. 3) were compared to trait factor correlations from confirmatory factor analysis of the MTMM matrices; the trait factor correlations can be regarded as unbiased estimates of true correlations, \(\rho_{T_X T_Y}\) in Eq. 3. Overall, they found that the average same-method correlation was .342, which was actually slightly lower than the mean trait factor correlation of .371. These results suggest (a) that the attenuating effects of measurement error offset (in fact, were somewhat larger than) the inflationary effects of common method bias, (b) that same-method observed score correlations are actually quite accurate representations of their true-score counterparts, and (c) that the widespread belief that common method bias serves to inflate common method correlations as compared to their true-score counterparts is substantially a myth.

Misconception #2: Other-Reports (or Other Methods) are Superior to Self-Reports

A corollary to misconception #1 is that other-reports or other methods (e.g., objective measures) are superior to self-reports. This idea is illustrated by the earlier quotations from Campbell (1982), Organ and Ryan (1995), and Podsakoff and Todor (1985). These quotations imply that including a non-self-reported variable can improve the quality of the study and/or reduce inflation of relationships. More directly, Chan (2009) provided quotations from reviewer comments indicating the assumption that non-self-reports are superior to self-reports (p. 325).

One reason self-reports might be considered inferior is because of method effects that decrease construct validity (e.g., influence of response biases on ratings of job characteristics). Self-reports certainly can be subject to method effects, but literature reviews have indicated that suspected biasing factors, including social desirability (Ones et al. 1996) and negative affect and acquiescence (Spector 2006), do not have strong, consistent effects (e.g., Williams and Anderson 1994). In addition, other methods are also subject to method effects (e.g., supervisor ratings of job characteristics are subject to response biases as well, as are peer ratings of job performance). In fact, if all constructs are measured by the same measurement method (e.g., supervisory ratings, peer ratings), the measurement implications are identical to those discussed earlier in connection with Eq. 3 and the case of self-reports.

Another reason for considering self-reports to be inferior is the belief that if two variables are self-reported, the relationship will be inflated (see misconception #1). It might be argued that it is better to measure at least one of the variables with an other-report.
However, if a relationship is assessed between a self-reported variable (e.g., job satisfaction) and an other-reported variable (e.g., supervisor report of job characteristics), the result could be an attenuated relationship due to unshared method effects (see the appendix and Conway 2002) and/or random error (see Eq. 3 and Lance et al., in press), or even an inflated correlation. In the case in which X and Y are measured by different methods \(M_j\) and \(M_{j'}\), the observed correlation can be represented as:

\[
r_{XY} = \lambda_{X T_X} \lambda_{Y T_Y} \rho_{T_X T_Y} + \lambda_{X M_j} \lambda_{Y M_{j'}} \rho_{M_j M_{j'}} \qquad (4)
\]

where \(\lambda_{X M_j}\) and \(\lambda_{Y M_{j'}}\) represent the effects of the methods \(M_j\) and \(M_{j'}\) on X and Y, respectively, \(\rho_{M_j M_{j'}}\) reflects the correlation between the different methods, and all other terms are defined as before. Note that, as in Eq. 3, attenuation occurs to the extent that measures are less than perfectly reliable (i.e., \(\lambda_{X T_X}\) and \(\lambda_{Y T_Y}\) are less than 1.00), but that covariance distortion (usually inflation) still occurs to the extent that the methods are positively correlated (i.e., \(\rho_{M_j M_{j'}} > 0\)). If the different methods are uncorrelated, then the observed correlation is attenuated due to both unreliability and uncorrelated method variance, as shown in the appendix. In the event that the methods are negatively correlated, even further attenuation would occur because the product \(\lambda_{X M_j} \lambda_{Y M_{j'}} \rho_{M_j M_{j'}}\) in Eq. 4 would be negative, causing \(r_{XY}\) to be even less than \(\rho_{T_X T_Y}\) attenuated simply by measurement error.

We noted earlier that Lance et al. (in press) found even same-method correlations to slightly underestimate true relationships, and this effect was found to be even greater for correlations between different methods: the mean different-method, different-trait correlation was only .248 when compared to the mean trait factor correlation (the estimate of \(\rho_{T_X T_Y}\)) of .371. In fact, using results from Lance et al.'s (in press) Table 2, 61% of the mean different-method correlation was due to attenuated true-score correlation (.151/.248), but 39% (.097/.248) was due to effects of measurement methods that were correlated \(\rho_{M_j M_{j'}} = .525\), on average. Thus, rather than providing a more accurate estimate of true relationships among constructs, relationships estimated using different methods tend to be more attenuated and less accurate as compared to same-method correlations. This is true despite the fact that different-method correlations reflect some inflation due to effects of correlated methods. In fact, had the different methods been uncorrelated, the average different-method correlation would have been \(r_{XY} = .151\), rather severely attenuated as compared to its true-score counterpart (\(\rho_{T_X T_Y} = .371\)). A short worked decomposition of these averages appears below.
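To spell out the arithmetic behind these figures (our own reading of the reported averages, treating them as if they described a single homogeneous case), Eq. 4 decomposes the mean different-method correlation as:

\[
r_{XY} = \underbrace{\lambda_{X T_X}\lambda_{Y T_Y}\,\rho_{T_X T_Y}}_{.151} + \underbrace{\lambda_{X M_j}\lambda_{Y M_{j'}}\,\rho_{M_j M_{j'}}}_{.097} = .248
\]

which, with \(\rho_{T_X T_Y} = .371\) and \(\rho_{M_j M_{j'}} = .525\), implies average products of roughly \(\lambda_{X T_X}\lambda_{Y T_Y} \approx .151/.371 \approx .41\) and \(\lambda_{X M_j}\lambda_{Y M_{j'}} \approx .097/.525 \approx .18\). Setting \(\rho_{M_j M_{j'}} = 0\) in Eq. 4 then leaves only the attenuated true-score component of about .15, the value cited in the text.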
Misconception #3: Rating Sources (e.g., Self, Other) Constitute Measurement Methods

Underlying the preference for other-reports over self-reports (see "Misconception #2") is an assumption that different rating sources, for example, supervisor, peer, subordinate, and self-ratings in multisource rating systems, represent different methods for measuring ratee performance. For example, Mount et al. (1998) wrote that "[s]tudies that have examined performance rating data using multitrait-multimethod matrices (MTMM) or multitrait-multirater (MTMR) matrices usually focus on the proportion of variance in performance ratings that is attributable to traits and that which is attributable to the methods or raters" (p. 559) and that these studies have "documented the ubiquitous phenomenon of method effects in performance ratings" (p. 568) (see also Becker and Cote 1994; Doty and Glick 1998; Scullen et al. 2000; Scullen et al. 2003, for similar attributions). Combined with the belief that "method variance can bias results when researchers investigate relations among constructs with the common method… method variance produces a potential threat to the validity of empirical findings" (Bagozzi and Yi 1990, p. 547), the pervasive and strong rater (source) effects that are routinely found in multisource ratings (Mount et al. 1998; Scullen et al. 2000, 2003; Hoffman et al. 2010) create the impression that these ratings are seriously flawed, biased, and inaccurate measures of ratees' actual performance. Note that despite misconception #2, that other-ratings are superior to self-ratings, beliefs about rating sources as methods could lead to criticism of other-reports as well as self-ratings.

We argue that the mere existence of rater source effects does not undermine the construct validity of either self-reports or other-reports. In a series of studies, Hoffman, Lance, and colleagues (Hoffman and Woehr 2009; Lance et al. 2006, 1992) conducted confirmatory factor analysis (CFA) of multisource performance ratings (MSPRs). As is often the case, they found support for both rating dimension and rater source factors and found that rater source factors accounted for substantially more variance in ratings than did dimension factors. Note that from a traditional perspective, this pattern of evidence would be interpreted as representing the presence of substantial rater (qua method) bias. However, Hoffman, Lance, and colleagues also correlated rater source factors with additional job-related variables external to the "core" CFA, including personality, aptitude, experience, and objective performance measures, to test whether source factors represented mere performance-irrelevant rater (i.e., method) bias factors or whether, as an ecological perspective on ratings (see Lance et al. 2006) and MSPR practitioners (e.g., Tornow 1993) suggest, rater source factors represent alternative and complementary valid perspectives on ratee performance. That rater source factors evidenced significant relationships with other performance-related variables supported the latter position. For example, Lance et al. (1992) showed that a self-rating factor correlated with mechanical aptitude and job experience, and did so more strongly than did supervisor or peer ratings. These results (a) support the construct validity of self-ratings, and (b) more generally show that different rater sources, including self-ratings, represent valid performance-related variance and not mere measurement method bias (see reviews by Lance et al. 2008, 2009).

What Reviewers Should Reasonably Expect from Authors Regarding Common Method Bias

To a certain extent it is easier to say what we think reviewers and other gatekeepers should not do (i.e., base criticisms on the misconceptions described earlier) than it is to prescribe proper reviewer behavior. It is not possible to ensure that method effects do not influence results (indeed, it is widely accepted that the act of measurement can and does change the phenomenon being studied; Heisenberg 1927), but it is reasonable for reviewers to expect authors to take certain steps to reduce the likelihood of common method bias.
We recommend four things reviewers can look for, but we emphasize that it is not our intent to define hurdles for authors that reviewers and other gatekeepers should feel bound to enforce during the review process. Overreliance on rules of thumb without careful consideration of the origin and rationale for the rules and their application is very problematic, and is instrumental in creating and perpetuating what Vandenberg (2006) and Lance and Vandenberg (2009) have referred to as "statistical and methodological myths and urban legends." With this in mind, we believe it is reasonable for reviewers to expect the following:

An Argument for Why Self-Reports are Appropriate

Self-reports are clearly appropriate for job satisfaction and many other private events (Chan 2009; Skinner 1957), but for other constructs, such as job characteristics or job performance, other types of measures might be appropriate or even superior. If reviewers have a specific concern about the use of self-reports (e.g., a concern about social desirability in self-reports on sensitive topics), they should expect that authors/researchers be able to provide a solid rationale for their choice.

An excellent example of a rationale is provided by Shalley et al. (2009), who studied self-reported creative performance as predicted by growth need strength and work context variables. Their rationale for using self-reports of creativity included an argument for why self-reports are particularly appropriate, stating "it has been argued that employees are best suited to self-report creativity because they are the ones who are aware of the subtle things they do in their jobs that make them creative" (p. 495). They also cited evidence from another study that self-ratings of creativity correlated substantially (r = .62) with supervisor ratings.

Another best-practice example is provided by Judge et al. (2000), who studied job characteristics as a mediator of the relationship between core self-evaluations and job satisfaction. In describing their theoretical model, Judge et al. explicitly focused on perceived job characteristics (i.e., as perceived by the incumbent), making self-reports the theoretically most relevant measurement method. First, they used the term "perceived job characteristics" throughout the article. Second, their rationale was tied to perceptions of job characteristics rather than objective job characteristics. For example, they argued that those who dispositionally experience positive affect may respond more favorably to situations (i.e., perceive a given job more positively) than those who tend to experience negative affect.

Construct Validity Evidence

One way to rule out substantial method effects is to demonstrate construct validity of the measures used. Authors should be able to make an argument that the measures they chose have construct validity, and it is reasonable for reviewers to ask for it if it is not provided. On the other hand, we do not mean to imply that authors should always be required to use the most widely cited and most thoroughly researched measures, as we believe that this could retard creative research toward refinement of theoretical constructs and their operationalizations.
We also do not want to prescribe a specific set of criteria that authors should meet because (a) establishing construct validity depends on a constellation of types of evidence (Putka and Sackett 2010), (b) it is not reasonable for researchers to provide extensive evidence in every case, especially when the area of research is relatively new and innovative, and (c) the availability and most appropriate type of evidence may vary case by case. Rather, we list several types of evidence likely to be useful for establishing construct validity; whether the evidence in a given case is adequate is a matter of judgment. The kinds of evidence likely to be useful include (a) appropriate reliability evidence (Feldt and Brennan 1989), (b) factor structure, (c) establishment of a nomological network including relationships with theoretically relevant variables (or lack of relationships where this is expected theoretically), (d) mean differences between relevant groups, (e) convergent and discriminant validity based on a multitrait-multimethod matrix, or (f) other evidence depending on the situation (Messick 1989).

Oftentimes, this type of evidence is available in sources that describe the initial development and evaluation of the particular measure (see, e.g., Dooley et al. 2007; Mallard and Lance 1998; Netemeyer et al. 1996). In these cases, reviewers could reasonably expect authors to acknowledge the scale development citation and perhaps provide supplementary corroborative evidence of the scale's internal consistency and factor structure. For cases in which extensive construct validity evidence has not been presented elsewhere, reviewers might reasonably ask for additional evidence. Examples of reasonable construct validity evidence include Shalley et al. (2009) and MacKenzie et al. (1998). Shalley et al. justified self-ratings of creativity by noting that self-ratings had been shown in other research to correlate substantially (r = .62) with supervisor ratings, in addition to providing evidence of internal consistency reliability. MacKenzie et al. (1998) used instruments with long track records, such as the Minnesota Satisfaction Questionnaire, and cited multiple sources providing construct validity evidence.

Lack of Overlap in Items for Different Constructs

Conceptual overlap in items used to measure different constructs can bias relationships (Brannick et al., in press), and this is something reviewers should watch for. Doing so can be difficult because authors understandably do not usually provide all items for their measures, but reviewers should stay alert to the possibility and can ask authors about potential overlap when the constructs are similar. Possible remedies include rewording or dropping items that do overlap.

One research area for which this issue has been investigated is the relationship between organizational commitment and turnover intentions. Bozeman and Perrewé (2001) had expert judges rate items from the Organizational Commitment Questionnaire (OCQ; Mowday et al. 1979) and concluded that several OCQ items were retention-related (e.g., "It would take very little change in my present circumstances to cause me to leave this organization"). They used confirmatory factor analysis to show that these items loaded significantly on a turnover-cognition factor (along with a set of turnover cognition items), and that the OCQ–turnover cognition correlation decreased when the overlapping OCQ items were excluded; a brief sketch of this kind of check appears below.
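The following is a minimal sketch (not Bozeman and Perrewé's actual analysis) of how such a check might look in practice: recompute the scale-level correlation after removing items flagged as overlapping. The file name, column names, and the flagged item are hypothetical.

```python
# Compare the commitment-turnover intention correlation with and without
# items judged to overlap with turnover cognitions. All names are hypothetical.
import pandas as pd

df = pd.read_csv("survey.csv")                      # item-level responses
oc_items = [f"oc{i}" for i in range(1, 10)]         # commitment items
ti_items = [f"ti{i}" for i in range(1, 4)]          # turnover-intention items
flagged = ["oc7"]                                   # e.g., an item judged retention-related

oc_full = df[oc_items].mean(axis=1)
oc_trimmed = df[[i for i in oc_items if i not in flagged]].mean(axis=1)
ti = df[ti_items].mean(axis=1)

print("OC-TI correlation, all items:      ", round(oc_full.corr(ti), 3))
print("OC-TI correlation, overlap removed:", round(oc_trimmed.corr(ti), 3))
# A noticeably smaller correlation after removal is consistent with item
# overlap contributing to the original estimate.
```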
While Bozeman and Perrewé's (2001) explicit purpose was to study item overlap, Chan (2001) acknowledged and dealt with the issue in a substantive study of the organizational commitment (OC)–turnover intentions (QU) relationship. Chan stated that "OC was measured using a six-item scale adapted from Mowday et al. (1979). The six items made no reference to intent to quit (QU) to avoid artifactual inflation of OC–QU correlation due to item overlap" (p. 81).

We suspect there are other constructs for which items are likely to overlap. One example involves measures of contextual performance and the personality trait conscientiousness, two constructs that have been theoretically linked (Motowidlo et al. 1997). Van Scotter and Motowidlo's (1996) contextual performance measure (specifically the job dedication subscale) included the item "pay close attention to important details," and a measure of conscientiousness from the International Personality Item Pool (2010) is "Pay attention to details." We want to emphasize that ruling out conceptual overlap should not be a new hurdle that all authors are expected to clear. Rather, it makes sense to raise the issue if a reviewer has a specific reason to believe overlap exists.

Evidence that Authors Proactively Considered Common Method Bias

It is reasonable for reviewers to look for evidence that authors considered common method bias in the design of their study. This evidence can take many forms, and we do not recommend that authors be required to use any particular one—merely that the judicious use of design techniques can demonstrate an a priori consideration of method effects. Podsakoff et al. (2003) provided examples of design techniques, including "temporal, proximal, psychological, or methodological separation of measurement," "protecting respondent anonymity and reducing evaluation apprehension," "counterbalancing question order," and "improving scale items" (pp. 887–888). We believe any of these can demonstrate a priori consideration of the possibility of common method bias, though it might be demonstrated in other ways. The appropriate technique(s) will depend on the situation.

One good example of a priori consideration comes from Westaby and Braithwaite (2003), who studied reemployment self-efficacy. Westaby and Braithwaite counterbalanced sets of items, and clearly explained the procedure and rationale. Specifically, their purposes were to "(a) mitigate order effects…, (b) reduce the negative effect of item order on theoretical testing…, and (c) reduce the potential for response sets" (p. 425). They used four different item orders with random assignment of respondents to orders. Another example might be measuring the constructs believed to contribute to common method bias and incorporating them into a structural equation model, as Williams and Anderson (1994) did. However, as we note later, while this approach is theoretically sound, it has not been demonstrated to reduce common method bias.

A Note on Post Hoc Statistical Control of Common Method Bias

One recommendation that we did not include in our list was post hoc statistical control of common method bias. A number of techniques have been proposed and thoroughly reviewed by Podsakoff et al. (2003).
One of these is Lindell and Whitney's (2001) partial correlation approach, in which the minimum correlation between some "marker variable" and a study's focal variables is subtracted from the correlations among the focal variables to adjust for common method bias. According to Richardson et al. (2009), this approach has been frequently adopted (48 times as of June 2008), but usually to demonstrate that method bias was not a problem in the first place. A second approach involves loading all study measures onto an unmeasured "method" factor and then analyzing relationships among the residualized variables. According to Richardson et al. (2009), at least 49 studies through June 2008 have used the unmeasured method factor approach. We see this approach as logically indefensible, as it may easily remove trait variance when multiple traits have a common cause (e.g., generalized psychological climate; James and James 1989). Both the partial correlation and the unmeasured method techniques were evaluated in a simulation by Richardson et al. (2009), and both produced disappointing results. In fact, they produced less accurate estimates of correlations than did no correction at all when (a) method variance was present and (b) true correlations were greater than zero—conditions that are typical of organizational research. We therefore discourage reviewers from suggesting that authors use either of these commonly used remedies for common method bias.

A third approach is Williams and Anderson's (1994) technique, in which all focal latent variables' manifest indicators are loaded onto one or more substantive method latent variables (e.g., acquiescence, social desirability, positive/negative affectivity), on which the method factor's own manifest indicators also load. One limitation of this technique is that it presumes that a researcher knows the source of method variance. The approach appears to have been seldom applied and to have little effect on estimates among the focal constructs (Richardson et al. 2009).

Two other approaches appear promising, though each has important limitations. Use of a multitrait-multimethod (MTMM) matrix to estimate relationships can be very effective for controlling common method bias (if methods are chosen carefully; see our "Misconception #3"). But the substantial demands of carrying out an MTMM study make it difficult or impossible in many research situations. Although there are thousands of citations to Campbell and Fiske's (1959) original article on the MTMM methodology and hundreds of applications of the MTMM matrix (Lance et al., in press), we know of only one application in which researchers tested directional relationships (i.e., a structural model) among self-reported constructs while controlling for method effects in the MTMM matrix (Arora 1982).

A second promising approach is Lance et al.'s (in press) simultaneous correction for common method bias and attenuation. The correction for same-method correlations is:

\[
\hat{\rho}_{T_X T_Y} = \frac{r_{XY} - \lambda_{XM}\lambda_{YM}}{\lambda_{X T_X}\lambda_{Y T_Y}} \qquad (5)
\]

and for different-method correlations is:

\[
\hat{\rho}_{T_X T_Y} = \frac{r_{XY} - \lambda_{X M_j}\lambda_{Y M_{j'}}\rho_{M_j M_{j'}}}{\lambda_{X T_X}\lambda_{Y T_Y}} \qquad (6)
\]

where all terms are defined as in Eqs. 3 and 4 above. As such, these corrections simultaneously adjust observed correlations (\(r_{XY}\)) for attenuation due to unreliability (\(\lambda_{X T_X}\lambda_{Y T_Y}\) in the denominator) and for inflation due to method effects (\(\lambda_{XM}\lambda_{YM}\) or \(\lambda_{X M_j}\lambda_{Y M_{j'}}\rho_{M_j M_{j'}}\) in the numerator); a small sketch of these univariate corrections appears below.
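As an illustration only (the numerical values are hypothetical, and, as noted below, the procedure is as yet untested empirically), the corrections in Eqs. 5 and 6 are straightforward to compute once estimates of the loadings and the method correlation are in hand:

```python
# A minimal sketch of the univariate corrections in Eqs. 5-6, coded directly
# from the formulas above; all input values are hypothetical.

def correct_same_method(r_xy, lam_xt, lam_yt, lam_xm, lam_ym):
    """Eq. 5: remove common-method inflation, then disattenuate for unreliability."""
    return (r_xy - lam_xm * lam_ym) / (lam_xt * lam_yt)

def correct_different_method(r_xy, lam_xt, lam_yt, lam_xmj, lam_ymj, rho_mm):
    """Eq. 6: same idea, with the method term weighted by the method correlation."""
    return (r_xy - lam_xmj * lam_ymj * rho_mm) / (lam_xt * lam_yt)

# Example: observed same-method r = .41, trait loadings .80, method loadings .30
print(correct_same_method(0.41, 0.80, 0.80, 0.30, 0.30))              # ~0.50
# Example: observed different-method r = .25, method correlation .50
print(correct_different_method(0.25, 0.80, 0.80, 0.30, 0.30, 0.50))   # ~0.32
```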
Lance et al. (in press) also present a multivariate extension of these corrections and provide discussion of approaches to obtaining estimates of the quantities needed to effect the corrections. While this approach appears to be psychometrically sound, there has been no empirical work at all evaluating its effectiveness.

Thus, post hoc statistical control of common method bias is in an unfortunate state of the practice: some well-known and easily applied approaches appear to be ineffective; use of the MTMM matrix is routinely resource intensive, even if multiple measurement methods can be identified for focal constructs; and the efficacy of Lance et al.'s (in press) statistical control procedure is as yet untested. Consequently, we cannot recommend any of these approaches.

A Counter-Argument

Two readers of an earlier version of our manuscript independently suggested that, while our argument about attenuation in relationships due to use of different methods may be valid, use of different methods is still preferable to the use of the same method for different constructs. The argument is that different- versus same-method studies may lead to different types of errors—using different methods will tend to underestimate relationships while using the same method may tend to overestimate relationships (however, see our "Misconception #1"). Thus, according to this argument, it is more impressive to find a significant relationship with different methods because it is harder to do so. In this regard, one reader made an analogy to Type I errors being more often associated with same-method correlations (same-method correlations may overestimate relationships) versus Type II errors being more often associated with different-method correlations (different-method correlations may underestimate relationships).

We acknowledge this analogy, and we agree that in general it is more difficult to detect a significant relationship between constructs measured by different methods (i.e., higher probability of a Type II error). However, we do not agree that this is a virtue; problems with protecting against Type I error at the expense of a greater Type II error rate have been discussed, for example, by Hunter (1997) and Cohen (1994). It is desirable to have high power in research (i.e., to have a low Type II error rate), attenuated relationships reduce power by increasing the likelihood of Type II errors, and Lance et al.'s (in press) review indicates that different-method correlations are routinely attenuated as compared to their true-score counterparts. We do acknowledge the potential in some cases for inflation due to common method bias (though Lance et al.'s [in press] review indicates that same-method correlations are not routinely upwardly biased), and we are certainly not opposed to the use of different methods to estimate relationships. But we believe that if reviewers wish to criticize a study for using a same-method design, there should be a rationale for why the use of the same method would be problematic. Reviewers (as well as authors) should take a thoughtful approach to the issue of measurement methods; each research situation's characteristics (e.g., potential for social desirability, availability of valid measures, etc.) should be considered and a determination made based on those characteristics, rather than on a general notion that same-method correlations are necessarily problematic.

Summary

We believe that journal reviewers and editors, and other gatekeepers, such as dissertation or thesis committee members, have to some extent perpetuated misconceptions about common method bias in self-report measures, for example, that relationships between self-reported variables are necessarily and routinely upwardly biased. We argued against these misconceptions and recommended what gatekeepers should reasonably expect regarding common method bias. Reasonable expectations include (a) an argument for why self-reports are appropriate, (b) a case for the construct validity of the measures, (c) lack of overlap in items for different constructs, and (d) proactive measures on the part of authors to minimize threats of common method bias. No post hoc statistical correction procedure can be recommended until additional research evaluates the relative effectiveness of those that have been proposed.

Appendix: Attenuation of Correlations by Unshared Method Variance

Method variance is usually assumed to inflate correlations, but in a fairly common research situation, method variance will actually attenuate correlations as compared to the no-method-variance situation. First consider a situation with no method variance in measures of two traits (\(T_X\) and \(T_Y\)) which are measured by two different, uncorrelated methods (\(M_j\) and \(M_{j'}\)). The equation for the correlation coefficient, expressed in terms of variances and covariance, when no method variance is present, is:

\[
r_{XY} = \frac{COV(T_X, T_Y)}{\sqrt{VAR(T_X) + VAR(E_X)}\,\sqrt{VAR(T_Y) + VAR(E_Y)}} \qquad (A1)
\]

Now consider what happens when each measure contains method variance that is unshared with the other measure. The variance terms for each measure's method effect, \(VAR(M_j)\) and \(VAR(M_{j'})\), are added to the denominator:

\[
r_{XY} = \frac{COV(T_X, T_Y)}{\sqrt{VAR(T_X) + VAR(M_j) + VAR(E_X)}\,\sqrt{VAR(T_Y) + VAR(M_{j'}) + VAR(E_Y)}} \qquad (A2)
\]

Because additional method variance appears in the denominator when there are method effects (Eq. A2), but not when there are no method effects (Eq. A1), the Eq. A2 denominator will be larger while the numerators are the same. The correlation value in Eq. A2 will therefore be attenuated.
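As a small numerical sketch of this result (the variance values are our own illustration, not from the article): suppose \(COV(T_X, T_Y) = .25\), \(VAR(T_X) = VAR(T_Y) = .50\), \(VAR(E_X) = VAR(E_Y) = .25\), and each measure carries unshared method variance \(VAR(M_j) = VAR(M_{j'}) = .25\). Then:

\[
\text{Eq. A1: } r_{XY} = \frac{.25}{\sqrt{.75}\sqrt{.75}} = \frac{.25}{.75} \approx .33,
\qquad
\text{Eq. A2: } r_{XY} = \frac{.25}{\sqrt{1.00}\sqrt{1.00}} = .25
\]

so the unshared method variance pushes the observed correlation further below its no-method-variance value.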
References

Arora, R. (1982). Validation of an S-O-R model for situation, enduring and response components of involvement. Journal of Marketing Research, 19, 505–516.
Bagozzi, R. P., & Yi, Y. (1990). Assessing method variance in multitrait-multimethod matrices: The case of self-reported affect and perceptions at work. Journal of Applied Psychology, 75, 547–560.
Becker, T. E., & Cote, J. A. (1994). Additive and multiplicative method effects in applied psychological research: An empirical assessment of three models. Journal of Management, 20, 625–641.
Bozeman, D., & Perrewé, P. (2001). The effect of item content overlap on Organizational Commitment Questionnaire–turnover cognitions relationships. Journal of Applied Psychology, 86, 161–173.
Brannick, M. T., Chan, D., Conway, J. M., Lance, C. E., & Spector, P. E. (in press). What is method variance and how can we cope with it? A panel discussion. Organizational Research Methods.
Campbell, J. (1982). Editorial: Some remarks from the outgoing editor. Journal of Applied Psychology, 67, 691–700.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–106.
Chan, D. (2001). Method effects of positive affectivity, negative affectivity, and impression management in self-reports of work attitudes. Human Performance, 14, 77–96.
Chan, D. (2009). So why ask me? Are self-report data really that bad? In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 311–338). New York: Routledge.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Conway, J. M. (2002). Method variance and method bias in I/O psychology. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 344–365). Oxford: Blackwell Publishers.
Cote, J. A., & Buckley, M. R. (1987). Estimating trait, method and error variance: Generalizing across 70 construct validation studies. Journal of Marketing Research, 26, 315–318.
Crampton, S., & Wagner, J. (1994). Percept-percept inflation in microorganizational research: An investigation of prevalence and effect. Journal of Applied Psychology, 79, 67–76.
Dooley, W. K., Shaffer, D. R., Lance, C. E., & Williamson, G. M. (2007). Informal care can be better than adequate: Development and evaluation of the exemplary care scale. Rehabilitation Psychology, 52, 359–369.
Doty, D. H., & Glick, W. H. (1998). Common method bias: Does common methods variance really bias results? Organizational Research Methods, 1, 374–406.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). Washington, DC: American Council on Education and National Council on Measurement in Education.
Heisenberg, W. (1927). Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik [About the graphic content of quantum theoretic kinematics and mechanics]. Zeitschrift für Physik, 43, 172–198.
Hoffman, B. J., Lance, C. E., Bynum, B. H., & Gentry, W. A. (2010). Rater source effects are alive and well after all. Personnel Psychology, 63, 119–151.
Hoffman, B. J., & Woehr, D. J. (2009). Disentangling the meaning of multisource performance rating source and dimension factors. Personnel Psychology, 62, 735–765.
Hunter, J. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3–7.
International Personality Item Pool. (2010). International Personality Item Pool: A scientific collaboratory for the development of advanced measures of personality traits and other individual differences (http://ipip.ori.org/). http://ipip.ori.org/newNEOKey.htm#Conscientiousness. Retrieved February 5, 2010.
James, L. A., & James, L. R. (1989). Integrating work environment perceptions: Explorations into the measurement of meaning. Journal of Applied Psychology, 74, 739–751.
Judge, T., Bono, J., & Locke, E. (2000). Personality and job satisfaction: The mediating role of job characteristics. Journal of Applied Psychology, 85, 237–249.
Lance, C. E., Baranik, L. E., Lau, A. R., & Scharlau, E. A. (2009). If it ain't trait it must be method: (Mis)application of the multitrait-multimethod design in organizational research. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 339–362). New York: Routledge.
Lance, C. E., Baxter, D., & Mahan, R. P. (2006). Multi-source performance measurement: A reconceptualization. In W. Bennett, C. E. Lance, & D. J. Woehr (Eds.), Performance measurement: Current perspectives and future challenges (pp. 49–76). Mahwah, NJ: Erlbaum.
Lance, C. E., Dawson, B., Birklebach, D., & Hoffman, B. J. (in press). Method effects, measurement error, and substantive conclusions. Organizational Research Methods.
Lance, C. E., Hoffman, B. J., Baranik, L. E., & Gentry, W. A. (2008). Rater source factors represent important subcomponents of the criterion construct space, not rater bias. Human Resource Management Review, 18, 223–232.
Lance, C. E., Teachout, M. S., & Donnelly, T. M. (1992). Specification of the criterion construct space: An application of hierarchical confirmatory factor analysis. Journal of Applied Psychology, 77, 437–452.
Lance, C. E., & Vandenberg, R. J. (2009). Introduction. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 1–4). New York: Routledge.
Lindell, M. K., & Whitney, D. J. (2001). Accounting for common method variance in cross-sectional research designs. Journal of Applied Psychology, 86, 114–121.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley Publishing Company.
MacKenzie, S. B., Podsakoff, P. M., & Ahearne, M. (1998). Some possible antecedents and consequences of in-role and extra-role salesperson performance. Journal of Marketing, 62, 87–98.
Mallard, A. G. C., & Lance, C. E. (1998). Development and evaluation of a parent–employee interrole conflict scale. Social Indicators Research, 45, 343–370.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). Washington, DC: American Council on Education and National Council on Measurement in Education.
Motowidlo, S., Borman, W., & Schmit, M. (1997). A theory of individual differences in task and contextual performance. Human Performance, 10, 71–83.
Mount, M. K., Judge, T. A., Scullen, S. E., Sytsma, M. R., & Hezlett, S. A. (1998). Trait, rater, and level effects in 360-degree performance ratings. Personnel Psychology, 51, 557–576.
Mowday, R. T., Steers, R. M., & Porter, L. W. (1979). The measurement of organizational commitment. Journal of Vocational Behavior, 14, 224–247.
Netemeyer, R. G., Boles, J. S., & McMurrian, R. (1996). Development and validation of work-family conflict and family-work conflict scales. Journal of Applied Psychology, 81, 400–410.
Ones, D., Viswesvaran, C., & Reiss, A. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.
Organ, D., & Ryan, K. (1995). A meta-analytic review of attitudinal and dispositional predictors of organizational citizenship behavior. Personnel Psychology, 48, 775–802.
Podsakoff, P., MacKenzie, S., Lee, J., & Podsakoff, N. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.
Podsakoff, P., & Todor, W. (1985). Relationships between leader reward and punishment behavior and group processes and productivity. Journal of Management, 11, 55–73.
Putka, D. M., & Sackett, P. R. (2010). Reliability and validity. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp. 9–49). New York: Routledge.
Richardson, H., Simmering, M., & Sturman, M. (2009). A tale of three perspectives: Examining post hoc statistical techniques for detection and correction of common method variance. Organizational Research Methods, 12, 762–800.
Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85, 956–970.
Scullen, S. E., Mount, M. K., & Judge, T. A. (2003). Evidence of the construct validity of developmental ratings of managerial performance. Journal of Applied Psychology, 88, 50–66.
Shalley, C., Gilson, L., & Blum, T. (2009). Interactive effects of growth need strength, work context, and job complexity on self-reported creative performance. Academy of Management Journal, 52, 489–505.
Skinner, B. F. (1957). Verbal behavior. New York: Appleton-Century-Crofts.
Spector, P. (2006). Method variance in organizational research: Truth or urban legend? Organizational Research Methods, 9, 221–232.
Tornow, W. W. (1993). Perceptions or reality: Is multi-perspective measurement a means or an end? Human Resource Management, 32, 221–229.
Van Scotter, J. R., & Motowidlo, S. J. (1996). Interpersonal facilitation and job dedication as separate facets of contextual performance. Journal of Applied Psychology, 81, 525–531.
Vandenberg, R. J. (2006). Statistical and methodological myths and urban legends: Where, pray tell, did they get this idea? Organizational Research Methods, 9, 194–201.
Westaby, J., & Braithwaite, K. (2003). Specific factors underlying reemployment self-efficacy: Comparing control belief and motivational reason methods for the recently unemployed. Journal of Applied Behavioral Science, 39, 415–437.
Widaman, K. (1985). Hierarchically nested covariance structure models for multitrait-multimethod data. Applied Psychological Measurement, 9, 1–26.
Williams, L. J., & Anderson, S. E. (1994). An alternative approach to method effects by using latent-variable models: Applications in organizational behavior research. Journal of Applied Psychology, 79, 323–331.
Williams, L. J., Cote, J. A., & Buckley, M. R. (1989). Lack of method variance in self-reported affect and perceptions at work: Reality or artifact? Journal of Applied Psychology, 74, 462–468.