Confirmatory Factor Analysis

Definition

Confirmatory factor analysis (CFA) is a procedure for learning the extent to which k observed variables might measure m abstract variables, wherein m is less than k. In CFA, we indirectly measure non-observable behavior by taking measures on multiple observed behaviors. Conceptually, in using CFA we can assume either nominalist or realist constructs, yet most applications of CFA in the social sciences assume realist constructs.

Terminology: Factor = Abstract Concept = Abstract Construct = Latent Variable.

CFA differs from EFA in that it specifies a factor structure based upon expected theoretical relationships. Whereas we might think of EFA as a procedure for inductive theory construction, CFA is a procedure for testing hypotheses deduced from theory. CFA allows the researcher to conduct two forms of data analysis not available in EFA:

1. CFA allows for the examination of second-order (i.e., higher-order) latent variables. We might posit, for example, that marital satisfaction (a latent variable) consists of four subdimensions (each a latent variable): satisfaction with romance, companionship, family finances, and child rearing.
2. CFA allows for testing hypotheses related to construct validity. We can test for statistical significance of the effect of a latent variable on each of the observed variables posited to measure it. The web page entitled "Using CFA to Test Empirical Validity" provides an example of how CFA can be used to examine construct and predictive validity for a second-order latent variable.

CFA requires one to specify the measurement of and relationships among the factors. Therefore, it relies upon deductive examination of a theory. Deductive analysis has the advantage of knowing a priori the factor structure, which allows one to test hypotheses related to examining the various types of construct validity. However, whereas the EFA model is never underidentified, the CFA model can be underidentified, requiring one to understand mathematical identification and the rules for certifying model identification.

Assumptions

1. Typically, realism rather than nominalism: abstract variables are real in their consequences.
2. Normally distributed observed variables.
3. Continuous-level data.
4. Linear relationships among the observed variables.
5. Content validity of the items used to measure an abstract concept.
6. E(δi) = 0 (random error).
7. Theoretically specified relationships among observed variables and factors.
8. A sample size greater than 100 (more is better).

Note: In CFA:

1. we use the symbol ξ (xi) to refer to an exogenous factor (an independent latent variable).
2. we use the symbol η (eta) to refer to an endogenous factor (a dependent latent variable).
3. we use the symbol τ (tau) to refer to the intercept of the measurement model.
4. we use the symbol Φ (phi) to refer to the variance/covariance matrix of the factor(s). Note that the variance of a factor always equals 1 in EFA.

The diagram shown below shows the terminology typically used in CFA and Structural Equation Modeling (SEM). This course addresses the "measurement" model, meaning the measurement of and relationships among the exogenous variables. Soc 613 addresses the "causal" model, referring to relationships among the exogenous and endogenous variables. When we address second-order CFA, we will discuss measuring η and ξ. The principles discussed in measuring ξ apply also to measuring η.
Most of the notes in this lecture refer to measuring ξ.

Notation

y: p x 1 vector of observed endogenous (i.e., dependent) variables.
x: q x 1 vector of observed exogenous (i.e., independent) variables.
η: m x 1 vector of latent endogenous variables (i.e., dependent factors).
ξ: n x 1 vector of latent exogenous variables (i.e., independent factors).
ε: p x 1 vector of measurement errors in y.
δ: q x 1 vector of measurement errors in x.
Γ: m x n matrix of coefficients of the ξ variables.
B: m x m matrix of coefficients of the η variables.
ζ: m x 1 vector of equation errors in the relationships between η and ξ.
Ψ: m x m covariance matrix of the errors in the relationships between η and ξ.

Example of a CFA Model

The model shown below specifies that a set of three abstract variables related to locus of control—internal, chance, and powerful others—can be measured with sufficient validity and reliability by nine observed variables, wherein each latent variable is measured with three observed variables.

Software Packages for Conducting CFA

The Sociology 512 web site provides examples of conducting CFA using six well-known software packages: LISREL, MPlus, R, SAS, SPSS/AMOS, and Stata. The examples shown in these notes rely mainly upon the LISREL software package.

Consequences of Measurement Error

We noted above that CFA assumes a sample size of at least 100. Understanding the consequences of measurement error explains why we make this assumption.

Single Indicator of ξ

X1 = τ1 + λ11ξ1 + δ1, where:

1. τ1 refers to the intercept of the equation,
2. the population mean of ξ = κ (kappa),
3. the population mean of X1 = μx1 (mu).

E(X1) = E(τ1 + λ11ξ1 + δ1), or: μx1 = τ1 + λ11κ1

Recall that this equation cannot be solved because of the linear dependencies in the matrix. To estimate the parameters, we must make one of the following assumptions:

1. The variance of the factor equals one, or
2. The parameter estimate (λ11) for the effect of the factor (ξ1) on X1 equals one.

For example:

1. Let τ1 = 0 (standardized variables) and set λ11 = 1.
2. Then, X1 = ξ1 + δ1.
3. Then, E(X1) = E(ξ1) + E(δ1).
4. Assume E(δ1) = 0 (i.e., random errors in measurement).
5. Then, E(X1) = E(ξ1), or μx1 = κ1.
6. Hence, E(X1) is an unbiased estimator of κ1 (so far, so good!).

Multiple Indicators of ξ

Consider two indicators of ξ1:

X1 = τ1 + λ11ξ1 + δ1
X2 = τ2 + λ21ξ1 + δ2

We must set a scale for the latent variable:

1. Let τ1 = 0 (standardized variables) and set λ11 = 1.
2. Then, E(X1) = κ1.
3. Having set a scale for ξ1 using τ1 and λ11, it is unnecessary and incorrect to do so again for τ2 and λ21.
4. E(X1) = κ1 (as before).
5. The mean of X2, however, may not equal the mean of ξ1.

Consider the consequences of estimating the variance of ξ1:

1. var(X1) = λ11 φ11 λ11 + var(δ1), where φ11 = var(ξ1).
2. If λ11 = 1 and var(δ1) = 0, then var(X1) = φ11 (unbiased estimator).
3. If var(δ2) ≠ 0, however, then var(X2) > φ11.
4. Therefore, var(X2) is a biased estimator of φ11.
5. To resolve this issue, CFA relies upon asymptotic distribution theory, which essentially states that for large sample sizes (n ≥ 100), var(X2) is an unbiased estimator of φ11.
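As a quick illustration of the argument above, the following minimal numpy sketch simulates one error-free and one error-laden indicator of the same factor. The specific values of κ, φ, and var(δ2) are arbitrary assumptions chosen only to make the point visible; this is not part of the notes' LISREL examples.

```python
import numpy as np

# Minimal simulation of the single- vs. multiple-indicator argument above.
# kappa, phi, and var(delta2) are hypothetical values chosen for illustration.
rng = np.random.default_rng(512)
n, kappa, phi = 100_000, 5.0, 1.0

xi = rng.normal(kappa, np.sqrt(phi), n)      # the latent variable
x1 = xi                                      # lambda = 1, var(delta1) = 0
x2 = xi + rng.normal(0.0, np.sqrt(0.5), n)   # lambda = 1, var(delta2) = 0.5

print(round(x1.mean(), 2), round(x2.mean(), 2))  # both near kappa = 5.0 (means unbiased)
print(round(x1.var(), 2), round(x2.var(), 2))    # near 1.0 vs. near 1.5: var(X2) > phi
```

The means of both indicators recover κ, but the variance of the error-laden indicator overstates φ by the amount of the error variance, which is the bias the asymptotic argument above is meant to address.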
The Measurement Model

A measurement model specifies a structural relationship that connects latent variables to one or more observed variables. The general linear model for specifying these relationships is:

Σ = Σ(θ) = E(XX'), where:

1. Σ refers to "reality."
2. Σ(θ) refers to theory.
3. E(XX') refers to the correlation matrix of observed variables.

Consider the following example of the measurement model. For standardized variables:

X1 = τ1 + λ11ξ1 + δ1
X2 = τ2 + λ21ξ1 + δ2

or, in general: X = Λx ξ + δ

Most latent variables in the social sciences are abstract ones. Abstract variables require an arbitrary scale. There are two approaches to setting a scale:

1. Set the variances of all latent variables (the φii) to 1.
2. Set one of the estimates in Λx (for each ξi) to 1.
3. Do not do both procedures in the same model.

The Covariance Matrix for the CFA Model

1. The covariance matrix for X = E(XX').
2. Therefore, Σ(θ) = E(XX'), wherein X = Λx ξ + δ.
3. XX' = (Λx ξ + δ)(Λx ξ + δ)'
4. XX' = (Λx ξ + δ)(ξ'Λx' + δ')
5. XX' = Λx ξξ'Λx' + Λx ξδ' + δξ'Λx' + δδ'
6. E(XX') = Λx E(ξξ')Λx' + Λx E(ξδ') + E(δξ')Λx' + E(δδ')

Assume:

1. E(ξδ') = E(δξ') = 0; the factors are not correlated with the errors (random errors in measurement).
2. E(ξξ') is the covariance matrix of the latent variables: Φ.
3. E(δδ') is the covariance matrix of the errors: Θδ.
4. Therefore: Σ = Σ(θ) = E(XX') = Λx Φ Λx' + Θδ

Model Identification

In conducting CFA we specify a set of parameters to be estimated. We therefore must specify a model that contains sufficient information to estimate these parameters. Consider the following model for X1, X2, X3, and X4:

Let λ11 = 1 to set the scale for ξ1.
Let λ32 = 1 to set the scale for ξ2.
Assume uncorrelated error terms. This assumption is not necessary in CFA; it is made here to simplify the presentation regarding model identification.

Then:

X = | X1 |     Λx = | 1    0   |     Φ = | φ11       |     Θδ = diag[ var(δ1), var(δ2), var(δ3), var(δ4) ]
    | X2 |          | λ21  0   |         | φ21  φ22  |
    | X3 |          | 0    1   |
    | X4 |          | 0    λ42 |

Compute Σ(θ) = Λx Φ Λx' + Θδ (lower triangle shown):

| φ11 + var(δ1)                                                          |
| λ21 φ11          λ21² φ11 + var(δ2)                                    |
| φ21              λ21 φ21             φ22 + var(δ3)                     |
| λ42 φ21          λ21 λ42 φ21         λ42 φ22         λ42² φ22 + var(δ4) |

Using E(XX') (lower triangle shown):

| var(X1)                                          |
| cov(X1 X2)   var(X2)                             |
| cov(X1 X3)   cov(X2 X3)   var(X3)                |
| cov(X1 X4)   cov(X2 X4)   cov(X3 X4)   var(X4)   |

Then:

φ21 = cov(X1 X3)
λ21 = cov(X2 X3) / cov(X1 X3)
λ42 = cov(X1 X4) / cov(X1 X3)
φ11 = [cov(X1 X2) × cov(X1 X3)] / cov(X2 X3)
φ22 = [cov(X3 X4) × cov(X1 X3)] / cov(X1 X4)
var(δ1) = var(X1) − φ11
var(δ2) = var(X2) − λ21² φ11
var(δ3) = var(X3) − φ22
var(δ4) = var(X4) − λ42² φ22

Example

Assume the correlation matrix shown below. Calculate the parameter estimates given the model as identified above.

       X1     X2     X3     X4
X1   1
X2   .305   1
X3   .233   .230   1
X4   .216   .213   .308   1

λ21 = rX2X3 / rX1X3 = .230 / .233 = .987
λ42 = rX1X4 / rX1X3 = .216 / .233 = .927
φ11 = (rX1X2 × rX1X3) / rX2X3 = .309
φ22 = (rX3X4 × rX1X3) / rX1X4 = .332
var(δ1) = 1 − .309 = .691
var(δ2) = 1 − [.987² × .309] = .699
var(δ3) = 1 − .332 = .668
var(δ4) = 1 − [.927² × .332] = .715

Item reliabilities (squared multiple correlations) = λij² φjj / var(Xi):

X1 = (1²)(.309) / 1 = .309
X2 = (.987²)(.309) / 1 = .301
X3 = (1²)(.332) / 1 = .332
X4 = (.927²)(.332) / 1 = .285

Summary: All parameters to be estimated in Σ(θ) must be expressible in terms present in E(XX').
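The hand calculations above can be checked numerically. The minimal numpy sketch below (an illustration, not one of the notes' LISREL examples) recovers the parameter estimates from the example correlation matrix and confirms that the implied matrix Σ(θ) = Λx Φ Λx' + Θδ reproduces the observed correlations.

```python
import numpy as np

# Numerical check of the identification example above.
# R is the correlation matrix given in the notes.
R = np.array([[1.000, 0.305, 0.233, 0.216],
              [0.305, 1.000, 0.230, 0.213],
              [0.233, 0.230, 1.000, 0.308],
              [0.216, 0.213, 0.308, 1.000]])

phi21 = R[0, 2]                      # cov(X1,X3) = .233
lam21 = R[1, 2] / R[0, 2]            # cov(X2,X3)/cov(X1,X3) = .987
lam42 = R[0, 3] / R[0, 2]            # cov(X1,X4)/cov(X1,X3) = .927
phi11 = R[0, 1] * R[0, 2] / R[1, 2]  # .309
phi22 = R[2, 3] * R[0, 2] / R[0, 3]  # .332

Lam = np.array([[1, 0], [lam21, 0], [0, 1], [0, lam42]])
Phi = np.array([[phi11, phi21], [phi21, phi22]])
Theta = np.diag(1 - np.array([phi11, lam21**2 * phi11, phi22, lam42**2 * phi22]))

Sigma = Lam @ Phi @ Lam.T + Theta                    # model-implied matrix
reliab = np.diag(Lam @ Phi @ Lam.T) / np.diag(R)     # item reliabilities

print(np.round([lam21, lam42, phi11, phi22], 3))  # [0.987 0.927 0.309 0.332]
print(np.round(reliab, 3))                        # ~[0.309 0.301 0.332 0.286], matching the
                                                  # hand values up to rounding
print(np.round(Sigma - R, 3))                     # essentially zero: R is reproduced
```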
Rules for Mathematical Identification

Some rules can help one know if a model will be identified.

T-rule

t ≤ ½ (q)(q + 1), where:

t = the number of parameters to be estimated
q = the number of observed variables

The t-rule is necessary, but not sufficient, for mathematical identification.

Example, for the model shown above:

1. ½ (4)(5) = 10, the number of available variances and covariances.
2. The nine parameters to be estimated are: λ21, λ42, φ11, φ22, φ21, θδ11, θδ22, θδ33, and θδ44.
3. Therefore, the model meets the t-rule. In this case, the model is overidentified because t = 9 < 10.

Three Indicator Rule

1. Three or more observed variables per latent variable.
2. Each row of Λx has only one non-zero element (with one loading per factor, e.g., λ11, fixed at 1 to set the scale). That is, each X is an indicator of just one latent variable.
3. Θδ is a diagonal matrix. That is, the errors are uncorrelated.

This rule is sufficient, but not necessary, for mathematical identification.

Two Indicator Rule

1. Two observed variables per latent variable.
2. Each row of Λx has only one non-zero element (with one loading per factor, e.g., λ11, fixed at 1 to set the scale). That is, each X is an indicator of just one latent variable.
3. Θδ is a diagonal matrix. That is, the errors are uncorrelated.
4. More than one latent variable.
5. Φ has no zero elements. That is, the latent variables are correlated with one another.

This rule is sufficient, but not necessary, for mathematical identification.

Degrees of Freedom

The degrees of freedom for a CFA model: d.f. = [q(q + 1) / 2] − t. That is, the number of potential parameters minus the number of estimated parameters.

Model Evaluation

Theoretical proposition: Σ = Σ(θ) = E(XX'), where:

1. Σ refers to "reality."
2. Σ(θ) refers to theory.
3. E(XX') refers to the correlation matrix of observed variables.

Notation:

S = E(XX'), the observed correlation matrix.
Σ(θ̂) = the matrix implied by the estimated parameters.

Alternative hypothesis: the theory fits the data, S = Σ(θ̂).

Null hypothesis: there is no difference between the estimated parameter matrix and the observed correlation matrix, S − Σ(θ̂) = 0.

Note: A relatively small value for a model test statistic, such as chi-square, indicates that the theory fits the data. Such a finding would indicate support for the theory. Thus, in evaluating model fit, we look for a low chi-square value relative to the degrees of freedom, one whose probability is greater than the assigned alpha of .05.

Note: Measures of overall fit are not applicable to exactly identified models because at least one degree of freedom is required for the test.

Note: Although evaluation statistics might indicate an overall good fit for the model, the individual parameter estimates might be theoretically inappropriate or statistically non-significant.

Maximum Likelihood Chi-Square

Ho: S − Σ(θ̂) = 0

χ² = (n − 1)(log|Σ̂| + tr(Σ̂⁻¹S) − log|S| − q), where:

n = sample size
log refers to the natural log
d.f. = [½ (q)(q + 1)] − t, where:
t = the number of parameters to be estimated
q = the number of observed variables

Consider the conceptual foundation of chi-square. For a contingency table, it summarizes the discrepancies between expected and observed scores. In the same manner, this chi-square summarizes the discrepancy between the model-implied matrix and the observed correlation matrix: the log determinant of the implied matrix, plus the trace of the observed matrix weighted by the inverse of the implied matrix, minus the log determinant of the observed matrix, minus the number of observed variables.
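The fit function can be written directly from the formula above. The sketch below reuses the matrices from the identification example; the sample size n = 250 is a hypothetical value assumed only for illustration.

```python
import numpy as np
from scipy.stats import chi2

def ml_chi_square(S, Sigma_hat, n, t):
    """Chi-square = (n - 1) * [log|Sigma_hat| + tr(Sigma_hat^-1 S) - log|S| - q]."""
    q = S.shape[0]
    F = (np.log(np.linalg.det(Sigma_hat))
         + np.trace(np.linalg.solve(Sigma_hat, S))
         - np.log(np.linalg.det(S))
         - q)
    df = q * (q + 1) // 2 - t
    return (n - 1) * F, df, chi2.sf((n - 1) * F, df)

# Observed matrix and implied matrix from the identification example;
# n = 250 is a hypothetical sample size chosen for illustration.
S = np.array([[1.000, 0.305, 0.233, 0.216],
              [0.305, 1.000, 0.230, 0.213],
              [0.233, 0.230, 1.000, 0.308],
              [0.216, 0.213, 0.308, 1.000]])
Lam = np.array([[1, 0], [0.987, 0], [0, 1], [0, 0.927]])
Phi = np.array([[0.309, 0.233], [0.233, 0.332]])
Theta = np.diag([0.691, 0.699, 0.668, 0.715])
Sigma_hat = Lam @ Phi @ Lam.T + Theta

chisq, df, p = ml_chi_square(S, Sigma_hat, n=250, t=9)
print(round(chisq, 3), df, round(p, 3))   # near-zero chi-square at df = 1, p near 1
```

Because the implied matrix reproduces the observed correlations almost exactly, the chi-square is near zero with one degree of freedom and the p-value is near 1, which is the pattern of a well-fitting model described above.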
Coefficient of Determination

The coefficient of determination (R-square) calculates the percent of variance explained in the observed variables (the X matrix) by the latent variables (the ξ matrix). It equals 1 minus the determinant of the errors in estimating X (the Θδ matrix) divided by the determinant of Sigma-hat:

R² = 1 − [ |Θ̂δ| / |Σ̂| ]

Goodness of Fit Indexes

Various goodness-of-fit indexes have been developed to assess model fit. The more commonly used ones are the Goodness of Fit Index (GFI) and the Adjusted Goodness of Fit Index (AGFI). The Residual Mean Square (RMS) and Critical N (CN) are also popular statistics for assessing model fit. Critical N is equal to "what chi-square would be if the sample size were 200." Thus, Critical N adjusts chi-square for very large samples, wherein a large sample size can create a large chi-square statistic even when the "amount of error" is small.

These indexes have the disadvantage of not having ratio scales. Thus, the community of scholars must arrive at some agreed-upon level of the indexes that assures them of adequate model fit. In general, a GFI or AGFI of .9 or above is considered acceptable. The community of scholars looks for an RMS below .05, and for a CN above 200, meaning that "a sample size of more than 200 is needed to arrive at a chi-square that indicates a probability of alpha greater than .05."

See the related article by Schreiber et al. for a detailed description of model evaluation for CFA and structural equation models.

Component Fit Measures

A t-test is used to evaluate the statistical significance of each parameter estimate, wherein t = estimate / standard error of the estimate. A t-ratio of 1.98 or greater (in absolute value) indicates statistical significance at alpha = .05.

Reliability of the Parameter Estimates

Consider the model shown below.

Reliability of Xi

The reliability (i.e., communality) of Xi is the magnitude of the direct relationship that all latent variables have on Xi. Thus, the reliability of Xi = Σj λij² φjj / var(Xi).

Reliability and Model Specification

The reliability of Xi can vary depending upon model specification:

In the parallel model: λ11 = λ21 and var(δ1) = var(δ2).
In the tau-equivalent model: λ11 = λ21 and var(δ1) ≠ var(δ2).
In the congeneric model: λ11 ≠ λ21 and var(δ1) ≠ var(δ2).

The Reliability of ξ

ρξ (the reliability of ξ) = (Σi=1..q λxi)² / [(Σi=1..q λxi)² + Σi=1..q θδi]

Average Variance Explained by ξ

ρvc (the average variance explained by ξ) = Σi=1..q λxi² / [Σi=1..q λxi² + Σi=1..q θδi]

Standardized Parameter Estimates

Reporting standardized parameter estimates enables the community of scholars to compare different studies of the same model. The formulas for calculating standardized estimates are:

λijˢ = λij [φjj / var(Xi)]^½
φijˢ = φij / (φii^½ φjj^½)
θδijˢ = θδij / [var(Xi)^½ var(Xj)^½]

where i refers to an observed variable and j refers to a latent variable. In matrix format:

Λxˢ = Dx⁻¹ Λx Dξ
Φˢ = Dξ⁻¹ Φ Dξ⁻¹
Θδˢ = Dx⁻¹ Θδ Dx⁻¹

where:

Dx = (diag [Λx Φ Λx' + Θδ])^½
Dξ = (diag Φ)^½
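The reliability and standardization formulas in this section can be collected into a short helper. In the sketch below, the loadings and error variances are hypothetical values for a single factor with φ = 1, used only for illustration.

```python
import numpy as np

def composite_reliability(lam, theta):
    """(sum lambda)^2 / [(sum lambda)^2 + sum theta]: the reliability of xi."""
    s = lam.sum()
    return s**2 / (s**2 + theta.sum())

def average_variance_extracted(lam, theta):
    """sum lambda^2 / (sum lambda^2 + sum theta): average variance explained by xi."""
    return (lam**2).sum() / ((lam**2).sum() + theta.sum())

# Hypothetical loadings and error variances for one factor, with phi = var(xi) = 1.
lam = np.array([1.000, 0.987, 0.927])
theta = np.array([0.69, 0.70, 0.72])
var_x = lam**2 * 1.0 + theta             # implied var(X_i) when phi = 1
lam_std = lam * np.sqrt(1.0 / var_x)     # lambda_ij * [phi_jj / var(X_i)]^(1/2)

print(round(composite_reliability(lam, theta), 3))
print(round(average_variance_extracted(lam, theta), 3))
print(np.round(lam_std, 3))
```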
Unique Validity Variance

In cases where a measurement model specifies correlated factors or error terms, one might want to know the unique commonality for an observed variable. Uxij (the unique validity variance, or unique commonality, of the effect of ξj on Xi) = Rxi² − Rxi(j)², where:

Rxi² is the squared multiple correlation coefficient for Xi: the proportion of variance in Xi explained by all latent variables in the model that have a direct effect on Xi.

Rxi² = ρxiξ Φ*⁻¹ ρxiξ', where:

1. ρxiξ is the vector of correlations of the ξ with Xi, for all ξ that affect Xi (a 1 x d vector, where d is the number of ξ with direct effects on Xi).
2. Φ* is the correlation matrix of all ξ with direct effects on Xi.

Rxi(j)² is the squared multiple correlation coefficient for Xi, omitting ξj, the latent variable of interest:

Rxi(j)² = [ρxiξ(j) Φ(j)*⁻¹ ρxiξ(j)'] / var(Xi), where:

1. ρxiξ(j) is the vector of correlations of the ξ with Xi, for all ξ that affect Xi except ξj, the latent variable of interest.
2. Φ(j)* is the correlation matrix of all ξ with direct effects on Xi, except ξj, the latent variable of interest.

Note: The unique validity variance might be relatively low in comparison with Rxi² because Xi might depend upon highly correlated latent variables.

Degree of Collinearity

A measurement model with more than one latent variable, wherein the latent variables are correlated with one another, should be evaluated for its degree of collinearity.

Rξ(j)² = [ρξ(j)ξ(j) Φ(j)*⁻¹ ρξ(j)ξ(j)'] / φjj

or, the R-square for the ξ affecting Xi other than the ξ of interest.

Note: For just two ξ affecting Xi: Rξ(j)² = φ12² / (φ11 φ22), i.e., the squared correlation of ξ1 and ξ2.

Factor Score Estimation

Having found some underlying dimension(s) in the data, the researcher might want to construct a factor scale. A factor scale is a latent variable derived from two or more observed variables that have been demonstrated to have content and construct validity, and which are sufficiently reliable to be used for further analysis. Factor scales can be used in two ways: 1) to examine observations in terms of their scores on the latent variables, and 2) to use the latent variables in subsequent analysis as independent and/or dependent variables.

Measurements on factor scales can be constructed in several ways. First, they can be calculated by simply adding, or obtaining the mean of, the two or more observed variables comprising the scale. If the observed variables differ in their item reliabilities, however, the researcher might want to construct the factor scale based upon weighted observed variables. Observed variables typically are weighted by their parameter estimates on the factor. Listed below are three procedures that use different assumptions to create more refined factor scores (a code sketch of all three follows the list).

Least Squares Procedure

Factor score = (Λx' S⁻¹) x, where S = the observed correlation matrix.

Bollen's Procedure

Bollen suggests accounting for the correlations among the latent variables:

Factor score = (Φ Λx' S⁻¹) x, where S = the observed correlation matrix.

Bartlett's Procedure

Bartlett suggests giving more weight to observed variables with greater item reliability:

Factor score = [(Λx' Θδ⁻¹ Λx)⁻¹ Λx' Θδ⁻¹] x, where Θδ = the error covariance matrix.
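The three procedures amount to three different weight matrices applied to the observed scores. The sketch below writes them out in numpy; the regression-method form (Φ Λx' S⁻¹) and the Bartlett form ((Λx' Θδ⁻¹ Λx)⁻¹ Λx' Θδ⁻¹) are standard textbook versions assumed here to correspond to Bollen's and Bartlett's suggestions as described above, so treat this as an illustration under those assumptions rather than the notes' exact formulas.

```python
import numpy as np

# Factor score weights applied to observed scores X (N cases by q variables).
# Lam, Phi, Theta, and S would come from a previously fitted model; the
# regression-method and Bartlett forms are assumed standard versions.

def least_squares_scores(Lam, S, X):
    """Weights Lam' S^-1 applied to each case."""
    return X @ np.linalg.solve(S, Lam)

def regression_scores(Lam, Phi, S, X):
    """Weights Phi Lam' S^-1: accounts for covariances among the latent variables."""
    return X @ np.linalg.solve(S, Lam) @ Phi

def bartlett_scores(Lam, Theta, X):
    """Weights (Lam' Theta^-1 Lam)^-1 Lam' Theta^-1: more reliable items weigh more."""
    Ti = np.linalg.inv(Theta)
    W = np.linalg.solve(Lam.T @ Ti @ Lam, Lam.T @ Ti)
    return X @ W.T
```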
Hypothesis Testing and Model Comparison

One advantage of theory testing and the subsequent use of CFA is that nested models can be used to test hypotheses. One can conduct a difference-in-chi-square test, for example, to evaluate the extent to which changes in model specification affect model fit.

Research and Null Hypotheses for All Models

Ha: The model fits the data. If the model fits the data, then chi-square will be low and the probability of a type-I error will be over .05 (assuming an assigned type-I error rate of 5%).

Ho: There is no relationship between the model and the data. If there is no relationship between the model and the data, then chi-square will be high and the probability of a type-I error will be less than .05 (assuming an assigned type-I error rate of 5%).

The approach to testing differences in estimates across two samples, or testing for the moderating effect of an external variable, is to estimate a baseline model that assumes no difference in estimates across the two samples. Then, estimate less restricted models, ones that allow for differences in parameter estimates across levels of the external variable. The chi-square value for each less restricted model will be less than the chi-square value for the baseline model, and the degrees of freedom for the less restricted model will be less than those of the baseline model. To determine if a less restricted model fits the data better than the baseline model, one can calculate a chi-square difference test:

χ²r − χ²u = chi-square (baseline) − chi-square (less restricted)

This difference score is evaluated at the difference in the degrees of freedom for the two models: d.f. (baseline) − d.f. (less restricted).

For example, suppose the chi-square for a baseline model that contains three parameters in the gamma matrix equals 142.691 at 123 d.f. Suppose that a less restricted model is estimated that allows the three parameters in the gamma matrix to be estimated separately for the two groups under consideration, and suppose that the chi-square for this less restricted model equals 110.527 at 120 d.f. Then the difference in chi-square equals 32.164 at 3 d.f. The critical value of chi-square at three degrees of freedom for a type-I error rate of 5% equals 7.815. Therefore, we would conclude that, at a type-I error rate of 5%, the less restricted model fits the data better than does the baseline model, meaning that the parameter estimates differ significantly from one another across the two levels of the external variable. The next step would be to conduct a chi-square difference test for each of the paths in the gamma matrix to determine which of the three paths has significantly different parameter estimates across the two levels of the external variable.

Typically, one would allow a matrix of estimates, such as the lambda, gamma, beta, or error matrices (psi, theta-delta, and theta-epsilon), to become less restricted to examine the possibility of differences in parameters across the levels of the external variable. If the chi-square difference test indicates that the baseline model and less restricted model contain at least some significantly different parameter estimates, then one would test each path within a matrix, one at a time, to locate the ones that differ significantly from one another (they might all be significantly different from one another). If one finds a less restricted model that fits the data significantly better than the baseline model, then this model becomes the new "baseline" model for testing further differences in parameter estimates across levels of the external variable.

The Sociology 512 web site includes notes on hypothesis testing using the SAS and LISREL software packages.
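The worked example above can be reproduced in a few lines; scipy's chi-square distribution supplies the critical value of 7.815.

```python
from scipy.stats import chi2

# Chi-square difference test for the nested-model example above.
chisq_baseline, df_baseline = 142.691, 123
chisq_free, df_free = 110.527, 120       # less restricted model

diff = chisq_baseline - chisq_free       # 32.164
df_diff = df_baseline - df_free          # 3
critical = chi2.ppf(0.95, df_diff)       # 7.815
p_value = chi2.sf(diff, df_diff)

# diff exceeds the critical value (p < .05), so the less restricted model
# fits significantly better than the baseline model.
print(round(diff, 3), df_diff, round(critical, 3), p_value < 0.05)
```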
Second-Order (Higher-Order) Factor Analysis

Some latent variables are themselves considered to be composed of multiple latent variables. The latent variable Locus of Control (LOC), for example, is thought to comprise three subdimensions: internal, chance, and powerful others. The diagram below illustrates a second-order model of LOC, with the variable "perceived risk" used to assess the predictive validity of the measure of LOC.

η = Γξ + ζ, where:

η (eta): dependent (i.e., endogenous) latent variable.
Γ (gamma): parameter estimates for the independent (i.e., exogenous) latent variables.
ξ (xi): independent (i.e., exogenous) latent variables.
ζ (zeta): errors for the equation.

Sensitivity Analysis: Testing Equality of Parameter Estimates Across Two Groups

A central premise of CFA is that the theory fits the data. Thus, if an observed variable is posited to measure just one latent variable, then it should not also have a significant parameter estimate on another latent variable. If an observed variable X1 is posited to measure ξ1, for example, then X1 should not have a significant parameter estimate on ξ2. If it does, then we can question the construct validity of X1 as an indicator of ξ1, as well as the theory that specifies that X1 measures only ξ1.

Sensitivity analysis examines the extent to which a theory has construct validity: the extent to which hypotheses of no relationship are supported by the data. Consider, for example, the Locus of Control CFA model as specified by Sapp and Harrod (see: http://www.soc.iastate.edu/sapp/Soc512MeasurementRefs.html). Sapp and Harrod posit that 1) the latent variable Internal is measured with three observed variables: Own Actions, Protect, and Determine; 2) the latent variable Chance is measured with three observed variables: Accidental Happenings, Bad Luck Happenings, and Lucky; and 3) the latent variable Powerful Others is measured with three observed variables: Pressure Groups, Powerful Others, and Powerful People (see: http://www.soc.iastate.edu/sapp/soc512LOCCFAModel.pdf). Implied by this model is that Own Actions, for example, which is posited to measure the latent variable Internal, is not significantly related to either of the remaining latent variables: Chance or Powerful Others. Sensitivity analysis examines whether the implied hypotheses of no relationship are supported by the data.

Shown below are examples of sensitivity analysis for the Sapp and Harrod LOC model conducted in LISREL. The Sociology 512 web site includes notes on hypothesis testing using the SAS and LISREL software packages.

Means and Intercepts for Latent Variables

In CFA with multiple samples, it is possible to estimate means and intercepts for the latent variables:

X(g) = τx + Λx ξ(g) + δ(g)

where τx is the constant intercept term for each Xi. This value is set to be equal across samples (g).

Common Metric Standardized Solution

Factor loadings are listed in the Lambda-X matrices. Intercepts are listed in the Tau-X matrices. These matrices are the same for all groups.

                 Factor Loadings
Item        Internal   Chance   P. Others   Var(x)   Tau-x Intercept
Actions       .579                           1.144        5.880
Protect       .673                           0.860        5.670
Determine     .520                           2.453        4.556
Acchap                   .538                1.449        5.181
Badhap                   .543                1.838        4.818
Lucky                    .692                2.743        4.813
Pressure                           .512      1.935        4.966
Powoth                             .822      2.231        5.260
Powple                             .721      1.469        5.002

Factor Covariance Matrices

Phi Matrices in Common Metric Standardized Solution for Each Group

Risk Perceivers (latent variables: Internal, Chance, Powerful Others)

Low (n = 67):
Internal    1.121
Chance       .612   .679
P. Others    .586   .251

High (n = 62):
Internal     .869
Chance       .814   .348
P. Others    .627   .696
            1.024   .974

Factor Means (Kappa Matrix)

              Factor Means                 Scaled Factor Means
        Internal   Chance   Power      Internal   Chance   Power
Low       .000      .000     .000        .113      .129     .014
High     -.218     -.248    -.026       -.113     -.129    -.014

Scaled Factor Mean

scaled κij = [(κij − κ̄j) × (Σi ni / g)] / ni

where:

i = 1, 2, 3 ... g groups
j = 1, 2, 3 ... k factors
ni = the number of observations in group i
κ̄j = the mean of the factor means across groups for factor j

Example for κ21 (the High group on the Internal factor):

κ̄1 = (0 + (−.218)) / 2 = −.109
Σi ni / g = (62 + 67) / 2 = 64.5
scaled κ21 = [(−.218 + .109) × 64.5] / 62 = −.113
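The scaled factor means in the table can be reproduced from the kappa estimates and the group sizes. The sketch below follows the worked example, which divides by n = 62.

```python
import numpy as np

# Scaled factor means, following the worked example above:
# scaled kappa_ij = [(kappa_ij - mean over groups of kappa_.j) * (sum n_i / g)] / n_i
kappa = np.array([[ 0.000,  0.000,  0.000],    # Low-risk group  (Internal, Chance, Power)
                  [-0.218, -0.248, -0.026]])   # High-risk group
n_bar = (62 + 67) / 2                          # 64.5
kappa_bar = kappa.mean(axis=0)                 # mean of factor means across groups

scaled_high = (kappa[1] - kappa_bar) * n_bar / 62   # divisor as in the worked example
print(np.round(scaled_high, 3))                     # [-0.113 -0.129 -0.014]
```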
Analysis of Ordinal Variables

Albright and Park (2009) note that:

The maximum likelihood estimation (MLE) approach relies on the strong assumption of multivariate normality. In practice, a substantial amount of social science data is non-normal. Survey responses are often coded as yes/no or as scores on an ordered scale (e.g., strongly disagree, disagree, neutral, agree, strongly agree). In the presence of categorical or ordinal data, MLE may not work properly, calling for alternative estimation methods. Mplus and LISREL employ a multi-step method for ordinal outcome variables that analyzes a matrix of polychoric correlations rather than covariances. This approach works as follows: 1) thresholds are estimated by maximum likelihood, 2) these estimates are used to estimate a polychoric correlation matrix, which in turn is used to 3) estimate parameters through (diagonally) weighted least squares using the inverse of the asymptotic covariance matrix as the weight matrix (Muthén, 1984; Jöreskog, 1990). In LISREL, the diagonally weighted least squares (DWLS) method needs to be specified. Alternatively, the polychoric correlation matrix and asymptotic covariance matrix are estimated and saved into a LISREL system file (.dsf) using PRELIS before fitting the model. Mplus automatically follows the above steps when the syntax includes a line identifying observed variables as categorical.

Instructions

[For those times when you will be using data collected by persons other than those who graduated from ISU, given that ISU graduates never would be so silly as to collect ordinal-level data!]

In cases of non-normality (i.e., assumed for ordinal-level data), it is a misuse of CFA methodology to:

Use arbitrary scale scores for categories, pretending that these scores have interval scale properties.
Compute a covariance matrix or product-moment correlation matrix for such scores.
Analyze covariance/correlation matrices using the method of maximum likelihood.

Such misuse can lead to:

distorted parameter estimates,
incorrect measures of chi-square, and
incorrect estimates of standard errors, and therefore incorrect t-ratios.

When conducting CFA with ordinal-level data, use weighted least squares with an asymptotic covariance matrix. N must be at least 200 if k < 12 and at least 1.5 k(k + 1) if k ≥ 12.

Power Analysis

From MEERA [http://meera.snre.umich.edu/plan-an-evaluation/related-topics/poweranalysis-statistical-significance-effect-size]:

What is power? To understand power, it is helpful to review what inferential statistics test. When you conduct an inferential statistical test, you are often comparing two hypotheses:

The null hypothesis – This hypothesis predicts that your program will not have an effect on your variable of interest. For example, if you are measuring students' level of concern for the environment before and after a field trip, the null hypothesis is that their level of concern will remain the same.

The alternative hypothesis – This hypothesis predicts that you will find a difference between groups. Using the example above, the alternative hypothesis is that students' post-trip level of concern for the environment will differ from their pre-trip level of concern.

Statistical tests look for evidence that you can reject the null hypothesis and conclude that your program had an effect. With any statistical test, however, there is always the possibility that you will find a difference between groups when one does not actually exist. This is called a Type I error. Likewise, it is possible that when a difference does exist, the test will not be able to identify it. This type of mistake is called a Type II error.

Power refers to the probability that your test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that you will reject the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted that power should be .8 or greater; that is, you should have an 80% or greater chance of finding a statistically significant difference when there is one.

Power Estimation: Bollen Procedure (see pages 338-349)

1. Estimate the more specified model and ACOV(a), the covariance matrix of the parameter estimates for this model.
2. Calculate the added parameter estimates for the more specified model (Ha) under the assumption that all standardized estimates equal .1.
3. NCP = (row vector of the added parameter estimates) × (inverse of the diagonal matrix of the variances of the added parameters) × (column vector of the added parameter estimates).
4. Calculate the power of the test.
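Step 4 can be carried out with the noncentral chi-square distribution: the NCP from step 3 shifts the chi-square distribution, and power is the probability of exceeding the usual critical value. In the sketch below the added-parameter values and their variances are hypothetical stand-ins, not estimates from any model in these notes.

```python
import numpy as np
from scipy.stats import chi2, ncx2

# Power for a test of three added parameters, following the four steps above.
alpha, df = 0.05, 3
added_est = np.array([0.1, 0.1, 0.1])    # step 2: assume all added estimates equal .1
acov = np.diag([0.004, 0.004, 0.004])    # assumed diagonal of ACOV(a); hypothetical values

ncp = added_est @ np.linalg.inv(acov) @ added_est   # step 3: noncentrality parameter
critical = chi2.ppf(1 - alpha, df)                  # central chi-square critical value
power = ncx2.sf(critical, df, ncp)                  # step 4: P(reject Ho when Ha is true)
print(round(ncp, 2), round(power, 2))
```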
The Multitrait-Multimethod Matrix

The Multitrait-Multimethod Matrix (hereafter labeled MTMM) is an approach to assessing the construct validity of a set of measures in a study. It was developed in 1959 by Campbell and Fiske (Campbell, D. and Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81-105.), in part as an attempt to provide a practical methodology that researchers could actually use (as opposed to the nomological network idea, which was theoretically useful but did not include a methodology). Along with the MTMM, Campbell and Fiske introduced two new types of validity -- convergent and discriminant -- as subcategories of construct validity. Convergent validity is the degree to which concepts that should be related theoretically are interrelated in reality. Discriminant validity is the degree to which concepts that should not be related theoretically are, in fact, not interrelated in reality. You can assess both convergent and discriminant validity using the MTMM. In order to be able to claim that your measures have construct validity, you have to demonstrate both convergence and discrimination.

The MTMM is simply a matrix or table of correlations arranged to facilitate the interpretation of the assessment of construct validity. The MTMM assumes that you measure each of several concepts (called traits by Campbell and Fiske) by each of several methods (e.g., a paper-and-pencil test, a direct observation, a performance measure). The MTMM is a very restrictive methodology -- ideally you should measure each concept by each method.

To construct an MTMM, you need to arrange the correlation matrix by concepts within methods. The figure shows an MTMM for three concepts (traits A, B, and C), each of which is measured with three different methods (1, 2, and 3). Note that you lay the matrix out in blocks by method. Essentially, the MTMM is just a correlation matrix between your measures, with one exception: instead of 1's along the diagonal (as in the typical correlation matrix), we substitute an estimate of the reliability of each measure as the diagonal.

Before you can interpret an MTMM, you have to understand how to identify the different parts of the matrix. First, you should note that the matrix consists of nothing but correlations. It is a square, symmetric matrix, so we only need to look at half of it (the figure shows the lower triangle). Second, these correlations can be grouped into three kinds of shapes: diagonals, triangles, and blocks. The specific shapes are:

The Reliability Diagonal (monotrait-monomethod)

Estimates of the reliability of each measure in the matrix. You can estimate reliabilities a number of different ways (e.g., test-retest, internal consistency). There are as many correlations in the reliability diagonal as there are measures -- in this example there are nine measures and nine reliabilities.
The first reliability in the example is the correlation of Trait A, Method 1 with Trait A, Method 1 (hereafter, I'll abbreviate this relationship A1-A1). Notice that this is essentially the correlation of the measure with itself. In fact, such a correlation would always be perfect (i.e., r = 1.0). Instead, we substitute an estimate of reliability. You could also consider these values to be monotrait-monomethod correlations.

The Validity Diagonals (monotrait-heteromethod)

Correlations between measures of the same trait measured using different methods. Since the MTMM is organized into method blocks, there is one validity diagonal in each method block. For example, look at the A1-A2 correlation of .57. This is the correlation between two measures of the same trait (A) measured with two different measures (1 and 2). Because the two measures are of the same trait or concept, we would expect them to be strongly correlated. You could also consider these values to be monotrait-heteromethod correlations.

The Heterotrait-Monomethod Triangles

These are the correlations among measures that share the same method of measurement. For instance, A1-B1 = .51 in the upper left heterotrait-monomethod triangle. Note that what these correlations share is method, not trait or concept. If these correlations are high, it is because measuring different things with the same method results in correlated measures. Or, in more straightforward terms, you've got a strong "methods" factor.

The Heterotrait-Heteromethod Triangles

These are correlations that differ in both trait and method. For instance, A1-B2 is .22 in the example. Generally, because these correlations share neither trait nor method, we expect them to be the lowest in the matrix.

The Monomethod Blocks

These consist of all of the correlations that share the same method of measurement. There are as many blocks as there are methods of measurement.

The Heteromethod Blocks

These consist of all correlations that do not share the same methods. There are (K(K-1))/2 such blocks, where K = the number of methods. In the example, there are 3 methods and so there are (3(3-1))/2 = (3(2))/2 = 6/2 = 3 such blocks.

Now that you can identify the different parts of the MTMM, you can begin to understand the rules for interpreting it. You should realize that MTMM interpretation requires the researcher to use judgment. Even though some of the principles may be violated in an MTMM, you may still wind up concluding that you have fairly strong construct validity. In other words, you won't necessarily get perfect adherence to these principles in applied research settings, even when you do have evidence to support construct validity. To me, interpreting an MTMM is a lot like a physician's reading of an x-ray. A practiced eye can often spot things that the neophyte misses! A researcher who is experienced with MTMM can use it to identify weaknesses in measurement as well as for assessing construct validity.

To help make the principles more concrete, let's make the example a bit more realistic. We'll imagine that we are going to conduct a study of sixth-grade students and that we want to measure three traits or concepts: Self Esteem (SE), Self Disclosure (SD), and Locus of Control (LC). Furthermore, let's measure each of these in three different ways: a Paper-and-Pencil (P&P) measure, a Teacher rating, and a Parent rating. The results are arrayed in the MTMM.
As the principles are presented, try to identify the appropriate coefficients in the MTMM and make a judgment yourself about the strength of the construct validity claims. The basic principles or rules for the MTMM are:

1. Coefficients in the reliability diagonal should consistently be the highest in the matrix. That is, a trait should be more highly correlated with itself than with anything else! This is uniformly true in our example.
2. Coefficients in the validity diagonals should be significantly different from zero and high enough to warrant further investigation. This is essentially evidence of convergent validity. All of the correlations in our example meet this criterion.
3. A validity coefficient should be higher than values lying in its column and row in the same heteromethod block. In other words, (SE P&P)-(SE Teacher) should be greater than (SE P&P)-(SD Teacher), (SE P&P)-(LC Teacher), (SE Teacher)-(SD P&P), and (SE Teacher)-(LC P&P). This is true in all cases in our example.
4. A validity coefficient should be higher than all coefficients in the heterotrait-monomethod triangles. This essentially emphasizes that trait factors should be stronger than methods factors. Note that this is not true in all cases in our example. For instance, the (LC P&P)-(LC Teacher) correlation of .46 is less than (SE Teacher)-(SD Teacher), (SE Teacher)-(LC Teacher), and (SD Teacher)-(LC Teacher) -- evidence that there might be a methods factor, especially on the Teacher observation method.
5. The same pattern of trait interrelationship should be seen in all triangles. The example clearly meets this criterion. Notice that in all triangles the SE-SD relationship is approximately twice as large as the relationships that involve LC.

The MTMM idea provided an operational methodology for assessing construct validity. In the one matrix it was possible to examine both convergent and discriminant validity simultaneously. By its inclusion of methods on an equal footing with traits, Campbell and Fiske stressed the importance of looking for the effects of how we measure in addition to what we measure. And, MTMM provided a rigorous framework for assessing construct validity.

Despite these advantages, MTMM has received little use since its introduction in 1959. There are several reasons. First, in its purest form, MTMM requires that you have a fully-crossed measurement design: each of several traits is measured by each of several methods. While Campbell and Fiske explicitly recognized that one could have an incomplete design, they stressed the importance of multiple replication of the same trait across methods. In some applied research contexts, it just isn't possible to measure all traits with all desired methods (would you use an "observation" of weight?). In most applied social research, it just wasn't feasible to make methods an explicit part of the research design. Second, the judgmental nature of the MTMM may have worked against its wider adoption (although it should actually be perceived as a strength). Many researchers wanted a test for construct validity that would result in a single statistical coefficient that could be tested -- the equivalent of a reliability coefficient. It was impossible with MTMM to quantify the degree of construct validity in a study. Finally, the judgmental nature of MTMM meant that different researchers could legitimately arrive at different conclusions.
As mentioned above, one of the most difficult aspects of MTMM from an implementation point of view is that it requires a design that includes all combinations of both traits and methods. But the ideas of convergent and discriminant validity do not require the methods factor. To see this, we have to reconsider what Campbell and Fiske meant by convergent and discriminant validity.

Convergent validity is the principle that measures of theoretically similar constructs should be highly intercorrelated. We can extend this idea further by thinking of a measure that has multiple items, for instance, a four-item scale designed to measure self-esteem. If each of the items actually does reflect the construct of self-esteem, then we would expect the items to be highly intercorrelated, as shown in the figure. These strong intercorrelations are evidence in support of convergent validity.

Discriminant validity is the principle that measures of theoretically different constructs should not correlate highly with each other. We can see this in the example that shows two constructs -- self-esteem and locus of control -- each measured in two instruments. We would expect that, because these are measures of different constructs, the cross-construct correlations would be low, as shown in the figure. These low correlations are evidence for discriminant validity.

Finally, we can put this all together to see how we can address both convergent and discriminant validity simultaneously. Here, we have two constructs -- self-esteem and locus of control -- each measured with three instruments. The red and green correlations are within-construct ones. They are a reflection of convergent validity and should be strong. The blue correlations are cross-construct and reflect discriminant validity. They should be uniformly lower than the convergent coefficients. The important thing to notice about this matrix is that it does not explicitly include a methods factor as a true MTMM would. The matrix examines both convergent and discriminant validity (like the MTMM) but it only explicitly looks at construct intra- and interrelationships.

We can see in this example that the MTMM idea really had two major themes. The first was the idea of looking simultaneously at the pattern of convergence and discrimination. This idea is similar in purpose to the notions implicit in the nomological network -- we are looking at the pattern of interrelationships based upon our theory of the nomological net. The second idea in MTMM was the emphasis on methods as a potential confounding factor.

While methods may confound the results, they won't necessarily do so in any given study. And, while we need to examine our results for the potential for methods factors, it may be that combining this desire to assess the confound with the need to assess construct validity is more than one methodology can feasibly handle. Perhaps if we split the two agendas, we will find that the possibility that we can examine convergent and discriminant validity is greater. But what do we do about methods factors? One way to deal with them is through replication of research projects, rather than trying to incorporate a methods test into a single research study. Thus, if we find a particular outcome in a study using several measures, we might see if that same outcome is obtained when we replicate the study using different measures and methods of measurement for the same constructs.
The methods issue is considered more as an issue of generalizability (across measurement methods) than one of construct validity. When viewed this way, we have moved from the idea of an MTMM to that of the multitrait matrix, which enables us to examine convergent and discriminant validity, and hence construct validity. We will see that when we move away from the explicit consideration of methods, and when we begin to see convergence and discrimination as differences of degree, we essentially have the foundation for the pattern matching approach to assessing construct validity.

Source: William M.K. Trochim, Research Methods Knowledge Base (2006).