Modeling Spearman's hypothesis using MGCFA: The Woodcock-Johnson data

Abstract

There has been a good deal of research on Spearman's hypothesis with regard to Black-White differences in tests of cognitive ability. Most of this research has relied on Jensen's Method of Correlated Vectors (Jensen, 1998). This method, however, is incapable of rigorously testing competing models that do not involve group differences in g (Dolan, 2000; Dolan & Hamaker, 2001). The purpose of the present paper is to test Spearman's hypothesis using Multi-Group Confirmatory Factor Analysis (MGCFA) applied to three waves of Woodcock-Johnson standardization data. First, using Jensen's MCV, a Jensen effect, i.e., a positive correlation between the subtests' g-loadings and the Black-White differences, is found for all three standardization waves. Second, while measurement invariance (MI) was generally found to hold, results from MGCFA using either the higher-order factor or the bi-factor approach were unclear: although the present data may indicate that Spearman's model fits better than the contra hypothesis, the data were less than ideal.

Keywords: Spearman's hypothesis; MGCFA; IQ; Woodcock-Johnson

1. Introduction

Differences in cognitive abilities between U.S. self-identified racial/ethnic (SIRE) groups, e.g., Blacks, Whites, and Hispanics, are beyond dispute (Jensen, 1998; Rushton & Jensen, 2010). Jensen (1998) proposed that the magnitude of the racial differences in IQ, at least between Black and White Americans, as well as of the differences in cognition-related socio-economic outcomes, is a function of the g-loadings (i.e., the correlations between the tests or outcomes and the general factor of intelligence) of the respective cognitive tests and outcomes, a situation which makes the g factor an essential element in the study of socio-economic inequalities.
More specifically, Jensen (1998) proposed that SIRE group differences on cognitive tests are largely due to latent differences in general mental ability. This is known as Spearman's hypothesis (SH), which exists in two forms: the strong form and the weak form, the latter of which was endorsed by Jensen (1998). The strong form asserts that the differences are solely due to differences in g, while the weak form asserts that they are mainly due to differences in g. The alternative, the contra hypothesis, states that group differences reside entirely or mainly in the tests' group (broad) factors and/or test specificity, and that g differences contribute little or nothing to the overall ones. Regarding tests of SH, many studies have employed Jensen's (1985, 1998) Method of Correlated Vectors (MCV) to investigate the nature of group differences (e.g., te Nijenhuis et al., 2007; Rushton & Jensen, 2010, pp. 15-16; Dragt, 2010). This method consists of correlating the subtests' g-loadings with the variable of interest, after correcting for the subtests' reliability. The results from these studies are consistent with Spearman's hypothesis in that they are what one would expect were SH correct (Rushton & Jensen, 2010). As Dolan (2000) and Dolan & Hamaker (2001) noted, however, the MCV fails to explicitly test the contra hypothesis. While it can demonstrate that the pattern of group differences on subtests is consistent with SH, it cannot rule out non-SH models and thus establish SH as a fact. Indeed, with the MCV, SH is either confirmed or rejected based on the strength of the Jensen effect, i.e., the correlation between the magnitude of the subtest differences and the g-loadings. Yet a correlation is a mere descriptive statistic. As Carroll (1997, pp. 131-132) suggested, any hypothesis can only be tested by evaluating and comparing the ability of competing models (e.g., SH versus non-SH models) to reproduce the data, a comparison which the MCV does not allow for.
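As a concrete illustration of the MCV procedure just described, the sketch below correlates a vector of g-loadings with a vector of group differences, disattenuating both vectors for subtest unreliability. The analyses in this paper were run in R; this is a minimal Python sketch, and the numbers are hypothetical, not the Woodcock-Johnson values.

```python
import numpy as np

def mcv_correlation(g_loadings, group_gaps, reliabilities=None):
    """Jensen's Method of Correlated Vectors: Pearson correlation between
    subtest g-loadings and group differences, optionally correcting both
    vectors for unreliability by dividing by sqrt(r_xx)."""
    g = np.asarray(g_loadings, dtype=float)
    d = np.asarray(group_gaps, dtype=float)
    if reliabilities is not None:
        root_rxx = np.sqrt(np.asarray(reliabilities, dtype=float))
        g, d = g / root_rxx, d / root_rxx
    return float(np.corrcoef(g, d)[0, 1])

# Hypothetical illustrative values (not taken from the WJ data):
g = [0.65, 0.47, 0.57, 0.53, 0.81]      # subtest g-loadings
d = [1.10, 0.70, 0.59, 0.67, 1.08]      # group gaps in SD units
rel = [0.82, 0.86, 0.80, 0.95, 0.90]    # subtest reliabilities
r_uncorrected = mcv_correlation(g, d)
r_corrected = mcv_correlation(g, d, rel)
```

A positive correlation here is what is referred to below as a "Jensen effect"; note that the result is still only a single descriptive statistic, which is precisely the limitation discussed above.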
To satisfy this requirement, Dolan (2000) proposed the use of Multi-Group Confirmatory Factor Analysis (MGCFA). Applying MGCFA to Jensen & Reynolds' (1982) data, Dolan (2000) demonstrated that some non-SH models fitted the data just as well as SH models, despite the MCV finding a strong Jensen effect. Based on his results, Dolan (2000, Table 4) concluded that it was impossible to tell which hypothesis is to be preferred. While Dolan (2000), Dolan & Hamaker (2001), and Dolan et al. (2004) evaluated SH by way of higher-order factor (HOF) models, Frisby & Beaujean (2015) argued that a bi-factor (BF) model would have been more appropriate. The latter authors found support for SH, in the case of Black and White Americans, using a BF model, although they did not compare the results of the BF approach with ones generated using the HOF approach. In this paper, Spearman's hypothesis is tested by means of MGCFA, using both the higher-order factor and the bi-factor approaches. A major difference between HOF and BF models is that HOF models allow for factors with only two indicator variables and no cross-loadings, which the Woodcock-Johnson data contain, while BF models do not accept such a specification. The difference between these methods is discussed further below.

2. Method

2.1 Data

The Woodcock-Johnson cognitive test battery was described as an operational representation of Horn's Gf-Gc theory (Horn, 1991), measuring seven broad cognitive abilities: comprehension-knowledge (Gc), long-term retrieval (Glr), visual processing (Gv), auditory processing (Ga), fluid reasoning (Gf), processing speed (Gs), and short-term memory (Gsm). The Woodcock-Johnson test was used in Murray's (2007) paper on the Black-White cognitive difference over time. The WJ standardization consists of three waves. The initial version of the test, WJ1 (WJ-I), was standardized using a sample of 4732 subjects aged 2 to 84, tested over the period from April 1976 to May 1977.
WJ2 (or WJ-R) was standardized with a sample of 6359 subjects aged 2 to 95, tested from September 1986 to August 1988. WJ3 (WJ-III) was standardized with a sample of 8818 subjects aged 2 to 98, tested from September 1996 to August 1999. Participants comprise four groups: Whites, Blacks, Hispanics, and Asians. For this paper, the between-group comparison is analyzed in a within-wave fashion and only the first two groups are considered. The entire analysis is done with R. The syntax is supplied in Supplementary file 1. The subtests available in this combined data set are: Picture Vocabulary (PF), Spatial Relations (SR), Memory for Sentences (MS), Visual Auditory Learning (VAL), Sound Blending (SB), Verbal Comprehension (VCm), Visual Matching (VM), Antonyms-Synonyms (ASn), Analysis-Synthesis (ASt), Numbers Reversed (NR), Concept Formation (CF), Analogies (A), Picture Recognition (PR), Memory for Names (MN), Incomplete Words (IW), Memory for Words (MW), Visual Closure (VCl), Cross-out (C), General Information (GI), Retrieval Fluency (RF), Auditory Attention (AA), and Decision Speed (DS). The WJ-I data involve 3764 subjects (3328 Whites with a mean age of 13.15 and 436 Blacks with a mean age of 14.23) with 11 subtests: PF, SR, MS, VAL, SB, VM, ASn, ASt, NR, CF, and A (Sound Blending has been removed from the analyses pertaining to Woodcock-Johnson I in the current paper, as it violates measurement invariance; see below for further details). The WJ-II data involve 4379 subjects (3573 Whites and 806 Blacks) with 14 subtests: PF, MS, VAL, SB, VM, ASn, ASt, NR, CF, PR, MN, IW, VCl, and C. The WJ-III data involve 3018 subjects (2592 Whites with a mean age of 20.86 and 426 Blacks with a mean age of 15.70) with 14 subtests: SR, VAL, SB, VCm, VM, ASt, NR, CF, PR, MW, GI, RF, AA, and DS.

2.2. Statistical Analyses

Before testing SH models against non-SH models, one needs to ensure that measurement invariance (or equivalence) holds with respect to group differences.
Non-invariance would mean that the structure of the subtest means has different meanings across groups, which obscures the interpretation of group differences. In principle, MGCFA proceeds by adding constraints to the initial, free model: the model parameters are progressively constrained to be the same across groups. The following steps are taken: first, constrain the factor structure (configural invariance); second, add a constraint on the factor loadings (weak invariance); third, add a constraint on the intercepts (strong invariance); fourth, add a constraint on the residual variances (strict invariance). Since the last equality constraint is sometimes dropped due to confounding between measurement error and specific variance (Dolan & Hamaker, 2001, p. 15), strict invariance does not seem to be a necessary condition for establishing measurement invariance. If the model fit shows a meaningful decrement at any of these steps, invariance is rejected. With regard to testing SH models using the higher-order factor (HOF) approach, one first needs to build a baseline model in which all loadings as well as the residuals of the first-order latent factors are constrained to be equal across groups, while the intercepts and the residuals of the second-order latent factor are not so constrained. The strong and weak SH models are nested under this baseline model. In the strong SH model, the added constraints are on the intercepts of the subtests (i.e., the subtest means) and of the first-order latent factors (i.e., the latent means); since the means of all of the first-order factors are equal, the differences in subtest means are entirely explained by the second-order factor.
In the weak SH model, the intercepts of the subtests are still equal across groups, but the means and variances of some of the first-order factors (as well as the mean and variance of the second-order factor) are not; in this case, the differences in subtest means are due both to the difference in the second-order factor and to differences in some of the first-order factors. Finally, the contra-SH model, referred to as the common (correlated) factor model in Dolan (2000), requires the subtests' intercepts and the latent factors' covariances to be constrained to be equal across groups, while the means and variances of some of the latent factors are left unconstrained. In this situation, the groups differ only with respect to some of these correlated first-order latent factors. Following Frisby & Beaujean's (2015) recommendation, a bi-factor (BF) approach is also employed to test the relevant hypotheses. BF models may require additional restrictions. Unlike HOF models, a BF model cannot reliably estimate latent factors having only two indicators unless the loadings of these two indicators are constrained to be equal to each other (Kenny, March 18, 2012; Beaujean, 2014, p. 150); otherwise, identification problems occur. Because the purpose is to identify latent constructs underlying the measured variables, Fabrigar et al. (1999, pp. 275-276) recommend using Exploratory Factor Analysis (EFA) over Principal Component Analysis (PCA). EFA is used here to display the pattern of the factor loadings, and it can also be used to determine how many factors should be retained. For instance, a solution showing a residual variance that is near zero or negative (Bollen, 1989, p. 282; Jöreskog, 1999), or a pattern of loadings having no clear theoretical interpretation, should be rejected. As Costello & Osborne (2005, p. 3) noted, a pattern showing the cleanest factor structure, i.e., no low loadings, no or few cross-loadings, and no factors with fewer than three indicators, has the best fit to the data. Common methods for deciding how many factors to extract include Kaiser's eigenvalue-greater-than-one rule, the scree plot, parallel analysis, and the minimum average partial; Ledesma & Valero-Mora (2007) recommend parallel analysis. In this paper, EFA is performed using promax as the oblique rotation, and the best model is chosen based on the EFA results. If the Woodcock-Johnson's theory-based 7-factor model does not fit the data well, it is not retained. Model selection among CFA models in the HOF approach is done by examination of the baseline model to which the intercepts are added (in the BF approach, the baseline already incorporates the intercepts). To determine the best fitting model, the intercepts of the factors displaying weak score differences between groups are constrained, and the best model is selected based on the remaining factors. To assess invariance and to choose among latent models, the following model fit indices are used: chi-square, Comparative Fit Index (CFI), Root Mean Square Error of Approximation (RMSEA), McDonald's noncentrality index (Mc), Akaike's information criterion (AIC), and Bayesian information criterion (BIC). Based on their simulation study, Hu & Bentler (1999) recommend the following cutoff values for judging adequate model fit: CFI=.95, Mc=.90, and RMSEA=.06. CFI estimates the discrepancy between the proposed model and the null model: larger values indicate better fit. RMSEA estimates the discrepancy related to the approximation, i.e., the amount of unexplained variance (residual), or the lack of fit compared to the saturated model: smaller values indicate better fit. AIC and BIC are both comparative measures of fit used in the comparison of two or more models; they evaluate the difference between observed and expected covariances, and smaller values indicate better fit.
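For reference, the chi-square-based indices just described can be computed directly from a model's chi-square, its degrees of freedom, and the sample size. The formulas below are the standard ones (note that some software uses N rather than N-1 in the RMSEA denominator); the fit values in the example are hypothetical.

```python
import math

def rmsea(chisq, df, n):
    # Root Mean Square Error of Approximation: estimated misfit per
    # degree of freedom, scaled by sample size; smaller is better.
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def cfi(chisq, df, chisq_base, df_base):
    # Comparative Fit Index: noncentrality of the fitted model relative
    # to the baseline (null) model; larger is better, maximum 1.0.
    d_model = max(chisq - df, 0.0)
    d_base = max(chisq_base - df_base, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base

def mcdonald_mc(chisq, df, n):
    # McDonald's noncentrality index; larger is better, maximum 1.0.
    return math.exp(-0.5 * (chisq - df) / (n - 1))

# Hypothetical fit results: chi-square 100 on 50 df, N = 1001,
# baseline model chi-square 1000 on 60 df.
print(rmsea(100, 50, 1001))        # about .032, under the .06 cutoff
print(cfi(100, 50, 1000, 60))      # about .947
print(mcdonald_mc(100, 50, 1001))  # about .975
```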
ECVI, which is similar to AIC, measures the difference between the fitted covariance matrix in the analyzed sample and the expected covariance matrix in another sample of the same size (Byrne, 2013, p. 82), and is used for comparing models, hence the absence of threshold cutoff values for an acceptable model: smaller values indicate better fit. Meade et al. (2008) compared CFI, Mc, RMSEA, and SRMR in the context of measurement invariance testing and concluded that CFI and Mc are the most appropriate indices to detect configural invariance with accuracy, while SRMR is the most sensitive to sample size. Finally, regarding the recommended criteria for determining non-invariance in MGCFA studies, Cheung & Rensvold (2002) argued that a ΔCFI of -.010 or less and/or a ΔMc of -.020 or less indicates that measurement invariance is violated. Chen's (2007) simulations lead to the conclusion that, for testing factor loading, intercept, or residual invariance, a CFI change of -.005 or less supplemented by an RMSEA change of .010 or more in the case of small and unequal sample sizes (or a CFI change of -.010 or less supplemented by an RMSEA change of .015 or more in the case of large and equal sample sizes) would indicate non-invariance. All analyses were conducted using the R statistical software, with the psych (Revelle, 2012) and lavaan (Rosseel, 2012) packages. Full results are displayed in Supplementary file 2.

3. Results

3.1. Method of Correlated Vectors

The method requires one to estimate the factor loadings of the WJ subtests for each group, to determine the standardized mean group difference and the g-loading (i.e., the loading of the subtest on the first factor) for each subtest, to correct both of them for subtest reliability, and finally to correlate the g-loadings with the group differences. In the first step, the original variables are corrected for age and gender effects. In the second step, factor analysis (without rotation) is performed.
Factor analysis is used to determine the g-loadings of the subtests. Principal axis factoring was the factoring method of choice (in WJ1, the maximum likelihood method produced an impossible solution in the Black sample). When selecting the number of factors to be extracted for an unrotated solution, Kaiser's eigenvalue-greater-than-one rule and a scree plot are used. Correlations between g-loadings and Black-White differences are computed using the White loadings, the Black loadings, and the average loadings. According to Nyborg & Jensen (2001), "It would be incorrect to use the loadings in the combined samples, because these would also reflect the between-groups variance in addition to the within-groups variance"; these authors suggested computing the average loading as SQRT((white_loading^2+black_loading^2)/2), and that is the method employed in the current paper. Tables 1, 2, and 3 display the g-loadings for the White group, the Black group, and the average of the two. Mean subtest differences after correction for age and sex effects (but not for unreliability) are provided in the same tables, along with the subtests' reliabilities, which were taken from Woodcock (1978, 1990) and Shrank et al. (2001).

For WJ1, the correlation between the subtests' g-loadings and the group differences is very high using the White group's loadings (r=.778), the Black group's loadings (r=.864), and the average loadings (r=.836) before correction for reliability. After correction, the correlations are r=.625, r=.807, and r=.732, respectively (when Sound Blending is included, the respective uncorrected correlations are r=.308, r=.588, and r=.468, and the corrected correlations are r=.133, r=.473, and r=.310). For WJ2, the correlation is medium using the White group's loadings (r=.520), the Black group's loadings (r=.564), and the average loadings (r=.558) before correction for reliability. After correction, the correlations are r=.411, r=.333, and r=.402, respectively. For WJ3, the correlation is very high using the White group's loadings (r=.766), the Black group's loadings (r=.743), and the average loadings (r=.761) before correction for reliability. After correction, the correlations are r=.759, r=.735, and r=.759, respectively. Overall, these results are consistent with those of earlier studies using Jensen's MCV.

3.2.0. Testing Assumptions

Prior to performing latent variable analyses such as CFA/MGCFA or SEM, one needs to ensure that the data do not violate the normality assumption (Kline, 2011, p. 74). Several indices are used: univariate skewness (which relates to the asymmetry of the distribution around the mean), univariate kurtosis (which relates to the peakedness of the distribution), and Mardia's multivariate skew and kurtosis. Skewness below 2 and kurtosis below 7 are commonly taken as acceptable for treating a distribution as approximately normal. The entire sample, the Black sample, and the White sample of all three waves show skewness and kurtosis values substantially below these criteria. At the same time, however, Mardia's tests of multivariate skew and kurtosis were substantially larger than the corresponding cutoff values (multivariate skew: b1p=2.02 for the entire WJ1 sample, b1p=5.13 for the entire WJ2 sample, and b1p=3.08 for the entire WJ3 sample; multivariate kurtosis: b2p=137.64, b2p=259.38, and b2p=246.13, respectively). Additionally, the Q-Q plots of the multivariate distribution showed some departure from multivariate normality at both the lowest and highest values in the WJ1 and WJ2 samples. This discrepancy, however, does not seem to be large.

3.2.1. MGCFA: Woodcock-Johnson Wave 1

EFAs suggest that a 4-factor model, which also appears to have the cleanest factor structure, clearly makes the most interpretive sense. The 3-factor model has many variables with large loadings in the Black group but loadings close to zero in the White group, and vice versa. In the 5-factor model, there are issues with the presence of cross-loadings. Parallel analysis also indicates that four factors should be retained. Thus, a 4-factor model is chosen for subsequent analyses (Table 1). There are not enough variables to consider a 7-factor model, as per the WJ's theory-based model. Using all 11 subtests, measurement invariance did not hold (at the intercept level). Modification indices reveal that Sound Blending is the cause of the misfit. The analysis was therefore re-run after removing the Sound Blending subtest. EFA still found that a 4-factor model fits the data best, while the 3- and 5-factor models did not make much sense. Table 6, which displays the MGCFA outcomes without the Sound Blending subtest, reveals that measurement invariance now holds at all steps: there is no decrement in fit. With regard to testing SH using the HOF approach to MGCFA, one first needs to decide which latent factor means have to be constrained, by examining the baseline model (in which all the latent means as well as the second-order factor variances are free across groups but the intercepts are equal across groups). It is observed that the Gf, Gsm, and Gs factor mean differences are close to zero; thus, they are constrained to be equal across groups. Only g and Gc display sizeable group differences, so the g+Gc model is chosen as the best fitting weak SH model. Looking at the model fit, shown in Table 6, the baseline model fits no better than the weak SH model, which means that the weak SH model is a good approximation to the data. On the other hand, the strong SH, weak SH, and contra-SH models fit the data equally well (refer to the notes under Table 6).
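Model comparisons like the ones above rest on nested-model (chi-square difference) tests: the more constrained model is retained unless the increase in chi-square is significant for the degrees of freedom spent. A minimal sketch, with hypothetical fit values rather than the values from Table 6:

```python
from scipy.stats import chi2

def chisq_diff_test(chisq_restricted, df_restricted, chisq_free, df_free):
    """Likelihood-ratio test for nested SEM models: returns the chi-square
    difference, its degrees of freedom, and the p-value."""
    d_chisq = chisq_restricted - chisq_free
    d_df = df_restricted - df_free
    return d_chisq, d_df, float(chi2.sf(d_chisq, d_df))

# Hypothetical example: constraining two latent means raises the
# chi-square from 100.0 (50 df) to 110.0 (52 df).
d_chisq, d_df, p = chisq_diff_test(110.0, 52, 100.0, 50)
# p < .05 would favor the less constrained model; otherwise the more
# parsimonious (constrained) model is preferred.
```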
Next, a BF model is run. The baseline model (in the BF baseline, the intercepts are modeled) shows that all factor means exhibit very minor differences; only the Gc factor mean difference is statistically significant. For this reason, the best fitting model needed to be identified through exploratory analysis (the best fitting model can be chosen based on differences in chi-square values or on the chi-square difference test: a model with a lower chi-square is preferred over a less complex model only if the difference is significant). It was found that the g+Gc model was the best model among the weak SH models and that the Gc model was the best among the contra-SH models. With this determined, the model fits could be compared. As displayed in Table 6, the baseline, strong SH, weak SH, and contra-SH models fit equally well. Overall, based on both the HOF and BF results for WJ-I, while neither strong nor weak SH can be rejected, neither can be confirmed.

3.2.2. MGCFA: Woodcock-Johnson Wave 2

EFAs suggest that the 7-factor model fits the data best. Other models do not display a clean structure; for instance, Visual Closure has multiple and very small loadings on several factors except in the 7-factor model. Parallel analysis indicates that seven factors should be retained for the Black sample and six for the White sample. Thus, a 7-factor model, shown in Table 7, is chosen for subsequent analyses (to avoid cross-loadings, some of the variables that load on several factors are forced to load onto a single factor; these data are not ideal for the present analysis, as discussed in the Discussion section). As can be seen in Table 8, when going from configural to scalar invariance, there is no decrement in the fit indices. Only with respect to strict invariance, given the criteria of Cheung & Rensvold (2002), is measurement invariance violated. But as mentioned before, strict invariance is not a necessary assumption. With regard to SH using the HOF approach to MGCFA, the baseline model is examined in order to decide which latent factor means have to be constrained. It is observed that the Gf, Gsm, and Gs factor means are close to zero; thus, a weak SH model is run that includes Gc, Glr, Ga, Gv, and g. There is no difference between the baseline and the weak SH model in terms of fit. The strong SH model fits worse than the weak SH model, although the difference is very small. On the other hand, the contra-SH model fits better than the weak SH model. Next, a BF model is run, and the baseline model (in which the intercepts are modeled) shows that all factor means display mild differences, with the Gc, Gs, and Ga factor means displaying only very small ones. Thus, the weak SH model that seems to fit the data best includes Gf, Gsm, Glr, Gv, and g. As seen in Table 8, this weak SH model fits the data well, and fits slightly better than the strong SH model; unlike what is observed with the HOF approach, the contra-SH model fits worse than the weak SH model. If the BF approach is more reliable for testing Spearman's hypothesis, more weight should be given to the results obtained from it, in which case the WJ-II results could be seen as supporting weak SH slightly more than contra-SH.

3.2.3. MGCFA: Woodcock-Johnson Wave 3

Parallel analysis indicates that four factors should be retained for the Black sample and five for the White sample. EFAs suggest that the 4-factor model is by far the cleanest factor structure. In the 3-, 5-, 6-, and 7-factor models, the patterns are so different between the groups that they are simply not interpretable. As the WJ's theory-based 7-factor model is not interpretable, it is not retained. Thus, a 4-factor model, shown in Table 9, is chosen for subsequent analyses. As seen in Table 10, the drop in fit going from configural to strict invariance is minimal, and so it can be concluded that measurement invariance holds.
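The ΔCFI/ΔRMSEA criteria used throughout these invariance checks (Cheung & Rensvold, 2002; Chen, 2007) can be written as a small decision helper. This is a sketch, not part of the original analysis; the cutoffs are arguments because, as noted in the Method section, the appropriate values depend on sample size and design.

```python
def invariance_violated(cfi_free, cfi_constrained,
                        rmsea_free, rmsea_constrained,
                        cfi_drop_cut=0.010, rmsea_rise_cut=0.015):
    """Flags measurement non-invariance when adding cross-group equality
    constraints worsens fit: a CFI drop at or beyond cfi_drop_cut,
    supplemented by an RMSEA rise at or beyond rmsea_rise_cut
    (Chen's 2007 large/equal-sample criteria by default)."""
    cfi_drop = cfi_free - cfi_constrained
    rmsea_rise = rmsea_constrained - rmsea_free
    return cfi_drop >= cfi_drop_cut and rmsea_rise >= rmsea_rise_cut

# Hypothetical fit values for a free vs. an intercept-constrained model:
violated = invariance_violated(0.960, 0.945, 0.050, 0.070)   # True
ok = invariance_violated(0.960, 0.958, 0.050, 0.052)         # False
```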
As far as SH is concerned, the HOF approach again shows that there is no difference between the baseline and the strong SH model. In order to select which weak SH model to run, the intercepts are added to the baseline model; the resulting model shows that the Gs factor's mean difference is relatively weak. A weak SH model based on g+Gc+Gf+Gsm is therefore fitted. This model fits well compared to the baseline. Although this weak SH model fits better than the strong SH model, the difference is very small. The corresponding contra-SH model (Gc+Gf+Gsm) fits slightly better than the weak SH model, but the difference is so small that the models can be considered equivalent. As for the BF method, the baseline model was poorly identified, with some observed variables having negative residual variances (in particular, one variable, Memory for Words, had an anomalously large negative variance). As such, no model comparisons for the BF method are reported here.

4. Discussion

As measurement non-invariance complicates the interpretation of group differences, testing for measurement invariance is important. As the MCV is incapable of dealing with this issue, MGCFA was conducted, first to test the assumption of measurement invariance and second to test Spearman's hypothesis. Although measurement invariance generally holds (in the WJ-I data, only after the removal of one biased subtest), nothing definitive can be inferred about the veracity of Spearman's hypothesis. In WJ-I and WJ-III, the strong SH, weak SH, and contra-SH models fit equally well. In the WJ-II data, the BF method shows that the weak SH model fits better than the contra-SH model, while with the HOF method the opposite is true. As Murray & Johnson (2013, p. 420) and Frisby & Beaujean (2015, p. 94) explained, the BF approach is best suited for testing theories related to the g factor.
This is because, with the bi-factor approach, the variances of the first-order latent factors are not confounded (i.e., correlated) with the variance of the second-order factor, which allows g to be estimated purely. Hence, in general, results from the BF approach should be given more weight than those from the HOF approach. On the other hand, the WJ-II data set in particular was not ideal for our purposes, as each factor has only two indicators and the pattern loadings of the 7-factor model from the EFAs showed some departure from the WJ's theory-based 7-factor model. In this paper, Frisby & Beaujean's (2015) method for identifying comparison weak and contra SH models has been applied, though it is not clear that it is the best method for evaluating Spearman's hypothesis. While Dolan (2000), Dolan & Hamaker (2001), and Dolan et al. (2004) used an exploratory method in which they directly tested each Spearman model and each contra-Spearman model in order to find out which model exhibits the best fit (see note 9), Frisby & Beaujean (2015) examined the baseline model in the BF method to determine which factors exhibit weak group differences and, based on this examination, decided which model fits the data best. Although Dolan's method suffers from comparability issues (see note 10), Frisby & Beaujean's method does not explore all models in detail. Finally, the difficulty in distinguishing between models should be pointed out. Often, very different models, which one would expect to be distinguishable, showed similar fits. Possible explanations are insufficient sample sizes and an insufficient number of subtests (Dolan, 2000; Dolan & Hamaker, 2001). For instance, Frisby & Beaujean (2015), who were able to designate one model as the best fitting one, used data with a much larger number of subtests and of subtests per factor. While in the WJ-II data there are enough variables and the 7-factor model has adequate fit, in WJ-I there are not enough variables to run a 7-factor model, and in WJ-III the 7-factor model does not even fit the data.
Finding appropriate data for an MGCFA test of SH seems to be a challenging task. In summary, the conclusion that Spearman's model fits better than the contra-Spearman model is tentative, as the data may not be appropriate.

Notes

9. One problem with this model selection approach is that the number of models increases exponentially with the number of subtests. While the fit values of a number of other models for WJ-I and WJ-III have been examined (shown in Supplementary file 2), WJ-II was not analyzed in this way, owing to time constraints.
10. The best model among the weak SH models can be g+Gc, while the best model among the anti-SH models could have been Gc+Gsm+Gs, yet g+Gc is then compared to the Gc+Gsm+Gs model.

5. References

Beaujean, A. A. (2014). Latent variable modeling using R: A step-by-step guide. Routledge.
Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.
Byrne, B. M. (2013). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Routledge.
Carroll, J. B. (1997). Theoretical and technical issues in identifying a factor of general intelligence. In Intelligence, genes, and success: Scientists respond to The Bell Curve (pp. 125-156).
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14(3), 464-504.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9(2), 233-255.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7), 1-9.
Dolan, C. V. (2000). Investigating Spearman's hypothesis by means of multi-group confirmatory factor analysis. Multivariate Behavioral Research, 35, 21-50.
Dolan, C. V., & Hamaker, E. L. (2001). Investigating Black-White differences in psychometric IQ: Multi-group confirmatory factor analyses of the WISC-R and K-ABC and a critique of the method of corrected factors. In F. Columbus (Ed.), Advances in Psychological Research, vol. 6 (pp. 31-59). Huntington: Nova Science.
Dolan, C. V., Roorda, W., & Wicherts, J. M. (2004). Two failures of Spearman's hypothesis: The GATB in Holland and the JAT in South Africa. Intelligence, 32(2), 155-173.
Dragt, J. (2010). Causes of group differences studied with the method of correlated vectors: A psychometric meta-analysis of Spearman's hypothesis.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272.
Frisby, C. L., & Beaujean, A. A. (2015). Testing Spearman's hypotheses using a bi-factor model with WAIS-IV/WMS-IV standardization data. Intelligence, 51, 79-97.
Horn, J. L. (1991). Measurement of intellectual capabilities: A review of theory. In K. S. McGrew, J. K. Werder, & R. W. Woodcock (Eds.), WJ-R technical manual (pp. 197-232). Rolling Meadows, IL: Riverside.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55.
Jensen, A. R. (1985). The nature of the black-white difference on various psychometric tests: Spearman's hypothesis. Behavioral and Brain Sciences, 8, 193-219.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger Publishers/Greenwood.
Jensen, A. R., & Reynolds, C. R. (1982). Race, social class and ability patterns on the WISC-R. Personality and Individual Differences, 3, 423-438.
Jöreskog, K. G. (1999). How large can a standardized coefficient be? Unpublished technical report.
Kenny, D. A. (March 18, 2012). Identification.
Retrieved from http://davidakenny.net/cm/identify_formal.htm.
Kline, R. B. (2015). Principles and practice of structural equation modeling. Guilford Publications.
Ledesma, R. D., & Valero-Mora, P. (2007). Determining the number of factors to retain in EFA: An easy-to-use computer program for carrying out parallel analysis. Practical Assessment, Research & Evaluation, 12(2), 1-11.
Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568-592.
Murray, C. (2007). The magnitude and components of change in the black–white IQ difference from 1920 to 1991: A birth cohort analysis of the Woodcock–Johnson standardizations. Intelligence, 35(4), 305-318.
Murray, A. L., & Johnson, W. (2013). The limitations of model fit in comparing the bi-factor versus higher-order models of human cognitive ability structure. Intelligence, 41, 407-422.
Revelle, W. (2012). psych: Procedures for psychological, psychometric, and personality research (Version 1.4.8) [Computer program]. Evanston, IL: Northwestern University.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1-36.
Rushton, J. P., & Jensen, A. R. (2010). Race and IQ: A theory-based review of the research in Richard Nisbett’s Intelligence and How to Get It. The Open Psychology Journal, 3(1), 9-35.
Schrank, F. A., & McGrew, K. S. (2001). Woodcock-Johnson III: Assessment service bulletin number 2.
Wicherts, J. M., & Dolan, C. V. (2010). Measurement invariance in confirmatory factor analysis: An illustration using IQ test performance of minorities. Educational Measurement: Issues and Practice, 29(3), 39-47.
Woodcock, R. W. (1978). Development and standardization of the Woodcock-Johnson psychoeducational battery. Teaching Resources.
Woodcock, R. W. (1990). Theoretical foundations of the WJ-R measures of cognitive ability.
Journal of Psychoeducational Assessment, 8(3), 231-258.

Table 1. Unrotated factor loadings and reliabilities (WJ-1)

Subtests                   White        Black        Average      Black-White gap,     Reliabilities
                           g-loadings   g-loadings   g-loadings   adjusted (age/sex)
Picture Vocabulary         .649         .712         .681         16.543               .82
Spatial Relations          .473         .504         .489         10.484               .86
Memory for Sentences       .571         .555         .563          8.855               .80
Visual Auditory Learning   .530         .542         .536         10.111               .95
Visual Matching            .439         .509         .475          7.525               .65
Antonyms-Synonyms          .841         .809         .825         16.218               .90
Analysis-Synthesis         .616         .567         .592          9.212               .84
Numbers Reversed           .582         .516         .550          9.701               .82
Concept Formation          .602         .585         .594         10.659               .90
Analogies                  .763         .762         .763         13.038               .84

Table 2. Unrotated factor loadings and reliabilities (WJ-2)

Subtests                   White        Black        Average      Black-White gap,     Reliabilities
                           g-loadings   g-loadings   g-loadings   adjusted (age/sex)
Picture Vocabulary         .644         .681         .663         14.194               .86
Memory for Sentences       .597         .581         .589          7.600               .90
Visual Auditory Learning   .690         .703         .697          8.142               .92
Sound Blending             .590         .527         .559         13.663               .87
Visual Matching            .587         .526         .557          4.647               .78
Antonyms-Synonyms          .725         .737         .731         12.120               .87
Analysis-Synthesis         .626         .605         .616          8.821               .90
Numbers Reversed           .684         .585         .636          7.556               .87
Concept Formation          .693         .626         .660          9.124               .93
Picture Recognition        .500         .493         .497          4.981               .82
Memory for Names           .632         .594         .613          6.497               .91
Incomplete Words           .570         .444         .511          8.616               .82
Visual Closure             .504         .410         .459          3.769               .69
Cross-out                  .636         .545         .592          8.307               .75

Table 3.
Unrotated factor loadings and reliabilities (WJ-3)

Subtests                   White        Black        Average      Black-White gap,     Reliabilities
                           g-loadings   g-loadings   g-loadings   adjusted (age/sex)
Spatial Relations          .432         .371         .403          7.045               .81
Visual Auditory Learning   .616         .626         .621          8.333               .86
Sound Blending             .515         .506         .511         12.860               .89
Verbal Comprehension       .729         .764         .747         16.258               .92
Visual Matching            .551         .568         .560          3.579               .91
Analysis-Synthesis         .580         .631         .606          9.433               .90
Numbers Reversed           .517         .473         .495          6.744               .87
Concept Formation          .674         .730         .703         10.374               .94
Picture Recognition        .334         .359         .347          2.574               .76
Memory for Words           .506         .453         .480          5.938               .80
General Information        .652         .714         .684         17.582               .89
Retrieval Fluency          .442         .473         .458          1.993               .85
Auditory Attention         .451         .451         .451          5.098               .88
Decision Speed             .515         .475         .495          4.971               .87

Table 4. Fit indices for CFA models

Fit indices        Chi-square   df   p-value   CFI    RMSEA   Mc     AIC      BIC
W-J Wave 1
Black FOF model        72.716   28   .000      .970   .061    .950    34888    35039
Black HOF model        74.231   30   .000      .970   .058    .950    34886    35028
White FOF model       356.591   28   .000      .967   .059    .952   261048   261273
White HOF model       366.747   30   .000      .967   .058    .951   261054   261268
W-J Wave 2
Black FOF model       216.227   56   .000      .965   .060    .905    90633    90929
Black HOF model       286.582   70   .000      .953   .062    .874    90676    90906
White FOF model       533.818   56   .000      .971   .049    .935   399480   399869
White HOF model       796.724   70   .000      .955   .054    .903   399715   400017
W-J Wave 3
Black FOF model       107.993   71   .003      .978   .035    .957    46171    46365
Black HOF model       115.888   73   .001      .974   .037    .951    46175    46361
White FOF model       712.777   71   .000      .935   .059    .883   282591   282872
White HOF model       741.018   73   .000      .933   .059    .879   282615   282885

Table 5. ML estimates of promax rotated factor loadings from a 4-factor model (WJ-1). Salient loadings (marked with asterisks in the original layout) assign the ten subtests (Picture Vocabulary, Spatial Relations, Memory for Sentences, Visual Auditory Learning, Visual Matching, Antonyms-Synonyms, Analysis-Synthesis, Numbers Reversed, Concept Formation, Analogies) to the factors Gc, Gf, Gsm, and Gs.

Table 6.
Fit indices for MGCFA models from W-J Wave 1

Fit indices       Chi-square   df   p-value   CFI    RMSEA   Mc     AIC      BIC
FOF model
Configural MI        426.455   56   .000      .969   .059    .952   327917   328628
Metric MI            433.581   63   .000      .969   .056    .952   327910   328577
Scalar MI            436.131   69   .000      .969   .053    .952   327901   328530
Strict MI            461.565   79   .000      .968   .051    .950   327906   328473
HOF model
Baseline             516.673   84   .000      .964   .052    .944   327951   328487
Strong SH            534.215   93   .000      .963   .050    .943   327951   328431
Weak SH*             502.711   91   .000      .966   .049    .947   327923   328416
contra-SH**          507.942   91   .000      .965   .049    .946   327928   328421
BF model
Baseline             510.688   85   .000      .964   .052    .945   327943   328473
Strong SH            526.683   89   .000      .963   .051    .943   327951   328456
Weak SH*             511.451   88   .000      .965   .051    .945   327938   328449
contra-SH**          517.848   89   .000      .964   .051    .945   327942   328447

* This weak SH model corresponds to the g+Gc model for both HOF and BF modeling.
** This contra-SH model corresponds to the Gc model for both HOF and BF modeling.

Table 7. ML estimates of promax rotated factor loadings from a 7-factor model (WJ-2). Salient loadings (marked with asterisks in the original layout) assign the fourteen subtests (Picture Vocabulary, Memory for Sentences, Visual Auditory Learning, Sound Blending, Visual Matching, Antonyms-Synonyms, Analysis-Synthesis, Numbers Reversed, Concept Formation, Picture Recognition, Memory for Names, Incomplete Words, Visual Closure, Cross-out) to the factors Gc, Gf, Gsm, Gs, Glr, Ga, and Gv.

Table 8.
Fit indices for MGCFA models from W-J Wave 2

Fit indices       Chi-square   df    p-value   CFI    RMSEA   Mc     AIC      BIC
FOF model
Configural MI        766.721   112   .000      .969   .052    .928   530991   532153
Metric MI            794.594   119   .000      .968   .051    .926   531005   532122
Scalar MI            877.793   126   .000      .965   .052    .918   531074   532147
Strict MI           1077.007   140   .000      .956   .055    .898   531245   532228
HOF model
Baseline            1375.340   175   .000      .944   .056    .872   532108   532868
Strong SH           1574.952   187   .000      .935   .058    .853   531649   532332
Weak SH*            1449.531   179   .000      .940   .057    .865   531540   532274
contra-SH**         1224.943   161   .000      .950   .055    .873   531351   532200
BF model
Baseline            1463.136   181   .000      .940   .057    .864   531549   532271
Strong SH           1568.375   187   .000      .935   .058    .854   531643   532326
Weak SH*            1464.896   182   .000      .940   .057    .864   531549   532264
contra-SH**         1649.256   187   .000      .931   .060    .846   531723   532407

* This weak SH model corresponds to g+Gc+Glr+Ga+Gv for HOF modeling and g+Gf+Gsm+Glr+Gv for BF modeling.
** This contra-SH model corresponds to Gc+Glr+Ga+Gv for HOF modeling and Gf+Gsm+Glr+Gv for BF modeling.

Table 9. ML estimates of promax rotated factor loadings from a 4-factor model (WJ-3). Salient loadings (marked with asterisks in the original layout) assign the fourteen subtests (Spatial Relations, Visual Auditory Learning, Sound Blending, Verbal Comprehension, Visual Matching, Analysis-Synthesis, Numbers Reversed, Concept Formation, Picture Recognition, Memory for Words, General Information, Retrieval Fluency, Auditory Attention, Decision Speed) to the factors Gc, Gf, Gsm, and Gs.

Table 10.
Fit indices for MGCFA models from W-J Wave 3

Fit indices       Chi-square   df    p-value   CFI    RMSEA   Mc     AIC      BIC
FOF model
Configural MI        830.485   142   .000      .946   .057    .892   357413   358327
Metric MI            854.754   152   .000      .945   .055    .890   357417   358271
Scalar MI            896.904   162   .000      .942   .055    .885   357440   358233
Strict MI            944.825   176   .000      .940   .054    .880   357460   358169
HOF model
Baseline             950.530   177   .000      .939   .054    .880   357463   358167
Strong SH           1051.717   190   .000      .932   .055    .867   357538   358164
Weak SH*             987.387   184   .000      .937   .054    .875   357486   358147
contra-SH**          954.711   181   .000      .939   .053    .880   357459   358139

* This weak SH model corresponds to g+Gc+Gf+Gsm for HOF modeling.
** This contra-SH model corresponds to Gc+Gf+Gsm for HOF modeling.
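The decisions summarized in Tables 4 through 10 rest on quantities that are simple functions of the chi-square statistic: RMSEA and McDonald's Mc depend only on chi-square, the degrees of freedom, and the sample size, while the measurement-invariance steps are judged by the change in CFI between nested models (Cheung & Rensvold, 2002, proposed treating a drop of more than .01 as grounds for rejecting the added constraints). A minimal sketch of these calculations, in which the sample size is a placeholder rather than an actual WJ group size and the function names are illustrative:

```python
import math

def rmsea(chisq, df, n):
    """Steiger-Lind RMSEA from chi-square, degrees of freedom, and sample size."""
    return math.sqrt(max((chisq - df) / (df * (n - 1)), 0.0))

def mcdonald_mc(chisq, df, n):
    """McDonald's noncentrality index (the Mc column of the fit tables)."""
    return math.exp(-0.5 * (chisq - df) / (n - 1))

def invariance_holds(cfi_prev, cfi_next, cutoff=0.01):
    """Cheung & Rensvold (2002) rule: reject the added constraints when
    CFI drops by more than the cutoff between nested models."""
    return (cfi_prev - cfi_next) <= cutoff

# CFI values of the Wave 3 first-order-factor MI sequence (Table 10).
steps = [("configural", .946), ("metric", .945), ("scalar", .942), ("strict", .940)]
for (_, prev_cfi), (name, next_cfi) in zip(steps, steps[1:]):
    print(name, invariance_holds(prev_cfi, next_cfi))

# Behaviour of the indices under a purely hypothetical sample size
# (n = 1000 is a placeholder, not an actual WJ group size).
n = 1000
print(round(rmsea(72.716, 28, n), 3), round(mcdonald_mc(72.716, 28, n), 3))
```

Applied to the Wave 3 first-order-factor sequence of Table 10, the largest CFI drop is .003 (metric to scalar), consistent with the report that measurement invariance generally holds in these data.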