
Exploratory Factor Analysis
Definition
Exploratory factor analysis (EFA) is a procedure for learning the extent to which k observed
variables might measure m abstract variables, wherein m is less than k. In EFA, we indirectly
measure non-observable behavior by taking measures on multiple observed behaviors.
Conceptually, in using EFA we can assume either nominalist or realist constructs, yet most
applications of EFA in the social sciences assume realist constructs.
Assumptions
1. Typically, realism rather than nominalism: Abstract variables are real in their consequences.
2. Normally distributed observed variables.
3. Continuous-level data.
4. Linear relationships among the observed variables.
5. Content validity of the items used to measure an abstract concept.
6. E(ei) = 0 (random error).
7. All observed variables are influenced by all factors (see: model specification in CFA).
8. A sample size greater than 30 (more is better).
Terminology (lots of synonyms):
Factor = Abstract Concept = Abstract Construct = Latent Variable = Eigenvector.
Comparison of Exploratory Factor Analysis and OLS Regression
In OLS regression, we seek to predict a point, a value of a dependent variable (y) from the value
of an independent variable (x). The diagram below indicates the value of y expected from a
given value of x. The error represents the extent to which we fail in predicting y from x.
In EFA, we seek to predict a vector that best describes a relationship between the items used to
measure the vector. The diagram below indicates the value of the vector F, expected from the
correlation of X1 and X2. The error represents the extent to which we fail in predicting the vector
from the correlation of X1 and X2.
EFA assumes that X1 and X2 are linearly dependent, based upon their relationship to some
underlying (i.e., abstract, latent) variable (i.e., construct, concept).
In OLS regression, we solve the (standardized) equation:
Y = Xβ + ε, where:
Y is a vector of dependent variables,
β is a vector of parameter estimates,
X is a vector of independent variables,
ε is a vector of errors.
In EFA, we solve the (standardized) equation:
X = λF + δ, where:
X is a vector of k observed variables,
λ is a vector of k parameter estimates,
F is a vector of m factors (abstract concepts, latent variables),
δ is a vector of k errors.
The EFA Model
Consider this simple model that consists of a single factor with two observed variables:
1
X1
2
X2
1
1
F
2
2
Note: When we address the topic of confirmatory factor analysis, we will designate abstract
concepts with the Greek letters ξ and η. Because most literature on EFA uses the designation
F, we will use it in this lecture.
We have two equations to solve:
X1 = λ1F + δ1ε1
X2 = λ2F + δ2ε2
1. var(Xi) = E(Xi - X̄)². Note: for standardized variables, the mean of X = 0.
2. Thus, var(Xi) = E(Xi)².
3. Xi = λiF + δiεi.
4. var(Xi) = E(λiF + δiεi)².
5. var(Xi) = λi²E[F²] + δi²E[εi²] + 2λiδiE[Fεi].
6. var(Xi) = λi²var(F) + δi²var(εi) + 2λiδi cov(F,εi).
Assume:
1. cov(F,εi) = 0 (i.e., random errors in measurement).
2. var(F) = 1 (i.e., standardized measure of F, or ontologically, "the construct has a unit value").
3. var(εi) = 1 (i.e., standardized measure of the error term).
Therefore:
1. var(Xi) = λi² + δi² = 1 (i.e., Xi is a standardized variable).
2. Because cov(F,Xi) = λi var(F) + δi cov(F,εi),
3. and because var(F) = 1 and cov(F,εi) = 0, so that cov(F,Xi) = λi,
4. then, for standardized variables, λi = rF,Xi (i.e., the correlation of F and Xi).
5. Example: cov(X1,X2) = λ1λ2 var(F) = λ1λ2 = rX1,X2 (i.e., the correlation of X1 and X2).
Summary:
1. The parameter estimate (i.e., "factor loading"), λi = rF,Xi (i.e., for principal components factor
analysis, this parameter is identical to a standardized regression coefficient, β).
2. The product of two factor loadings for two variables caused by the same factor (i.e., factorial
complexity = 1) is equal to the correlation between the two observed variables.
3. The "communality" or item reliability of Xi is equal to λi².
4. In principal components exploratory factor analysis, the communality of Xi is identical in
concept to the coefficient of determination (R-square) in OLS regression analysis. [Note:
Later, we will discuss various forms of EFA. Principal components EFA relies upon the
unweighted correlation matrix among the observed variables, and therefore is analogous to
OLS regression analysis with a known number of factors.]
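As a numerical check on this summary, the following minimal sketch (assuming Python with NumPy; the loadings of .8 and .6 are invented for illustration) simulates a standardized one-factor model and confirms that each loading approximates the correlation of the factor with its item, and that the product of the two loadings approximates the correlation between the two items.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One-factor model: Xi = lambda_i * F + delta_i * e_i, with var(F) = var(e_i) = 1
# and lambda_i^2 + delta_i^2 = 1, so each Xi is standardized.
lam1, lam2 = 0.8, 0.6                          # illustrative loadings (assumed)
d1, d2 = np.sqrt(1 - lam1**2), np.sqrt(1 - lam2**2)

F = rng.standard_normal(n)                     # standardized factor
X1 = lam1 * F + d1 * rng.standard_normal(n)
X2 = lam2 * F + d2 * rng.standard_normal(n)

print(np.corrcoef(F, X1)[0, 1])                # approximately .8  (= lambda_1)
print(np.corrcoef(F, X2)[0, 1])                # approximately .6  (= lambda_2)
print(np.corrcoef(X1, X2)[0, 1])               # approximately .48 (= lambda_1 * lambda_2)
```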
Estimating the EFA Model
1. Xi is caused by Fm, where m = the number of factors.
2. F causes Xi, where i = 1, …, k and k = the number of items that are caused by F.
3. Xi = λiFm + δi.
4. To solve this equation, we need to measure F. Our approach:
a. We know Xi (the observed variable).
b. We will estimate λi and use this estimate to determine δi [i.e., λi² + δi² = 1].
5. Because Xi can be caused by m factors, EFA becomes an exercise in determining the
number of factors that cause Xi and the parameter estimates (λi) of each F on each Xi.
Determining the Number of Factors That Affect Each Observed Variable
A factor is an abstract concept. In a realist (vs. nominalist) sense, this concept "causes"
observable behavior in the same manner that the length of a table top "causes" the ruler to
measure its longest dimension as its length. If one were to measure the longest dimension of a
table top twice, and the table top did not change in its dimensions between the two
measurements of it, and the measurements were taken carefully, and the measuring instrument
(i.e., the ruler) were stable and consistent rather than wiggly and wobbly, then the two
measurements should equal one another exactly. Similarly, if one were to measure self-esteem
twice using, for example, the Rosenberg Self-Esteem Scale, and self-esteem did not change
between the two measurements of it, and the measurements were taken carefully, and all ten
items in the Rosenberg Self-Esteem Scale had equal content validity, and the Rosenberg Self-Esteem Scale itself was a stable and consistent measuring instrument, then people should
respond equally to all ten items on the scale (taking into account that half the items are worded
in reverse conceptual order). This result should occur because one's self-esteem "causes" one
to respond accordingly to the items on the Rosenberg Self-Esteem Scale.
In mathematical terms, if the above conditions for measuring self-esteem are met, then the
matrix of responses for the ten items on the scale should have a rank of 1, wherein the figures
shown in columns 2-10 should be identical to those found in column 1 (assuming the items define
the columns and the cases define the rows). That is, once we know a person's response to the
first question in the Rosenberg Self-Esteem Scale, then we know the person's responses to the
remaining nine items. Conceptually, given that each item on the scale is intended equally to
reflect self-esteem, then this outcome is exactly what we would expect to observe. Thus, the
ten items on the Rosenberg Self-Esteem Scale would represent a single, abstract concept (i.e.,
factor): self-esteem.
With this conceptual and mathematical logic in mind, we know we can determine the number of
factors affecting responses to the i = 1-k items by calculating the rank of the matrix of responses
to the observed variables (i.e., X) because rank less than k indicates singularity in the matrix
(i.e., at least two columns are measuring the same thing). This approach is logically consistent,
but it fails in practice because (1) not all items in a scale have equal content validity in reflecting
the abstract concept and (2) people do not necessarily behave in a logically consistent manner.
Therefore, to determine the number of factors causing responses to a set of observed variables,
we need a measure of linear dependency that is probabilistic rather than deterministic.
Consider the relationship between the rank and determinant of a matrix for a system of two
linear equations, wherein the rows and columns provide unique information.
2x + 3y = 13
4x + 5y = 23
| 2 3 | | x | | 13 |
| 4 5 | * | y | = | 23 |
Solve for x, y:
1. 2x = 13 – 3y
2. x = 13/2 – 3/2y
3. 4 (13/2 – 3/2y) + 5y = 23
4. 26 – 6y + 5y = 23
5. y = 3
6. x = 13/2 – 9/2 = 2.
Now, consider the relationship between the rank and determinant of a matrix for a system of two
linear equations, wherein the rows and columns do not provide unique information. That is,
note that the second equation is identical to 2 * the first equation.
2x + 6y = 22
4x + 12y = 44
| 2 6 | | x | | 22 |
| 4 12 | * | y | = | 44 |
Solve for x, y:
1. 2x = 22 – 6y
2. x = 11 – 3y
3. 4 (11 – 3y) + 12y = 44
4. 44 – 12y + 12y = 44
5. 44 = 44.
Result: Because of the linear dependence between row 1 and row 2 of the matrix, we cannot
find a unique solution for x and y.
Consider the rank of the second matrix:
| 2  6 |
| 4 12 |   multiply Row 1 by 1/2
| 1  3 |
| 4 12 |   multiply Row 1 by -4 and add to Row 2
| 1  3 |
| 0  0 |   the rank of this matrix equals 1.
Thus, if a matrix has a perfect linear dependence, then its rank is less than k (the number of
rows and columns). So, we can determine the number of factors by calculating the rank of the
matrix, but this procedure requires perfect linear dependence, a result that is highly unlikely to
occur in practice.
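This can be verified numerically. The following minimal sketch (assuming Python with NumPy) computes the rank and determinant of the two coefficient matrices above: the first has full rank and a nonzero determinant, while the second, with its dependent rows, has rank 1 and a determinant of zero.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 5.0]])     # rows provide unique information
B = np.array([[2.0, 6.0],
              [4.0, 12.0]])    # row 2 = 2 * row 1 (perfect linear dependence)

print(np.linalg.matrix_rank(A), np.linalg.det(A))    # 2 and -2.0
print(np.linalg.matrix_rank(B), np.linalg.det(B))    # 1 and 0.0
```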
Consider the definition of an eigenvector: X is an eigenvector of a matrix A if there exists a
scalar λ such that AX = λX. That is, an eigenvector is a representation of linear dependence in
a square matrix.
To find the eigenvector(s) of a matrix, we solve for X:
1. AX = λX.
2. AX - λX = 0.
3. However, it is impossible to subtract a scalar from a matrix. It is possible, however, to
subtract a scalar from the diagonal of a matrix. So, we insert "1" into the equation in the
form of the Identity matrix.
4. (A - λI)X = 0.
5. Let B = (A - λI), such that BX = 0.
6. Note: To solve this equation, we will need to calculate the inverse of B. Not all matrices
have an inverse. If a matrix has a rank less than k, then the matrix does not have an
inverse. Also, if a matrix has a rank less than k, then the determinant of the matrix = 0.
7. If BX = 0, and B has an inverse, then X = B⁻¹0 and X = 0, which means that the matrix A has
no eigenvector, meaning no indication of linear dependence.
8. Thus, X is an eigenvector of A if and only if B does not have an inverse.
9. If B does not have an inverse, then it has Det = 0 (and therefore perfect linear dependence).
10. So, X is an eigenvector of A if and only if Det(A - λI) = 0 [i.e., the characteristic equation].
Unlike the rank of a matrix, which is deterministic, the determinant of a matrix is probabilistic,
ranging in value from minus infinity to plus infinity. Therefore, the determinant of a matrix can
be used to indicate the degree of linear dependence in a square matrix. Thus, the solution to
estimating the EFA equation is to establish a criterion of linear dependence by which to deem a
matrix as containing one or more eigenvectors (i.e., factors).
The approach is to solve for λ, which is called the eigenvalue of the matrix. Hand-written notes
attached to this course packet describe the Power Method and Gram-Schmidt Algorithm as
procedures for estimating λ, wherein the Power Method is a logically correct but impractical
approach and the Gram-Schmidt Algorithm is the approach used in statistical analysis
packages. An example of the matrix algebra used by the Gram-Schmidt Algorithm is attached
to the course packet.
Calculation of λ
After determining the number of factors in a matrix, the next step in estimating the EFA equation
is to calculate the parameters in λ (discussed in detail below).
Summary
Determining the number of factors underlying a matrix of observed variables involves calculating
the extent to which the matrix contains linear dependency. The rank of a matrix indicates
perfect linear dependency, which is unlikely to occur in practice. The determinant of the
equation for an eigenvector (i.e., wherein an eigenvector represents a factor) is probabilistic.
Thus, we can calculate the determinant associated with an eigenvector to infer the presence of
a factor. We achieve this goal by establishing a decision criterion by which to deem a matrix as
containing one or more linear dependencies. We will discuss a mathematical logic for
establishing this criterion later in this course. For principal components EFA, we will set this
criterion as equal to 1. If an eigenvector has an associated eigenvalue of 1 or greater, then we
will state that this vector represents an underlying abstract construct.
The number of eigenvectors in a matrix of k columns and rows is equal to k. Thus, the Gram-Schmidt
Algorithm will calculate k eigenvalues for a matrix of size k. The calculation of
eigenvalues is a "zero-sum" game in that the degree of linear dependency calculated for one
eigenvector reduces the size of the eigenvalue for the next vector, and so on. In principal
components EFA, for example, the sum of eigenvalues is equal to k.
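A minimal sketch of this criterion (assuming Python with NumPy; the 4-by-4 correlation matrix is invented purely for illustration): the eigenvalues of a correlation matrix sum to k, and the number of eigenvalues of 1 or greater gives the number of retained factors under the principal components rule.

```python
import numpy as np

# Hypothetical correlation matrix: X1-X2 and X3-X4 form two clusters.
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])

eigenvalues = np.linalg.eigvalsh(R)[::-1]    # k eigenvalues, largest first
print(eigenvalues)
print(eigenvalues.sum())                     # equals k = 4
print(int(np.sum(eigenvalues >= 1.0)))       # factors retained by the eigenvalue-of-1 rule
```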
Indeterminacy and Establishing a Scale
Unfortunately, the calculation of eigenvectors from eigenvalues is indeterminate because of the
linear dependence(s) in X.
Consider this matrix:
A= |1 2|
|4 3|
The eigenvalues of A are -1 and 5.
Solve for X: (A - λI)X = 0 at λ1 = -1.
1. (A - (-1)I)X = 0.
2. The vector X is: | X1 |
                    | X2 |
3. Then:
( | 1 2 | + | 1 0 | ) | X1 |   | 0 |
( | 4 3 |   | 0 1 | ) | X2 | = | 0 |
4. So,
| 2 2 | | X1 |   | 0 |
| 4 4 | | X2 | = | 0 |
or: 2X1 + 2X2 = 0
    4X1 + 4X2 = 0   These equations do not have a unique solution!
5. To solve the equations, one of the values in the X matrix must be set to a value.
6. Let X2 = 1, which indicates a "unit vector," or if you will, "The vector has the value of itself."
This process is called, "setting the scale" for the equation.
7. If X2 = 1, then, where λ1 = -1:
2X1 + 2 = 0, so X1 = -1.
8. Solve for X: (A - λI)X = 0 at λ2 = 5:
( | 1 2 | + (-5) | 1 0 | ) | X1 |   | 0 |
( | 4 3 |        | 0 1 | ) | X2 | = | 0 |
or: -4X1 + 2X2 = 0
     4X1 - 2X2 = 0   (X2 is set to 1)
So, X1 = .5.
The equation can be solved, but only if one element of the eigenvector is set to a value of 1. Therefore, the
matrix of factor loadings is arbitrary (determined only up to scale) because the eigenvectors are arbitrary.
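A quick numerical check of this example (assuming Python with NumPy): the eigenvalues of A are indeed -1 and 5, and the eigenvectors returned by the library are the same vectors found above, only rescaled to unit length, which illustrates that an eigenvector is determined only up to the scale we impose.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [4.0, 3.0]])

values, vectors = np.linalg.eig(A)
print(values)          # -1 and 5 (possibly in a different order)
print(vectors)         # columns are unit-length eigenvectors

# The hand calculation set X2 = 1, giving (-1, 1) for lambda = -1 and
# (0.5, 1) for lambda = 5.  Rescaling each column so that its second
# element equals 1 recovers exactly those solutions.
print(vectors / vectors[1, :])
```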
The Philosophy of the Social Sciences
In the social sciences we measure variables that have no mass and therefore cannot be directly
observed with the senses. At the same time, the social sciences are conducted under the same
rules of theory development and testing as those used in the physical and life sciences. There
are no exceptions or exemptions in science. If the social sciences must operate under the
same rules of theory development and testing as required of all sciences, yet without the
opportunity to observe phenomena through the senses (or extensions of them, such as
microscopes, telescopes, and such), then some concession must be made. The concession
made is the indeterminacy of measuring abstract concepts. The social sciences must assume that
the abstract vector has some fixed length. Typically, this fixed length is set to 1. The result of
this concession is that to some extent, all measures of abstract concepts are arbitrary.
Indeterminacy in deriving eigenvalues
1. Ontology: Must make a claim about reality. Realism: Abstract concepts are real in their
consequences. Abstract concepts "exist," and this existence is equal to itself = 1.
2. Epistemology: Cannot measure something that has no concrete existence.
X = F + 
a. Known: X, which is the vector of observed variables.
b. We do not know the number of F or the scores on F. We use the GS algorithm to
determine eigenvalues for each eigenvector in R (the correlation matrix). An
eigenvalue reflects the extent to which one eigenvector is correlated with the other
eigenvectors. If an eigenvector "stands alone" or "to some extent represents an
association with another eigenvector," then the eigenvalue will be equal to 1 or
greater than 1, respectively. If the eigenvalue ≥ 1, then we claim that we have
determined the existence of an abstract variable.
c. An eigenvalue is the extent to which an eigenvector must be "altered" to reduce the
determinant of R to (near) zero, wherein the lower the determinant the greater the
"singularity" of R, and the greater the extent to which we identify the existence of an
abstract variable.
Characteristic Equation: Det(A - λI) = 0.
Consider the matrix:
| 1 8 |
| 2 15 |
Row 2 is nearly the double of Row 1. Setting the determinant to zero will "remove"
Row 2, and thereby show singularity. If we "remove" Row 2, then we are "removing"
much of the informational value of Row 1 as well. Thus, λ will be higher than one,
indicating the existence of an abstract variable that affects both rows.
d. We cannot solve the characteristic equation for an eigenvector unless we reduce the
indeterminacy in the system of equations defined by A. One element of the eigenvector must
be set to a constant. Thus, ontologically, we have "set the scale" of our abstract
variable to equal a constant (= 1).
Note: In CFA, we can set the scale by setting one of the elements of λ to 1.
Calculation of Factor Loadings
Procedures Other Than Maximum Likelihood
The calculation of the factor loadings (i.e., the λ matrix) is:
[factor loadings] = [eigenvectors] * [eigenvalues]^1/2. That is, each factor loading is the item's
correlation with the factor, and its square is the item's reliability in predicting the factor.
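A minimal sketch of this calculation (assuming Python with NumPy; the correlation matrix is invented for illustration, and only eigenvalues of 1 or greater are retained, following the principal components criterion):

```python
import numpy as np

# Hypothetical correlation matrix among four observed variables.
R = np.array([[1.0, 0.6, 0.1, 0.1],
              [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5],
              [0.1, 0.1, 0.5, 1.0]])

values, vectors = np.linalg.eigh(R)
order = np.argsort(values)[::-1]             # largest eigenvalue first
values, vectors = values[order], vectors[:, order]

keep = values >= 1.0                         # Kaiser criterion
loadings = vectors[:, keep] * np.sqrt(values[keep])

print(loadings)                              # the lambda matrix of factor loadings
print((loadings**2).sum(axis=1))             # communality (item reliability) of each Xi
```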
Maximum Likelihood Factor Analysis
For ML factor analysis the factor loadings (A) are estimated as:
R = AA' + U²,
where R = the correlation matrix, and U² = 1 minus the item reliability (i.e., the communality).
Maximum likelihood EFA calculates weights for each element in the λ matrix, wherein these
weights represent the communality of each observed variable and where observed variables
with higher communality are given more weight.
Consider the SAS output for the example labeled "Kim and Mueller: Tables 4-5, Figure 5
(http://www.soc.iastate.edu/sapp/soc512EFA.html)." Note that the SAS output provides a
variance explained by each factor, which equals the sum of the squared estimates for each
observed variable on a factor. Thus, the unweighted variance explained by Factor 1 equals .8²
+ .7² + .6² + .0² + .0² = 1.49. The SAS output also provides the weights for each variable, which
reflect the communality of each observed variable and where this communality has been further
enhanced to the extent that its reliability is stronger than the reliability of the other observed
variables. These weights are shown in the table labeled "Final Communality Estimates and
Variable Weights." Therefore, the weighted variance explained by Factor 1 equals (.8² * 2.78) +
(.7² * 1.96) + (.6² * 3.57) + (.0² * 2.78) + (.0² * 1.56) = 4.02.
See: Harman, Harry H. 1976. Modern Factor Analysis, Third Edition. Chicago: The University
of Chicago Press. Pp. 200-216.
Principal Components EFA and OLS Regression
After calculating the factor scores, one can regress each observed variable on these scores to
reproduce exactly the λ matrix. The R-square for the OLS regression will equal the item
reliability (i.e., communality) of the observed variable.
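This correspondence can be checked directly. The sketch below (assuming Python with NumPy and simulated standardized data; every name and value is illustrative rather than taken from the course packet) extracts principal component loadings, computes standardized component scores, and then regresses one observed variable on the retained scores: the coefficients reproduce that variable's loadings and the R-square equals its communality.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5_000, 4

# Simulate standardized data with an invented correlation structure.
target_R = np.array([[1.0, 0.6, 0.1, 0.1],
                     [0.6, 1.0, 0.1, 0.1],
                     [0.1, 0.1, 1.0, 0.5],
                     [0.1, 0.1, 0.5, 1.0]])
Z = rng.standard_normal((n, k)) @ np.linalg.cholesky(target_R).T
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

R = np.corrcoef(Z, rowvar=False)
values, vectors = np.linalg.eigh(R)
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]
keep = values >= 1.0

loadings = vectors[:, keep] * np.sqrt(values[keep])
scores = Z @ vectors[:, keep] / np.sqrt(values[keep])   # standardized component scores

# OLS regression of X1 (first column of Z) on the retained component scores.
b, *_ = np.linalg.lstsq(scores, Z[:, 0], rcond=None)
r_square = 1 - np.sum((Z[:, 0] - scores @ b)**2) / np.sum(Z[:, 0]**2)

print(b, loadings[0])                          # coefficients = loadings of X1
print(r_square, (loadings[0]**2).sum())        # R-square = communality of X1
```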
Factor Scales [Scores]
Once the EFA equation has been estimated, one can calculate scores on an abstract variable.
The most common procedures are to calculate either the sum or the mean of responses to the
observed variables caused by the factor. For example, to calculate a score on self-esteem,
wherein EFA showed that the ten items on the Rosenberg Self-Esteem Scale are caused by a
single abstract concept, one might add responses to the ten items on the scale. I recommend
calculating the mean score across the ten items to retain the same measurement response
scale as the one used for the ten observed variables.
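A minimal sketch of this unit-weighted approach (assuming Python with NumPy; the response matrix is invented, with five respondents, ten items on a 1-4 response scale, and reverse-worded items already recoded):

```python
import numpy as np

# Hypothetical responses: 5 respondents x 10 items, reverse-worded items recoded.
items = np.array([[3, 4, 3, 3, 4, 3, 3, 4, 3, 3],
                  [2, 2, 1, 2, 2, 2, 1, 2, 2, 2],
                  [4, 4, 4, 3, 4, 4, 4, 4, 3, 4],
                  [1, 2, 2, 1, 1, 2, 2, 1, 1, 2],
                  [3, 3, 3, 3, 2, 3, 3, 3, 3, 2]])

sum_scale = items.sum(axis=1)     # summed factor scale
mean_scale = items.mean(axis=1)   # mean scale retains the original 1-4 metric

print(sum_scale)
print(mean_scale)
```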
Other approaches to calculating factor scales account for varying item reliabilities in
representing the abstract construct.
Regression Method
This method assumes that the observed variables represent the population of variables affected
by the abstract concept (i.e., perfect content validity).
F̂ = X(R⁻¹)λ, where:
F̂ is the estimated score on the abstract variable,
X is the matrix of standardized scores on the observed variables,
λ is the matrix of parameter estimates of the effect of F on X,
R⁻¹ is the inverse of the correlation matrix.
Recall that in OLS regression we estimate the equation:
Y = Xβ + ε
We assume that the errors are random and uncorrelated with Y or X. Thus, in OLS regression,
we solve for β:
β = (X'X)⁻¹X'Y
Similarly, in principal components factor analysis, we estimate the equation:
X = λF + δ
We assume that the errors are random and uncorrelated with X or F. Thus, in principal
components factor analysis, we solve for λ:
λ = (F'F)⁻¹F'X
Solving for F yields the equation shown above: F̂ = X(R⁻¹)λ
See Gorsuch, pages 261-262, formula 12.1.6.
See Harman, pages 368-369, formula 16.21.
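A sketch of the regression method (assuming Python with NumPy; the one-factor loadings and the simulated data are invented for illustration, and F_hat follows the formula above):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical one-factor model: three standardized indicators of F.
lam = np.array([[0.8], [0.7], [0.6]])             # assumed loadings (k x m)
F = rng.standard_normal((1000, 1))
Z = F @ lam.T + rng.standard_normal((1000, 3)) * np.sqrt(1 - lam.T**2)

R = np.corrcoef(Z, rowvar=False)                  # correlation matrix of the indicators

# Regression method: F_hat = Z R^-1 lambda.
F_hat = Z @ np.linalg.inv(R) @ lam

print(np.corrcoef(F[:, 0], F_hat[:, 0])[0, 1])    # how well the scores recover F
```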
Least Squares Method
This method assumes that the observed variables represent a sample from the population of
variables affected by the abstract concept (i.e., imperfect content validity).
F̂ = Xλ(λ'λ)⁻¹, where:
F̂ is the estimated score on the abstract variable,
X is the matrix of standardized scores on the observed variables,
λ is the matrix of parameter estimates of the effect of F on X.
Bartlett's Criterion
This method gives more weight to observed variables with higher item reliability (i.e., imperfect
content validity).
F̂ = XU⁻²λ(λ'U⁻²λ)⁻¹, where:
F̂ is the estimated score on the abstract variable,
X is the matrix of standardized scores on the observed variables,
λ is the matrix of parameter estimates of the effect of F on X,
U² is the diagonal matrix of 1 minus the item reliabilities (i.e., the unique variances).
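The least squares and Bartlett estimators can be sketched in the same way (assuming Python with NumPy; the loadings and unique variances are invented for illustration rather than estimated from data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Same hypothetical one-factor setup: three standardized indicators of F.
lam = np.array([[0.8], [0.7], [0.6]])                  # assumed loadings (k x m)
U2 = np.diag(1 - (lam**2).ravel())                     # unique variances: U^2 = 1 - h^2
F = rng.standard_normal((1000, 1))
Z = F @ lam.T + rng.standard_normal((1000, 3)) @ np.sqrt(U2)

# Least squares method: F_hat = Z lambda (lambda' lambda)^-1.
F_ls = Z @ lam @ np.linalg.inv(lam.T @ lam)

# Bartlett's criterion: F_hat = Z U^-2 lambda (lambda' U^-2 lambda)^-1.
U2_inv = np.linalg.inv(U2)
F_bart = Z @ U2_inv @ lam @ np.linalg.inv(lam.T @ U2_inv @ lam)

print(np.corrcoef(F[:, 0], F_ls[:, 0])[0, 1])
print(np.corrcoef(F[:, 0], F_bart[:, 0])[0, 1])
```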
Evaluation of Factor Scales
1. Factor scales can be correlated with one another even if the factors are orthogonal.
2. Correlations among oblique factor scales do not necessarily equal the correlations among
the oblique factors.
3. A factor scale is said to be univocal if its partial correlation with other factors = 0.
4. Factor scales include two indeterminacies: 1) they are based upon indeterminate parameter
estimates, 2) they do not account for unique error variance in F.
Reliability of Factor Scales
ρF̂ = [var(F̂) - (1 - hi²)wi²] / var(F̂), where:
ρF̂ (the symbol rho, for F̂): the reliability of the factor scale,
wi = λ'(R⁻¹),
var(F̂) = the correlation matrix, with all elements weighted by wi.
Extraction Procedures in EFA
Various forms of EFA are defined, wherein these forms rely upon various assumptions about
the nature of social reality. These forms and assumptions are described below. All forms of
EFA rely upon the same algorithm to calculate eigenvalues: the Gram-Schmidt Algorithm (also:
QR and QL algorithms). Therefore, the various forms of EFA differ only in the matrix evaluated
by the GS Algorithm.
The Gram-Schmidt Algorithm calculates k eigenvalues associated with k eigenvectors for a
square matrix (i.e., the correlation matrix or some weighted version of it). The various forms of
EFA, therefore, are defined solely by their treatment of the matrix of correlations among the
observed variables, prior to this matrix being evaluated using the GS Algorithm.
Principal Components
Characteristic equation: Det(R - λI) = 0, where R is the correlation matrix among the observed
variables (i.e., the X matrix) with 1's on the diagonal.
This is the "least squares" approach. Indeed, once the factor structure (i.e., number of factors
and loadings of each X on each factor) is calculated, the scores on X and F can be input into
OLS regression analysis to exactly reproduce the λ and δ matrices.
Principal components is the procedure most often applied in EFA. The criterion used to deem
an eigenvector as a factor is an eigenvalue of 1 or greater.
Principal Axis (Common Factor)
Characteristic equation: Det(R1 - λI) = 0, where R1 is the correlation matrix among the
observed variables (i.e., the X matrix) with the item reliabilities (i.e., communalities) on the
diagonal.
The principal axis (or common factor) form of EFA assumes that the items in X will vary in their
content validity as indicators of F. Therefore, the input matrix is weighted to account for differing
item reliabilities among the items in X.
Conducting principal axis EFA requires initial estimates of the item reliabilities. Recall that item
reliability equals the coefficient of determination (R-square) for the item as one observed
outcome of the abstract concept. Therefore, prior communalities (i.e., item reliabilities) can be
estimated through a series of OLS regression equations.
Consider a factor structure with a single factor and three observed variables. Prior
communalities for each Xi are estimated as the R-square statistic for the regression of each Xi
on the remaining elements in X.
X1 = X2 + X3 + e (R2 = prior communality for X1).
X2 = X1 + X3 + e (R2 = prior communality for X2).
X3 = X1 + X2 + e (R2 = prior communality for X3).
Principal axis EFA is not often used. The criterion used to deem an eigenvector as a factor is an
eigenvalue of 0 or greater.
Maximum Likelihood
Characteristic equation: Det(R2 - λI) = 0, where R2 is the correlation matrix among the
observed variables (i.e., the X matrix) with weighted item reliabilities (i.e., communalities) on the
diagonal. Observed variables with more reliability are given more weight.
R2 = U⁻¹(R - U²)U⁻¹: the reduced correlation matrix (R minus the unique variances), pre- and
post-multiplied by the inverse of U, the square roots of the unique variances (1 minus the prior communalities).
Maximum likelihood EFA assumes that the items in X will vary in their content validity as
indicators of F. Therefore, the input matrix is weighted to account for differing item reliabilities
among the items in X.
The ML procedure calculates prior communalities in the same manner as is done for the
principal axis procedure.
The ML procedure is commonly used in EFA, especially when one assumes significant
correlations among multiple factors. The criterion used to deem an eigenvector as a factor is an
eigenvalue of 0 or greater.
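A sketch of this reweighting (assuming Python with NumPy; R is an invented correlation matrix and the prior communalities come from the squared-multiple-correlation shortcut used in the previous sketch). The eigenvalues evaluated against the criterion are those of the reweighted matrix R2, not of R itself.

```python
import numpy as np

# Hypothetical correlation matrix among three observed variables.
R = np.array([[1.00, 0.56, 0.48],
              [0.56, 1.00, 0.42],
              [0.48, 0.42, 1.00]])

h2 = 1 - 1 / np.diag(np.linalg.inv(R))    # prior communalities (SMCs)
U2 = np.diag(1 - h2)                      # unique variances on the diagonal
U_inv = np.diag(1 / np.sqrt(1 - h2))      # U^-1

R2 = U_inv @ (R - U2) @ U_inv             # the weighted matrix evaluated in ML EFA
print(np.linalg.eigvalsh(R2)[::-1])       # its eigenvalues, largest first
```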
Alpha
Characteristic equation: Det(R3 - λI) = 0, where R3 is the correlation matrix among the
observed variables (i.e., the X matrix) with weighted item reliabilities (i.e., communalities) on the
diagonal. Observed variables with less reliability are given more weight (see: Correction for
attenuation).
R3 = H⁻¹(R - U²)H⁻¹: the reduced correlation matrix (R minus the unique variances), pre- and
post-multiplied by the inverse of H, the square roots of the prior communalities, wherein U² + H² = 1.
Alpha EFA assumes that the items in X will vary in their content validity as indicators of F.
Therefore, the input matrix is weighted to account for differing item reliabilities among the items
in X, but giving more weight to items with less reliability.
The alpha procedure calculates prior communalities in the same manner as is done for the
principal axis procedure.
I do not recall seeing a peer-reviewed publication that used alpha EFA. The criterion used to
deem an eigenvector as a factor is an eigenvalue of 0 or greater.
Image
Characteristic equation: Det(R4 - λI) = 0, where R4 is the correlation matrix among the
observed variables (i.e., the X matrix) with weighted item reliabilities (i.e., communalities) on the
diagonal. Prior communalities are adjusted to reflect that they are derived from a sample of the
population.
R4 = (R - S²)R⁻¹(R - S²): the correlation matrix, with the variances of the observed variables
subtracted from the diagonal, pre- and post-multiplied around the inverse of the correlation matrix.
S² = the diagonal matrix of the variances of the observed variables.
The image procedure calculates prior communalities in the same manner as is done for the
principal axis procedure.
I do not recall seeing a peer-reviewed publication that used image EFA. The criterion used to
deem an eigenvector as a factor is an eigenvalue of 0 or greater.
Unweighted Least Squares
Characteristic equation: Det(R - λI) = 0, where R is the correlation matrix among the observed
variables (i.e., the X matrix) with 1's on the diagonal.
This approach differs from principal components in that it uses an iterative procedure to
calculate the factor loadings, as compared with the procedure shown below.
I do not recall seeing a peer-reviewed publication that used unweighted least squares EFA. The
criterion used to deem an eigenvector as a factor is an eigenvalue of 1 or greater.
Generalized Least Squares
Characteristic equation: Det(R - λI) = 0, where R is the correlation matrix among the observed
variables (i.e., the X matrix) with 1's on the diagonal.
This approach differs from principal components in that it relies upon a direct estimation of the
factor loadings, as compared with the procedure shown below.
I do not recall seeing a peer-reviewed publication that used generalized least squares EFA. The
criterion used to deem an eigenvector as a factor is an eigenvalue of 1 or greater.
The Gram-Schmidt (QR and QL) Algorithm
As noted in the attached paper by Yanovsky, the QR-decomposition (also called the QR
factorization) of a matrix is a decomposition of the matrix into an orthogonal matrix and a
triangular matrix. Note: In this algorithm, the number of rows in the correlation matrix is
referenced with the letter k (rather than the letter m, which is used in the notes above).
1. Define the magnitude of X = ||X||, which is the length of X.
||X|| = [x1² + x2² + … + xk²]^1/2
2. Two or more vectors are orthogonal if they all have a length of 1 and are uncorrelated with
one another (cos θ = 0).
3. Consider two sets of orthogonal vectors:
{x1, x2, x3}
{q1, q2, q3}
where the set q is a linear combination of the set x (i.e., q is the same vector, rotated).
4. If the set q is a linear combination of the set x, then q and x have the same eigenvalues.
5. Thus, by creating successive sets of q, the QR algorithm can iteratively arrive at the set of
eigenvalues describing x.
6. The QR and QL algorithms are identical, except that the QL algorithm produces a lower rather
than an upper triangular matrix in the decomposition. Thus, if one conducts EFA on the same data using two
different statistical software packages, wherein one uses the QR and the other uses the QL
algorithm, then the parameter estimates will be identical but lined up under different columns
(i.e., factors).
Steps in the Gram-Schmidt (QR and QL) Algorithm
1. Calculate rkk = [⟨xk, xk⟩]^1/2, which is the length of xk.
2. Set qk = (1/rkk)xk (i.e., Kaiser normalization of the vector xk).
3. Calculate rkj = ⟨xj, qk⟩, wherein q = x rotated.
4. Replace xj by xj - rkjqk (i.e., remove from xj the part already captured by qk).
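A minimal sketch of these steps (assuming Python with NumPy; this is a textbook classical Gram-Schmidt QR factorization, not the optimized routine inside any particular package). Repeating the decomposition in a QR iteration drives the matrix toward a form whose diagonal holds its eigenvalues, which is how the eigenvalues of a correlation matrix are obtained in practice.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR decomposition: A = Q R (Q orthonormal, R upper triangular)."""
    A = A.astype(float)
    k = A.shape[1]
    Q = np.zeros_like(A)
    R = np.zeros((k, k))
    for j in range(k):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]      # r_ij = <x_j, q_i>
            v -= R[i, j] * Q[:, i]           # remove the part already carried by q_i
        R[j, j] = np.sqrt(v @ v)             # r_jj = length of what remains
        Q[:, j] = v / R[j, j]                # normalize to unit length
    return Q, R

# QR iteration: A_(t+1) = R_t Q_t; for a symmetric matrix the diagonal
# converges to the eigenvalues.
corr = np.array([[1.0, 0.6, 0.1, 0.1],
                 [0.6, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.5],
                 [0.1, 0.1, 0.5, 1.0]])

A = corr.copy()
for _ in range(100):
    Q, R = gram_schmidt_qr(A)
    A = R @ Q

print(np.diag(A))                           # approximate eigenvalues
print(np.linalg.eigvalsh(corr)[::-1])       # check against the library routine
```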
Rotation
The Gram-Schmidt Algorithm projects the k eigenvectors within a space of k dimensions.
These initial vectors can be difficult to interpret. The purpose of rotation is to find a simpler and
more easily interpretable pattern matrix while retaining the number of factors and the final
communalities of each of the observed variables in X.
Rotation assumes either orthogonal axes (a 90° angle, indicating no correlation among the
factors) or oblique axes (angles other than 90°, indicating correlations among the factors).
There are three approaches to rotation.
Graphic (not commonly used).
Orthogonal: Rotate the axes by visual inspection of the vectors.
Oblique:
1. Establish a reference axis that is perpendicular to a "primary" axis (the vector with the
largest eigenvalue).
2. Plot the second vector.
3. Measure θ, the angle between F1 and F2.
4. cos θ = the correlation between F1 and F2.
Rotation to a Target Matrix (not commonly used).
1. Specify a pattern matrix (rotated factor pattern) of interest.
2. Rotate the eigenvectors to this matrix.
3. Use hypothesis testing to determine the extent to which the pattern matrix equals the
theoretically derived target matrix.
Analytic (commonly used).
Orthogonal:
1. Varimax (most commonly used): maximize the variance of the squared factor loadings within the
columns of the factor pattern. That is, maximize the interpretability of the factors. (A minimal sketch
follows this list.)
2. Quartimax (not often used): maximize the variance of the squared factor loadings within the rows of
the factor pattern. That is, maximize the interpretability of the observed variables.
3. See also: Equimax, Biquartimax.
Oblique:
1. Minimize errors in estimating θ, the angle between F1 and F2.
2. See: Harris-Kaiser (used in SAS), direct oblimin (used in SPSS), Quartimin, Covarimin,
Bivarimin, Oblimax, and Maxplane.
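A sketch of the varimax criterion referenced above (assuming Python with NumPy; the loading matrix is invented, and the routine is the widely used SVD-based orthomax/varimax algorithm rather than the specific implementation in SAS or SPSS):

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix L (items x factors)."""
    n, k = L.shape
    T = np.eye(k)                     # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient step for the varimax criterion: the variance of the
        # squared loadings within each column of the rotated pattern.
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / n) * Lr @ np.diag((Lr**2).sum(axis=0))))
        T = u @ vt
        if s.sum() < d * (1 + tol):   # stop when the criterion no longer improves
            break
        d = s.sum()
    return L @ T

# Invented unrotated loadings: a clean two-factor structure rotated 30 degrees away.
L = np.array([[0.693, -0.400],
              [0.606, -0.350],
              [0.520, -0.300],
              [0.400,  0.693],
              [0.350,  0.606],
              [0.300,  0.520]])

rotated = varimax(L)
print(rotated.round(2))                  # near-simple structure (columns may be
                                         # reordered or sign-flipped)
print((L**2).sum(axis=1).round(3))       # communalities before rotation
print((rotated**2).sum(axis=1).round(3)) # identical communalities after rotation
```

Note that the communalities printed before and after rotation are identical, which is the "retaining the final communalities" requirement described at the start of this section.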
Normalization
After rotation from oblique procedures, the resulting vectors are no longer of unit length.
Normalization (see: Kaiser Normalization) resets the vectors to a standardized length of 1.