K. Adachi and N. Trendafilov

Orthogonal Factor Analysis
Subject to Direct Sparseness
Constraint on Loadings
Kohei Adachi
Osaka University, Japan
Nickolay T. Trendafilov
The Open University, UK
1. Introduction
Starting with the FA model, we introduce Sparse
Orthogonal FA as a procedure for overcoming the
problem of Confirmatory FA, in five slides.
1.1. FA Model
1.2. Problem of CFA
1.3. Automatic CFA by SOFA
1.4. Differences from Sparse PCA
1.5. Organization of Remaining Parts
1.1. FA (Factor Analysis) Model
The FA model with m factors can be written as

X ≈ FΛ' + UΨ
(n × p) ≈ (n × m)(m × p) + (n × p)(p × p)

for a standardized n-observations × p-variables data matrix X, where F holds the common factors, Λ the loadings, U the unique factors, and Ψ the (diagonal) unique variances. The aim of FA is to estimate Λ, Ψ, and Φ (the factor correlations).

FA is classified into EFA (exploratory FA), without any constraint, and CFA (confirmatory FA), in which some loadings in Λ are constrained to be zero.
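To make the model concrete, here is a minimal simulation sketch (our added illustration, not the authors' code), with F and U scaled so that F'F = nI_m, U'U = nI_p, and F'U = O, as required by the constraints introduced later in Section 2.3:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 100, 6, 2

# Z = [F, U]: orthonormal columns via QR, scaled so that Z'Z = n * I
Z, _ = np.linalg.qr(rng.standard_normal((n, m + p)))
Z *= np.sqrt(n)
F, U = Z[:, :m], Z[:, m:]

Lam = rng.standard_normal((p, m))         # loadings Lambda (p x m)
Psi = np.diag(rng.uniform(0.3, 0.8, p))   # diagonal unique variances Psi

X = F @ Lam.T + U @ Psi                   # data following X = F Lambda' + U Psi
```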
1.2. Problem of CFA
A CFA model is illustrated by a path diagram in which the pairs of variables and factors with nonzero loadings are linked. For example, linking Var.1, Var.3, and Var.4 to Fac.1 (loadings λ11, λ31, λ41) and Var.2, Var.4, and Var.5 to Fac.2 (loadings λ22, λ42, λ52) corresponds to

        | λ11   0  |
        |  0   λ22 |
    Λ = | λ31   0  |
        | λ41  λ42 |
        |  0   λ52 |

A problem of CFA is that its users must specify a priori the constraints on Λ, i.e., how variables are linked to factors.

To deal with this problem, we propose a procedure for computationally identifying the optimal CFA model among all possible models with Φ = I (identity).
1.3. Automatic CFA by SOFA
We call our proposed procedure SOFA, abbreviating Sparse Orthogonal FA, as it seeks a sparse Λ including zero loadings, while Φ = I is assumed.

Let us write SP(Λ) for the sparseness of Λ (i.e., the number of zero loadings). Then, SOFA is formulated as:

SOFA:
[A] Min_{Λ,Ψ} f(Λ, Ψ) s.t. SP(Λ) = an integer q
[B] Perform [A] over q = qmin, …, qmax to select the best q

SOFA allows us to find the optimal orthogonal CFA model among all possible ones.
1.4. Differences from Sparse PCA
First, SOFA is based on the FA model,

‖X − (FΛ' + UΨ)‖²,

not on the PCA model ‖X − FΛ'‖², which has no UΨ term.

Second, in SOFA, sparseness is directly constrained as

Min_{Λ,Ψ} f(Λ, Ψ) s.t. SP(Λ) = an integer q,

without using a penalty, in contrast to the existing sparse PCA formulated as

Min over Λ of fPCA(Λ) + Penalty(Λ).
1.5. Organization of Remaining Parts
SOFA:
[A] Min_{Λ,Ψ} f(Λ, Ψ) s.t. SP(Λ) = an integer q
[B] Perform [A] over q = qmin, …, qmax to select the best q

2. Loss Function: introduces f(Λ, Ψ)
3. Algorithm: describes [A]
4. Sparseness Selection: describes [B]
5. Simulation Study
6. Examples
7. Discussion
2. Loss Function
We present the loss function to be minimized and
formulate SOFA.
2.1. What Function is Selected?
2.2. Selected Function
2.3. Formulation of SOFA
2.1. What Function is Selected?
FA is formulated with several types of loss functions. Among them, we select a function that can be rewritten as

f(Λ, Ψ) = h(Ψ) + c‖Λ − A‖²,

where h(Ψ) is irrelevant to Λ, c > 0 is a constant, and A = (aij) is a given matrix. The minimization of this over Λ = (λij) s.t. SP(Λ) = q is easily attained by

λij = aij   if |aij| is among the pm − q largest absolute values in A,
λij = 0     otherwise,

i.e., the q elements of A that are smallest in absolute value are replaced by zeros.
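As a quick illustration of this update (our own sketch, not the authors' code):

```python
import numpy as np

def update_lambda(A, q):
    """Minimize ||Lambda - A||^2 subject to exactly q zero entries:
    keep A, but zero out its q smallest entries in absolute value."""
    Lam = A.copy()
    idx = np.argsort(np.abs(A), axis=None)[:q]   # q smallest |a_ij|
    Lam.ravel()[idx] = 0.0
    return Lam
```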
2.2. Selected Function
As such a function, we select

f(F, U, Λ, Ψ) = ‖X − (FΛ' + UΨ)‖²   (1)

(de Leeuw, 2004; Unkel & Trendafilov, 2011), which can be written in the form

f(F, U, Λ, Ψ) = h(Ψ) + n‖Λ − A‖²,  with h(Ψ) = ‖X − (FA' + UΨ)‖² and A = n⁻¹X'F.

Though (1) is a function of F, U, Λ, and Ψ, we show later that (1) can be minimized through the updates of Λ and Ψ only.
2.3. Formulation of SOFA
So, our proposed SOFA is formulated as

Min f(F, U, Λ, Ψ) = ‖X − (FΛ' + UΨ)‖²

subject to
SP(Λ) = q          (sparseness constraint)
F'F = nI_m         (orthogonal common factors)
U'U = nI_p         (orthogonal unique factors)
F'U = O_{m×p}      (orthogonal common vs. unique factors)
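For reference, the same problem in a single display (our LaTeX restatement of the constraints above; the diagonality of Ψ is implicit in the slides):

```latex
\min_{F,\,U,\,\Lambda,\,\Psi}\ \bigl\lVert X - (F\Lambda' + U\Psi)\bigr\rVert^2
\quad\text{s.t.}\quad
\mathrm{SP}(\Lambda)=q,\quad
F'F = nI_m,\quad
U'U = nI_p,\quad
F'U = O_{m\times p},\quad
\Psi\ \text{diagonal}.
```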
3. Algorithm
We detail the algorithm for SOFA.
3.1. Overview
3.2. Update of Λ and Ψ
3.3. Update of n⁻¹X'Z (1)
3.4. Update of n⁻¹X'Z (2)
3.5. Whole Algorithm
3.6. Multiple Runs
3.1. Overview
To minimize

f(F, U, Λ, Ψ) = ‖X − (FΛ' + UΨ)‖²,

we consider an ALS algorithm in which Λ, Ψ, and Z = [F, U] are alternately updated, with the common and unique factors combined in the n × (m + p) matrix Z = [F, U].

However, in Slide 3.3, we show that there is no need to update Z, and further no need for the data matrix X, if the covariance matrix S = n⁻¹X'X is available.
3.2. Update of Λ, Ψ
Min ‖X − (FΛ' + UΨ)‖² with F, Λ, U fixed is attained by

Ψ = diag(n⁻¹X'U).   (1)

Min ‖X − (FΛ' + UΨ)‖² s.t. SP(Λ) = q with F, Ψ, U fixed is, as shown in 2.1 and 2.2, rewritten as h(Ψ) + n‖Λ − A‖², so that Λ is obtained from

A = n⁻¹X'F.   (2)

Note: (1) and (2) show that Λ and Ψ follow from n⁻¹X'[F, U] = n⁻¹X'Z.
3.3. Update of n⁻¹X'Z (1)
We use two slides to show how n⁻¹X'Z is updated. Our task is

Min_Z ‖X − (FΛ' + UΨ)‖² = ‖X − ZB'‖²  s.t.  n⁻¹Z'Z = I_{m+p},

where B = [Λ, Ψ], and n⁻¹Z'Z = I_{m+p} summarizes F'F = nI_m, U'U = nI_p, and F'U = O.

The minimum is attained using the SVD

XB / √n = P₁D₁Q₁'

(P₁: n × p, D₁: p × p diagonal, Q₁: (m + p) × p), which gives

Z = √n (P₁Q₁' + P₂Q₂'),

with P₂ and Q₂ completing [P₁, P₂] and [Q₁, Q₂] to column-orthonormal matrices. This Z is not unique, but n⁻¹X'Z is unique, as shown next.
3.4. Update of n⁻¹X'Z (2)
The two equations

X / √n = XBB⁺ / √n = P₁D₁Q₁'B⁺   (using BB⁺ = I_p, as B has full row rank p)
Z = √n (P₁Q₁' + P₂Q₂')

imply that the matrix giving Λ, Ψ is rewritten as

n⁻¹X'Z = (P₁D₁Q₁'B⁺)'(P₁Q₁' + P₂Q₂') = (B⁺)'Q₁D₁Q₁'

(using P₁'P₁ = I_p and P₁'P₂ = O), which can be obtained from the EVD

B'SB = Q₁D₁²Q₁'

derived from the SVD, where S = n⁻¹X'X is the sample covariance matrix.
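As a sanity check (our own illustrative script, not from the slides), one can verify numerically that n⁻¹X'Z computed from the EVD of B'SB matches the SVD-based construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 200, 5, 2
X = rng.standard_normal((n, p))
B = np.hstack([rng.standard_normal((p, m)),          # B = [Lambda, Psi]
               np.diag(rng.uniform(0.3, 1.0, p))])
S = X.T @ X / n                                      # sample covariance

# Route 1: SVD of XB / sqrt(n); since X'P2 = O, n^{-1}X'Z = X'P1Q1' / sqrt(n)
P, d, Qt = np.linalg.svd(X @ B / np.sqrt(n), full_matrices=False)
P1, Q1t = P[:, :p], Qt[:p]
XZ_svd = X.T @ P1 @ Q1t / np.sqrt(n)

# Route 2: EVD of B'SB = Q1 D1^2 Q1', then (B^+)' Q1 D1 Q1'
evals, Q = np.linalg.eigh(B.T @ S @ B)
idx = np.argsort(evals)[::-1][:p]
D1, Q1 = np.sqrt(np.clip(evals[idx], 0, None)), Q[:, idx]
XZ_evd = np.linalg.pinv(B).T @ (Q1 * D1) @ Q1.T

print(np.allclose(XZ_svd, XZ_evd))                   # True
```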
3.5. Whole Algorithm

‖X − (FΛ' + UΨ)‖² = ‖X − (FA' + UΨ)‖² + n‖Λ − A‖²

monotonically decreases with the following algorithm:

1. Initialize B = [Λ, Ψ] randomly
2. Perform the EVD B'SB = Q₁D₁²Q₁'
3. Obtain (B⁺)'Q₁D₁Q₁' (= n⁻¹X'Z)
4. Update Ψ
5. Obtain A to update Λ
6. Finish, or go back to 2 with B = [Λ, Ψ]

Here, we find that SOFA only needs S = n⁻¹X'X.
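Putting the pieces together, here is a compact sketch of steps 1-6 in code (our own illustrative implementation under the stated constraints, not the authors' program; names and defaults are ours):

```python
import numpy as np

def sofa_step_A(S, m, q, n_iter=500, tol=1e-9, seed=None):
    """Illustrative sketch of SOFA step [A]: minimize the loss s.t.
    SP(Lambda) = q, given only the sample covariance S = X'X / n."""
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    Lam = rng.standard_normal((p, m))            # 1. random start
    psi = rng.uniform(0.3, 1.0, p)
    for _ in range(n_iter):
        B = np.hstack([Lam, np.diag(psi)])       # B = [Lambda, Psi]
        evals, Q = np.linalg.eigh(B.T @ S @ B)   # 2. EVD B'SB = Q1 D1^2 Q1'
        idx = np.argsort(evals)[::-1][:p]
        D1, Q1 = np.sqrt(np.clip(evals[idx], 0, None)), Q[:, idx]
        XZ = np.linalg.pinv(B).T @ (Q1 * D1) @ Q1.T   # 3. n^{-1} X'Z
        A, XU = XZ[:, :m], XZ[:, m:]
        psi = np.diag(XU).copy()                 # 4. Psi = diag(n^{-1} X'U)
        Lam_new = A.copy()                       # 5. hard-threshold A -> Lambda
        Lam_new.ravel()[np.argsort(np.abs(A), axis=None)[:q]] = 0.0
        if np.linalg.norm(Lam_new - Lam) < tol:  # 6. finish or loop
            return Lam_new, psi
        Lam = Lam_new
    return Lam, psi
```

Step [B] would then call sofa_step_A over q = qmin, …, qmax and pick the q minimizing BIC (Section 4).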
3.6. Multiple Runs
SOFA is sensitive to local minima. So, we take the following multiple-runs procedure:

1. We run the algorithm 50 times with different starts and check whether two equivalent solutions attain the lowest loss function value.
2. If such solutions are found, we finish, selecting them as the optimal ones; otherwise, go to 3.
3. We run the algorithm further with different starts, until two equivalent solutions attain the lowest loss function value.
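A sketch of this procedure (our own; it reuses sofa_step_A from the previous sketch, and evaluates the loss at the optimal Z via the SVD quantities of Slide 3.3, since f/n = tr S − 2 tr D₁ + ‖B‖² under the constraints; judging "equivalent" by equal losses is our simplification):

```python
import numpy as np

def sofa_loss(S, Lam, psi):
    """Loss / n, evaluated at the optimal Z for the given (Lambda, Psi)."""
    B = np.hstack([Lam, np.diag(psi)])
    evals = np.linalg.eigvalsh(B.T @ S @ B)
    d1 = np.sqrt(np.clip(np.sort(evals)[::-1][:S.shape[0]], 0, None))
    return np.trace(S) - 2 * d1.sum() + (B ** 2).sum()

def sofa_multistart(S, m, q, runs=50, tol=1e-6):
    results = [sofa_step_A(S, m, q, seed=r) for r in range(runs)]
    losses = [sofa_loss(S, L, p_) for L, p_ in results]
    best = np.argsort(losses)
    # accept if the two best runs reached (numerically) the same minimum
    if abs(losses[best[0]] - losses[best[1]]) < tol:
        return results[best[0]]
    raise RuntimeError("no two equivalent lowest-loss solutions; add more runs")
```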
4. Sparseness Selection
We present our sparseness selection procedure
with just one slide.
4.1. Selection using BIC
4.1. Selection using BIC
SOFA:
[A] Min_{Λ,Ψ} f(Λ, Ψ) s.t. SP(Λ) = an integer q
[B] Perform [A] over q = qmin, …, qmax to select the best q

In the last section, we described [A]. For [B], we use BIC, expressed as

BIC(q) ≅ −2 × log-likelihood − q × log n,

since each of the q zero constraints removes one parameter. That is, [B] is formulated as

Best q = argmin BIC(q) over q = qmin, …, qmax.

We empirically found that SOFA solutions were almost equivalent to ML ones, which validates using the ML-based BIC for the LS-based SOFA solutions.
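A minimal sketch of this criterion (ours, not the authors' code), assuming the Gaussian log-likelihood with model covariance Σ = ΛΛ' + Ψ², which follows from the orthogonality constraints:

```python
import numpy as np

def bic_q(S, Lam, psi, n, q):
    """Sketch: BIC(q) = -2 log-likelihood - q log n, up to terms constant
    in q, with Gaussian likelihood and Sigma = Lam Lam' + diag(psi)^2."""
    Sigma = Lam @ Lam.T + np.diag(psi ** 2)
    _, logdet = np.linalg.slogdet(Sigma)
    m2ll = n * (logdet + np.trace(S @ np.linalg.inv(Sigma)))
    return m2ll - q * np.log(n)

# step [B]: pick the q with the smallest BIC
# best = min(range(q_min, q_max + 1),
#            key=lambda q: bic_q(S, *sofa_multistart(S, m, q), n, q))
```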
5. Simulation Study
We briefly report a simulation study whose
purpose is to assess how well the true sparseness
and parameters are recovered by SOFA.
5.1. True Parameters
5.2. Recovery
5.1. True Parameters
We synthesized 40 true Λ's for each of five structures, which included the Simple Structure and the Bi-factor Structure:

[Figure: the five loading-structure patterns; a "#" cell is a nonzero loading, and a "?" cell had 0 or a non-zero value randomly.]

The resulting Λ, Ψ gave 200 (= 40 × 5) correlation matrices to be analyzed by SOFA.
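As a rough illustration of how such a pattern might be synthesized (our own sketch; the block sizes and value ranges are invented, not the authors' settings):

```python
import numpy as np

def make_true_lambda(p=12, m=3, seed=None):
    """Illustrative simple-structure pattern: each variable loads on one
    factor ('#' cells), and each remaining '?' cell is set to 0 or a
    nonzero value at random."""
    rng = np.random.default_rng(seed)
    Lam = np.zeros((p, m))
    block = p // m
    for j in range(m):
        Lam[j * block:(j + 1) * block, j] = rng.uniform(0.4, 0.9, block)
    mask = Lam == 0                              # the '?' cells
    coin = rng.random((p, m)) < 0.5              # half of them get nonzeros
    Lam[mask & coin] = rng.uniform(-0.3, 0.3, (p, m))[mask & coin]
    return Lam
```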
5.2. Recovery
The medians and worst 5th percentiles of the index values among the 200 solutions are shown here.

Index                                       Median   Worst 5%
1. (q̂ − q) / q                              0.000    −0.133
2. Rate of correctly identified zeros        1.000     0.843
3. Rate of correctly identified non-zeros    1.000     0.972
4. Average of |λ̂ij − λij|                    0.021     0.040
5. Average of |ψ̂i − ψi|                      0.038     0.056

1: The true sparseness was selected well by BIC.
2, 3: The true structures were recovered well.
4, 5: The true parameter values were recovered well.
6. Examples
We illustrate SOFA with two famous data sets which have often been used for testing FA procedures.
6.1. Box Problem Data
6.2. Twenty-four Psychological Test Data
6.1. Box Problem Data
The first example is the 3-factor solution for the 400 × 20 box data matrix generated following Thurstone (1940). BIC was the lowest for q = 27, and the corresponding solution, shown here, exhibits the exact simple structure.

[Table: loadings Λ and unique variances Ψ for the box variables, e.g., x, y, z, x², y², z², xy, xz, yz, (x² + y²)^1/2, 2x + 2y, log x, xyz, and e^x; the nonzero loadings range from about 0.47 to 0.96 and the Ψ values from about 0.28 to 0.72.]
6.2. Twenty-four Psychological Test Data
The second example is the 4-factor solution for the twenty-four psychological test data. BIC was the lowest for q = 35, and the corresponding solution is shown here. The loadings show the bi-factor structure matching the ones found in previous studies using EFA and CFA.

[Table: loadings (factors 1-4) and unique variances Ψ for the 24 test variables, grouped by ability into Spatial Perception (Visual Perception, Cubes, Paper Form Board, Flags), Verbal Processing (General Information, Paragraph Comprehension, Sentence Completion, Word Classification, Word Meaning), Speed of Performances (Addition, Code, Counting Dots, Straight-Curved Capitals), Memory (Word Recognition, Number Recognition, Figure Recognition, Object-Number, Number-Figure, Figure-Word), and reasoning problems (Deduction, Numerical Puzzles, Problem Reasoning, Series Completion, Arithmetic Problems); the first factor has nonzero loadings on all 24 variables, acting as a general factor.]
7. Discussion
After summarizing SOFA, we discuss its advantages over the existing CFA and EFA.
7.1. Summary
7.2. SOFA vs CFA
7.3. SOFA vs EFA (Rotation)
7.1. Summary
We propose SOFA, formulated as
[A] Min_{Λ,Ψ} f(Λ, Ψ) s.t. SP(Λ) = an integer q
[B] Perform [A] over q = qmin, …, qmax to select the best q

For [A], we developed the ALS algorithm for minimizing

‖X − (FΛ' + UΨ)‖²  s.t.  SP(Λ) = q,  n⁻¹[F, U]'[F, U] = I_{m+p},

which can be carried out whenever the sample covariances are available.

For [B], we propose selecting the sparseness q using BIC.

Numerical studies demonstrated that SOFA successfully selects q, recovers the sparse structure in Λ, and estimates Λ, Ψ.
7.2. SOFA vs CFA
SOFA overcomes the problem of CFA that the locations of zero loadings must be specified by users: SOFA computationally finds the optimal CFA model.

However, SOFA solutions are restricted to orthogonal ones, so an oblique version of SOFA remains to be considered in future studies.
7.3. SOFA vs EFA (Rotation)
Compared with SOFA, two drawbacks are found in EFA, in which a loading matrix Λ₀ is rotated so that the resulting Λ₀T has a quasi-sparse structure; this term implies that Λ₀T cannot include exactly zero loadings.

[1] Users must resort to viewing some loadings as approximately zero, which is subjective.

[2] Rotation does not involve the original data, i.e., a function of Λ₀T alone is optimized,

in contrast to SOFA, in which the FA model with the sparseness constraint is optimally fitted to the data so as to find the sparse structure underlying them.