Orthogonal Factor Analysis Subject to Direct Sparseness Constraint on Loadings
Kohei Adachi (Osaka University, Japan)
Nickolay T. Trendafilov (The Open University, UK)

1. Introduction
Starting with the FA model, we introduce Sparse Orthogonal FA as a procedure for overcoming the problem of confirmatory FA, in five slides.
1.1. FA Model
1.2. Problem of CFA
1.3. Automatic CFA by SOFA
1.4. Differences from Sparse PCA
1.5. Remaining Parts

1.1. FA (Factor Analysis) Model
The FA model with m factors can be written as
  X ≅ FΛ′ + UΨ
for a standardized n-observations × p-variables data matrix X, where F (n × m) contains common factor scores, Λ (p × m) contains loadings, U (n × p) contains unique factor scores, and Ψ (p × p) is the diagonal matrix of unique variances.
The aim of FA is to estimate Λ, Ψ, and Φ (factor correlations).
FA is classified into EFA (exploratory FA), without any constraint, and CFA (confirmatory FA), in which some loadings in Λ are constrained to be zero.

1.2. Problem of CFA
A CFA model is illustrated by a path diagram in which the pairs of variables and factors with nonzero loadings are linked.
[Path diagram: Var.1 to Var.5 linked to Fac.1 and Fac.2 by their nonzero loadings]
The diagram corresponds to
  Λ = | λ11   0  |
      |  0   λ22 |
      | λ31   0  |
      | λ41  λ42 |
      |  0   λ52 |
A problem of CFA is that its users must specify the constraints on Λ a priori, i.e., how variables are linked to factors.
To deal with this problem, we propose a procedure for computationally identifying the optimal CFA model among all possible models with Φ = I (identity).

1.3. Automatic CFA by SOFA
We call our proposed procedure SOFA, abbreviating Sparse Orthogonal FA, as it seeks a sparse Λ including zero loadings, and Φ = I is assumed.
Let us write SP(Λ) for the sparseness of Λ (i.e., the number of zero loadings). Then SOFA is formulated as:
SOFA: [A] min over Λ, Ψ of f(Λ, Ψ) s.t. SP(Λ) = q (an integer)
      [B] Perform [A] over q = qmin, …, qmax to select the best q
SOFA allows us to find the optimal orthogonal CFA model among all possible ones.

1.4. Differences from Sparse PCA
First, SOFA is based on the FA model X ≅ FΛ′ + UΨ, not on the PCA model X ≅ FΛ′ without the unique part UΨ.
Second, in SOFA sparseness is directly constrained, as
  min over Λ, Ψ of f(Λ, Ψ) s.t. SP(Λ) = q (an integer),
without using a penalty, in contrast to the existing sparse PCA, formulated as
  min over Λ of fPCA(Λ) + Penalty(Λ).

1.5. Organization of Remaining Parts
SOFA: [A] min over Λ, Ψ of f(Λ, Ψ) s.t. SP(Λ) = q (an integer)
      [B] Perform [A] over q = qmin, …, qmax to select the best q
2. Loss Function: introduce f(Λ, Ψ)
3. Algorithm: describe [A]
4. Sparseness Selection: describe [B]
5. Simulation Study
6. Examples
7. Discussion

2. Loss Function
We present the loss function to be minimized and formulate SOFA.
2.1. What Function is Selected?
2.2. Selected Function
2.3. Formulation of SOFA

2.1. What Function is Selected?
FA can be formulated with several types of loss functions. Among them, we select a function that can be rewritten as
  f(Λ, Ψ) = h + c‖Λ − A‖²
with h irrelevant to Λ, a constant c > 0, and a given matrix A = (aij); Λ = (λij).
The minimization of this over Λ s.t. SP(Λ) = q is easily attained by
  λij = aij if |aij| is among the pm − q largest absolute values in A, and λij = 0 otherwise,
i.e., the q smallest absolute entries of A are set to zero.

2.2. Selected Function
As such a function, we select
  f(F, U, Λ, Ψ) = ‖X − (FΛ′ + UΨ)‖²   (1)
(de Leeuw, 2004; Unkel & Trendafilov, 2011), which can be written in the form
  f(F, U, Λ, Ψ) = h(F, U, Ψ) + n‖Λ − A‖²  with  A = n⁻¹X′F.
Though (1) is a function of F, U, Λ, and Ψ, we show later that (1) can be minimized with the updates of Λ and Ψ only.

2.3. Formulation of SOFA
So, our proposed SOFA is formulated as
  min f(F, U, Λ, Ψ) = ‖X − (FΛ′ + UΨ)‖²
subject to
  SP(Λ) = q     (sparseness constraint)
  F′F = nIm     (orthogonal common factors)
  U′U = nIp     (orthogonal unique factors)
  F′U = Om×p    (common factors orthogonal to unique factors)
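The sparseness-constrained minimization over Λ in 2.1 is a simple hard-thresholding step. Below is a minimal NumPy sketch of that update; the function name and interface are our own illustration, not part of the slides.

```python
import numpy as np

def update_loadings(A, q):
    """Minimize ||Lambda - A||^2 subject to SP(Lambda) = q:
    set the q smallest absolute entries of A to zero, keep the rest."""
    Lam = A.copy()
    order = np.argsort(np.abs(Lam).ravel())  # entries sorted ascending by |a_ij|
    Lam.ravel()[order[:q]] = 0.0             # zero out the q smallest
    return Lam
```

For example, with A = np.array([[0.9, 0.1], [0.2, 0.8]]) and q = 2, the two small entries 0.1 and 0.2 are zeroed, leaving the two large loadings intact.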
3. Algorithm
We detail the algorithm for SOFA.
3.1. Overview
3.2. Update of Λ and Ψ
3.3. Update of n⁻¹X′Z (1)
3.4. Update of n⁻¹X′Z (2)
3.5. Whole Algorithm
3.6. Multiple Starts

3.1. Overview
To minimize f(F, U, Λ, Ψ) = ‖X − (FΛ′ + UΨ)‖², we consider an ALS algorithm in which Λ, Ψ, and Z = [F, U] are alternately updated, with the common and unique factors combined in the n × (m + p) matrix Z = [F, U].
However, in 3.3 we show that there is no need to update Z itself, and further no need for the data matrix X, if the covariance matrix S = n⁻¹X′X is available.

3.2. Update of Λ and Ψ
min ‖X − (FΛ′ + UΨ)‖² over Ψ, with F, Λ, and U fixed, is attained by
  Ψ = diag(n⁻¹X′U).   (1)
min ‖X − (FΛ′ + UΨ)‖² over Λ s.t. SP(Λ) = q, with F, Ψ, and U fixed, is attained as in 2.1 (remember the rewritten form h + n‖Λ − A‖²), with
  A = n⁻¹X′F.   (2)
Note: (1) and (2) show that Ψ and A are obtained from n⁻¹X′[F, U] = n⁻¹X′Z.

3.3. Update of n⁻¹X′Z (1)
We use two slides to show how n⁻¹X′Z is updated. With B = [Λ, Ψ] (p × (m + p)), our task is
  min over Z of ‖X − (FΛ′ + UΨ)‖² = ‖X − ZB′‖²  s.t.  n⁻¹Z′Z = Im+p,
where the constraint summarizes F′F = nIm, U′U = nIp, and F′U = O.
It is attained using the SVD
  (1/√n) XB = P₁D₁Q₁′
(P₁: n × p, D₁: p × p diagonal, Q₁: (m + p) × p) by
  Z = √n (P₁Q₁′ + P₂Q₂′),
where P₂ (n × m) and Q₂ ((m + p) × m) complete [P₁, P₂] and [Q₁, Q₂] to column-orthonormal matrices.
Z is not unique, but n⁻¹X′Z is unique, as shown next.

3.4. Update of n⁻¹X′Z (2)
The two equations
  (1/√n) X = (1/√n) XBB⁺ = P₁D₁Q₁′B⁺  and  (1/√n) Z = P₁Q₁′ + P₂Q₂′
imply that the matrix giving Ψ and A can be rewritten as
  n⁻¹X′Z = (P₁D₁Q₁′B⁺)′(P₁Q₁′ + P₂Q₂′) = B⁺′Q₁D₁Q₁′,
which can be obtained from the EVD
  B′SB = Q₁D₁²Q₁′
derived from the SVD above, with S = n⁻¹X′X the sample covariance matrix.

3.5. Whole Algorithm
‖X − (FΛ′ + UΨ)‖² = ‖X − (FA′ + UΨ)‖² + n‖Λ − A‖² monotonically decreases with the following algorithm:
1. Initialize B = [Λ, Ψ] randomly.
2. Perform the EVD B′SB = Q₁D₁²Q₁′.
3. Obtain n⁻¹X′Z = B⁺′Q₁D₁Q₁′.
4. Update Ψ from the last p columns.
5. Obtain A from the first m columns to update Λ.
6. Finish, or go back to 2 with B = [Λ, Ψ].
Here we find that SOFA needs only S = n⁻¹X′X.
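To make steps 1 to 6 concrete, here is a minimal NumPy sketch of one possible implementation of this ALS loop, based on our reading of the slides; the function name, the starting values, and the simple convergence check are our assumptions.

```python
import numpy as np

def sofa(S, m, q, n_iter=500, tol=1e-8, rng=None):
    """Sketch of the SOFA ALS iteration of 3.5 (assumptions: see text).
    S: p x p sample covariance matrix; m: number of common factors;
    q: required number of zero loadings, SP(Lambda) = q."""
    rng = np.random.default_rng() if rng is None else rng
    p = S.shape[0]
    Lam = rng.standard_normal((p, m))        # step 1: random start for Lambda
    psi = np.full(p, 0.5)                    # diagonal of Psi (arbitrary start)
    for _ in range(n_iter):
        B = np.hstack([Lam, np.diag(psi)])   # B = [Lambda, Psi], p x (m+p)
        # step 2: EVD of B'SB = Q1 D1^2 Q1' (keep the p leading eigenpairs)
        evals, evecs = np.linalg.eigh(B.T @ S @ B)
        idx = np.argsort(evals)[::-1][:p]
        D1 = np.sqrt(np.clip(evals[idx], 0.0, None))
        Q1 = evecs[:, idx]
        # step 3: n^-1 X'Z = B^+' Q1 D1 Q1'
        XZ = np.linalg.pinv(B).T @ (Q1 * D1) @ Q1.T
        # step 4: Psi = diag(n^-1 X'U), taken from the last p columns
        psi = np.diag(XZ[:, m:]).copy()
        # step 5: A = n^-1 X'F (first m columns); hard-threshold as in 2.1
        Lam_new = XZ[:, :m].copy()
        order = np.argsort(np.abs(Lam_new).ravel())
        Lam_new.ravel()[order[:q]] = 0.0
        if np.linalg.norm(Lam_new - Lam) < tol:  # our simple stop rule
            return Lam_new, psi
        Lam = Lam_new
    return Lam, psi
```

Given a sample correlation matrix S, a call like sofa(S, m=2, q=4) would return loadings with exactly four zero entries; the multiple-start procedure of the next slide would wrap such calls.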
3.6. Multiple Starts
SOFA is sensitive to local minima, so we take the following multiple-run procedure:
1. We run the algorithm 50 times from different starts and look for two equivalent solutions with the lowest loss function value.
2. If such solutions are found, we finish, selecting them as the optimal ones; otherwise, we go to 3.
3. We run the algorithm further with different starts, until two equivalent solutions with the lowest loss function value are found.

4. Sparseness Selection
We present our sparseness selection procedure with just one slide.
4.1. Selection using BIC

4.1. Selection using BIC
SOFA: [A] min over Λ, Ψ of f(Λ, Ψ) s.t. SP(Λ) = q (an integer)
      [B] Perform [A] over q = qmin, …, qmax to select the best q
In the last section, we described [A]. For [B], we use BIC, expressed as
  BIC(q) = −2 × log-likelihood − q log n.
That is, [B] is formulated as
  best q = argmin BIC(q) over q = qmin, …, qmax.
We empirically found that SOFA solutions were almost equivalent to ML ones, which validates using the ML-based BIC for the LS-based SOFA solutions.
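As an illustration of step [B], the sketch below sweeps q and scores each solution with an ML-based BIC, reusing the sofa() sketch shown after 3.5. The Gaussian likelihood with implied covariance ΛΛ′ + Ψ², and all function names, are our assumptions; the slides state only the BIC expression.

```python
import numpy as np

def bic_of(S, Lam, psi, n, q):
    """ML-based BIC(q) = -2 log-likelihood - q log n (up to constants in q),
    assuming the implied covariance Sigma = Lam Lam' + Psi^2."""
    Sigma = Lam @ Lam.T + np.diag(psi ** 2)
    _, logdet = np.linalg.slogdet(Sigma)
    neg2loglik = n * (logdet + np.trace(np.linalg.solve(Sigma, S)))
    return neg2loglik - q * np.log(n)

def select_sparseness(S, m, n, q_min, q_max):
    """Step [B]: run [A] for each q and keep the BIC-minimizing solution."""
    best = None
    for q in range(q_min, q_max + 1):
        Lam, psi = sofa(S, m, q)           # step [A] at sparseness q
        score = bic_of(S, Lam, psi, n, q)
        if best is None or score < best[0]:
            best = (score, q, Lam, psi)
    return best                             # (BIC, best q, Lambda, Psi diag)
```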
5. Simulation Study
We briefly report a simulation study whose purpose is to assess how well the true sparseness and parameters are recovered by SOFA.
5.1. True Parameters
5.2. Results

5.1. True Parameters
We synthesized true loading matrices Λ, 40 for each of five structures, including a simple structure and a bi-factor structure.
[Pattern matrices: a "#" cell is nonzero, and a "?" cell was randomly set to zero or a nonzero value; in the simple structure each variable loads on one factor, while the bi-factor structure has a general factor plus group factors.]
The resulting Λ and Ψ gave 200 (= 40 × 5) correlation matrices to be analyzed by SOFA.

5.2. Results
The medians and worst 5-percentiles of the index values among the 200 solutions are as follows:

  Index                                      Median   Worst 5%
  1. (q̂ − q)/q                               0.000    −0.133
  2. Rate of correctly identified zeros       1.000     0.843
  3. Rate of correctly identified non-zeros   1.000     0.972
  4. Average |λ̂ij − λij|                     0.021     0.040
  5. Average |ψ̂i − ψi|                       0.038     0.056

1: The true sparseness was selected well by BIC.
2, 3: The true structures were recovered well.
4, 5: The true parameter values were recovered well.

6. Examples
We illustrate SOFA with two famous data sets which have often been used for testing FA procedures.
6.1. Box Problem Data
6.2. Twenty-four Psychological Test Data

6.1. Box Problem Data
The first example is the 3-factor solution for the 400 × 20 box data matrix generated following Thurstone (1940), whose variables are functions of the box dimensions x, y, and z (e.g., x², xy, log x, xyz, e^x).
BIC was lowest for q = 27, and the corresponding solution exhibits the exact simple structure: each variable loads only on the factors for the dimensions it involves.
[Table of the sparse 20 × 3 loading matrix omitted.]

6.2. Twenty-four Psychological Test Data
The second is the 4-factor solution for the twenty-four psychological test data.
BIC was lowest for q = 35. The loadings showed the bi-factor structure matching the ones found in previous studies using EFA and CFA, with the variables grouped into spatial perception, verbal processing, speed of performance, memory, and mathematics abilities.
[Table of the sparse 24 × 4 loading matrix omitted.]

7. Discussion
After summarizing SOFA, we discuss its advantages over the existing CFA and EFA.
7.1. Summary
7.2. SOFA vs CFA
7.3. SOFA vs EFA (Rotation)

7.1. Summary
We proposed SOFA, formulated as
  [A] min over Λ, Ψ of f(Λ, Ψ) s.t. SP(Λ) = q (an integer)
  [B] Perform [A] over q = qmin, …, qmax to select the best q.
For [A], we developed an ALS algorithm for minimizing ‖X − (FΛ′ + UΨ)‖² s.t. SP(Λ) = q and [F, U]′[F, U] = nIm+p, which needs only the sample covariance matrix.
For [B], we proposed selecting the sparseness q using BIC.
Numerical studies demonstrated that SOFA successfully selects q, recovers the sparse structure in Λ, and estimates Λ and Ψ.

7.2. SOFA vs CFA
SOFA overcomes the problem of CFA that the locations of zero loadings must be specified by users: SOFA computationally finds the optimal CFA model.
But SOFA solutions are restricted to orthogonal ones, so an oblique version of SOFA remains to be considered in future studies.

7.3. SOFA vs EFA (Rotation)
Compared with SOFA, two drawbacks are found in EFA, in which an initial loading matrix Λ₀ is rotated so that the resulting Λ₀T has a quasi-sparse structure; this term implies that Λ₀T cannot include exact zero loadings.
[1] Users must resort to viewing some loadings as approximately zero, which is subjective.
[2] Rotation does not involve the original data; i.e., a function of Λ₀T alone is optimized, in contrast to SOFA, in which the FA model with the sparseness constraint is optimally fitted to the data to find the sparse structure underlying them.