Canonical Correlations Canonical Correlations I Like Principal Components Analysis, Canonical Correlation Analysis looks for interesting linear combinations of multivariate observations. I In Canonical Correlation Analysis, a multivariate response X is partitioned into two groups: X(1) p×1 X = (2) (p+q)×1 X q×1 I We look for linear combinations U = a0 X(1) and V = b0 X(2) that are highly correlated. NC STATE UNIVERSITY 1/1 Statistics 784 Multivariate Analysis Canonical Correlations I The covariance matrix of X is correspondingly partitioned: Σ1,1 Σ1,2 Cov(X) = Σ = Σ2,1 Σ2,2 I Then I Without loss of generality, Var(U) = a0 Σ1,1 a and Var(V ) = b0 Σ2,2 b can both be constrained to equal 1. I The first pair of canonical variables are the U1 and V1 that maximize the correlation, subject to this constraint. a0 Σ1,2 b p Corr(U, V ) = p 0 a Σ1,1 a b0 Σ2,2 b NC STATE UNIVERSITY 2/1 Statistics 784 Multivariate Analysis Canonical Correlations I Differentiation shows that the maximizing a and b satisfy Σ1,2 b − αΣ1,1 a = 0 Σ2,1 a − βΣ2,2 b = 0 for some α and β. I Then Corr(U, V ) = a0 Σ1,2 b = αa0 Σ1,1 a = α = βb0 Σ2,2 b = β so α = β = √ λ, say. NC STATE UNIVERSITY 3/1 Statistics 784 Multivariate Analysis Canonical Correlations I We assume that Σ1,1 and Σ2,2 are invertible, and then √ λa Σ−1 1,1 Σ1,2 b = √ −1 Σ2,2 Σ2,1 a = λb I Eliminate b: −1 Σ−1 1,1 Σ1,2 Σ2,2 Σ2,1 a = λa. I −1 That is, a is an eigenvector of Σ−1 1,1 Σ1,2 Σ2,2 Σ2,1 and b is a multiple of Σ−1 2,2 Σ2,1 a. I −1 Consequently, b is an eigenvector of Σ−1 2,2 Σ2,1 Σ1,1 Σ1,2 with the same eigenvalue as a. NC STATE UNIVERSITY 4/1 Statistics 784 Multivariate Analysis Canonical Correlations I √ Since Corr(U, V ) = λ, the first canonical pair, U1 and V1 , is determined by the largest eigenvalue λ1 = Corr(U1 , V1 )2 = ρ∗1 2 . I We then look for the pair U2 and V2 with the greatest correlation, subject to Corr(U1 , U2 ) = Corr(V1 , V2 ) = 0. I As a bonus, Corr(U1 , V2 ) = Corr(U2 , V1 ) = 0. I Not surprisingly, this second canonical pair, U2 and V2 , is determined by the next largest eigenvalue λ2 = Corr(U2 , V2 )2 = ρ∗2 2 . I Continuing the process, we find r = min(p, q) canonical pairs. NC STATE UNIVERSITY 5/1 Statistics 784 Multivariate Analysis Canonical Correlations Example: head and leg measurements of chicken bones I Head group: I I I Leg group: I I I skull length; skull breadth. femur length; tibia length. SAS proc cancorr program and output. NC STATE UNIVERSITY 6/1 Statistics 784 Multivariate Analysis Canonical Correlations The number of canonical pairs I In sample data, all r = min(p, q) canonical correlations will be positive, even if the data are drawn from a population with only m < r positively correlated pairs. I proc cancorr tests the hypothesis H0 : ρ∗m = ρ∗m+1 = · · · = ρ∗r = 0 for each value of m, m = 1, 2, . . . , r . I The first null hypothesis is that the two groups are entirely uncorrelated. If that is rejected, the second is that the correlation between them is carried by a single linear combination of each group, etc. NC STATE UNIVERSITY 7/1 Statistics 784 Multivariate Analysis Canonical Correlations Standardization I The coefficient vectors ai and bi may be difficult to interpret, especially when the variables are in different physical units or have very different variances. I Consequently, the data are often standardized, or equivalently, the correlation matrix is used instead of the covariance matrix. I Unlike in Principal Components Analysis, the canonical variables are unchanged by any such scaling. I proc cancorr reports both raw and standardized forms. NC STATE UNIVERSITY 8/1 Statistics 784 Multivariate Analysis Canonical Correlations Correlations I Sometimes the canonical variables can be interpreted as representing characteristics of the items being measured. I The interpretation may be made based on the (raw or standardized) coefficients. I The correlations between the original variables and the canonical variables can also be informative, but “must be interpreted with caution.” I proc cancorr reports the correlations of each original variable with both sets of canonical variables. NC STATE UNIVERSITY 9/1 Statistics 784 Multivariate Analysis Canonical Correlations Special Cases p = q = 1: the “canonical” correlation between single variables X1 and X2 is just the absolute value of their simple Pearson correlation ρ. p = 1, q > 1: the canonical correlation of a single variable X1 and a group of variables X(2) is the multiple correlation R in the regression of X1 on X(2) , so ρ∗1 2 = R 2 . I Also, in general, ρ∗k 2 is the R 2 in the regression of Uk on X(2) , and also in the regression of Vk on X(1) : canonical correlations are closely related to R 2 . NC STATE UNIVERSITY 10 / 1 Statistics 784 Multivariate Analysis
© Copyright 2026 Paperzz