Principal component analysis

• Motivation for PCA came from major-axis regression.
• Strong assumption: a single homogeneous sample.
  – Free of assumptions when used for exploration.
  – Classical tests of significance of eigenvectors and eigenvalues assume multivariate normality.
  – Bootstrap tests assume only that the sample is representative of the population.
• Can be used with multiple samples for exploration:
  – Search for structure: e.g., how many groups?
  – Not optimized for discovering group structure.
  – Classical significance tests can't be used.
  – If we discover structure by exploring the data, then we can't test that structure for significance.

[Figure: two scatterplots of scores on PC1 (72.4%) vs. scores on PC2 (19.4%). MANOVA: p < 0.0001. But: the data were sampled randomly from a single multivariate-normal population.]

Multiple groups and multiple variables

• Suppose that:
  – We have two or more groups (treatments, etc.) defined on extrinsic criteria.
  – We wish to know whether and how we can discriminate the groups on the basis of two or more measured variables.
• Things we might want to know:
  – Can we discriminate the groups?
  – If so, how well? How different are the groups?
  – Are the groups 'significantly' different? How do we assess significance in the presence of correlations among variables?
  – Which variables are most important in discriminating the groups?
  – Can group membership be predicted for "unknown" individuals? How good is the prediction?

Multiple groups and multiple variables

• These questions are answered using three related methods:
  (1) Discriminant function analysis (DFA):
      • = discriminant analysis (DA) = canonical variate analysis (CVA).
      • Determines the linear combinations of variables that best discriminate the groups.
  (2) Multivariate analysis of variance (MANOVA):
      • Determines whether multivariate samples differ non-randomly (significantly).
  (3) Mahalanobis distance (D²):
      • Measures distances in multivariate character space in the presence of correlations among variables.
• Developed independently by three mathematicians:
  – Fisher (DFA) in England, Hotelling (MANOVA) in the United States, Mahalanobis (D²) in India.
  – Because of differences in notation, the underlying similarities went unnoticed for 20 years.
  – The methods now have a common matrix formulation.

Discriminant analysis

• Principal component analysis:
  – Inherently a single-group procedure:
      • Assumes that the data represent a single homogeneous sample from a population.
      • Can be used for multiple groups, but cannot take group structure into consideration.
  – Often used to determine whether groups differ in terms of the variables used, but:
      • Can't use grouping information even if it exists.
      • Maximizes variance, regardless of its source.
      • Not guaranteed to discriminate groups.
• Discriminant analysis:
  – Explicitly a multiple-group procedure.
  – Assumes that groups are known (correctly) before the analysis, on the basis of extrinsic criteria.
  – Optimizes discrimination between the groups by one or more linear combinations of the variables (the discriminant functions).
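The contrast just drawn, PCA maximizing total variance versus DFA optimizing group discrimination, can be made concrete with a short sketch. This is a minimal illustration assuming NumPy and scikit-learn are available; the simulated data and all variable names are hypothetical, not taken from the slides.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two groups with identical covariance, offset along a direction
# that is NOT the direction of maximum total variance.
cov = [[3.0, 2.5], [2.5, 3.0]]
g1 = rng.multivariate_normal([0.0, 0.0], cov, size=50)
g2 = rng.multivariate_normal([1.5, -1.5], cov, size=50)
X = np.vstack([g1, g2])
y = np.repeat([0, 1], 50)

pc1 = PCA(n_components=1).fit(X).components_[0]        # maximizes total variance
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
df1 = lda.scalings_[:, 0]                              # maximizes group separation

print("PC1 direction:", pc1 / np.linalg.norm(pc1))
print("DF1 direction:", df1 / np.linalg.norm(df1))
```

With correlated variables and group means offset away from the major axis, PC1 follows the pooled covariance structure while DF1 points along the direction that actually separates the groups, which is why PCA is not guaranteed to discriminate them.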
Discriminant analysis

• Q1: How are the groups different, and which variables contribute most to the differences?
• A: For k groups, find the k−1 linear discriminant functions (axes, vectors, functions) that maximally separate the k groups.
  – Discriminant functions (DFs) are eigenvectors of the among-group variance (rather than the total variance).
  – Like PCs, discriminant functions:
      • Are linear combinations of the original variables.
      • Are specified by sets of eigenvector coefficients (weights).
          – Can be rescaled as vector correlations.
          – Allow interpretation of the contributions of individual variables.
      • Have corresponding eigenvalues.
          – These specify the proportion of among-group variance (rather than total variance) accounted for by each DF.
      • Can be estimated from either the covariance matrices (one per group) or the correlation matrices.
  – Groups are assumed to have multivariate normal distributions with identical covariance matrices.

Discriminant analysis

• Example: 2 groups, 2 variables:

[Figure: original data for two groups on X1 and X2, with 95% data ellipses; projection scores onto three candidate axes; and F from ANOVA of the scores as a function of the angle of the axis from horizontal. Line A: ANOVA F = 0.54; line B: ANOVA F = 56.68; the DF: ANOVA F = 85.62.]

Discriminant analysis

• Example: 3 groups, 2 variables:

[Figure: original data for three groups on X1 and X2, with 95% data ellipses; projection scores onto three candidate axes; and F from ANOVA of the scores vs. angle of the axis from horizontal. Line A: ANOVA F = 29.71; line B: ANOVA F = 8.91; the DF: ANOVA F = 54.55.]

Discriminant analysis

• The discriminant functions are eigenvectors:
  – For PCA, the eigenvectors are estimated from S, the covariance matrix, which accounts for the total variance of the sample.
  – For DFA, the eigenvectors are estimated from a matrix that accounts for the among-group variance.
• For a single variable, a measure of among-group variation, scaled by within-group variation, is the ratio $s_a^2 / s_w^2$.
• Discriminant functions are eigenvectors of the matrix $\mathbf{W}^{-1}\mathbf{B}$, where:
  – W = pooled within-group covariance matrix.
  – B = among-group covariance matrix.
  – Analogous to the univariate measure.

Discriminant analysis

• Thus the DFA eigenvectors:
  – Maximize the ratio of among-group variation to within-group variation.
  – Optimize discrimination among all groups simultaneously.
• For any set of data, there exists one axis (the discriminant function, DF) for which the projections of the groups of individuals are maximally separated, as measured by ANOVA of the projections onto that axis.
  – For 2 groups, this single DF completely accounts for group discrimination.
  – For 3+ groups, there is a series of orthogonal DFs:
      • DF1 accounts for the largest proportion of among-group variance.
      • DF2 accounts for the largest proportion of the residual among-group variance.
      • Etc.
• DFs can be used as the bases of a new coordinate system for plotting scores of observations and loadings of the original variables (see the sketch below).
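As a concrete illustration of this eigenanalysis, here is a minimal NumPy sketch that builds the pooled within-group matrix W and the among-group matrix B and extracts the eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$. The function name is my own, and it uses sums of squares and cross-products rather than covariances, which changes the eigenvalue scaling but not the eigenvectors.

```python
import numpy as np

def discriminant_functions(X, y):
    """Bare-bones DFA: eigenvectors of W^{-1}B.

    X: (n, p) data matrix; y: (n,) integer group labels.
    Returns eigenvalues and eigenvector columns, sorted largest first.
    """
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))  # pooled within-group sums of squares and cross-products
    B = np.zeros((p, p))  # among-group sums of squares and cross-products
    for g in np.unique(y):
        Xg = X[y == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        m = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(m, m)
    # W^{-1}B is not symmetric, so use the general eigensolver
    # and keep the real parts (the eigenvalues are real in theory).
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1]
    return evals.real[order], evecs.real[:, order]
```

For k groups, at most k−1 of the eigenvalues are nonzero, and $\lambda_j / \sum_j \lambda_j$ gives the proportion of among-group variance accounted for by DF j, matching the percentages reported on the DF axes in the examples that follow.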
Discriminant analysis

• Example: 2 groups, 5 variables:

[Figure: original data on X1 and X2, with 95% data ellipses; scores on DF1 (100.0%) vs. DF2 (0.0%); and loadings of X1–X5 on DF1 and DF2.]

Discriminant analysis

• Example: 3 groups, 5 variables:

[Figure: original data on X1 and X2, with 95% data ellipses; scores on DF1 (62.7%) vs. DF2 (37.3%); and loadings of X1–X5 on DF1 and DF2.]

Discriminant analysis

• Discriminant functions have no necessary relationship to principal components:

[Figure: four panels of data on X1 and X2, each showing the PC axes and DF axes of the same data pointing in unrelated directions.]

MANOVA

• Q2: Are the groups significantly heterogeneous?
• A: Multivariate analysis of variance:
  – The general case of testing for significant differences among a set of predefined groups (treatments), with multiple correlated variables.
      • ANOVA: the special case for one variable (univariate).
  – Hotelling's T²-test: the special case of MANOVA for two groups.
      • The t-test: the special univariate case for two groups.

MANOVA

• Discriminant functions are eigenvectors of the matrix $\mathbf{W}^{-1}\mathbf{B}$.
• The eigenvalues of $\mathbf{W}^{-1}\mathbf{B}$ are $\lambda_1, \lambda_2, \ldots, \lambda_p$.
• A general multivariate test statistic is Wilks' lambda:
  $$\Lambda = \prod_{j=1}^{p} \frac{1}{1 + \lambda_j}$$
  – Commonly reported by statistical packages.
  – The expression for determining its significance is complicated.
  – Wilks' lambda can be transformed to an F-statistic, but the expression for this is complicated, too.
• Several other test statistics are commonly reported by statistical packages:
  – Varying terminology, varying assumptions.
  – All reported with corresponding p-values.

Mahalanobis distance

• Q3: How do we measure the distance between two groups?
• A: It depends on whether we want to take correlations among the variables into consideration.
  – If not, just measure the Euclidean distance between centroids.
  – If so, measure the Mahalanobis distance between centroids 'along' the covariance structure:
    $$D^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\, \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$
  – Can also measure Mahalanobis distances between points.

[Figure: the same three group centroids plotted twice on X1 and X2. Left, Euclidean distances between centroids: all equal to 6.93. Right, Mahalanobis distances between the same centroids: 12.6, 62.7, and 71.7.]

Classifying 'unknowns' into predetermined groups

• Context: we have k 'known' groups of observations.
  – We also have one or more 'unknown' observations, each assumed to be a member of one of the 'known' groups.
• Task: assign each 'unknown' observation to one of the k groups.
• Procedure:
  – Find the Mahalanobis distance from the 'unknown' observation to each of the centroids of the k groups.
  – Assign the 'unknown' to the closest group.
• Can be randomized (see the sketch below):
  – Bootstrap the 'known' observations by sampling within groups, with replacement.
  – Assign the 'unknown' observation to the closest group, based on its distance to the bootstrapped group centroids.
  – Repeat many times: this gives the proportion of times the observation is assigned to each of the groups.
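A minimal sketch of this randomized classification procedure, assuming NumPy. It plugs the pooled within-group covariance matrix S into the D² formula above; the function names, and the choice of pooled rather than per-group covariance, are my own assumptions rather than details given in the slides.

```python
import numpy as np

def pooled_within_cov(groups):
    """Pooled within-group covariance matrix S from a list of (n_i, p) arrays."""
    p = groups[0].shape[1]
    ssq, dof = np.zeros((p, p)), 0
    for G in groups:
        dev = G - G.mean(axis=0)
        ssq += dev.T @ dev
        dof += len(G) - 1
    return ssq / dof

def bootstrap_classify(unknown, groups, n_boot=100, seed=0):
    """Proportion of bootstrap replicates assigning `unknown` to each group."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(groups))
    for _ in range(n_boot):
        # Resample each 'known' group within itself, with replacement.
        boot = [G[rng.integers(0, len(G), size=len(G))] for G in groups]
        S_inv = np.linalg.pinv(pooled_within_cov(boot))
        # Mahalanobis D^2 from the unknown to each bootstrapped centroid.
        d2 = [(unknown - G.mean(axis=0)) @ S_inv @ (unknown - G.mean(axis=0))
              for G in boot]
        counts[int(np.argmin(d2))] += 1
    return counts / n_boot  # e.g. array([0.15, 0.85]) for 2 groups
```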
Classifying 'unknowns' into predetermined groups

• Example: 2 groups, 2 variables, 1 unknown, 100 bootstrap iterations:

[Figure: two panels, each showing the two groups on X1 and X2 together with one 'unknown' observation. Panel 1, classification probabilities: group 1 = 0.15, group 2 = 0.85. Panel 2, classification probabilities: group 1 = 0.51, group 2 = 0.49.]

Assessing misclassification rates (probabilities)

• We would like to know how 'good' the discriminant functions are.
  – DFA involves finding the axes of maximum discrimination for the data included in the analysis.
  – We would like to know how well the procedure will generalize.
  – We can't trust misclassification rates based on the same observations used in the analysis.
• Ideally, we would like to have new, 'known' data to assign to the known groups based on the discriminant functions.

Assessing misclassification rates (probabilities)

• Alternatively, we can cross-validate:
  – Divide all the data into:
    (1) A 'calibration' data set: used to find the discriminant functions.
    (2) A 'test' data set: used to test the discriminant functions.
  – Determine how well the DFs can correctly assign 'unknowns' to their correct groups.
  – The proportions of incorrect assignments are estimates of the 'true' misclassification rates.
  – Problem: we need all of the data to get the 'best' estimates of the discriminant functions.
  – Solution: cross-validate one observation at a time via the jackknife procedure.

Assessing misclassification probabilities

• Cross-validation via the jackknife ('leave-one-out') procedure:
  – Set one observation aside.
  – Estimate the discriminant functions from the remaining observations.
  – Classify the withheld 'known' observation using those discriminant functions.
  – Repeat for all observations, leaving one out at a time.
• Example: 2 groups, 2 variables, 5 observations/group:

[Figure: original data for the two groups on X1 and X2, and the corresponding scores on DF1 and DF2.]

Assessing misclassification probabilities

• Assign each individual, in turn, to one of the known groups using the jackknife procedure.
• Bootstrap 100 times.
• Misclassification rate: 2/10 = 20%.

                                              Percentage of bootstrap replicates
  Observation   Group   Assigned to group          Group 1      Group 2
       1          1             2                    0.49         0.51
       2          1             1                    0.53         0.47
       3          1             1                    0.52         0.48
       4          1             1                    0.54         0.46
       5          1             1                    0.57         0.47
       6          2             2                    0.44         0.56
       7          2             2                    0.45         0.55
       8          2             2                    0.47         0.53
       9          2             2                    0.39         0.61
      10          2             1                    0.53         0.47
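A compact sketch of this leave-one-out estimate, assuming scikit-learn's LDA as the discriminant-function machinery (an assumption; the slides do not name a package). Applied to data like the 10-observation example above, it returns the proportion of misclassified observations, e.g. 2/10 = 0.20.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def jackknife_misclassification(X, y):
    """Leave-one-out ('jackknife') estimate of the misclassification rate.

    Each observation is classified by discriminant functions fitted to
    all of the other observations; the error proportion is returned.
    """
    y = np.asarray(y)
    pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
    return float(np.mean(pred != y))
```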