LECTURE 6: MULTIVARIATE REGRESSION
Multivariate Regression; Selection Rules

Supplementary Readings: Wilks, chapter 6; Bevington, P.R., and Robinson, D.K., Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, 1992.

SOLUTION OF MATRIX EQUATION

For an $N \times N$ matrix $\mathbf{A}$ with elements $a_{11}, a_{12}, \dots, a_{NN}$, an unknown column vector $\mathbf{x} = (x_1, x_2, \dots, x_N)^T$, and a known column vector $\mathbf{b} = (b_1, b_2, \dots, b_N)^T$, we can write

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

If $\mathbf{A}$ is invertible there is a unique solution:

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$

Recall Linear Regression

For the simple linear model $y = a + bx$, the normal equations are

$$\sum_i y_i = a\,n + b\sum_i x_i, \qquad \sum_i x_i y_i = a\sum_i x_i + b\sum_i x_i^2$$

We can write this as a matrix equation,

$$\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}$$

We can generalize this for the case of multiple independent variables ("multiple linear regression"):

$$\hat{y} = b_0 + b_1\hat{x}_1 + b_2\hat{x}_2 + \dots + b_M\hat{x}_M$$

Let us consider this in more detail. For the general linear model above, the sum of squared residuals is

$$\sum_{i=1}^{N}\left(\hat{y} - y_i\right)^2 = \sum_{i=1}^{N}\left(b_0 + b_1\hat{x}_{1i} + b_2\hat{x}_{2i} + \dots + b_M\hat{x}_{Mi} - y_i\right)^2$$

We seek to minimize the sum-of-squares with respect to the M+1 parameters $b_0, \dots, b_M$:

$$\frac{\partial}{\partial b_j}\sum_{i=1}^{N}\left(\hat{y} - y_i\right)^2 = 0, \qquad j = 0, 1, \dots, M$$

This is SIMULTANEOUS multiple linear regression. Setting each partial derivative to zero yields a system of M+1 equations:

(1)    $2\sum_{i=1}^{N}\left(b_0 + b_1\hat{x}_{1i} + \dots + b_M\hat{x}_{Mi} - y_i\right) = 0$
(2)    $2\sum_{i=1}^{N}\left(b_0 + b_1\hat{x}_{1i} + \dots + b_M\hat{x}_{Mi} - y_i\right)\hat{x}_{1i} = 0$
(3)    $2\sum_{i=1}^{N}\left(b_0 + b_1\hat{x}_{1i} + \dots + b_M\hat{x}_{Mi} - y_i\right)\hat{x}_{2i} = 0$
 ...
(M+1)  $2\sum_{i=1}^{N}\left(b_0 + b_1\hat{x}_{1i} + \dots + b_M\hat{x}_{Mi} - y_i\right)\hat{x}_{Mi} = 0$

This set of M+1 linear equations in M+1 unknowns can be rewritten in matrix form,

$$\mathbf{A}\mathbf{b} = \mathbf{c}$$

where $\mathbf{A}$ is the symmetric $(M+1)\times(M+1)$ matrix

$$\mathbf{A} = \begin{pmatrix}
N & \sum\hat{x}_{1i} & \sum\hat{x}_{2i} & \cdots & \sum\hat{x}_{Mi} \\
\sum\hat{x}_{1i} & \sum\hat{x}_{1i}^2 & \sum\hat{x}_{1i}\hat{x}_{2i} & \cdots & \sum\hat{x}_{1i}\hat{x}_{Mi} \\
\sum\hat{x}_{2i} & \sum\hat{x}_{2i}\hat{x}_{1i} & \sum\hat{x}_{2i}^2 & \cdots & \sum\hat{x}_{2i}\hat{x}_{Mi} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum\hat{x}_{Mi} & \sum\hat{x}_{Mi}\hat{x}_{1i} & \sum\hat{x}_{Mi}\hat{x}_{2i} & \cdots & \sum\hat{x}_{Mi}^2
\end{pmatrix}$$

$\mathbf{b}$ is the column (M+1)-vector of regression coefficients and $\mathbf{c}$ is the column (M+1)-vector of predictand sums:

$$\mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_M \end{pmatrix}, \qquad
\mathbf{c} = \begin{pmatrix} \sum y_i \\ \sum y_i\hat{x}_{1i} \\ \sum y_i\hat{x}_{2i} \\ \vdots \\ \sum y_i\hat{x}_{Mi} \end{pmatrix}$$

The solution is

$$\mathbf{b} = \mathbf{A}^{-1}\mathbf{c}$$

as long as $\mathbf{A}$ is invertible!
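The normal equations above translate directly into a few lines of linear algebra. Below is a minimal sketch (assuming NumPy; the function name fit_multiple_regression and the variable names are mine, not from the lecture) that builds A and c from a predictor matrix and a predictand vector and solves Ab = c:

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Solve the normal equations A b = c for y = b0 + b1*x1 + ... + bM*xM.

    X : (N, M) array of predictor values, y : (N,) array of predictand values.
    Returns the (M+1,) coefficient vector [b0, b1, ..., bM].
    Sketch only: assumes A is invertible (no linearly dependent predictors)."""
    N = len(y)
    Xd = np.column_stack([np.ones(N), X])  # design matrix with a column of 1s for b0
    A = Xd.T @ Xd                          # symmetric (M+1) x (M+1) matrix of sums
    c = Xd.T @ y                           # right-hand-side vector of sums
    return np.linalg.solve(A, c)           # b = A^{-1} c
```

Using np.linalg.solve rather than explicitly forming the inverse is numerically preferable but mathematically equivalent to b = A⁻¹c.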
Let us consider some special cases...

(1) The predictors and predictand are zero-mean.

Then $\sum_i \hat{x}_{ji} = 0$ for every predictor and $\sum_i y_i = 0$, so the off-diagonal entries of the first row and first column of $\mathbf{A}$ and the first element of $\mathbf{c}$ vanish. The system thus reduces to the condition $b_0 = 0$ and the $M \times M$ matrix equation $\mathbf{A}\mathbf{b} = \mathbf{c}$, with

$$\mathbf{A} = \begin{pmatrix}
\sum\hat{x}_{1i}^2 & \sum\hat{x}_{1i}\hat{x}_{2i} & \cdots & \sum\hat{x}_{1i}\hat{x}_{Mi} \\
\sum\hat{x}_{2i}\hat{x}_{1i} & \sum\hat{x}_{2i}^2 & \cdots & \sum\hat{x}_{2i}\hat{x}_{Mi} \\
\vdots & \vdots & & \vdots \\
\sum\hat{x}_{Mi}\hat{x}_{1i} & \sum\hat{x}_{Mi}\hat{x}_{2i} & \cdots & \sum\hat{x}_{Mi}^2
\end{pmatrix}$$

What kind of matrix is this? Symmetric ($M \times M$). Here $\mathbf{c}$ is the column M-vector $\left(\sum y_i\hat{x}_{1i}, \sum y_i\hat{x}_{2i}, \dots, \sum y_i\hat{x}_{Mi}\right)^T$ and $\mathbf{b}$ is the column M-vector of regression coefficients $(b_1, b_2, \dots, b_M)^T$.

(2) The predictors/predictand are zero-mean AND the predictors are orthogonal.

Then all the cross terms $\sum_i\hat{x}_{ji}\hat{x}_{ki}$ ($j \neq k$) vanish and

$$\mathbf{A} = \begin{pmatrix}
\sum\hat{x}_{1i}^2 & 0 & \cdots & 0 \\
0 & \sum\hat{x}_{2i}^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sum\hat{x}_{Mi}^2
\end{pmatrix}$$

Since the matrix is diagonal, $\mathbf{A}\mathbf{b} = \mathbf{c}$ is readily solved as:

$$b_j = \frac{\sum_i y_i\hat{x}_{ji}}{\sum_i \hat{x}_{ji}^2}, \qquad j = 1, \dots, M$$

(3) The predictors/predictand are zero-mean AND have unit standard deviation AND the predictors are orthogonal.

Then $\sum_i \hat{x}_{ji}^2 = N$ and $\sum_i y_i\hat{x}_{ji} = N\,r_{x_j y}$, so

$$\mathbf{c} = N\begin{pmatrix} r_{x_1 y} \\ r_{x_2 y} \\ \vdots \\ r_{x_M y} \end{pmatrix}, \qquad
b_j = \frac{\sum_i y_i\hat{x}_{ji}}{\sum_i \hat{x}_{ji}^2} = r_{x_j y}$$

The vector of regression coefficients $\mathbf{b}$ is the vector of the individual correlation coefficients between the predictand and the predictors!

(4) The predictors are not orthogonal, but both predictors and predictand are zero-mean and have unit standard deviation.

Then

$$\mathbf{A} = N\begin{pmatrix}
1 & r_{x_1 x_2} & \cdots & r_{x_1 x_M} \\
r_{x_2 x_1} & 1 & \cdots & r_{x_2 x_M} \\
\vdots & & \ddots & \vdots \\
r_{x_M x_1} & r_{x_M x_2} & \cdots & 1
\end{pmatrix}, \qquad
\mathbf{c} = N\begin{pmatrix} r_{x_1 y} \\ r_{x_2 y} \\ \vdots \\ r_{x_M y} \end{pmatrix}$$

and

$$\mathbf{b} = \mathbf{A}^{-1}\mathbf{c} = \begin{pmatrix} r'_{x_1 y} \\ r'_{x_2 y} \\ \vdots \\ r'_{x_M y} \end{pmatrix}$$

The vector of regression coefficients $\mathbf{b}$ is the vector of partial correlation coefficients between the predictand and the (correlated) predictors!

(5) One or more of the predictors are linearly dependent.

In this case $\mathbf{A}$ is no longer full rank! We can no longer obtain the solution $\mathbf{b} = \mathbf{A}^{-1}\mathbf{c}$ by inverting, since $\mathbf{A}$ is NOT invertible!! One could try to obtain a solution to $\mathbf{A}\mathbf{b} = \mathbf{c}$ by singular value decomposition (SVD), or simply try to eliminate the redundant predictor...

Note that for zero-mean predictors, $\mathbf{A}$ is (up to the factor N) the variance-covariance matrix of the dataset. The principal axes of variance in the dataset are the non-trivial vectors $\mathbf{x}$ determined by

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{x}, \quad \text{i.e.,} \quad (\mathbf{A} - \lambda\mathbf{I})\mathbf{x} = \mathbf{0}, \qquad \det(\mathbf{A} - \lambda\mathbf{I}) = 0$$

This is Principal Component Analysis (PCA), the subject of a later lecture...
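As a concrete illustration of cases (4) and (5) above, the following sketch (NumPy assumed; the function and variable names are illustrative, not from the lecture) standardizes the predictors and predictand so that A/N becomes the predictor correlation matrix and c/N the vector of predictor-predictand correlations, and then solves the reduced system. Because np.linalg.lstsq works through an SVD, it still returns a (minimum-norm) least-squares solution when predictors are linearly dependent, which is one way of handling case (5):

```python
import numpy as np

def fit_standardized(X, y):
    """Regression with zero-mean, unit-standard-deviation predictors and predictand.

    X : (N, M) array of predictors, y : (N,) predictand.
    Returns the coefficient vector b along with the correlation matrix R (= A/N)
    and the correlation vector r (= c/N). Sketch only."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
    ys = (y - y.mean()) / y.std()               # standardized predictand
    N = len(ys)
    R = (Xs.T @ Xs) / N                         # predictor correlation matrix, A/N
    r = (Xs.T @ ys) / N                         # correlations r_{x_j y}, c/N
    b, *_ = np.linalg.lstsq(R, r, rcond=None)   # SVD-based solve; tolerates rank deficiency
    return b, R, r
```

If the predictors happen to be orthogonal, R is (approximately) the identity and b reduces to the individual correlations r_{x_j y}, as in case (3); otherwise b behaves like the partial correlations of case (4).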
ANOVA for Multiple Linear Regression

Source       df      SS     MS                          F-test
Total        n-1     SST
Regression   K       SSR    MSR = SSR/K                 MSR/MSE
Residual     n-K-1   SSE    MSE = SSE/(n-K-1) = se^2

$$R^2 = \mathrm{SSR}/\mathrm{SST}$$

We can generalize the notion of the univariate coefficient of determination as the sum over the products of the partial and actual correlation coefficients between the predictand and the K predictors:

$$R^2 = \sum_{j=1}^{K} r'_{x_j y}\, r_{x_j y}$$

Recall the results for linear regression:

$$t = \frac{b - 0}{s_b}, \qquad b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad s_b = s_e\left[\frac{1}{\sum_i (x_i - \bar{x})^2}\right]^{1/2}$$

This generalizes to the multivariate case:

$$t = \frac{b_j - 0}{s_{b_j}}, \qquad s_{b_j} = s_e\left[\frac{1}{\sum_i \left(x_i^{(j)} - \bar{x}^{(j)}\right)^2}\right]^{1/2}$$

Example #1: Statistical model for global CO2 concentrations as a function of time

Monthly Atmospheric CO2 (1959-1988): measurements of CO2 in parts per million (ppm) at Mauna Loa Observatory. Higher readings occur in winter, when plants die and release CO2 to the atmosphere; lower readings occur in summer, when more abundant vegetation absorbs CO2 from the atmosphere.

[Figure: Monthly atmospheric CO2 (1959-1988), with the linear (dashed) and quadratic (solid) least-squares trends shown along with the actual (dotted) data.]

[Figure: ANOVA table and summary of the CO2 regression.]

Results of multivariate regression are simplest to interpret if predictors can be specified based on objective a priori considerations (e.g., for physical reasons)...

Example #2: Statistical model of Northern Hemisphere temperatures as a linear combination of climate forcings

[Figure: Relationship of variations in the Northern Hemisphere annual mean temperature reconstruction to estimates of three candidate climate forcings: solar irradiance, greenhouse gases, and volcanism.]

Mann, M.E., Bradley, R.S., Hughes, M.K., Global-Scale Temperature Patterns and Climate Forcing Over the Past Six Centuries, Nature, 392, 779-787, 1998.

Standardize and remove the mean from the predictors and predictand, and treat this as a multivariate regression problem. We are interested in the partial correlations between the candidate forcings and the "response". Writing S for solar irradiance, C for greenhouse gases, V for volcanism, and T for temperature,

$$\mathbf{A} = N\begin{pmatrix} 1 & r_{SC} & r_{SV} \\ r_{SC} & 1 & r_{CV} \\ r_{SV} & r_{CV} & 1 \end{pmatrix}, \qquad
\mathbf{c} = N\begin{pmatrix} r_{ST} \\ r_{CT} \\ r_{VT} \end{pmatrix}, \qquad
\mathbf{b} = \mathbf{A}^{-1}\mathbf{c} = \begin{pmatrix} r'_{ST} \\ r'_{CT} \\ r'_{VT} \end{pmatrix}$$

How is the significance of the regression coefficients established? What is the appropriate null hypothesis here? Correlations between a white noise predictand time series and 3 independent white noise predictors? Correlations between a red noise predictand time series and 3 independent red noise predictors? This motivates a non-parametric approach...

ALTERNATIVE APPROACH

What if predictors cannot be specified based on objective a priori considerations? Selection Rules.

Ithaca Winter Snowfall Variations (1980-1986): a Training Sample (developmental data) and a Verification Sample (independent data).

[Figure: Ithaca winter snowfall variations (1980-1986), training sample and verification sample.]

The training-sample fit fails cross-validation! This is a classic example of STATISTICAL OVERFITTING... It is clear that we need some objective way of sorting out real predictors from spurious predictors: Screening Regression.
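One standard way to detect this kind of overfitting is cross-validation: refit the model with part of the data withheld and score the predictions on the withheld values. Below is a minimal leave-one-out sketch (NumPy assumed; the function and variable names are illustrative, not from the lecture) that compares the in-sample R^2 with a cross-validated R^2 built from the prediction errors on the withheld points:

```python
import numpy as np

def loo_cross_validation(X, y):
    """Compare in-sample and leave-one-out R^2 for a multiple linear regression.

    X : (n, M) predictors, y : (n,) predictand.
    A large gap between the two is a symptom of statistical overfitting."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
    sst = np.sum((y - y.mean()) ** 2)

    # In-sample (training) fit
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r2_train = 1.0 - np.sum((y - Xd @ b) ** 2) / sst

    # Leave-one-out: refit n times, each time predicting the single held-out point
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        bi, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
        press += (y[i] - Xd[i] @ bi) ** 2
    r2_cv = 1.0 - press / sst

    return r2_train, r2_cv
```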
Forward Selection or Stepwise Regression

• Begin with M potential predictors, and the assumption y = b0.
• Evaluate the strength of the linear relationship between the predictand and each of the M predictors. Select the most significant predictor x1 (i.e., the one with the greatest value of r or the largest t ratio).
• The model is now y = b0 + b1 x1 (b0 is not the same as above).
• Iteratively construct the best model y = b0 + b1 x1 + ... + bK xK with K predictors (selecting at each step the xj that produces the largest t ratio, the greatest increase in R^2, or the greatest value of F).
• Stop at some point K < M! (A code sketch of this procedure is given at the end of this section.)

[Figure: Example of Forward Selection (Stepwise Regression).]

Backward Elimination

• Begin with a simultaneous multivariate regression against all M potential predictors: y = b0 + b1 x1 + ... + bM xM.
• Evaluate the strength of the linear relationship between the predictand and the M predictors. Select the least significant predictor (i.e., the one with the smallest value of r or the smallest t ratio), and eliminate it from the regression equation.
• The new model is y = b0 + b1 x1 + ... + b(M-1) x(M-1).
• Iteratively construct the best model y = b0 + b1 x1 + ... + bK xK with K predictors (eliminating at each step the xj that has the smallest t ratio, produces the smallest decrease in R^2, or has the smallest value of F).
• Stop at some point K < M!

Accounting for the Multiplicity of Predictors

[Figure: Critical F-ratio for K = 12 potential predictors and n = 127. This curve can be generated using Monte Carlo methods.]

Stopping Point Choice: Cross-Validation
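As noted above, here is a minimal sketch of greedy forward selection (NumPy assumed; the function name, the use of the increase in R^2 as the selection criterion, and the fixed number of steps K are illustrative choices, not prescriptions from the lecture). A real screening regression would combine this with a stopping rule such as the Monte Carlo critical F-ratio or cross-validation discussed above:

```python
import numpy as np

def forward_selection(X, y, K):
    """Greedy forward selection for multiple linear regression.

    At each step, add the predictor (column of X) that gives the largest
    increase in R^2; stop after K predictors have been added.
    Returns the list of selected column indices and the R^2 after each step."""
    n, M = X.shape
    sst = np.sum((y - y.mean()) ** 2)
    selected, r2_path = [], []
    for _ in range(min(K, M)):
        best_j, best_r2 = None, -np.inf
        for j in range(M):
            if j in selected:
                continue
            cols = selected + [j]
            Xd = np.column_stack([np.ones(n), X[:, cols]])   # intercept + candidate set
            b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            r2 = 1.0 - np.sum((y - Xd @ b) ** 2) / sst
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        selected.append(best_j)
        r2_path.append(best_r2)
    return selected, r2_path
```

In practice one would compare the R^2 gained at each step against a Monte Carlo null distribution (as in the critical F-ratio figure) or against cross-validated skill to decide where to stop, rather than fixing K in advance.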