Lecture (14, 15): More than one Variable, Curve Fitting, and Method of Least Squares

Two Variables

Often two variables are in some way connected. We observe the pairs:

  X: x1, x2, ..., xn
  Y: y1, y2, ..., yn

Covariance

The covariance gives some information about the extent to which the two random variables influence each other:

  Cov(x, y) = E{(x − E{x})(y − E{y})}
  Cov(x, y) = E{x·y} − E{x}·E{y}

It is computed from the sample as

  Cov(x, y) = (1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)

If x = y,

  Cov(x, x) = (1/n) Σ_{i=1}^{n} (x_i − x̄)(x_i − x̄) = (1/n) Σ_{i=1}^{n} (x_i − x̄)² = σ_x²

Example: Covariance

[Figure: scatter plot of the five observation pairs]

  x:                    0    2    3    4    6     x̄ = 3
  y:                    3    2    4    0    6     ȳ = 3
  x_i − x̄:            −3   −1    0    1    3
  y_i − ȳ:             0   −1    1   −3    3
  (x_i − x̄)(y_i − ȳ):  0    1    0   −3    9     sum = 7

  cov(x, y) = (1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) = 7/5 = 1.4

What does this number tell us?

Pearson's R

• On its own, the covariance does not really tell us anything, because its value depends on the units of measurement.
  – Solution: standardise this measure.
• Pearson's R: standardise by dividing by the standard deviations:

  ρ_xy = cov(x, y) / (σ_x σ_y)

Correlation Coefficient

  ρ(x, y) = Cov(x, y) / (σ_x σ_y) = E{(x − E{x})(y − E{y})} / (σ_x σ_y)

It is computed from the sample as

  ρ(x, y) = Cov(x, y) / (σ_x σ_y)
          = [(1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)] / sqrt{[(1/n) Σ_{i=1}^{n} (x_i − x̄)²] · [(1/n) Σ_{i=1}^{n} (y_i − ȳ)²]}

Properties:

  −1 ≤ ρ(x, y) ≤ 1
  If x = y, then ρ(x, x) = 1.
  ρ(x, y) = 0: there is no linear relation between x and y.
  ρ(x, y) = −1: there is a perfect reverse relation between x and y.

Correlation Coefficient (Cont.)

[Figure: four scatter plots illustrating different strengths of correlation, from ρ(x, y) ≈ 0 (no relation) through ρ(x, y) ≈ 0.6 and ρ(x, y) ≈ 0.8 to ρ(x, y) ≈ 1 (perfect linear relation)]

Procedure of Best Fitting (Step 1)

How do we find the relation between the two variables?

1. Observe the pairs:

  X: x1, x2, ..., xn
  Y: y1, y2, ..., yn

Procedure of Best Fitting (Step 2)

2. Plot the observations. It is always difficult to decide whether a curved line fits nicely to a set of data; straight lines are preferable, so we change the scale (transform the variables) to obtain straight lines.

[Figure: scatter plot of the observations]

Method of Least Square (Step 3)

3. Specify a straight-line relation:

  Y = a + bX

We need to find the a and b that minimise the sum of the squared differences between the line and the observed data.

[Figure: scatter plot with the fitted line Y = a + bX]

Step 3 (cont.)

To find the best fit of a line through a cloud of observations, we use the principle of least squares:

  min Σ_{i=1}^{n} ε_i² = Σ_{i=1}^{n} (y_i − ŷ_i)²

where ŷ_i is the predicted value on the line y = a + bx, y_i is the true (observed) value, and ε_i is the residual error.

Method of Least Square (Step 4)

The sum of the squared deviations is

  S(a, b) = Σ_{i=1}^{n} [y_i − (a + b x_i)]²

We seek the values of a and b for which S is minimum:

  ∂S(a, b)/∂a = 0  and  ∂S(a, b)/∂b = 0

Differentiating with respect to a:

  ∂S/∂a = Σ_{i=1}^{n} (∂/∂a)[y_i − (a + b x_i)]²
        = Σ_{i=1}^{n} −2[y_i − (a + b x_i)] = 0

so

  Σ_{i=1}^{n} [y_i − (a + b x_i)] = 0

  Σ_{i=1}^{n} y_i − n·a − b Σ_{i=1}^{n} x_i = 0

Method of Least Square (Step 5)

Differentiating with respect to b:

  ∂S/∂b = Σ_{i=1}^{n} (∂/∂b)[y_i − (a + b x_i)]²
        = Σ_{i=1}^{n} −2 x_i [y_i − (a + b x_i)] = 0

so

  Σ_{i=1}^{n} [x_i y_i − a x_i − b x_i²] = 0

  Σ_{i=1}^{n} x_i y_i − a Σ_{i=1}^{n} x_i − b Σ_{i=1}^{n} x_i² = 0

Method of Least Square (Step 6)

Solving the two normal equations simultaneously gives

  a = [Σ y_i · Σ x_i² − Σ x_i · Σ x_i y_i] / [n Σ x_i² − (Σ x_i)²]

  b = [n Σ x_i y_i − Σ x_i · Σ y_i] / [n Σ x_i² − (Σ x_i)²]

and the fitted line passes through the point of means:

  ȳ = a + b x̄

Example

We have the following eight pairs of observations:

  x: 1  3  4  6  8  9  11  14
  y: 1  2  4  4  5  7   8   9

Example (Cont.)
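(Aside, not part of the original slides: the closed-form solution for a and b derived above can be sketched in Python. The function below just evaluates the two normal-equation formulas on the eight observation pairs from the example.)

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by least squares using the closed-form solution:

      a = (Σy·Σx² − Σx·Σxy) / (n·Σx² − (Σx)²)
      b = (n·Σxy − Σx·Σy)   / (n·Σx² − (Σx)²)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx * sx          # common denominator
    a = (sy * sxx - sx * sxy) / d  # intercept
    b = (n * sxy - sx * sy) / d    # slope
    return a, b

# The eight observation pairs from the example:
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))  # 0.545 0.636
```

The printed values match the hand computation in the example: a = 6/11 ≈ 0.545 and b = 7/11 ≈ 0.636.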
Construct the least-squares line:

  x_i   y_i   x_i²   x_i·y_i   y_i²
   1     1      1       1        1
   3     2      9       6        4
   4     4     16      16       16
   6     4     36      24       16
   8     5     64      40       25
   9     7     81      63       49
  11     8    121      88       64
  14     9    196     126       81
  Σ:    56    40    524     364      256    (N = 8)
  Σ/n:   7     5   65.5    45.5      32

Example (Cont.)

  a = [Σ y_i · Σ x_i² − Σ x_i · Σ x_i y_i] / [n Σ x_i² − (Σ x_i)²]
    = (40·524 − 56·364) / (8·524 − 56·56) = 6/11 ≈ 0.545

  b = [n Σ x_i y_i − Σ x_i · Σ y_i] / [n Σ x_i² − (Σ x_i)²]
    = (8·364 − 56·40) / (8·524 − 56·56) = 7/11 ≈ 0.636

Example (Cont.)

[Figure: the eight data points with the fitted line]
  Equation: Y = 0.545 + 0.636·X
  Number of data points used = 8
  Average X = 7
  Average Y = 5

Example (2)

  i:    1     2     3     4     5
  x_i: 2.10  6.22  7.17  10.5  13.7
  y_i: 2.90  3.83  5.98  5.71  7.74

  Σ x_i = 39.69,  Σ x_i² = 392.3201,  Σ y_i = 26.16,  Σ x_i y_i = 238.7416

  a = (26.16·392.32 − 39.69·238.74) / (5·392.32 − 39.69²) ≈ 2.038
  b = (5·238.74 − 39.69·26.16) / (5·392.32 − 39.69²) ≈ 0.4023

  y = 2.038 + 0.4023·x

Example (3): Excel Application

• See the Excel demonstration.

Covariance and the Correlation Coefficient

• Use COVAR to calculate the covariance: =COVAR(array1, array2)
  – Average of the products of deviations for each data-point pair
  – Depends on the units of measurement
• Use CORREL to return the correlation coefficient: =CORREL(array1, array2)
  – Returns a value between −1 and +1
• Both are also available in the Analysis ToolPak.

Analysis ToolPak

• Descriptive Statistics
• Correlation
• Linear Regression
• t-Tests
• z-Tests
• ANOVA
• Covariance

Descriptive Statistics

• Mean, Median, Mode
• Standard Error
• Standard Deviation
• Sample Variance
• Kurtosis
• Skewness
• Confidence Level for Mean
• Range
• Minimum
• Maximum
• Sum
• Count
• kth Largest
• kth Smallest

Correlation and Regression

• Correlation is a measure of the strength of the linear association between two variables:
  – Values lie between −1 and +1
  – Values close to −1 indicate a strong negative relationship
  – Values close to +1 indicate a strong positive relationship
  – Values close to 0 indicate a weak relationship
• Linear regression is the process of finding a line of best fit through a series of data points.
  – Can also use the SLOPE, INTERCEPT, CORREL and RSQ functions.

Polynomial Regression

Minimise the residual between the data points and the curve (least-squares regression):

  Linear:    y_i ≈ a0 + a1 x_i
  Quadratic: y_i ≈ a0 + a1 x_i + a2 x_i²
  Cubic:     y_i ≈ a0 + a1 x_i + a2 x_i² + a3 x_i³
  General:   y_i ≈ a0 + a1 x_i + a2 x_i² + a3 x_i³ + ... + a_m x_i^m

We must find the values of a0, a1, a2, ..., a_m.

Polynomial Regression (Cont.)

• Residual:

  e_i = y_i − (a0 + a1 x_i + a2 x_i² + a3 x_i³ + ... + a_m x_i^m)

• Sum of squared residuals:

  S_r = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} [y_i − (a0 + a1 x_i + a2 x_i² + ... + a_m x_i^m)]²

• Minimise by taking derivatives with respect to each coefficient.

Polynomial Regression: Normal Equations

Setting ∂S_r/∂a_k = 0 for k = 0, ..., m gives the (m+1)×(m+1) system

  | n          Σx_i        Σx_i²       ...  Σx_i^m     |  |a0|     | Σy_i       |
  | Σx_i       Σx_i²       Σx_i³       ...  Σx_i^(m+1) |  |a1|     | Σx_i·y_i   |
  | Σx_i²      Σx_i³       Σx_i⁴       ...  Σx_i^(m+2) |  |a2|  =  | Σx_i²·y_i  |
  | ...                                                |  |..|     | ...        |
  | Σx_i^m     Σx_i^(m+1)  Σx_i^(m+2)  ...  Σx_i^(2m)  |  |am|     | Σx_i^m·y_i |

Example

  x: 0     1.0   1.5   2.3   2.5   4.0   5.1   6.0   6.5   7.0   8.1   9.0
  y: 0.2   0.8   2.5   2.5   3.5   4.3   3.0   5.0   3.5   2.4   1.3   2.0

  x: 9.3   11.0  11.3  12.1  13.1  14.0  15.5  16.0  17.5  17.8  19.0  20.0
  y: −0.3  −1.3  −3.0  −4.0  −4.9  −4.0  −5.2  −3.0  −3.5  −1.6  −1.4  −0.1

[Figure: scatter plot of the 24 data points]

Example (Cont.)

For a cubic fit (m = 3), the normal equations become

  | 24        229.6      3060.2      46342.8     |  |a0|     | −1.30     |
  | 229.6     3060.2     46342.8     752835.2    |  |a1|  =  | −316.9    |
  | 3060.2    46342.8    752835.2    12780147.7  |  |a2|     | −6037.2   |
  | 46342.8   752835.2   12780147.7  223518116.8 |  |a3|     | −99434.36 |

Example (Cont.)

Solving gives

  a0 = −0.3593,  a1 = 2.3051,  a2 = −0.3532,  a3 = 0.0121

Regression equation:

  y = −0.359 + 2.305x − 0.353x² + 0.012x³

[Figure: the 24 data points with the fitted cubic curve]
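(Aside, not part of the original slides: the normal-equations procedure for polynomial regression can be sketched in Python. The code below builds the (m+1)×(m+1) system described above and solves it by Gaussian elimination; the quadratic test data are illustrative, not from the lecture.)

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit: minimise S_r = Σ e_i².

    Builds the normal equations A·a = r with
      A[j][k] = Σ x_i^(j+k),   r[j] = Σ x_i^j · y_i,
    then solves them by Gaussian elimination with partial pivoting.
    """
    m = degree + 1
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    r = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, m):
            f = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= f * A[col][k]
            r[row] -= f * r[col]
    # Back-substitution.
    a = [0.0] * m
    for j in range(m - 1, -1, -1):
        a[j] = (r[j] - sum(A[j][k] * a[k] for k in range(j + 1, m))) / A[j][j]
    return a

# Points generated exactly from y = 1 + 2x + 3x²; since a zero residual
# is attainable, the least-squares fit recovers the coefficients
# (up to round-off).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])  # [1.0, 2.0, 3.0]
```

The same routine with degree=3 and the 24 lecture points reproduces the cubic fit worked out in the example.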
Nonlinear Relationships

• If the relationship is an exponential function,

  y = a·e^(bx)

  take the logarithm of both sides to make it linear:

  ln(y) = ln(a) + b·x

  Now there is a linear relation between ln(y) and x.

• If the relationship is a power function,

  y = a·x^b

  take the logarithm of both sides:

  ln(y) = ln(a) + b·ln(x)

  Now there is a linear relation between ln(y) and ln(x).

Examples

• Quadratic curve: y = a0 + a1 x + a2 x²
  – Flow rating curve: q = a0 + a1 H + a2 H², where q = measured discharge and H = stage (height) of water behind the outlet.
• Power curve: y = a·x^b
  – Sediment transport: c = a·q^b, where c = concentration of suspended sediment and q = river discharge.
  – Carbon adsorption: q = K·c^n, where q = mass of pollutant sorbed per unit mass of carbon and c = concentration of pollutant in solution.

Example – Log-Log

  x:          1.2   2.8   4.3   5.4   6.8   7.9
  y:          2.1   11.5  28.1  41.9  72.3  91.4
  X = ln(x):  0.18  1.03  1.46  1.69  1.92  2.07
  Y = ln(y):  0.74  2.44  3.34  3.74  4.28  4.52

[Figure: x vs y (curved) and X = ln(x) vs Y = ln(y) (straight line)]

Example – Log-Log (Cont.)

Using the X's and Y's, not the original x's and y's, fit Y = A + B·X:

  | n     ΣX_i  |  |A|     | ΣY_i     |
  | ΣX_i  ΣX_i² |  |B|  =  | ΣX_i·Y_i |

with

  Σ X_i = Σ ln(x_i) = 8.34
  Σ X_i² = Σ [ln(x_i)]² = 14.0
  Σ Y_i = Σ ln(y_i) = 19.1
  Σ X_i Y_i = Σ ln(x_i)·ln(y_i) = 31.4

so the system is

  | 6     8.34 |  |A|     | 19.1 |
  | 8.34  14.0 |  |B|  =  | 31.4 |

Example – Carbon Adsorption

  q = K·c^n

  q = pollutant mass sorbed per unit mass of carbon
  c = concentration of pollutant in solution
  K = coefficient
  n = measure of the energy of the reaction

Taking base-10 logarithms:

  log10(q) = log10(K) + n·log10(c)

Example – Carbon Adsorption (Cont.)

Linear axes: K = 74.702 and n = 0.2289.

[Figure: q vs c with the fitted curve q = K·c^n]

Logarithmic axes: log10(K) = 1.8733, so K = 10^1.8733 = 74.696, and n = 0.2289.

[Figure: log10(q) vs log10(c) with the fitted straight line]

Multiple Regression

The simple regression model y_i = a·x_i + b + e_i generalises to several predictors:

  y1 = x11·b1 + x12·b2 + ... + x1n·bn + e1
  y2 = x21·b1 + x22·b2 + ... + x2n·bn + e2
   ⋮
  ym = xm1·b1 + xm2·b2 + ... + xmn·bn + em

In matrix notation:

  | y1 |     | x11  x12  ...  x1n |  | b1 |     | e1 |
  | y2 |  =  | x21  x22  ...  x2n |  | b2 |  +  | e2 |
  | ⋮  |     |  ⋮    ⋮          ⋮  |  |  ⋮ |     | ⋮  |
  | ym |     | xm1  xm2  ...  xmn |  | bn |     | em |

Multiple Regression (Cont.)

  Y = X·b + e

  Observed data = design matrix × parameters + residuals
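(Aside, not part of the original slides: the matrix model Y = X·b + e is also solved by least squares, via the normal equations (XᵀX)·b = Xᵀ·Y. For a two-parameter design matrix the 2×2 system can be solved directly with Cramer's rule, as sketched below; the data are illustrative.)

```python
def multiple_regression_2(X, y):
    """Least-squares fit of y ≈ X·b for a two-column design matrix X.

    Solves the normal equations (XᵀX)·b = Xᵀ·y; for two parameters the
    2×2 system is solved directly with Cramer's rule.
    """
    # Entries of XᵀX and Xᵀy, accumulated row by row.
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    t1 = sum(r[0] * yi for r, yi in zip(X, y))
    t2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2

# Design matrix with an intercept column (x_i1 = 1), so the model is
# y = b1 + b2·x; data generated from y = 2 + 3x is recovered exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 5.0, 8.0, 11.0]
b1, b2 = multiple_regression_2(X, y)
print(round(b1, 6), round(b2, 6))  # 2.0 3.0
```

With more parameters, the same idea applies: form XᵀX and Xᵀy and solve the resulting n×n linear system.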