What is regression?
(source: Roger Hadgraft)

Fitting models to data sets
Linear regression is the most common
"Line of best fit"

Example
Look at the data
[Chart: Fuel efficiency. Fuel consumption (L) against Mass (kg), with a residual e(i) marked]

Chart | Add trendline
[Chart: Fuel efficiency. Fuel consumption (L) against Mass (kg), with trendline y = 0.0126x - 0.8763, R² = 0.3427]

Some maths
Assume we have paired data [x(i), y(i)]
Simplest model is: Y(i) = b1.x(i) + b0
where Y is the model and y is the original data
This notation matches Excel's
Residual or error: e(i) = Y(i) - y(i)
Minimise the sum of e(i)² (the least squares approach)
[Chart: the same trendline plot, with a residual e(i) marked]

Regression coefficients
$$e_i^2 = (Y_i - y_i)^2 = (y_i - b_1 x_i - b_0)^2$$
SSE = Sum of the Squares of the Errors
$$SSE = \sum_{i=1}^{n} e_i^2$$
Choose b1 and b0 to minimise SSE:
$$\frac{\partial (SSE)}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_1 x_i - b_0) = 0$$
$$\frac{\partial (SSE)}{\partial b_1} = -2 \sum_{i=1}^{n} (y_i - b_1 x_i - b_0)\, x_i = 0$$
Rearranging:
$$n\, b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
Thus:
$$b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$
$$b_0 = \bar{y} - b_1 \bar{x}$$

Data Analysis | Regression
We can do the calculations by hand, or we can use Excel's Data Analysis Toolpak
Tools | Add-ins | Data Analysis (once only, to activate it)
Tools | Data Analysis | Regression
Demonstration

Example
Chart | Add trendline gives y = 0.0126x - 0.8763, R² = 0.3427

Tools | Data Analysis | Regression
This means that 34% of the variance in fuel consumption is explained by vehicle mass. The remaining 66% belongs to other factors (e.g. driver behaviour, etc.)

Is the model any good?
R² = proportion of the variance of the y data explained by the regression equation = SSR/SST
SSR = explained variation; SSE = unexplained variation
Total Sum of Squares:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Error Sum of Squares:
$$SSE = \sum_{i=1}^{n} e_i^2$$
Regression Sum of Squares:
$$SSR = SST - SSE$$

Tools | Data Analysis | Regression
Multiple R: R = sqrt(R²)
Adjusted R Square: compensates for different numbers of model parameters (in multiple linear regression). Text page 587
Standard Error: standard deviation of the residuals (but divide by (n-2) rather than (n-1))

Questions?

ANOVA = Analysis of Variance
SS: SSR, SSE and SST
df: Regression df = k-1; Total df = n-1; Residual df = (n-1)-(k-1) = (n-k)
k = number of parameters, n = number of data points
MS: Regression MS = SSR/df1; Residual MS = SSE/df2 (df1 = Regression df, df2 = Residual df)
F = Regression MS / Residual MS
Significance F: probability of the F statistic given df1 = 1 and df2 = 18. This is the probability of obtaining an F value at least this large if there were no relationship between x and y, so a small value is evidence that the relationship is real.

Other regressions
Multilinear regression
Non-linear equations: transform the variables (e.g. logs, powers, etc.), then use multilinear regression to determine the coefficients
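
The hand calculation behind the Excel output can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `simple_linear_regression`, the dictionary keys and the five data points are made up for the example and are not the vehicle data from the slides. The adjusted R² line uses the standard formula 1 - (1 - R²)(n-1)/(n-k), which the slides describe but do not write out.

```python
import math


def simple_linear_regression(x, y):
    """Fit Y = b1*x + b0 by least squares and return summary statistics."""
    n = len(x)
    k = 2  # number of fitted parameters (b1 and b0)

    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Closed-form solution of the two rearranged (normal) equations
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n

    # Sums of squares: e(i) = Y(i) - y(i)
    y_mean = sum_y / n
    sse = sum((b1 * xi + b0 - yi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    ssr = sst - sse

    # ANOVA quantities
    df_reg, df_res = k - 1, n - k
    ms_reg, ms_res = ssr / df_reg, sse / df_res
    r2 = ssr / sst

    return {
        "b1": b1,
        "b0": b0,
        "SSE": sse,
        "SST": sst,
        "SSR": ssr,
        "R2": r2,
        "multiple_R": math.sqrt(r2),
        # Standard adjusted R-squared formula (an assumption; not on the slides)
        "adjusted_R2": 1 - (1 - r2) * (n - 1) / (n - k),
        # Residual standard deviation, dividing by n-k (= n-2 here)
        "standard_error": math.sqrt(ms_res),
        "F": ms_reg / ms_res,
    }


# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
stats = simple_linear_regression(x, y)
print(f"b1 = {stats['b1']:.3f}, b0 = {stats['b0']:.3f}")  # b1 = 0.600, b0 = 2.200
print(f"R2 = {stats['R2']:.3f}, F = {stats['F']:.3f}")    # R2 = 0.600, F = 4.500
```

The sketch stops at F. Turning F into the probability reported by Excel needs the F distribution with df1 and df2 degrees of freedom, which the Python standard library does not provide; one option, assuming SciPy is installed, would be `scipy.stats.f.sf(F, df1, df2)`.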