Antonio Stasi

1. Regression: a general set-up
2. Linear regression
3. Linear regression by OLS
4. What about more variables
5. Beyond the lines
6. Measures of fit
7. Variables selection
8. How to run a linear regression

Regression: a general set-up
• You have a set of data on two variables, X and Y, represented in a scatter plot
• You wish to find a simple, convenient mathematical function that comes close to most of the points, thereby describing succinctly the relationship between X and Y

Linear Regression
• The straight line is a particularly simple function.
• When we fit a straight line to data, we are performing linear regression analysis.
• The goal: find the “best fitting” straight line for a set of data. Since every straight line fits the equation
  Y = bX + c
  with slope b and Y-intercept c, it follows that our task is to find a b and c that produce the best fit.
• There are many ways of operationalizing the notion of a “best fitting straight line.” A very popular choice is the “Least Squares Criterion.”
• For every data point (X, Y), define the “predicted score” to be the value of the straight line evaluated at X. The “regression residual,” or “error of prediction,” is the distance from the straight line to the data point in the up-down direction.

Linear Regression by OLS
[Figure: scatter plot of the data, axes X and Y]
[Figure: the same scatter plot with the fitted line, marking an observation, its prediction on the line, and the error or “residual” between them]

What about more variables?
[Figure: three-dimensional plot of the data with a second explanatory variable, Temperature]
• Same story, same procedure
  Y = c + b1 X + b2 Temperature

Beyond the lines
  Y = c + b1 X + b2 X^2
• This is still a linear regression: X^2 enters as one more variable, so the model stays linear in the coefficients c, b1 and b2, and everything else is the same
• Use of LOGARITHM
  – On both sides of the equality sign: the b are elasticities
  – Only on the left-hand side: the b are percentage changes in Y (approximately 100·b percent for a one-unit change in X)

Measures of fit
• R-square
  – Measures the share of the variability in Y explained by the linear regression
• St. error
  – Measures the dispersion (variability) of the estimated parameters
• T-test and P-value
  – Test whether a parameter differs significantly from zero, that is, whether the variable significantly predicts Y. The P-value is the probability of obtaining an estimate at least this far from zero if the true parameter were zero, so a small P-value means a small risk of being wrong when I say that the parameter is significant
• F-test and P-value
  – Test whether the model as a whole is significant

Variables selection
• Structural variables should be included in the model
  – Variables from economic theory, e.g. an education variable when I want to predict income
For other variables
• One man, one rule: when the P-value of the variable is < 0.15, I keep the variable

How to run a linear regression
• Download PopTools from the internet
• http://www.cse.csiro.au/poptools/download.htm
• It is an add-in, so you must install it alongside your Excel software
• Go to extra-stats
• Select regression
• Select the X variables matrix, select the Y vector, and run

Boosting Adult System Education In Agriculture – AGRI BASE
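
For readers who want to see the least squares criterion and the R-square measure outside of Excel, here is a minimal sketch in plain Python. It is not the PopTools procedure from the slides. It fits the line Y = bX + c from the earlier slides using the standard closed-form least squares formulas. The function names `fit_line` and `r_square` are hypothetical, and the data points are made up purely for illustration.

```python
def fit_line(xs, ys):
    """Least squares fit of Y = b*X + c. Returns (b, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and of cross-products of X and Y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    c = mean_y - b * mean_x    # Y-intercept
    return b, c


def r_square(xs, ys, b, c):
    """Share of the variability in Y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative data, not taken from the slides
xs = [0.0, 5.0, 10.0, 15.0, 20.0]
ys = [2.0, 11.0, 19.0, 32.0, 41.0]

b, c = fit_line(xs, ys)

# Residual = observation minus prediction, measured in the up-down direction
residuals = [y - (b * x + c) for x, y in zip(xs, ys)]
r2 = r_square(xs, ys, b, c)

print("slope b =", round(b, 4))
print("intercept c =", round(c, 4))
print("residuals =", [round(e, 4) for e in residuals])
print("R-square =", round(r2, 4))
```

With an intercept in the model, the least squares residuals sum to zero (up to rounding), which is a quick sanity check on any fit. Standard errors, t-tests and the F-test need more machinery and are left to the regression tool.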