Statistics – 20080
MINITAB - Lab 2

1. Simple Linear Regression

In simple linear regression we attempt to model the relationship between two variables with a straight line and make statistical inferences about that linear model. Using the fire damage dataset from last week, we assume here that the variable on the x-axis (the distance from the fire station) predicts the amount of fire damage caused to the house. In this case, therefore, distance from the fire station is the predictor variable and the damage to the property is the response variable.

2. Fitting the Line

Construct a scatter plot of the data to determine the nature of the relationship between the two variables. Then calculate the correlation coefficient, which describes the relationship numerically.

The next step is to calculate the equation of the least squares regression line, which is the data's line of best fit. This step is known as fitting the line. The line is fitted to help the researcher see any trend and, above all, to predict the values of y from the values of x. In other words, we want to predict the damage (in €) from the distance to the fire station.

When fitting a straight-line model we fit what is called the least squares line. This is the straight line for which the sum of the squared vertical distances between the points and the line is as small as possible. An equation for a straight-line model has two components, the intercept and the slope. The model therefore takes the form

Response = intercept + slope × (predictor variable) + ε (the error or residual term)

Or more generally:

$$ y = a + bx + \varepsilon $$

Where
• a is the intercept
• b is the slope of the line
• ε is the vertical distance between the fitted line and the data point (i.e. the residual)
• x is the chosen value of the predictor variable

The fitted (predicted) value on the line itself is $\hat{y} = a + bx$, so each residual is $\varepsilon = y - \hat{y}$.

Summary from lecture notes

The formulae for the estimates of the slope and the intercept are:

Slope: $$ b = \frac{SS_{xy}}{SS_{xx}} $$

Intercept: $$ a = \bar{y} - b\bar{x} $$

Where

$$ SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} $$

$$ SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} $$

n = sample size

3. Fitting A Regression Model in MINITAB

Using the drop-down menus in Minitab, go to Stat - Regression - Fitted Line Plot. In the dialog box:

1. Select the response variable.
2. Select the predictor variable.
3. Ensure that the linear model is selected.

This command gives you a scatter plot of the response variable versus the predictor variable, with the least squares line shown in blue on the plot. The least squares regression equation is displayed above the plot. In the session window you will also see the ANOVA table for this model and the associated p-value, similar to the output below. We will cover what this ANOVA table means in the next class.

    Regression Analysis: Damage - $ versus Distance

    The regression equation is
    Damage - $ = 10.28 + 4.919 Distance

    S = 2.31635   R-Sq = 92.3%   R-Sq(adj) = 91.8%

    Analysis of Variance

    Source       DF       SS       MS       F      P
    Regression    1  841.766  841.766  156.89  0.000
    Error        13   69.751    5.365
    Total        14  911.517

What is the least squares regression equation? _________________________________

What is the slope? ________________________________________________________

What is the intercept? ______________________________________________________

What type of relationship is there between distance and damage?

Now that we know the least squares regression equation, we can use it to make predictions for the dependent variable. If a building on fire was 10 miles from the nearest fire station, how much damage would be caused to it? ________________

Hint: Substitute 10 for x in the regression line equation.
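The slope and intercept formulae from the lecture-notes summary, and the substitution described in the hint, can be sketched in a few lines of Python as a way of checking hand calculations against Minitab. This is only an illustrative sketch: the function names `least_squares_fit` and `predict` and the four-point toy dataset are assumptions made up for the example and are not part of the lab data. The coefficients 10.28 and 4.919 are taken from the Minitab output above.

```python
def least_squares_fit(x, y):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    # SS_xy = sum(x_i * y_i) - (sum x_i)(sum y_i) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # SS_xx = sum(x_i^2) - (sum x_i)^2 / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    b = ss_xy / ss_xx                  # slope
    a = sum_y / n - b * sum_x / n      # intercept: y-bar minus b times x-bar
    return a, b


def predict(a, b, x_new):
    """Fitted value y-hat = a + b * x for a chosen x."""
    return a + b * x_new


# Hypothetical toy data lying exactly on y = 1 + 2x (NOT the fire damage data).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
a_toy, b_toy = least_squares_fit(toy_x, toy_y)
print("toy intercept:", a_toy, "toy slope:", b_toy)

# Prediction at Distance = 10 using the coefficients reported by Minitab.
predicted_damage = predict(10.28, 4.919, 10)
print("predicted damage at distance 10:", round(predicted_damage, 2))
```

Because the toy points lie exactly on a line, the fit should recover that line's intercept and slope, which makes it a convenient sanity check before entering a real dataset.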
(a) For each of the datasets used in last week's lab, find the least squares regression equation, the slope and the intercept.

1.
2.
3.
4.

Answer the following using your answers from (a).

If the three most recent volunteers in a blood donation clinic had blood pressures of 86, 91 and 101 respectively, what would your estimates of their platelet-calcium concentrations be? ___________________

Five people were randomly selected and their heights were measured to be 145, 150, 161, 165 and 177 cm. What are the estimated weights of these people? _______________________

If you compare the figures you have just calculated with the actual values in the original dataset, you will notice that they are not exactly the same. This is because the calculated figures are fitted values that lie on the regression line (shown in blue on the scatter plot). The discrepancies between the actual and fitted values are the residuals. One way of quantifying how large these discrepancies are overall is the coefficient of determination.

6. The Coefficient of Determination - R²

How much of the variation in y is explained by the linear relationship between x and y? The answer is given by the Coefficient of Determination, or R². The Coefficient of Determination is the ratio of the variation 'explained' by the linear relationship between the predictor and response variables to the total variation in the data.

Coefficient of Determination - R²

$$ R^2 = \frac{SS_{\text{regression}}}{SS_{\text{total}}} $$

What is R² for the regression model fitted to the fire damage dataset? _____________________

What does this figure mean? ______________________________________________________

Note that in the case of a simple linear regression model the coefficient of determination is the correlation coefficient squared. Calculate the square root of R² and compare it to the correlation coefficient.
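As a rough check on the ratio above, R² can be computed directly from the Regression and Total sums of squares in the ANOVA table of the Minitab output for the fire damage model. The sketch below is illustrative only; the variable names are assumptions chosen for the example. Note that the square root gives only the magnitude of the correlation coefficient, which takes its sign from the slope (positive here).

```python
import math

# Sums of squares copied from the ANOVA table in the Minitab session output.
ss_regression = 841.766
ss_total = 911.517

# R^2 = SS regression / SS total
r_squared = ss_regression / ss_total

# In simple linear regression |r| = sqrt(R^2); the sign of r matches the slope.
r_magnitude = math.sqrt(r_squared)

print("R-squared as a percentage:", round(100 * r_squared, 1))
print("|r|:", round(r_magnitude, 3))
```

The percentage printed should agree with the R-Sq value that Minitab reports for the same model.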
Calculate and interpret the coefficient of determination for each of the datasets from last week's lab.

1.
2.
3.
4.

Assignment: Due in two weeks' time.

From the Minitab class page, download the file named TV. This contains data for 15 students on their final year mark and the number of hours they spend watching TV.

1. Construct a simple linear regression line for this data and show the graph.
2. What is the correlation coefficient for this data?
3. Is there a negative or a positive correlation between the number of hours spent watching TV and the end of year grade?
4. What is the value of the intercept of the regression line with the y-axis?
5. What is the slope of this regression line?

Assignments should be handed in at the beginning of class two weeks from today. Late assignments will not be accepted.

REVISION SUMMARY

After this lab you should be able to:
- Calculate the correlation coefficient by hand and in Minitab
- Fit a simple linear regression line to data using Minitab
- Make predictions from the least squares regression line