Activity 4: Regression

Work in groups of two or three. Turn in one activity per group.

Person to whom the activity will be returned: _____________________
Other group members: _____________________ _____________________

From the course website www.apsu.edu/jonesmatt/1530.htm, open the Minitab worksheet called Life Expectancy vs IMR.MTW. Infant mortality rate (IMR) is the number of infant deaths (one year old or younger) per 1000 live births. This worksheet contains the infant mortality rates and the life expectancies (in years) for countries with populations of more than twelve million. Work through parts (a) through (j) below, and then show your work to your instructor.

(a) Use Minitab to make a scatterplot of Life Expectancy vs. IMR. To do this, click Stat – Regression – Fitted Line Plot. Use ‘Life Expectancy’ as the response variable and ‘IMR’ as the predictor variable, because we would like to be able to predict life expectancy when given the infant mortality rate for a particular country. Click on the box called ‘Graphs’ in the Fitted Line Plot window. Check ‘Residuals versus order’. Click ‘OK’ twice.

(b) You should see two new windows, each with a plot. First look at the one called “Fitted Line Plot”. Minitab has drawn the regression line (the straight line that gets closest to all the data values at the same time) in blue, and the equation of the regression line appears at the top of the scatterplot. Write the equation of the regression line below.

(c) The coefficient of determination $r^2$ is the proportion (often reported as a percentage) of the variation in the responses that is explained by the regression line:

$$r^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$

Here, the $\hat{y}_i$ are the responses predicted by the regression line using the observations $x_i$, the $y_i$ are the observed responses, and $\bar{y}$ is the mean of the observed responses. It turns out that the coefficient of determination is the square of the correlation coefficient $r$, which is why we call it $r^2$. Minitab displays $r^2$ to the right of the scatterplot next to ‘R-Sq’. What is the value of the coefficient of determination?
(Pay no attention to ‘R-Sq(adj)’.)

(d) The closer $r^2$ is to 1, the closer the regression line comes to the data points. So $r^2$ is one way (but not the only way) to measure how well the regression line models the data. Based only on the value of $r^2$, do you think this line is useful for predicting life expectancy from the infant mortality rate? Explain.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

(e) A new country called Matt’s Kingdom has formed, and its infant mortality rate is 42. Use the equation of the regression line to predict the life expectancy for citizens of Matt’s Kingdom.

(f) Another new country, Filly’s Federation, has an infant mortality rate of 200. Can you or should you use the regression line to predict the life expectancy for Filly’s Federation? Explain.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

(g) Now examine the plot in the other window, called “Residuals vs Order for LIFE EXPECTANCY”. A residual is an observed response minus the value predicted by the regression line:

$$\text{residual} = y_i - \hat{y}_i$$

It is simply how far, and in which direction, a response lies from the best-fitting line. An outlier is a data point that lies far from the regression line in the vertical direction, so outliers have residuals that are large in absolute value. A great way to detect outliers is to plot the residuals.
Move the mouse over points with large residuals. If you had to choose exactly two points to be outliers, which would they be? Identify them as (IMR, Life Expectancy) ordered pairs, e.g., (62, 59.3) and (78.1, 57.4). Carefully remove the two outliers and remake the fitted line plot. Does the new regression line fit the new data set better than before? Explain.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

(h) Now look back at the original data set (put the deleted points back or reopen the worksheet). An influential observation is a data point whose removal will cause a significant change in the regression line. If you had to choose exactly one influential observation from the original data set, what would it be? Identify this as an ordered pair.

(i) Remove your choice of influential observation from the data set, making sure you delete both the ‘Life Expectancy’ coordinate and the ‘IMR’ coordinate. Then use Minitab to make a new fitted line plot. Explain what has happened to the regression line. Also, what is the new value of the coefficient of determination? Is this regression line more useful than the previous one for making predictions? Explain.
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

(j) Save your Minitab project or email it to yourself in case you want to look at it again while studying for your exam.
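
Optional: if you would like to check Minitab's output by hand while studying, the quantities used in this activity (the regression line, the coefficient of determination, the residuals, and a prediction) can be sketched in a few lines of Python. This is only a minimal illustration, not part of the graded activity. The data values below are made up and are NOT the values in Life Expectancy vs IMR.MTW, and the function names fit_line, r_squared, and residuals are hypothetical names chosen for this sketch.

```python
# Hypothetical (IMR, life expectancy) pairs for illustration only;
# these are NOT the values in Life Expectancy vs IMR.MTW.
imr = [5.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0]
life = [79.0, 76.0, 72.0, 66.0, 60.0, 55.0, 50.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Explained variation divided by total variation in the responses."""
    y_bar = sum(y) / len(y)
    explained = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)
    total = sum((yi - y_bar) ** 2 for yi in y)
    return explained / total


def residuals(x, y, intercept, slope):
    """Observed response minus predicted response, one per data point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


b0, b1 = fit_line(imr, life)
r2 = r_squared(imr, life, b0, b1)
res = residuals(imr, life, b0, b1)

print(f"Life Expectancy = {b0:.2f} + ({b1:.4f}) * IMR")
print(f"R-Sq = {100 * r2:.1f}%")

# A prediction is only trustworthy inside the observed range of IMR values.
new_imr = 42.0
if min(imr) <= new_imr <= max(imr):
    print(f"Predicted life expectancy at IMR {new_imr}: {b0 + b1 * new_imr:.1f}")
else:
    print(f"IMR {new_imr} is outside the data range; this would be extrapolation.")

# The point with the largest absolute residual is the strongest outlier candidate.
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print(f"Largest residual {res[worst]:.2f} at ({imr[worst]}, {life[worst]})")
```

To imitate parts (g) through (i), delete a pair from both lists (the same position in imr and in life) and run the sketch again to see how the line and R-Sq change.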