STAT 200 Guided Exercise 8 Answers

Problem 1. Two-Sample Proportion Problem.

Support for the President can be a tricky thing. The overall support can be thought of as a weighted average of the support among Republicans, Democrats, and Independents. Source: CBS News/New York Times Poll, Oct. 5-8, 2006. N = 983 adults nationwide who were likely voters. MoE ± 3 (for all adults). The group counts are given in part a.

Do you approve or disapprove of the way George W. Bush is handling his job as president?

We will focus on the proportion who approve. We expect large differences between Republicans and Democrats in support for Bush. Suppose Republican strategists are most concerned that the gap between Democratic likely voters and Independent likely voters is less than 22 points. We will therefore test the alternative hypothesis that the difference between Independent and Democratic approval of the President is less than 22 points.

a. Calculate the approval proportion for each group.
Democrats: pd = 31/314 = .0987, qd = .9013
Independents: pi = 104/384 = .2708, qi = .7292

b. Calculate the standard error for this problem. It does not assume that the two proportions are equal, because the hypothesized difference is .22 rather than 0, so the variances are not pooled.
SE = √(pi·qi/ni + pd·qd/nd) = √(.2708 × .7292/384 + .0987 × .9013/314) = √(.000514 + .000283) = .0282

c. Conduct the hypothesis test that the difference in approval of the President between Independents and Democrats is less than .22. Express this as pi - pd = .22 for the null hypothesis and pi - pd < .22 for the alternative. Use an alpha level of .05.
Null hypothesis: pi - pd = .22
Alternative hypothesis: pi - pd < .22 (one-tailed test)
Assumptions of test: large-sample difference-of-proportions test; approximately normal sampling distribution; variances not pooled (SE from part b)
Test statistic (z*): z* = (.2708 - .0987 - .22)/.0282
Rejection region: z* < z.05 = -1.645
Calculation of test statistic: z* = -1.696 (using unrounded proportions)
Comparison with rejection region: z* < z.05, since -1.696 < -1.645
Decision: Reject H0: pi - pd = .22

What is the p-value for the test?
p-value = P(Z < -1.696) = .045, which is below alpha = .05, consistent with rejecting H0.
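The test above can be reproduced with a short Python sketch. It assumes the counts from part a (104 of 384 Independents and 31 of 314 Democrats approving), uses the unpooled standard error from part b, and gets the lower-tail p-value from a standard normal CDF written with `math.erf`. The function name `diff_prop_z_test` is a hypothetical helper, not part of any course software.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def diff_prop_z_test(x1: int, n1: int, x2: int, n2: int, d0: float):
    """Large-sample test of H0: p1 - p2 = d0 against Ha: p1 - p2 < d0.

    Because d0 is not zero, the proportions are not assumed equal,
    so the standard error is unpooled.
    Returns (se, z, lower-tail p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - d0) / se
    return se, z, normal_cdf(z)


# Independents (group 1) versus Democrats (group 2), hypothesized gap .22
se, z, p_value = diff_prop_z_test(104, 384, 31, 314, 0.22)
reject = z < -1.645  # one-tailed rejection region at alpha = .05
print(f"SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

With unrounded proportions this gives SE ≈ .0282, z ≈ -1.696 and p ≈ .045, matching the hand calculation.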
Problem 2. Bivariate regression and correlation.

On the web site is an Excel file called TAMPALMS.xls. The data are from a study comparing residential property sale prices with appraisals. We will focus on sale price as the dependent variable (Y) and appraised value as the independent variable (X). The variables are:

PRICE: the actual sale price of the property in $1,000s
APPRAISED: the total appraised value in $1,000s
S2ARATIO: the ratio of the sale price to the appraised value
APPLAND: the appraised value of the land in $1,000s
APPIMPROV: the appraised value of improvements to the property in $1,000s

a. First, let's look at the descriptive statistics on Sale Price and Appraised Value. Using all your knowledge from the course, briefly describe each variable (ideas: compare the mean and median, the range, and so forth).

The mean PRICE is 236.56 ($236,560). The median is considerably lower at 190 ($190,000), so a few extremely high sale prices pull the mean upward. The histogram shows substantial right skew. Sale prices ranged from 59.00 ($59,000) to 957.50 ($957,500). The first and third quartiles were $153,250 and $287,250 (IQR = $134,000), so the middle 50% of the observations fell between those values.

The results for APPRAISED are similar, but the mean and median are lower, which suggests the appraisals tended to understate the sale prices. The mean APPRAISED value is 201.75 ($201,750). The median is considerably lower at 167.07 ($167,070), so again extreme high values pull the mean upward, and the histogram shows substantial right skew. Appraised values ranged from 64.98 ($64,980) to 929.40 ($929,400). The first and third quartiles were $126,220 and $249,180 (IQR = $122,960), bounding the middle 50% of the observations.

There is considerable variation in both variables: the CVs are 58.95% for PRICE and 62.89% for APPRAISED.

b. I used JMP to create a correlation matrix of all the variables in the data set (including S to App Ratio, App Land and App Improve).
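The IQRs, and the standard deviations implied by the reported CVs, can be checked with simple arithmetic. This sketch uses only the summary values quoted above (in $1,000s); the standard deviations are back-calculated from CV = SD / mean × 100 and are not taken from the JMP output, and the helper name `summarize` is hypothetical.

```python
# Summary statistics quoted in the answer above, in $1,000s (CV in percent).
SUMMARIES = {
    "PRICE": {"mean": 236.56, "median": 190.00, "q1": 153.25, "q3": 287.25, "cv": 58.95},
    "APPRAISED": {"mean": 201.75, "median": 167.07, "q1": 126.22, "q3": 249.18, "cv": 62.89},
}


def summarize(stats: dict) -> dict:
    """Derive the IQR, the SD implied by the CV, and a simple skew check."""
    return {
        "iqr": stats["q3"] - stats["q1"],
        # CV = SD / mean * 100, so SD = CV * mean / 100
        "implied_sd": stats["cv"] * stats["mean"] / 100,
        # a mean well above the median is consistent with right skew
        "mean_above_median": stats["mean"] > stats["median"],
    }


results = {name: summarize(s) for name, s in SUMMARIES.items()}
for name, r in results.items():
    print(f"{name}: IQR = {r['iqr']:.2f}, implied SD = {r['implied_sd']:.2f}, "
          f"mean > median: {r['mean_above_median']}")
```

For PRICE this gives an IQR of 134.00 ($134,000) and an implied SD of about 139.45; for APPRAISED, an IQR of 122.96 and an implied SD of about 126.88.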
Correlations

             PRICE   APPRAISED   S2ARATIO   APPLAND   APPIMPROV
PRICE        1.000     0.972      -0.012     0.889      0.973
APPRAISED    0.972     1.000      -0.214     0.936      0.988
S2ARATIO    -0.012    -0.214       1.000    -0.276     -0.171
APPLAND      0.889     0.936      -0.276     1.000      0.879
APPIMPROV    0.973     0.988      -0.171     0.879      1.000

a. What is the correlation between Sale Price and App Value? Briefly describe what this correlation means.
r = .972. There is a strong positive relationship: properties with higher appraised values tend to sell for higher prices.

b. What is the correlation between App Land and S to App Ratio? Briefly describe what this means.
r = -.276. There is a weak negative relationship: as the appraised value of the land goes up, the ratio of sale price to appraised value tends to go down.

c. I used JMP to make a scattergram of Sale Price (Y) against Appraised Value (X). Briefly describe the relationship.
d. There is a strong, positive, linear relationship between PRICE and APPRAISED.

e. I used JMP to generate the regression of Sale Price on Appraised Value.
a) How many observations are in the data? 92 observations.
b) What is R-square for this model? Briefly explain what it means. R-square = .945: 94.5% of the variability in sale price is "explained" by knowing the appraised value.
c) Show that the bivariate regression R-square is the same as the correlation coefficient squared. r² = (.972)² = .945
d) What is the slope coefficient for App Value? Briefly explain what it means. Slope = 1.0687. Because both variables are in $1,000s, a one-unit ($1,000) increase in appraised value is associated with a 1.0687-unit (about $1,069) increase in the predicted sale price.
e) What is the intercept coefficient for this model? Intercept = 20.9419

f. Solve the regression equation for a property with an appraised value of $150k (that is, use the coefficients from the regression output to compute a predicted Sale Price).
Est Y = 20.9419 + 1.0687(150)
Est Y = 20.9419 + 160.305
Est Y = 181.25, or about $181,250
(Rounding the coefficients first, 20.94 + 1.07(150) = 20.94 + 160.50 = 181.44, or $181,440; the difference is due only to rounding.)
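The R-square check and the prediction in part f can be reproduced from the reported numbers. This sketch hard-codes r = .972 and the JMP intercept and slope; it does not refit the model, and `predict_sale_price` is a hypothetical helper name.

```python
R = 0.972            # correlation between PRICE and APPRAISED
INTERCEPT = 20.9419  # from the JMP regression output
SLOPE = 1.0687       # change in predicted PRICE per 1-unit ($1,000) change in APPRAISED


def predict_sale_price(appraised_k: float) -> float:
    """Predicted sale price in $1,000s for an appraised value in $1,000s."""
    return INTERCEPT + SLOPE * appraised_k


r_squared = R ** 2                  # bivariate R-square equals r squared
estimate = predict_sale_price(150)  # appraised value of $150k

print(f"r^2 = {r_squared:.3f}")
print(f"Predicted sale price: {estimate:.2f} (${estimate * 1000:,.0f})")
```

Keeping the unrounded slope is why this gives 181.25 rather than the 181.44 obtained with a slope rounded to 1.07.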
Problem 3. Bivariate regression and correlation.
Model of Average Annual Precipitation

An article in Geography (July 1980) used regression to predict average annual rainfall levels in California. Data on the following variables were collected for 30 meteorological weather stations scattered throughout California. For the group work we will focus on a bivariate regression of Annual Precip on Latitude. You will have the option of examining all the variables for this problem in the last assignment.

Annual Precip: DEPENDENT VARIABLE, annual precipitation in inches
Altitude: the altitude of the station in feet
Latitude: the latitude of the station in degrees
Distance: distance from the coast in miles
Facing: I made this into a dummy variable. Stations on the westward-facing slopes of the California mountains were coded 1; stations on the leeward side were coded 0.

a. The following are the descriptive statistics on each variable. Briefly describe Annual Precipitation using the mean, median, standard deviation and so forth.
The mean annual precipitation at the 30 weather stations is 19.81 inches per year. This is larger than the median (15.35) because it is pulled up by large values in the data (the maximum is 74.87 inches and the range is 73.21 inches). As a result, the variance is relatively large and the CV is 83.90%.

B. What is the interpretation of the mean for FACING?
Since Facing is a dummy variable, its mean is the proportion of stations coded 1, that is, stations on the westward-facing slopes: 43.3% of the stations (13 of 30) are on westward-facing slopes.

The following are the covariance matrix and the correlation matrix for the variables.
[Covariance Matrix] [Correlation Matrix]

C. Confirm the following for Annual Precip and Latitude: the diagonal of the covariance matrix is roughly the variance (the small difference arises because the formula used for the covariances divides by n rather than n - 1).
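The CV and the interpretation of the FACING mean can be checked in a few lines. The sketch uses the mean of 19.81 and the sample variance of 276.26 reported for Annual Precip; the 0/1 list for FACING is hypothetical, built only to match 13 westward-facing stations out of 30 (43.3%), and is not the actual station data.

```python
import math

# Annual Precip summary values from the exercise (inches)
MEAN_PRECIP = 19.81
VAR_PRECIP = 276.26  # sample variance (n - 1 divisor)

sd_precip = math.sqrt(VAR_PRECIP)
cv_percent = sd_precip / MEAN_PRECIP * 100  # CV = SD / mean * 100

# Hypothetical FACING column: 13 stations coded 1 (westward), 17 coded 0 (leeward).
# The order is arbitrary; only the counts matter for the mean.
facing = [1] * 13 + [0] * 17
share_westward = sum(facing) / len(facing)  # mean of a 0/1 dummy = proportion of 1s

print(f"SD = {sd_precip:.2f} in, CV = {cv_percent:.2f}%")
print(f"Mean of FACING = {share_westward:.3f}")
```

Any 0/1 column with 13 ones out of 30 has mean .433, which is why the mean of a dummy reads directly as a proportion.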
                 Variance (divides by n - 1)   Covariance with itself (divides by n)
Annual Precip    276.26                        267.05
Latitude         7.11                          6.873

The difference between them comes from dividing by n (for the covariances) or by n - 1 (for the sample variances). With n = 30, for example, 276.26 × 29/30 = 267.05 and 7.11 × 29/30 = 6.873.

D. Briefly describe the correlation between Annual Precip and Latitude.
The correlation between Annual Precip and Latitude is r = .577, which is moderate and positive: stations at higher latitudes tend to receive more annual precipitation.

E. Facing is a dummy variable. Stations on the westward-facing slopes of the California mountains were coded 1, whereas stations on the leeward side were coded 0. Interpret the correlation between Annual Precip and Facing.
The correlation of .598 is moderate and positive: stations on the westward-facing slopes of the mountains tended to have higher annual rainfall.

F. Now we will shift to the bivariate regression of Annual Precip on Latitude. Verify that R² in a bivariate regression is simply the correlation (r) squared. Interpret R² for this model.
r = .577, r² = (.577)² = .3329, R² = .333
About 33.3% of the variability in annual precipitation across the stations is explained by latitude.

Solve the model for a latitude of 37 degrees.
Est Y = -113.303 + 3.595(37)
Est Y = -113.303 + 133.015
Est Y = 19.71 inches
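The n versus n - 1 relationship, the R² check, and the latitude prediction can be sketched together. This assumes n = 30 stations and uses the reported variances, r = .577, and the regression coefficients (-113.303 and 3.595) as given; it does not refit the model, and the helper names are hypothetical.

```python
N = 30  # number of weather stations


def to_n_divisor(sample_variance: float, n: int) -> float:
    """Convert an (n - 1)-divisor variance to the n-divisor value on the covariance diagonal."""
    return sample_variance * (n - 1) / n


R = 0.577  # correlation of Annual Precip with Latitude
INTERCEPT, SLOPE = -113.303, 3.595


def predict_precip(latitude: float) -> float:
    """Predicted annual precipitation (inches) at a given latitude (degrees)."""
    return INTERCEPT + SLOPE * latitude


cov_precip = to_n_divisor(276.26, N)  # about 267.05
cov_lat = to_n_divisor(7.11, N)       # about 6.873
r_squared = R ** 2                    # about .333
est_37 = predict_precip(37)           # about 19.71 inches

print(f"{cov_precip:.2f} {cov_lat:.3f} {r_squared:.3f} {est_37:.2f}")
```

The prediction at 37 degrees lands close to the overall mean of 19.81 inches, as expected when the latitude is near the average latitude of the stations.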