
STAT 200
Guided Exercise 8 Answers
Problem 1. Two Sample Proportion Problem. Support for the President can be a tricky thing.
The overall support can be thought of as a weighted average of the support among Republicans, Democrats,
and Independents. The data come from a CBS News/New York Times poll conducted Oct. 5-8, 2006, of N = 983
adults nationwide who were likely voters (margin of error ± 3 points for all adults). The data are below.
Poll question: "Do you approve or disapprove of the way George W. Bush is handling his job as president?"
We will focus on the proportion who approve. We expect large differences between Republicans and
Democrats in support for Bush. Suppose Republican strategists are most concerned that the gap
between Democratic likely voters and Independent likely voters is less than 22 points. We will therefore
test the alternative hypothesis that the difference between Independent support for the President and
Democratic support for the President is less than 22 percentage points.
a. Calculate the approval proportion for each group.
Democrats: pd = 31/314 = .0987, qd = 1 - pd = .9013
Independents: pi = 104/384 = .2708, qi = 1 - pi = .7292
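b. Calculate the standard error for this problem. It does not assume that the two proportions are
equal, because we are testing whether the difference is less than .22 rather than whether it is zero.
SE = sqrt(pi*qi/ni + pd*qd/nd) = sqrt(.2708(.7292)/384 + .0987(.9013)/314) = .0282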
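A minimal Python sketch of parts a and b, using only the counts above (31 of 314 Democrats and 104 of 384 Independents approve). The helper name `unpooled_se` is hypothetical; it computes the unpooled standard error sqrt(p1*q1/n1 + p2*q2/n2), which does not assume the two proportions are equal.

```python
from math import sqrt

def unpooled_se(x1: int, n1: int, x2: int, n2: int) -> float:
    """Standard error of p1 - p2 without pooling (p1 and p2 not assumed equal)."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Independents: 104 of 384 approve; Democrats: 31 of 314 approve.
p_i, p_d = 104 / 384, 31 / 314
se = unpooled_se(104, 384, 31, 314)
print(f"p_i = {p_i:.4f}, p_d = {p_d:.4f}, SE = {se:.4f}")
```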
c. Conduct the hypothesis test that the difference in approval of the President between Independents
and Democrats is less than .22. Express the null and alternative hypotheses in terms of pi - pd and .22.
Use an alpha level of .05.
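Null Hypothesis: pi - pd = .22
Alternative Hypothesis: pi - pd < .22 (one-tailed test)
Assumptions of Test: large-sample difference of proportions test; normal approximation; unpooled
variance (the two proportions are not assumed equal)
Test Statistic (z*): z* = (pi - pd - .22)/SE = (.2708 - .0987 - .22)/.0282
Rejection Region: reject Ho if z* < -z.05 = -1.645
Calculation of Test Statistic: z* = -1.696 (from the unrounded proportions; the rounded values above give -1.70)
Comparison of Test Statistic with Rejection Region: z* < -1.645, since -1.696 < -1.645
Conclusion: Reject Ho: pi - pd = .22. At the .05 level, the gap in approval between Independents and
Democrats is significantly less than 22 points.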
What is the p-value for the test? p-value = P(Z < -1.696) = .045, which is below alpha = .05.
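To check the test statistic and the p-value, the sketch below recomputes z* from the raw counts and evaluates the lower-tail normal probability with `math.erf`, so no statistics package is needed. The function names are hypothetical, and the sketch assumes the same large-sample normal approximation and unpooled standard error as the table above.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def diff_prop_z(x1: int, n1: int, x2: int, n2: int, d0: float) -> float:
    """z statistic for H0: p1 - p2 = d0, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - d0) / se

z = diff_prop_z(104, 384, 31, 314, d0=0.22)  # Independents minus Democrats
p_value = normal_cdf(z)                      # lower tail, since H1 is p_i - p_d < .22
alpha = 0.05
print(f"z* = {z:.3f}, p-value = {p_value:.3f}, reject H0: {p_value < alpha}")
```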
Problem 2. Bivariate regression and correlation. On the web site is an Excel file called
TAMPALMS.xls. The data are from a study comparing residential property sale prices to appraised values.
We will focus on sale price as the dependent variable (Y) and appraised value as the
independent variable (X). The variables are:
PRICE       The actual sale price of the property in $1,000s
APPRAISED   The total appraised value in $1,000s
S2ARATIO    The ratio of the Sale Price to the Appraised Value
APPLAND     The appraised value of the land in $1,000s
APPIMPROV   The appraised value of improvements to the property in $1,000s
a. First, let’s look at the descriptive statistics on Sale Price and Appraised Value. Using
all your knowledge from the course, I want you to briefly describe each variable (ideas:
compare the mean and median, the range, and so forth).
The mean level for PRICE is 236.56 ($236,560). The median is considerably below that at 190
($190,000), so a few very high sale prices pull the mean upward. The histogram shows substantial
right skew. Sale prices ranged from 59.00 ($59,000) to 957.50 ($957,500). However, the first and
third quartiles were $153,250 and $287,250, so the middle 50% of the observations fell within that
interval (IQR = $134,000).
The results for APPRAISED are similar, but the mean and median are both lower, suggesting that the
appraisers tended to underestimate the sale value of the homes. The mean level for APPRAISED is 201.75
($201,750). The median is considerably below that at 167.07 ($167,070), so again a few very high values
pull the mean upward. The histogram shows substantial right skew. Appraised values ranged from 64.98
($64,980) to 929.40 ($929,400). The first and third quartiles were $126,220 and $249,180, so the middle
50% of the observations fell within that interval (IQR = $122,960).
There is considerable variation in both variables. The coefficients of variation (CV = 100 * s / mean)
are 58.95% for PRICE and 62.89% for APPRAISED.
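The summaries used above (mean, median, quartiles, IQR, range, CV) can be sketched with Python's standard `statistics` module. The prices below are hypothetical, not the TAMPALMS file, and JMP's quartile convention may differ slightly from the default "exclusive" method of `statistics.quantiles`. The helper name `describe` is also hypothetical.

```python
import statistics

def describe(values):
    """Mean, median, quartiles, IQR, range and CV (%) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                   # sample SD (divides by n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "q1": q1, "q3": q3, "iqr": q3 - q1,
        "min": min(values), "max": max(values),
        "cv_pct": 100 * sd / mean,
    }

# Hypothetical right-skewed sale prices in $1,000s (NOT the TAMPALMS data).
prices = [120, 150, 165, 180, 190, 210, 260, 300, 450, 950]
summary = describe(prices)
print(summary["mean"], summary["median"], summary["iqr"])
print(summary["mean"] > summary["median"])  # a long right tail pulls the mean up
```

As with PRICE and APPRAISED, one very large value (950) drags the mean well above the median, while the quartiles are barely affected.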
b. I used JMP to create a correlation matrix of all the variables in the data set (including S to
App Ratio, App Land and App Improve).
Correlations
              PRICE   APPRAISED   S2ARATIO   APPLAND   APPIMPROV
PRICE         1.000     0.972      -0.012     0.889      0.973
APPRAISED     0.972     1.000      -0.214     0.936      0.988
S2ARATIO     -0.012    -0.214       1.000    -0.276     -0.171
APPLAND       0.889     0.936      -0.276     1.000      0.879
APPIMPROV     0.973     0.988      -0.171     0.879      1.000
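Each entry in the matrix is a Pearson correlation. The sketch below computes one from deviations about the means, so the n - 1 factors in the covariance and the two standard deviations cancel. The five (appraised, price) pairs and the function name `pearson_r` are hypothetical, not the TAMPALMS data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-deviation of x and y over the root of their squared deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # the n - 1 divisors cancel

# Hypothetical appraised values (x) and sale prices (y), in $1,000s.
appraised = [100, 150, 200, 250, 300]
price = [110, 170, 230, 260, 340]
print(round(pearson_r(appraised, price), 3))
```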
a. What is the correlation between Sale Price and App Value? Briefly describe what this
correlation means.
r = .972. There is a strong positive linear relationship: properties with higher appraised values
tend to sell for higher prices.
b. What is the correlation between App Land and S to App Ratio? Briefly describe what this means.
r = -.276. There is a weak negative relationship. As the appraised value of the land goes up, the ratio
of sale price to appraised value tends to go down.
c. I used JMP to make a scattergram of Sale Price (Y) against Appraised Value (X). Briefly
describe the relationship.
There is a strong, positive, linear relationship between PRICE and APPRAISED.
e. I used JMP to generate the regression of Sale Price on Appraised Value.
a) How many observations are in the data?
92 observations
b) What is R-square for this model? Briefly explain what it means.
R-square = .945: 94.5% of the variability in sale price is "explained" by knowing the
appraised value.
c) Prove that the bivariate regression R-square is the same as the correlation
coefficient squared.
r² = (.972)² = .945
d) What is the slope coefficient for App Value? Briefly explain what it means.
Slope = 1.0687. Because both variables are in $1,000s, each additional $1,000 of appraised value is
associated with an increase of about $1,069 (1.07 thousand) in the predicted sale price.
e) What is the intercept coefficient for this model?
Intercept = 20.9419
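As a consistency check on the JMP output, this sketch recovers the slope and intercept from the summary statistics reported in part a, using the standard least-squares identities b1 = r * s_y / s_x and b0 = ybar - b1 * xbar. The standard deviations are back-computed from the reported CVs (s = CV * mean / 100), so the result matches JMP only up to rounding: roughly 1.068 and 21.0 versus 1.0687 and 20.9419. The helper name `ols_from_summary` is hypothetical.

```python
def ols_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Bivariate least-squares slope and intercept from summary statistics."""
    slope = r * sd_y / sd_x              # b1 = r * s_y / s_x
    intercept = mean_y - slope * mean_x  # the fitted line passes through (xbar, ybar)
    return slope, intercept

# Reported means: PRICE 236.56, APPRAISED 201.75 ($1,000s); SDs from CV = 100 * s / mean.
sd_price = 0.5895 * 236.56
sd_appraised = 0.6289 * 201.75
b1, b0 = ols_from_summary(0.972, 201.75, sd_appraised, 236.56, sd_price)
print(f"slope ~ {b1:.3f}, intercept ~ {b0:.1f}")
```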
f) Solve the regression equation for a property with an appraised value of $150k (by this I
mean use the coefficients from the regression output and solve the equation to come up
with a predicted value for the Sale Price).
Est Y = 20.94 + 150*1.07
Est Y = 20.94 + 160.50
Est Y = 181.44, or $181,440
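The same prediction as a short Python sketch, evaluated with both the rounded slope used above and the full JMP coefficients. With the unrounded values the prediction is about 181.25 ($181,250), so rounding the slope to 1.07 shifts the answer by roughly $190. The function name `predict` is hypothetical.

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Predicted Y from a fitted line: Est Y = b0 + b1 * x."""
    return intercept + slope * x

# Sale Price on Appraised Value, both in $1,000s; x = 150 means $150k.
rounded = predict(20.94, 1.07, 150)        # the hand calculation above
unrounded = predict(20.9419, 1.0687, 150)  # full coefficients from the output
print(f"{rounded:.2f} vs {unrounded:.2f}")
```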
Problem 3. Bivariate regression and correlation. Model of Average Annual Precipitation
An article in Geography (July 1980) used regression to predict average annual rainfall levels in California.
Data on the following variables were collected for 30 meteorological weather stations scattered
throughout California. For the group work we will focus on a bivariate regression of Annual Precip on
Latitude. You will have the option of examining all the variables for this problem in the last assignment.
Annual Precip   DEPENDENT VARIABLE: Annual precipitation in inches
Altitude        The altitude of the station in feet
Latitude        The latitude of the station in degrees
Distance        Distance from the coast in miles
Facing          I made this into a dummy variable. Stations on the westward-facing slopes of the California
                mountains were coded as 1, whereas stations on the leeward side were coded as 0.
a. The following are the descriptive statistics on each variable. Briefly describe Annual Precipitation
using the mean, median, std deviation and so forth.
The mean annual precipitation at the 30 weather stations is 19.81 inches per year.
This value is larger than the median (15.35) because it is pulled up by large values in the data (the
maximum is 74.87 inches and the range is 73.21 inches).
As a result the variance (276.26) is relatively large, and the CV is 83.90%.
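The CV follows directly from the sample variance of 276.26 (reported in part C) and the mean of 19.81 inches. A minimal sketch, with a hypothetical helper name:

```python
from math import sqrt

def coefficient_of_variation(variance: float, mean: float) -> float:
    """CV in percent: the sample standard deviation as a share of the mean."""
    return 100 * sqrt(variance) / mean

# Annual Precip: sample variance 276.26, mean 19.81 inches.
cv = coefficient_of_variation(276.26, 19.81)
print(f"CV = {cv:.2f}%")
```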
B. What is the interpretation of the mean for FACING?
Since Facing is a dummy variable, its mean is the proportion of stations coded 1, i.e., stations on
the westward-facing slopes. 43.3% of the stations (13 of 30) are on westward-facing slopes.
The following are the covariance matrix and the correlation matrix for the variables.
[Covariance Matrix]
[Correlation Matrix]
C. Confirm the following for Annual Precip and Latitude: the diagonal entries of the Covariance Matrix are
roughly the variances (the small difference is that the formula for covariances divides by n rather than n - 1).
                                          Annual Precip   Latitude
Variance (divides by n - 1)                    276.26         7.11
Covariance with itself (divides by n)          267.05         6.873
The differences arise from dividing by n (for the covariances) rather than n - 1 (for the sample
variances). With n = 30, 276.26 * 29/30 = 267.05 and 7.11 * 29/30 = 6.873.
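The n versus n - 1 rescaling can be checked in a few lines. The first part applies the factor (n - 1)/n to the reported sample variances; the second part confirms on a small hypothetical sample that Python's `statistics.pvariance` (divides by n) equals `statistics.variance` (divides by n - 1) times (n - 1)/n. The helper name `divide_by_n` is hypothetical.

```python
import statistics

def divide_by_n(sample_variance: float, n: int) -> float:
    """Rescale a sample variance (denominator n - 1) to the denominator-n version."""
    return sample_variance * (n - 1) / n

n = 30  # weather stations
precip_cov = divide_by_n(276.26, n)  # Annual Precip
latitude_cov = divide_by_n(7.11, n)  # Latitude
print(round(precip_cov, 2), round(latitude_cov, 3))

# Same identity on a small hypothetical sample (n = 4).
data = [2.0, 4.0, 4.0, 5.0]
print(statistics.pvariance(data), statistics.variance(data) * 3 / 4)
```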
D. Briefly describe the correlation between Annual Precip and Latitude.
The correlation between Annual Precip and Latitude is r = .577, which is moderate and positive. As
latitude increases (moving north), annual precipitation tends to increase.
E. Facing is a dummy variable. Stations on the Westward facing slopes of the California mountains
were coded as 1, whereas stations on the leeward side were coded as 0. Interpret the correlation
between Annual Precip and Facing.
The correlation, r = .598, is moderate and positive: stations on the westward-facing side of the
mountains tended to have higher annual rainfall.
F. Now we will shift to the bivariate regression of Annual Precip on Latitude.
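Verify that R² in a bivariate regression is simply the correlation (r) squared. Interpret R² for this model.
r = .577
r² = (.577)² = .3329
R² = .333
About 33% of the variation in annual precipitation across the 30 stations is explained by latitude.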
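A short check of R² = r², plus a consistency check of the slope used in the prediction that follows: from the reported sample variances, b1 = r * s_y / s_x gives about 3.597, close to the 3.595 in the regression output (the gap is rounding in r and the variances).

```python
from math import sqrt

r = 0.577                 # correlation of Annual Precip with Latitude
r_squared = r ** 2        # in a bivariate regression, R-squared equals r squared
print(round(r_squared, 4))

# Slope from summary statistics: b1 = r * s_y / s_x, with s = sqrt(sample variance).
s_precip = sqrt(276.26)   # Annual Precip, inches
s_latitude = sqrt(7.11)   # Latitude, degrees
slope = r * s_precip / s_latitude
print(round(slope, 3))
```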
Solve the model for a Latitude of 37 degrees.
Est Y = -113.303 + 3.595(37)
Est Y = -113.303 + 133.01
Est Y = 19.71 inches
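The prediction above as a small function, assuming the coefficients -113.303 and 3.595 from the regression output; the function name is hypothetical. The second line illustrates the slope: moving one degree north raises the predicted precipitation by 3.595 inches.

```python
def predicted_precip(latitude_deg: float) -> float:
    """Est Y = -113.303 + 3.595 * Latitude (inches of annual precipitation)."""
    return -113.303 + 3.595 * latitude_deg

print(f"{predicted_precip(37):.2f} inches at 37 degrees")
# One extra degree of latitude adds the slope to the prediction.
print(round(predicted_precip(38) - predicted_precip(37), 3))
```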