Regression and Forecasting Models

Midterm Examination
IOMS Department
Regression and Forecasting Models
Professor William Greene
Phone: 212.998.0876
Office: KMC 7-90
Home page: people.stern.nyu.edu/wgreene
Email: [email protected]
Course web page: people.stern.nyu.edu/wgreene/regression/outline.htm
Midterm Examination: Fall 2013, Section 1
This examination contains 10 questions, each worth 10 points. Where questions have
more than one part, point values for the parts are shown. Please answer all questions.
All answers are to be given in this test booklet. No supplementary materials will be
accepted.
This is an open book test. You may use any notes, books, or other materials that you
like. You may not use a telephone, iPad, or any other device that is capable of sending
or receiving a signal during the exam.
In cases in which a computation involving several values is requested, you may report
the values in the appropriate formula. For example, if the answer were the result of
(1+1)/2, you may report the exact formula, (1+1)/2 rather than 1.
The Stern honor code applies to your participation in this examination.
YOUR NAME __________________
1. The following shows the results of regression of household income (HHNINC) on
years of education based on a survey of a large sample of German households.
(2)a. How large is the sample?
27326 = 27325+1
(2)b. What is the reported value of R²?
0.069
(3)c. What is the value of the sum of squared residuals?
796.319
(3)d. What are the constant term and slope of the estimated regression equation?
0.12609 and 0.019963.
(10)2. The regression model states that the relationship between a dependent variable y
and an independent variable x is
y = β0 + β1x + ε.
If I want to estimate β1, I will compute the regression of y on x (as we discussed in
class). But, just using some simple algebra, I can also write
x = -β0/β1 + (1/β1)y - (1/β1)ε.
This means that I could just compute the regression of x on y, instead, and take the
reciprocal of the slope coefficient that I compute – I will get the identical answer. True or
false? Justify your answer.
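If you regress y on x, your b1 will be

\[ b_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}. \]

If you regress x on y, your b1 will be

\[ b_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (y_i - \bar{y})^2}; \]

now take the reciprocal and your 1/b1 is

\[ \frac{1}{b_1} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}. \]

(Note the different denominators.)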
This is obviously different from the slope in the regression of y on x. So false.
Note this is not about the theory of the model. The question asked about what you would
get if you did the computations with your data.
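A small simulation illustrates the point. The data below are synthetic: the intercept 1.0, slope 2.0, noise level, sample size of 200, and random seed are all assumptions chosen for illustration, not values from the exam, and `slope` is a hypothetical helper that applies the least squares formula above.

```python
import random


def slope(u, v):
    """Least squares slope from regressing v on u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    return s_uv / s_uu


# Synthetic data (assumed values, for illustration only).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

b_yx = slope(x, y)  # slope from regressing y on x
b_xy = slope(y, x)  # slope from regressing x on y

print("slope of y on x:          ", b_yx)
print("reciprocal of x on y slope:", 1 / b_xy)
```

The two printed numbers differ. The product of the two slopes equals the squared correlation, so the reciprocal would match the slope of y on x only if the data fit the line perfectly.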
3. One of the interesting features of a country’s economy, or the world economy, is the
rate at which health expenditure increases in response to the size of the economy. In
the regression below, using the World Health Organization data that we discussed in
class, I have regressed the log of per capita health expenditure on the log of the GDP for
the 191 countries in their sample.
(5)a. What is the estimate of the elasticity of health expenditure with respect to income
(GDP)?
1.18226
(5)b. How do you interpret the value of the elasticity?
The elasticity implies that if GDP increases by 1%, health expenditure increases by about
1.18%.
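The "about" in this interpretation can be checked numerically. The sketch below assumes the log-log relationship holds exactly, with the elasticity from part a, and compares the approximate percentage change (elasticity times the GDP change) with the change implied by the model; the variable names are illustrative.

```python
elasticity = 1.18226  # estimated coefficient on log GDP, from part a
gdp_growth = 0.01     # a 1% increase in GDP

# Approximation used in the answer: % change in y = elasticity * % change in x.
approx_change = elasticity * gdp_growth

# Change implied by the log-log model: y scales as x ** elasticity.
exact_change = (1 + gdp_growth) ** elasticity - 1

print(f"approximate change: {approx_change:.4%}")
print(f"implied change:     {exact_change:.4%}")
```

For a change as small as 1% the two figures are very close, which is why the coefficient can be read directly as the elasticity.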
4. The following sample of 11 observations reports the sales in units for electronics
stores in the UK. To begin my study of how these businesses work, I computed a
regression of camera sales on the number of people working the floor (Staff).
(5)a. The equation states that if staff equals zero, sales of cameras will equal -86.7.
This is obviously nonsense. If there is no one working, sales should be zero. There is
obviously something wrong with this equation. True or false? Explain.
There is nothing wrong with the equation. The -86.75 is the constant term that is
needed to make the line pass through the middle of the data. No store has staff of
zero. The line predicts the outcome in the range of the data.
(5)b. What is the economic interpretation of the coefficient on Staff in this equation?
Each additional person added to the staff is associated with an increase in
camera sales of 58.427 units.
5. This question is based on the regression in Question 4. The average store size is a
little under 5 people. We will use 5 for the average value of staff. I want to predict the
sales of Cameras for a store that has Staff = 10 employees.
(3)a. What is the prediction of Camera sales for a store with Staff = 10?
The prediction is cameras = -86.75 + 58.427(10) = 497.52
(5)b. What are the lower and upper limits of a prediction interval for the sales of this
store?
(2)c. It’s OK if you use 2.0 or 1.96 for calculating the width of your interval. But, this
small sample has only 11 observations. The appropriate critical value is somewhat
larger than 2. The table you need is on slide 37 of Notes Part 2. What is the correct
value?
b. and c. The standard error to use is

\[ \sqrt{s_e^2 \left(1 + \tfrac{1}{N}\right) + (x^* - \bar{x})^2 \, SE_b^2} = \sqrt{49.9738^2 \left(1 + \tfrac{1}{11}\right) + (10 - 5)^2 \, 5.383^2} = 58.726. \]
The bounds of the interval are 497.52 ± t* (58.726). It is typical to use 1.96 or 2.00 for t*.
This is a very small sample. The actual appropriate value from the table mentioned in
part c is the t with N-2 = 9 degrees of freedom, 2.262.
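The arithmetic for parts a through c can be sketched as follows. All inputs are the values quoted in the answers above (coefficients, standard error of the regression 49.9738, standard error of the slope 5.383, and the t value 2.262); the variable names are illustrative.

```python
import math

# Values taken from the answers above.
b0, b1 = -86.75, 58.427  # constant and slope
s_e = 49.9738            # standard error of the regression
se_b1 = 5.383            # standard error of the slope
n = 11                   # sample size
x_bar = 5.0              # average staff, rounded as in the question
x_star = 10.0            # staff level for the prediction
t_crit = 2.262           # t critical value with n - 2 = 9 degrees of freedom

prediction = b0 + b1 * x_star

# Standard error of the prediction for a single new store.
se_pred = math.sqrt(s_e**2 * (1 + 1 / n) + (x_star - x_bar) ** 2 * se_b1**2)

lower = prediction - t_crit * se_pred
upper = prediction + t_crit * se_pred

print(f"prediction = {prediction:.2f}")
print(f"standard error = {se_pred:.3f}")
print(f"interval = [{lower:.2f}, {upper:.2f}]")
```

Replacing `t_crit` with 1.96 or 2.0 gives the narrower interval that part c warns about for a sample this small.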
6. Being an economic philosopher, I decided to use the production data in Question 4 to
test a couple hypotheses. The following shows the results of my regression of the log of
video sales on the log of floor space. I interpret this as a regression of the log of output
on the log of capital.
(5)a. Theory 1 (Marxist) holds that capital is not productive. The null hypothesis is that
the coefficient on log capital (logfloor) is zero. Test this hypothesis.
t = (1.9352 – 0)/.3651 = 5.30. This is much larger than 2.262, so the hypothesis is
rejected.
(5)b. Theory 2 (bland and noncommittal) holds that capital might be productive, but
there are constant returns to scale. The null hypothesis is that the coefficient is one.
Test this hypothesis.
t = (1.9352 – 1)/.3651 = 2.561. Still larger than 2.262, so this is also rejected.
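Both tests in this question use the same statistic with a different hypothesized value. A minimal sketch, using the coefficient, standard error, and critical value quoted in the answers (`t_stat` is a hypothetical helper name):

```python
b = 1.9352      # estimated coefficient on logfloor
se_b = 0.3651   # its standard error
t_crit = 2.262  # t critical value with 9 degrees of freedom


def t_stat(estimate, hypothesized, std_err):
    """t ratio for the null hypothesis that the coefficient equals `hypothesized`."""
    return (estimate - hypothesized) / std_err


for label, beta_null in [("Theory 1 (coefficient = 0)", 0.0),
                         ("Theory 2 (coefficient = 1)", 1.0)]:
    t = t_stat(b, beta_null, se_b)
    decision = "reject" if abs(t) > t_crit else "do not reject"
    print(f"{label}: t = {t:.3f}, {decision}")
```

The decision compares the absolute value of t with the critical value, so the same code handles estimates that fall below the hypothesized value.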
(10)7. The P value reported in the Analysis of variance table in the regression in
Question 6 is 0.000. Since this is a probability, the program is reporting that the
probability of the model is 0.000, i.e., impossible. Therefore we should discard the
model as worthless and build a new model. True or false? Explain.
The P value reports the probability of observing a coefficient as large (far from
zero) as what we observed if the actual coefficient really were zero. Since
P=0.000, we reject that assumption. The P value says it is (essentially) impossible
to observe this value b1 = 1.9352 if the true value of β1 really were zero.
(10)8. Using the data from Question 1, I decided to regress the log of income on age.
The results are shown below. Notice that Minitab reports that R² equals 0.0%. This is
nonsense. This is sloppiness on the part of the people who wrote the software. Luckily,
you can compute the R² for this regression using other numbers that are reported in the
results. What is the right value for R²?
R² = 2.9503/6599.6116 = 0.000447.
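The ratio is the regression sum of squares divided by the total sum of squares. The short sketch below reproduces it and also formats the result as a percentage with one decimal place, which is presumably how the reported 0.0% arises; that explanation is an assumption, not something the output states.

```python
regression_ss = 2.9503   # regression sum of squares, from the answer above
total_ss = 6599.6116     # total sum of squares, from the answer above

r_squared = regression_ss / total_ss

print(f"R-squared = {r_squared:.6f}")
print(f"as a percentage, one decimal place: {r_squared:.1%}")
```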
9. The figure below shows a scatter plot of the monthly returns on Microsoft stock vs.
Walmart stock. There are 70 observations in the sample.
(5)a. Based on this figure, is the correlation between these two variables positive or
negative? Justify your answer.
Positive. The slope of the line is positive. This is the same as the sign of the
correlation.
(5)b. Which is a good guess of the correlation between these two variables? Explain.
0.01 0.25 0.95 1.00
If you answered “negative” in part a, then put a minus sign on the guesses above.
0.01 is close to zero, which would show for an unorganized blob of points. But,
the regression line slopes up, so this is not reasonable. 0.95 and 1.00 are extremely
high. The points are not that organized around the line. That leaves 0.25. (The
actual value is .2486.)
(10)10. Using the data in the figure in Question 9, I computed a linear regression of the
Microsoft price on the Walmart price. I then computed the residuals and plotted them in
the figure below. Looking at these results, I don’t see any patterns that would make me
concerned about the model – they look like random noise to me. What kinds of patterns
would make me (an analyst) question whether my model is an appropriate regression
model?
Long streaks of positive and/or negative values would suggest nonrandomness.
Large numbers of very large residuals might also be suggestive. This scatter of
residuals looks like an unorganized blob of points that swings randomly between
positive and negative.
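One rough way to look for streaks is to find the longest run of residuals with the same sign. The sketch below is only an illustration: `longest_sign_streak` is a hypothetical helper, and both residual lists are made-up numbers, not the Microsoft and Walmart data.

```python
def longest_sign_streak(residuals):
    """Length of the longest run of consecutive residuals with the same sign."""
    longest = 0
    current = 0
    previous_sign = 0
    for r in residuals:
        sign = (r > 0) - (r < 0)  # +1, -1, or 0
        if sign != 0 and sign == previous_sign:
            current += 1
        else:
            current = 1 if sign != 0 else 0
        previous_sign = sign
        longest = max(longest, current)
    return longest


# Made-up residuals for illustration only.
swinging = [0.5, -0.3, 0.2, -0.6, 0.1, -0.4]
streaky = [-0.6, -0.4, -0.2, -0.1, 0.1, 0.3, 0.5, 0.7]

print(longest_sign_streak(swinging))
print(longest_sign_streak(streaky))
```

A streak that is long relative to the sample size would be a reason to look more closely at the model; this count is a quick screen, not a formal test.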