Perfect Competition Monopoly Price → Quantity Demanded

Homework 5: ECO220Y
Required Exercises: Chapter 7: 3, 5 – 8, 13, 17, 19, 21, 23, 31, 33, 41, 43, 47, 59, 63 (Note: The first sets of exercises at
the end of each chapter correspond to specific sections. These are the easiest exercises. These are best done as you
complete each section (i.e. before you continue and finish the reading).)
Extra Exercises (for those who want more practice): Chapter 7: 15, 29, 35
Required Problems:
(1) An important problem in economics is how to estimate demand.
You’ll recall from introductory and intermediate microeconomics that
when analyzing a market – regardless of the model of competition – we
specify demand. For example, it could be P = 100 – Q, which is a linear
demand curve. The more general linear form is P = a – bQ where a and b
are parameters that we will need to estimate using data. We know that
quantity demanded will of course be affected by price. In fact, one of the
undisputed tenets of economics is the Law of Demand: quantity
demanded goes down as price goes up and vice versa.
Price → Quantity Demanded
P
$13.95
$9.95
D
Q
(a) Give an example of time series data containing prices and
quantities of some good. Give an example of cross-sectional data containing prices and quantities of some good.
(b) When considering collecting market data on prices and quantities, what kind of data will typically be
available? (observational or experimental) Why would the other kind seldom be available?
Estimating demand from observational data containing only prices and quantities is impossible. This is because demand
shifters affect both the price and the quantity sold. This makes price an endogenous variable and means that there will
be a serious endogeneity bias in the estimate of the slope of demand (i.e. for a simple linear demand) obtained from a
regression analysis (least squares method). To see that demand shifters – population size, tastes, income, prices of
substitute goods, prices of compliment goods, etc. – affect both the market price and the market quantity consider two
simple models: perfect competition and monopoly.
Monopoly
Perfect Competition
P
S
P
$13.95
$13.95
$9.95
$9.95
D
D
mc
D’
19
20
Q
19
20
D’
Q
(c) Redraw the diagram for the experimental drug trial data that we did as an example in lecture: that is, a
diagram with boxes and arrows that illustrates the research question and the role of any other variables. Next
draw such a diagram for the question of estimating demand where we ask how price affects the quantity
demanded. Make sure to include the role of demand shifters in your diagram.
Henry J. Moore, “father of economic statistics,” conduced regressions for many industries in early 20th century. In some
regressions found negative demand elasticities, but in pig iron, for example, found positive demand elasticity and
Page 1 of 9
concluded “he had discovered a new type of demand curve with positive slope.” Of course this is insane: I think we’d all
like start businesses selling products where the higher the price we charge – other things equal (including quality,
service, etc.) – the more our customers want to buy! Moore’s regression analysis had a serious endogeneity bias and
hence did not reflect the real slope of demand for pig iron at all. To illustrate consider the following diagram.
P
S2
S1
S3
D3
D1
D2
Q
(d) If the above diagram shows three different time periods which kind of data would it generate on prices and
quantities? (observational or experimental; time series, cross-sectional, or panel) How many variables? How
many observations? Would the correlation be positive or negative? Does this mean demand is upward sloping?
(e) Suppose we had a random sample of 10,000 customers. At random each was offered the opportunity to buy
a product at a price of $1, $1.50, $2, $2.5 or $3. We used their purchase decisions to construct data with five
observations and two variables: price and quantity sold. Could we use these data to estimate the slope of
demand? Would that estimate suffer from an endogeneity bias? Is it a problem that the 10,000 customers have
various different tastes, incomes, etc.?
(2) How does time spent sleeping the night before a test affect a student’s mark? As part of an empirical project a
student surveys a random sample of 10 students in a large economics course. Right after the test each student is asked
how much time they had slept the night before (in hours). The student then follows up with the sample a week later to
ask about their mark on the test. There were 120 points possible. Here are the data. Use only a non-programmable
handheld calculator for all parts of this problem.
mark
79
55
85
58
57
74
83
97
69
36
100
mark
hours
7
1.5
8.5
4
3.5
7
8
8
3.5
2.5
80
60
40
2
4
6
hours
8
10
(a) Compute the covariance and indicate the units of measurement.
(b) Using your answer from (a) and the fact that the s.d. of sleep is 2.60 hours and the s.d. of marks is 17.98,
compute the coefficient of correlation and interpret it.
(c) Compute coefficient of determination and interpret it.
(d) What kind of data are these: observational, experimental, or a natural experiment?
Page 2 of 9
(e) Find the regression line equation and interpret the coefficients. Explain how you can or cannot use these
results to answer the research question posed at the start of this problem.
(f) A team of students is called upon to replicate these results. They collect a
larger random sample (n = 30) and obtain the following results. Presuming
that both the original study and the replication study contained no nonsampling errors, what is the best explanation for each of the differences (i.e.
the difference in the covariance, standard deviations, correlation, R2,
intercept, and slope)?
mark
120
100
80
60
40
covariance = 21.75; s.d. hours = 1.76; s.d. marks = 19.10; correlation = 0.65
OLS results: mark-hat = 39.9 + 7.0*hours, n = 30, R-squared = 0.42
2
4
6
8
hours
10
(g) Consider again the results from Part (f). Suppose that marks are calculated as percentage marks (i.e. 0 –
100%) instead of raw marks out of 120 points. What would the new OLS results be? The new R-squared?
(h) Consider again the results from Part (f). Suppose that study time is recorded in minutes instead of hours.
What would the new OLS results be? The new R-squared?
NUM_KID-hat = 5.1 – 0.2*EDU_YRS
r = -0.3132
NUM_KID = 5.1 - 0.2*EDU_YRS
n = 2,589
NUM_KID,
NUM_KID_hat
(3) An economist studies the relationship between a woman’s education
(EDU_YRS) and the number of children that she has (NUM_KID). A
random sample of 2,589 Canadian women is drawn. Using the sample of
2,589 Canadian women the following OLS line and coefficient of
correlation are estimated:
10
5
0
(a) What kind of data are these?
0
5
10 15 20
EDU_YRS
25
30
(b) How should you interpret these results? What is the meaning of 5.1? What is the meaning of -0.2? What is
the meaning of -0.3132?
(c) Is the scatter diagram a good description of these data? Explain. Aside from using statistics such as the
coefficient of correlation and least squares line, what is another way to summarize the relationship between
these two variables?
(4) Consider these data: http://www.ers.usda.gov/media/521667/corndatatable.htm.
(a) How many observations are in these data? How many variables? Explain. [Answer with 1 – 2 sentences]
(b) What kind of data are these? (observational, experimental, time series, cross-sectional, panel) Explain.
[Answer with 2 – 3 sentences]
For the next parts consider a regression analysis where production in millions of bushels is the dependent
variable and the weighted-average farm price in dollars per bushel is the independent variable.
(c) Presuming for the moment that the underlying conditions are met (if you are not sure what these are, review
pages 191 and 198 in our textbook), fully interpret the OLS line, which has an intercept of 1687.7 and a slope of
2141.2. Include in your interpretation a clear statement about whether or not the OLS line is the supply curve.
(HINT (that usually would not be given on a test): Make sure you offer a full interpretation, which means it must
Page 3 of 9
address both the intercept and slope, be context-specific, specify the units of measurement, and be clear about
causality.) [Answer with 2 – 3 sentences]
(e) Based on this scatter diagram, are there any concerns about
the underlying conditions being met? Explain. [Answer with 2 – 3
sentences]
Quantity (millions bushels)
(d) Presuming for the moment that the underlying conditions are met, fully interpret the R2, which is 0.5848.
[Answer with 1 sentence]
1926 - 2012
Y-hat = 1687.7 + 2141.2X
20000
15000
10000
5000
0
0
2
4
6
Price (w.a.$/bushel)
8
(5) Consider the following excerpt from Daniel Kahneman’s biography from the Nobel prize website.
I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is
more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of
the most seasoned instructors in the audience raised his hand and made his own short speech, which began by
conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight
cadets. He said, “On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver,
and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad
execution, and in general they do better the next time. So please don't tell us that reinforcement works and
punishment does not, because the opposite is the case.” This was a joyous moment, in which I understood an
important truth about the world: because we tend to reward others when they do well and punish them when they
do badly, and because there is regression to the mean, it is part of the human condition that we are statistically
punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which
each participant tossed two coins at a target behind his back, without any feedback. We measured the distances
from the target and could see that those who had done best the first time had mostly deteriorated on their second
try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse
contingency. http://www.nobelprize.org/nobel_prizes/economic-sciences/laureates/2002/kahneman-bio.html
Consider Prof. Murdock’s general explanation of regression to the mean:
Suppose observed performance is a function of some non-random factors (such as skill, effort, etc.) and a random
component (pure chance). High draws of the random component boost performance and low draws detract from
performance. However, if you get lucky with a good draw of the random component you should not expect to be
lucky next time. You should expect an average draw, which if you had been lucky is worse than your lucky draw.
Similarly, if you had a poor draw of the random component you should not expect to be unlucky next time: you
should expect an average draw, which, of course, is better than poor. Hence there is regression towards the mean:
people who got lucky tend to move down towards the mean and people who got unlucky tend to move up towards
the mean. The more that performance is driven by random factors the bigger the regression to the mean effect will
be. In the extreme case of Kahneman’s example of tossing a coin behind your back two times not even knowing
where the target is there is no skill and all chance involved: this was a great way to illustrate regression to the mean
because it is extremely powerful in this case. On the other hand, if there is no random component to performance
then there will be no regression to the mean.
(a) In what way was the seasoned flight instructor mistaken? (Is it that his recollection is faulty? Is it that his
inference based on his observations is faulty? Is it both? Is it something else entirely?)
Page 4 of 9
(b) Is it important for Kahneman’s argument that some portion of a flight cadet’s performance is entirely
random and beyond his/her control?
(c) What does the last line of Kahneman’s speech mean for you? (i.e. Is he optimistic that we will be able to take
the message of the simulation we just did to heart?)
(6) Make sure to do Exercises 1 – 6 on pages 12 – 16 of “Logarithms in Regression Analysis with Asiaphoria”
(http://homes.chass.utoronto.ca/~murdockj/eco220/logarithms_in_regression_analysis_with_Asiaphoria.pdf).
(7) Recall the Fortune 500 companies example from Lecture 5. Here are two graphs from that lecture:
6
Ln Revenues ($b)
Ln Revenues ($b)
Fortune 500 Companies, 2013
Y-hat = 1.4 + 0.4X, R2 = 0.42
5
4
3
2
1
-2
0
2
4
Ln Assets ($b)
6
8
6
Fortune 500 Companies, FAKE DATA
Y-hat = 1.4 + 0.4X, R2 = 0.42
4
2
0
-2
0
2
4
Ln Assets ($b)
6
8
What do these two additional graphs show? (Each lines up with the corresponding graph above it.)
3
2
1
residual
residual
2
1
0
0
-1
-1
-2
-2
1
2
3
Y-hat
4
5
0
1
2
Y-hat
3
4
5
Mobile-cellular subscriptions
per 100 inhabitants
(8) Consider again the data in Section 3 and Figure 1 in “Logarithms in Regression Analysis.” Below is a graph of these
data including the R2, SST, SSR, and SSE. This first graph uses the entire data set (entire cross-section of 156 countries).
250
200
150
100
50
0
2012, n = 156 countries, R2 = 0.43
SST = 267006, SSR = 114678,
SSE = 152329
.2
.4
.6
HDI
.8
1
The next two graphs use only the 34 member nations of the OECD. The last graph excludes three OECD members that
are a bit unusual: Turkey and Mexico, which have rather low HDI’s, and Canada with has the lowest mobile-cellular
subscriptions per 100 inhabitants of all OECD members. Finally, the raw data for the OECD member nations are
reproduced below. These data are observational, cross-sectional data.
Page 5 of 9
.7
.75
country_name
Australia
Austria
Belgium
Canada
Chile
Czech Republic
Denmark
Estonia
Finland
France
Germany
Greece
Hungary
Iceland
Ireland
Israel
Italy
Japan
Korea (Republic of)
Luxembourg
Mexico
Netherlands
New Zealand
Norway
Poland
Portugal
Slovakia
Slovenia
Spain
Sweden
Switzerland
Turkey
United Kingdom
United States
.8
.85
HDI
.9
country_code
AUS
AUT
BEL
CAN
CHL
CZE
DNK
EST
FIN
FRA
DEU
GRC
HUN
ISL
IRL
ISR
ITA
JPN
KOR
LUX
MEX
NLD
NZL
NOR
POL
PRT
SVK
SVN
ESP
SWE
CHE
TUR
GBR
USA
.95
Mobile-cellular subscriptions
per 100 inhabitants
Mobile-cellular subscriptions
per 100 inhabitants
180
160
140
120
100
80
2012, n = 34 countries, R2 = 0.00
SST = 14981, SSR = 45,
SSE = 14936
180
2012, n = 31 countries, R2 = 0.08
SST = 10738, SSR = 831,
SSE = 9907
160
140
120
100
hdi_2012
0.938
0.895
0.897
0.911
0.819
0.873
0.901
0.846
0.892
0.893
0.92
0.86
0.831
0.906
0.916
0.9
0.881
0.912
0.909
0.875
0.775
0.921
0.919
0.955
0.821
0.816
0.84
0.892
0.885
0.916
0.913
0.722
0.875
0.937
.8
.85
HDI
.9
.95
mobile_tel_subs_per_100_2012
106.2
161.2
119.4
75.7
138.5
122.8
118
154.5
172.5
98.1
131.3
116.9
116.4
105.4
107.1
119.9
159.5
109.4
110.4
145.5
86.8
117.5
110.3
115.5
132.7
115.1
111.2
110.1
108.3
122.6
135.3
90.8
130.8
98.2
(a) Compare and contrast the R2 across the three scatter diagrams. What explains any differences?
(b) Compare and contrast the SST, SSR, and SSE across the three graphs. What explains any differences?
Page 6 of 9
(9) [Note: This is an important question that reviews many important concepts about descriptive statistics that we have
covered so far. Next week we shift towards probability theory.] Each year the U.S. Department of Energy releases a fuel
economy guide to inform consumers about the fuel economy and greenhouse gas emissions associated with vehicles
(cars, vans, etc.) released that year. Further, on the website (www.fueleconomy.gov), they release detailed data for each
make and model of vehicle each year. Consider the most recent data on 1,250 makes and models in 2015 (e.g. Ford
Focus with automatic transmission, Honda Civic with manual transmission). These data include: the fuel economy (FE) in
city driving in miles per gallon (MPG), the FE in highway driving in MPG, and a green house gas (GHG) emissions rating
on a scale from 1 to 10 where 1 is the worst and 10 is the best.
(a) What kind of data are these? How many observations? How many variables?
(b) Consider this tabulation of the GHG rating and the fact that the sample mean is 5.3 and the s.d. is 1.6. What
is the approximate shape of the distribution? Suppose a vehicle had a GHG rating of 3: what would the
standardized rating be? How would you interpret it?
GHG Rating |
Freq.
Percent
Cum.
------------+----------------------------------1 |
27
2.16
2.16
2 |
23
1.84
4.00
3 |
83
6.64
10.64
4 |
244
19.52
30.16
5 |
359
28.72
58.88
6 |
206
16.48
75.36
7 |
199
15.92
91.28
8 |
86
6.88
98.16
9 |
18
1.44
99.60
10 |
5
0.40
100.00
------------+----------------------------------Total |
1,250
100.00
(c) Consider this variance-covariance matrix for city and highway fuel economy. Fully interpret all numbers
(which will include computing other numbers using these).
| city_fe
hwy_fe
-------------+-----------------city_fe | 30.4814
hwy_fe | 31.1006 38.2749
(d) Consider these OLS regression results. Fully interpret all numbers.
city_fe-hat = -2.53 + 0.81*hwy_fe; R-squared = 0.83; n = 1,250
(e) Consider these OLS regression results (noticing the natural log transformation). Fully interpret all numbers.
ghg_rating-hat = -12.59 + 6.02*ln(city_fe); R-squared = 0.94; n = 1,250
(10) An interesting use of regression analysis in descriptive statistics is The Economist’s calculation of the adjusted Big
Mac Index to compare currencies: http://www.economist.com/content/big-mac-index. Visit the site, read the
description and make sure you understand how the adjusted index is computed. (Of course, you can start with the raw
index, which does not involve any regression.) Explore the interactive site and pick a country (any country except for the
U.S.) to investigate in detail.
Extra Problems (for those who want more practice):
(11) Suppose that the coefficient of correlation between x and y is 0.5 and the s.d. of x is $10 and the s.d. of y is $10. A
move from x = $10 to x = $12 tends to be associated with how much of a change in y?
Page 7 of 9
(12) Consider this figure taken from the journal article: Kraay, Aart, and David McKenzie. 2014. "Do Poverty Traps Exist?
Assessing the Evidence." Journal of Economic Perspectives, 28(3): 127-48. It is available at
http://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.28.3.127.
(a) Are the raw data underlying this graph time series, panel, or cross-sectional? Why?
(b) Is the line in the graph the OLS line? How do you know?
(c) Why have the authors taken the logarithm of both the y and x variables? What is the meaning of the scale of
the axes? Why are 5 and 10 as far apart as 24 and 50?
(d) The title of this figure is “Absolute Income Stagnation is Rare.” Explain how the figure is consistent with its
title.
(13) Consider the graph below showing real data on students’ grades and the number of classes that they missed. The
title shows the OLS line. Suppose the professor posted this graph on the first day of class and told students “BEWARE!!
For every class that you miss you should expect that on average your grade will decrease by 2.9 percentage points.
COME TO CLASS AT ALL COSTS!!” Explain whether this professor is behaving ethically (with respect to statistics).
Page 8 of 9
Y-hat = 66.5 + -2.9X
Grade
100
80
60
40
20
0
5
# of classes missed
10
(14) What does the derivation below show?
n
R2 
SSR

SST
 ( yˆ i  y ) 2
i 1
SST
n

n

 ( y  bx  bxi  y ) 2
i 1
SST
n
 b2
 (x
i
 x)2
(y
i
 y)2
i 1
n
i 1
 s xy
  2
 sx
 (a  bx
i 1
SST

 (bxi  bx ) 2
i 1
SST
n

 (x
i
 x ) 2 (n  1)
(y
i
 y ) 2 (n  1)
i 1
n
i 1
2
 s
 s x2 s xy 
 2  2 2   xy
s x s y  s x s y
 sy
2
 y) 2
n
n
 b2
i




b 2  ( xi  x ) 2
i 1
SST
2
Page 9 of 9