Business Analytics II

Business Analytics II - Winter 2016

Managerial Economics &
Decision Sciences Department
Developed for
business analytics II
▌panel data models
cross-section and panel data 
fixed effects 
omitted variable bias 
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II
session nine
learning objectives
► statistics & econometrics
  cross section and panel data
    ▪ working with data across years
    ▪ regression for panel data
  fixed effects
    ▪ definition
    ▪ use of fixed effects to eliminate ovb
► STATA
    ▪ fixed effects regression: xi:regress
readings
► (MSN)
    ▪ Chapter 8
► (KTN)
    ▪ Fixed Effects
► (CS)
    ▪ Bonus Data
cross section and panel data
► So far we have looked at data sets without taking into account that observations might be recorded at different points in time.
► Suppose that you work in the central office of a global sales organization.
▪ The central office sets base pay across the organization
▪ Regional managers set bonuses for their regional salespeople; the bonus is a percentage of sales, and the percentage is set at the start of the year
▪ You want to know if higher bonuses translate into greater sales effort
▪ You have the following data from four sales offices
Figure 1. Sales and related bonuses for offices across the world
region     year    bonus    sales
Atlanta    2010    10       56
Beijing    2010    20       50
Cairo      2010    30       44
Delhi      2010    40       40
Atlanta    2011    16       60
Beijing    2011    18       49
Cairo      2011    40       50
Delhi      2011    50       47
► Looking at the data for 2010 only, or for 2011 only, we are looking at the data with cross-section “glasses”.
► However, we can also “follow” the information about one particular observation across time: this is the panel-data interpretation.
► We usually write the regression as
cross-section:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$
where
i indexes the individuals from 1 to n (thus we have a total of n individuals)
k is the number of independent variables
▪ for example, $x_{1i}$ is the value of independent variable $x_1$ for the i-th individual
► In this formulation we do not take into account the possible time index for each observation.
► If we take time into account we write:
panel data:
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \varepsilon_{it}$$
where
i indexes the individuals from 1 to n (thus we have a total of n individuals)
t indexes time from 1 to T (thus there are T periods)
k is the number of independent variables
▪ for example, $x_{1it}$ is the value of independent variable $x_1$ for the i-th individual in period t.
► For the cross-section regression we can run two types of regressions:
▪ by period, thus we will run T regressions, one for each period
▪ pooled for all periods, thus we simply ignore the time index and pool all observations
► Let’s consider again the data on bonuses and run the cross section regressions.
Figure 2. Sales and related bonuses for offices across the world
region     year    bonus    sales
Atlanta    2010    10       56
Beijing    2010    20       50
Cairo      2010    30       44
Delhi      2010    40       40
Atlanta    2011    16       60
Beijing    2011    18       49
Cairo      2011    40       50
Delhi      2011    50       47
► Separate regression for each year:
regress sales bonus if year == 2010
regress sales bonus if year == 2011
► Pooled regression:
regress sales bonus
► Results for the three regressions are presented below.
Figure 3. Results for the three regressions
model             constant    coefficient on bonus    R2
cross for 2010    61.00       -0.54                   0.99
cross for 2011    58.69       -0.23                   0.44
pooled            57.77       -0.29                   0.44
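For readers who want to check the three regressions outside STATA, the following is a minimal sketch in plain Python. It uses the textbook closed-form formulas for a one-regressor least squares fit; the helper names `simple_ols` and `fit` are illustrative choices made here, not part of the course material.

```python
# Sales/bonus data from the slides: (region, year, bonus, sales)
data = [
    ("Atlanta", 2010, 10, 56), ("Beijing", 2010, 20, 50),
    ("Cairo",   2010, 30, 44), ("Delhi",   2010, 40, 40),
    ("Atlanta", 2011, 16, 60), ("Beijing", 2011, 18, 49),
    ("Cairo",   2011, 40, 50), ("Delhi",   2011, 50, 47),
]

def simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the fit y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

def fit(rows):
    """Regress sales (column 3) on bonus (column 2) for the given rows."""
    return simple_ols([r[2] for r in rows], [r[3] for r in rows])

results = {
    "cross for 2010": fit([r for r in data if r[1] == 2010]),
    "cross for 2011": fit([r for r in data if r[1] == 2011]),
    "pooled": fit(data),  # ignores the time index entirely
}

for model, (constant, slope, r2) in results.items():
    print(f"{model:15s} constant={constant:7.2f}  bonus={slope:6.2f}  R2={r2:5.3f}")
```

All three slopes come out negative, which is the puzzle the next slides address.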
► The results of our regressions fly in the face of economic theory: higher bonus percentages should lead to higher effort, but
▪ it seems that higher bonuses really cause lower effort…
▪ are salespeople behaving irrationally?
▪ are the regression results biased?
► A possible solution to our problem is to add additional controls, i.e. we suspect an omitted variable bias.
► As we have seen several times so far, in the case of omitted variable bias we look for a candidate variable, call it z, that is currently omitted from the regression but that is:
▪ correlated with bonus (x)
▪ causal to sales (y)
► We then infer qualitatively whether the coefficient on x is biased and in which direction. But when can we be sure that we have identified all the candidates?
full regression (direct channel from bonus to sales, causal channel from z to sales):
$$\text{sales} = b_0 + b_1 \cdot \text{bonus} + b_2 \cdot z$$
correlation channel (between bonus and z):
$$z = a_0 + a_1 \cdot \text{bonus}$$
truncated regression (z is omitted, so bonus also carries the indirect channel through z):
$$\text{sales} = b_0 + b_1 \cdot \text{bonus}$$
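A short worked substitution may make the indirect channel concrete. As a sketch, write the coefficients of the truncated regression with stars, $b_0^*$ and $b_1^*$ (the stars are notation assumed here, to keep them apart from the coefficients of the full regression), and substitute the correlation equation for z into the full regression:

```latex
\begin{aligned}
\text{sales} &= b_0 + b_1 \cdot \text{bonus} + b_2 \cdot z \\
             &= b_0 + b_1 \cdot \text{bonus} + b_2 \cdot (a_0 + a_1 \cdot \text{bonus}) \\
             &= \underbrace{(b_0 + b_2 a_0)}_{b_0^*}
                + \underbrace{(b_1 + b_2 a_1)}_{b_1^*} \cdot \text{bonus}
\end{aligned}
```

Under these assumptions the truncated coefficient on bonus differs from $b_1$ by $b_2 a_1$. When $b_2$ and $a_1$ have opposite signs the bias is negative, which is one way the estimated coefficient on bonus could turn negative even if $b_1 > 0$.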
fixed effects
► Let’s consider the following situation: there is indeed a variable z that we cannot identify, that is probably correlated with x, and that has a causal impact on y. But we make a very important assumption about the omitted variable:
▪ for each individual the variable z is fixed across time periods; thus instead of $z_{it}$ we write $z_i$
► The correct regression, by individual and time, is thus:
$$y_{it} = b_0 + b_1 x_{it} + b_2 z_i$$
Remark. The index of the omitted variable is only “i”, not “it” as for the other variables. The values of x and y can:
▪ vary for each individual across periods of time (within group variation or within group effect)
▪ vary for each period across individuals (between groups variation or between groups effect)
Given the assumption above, z has only between group variation, i.e. it is fixed across time for each individual. This is the fixed effect framework.
Remark. For our sales/bonus example:
i ∈ {Atlanta, Beijing, Cairo, Delhi} and n = 4 (number of individuals)
t ∈ {2010, 2011} and T = 2 (number of periods)
► Back to the true regression, written by individual and time:
$$y_{it} = b_0 + b_1 x_{it} + b_2 z_i$$
► For each individual, i.e. for each i, let’s add the above equality over all periods (assume there are T periods) and divide by T:
$$\frac{1}{T}\sum_{t=1}^{T} y_{it} = b_0 + b_1 \cdot \frac{1}{T}\sum_{t=1}^{T} x_{it} + b_2 \cdot \frac{1}{T}\sum_{t=1}^{T} z_i$$
► But the complicated expressions are simply the averages across time for each individual, i.e.:
$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad \bar{z}_i = \frac{1}{T}\sum_{t=1}^{T} z_i = z_i$$
► Thus we can write
$$\bar{y}_i = b_0 + b_1 \bar{x}_i + b_2 z_i$$
► Subtract this last equality from the initial regression’s equation for each individual and time to get, for each i and t:
$$y_{it} - \bar{y}_i = b_1 \cdot (x_{it} - \bar{x}_i)$$
► Surprise!!! (and a pleasant one…) By taking this difference we managed to get rid of the omitted variable…
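The differencing argument can be tried directly on the sales/bonus data. The sketch below, in plain Python, subtracts each region's time average from its bonus and sales values and then fits a line through the origin on the deviations; the variable names are illustrative choices, not part of the course material.

```python
from collections import defaultdict

# Sales/bonus data from the slides: (region, year, bonus, sales)
data = [
    ("Atlanta", 2010, 10, 56), ("Beijing", 2010, 20, 50),
    ("Cairo",   2010, 30, 44), ("Delhi",   2010, 40, 40),
    ("Atlanta", 2011, 16, 60), ("Beijing", 2011, 18, 49),
    ("Cairo",   2011, 40, 50), ("Delhi",   2011, 50, 47),
]

# Collect each region's (bonus, sales) pairs so we can average over time.
by_region = defaultdict(list)
for region, year, bonus, sales in data:
    by_region[region].append((bonus, sales))

# Deviations from the region's own time average: x_it - x_bar_i, y_it - y_bar_i
x_dev, y_dev = [], []
for region, observations in by_region.items():
    x_bar = sum(b for b, _ in observations) / len(observations)
    y_bar = sum(s for _, s in observations) / len(observations)
    for bonus, sales in observations:
        x_dev.append(bonus - x_bar)
        y_dev.append(sales - y_bar)

# The differenced equation has no constant, so the least squares slope is
# sum(x_dev * y_dev) / sum(x_dev ** 2).
b1 = sum(x * y for x, y in zip(x_dev, y_dev)) / sum(x * x for x in x_dev)
print(f"within-region estimate of b1: {b1:.2f}")
```

Once each region is compared only with itself over time, the estimated slope on bonus is positive, in contrast with the negative cross-section and pooled slopes.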
► Let’s write the equation as:
$$y_{it} = b_1 x_{it} + \underbrace{[\bar{y}_i - b_1 \bar{x}_i]}_{\alpha_i}$$
► Notice that the last term is specific to each individual, i.e. it is indexed only by “i”. This has the flavor of a dummy variable framework. Let
$$\alpha_i = \bar{y}_i - b_1 \bar{x}_i$$
and since, as mentioned above, this variable is specific to each individual, we can write it as a sum of dummy variables:
$$\alpha_i = a_0 + a_1 d_1 + \dots + a_{n-1} d_{n-1}$$
where $d_1 = 1$ if $i = 1$ and 0 otherwise, $d_2 = 1$ if $i = 2$ and 0 otherwise, …, $d_{n-1} = 1$ if $i = n - 1$ and 0 otherwise.
► Basically we write:
$$\begin{array}{ll}
\alpha_1 = a_0 + a_1 & (d_1 = 1,\ d_2 = 0,\ \dots,\ d_{n-1} = 0) \\
\alpha_2 = a_0 + a_2 & (d_1 = 0,\ d_2 = 1,\ \dots,\ d_{n-1} = 0) \\
\dots & \dots \\
\alpha_{n-1} = a_0 + a_{n-1} & (d_1 = 0,\ d_2 = 0,\ \dots,\ d_{n-1} = 1) \\
\alpha_n = a_0 & (d_1 = 0,\ d_2 = 0,\ \dots,\ d_{n-1} = 0)
\end{array}$$
► We get a very useful result: in order to eliminate the omitted variable bias we simply run the regression
$$y_{it} = a_0 + a_1 d_1 + \dots + a_{n-1} d_{n-1} + b_1 x_{it}$$
► The steps to construct the above regression are:
Step 1:
Construct n – 1 dummy variables (where n is the number of different individuals) using the rule:
$$\begin{array}{ll}
d_1 = 1,\ d_2 = 0,\ \dots,\ d_{n-1} = 0 & \text{for individual } 1 \\
d_1 = 0,\ d_2 = 1,\ \dots,\ d_{n-1} = 0 & \text{for individual } 2 \\
\dots & \\
d_1 = 0,\ d_2 = 0,\ \dots,\ d_{n-1} = 1 & \text{for individual } n - 1 \\
d_1 = 0,\ d_2 = 0,\ \dots,\ d_{n-1} = 0 & \text{for individual } n
\end{array}$$
Step 2:
Run the regression above on the n – 1 dummy variables and the x variable(s)
Step 3:
Interpret the coefficients; this follows directly from the part in which we studied dummy variables:
$a_0$ is the average y for the excluded individual when $x_{it} = 0$
$a_1$ is the difference in average y between individual 1 and the excluded individual, holding $x_{it}$ constant
…
$a_{n-1}$ is the difference in average y between individual n – 1 and the excluded individual, holding $x_{it}$ constant
$b_1$ is the change in average y when x changes by one unit, for a given individual
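The three steps can be sketched in plain Python on the sales/bonus data. Two assumptions are made here for illustration: Atlanta is taken as the excluded individual, and the least squares fit is obtained by solving the normal equations with a small hand-written Gaussian elimination (the `solve` helper is a hypothetical utility, not something from the course).

```python
# Sales/bonus data from the slides: (region, year, bonus, sales)
data = [
    ("Atlanta", 2010, 10, 56), ("Beijing", 2010, 20, 50),
    ("Cairo",   2010, 30, 44), ("Delhi",   2010, 40, 40),
    ("Atlanta", 2011, 16, 60), ("Beijing", 2011, 18, 49),
    ("Cairo",   2011, 40, 50), ("Delhi",   2011, 50, 47),
]
regions = ["Atlanta", "Beijing", "Cairo", "Delhi"]
included = regions[1:]  # assumption: Atlanta is the excluded individual

# Step 1: one row per observation: [constant, d_Beijing, d_Cairo, d_Delhi, bonus]
X = [[1.0] + [1.0 if row[0] == r else 0.0 for r in included] + [float(row[2])]
     for row in data]
y = [float(row[3]) for row in data]

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

# Step 2: least squares via the normal equations (X'X) beta = X'y
k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
a0, a_beijing, a_cairo, a_delhi, b1 = solve(XtX, Xty)

# Step 3: a0 is Atlanta's intercept; the other a's are differences from Atlanta.
print(f"a0={a0:.2f}  Beijing={a_beijing:.2f}  Cairo={a_cairo:.2f}  "
      f"Delhi={a_delhi:.2f}  b1={b1:.2f}")
```

Excluding a different region would change $a_0$ and the dummy coefficients, but not the estimate of $b_1$.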
► Luckily STATA offers a very easy way to generate the regression
$$y_{it} = a_0 + a_1 d_1 + \dots + a_{n-1} d_{n-1} + b_1 x_{it}$$
xi:regress y x i.individual_label
Remark. The individual_label is the label (name) of the variable that identifies individuals.
For our sales/bonus example: individual_label is actually region.
Figure 4. Results for regression by individuals: xi:regress sales bonus i.region
i.region          _Iregion_1-4        (_Iregion_1 for region==Atlanta omitted)
------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bonus |        .65    .0288675    22.52   0.000     .5581307    .7418693
  _Iregion_2 |      -12.4    .3605551   -34.39   0.000    -13.54745   -11.25255
  _Iregion_3 |      -25.3    .7094599   -35.66   0.000    -27.55782   -23.04218
  _Iregion_4 |      -35.3    .9763879   -36.15   0.000     -38.4073    -32.1927
       _cons |      49.55    .4368447   113.43   0.000     48.15977    50.94023
------------------------------------------------------------------------------
► STATA indicates which individual is excluded, thus interpretation of coefficients should be made accordingly
► The coefficient on bonus is positive: an improvement (from an economic point of view)
omitted variable bias
► By controlling for all time-invariant differences in unobservable factors, fixed effects models remove a potential source of ovb.
► If there are some unobservables that vary over time within each group, the fixed effect approach will not remove ovb from those sources
▪ In our example, if you are comfortable assuming that customer characteristics in each region are stable during the time covered by your data, then you can be comfortable that fixed effect models eliminate ovb
▪ If you are not comfortable with this assumption, then fixed effects results can still be biased. Even so, the potential for bias would be even greater if you did not include fixed effects. Put another way, we all intuitively believe that before/after comparisons are more valid than cross-section comparisons. Fixed effects are like before/after comparisons.
► The second limitation of fixed effects models is that we cannot assess the effect of variables that do not vary within
groups over time, e.g. if bonuses did not vary over time, we could not use fixed effects.
► If it is crucial to learn the effect of a variable that lacks within group variation, then we would have to forgo fixed effects estimation. We would have to rely on between group variation and work to minimize ovb.
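The second limitation can be seen mechanically. The sketch below uses a hypothetical variant of the bonus data in which every region keeps its 2010 bonus in 2011 (these 2011 bonus values are invented for illustration and are not the course data). After subtracting each region's time average, nothing is left to estimate a slope from.

```python
from collections import defaultdict

# HYPOTHETICAL data: (region, year, bonus), with bonus unchanged from 2010 to 2011
data = [
    ("Atlanta", 2010, 10), ("Beijing", 2010, 20),
    ("Cairo",   2010, 30), ("Delhi",   2010, 40),
    ("Atlanta", 2011, 10), ("Beijing", 2011, 20),
    ("Cairo",   2011, 30), ("Delhi",   2011, 40),
]

bonus_by_region = defaultdict(list)
for region, year, bonus in data:
    bonus_by_region[region].append(bonus)

# Deviation of each bonus from its region's own time average
x_dev = []
for bonuses in bonus_by_region.values():
    x_bar = sum(bonuses) / len(bonuses)
    x_dev.extend(b - x_bar for b in bonuses)

# The fixed effects slope divides by this sum of squares
within_variation = sum(d * d for d in x_dev)

if within_variation == 0:
    print("bonus has no within-group variation: the fixed effects slope cannot be computed")
else:
    print(f"within-group variation in bonus: {within_variation}")
```

With no within-group variation the bonus variable is a linear combination of the region dummies, so a regression that includes the dummies cannot separate its effect from theirs.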
omitted variable bias: EuroPet S.A.
► linear regression. First a simple regression of Sales on FuelPrice and Radio:
$$E[\text{Sales}] = \beta_0 + \beta_1 \cdot \text{FuelPrice} + \beta_2 \cdot \text{Radio}$$
Figure 5. Results for regression of Sales on FuelPrice and Radio
     Sales |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
 FuelPrice |   1892.702    497.7717     3.80   0.000     904.7625    2880.641
     Radio |   14.64805    39.74652     0.37   0.713     -64.2378     93.5339
     _cons |  -175016.9    56122.55    -3.12   0.002    -286404.6   -63629.17
Figure 6. The rvfplot for regression of Sales on FuelPrice and Radio
Remark. The estimated regression is
$$\text{Est.}E[\text{Sales}] = -175{,}016 + 1{,}892 \cdot \text{FuelPrice} + 14 \cdot \text{Radio}$$
▪ The positive coefficient on FuelPrice is suspicious: the higher the FuelPrice, the higher the (non-fuel related) Sales
▪ The rvfplot indicates possible curvature in the data. The U-shaped rvfplot suggests using a log-linear model:
$$E[\ln(\text{Sales})] = \beta_0 + \beta_1 \cdot \text{FuelPrice} + \beta_2 \cdot \text{Radio}$$
► log-linear regression. We try the log-linear specification:
$$E[\ln(\text{Sales})] = \beta_0 + \beta_1 \cdot \text{FuelPrice} + \beta_2 \cdot \text{Radio}$$
Figure 7. Results for regression of ln(Sales) on FuelPrice and Radio
   lnSales |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
 FuelPrice |   .0515498    .0091854     5.61   0.000     .0333193    .0697803
     Radio |  -.0000846    .0007334    -0.12   0.908    -.0015403     .001371
     _cons |   4.536015    1.035632     4.38   0.000     2.480573    6.591457
Figure 8. The rvfplot for regression of ln(Sales) on FuelPrice and Radio
Remark. The estimated regression is
$$\text{Est.}E[\ln \text{Sales}] = 4.53 + 0.05 \cdot \text{FuelPrice} - 0.00008 \cdot \text{Radio}$$
▪ The positive coefficient on FuelPrice is still suspicious.
▪ The rvfplot indicates that the curvature in the data has been resolved. In addition we can immediately test for heteroskedasticity (we cannot reject constant variance at the 5% level):
Ho: Constant variance
Variables: fitted values of lnsales
chi2(1)      = 2.80
Prob > chi2  = 0.0945
► omitted variable bias. The regression above potentially suffers from omitted variable bias (ovb): stores with higher fuel prices may well be in higher traffic locations and/or face less competition, and such factors would also likely support higher sales at the convenience stores.
► We need to eliminate any ovb coming from characteristics of the location that are constant over time, because we
are trying to estimate what will happen to sales when the Marseille location changes its prices.
► None of the other characteristics of the location are changing, so it is crucial to control for them when estimating
the price effect. Since we have panel data, we can best do this by using a fixed effects model.
► We use the log-linear specification with results:
xi:regress lnSales FuelPrice Radio i.StoreId
Figure 8. Results for fixed effects regression of lnSales on FuelPrice and Radio
i.StoreId         _IStoreId_1-20      (naturally coded; _IStoreId_1 omitted)
note: _IStoreId_13 omitted because of collinearity
   lnsales |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
 FuelPrice |  -.0350037    .0163477    -2.14   0.035    -.0675429   -.0024645
     Radio |  -.0004923    .0014257    -0.35   0.731    -.0033302    .0023455
     _cons |   15.11625    1.897308     7.97   0.000     11.33975    18.89274
Figure 8. The rvfplot for regression of lnSales on FuelPrice and Radio
Remark. The estimated fixed effect regression (for presentation purposes the coefficients on dummy variables are not included) is
$$\text{Est.}E[\ln \text{Sales}] = 15.11 - 0.035 \cdot \text{FuelPrice} - 0.00049 \cdot \text{Radio}$$
▪ The negative coefficient on FuelPrice is in line with expectations.
▪ The rvfplot indicates no curvature in the data. In addition we can immediately test for heteroskedasticity (we cannot reject constant variance at the 5% level).
► confidence interval. To obtain the estimate and the 95% confidence interval for the change in sales corresponding to a 50 cent increase in fuel price, we use the klincom command:
klincom _b[FuelPrice]*0.5
Figure 9. The klincom results
   lnsales |      Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
       (1) |  -.0175019    .0081738    -2.14   0.035    -.0337715   -.0012323
Remark. The estimated change in Sales is about -1.75% and the 95% interval for this change is from -3.37% to -0.12%.
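The klincom line rescales a single coefficient, so its output can be checked by hand. The sketch below, in plain Python, copies the FuelPrice row from the fixed effects output and multiplies the estimate, standard error and interval endpoints by 0.5. It also converts the change in ln(Sales) into an exact percentage change; that conversion step is an addition made here, not something the slides compute.

```python
import math

# FuelPrice row of the fixed effects regression output
coef = -0.0350037
std_err = 0.0163477
ci_low, ci_high = -0.0675429, -0.0024645

scale = 0.5  # the multiplier in: klincom _b[FuelPrice]*0.5

# Multiplying a coefficient by a positive constant rescales the estimate,
# its standard error, and both interval endpoints by that same constant.
estimate = scale * coef
estimate_se = scale * std_err
estimate_ci = (scale * ci_low, scale * ci_high)

def percent_change(delta_log):
    """Exact percentage change in Sales implied by a change in ln(Sales)."""
    return 100.0 * (math.exp(delta_log) - 1.0)

print(f"change in ln(Sales): {estimate:.7f} (se {estimate_se:.7f})")
print(f"95% interval: {estimate_ci[0]:.7f} to {estimate_ci[1]:.7f}")
print(f"exact percentage change: {percent_change(estimate):.2f}%")
```

For changes this small the exact percentage is close to 100 times the change in ln(Sales), which is the approximation the remark uses.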