MI Class 11 slides 2016

Market Intelligence
Class 11
Regression - Basics
• Terminology
– In simple regression with a single variable, we get
a zero-order effect (full effect) because we do not
control for anything else.
– In multiple regression we technically speak of the coefficients as partial effects, because each one is the change in Y from a one-unit change in X, holding everything else in the regression constant
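The difference between a zero-order and a partial effect shows up in a small simulation. This is a sketch on synthetic data; the variables x1, x2, y and their true coefficients are hypothetical. Because x2 moves with x1, the simple regression of y on x1 picks up part of x2's effect, while the multiple regression recovers the partial effect of x1.

```python
import numpy as np

# Synthetic (hypothetical) data: x2 is correlated with x1, and both affect y.
rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # corr(x1, x2) is about 0.8
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)


def ols(y, *predictors):
    """OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


zero_order = ols(y, x1)[1]   # simple regression: nothing else is controlled
partial = ols(y, x1, x2)[1]  # multiple regression: x2 is held constant
print(f"zero-order effect of x1: {zero_order:.2f}")  # near 2 + 3 * 0.8 = 4.4
print(f"partial effect of x1:    {partial:.2f}")     # near the true 2.0
```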
Regression – Homework assignment
Regression for promotion analysis
• Goal is to uncover the partial effect of a marketing
action on customer response
– Example: look at effect of coupons on sales, controlling for
other variables
• Run multiple regression with outcome variable (often
sales) as dependent variable and marketing actions
(e.g., 4 P’s) as predictors (independent variables)
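A minimal sketch of such a regression in Python with NumPy, assuming hypothetical weekly data with a coupon indicator, shelf price and advertising spend as predictors (names and true effects are invented for illustration). The helper reproduces the B, Std. Error and t columns of a standard coefficients printout.

```python
import numpy as np


def coefficient_table(y, X, names):
    """OLS with the B, Std. Error and t columns of a regression printout.

    X holds the predictors only; a constant column is added here.
    """
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    k = Xc.shape[1]                              # parameters, including the constant
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ b
    sigma2 = resid @ resid / (n - k)             # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    return {name: (bj, sj, bj / sj) for name, bj, sj in zip(["(Constant)", *names], b, se)}


# Hypothetical weekly promotion data; names and true effects are made up.
rng = np.random.default_rng(42)
n = 2_000
coupon = rng.binomial(1, 0.3, n)                 # coupon drop that week (0/1)
price = rng.uniform(2.0, 4.0, n)                 # shelf price
advertising = rng.normal(50, 10, n)              # ad spend
sales = 500 + 120 * coupon - 80 * price + 2 * advertising + rng.normal(0, 20, n)

X = np.column_stack([coupon, price, advertising])
for name, (bj, sj, t) in coefficient_table(sales, X, ["Coupon", "Price", "Advertising"]).items():
    print(f"{name:<12} B = {bj:8.2f}  Std. Error = {sj:6.2f}  t = {t:7.2f}")
```

Each B is a partial effect: for example, the Coupon coefficient estimates the sales change from a coupon week with price and advertising held constant.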
Multicollinearity in Multiple Regression
• Multicollinearity: when 2 or more of your
predictor variables are highly correlated with
each other
• Marketing actions often covary:
– Location in store
– Advertising
– Coupons
– Price reduction
Detecting MC
• Correlations between predictors greater than .5 (in absolute value)
• Variance inflation factor (VIF) > 5
Other indicators:
• Large changes in the other regression coefficients
when a predictor is added or deleted
• A predictor's coefficient is not significant even though that predictor is highly correlated with the DV
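The first rule of thumb can be automated as a screen over the predictor correlation matrix. A sketch, assuming the predictors are the columns of a NumPy array; the data below are hypothetical, with coupons and price cuts deliberately run together.

```python
import numpy as np


def flag_correlated_pairs(X, names, threshold=0.5):
    """List predictor pairs whose absolute Pearson correlation exceeds `threshold`."""
    r = np.corrcoef(X, rowvar=False)             # columns of X are the predictors
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], round(float(r[i, j]), 3)))
    return flagged


# Hypothetical weekly data in which coupons and price cuts are run together.
rng = np.random.default_rng(7)
promo = rng.normal(size=500)
coupons = promo + 0.3 * rng.normal(size=500)
price_cut = promo + 0.3 * rng.normal(size=500)
advertising = rng.normal(size=500)

X = np.column_stack([coupons, price_cut, advertising])
print(flag_correlated_pairs(X, ["coupons", "price_cut", "advertising"]))
```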
Diagnostics
• Tolerance = $1 - R^2_{X_1 \mid X_2}$
• VIF (Variance Inflation Factor) = 1/Tolerance
• Rule of thumb: MC problem if VIF > 5

Coefficients(a)
                                   Unstandardized          Standardized                   Collinearity Statistics
Model  Predictor                   B         Std. Error    Beta       t         Sig.      Tolerance   VIF
1      (Constant)                  40.680    1.154                    35.266    .000
       H=Any Children Under 6?     7.278     1.923         .167       3.786     .000      1.000       1.000
2      (Constant)                  25.288    1.841                    13.734    .000
       H=Any Children Under 6?     -4.875    2.119         -.112      -2.301    .022      .683        1.465
       C=Household Size            5.455     .536          .496       10.179    .000      .683        1.465
a. Dependent Variable: B=Weekly Food Expenditure Dollars
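Tolerance and VIF can be computed directly from auxiliary regressions, one per predictor, following the definitions above. A NumPy sketch (the function name is ours, not from any library). With only two predictors, as in Model 2 here, the $R^2$ of one on the other is just their squared correlation, so the printout can also be read backwards: tolerance .683 implies $R^2 = .317$ and a correlation of about .56 (in absolute value) between having children under 6 and household size.

```python
import numpy as np


def tolerance_and_vif(X):
    """For each column j: Tolerance = 1 - R^2 from regressing X_j on the other
    columns (with a constant), and VIF = 1 / Tolerance."""
    n, p = X.shape
    results = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        results.append((1 - r2, 1 / (1 - r2)))
    return results


# Reading the printout backwards: Tolerance = .683 for both predictors in Model 2.
tol = 0.683
print(round(1 / tol, 3))            # 1.464 (printout shows 1.465, from the unrounded tolerance)
print(round((1 - tol) ** 0.5, 3))   # |r| between the two predictors, about 0.563
```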
Options for handling MC
• Obtain more data
– More data can produce more precise parameter
estimates (with lower standard errors)
• Leave all predictors in model
– Standard errors stay inflated, so individual coefficients may lose significance
• Drop a predictor
– Standard errors shrink and the remaining coefficients become more significant, but they will include the effect of the omitted variable
• Aggregate predictors
– Create index/composite/average
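A standard OLS result makes these options concrete. For predictor $X_j$, with error variance $\sigma^2$, sample size $n$, sample variance $s^2_{X_j}$, and $R^2_{X_j \mid \text{other } X}$ from regressing $X_j$ on the other predictors:

$$\operatorname{Var}(\hat b_j) = \frac{\sigma^2}{(n-1)\,s^2_{X_j}\left(1 - R^2_{X_j \mid \text{other } X}\right)} = \frac{\sigma^2}{(n-1)\,s^2_{X_j}} \times \mathrm{VIF}_j$$

More data raises $n$ and shrinks the variance; dropping or averaging correlated predictors typically lowers $R^2_{X_j \mid \text{other } X}$, and hence the VIF, for what remains; leaving everything in keeps the VIF penalty. Dropping a variable can also change $\sigma^2$ and, as noted above, bias the remaining coefficients.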
Doritos Example
• Identification of Promotion Effects
– Effect of Price Promotions on Sales of XL size
• IRI Dataset (Market Level, Weekly Data)
• Sales Models for XL Size
– Effects of own price (own-price effect) and prices of the other sizes (cross-price effects) on sales of the XL size
– Multicollinearity exists: we’ll look at options
Promotion Example: Symphony-IRI data of Doritos Weekly Sales
• Sizes: SM, XL, 2XL, 3XL
Correlations (Pearson r, with Sig. (2-tailed) in parentheses; N = 104 for every pair)
SM, XL, 2XL, 3XL = Average Price Per Pound of the Small, XL, 2XL and 3XL sizes; XL Sales = Lbs Extra Large Size 9 Oz $2.19

             SM              XL              2XL             3XL             XL Sales
SM           1.000           .018 (.854)     .449** (.000)   .502** (.000)   .085 (.391)
XL           .018 (.854)     1.000           .091 (.356)     .120 (.224)     -.801** (.000)
2XL          .449** (.000)   .091 (.356)     1.000           .950** (.000)   .107 (.279)
3XL          .502** (.000)   .120 (.224)     .950** (.000)   1.000           .067 (.500)
XL Sales     .085 (.391)     -.801** (.000)  .107 (.279)     .067 (.500)     1.000
**. Correlation is significant at the 0.01 level (2-tailed).
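Because $\mathrm{VIF}_j = 1/(1 - R^2_j)$, the VIFs can also be read off the diagonal of the inverse of the predictor correlation matrix, which lets us check the Doritos predictors from the correlation table alone. A sketch using the rounded correlations from the table, so the values are approximate:

```python
import numpy as np

names = ["SM", "XL", "2XL", "3XL"]
# Predictor correlations copied from the Correlations table (N = 104 weeks).
R = np.array([
    [1.000, 0.018, 0.449, 0.502],
    [0.018, 1.000, 0.091, 0.120],
    [0.449, 0.091, 1.000, 0.950],
    [0.502, 0.120, 0.950, 1.000],
])
# Standard identity: the diagonal of the inverse predictor correlation matrix holds the VIFs.
vif = dict(zip(names, np.diag(np.linalg.inv(R))))
for name, v in vif.items():
    print(f"{name:>3}: VIF = {v:5.2f}{'  <-- above 5' if v > 5 else ''}")
```

With the 2XL and 3XL prices correlated at .950, both of their VIFs exceed 10, well past the VIF > 5 rule of thumb, while SM and XL stay well below 5.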
Doritos Regression Equation
• Dependent Variable = Sales (lbs.) of XL size
• Model 1: Overloaded
– Independent Variables: Price XL, Price SM, Price
2XL, Price 3XL
$\text{Sales}_{XL,t} = a + b_1 P_{SM,t} + b_2 P_{XL,t} + b_3 P_{2XL,t} + b_4 P_{3XL,t}$
Which variable to drop?
• Which is less correlated with DV?
• Which is more correlated with remaining
predictors?
• Which is the less important variable to you?
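For the Doritos data, the first two questions can be answered mechanically from the correlation table for the two collinear candidates, 2XL and 3XL. A sketch; summarizing "correlation with the remaining predictors" as the average absolute correlation with SM and XL is our own simplification, and the third question is a managerial judgment no code answers.

```python
# Correlations copied from the Correlations table (N = 104).
corr_with_dv = {"2XL": 0.107, "3XL": 0.067}          # with XL sales (lbs)
corr_with_remaining = {                               # with the predictors that stay in the model
    "2XL": {"SM": 0.449, "XL": 0.091},
    "3XL": {"SM": 0.502, "XL": 0.120},
}


def candidate_to_drop(corr_with_dv, corr_with_remaining):
    """Return (less correlated with DV, more correlated with remaining predictors)."""
    weaker_dv = min(corr_with_dv, key=lambda v: abs(corr_with_dv[v]))
    mean_r = {v: sum(abs(r) for r in rs.values()) / len(rs)
              for v, rs in corr_with_remaining.items()}
    more_redundant = max(mean_r, key=mean_r.get)
    return weaker_dv, more_redundant


print(candidate_to_drop(corr_with_dv, corr_with_remaining))
```

Both criteria point to 3XL (.067 vs. .107 with XL sales; average .311 vs. .270 with SM and XL), the variable left out of Model 2.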
Doritos Regression Equation
• Dependent Variable = Sales (lbs.) of XL size
• Model 2: Omitted variable
– Independent variables: Price XL, Price SM, Price
2XL (note: NO 3XL)
$\text{Sales}_{XL,t} = a + b_1 P_{SM,t} + b_2 P_{XL,t} + b_3 P_{2XL,t}$
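Why the kept coefficients "include the effect of the omitted variable" follows from an exact OLS identity: each coefficient in the shorter model equals the full-model coefficient plus the dropped variable's coefficient times the slope from regressing the dropped variable on the kept ones. A simulation sketch with hypothetical prices (p_xl, p_2xl, p_3xl) and made-up effects, where the 3XL price tracks the 2XL price closely:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 104                                              # weeks, as in the Doritos data
p_xl = rng.uniform(2.0, 3.0, n)                      # hypothetical own price
p_2xl = rng.uniform(2.0, 3.0, n)
p_3xl = 0.9 * p_2xl + rng.normal(0, 0.05, n)         # moves almost in lockstep with 2XL
sales = 8000 - 1900 * p_xl + 1500 * p_2xl - 1000 * p_3xl + rng.normal(0, 150, n)


def ols(y, *cols):
    """OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]


full = ols(sales, p_xl, p_2xl, p_3xl)   # "overloaded": [a, b_xl, b_2xl, b_3xl]
reduced = ols(sales, p_xl, p_2xl)       # "omitted variable": [a, b_xl, b_2xl]
delta = ols(p_3xl, p_xl, p_2xl)         # auxiliary regression of dropped price on kept ones

# Omitted-variable identity: reduced = full (kept terms) + b_3xl * delta, exactly, in-sample.
print(np.allclose(reduced, full[:3] + full[3] * delta))
```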
MODEL 1: XL SALES
DORITOS - PREDICTING XL SALES FROM PRICE OF SM, XL, 2XL, 3XL
Doritos - Only Own XL Price Coefficient is Significant

Coefficients(a)
                                      Unstandardized            Standardized
Predictor                             B           Std. Error    Beta       t          Sig.
(Constant)                            316.844     2445.178                 .130       .897
Average Price Per Pound Small Size    253.361     515.280       .033       .492       .624
Average Price Per Pound XL Size       -1915.729   136.477       -.813      -14.037    .000
Average Price Per Pound 2XL Size      3590.806    2495.219      .267       1.439      .153
Average Price Per Pound 3XL Size      -1413.885   2574.564      -.106      -.549      .584
a. Dependent Variable: Lbs Extra Large Size 9 Oz $2.19
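The t and Sig. columns follow from B and Std. Error: t = B / SE, and Sig. is the two-tailed p-value of a t distribution with n - k - 1 degrees of freedom. A sketch with SciPy, assuming the model was fit on all 104 weeks shown in the correlation table (so df = 104 - 4 - 1 = 99):

```python
from scipy import stats

# (B, Std. Error) copied from the Model 1 coefficient table.
model1 = {
    "(Constant)": (316.844, 2445.178),
    "Price SM": (253.361, 515.280),
    "Price XL": (-1915.729, 136.477),
    "Price 2XL": (3590.806, 2495.219),
    "Price 3XL": (-1413.885, 2574.564),
}
n, k = 104, 4        # assumed: all 104 weeks used, 4 predictors
df = n - k - 1       # residual degrees of freedom = 99

results = {}
for name, (b, se) in model1.items():
    t = b / se
    p = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value, the "Sig." column
    results[name] = (t, p)
    print(f"{name:<11} t = {t:8.3f}  Sig. = {p:.3f}")
```

Only the own-price (XL) coefficient falls below .05, matching the slide title.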
MODEL 2: XL SALES
DORITOS - PREDICTING XL SALES FROM PRICE OF SM, XL, 2XL (DROP 3XL)
Doritos - Own XL and 2XL Price Coefficients Significant

Coefficients(a)
                                      Unstandardized            Standardized
Predictor                             B           Std. Error    Beta       t          Sig.
(Constant)                            625.958     2371.187                 .264       .792
Average Price Per Pound Small Size    175.976     493.905       .023       .356       .722
Average Price Per Pound XL Size       -1924.665   135.029       -.817      -14.254    .000
Average Price Per Pound 2XL Size      2305.377    861.524       .172       2.676      .009
a. Dependent Variable: Lbs Extra Large Size 9 Oz $2.19
Model 1 (Overloaded) vs. Model 2 (Omitted variable)
Dependent Variable: Lbs Extra Large Size 9 Oz $2.19; Price/lb = Average Price Per Pound of each size

                      Model 1 (Overloaded)                           Model 2 (Omitted variable)
Predictor             B          Std. Error  Beta    t        Sig.   B          Std. Error  Beta    t        Sig.
(Constant)            316.844    2445.178            .130     .897   625.958    2371.187            .264     .792
Price/lb Small        253.361    515.280     .033    .492     .624   175.976    493.905     .023    .356     .722
Price/lb XL           -1915.729  136.477     -.813   -14.037  .000   -1924.665  135.029     -.817   -14.254  .000
Price/lb 2XL          3590.806   2495.219    .267    1.439    .153   2305.377   861.524     .172    2.676    .009
Price/lb 3XL          -1413.885  2574.564    -.106   -.549    .584   (dropped)
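Comparing the two models with the omitted-variable identity (reduced-model coefficient = full-model coefficient + dropped coefficient × auxiliary slope) shows where the 3XL effect went. A sketch using the printed coefficients; it assumes both models were fit on the same weeks, and rounding in the printout makes the implied values approximate.

```python
# B copied from the Model 1 (overloaded) and Model 2 (omitted variable) tables.
full = {"(Constant)": 316.844, "Price SM": 253.361, "Price XL": -1915.729, "Price 2XL": 3590.806}
b_3xl = -1413.885                      # Model 1 coefficient of the dropped 3XL price
reduced = {"(Constant)": 625.958, "Price SM": 175.976, "Price XL": -1924.665, "Price 2XL": 2305.377}

# Omitted-variable identity: reduced = full + b_3xl * delta, where delta holds the
# coefficients from regressing the 3XL price on the kept predictors (same weeks).
delta = {name: (reduced[name] - full[name]) / b_3xl for name in full}
for name, d in delta.items():
    print(f"{name:<11} implied delta = {d:6.3f}")

se_ratio = 2495.219 / 861.524          # 2XL standard error, Model 1 vs. Model 2
print(f"2XL standard error shrinks by a factor of {se_ratio:.1f}")
```

The implied slope of about 0.91 on the 2XL price says the 3XL price moves almost one-for-one with it, so Model 2's 2XL coefficient (2305.377) is roughly 3590.806 + (-1413.885 × 0.909): the 2XL effect with the 3XL effect folded in, estimated with a standard error about 2.9 times smaller.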
The choices
• Overloaded model
– Individual predictors won't be as significant, because their standard errors are inflated
– Result: you won't know which predictors have a significant effect
• Omitted variable bias
– Individual influential predictors will be significant
– But their coefficients are biased (and look more significant) because they include the effect of the omitted variable
• Average 2XL & 3XL (create index)
– Check reliability (alpha) first; with only two items, their correlation (r = .950) is enough
– Keep in mind that you can't tease the effects of these 2 sizes apart
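A sketch of the index option, assuming the two price series are NumPy arrays; price_index and the other function names are hypothetical. Averaging raw prices seems reasonable here because both are prices per pound on the same scale; standardizing first would be an alternative. For two items, standardized alpha reduces to 2r / (1 + r), so r = .950 gives about .974, consistent with the point that the correlation alone is enough.

```python
import numpy as np


def cronbach_alpha(items):
    """Raw Cronbach's alpha for an (n_obs, k_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the summed scale
    return k / (k - 1) * (1 - item_var / total_var)


def standardized_alpha(mean_r, k):
    """Standardized alpha from the average inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def price_index(p_2xl, p_3xl):
    """Hypothetical composite: simple average of the 2XL and 3XL price per pound."""
    return (np.asarray(p_2xl, dtype=float) + np.asarray(p_3xl, dtype=float)) / 2


# r(2XL, 3XL) = .950 from the Correlations table.
print(round(standardized_alpha(0.950, 2), 3))
```

The index would then replace both prices in the regression, as a single "large-size price" predictor.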
Break out exercise
Summary
• Interpretation of Regression Coefficients: critical for
uncovering partial effects
• Diagnose potential Multicollinearity and Omitted Variable issues
– Cost of these issues: decreased precision (multicollinearity) or bias (omitted variables) in coefficient estimates
• Know your options; every situation will be different.
– Are there other, less correlated, variables available as
substitutes for the correlated variables?
– Is it acceptable to drop 1 or more variables?
– Can you create an index or aggregate correlated
predictors?