Marketing Research

Chapter Seventeen
Correlation & Regression Analysis
Naresh K. Malhotra
Marketing Research: An Applied Orientation, 4th ed.
Product moment correlation
Product moment correlation is a statistic used to summarize the strength of association between two metric (interval or ratio) variables, say X and Y. It is also known as the Pearson correlation coefficient, simple correlation, bivariate correlation or simply the correlation coefficient. It was proposed by Karl Pearson.
Ex: How strongly are sales related to advertising expenditures?
Formula:
$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$
The value of r varies between -1 and +1:
1. r = 0 means there is no linear relationship between X and Y.
2. r = +1 means there is a perfect positive linear relationship between X and Y.
3. r = -1 means there is a perfect negative linear relationship between X and Y.
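The formula above translates directly into code. This is a minimal Python sketch that assumes the paired observations are held in two plain lists. The function name `pearson_r` and the advertising/sales figures are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product moment correlation r, computed directly from the formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two cases")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical advertising (X) and sales (Y) figures, for illustration only
advertising = [2, 4, 6, 8, 10]
sales = [3, 7, 8, 12, 14]
print(round(pearson_r(advertising, sales), 3))  # 0.987
```

An r this close to +1 indicates a strong positive linear relationship. In practice a library routine such as `statistics.correlation` (Python 3.10+) would normally be used instead of hand-written code.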
Regression Analysis
Regression analysis is a powerful and flexible procedure for analyzing associative relationships between
a metric dependent variable and one or more independent variables. It is concerned with the nature
and degree of association between variables and does not imply or assume any causality. It is used in
the following ways:
1. Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
2. Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
3. Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
4. Predict the values of the dependent variable.
5. Control for other independent variables when evaluating the contributions of a specific variable or set of variables.
Bivariate Regression
Bivariate regression is a procedure for deriving a mathematical relationship
in the form of an equation between a single metric dependent or criterion
variable and a single metric independent or predictor variable.
Ex: Can the variation in market share be accounted for by the size of the
sales force?
Equation:
$$Y = \beta_0 + \beta_1 X$$
Bivariate Regression’s process
It is a nine-step process:
1. Plot the scatter diagram
2. Formulate the general model
3. Estimate the parameters
4. Estimate the standardized regression coefficient
5. Test for significance
6. Determine the strength and significance of association
7. Check prediction accuracy
8. Examine the residuals
9. Cross-validate the model
Step I
A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations. It gives a visual picture of the form of the relationship between the variables. The dependent variable is plotted on the vertical axis and the independent variable on the horizontal axis. If the points fall roughly along a straight band (for example, as one variable increases, so does the other), the relationship can be described as linear, that is, by a straight line. The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. This technique determines the best-fitting line by minimizing the sum of the squared vertical distances of all the points from the line. The best-fitting line is called the regression line. Any point that does not fall on the regression line is not fully accounted for. The vertical distance from the point to the line is the error, ej.
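The least-squares criterion can be sketched in a few lines of Python. The function computes the sum of the squared errors ej for any candidate line. The data and both candidate lines are hypothetical. The second candidate happens to be the least-squares line for these particular numbers, so it yields the smaller sum.

```python
def sum_squared_errors(x, y, intercept, slope):
    """Least-squares criterion: sum of squared vertical distances e_j from the line."""
    errors = [yj - (intercept + slope * xj) for xj, yj in zip(x, y)]
    return sum(e ** 2 for e in errors)


# Hypothetical data and two candidate lines, for illustration only
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
print(sum_squared_errors(x, y, intercept=1.0, slope=1.5))   # 10.0
print(sum_squared_errors(x, y, intercept=0.7, slope=1.35))  # about 1.9
```

Nudging the intercept or slope away from the least-squares values always increases this sum. That property is what makes the line "best fitting".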
Step II
In the bivariate regression model, the general form of a straight line is:
$$Y = \beta_0 + \beta_1 X$$
where
Y = dependent or criterion variable
X = independent or predictor variable
β0 = intercept of the line
β1 = slope of the line
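Observed data do not fall exactly on a line, so in marketing research the basic regression model includes an error term:
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
where ei is the error term associated with the ith observation.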
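A small simulation shows why the error term is needed: observations generated by the model scatter around the population line rather than falling on it. This Python sketch makes two assumptions that the text does not state. The errors are drawn as normal with a constant spread, and the parameter values β0 = 2.0, β1 = 0.5 and sigma = 1.0 are hypothetical.

```python
import random


def simulate_bivariate(x_values, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + e_i, with e_i ~ Normal(0, sigma) (assumed)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# Hypothetical population parameters, chosen only for illustration
x = list(range(1, 11))
y = simulate_bivariate(x, beta0=2.0, beta1=0.5, sigma=1.0)
# Recover the errors e_i as the vertical distances from the population line
errors = [yi - (2.0 + 0.5 * xi) for xi, yi in zip(x, y)]
```

Setting `sigma` to 0 removes the error term and puts every point exactly on the line Y = β0 + β1X.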
Step III
In most cases, β0 and β1 are unknown and are estimated from the sample observations using the equation:
$$\hat{Y}_i = a + bX_i$$
where Ŷi is the estimated or predicted value of Yi, and a and b are the estimators of β0 and β1. The values of a and b are found by the following formulas:
Slope:
$$b = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n}X_iY_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n}X_i^2 - n\bar{X}^2}$$
Intercept:
$$a = \bar{Y} - b\bar{X}$$
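Both forms of the slope formula, together with the intercept formula, can be checked in a short Python sketch. The function names and the data are hypothetical. The deviation form and the computational form should agree up to floating-point rounding.

```python
def fit_bivariate(x, y):
    """Least-squares estimates (a, b) for Y-hat = a + bX, using the deviation form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (X-bar, Y-bar)
    return a, b


def slope_computational(x, y):
    """Same slope from (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    denominator = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    return numerator / denominator


# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
a, b = fit_bivariate(x, y)
print(round(a, 4), round(b, 4))  # 0.7 1.35
```

The computational form avoids a second pass over the deviations, but it can lose precision when the values are large relative to their spread. The deviation form is the safer of the two.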
Step IV
Standardization is the process by which the raw data are transformed into new variables that have a mean of 0 and a variance of 1. When the data are standardized, the intercept assumes a value of 0. The term beta coefficient or beta weight is used to denote the standardized regression coefficient. In bivariate regression, the standardized slope of Y on X equals that of X on Y, and both equal the simple correlation coefficient:
$$B_{yx} = B_{xy} = r_{xy}$$
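The equality B_yx = B_xy = r_xy can be verified numerically. This Python sketch standardizes both variables using the population standard deviation (an assumed choice; the sample version gives the same slopes). It then fits the line in both directions. The data are hypothetical. For these numbers both standardized slopes come out at about 0.987, which equals r, and the intercept is essentially 0.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to mean 0 and variance 1 (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def fit_bivariate(x, y):
    """Least-squares intercept and slope for y = a + b x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
zx, zy = standardize(x), standardize(y)

a_std, b_yx = fit_bivariate(zx, zy)  # standardized Y regressed on standardized X
_, b_xy = fit_bivariate(zy, zx)      # standardized X regressed on standardized Y
print(round(a_std, 6), round(b_yx, 6), round(b_xy, 6))
```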
Step V
The statistical significance of the linear relationship between X and Y may be tested by examining the hypotheses:
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
The null hypothesis implies that there is no linear relationship between X and Y. The alternative hypothesis is that there is a relationship, positive or negative, between X and Y. Typically, a two-tailed test is done. A t statistic with n - 2 degrees of freedom can be used, where
$$t = \frac{b}{SE_b}$$
and SE_b denotes the estimated standard deviation of b, called the standard error. When the absolute value of the calculated t is larger than the critical value, the null hypothesis is rejected. This means that there is a significant linear relationship between the dependent and independent variables.
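The text does not give a formula for SE_b. This Python sketch uses the standard textbook expression SE_b = SEE / sqrt(Σ(Xi - X̄)²), where SEE is the standard error of estimate. The data are hypothetical. The critical value 3.182 is the two-tailed 5% point of the t distribution with 3 degrees of freedom, taken from a t table.

```python
from math import sqrt


def slope_t_test(x, y):
    """Return (b, SE_b, t) for H0: beta1 = 0 in bivariate regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = sqrt(ss_res / (n - 2))  # standard error of estimate
    se_b = see / sqrt(sxx)        # standard error of the slope
    return b, se_b, b / se_b


# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
b, se_b, t = slope_t_test(x, y)
T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, n - 2 = 3 df, from a t table
print(round(t, 2), abs(t) > T_CRITICAL)  # 10.73 True
```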
Step VI
Here the strength of association is measured by the coefficient of determination, r². In bivariate regression, r² is the square of the simple correlation coefficient obtained by correlating the two variables. The coefficient r² varies between 0 and 1. It is calculated by:
$$r^2 = \frac{SS_{reg}}{SS_y} = \frac{SS_y - SS_{res}}{SS_y}$$
where
$$SS_y = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$$
$$SS_{reg} = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$$
$$SS_{res} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$
Another equivalent test for examining the significance of the linear relationship between X and Y is the test for the significance of the coefficient of determination. The hypotheses are:
$$H_0: R^2_{pop} = 0$$
$$H_1: R^2_{pop} > 0$$
The test statistic is
$$F = \frac{SS_{reg}}{SS_{res}/(n-2)}$$
which has an F distribution with 1 and n - 2 degrees of freedom (in bivariate regression, F = t²). If the calculated value is larger than the critical value, the null hypothesis is rejected. This means that there is a significant relationship between the dependent and independent variables.
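The three sums of squares, r² and the F statistic can be computed together in a short Python sketch. The data are hypothetical. The critical F with 1 and 3 degrees of freedom at α = 0.05 is about 10.13 (from an F table).

```python
def sums_of_squares(x, y):
    """Return (SS_y, SS_reg, SS_res) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    ss_y = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    return ss_y, ss_reg, ss_res


# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
ss_y, ss_reg, ss_res = sums_of_squares(x, y)
n = len(x)
r_squared = ss_reg / ss_y
f_stat = ss_reg / (ss_res / (n - 2))  # 1 and n - 2 degrees of freedom
print(round(r_squared, 4), round(f_stat, 2))  # 0.9746 115.11
```

SS_reg + SS_res reproduces SS_y, and the F value here is the square of the t value for the same data, up to rounding.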
Step VII
To estimate the accuracy of the predicted values, Ŷ, it is useful to calculate the standard error of estimate, SEE:
$$SEE = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}}$$
Two cases of prediction may arise. The researcher may want to predict the mean value of Y for all the cases with a given value of X, say X0, or predict the value of Y for a single case. In both cases the predicted value is the same, although the prediction for a single case is less precise:
$$\hat{Y} = a + bX_0$$
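The Python sketch below computes SEE and the point prediction at X0. The text gives only the point prediction. The two standard errors distinguishing the mean-value case from the single-case case are the standard textbook expressions, added here for illustration. The data and the value X0 = 7 are hypothetical.

```python
from math import sqrt


def predict_with_errors(x, y, x0):
    """Point prediction at x0 plus standard errors for the two prediction cases."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    see = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a + b * x0
    # Mean value of Y for all cases with X = x0
    se_mean = see * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    # Y for a single case with X = x0: adds that case's own error variance
    se_single = see * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, see, se_mean, se_single


# Hypothetical data, for illustration only
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 14]
y_hat, see, se_mean, se_single = predict_with_errors(x, y, x0=7)
print(round(y_hat, 2), round(see, 3))  # 10.15 0.796
```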
Step VIII
Examine the residuals (discussed later).
Step IX
Cross-validate the model (discussed later).
Multiple Regression
Multiple regression involves a single dependent variable and two or more
independent variables. Ex: Can variation in sales be explained in terms of
variation in advertising expenditures, prices and level of distribution? The
general form of the multiple regression model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + e$$
which is estimated by the following equation:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k$$
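One way to estimate a, b1, ..., bk is to solve the least-squares normal equations. This Python sketch does so with plain Gaussian elimination so that it needs no third-party libraries. The function name `fit_multiple` and the zero-pivot tolerance are assumptions. The example data are constructed so that Y = 1 + 2X1 + 3X2 exactly, so the fit should recover those coefficients.

```python
def fit_multiple(rows, y, tol=1e-12):
    """Least-squares [a, b1, ..., bk] via the normal equations (X'X) beta = X'y.

    rows holds one list [X1, ..., Xk] per case; tol is an assumed cutoff
    for treating a pivot as zero (perfectly collinear predictors).
    """
    n, k = len(rows), len(rows[0])
    if n <= k:
        raise ValueError("need more cases than independent variables")
    design = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 gives a
    p = k + 1
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(d[r] * d[c] for d in design) for c in range(p)]
        + [sum(d[r] * yi for d, yi in zip(design, y))]
        for r in range(p)
    ]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = aug[r][p] - sum(aug[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = s / aug[r][r]
    return beta


# Hypothetical cases built so that Y = 1 + 2*X1 + 3*X2 exactly
rows = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5], [4, 2]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coef = fit_multiple(rows, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```

Real analyses would use a numerically stabler library routine (for example a QR-based least-squares solver). The normal equations are used here only because they mirror the textbook derivation.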
Multiple Regression Process
The steps involved in conducting multiple regression analysis are similar to those for bivariate regression analysis. The discussion focuses on partial regression coefficients, strength of association, examination of residuals and significance testing.
Partial Regression Coefficients
The partial regression coefficient b1 represents the expected change in Y when X1 is changed by one unit but X2 is held constant or otherwise controlled. Likewise, b2 represents the expected change in Y for a unit change in X2 when X1 is held constant. Thus, calling b1 and b2 partial regression coefficients is appropriate. It follows that if X1 and X2 are each changed by one unit, the expected change in Y is (b1 + b2). Multiple regression cannot be solved if:
1. The sample size, n, is smaller than or equal to the number of independent variables, k.
2. One independent variable is perfectly correlated with another.
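The interpretation of partial regression coefficients reduces to simple arithmetic on the estimated equation. In this Python sketch the coefficients a = 10.0, b1 = 2.5 and b2 = -1.0 are hypothetical values chosen only to make the arithmetic visible.

```python
def predict(a, b, xs):
    """Y-hat = a + b1*X1 + ... + bk*Xk."""
    return a + sum(bj * xj for bj, xj in zip(b, xs))


# Hypothetical estimated coefficients, for illustration only
a, b = 10.0, [2.5, -1.0]
base = predict(a, b, [4.0, 3.0])
x1_up = predict(a, b, [5.0, 3.0]) - base    # X1 + 1, X2 held constant -> b1
both_up = predict(a, b, [5.0, 4.0]) - base  # X1 + 1 and X2 + 1 -> b1 + b2
print(x1_up, both_up)  # 2.5 1.5
```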
Strength of Association
The strength of association is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination:
$$R^2 = \frac{SS_{reg}}{SS_y}$$
The multiple correlation coefficient, R, can also be viewed as the simple correlation coefficient, r, between Y and Ŷ. Several characteristics of R² are:
1. The coefficient of multiple determination, R², cannot be less than the highest bivariate r² of any individual independent variable with the dependent variable.
2. R² will be larger when the correlations between the independent variables are low.
3. If the independent variables are statistically independent (uncorrelated), then R² will be the sum of the bivariate r² of each independent variable with the dependent variable.
4. R² cannot decrease as more independent variables are added to the regression equation.
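Characteristics 1 and 3 can be checked numerically for two predictors. This Python sketch fits the two-predictor model with the closed-form least-squares solution in deviation form and computes R² = SS_reg / SS_y. The data are hypothetical and were chosen so that X1 and X2 are uncorrelated. Here R² equals 25/26, which is exactly the sum of the two bivariate r² values.

```python
def center(v):
    """Deviations from the mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def r2_simple(x, y):
    """Bivariate r^2 of one predictor with Y."""
    dx, dy = center(x), center(y)
    return dot(dx, dy) ** 2 / (dot(dx, dx) * dot(dy, dy))


def r2_two_predictors(x1, x2, y):
    """R^2 = SS_reg / SS_y for the least-squares fit Y-hat = a + b1 X1 + b2 X2."""
    d1, d2, dy = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    det = s11 * s22 - s12 ** 2
    if det == 0:  # exact check is a simplification; real data would need a tolerance
        raise ValueError("X1 and X2 are perfectly correlated")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    # Fitted values minus the mean of Y are b1*d1 + b2*d2
    ss_reg = sum((b1 * u + b2 * v) ** 2 for u, v in zip(d1, d2))
    return ss_reg / dot(dy, dy)


# Hypothetical data with uncorrelated predictors, for illustration only
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [1, 3, 4, 8]
big_r2 = r2_two_predictors(x1, x2, y)
print(round(big_r2, 4), round(r2_simple(x1, y) + r2_simple(x2, y), 4))  # 0.9615 0.9615
```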
Step IX
Examination of Residuals
A residual is the difference between the observed value of Yi and the value predicted by the regression equation, Ŷi. Plotting the residuals against the independent variables provides evidence of the appropriateness or inappropriateness of using a linear model. The plot should show a random pattern: the residuals should fall randomly, with relatively equal dispersion about 0, and should not display any tendency to be either positive or negative.
Step X
Significance Testing
Significance testing covers the overall regression equation as well as specific partial regression coefficients. The null hypothesis for the overall test is that the coefficient of multiple determination in the population, R²pop, is zero:
$$H_0: R^2_{pop} = 0$$
This is equivalent to the following null hypothesis:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
The overall test can be conducted by using an F statistic:
$$F = \frac{SS_{reg}/k}{SS_{res}/(n-k-1)} = \frac{R^2/k}{(1-R^2)/(n-k-1)}$$
which has an F distribution with k and n - k - 1 degrees of freedom.
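The two forms of the overall F statistic give the same number, as this Python sketch confirms. The inputs (SS_reg = 60, SS_res = 40, n = 24 cases, k = 3 predictors, so R² = 0.6) are hypothetical. The result would be compared with the critical F with 3 and 20 degrees of freedom, about 3.10 at α = 0.05 from an F table.

```python
def f_from_sums(ss_reg, ss_res, n, k):
    """Overall F = (SS_reg / k) / (SS_res / (n - k - 1))."""
    return (ss_reg / k) / (ss_res / (n - k - 1))


def f_from_r2(r2, n, k):
    """Equivalent form F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical inputs, for illustration only
ss_reg, ss_res, n, k = 60.0, 40.0, 24, 3
r2 = ss_reg / (ss_reg + ss_res)  # R^2 = SS_reg / SS_y, with SS_y = SS_reg + SS_res
print(round(f_from_sums(ss_reg, ss_res, n, k), 4), round(f_from_r2(r2, n, k), 4))  # 10.0 10.0
```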