Cross-Tabulation
• A technique for organizing data by groups, categories, or classes, thus facilitating comparisons; a joint frequency distribution of observations on two or more sets of variables
• Contingency table: the results of a cross-tabulation of two variables, such as survey questions
Cross-Tabulation
• Analyze data by groups or categories
• Compare differences
• Contingency table
• Percentage cross-tabulations (see the sketch below)
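A minimal sketch of a cross-tabulation in Python with pandas; the eight survey responses are hypothetical, invented only to illustrate the technique.

```python
# A contingency table (joint frequency distribution) of two survey variables.
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "M", "F", "F", "M", "F", "M", "F"],   # hypothetical data
    "aware":  ["Yes", "No", "Yes", "No", "Yes", "Yes", "No", "No"],
})

# Joint frequency distribution of the two variables
table = pd.crosstab(df["gender"], df["aware"])
print(table)

# Percentage cross-tabulation: row percentages for easier comparison
print(pd.crosstab(df["gender"], df["aware"], normalize="index"))
```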
Elaboration and Refinement
• Moderator variable
– A third variable that, when introduced into an analysis, alters or has a contingent effect on the relationship between an independent variable and a dependent variable.
• Spurious relationship
– An apparent relationship between two variables that is not authentic (illustrated in the sketch below).
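A minimal sketch of a spurious relationship, using synthetic data: x and y both depend on a third variable z but not on each other, so the apparent correlation disappears once z is held constant.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.integers(0, 2, 500)            # third variable (e.g., region)
x = z * 2.0 + rng.normal(size=500)     # x depends on z plus noise...
y = z * 2.0 + rng.normal(size=500)     # ...and so does y, independently

print(np.corrcoef(x, y)[0, 1])         # apparent correlation, driven by z
for level in (0, 1):                   # controlling for z removes it
    m = z == level
    print(level, np.corrcoef(x[m], y[m])[0, 1])   # near zero in each group
```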
Quadrant Analysis
• Two rating scales form a two-dimensional table with four quadrants (Importance-Performance Analysis)
Calculating Rank Order
• Ordinal data
• Brand preferences
Correlation Coefficient
• A statistical measure of the covariation or association between two variables.
• Are dollar sales associated with advertising dollar expenditures?
• The correlation coefficient for two variables, X and Y, is $r_{xy}$.
Correlation Coefficient
• r ranges from +1 to -1
• r = +1 indicates a perfect positive linear relationship
• r = -1 indicates a perfect negative linear relationship
• r = 0 indicates no correlation
Simple Correlation Coefficient
$$r_{xy} = r_{yx} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
Simple Correlation Coefficient
$$r_{xy} = r_{yx} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2 \sigma_y^2}}$$
Simple Correlation Coefficient: Alternative Method
$\sigma_x^2$ = variance of X
$\sigma_y^2$ = variance of Y
$\sigma_{xy}$ = covariance of X and Y
Correlation Patterns (scatter plots of Y against X)
• No correlation
• Perfect negative correlation: r = -1.0
• A high positive correlation: r = +.98
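A minimal numeric sketch of the two equivalent formulas above, using hypothetical X and Y series (e.g., advertising and sales figures):

```python
import numpy as np

X = np.array([10., 12., 14., 16., 18.])   # hypothetical advertising dollars
Y = np.array([30., 34., 33., 40., 45.])   # hypothetical sales

# Deviation-score formula
dx, dy = X - X.mean(), Y - Y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Alternative method: covariance over the product of standard deviations
r_alt = np.cov(X, Y, bias=True)[0, 1] / (X.std() * Y.std())

print(r, r_alt)   # identical by construction
```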
Calculation of r
$$r = \frac{6.3389}{\sqrt{(17.837)(5.589)}} = \frac{6.3389}{\sqrt{99.712}} = .635$$
(p. 629)
Coefficient of Determination
$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$
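A brief sketch using SciPy's pearsonr on the same hypothetical series: squaring r gives the share of variance explained. The text's r = .635, for instance, corresponds to r² of about .40.

```python
import numpy as np
from scipy import stats

X = np.array([10., 12., 14., 16., 18.])
Y = np.array([30., 34., 33., 40., 45.])

r, p_value = stats.pearsonr(X, Y)   # returns the correlation and a p-value
print(r, r**2)                      # r^2 = proportion of variance explained
```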
Correlation Does Not Mean Causation
• A high correlation does not establish causation
• Rooster's crow and the rising of the sun
– The rooster does not cause the sun to rise.
• Teachers' salaries and the consumption of liquor
– They covary because both are influenced by a third variable
Correlation Matrix
• The standard form for reporting correlational results.
Correlation Matrix

         Var1   Var2   Var3
Var1     1.0    0.45   0.31
Var2     0.45   1.0    0.10
Var3     0.31   0.10   1.0
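A minimal sketch producing a correlation matrix like the one above; the three variables are synthetic stand-ins for Var1 through Var3.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 3)),
                  columns=["Var1", "Var2", "Var3"])   # synthetic data

print(df.corr())   # symmetric matrix with 1.0 on the diagonal
```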
Common Bivariate Tests
Type of measurement: Interval and ratio
• Differences between two independent groups: independent-groups t-test or Z-test
• Differences among three or more independent groups: one-way ANOVA
Common Bivariate Tests
Type of measurement: Ordinal
• Differences between two independent groups: Mann-Whitney U-test; Wilcoxon test
• Differences among three or more independent groups: Kruskal-Wallis test
Common Bivariate Tests
Type of measurement: Nominal
• Differences between two independent groups: Z-test (two proportions); chi-square test
• Differences among three or more independent groups: chi-square test
Type of measurement: Nominal. Differences between two independent groups: chi-square test.
Differences Between Groups
• Contingency tables
• Cross-tabulation
• Chi-square allows testing for significant differences between groups
• "Goodness of fit"
Chi-Square Test
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell
Chi-Square Test
$$E_{ij} = \frac{R_i C_j}{n}$$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
n = sample size
Degrees of Freedom
d.f. = (R - 1)(C - 1)
For a 2 x 2 table: (2 - 1)(2 - 1) = 1
Awareness of Tire Manufacturer's Brand

         Aware   Unaware   Total
Men        50       15       65
Women      10       25       35
Total      60       40      100
Chi-Square Test: Differences Among Groups Example
$$\chi^2 = \frac{(50-39)^2}{39} + \frac{(10-21)^2}{21} + \frac{(15-26)^2}{26} + \frac{(25-14)^2}{14}$$
$$\chi^2 = 3.102 + 5.762 + 4.654 + 8.643 = 22.161$$
$$d.f. = (R-1)(C-1) = (2-1)(2-1) = 1$$
The critical value of $\chi^2$ at the .05 level with 1 d.f. is 3.84; since the observed 22.161 exceeds 3.84, the difference between men and women is significant.
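A minimal sketch checking the hand calculation with SciPy. Note that chi2_contingency applies Yates' continuity correction to 2x2 tables by default, so correction=False is needed to reproduce the figures above.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[50, 15],    # men:   aware, unaware
                     [10, 25]])   # women: aware, unaware

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2, dof)    # ~22.16 with 1 d.f., well past the 3.84 critical value
print(expected)     # [[39, 26], [21, 14]], matching the hand calculation
```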
Type of measurement: Interval and ratio. Differences between two independent groups: t-test or Z-test.
Differences Between Groups when Comparing Means
• Ratio-scaled dependent variables
• t-test
– When groups are small
– When the population standard deviation is unknown
• Z-test
– When groups are large
Null Hypothesis About Mean Differences Between Groups
$$\mu_1 = \mu_2 \quad \text{or} \quad \mu_1 - \mu_2 = 0$$
t-Test for Difference of Means
$$t = \frac{\text{mean 1} - \text{mean 2}}{\text{variability of random means}}$$
t-Test for Difference of Means
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$
$\bar{X}_1$ = mean for Group 1
$\bar{X}_2$ = mean for Group 2
$S_{\bar{X}_1 - \bar{X}_2}$ = the pooled, or combined, standard error of the difference between means
Pooled Estimate of the Standard Error
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Pooled Estimate of the Standard Error
$S_1^2$ = the variance of Group 1
$S_2^2$ = the variance of Group 2
$n_1$ = the sample size of Group 1
$n_2$ = the sample size of Group 2
Degrees of Freedom
• d.f. = n - k
• where:
– n = $n_1 + n_2$
– k = number of groups
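A minimal sketch reproducing the worked example that follows, using SciPy's ttest_ind_from_stats; the standard deviations 2.1 and 2.6 are implied by the example's arithmetic.

```python
from scipy.stats import ttest_ind_from_stats

# Group 1: n = 21, mean = 16.5, s = 2.1; Group 2: n = 14, mean = 12.2, s = 2.6
t, p = ttest_ind_from_stats(mean1=16.5, std1=2.1, nobs1=21,
                            mean2=12.2, std2=2.6, nobs2=14)
print(t)   # ~5.39, with d.f. = 21 + 14 - 2 = 33
```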
t-Test for Difference of Means: Example
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(21-1)(2.1)^2 + (14-1)(2.6)^2}{33}\right)\left(\frac{1}{21} + \frac{1}{14}\right)} = .797$$
$$t = \frac{16.5 - 12.2}{.797} = \frac{4.3}{.797} = 5.395$$
Type of measurement: Nominal. Differences between two independent groups: Z-test (two proportions).
Comparing Two Groups when Comparing Proportions
• Percentage comparisons
• Sample proportion: p
• Population proportion: $\pi$
Differences Between Two Groups when Comparing Proportions
The hypothesis is
$$H_0: \pi_1 = \pi_2$$
which may be restated as
$$H_0: \pi_1 - \pi_2 = 0$$
Z-Test for Differences of Proportions
$$H_0: \pi_1 = \pi_2 \quad \text{or} \quad H_0: \pi_1 - \pi_2 = 0$$
Z-Test for Differences of Proportions
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$
Z-Test for Differences of Proportions
$p_1$ = sample proportion of successes in Group 1
$p_2$ = sample proportion of successes in Group 2
$(\pi_1 - \pi_2)$ = hypothesized population proportion 1 minus hypothesized population proportion 2
$S_{p_1 - p_2}$ = pooled estimate of the standard error of the difference in proportions
Z-Test for Differences of Proportions
$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Z-Test for Differences of Proportions
$\bar{p}$ = pooled estimate of the proportion of successes in a sample of both groups
$\bar{q}$ = $1 - \bar{p}$, the pooled estimate of the proportion of failures in a sample of both groups
$n_1$ = sample size for Group 1
$n_2$ = sample size for Group 2
Z-Test for Differences of Proportions
$$\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
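A minimal sketch applying these formulas to the example on the next slide (p1 = .35, p2 = .40, n1 = n2 = 100); plain NumPy, no special library.

```python
import numpy as np

p1, n1 = .35, 100
p2, n2 = .40, 100

p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)          # pooled proportion = .375
q_bar = 1 - p_bar                                 # = .625
se = np.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))   # = .068

z = (p1 - p2) / se   # hypothesized difference is zero
print(p_bar, se, z)  # z ~ -0.73, short of 1.96, so not significant at .05
```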
Z-Test for Differences of Proportions: Example
$$\bar{p} = \frac{100(.35) + 100(.40)}{100 + 100} = .375$$
$$S_{p_1 - p_2} = \sqrt{(.375)(.625)\left(\frac{1}{100} + \frac{1}{100}\right)} = .068$$
Testing a Hypothesis about a Distribution
• Chi-square test
• Tests for significance in the analysis of frequency distributions
• Compares observed frequencies with expected frequencies
• "Goodness of fit"
Chi-Square Test
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
$\chi^2$ = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell
Chi-Square Test: Estimation of the Expected Number for Each Cell
$$E_{ij} = \frac{R_i C_j}{n}$$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
n = sample size
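A minimal sketch of a goodness-of-fit test with SciPy; the observed counts are hypothetical (120 rolls of a supposedly fair die), and chisquare assumes equal expected frequencies when none are supplied.

```python
from scipy.stats import chisquare

observed = [25, 18, 20, 22, 17, 18]   # hypothetical counts per die face

# With no expected frequencies given, chisquare uses a uniform expectation:
# 120 rolls / 6 faces = 20 per category.
chi2, p = chisquare(observed)
print(chi2, p)   # small chi-square / large p: no evidence of a poor fit
```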
Hypothesis Test of a Proportion
• $\pi$ is the population proportion
• p is the sample proportion
• $\pi$ is estimated with p
Hypothesis Test of a Proportion
$$H_0: \pi = .5 \qquad H_1: \pi \neq .5$$
$$S_p = \sqrt{\frac{(.6)(.4)}{100}} = \sqrt{\frac{.24}{100}} = \sqrt{.0024} = .04899$$
$$Z_{obs} = \frac{p - \pi}{S_p} = \frac{.6 - .5}{.04899} = \frac{.1}{.04899} = 2.04$$
Hypothesis Test of a Proportion: Another Example
n = 1,200; p = .20; hypothesized $\pi$ = .15
$$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.2)(.8)}{1200}} = \sqrt{\frac{.16}{1200}} = \sqrt{.000133} = .0115$$
$$Z = \frac{p - \pi}{S_p} = \frac{.20 - .15}{.0115} = \frac{.05}{.0115} = 4.348$$
The Z value exceeds 1.96, so the null hypothesis should be rejected at the .05 level; indeed, it is significant beyond the .001 level.
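A minimal sketch reproducing both proportion tests above; the helper function is hypothetical, and the small discrepancy in the second Z value reflects the slide's rounding of S_p to .0115.

```python
import numpy as np

def z_for_proportion(p, pi, n):
    """Z statistic for a sample proportion p against hypothesized pi,
    using the slides' standard error sqrt(p * q / n)."""
    se = np.sqrt(p * (1 - p) / n)
    return (p - pi) / se

print(z_for_proportion(.60, .50, 100))    # ~2.04
print(z_for_proportion(.20, .15, 1200))   # ~4.33 (slide rounds to 4.348)
```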