Goodman’s and Kruskal’s Gamma

Topics:
Non-Parametric Measures of Correlation
Spearman’s Rho
Goodman’s & Kruskal’s Gamma
Stat203
Fall2011 – Week 13, Lecture 1
Page 1 of 26
More Correlation?
The Pearson correlation we’ve already studied was
somewhat restrictive.
Recall, Pearson correlation was usually only good when:
- both variables are interval or ratio scaled
- both variables are approximately normally distributed
What about when we want to know the relationship
between two variables that are:
- interval or ratio but not normally distributed
- ordinal
- nominal
Correlation: Interval or Ratio data that
isn’t Normally Distributed
Spearman’s Rho is a measure of correlation you can use
when the data is strongly skewed, or you’re not sure
whether the variables are Normally distributed.
It is also sometimes called the Spearman Rank-Order
correlation coefficient, so let’s define ‘Rank-Order’ first.
An observation’s rank is its position in the ordering (largest
to smallest) of the n observations in the dataset.
An example …
Example (C12 Q5): Is there a relationship between
Distance from School and Number of clubs joined?
First, let’s look at ‘ranks’:
            Distance to      Rank Order of        Number of      Rank Order of Number
            School (miles)   Distance to School   Clubs Joined   of Clubs Joined
Lee         4                                     3
Rhonda      2                                     1
Jess        7                                     5
Evelyn      1                                     2
Mohammed    4                                     1
Steve       6                                     1
George      9                                     9
Juan        7                                     6
Chi         7                                     5
David       17                                    8
Now that we have the ranks of each individual’s value of
each variable, Spearman’s Rho is actually calculated using
exactly the same formula as Pearson’s correlation, but
using the ranks!
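The idea can also be sketched outside SPSS. The short Python
example below is only an illustration, not the SPSS procedure:
the function names are made up for this sketch, ranks run from
smallest (1) to largest (the direction does not change the
result as long as both variables are ranked the same way), and
tied values are assumed to share the average of their ranks.
With those assumptions the result is about 0.8375, in line with
the ρ = 0.838 quoted later in the lecture.

```python
from math import sqrt

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's Rho: the Pearson formula applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Data from the table above (Lee, Rhonda, ..., David).
distance = [4, 2, 7, 1, 4, 6, 9, 7, 7, 17]
clubs = [3, 1, 5, 2, 1, 1, 9, 6, 5, 8]

print("Ranks of distance:", average_ranks(distance))
print("Ranks of clubs:   ", average_ranks(clubs))
print("Pearson: ", pearson(distance, clubs))
print("Spearman:", spearman(distance, clubs))
```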
Let’s look at this using SPSS.
First off, let’s examine the histograms of these two
variables to see why we shouldn’t use the Pearson
Correlation.
Are these variables normally distributed?
... the sample size is small, so it’s hard to tell … but there is
one easy way to see which correlation to use.
Use SPSS to calculate both the Pearson and Spearman
Correlations
If the data were perfectly normally distributed, the
Spearman and Pearson correlations would be nearly identical!
In this case, the Pearson and Spearman correlation
coefficients are not identical and the histograms seem to
show some skewness, so we should use the Spearman’s
Rho as the correlation.
So, our conclusion from examining this data is that
Distance to School and the number of clubs joined have a
statistically significant (p-value = 0.002), positive
correlation (ρ = 0.838).
Correlations between Ordinal Variables?
From the previous example, we should note that the key to
calculating Spearman’s Rho was to identify the rank-order
of each individual for each variable.
Recall, Ordinal variables only give us an ‘ordering’ … or the
rank of one individual compared to another!
So … we can use Spearman’s Rho for correlations
involving ordinal data!
Example (Ch12, q7): A researcher ranks population
density and Quality of Life for 10 cities. Is there a
relationship between these two variables?
Research Question:
Individuals:
Population:
Variables:
Parameter:
Statistical Hypothesis:
… From SPSS:
Conclusion:
… but note that the Spearman correlation is identical to the
Pearson!
This is because the data we analyzed was the ranks …
and when only the ranks are available Spearman and
Pearson will be the same.
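A quick way to convince yourself of this is the sketch below.
The city ranks in it are hypothetical (they are NOT the
textbook data) and the function names are made up for the
illustration. Ranking a set of ranks just gives the same ranks
back, so the Pearson formula sees exactly the same numbers
either way.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks_no_ties(values):
    """Rank distinct values from smallest (1) to largest (n)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical ranks for 10 cities (illustration only).
density_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quality_rank = [9, 10, 7, 8, 5, 6, 3, 4, 2, 1]

# Ranking the ranks changes nothing ...
print(ranks_no_ties(quality_rank) == quality_rank)

# ... so Pearson on the data and Pearson on the ranks (Spearman) agree.
r = pearson(density_rank, quality_rank)
rho = pearson(ranks_no_ties(density_rank), ranks_no_ties(quality_rank))
print("Pearson: ", r)
print("Spearman:", rho)
```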
Example (Ch12, Q11): Comparing High School GPA to
College performance. Is there a relationship between the
two?
Research Question:
Individuals:
Population:
Variables:
Parameter:
Statistical Hypothesis:
From SPSS:
Conclusion:
… but note now that the Pearson does not match the
Spearman.
Why?
Only one variable contained ranks, the other variable was
ratio scaled. So, for determining the correlation between
ratio and ordinal data, we should use the Spearman.
Goodman’s and Kruskal’s Gamma
Although Spearman’s Rho can be used in most cases
involving ordinal data, if you have ‘lots’ of ties, you may
have to use Gamma as an alternative.
What’s a tie? A tie occurs when two or more individuals have
the same value of a variable, or the same combination of values.
Why would there be lots of ties?
Think back to the homework; recall the General Happiness
variable from the GSS – there were only three categories
for this ordinal variable and most people selected ‘Pretty
Happy’ … they were all tied.
Gamma in SPSS
As with the other statistics, we won’t calculate this by hand.
But it’s easy to find in SPSS.
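For the curious, here is a rough sketch of what sits behind
the number. Gamma compares every pair of individuals: a pair
is concordant if one individual is higher on both variables,
discordant if higher on one and lower on the other, and pairs
tied on either variable are ignored. Gamma is then
(concordant - discordant) / (concordant + discordant). The
function name and the data below are hypothetical (NOT the
textbook data), and this is not how SPSS is implemented.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D), ignoring pairs tied on x or y."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # same ordering on both variables
        elif product < 0:
            discordant += 1   # opposite ordering
        # product == 0: the pair is tied on at least one variable; skip it
    if concordant + discordant == 0:
        raise ValueError("Gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal codes: 1 = low, 2 = medium, 3 = high.
ses = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
books = [1, 1, 2, 1, 2, 3, 2, 3, 3, 3]

print("Gamma:", goodman_kruskal_gamma(ses, books))
```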
Example (Ch12, Q12): Is there a relationship between
SocioEconomic Status and Number of books read?
Let’s first look at this data in SPSS:
Are there ties?
Let’s do a cross-tab to obtain the table in the textbook, but
note that we can generate some statistics along the way:
… and the output:
So, what would our conclusion be regarding the
relationship between these variables?
When making conclusions regarding ‘relationship’
questions, quote the strength and direction (ie: the actual
correlation) and the p-value or ‘significance’.
Conclusion:
So, we’ve studied 3 different ways to calculate correlation:
- Pearson’s r
- Spearman’s Rho
- Goodman’s and Kruskal’s Gamma
How do I know which to use?
- Consider the type of variables involved
- Consider the distribution of the variables
- Consider the # of ties
... and if all else fails: if Spearman’s gives a different conclusion
than Pearson’s, use Spearman’s … and if it looks like 10% or more
of your data share the same value of one variable or the other,
use Gamma.
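One possible reading of this rule of thumb is sketched below.
It is an interpretation, not a formal procedure: the function
names, the two yes/no arguments, and the decision to reserve
Gamma for ordinal variables are all assumptions made for the
illustration, and the example data are hypothetical.

```python
from collections import Counter

def largest_tie_share(values):
    """Share of observations holding the most common value (0 if no ties)."""
    top_count = max(Counter(values).values())
    return top_count / len(values) if top_count > 1 else 0.0

def suggest_measure(x, y, any_ordinal, looks_normal, tie_threshold=0.10):
    """Suggest a correlation measure following the lecture's rule of thumb."""
    tie_share = max(largest_tie_share(x), largest_tie_share(y))
    # Assumption: Gamma is only considered when a variable is ordinal.
    if any_ordinal and tie_share >= tie_threshold:
        return "Goodman and Kruskal's Gamma"
    if any_ordinal or not looks_normal:
        return "Spearman's Rho"
    return "Pearson's r"

# Hypothetical three-category ordinal variable with many ties
# (1 = not too happy, 2 = pretty happy, 3 = very happy).
happiness = [2, 2, 2, 2, 1, 3, 2, 2, 1, 2]
other = [1, 2, 2, 3, 1, 3, 2, 3, 1, 2]

print(largest_tie_share(happiness))
print(suggest_measure(happiness, other, any_ordinal=True, looks_normal=False))
```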
For all correlations …
All correlations we have studied have a maximum of 1 and
a minimum of -1 and describe the strength and direction of
the relationship between TWO variables.
SPSS provides a p-value for all correlations, and all are
interpreted the same (significance of the relationship).
Research Hypotheses involving correlations always ask
about a significant relationship between the variables.
Today’s Topics
Non-Parametric Measures of Correlation
- Pearson’s r isn’t always good enough
- All correlations have similar interpretations regarding
strength and direction of relationship
- All correlations have a p-value which is interpreted
similarly
Spearman’s Rho
- for non-normal (ie: skewed) interval or ratio-scaled
variables
- if one or more of the variables are ordinal
Goodman’s & Kruskal’s Gamma
- useful for correlations between ordinal variables with
lots of ties
Reading:
This lecture included material from Chapter 12 up to
page 430.
No more reading for this course!