Correlation coefficient

Correlational analysis
[Figure: scatterplots of Size (y-axis) against Weight (x-axis)]
Correlational tests
Nominal
1. Phi coefficient
2. Cramér’s V
3. Pearson χ² test
4. Loglinear analysis
Ordinal
1. Spearman’s Rho
2. Kendall’s Tau
Interval
1. Pearson’s r
Correlation coefficient + effect size
Correlation coefficient | Shared variance (effect size, r²)
r = 0.0 | 0.00
r = 0.1 | 0.01 (1%)
r = 0.2 | 0.04 (4%)
r = 0.3 | 0.09 (9%)
r = 0.4 | 0.16 (16%)
r = 0.5 | 0.25 (25%)
r = 0.6 | 0.36 (36%)
r = 0.7 | 0.49 (49%)
r = 0.8 | 0.64 (64%)
r = 0.9 | 0.81 (81%)
r = 1.0 | 1.00 (100%)
Verbal labels, from weakest to strongest: no correlation (r = 0.0), low correlation (from r = 0.1), moderate correlation, high correlation, very high correlation.
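The shared-variance column is simply r squared. A minimal Python sketch that regenerates it (the helper name `shared_variance` is our own, not from any library):

```python
# Shared variance (the coefficient of determination) is r squared.
def shared_variance(r: float) -> float:
    """Return r**2, the proportion of variance two variables have in common."""
    return r ** 2

# Rebuild the table: r = 0.0, 0.1, ..., 1.0 mapped to r² rounded to 2 decimals.
table = {round(i / 10, 1): round(shared_variance(i / 10), 2) for i in range(11)}

for r, r2 in table.items():
    print(f"r = {r:.1f}  ->  shared variance = {r2:.2f} ({r2:.0%})")
```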
Example: MLU & Age
Child | Age in months | MLU
1 | 24 | 2.10
2 | 23 | 2.16
3 | 32 | 2.25
4 | 20 | 1.93
5 | 43 | 2.64
6 | 58 | 5.63
7 | 28 | 1.96
8 | 34 | 2.23
9 | 53 | 5.19
10 | 46 | 3.45
11 | 49 | 3.21
12 | 36 | 2.84
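As a sketch of where r comes from, the snippet below computes Pearson's r from its definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares) for the twelve children in the table. It is plain Python with no statistics package assumed; `pearson_r` is an illustrative helper name. On these data it gives r ≈ .887.

```python
import math

# MLU data from the table above (children 1-12).
age = [24, 23, 32, 20, 43, 58, 28, 34, 53, 46, 49, 36]
mlu = [2.10, 2.16, 2.25, 1.93, 2.64, 5.63, 1.96, 2.23, 5.19, 3.45, 3.21, 2.84]

def pearson_r(x, y):
    """Pearson's r: covariation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(age, mlu)
print(f"r = {r:.3f}, shared variance r² = {r * r:.3f}")
```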
Example: MLU & Age
Long: There is an association between age and MLU. The r of .887 showed that 78.6% (r²) of the variation in MLU was accounted for by the variation in age. The associated probability level of .001 showed that such a result is unlikely to have arisen from sampling error.
Short: As can be seen in the table above, there is a strong correlation between age and MLU (r = .887, p = .001).
Example: Typicality & Frequency
Word | Typicality rank | Frequency rank
car | 1 | 1
truck | 2 | 2
sports car | 3 | 6
motorbike | 4 | 5
train | 5 | 3
bicycle | 6 | 4
ship | 7 | 8
boat | 8 | 7
skateboard | 9 | 9
space shuttle | 10 | 10
Example: Typicality & Frequency
Kendall’s tau (τ = .733, p = .003)
Spearman’s rho (rs = .879, p = .001)
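Both coefficients can be checked by hand from the rank table. The sketch below assumes there are no tied ranks (true for these data), so Spearman's rho reduces to 1 − 6Σd² / (n(n² − 1)) and Kendall's tau to (concordant − discordant pairs) / total pairs. p-values are not computed; the function names are illustrative.

```python
# Ranks from the Typicality & Frequency table (no ties).
typicality = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frequency = [1, 2, 6, 5, 3, 4, 8, 7, 9, 10]

def spearman_rho(rx, ry):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(rx, ry):
    """Kendall's tau for untied ranks: (concordant - discordant) / number of pairs."""
    n = len(rx)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (rx[i] - rx[j]) * (ry[i] - ry[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

rho = spearman_rho(typicality, frequency)
tau = kendall_tau(typicality, frequency)
print(f"rho = {rho:.3f}, tau = {tau:.3f}")  # rho = 0.879, tau = 0.733
```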
Partial correlation
Phonemes & syllables: r = .898, p = .001
Phonemes & frequency: r = .795, p = .006
Syllables & frequency: r = .677, p = .031
Phonemes & frequency (syllables held constant): r = .578, p = .103
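The partial correlation follows from the three zero-order correlations via the standard first-order formula r_xy·z = (r_xy − r_xz · r_yz) / √((1 − r_xz²)(1 − r_yz²)). A minimal sketch plugging in the rounded coefficients above (x = phonemes, y = frequency, z = syllables) reproduces r ≈ .578; the p-value would additionally need the t distribution with n − 3 degrees of freedom and is not computed here.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = phonemes, y = frequency, z = syllables (coefficients from the slide above)
r = partial_r(r_xy=0.795, r_xz=0.898, r_yz=0.677)
print(f"r(phonemes, frequency | syllables) = {r:.3f}")  # 0.578
```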
Nominal data
There is a significant correlation between gender (boys vs. girls) and the preference for a particular type of toy (mechanical vs. non-mechanical) (χ² = 49.09, df = 2, p = .001).
Phi coefficient: φ = .70, p < .001
Cramér’s V: V = .70, p < .001
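For a contingency table, φ = √(χ²/N) and Cramér's V = √(χ² / (N · (k − 1))), where k is the smaller of the number of rows and columns; whenever k = 2 the two coincide, which is consistent with both being .70 above. The study's raw counts are not given, so this sketch uses a HYPOTHETICAL table. It computes the uncorrected Pearson χ² (no Yates correction) and returns φ as an unsigned magnitude.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi_and_cramers_v(table):
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))  # smaller of (#rows, #columns)
    phi = math.sqrt(chi2 / n)
    v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, phi, v

# HYPOTHETICAL counts (not the study's data): rows = boys, girls;
# columns = mechanical, non-mechanical toy preferred.
counts = [[40, 10],
          [10, 40]]
chi2, phi, v = phi_and_cramers_v(counts)
print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, V = {v:.2f}")  # chi2 = 36.00, phi = 0.60, V = 0.60
```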
Regression
Correlation vs. Regression
Correlational analysis gives us a measure of how closely the data points are associated.
Regression analysis measures the effect of the predictor variable x on the criterion y: how much does y change if x changes?
A correlational analysis is purely descriptive, whereas a regression analysis allows us to make predictions.
Types of regression analysis
Type | Predictor variable(s) | Criterion (target) variable
Linear regression | 1 interval | 1 interval
Multiple regression | 2+ (some of the variables can be categorical) | 1 interval
Logistic regression | 1+ | 1 categorical
Discriminant analysis | 1+ | 1 categorical
[Figure: Line-of-best-fit. Scatterplot of MLU (y-axis) against Age (x-axis) with a linear fit; R² linear = 0.786]
Linear Regression
y = bx + a
y = variable to be predicted
x = given value on the variable x
b = slope of the line
a = intercept (or constant), the point where the line-of-best-fit intercepts the y-axis
Linear Regression
Given a score of 20 on the x-axis, a slope of b = 2, and an intercept of a = 5, what is the predicted score?
y = (2 × 20) + 5 = 45
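The worked example is a single substitution into y = bx + a. As a trivial sketch (the helper name `predict` is hypothetical):

```python
def predict(x, b, a):
    """Predicted criterion value on the regression line y = bx + a."""
    return b * x + a

print(predict(20, b=2, a=5))  # 45
```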
Example: MLU & Age
How much does MLU increase with growing age?
Linear Regression
There is a strong association between age and MLU (R = 0.887). Specifically, it was found that the children’s MLU increases by an average of .088 words each month (t = 6.069, p < .001), which amounts to about a word a year. Since the F-value (F = 36.838, df = 1, 10) is highly significant (p < .001), these results are unlikely to have arisen from sampling error.
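The slope, t and F reported above can be reproduced with ordinary least squares. The sketch assumes the standard one-predictor OLS formulas: b = Sxy/Sxx, a = ȳ − b·x̄, the slope's standard error √(MSE/Sxx), and F = regression SS / MSE with df = (1, n − 2). On the MLU table it gives b ≈ .088 words per month, t ≈ 6.07 and F ≈ 36.84 (= t²); the intercept is not reported in the text, the code simply computes it.

```python
import math

# Same MLU data as in the first MLU example.
age = [24, 23, 32, 20, 43, 58, 28, 34, 53, 46, 49, 36]
mlu = [2.10, 2.16, 2.25, 1.93, 2.64, 5.63, 1.96, 2.23, 5.19, 3.45, 3.21, 2.84]

def simple_regression(x, y):
    """OLS fit of y = bx + a; also returns t for the slope and the model F."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope
    a = my - b * mx                # intercept
    ss_res = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
    ss_reg = sum((b * xi + a - my) ** 2 for xi in x)
    df_res = n - 2
    mse = ss_res / df_res
    se_b = math.sqrt(mse / sxx)    # standard error of the slope
    t = b / se_b
    f = ss_reg / mse               # F with df = (1, n - 2); equals t**2
    return b, a, t, f

b, a, t, f = simple_regression(age, mlu)
print(f"b = {b:.3f}, a = {a:.3f}, t = {t:.3f}, F = {f:.3f}")
```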
Multiple Regression
Several predictor variables influence the criterion. With two predictors, the line-of-best-fit becomes a plane-of-best-fit.
1. Simultaneous multiple regression
2. Stepwise multiple regression
Simultaneous Multiple Regression
A university wants to know which factors are best suited to predicting its students’ learning success. The indicator of the students’ level of knowledge is their score in a central final exam. The factors under consideration are: (1) score in the entrance exam, (2) age, (3) IQ test, (4) score for a scientific project.
Simultaneous Multiple Regression
Predictors: entrance exam, age, IQ, scientific project
Criterion: final exam
There is a strong association between the predictor variables and the result of the final exam (Multiple R = 0.875; F = 22.783, df = 4, p = .001). Together they account for 73% of the variation in exam success. If we look at the four predictor variables individually, we find that the result of the entrance exam (B = .576, t = 5.431, p = .001) and the IQ score (B = .447, t = 4.606, p = .001) make the strongest contributions (i.e. they are the best predictors). The predictive value of age (B = .099, t = 5.431, p = .327) and of the score on the scientific project (B = .141, t = 1.417, p = .168) is not significant.
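Simultaneous multiple regression estimates all coefficients at once by least squares, i.e. by solving the normal equations (XᵀX)b = Xᵀy. The study's data are not given, so this sketch uses a small HYPOTHETICAL dataset constructed so that final = 10 + 0.5 · entrance + 0.3 · IQ exactly, which makes it easy to check that the fit recovers the known coefficients. The returned coefficients are unstandardized; a hand-written Gaussian elimination stands in for a statistics package.

```python
def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [val] for row, val in zip(A, v)]  # augmented matrix [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(rows, y):
    """OLS fit of y = a + b1*x1 + ... + bk*xk via the normal equations.

    rows: one sequence of predictor values per case. Returns ([a, b1, ..., bk], R²).
    """
    X = [[1.0] + list(row) for row in rows]  # leading 1 = intercept column
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coefs = solve(XtX, Xty)
    fitted = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot

# HYPOTHETICAL data: (entrance exam, IQ) per student; the final exam score is
# constructed as exactly 10 + 0.5*entrance + 0.3*IQ so the answer is known.
rows = [(50, 100), (60, 110), (55, 95), (70, 120),
        (65, 105), (80, 115), (75, 125), (90, 130)]
final = [10 + 0.5 * e + 0.3 * iq for e, iq in rows]

coefs, r2 = multiple_regression(rows, final)
print("a, b_entrance, b_IQ =", [round(c, 3) for c in coefs], "R² =", round(r2, 3))
```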
Stepwise Multiple Regression
In stepwise regression you begin with one independent (predictor) variable and add the others one at a time. The order in which they are added is determined automatically by the effect of each independent variable on the dependent variable.
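The stepwise idea can be sketched as forward selection: at each step, add the predictor that raises R² the most, and stop when the best remaining gain falls below a threshold. This is a simplification: statistics packages such as SPSS use F- or p-to-enter criteria and may also remove predictors again, whereas this sketch only adds, and its `min_gain = 0.01` cut-off is an arbitrary assumption. The data are HYPOTHETICAL, constructed so that x1 matters most, x2 less, and x3 only marginally.

```python
def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [val] for row, val in zip(A, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def r_squared(columns, y):
    """R² of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coefs = solve(XtX, Xty)
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(c * x for c, x in zip(coefs, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def forward_stepwise(predictors, y, min_gain=0.01):
    """Add predictors one at a time, always the one with the largest R² gain."""
    selected, current_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        gains = {name: r_squared([predictors[p] for p in selected + [name]], y) - current_r2
                 for name in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:  # no remaining predictor adds enough
            break
        selected.append(best)
        remaining.remove(best)
        current_r2 += gains[best]
        print(f"step {len(selected)}: add {best}, R² = {current_r2:.3f}")
    return selected, current_r2

# HYPOTHETICAL predictors (mutually uncorrelated) and criterion.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [6, 4, 4, 6, 6, 4, 4, 6]
x3 = [3, 3, 1, 1, 1, 1, 3, 3]
y = [3 * a + b + 0.5 * c for a, b, c in zip(x1, x2, x3)]

selected, r2 = forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
```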
Assumptions of Multiple Regression
1. At least 15 cases
2. Interval data
3. Linear relationship between the predictor variables and the criterion
4. No outliers (or delete them)
5. Predictor variables should be independent of each other
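Two of these assumptions are easy to screen automatically: the number of cases (1) and strongly intercorrelated predictors (5). A rough sketch, assuming |r| > 0.8 between two predictors as the warning sign (an assumed cut-off, not a value from these notes); linearity and outliers (3, 4) are better checked with scatterplots and are not covered. The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def screen_predictors(predictors, min_cases=15, max_abs_r=0.8):
    """Flag too few cases (assumption 1) and highly correlated predictor pairs (5).

    max_abs_r = 0.8 is an ASSUMED cut-off, not a value given in these notes.
    """
    warnings = []
    names = list(predictors)
    n = len(predictors[names[0]])
    if n < min_cases:
        warnings.append(f"only {n} cases (fewer than {min_cases})")
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(predictors[first], predictors[second])
            if abs(r) > max_abs_r:
                warnings.append(f"{first} and {second} correlate at r = {r:.2f}")
    return warnings

# HYPOTHETICAL predictors for 16 cases: x2 is an exact linear function of x1.
base = list(range(1, 17))
predictors = {"x1": base, "x2": [2 * v + 1 for v in base], "x3": [1, -1] * 8}
warnings = screen_predictors(predictors)
print(warnings)
```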
Logistic Regression
In linguistics, logistic regression is often used because multiple factors determine the choice between linguistic alternants:
1. look up the number vs. look the number up
2. that-complement clause vs. zero-complement clause
3. initial adverbial clause vs. final adverbial clause
4. aspirated /t/ vs. unaspirated /t/ vs. glottal stop vs. flap
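For a binary choice such as 1 to 3, logistic regression models the log-odds of one variant as a linear combination of the predictors and turns them into a probability with the logistic function p = 1 / (1 + e^(−z)). The sketch below uses HYPOTHETICAL coefficients (not estimated from any corpus) for particle placement, with object length in words and pronominal object as predictors. A four-way choice such as 4 would need multinomial logistic regression, which this sketch does not cover.

```python
import math

def logistic(z):
    """Inverse logit: turns log-odds z into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# HYPOTHETICAL coefficients, invented for illustration (not fitted to any corpus).
# Outcome: log-odds of the split order ("look the number up")
# rather than the continuous order ("look up the number").
INTERCEPT = 1.0
B_OBJECT_LENGTH = -0.8  # per word of the object; here, longer objects disfavour splitting
B_PRONOUN = 3.0         # 1 if the object is a pronoun ("look it up"), else 0

def p_split(object_length, object_is_pronoun):
    """Predicted probability of the split order for one token."""
    z = INTERCEPT + B_OBJECT_LENGTH * object_length + B_PRONOUN * object_is_pronoun
    return logistic(z)

print(f"pronoun object, 1 word:  {p_split(1, 1):.2f}")
print(f"full NP object, 4 words: {p_split(4, 0):.2f}")
```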