Fathers’ and daughters’ heights
[Figure: histograms of fathers’ heights and daughters’ heights; horizontal axis: height (inches).]
Fathers’ heights: mean = 67.7, SD = 2.8
Daughters’ heights: mean = 63.8, SD = 2.7
Pearson and Lee (1906) Biometrika 2:357-462
1376 pairs
Fathers’ and daughters’ heights
corr = 0.52
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
Covariance and correlation
Let X and Y be random variables with
µX = E(X), µY = E(Y), σX = SD(X), σY = SD(Y)
For example, sample a father/daughter pair and let
X = the father’s height and Y = the daughter’s height.
Covariance: cov(X, Y) = E{(X − µX)(Y − µY)}
Correlation: cor(X, Y) = cov(X, Y) / (σX σY)
−1 ≤ cor(X, Y) ≤ 1
cov(X,Y) can be any real number.
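As a minimal sketch of these definitions, the Python below computes the covariance and correlation of simulated pairs, treating the sample as if it were the whole population. The data are simulated for illustration and are not the Pearson and Lee measurements; the function names and simulation parameters are assumptions. It also illustrates one reason the correlation is convenient: converting inches to centimetres rescales the covariance but leaves the correlation unchanged.

```python
import math
import random

def covariance(xs, ys):
    """Average product of deviations from the means: E{(X - mu_X)(Y - mu_Y)}."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

def sd(xs):
    """Standard deviation, computed as sqrt(cov(X, X))."""
    return math.sqrt(covariance(xs, xs))

def correlation(xs, ys):
    """cov(X, Y) divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (sd(xs) * sd(ys))

# Simulated heights in inches (hypothetical, NOT the Pearson and Lee data).
rng = random.Random(1)
fathers = [rng.gauss(67.7, 2.8) for _ in range(1000)]
daughters = [63.8 + 0.5 * (f - 67.7) + rng.gauss(0, 2.3) for f in fathers]

cov_in = covariance(fathers, daughters)
cor_in = correlation(fathers, daughters)

# Same heights in centimetres: covariance scales by 2.54 ** 2,
# correlation does not change.
fathers_cm = [2.54 * f for f in fathers]
daughters_cm = [2.54 * d for d in daughters]
cov_cm = covariance(fathers_cm, daughters_cm)
cor_cm = correlation(fathers_cm, daughters_cm)

print("covariance:", cov_in, "(inches^2)", cov_cm, "(cm^2)")
print("correlation:", cor_in, cor_cm)
```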
Examples
[Figure: nine scatterplots with vertical axis Y, illustrating corr = −0.9, −0.5, −0.1, 0, 0.1, 0.3, 0.5, 0.7 and 0.9.]
Estimated correlation
Consider n pairs of data:
(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)
We consider these as independent draws from some
bivariate distribution.
We estimate the correlation in the underlying distribution by:
$$
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
$$
This is sometimes called the correlation coefficient.
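The estimate can be sketched directly in Python. This is one straightforward way to compute r, not a reference implementation; the function name sample_correlation and the five data pairs are made up for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Estimate the correlation from paired data (x_i, y_i)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and the two sums of squared deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pairs, chosen so the arithmetic is easy to check by hand:
# the cross-products sum to 8 and both sums of squares equal 10.
pairs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

r = sample_correlation(xs, ys)
print(r)  # 8 / sqrt(10 * 10) = 0.8
```

Dividing numerator and denominator by n (or by n − 1) gives the same value, so r does not depend on which convention is used for the sample SD.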
Correlation measures linear association
[Figure: three scatterplots.]
All three plots have correlation ≈ 0.7!
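A related illustration, not taken from the slides: because correlation measures only linear association, Y can be completely determined by X and still have correlation zero. The sketch below uses made-up values, with x symmetric about 0 and y = x², and compares it with an exact straight line.

```python
import math

def corr(xs, ys):
    """Sample correlation of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical x values, symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

line = [2 * x + 1 for x in xs]    # exact linear relationship
parabola = [x ** 2 for x in xs]   # exact, but non-linear, relationship

r_line = corr(xs, line)
r_parabola = corr(xs, parabola)

print("straight line:", r_line)
print("parabola:     ", r_parabola)
```

The straight line gives a correlation of 1, while the parabola gives 0 even though y is an exact function of x. A single correlation value therefore says little about the shape of the scatterplot.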
Linear regression
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
Regression line
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches), with the regression line.]
Slope = r × SD(Y) / SD(X)
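As a worked example, the slope formula can be combined with the summary statistics quoted earlier for the fathers and daughters (means 67.7 and 63.8, SDs 2.8 and 2.7, corr = 0.52). The sketch assumes the regression line passes through the point of means, which fixes the intercept; the variable names and the 72-inch father are chosen only for illustration.

```python
# Summary statistics quoted on the slides (Pearson and Lee data).
mean_father, sd_father = 67.7, 2.8
mean_daughter, sd_daughter = 63.8, 2.7
r = 0.52

# Slope = r * SD(Y) / SD(X), with X = father's height, Y = daughter's height.
slope = r * sd_daughter / sd_father

# Assumption: the line passes through (mean of X, mean of Y).
intercept = mean_daughter - slope * mean_father

def predict_daughter(father_height):
    """Predicted daughter's height (inches) for a given father's height."""
    return intercept + slope * father_height

print("slope:", round(slope, 3))
print("intercept:", round(intercept, 2))
print("prediction for a 72-inch father:", round(predict_daughter(72.0), 2))
```

The slope is about 0.5 inches of daughter's height per inch of father's height, so a father 4.3 inches above the fathers' mean has a predicted daughter only about 2.2 inches above the daughters' mean.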
SD line
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches), with the SD line.]
Slope = SD(Y) / SD(X)
SD line vs regression line
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches), with the SD line and the regression line.]
Both lines go through the point (X̄, Ȳ).
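The comparison can be sketched numerically with the summary statistics quoted earlier (means 67.7 and 63.8, SDs 2.8 and 2.7, corr = 0.52). Both lines are written in point-slope form through the point of means; the helper line_through_means is a hypothetical name introduced for this illustration.

```python
# Summary statistics quoted on the slides (X = father, Y = daughter).
mean_x, sd_x = 67.7, 2.8
mean_y, sd_y = 63.8, 2.7
r = 0.52

def line_through_means(slope):
    """Return a function for the line with this slope through (mean_x, mean_y)."""
    def predict(x):
        return mean_y + slope * (x - mean_x)
    return predict

sd_line = line_through_means(sd_y / sd_x)              # slope = SD(Y) / SD(X)
regression_line = line_through_means(r * sd_y / sd_x)  # slope = r * SD(Y) / SD(X)

for x in (62.0, 67.7, 72.0):
    print(x, round(sd_line(x), 2), round(regression_line(x), 2))
```

The two lines agree at the mean father's height and diverge away from it. With r between 0 and 1 the regression line is the flatter of the two, so it lies below the SD line for tall fathers and above it for short fathers.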
Predicting father’s ht from daughter’s ht
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches).]
There are two regression lines!
[Figure: scatterplot of daughter’s height (inches) against father’s height (inches), with both regression lines.]
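A sketch of why the two lines differ, using the summary statistics quoted earlier (means 67.7 and 63.8, SDs 2.8 and 2.7, corr = 0.52). Swapping the roles of X and Y in the slope formula gives the line for predicting the father's height from the daughter's; the variable names and the 72-inch starting value are illustrative assumptions.

```python
# Summary statistics quoted on the slides.
mean_f, sd_f = 67.7, 2.8   # fathers
mean_d, sd_d = 63.8, 2.7   # daughters
r = 0.52

# Regression of daughter on father: slope = r * SD(daughter) / SD(father).
slope_d_on_f = r * sd_d / sd_f
# Regression of father on daughter: the roles of the two SDs are swapped.
slope_f_on_d = r * sd_f / sd_d

def predict_daughter(father):
    return mean_d + slope_d_on_f * (father - mean_f)

def predict_father(daughter):
    return mean_f + slope_f_on_d * (daughter - mean_d)

father = 72.0
daughter_hat = predict_daughter(father)
father_back = predict_father(daughter_hat)

# Drawn with father's height on the horizontal axis, the second line
# has slope 1 / slope_f_on_d = SD(daughter) / (r * SD(father)).
slope_on_plot = 1 / slope_f_on_d

print("daughter predicted from a 72-inch father:", round(daughter_hat, 2))
print("father predicted back from that daughter:", round(father_back, 2))
print("slopes on the plot:", round(slope_d_on_f, 3), round(slope_on_plot, 3))
```

Predicting forward and then back does not return the starting value of 72 inches, because each prediction shrinks the deviation from the mean by a factor related to r. The product of the two regression slopes equals r², so the two lines coincide only when r = ±1.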