University of Copenhagen
Department of Biostatistics
Faculty of Health Sciences
Regression models
Two covariates, Quantitative outcome, (2-5-2015)
Per Kragh Andersen and Lene Theil Skovgaard
Dept. of Biostatistics
Multiple regression, no interaction
PKA & LTS, Sect. 5.1, 5.1.1, 5.1.2, 5.1.3
Confounding
- The need for more than one covariate
- Confounding
- Adjusted vs. unadjusted estimates
- Two-way anova
- Ancova
- Model check
Home pages: http://biostat.ku.dk/~pka/regrmodels15
So far: a single covariate
When is it reasonable to use models with only a single covariate?
- Randomized clinical trials, comparing
  - two or more treatments
  - dose groups
- Construction of reference curves, such as
  - “weight for height”
  - hormone level vs. gestational age
  - blood pressure vs. age
Now: more than one covariate
When do we need more than a single covariate?
- In observational studies:
  - Potential risk factors may be associated with the covariate of interest
- Randomized clinical trials
  - If the randomization is poorly conducted
  - If important risk factors exist
  - (If the trial is small)
Hypothetical example
Gender difference in lung function?
Model:
Analysis: Simple T-test
Problem:
- The age distributions for men and women may differ
- The height distributions surely differ
Confounding
Age is a confounder for the gender comparison, if
- Age distributions differ for men and women
- Age has an effect on lung function (which it surely has)
Such a confounder may affect the original comparison.
Is height a confounder?
Height is
- an intermediate variable
- a mediator of the gender effect
Interpretation of gender difference, I
Estimate the typical difference in Fev1 for men and women
Interpretation of gender difference, II
Estimate the difference in Fev1 between a man and a woman
of equal height
If the situation is [diagram not reproduced]
this latter effect will be zero
Example: Vitamin D
Two binary covariates:
x_{i,1}: body mass index (BMI) for the i-th woman
  - normal weight (18.5 < BMI < 25)
  - overweight (BMI ≥ 25)
x_{i,2}: country for the i-th woman (Ireland or Poland)
y_i: log10 of vitamin D status for the i-th woman
Two-way anova, Additive model
E(y_i) = LP_i = a + b_1 I(x_{i,1} ≥ 25) + b_2 I(x_{i,2} = Ireland)

             Normal weight   Overweight      Difference
Poland       a               a + b_1         b_1
Ireland      a + b_2         a + b_1 + b_2   b_1
Difference   b_2             b_2             0

- The effect of overweight is b_1, for both countries
- The difference between countries is b_2, no matter body stature
The effects are mutually adjusted, and there is no interaction
Vitamin D averages
on log10 scale, with numbers of women in brackets
             Normal weight   Overweight   Difference
Poland       1.598 (12)      1.443 (53)   -0.155
Ireland      1.720 (16)      1.593 (25)   -0.127
Difference   0.121 (28)      0.150 (78)   0.028

- Are the differences between countries the same no matter body stature?
- And vice versa?
It looks quite reasonable, since 0.028 is small
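The check of additivity on the cell averages can be sketched in a few lines of Python. This is only an illustration using the four averages from the table; the dictionary layout and function name are assumptions made for this sketch.

```python
# Cell averages of log10(vitamin D), copied from the table of averages.
means = {
    ("Poland", "normal"): 1.598,
    ("Poland", "overweight"): 1.443,
    ("Ireland", "normal"): 1.720,
    ("Ireland", "overweight"): 1.593,
}

def bmi_difference(country):
    """Overweight minus normal weight within one country."""
    return means[(country, "overweight")] - means[(country, "normal")]

# Difference of differences: exactly zero if the averages were perfectly additive.
interaction_contrast = bmi_difference("Ireland") - bmi_difference("Poland")

print(round(bmi_difference("Poland"), 3))
print(round(bmi_difference("Ireland"), 3))
print(round(interaction_contrast, 3))
```

A contrast close to zero only suggests additivity; a formal assessment would require a test of interaction.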
Graphical display of averages
Average log10 (25OHD values) in four (country by BMI)-groups.
The size of a bar reflects the sample size.
Estimates from additive model
- \hat{a} = 1.587 (0.043)
- \hat{b}_1 = -0.141 (0.044), overweight vs. normal weight
- \hat{b}_2 = 0.142 (0.039), Ireland vs. Poland
- \hat{b}_1 is an average of the two country-specific differences, weighted
  according to country size
- \hat{b}_2 is an average of the two stature-specific differences, weighted
  according to (stature) group size
These are mutually adjusted estimates
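As a hedged illustration of the weighting, the sketch below recomputes both estimates from the cell averages and group sizes. The weight n0·n1/(n0+n1) per stratum is an assumption of this sketch (the inverse-variance weight of a two-group difference under a common SD); the slides only state that the weights depend on group size.

```python
# (average, n) of log10(vitamin D) in each cell, copied from the slides.
cells = {
    ("Poland", "normal"): (1.598, 12),
    ("Poland", "overweight"): (1.443, 53),
    ("Ireland", "normal"): (1.720, 16),
    ("Ireland", "overweight"): (1.593, 25),
}

def stratum_weight(n0, n1):
    # Assumed weight: inverse-variance weight of a difference of two means.
    return n0 * n1 / (n0 + n1)

def adjusted_difference(strata):
    """strata: list of ((mean0, n0), (mean1, n1)); weighted average of mean1 - mean0."""
    num = den = 0.0
    for (m0, n0), (m1, n1) in strata:
        w = stratum_weight(n0, n1)
        num += w * (m1 - m0)
        den += w
    return num / den

# Overweight vs. normal weight, averaged over the two countries.
b1 = adjusted_difference([
    (cells[("Poland", "normal")], cells[("Poland", "overweight")]),
    (cells[("Ireland", "normal")], cells[("Ireland", "overweight")]),
])
# Ireland vs. Poland, averaged over the two BMI groups.
b2 = adjusted_difference([
    (cells[("Poland", "normal")], cells[("Ireland", "normal")]),
    (cells[("Poland", "overweight")], cells[("Ireland", "overweight")]),
])
print(round(b1, 3), round(b2, 3))
```

With these weights the results agree with the reported estimates to three decimals, up to rounding of the cell averages.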
Estimated (predicted) mean values in the four groups
\hat{E}(y_i) = \hat{a} + \hat{b}_1 I(x_{i,1} ≥ 25) + \hat{b}_2 I(x_{i,2} = Ireland)

             Normal weight   Overweight   Difference
Poland       1.587           1.446        -0.141
Ireland      1.729           1.588        -0.141
Difference   0.142           0.142        0
Compare to the crude averages from the previous table
Adjusted vs. unadjusted estimates
                               Unadjusted       Adjusted
overweight vs. normal weight   -0.177 (0.045)   -0.141 (0.044)
Ireland vs. Poland             0.171 (0.040)    0.142 (0.039)

Unadjusted (marginal) estimates:
- Overweight vs. normal weight:
  too large, because the group of overweight women is
  dominated by Polish women (with low values)
- Ireland vs. Poland:
  too large in favour of Ireland,
  because Poland has so many overweight women
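The unadjusted estimates are simply differences between marginal averages, which can be recovered from the cell averages and group sizes. A minimal sketch, with variable and function names chosen for illustration only:

```python
# (average, n) of log10(vitamin D) in each cell, copied from the slides.
cells = {
    ("Poland", "normal"): (1.598, 12),
    ("Poland", "overweight"): (1.443, 53),
    ("Ireland", "normal"): (1.720, 16),
    ("Ireland", "overweight"): (1.593, 25),
}

def marginal_mean(selected):
    """Size-weighted average over the selected (average, n) cells."""
    total_n = sum(n for _, n in selected)
    return sum(m * n for m, n in selected) / total_n

def cells_where(country=None, bmi=None):
    # Pick cells matching the given country and/or BMI group.
    return [v for (c, b), v in cells.items()
            if (country is None or c == country) and (bmi is None or b == bmi)]

unadj_bmi = (marginal_mean(cells_where(bmi="overweight"))
             - marginal_mean(cells_where(bmi="normal")))
unadj_country = (marginal_mean(cells_where(country="Ireland"))
                 - marginal_mean(cells_where(country="Poland")))
print(round(unadj_bmi, 3), round(unadj_country, 3))
```

Because 53 of the 78 overweight women are Polish, the marginal BMI comparison mixes the BMI effect with the country effect.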
Confounding
BMI and COUNTRY are confounders for each other
- Overweight gives low vitamin D levels
- Polish women have low vitamin D levels
and
- More Polish women are overweight
In this situation:
Marginal (unadjusted) estimates exaggerate the effects
(they steal effect from the covariate not included in the model)
It may also go the other way....
Test of no effect
in model with both covariates:
- Effect of overweight:
  - Wald test: (-0.141/0.044)^2 = 10.46 ∼ χ^2(1), P = 0.0012
  - T-test: -0.141/0.044 = -3.20 ∼ t(75), P = 0.0020
- Ireland vs. Poland:
  - Wald test: (0.142/0.039)^2 = 12.88 ∼ χ^2(1), P = 0.0003
  - T-test: 0.142/0.039 = 3.62 ∼ t(75), P = 0.0005
Clear evidence of effect of both covariates
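The P-values of the Wald tests can be reproduced without a statistics package, since a χ^2(1) variable is the square of a standard normal variable. A minimal sketch using only the Python standard library; the function name is an assumption of this sketch, and the test statistics are taken as reported above.

```python
import math

def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(Z^2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

p_overweight = chi2_1_pvalue(10.46)  # Wald statistic for overweight
p_country = chi2_1_pvalue(12.88)     # Wald statistic for Ireland vs. Poland
print(round(p_overweight, 4), round(p_country, 4))
```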
Back-transformation
Since we have analyzed logarithmic values, we have to make a
back-transformation:
\log_{10}(y^*) = y = a + b_1 x_1 + b_2 x_2  ⇒  y^* = a^* (b_1^*)^{x_1} (b_2^*)^{x_2}
where
a^* = 10^a,  b_1^* = 10^{b_1},  b_2^* = 10^{b_2}
The relation is not linear on the original scale:
It is multiplicative
Interpretation of estimates
10^{\hat{b}_1} = 0.72,  10^{\hat{b}_2} = 1.39
- For a given country, median 25OHD values among overweight
  women are about 72% of those for normal weight women.
- For a given BMI group, Irish women have median 25OHD
  values about 39% larger than Polish women.
Confidence intervals:
10^{\hat{b} - 1.96·SD}  to  10^{\hat{b} + 1.96·SD}
and for the effect of BMI we get the interval (0.59, 0.88).
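The back-transformation of an estimate and its confidence limits can be sketched as follows. The function name and the default z = 1.96 are choices made for this illustration; the inputs are the estimate and SD of the overweight effect reported earlier.

```python
def ratio_with_ci(estimate, sd, z=1.96):
    """Back-transform a log10-scale estimate and its confidence limits to a ratio."""
    return (10 ** estimate,
            10 ** (estimate - z * sd),
            10 ** (estimate + z * sd))

# Overweight vs. normal weight: estimate -0.141 with SD 0.044 on the log10 scale.
ratio, lower, upper = ratio_with_ci(-0.141, 0.044)
print(round(ratio, 2), round(lower, 2), round(upper, 2))
```

The limits are computed on the log10 scale first and transformed afterwards, so the interval is not symmetric around the ratio.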
Model assumptions
- Additivity (no interaction)
  - later ....
- Equal SD’s (variance homogeneity)
  - Plot residuals vs. predicted
  - Levene's test for variance homogeneity
- Normality (or at least symmetry)
  - Histogram of residuals
  - Quantile plot of residuals
Variance homogeneity
Residuals plotted against fitted values
x: Ireland, o: Poland.
Note: Two almost identical predicted values
Variance homogeneity, cont’d
SDs of log10 values, with numbers of women in brackets:
          Normal weight   Overweight
Poland    0.126 (12)      0.213 (53)
Ireland   0.164 (16)      0.192 (25)

Levene's test of equal SD’s yields P = 0.37
(for untransformed data: P = 0.027, i.e. rejection)
Now to: General categorical covariates
3 BMI-groups, 4 countries
Average log10 vitamin D values (and numbers of women):
          Normal weight   Slight overweight   Obese
Denmark   1.692 (20)      1.545 (21)          1.603 (12)
Finland   1.664 (9)       1.665 (32)          1.562 (13)
Ireland   1.720 (16)      1.626 (16)          1.534 (9)
Poland    1.598 (12)      1.393 (25)          1.488 (28)
Graphical display of averages
Parallel profiles? Additivity?
Model
x_{i,1}: body mass index (BMI) for the i-th woman
  - normal weight (18.5 < BMI < 25)
  - slight overweight (25 ≤ BMI < 30)
  - obese (BMI ≥ 30)
x_{i,2}: country for the i-th woman (all four)
y_i: log10 of vitamin D for the i-th woman
Additive model (two-way anova):
E(y_i) = a + b_{1,1} I(25 ≤ x_{i,1} < 30) + b_{1,2} I(30 ≤ x_{i,1})
         + b_{2,1} I(x_{i,2} = Denmark) + b_{2,2} I(x_{i,2} = Finland)
         + b_{2,3} I(x_{i,2} = Ireland)
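The indicator coding behind this model can be sketched as a function that returns one row of the design matrix, in the order (a, b_{1,1}, b_{1,2}, b_{2,1}, b_{2,2}, b_{2,3}). The function name and the example inputs are hypothetical; normal weight and Poland act as reference levels, as in the model formula.

```python
# Countries with their own indicator; Poland is the reference level.
COUNTRIES = ("Denmark", "Finland", "Ireland")

def design_row(bmi, country):
    """Row of the design matrix: intercept, two BMI indicators, three country indicators."""
    return [
        1,                      # intercept a
        int(25 <= bmi < 30),    # slight overweight
        int(bmi >= 30),         # obese
        *(int(country == c) for c in COUNTRIES),
    ]

print(design_row(27, "Ireland"))
print(design_row(22, "Poland"))
```

The expected value for a woman is then the inner product of her row with the parameter vector.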
Adjusted and unadjusted estimates
                                               Adjusted           Unadjusted
Parameter                                      Estimate   SD      Estimate   SD
b_{1,1}: slight overweight vs. normal weight   -0.116     0.036   -0.116     0.037
b_{1,2}: obese vs. normal weight               -0.113     0.040   -0.143     0.040
b_{2,1}: Denmark vs. Poland                    0.120      0.040   0.142      0.040
b_{2,2}: Finland vs. Poland                    0.171      0.039   0.168      0.040
b_{2,3}: Ireland vs. Poland                    0.147      0.043   0.171      0.043
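The unadjusted estimates in this table can be recovered, up to rounding, as differences between marginal averages of the 4 × 3 table of cell averages. A minimal sketch; the data layout and helper names are assumptions made for illustration.

```python
# (average, n) per country, in the order normal weight, slight overweight, obese.
table = {
    "Denmark": [(1.692, 20), (1.545, 21), (1.603, 12)],
    "Finland": [(1.664, 9), (1.665, 32), (1.562, 13)],
    "Ireland": [(1.720, 16), (1.626, 16), (1.534, 9)],
    "Poland":  [(1.598, 12), (1.393, 25), (1.488, 28)],
}

def weighted_mean(cells):
    """Size-weighted average of (average, n) cells."""
    return sum(m * n for m, n in cells) / sum(n for _, n in cells)

country_mean = {c: weighted_mean(cells) for c, cells in table.items()}
bmi_mean = [weighted_mean([cells[j] for cells in table.values()]) for j in range(3)]

# Unadjusted contrasts against the reference levels (Poland, normal weight).
unadj_country = {c: country_mean[c] - country_mean["Poland"]
                 for c in ("Denmark", "Finland", "Ireland")}
unadj_bmi = [bmi_mean[j] - bmi_mean[0] for j in (1, 2)]

print({c: round(v, 3) for c, v in unadj_country.items()})
print([round(v, 3) for v in unadj_bmi])
```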
Test of no effect
Omnibus test for no effect of
- Body stature:
  - Likelihood ratio: 11.67 ∼ χ^2(2), P = 0.003
  - F-test: 5.83 ∼ F(2, 207), P = 0.003
- Country:
  - Likelihood ratio: 21.78 ∼ χ^2(3), P < 0.0001
  - F-test: 7.43 ∼ F(3, 207), P < 0.0001
Clear evidence of effect of both covariates
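For 2 and 3 degrees of freedom the χ^2 tail probability has a closed form, so the likelihood ratio P-values can be checked with the standard library alone. The function below is a sketch restricted to these two cases; its name is an assumption, and the test statistics are taken as reported above.

```python
import math

def chi2_sf(stat, df):
    """Upper tail probability of a chi-square variable, for df = 2 or 3 only."""
    if df == 2:
        return math.exp(-stat / 2.0)
    if df == 3:
        # Tail of chi-square(1) plus a correction term from integration by parts.
        return (math.erfc(math.sqrt(stat / 2.0))
                + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))
    raise ValueError("only df = 2 and df = 3 are implemented in this sketch")

p_stature = chi2_sf(11.67, 2)   # likelihood ratio test for body stature
p_country = chi2_sf(21.78, 3)   # likelihood ratio test for country
print(round(p_stature, 4), p_country < 0.0001)
```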
BMI as a quantitative covariate, Ancova
1. One binary covariate (x_2: country, Ireland and Poland)
2. One quantitative covariate (x_1: BMI), assumed to have a
   linear effect on the mean value of the outcome
Additive model:
E(y_i) = a + b_1 (x_{i,1} - 25) + b_2 I(x_{i,2} = Ireland)
Estimates:
\hat{b}_1 = -0.0152 (0.0045), slope of BMI effect
\hat{b}_2 = 0.131 (0.040), Ireland vs. Poland
\hat{a} = 1.532 (0.030), Polish women with BMI = 25
Estimated relation: Parallel lines
Ireland (dots) and Poland (circles)
Adjusted estimates of "Ireland vs Poland"
Ireland and Poland differ a little in BMI:
\bar{x}_{1,Irl} = 26.36,  \bar{x}_{1,Pol} = 28.94
Therefore: Unadjusted estimates of vitamin D level (marginal
averages) will not be directly comparable
Least squares means: Adjust to overall BMI average (27.94):
\bar{y}^{adj}_{Irl} = 1.643 + (-0.0152)(27.94 - 26.36) = 1.618
\bar{y}^{adj}_{Pol} = 1.472 + (-0.0152)(27.94 - 28.94) = 1.487
Difference: \bar{y}^{adj}_{Irl} - \bar{y}^{adj}_{Pol} = 1.618 - 1.487 = 0.131
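The adjustment to a common BMI can be sketched as below. The constant and function names are chosen for this illustration; the numbers are the marginal averages, country-specific mean BMI values and slope from the slides. Small deviations in the last decimal are due to rounding of the inputs.

```python
B1 = -0.0152          # estimated slope of log10(vitamin D) on BMI
OVERALL_BMI = 27.94   # overall BMI average

# (marginal average of log10 vitamin D, mean BMI) per country.
groups = {"Ireland": (1.643, 26.36), "Poland": (1.472, 28.94)}

def ls_mean(mean_y, mean_bmi):
    """Move a country average along the common slope to the overall BMI average."""
    return mean_y + B1 * (OVERALL_BMI - mean_bmi)

adj = {country: ls_mean(*values) for country, values in groups.items()}
difference = adj["Ireland"] - adj["Poland"]
print({c: round(v, 3) for c, v in adj.items()}, round(difference, 3))
```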
Illustration of adjusted country effect
Adjusted vs. unadjusted effect estimates
                     Unadjusted         Adjusted
BMI                  -0.0195 (0.0045)   -0.0152 (0.0045)
Ireland vs. Poland   0.171 (0.040)      0.131 (0.040)

Unadjusted estimates:
- Effect of BMI:
  too large, because women with high BMI are predominantly
  from Poland
- Ireland vs. Poland:
  too large, because Poland has so many women with high BMI
Interpretation of effects, I
Since we have analyzed logarithmic values, we have to make a
back-transformation:
Ireland vs. Poland: 10^{0.131} = 1.35
Irish women have a 35% higher level of vitamin D,
compared to Polish women with the same BMI
Confidence limits:
(10^{0.131 - 1.96·0.040}, 10^{0.131 + 1.96·0.040}) = (1.13, 1.62)
Interpretation of effects, II
Effect of a difference of 5 kg/m^2 in BMI:
10^{-0.0152·5} = 0.84
A woman with a BMI 5 kg/m^2 higher than another woman
from the same country will have a 16% lower level of vitamin D
Confidence limits:
(10^{5·(-0.0152 - 1.96·0.0045)}, 10^{5·(-0.0152 + 1.96·0.0045)}) = (0.76, 0.93)
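A short sketch of how the slope translates into a ratio for an arbitrary BMI difference. The function name and defaults are assumptions of this illustration; note that both the estimate and its confidence limits are multiplied by the BMI difference before back-transformation, which is what reproduces the limits (0.76, 0.93).

```python
def bmi_ratio(delta_bmi, slope=-0.0152, sd=0.0045, z=1.96):
    """Ratio of vitamin D levels for a BMI difference of delta_bmi, with confidence limits."""
    estimate = 10 ** (slope * delta_bmi)
    lower = 10 ** ((slope - z * sd) * delta_bmi)
    upper = 10 ** ((slope + z * sd) * delta_bmi)
    return estimate, lower, upper

est5, lo5, hi5 = bmi_ratio(5)
print(round(est5, 2), round(lo5, 2), round(hi5, 2))
```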
Model assumptions
- No interaction between BMI and country, i.e.
  additivity, with parallel regression lines
  - later ...
- Linear effect of BMI on log10(25OHD)
  - Plot residuals vs. covariate (BMI)
  - Test quadratic effect, or linear spline
- Equal SD’s (variance homogeneity)
  - Plot residuals vs. predicted
  - F-test for equality between groups
- Normality (or at least symmetry)
  - Histogram of residuals
  - Quantile plot of residuals
Linear effect of BMI?
Smooth curves for each country:
Additivity
Are the slopes reasonably equal?
Test of non-linearity
using a linear spline, with cutpoints at 25 and 30
Country   BMI                (BMI-25)·I(BMI ≥ 25)   (BMI-30)·I(BMI ≥ 30)   P for linearity
Poland    -0.0422 (0.0301)   0.0393 (0.0412)        -0.0033 (0.0248)       0.53
Ireland   -0.0221 (0.0201)   -0.0084 (0.0348)       0.0203 (0.0430)        0.89
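The linear spline terms and their interpretation can be sketched as follows: each extra coefficient is a change in slope at its cutpoint, so the slope within a BMI interval is the running sum of the coefficients. Function names are assumptions made for this illustration; the coefficients are those from the table.

```python
def spline_basis(bmi, knots=(25, 30)):
    """Covariates of the linear spline: BMI itself plus (BMI - knot) above each knot."""
    return [bmi] + [max(bmi - k, 0) for k in knots]

def segment_slopes(coefs):
    """Slopes below 25, between 25 and 30, and above 30 (running sums of coefficients)."""
    slopes, total = [], 0.0
    for c in coefs:
        total += c
        slopes.append(total)
    return slopes

poland = segment_slopes([-0.0422, 0.0393, -0.0033])
ireland = segment_slopes([-0.0221, -0.0084, 0.0203])
print(spline_basis(27))
print([round(s, 4) for s in poland])
print([round(s, 4) for s in ireland])
```

Under linearity both slope-change coefficients are zero, which is the hypothesis behind the P-values in the last column.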
Predicted values for linear spline model
Blue: Ireland, Red: Poland
Variance homogeneity
Residuals vs fitted values: Pattern?
Test for identical variances
Do we see the same residual variation in the two countries?
F = s_{Pol}^2 / s_{Irl}^2 = 0.205^2 / 0.166^2 = 1.53 ∼ F(63, 39)
resulting in P = 0.16
Example: The birthweight study
For 107 babies, we have ultrasound measurements of
- abdominal diameter (AD)
- biparietal diameter (BPD)
shortly before birth.
Purpose of this study:
Describe the relationship between birthweight and these two
ultrasound measurements,
with the aim of predicting birthweight.
Suggested model
Because of the “geometry” of the problem (volume of a child), AD
and BPD are likely to affect birthweight multiplicatively:
BW ≈ c_0 · AD^{b_1} · BPD^{b_2}
In order to work with a linear model we therefore take logarithms
of all variables:
y_i = \log_{10}(BW_i),
x_{i,1} = \log_{10}(AD_i),
x_{i,2} = \log_{10}(BPD_i).
Marginal relation to AD
E(y_i) = -1.062 + 2.237 x_{i,1}
Marginal relation to BPD
E(y_i) = -3.077 + 3.332 x_{i,2}
Two linear effects in the same model
E(y_i) = LP_i = a + b_1 x_{i,1} + b_2 x_{i,2}
describes a plane
Estimation
Fit a plane to the data by minimizing the residual sum of squares
\sum_i (y_i - (a + b_1 x_{i,1} + b_2 x_{i,2}))^2
For the birthweight example the fitted plane is
\widehat{LP}_i = -2.546 + 1.467 x_{i,1} + 1.552 x_{i,2}
with a residual standard deviation of 0.0464
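The fitted plane can be used for prediction by back-transforming the linear predictor. The sketch below is only illustrative: the function names are assumptions, the example inputs AD = 100 and BPD = 90 are hypothetical, and the measurement units are those of the original data, which the slides do not state.

```python
import math

# Estimated coefficients of the fitted plane on the log10 scale.
A, B1, B2 = -2.546, 1.467, 1.552

def predict_log10_bw(ad, bpd):
    """Linear predictor: expected log10(birthweight) for given AD and BPD."""
    return A + B1 * math.log10(ad) + B2 * math.log10(bpd)

def predict_bw(ad, bpd):
    """Back-transformed prediction, equal to 10^A * AD^B1 * BPD^B2."""
    return 10 ** predict_log10_bw(ad, bpd)

# Hypothetical measurements, chosen only to illustrate the calculation.
example = predict_bw(100, 90)
multiplicative = 10 ** A * 100 ** B1 * 90 ** B2
print(round(example), round(multiplicative))
```

The two printed values agree, which shows that the additive model on the log scale is the multiplicative model on the original scale.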
Observations and fitted plane
Adjusted vs unadjusted estimates
Model                  \hat{b} log10(AD) (SD)   \hat{b} log10(BPD) (SD)   Residual SD
Mutually adjusted      1.467 (0.147)            1.552 (0.229)             0.0464
Unadjusted, only AD    2.237 (0.111)            -                         0.0554
Unadjusted, only BPD   -                        3.332 (0.202)             0.0646
Relation between the two covariates
Relations:
log10bpd = 3.176 + 0.496 * log10ad
log10ad = -1.205 + 1.214 * log10bpd
Confounding
The two explanatory variables are
- strongly related to each other
- both strongly related to the outcome
Therefore, the effect of, say, x_{i,1}
depends on whether we adjust for x_{i,2} or not,
and vice versa
and the interpretation changes!
This is not to be confused with interaction (later....)
Interpretations
Marginal (unadjusted) effect of e.g. a 10% increase in BPD:
1.1^{3.332} = 1.37
A 10% difference in BPD corresponds to a 37% difference in
expected birth weight
The corresponding adjusted estimate is
1.1^{1.552} = 1.16
Two fetuses with identical AD but with a 10% difference in BPD
will differ by 16% in expected birth weight
But:
When BPD differs by 10%, the AD will typically also differ,
typically a little more than 10% (1.1^{1.214} = 1.12),
giving rise to a factor on birthweight:
1.1^{1.552} · 1.12^{1.467} = 1.37
i.e. precisely the unadjusted effect
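This decomposition of the unadjusted effect can be checked numerically. The variable names in the sketch are chosen for illustration; the coefficients are the adjusted estimates, the slope of log10(AD) on log10(BPD), and the unadjusted BPD coefficient from the slides.

```python
b_ad, b_bpd = 1.467, 1.552      # mutually adjusted coefficients
slope_ad_on_bpd = 1.214         # slope of log10(AD) regressed on log10(BPD)
unadjusted_b_bpd = 3.332        # coefficient with BPD as the only covariate

bpd_factor = 1.10                               # a 10% difference in BPD
ad_factor = bpd_factor ** slope_ad_on_bpd       # typical accompanying difference in AD

# Direct effect of BPD times the effect passing through the associated AD difference.
total = bpd_factor ** b_bpd * ad_factor ** b_ad
unadjusted = bpd_factor ** unadjusted_b_bpd
print(round(ad_factor, 2), round(total, 2), round(unadjusted, 2))
```

On the log scale this is the identity 1.552 + 1.467 · 1.214 ≈ 3.332.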
Residual plots for linearity
Residual plots for variance homogeneity and normality
Test for linearity
by including quadratic effects for log10 (AD) and log10 (BPD)
Model              log10(AD)        log10(BPD)        (log10(AD))^2            (log10(BPD))^2
Linear             1.467 (0.147)    1.552 (0.229)     -                        -
Quadratic in AD    -5.688 (5.098)   1.699 (0.251)     1.785 (1.271), P=0.16    -
Quadratic in BPD   1.418 (0.146)    -18.933 (9.852)   -                        5.354 (2.574), P=0.04

Problem with linearity in log10(BPD),
mostly due to the fetus with the lowest BPD
Diagnostics
Cook’s distance for model with quadratic effect of log10 (BPD)
- The fetus with the smallest BPD has a large influence
- Omitting this fetus, the quadratic effect of BPD is no longer
  important