National Education Longitudinal Study

Mathematics 344
Linear Algebra and Multiple Regression
National Education Longitudinal Study of 1988. (nels88 of the faraway package)
Variable   Description
sex        Gender
race       Race
ses        Socioeconomic status
paredu     Parents' educational level
math       Mathematics test score
summary(nels88)
     sex          race           ses              paredu        math
 Female:128   White   :189   Min.   :-2.4100   ba     :28   Min.   :31.0
 Male  :132   Asian   :  8   1st Qu.:-0.7700   college:83   1st Qu.:42.0
              Black   : 40   Median :-0.1450   hs     :50   Median :49.5
              Hispanic: 23   Mean   :-0.0733   lesshs :39   Mean   :51.3
                             3rd Qu.: 0.8125   ma     :34   3rd Qu.:62.0
                             Max.   : 1.8500   phd    :26   Max.   :71.0
l1 <- lm(math ~ 1, data = nels88)
lses <- lm(math ~ ses, data = nels88)
lsex <- lm(math ~ sex, data = nels88)
lrace <- lm(math ~ race, data = nels88)
lparedu <- lm(math ~ paredu, data = nels88)
lsessex <- lm(math ~ ses + sex, data = nels88)
lsexses <- lm(math ~ sex + ses, data = nels88)
lsespar <- lm(math ~ ses + paredu, data = nels88)
lparses <- lm(math ~ paredu + ses, data = nels88)
lall <- lm(math ~ ses + race + paredu, data = nels88)
Model Comparison Tests
ω   the “small” model
Ω   the “big” model; the model space of ω is contained in that of Ω
Example:
ω:  math ∼ 1 + ses
Ω:  math ∼ 1 + ses + paredu
The model utility test
H0 : Ω does not “explain” y any better than ω (e.g., βi = 0 for all new βi )
Test Statistic
$$F = \frac{(SSE_\omega - SSE_\Omega)/(\dim\Omega - \dim\omega)}{SSE_\Omega/(n - \dim\Omega)}$$
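The test statistic above can be computed directly from two fitted lm objects. The following is a minimal sketch, not the handout's own code: the helper name f_compare is hypothetical, and the built-in mtcars data is used as a stand-in (an assumption) so that the example runs without the faraway package.

```r
# F statistic for comparing two nested linear models fitted with lm().
# `small` plays the role of omega and `big` the role of Omega.
# (f_compare is a hypothetical helper name used only for illustration.)
f_compare <- function(small, big) {
  sse_small <- sum(resid(small)^2)   # SSE of omega
  sse_big   <- sum(resid(big)^2)     # SSE of Omega
  df_small  <- df.residual(small)    # n - dim(omega)
  df_big    <- df.residual(big)      # n - dim(Omega)
  num_df    <- df_small - df_big     # dim(Omega) - dim(omega)
  ((sse_small - sse_big) / num_df) / (sse_big / df_big)
}

# Stand-in data (assumption): mtcars ships with base R.
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + factor(cyl), data = mtcars)

f_by_hand <- f_compare(small, big)
f_anova   <- anova(small, big)$F[2]   # the same statistic, as reported by anova()
all.equal(f_by_hand, f_anova)
```

With the handout's data, the same helper would be called as f_compare(lses, lsespar).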
April 7, 2015
summary(lses)
Call:
lm(formula = math ~ ses, data = nels88)
Residuals:
   Min     1Q Median     3Q    Max
-21.75  -6.04  -0.42   6.31  22.72
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   51.823      0.544    95.3   <2e-16
ses            7.128      0.560    12.7   <2e-16
Residual standard error: 8.74 on 258 degrees of freedom
Multiple R-squared: 0.386, Adjusted R-squared: 0.383
F-statistic: 162 on 1 and 258 DF, p-value: <2e-16
summary(lsespar)
Call:
lm(formula = math ~ ses + paredu, data = nels88)
Residuals:
    Min      1Q  Median      3Q     Max
-21.189  -5.989  -0.268   6.046  24.009
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     58.571      1.781   32.88  < 2e-16
ses              2.791      1.310    2.13  0.03401
pareducollege   -7.521      2.141   -3.51  0.00053
pareduhs       -12.179      2.642   -4.61  6.4e-06
paredulesshs   -13.365      3.333   -4.01  8.0e-05
pareduma        -0.871      2.245   -0.39  0.69845
pareduphd       -2.049      2.520   -0.81  0.41688
Residual standard error: 8.45 on 253 degrees of freedom
Multiple R-squared: 0.437, Adjusted R-squared: 0.424
F-statistic: 32.7 on 6 and 253 DF, p-value: <2e-16
anova(lses, lsespar)
Analysis of Variance Table
Model 1: math ~ ses
Model 2: math ~ ses + paredu
  Res.Df   RSS Df Sum of Sq   F Pr(>F)
1    258 19725
2    253 18083  5      1642 4.6 0.0005
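As a check, the F statistic and p-value in this table can be reproduced by hand from the two residual sums of squares. The sketch below uses only numbers read off the output above: n = 260 students, dim ω = 2 (intercept and ses), and dim Ω = 7 (intercept, ses, and five paredu indicators). The variable names are illustrative.

```r
# Values read off the anova(lses, lsespar) table
sse_small <- 19725   # RSS of math ~ ses
sse_big   <- 18083   # RSS of math ~ ses + paredu
df_small  <- 258     # n - dim(omega) = 260 - 2
df_big    <- 253     # n - dim(Omega) = 260 - 7

num_df <- df_small - df_big   # dim(Omega) - dim(omega) = 5
f_stat <- ((sse_small - sse_big) / num_df) / (sse_big / df_big)

# Upper-tail probability under the F(5, 253) distribution
p_val <- pf(f_stat, num_df, df_big, lower.tail = FALSE)

f_stat
p_val
```

Because the RSS values in the table are rounded, the result should agree with the reported F of about 4.6 and p-value of about 0.0005 only up to rounding.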
Distribution of test statistic under null hypothesis without normal assumptions
l <- lm(math ~ ses + rand(5), data = nels88)
anova(lses, l)
Analysis of Variance Table
Model 1: math ~ ses
Model 2: math ~ ses + rand(5)
  Res.Df   RSS Df Sum of Sq    F Pr(>F)
1    258 19725
2    253 19498  5       228 0.59   0.71
r <- do(10000) * {
anova(lses, lm(math ~ ses + rand(5), data = nels88))$F[2]
}
1 - pdata(~result, 4.6, data = r)  # formula-first argument order, as in recent mosaic releases
[1] 0.0007
p <- densityplot(~result, data = r, plot.points = FALSE)
plotFun(df(x, 5, 253) ~ x, plot = p, add = TRUE, col = "red")
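[Figure: density plot of the 10,000 simulated F statistics (x-axis: result, y-axis: Density), with the F(5, 253) density curve overlaid in red.]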
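The simulation above relies on do() and rand() from the mosaic package. The same idea can be sketched in base R: add columns of pure noise to the small model many times and record the resulting F statistics. This is a sketch under stated assumptions, not the handout's code: mtcars stands in for nels88, the number of replications is reduced to 2000, and the names one_f, sim_f, and f_obs are hypothetical.

```r
set.seed(344)  # arbitrary seed, only for reproducibility

# Stand-in data (assumption): mtcars instead of nels88.
small <- lm(mpg ~ wt, data = mtcars)
n     <- nrow(mtcars)
k     <- 5   # number of random regressors, playing the role of rand(5)

# One replication: add k columns of normal noise to the small model
# and record the F statistic from the model comparison.
one_f <- function() {
  noise <- matrix(rnorm(n * k), nrow = n, ncol = k)
  big   <- lm(mpg ~ wt + noise, data = mtcars)
  anova(small, big)$F[2]
}

sim_f <- replicate(2000, one_f())

# Empirical tail probability beyond a value of interest, next to the
# F-distribution tail. f_obs is a hypothetical value for illustration.
f_obs <- 4.6
mean(sim_f >= f_obs)
pf(f_obs, k, df.residual(small) - k, lower.tail = FALSE)
```

The noise regressors are unrelated to the response by construction, so sim_f approximates the distribution of the test statistic when the extra terms in Ω explain nothing. The last two lines compare the simulated tail probability with the one from the F distribution.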