Mathematics 241
Linear models
November 17

Observation = Model + Error
y = β0 + β1 x + ε

Observation = Fitted + Residual
y = β̂0 + β̂1 x + e
Diffusivity
Problem 2 in the supplementary exercises for Chapter 8 gives data on the diffusivity (in 10⁻⁷ m² s⁻¹) of polycarbonate
as a function of temperature (degrees Celsius). Perhaps the relationship is linear. The data are in ex8_S_2 and the
variables are Diff and Temp.
• Plot the data (type=c('p','r') shows the points and adds the least-squares line).
> xyplot(Diff~Temp,data=ex8_S_2,type=c('p','r'))
• Compute the linear model object.
> ld=lm(Diff~Temp,data=ex8_S_2)
• Summarize the results.
> summary(ld)
Call:
lm(formula = Diff ~ Temp, data = ex8_S_2)
Residuals:
     Min       1Q   Median       3Q      Max
-0.16379 -0.04498  0.00753  0.04552  0.21158

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.578083   0.058980   26.76  2.0e-13 ***
Temp        -0.002464   0.000364   -6.76  9.1e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0993 on 14 degrees of freedom
Multiple R-squared: 0.766,    Adjusted R-squared: 0.749
F-statistic: 45.7 on 1 and 14 DF,  p-value: 9.13e-06
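From the Estimate column, the fitted (least-squares) line is
D̂iff = 1.578083 − 0.002464 Temp,
so each additional degree Celsius lowers the estimated diffusivity by about 0.0025 × 10⁻⁷ m² s⁻¹.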
• Write confidence intervals for the parameters.
> confint(ld)
                  2.5 %     97.5 %
(Intercept)  1.4515830  1.7045829
Temp        -0.0032449 -0.0016822
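For example, with 95% confidence the slope β1 lies between −0.0032449 and −0.0016822; since the whole interval is negative, the data are consistent with diffusivity decreasing as temperature increases.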
• Make predictions for individual values of x.
> x=data.frame(Temp=c(100,200))
> predict(ld,x,interval='prediction')
     fit     lwr    upr
1 1.3317 1.10924 1.5542
2 1.0854 0.86203 1.3087
> predict(ld,x,interval='confidence')
     fit    lwr    upr
1 1.3317 1.2671 1.3963
2 1.0854 1.0179 1.1529
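These fitted values can be checked by hand from the coefficient estimates: at Temp = 100, ŷ = 1.578083 − 0.002464 × 100 ≈ 1.332, and at Temp = 200, ŷ ≈ 1.085, matching the fit column up to rounding of the displayed coefficients.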
• Plot the residuals to verify that a linear model is appropriate.
> xyplot(residuals(ld)~fitted(ld))
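As a quick check of the Observation = Fitted + Residual identity, the fitted values and residuals should add back up to the observed diffusivities:
> range(ex8_S_2$Diff - (fitted(ld) + residuals(ld)))   # differences should be essentially zero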
Industrial jaw crushers
A number of industrial jaw crushers were tested for feed rate (in 100 tons/hr) and power drawn (in kW). The data
are in table8_3 and the variables are Power and Feed.Rate. It is hypothesized that a linear model for these
variables is appropriate:
Power = β0 + β1 Feed.Rate + ε
1. Write the equation of the least-squares (fitted) line.
2. Write a sentence interpreting the slope of the least-squares line in the context of the data.
3. Write a 95% confidence interval for the slope of the least-squares line.
4. Predict the power drawn by industrial jaw crushers with feed rate of 100 tons/hr and 200 tons/hr.
5. Examine the residuals of the model and determine whether a least-squares line is appropriate.
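The R commands follow the same pattern as the diffusivity example. A minimal sketch (assuming the data frame table8_3 with variables Power and Feed.Rate is already loaded; the object names lc and fr are placeholders, and feed rates of 100 and 200 tons/hr correspond to Feed.Rate values of 1 and 2 in these units):
> xyplot(Power~Feed.Rate,data=table8_3,type=c('p','r'))   # scatterplot with least-squares line
> lc=lm(Power~Feed.Rate,data=table8_3)                    # fit the model
> summary(lc)                                             # coefficients for the fitted line
> confint(lc)                                             # 95% confidence intervals
> fr=data.frame(Feed.Rate=c(1,2))                         # 100 and 200 tons/hr
> predict(lc,fr,interval='prediction')                    # predictions for individual crushers
> xyplot(residuals(lc)~fitted(lc))                        # residual plot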
More than one explanatory variable
Observation = Model + Error
y = β0 + β1 x1 + · · · + βp xp + ε
Observation = Fitted + Residual
y = β̂0 + β̂1 x1 + · · · + β̂p xp + e
Stack loss
The built-in R dataset stackloss has measurements on 21 days of operation of a plant for the oxidation of ammonia
(NH3) to nitric acid (HNO3). Air.Flow represents the rate of operation of the plant. Water.Temp is the temperature
of cooling water circulated through coils in the absorption tower. Acid.Conc. is the concentration of the acid
circulating, minus 50, times 10: that is, 89 corresponds to 58.9 per cent acid. stack.loss (the response variable) is
10 times the percentage of the ingoing ammonia to the plant that escapes from the absorption column unabsorbed;
that is, an (inverse) measure of the overall efficiency of the plant: high values of stack.loss represent
inefficiency. A model for the plant might be
stack.loss = β0 + β1 Air.Flow + β2 Water.Temp + β3 Acid.Conc + ε
• Plot the data. Not easy – we’re dealing with four-dimensional space here! But we can at least look at all pairwise
plots.
> splom(stackloss)
6. From the plot, which variable appears to be most strongly related to stack.loss?
• We fit the model in exactly the same way – by minimizing the sum of the squares of the residuals.
> lsl = lm(stack.loss~ Air.Flow + Water.Temp + Acid.Conc., data=stackloss)
Notice that the + sign on the right-hand side of the model formula does not mean simple arithmetic addition;
rather, it indicates that the model includes a linear term for that variable.
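As a quick check, the quantity being minimized – the sum of squared residuals – can be computed directly from the fitted object (deviance() returns the same number for an lm fit):
> sum(residuals(lsl)^2)   # residual sum of squares
> deviance(lsl)           # the same quantity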
• summary() again summarizes information about the coefficients and the fit, confint() produces confidence intervals,
and residuals() and fitted() return residuals and fitted values.
> summary(lsl)
Call:
lm(formula = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
data = stackloss)
Residuals:
   Min     1Q Median     3Q    Max
-7.238 -1.712 -0.455  2.361  5.698

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -39.920     11.896   -3.36   0.0038 **
Air.Flow       0.716      0.135    5.31  5.8e-05 ***
Water.Temp     1.295      0.368    3.52   0.0026 **
Acid.Conc.    -0.152      0.156   -0.97   0.3440
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.24 on 17 degrees of freedom
Multiple R-squared: 0.914,    Adjusted R-squared: 0.898
F-statistic: 59.9 on 3 and 17 DF,  p-value: 3.02e-09
> confint(lsl)
                 2.5 %    97.5 %
(Intercept) -65.01803 -14.82131
Air.Flow      0.43111   1.00017
Water.Temp    0.51882   2.07175
Acid.Conc.   -0.48187   0.17763
> xyplot(residuals(lsl)~fitted(lsl))
7. Write the equation of the fitted model. Make sure to put in any hats that are appropriate.
8. Interpret the coefficient of Water.Temp in the model.
9. If you had to make a simpler model by leaving out one of the variables, which one would you choose and why?
10. Refit the model by leaving out the variable you chose in Question 9 (one way to do this is sketched after this list). What is the equation of the new fitted model?
11. Compare the two models (the full model and the one with one fewer variable) in terms of their usefulness for predicting stack.loss.
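A minimal sketch of the refit and comparison for Questions 10 and 11 (here Acid.Conc. is dropped purely for illustration – use whichever variable you chose in Question 9; the object name lsl2 is a placeholder):
> lsl2 = lm(stack.loss ~ Air.Flow + Water.Temp, data=stackloss)   # full model minus Acid.Conc.
> summary(lsl2)             # compare R-squared and residual standard error with summary(lsl)
> confint(lsl2)
> xyplot(residuals(lsl2)~fitted(lsl2))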
Ice
Exercise 8.3.18 of Navidi gives data on the thickness of ice in 13 Minnesota lakes. The data are in ex8_3_18. The
variables are
y    maximum ice thickness in mm
x1   average number of days per year of ice cover
x2   average number of days low temperature is lower than 8 Centigrade
x3   average snow depth in mm
Fit a model of the form y = β0 + β1 x1 + β2 x2 + β3 x3 + ε to these data.
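A minimal sketch of the fit, assuming the data frame ex8_3_18 uses the variable names y, x1, x2, and x3 (the object name li is a placeholder):
> li = lm(y ~ x1 + x2 + x3, data=ex8_3_18)
> summary(li)     # coefficient estimates and p-values
> confint(li)     # confidence intervals for the coefficients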
12. Write the equation of the fitted model.
13. Do lakes with greater snow cover tend to have greater or less maximum ice thickness? Explain.
14. Are all the explanatory variables useful in this model?