Linear models and their mathematical foundations
Cynthia Huber, Steffen Unkel
Study sheet 4
Winter term 2016/17
Exercise 1:
Assume we are working with a model y = Xβ + ϵ, E(ϵ) = 0ₙ, Cov(ϵ) = σ²Iₙ, but the true
covariance matrix of ϵ is σ²V. Find the mean of s² obtained using the least squares approach
and show, specifically, that if X contains a column of ones and V = (1 − ρ)Iₙ + ρ1ₙ1ₙᵀ,
0 < ρ < 1, then E(s²) = σ²(1 − ρ) < σ² (thus s² underestimates σ²). Describe verbally the
covariance structure implied by σ²V.
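The claim E(s²) = σ²(1 − ρ) can be checked numerically. The following is a minimal Monte Carlo sketch (not part of the sheet; the sample sizes, seed, and coefficient values are my own choices): errors are drawn with the equicorrelated covariance σ²V, the model is fit by ordinary least squares, and the average of s² is compared to σ²(1 − ρ).

```python
import numpy as np

# Monte Carlo sketch: s^2 from OLS underestimates sigma^2 when the true
# error covariance is sigma^2 V with V = (1 - rho) I + rho 1 1^T.
rng = np.random.default_rng(0)
n, p, rho, sigma2 = 50, 3, 0.5, 2.0

X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
V = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
L = np.linalg.cholesky(sigma2 * V)        # eps = L z then has Cov sigma^2 V

s2_draws = []
for _ in range(2000):
    y = X @ np.array([1.0, -2.0, 0.5]) + L @ rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    s2_draws.append(resid @ resid / (n - p))  # usual s^2 with divisor n - p

# theory predicts the average of s^2 is near sigma^2 (1 - rho), not sigma^2
print(np.mean(s2_draws), sigma2 * (1 - rho), sigma2)
```

Because X contains a column of ones, (Iₙ − H)1ₙ = 0, which is exactly what kills the ρ term in E(s²).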
Exercise 2:
Prove the following Theorem.
If y is Nₙ(Xβ, σ²V), where X is an n × (k + 1) matrix of full rank k + 1 and V is a known
positive definite matrix, then the maximum likelihood estimators for β and σ² are
β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y
and
σ̂² = (1/n)(y − Xβ̂)ᵀV⁻¹(y − Xβ̂).
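The two estimators in the theorem can be sanity-checked numerically. The sketch below (synthetic data, my own seed and coefficients, not part of the sheet) computes β̂ from the stated formula and verifies that it agrees with ordinary least squares applied to the whitened model V^(−1/2)y = V^(−1/2)Xβ + V^(−1/2)ϵ.

```python
import numpy as np

# Numerical check of the GLS/ML formulas from the theorem above.
rng = np.random.default_rng(1)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
A = rng.standard_normal((n, n))
V = A @ A.T + n * np.eye(n)               # an arbitrary positive definite V
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ Vinv @ resid / n     # ML estimator: divisor n, not n - k - 1

# cross-check: OLS on the whitened data L^{-1} X, L^{-1} y (V = L L^T)
Linv = np.linalg.inv(np.linalg.cholesky(V))
beta_white = np.linalg.lstsq(Linv @ X, Linv @ y, rcond=None)[0]
print(np.allclose(beta_hat, beta_white))
```

The whitening argument is also a useful route for the proof itself: after the transformation the model has spherical errors, so the ordinary least squares results apply.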
Exercise 3:
Use the dataset mtcars which is available in R.
Assume the following model:
mpgᵢ = β₀ + β₁ · cylᵢ + β₂ · hpᵢ + β₃ · wtᵢ + ϵᵢ,   ϵᵢ ∼ N(0, σ²),   i = 1, …, 32
a) Define the matrix C and the vector t of H0 : Cβ = t for the following hypotheses:
i) None of the covariates influences the response mpg.
ii) The number of cylinders cyl and the horsepower hp have the same influence.
iii) The influence of weight wt is −3.
b) Which of the mentioned hypotheses are considered in the R summary?
c) Calculate the coefficients under the null hypotheses H0 considered in a).
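For part c), the general machinery is restricted least squares: under H₀: Cβ = t the restricted estimator is β̂ᵣ = β̂ − (XᵀX)⁻¹Cᵀ[C(XᵀX)⁻¹Cᵀ]⁻¹(Cβ̂ − t). The sketch below illustrates this formula on synthetic data (not mtcars; the example restriction, data, and coefficient values are my own) so it does not give away the answers to a).

```python
import numpy as np

# Restricted least squares sketch: estimate beta subject to C beta = t.
rng = np.random.default_rng(2)
n = 32
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])  # intercept + 3 covariates
y = X @ np.array([30.0, -1.5, -0.02, -3.0]) + rng.standard_normal(n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
XtX_inv = np.linalg.inv(X.T @ X)

# example restriction (hypothetical): coefficient of the last covariate is -3
C = np.array([[0.0, 0.0, 0.0, 1.0]])
t = np.array([-3.0])

adj = XtX_inv @ C.T @ np.linalg.solve(C @ XtX_inv @ C.T, C @ beta_hat - t)
beta_r = beta_hat - adj
print(C @ beta_r)   # the restricted estimate satisfies C beta_r = t
```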
Date: 17 November 2016
Exercise 4:
Let x = (x₁, …, xₙ)ᵀ be a random vector with xᵢ iid ∼ N(µ, σ²) for i = 1, …, n, where µ is
known. The parameter σ² is to be estimated. Consider the estimator
T = T(x) = Σᵢ₌₁ⁿ (xᵢ − µ)².
Is T sufficient for σ 2 ?
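As a reminder (a standard result, not part of the sheet), sufficiency is usually checked with the Fisher–Neyman factorization criterion: T(x) is sufficient for σ² if and only if the joint density factors as

```latex
f(x; \sigma^2) = g\bigl(T(x); \sigma^2\bigr)\, h(x),
```

where h(x) does not depend on σ².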
Exercise 5:
When fitting a model yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₁², i = 1, …, n, to our data using the function
lm() in R, we obtain the following output:
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   0.74669     0.24922    2.996
x            -0.45842     0.12677
x2            0.02796     0.01255    2.227

Residual standard error: 0.9343 on 73 degrees of freedom
Multiple R-squared: 0.3383, Adjusted R-squared: 0.3202
F-statistic:       on 2 and 73 DF,  p-value: 2.841e-07
a) How many observations did we use? (The design matrix in our model was of full column
rank.)
b) Fill in the missing values, knowing that the missing p-values were 0.02900, 0.000547,
and 0.003733.
c) Test the overall regression hypothesis (significance level 1%).
d) Can we reject (at 5% significance level) the hypothesis that the regression function goes
through the origin?
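The arithmetic behind the blanks in the output can be sketched as follows (my own script, not part of the sheet): each t value is the estimate divided by its standard error, and the overall F statistic can be recovered from R² via F = (R²/k) / ((1 − R²)/(n − k − 1)).

```python
# Recovering the blank entries of the lm() summary from the printed numbers.
df_resid = 73                 # residual degrees of freedom from the output
k = 2                         # two regressors besides the intercept

t_x = -0.45842 / 0.12677      # missing t value for x
R2 = 0.3383
F = (R2 / k) / ((1 - R2) / df_resid)   # missing overall F statistic

print(round(t_x, 3), round(F, 2))
```

The p-values then come from a t distribution with 73 degrees of freedom and an F(2, 73) distribution, respectively.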