Mathematics 344 notes
Linear Algebra and Multiple Regression
March 24, 2015

Model:   Y = Xβ + ε

  Y   the random vector of responses (we use y for the observed value of Y)
  ε   the random “error” vector
  X   the model matrix; each column is an explanatory variable; this is an n × (k + 1) matrix
      (we assume the columns of X are linearly independent)
  β   the vector of (unknown) coefficients (parameters)

Fitted model:   y = Xβ̂ + e

  y    the observed values of the response variable
  β̂    the fitted values of the coefficients (this could be either an estimator or an estimate; the notation is ambiguous)
  e    the residual vector (defined by this equation)
  Xβ̂   also called ŷ, the fitted values of the response variable
“Least squares estimate” and its properties:
1. e = y − ŷ = y − X β̂ is orthogonal to the column space of X.
2. y − ȳ = (ŷ − ȳ) + e is a decomposition of y − ȳ into two orthogonal vectors.
3. |y − ȳ|² = |ŷ − ȳ|² + |e|² (Pythagorean Theorem of Statistics)
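
A quick numerical check of these three properties, as a minimal numpy sketch (the design matrix and data below are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 20, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n × (k+1) model matrix
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)     # synthetic response

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimate
    y_hat = X @ beta_hat                              # fitted values
    e = y - y_hat                                     # residuals
    ybar = y.mean()

    print(X.T @ e)             # ≈ 0: e is orthogonal to the column space of X
    print((y_hat - ybar) @ e)  # ≈ 0: the decomposition of y − ȳ is orthogonal
    print(((y - ybar)**2).sum(), ((y_hat - ybar)**2).sum() + (e**2).sum())  # equal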
Some important linear algebra facts:
1. The length of a vector u satisfies |u|² = u · u = uᵀu.
2. Two vectors u and v are orthogonal if and only if u · v = 0, if and only if uᵀv = 0.
The “Hat” Matrix
H = X(XᵀX)⁻¹Xᵀ
Properties:
1. ŷ = Hy
2. H is symmetric (Hᵀ = H) and idempotent (H² = H).
3. e = (I − H)y
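
These properties are easy to confirm numerically; a small sketch with synthetic X and y (forming H explicitly is fine for a demo, though not how one would compute in practice):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 15, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = rng.normal(size=n)

    H = X @ np.linalg.inv(X.T @ X) @ X.T              # the hat matrix
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(np.allclose(H @ y, X @ beta_hat))           # ŷ = Hy
    print(np.allclose(H, H.T))                        # symmetric
    print(np.allclose(H @ H, H))                      # idempotent
    print(np.allclose((np.eye(n) - H) @ y, y - X @ beta_hat))  # e = (I − H)y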
Definitions: If Z is a random vector (with components Zᵢ),
1. E(Z) = µ_Z is the vector with components E(Zᵢ).
2. Cov(Z) is the matrix whose (i, j) entry is Cov(Zᵢ, Zⱼ). (Note that the ith diagonal entry of Cov(Z) is the variance of Zᵢ.)
Note: Cov(Z) = E[(Z − µ_Z)(Z − µ_Z)ᵀ].
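
A short numpy illustration of these definitions, estimating E(Z) and Cov(Z) from simulated draws (the mean and covariance used here are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    true_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
    Z = rng.multivariate_normal([0.0, 1.0], true_cov, size=100_000)  # rows are draws of Z

    mu = Z.mean(axis=0)                 # estimates E(Z) componentwise
    C = (Z - mu).T @ (Z - mu) / len(Z)  # sample version of E[(Z − µ_Z)(Z − µ_Z)ᵀ]
    print(mu)                           # ≈ [0, 1]
    print(C)                            # ≈ true_cov; diagonal ≈ Var(Zᵢ)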
Estimating β.

1. Assume E[ε] = 0. Then β̂ is an unbiased estimator of β.

   β̂ = (XᵀX)⁻¹XᵀY = (XᵀX)⁻¹Xᵀ(Xβ + ε) = β + (XᵀX)⁻¹Xᵀε

   So

   E[β̂] = β + E[(XᵀX)⁻¹Xᵀε] = β + (XᵀX)⁻¹XᵀE[ε] = β
2. Assume that Cov(ε) = σ²I. Then Cov(β̂) = σ²(XᵀX)⁻¹.

   Cov(β̂) = E[(β̂ − β)(β̂ − β)ᵀ]
           = E[(XᵀX)⁻¹Xᵀε ((XᵀX)⁻¹Xᵀε)ᵀ]
           = (XᵀX)⁻¹Xᵀ E[εεᵀ] X(XᵀX)⁻¹
           = (XᵀX)⁻¹Xᵀ (σ²I) X(XᵀX)⁻¹
           = σ²(XᵀX)⁻¹
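
Both facts are easy to check by simulation; a minimal numpy sketch (the design, β, and σ below are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    n, sigma = 30, 1.5
    beta = np.array([1.0, 2.0])
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design, k = 1

    # simulate many error vectors with E[ε] = 0 and Cov(ε) = σ²I
    sims = np.array([np.linalg.lstsq(X, X @ beta + sigma * rng.normal(size=n), rcond=None)[0]
                     for _ in range(10000)])
    print(sims.mean(axis=0))                  # ≈ β: β̂ is unbiased
    print(np.cov(sims.T))                     # empirical Cov(β̂) ...
    print(sigma**2 * np.linalg.inv(X.T @ X))  # ... ≈ σ²(XᵀX)⁻¹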
Estimating σ².

1. E(e) = 0.
2. Cov(e) = σ²(I − H)
3. Var(eᵢ) = σ²(1 − hᵢᵢ)
4. σ̂² = S² = |e|² / (n − (k + 1)) = SSE / (n − (k + 1)) is an unbiased estimator of σ².
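
A simulation check that S² is unbiased for σ² (again a sketch; all values are illustrative):

    import numpy as np

    rng = np.random.default_rng(5)
    n, k, sigma = 25, 2, 2.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    beta = np.array([1.0, -1.0, 0.5])

    def s2():
        y = X @ beta + sigma * rng.normal(size=n)
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return (e @ e) / (n - (k + 1))            # S² = SSE / (n − (k+1))

    print(np.mean([s2() for _ in range(10000)]))  # ≈ σ² = 4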
The standard distributional assumption: the εᵢ are independent and normal, with mean 0 and variance σ². Then
1. (n − (k + 1))S² / σ² ∼ Chisq(n − (k + 1))
2. β̂, Ŷ, and e all have normal distributions.
3. Let W = (XᵀX)⁻¹ and wᵢ = Wᵢᵢ. Let SE(β̂ᵢ) = S√wᵢ. Then

   (β̂ᵢ − βᵢ) / SE(β̂ᵢ) ∼ t(n − (k + 1))
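
A sketch putting the pieces together: computing SE(β̂ᵢ) and the t statistics for one simulated data set (assumes scipy is available for the t critical value; all numbers are illustrative):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, k, sigma = 25, 2, 1.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    beta = np.array([1.0, -1.0, 0.5])
    df = n - (k + 1)

    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    S2 = (e @ e) / df                 # S²
    W = np.linalg.inv(X.T @ X)
    se = np.sqrt(S2 * np.diag(W))     # SE(β̂ᵢ) = S√wᵢ

    print((beta_hat - beta) / se)     # each entry is one draw from t(df)
    print(df * S2 / sigma**2)         # one draw from Chisq(df)
    print(stats.t.ppf(0.975, df))     # t critical value for a 95% CI on βᵢ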