Asymptotic Results for the Linear Regression Model

C. Flinn

November 29, 2000

1. Asymptotic Results under Classical Assumptions

The following results apply to the linear regression model

$$ y = X\beta + \varepsilon, $$

where $X$ is of dimension $(n \times k)$, $\varepsilon$ is an (unknown) $(n \times 1)$ vector of disturbances, and $\beta$ is an (unknown) $(k \times 1)$ parameter vector. We assume that $n \gg k$ and that $\rho(X) = k$. This implies that $\rho(X'X) = k$ as well. Throughout we assume that the "classical" conditional moment assumptions apply, namely

• $E(\varepsilon_i \mid X) = 0$ for all $i$.
• $V(\varepsilon_i \mid X) = \sigma^2$ for all $i$.

We first show that the probability limit of the OLS estimator is $\beta$, i.e., that it is consistent. In particular, we know that

$$ \hat{\beta} = \beta + (X'X)^{-1} X'\varepsilon \;\Rightarrow\; E(\hat{\beta} \mid X) = \beta + (X'X)^{-1} X' E(\varepsilon \mid X) = \beta. $$

In terms of the (conditional) variance of the estimator $\hat{\beta}$, $V(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}$. Now we will rely heavily on the following assumption:

$$ \lim_{n \to \infty} \frac{X_n' X_n}{n} = Q, $$

where $Q$ is a finite, nonsingular $k \times k$ matrix. Then we can write the covariance of $\hat{\beta}_n$ in a sample of size $n$ explicitly as

$$ V(\hat{\beta}_n \mid X_n) = \frac{\sigma^2}{n} \left( \frac{X_n' X_n}{n} \right)^{-1}, $$

so that

$$ \lim_{n \to \infty} V(\hat{\beta}_n \mid X_n) = \lim \frac{\sigma^2}{n} \, \lim \left( \frac{X_n' X_n}{n} \right)^{-1} = 0 \times Q^{-1} = 0. $$

Since the asymptotic variance of the estimator is 0 and the distribution is centered on $\beta$ for all $n$, we have shown that $\hat{\beta}$ is consistent (convergence in mean square implies convergence in probability).

Alternatively, we can prove consistency as follows. We need the following result.

Lemma 1.1.
$$ \operatorname{plim} \left( \frac{X'\varepsilon}{n} \right) = 0. $$

Proof. First, note that $E\left( \frac{X'\varepsilon}{n} \right) = 0$ for any $n$. Then the variance of the expression $\frac{X'\varepsilon}{n}$ is given by

$$ V\left( \frac{X'\varepsilon}{n} \right) = E\left[ \left( \frac{X'\varepsilon}{n} \right) \left( \frac{X'\varepsilon}{n} \right)' \right] = n^{-2} E(X'\varepsilon\varepsilon'X) = \frac{\sigma^2}{n} \, \frac{X'X}{n}, $$

so that $\lim_{n \to \infty} V\left( \frac{X'\varepsilon}{n} \right) = 0 \times Q = 0$. Since the asymptotic mean of the random variable is 0 and the asymptotic variance is 0, the probability limit of the expression is 0. ∎

Now we can state a slightly more direct proof of consistency of the OLS estimator, which is

$$ \operatorname{plim}(\hat{\beta}) = \operatorname{plim}\left( \beta + (X'X)^{-1} X'\varepsilon \right) = \beta + \lim \left( \frac{X_n'X_n}{n} \right)^{-1} \operatorname{plim}\left( \frac{X'\varepsilon}{n} \right) = \beta + Q^{-1} \times 0 = \beta. $$

Next, consider whether or not $s^2$ is a consistent estimator of $\sigma^2$. Now

$$ s^2 = \frac{SSE}{n - k}, $$

where $SSE = (y - X\hat{\beta})'(y - X\hat{\beta})$. We showed that $E(s^2) = \sigma^2$ for all $n$; that is, $s^2$ is an unbiased estimator of $\sigma^2$ for all sample sizes. Since $SSE = \varepsilon' M \varepsilon$, with $M = I - X(X'X)^{-1}X'$, then

$$ \operatorname{plim} s^2 = \operatorname{plim} \frac{\varepsilon' M \varepsilon}{n - k} = \operatorname{plim} \frac{\varepsilon' M \varepsilon}{n} \quad \text{(since } n/(n-k) \to 1\text{)} $$
$$ = \operatorname{plim} \frac{\varepsilon'\varepsilon}{n} - \operatorname{plim}\left( \frac{\varepsilon' X}{n} \right) \left( \frac{X'X}{n} \right)^{-1} \left( \frac{X'\varepsilon}{n} \right) = \operatorname{plim} \frac{\varepsilon'\varepsilon}{n} - 0 \times Q^{-1} \times 0. $$

Now

$$ \frac{\varepsilon'\varepsilon}{n} = n^{-1} \sum_{i=1}^{n} \varepsilon_i^2, $$

so that

$$ E\left( \frac{\varepsilon'\varepsilon}{n} \right) = n^{-1} \sum_{i=1}^{n} E(\varepsilon_i^2) = n^{-1} (n \sigma^2) = \sigma^2. $$

Similarly, under the assumption that the $\varepsilon_i$ are i.i.d., the variance of the random variable being considered is given by

$$ V\left( \frac{\varepsilon'\varepsilon}{n} \right) = n^{-2} V\left( \sum_{i=1}^{n} \varepsilon_i^2 \right) = n^{-2} \sum_{i=1}^{n} V(\varepsilon_i^2) = n^{-2} \left( n \left[ E(\varepsilon_i^4) - V(\varepsilon_i)^2 \right] \right) = n^{-1} \left[ E(\varepsilon_i^4) - V(\varepsilon_i)^2 \right], $$

so that the limit of the variance of $\frac{\varepsilon'\varepsilon}{n}$ is 0 as long as $E(\varepsilon_i^4)$ is finite [we have already assumed that the first two moments of the distribution of $\varepsilon_i$ exist]. Thus the asymptotic distribution of $\frac{\varepsilon'\varepsilon}{n}$ is centered at $\sigma^2$ and is degenerate, which proves consistency of $s^2$.
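A minimal simulation sketch of these two consistency results: the particular design below (an intercept plus two bounded uniform regressors, uniform disturbances, and the specific parameter values) is an illustrative assumption, not part of the results above. Both $\hat{\beta}$ and $s^2$ should settle near the true values as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0, -0.5])  # assumed "true" parameter vector
sigma2 = 4.0                       # assumed true disturbance variance

for n in [100, 1_000, 10_000, 100_000]:
    # Bounded regressors, so X'X/n settles down to a finite, nonsingular Q.
    X = np.column_stack([np.ones(n),
                         rng.uniform(-1, 1, n),
                         rng.uniform(0, 2, n)])
    # Non-normal disturbances: uniform on [-a, a] has mean 0 and
    # variance a^2/3, so a = sqrt(3 * sigma2) gives variance sigma2.
    a = np.sqrt(3 * sigma2)
    eps = rng.uniform(-a, a, n)
    y = X @ beta + eps

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate
    k = X.shape[1]
    s2 = np.sum((y - X @ beta_hat) ** 2) / (n - k)     # s^2 = SSE/(n - k)
    print(f"n = {n:>6}:  beta_hat = {beta_hat.round(3)},  s2 = {s2:.3f}")
```

As $n$ increases, $\hat{\beta}_n$ should approach $(1, 2, -0.5)$ and $s^2$ should approach 4, as the arguments above predict; only the two conditional moment assumptions are used.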
2. Testing without Normally Distributed Disturbances

In this section we look at the distribution of test statistics associated with linear restrictions on the $\beta$ vector when $\varepsilon_i$ is not assumed to be distributed as $N(0, \sigma^2)$ for all $i$. Instead, we proceed under the weaker condition that $\varepsilon_i$ is independently and identically distributed with common cumulative distribution function (c.d.f.) $F$. Furthermore, $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$ for all $i$.

Since we retain the mean-independence and variance-homogeneity assumptions, and since unbiasedness, consistency, and, for that matter, the Gauss-Markov theorem rely only on these first two conditional moment assumptions, all of these results continue to hold when we drop normality. However, the small-sample distributions of our test statistics will no longer be accurate, since those were derived under the assumption of normality. If we made other explicit assumptions regarding $F$, it would be possible in principle to derive the small-sample distributions of test statistics, though these distributions are not simple to characterize analytically or even to compute. Instead of making explicit assumptions regarding the form of $F$, we can derive distributions of test statistics that are valid for large $n$ no matter what the exact form of $F$ [except that it must be a member of the class of distributions for which the asymptotic results are valid, of course].

We begin with the following useful lemma, which is associated with Lindeberg-Lévy.

Lemma 2.1. If $\varepsilon$ is i.i.d. with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$ for all $i$; if the elements of the matrix $X$ are uniformly bounded, so that $|X_{ij}| < U$ for all $i$ and $j$ and for $U$ finite; and if $\lim X'X/n = Q$ is finite and nonsingular, then

$$ \frac{1}{\sqrt{n}} X'\varepsilon \to N(0, \sigma^2 Q). $$

Proof. Consider the case of only one regressor for simplicity. Then

$$ Z_n \equiv \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i \varepsilon_i $$

is a scalar. Let $G_i$ be the c.d.f. of $X_i \varepsilon_i$, and let

$$ S_n^2 \equiv \sum_{i=1}^{n} V(X_i \varepsilon_i) = \sigma^2 \sum_{i=1}^{n} X_i^2. $$

In this scalar case, $Q = \lim n^{-1} \sum_i X_i^2$. By the Lindeberg-Feller Theorem, the necessary and sufficient condition for $Z_n \to N(0, \sigma^2 Q)$ is

$$ \lim \frac{1}{S_n^2} \sum_{i=1}^{n} \int_{|\omega| > \nu S_n} \omega^2 \, dG_i(\omega) = 0 \qquad (2.1) $$

for all $\nu > 0$. Now $G_i(\omega) = F(\omega / |X_i|)$. Then rewrite (2.1) as

$$ \lim \frac{n}{S_n^2} \sum_{i=1}^{n} \frac{X_i^2}{n} \int_{|\omega/X_i| > \nu S_n / |X_i|} \left( \frac{\omega}{X_i} \right)^2 dF\!\left( \frac{\omega}{|X_i|} \right) = 0. $$

Since $\lim S_n^2 / n = \sigma^2 \lim n^{-1} \sum_{i=1}^{n} X_i^2 = \sigma^2 Q$, we have $\lim n / S_n^2 = (\sigma^2 Q)^{-1}$, which is a finite and nonzero scalar. Then we need to show

$$ \lim n^{-1} \sum_{i=1}^{n} X_i^2 \delta_{i,n} = 0, $$

where $\delta_{i,n} \equiv \int_{|\omega/X_i| > \nu S_n / |X_i|} (\omega / X_i)^2 \, dF(\omega / |X_i|)$. Now $\lim \delta_{i,n} = 0$ for all $i$ and any fixed $\nu$, since $|X_i|$ is bounded while $\lim S_n = \infty$ [thus the measure of the set $|\omega / X_i| > \nu S_n / |X_i|$ goes to 0 asymptotically]. Since $\lim n^{-1} \sum X_i^2$ is finite and $\lim \delta_{i,n} = 0$ for all $i$, $\lim n^{-1} \sum X_i^2 \delta_{i,n} = 0$. ∎

For vector-valued $X_i$ the result is identical, of course, with $Q$ being $k \times k$ instead of a scalar. The proof is only slightly more involved.

Now we can prove the following important result.

Theorem 2.2. Under the conditions of the lemma,

$$ \sqrt{n}(\hat{\beta} - \beta) \to N(0, \sigma^2 Q^{-1}). $$

Proof. $\sqrt{n}(\hat{\beta} - \beta) = \left( \frac{X'X}{n} \right)^{-1} \frac{1}{\sqrt{n}} X'\varepsilon$. Since $\lim \left( \frac{X'X}{n} \right)^{-1} = Q^{-1}$ and $\frac{1}{\sqrt{n}} X'\varepsilon \to N(0, \sigma^2 Q)$, then $\sqrt{n}(\hat{\beta} - \beta) \to N(0, \sigma^2 Q^{-1} Q Q^{-1}) = N(0, \sigma^2 Q^{-1})$. ∎

The results of this proof have the following practical implications. For small $n$, the distribution of $\sqrt{n}(\hat{\beta} - \beta)$ is not normal, though asymptotically the distribution of this random variable converges to a normal. The variance of this random variable converges to $\sigma^2 Q^{-1}$, which is arbitrarily well approximated by $s^2 \left( \frac{X_n'X_n}{n} \right)^{-1} = s^2 n (X_n'X_n)^{-1}$. But the variance of $(\hat{\beta} - \beta)$ is equal to the variance of $\sqrt{n}(\hat{\beta} - \beta)$ divided by $n$, so that in large samples the variance of the OLS estimator is approximately equal to $s^2 n (X_n'X_n)^{-1} / n = s^2 (X_n'X_n)^{-1}$, even when $F$ is non-normal.
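A numerical sketch of Theorem 2.2 at work; the design below (a single bounded uniform regressor and centered-exponential disturbances) is purely an illustrative assumption. The studentized slope $(\hat{\beta}_2 - \beta_2)/\sqrt{s^2 [(X'X)^{-1}]_{22}}$ should be approximately $N(0, 1)$ across Monte Carlo replications even though $F$ is strongly skewed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 2_000, 5_000
beta = np.array([1.0, 2.0])        # assumed "true" parameters
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])  # fixed, bounded regressors
XtX_inv = np.linalg.inv(X.T @ X)

z = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(1.0, n) - 1.0    # skewed, mean 0, variance 1
    y = X @ beta + eps
    b = XtX_inv @ (X.T @ y)                # OLS
    s2 = np.sum((y - X @ b) ** 2) / (n - 2)
    # Studentized slope; approximately N(0, 1) by Theorem 2.2.
    z[r] = (b[1] - beta[1]) / np.sqrt(s2 * XtX_inv[1, 1])

print("mean:", z.mean().round(3), " sd:", z.std().round(3))
print("P(|z| > 1.96):", np.mean(np.abs(z) > 1.96).round(4))  # should be near 0.05
```

A rejection frequency near .05 for the nominal two-sided 5% test is precisely what justifies the large-sample tests developed next.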
Usual $t$ tests of one linear restriction on $\beta$ are no longer exact. However, an analogous large-sample test is readily available.

Proposition 2.3. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test $H_0: R\beta = r$, where $R$ is $(1 \times k)$ and $r$ is a scalar, both known. Then

$$ \frac{R\hat{\beta} - r}{\sqrt{s^2 R (X'X)^{-1} R'}} \to N(0, 1). $$

Proof. Under the null, $R\hat{\beta} - r = R\hat{\beta} - R\beta = R(\hat{\beta} - \beta)$, so that the test statistic is

$$ \frac{\sqrt{n}\, R(\hat{\beta} - \beta)}{\sqrt{s^2 R (X'X/n)^{-1} R'}}. $$

Since

$$ \sqrt{n}(\hat{\beta} - \beta) \to N(0, \sigma^2 Q^{-1}) \;\Rightarrow\; \sqrt{n}\, R(\hat{\beta} - \beta) \to N(0, \sigma^2 R Q^{-1} R'), $$

the denominator of the test statistic has probability limit $\sqrt{\sigma^2 R Q^{-1} R'}$, which is the standard deviation of the random variable in the numerator. A mean-zero normal random variable divided by its standard deviation has the distribution $N(0, 1)$. ∎

A similar result holds for the situation in which multiple (nonredundant) linear restrictions on $\beta$ are tested simultaneously.

Proposition 2.4. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test $H_0: R\beta = r$, where $R$ is $(m \times k)$ and $r$ is an $(m \times 1)$ vector, both known. Then

$$ \frac{(r - R\hat{\beta})' [R(X'X)^{-1}R']^{-1} (r - R\hat{\beta}) / m}{SSE / (n - k)} \to \frac{\chi_m^2}{m}. $$

Proof. The denominator is a consistent estimator of $\sigma^2$ [as would be $SSE/n$], and has a degenerate limiting distribution. Under the null hypothesis, $r - R\hat{\beta} = -R(X'X)^{-1}X'\varepsilon$, so that the numerator of the test statistic can be written $\varepsilon' D \varepsilon / m$, where

$$ D \equiv X(X'X)^{-1} R' [R(X'X)^{-1}R']^{-1} R (X'X)^{-1} X'. $$

Now $D$ is symmetric and idempotent with $\rho(D) = m$. Then write

$$ \frac{\varepsilon' D \varepsilon}{m \sigma^2} = \frac{\varepsilon' P P' D P P' \varepsilon}{m \sigma^2} = \frac{1}{m} V' \begin{bmatrix} I_m & 0 \\ 0 & 0 \end{bmatrix} V = \frac{1}{m} \sum_{i=1}^{m} V_i^2, $$

where $P$ is the orthogonal matrix such that $P'DP = \begin{bmatrix} I_m & 0 \\ 0 & 0 \end{bmatrix}$ and where $V = P'\varepsilon / \sigma$. Thus the $V_i$ are uncorrelated with mean 0 and standard deviation 1. Because $V = P'\varepsilon / \sigma$,

$$ V_i = \sum_{j=1}^{n} \frac{P_{ji} \varepsilon_j}{\sigma}, \quad i = 1, \ldots, m. $$

The terms in the summand are independent random variables with mean 0 and variance $\sigma_j^2 = P_{ji}^2$. Since the $\varepsilon_j$ are i.i.d., the central limit theorem applies, so that

$$ \frac{\sum_{j=1}^{n} P_{ji} \varepsilon_j / \sigma}{W_n} \to N(0, 1), $$

where $W_n = \sqrt{\sum_{j=1}^{n} \sigma_j^2} = \sqrt{\sum_{j=1}^{n} P_{ji}^2} = 1$ because $P$ is orthogonal. Then, since each $V_i$ is asymptotically standard normal, $\frac{1}{m} \sum_{i=1}^{m} V_i^2 \to \frac{\chi_m^2}{m}$. ∎

The practical use of this theorem is as follows. For large samples,

$$ \frac{(r - R\hat{\beta})' [R(X'X)^{-1}R']^{-1} (r - R\hat{\beta}) / m}{SSE / (n - k)} \to \frac{\chi_m^2}{m}, \qquad (2.2) $$

which means that for large enough $n$,

$$ \frac{(r - R\hat{\beta})' [R(X'X)^{-1}R']^{-1} (r - R\hat{\beta})}{SSE / (n - k)} \to \chi_m^2. \qquad (2.3) $$

When the disturbances are normally distributed, the test statistic given by the left-hand side of (2.2) is distributed in a sample of size $n$ as $F(m, n - k)$. Note that $\lim_{n \to \infty} F(x; m, n - k) = \chi_m^2(mx)$, where $F(\cdot\,; m, n - k)$ and $\chi_m^2(\cdot)$ denote the respective c.d.f.s. For example, say that the test statistic associated with a null with $m = 3$ restrictions assumed the value 4. In a sample of size $n = 8000$, we have (approximately) $1 - F(4; 3, 8000) = .00741$. The asymptotic approximation given in (2.3) in this example yields $1 - \chi_3^2(3 \times 4) = .00738$. In small samples, differences are much greater, of course. For example, for the same value of the test statistic, when $n = 20$ we have $1 - F(4; 3, 20 - 3) = .02523$, which is certainly different from $1 - \chi_3^2(3 \times 4) = .00738$.
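These tail probabilities can be checked directly. A minimal sketch using scipy.stats, with the statistic value 4, $m = 3$ restrictions, and the denominator degrees of freedom taken from the example above:

```python
from scipy import stats

stat, m = 4.0, 3   # value of the statistic in (2.2), number of restrictions

# Exact p-values when the disturbances are normal: F(m, n - k),
# with denominator degrees of freedom 8000 and 17 as in the examples.
for df2 in [8_000, 17]:
    print(f"1 - F({stat}; {m}, {df2}) = {stats.f.sf(stat, m, df2):.5f}")

# Large-sample p-value, valid without normality: m * stat ~ chi^2(m), as in (2.3).
print(f"1 - chi2_{m}({m * stat:.0f}) = {stats.chi2.sf(m * stat, m):.5f}")
```

The printed values reproduce the .00741, .02523, and .00738 figures quoted above.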
In summary, when the sample size is very large, the normality assumption is pretty much inconsequential in the testing of linear restrictions on the parameter vector $\beta$. In small samples, some explicit assumption as to the form of $F(\varepsilon)$ is generally required to compute the distribution of the estimator $\hat{\beta}$. Under normality, the small-sample distributions of test statistics follow the $t$ or the $F$, depending on the number of restrictions being tested. Testing in this small-sample environment depends critically on the normality assumption, and if the disturbances are not normally distributed, such tests will be biased in general.