Some notes on asymptotic theory in probability

Alen Alexanderian*

* The University of Texas at Austin, USA. E-mail: [email protected]. Last revised: April 16, 2015.

Abstract. We provide a precise account of some commonly used results from asymptotic theory in probability.

Contents

1 Introduction and basic definitions
2 Basic definitions from probability theory
3 Convergence in probability and $o_p(n)$, $O_p(n)$ notations
4 Probabilistic Taylor expansion
5 Convergence in distribution
6 Central Limit Theorem and related results
References

1 Introduction and basic definitions

This brief note summarizes some important results in asymptotic theory in probability. The main motivation of this theory is to approximate the distribution of large-sample statistics by a limiting distribution that is often much simpler to work with. The structure of this note is as follows. First, we recall some basic definitions from probability theory in Section 2. Then, we proceed by discussing the notion of convergence in probability and the $o_p(n)$ and $O_p(n)$ notations in Section 3. Section 4 presents the probabilistic Taylor formula. In Section 5 we discuss convergence in distribution along with some related results, and in Section 6 we follow up by presenting the Central Limit Theorem along with some accompanying results.

2 Basic definitions from probability theory

Throughout this note, we let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ denotes a sample space, $\mathcal{F}$ is a suitable $\sigma$-algebra on $\Omega$, and $P$ is a probability measure. We call a measurable function $X : \Omega \to \mathbb{R}$ a random variable; sometimes, we work with random $k$-vectors (vector-valued random variables), which are measurable functions $X : \Omega \to \mathbb{R}^k$. The expectation (mean) of a random variable $X$ is
\[
E[X] = \int_\Omega X(\omega) \, dP(\omega).
\]
The variance of a random variable measures the deviation of $X$ from its mean,
\[
\operatorname{Var}[X] = E\big[(X - E[X])^2\big].
\]
It is straightforward to show that
\[
\operatorname{Var}[X] = E[X^2] - (E[X])^2.
\]
Next, we recall the notion of the covariance of two random variables. For two random variables $X$ and $Y$,
\[
\operatorname{Cov}[X, Y] = E\big[(X - E[X])(Y - E[Y])\big].
\]
Note that $\operatorname{Cov}[X, X] = \operatorname{Var}[X]$. The variance and covariance for random $k$-vectors are defined by
\[
\operatorname{Var}[X] = E\big[(X - E[X]) \cdot (X - E[X])\big], \qquad
\operatorname{Cov}[X, Y] = E\big[(X - E[X]) \cdot (Y - E[Y])\big],
\]
where $x \cdot y = \sum_{i=1}^k x_i y_i$.

Definition 2.1 (Distribution Function). The distribution function of a random variable $X$ is the function $F_X : \mathbb{R} \to [0, 1]$ defined by $F_X(z) := P(X \le z)$. For a random $k$-vector $X$ we similarly define $F_X : \mathbb{R}^k \to [0, 1]$ by $F_X(z) := P(X \le z)$. Note that for two vectors $x$ and $y$ in $\mathbb{R}^k$, we write $x \le y$ to mean $x_i \le y_i$ for $i = 1, \ldots, k$.

Another equivalent expression for the expectation is given by
\[
E[X] = \int_\Omega X(\omega) \, dP(\omega) = \int_{\mathbb{R}^k} z \, dF_X(z).
\]
We also have, for a measurable function $g$,
\[
E[g(X)] = \int_\Omega g(X(\omega)) \, dP(\omega) = \int_{\mathbb{R}^k} g(z) \, dF_X(z). \tag{2.1}
\]
The next definition concerns the very important notion of the characteristic function of a random variable.

Definition 2.2. Let $X$ be a random $k$-vector. Then
\[
\varphi_X(\xi) := E\big[e^{i \xi \cdot X}\big], \qquad \xi \in \mathbb{R}^k,
\]
is the characteristic function of $X$. Using (2.1) we have
\[
\varphi_X(\xi) = E\big[e^{i \xi \cdot X}\big] = \int_{\mathbb{R}^k} e^{i \xi \cdot z} \, dF_X(z).
\]

3 Convergence in probability and $o_p(n)$, $O_p(n)$ notations

Let us begin by defining the notion of convergence in probability to zero [2].

Definition 3.1. Let $\{X_n\}$ be a sequence of random variables defined on $(\Omega, \mathcal{F}, P)$. We say $X_n$ converges in probability to zero, written $X_n = o_p(1)$ or $X_n \xrightarrow{P} 0$, if for every $\varepsilon > 0$,
\[
P(|X_n| > \varepsilon) \to 0, \quad \text{as } n \to \infty.
\]
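As a simple illustration (an example added here for concreteness; it is not taken from [2]), suppose $X_n \sim N(0, 1/n)$. Chebyshev's inequality gives, for every $\varepsilon > 0$,
\[
P(|X_n| > \varepsilon) \le \frac{\operatorname{Var}[X_n]}{\varepsilon^2} = \frac{1}{n \varepsilon^2} \to 0, \quad \text{as } n \to \infty,
\]
so $X_n = o_p(1)$.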
A closely related notion is that of a sequence being bounded in probability [2].

Definition 3.2. Let $\{X_n\}$ be a sequence of random variables defined on $(\Omega, \mathcal{F}, P)$. We say $X_n$ is bounded in probability (or tight), written $X_n = O_p(1)$, if for every $\varepsilon > 0$ there exists $M > 0$ such that
\[
P(|X_n| > M) < \varepsilon, \quad \text{for all } n.
\]
We can look at the idea of boundedness in probability in the following alternative way: $X_n$ is bounded in probability if for every $\varepsilon > 0$ there is a set $F \in \mathcal{F}$ and a number $M$ such that $|X_n(\omega)| \le M$ for all $\omega \in F$ and all $n$, and $P(F^c) < \varepsilon$.

To get used to these ideas, let us prove the following basic fact: if $X_n \xrightarrow{P} 0$, then $X_n$ is bounded in probability; that is, if $X_n = o_p(1)$, then $X_n = O_p(1)$. This corresponds to the basic fact from real analysis that a convergent sequence is bounded. The argument is as follows. Suppose $X_n \xrightarrow{P} 0$, and let $\varepsilon > 0$ be given. We know $P(|X_n| > 1) \to 0$; therefore, there is some $n_0$ such that $P(|X_n| > 1) < \varepsilon$ for $n \ge n_0$. Now, pick $M_0$ sufficiently large so that $P(|X_i| > M_0) < \varepsilon$ for $i = 1, \ldots, n_0 - 1$. Then, for $M = \max(1, M_0)$, we have $P(|X_n| > M) < \varepsilon$ for all $n$, which completes the proof that $X_n = O_p(1)$.

Now, we use the above developments to define convergence in probability and order in probability [2].

Definition 3.3. Let $\{X_n\}$ be a sequence of random variables on $(\Omega, \mathcal{F}, P)$ and let $\{a_n\}$ be a sequence of strictly positive reals.
1. $X_n$ converges in probability to $X$ if and only if $X_n - X = o_p(1)$.
2. $X_n = o_p(a_n)$ if and only if $a_n^{-1} X_n = o_p(1)$.
3. $X_n = O_p(a_n)$ if and only if $a_n^{-1} X_n = O_p(1)$.

The following proposition [2] shows the similarities between the notions of $o_p$ and $O_p$ and their counterparts from real analysis (the little-oh and big-oh notations).

Proposition 3.4. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random variables on $(\Omega, \mathcal{F}, P)$, and let $\{a_n\}$ and $\{b_n\}$ be sequences of positive real numbers.
1. Suppose $X_n = o_p(a_n)$ and $Y_n = o_p(b_n)$; then,
   (a) $X_n Y_n = o_p(a_n b_n)$,
   (b) $X_n + Y_n = o_p(\max(a_n, b_n))$,
   (c) $|X_n|^r = o_p(a_n^r)$ for $r > 0$.
2. If $X_n = o_p(a_n)$ and $Y_n = O_p(b_n)$, then $X_n Y_n = o_p(a_n b_n)$.

We can extend the above developments to random $k$-vectors. For a random $k$-vector $X$, denote its $j$th component by $X^j$.

Definition 3.5. Let $\{X_n\}$ be a sequence of random $k$-vectors on $(\Omega, \mathcal{F}, P)$, and let $\{a_n\}$ be a sequence of strictly positive reals.
1. $X_n = o_p(a_n)$ if and only if $X_n^j = o_p(a_n)$ for $j = 1, \ldots, k$.
2. $X_n = O_p(a_n)$ if and only if $X_n^j = O_p(a_n)$ for $j = 1, \ldots, k$.
3. $X_n$ converges in probability to $X$, written $X_n \xrightarrow{P} X$, if and only if $X_n - X = o_p(1)$.

Recall that in analysis, convergence in $\mathbb{R}^k$ is characterized by componentwise convergence. The same idea applies when we talk about convergence in probability of random $k$-vectors. Let $\|X\|$ denote the Euclidean norm, $\|X\|^2 = \sum_{j=1}^k |X^j|^2$. The following result characterizes convergence in probability in terms of the Euclidean distance [2].

Proposition 3.6. Let $\{X_n\}$ be a sequence of random $k$-vectors on $(\Omega, \mathcal{F}, P)$ and $X$ a random $k$-vector on the same probability space. Then, $X_n - X = o_p(1)$ if and only if $\|X_n - X\| = o_p(1)$.

Proof. Suppose $X_n - X = o_p(1)$. Then, for every $\varepsilon > 0$,
\[
\lim_{n \to \infty} P\big(|X_n^j - X^j|^2 > \varepsilon/k\big) = 0, \quad \text{for } j = 1, \ldots, k.
\]
Claim: $P\big(\sum_{j=1}^k |X_n^j - X^j|^2 > \varepsilon\big) \le \sum_{j=1}^k P\big(|X_n^j - X^j|^2 > \varepsilon/k\big)$. This follows from the fact that $\sum_{j=1}^k |X_n^j - X^j|^2 > \varepsilon$ implies that at least one of the summands is larger than $\varepsilon/k$. To make this clearer, let
\[
E_n = \Big\{\omega : \sum_{j=1}^k |X_n^j(\omega) - X^j(\omega)|^2 > \varepsilon\Big\}, \qquad
E_n^j = \big\{\omega : |X_n^j(\omega) - X^j(\omega)|^2 > \varepsilon/k\big\},
\]
and note that $E_n \subseteq \cup_{j=1}^k E_n^j$. Therefore, $P(E_n) \le \sum_{j=1}^k P(E_n^j)$, which establishes the claim. Then, we have
\[
P\big(\|X_n - X\|^2 > \varepsilon\big) \le \sum_{j=1}^k P\big(|X_n^j - X^j|^2 > \varepsilon/k\big) \to 0, \quad \text{as } n \to \infty.
\]
Therefore, $\|X_n - X\|^2 = o_p(1)$, and by Proposition 3.4, $\|X_n - X\| = o_p(1)$. Conversely, if $\|X_n - X\| = o_p(1)$, we use $|X_n^j - X^j| \le \|X_n - X\|$ to get
\[
P\big(|X_n^j - X^j| > \varepsilon\big) \le P\big(\|X_n - X\| > \varepsilon\big) \to 0, \quad \text{as } n \to \infty.
\]
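To illustrate how these notations are used in practice (an example added here; it does not appear in [2]), let $X_1, X_2, \ldots$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$. By Chebyshev's inequality,
\[
P\big(\sqrt{n}\,|\bar{X}_n - \mu| > M\big) \le \frac{n \operatorname{Var}[\bar{X}_n]}{M^2} = \frac{\sigma^2}{M^2},
\]
which can be made smaller than any prescribed $\varepsilon > 0$ by choosing $M$ large; hence $\bar{X}_n - \mu = O_p(n^{-1/2})$. Combining this with Proposition 3.4(2): if, in addition, $Y_n = o_p(1)$, then $Y_n (\bar{X}_n - \mu) = o_p(n^{-1/2})$.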
Proposition 3.7. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random $k$-vectors on $(\Omega, \mathcal{F}, P)$. If $X_n - Y_n \xrightarrow{P} 0$ and $Y_n \xrightarrow{P} Y$, where $Y$ is a random $k$-vector on $(\Omega, \mathcal{F}, P)$, then $X_n \xrightarrow{P} Y$ also.

Proof. Note that by Proposition 3.6, $\|X_n - Y_n\| = o_p(1)$ and $\|Y_n - Y\| = o_p(1)$. Then, we have
\[
\|X_n - Y\| \le \|X_n - Y_n\| + \|Y_n - Y\| = o_p(1),
\]
where the last equality follows from Proposition 3.4. Therefore, again applying Proposition 3.6, we have $X_n - Y = o_p(1)$.

We also have the following continuity result [2].

Proposition 3.8. Let $\{X_n\}$ be a sequence of random variables on $(\Omega, \mathcal{F}, P)$ such that $X_n \xrightarrow{P} X$. If $g : \mathbb{R} \to \mathbb{R}$ is a continuous mapping, then $g(X_n) \xrightarrow{P} g(X)$.

Proof. Let $\varepsilon > 0$ be given. For any $K \in \mathbb{R}^+$,
\[
P\big(|g(X_n) - g(X)| > \varepsilon\big) \le P\big(|g(X_n) - g(X)| > \varepsilon,\ |X| \le K,\ |X_n| \le K\big) + P\big(\{|X| > K\} \cup \{|X_n| > K\}\big).
\]
Now we note that $g$ is uniformly continuous on $\{x : |x| \le K\}$, and thus there exists $\delta = \delta(\varepsilon) > 0$ such that $|g(X_n) - g(X)| \le \varepsilon$ whenever $|X_n - X| < \delta$ (on the set where $|X| \le K$ and $|X_n| \le K$). Therefore,
\[
\big\{|g(X_n) - g(X)| > \varepsilon,\ |X| \le K,\ |X_n| \le K\big\} \subseteq \big\{|X_n - X| \ge \delta\big\}.
\]
Therefore, using $|X_n| \le |X| + |X_n - X|$ to bound $P(|X_n| > K) \le P(|X| > K/2) + P(|X_n - X| > K/2)$, we have
\[
P\big(|g(X_n) - g(X)| > \varepsilon\big) \le P\big(|X_n - X| \ge \delta\big) + P(|X| > K) + P(|X| > K/2) + P\big(|X_n - X| > K/2\big).
\]
Now, let $\gamma > 0$ be arbitrary; we can choose $K$ sufficiently large so that the second and third terms are both less than $\gamma/2$. Thus, for such $K$,
\[
P\big(|g(X_n) - g(X)| > \varepsilon\big) \le P\big(|X_n - X| \ge \delta\big) + P\big(|X_n - X| > K/2\big) + \gamma.
\]
Taking the limit as $n \to \infty$ and using the fact that $|X_n - X| \xrightarrow{P} 0$ gives
\[
\limsup_{n \to \infty} P\big(|g(X_n) - g(X)| > \varepsilon\big) \le \gamma,
\]
and since $\gamma > 0$ was arbitrary, the proof is complete.

Suppose $X_n = a + o_p(1)$; then, using the above result, we have for any continuous function $g$ that $g(X_n) = g(a) + o_p(1)$. Now, what if $g$ is also differentiable at $x = a$? In the next section, we discuss a probabilistic analogue of Taylor's theorem.

Remark 3.9. The following technical note is relevant in what follows. Let $\{X_n\}$ be a sequence of random variables such that $X_n = O_p(r_n)$, where $0 < r_n \to 0$. Then, $X_n = o_p(1)$; i.e., $X_n \xrightarrow{P} 0$. To show this, let $\delta > 0$ be fixed but arbitrary, and let $\varepsilon > 0$ be given. Since $X_n = O_p(r_n)$, there exists $M \in (0, \infty)$ such that $P(r_n^{-1} |X_n| > M) < \varepsilon$ for all $n$. Now, take $n_0$ sufficiently large so that $r_n M < \delta$ for $n \ge n_0$. Then, for $n \ge n_0$,
\[
P(|X_n| > \delta) \le P(|X_n| > M r_n) < \varepsilon,
\]
and hence the claim.

4 Probabilistic Taylor expansion

Proposition 4.1 (Proposition 6.1.5 in [2]). Let $\{X_n\}$ be a sequence of random variables such that $X_n = a + O_p(r_n)$, where $a \in \mathbb{R}$ and $0 < r_n \to 0$ as $n \to \infty$. If $g$ is a function with $s$ continuous derivatives at $a$, then
\[
g(X_n) = \sum_{j=0}^{s} \frac{g^{(j)}(a)}{j!} (X_n - a)^j + o_p(r_n^s).
\]

Proof. Define the function $h$ as follows:
\[
h(x) =
\begin{cases}
\dfrac{g(x) - \sum_{j=0}^{s} \frac{g^{(j)}(a)}{j!} (x - a)^j}{(x - a)^s / s!}, & x \ne a, \\[1ex]
0, & x = a.
\end{cases}
\]
The function $h$ defined above is continuous at $x = a$ (by repeated application of L'Hôpital's rule one can check that $\lim_{x \to a} h(x) = 0$). Therefore, since $X_n - a = O_p(r_n) = o_p(1)$ by Remark 3.9, we have $h(X_n) = h(a) + o_p(1)$, and since $h(a) = 0$, we have $h(X_n) = o_p(1)$. But then, since $(X_n - a)^s = O_p(r_n^s)$, Proposition 3.4(2) gives $(X_n - a)^s h(X_n) = o_p(r_n^s)$; that is,
\[
g(X_n) - \sum_{j=0}^{s} \frac{g^{(j)}(a)}{j!} (X_n - a)^j = o_p(r_n^s),
\]
which completes the proof.
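As a concrete illustration of Proposition 4.1 (an example added here for convenience; it is not part of [2]), take $g(x) = x^2$, $a = \mu$, $s = 1$, and $X_n = \mu + O_p(n^{-1/2})$ (for instance, a sample mean, as in Section 6). Then
\[
X_n^2 = \mu^2 + 2\mu (X_n - \mu) + o_p(n^{-1/2}),
\]
which can also be seen directly from the exact identity $X_n^2 = \mu^2 + 2\mu(X_n - \mu) + (X_n - \mu)^2$, since $(X_n - \mu)^2 = O_p(n^{-1}) = o_p(n^{-1/2})$ (using Remark 3.9).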
We can also state an analogue of the above result for random $k$-vectors; the proof is similar [2].

Proposition 4.2. Let $\{X_n\}$ be a sequence of random $k$-vectors such that $X_n = a + O_p(r_n)$, where $a \in \mathbb{R}^k$ and $0 < r_n \to 0$ as $n \to \infty$. If $g : \mathbb{R}^k \to \mathbb{R}$ is a function such that the partial derivatives $\partial g / \partial x_i$ are continuous in a neighborhood of $a$, then
\[
g(X_n) = g(a) + \nabla g(a) \cdot (X_n - a) + o_p(r_n).
\]

5 Convergence in distribution

When we talk about convergence in probability, we always talk about a sequence of random variables $X_1, X_2, \ldots$ defined on the same probability space. In this section, we discuss convergence in distribution, which makes sense even if the $X_i$ are not defined on the same probability space.

Definition 5.1. Let $\{X_n\}$ be a sequence of random $k$-vectors with distribution functions $\{F_{X_n}(\cdot)\}$. We say $X_n$ converges in distribution to $X$ if
\[
\lim_{n \to \infty} F_{X_n}(z) = F_X(z) \quad \text{for all } z \in C,
\]
where $C$ is the set of continuity points of $F_X$. We use the notation $X_n \xrightarrow{D} X$ or $F_{X_n} \xrightarrow{D} F_X$ to denote convergence in distribution.

Speaking in practical terms, $X_n \xrightarrow{D} X$ tells us that the distribution of $X_n$ is well approximated by the distribution of $X$ for large $n$. The following theorem [2] is an important result linking convergence in distribution to convergence of characteristic functions.

Theorem 5.2. Let $F_n$, $n = 0, 1, 2, \ldots$, be distribution functions on $\mathbb{R}^k$ corresponding to characteristic functions $\varphi_n(\xi) = \int_{\mathbb{R}^k} e^{i \xi \cdot x} \, dF_n(x)$, $n = 0, 1, 2, \ldots$. Then the following are equivalent:
1. $F_n \xrightarrow{D} F_0$;
2. $\int_{\mathbb{R}^k} g(x) \, dF_n(x) \to \int_{\mathbb{R}^k} g(x) \, dF_0(x)$ for every bounded continuous function $g$ on $\mathbb{R}^k$;
3. $\lim_{n \to \infty} \varphi_n(\xi) = \varphi_0(\xi)$ for every $\xi \in \mathbb{R}^k$.

Let us get some insight into the notion of convergence in distribution by proving the following simple result, which says that a sequence that converges in distribution is bounded in probability.

Lemma 5.3. Suppose $X_n \xrightarrow{D} X$. Then, $X_n = O_p(1)$.

Proof. Let $\varepsilon > 0$ be given. Choose $M_0$ sufficiently large so that $P(|X| > M_0) < \varepsilon$; since $F_X$ has at most countably many discontinuity points, we may also arrange that $\pm M_0$ are continuity points of $F_X$. By convergence in distribution, $P(|X_n| > M_0) \to P(|X| > M_0)$, so there exists some $n_0$ such that $P(|X_n| > M_0) < \varepsilon$ for $n \ge n_0$. Next, choose $M_1$ sufficiently large so that $P(|X_i| > M_1) < \varepsilon$ for $i = 1, \ldots, n_0 - 1$. Then, for $M = \max(M_0, M_1)$, we have $P(|X_n| > M) < \varepsilon$ for all $n$.

Proposition 5.4. Let $\{X_n\}$ be a sequence of random variables. Suppose $X_n \xrightarrow{P} X$. Then,
1. $E\big[|e^{itX_n} - e^{itX}|\big] \to 0$ as $n \to \infty$, for all $t \in \mathbb{R}$, and
2. $X_n \xrightarrow{D} X$.

Proof. Let $t \in \mathbb{R}$ and $\varepsilon > 0$ be fixed but arbitrary. Pick $\delta = \delta(\varepsilon) > 0$ such that
\[
|x - y| < \delta \implies |e^{itx} - e^{ity}| = |1 - e^{it(y - x)}| < \varepsilon.
\]
Then, we have
\[
E\big[|e^{itX_n} - e^{itX}|\big] = E\big[|1 - e^{it(X - X_n)}|\big]
= E\big[|1 - e^{it(X - X_n)}|\,\chi_{\{|X_n - X| < \delta\}}\big] + E\big[|1 - e^{it(X - X_n)}|\,\chi_{\{|X_n - X| \ge \delta\}}\big]
\le \varepsilon + 2 P\big(|X_n - X| \ge \delta\big).
\]
Therefore, taking the limit as $n \to \infty$ gives
\[
\limsup_{n \to \infty} E\big[|e^{itX_n} - e^{itX}|\big] \le \varepsilon,
\]
and since $\varepsilon > 0$ was arbitrary, the proof of the first statement of the proposition is complete.

To prove convergence in distribution, we show the convergence of the characteristic functions:
\[
|\varphi_{X_n}(t) - \varphi_X(t)| = \big|E[e^{itX_n}] - E[e^{itX}]\big| = \big|E[e^{itX_n} - e^{itX}]\big| \le E\big[|e^{itX_n} - e^{itX}|\big] \to 0, \quad \text{as } n \to \infty,
\]
where the last conclusion follows from the first statement of the proposition; this, together with Theorem 5.2, completes the proof.
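Before moving on, the following standard example (added here for illustration; it is not taken from [2]) shows why the converse of Proposition 5.4 cannot hold in general. Let $X \sim N(0, 1)$ and set $X_n = -X$ for all $n$. Since $-X$ has the same distribution as $X$, we have $F_{X_n} = F_X$ for every $n$, so trivially $X_n \xrightarrow{D} X$. However, $|X_n - X| = 2|X|$, so $P(|X_n - X| > \varepsilon) = P(|X| > \varepsilon/2)$ does not tend to zero, and hence $X_n$ does not converge in probability to $X$.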
Remark 5.5. By Proposition 5.4 we have that convergence in probability implies convergence in distribution. While the converse is not true in general, we mention the following important special case. Let $\{X_n\}$ be a sequence of random variables such that $X_n \xrightarrow{D} c$, where $c$ is a constant. Then, $X_n \xrightarrow{P} c$ also. To see this, let $\delta > 0$ be fixed but arbitrary, and note that
\[
P(|X_n - c| > \delta) \le P(X_n - c > \delta) + P(X_n - c \le -\delta) = 1 - F_{X_n}(c + \delta) + F_{X_n}(c - \delta) \to 0, \quad \text{as } n \to \infty.
\]

Proposition 5.6. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random variables such that $X_n - Y_n = o_p(1)$ and $X_n \xrightarrow{D} X$; then, $Y_n \xrightarrow{D} X$ also.

Proof. Again, we show the convergence of the characteristic functions. First, we claim that
\[
|\varphi_{Y_n}(t) - \varphi_{X_n}(t)| \to 0, \quad \text{as } n \to \infty, \quad \text{for all } t \in \mathbb{R}.
\]
This follows from
\[
|\varphi_{Y_n}(t) - \varphi_{X_n}(t)| = \big|E[e^{itY_n}] - E[e^{itX_n}]\big| = \big|E[e^{itY_n} - e^{itX_n}]\big| \le E\big[|e^{itY_n} - e^{itX_n}|\big] = E\big[|1 - e^{it(X_n - Y_n)}|\big] \to 0, \quad \text{as } n \to \infty,
\]
where the last conclusion follows from Proposition 5.4 (applied to the sequence $X_n - Y_n \xrightarrow{P} 0$). The rest of the proof is simple; for any $t \in \mathbb{R}$,
\[
|\varphi_{Y_n}(t) - \varphi_X(t)| \le |\varphi_{Y_n}(t) - \varphi_{X_n}(t)| + |\varphi_{X_n}(t) - \varphi_X(t)| \to 0, \quad \text{as } n \to \infty.
\]

The following continuity result is very useful.

Proposition 5.7. Let $\{X_n\}$ be a sequence of random variables such that $X_n \xrightarrow{D} X$. If $h : \mathbb{R} \to \mathbb{R}$ is a continuous function, then $h(X_n) \xrightarrow{D} h(X)$.

Proof. We will show $\varphi_{h(X_n)}(t) \to \varphi_{h(X)}(t)$ for all $t$. Note that
\[
\varphi_{h(X_n)}(t) = E\big[e^{ith(X_n)}\big] = \int_{\mathbb{R}} e^{ith(x)} \, dF_{X_n}(x).
\]
Now, for every $t \in \mathbb{R}$, the function $g(x) = e^{ith(x)}$ is a bounded and continuous function of $x$. Therefore, by Theorem 5.2(2) (applied to the real and imaginary parts of $g$), we have
\[
\int_{\mathbb{R}} e^{ith(x)} \, dF_{X_n}(x) \to \int_{\mathbb{R}} e^{ith(x)} \, dF_X(x),
\]
but the right-hand side of the above equation is none other than $\varphi_{h(X)}(t)$. Hence, we have shown that $\varphi_{h(X_n)}(t) \to \varphi_{h(X)}(t)$ for every $t$ and consequently, by Theorem 5.2, $h(X_n) \xrightarrow{D} h(X)$.

6 Central Limit Theorem and related results

Definition 6.1. A sequence $\{X_n\}$ of random variables is said to be asymptotically normal with mean $\mu_n$ and standard deviation $\sigma_n$ if $\sigma_n > 0$ for $n$ sufficiently large and
\[
\frac{X_n - \mu_n}{\sigma_n} \xrightarrow{D} Z, \qquad Z \sim N(0, 1).
\]
In this case we use the notation: $X_n$ is $AN(\mu_n, \sigma_n^2)$.

Remark 6.2 ([2]). If $X_n$ is $AN(\mu_n, \sigma_n^2)$, it is not necessarily the case that $E[X_n] = \mu_n$ and $\operatorname{Var}[X_n] = \sigma_n^2$.

The following Central Limit Theorem, which is stated without proof, is well known (I will at some point put a carefully written proof here). Elementary proofs of this theorem can be found in [3] or [4]; for more rigorous, measure-theoretic proofs see [1, 2, 5].

Theorem 6.3 (Central Limit Theorem). Let $\{X_n\}$ be a sequence of independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$ (notation: $\{X_n\} \sim \mathrm{IID}(\mu, \sigma^2)$). Define
\[
\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}.
\]
Then, $\bar{X}_n$ is $AN(\mu, \sigma^2/n)$.

The following result [2], known as the delta method in the statistics literature, is useful in applications.

Proposition 6.4. If $X_n$ is $AN(\mu, \sigma_n^2)$, where $\sigma_n \to 0$ as $n \to \infty$, and if $g$ is a function continuously differentiable at $\mu$ with $g'(\mu) \ne 0$, then
\[
g(X_n) \text{ is } AN\big(g(\mu), [g'(\mu)]^2 \sigma_n^2\big).
\]

Proof. Let $Z_n = (X_n - \mu)/\sigma_n$; we know that $Z_n \xrightarrow{D} Z \sim N(0, 1)$. Thus, by Lemma 5.3, $Z_n = O_p(1)$. Therefore, $X_n = \mu + O_p(\sigma_n)$. Then, by the Taylor formula given in Proposition 4.1, we have
\[
g(X_n) = g(\mu) + g'(\mu)(X_n - \mu) + o_p(\sigma_n).
\]
Therefore,
\[
\frac{g(X_n) - g(\mu)}{\sigma_n} = g'(\mu)\,\frac{X_n - \mu}{\sigma_n} + o_p(1).
\]
We rewrite the above equation as
\[
\frac{g(X_n) - g(\mu)}{g'(\mu)\,\sigma_n} = \frac{X_n - \mu}{\sigma_n} + o_p(1),
\]
and recall that $(X_n - \mu)/\sigma_n \xrightarrow{D} Z \sim N(0, 1)$; therefore, by Proposition 5.6, we have
\[
\frac{g(X_n) - g(\mu)}{g'(\mu)\,\sigma_n} \xrightarrow{D} Z \sim N(0, 1).
\]
That is, $g(X_n)$ is $AN\big(g(\mu), [g'(\mu)]^2 \sigma_n^2\big)$.
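To complement the statements above, the following is a minimal Monte Carlo sketch (not part of the original note) that illustrates Theorem 6.3 and Proposition 6.4 numerically. It assumes NumPy is available; the particular choices (Exponential(1) samples, $g(x) = x^2$, and the sample sizes) are arbitrary illustration parameters, not anything prescribed by [2].

# Monte Carlo sketch illustrating Theorem 6.3 (CLT) and Proposition 6.4 (delta method).
# Assumes NumPy is available; sample sizes and distributions are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

n = 400          # sample size for each replication
n_reps = 10_000  # number of Monte Carlo replications
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)

# Each row is one i.i.d. sample of size n; compute the n_reps sample means.
samples = rng.exponential(scale=1.0, size=(n_reps, n))
xbar = samples.mean(axis=1)

# CLT: sqrt(n) * (xbar - mu) / sigma should be approximately N(0, 1).
z_clt = np.sqrt(n) * (xbar - mu) / sigma

# Delta method with g(x) = x^2, so g'(mu) = 2*mu:
# sqrt(n) * (g(xbar) - g(mu)) / (|g'(mu)| * sigma) should also be approximately N(0, 1).
z_delta = np.sqrt(n) * (xbar**2 - mu**2) / (abs(2 * mu) * sigma)

for name, z in [("CLT", z_clt), ("delta method", z_delta)]:
    print(f"{name}: mean ~ {z.mean():.3f}, std ~ {z.std():.3f}, "
          f"P(Z <= 1.96) ~ {(z <= 1.96).mean():.3f} (standard normal value ~ 0.975)")

Running this sketch, both standardized quantities should have sample mean close to 0, sample standard deviation close to 1, and tail probabilities close to the standard normal values, consistent with the asymptotic normality asserted above.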
References

[1] L. Breiman, Probability, SIAM, 1992.
[2] P. Brockwell and R. Davis, Time Series: Theory and Methods, Springer, 1991.
[3] P. G. Hoel, S. C. Port, and C. J. Stone, Introduction to Probability Theory, Houghton Mifflin Company, 1971.
[4] S. Ross, A First Course in Probability, 2nd ed., Macmillan Publishing Company, 1984.
[5] D. Williams, Probability with Martingales, Cambridge University Press, 1991.