BCAM June 2013

Weak convergence in Probability Theory
A summer excursion!
Day 1

Armand M. Makowski
ECE & ISR/HyNet
University of Maryland at College Park

Day 1: Basic definitions of convergence for random variables will be reviewed, together with criteria and counter-examples.
Day 2: Skorokhod’s Theorem and coupling – Examples in queueing theory, in the theory of Markov chains and time series analysis.
Day 3: Poisson convergence: The Stein-Chen method with applications to problems in the theory of random graphs.
Day 4: Weak convergence in function spaces – Prohorov’s Theorem and sequential compactness
Day 5: An illustration: From random walks to Brownian motion

A very short bibliography

A. D. Barbour and L. Holst, “Some applications of the Stein-Chen method for proving Poisson convergence,” Advances in Applied Probability 21 (1989), pp. 74-90.
A. D. Barbour, L. Holst and S. Janson, Poisson Approximation, Oxford Studies in Probability 2, Oxford University Press, Oxford (UK), 1992.
P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York (NY), 1968.
P. Billingsley, Probability and Measure, Third Edition, Wiley Series in Probability and Statistics, John Wiley & Sons, New York (NY), 1995.

CONVERGENCE IN $\mathbb{R}^d$: BASIC FACTS

A sequence $a : \mathbb{N}_0 \to \mathbb{R}$, often described as $\{a_n, n = 1, 2, \ldots\}$, converges to some $a$ in $\mathbb{R}$ if for every $\varepsilon > 0$, there exists $n^\star(\varepsilon)$ such that
$$|a_n - a| \le \varepsilon, \quad n \ge n^\star(\varepsilon)$$
We write
$$\lim_{n\to\infty} a_n = a \quad \text{or} \quad a_n \to a$$
This definition contains two basic questions:
• Existence – Does it converge?
• Value – Find the limiting value!
What happens if $a = \pm\infty$?

Existence
Two basic ideas
Every monotone sequence converges (possibly to $\pm\infty$)!
Bolzano-Weierstrass Theorem: Every bounded sequence contains at least one convergent subsequence!

Liminf/Limsup
Given a sequence $a : \mathbb{N}_0 \to \mathbb{R}$, define
$$\limsup_{n\to\infty} a_n = \inf_{n \ge 1} \overline{a}_n \quad \text{with} \quad \overline{a}_n = \sup_{m \ge n} a_m$$
and
$$\liminf_{n\to\infty} a_n = \sup_{n \ge 1} \underline{a}_n \quad \text{with} \quad \underline{a}_n = \inf_{m \ge n} a_m$$
$\limsup_{n\to\infty} a_n$ = Largest accumulation point of the sequence
and
$\liminf_{n\to\infty} a_n$ = Smallest accumulation point of the sequence

The tail quantities are monotone:
$$\underline{a}_n \uparrow \liminf_{n\to\infty} a_n \quad \text{and} \quad \overline{a}_n \downarrow \limsup_{n\to\infty} a_n$$
$$\liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n$$
Fact: The sequence $a : \mathbb{N}_0 \to \mathbb{R}$ converges if and only if
$$\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n \equiv \lim_{n\to\infty} a_n$$

A sequence $a : \mathbb{N}_0 \to \mathbb{R}$ is said to be Cauchy if for every $\varepsilon > 0$, there exists $n^\star(\varepsilon)$ such that
$$|a_n - a_m| \le \varepsilon, \quad m, n \ge n^\star(\varepsilon)$$
Fact: A sequence $a : \mathbb{N}_0 \to \mathbb{R}$ converges if and only if it is Cauchy – $\mathbb{R}$ is complete under its usual topology

MODES OF CONVERGENCE FOR RANDOM VARIABLES

Random variables
Given a probability triple $(\Omega, \mathcal{F}, \mathbb{P})$, a $d$-dimensional random variable (rv) is a measurable mapping $X : \Omega \to \mathbb{R}^d$, i.e.,
$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}, \quad B \in \mathcal{B}(\mathbb{R}^d)$$
Two viewpoints
• A mapping
• A probability distribution function (i.e., measure induced on $\mathcal{B}(\mathbb{R}^d)$)
$$F : \mathbb{R}^d \to [0, 1] : x \to F(x) \equiv \mathbb{P}[X \le x]$$
Several modes of convergence with many subtleties!

An obvious definition . . .
Consider a collection $\{X; X_n, n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. Then, we could say convergence takes place to $X$ if
$$\lim_{n\to\infty} X_n(\omega) = X(\omega), \quad \omega \in \Omega$$
Why not?
• Too strong
• Modeling information: Often only the corresponding probability distributions $\{F_n, n = 1, 2, \ldots\}$ are available

Four basic modes of convergence
• Convergence in distribution (in law) – Weak convergence
• Convergence in the $r$th mean ($r \ge 1$)
• Convergence in probability
• Convergence with probability one (w.p. 1)
Requirements
• Consistency with usual convergence for deterministic sequences
• Subsequence principle

A programme
• Easy-to-use criteria
• Relationships between modes of convergence
• Impact of (continuous) transformations
• Key limit theorems of Probability Theory
With $r \ge 1$,
$$\|x\|_r = \left( \sum_{k=1}^{d} |x_k|^r \right)^{1/r}, \quad x = (x_1, \ldots, x_d) \in \mathbb{R}^d$$

Convergence with probability one
Consider a collection $\{X; X_n, n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. We say that the sequence $\{X_n, n = 1, 2, \ldots\}$ converges almost surely (a.s.) (or with probability one (w.p. 1)) to the rv $X$ if
$$\mathbb{P}\left[ \omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega) \right] = 1$$
We write
$$\lim_{n\to\infty} X_n = X \quad \text{a.s.}$$

Convergence in probability
Consider a collection $\{X; X_n, n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. We say that the sequence $\{X_n, n = 1, 2, \ldots\}$ converges in probability to the rv $X$ if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} \mathbb{P}[\|X_n - X\|_2 > \varepsilon] = 0.$$
This is often written
$$X_n \xrightarrow{P}_n X$$
For $d = 1$,
$$\lim_{n\to\infty} \mathbb{P}[|X_n - X| > \varepsilon] = 0.$$

Convergence in the $r$th mean ($r \ge 1$)
Consider a collection $\{X; X_n, n = 1, 2, \ldots\}$ of $\mathbb{R}^d$-valued rvs all defined on the same probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. We say that the sequence $\{X_n, n = 1, 2, \ldots\}$ converges in the $r$th mean to the rv $X$ if
$$\mathbb{E}[(\|X_n\|_r)^r] < \infty, \quad n = 1, 2, \ldots \quad \text{and} \quad \mathbb{E}[(\|X\|_r)^r] < \infty$$
and
$$\lim_{n\to\infty} \mathbb{E}[(\|X_n - X\|_r)^r] = 0.$$
This is often written
$$X_n \xrightarrow{r}_n X$$

Convergence in distribution
Also known as distributional convergence, convergence in law and weak convergence. Multiple equivalent definitions are available.
A sequence of probability distribution functions $\{F_n, n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ converges weakly to the probability distribution function $F$ on $\mathbb{R}^d$, written $F_n \Longrightarrow_n F$, if
$$\lim_{n\to\infty} F_n(x) = F(x), \quad x \in C_F$$
where $C_F$ denotes the continuity set of $F$, i.e.,
$$C_F := \{x \in \mathbb{R}^d : F \text{ is continuous at } x\}$$

The definition is sometimes given in the following form and setting: Consider $\mathbb{R}^d$-valued rvs $\{X, X_n, n = 1, 2, \ldots\}$ where for each $n = 1, 2, \ldots$, the rv $X_n$ is defined on some probability triple $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$ and the rv $X$ is defined on some probability triple $(\Omega, \mathcal{F}, \mathbb{P})$. (The probability triples may or may not be distinct!)
The sequence of rvs $\{X_n, n = 1, 2, \ldots\}$ converges in distribution to the rv $X$ if $F_n \Longrightarrow_n F$ where
$$F_n(x) = \mathbb{P}_n[X_n \le x] \quad \text{and} \quad F(x) = \mathbb{P}[X \le x], \quad x \in \mathbb{R}^d, \ n = 1, 2, \ldots$$
In that case we write $X_n \Longrightarrow_n X$.

Why this definition?
Consider the two sequences
$$X_n = \frac{1}{n} \quad \text{and} \quad Y_n = -\frac{1}{n}, \quad n = 1, 2, \ldots$$
By the consistency requirement, one should have $X_n \Longrightarrow_n 0$ and $Y_n \Longrightarrow_n 0$. BUT,
$$\lim_{n\to\infty} F_{X_n}(x) = \lim_{n\to\infty} F_{Y_n}(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases}$$
with $\lim_{n\to\infty} F_{X_n}(0) = 0$ and $\lim_{n\to\infty} F_{Y_n}(0) = 1$.
“The limit of a distribution is not always a distribution”

RELATIONSHIPS BETWEEN MODES OF CONVERGENCE

Fact: Almost sure convergence implies convergence in probability
With $\varepsilon > 0$,
$$[X_n \text{ converges to } X] \subseteq \cup_{n=1}^{\infty} B_n(\varepsilon)$$
with monotone increasing events
$$B_n(\varepsilon) \equiv \cap_{m=n}^{\infty} [|X_m - X| \le \varepsilon], \quad n = 1, 2, \ldots$$
Therefore,
$$\mathbb{P}[X_n \text{ converges to } X] \le \lim_{n\to\infty} \mathbb{P}[B_n(\varepsilon)]$$
by monotonicity!

If $\mathbb{P}[X_n \text{ converges to } X] = 1$, then $\lim_{n\to\infty} \mathbb{P}[B_n(\varepsilon)] = 1$ becomes
$$0 = \lim_{n\to\infty} \mathbb{P}[B_n(\varepsilon)^c] = \lim_{n\to\infty} \mathbb{P}\left[ \cup_{m=n}^{\infty} [|X_m - X| > \varepsilon] \right]$$
by complementarity. Since $[|X_n - X| > \varepsilon]$ is contained in $B_n(\varepsilon)^c$ for each $n$, it follows that
$$\lim_{n\to\infty} \mathbb{P}[|X_n - X| > \varepsilon] = 0$$
The converse is not true, as seen through standard counterexamples.

Partial converse – If the sequence $\{X_n, n = 1, 2, \ldots\}$ converges in probability to the rv $X$, then there exists a sequence $\nu : \mathbb{N}_0 \to \mathbb{N}_0$ with
$$\nu_k < \nu_{k+1}, \quad k = 1, 2, \ldots$$
(whence $\lim_{k\to\infty} \nu_k = \infty$) such that
$$\lim_{k\to\infty} X_{\nu_k} = X \quad \text{a.s.}$$
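
One standard counterexample of this kind is the so-called typewriter sequence; it is not taken from the slides, and the function names below are illustrative. Take $\Omega = [0, 1)$ with Lebesgue measure and, writing $n = 2^k + j$ with $0 \le j < 2^k$, let $X_n$ be the indicator of the interval $[j/2^k, (j+1)/2^k)$. Then $\mathbb{P}[X_n \ne 0] = 2^{-k} \to 0$, so $X_n \to 0$ in probability, yet for every $\omega$ the value $X_n(\omega) = 1$ occurs once in each block, so $X_n(\omega)$ does not converge. Along $\nu_k = 2^k$ the subsequence does converge to 0 for every $\omega > 0$, in line with the partial converse. A minimal Python sketch:

```python
def typewriter(n: int, omega: float) -> int:
    """X_n(omega) for the typewriter sequence on [0, 1); requires n >= 1."""
    k = n.bit_length() - 1          # block index: 2**k <= n < 2**(k + 1)
    j = n - (1 << k)                # position of n inside block k
    width = 1.0 / (1 << k)          # length of each interval in block k
    return 1 if j * width <= omega < (j + 1) * width else 0


def prob_nonzero(n: int) -> float:
    """P[X_n != 0] under Lebesgue measure, i.e. the interval length."""
    return 1.0 / (1 << (n.bit_length() - 1))


def hits_per_block(omega: float, blocks: int) -> list:
    """For k = 0..blocks-1, count the n in block k with X_n(omega) = 1."""
    return [
        sum(typewriter(n, omega) for n in range(1 << k, 1 << (k + 1)))
        for k in range(blocks)
    ]


def subsequence(omega: float, terms: int) -> list:
    """X_{2**k}(omega) for k = 0..terms-1 (the a.s. convergent subsequence)."""
    return [typewriter(1 << k, omega) for k in range(terms)]
```

Every block contributes exactly one index with $X_n(\omega) = 1$, which is why the full sequence keeps returning to 1, while the subsequence along powers of two is eventually 0 for any fixed $\omega > 0$.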
In short, any sequence convergent in probability contains a deterministic subsequence which converges a.s. (to the same limit).

Fact: Convergence in the $r$th mean implies convergence in probability
By Markov’s inequality,
$$\mathbb{P}[|X_n - X| > \varepsilon] = \mathbb{P}[|X_n - X|^r > \varepsilon^r] \le \varepsilon^{-r} \, \mathbb{E}[|X_n - X|^r], \quad r > 0, \ \varepsilon > 0, \ n = 1, 2, \ldots$$
The converse is not true without additional conditions, e.g., with $\alpha > 0$,
$$X_n = \begin{cases} 0 & \text{with probability } 1 - n^{-\alpha} \\ n & \text{with probability } n^{-\alpha} \end{cases}$$
Here $X_n \to 0$ in probability, yet $\mathbb{E}[|X_n|^r] = n^{r - \alpha}$ does not converge to 0 whenever $r \ge \alpha$.

Fact: Convergence in probability implies convergence in distribution
For each $n = 1, 2, \ldots$ and $\varepsilon > 0$, we have
$$\mathbb{P}[X_n \le x] \le \mathbb{P}[X \le x + \varepsilon] + \mathbb{P}[|X_n - X| \ge \varepsilon]$$
and
$$\mathbb{P}[X \le x - \varepsilon] \le \mathbb{P}[X_n \le x] + \mathbb{P}[|X_n - X| \ge \varepsilon]$$
Thus,
$$\limsup_{n\to\infty} \mathbb{P}[X_n \le x] \le \mathbb{P}[X \le x + \varepsilon]$$
and
$$\mathbb{P}[X \le x - \varepsilon] \le \liminf_{n\to\infty} \mathbb{P}[X_n \le x]$$
Finally let $\varepsilon \downarrow 0$ and use the fact that $x$ is taken to be a point of continuity of the probability distribution of $X$.

The converse is not true! With $Z \sim N(0, 1)$, take $X_n = (-1)^n Z$ for each $n = 1, 2, \ldots$. Obviously, $X_n \Longrightarrow_n Z$ but
$$|X_n - Z| = |1 - (-1)^n| \, |Z| = \begin{cases} 0 & \text{if } n \text{ even} \\ 2|Z| & \text{if } n \text{ odd} \end{cases}$$

Partial converse – If the sequence $\{X_n, n = 1, 2, \ldots\}$ converges in distribution to the a.s. constant rv $c$, then
$$X_n \xrightarrow{P}_n c$$
Every sequence converging in distribution to a constant converges to it in probability!
Indeed, for each $n = 1, 2, \ldots$ and $\varepsilon > 0$, we have
$$\mathbb{P}[|X_n - c| \le \varepsilon] = \mathbb{P}[X_n \le c + \varepsilon] - \mathbb{P}[X_n < c - \varepsilon]$$
Since $c + \varepsilon$ and $c - \varepsilon$ are continuity points of the limiting distribution function, the right-hand side converges to $1 - 0 = 1$.

BEWARE! WEAK CONVERGENCE IS INDEED WEAK

Consider the two sequences of rvs $\{X, X_n, n = 1, 2, \ldots\}$ and $\{Y, Y_n, n = 1, 2, \ldots\}$ where for each $n = 1, 2, \ldots$, the pair of rvs $X_n$ and $Y_n$ are defined on the same probability triple $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$. Assume that
$$X_n \Longrightarrow_n X \quad \text{and} \quad Y_n \Longrightarrow_n Y.$$
There are important ways in which weak convergence differs from the other modes of convergence.

Convergence under transformation: Is it true that
$$h(X_n) \Longrightarrow_n h(X)$$
with $h : \mathbb{R}^d \to \mathbb{R}^p$? If not always, then under what conditions?
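
To see why the answer cannot be an unconditional yes, take the deterministic sequence $X_n = 1/n$, which converges in distribution to $X = 0$, together with the discontinuous map $h(x) = 1$ if $x > 0$ and $h(x) = 0$ otherwise. Then $h(X_n) = 1$ for every $n$ while $h(X) = 0$, so $h(X_n)$ does not converge in distribution to $h(X)$. This example is not from the slides; the Python sketch below uses illustrative names only.

```python
def h(x: float) -> float:
    """Indicator of (0, infinity): discontinuous exactly at the limit point 0."""
    return 1.0 if x > 0 else 0.0


def transformed_terms(count: int) -> list:
    """h(X_n) for the deterministic sequence X_n = 1/n, n = 1..count."""
    return [h(1.0 / n) for n in range(1, count + 1)]


# Every transformed term equals 1.0 ...
values_seen = set(transformed_terms(1000))
# ... whereas h applied to the limit X = 0 equals 0.0.
transformed_limit = h(0.0)
```

The mismatch between `values_seen` and `transformed_limit` comes entirely from the jump of `h` at the point where the limit distribution puts its mass.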
Fact: We have $h(X_n) \Longrightarrow_n h(X)$ if $h : \mathbb{R}^d \to \mathbb{R}^p$ is continuous
This is not easy to show from the basic definition because there is no more “pointwise convergence” to work with – Skorokhod to the rescue!

Joint convergence: Is it true that
$$(X_n, Y_n) \Longrightarrow_n (X, Y)?$$
In general no: Take $Z \sim N(0, 1)$, $X_n = Z$ and $Y_n = (-1)^n Z$, so that
$$\mathbb{P}[X_n \le x, Y_n \le y] = \begin{cases} \mathbb{P}[Z \le \min(x, y)] & \text{if } n \text{ even} \\ \mathbb{P}[-y \le Z \le x] & \text{if } n \text{ odd} \end{cases}$$
Fact: We have $(X_n, Y_n) \Longrightarrow_n (X, Y)$ if for each $n = 1, 2, \ldots$, the rvs $X_n$ and $Y_n$ are independent, in which case $X$ and $Y$ are independent.

Convergence of sums: Is it true that
$$X_n + Y_n \Longrightarrow_n X + Y?$$
In general no: Take $Z \sim N(0, 1)$, $X_n = Z$ and $Y_n = (-1)^n Z$, so that
$$X_n + Y_n = (1 + (-1)^n) Z$$
Fact: We have $X_n + Y_n \Longrightarrow_n X + Y$ if for each $n = 1, 2, \ldots$, the rvs $X_n$ and $Y_n$ are independent!

Question: You know that
$$X_n \xrightarrow{P}_n X \quad \text{and} \quad Y_n \xrightarrow{P}_n Y$$
Convergence of sums: Is it true that
$$X_n + Y_n \xrightarrow{P}_n X + Y?$$
Yes, because for each $n = 1, 2, \ldots$, the event $[|(X_n + Y_n) - (X + Y)| > \varepsilon]$ is contained in
$$\left[ |X_n - X| > \frac{\varepsilon}{2} \right] \cup \left[ |Y_n - Y| > \frac{\varepsilon}{2} \right]$$

What if only
$$X_n \xrightarrow{P}_n X \quad \text{and} \quad Y_n \Longrightarrow_n Y?$$
Counterexample: With $Z \sim N(0, 1)$, set
$$X_n = Z \quad \text{and} \quad Y_n = (-1)^n Z, \quad n = 1, 2, \ldots$$
so that
$$X_n + Y_n = (1 + (-1)^n) Z, \quad n = 1, 2, \ldots$$
It is plain that $X_n \xrightarrow{P}_n Z$ and $Y_n \Longrightarrow_n Z$, but the convergence $X_n + Y_n \Longrightarrow_n X + Y$ does not hold, hence $X_n + Y_n \xrightarrow{P}_n X + Y$ fails as well!

TIGHTNESS

Tightness
The $\mathbb{R}^d$-valued rvs $\{X_n, n = 1, 2, \ldots\}$ (or equivalently, their probability distribution functions $\{F_n, n = 1, 2, \ldots\}$) are tight if for every $\varepsilon > 0$, there exists a compact subset $K_\varepsilon \subseteq \mathbb{R}^d$ such that
$$\inf_{n = 1, 2, \ldots} \mathbb{P}[X_n \in K_\varepsilon] \ge 1 - \varepsilon$$
or equivalently, by complementarity,
$$\sup_{n = 1, 2, \ldots} \mathbb{P}[X_n \notin K_\varepsilon] \le \varepsilon$$

An easy criterion
Fact: Tightness holds if for some $p \ge 1$, we have
$$B = \sup_{n = 1, 2, \ldots} \mathbb{E}[|X_n|^p] < \infty$$
(Proof for $d = 1$) By Markov’s inequality,
$$\mathbb{P}[|X_n| > c] \le \frac{\mathbb{E}[|X_n|^p]}{c^p} \le \frac{B}{c^p}, \quad c > 0, \ n = 1, 2, \ldots$$
and note that $K_c = [-c, c]$ is a compact subset of $\mathbb{R}$.

Fact: Every probability distribution function $F$ on $\mathbb{R}^d$ is tight.
By monotone continuity of probability measures and the fact that $\mathbb{R}^d$ is $\sigma$-compact:
$$1 = \mathbb{P}[X \in \mathbb{R}^d] = \mathbb{P}\left[ \cup_{n=1}^{\infty} [X \in B(0, n)] \right] = \lim_{n\to\infty} \mathbb{P}[X \in B(0, n)]$$
with $B(0, n)$ the closed (hence compact) ball of radius $n$ centered at the origin.
Fact: If the sequence of probability distribution functions $\{F_n, n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ converges weakly to the probability distribution function $F$ on $\mathbb{R}^d$, then the collection $\{F_n, n = 1, 2, \ldots\}$ is tight

(For $d = 1$) Fix $x$ and $y$ in $C_F$ such that $y < 0 < x$. For each $\delta > 0$, there exists a finite integer $n^\star = n^\star(x, y; \delta)$ such that
$$F(x) - \delta \le F_n(x) \le F(x) + \delta, \quad n \ge n^\star$$
and
$$F(y) - \delta \le F_n(y) \le F(y) + \delta, \quad n \ge n^\star$$
Consequently,
$$\mathbb{P}[X_n > x] \le \mathbb{P}[X > x] + \delta, \quad n \ge n^\star$$
and
$$\mathbb{P}[X_n \le y] \le \mathbb{P}[X \le y] + \delta, \quad n \ge n^\star$$
Thus,
$$\mathbb{P}[X_n \notin [y, x]] \le \mathbb{P}[X > x] + \mathbb{P}[X \le y] + 2\delta, \quad n \ge n^\star$$

Now take $x$ in $C_F$ sufficiently large, say $x = x(\delta)$, such that
$$\mathbb{P}[X > x] \le \delta$$
Similarly take $y$ in $C_F$ with $|y|$ sufficiently large, say $y = y(\delta)$, such that
$$\mathbb{P}[X \le y] \le \delta$$
With this choice,
$$\mathbb{P}[X_n \notin [y, x]] \le 4\delta, \quad n \ge n(\delta)$$
with $n(\delta) = n^\star(x(\delta), y(\delta); \delta)$. The finitely many rvs $X_n$ with $n < n(\delta)$ are each tight by the previous fact, so enlarging the interval $[y, x]$ if necessary takes care of them as well.

By Prohorov’s Theorem,
Tightness = Sequential precompactness (with respect to weak convergence)
Remember Bolzano-Weierstrass!

ANALYTIC VIEW OF WEAK CONVERGENCE

Basic idea
Transform of a probability distribution: With any probability distribution $F : \mathbb{R}^d \to [0, 1]$, we associate its transform/related quantity
$$T(F) : \mathbb{R}^d \to \mathbb{C}$$
Many such transforms are available: Characteristic functions (general applicability), moment generating functions, Laplace-Stieltjes transforms (non-negative rvs), z-transforms ($\mathbb{N}$-valued rvs), etc.

Uniqueness requirement: With $F$ and $G$ probability distributions on $\mathbb{R}^d$,
$$T(F) = T(G) \quad \text{if and only if} \quad F = G$$
Desired result: The sequence of probability distribution functions $\{F_n, n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ converges weakly to the probability distribution function $F$ on $\mathbb{R}^d$ if and only if
$$\lim_{n\to\infty} T(F_n)(t) = T(F)(t), \quad t \in \mathbb{R}^d$$

Characteristic functions
With $\mathbb{R}^d$-valued rv $X = (X_1, \ldots, X_d)'$, its characteristic function $\Phi_X : \mathbb{R}^d \to \mathbb{C}$ is given by
$$\Phi_X(t) = \mathbb{E}\left[ e^{i t' X} \right], \quad t \in \mathbb{R}^d$$
Also $\Phi_F = \Phi_X$ where $X \sim F$
Uniqueness: With $F$ and $G$ probability distributions on $\mathbb{R}^d$,
$$\Phi_F = \Phi_G \quad \text{if and only if} \quad F = G$$

Fact: With $\mathbb{R}^d$-valued rv $X = (X_1, \ldots, X_d)'$, its characteristic function $\Phi_X : \mathbb{R}^d \to \mathbb{C}$ satisfies the following properties:
• Bounded:
$$|\Phi_X(t)| \le \Phi_X(0) = 1, \quad t \in \mathbb{R}^d$$
• Uniformly continuous on $\mathbb{R}^d$:
$$\lim_{h \to 0} \sup_{t \in \mathbb{R}^d} |\Phi_X(t + h) - \Phi_X(t)| = 0$$
• Positive definiteness: For every $n = 1, 2, \ldots$, every $t_1, \ldots, t_n$ in $\mathbb{R}^d$ and every $z_1, \ldots, z_n$ in $\mathbb{C}$,
$$\sum_{k=1}^{n} \sum_{\ell=1}^{n} \Phi_X(t_k - t_\ell) z_k z_\ell^\star \ge 0$$

The Bochner-Herglotz Theorem
Theorem 1 Consider a function $\Phi : \mathbb{R}^d \to \mathbb{C}$. It is the characteristic function of some $\mathbb{R}^d$-valued rv $X$ (equivalently, of some probability distribution $F : \mathbb{R}^d \to [0, 1]$) if and only if it is positive definite, continuous at the origin, and satisfies $\Phi(0) = 1$.
A beautiful characterization

Fact: The sequence of probability distribution functions $\{F_n, n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ converges weakly to the probability distribution function $F$ on $\mathbb{R}^d$ if and only if
$$\lim_{n\to\infty} \Phi_{F_n}(t) = \Phi_F(t), \quad t \in \mathbb{R}^d$$
Useful analytic characterizations of weak convergence via characteristic functions
A natural idea: Look for the limiting behavior of characteristic functions
$$\lim_{n\to\infty} \Phi_{F_n}(t) = \lim_{n\to\infty} \mathbb{E}\left[ e^{i t' X_n} \right], \quad t \in \mathbb{R}^d$$
Beware: While this limit may exist, it is not always a characteristic function.

Fact: Consider a sequence of probability distribution functions $\{F_n, n = 1, 2, \ldots\}$ on $\mathbb{R}^d$ such that the limits
$$\lim_{n\to\infty} \Phi_{F_n}(t) = \Phi(t), \quad t \in \mathbb{R}^d$$
exist. If $\Phi : \mathbb{R}^d \to \mathbb{C}$ is continuous at $t = 0$, then it is the characteristic function of a probability distribution function $F$ on $\mathbb{R}^d$ and $F_n \Longrightarrow_n F$.
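
The continuity requirement at $t = 0$ cannot be dropped. As an illustration that is not in the slides, take $d = 1$ and $X_n \sim N(0, n)$, whose characteristic function is $\Phi_n(t) = e^{-n t^2 / 2}$. The pointwise limit exists for every $t$ but equals 1 at $t = 0$ and 0 elsewhere; it is discontinuous at the origin, hence not a characteristic function, and the sequence is not tight. A small Python sketch (the function name is hypothetical):

```python
import math


def cf_normal_variance_n(t: float, n: int) -> float:
    """Characteristic function of N(0, n) evaluated at t: exp(-n * t**2 / 2).

    It is real-valued because the distribution is symmetric about 0.
    """
    return math.exp(-0.5 * n * t * t)


# At t = 0 the value stays at 1 for every n; at any fixed t != 0 it decays to 0.
for n in (1, 100, 10_000):
    print(n, cf_normal_variance_n(0.0, n), cf_normal_variance_n(0.1, n))
```

The printed columns show the value at the origin pinned at 1 while the value at $t = 0.1$ collapses toward 0 as $n$ grows, which is the discontinuity of the limit at work.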
The preceding Fact is a consequence of the Bochner-Herglotz Theorem because the positive definiteness of $\Phi$ and the requirement $\Phi(0) = 1$ are automatically inherited through the limiting process.

Applications
• Sums of independent rvs – WLLNs and CLT
• Joint distribution of independent components
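
As a sketch of the first application (the example and all names are illustrative, not from the slides): let $\xi_1, \xi_2, \ldots$ be independent rvs with $\mathbb{P}[\xi_k = 1] = \mathbb{P}[\xi_k = -1] = 1/2$ and set $S_n = (\xi_1 + \cdots + \xi_n)/\sqrt{n}$. By independence, $\Phi_{S_n}(t) = \cos(t/\sqrt{n})^n$, which converges to $e^{-t^2/2}$, the characteristic function of $N(0, 1)$, so that $S_n \Longrightarrow_n N(0, 1)$ by the characteristic function criterion. The convergence can be checked numerically:

```python
import math


def cf_normalized_rademacher_sum(t: float, n: int) -> float:
    """Characteristic function of S_n at t, namely cos(t / sqrt(n)) ** n."""
    return math.cos(t / math.sqrt(n)) ** n


def cf_standard_normal(t: float) -> float:
    """Characteristic function of N(0, 1) at t, namely exp(-t**2 / 2)."""
    return math.exp(-0.5 * t * t)


def max_gap(n: int, grid: list) -> float:
    """Largest absolute difference between the two functions over the grid."""
    return max(
        abs(cf_normalized_rademacher_sum(t, n) - cf_standard_normal(t))
        for t in grid
    )


# Evaluate on t in [-3, 3]; the gap shrinks as n grows.
grid = [k / 10 for k in range(-30, 31)]
for n in (10, 1_000, 1_000_000):
    print(n, max_gap(n, grid))
```

A grid check of this kind only illustrates pointwise convergence on a finite set of $t$ values; the limit theorem itself rests on the analytic argument.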