A PROOF USING DERIVED DISTRIBUTIONS
OF THE LINDEBERG CENTRAL LIMIT THEOREM
by
Walter L. Smith
Department of Statistics
University of North Carolina
Chapel Hill, North Carolina, 27514
Key Words: Central Limit Theorem, Characteristic Functions.
This research is supported by Office of Naval Research Grant No. N00014-83K-0352.
SUMMARY
A short discussion of the ideas of "derived" distributions precedes a
proof of the Lindeberg Central Limit Theorem.
This proof is short, carries
the necessity and sufficiency parts together, and renders their relationship
clear.
The derived distribution approach is then utilized to construct some
illustrative examples of behavior differing from the familiar normal limit
which occurs when the Lindeberg conditions are fully satisfied.
§1. Introduction.
The Lindeberg Central Limit Theorem is one of a few
theorems that should, and typically do, constitute the major landmarks, as it
were, of a serious first graduate course in Probability Theory.
Most of the
published proofs of this important theorem involve Fourier analysis in that
they employ characteristic functions.
Two noteworthy exceptions are the proof of Trotter (1959) and the proof due to
Petrovsky and Kolmogorov (for the somewhat restricted case of identically
distributed lattice random variables, and therefore hardly meriting to be
referred to as the "Lindeberg" theorem), given in Khinchin (1948) and presented
very clearly, in English, in Rosenblatt (1974).
However, the majority of the published proofs we have seen, including those in
a number of much-used "standard" texts, essentially involve two distinct
proofs, each somewhat intricate; one deals with the sufficiency part of the
proof, the other with the necessity part.
It is many years since I introduced into the first graduate course on
probability theory at Chapel Hill a proof of the Lindeberg Theorem which makes
use of so-called "derived" distributions; this proof is strikingly
straightforward, the necessity and sufficiency parts of the theorem are
established simultaneously, and their inter-relationship is rendered
transparent.
Professional colleagues at various academic institutions have long urged
publication of this proof because it could, at the very least, be useful to
others faced with the teaching of the Lindeberg Theorem, and more than
that, it may provide, even to those not engaged in teaching, a new insight
into that theorem.
Having recently found a way to avoid one unattractive
feature of the "Chapel Hill" proof as it has been taught for these many
years (the use of the logarithm with all the ambiguities raised by that multivalued function), I am finally persuaded to present this proof to a wider
audience.
§2. Conditions for the theorem.
Let $X_1, X_2, \ldots$ be an infinite sequence of mutually independent random
variables with finite means and variances. With no loss of generality and some
gain in ease of discussion we may suppose that $\mathcal{E}X_n = 0$ for all $n$;
we shall write $\sigma_n^2$ for the variance of $X_n$ and set

(2.1)    $s_n^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$

for the variance of $S_n = X_1 + X_2 + \cdots + X_n$. Let us also write
$F_n(x) = P\{X_n \le x\}$ for the distribution function of $X_n$. Finally make
the assumption of asymptotic negligibility:
(2.2)    $\displaystyle \max_{1 \le j \le n} \sigma_j^2 / s_n^2 \to 0$, as $n \to \infty$.
Let us refer to all the assumptions described in the preceding
paragraph as the Global Hypothesis.
We may now introduce the famous
Lindeberg condition:
Condition (LC): For every fixed $\epsilon > 0$,

(2.3)    $\displaystyle \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x| > \epsilon s_n} x^2 \, F_j(dx) \to 0$, as $n \to \infty$.
Lindeberg (1922) established the following:

THEOREM 1   Let the global hypothesis be taken to hold; then, in order for
$S_n/s_n$ to be asymptotically $N(0,1)$ as $n \to \infty$, it is necessary and
sufficient that Condition (LC) hold.
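As a concrete illustration of Condition (LC), and not part of the original argument, the following Python sketch (all names and parameter choices are my own) evaluates the sum in (2.3) for two simple arrays of independent, zero-mean variables with $\sigma_j^2 = 1$: an i.i.d. array of symmetric $\pm 1$ variables, for which (LC) holds, and an array in which $X_j$ takes the values $\pm j$ with probability $1/(2j^2)$ each and $0$ otherwise, for which asymptotic negligibility holds but (LC) fails.

    import math

    def lindeberg_sum_pm1(n, eps):
        """Lindeberg sum (2.3) for i.i.d. X_j = +1 or -1 with probability 1/2.

        Here sigma_j^2 = 1 and s_n^2 = n; the integral over |x| > eps*s_n
        equals 1 when 1 > eps*sqrt(n) and 0 otherwise, identically in j."""
        integral = 1.0 if 1.0 > eps * math.sqrt(n) else 0.0
        return integral  # n equal terms, divided by s_n^2 = n

    def lindeberg_sum_escape(n, eps):
        """Lindeberg sum (2.3) for independent X_j taking values +j, -j each
        with probability 1/(2 j^2), and 0 otherwise (sigma_j^2 = 1, s_n^2 = n).

        The mass at +-j contributes j^2 * (1/j^2) = 1 whenever j > eps*s_n."""
        s_n = math.sqrt(n)
        return sum(1.0 for j in range(1, n + 1) if j > eps * s_n) / n

    eps = 0.5
    for n in (2, 100, 10**4, 10**6):
        print(n, lindeberg_sum_pm1(n, eps), lindeberg_sum_escape(n, eps))
    # The first column drops to 0 (Condition (LC) holds, and S_n/s_n is
    # asymptotically N(0,1)); the second tends to 1 (LC fails, and indeed
    # S_n/s_n -> 0 here, since with probability one only finitely many
    # of the X_j are non-zero).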
§3. Derived Distributions.
The author introduced the so-called "derived"
distributions in his Ph.D. thesis (Smith (1953)) and has used them since in
various ways; see Smith (1959) and, for more general results, Smith (1966).
For completeness of the present account we present briefly the necessary
basic ideas, which are extremely simple, in spite of their availability in much
greater generality elsewhere.
Let $F(\cdot)$ be the distribution function of a non-negative random variable
whose first two moments $\mu_1$ and $\mu_2$ are both finite. It is well-known
that $\mu_1 = \int_0^\infty \{1 - F(x)\}\, dx$; thus we can define a
probability density function on $(0,\infty)$ by

(3.1)    $f^{(1)}(x) = \{1 - F(x)\}/\mu_1$

and we shall call this the first derived density function. It is a trivial
matter to show that this new density has its first moment finite and equal to
$\mu_2/2\mu_1 = \mu_1^{(1)}$, say. Thus, if we write $F^{(1)}(\cdot)$ for the
df associated with $f^{(1)}(\cdot)$, we can introduce a second derived density
by

(3.2)    $f^{(2)}(x) = \{1 - F^{(1)}(x)\}/\mu_1^{(1)}$.
It is a comparatively easy matter to verify the following equation, which
relates the tail probabilities of $f^{(2)}(\cdot)$ to the original distribution
function $F(\cdot)$:

(3.3)    $\displaystyle \int_x^\infty f^{(2)}(u)\, du = \frac{1}{\mu_2} \int_x^\infty (z - x)^2\, F(dz)$.
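As an illustration of mine (not from the original text), take $F$ to be the unit exponential distribution, $F(x) = 1 - e^{-x}$, so that $\mu_1 = 1$, $\mu_2 = 2$ and both derived densities reduce to $e^{-x}$. The sketch below builds $f^{(1)}$ and $f^{(2)}$ from (3.1) and (3.2) by numerical integration and checks the tail identity (3.3) at a few points; the helper names are invented for the illustration.

    import math

    # Unit exponential: F(x) = 1 - exp(-x), so mu_1 = 1 and mu_2 = 2.
    def F(x):
        return 1.0 - math.exp(-x)

    MU1, MU2 = 1.0, 2.0
    MU1_1 = MU2 / (2.0 * MU1)      # first moment of the first derived density

    def simpson(fun, a, b, steps=2000):
        # Composite Simpson rule; 'steps' must be even.
        h = (b - a) / steps
        s = fun(a) + fun(b)
        s += 4.0 * sum(fun(a + k * h) for k in range(1, steps, 2))
        s += 2.0 * sum(fun(a + k * h) for k in range(2, steps, 2))
        return s * h / 3.0

    def f1(x):
        # First derived density (3.1): {1 - F(x)} / mu_1  (here simply e^{-x}).
        return (1.0 - F(x)) / MU1

    def f2(x, upper=60.0):
        # Second derived density (3.2): {1 - F1(x)} / mu_1^(1), the tail of F1
        # being obtained by integrating f1 numerically over (x, upper).
        return simpson(f1, x, upper) / MU1_1

    for x in (0.5, 1.0, 2.0):
        tail_f2 = simpson(f2, x, 60.0, steps=500)                    # lhs of (3.3)
        tail_F = simpson(lambda z: (z - x) ** 2 * math.exp(-z), x, 60.0) / MU2
        print(x, round(tail_f2, 3), round(tail_F, 3), round(math.exp(-x), 3))
    # The three columns agree to the printed accuracy: for the exponential,
    # both derived densities are again e^{-x}.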
"
Now let us introduce characteristic functions: write, for dummy real 9,
(3.4)
~(9)
J.".
0-
page 5
F(d,,).
Then, in an obvious notation, one can obtain by an integration by parts the
result

(3.5)    $\phi(\theta) = 1 + i\theta \mu_1\, \phi^{(1)}(\theta)$

and, by the same token,

(3.6)    $\phi^{(1)}(\theta) = 1 + i\theta \mu_1^{(1)}\, \phi^{(2)}(\theta)$.

If we combine (3.5) and (3.6) we are led to the result

(3.7)    $\phi(\theta) = 1 + i\theta \mu_1 - \tfrac{1}{2}\theta^2 \mu_2\, \phi^{(2)}(\theta)$.

This equation (3.7) is the basis of our proof; notice that it is valid for all
$\theta$ and not, as is the case with the more familiar Taylor expansions of a
characteristic function at the origin, for "small" $\theta$ only.
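Continuing the exponential illustration of mine: for $F(x) = 1 - e^{-x}$ one has $\phi(\theta) = 1/(1 - i\theta)$ and $f^{(2)}(x) = e^{-x}$, so that $\phi^{(2)}(\theta) = 1/(1 - i\theta)$ as well. The short check below confirms (3.7) at several values of $\theta$, not merely near the origin.

    MU1, MU2 = 1.0, 2.0                    # moments of the unit exponential

    def phi(theta):
        # cf of the unit exponential distribution
        return 1.0 / (1.0 - 1j * theta)

    def phi2(theta):
        # cf of its second derived density, which is again e^{-x}
        return 1.0 / (1.0 - 1j * theta)

    for theta in (0.1, 1.0, 5.0, 25.0):
        rhs = 1 + 1j * theta * MU1 - 0.5 * theta**2 * MU2 * phi2(theta)
        print(theta, abs(phi(theta) - rhs))   # differences are at round-off level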
All that remains for us to do, before we turn to the actual proof of the
Central Limit Theorem proper, is to extract from (3.7), which refers to a
distribution on the non-negative half-line, a similar result for a distribution
on the whole real line. However, before we do that, let us notice that
$\phi^{(2)}(\theta)$ is the characteristic function of the pdf
$f^{(2)}(\cdot)$. Moreover, $f^{(2)}(\cdot)$ is a pdf with special properties;
it is (as a glance at (3.1) and (3.2) will reveal) non-increasing, with a
non-increasing derivative which tends to zero as $x \to \infty$. Thus (3.7) is
remarkable in that it connects the arbitrary cf $\phi(\theta)$ to the cf of
such a special convex pdf.
"~
., ....
~. ~~.
~'.';:,'
n Jt ... - ·...,
.•. ;~
Let X be an unrest'ricted (t<:>'the" nob'.:.negative reals) random variable
with zero mean and finiteV~ttanek:~2;";L~t' us set p = P{X::::::O) and q =
P{X<O). Then we can regard' !X~'ilSf 'a'rishig" trom two non-negative random
variables
X
=
-X-.
X+
and
X-:
with
probability - p,
In an obvious notation we have
page 6
X
=
X+;
with
probabilty
q,
•
fJ,1 =
pfJ,i -
fJ,2
PfJ,t
qfJ,l
+ q fJ,"2
p,p+(0) + q,p -(-0)
,p(0) =
and from equations like (3.7) for both ,p+(O) and ,p-(O) and can then derive the
resul t, remem bering tha t fJ, 1 = 0,
(3.8)
in which the function 1/;0 is !!. characteristic function
associated
with a density g(.), say, which is convex and monotonically
decreasing to zero as x increases to
on the positive half-line, convex and
00
monotonically decreasing to zero as x decreases to
line.
-00
on the negative half-
This result (3.8) has been the object of this section; it is a special case of
a more general result (Smith (1966)) which might well be presented at some
earlier point in a probability course, when dealing with properties of
characteristic functions. It is important to note that, like (3.7), (3.8) is
true for all real $\theta$.
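For a two-sided illustration of (3.8) (again my own, not part of the paper): let $X$ have the standard Laplace density $\tfrac{1}{2}e^{-|x|}$, so that $\sigma^2 = 2$ and $\phi(\theta) = 1/(1 + \theta^2)$. Solving (3.8) for $\psi$ gives $\psi(\theta) = 1/(1 + \theta^2)$, which is indeed the cf of the convex, symmetric density $\tfrac{1}{2}e^{-|x|}$. The sketch verifies the rearrangement against a direct numerical Fourier integral.

    import math

    SIGMA2 = 2.0                         # variance of the standard Laplace law

    def phi(theta):
        # cf of the standard Laplace density (1/2) e^{-|x|}
        return 1.0 / (1.0 + theta**2)

    def psi_from_38(theta):
        # psi recovered by rearranging (3.8): psi = 2(1 - phi)/(theta^2 sigma^2)
        return 2.0 * (1.0 - phi(theta)) / (theta**2 * SIGMA2)

    def psi_direct(theta, upper=40.0, steps=400000):
        # cf of the density (1/2) e^{-|x|} by symmetry equals
        # integral_0^infinity cos(theta x) e^{-x} dx; trapezoidal rule on [0, upper].
        h = upper / steps
        s = 0.5 * (1.0 + math.cos(theta * upper) * math.exp(-upper))
        s += sum(math.cos(theta * k * h) * math.exp(-k * h)
                 for k in range(1, steps))
        return s * h

    for theta in (0.5, 1.0, 3.0):
        print(theta, round(psi_from_38(theta), 5), round(psi_direct(theta), 5))
    # Both columns agree with 1/(1 + theta^2).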
§4. Proof of Lindeberg's Theorem.

We must first prove a crucial lemma; the one given here differs from the one
utilised, in the same context, in Chapel Hill; this newer lemma is what enables
us to avoid difficulties caused by ambiguities of the logarithm function.
LEMMA 1   For $j = 1, 2, \ldots, n$; $n = 1, 2, \ldots$, ad inf., let
$\{a_{jn}\}$ be complex numbers such that

(4.1)    $\displaystyle \sum_{j=1}^{n} |a_{jn}| \le A < \infty$

and suppose further that

(4.2)    $\displaystyle \max_{1 \le j \le n} |a_{jn}| \to 0$, as $n \to \infty$.

Then, as $n \to \infty$,

(4.3)    $\displaystyle \prod_{j=1}^{n} (1 - a_{jn}) \sim \exp\Big\{ -\sum_{j=1}^{n} a_{jn} \Big\}$.
PROOF.   Let

$(1 - a_{jn})\, e^{a_{jn}} = 1 + \epsilon_{jn}$, say.

Then, since

$e^{a_{jn}} = 1 + a_{jn} + \tfrac{1}{2!} a_{jn}^2 + \tfrac{1}{3!} a_{jn}^3 + \cdots$,

we find

$\epsilon_{jn} = -\tfrac{1}{2} a_{jn}^2 - \tfrac{1}{3} a_{jn}^3 - \cdots$,

so that $|\epsilon_{jn}| \le |a_{jn}|^2$ whenever $|a_{jn}| \le 1$. We may thus
conclude that

(4.4)    $\displaystyle \Delta_n = \sum_{j=1}^{n} |\epsilon_{jn}| \le \Big( \max_{1 \le j \le n} |a_{jn}| \Big) \sum_{j=1}^{n} |a_{jn}| \le A \max_{1 \le j \le n} |a_{jn}| \to 0$, as $n \to \infty$.
But $\prod_{j=1}^{n}(1 - a_{jn}) \exp\{\sum_{j=1}^{n} a_{jn}\} = \prod_{j=1}^{n}(1 + \epsilon_{jn})$, and

$\displaystyle \Big| \prod_{j=1}^{n} (1 + \epsilon_{jn}) - 1 \Big| \le \prod_{j=1}^{n} (1 + |\epsilon_{jn}|) - 1 \le e^{\Delta_n} - 1$,

which, in view of (4.4), tends to zero and proves the lemma.
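A quick numerical sanity check of Lemma 1 (purely illustrative, with an arbitrary choice of the $a_{jn}$ of my own): take $a_{jn} = (1 + i)j/n^2$, so that $\sum_j |a_{jn}|$ stays below $\sqrt{2}$ while $\max_j |a_{jn}| \to 0$, and compare the product with the exponential.

    import cmath

    def lemma1_gap(n):
        # a_{jn} = (1+i) j / n^2:  sum_j |a_{jn}| <= sqrt(2), max_j |a_{jn}| -> 0.
        a = [(1 + 1j) * j / n**2 for j in range(1, n + 1)]
        prod = 1.0 + 0j
        for z in a:
            prod *= (1 - z)
        return abs(prod - cmath.exp(-sum(a)))

    for n in (10, 100, 1000, 10000):
        print(n, lemma1_gap(n))
    # The gap shrinks roughly like 1/n, in line with (4.3).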
Let us now turn to the proof of Theorem 1. In the notation already
established, let us define the characteristic function of $S_n/s_n$ to be
$\Phi_n(\theta)$. Then, because of the mutual independence of the $\{X_n\}$,

$\displaystyle \Phi_n(\theta) = \prod_{j=1}^{n} \phi_j(\theta/s_n) = \prod_{j=1}^{n} (1 - a_{jn})$, say,

where we have set

$a_{jn} = \tfrac{1}{2}\theta^2 (\sigma_j^2/s_n^2)\, \psi_j(\theta/s_n)$,

$\psi_j(\cdot)$ being the characteristic function which the representation
(3.8) associates with $\phi_j$.
The question is then whether the $\{a_{jn}\}$ so defined satisfy the conditions
of the lemma. Because the $\psi_j$ are characteristic functions it follows that
for all $\theta$ we must have $|\psi_j(\theta)| \le 1$, so that for every fixed
$\theta$ we have

(4.5)    $|a_{jn}| \le \tfrac{1}{2}\theta^2 \sigma_j^2 / s_n^2$,

from which, because of the asymptotic negligibility assumed as part of the
global hypothesis, it is plain that (4.2) is satisfied. But from (4.5) it is
equally plain that

(4.6)    $\displaystyle \sum_{j=1}^{n} |a_{jn}| \le \tfrac{1}{2}\theta^2$,

and so condition (4.1) of Lemma 1 is also satisfied.
Notice that from the global hypothesis alone we must have that, as
$n \to \infty$, for every fixed $\theta$,

(4.7)    $\Phi_n(\theta) - \exp\{-\tfrac{1}{2}\theta^2 \Psi_n(\theta)\} \to 0$,

where we have conveniently written

(4.8)    $\displaystyle \Psi_n(\theta) = \frac{1}{s_n^2} \sum_{j=1}^{n} \sigma_j^2\, \psi_j(\theta/s_n)$.

From (4.8) it can be seen that $\Psi_n(\theta)$ is also a characteristic
function! Thus the global hypothesis of Lindeberg's Theorem implies the
asymptotic relationship (4.7) between two sequences of characteristic
functions. Moreover, we see that $\Psi_n(\theta)$ is associated with a pdf with
special convexity properties (especially after equation (3.8)), since it is a
convex linear combination of densities like the generic $g(\cdot)$ of §3.
Let us write $G_n$ for the df associated with $\Psi_n(\theta)$ and $U$ for the
degenerate df (sometimes called the "Heaviside Unit Function") which places all
the unit probability on the origin. Then it is plain from (4.7) that

(4.9)    $\Phi_n(\theta) \to e^{-\frac{1}{2}\theta^2}$

if and only if

(4.10)    $\Psi_n(\theta) \to 1$, for every fixed $\theta$.
From the Continuity Theorem for characteristic functions it can be inferred
that (4.9) holds if and only if the sequence of distribution functions
$\{G_n\}$ converges weakly to $U$. This essentially completes the proof of
Lindeberg's Theorem although, of course, something must be said to relate the
condition "$G_n \Rightarrow U$", which the present argument shows to arise in a
natural way as the crucial condition for asymptotic normality, to the classical
Condition (LC).
From (3.3) coupled with (4.8) we see that the condition $G_n \Rightarrow U$ is
equivalent to the statement that, for every fixed $\epsilon > 0$,

(4.11)    $\displaystyle \frac{1}{s_n^2} \sum_{j=1}^{n} \int_{|x| > \epsilon s_n} (|x| - \epsilon s_n)^2\, F_j(dx) \to 0$, as $n \to \infty$.

Obviously Condition (LC) implies (4.11). On the other hand, in the integrals in
(4.11), for the range $|x| > 2\epsilon s_n$ it is true that
$|x| - \epsilon s_n > \tfrac{1}{2}|x|$, from which it should be apparent that
(4.11) implies (LC) (with the unimportant change of $\epsilon$ into $2\epsilon$
in the latter condition).
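To make the equivalence of (LC) and (4.11) concrete, the following sketch of mine computes both sums for an i.i.d. array whose common density is $(3/2)|x|^{-4}$ on $|x| \ge 1$; this law has variance 3 but an infinite fourth moment, so both quantities tend to zero, though only slowly. The closed forms for the truncated moments below are valid for $t \ge 1$ and belong to this particular test case, not to the paper.

    import math

    SIGMA2 = 3.0     # variance of the test density (3/2)|x|^{-4} on |x| >= 1

    def tail_x2(t):
        # E[X^2 ; |X| > t] for the test density, valid for t >= 1
        return 3.0 / t

    def tail_shifted(t):
        # E[(|X| - t)^2 ; |X| > t] for the test density, valid for t >= 1
        return 1.0 / t

    def lc_sum(n, eps):
        s_n = math.sqrt(SIGMA2 * n)
        return n * tail_x2(eps * s_n) / (SIGMA2 * n)       # the sum in (2.3)

    def sum_411(n, eps):
        s_n = math.sqrt(SIGMA2 * n)
        return n * tail_shifted(eps * s_n) / (SIGMA2 * n)  # the sum in (4.11)

    eps = 0.5
    for n in (10**2, 10**4, 10**6):
        print(n, round(lc_sum(n, eps), 5), round(sum_411(n, eps), 5))
    # Both tend to zero together (here their ratio is exactly 3), as the
    # equivalence argument predicts.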
§5. Further remarks.

Let us write $K_n$ for the df of $S_n/s_n$, with cf $\Phi_n$, and let us
continue to write $G_n$ for the df associated with $\Psi_n$. The argument we
have given in the preceding section proves the following theorem, more general
than the Lindeberg Theorem:
THEOREM 2   Under the global hypothesis of §2, $\{K_n\}$ converges weakly to a
limit df $K$, say, if and only if $\{G_n\}$ converges weakly to a (possibly
defective) df $G$, say. If $\Psi(\cdot)$ be the cf associated with $G$, then
$K$ will have a finite variance given by $\Psi(0)$.
PROOF.   If $K_n$ converges weakly to $K$ then $\Phi_n(\theta)$ tends to a
limit which, in view of (4.7), must be of the form
$\exp\{-\tfrac{1}{2}\theta^2 \Psi(\theta)\}$, where we must also have
$\Psi_n(\theta) \to \Psi(\theta)$.
The usual weak-compactness argument will then show that we must have $G_n$
converging weakly to some, possibly defective, df $G$ of which $\Psi$ is the
Fourier-Stieltjes transform.
Conversely, if the $G_n$ converge weakly to some $G$, then the
$\{\Psi_n(\theta)\}$ tend to a limit $\Psi(\theta)$, the transform of $G$. Thus
$\Psi(\theta)$ must be uniformly continuous and $0 \le \Psi(0) \le 1$. Thus
$\Phi_n(\theta) \to \Phi(\theta)$, where the limit function must also be
uniformly continuous and $\Phi(0) = 1$; the latter claim may easily be
understood by letting $\theta \to 0$ in the equation
$\Phi(\theta) = \exp\{-\tfrac{1}{2}\theta^2 \Psi(\theta)\}$.
From this result it follows that $K$ must always be a proper distribution
function. It is well-known, and easily proved, that if

$\displaystyle \liminf_{\theta \to 0} \frac{1 - \Re\,\Phi(\theta)}{\theta^2} < \infty$,

then $K$, the df associated with $\Phi$, must have a finite variance; and
hence, from a representation like (3.8), but possibly including (for argument's
sake) a term corresponding to a first moment, it follows that $K$ must in fact
have a zero first moment and a finite variance. The claim that this variance is
given by $\Psi(0)$ also follows from the equation
$\Phi(\theta) = \exp\{-\tfrac{1}{2}\theta^2 \Psi(\theta)\}$.
The approach to the Central Limit Theorem by means of the derived distributions
makes it easy to construct examples of behavior differing from the familiar
convergence predicted by the Lindeberg Theorem. We need two simple lemmas,
however.
LEMMA 2   Let $g(x)$ be an absolutely continuous function in $L_1(0,\infty)$
such that both $g(x)$ and $|g'(x)|$ are non-increasing, and both tend to zero
as $x \to \infty$. Also assume that $|g'(0)| \le 1$. Then the distribution
function $F(x;g) \equiv 1 - |g'(x)|$ on $[0,\infty)$ has its first two moments
finite and given by

$\mu_1(g) = g(0)$;    $\mu_2(g) = 2\int_0^\infty g(x)\, dx$,

and a second derived pdf $f^{(2)}(x)$ such that

$\tfrac{1}{2}\mu_2(g)\, f^{(2)}(x) = g(x)$.
The proof of Lemma 2 is straightforward, in view of what has gone
earlier in this note, and so we do not give details.
Even simpler to prove is
the following:
LEMMA 3   If $g(x)$ satisfies the conditions of Lemma 2 and $a > 0$ is any
constant, then $g_a(x) \equiv a\,g(ax)$ also satisfies those conditions, and
$\mu_2(g_a) = \mu_2(g)$ for all $a > 0$.

Furthermore, suppose $g_1$ and $g_2$ both satisfy the conditions of Lemma 2.
Then, for any $0 \le p, q \le 1$ such that $p + q = 1$, $p g_1 + q g_2$ also
satisfies Lemma 2, and

$\mu_2(p g_1 + q g_2) = p\,\mu_2(g_1) + q\,\mu_2(g_2)$.
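A brief check of Lemmas 2 and 3 (an illustration of mine): $g(x) = e^{-x}$ satisfies the hypotheses of Lemma 2, with $F(x;g) = 1 - e^{-x}$, $\mu_1(g) = g(0) = 1$, $\mu_2(g) = 2$, and $\tfrac{1}{2}\mu_2(g) f^{(2)}(x) = e^{-x} = g(x)$, in agreement with the exponential computation sketched in §3. The code also confirms numerically that $\mu_2(g_a) = \mu_2(g)$ for the rescaled $g_a(x) = a\,g(ax)$ of Lemma 3.

    import math

    def g(x):
        return math.exp(-x)

    def g_scaled(a):
        return lambda x: a * g(a * x)      # the g_a of Lemma 3

    def mu2(fun, upper=80.0, steps=200000):
        # mu_2 = 2 * integral_0^infinity fun(x) dx  (trapezoidal rule on [0, upper])
        h = upper / steps
        s = 0.5 * (fun(0.0) + fun(upper))
        s += sum(fun(k * h) for k in range(1, steps))
        return 2.0 * s * h

    print(round(mu2(g), 4))                       # 2.0 = mu_2(g)
    for a in (0.25, 1.0, 4.0):
        print(a, round(mu2(g_scaled(a)), 4))      # unchanged by scaling (Lemma 3)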
Plainly, if $g(x)$ on $(0,\infty)$ satisfies Lemma 2, we can use $g(|x|)$ to
define $\sigma^2 f^{(2)}(x)$, where $f^{(2)}(x)$ is the second derived pdf of
some symmetric pdf $f(x)$ on $(-\infty, +\infty)$. We have written $\sigma^2$
for the variance of this pdf $f(x)$. Let us also write $\sigma^2 \psi(\theta)$
for the Fourier transform of $g(|x|)$; thus $\psi(\theta)$ is the
characteristic function of the second derived pdf. We have the equation
$\sigma^2 = 2\int_0^\infty g(x)\, dx$ and, by Lemma 3, this variance is
unchanged by scale changes made upon $g$.
EXAMPLE 1.   Let $g(|x|)$ and $\psi(\theta)$ be the functions introduced in the
preceding paragraph, associated with a pdf $f(\cdot)$ with zero mean and
variance $\sigma^2$. Then, for $j = 1, 2, \ldots$, we can define the
characteristic functions

$\phi_j(\theta) = 1 - \tfrac{1}{2}\sigma^2 \theta^2\, \psi(\sigma\theta\sqrt{j})$

of random variables with zero mean and variances all equal to $\sigma^2$. For
this example we find

$\displaystyle \Psi_n(\theta) = \frac{1}{n} \sum_{j=1}^{n} \psi(\theta\sqrt{j}/\sqrt{n})$.
One can deduce from this that, as $n \to \infty$,

$\displaystyle \Psi_n(\theta) \to \int_0^1 \psi(\theta\sqrt{u})\, du = \Psi(\theta)$, say.

The limit cf $\Psi$ is associated with the pdf

$\displaystyle \frac{1}{\sigma^2} \int_0^1 g(|x|/\sqrt{u})\, \frac{du}{\sqrt{u}}$.
Thus we have a situation in which the global hypothesis is satisfied but the
condition (LC) is not; yet there is still convergence, though not to a normal
limit. The characteristic function of the limiting distribution is

$\displaystyle \Phi(\theta) = \exp\Big\{ -\tfrac{1}{2}\theta^2 \int_0^1 \psi(\theta\sqrt{u})\, du \Big\}.$

It may be verified from this last formula that the non-normal limit $K$ still
has unit variance.
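To see Example 1 numerically (my own sketch, with a particular admissible choice of $\psi$): take $g(x) = e^{-x}$, so that $\sigma^2 = 2$ and $\psi(\theta) = 1/(1 + \theta^2)$, the cf of the Laplace density $\tfrac{1}{2}e^{-|x|}$. The code evaluates $\Psi_n(\theta)$ and compares it with the limit $\int_0^1 \psi(\theta\sqrt{u})\, du$, which here equals $\log(1+\theta^2)/\theta^2$; since that limit varies with $\theta$, the limiting cf $\exp\{-\tfrac{1}{2}\theta^2\Psi(\theta)\}$ is not Gaussian.

    import math

    def psi(t):
        # a concrete admissible psi: cf of the Laplace density (1/2) e^{-|x|}
        return 1.0 / (1.0 + t * t)

    def Psi_n(theta, n):
        return sum(psi(theta * math.sqrt(j / n)) for j in range(1, n + 1)) / n

    def Psi_limit(theta):
        # integral_0^1 psi(theta sqrt(u)) du = log(1 + theta^2) / theta^2
        return math.log(1.0 + theta * theta) / (theta * theta)

    for theta in (0.5, 1.0, 2.0, 4.0):
        print(theta, round(Psi_n(theta, 100000), 4), round(Psi_limit(theta), 4))
    # Psi_n converges to a limit that varies with theta, so the limiting
    # characteristic function exp{-theta^2 Psi(theta)/2} is not Gaussian,
    # although its variance Psi(0) is still 1.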
EXAMPLE 2.   We use Lemma 3. Pick $0 < \varpi < 1$ and set, for
$j = 1, 2, \ldots$,

$\psi_j(\theta) = \varpi\, \psi(\theta) + (1 - \varpi)\, \psi(\theta j)$,

where $\psi$ is the characteristic function introduced in the previous example.
In this case

$\Psi_n(\theta) \to \varpi$, for every fixed $\theta \neq 0$.
To see this, note first that here

$\displaystyle \Psi_n(\theta) = \varpi\, \psi(\theta/s_n) + (1 - \varpi)\, \frac{1}{n} \sum_{j=1}^{n} \psi(\theta j/s_n)$, with $s_n = \sigma\sqrt{n}$,

and that, since $\psi(0) = 1$ and $\psi$ is continuous,
$\psi(\theta/s_n) \to 1$ as $n \to \infty$, so the first term tends to
$\varpi$. Further, by the Riemann-Lebesgue lemma, $\psi(\theta) \to 0$ as
$|\theta| \to \infty$. Therefore, given a small $\epsilon > 0$, we can find
$\Delta(\epsilon)$ such that $|\psi(\theta)| < \epsilon$ for all
$|\theta| > \Delta$. From this it is possible to show that, for any fixed
$\theta \neq 0$,

$\displaystyle \limsup_{n \to \infty}\; \Big| \frac{1}{n} \sum_{j=1}^{n} \psi(\theta j/s_n) \Big| \le \epsilon$,

since the number of indices $j \le n$ with $|\theta| j/s_n \le \Delta$ is at
most $\Delta s_n/|\theta| = O(\sqrt{n})$. Since $\epsilon$ is arbitrary the
claim is established. It follows that for this example the limit distribution
is normal, but the variance is strictly less than one, depending on our choice
of $\varpi$. There is an example of this phenomenon in Feller (1971).
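Finally, a numerical look at Example 2 (again my own sketch, with $\psi(\theta) = 1/(1 + \theta^2)$ and $\sigma^2 = 2$ as before, and with $\varpi = 0.6$ chosen arbitrarily): the code evaluates $\Psi_n(\theta)$ and shows it settling down to $\varpi$ for $\theta \neq 0$, so that the limiting law is normal with variance $\varpi < 1$.

    import math

    SIGMA2 = 2.0
    VARPI = 0.6                         # the constant called varpi in Example 2

    def psi(t):
        return 1.0 / (1.0 + t * t)      # cf of the Laplace density (1/2) e^{-|x|}

    def Psi_n(theta, n):
        s_n = math.sqrt(SIGMA2 * n)
        second = sum(psi(theta * j / s_n) for j in range(1, n + 1)) / n
        return VARPI * psi(theta / s_n) + (1.0 - VARPI) * second

    for n in (10**2, 10**4, 10**6):
        print(n, round(Psi_n(1.0, n), 4), round(Psi_n(3.0, n), 4))
    # Both columns approach VARPI = 0.6; the limiting cf is exp{-0.3 theta^2},
    # a normal law with variance 0.6 rather than 1.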
I am indebted to Dr. Andrew Rosalsky for drawing my attention to the
interesting and highly relevant paper of H. F. Trotter.
REFERENCES
W. Feller (1971), An Introduction to Probability Theory and Its Applications,
2nd Edition, New York: John Wiley & Sons.
A. I. Khinchin (1948), Asymptotische Gesetze der Wahrscheinlichkeitsrechnung,
New York: Springer.
M. Rosenblatt (1974), Random Processes, 2nd Edition, New York: Springer.
W. L. Smith (1953), Stochastic Sequences of Events, Ph.D. Thesis, Cambridge
University.
W. L. Smith (1959), On the cumulants of renewal processes, Biometrika, 46, 1-29.
W. L. Smith (1966), A theorem on functions of characteristic functions and its
application to some renewal theoretic random walk problems, Proc. Fifth
Berkeley Symposium, Vol. II, Pt. 2, Berkeley, California: University of
California Press.
H. F. Trotter (1959), An elementary proof of the Central Limit Theorem,
Archiv der Mathematik, 10, 226-234.