UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.

LECTURES ON TIME SERIES

by

Friedhelm Eicker

January 1962

Contract No. AF 49(638)-929

This research was partially supported by the Mathematics Division of the Air Force Office of Scientific Research.

Institute of Statistics
Mimeo Series No. 314
PREFACE

These notes were used in a two-hour seminar in which the attempt has been made to give a brief survey over some of the important problems occurring in the analysis of time series, and over some fundamental mathematical tools useful for their treatment. The presentation is essentially restricted to observations from (discrete) stationary random sequences; however, occasionally sidelines have been drawn towards non-stationary problems. The mathematics has been kept as rigorous as appropriate and as abstract as convenient, trying to avoid any over-emphasis not necessary for the ultimate statistical goals. Thus, for example, the concept of Hilbert space is used. We want, however, to point out clearly that Hilbert space techniques provide the modern and most appropriate tools in attacking important probabilistic and statistical problems not only on stationary random processes but also on non-stationary and multiple (vector) processes over very general index sets. For an excellent exposition of this theory compare Parzen (4) and (5).

The prerequisites for this seminar course are elementary measure theory, advanced calculus and matrix theory. As sources served Doob [34], Stochastic Processes, Grenander and Rosenblatt (1), Time Series Analysis, and several articles from journals. Some new material has been contributed in a number of new or modified theorems and proofs. Especially Chapter III on the means of stationary sequences and extended classes of models contains new results.

This exposition owes very much to the cooperation and reactions of the participants of the seminar, to whom I want to express my gratitude. The seminar was suggested by Dr. George E. Nicholson; its realization and the preparation of the notes were made possible through the support of the Offices of Naval and Air Force Research as well as the National Science Foundation. The tedious work of typing has been excellently performed by Miss Judith Pinson, Mrs. Doris Gardner and Miss Martha Jordan.

Stanford, July 1961.
TABLE OF CONTENTS

Chapter                                                              Page

I    SPECTRAL THEORY OF RANDOM SEQUENCES                                1
     1. Introductory Remarks                                            1
     2. Definitions of Stochastic Processes; the Covariance
        Function                                                        2
     3. The Covariance Function and the Spectral Distribution
        of a Stationary Process                                         7
     4. Stochastic Integrals and Processes in Hilbert Spaces           19
     5. The Spectral Representation Theorem for Stationary
        Sequences                                                      27
     6. Spectral Decompositions and Linear Operations                  32
     7. Spectral Densities Rational in e^{2πiλ} and the
        Corresponding Random Sequences                                 37

II   THE ESTIMATION OF THE COVARIANCE FUNCTION AND OF THE
     SPECTRUM IN STATIONARY SEQUENCES                                  44
     8. The Estimation of R(n) and F(λ) from Sample Sequences          44
     9. Classes of Estimates for the Spectral Density and
        Criteria for their Goodness                                    58
    10. Some Properties of Spectral Density Estimators                 67

III  RANDOM SEQUENCES CONTAINING A DETERMINISTIC COMPONENT             75
    11. Preliminary Remarks on the Mean of a Stationary
        Sequence                                                       75
    12. Least Squares Estimation for a Deterministic Component         88
    13. On Best Linear Estimates                                       93
    14. Consistent Linear Estimates for Classes of Models
        with Different Error Sequences                                 97
    15. On Asymptotically Efficient Linear Estimates                  100
    16. Asymptotically Efficient Estimates for Trigonometric,
        Polynomial and Certain Other Regressions                      110

IV   PREDICTION THEORY                                                116
    17. The Problem of Prediction in Stationary Sequences             116
    18. Solutions to Some Specific Prediction Problems                118

BIBLIOGRAPHY                                                          123
CHAPTER I

SPECTRAL THEORY OF RANDOM SEQUENCES

1. Introductory Remarks

The subject of this seminar is stationary stochastic processes, especially those with a discrete set of parameter values (usually taken as equidistant time points) and realizations of them (time series). The processes are then preferably called stochastic sequences. The theory and some statistical properties of these sequences are discussed as well as a number of applications. The applications have to be restricted to phenomena which can be more or less closely approximated directly by a stationary sequence or by equations

    y_t = m_t + x_t,    t = ..., -1, 0, 1, ...,

where {x_t} is stationary. The mean m_t of y_t (if Ex_t = 0), often called trend or seasonal variation, may then also be the subject of statistical investigation besides properties of the sequence (x_t). In most cases the ultimate goal of the statistical inference is the determination of the time dependent component m_t or the interpolation or prediction for a time series from a limited number of observations. The reason for the great emphasis on and preference of stationary sequences even for phenomena where they are physically not quite the appropriate models lies in the comparative convenience to explore their mathematical structure.
2. Definitions of Stochastic Processes; the Covariance Function

The following standard definitions are needed.

Definition: A stochastic (or random) process is a family of random variables (abbr. r.v.'s) {x_t, t ∈ T} such that for any finite number of points t_1, t_2, ..., t_n the joint probability distribution (abbr. p.d.) is consistently defined. T is the set of all allowed t-values; it may be of any cardinality.

The p.d. has to be symmetric, i.e., invariant under any simultaneous permutation of the pairs (t_1, u_1), ..., (t_n, u_n), and consistent, i.e., for any m < n

    F_{t_1, ..., t_m, t_{m+1}, ..., t_n}(u_1, ..., u_m, ∞, ..., ∞) = F_{t_1, ..., t_m}(u_1, ..., u_m).
Kolmogoroff proved (in 1933; see also Halmos [32], p. 158, example (2)) that under these two conditions the given definition is meaningful [Kolmogoroff (1), pp. 25/27]; i.e., for the entire family (x_t) there can be defined a joint probability space (Ω, B, μ) yielding the above p.d.'s for any finite collections of x_t. (For Ω there could be taken the product space K^T with coordinates x_t [Doob [34]], where K is the (real or complex) coordinate space; Ω may have more than countably many dimensions. B is the Borel field of sets (contained in Ω) generated by the class of sets {ω: x_t(ω) ∈ A, t ∈ T}, where A is any Borel set in K.)

Definition: A realization (or a sample function) of the process {x_t} is obtained by fixing ω ∈ Ω to one particular point ω_0; x_t(ω_0) then is a single-valued function of t on T.
We henceforth restrict ourselves to real, one-dimensional sets T only. The stochastic process is then also called a random function if T is a continuum, and a random (or stochastic) sequence in the case that T is a discrete point set on the real line. We want, however, to mention two simple examples for more-dimensional stochastic processes:

1) T is two-dimensional discrete if one lets x_t be the fertility or yields of the plot at t = (i,j), i.e., block i and row j.

2) In meteorology T can be taken four-dimensional, continuous or discrete, letting t be the quadruple of the three Euclidean space coordinates and the time coordinate; x_t may mean an atmospheric velocity, temperature, etc.
A r.v. usually is defined as a real-valued measurable function. But sometimes, and so in the course of this exposition, it is convenient to introduce and use complex r.v.'s:

Definition: A complex random variable is defined by a pair (2-dimensional vector) of real r.v.'s u and v, written in the form x = u + iv.

In an actual computation, it is always easy to obtain the real version from the complex notation.
Definition: A time series is the realization of a stochastic process whose parameter set T is a subset or the totality of the real numbers (T is thus identifiable with time).

Thus a time series is a sequence of numbers (observations) giving uniquely a number for each value t ∈ T (as a matter of fact, t needs not always to be the physical, empirical time). We remark, however, that in some empirical investigations this definition does not cover precisely what one has in mind, as the observations may not have been taken from a stochastic process. The series may, for instance, incorporate some completely "arbitrary" deterministic influences. As these series require different methods, we exclude them further on; that is to say, we restrict ourselves to the above defined time series.

A time series is called

    continuous, if T is an interval (as such continuous);

    discrete, if T is a set of discrete real numbers, which then without loss of generality can be assumed to be equidistant and to be the integers.

We will be mostly concerned with discrete time series (x_n), where n may range from -∞ to +∞. However, for the spectral representation continuous processes are needed and therefore they will be discussed briefly.
Definition: A process {x_t, -∞ < t < +∞} is called strictly stationary if the probability distribution of any set x_{t_1}, ..., x_{t_n} is invariant under shift in time; i.e., it is the same as the p.d. of x_{t_1+h}, ..., x_{t_n+h} for any h.
Many statistical inferences make use of first and second moments only. Also, the spectral theory to be outlined involves only these both. Hence the more general

Definition: A process is called weakly stationary (in future we say merely stationary) if for -∞ < t < ∞ and -∞ < n < ∞

    R(n) = E(x_{t+n} x̄_t),

the so-called covariance function (or n-th serial covariance), does not depend on t (an upper bar denotes the conjugate complex quantity).

Example: For a stationary random sequence with mutually independent r.v.'s with E(x_t) = 0, E(|x_t|²) = σ², one finds

    R(0) = σ²,    R(n) = 0 for n ≠ 0.

Such a sequence is sometimes called white noise.
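The white noise covariances can be illustrated numerically (a modern Python sketch, not part of the original notes; the sample size and the choice σ² = 1 are arbitrary): the sample analogue of R(n) reproduces R(0) = σ² and R(n) ≈ 0 for n ≠ 0.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200_000)   # mutually independent, E(x_t) = 0, sigma^2 = 1

def sample_autocov(x, n):
    # sample analogue of R(n) = E(x_{t+n} x_t) for a zero-mean real sequence
    return float(np.dot(x[n:], x[:len(x) - n]) / len(x))

R0 = sample_autocov(x, 0)   # close to sigma^2 = 1
R3 = sample_autocov(x, 3)   # close to 0
```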
Let E(x_t) = μ_t and

    r(t+n, t) = E((x_{t+n} - μ_{t+n})(x̄_t - μ̄_t)),

which may be called the true covariance function (Parzen calls this the proper covariance); we occasionally shall distinguish between R and r. Obviously, not for all means μ_t is this a function of n only (see Chapter III for a more detailed discussion). If, on the other hand, the true covariance depends on n only, then all of the theorems given below hold both for r(n) = r(t+n, t) and R(n). Often for convenience one assumes μ_t = 0 or = const, in which case the distinction is inessential. In general, however, the terminology is not quite clear here, and some caution might be appropriate.
Often R(n) is called the correlation function, yet here this expression will be reserved for

    ρ_n = R(n)/R(0),    n = ..., -1, 0, 1, ... .

For discrete stationary processes ρ_n is called the correlogram.

Obviously, weakly stationary processes include all strictly stationary processes except those with infinite variance. (There are weakly stationary processes which are not stationary in the strict sense.)
Definition: Random variables in a stochastic sequence are called uncorrelated respectively orthogonal according to

    E((x_s - Ex_s)(x̄_t - Ex̄_t)) = 0    or    E(x_s x̄_t) = 0,    s ≠ t,

supposed the variances are finite.

From any process with uncorrelated variables, an orthogonal one is obtained by putting y_t = x_t - E(x_t).

Example [Doob [34]]: Let x have the rectangular distribution in [0, 2π]. Then

    sin x, sin 2x, ...

form a process with uncorrelated and orthogonal r.v.'s, while the sequence

    1 + sin x, 2 + sin 2x, ...

is merely uncorrelated. Here one observes how the theory of orthogonal functions enters into probabilistic problems, and its powerful tools can be utilized.
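The expectations in this example can be checked numerically (a modern Python sketch, not part of the original notes; the grid average over one period is our own device for the expectation under the rectangular distribution):

```python
import numpy as np

# x rectangular (uniform) on [0, 2*pi]; expectations as grid averages over one period
x = np.linspace(0.0, 2.0 * np.pi, 200_000, endpoint=False)

# sin x and sin 2x have zero means, so orthogonality and uncorrelatedness coincide
e_cross = float(np.mean(np.sin(x) * np.sin(2.0 * x)))   # E(sin x * sin 2x) = 0

# 1 + sin x and 2 + sin 2x: covariance is still 0, but E of the product is 2, not 0
e_shift = float(np.mean((1.0 + np.sin(x)) * (2.0 + np.sin(2.0 * x))))
```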
Definition: A process is said to have independent increments if the random variables

    x_{t_2} - x_{t_1}, x_{t_3} - x_{t_2}, ..., x_{t_n} - x_{t_{n-1}}

are independent for any number of points t_n > t_{n-1} > ... > t_1 (important only for (continuous) processes) [Doob [34], p. 96].

The analogue of the above definition for processes is the

Definition: A process (x_t), -∞ < t < ∞, with E(|x_t - x_s|²) < ∞ is said to have uncorrelated respectively orthogonal increments if, for any non-overlapping intervals s_1 < t_1 ≤ s_2 < t_2, the increments centered at their means are uncorrelated, respectively

    E{(x_{t_2} - x_{s_2})(x̄_{t_1} - x̄_{s_1})} = 0.
3. The Covariance Function and the Spectral Distribution of a Stationary Process

The following considerations [following Doob [34], pp. 475-481] are limited to (weakly) stationary random sequences {x_t}, t = ..., -1, 0, 1, ..., though each result can, with some care, be easily extended to processes. For simplicity, x_t is allowed to be a complex random variable. The following theorem is needed mainly for the second one although it is in itself of some interest.
Theorem 1: The covariance function R(n) = E(x_{t+n} x̄_t) of a stationary sequence is non-negative definite; i.e., for every set of complex numbers a_1, ..., a_N, any N,

    Σ_{n,m=1}^{N} R(m-n) a_m ā_n ≥ 0.

Conversely, any non-negative definite function R(n) is the covariance function of a stationary sequence (which, however, is not uniquely defined by R(n); it can be taken real if R(n) is real) [Doob [34], pp. 72-73]. The theorem holds also for r(n) if this function is defined.

Proof: 1) We notice that R(0) ≥ 0 for any non-negative definite function: take N = 1. For N = n+1, a_1 = 1, a_{n+1} = u, a_2 = ... = a_n = 0, follows

    R(0)(1 + |u|²) + R(n)u + R(-n)ū ≥ 0.

As the left side is real, its imaginary part Im(·) vanishes; choosing u real this gives Im(R(-n) + R(n)) = 0, and choosing u pure imaginary, Re(R(-n) - R(n)) = 0. Hence

    R(-n) = R̄(n).

2) A covariance function is non-negative definite, given (x_t) stationary:

    Σ_{n,m=1}^{N} R(m-n) a_m ā_n = E{ Σ_{n,m=1}^{N} x_m x̄_n a_m ā_n } = E{ |Σ_{n=1}^{N} x_n a_n|² } ≥ 0.

Putting x'_t = x_t - μ_t, μ_t = E(x_t), 1) and 2) are seen to hold also for r(n) if (x'_t) is stationary.

3) Suppose R(n) is real and (strictly) positive definite. We will construct a stationary Gaussian (real) process with r(n) = R(n); i.e., we regard R(n) as a true covariance function, x_t ~ N(0, R(0)). The covariance matrix

    Γ_{2n} = ( r(0)      r(-1)     ...  r(-2n)
               r(1)      r(0)      ...  r(-2n+1)
               .......................................
               r(2n)     r(2n-1)   ...  r(0)    )

of the (2n+1)-dimensional vector (x_{-n}, ..., x_0, ..., x_n) is p. def. for any n ≥ 0; hence Γ_{2n}^{-1} exists, and the multivariate normal distribution for this vector can be written down. Any subset of the x's is again multivariate normally distributed with covariance function r(n), because of the assumption on r(n) (marginal distribution). This also shows the consistency of the joint probability density function of the process, and its stationarity. The symmetry requirement is satisfied if we simply define the p.d. of permuted arguments by that for the ordered arguments (a change of the order of integrations of the normal probability density).
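The defining inequality of Theorem 1 states exactly that every matrix (R(m-n)) is non-negative definite, which can be inspected numerically (a modern Python sketch, not part of the original notes; R(n) = 2^{-|n|} is a hypothetical example of a non-negative definite function):

```python
import numpy as np

def R(n):
    # hypothetical non-negative definite function R(n) = 2**(-|n|)
    return 2.0 ** (-abs(n))

N = 8
T = np.array([[R(m - k) for k in range(N)] for m in range(N)])  # the matrix (R(m-n))

# the quadratic form of Theorem 1 is >= 0 for all a  <=>  T has no negative eigenvalue
min_eig = float(np.linalg.eigvalsh(T).min())
```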
Thus the process exists.

4) Let R(n) be strictly p. def. and complex valued. Let x_t = ξ_t + iη_t, E x_t = 0, and let for any n the real vector (ξ_{-n}, ..., ξ_n, η_{-n}, ..., η_n) be normally distributed with covariances

    E(ξ_{t+m} ξ_t) = E(η_{t+m} η_t) = 1/2 Re R(m),
    E(η_{t+m} ξ_t) = -E(ξ_{t+m} η_t) = 1/2 Im R(m);    (3.3)

then E(x_{t+m} x̄_t) = R(m). The covariance matrix

    1/2 ( Re Γ_{2n}   -Im Γ_{2n}
          Im Γ_{2n}    Re Γ_{2n} )

is p. def. and real, because Γ_{2n} is (by 1)) and Im Γ_{2n} = -(Im Γ_{2n})' is anti-symmetric (prime denotes the transposed matrix). Now the same arguments as in 3) apply; x_t is finally taken as the pair (ξ_t, η_t) with the covariances (3.3). Its real and its imaginary parts each separately thus form a stationary random sequence.
5) If a non-negative definite function R(n) has finite rank v, then it is the covariance function of the stationary sequence

    x_t = Σ_{n=1}^{v} y_n e^{2πiλ_n t},    E y_n = 0,

with suitable mutually orthogonal r.v.'s y_n and constants λ_n; i.e., R(m) must be of the form

    R(m) = Σ_{n=1}^{v} e^{2πiλ_n m} var{y_n}.

Remarks: a) This is not the only possible sequence; the above constructed Gaussian sequence corresponding to R(n) is, e.g., another one.

b) If R(0) = 0, then R(n) = 0 for all n, and all the x_t are zero with probability one. If R(0) > 0 and the sum vanishes for a non-trivial set of constants α_1, ..., α_v, i.e.,

    Σ_{j=1}^{v} α_j x_{t_j} = 0    with probability 1,

then all the x_t can be linearly expressed with probability one in terms of v-1 of them. This is the case, e.g., if all the x_t's are identical.
The following theorem defines and determines the so-called spectral properties of a stationary sequence by representing R(n) as a certain Fourier-Stieltjes transform.

Theorem 2: A function R(n) is positive semi-definite if and only if it can be expressed in the form

    R(n) = ∫_{-1/2}^{1/2} e^{2πinλ} dF(λ),    (3.11)

where F(λ) is a (real) measure (i.e., monotone non-decreasing in [-1/2, 1/2]). If at a jump F(λ) always equals the mean value 1/2{F(λ-) + F(λ+)}, then F(λ) is uniquely determined by R(n):

    F(λ_2) - F(λ_1) = (λ_2 - λ_1) R(0) + lim_{N→∞} Σ_{n=-N, n≠0}^{N} R(n) (e^{-2πinλ_2} - e^{-2πinλ_1}) / (-2πin),    (3.12)

and

    F(1/2) - F(-1/2) = R(0).

If R(n) is real this can be written as

    R(n) = ∫_0^{1/2} cos 2πnλ dG(λ),

where G(λ) in [0, 1/2] is non-decreasing and, suitably normalized, uniquely determined in 0 ≤ λ ≤ 1/2 by

    G(λ) - G(0) = 2λ R(0) + (2/π) Σ_{n=1}^{∞} R(n) (sin 2πnλ)/n,    G(1/2) - G(0) = R(0).

Proof: 1) R(n) given by (3.11) is p.s.d.:

    Σ_{n,m=1}^{N} R(m-n) a_m ā_n = ∫_{-1/2}^{1/2} ( Σ_{n,m} a_m ā_n e^{2πi(m-n)λ} ) dF(λ) = ∫_{-1/2}^{1/2} | Σ_n a_n e^{2πinλ} |² dF(λ) ≥ 0,

a_1, ..., a_N being any complex numbers not all equal to zero, N ≥ 1 being arbitrary.
2) Conversely, given any p.s.d. function R(n), we put in Theorem 1 a_n = e^{-2πinλ}. We then have for the function

    f_N(λ) = (1/N) Σ_{n,m=1}^{N} R(m-n) e^{-2πi(m-n)λ} = Σ_{r=-N}^{N} R(r)(1 - |r|/N) e^{-2πirλ} ≥ 0.

Putting

    F_N(λ) = ∫_{-1/2}^{λ} f_N(x) dx,

the Fourier-Stieltjes coefficients become

    R(n)(1 - |n|/N) = ∫_{-1/2}^{1/2} e^{2πinλ} dF_N(λ)    for |n| ≤ N;    (3.14)

F_N(λ) is monotone with F_N(-1/2) = 0, F_N(1/2) = R(0) for all N.

By the first theorem of Helly (stated below), there exists then a subsequence F_{N_i}(λ) which converges to a function F(λ) at each continuity point of F(λ) [see e.g. Loève [38], p. 179, or Gnedenko [15], p. 204]. The uniqueness of F(λ) will be shown in 3).
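The non-negativity of f_N(λ) and the normalization F_N(1/2) = R(0) are easy to illustrate numerically (a modern Python sketch, not part of the original notes; R(n) = 2^{-|n|} is a hypothetical non-negative definite function):

```python
import numpy as np

def R(n):
    # hypothetical non-negative definite function R(n) = 2**(-|n|), real and symmetric
    return 2.0 ** (-abs(n))

N = 20
lam = np.linspace(-0.5, 0.5, 1001)
fN = np.zeros_like(lam)
for r in range(-N, N + 1):
    # since R(-r) = R(r) is real, the exponentials pair into cosines
    fN += R(r) * (1.0 - abs(r) / N) * np.cos(2.0 * np.pi * r * lam)

min_fN = float(fN.min())   # the averaged sum f_N(lambda) stays non-negative
mass = float(fN.mean())    # grid approximation of the total mass F_N(1/2) = R(0)
```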
First theorem of Helly: Each sequence of uniformly bounded, non-decreasing functions F_1(x), F_2(x), ... with F_i(x_0) = const at some x_0 contains at least one subsequence F_{N_1}(x), F_{N_2}(x), ... which converges to a non-decreasing function F(x) at each continuity point of F(x).

(For the proof, it is sufficient to consider only a countable, everywhere dense pointset D = {x_1, x_2, ...}. One then selects a convergent sequence at x_1: F_{n'}(x_1) → F(x_1); from it selects a second subsequence convergent at x_2: F_{n''}(x_2) → F(x_2); and so on, inductively. Then the diagonal sequence F_{nn}(x) converges on D, and hence at each continuity point of the limit function.)
For N_i → ∞ one now obtains (3.11) from (3.14) by the theorem of Helly-Bray: Let the non-decreasing, uniformly bounded functions F_1(x), F_2(x), ... converge weakly (i.e., everywhere except possibly at discontinuity points) against the function F(x) in a ≤ x ≤ b (a, b continuity points of F(x)), and let g(x) be continuous there. Then

    lim_{n→∞} ∫_a^b g(x) dF_n(x) = ∫_a^b g(x) dF(x),

where in (3.11) a = -1/2, b = 1/2.

3) Proof of (3.12): Let b(λ) be the rectangular function which is 1 in the open interval (λ_1, λ_2), 1/2 at λ_1 and λ_2, and 0 elsewhere; -1/2 ≤ λ_1 < λ_2 ≤ 1/2. Its Fourier expansion

    b(λ) = Σ_{n=-∞}^{∞} b_n e^{2πinλ},    b_n = ∫_{λ_1}^{λ_2} e^{-2πinλ} dλ = (e^{-2πinλ_2} - e^{-2πinλ_1}) / (-2πin)  for n ≠ 0,    b_0 = λ_2 - λ_1,

is pointwise convergent, and all its partial sums are uniformly bounded (as belonging to a function of bounded variation; see e.g. Achieser (1), p. 93).
Thus by Lebesgue's bounded convergence theorem [Halmos [32], p. 112] and by (3.11), we have

    ∫_{-1/2}^{1/2} b(λ) dF(λ) = F(λ_2) - F(λ_1) = lim_{N→∞} Σ_{n=-N}^{N} R(n) (e^{-2πinλ_2} - e^{-2πinλ_1}) / (-2πin)

(the term n = 0 being read as (λ_2 - λ_1) R(0)), with any of those limit functions F(λ) whose existence was shown in 2). But as the right side does not depend on F(λ), this formula now defines uniquely one of them. (End of proof)
Definition: The non-decreasing function F(λ) defined in the preceding theorem is called the spectral distribution function of the stationary sequence {x_n}.

Any distribution function F(λ) can be decomposed into three functions [see e.g. Loève [38], p. 178]

    F(λ) = F_1(λ) + F_2(λ) + F_3(λ),

where

    F_1(λ) = ∫_{-1/2}^{λ} f(x) dx is absolutely continuous (here f(x) is called the spectral density);

    F_2(λ) is a pure jump function (it can have at most countably many jumps; see below);

    F_3(λ) is the so-called continuous singular component, monotonically non-decreasing.
A function F(λ) is called absolutely continuous if for any ε > 0 there exists a δ > 0 such that for any choice of x_1 < y_1 ≤ x_2 < y_2 ≤ ... < y_n and any n with

    Σ_{i=1}^{n} (y_i - x_i) < δ

one has

    Σ_{i=1}^{n} (F(y_i) - F(x_i)) < ε.

Functions of this kind can be represented by an integral like F_1(λ), where f(x) is finite valued and integrable (Radon-Nikodym theorem; see Halmos [32], p. 108). F_3(λ) has a derivative which vanishes almost everywhere (in Lebesgue interval measure); an example for a non-vanishing function F_3(λ) can be found e.g. in Riesz-Nagy (1), p. 53. Because of its uniform continuity in [-1/2, 1/2], the contribution of F_3 in the integral ∫ g(x) dF_3(x) of any integrable function g(x) can be neglected in our statistical applications.

That the number of jumps of F_2(λ) is enumerable follows from general measure theoretic considerations [e.g., Loève [38]]. It can, however, also be seen as follows [Alexandroff (1), p. 118]: For any monotone function f(x) the one-sided limits exist everywhere. (Let y_0 = sup_{x' < x_0} f(x') < ∞. For any ε > 0, by definition of the supremum there is always an x' < x_0 such that y_0 - ε < f(x') ≤ y_0, hence y_0 - ε < f(x'') ≤ y_0 for all x' < x'' < x_0. Thus f(x_0-) exists; similarly, f(x_0+) exists.) Now, each discontinuity x_0 generates an open interval (f(x_0-), f(x_0+)) on the ordinate axis. Because of the monotonicity of f(x), all these intervals are disjoint. But then they are also enumerable (and hence the discontinuities), because we can map each of these intervals on a rational point in it. The latter ones are enumerable.
The spectrum of the process consists of every number λ_0 in whose neighborhood F(λ) actually increases, in the sense that

    F(λ_0 + ε) - F(λ_0 - ε) > 0    for every ε > 0.

The spectrum can hence consist of discrete points (jumps of F(λ)) and of a continuous part (if both: mixed spectrum). If Σ_{n=-∞}^{∞} |R(n)| converges, there is a continuous spectral density function given by

    f(λ) = F'(λ) = Σ_{n=-∞}^{∞} R(n) e^{-2πinλ} ≥ 0;

in the real case

    G'(λ) = 2R(0) + 4 Σ_{n=1}^{∞} R(n) cos 2πnλ

(from differentiation of the formulas in Theorem 2).
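For an absolutely summable covariance the density formula can be evaluated directly (a modern Python sketch, not part of the original notes; the geometric covariance R(n) = a^{|n|} with a = 1/2 is a hypothetical example). The truncated sum agrees with the closed form obtained by summing the two geometric series.

```python
import numpy as np

a = 0.5   # hypothetical absolutely summable covariance R(n) = a**|n|

def f_trunc(lam, N=200):
    # truncated f(lambda) = sum R(n) e^{-2 pi i n lambda}; terms pair into cosines
    n = np.arange(-N, N + 1)
    return float(np.sum(a ** np.abs(n) * np.cos(2.0 * np.pi * n * lam)))

def f_closed(lam):
    # the same density summed in closed form as two geometric series
    z = np.exp(-2.0j * np.pi * lam)
    return float((1.0 / (1.0 - a * z) + a * np.conj(z) / (1.0 - a * np.conj(z))).real)
```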
Examples: 1) If ..., x_{-1}, x_0, x_1, ... are mutually orthogonal, E(x_s x̄_t) = δ_{st} σ², then (x_t) is stationary with

    R(0) = σ²,    R(n) = 0 for n ≠ 0,

and

    f(λ) = F'(λ) = σ².

2) Let α_1, ..., α_k be mutually orthogonal random variables with E(α_j) = 0 and E(|α_j|²) = σ_j², and let λ_1, ..., λ_k be real unequal numbers. Define x_m by

    x_m = Σ_{j=1}^{k} α_j e^{2πiλ_j m}    (singular process);

then

    E(x_{m+n} x̄_m) = Σ_{j=1}^{k} σ_j² e^{2πiλ_j n} = R(n)

is independent of m, hence (x_m) is stationary. R(n) can be written in the integral form (3.11),

    R(n) = ∫_{-1/2}^{1/2} e^{2πinλ} dF(λ),

where F(λ) is a jump function jumping at λ_j by σ_j² (discrete spectrum).

3) The real sequence corresponding to this complex sequence is defined as follows. Let u_1, ..., u_k, v_1, ..., v_k be real mutually orthogonal random variables with

    E(u_j) = E(v_j) = 0,    E(u_j²) = E(v_j²) = σ_j² > 0,    all j,

and let again λ_1, ..., λ_k be any real numbers. Put

    x_n = Σ_{j=1}^{k} (u_j cos 2πnλ_j + v_j sin 2πnλ_j).

Then

    E(x_{m+n} x_m) = Σ_{j=1}^{k} σ_j² cos 2πnλ_j = R(n)

is independent of m, hence (x_n) is stationary. From this form of R(n), the spectrum is seen to have k jumps at λ_1, ..., λ_k. In order to write R(n) for this real process in the complex form (3.11), one has, of course, to assume F(λ) with jumps of σ_j²/2 at the respective 2k points ±λ_j (for λ_j ≠ 0).

In the next section, it will be shown that every stationary sequence which possesses a spectral density is either as in examples 2) and 3) (finite parameter schemes) or can be approximated arbitrarily closely by processes of this type. It will then be clear that the spectral decomposition (harmonic analysis) of stationary processes must play an essential role in their study.
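Example 3 lends itself to simulation (a modern Python sketch, not part of the original notes; the frequencies, variances, sample size and the Gaussian choice for the u_j, v_j are all arbitrary): the ensemble average of x_{m+n} x_m approaches Σ σ_j² cos 2πnλ_j, independently of m.

```python
import numpy as np

rng = np.random.default_rng(1)
lams = np.array([0.1, 0.23])     # hypothetical frequencies lambda_j
sig2 = np.array([1.0, 0.5])      # hypothetical variances sigma_j^2
M = 200_000                      # independent realizations of the (u_j, v_j)

u = rng.normal(0.0, np.sqrt(sig2), size=(M, 2))
v = rng.normal(0.0, np.sqrt(sig2), size=(M, 2))

def x(n):
    # x_n = sum_j (u_j cos 2 pi n lambda_j + v_j sin 2 pi n lambda_j), one value per realization
    c = np.cos(2.0 * np.pi * n * lams)
    s = np.sin(2.0 * np.pi * n * lams)
    return (u * c + v * s).sum(axis=1)

n, m = 3, 5
est = float(np.mean(x(m + n) * x(m)))                        # ensemble estimate of E(x_{m+n} x_m)
theo = float(np.sum(sig2 * np.cos(2.0 * np.pi * n * lams)))  # R(n) = sum sigma_j^2 cos 2 pi n lambda_j
```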
4) Without giving already the precise definitions, suppose x_n is expressed by the stochastic integral

    x_n = ∫_{-1/2}^{1/2} e^{2πinλ} dy(λ),

where the y(λ)-random process has orthogonal increments with

    E(|dy|²) = dF(λ).

Then (x_n) forms a random sequence where

    E(x_{m+n} x̄_m) = ∫_{-1/2}^{1/2} e^{2πinλ} dF(λ) = R(n)

is independent of m, hence (x_n) is stationary with spectral distribution function F(λ). The following spectral representation theorem shows that any stationary process can be represented this way.
4. Stochastic Integrals and Processes in Hilbert Spaces

To prepare the important spectral representation theorem in section 5, we need a number of further concepts (continuity and stochastic integrals) connected with stochastic processes which can be most adequately and clearly described in Hilbert spaces. This theory, now being applied in many modern papers on time series (see e.g. Parzen (4) and some others of his technical reports), has been introduced into the field of stochastic processes by the Swedish and the Russian schools (mainly Cramér, Kolmogoroff, Obukhoff, Karhunen, Aronszajn; Cramér gave (1942) a proof of the representation theorem).
Definition: A sequence of r.v.'s {x_n} is said to converge in the mean to x if

    lim_{n→∞} E(|x_n - x|²) = 0;

we write

    l.i.m._{n→∞} x_n = x    (limit in mean).

This convergence implies, because of the Tchebycheff inequality, convergence in probability

    x_n → x (i.p.),    meaning    lim_{n→∞} P(|x_n - x| > ε) = 0,    all ε > 0.
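The Tchebycheff step can be made concrete numerically (a modern Python sketch, not part of the original notes, with hypothetical numbers: x_n - x taken normal with E|x_n - x|² = 1/n):

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical setting: x_n - x is N(0, 1/n), so E|x_n - x|^2 = 1/n -> 0
n, eps = 100, 0.5
diff = rng.normal(0.0, 1.0, size=1_000_000) / np.sqrt(n)   # samples of x_n - x

emp = float(np.mean(np.abs(diff) > eps))   # empirical P(|x_n - x| > eps)
bound = (1.0 / n) / eps ** 2               # Tchebycheff bound E|x_n - x|^2 / eps^2
```

The empirical tail probability indeed stays below the Tchebycheff bound, and both tend to 0 as n grows.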
We need the following property of a (not necessarily stationary) random process (y(t)) (or (y_t)) with orthogonal increments (Section 2), namely that to each such process there corresponds a uniquely determined function F(t) (except for an additive constant; later on this function will be shown to be equal to the above used measure function F(λ)):

    F(t) - F(s) = E(|y_t - y_s|²),    t > s,

which is often written symbolically

    dF = E(|dy|²).

This function F(t) is obviously monotone non-decreasing, and of the form

    F(t) = const + τ t,    τ > 0,

if, in addition, (y_t) is stationary. (Proof: Let t > r > s. Then by the definition of stationarity and orthogonality, with a suitable function F_1,

    F_1(t-s) = F_1(t-r) + F_1(r-s).

But F_1(x) = const · x is the only monotone solution of F_1(x+y) = F_1(x) + F_1(y): first, one sees that F_1(ny) = n F_1(y) for any real y and any natural number n. Putting y = 1/m one obtains F_1(r) = r F_1(1) for any rational r = n/m. Now as F_1 is monotone, it is also continuous a.e., hence F_1(x) = x F_1(1) with x real.)
21
Definition:
f, B, •••
A set of elements (poirres, vectors, fUnctions)
is called a linear space R if
(a) they form a mod'ule (i. e ., there
is defined a conul1utative addition among the elements of R and to each
element exists its inverse)
(b)
there is defined a distributive, asso-
ciative nmltiplication between the elements of
numbers
0;,
such that
t3,
af
€
R for all
R and (real 'or complex)
0;,
f ["cor.rpare for the
_7·
general theory e.g. Achieser and Gla.sma.n
The number of linear independent elements in
R is called its
dimension.
R is called metric if there is defined a fimctional (f, g)
Definition:
(scalar or inner product) bet'ween any two elements
positive definite.!and hermitian,
to the same
cOll~utation
(f, f)1/2 :::; /If II
f
and
(f, g) :=(g, f),
f, g
(f, g)
is
bilinear and submitted
rules as the inner product for vectors.
is called the norm of
f,
/If-gil
the distance between
An infinite dimensional linear metric space is called a
Hilbert space
H, if it is
con~lete.
Complete means that for any f'lmdamental sequence
€
R.
g.
Definition:
fn
€
Hand
IIfn-fill /I
f € H such that
...;>
0
as
n, m ->
/If -fll -> 0
n
co
{f J with
n
there exists an element
(limit element with regard to the
respective metric).
Definition:
a Euclidean
A finite dimensional conrplete linear metric space is called
slk~ce.
The examples most important in this context for Hilbert spaces are the so-called L²_μ spaces containing all measurable functions x on a certain measure space (Ω, B, μ), say, such that |x|² is integrable, where the norm is defined by

    ||x|| = (∫ |x|² dμ)^{1/2} = (x, x)^{1/2}.

(Similarly spaces L^p_μ, p ≥ 1, are defined with ||x|| = (∫ |x|^p dμ)^{1/p}.) J. v. Neumann in 1932 [52] proved the important

Completeness theorem [see e.g. Achieser (1)]: The spaces L^p_μ are complete. Among any Cauchy sequence {f_n} there is a subsequence which converges to f a.e.

We now take the r.v.'s of a stochastic process (or sequence) {y_t} in order to generate a Hilbert space H' with the inner product

    (x, y) = E(x ȳ).

The construction is done by taking all finite linear combinations of the y_t's, that is, the linear manifold M' (say), which forms a subspace in the L²_μ of all square measurable functions with respect to μ, i.e., all r.v.'s on (Ω, B, μ) over their probability space which have finite second absolute moments (whose variances exist). Hence, because of the completeness theorem, any fundamental sequence out of M' has a limit (in mean) in this L²_μ. Now by adding all these limits to M', one obtains the Hilbert space H' (say) (the closure of M'), a subset of the mentioned L²_μ [compare [2]].

If now in addition {y_t} has orthogonal increments, then we have for the distance-square of any two elements y_t, y_s, t > s,

    ||y_t - y_s||² = E(|y_t - y_s|²) = F(t) - F(s),
where F(t) is bounded and continuous a.e. Hence instead of a fundamental sequence (y_{t_i}) with t_i approaching a value s from either side (or from both), one may also consider y_t with t approaching s continuously. Vice versa, if s is any real number in the considered t-interval T (except possibly end points), then a fundamental sequence with t_i → s can always be found. Note that the limit elements in H' are always defined only a.e.

We can now define the continuity of a process [Doob [34], p. 425]:

Definition: A process (y_t) (t ranging over an interval T) with orthogonal increments is called continuous from the left (right) if

    y_{s-} = l.i.m._{t↑s} y_t    or    y_{s+} = l.i.m._{t↓s} y_t

exist (for all inner points s of T); hence also y_t → y_{s-} resp. y_{s+} i.p.

Now, because F(t) can have at most countably many jumps (see 3.), we find that any process (y_t) with orthogonal increments is continuous in the sense that

    y(t) = y(t+) = y(t-)    with probability 1,

for all t except possibly for an enumerable set (jumps of F). (If, e.g., F(t) = F(t+), then E(|y(t) - y(t+)|²) = 0, hence y(t) = y(t+) a.e.)
To define stochastic integrals we set up a measure-preserving correspondence (congruence) between two Hilbert spaces H' and H*, to be defined now.

Let T be a closed interval, and defined on it a stochastic process (y_t) with orthogonal increments. Let H' be the above introduced Hilbert space generated by it, the scalar product being (y_s, y_t) = E(y_s ȳ_t). We now take the 1-1 mapping

    y_t <--> χ_t,    where χ_t(s) = 1 for s ≤ t, = 0 elsewhere in T (characteristic function of the t-set),

and extend it to linear combinations of corresponding elements in both sets.

Let φ(t) be the following step function on T:

    φ(t) = c_j  in  a_{j-1} ≤ t < a_j,  2 ≤ j ≤ n;    φ(t) = 0  for  t < a_1, t ≥ a_n;    a_1 < ... < a_n.

It corresponds to the r.v. Φ ∈ H', written in an obvious integral notation

    Φ = ∫_T φ(t) dy(t) = Σ_{j=2}^{n} c_j (y(a_j -) - y(a_{j-1} -))

(provision is made for non-continuous processes by taking y(t-); then any Φ is left continuous also). For any two step functions φ_1, φ_2 with the corresponding r.v.'s Φ_1, Φ_2 we have

    (Φ_1, Φ_2) = E(Φ_1 Φ̄_2) = E{ ∫_T φ_1(t) dy(t) · conj ∫_T φ_2(t) dy(t) } = ∫_T φ_1(t) φ̄_2(t) dF(t) = (φ_1, φ_2),
say, taking this as scalar product in the linear space of the φ's. (For the proof take a_1, ..., a_n as the union of the step abscissas of φ_1 and φ_2. Then, using orthogonality and assuming y(t) = y(t-),

    E(Φ_1 Φ̄_2) = Σ_j c_j^{(1)} c̄_j^{(2)} ∫_{a_{j-1}}^{a_j} dF(t) = ∫_T φ_1(t) φ̄_2(t) dF(t).)

Now, by taking the closure of this metricized linear manifold, we get a Hilbert space H* (say) of square-measurable functions (with respect to the measure F(t)) on T, which is again contained in some L²_F; if F(t) is a finite measure, then H* is congruent to this L²_F. The scalar product (φ_1, φ_2) by this extension is now defined for all elements of H*, and so is the correspondence Φ <--> φ, which remains to be distance preserving.

Suppose now a sequence φ_n has the limit φ, i.e.,

    ||φ_n - φ||_F → 0

(where the F indicates the weight function for the norm in H*). Then it follows that also the corresponding Φ_n form a fundamental sequence which, because of the completeness, has a limit

    l.i.m. Φ_n = Φ,    Φ ∈ H'

(here the probability measure P indicates the weight function for the norm in H'). It is this r.v. Φ which we define as the value of the stochastic integral

    Φ = ∫_T φ(t) dy(t).    (Definition)
¢(t) ay(t)
T
¢(t)
here rrey be any
F-square integrable function.
We observe that this mapping is topological (i.e., continuous and unique in both directions).
If we want to integrate instead over only a Borel set A ⊂ T, then we merely multiply φ by the characteristic function of A (1 on A, 0 elsewhere), and get again a unique r.v.

Definition: A stochastic integral of a process (y_t) with orthogonal increments, for a Riemann-Stieltjes integrable function φ(t) (with respect to F(t) as given above), is defined by the r.v.

    φ = ∫_a^b φ(t) dy(t) = l.i.m._{n→∞} Σ_{j=0}^{n−1} φ(t'_j) (y(t_{j+1}) − y(t_j)),

where [a, b] is a (closed) t-interval and t_0 = a < t_1 < ... < t_n = b is a subdivision of it with max(t_{j+1} − t_j) → 0 and t_j ≤ t'_j ≤ t_{j+1}. Put y(t_{j+1}) − y(t_j) = z(t_j), j = 1, 2, ...; the z(t_j) are a set of orthogonal r.v.'s. Clearly the above limit shows that any such r.v. φ can be approximated in the mean as closely as desired by a sum of orthogonal r.v.'s.
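The defining limit can be imitated numerically. The sketch below is an illustration only: the interval, the integrand, and the independent Gaussian increments are assumptions chosen for the example, not taken from the text. It approximates ∫_T φ(t) dy(t) by Riemann-Stieltjes sums for a process with independent (hence orthogonal) increments and checks, by Monte Carlo, the isometry E|∫ φ dy|² = ∫ |φ|² dF underlying the H' <--> H* congruence.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)          # partition of T = [0, 1]
dt = np.diff(grid)
phi = np.sin(2 * np.pi * grid[:-1])        # integrand, evaluated at left endpoints

# Monte Carlo over realizations of a process with orthogonal (here independent
# Gaussian) increments dy, normalized so that E|dy|^2 = dF = dt
reps = 20000
dy = rng.normal(0.0, np.sqrt(dt), size=(reps, dt.size))
integrals = dy @ phi                       # Riemann-Stieltjes sums sum_j phi(t_j)(y(t_{j+1}) - y(t_j))

second_moment = np.mean(integrals ** 2)
isometry_value = np.sum(phi ** 2 * dt)     # int_T |phi|^2 dF
assert abs(second_moment - isometry_value) < 0.03
```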
For stationary processes the stochastic integral will be shown to assume a very simple form.
5. The Spectral Representation Theorem for Stationary Sequences

We are now prepared to prove

Theorem 3: Every stationary sequence (x_n), −∞ < n < ∞, has the spectral representation

(5.1)    x_n = ∫_{−1/2}^{1/2} e^{2πinλ} dy(λ),

where the process (y(λ)) (not necessarily stationary) has orthogonal increments with E(|dy(λ)|²) = dF(λ); F(λ) is the spectral distribution function of (x_n).
If the normalization y(λ) = ½(y(λ−) + y(λ+)) holds, then for 1/2 ≥ λ_2 > λ_1 ≥ −1/2

(5.3)    y(λ_2) − y(λ_1) = x_0 (λ_2 − λ_1) + l.i.m._{N→∞} Σ_{n=−N, n≠0}^{N} (e^{−2πinλ_2} − e^{−2πinλ_1}) / (−2πin) · x_n    a.e.,

which determines y(λ) uniquely; in particular y(1/2) − y(−1/2) = x_0.
If (x_n) is real, this can be put into the form

    x_n = ∫_0^{1/2} (cos 2πnλ du(λ) + sin 2πnλ dv(λ)),

where (u(λ)), (v(λ)) are real processes with orthogonal increments and, for λ > 0,

    E(du(λ) dv(μ)) = 0,    E(|du(λ)|²) = E(|dv(λ)|²) = ½ dG(λ),

G(λ) being defined in Theorem 2. Again after suitable normalization one has, in 0 < λ < 1/2,

    u(λ) − u(0) = 2x_0 λ + l.i.m._{N→∞} Σ_{n=−N, n≠0}^{N} (sin 2πnλ)/(πn) · x_n,    u(1/2) − u(0) = x_0,

    v(λ) = l.i.m._{N→∞} Σ_{n=−N, n≠0}^{N} (1 − cos 2πnλ)/(πn) · x_n.
Proof: 1) Proof of (5.1), given any stationary sequence (x_n). By closure of the linear manifold M' spanned by the r.v.'s x_n with respect to the distance function ||x − y|| one obtains a Hilbert space, say H' (for the moment to be distinguished from the above defined H'). It is embedded in the L² of square measurable functions (defined on the joint pr.-space of the x_n); the x_n are square measurable (E(|x_n|²) < ∞) because of wide sense stationarity.
Similarly, by closure of the linear manifold M* spanned by the functions e^{2πinλ}, −1/2 ≤ λ ≤ 1/2, with respect to the distance function ||·||_F one obtains a Hilbert space, say H* (for the moment to be distinguished from the above defined H*). It is embedded in, in fact it is equal to, the L²_F of all functions s(λ) with ∫_{−1/2}^{1/2} |s|² dF(λ) < ∞ (Achieser and Glasmann, p. 227). This follows from the fact that the orthonormal system (e^{2πinλ}) is complete in L²_F (or everywhere dense); hence L²_F is separable. Hence every element in L²_F is the l.i.m._F (i.e., with respect to the just defined distance) of a fundamental sequence with elements out of M*. Especially the characteristic functions χ(λ) of [−1/2, λ] are elements of H*.

Now put into 1-1 correspondence the elements of M* and M' (Doob, p. 565) by

    e^{2πinλ} <--> x_n,

and similarly for linear combinations.
This mapping is distance preserving:

    E( Σ_m b_m x_m · conj Σ_n b_n x_n ) = Σ_{m,n} b_m b*_n R(m−n) = ∫_{−1/2}^{1/2} Σ_m b_m e^{2πimλ} · Σ_n b*_n e^{−2πinλ} dF(λ),

utilizing Theorem 2. By continuity it can be extended to the whole respective Hilbert spaces H* and H' (topological mapping).
Let y(λ) be the r.v. which corresponds to χ(λ). The set (y(λ)) is a stochastic process with orthogonal increments: for every λ_1 > λ_2 ≥ λ_3 > λ_4 in [−1/2, 1/2] holds

    E( (y(λ_1) − y(λ_2)) conj(y(λ_3) − y(λ_4)) ) = ∫_{−1/2}^{1/2} C(λ_2, λ_1) C(λ_4, λ_3) dF(λ) = 0,

where C(μ, ν) = 1 in μ < λ ≤ ν and 0 elsewhere. Further, because of E(|y(ν) − y(μ)|²) = ∫ C(μ, ν) dF(λ) = F(ν) − F(μ) we have (5.2). Obviously, (y(λ)) forms a stochastic process: through the definition of the stochastic integral as l.i.m. of measurable functions, y(λ) is given on the joint probability space of the x_n's for any λ. Thus also the joint p.d. of y(λ) for any parameter points λ_1, ..., λ_l is consistently and symmetrically defined. Finally, to any φ(λ) ∈ H* there corresponds the stochastic integral introduced in Section 4:

    ∫_{−1/2}^{1/2} φ(λ) dy(λ) ∈ H'.
2) Proof of (5.3): if (5.1) is true, we must have, writing as in the proof of Theorem 2,

    b_n = (e^{−2πinλ_2} − e^{−2πinλ_1}) / (−2πin).

Then the right side in (5.3) can be written (in the notation of Theorem 2) as a r.v. ∈ H' with

    l.i.m._{N→∞} Σ_{n=−N}^{N} b_n x_n = l.i.m._{N→∞} ∫_{−1/2}^{1/2} Σ_{n=−N}^{N} b_n e^{2πinλ} dy(λ),

and this equals, by virtue of the 1-1 correspondence between H* and H',

    ∫_{−1/2}^{1/2} ( l.i.m._F Σ_{n=−N}^{N} b_n e^{2πinλ} ) dy(λ) = ∫_{−1/2}^{1/2} C(λ_1, λ_2) dy(λ) = y(λ_2) − y(λ_1).

(End of proof.)
We are now in the position to give precise meaning to the example 3) in Section 3, which proves to be a partial inverse of the last theorem, namely: Any stochastic sequence (x_n) defined by the stochastic integrals

    x_n = ∫_{−1/2}^{1/2} e^{2πinλ} dy(λ),

where (y(λ)) is a process with orthogonal increments and E(|dy|²) = dF(λ) of finite variation, is stationary in the wide sense. (To prove this we need only to remember that the 1-1 mapping of the Hilbert space H', of which the x_n's are elements, onto the H* generated by the e^{2πinλ} preserves the inner product. Hence

    E(x_{m+n} x*_m) = ∫_{−1/2}^{1/2} e^{2πinλ} dF(λ),

which is seen to be independent of m; we call it R(n).)
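This partial inverse can be checked numerically. In the sketch below everything concrete (the spectral density, the frequency grid, the Gaussian increments, all sizes) is an illustrative assumption: a sequence x_n is synthesized from discrete orthogonal increments dy(λ) with E|dy|² = f(λ) dλ, and the empirical covariance E(x_{m+n} x*_m) is seen to depend on the lag only, matching R(n) = ∫ e^{2πinλ} dF(λ).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
lam = np.linspace(-0.5, 0.5, m + 1)[:-1]          # frequency grid on [-1/2, 1/2)
dlam = 1.0 / m
f = 1.0 + np.cos(2 * np.pi * lam)                 # assumed spectral density, dF = f dlam

# complex orthogonal increments dy(lam): independent across frequencies, E|dy|^2 = f dlam
reps = 10000
dy = (rng.normal(size=(reps, m)) + 1j * rng.normal(size=(reps, m))) * np.sqrt(f * dlam / 2.0)

n = np.arange(6)
basis = np.exp(2j * np.pi * np.outer(n, lam))     # e^{2 pi i n lam}
x = dy @ basis.T                                  # realizations of x_0, ..., x_5

# E(x_{m+n} x*_m) should depend on the lag n only and equal R(n) = int e^{2 pi i n lam} dF
R1 = np.sum(np.exp(2j * np.pi * lam) * f * dlam)
cov_01 = np.mean(x[:, 1] * np.conj(x[:, 0]))
cov_34 = np.mean(x[:, 4] * np.conj(x[:, 3]))
assert abs(cov_01 - R1) < 0.05
assert abs(cov_34 - R1) < 0.05
```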
6. Spectral Decompositions and Linear Operations

One can write the random variable x_n in a stationary sequence as a (finite or denumerable) sum of mutually orthogonal random variables. For this purpose, divide the interval [−1/2, 1/2] into a number of disjoint sets A_1, ..., A_m whose union is the interval. Then clearly

    x_n = Σ_{j=1}^{m} ∫_{A_j} e^{2πinλ} dy(λ) = Σ_{j=1}^{m} x_n^{(j)}

from the spectral representation theorem.
Using the characteristic function χ_j(λ) of A_j, the orthogonality is easily seen; we even show, for fixed j ≠ k,

    ∫_{−1/2}^{1/2} e^{2πi(n−m)λ} χ_j(λ) χ_k(λ) dF(λ) = 0.

Vice versa, x_n can be arbitrarily closely approximated in the mean by a linear combination of the x_n^{(j)}. We know that the characteristic function χ_j(λ) of any A_j can be expanded into its Fourier series

    χ_j(λ) = lim_{N→∞} Σ_{k=−N}^{N} c_k^{(j)} e^{2πikλ},

which converges a.e. and is uniformly bounded. By the constructed correspondence between the Hilbert spaces H* and H' (χ_j(λ) ∈ H*), we have, as x_n^{(j)} <--> χ_j(λ) e^{2πinλ},

    x_n^{(j)} = l.i.m._{N→∞} Σ_{k=−N}^{N} c_k^{(j)} x_{n+k}.
Choosing the three sets of Section 3 properly, one obtains the decomposition of a sequence (x_t) into three mutually orthogonal parts

    x_n = x_n^{(1)} + x_n^{(2)} + x_n^{(3)}

corresponding to F = F_1 + F_2 + F_3. For the jump function F_1(λ) with jumps at λ_1, λ_2, ... one obtains, because of dF_1(λ) = E(|dy(λ)|²) = 0 in the open intervals (λ_l, λ_{l+1}),

    x_n^{(1)} = Σ_j e^{2πinλ_j} (y(λ_j) − y(λ_j−)) = Σ_j e^{2πinλ_j} z_j.
Absolutely continuous spectral distributions and moving averages

In this case a function f(λ) with

    F'(λ) = |f(λ)|²

exists, if only F'(λ) exists. We can then define, from Theorem 3,

    dY~(λ) = dy(λ)/f(λ),    E(|dY~(λ)|²) = dλ,

i.e., dY~(λ) proportional to dy(λ); dY~(λ) can be arbitrarily defined where f(λ) = 0. Note that (Y~(λ)) is now a stationary process with orthogonal increments.
Definition: A random sequence

(6.2)    x_n = Σ_{j=−∞}^{∞} c_j ξ_{n+j}

is called a sequence of moving averages if Σ_j |c_j|² < ∞ and the ξ_n form an orthonormal set of r.v.'s, i.e., E(ξ_n ξ*_m) = δ_{nm}. We prove

Theorem 4: A random sequence is a process of moving averages if and only if it is stationary with an absolutely continuous spectral distribution. The spectral density is equal to |c(λ)|² with c(λ) = Σ_j c_j e^{2πijλ}.
Proof: 1) If a random sequence (x_n) is given by (6.2), then

    E(x_{m+n} x*_m) = Σ_{j=−∞}^{∞} c_{j−n} c*_j = R(n) < ∞;

hence (x_n) is a stationary random sequence. If c(λ) = Σ_j c_j e^{2πijλ}, then by the Parseval identity

    R(n) = ∫_{−1/2}^{1/2} c(λ) c*(λ) e^{2πinλ} dλ = ∫_{−1/2}^{1/2} |c(λ)|² e^{2πinλ} dλ,

so that the spectral distribution is absolutely continuous with F'(λ) = |c(λ)|².
2) Let (x_n) be a stationary random sequence with absolutely continuous spectral distribution, F'(λ) = |f(λ)|². Let

    f(λ) = Σ_j c_j e^{2πijλ}

be a Fourier expansion of f, which converges in the mean (but does not need to be unique). Using dy(λ) = f(λ) dY~(λ) (as introduced above) in the spectral representation we have

    x_n = ∫_{−1/2}^{1/2} e^{2πinλ} f(λ) dY~(λ) = Σ_{j=−∞}^{∞} c_j ξ_{n+j},    ξ_t = ∫_{−1/2}^{1/2} e^{2πitλ} dY~(λ),

utilizing again the correspondence between the Hilbert spaces H' and H*. The orthogonality of the ξ_m is verified by inspection. (End of proof.)
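The two expressions for R(n) in the proof, the coefficient sum Σ_j c_{j−n} c*_j and the integral ∫ |c(λ)|² e^{2πinλ} dλ, can be compared numerically. The coefficients below are an arbitrary illustrative choice (finitely many nonzero, real), not taken from the text.

```python
import numpy as np

c = np.array([0.5, 1.0, -0.3, 0.2])               # illustrative coefficients c_0 .. c_3
support = np.arange(c.size)

def R_time(n):
    """R(n) = sum_j c_{j-n} conj(c_j), absent coefficients counted as 0."""
    total = 0.0
    for j in support:
        if 0 <= j - n < c.size:
            total += c[j - n] * np.conj(c[j])
    return total

m = 2000
lam = (np.arange(m) - m // 2) / m                 # uniform grid on [-1/2, 1/2)
c_lam = sum(c[j] * np.exp(2j * np.pi * j * lam) for j in support)
density = np.abs(c_lam) ** 2                      # spectral density |c(lambda)|^2

# a Riemann sum over a full period is exact (to rounding) for trigonometric polynomials
for n in range(-3, 4):
    R_freq = np.sum(density * np.exp(2j * np.pi * n * lam)) / m
    assert abs(R_freq - R_time(n)) < 1e-10
```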
We thus obtain the general representation of a stationary sequence,

    x_n = Σ_j e^{2πinλ_j} z_j + Σ_j c_j ξ_{n+j} + x_n^{(3)},

where all the r.v.'s z_j, ξ_j are mutually orthogonal and determined by (y(λ)) or (x_n) as shown above.
Linear operations on stationary sequences

Let

    x_n = ∫_{−1/2}^{1/2} e^{2πinλ} dy(λ),    E(|dy(λ)|²) = dF(λ),

be a stationary random sequence. A linear operation is a mapping

    x_n → x^_n = Σ_j c_j x_{n+j}

or, more generally,

    x^_n = ∫_{−1/2}^{1/2} e^{2πinλ} c(λ) dy(λ),

where the sum over j is finite or infinite (filtering devices are of this type). If c(λ) can be expanded into an integrable Fourier series c(λ) = Σ_j c_j e^{2πijλ}, then the sum and the integral are evidently identical.
The covariance function is

    R^(n) = ∫_{−1/2}^{1/2} e^{2πinλ} |c(λ)|² dF(λ),

hence the spectral distribution is

    F^(λ) = ∫_{−1/2}^{λ} |c(μ)|² dF(μ).

The spectral intensities are multiplied by |c(μ)|². An example (Doob, p. 503) is moving average sequences, say

    x^_n = (1/3)(x_{n−1} + x_n + x_{n+1}),    |c(λ)|² = (1/9)(1 + 2 cos 2πλ)².

This means that frequencies near λ = 1/3, where c(λ) vanishes, become relatively unimportant in the Fourier decomposition of the covariance function R^(n). One observes that any such averaging changes the frequency relations of the process; certain frequencies (such as of noise, for example) may be damped down. Also unwanted seasonal variations can be blended out.
7. Spectral Densities Rational in e^{2πiλ} and the Corresponding Random Sequences

Since any continuous function may be approximated uniformly over a bounded interval by a trigonometric polynomial, it is natural to use this as an approximation for F'(λ), the derivative of the absolutely continuous spectral distribution. A more general approximation of the non-negative function F'(λ) would be rational functions in e^{2πiλ}. Both approximations will be seen to lead to important different types of sequences. We now assume F'(λ) > 0.
The problem is to find the properties of the random sequences which correspond to different types of densities. It can be solved by using the transformation y(λ) → Y~(λ) (Section 6) into a stationary orthonormal process. We need |f(λ)| = (F'(λ))^{1/2}, which can be obtained because of F'(λ) > 0; we prove that for

(7.1)    F'(λ) = Σ_{k=−K}^{K} a_k e^{2πikλ} / Σ_{l=−L}^{L} b_l e^{2πilλ}

the complex function f(λ) can again be expressed as a quotient of trigonometric polynomials, yet not uniquely, as will be shown. This ambiguity will result in a number of different possible types of corresponding sequences (x_n).
Without loss of generality it can be assumed that numerator and denominator in (7.1) have no common roots. Then, because of the integrability of F', its denominator cannot have a root z_0 of modulus one, because to |z_0| = |e^{2πiλ_0}| = 1 would belong a λ-value λ_0 of the (closed) integration interval. Because of the reality of F' and the unique representation of the quotient (7.1) (which can be assumed normalized by b_0 = 1) we have for the coefficients the conjugate symmetry a_{−k} = a*_k, and thus reality of the numerator and the denominator themselves.
(We have namely: a trigonometric polynomial Σ_k γ_k e^{2πikλ} is real valued if and only if γ_{−k} = γ*_k. If now v_0 ≠ 0 is a root of a real valued trigonometric polynomial Σ_{j=−N}^{N} γ_j v^j in v = e^{2πiλ}, then so is 1/v*_0: from Σ_j γ_j v_0^j = 0 follows, by conjugating and using γ*_j = γ_{−j}, Σ_j γ_j (1/v*_0)^j = 0.) Thus its factor representation consists only of paired factors of the form

    (e^{2πiλ} − v_0)(e^{2πiλ} − 1/v*_0),

as seen after suitable grouping (including also roots v_0 = ±1, which must occur with even multiplicity because the degree of the polynomial is always even).
Finally (using that F'(λ) equals its modulus),

(7.2)    F'(λ) = const · Π_{j=1}^{K} |e^{2πiλ} − z_j|² / Π_{j=1}^{L} |e^{2πiλ} − w_j|² = |Σ_{s=0}^{K} A_s e^{2πisλ}|² / |Σ_{l=0}^{L} B_l e^{2πilλ}|²,
where the z_j, w_j are the respective roots, or equally well the inverses of their conjugate complex values, and B_0 may be normalized to one. From among the sets {z_j, 1/z*_j} and {w_j, 1/w*_j} the actual roots can in particular be chosen so as to be of modulus ≤ 1 (for the w_j, which cannot be of modulus one, this means |w_j| < 1). With this choice the denominator is uniquely determined, and so is the numerator. We remark that F'(λ) (whose value as a function of λ stays of course determined) admits a number of different representations of f(λ) as a quotient of polynomials: by taking another f of the same modulus, i.e., multiplying numerator and denominator by complex functions of λ of modulus one, we may pass from one representation to another.
Let f(λ) be one of the possible representations ((7.2) represents the most general form of F'(λ) in the considered class), and put dY~(λ) = dy(λ)/f(λ), so that E(|dY~|²) = dλ. As examples for the different possible types:

Case 1: The denominator is ≡ 1 (L = 0). Then the spectral representation (Theorem 3) becomes

(7.3)    x_t = ∫_{−1/2}^{1/2} e^{2πitλ} Σ_{k=0}^{K} A_k e^{−2πikλ} dY~(λ) = Σ_{k=0}^{K} A_k ε_{t−k},

where

    ε_t = ∫_{−1/2}^{1/2} e^{2πitλ} dY~(λ)

is evidently an orthonormal random sequence. (For convenience, we have taken in f(λ) negative powers e^{−2πikλ}, which in (7.2) is, of course, always possible if the A_s and B_l are chosen properly, namely from f*(λ).) Evidently (7.3) is a finite moving average (averaging only the "past" values of ε_t) and as such an example of Theorem 4. (If we would take f(λ) with positive powers e^{2πikλ}, we would average only future values. Both processes have the same spectral density.)
The sequences corresponding to the examples just given for f(λ), with the same spectral distribution, involve (ε_t^{(1)}) and (ε_t^{(2)}), two different orthonormal sequences.

Case 2: The numerator is constant (K = 0). If f(λ) is one of the possible roots of the given F'(λ), we get for (x_t) the autoregression

    Σ_{l=0}^{L} B_l x_{t−l} = ε_t,

where again (ε_t) is an orthonormal sequence.
4.
41
as is seen from expanding
below into a series of' orthogonal errors
We call
x
Bf zf
2.:
["e. g., Mann and IV"ald
J)J.7_7 the
<1
they are in mode
u
as shmm
€ T , € T- l'
If all
roots of'
L
2.:
f::::o
Z
Bf zf r 1
(2.:
,.
_II
z ~
Bf
2.: Bf
if' those of
pand the analytic :fUnction
f
:=;
0
are
> 1), vTe may ex-
into a series of positive powers
e-2~iA which ,vill converge tUlif'ormly in a region including the
unit circle.
Introducing this series above, we have transformed the
autoregression into an inf'inite moving average of'
--
x...
lie outside the unit circle (one often considers
: : 0 : : /J
z::::
and
T
a finite autoregression on past ob::lervations.
If all the roots lie inside the unit circle (f(λ) under this requirement is again unique; its Laurent series does not contain any positive powers of z = e^{−2πiλ}), then x_t can be written as a moving average on all "future" ε's. (These ε's are again different from the above ones, yet orthonormal.) If the roots lie on both sides of the unit circle, the moving average ceases to be one-sided. It rather includes all future and past values ε_t; f^{−1}(λ) is then regular in an annulus 0 < r < |z| < R with r < 1 < R, z = e^{2πiλ}.
Often, on the ground of prior information, it is possible to select one of the possible averages to a given density F'(λ). Note that "future" and "past" are determined by the sign of the exponent in e^{2πiλ}, which can be chosen freely.

Case 3: If f(λ) has the general form

    f(λ) = Σ_{k=0}^{K} A_k e^{−2πikλ} / Σ_{l=0}^{L} B_l e^{−2πilλ},

one is led to consider the random sequence (x_t) for which holds

    Σ_{l=0}^{L} B_l x_{t−l} = Σ_{k=0}^{K} A_k ε_{t−k}

with an orthonormal sequence (ε_t). The left side is namely equal to

    ∫_{−1/2}^{1/2} e^{2πitλ} Σ_{l=0}^{L} B_l e^{−2πilλ} dy(λ) = ∫_{−1/2}^{1/2} e^{2πitλ} Σ_{k=0}^{K} A_k e^{−2πikλ} dY~(λ),

putting again dy(λ) = f(λ) dY~(λ) and ε_t = ∫_{−1/2}^{1/2} e^{2πitλ} dY~(λ).
One observes that there are up to 2^{K+L} different difference equations of the above types between the x_t and the appropriate ε_t for the same spectral density, arising from the different choices of the z_j, w_j. (These choices include the models obtained by changing the sign in the powers of f(λ).) These so-called finite parameter schemes are of great practical
importance.
On the one hand, they allow the setup of plausible and simple models for some real phenomena. On the other hand, it is only a finite number of quantities that need to be (and can be) estimated from given data. It is clear from the above and Theorem 4 that all these schemes for any orthogonal errors represent a stationary sequence. The covariance function of a process in any of the three cases mentioned obeys certain simple difference equations which are sometimes useful (Doob, p. 503; Hannan, p. 37).
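The expansion of a finite autoregression into an infinite moving average, described above, can be sketched numerically. The coefficients B_l below are an illustrative assumption with the roots of B(z) outside the unit circle; the power-series coefficients of 1/B(z) then reproduce, term by term, the sequence generated by the difference equation.

```python
import numpy as np

# autoregression  x_t - 0.5 x_{t-1} + 0.06 x_{t-2} = eps_t  (B_0 = 1);
# the roots of B(z) = 1 - 0.5 z + 0.06 z^2 are 10/3 and 5, both outside the unit circle
B = np.array([1.0, -0.5, 0.06])

# power-series coefficients g_j of 1/B(z) from the recursion sum_l B_l g_{j-l} = delta_{j0}
n_g = 60
g = np.zeros(n_g)
g[0] = 1.0 / B[0]
for j in range(1, n_g):
    g[j] = -(B[1] * g[j - 1] + (B[2] * g[j - 2] if j >= 2 else 0.0)) / B[0]

rng = np.random.default_rng(3)
eps = rng.normal(size=2000)

# x by the difference equation (zero initial conditions) ...
x = np.zeros_like(eps)
for t in range(len(eps)):
    x[t] = eps[t] + 0.5 * (x[t - 1] if t >= 1 else 0.0) - 0.06 * (x[t - 2] if t >= 2 else 0.0)

# ... and x as a (truncated) moving average over past errors
x_ma = np.array([sum(g[j] * eps[t - j] for j in range(min(t + 1, n_g)))
                 for t in range(len(eps))])

assert np.max(np.abs(x[100:] - x_ma[100:])) < 1e-8
```

The truncation at 60 terms is harmless here because the g_j decay geometrically (inverse roots 0.3 and 0.2).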
CHAPTER II

THE ESTIMATION OF THE COVARIANCE FUNCTION AND OF THE SPECTRUM IN STATIONARY SEQUENCES

8. The Estimation of R(n) and F(λ) from Sample Sequences
From Theorem 2, concerning the transformation of R(n) and F(λ) into each other, it is clear that a sequence is specified by the correlogram (or covariance function R(n)) to the same extent as by the spectral distribution F(λ). Which aspect one prefers depends on the physical context in which the problem is given. The correlogram R(n)/R(0) is usually preferred for the estimation of parameters in finite schemes occurring often in economics. Knowledge of the spectral density is often helpful in the estimation of deterministic components (Section 14).
The basis for any estimation in the field of random sequences is given by ergodic theorems, which here occur as versions of the law of large numbers. The question is whether a multitude of observations on one and the same r.v., say x_0, can be replaced by successive single observations on a series of other related r.v.'s, say x_1, x_2, ... If this can be asserted, then either of both sets of observations can be used in a statistic which makes statements about the distribution of a typical r.v. x_0 of the sequence.
To explain the meaning of the ergodic theorem further, we may think of n identical processes (x_t^{(i)}), i = 1, ..., n, running parallel in space (coordinate i) and simultaneously in time t. At a fixed time t_0 we take n observations on the r.v. x_{t_0}, taken at the n different points i. Then we relate these to n observations taken at the time points t_0, t_1, ..., t_{n−1} and at the same point i_0. (Of course, it may also be possible to take several observations on the same r.v. in one process. Then one does not need n parallel processes.)
Weak Law of Large Numbers (we follow Doob, p. 489 ff): If (x_t) is a stationary random sequence, then the identity holds (between r.v.'s)

(8.1)    l.i.m._{n→∞} (1/n) Σ_{j=1}^{n} x_j = y(0+) − y(0−).

Instead of having j run from 1, it could start from any m (thus allowing any negative or positive indices).

Proof: We have (Theorems 2 and 3)

    (1/n) Σ_{j=1}^{n} x_j = ∫_{−1/2}^{1/2} (1/n) · e^{2πiλ}(1 − e^{2πinλ})/(1 − e^{2πiλ}) dy(λ),

(8.2)    (1/n) Σ_{j=1}^{n} R(j) = ∫_{−1/2}^{1/2} (1/n) · e^{2πiλ}(1 − e^{2πinλ})/(1 − e^{2πiλ}) dF(λ) → F(0+) − F(0−).

The integrands are identical and tend, for n → ∞, to 0 for λ ≠ 0 and to 1 for λ = 0 (l'Hospital). They are bounded for all n, and hence by Lebesgue's bounded convergence theorem (8.2) is proved. Also, because of the distance-preserving Hilbert space mapping H' <--> H* (Section 4), the limit in (8.1) is seen to exist and to equal y(0+) − y(0−). (End of proof.)
Thus x- = (1/n) Σ_{j=1}^{n} x_j converges in mean to the r.v. y(0+) − y(0−), which will, however, not be zero in the mean unless dF(0) = 0. If, on the other hand, dF(0) > 0, i.e., if the spectrum has a jump at 0, x- need not converge to a constant, and there is no sampling method (based on finite samples) that can do more than indicate vaguely that any particular λ-value is a jump point of F. If dF(0) = 0, x- is a consistent estimator of 0.
Corollary: If −1/2 < μ ≤ 1/2, then

    l.i.m._{n−m→∞} 1/(n−m+1) Σ_{j=m}^{n} x_j e^{−2πijμ} = y(μ+) − y(μ−),

    lim_{n−m→∞} 1/(n−m+1) Σ_{j=m}^{n} R(j) e^{−2πijμ} = F(μ+) − F(μ−).

Proof: Apply the above theorem to the process (x_n e^{−2πinμ}), with covariance function R(n) e^{−2πinμ} and spectral distribution F(λ+μ) (mod 1 in the argument).
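The corollary suggests a simple numerical experiment; the frequency, sample size, and random-phase harmonic below are illustrative assumptions. The discrete averages (1/n) Σ x_j e^{−2πijμ} stay away from zero exactly at a jump frequency of the spectrum and vanish elsewhere.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
t = np.arange(1, n + 1)
mu = 0.2                                  # location of a spectral jump
# stationary sequence: noise plus a random-phase harmonic at frequency mu
phase = rng.uniform(0, 2 * np.pi)
x = np.sqrt(2.0) * np.cos(2 * np.pi * mu * t + phase) + rng.normal(size=n)

def discrete_average(x, t, freq):
    return np.mean(x * np.exp(-2j * np.pi * freq * t))

# at the jump frequency the average converges to a non-zero r.v. (y(mu+) - y(mu-)),
# at other frequencies it converges to 0
assert abs(discrete_average(x, t, mu)) > 0.5
assert abs(discrete_average(x, t, 0.35)) < 0.1
```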
The following theorem is the strong law of large numbers. It establishes convergence with probability one (or a.e.) instead of mean convergence, on the basis of stronger assumptions about the process (x_t).
Theorem: If (x_t) is a stationary random sequence and if −1/2 < μ ≤ 1/2, then

(8.8)    (1/n) Σ_{k=1}^{n} x_k e^{−2πikμ} → 0    a.e. (or with probability 1)

if there are positive constants K and a such that

(8.9)    E| (1/n) Σ_{k=1}^{n} x_k e^{−2πikμ} |² ≤ K n^{−a}.
The left side of (8.9) can be replaced by

(8.10)    E| 1/(n+1) Σ_{j=0}^{n} x_j e^{−2πijμ} |² = 1/(n+1)² Σ_{j,k=0}^{n} R(j−k) e^{−2πi(j−k)μ}
          = 1/(n+1) Σ_{j=−n}^{n} (1 − |j|/(n+1)) R(j) e^{−2πijμ}
          = ∫_{−1/2}^{1/2} sin²(π(n+1)(λ−μ)) / ((n+1)² sin²(π(λ−μ))) dF(λ).

Note that in the usual strong law of large numbers the x_n are independent. Then Σ_{k=1}^{∞} k^{−2} E(x_k²) < ∞ is a sufficient condition.
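The equality of the first two expressions in (8.10) is purely combinatorial (each lag j − k = m occurs n + 1 − |m| times) and can be confirmed numerically. The covariance R(j) = ρ^{|j|} below is an illustrative choice, not from the text.

```python
import numpy as np

mu, n = 0.15, 40
rho = 0.8                                 # illustrative covariance R(j) = rho^|j|

# double sum  (n+1)^{-2} sum_{j,k=0}^{n} R(j-k) e^{-2 pi i (j-k) mu}
idx = np.arange(n + 1)
diff = idx[:, None] - idx[None, :]
double_sum = np.sum(rho ** np.abs(diff) * np.exp(-2j * np.pi * diff * mu)) / (n + 1) ** 2

# triangularly weighted single sum  (n+1)^{-1} sum_{m=-n}^{n} (1 - |m|/(n+1)) R(m) e^{-2 pi i m mu}
ms = np.arange(-n, n + 1)
single_sum = np.sum((1 - np.abs(ms) / (n + 1)) * rho ** np.abs(ms)
                    * np.exp(-2j * np.pi * ms * mu)) / (n + 1)

assert abs(double_sum - single_sum) < 1e-12
```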
Note further that the condition of the theorem is satisfied for all μ

(8.11)    a) with a = 1 if Σ_{0}^{∞} |R(n)| < ∞;
          b) with the given a if |R(k)| < const · k^{−a} (a > 0);
          c) with a = 1 if the x_t are orthogonal.
Proof: Note that from (8.9), using that the left side tends to zero, it follows that the mean limit in the preceding corollary is 0.

1) The identity of the expressions in (8.10) is easily seen from Theorem 2, using

    Σ_{j,k=0}^{n} e^{2πi(j−k)(λ−μ)} = sin²(π(n+1)(λ−μ)) / sin²(π(λ−μ)).

2) Assume μ = 0. Choose an integer β so large that βa > 1. Then for each m = 1, 2, ..., by (8.9), putting n = m^β,

    E| 1/(m^β+1) Σ_{j=0}^{m^β} x_j |² ≤ K m^{−aβ}.

From

    P(|y| ≥ ε) = ∫_{|y|≥ε} dG(y) ≤ ε^{−2} ∫ |y|² dG(y)

(for any r.v. y with distribution G(y)), we have for any ε > 0

    Σ_{m=1}^{∞} P(E_m) ≤ ε^{−2} K Σ_{m=1}^{∞} m^{−aβ} < ∞.
Here E_m is the set in the joint probability space Ω of the x_t given by

    E_m = { ω : |1/(m^β+1) Σ_{j=0}^{m^β} x_j| ≥ ε },

and Λ is the set common to infinitely many of these E_m (Halmos):

    Λ = lim_{N→∞} ∪_{m=N}^{∞} E_m.

Then (this is part of the Borel-Cantelli lemma (Halmos, p. 201))

    P(Λ) ≤ P(∪_{j=N}^{∞} E_j) ≤ Σ_{j=N}^{∞} P(E_j) → 0    as N → ∞,

hence P(Λ) = 0. Thus the measure of the set on which the r.v.'s 1/(m^β+1) Σ_{j=0}^{m^β} x_j do not converge to zero is zero, i.e.,

    lim_{m→∞} 1/(m^β+1) Σ_{j=0}^{m^β} x_j = 0    with probability 1 (or a.e.).
Moreover, for a suitable constant K' and large enough m,

    E[ max_{m^β ≤ n < (m+1)^β} | 1/(n+1) Σ_{j=0}^{n} x_j − 1/(m^β+1) Σ_{j=0}^{m^β} x_j |² ] ≤ K' ((m+1)^β − m^β)² / m^{2β} · R(0) ≤ const · m^{−2}.

It follows, as in the argument just used, that for the maximal as well as for any n with m^β ≤ n < (m+1)^β

    lim_{m→∞} ( 1/(n+1) Σ_{j=0}^{n} x_j − 1/(m^β+1) Σ_{j=0}^{m^β} x_j ) = 0    a.e.

(because this limit holds for any sequence n = n_i → ∞), thus finally (8.8).

3) For μ ≠ 0 we apply the above proof, as in the foregoing corollary, to the r.v.'s x_t e^{−2πitμ}. (End of proof.)
As a r.v. or a statistic (estimate) is certainly consistent if it converges a.e. to a constant, this theorem gives in (8.10) or (8.11) sufficient conditions for the consistency of the statistic (1/n) Σ_{k=1}^{n} x_k e^{−2πikμ}. Taking the expectation in (8.8), we have, with μ_t = E(x_t),

    (1/n) Σ_{k=1}^{n} μ_k e^{−2πikμ} → 0

if a condition like (8.10) or (8.11) is satisfied. If these hold for μ = 0 one gets (1/n) Σ μ_k → 0; if all μ_k = μ_0, this means that μ_0 = 0. Applying the weak law of large numbers, we see that x- = n^{−1} Σ_{t=1}^{n} x_t converges in the mean to zero (or: is consistent with limit 0) if and only if n^{−1} Σ_{t=1}^{n} R(t) → 0. A similar condition can be obtained for the mean convergence of (8.8).
Note that the convergence of x- can be used for testing whether there is a jump in the spectrum at λ = 0; (8.8) tests for a jump at λ = μ. Obviously, a jump of size p > 0 in F(λ) at λ = 0 means that R(n) contains the additive constant p; in this case (8.3) cannot converge.

The following theorem gives conditions for the consistency of the estimate
(8.15)    R_n(v) = 1/(n+1) Σ_{j=0}^{n} x_{v+j} x*_j,

the sample covariance for a stationary sequence (x_n).

Theorem: Suppose that for any fixed v and all n' = 1, 2, ...

(8.16)    E(x_{n+n'+v} x*_{n+n'} x*_{n+v} x_n)

is finite and independent of n ("stationarity of fourth order"). Then

(8.17)    l.i.m._{n→∞} 1/(n+1) Σ_{j=0}^{n} x_{v+j} x*_j

exists. This limit in the mean is R(v) if and only if

(8.18)    lim_{n→∞} 1/(n+1) Σ_{j=−n}^{n} (1 − |j|/(n+1)) [ E(x_{j+v} x*_j x*_v x_0) − |R(v)|² ] = 0.

If further for some positive constants a and K

(8.19)    E| 1/(n+1) Σ_{j=0}^{n} (x_{v+j} x*_j − R(v)) |² ≤ K n^{−a},    n = 1, 2, ...,

then

(8.20)    lim_{n→∞} 1/(n+1) Σ_{j=0}^{n} x_{v+j} x*_j = R(v)    with probability 1.
Remarks: Clearly (8.15) is an unbiased estimate of R(v). The conditions (8.16)-(8.19) on the fourth moments of the sequence have the effect of decreasing the influence of distant observations.
Proof: The earlier theorems are applied to the stationary process

    (z_n^{(v)}) = (x_{v+n} x*_n − R(v)),

which has zero expectations. From the above weak law follows the existence of (8.17); further, its mean limit is R(v) if and only if (8.18) holds (using the right equality in (8.3)). (End of proof.)
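The estimate (8.15) is easy to sketch directly; the moving-average model used below to exercise it is an illustrative assumption (orthonormal Gaussian errors, so R(0) = 1.25, R(1) = 0.5, R(2) = 0).

```python
import numpy as np

def sample_covariance(x, v):
    """(8.15): R_n(v) = (n+1)^{-1} sum_{j=0}^{n} x_{v+j} conj(x_j), n = len(x)-1-v."""
    n = len(x) - 1 - v
    return np.sum(x[v:v + n + 1] * np.conj(x[:n + 1])) / (n + 1)

# simulated moving average x_t = eps_t + 0.5 eps_{t-1} with orthonormal errors:
# R(0) = 1 + 0.25 = 1.25, R(1) = 0.5, R(v) = 0 for v >= 2
rng = np.random.default_rng(5)
eps = rng.normal(size=200001)
x = eps[1:] + 0.5 * eps[:-1]

assert abs(sample_covariance(x, 0) - 1.25) < 0.02
assert abs(sample_covariance(x, 1) - 0.5) < 0.02
assert abs(sample_covariance(x, 2) - 0.0) < 0.02
```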
Clearly, if (x_n) is strictly stationary with finite variance, all conditions of the theorem are satisfied (e.g., if (x_n) is real Gaussian; in this case (Doob)

    lim_{n→∞} 1/(n+1) Σ_{j=0}^{n} |R(j)|² = 0

is necessary and sufficient for (8.20)).
We further have this criterion for jumps of F(λ):

Theorem: For any covariance function R(v) with spectral distribution F(λ),

    lim_{n→∞} 1/(n+1) Σ_{j=0}^{n} |R(j)|² = Σ_λ (F(λ+) − F(λ−))².

The summation is taken over the enumerable set of jump points of F(λ). This sum is zero if and only if there are no jumps.
Proof: Using R(n) = ∫_{−1/2}^{1/2} e^{2πiλn} dF(λ), we have

(8.21)    1/(n+1) Σ_{j=0}^{n} |R(j)|² = ∫_{−1/2}^{1/2} ∫_{−1/2}^{1/2} (1 − e^{2πi(λ−μ)(n+1)}) / ((n+1)(1 − e^{2πi(λ−μ)})) dF(λ) dF(μ)

          → ∫_{−1/2}^{1/2} (F(λ+) − F(λ−)) dF(λ) = Σ_λ (F(λ+) − F(λ−))².

Here the integrand in (8.21) converges boundedly to 1 for λ = μ and to 0 elsewhere. (End of proof.)
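The criterion can be illustrated numerically. The covariance below, R(j) = cos 2πμj + δ_{j0}, is an illustrative choice: it is that of a random-phase harmonic of total power 1 (two spectral jumps of height 1/2 at ±μ) plus orthogonal noise of variance 1 (continuous spectrum, contributing nothing to the limit). The Cesàro average of |R(j)|² should therefore tend to 2 · (1/2)² = 1/2.

```python
import numpy as np

mu = 0.2
n = 200000
j = np.arange(n + 1)
Rj = np.cos(2 * np.pi * mu * j)    # covariance of a random-phase harmonic, power 1
Rj[0] += 1.0                       # orthogonal noise adds R(0) = 1 at lag 0 only

cesaro = np.mean(np.abs(Rj) ** 2)
# limit = sum of squared jump heights of F: (1/2)^2 + (1/2)^2 = 0.5
assert abs(cesaro - 0.5) < 0.01
```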
The estimation of the spectral distribution function F(λ) can be done by way of the covariance function R(n) if knowledge about it is available (Theorem 2). But there is also a direct way to do this, for the integral over the periodogram:

Theorem: Let (x_n) be stationary, and for any integer m let

(8.21)    lim_{n→∞} 1/(n+1) Σ_{j=0}^{n} x_{j+m} x*_j = R(m)

(a) in the mean, or (b) in probability, or (c) with probability 1. If μ_1, μ_2 are continuity points of F(λ), then in case (c) we have

(8.22)    lim_{n→∞} ∫_{μ_1}^{μ_2} 1/(n+1) |Σ_{j=0}^{n} x_j e^{−2πijλ}|² dλ = F(μ_2) − F(μ_1)

with probability 1. The limit is uniform in any closed interval of continuity of F(λ). In cases (a) and (b), (8.22) holds in probability.
Proof: 1) Assume (8.21) holds for all m with probability one. Let E_m be the set in the joint probability space Ω of the x_n on which the sum in (8.21) converges pointwise; thus μ(E_m) = 1. The complement of E_m has measure zero, and so has the (countable) union U = ∪_{m=−∞}^{∞} E_m^c. Let ω ∉ U, and keep this ω as argument of all r.v.'s x_j(ω) fixed; or, which is the same, assume we are given a sequence of complex numbers x_j with the property that

    lim_{n→∞} 1/(n+1) Σ_{j=1}^{n} x_{j+m} x*_j = R(m).
2) The functions of λ

    φ_n(λ) = 1/(n+1) ∫_{−1/2}^{λ} |Σ_{j=0}^{n} x_j e^{−2πijμ}|² dμ

are monotonic non-decreasing, equal zero at λ ≤ −1/2, and are uniformly bounded by the constant max_n (n+1)^{−1} Σ_{j=0}^{n} |x_j|² (because lim (n+1)^{−1} Σ |x_j|² = R(0) < ∞). Hence by the first theorem of Helly there is a subsequence φ_{n_k}(λ), k = 1, 2, ..., which converges to a non-decreasing function φ(λ), say, at each of its continuity points. Further, the m-th Fourier-Stieltjes coefficients of the φ_{n_k} converge by (8.21) to R(m), which is (by the second theorem of Helly) then also the m-th Fourier-Stieltjes coefficient of φ(λ). Hence the Fourier-Stieltjes coefficients of φ(λ) and F(λ) are the same (and so for all other possible limit functions φ(λ)). Hence all these limit functions coincide with F(λ) a.e. (Theorem 2). The exceptional set E contains only jump points, which must be those of F(λ).
Thus (8.22) is proved at continuity points λ_1, λ_2, for almost all sample sequences, as n → ∞ through the subsequence. Now the (possibly over-countable) union of all approximating subsequences of {φ_n(λ)} has the same limit with the same exceptional set E, and it is equal to the total sequence (because if there were an infinite subsequence left containing no subsequence tending to F(λ) a.e., we could apply 2) from the beginning to show that it still has a subsequence with F(λ) as limit a.e.). Hence (8.22) holds indeed for all n. The uniformity of convergence in a closed continuity interval of λ at a fixed ω follows from the lemma 6) below.
3) If (8.21) holds in the mean, then it holds also in probability, because for a mean of any order p ≥ 1 and any ε > 0

    P( |s_n(v) − R(v)| > ε ) = ∫_{|ξ|>ε} dμ ≤ ε^{−p} ∫ |ξ|^p dμ → 0,

where we have put μ for the probability distribution of s_n(v) − R(v), s_n(v) = 1/(n+1) Σ_k x_{k+v} x*_k.
4) Assume now that (a) is true. (The following proof is analogous to one contained in a paper by Hoeffding, p. 190.) The following lemma is needed:
Let F(λ) be a distribution which is uniquely determined almost everywhere in T = [−1/2, 1/2] by its Fourier-Stieltjes coefficients R(0), R(m), m = ±1, ±2, ... Let λ' be a continuity point of F(λ). Then there exist for every ε > 0 a positive number k = k(λ', ε) and a δ = δ(λ', ε) > 0 such that

    |G(λ') − F(λ')| < ε

holds for every distribution function G(λ) whose Fourier-Stieltjes coefficients ρ(m) satisfy |ρ(m) − R(m)| < δ for m = ±1, ..., ±k.
Proof of the lemma: Assume the assertion is false; i.e., in whatsoever way k, δ are chosen, there exists always at least one G(λ) with coefficients ρ(m) satisfying the inequalities |ρ(m) − R(m)| < δ, m = ±1, ..., ±k, for which |G(λ') − F(λ')| ≥ ε. In particular, for δ = k^{−1}, k = 1, 2, ..., we have G_k(λ) with |ρ_k(m) − R(m)| < k^{−1} and |G_k(λ') − F(λ')| ≥ ε. For k → ∞ the Fourier-Stieltjes coefficients converge, and so, by the two Helly theorems, G_k(λ) → F(λ) at every continuity point of F(λ), which is a contradiction.
5) Proof of the theorem for assumption (a): Let λ' be a continuity point of F(λ). Given ε > 0, let k(λ', ε) and δ(λ', ε) be defined by the above lemma. Given further any η > 0, choose N so that, with

    ρ_n(m) = 1/(n+1) Σ_{j=0}^{n+1−|m|} x_j x*_{j+|m|},

    P( |ρ_n(m) − R(m)| < δ  for  m = ±1, ..., ±k ) > 1 − η

for n > N. This is possible by assumption (a). ρ_n(m) is the m-th Fourier-Stieltjes coefficient of φ_n(λ), which is thus, with probability > 1 − η, in the class of the G(λ) of the lemma; hence P(|φ_n(λ') − F(λ')| < ε) > 1 − η.
6) Lemma: Let f_n(λ) be monotone non-decreasing in a closed interval [a, b]; let f(λ) be continuous there and f_n(λ) → f(λ) for all λ ∈ [a, b]. Then the convergence is uniform in [a, b].

Proof: f(λ) is uniformly continuous in [a, b]. Given any ε > 0, one can subdivide [a, b] by a finite number of points a = λ_1 < λ_2 < ... < λ_m = b with λ_{i+1} − λ_i < δ, δ so chosen that |f(λ') − f(λ'')| < ε/2 if |λ' − λ''| < δ. Further, Σ_j |f(λ_j) − f_n(λ_j)| < ε/2 for all n large enough. Then for any λ with λ_i ≤ λ ≤ λ_{i+1},

    f_n(λ) − f(λ) ≤ f_n(λ_{i+1}) − f(λ_{i+1}) + f(λ_{i+1}) − f(λ) < ε,

and for f_n(λ) − f(λ) ≥ −ε one concludes similarly. All the φ_n(λ) defined above are monotone non-decreasing, so the lemma applies.
Note that from (8.22)
it cannot be concluded (by differentiation)
that the periodogmm
1
= -n
x.
J
can be uocd as a consistent est:blJate for the spectral density.
vn11 be
0,
for any
€/2
(r.. )-f(~) -< f n (A..1.+.
1) - f'(A...-1.+ 1) + f(~.1.+1) - 1'(~) <
f (~) < f(~)
n
-
m-th
1re referred to the
fn(~)
f(~)
is the
defined by the lenuna and hence
Then the convergence is uniform in
Proof:
Pn(m)
ex~nplified
in ·the next section.
This
58
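A small numerical illustration of the lemma above (a sketch added here, not part of the original notes): the step functions f_n(λ) = ⌊n f(λ)⌋/n are monotone non-decreasing whenever f is, they converge pointwise to f, and the lemma's conclusion shows up as a supremum distance of at most 1/n.

```python
import math

# Monotone non-decreasing approximants f_n of a continuous monotone f on a
# closed interval: pointwise convergence is then automatically uniform.
def f(lam):
    return (lam + 1.0) / 2.0              # continuous, non-decreasing on [-1, 1]

def f_n(lam, n):
    return math.floor(n * f(lam)) / n     # monotone step function, f_n -> f

grid = [-1.0 + 2.0 * k / 20000 for k in range(20001)]
for n in (10, 100, 1000):
    sup_dist = max(abs(f_n(lam, n) - f(lam)) for lam in grid)
    print(n, sup_dist)                    # never exceeds 1/n: uniform convergence
```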
9. Classes of Estimates for the Spectral Density and Criteria for Their Goodness

As mentioned above, the periodogram (changing now the λ-interval)

    (9.1)   f_n(λ) = (1/2πn) | Σ_{j=1}^{n} x_j e^{ijλ} |²

belonging to the sample x₁, ..., x_n is not a consistent estimator of the spectral density F'(λ) = f(λ). We show this for a sequence of orthogonal r.v.'s where the x_t are independent N(0, 2π). This special sequence is very common and fundamental (especially for λ = 0), so that we cannot want to exclude it from the class of stationary sequences for which we want consistent estimates. The sequence has the covariance function R(0) = 2π, R(n) = 0 for n ≠ 0, and F'(λ) = 2π; thus it satisfies any requirement of smoothness for f(λ), or of speed of decrease of |R(n)|. For λ = λ₀ the periodogram (9.1) is the square of a standard normal variable; it has this distribution for all n, and hence at this λ-value it can never approach a degenerate distribution (the unit step function). We remark that, as a matter of fact, completely real examples can also be given.

This example shows that it is not promising to work with the periodogram directly as an estimate for f(λ). Yet as the spectral density has become increasingly important in a number of applications, a great variety of other estimates has been proposed. Some of them will be listed here. The simple estimation of the integrated density (Section 8) suggests that integral estimators might have reasonable properties. (We follow closely Parzen's (1) paper.)
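The failure of the periodogram to concentrate can be seen in a small simulation (an illustrative sketch, not from the original notes; the N(0,1) normalization, the frequency, and the replication count are conveniences chosen here): for independent Gaussian observations the periodogram ordinate at a fixed frequency keeps a relative standard deviation near 1 no matter how large n is.

```python
import cmath
import math
import random

def periodogram(x, lam):
    """(1/(2*pi*n)) |sum_j x_j e^{i*j*lam}|^2 -- the estimate (9.1)."""
    n = len(x)
    s = sum(xj * cmath.exp(1j * j * lam) for j, xj in enumerate(x, start=1))
    return abs(s) ** 2 / (2.0 * math.pi * n)

random.seed(1)
lam = 1.0                                  # a fixed frequency away from 0 and pi
rel_sd = {}
for n in (64, 512):
    vals = []
    for _ in range(400):                   # independent replications
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        vals.append(periodogram(x, lam))
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    rel_sd[n] = sd / mean
    print(n, round(rel_sd[n], 2))          # stays near 1: no concentration
```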
Clearly now some further conditions need to be required from the sequences considered; we assume henceforth that F(λ) is absolutely continuous, i.e., f(λ) = F'(λ) exists everywhere. Hence by Theorem 4 only moving averages

    x_n = Σ_{j=-∞}^{∞} c_j ξ_{n-j} ,   Σ_j |c_j|² < ∞ ,   E(ξ_n ξ̄_m) = δ_{nm} ,

are allowed (these are often also called linear processes, and the sequence {ξ_n} white noise, or pure white noise if the ξ_n are in addition independent; the reason is that here f(λ) = 1: all frequencies have the same amplitude [Grenander and Rosenblatt (1)]). Further, as a minimum assumption there occurs in most papers the existence of the fourth moments E(|x_t|⁴) < ∞, often in the form (8.16), which was needed for the covariance estimate. Note that by these assumptions the class of our weakly stationary sequences is narrowed in the direction of strictly stationary sequences (for which (8.16) obviously holds if existent).

Note further that under certain smoothness conditions for f(λ) we have the representation

    f(λ) = Σ_{m=-∞}^{∞} R(m) e^{-imλ} .

This is true, for example, for a continuous, piecewise smooth function f(λ), or for an integrable f(λ) with bounded variation [see, e.g., Akhieser (1), p. 927]; also a Lipschitz condition or less is sufficient [see, e.g., Tolstow (1), p. 70-75; p. 112]. More general conditions can be obtained using generalized derivatives and summability procedures [A. Zygmund (1), p. 52].
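For a concrete check of this representation (a sketch, not part of the original notes; the moving average x_n = ξ_n + θ ξ_{n-1} with unit-variance white noise is an example assumed here): the covariances are R(0) = 1 + θ², R(±1) = θ, R(m) = 0 otherwise, and the series Σ R(m) e^{-imλ} sums to |1 + θ e^{-iλ}|², a non-negative function as a spectral density must be.

```python
import cmath
import math

theta = 0.6                        # MA(1) coefficient: x_n = xi_n + theta * xi_{n-1}

def R(m):                          # covariances of the moving average
    if m == 0:
        return 1.0 + theta ** 2
    return theta if abs(m) == 1 else 0.0

def f_series(lam):                 # sum_m R(m) e^{-i m lam}, real by symmetry
    return sum(R(m) * cmath.exp(-1j * m * lam) for m in (-1, 0, 1)).real

for lam in (0.0, 1.0, 2.5, math.pi):
    closed_form = abs(1.0 + theta * cmath.exp(-1j * lam)) ** 2
    print(round(f_series(lam), 6), round(closed_form, 6))
    assert f_series(lam) >= 0.0    # a spectral density is non-negative
```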
The vast majority of all the estimators in use is quadratic in the sample observations x₁, ..., x_N; hence the most general form is

    (9.2)   (1/2πN) Σ_{ν,μ=1}^{N} b_{νμ} x_ν x̄_μ ,   b_{νμ} = b̄_{μν} .

The latter assumption of symmetry of the matrix B is no loss of generality, also not that the b_{νμ} are real for real x_ν. A discussion is found e.g. in Grenander and Rosenblatt (1), p. 118.

The next somewhat less general estimating quadratic form is given by

    (9.3)   f*_N(λ) = (1/2π) Σ_{|ν| ≤ N-1} k_N(ν) R_N(ν) e^{-iνλ} ,

where the sample covariance function

    R_N(ν) = (1/N) Σ_{t=1}^{N-|ν|} x_t x_{t+|ν|} ,   ν = 0, ±1, ±2, ... ,

has already been used in (8.15) as an estimator for R(ν) (except for a factor). The constants k_N(ν) are to be chosen so as to satisfy some criterion of optimality. With the notation λ_m(N) = 2πm/(2N+1), m = 0, ±1, ±2, ..., (9.3) can also be written as a discrete average over values of the periodogram (9.1)
with T ≤ N a suitable integer valued function of N. The estimate (9.3) (in any of its possible forms) is often called a spectrograph estimate. It is insofar a specialization of (9.2) as the matrix B has in its main diagonal, respectively in any side diagonal, identical elements (analogous to matrices of Toeplitz forms).

Again another estimate is obtained by generalizing in (9.3) the bounds of summation to a function, say g(N, x₁, ..., x_N), of the sample size N and the sample x₁, ..., x_N, which is integer valued and ≤ N. (See, e.g., Parthasarathy (1).) Parthasarathy's statistic equals essentially

    (9.6)   (1/2π) Σ_{|ν| ≤ g(N,x)} R_N(ν) e^{-iνλ} ,

where now, however, g depends also on the true R(ν)'s, and k_N(ν) is some specifically chosen function. Remarkable at this statistic is the fact that it no longer needs to be quadratic in the observations (because of g(N,x)). One might choose a sequential method to determine a function g(N,x) which no longer contains the constants k_N(ν).

Parzen [(1), p. 301] concentrates on the following specialization. Let h(u) be a bounded, even, square integrable function, -∞ < u < ∞, with some smoothness properties. Possible choices are (others see Parzen (2)):

    h(u) = 1/(1 + |u|)   (Bartlett's estimate),

or

    h(u) = 1 - |u|   for |u| ≤ 1 ,   h(u) = 0   otherwise,

or

    h(u) = sin u / u   (Daniell's estimate),

or

    (9.7)   h(u) = 1   for |u| < 1 ,   h(u) = 0   for |u| > 1   (truncated weight function).
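A minimal sketch of the estimate (9.3) with the truncated weight (9.7) (a hypothetical illustration using white noise, not part of the original notes; the sample size, the bandwidth constant B_N, and the frequency are arbitrary choices here): the weight k_N(ν) = h(B_N ν) keeps only the lags |ν| < 1/B_N, and for independent noise of unit variance the estimate settles near the constant density R(0)/2π.

```python
import math
import random

def sample_cov(x, nu):
    """R_N(nu) = (1/N) sum_{t=1}^{N-|nu|} x_t x_{t+|nu|} -- cf. (9.3)."""
    n, k = len(x), abs(nu)
    return sum(x[t] * x[t + k] for t in range(n - k)) / n

def f_star(x, lam, b_n):
    """(1/2pi) sum_{|nu| <= N-1} h(b_n*nu) R_N(nu) cos(nu*lam) with the
    truncated weight (9.7): h(u) = 1 for |u| < 1, 0 otherwise."""
    n = len(x)
    total = sample_cov(x, 0)
    for nu in range(1, n):
        if b_n * nu < 1.0:                 # h(b_n * nu) = 1 inside the window
            total += 2.0 * sample_cov(x, nu) * math.cos(nu * lam)
    return total / (2.0 * math.pi)

random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(4000)]
b_n = 1.0 / 25.0                           # keeps lags |nu| <= 24
est = f_star(x, 1.0, b_n)
print(round(est, 3))                       # typically close to 1/(2*pi) ~ 0.159
```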
Now take k_N(ν) = h(B_N ν), where the B_N are positive constants tending to 0 as N → ∞. The estimates f*_N(λ) in (9.3) formed with this k_N(ν) are called of exponential type when R(m) decreases exponentially; Lomnicki and Zaremba [(1), pp. 13-31] considered the further specialization B_N = a/N, a = log p, with a > 0, p > 1. Estimates of algebraic type are formed with k_N(ν) = h(B_N ν) as above when |R(m)| decreases algebraically [Parzen (1), pp. 308 and 317].

Further estimates are obtained by truncating the domain of summation; here h(u) from (9.7) is used. Further choices for h(u) are that by Hanning,

    h(u) = (1/2)(1 + cos πu)   for |u| ≤ 1 ,   h(u) = 0   for |u| > 1 ,

and that by Hamming,

    h(u) = 0.54 + 0.46 cos πu   for |u| ≤ 1 ,   h(u) = 0   for |u| > 1 .
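The weight functions just listed can be written down directly (a sketch; the function names are labels chosen here, and the formulas are the ones given above):

```python
import math

def bartlett(u):      # 1/(1 + |u|)
    return 1.0 / (1.0 + abs(u))

def triangle(u):      # 1 - |u| for |u| <= 1, else 0
    return max(0.0, 1.0 - abs(u))

def daniell(u):       # sin u / u
    return 1.0 if u == 0 else math.sin(u) / u

def truncated(u):     # the weight (9.7)
    return 1.0 if abs(u) < 1.0 else 0.0

def hanning(u):       # (1/2)(1 + cos(pi u)) on |u| <= 1
    return 0.5 * (1.0 + math.cos(math.pi * u)) if abs(u) <= 1.0 else 0.0

def hamming(u):       # 0.54 + 0.46 cos(pi u) on |u| <= 1
    return 0.54 + 0.46 * math.cos(math.pi * u) if abs(u) <= 1.0 else 0.0

# all are even, equal 1 at u = 0, and vanish or decay for large |u|
for h in (bartlett, triangle, daniell, truncated, hanning, hamming):
    print(h.__name__, h(0.0), round(h(0.5), 3), h(2.0))
```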
The following expressions (or figures of merit) are used to check the goodness of the estimators (besides the criteria of biasedness and ordinary consistency) [Parzen (1), p. 304]:

variance:
    σ²(f*_N(λ)) = E( f*_N(λ) - E f*_N(λ) )² ;

bias:
    b(f*_N(λ)) = E f*_N(λ) - f(λ) ;

mean (or expected) square error:
    μ²(f*_N(λ)) = E( f*_N(λ) - f(λ) )² = σ²(f*_N(λ)) + b²(f*_N(λ)) ;

integrated square error (or: average square error):
    J(f*_N(λ)) = ∫_{-π}^{π} ( f*_N(λ) - f(λ) )² dλ ;

mean integrated square error (or: expected average square error):
    E( J(f*_N(λ)) ) .
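The decomposition μ² = σ² + b² behind this list can be checked mechanically (an illustrative sketch with a deliberately biased estimator of a known value; none of the numbers are from the original notes):

```python
import random

random.seed(3)
true_value = 2.0
# a shrunken (hence biased) estimator of true_value, sampled many times
estimates = [0.9 * (true_value + random.gauss(0.0, 0.5)) for _ in range(20000)]

n = len(estimates)
mean_est = sum(estimates) / n
variance = sum((e - mean_est) ** 2 for e in estimates) / n
bias = mean_est - true_value
mse = sum((e - true_value) ** 2 for e in estimates) / n

# E(est - f)^2 = variance + bias^2 holds exactly for these
# empirical (population-style) moments
print(round(mse, 6), round(variance + bias ** 2, 6))
```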
Criteria of the goodness of estimators are the following:

asymptotic efficiency for a certain sequence of estimators, measured by

    lim_{N→∞} E(J_N) / E( J(f*_N(λ)) ) ,

where, for any fixed N, E(J_N) is the minimum mean integrated square error obtained by minimizing E(J(f*_N(λ))) with respect to f*_N(λ) taken out of the considered class of estimators.

Let d(N) be a function with lim_{N→∞} d(N) = ∞. Then the further criteria are used:

consistent of order d(N) at λ, if d(N) μ²(N,λ) converges;

boundedly consistent of order d(N) at λ, if d(N) μ²(N,λ) stays bounded;

uniformly consistent of order d(N), if d(N) μ²(N,λ) converges uniformly in λ;

uniform-boundedly consistent of order d(N), defined by a combination of the two latter;

functionally-uniformly consistent of order d(N), if d(N) E( sup_λ |f*_N(λ) - f(λ)|² ) < ∞ for all N;

integratedly consistent of order d(N), if d(N) E( J(f*_N(λ)) ) → const > 0;

(strongly) consistent in the mean, if

    lim_{N→∞} E ∫_{-π}^{π} ( f*_N(λ) - f(λ) )² dλ = 0 ;

uniformly strongly consistent, if

    lim_{N→∞} E( max_{|λ| ≤ π} ( f*_N(λ) - f(λ) )² ) = 0 .

The two latter criteria are applied in the paper by Lomnicki and Zaremba (1).
The properties desired from an estimate f*_N(λ) depend largely on the purpose for which it is used. As these differ widely, one looks from case to case for different properties, which results in a very inhomogeneous appearance of the available knowledge of spectral density estimation. A viewpoint which is often important is the computational easiness of the method selected.
A remark about different concepts of consistency is here necessary. The usual probabilistic concept means convergence in probability of a series of random variables (estimators) x_n to a constant x; in formulae: for any ε > 0, lim_{n→∞} P(|x_n - x| > ε) = 0. (Note that convergence in the mean includes consistency but not vice versa; thus the variances of a consistent sequence of estimators may not even exist.) E.g., the consistency of the estimator for the spectral distribution is taken (Section 8) in this strict sense.

All of the above narrower consistency concepts are defined in terms of the quadratic mean (or mean square error). However, a plausible reason for the restriction to square mean convergence lies in the nature of the conditions (8.16) and (8.18) ensuring the convergence of the sample covariances. Though these conditions are by no means necessary, they are practically as well as theoretically rather simple and presumably rather close to the necessary ones. On the other hand, the convergence of the sample covariances appears intuitively to be something like a minimum assumption for the process for which any spectral estimation is done, because one would not be interested in practice in the case where a spectral estimator converges yet not the sample covariances. Hence one assumes (8.18), which, however, is nothing but the convergence of the variance of the sample covariance to zero. Because of the close relationship between spectral estimators and covariance estimators, the above definition of consistency for spectral estimators thus appears to be quite appropriate.
Besides the existence of a spectral density, we hence assume that the fourth moments E(x_{n+u} x_{n+v} x_{n+w} x_n) for any integers u, v, w are finite and independent of n (fourth order stationarity). However, in the particular cases considered in Section 10, much stronger assumptions will be made.
10. Some Properties of Spectral Density Estimators

The following theorem is stated here without proof [Lomnicki and Zaremba (1)]. Let {x_t} be a stationary process

    x_t = Σ_j h_j ε_{t-j} ,   Σ_j |h_j| < ∞ ,

with

    E ε_t = 0 ,   E ε_t² = σ² > 0 ,   E ε_t³ = μ₃ ,   E ε_t⁴ = 3σ⁴ + κ₄ < ∞   for each t;

    E(ε_s ε_t ε_u) = 0 unless s = t = u;   E(ε_s² ε_t²) = σ⁴ for s ≠ t;   E(ε_s ε_t ε_u ε_v) = 0 unless two pairs coincide.

These conditions are, for instance, satisfied by an autoregressive sequence with suitable errors (e.g., independent ones).
The estimator

    (10.1)   f*_N(λ) = Σ_{|k| ≤ N-1} A_k R_N(k) cos 2πkλ ,   A_k suitable constants,

is now consistent in the mean if and only if

    (i)   lim_{N→∞} A_k = 1   for any fixed k ,

    (ii)   lim_{N→∞} Σ_{|k| ≤ N-1} A_k² / (N - |k|) = 0 ,

    (iii)   Σ_k R²(k) < ∞ .

Clearly, by definition, uniform strong consistency entails consistency in the mean. If, moreover, Σ_k |R(k)| < ∞, then (i) and (ii) are also necessary and sufficient for uniform consistency.
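Condition (ii) cleanly separates the raw periodogram from its truncated version (a numerical sketch added here, not in the original; the truncation point γ(N) = ⌊√N⌋ is one admissible choice): for A_k = 1 - |k|/N the sum in (ii) equals 1 for every N, while for the truncated weights it tends to 0.

```python
def cond_ii_sum(weights, n):
    """sum over |k| <= n-1 of A_k^2 / (n - |k|) -- condition (ii)."""
    total = weights(0, n) ** 2 / n
    for k in range(1, n):
        total += 2.0 * weights(k, n) ** 2 / (n - k)
    return total

def periodogram_weight(k, n):      # A_k = 1 - |k|/n (the periodogram)
    return 1.0 - k / n

def truncated_weight(k, n):        # A_k = 1 for |k| <= gamma(n) = floor(sqrt(n))
    return 1.0 if k <= int(n ** 0.5) else 0.0

for n in (100, 10000):
    print(n, cond_ii_sum(periodogram_weight, n), cond_ii_sum(truncated_weight, n))
# the periodogram sum stays at 1; the truncated sum goes to 0 as n grows
```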
Conclusions drawn from this theorem:

1) For a sequence {x_t} satisfying the conditions stated in the beginning, the periodogram is not a consistent estimator for f(λ). This would namely correspond to

    A_k = 1 - |k|/N   for   |k| ≤ N-1 ,

for which the sum in (ii) equals 1 for every N, so that (ii) is violated.

2) The truncated periodogram is obtained in putting A_k = 1 for |k| ≤ γ(N) and A_k = 0 for |k| > γ(N), where γ(N) is a positive, integer valued function < N. This estimator is consistent according to the above theorem if and only if

    lim_{N→∞} γ(N) = ∞   and   lim_{N→∞} γ(N)/N = 0 .

The same is true if one chooses

    A_k = 1 - |k|/N   for   |k| < γ(N) ,   A_k = 0   for   |k| > γ(N) .

Both matrices formed by these A's are asymptotically equivalent.

3) Bartlett's smoothed periodogram is obtained for

    A_k = 1 - |k|/ν(N)   for   |k| < ν(N) ,   A_k = 0   for   |k| ≥ ν(N) ,

with ν(N) chosen as γ(N) before. Again one has a consistent estimate. So far there is no satisfactory method available as to how to choose a "best" estimate or a "best" function ν(N). Much experimenting has been done, yet the class of admissible sequences seems too inhomogeneous to consider this a promising way.

4) Many estimates which can be written like (10.1) are obtained by integrating over -π ≤ λ ≤ π a quadratic form of the observations x₁, ..., x_N whose coefficients depend on λ.
We now summarize some of Parzen's results [Parzen (2), p. 302]. The assumptions about the sequence are as follows:

    E x_t = 0 for all t;   E(|x_t|⁴) ≤ const < ∞ ;

    E(x_t x_{t+v₁} x_{t+v₂} x_{t+v₃}) = ρ(v₁, v₂, v₃) is independent of t;

    Σ_m |R(m)| < ∞   (hence F'(λ) = f(λ) exists).

The highest order of consistency of an estimator of f (Section 9) will depend on the "smoothness" of F'(λ) = f(λ), expressed in terms of the decrease of |R(m)| for m → ∞.
Exponential decrease of coefficient ρ: let

    |R(m)| ≤ const · e^{-ρ|m|}   for all m,

yet for every ε > 0 let e^{(ρ+ε)|m|} |R(m)| → ∞ for a subsequence m → ∞ (so that R(m) decreases exponentially with coefficient ρ, but not with any larger coefficient).

Theorem: Let R(m) decrease exponentially in this sense, and let h(u) be such that |h(u)| ≤ const for all u, |1 - h(u)| ≤ const·|u| for |u| ≤ 1/2, and |u h(u)| ≤ const. Take B_N = A N^{-b} with A > 0, b > 0, and let α > 0 be chosen so that α < 2bρ. Then the mean integrated square error satisfies

    lim_{N→∞} (N / log N) E{ J(f*_N(λ)) } = 2b S / (2πα) ,

where

    S = Σ_m R²(m) = 2π ∫_{-π}^{π} f²(λ) dλ .
The covariances for any two frequencies λ₁, λ₂ satisfy

    lim_{N→∞} (N / log N) cov( f*_N(λ₁), f*_N(λ₂) ) = (2b/α) f²(λ₁) { 1 + δ(0,λ₁) } δ(λ₁,λ₂) ,

where δ is the Kronecker function, and the convergence is uniform in λ. The supremum of the bias satisfies

    lim_{N→∞} sup_λ (N / log N)^{1/2} | b(f*_N(λ)) | = 0 ,

and the mean square error consequently

    lim_{N→∞} (N / log N) μ²(f*_N(λ)) = (2b/α) f²(λ) { 1 + δ(0,λ) } .

Thus f*_N(λ) is uniformly-boundedly consistent of order N/log N. Finally, f*_N(λ) is functionally uniformly consistent of order N (log N)⁻²:

    lim sup_{N→∞} N (log N)⁻² E( sup_λ |f*_N(λ) - f(λ)|² ) < ∞ .
For algebraically decreasing covariance R(m) holds the following

Theorem: For r > 1/2, let

    m^r R(m) → R_r = const > 0   as m → ∞

(algebraic decrease of degree r). For p with 1/2 < p ≤ r, let b = 1/(2p) and B_N = B N^{-b}, B > 0. For positive constants H₀, H, H_q and q > p - 1/2, let h(u) satisfy the inequalities

    |h(u)| ≤ H₀ for all u;   |u^{(1/2)+ε} h(u)| ≤ H for |u| ≥ 1;   |1 - h(u)| ≤ H_q |u|^q for |u| ≤ 1.

Then the mean integrated square error satisfies

    lim_{N→∞} N^{1 - 1/(2p)} E{ J(f*_N(λ)) } = S S(h) / (2πB) + B^{2p-1} S_p(h) ,

where

    S = Σ_m R²(m) = 2π ∫_{-π}^{π} f²(λ) dλ ,   S(h) = ∫_{-∞}^{∞} h²(u) du ,

    S_p(h) = ∫_{-∞}^{∞} [ (1 - h(u)) / |u|^p ]² du .

If r > 1, 1 < p ≤ r, b = (2p-1)⁻¹ (respectively b = 1/(2p)), q > p - 1, then the bias and the mean square error of f*_N(λ) satisfy, for some C = const < ∞,

    lim sup_{N→∞} N^{1 - 1/p} sup_λ μ²(f*_N(λ)) < C .

If q > p - 1, then moreover

    lim sup_{N→∞} N^{1 - 1/p} E( sup_λ |f*_N(λ) - f(λ)|² ) < ∞ ,

which means functionally uniform consistency of order N^{1-1/p}.

Note that a sequence of orthogonal variables falls under neither of the two considered cases of decrease. There are indications that a consistent estimator for the density f(λ) cannot be constructed merely on the basis of the assumption of consistency of the sample covariance functions R_N(m), though a proof or a counterexample for this is not at hand.
From the abundance of literature on the subject of this paragraph we list, besides that mentioned earlier, a few more recent papers: Kawata (1) (continuous processes); [14] Symposium on the spectral approach to time series, with discussion, contributions by Jenkins, Priestley, Lomnicki and Zaremba, Whittle: Technometrics, May 1961 (devoted to density estimation).
We conclude with the following observation: from the 5th theorem in Section 8 (consistency of the sample covariances R_N(m)) follow easily consistent, yet biased, estimators of f(λ). Assumed is only the continuity of f(λ) in an open λ-interval T₀ containing λ₀, and that the R_N(m) are consistent. Then, with [λ₁, λ₂] ⊂ T₀, λ₀ = (λ₁+λ₂)/2 and λ₂ - λ₁ = Δ, using the mean value theorem for integrals,

    (1/Δ) ∫_{λ₁}^{λ₂} f_N(λ) dλ = Σ_{|m| ≤ N-1} R_N(m) e^{-2πimλ₀} (sin πmΔ)/(πmΔ) → f(λ₀')

for some λ₀' ∈ [λ₁, λ₂]. The asymptotic bias of this estimator is f(λ₀) - f(λ₀'), which can be made arbitrarily small by choosing Δ small (however, the limit Δ → 0 would yield the periodogram estimator for f(λ), which needs not to converge; Δ is often called the bandwidth). Clearly, the smoother f(λ) is in the neighborhood of λ₀ (e.g., a Lipschitz condition or differentiability might be fulfilled), the more can be said about the bias as a function of Δ. Clearly, the (above) estimator of f(λ₀') (where λ₀' is unknown) can converge to f(λ₀) only in special cases. Assuming

    f(λ₀) = Σ_{m=-∞}^{∞} R(m) e^{-2πimλ₀} ,

one sees that the terms of this series are weighted by (sin πmΔ)/(πmΔ), which approaches 1 for any m only if Δ → 0. If one makes assumptions about the decrease of |R(m)|, one is again in the position to estimate bounds for the bias. Clearly many other choices of intervals Δ (or sequences of those) are possible.
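The weighting of the terms R(m) by the bandwidth average can be verified directly (a numerical sketch added here; the factor sin(πmΔ)/(πmΔ) is forced by integrating e^{-2πimλ} over an interval of length Δ, and the step count of the Riemann sum is an arbitrary choice):

```python
import math

def averaged_weight(m, delta, steps=20000):
    """(1/delta) * integral of cos(2 pi m lam) over [-delta/2, delta/2],
    computed by a midpoint Riemann sum; the imaginary part of the
    corresponding complex integral vanishes by symmetry."""
    h = delta / steps
    s = sum(math.cos(2.0 * math.pi * m * (-delta / 2.0 + (j + 0.5) * h))
            for j in range(steps))
    return s * h / delta

delta = 0.05
for m in (1, 5, 20):
    numeric = averaged_weight(m, delta)
    exact = math.sin(math.pi * m * delta) / (math.pi * m * delta)
    print(m, round(numeric, 6), round(exact, 6))   # the weight sin(pi m D)/(pi m D)
```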
CHAPTER III

RANDOM SEQUENCES CONTAINING A DETERMINISTIC COMPONENT

11.1 Preliminary Remarks on the Mean of a Stationary Sequence.*)

In linear regression models a wide variety of assumptions about the error random variables, denoted by x_t, is in use. Most frequently the means E x_t = ψ_t of the errors are assumed to be zero. Here we do not make this assumption. We require, however, that the errors form a (weakly) stationary random sequence. Up to section 11.3 we restrict ourselves to models with discrete parameter sets. However, in the continuous case treated in section 11.4, similar and even stronger results hold.
The observations (such as a time series) are taken from a stochastic sequence {z_t} with real or complex random variables and with t ∈ T, where T is the set of all integers. Denoting the means by E z_t, the covariance function (or covariance kernel) is

    c_{s,t} = E(z_s z̄_t) = cov(z_s, z_t) + E z_s · E z̄_t ,   s,t ∈ T

(a bar over a number denotes the conjugate complex number). We write r(s,t) = cov(z_s, z_t) for the true or proper covariance (Parzen).
The linear regression is expressed in the decomposition

    (11.1)   z_t = φ_t + x_t ,   t ∈ T ,

where the sequence of constants {φ_t} (often called the deterministic component [32]) lies in a certain regression space Φ. These φ_t may for instance represent a trend or a seasonal variation. The sequence {x_t} with mean values ψ_t (not necessarily zero) has the covariance function R(s-t).

*) Compare for this section a forthcoming paper by the author with the title: Die Mittelwertfunktionen schwach stationärer Zufallsprozesse und ihre Bedeutung in der Regressionsanalyse.
The question to be answered in section 11.2 is whether or under what circumstances the decomposition (11.1) is unique. In a statistical estimation problem with unknown {ψ_t}, if a unique decomposition is possible, one avoids having to test for a specific mean value sequence. In this connection the question may be raised what the most general conditions on the sequence {x_t} and on the regression space Φ might be such that the decomposition (11.1) is still useful. There are for instance good intuitive reasons for believing that many statistical procedures remain valid for the case that N⁻²( |φ₁|² + |φ₂|² + ... + |φ_N|² ) → ∞ with N, the sample size, tending to infinity, even if {x_t} is only assumed to be a second order stochastic sequence (see,
We remark that Balakrishnan [5] and, more generally, Ylvisaker [33] have considered a problem which in a sense is inverse to our problem. They ask what mean values ψ_t can be possessed by a stochastic process with given covariance kernel R(s,t); in other words, for which mean values ψ_t is R(s,t) - ψ_s ψ̄_t = r(s,t) (the true covariance) positive definite. Ylvisaker's theorem states for general sets T that this is the case if and only if ||ψ|| < 1, the norm ||ψ|| of the mean function being taken in the metric of the reproducing kernel Hilbert space of R(s,t). That is, ψ_t is a possible mean function if and only if a sequence (a_j) exists such that

    ψ_t = Σ_{j=1}^{∞} a_j R(t, t_j)

with ||ψ||² = Σ_{j,k} a_j ā_k R(t_j, t_k) < 1.
11.2. Classification of Stationary Covariance Functions

Given a stochastic sequence {z_t} with means {μ_t}, it is in general not possible to obtain by the transformation (11.1) a stationary sequence {x_t}. The following is a counter example in the real case. With φ̂_t = φ_t + ψ_t we have cov(x_s, x_t) = cov(z_s, z_t). Now let {z_t} be a Gaussian sequence. Then cov(z_s, z_t) can be written for s,t = 1, 2, ..., N (say) as any symmetric positive definite matrix with N(N+1)/2 parameters ranging freely over a certain N(N+1)/2-dimensional domain. The corresponding matrix [ E(x_s x_t) - φ̂_s φ̂_t ], however, contains only 2N parameters.
We shall now show that there are non-trivial functions r(s,t) and ψ_t, s,t ∈ T, where {z_t} can be decomposed in two or more different ways. It is clear that in these cases provisions have to be made if one wishes to estimate {φ_t} or {x_t}. If there exist distinct decompositions z_t = φ_t + x_t = φ*_t + x*_t, then always φ_t - φ*_t = x*_t - x_t. We denote the class of all stationary sequences {x_t} associated through (11.1) with the sequence {z_t} by S.
However, if we classify all stationary sequences according to their true covariances only, we may instead of S consider the wider class S̄ of all sequences {x*_t} whose covariance functions R*(m) = E(x*_{t+m} x̄*_t) and means E x*_t = ψ*_t satisfy

    (11.2)   r(s,t) = R*(s-t) - ψ*_s ψ̄*_t

for the given true covariance r(s,t). The elements of S̄ do not necessarily satisfy (11.1). However, for any element in S̄ there exists an element satisfying (11.1), which then also is an element of S.
The following theorem allows a classification of all covariance functions with integer-valued arguments.

Theorem: The following statements, in which s and t range over the set of all integers, are equivalent (in the sense that the truth of any one implies the truth of all three):

(i) r(s,t) is non-negative definite and is the sum of a function depending on s-t only and of the function

    (11.3)   b e^{i(φ₁ s - φ₂ t)} + b̄ e^{i(φ₂ s - φ₁ t)} ,

where b is a complex number and φ₁, φ₂ lie in the interval [0, 2π) and φ₁ ≠ φ₂. In the real case (11.3) equals

    b ( (-1)^s + (-1)^t ) ,   b real.

(ii) There exists, for those pairs c₁, c₂ of complex numbers whose product c₁ c̄₂ is constant (equal to b), a family of stationary random sequences all of which have the true covariance r(s,t) and whose mean value sequences are

    (11.4)   ψ_t = c₁ e^{iφ₁ t} + c₂ e^{iφ₂ t} ,

where φ₁ and φ₂ are constants in [0, 2π) with φ₁ ≠ φ₂. In the real case this is

    (11.5)   ψ_t = c₁ + c₂ (-1)^t ,   c₁ c₂ = b .

(iii) r(s,t) is the true covariance of at least two stationary random sequences with linearly independent mean value functions ψ_t and ψ*_t (in the sense that ψ_t + const·ψ*_t = 0 does not hold for all t).
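Statement (ii) can be checked numerically (a sketch, not from the original notes; the frequencies and the value of b are arbitrary choices here): for mean sequences ψ_t = c₁e^{iφ₁t} + c₂e^{iφ₂t}, the part of ψ_s ψ̄_t that does not depend on s-t alone is b e^{i(φ₁s-φ₂t)} + b̄ e^{i(φ₂s-φ₁t)} with b = c₁c̄₂, so every pair c₁, c₂ with the same product c₁c̄₂ contributes the same non-stationary term.

```python
import cmath

phi1, phi2 = 0.7, 2.1            # two distinct frequencies in [0, 2*pi)
b = 1.0 + 0.5j                   # the fixed product c1 * conj(c2)

def nonstationary_part(c1, s, t):
    """psi_s * conj(psi_t) minus its s-t-dependent part, for
    psi_t = c1 e^{i phi1 t} + c2 e^{i phi2 t} with c1 * conj(c2) = b."""
    c2 = (b / c1).conjugate()
    psi = lambda u: c1 * cmath.exp(1j * phi1 * u) + c2 * cmath.exp(1j * phi2 * u)
    stationary = (abs(c1) ** 2 * cmath.exp(1j * phi1 * (s - t))
                  + abs(c2) ** 2 * cmath.exp(1j * phi2 * (s - t)))
    return psi(s) * psi(t).conjugate() - stationary

# two different pairs (c1, c2) with the same product b give identical terms
for s in range(-3, 4):
    for t in range(-3, 4):
        d = nonstationary_part(1.0 + 0j, s, t) - nonstationary_part(2.5j, s, t)
        assert abs(d) < 1e-12
print("non-stationary part depends only on b = c1 * conj(c2)")
```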
Corollary 11.1: If and only if r(s,t) depends on s-t only (b = 0 in the above theorem), then any sequence ψ_t = c e^{iφt} can occur as mean value sequence, with c arbitrary complex and φ arbitrary in [0, 2π). In the real case, the only possible mean value sequences are given by the one-parametric families

    ψ_t = const   or   ψ_t = const·(-1)^t .

Remark: If T is an infinite interval on the real line and if the processes considered are continuous, the above theorem remains valid unaltered in the complex case (where t ∈ T). In the real case it turns out that there are no two processes in the class S̄ with linearly independent mean value functions. Moreover, if S̄ contains two processes with linearly dependent mean value functions, then these must be constant, except possibly those having opposite signs.
11.3. Proof and Applications

1) From (iii) follows (i): let {x_t} and {x*_t} be the sequences in S̄, and R(n), R*(n) their covariances. Then (11.2) can be written, for s and t ranging over 0, ±1, ..., ±N with N > 2, in an obvious notation with finite matrices as

    (11.6)   Δ = R - R* = ψ ψ^H - ψ* ψ*^H ,

where ψ, ψ* denote the vectors of mean values and w^H denotes the Hermitian transpose of the vector w. From the linear independence of ψ and ψ* follows rk Δ = 2 for N large enough. Hence any three columns in Δ are linearly dependent. Considering this and the (Toeplitz) structure of Δ, we can write

    Δ(n+2) + A Δ(n+1) + B Δ(n) = 0   for all n.

Here B ≠ 0; Δ(n) is the constant in the n-th lower side diagonal of Δ. Solutions of this difference equation are the sequences ω_j^n, j = 1, 2, with ω_j² + A ω_j + B = 0, where ω₁ ≠ ω₂; Δ(n) is uniquely given by

    Δ(n) = a ω₁^n + β ω₂^n

with some non-zero constants a and β. As Δ(n) is a bounded sequence, it holds |ω_j| = 1, j = 1, 2, and we may put

    ω_j = e^{iφ_j} ,   0 ≤ φ₁, φ₂ < 2π ,   j = 1, 2 .

Introducing the vectors

    w_j = (1, e^{iφ_j}, e^{2iφ_j}, ...)^T ,   j = 1, 2

(T denotes the transpose), we have

    (11.7)   Δ = a w₁ w₁^H + β w₂ w₂^H .

Because of the Hermiticity of Δ, a and β must be real. In the case that Δ is real, A and B must be real; thus either φ₂ = 2π - φ₁ and a = β, or φ₁ = 0, φ₂ = π, or vice versa.

Now from (11.6) and (11.7) it follows that ψ and ψ* must be linear combinations of w₁, w₂, say

    ψ = c₁ w₁ + c₂ w₂ ,   ψ* = c*₁ w₁ + c*₂ w₂ .

Insertion in (11.6) yields equations for the coefficients which have a non-trivial solution only if a ≠ β, because with a = β ≠ 0 it would follow that c₂ = c₁ = 0; further, a = |c₁|² and β = -|c₂|². This fact excludes the above real case alternative φ₂ = 2π - φ₁. Now the part of Δ, and hence of r(s,t), not depending on s-t alone has exactly the form (11.3), which proves (i).
2) From (i) follows (ii): r(s,t) is positive definite, hence for any bounded mean value sequence there can be constructed a random sequence having r(s,t) as true covariance. (11.4) and (11.5) are bounded for any pair c₁, c₂. If one takes here φ₁, φ₂, and b as defined from (11.3), and c₁, c₂ such that c₁ c̄₂ = b, then any random sequence with these mean values is stationary, because in its covariance kernel the term (11.3), which does not depend on s-t alone, is canceled.

3) From (ii) follows (iii), because for different pairs c₁, c₂ the corresponding mean sequences are linearly independent. (End of Proof)

We remark that a shorter, somewhat less elementary proof can be given which involves spectral theory (see section 11.4).
An immediate consequence of the above theorem is

Corollary 11.2: If and only if the true covariance r(s,t) belonging to S̄ contains a non-trivial additive term not depending on s-t alone which, however, is not of the form

    const · e^{iγ(s+t)}   (complex case)   or   const · (-1)^{s-t}   (real case),

then there belongs a uniquely determined covariance kernel R(s-t) to the class S̄; the associated mean value sequences are uniquely determined up to a unimodular factor (e^{iγ} in the complex case, -1 in the real case).

Remark: The above proof shows that all the statements of this paper apply also to any index set T which consists of all integers bounded from above or from below by a constant.
The above results allow a complete enumeration of the stationary linear regression models whose decomposition is not unique. We denote by Φ the space of all admissible regression sequences {φ_t}; Φ is often given as a linear manifold of the columns of an (infinite) regression matrix X, Φ = {Xβ, β ∈ B}, where B is a (finite dimensional) real or complex vector space. For real models one obtains the following list:

1) If the structure of r(s,t) is as described in corollary (11.2), and if {ψ_t} ∈ Φ, then instead of the regression sequence {φ_t} there can occur also {φ_t + 2ψ_t}. This may open possibilities for the estimation of {ψ_t}.

2a) If r(s,t) depends on s-t only and ψ_t = const, then the coefficient of a constant regression vector ∈ Φ can be chosen arbitrarily; if ψ_t = const·(-1)^t, then the coefficient of a regression vector ∈ Φ proportional to (+1, -1, +1, ...) can be chosen arbitrarily. If Φ contains both vectors, then as the mean vector ψ_t a multiple of any one of them can be chosen.
2b) If r(s,t) has a non-zero term of the form (11.3) and the remainder depends only on s-t, and if further Φ contains the vectors (1, 1, 1, ...) and (+1, -1, +1, ...), then in the t-th component of ψ_t the expression

    γ ( c₁ + c₂ (-1)^t ) ,   t = 0, ±1, ... ,

can be added, where γ is arbitrary real and c₁ c₂ = b, c₁ and c₂ real.
In the estimation of the regression coefficients β in the model z = Xβ + x one will have to take the possible ambiguity of the model into account. If the model is not unique, one will have to fix in one way or another one definite set of coefficients β. This is accomplished automatically by some estimation procedures for β. There is of course often the additional difficulty that the true covariances r(s,t) (as well as R(n)) are not known. Thus one will have to be suspicious towards any regression vector out of an infinite dimensional manifold in the complex case, or towards the constant vector and towards the vector (+1, -1, +1, ..., ±1) in the real case. However, as demonstrated under case 1) above, any bounded regression vector in {Xβ, β ∈ B} can occur as mean value sequence and thus may not have a unique coefficient.
In the continuous real case one has only the following trivial classes:

1) If r(s,t) does not depend on t-s only, then there are only the two mean functions ψ(t) and -ψ(t) possible. They do not vanish identically and determine R(s-t) completely. The linear regression may be ambiguous if ψ(t) is an element of the space Φ of regression functions.

2) If r(s,t) depends on s-t only, then any mean function is constant, and any constant can occur as a mean function. Thus if the linear regression space Φ contains a function identical to a constant, then the regression is ambiguous.
11.4. Spectral-Theoretic Proof for Discrete and Continuous Processes

Let T be the set of all integers for discrete processes and the real line for continuous processes. In the latter case the covariance function R(t) of the process {x_t}, t ∈ T, is assumed to be continuous. We have for all t ∈ T

    R(t) = ∫ e^{2πitλ} dF(λ) ,   x(t) = ∫ e^{2πitλ} dy(λ) ,

where the y(λ)-process has orthogonal increments and

    E( |dy(λ)|² ) = dF(λ) ,

F(λ) being the spectral distribution function. The integration interval is the λ-interval, namely [-1/2, 1/2] in the discrete case and (-∞, +∞) in the continuous case. The theorem in section 11.2 will be proved for both cases together.
Proof (from (iii) follows (i)): Let {x(t)}, {x*(t)} be the two stochastic processes in S̄ whose mean functions ψ(t), ψ*(t) are linearly independent. Let R(t), R*(t); F(λ), F*(λ); {y(λ)}, {y*(λ)} be the respective covariances, spectral distributions, and associated processes with orthogonal increments. Let L be the dense set of all λ-values for which the spectral densities f(λ) and f*(λ) are defined and continuous. Now for any μ ∈ L and any integer N > 0 we have for a discrete process

    (11.8)   (1/(2N+1)) Σ_{t=-N}^{N} e^{-2πitμ} ψ(t) = ∫ (1/(2N+1)) Σ_{t=-N}^{N} e^{2πit(λ-μ)} dψ_y(λ) ,

where ψ_y(λ) = E(y(λ)). The right-hand integral exists for each N as an ordinary limit of sums as well as a Lebesgue-Stieltjes integral¹⁾, because ψ_y(λ) is a function of bounded variation. For continuous T the sums are replaced by integrals and the λ-domain of integration is (-∞, ∞). If now μ is further restricted to the dense subset L' of L where the derivative ψ'_y(λ) is continuous, then (11.8) tends for N → ∞ to ψ'_y(μ), because the integrand tends towards 1 at λ = μ and 0 elsewhere, and ∫ δ_μ(λ) dλ = 1. A limit similar to that obtained for (11.8) is obtained for the expressions on the left side of the equality

    (11.9)   R(t) - R*(t) = ψ(t+s) ψ̄(s) - ψ*(t+s) ψ̄*(s) ,   s ∈ T ,

resulting in f(μ) - f*(μ), μ ∈ L. Thus we get from (11.9), for any μ ∈ L' and any s ∈ T,

    f(μ) - f*(μ) = e^{2πisμ} ( ψ'_y(μ) ψ̄(s) - ψ'_{y*}(μ) ψ̄*(s) ) ,

where ψ'_{y*}(μ) = (d/dμ) E(y*(μ)). Assume now that F(μ) - F*(μ) is not a mere jump function, i.e., that there exists a dense set D of μ-values in L for which δ(μ) = f(μ) - f*(μ) ≠ 0. Then for μ₁, μ₂ ∈ D all the matrices in the corresponding system of equations are non-singular for any fixed pair s₁, s₂ ∈ T if only μ₁ and μ₂ are chosen close enough together. Thus ψ(s) and ψ*(s) would be linear combinations of e^{2πiμ₁s} and e^{2πiμ₂s} for any pair of numbers μ₁, μ₂ which are elements of a dense set ⊂ D. This, however, is impossible, and therefore F(μ) - F*(μ) must be a mere jump function. Let μ₁, μ₂, ... be its jump points. Application of the law of large numbers ([24], p. 489 and p. 529) upon (11.9) yields

    (11.10)   Δ_k = e^{2πisμ_k} ( m_k ψ̄(s) - m*_k ψ̄*(s) ) ,   k = 1, 2, 3, ... ,

¹⁾ (with more general integrands)
87
w-here
TIl!';:
jilinp
'!J{t)
=
E(Y(!-1~)
- Y{!-1c-) J.
and '!J-x-(t)
In the case that there is no or only one jump point, ψ(t) and ψ*(t) are seen to be linearly dependent, against the assumption. If one writes (11.10) for two values μ₁ ≠ μ₂ in matrix form, one sees that the matrix

   ( m₁  -m*₁ )
   ( m₂  -m*₂ )

is non-singular, because otherwise it would follow that ψ(s) and ψ*(s) are proportional; this is excluded from letting s₂ → s₁ in the continuous case, respectively from choosing s₂ = s₁ + 1 and observing |μ₁ - μ₂| < 1 in the discrete case. Hence again ψ(s) and ψ*(s) are determined as linear combinations of e^{2πiμ₁s} and e^{2πiμ₂s}, and from the uniqueness of this representation it follows necessarily that there are exactly two jump points μ₁, μ₂.
For a real continuous process one has μ₂ = -μ₁, and thus any mean function must be of the form const · cos 2πμ₁s. Hence there are no two independent mean functions ψ(t) and ψ*(t). Moreover, also R(s) - R*(s) must be proportional to cos 2πμ₁s, which agrees with (11.9) only if μ₁ = 0. Thus the remark in section 11.1 is justified.

Adopting the notation ψ(t) = e^{2πiμ₁t}, (i) is easily verified. The remaining parts 2) and 3) of the proof are as under the proof in section 11.3. We remark that the elementary proof given in section 11.3 can also be adapted to the continuous case.
12. Least Squares Estimation for a Deterministic Component

It is obvious that it is impossible to obtain an estimate for a completely general deterministic component m_t in a process

(12.1)   z_t = m_t + x_t,   (x_t) stationary,

from a time series (i.e., just one observation from each of the populations for t = 1, 2, …). One hence can consider only models where m_t lies in a finite parameter family. As such, we take

(12.2)   m_t = Σ_{j=1}^{k} δ_j y_{jt},   in matrix notation   z = Y δ + x,

where the δ_j's are k parameters to be estimated, and the y_{jt} are given numerical sequences. We hence have a case of linear regression. As estimates for the δ_j one uses often the least squares estimates δ̂_j. One important property to be required from the δ̂_j is that they must not distort the variables x_t in such a way that they cease to form a stationary sequence, or a sequence with E x_t = 0 if this has been assumed. Obviously, this requirement is of asymptotic nature, because the property of the time series x₁, x₂, … to be a sample from a stationary sequence is a purely asymptotic property. Consequently, if one wants information about the suitability of estimates δ̂, one has to study their behavior for increasing sample size. If the δ̂ are consistent (and if the model is correct), then the remainders form asymptotically in probability the realization of a stationary sequence.
The least squares estimate of δ is

   δ̂ = (Y'Y)⁻¹ Y' z = δ + (Y'Y)⁻¹ Y' x,

and is obviously unbiased if E x_t = 0. If the covariance matrix of the process is for sample size n

(12.3)   Γ_n = (R(j-k))_{j,k=1,…,n},

then the covariance matrix of δ̂ - δ is

(12.4)   E(δ̂ - δ)(δ̂ - δ)' = (Y'Y)⁻¹ Y' Γ_n Y (Y'Y)⁻¹.

Inserting δ̂ in (12.1) gives for the remainders

   z - Y δ̂ = (I - Y P⁻¹ Y') x,   P = Y'Y

(I = n-th unit matrix), which holds for each n. In order that the remainder sequence tends to a stationary sequence (x_t) with n-th covariance matrix Γ_n, its covariance matrix must tend to Γ_n, or

   Y P⁻¹ Y' Γ_n Y P⁻¹ Y' - Y P⁻¹ Y' Γ_n - Γ_n Y P⁻¹ Y' → (0),   n → ∞,

where (0) is the zero matrix.
One has consistency for δ̂ if (12.4) tends to the zero matrix. A sufficient condition to assure this is that

(12.6)   Σ_{v=-n}^{n} |R(v)| / λ_min(Y'_n Y_n) → 0,

where λ_min means the minimum characteristic value.

Proof: (12.4) tends to zero if and only if, for any unimodular k-vector a, the quadratic form a'(Y'Y)⁻¹ Y' Γ_n Y (Y'Y)⁻¹ a tends to zero. An upper bound for this is λ_max(Γ_n)/λ_min(Y'_n Y_n). An upper bound for the first characteristic value of Γ_n can be obtained by way of a unimodular n-vector ε:

   ε' Γ_n ε ≤ R(0) + Σ_{v≠0} |ε'(Lᵛ + L'ᵛ) ε| |R(v)| ≤ Σ_{v=-n}^{n} |R(v)|,

observing that (ε' Lᵛ ε)² ≤ (ε' ε)² = 1. Here L means the (n, n) matrix which has ones in its lower side diagonal and zeros elsewhere. (End of proof.)
Clearly, if

   Σ_{v=-∞}^{∞} |R(v)| < ∞,

then λ_min(Y'_n Y_n) → ∞ is sufficient for (12.6). On the other hand, always

   Σ_{v=-n}^{n} |R(v)| ≤ (2n+1) R(0) = O(n);

thus, if n = o(λ_min(Y'_n Y_n)), (12.6) is satisfied for any possible covariance function.

One observes that in the case Σ_{-n}^{n} R(v) = O(n), for instance, a single constant regression vector y_t ≡ y cannot be consistently estimated (in the sense of mean square convergence); this is because of λ_min(Y'_n Y_n) = y² n = O(n).
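As a numerical sketch of the sufficient condition (12.6): assuming, purely for illustration, a constant regressor and the summable covariance function R(v) = ρ^|v|, the variance (12.4) stays below Σ_v |R(v)| / λ_min(Y'Y) and vanishes with growing n.

```python
import numpy as np

def lse_covariance(Y, Gamma):
    # covariance (12.4) of the least squares estimate: (Y'Y)^{-1} Y' Gamma Y (Y'Y)^{-1}
    P_inv = np.linalg.inv(Y.T @ Y)
    return P_inv @ Y.T @ Gamma @ Y @ P_inv

rho = 0.6
for n in [50, 200, 800]:
    t = np.arange(n)
    Gamma = rho ** np.abs(np.subtract.outer(t, t))   # R(v) = rho^|v|, summable
    Y = np.ones((n, 1))                              # constant regressor: lambda_min(Y'Y) = n
    bound = ((1 + rho) / (1 - rho)) / n              # sum_v |R(v)| / lambda_min(Y'Y)
    var = lse_covariance(Y, Gamma)[0, 0]
    print(n, var, bound)
    assert var <= bound
```

The bound, and with it the variance, decays like 1/n here, in line with the remark above that summability of R(v) together with λ_min(Y'Y) → ∞ yields consistency.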
One may ask for necessary conditions to be required from Y_n if (12.4) tends to zero. Clearly these would depend on the particular matrix Γ_n or Γ. This, however, mostly is not known. (There are cases when it is actually known. Then the whole estimation procedure allows much more precise statements, e.g., by the use of Markov estimators.) Hence the restrictions for Y_n must be obtained from letting the Γ_n vary over a whole domain G of (infinite) matrices, which itself also could be submitted to possibly necessary conditions. This leads us to consider conditions assuring consistency for δ̂ over the domain (or set) G representing a certain class of models. This means consistency not only for one particular sequence of matrices (Γ_n), but simultaneously for all covariance functions or models in G. Clearly the conditions in the latter case will be more restrictive than in the first case. [Compare Eicker (1,2).] Now if G contains the covariance function R(n) = δ(n,0) (uncorrelated random variables), (12.4) becomes P⁻¹, which tends to zero if and only if

   λ_min(P) → ∞;

hence this is shown to be a necessary condition for consistency of δ̂ over G.

Note that we so far have not required E x_t = 0.
If (E x_t) = (ψ_t) is any possible mean value sequence of the stationary process (x_t), then δ̂ is asymptotically unbiased, certainly if λ_min(P)/n → ∞.

Proof: E δ̂ - δ = P⁻¹ Y' ψ with ψ = (ψ₁, ψ₂, …). For any unimodular k-vector ε, by Schwartz's inequality,

   |ε' P⁻¹ Y' ψ|² ≤ ε' P⁻¹ ε · ψ' Y P⁻¹ Y' ψ ≤ λ_max(P⁻¹) |ψ|²,

since Y P⁻¹ Y' is an orthogonal projection; and |ψ_t|² ≤ R(0) < ∞, so that |ψ|² ≤ n R(0). Hence E δ̂ - δ → 0 certainly if n λ_max(P⁻¹) = n/λ_min(P) → 0.
13. On Best Linear Estimates

In some statistical problems the covariance function (or matrix) (12.3) can be assumed to be known (either exactly, or partially if it belongs to a finite parameter family, or by some estimation from other observations). In the first case the set G of possible covariance functions (Section 12) consists of just one element. Then one may study further properties of the estimate δ̂; see e.g., Grenander and Rosenblatt (1), p. 87:

Theorem: If E x_t = 0, then the Markov estimate δ̃ of δ is given by

(13.1)   δ̃ = (Y' Γ_n⁻¹ Y)⁻¹ Y' Γ_n⁻¹ z;

Γ_n is the covariance matrix (12.3), assumed to be non-singular. Here δ ∈ D, D being a space containing at least k independent vectors. The Markov estimate means the unbiased linear estimate with the smallest variance in the class of all unbiased linear estimates, in the sense that for any k-vector a and any linear unbiased estimate d

   a' [E(d - δ)(d - δ)' - E(δ̃ - δ)(δ̃ - δ)'] a ≥ 0;

δ̃ is then also called efficient.

Proof: Let d = Kz be any linear estimate of δ. From the unbiasedness follows KY = I. Abbreviating W = Y' Γ_n⁻¹ Y, we have cov(δ̃) = W⁻¹. We have to show that

   cov(d) - cov(δ̃) = K Γ_n K' - W⁻¹

is positive semi-definite, or a'(K Γ_n K' - W⁻¹) a ≥ 0 for all a. The left side now is equivalent to

   a'(K Γ_n K' - W⁻¹) a = a'(K Γ_n^{1/2} - W⁻¹ Y' Γ_n^{-1/2})(K Γ_n^{1/2} - W⁻¹ Y' Γ_n^{-1/2})' a ≥ 0

(using KY = I), as has to be shown. [Compare for a different treatment using the Hellinger integral Grenander and Rosenblatt (1), pp. 88-90.] (End of proof.)
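The theorem can be illustrated numerically. In the sketch below all concrete choices — the AR(1)-type covariance and the intercept/trend regressors — are assumptions for illustration; it forms both covariance matrices and verifies that cov(δ̂) - cov(δ̃) is non-negative definite for the ordinary least squares choice.

```python
import numpy as np

def cov_ls(Y, Gamma):
    # covariance (12.4) of the least squares estimate
    P_inv = np.linalg.inv(Y.T @ Y)
    return P_inv @ Y.T @ Gamma @ Y @ P_inv

def cov_markov(Y, Gamma):
    # covariance (Y' Gamma^{-1} Y)^{-1} of the Markov estimate (13.1)
    return np.linalg.inv(Y.T @ np.linalg.solve(Gamma, Y))

n, rho = 100, 0.7
t = np.arange(n)
Gamma = rho ** np.abs(np.subtract.outer(t, t))     # AR(1)-type covariance (assumed)
Y = np.column_stack([np.ones(n), t / n])           # intercept and linear trend (assumed)
diff = cov_ls(Y, Gamma) - cov_markov(Y, Gamma)
print(np.linalg.eigvalsh(diff))                    # all non-negative: the Markov estimate wins
assert np.linalg.eigvalsh(diff).min() >= -1e-9
```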
This theorem can be generalized considerably: E x_t = ψ_t may be any of the possible mean value sequences (α); instead of considering the class of linear unbiased estimates, one considers linear consistent (in the strict sense) or asymptotically unbiased estimates

(β)   d = Kz = KYδ + Kx.

The set D of δ-vectors (representing a certain class of models) over which all conditions and statements shall hold must contain at least k independent vectors. Then it is clear from (β) — because Kψ does not depend on δ while (KY - I)δ must tend to (0) for δ varying over D — that the conditions

   (β₁)   KY → I,
   (β₂)   Kψ → 0, resp. Kx → 0,

are equivalent to the respective ones in (β). We now can state the
Theorem: Let d = Kz be an asymptotically unbiased or a consistent (strict sense) linear estimate over D (alternatively, let K satisfy (β₁) and (β₂)), and let

(13.3)   (KY - I) δ ψ' K' → 0.

Then an estimate which minimizes, in the sense of (13.4), asymptotically the mean square error for estimates d within the defined class is given by δ̃ of (13.1), supposed that infinitely many matrices Γ_n are non-singular.

Proof: As above we have, abbreviating (KY - I) = M and taking any k-vector a,

(13.4)   a'(cov(d) - cov(δ̃)) a = a' M δ δ' M' a + a'(M δ ψ' K' + K ψ δ' M') a + a'(K Γ_n K' - W⁻¹) a ≥ 0

(cov here actually means the mean square error matrix), using (β₁) and the decomposition used in the previous proof: the first term is non-negative, the third is non-negative as before. Observe further that the trace of the rank-one matrix M δ ψ' K' equals its only possibly non-zero eigenvalue; hence, by (13.3), the middle term tends to zero.
Remarks: Note that (13.3) in general means little restriction, because on the one hand ψ_t² ≤ R(0) < ∞, on the other mostly λ_min(Y'Y) → ∞ as n → ∞. Note further that we have neither assumed nor proven that δ̃ is asymptotically unbiased or consistent. If (13.3) is dropped, one might obtain in certain cases "better" estimates than δ̃. Obviously, (13.3) needs only to be observed if d is consistent; unbiasedness implies (13.3).

In going through the proof of the above theorem, one notices that no use is made of the stationarity of (x_t). Hence it holds verbally for any second order stochastic process (x_t), i.e., E(x_s x_t) < ∞ for all s and t, if only infinitely many covariance matrices Γ_n are non-singular.

Of particular interest is of course the case where the mean square error matrix W⁻¹ of the optimal estimator δ̃ tends to the zero matrix, in which case δ̃ would also be consistent. This is the case if and only if λ_min(W) → ∞. A sufficient condition for this to be true is

(13.6)   λ_min(P_n)/n → ∞,   P_n = Y'_n Y_n.

For a stationary sequence clearly R(0) ≤ λ_max(Γ_n) ≤ n R(0), so that λ_min(W) ≥ λ_min(P_n)/λ_max(Γ_n) ≥ λ_min(P_n)/(n R(0)). Hence one has consistency of δ̃ for the entire class of models (12.1), where (x_t) is any stationary sequence, if (13.6) holds. (The corresponding necessary condition is somewhat weaker.) For any second order stochastic sequence (not necessarily stationary) holds

   λ_max(Γ_n) ≤ Σ_{t=1}^{n} R(t,t),   R(t,t) = E{|x_t|²}.

If one assumes that all R(t,t) are uniformly bounded, (13.6) is also a sufficient condition for consistency of δ̃ over the set of stochastic sequences with E{|x_t|²} ≤ const < ∞.
14. Consistent Linear Estimates for Classes of Models with Different Error Sequences

In the case of a regression z_t = δ y_t + x_t on a single regression vector y, and if the errors x_t have zero mean and bounded, positive spectral density, one obtains the following necessary conditions for mean square convergence (E(d_n - δ)² → 0) of a general linear estimate d_n = Σ_{t=1}^{n} a_t z_t of δ (≠ 0):

(i)   Σ_{t=1}^{n} a_t y_t → 1   for n → ∞,

(ii)   Σ_{t=1}^{n} y_t² → ∞.

Proof [Grenander and Rosenblatt (1), p. 230]: One has

(14.1)   E(d_n - δ)² = E(d_n - E d_n)² + (E(d_n) - δ)² = E(Σ_{t=1}^{n} a_t x_t)² + δ² (Σ_{t=1}^{n} a_t y_t - 1)²,

which shall tend to zero. Hence Σ_{t=1}^{n} a_t y_t → 1 (asymptotic unbiasedness), which proves (i); and, the spectral density being bounded away from zero, also Σ_{t=1}^{n} a_t² → 0. Then from

   1 ← (Σ a_t y_t)² ≤ Σ a_t² · Σ y_t²

and (14.1) follows (ii). (End of proof.)

One sees from the example

   d_n = Σ_{t=1}^{n} y_t z_t / Σ_{t=1}^{n} y_t²

that (ii) is also sufficient for the existence of a consistent estimator if f(λ) is uniformly bounded.
We generalize this theorem to the vector case

(14.2)   z = Yδ + x,   E x = 0.

About the stationary sequence (x_t) we assume either that its spectral density is known and satisfies

(14.3)   0 < const ≤ f(λ) ≤ const < ∞   for all λ,

or, if f(λ) is not known, that it belongs to any set of spectral densities, say F, at least one of which satisfies (14.3). Consider the estimator d_n = A_n z of δ. About δ we again assume that it is either known (which is practically of little interest) and not a characteristic vector of A_n Y_n - I in the limit, or that it may be any k-dimensional vector in R^k. One can derive necessary conditions for A_n in order that d_n is convergent for a specific δ and Y satisfying (14.3) and δ not being that eigenvector. However, it is more realistic not to derive conditions for such specific cases, whose presence mostly cannot be checked. We rather want conditions for A_n and Y to be fulfilled for a class of models, namely where δ may be any element in R^k and f(λ) may be any element in F. The conditions so obtained may not be necessary in a specific situation; however, other conditions cannot be established if this situation is actually unknown.
Theorem: In order that the linear estimate d_n = A_n z of δ in z = Yδ + x, where E x = 0 and (x_t) is stationary with spectral density f(λ) ∈ F, is consistent over F × R^k, it is necessary that

(i)   tr A_n A'_n → 0   and   A_n Y_n → I   (= k × k unit matrix),

(ii)   λ_min(Y'_n Y_n) → ∞.

Proof: E|d_n - δ|² = E(x' A'_n A_n x) + |(A_n Y_n - I) δ|². The first term is ≥ 2π min f(λ) · tr A_n A'_n, which proves the first part of (i); the second term requires A_n Y_n → I, and thus, for any sequence of unimodular k-vectors v_n,

   v'_n Y'_n A'_n A_n Y_n v_n → 1.

Let v_n be the eigenvector for the smallest root of Y'_n Y_n. Then

   1 ← v'_n Y'_n A'_n A_n Y_n v_n ≤ λ_min(Y'_n Y_n) a_n,   where a_n = tr A_n A'_n → 0;

hence (ii). (End of proof.)
One may even discard the assumption E x = 0 and require instead that F contains a process with zero mean and a spectral density satisfying (14.3). Then the above theorem and proof still hold valid.

One checks easily that

   tr A_n A'_n → 0,   respectively   A_n Y_n → I,

are not only necessary conditions but also sufficient for consistency over F × R^k if (14.3) holds for all f(λ) ∈ F (the constants in (14.3) may depend on f(λ)).
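For the least squares choice A_n = (Y'Y)⁻¹Y' the two conditions can be checked directly; the regressors below are an arbitrary assumed example.

```python
import numpy as np

# For A_n = (Y'Y)^{-1} Y' one has A_n Y_n = I exactly, and
# tr(A_n A_n') = tr((Y'Y)^{-1}) <= k / lambda_min(Y'Y),
# so condition (i) follows from condition (ii).
for n in [50, 500, 5000]:
    t = np.arange(1, n + 1)
    Y = np.column_stack([np.ones(n), np.cos(0.3 * t)])   # assumed pair of regressors
    A = np.linalg.inv(Y.T @ Y) @ Y.T
    print(n, np.trace(A @ A.T), np.linalg.eigvalsh(Y.T @ Y).min())
    assert np.allclose(A @ Y, np.eye(2), atol=1e-8)
```

As n grows, λ_min(Y'Y) grows roughly like n/2 here and tr A_n A'_n shrinks accordingly.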
15. On Asymptotically Efficient Linear Estimates

Assume now in

   z_n = Y_n δ + x_n,   E x_n = 0,

where n is the sample size, that the column vectors y_{j,n}, j = 1, …, p, of Y_n satisfy the following conditions. First,

   |y_{j,n}|² → ∞   for n → ∞

(one may ask whether this condition is necessary). Next, for each j,

   y²_{j,n} / |y_{j,n}|² → 0   for n → ∞

(y_{j,n} is then called slowly increasing; y_{j,m} means the m-th component of y_{j,n}, m ≤ n). Finally,

   lim_{n→∞} (1 / (|y_{r,n}| |y_{s,n}|)) Σ_{t=1}^{n} y_{r,t+h} ȳ_{s,t} = R_{r,s}(h)

shall exist for all h ≥ 0, 1 ≤ r, s ≤ p. The limiting matrix shall be denoted by R(h), h = 0, ±1, ±2, … From the assumptions follows immediately the Hermiticity R(-h) = R^H(h). We derive the following Fourier-Stieltjes representation for R(h) [Grenander and Rosenblatt (1), p. 233 f.]:
Lemma:

(15.1)   R(h) = ∫_{-π}^{π} e^{ihλ} dM(λ),

where M(λ) is a p × p matrix of functions of λ for which any difference M(λ₂) - M(λ₁), λ₁ ≤ λ₂, is a non-negative definite matrix (spectral distribution function of the regression vectors).

Proof: Let a be any p-vector. Then

   a^H R(ν - μ) a = lim_{n→∞} Σ_{r,s=1}^{p} (ā_r a_s / (|y_{r,n}| |y_{s,n}|)) Σ_{t=1}^{n} y_{r,t+ν} ȳ_{s,t+μ}

forms a positive semi-definite sequence, because for any m-vector k with components k_ν, m arbitrary, we have

   Σ_{ν,μ=1}^{m} k̄_ν a^H R(ν - μ) a k_μ = lim_{n→∞} Σ_{t=1}^{n} | Σ_{ν,r} (a_r k_ν / |y_{r,n}|) y_{r,t+ν} |² ≥ 0.

Hence from theorem 2 (Section 3) there exists for each a a measure function F_a(λ) such that a^H R(h) a = ∫ e^{ihλ} dF_a(λ). Specialization of a to the r-th unit vector u_r yields

   R_{r,r}(h) = ∫_{-π}^{π} e^{ihλ} dM_{rr}(λ),

where we have put F_{u_r}(λ) = M_{rr}(λ). Specialization of a to a vector having in the j-th and k-th position (1,1) resp. (1,i) and zeros elsewhere leads to two similar integrals, whose linear combination allows the definition of M_{jk}(λ) in (15.1); each M_{jk}(λ) is of bounded variation (for j ≠ k these functions need no longer be monotonic non-decreasing). The integral formed with M_{jk} is the difference of two Stieltjes integrals, because M_{jk}(λ) can be written as the difference of two non-decreasing functions [e.g., Alexandroff (1)]. (End of proof)
Analogously to Section 3, we mean by the regression spectrum S the set of all points λ for which M(λ₂) - M(λ₁) is not the null matrix for every pair λ₁, λ₂ with λ₁ < λ < λ₂.
We want to compare the least squares estimates δ̂ of δ, Section 12,

   δ̂ = (Y'Y)⁻¹ Y' z,

with the Markov estimates δ̃ (Section 13). (The subscripts Y and n are omitted where confusion is unlikely, e.g. on δ̂, δ̃.) The respective covariance matrices are

(15.2)   cov_LS := E(δ̂ - δ)(δ̂ - δ)' = P⁻¹ Y' Γ_n Y P⁻¹,   P = Y'Y,
         cov_M := E(δ̃ - δ)(δ̃ - δ)' = (Y' Γ_n⁻¹ Y)⁻¹,

Γ_n = (R(j-k)) being a section of the covariance matrix of (x_t). (If Y is complex, we always take the Hermitian transpose instead of the usual transpose.)

Using D_n = diagonal matrix (|y_{1,n}|, …, |y_{p,n}|), Grenander and Rosenblatt then call δ̂ asymptotically efficient if

   lim_{n→∞} D_n cov_LS D_n = lim_{n→∞} D_n cov_M D_n

and the limiting matrices exist. The pre- and post-multiplication with D_n shall effect the finiteness of the considered matrices. Note that cov_LS - cov_M is always non-negative definite. We assume also that M := R(0) = M(π) - M(-π) is non-singular, meaning that the regression vectors are asymptotically independent.
Lemma: From the Markov property follows, under the preceding assumptions and if f(λ) is positive and piecewise continuous, that

(15.3)   lim_{n→∞} D_n (cov_LS - cov_M) D_n = 2π [ M⁻¹ ∫_{-π}^{π} f(-λ) dM(λ) M⁻¹ - ( ∫_{-π}^{π} f⁻¹(-λ) dM(λ) )⁻¹ ]

is non-negative definite; this matrix is the asymptotic expression for the loss of efficiency of the least squares estimate.
Proof: 1) Let first (Section 7)

   f(λ) = Σ_{v=-α}^{α} r_v e^{ivλ},   r_v = 0 for |v| > α,

be a trigonometric polynomial, so that R(v) = 2π r_{-v} for |v| ≤ α and R(v) = 0 otherwise. For the (ν,μ)-th matrix element of Y' Γ_n Y one has

   Σ_{u,v=1}^{n} ȳ_{ν,u} R(u-v) y_{μ,v} = 2π Σ_{m=-α}^{α} r_m Σ_{v} ȳ_{ν,v+m} y_{μ,v},

apart from a bounded number of boundary terms. For n → ∞ each of the 2α+1 terms, normalized by |y_{ν,n}| |y_{μ,n}|, tends to a finite limit; thus, from (15.1),

   D_n⁻¹ Y' Γ_n Y D_n⁻¹ → 2π ∫_{-π}^{π} f(-λ) dM(λ).

Now D_n⁻¹ P D_n⁻¹ → R(0) = M; hence

   D_n cov_LS D_n → 2π M⁻¹ ∫_{-π}^{π} f(-λ) dM(λ) M⁻¹.
-1C
2)
Let now
f(A)
be positive and piecewise continuous" and
let possible discontinuities of
M(A).
\
We can then approxim9.te
where the
f (A)
i
r(l) < r < r(2)
n
- n- n
belonging to
A-B
f(A)
with any
where
J
-1C
from above and belovT"
rei)
n
t'(A)
n-vector
and each
Then also
A:: B means that
this follows from the FouI'ieI'-Stieltjes
t'i (A)
Furthermore
~.
f (-A)dM(A)M1
f (A) ~ f(A)~ f (A)",
l
2
is the covariance matrix of the sequence
fi(A)" (and where generally for matrixes
1C
l
not coincide with those of
are finite trigonometric polynomials.
is positive semi-definite);
transt'orms t'or
~H M-
fe-A)
l
~ ~ ]-im"
n
lim £H Dn
-->00
~ ~H rvr
E(;-~)(£-~) D~
I
rr
l
J
-rr
f"2( -A.) aM(A.) M-
l
~~
105
But as
max(f (A.)-f (A.»
l
2
can be made arbitrarily small ezcept :tn
small neighborhoods of discontinuit;y poin-ts, it folloYTs that
1\
lim
E(~-~)(~-~)'
Dn
n __ >co
T,{e
2rr:
J
nOi'T derive an analogous expression for
boxt + blxt + l + .•• + bclt+a =
a
"nth roots of l:
f(-f..)(l}I(A.)M-
=
k zk
b
o
f(f..)
°
to a set of
T"latl' T"lat2' ••• , 11n
then
~
==.§.
The cOiTariance matrices obeJr
.
for
I._~
"IU
Xl' .•• , x
a
V
== 0
n
ortho-
and denote
vrith a suitable non-singular matrix
!::i.
rn
H
!::i. == I , or
n
r-n l ::
H
.0. !::i..
Except
elements in the matrix, we have
b V+U b J..L+u
(b
assuming
Then (Section 7)
\
~,
...
~,
11t+a
inside the unit circle.
normal variables by linear combinations of
!::i..
,
= 2rr:1
If we complete the set
this new set by
l
(15.3).
the le:ft matrix in
:3)
rr:
1\
for
v <0
or
v > a).
==
Considering first an element at (v,J..L)-position
in the matrix, we find, finally,
1
f{-f..)
dl-I(A.) .
106
4) We now, similarly as in 2), approximate a general f(λ) from above and below by densities of the form (15.4) and let the difference tend to zero. Then from (15.2) follows (15.3).

If the x_t and y_t are real, then f(λ) = f(-λ) and dM(λ) = dM̄(-λ). Introducing a corresponding real spectral distribution T(λ) on 0 ≤ λ ≤ π, the integrals in (15.3) may be taken over (0, π); e.g.,

   D_n cov_M D_n → 2π ( ∫_0^π f⁻¹(λ) dT(λ) )⁻¹.

(End of proof.)
Lemma: Let N(λ) = M^{-1/2} M(λ) M^{-1/2}. Then the spectrum S can be uniquely decomposed into a finite number q of non-overlapping sets E_j (the elements of the spectrum), S = Σ_{j=1}^{q} E_j, such that

(i)   N(E_j) = ∫_{E_j} dN(λ) is a positive semi-definite non-vanishing matrix,

(ii)   Σ_j N(E_j) = I_p,

(iii)   N(E_j) N(E_k) = 0, j ≠ k,

(iv)   the number q cannot be increased (q is maximal; the decomposition is maximal).
Proof: 1) Any decomposition, if it exists (not necessarily the maximal one), is finite, with j = 1, …, q ≤ p. First, from the construction of M(λ) we see that it is hermitian: M(λ) = M^H(λ). Also R(0) is hermitian; hence N(λ) = N^H(λ) and N(E_j) = ∫_{E_j} dN(λ) = N^H(E_j). Thus each such matrix is similar to a diagonal matrix and has non-negative eigenvalues only. With a unitarian matrix U, say, it becomes

   U N(E₁) U^H = diag(d₁, …, d_m, 0, …, 0),   d_i > 0,

where the non-zero diagonal elements can be assumed in the upper corner. By (iii) all the matrices U N(E_j) U^H, j > 1, have only zeros in the first m rows and columns; by (ii) finally d₁ = … = d_m = 1. As m ≥ 1, one sees from successive reduction that q ≤ p.

2) The existence of a decomposition with a maximal number q follows from the above construction: clearly, from the definitions,

   ∫_S dN(λ) = I_p.

Choose a set E₁ ⊂ S such that (i)-(iii) hold. If this is not possible, we have q = 1; otherwise choose, if possible, E₂ ⊂ S - E₁, or a refinement of E₁, such that (i)-(iii) hold with q ≥ 2, and so on. Thereby the following lemma is used: if M₁, M₂, N₁, N₂ are hermitian, positive semi-definite (p,p)-matrices such that M = M₁ + M₂, N = N₁ + N₂, MN = 0, then M_i N_j = 0 for i, j = 1, 2. This is seen after a unitarian transformation (with matrix V) of M into a diagonal matrix having its non-zero elements in the left upper corner, say in rows 1, …, m. Since MN = 0, the transformed N has only zeros in the first m rows and columns; here one uses the general fact that in a positive semi-definite matrix A the j-th row and column vanish if a_jj does (otherwise a suitable vector u, with u^H A u = |u_j|² a_jj + 2 Re ū_j u_k a_jk + …, a_jk ≠ 0, could be made negative). Furthermore, each of the M_i may have non-zero elements only in the upper left (m,m)-matrix, and the same is true for the corresponding N_i's in the complementary part, thus yielding the assertion. This shows now that if E ⊂ E₁ with N(E) ≠ 0 and N(E₁ - E) ≠ 0, then E can be used to refine the decomposition so that (i)-(iii) still hold. A "maximal" decomposition can be obtained by examining all possible decompositions.

3) To prove the uniqueness, suppose there are two maximal decompositions (E_j), (E'_j) differing on sets of positive N-measure. Then there must be sets E_{j₁}, E'_{j₂} such that N(E_{j₁} ∩ E'_{j₂}) ≠ 0. Put E_{j₁} ∩ E'_{j₂} = F and E_{j₁} - F = C. Then, from the lemma in 2) (applied to the decompositions E'_{j₂} = F + complement and Σ_{k≠j₁} E_k = C + complement),

   N(F) N(E_r) = 0 for r ≠ j₁,   and   N(F) N(C) = 0,

which means a contradictory extension of (E_j). (End of proof)
Application: The necessary and sufficient condition for asymptotic efficiency can be written (with (15.3), which then equals zero)

(15.5)   ∫_{-π}^{π} f(-λ) dN(λ) · ∫_{-π}^{π} f⁻¹(-λ) dN(λ) = I.

Using the maximal decomposition of S, this is equivalent to

   Σ_{j=1}^{q} ∫_{E_j} f(-λ) dN(λ) · Σ_{j=1}^{q} ∫_{E_j} f⁻¹(-λ) dN(λ) = I.

If f(-λ) = c_j > 0 on E_j, this reduces to Σ_j N(E_j) = I (observe from (ii) and (iii) that N²(E_j) = N(E_j)), so that we have asymptotic efficiency. The converse is also true.
Proof of the latter part: From asymptotic efficiency we have (15.5), or, putting U N(λ) U^H = N_U(λ) with the unitarian modal matrix U,

(15.6)   ∫_{-π}^{π} f(-λ) dN_U(λ) · ∫_{-π}^{π} f⁻¹(-λ) dN_U(λ) = I

(observe that the two matrices are inverses of one another and thus have the same modal matrix). Let diag N_U(λ) = (φ₁(λ), …, φ_p(λ)); each φ_i(λ) is a measure, because the positive semi-definiteness is preserved in N_U. From (15.6) one finds

   ∫ f(-λ) dφ_i(λ) · ∫ f⁻¹(-λ) dφ_i(λ) = 1,   i = 1, …, p,

integrating over (-π, π). Considering this as a Bunjakovski-Schwartz inequality in which equality holds, one finds f(-λ) = c_i = const on the set denoted by A_i, on which φ_i actually increases. If for two indices φ_i(A_i ∩ A_j) > 0, then c_i = c_j. Let B_ℓ = ∪ A_i denote the unions over chains of A's which are connected by sets of positive φ-measure; any two different chains satisfy φ(B_{ℓ₁} ∩ B_{ℓ₂}) = 0 for ℓ₁ ≠ ℓ₂, on each B_ℓ the density f(-λ) is constant, and ∪_ℓ B_ℓ = S = spectrum. The matrix ∫_{B_ℓ} dN_U(λ) has an i-th diagonal element unequal to zero only if A_i ⊂ B_ℓ; because of the positive semi-definiteness it has possibly non-zero elements only in those rows and columns which cross at places of non-zero diagonal elements. Thus B₁, …, B_r is a decomposition of the spectrum S satisfying (i)-(iii), though not necessarily maximal, and each B_ℓ is the union of certain elements of the spectrum; on each element of the spectrum f is therefore constant. (End of proof)
We reformulate the preceding theorem and draw a conclusion:

Theorem: The least squares estimate δ̂ of δ is asymptotically efficient if and only if the spectral density is constant on each of the elements of S (besides being positive and piecewise continuous; further assumptions are listed at the beginning of this section).

If and only if the spectrum S consists of q ≤ p distinct points (= the elements), then δ̂ is asymptotically efficient for any error sequence (x_t) with arbitrary positive, piecewise continuous spectral density.

Proof of the latter part: immediate consequence of the first part. Rosenblatt has similarly treated a vector valued process [The Harald Cramér volume, Wiley, 1959].
16. Asymptotically Efficient Estimates for Trigonometric, Polynomial, and Certain Other Regressions

The above theorem allows for a number of interesting applications. We give some examples [Grenander and Rosenblatt (1), p. 245 ff.].

Example 1. The regression vectors have the form

   y_{kt} = t^{ν_k} e^{itλ_k},   t = 1, 2, …;   k = 1, …, p,

where the ν_k and λ_k are fixed numbers and all pairs (ν_k, λ_k) are distinct. In this case, the least squares estimates δ̂ are asymptotically efficient for any spectral density f(λ) > 0. To see this, note that |y_{k,n}|² tends to infinity and that y_{k,n} is slowly increasing. Further,

   R_{rs}(h) = lim_{n→∞} (1/(|y_{r,n}| |y_{s,n}|)) Σ_{t=1}^{n} t^{ν_r+ν_s} e^{it(λ_r-λ_s)} e^{ihλ_r} = e^{ihλ_r} √((2ν_r+1)(2ν_s+1)) / (ν_r+ν_s+1)

if λ_r = λ_s, and R_{rs}(h) = 0 otherwise. Some elements of the spectral matrix M(λ) are seen from this to have just one jump (at λ_r respectively) of height √((2ν_r+1)(2ν_s+1))/(ν_r+ν_s+1) and are constant elsewhere (if they are not at all constant). The spectrum S consists of the different ones among the numbers λ₁, …, λ_p.
Example 2. If in Example 1 all ν_k = 0, one has a pure trigonometric regression (seasonal adjustment): y_{kt} = e^{itλ_k}. The spectrum S consists of the points λ_j (λ_i ≠ λ_j for i ≠ j), and δ̂ is asymptotically efficient. For the covariances one has

   cov(δ̂_i, δ̂_j) = (2π/n) f(λ_j) δ_{ij} + o(1/n).

Hence the δ̂_j are approximately uncorrelated, with variances depending on the spectral densities f(λ_j).
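A small numerical sketch of Example 2 (the AR(1)-type error covariance below is an assumed illustration): n · var(δ̂_j) approaches 2πf(λ_j) as n grows, and the off-diagonal covariances vanish.

```python
import numpy as np

def f_ar1(lam, rho):
    # spectral density belonging to R(v) = rho^|v| (assumed error model)
    return (1 - rho**2) / (2 * np.pi * np.abs(1 - rho * np.exp(1j * lam))**2)

rho, lams = 0.5, np.array([0.8, 2.0])
for n in [200, 2000]:
    t = np.arange(1, n + 1)
    Y = np.exp(1j * np.outer(t, lams))                   # y_{kt} = e^{i t lambda_k}
    Gamma = rho ** np.abs(np.subtract.outer(t, t))
    P_inv = np.linalg.inv(Y.conj().T @ Y)
    cov = P_inv @ Y.conj().T @ Gamma @ Y @ P_inv         # exact covariance (15.2)
    print(n, np.round(n * np.diag(cov).real, 3), np.round(2 * np.pi * f_ar1(lams, rho), 3))
```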
Example 3. In polynomial regression (λ_k = 0, y_{kt} = t^{ν_k}) δ̂ is again asymptotically efficient. Here S consists only of the point λ = 0, with a jump of

   (ΔM(0))_{ν,ν'} = √((2ν+1)(2ν'+1)) / (ν + ν' + 1),

and covariances

   cov(δ̂_j, δ̂_k) ≈ 2π f(0) m_{jk} / n^{j+k+1},

where the m_{jk} are determined by the inverse of the jump matrix ΔM(0).

All of the above results hold similarly for the real cases if only f(λ) is positive and piecewise continuous.
Example 4. The first example can be generalized as follows; for convenience, we restrict ourselves to two regression vectors only. Let

   y_{1t} = P(t) ∫_{-π}^{π} e^{itλ} dα(λ),   y_{2t} = Q(t) ∫_{-π}^{π} e^{itλ} dβ(λ),

where

   P(t) = Σ_{ν=0}^{u} a_ν t^ν,   Q(t) = Σ_{ν=0}^{s} b_ν t^ν,

and α(λ), β(λ) are functions of bounded variation. They may have jumps (enumerably many at most) at λ₁, λ₂, … of respective heights a₁, a₂, …; b₁, b₂, … (some of them may be zero). The jump-free part of α is

   α₁(λ) = α(λ) - α_s(λ),   α_s(λ) = jump-function part of α(λ),

and β₁(λ) is defined analogously. The least squares estimator δ̂ is asymptotically efficient for all positive and piecewise continuous densities f(λ) if and only if one of the following cases arises:

(1)   y_{1t} = P(t) [a e^{itμ} + ∫_{-π}^{π} e^{itλ} dα₁(λ)],   y_{2t} = Q(t) [b e^{itμ} + ∫_{-π}^{π} e^{itλ} dβ₁(λ)],   where u ≥ s;

(2)   y_{1t} = P(t) ∫_{-π}^{π} e^{itλ} dα₁(λ),   y_{2t} = Q(t) [b₁ e^{itλ₁} + b₂ e^{itλ₂} + ∫_{-π}^{π} e^{itλ} dβ₁(λ)].

The proof is given in Grenander and Rosenblatt (1), pp. 248-252. If y_{1t}, y_{2t} are real, similar conditions like those above can be established.
Example 5. Pulse trains. For simplicity, suppose that there is only one regression function y_t, which is supposed to be periodic with period g (a positive integer); the pulse shape is y₁, y₂, …, y_g (periodic pulse train). Writing

   y_t = b₀ + b₁ e^{2πit/g} + … + b_{g-1} e^{2πit(g-1)/g},

one sees that the regression spectrum S consists of those points 2πν/g for which the constant b_ν does not vanish. Only in the exceptional case that the spectral density is the same for all these points is the least squares estimator

   δ̂ = Σ_{t=1}^{n} z_t ȳ_t / Σ_{t=1}^{n} |y_t|²

asymptotically efficient. For its variance holds asymptotically

   var δ̂ ≈ (2π/n) Σ_{ν=0}^{g-1} f(2πν/g) |b_ν|² / ( Σ_{ν=0}^{g-1} |b_ν|² )².
Example 6. Some of the results obtained so far can be applied in the theory of communication, where a received message y(t) (it is here appropriate to have a continuous parameter, 0 ≤ t ≤ T) is composed of the original signal s(t) and a disturbance x(t), which may be assumed to be a stationary process whose spectral density exists and is continuous:

   y(t) = s(t) + x(t).

One wants to reconstruct s(t) from y(t). If, for instance, s(t) = γ = const, the least squares estimate

   γ̂ = (1/T) ∫_0^T y(t) dt

is seen to be asymptotically efficient. A further discussion is given in the reference, where properties of estimators are described which take care of the finiteness of the sample size T. Also, the problem is mentioned what to do if s(t) is a (periodic) pulse train and thus the least squares estimator is not efficient.
CHAPTER IV

PREDICTION THEORY

17. The Problem of Prediction in Stationary Sequences

We proceed in giving a brief survey of some problems and results in the theory of prediction for stationary sequences. Let (x_n), n = …, -1, 0, 1, …, be a stationary sequence with spectral distribution F(λ). Then the problem of linear prediction by means of least squares consists in the determination of a random variable specified by means of coefficients b₀, …, b_{N-1} such that the r.v. x_{n+ν} is "best" approximated by Σ_{j=0}^{N-1} b_j x_{n-j} for given ν and N, and for all n and ω ∈ Ω. The criterion used is the minimization of the mean square error

   σ²_{νN} = E{ |x_{n+ν} - Σ_{j=0}^{N-1} b_j x_{n-j}|² }.

Prediction can also be made if the restriction of fixed N is dropped. If then φ is an element of the Hilbert space H' generated by x_n, x_{n-1}, …, the predictor is determined by minimization of

   E{ |x_{n+ν} - φ|² }.

It will be shown that in both cases there exists a solution which is a.e. uniquely given by the projection of x_{n+ν} on the spaces generated respectively by x_n, …, x_{n-N+1} or H'. Finally, we remark on the possibility of non-linear prediction by a function f(x_n, …, x_{n-N+1}) which is determined from minimizing, for fixed N and ν,

   E{ |x_{n+ν} - f(x_n, …, x_{n-N+1})|² }.
There exists a unique solution (up to sets of measure 0) of the linear prediction problem, namely the projection of $x_{n+v}$ on the space $M$ spanned by $x_{n-N+1}, \ldots, x_n$, respectively by $x_n, x_{n-1}, \ldots$. This, in the finite case, is the r.v.
$$\varphi_v = \sum_{j=0}^{N-1} b_j x_{n-j},$$
chosen in such a way that $x_{n+v} - \sum_{j=0}^{N-1} b_j x_{n-j}$ is orthogonal to $M$. The system of linear equations determining the $b_j$'s can be uniquely solved if and only if the covariance function is strictly positive definite. In the case of semi-definiteness, the sequence $\{x_t\}$ is singular insofar as it is proportional to one single r.v. $x_0$, as we may say (Theorem 1) [compare Doob [34], pp. 16 and 152]. The sequence is then also called non-regular or deterministic [Doob [34], p. 564]. The minimum property of $\varphi_v$ is obvious. Let $f(x_{n-N+1}, \ldots, x_n)$ be any linear function; then, because of the orthogonality of $\varphi_v$,
$$E(|x_{n+v} - f|^2) = E(|x_{n+v} - \varphi_v|^2) + E(|\varphi_v - f|^2).$$
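The orthogonality condition leads to the normal equations $\sum_j b_j R(k-j) = R(v+k)$, $k = 0, \ldots, N-1$, which can be solved numerically. The sketch below is my own illustration; the covariance $R(h) = a^{|h|}$ is an assumed example (for which the solution is known to be $b_0 = a^v$, all other $b_j = 0$).

```python
import numpy as np

# Illustration (assumed example): solve the normal equations
#   sum_j b_j R(k - j) = R(v + k),  k = 0, ..., N-1,
# for the covariance R(h) = a^{|h|}.  Here the lag-v predictor uses
# only the most recent observation: b = (a^v, 0, ..., 0).
a, v, N = 0.6, 2, 5
R = lambda h: a ** abs(h)

# Toeplitz system: M[k, j] = R(k - j), right-hand side R(v + k)
M = np.array([[R(k - j) for j in range(N)] for k in range(N)])
rhs = np.array([R(v + k) for k in range(N)])
b = np.linalg.solve(M, rhs)
print(b)  # approximately [a**v, 0, 0, 0, 0]
```

Since $|a| < 1$ the Toeplitz matrix is strictly positive definite, so the system is uniquely solvable, in line with the condition stated above.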
Using spectral representations (in other words, congruences of Hilbert spaces), the prediction problem is identical with the determination of a function
$$\psi(\lambda) = \sum_{m=n-N+1}^{n} \gamma_m e^{2\pi i m\lambda}$$
(polynomial or series in $e^{2\pi i\lambda}$) such that
$$\int_{-1/2}^{1/2} \big|e^{2\pi i(n+v)\lambda} - \psi(\lambda)\big|^2\, dF(\lambda) = \min.$$
Clearly, here in the integrand the unimodular factor $e^{2\pi i n\lambda}$ can be canceled, indicating that the solution $\gamma_m, \gamma_{m-1}, \ldots$ is independent of $n$. We hence have to approximate $e^{2\pi i v\lambda}$ by $\sum_{m \ge 0} b_m e^{-2\pi i m\lambda}$ in $F$-measure.
The equality
$$\sigma_{vN}^2 = E\Big(\big|x_v - \sum_{j=0}^{N-1} b_j x_{-j}\big|^2\Big) = \int_{-1/2}^{1/2} \Big|e^{2\pi i v\lambda} - \sum_{j=0}^{N-1} b_j e^{-2\pi i j\lambda}\Big|^2\, dF(\lambda)$$
exhibits the complete equivalence of the two formulations. One may write
$$\varphi_v = \int_{-1/2}^{1/2} \hat{\varphi}_v(\lambda)\, dZ(\lambda),$$
where $\hat{\varphi}_v(\lambda)$ is called the prediction function for lag $v$. Worth noting is the analogy with polynomial approximation for a certain weighting function.
Kolmogoroff (2) proved the following interesting formula [Hannan [32], p. 21; see also Szegő (1)]:
$$\sigma_1^2 = \min_N \sigma_{1N}^2 = \exp\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} \log 2\pi f(\lambda)\, d\lambda\Big).$$
The right hand side shall equal 0 if the integral equals $-\infty$. If $\sigma_1^2 > 0$, the sequence is called regular.
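Kolmogoroff's formula can be verified numerically. The following is my own illustration (not from the text): for the moving average $x_n = \xi_n + c\,\xi_{n-1}$ with orthonormal $\xi_n$ and $|c| < 1$, the one-step prediction error is $\operatorname{Var}(\xi_n) = 1$, and the formula reproduces this value.

```python
import numpy as np

# Numerical check (assumed example) of
#   sigma_1^2 = exp( (1/2pi) * Int_{-pi}^{pi} log(2 pi f(lam)) dlam )
# for x_n = xi_n + c*xi_{n-1}, which has spectral density
#   f(lam) = |1 + c e^{-i lam}|^2 / (2 pi),
# so that log(2 pi f) = log |1 + c e^{-i lam}|^2 and sigma_1^2 = 1.
c = 0.4
lam = np.linspace(-np.pi, np.pi, 200_001)
integrand = np.log(np.abs(1 + c * np.exp(-1j * lam)) ** 2)
# the mean over a uniform grid approximates (1/2pi) times the integral
sigma1_sq = np.exp(integrand.mean())
print(sigma1_sq)  # close to 1.0
```

The integral vanishes because $\log|1 + cz|$ is harmonic in $|z| < 1$ and has the value $0$ at $z = 0$, which is exactly the mean-value argument underlying the Szegő theory.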
18. Solutions to Some Specific Prediction Problems
Here we mention a few results obtained for specific stationary sequences [Doob [34], p. 569 ff.].
Theorem: If $\{\xi_n,\ -\infty < n < \infty\}$ is an orthonormal sequence of r.v.'s, if $\sum_0^\infty |c_n|^2 < \infty$, $c_0 \neq 0$, and $x_n$ is a moving average
$$x_n = \sum_{j=0}^{\infty} c_j \xi_{n-j},$$
then the $\{x_n\}$ sequence is regular, and the mean square prediction error $\sigma_v^2 = \min_N \sigma_{vN}^2$ for lag $v$ satisfies the inequality
$$(18.1)\qquad \sigma_v^2 \ge \sum_{j=0}^{v-1} |c_j|^2.$$
This even holds true for any $\{x_n\}$ sequence whose absolutely continuous part has density $F'(\lambda) = \big|\sum_n c_n e^{-2\pi i n\lambda}\big|^2$ a.e. (Lebesgue measure), with $c_0 \neq 0$, $\sum_n |c_n|^2 < \infty$. There cannot be equality in (18.1) for any $v$ if $\sum_0^\infty c_n z^n$ has zeros in $|z| < 1$. The form of the predicting r.v. $\varphi_v$ is determined by the
Theorem: Let
$$x_n = \sum_{j=0}^{\infty} c_j \xi_{n-j} + v_n = u_n + v_n$$
be the decomposition into non-singular and singular parts (Section 6). Then we have for $\varphi_v$ approximating $x_{n+v}$
$$\varphi_v = \sum_{j=v}^{\infty} c_j \xi_{n+v-j} + v_{n+v},$$
and the prediction error is
$$\sigma_v^2 = \sum_{j=0}^{v-1} |c_j|^2.$$
Another formulation is given by the
Theorem: A sequence is regular if and only if $F'(\lambda) > 0$ a.e. (Lebesgue measure), and
$$\int_{-1/2}^{1/2} \log F'(\lambda)\, d\lambda > -\infty.$$
In the regular case, the constants $c_n$ in the above theorems are uniquely determined by
$$F'(\lambda) = \Big|\sum_{n=0}^{\infty} c_n e^{-2\pi i n\lambda}\Big|^2 \text{ a.e.}, \qquad \sum_0^\infty |c_n|^2 < \infty, \qquad \sum_0^\infty c_n z^n \neq 0 \text{ for } |z| < 1,$$
and $c_0$ can be calculated from
$$c_0 = \exp\Big(\frac{1}{2}\int_{-1/2}^{1/2} \log F'(\lambda)\, d\lambda\Big).$$
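The formula for $c_0$ can also be checked numerically. The example below is my own (the choice $F'(\lambda) = |1 + 0.5\,e^{-2\pi i\lambda}|^2$ is assumed); here $c_0 = 1$, $c_1 = 0.5$, and $1 + 0.5z$ has no zero in $|z| < 1$, as the theorem requires.

```python
import numpy as np

# Sketch (assumed example) checking
#   c_0 = exp( (1/2) * Int_{-1/2}^{1/2} log F'(lam) dlam )
# for F'(lam) = |1 + 0.5 e^{-2 pi i lam}|^2, where c_0 = 1.
lam = np.linspace(-0.5, 0.5, 100_001)
Fprime = np.abs(1 + 0.5 * np.exp(-2j * np.pi * lam)) ** 2
# mean over the unit-length interval approximates the integral
c0 = np.exp(0.5 * np.log(Fprime).mean())
print(c0)  # close to 1.0
```

This is the same mean-value argument as in Kolmogoroff's formula, now normalized to the frequency interval $[-1/2, 1/2]$.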
In the following, some examples of predictors are given.
1) Let the $x_n$ be mutually orthogonal. Then $\hat{\varphi}_v(\lambda) \equiv 0$ a.e., and also $\varphi_v \equiv 0$. Hence any predicted observation taken from the past would equal 0 with probability 1. (However, this probability would be less if from the "past" r.v.'s only finitely many observations are given.)
2) Let $F(\lambda)$ be absolutely continuous, and let the $x_n$ satisfy the stochastic difference equation
$$B_\beta x_n + B_{\beta-1} x_{n-1} + \cdots + B_0 x_{n-\beta} = \xi_n, \qquad B_0 B_\beta \neq 0, \quad E(\xi_i \bar{\xi}_j) = \delta_{ij}.$$
Without loss of generality all the roots of $\sum_j B_j z^j$ can be assumed to be in modulus $< 1$, and then
$$\varphi_1 = -\frac{1}{B_\beta}\big(B_{\beta-1} x_n + \cdots + B_0 x_{n-\beta+1}\big)$$
is the predictor for lag 1, with error $\sigma_1^2 = |B_\beta|^{-2}$. For higher lag an iteration can be done. Thus
$$\varphi_2 = -\frac{1}{B_\beta}\big(B_{\beta-1}\varphi_1 + B_{\beta-2} x_n + \cdots + B_0 x_{n-\beta+2}\big),$$
with error $\sigma_2^2 = \big(1 + |B_{\beta-1}|^2 |B_\beta|^{-2}\big)|B_\beta|^{-2}$.
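The lag-1 predictor and its error can be confirmed by simulation. The Monte Carlo sketch below is my own (parameter values `B1 = 2.0`, `B0 = -1.0` are assumptions); it treats the $\beta = 1$ case, where the predictor is $\varphi_1 = -(B_0/B_1)\,x_n$ and the mean square error should be $1/|B_1|^2$.

```python
import numpy as np

# Monte Carlo sketch (assumed example) for the beta = 1 difference equation
#   B_1 x_n + B_0 x_{n-1} = xi_n,  xi_n orthonormal.
# Predictor: phi_1 = -(B_0/B_1) x_n;  mean square error: 1/|B_1|^2.
rng = np.random.default_rng(1)
B1, B0 = 2.0, -1.0
c = -B0 / B1                 # = 0.5; |c| < 1, so the recursion is stable
n = 100_000
xi = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = c * x[t - 1] + xi[t] / B1   # solve the equation for x_t
pred = c * x[:-1]                       # phi_1 evaluated at each step
mse = np.mean((x[1:] - pred) ** 2)
print(mse)  # close to 1/B1**2 = 0.25
```

By construction the one-step prediction residual is exactly $\xi_{n+1}/B_1$, so the empirical mean square error estimates $1/|B_1|^2$.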
3) For $\beta = 1$ (Markov chain) one has
$$\varphi_v = c^v x_n, \qquad R(n) = c^n R(0), \quad n > 0, \qquad c = -\frac{B_0}{B_1}.$$
4) Let
$$F'(\lambda) = \Big|\sum_{j=0}^{\alpha} A_j z^j\Big|^2\, \Big|\sum_{j=0}^{\beta} B_j z^j\Big|^{-2}, \qquad z = e^{2\pi i\lambda}, \quad A_0 B_0 A_\alpha B_\beta \neq 0,$$
which can be written as $\big|\sum_0^\infty c_j e^{-2\pi i j\lambda}\big|^2$ with $\sum_{j=0}^{\infty} c_j e^{-2\pi i j\lambda} \neq 0$. Applying the above theorem, one has
$$\hat{\varphi}_v(\lambda) = e^{2\pi i v\lambda}\, \frac{\sum_{j=v}^{\infty} c_j e^{-2\pi i j\lambda}}{\sum_{j=0}^{\infty} c_j e^{-2\pi i j\lambda}}.$$
BIBLIOGRAPHY

[1] N. I. Akhiezer (1), Theory of Approximation, Fred. Ungar Publ. Co., New York (1956).

[2] N. I. Akhiezer and I. M. Glasmann, Theorie der linearen Operatoren im Hilbert-Raum, Akademie-Verlag, Berlin (1954).

[3] P. S. Aleksandrov, Einführung in die Mengenlehre und die Theorie der reellen Funktionen, Deut. Verlag der Wissenschaften, Berlin (1956).

[4] T. W. Anderson, Notes on Stochastic Difference Equations (lecture notes), New York (1949).

[5] A. V. Balakrishnan (1), "On a characterization of covariances," Ann. Math. Stat., Vol. 30 (1959), pp. 670-675.

[6] Salomon Bochner, Harmonic Analysis and the Theory of Probability, University of California Press, Berkeley (1955).

[7] Salomon Bochner and Tatsuo Kawata, "A limit theorem for the periodogram," Ann. Math. Stat., Vol. 29 (1958), pp. 1198-1208.

[8] D. Dugué, Arithmétique des lois de probabilités, Gauthier-Villars, Paris (1957).

[9] F. Eicker (1), "Central limit theorem and consistency in linear regression"; (2) "Central limit theorem for sums over sets of random variables," Institute of Statistics Mimeograph Series No. 271 and 279, Chapel Hill (1960).

[10] F. Eicker (3), "A note on linear regressions with weakly stationary errors having unknown means," Technical Report, Stanford University (1961).

[21] K. R. Parthasarathy (1), "On the estimation of the spectrum of a stationary stochastic process," Ann. Math. Stat., Vol. 31 (1960), pp. 568-574.

[22] E. Parzen (1), "On asymptotically efficient consistent estimates of the spectral density function of a stationary time series,"

[31] Antoni Zygmund (1), Trigonometric Series, Vol. II, Cambridge University Press, Cambridge (1959).

[32] E. J. Hannan, Time Series Analysis, Methuen Monographs, London (1960).

[33] Nils Donald Ylvisaker, "A Generalization of a Theorem of Balakrishnan," Technical Report No. CU-38-59 (NR-042-034), Columbia University, New York (1959).

[34] J. L. Doob, Stochastic Processes, John Wiley, New York (1953).

[35] Paul R. Halmos, Measure Theory, D. van Nostrand Co., Princeton, N. J. (1950).

[36] Michel Loève, Probability Theory, 2nd edition, D. van Nostrand Co., Princeton, N. J. (1960).

[37] B. Gnedenko, Lehrbuch der Wahrscheinlichkeitsrechnung, Akademie-Verlag, Berlin (1951).

[38] F. Riesz and B. Sz.-Nagy, Functional Analysis, London (1956).

[39] H. B. Mann and A. Wald, "On the statistical treatment of linear stochastic difference equations," Econometrica, Vol. 11 (1943), pp. 173-220.

[40] Wassily Hoeffding, "The large-sample power of tests based on permutations of observations," Annals of Mathematical Statistics, Vol. 23 (1952), pp. 169-192.