Based in part on the author's Ph.D. dissenation at YaZe University.
PRoJECTION WITH THE \~RONG INNER PRODUCT AND ITS APPLICATIexi
TO REGRESSIOO WIlli lDRRELATED ERRoRs AND LINEAR FILTERING OF
TUE SERIES *
by
William S. Cleveland
Department of Statistics
University of North CaroZina at Chapel, BiZZ
Institute of Statistics Mimeo Series No. 673
MARCH 1970
Projection with the Wrong Inner Product and its Application to
Regression with Correlated Errors and Linear Filtering of Time Series *
by
William S. Cleveland
Department of Statistics
University of North Carolina at Chapel Hill
1.
INTRODUCTION
In many places in statistics one wants to calculate the orthogonal projection Px
of some vector
x
on a subspace
P.
Oftentimes the inner
product function is specified by the unknown covariances
random variables.
The usual procedure is to estimate
proximate
P*x..
that is.
Px by
x
C of a set of
C by
C*
and ap-
the orthogonal porjection with respect to
is projected on
C*;
P using a wrong inner product. There is.
therefore. interest in knowing when
p*
will be a good approximation of
P.
In Section 2. the question of calculating orthogonal projections with
the wrong inner product in a general Hilbert space is investigated.
The
results are then applied to the problem of regression with correlated errors
in Section 3 and to linear filtering operations on multi-channel, widesense stationary, stochastic processes in Section 4.
2.
PROJECTION WITH THE WRONG INNER PRODUCT IN A GENERAL HILBERT SPACE
Let
H be a Hilbert space with inner product
11·11 • (.,. )%.
*
Let
[.,.],
( • , .)
and norm
which will be thought of as the wrong inner
Based in part on the author's Ph.D. dissertation at YaZe University.
2
product, be a bilinear functional with the following properties:
(1)
[ • , •]
H
H,
whose closure is
the linear functional
(2)
If
zn
and
z
= [y,x],
[x,y]
[x,,]
is a
n
[ •, •]
and for fixed
x
V is bounded.
on
is a sequence of vectors in
[z -z , z -z 1 + 0
n m n m
H such that zn
z
+
Cauchy sequence, that is
n,m + m,
as
In (1), it has not been assumed that
tional.
V is a linear subset of
'Ox'O where
is defined on
z~'O.
then
[ ',' ]
is a bounded bilinear func-
Assumption (2) has been made to ensure that
[.,.]
is defined
everywhere it is possible to do so and maintain the properties in (1).
x~'O,
From (1), for fixed
[x,·]
the definition of the linear functional
H.
may be extended boundedly to all of
y~H
Theorem (Halmos, 1957, p.3l) there exists
Let
B be the mapping defined by
Bx. y.
and
B is bounded if and only if
[ " •]
Let
ator onto
that
P.
Let
[x,y]. 0
x • p+q
with
p• • Po'O
for all
peP'
if and only if
[.,.]
B is linear and self-adj oint,
Let
qf;Q.*.
This decomposition of
is positive definite on
x
p·xP·.
zep·
(p+z)+(q-z)
with
which implies
for all
yeP'.
Thus
so that the decomposition is not unique.
positive definite and
o •
[z,y]. 0
x . p '+q'
with
[p'+q'-p-q, p'+q'-p-q]
p'. p
and
q ' . q.
I
•
p' eP*
and
will be unique
For, suppose
Since the Cauchy-Schwartz inequality holds for
(Helmberg, 1969, p.10),
such
'0* be the set of all xe'O such that
is not positive definite, that is there is a
[z,z] • O.
x~'O
be the set of all
y~P*.
and
[x,·]· (y,.) •.
the orthogonal projection oper-
0..*
and let
such that
is a bounded bilinear functional.
H and P
P be a subspace of
From the Riesz Representation
z '" 0
[.,.]
such that
[0,0]
zeQ.*
Suppose
and
[.,.]
q 'eQ.* 0 Then
[p'-p, p'-p] + [q'-q, q'-q]
x =
is
3
The operator
P*,
orthogonal projection with respect to
be defined in the following manner.
such that
sequence in
[y,y] •
o.
Let
Clearly N is a linear space.
N and let z be the limit of zn.
Let
Let
From (2),
P*X
x· p+q.
Then define
P*x
to be
be a Cauchy
zeD
N.
Let
p-Np.
p+N or even all of them.
to be any vector in
and
N is a sub-
Thus
in the special cases of Sections 3 and 4, it is seen that
p-Np"
xeD'
We could
However,
the vec-
p+N with the smallest norm, is a natural assignment.
tor in
THEOREM:
(3)
Let
xeV.
Then
xeD'
if and only if there exists a
inf{[ x-y , x-y]: ye p.} ,
PROOF:
n
N be the orthogonal projection operator onto
have the decomposition
have taken
Z
will
y~V
N be the set of vectors
[z,z] • (Bz,z) • limn-+oo (Bz,Zn) • limn+"D (z,BZn ) • O.
space.
[ •, •] ,
(i.e.,
pep·
in which case
p*
is defined for
such that
[x-p,x-p]·
P*x· p-Np.
The proof proceeds exactly the same as the analogous theorem for
orthogonal projections.
THEOREM:
(4)
If
is positive, then
PRooF:
Let
xeD
V. 0*.
and let
We may proceed exactly as in (Halmos,1957, p.23, Theorem 1) to show that
there is a
[• , •)
Then
Cauchy sequence
z~P',
yep·
n
[x-z ,z-z ] ~ 1;,
n
n
wi th
and
[x-y ,x-y ]
n
Zn
n
is a
~ 1;.
[ •, •]
Let
Cauchy
x)
4
sequence.
z
n
is
(2) ,
[zn -zm,zn -zm]
From the hypothesis,
Cauchy.
zeP*.
Let
Therefore, there exists
vn • z-zn
lim
o • n,m+oo
•
~
lim
e(zn -zm,zn -z)
m
zd' with
Z
n
so that
-to
then
[v -v , v -v ]
n m
n m
(Bv ,v ) - lim 11m (Bv ,v )
n n
n-+oo
m+oo n-+oo
m n
- lim lim (Bv ,v ) + lim (Bv ,v )
n+oo m+<o
n m
m+<o
m m
•
which proves that
o =
lim [v ,v ] + lim [v ,v],
n+Go
n n
[z-zn , z-zn ]
-to
m m
1Jt+<Xl
O.
Thus
lim [z-z ,z-z ]
n
n
Q'+ClG
•
[z,z] - 11m (Bz,z ) - lim (z ,Bz) + lim [zn,zn l
•
[z,z] - lim [z ,z ] •
n+<»
n
n+Go
n
n+oo
n+c»
n
n
This last result together with the fact that
lim [z ,x]
n+Go
n
•
•
lim (Bz ,x)
n+c»
n
lim (z ,Bx)
n+Go
•
(z,Bx)
=
[z,x]
n
proves
I
I
z; •
lim [z -x,z -xl
n+Go
n
n
•
[z-x,z-x] •
z. From
5
Thus from (3),
x«V*.
THEOREM:
(5)
V·. V.
Suppose
Then p .. P*
on V if and only
if BjW. l'
PRooF:
Suppose
(x,By)
Since
P
(Px,By)
-
and
p . P*
on V.
(Bx,y)
•
(. ,By)
l'
onal complement of
[x,y]
•
in
V is dense in H,
Py -+Py • y.
xe:P',
Bx. O.
(BPx,y)
H,
which implies
and
Bye:pj
there is a sequence
Ynef) with
so that
1". p.
Then
[x,y)· 0
for all
y£p.
n
Thus
hence
B is one-to-one on
.. Piry ep·
n
B is one-to-one on
1".
and
B is a one-to-one self-adjoint transformation on
p' •
To show
Let
Yn-+yj
yeP.
thus
P*x· O.
p' = l'
The facts that
gether with (He3.mberg, 1969, Theorem 6, p.12l), imply
p'
with
Bp'cp
to-
BP·. p.
p. Let x£V then for all y£P', (x,By) .. (Bx,y) ..
Bpi.
[x,y] • [P*x,y] .. (BP*x,y) • (P*x,By).
ze:P,
(x,By)·
Bp·cp.
which implies
P*X .. Px .. x.
zEBp·.
(Px,By) •
Suppose that for some
But
Suppose
•
V is dense in H,
Py
But
n
•
In particular it holds for all vectors in the orthog-
equality, we fire t prove ",... = l'
Since
xeV and Y«p·
[Px,y]
are continuous and
xeH.
for all
For all
Thus
(x,z)
= (P*x,z)
for all
From the hypothesis, this last equality may be extended to all
which implies
P*x .. Px.
The following inequality is due to Kantorovich (1948, p.142).
(6)
lEMMA:
Let
.•• ,n.
Then
n
r
i-l
0 < a
l
S
a
2
••• s an <
n
ciai
L cia~l
i-l
s
co
and
ci~
for
i · 1,
6
and equality holds fOT the proper choice of
THEOREM:
(7)
Let
xd'- •
a
Let
•
inf{rZtZ]
(z.z)
a2 •
zPZ
sup{r(z,z)]
l
c i
and
Then
,
where the right side of the inequality is assigned the value
if either
00
a
1
• 0
or
a
•
2
The inequality is best in
00.
the sense that the left side can be made arbitrarily close to
P and x.
the right by the proper choice of
PRooF:
Ql>O
If
and
a
a
2
l
=- 0 or a 2 •
<00.
00
the inequality is clearly true.
V·
From (2) and (4).
III
V
III
H.
Let
8
1
Thus assume
• x-Px
and
z2 • P*zl.
Let
R be the orthogonal projection operator onto the space
spanned by
z2'
and
to
Then
[0 •• ].
R*
the orthogonal projection operator with respect
zl-R*zl- x-P*x
loss of generality in assuming
to
p.
Let
S
and
Ilxll· 1.
Let
y
be the space spanned by
([x.y]/[y.y])y
and
zl-Rzl- x-Px.
P is one-dimensional.
be a basis for
x
and
y.
•
~
Ilx-Pxll 2
1
•
x
P, where
Then
Px - 0
and
a
Thus there is no
+ J rXpIU:
[y,y]2
is orthogonal
Ilyli. 1.
and
P*x
=-
7
Since the quadratic forms
Ql(v). (v,v)
and
Q2(v). [v,v]
for
are nonsingular with respect to each other, we may coordinatize S
that if
v
corresponds to the two-tuple
vl'v
2
vt::S
so
then
and
where
a
• min{Q1{v)/Q2(v): vt::S}
l
replaced by
max.
Let
x
a
Since
•
II xii • Ilyll • 1
and
a
2
correspond to
1+
and
is defined similarly with
xl' x
2
and
y
to
y1 ,y2
min
then
81xlY1+a2x2Y2 .J2
2
2"
[ allY11 +8 2 IY2'
(x,y). 0,
we have
IxII • /Y2/
which im-
plies
a ..
Letting
a j • 8j
and
c j • BjIYj/2/{SlIYlI2+B2/Y212)
we may apply (6)
with the result that
(8)
The inequality of the theorem now follows from this and the fact that
a1~a1
and
82Sa2 "
It will now be shown that the inequality cannot be improved.
still assuming
al>O
assume
For
a <a "
1 2
and
£>0,
a <oo.)
2
let
For
u,v€H
a
1
• a
2
it is obvious.
be such that
(We are
Thus
8
[u,u]
(u,u)
<
[v,v]
(v,v)
>
0.
1
+
£
and
Clearly
u,v
0. 2
are linearly independent for
be the space spanned by
u
and
v.
Then
- &.
£
small enough.
Bl<o. +£
l
and
B >o. -&.
2 2
from (6) equality holds in (8) for the proper choice of
shows the inequality cannot be improved for the case
If
0.
2
=~
[.,.]
is positive definite on
0.
1
Now let S
Y1' Y2'
>0
OxO and either
and
0.
1
• 0
But
which
o.2<~.
or
then by a proof analogous to that in the previous paragraph it may
be shown that the inequality cannot be improved.
Suppose
xeO
space spanned by
is such that
x
then
Px· x
[x,x]. 0
and
and
P*x. O.
x
~
O.
Let
P be the
Thus the inequality can-
not be improved in this case.
CoROllARY:
(9)
PRooF:
The first inequality follows easily from (7) and the equality
II x-P*x 11 2 • II x-Px 11 2 + II Px-P*x 11 2 •
2
2
that II x_Px11 s II x_p*xII •
3.
Suppose
mean
The second follows from the fact
REGRESSION WITH CORRELATED ERRORS
x· xl, ••• ,x
n
has a multivariate normal distribution with
m and nonsingualr covariance matrix
C.
Suppose m lies in
P,
9
a subspace of Euclidean n-space
where yEE,
f.
The norm
1
(y,y)%. Ilyll • (yc" 1)lf,
is oftentimes referred to as the Mahalanobis distance.
It
provides a measure of the distance of two normal populations with the same
covariance matrix C,
in the sense that the closer the means of the two
populations under this norm, the more difficult it is to discriminate them.
(Rao, 1965, Section Se.)
A
Thus if
m and
m*
are estimates of
a measure of their proximity
A
In fact, if
is
m,
m denotes the posterior mean of m under the
assumption that the prior on m in uniform, then
which minimizes
II x-p II
as
p
ranges over P•
Markov estimate even if normality is not assumed.)
P
is the orthogonal projection operator onto
Suppose, however, that
C*
C is not known but
convenient choice, as in least squares estimation.
C replaced by
and m is estimated by m*. P*x,
operator on
P with respect to
where
x
and
a bound for
P.
Let
a
l
and a
That is,
C*
is available where
C or perhaps just a
Now suppose
That is,
p*
A
m • Px where
m is
[y,y]. y(c*)-ly'
is the orthogonal projection
[.,.].
In the notation of Section 2, let
by
C*.
(~ is also the Gauss-
P.
is a nonsingular approximation or estimate of
estimated as above with
~ is the vector in P
H be the subspace of
be defined as in (7).
2
Ilx-m*ll/llx-~11 and (9) gives a bound for
A
Mahalanobis distance between m and
E spanned
Then (7) provides
II~m*ll, the
(Compare this with (Watson,
m*.
1967, p .1685) .)
Instead of choosing H as above we can take H· f.
results in less sharp bounds for a particular
now hold whatever the choice of
x
and
P.
x
This, in general,
and P, but bounds which
From (7), these bounds are in
10
fact best among all botmds which do not depend on
B of Section 2 is
(C*)
-1
C and
eigenvalues of this operator.
(4) and (5),
m*. ~
a
l
Clearly
if and only if
x
and
l'.
The operator
and a
are the minimum and maximum
2
(C*)-lCl'. (c*)-lCl' so that from
(C*) -lCl' • l',
a result first proved
by Kruska1 (1968).
4.
LINEAR FILTERING OPERATIONS
The calculation of projections, in particular linear predictions,
interpolations, and signal extractions, in the space spanned by a multichannel, wide-sense stationary time series
X
t
has received great attention
since the initial works of Wiener (1949) and Kolmogorov (1941).
The theory
is based on the assumption that the spectral distribution matrix of
is known.
X
t
But it is not really known in practice, and the traditional
remedy is to estimate the matrix and then calculate the projections as
though the estimate were the true matrix•.
Suppose
rxr
X
t
is an r-channel process.
spectral distribution matrix of
X ,
t
continuous parameter process and OSAsl
F*(A) • [F*jk(A)]
be an estimate of
continuous with respect to
Let
F(A). [F
where
-<A<1lll
jk
(A)]
if
X
t
be the
is a
if it is discrete parameter.
F(A).
Assume
I;.lF*jj
Let
is absolutely
Ij.1Fjj •
Any projection in the Hilbert space spanned by the process can be
L (F) ,
described as a projection in
YeA) • vl(A), ••• ,vr(A)
2
the Hilbert space of vector ftmctions
such that
r
IIv11
2
•
I
JVj(A) Vk(A)dFjk(A) < CD,
j,k-l
·
'
11
where the range of integration is
.- to co if X is continuous paramt
to 1 if it is discrete (Rosanov, 1967, p.28). Let m be
eter and 0
any measure such that r;.lFjj is absolutely continuous with respect to
dF
m. Let f(A)· dm{A)
and f*(A)· dF*
dm(A). Then
IIvl1
2
•
f vO) f(A)v'(A)dm(A)
,
(
where v'(A)
denotes the transpose of veAl.
H· L2 (F) and define
To apply the results of Section 2, we let
[.,0] by
[u,v]
•
f u(A)f*(A)v'(A)dm(A)
,
so that V is the set of v€L 2 (F) with [v,v)<co. Using F* in place of
F to calculate a projection is equivalent to using [.,"} in place of
(",.).
It is easily seen that assumptions (1) and (2) hold, and the op-
erator B maps
v to u, where
u(A)
•
u at the point A is
f~*(A)V' (A)J
l v(A)f(A)v' (A)
veAl •
.
With these definitions, the results of Section 2 may now be applied.
The values
f*
a 1 and a 2 in (7) can be written in terms of
as shown by the following theorem.
(10)
THEOREM:
f
and
Let
a complex r-tuP1e}
for all
max.
A.
Define A2(A) similarly with min replaced by
Then A1 and A2 are m-measurab1e. Let
·.
12
t
l
• essAinf A ().)
l
and
If
sup
fO.)
~2·
are with respect to
is nonsingular,
f*(A)f-l(A)
PROOF:
and
and
A (A)
2
essAsup A ().),
2
m.
where
a l • l;l
Then
e8S
tnf
a 2 • l;2.
and
is the minimum eigenvalue of
A (A)
l
the maximum.
Since the complex r-tuples with rational real and imaginary parts
(cf(A)c')~'
are dense in the space of r-tuples with norm
the definition of
A (A)
l
may be taken over such rational
the minimum in
c.
Thus
A
l
is m-measurable since it is the minimum of a countable number of m-measurable functions.
Similarly,
Since for each
respect to
f(A),
A
2
A (A)
2
A,
is m-measurable.
is the maximum eigenvalue of
there eRists an eigenvector
f*(A)
with
e(A)· el(A), ••• ,er(A)
such that
e(A)f*(A)
•
A (A)e(A)f(A).
2
It. is easy to specify a routine for choosing
m-measurable.
Also
e(A)
e(A)
may be chosen so that
to ensure that
lej(A) 1 < 1,
e
is
since a
constant times an eigenvector is also an eigenvector.
Now if
v£1J,
[v,v]
•
f
veAl f*(A) V'(A) dm(A)
· J ~*(A)v'(A)
veAl f(A) V'(A) dm(A)
v(A)f(A)v' (A)
That
'.;2 = a
that
Ilv
'.;2 •
2
will be proved by exhibiting a sequence
n II = 1 and
00
then
[v,v] ... '.;2.
n n
Define the set
v €L (F)
n 2
S
such
n as follows:
if
13
if
C <CD
2
then
There exists an m-measurab1e subset
such that Tn
T
n
of
S with positive m measure
n
is contained in a bounded interval.
Let
Tn
also denote
T and define v (~) • liT ell-~ (~)e(A).
n
n
n
n
are m-measurab1e, lej(~)t < 1, and Tn lies
the indicator function of the set
v £L (F)
since
n 2
e
and T
n
in a bounded interval.
Now
so that
ess inf T (~)
A£T
n
n
a
2
(~)
s
[v ,v) s ess sup T (~)
n nAn
Since the left side tends to
A similar proof shows
C2 '
so does
a2(~)
s
C2 •
[vn'vn ).
a
• C
r The final statement of the theorem
1
is a well-known fact (Rao, 1965, p.15).
A special case of F and F* deserves comment because of its frequent
occurrence in practice.
Suppose that
X
t
is a discrete parameter process,
m is Lebesgue measure, and
for all
A where
constants.
I
is the
rxr
identity matrix and Pl,P2
A common way of obtaining
order autoregression
f*
is to assume
X
t
are positive
is a k-th
.
14
where the
kxk
matrices
are such that the roots of the polynomial in
A
j
Z
clet[Ao
k
+ Alz + ... + J\z ]
lie outside the unit circle, and W
t
E WtW~
• I
1 2
for
tl
=t2
and
(cf. Whittle, 1953, p.125).
AS, •• •,~
and
f*(A)
expression for
f(A).
0
is r-channel white noise; that is,
for
tl
+t 2•
The parameters
is formed by replacing
For such a model
AO, ••• ,J\ are estimated by
Aj
by
A;
in the above
The adequacy of the autoregressive model is then checked by calculating
the fitted residuals
and checking them for the white noise assumption.
If
f*
is to be used
for calculating projections then from (7) and (10) the model is adequate
if
a
e
..
..
is nearly
1.
Let
matrix of
w~
with respect to Lebesgue measure.
that if
~l (A)
h(A)
be the derivative of the spectral distribution
is the minimum eigenvalue of
heAl
It will be shown below
and
~2(A)
the maximum,
15
then
~l 0)
• ll;l(,,)
~l • ess"inf ~l(A)
-1
+2 • '=1
~2 (")
and
a
Thus letting
~2· ess"sup ~2(")
and
we have
-1
~l· ~2
and
so that
a
Thus
l
• lli (,,).
•
can be estimated by calculating the fitted residuals and estimating
~l(")
successively h("),
To see
~l(")·
~2(")'
and
-1
A2 ("),
4>1
~2
and
and finally
a.
let
It is easily seen that
=
h(")
Now
~il(,,)
A*(")f(")A*'(-A).
is the maximum eigenvalue of
which has the same maximum eigenvalue as
which from (10) is
112 (") •
A similar argument shows
As in Section 3, we can define
P
rather than all of
L (F).
a
bounds, but bounds which now depend on
x
easily expressed in terms of
f
and
-1
• III 0).
H to be the space spanned by
The new
2
~2 ("-)
f*.
l
and
and
a
P
2
x
and
will give better
and which might not be
,.,
.
.".
16
ACKNOWLEDGMENT
I would like to thank Leonard J. Savage for several helpful suggestions.
REFEBENCES
1lALMOS, P.R. (1957)
BiZbezrt Space.
HELMBERG, GILBERT (1969)
New York.
Chelsea, New York.
Spectml, 1!heozty in Bitbe'Pt Space.
Wiley,
lCANTOROVICH, L.W. (1948)
Functional analysis and applied mathematics
(Russian) • Uspehi Mat. HaUk... 3(6). 89-~89. English translation
issued by National Bureau of Standards, Los Angeles, 1952.
KOLMOGOROV, A.N. (1941)
Interpolation and extrapolation of stationary
random sequences (Russian). IS1J. Akad• •auk. SSBR. Seza. Mat. 9,
3-14.
lCRDSKAL, W. (1968)
When are Gauss-Markov and least squares estimators
identical? A coordinate-free approach. Ann. Math. Stat ... 39,
70-75.
MO,e.R. (1965)
WArS~A
Uneaza Statistical, Inferenos. Wiley, New York.
G.S. (1967)
Linear least squares regression,
Ann. Math. stat ...
.)0, 1679-1699.
WHIttLE, P. (1953)
J.R.S.S... (B) ..
The analysis of multiple stationary time series.
125-139.
15,
E:x:tmpolation" Inte7'pOZation.. and Smoothing of stationary Time Senes. M. I.T. Press, Cambridge.
WIENER, N. (1949)
© Copyright 2026 Paperzz