S1 Proof of the main theorem.

Statistica Sinica: Supplement
KERNEL ADDITIVE SLICED INVERSE REGRESSION
Heng Lian and Qin Wang
University of New South Wales and Virginia Commonwealth University
Supplementary Material
S1
Proof of the main theorem.
In the proofs, C denotes a generic positive constant. We first note that EX ∗ [(fˆ(X ∗ )−f (X ∗ ))2 ] =
kΣ1/2 (fˆ − f )k2H . From (2.3), Σ1/2 fˆ satisfies the eigenvalue equation
Σ1/2 (Σn + sI)−1 Γn Σ−1/2 (Σ1/2 fˆ) = λΣ1/2 fˆ.
Similarly, Σ1/2 f satisfies the eigenvalue equation
Σ1/2 Σ−1 ΓΣ−1/2 (Σ1/2 f ) = λΣ1/2 f.
Using the perturbation theory for operators, for example as in Chapter 4 of Kato (1995), we only
need to show that kΣ1/2 (Σn + sI)−1 Γn Σ−1/2 − Σ1/2 Σ−1 ΓΣ−1/2 k2 = Op (cn n−d/(d+1) ), where
kAk denotes the operator norm for an operator A defined on HK .
We write
kΣ1/2 (Σn + sI)−1 Γn Σ−1/2 − Σ1/2 Σ−1 ΓΣ−1/2 k
=
kΣ1/2 ((Σn + sI)−1 Γn − (Σ + sI)−1 Γ + (Σ + sI)−1 Γ − Σ−1 Γ)Σ−1/2 k
=
kΣ1/2 ((Σ + sI)−1 (Γn − Γ) + (Σ + sI)−1 (Σ − Σn )(Σn + sI)−1 Γn
+(Σ + sI)−1 Γ − Σ−1 Γ)Σ−1/2 k
≤
kΣ1/2 (Σ + sI)−1 (Γn − Γ)Σ−1/2 k + kΣ1/2 (Σ + sI)−1 (Σ − Σn )(Σn + sI)−1 Γn Σ−1/2 k
+kΣ1/2 ((Σ + sI)−1 Γ − Σ−1 Γ)Σ−1/2 k
=:
(I) + (II) + (III).
(S1.1)
To simplify the proofs and notations a little bit, we assume K(., X) = 0 in the following, since
using kK(., X)kH = Op (n−1/2 ), such terms does not lead to extra difficulties in the proof. By
the same reason, we also replace p̂h by ph = P (Y = yh ) in the following expression of Γn .
Obviously we can rewrite Γn and Γ as
Γn
=
H
X
1
ph
h=1
Γ
=
!
n
1X
K(., Xi )I{Yi = yh } ⊗
n i=1
!
n
1X
K(., Xi )I{Yi = yh } ,
n i=1
H
X
1
E[(K(., X)I{Y = yh })] ⊗ E[K(., X)I{Y = yh }].
ph
h=1
(S1.2)
2
HENG LIAN AND QIN WANG
The term (III) is easy to deal with as follows. Using
kΣ1/2 ((Σ + sI)−1 − Σ−1 )(E[K(., X)I{Y = yh }] ⊗ E[K(., X)I{Y = yh }])Σ−1/2 k2
=
ksΣ1/2 (Σ + sI)−1 (Σ−1 E[K(., X)I{Y = yh }]) ⊗ (Σ−1/2 E[K(., X)I{Y = yh }])k2
=
O(ksΣ1/2 (Σ + sI)−1 k2 )
=
O(s),
we have (III)2 = O(s).
For the term (II), writing Σ(x) = K(., x) ⊗ K(., x) and using that
EkΣ1/2 (Σ + sI)−1 Σ(x)k2HS
=
Etr(Σ(x)2 (Σ + sI)−1 Σ(Σ + sI)−1 )
≤
CEtr(Σ(x)(Σ + sI)−1 Σ(Σ + sI)−1 )
=
Ctr(Σ(Σ + sI)−1 Σ(Σ + sI)−1 )
X
λ2j
C
,
(λj + s)2
j
=
where k.kHS is the Hilbert-Schmidt norm. In the inequality above we used that kΣ(x)f kH =
p
kK(., x)f (x)kH = f 2 (x)K(x, x) ≤ Ckf k∞ ≤ Ckf kH and thus kΣ(x)k ≤ C, and the inequality
tr(AB) ≤ kAktr(B).
Thus using the Markov inequality, we have
(II)2
≤
=
kΣ1/2 (Σ + sI)−1 (Σ − Σn )k2 · k(Σn + sI)−1 Γn Σ−1/2 k2
!
X
λ2j
Op
k(Σn + sI)−1 Γn Σ−1/2 k2 .
2
n(λ
j + s)
j
We will argue later that actually k(Σn + sI)−1 Γn Σ−1/2 k2 = Op (1).
The term (I) is more complicated. Let Γnh and Γh be the terms on the right hand side
P
P −1
of the sums in (S1.2) such that Γn = h p−1
h Γnh , Γ =
h ph Γh . To bridge Γnh and Γh , we
further define
Γ0nh =
!
n
1X
K(., Xi )I{Yi = yh } ⊗ E[K(., X)I{Y = yh }].
n i=1
KERNEL ADDITIVE SLICED INVERSE REGRESSION
3
Since
EkΣ1/2 (Σ + sI)−1 (K(., X)I{Y = yh } − E[K(., X)I{Y = yh }])
⊗(Σ−1/2 E[K(., X)I{Y = yh }])k2HS
≤
CEkΣ1/2 (Σ + sI)−1 (K(., X)I{Y = yh } − E[K(., X)I{Y = yh }])k2H
≤
CEkΣ1/2 (Σ + sI)−1 K(., X)I{Y = yh }k2H
CE I{Y = yj }hΣ(Σ + sI)−2 K(., X), K(., X)i2H
CE I{Y = yj }tr(Σ(Σ + sI)−2 Σ(X))
CE tr(Σ(Σ + sI)−2 Σ(X))E[I{Y = yj }|X]
=
=
=
≤
=
Ctr(Σ2 (Σ + sI)−2 )
X
λ2j
C
,
(λj + s)2
j
by Markov inequality,
kΣ
1/2
(Σ + sI)
−1
(Γ0nh
−1/2 2
− Γh )Σ
k = Op
X
j
λ2j
n(λj + s)2
!
λ2j
n(λj + s)2
!
.
Similarly one can show
kΣ1/2 (Σ + sI)−1 (Γn − Γ0nh )Σ−1/2 k2 = Op
X
j
.
These imply that
(II)2 = Op
X
j
λ2j
n(λj + s)2
!
.
Once we have shown k(Σn +sI)−1 Γn Σ−1/2 k = Op (1),the bounds given above will combine
P
λ2
j
and direct calculations by plugging
to obtain that (S1.1) is bounded by Op s + j n(λ +s)
2
j
in λj j −d and s = cn n−d/(d+1) shows that this is Op (cn n−d/(d+1) ).
What is left is to show k(Σn + sI)−1 Γn Σ−1/2 k = Op (1), which is equivalent to showing
k(Σn + sI)−1 Γn Σ−1/2 − Σ−1 ΓΣ−1/2 k = Op (1). Note this equation is actually similar to (S1.1).
Following similar lines that are used to upper bound the terms (I)-(III) in (S1.1), we will get
=
k(Σn + sI)−1 Γn Σ−1/2 − Σ−1 ΓΣ−1/2 k2
!
X
X
λj
λj
Op
k(Σn + sI)−1 Γn Σ−1/2 k2 + Op (1 +
).
2
2
n(λ
n(λ
j + s)
j + s)
j
j
When s = cn n−d/(d+1) with cn → ∞, by direct calculations we have
λj
j n(λj +s)2
P
thus the above displayed equation implies k(Σn + sI)−1 Γn Σ−1/2 k = Op (1).
= o(1) and