Supplementary Material: Online Bandit Learning for a Special Class of
Non-convex Losses
Lijun Zhang¹ and Tianbao Yang² and Rong Jin³ and Zhi-Hua Zhou¹
¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
²Department of Computer Science, the University of Iowa, Iowa City, IA 52242, USA
³Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
{zhanglj, zhouzh}@lamda.nju.edu.cn, [email protected], [email protected]
Proof of Proposition 1
The proof is similar to that of Lemma 4 in (Zhang, Yi, and
Jin 2014). First, we have
$$u^\top \mathrm{E}_{t-1}\!\left[-\frac{Z_t c_t}{\|v_t\|_2} v_t\right] \overset{(1)}{=} -\eta\,\mathrm{E}_{t-1}\!\left[f\!\left(u^\top \frac{v_t}{\|v_t\|_2}\right) u^\top \frac{v_t}{\|v_t\|_2}\right] \overset{(3),(5)}{=} \eta\gamma.$$
Then, consider any vector ū ⊥ u. We have
$$\begin{aligned}
\bar{u}^\top \mathrm{E}_{t-1}\!\left[-\frac{Z_t c_t}{\|v_t\|_2} v_t\right] &\overset{(1)}{=} -\eta\,\mathrm{E}_{t-1}\!\left[f\!\left(u^\top \frac{v_t}{\|v_t\|_2}\right) \bar{u}^\top \frac{v_t}{\|v_t\|_2}\right] \\
&= -\eta\,\mathrm{E}_{t-1}\!\left[f\!\left(\frac{\alpha_1}{\sqrt{\alpha_1^2+\cdots+\alpha_d^2}}\right) \frac{\|\bar{u}\|_2\,\alpha_2}{\sqrt{\alpha_1^2+\cdots+\alpha_d^2}}\right],
\end{aligned}$$
where $\alpha_1, \ldots, \alpha_d$ are independent standard Gaussian variables. Notice that
$$f\!\left(\frac{\alpha_1}{\sqrt{\alpha_1^2+\cdots+\alpha_d^2}}\right) \frac{\|\bar{u}\|_2\,\alpha_2}{\sqrt{\alpha_1^2+\cdots+\alpha_d^2}}$$
is an odd function of $\alpha_2$, so its expectation with respect to $\alpha_2$ must be 0. Thus, we have
$$\bar{u}^\top \mathrm{E}_{t-1}\!\left[-\frac{Z_t c_t}{\|v_t\|_2} v_t\right] = 0.$$
Then, it is easy to prove Proposition 1 by contradiction.
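As a quick illustration (our addition, not part of the original argument), the symmetry claim can be checked by Monte Carlo: for any fixed test function $f$, the empirical mean of $f(\alpha_1/\|\alpha\|_2)\,\alpha_2/\|\alpha\|_2$ over standard Gaussian draws should vanish. The sketch below uses NumPy, with $\tanh$ as an arbitrary stand-in for $f$.

```python
# Hypothetical Monte Carlo check (not from the paper): the integrand is
# odd in alpha_2, so its expectation vanishes for any test function f.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1_000_000
alpha = rng.standard_normal((n, d))
norm = np.linalg.norm(alpha, axis=1)

f = np.tanh  # arbitrary stand-in for the (unknown) f in the proof
vals = f(alpha[:, 0] / norm) * (alpha[:, 1] / norm)
print(vals.mean())  # ~0 up to O(1/sqrt(n)) Monte Carlo error
```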
Proof of Theorem 3
The following lemma extends Lemma 2 to the general case where each $u_t$ is a different vector.
Lemma 4. We have
$$f(x_t^\top u_t) - \min_{\|x\|\le 1} f(x^\top u_t) \le
\begin{cases}
2B, & \text{if } t = 1; \\
\dfrac{2L}{\gamma\eta\rho_T(t-1)} \left\|\displaystyle\sum_{i=1}^{t-1}\delta_i\right\|_2 + 2L\|u_t - \bar{u}_{t-1}\|_2, & \text{otherwise},
\end{cases}$$
where $\rho_T = \min_{t\in[T]} \|\bar{u}_t\|_2$.
The rest of the proof is the same as that for Theorem 1.
Proof of Lemma 4
When t = 1, it is clear that
$$f(x_1^\top u_1) - \min_{\|x\|\le 1} f(x^\top u_1) \overset{(2)}{\le} 2B.$$
In the following, we discuss the case when t ≥ 2. Notice
that in this setting, (4) becomes
$$\mathrm{E}_{t-1}\!\left[-\frac{Z_t c_t}{\|v_t\|_2} v_t\right] = \gamma\eta u_t, \quad t = 1,\ldots,T, \tag{16}$$
and the vector-valued martingale-difference sequence is
$$\delta_i = -\frac{Z_i c_i}{\|v_i\|_2} v_i - \gamma\eta u_i, \quad i = 1,\ldots,T. \tag{17}$$
Following the same analysis as that for Lemma 2, we have
$$\begin{aligned}
f(x_t^\top u_t) - \min_{\|x\|\le 1} f(x^\top u_t) &= f(x_t^\top u_t) - f(u_t^\top u_t) \le L|x_t^\top u_t - u_t^\top u_t| \le L\|x_t - u_t\|_2 = L\left\|\frac{w_t}{\|w_t\|_2} - u_t\right\|_2 \\
&\le L\left\|\frac{w_t}{\|w_t\|_2} - \frac{\bar{u}_{t-1}}{\|\bar{u}_{t-1}\|_2}\right\|_2 + L\left\|u_t - \frac{\bar{u}_{t-1}}{\|\bar{u}_{t-1}\|_2}\right\|_2.
\end{aligned} \tag{18}$$
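The middle steps of (18) combine the $L$-Lipschitz continuity of $f$ with the Cauchy-Schwarz inequality ($|x_t^\top u_t - u_t^\top u_t| \le \|x_t - u_t\|_2$ since $\|u_t\|_2 = 1$) and the triangle inequality. A minimal numeric sketch of these two steps (our addition, with random stand-ins for $x_t$, $u_t$, and $\bar{u}_{t-1}$):

```python
# Hypothetical sanity check (not from the paper) of the Cauchy-Schwarz
# and triangle-inequality steps in (18), with random vectors.
import numpy as np

rng = np.random.default_rng(5)
for _ in range(100_000):
    u = rng.standard_normal(4); u /= np.linalg.norm(u)          # ||u||_2 = 1
    x = rng.standard_normal(4); x /= max(1, np.linalg.norm(x))  # ||x||_2 <= 1
    v = rng.standard_normal(4)              # stand-in for ubar_{t-1}
    vn = v / np.linalg.norm(v)
    assert abs(x @ u - u @ u) <= np.linalg.norm(x - u) + 1e-12
    assert np.linalg.norm(x - u) <= (np.linalg.norm(x - vn)
                                     + np.linalg.norm(u - vn)) + 1e-12
print("ok")
```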
According to the procedure in Algorithm 1, we have
$$w_t = \sum_{i=1}^{t-1} -\frac{Z_i c_i}{\|v_i\|_2} v_i \overset{(17)}{=} \gamma\eta \sum_{i=1}^{t-1} u_i + \sum_{i=1}^{t-1} \delta_i = \gamma\eta(t-1)\bar{u}_{t-1} + \sum_{i=1}^{t-1} \delta_i.$$
Then, we have
$$\left\|\frac{w_t}{\gamma\eta(t-1)\|\bar{u}_{t-1}\|_2} - \frac{\bar{u}_{t-1}}{\|\bar{u}_{t-1}\|_2}\right\|_2 \le \frac{1}{\gamma\eta(t-1)\|\bar{u}_{t-1}\|_2} \left\|\sum_{i=1}^{t-1}\delta_i\right\|_2.$$
Following a simple geometric argument, we have
$$\left\|\frac{w_t}{\|w_t\|_2} - \frac{\bar{u}_{t-1}}{\|\bar{u}_{t-1}\|_2}\right\|_2 \le \frac{2}{\gamma\eta(t-1)\|\bar{u}_{t-1}\|_2} \left\|\sum_{i=1}^{t-1}\delta_i\right\|_2,$$
and therefore
$$\begin{aligned}
f(x_t^\top u_t) - \min_{\|x\|\le 1} f(x^\top u_t) &\overset{(18)}{\le} \frac{2L}{\gamma\eta(t-1)\|\bar{u}_{t-1}\|_2} \left\|\sum_{i=1}^{t-1}\delta_i\right\|_2 + L\left\|u_t - \frac{\bar{u}_{t-1}}{\|\bar{u}_{t-1}\|_2}\right\|_2 \\
&\le \frac{2L}{\gamma\eta\rho_T(t-1)} \left\|\sum_{i=1}^{t-1}\delta_i\right\|_2 + 2L\|u_t - \bar{u}_{t-1}\|_2,
\end{aligned}$$
where the last step uses $\rho_T \le \|\bar{u}_{t-1}\|_2$ and, since $\|u_t\|_2 = 1$, $\|u_t - \bar{u}_{t-1}/\|\bar{u}_{t-1}\|_2\|_2 \le 2\|u_t - \bar{u}_{t-1}\|_2$. This completes the proof of Lemma 4.
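The "simple geometric argument" above is the elementary fact that $\|a/\|a\|_2 - b/\|b\|_2\|_2 \le 2\|a - b\|_2/\|b\|_2$ for nonzero $a, b$, applied with $a = w_t$ and $b = \gamma\eta(t-1)\bar{u}_{t-1}$; the last line additionally uses $\|u_t - \bar{u}_{t-1}/\|\bar{u}_{t-1}\|_2\|_2 \le 2\|u_t - \bar{u}_{t-1}\|_2$ for unit $u_t$. A minimal numeric sketch of both facts (our addition, not from the paper):

```python
# Hypothetical check (not from the paper) of the two geometric facts
# used in the proof, on random vectors.
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100_000):
    a, b = rng.standard_normal(4), rng.standard_normal(4)
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    # normalization lemma: ||a/||a|| - b/||b||||_2 <= 2 ||a - b||_2 / ||b||_2
    assert np.linalg.norm(a/na - b/nb) <= 2*np.linalg.norm(a - b)/nb + 1e-12
    u = a / na  # a unit vector, standing in for u_t
    # ||u - b/||b||||_2 <= 2 ||u - b||_2   (since ||u||_2 = 1)
    assert np.linalg.norm(u - b/nb) <= 2*np.linalg.norm(u - b) + 1e-12
print("ok")
```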
Proof of (7)
We provide the proof because our definition of ūt is slightly
different from the one in (Hazan and Kale 2010).
First, we have
$$\begin{aligned}
\sum_{t=2}^{T} \|u_t - \bar{u}_{t-1}\|_2^2 - \sum_{t=2}^{T} \|u_t - \bar{u}_t\|_2^2
&\le \sum_{t=2}^{T} 2\left\langle u_t - \bar{u}_{t-1},\, \bar{u}_t - \bar{u}_{t-1}\right\rangle \\
&= \sum_{t=2}^{T} 2\left\langle u_t - \bar{u}_{t-1},\, \frac{(t-1)\bar{u}_{t-1} + u_t}{t} - \bar{u}_{t-1}\right\rangle \\
&= 2\sum_{t=2}^{T} \frac{\|u_t - \bar{u}_{t-1}\|_2^2}{t}
\le 4\sum_{t=2}^{T} \frac{\|u_t - \bar{u}_{t-1}\|_2}{t},
\end{aligned} \tag{19}$$
where the first inequality is due to the property of convexity and the second inequality comes from our assumption that $\|u_t\|_2 = 1$, $\forall t \in [T]$. We then bound $\sum_{t=2}^{T} \|u_t - \bar{u}_{t-1}\|_2 / t$ as follows.
$$\begin{aligned}
\sum_{t=2}^{T} \frac{\|u_t - \bar{u}_{t-1}\|_2}{t}
&= \sum_{t=2}^{T} \frac{1}{t}\left\|u_t - \frac{1}{t-1}\sum_{i=1}^{t-1} u_i\right\|_2
= \sum_{t=2}^{T} \frac{1}{t}\left\|(u_t - \bar{u}_T) - \frac{1}{t-1}\sum_{i=1}^{t-1} (u_i - \bar{u}_T)\right\|_2 \\
&\le \sum_{t=2}^{T} \frac{1}{t}\|u_t - \bar{u}_T\|_2 + \sum_{t=1}^{T-1} \|u_t - \bar{u}_T\|_2 \sum_{i=t}^{T-1} \frac{1}{i(i+1)} \\
&\le \sum_{t=2}^{T} \frac{1}{t}\|u_t - \bar{u}_T\|_2 + \sum_{t=1}^{T-1} \frac{1}{t}\|u_t - \bar{u}_T\|_2
\le 2\sum_{t=1}^{T} \frac{1}{t}\|u_t - \bar{u}_T\|_2 \\
&\le 2\sqrt{\sum_{t=1}^{T} \frac{1}{t^2}} \sqrt{\sum_{t=1}^{T} \|u_t - \bar{u}_T\|_2^2}
\le 3\sqrt{\sum_{t=1}^{T} \|u_t - \bar{u}_T\|_2^2}.
\end{aligned} \tag{20}$$
Combining (19) with (20), we have
$$\sum_{t=2}^{T} \|u_t - \bar{u}_{t-1}\|_2^2 - \sum_{t=2}^{T} \|u_t - \bar{u}_t\|_2^2 \le 12\sqrt{\sum_{t=1}^{T} \|u_t - \bar{u}_T\|_2^2},$$
implying
$$\sum_{t=2}^{T} \|u_t - \bar{u}_{t-1}\|_2^2 \le \sum_{t=2}^{T} \|u_t - \bar{u}_t\|_2^2 + 12\sqrt{\sum_{t=1}^{T} \|u_t - \bar{u}_T\|_2^2}. \tag{21}$$
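Two elementary ingredients can be verified directly (our addition, not from the paper): the first inequality in (19) is $\|a\|_2^2 - \|b\|_2^2 \le 2\langle a, a - b\rangle$ combined with the running-mean identity $\bar{u}_t - \bar{u}_{t-1} = (u_t - \bar{u}_{t-1})/t$, and (20) uses the telescoping sum $\sum_{i=t}^{T-1} \frac{1}{i(i+1)} = \frac{1}{t} - \frac{1}{T}$ together with $2\sqrt{\sum_{t=1}^{T} 1/t^2} \le 2\pi/\sqrt{6} < 3$. A sketch:

```python
# Hypothetical numeric check (not from the paper) of the steps behind
# (19) and the scalar facts used in (20).
import math
import numpy as np

rng = np.random.default_rng(2)
T, d = 50, 6
u = rng.standard_normal((T, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)               # ||u_t||_2 = 1
ubar = np.cumsum(u, axis=0) / np.arange(1, T + 1)[:, None]  # running means

for t in range(1, T):  # python index t is round t + 1
    a, b = u[t] - ubar[t - 1], u[t] - ubar[t]
    # convexity: ||a||^2 - ||b||^2 <= 2 <a, a - b>, and a - b = a / round
    assert a @ a - b @ b <= 2 * (a @ (a - b)) + 1e-9
    assert np.allclose(a - b, a / (t + 1))

# telescoping sum and the constant in (20)
for t0 in (1, 2, 17):
    s = sum(1.0 / (i * (i + 1)) for i in range(t0, T))
    assert abs(s - (1.0 / t0 - 1.0 / T)) < 1e-12
assert 2 * math.sqrt(sum(1.0 / t**2 for t in range(1, T + 1))) < 3
print("ok")
```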
According to (Cesa-Bianchi and Lugosi 2006, Lemma 3.1), we have
$$\sum_{t=1}^{T} \|u_t - \bar{u}_t\|_2^2 \le \sum_{t=1}^{T} \|u_t - \bar{u}_T\|_2^2. \tag{22}$$
We complete the proof by combining (21) with (22).
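Inequality (22) is an instance of the be-the-leader argument: $\bar{u}_t$ minimizes $\sum_{i=1}^{t} \|u_i - x\|_2^2$ over $x$. A Monte Carlo sketch (our addition, not from the paper):

```python
# Hypothetical Monte Carlo check (not from the paper) of (22): the
# running mean ubar_t minimizes sum_{i<=t} ||u_i - x||^2, so the
# be-the-leader inequality applies.
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1_000):
    T, d = 30, 4
    u = rng.standard_normal((T, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    ubar = np.cumsum(u, axis=0) / np.arange(1, T + 1)[:, None]
    lhs = sum(np.linalg.norm(u[t] - ubar[t])**2 for t in range(T))
    rhs = sum(np.linalg.norm(u[t] - ubar[-1])**2 for t in range(T))
    assert lhs <= rhs + 1e-9
print("(22) held on all trials")
```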
Multiplicative Chernoff Bound
Theorem 5. (Angluin and Valiant 1979) Let $X_1, X_2, \ldots, X_n$ be independent binary random variables with $\Pr[X_i = 1] = p_i$. Denote $S = \sum_{i=1}^{n} X_i$ and $\mu = \mathrm{E}[S] = \sum_{i=1}^{n} p_i$. We have
$$\Pr[S \le (1-\epsilon)\mu] \le \exp\!\left(-\frac{\epsilon^2}{2}\mu\right), \quad \text{for } 0 < \epsilon < 1,$$
$$\Pr[S \ge (1+\epsilon)\mu] \le \exp\!\left(-\frac{\epsilon^2}{2+\epsilon}\mu\right), \quad \text{for } \epsilon > 0.$$
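As an illustration (our addition, not part of the original statement), both tails can be checked by simulation with heterogeneous $p_i$; the empirical tail probabilities should fall below the stated bounds:

```python
# Hypothetical Monte Carlo check (not from the paper) of both tails in
# Theorem 5, with heterogeneous success probabilities p_i.
import numpy as np

rng = np.random.default_rng(4)
n, trials, eps = 200, 20_000, 0.3
p = rng.uniform(0.05, 0.5, size=n)
mu = p.sum()
S = (rng.random((trials, n)) < p).sum(axis=1)   # trials draws of S

print((S <= (1 - eps) * mu).mean(), "<=", np.exp(-eps**2 * mu / 2))
print((S >= (1 + eps) * mu).mean(), "<=", np.exp(-eps**2 * mu / (2 + eps)))
```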
For the second bound, let $t = \frac{\epsilon^2}{2+\epsilon}\mu$, which implies $\epsilon = \frac{t + \sqrt{t^2 + 8\mu t}}{2\mu}$. Then, with a probability at least $1 - e^{-t}$, we have
$$S \le \left(1 + \frac{t + \sqrt{t^2 + 8\mu t}}{2\mu}\right)\mu \le 2\mu + 2t.$$
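The substitution can be checked directly (our addition): $\epsilon$ is the positive root of $\mu\epsilon^2 - t\epsilon - 2t = 0$, and the final bound follows from $\sqrt{t^2 + 8\mu t} \le t + 2\sqrt{2\mu t} \le t + 2\mu + t$.

```python
# Hypothetical check (not from the paper): eps is the positive root of
# mu*eps^2 - t*eps - 2*t = 0, and (1 + eps)*mu <= 2*mu + 2*t.
import math

for mu in (0.5, 3.0, 50.0):
    for t in (0.1, 1.0, 10.0):
        eps = (t + math.sqrt(t * t + 8 * mu * t)) / (2 * mu)
        assert abs(eps**2 * mu / (2 + eps) - t) < 1e-9  # matches the substitution
        assert (1 + eps) * mu <= 2 * mu + 2 * t + 1e-9
print("ok")
```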