Supplementary Material

Görnitz, Porbadnigk, Binder, Sannelli, Braun, Müller, Kloft
A Proofs
When bounding the Rademacher complexity of Lipschitz-continuous loss classes (such as the hinge loss or the squared loss), the following lemma is often very helpful.
Lemma A.1 (Talagrand's lemma [41]). Let l : R → R be a loss function that is L-Lipschitz continuous and satisfies l(0) = 0. Let F be a hypothesis class of real-valued functions and denote its loss class by G := l ◦ F. Then the following inequality holds:
\[
  R_n(G) \;\le\; 2L \, R_n(F).
\]
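As an informal illustration of Lemma A.1 (and not part of the formal argument), the following Python sketch estimates both Rademacher complexities by Monte-Carlo simulation for a toy linear hypothesis class and the 1-Lipschitz loss l(t) = |t|; the sample, the hypothesis class, and all parameter values are arbitrary choices made only for this illustration.
\begin{verbatim}
# Informal Monte-Carlo illustration of Lemma A.1 (not part of the proofs).
# Illustrative choices: F = {x -> <w, x> : ||w|| <= 1}, loss l(t) = |t|
# (1-Lipschitz, l(0) = 0), and a synthetic sample.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_sigma, n_dirs = 50, 2, 2000, 500

X = rng.normal(size=(n, d))                    # fixed sample x_1, ..., x_n
W = rng.normal(size=(n_dirs, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # candidate w's on the unit sphere

R_F, R_G = 0.0, 0.0
for _ in range(n_sigma):
    sigma = rng.choice([-1.0, 1.0], size=n)    # Rademacher signs
    # sup over the unit ball of (1/n) sum_i sigma_i <w, x_i>
    # has the closed form ||(1/n) sum_i sigma_i x_i||
    R_F += np.linalg.norm(sigma @ X / n)
    # for the loss class G = l o F, approximate the sup by scanning the
    # candidate directions (w = 0 gives value 0, so the sup is nonnegative)
    R_G += max(0.0, (np.abs(X @ W.T).T @ sigma / n).max())

R_F /= n_sigma
R_G /= n_sigma
print(f"R_n(F) ~ {R_F:.4f},  R_n(G) ~ {R_G:.4f},  2*L*R_n(F) = {2 * R_F:.4f}")
\end{verbatim}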
We can use the above result to prove Lemma 3.
Proof of Lemma 3. Since the LatentSVDD loss function is 1-Lipschitz with l(0) = 0, by Lemma A.1 it is sufficient to bound R(F_SVDD(z)). To this end, it holds that
\begin{align*}
R(F_{\mathrm{SVDD}}(z))
&\overset{\mathrm{def.}}{=}
\mathbb{E}\Biggl[\sup_{c,\,\Omega:\; 0 \le \|c\|^2 + \Omega \le \lambda}\;
\frac{1}{n}\sum_{i=1}^{n}\sigma_i\Bigl(\Omega + 2\langle c,\Psi(x_i,z)\rangle - \|\Psi(x_i,z)\|^2\Bigr)\Biggr]\\
&\le
\mathbb{E}\Biggl[\sup_{\Omega:\;-\lambda\le\Omega\le\lambda}\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\Omega\Biggr]
+ 2\,\mathbb{E}\Biggl[\sup_{c:\;\|c\|^2\le\lambda}\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\langle c,\Psi(x_i,z)\rangle\Biggr]
+ \underbrace{\mathbb{E}\Biggl[-\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\|\Psi(x_i,z)\|^2\Biggr]}_{=\,0\ \text{(by symmetry of $\sigma_i$)}}.
\tag{A.3.1}
\end{align*}
Note that the last term is zero because the Rademacher variables are random signs, independent of x_1, ..., x_n. The first term can be bounded as follows:
\begin{align*}
\mathbb{E}\Biggl[\sup_{\Omega:\;-\lambda\le\Omega\le\lambda}\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\Omega\Biggr]
&= \lambda\,\mathbb{E}\Biggl[\,\biggl|\frac{1}{n}\sum_{i=1}^{n}\sigma_i\biggr|\,\Biggr]\\
&\overset{(*)}{\le} \lambda\,\sqrt{\mathbb{E}\Biggl[\frac{1}{n^2}\sum_{i,j=1}^{n}\sigma_i\sigma_j\Biggr]}
= \frac{\lambda}{\sqrt{n}}\,, \tag{A.3.2}
\end{align*}
where for (∗) we employ Jensen's inequality. Moreover, applying the Cauchy-Schwarz inequality and Jensen's inequality, respectively, we obtain
\begin{align*}
\mathbb{E}\Biggl[\sup_{c:\;\|c\|^2\le\lambda}\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\langle c,\Psi(x_i,z)\rangle\Biggr]
&\overset{\text{C.-S.}}{\le}
\mathbb{E}\Biggl[\sup_{c:\;\|c\|^2\le\lambda}\|c\|\,\biggl\|\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\Psi(x_i,z)\biggr\|\Biggr]\\
&\overset{\text{Jensen}}{\le}
\sqrt{\lambda}\,\sqrt{\mathbb{E}\Biggl[\frac{1}{n^2}\sum_{i,j=1}^{n}\sigma_i\sigma_j\,\langle\Psi(x_i,z),\Psi(x_j,z)\rangle\Biggr]}\\
&= \sqrt{\lambda}\,\sqrt{\frac{1}{n^2}\sum_{i=1}^{n}\|\Psi(x_i,z)\|^2}
\;\le\; B\sqrt{\frac{\lambda}{n}}\,, \tag{A.3.3}
\end{align*}
because P(‖Ψ(x_i, z)‖ ≤ B) = 1. Hence, inserting the results (A.3.2) and (A.3.3) into (A.3.1) yields the claimed result, that is,
\begin{align*}
R(G_{\mathrm{SVDD}}(z))
\;\overset{\text{Lemma A.1}}{\le}\;
R(F_{\mathrm{SVDD}}(z))
\;\le\; \frac{\lambda}{\sqrt{n}} + B\sqrt{\frac{\lambda}{n}}
\;=\; \frac{\lambda + B\sqrt{\lambda}}{\sqrt{n}}\,. \tag{A.3.4}
\end{align*}
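As an informal sanity check of the two intermediate bounds (A.3.2) and (A.3.3), and not part of the proof, the following Python sketch estimates both suprema by simulation; the random vectors merely stand in for Ψ(x_i, z), and the values of λ and B are synthetic and purely illustrative.
\begin{verbatim}
# Informal simulation check of (A.3.2) and (A.3.3); not part of the proof.
# The feature vectors stand in for Psi(x_i, z); lambda_ and B are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, d, n_sigma = 100, 5, 5000
lambda_, B = 2.0, 3.0

Psi = rng.normal(size=(n, d))
Psi *= B * rng.uniform(size=(n, 1)) / np.linalg.norm(Psi, axis=1, keepdims=True)
# now ||Psi_i|| <= B for every i

term_Omega, term_c = 0.0, 0.0
for _ in range(n_sigma):
    sigma = rng.choice([-1.0, 1.0], size=n)
    # sup_{-lambda <= Omega <= lambda} (1/n) sum_i sigma_i Omega
    #   = lambda * |(1/n) sum_i sigma_i|
    term_Omega += lambda_ * abs(sigma.mean())
    # sup_{||c||^2 <= lambda} (1/n) sum_i sigma_i <c, Psi_i>
    #   = sqrt(lambda) * ||(1/n) sum_i sigma_i Psi_i||   (Cauchy-Schwarz)
    term_c += np.sqrt(lambda_) * np.linalg.norm(sigma @ Psi / n)

print("first term :", term_Omega / n_sigma,
      "<=", lambda_ / np.sqrt(n), "= lambda/sqrt(n)      (A.3.2)")
print("second term:", term_c / n_sigma,
      "<=", B * np.sqrt(lambda_ / n), "= B*sqrt(lambda/n)  (A.3.3)")
\end{verbatim}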
Next, we invoke the following result, taken from [23]
(Lemma 8.1).
Lemma A.2. Let F_1, ..., F_l be hypothesis sets in R^X, and let F := {max(f_1, ..., f_l) : f_i ∈ F_i, i ∈ {1, ..., l}}. Then,
\[
R_n(F) \;\le\; \sum_{j=1}^{l} R_n(F_j).
\]
Sketch of proof [23]. The idea of the proof is to write max(h_1, h_2) = ½(h_1 + h_2 + |h_1 − h_2|), and then to show that
\[
\mathbb{E}\Biggl[\sup_{h_1\in F_1,\,h_2\in F_2}\frac{1}{n}\sum_{i=1}^{n}\sigma_i\,\bigl|h_1(x_i)-h_2(x_i)\bigr|\Biggr]
\;\le\; R_n(F_1) + R_n(F_2).
\]
This proof technique also generalizes to l > 2.
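The following Python sketch gives an informal numeric illustration of Lemma A.2 for l = 2 on finite toy hypothesis sets (each hypothesis is represented by its vector of values on a fixed sample); the values are synthetic, and the check is illustrative only, not part of the proof.
\begin{verbatim}
# Informal numeric illustration of Lemma A.2 for l = 2 on finite toy classes.
import numpy as np

rng = np.random.default_rng(2)
n, m1, m2, n_sigma = 30, 8, 8, 4000

F1 = rng.normal(size=(m1, n))   # row = (f(x_1), ..., f(x_n)) for one f in F_1
F2 = rng.normal(size=(m2, n))
# F = {max(f_1, f_2) : f_1 in F_1, f_2 in F_2}, with the maximum taken pointwise
F_max = np.maximum(F1[:, None, :], F2[None, :, :]).reshape(-1, n)

def rademacher(H):
    """Monte-Carlo estimate of R_n for a finite class given by its value matrix."""
    total = 0.0
    for _ in range(n_sigma):
        sigma = rng.choice([-1.0, 1.0], size=n)
        total += (H @ sigma / n).max()   # exact sup over the finite class
    return total / n_sigma

print("R_n(F)            ~", rademacher(F_max))
print("R_n(F1) + R_n(F2) ~", rademacher(F1) + rademacher(F2))
\end{verbatim}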
We can use Lemma A.2 and Lemma 3 to conclude the main theorem of this paper, that is, Theorem 2, which establishes generalization guarantees of the usual order O(1/√n) for the proposed LatentSVDD method.
Proof of Theorem 2. First observe that, because l is 1-Lipschitz,
\[
R_n(G_{\mathrm{LatentSVDD}}) \;\le\; R_n(F_{\mathrm{LatentSVDD}}).
\]
Next, note that we can write
\[
F_{\mathrm{LatentSVDD}} \;=\; \Bigl\{\max_{z\in Z} f_z \,:\, f_z \in F_{\mathrm{SVDD}}(z)\Bigr\}.
\]
Thus, by Lemma A.2 and Lemma 3,
\begin{align*}
R_n(F_{\mathrm{LatentSVDD}})
&\le |Z| \max_{z\in Z} R_n(F_{\mathrm{SVDD}}(z))\\
&\le |Z|\,\frac{\lambda + B\sqrt{\lambda}}{\sqrt{n}}.
\end{align*}
Moreover, observe that the loss function in the definition of G_LatentSVDD takes values only in the interval [0, B]. Thus, Theorem 2 in the main paper gives the claimed result, that is,
\begin{align*}
\mathbb{E}[\hat g_n] - \mathbb{E}[g^*]
&\le 4\,R_n(G_{\mathrm{LatentSVDD}}) + B\sqrt{\frac{2\log(2/\delta)}{n}}\\
&\le 4\,|Z|\,\frac{\lambda + B\sqrt{\lambda}}{\sqrt{n}} + B\sqrt{\frac{2\log(2/\delta)}{n}}.
\end{align*}
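To visualize the O(1/√n) behaviour of this bound, the following Python sketch evaluates its right-hand side for a few sample sizes; the values chosen for |Z|, λ, B and δ are arbitrary illustrations, not values from the paper.
\begin{verbatim}
# Informal evaluation of the final bound; all parameter values are illustrative.
import numpy as np

Z_size, lambda_, B, delta = 5, 1.0, 1.0, 0.05
for n in (100, 1_000, 10_000, 100_000):
    bound = (4 * Z_size * (lambda_ + B * np.sqrt(lambda_)) / np.sqrt(n)
             + B * np.sqrt(2 * np.log(2 / delta) / n))
    print(f"n = {n:>7d}:  bound = {bound:.4f}")
\end{verbatim}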
n