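SYS 6016/4582: Machine Learning, Spring 2017
Lecture 12
Date: Feb 28th, 2017
Instructor: Quanquan Gu
Scriber: Pan Xu

In our last lecture, we introduced the concepts of cover set and covering number. Let's restate them as follows.

Definition 1 (Cover Set) We say $V \subseteq \mathbb{R}^n$ is an $\ell_p$ cover set of function class $\mathcal{F}$ on $X_1, \ldots, X_n$ at scale $\epsilon > 0$, if for any $f \in \mathcal{F}$, there exists $v_f \in V$ such that

$$\left( \frac{1}{n} \sum_{i=1}^{n} \big| f(X_i) - v_f[i] \big|^p \right)^{1/p} \le \epsilon,$$

where $v_f[i]$ denotes the $i$-th entry of vector $v_f$.

Definition 2 (Empirical Covering Number) The empirical covering number of $\mathcal{F}$ on $X_1, \ldots, X_n$ is defined as

$$N_p(\mathcal{F}, \epsilon; X_{1:n}) = \min_{V} \big\{ |V| : V \text{ is an } \ell_p \text{ cover set of } \mathcal{F} \text{ on } X_1, \ldots, X_n \text{ at scale } \epsilon \big\}.$$

Based on these definitions, we can define the covering number as follows.

Definition 3 (Covering Number) The covering number of $\mathcal{F}$ at scale $\epsilon$ with respect to the $\ell_p$ norm is defined as

$$N_p(\mathcal{F}, \epsilon, n) = \sup_{X_1, \ldots, X_n} N_p(\mathcal{F}, \epsilon; X_{1:n}).$$

Given a function class $\mathcal{F}$ and a sample $X_1, \ldots, X_n$, for any $f \in \mathcal{F}$, let the projected vector be $\mathbf{f} = (f(X_1), \ldots, f(X_n))^\top \in \mathbb{R}^n$. There exists $v_f \in V$ such that the normalized $\ell_p$ distance between $\mathbf{f}$ and $v_f$ is at most $\epsilon$. We illustrate this in Figure 1.

[Figure 1: Illustration of a cover set of $\mathcal{F}$ on $X_1, \ldots, X_n$ at scale $\epsilon$.]

Exercise: Given a function class $\mathcal{F}$ and $\epsilon > 0$, if $p \ge q > 0$, which one is larger: $N_p(\mathcal{F}, \epsilon, n)$ or $N_q(\mathcal{F}, \epsilon, n)$?

Hint: Fix any $X_1, X_2, \ldots, X_n$ and let $V$ be a minimal $\ell_q$ cover set of $\mathcal{F}$ on $X_{1:n}$ at scale $\epsilon$. For any $f \in \mathcal{F}$ with projected vector $\mathbf{f} = [f(X_1), f(X_2), \ldots, f(X_n)]^\top$, we can find a $v_f \in V$ such that $n^{-1/q} \|v_f - \mathbf{f}\|_q \le \epsilon$. For any vector $x \in \mathbb{R}^n$, we have $\|x\|_q \le n^{1/q - 1/p} \|x\|_p$, that is, $n^{-1/q} \|x\|_q \le n^{-1/p} \|x\|_p$. So the normalized $\ell_q$ distance never exceeds the normalized $\ell_p$ distance, and the $\ell_p$ distance between $v_f$ and $\mathbf{f}$ may be larger than $\epsilon$. In that case, $V$ is not sufficient to be a cover set of $\mathcal{F}$ at scale $\epsilon$ in $\ell_p$ distance. Conversely, every $\ell_p$ cover set at scale $\epsilon$ is also an $\ell_q$ cover set at scale $\epsilon$. Taking the supremum over samples, $N_p(\mathcal{F}, \epsilon, n) \ge N_q(\mathcal{F}, \epsilon, n)$.

For a function class $\mathcal{F}$ whose output is bounded, the following theorem bounds its empirical Rademacher complexity by its $\ell_1$ covering number.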
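Before turning to the theorem, the definitions above can be made concrete with a small numerical sketch. It is not part of the lecture: the threshold function class, the sample, and the helper names `normalized_lp_distance` and `greedy_cover` are all hypothetical choices made for illustration. The greedy procedure only restricts cover centers to projected vectors of the class, so its size is an upper bound on the empirical covering number, not necessarily the minimum.

```python
def normalized_lp_distance(u, v, p):
    """Return (1/n * sum_i |u_i - v_i|^p)^(1/p), the distance in Definition 1."""
    n = len(u)
    return (sum(abs(a - b) ** p for a, b in zip(u, v)) / n) ** (1.0 / p)


def greedy_cover(vectors, eps, p):
    """Greedily pick centers among `vectors` until every vector is within eps
    of some center. The result is a valid l_p cover set, so its size is an
    upper bound on the empirical covering number (not necessarily tight)."""
    centers = []
    for f in vectors:
        if not any(normalized_lp_distance(f, c, p) <= eps for c in centers):
            centers.append(f)
    return centers


# Hypothetical example: thresholds f_t(x) = +1 if x >= t else -1.
sample = [i / 10 for i in range(10)]        # X_1, ..., X_n with n = 10
thresholds = [j / 20 for j in range(21)]    # a finite family of functions
projected = [[1.0 if x >= t else -1.0 for x in sample] for t in thresholds]

eps = 0.5
cover_l1 = greedy_cover(projected, eps, p=1)
cover_l2 = greedy_cover(projected, eps, p=2)

# Exercise check: with q = 1 <= p = 2, any l_2 cover is also an l_1 cover.
l2_cover_is_l1_cover = all(
    any(normalized_lp_distance(f, c, 1) <= eps for c in cover_l2)
    for f in projected
)

print(len(cover_l1), len(cover_l2), l2_cover_is_l1_cover)
```

The last check mirrors the exercise: because the normalized $\ell_1$ distance never exceeds the normalized $\ell_2$ distance, the centers found for $p = 2$ also cover the class for $p = 1$.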
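Theorem 1 (Pollard's Bound) Let $\mathcal{F}$ be a function class such that $\mathcal{F} = \{f : \mathcal{X} \to [-1, 1]\}$ and let $X_1, \ldots, X_n$ be $n$ examples. The empirical Rademacher complexity can be bounded by

$$\widehat{R}_n(\mathcal{F}) \le \inf_{\beta \ge 0} \left\{ \beta + \sqrt{\frac{2 \log N_1(\mathcal{F}, \beta; X_{1:n})}{n}} \right\}.$$

Proof: For all $\beta \ge 0$, we want to show

$$\widehat{R}_n(\mathcal{F}) \le \beta + \sqrt{\frac{2 \log N_1(\mathcal{F}, \beta; X_{1:n})}{n}}.$$

Let $V$ be an $\ell_1$ cover of $\mathcal{F}$ on $X_1, \ldots, X_n$ at scale $\beta$ with $|V| = N_1(\mathcal{F}, \beta; X_{1:n})$. Therefore, for any $f \in \mathcal{F}$, there exists a $v_f \in V$ such that $n^{-1} \sum_{i=1}^{n} |f(X_i) - v_f[i]| \le \beta$. Recalling the definition of empirical Rademacher complexity, we have

$$\widehat{R}_n(\mathcal{F}) = \mathbb{E}_\sigma \left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i) \,\Big|\, X_{1:n} \right] = \mathbb{E}_\sigma \left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \big( f(X_i) - v_f[i] + v_f[i] \big) \,\Big|\, X_{1:n} \right].$$

Thus we have

$$\begin{aligned}
\widehat{R}_n(\mathcal{F}) &\le \mathbb{E}_\sigma \left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \big( f(X_i) - v_f[i] \big) \,\Big|\, X_{1:n} \right] + \mathbb{E}_\sigma \left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i v_f[i] \,\Big|\, X_{1:n} \right] \\
&\le \beta + \mathbb{E}_\sigma \left[ \sup_{v_f \in V} \frac{1}{n} \sum_{i=1}^{n} \sigma_i v_f[i] \,\Big|\, X_{1:n} \right] \\
&\le \beta + \sqrt{\frac{2 \log N_1(\mathcal{F}, \beta; X_{1:n})}{n}},
\end{aligned}$$

where the first inequality comes from $\sup(a + b) \le \sup a + \sup b$, the second inequality is due to the definition of the cover set $V$ together with $|\sigma_i| = 1$, and the last inequality is due to Massart's Finite Lemma. For the last step we may assume that every $v \in V$ has entries in $[-1, 1]$, so that $\|v\|_2 \le \sqrt{n}$: since $f(X_i) \in [-1, 1]$, clipping the entries of $v$ to $[-1, 1]$ does not increase $|f(X_i) - v[i]|$. Since the bound holds for every $\beta \ge 0$, taking the infimum over $\beta$ completes the proof.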
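The bound in Theorem 1 can be sanity-checked numerically on a tiny example. The following is a minimal sketch under stated assumptions, not part of the lecture: the threshold class, the sample, the grid of $\beta$ values, and all function names are hypothetical. The empirical Rademacher complexity is computed exactly by enumerating all $2^n$ sign vectors, which is only feasible for very small $n$. The cover is built greedily from projected vectors of the class, so its size may exceed $N_1(\mathcal{F}, \beta; X_{1:n})$; the proof above works for any $\ell_1$ cover with entries in $[-1, 1]$, so the resulting quantity is still a valid (possibly looser) upper bound.

```python
import itertools
import math


def l1_distance(u, v):
    """Normalized l_1 distance (1/n) * sum_i |u_i - v_i|."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def greedy_l1_cover(vectors, beta):
    """Greedy l_1 cover at scale beta with centers taken from `vectors`."""
    centers = []
    for f in vectors:
        if not any(l1_distance(f, c) <= beta for c in centers):
            centers.append(f)
    return centers


def empirical_rademacher(vectors):
    """Exact E_sigma[ sup_f (1/n) sum_i sigma_i f(X_i) ] by enumerating signs."""
    n = len(vectors[0])
    total = 0.0
    for sigma in itertools.product((-1.0, 1.0), repeat=n):
        total += max(sum(s * a for s, a in zip(sigma, f)) / n for f in vectors)
    return total / 2 ** n


def pollard_bound(vectors, betas):
    """min over the beta grid of beta + sqrt(2 log |V_beta| / n),
    where V_beta is a greedy cover (an upper bound on the covering number)."""
    n = len(vectors[0])
    return min(
        beta + math.sqrt(2.0 * math.log(len(greedy_l1_cover(vectors, beta))) / n)
        for beta in betas
    )


# Hypothetical example: thresholds f_t(x) = +1 if x >= t else -1, values in [-1, 1].
sample = [i / 8 for i in range(8)]          # n = 8, so 2^8 = 256 sign vectors
thresholds = [j / 16 for j in range(17)]
projected = [[1.0 if x >= t else -1.0 for x in sample] for t in thresholds]

rademacher = empirical_rademacher(projected)
bound = pollard_bound(projected, betas=[k / 10 for k in range(11)])

print(rademacher, bound)
```

At $\beta = 0$ the greedy cover is just the set of distinct projected vectors, so the bound reduces to Massart's Finite Lemma; larger $\beta$ trades a smaller cover for the additive $\beta$ term.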