A Proof of Theorem 3.3

Proof. Let us define $\nabla_t = 2(\hat{y}_t - r_t - \gamma \hat{y}_{t+1})\,\nabla_f f_t(x_t)$; namely, $\nabla_t$ is short for $\nabla \ell_{d_t}(f_t)$, the gradient of the loss with respect to the predictor $f_t$. Taking the gradient of Eq. 9 with respect to $f$ and setting it to zero, we have for $f_{t+1}$ and $f_t$:
\[
\mu \sum_{i=0}^{t} \nabla_i + \nabla R(f_{t+1}) = 0, \qquad \mu \sum_{i=0}^{t-1} \nabla_i + \nabla R(f_t) = 0. \tag{19}
\]
Subtracting the second equation from the first, we have:
\[
\mu \nabla_t + \nabla R(f_{t+1}) - \nabla R(f_t) = 0. \tag{20}
\]
Since $\frac{1}{\mu} R(f)$ is $1/\mu$-strongly convex, from Eq. 9 we have for the minimizers $f_{t+1}$ and $f_t$:
\[
\Big\langle f_t, \sum_{i=0}^{t} \nabla_i \Big\rangle + \frac{1}{\mu} R(f_t) \;\ge\; \Big\langle f_{t+1}, \sum_{i=0}^{t} \nabla_i \Big\rangle + \frac{1}{\mu} R(f_{t+1}) + \frac{1}{2\mu} \| f_{t+1} - f_t \|^2, \tag{21}
\]
\[
\Big\langle f_{t+1}, \sum_{i=0}^{t-1} \nabla_i \Big\rangle + \frac{1}{\mu} R(f_{t+1}) \;\ge\; \Big\langle f_t, \sum_{i=0}^{t-1} \nabla_i \Big\rangle + \frac{1}{\mu} R(f_t) + \frac{1}{2\mu} \| f_{t+1} - f_t \|^2. \tag{22}
\]
Adding the two inequalities and applying the Cauchy–Schwarz inequality, we have:
\[
\|\Delta f_t\| = \| f_{t+1} - f_t \| \le \mu \|\nabla_t\|.
\]
Now let us consider the progress $D_R(f^*, f_t) - D_R(f^*, f_{t+1})$ at every time step $t$. Similar to the proof for TD*(0), we have:
\[
\begin{aligned}
D_R(f^*, f_t) - D_R(f^*, f_{t+1})
&= R(f^*) - R(f_t) - \nabla R(f_t)(f^* - f_t) - R(f^*) + R(f_{t+1}) + \nabla R(f_{t+1})(f^* - f_{t+1}) \\
&= R(f_{t+1}) - R(f_t) - \nabla R(f_t)(f^* - f_t) + \nabla R(f_{t+1})(f^* - f_{t+1}) \\
&= R(f_{t+1}) - R(f_t) + \big(\nabla R(f_{t+1}) - \nabla R(f_t)\big) f^* + \nabla R(f_t) f_t - \nabla R(f_{t+1}) f_{t+1} \\
&= R(f_{t+1}) - R(f_t) + \big(\nabla R(f_{t+1}) - \nabla R(f_t)\big) f^* + \nabla R(f_t) f_t - \nabla R(f_{t+1}) f_t + \nabla R(f_{t+1}) f_t - \nabla R(f_{t+1}) f_{t+1} \\
&= R(f_{t+1}) - R(f_t) + \big(\nabla R(f_{t+1}) - \nabla R(f_t)\big)(f^* - f_t) + \nabla R(f_{t+1})(f_t - f_{t+1}) \\
&= -D_R(f_t, f_{t+1}) + \mu \nabla_t (f_t - f^*), \tag{23}
\end{aligned}
\]
where the last step uses Eq. 20 to replace $\nabla R(f_{t+1}) - \nabla R(f_t)$ by $-\mu \nabla_t$. Since we assume that $R(f)$ is also $\alpha$-smooth, we must have:
\[
D_R(f_t, f_{t+1}) \le \frac{\alpha}{2} \| f_t - f_{t+1} \|^2. \tag{24}
\]
Now let us upper bound the progress $D_R(f^*, f_{t+1}) - D_R(f^*, f_t)$. Recall that $e_t$ and $e_t^*$ are the prediction errors of $f_t$ and of $f^*$ against the true value, so that the TD error satisfies $\hat{y}_t - r_t - \gamma \hat{y}_{t+1} = e_t - \gamma e_{t+1}$, that $\nabla_t (f_t - f^*) = 2(e_t - \gamma e_{t+1})(e_t - e_t^*)$, and that $\|\nabla_t\| \le X |e_t - \gamma e_{t+1}|$. We have:
\[
\begin{aligned}
D_R(f^*, f_{t+1}) - D_R(f^*, f_t)
&= -\mu \nabla_t (f_t - f^*) + D_R(f_t, f_{t+1}) \\
&\le -\mu \nabla_t (f_t - f^*) + \frac{\alpha}{2} \| f_t - f_{t+1} \|^2 \\
&\le -\mu \nabla_t (f_t - f^*) + \frac{\alpha \mu^2}{2} \|\nabla_t\|^2 \\
&\le 2\mu (e_t - \gamma e_{t+1})(e_t^* - e_t) + \frac{\alpha \mu^2 X^2}{2} (e_t - \gamma e_{t+1})^2 \\
&= 2\mu e_t e_t^* - 2\mu e_t^2 - 2\mu\gamma e_{t+1} e_t^* + 2\mu\gamma e_{t+1} e_t + \frac{\alpha \mu^2 X^2}{2} \big( e_t^2 - 2\gamma e_t e_{t+1} + \gamma^2 e_{t+1}^2 \big) \\
&\le \frac{2\mu^2}{b} e_t^2 + \frac{b}{2} e_t^{*2} - 2\mu e_t^2 + \frac{2\mu^2 \gamma^2}{b} e_{t+1}^2 + \frac{b}{2} e_t^{*2} + \mu\gamma e_{t+1}^2 + \mu\gamma e_t^2 \\
&\qquad + \frac{\alpha \mu^2 X^2}{2} e_t^2 + \frac{\alpha \mu^2 X^2}{2} \gamma e_t^2 + \frac{\alpha \mu^2 X^2}{2} \gamma e_{t+1}^2 + \frac{\alpha \mu^2 X^2}{2} \gamma^2 e_{t+1}^2, \tag{25}
\end{aligned}
\]
where the last step splits each cross term with Young's inequality: $2\mu e_t e_t^* \le \frac{2\mu^2}{b} e_t^2 + \frac{b}{2} e_t^{*2}$, $-2\mu\gamma e_{t+1} e_t^* \le \frac{2\mu^2\gamma^2}{b} e_{t+1}^2 + \frac{b}{2} e_t^{*2}$, $2\mu\gamma e_{t+1} e_t \le \mu\gamma (e_{t+1}^2 + e_t^2)$, and $-2\gamma e_t e_{t+1} \le \gamma (e_t^2 + e_{t+1}^2)$, with $b > 0$ a parameter to be chosen later. Now we are going to sum the above inequality from $t = 1$ to $T$.
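The stability bound $\|f_{t+1} - f_t\| \le \mu \|\nabla_t\|$ derived from Eqs. (21)–(22) is easy to verify numerically. Below is a minimal sanity check, not part of the proof: it assumes the Euclidean regularizer $R(f) = \frac{1}{2}\|f\|^2$ (1-strongly convex), for which the FTRL minimizer has the closed form $f_{t+1} = -\mu \sum_{i \le t} \nabla_i$, and uses random stand-in vectors for the gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, d, T = 0.1, 5, 50
grads = rng.normal(size=(T, d))       # stand-ins for the per-step gradients grad_t

# FTRL with the Euclidean regularizer R(f) = 0.5*||f||^2 (1-strongly convex):
# f_{t+1} = argmin_f  mu*<sum_{i<=t} grad_i, f> + R(f)  =  -mu * sum_{i<=t} grad_i
iterates = [-mu * grads[: t + 1].sum(axis=0) for t in range(T)]
iterates.insert(0, np.zeros(d))       # f_1 = argmin_f R(f) = 0

# Stability bound from Eqs. (21)-(22): ||f_{t+1} - f_t|| <= mu * ||grad_t||
ok = all(
    np.linalg.norm(iterates[t + 1] - iterates[t])
    <= mu * np.linalg.norm(grads[t]) + 1e-12
    for t in range(T)
)
print(ok)  # prints True
```

For this particular regularizer the bound holds with equality, since $f_{t+1} - f_t = -\mu \nabla_t$ exactly.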
Note that $\sum_{t=1}^{T} e_{t+1}^2$ can be rewritten as $\sum_{t=1}^{T} e_t^2 + (e_{T+1}^2 - e_1^2)$. When summing the above inequality, we will repeatedly use this trick to get rid of $\sum_{t=1}^{T} e_{t+1}^2$ by replacing it with $\sum_{t=1}^{T} e_t^2 + (e_{T+1}^2 - e_1^2)$. Since the left-hand side telescopes to $D_R(f^*, f_{T+1}) - D_R(f^*, f_1) \ge -D_R(f^*, f_1)$, we have:
\[
-D_R(f^*, f_1) \le \sum_{t=1}^{T} \big( D_R(f^*, f_{t+1}) - D_R(f^*, f_t) \big) \le \Big[ \mu^2 \Big( \frac{2}{b} + \frac{2\gamma^2}{b} + \frac{\alpha X^2}{2}(1+\gamma)^2 \Big) - 2\mu(1-\gamma) \Big] \sum_{t=1}^{T} e_t^2 + b \sum_{t=1}^{T} e_t^{*2} + C, \tag{26}
\]
where $C$ is a constant that depends only on $e_1^2$ and $e_{T+1}^2$, which under our assumptions are finitely bounded real numbers. Now let us set
\[
\mu = \frac{1-\gamma}{\frac{2}{b} + \frac{2\gamma^2}{b} + \frac{\alpha X^2}{2}(1+\gamma)^2},
\]
which maximizes $2\mu(1-\gamma) - \mu^2\big(\frac{2}{b} + \frac{2\gamma^2}{b} + \frac{\alpha X^2}{2}(1+\gamma)^2\big)$. After rearranging terms, we have:
\[
\sum_{t=1}^{T} e_t^2 \le \frac{\frac{2}{b} + \frac{2\gamma^2}{b} + \frac{\alpha X^2}{2}(1+\gamma)^2}{(1-\gamma)^2} \Big( D_R(f^*, f_1) + b \sum_{t=1}^{T} e_t^{*2} + C \Big). \tag{27}
\]
We can further tighten the RHS of the above inequality by optimizing with respect to $b$. Under our assumption that $|e_t^*|$ is always bounded, we have $\sum e_t^{*2} \le T E^2$, where $E = \sup_t |e_t^*|$. We also assume that $D_R(f^*, f_1) \le R$, $R \in \mathbb{R}^+$. The RHS of the above inequality can be upper bounded as:
\[
\begin{aligned}
&\frac{2+2\gamma^2}{(1-\gamma)^2} \sum e_t^{*2} + \frac{2+2\gamma^2}{b(1-\gamma)^2}(R+C) + b\,\frac{(\alpha X^2/2)(1+\gamma)^2}{(1-\gamma)^2} \sum e_t^{*2} + \frac{(\alpha X^2/2)(1+\gamma)^2}{(1-\gamma)^2}(R+C) \\
&\quad\le \frac{2+2\gamma^2}{(1-\gamma)^2} \sum e_t^{*2} + \frac{(\alpha X^2/2)(1+\gamma)^2}{(1-\gamma)^2}(R+C) + \frac{2(1+\gamma)^2 X E}{(1-\gamma)^2}\sqrt{\alpha(R+C)T}, \tag{28}
\end{aligned}
\]
where the last inequality comes from the fact that $\sum e_t^{*2} \le T E^2$ and from setting $b = \sqrt{\frac{4(R+C)}{\alpha X^2 E^2 T}} = O(1/\sqrt{T})$. Substituting $b$ back into $\mu$, we have:
\[
\mu = \frac{1-\gamma}{(1+\gamma^2)\sqrt{\frac{\alpha X^2 E^2 T}{R+C}} + \frac{\alpha X^2}{2}(1+\gamma)^2} = O\Big(\frac{1}{\sqrt{T}}\Big). \tag{29}
\]
With this $\mu$, for $\sum e_t^2$ we have:
\[
\sum e_t^2 \le \frac{2+2\gamma^2}{(1-\gamma)^2} \sum e_t^{*2} + \frac{(\alpha X^2/2)(1+\gamma)^2}{(1-\gamma)^2}(R+C) + \frac{2(1+\gamma)^2 X E}{(1-\gamma)^2}\sqrt{\alpha(R+C)T} = \frac{2+2\gamma^2}{(1-\gamma)^2} \sum e_t^{*2} + O(\sqrt{T}). \tag{30}
\]
Now for the average prediction error, we divide both sides of the above inequality by $T$ and take $T \to \infty$:
\[
\lim_{T\to\infty} \frac{\sum e_t^2}{T} \le \frac{2+2\gamma^2}{(1-\gamma)^2} \lim_{T\to\infty} \frac{\sum e_t^{*2}}{T}. \tag{31}
\]
Hence we prove the theorem.

B Proof of Theorem 3.4

Proof. The proof is similar to the one for OMD-TD*(0).
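The step from Eq. (26) to Eq. (27) chooses $\mu$ to maximize the concave quadratic $2\mu(1-\gamma) - \mu^2 A$, where $A$ denotes the bracketed coefficient; the maximizer is $\mu = (1-\gamma)/A$ with maximum value $(1-\gamma)^2/A$. A quick numerical check of this calculus step, with hypothetical stand-in values for $(1-\gamma)$ and $A$:

```python
import numpy as np

one_minus_g, A = 0.3, 5.0             # stand-ins for (1 - gamma) and the bracket A
mus = np.linspace(0.0, 1.0, 100001)   # grid of candidate step sizes mu
vals = 2 * mus * one_minus_g - mus ** 2 * A

# Closed form: maximizer mu* = (1-gamma)/A, maximum value (1-gamma)^2 / A
best_mu = mus[np.argmax(vals)]
ok = np.isclose(best_mu, one_minus_g / A, atol=1e-4) and np.isclose(
    vals.max(), one_minus_g ** 2 / A, atol=1e-6
)
print(ok)  # prints True
```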
Again, we start by quantifying the progress made from step $t$ to $t+1$. With $\Delta f_t = f_{t+1} - f_t$ denoting the update, we have:
\[
\| f_{t+1} - f^* \|^2 = \| f_t - f^* \|^2 + 2\Delta f_t (f_t - f^*) + \|\Delta f_t\|^2. \tag{32}
\]
Substituting the implicit update $\Delta f_t = -\frac{\mu(e_t - \gamma e_{t+1})}{1 + \mu K(x_t, x_t)} K(x_t, \cdot)$ and using $K(x_t, x_t) \le X^2$, we have:
\[
\begin{aligned}
\| f_{t+1} - f^* \|^2
&\le \| f_t - f^* \|^2 + \frac{2\mu}{1 + \mu K(x_t, x_t)} (e_t - \gamma e_{t+1})(e_t^* - e_t) + \frac{\mu^2 X^2}{(1 + \mu K(x_t, x_t))^2} (e_t - \gamma e_{t+1})^2 \\
&\le \| f_t - f^* \|^2 + \frac{2\mu}{1 + \mu K(x_t, x_t)} (e_t - \gamma e_{t+1})(e_t^* - e_t) + \mu^2 X^2 (e_t - \gamma e_{t+1})^2. \tag{33}
\end{aligned}
\]
Now for notational simplicity, let us define $\xi_t = \frac{\mu}{1 + \mu K(x_t, x_t)}$. We have $\mu/(1+\mu X^2) \le \xi_t \le \mu$. Substituting $\xi_t$ into the above inequality, we have:
\[
\begin{aligned}
\| f_{t+1} - f^* \|^2 - \| f_t - f^* \|^2
&\le 2\xi_t (e_t - \gamma e_{t+1})(e_t^* - e_t) + \mu^2 X^2 (e_t - \gamma e_{t+1})^2 \\
&= 2\xi_t e_t e_t^* - 2\xi_t e_t^2 - 2\gamma \xi_t e_{t+1} e_t^* + 2\gamma \xi_t e_{t+1} e_t + \mu^2 X^2 e_t^2 - 2\gamma \mu^2 X^2 e_t e_{t+1} + \gamma^2 \mu^2 X^2 e_{t+1}^2 \\
&\le \frac{2\xi_t^2}{b} e_t^2 + \frac{b}{2} e_t^{*2} - 2\xi_t e_t^2 + \frac{2\gamma^2 \xi_t^2}{b} e_{t+1}^2 + \frac{b}{2} e_t^{*2} + \gamma \xi_t e_{t+1}^2 + \gamma \xi_t e_t^2 \\
&\qquad + \mu^2 X^2 e_t^2 + \gamma \mu^2 X^2 e_t^2 + \gamma \mu^2 X^2 e_{t+1}^2 + \gamma^2 \mu^2 X^2 e_{t+1}^2 \\
&\le \frac{2\mu^2}{b} e_t^2 + \frac{b}{2} e_t^{*2} - \frac{2\mu}{1+\mu X^2} e_t^2 + \frac{2\gamma^2 \mu^2}{b} e_{t+1}^2 + \frac{b}{2} e_t^{*2} + \gamma \mu e_{t+1}^2 + \gamma \mu e_t^2 \\
&\qquad + \mu^2 X^2 e_t^2 + \gamma \mu^2 X^2 e_t^2 + \gamma \mu^2 X^2 e_{t+1}^2 + \gamma^2 \mu^2 X^2 e_{t+1}^2, \tag{34}
\end{aligned}
\]
where the cross terms are split with Young's inequality exactly as in the proof of Theorem 3.3, and the last step uses $\mu/(1+\mu X^2) \le \xi_t \le \mu$. Now let us sum the above inequality from $t = 1$ to $T$, using the same index-shift trick as in the proof of Theorem 3.3:
\[
-\| f_1 - f^* \|^2 \le \Big[ \mu^2 \Big( \frac{2}{b} + \frac{2\gamma^2}{b} + X^2 (1+\gamma)^2 \Big) - 2\mu \Big( \frac{1}{1+\mu X^2} - \gamma \Big) \Big] \sum_{t=1}^{T} e_t^2 + b \sum_{t=1}^{T} e_t^{*2} + C, \tag{35}
\]
where $C$ again is a bounded constant that depends only on $e_1^2$ and $e_{T+1}^2$. We first need $1/(1+\mu X^2) > \gamma$, hence we need $\mu < (1/\gamma - 1)/X^2$. Eventually we will set $\mu = O(1/\sqrt{T})$, as we show later (and we assume $T$ is big enough). Hence we simply assume here that $\mu \le \frac{\gamma(1-\gamma)}{(1+\gamma^2) X^2}$, which gives
\[
\frac{1}{1+\mu X^2} - \gamma \ge \frac{1-\gamma}{1+\gamma}.
\]
Now let us rearrange the terms in Eq. 35. We move the term with $\sum e_t^2$ to the LHS and the term $\| f_1 - f^* \|^2$ to the RHS, and set
\[
\mu = \frac{1-\gamma}{(1+\gamma)\big( \frac{2}{b} + \frac{2\gamma^2}{b} + X^2 (1+\gamma)^2 \big)},
\]
which yields:
\[
\sum_{t=1}^{T} e_t^2 \le \frac{(1+\gamma)^2 \big( \frac{2}{b} + \frac{2\gamma^2}{b} + X^2 (1+\gamma)^2 \big)}{(1-\gamma)^2} \Big( \| f_1 - f^* \|^2 + b \sum_{t=1}^{T} e_t^{*2} + C \Big). \tag{36}
\]
Based on our assumptions, $\| f_1 - f^* \|^2 \le R$ and $\sum e_t^{*2} \le E^2 T$. We can further tighten the upper bound by optimizing $b$.
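The sandwich $\mu/(1+\mu X^2) \le \xi_t \le \mu$ follows because $\xi_t = \mu/(1+\mu K(x_t, x_t))$ is decreasing in $K(x_t, x_t)$ and $0 \le K(x_t, x_t) \le X^2$. A small numerical check of this claim, with hypothetical values for $\mu$, $X$, and sampled kernel diagonals:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, X = 0.05, 2.0                               # stand-in step size and feature bound
k_diag = rng.uniform(0.0, X ** 2, size=1000)    # sampled K(x_t, x_t) values in [0, X^2]

xi = mu / (1.0 + mu * k_diag)                   # xi_t = mu / (1 + mu * K(x_t, x_t))

# Claimed sandwich: mu/(1 + mu*X^2) <= xi_t <= mu
ok = bool(np.all(xi >= mu / (1.0 + mu * X ** 2)) and np.all(xi <= mu))
print(ok)  # prints True
```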
The RHS of the above inequality can be upper bounded as:
\[
\frac{(1+\gamma)^2}{(1-\gamma)^2} \Big[ (2+2\gamma^2) \sum e_t^{*2} + \frac{2+2\gamma^2}{b}(R+C) + b X^2 (1+\gamma)^2 \sum e_t^{*2} + X^2 (1+\gamma)^2 (R+C) \Big]. \tag{37}
\]
Similar to the proof for OMD-TD*(0), we set $b = \sqrt{\frac{2(R+C)}{X^2 E^2 T}} = O(1/\sqrt{T})$. Substituting $b$ back into the expression for $\mu$, we have:
\[
\mu = \frac{1-\gamma}{(1+\gamma)\Big( (1+\gamma^2)\sqrt{\frac{2 X^2 E^2 T}{R+C}} + X^2 (1+\gamma)^2 \Big)} = O\Big(\frac{1}{\sqrt{T}}\Big).
\]
With this $\mu$ and $b$, the above expression gives:
\[
\sum e_t^2 \le \frac{(2+2\gamma^2)(1+\gamma)^2}{(1-\gamma)^2} \sum e_t^{*2} + \frac{X^2 (1+\gamma)^4}{(1-\gamma)^2}(R+C) + \frac{2(1+\gamma)^4 X E}{(1-\gamma)^2}\sqrt{2(R+C)T} = \frac{(2+2\gamma^2)(1+\gamma)^2}{(1-\gamma)^2} \sum e_t^{*2} + O(\sqrt{T}). \tag{38}
\]
For the average prediction error, we divide both sides of the above inequality by $T$ and take $T \to \infty$:
\[
\lim_{T\to\infty} \frac{\sum e_t^2}{T} \le \frac{(2+2\gamma^2)(1+\gamma)^2}{(1-\gamma)^2} \lim_{T\to\infty} \frac{\sum e_t^{*2}}{T}. \tag{39}
\]
Hence we prove the theorem.
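Both proofs choose $b$ by minimizing an expression of the form $A/b + Bb$, whose minimizer is $b^* = \sqrt{A/B}$ with minimum value $2\sqrt{AB}$; this is what produces the $O(\sqrt{T})$ terms in Eqs. (28) and (38). A quick numerical check of that closed form, with hypothetical stand-in coefficients:

```python
import numpy as np

A, B = 3.0, 7.0                           # stand-ins for the two coefficients in A/b + B*b
bs = np.linspace(1e-3, 10.0, 200001)      # grid of candidate values of b > 0
vals = A / bs + B * bs

# Closed form: minimizer b* = sqrt(A/B), minimum value 2*sqrt(A*B)
ok = bool(np.isclose(vals.min(), 2 * np.sqrt(A * B), rtol=1e-6))
print(ok)  # prints True
```

The same calculation justifies the simplified choices $b = \sqrt{4(R+C)/(\alpha X^2 E^2 T)}$ and $b = \sqrt{2(R+C)/(X^2 E^2 T)}$ used above, after dropping $\gamma$-dependent factors that are at most $1$.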