Supplementary Material for Neural Universal Discrete Denoiser

Taesup Moon
DGIST, Daegu, Korea 42988
[email protected]

Seonwoo Min, Byunghan Lee, Sungroh Yoon
Seoul National University, Seoul, Korea 08826
{mswzeus, styxkr, sryoon}@snu.ac.kr

1  Proof of Lemma 1

Lemma 1  Define $\mathbf{L}_{\mathrm{new}} \triangleq -\mathbf{L} + L_{\max}\mathbf{1}\mathbf{1}^\top$, in which $L_{\max} \triangleq \max_{z,s}\mathbf{L}(z,s)$ is the maximum element of $\mathbf{L}$. Using the cost function $\mathcal{C}(\cdot,\cdot)$ defined above, for each $\mathbf{c} \in \mathcal{C}_k$, let us define
$$\mathbf{p}^*(\mathbf{c}) \triangleq \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathcal{C}\big(\mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i},\, \mathbf{p}\big).$$
Then, we have $s_{k,\mathrm{DUDE}}(\mathbf{c},\cdot) = \arg\max_s \mathbf{p}^*(\mathbf{c})_s$.

Proof: Recalling
$$\hat{\mathbf{p}}(\mathbf{c}) \triangleq \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbb{1}_{z_i}^\top \mathbf{L}\,\mathbf{p}, \tag{1}$$
we derive
$$\hat{\mathbf{p}}(\mathbf{c}) = \arg\max_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbb{1}_{z_i}^\top \underbrace{(-\mathbf{L} + L_{\max}\mathbf{1}\mathbf{1}^\top)}_{=\,\mathbf{L}_{\mathrm{new}}} \mathbf{p} = \arg\max_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \big(\mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i}\big)^\top \mathbf{p}, \tag{2}$$
in which the first equality follows from flipping the sign of $\mathbf{L}$ and from the fact that the arg max does not change when a constant is added to the objective; indeed, $\mathbb{1}_{z_i}^\top \mathbf{1}\mathbf{1}^\top \mathbf{p} = 1$ for every $\mathbf{p} \in \Delta^{|\mathcal{S}|}$, so the added term contributes the constant $L_{\max}$ to each summand. Furthermore, since $\mathcal{C}(\cdot,\cdot)$ is linear in its first argument,
$$\mathbf{p}^*(\mathbf{c}) = \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathcal{C}\big(\mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i},\, \mathbf{p}\big) = \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \mathcal{C}\Big(\sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i},\ \mathbf{p}\Big). \tag{3}$$
Now, comparing (2) and (3), and using the fact that $\sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i} \in \mathbb{R}^{|\mathcal{S}|}_+$, we can show that
$$\arg\max_s \hat{\mathbf{p}}(\mathbf{c})_s = \arg\max_s \mathbf{p}^*(\mathbf{c})_s = \arg\max_s \Big(\sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i}\Big)_s$$
by considering the Lagrangian of (3) and applying the KKT conditions. That is, $\mathbf{p}^*(\mathbf{c})$ no longer lies at a vertex of $\Delta^{|\mathcal{S}|}$, but it still places its maximum probability mass on the coordinate of the vertex $\hat{\mathbf{p}}(\mathbf{c})$. Since $s_{k,\mathrm{DUDE}}(\mathbf{c},\cdot) = \arg\max_s \hat{\mathbf{p}}(\mathbf{c})_s$, as shown in the previous section, the proof is complete.
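The final step above invokes the Lagrangian and KKT conditions without spelling them out. As a worked sketch, assume (as in the main text of the paper; the definition of $\mathcal{C}$ is not restated in this supplement, so this is an assumption) that the cost is the unnormalized cross-entropy $\mathcal{C}(\mathbf{g},\mathbf{p}) = -\sum_s g_s \log p_s$, which is indeed linear in its first argument, and write $\mathbf{G} \triangleq \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\mathrm{new}}^\top \mathbb{1}_{z_i} \in \mathbb{R}^{|\mathcal{S}|}_+$. The Lagrangian of (3) with the simplex constraint $\mathbf{1}^\top\mathbf{p} = 1$ is
$$\mathcal{L}(\mathbf{p},\lambda) = -\sum_s G_s \log p_s + \lambda\Big(\sum_s p_s - 1\Big),$$
and stationarity gives $-G_s/p_s + \lambda = 0$, i.e., $p_s = G_s/\lambda$. Summing over $s$ and using the constraint yields $\lambda = \sum_s G_s$, hence
$$\mathbf{p}^*(\mathbf{c}) = \frac{\mathbf{G}}{\|\mathbf{G}\|_1} \quad\Longrightarrow\quad \arg\max_s \mathbf{p}^*(\mathbf{c})_s = \arg\max_s G_s,$$
and since $G_s = |\{i:\,\mathbf{c}_i=\mathbf{c}\}|\,L_{\max} - \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}(z_i,s)$, this arg max coincides with the minimizing vertex of (1), i.e., with $\arg\max_s \hat{\mathbf{p}}(\mathbf{c})_s$.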
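For a concrete check, the following is a minimal numerical sketch of Lemma 1, under stated assumptions: hypothetical alphabet sizes, a random loss matrix, and the cross-entropy cost assumed above. It verifies that the vertex minimizer $\hat{\mathbf{p}}(\mathbf{c})$ of (1) and the normalized minimizer $\mathbf{p}^*(\mathbf{c})$ of (3) select the same reconstruction symbol.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: noisy alphabet |Z| = clean alphabet |S| = 4.
L = rng.uniform(0.0, 1.0, size=(4, 4))   # loss matrix L(z, s) >= 0
L_max = L.max()
L_new = -L + L_max * np.ones_like(L)     # L_new = -L + L_max * 1 1^T

# Hypothetical noisy symbols z_i observed in contexts with c_i = c.
z = rng.integers(0, 4, size=100)

# G = sum_i L_new^T 1_{z_i}: row z_i of L_new, summed over occurrences.
G = L_new[z].sum(axis=0)                 # entrywise nonnegative

# p_hat(c): the objective in (1) is linear in p, so the minimum over
# the simplex is attained at the vertex e_s minimizing sum_i L(z_i, s).
p_hat = np.zeros(4)
p_hat[L[z].sum(axis=0).argmin()] = 1.0

# p_star(c) under the (assumed) cross-entropy cost: by the KKT
# conditions derived above, it is G normalized to sum to one.
p_star = G / G.sum()

# Lemma 1: both rules place their maximum mass on the same symbol.
assert p_hat.argmax() == p_star.argmax() == G.argmax()
print("chosen symbol:", G.argmax())
print("p_hat :", p_hat)
print("p_star:", np.round(p_star, 3))

Note the role of the shift by $L_{\max}$: it makes every entry of $\mathbf{G}$ nonnegative, so $\mathbf{G}$ can serve as an unnormalized target for the cross-entropy cost, and the normalization defining $\mathbf{p}^*(\mathbf{c})$ preserves the arg max.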