
Supplementary Material for
Neural Universal Discrete Denoiser
Taesup Moon
DGIST
Daegu, Korea 42988
[email protected]
Seonwoo Min, Byunghan Lee, Sungroh Yoon
Seoul National University
Seoul, Korea 08826
{mswzeus, styxkr, sryoon}@snu.ac.kr
Proof of Lemma 1
Lemma 1 Define $\mathbf{L}_{\text{new}} \triangleq -\mathbf{L} + L_{\max}\mathbf{1}\mathbf{1}^\top$, in which $L_{\max} \triangleq \max_{z,s}\mathbf{L}(z,s)$ is the maximum element of $\mathbf{L}$. Using the cost function $\mathcal{C}(\cdot,\cdot)$ defined above, for each $\mathbf{c} \in \mathcal{C}_k$, let us define
\[
\mathbf{p}^*(\mathbf{c}) \triangleq \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathcal{C}\big(\mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i},\,\mathbf{p}\big).
\]
Then, we have $s_{k,\text{DUDE}}(\mathbf{c},\cdot) = \arg\max_s \mathbf{p}^*(\mathbf{c})_s$.
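Throughout the proof, we use the form of the cost function defined in the main text, the (unnormalized) cross-entropy $\mathcal{C}(\mathbf{g},\mathbf{p}) \triangleq -\sum_{s} g_s \log p_s$; we restate it here since both the linearity step and the KKT step below rely on this form.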
Proof: Recalling
\[
\hat{\mathbf{p}}(\mathbf{c}) \triangleq \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbb{1}_{z_i}^\top \mathbf{L}\,\mathbf{p}, \tag{1}
\]
we derive
\[
\hat{\mathbf{p}}(\mathbf{c}) = \arg\max_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbb{1}_{z_i}^\top \underbrace{(-\mathbf{L} + L_{\max}\mathbf{1}\mathbf{1}^\top)}_{=\,\mathbf{L}_{\text{new}}} \mathbf{p}
= \arg\max_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \big(\mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i}\big)^\top \mathbf{p}, \tag{2}
\]
in which the first equality follows from flipping the sign of $\mathbf{L}$ (turning the minimization in (1) into a maximization) and from the fact that the arg max does not change when a constant is added to the objective; note that adding $L_{\max}\mathbf{1}\mathbf{1}^\top$ adds only the constant $L_{\max}$ to each summand, since $\mathbb{1}_{z_i}^\top\mathbf{1}\mathbf{1}^\top\mathbf{p} = \mathbf{1}^\top\mathbf{p} = 1$ for $\mathbf{p}\in\Delta^{|\mathcal{S}|}$. Furthermore, since $\mathcal{C}(\cdot,\cdot)$ is linear in its first argument,
\[
\mathbf{p}^*(\mathbf{c}) = \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathcal{C}\big(\mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i},\,\mathbf{p}\big)
= \arg\min_{\mathbf{p}\in\Delta^{|\mathcal{S}|}} \mathcal{C}\Big(\sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i},\,\mathbf{p}\Big). \tag{3}
\]
Now, from comparing (2) and (3), and from the fact that $\sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i} \in \mathbb{R}_+^{|\mathcal{S}|}$, we can show that
\[
\arg\max_s \hat{\mathbf{p}}(\mathbf{c})_s = \arg\max_s \mathbf{p}^*(\mathbf{c})_s = \arg\max_s \Big(\sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i}\Big)_s
\]
by considering the Lagrangian of (3) and applying the KKT conditions. That is, $\mathbf{p}^*(\mathbf{c})$ no longer lies on one of the vertices of $\Delta^{|\mathcal{S}|}$, but it still puts its maximum probability mass on the coordinate selected by the vertex $\hat{\mathbf{p}}(\mathbf{c})$. Since $s_{k,\text{DUDE}}(\mathbf{c},\cdot) = \arg\max_s \hat{\mathbf{p}}(\mathbf{c})_s$, as shown in the previous section, the proof is done. $\square$
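For completeness, a sketch of the KKT computation referenced above, under the cross-entropy form of $\mathcal{C}$ recalled after the lemma statement. Writing $\mathbf{g} \triangleq \sum_{\{i:\,\mathbf{c}_i=\mathbf{c}\}} \mathbf{L}_{\text{new}}^\top \mathbb{1}_{z_i}$, the Lagrangian of (3) is
\[
\Lambda(\mathbf{p},\lambda) = -\sum_{s} g_s \log p_s + \lambda\Big(\sum_{s} p_s - 1\Big),
\]
and setting $\partial\Lambda/\partial p_s = -g_s/p_s + \lambda = 0$ gives $p_s = g_s/\lambda$; the constraint $\sum_s p_s = 1$ then yields $\lambda = \sum_{s'} g_{s'}$, hence $\mathbf{p}^*(\mathbf{c})_s = g_s/\sum_{s'} g_{s'}$. Since $\mathbf{g}\in\mathbb{R}_+^{|\mathcal{S}|}$, the nonnegativity multipliers are inactive at this point, and $\arg\max_s \mathbf{p}^*(\mathbf{c})_s = \arg\max_s g_s$, as claimed.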
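As an illustration only (not part of the paper), the following short Python sketch numerically checks the claim of Lemma 1 for a random loss matrix, assuming the cross-entropy form of $\mathcal{C}$ above; the variable names (L, L_new, g, counts) are ours.

import numpy as np

rng = np.random.default_rng(0)
n_z, n_s = 4, 5                             # |Z| noisy symbols, |S| single-symbol denoisers
L = rng.uniform(0.0, 1.0, (n_z, n_s))       # loss matrix L(z, s)
L_new = -L + L.max() * np.ones((n_z, n_s))  # L_new = -L + L_max * 1 1^T  (entrywise)

counts = rng.integers(1, 20, n_z).astype(float)  # occurrences of each z within context c
g = L_new.T @ counts                        # g = sum_{i: c_i = c} L_new^T 1_{z_i}

# DUDE choice, per (2): the simplex vertex maximizing g^T p, i.e. argmax_s g_s.
s_dude = int(np.argmax(g))

# Cross-entropy minimizer over the simplex (the KKT solution): p* = g / sum(g).
p_star = g / g.sum()

assert int(np.argmax(p_star)) == s_dude     # Lemma 1: same single-symbol denoiser
print("s_DUDE =", s_dude, " p*(c) =", np.round(p_star, 3))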