1 Supplementary Material: Complete Proof of Theorem 4

1
Supplementary Material: Complete Proof of Theorem 4
e λ and H
e α as
Define two new functions L




dj −1
X
X
X
e λ := 1 
πΘj 
λh 
L
πΘj λdj − 1 =
λ−1
j∈L
j∈L
h=0
1
e α := 1 −
H
P
α1 ,
K
α
k=1 πΘk
e λ is related to the cost function Lλ (Π) as
where L
e λ + 1,
λLλ (Π) = (λ − 1)L
(1)
e α is related to the α-Rényi entropy Hα (Πy ) as
and H
K
K
X
X
1
1
α
α
πΘ
πΘ
Hα (Πy ) =
log2
log2
k =
k = logλ
1−α
α log2 λ
k=1
k=1
! α1
! α1
K
K
X
X
α
α
eα + 1
=⇒ λHα (Πy ) =
πΘ
=
πΘ
H
k
k
k=1
K
X
! α1
α
πΘ
k
(2a)
k=1
(2b)
k=1
where we use the definition of α, i.e., α =
1
1+log2 λ
in (2a).
e λ can be decomposed as
Now, we note from Lemma 1 that L
X
eλ =
L
λda πΘa
a∈I
=⇒ λ
Lλ (Π)
=1+
X
(λ − 1)λda πΘa
(3)
a∈I
where da denotes the depth of internal node ‘a’ in the tree T . Similarly, note from Lemma 2 that
e α can be decomposed as
H
X
1
eα =
H
πΘa Dα (Θa ) − πΘl(a) Dα (Θl(a) ) − πΘr(a) Dα (Θr(a) )
P
α1
K
α
a∈I
k=1 πΘk
X
=⇒ λHα (Πy ) = 1 +
πΘa Dα (Θa ) − πΘl(a) Dα (Θl(a) ) − πΘr(a) Dα (Θr(a) ) .
(4)
a∈I
Finally, the result follows from (3) and (4) above.
e λ can be decomposed over the internal nodes in a tree T , as
Lemma 1. The function L
X
eλ =
L
λda πΘa
a∈I
where da denotes the depth of internal node a ∈ I and πΘa is the probability mass of the objects at
that node.
Proof. Let Ta denote a subtree from any internal node ‘a’ in the tree T and let Ia , La denote the set
e a in the subtree Ta to
of internal nodes and leaf nodes in the subtree Ta , respectively. Then, define L
λ
be
h
i
πΘj Pda
j −1
h
ea = P
λ
L
λ
j∈La πΘ
h=0
a
where daj denotes the depth of leaf node j ∈ La in the subtree Ta .
1
Now, we show using induction that for any subtree Ta in the tree T , the following relation holds
X a
ea =
πΘa L
λds πΘs
(5)
λ
s∈Ia
where
das
denotes the depth of internal node s ∈ Ia in the subtree Ta .
The relation holds trivially for any subtree Ta rooted at an internal node a ∈ I whose both child
nodes terminate as leaf nodes, with both the left hand side and the right hand side of the expression
equal to πΘa . Now, consider a subtree Ta rooted at an internal node a ∈ I whose left child (or
right child) alone terminates as a leaf node. Assume that the above relation holds true for the subtree
rooted at the right child of node ‘a’. Then,
 a

dj −1
X
X
e aλ =
πΘa L
πΘj 
λh 
j∈La
h=0
da
j −1

X
=
X
πΘj +
{j∈La :da
j =1}
πΘj 
{j∈La :da
j >1}
= πΘl(a) +
λh 

r(a)
dj −1
 X h
λ 
πΘj 
j∈Lr(a)
h=0
dr(a)
s
X
= πΘa + λ

h=0

X
X
πΘj 1 + λ
{j∈La :da
j >1}
= πΘa + λ
λh 
h=0
da
j −2

X

X
λ
πΘs
s∈Ir(a)
where the last step follows from the induction hypothesis. Finally, consider a subtree Ta rooted at
an internal node a ∈ I whose neither child node terminates as a leaf node. Assume that the relation
in (5) holds true for the subtrees rooted at its left and right child nodes. Then,
 a

dj −1
X
X
e aλ =
πΘa L
πΘj 
λh 
j∈La
h=0
da
j −2

=
X
πΘj 1 + λ
j∈Ll(a)
X
λh  +
πΘj 1 + λ
l(a)
dj −1

λh 
h=0
j∈Ll(a)


j∈Lr(a)
h=0

= πΘa + λ 
X

r(a)
dj −1
X
X
X




λh 
πΘj 
λh  + λ
πΘj 

= πΘa + λ
X
j∈Lr(a)
h=0
X
da
j −2


h=0

X
l(a)
λds πΘs +
s∈Il(a)
X
s∈Ir(a)
r(a)
λds
πΘs  =
X
a
λds πΘs
s∈Ia
thereby completing the induction. Finally, the result follows by applying the relation in (5) to the
tree T whose probability mass at the root node, πΘa = 1.
e α can be decomposed over the internal nodes in a tree T , as
Lemma 2. The function H
X
1
eα =
H
πΘa Dα (Θa ) − πΘl(a) Dα (Θl(a) ) − πΘr(a) Dα (Θr(a) )
P
α1
K
α
a∈I
k=1 πΘk
hP
π k α i α1
Θa
K
where Dα (Θa ) :=
and πΘa denotes the probability mass of the objects at any
k=1 πΘa
internal node a ∈ I.
2
Proof. Let Ta denote a subtree from any internal node ‘a’ in the tree T and let Ia denote the set of
e αa in a subtree Ta to be
internal nodes in the subtree Ta . Then, define H
πΘa
e αa = 1 −
H
PK
k=1
1
α
παi
Θa
Now, we show using induction that for any subtree Ta in the tree T , the following relation holds
"K
# α1
X
X
ea =
H
πΘ Dα (Θs ) − πΘ Dα (Θl(s) ) − πΘ Dα (Θr(s) )
(6)
πα k
Θa
α
s
l(s)
r(s)
s∈Ia
k=1
Note that the relation holds trivially for any subtree Ta rooted at an internal node a ∈ I whose both
child nodes terminate as leaf nodes. Now, consider a subtree Ta rooted at any other internal node
a ∈ I. Assume the above relation holds true for the subtrees rooted at its left and right child nodes.
Then,
"K
# α1
"K
# α1
"K
# α1
X
X
X
e αa =
πα k
H
πα k
− πΘ =
πα k
− πΘ − πΘ
Θa
Θa
k=1
Θa
a
k=1
"
=
K
X
# α1
α
πΘ
k
a
"
−
K
X
# α1
α
πΘ
k
l(a)
"
K
X
# α1
α
πΘ
k
l(a)
"
−
k=1
k=1
+
l(a)
r(a)
k=1
K
X
# α1
α
πΘ
k
r(a)
k=1

"
− πΘl(a)  + 
k=1
K
X
# α1
α
πΘ
k
r(a)

− πΘr(a) 
k=1
= πΘa Dα (Θa ) − πΘl(a) Dα (Θl(a) ) − πΘr(a) Dα (Θr(a) )
"K
# α1
"K
# α1
X
X
l(a)
α
α
eα +
e αr(a)
H
πΘk
H
+
πΘk
r(a)
l(a)
k=1
k=1
=
X
πΘs Dα (Θs ) − πΘl(s) Dα (Θl(s) ) − πΘr(s) Dα (Θr(s) )
s∈Ia
where the last step follows from the induction hypothesis. Finally, the result follows by applying the
relation in (6) to the tree T .
2
Proof of Theorem 3
The result in Theorem 3 follows from the above result where each group is of size one, thereby
reducing Dα (Θa ) to

 α1
#1
"M X πi α
X πi I{θi ∈ Θa } α α
 ,
=
Dα (Θa ) =
πΘa
πΘa
i=1
{i:θi ∈ Θa }
where I{θi ∈ Θa } is the indicator function which takes the value one when θi ∈ Θa , and zero otherwise.
3
Proof of Theorem 2
The result in Theorem 2 is a special case of that in Theorem 4 when λ → 1. It follows by taking the
logarithm to the base λ on both sides of equation
X
πΘr(a)
πΘl(a)
πΘa (λ − 1)λda − Dα (Θa ) +
Dα (Θl(a) ) +
Dα (Θr(a) ) ,
λLλ (Π) = λHα (Πy ) +
πΘa
πΘa
a∈I
and then finding the limit as λ → 1.
3
Using L’Hôpital’s rule, the left hand side (LHS) of the equation reduces to
X
lim logλ (LHS) = lim Lλ (Π) =
πΘj dj ,
λ→1
where Lλ (Π) = logλ
to
j∈L
πΘj λdj . Similarly, the right hand side (RHS) of the equation reduces
lim logλ (RHS) = H(Πy ) +
λ→1
X
a∈I
where H(Θa ) = −
j∈L
P
PK
λ→1
πΘk
πΘl(a)
πΘr(a)
πΘa 1 − H(Θa ) −
H(Θl(a) ) −
H(Θr(a) ) ,
πΘa
πΘa
log2
a
k=1 πΘa
π
Θk
a
πΘa
.
Finally, the result follows by noticing that
πΘl(a)
πΘr(a)
H(Θa ) −
H(Θl(a) ) −
H(Θr(a) )
πΘa
πΘa
"K
!
!
!#
πΘl(a)
πΘr(a)
1 X
πΘa
πΘka log2
− πΘk log2
− πΘk log2
(7a)
=
l(a)
r(a)
πΘa
πΘka
πΘk
πΘk
l(a)
r(a)
k=1
"K
!
!#
πΘk
1 X
πΘa πΘkl(a)
πΘa
r(a)
=
πΘk log2
·
+ πΘk log2
·
(7b)
l(a)
r(a)
πΘa
πΘl(a) πΘka
πΘr(a)
πΘka
k=1
"
!
!
1
πΘa
πΘa
πΘl(a) log2
+ πΘr(a) log2
=
πΘa
πΘl(a)
πΘr(a)
!
!#
K
πΘk
πΘk
X
r(a)
l(a)
+ πΘk log2
(7c)
+
πΘk log2
r(a)
l(a)
πΘka
πΘka
k=1
= H(ρa ) +
K
X
πΘk
a
k=1
πΘa
H(ρka ),
(7d)
where (7b) follows from (7a) by using the relation πΘka = πΘk
l(a)
(7c) using the definitions of ρa and
4
+ πΘk
r(a)
, and (7d) follows from
ρka .
Proof of Theorem 1
The result in Theorem 1 follows from the above result where each group is of size one, thereby
having ρka = 1 ∀k at each internal node a ∈ I. It can also be derived as the limiting case of the
relation in Theorem 3 by taking logarithm to the base λ on both sides of the relation and letting
λ → 1.
4