Information Theory

Information & Entropy
Shannon Information Axioms

• Low-probability events should carry more information than high-probability events.
– “the nice person” (common words → lower information)
– “philanthropist” (less used → more information)
• Information from two independent events should add (see the numeric check below).
– “engineer” → information I1
– “stuttering” → information I2
– “stuttering engineer” → information I1 + I2
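A minimal sketch of the additivity property (the word probabilities below are made up purely for illustration): for independent events, -\log_2(p_1 p_2) = -\log_2(p_1) - \log_2(p_2).

import math

# Hypothetical word probabilities, chosen only to illustrate additivity.
p_engineer = 0.01
p_stuttering = 0.05

I1 = -math.log2(p_engineer)                      # information of "engineer"
I2 = -math.log2(p_stuttering)                    # information of "stuttering"
I_joint = -math.log2(p_engineer * p_stuttering)  # information of the pair (independence assumed)

print(I1 + I2, I_joint)  # both ~10.97 bits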
Shannon Information
I = -\log_2(p)

[Plot: Shannon information I = -\log_2(p) versus p, for 0 < p \le 1; I grows without bound as p \to 0 and equals 0 at p = 1.]
Information Units



• \log_2 – bits
• \log_e – nats
• \log_{10} – bans (or hartleys)
Ralph Vinton Lyon Hartley (1888–1970), inventor of the electronic oscillator circuit that bears his name and a pioneer in the field of Information Theory.
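A short sketch of the three units (the probability value is an arbitrary example): the same event's information, measured in bits, nats, and hartleys.

import math

p = 0.25  # example probability (arbitrary choice)

I_bits     = -math.log2(p)   # base-2 logarithm  -> bits
I_nats     = -math.log(p)    # natural logarithm -> nats
I_hartleys = -math.log10(p)  # base-10 logarithm -> bans / hartleys

print(I_bits, I_nats, I_hartleys)  # 2.0, ~1.386, ~0.602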
Illustration

Q: We flip a coin 10 times. What is the probability that we get the sequence 0 0 1 1 0 1 1 1 0 1?

Answer:

p = \left(\frac{1}{2}\right)^{10}

How much information do we have?

I = -\log_2(p) = -\log_2\left((1/2)^{10}\right) = 10 bits
Illustration: 20 Questions

Interval halving: we need 4 bits of information.

I = -\log_2\left(\frac{1}{16}\right) = 4 bits
Entropy

• Bernoulli trial with parameter p
• Information from a success = -\log_2(p)
• Information from a failure = -\log_2(1 - p)
• (Weighted) average information:

H = -p \log_2(p) - (1 - p) \log_2(1 - p)

• Average information = entropy
The Binary Entropy Function

h(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)

[Plot: h(p) versus p; symmetric about p = 0.5, where it peaks at 1 bit.]
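A minimal sketch of the binary entropy function (the sample values of p are arbitrary):

import math

def binary_entropy(p: float) -> float:
    """h(p) = -p*log2(p) - (1-p)*log2(1-p), with the convention h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Symmetric about p = 0.5, where it peaks at 1 bit.
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(binary_entropy(p), 4))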
Entropy Definition
H = -\sum_n p_n \log_2(p_n)

= average information
Entropy of a Uniform Distribution

p_k = \frac{1}{K}, \quad 1 \le k \le K

H = -\sum_{k=1}^{K} \frac{1}{K} \log_2\frac{1}{K} = \log_2 K
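A sketch of the general definition, checked on a uniform distribution (K = 8 is an arbitrary choice):

import math

def entropy(probs) -> float:
    """H = -sum_n p_n * log2(p_n), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

K = 8
uniform = [1 / K] * K
print(entropy(uniform), math.log2(K))  # both 3.0 bits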
Entropy as an Expected Value
H = -\sum_n p_n \log_2(p_n) = E[I] = -E\left[\log_2\big(p_X(X)\big)\right]

where

p_X(x) = \begin{cases} p_n & x = x_n \\ 0 & \text{otherwise} \end{cases}
Entropy of a Geometric RV

p_X(x) = \begin{cases} p(1 - p)^x & x = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}

then, with q = 1 - p,

H = -\sum_n p q^n \log_2(p q^n)
  = -p \sum_n q^n \left[\log_2(p) + n \log_2(q)\right]
  = \frac{h(p)}{p}

H = 2 bits when p = 0.5.
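A numeric check of the closed form, using a truncated sum (200 terms is an arbitrary cutoff; the tail contribution is negligible):

import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.5
q = 1 - p
H = -sum(p * q**k * math.log2(p * q**k) for k in range(200))

print(H, binary_entropy(p) / p)  # both 2.0 bits when p = 0.5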
Relative Entropy
p = [p_1, p_2, p_3, \ldots, p_K]
q = [q_1, q_2, q_3, \ldots, q_K]

H(p, q) = -\sum_{k=1}^{K} p_k \log_2(q_k) - H_p
        = -\sum_{k=1}^{K} p_k \log_2\left(\frac{q_k}{p_k}\right)
Relative Entropy Property
H(p, q) \ge 0

with equality iff p = q.
Relative Entropy Property Proof
Since -\ln x \ge 1 - x (the proof works in nats; the sign is unchanged because \log_2 x = \ln x / \ln 2),

H(p, q) = -\sum_{k=1}^{K} p_k \ln\left(\frac{q_k}{p_k}\right)
        \ge \sum_{k=1}^{K} p_k \left(1 - \frac{q_k}{p_k}\right)
        = \sum_{k=1}^{K} p_k - \sum_{k=1}^{K} q_k = 0
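A numeric illustration of nonnegativity (the two distributions are arbitrary examples):

import math

def relative_entropy(p, q):
    """H(p, q) = sum_k p_k * log2(p_k / q_k), in bits."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

print(relative_entropy(p, q))  # 0.25 > 0, since p != q
print(relative_entropy(p, p))  # 0.0: equality iff p = q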
Uniform Probability is Maximum Entropy

Relative to the uniform distribution q_k = 1/K:

H(p, q) = \sum_{k=1}^{K} p_k \ln\left(\frac{p_k}{1/K}\right) \ge 0

Thus, for K fixed,

H \le \log_2 K = maximum entropy

How does this relate to thermodynamic entropy?
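A quick empirical check (randomly drawn distributions over K = 8 outcomes, an arbitrary choice): the entropy never exceeds \log_2 K.

import math
import random

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

K = 8
for _ in range(5):
    weights = [random.random() for _ in range(K)]
    p = [w / sum(weights) for w in weights]
    print(round(entropy(p), 4), "<=", math.log2(K))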
Entropy as an Information Measure: Like 20 Questions

[Figure: 16 balls: four labeled 1, four labeled 2, two labeled 3, two labeled 4, and one each labeled 5, 6, 7, and 8.]

16 balls. Bill chooses one. You must find which ball with binary questions. Minimize the expected number of questions.
One Method...
[Decision tree: ask “is it 1?”, “is it 2?”, …, “is it 7?” in sequence; each “no” leads to the next question, and the answer to “is it 7?” also settles ball 8.]
E[Q] = \frac{1}{4}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(4) + \frac{1}{16}(5) + \frac{1}{16}(6) + \frac{1}{16}(7) + \frac{1}{16}(7)
     = \frac{51}{16} = 3.1875 bits
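A sketch of this calculation (the question counts encode the sequential strategy; the 7th question also settles ball 8):

from fractions import Fraction

# Ball probabilities from the slide.
probs = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8),
         5: Fraction(1, 16), 6: Fraction(1, 16), 7: Fraction(1, 16), 8: Fraction(1, 16)}

# Number of questions needed for each ball under "is it 1?", "is it 2?", ...
questions = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 7}

EQ = sum(probs[k] * questions[k] for k in probs)
print(EQ, float(EQ))  # 51/16 = 3.1875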
Another (Better) Method...
[Decision tree: each question halves the remaining probability; the first question separates {1, 2} from {3, 4, 5, 6, 7, 8}, so balls 1 and 2 are identified after 2 questions, balls 3 and 4 after 3, and balls 5 through 8 after 4.]

Longer paths have smaller probabilities.
E[Q] = \frac{1}{4}(2) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{8}(3) + \frac{1}{16}(4) + \frac{1}{16}(4) + \frac{1}{16}(4) + \frac{1}{16}(4)
     = \frac{44}{16} = 2.75 bits
Relation to Entropy...
E[Q] = \frac{44}{16} = 2.75 bits

The problem’s entropy is

H = -2 \cdot \frac{1}{4}\log_2\frac{1}{4} - 2 \cdot \frac{1}{8}\log_2\frac{1}{8} - 4 \cdot \frac{1}{16}\log_2\frac{1}{16} = \frac{44}{16} = 2.75 bits
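A sketch comparing the halving strategy to the entropy (path lengths taken from the tree above):

import math
from fractions import Fraction

probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8),
         Fraction(1, 16), Fraction(1, 16), Fraction(1, 16), Fraction(1, 16)]
lengths = [2, 2, 3, 3, 4, 4, 4, 4]  # questions per ball under the halving strategy

EQ = sum(p * l for p, l in zip(probs, lengths))
H = -sum(float(p) * math.log2(p) for p in probs)

print(float(EQ), H)  # both 2.75: E[Q] = H because every p_k is a power of 1/2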
Principle...
• The expected number of questions will equal or exceed the entropy.
• There can be equality only if all probabilities are powers of ½.
Principle Proof
Lemma: If there are K solutions and the length of the path to the kth solution is \ell_k, then

\sum_{k=1}^{K} 2^{-\ell_k} \le 1

(the Kraft inequality).
Principle Proof
E[Q] - H = \sum_{k=1}^{K} \ell_k p_k + \sum_{k=1}^{K} p_k \log_2(p_k)
         = \sum_{k=1}^{K} p_k \log_2\left(\frac{p_k}{2^{-\ell_k}}\right)

This is the relative entropy of p with respect to q_k = 2^{-\ell_k} (the nonnegativity proof above needs only \sum_k q_k \le 1, which the lemma guarantees).

Since the relative entropy is always nonnegative,

E[Q] \ge H
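A numeric check of this identity, using the sequential strategy's path lengths (the halving strategy would give 0 on both sides):

import math

probs   = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
lengths = [1, 2, 3, 4, 5, 6, 7, 7]  # sequential "is it k?" strategy

EQ = sum(p * l for p, l in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
rel_ent = sum(p * math.log2(p / 2 ** -l) for p, l in zip(probs, lengths))

print(EQ - H, rel_ent)  # both 0.4375 >= 0, so E[Q] >= H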