資訊理論 (Information Theory)
Ch2: Basic Concepts

Instructor: 陳建源
Email: [email protected]
Office: 法401
Website: http://www.csie.nuk.edu.tw/~cychen/
2.1 Self-information

Let S be a system of events $E_1, E_2, \ldots, E_n$ in which
$$P(E_k) = p_k \quad \text{with } 0 \le p_k \le 1, \qquad p_1 + p_2 + \cdots + p_n = 1.$$

Def: The self-information of the event Ek is written I(Ek):
$$I(E_k) = -\log p_k.$$

$$\log_2 x = \frac{\ln x}{\ln 2} = (\ln x)(\log_2 e)$$

The base of the logarithm is 2 (written log) or e (written ln).
Unit: bit (base 2), nat (base e).
When $p_k = 1$:  $I(E_k) = -\log p_k = 0$
When $p_k = \frac{1}{2}$:  $I(E_k) = -\log p_k = 1$
When $p_k = \frac{1}{16}$:  $I(E_k) = -\log p_k = 4$
When $p_k = 0$:  $I(E_k) = -\log p_k = ?$

The smaller $p_k$ is, the larger $I(E_k)$ becomes.
Ex1. A letter is chosen at random from the English alphabet.
$$p_k = \frac{1}{26}, \qquad I(E_k) = -\log\frac{1}{26} \approx 4.7 \text{ bits}$$

Ex2. A binary number of m digits is chosen at random.
$$p_k = \frac{1}{2^m}, \qquad I(E_k) = -\log\frac{1}{2^m} = m \text{ bits}$$
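As a quick numerical check of Ex1 and Ex2, here is a minimal Python sketch; the helper name `self_information` is an illustrative choice, not from the notes.

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Self-information I(E) = -log_base(p) of an event with probability p."""
    return -math.log(p, base)

# Ex1: a random letter from the 26-letter English alphabet
print(self_information(1 / 26))     # ~4.70 bits

# Ex2: a random binary number of m digits
m = 8
print(self_information(1 / 2**m))   # exactly m = 8.0 bits
```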
Ex3. 64 points are arranged in a square grid.
Let Ej be the event that a point picked at random lies in the jth column,
and Ek be the event that a point picked at random lies in the kth row.
$$P(E_j) = P(E_k) = \frac{1}{8}, \qquad I(E_j) = I(E_k) = 3 \text{ bits}$$
$$I(E_j \cap E_k) = -\log\frac{1}{64} = 6 \text{ bits} = I(E_j) + I(E_k)$$
Why?
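The additivity holds because the row and the column of a uniformly chosen point are independent, so $P(E_j \cap E_k) = P(E_j)P(E_k)$ and the logarithm turns the product into a sum (this is made precise in Section 2.3). A minimal Python check, with illustrative variable names:

```python
import math

def self_information(p: float) -> float:
    """I(E) = -log2 p, in bits."""
    return -math.log2(p)

p_col, p_row = 1 / 8, 1 / 8     # column and row of a random point are independent
p_joint = p_col * p_row         # = 1/64

print(self_information(p_col))    # 3.0 bits
print(self_information(p_joint))  # 6.0 bits
print(math.isclose(self_information(p_joint),
                   self_information(p_col) + self_information(p_row)))  # True
```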
2.2 Entropy

Let f: Ek → fk, and let E(f) be the expectation (average, mean) of f:
$$E(f) = \sum_{k=1}^{n} p_k f_k$$

Let S be the system with events $E_1, E_2, \ldots, E_n$, the associated probabilities being
$$p_1, p_2, \ldots, p_n \quad \text{with } 0 \le p_k \le 1, \ \sum_{k=1}^{n} p_k = 1.$$
Def: The entropy of S, written H(S), is the average of the self-information:
$$H(S) = E(I) = -\sum_{k=1}^{n} p_k \log p_k$$

The self-information of an event increases as its uncertainty grows.

Observe that $\log p_k \le 0$, so $H(S) \ge 0$.
Let $p_1 = 1,\ p_2 = \cdots = p_n = 0$ (with the convention $0 \log 0 = 0$). Then $H(S) = 0$: certainty.

The minimum value of H(S) is 0, which means the outcome is certain. But what is the maximum?
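A minimal Python sketch of this definition (the helper name `entropy` and the explicit handling of the 0·log 0 = 0 convention are my own choices):

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """H(S) = -sum p_k log p_k, with the convention 0*log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0]))           # 0.0 -> certainty
print(entropy([0.5, 0.5]))                # 1.0 bit
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits = log2(4)
```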
Thm: $H(S) \le \log n$, with equality only when $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$.

Proof: Since $\sum_{k=1}^{n} p_k = 1$,
$$H(S) - \log n = -\sum_{k=1}^{n} p_k \log p_k - \sum_{k=1}^{n} p_k \log n
= \sum_{k=1}^{n} p_k\left(\log\frac{1}{p_k} + \log\frac{1}{n}\right)
= \sum_{k=1}^{n} p_k \log\frac{1}{n p_k}
= (\log_2 e)\sum_{k=1}^{n} p_k \ln\frac{1}{n p_k}.$$
Thm 2.2: For x > 0,
$$\ln x \le x - 1,$$
with equality only when x = 1.

Assume that $p_k \ne 0$ for every k. Applying Thm 2.2 with $x = \frac{1}{n p_k}$,
$$\sum_{k=1}^{n} p_k \ln\frac{1}{n p_k} \le \sum_{k=1}^{n} p_k\left(\frac{1}{n p_k} - 1\right)
= \sum_{k=1}^{n}\left(\frac{1}{n} - p_k\right) = 1 - 1 = 0.$$
Hence $H(S) - \log n \le 0$, that is,
$$H(S) \le \log n,$$
with equality only when $n p_k = 1$, i.e. $p_k = \frac{1}{n}$, for every k.
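A quick numerical illustration of the bound, assuming the same `entropy` helper as above (the random distribution is just an example):

```python
import math, random

def entropy(probs):
    """H(S) in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 5
raw = [random.random() for _ in range(n)]
p = [x / sum(raw) for x in raw]          # an arbitrary distribution on n events

print(entropy(p))              # some value <= log2(n)
print(entropy([1 / n] * n))    # exactly log2(n)
print(math.log2(n))            # ~2.3219 for n = 5
```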
In the system S the probabilities p1 and p2, where p2 > p1, are replaced by p1 + ε and p2 - ε respectively, under the proviso 0 < 2ε < p2 - p1. Prove that H(S) is increased.

We know that the entropy H(S) can be viewed as a measure of _____ about S. Please list 3 items for this blank:
information, uncertainty, randomness.
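This is left as an exercise; the sketch below is only a numerical sanity check of the claim, not a proof (the probabilities and ε are arbitrary values satisfying 0 < 2ε < p2 - p1):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.1, 0.5, 0.4]              # p1 = 0.1, p2 = 0.5, the rest unchanged
eps = 0.15                       # satisfies 0 < 2*eps < p2 - p1 = 0.4
p_new = [p[0] + eps, p[1] - eps, p[2]]

print(entropy(p), "<", entropy(p_new))   # the entropy increases
```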
2.3 Mutual information

Let S1 be the system with events $E_1, E_2, \ldots, E_n$, the associated probabilities being
$$p_1, p_2, \ldots, p_n \quad \text{with } 0 \le p_j \le 1, \ \sum_{j=1}^{n} p_j = 1.$$

Let S2 be the system with events $F_1, F_2, \ldots, F_m$, the associated probabilities being
$$q_1, q_2, \ldots, q_m \quad \text{with } 0 \le q_k \le 1, \ \sum_{k=1}^{m} q_k = 1.$$
The two systems S1 and S2 have joint probabilities
$$P(E_j \cap F_k) = p_{jk} \ge 0 \quad \text{satisfying} \quad \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} = 1.$$

Relation among $p_{jk}$, $p_j$, $q_k$:
$$p_j = P(E_j) = \sum_{i=1}^{m} P(E_j \cap F_i) = \sum_{k=1}^{m} p_{jk}, \qquad \sum_{j=1}^{n} p_j = 1.$$
$$q_k = P(F_k) = \sum_{i=1}^{n} P(E_i \cap F_k) = \sum_{j=1}^{n} p_{jk}, \qquad \sum_{k=1}^{m} q_k = 1.$$
conditional probability
$$P(E_j \mid F_k) = P(E_j \cap F_k)/P(F_k) = p_{jk}/q_k, \qquad P(F_k \mid E_j) = P(E_j \cap F_k)/P(E_j) = p_{jk}/p_j$$

conditional self-information
$$I(E_j \mid F_k) = -\log P(E_j \mid F_k) = -\log\left(p_{jk}/q_k\right)$$

mutual information
$$I(E_j, F_k) = \log\frac{P(E_j \cap F_k)}{P(E_j)P(F_k)} = \log\left(\frac{p_{jk}}{p_j q_k}\right)$$

NOTE: $I(E_j, F_k) = I(F_k, E_j)$
conditional entropy
$$H(S_1 \mid S_2) = \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\, I(E_j \mid F_k) = -\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} \log(p_{jk}/q_k)$$

mutual information
$$I(S_1, S_2) = \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\, I(E_j, F_k) = \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} \log\left(\frac{p_{jk}}{p_j q_k}\right)$$
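A minimal Python sketch of these system-level quantities computed from a joint probability table (the 2×2 table and all names are illustrative, not from the notes):

```python
import math

# joint probabilities p_jk for a 2x2 example: rows are events of S1, columns of S2
P = [[0.30, 0.20],
     [0.10, 0.40]]

p = [sum(row) for row in P]                              # marginals p_j of S1
q = [sum(P[j][k] for j in range(2)) for k in range(2)]   # marginals q_k of S2

H_S1_given_S2 = -sum(P[j][k] * math.log2(P[j][k] / q[k])
                     for j in range(2) for k in range(2))
I_S1_S2 = sum(P[j][k] * math.log2(P[j][k] / (p[j] * q[k]))
              for j in range(2) for k in range(2))

print(H_S1_given_S2, I_S1_S2)
```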
mutual information and conditional self-information
$$I(E_j, F_k) = \log\frac{P(E_j \cap F_k)}{P(E_j)P(F_k)} = -\log P(E_j) + \log\frac{P(E_j \cap F_k)}{P(F_k)} = I(E_j) - I(E_j \mid F_k)$$
$$I(E_j, F_k) = \log\frac{P(E_j \cap F_k)}{P(E_j)P(F_k)} = -\log P(F_k) + \log\frac{P(E_j \cap F_k)}{P(E_j)} = I(F_k) - I(F_k \mid E_j)$$

If Ej and Fk are statistically independent, then $I(E_j, F_k) = 0$.
joint entropy
$$H(S_1 \cap S_2) = \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\, I(E_j \cap F_k) = -\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} \log p_{jk}, \qquad H(S_1 \cap S_2) \ge 0$$

joint entropy and conditional entropy
$$H(S_1 \cap S_2) = -\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} \log p_{jk}
= -\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\left(\log\frac{p_{jk}}{q_k} + \log q_k\right)
= H(S_1 \mid S_2) - \sum_{k=1}^{m} q_k \log q_k = H(S_1 \mid S_2) + H(S_2)$$
$$H(S_1 \cap S_2) = H(S_1 \mid S_2) + H(S_2)$$
$$H(S_1 \cap S_2) = H(S_2 \mid S_1) + H(S_1)$$

mutual information and conditional entropy
$$I(S_1, S_2) = \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} \log\left(\frac{p_{jk}}{p_j q_k}\right)
= \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\left(\log\frac{p_{jk}}{q_k} - \log p_j\right)
= -\sum_{j=1}^{n} p_j \log p_j + \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} \log\frac{p_{jk}}{q_k}
= H(S_1) - H(S_1 \mid S_2)$$
Thm:
$$I(S_1, S_2) = H(S_1) + H(S_2) - H(S_1 \cap S_2)$$
Since $H(S_1 \cap S_2) \ge 0$,
$$I(S_1, S_2) \le H(S_1) + H(S_2),$$
i.e. the mutual information of two systems cannot exceed the sum of their separate entropies.
Independence of systems

If S1 and S2 are statistically independent, then
$$I(E_j, F_k) = 0 \quad \text{for all } E_j \in S_1,\ F_k \in S_2,$$
so $I(S_1, S_2) = 0$. Combined with $I(S_1, S_2) = H(S_1) + H(S_2) - H(S_1 \cap S_2)$, this gives
$$H(S_1 \cap S_2) = H(S_1) + H(S_2).$$
The joint entropy of two statistically independent systems is the sum of their separate entropies.
Thm: $H(S_1 \mid S_2) \le H(S_1)$, with equality only if S1 and S2 are statistically independent.

Proof: Assume that $p_{jk} \ne 0$ for all j, k. Since $\sum_{k=1}^{m} p_{jk} = p_j$,
$$H(S_1 \mid S_2) - H(S_1) = -\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\log\frac{p_{jk}}{q_k} + \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\log p_j
= \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\log\frac{p_j q_k}{p_{jk}}
= (\log_2 e)\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\ln\frac{p_j q_k}{p_{jk}}.$$
By Thm 2.2,
$$\sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\ln\frac{p_j q_k}{p_{jk}} \le \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk}\left(\frac{p_j q_k}{p_{jk}} - 1\right)
= \sum_{j=1}^{n}\sum_{k=1}^{m} p_j q_k - \sum_{j=1}^{n}\sum_{k=1}^{m} p_{jk} = 1 - 1 = 0,$$
so $H(S_1 \mid S_2) \le H(S_1)$, with equality only when $p_{jk} = p_j q_k$ for all j, k.
Thm: $I(S_1, S_2) \ge 0$, with equality only if S1 and S2 are statistically independent.

Proof: Since $I(S_1, S_2) = H(S_1) - H(S_1 \mid S_2)$ and $H(S_1 \mid S_2) \le H(S_1)$, it follows that $I(S_1, S_2) \ge 0$.
Ex: A binary symmetric channel with crossover probability ε.
Let S1 be the input with E0 = 0, E1 = 1, and S2 be the output with F0 = 0, F1 = 1.
$$P(F_0 \mid E_0) = 1 - \varepsilon, \quad P(F_1 \mid E_0) = \varepsilon, \quad P(F_0 \mid E_1) = \varepsilon, \quad P(F_1 \mid E_1) = 1 - \varepsilon$$
Assume that $P(E_0) = p_0$ and $P(E_1) = p_1 = 1 - p_0$. Then
$$p_{00} = P(E_0 \cap F_0) = P(F_0 \mid E_0)P(E_0) = (1 - \varepsilon)p_0$$
$$p_{01} = P(E_0 \cap F_1) = P(F_1 \mid E_0)P(E_0) = \varepsilon p_0$$
$$p_{10} = P(E_1 \cap F_0) = P(F_0 \mid E_1)P(E_1) = \varepsilon p_1$$
$$p_{11} = P(E_1 \cap F_1) = P(F_1 \mid E_1)P(E_1) = (1 - \varepsilon)p_1$$
Compute the output probabilities:
$$q_0 = P(F_0) = p_{00} + p_{10} = (1 - \varepsilon)p_0 + \varepsilon p_1 = \varepsilon(1 - p_0) + (1 - \varepsilon)p_0 = \varepsilon + (1 - 2\varepsilon)p_0$$
$$q_1 = P(F_1) = p_{01} + p_{11} = \varepsilon p_0 + (1 - \varepsilon)p_1 = \varepsilon(1 - p_1) + (1 - \varepsilon)p_1 = \varepsilon + (1 - 2\varepsilon)p_1$$

If $p_0 = p_1 = \frac{1}{2}$, then $q_0 = q_1 = \frac{1}{2}$.
Compute the mutual information (with $p_0 = p_1 = q_0 = q_1 = \frac{1}{2}$):
$$I(E_0, F_0) = \log\frac{P(E_0 \cap F_0)}{P(E_0)P(F_0)} = \log\frac{(1 - \varepsilon)p_0}{p_0 q_0} = \log 2(1 - \varepsilon)$$
$$I(E_0, F_1) = \log\frac{P(E_0 \cap F_1)}{P(E_0)P(F_1)} = \log\frac{\varepsilon p_0}{p_0 q_1} = \log 2\varepsilon$$
$$I(E_1, F_0) = \log\frac{P(E_1 \cap F_0)}{P(E_1)P(F_0)} = \log\frac{\varepsilon p_1}{p_1 q_0} = \log 2\varepsilon$$
$$I(E_1, F_1) = \log\frac{P(E_1 \cap F_1)}{P(E_1)P(F_1)} = \log\frac{(1 - \varepsilon)p_1}{p_1 q_1} = \log 2(1 - \varepsilon)$$
Compute the mutual information $I(S_1, S_2) = H(S_2) - H(S_2 \mid S_1)$:
$$H(S_2) = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{2}\log\frac{1}{2} = 1$$
$$H(S_2 \mid S_1) = -\frac{1}{2}(1 - \varepsilon)\log(1 - \varepsilon) - \frac{1}{2}\varepsilon\log\varepsilon - \frac{1}{2}\varepsilon\log\varepsilon - \frac{1}{2}(1 - \varepsilon)\log(1 - \varepsilon)
= -(1 - \varepsilon)\log(1 - \varepsilon) - \varepsilon\log\varepsilon$$
$$I(S_1, S_2) = 1 + (1 - \varepsilon)\log(1 - \varepsilon) + \varepsilon\log\varepsilon$$
When $\varepsilon = 0$ or $\varepsilon = 1$, $I(S_1, S_2) = 1$; when $\varepsilon = \frac{1}{2}$, $I(S_1, S_2) = 0$.
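A small Python sketch of this binary symmetric channel computation for equiprobable inputs; the ε values are arbitrary examples:

```python
import math

def h2(p: float) -> float:
    """Binary entropy function in bits, with the 0*log 0 = 0 convention."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(eps: float) -> float:
    """I(S1,S2) = H(S2) - H(S2|S1) = 1 - h2(eps) for a BSC with equiprobable inputs."""
    return 1.0 - h2(eps)

for eps in (0.0, 0.1, 0.5, 1.0):
    print(eps, bsc_mutual_information(eps))   # 1.0, ~0.531, 0.0, 1.0
```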
Ex: The following messages may be sent over a binary symmetric channel with crossover probability ε:
$$M_1 = 00, \quad M_2 = 01, \quad M_3 = 10, \quad M_4 = 11,$$
and they are equally probable at the input.
What is the mutual information between M1 and the first output digit being 0?
What additional mutual information is conveyed by the knowledge that the second output digit is also 0?
$$P(M_1 \cap 0) = P(0 \mid M_1)P(M_1) = \frac{1}{4}(1 - \varepsilon)$$
$$I(M_1, 0) = \log\frac{P(M_1 \cap 0)}{P(M_1)P(0)} = \log\frac{\frac{1}{4}(1 - \varepsilon)}{\frac{1}{4}\cdot\frac{1}{2}} = 1 + \log(1 - \varepsilon)$$

For the output 00:
$$P(M_1 \cap 00) = P(00 \mid M_1)P(M_1) = \frac{1}{4}(1 - \varepsilon)^2$$
$$I(M_1, 00) = \log\frac{P(M_1 \cap 00)}{P(M_1)P(00)} = \log\frac{\frac{1}{4}(1 - \varepsilon)^2}{\frac{1}{4}\cdot\frac{1}{4}} = 2 + 2\log(1 - \varepsilon)$$

The extra mutual information is
$$I(M_1, 00) - I(M_1, 0) = 1 + \log(1 - \varepsilon) \text{ bits}.$$
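A numerical check of this example in Python, computing the required probabilities by summing over the four equally likely messages (the value of ε and all names are illustrative):

```python
import math

eps = 0.1                              # example crossover probability
messages = ['00', '01', '10', '11']    # equally probable inputs, P(M) = 1/4

def p_out_given_in(y: str, x: str) -> float:
    """The BSC acts independently on each digit."""
    return math.prod((1 - eps) if a == b else eps for a, b in zip(x, y))

# P(M1 and first output digit = 0), and P(first output digit = 0)
p_m1_first0 = 0.25 * sum(p_out_given_in(y, '00') for y in messages if y[0] == '0')
p_first0 = 0.25 * sum(p_out_given_in(y, x)
                      for x in messages for y in messages if y[0] == '0')
I_m1_0 = math.log2(p_m1_first0 / (0.25 * p_first0))
print(I_m1_0, 1 + math.log2(1 - eps))               # both ~0.848 bits

# P(M1 and output 00), and P(output 00)
p_m1_00 = 0.25 * p_out_given_in('00', '00')
p_00 = 0.25 * sum(p_out_given_in('00', x) for x in messages)
I_m1_00 = math.log2(p_m1_00 / (0.25 * p_00))
print(I_m1_00 - I_m1_0, 1 + math.log2(1 - eps))     # the extra information agrees
```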
2.4 Data processing theorem

Data processing theorem:
If S1 and S3 are statistically independent when conditioned on S2, then
$$I(S_1, S_3) \le I(S_2, S_3) \quad \text{and} \quad I(S_1, S_3) \le I(S_1, S_2).$$

Convexity theorem:
If S1 and S3 are statistically independent when conditioned on S2, then
$$I(S_2, S_3) \ge I(S_2, S_3 \mid S_1).$$

proof
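The proof itself is not reproduced here. As an illustration (not a proof), the sketch below builds a small Markov chain S1 → S2 → S3, in which S1 and S3 are statistically independent given S2, and checks I(S1,S3) ≤ I(S2,S3) numerically; all distributions are arbitrary examples.

```python
import math
from itertools import product

def mutual_information(joint):
    """I(X,Y) in bits from a joint probability dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Markov chain S1 -> S2 -> S3 over binary alphabets (example distributions)
p1 = {0: 0.3, 1: 0.7}                               # P(S1)
t12 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # P(S2 | S1)
t23 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}    # P(S3 | S2)

j13, j23 = {}, {}
for a, b, c in product([0, 1], repeat=3):
    # S1 and S3 are conditionally independent given S2
    p = p1[a] * t12[a][b] * t23[b][c]
    j13[(a, c)] = j13.get((a, c), 0.0) + p
    j23[(b, c)] = j23.get((b, c), 0.0) + p

print(mutual_information(j13), "<=", mutual_information(j23))
```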
2.5 Uniqueness theorem

Def: Let $f(p_1, p_2, \ldots, p_n)$ be a continuous function of its arguments in which
$$p_k \ge 0, \quad \sum_{k=1}^{n} p_k = 1,$$
satisfying
(a) f takes its largest value at $p_k = \frac{1}{n}$;
(b) f is unaltered if an impossible event is added to the system:
$$f(p_1, p_2, \ldots, p_n, 0) = f(p_1, p_2, \ldots, p_n);$$
(c)
$$f(p_1, \ldots, p_j, \ldots, p_k, \ldots, p_n) = f(p_1, \ldots, p_j + p_k, \ldots, 0, \ldots, p_n) + (p_j + p_k)\, f\!\left(\frac{p_j}{p_j + p_k}, \frac{p_k}{p_j + p_k}, 0, \ldots, 0\right).$$

Uniqueness theorem:
$$f(p_1, p_2, \ldots, p_n) = -C \sum_{k=1}^{n} p_k \log p_k$$
for a positive constant C.

proof
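The proof itself is not reproduced here. As a quick consistency check, the sketch below verifies numerically that the entropy H (i.e. f with C = 1) satisfies the grouping axiom (c) for one example distribution; all numbers are arbitrary.

```python
import math

def H(probs):
    """Shannon entropy in bits, with the 0*log 0 = 0 convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.2, 0.3, 0.1, 0.4]         # example distribution; group p_j = 0.3 and p_k = 0.1
j, k = 1, 2
s = p[j] + p[k]

lhs = H(p)
grouped = p[:]                   # replace p_j by p_j + p_k and p_k by 0
grouped[j], grouped[k] = s, 0.0
rhs = H(grouped) + s * H([p[j] / s, p[k] / s])

print(math.isclose(lhs, rhs))    # True: H satisfies axiom (c)
```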