Information Theory (資訊理論)
Ch2: Basic Concepts
Instructor: 陳建源
Email: [email protected]
Office: 法401
Website: http://www.csie.nuk.edu.tw/~cychen/

2.1 Self-information

Let S be a system of events $E_1, E_2, \dots, E_n$ with $P(E_k) = p_k$, where $0 \le p_k \le 1$ and $p_1 + p_2 + \dots + p_n = 1$.

Def: The self-information of the event $E_k$ is written $I(E_k)$:
$$I(E_k) = -\log p_k.$$

The base of the logarithm is either 2 (written $\log$) or $e$ (written $\ln$); the corresponding units are the bit and the nat, related by $\log_2 x = (\ln x)(\log_2 e) = \ln x / \ln 2$.

Sample values:
when $p_k = 1$,    $I(E_k) = -\log p_k = 0$
when $p_k = 1/2$,  $I(E_k) = -\log p_k = 1$
when $p_k = 1/16$, $I(E_k) = -\log p_k = 4$
when $p_k = 0$,    $I(E_k) = -\log p_k = ?$
The smaller $p_k$ is, the larger $I(E_k)$ becomes.

Ex1. A letter is chosen at random from the English alphabet.
$p_k = 1/26$, so $I(E_k) = -\log(1/26) \approx 4.7$ bits.

Ex2. A binary number of m digits is chosen at random.
$p_k = 1/2^m$, so $I(E_k) = -\log(1/2^m) = m$ bits.

Ex3. 64 points are arranged in a square grid. Let $E_j$ be the event that a point picked at random lies in the jth column, and $E_k$ the event that it lies in the kth row. Then
$$P(E_j) = P(E_k) = 1/8, \qquad I(E_j) = I(E_k) = 3 \text{ bits},$$
$$I(E_j \cap E_k) = -\log(1/64) = 6 \text{ bits} = I(E_j) + I(E_k). \quad \text{Why?}$$

2.2 Entropy

Let $f: E_k \mapsto f_k$, and let $E(f)$ be the expectation (average, mean) of f:
$$E(f) = \sum_{k=1}^{n} p_k f_k.$$

Let S be the system with events $E_1, E_2, \dots, E_n$ and associated probabilities $p_1, p_2, \dots, p_n$, with $0 \le p_k \le 1$ and $\sum_{k=1}^{n} p_k = 1$.

Def: The entropy of S, written H(S), is the average of the self-information:
$$H(S) = E(I) = -\sum_{k=1}^{n} p_k \log p_k.$$

The self-information of an event increases as its uncertainty grows. Observe that $-\log p_k \ge 0$. If $p_1 = 1$ and $p_2 = \dots = p_n = 0$, then $H(S) = 0$; H(S) = 0 means certainty. So the minimum value of H(S) is 0, attained when the outcome is certain. What is the maximum?

Thm: $H(S) \le \log n$, with equality only when $p_1 = p_2 = \dots = p_n = 1/n$.

Proof:
$$H(S) - \log n = -\sum_{k=1}^{n} p_k \log p_k - \sum_{k=1}^{n} p_k \log n = \sum_{k=1}^{n} p_k \log \frac{1}{n p_k} = \frac{1}{\ln 2} \sum_{k=1}^{n} p_k \ln \frac{1}{n p_k}.$$

Thm 2.2: For $x > 0$, $\ln x \le x - 1$, with equality only when $x = 1$.

Assume that $p_k \ne 0$. Then
$$\sum_{k=1}^{n} p_k \ln \frac{1}{n p_k} \le \sum_{k=1}^{n} p_k \left( \frac{1}{n p_k} - 1 \right) = \sum_{k=1}^{n} \frac{1}{n} - \sum_{k=1}^{n} p_k = 1 - 1 = 0,$$
so $H(S) - \log n \le 0$, i.e. $H(S) \le \log n$.

Exercise: In the system S the probabilities $p_1$ and $p_2$, where $p_2 > p_1$, are replaced by $p_1 + \varepsilon$ and $p_2 - \varepsilon$ respectively, under the proviso $0 < 2\varepsilon < p_2 - p_1$. Prove that H(S) is increased.

Exercise: We know that entropy H(S) can be viewed as a measure of _____ about S. List 3 items that fit this blank. (information, uncertainty, randomness)

2.3 Mutual information

Let S1 be the system with events $E_1, E_2, \dots, E_n$ and associated probabilities $p_1, p_2, \dots, p_n$, with $0 \le p_j \le 1$ and $\sum_{j=1}^{n} p_j = 1$.
Let S2 be the system with events $F_1, F_2, \dots, F_m$ and associated probabilities $q_1, q_2, \dots, q_m$, with $0 \le q_k \le 1$ and $\sum_{k=1}^{m} q_k = 1$.

For the two systems S1 and S2, write $P(E_j \cap F_k) = p_{jk} \ge 0$, satisfying $\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} = 1$.

Relations among $p_{jk}$, $p_j$, $q_k$:
$$p_j = P(E_j) = \sum_{k=1}^{m} P(E_j \cap F_k) = \sum_{k=1}^{m} p_{jk}, \qquad \sum_{j=1}^{n} p_j = 1,$$
$$q_k = P(F_k) = \sum_{j=1}^{n} P(E_j \cap F_k) = \sum_{j=1}^{n} p_{jk}, \qquad \sum_{k=1}^{m} q_k = 1.$$

Conditional probability:
$$P(E_j \mid F_k) = P(E_j \cap F_k)/P(F_k) = p_{jk}/q_k, \qquad P(F_k \mid E_j) = P(E_j \cap F_k)/P(E_j) = p_{jk}/p_j.$$

Conditional self-information:
$$I(E_j \mid F_k) = -\log P(E_j \mid F_k) = -\log(p_{jk}/q_k).$$

Mutual information:
$$I(E_j, F_k) = \log \frac{P(E_j \cap F_k)}{P(E_j) P(F_k)} = \log \frac{p_{jk}}{p_j q_k}.$$
NOTE: $I(E_j, F_k) = I(F_k, E_j)$.
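Before turning to the system-level quantities below, here is a minimal numerical sketch (not part of the original notes) of the quantities defined so far: self-information, entropy, and the bound $H(S) \le \log n$. The function names are illustrative only.

```python
import math

def self_information(p: float) -> float:
    """Self-information I(E) = -log2 p of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Entropy H(S) = -sum p_k log2 p_k in bits; terms with p_k = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Ex1: a letter chosen at random from the English alphabet
print(self_information(1 / 26))              # ~4.70 bits

# Ex2: a binary number of m digits chosen at random
m = 8
print(self_information(1 / 2 ** m))          # 8.0 bits

# The bound H(S) <= log2(n), with equality only for the uniform distribution
probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs), math.log2(len(probs)))  # 1.75 <= 2.0
print(entropy([0.25] * 4))                    # 2.0, the maximum for n = 4
```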
Conditional entropy:
$$H(S_1 \mid S_2) = \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} I(E_j \mid F_k) = -\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log(p_{jk}/q_k).$$

Mutual information of the two systems:
$$I(S_1, S_2) = \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} I(E_j, F_k) = \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log \frac{p_{jk}}{p_j q_k}.$$

Mutual information and conditional self-information:
$$I(E_j, F_k) = \log \frac{P(E_j \cap F_k)}{P(E_j) P(F_k)} = -\log P(E_j) + \log \frac{P(E_j \cap F_k)}{P(F_k)} = I(E_j) - I(E_j \mid F_k),$$
$$I(E_j, F_k) = \log \frac{P(E_j \cap F_k)}{P(E_j) P(F_k)} = -\log P(F_k) + \log \frac{P(E_j \cap F_k)}{P(E_j)} = I(F_k) - I(F_k \mid E_j).$$
If $E_j$ and $F_k$ are statistically independent, then $I(E_j, F_k) = 0$.

Joint entropy: with $p_{jk} = P(E_j \cap F_k)$,
$$H(S_1 S_2) = -\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log p_{jk} \ge 0.$$

Joint entropy and conditional entropy:
$$H(S_1 S_2) = -\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log p_{jk} = -\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \left( \log \frac{p_{jk}}{q_k} + \log q_k \right) = H(S_1 \mid S_2) - \sum_{k=1}^{m} q_k \log q_k = H(S_1 \mid S_2) + H(S_2).$$
Hence
$$H(S_1 S_2) = H(S_1 \mid S_2) + H(S_2), \qquad H(S_1 S_2) = H(S_2 \mid S_1) + H(S_1).$$

Mutual information and conditional entropy:
$$I(S_1, S_2) = \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log \frac{p_{jk}}{p_j q_k} = \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \left( \log \frac{p_{jk}}{q_k} - \log p_j \right) = -H(S_1 \mid S_2) - \sum_{j=1}^{n} p_j \log p_j = H(S_1) - H(S_1 \mid S_2).$$

Thm: $I(S_1, S_2) = H(S_1) + H(S_2) - H(S_1 S_2)$. Since $H(S_1 S_2) \ge 0$, it follows that $I(S_1, S_2) \le H(S_1) + H(S_2)$: the mutual information of two systems cannot exceed the sum of their separate entropies.

Independence of systems: if S1 and S2 are statistically independent, then $I(E_j, F_k) = 0$ for every $E_j \in S_1$, $F_k \in S_2$, so $I(S_1, S_2) = 0$. From $I(S_1, S_2) = H(S_1) + H(S_2) - H(S_1 S_2)$ we get $H(S_1 S_2) = H(S_1) + H(S_2)$: the joint entropy of two statistically independent systems is the sum of their separate entropies.

Thm: $H(S_1 \mid S_2) \le H(S_1)$, with equality only if S1 and S2 are statistically independent.
Proof: Assume that $p_{jk} \ne 0$. Then
$$H(S_1 \mid S_2) - H(S_1) = -\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log \frac{p_{jk}}{q_k} + \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log p_j = \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \log \frac{p_j q_k}{p_{jk}},$$
and by $\ln x \le x - 1$,
$$\sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \ln \frac{p_j q_k}{p_{jk}} \le \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} \left( \frac{p_j q_k}{p_{jk}} - 1 \right) = \sum_{j=1}^{n} \sum_{k=1}^{m} p_j q_k - \sum_{j=1}^{n} \sum_{k=1}^{m} p_{jk} = 1 - 1 = 0.$$
Hence $H(S_1 \mid S_2) \le H(S_1)$.

Thm: $I(S_1, S_2) \ge 0$, with equality only if S1 and S2 are statistically independent.
Proof: $I(S_1, S_2) = H(S_1) - H(S_1 \mid S_2)$, and $H(S_1 \mid S_2) \le H(S_1)$, so $I(S_1, S_2) \ge 0$.

Ex: A binary symmetric channel with crossover probability ε. Let S1 be the input system with $E_0 = 0$, $E_1 = 1$, and S2 the output system with $F_0 = 0$, $F_1 = 1$:
$$P(F_0 \mid E_0) = 1 - \varepsilon, \quad P(F_1 \mid E_0) = \varepsilon, \quad P(F_0 \mid E_1) = \varepsilon, \quad P(F_1 \mid E_1) = 1 - \varepsilon.$$

Assume that $P(E_0) = p_0$ and $P(E_1) = p_1 = 1 - p_0$. Then
$$p_{00} = P(E_0 \cap F_0) = P(F_0 \mid E_0) P(E_0) = (1 - \varepsilon) p_0,$$
$$p_{01} = P(E_0 \cap F_1) = P(F_1 \mid E_0) P(E_0) = \varepsilon p_0,$$
$$p_{10} = P(E_1 \cap F_0) = P(F_0 \mid E_1) P(E_1) = \varepsilon p_1,$$
$$p_{11} = P(E_1 \cap F_1) = P(F_1 \mid E_1) P(E_1) = (1 - \varepsilon) p_1.$$

Compute the output probabilities:
$$q_0 = P(F_0) = p_{00} + p_{10} = (1 - \varepsilon) p_0 + \varepsilon p_1 = (1 - \varepsilon) p_0 + \varepsilon (1 - p_0) = \varepsilon + (1 - 2\varepsilon) p_0,$$
$$q_1 = P(F_1) = p_{01} + p_{11} = \varepsilon p_0 + (1 - \varepsilon) p_1 = \varepsilon (1 - p_1) + (1 - \varepsilon) p_1 = \varepsilon + (1 - 2\varepsilon) p_1.$$
If $p_0 = p_1 = 1/2$, then $q_0 = q_1 = 1/2$.

Compute the mutual information of the individual events (taking $p_0 = p_1 = 1/2$, so $q_0 = q_1 = 1/2$):
$$I(E_0, F_0) = \log \frac{P(E_0 \cap F_0)}{P(E_0) P(F_0)} = \log \frac{(1 - \varepsilon) p_0}{p_0 q_0} = \log 2(1 - \varepsilon),$$
$$I(E_0, F_1) = \log \frac{\varepsilon p_0}{p_0 q_1} = \log 2\varepsilon,$$
$$I(E_1, F_0) = \log \frac{\varepsilon p_1}{p_1 q_0} = \log 2\varepsilon,$$
$$I(E_1, F_1) = \log \frac{(1 - \varepsilon) p_1}{p_1 q_1} = \log 2(1 - \varepsilon).$$

Compute the mutual information of the two systems:
$$I(S_1, S_2) = H(S_2) - H(S_2 \mid S_1),$$
$$H(S_2) = -\tfrac{1}{2} \log \tfrac{1}{2} - \tfrac{1}{2} \log \tfrac{1}{2} = 1,$$
$$H(S_2 \mid S_1) = -(1 - \varepsilon) \log(1 - \varepsilon) - \varepsilon \log \varepsilon,$$
$$I(S_1, S_2) = 1 + (1 - \varepsilon) \log(1 - \varepsilon) + \varepsilon \log \varepsilon.$$
When $\varepsilon = 0$ or $1$, $I(S_1, S_2) = 1$; when $\varepsilon = 1/2$, $I(S_1, S_2) = 0$.
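As a quick numerical check (not part of the original notes), the sketch below recomputes $I(S_1, S_2)$ for the binary symmetric channel directly from the joint distribution $p_{jk}$ and compares it with the closed form derived above for equally probable inputs; the function name is illustrative only.

```python
import math

def mutual_information(joint):
    """I(S1,S2) = sum_jk p_jk * log2( p_jk / (p_j * q_k) ) for a joint matrix p_jk."""
    p = [sum(row) for row in joint]            # input marginals p_j
    q = [sum(col) for col in zip(*joint)]      # output marginals q_k
    return sum(pjk * math.log2(pjk / (p[j] * q[k]))
               for j, row in enumerate(joint)
               for k, pjk in enumerate(row) if pjk > 0)

eps, p0 = 0.1, 0.5                             # crossover probability, P(E_0)
p1 = 1 - p0
joint = [[(1 - eps) * p0, eps * p0],           # [p_00, p_01]
         [eps * p1, (1 - eps) * p1]]           # [p_10, p_11]

closed_form = 1 + (1 - eps) * math.log2(1 - eps) + eps * math.log2(eps)
print(mutual_information(joint))               # ~0.531 bits
print(closed_form)                             # same value
```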
Ex: The following messages may be sent over a binary symmetric channel with crossover probability ε:
$$M_1 = 00, \quad M_2 = 01, \quad M_3 = 10, \quad M_4 = 11,$$
and they are equally probable at the input. What is the mutual information between $M_1$ and the first output digit being 0? What additional mutual information is conveyed by the knowledge that the second output digit is also 0?

$$P(M_1 \cap 0) = P(0 \mid M_1) P(M_1) = (1 - \varepsilon) \cdot \tfrac{1}{4},$$
and by symmetry the first output digit is 0 with probability $P(0) = 1/2$, so
$$I(M_1, 0) = \log \frac{P(M_1 \cap 0)}{P(M_1) P(0)} = \log \frac{\tfrac{1}{4}(1 - \varepsilon)}{\tfrac{1}{4} \cdot \tfrac{1}{2}} = 1 + \log(1 - \varepsilon) \text{ bits}.$$

For the output 00: $P(00) = 1/4$ and
$$P(M_1 \cap 00) = P(00 \mid M_1) P(M_1) = (1 - \varepsilon)^2 \cdot \tfrac{1}{4},$$
$$I(M_1, 00) = \log \frac{P(M_1 \cap 00)}{P(M_1) P(00)} = \log \frac{\tfrac{1}{4}(1 - \varepsilon)^2}{\tfrac{1}{4} \cdot \tfrac{1}{4}} = 2 + 2\log(1 - \varepsilon) \text{ bits}.$$

The extra mutual information is $I(M_1, 00) - I(M_1, 0) = 1 + \log(1 - \varepsilon)$ bits.

2.4 Data processing theorem

Data processing theorem: If S1 and S3 are statistically independent when conditioned on S2, then
$$I(S_1, S_3) \le I(S_2, S_3) \quad \text{and} \quad I(S_1, S_3) \le I(S_1, S_2).$$

Convexity theorem: If S1 and S3 are statistically independent when conditioned on S2, then
$$I(S_2, S_3) \ge I(S_2, S_3 \mid S_1).$$

2.5 Uniqueness theorem

Def: Let $f(p_1, p_2, \dots, p_n)$ be a continuous function of its arguments, where $p_k \ge 0$ and $\sum_{k=1}^{n} p_k = 1$, satisfying
(a) f takes its largest value at $p_k = 1/n$;
(b) f is unaltered if an impossible event is added to the system: $f(p_1, p_2, \dots, p_n, 0) = f(p_1, p_2, \dots, p_n)$;
(c) $f(p_1, \dots, p_j, \dots, p_k, \dots, p_n) = f(p_1, \dots, p_j + p_k, \dots, 0, \dots, p_n) + (p_j + p_k)\, f\!\left( \dfrac{p_j}{p_j + p_k}, \dfrac{p_k}{p_j + p_k} \right)$.

Uniqueness theorem: Any such f has the form
$$f(p_1, p_2, \dots, p_n) = -C \sum_{k=1}^{n} p_k \log p_k$$
for a positive constant C.
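To see the grouping condition (c) in action, here is a minimal numerical sketch (not part of the original notes) checking that Shannon entropy satisfies it when the first two events are merged; the distribution and names are illustrative only.

```python
import math

def H(probs):
    """Shannon entropy -sum p_k log2 p_k in bits; zero probabilities contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.5, 0.2, 0.2, 0.1]          # an arbitrary distribution, sum = 1
pj, pk, rest = p[0], p[1], p[2:]

# Axiom (c): H(p_1,...,p_n) equals the entropy after merging p_j and p_k,
# plus (p_j + p_k) times the entropy of the split within the merged event.
lhs = H(p)
rhs = H([pj + pk, 0] + rest) + (pj + pk) * H([pj / (pj + pk), pk / (pj + pk)])
print(lhs, rhs)                   # both ~1.7610 bits
```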