Information & Entropy

Shannon Information Axioms
– Low-probability events should carry more information than high-probability events.
  – "the nice person" (common words, lower information)
  – "philanthropist" (less used, more information)
– Information from two independent events should add:
  – "engineer" → information $I_1$
  – "stuttering" → information $I_2$
  – "stuttering engineer" → information $I_1 + I_2$

Shannon Information
$I = -\log_2(p)$
[Plot: $I = -\log_2(p)$ versus $p$ for $0 < p \le 1$.]

Information Units
– $\log_2$ – bits
– $\log_e$ – nats
– $\log_{10}$ – bans, or hartleys
Ralph Vinton Lyon Hartley (1888-1970), inventor of the electronic oscillator circuit that bears his name, was a pioneer in the field of Information Theory.

Illustration
Q: We flip a coin 10 times. What is the probability we get the sequence 0 0 1 1 0 1 1 1 0 1?
Answer: $p = \dfrac{1}{2^{10}}$
How much information do we have?
$I = -\log_2(p) = \log_2\left(2^{10}\right) = 10$ bits

Illustration: 20 Questions
Interval halving over 16 equally likely possibilities needs 4 bits of information:
$I = -\log_2\left(\dfrac{1}{16}\right) = 4$ bits

Entropy
For a Bernoulli trial with parameter $p$:
– Information from a success $= -\log_2(p)$
– Information from a failure $= -\log_2(1-p)$
(Weighted) average information:
$H = -p\log_2(p) - (1-p)\log_2(1-p)$
Average information = entropy.

The Binary Entropy Function
$h(p) = -p\log_2(p) - (1-p)\log_2(1-p)$
[Plot: $h(p)$ versus $p$; the maximum is 1 bit at $p = 1/2$.]

Entropy Definition
$H = -\sum_n p_n \log_2(p_n)$ = average information

Entropy of a Uniform Distribution
$p_k = \dfrac{1}{K}$ for $1 \le k \le K$:
$H = -\sum_{k=1}^{K} \dfrac{1}{K}\log_2\left(\dfrac{1}{K}\right) = \log_2 K$

Entropy as an Expected Value
$H = -\sum_n p_n \log_2(p_n) = E[I] = E\left[-\log_2\left(p_X(X)\right)\right]$
where
$p_X(x) = \begin{cases} p_n, & x = x_n \\ 0, & \text{otherwise} \end{cases}$

Entropy of a Geometric RV
$p_X(k) = \begin{cases} p(1-p)^k, & k = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$
Then, writing $q = 1 - p$,
$H = -\sum_{n=0}^{\infty} p q^n \log_2(p q^n) = -p\sum_{n=0}^{\infty} q^n\left[\log_2(p) + n\log_2(q)\right] = \dfrac{h(p)}{p}$
$H = 2$ bits when $q = p = 0.5$.

Relative Entropy
$p = [p_1, p_2, p_3, \ldots, p_K]$, $q = [q_1, q_2, q_3, \ldots, q_K]$
$H(p,q) = -\sum_{k=1}^{K} p_k \log_2(q_k) - H(p) = \sum_{k=1}^{K} p_k \log_2\left(\dfrac{p_k}{q_k}\right)$

Relative Entropy Property
$H(p,q) \ge 0$, with equality iff $p = q$.

Relative Entropy Property Proof
Since $\ln x \le x - 1$ (working in nats changes only a positive scale factor),
$-H(p,q) = \sum_{k=1}^{K} p_k \ln\left(\dfrac{q_k}{p_k}\right) \le \sum_{k=1}^{K} p_k\left(\dfrac{q_k}{p_k} - 1\right) = \sum_{k=1}^{K} q_k - \sum_{k=1}^{K} p_k = 0$

Uniform Probability is Maximum Entropy
Take $q$ to be uniform, $q_k = \dfrac{1}{K}$:
$H(p,q) = \sum_{k=1}^{K} p_k \log_2\left(\dfrac{p_k}{1/K}\right) = \log_2 K - H \ge 0$
Thus, for $K$ fixed, $H \le \log_2 K$: the uniform distribution has maximum entropy. (How does this relate to thermodynamic entropy?)

Entropy as an Information Measure: Like 20 Questions
Sixteen balls are labeled 1 1 1 1 2 2 2 2 3 3 4 4 5 6 7 8, so labels 1 and 2 each appear four times, labels 3 and 4 twice, and labels 5 through 8 once. Bill chooses one ball. You must find its label with binary (yes/no) questions, minimizing the expected number of questions.

One Method...
Ask in turn "Is it 1?", "Is it 2?", and so on; the label is identified at the first "yes", and the question "Is it 7?" also settles 7 versus 8.
$E[Q] = 1\cdot\dfrac{1}{4} + 2\cdot\dfrac{1}{4} + 3\cdot\dfrac{1}{8} + 4\cdot\dfrac{1}{8} + 5\cdot\dfrac{1}{16} + 6\cdot\dfrac{1}{16} + 7\cdot\dfrac{1}{16} + 7\cdot\dfrac{1}{16} = \dfrac{51}{16} = 3.1875$ questions

Another (Better) Method...
Split the remaining probability roughly in half at each step:
– "Is $X \le 2$?" If yes, "Is $X = 1$?" identifies 1 or 2 in two questions.
– If no, "Is $X \le 4$?" If yes, "Is $X = 3$?" identifies 3 or 4 in three questions.
– If no, "Is $X \le 6$?" If yes, "Is $X = 5$?" identifies 5 or 6 in four questions; if no, "Is $X = 7$?" identifies 7 or 8 in four questions.
Longer paths have smaller probabilities.
$E[Q] = 2\cdot\dfrac{1}{4} + 2\cdot\dfrac{1}{4} + 3\cdot\dfrac{1}{8} + 3\cdot\dfrac{1}{8} + 4\cdot\dfrac{1}{16} + 4\cdot\dfrac{1}{16} + 4\cdot\dfrac{1}{16} + 4\cdot\dfrac{1}{16} = \dfrac{44}{16} = 2.75$ questions

Relation to Entropy...
$E[Q] = \dfrac{44}{16} = 2.75$ questions
The problem's entropy is
$H = -2\cdot\dfrac{1}{4}\log_2\left(\dfrac{1}{4}\right) - 2\cdot\dfrac{1}{8}\log_2\left(\dfrac{1}{8}\right) - 4\cdot\dfrac{1}{16}\log_2\left(\dfrac{1}{16}\right) = \dfrac{44}{16} = 2.75$ bits

Principle...
• The expected number of questions will equal or exceed the entropy. There can be equality only if all probabilities are powers of ½.
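The entropy formulas above are easy to sanity-check numerically. Below is a minimal Python sketch, not part of the original slides: the function names (entropy, binary_entropy, relative_entropy), the example distributions, and the truncation of the geometric sum at 200 terms are my own choices for illustration.

```python
import math

def entropy(p):
    """H = -sum p_n log2(p_n), skipping zero-probability outcomes."""
    return -sum(pn * math.log2(pn) for pn in p if pn > 0)

def binary_entropy(p):
    """h(p) = -p log2(p) - (1-p) log2(1-p)."""
    return entropy([p, 1 - p])

def relative_entropy(p, q):
    """H(p,q) = sum p_k log2(p_k / q_k); nonnegative, zero iff p == q."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Uniform distribution on K outcomes: H = log2(K)
K = 8
uniform = [1 / K] * K
print(entropy(uniform), math.log2(K))            # both 3.0

# Geometric RV with p = 0.5: H = h(p)/p = 2 bits
p = 0.5
geom = [p * (1 - p) ** k for k in range(200)]    # truncated tail is negligible
print(entropy(geom), binary_entropy(p) / p)      # both ~2.0

# Relative entropy to the uniform distribution: log2(K) - H >= 0
skewed = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]
print(relative_entropy(skewed, uniform))         # positive
print(math.log2(K) - entropy(skewed))            # same number
```

The last two printed values agree because the relative entropy taken against the uniform distribution is exactly $\log_2 K - H$, which is the maximum-entropy argument above.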
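The 16-ball game can be checked the same way. The sketch below assumes the two questioning strategies exactly as described above; the per-label question counts and the variable names (prob, sequential, balanced) are mine, chosen only for this check.

```python
import math

# Probability of each label among the 16 balls: 1 and 2 appear four times,
# 3 and 4 twice, 5 through 8 once each.
prob = {1: 4/16, 2: 4/16, 3: 2/16, 4: 2/16, 5: 1/16, 6: 1/16, 7: 1/16, 8: 1/16}

# Questions needed per label under the sequential strategy ("Is it 1?", "Is it 2?", ...)
sequential = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 7}

# Questions needed per label under the balanced tree ("Is X <= 2?", ...)
balanced = {1: 2, 2: 2, 3: 3, 4: 3, 5: 4, 6: 4, 7: 4, 8: 4}

def expected_questions(lengths):
    return sum(prob[x] * lengths[x] for x in prob)

H = -sum(p * math.log2(p) for p in prob.values())

print(expected_questions(sequential))  # 51/16 = 3.1875
print(expected_questions(balanced))    # 44/16 = 2.75
print(H)                               # 2.75 bits
```

Because every label probability here is a power of ½, the balanced tree meets the entropy exactly, as the principle predicts.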
Principle Proof
Lemma (the Kraft inequality): If there are $K$ solutions and the length of the path to the $k$th solution is $\ell_k$, then
$\sum_{k=1}^{K} 2^{-\ell_k} \le 1$

Proof of the principle:
$E[Q] - H = \sum_{k=1}^{K} \ell_k p_k + \sum_{k=1}^{K} p_k \log_2(p_k) = \sum_{k=1}^{K} p_k \log_2\left(\dfrac{p_k}{2^{-\ell_k}}\right)$
This is the relative entropy with respect to $q_k = 2^{-\ell_k}$; by the lemma the $q_k$ sum to at most one, which is all the nonnegativity argument requires. Since the relative entropy is always nonnegative,
$E[Q] \ge H$
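A quick numeric check of this proof, using the path lengths of the better tree from the example above; a minimal sketch whose variable names (prob, lengths, EQ, D) are my own, and which only exercises the dyadic case from the slides.

```python
import math

prob    = [4/16, 4/16, 2/16, 2/16, 1/16, 1/16, 1/16, 1/16]  # label probabilities
lengths = [2, 2, 3, 3, 4, 4, 4, 4]                          # path lengths in the balanced tree

# Kraft lemma: the leaf weights 2^(-l_k) sum to at most 1
print(sum(2 ** -l for l in lengths))                        # 1.0

# E[Q] - H equals the relative entropy with respect to q_k = 2^(-l_k)
EQ = sum(p * l for p, l in zip(prob, lengths))
H  = -sum(p * math.log2(p) for p in prob)
q  = [2 ** -l for l in lengths]
D  = sum(p * math.log2(p / qk) for p, qk in zip(prob, q))

print(EQ - H, D)  # equal, and nonnegative
```

In this dyadic case $q_k = p_k$, so the gap is exactly zero; for a non-dyadic distribution the same computation gives a strictly positive $E[Q] - H$.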