Coding and Data Compression

Mathias Winther Madsen
[email protected]
Institute for Logic, Language, and Computation
University of Amsterdam

March 2015


Information Theory

"All the ε's and δ's implied by the words 'small' and 'about' in these statements approach zero as we allow T to increase and S0 to approach the maximizing source. The situation is summarized in Fig. 10 where the input sequences are points on the left and output sequences points on the right. The fan of cross lines represents the range of possible causes for a typical output.

[Fig. 10 — Schematic representation of the relations between inputs and outputs in a channel: 2^(H(x)T) high probability messages M on the left, 2^(H(y)T) high probability received signals E on the right; each E has 2^(H_y(x)T) reasonable causes, and each M has 2^(H_x(y)T) reasonable effects.]

Now suppose we have another source producing information at rate R with R < C. In the period T this source will have 2^(TR) high probability messages. We wish to associate these with a selection of the possible channel inputs in such a way as to get a small frequency of errors. We will set up this [...]"

Claude Shannon: "A Mathematical Theory of Communication," Bell System Technical Journal, 1948.


Information Theory

A passage from Alice in Wonderland with more and more of its letters replaced by O's. The redundancy of English lets a reader reconstruct the opening, but not the heavily corrupted end:

THE COIEF DIFFIOULTY ALOCE FOUOD OT FIRST WAS IN OAOAGING HER FLAOINGO: SHE SUCCEODEO ON GO OTIOG IOS BODY OUOKEO AOAO, COMFOROABLY EOOOGO, UNDER OER O OM, WITO OTS O O OS HANGIOG DOO O, BOT OENEOAO OY, OUST AS SO O HOD OOT OTS O OCK NOCEO O SOROIGHTEOEO O OT, ANO WOS O O ONG TO OIOE TO O HEDGEHOG O OLOW WOTH ITS O OAD, O O WOULO TWOST O OSEOF OOUO O ANO O O OK OP IN HOR OACO, O OTO OUO O A O O OZOED EO OREOSOOO O O O O SHO COUOD O O O O O O O O O OSO O OG O O O OAO OHO O O: AOD WHON O O O OAO OOO O O O O O O O DOO O, O OD O OS GOIOG O O BO O ON O O OIO, O O O OS O O OY O OOOOO O O O O O O O O O O O OT TO O OEOGO O O O O OD O OROLO O O O O O O OF, O O O O O O O OHO O O O O O OOOOOOOO OOOOO


The Hartley Measure

Definition: The Hartley measure of uncertainty over a finite set of possibilities Ω is

    H = log2 |Ω|.

Ralph V. L. Hartley: "Transmission of Information," Bell System Technical Journal, 1928.


The Hartley Measure

All 24 orderings of the four suits ♠ ♣ ♥ ♦ are possible — equivalently, 24 five-bit codewords 00000, 00001, ..., 10111:

    H = log2 24 = 4.58 bits

When only the six orderings beginning with ♠ remain — equivalently, six three-bit codewords 000, 001, ..., 101:

    ♠♣♥♦  ♠♣♦♥  ♠♥♣♦  ♠♥♦♣  ♠♦♣♥  ♠♦♥♣

    H = log2 6 = 2.58 bits

When only two orderings remain:

    ♠♥♣♦  ♠♥♦♣

    H = log2 2 = 1.00 bit

And when a single ordering remains:

    ♠♥♣♦

    H = log2 1 = 0.00 bits

But the measure raises awkward questions when the set of possibilities is ill-defined or unbounded:

    H = log(k)?   H = log(k + 1)?   H = log(∞)?


Entropy

Definition: The Shannon entropy

    H = E[ log2 1/p(X) ] = Σ_x p(x) log2 1/p(x)

[Figure: a bar chart of a distribution p(x) over x = 0, 1, 2, 3, next to the corresponding surprisals −log p(x).]

[Figure: the entropy H of a biased coin as a function of its bias p, rising from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and falling back to 0 at p = 1.]

[Figure: a repeated branching process with probabilities p and 1 − p over outcomes 1, 2, 3, ..., and a plot of its entropy H as a function of p.]

Properties of the entropy:

1. Positive: H ≥ 0.
2. Decomposes: H(X × Y) = H(X) + H(Y | X).
3. Reduced (on average) by information: H(X) ≥ H(X | Y).
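Both measures are easy to compute directly. Below is a minimal Python sketch (the function names and layout are my own, not from the slides) that reproduces the numbers from the card example and shows that the Hartley measure is just the entropy of a uniform distribution:

```python
import math

def hartley(n):
    """Hartley measure: log2 of the number of possibilities."""
    return math.log2(n)

def entropy(p):
    """Shannon entropy in bits: sum of p(x) * log2(1/p(x))."""
    return sum(px * math.log2(1 / px) for px in p if px > 0)

print(hartley(24))  # 4.58 bits: all 24 suit orderings possible
print(hartley(6))   # 2.58 bits: six orderings remain
print(hartley(2))   # 1.00 bit:  two orderings remain
print(hartley(1))   # 0.00 bits: no uncertainty left

# The Hartley measure is the entropy of a uniform distribution:
print(entropy([1 / 24] * 24))  # 4.58 bits again
```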
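Property 2, the decomposition (chain rule), can also be checked numerically. The sketch below uses a small made-up joint distribution (the numbers are purely illustrative, not from the slides) and confirms that H(X, Y) = H(X) + H(Y | X):

```python
import math

def H(probs):
    """Entropy in bits of a sequence of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A made-up joint distribution over X in {0, 1} and Y in {0, 1}:
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal p(x), and H(Y | X) = sum_x p(x) H(Y | X = x):
p_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
H_Y_given_X = sum(
    p_x[x] * H([joint[(x, y)] / p_x[x] for y in (0, 1)])
    for x in (0, 1)
)

print(H(joint.values()))              # H(X, Y) = 1.846...
print(H(p_x.values()) + H_Y_given_X)  # H(X) + H(Y | X), the same value
```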
Definition: The conditional entropy

    H(X | Y) = E_Y[ H(X | Y) ] = Σ_y p(y) H(X | Y = y)


Huffman Coding

    x           a    b    c    d    e
    Pr{X = x}  .05  .15  .20  .25  .35

David A. Huffman: "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the Institute of Radio Engineers, 1952.


Huffman Coding

A Huffman code for English text (k is the codeword length; ¶ = paragraph break, _ = space):

    x   Code          p      −log2 p   k
    A   1001          .0634    3.98    4
    B   011101        .0135    6.21    6
    C   00011         .0242    5.37    5
    D   10100         .0321    4.96    5
    E   001           .0980    3.35    3
    F   101111        .0174    5.84    6
    G   101011        .0165    5.92    6
    H   11011         .0438    4.51    5
    I   0110          .0552    4.18    4
    J   011100000     .0009   10.17    9
    K   0111001       .0061    7.35    7
    L   10110         .0336    4.89    5
    M   101110        .0174    5.85    6
    N   0101          .0551    4.18    4
    O   1000          .0622    4.01    4
    P   110100        .0180    5.80    6
    Q   0111000100    .0008   10.33   10
    R   0000          .0470    4.41    4
    S   0100          .0502    4.32    4
    T   1100          .0729    3.78    4
    U   00010         .0234    5.42    5
    V   0111110       .0075    7.06    7
    W   011110        .0156    6.00    6
    X   011100001     .0014    9.46    9
    Y   101010        .0160    5.97    6
    Z   01110001011   .0005   11.04   11
    ¶   0111111       .0084    6.89    7
    _   111           .1741    2.52    3
    ’   011100011     .0019    9.06    9
    ,   1101011       .0117    6.42    7
    .   1101010       .0109    6.52    7
    ?   01110001010   .0003   11.56   11
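Huffman's construction itself is short: repeatedly merge the two least probable subtrees, prefixing their codewords with 0 and 1. Here is a minimal Python sketch (the heap-based structure and tie-breaking counter are implementation choices of mine, not from the paper), applied to the five-symbol source in the first table above:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code by repeatedly merging the two
    least probable subtrees and prefixing their codewords with 0/1."""
    tiebreak = count()  # makes heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.05, "b": 0.15, "c": 0.20, "d": 0.25, "e": 0.35}
for sym, code in sorted(huffman_code(probs).items()):
    print(sym, code)
```

With these probabilities the merges give c, d, and e two-bit codewords and a and b three-bit codewords, for an expected length of 2.2 bits per symbol against an entropy of about 2.12 bits.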
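As a sanity check on the English-text table, the p and k columns alone suffice to compare the code's expected length with the source entropy; the pairs below are copied from the table:

```python
import math

# (probability, codeword length) pairs, row by row from the table above
table = [
    (0.0634, 4), (0.0135, 6), (0.0242, 5), (0.0321, 5), (0.0980, 3),
    (0.0174, 6), (0.0165, 6), (0.0438, 5), (0.0552, 4), (0.0009, 9),
    (0.0061, 7), (0.0336, 5), (0.0174, 6), (0.0551, 4), (0.0622, 4),
    (0.0180, 6), (0.0008, 10), (0.0470, 4), (0.0502, 4), (0.0729, 4),
    (0.0234, 5), (0.0075, 7), (0.0156, 6), (0.0014, 9), (0.0160, 6),
    (0.0005, 11), (0.0084, 7), (0.1741, 3), (0.0019, 9), (0.0117, 7),
    (0.0109, 7), (0.0003, 11),
]

entropy = sum(p * math.log2(1 / p) for p, _ in table)
avg_len = sum(p * k for p, k in table)
print(f"entropy        = {entropy:.3f} bits/symbol")
print(f"average length = {avg_len:.3f} bits/symbol")
```

Because the codeword lengths k track −log2 p so closely, the average length (about 4.28 bits per symbol) comes out within a few hundredths of a bit of the entropy (about 4.24 bits per symbol).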