Coding (Huffman and Shannon-Fano)

The advantage of using coders such as Huffman or Shannon-Fano prior to transmission of a data signal is that the available bandwidth is utilised more efficiently, because the coding process reduces the average bit rate. These coders use variable-length codes, assigning shorter code words to symbols that occur with high probability and longer code words to symbols with a low probability of occurrence.

Huffman Code

The Huffman code is an organised technique for finding the best possible variable-length code for a given set of messages. For example, suppose we want to encode 5 symbols S1-S5 with probabilities 0.0625, 0.125, 0.25, 0.0625 and 0.5 respectively. The Huffman coding procedure is as follows:

(1) Arrange the messages in order of decreasing probability:

Word   Probability
S5     0.5
S3     0.25
S2     0.125
S1     0.0625
S4     0.0625

(2) Combine the bottom two entries to form a new entry whose probability is the sum of the probabilities of the original entries. If necessary, reorder the list so that the new entry sits in the correct position of decreasing order.

Word   Probability   Probability
S5     0.5           0.5
S3     0.25          0.25
S2     0.125         0.125
S1     0.0625        0.125
S4     0.0625

Note that the bottom entry in the third column is the combination of S1 and S4.

(3) Continue combining in pairs until only 2 entries remain.

Word   Probability   Probability   Probability   Probability
S5     0.5           0.5           0.5           0.5
S3     0.25          0.25          0.25          0.5
S2     0.125         0.125         0.25
S1     0.0625        0.125
S4     0.0625

(4) Now assign code bits, starting at the right-hand column. Moving back to the left, append a further bit wherever a split has occurred. The assigned bits are shown in parentheses beside the probabilities.

Word   Probability       Probability      Probability    Probability
S5     0.5    (0)        0.5    (0)       0.5   (0)      0.5  (0)
S3     0.25   (10)       0.25   (10)      0.25  (10)     0.5  (1)
S2     0.125  (110)      0.125  (110)     0.25  (11)
S1     0.0625 (1110)     0.125  (111)
S4     0.0625 (1111)

Finally, the code words are as follows:

Word   Code
S1     1110
S2     110
S3     10
S4     1111
S5     0

The average length is

4 × 0.0625 + 3 × 0.125 + 2 × 0.25 + 4 × 0.0625 + 1 × 0.5 = 1.875 bits/symbol.

This is much more efficient than a simple fixed allocation of 3 bits per symbol.
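For reference, the combining procedure can be sketched in a few lines of Python. The sketch below is illustrative only (huffman_code is not a library function, and the heap-based tie-breaking is an arbitrary choice); ties between equal probabilities may produce a different, but equally efficient, set of code words. For the example above it gives the code words just listed and an average length of 1.875 bits/symbol.

    import heapq

    def huffman_code(probs):
        """Build a binary Huffman code for a dict of symbol -> probability."""
        # Each heap entry is (probability, tie-breaker, {symbol: partial code word}).
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # the two least probable entries
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}          # prepend 0 to one branch
            merged.update({s: "1" + w for s, w in c2.items()})    # and 1 to the other
            tie += 1
            heapq.heappush(heap, (p1 + p2, tie, merged))
        return heap[0][2]

    probs = {"S1": 0.0625, "S2": 0.125, "S3": 0.25, "S4": 0.0625, "S5": 0.5}
    code = huffman_code(probs)
    l_avg = sum(probs[s] * len(code[s]) for s in probs)
    print(code)   # {'S5': '0', 'S3': '10', 'S2': '110', 'S1': '1110', 'S4': '1111'}
    print(l_avg)  # 1.875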
Tutorial: Huffman Coding

(1) 9 symbols (S1-S9) with probabilities 0.04, 0.14, 0.02, 0.14, 0.49, 0.07, 0.02, 0.07 and 0.01 are transmitted down a digital channel. Determine a Huffman coding of these symbols. What is the average number of bits per symbol?

(2) A digital source transmits data using 8 symbols (S1-S8). The probabilities of symbols 1 to 4 are 0.05, 0.1, 0.2 and 0.25 respectively. Symbols 5 and 6 have the same probability of 0.12. Symbols 7 and 8 also have the same probability of occurrence. Derive the Huffman code for these symbols and determine the average code word length in this case.

Shannon-Fano Code

The Shannon-Fano code is implemented as follows:

(1) Arrange the symbols in order of decreasing probability, as before.

Word   Probability
S5     0.5
S3     0.25
S2     0.125
S1     0.0625
S4     0.0625

(2) Partition the symbols into the two most nearly equiprobable subsets, i.e. find the dividing line that makes the total probabilities of the upper and lower sets as close as possible. Assign a 0 to every member of one set and a 1 to every member of the other (or vice versa). Here the first split separates S5 from the rest, so S5 is assigned a 0.

Word   Probability   Code so far
S5     0.5           0
S3     0.25          1
S2     0.125         1
S1     0.0625        1
S4     0.0625        1

(3) Continue the subdivision process within each subset until every symbol has its own code word.

Word   Probability   Code so far
S3     0.25          10
S2     0.125         11
S1     0.0625        11
S4     0.0625        11

Word   Probability   Code so far
S2     0.125         110
S1     0.0625        111
S4     0.0625        111

Word   Probability   Code so far
S1     0.0625        1110
S4     0.0625        1111

Finally, the code words are as follows:

Word   Code
S1     1110
S2     110
S3     10
S4     1111
S5     0

In this case the result turns out to be exactly the same as in the Huffman example. This is not always the case.

One disadvantage of the Huffman code is that no code bits can be assigned until the entire combination process has been completed, so considerable computer storage may be required. The storage requirements of the Shannon-Fano code are considerably more relaxed, and the code is easier to implement. However, the Shannon-Fano code does not always give results as good as Huffman.
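The split-and-assign procedure lends itself to a recursive sketch. The Python below is again illustrative only (shannon_fano is an illustrative name, and where two split points are equally balanced the first one is taken); for the five-symbol example it reproduces the code words above and an average length of 1.875 bits/symbol.

    def shannon_fano(symbols):
        """symbols: list of (symbol, probability), sorted by decreasing probability."""
        if len(symbols) == 1:
            return {symbols[0][0]: ""}
        total = sum(p for _, p in symbols)
        # Find the split point that makes the two groups as nearly equiprobable as possible.
        best_k, best_diff, running = 1, float("inf"), 0.0
        for k in range(1, len(symbols)):
            running += symbols[k - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_k, best_diff = k, diff
        upper, lower = symbols[:best_k], symbols[best_k:]
        code = {s: "0" + w for s, w in shannon_fano(upper).items()}        # 0 for the upper set
        code.update({s: "1" + w for s, w in shannon_fano(lower).items()})  # 1 for the lower set
        return code

    symbols = [("S5", 0.5), ("S3", 0.25), ("S2", 0.125), ("S1", 0.0625), ("S4", 0.0625)]
    code = shannon_fano(symbols)
    l_avg = sum(p * len(code[s]) for s, p in symbols)
    print(code)   # {'S5': '0', 'S3': '10', 'S2': '110', 'S1': '1110', 'S4': '1111'}
    print(l_avg)  # 1.875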
Sample Problem: Huffman and Shannon-Fano Coding

A data source transmits 8 symbols with the following probabilities:

Symbol        S1    S3    S4    S5    S7    S8
Probability   0.2   0.05  0.15  0.1   0.25  0.125

Symbols 2 and 6 have the same probability. Determine:

(i) the Huffman coding of these symbols
(ii) the Shannon-Fano coding of these symbols
(iii) the average number of bits/symbol in each case

Huffman

First we sum the given probability values:

0.2 + 0.05 + 0.15 + 0.1 + 0.25 + 0.125 = 0.875

so the probability of symbols 2 and 6 must be (1 - 0.875)/2 = 0.125/2 = 0.0625.

Now we assemble the symbols in decreasing order of probability, combine the two lowest entries at each stage, and assign 0 or 1 working from right to left through the table. Each column below lists the probabilities in decreasing order after each successive combination (so the later entries in a row do not necessarily belong to the symbol that labels it):

(S7)   0.25     0.25     0.25     0.25     0.3125   0.4375   0.5625
(S1)   0.2      0.2      0.2      0.2375   0.25     0.3125   0.4375
(S4)   0.15     0.15     0.1625   0.2      0.2375   0.25
(S8)   0.125    0.125    0.15     0.1625   0.2
(S5)   0.1      0.1125   0.125    0.15
(S2)   0.0625   0.1      0.1125
(S6)   0.0625   0.0625
(S3)   0.05

Assigning bits from the right-hand column back through the table, we read off the assignment of the symbols:

Symbol        S1    S2      S3      S4    S5      S6      S7    S8
Probability   0.2   0.0625  0.05    0.15  0.1     0.0625  0.25  0.125
Assignment    11    0001    1011    001   0000    1010    01    100

The average number of bits/symbol (Lavg) is obtained by summing the products of the probability of each symbol and the number of bits assigned to that particular symbol:

Lavg = (0.2)(2) + (0.0625)(4) + (0.05)(4) + (0.15)(3) + (0.1)(4) + (0.0625)(4) + (0.25)(2) + (0.125)(3)
     = 0.4 + 0.25 + 0.2 + 0.45 + 0.4 + 0.25 + 0.5 + 0.375
     = 2.825 bits/symbol

Shannon-Fano

First we create a table of the symbols in descending order of probability. Then we subdivide the table into two groups of as nearly equal probability as possible, assigning 0 to the first group and 1 to the second. The process is repeated within each group until every symbol has its own code word.

Symbol   Probability   Code
S7       0.25          00
S1       0.2           01
S4       0.15          100
S8       0.125         101
S5       0.1           1100
S6       0.0625        1101
S2       0.0625        1110
S3       0.05          1111

Reading off the assignment of the symbols:

Symbol        S1    S2      S3      S4    S5      S6      S7    S8
Probability   0.2   0.0625  0.05    0.15  0.1     0.0625  0.25  0.125
Assignment    01    1110    1111    100   1100    1101    00    101

Lavg = (0.2)(2) + (0.0625)(4) + (0.05)(4) + (0.15)(3) + (0.1)(4) + (0.0625)(4) + (0.25)(2) + (0.125)(3) = 2.825 bits/symbol

In this case the average number of bits/symbol is the same for both codes, although the actual code words are different.

Sample Problem: Coding Double Symbol Groups

(a) A digital encoder transmits 3 symbols A, B, C with respective probabilities of occurrence 0.3, 0.6 and 0.1. Code these symbols using the Huffman algorithm and determine the average number of bits per symbol of the coded symbols.

(b) If the 3 symbols are block coded into double symbol groups, determine the Huffman coding of these groups and the average number of bits per symbol in this case. In what way does grouping the symbols in this manner improve system performance?

Solution

(a) Initially we assemble the symbols in decreasing order of probability. Then we combine the last 2 symbols to produce a second column, assign 0 to the top entry and 1 to the bottom entry, and work the process backwards, reading off the assignment of each symbol from the table.

Word   Probability      Combined
B      0.6              0.6   (0)
A      0.3   (10)       0.4   (1)
C      0.1   (11)

Thus B = 0, A = 10, C = 11.

Average number of bits/symbol:

LAV = (1)(0.6) + (2)(0.3) + (2)(0.1) = 0.6 + 0.6 + 0.2 = 1.4 bits/symbol.

(b) There are 9 possible double groupings of the symbols:

R = AA   S = AB   T = AC
U = BA   V = BB   W = BC
X = CA   Y = CB   Z = CC

P(R) = (0.3)(0.3) = 0.09   P(S) = (0.3)(0.6) = 0.18   P(T) = (0.3)(0.1) = 0.03
P(U) = (0.6)(0.3) = 0.18   P(V) = (0.6)(0.6) = 0.36   P(W) = (0.6)(0.1) = 0.06
P(X) = (0.1)(0.3) = 0.03   P(Y) = (0.1)(0.6) = 0.06   P(Z) = (0.1)(0.1) = 0.01

Again we assemble the group symbols in decreasing order of probability and combine in pairs. Each column below lists the probabilities in decreasing order after each successive combination:

(V)   0.36     0.36     0.36     0.36     0.36     0.36     0.36     0.64
(U)   0.18     0.18     0.18     0.18     0.18     0.28     0.36     0.36
(S)   0.18     0.18     0.18     0.18     0.18     0.18     0.28
(R)   0.09     0.09     0.09     0.12     0.16     0.18
(W)   0.06     0.06     0.07     0.09     0.12
(Y)   0.06     0.06     0.06     0.07
(T)   0.03     0.04     0.06
(X)   0.03     0.03
(Z)   0.01

Reading off the code bits from the right-hand column back through the table gives:

V = 1, U = 000, S = 001, R = 0100, Y = 0110, W = 0111, T = 01011, X = 010100, Z = 010101

Average number of bits per symbol group:

= (1)(0.36) + (3)(0.18) + (3)(0.18) + (4)(0.09) + (4)(0.06) + (4)(0.06) + (5)(0.03) + (6)(0.03) + (6)(0.01) = 2.67 bits/group

Since each group corresponds to 2 symbols, the average number of bits/symbol is:

LAV = 2.67/2 = 1.335 bits/symbol.

Grouping the symbols in this way improves the efficiency of the transmission because the code is matched to the joint probabilities of symbol pairs: highly probable pairs such as BB receive very short code words. The result is fewer bits per symbol (1.335) than the simpler single-symbol implementation (1.4).
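The pair probabilities and the resulting average length can be checked with a short Python sketch. It is illustrative only (huffman_lengths is an assumed helper that tracks code lengths rather than the code words themselves); tie-breaking between equal probabilities may change individual code lengths, but not the average.

    import heapq
    from itertools import product

    def huffman_lengths(probs):
        """Return {symbol: Huffman code length} for a dict of symbol -> probability."""
        # Heap entries are (probability, tie-breaker, list of symbols under this node).
        heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        lengths = dict.fromkeys(probs, 0)
        tie = len(heap)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for s in s1 + s2:              # every symbol under the merged node gains one bit
                lengths[s] += 1
            tie += 1
            heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        return lengths

    single = {"A": 0.3, "B": 0.6, "C": 0.1}
    # Joint probabilities of the 9 ordered pairs, e.g. P(AB) = P(A)P(B).
    pairs = {x + y: single[x] * single[y] for x, y in product(single, repeat=2)}

    lengths = huffman_lengths(pairs)
    l_group = sum(pairs[g] * lengths[g] for g in pairs)
    print(round(l_group, 3))      # 2.67 bits per two-symbol group
    print(round(l_group / 2, 4))  # 1.335 bits per symbol, versus 1.4 for single-symbol coding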
Lempel-Ziv Compression

A data source uses 3 symbols (a, b and c), where each symbol can be represented by an 8-bit ASCII code. The initial sequence of symbols from the source is:

aaabccbbccccabbbbaaaa

We wish to show, with the aid of a table, how compression would be implemented on this sequence and what compression ratio the code offers.

The data is parsed by breaking it up into phrases (groups of symbols) that have not been seen before. Each new phrase consists of a previously seen phrase extended by one further symbol, and is transmitted as the index of that previous phrase followed by the new symbol. In this way a natural compression of the data occurs, because a short index comes to stand for progressively longer sequences of symbols, in a manner similar to run-length coding. The parsing of the sequence above is as follows (the index is counted as the minimum number of bits needed for its value, and each new symbol as 8 ASCII bits):

Index   Phrase   Sent as   Bits     Total
1       a        0, a      1 + 8    9
2       aa       1, a      1 + 8    9
3       b        0, b      1 + 8    9
4       c        0, c      1 + 8    9
5       cb       4, b      3 + 8    11
6       bc       3, c      2 + 8    10
7       cc       4, c      3 + 8    11
8       ca       4, a      3 + 8    11
9       bb       3, b      2 + 8    10
10      bba      9, a      4 + 8    12
11      aaa      2, a      2 + 8    10

If the sequence were sent directly as 8-bit ASCII it would require 21 × 8 = 168 bits, whereas the parsed representation uses a total of 111 bits. A compression ratio of 168/111 ≈ 1.5 is therefore achieved.

[Figure: Decompressor tree - the dictionary entries above arranged as a tree rooted at node 0, with each phrase stored as a child of the phrase that forms its prefix.]

A typical application of this coding method is the lossless PKZIP utility. The advantage of the code is that it is adaptive and does not require prior knowledge of the probabilities of the symbols.
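The parsing step can be sketched in a few lines of Python. The sketch is illustrative only (lz78_parse is an assumed name, and the bit count follows the same accounting convention as the table above); for the example sequence it reproduces the 11 dictionary entries and the 111-bit total.

    def lz78_parse(data):
        """Break the input into phrases not seen before (LZ78-style parsing)."""
        # Each phrase is emitted as (index of its longest previously seen prefix, new symbol);
        # index 0 stands for the empty prefix.
        dictionary = {}                      # phrase -> dictionary index
        output, phrase = [], ""
        for ch in data:
            if phrase + ch in dictionary:
                phrase += ch                 # keep extending a phrase already in the dictionary
            else:
                output.append((dictionary.get(phrase, 0), ch))
                dictionary[phrase + ch] = len(dictionary) + 1
                phrase = ""
        if phrase:                           # input ended inside an already known phrase
            output.append((dictionary[phrase], ""))
        return output

    data = "aaabccbbccccabbbbaaaa"
    entries = lz78_parse(data)
    for index, (prefix, symbol) in enumerate(entries, start=1):
        print(index, prefix, symbol)         # reproduces the table above

    # Bit budget: just enough bits for each index value (at least 1) plus 8 bits per symbol.
    bits = sum(max(prefix.bit_length(), 1) + 8 for prefix, _ in entries)
    print(bits, "bits, versus", 8 * len(data), "bits uncompressed")   # 111 versus 168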