Shannon Entropy

Shannon Entropy
Shannon worked at Bell Labs (part of AT&T).
Major question for telephone communication: How to transmit signals most efficiently and effectively across telephone wires?
Shannon adapted Boltzmann's statistical mechanics ideas to the field of communication.
Claude Shannon, 1916–2001
Shannon’s Formulation of Communication
[Diagram: Message Source → Message (e.g., a word) → Receiver]
Message source: the set of all possible messages this source can send, each with its own probability of being sent next.
Message: e.g., a symbol, number, or word.
Information content H of the message source: a function of the number of possible messages and their probabilities.
Informally: the amount of "surprise" the receiver has upon receipt of each message.
No surprise; no information content.
Message source: a one-year-old
Messages: "Da" (probability 1)
InformationContent(one-year-old) = 0 bits
More surprise; more information content.
Message source: a three-year-old
Messages: 500 words (w_1, w_2, ..., w_500)
Probabilities: p_1, p_2, ..., p_500
InformationContent(three-year-old) > 0 bits
Shannon information (H):
If all messages have the same probability, then
H(message source) = log_2(number of possible messages) = -log_2(probability of a message)
Units: "bits per message"
Example: Random bits (1, 0)
Example: Random DNA (A, C, G, T) (answer in "bits per message")
Example: Random notes in an octave (C, D, E, F, G, A, B, C') (answer in "bits per message")
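A small Python sketch (not part of the original slides) of the equal-probability case, using the message counts from the three examples above:

```python
import math

# When all M messages are equally likely, the Shannon information of the
# source is simply log2(M) bits per message.
def equal_prob_information(num_messages):
    return math.log2(num_messages)

print(equal_prob_information(2))  # random bits (1, 0)                     -> 1.0 bit per message
print(equal_prob_information(4))  # random DNA (A, C, G, T)                -> 2.0 bits per message
print(equal_prob_information(8))  # random notes (C, D, E, F, G, A, B, C') -> 3.0 bits per message
```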
General Formula for Shannon Information Content
Let M be the number of possible messages, and p_i be the probability of message i.
H(message source) = -\sum_{i=1}^{M} p_i \log_2(p_i)
• Example: Biased coin
• Example: Text
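A sketch of the general formula applied to both examples. The 0.9/0.1 coin bias is an illustrative choice, and the sample text is the worksheet phrase used later in these slides:

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon information H = -sum_i p_i * log2(p_i), in bits per message.
    Messages with probability 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Biased coin: heads with probability 0.9, tails with probability 0.1.
print(entropy([0.9, 0.1]))   # about 0.47 bits per message (a fair coin gives 1 bit)

# Text: estimate each character's probability from its frequency in the text.
text = "to be or not to be"
counts = Counter(text)
probs = [c / len(text) for c in counts.values()]
print(entropy(probs))        # about 2.59 bits per character
```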
Relation to Coding Theory:
Information content = the average number of bits it takes to encode a message from a given message source, given an "optimal coding".
This gives the compressibility of a text.
Huffman Coding
• An optimal (minimal) and unambiguous coding, based on information theory.
• Algorithm devised by David Huffman in 1952
• Online calculator: http://planetcalc.com/2481/
David Huffman
Huffman Coding Example
Name:_____________________________
Character   Frequency
sp          5
o           4
t           3
b           2
e           2
r           1
n           1
Phrase: to be or not to be
Huffman code of phrase:
(remember to include sp code for spaces)
Average bits per character in code:
Shannon entropy of phrase:
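One way to check the worksheet: the Python sketch below (not part of the slides) builds a Huffman code for the phrase with heapq, then compares the average bits per character of the encoding to the Shannon entropy of the character frequencies. The exact bit strings depend on arbitrary tie-breaking, but the average code length does not.

```python
import heapq
import math
from collections import Counter

def huffman_code(text):
    """Build a Huffman code for the characters of `text`.
    Returns (codes, freqs): a dict of char -> bit string, and the character counts."""
    freqs = Counter(text)
    # Heap entries are (frequency, tie-breaker, tree); a tree is either a single
    # character or a [left, right] pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, [t1, t2]))
        next_id += 1
    codes = {}
    def assign(tree, prefix):
        if isinstance(tree, list):             # internal node: 0 = left, 1 = right
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:                                  # leaf: a single character
            codes[tree] = prefix or "0"
    assign(heap[0][2], "")
    return codes, freqs

phrase = "to be or not to be"
codes, freqs = huffman_code(phrase)
encoded = "".join(codes[ch] for ch in phrase)

avg_bits = len(encoded) / len(phrase)
H = -sum(f / len(phrase) * math.log2(f / len(phrase)) for f in freqs.values())

print(codes)                                            # space and 'o' get the shortest codes
print(f"average bits per character: {avg_bits:.3f}")    # about 2.61
print(f"Shannon entropy of phrase:  {H:.3f}")           # about 2.59
```

For this phrase the average code length comes out to about 2.61 bits per character, just above the entropy of about 2.59 bits, as the coding-theory relation above predicts.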
Clustering
[Figure: a clustering C consisting of three clusters c1, c2, and c3]
What is the entropy of each cluster?
What is the entropy of the clustering?
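The figure's point labels aren't recoverable from the text, so the sketch below uses made-up cluster contents. It also assumes one common convention: a cluster's entropy is the entropy of the mix of item labels inside it, and the clustering's entropy is the size-weighted average of the cluster entropies.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the label distribution in `items`."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical contents of clusters c1, c2, c3 (illustrative only).
clusters = {
    "c1": ["x", "x", "x", "x"],             # pure cluster     -> 0 bits
    "c2": ["x", "x", "o", "o"],             # evenly mixed     -> 1 bit
    "c3": ["x", "o", "o", "o", "o", "o"],   # mostly one label -> about 0.65 bits
}

for name, items in clusters.items():
    print(name, entropy(items))

# Entropy of the whole clustering C: average of the cluster entropies,
# weighted by cluster size (one common definition).
total = sum(len(items) for items in clusters.values())
H_C = sum(len(items) / total * entropy(items) for items in clusters.values())
print("H(C) =", H_C)
```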