INFORMATION and COMPUTATION
what is a bit ?
what is computation?
• a transformation of information
from one form into another
• an effective procedure for generating
output when an input sequence is given
• a description of information over time
so . . . . what is information?
what is information?
• dictionary:
– information, n.
knowledge communicated or received
concerning a particular fact or circumstance
• philosophy:
– data < information < knowledge < wisdom
• engineering
– information resolves uncertainty
so, for our purpose, information has little to do with
knowledge or meaning: the less likely a message is, the more information it carries.
newspaper information
Feyenoord outclasses Real Madrid
Super Cup in Rotterdam
From our Telesport editors
VENLO - After Ajax had earlier failed to win
away at VVV-Venlo, PSV also did not manage
to produce a three-pointer in De Koel.
Coach Fred Rutten's team took a 2-0 lead,
but gave the match away completely after
half-time. Wilfred Bouma eventually rescued
a point for the Brabanders: 3-3.
in engineering, information often comes in symbols
in sequence ( "serial" ),
in simultaneous groups ( "parallel" ),
or both
how much in a symbol?
how many distinct messages can be built
with N equally likely symbols from an alphabet A?
|A|^N
how many distinct messages
with 2N equally likely symbols from an alphabet A?
|A|^(2N)
how much information is in a message of 2N symbols,
when a message of N symbols has information I(N)?
2·I(N)
define I(N) := log2(|A|^N) , then I(xN) = x·log2(|A|^N) = x·I(N).
the average information per symbol is
I(xN) / (xN) = xN·log2|A| / (xN) = log2|A| = −log2(1/|A|) ,
independent of x or N
the unit of information when base 2 is chosen is called the bit
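a minimal Python sketch of this counting argument (the 4-symbol alphabet and the message length N below are just example values):

from math import log2

A = ['a', 'b', 'c', 'd']         # example alphabet
N = 5                            # message length in symbols

messages_N  = len(A) ** N        # |A|^N distinct messages of length N
messages_2N = len(A) ** (2 * N)  # |A|^(2N) distinct messages of length 2N

I_N  = log2(messages_N)          # I(N)  =  N * log2|A|
I_2N = log2(messages_2N)         # I(2N) = 2N * log2|A| = 2 * I(N)

print(I_N / N)                   # 2.0 bits per symbol: log2|A|, independent of N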
choosing symbol sequences
• encoding is
assigning representations to information
• choosing appropriate and efficient encoding
is a real engineering challenge
• encoding has impact at many levels
– what components to use (implementation)
– how many bits to use (efficiency)
– how reliable is the representation (robustness)
– who knows about the encoding (security)
in a fixed-length code
each piece of information gets the same number of symbols;
otherwise, the code is called a variable-length code
fixed-length encoding
fixed length codes (of sufficient length) are used
when all choices are equally likely
decimal digits
to represent all decimal digits we need at least 4 bits
log2 (10) = 3.322 bits
for example: BCD
(binary coded decimal)
characters
to represent 84 english characters we need 7 bits
(26 lower-case and 26 upper-case characters, 10 digits,
8 punctuation marks, 9 math symbols, and 5 finance symbols ...)
log2 (84) = 6.392 bits
for example: ASCII
(american standard code for information interchange)
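these bit counts can be checked with a few lines of Python; the BCD helper below is only a small illustration, not a full implementation:

from math import ceil, log2

def fixed_length_bits(n_choices):
    # smallest whole number of bits that distinguishes n equally likely choices
    return ceil(log2(n_choices))

print(log2(10), fixed_length_bits(10))   # 3.32... -> 4 bits, as in BCD
print(log2(84), fixed_length_bits(84))   # 6.39... -> 7 bits, as in ASCII

def bcd(decimal_string):
    # binary coded decimal: each decimal digit gets its own 4-bit group
    return ' '.join(format(int(d), '04b') for d in decimal_string)

print(bcd("1977"))                       # 0001 1001 0111 0111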
fixed-length encoding
unequal probabilities
• symbols are seldom equally likely!
– Morse gave a single dot to "E" and dash dash dot dash to "Q"
• unlikely choices provide more information
than likely ones
– "Switzerland beats Spain"
provides more information than
– "the Netherlands beats San Marino in a European Championship qualifier"
• fixed-length coding would be a waste
when symbols have vastly different probability!
• but how to assign variable-length codes to symbols?
is there a consistent information measure that helps?
can we measure information?
or can we quantify, give a number to, information?
desirable properties:
• vary continuously with changes in probability
• increasing with the number of choices with equal
probability
• it should not depend on how the choice is made (e.g. split into successive choices)
there is only one function with those properties !!
− ∑i pi log2 pi = ∑i pi log2 (1/pi)
note: ∑i=0,1 (1/2) log2 2 = 1 , so a choice out of 2 equally likely events
has information 1 bit
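that function is usually called the entropy of the choice; a minimal Python version (probabilities are assumed to sum to 1):

from math import log2

def average_information(probabilities):
    # sum of pi * log2(1/pi) over all choices, in bits
    return sum(p * log2(1 / p) for p in probabilities if p > 0)

print(average_information([1/2, 1/2]))   # 1.0 bit: a choice out of 2 equally likely events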
efficient encoding
choice i    pi     log2(1/pi)
"A"         1/2    1 bit
"B"         1/6    2.58 bits
"C"         1/6    2.58 bits
"D"         1/6    2.58 bits

average information
= (0.5)(1) + (3)(0.167)(2.58)
= 1.79 bits
if we had used log2 4 = 2 bits
for each character,
the average would be 2 bits
encoding efficiency is a measure of
how close the code size is to
the information content of the data stream
redundancy = number of bits used / number of bits needed
in the example: redundancy(fixed length) = 2 / 1.79 = 1.12 , or 12 %
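the same numbers as a short Python sketch, with the probabilities taken from the table above:

from math import ceil, log2

p = {'A': 1/2, 'B': 1/6, 'C': 1/6, 'D': 1/6}

bits_needed = sum(pi * log2(1 / pi) for pi in p.values())   # average information content
bits_used   = ceil(log2(len(p)))                            # fixed-length code: 2 bits per choice

print(bits_needed)                 # about 1.79 bits
print(bits_used / bits_needed)     # redundancy about 1.12, i.e. about 12 %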
variable length encodings
use shorter bit sequences for high probability choices,
longer sequences for less probable choices

choice i    pi     encoding
"A"         1/2    0
"B"         1/6    100
"C"         1/6    101
"D"         1/6    11

example: A B A C D A encodes as 0 100 0 101 11 0

average number of bits used
= 0.5 · 1 + 0.167 · 3 + 0.167 · 3 + 0.167 · 2 = 1.836 bits

redundancy = 1.836 / 1.790 = 1.026 , or 2.6 %
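a small sketch of encoding and decoding with this variable-length code (it works because no code word is a prefix of another):

code = {'A': '0', 'B': '100', 'C': '101', 'D': '11'}   # the table above

def encode(text):
    return ''.join(code[symbol] for symbol in text)

def decode(bits):
    # read bits until they match a code word; the prefix property
    # guarantees that the first match is the intended symbol
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, word = [], ''
    for bit in bits:
        word += bit
        if word in inverse:
            symbols.append(inverse[word])
            word = ''
    return ''.join(symbols)

print(encode("ABACDA"))            # 0 100 0 101 11 0 -> '01000101110'
print(decode(encode("ABACDA")))    # 'ABACDA'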
a tree is a graph
with exactly one path
between every pair of nodes
use a decoding tree !
combine the two lowest probabilities in a new node
with the sum of their probabilities . . .iteratively!
efficient encoding is of the utmost importance for file compression;
not only individual symbol probabilities are taken into account,
but also the probability of symbol sequences
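the iterative combination above is the idea behind Huffman coding; a minimal sketch for the example probabilities (the exact 0/1 labels may come out differently, but the code-word lengths match the table above):

import heapq

def huffman(probabilities):
    # each heap entry is (probability, tie-breaker, {symbol: code word so far})
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, lowest = heapq.heappop(heap)   # lowest probability node
        p1, _, second = heapq.heappop(heap)   # second lowest
        merged = {s: '0' + w for s, w in lowest.items()}
        merged.update({s: '1' + w for s, w in second.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman({'A': 1/2, 'B': 1/6, 'C': 1/6, 'D': 1/6}))
# {'A': '0', 'D': '10', 'B': '110', 'C': '111'} : lengths 1, 2, 3, 3 as before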
first assignment B1
• find an efficient code for reporting each throw of 2 dice
• calculate the redundancy (bits used / bits needed) of that code!
Answer due : sept 13, 12:00
Use only the format below;
comment/explanation/picture: only in the attachment, after the answer!

From: [email protected]
To: [email protected]
Subject : B1
3 0
4 100
5 101
6 11
1.026          (decimal point!)
attachment <student_B1.xxx>

remarks:
the maximum score is 6+6 = 12 ,
so a fixed-length code needs at least log2 12 = 3.58 bits ⇒ 4 bits suffice !
however, scores 0, 1, 13, 14, 15 do not occur;
in addition, not all occurring scores are equally likely
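as a hint for the assignment, the score probabilities and the information content of one throw can be computed with a few lines of Python (designing the actual code is the exercise):

from math import log2

counts = {}                                  # number of ways to throw each score with two dice
for die1 in range(1, 7):
    for die2 in range(1, 7):
        counts[die1 + die2] = counts.get(die1 + die2, 0) + 1

probabilities = {score: ways / 36 for score, ways in counts.items()}
print(probabilities)                         # score 2: 1/36, score 7: 6/36, ... far from equally likely

bits_needed = sum(p * log2(1 / p) for p in probabilities.values())
print(bits_needed)                           # about 3.27 bits per throw, less than the 4 bits of a fixed-length code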
further considerations
• encoding schemes that attempt to match
the information content of a data stream
are essentially removing redundancy.
(we call them data compression techniques)
• encoding schemes that add redundancy
– to adapt codes for their medium
– to ease manipulating encoded operands
– to improve the resiliency against noise
(enhance error detection and correction)
are called code expansion techniques
consequence: expanded codes contain more symbols
than necessary for the information content!
for example: the cd-code has an expansion factor of 49/16 !!
expansion is often used to enhance
reliability in communication
what is communication?
computation is
• a transformation of information from one form into
another
• an effective procedure for generating output when an
input sequence is given
• a description of information over time
communication is
• a description of information over space
• a transmission of information between system components
communication also exists between parts within a system,
such as a computer
a slight misunderstanding
Don, drop
the bomb!
Don't drop
the bomb!
the ultimate error message
errors occur
• because of faults in components
– apply testing, work around them, exclude them
• because of random phenomena during operation
– enhance robustness by expansion
introductory example
of transmission reliability
suppose we want to reliably transmit
the result of a single coin flip:
heads → 0 , tails → 1
if during transmission
a single-bit error occurs,
then a "0" is detected as a "1",
or a "1" is detected as a "0"
can errors be detected?
what we need is an encoding where a single-bit error
does not produce another valid code word
use more bits !
add a parity bit to each code word !
heads → 00 and tails → 11 are the valid words;
01 and 10 are invalid words
now, any two valid code words
are separated
by at least one invalid code word
or,
the minimum distance
between each pair of valid words
is at least two !
the hamming distance of two code words (representations)
in a fixed-length coding scheme is
the number of bit positions at which they differ
the hamming distance of a code
is the smallest hamming distance over all pairs of its words
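the definition translates directly into Python (code words as equal-length bit strings):

def hamming_distance(word1, word2):
    # number of bit positions at which two code words differ
    return sum(bit1 != bit2 for bit1, bit2 in zip(word1, word2))

def code_distance(valid_words):
    # smallest hamming distance over all pairs of valid code words
    return min(hamming_distance(a, b)
               for a in valid_words for b in valid_words if a != b)

print(hamming_distance('00', '11'))   # 2
print(code_distance(['00', '11']))    # the parity-bit coin-flip code has distance 2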
can errors be corrected?
YES ! ! ! by increasing the minimum hamming distance!
more in general:
encode heads → 000 and tails → 111 ;
the other 3-bit words (001, 010, 011, 100, 101, 110) are invalid,
so the two valid words are at hamming distance 3
coding theory
tells you much more !
to be able to detect t errors
the minimum hamming distance
has to be at least t+1 (larger than t)
to be able to correct t errors
the minimum hamming distance
has to be at least 2t+1 (larger than 2t)
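a sketch of both bounds with the triple-repetition code for the coin flip (heads → 000, tails → 111, minimum hamming distance 3: detect up to 2 errors, or correct 1):

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

valid = {'000': 'heads', '111': 'tails'}          # minimum distance 3

def receive(word):
    # nearest-valid-word decoding: detects an error and corrects a single bit flip
    if word in valid:
        return valid[word]
    nearest = min(valid, key=lambda v: hamming_distance(v, word))
    return valid[nearest] + ' (corrected)'

print(receive('000'))   # heads
print(receive('010'))   # heads (corrected): one bit flipped
print(receive('110'))   # tails (corrected): two bits flipped from 000 are mis-corrected;
                        # distance 3 lets you detect 2 errors or correct 1, not both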
summary
• information resolves uncertainty
• what is a bit ?
• there is only one acceptable information measure
– choicei with probability pi : log2 1/pi bits
– average number of bits = ∑pilog2 1/pi
• N equally probable choices:
– use fixed-length encodings: length = log2 N
– encoding numbers
• not necessarily equally probable choices
– use variable-length encodings to reduce redundancy
• to detect t bit-errors: hamming distance > t
• to correct t bit-errors: hamming distance > 2t