Informatics
Information and uncertainty
Manipulating symbols
• Previously: typology of signs; sign systems, semiotics; symbols
• These are tremendously important distinctions for informatics and the computational sciences
• Computation = symbol manipulation
• Symbols can be manipulated without reference to content (syntactically), thanks to the arbitrary nature of convention
• This is what allows computers to operate!
• All signs rely on a certain amount of convention, since all signs have a pragmatic (social) dimension, but symbols are the only signs which require exclusively a social convention, or code, to be understood.
Symbol manipulation
aedl: adel, adle, aedl, aeld, alde, aled, dael, dale, deal, dela, dlae, dlea, eadl, eald, edal, edla, elad, elda, lade, laed, ldae, ldea, lead, leda
4! permutations: 4 × 3 × 2 × 1 = 24
• Some have meaning (in some language); see the sketch below
• The relation between symbols and meaning is arbitrary
• Example: the cut-up method for generating poetry, pioneered by Brion Gysin and William Burroughs and often used by artists such as David Bowie, or the use of samples in electronic music
Information theory
"A Mathematical Theory of Communication", Claude Shannon (1948)
• Efficiency of information transmission in electronic channels
• Key concept: a quantity of information that can be measured unequivocally (objectively)
• Does not deal at all with the subjective aspects of information: semantics and pragmatics
• Information is defined as a quantity that depends on symbol manipulation alone
What’s an information quantity?
How do we quantify a relation? Information is a relation between an agent, a sign, and a thing, rather than simply a thing. The most palpable element in the information relation is the sign: symbols.
But which symbols do we use to quantify the information contained in messages?
• Several symbol systems can be used to convey the same message
• We must agree on the same symbol system for all messages!
What’s an information quantity?
Both sender and receiver must use the same code, or convention, to encode and decode messages. We need to fix the language used for communication:
• Alphabet: the set of symbols allowed
• Syntax: the rules to manipulate symbols in the alphabet
• Semantics: the meaning of the symbols
A language specifies the universe of all possible messages, i.e. the set of all possible symbol strings of a given size.
DEAL is 1 out of 4! = 4 × 3 × 2 × 1 = 24 choices: the 24 permutations of the symbols D, E, A, L listed earlier.
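As a toy illustration, a sketch (assuming a minimal two-symbol alphabet, not one from the slides) that enumerates the universe of all possible messages of a given size:

```python
from itertools import product

# The universe of messages for a fixed language: every string of
# length 3 over a hypothetical alphabet {0, 1}.
alphabet = ["0", "1"]
messages = ["".join(s) for s in product(alphabet, repeat=3)]
print(len(messages))  # 2**3 = 8 possible messages
print(messages)       # ['000', '001', '010', ..., '111']
```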
What’s an information quantity?
Information is defined as "a measure of the freedom of choice with which a message is selected from the set of all possible messages".
A bit (short for binary digit) is the most elementary choice one can make between two items: "0" or "1", "heads" or "tails", "true" or "false", etc.
A bit is equivalent to the choice between two equally likely alternatives.
For example, if we know that a coin is to be tossed, but are unable to see it as it falls, a message telling us whether the coin came up heads or tails (a two-symbol alphabet) gives us one bit of information.
Decision-making
Decision-making:
• Perhaps the most fundamental capability of human beings
• Decision always implies uncertainty
• Implies choice
• Implies lack of information, randomness, noise, error

"The highest manifestation of life consists in this: that a being governs its own actions. A thing which is always subject to the direction of another is somewhat of a dead thing."
"A man has free choice to the extent that he is rational." (St. Thomas Aquinas)

Herbert Simon: "bounded rationality":
• Limits imposed by time, ability, resources
• Satisficing

"In a predestinate world, decision would be illusory; in a world of perfect foreknowledge, empty; in a world without natural order, powerless. Our intuitive attitude to life implies non-illusory, non-empty, non-powerless decision… Since decision in this sense excludes both perfect foresight and anarchy in nature, it must be defined as choice in face of bounded uncertainty." (George Shackle)
Uncertainty-based information: original contributions
Information is transmitted through noisy communication channels: Ralph Hartley and Claude Shannon (at Bell Labs), the fathers of information theory, worked on the problem of efficiently transmitting information, i.e. decreasing the uncertainty in the transmission of information.

Hartley, R.V.L. [1928]. "Transmission of Information". Bell System Technical Journal, July 1928, p. 535.
Shannon, C. E. [1948]. "A Mathematical Theory of Communication". Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October 1948.
Choices: multiplication principle
• "If some choice can be made in M different ways, and some subsequent choice can be made in N different ways, then there are M x N different ways these choices can be made in succession" [Paulos]
• 3 shirts and 4 pants = 3 × 4 = 12 outfit choices; this is the uncertainty you face when deciding what to wear (see the sketch below)
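A quick sketch of the multiplication principle (the shirt and pant names are made up for illustration):

```python
from itertools import product

# Multiplication principle: M = 3 shirts and N = 4 pants
# give M x N = 12 possible outfits.
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis", "shorts", "cords"]
outfits = list(product(shirts, pants))
print(len(outfits))  # 12
```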
Hartley uncertainty
Nonspecificity: the Hartley measure
• The amount of uncertainty associated with a set of alternatives (e.g. messages) is measured by the amount of information needed to remove the uncertainty
• A type of ambiguity
• Quantifies how many yes/no (~1 bit) questions need to be asked to establish which alternative is the correct one

For a set of alternatives A = {x1, x2, x3, ..., xn}, the Hartley measure is

H(A) = log2 |A|

measured in bits, where |A| is the number of choices.
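A minimal sketch of the Hartley measure as a Python function:

```python
import math

# Hartley measure H(A) = log2 |A|: the uncertainty, in bits, of
# choosing among |A| alternatives, i.e. roughly how many yes/no
# questions are needed to pin down the correct one.
def hartley(num_choices: int) -> float:
    return math.log2(num_choices)

print(hartley(2))   # 1.0 bit: a single yes/no question (coin toss)
print(hartley(24))  # ~4.58 bits for the 24 permutations of "aedl"
```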
Hartley uncertainty
H(A) = log2 |A|, measured in bits, where |A| is the number of choices: it quantifies how many yes/no questions need to be asked to establish which alternative is the correct one.

Menu choices: how many dinner combinations?
A = 16 entrees
B = 4 desserts
|A × B| = 16 × 4 = 64
H(A × B) = log2(16 × 4) = log2(16) + log2(4) = 4 + 2 = 6 bits
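The logarithm is what makes the measure additive over independent choices, as a quick check shows:

```python
import math

# Hartley uncertainty adds across independent choices:
# log2(|A| * |B|) = log2 |A| + log2 |B|.
entrees, desserts = 16, 4
print(math.log2(entrees * desserts))             # 6.0 bits
print(math.log2(entrees) + math.log2(desserts))  # 4.0 + 2.0 = 6.0 bits
```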
Hartley uncertainty: decision trees
H(A) = log2 |A|, measured in bits, where |A| is the number of choices.
[Figure: a binary decision tree; each yes/no question halves the set of remaining choices.]
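A small sketch of the decision-tree view: each yes/no question can halve the remaining alternatives, so k questions suffice for 2^k choices:

```python
# Each yes/no question can rule out half of the remaining
# alternatives, so k questions distinguish up to 2**k items:
# the smallest sufficient k is the ceiling of log2(number of choices).
def questions_needed(num_choices: int) -> int:
    questions = 0
    while 2 ** questions < num_choices:
        questions += 1
    return questions

print(questions_needed(16))  # 4 questions for 16 entrees
print(questions_needed(64))  # 6 questions for 64 dinner combinations
```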
What about probability?
Some alternatives may be much more probable than others! A different type of ambiguity:
• Higher-frequency alternatives: less information required
• Measured by Shannon's entropy measure
• The amount of uncertainty associated with a set of alternatives (e.g. messages) is measured by the average amount of information needed to remove the uncertainty

[Figure: probability distribution of letters in English text (Orwell's 1984, in fact).]
Shannon’s entropy
Shannon's measure: the average amount of uncertainty associated with a set of weighted alternatives (e.g. messages) is measured by the average amount of information needed to remove the uncertainty.
For a set of weighted alternatives A = {x1, x2, x3, ..., xn}, where p(xi) is the probability of alternative xi:

H(A) = -Σi p(xi) log2 p(xi)

measured in bits.
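A minimal Python sketch of Shannon's measure:

```python
import math

# Shannon entropy in bits: average missing information over a
# probability distribution (terms with p = 0 contribute nothing).
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))  # ~0.47 bits: a biased coin is less uncertain
print(entropy([0.25] * 4))  # 2.0 bits: uniform over four alternatives
```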
Entropy of a message
A message is encoded in an alphabet of n symbols, for example:
• English: 26 characters + space
• Morse code: dots, dashes, and spaces
• DNA: A, T, G, C
What it measures
Entropy measures:
• missing information: how much information is needed to establish what the symbol is, or
• uncertainty about what the symbol is, or
• on average, how many yes/no questions need to be asked to establish what the symbol is.
Entropy is zero when there is only one alternative, and maximal for a uniform distribution.
Example: Morse code
Morse alphabet: dot, dash, and space, with probabilities p1, p2, p3.
1) All dots: p1 = 1, p2 = p3 = 0.
Take any symbol: it's a dot. No uncertainty, no question needed, no missing information: HS = -1·log2(1) = 0.
2) 50-50 dots and dashes: p1 = p2 = 1/2, p3 = 0.
Given the probabilities, we need to ask only one question: one piece of missing information. HS = -(1/2·log2(1/2) + 1/2·log2(1/2)) = -log2(1/2) = -(log2(1) - log2(2)) = log2(2) = 1 bit (a coin toss!).
3) Uniform: all symbols equally likely, p1 = p2 = p3 = 1/3.
Given the probabilities, we need to ask as many as 2 questions: 2 pieces of missing information. HS = -log2(1/3) = -(log2(1) - log2(3)) = log2(3) = 1.59 bits.
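The same three cases, checked numerically with the entropy sketch from above:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Morse alphabet {dot, dash, space} in the three scenarios above:
print(entropy([1.0, 0.0, 0.0]))  # -0.0, i.e. zero bits: all dots
print(entropy([0.5, 0.5, 0.0]))  # 1.0 bit: a coin toss between dot and dash
print(entropy([1/3, 1/3, 1/3]))  # ~1.585 bits, the 1.59 above
```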
Bits, entropy and Huffman codes
Given a symbol set {A, B, C, D, E} and occurrence probabilities pA, pB, pC, pD, pE, the Shannon entropy corresponds to the average minimum number of bits needed to represent a symbol.
Huffman coding: a variable-length code, for messages whose symbols occur with different frequencies, that minimizes the number of bits per symbol (see the sketch below).

Entropy:
H = -(0.250·log2(0.250) + 0.375·log2(0.375) + 0.167·log2(0.167) + 0.125·log2(0.125) + 0.083·log2(0.083)) = 2.135 bits

Huffman code, average number of bits per symbol:
0.375 × 1 + 0.250 × 2 + 0.167 × 3 + 0.125 × 4 + 0.083 × 4 = 2.208
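A compact Huffman-coding sketch: the symbol names and probability assignment are taken from the numbers above, and the tree is built with a standard heap-based construction (not necessarily the layout used in the original slides):

```python
import heapq
import math

probs = {"A": 0.375, "B": 0.250, "C": 0.167, "D": 0.125, "E": 0.083}

def huffman_code_lengths(probs):
    # Heap entries: (probability, tiebreaker, {symbol: code length}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol in
        # either subtree gains one more bit in its code.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: n + 1 for s, n in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
# Ties in the construction can yield lengths {2, 2, 2, 3, 3} instead
# of the slide's {1, 2, 3, 4, 4}; both codes are optimal on average.
print(round(avg_bits, 3))  # 2.208 bits per symbol
print(round(entropy, 3))   # 2.135 bits: the entropy lower bound
```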
Critique of Shannon's communication theory?
• The entropy formula as a measure of information is arbitrary
• Shannon's theory measures quantities of information, but it does not consider information content
• In Shannon's theory, the semantic aspects of information are irrelevant to the engineering problem
Human information processing
• Hick's Law: the time it takes a person to make a decision grows logarithmically with the number of equally likely alternatives, an application of uncertainty-based information to human reaction times
Other forms of uncertainty
• Vagueness or fuzziness
• Simultaneously being "true" and "false"
• Fuzzy logic and fuzzy set theory
From crisp to fuzzy sets
• Fuzziness: being and not being
• The laws of contradiction and excluded middle are broken:

A ∩ Ā ≠ ∅
A ∪ Ā ≠ X

[Figure: membership of "Tall People" in the set of all people, as a crisp set vs. a fuzzy set, with membership values from 0 to 1.]
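A tiny sketch of a fuzzy set: the "tall" membership function below is a made-up example, using the standard min/max fuzzy operators:

```python
# Membership in "tall" is a degree in [0, 1] rather than true/false.
def tall(height_cm: float) -> float:
    # Hypothetical membership: 0 below 160 cm, 1 above 190 cm,
    # linear in between.
    return min(1.0, max(0.0, (height_cm - 160) / 30))

a = tall(175.0)       # 0.5: somewhat tall
not_a = 1.0 - a       # fuzzy complement
print(min(a, not_a))  # 0.5 != 0: A ∩ Ā is not empty (contradiction broken)
print(max(a, not_a))  # 0.5 != 1: A ∪ Ā is not the whole universe
```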
Next week’s discussion
Ratkiewicz, Nathan: Helbing, Dirk, Dirk Brockmann, Thomas Chadefaux, Karsten Donnay, Ulf Blanke, Olivia Woolley-Meza, Mehdi Moussaid, et al. [2015]. "How to Save Human Lives with Complexity Science". Journal of Statistical Physics 158 (3): 735-781.

Whitley, Derek: Yann LeCun, Yoshua Bengio and Geoffrey Hinton [2015]. "Deep Learning". Nature 521, 436-444 (28 May 2015). doi:10.1038/nature14539