A brief introduction to probability theory

Elise Arnaud
perception - inria Rhône-Alpes
655, avenue de l’Europe 38330 Montbonnot, France
pop tutorial, nov. 2006, Coimbra
Elise Arnaud ([email protected])
Introduction
1 / 13
Introduction
When you submit a paper to a conference, there is some uncertainty
about its acceptance.
When an uncertain event is quantified, one is dealing with
probabilities.
A set of possible outcomes is called an event;
a compound event can be decomposed into elementary events.
The space of all possible elementary events is called the sample
space or event space
Introduction
Relationships among events can be expressed in terms of set theory.
Axioms of probability
0 ≤ P(A) ≤ 1
P(S) = 1 (S is the sample space, the set of all possible elementary events)
P(Ā) = 1 − P(A)
P(A) ≤ P(B) if A ⊂ B
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
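These axioms can be checked on a toy finite sample space. A minimal sketch, assuming a fair six-sided die; the die, the events A and B, and the helper P are illustrative and not from the slides:

```python
from fractions import Fraction

# Hypothetical sample space: a fair six-sided die with a uniform measure.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of S) under the uniform measure."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # "even outcome"
B = {4, 5, 6}   # "outcome at least 4"

assert 0 <= P(A) <= 1
assert P(S) == 1
assert P(S - A) == 1 - P(A)                   # complement rule
assert P({4, 6}) <= P(A)                      # {4, 6} ⊂ A
assert P(A | B) == P(A) + P(B) - P(A & B)     # inclusion-exclusion
```

Using `Fraction` keeps the probabilities exact, so the axioms hold as equalities rather than up to floating-point error.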
Notion of joint events
If both A and B occur, we may be interested in the probability of
their intersection:
P(A ∩ B) = P(A, B)
Conditional probabilities
The conditional probability of event A given event B denotes the
probability of event A in the presence of event B: P(A|B)
We have:
P(A|B) = P(A, B) / P(B)
P(A, B) = P(A|B) P(B) = P(B|A) P(A)
Remark: P(A, B) is symmetric, P(A|B) is not
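The definition above can be exercised on a small finite example. A sketch assuming a fair six-sided die (the events and helper are illustrative):

```python
from fractions import Fraction

# Hypothetical fair six-sided die (illustrative, not from the slides).
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # even outcome
B = {4, 5, 6}   # outcome at least 4

# Conditional probability from the definition P(A|B) = P(A, B) / P(B).
P_A_given_B = P(A & B) / P(B)
print(P_A_given_B)   # 2/3: two of the three outcomes in B are even

# The product rule holds in both directions.
assert P(A & B) == P_A_given_B * P(B)
assert P(A & B) == (P(A & B) / P(A)) * P(A)
```

Note the asymmetry the remark points out: here P(A|B) = 2/3 while P(B|A) = P(A ∩ B)/P(A) = 2/3 happen to coincide only because P(A) = P(B); the joint P(A, B) itself is always symmetric.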
Independent events
Two events are said to be independent if their joint probability is
equal to the product of their individual probabilities.
If A and B are independent, then:
P(A, B) = P(A) P(B)
P(A|B) = P(A)
If A and B are conditionally independent given C, then:
P(A, B|C) = P(A|C) P(B|C)
P(A|B, C) = P(A|C)
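Independence can be verified numerically on a product experiment. A minimal sketch, assuming two fair coin flips (the experiment and events are illustrative):

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two independent fair coin flips.
S = set(product("HT", repeat=2))   # sample space {('H','H'), ('H','T'), ...}

def P(event):
    return Fraction(len(event & S), len(S))

A = {s for s in S if s[0] == "H"}   # first flip is heads
B = {s for s in S if s[1] == "H"}   # second flip is heads

# Independence: the joint probability factorises.
assert P(A & B) == P(A) * P(B)
# Equivalently, conditioning on B does not change the probability of A.
assert P(A & B) / P(B) == P(A)
```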
Bayes’ theorem
P(A|B) = P(A, B) / P(B) = P(B|A) P(A) / P(B)
Additional remark:
P(B) = P(A, B) + P(Ā, B) = P(B|A) P(A) + P(B|Ā) P(Ā)
so that:
P(A|B) = P(B|A) P(A) / ( P(B|A) P(A) + P(B|Ā) P(Ā) )
An example
A person in a country is given a tuberculosis skin test. Given the result of
the test, what is the probability that the person has tuberculosis?
Available information:
1. P(positive test | tuberculosis) = 0.98
2. P(positive test | no tuberculosis) = 0.05
3. P(tuberculosis) = 0.01
By applying Bayes' theorem we obtain:
P(tuberculosis | positive test) ≈ 0.165
(event A: tuberculosis, event B: positive test)
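The number above can be reproduced directly from the formula; a short sketch with the quantities from the slide (variable names are illustrative):

```python
# Given quantities from the slide.
p_pos_given_tb = 0.98   # P(positive test | tuberculosis)
p_pos_given_no = 0.05   # P(positive test | no tuberculosis)
p_tb = 0.01             # P(tuberculosis)

# Normalising factor P(positive test), by the law of total probability.
p_pos = p_pos_given_tb * p_tb + p_pos_given_no * (1 - p_tb)

# Bayes' theorem: P(tuberculosis | positive test).
p_tb_given_pos = p_pos_given_tb * p_tb / p_pos
print(round(p_tb_given_pos, 3))   # 0.165
```

Despite the test being accurate, the posterior is only about 17% because the disease is rare: most positive tests come from the large healthy population.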
Random variable and probability distribution
A random variable is a function that maps outcomes of random
experiments to numbers.
discrete / continuous
If a random variable is discrete, the set of values it can take with
nonzero probability is finite or countably infinite
Every random variable gives rise to a probability distribution
If x is a random variable, the corresponding probability distribution
assigns to the interval [a, b] the probability P(a ≤ x ≤ b), i.e. the
probability that the variable x will take a value in the interval [a, b].
Random variable and probability distribution
discrete random variable x → discrete probability distribution P(x)
The binomial distribution describes the number of successes in a
series of independent Yes/No experiments.
The Poisson distribution describes the number of events occurring in
a fixed time interval, when many individually unlikely events can occur.
continuous random variable x → probability density p(x)
The exponential distribution
The Gaussian distribution
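As an example of a discrete distribution, the binomial can be written down directly from its definition. A minimal sketch (the parameters n = 10 and p = 0.3 are illustrative):

```python
import math

# Binomial distribution: number of successes in n independent Yes/No
# trials, each succeeding with probability p.
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# A valid discrete distribution sums to 1 over all values.
assert abs(sum(pmf) - 1.0) < 1e-12

# P(a <= x <= b): probability that x falls in an interval, e.g. [2, 4].
p_2_to_4 = sum(pmf[2:5])
```

For a continuous variable the same interval probability P(a ≤ x ≤ b) would be an integral of the density p(x) over [a, b] rather than a sum.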
To keep in mind
Sum rule:
p(x) = Σ_y p(x, y)
Product rule:
p(x, y) = p(x|y) p(y)
Bayes' rule:
p(x|y) = p(y|x) p(x) / p(y)
Normalisation:
p(y) = Σ_x p(y|x) p(x)
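All four rules can be checked on one small joint table. A sketch assuming a hypothetical joint distribution over two binary variables (the numbers are illustrative and sum to 1):

```python
# Hypothetical joint distribution p(x, y) over two binary variables.
p_xy = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Sum rule: marginalise one variable out.
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

# Product rule: the joint is a conditional times a marginal.
p_x_given_y = {(x, y): p_xy[(x, y)] / p_y[y] for (x, y) in p_xy}
assert all(abs(p_xy[(x, y)] - p_x_given_y[(x, y)] * p_y[y]) < 1e-12
           for (x, y) in p_xy)

# Bayes' rule: p(x|y) = p(y|x) p(x) / p(y).
p_y_given_x = {(x, y): p_xy[(x, y)] / p_x[x] for (x, y) in p_xy}
for (x, y) in p_xy:
    assert abs(p_x_given_y[(x, y)]
               - p_y_given_x[(x, y)] * p_x[x] / p_y[y]) < 1e-12
```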
To keep in mind
x: “hidden variable”, i.e. what we want to estimate
y: observations, measurements, data
p(x|y) = p(y|x) p(x) / p(y)
p(x|y) is the posterior probability, a function of y
p(x) is the prior or marginal probability of x; prior in the sense that
it does not take into account the data.
p(y) is a normalising factor
p(y|x) is the likelihood of the observation y given x
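The vocabulary above maps directly onto code. A sketch of a hypothetical estimation problem: a hidden variable x with three possible values, one observation y; the prior and likelihood numbers are illustrative:

```python
# Hypothetical discrete estimation problem (values are illustrative).
prior = {"a": 0.5, "b": 0.3, "c": 0.2}        # p(x), before seeing data
likelihood = {"a": 0.1, "b": 0.7, "c": 0.4}   # p(y|x) for the observed y

# Normalising factor p(y) = Σ_x p(y|x) p(x).
p_y = sum(likelihood[x] * prior[x] for x in prior)

# Posterior p(x|y): a distribution over the hidden variable, given y.
posterior = {x: likelihood[x] * prior[x] / p_y for x in prior}
assert abs(sum(posterior.values()) - 1.0) < 1e-12

# The observation shifts belief from the prior mode "a" to "b".
print(max(posterior, key=posterior.get))   # b
```

This is the pattern behind Bayesian estimation generally: the data enters only through the likelihood, and p(y) merely rescales the products so the posterior sums to one.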