Review of Probability
Axioms of Probability Theory
Pr(A) denotes the probability that proposition A is true. (A is also called an event, or a random variable.)
1. 0 ≤ Pr(A) ≤ 1
2. Pr(True) = 1, Pr(False) = 0
3. Pr(A ∨ B) = Pr(A) + Pr(B) − Pr(A ∧ B)
A Closer Look at Axiom 3
Pr(A ∨ B) = Pr(A) + Pr(B) − Pr(A ∧ B)
(Venn diagram: events A and B inside the universe True, overlapping in A ∧ B.)
Using the Axioms to prove new properties
  Pr(A ∨ ¬A) = Pr(A) + Pr(¬A) − Pr(A ∧ ¬A)
  Pr(True)   = Pr(A) + Pr(¬A) − Pr(False)
  1          = Pr(A) + Pr(¬A) − 0
  Pr(¬A)     = 1 − Pr(A)
We proved this.
Probability of Events
• Sample space and events
  – Sample space S (e.g., all people in an area)
  – Events E1 ⊆ S (e.g., all people having a cough)
           E2 ⊆ S (e.g., all people having a cold)
• Prior (marginal) probabilities of events
  – P(E) = |E| / |S| (frequency interpretation)
  – P(E) = 0.1 (subjective probability)
  – 0 ≤ P(E) ≤ 1 for all events
  – Two special events, ∅ and S: P(∅) = 0 and P(S) = 1.0
• Boolean operators between events (to form compound events)
  – Conjunctive (intersection): E1 ∧ E2 (E1 ∩ E2)
  – Disjunctive (union): E1 ∨ E2 (E1 ∪ E2)
  – Negation (complement): ~E (E^C = S − E)
• Probabilities of compound events
  – P(~E) = 1 − P(E), because P(~E) + P(E) = 1
  – P(E1 ∨ E2) = P(E1) + P(E2) − P(E1 ∧ E2)
  – But how to compute the joint probability P(E1 ∧ E2)?
(Venn diagrams: E and ~E; E1 and E2 overlapping in E1 ∧ E2.)
• Conditional probability (of E1, given E2)
  – How likely E1 occurs in the subspace of E2 (see the sketch below)
  P(E1 | E2) = |E1 ∩ E2| / |E2| = (|E1 ∩ E2| / |S|) / (|E2| / |S|) = P(E1 ∧ E2) / P(E2)
  P(E1 ∧ E2) = P(E1 | E2) P(E2)
Using Venn diagrams and decision trees is very useful in proofs and reasoning.
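As a quick illustration of the counting definition above, here is a minimal sketch (plain Python; the sample space and the two events are made up for illustration) that computes P(E1 | E2) from set sizes and checks it against P(E1 ∧ E2) / P(E2):

```python
# Hypothetical sample space S of 10 people, identified by index.
S = set(range(10))
E1 = {0, 1, 2, 3}        # people having a cough (assumed data)
E2 = {2, 3, 4, 5, 6}     # people having a cold (assumed data)

p = lambda E: len(E) / len(S)                  # P(E) = |E| / |S|
p_joint = p(E1 & E2)                           # P(E1 ∧ E2)
p_cond = len(E1 & E2) / len(E2)                # |E1 ∩ E2| / |E2|

assert abs(p_cond - p_joint / p(E2)) < 1e-12   # P(E1 | E2) = P(E1 ∧ E2) / P(E2)
print(p_cond)                                  # 0.4
```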
Independence, Mutual Exclusion and Exhaustive Sets of Events
• Independence assumption
  – Two events E1 and E2 are said to be independent of each other if
    P(E1 | E2) = P(E1) (given E2 does not change the likelihood of E1)
  – It can simplify the computation (a numeric check follows this list):
    P(E1 ∧ E2) = P(E1 | E2) P(E2) = P(E1) P(E2)
    P(E1 ∨ E2) = P(E1) + P(E2) − P(E1 ∧ E2)
               = P(E1) + P(E2) − P(E1) P(E2)
               = 1 − (1 − P(E1))(1 − P(E2))
• Mutually exclusive (ME) and exhaustive (EXH) set of events
  – ME:  Ei ∩ Ej = ∅  (P(Ei ∧ Ej) = 0)  for all i, j = 1, .., n, i ≠ j
  – EXH: E1 ∪ ... ∪ En = S  (P(E1 ∨ ... ∨ En) = 1)
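A small Monte Carlo sketch of the independence simplifications above (the two marginal probabilities are arbitrary assumptions, and the events are drawn independently by construction):

```python
import random

random.seed(0)
N = 200_000
p1, p2 = 0.3, 0.6                      # assumed marginal probabilities
both = either = 0
for _ in range(N):
    e1 = random.random() < p1          # E1 occurs with probability p1
    e2 = random.random() < p2          # E2, drawn independently of E1
    both += e1 and e2
    either += e1 or e2

print(both / N, p1 * p2)                        # ≈ P(E1) P(E2) = 0.18
print(either / N, 1 - (1 - p1) * (1 - p2))      # ≈ 1 − (1 − P(E1))(1 − P(E2)) = 0.72
```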
Random Variables
Discrete Random Variables
• X denotes a random variable.
• X can take on a finite number of values in the set {x1, x2, …, xn}.
• P(X = xi), or P(xi), is the probability that the random variable X takes on the value xi.
• P(·) is called the probability mass function.
• E.g. P(Room) = <0.7, 0.2, 0.08, 0.02>
  These are the probabilities of the four possible values of X; they must sum to 1.0.
Discrete Random Variables
• Finite set of possible outcomes: X ∈ {x1, x2, x3, ..., xn}
  P(xi) ≥ 0
  Σ_{i=1..n} P(xi) = 1
• X binary: P(x) + P(~x) = 1
(Figure: bar chart of an example probability mass function over outcomes X1–X4.)
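The P(Room) example from the previous slide can be written as a small table and checked against these two constraints; a sketch (the probabilities are taken from the slide, the room labels are invented):

```python
# PMF for the slide's P(Room) = <0.7, 0.2, 0.08, 0.02>; room names are hypothetical.
P_room = {"office": 0.70, "kitchen": 0.20, "bathroom": 0.08, "hall": 0.02}

assert all(p >= 0 for p in P_room.values())        # P(xi) >= 0
assert abs(sum(P_room.values()) - 1.0) < 1e-12     # probabilities sum to 1

print(P_room["office"])   # P(X = office) = 0.7
```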
Continuous Random Variable
• Probability distribution (density function) over continuous values
  X ∈ [0, 10]
  P(x) ≥ 0
  ∫_0^10 P(x) dx = 1
  P(5 ≤ x ≤ 7) = ∫_5^7 P(x) dx
(Figure: a density P(x) over x ∈ [0, 10], with the interval from 5 to 7 marked.)
Continuous Random Variables
• X takes on values in the continuum.
• p(X = x), or p(x), is a probability density function (PDF).
  Pr(x ∈ [a, b]) = ∫_a^b p(x) dx
• E.g. (figure: a density curve p(x) over x)
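A numeric check of the integral definition, using the uniform density on [0, 10] from the previous slide (a sketch, not from the original; the midpoint-rule integrator is an illustrative helper):

```python
def p(x):
    """Uniform density on [0, 10]: p(x) = 0.1 there, 0 elsewhere."""
    return 0.1 if 0.0 <= x <= 10.0 else 0.0

def integrate(f, a, b, n=10_000):
    """Simple midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(p, 0, 10))   # ≈ 1.0  (total probability)
print(integrate(p, 5, 7))    # ≈ 0.2  (Pr(5 <= x <= 7))
```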
Probability Distribution
• Probability distribution P(X | ξ)
  – X is a random variable
    • Discrete
    • Continuous
  – ξ is the background state of information
Joint and Conditional Probabilities
• Joint
  P(x, y) = P(X = x ∧ Y = y)
  – Probability that both X = x and Y = y
• Conditional
  P(x | y) = P(X = x | Y = y)
  – Probability that X = x given we know that Y = y
Joint and Conditional Probability
• P(X = x and Y = y) = P(x, y)
• If X and Y are independent then
  P(x, y) = P(x) P(y)
• P(x | y) is the probability of x given y
  P(x | y) = P(x, y) / P(y)
  P(x, y) = P(x | y) P(y)
• If X and Y are independent then
  P(x | y) = P(x)
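A sketch with a made-up 2×2 joint table for binary X and Y, showing the conditional definition and the product rule from this slide:

```python
# Hypothetical joint distribution P(X, Y) for binary variables.
P_xy = {("x", "y"): 0.30, ("x", "~y"): 0.20,
        ("~x", "y"): 0.10, ("~x", "~y"): 0.40}

P_y = P_xy[("x", "y")] + P_xy[("~x", "y")]       # marginal P(Y = y)
P_x_given_y = P_xy[("x", "y")] / P_y             # P(x | y) = P(x, y) / P(y)

# Product rule: P(x, y) = P(x | y) P(y)
assert abs(P_x_given_y * P_y - P_xy[("x", "y")]) < 1e-12
print(P_x_given_y)   # 0.75
```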
Law of Total Probability
Discrete case:
  Σ_x P(x) = 1
  P(x) = Σ_y P(x, y)
  P(x) = Σ_y P(x | y) P(y)
Continuous case:
  ∫ p(x) dx = 1
  p(x) = ∫ p(x, y) dy
  p(x) = ∫ p(x | y) p(y) dy
Rules of Probability: Marginalization
• Product Rule
  P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
• Marginalization
  P(Y) = Σ_{i=1..n} P(Y, xi)
  X binary: P(Y) = P(Y, x) + P(Y, ~x)
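In the same spirit as the earlier sketches, marginalization over a discrete variable is just a sum over the joint table (the table values below are invented):

```python
# Hypothetical joint P(Y, X), with X taking three values.
P_yx = {("rain", "x1"): 0.10, ("rain", "x2"): 0.15, ("rain", "x3"): 0.05,
        ("dry",  "x1"): 0.20, ("dry",  "x2"): 0.25, ("dry",  "x3"): 0.25}

def marginal_Y(y):
    # P(Y) = sum_i P(Y, x_i)
    return sum(p for (yy, _), p in P_yx.items() if yy == y)

print(marginal_Y("rain"))   # 0.30
print(marginal_Y("dry"))    # 0.70
```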
Gaussian, Mean and Variance
Gaussian (normal) distributions N(m, s):
  P(x) = (1 / (√(2π) s)) exp( −(x − m)² / (2s²) )
(Figure: Gaussian curves N(m, s) with different mean and different variance.)
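The density above can be coded directly; a minimal sketch (the mean m and standard deviation s are arbitrary choices) evaluating N(m, s) at a couple of points:

```python
import math

def gaussian_pdf(x, m, s):
    """N(m, s): (1 / (sqrt(2*pi) * s)) * exp(-(x - m)^2 / (2 * s^2))."""
    return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)

m, s = 0.0, 1.0                     # assumed mean and standard deviation
print(gaussian_pdf(0.0, m, s))      # ≈ 0.3989, the peak of the standard normal
print(gaussian_pdf(2.0, m, s))      # ≈ 0.0540
```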
Gaussian networks
Each variable is a linear function of its parents, with Gaussian noise.
Joint probability density functions:
(Figure: example networks over variables X and Y with their joint densities.)
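A minimal sketch of the idea for two variables (the coefficients and noise levels are made up): X is Gaussian, and Y is a linear function of its parent X plus Gaussian noise, so the pair (X, Y) is jointly Gaussian.

```python
import random

random.seed(1)

def sample_xy(n=50_000, a=2.0, b=1.0):
    """Sample X ~ N(0, 1) and Y = a*X + b + N(0, 0.5); all parameters are assumptions."""
    pairs = []
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        y = a * x + b + random.gauss(0.0, 0.5)
        pairs.append((x, y))
    return pairs

pairs = sample_xy()
mean_y = sum(y for _, y in pairs) / len(pairs)
print(mean_y)   # ≈ b = 1.0, since E[Y] = a*E[X] + b
```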
Reverend Thomas Bayes
(1702-1761)
Clergyman and mathematician who first used probability inductively. His work established a mathematical basis for probabilistic inference.
Bayes Rule
  P(H, E) = P(H | E) P(E) = P(E | H) P(H)
  P(H | E) = P(E | H) P(H) / P(E)
Example: E = smoke, H = cancer
All people = 1000
  100 people smoke; 1000 − 100 = 900 people do not smoke
  40 people have cancer; 1000 − 40 = 960 people do not have cancer
  10 people smoke and have cancer
(Venn diagram: the smokers and the cancer patients overlap in the 10 people who do both.)
  P(smoke) = 100/1000
  P(cancer) = 40/1000
  P(smoke | cancer) = 10/40 = 25% (probability that you smoke if you have cancer)
  P(cancer | smoke) = P(smoke | cancer) · P(cancer) / P(smoke)
                    = (10/40) · (40/1000) / (100/1000) = 10/100 = 10%
  (probability that you have cancer if you smoke)
E = smoke, H = cancer
  P(cancer | not smoke) = P(not smoke | cancer) · P(cancer) / P(not smoke)
                        = (30/40) · (40/1000) / (900/1000) = 30/900 ≈ 3.3%
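The smoking/cancer numbers above can be checked mechanically; a short sketch (plain Python, using only the counts given on the slide) that applies Bayes' rule:

```python
# Counts from the slide: 1000 people, 100 smoke, 40 have cancer, 10 do both.
N, n_smoke, n_cancer, n_both = 1000, 100, 40, 10

P_smoke = n_smoke / N                       # 0.10
P_cancer = n_cancer / N                     # 0.04
P_smoke_given_cancer = n_both / n_cancer    # 0.25

# Bayes' rule: P(cancer | smoke) = P(smoke | cancer) * P(cancer) / P(smoke)
P_cancer_given_smoke = P_smoke_given_cancer * P_cancer / P_smoke
print(P_cancer_given_smoke)                 # 0.10, i.e. 10/100

# P(cancer | not smoke) = P(not smoke | cancer) * P(cancer) / P(not smoke)
P_cancer_given_not_smoke = (30 / 40) * P_cancer / (900 / N)
print(P_cancer_given_not_smoke)             # ≈ 0.0333, i.e. 30/900
```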
Bayes’ Theorem with relative likelihood
• In the setting of diagnostic/evidential reasoning:
  – Hypotheses H1, ..., Hi, ..., Hn with prior probabilities P(Hi)
  – Evidence/manifestations E1, ..., Ej, ..., Em with conditional probabilities P(Ej | Hi)
  – Know the prior probability P(Hi) and the conditional probability P(Ej | Hi)
  – Want to compute the posterior probability P(Hi | Ej)
• Bayes’ theorem (formula 1): P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
• If the purpose is to find which of the n hypotheses H1, ..., Hn is the most plausible given Ej, then we can ignore the denominator and rank them by relative likelihood:
  rel(Hi | Ej) = P(Ej | Hi) P(Hi)
Relative likelihood
• P(Ej) can be computed from P(Ej | Hi) and P(Hi) if we assume all hypotheses H1, ..., Hn are ME and EXH:
  P(Ej) = P(Ej ∧ (H1 ∨ ... ∨ Hn))       (by EXH)
        = Σ_{i=1..n} P(Ej ∧ Hi)          (by ME)
        = Σ_{i=1..n} P(Ej | Hi) P(Hi)
• Then we have another version of Bayes’ theorem:
  P(Hi | Ej) = P(Ej | Hi) P(Hi) / Σ_{k=1..n} P(Ej | Hk) P(Hk)
             = rel(Hi | Ej) / Σ_{k=1..n} rel(Hk | Ej)
  where Σ_{k=1..n} P(Ej | Hk) P(Hk), the sum of the relative likelihoods of all n hypotheses, is a normalization factor.
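A sketch with invented numbers for two ME & EXH hypotheses, showing that normalizing the relative likelihoods reproduces the posterior from this version of Bayes’ theorem:

```python
# Hypothetical priors and likelihoods for hypotheses H1, H2 (ME & EXH) and one evidence E.
priors = {"H1": 0.7, "H2": 0.3}
likelihood_E = {"H1": 0.1, "H2": 0.6}      # P(E | Hi), assumed values

rel = {h: likelihood_E[h] * priors[h] for h in priors}   # rel(Hi | E) = P(E | Hi) P(Hi)
norm = sum(rel.values())                                  # = P(E) under ME & EXH

posterior = {h: rel[h] / norm for h in rel}
print(posterior)   # {'H1': 0.28, 'H2': 0.72}
```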
Naïve Bayesian Approach
• Knowledge base:
  – E1, ..., Em: evidence/manifestations
  – H1, ..., Hn: hypotheses/disorders
  – Ej and Hi are binary, and the hypotheses form a ME & EXH set
  – Conditional probabilities P(Ej | Hi), i = 1, ..., n, j = 1, ..., m
• Case input: E1, ..., El
• Find the hypothesis Hi with the highest posterior probability P(Hi | E1, ..., El)
• By Bayes’ theorem:
  P(Hi | E1, ..., El) = P(E1, ..., El | Hi) P(Hi) / P(E1, ..., El)
• Assume all pieces of evidence are conditionally independent, given any hypothesis:
  P(E1, ..., El | Hi) = Π_{j=1..l} P(Ej | Hi)
Absolute Posterior Probability
• The relative likelihood:
  rel(Hi | E1, ..., El) = P(E1, ..., El | Hi) P(Hi) = P(Hi) Π_{j=1..l} P(Ej | Hi)
• The absolute posterior probability:
  P(Hi | E1, ..., El) = rel(Hi | E1, ..., El) / Σ_{k=1..n} rel(Hk | E1, ..., El)
                      = P(Hi) Π_{j=1..l} P(Ej | Hi) / Σ_{k=1..n} [ P(Hk) Π_{j=1..l} P(Ej | Hk) ]
• Evidence accumulation (when new evidence is discovered):
  rel(Hi | E1, ..., El, E_{l+1}) = P(E_{l+1} | Hi) · rel(Hi | E1, ..., El)
  rel(Hi | E1, ..., El, ~E_{l+1}) = (1 − P(E_{l+1} | Hi)) · rel(Hi | E1, ..., El)
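A compact sketch of the whole procedure (the priors and conditional probabilities below are invented): relative likelihoods are built up one piece of evidence at a time and normalized only when an absolute posterior is needed.

```python
# Hypothetical knowledge base: two disorders, three binary manifestations.
priors = {"H1": 0.6, "H2": 0.4}
cond = {                                # P(Ej | Hi), assumed values
    "H1": {"E1": 0.8, "E2": 0.1, "E3": 0.5},
    "H2": {"E1": 0.3, "E2": 0.7, "E3": 0.5},
}

def accumulate(rel, evidence, present):
    """Update rel(Hi | ...) with one new finding E_{l+1}, observed present or absent."""
    return {h: rel[h] * (cond[h][evidence] if present else 1 - cond[h][evidence])
            for h in rel}

rel = dict(priors)                      # start from rel(Hi) = P(Hi)
rel = accumulate(rel, "E1", True)       # E1 observed
rel = accumulate(rel, "E2", False)      # E2 observed to be absent

norm = sum(rel.values())
posterior = {h: rel[h] / norm for h in rel}
print(posterior)                        # H1 ≈ 0.92, H2 ≈ 0.08
```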
Bayesian Networks and Markov Models – applications in robotics
• Bayesian AI
• Bayesian Filters
• Kalman Filters
• Particle Filters
• Bayesian networks
• Decision networks
• Reasoning about changes over time
  – Dynamic Bayesian Networks
  – Markov models