
MAC 425/5739
ARTIFICIAL INTELLIGENCE
Probability and Time
Denis D. Mauá
IME 2016

Bayesian Network

Consists of:
- A DAG G over a set of variables X = {X1, . . . , Xn}
- Probability constraints: P(Xi = k | Pa(Xi) = j) = θijk
- d-separation semantics of the DAG (implies the Markov assumption and factorization)

Joint distribution: there is a single consistent probability function:

P(X) = P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi)) = ∏_{i=1}^{n} θijk (with j, k determined by the assignment)
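
To make the factorization concrete, here is a minimal sketch (a hypothetical two-variable network A → B with made-up CPT entries, not the survey model) that evaluates the joint as a product of CPT entries:

```python
# Minimal sketch: the joint of a Bayesian network as a product of CPT
# entries. Network A -> B; all numbers are hypothetical.
cpts = {
    "A": {(): {0: 0.6, 1: 0.4}},    # P(A); A has no parents
    "B": {(0,): {0: 0.9, 1: 0.1},   # P(B | A=0)
          (1,): {0: 0.3, 1: 0.7}},  # P(B | A=1)
}
parents = {"A": (), "B": ("A",)}

def joint(assignment):
    """P(X1=x1, ..., Xn=xn) = prod_i P(Xi=xi | Pa(Xi))."""
    p = 1.0
    for var, table in cpts.items():
        pa = tuple(assignment[u] for u in parents[var])
        p *= table[pa][assignment[var]]
    return p

print(joint({"A": 1, "B": 0}))  # 0.4 * 0.3 = 0.12
```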

Means of Transport Survey

- Age: young, adult, old
- Gender: male, female
- Education: primary, high school, university
- Occupation: employee, self-employed
- City size: small, big
- Transport: private (car), public (bus, train, etc.)

[Figure: DAG over the survey variables Age, Gender, Education, Occupation, City size, Transport]

- A new survey is conducted every week over a couple of months
- Time-parameterized variables: At, Gt, Et, Ot, Ct, Tt
- The probabilistic model comprises all variables: P(A0, G0, . . . , TK)
- Memoryless dynamics:
  - The system evolves based only on the current state (irrespective of the trajectory)
  - Time Markov property: P(Xt | X0, . . . , Xt−1) = P(Xt | Xt−1)
  - Stationarity: the transition probability distribution is constant through time

Bayesian network for 3 surveys

[Figure: the survey DAG replicated over three time slices, with variables At, Gt, Et, Ot, Ct, Tt for t = 0, 1, 2, and temporal arcs between consecutive slices]

P(A0, . . . , T2) = P(A0, . . . , T0) ∏_{t=1}^{2} P(At, . . . , Tt | At−1, . . . , Tt−1)

Two Time-Slice Bayesian Network

Represents a temporal Bayesian network for an undefined horizon K.

[Figure: two consecutive slices (t = 0, 1) of the survey DAG]

Stationarity: P(At, . . . , Tt | At−1, . . . , Tt−1) does not depend on t

Dynamic Bayesian Network

Consists of:
- A two time-slice Bayesian network:
  - Slice t: Xt = {X1t, . . . , Xnt}
  - Pa(Xi0) ⊂ X0 = {X10, . . . , Xn0}
  - Pa(Xi1) ⊂ X0 ∪ X1 = {X10, . . . , Xn1}
- Memorylessness and stationarity assumptions

Joint distribution for a given horizon K: there is a unique probability function consistent with the DBN:

P(X0:K) = P(X10, . . . , XnK) = ∏_{t=0}^{K} ∏_{i=1}^{n} P(Xit | Pa(Xit))
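
As a minimal sketch of what unrolling means (a hypothetical two-state chain with a single variable per slice; all numbers made up): the probability of a trajectory is the slice-0 prior times repeated applications of one and the same transition CPT, which is exactly where stationarity enters:

```python
# Sketch: unrolled DBN with one state variable per slice.
# prior = P(X0); trans[j][k] = P(X_t = k | X_{t-1} = j), the same
# table for every t (stationarity).
prior = [0.5, 0.5]
trans = [[0.7, 0.3],
         [0.3, 0.7]]

def trajectory_prob(xs):
    """P(X0 = xs[0], ..., XK = xs[K]) for the unrolled DBN."""
    p = prior[xs[0]]
    for prev, cur in zip(xs, xs[1:]):
        p *= trans[prev][cur]
    return p

print(trajectory_prob([0, 0, 1]))  # 0.5 * 0.7 * 0.3 = 0.105
```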

Inference tasks

- Sequence of observations: e1:t, where each et is an assignment to a subset of Xt
- Filtering: P(Xt | e1:t) – track the belief state; input to the decision process of a rational agent
- Prediction: P(Xt+k | e1:t) for k > 0 – evaluation of possible action sequences; like filtering without part of the evidence (see the sketch after this list)
- Smoothing: P(Xk | e1:t) for 0 ≤ k < t – a better estimate of past states, essential for learning
- Most likely explanation: arg max_{x1:t} P(x1:t | e1:t) – speech recognition, decoding with a noisy channel
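
Since prediction is filtering without the evidence step, for a finite state space it reduces to repeatedly applying the transition model. A sketch under that assumption (T is a hypothetical transition matrix with T[j, k] = P(Xt+1 = k | Xt = j); the names are mine, not from the slides):

```python
import numpy as np

# k-step prediction: P(X_{t+k} | e_{1:t}) obtained from the filtered
# belief f by applying the transition model k times, with no evidence.
def predict(f, T, k):
    for _ in range(k):
        f = T.T @ f  # sum out the current state: f'[x'] = sum_x T[x, x'] f[x]
    return f

T = np.array([[0.7, 0.3], [0.3, 0.7]])  # hypothetical transition matrix
print(predict(np.array([0.818, 0.182]), T, 2))  # belief drifts toward [0.5, 0.5]
```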

Filtering

Aim: devise a recursive state estimation algorithm:

P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))

Simplifying assumption: et assigns values to variables Xit such that Pa(Xit) ⊆ Xt

P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)
                 ∝ P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t)
                 = P(et+1 | Xt+1) P(Xt+1 | e1:t)

i.e., prediction + estimation. Prediction by summing out Xt:

P(Xt+1 | e1:t+1) ∝ P(et+1 | Xt+1) Σ_{xt} P(Xt+1 | xt, e1:t) P(xt | e1:t)
                 = P(et+1 | Xt+1) Σ_{xt} P(Xt+1 | xt) P(xt | e1:t)
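
For a finite state space, this update can be read as one predict-then-correct step in matrix form. A sketch, assuming a single hidden state variable, a transition matrix T with T[j, k] = P(Xt+1 = k | Xt = j), and a likelihood vector like_e with like_e[k] = P(et+1 | Xt+1 = k) (all names are mine, not from the slides):

```python
import numpy as np

# One filtering update: f_{t+1} ∝ P(e_{t+1} | X_{t+1}) * Σ_x P(X_{t+1} | x) f_t(x)
def filter_step(f_t, T, like_e):
    predicted = T.T @ f_t         # prediction: sum out X_t
    unnorm = like_e * predicted   # estimation: weight by the evidence likelihood
    return unnorm / unnorm.sum()  # normalize to get a distribution
```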

Filtering: Forward Algorithm

For t = 1, 2, . . . compute:

f1:t+1 = Forward(et+1, f1:t)

where f1:t = P(Xt | e1:t)

Note: f1:1 = P(X1 | e1) ∝ P(e1 | X1) Σ_{x0} P(X1 | x0) P(x0)

Time and space per update are constant (independent of t)
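
A runnable sketch of the full loop, using the same predict-then-correct step as above (same assumed encoding; O is a sensor matrix with O[k, e] = P(Et = e | Xt = k)):

```python
import numpy as np

# Forward algorithm: consume the evidence sequence e_1, ..., e_t and
# return f_{1:t} = P(X_t | e_{1:t}). Constant space: only f is stored.
def forward(prior, T, O, evidence):
    f = np.asarray(prior, dtype=float)
    for e in evidence:
        f = O[:, e] * (T.T @ f)  # predict, then weight by P(e | X)
        f /= f.sum()             # normalize
    return f
```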

Example

- A prisoner in a cell without windows wants to know whether it is raining outside
- On each day, the prisoner observes whether the warden is carrying an umbrella
- Before the first observation, the prisoner is indifferent as to whether it rained or not on that day

[Figure: umbrella DBN, Raint−1 → Raint → Raint+1, with Raint → Umbrellat in every slice]

Transition model P(Rt = true | Rt−1):
  Rt−1 = t: 0.7
  Rt−1 = f: 0.3

Sensor model P(Ut = true | Rt):
  Rt = t: 0.9
  Rt = f: 0.2
Example
True
False
0.500
0.373
0.500
0.500
0.818
0.182
0.883
0.117
Rain 0
Rain 1
Rain 2
Umbrella 1
Umbrella 2
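
A sketch that reproduces these numbers with the forward update above (evidence encoded as column index 0 = umbrella seen):

```python
import numpy as np

prior = np.array([0.5, 0.5])   # P(Rain0), order (true, false)
T = np.array([[0.7, 0.3],      # P(Rain_t | Rain_{t-1})
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],      # P(Umbrella_t | Rain_t)
              [0.2, 0.8]])

f = prior
for e in [0, 0]:               # umbrella observed on days 1 and 2
    f = O[:, e] * (T.T @ f)
    f /= f.sum()
    print(f.round(3))          # [0.818 0.182], then [0.883 0.117]
```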

Hidden Markov Models

- Hidden variables Xt
- Manifest variables Yt
- Memoryless state: P(Xt | X0:t−1, Y0:t−1) = P(Xt | Xt−1)
- Memoryless sensor: P(Yt | X0:t, Y0:t−1) = P(Yt | Xt)

[Figure: the umbrella model again, as an HMM with hidden variables Raint and manifest variables Umbrellat, with the same CPTs as above]
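
For a finite state space the forward update takes a compact matrix–vector form, which is what the sketches above implement: with T the transition matrix and Ot+1 the diagonal matrix of sensor likelihoods P(yt+1 | Xt+1), the update is f1:t+1 ∝ Ot+1 Tᵀ f1:t.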

Applications

- Speech recognition
  - Observations are acoustic signals
  - States are phonemes
- Machine translation
  - Observations are words in the native language
  - States are words in the foreign language
- Robot tracking
  - Observations are sensor readings (laser range, ultrasound, etc.)
  - States are positions on a map