Stochastic Processes, Markov Chains, and Markov Models

L645
Dept. of Linguistics, Indiana University
Fall 2009


Stochastic Process

◮ A stochastic or random process is a sequence ξ1, ξ2, . . . of random variables based on the same sample space Ω.
◮ The possible outcomes of the random variables are called the set of possible states of the process.
◮ The process is said to be in state ξt at time t.
◮ Note that the random variables are in general not independent. In fact, the interesting thing about a stochastic process is the dependence between the random variables.

Stochastic Process: Example 1

The sequence of outcomes when repeatedly casting a die is a stochastic process with discrete random variables and a discrete time parameter.

Stochastic Process: Example 2

There are three telephone lines, and at any given moment 0, 1, 2, or 3 of them can be busy. Once every minute we observe how many of them are busy. This is a random variable with Ωξ = {0, 1, 2, 3}. Let ξ1 be the number of busy lines at the first observation time, ξ2 the number of busy lines at the second observation time, etc.

The sequence of the numbers of busy lines then forms a random process with discrete random variables and a discrete time parameter.

Characterizing a random process

To fully characterize a random process with time parameter t, we specify:

1. the probability P(ξ1 = xj) of each outcome xj for the first observation, i.e., the initial state ξ1
2. for each subsequent observation/state ξt+1, t = 1, 2, . . ., the conditional probabilities P(ξt+1 = x_{i_{t+1}} | ξ1 = x_{i_1}, . . . , ξt = x_{i_t})
3. termination after some finite number of steps T


Markov Chains and the Markov Property (1)

A Markov chain is a special type of stochastic process where the probability of the next state, conditional on the entire sequence of previous states up to the current state, in fact depends only on the current state.

This is called the Markov property and can be stated as:

P(ξt+1 = x_{i_{t+1}} | ξ1 = x_{i_1}, . . . , ξt = x_{i_t}) = P(ξt+1 = x_{i_{t+1}} | ξt = x_{i_t})
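
To make the Markov property concrete, here is a minimal simulation sketch (Python; the two states and their probabilities are hypothetical, for illustration only): the next state is drawn from a distribution that depends only on the current state, never on the earlier history.

    import random

    # Hypothetical two-state chain, for illustration only.
    trans = {
        "s0": {"s0": 0.8, "s1": 0.2},
        "s1": {"s0": 0.4, "s1": 0.6},
    }

    def step(current):
        # The draw depends only on `current`: the Markov property.
        states = list(trans[current])
        weights = [trans[current][s] for s in states]
        return random.choices(states, weights=weights)[0]

    state, history = "s0", ["s0"]
    for _ in range(10):
        state = step(state)
        history.append(state)
    print(history)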

Markov Chains and the Markov Property (2)

The probability of a Markov chain ξ1, ξ2, . . . can be calculated as:

P(ξ1 = x_{i_1}, . . . , ξt = x_{i_t}) = P(ξ1 = x_{i_1}) · P(ξ2 = x_{i_2} | ξ1 = x_{i_1}) · . . . · P(ξt = x_{i_t} | ξt−1 = x_{i_{t−1}})

The conditional probabilities P(ξt+1 = x_{i_{t+1}} | ξt = x_{i_t}) are called the transition probabilities of the Markov chain.

A finite Markov chain must at each time be in one of a
finite number of states.
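
This factorization translates directly into code. Below is a minimal sketch (Python; the two-state chain and its probabilities are hypothetical): the probability of a whole state sequence is the initial-state probability times one transition probability per step.

    # Hypothetical initial and transition probabilities, for illustration.
    init = {"s0": 0.5, "s1": 0.5}
    trans = {("s0", "s0"): 0.8, ("s0", "s1"): 0.2,
             ("s1", "s0"): 0.4, ("s1", "s1"): 0.6}

    def sequence_prob(states):
        # Chain rule plus the Markov property: each factor conditions
        # only on the immediately preceding state.
        p = init[states[0]]
        for prev, cur in zip(states, states[1:]):
            p *= trans[(prev, cur)]
        return p

    print(sequence_prob(["s0", "s0", "s1", "s1"]))  # 0.5 * 0.8 * 0.2 * 0.6 = 0.048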

Markov Chains: Example

We can turn the telephone example with 0, 1, 2, or 3 busy lines into a (finite) Markov chain by assuming that the number of busy lines depends only on the number of lines that were busy the last time we observed them, and not on the previous history.

Transition Matrix for a Markov Process

Consider a Markov chain with n states s1, . . . , sn. Let pij denote the transition probability from state si to state sj, i.e., pij = P(ξt+1 = sj | ξt = si).

The transition matrix for this Markov process is then defined as

        [ p11 . . . p1n ]
    P = [ . . . . . . . ] ,    pij ≥ 0,    Σ_{j=1..n} pij = 1 for i = 1, . . . , n
        [ pn1 . . . pnn ]

In general, a matrix with these properties is called a stochastic matrix.


Transition Matrix: Example

In the telephone example, a possible transition matrix could for example be:

             s0    s1    s2    s3
        s0 [ 0.2   0.5   0.2   0.1 ]
    P = s1 [ 0.3   0.4   0.2   0.1 ]
        s2 [ 0.1   0.3   0.4   0.2 ]
        s3 [ 0.1   0.1   0.3   0.5 ]
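
As a quick check (Python with numpy, assuming numpy is available), we can enter the telephone-example matrix and verify the two defining properties of a stochastic matrix: non-negative entries and rows that sum to 1.

    import numpy as np

    # Transition matrix of the telephone example; rows and columns
    # correspond to the states s0..s3 (number of busy lines).
    P = np.array([[0.2, 0.5, 0.2, 0.1],
                  [0.3, 0.4, 0.2, 0.1],
                  [0.1, 0.3, 0.4, 0.2],
                  [0.1, 0.1, 0.3, 0.5]])

    assert (P >= 0).all()                   # pij >= 0
    assert np.allclose(P.sum(axis=1), 1.0)  # each row sums to 1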

Transition Matrix: Example (2)

Assume that currently, all three lines are busy.

What is then the probability that at the next point in time exactly one line is busy?

The element in row 3, column 1 (p31) is 0.1, and thus p(1|3) = 0.1. (Note that we have numbered the rows and columns 0 through 3.)


Transition Matrix after Two Steps

    p^(2)_ij = P(ξt+2 = sj | ξt = si)
             = Σ_{r=1..n} P(ξt+1 = sr | ξt = si) · P(ξt+2 = sj | ξt+1 = sr)
             = Σ_{r=1..n} pir · prj

The element p^(2)_ij can be determined by matrix multiplication, as the value in row i and column j of the transition matrix P² = PP.

More generally, the transition matrix for t steps is P^t.

Matrix Multiplication

Let's go to
http://people.hofstra.edu/Stefan_Waner/realWorld/tutorialsf1/frames3_2.html
and practice.

Matrix after Two Steps: Example

For the telephone example: assuming that currently all three lines are busy, what is the probability of exactly one line being busy after two steps in time?

                 s0     s1     s2     s3
            s0 [ 0.22   0.37   0.25   0.16 ]
  P² = PP = s1 [ 0.21   0.38   0.25   0.16 ]
            s2 [ 0.17   0.31   0.30   0.22 ]
            s3 [ 0.13   0.23   0.31   0.33 ]

⇒ p^(2)_31 = 0.23
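
This can be checked mechanically (Python with numpy; same matrix P as above). np.linalg.matrix_power gives the general t-step matrix.

    import numpy as np

    P = np.array([[0.2, 0.5, 0.2, 0.1],
                  [0.3, 0.4, 0.2, 0.1],
                  [0.1, 0.3, 0.4, 0.2],
                  [0.1, 0.1, 0.3, 0.5]])

    P2 = P @ P                            # two-step transition matrix
    print(P2[3, 1])                       # p^(2)_31 = 0.23
    print(np.linalg.matrix_power(P, 5))   # five-step matrix P^5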

Initial Probability Vector

A vector v = [v1, . . . , vn] with vi ≥ 0 and Σ_{i=1..n} vi = 1 is called a probability vector.

The probability vector that determines the state probabilities for the first observation (state) of a Markov chain, i.e., where vi = P(ξ1 = si), is called an initial probability vector.

If p^(t)(si) is the probability of a Markov chain being in state si at time t, i.e., after t − 1 steps, then

[p^(t)(s1), . . . , p^(t)(sn)] = vP^(t−1)

The initial probability vector and the transition matrix together determine the probability of the chain being in a particular state at a particular point in time.

Initial Probability Vector: Example

Let v = [0.5 0.3 0.2 0.0] be the initial probability vector for the telephone example of a Markov chain.

What is then the probability that at the second observation time, i.e., after one step, exactly two lines are busy?

           s0     s1     s2     s3
    vP = [ 0.21   0.43   0.24   0.12 ]

⇒ p(ξ2 = s2) = 0.24
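
A sketch of the same computation (Python with numpy; P and v as in the telephone example): the distribution at time t is vP^(t−1), so for t = 2 we multiply v by P once.

    import numpy as np

    P = np.array([[0.2, 0.5, 0.2, 0.1],
                  [0.3, 0.4, 0.2, 0.1],
                  [0.1, 0.3, 0.4, 0.2],
                  [0.1, 0.1, 0.3, 0.5]])
    v = np.array([0.5, 0.3, 0.2, 0.0])    # initial probability vector

    def state_dist(v, P, t):
        # [p^(t)(s1), ..., p^(t)(sn)] = v P^(t-1)
        return v @ np.linalg.matrix_power(P, t - 1)

    print(state_dist(v, P, 2))   # [0.21 0.43 0.24 0.12]; p(ξ2 = s2) = 0.24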

Markov Models

Markov models add the following to a Markov chain:

◮ a sequence of random variables ηt for t = 1, . . . , T representing the signal emitted at time t
◮ We will use σj to refer to a particular signal

Markov Models (1)

A Markov Model consists of:

◮ a finite set of states Ω = {s1, . . . , sn};
◮ a signal alphabet Σ = {σ1, . . . , σm};
◮ an n × n state transition matrix P = [pij] where pij = P(ξt+1 = sj | ξt = si);
◮ an n × m signal matrix A = [aij], which for each state-signal pair determines the probability aij = p(ηt = σj | ξt = si) that signal σj will be emitted given that the current state is si;
◮ and an initial vector v = [v1, . . . , vn] where vi = P(ξ1 = si).


HMM Example 1: Crazy Softdrink Machine

[State diagram: two states, Cola Pref. (CP) and Iced Tea Pref. (IP), with transition probabilities CP → CP 0.7, CP → IP 0.3, IP → CP 0.5, IP → IP 0.5]

with emission probabilities:

          cola   ice-t   lem
    CP    0.6    0.1     0.3
    IP    0.1    0.7     0.2

from: Manning/Schütze, p. 321
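
Written out as data (Python with numpy), with state order [CP, IP] and signal order [cola, ice-t, lem] as in the tables above; the initial vector is an assumption for illustration (starting the machine in CP):

    import numpy as np

    states  = ["CP", "IP"]             # cola-preferring, iced-tea-preferring
    signals = ["cola", "ice-t", "lem"]

    P = np.array([[0.7, 0.3],          # state transition matrix
                  [0.5, 0.5]])
    A = np.array([[0.6, 0.1, 0.3],     # signal (emission) matrix
                  [0.1, 0.7, 0.2]])
    v = np.array([1.0, 0.0])           # assumed start: machine begins in CP

    assert np.allclose(P.sum(axis=1), 1) and np.allclose(A.sum(axis=1), 1)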

Markov Models (2)

p^(t)(si, σj) = p^(t)(si) · p(ηt = σj | ξt = si)

where

◮ p^(t)(si) is the ith element of the vector vP^(t−1)
◮ This means that the probability of emitting a particular signal depends only on the current state . . . and not on the previous states


Markov Models (3)

The probability that signal σj will be emitted at time t is then:

p^(t)(σj) = Σ_{i=1..n} p^(t)(si) · p(ηt = σj | ξt = si)

Thus if p^(t)(σj) is the probability of the model emitting signal σj at time t, i.e., after t − 1 steps, then

[p^(t)(σ1), . . . , p^(t)(σm)] = vP^(t−1)A
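
Putting the last formula to work for the soft drink machine (Python with numpy; same P, A, and assumed start vector as above): the signal distribution at time t is vP^(t−1)A.

    import numpy as np

    P = np.array([[0.7, 0.3], [0.5, 0.5]])
    A = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
    v = np.array([1.0, 0.0])           # assumed: machine starts in CP

    def signal_dist(v, P, A, t):
        # [p^(t)(σ1), ..., p^(t)(σm)] = v P^(t-1) A
        return v @ np.linalg.matrix_power(P, t - 1) @ A

    print(signal_dist(v, P, A, 1))   # [0.6 0.1 0.3], emissions from CP
    print(signal_dist(v, P, A, 2))   # [0.45 0.28 0.27]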