The Entropy Rate of the Hidden Markov Process
Mohammad Rezaeian
Department of Electrical and Electronic Engineering
University of Melbourne
30th Conference on Stochastic Processes and their Applications
University of California, Santa Barbara
June 28, 2005
Outline
1. The hidden Markov process.
2. The entropy rate of a process.
3. A new expression for the entropy rate of the hidden Markov process.
4. Comparison with previous expressions.
5. The numerical method for computation of the entropy rate.
6. Conclusion.
Hidden Markov process
Consider the two processes
{S_n}_{n=0}^∞, S_n ∈ S, and {Z_n}_{n=0}^∞, Z_n ∈ Z,
where S_n is a sufficient statistic for both S_{n+1} and Z_n:
p(s_{n+1} | s_n, z^{n−1}) = p(s_{n+1} | s_n)   (Markovity)
p(z_n | s_n, z^{n−1}) = p(z_n | s_n)   (Memorylessness)
Then
• {S_n}_{n=0}^∞ is a Markov process,
• {Z_n}_{n=0}^∞ is a hidden Markov process.
A hidden Markov process is a noisy observation of a Markov process through a memoryless channel.

[Figure: a Markov source with transition matrix P observed through a memoryless channel T]
The defining parameters of a HMP are [S, P, Z, T]:
• Each row of P is a probability vector, P(s) ∈ ∇_S, with p(s_{n+1}|s_n) = P[s_n, s_{n+1}].
• Each row of T is a probability vector, T(s) ∈ ∇_Z, with p(z_n|s_n) = T[s_n, z_n].
∇_S and ∇_Z are the probability simplexes over S and Z.
The Entropy of a Random Variable
X — a random variable
X — its domain (a discrete set)
∇_X — the probability simplex of dimension |X| − 1
p(X) — the distribution of X, a point p(X) ∈ ∇_X
H(X) — the entropy of X

The function h : ∇_X → R+, where h(p(X)) = H(X), is the entropy function.

H(X) = −E_X log Pr(x) = −Σ_x Pr(x) log Pr(x).
[Figure: surface plot of the entropy function h over the probability simplex, with values ranging from 0 up to about 1.6 bits]
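As a minimal sketch of this definition (Python; the function name is my own choice, not from the talk), h maps a point of the simplex to a nonnegative entropy value:

import numpy as np

def entropy_bits(p):
    """The entropy function h : simplex -> R+, in bits.

    Terms with p[i] = 0 contribute nothing, by the convention 0 log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

print(entropy_bits([1/3, 1/3, 1/3]))  # maximal on the simplex: log2(3) ~ 1.585
print(entropy_bits([1.0, 0.0, 0.0]))  # 0.0 at a vertex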
The Entropy Rate of a Process
For a process {X_n}_{n=0}^∞, the entropy rate is
Ĥ(X) = lim_{n→∞} (1/n) H(X_0, X_1, ..., X_n)   (average information per symbol)
     = lim_{n→∞} H(X_n | X_{n−1}, X_{n−2}, ..., X_0)   (steady-state information per symbol)
• For an i.i.d. process, the entropy rate is the entropy of any sample.
• For processes with memory, the entropy rate can be computed in closed form when the process is Markov.
• Objective: Compute the entropy rate of the hidden Markov process.
• Blackwell (1957): an expression for the entropy rate in terms of a measure that is the solution of an integral equation.
• Recently (2002–2004), bounds and simple expressions for special cases of the hidden Markov process have been investigated.
The Entropy Rate of a Markov process
A Markov process is identified by a triple [S, P, µ_0].
The stationary distribution is the solution ν of
νP = ν,   ν ∈ ∇_S.
For a Markov process, the entropy rate is
Ĥ(S) = E_S h(P(s)),   S ∈ S, P(s) ∈ ∇_S,   (1)
where the distribution of S is the stationary distribution of the Markov process.
[Figure: the rows P(1), P(2), P(3) of P as points of ∇_S, together with ν, under the entropy function h : ∇_S → R+]
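A minimal numerical sketch of equation (1) (Python; the function names and the example chain are mine, not from the talk): find ν from νP = ν, then average the row entropies of P under ν.

import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def stationary(P):
    """Solve nu P = nu, nu in the simplex, via the left eigenvector at eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    nu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return nu / nu.sum()

def markov_entropy_rate(P):
    """Equation (1): H(S) = E_S h(P(s)), with S drawn from the stationary distribution."""
    nu = stationary(P)
    return sum(nu[s] * entropy_bits(P[s]) for s in range(len(nu)))

P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
print(markov_entropy_rate(P))  # ~0.57 bits per symbol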
The entropy rate corresponds to a probability measure
An important relation for the conditional entropy H(Y|X) ≜ −E log Pr(y|x):

H(Y|X) = ∫_{∇_Y} h(ω′) ψ_X(ω′) dω′,   (2)

where
ψ_X(ω) = Σ_x p(x) δ(ω − p(Y|x))
is a measure on ∇_Y which depends on the distribution of X.
[Figure: the points p(Y|x_1), p(Y|x_2), p(Y|x_3) of ∇_Y under the entropy function h : ∇_Y → R+]
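Since ψ_X places mass p(x) at the point p(Y|x), equation (2) reduces to a finite sum for discrete X. A small sketch (Python; the numbers are arbitrary illustrations of mine) checks it against the chain rule H(Y|X) = H(X,Y) − H(X):

import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

p_x = np.array([0.5, 0.3, 0.2])            # distribution of X
p_y_given_x = np.array([[0.7, 0.3],        # the points p(Y|x) in the simplex
                        [0.4, 0.6],
                        [0.1, 0.9]])

# Equation (2) as a finite sum: H(Y|X) = sum_x p(x) h(p(Y|x)).
H_cond = sum(px * entropy_bits(row) for px, row in zip(p_x, p_y_given_x))

# Cross-check through the chain rule: H(Y|X) = H(X, Y) - H(X).
p_xy = p_x[:, None] * p_y_given_x
print(H_cond, entropy_bits(p_xy.ravel()) - entropy_bits(p_x))  # both ~0.83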
The probability measure is in general a limiting distribution.
For a general process {Z_n}_{n=0}^∞, by defining the random variable
q_n = p(Z_n|Z^{n−1}),   q_n ∈ ∇_Z,
we have
H(Z_n|Z^{n−1}) = ∫_{∇_Z} h(q) ψ_n(q) dq = E h(q_n),   (3)
where the measure ψ_n(·) is the distribution of the random variable q_n. Then
Ĥ(Z) = lim_{n→∞} E h(q_n) = ∫_{∇_Z} h(q) ψ(q) dq,   ψ(·) = lim_{n→∞} ψ_n(·).
[Figure: three simplexes — the measure for an i.i.d. process (a single point of ∇_X), a Markov process (ν on ∇_S), and a general process (ψ(·) on ∇_Z)]
The entropy rate of the hidden Markov process
For a hidden Markov model, the entropy rate of the measurement process {Z_n}_{n=0}^∞ is
Ĥ(Z) = lim_{n→∞} E h(q_n) = ∫_{∇_Z} h(q) ψ(q) dq,   q_n = p(Z_n|Z^{n−1}).

Problem: The limiting distribution ψ(·) is not computable.
Resolution: Define the random variable
π_n = p(S_n|Z^{n−1}).
• The distribution of π_n (denoted by µ_n(·)) is computable.
• q_n = π_n × T.
Therefore
Ĥ(Z) = lim_{n→∞} E h(π_n × T) = ∫_{∇_S} h(π × T) µ(π) dπ.
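As an aside, one standard way to approximate this limit numerically (my own illustration, not part of the talk; it assumes the information-state process is ergodic) is Monte Carlo: simulate (S_n, Z_n), update π_n by the filter recursion, and time-average h(π_n × T):

import numpy as np

rng = np.random.default_rng(0)

def entropy_bits(p):
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def mc_entropy_rate(P, T, n_steps=200000):
    """Time-average of h(pi_n x T) along one simulated trajectory."""
    nS = P.shape[0]
    s = rng.integers(nS)                 # hidden state s_n
    pi = np.full(nS, 1.0 / nS)           # p(S_n | z^{n-1}); the initialization washes out
    total = 0.0
    for _ in range(n_steps):
        z = rng.choice(T.shape[1], p=T[s])       # emit z_n ~ T[s_n]
        total += entropy_bits(pi @ T)            # h(q_n), with q_n = pi_n x T
        post = pi * T[:, z]
        post /= post.sum()                       # Bayes' rule: p(s_n | z^n)
        pi = post @ P                            # pi_{n+1} = f(z_n, pi_n)
        s = rng.choice(nS, p=P[s])               # next hidden state
    return total / n_steps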
For the hidden Markov process, the entropy rate can be described by a measure
which is the stationary distribution of a Markov process.
Ĥ(Z) = ∫_{∇_Z} h(q) ψ(q) dq.
Using a mapping ∇_S → ∇_Z, we can express the entropy rate by a measure on ∇_S.
[Blackwell 1957]
Ĥ(Z) = ∫_{∇_S} h(π′ × P × T) ϕ(π′) dπ′,   q = π′ × P × T.
ϕ is the state distribution of a stationary Markov process, and the solution of the integral equation
ϕ(E) = Σ_z ∫_{f_z^{−1}E} r_z(w) dϕ(w).
New result:
Ĥ(Z) = ∫_{∇_S} h(π × T) µ(π) dπ,   q = π × T.
µ is the stationary distribution of a non-stationary Markov process with the transition probabilities and initial distribution explicitly defined by P and T.
The information-state process
The Hidden Markov Process is a special case of the Partially Observed Markov Decision Process (POMDP). For a POMDP, the random variable
π_n = p(S_n|Z^{n−1})
is called the information state at time n.

The main properties of the information-state process:
• Sufficient statistic for the observation process: p(Z_n|π_n, Z^{n−1}) = p(Z_n|π_n).
• The recursion π_{n+1} = f(Z_n, π_n), obtained from Bayes' rule and total probability (a sketch follows this list).
• q_n = π_n × T, where q_n = p(Z_n|Z^{n−1}).
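A minimal sketch of one step of this recursion (Python; function and variable names are my own):

import numpy as np

def info_state_update(pi, z, P, T):
    """pi_{n+1} = f(z_n, pi_n): Bayes' rule on z_n, then total probability over P.

    pi : p(S_n | z^{n-1}), a point of the simplex over S
    z  : the observed symbol z_n (a column index of T)
    """
    post = pi * T[:, z]        # p(s_n, z_n | z^{n-1}), unnormalized
    post /= post.sum()         # Bayes' rule: p(s_n | z^n)
    return post @ P            # total probability: p(s_{n+1} | z^n)

# The prediction of the next observation is q_n = pi_n x T:
# q = pi @ T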
The Markovity of the information-state process
The information-state process {π_n}_{n=0}^∞ is a non-stationary Markov process with
• Initial distribution p(π_0) = δ(π_0 − ν).
• Conditional probabilities
p_K(π_{n+1} = α | π_n = β) = q[z](β) if ∃ z ∈ Z : α = f(z, β), and 0 otherwise.
This allows the computation of the distribution µ_n(·), and the measure µ(·) on ∇_S.
Ĥ(Z) = ∫_{∇_S} h(π × T) µ(π) dπ = ∫_{∇_Z} h(q) ψ(q) dq,   q = π × T.

[Figure: ν and µ(·) on ∇_S and ψ(·) on ∇_Z, related by the mapping q = π × T]
The simple algorithm for computation of entropy rate
In iteration n we generate the set U_n = {u_1, u_2, ..., u_{|Z|^n}} and a probability distribution µ_n(u_i) on this set, computed from the previous iteration, starting from U_0 = {ν}.
The recursive generation of sets Un and the distributions µn is based on the transition probabilities
of the information state process.
The sequence
H_n = Σ_{i=1}^{|Z|^n} µ_n(u_i) h(u_i × T) = H(Z_n|Z^{n−1})
converges monotonically from above to the entropy rate.
Example with |S| = |Z| = 4:

P = [ 0.05 0.9  0.03 0.02
      0.04 0.1  0.06 0.8
      0.15 0.05 0.7  0.1
      0.03 0.04 0.03 0.9 ]

T = [ 0.5 0.2 0.2 0.1
      0.2 0.1 0.1 0.6
      0.1 0.2 0.2 0.5
      0.1 0.4 0.2 0.3 ]

[Figure: H_n (bits) versus n = 1, ..., 8, decreasing monotonically from about 1.38 toward the entropy rate, near 1.2]
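A sketch of this algorithm in Python (names are mine; it enumerates the full observation tree, so the cost grows as |Z|^n), applied to the example above:

import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def stationary(P):
    vals, vecs = np.linalg.eig(P.T)
    nu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return nu / nu.sum()

def entropy_rate_sequence(P, T, n_max):
    """H_n = sum_i mu_n(u_i) h(u_i T) = H(Z_n | Z^{n-1}) for n = 0, ..., n_max.

    U holds the (up to |Z|^n) reachable information states and mu their
    probabilities, starting from U_0 = {nu}.
    """
    U, mu = [stationary(P)], np.array([1.0])
    H = []
    for n in range(n_max + 1):
        Q = np.array([u @ T for u in U])          # row i is q(u_i) = u_i x T
        H.append(float(sum(m * entropy_bits(q) for m, q in zip(mu, Q))))
        if n == n_max:
            break
        # Transition of the information-state process: each u branches to
        # f(z, u) with probability q[z](u) = (u T)[z], one branch per z.
        U_next, mu_next = [], []
        for u, m, q in zip(U, mu, Q):
            for z in range(T.shape[1]):
                post = u * T[:, z] / q[z]         # Bayes' rule on z
                U_next.append(post @ P)           # f(z, u)
                mu_next.append(m * q[z])
        U, mu = U_next, np.array(mu_next)
    return H

P = np.array([[0.05, 0.9, 0.03, 0.02],
              [0.04, 0.1, 0.06, 0.8],
              [0.15, 0.05, 0.7, 0.1],
              [0.03, 0.04, 0.03, 0.9]])
T = np.array([[0.5, 0.2, 0.2, 0.1],
              [0.2, 0.1, 0.1, 0.6],
              [0.1, 0.2, 0.2, 0.5],
              [0.1, 0.4, 0.2, 0.3]])
print(entropy_rate_sequence(P, T, 8))  # decreasing sequence of upper bounds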
Conclusion
The entropy rate of a general process is
Ĥ(Z) = E_π h(π),   π ∈ ∇_Z,
but the measure for this expectation has no closed form, and there is no algorithm to find it.
For the hidden Markov process
[Blackwell, 1957]
Ĥ(Z) = E_π h(π × P × T),   π ∈ ∇_S.
The distribution of π needs to be found by solving an integral equation.
New expression:
Ĥ(Z) = E_π h(π × T),   π ∈ ∇_S.
The distribution of π is obtained numerically, using the information-state process.