
ECE511: Analysis of Random Signals
Fall 2016
Lecture Chapter 7: Random Process
Dr. Salim El Rouayheb
1 Random Process
Definition 1. A discrete random process X(n), also denoted as Xn , is an infinite sequence of
random variables X1 , X2 , X3 , . . . ; we think of n as the time index.
1. Mean function: µX (n) = E[X(n)].
2. Auto-correlation function: RXX (k, l) = E[X(k)X(l)].
3. Auto-covariance function: KXX (k, l) = RXX (k, l) − µX (k)µX (l).
Definition 2. X(n) is a Gaussian r.p. if Xi1 , Xi2 , . . . , Xim are jointly Gaussian for any m ≥ 1.
Example 1. Let W(n) be an i.i.d. Gaussian r.p. with mean µW(n) = 0 ∀n and autocorrelation RWW(k, l) = σ²δ(k − l), where
$$\delta(x-a) = \begin{cases} 1 & x = a,\\ 0 & \text{otherwise.} \end{cases}$$
$$\Rightarrow R_{WW}(k,l) = E[W_k W_l] = \begin{cases} \sigma^2 & k = l,\\ 0 & \text{otherwise.} \end{cases}$$
This means that for l = k,
RWW(k, l) = E[Wl²] = V[Wl] = σ²,
while for l ̸= k,
RWW(k, l) = 0,
i.e., Wl and Wk are uncorrelated and therefore independent, because they are jointly Gaussian.
A new averaging r.p. X(n) is defined as
$$X(n) = \frac{W(n) + W(n-1)}{2}, \qquad n \ge 1;$$
for example, for n = 1,
$$X(1) = \frac{W_1 + W_0}{2}.$$
Figure 1: A possible realization of the random process W(n) and its corresponding averaging function X(n).
Questions:
1. What is the pdf of X(n)?
2. Find the autocorrelation function RXX(k, l).
Answers:
1. From previous chapters we know that X(n) is a Gaussian r.v. because it is a linear combination of Gaussian r.v.s. Hence, it is enough to find the mean and variance of Xn to find its pdf.
$$E[X(n)] = E\left[\frac{1}{2}(W_n + W_{n-1})\right] = \frac{1}{2}\left(E[W_n] + E[W_{n-1}]\right) = 0.$$
$$V[X(n)] = V\left[\frac{1}{2}(W_n + W_{n-1})\right] = \frac{1}{4}\left(V[W_n] + V[W_{n-1}]\right) = \frac{1}{4}\left(\sigma^2 + \sigma^2\right) = \frac{\sigma^2}{2},$$
because Wn and Wn−1 are independent.
Or, alternatively:
$$\begin{aligned} V[X(n)] &= E[X_n^2] - \mu_{X_n}^2, & (1)\\ &= E\left[\frac{1}{4}(W_n + W_{n-1})^2\right], & (2)\\ &= \frac{1}{4}\left(E[W_n^2] + E[W_{n-1}^2] + 2E[W_n W_{n-1}]\right), & (3)\\ &= \frac{1}{4}\left(\sigma^2 + \sigma^2\right), & (4)\\ &= \frac{\sigma^2}{2}, & (5) \end{aligned}$$
where equation (4) follows from the fact that
$$E[W_n W_{n-1}] = R_{WW}(n, n-1) = 0.$$
Therefore,
$$f_{X_n}(x_n) = \frac{1}{\sqrt{2\pi}\,\frac{\sigma}{\sqrt{2}}} \exp\left(-\frac{x_n^2}{2\cdot\frac{\sigma^2}{2}}\right) = \frac{1}{\sigma\sqrt{\pi}} \exp\left(-\frac{x_n^2}{\sigma^2}\right).$$
2. Before we apply the formula, let us try to find the autocorrelation intuitively. By definition:
$$X_1 = \frac{W_1 + W_0}{2}, \qquad X_2 = \frac{W_2 + W_1}{2}, \qquad X_3 = \frac{W_3 + W_2}{2}.$$
It is clear that X1 and X3 are uncorrelated (independent) because they do not have any Wi
in common and W (n) is i.i.d. However, X1 , X2 and X2 , X3 are correlated.
$$\begin{aligned} R_{XX}(k,l) &= E[X_k X_l],\\ &= \frac{1}{4} E[(W_k + W_{k-1})(W_l + W_{l-1})],\\ &= \frac{1}{4}\left(E[W_k W_l] + E[W_k W_{l-1}] + E[W_{k-1} W_l] + E[W_{k-1} W_{l-1}]\right). \end{aligned}$$
Recall from the definition of W(n) that
$$E[W_k W_l] = \begin{cases} \sigma^2 & k = l,\\ 0 & \text{otherwise,} \end{cases} \qquad E[W_k W_{l-1}] = \begin{cases} \sigma^2 & k = l - 1,\\ 0 & \text{otherwise,} \end{cases}$$
$$E[W_{k-1} W_l] = \begin{cases} \sigma^2 & k = l + 1,\\ 0 & \text{otherwise,} \end{cases} \qquad E[W_{k-1} W_{l-1}] = \begin{cases} \sigma^2 & k = l,\\ 0 & \text{otherwise.} \end{cases}$$
Therefore,
$$R_{XX}(k,l) = \begin{cases} \frac{\sigma^2}{2} & k = l,\\ \frac{\sigma^2}{4} & k = l \pm 1,\\ 0 & \text{otherwise.} \end{cases}$$
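As a quick sanity check (this simulation is not part of the original notes; the value σ = 2 and the number of trials are arbitrary choices), the following Python sketch estimates RXX(k, l) for the averaging process by Monte Carlo and should reproduce σ²/2, σ²/4, and 0:

```python
import numpy as np

# Monte Carlo check of R_XX for the averaging process X(n) = (W(n) + W(n-1))/2,
# with W(n) i.i.d. N(0, sigma^2). Expected: sigma^2/2 on the diagonal,
# sigma^2/4 one step off the diagonal, 0 elsewhere.
rng = np.random.default_rng(0)
sigma, n_trials = 2.0, 200_000

W = rng.normal(0.0, sigma, size=(n_trials, 6))   # W_0, ..., W_5 for each trial
X = (W[:, 1:] + W[:, :-1]) / 2                   # X_1, ..., X_5 (columns)

for k, l in [(2, 2), (2, 3), (2, 4)]:            # 0-based columns, i.e. X_{k+1}
    est = np.mean(X[:, k] * X[:, l])
    print(f"R_XX({k+1},{l+1}) ~ {est:.3f}")
# Expected: ~ sigma^2/2 = 2.0, ~ sigma^2/4 = 1.0, ~ 0.
```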
Figure 2: A possible realization of the random walk process.
Example 2. Random walk process
Let W0 = X0 = constant, and let W1, W2, . . . be an i.i.d. random process with the following distribution:
$$W_i = \begin{cases} +1 & \text{with probability } p,\\ -1 & \text{with probability } 1 - p. \end{cases}$$
The random walk process Xn , n = 1, 2, . . . is then defined as
Xn = W0 + W1 + W2 + · · · + Wn .
Questions:
1. What is the pdf of Xn ?
2. Find the mean function of Xn .
3. Find the variance of Xn .
Answers:
1. Xn is (up to a shift) a Binomial r.v., since it is a sum of Bernoulli-type r.v.s. Finding the pmf of Xn therefore amounts to finding P(Xn = h). Let U be the number of steps “up”, i.e., with the corresponding Wi = 1; and let D be the number of steps “down”, i.e., with the corresponding Wi = −1.
$$\left.\begin{aligned} U - D &= h - X_0\\ U + D &= n \end{aligned}\right\} \Rightarrow U = \frac{n + h - X_0}{2}.$$
Then,
$$P(X_n = h) = \binom{n}{U} p^U (1-p)^{n-U} = \binom{n}{\frac{n+h-X_0}{2}}\, p^{\frac{n+h-X_0}{2}} (1-p)^{\frac{n-h+X_0}{2}}.$$
Remark 1. If n is big enough, Xn ∼ N(·, ·) approximately, by the CLT.
2.
E [Xn ] = E [X0 ] + E [W1 ] + · · · + E [Wn ] ,
= X0 + n (1 × p + (−1)(1 − p)) ,
= X0 + (2p − 1)n.
Leading to the following for different values of p:
$$\begin{aligned} &\text{if } p = \tfrac{1}{2}, & E[X_n] &= X_0,\\ &\text{if } p > \tfrac{1}{2}, & E[X_n] &\xrightarrow{n\to\infty} +\infty,\\ &\text{if } p < \tfrac{1}{2}, & E[X_n] &\xrightarrow{n\to\infty} -\infty. \end{aligned}$$
3.
$$\begin{aligned} V[X_n] &= V[X_0] + V[W_1] + \cdots + V[W_n], & (6)\\ &= 0 + 4np(1-p), & (7)\\ &= 4np(1-p). & (8) \end{aligned}$$
Equation (6) is applicable because W(n), n = 1, 2, . . . , are i.i.d.; equation (7) uses V[Wi] = E[Wi²] − E[Wi]² = 1 − (2p − 1)² = 4p(1 − p).
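The following short sketch (not from the original notes; the parameters p, n, x0 and h are arbitrary picks) checks the closed-form pmf, mean, and variance of the random walk against simulation:

```python
import numpy as np
from math import comb

# Compare P(X_n = h) = C(n, U) p^U (1-p)^(n-U) with U = (n + h - x0)/2,
# plus E[X_n] = x0 + (2p-1)n and V[X_n] = 4np(1-p), against a simulation.
rng = np.random.default_rng(1)
p, n, x0, trials = 0.6, 20, 0, 100_000

steps = rng.choice([1, -1], size=(trials, n), p=[p, 1 - p])
Xn = x0 + steps.sum(axis=1)

h = 4                                   # n + h - x0 must be even for a nonzero pmf
U = (n + h - x0) // 2
pmf = comb(n, U) * p**U * (1 - p)**(n - U)
print(pmf, np.mean(Xn == h))            # closed form vs. empirical frequency
print(x0 + (2*p - 1)*n, Xn.mean())      # mean
print(4*n*p*(1 - p), Xn.var())          # variance
```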
Remark 2. By the CLT, when n → ∞,
$$\lim_{n\to\infty} F_{X_n}(x_n) = \int_{-\infty}^{x_n} \frac{\exp\left(-\frac{\left(x - (x_0 + (2p-1)n)\right)^2}{2\cdot 4np(1-p)}\right)}{\sqrt{2\pi}\,\sqrt{4np(1-p)}}\, dx = \int_{-\infty}^{x_n} \frac{\exp\left(-\frac{(x - x_0)^2}{2n}\right)}{\sqrt{2\pi n}}\, dx \qquad \left(\text{for } p = \tfrac{1}{2}\right).$$
2 Brief review on the random walk process
Recall that the random walk process is a process that starts from a point X0 = h0 . At each time
instant n, Xn = Xn−1 ± 1 (c.f. fig. 3 for an example).
Figure 3: An example of the random walk process, h0 = 4.

3 Independent Increments
Definition 3. A random process is said to have independent increments if, for all n1 < n2 < · · · < nT and all T > 1, the increments
$$X_{n_1},\; X_{n_2} - X_{n_1},\; \dots,\; X_{n_T} - X_{n_{T-1}}$$
are jointly independent.
Example 3. The random walk process is an independent increment process.
For T = 2 and n1 < n2 ,
Xn1 = W0 + W1 + · · · + Wn1 ,
Xn2 = W0 + W1 + · · · + Wn1 + Wn1 +1 + · · · + Wn2 .
Hence,
Xn2 − Xn1 = Wn1 +1 + Wn1 +2 + · · · + Wn2 ,
is independent of Xn1 , because the Wi are i.i.d.
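As a numerical illustration (not from the original notes; p, n1, n2 are arbitrary), the sketch below estimates the covariance between Xn1 and the increment Xn2 − Xn1, which should be near zero; this is only a necessary condition for independence, but it is a quick check:

```python
import numpy as np

# For the random walk, the increment X_{n2} - X_{n1} should be independent of
# X_{n1}; verify that their empirical covariance is near zero.
rng = np.random.default_rng(2)
p, n1, n2, trials = 0.5, 10, 25, 200_000

steps = rng.choice([1, -1], size=(trials, n2), p=[p, 1 - p])
X = steps.cumsum(axis=1)                      # X_1, ..., X_{n2} (W_0 = 0 here)
Xn1, inc = X[:, n1 - 1], X[:, n2 - 1] - X[:, n1 - 1]
print(np.cov(Xn1, inc)[0, 1])                 # ~ 0
```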
Example 4. Consider the random walk process Xn of the last lecture, where W0 = h0 and the Wi, i = 1, 2, . . . , are i.i.d. with, for i ≥ 1,
$$W_i = \begin{cases} +1 & \text{with probability } p = P(W_i = +1),\\ -1 & \text{with probability } 1 - p = P(W_i = -1), \end{cases}$$
and Xn = W0 + W1 + · · · + Wn.
Questions:
1. Find the probability P (X5 = a, X7 = b).
2. Find the autocorrelation function RXX (k, l) of Xn .
Answers:
1. Recall from example 2 of last lecture that we can compute P (Xn = h) using the binomial
formula derived there. The first method to answer this question is by using Bayes’ rule:
$$\begin{aligned} P(X_5 = a, X_7 = b) &= P(X_5 = a)\, P(X_7 = b \mid X_5 = a),\\ &= P(X_5 = a)\, P(X_7 - X_5 = b - a),\\ &= P(X_5 = a)\, P(X_2 - X_0 = b - a),\\ &= P(X_5 = a)\, P(X_2 = b - a + X_0). \end{aligned}$$
The second method uses the independent increment property of the random walk process:
$$\begin{aligned} P(X_5 = a, X_7 = b) &= P(X_5 = a, X_7 - X_5 = b - a), & (9)\\ &= P(X_5 = a)\, P(X_7 - X_5 = b - a). & (10) \end{aligned}$$
Equation (10) follows from the independent increment property.
2. Now we can use this to find the autocorrelation function, assuming without loss of generality that l > k. By the independent increment property,
$$R_{XX}(k,l) = E[X_k X_l] = E[X_k (X_k + (X_l - X_k))] = E[X_k^2] + E[X_k]\, E[X_l - X_k],$$
and for p = 1/2 the increment Xl − Xk has zero mean, so RXX(k, l) = E[Xk²]. Moreover,
$$\begin{aligned} E[X_k^2] &= E\left[(W_0 + W_1 + \cdots + W_k)^2\right],\\ &= E[W_0^2] + kE[W_1^2] + 2h_0\left(E[W_1] + \cdots + E[W_k]\right) + 2E[W_1 W_2 + W_1 W_3 + \cdots + W_{k-1} W_k],\\ &= h_0^2 + k + 2h_0 k(2p-1) + 2\,\frac{k(k-1)}{2}(2p-1)^2. \end{aligned}$$
For p = 1/2 this gives E[Xk²] = h0² + k, hence
$$R_{XX}(k,l) = \begin{cases} h_0^2 + k & l > k,\\ h_0^2 + l & l < k \end{cases} \quad\Rightarrow\quad R_{XX}(k,l) = h_0^2 + \min(k, l).$$
Practice 1. Try to find RXX(k, l) at home in a different way, i.e., by expanding
$$R_{XX}(k,l) = E\left[(h_0 + W_1 + \cdots + W_k)(h_0 + W_1 + \cdots + W_l)\right].$$
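As an aside (not in the original notes; h0 and n are arbitrary), a Monte Carlo sketch can confirm RXX(k, l) = h0² + min(k, l) for p = 1/2:

```python
import numpy as np

# Monte Carlo check of R_XX(k,l) = h0^2 + min(k,l) for the symmetric random walk.
rng = np.random.default_rng(3)
h0, n, trials = 2, 30, 200_000

steps = rng.choice([1, -1], size=(trials, n))
X = h0 + steps.cumsum(axis=1)                 # X_1, ..., X_n (columns)

for k, l in [(5, 5), (5, 12), (20, 8)]:
    est = np.mean(X[:, k - 1] * X[:, l - 1])
    print(f"R_XX({k},{l}) ~ {est:.2f}, expected {h0**2 + min(k, l)}")
```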
Definition 4. A random process X(t) is stationary if it has the same nth-order CDF as X(t + T ),
that is, the two n-dimensional functions
FX (x1 , . . . , xn ; t1 , . . . , tn ) = FX (x1 , . . . , xn ; t1 + T, . . . , tn + T )
are identically equal for all T , for all positive integers n, and for all t1 , . . . , tn .
Example 5. Consider the i.i.d. Gaussian r.p. W1, W2, . . . from the last lecture, with Wi ∼ N(0, σ²) ∀ i ≥ 1.
Figure 4: Plot of a stationary random process. The points having the same symbol, or any group of them, have the same distribution.
Question: Is this r.p. stationary?
Answer: This r.p. is stationary because:
1. All of the Wi , i = 1, 2, . . . have the same pdf, i.e., same mean and same variance.
2. Any two groups of them have the same jointly Gaussian distribution (since they are i.i.d).
Example 6. Consider the averaging process, defined as
$$X_i = \frac{W_i + W_{i-1}}{2},$$
where Wi, i = 0, 1, 2, . . . , are i.i.d. Gaussian r.v.s.
Question: Is this r.p. stationary?
Answer:
1. The Xi ,i = 1, 2, . . . have the same pdf since the Wi , i = 0, 1, 2, . . . are i.i.d.
2. Recall that we got the auto-correlation function in the previous lecture:
$$R_{XX}(k,l) = \begin{cases} \frac{\sigma^2}{2} & \text{if } k = l,\\ \frac{\sigma^2}{4} & \text{if } k = l \pm 1,\\ 0 & \text{otherwise.} \end{cases}$$
It can be seen that Xi and Xj are correlated if and only if the distance between them is at most 1, i.e., j = i − 1, i, i + 1. For instance, consider fX1,X4(x1, x4): since |1 − 4| > 1, X1 and X4 are independent and
$$f_{X_1,X_4}(x_1, x_4) = f_{X_1}(x_1)\, f_{X_4}(x_4) = f_{X_{11},X_{14}}(x_{11}, x_{14}) = f_{X_{11}}(x_{11})\, f_{X_{14}}(x_{14}).$$
Moreover, even though X1 and X2 are correlated we can still say
fX1 ,X2 (x1 , x2 ) = fX11 ,X12 (x11 , x12 ) ,
because they are both jointly Gaussian r.v. and have the same covariance matrices. Therefore,
the averaging process is stationary.
This leads us to a more interesting case: the stationarity of the random walk process.
Example 7. Consider the random walk process X(n) defined as
$$X_n = \begin{cases} h_0 & n = 0,\\ h_0 + W_1 + W_2 + \cdots + W_n & n \ge 1, \end{cases}$$
where h0 = W0 = X0 = constant and the Wi, i = 1, 2, . . . , are Bernoulli-type r.v.s taking the values +1 and −1 with probabilities p and 1 − p, respectively.
Question: Is the random walk process stationary?
Answer: By simply looking at the mean of Xm for any m,
E[Xm] = (2p − 1)m + X0,
we see that for p ̸= 0.5, E[Xm] is a function of m. This means that the mean of each r.v. varies with m, so the pdfs at two different times cannot be the same. For p = 0.5, although the mean is constant, the variance V[Xm] = 4mp(1 − p) increases with time, hence two different points still have different distributions. Therefore, the random walk process is not stationary. An interpretation can be found in Figure 5.
Definition 5. X(n) is called a wide sense stationary (WSS) process iff:
1. E [X(n)] = E [X(0)] , ∀ n (average does not change with time).
2. RXX (k, l) = RXX (k + n, l + n) = RX (k − l) (the autocorrelation function depends only on the
time difference).
Example 8. Is the random walk process WSS?
In general, E[Xn] = (2p − 1)n + X0 changes with n, so it does not satisfy the mean condition. Also, RXX(k, l) = h0² + min(k, l) does not depend only on k − l, so it fails the auto-correlation condition as well. Therefore, the random walk process is not WSS.
Lemma 1. If a random process X(n) is stationary, then it also is wide sense stationary.
X(n) stationary ⇒ X(n) is WSS.
Lemma 2. If X(n) is a Gaussian r.p. and is wide sense stationary, then it is also stationary. That is, for a Gaussian r.p.:
X(n) is WSS ⇔ X(n) is stationary.
Example 9. Let Θ be a R.V. uniformly distributed on [−π, π] and X(n) = cos (nω + Θ), where ω
is a constant.
Questions:
1. Is X(n) WSS?
2. Is X(n) stationary?
Answers:
1. To check the mean condition:
$$E[X(n)] = \int_{-\infty}^{+\infty} f_\Theta(\theta) \cos(n\omega + \theta)\, d\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} \cos(n\omega + \theta)\, d\theta = 0.$$
It is equal to 0 because the integral of the cosine over any full period vanishes. Since E[X(n)] is constant for all n, the mean condition is satisfied.
2. To check the auto-correlation condition:
$$\begin{aligned} R_{XX}(k,l) &= E[X(k)X(l)], & (11)\\ &= \frac{1}{2\pi} \int_{-\pi}^{+\pi} \cos(k\omega + \theta)\cos(l\omega + \theta)\, d\theta, & (12)\\ &= \frac{1}{2}\cos\left(\omega(k - l)\right), & (13)\\ &= R_{XX}(k - l). & (14) \end{aligned}$$
Equation (13) follows after some trigonometric calculations, beginning with
$$\cos\alpha\cos\beta = \frac{1}{2}\left[\cos(\alpha - \beta) + \cos(\alpha + \beta)\right].$$
Thus, X(n) is WSS.
3. X(n) is stationary. We will not prove it rigorously; however, we will give the main idea. Consider ω = π:
X(n) = cos(nπ + Θ),
X(1) = cos(π + Θ),
X(2) = cos(2π + Θ) = cos(Θ),
X(3) = cos(3π + Θ) = cos(π + Θ),
X(4) = cos(4π + Θ) = cos(Θ).
All the Xn have the same distribution because the cosine is 2π-periodic and shifting a uniform phase by a constant leaves its distribution unchanged modulo 2π: in particular, for Θ ∼ U[−π, π] and Θ′ = Θ + π ∼ U[0, 2π], cos(Θ) and cos(Θ′) have the same distribution. Therefore, the Xn have the same distribution and X(n) is stationary.
Generally, since Θ ∼ U[−π, π] and ω is a constant, the r.v. nω + Θ is uniformly distributed on [nω − π, nω + π], an interval of length 2π, so cos(nω + Θ) has the same distribution as cos(Θ) for all n. Therefore, all the Xn have the same distribution and X(n) is stationary.
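A quick empirical check (not part of the original notes; ω = 0.7 is an arbitrary constant) of both the zero mean and the fact that RXX depends only on k − l:

```python
import numpy as np

# Monte Carlo check for X(n) = cos(n*omega + Theta), Theta ~ U[-pi, pi]:
# E[X(n)] should be ~ 0 and R_XX(k,l) ~ cos(omega*(k-l))/2.
rng = np.random.default_rng(4)
omega, trials = 0.7, 500_000
theta = rng.uniform(-np.pi, np.pi, size=trials)

def X(n):
    return np.cos(n * omega + theta)

print(np.mean(X(3)))                                     # ~ 0
for k, l in [(2, 2), (5, 3), (10, 6)]:
    print(np.mean(X(k) * X(l)), 0.5 * np.cos(omega * (k - l)))
```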
4 Discrete Time Markov Chains (DTMC)
Definition 6 (Markov Process). A discrete random process Xn is a Markov process if, for arbitrary times
n1 < n2 < n3 < · · · < nk < nk+1,
$$P\left(X(n_{k+1}) = x_{k+1} \mid X(n_1) = x_1, X(n_2) = x_2, \dots, X(n_k) = x_k\right) = P\left(X(n_{k+1}) = x_{k+1} \mid X(n_k) = x_k\right).$$
In other words, the future depends on the past only through the present.
Example 10. The Random Walk Process is a Markov Process:
Xn = X0 + W1 + W2 + W3 + ..... + Wn−1 + Wn ,
P (X6 = 11|X5 = 10, X4 = 9, ..., X0 = x0 ) = P (X6 = 11|X5 = 10) ,
= p.
In general, to prove that a process is Markov, we have to show that:
P (Xn+1 = xn+1 |X0 = x0 , X1 = x1 , ..., Xn = xn ) = P (Xn+1 = xn+1 |Xn = xn ) .
Definition 7 (Markov Chain). If the Markov Process takes discrete values {0, 1, 2, . . . } then it is
called a Markov Chain.
Example 11. Infinite DTMC representing the random walk process. (see figure 6).
Example 12. Finite DTMC. This two-state chain is one of the simplest Markov chains (see figure 7). This DTMC models the Gilbert-Elliot channel. For example, state 0 is the bad state of the BSC channel, where the probability of a bit flip is ϵ = 0.1, and state 1 is the good state of that channel, where the probability of a bit flip is ϵ = 0.0001 (figure 8).
For example, let us take α = 1/10 and β = 1/5, and assume that at n = 0 the channel is in the good state. We denote by X(n) = Xn the channel state at time n, so X0 = 1.
Questions:
1. What is the probability of the channel being at the bad state at n = 1?
2. What is the probability of the channel being at the bad state at n = 3?
Answers:
1. P (Channel is bad at n = 1) = 0.2.
2. To reach the bad state at n = 3, the chain has 4 possibilities:
(a) Move to state 0 at n = 1 and stay there until n = 3. This happens with probability P(1000) = β(1 − α)².
(b) Move to state 0 at n = 1, go back to state 1 at n = 2 and then go to state 0 at n = 3. This happens with probability P(1010) = β²α.
(c) Stay at state 1 at n = 1, move to state 0 at n = 2 and stay there at n = 3. This happens with probability P(1100) = (1 − β)β(1 − α).
(d) Stay at state 1 for n = 1 and n = 2, then move to state 0 at n = 3. This happens with probability P(1110) = (1 − β)²β.
Then the probability of the channel being at the bad state at n = 3 is given by:
P(Channel is bad at n = 3) = P(1000) + P(1010) + P(1100) + P(1110)
= β(1 − α)² + β²α + (1 − β)β(1 − α) + (1 − β)²β
= 0.438.
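The same number can be obtained by brute-force enumeration over all intermediate states; the short sketch below (not from the original notes) sums the probabilities of every path 1 → s1 → s2 → 0:

```python
from itertools import product

# Brute-force check of P(channel bad at n = 3) for alpha = 0.1, beta = 0.2.
alpha, beta = 0.1, 0.2
P = {(0, 0): 1 - alpha, (0, 1): alpha, (1, 0): beta, (1, 1): 1 - beta}

total = 0.0
for s1, s2 in product([0, 1], repeat=2):
    path = [1, s1, s2, 0]                 # X0 = 1 (good), X3 = 0 (bad)
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= P[(a, b)]
    total += prob
print(total)                              # 0.438
```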
A more general way to solve the above problem, and any DTMC problem, is by using the transition matrix of the Markov chain.
Notation:
1. We denote the probability of being at state j at time n by:
Pj (n) = P [Xn = j],
In the previous example:
P1 (0) = 1,
P0 (0) = 0
and P0 (3) = 0.438.
2. We denote the probability distribution of a Markov chain, containing s states, at time n by the row vector
P̄(n) = [P0(n) P1(n) · · · Ps(n)].
In the previous example:
P̄(0) = [P0(0) P1(0)] = [0 1].
3. We denote the probability of going from state i to state j, at any time, by:
Pij = P [Xn+1 = j|Xn = i] .
In the previous example:
P01 = α,
P00 = 1 − α,
P11 = 1 − β,
and P10 = β.
4. The matrix containing those probabilities is called the transition matrix and is denoted by:
$$\mathbf{P} = \begin{bmatrix} p_{00} & p_{01}\\ p_{10} & p_{11} \end{bmatrix} = \begin{bmatrix} 1-\alpha & \alpha\\ \beta & 1-\beta \end{bmatrix} = \begin{bmatrix} 0.9 & 0.1\\ 0.2 & 0.8 \end{bmatrix} \quad \text{(from the previous example).}$$
Property 1. All rows of P sum up to 1:
$$\sum_{j=1}^{n} P_{ij} = 1 \qquad \forall\, i = 1, 2, \dots, n.$$
Recall that P0 (n)=P (channel is bad at time n) only depends on P0 (n − 1), which only depends on
P0 (n − 2) and so on and so forth. Therefore,
P0 (n) = P00 P0 (n − 1) + P10 P1 (n − 1),
P1 (n) = P01 P0 (n − 1) + P11 P1 (n − 1).
These equations are known as the Chapman-Kolmogorov equations. In matrix form:
$$\underbrace{\begin{bmatrix} P_0(n) & P_1(n) \end{bmatrix}}_{\bar P(n)} = \underbrace{\begin{bmatrix} P_0(n-1) & P_1(n-1) \end{bmatrix}}_{\bar P(n-1)} \underbrace{\begin{bmatrix} p_{00} & p_{01}\\ p_{10} & p_{11} \end{bmatrix}}_{\mathbf P}.$$
Hence,
$$\bar P(n) = \bar P(n-1)\mathbf P, \qquad \bar P(n-1) = \bar P(n-2)\mathbf P \;\Rightarrow\; \bar P(n) = \bar P(n-2)\mathbf P^2, \;\;\dots,\;\; \bar P(n) = \bar P(0)\mathbf P^n.$$
Now, let us apply those equations in the previous example to get the probability of being in a bad state at n = 3, P0(3):
$$\bar P(0) = \begin{bmatrix} P_0(0) & P_1(0) \end{bmatrix} = \begin{bmatrix} 0 & 1 \end{bmatrix},$$
$$\bar P(3) = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1\\ 0.2 & 0.8 \end{bmatrix}^3 = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 0.781 & 0.219\\ 0.438 & 0.562 \end{bmatrix} = \begin{bmatrix} 0.438 & 0.562 \end{bmatrix}.$$
Hence P0(3) = 0.438.
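This matrix computation is easy to reproduce numerically; a minimal sketch (not in the original notes) using numpy:

```python
import numpy as np

# Check Pbar(3) = Pbar(0) @ P^3 for the Gilbert-Elliot example (alpha=0.1, beta=0.2).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p0 = np.array([0.0, 1.0])                    # start in the good state (state 1)

print(np.linalg.matrix_power(P, 3))          # [[0.781 0.219], [0.438 0.562]]
print(p0 @ np.linalg.matrix_power(P, 3))     # [0.438 0.562]
```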
Notice that for this example, with n = 3, computing P³ can be done using a simple calculator and is not complicated. However, as n increases, it becomes difficult to carry out the matrix multiplications. Therefore, we can use the eigenvalue decomposition of the matrix to compute Pⁿ as follows:
$$\mathbf P = U\Lambda U^{-1}, \qquad \mathbf P^2 = U\Lambda U^{-1} U\Lambda U^{-1} = U\Lambda^2 U^{-1}, \;\;\dots,\;\; \mathbf P^n = U\Lambda^n U^{-1},$$
where Λ is the diagonal matrix of the eigenvalues and U is the matrix of eigenvectors. Recall that
$$\Lambda^n = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n \end{bmatrix}^n = \begin{bmatrix} \lambda_1^n & 0 & \cdots & 0\\ 0 & \lambda_2^n & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n^n \end{bmatrix}.$$
Finding the eigenvalues of P of the previous example for all α and β:
$$\det(\mathbf P - \lambda I) = \det\begin{bmatrix} 1-\alpha-\lambda & \alpha\\ \beta & 1-\beta-\lambda \end{bmatrix} = (1-\alpha-\lambda)(1-\beta-\lambda) - \alpha\beta = (\lambda - 1)(\lambda + \alpha + \beta - 1).$$
Hence we get two eigenvalues, λ1 = 1 and λ2 = 1 − α − β. Their corresponding eigenvectors and the matrix U are the following:
$$\Phi_1 = \begin{bmatrix} 1\\ 1 \end{bmatrix}, \qquad \Phi_2 = \begin{bmatrix} \alpha\\ -\beta \end{bmatrix}, \qquad U = \begin{bmatrix} 1 & \alpha\\ 1 & -\beta \end{bmatrix}.$$
Remark 3. Since all the rows of P sum to 1, λ = 1 and Φ = [1 1]ᵀ are always an eigenvalue and an eigenvector of P. In fact, PΦ = λΦ by the definition of eigenvalues and eigenvectors, and by construction of P,
$$\mathbf P \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix} = \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix}.$$
Therefore,
$$U^{-1} = \frac{1}{\alpha+\beta} \begin{bmatrix} \beta & \alpha\\ 1 & -1 \end{bmatrix}.$$
$$\mathbf P^n = \frac{1}{\alpha+\beta} \begin{bmatrix} 1 & \alpha\\ 1 & -\beta \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & (1-\alpha-\beta)^n \end{bmatrix} \begin{bmatrix} \beta & \alpha\\ 1 & -1 \end{bmatrix} = \frac{1}{\alpha+\beta} \begin{bmatrix} \beta + \alpha(1-\alpha-\beta)^n & \alpha - \alpha(1-\alpha-\beta)^n\\ \beta - \beta(1-\alpha-\beta)^n & \alpha + \beta(1-\alpha-\beta)^n \end{bmatrix}.$$
Hence, for any n, α and β,
$$\begin{aligned} \bar P(n) &= \bar P(0)\mathbf P^n = \begin{bmatrix} 0 & 1 \end{bmatrix} \mathbf P^n,\\ &= \frac{1}{\alpha+\beta} \begin{bmatrix} \beta - \beta(1-\alpha-\beta)^n & \alpha + \beta(1-\alpha-\beta)^n \end{bmatrix},\\ &= \begin{bmatrix} \frac{\beta}{\alpha+\beta} - \frac{(1-\alpha-\beta)^n\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} + \frac{(1-\alpha-\beta)^n\beta}{\alpha+\beta} \end{bmatrix},\\ &\xrightarrow{n\to\infty} \begin{bmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{bmatrix} \qquad \text{if } |1-\alpha-\beta| < 1. \end{aligned}$$
In the previous example of the Gilbert-Elliot channel, P̄(n) → [2/3 1/3] as n → ∞ for the given α and β. So initially the Markov chain is biased and has memory, but after waiting long enough, the Markov chain forgets where it started and converges to a stationary distribution. The next lecture focuses on the convergence of Markov chains; the question that will be answered is whether the Markov process always converges, i.e., whether the stationarity properties are always satisfied.
Example 13. Consider the following two DTMCs, where the condition |1 − α − β| < 1 is not verified. These chains always depend on where they start. In the first case, if we start at state 1, then at ∞ state 1 occurs only at even time instances while state 0 occurs only at odd time instances, and vice versa. In the second case, if we start at state 0 we get stuck there, and the same for state 1.
5 Discrete Time Markov Chain
Example 14. Consider the following two-state DTMC (states 0 and 1), with transition probabilities α = 0.1, 1 − α = 0.9, β = 0.1, and 1 − β = 0.9.
Since it is a Markov chain, we know that
P(X8 = 1 | X1 = 0, X2 = 0, X3 = 1, . . . , X7 = 1) = P(X8 = 1 | X7 = 1).
Hence, assume that we start at the good state (1) at time 0. The probability of being at each state at time n is given by
P̄(n) = P̄(0)Pⁿ,
where P̄(n) = [P0(n) P1(n)] is the probability distribution at time n and P is the transition matrix
$$\mathbf P = \begin{bmatrix} 1-\alpha & \alpha\\ \beta & 1-\beta \end{bmatrix}.$$
In addition, P̄(0) = [0 1] because we started at state 1. When n goes to infinity, the Markov chain will converge to a stationary distribution
$$\bar P(n) \xrightarrow{n\to\infty} \begin{bmatrix} \frac{\beta}{\alpha+\beta} & \frac{\alpha}{\alpha+\beta} \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \end{bmatrix},$$
meaning that if we wait long enough, the chain will forget the starting point and be present at any state with equal probability.
Example 15 (Page ranking). A DTMC can be used to model the page-ranking problem. Consider a DTMC in which the states represent webpages and the transitions represent the probability of moving from a given webpage to another (three-page state diagram omitted).
At infinity, if it converges to a stationary distribution, the probabilities will reflect the importance of each page. For instance, if we suppose that this DTMC converges to P̄(n) = [0.2 0.6 0.2], then page 2 is the most important and pages 1 and 3 have the same importance.
Our main question now is: in which cases does P̄(n) converge, i.e., when does Xn converge in distribution? And to what distribution does it converge?
Chapman-Kolmogorov:
$$P_0(n+1) = P_0(n)P_{00} + P_1(n)P_{10}, \qquad P_1(n+1) = P_0(n)P_{01} + P_1(n)P_{11},$$
which is equivalent to
$$\bar P(n+1) = \bar P(n)\mathbf P, \qquad \begin{bmatrix} P_0(n+1) & P_1(n+1) \end{bmatrix} = \begin{bmatrix} P_0(n) & P_1(n) \end{bmatrix} \mathbf P.$$
Hence, if P̄(n) converges to a distribution π̄ = [π0 π1], then π̄ should satisfy the following:
$$\bar\pi = \bar\pi\mathbf P, \qquad \begin{bmatrix} \pi_0 & \pi_1 \end{bmatrix} = \begin{bmatrix} \pi_0 & \pi_1 \end{bmatrix} \begin{bmatrix} 1-\alpha & \alpha\\ \beta & 1-\beta \end{bmatrix},$$
$$\Rightarrow \begin{cases} \pi_0 = \pi_0(1-\alpha) + \beta\pi_1,\\ \pi_1 = \pi_0\alpha + \pi_1(1-\beta), \end{cases} \qquad \Rightarrow\; \alpha\pi_0 = \beta\pi_1.$$
Recall that π̄ is a probability vector, so π0 + π1 = 1. Then
$$\left.\begin{aligned} \alpha\pi_0 &= \beta\pi_1\\ \pi_0 + \pi_1 &= 1 \end{aligned}\right\} \Rightarrow \left(1 + \frac{\beta}{\alpha}\right)\pi_1 = 1,$$
which gives us
$$\pi_1 = \frac{\alpha}{\alpha+\beta}, \qquad \pi_0 = \frac{\beta}{\alpha+\beta}.$$
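A minimal numerical sketch (not from the original notes; it reuses the Gilbert-Elliot values α = 0.1, β = 0.2) solves π̄ = π̄P together with the normalization constraint:

```python
import numpy as np

# Solve pi = pi P with pi0 + pi1 = 1; expected pi = [beta/(a+b), alpha/(a+b)].
alpha, beta = 0.1, 0.2
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Stack the stationarity equations (P - I)^T pi^T = 0 with the normalization
# row, then least-squares solve the overdetermined system.
A = np.vstack([(P - np.eye(2)).T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                                        # [2/3, 1/3]
print([beta/(alpha+beta), alpha/(alpha+beta)])   # closed form
```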
The Markov chain of example 14 converges if |1 − α − β| < 1, i.e., it does not converge in two cases:
1. Case 1: 1 − α − β = 1 ⇒ α = −β ⇒ α = β = 0.
2. Case 2: 1 − α − β = −1 ⇒ α = 2 − β ⇒ α = β = 1.
In order to better understand when the Markov Chain converges we analyze the problematic cases
mentioned above.
Case 1 (α = β = 0): each state has a self-loop with probability 1 (state diagram omitted), so P is the identity matrix:
$$\text{If } X_0 = 0 \Rightarrow \bar P(n) = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}^n = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad \text{If } X_0 = 1 \Rightarrow \bar P(n) = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}^n = \begin{bmatrix} 0 & 1 \end{bmatrix}.$$
This means there is no unique stationary distribution independent of the initial state.
Remark 4. In general, the idea of convergence is to make the Markov Chain forget where it started
from.
Example 16. Consider the following DTMC (state diagram with states 1-4 omitted):
In this example, the Markov chain will converge to state 2 no matter where it started from.
Example 17. Consider the following DTMC (state diagram with states 1-5 omitted):
In this example, the stationary distribution will depend on the starting state. If it starts at 5 it will
remain there, if it starts at 2 it will get stuck there.
Example 18. Consider the following Markov chain (state diagram with states 0-3 omitted).
Question: Does this DTMC converge?
Answer: By observation:
• If the initial state is 3, after some n > 0, eventually it will go to state 0 and then go to the
states 1 and 2 and stay there.
• If the initial state is 0, after some n > 0, eventually it will go to states 1 and 2 and stay there.
• If the initial state is 1 or 2, the chain will stay there and never visit states 0 and 3.
Therefore, no matter where it starts, when n → ∞ the chain will converge to a unique stationary distribution π̄ = [∗ ∗ 0 0], where ∗ is some positive number less than 1.
Example 19. Now consider the following Markov chain (state diagram with states 0-3 omitted). By observation:
• If the initial state is 3, the chain will stay there and never visit states 0, 1 and 2. At infinity the probability distribution is P̄(n) = [0 0 0 1].
• If the initial state is 1 or 2, the chain will never visit states 0 and 3. At infinity the probability distribution is P̄(n) = [0 ∗ ∗ 0].
• If the initial state is 0, after some n > 0 the chain will move to one of the previous cases and stay there: P̄(n) = [0 ∗ ∗ 0].
Hence, it is clear that the final distribution depends on the initial state, i.e., the chain never forgets where it started. Then, we say that this DTMC does not converge to a stationary distribution.
Remark 5. To guarantee convergence of a Markov chain, starting from any state we should be able to go to any state, i.e.,
∀ i, j ∃ n > 0 such that Pij(n) = P(Xn = j | X0 = i) ̸= 0.
Such a Markov chain is called “irreducible”.
Definition 8. State j is accessible from state i if for some n > 0, Pij (n) > 0.
State i and j can communicate if i is accessible from j and j is accessible from i.
Definition 9. A class is a subset of states in which any two states can communicate with each other.
Definition 10. A Markov chain is “irreducible” if it is formed of one class.
Remark 6. If a Markov chain is irreducible, it is guaranteed not to get stuck in one state, which makes convergence possible.
Now we can go back to the second case of non-convergence of a two-state DTMC.
Case 2 (α = β = 1): the two states swap deterministically at every step (state diagram omitted). This is an irreducible Markov chain; however, if X0 = 0, state 0 will occur only at even time instances whereas state 1 occurs at the odd ones, i.e., for any n > 0, P̄(2n) = [1 0] and P̄(2n + 1) = [0 1], and vice versa. Although you won't get stuck at any state, the Markov chain will never forget where it started from. Hence, it never converges.
Example 20. Consider the following DTMC on states 1-4, with transitions 1 → 2 (probability 1), 2 → 1 and 2 → 3 (probability 0.5 each), 3 → 4 (probability 1), and 4 → 1 (probability 1):
Questions:
1. Is this chain irreducible?
2. Does it converge?
Answers:
1. Starting at any state we can reach any other state, hence this is an irreducible DTMC.
2. By observation:
• If X0 = 1, the chain has two possible ways to come back to state 1:
1 → 2 → 1,
1 → 2 → 3 → 4 → 1.
• If X0 = 2, the chain has two possible ways to come back to state 2:
2 → 1 → 2,
2 → 3 → 4 → 1 → 2.
Therefore, it matters from where the chain starts: if the initial state is 1, state 1 occurs only at even time instances, whereas if the initial state is 2, state 1 occurs only at odd time instances. We say that this chain is periodic with period 2. Henceforth, if the Markov chain is periodic then there is a convergence problem.
It turns out that there is another way to check the periodicity of a Markov chain by drawing
its state diagram differently.
We can think of states 1 and 3 as forming one group (group A), and states 2 and 4 as forming
another group (group B). If X0 = 3, P1 (99) = 0 but if X0 = 4, P1 (99) ̸= 0. That is if
the initial state is from group A, for an even time instance only a state from group A can
occur; whereas, for an odd time instance, only a state from group B can occur and vice versa.
Therefore, this is a periodic DTMC having a period of 2 and does not converge to a unique
stationary distribution.
Example 21. Consider the following Markov chains:
Questions:
1. Is DTMC 1 irreducible? DTMC 2?
2. Is DTMC 1 periodic? DTMC 2?
Answers:
1. Both DTMC 1 and 2 are irreducible because starting at any state we can reach any state.
2. Let us first draw the DTMC 1 state diagram by grouping the states that do not directly communicate, to check its periodicity. To check the periodicity, it is enough to start at one state and check all the possible ways to come back to it. The greatest common divisor (GCD) of the times needed to come back to this state is the period of the chain. For this example we will start at state 1.
1 → 2 → 3 → 1 ⇒ 3 time instances,
1 → 2 → 3 → 4 → 2 → 3 → 1 ⇒ 6 time instances.
To get back to state 1, we need a multiple of 3 time instances (gcd (3, 6) = 3). Hence, DTMC
1 is periodic and has a period 3.
For DTMC 2 If we start at state 1:
1 → 2 → 3 → 1 ⇒ 3 time instances,
1 → 2 → 3 → 4 → 3 → 1 ⇒ 5 time instances.
The period of DTMC 2 is equal to gcd (3, 5) = 1. Therefore, DTMC 2 is aperiodic.
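The period can also be computed mechanically as di = gcd{n : Pii(n) > 0}; the sketch below (not from the original notes; the edge sets are read off the return paths listed above, and the scan cap is arbitrary) does this for both chains:

```python
import numpy as np
from math import gcd
from functools import reduce

# Compute the period of a state as gcd{n : P_ii(n) > 0}, scanning n up to a cap.
# States 1-4 are mapped to indices 0-3.
def period(edges, state=0, n_states=4, cap=30):
    A = np.zeros((n_states, n_states), dtype=int)
    for i, j in edges:
        A[i, j] = 1
    returns, M = [], np.eye(n_states, dtype=int)
    for n in range(1, cap + 1):
        M = np.minimum(M @ A, 1)      # 1 iff a walk of exactly n steps exists
        if M[state, state]:
            returns.append(n)
    return reduce(gcd, returns)

dtmc1 = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]   # 1->2, 2->3, 3->1, 3->4, 4->2
dtmc2 = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 2)]   # 1->2, 2->3, 3->1, 3->4, 4->3
print(period(dtmc1), period(dtmc2))                # 3, 1
```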
Remark 7. A state having a self-loop has period 1.
Definition 11. A state i is said to have period di if
di = gcd{n : Pii(n) > 0}.
If di = 1, then the state is aperiodic.
Lemma 3. All states in the same class have the same period.
Remark 8. In an irreducible Markov Chain, formed of 1 class, all the states have the same period.
Hence, it is enough to check the periodicity of one state.
Lemma 4. An irreducible DTMC in which all the states are aperiodic has a unique stationary distribution, independent of the initial condition, i.e.,
$$\lim_{n\to\infty} \bar P(n) = \bar\pi,$$
where π̄ is the solution of π̄ = π̄P.
6 Brief review of DTMC and its convergence
Example 22. Consider the following two-state DTMC (states 0 and 1), with transition probabilities α = 0.1, 1 − α = 0.9, β = 0.2, and 1 − β = 0.8.
At infinity the probability of each state will converge to a constant, independently of where it started, i.e.,
$$\lim_{n\to\infty} P[X_n = 0] = \frac{\beta}{\alpha+\beta} = \frac{2}{3}, \qquad \lim_{n\to\infty} P[X_n = 1] = \frac{\alpha}{\alpha+\beta} = \frac{1}{3}.$$
This means that after waiting a long enough time, 1/3 of the time will be spent in state 1 and 2/3 of the time will be spent in state 0. We say that the state probability vector P̄(n) converges to a stationary distribution vector π̄ = [2/3 1/3], i.e.,
$$\bar P(n) = \begin{bmatrix} P_0(n) & P_1(n) \end{bmatrix} \xrightarrow{d} \bar\pi = \begin{bmatrix} \pi_0 & \pi_1 \end{bmatrix},$$
where
π̄ = π̄P,
and P is the transition matrix. Finally, to find π̄ if it exists, we have to solve:
π0 = 0.9π0 + 0.2π1 ,
π1 = 0.1π0 + 0.8π1 ,
π0 + π1 = 1.
To get the probability of being at a certain state at a specific time instant, we just solve:
$$\bar P(n) = \bar P(0)\mathbf P^n.$$
Now the question that we want to answer is when a DTMC converges. A DTMC converges to a
stationary distribution if:
1. It is irreducible, i.e. avoid having multiple classes in the chain.
2. It is also aperiodic, i.e. avoid having groups in which the states only occur at a fixed multiple
of time instances.
Lemma 5. A sufficient condition for an irreducible DTMC to be aperiodic is that it has one state with a self-loop.
7 Continuous Time Random Process
Definition 12. A continuous time random process X(t), is a random process defined for any t ∈ R.
We can redefine, in continuous time, everything defined in discrete time.
8 Poisson process
In a Poisson process, customers are arriving at a rate of λ customers/second. We define N (t) as
the number of customers that have arrived so far.
In the Poisson random process, we assume that customers do not leave. The distribution of N(t) is given by
$$P(N(t) = k) = \frac{(t\lambda)^k e^{-\lambda t}}{k!}, \qquad \mu_N(t) = E[N(t)] = \lambda t, \qquad V[N(t)] = \lambda t.$$
Property 2 (Independent increment). The Poisson random process is an independent increment
random process, i.e., for t1 < t2 , N (t1 ) and N (t2 ) − N (t1 ) are independent and N (t2 ) − N (t1 ) has
the same distribution as N (t2 − t1 ).
Hence,
$$\begin{aligned} P(N(t_1) = i, N(t_2) = j) &= P(N(t_1) = i)\, P(N(t_2) - N(t_1) = j - i),\\ &= P(N(t_1) = i)\, P(N(t_2 - t_1) = j - i),\\ &= \frac{(t_1\lambda)^i e^{-t_1\lambda}}{i!} \cdot \frac{\left((t_2 - t_1)\lambda\right)^{j-i} e^{-(t_2 - t_1)\lambda}}{(j-i)!}. \end{aligned}$$
8.1 Autocovariance
Assume, without loss of generality, that t1 < t2 .
$$\begin{aligned} K_{NN}(t_1, t_2) &= E[(N(t_1) - \lambda t_1)(N(t_2) - \lambda t_2)],\\ &= E\left[(N(t_1) - \lambda t_1)\left((N(t_2) - N(t_1) - \lambda t_2 + \lambda t_1) + (N(t_1) - \lambda t_1)\right)\right],\\ &= \underbrace{E[N(t_1) - \lambda t_1]\; E\left[(N(t_2) - N(t_1)) - (\lambda t_2 - \lambda t_1)\right]}_{0} + E\left[(N(t_1) - \lambda t_1)^2\right],\\ &= \lambda t_1, \end{aligned}$$
where the first term factors into a product of expectations (and vanishes) by the independent increment property. If t2 ≤ t1 then KNN(t1, t2) = λt2. In general, KNN(t1, t2) = λ min(t1, t2).
Question: Is N (t) WSS?
Answer:
1. E[N(t)] = λt depends on t.
2. KNN(t1, t2) = λ min(t1, t2) does not depend only on t1 − t2.
Hence N(t) is not WSS.
Theorem 1. The first arrival time X1 ∼ exp(λ).
Proof.
$$P(X_1 > t) = P(N(t) = 0) = \frac{(t\lambda)^0 e^{-t\lambda}}{0!} = e^{-t\lambda}.$$
Therefore,
$$F_{X_1}(t) = P(X_1 \le t) = 1 - e^{-t\lambda}.$$
This is the CDF of an exp(λ) random variable.
Theorem 2. All the inter-arrival times Xi, i = 1, 2, . . . , are i.i.d. with distribution exp(λ).
Proof.
$$\begin{aligned} P(X_i > t_i \mid X_1 = t_1, \dots, X_{i-1} = t_{i-1}) &= P\left(N(t_1 + \cdots + t_{i-1} + t_i) - N(t_1 + \cdots + t_{i-1}) = 0\right),\\ &= P(N(t_i) = 0) = e^{-t_i\lambda} = P(X_i > t_i). \end{aligned}$$
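Theorems 1 and 2 suggest a direct way to simulate the Poisson process: generate i.i.d. exp(λ) inter-arrival times and count arrivals. The sketch below (not from the original notes; λ, t, and the event cap are arbitrary) checks E[N(t)] = V[N(t)] = λt:

```python
import numpy as np

# Simulate the Poisson process via i.i.d. exp(lam) inter-arrival times.
# numpy's exponential() takes the scale parameter 1/lam.
rng = np.random.default_rng(5)
lam, t, trials, max_events = 2.0, 3.0, 100_000, 60   # 60 >> lam*t, safe cap

inter = rng.exponential(1.0 / lam, size=(trials, max_events))
arrival_times = inter.cumsum(axis=1)
N_t = (arrival_times <= t).sum(axis=1)               # arrivals up to time t

print(N_t.mean(), N_t.var(), lam * t)                # all ~ 6.0
```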
9 Continuous Gaussian Random Process
Definition 13. X(t) is a Gaussian random process if X(t1), X(t2), . . . , X(tk) are jointly Gaussian for any k, i.e.,
$$f_{X(t_1), X(t_2), \dots, X(t_k)}(\bar x) = \frac{1}{(2\pi)^{k/2}\, |K_{XX}|^{1/2}} \exp\left(-\frac{1}{2}\left(\bar x - \bar\mu_X\right)^T K_{XX}^{-1} \left(\bar x - \bar\mu_X\right)\right).$$
Example 23. Let X(t) be a Gaussian random process with µX(t) = 3t and KXX(t1, t2) = 9e^{−2|t1−t2|}.
Question: Find the pdf of X(3) and Y = X(1) + X(2).
Answer: We know that X(3) is a Gaussian r.v. (by definition of X(t)) and Y is also a Gaussian
r.v. being a linear combination of two Gaussian r.v. Therefore, it is enough to find the mean and
the variance of those variables in order to find their PDFs.
1. X(3): µX(3) = 3 · 3 = 9 and V[X(3)] = KXX(3, 3) = 9e^{−2|3−3|} = 9. Thus,
$$f_{X(3)}(x) = \frac{1}{3\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{(x-9)^2}{9}}.$$
2. Y:
$$\begin{aligned} E[Y] &= E[X(1)] + E[X(2)] = 3 + 6 = 9,\\ \mathrm{cov}(X(1), X(2)) &= K_{XX}(1, 2) = 9e^{-2|2-1|} = 9e^{-2},\\ V[Y] &= V[X(1)] + V[X(2)] + 2\,\mathrm{cov}(X(1), X(2)) = 9 + 9 + 2\cdot 9e^{-2} = 18 + 18e^{-2}. \end{aligned}$$
Another way to find the variance is the following:
$$\begin{aligned} V[Y] &= E\left[(X(1) + X(2))^2\right] - \left(E[X(1)] + E[X(2)]\right)^2,\\ &= E[X(1)^2] + E[X(2)^2] + 2E[X(1)X(2)] - E^2[X(1)] - E^2[X(2)] - 2E[X(1)]E[X(2)],\\ &= V[X(1)] + V[X(2)] + 2\,\mathrm{cov}(X(1), X(2)),\\ &= 9 + 9 + 2\cdot 9e^{-2},\\ &= 18 + 18e^{-2}. \end{aligned}$$
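A short numerical sketch (not from the original notes) evaluates these formulas and cross-checks them by sampling (X(1), X(2)) from the bivariate Gaussian implied by µX and KXX:

```python
import numpy as np

# Example 23 check: mu_X(t) = 3t, K_XX(t1, t2) = 9 exp(-2|t1 - t2|), Y = X(1) + X(2).
mu = lambda t: 3.0 * t
K = lambda t1, t2: 9.0 * np.exp(-2.0 * abs(t1 - t2))

mean_Y = mu(1) + mu(2)
var_Y = K(1, 1) + K(2, 2) + 2 * K(1, 2)
print(mean_Y, var_Y, 18 + 18 * np.exp(-2))     # 9.0 and ~20.436 twice

# Cross-check by sampling from the implied bivariate Gaussian.
rng = np.random.default_rng(6)
cov = np.array([[K(1, 1), K(1, 2)], [K(1, 2), K(2, 2)]])
samples = rng.multivariate_normal([mu(1), mu(2)], cov, size=200_000)
Y = samples.sum(axis=1)
print(Y.mean(), Y.var())                       # ~9.0, ~20.436
```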
Figure 5
Figure 6: State diagram of an infinite DTMC
Figure 7: State diagram of a two-state DTMC
Figure 8: Binary symmetric channel
Figure 9: DTMC 1
Figure 10: DTMC 2
Figure 11: Another way to draw the same DTMC state diagram.
Figure 12: DTMC 1
Figure 13: DTMC 2
Figure 14: Another way to draw DTMC 1 state diagram.
Figure 15: DTMC 2
Figure 16: A realization of the Poisson random process. The Xi represent the times between customer arrivals.