
Probability and Random Processes
Lecture 12
• Detection
• Estimation
• Capacity
• Information
Detection
• A random process on (E^T, E^T) with two possible distributions µ0 and µ1
• Assume that µ0 ≪ µ1 and µ1 ≪ µ0 (the distributions are equivalent)
• Observe f ∈ E^T and based on the observation decide H0 : µ0 or H1 : µ1
  ⇐⇒ design a measurable mapping g : E^T → {0, 1}
Criteria
• Classical: minimize
P (g(f ) = 0|H1 )
subject to P (g(f ) = 1|H0 ) ≤ α
• Bayesian: minimize
Pe = P (g(f ) = 0|H1 )P (H1 ) + P (g(f ) = 1|H0 )P (H0 )
Bayesian detection
• Let G1 = g^{-1}({1}), G0 = g^{-1}({0}); assuming G1 ∪ G0 = E^T and G1 ∩ G0 = ∅, then

    Pe = P(H1) ∫_{G0} dµ1 + P(H0) ∫_{G1} dµ0

       = P(H1) ∫_{G0} [ (dµ1/dµ0)(f) − P(H0)/P(H1) ] dµ0 + P(H0)
• Hence, we should set

    G0 = { f : (dµ1/dµ0)(f) < P(H0)/P(H1) }

    G1 = { f : (dµ1/dµ0)(f) > P(H0)/P(H1) }
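• A minimal numerical sketch of this threshold rule (not from the lecture; it assumes a single scalar observation with Gaussian hypotheses N(0, 1) under H0 and N(1, 1) under H1, and example priors):

```python
import numpy as np
from scipy.stats import norm

# Bayesian likelihood-ratio test for one scalar sample (illustrative setup).
p0, p1 = 0.6, 0.4                                    # assumed priors P(H0), P(H1)
f = 0.9                                              # the observation

lam = norm.pdf(f, loc=1.0) / norm.pdf(f, loc=0.0)    # likelihood ratio dmu1/dmu0 at f
decision = 1 if lam > p0 / p1 else 0                 # decide H1 iff lambda exceeds P(H0)/P(H1)
print(lam, decision)
```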
• Compare the likelihood ratio

    λ(f) = (dµ1/dµ0)(f)

  to a threshold
• Classical ⇒ Neyman–Pearson: also based on comparing λ to
a threshold
• Given f ∈ E T , how do we compute λ(f )?
Grenander’s theorem
• Look at (E, E) = (R, B) and T = R+
• X : (Ω, A, P) → (R^T, B^T, µ) is separable if there are a set N ⊂ Ω with P(N) = 0 and a sequence {tk} ⊂ T such that for any open interval I and closed set C

    {ω : πtk(X(ω)) ∈ C, tk ∈ I} \ {ω : πt(X(ω)) ∈ C, t ∈ I} ⊂ N

• i.e., f ∈ R^T can be sampled without loss at the tk's
• For any X : (Ω, A, P) → (R^T, B^T, µ) there is an X̃ : (Ω, A, P) → (R^T, B^T, µ̃) such that X̃ is separable and

    P({ω : πt(X) = πt(X̃)}) = 1 for each t ∈ T
• Consider the detection problem, and assume that µ0 and µ1, when restricted to {tk}_1^n for any finite n, are both absolutely continuous with respect to Lebesgue measure on R^n
• Observe f(t), sample as fk = f(tk) (with {tk} as in the definition of separability), let f^n = (f1, . . . , fn) and denote the densities g0(f^n) and g1(f^n) (corresponding to µ0 and µ1)
• Then the quantity

    gn = g1(f^n) / g0(f^n)

  converges with probability one to λ(f), under both H0 and H1
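• A simulation sketch of the sampled likelihood ratio gn (assumptions not in the slides: covariance V(s, u) = exp(−|s − u|) on (0, 1], mean 0 under H0 and m(t) = t under H1, nested dyadic sampling grids):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
t_fine = np.arange(1, 257) / 256.0
V_fine = np.exp(-np.abs(t_fine[:, None] - t_fine[None, :]))   # covariance kernel on the grid
f_fine = rng.multivariate_normal(t_fine, V_fine)              # one path drawn under H1

for n in (4, 16, 64, 256):
    idx = np.arange(256 // n - 1, 256, 256 // n)              # nested sub-grid with n points
    t, f = t_fine[idx], f_fine[idx]
    V = np.exp(-np.abs(t[:, None] - t[None, :]))
    log_g1 = multivariate_normal(mean=t, cov=V).logpdf(f)     # sampled density under H1
    log_g0 = multivariate_normal(mean=np.zeros(n), cov=V).logpdf(f)  # sampled density under H0
    print(n, log_g1 - log_g0)                                 # log gn; should stabilize as n grows
```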
Gaussian waveforms
• Consider the continuous-time Gaussian example with two
possible mean-value functions, that is, f (t) is Gaussian with
E[f (t)] = mi (t) under Hi , and has a positive-definite
covariance kernel V (s, u) (under both H0 and H1 )
• Without loss we can assume m0 (t) = 0 and m1 (t) = m(t)
• Assume that m(t) can be expressed as

    m(t) = ∫ V(t, s) h(s) ds

  for some h(t); then

    ln λ(f) = ∫ f(t) h(t) dt − (1/2) ∫ m(t) h(t) dt

  (with probability one under H0 and H1)
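• A discretized sketch of this formula (assumptions not in the slides: grid on [0, 1], V(s, u) = exp(−|s − u|), m(t) = sin(πt); the integral equation for h is solved on the grid):

```python
import numpy as np

n = 400
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]
V = np.exp(-np.abs(t[:, None] - t[None, :]))     # covariance kernel V(s, u) on the grid
m = np.sin(np.pi * t)                            # mean function under H1

h = np.linalg.solve(V * dt, m)                   # m(t) = int V(t, s) h(s) ds, discretized

rng = np.random.default_rng(1)
L = np.linalg.cholesky(V + 1e-10 * np.eye(n))    # small jitter for numerical stability
f = m + L @ rng.standard_normal(n)               # one observed path, here drawn under H1

log_lambda = np.sum(f * h) * dt - 0.5 * np.sum(m * h) * dt
print(log_lambda)                                # ln lambda(f); typically positive under H1
```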
Estimation
Bayesian
• Two random objects X and Y on (Ω, A, P ) with range spaces
(R, B) and (E, E) (standard)
• Estimate X ∈ R from observing Y = y ∈ E; MMSE ⇒
X̂(y) = E[X|Y = y]
• That is,

    X̂(y) = ∫ x dµy

  where µy is the regular conditional distribution for X given Y = y
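• A numerical sketch of X̂(y) = ∫ x dµy (assumptions not in the slides: X ~ N(0, 1) and Y = X + W with W ~ N(0, 1) independent, so µy = N(y/2, 1/2) and the MMSE estimate is y/2):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 4001)               # uniform grid for the integral

def mmse(y):
    w = np.exp(-x**2 / 2) * np.exp(-(y - x)**2 / 2)   # unnormalized density of mu_y on the grid
    return np.sum(x * w) / np.sum(w)                  # int x dmu_y

print(mmse(1.4), 1.4 / 2)                        # numerical estimate vs. the closed form y/2
```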
Classical
• For an absolutely continuous random variable X with pdf fα(x), depending on an unknown parameter α: given the observation X = x we have the traditional ML estimate

    α̂ = arg max_α fα(x)
• A pdf f(x) is the Radon–Nikodym derivative of the distribution µ on (R, B) w.r.t. Lebesgue measure λ; that is, it can be interpreted as the likelihood ratio between the hypotheses H0 : µ = λ and H1 : µ = µX (the correct distribution)
• In the case of a general random object X : (Ω, A, P ) →
(E, E, µ), we can choose a “dummy hypothesis” H0 : µ = µ0
as a reference to H1 : µ = µα , where µα is the correct
distribution, with an unknown parameter α ∈ R
• Then, based on the observation X = x, the ML estimate can be computed as

    α̂ = arg max_α (dµα/dµ0)(x)
• The reference distribution can be chosen e.g. such that
computing the likelihood ratio is feasible
• Note that in general, the estimate “α̂ = arg max_α µα” does not make sense; µα is a mapping from sets in E to [0, 1], not a function of the observation x
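• A sketch of the estimate α̂ = arg max_α (dµα/dµ0)(x) (assumptions not in the slides: µα = N(α, 1) with dummy reference µ0 = N(0, 1), so the ratio is exp(αx − α²/2) and the maximizer is α̂ = x):

```python
import numpy as np

x = 0.7                                          # the observation
alphas = np.linspace(-5.0, 5.0, 10001)           # grid over the unknown parameter
ratio = np.exp(alphas * x - alphas**2 / 2)       # d(mu_alpha)/d(mu_0) evaluated at x
print(alphas[np.argmax(ratio)])                  # numerically close to x
```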
Channels
• Given two measurable spaces (Ω, A) and (Γ, S), a mapping g : Ω × S → R+ is called a transition kernel (from Ω to Γ) if
  1. f(ω) = g(ω, S) is measurable for any fixed S ∈ S
  2. h(S) = g(ω, S) is a measure on (Γ, S) for any fixed ω ∈ Ω
  If the measure h(S) in 2. is a probability measure, then g is called a stochastic kernel
• If Y is a random object on (Ω, A, P) with values in (E, E), then a stochastic kernel g from Ω to E is the regular conditional distribution of Y given a sub-σ-algebra G ⊂ A if

    g(ω, F) = P({Y ∈ F}|G)(ω)

  for P-almost every ω, and for all F ∈ E
• A regular conditional distribution for Y exists if (E, E) is
standard
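• A finite sketch of a stochastic kernel (assumptions not in the slides: Ω = {0, 1, 2}, Γ = {0, 1}, with g represented by a row-stochastic matrix K):

```python
import numpy as np

K = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])                        # K[omega, gamma] = g(omega, {gamma})

def g(omega, S):
    return K[omega, list(S)].sum()                # g(omega, S) for a subset S of Gamma

print(g(1, {0, 1}))                               # property 2: a probability measure for fixed omega
print([g(w, {1}) for w in range(3)])              # property 1: a function of omega for fixed S
```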
• The factorization lemma: Assume two measurable spaces
(Ω1 , A1 ) and (Ω2 , A2 ) and a measurable mapping
u : Ω1 → Ω2 are given. A function v : (Ω1 , A1 ) → (R, B) is
measurable w.r.t σ(u) ⊂ A1 iff there is a measurable mapping
φ : (Ω2 , A2 ) → (R, B) such that v = φ ◦ u
• Let X be an arbitrary random object on (Ω, A, P ), and let Y
be as before, with (E, E) standard. Let
g(ω, F ) = P ({Y ∈ F }|σ(X))(ω)
and let φF be the mapping in the factorization lemma
g(ω, F ) = φF (X(ω)) (w.r.t. ω for a fixed F ), then
P ({Y ∈ F }|X = x) = φF (x)
is the conditional distribution of Y given X = x
• Assume (Ω, A, P ), a parameter set T and a process
X : (Ω, A, P ) → (E T , E T , µX ) are given
• Given another function/sequence space (F^T, F^T), a channel is a regular conditional distribution for the output given a specific input value X = x ∈ E^T; that is, for any x ∈ E^T the distribution

    µx(F) = P({Y ∈ F}|X = x), F ∈ F^T
• Interpretation: A random channel input X is generated and is
then transmitted over the channel, resulting in the channel
output Y ; given X = x the distribution for Y is µx
• A channel exists if the relevant spaces are standard
• Given (E T , E T ) and (F T , F T ), let E T × F T be the product
σ-algebra on E T × F T
• Let µ̃ be defined by

    µ̃(A × B) = ∫_A P(B|X = x) dµX(x)

  on rectangles, A ∈ E^T, B ∈ F^T
• Standard spaces ⇒ unique extension of µ̃ from rectangles to
E T × F T ; a joint distribution µ on (E T × F T , E T × F T )
• Also define the corresponding product distribution π, generated as the extension of µX(A)µY(B), with

    µY(B) = ∫_{E^T} P(B|X = x) dµX(x)
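• A finite sketch of the joint and product distributions (assumptions not in the slides: binary input and output, with marginal µX and channel matrix K chosen for illustration):

```python
import numpy as np

mu_X = np.array([0.3, 0.7])                      # input marginal mu_X
K = np.array([[0.8, 0.2],
              [0.1, 0.9]])                        # K[x, y] = P(Y = y | X = x)

mu = mu_X[:, None] * K                            # joint: mu(A x B) = sum over A of P(B|x) mu_X(x)
mu_Y = mu_X @ K                                   # output marginal mu_Y(B)
pi = np.outer(mu_X, mu_Y)                         # product distribution mu_X x mu_Y
print(mu, mu_Y, pi, sep="\n")
```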
Channel Capacity
• Focus on T = N+ and E = F = R; a channel µx (·) with
input x ∈ RT and output y ∈ RT (sequences), resulting in the
joint distribution µ
• A rate R [bits per channel use] is achievable if, for any ε > 0, information can be transmitted at rate R with error probability below ε
• The capacity C of the channel = sup{R : R is achievable }
• Let Sn = {1, 2, . . . , n}, define the information density

    i(x, y) = log (dµ/dπ)(x, y)

  (assuming µ ≪ π), and the corresponding restricted version

    i_n(x^n, y^n) = log (dµ|Sn / dπ|Sn)(x^n, y^n)

• Let

    γ(µX) = sup{ α : lim_{n→∞} P( n^{-1} i_n ≤ α ) = 0 }
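• A Monte Carlo sketch of the normalized information density (assumptions not in the slides: a memoryless binary symmetric channel with crossover p and i.i.d. Bernoulli(1/2) input, where n^{-1} i_n concentrates around I(X; Y) = 1 − h(p), so γ(µX) = I(X; Y) for this input):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 0.1, 2000, 5
for _ in range(trials):
    x = rng.integers(0, 2, n)                     # i.i.d. uniform input sequence
    y = x ^ (rng.random(n) < p)                   # BSC output
    cond = np.where(x == y, 1 - p, p)             # P(y_k | x_k)
    i_n = np.sum(np.log2(cond / 0.5))             # P(y_k) = 1/2 for this input
    print(i_n / n)                                # close to 1 - h(0.1) ~ 0.531 bits
```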
• A general formula for channel capacity [Verdú–Han, ’94]:

    C = sup_{µX} γ(µX)
• Computing C involves the problem of characterizing the limit
γ (for each fixed µX ) ⇒ ergodic theory
Information Measures
• Given (Ω, A), a measurable partition of Ω is a finite collection G1, . . . , Gn, Gi ∈ A, such that Gk ∩ Gl = ∅ for k ≠ l and ∪i Gi = Ω
• Given two probability measures P and Q on (Ω, A) and a measurable partition G = {Gi}_{i=1}^n of Ω, define

    D*(P‖Q)(G) = Σ_{G∈G} P(G) log ( P(G) / Q(G) )

• Then, the relative entropy between P and Q is defined as

    D(P‖Q) = sup_G D*(P‖Q)(G)
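• A numerical sketch of the supremum over partitions (assumptions not in the slides: P = N(0, 1), Q = N(1, 1), partitions built from n cells of equal P-probability; the closed form is D(P‖Q) = 1/2 nat):

```python
import numpy as np
from scipy.stats import norm

def D_star(n):
    edges = norm.ppf(np.linspace(0.0, 1.0, n + 1))           # cell boundaries (include +-inf)
    Pc = np.diff(norm.cdf(edges))                            # P-probability of each cell
    Qc = np.diff(norm.cdf(edges, loc=1.0))                   # Q-probability of each cell
    m = Pc > 0
    return np.sum(Pc[m] * np.log(Pc[m] / Qc[m]))             # D*(P||Q)(G), in nats

for n in (2, 8, 32, 128, 512):
    print(n, D_star(n))                                      # approaches 0.5 as the partition is refined
```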
• If P ≪ Q, then we get

    D(P‖Q) = ∫ log (dP/dQ)(ω) dP(ω)
• Let X and Y be two random objects on (Ω, A, P ) with range
spaces (Γ, S) and (Λ, U), and a joint distribution µXY on
(Γ × Λ, S × U) corresponding to the marginal distributions
µX and µY
• Let πXY be the corresponding product distribution
• The mutual information between X and Y is then defined as

    I(X; Y) = sup_F D*(µXY‖πXY)(F)

  over measurable partitions F of Γ × Λ
• The entropy of the single variable X is defined as
H(X) = I(X; X)
• If µXY ≪ πXY, then

    I(X; Y) = ∫ log (dµXY/dπXY) dµXY
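• A finite sketch of this integral (assumptions not in the slides: a discrete joint distribution given as a matrix, so the integral reduces to a sum):

```python
import numpy as np

mu = np.array([[0.24, 0.06],
               [0.07, 0.63]])                    # joint distribution mu_XY[x, y]
pi = np.outer(mu.sum(axis=1), mu.sum(axis=0))    # product of the marginals pi_XY
print(np.sum(mu * np.log2(mu / pi)))             # I(X; Y) in bits
```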
• Returning to the setup of transmission over a channel (with the previous notation), if µ ≪ π we had

    i_n(x^n, y^n) = log (dµ|Sn / dπ|Sn)(x^n, y^n)
• If for any fixed stationary and ergodic input, with distribution
µX , the channel is such that the joint input–output process on
(RT × RT , B T × B T ) is stationary and ergodic, and in addition
satisfies the finite-gap information property (below), then
    (1/n) i_n → lim_{n→∞} (1/n) I(X^n; Y^n)

  with probability one
• finite-gap information: for any n > 0 there is a k ≥ n such that I(X_k; X^n | X^k) and I(Y_k; Y^n | Y^k) are both finite
• Letting

    i∞(µX) = lim_{n→∞} (1/n) I(X^n; Y^n)

  for any fixed µX, we get

    C = sup_{µX} i∞(µX)
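• A brute-force sketch of the supremum over input distributions (assumptions not in the slides: a memoryless binary symmetric channel with crossover p and Bernoulli(q) inputs, so i∞(µX) = I(X; Y) and the supremum is C = 1 − h(p)):

```python
import numpy as np

p = 0.1

def mi(q):                                       # I(X; Y) = H(Y) - H(Y|X) for a BSC
    py1 = q * (1 - p) + (1 - q) * p              # P(Y = 1)
    hy = -sum(t * np.log2(t) for t in (py1, 1 - py1) if t > 0)
    hyx = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return hy - hyx

qs = np.linspace(0.0, 1.0, 1001)
print(max(mi(q) for q in qs))                    # ~ 0.531 bits, achieved at q = 1/2
```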
• Channels that result in this formula for C have been called
information stable
• To prove this, one first needs to see that γ = i∞ for any fixed µX
such that the input, output and joint input–output are stationary
and ergodic. Then one needs to show that the supremum is
achieved in this class.