Discrete Random Signals

• Signal:
  – deterministic: takes a unique value
  – random: takes multiple possible values and is described by a probability density function
• Finite-energy or periodic signal → treated as deterministic
• Infinite-energy / nonperiodic signal → treated as a random signal, described by its average / statistical properties.
Example: flipping a coin repeatedly.
  Head (probability p)     → X_n = 1
  Tail (probability 1 − p) → X_n = −1
This defines a random variable X_n at each time index n.

A given sequence {X_n, −∞ < n < ∞} is called "a realization of the random process, or a sample sequence of the random process."

A random process is an indexed family of random variables {X_n}. The family of random variables is characterized by a set of probability distribution functions that in general may be functions of the index n.
Probability distribution function:
  P_{X_n}(x_n, n) = Prob[X_n ≤ x_n]

Probability density function:
  p_{X_n}(x_n, n) = ∂P_{X_n}(x_n, n) / ∂x_n

so that
  P_{X_n}(x_n, n) = ∫_{−∞}^{x_n} p_{X_n}(x, n) dx

Joint probability distribution function:
  P_{X_n, X_m}(x_n, n, x_m, m) = Prob[X_n ≤ x_n, X_m ≤ x_m]
                               = ∫_{−∞}^{x_n} ∫_{−∞}^{x_m} p_{X_n, X_m}(x_n, n, x_m, m) dx_n dx_m

Joint probability density function:
  p_{X_n, X_m}(x_n, n, x_m, m) = ∂² P_{X_n, X_m}(x_n, n, x_m, m) / ∂x_n ∂x_m
If p_{X_n, X_m} = p_{X_n} · p_{X_m}  ⇒  X_n and X_m are statistically independent.

If p_{X_{n+k}, X_{m+k}} = p_{X_n, X_m}  ⇒  the process is stationary (in time):
the probability functions are independent of a shift of the time origin.
• Recall:
  mean (ensemble average): m_{X_n} = E[X_n] = ∫_{−∞}^{∞} x p_{X_n}(x, n) dx

If g(·) is a single-valued function, then g(X_n) is also a random variable, and the set of random variables {g(X_n)} defines a new random process, with

  E[g(X_n)] = ∫_{−∞}^{∞} g(x) p_{X_n}(x, n) dx

If the random variables are quantized (discrete-valued),

  E[g(X_n)] = Σ_x g(x) p_{X_n}(x, n)

Expectation is linear:
  E[X_n + Y_m] = E[X_n] + E[Y_m]
  E[a X_n] = a E[X_n]
In general,
  E[X_n · Y_m] ≠ E[X_n] · E[Y_m].
If E[X_n · Y_m] = E[X_n] · E[Y_m], then X_n and Y_m are said to be linearly independent (uncorrelated).
Statistical independence ⇒ linear independence, but not conversely.

For a complex random variable X_n + iY_m:
  E[X_n + iY_m] = E[X_n] + iE[Y_m]
• Mean square (average power):
  E[X_n²] = ∫_{−∞}^{∞} x² p_{X_n}(x, n) dx

• Variance:
  σ_x²(n) = E[(X_n − m_{x_n})²] = E[X_n²] − m_{x_n}²
          = mean square − (mean)²
  (for a nonstationary process the variance σ_x²(n) depends on n)
• Autocorrelation sequence:
  a measure of the dependence between values of the random process at different times;
  it describes the time variation of a random signal.

  φ_xx(n, m) = E[X_n X_m*] = ∫∫_{−∞}^{∞} x_n x_m* p_{X_n, X_m}(x_n, n, x_m, m) dx_n dx_m

• Autocovariance:
  r_xx(n, m) = E[(X_n − m_{x_n})(X_m − m_{x_m})*]
             = φ_xx(n, m) − m_{x_n} m_{x_m}*
• Cross-correlation:
  φ_xy(n, m) = E[X_n Y_m*] = ∫∫_{−∞}^{∞} x y* p_{X_n, Y_m}(x, n, y, m) dx dy

• Cross-covariance:
  r_xy(n, m) = E[(X_n − m_{x_n})(Y_m − m_{y_m})*]
             = φ_xy(n, m) − m_{x_n} m_{y_m}*

For a nonstationary process, φ_xx, r_xx, φ_xy, r_xy are two-dimensional sequences (functions of both n and m).
• A stationary random process is one for which the first-order averages, such as the mean E[X_n] and the variance E[(X_n − m_{x_n})²], evaluated at a single time n, are independent of time; furthermore, the second-order averages, such as the autocorrelation φ_xx(n, m) = E[X_n X_m*] evaluated at two times n and m, depend only on the time difference m − n.

For a stationary process:
  m_x = E[X_n]
  σ_x² = E[(X_n − m_x)²]
  φ_xx(n, n + m) = φ_xx(m) = E[X_n X_{n+m}*]   : a one-dimensional sequence
• If only the above three conditions hold, the random process is called "stationary in the wide sense" (wide-sense stationary).
• Such a process need not be truly stationary (strict-sense stationary); i.e., its probability distributions need not be time invariant:
  P_{X_{n+k}, X_{m+k}}(x_{n+k}, n + k, x_{m+k}, m + k) ≠ P_{X_n, X_m}(x_n, n, x_m, m)  in general.
• Ex: Bernoulli random process:
  p_{X_n}(x_n, n) = { p,      x_n = 1
                    { 1 − p,  x_n = −1
                    { 0,      otherwise

  m_x = (1)·p + (−1)(1 − p) = 2p − 1
  E[X_n²] = (1)²·p + (−1)²·(1 − p) = 1
  σ_x² = E[X_n²] − m_x² = 1 − (2p − 1)² = 4p(1 − p)
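A quick numerical check of these formulas (a minimal sketch; the sample size and the value p = 0.3 are illustrative choices, not from the original slides):

import numpy as np

# Bernoulli random process: X_n = +1 with prob. p, -1 with prob. 1-p
p = 0.3                       # illustrative value
rng = np.random.default_rng(0)
x = np.where(rng.random(100_000) < p, 1.0, -1.0)   # one long realization

print("sample mean       ", x.mean(),       "  theory:", 2*p - 1)
print("sample mean-square", (x**2).mean(),  "  theory:", 1.0)
print("sample variance   ", x.var(),        "  theory:", 4*p*(1 - p))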
• If we further assume the process is statistically independent and stationary:
  for m ≠ 0:  φ_xx(m) = E[X_n X_{n+m}*] = E[X_n] · E[X_{n+m}*] = m_x²
  for m = 0:  φ_xx(0) = E[X_n X_n*] = E[X_n²] = 1

  ⇒ φ_xx(m) = { 1,     m = 0
              { m_x²,  m ≠ 0

• If p = 1/2, then m_x = 2p − 1 = 0, so
  φ_xx(m) = { 1, m = 0
            { 0, m ≠ 0 }  = δ(m)

We call such a random process a "white noise process," because its Fourier transform is unity at all frequencies:
  F[δ(n)] = 1,   F[1] = δ(f) = 2πδ(ω)
Time averages:

  <X_n> = lim_{N→∞} 1/(2N+1) Σ_{n=−N}^{N} X_n                      : time average
  <X_n X_{n+m}*> = lim_{N→∞} 1/(2N+1) Σ_{n=−N}^{N} X_n X_{n+m}*    : time autocorrelation

The limits exist if {X_n} is a stationary process with finite mean.

Ergodic random process:
the time averages equal the ensemble averages, i.e.,
  <X_n> = E[X_n] = m_{x_n} = m_x
  <X_n X_{n+m}*> = E[X_n X_{n+m}*] = φ_xx(m)

For an ergodic process, m_x, φ_xx(m), etc., can be computed from a single finite segment of an infinite-energy sequence, i.e., from sample data x(n) of X_n we form the estimates

  <x(n)>_N = 1/(2N+1) Σ_{n=−N}^{N} x(n)
  <x(n) x*(n + m)>_N = 1/(2N+1) Σ_{n=−N}^{N} x(n) x*(n + m)

These estimates are useful for practical power-spectrum estimation, as sketched below.
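The finite-segment estimates above translate directly into code. A minimal sketch (the record length and the fair-coin process are illustrative; the estimator divides by the full record length, as in the slide formula):

import numpy as np

rng = np.random.default_rng(1)
N = 5000
x = rng.choice([1.0, -1.0], size=2*N + 1)     # fair coin flips, m_x = 0

# time-average estimate of the mean
m_hat = x.sum() / (2*N + 1)

# time-average estimate of phi_xx(m) = <x(n) x*(n+m)>
def phi_hat(x, m):
    # (1/L) * sum_n x(n) x(n+m), summing over the n for which both samples exist
    L = len(x)
    return np.dot(x[:L - m], x[m:]) / L

print("m_hat =", m_hat)
print("phi_hat(0..4) =", [round(phi_hat(x, m), 3) for m in range(5)])
# phi_hat(0) is close to 1 and phi_hat(m != 0) close to 0: a white sequence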
Although the Z-transform of an infinite-energy signal does not exist, the autocovariance and autocorrelation sequences of such a signal are aperiodic sequences for which the Z-transform and Fourier transform often do exist.

Properties of correlation and covariance sequences of random signals:
For two real, stationary random processes {X_n} and {Y_n},
  φ_xx(m) = E[X_n X_{n+m}],  r_xx(m) = E[(X_n − m_x)(X_{n+m} − m_x)]
  φ_xy(m) = E[X_n Y_{n+m}],  r_xy(m) = E[(X_n − m_x)(Y_{n+m} − m_y)]

Property 1:
  r_xx(m) = φ_xx(m) − m_x²
  r_xy(m) = φ_xy(m) − m_x m_y
If the means are zero, correlation ≡ covariance
(if m_x = 0, then φ_xx(m) = r_xx(m); if also m_y = 0, then φ_xy(m) = r_xy(m)).

Property 2:
  φ_xx(0) = E[X_n²] : mean-square value
  r_xx(0) = σ_x²    : variance

Property 3:
  φ_xx(m) = φ_xx(−m);  r_xx(m) = r_xx(−m)
  φ_xy(m) = φ_yx(−m);  r_xy(m) = r_yx(−m)

Property 4:
  |φ_xy(m)| ≤ [φ_xx(0) φ_yy(0)]^(1/2)
  |r_xy(m)| ≤ [r_xx(0) r_yy(0)]^(1/2)
  |φ_xx(m)| ≤ φ_xx(0)
  |r_xx(m)| ≤ r_xx(0)
Property 5: If Y_n = X_{n−n_0} (a time shift of a stationary process), then
  φ_yy(m) = φ_xx(m)
  r_yy(m) = r_xx(m)

Property 6: For many random processes, the random variables become less correlated as they become more separated in time. Thus
  lim_{m→∞} φ_xx(m) = E[X_n X_{n+m}] = E[X_n] · E[X_{n+m}] = m_x²
  lim_{m→∞} r_xx(m) = E[(X_n − m_x)(X_{n+m} − m_x)] = 0
  lim_{m→∞} φ_xy(m) = E[X_n Y_{n+m}] = E[X_n] · E[Y_{n+m}] = m_x m_y
  lim_{m→∞} r_xy(m) = E[(X_n − m_x)(Y_{n+m} − m_y)] = 0
⇒ The correlation and covariance are aperiodic sequences that tend to die out for large values of m. Thus it is often possible to represent these sequences in terms of their Z-transforms.

Z-transform representation:
  φ_xx(m) ↔ Φ_xx(Z),  r_xx(m) ↔ R_xx(Z)
  φ_xy(m) ↔ Φ_xy(Z),  r_xy(m) ↔ R_xy(Z)

Note: the Z-transforms of φ_xx(m) and φ_xy(m) exist only when m_x = 0, i.e., when φ_xx(m) = r_xx(m) and φ_xy(m) = r_xy(m). In that case
  Φ_xx(Z) = R_xx(Z),  Φ_xy(Z) = R_xy(Z)

Property 1:
  R_xx(Z) = Σ_{m=−∞}^{∞} r_xx(m) Z^{−m}
  σ_x² = r_xx(0) = 1/(2πj) ∮_C R_xx(Z) Z^{−1} dZ
Property 2:
  R_xx(Z) = Σ_{m=−∞}^{∞} r_xx(m) Z^{−m}
          = Σ_{m=0}^{∞} r_xx(m) Z^{−m} + Σ_{m=1}^{∞} r_xx(−m) Z^{m}

Replacing Z by Z^{−1} and using r_xx(−m) = r_xx(m),
  R_xx(Z^{−1}) = Σ_{m=0}^{∞} r_xx(m) Z^{m} + Σ_{m=1}^{∞} r_xx(m) Z^{−m}
which regroups the same terms (the m = 0 term is identical either way), so
  ⇒ R_xx(Z^{−1}) = R_xx(Z)
Similarly, R_xy(Z) = R_yx*(1/Z*).

The typical region of convergence of R_xx(Z) for a two-sided sequence is
  R_a < |Z| < 1/R_a,  0 < R_a < 1
Since the R.O.C. contains the unit circle,
  σ_x² = 1/(2πj) ∮_C R_xx(Z) Z^{−1} dZ ≡ 1/(2π) ∫_{−π}^{π} P_xx(ω) dω
where P_xx(ω) = R_xx(Z)|_{Z=e^{jω}}.

When m_x = 0, the variance is equal to the mean square, i.e., the average power. Thus the area under P_xx(ω) for −π ≤ ω ≤ π is proportional to the average power in the signal.

P_xx(ω) : power density spectrum (or simply spectrum)
  P_xx(ω) = P_xx(−ω) and P_xx(ω) ≥ 0
  P_xx(ω) = R_xx(Z)|_{Z=e^{jω}} = F[r_xx(m)]

power density spectrum = Fourier transform of the autocovariance (of the autocorrelation if m_x = 0)
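A sketch of this relationship in code (assumptions: a zero-mean white test sequence, a finite number of estimated lags, and direct evaluation of the Fourier sum on a frequency grid; none of these specifics come from the original):

import numpy as np

rng = np.random.default_rng(2)
x = rng.choice([1.0, -1.0], size=20000)          # zero-mean white sequence, sigma^2 = 1

# estimate r_xx(m) for |m| <= M (m_x = 0, so covariance = correlation)
M, L = 32, len(x)
r = np.array([np.dot(x[:L - m], x[m:]) / L for m in range(M + 1)])
r_full = np.concatenate([r[:0:-1], r])           # r_xx(-M..M), using r_xx(-m) = r_xx(m)

# P_xx(w) = sum_m r_xx(m) e^{-jwm}, evaluated on a frequency grid
w = np.linspace(-np.pi, np.pi, 512)
m = np.arange(-M, M + 1)
Pxx = np.real(np.exp(-1j * np.outer(w, m)) @ r_full)

print("P_xx should be roughly flat at sigma^2 = 1:", Pxx.min().round(2), Pxx.max().round(2))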
Note: the Fourier transform of the autocorrelation sequence does not exist (in the ordinary sense) if m_x ≠ 0, since an impulse appears in the spectrum.

  P_xy(ω) = R_xy(Z)|_{Z=e^{jω}} : cross-power density spectrum
  P_xy(ω) = P_yx*(−ω)

Response of Linear Systems to Random Signals:
x(n) : a real, wide-sense stationary random process applied to an LTI system with impulse response h(n):
  y(n) = Σ_{k=−∞}^{∞} h(n − k) x(k) = Σ_{k=−∞}^{∞} h(k) x(n − k)
Mean of the output:
  m_y = E[y(n)] = Σ_{k=−∞}^{∞} h(k) E[x(n − k)] = m_x · Σ_{k=−∞}^{∞} h(k) = m_x · H(e^{j0})

(since E[x(n − k)] = m_x and H(e^{jω}) = Σ_{k=−∞}^{∞} h(k) e^{−jωk}, so H(e^{j0}) = Σ_{k=−∞}^{∞} h(k))
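A small numerical illustration of m_y = m_x · H(e^{j0}) (the 3-tap filter and the input mean below are made-up values for this sketch):

import numpy as np

rng = np.random.default_rng(3)
h = np.array([0.5, 1.0, 0.25])          # illustrative impulse response, sum(h) = H(e^{j0}) = 1.75
mx = 2.0
x = mx + rng.standard_normal(200000)    # stationary input with mean mx
y = np.convolve(x, h, mode="valid")     # LTI output

print("E[y] estimate:", y.mean().round(3), "   m_x * sum(h):", mx * h.sum())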
If a stationary input is applied to a linear shift-invariant system, is the output also stationary?

  φ_yy(m) = E[y(n) y(n + m)] = E[ Σ_{k=−∞}^{∞} Σ_{r=−∞}^{∞} h(k) h(r) x(n − k) x(n + m − r) ]
          = Σ_{k=−∞}^{∞} h(k) Σ_{r=−∞}^{∞} h(r) E[x(n − k) x(n + m − r)]
          = Σ_{k=−∞}^{∞} h(k) φ′_xx(m + k)

where E[x(n − k) x(n + m − r)] = φ_xx(m + k − r) and
  φ′_xx(m + k) ≡ Σ_{r=−∞}^{∞} h(r) φ_xx(m + k − r)
⇒ The output autocorrelation sequence also depends only on the time difference m.
⇒ The output is also stationary.

  φ_yy(m) = Σ_{k=−∞}^{∞} h(k) Σ_{r=−∞}^{∞} h(r) φ_xx(m + k − r)

Let l = r − k:
  φ_yy(m) = Σ_{k=−∞}^{∞} h(k) Σ_{l=−∞}^{∞} h(k + l) φ_xx(m − l)
          = Σ_{l=−∞}^{∞} φ_xx(m − l) { Σ_{k=−∞}^{∞} h(k) h(k + l) }
          = Σ_{l=−∞}^{∞} φ_xx(m − l) v(l) = φ_xx(m) * v(m)   (convolution)
where v(l) = Σ_{k=−∞}^{∞} h(k) h(k + l) = h(l) * h(−l) : the aperiodic autocorrelation of h(n), to which the Fourier transform can be applied.

Output autocorrelation = input autocorrelation * autocorrelation of the impulse response:
  φ_yy(m) = φ_xx(m) * v(m)   (in general, for complex h: φ_yy(m) = φ_xx(m) * h(m) * h*(−m))

In the Z-domain:
  V(Z) = Z[h(l) * h(−l)] = H(Z) · H(Z^{−1})
  Φ_yy(Z) = Φ_xx(Z) · V(Z) = Φ_xx(Z) H(Z) H(Z^{−1})

On the unit circle:
  P_yy(ω) = Φ_yy(Z)|_{Z=e^{jω}} = P_xx(ω) H(e^{jω}) H(e^{−jω})
  (h(n) real ⇒ H(e^{−jω}) = H*(e^{jω}))
  ⇒ P_yy(ω) = P_xx(ω) |H(e^{jω})|²

output power density spectrum = input power density spectrum × (magnitude response of the system)²
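A sketch verifying P_yy(ω) = P_xx(ω) |H(e^{jω})|² for a white input (the filter taps, segment length, and the periodogram averaging used to estimate P_yy are illustrative choices, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
h = np.array([1.0, -0.8])                        # illustrative FIR filter
x = rng.standard_normal(400000)                  # white input, P_xx(w) = 1
y = np.convolve(x, h, mode="same")

# estimate P_yy(w) by averaging periodograms of short segments
seg = 256
segments = y[:(len(y) // seg) * seg].reshape(-1, seg)
Pyy = np.mean(np.abs(np.fft.rfft(segments, axis=1))**2, axis=0) / seg

w = np.linspace(0, np.pi, seg // 2 + 1)
H = np.array([np.sum(h * np.exp(-1j * wk * np.arange(len(h)))) for wk in w])
print("max |Pyy - |H|^2| :", np.max(np.abs(Pyy - np.abs(H)**2)).round(2))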
Average power in the output:
  σ_y² = φ_yy(0) = 1/(2π) ∫_{−π}^{π} P_yy(ω) dω = 1/(2π) ∫_{−π}^{π} P_xx(ω) |H(e^{jω})|² dω

Now suppose that H(e^{jω}) is an ideal bandpass filter with passband ω_a ≤ |ω| ≤ ω_b.
Note: P_xx(ω) = P_xx(−ω) is an even function of ω, and |H(e^{jω})|² = |H(e^{−jω})|², so

  σ_y² = φ_yy(0) = 2/(2π) ∫_{0}^{π} |H(e^{jω})|² P_xx(ω) dω
       = 1/π ∫_{ω_a}^{ω_b} P_xx(ω) dω : average power in the output

Since P_xx(ω) ≥ 0, it follows that σ_y² ≥ 0.
Cross-correlation between input and output:
  φ_xy(m) = E[x(n) y(n + m)]
          = E[x(n) Σ_{k=−∞}^{∞} h(k) x(n + m − k)] = Σ_{k=−∞}^{∞} h(k) E[x(n) x(n + m − k)]
          = Σ_{k=−∞}^{∞} h(k) φ_xx(m − k) = φ_xx(m) * h(m)

Taking the Z-transform:
  Φ_xy(Z) = H(Z) Φ_xx(Z)  ⇒  P_xy(ω) = H(e^{jω}) P_xx(ω)   (Z = e^{jω})

Now if φ_xx(m) = σ_x² δ(m), i.e., the input is a white random process,
  ⇒ φ_xy(m) = h(m) * σ_x² δ(m) = σ_x² h(m)
  ⇒ for a white-noise input, the cross-correlation between the input and the output of a linear system is proportional to the impulse response of the system:
     h(m) = φ_xy(m) / σ_x²  : system identification

Also, for φ_xx(m) = σ_x² δ(m):
  Φ_xx(Z) = σ_x² · 1
  ⇒ P_xx(ω) = Φ_xx(Z)|_{Z=e^{jω}} = σ_x²,  −π ≤ ω ≤ π
  ⇒ P_xy(ω) = σ_x² H(e^{jω})
  : the cross-power spectrum is proportional to the frequency response of the system.
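The system-identification idea h(m) = φ_xy(m)/σ_x² in code (a sketch; the "unknown" filter and the record length are invented for illustration):

import numpy as np

rng = np.random.default_rng(5)
h_true = np.array([0.2, 1.0, 0.5, -0.3])             # "unknown" system to be identified
sigma2 = 2.0
x = np.sqrt(sigma2) * rng.standard_normal(500000)    # white input with variance sigma2
y = np.convolve(x, h_true)[:len(x)]                  # observed output

# phi_xy(m) = <x(n) y(n+m)>, estimated for m = 0..5
phi_xy = np.array([np.dot(x[:len(x) - m], y[m:]) / len(x) for m in range(6)])
h_est = phi_xy / sigma2

print("h_true:", h_true)
print("h_est :", h_est.round(3))                     # first 4 values match h_true, rest ~ 0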
Power Spectrum Estimation
: one of the most important areas of application for digital signal processing

1. Basic Principles of Estimation Theory
x(n): a signal modeled as a random process (an ergodic process).
For each sample sequence generated by the random process,
  m_x = <x(n)> = lim_{N→∞} 1/(2N+1) Σ_{n=−N}^{N} x(n)

  m̂_x = 1/N Σ_{n=0}^{N−1} x(n) : an estimate of m_x
and m̂_x → m_x as N becomes "large enough."

The branch of statistical theory that pertains to such situations is called "estimation theory." Consider a stationary ergodic process {X_n}, −∞ < n < ∞:
  σ_x² = E[(X_n − m_x)²] = <(X_n − m_x)²>
  r_xx(m) = E[(X_n − m_x)(X_{n+m} − m_x)*] = <(X_n − m_x)(X_{n+m} − m_x)*>
and
  P_xx(ω) = Σ_{m=−∞}^{∞} r_xx(m) e^{−jωm}

The estimation of a parameter of the random process is based on a finite segment of a single sample sequence, i.e., we have N values x(n), 0 ≤ n ≤ N−1, from which to estimate some parameter, which initially we denote "α."
The estimate α̂ of α is a function of the random variables X_n, 0 ≤ n ≤ N−1, i.e.,
  α̂ = F[X_0, X_1, …, X_{N−1}]  ⇒  α̂ is also a random variable.
The probability density function p_α̂(α̂) of the estimate will depend on
  (i) the choice of the estimator F[·]
  (ii) the probability density functions of the X_n

(Figure: the densities of two estimators of the same parameter. Estimate 2 is superior to estimate 1 because the probability density of estimate 2 is more concentrated about the true value α.)

Confidence interval:
used to measure or characterize the concentration of the probability density function of an estimator, e.g., the probability that the true value satisfies α̂ − Δ₂ ≤ α ≤ α̂ + Δ₁.
Generally speaking, for a good estimator the probability density function p_α̂(α̂) should be narrow and concentrated around the true value.

Two properties of estimators are commonly used as a basis for comparison:
  (I) bias
  (II) variance

The bias of an estimator is defined as the true value of the parameter minus the expected value of the estimate, i.e.,
  bias = α − E[α̂] = B
  B = 0 ⇒ unbiased estimator.

The variance of the estimator in effect measures the width of the probability density and is defined by
  var[α̂] = E[(α̂ − E[α̂])²] = σ_α̂²

A small variance suggests that the probability density p_α̂(α̂) is concentrated around its mean value, which, if the estimator is also unbiased, will be the true value of the parameter.

  mean square error = E[(α̂ − α)²] = σ_α̂² + B²

An estimate is said to be consistent if, as the number of observations becomes large, the bias and the variance both tend to zero.
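A Monte Carlo sketch of these ideas, using the sample mean m̂_x = (1/N) Σ x(n) as the estimator (the Gaussian noise model, the true mean, and the record lengths are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(6)
mx_true = 1.5
trials = 2000

for N in (10, 100, 1000):
    # 'trials' independent length-N records; estimate the mean from each
    est = (mx_true + rng.standard_normal((trials, N))).mean(axis=1)
    bias = mx_true - est.mean()          # bias = alpha - E[alpha_hat]
    var = est.var()
    print(f"N={N:5d}  bias~{bias:+.4f}  var~{var:.4f}   (theory var = 1/N = {1/N:.4f})")

# bias ~ 0 (unbiased) and var -> 0 as N grows: a consistent estimator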
Detection and Estimation

Basic concepts of detection theory:
Detection process: to make a decision regarding the presence or absence of a particular signal in a noisy environment.
Hypothesis testing: the foundation of the yes-no decision.

Received signal:
  r(n) = s(n) + w(n)   (signal s(n) present, w(n) is noise)
  r(n) = 0 + w(n)      (signal absent)

By observing r(n) we must decide whether the source signal s(n) is present or not. Intuitively, we expect to make the decision based on some amplitude comparison.

Questions:
  What do we compare?
  How do we make the comparison?
such that the decision process will be optimum (and, implicitly, "what do we mean by optimum?").

To begin, we designate two mutually exclusive hypotheses H_0 and H_1 as follows:
  H_0 : r(n) = w(n)
  H_1 : r(n) = s(n) + w(n)
If we observe r(n) and make a decision on the presence or absence of s(n), there are 4 possible outcomes:

2 correct decisions:
  choose H_1 and H_1 is true
  choose H_0 and H_0 is true
2 incorrect decisions:
  choose H_1 and H_0 is true
  choose H_0 and H_1 is true

There are 4 common approaches that lead to rules for choosing a hypothesis. Each depends on an increasing level of statistical knowledge about the environment.

1. Maximum a Posteriori (MAP) Criterion:
   choose the hypothesis that is most likely to have produced the given observation.
Consider the a posteriori (after the fact) conditional probabilities P{H_1 | r} and P{H_0 | r}.

Decision rule:
  if P{H_1 | r} > P{H_0 | r},  i.e.,  P{H_1 | r} / P{H_0 | r} > 1,  then choose H_1.

This approach demands the minimum knowledge about the statistics of the environment.
2. Maximum Likelihood (ML) Criterion

The previous decision rule considers only the a posteriori probabilities for H_0 and H_1, with no consideration of the actual a priori probabilities of the source signal s(n) being present or not.

From Bayes' theorem we obtain
  P{H_1 | r} = P{r | H_1} P{H_1} / P{r}
  P{H_0 | r} = P{r | H_0} P{H_0} / P{r} = P{r | H_0} (1 − P{H_1}) / P{r}
where P{r | H_i} represents the probability of an observation r given that H_i is true.

Decision rule (from the MAP result):
  P{H_1 | r} / P{H_0 | r} = P{r | H_1} · P{H_1} / [ P{r | H_0} · (1 − P{H_1}) ] = P{r, H_1} / P{r, H_0} > 1
  ⇒ choose H_1 if P{r | H_1} / P{r | H_0} > (1 − P{H_1}) / P{H_1},  otherwise choose H_0.

The ratio P{r | H_1} / P{r | H_0} is known as the "likelihood ratio," and the rule is a likelihood ratio test.
The effect of the likelihood ratio test is to shift the level at which the decision changes, based on the a priori probability P{H_1}.
Note that the ML and MAP criteria are equivalent when P{H_1} = P{H_0} = 1/2.

Both of the preceding criteria ignore the cost or penalty of being wrong. In many instances some errors are more costly than others. This is considered next.
3. Bayes Criterion:
Based on the assignment of a cost to each possible outcome of the decision process, and on performing a likelihood ratio test that minimizes the average cost.

Consider the joint probabilities for each possible outcome of the decision process, P1, P2, P3, and P4, i.e.,
  P1 = P{choose H_1 and H_1 is true} = P{h_1, H_1} = P{h_1 | H_1} P{H_1}
  (so that P{h_1 | H_1} = P{h_1, H_1} / P{H_1})
Similarly,
  P2 = P{h_0 | H_0} P{H_0}
  P3 = P{h_1 | H_0} P{H_0}
  P4 = P{h_0 | H_1} P{H_1}

By assigning costs c1, c2, c3, c4 to the joint probabilities P1, P2, P3, P4, we form the "Bayes risk function"
  R = c1 P1 + c2 P2 + c3 P3 + c4 P4
    = P{H_1}[c1 P{h_1 | H_1} + c4 P{h_0 | H_1}] + P{H_0}[c2 P{h_0 | H_0} + c3 P{h_1 | H_0}]

Using P{h_0 | H_1} = 1 − P{h_1 | H_1} and P{h_0 | H_0} = 1 − P{h_1 | H_0},
  R = P{H_1} c4 + P{H_0} c2 − P{H_1}[c4 − c1] P{h_1 | H_1} + P{H_0}[c3 − c2] P{h_1 | H_0}

The first two terms of R are independent of the decision rule. Thus, to minimize the risk we must minimize the sum of the last two terms.
We assume that the cost of a correct decision is less than that of an incorrect decision ⇒ c4 − c1 ≥ 0 and c3 − c2 ≥ 0.
To minimize the cost, we wish to make our decisions such that
  P{H_1}[c4 − c1] P{h_1 | H_1} ≥ P{H_0}[c3 − c2] P{h_1 | H_0}
or
  P{h_1 | H_1} / P{h_1 | H_0} ≥ P{H_0}[c3 − c2] / (P{H_1}[c4 − c1])

(P{h_1 | H_1} is the probability that the observation is one that the decision rule assigns to H_1, given that H_1 is true.)

Decision rule: choose H_1 if
  P{h_1 | H_1} / P{h_1 | H_0} ≥ P{H_0}[c3 − c2] / (P{H_1}[c4 − c1])
otherwise choose H_0.
4. Neyman-Pearson Criterion
: The Neyman-Pearson criterion for optimum detection is based on specifying a fixed probability of false detection (i.e., a fixed P{h_1 | H_0}, often denoted P_f for the probability of false alarm) and maximizing the probability of detection P_d = P{h_1 | H_1}.

Since P{h_1 | H_1} = 1 − P{h_0 | H_1} and P{h_1 | H_0} is fixed, we can minimize the quantity
  Q = P{h_0 | H_1} + α P{h_1 | H_0}
which is equivalent to maximizing P{h_1 | H_1}, where α represents an arbitrary constant.

If we assume zero cost for a correct decision, i.e., c1 = c2 = 0, then from the Bayes criterion we obtain
  R = P{H_1} c4 − P{H_1} c4 P{h_1 | H_1} + P{H_0} c3 P{h_1 | H_0}
    = P{H_1}[1 − P{h_1 | H_1}] c4 + P{H_0} c3 P{h_1 | H_0}
    = P{H_1} c4 P{h_0 | H_1} + P{H_0} c3 P{h_1 | H_0}
Thus, if P{H_1} c4 = 1 and P{H_0} c3 = α, we have R = Q.

From the Bayes criterion we know that minimizing Q gives the decision rule: choose H_1 if
  P{r | H_1} / P{r | H_0} ≥ P{H_0} c3 / (P{H_1} c4) = α
where α is determined from c3, the cost associated with false alarms.
Ex: Consider a unit-value signal in zero-mean Gaussian noise with unit variance.
If the signal is present, the observed signal is the additive result of signal plus noise and has a Gaussian distribution with unit mean.
If the signal is not present, the noise alone has a Gaussian distribution with zero mean, i.e.,
  P{r | H_0} = (1/√(2π)) e^{−r²/2}   and   P{r | H_1} = (1/√(2π)) e^{−(r−1)²/2}
The likelihood ratio λ(r) is
  λ(r) = P{r | H_1} / P{r | H_0} = e^{r − 1/2}

and our decision rule becomes: choose H_1 if e^{r − 1/2} ≥ T,
where T is a threshold whose determination depends on which optimality criterion is being used.

It is common to use a log-likelihood ratio test, derived from the fact that taking logarithms does not alter the threshold position but gives a linear comparison of the form:
  choose H_1 if r ≥ 1/2 + ln T

The observation r (possibly a voltage level measurement) represents a sufficient statistic on which to base our decision.
(Figure: effect of threshold placement on the probabilities of false alarm and missed detection. The densities P{r | H_0} and P{r | H_1} overlap; the threshold 1/2 + ln T divides the r axis into a "choose H_0" region and a "choose H_1" region.
  P_miss = P{h_0 | H_1} : the area of P{r | H_1} below the threshold
  P_f = P{h_1 | H_0} : the area of P{r | H_0} above the threshold)
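For the unit-mean Gaussian example above, these two areas can be computed in closed form with the standard normal CDF. A minimal sketch (the choice T = 1, i.e., threshold 1/2, corresponds to the ML rule; other values of T are equally valid):

import numpy as np
from math import erf, log, sqrt

def Phi(z):                       # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

T = 1.0                           # likelihood-ratio threshold (T = 1 -> ML rule)
thr = 0.5 + log(T)                # decide H1 when r >= 1/2 + ln T

Pf = 1.0 - Phi(thr)               # area of P{r|H0} = N(0,1) above the threshold
Pmiss = Phi(thr - 1.0)            # area of P{r|H1} = N(1,1) below the threshold
print(f"threshold={thr:.2f}  Pf={Pf:.3f}  Pmiss={Pmiss:.3f}  Pd={1-Pmiss:.3f}")

# quick Monte Carlo check
rng = np.random.default_rng(7)
r0 = rng.standard_normal(200000)          # H0: noise only
r1 = 1.0 + rng.standard_normal(200000)    # H1: signal + noise
print("empirical Pf, Pmiss:", np.mean(r0 >= thr).round(3), np.mean(r1 < thr).round(3))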
The Matched Filter (the correlation filter):

With w(n) white noise, the detection of a signal in white noise is more reliable if we make several observations rather than a single observation. If we average the observations over a period of time and then make the threshold comparison, we smooth out the sample-by-sample fluctuations due to the noise, and we essentially obtain a measure of the mean value over the observation time with which to make our comparison.

For r(t) = s(t) + w(t), the optimum detection receiver is a correlation receiver based on the decision rule: choose H_1 if
  ∫_0^T r(t) s(t) dt  exceeds a predetermined threshold level.

This operation can be obtained by applying r(t) to a linear filter with impulse response s(τ − t): a time-reversed replica of the signal to be detected.
Digital matched filter:
r(n) = s(n) + w(n) is applied to a filter h(n), where h(n) is the time inverse of the signal to be detected:

  r(n) * s(−n) = Σ_{k=−∞}^{∞} s(−k) r(n − k)
               = Σ_{i=−∞}^{∞} s(i) r(n + i)   : correlation
               = Σ_{i=−∞}^{∞} s(i) (s(n + i) + w(n + i))
               = R_ss(n) + R_sw(n)   (with signal)
or
  r(n) * s(−n) = R_sw(n)   (without signal)
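A sketch of the digital matched filter as a correlation (the pulse shape, noise level, record length, and signal position are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(8)
s = np.array([1.0, 2.0, 3.0, 2.0, 1.0])           # known signal to be detected
n0 = 300                                          # hypothetical position of the signal
r = 0.8 * rng.standard_normal(1000)               # white noise w(n)
r[n0:n0 + len(s)] += s                            # r(n) = s(n) + w(n)

# r(n) * s(-n)  ==  correlation of r with s
out = np.correlate(r, s, mode="valid")            # out[n] = sum_i s(i) r(n + i)

print("peak at n =", int(np.argmax(out)), " (signal placed at n =", n0, ")")
print("peak value ~ R_ss(0) =", float(np.dot(s, s)))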
Thus, for detecting a known signal, the output of the matched filter represents the sufficient statistic that is compared to a threshold value.
Decision rule: at each time, choose H_1 if r(n) * s(−n) ≥ T.

The matched filter can also be derived by considering the following optimization problem: pass r(t) = s(t) + w(t) through an LTI filter h(τ), so that the output is y_s(t) + y_w(t) with
  y_s(t_0) = ∫_{−∞}^{∞} s(τ) h(t_0 − τ) dτ
  y_w(t_0) = ∫_{−∞}^{∞} w(τ) h(t_0 − τ) dτ
and choose h(τ) to maximize
  (S/N)(t_0) = y_s(t_0)² / E[y_w(t_0)²]  : the output signal-to-noise ratio at time t_0.
Basic Concepts of Estimation Theory

  r(n) = s(n) + w(n)
If we wish to determine only the presence or absence of s(n), we have a detection problem. If we wish to measure (estimate) some parameter of the random process s(n), we generally refer to this as an estimation problem.

Recall: the quality of an estimate α̂ of α:
  E[α̂] = α          : unbiased estimate
  E[α̂] = α + c       : biased estimate, where c is a constant
  E[α̂] = α + f(α)    : the estimate has an unknown bias, where f(α) is some function of α

  var[α̂] = E{[α̂ − E[α̂]]²}  : how close a single estimate is to the mean of the estimates
  var(α̂ − α) = E{[α̂ − α − E{α̂ − α}]²}

To have the greatest degree of confidence in an estimate we would like it to be unbiased and of minimum variance.
If both the variance and the bias approach zero as the number of estimation trials approaches infinity, the estimate is said to be "consistent."
Linear Minimum Mean Square Error Estimation

The Projection Theorem:

Recall the 2-D case: let v be the vector from the origin to the point (x, y). The shortest distance from the point to a line L through the origin is given by the vector v − v̂, which is perpendicular (orthogonal) to the line L.
That is, the dot product of (v − v̂) with any vector along L is zero.
The vector v̂ along L such that (v − v̂) is orthogonal to L is called the projection of v on L.

The 2-D case can be extended to an arbitrary number of dimensions ⇒ "Hilbert space."
The vectors of a Hilbert space may be real or complex sequences, functions, or random variables, and may be infinite in number (e.g., a Fourier series).

The dot product of Euclidean space is generalized to an inner product in Hilbert space, defined as a function that assigns to every pair of vectors x and y the scalar value (x, y) such that:
  1. (x, y) = (y, x)*, where * denotes the complex conjugate.
  2. c(x, y) = (cx, y) = (x, cy), where c is a scalar.
  3. (x + z, y) = (x, y) + (z, y)
  4. (x, x) ≥ 0
  5. (x, x) = 0 iff x = φ, the null vector.
  6. (x, x) = ||x||² : the squared norm of x.

Any two vectors x and y in a Hilbert space H are orthogonal if their inner product (x, y) = 0.
The norm is a direct extension of the concept of the magnitude or length of a vector in Euclidean space.

The projection theorem assures us that for any vector v in a Hilbert space H, there is a unique vector v̂ in a complete subspace S of H, called the projection of v on S, such that the norm of v − v̂ is minimum; the minimum is attained when v − v̂ is orthogonal to S.
Given a set of K vectors s_1, s_2, …, s_K, each of dimension N, they define a subspace S of the N-dimensional Hilbert space H. We wish to determine the projection v̂ of an arbitrary vector v in H onto the subspace S. We know from the projection theorem that v − v̂ must be orthogonal to S, that is, the inner product of v − v̂ with every vector in S must be zero.

⇒ v − v̂ is orthogonal to each vector s_j, j = 1, 2, …, K, i.e.,
  (v − v̂, s_j) = (v, s_j) − (v̂, s_j) = 0

Now v̂ is a vector in S and can be represented as a linear combination of the vectors s_j, i.e.,
  v̂ = Σ_{j=1}^{K} c_j s_j = S c
where S is the N × K matrix whose columns are the vectors s_j and c is a K-element vector of coefficients.

Therefore
  (v, s_j) = (v̂, s_j) = Σ_{i=1}^{K} c_i (s_i, s_j),  j = 1, …, K

In matrix form,
  [ (v, s_1) ]   [ (s_1, s_1)  (s_1, s_2) …  (s_1, s_K) ] [ c_1 ]
  [    .     ] = [     .           .      …      .      ] [  .  ]
  [    .     ]   [     .           .      …      .      ] [  .  ]
  [ (v, s_K) ]   [ (s_K, s_1)  (s_K, s_2) …  (s_K, s_K) ] [ c_K ]

or A = GC : the "normal equations," with the obvious matrix equivalences.
(If the inner product depends only on the difference of the indices, G is a Toeplitz matrix.)

G : the Gram matrix. G is non-singular iff the s_j are linearly independent (i.e., the set of vectors s_j forms a basis of the subspace S), in which case
  C = G^{−1} A
If the norm of v − v̂ is minimized, then v̂ is a good estimate of v in some sense.

Note that the Gram matrix contains only information about the subspace onto which the projection is being made. All the information regarding the vector to be estimated is contained in the A matrix. Thus, when formulating estimation problems to be solved using the projection theorem, the known or observed data from which the estimate is to be made will define the Gram matrix. Specification of the A matrix may be based on observed data as well in some problems, or it may be derived from a model or an assumption regarding the relationship of the quantity to be estimated to the data from which it is to be estimated.
  v̂ = Σ_{j=1}^{K} c_j s_j

represents the basic model of a linear estimation problem, expressing the desired estimate as a linear combination of the input data.

Examples: eigenspace decomposition, estimation, data compression.
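A sketch of the projection computation v̂ = Σ c_j s_j via the normal equations A = GC (the vectors below are arbitrary examples, and the inner product is taken to be the ordinary Euclidean dot product):

import numpy as np

rng = np.random.default_rng(9)
N, K = 8, 3
S = rng.standard_normal((N, K))          # columns s_1..s_K span the subspace
v = rng.standard_normal(N)               # vector to be projected

G = S.T @ S                              # Gram matrix, G[i, j] = (s_i, s_j)
A = S.T @ v                              # A[j] = (v, s_j)
c = np.linalg.solve(G, A)                # normal equations: G c = A
v_hat = S @ c                            # projection of v onto the subspace

# orthogonality check: (v - v_hat, s_j) = 0 for every j
print("residual inner products:", (S.T @ (v - v_hat)).round(10))
print("||v - v_hat||:", np.linalg.norm(v - v_hat).round(4))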
Spectral Estimation:

Digital filtering and power spectral estimation are two central applications of DSP.

1. Periodogram: the Fourier transform of the autocorrelation / autocovariance estimate
   - autocorrelation estimate: consistent
   - periodogram: not a consistent estimate of the spectral density
2. Averaged / smoothed periodograms: consistent (see the sketch below)
3. Autoregressive spectral estimation
   - maximum likelihood method (MLM)
   - maximum entropy method (MEM)
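A sketch contrasting the raw periodogram with an averaged periodogram, in line with items 1 and 2 above (the white-noise test signal, record length, and segment size are illustrative choices):

import numpy as np

rng = np.random.default_rng(10)
x = rng.standard_normal(65536)                   # white noise, true P_xx(w) = 1

def periodogram(seg):
    return np.abs(np.fft.rfft(seg))**2 / len(seg)

# 1. raw periodogram of the whole record: its variance does not shrink with N
P_raw = periodogram(x)

# 2. averaged periodogram: split into segments and average -> consistent estimate
segs = x.reshape(256, 256)
P_avg = np.mean([periodogram(s) for s in segs], axis=0)

print("std of raw periodogram     :", P_raw.std().round(2))   # stays near 1
print("std of averaged periodogram:", P_avg.std().round(2))   # much smaller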