Mathematical Preliminaries
Matrix Theory
Vectors
The $n$th element of a vector $\mathbf{u}$ is written $u(n)$:

$$\mathbf{u} = \{u(n)\} = \begin{bmatrix} u(1) \\ u(2) \\ \vdots \\ u(N) \end{bmatrix}$$

Matrix

A matrix $\mathbf{A} = \{a(m,n)\}$ has element $a(m,n)$ in its $m$th row and $n$th column:

$$\mathbf{A} = \begin{bmatrix} a(1,1) & a(1,2) & \cdots & a(1,N) \\ a(2,1) & a(2,2) & \cdots & a(2,N) \\ \vdots & & & \vdots \\ a(M,1) & a(M,2) & \cdots & a(M,N) \end{bmatrix} = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]$$

where the column vector $\mathbf{a}_k = [a(1,k) \; a(2,k) \; \cdots \; a(M,k)]^T$.
Lexicographic Ordering (Stacking Operation)

Row-ordered form of a matrix: the rows of $X = \{x(m,n)\}$ are stacked into a single vector,

$$\mathbf{x} = [x(1,1)\; x(1,2)\; \cdots\; x(1,N)\; x(2,1)\; \cdots\; x(2,N)\; \cdots\; x(M,1)\; \cdots\; x(M,N)]^T = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \vdots \\ \mathbf{r}_M \end{bmatrix},$$

where the row vector $\mathbf{r}_k = [x(k,1)\; x(k,2)\; \cdots\; x(k,N)]^T$.

Column-ordered form of a matrix: the columns are stacked,

$$\mathbf{x} = [x(1,1)\; x(2,1)\; \cdots\; x(M,1)\; x(1,2)\; \cdots\; x(M,2)\; \cdots\; x(1,N)\; \cdots\; x(M,N)]^T = \begin{bmatrix} \mathbf{c}_1 \\ \mathbf{c}_2 \\ \vdots \\ \mathbf{c}_N \end{bmatrix},$$

where the column vector $\mathbf{c}_k = [x(1,k)\; x(2,k)\; \cdots\; x(M,k)]^T$.
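As a quick check, these two orderings are exactly NumPy's C-order and Fortran-order flattenings; a minimal sketch:

```python
import numpy as np

X = np.arange(1, 13).reshape(3, 4)   # a 3x4 matrix {x(m,n)}

# Row-ordered (lexicographic) form: rows r_1, r_2, ..., r_M stacked.
x_row = X.flatten(order="C")

# Column-ordered form: columns c_1, c_2, ..., c_N stacked.
x_col = X.flatten(order="F")

print(x_row)  # [ 1  2  3  4  5  6  7  8  9 10 11 12]
print(x_col)  # [ 1  5  9  2  6 10  3  7 11  4  8 12]
```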
Transposition and conjugation rules
$$\mathbf{A}^{*T} = [\mathbf{A}^T]^*, \qquad [\mathbf{AB}]^T = \mathbf{B}^T\mathbf{A}^T, \qquad [\mathbf{A}^{-1}]^T = [\mathbf{A}^T]^{-1}, \qquad [\mathbf{AB}]^* = \mathbf{A}^*\mathbf{B}^*$$
Toeplitz matrices

Every element depends only on the difference of its indices, $t(m,n) = t_{m-n}$:

$$\mathbf{T} = \begin{bmatrix} t_0 & t_{-1} & t_{-2} & \cdots & t_{-(N-1)} \\ t_1 & t_0 & t_{-1} & & \vdots \\ t_2 & t_1 & t_0 & & \\ \vdots & & & \ddots & t_{-1} \\ t_{N-1} & t_{N-2} & \cdots & t_1 & t_0 \end{bmatrix}$$

Circulant matrices

Each row is a circular shift of the previous one, $c(m,n) = c((m-n) \bmod N)$:

$$\mathbf{C} = \begin{bmatrix} c_0 & c_{N-1} & c_{N-2} & \cdots & c_1 \\ c_1 & c_0 & c_{N-1} & & c_2 \\ c_2 & c_1 & c_0 & & \vdots \\ \vdots & & & \ddots & c_{N-1} \\ c_{N-1} & c_{N-2} & \cdots & c_1 & c_0 \end{bmatrix}$$
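Both structures can be generated directly with SciPy (scipy.linalg.toeplitz takes the first column and first row; circulant takes just the first column); a minimal sketch:

```python
import numpy as np
from scipy.linalg import toeplitz, circulant

# Toeplitz: first column [t_0, t_1, ...] and first row [t_0, t_-1, ...].
T = toeplitz(c=[1, 2, 3, 4], r=[1, -1, -2, -3])

# Circulant: fully specified by its first column; each column is a cyclic shift.
C = circulant([1, 2, 3, 4])

print(T)
print(C)
```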
Linear convolution using Toeplitz matrix
Let $h(n) = 0$ for $n < 0$, $n \ge N_h$, and $x(n) = 0$ for $n < 0$, $n \ge N_x$. The linear convolution is

$$y(n) = h(n) * x(n) = \sum_{k=0}^{N_x-1} h(n-k)\,x(k)$$

Writing out the first few terms:

$$y(0) = \sum_{k=0}^{N_x-1} h(-k)\,x(k) = h(0)x(0) \qquad (n = 0)$$

$$y(1) = \sum_{k=0}^{N_x-1} h(1-k)\,x(k) = h(1)x(0) + h(0)x(1) \qquad (n = 1)$$

$$y(l) = \sum_{k=0}^{l} h(l-k)\,x(k) = h(l)x(0) + h(l-1)x(1) + \cdots + h(0)x(l) \qquad (n = l)$$

In matrix form $\mathbf{y} = \mathbf{Hx}$, where $\mathbf{H}$ is an $(N_h + N_x - 1) \times N_x$ matrix:

$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ \vdots \\ \vdots \\ y(N_h + N_x - 2) \end{bmatrix} =
\begin{bmatrix}
h(0) & 0 & \cdots & 0 \\
h(1) & h(0) & & \vdots \\
h(2) & h(1) & \ddots & 0 \\
\vdots & \vdots & & h(0) \\
h(N_h-1) & h(N_h-2) & & \vdots \\
0 & h(N_h-1) & h(N_h-2) & \\
\vdots & 0 & \ddots & \vdots \\
0 & 0 & \cdots & h(N_h-1)
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N_x-1) \end{bmatrix}$$

$\mathbf{H}$ is a Toeplitz matrix.
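A minimal sketch (the helper name conv_matrix is ours for illustration) that builds this Toeplitz matrix for arbitrary h and Nx and checks it against NumPy's direct convolution:

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, Nx):
    """(Nh+Nx-1) x Nx Toeplitz matrix H such that H @ x == np.convolve(h, x)."""
    col = np.r_[h, np.zeros(Nx - 1)]      # first column: h(0..Nh-1), then zeros
    row = np.r_[h[0], np.zeros(Nx - 1)]   # first row: h(0), then zeros
    return toeplitz(col, row)

h = np.array([1.0, 2.0, 3.0])             # Nh = 3
x = np.array([4.0, 5.0, 6.0, 7.0])        # Nx = 4
H = conv_matrix(h, len(x))

assert np.allclose(H @ x, np.convolve(h, x))  # y = Hx equals linear convolution
```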
Circular convolution using circulant matrix
N-point circular convolution: with $x(n) = 0$ for $n < 0$, $n \ge N$, define the periodic extension

$$\tilde h(n) = \begin{cases} \displaystyle\sum_{k} h(n + kN), & 0 \le n < N \\[4pt] 0, & \text{otherwise} \end{cases}$$

$$y(n) = h(n) \circledast_N x(n) = \sum_{k=0}^{N-1} \tilde h(n-k)\,x(k), \qquad 0 \le n < N$$

Writing out the terms:

$$y(0) = \sum_{k=0}^{N-1} \tilde h(-k)\,x(k) = h(0)x(0) + \sum_{k=1}^{N-1} h(N-k)\,x(k) \qquad (n = 0)$$

$$y(1) = \sum_{k=0}^{N-1} \tilde h(1-k)\,x(k) = h(1)x(0) + h(0)x(1) + \sum_{k=2}^{N-1} h(N+1-k)\,x(k) \qquad (n = 1)$$

$$y(l) = \sum_{k=0}^{N-1} \tilde h(l-k)\,x(k) = h(l)x(0) + h(l-1)x(1) + \cdots + h(0)x(l) + \sum_{k=l+1}^{N-1} h(N+l-k)\,x(k) \qquad (n = l)$$

In matrix form $\mathbf{y} = \mathbf{Hx}$, where $\mathbf{H}$ is a circulant matrix:

$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ \vdots \\ y(N-1) \end{bmatrix} =
\begin{bmatrix}
h(0) & h(N-1) & h(N-2) & \cdots & h(1) \\
h(1) & h(0) & h(N-1) & \cdots & h(2) \\
h(2) & h(1) & h(0) & \cdots & h(3) \\
\vdots & & & \ddots & \vdots \\
h(N-1) & h(N-2) & h(N-3) & \cdots & h(0)
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ \vdots \\ x(N-1) \end{bmatrix}$$
Circular convolution + zero padding → linear convolution

If $h(n) = 0$ for $n < 0$, $n \ge N_h$, and $x(n) = 0$ for $n < 0$, $n \ge N_x$, then circular convolution with period $N = N_x + N_h - 1$ gives the same result as linear convolution.
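This identity is the basis of FFT convolution: pad both sequences to length $N = N_x + N_h - 1$, multiply their DFTs, and invert; a minimal sketch:

```python
import numpy as np

h = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0, 7.0])
N = len(h) + len(x) - 1   # period long enough to avoid wrap-around aliasing

# Circular convolution of the zero-padded sequences, computed via the DFT.
y_circ = np.fft.ifft(np.fft.fft(h, N) * np.fft.fft(x, N)).real

assert np.allclose(y_circ, np.convolve(h, x))  # matches linear convolution
```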
(ex) Linear convolution as a Toeplitz matrix operation

Take $h(n) = n$ for $-1 \le n \le 1$, so $h(-1) = -1$, $h(0) = 0$, $h(1) = 1$, and an input with $N_x = 5$ samples:

$$y(n) = h(n) * x(n) = \sum_{k=0}^{4} h(n-k)\,x(k)$$

$$\begin{bmatrix} y(-1) \\ y(0) \\ y(1) \\ y(2) \\ y(3) \\ y(4) \\ y(5) \end{bmatrix} =
\begin{bmatrix}
-1 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \end{bmatrix}$$

The output has $M = N + L - 1 = 5 + 3 - 1 = 7$ samples.
(ex) Circular convolution as a circulant matrix operation

Take $h(n) = n + 3$ over one period $-3 \le n \le 0$, extended periodically by $h(n) = h(n + N)$ with $N = 4$, so that $h(0) = 3$, $h(1) = 0$, $h(2) = 1$, $h(3) = 2$:

$$y(n) = \sum_{k=0}^{N-1} h(n-k)\,x(k), \qquad 0 \le n \le N-1$$

$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ y(3) \end{bmatrix} =
\begin{bmatrix}
3 & 2 & 1 & 0 \\
0 & 3 & 2 & 1 \\
1 & 0 & 3 & 2 \\
2 & 1 & 0 & 3
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \end{bmatrix}$$
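A minimal check of this example with scipy.linalg.circulant, whose first column here is [3, 0, 1, 2]:

```python
import numpy as np
from scipy.linalg import circulant

h = np.array([3.0, 0.0, 1.0, 2.0])   # h(0..3) over one period
H = circulant(h)                     # element (m, n) is h((m - n) mod 4)
x = np.array([1.0, 2.0, 3.0, 4.0])   # an arbitrary test input

# Circular convolution two ways: circulant matrix vs. the DFT.
y_mat = H @ x
y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real
assert np.allclose(y_mat, y_fft)
```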
Orthogonal and unitary matrices

Orthogonal: $\mathbf{A}^{-1} = \mathbf{A}^T$, or $\mathbf{AA}^T = \mathbf{A}^T\mathbf{A} = \mathbf{I}$

Unitary: $\mathbf{A}^{-1} = \mathbf{A}^{*T}$, or $\mathbf{AA}^{*T} = \mathbf{A}^{*T}\mathbf{A} = \mathbf{I}$
Positive definiteness and quadratic forms

$\mathbf{A}$ is called positive definite if $\mathbf{A}$ is a Hermitian matrix and

$$Q = \mathbf{x}^{*T}\mathbf{Ax} > 0, \qquad \forall\, \mathbf{x} \ne 0$$

$\mathbf{A}$ is called positive semidefinite (nonnegative definite) if $\mathbf{A}$ is a Hermitian matrix and $Q = \mathbf{x}^{*T}\mathbf{Ax} \ge 0$, $\forall\, \mathbf{x} \ne 0$.

Theorem

If $\mathbf{A}$ is a symmetric positive definite matrix, then all its eigenvalues are positive and the determinant of $\mathbf{A}$ satisfies

$$|\mathbf{A}| = \prod_{k=1}^{N} \lambda_k \le \prod_{k=1}^{N} a(k,k)$$
Diagonal forms

For any Hermitian matrix $\mathbf{R}$ there exists a unitary matrix $\boldsymbol{\Phi}$ such that

$$\boldsymbol{\Phi}^{*T}\mathbf{R}\boldsymbol{\Phi} = \boldsymbol{\Lambda} \qquad (\text{or } \mathbf{R} = \boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}^{*T})$$

$\boldsymbol{\Lambda}$: diagonal matrix containing the eigenvalues of $\mathbf{R}$.

Eigenvalue and eigenvector:

$$\mathbf{R}\boldsymbol{\phi}_k = \lambda_k\boldsymbol{\phi}_k, \qquad k = 1, \ldots, N$$

$\lambda_k$: eigenvalue, $\boldsymbol{\phi}_k$: eigenvector, where $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1 \,|\, \boldsymbol{\phi}_2 \,|\, \cdots \,|\, \boldsymbol{\phi}_N]$.
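numpy.linalg.eigh computes exactly this diagonalization for a Hermitian matrix; a minimal sketch:

```python
import numpy as np

# A Hermitian (here real symmetric) test matrix R.
R = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, Phi = np.linalg.eigh(R)       # eigenvalues and unitary eigenvector matrix
Lambda = Phi.conj().T @ R @ Phi    # Phi^{*T} R Phi should be diagonal

assert np.allclose(Lambda, np.diag(lam))
assert np.allclose(Phi @ np.diag(lam) @ Phi.conj().T, R)  # R = Phi Lambda Phi^{*T}
```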
Block Matrices
Block matrices: matrices whose elements are themselves matrices,

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{1,1} & \mathbf{A}_{1,2} & \cdots & \mathbf{A}_{1,n} \\ \mathbf{A}_{2,1} & \mathbf{A}_{2,2} & \cdots & \mathbf{A}_{2,n} \\ \vdots & & & \vdots \\ \mathbf{A}_{m,1} & \mathbf{A}_{m,2} & \cdots & \mathbf{A}_{m,n} \end{bmatrix}$$

(ex) Consider the 2-D convolution

$$y(m,n) = \sum_{m'=0}^{2} \sum_{n'=0}^{1} x(m',n')\,h(m-m', n-n'), \qquad 0 \le m \le 3, \; 0 \le n \le 2$$

with

$$x(m,n) = \begin{bmatrix} 2 & 1 \\ 5 & 4 \\ 3 & 1 \end{bmatrix}, \qquad h(m,n) = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \qquad y(m,n) = \begin{bmatrix} 2 & 3 & 1 \\ 3 & 10 & 5 \\ -2 & 5 & 5 \\ -3 & 2 & 1 \end{bmatrix}$$

Column stacking operation: write the input and output in terms of their columns,

$$\mathbf{X} = [\mathbf{x}_0 \; \mathbf{x}_1], \quad \mathbf{x}_0 = \begin{bmatrix} 2 \\ 5 \\ 3 \end{bmatrix}, \; \mathbf{x}_1 = \begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix}; \qquad \mathbf{Y} = [\mathbf{y}_0 \; \mathbf{y}_1 \; \mathbf{y}_2], \quad \mathbf{y} = \begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix}$$

Let $\mathbf{x}_{n'}$ and $\mathbf{y}_n$ be the column vectors; then

$$\mathbf{y}_n = \sum_{n'=0}^{1} \mathbf{H}_{n-n'}\,\mathbf{x}_{n'}, \qquad \text{where } \mathbf{H}_n = \{h(m-m', n)\}, \quad 0 \le m \le 3, \; 0 \le m' \le 2$$

$$\mathbf{H}_0 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}, \qquad \mathbf{H}_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

$$\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{H}_0 & \mathbf{0} \\ \mathbf{H}_1 & \mathbf{H}_0 \\ \mathbf{0} & \mathbf{H}_1 \end{bmatrix} \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \end{bmatrix}, \qquad \text{i.e.} \quad \mathbf{y} = \mathbf{Hx} \quad (\mathbf{H}: \text{block matrix})$$
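A minimal check of this example, assuming the kernel h reconstructed above, using scipy.signal.convolve2d:

```python
import numpy as np
from scipy.signal import convolve2d

x = np.array([[2, 1],
              [5, 4],
              [3, 1]])
h = np.array([[1, 1],
              [-1, 1]])

Y = convolve2d(x, h)   # full 2-D linear convolution, shape (4, 3)
print(Y)
# [[ 2  3  1]
#  [ 3 10  5]
#  [-2  5  5]
#  [-3  2  1]]
```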
Kronecker Products

Definition:

$$\mathbf{A} \otimes \mathbf{B} = \{a(m,n)\mathbf{B}\} = \begin{bmatrix} a(1,1)\mathbf{B} & \cdots & a(1,N)\mathbf{B} \\ \vdots & & \vdots \\ a(M,1)\mathbf{B} & \cdots & a(M,N)\mathbf{B} \end{bmatrix}$$

(ex)

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} 1 & 1 & 2 & 2 \\ 1 & 1 & 2 & 2 \\ 3 & 3 & 4 & 4 \\ 3 & 3 & 4 & 4 \end{bmatrix}, \qquad \mathbf{B} \otimes \mathbf{A} = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 3 & 4 & 3 & 4 \\ 1 & 2 & 1 & 2 \\ 3 & 4 & 3 & 4 \end{bmatrix}$$

so $\mathbf{A} \otimes \mathbf{B} \ne \mathbf{B} \otimes \mathbf{A}$ in general.

Properties (Table 2.7):

$$(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{AC}) \otimes (\mathbf{BD})$$

Evaluating $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D})$ directly costs $O(N^6)$ operations (a product of $N^2 \times N^2$ matrices); evaluating $(\mathbf{AC}) \otimes (\mathbf{BD})$ costs only $O(N^4)$ operations.
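A minimal check of the product identity (random matrices, since it holds for any conformable factors):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((3, 3)) for _ in range(4))

lhs = np.kron(A, B) @ np.kron(C, D)   # O(N^6): product of N^2 x N^2 matrices
rhs = np.kron(A @ C, B @ D)           # O(N^4): small products, then one Kronecker

assert np.allclose(lhs, rhs)
```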
Separable transformation

Transformation on an $N \times M$ image $U$: consider

$$v(k,l) = \sum_m \sum_n u(m,n)\,t(k,l;m,n)$$

If $t(k,l;m,n) = a(k,m)\,b(l,n)$ (separable), then

$$v(k,l) = \sum_m \sum_n a(k,m)\,u(m,n)\,b(l,n)$$

$$\mathbf{V} = \mathbf{AUB}^T \quad \text{(matrix form)}, \qquad \mathbf{v} = (\mathbf{A} \otimes \mathbf{B})\,\mathbf{u} \quad \text{(vector form, row-ordered)}$$

Row-ordered form: let $\mathbf{v}_k$, $\mathbf{u}_m$ be the row vectors of $\mathbf{V}$ and $\mathbf{U}$; then

$$\mathbf{v}_k^T = \sum_m a(k,m)\,[\mathbf{B}\mathbf{u}_m^T] = \sum_m [\mathbf{A} \otimes \mathbf{B}]_{k,m}\,\mathbf{u}_m^T$$

$$\mathbf{v} = [\mathbf{v}_1^T \; \mathbf{v}_2^T \; \cdots \; \mathbf{v}_M^T]^T = (\mathbf{A} \otimes \mathbf{B})\,\mathbf{u}$$
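A minimal check that the matrix form and the Kronecker (row-ordered) vector form agree:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 3
U = rng.standard_normal((M, N))
A = rng.standard_normal((M, M))    # acts along the row index m
B = rng.standard_normal((N, N))    # acts along the column index n

V = A @ U @ B.T                    # matrix form V = A U B^T

u = U.flatten(order="C")           # row-ordered stacking of U
v = np.kron(A, B) @ u              # vector form v = (A kron B) u

assert np.allclose(v, V.flatten(order="C"))
```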
Random Signals
Definitions

Random signal: a sequence of random variables.

Mean: $\mu_u(n) = E[u(n)]$

Variance: $\sigma_u^2(n) = E[\,|u(n) - \mu(n)|^2\,]$

Covariance:
$$\operatorname{Cov}[u(n), u(n')] = \sigma_{uu}^2(n,n') = E\{[u(n) - \mu(n)][u^*(n') - \mu^*(n')]\}$$

Cross covariance:
$$\operatorname{Cov}[u(n), v(n')] = \sigma_{uv}^2(n,n') = E\{[u(n) - \mu_u(n)][v^*(n') - \mu_v^*(n')]\}$$

Autocorrelation:
$$r_{uu}(n,n') = E[u(n)\,u^*(n')] = \sigma_{uu}^2(n,n') + \mu(n)\,\mu^*(n')$$

Cross correlation:
$$r_{uv}(n,n') = E[u(n)\,v^*(n')] = \sigma_{uv}^2(n,n') + \mu_u(n)\,\mu_v^*(n')$$
Representation for an $N \times 1$ vector

$$E[\mathbf{u}] = \boldsymbol{\mu}_u = \{\mu(n)\} \quad : N \times 1 \text{ mean vector}$$

$$\operatorname{Cov}[\mathbf{u}] = E[(\mathbf{u} - \boldsymbol{\mu})(\mathbf{u} - \boldsymbol{\mu})^{*T}] = \mathbf{C}_u = \{\sigma_{uu}^2(n,n')\} \quad : N \times N \text{ covariance matrix}$$

$$\operatorname{Cov}[\mathbf{u}, \mathbf{v}] = E[(\mathbf{u} - \boldsymbol{\mu}_u)(\mathbf{v} - \boldsymbol{\mu}_v)^{*T}] = \mathbf{C}_{uv} = \{\sigma_{uv}^2(n,n')\} \quad : N \times N \text{ matrix}$$

Gaussian (or normal) distribution

$$p_u(u) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{|u - \mu|^2}{2\sigma^2}\right\}$$

Gaussian random processes

A process is a Gaussian random process if the joint probability density of any finite sub-sequence is a Gaussian distribution:

$$p_u(\mathbf{u}) = p_u(u_1, u_2, \ldots, u_N) = [(2\pi)^{N/2}\,|\mathbf{C}|^{1/2}]^{-1} \exp\{-\tfrac{1}{2}(\mathbf{u} - \boldsymbol{\mu})^{*T}\mathbf{C}^{-1}(\mathbf{u} - \boldsymbol{\mu})\}$$

$\mathbf{C}$: covariance matrix.
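Samples with a prescribed covariance C can be drawn through its Cholesky factor; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0, 0.5])
C = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])   # a valid (positive definite) covariance

L = np.linalg.cholesky(C)          # C = L L^T
z = rng.standard_normal((3, 100000))
u = mu[:, None] + L @ z            # u ~ N(mu, C)

print(np.cov(u))                   # sample covariance, close to C
```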
Stationary process

Strict-sense stationary: the joint density of any partial sequence $\{x(l),\; n \le l \le k\}$ is the same as that of the shifted sequence $\{x(l + n_0),\; n \le l \le k\}$:

$$F_{x(n), x(n+1), \ldots, x(k)}(x_n, x_{n+1}, \ldots, x_k) = F_{x(n+n_0), x(n+1+n_0), \ldots, x(k+n_0)}(x_{n+n_0}, x_{n+1+n_0}, \ldots, x_{k+n_0}), \quad \forall\, n, n_0, k$$

Wide-sense stationary:

$$E[u(n)] = \mu = \text{constant}, \qquad E[u(n)\,u^*(n')] = r_{uu}(n - n') \quad : \text{the covariance matrix is Toeplitz}$$

Gaussian process: wide-sense stationarity = strict-sense stationarity.
Markov processes

$p$-th order Markov:

$$\operatorname{prob}[u(n) \,|\, u(n-1), u(n-2), \ldots] = \operatorname{prob}[u(n) \,|\, u(n-1), \ldots, u(n-p)], \qquad \forall\, n$$

(ex) Covariance matrix of a first-order stationary Markov sequence $u(n)$ with

$$\sigma_{uu}^2(n) = \rho^{|n|}, \qquad |\rho| < 1, \; \forall\, n$$

$$\mathbf{C} = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{N-1} \\ \rho & 1 & \rho & & \vdots \\ \rho^2 & \rho & 1 & & \\ \vdots & & & \ddots & \rho \\ \rho^{N-1} & \cdots & & \rho & 1 \end{bmatrix} \quad : \text{Toeplitz}$$

Orthogonal: $E[xy^*] = 0$

Independent: $p_{x,y}(x,y) = p_x(x)\,p_y(y)$

Uncorrelated: $E[xy^*] = E[x]\,E[y^*]$, or $E[(x - \mu_x)(y - \mu_y)^*] = 0$
Karhunen-Loeve (KL) transform

KL transform of $\mathbf{x}$: $\mathbf{y} = \boldsymbol{\Phi}^{*T}\mathbf{x}$, where $\boldsymbol{\Phi}$ is the $N \times N$ unitary matrix that diagonalizes $\mathbf{R} = E[\mathbf{xx}^{*T}]$.

Property:

$$E[\mathbf{yy}^{*T}] = \boldsymbol{\Phi}^{*T} E[\mathbf{xx}^{*T}] \boldsymbol{\Phi} = \boldsymbol{\Phi}^{*T}\mathbf{R}\boldsymbol{\Phi} = \boldsymbol{\Lambda}$$

$$E[y(k)\,y^*(l)] = \lambda_k\,\delta(k - l)$$

The elements $y(k)$ are orthogonal. $\boldsymbol{\Phi}^{*T}$ is called the KL transform matrix; the rows of $\boldsymbol{\Phi}^{*T}$ are the conjugate eigenvectors of $\mathbf{R}$.
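A minimal sketch tying the last two subsections together: build the first-order Markov covariance, diagonalize it, and verify that the KL coefficients are uncorrelated:

```python
import numpy as np
from scipy.linalg import toeplitz

N, rho = 8, 0.9
R = toeplitz(rho ** np.arange(N))         # R(n,n') = rho^{|n-n'|}, Toeplitz

lam, Phi = np.linalg.eigh(R)              # R = Phi Lambda Phi^T
Lambda = Phi.T @ R @ Phi                  # covariance of y = Phi^T x

assert np.allclose(Lambda, np.diag(lam))  # E[y(k) y(l)] = lambda_k delta(k-l)
```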
Discrete Random Field
Definitions

Discrete random field: each sample of a 2-D sequence is a random variable.

Mean: $E[u(m,n)] = \mu(m,n)$

Covariance:
$$\operatorname{Cov}[u(m,n), u(m',n')] = \sigma_{uu}^2(m,n;m',n') = E\{[u(m,n) - \mu(m,n)][u^*(m',n') - \mu^*(m',n')]\}$$

White noise field:
$$\sigma_{xx}^2(m,n;m',n') = \sigma_{xx}^2(m,n)\,\delta(m - m', n - n')$$

Symmetry:
$$\sigma_{uu}^2(m,n;m',n') = \sigma_{uu}^{2\,*}(m',n';m,n)$$
Separable and isotropic image covariance functions

Separable:
$$\sigma_{xx}^2(m,n;m',n') = \sigma_1^2(m,m')\,\sigma_2^2(n,n') \quad \text{(nonstationary case)}$$
$$\sigma_{xx}^2(m,n) = \sigma_1^2(m)\,\sigma_2^2(n) \quad \text{(stationary case)}$$

Separable stationary covariance function:
$$\sigma_{xx}^2(m,n) = \sigma^2\,\rho_1^{|m|}\,\rho_2^{|n|}, \qquad |\rho_1| < 1, \; |\rho_2| < 1$$

Nonseparable exponential function:
$$\sigma_{xx}^2(m,n) = \sigma^2 \exp\{-\sqrt{\alpha_1 m^2 + \alpha_2 n^2}\}$$

which for $\rho_1 = \rho_2 = \rho$ reduces to
$$\sigma_{xx}^2(m,n) = \sigma^2 \rho^d, \qquad d \triangleq \sqrt{m^2 + n^2} \quad \text{(isotropic or circularly symmetric)}$$
Estimating the mean and autocorrelation from an $M \times N$ sample field:

$$\hat\mu = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} u(m,n)$$

$$\hat\sigma_{xx}^2(m,n) = \frac{1}{MN} \sum_{m'=1}^{M-m} \sum_{n'=1}^{N-n} [u(m',n') - \hat\mu]\,[u(m+m', n+n') - \hat\mu]$$
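A minimal sketch of these sample estimators for a real-valued field and non-negative lags (m, n):

```python
import numpy as np

def sample_mean_cov(u, m, n):
    """Sample mean and covariance estimate at lag (m, n) of a 2-D field u."""
    M, N = u.shape
    mu_hat = u.mean()
    d = u - mu_hat
    # Average of products of samples separated by (m, n), normalized by MN.
    cov_hat = np.sum(d[: M - m, : N - n] * d[m:, n:]) / (M * N)
    return mu_hat, cov_hat

u = np.random.default_rng(3).standard_normal((64, 64))
print(sample_mean_cov(u, 0, 0))   # lag (0,0): close to the field variance 1
print(sample_mean_cov(u, 1, 0))   # lag (1,0): close to 0 for white noise
```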
SDF (spectral density function)

Definition: the Fourier transform of the autocorrelation (covariance) function.

1-D case:

$$\text{SDF}\{u(n)\} = S_u(f) = \sum_{n} \sigma_{uu}^2(n)\,\exp(-j2\pi f n)$$
$$\sigma_{uu}^2(n) = \int_{-0.5}^{0.5} S_u(f)\,\exp(j2\pi f n)\,df$$

2-D case:

$$\text{SDF}\{u(m,n)\} = S_u(u,v) = \sum_{m}\sum_{n} \sigma_{uu}^2(m,n)\,\exp[-j2\pi(um + vn)]$$
$$\sigma_{uu}^2(m,n) = \int_{-0.5}^{0.5}\!\int_{-0.5}^{0.5} S_u(u,v)\,\exp[j2\pi(um + vn)]\,du\,dv$$

Average power:

$$\sigma_{uu}^2(0,0) = \int_{-0.5}^{0.5}\!\int_{-0.5}^{0.5} S_u(u,v)\,du\,dv$$
(ex) The SDF of a stationary white noise field: $\sigma_{xx}^2(m,n) = \sigma^2\,\delta(m,n) \;\Rightarrow\; S(u,v) = \sigma^2$
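A minimal 1-D check: summing the first-order Markov covariance $\sigma^2\rho^{|n|}$ numerically against the closed form $(1-\rho^2)\sigma^2 / (1 - 2\rho\cos 2\pi f + \rho^2)$ of the geometric series:

```python
import numpy as np

sigma2, rho = 2.0, 0.8
f = np.linspace(-0.5, 0.5, 101)

# Truncated sum of sigma^2 rho^{|n|} e^{-j 2 pi f n} over n = -K..K.
K = 200
n = np.arange(-K, K + 1)
S_num = (sigma2 * rho ** np.abs(n)
         * np.exp(-2j * np.pi * f[:, None] * n)).sum(axis=1).real

# Closed form of the two-sided geometric series.
S_ana = sigma2 * (1 - rho**2) / (1 - 2 * rho * np.cos(2 * np.pi * f) + rho**2)

assert np.allclose(S_num, S_ana, atol=1e-10)
```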
Estimation Theory
Mean square estimates

Estimate the random variable $x$ by a suitable function $g(\mathbf{y})$ such that

$$E[\,|x - g(\mathbf{y})|^2\,] = \iint [x - g(y)]^2 f_{xy}(x,y)\,dx\,dy$$

is minimized. Using $f_{xy}(x,y) = f_x(x|y)\,f_y(y)$:

$$E[\,|x - g(\mathbf{y})|^2\,] = \int f_y(y) \int [x - g(y)]^2 f_x(x|y)\,dx\,dy$$

Since the integrand is non-negative, it is sufficient to minimize

$$\int [x - g(y)]^2 f_x(x|y)\,dx$$

for every $y$, which gives

$$\hat{x} = g(\mathbf{y}) = \int x\,f_x(x|y)\,dx = E[x\,|\,\mathbf{y}]$$
This is the minimum mean square estimate (MMSE). Also,

$$E[\hat{x}] = E[g(\mathbf{y})] = E[E[x|\mathbf{y}]] = E[x]$$

so the estimator is unbiased.

◆ Theorem

Let $\mathbf{y} \triangleq [y_1\; y_2\; y_3\; \cdots\; y_n]^T$ and $x$ be jointly Gaussian with zero mean. The MMSE estimate is

$$E[x\,|\,\mathbf{y}] = \sum_{i=1}^{N} a_i y_i, \qquad \text{where the } a_i \text{ are chosen such that}$$

$$E\Big[\Big(x - \sum_{i=1}^{N} a_i y_i\Big)\,y_k\Big] = 0, \qquad \forall\, k = 1, 2, \ldots, N$$

(Pf) The random variables $\big(x - \sum_{i=1}^{N} a_i y_i\big), y_1, y_2, \ldots, y_n$ are jointly Gaussian. Since the first one is uncorrelated with all the rest, it is independent of them. Thus, the error $\big(x - \sum_{i=1}^{N} a_i y_i\big)$ is independent of the random vector $\mathbf{y}$.
$$E\Big[\Big(x - \sum_{i=1}^{N} a_i y_i\Big)\,\Big|\,\mathbf{y}\Big] = E\Big[x - \sum_{i=1}^{N} a_i y_i\Big] = E[x] - \sum_{i=1}^{N} a_i E[y_i] = 0$$

$$\Rightarrow \quad E[x\,|\,\mathbf{y}] = \sum_{i=1}^{N} a_i E[y_i\,|\,\mathbf{y}] = \sum_{i=1}^{N} a_i y_i$$

To determine the coefficients, write $\hat{x} = \sum_n \alpha(n)\,y(n)$ and minimize

$$\min_{\{\alpha(n)\}} E[(x - \hat{x})^2] = \min_{\{\alpha(n)\}} E[e^2], \qquad \text{where } e = \sum_{n=1}^{N} \alpha(n)\,y(n) - x \;:\; \text{estimation error}$$

Setting

$$\frac{\partial E[e^2]}{\partial \alpha(n)} = 0 \qquad \text{yields} \qquad E[e\,y(n)] = 0, \qquad n = 1, 2, \ldots, N$$

The estimation error is minimized if $E[e\,y(n)] = 0$ for $n = 1, 2, \ldots, N$ : the orthogonality principle.

If $x$ and $\{y(n)\}$ are independent: $\hat{x} = E[x\,|\,\mathbf{y}] = E[x]$.

If $x$ and $\{y(n)\}$ are zero mean Gaussian random variables:

$$\hat{x} = \sum_{n=1}^{N} \alpha(n)\,y(n) \quad : \text{a linear combination of } \{y(n)\}$$

where $\alpha(n)$ is determined by solving linear equations.
Orthogonality principle

The minimum mean square estimation error is orthogonal to every random variable functionally related to the observations, i.e., for any $g(\mathbf{y}) = g(y(1), y(2), \ldots, y(N))$:

$$E[(x - \hat{x})\,g(\mathbf{y})] = 0$$

(figure: $x$, $\hat{x}$, and the error $x - \hat{x}$; the error is orthogonal to the subspace of functions $g(\mathbf{y})$ of the observations, onto which $\hat{x}$ is the projection of $x$)

$$E[\hat{x}\,g(\mathbf{y})] = E[E(x|\mathbf{y})\,g(\mathbf{y})] = E[E[x\,g(\mathbf{y})\,|\,\mathbf{y}]] = E[x\,g(\mathbf{y})]$$

Since $\hat{x}$ is a function of $\mathbf{y}$:

$$E[(x - \hat{x})\,\hat{x}] = 0, \qquad E[(x - \hat{x})\,g(\hat{x})] = 0$$

Substituting $\hat{x} = \sum_{n=1}^{N} \alpha(n)\,y(n)$:

$$\sum_{n=1}^{N} \alpha(n)\,E[y(k)\,y(n)] = E[x\,y(k)], \qquad k = 1, \ldots, N$$

In matrix notation:

$$\boldsymbol{\alpha} = \mathbf{R}_y^{-1}\,\mathbf{r}_{xy}, \qquad (\boldsymbol{\alpha} = \{\alpha(n)\},\; \mathbf{r}_{xy} = \{E[x\,y(n)]\})$$

Minimum MSE: $\sigma_e^2 = \sigma_x^2 - \boldsymbol{\alpha}^T\mathbf{r}_{xy}$

If $x, y(n)$ are nonzero mean random variables:

$$\hat{x} = \mu_x + \sum_{n=1}^{N} \alpha(n)\,[y(n) - \mu_y(n)]$$

If $x, y(n)$ are non-Gaussian, the results still give the best linear mean square estimate.
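A minimal sketch of the linear MMSE solution α = R_y⁻¹ r_xy on synthetic data (the chosen weights and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 3, 200000
y = rng.standard_normal((N, T))                                   # zero-mean observations
x = np.array([0.5, -1.0, 2.0]) @ y + 0.1 * rng.standard_normal(T) # target variable

Ry = (y @ y.T) / T                  # sample R_y = E[y y^T]
rxy = (y @ x) / T                   # sample r_xy = {E[x y(n)]}
alpha = np.linalg.solve(Ry, rxy)    # normal equations

mse = np.mean((x - alpha @ y) ** 2) # approx. sigma_x^2 - alpha^T r_xy
print(alpha, mse)                   # alpha near [0.5, -1.0, 2.0], mse near 0.01
```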
Information Theory
Information

$$I_k = -\log_2 p_k \quad [\text{bits}]$$

$p_k,\; k = 1, \ldots, L$ : probabilities of the independent messages $r_k$.

Entropy

$$H = -\sum_{k=1}^{L} p_k \log_2 p_k \quad [\text{bits/message}]$$

The entropy is maximized, subject to $\sum_{k=1}^{L} p_k = 1$, when all messages are equally likely ($p_k = 1/L$):

$$\max_{p_k} H = -\sum_{k=1}^{L} \frac{1}{L}\log_2\frac{1}{L} = \log_2 L \quad [\text{bits}]$$

For a binary source, i.e., $L = 2$, $p_1 = p$, $p_2 = 1 - p_1$, $0 \le p \le 1$:

$$H = -p\log_2 p - (1-p)\log_2(1-p) \triangleq H(p)$$

(figure: $H(p)$ versus $p$; the curve is 0 at $p = 0$ and $p = 1$ and peaks at 1 bit at $p = 0.5$)
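A minimal sketch evaluating H(p):

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return np.nan_to_num(h)  # convention: 0 log 0 := 0

print(binary_entropy([0.0, 0.1, 0.5, 0.9, 1.0]))  # peaks at 1 bit for p = 0.5
```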
Let $x$ be a discrete random variable with $S_x = \{1, 2, \ldots, K\}$, $p_k = \Pr[x = k]$, and let the event $A_k \triangleq \{x = k\}$. The uncertainty of $A_k$ is low if $p_k$ is close to one, and high if $p_k$ is small.

Uncertainty of an event:

$$I(x = k) = \ln\frac{1}{\Pr(x = k)} \;\ge 0, \qquad = 0 \;\text{ if } \Pr(x = k) = 1$$

Entropy:

$$H_x = E[I(x = k)] = \sum_{k=1}^{K} \Pr(x = k)\,\ln\frac{1}{\Pr(x = k)}$$

Unit: bits, when the logarithm is base 2.
Entropy as a measure of information

Consider the event $A_k$, describing the emission of symbol $s_k$ by the source with probability $p_k$:

1) If $p_k = 1$ and $p_i = 0$ for all $i \ne k$: no surprise ⇒ no information when $s_k$ is emitted by the source.

2) If $p_k$ is low: more surprise ⇒ more information when $s_k$ is emitted by the source.

$$I(s_k) = \log\frac{1}{p_k} \quad : \text{amount of information gained after observing the event } s_k$$

$$H_x = E[I(s_k)] \quad : \text{average information per source symbol}$$
Ex) 16 balls: four balls labeled "1", four balls "2", two balls "3", two balls "4", and one ball each labeled "5", "6", "7", "8".

Question: find out the number of the ball through a series of yes/no questions.

$$H_x = 2\cdot\frac{1}{4}\log_2 4 + 2\cdot\frac{1}{8}\log_2 8 + 4\cdot\frac{1}{16}\log_2 16 = \frac{44}{16} \;\text{bits/ball}$$

1) Ask the questions in sequence: "x = 1?", "x = 2?", ..., "x = 7?" (after a "no" to "x = 7?", the ball must be 8). The average number of questions asked:

$$E[L] = 1\Big(\tfrac{1}{4}\Big) + 2\Big(\tfrac{1}{4}\Big) + 3\Big(\tfrac{1}{8}\Big) + 4\Big(\tfrac{1}{8}\Big) + 5\Big(\tfrac{1}{16}\Big) + 6\Big(\tfrac{1}{16}\Big) + 7\Big(\tfrac{1}{16}\Big) + 7\Big(\tfrac{1}{16}\Big) = \frac{51}{16}$$
2) Ask binary-search-style questions instead: "x ≤ 2?"; if yes, "x = 1?" decides between 1 and 2; if no, "x ≤ 4?", where a yes is followed by "x = 3?" to decide between 3 and 4, and a no by "x ≤ 6?" and then "x = 5?" or "x = 7?" to decide among 5 through 8. The average number of questions:

$$E[L] = 2\Big(\tfrac{1}{4}\Big) + 2\Big(\tfrac{1}{4}\Big) + 3\Big(\tfrac{1}{8}\Big) + 3\Big(\tfrac{1}{8}\Big) + 4\Big(\tfrac{1}{16}\Big) + 4\Big(\tfrac{1}{16}\Big) + 4\Big(\tfrac{1}{16}\Big) + 4\Big(\tfrac{1}{16}\Big) = \frac{44}{16}$$

⇒ The problem of designing the series of questions to identify x is exactly the same as the problem of encoding the output of an information source.
Fixed-length code (3 bits/symbol) versus the variable-length code read off from the question sequence of strategy 2 (yes → 1, no → 0):

x = 1 : 000 | yes / yes ⇒ 11 | p_k = 1/4
x = 2 : 001 | yes / no ⇒ 10 | p_k = 1/4
x = 3 : 010 | no / yes / yes ⇒ 011 | p_k = 1/8
x = 4 : 011 | no / yes / no ⇒ 010 | p_k = 1/8
x = 5 : 100 | no / no / yes / yes ⇒ 0011 | p_k = 1/16
x = 6 : 101 | no / no / yes / no ⇒ 0010 | p_k = 1/16
x = 7 : 110 | no / no / no / yes ⇒ 0001 | p_k = 1/16
x = 8 : 111 | no / no / no / no ⇒ 0000 | p_k = 1/16

⇒ Huffman code
⇒ short codes are assigned to frequent source symbols, long codes to rare source symbols
⇒ the entropy of x represents the minimum average number of bits required to identify the outcome of x
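A minimal sketch comparing the two codes' average lengths with the entropy for this example:

```python
import numpy as np

p = np.array([1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16])
var_len = np.array([2, 2, 3, 3, 4, 4, 4, 4])   # lengths of the variable-length code above

H = np.sum(p * np.log2(1 / p))                  # entropy: 2.75 bits/symbol
print(H, p @ var_len, 3.0)                      # 2.75, 2.75, 3.0 (fixed-length code)
```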
Noiseless Coding Theorem (1948, Shannon)

$$\min(R) = H(x) + \varepsilon \quad \text{bits/symbol}$$

where R is the transmission rate and ε is a positive quantity that can be made arbitrarily close to zero by a sophisticated coding procedure utilizing an appropriate amount of encoding delay.
Rate distortion function

Distortion: let x be a Gaussian random variable of variance $\sigma^2$ and y its reproduced value; the mean square distortion is

$$D = E[(x - y)^2]$$

Rate distortion function of x:

$$R_D = \begin{cases} \frac{1}{2}\log_2\Big(\dfrac{\sigma^2}{D}\Big), & D \le \sigma^2 \\[6pt] 0, & D > \sigma^2 \end{cases} \;=\; \max\Big[0,\; \frac{1}{2}\log_2\Big(\frac{\sigma^2}{D}\Big)\Big]$$

For a fixed average distortion D, with $\{x(0), x(1), \ldots, x(N-1)\}$ Gaussian random variables and $\{y(0), y(1), \ldots, y(N-1)\}$ their reproduced values:

$$R_D = \frac{1}{N}\sum_{k=0}^{N-1} \max\Big[0,\; \frac{1}{2}\log_2\Big(\frac{\lambda_k}{\theta}\Big)\Big]$$

where $\theta$ is determined by solving

$$D = \frac{1}{N}\sum_{k=0}^{N-1} \min[\theta, \lambda_k]$$
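A minimal sketch of this computation (often called reverse water-filling): bisect on θ, which is our implementation choice, until the target distortion D is met, then evaluate R_D. The example variances are illustrative:

```python
import numpy as np

def rate_distortion(lam, D):
    """R_D for Gaussian components with variances lam at average distortion D."""
    lo, hi = 0.0, float(np.max(lam))
    for _ in range(100):                        # bisection on theta
        theta = 0.5 * (lo + hi)
        if np.mean(np.minimum(theta, lam)) < D:
            lo = theta                          # distortion too small: raise theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    return np.mean(np.maximum(0.0, 0.5 * np.log2(lam / theta)))

lam = np.array([4.0, 2.0, 1.0, 0.5])            # example component variances
print(rate_distortion(lam, D=0.25))             # bits per sample
```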
(figure: rate distortion function $R_D$ versus D for a Gaussian source; $R_D$ decreases monotonically and reaches 0 at $D = \sigma^2$)