
Mathematical Preliminaries
Matrix Theory

Vectors


nth element of vector u : u(n)
u = {u(n)} = [ u(1)  u(2)  ⋯  u(N) ]^T    (column vector)

Matrix

mth row and nth column of A : a(m,n)

A = {a(m,n)} = [ a(1,1)  a(1,2)  ⋯  a(1,N) ]
               [ a(2,1)  a(2,2)  ⋯  a(2,N) ]
               [   ⋮        ⋮            ⋮  ]
               [ a(M,1)  a(M,2)  ⋯  a(M,N) ]
             = [ a_1  a_2  ⋯  a_N ]

where a_k = [ a(1,k)  a(2,k)  ⋯  a(M,k) ]^T is the kth column vector
Lexicographic Ordering (Stacking Operation)

Row-ordered form of a matrix :
x = [ x(1,1) x(1,2) ⋯ x(1,N)  x(2,1) ⋯ x(2,N)  ⋯  x(M,1) ⋯ x(M,N) ]^T
  = [ r_1^T  r_2^T  ⋯  r_M^T ]^T ,
where r_k = [ x(k,1)  x(k,2)  ⋯  x(k,N) ]^T holds the kth row of X.

Column-ordered form of a matrix :
x = [ x(1,1) x(2,1) ⋯ x(M,1)  x(1,2) ⋯ x(M,2)  ⋯  x(1,N) ⋯ x(M,N) ]^T
  = [ c_1^T  c_2^T  ⋯  c_N^T ]^T ,
where c_k = [ x(1,k)  x(2,k)  ⋯  x(M,k) ]^T is the kth column of X.
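In numpy (used here purely as an illustrative sketch) the two orderings correspond to ravel with order='C' and order='F'; the 3 x 2 array below is a made-up example:

    import numpy as np

    X = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])              # a made-up 3 x 2 array {x(m, n)}

    x_row = X.ravel(order='C')          # row-ordered form:    [1 2 3 4 5 6]
    x_col = X.ravel(order='F')          # column-ordered form: [1 3 5 2 4 6]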


Transposition and conjugation rules
A^{*T} = [A^T]^* ,         [AB]^T = B^T A^T
[A^{-1}]^T = [A^T]^{-1} ,  [AB]^* = A^* B^*
Toeplitz matrices
t(m,n) = t_{m-n}   (constant along each diagonal)

T = [ t_0      t_{-1}   t_{-2}   ⋯  t_{-(N-1)} ]
    [ t_1      t_0      t_{-1}   ⋯  t_{-(N-2)} ]
    [ t_2      t_1      t_0      ⋯      ⋮      ]
    [  ⋮                         ⋱   t_{-1}    ]
    [ t_{N-1}  t_{N-2}   ⋯       t_1   t_0     ]

Circulant matrices
c(m,n) = c((m - n) modulo N) = c((m - n) % N)   (each row is a circular shift of the previous one)

C = [ c_0      c_{N-1}  c_{N-2}  ⋯  c_1 ]
    [ c_1      c_0      c_{N-1}  ⋯  c_2 ]
    [ c_2      c_1      c_0      ⋯  c_3 ]
    [  ⋮                         ⋱   ⋮  ]
    [ c_{N-1}  c_{N-2}   ⋯   c_1    c_0 ]
Linear convolution using Toeplitz matrix
h(n) = 0  for n < 0, n ≥ N_h
x(n) = 0  for n < 0, n ≥ N_x

y(n) = h(n) * x(n) = Σ_{k=0}^{N_x - 1} h(n - k) x(k)

y(0) = Σ_{k=0}^{N_x - 1} h(-k) x(k)     = h(0) x(0)
y(1) = Σ_{k=0}^{N_x - 1} h(1 - k) x(k)  = Σ_{k=0}^{1} h(1 - k) x(k) = h(1) x(0) + h(0) x(1)
y(l) = Σ_{k=0}^{N_x - 1} h(l - k) x(k)  = Σ_{k=0}^{l} h(l - k) x(k) = h(l) x(0) + h(l - 1) x(1) + ⋯ + h(0) x(l)
y = Hx ,   y : (N_h + N_x - 1) x 1 ,   H : (N_h + N_x - 1) x N_x ,   x : N_x x 1

[ y(0)             ]   [ h(0)        0           0          ⋯   0          ]
[ y(1)             ]   [ h(1)        h(0)        0          ⋯   0          ]  [ x(0)       ]
[ y(2)             ]   [ h(2)        h(1)        h(0)       ⋯   0          ]  [ x(1)       ]
[  ⋮               ] = [  ⋮                                 ⋱    ⋮          ]  [   ⋮        ]
[                  ]   [ h(N_h - 1)  h(N_h - 2)              ⋯  h(0)       ]  [ x(N_x - 1) ]
[                  ]   [ 0           h(N_h - 1)  h(N_h - 2)  ⋯             ]
[  ⋮               ]   [  ⋮                                  ⋱   ⋮          ]
[ y(N_h + N_x - 2) ]   [ 0           0           0          ⋯  h(N_h - 1)  ]

H (Toeplitz matrix)
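As a numerical sanity check of this construction, the sketch below (arbitrary h and x) builds the (N_h + N_x - 1) x N_x matrix H entry by entry and compares Hx with numpy's linear convolution:

    import numpy as np

    h = np.array([1.0, -2.0, 3.0])             # Nh = 3 (arbitrary example)
    x = np.array([4.0, 0.0, 1.0, -1.0, 2.0])   # Nx = 5 (arbitrary example)
    Nh, Nx = len(h), len(x)

    # H(n, k) = h(n - k) when 0 <= n - k < Nh, and 0 otherwise
    H = np.zeros((Nh + Nx - 1, Nx))
    for n in range(Nh + Nx - 1):
        for k in range(Nx):
            if 0 <= n - k < Nh:
                H[n, k] = h[n - k]

    print(np.allclose(H @ x, np.convolve(h, x)))   # True: y = Hx is the linear convolution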
Circular convolution using circulant matrix

N-point circular convolution :
h̃(n) = Σ_{k=-∞}^{∞} h(n - kN)      (periodic extension of h)
x(n) = 0  for n < 0, n ≥ N

y(n) = h(n) ⊛_N x(n) = Σ_{k=0}^{N-1} h̃(n - k) x(k) ,  0 ≤ n < N   (and 0 otherwise)

y(0) = Σ_{k=0}^{N-1} h̃(-k) x(k)     = h(0) x(0) + Σ_{k=1}^{N-1} h(N - k) x(k)
y(1) = Σ_{k=0}^{N-1} h̃(1 - k) x(k)  = h(1) x(0) + h(0) x(1) + Σ_{k=2}^{N-1} h(N + 1 - k) x(k)
y(l) = Σ_{k=0}^{N-1} h̃(l - k) x(k)  = h(l) x(0) + h(l - 1) x(1) + ⋯ + h(0) x(l) + Σ_{k=l+1}^{N-1} h(N + l - k) x(k)
y = Hx

[ y(0)    ]   [ h(0)     h(N-1)   h(N-2)   ⋯   h(2)   h(1)    ]  [ x(0)    ]
[ y(1)    ]   [ h(1)     h(0)     h(N-1)   ⋯   h(3)   h(2)    ]  [ x(1)    ]
[ y(2)    ] = [ h(2)     h(1)     h(0)     ⋯   h(4)   h(3)    ]  [ x(2)    ]
[  ⋮      ]   [  ⋮                          ⋱                  ]  [   ⋮     ]
[ y(N-2)  ]   [ h(N-2)    ⋯       h(2)    h(1)   h(0)  h(N-1)  ]  [ x(N-2)  ]
[ y(N-1)  ]   [ h(N-1)   h(N-2)    ⋯      h(2)   h(1)  h(0)    ]  [ x(N-1)  ]

H (circulant matrix)
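A small sketch (arbitrary length-N sequences) that builds the circulant matrix H(m,n) = h((m - n) mod N) and checks Hx against circular convolution computed through the DFT:

    import numpy as np

    N = 6
    h = np.array([3.0, 0.0, 1.0, 2.0, -1.0, 4.0])
    x = np.array([1.0, 2.0, 0.0, -2.0, 5.0, 1.0])

    m, n = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    H = h[(m - n) % N]                                   # circulant: H[m, n] = h[(m - n) mod N]

    y_matrix = H @ x
    y_dft = np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)))   # circular convolution via DFT
    print(np.allclose(y_matrix, y_dft))                  # True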
Circular convolution + zero padding ⇒ linear convolution
h(n) = 0  for n < 0, n ≥ N_h ,    x(n) = 0  for n < 0, n ≥ N_x
Circular convolution with period N ≥ N_x + N_h - 1 gives the same result as linear convolution.
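This zero-padding statement is easy to verify numerically; in the sketch below (made-up data) both sequences are padded to length N = N_x + N_h - 1 before the circular convolution:

    import numpy as np

    h = np.array([1.0, 2.0, -1.0])         # Nh = 3
    x = np.array([2.0, 0.0, 3.0, 1.0])     # Nx = 4
    N = len(h) + len(x) - 1                # period N = Nx + Nh - 1 = 6

    hp = np.pad(h, (0, N - len(h)))        # zero-pad both sequences to length N
    xp = np.pad(x, (0, N - len(x)))
    y_circ = np.real(np.fft.ifft(np.fft.fft(hp) * np.fft.fft(xp)))

    print(np.allclose(y_circ, np.convolve(h, x)))   # True: identical to the linear convolution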
(ex) Linear convolution as a Toeplitz matrix operation
h(n) = n ,  -1 ≤ n ≤ 1       (L = 3 filter taps, N = 5 input samples)

y(n) = h(n) * x(n) = Σ_{k=0}^{4} h(n - k) x(k)

[ y(-1) ]   [ -1   0   0   0   0 ]
[ y(0)  ]   [  0  -1   0   0   0 ]  [ x(0) ]
[ y(1)  ]   [  1   0  -1   0   0 ]  [ x(1) ]
[ y(2)  ] = [  0   1   0  -1   0 ]  [ x(2) ]
[ y(3)  ]   [  0   0   1   0  -1 ]  [ x(3) ]
[ y(4)  ]   [  0   0   0   1   0 ]  [ x(4) ]
[ y(5)  ]   [  0   0   0   0   1 ]

M = N + L - 1 = 5 + 3 - 1 = 7 output samples
(ex) Circular convolution as a circulant matrix operation
h(n) = n + 3 ,  h(n) = h(n + N) ,  N = 4

y(n) = Σ_{k=0}^{N-1} h̃(n - k) x(k) ,  0 ≤ n ≤ N - 1

[ y(0) ]   [ 3  2  1  0 ]  [ x(0) ]
[ y(1) ] = [ 0  3  2  1 ]  [ x(1) ]
[ y(2) ]   [ 1  0  3  2 ]  [ x(2) ]
[ y(3) ]   [ 2  1  0  3 ]  [ x(3) ]
Orthogonal and unitary matrices
Orthogonal : A^{-1} = A^T       or  AA^T = A^T A = I
Unitary    : A^{-1} = A^{*T}    or  AA^{*T} = A^{*T} A = I
Positive definiteness and quadratic forms
A is called positive definite if A is a Hermitian matrix and
    Q = x^{*T} A x > 0 ,  ∀ x ≠ 0

A is called positive semidefinite (nonnegative) if A is a Hermitian matrix and
    Q = x^{*T} A x ≥ 0 ,  ∀ x ≠ 0

Theorem
If A is a symmetric positive definite matrix, then all its eigenvalues are positive and the
determinant of A satisfies
    |A| = Π_{k=1}^{N} λ_k ≤ Π_{k=1}^{N} a(k,k)
Diagonal forms

For any Hermitian matrix R there exists a unitary matrix Φ such that
    Φ^{*T} R Φ = Λ      (or R = Φ Λ Φ^{*T})
Λ : diagonal matrix containing the eigenvalues of R

Eigenvalue and eigenvector
    R φ_k = λ_k φ_k ,  k = 1, …, N
λ_k : eigenvalue ,  φ_k : eigenvector ,  where Φ = [ φ_1 | φ_2 | ⋯ | φ_N ]
Block Matrices

Block matrices : matrices whose elements are themselves matrices

A = [ A_{1,1}  A_{1,2}  ⋯  A_{1,n} ]
    [ A_{2,1}  A_{2,2}  ⋯  A_{2,n} ]
    [   ⋮        ⋮             ⋮   ]
    [ A_{m,1}  A_{m,2}  ⋯  A_{m,n} ]

(ex) 2-D convolution
y(m,n) = Σ_{m'=0}^{2} Σ_{n'=0}^{1} x(m',n') h(m - m', n - n') ,   0 ≤ m ≤ 3 , 0 ≤ n ≤ 2

x(m,n) = [ 2  1 ]     h(m,n) = [  1  1 ]     y(m,n) = [  2   3  1 ]
         [ 5  4 ]              [ -1  1 ]              [  3  10  5 ]
         [ 3  1 ]                                     [ -2   5  5 ]
                                                      [ -3   2  1 ]

Column stacking operation :
X = [ 2  1 ]
    [ 5  4 ]  = [ x_0  x_1 ] ,   x_0 = [2 5 3]^T ,  x_1 = [1 4 1]^T
    [ 3  1 ]

Y = [  2   3  1 ]
    [  3  10  5 ]  = [ y_0  y_1  y_2 ] ,   y = [ y_0 ]
    [ -2   5  5 ]                              [ y_1 ]
    [ -3   2  1 ]                              [ y_2 ]

Let x_n and y_n be the column vectors of X and Y; then
    y_n = Σ_{n'=0}^{1} H_{n-n'} x_{n'} ,   where H_n = {h(m - m', n)} ,  0 ≤ m ≤ 3 , 0 ≤ m' ≤ 2

H_0 = [  1  0  0 ]        H_1 = [ 1  0  0 ]
      [ -1  1  0 ]              [ 1  1  0 ]
      [  0 -1  1 ]              [ 0  1  1 ]
      [  0  0 -1 ]              [ 0  0  1 ]

[ y_0 ]   [ H_0   0  ]  [ x_0 ]
[ y_1 ] = [ H_1  H_0 ]  [ x_1 ]        y = Hx ,   H : block matrix
[ y_2 ]   [  0   H_1 ]
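The example can be reproduced with a short numpy sketch: the block rows y_n = Σ_{n'} H_{n-n'} x_{n'} give the columns of Y, and the result agrees with a direct 2-D linear convolution (computed here through zero-padded 2-D DFTs):

    import numpy as np

    X = np.array([[2, 1],
                  [5, 4],
                  [3, 1]])                 # x(m, n)
    h = np.array([[1, 1],
                  [-1, 1]])                # h(m, n)
    M = X.shape[0]

    def conv_matrix(col, m_in):
        # {h(m - m')} of size (m_in + len(col) - 1) x m_in
        H = np.zeros((m_in + len(col) - 1, m_in), dtype=col.dtype)
        for m in range(H.shape[0]):
            for mp in range(m_in):
                if 0 <= m - mp < len(col):
                    H[m, mp] = col[m - mp]
        return H

    H0 = conv_matrix(h[:, 0], M)           # built from h(m, 0) = [1, -1]
    H1 = conv_matrix(h[:, 1], M)           # built from h(m, 1) = [1,  1]

    x0, x1 = X[:, 0], X[:, 1]
    Y = np.column_stack([H0 @ x0, H1 @ x0 + H0 @ x1, H1 @ x1])   # [y0 y1 y2]
    print(Y)                               # [[2 3 1] [3 10 5] [-2 5 5] [-3 2 1]]

    Y_fft = np.real(np.fft.ifft2(np.fft.fft2(X, s=(4, 3)) * np.fft.fft2(h, s=(4, 3))))
    print(np.allclose(Y, Y_fft))           # True: same as the direct 2-D convolution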
Kronecker Products

Definition
A ⊗ B = {a(m,n)B} = [ a(1,1)B    ⋯  a(1,M_1)B   ]
                    [    ⋮               ⋮       ]
                    [ a(M_1,1)B  ⋯  a(M_1,M_1)B ]

(ex)
A = [ 1  2 ] ,   B = [ 1  -1 ]
    [ 3  4 ]         [ 1   1 ]

A ⊗ B = [ 1  -1   2  -2 ]        B ⊗ A = [ 1   2  -1  -2 ]
        [ 1   1   2   2 ]                [ 3   4  -3  -4 ]
        [ 3  -3   4  -4 ]                [ 1   2   1   2 ]
        [ 3   3   4   4 ]                [ 3   4   3   4 ]

Properties (Table 2.7)
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
Evaluating (A ⊗ B)(C ⊗ D) directly : O(N^6) operations
Evaluating it as (AC) ⊗ (BD)       : O(N^4) operations
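np.kron reproduces the example, and the O(N^6)-versus-O(N^4) identity can be spot-checked with arbitrary matrices C and D (chosen at random here):

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[1, -1], [1, 1]])
    print(np.kron(A, B))                   # the 4 x 4 matrix A (x) B above
    print(np.kron(B, A))                   # B (x) A differs from A (x) B in general

    rng = np.random.default_rng(0)
    C = rng.standard_normal((2, 2))        # arbitrary matrices for the identity check
    D = rng.standard_normal((2, 2))
    lhs = np.kron(A, B) @ np.kron(C, D)
    rhs = np.kron(A @ C, B @ D)
    print(np.allclose(lhs, rhs))           # True: (A (x) B)(C (x) D) = (AC) (x) (BD)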
Separable transformation

Transformation on an N x M image U :
v(k,l) = Σ_m Σ_n u(m,n) t(k,l; m,n)
       = Σ_m Σ_n a(k,m) u(m,n) b(l,n) ,   if t(k,l; m,n) = a(k,m) b(l,n)

V = A U B^T        : matrix form
v = (A ⊗ B) u      : vector form (row-ordered form)

Let u_m , v_k be the row vectors of U and V; then
v_k^T = Σ_m a(k,m) [B u_m^T] = Σ_m [A ⊗ B]_{k,m} u_m^T
v = [ v_1^T  v_2^T  ⋯  v_M^T ]^T = (A ⊗ B) u
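A sketch of the equivalence between the matrix form and the vector form, with random placeholder matrices; the row-ordered vector of U is obtained with ravel(order='C') as in the stacking operation above:

    import numpy as np

    rng = np.random.default_rng(1)
    N, M = 4, 3
    U = rng.standard_normal((N, M))        # N x M image
    A = rng.standard_normal((N, N))        # acts along the row index m
    B = rng.standard_normal((M, M))        # acts along the column index n

    V_matrix = A @ U @ B.T                 # matrix form  V = A U B^T

    u = U.ravel(order='C')                 # row-ordered form of U
    v = np.kron(A, B) @ u                  # vector form  v = (A (x) B) u
    print(np.allclose(V_matrix, v.reshape(N, M)))   # True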
Random Signals

Definitions

Random signal : a sequence of random variables
Mean : μ_u(n) = E[u(n)]
Variance : σ_u^2(n) = E[ |u(n) - μ(n)|^2 ]
Covariance : Cov[u(n), u(n')] = σ_uu^2(n,n') = E{ [u(n) - μ(n)] [u^*(n') - μ^*(n')] }
Cross covariance : Cov[u(n), v(n')] = σ_uv^2(n,n') = E{ [u(n) - μ_u(n)] [v^*(n') - μ_v^*(n')] }
Autocorrelation : r_uu(n,n') = E[u(n) u^*(n')] = σ_uu^2(n,n') + μ(n) μ^*(n')
Cross correlation : r_uv(n,n') = E[u(n) v^*(n')] = σ_uv^2(n,n') + μ_u(n) μ_v^*(n')
Representation for an N x 1 vector
E[u] = μ = {μ(n)}                                              : N x 1 mean vector
Cov[u] = E[(u - μ)(u - μ)^{*T}] = C = {σ_uu^2(n,n')}           : N x N covariance matrix
Cov[u,v] = E[(u - μ_u)(v - μ_v)^{*T}] = C_uv = {σ_uv^2(n,n')}  : N x N matrix

Gaussian (or Normal) distribution
p_u(u) = (1 / √(2πσ^2)) exp{ -|u - μ|^2 / (2σ^2) }

Gaussian random processes
A process is a Gaussian random process if the joint probability density of any finite
sub-sequence is a Gaussian distribution :
p_u(u) = p_u(u_1, u_2, …, u_N) = [ (2π)^{N/2} |C|^{1/2} ]^{-1} exp{ -(1/2) (u - μ)^{*T} C^{-1} (u - μ) }
C : covariance matrix
Stationary process

Strict-sense stationary :
the joint density of any partial sequence {x(l), n ≤ l ≤ k} is the same as that of the
shifted sequence {x(l + n_0), n ≤ l ≤ k} :
F_{x(n),x(n+1),…,x(k)}(x_n, x_{n+1}, …, x_k)
  = F_{x(n+n_0),x(n+1+n_0),…,x(k+n_0)}(x_{n+n_0}, x_{n+1+n_0}, …, x_{k+n_0}) ,   ∀ n, n_0, k

Wide-sense stationary :
E[u(n)] = μ = constant
E[u(n) u^*(n')] = r_uu(n - n')   : the covariance matrix is Toeplitz

Gaussian process : wide-sense stationary = strict-sense stationary
Markov processes
p-th order Markov :
prob[u(n) | u(n-1), u(n-2), …] = prob[u(n) | u(n-1), …, u(n-p)] ,  ∀ n

(ex) Covariance matrix of a first-order stationary Markov sequence u(n)
σ_uu^2(n) = ρ^{|n|} ,  |ρ| < 1 ,  ∀ n

C = [ 1        ρ        ρ^2    ⋯  ρ^{N-1} ]
    [ ρ        1        ρ      ⋯  ρ^{N-2} ]
    [  ⋮                        ⋱    ⋮     ]
    [ ρ^{N-1}  ρ^{N-2}   ⋯     ρ     1     ]      : Toeplitz

Orthogonal   : E[xy^*] = 0
Independent  : p_{x,y}(x,y) = p_x(x) p_y(y)
Uncorrelated : E[xy^*] = E[x] E[y^*]   or   E[(x - μ_x)(y - μ_y)^*] = 0
Karhunen-Loeve (KL) transform

KL transform of x :
y = Φ^{*T} x ,    Φ : N x N unitary matrix

Property
E[yy^{*T}] = Φ^{*T} E[xx^{*T}] Φ = Φ^{*T} R Φ = Λ
E[y(k) y^*(l)] = λ_k δ(k - l)

The elements y(k) of y are orthogonal
Φ^{*T} is called the KL transform matrix
The rows of Φ^{*T} are the conjugate eigenvectors of R
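A numpy sketch of the KL transform for the first-order Markov covariance used earlier: np.linalg.eigh supplies Φ and Λ, and transforming synthetic samples shows that the coefficients y(k) are (approximately) uncorrelated:

    import numpy as np

    N, rho = 8, 0.9
    n = np.arange(N)
    R = rho ** np.abs(n[:, None] - n[None, :])          # Toeplitz covariance of a 1st-order Markov sequence

    lam, Phi = np.linalg.eigh(R)                        # R = Phi diag(lam) Phi^T
    print(np.allclose(Phi.T @ R @ Phi, np.diag(lam)))   # True: Phi^{*T} R Phi = Lambda

    rng = np.random.default_rng(2)
    X = rng.multivariate_normal(np.zeros(N), R, size=100_000).T   # zero-mean samples with covariance R
    Y = Phi.T @ X                                       # KL transform (real case: conjugate = transpose)
    C_y = Y @ Y.T / X.shape[1]
    print(np.max(np.abs(C_y - np.diag(np.diag(C_y)))))  # small: the coefficients are decorrelated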
Discrete Random Field

Definitions

Discrete random field : each sample of a 2-D sequence is a random variable
Mean : E[u(m,n)] = μ(m,n)
Covariance : Cov[u(m,n), u(m',n')] = σ_uu^2(m,n; m',n')
           = E{ [u(m,n) - μ(m,n)] [u^*(m',n') - μ^*(m',n')] }

White noise field
σ_xx^2(m,n; m',n') = σ_xx^2(m,n) δ(m - m', n - n')

Symmetry
σ_uu^2(m,n; m',n') = σ_uu^{2*}(m',n'; m,n)
Separable and isotropic image covariance functions

Separable :
σ_xx^2(m,n; m',n') = σ_1^2(m,m') σ_2^2(n,n')    (nonstationary case)
σ_xx^2(m,n) = σ_1^2(m) σ_2^2(n)                 (stationary case)

Separable stationary covariance function :
σ_xx^2(m,n) = σ^2 ρ_1^{|m|} ρ_2^{|n|} ,   |ρ_1| < 1 , |ρ_2| < 1

Nonseparable exponential function :
σ_xx^2(m,n) = σ^2 exp{ -√(α_1 m^2 + α_2 n^2) }
σ_xx^2(m,n) = σ^2 ρ^d ,  d = √(m^2 + n^2) ,  when α_1 = α_2 = α (ρ = e^{-α})
(isotropic or circularly symmetric)

Estimation of mean and autocorrelation
μ ≈ μ̂ = (1/MN) Σ_{m=1}^{M} Σ_{n=1}^{N} u(m,n)
σ_xx^2(m,n) ≈ σ̂_xx^2(m,n) = (1/MN) Σ_{m'=1}^{M-m} Σ_{n'=1}^{N-n} [u(m',n') - μ̂] [u(m+m', n+n') - μ̂]
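The two sample estimators translate directly into numpy; the field below is synthetic (a constant mean plus unit-variance white noise), so the three estimates should come out close to 5, 1 and 0 respectively:

    import numpy as np

    rng = np.random.default_rng(3)
    M, N = 64, 64
    u = 5.0 + rng.standard_normal((M, N))        # synthetic random field

    mu_hat = u.mean()                            # (1/MN) sum_m sum_n u(m, n)

    def autocov_hat(u, m, n, mu):
        # (1/MN) * sum_{m',n'} [u(m',n') - mu][u(m+m', n+n') - mu] over the overlapping region
        Mtot, Ntot = u.shape
        d = u - mu
        return np.sum(d[:Mtot - m, :Ntot - n] * d[m:, n:]) / (Mtot * Ntot)

    print(mu_hat)                                # ~5
    print(autocov_hat(u, 0, 0, mu_hat))          # ~1  (variance)
    print(autocov_hat(u, 1, 0, mu_hat))          # ~0  (white noise field)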
SDF (spectral density function)

Definition
Fourier transform of the autocorrelation function

1-D case
SDF{u(n)} = S_u(f) = Σ_{n=-∞}^{∞} σ_uu^2(n) exp(-j2πfn)
σ_uu^2(n) = ∫_{-0.5}^{0.5} S_u(f) exp(j2πfn) df

2-D case
SDF{u(m,n)} = S_u(u,v) = Σ_{m=-∞}^{∞} Σ_{n=-∞}^{∞} σ_uu^2(m,n) exp[-j2π(um + vn)]
σ_uu^2(m,n) = ∫_{-0.5}^{0.5} ∫_{-0.5}^{0.5} S_u(u,v) exp[j2π(um + vn)] du dv

Average power
σ_uu^2(0,0) = ∫_{-0.5}^{0.5} ∫_{-0.5}^{0.5} S_u(u,v) du dv

(ex) the SDF of a stationary white noise field
σ_xx^2(m,n) = σ^2 δ(m,n)   ⇒   S(u,v) = σ^2
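For the first-order Markov covariance σ^2 ρ^{|n|} used earlier, the 1-D SDF can be evaluated by truncating the sum over n; a sketch (truncation length and frequency grid are arbitrary choices) verifying that S_u(f) is real, non-negative, and integrates to σ_uu^2(0):

    import numpy as np

    sigma2, rho = 1.0, 0.8
    n = np.arange(-200, 201)                     # truncate the infinite sum (rho^200 is negligible)
    r = sigma2 * rho ** np.abs(n)                # autocorrelation sigma^2 rho^|n|

    f = np.linspace(-0.5, 0.5, 1001)
    S = np.array([np.sum(r * np.exp(-2j * np.pi * fi * n)) for fi in f])

    print(np.max(np.abs(S.imag)))                # ~0 : the SDF is real
    print(S.real.min() > 0)                      # True : and non-negative
    print(S.real.mean())                         # ~1 = sigma_uu^2(0) : average power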
Estimation Theory

Mean square estimates
Estimate the random variable x by a suitable function g(y), such that
    E[ |x - g(y)|^2 ] = ∫∫ [x - g(y)]^2 f_xy(x,y) dx dy
is minimized. Since f_xy(x,y) = f_x(x|y) f_y(y),
    E[ |x - g(y)|^2 ] = ∫ f_y(y) { ∫ [x - g(y)]^2 f_x(x|y) dx } dy
The integrand is non-negative, so it is sufficient to minimize
    ∫ [x - g(y)]^2 f_x(x|y) dx    for every y
⇒   x̂ = g(y) = ∫ x f_x(x|y) dx = E[x | y]
⇒ minimum mean square estimate (MMSE)
Also E[x̂] = E[g(y)] = E[ E[x|y] ] = E[x]   ⇒ unbiased estimator

Theorem
Let y ≜ [ y_1  y_2  y_3  ⋯  y_N ]^T and x be jointly Gaussian with zero mean.
The MMSE estimate is E[x | y] = Σ_{i=1}^{N} a_i y_i , where the a_i are chosen such that
    E[ (x - Σ_{i=1}^{N} a_i y_i) y_k ] = 0 ,   ∀ k = 1, 2, …, N

(Pf)
The random variables (x - Σ_{i=1}^{N} a_i y_i), y_1, y_2, …, y_N are jointly Gaussian.
Since the first one is uncorrelated with all the rest, it is independent of them.
Thus the error (x - Σ_{i=1}^{N} a_i y_i) is independent of the random vector y.
E[ (x - Σ_{i=1}^{N} a_i y_i) | y ] = E[ x - Σ_{i=1}^{N} a_i y_i ] = E[x] - Σ_{i=1}^{N} a_i E[y_i] = 0
⇒  E[x | y] = Σ_{i=1}^{N} a_i E[y_i | y] = Σ_{i=1}^{N} a_i y_i

min_{α(n)} E[(x - x̂)^2] = min_{α(n)} E[e^2]
where e = Σ_{n=1}^{N} α(n) y(n) - x   : estimation error

∂E[e^2] / ∂α(n) = 0   yields   E[e y(n)] = 0 ,  n = 1, 2, …, N

⇒ The estimation error is minimized if
    E[e y(n)] = 0 ,  n = 1, 2, …, N       : orthogonality principle

If x and {y(n)} are independent :
    x̂ = E[x | y] = E[x]
If x and {y(n)} are zero-mean Gaussian random variables :
    x̂ = Σ_{n=1}^{N} α(n) y(n)   : a linear combination of {y(n)}
    α(n) is determined by solving linear equations
Orthogonality principle

The minimum mean square estimation error is orthogonal to every random variable
functionally related to the observations, i.e., for any g(y) = g(y(1), y(2), …, y(N)) :
    E[ (x - x̂) g(y) ] = 0
(figure : projection interpretation — the error x - x̂ is orthogonal to x̂ and to g(y))

E[x̂ g(y)] = E[ E(x|y) g(y) ] = E[ E[x g(y) | y] ] = E[x g(y)]
Since x̂ is a function of y :
    E[ (x - x̂) g(x̂) ] = 0 ,   E[ (x - x̂) x̂ ] = 0

Substitute x̂ = Σ_{n=1}^{N} α(n) y(n) :
    Σ_{n=1}^{N} α(n) E[y(k) y(n)] = E[x y(k)] ,   k = 1, …, N
In matrix notation :
    α = R_y^{-1} r_xy ,    ( α = {α(n)} , r_xy = {E[x y(n)]} )

Minimum MSE : σ_e^2 = σ_x^2 - α^T r_xy

If x, y(n) are nonzero-mean r.v. :
    x̂ - μ_x̂ = x̂ - μ_x = Σ_{n=1}^{N} α(n) [y(n) - μ_y(n)]
If x, y(n) are non-Gaussian, the results still give the best linear mean square estimate.
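A numerical sketch of the linear MMSE solution on synthetic zero-mean Gaussian data (the weights generating x from y are arbitrary): α = R_y^{-1} r_xy, the orthogonality condition E[e y(n)] = 0, and the minimum-MSE formula σ_e^2 = σ_x^2 - α^T r_xy:

    import numpy as np

    rng = np.random.default_rng(4)
    Nobs, Nsamp = 3, 200_000

    Y = rng.standard_normal((Nobs, Nsamp))                  # zero-mean observations y(n)
    x = np.array([0.5, -1.0, 2.0]) @ Y + 0.3 * rng.standard_normal(Nsamp)   # x depends linearly on y

    R_y = Y @ Y.T / Nsamp                                   # {E[y(k) y(n)]}
    r_xy = Y @ x / Nsamp                                    # {E[x y(n)]}
    alpha = np.linalg.solve(R_y, r_xy)                      # alpha = R_y^{-1} r_xy

    e = x - alpha @ Y                                       # estimation error
    print(np.abs(Y @ e / Nsamp).max())                      # ~0 : orthogonality E[e y(n)] = 0
    print(np.mean(e**2), np.var(x) - alpha @ r_xy)          # both ~0.09 = sigma_x^2 - alpha^T r_xy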
Information Theory

Information and entropy
I_k = -log_2 p_k   [bits]
p_k , k = 1, …, L : probabilities of L independent messages r_k

H = -Σ_{k=1}^{L} p_k log_2 p_k   [bits/message]

max_{p_k} H = Σ_{k=1}^{L} (1/L) log_2 L = log_2 L   [bits]      (equally likely messages)

For a binary source, i.e., L = 2 ,  p_1 = p , p_2 = 1 - p_1 , 0 ≤ p ≤ 1 :
H = -p log_2 p - (1 - p) log_2(1 - p)
(figure : the binary entropy H(p) versus p, rising from 0 at p = 0 to 1 bit at p = 0.5 and back to 0 at p = 1)
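A small sketch of the entropy formula and the two special cases above (equally likely messages and the binary source):

    import numpy as np

    def entropy(p):
        # H = -sum_k p_k log2 p_k [bits/message]; zero-probability terms contribute nothing
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    print(entropy([0.5, 0.5]))              # 1.0 bit : binary source with p = 0.5
    print(entropy([0.9, 0.1]))              # ~0.47 bits : a biased binary source is less uncertain
    print(entropy(np.full(8, 1 / 8)))       # 3.0 bits = log2 L for L = 8 equally likely messages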
Let x be a discrete r.v. with S_x = {1, 2, …, K} and p_k = Pr[x = k].
Let the event A_k ≜ {x = k}.
⇒ the uncertainty of A_k is low if p_k is close to one, and high if p_k is small.

Uncertainty of an event :
    I(x = k) = ln( 1 / Pr(x = k) ) = 0   if Pr(x = k) = 1

Entropy :
    H_x = E[I(x = k)] = Σ_{k=1}^{K} Pr(x = k) ln( 1 / Pr(x = k) )

Unit : bit, when the logarithm is base 2
Entropy as a measure of information
Consider the event A_k , describing the emission of symbol s_k by the source with probability p_k .
1) if p_k = 1 and p_i = 0 ∀ i ≠ k :
   no surprise ⇒ no information when s_k is emitted by the source
2) if p_k is low :
   more surprise ⇒ more information when s_k is emitted by the source

I(s_k) = log(1 / p_k)   : amount of information gained after observing the event s_k
H_x = E[I(s_k)]         : average information per source symbol
Ex) 16 balls : 4 balls “1”, 4 balls “2”, 2 balls “3”, 2 balls “4”, 1 ball each of “5”, “6”, “7”, “8”
Question : find the number of the ball through a series of yes/no questions.

H_x = -(1/4) log_2 (1/4) - (1/4) log_2 (1/4) - ⋯ = 44/16 bit/ball

1) Ask “x = 1 ?”, “x = 2 ?”, …, “x = 7 ?” in turn; a “yes” identifies the ball, and a final “no” implies x = 8.

The average number of questions asked :
E[L] = 1(1/4) + 2(1/4) + 3(1/8) + 4(1/8) + 5(1/16) + 6(1/16) + 7(1/16) + 7(1/16) = 51/16
2) Ask questions that split the remaining candidates : “x ≤ 2 ?”, then “x = 1 ?” or “x ≤ 4 ?”,
then “x = 3 ?”, “x ≤ 6 ?”, “x = 5 ?”, “x = 7 ?”, and so on, until the ball is identified.

E[L] = 2(1/4) + 2(1/4) + 3(1/8) + 3(1/8) + 4(1/16) + 4(1/16) + 4(1/16) + 4(1/16) = 44/16

⇒ The problem of designing the series of questions to identify x is exactly the same as the
problem of encoding the output of an information source.
          fixed-length code     question sequence ⇒ variable-length code      p_k
          (3 bit / symbol)
x = 1     0 0 0                 yes / yes            ⇒ 1 1                    1/4
x = 2     0 0 1                 yes / no             ⇒ 1 0                    1/4
x = 3     0 1 0                 no / yes / yes       ⇒ 0 1 1                  1/8
x = 4     0 1 1                 no / yes / no        ⇒ 0 1 0                  1/8
x = 5     1 0 0                 no / no / yes / yes  ⇒ 0 0 1 1                1/16
x = 6     1 0 1                 no / no / yes / no   ⇒ 0 0 1 0                1/16
x = 7     1 1 0                 no / no / no / yes   ⇒ 0 0 0 1                1/16
x = 8     1 1 1                 no / no / no / no    ⇒ 0 0 0 0                1/16

⇒ Huffman code
⇒ short codes for frequent source symbols, long codes for rare source symbols
⇒ the entropy of x represents the minimum average number of bits required to identify the outcome of x
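The numbers in this example are easy to check; a sketch computing the entropy, the average number of questions for the two strategies, and confirming that the variable-length (Huffman) code meets the entropy here:

    import numpy as np

    p = np.array([4, 4, 2, 2, 1, 1, 1, 1]) / 16             # P[x = 1], ..., P[x = 8]

    H = -np.sum(p * np.log2(p))
    print(H)                                                 # 2.75 = 44/16 bits/ball

    L1 = np.sum(p * np.array([1, 2, 3, 4, 5, 6, 7, 7]))      # strategy 1: ask "x = 1 ?", "x = 2 ?", ...
    L2 = np.sum(p * np.array([2, 2, 3, 3, 4, 4, 4, 4]))      # strategy 2 = variable-length code lengths
    print(L1, L2)                                            # 3.1875 = 51/16 and 2.75 = 44/16
    print(np.isclose(L2, H))                                 # True: this code achieves the entropy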
Noiseless Coding Theorem (1948, Shannon)

min(R) = H(x) + ε  bit/symbol
where R is the transmission rate and ε is a positive quantity that can be made arbitrarily
close to zero by a sophisticated coding procedure utilizing an appropriate amount of
encoding delay.
Rate distortion function

Distortion
x : Gaussian r.v. of variance σ^2 ,  y : reproduced value
D = E[(x - y)^2]

Rate distortion function of x :
R_D = (1/2) log_2(σ^2 / D)   for D ≤ σ^2
    = 0                      for D > σ^2
    = max[ 0 , (1/2) log_2(σ^2 / D) ]

For a fixed average distortion D :
{x(0), x(1), …, x(N-1)} : Gaussian r.v.'s ,  {y(0), y(1), …, y(N-1)} : reproduced values
R_D = (1/N) Σ_{k=0}^{N-1} max[ 0 , (1/2) log_2(σ_k^2 / θ) ]
where θ is determined by solving
D = (1/N) Σ_{k=0}^{N-1} min[ θ , σ_k^2 ]

(figure : R_D versus D, the rate distortion function for a Gaussian source)
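A sketch of both formulas: the scalar Gaussian R(D) curve, and the block form in which θ is found by bisection on D = (1/N) Σ min[θ, σ_k^2] (the variances σ_k^2 below are arbitrary):

    import numpy as np

    def rd_scalar(D, sigma2):
        # R(D) = max[0, (1/2) log2(sigma^2 / D)]
        return np.maximum(0.0, 0.5 * np.log2(sigma2 / D))

    print(rd_scalar(0.25, 1.0))                  # 1.0 bit for D = sigma^2 / 4

    def rd_block(D, sigma2s, iters=60):
        # find theta with D = (1/N) sum min[theta, sigma_k^2], then average the per-component rates
        lo, hi = 0.0, float(np.max(sigma2s))
        for _ in range(iters):                   # bisection on a monotone function of theta
            theta = 0.5 * (lo + hi)
            if np.mean(np.minimum(theta, sigma2s)) < D:
                lo = theta
            else:
                hi = theta
        return np.mean(np.maximum(0.0, 0.5 * np.log2(sigma2s / theta)))

    sigma2s = np.array([4.0, 2.0, 1.0, 0.25])
    print(rd_block(0.5, sigma2s))                # average rate (bits/sample) at distortion D = 0.5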