NN-math

Review of Matrix Operations
Vector: a sequence of elements (the order is important)
e.g., x = (2, 1) denotes a vector
length = sqrt(2*2 + 1*1) = sqrt(5)
orientation angle = a
x = (x1, x2, ……, xn), an n dimensional vector
a point in an n dimensional space
column vector:
$$x = \begin{pmatrix} 1 \\ 2 \\ 5 \\ 8 \end{pmatrix}$$
row vector:
$$y = (1 \;\; 2 \;\; 5 \;\; 8) = x^T$$
transpose:
$$(x^T)^T = x$$
norms of a vector (magnitude):
L1 norm: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$
L2 norm: $\|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$
L∞ norm: $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$
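The three norms can be computed directly from their definitions. The following is a minimal sketch in plain Python; the function names `l1_norm`, `l2_norm`, and `linf_norm` are illustrative choices, not part of the original notes.

```python
import math

def l1_norm(x):
    """L1 norm: sum of the absolute values of the components."""
    return sum(abs(xi) for xi in x)

def l2_norm(x):
    """L2 norm: square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))

def linf_norm(x):
    """L-infinity norm: largest absolute value among the components."""
    return max(abs(xi) for xi in x)

x = [1, 2, 5, 8]
print(l1_norm(x), l2_norm(x), linf_norm(x))
```

For the vector x = (2, 1) used earlier, `l2_norm([2, 1])` gives the length sqrt(5).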
vector operations:
$rx = (rx_1, rx_2, \ldots, rx_n)^T$, where $r$ is a scalar and $x$ is a column vector
inner (dot) product
x, y are column vectors of same dimension n
$$x^T y = (x_1, x_2, \ldots, x_n) \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \sum_{i=1}^{n} x_i y_i = (y_1, y_2, \ldots, y_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = y^T x$$
$$x^T x = (x_1, x_2, \ldots, x_n) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \sum_{i=1}^{n} x_i x_i = \sum_{i=1}^{n} (x_i)^2 \ge 0$$
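As a small illustration of the inner product, the sketch below computes it as a sum of component-wise products and checks the two properties stated above (symmetry and non-negativity of x·x). The helper name `dot` and the sample vectors are assumptions made for the example.

```python
def dot(x, y):
    """Inner product of two vectors of the same dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1, 2, 5, 8]
y = [2, 1, 0, 1]
print(dot(x, y))               # x^T y
print(dot(x, y) == dot(y, x))  # symmetry: x^T y = y^T x
print(dot(x, x) >= 0)          # x^T x is never negative
```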
Cross product: $x \times y$
defines another vector orthogonal to the plane formed by x and y.
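The orthogonality claim can be checked numerically. The sketch below assumes 3-dimensional vectors (the usual setting for the cross product, though the notes do not state the dimension); `cross` and `dot` are illustrative helper names.

```python
def cross(x, y):
    """Cross product of two 3-dimensional vectors."""
    if len(x) != 3 or len(y) != 3:
        raise ValueError("this sketch only handles 3-dimensional vectors")
    return [
        x[1] * y[2] - x[2] * y[1],
        x[2] * y[0] - x[0] * y[2],
        x[0] * y[1] - x[1] * y[0],
    ]

def dot(x, y):
    """Inner product, used here to test orthogonality."""
    return sum(xi * yi for xi, yi in zip(x, y))

x = [2, 1, 0]
y = [1, 3, 4]
z = cross(x, y)
print(z, dot(z, x), dot(z, y))  # both inner products should be 0
```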
Matrix:
$$A_{m \times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = (a_{ij})_{m \times n}$$
aij : the element on the ith row and jth column
aii : a diagonal element (if m = n)
wij : a weight in a weight matrix W
each row or column is a vector
$a_{\cdot j}$: jth column vector
$a_{i \cdot}$: ith row vector
$$A_{m \times n} = (a_{\cdot 1}, \ldots, a_{\cdot n}) = \begin{pmatrix} a_{1 \cdot} \\ \vdots \\ a_{m \cdot} \end{pmatrix}$$
a column vector of dimension m is a matrix of m x 1
transpose:
$$(A_{m \times n})^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}$$
jth column becomes jth row
square matrix: $A_{n \times n}$
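identity matrix:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad a_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$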
symmetric matrix: m = n and
$A = A^T$, or $\forall i \;\, a_{i \cdot} = a_{\cdot i}$ (the ith row equals the ith column), or $\forall i, j \;\, a_{ij} = a_{ji}$
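Transposition and the symmetry test translate directly into code. This is a minimal sketch that represents a matrix as a list of rows; `transpose` and `is_symmetric` are illustrative names.

```python
def transpose(a):
    """Return the transpose: the jth column of a becomes the jth row."""
    return [list(col) for col in zip(*a)]

def is_symmetric(a):
    """A matrix is symmetric when it equals its own transpose."""
    return a == transpose(a)

a = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(a))                    # a 3 x 2 matrix
print(is_symmetric([[1, 7], [7, 2]]))  # a_12 = a_21
print(is_symmetric(a))                 # not square, so not symmetric
```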
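matrix operations:
$$rA = (r a_{\cdot 1}, \ldots, r a_{\cdot n}) = (r a_{ij})$$
$$x^T A_{m \times n} = (x_1, \ldots, x_m)(a_{\cdot 1}, \ldots, a_{\cdot n}) = (x^T a_{\cdot 1}, \ldots, x^T a_{\cdot n})$$
The result is a row vector, each element of which is
an inner product of $x^T$ and a column vector $a_{\cdot j}$
product of two matrices:
$$A_{m \times n} \times B_{n \times p} = C_{m \times p}, \quad \text{where } C_{ij} = a_{i \cdot} \cdot b_{\cdot j}$$
$$A_{m \times n} \times I_{n \times n} = A_{m \times n}$$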
vector outer product:
$$x \cdot y^T = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix} (y_1, \ldots, y_n) = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ \vdots & \vdots & & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{pmatrix}$$
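A short sketch of the outer product: an m-dimensional x and an n-dimensional y give an m x n matrix whose (i, j) entry is x_i * y_j. The name `outer` and the sample vectors are illustrative.

```python
def outer(x, y):
    """Outer product x y^T: an m x n matrix with entries x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]

print(outer([1, 2], [3, 4, 5]))  # a 2 x 3 matrix
```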
Linear Algebra
• Two vectors $x = (x_1, \ldots, x_n)^T$ and $y = (y_1, \ldots, y_n)^T$
are said to be orthogonal to each other if
$x \cdot y = \sum_{i=1}^{n} x_i y_i = 0$.
• A set of vectors $x^{(1)}, \ldots, x^{(k)}$ of dimension n are said to be
linearly independent of each other if there does not exist a
set of real numbers $a_1, \ldots, a_k$, not all zero, such that
$$a_1 x^{(1)} + \cdots + a_k x^{(k)} = 0$$
otherwise, these vectors are linearly dependent, and each one
with a nonzero coefficient $a_i$ can be expressed as a linear combination of the others
$$x^{(i)} = -\frac{a_1}{a_i} x^{(1)} - \cdots - \frac{a_k}{a_i} x^{(k)} = -\sum_{j \ne i} \frac{a_j}{a_i} x^{(j)}$$
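One way to test linear independence in practice is to compute the rank of the matrix whose rows are the given vectors: the vectors are independent exactly when the rank equals the number of vectors. The sketch below uses Gaussian elimination with exact `Fraction` arithmetic; this procedure is a standard technique swapped in for illustration, not something the notes prescribe, and the names `rank` and `linearly_independent` are assumptions.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True when no nontrivial combination of the vectors is zero."""
    return rank(vectors) == len(vectors)

print(linearly_independent([[1, 0, 0], [0, 1, 0]]))
print(linearly_independent([[1, 2, 3], [2, 4, 6]]))  # second = 2 * first
```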
• Vector $x \neq 0$ is an eigenvector of matrix A if there exists a
constant $\lambda$ such that $Ax = \lambda x$
– $\lambda$ is called an eigenvalue of A (wrt x)
– A matrix A may have more than one eigenvector, each
with its own eigenvalue
• Ex. (example matrix not preserved here) has 3 eigenvalues/eigenvectors
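The defining condition Ax = λx is easy to verify for a candidate pair. The sketch below uses a hypothetical 3 x 3 matrix chosen for this illustration (it is not the example from the notes) together with three eigenpairs that can be checked by hand; `matvec` and `is_eigenpair` are illustrative names, and the exact equality test assumes integer entries.

```python
def matvec(a, x):
    """Matrix-vector product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def is_eigenpair(a, x, lam):
    """Check A x = lam * x for a nonzero vector x (exact comparison)."""
    if all(xi == 0 for xi in x):
        return False  # the zero vector is excluded by definition
    return matvec(a, x) == [lam * xi for xi in x]

# Hypothetical matrix, chosen only for illustration.
A = [[2, 0, 0],
     [0, 3, 4],
     [0, 4, 9]]

print(is_eigenpair(A, [1, 0, 0], 2))
print(is_eigenpair(A, [0, 1, 2], 11))
print(is_eigenpair(A, [0, 2, -1], 1))
```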
• Matrix B is called the inverse matrix of square matrix A if
AB = I (I is the identity matrix)
– Denote B as $A^{-1}$
– Not every square matrix has an inverse (e.g., no inverse exists when one of the rows
can be expressed as a linear combination of other rows)
• Every matrix A has a unique pseudo-inverse A*, which
satisfies the following properties
$AA^*A = A$; $A^*AA^* = A^*$; $A^*A = (A^*A)^T$; $AA^* = (AA^*)^T$
Ex. $A = (2 \;\; 1 \;\; {-2})$, $A^* = (2/9 \;\; 1/9 \;\; {-2/9})^T$
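The four properties can be checked mechanically for the example A = (2 1 -2), A* = (2/9 1/9 -2/9)^T. The sketch below uses exact `Fraction` arithmetic so the equalities hold without rounding; the helper names `matmul`, `transpose`, and `satisfies_pseudo_inverse_properties` are assumptions made for this illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Matrix product, with matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols]
            for row in a]

def transpose(a):
    """Matrix transpose."""
    return [list(col) for col in zip(*a)]

def satisfies_pseudo_inverse_properties(a, a_star):
    """Check the four listed properties of a pseudo-inverse."""
    a_astar = matmul(a, a_star)   # A A*
    astar_a = matmul(a_star, a)   # A* A
    return (matmul(a_astar, a) == a
            and matmul(astar_a, a_star) == a_star
            and astar_a == transpose(astar_a)
            and a_astar == transpose(a_astar))

A = [[Fraction(2), Fraction(1), Fraction(-2)]]                    # 1 x 3
A_star = [[Fraction(2, 9)], [Fraction(1, 9)], [Fraction(-2, 9)]]  # 3 x 1

print(satisfies_pseudo_inverse_properties(A, A_star))
```

Here A A* is the 1 x 1 matrix (1), which is why the first two properties hold immediately.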
Calculus and Differential Equations
• $\dot{x}_i(t)$: the derivative of $x_i$ with respect to time $t$
• System of differential equations
$$\begin{cases} \dot{x}_1(t) = f_1(t) \\ \quad \vdots \\ \dot{x}_n(t) = f_n(t) \end{cases}$$
solution: $(x_1(t), \ldots, x_n(t))$
difficult to solve unless $f_i(t)$ are simple
$$\begin{cases} \dot{x}_1(t) = \sin(t) \\ \dot{x}_2(t) = t^2 \end{cases} \quad \text{has a solution} \quad \begin{cases} x_1(t) = -\cos(t) \\ x_2(t) = t^3/3 \end{cases}$$
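A candidate solution can be sanity-checked by differentiating it numerically and comparing with the right-hand sides. The sketch below reads the example as the system with derivatives sin(t) and t^2 and the solution x1(t) = -cos(t), x2(t) = t^3/3; the central-difference helper, the step size `h`, and the tolerance are assumptions of this illustration.

```python
import math

def numerical_derivative(f, t, h=1e-5):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def x1(t):
    return -math.cos(t)

def x2(t):
    return t ** 3 / 3

def solves_system(ts, tol=1e-6):
    """Check x1' = sin(t) and x2' = t^2 at each sample time."""
    return all(
        abs(numerical_derivative(x1, t) - math.sin(t)) < tol
        and abs(numerical_derivative(x2, t) - t ** 2) < tol
        for t in ts
    )

print(solves_system([0.0, 0.5, 1.0, 2.0]))
```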
• Multi-variable calculus: $y = f(x_1, x_2, \ldots, x_n)$
partial derivative: gives the direction and speed of
change of y with respect to $x_i$. Ex.
$$y = \sin(x_1) + x_2^2 + e^{-(x_1 + x_2 + x_3)}$$
$$\frac{\partial y}{\partial x_1} = \cos(x_1) - e^{-(x_1 + x_2 + x_3)}$$
$$\frac{\partial y}{\partial x_2} = 2 x_2 - e^{-(x_1 + x_2 + x_3)}$$
$$\frac{\partial y}{\partial x_3} = -e^{-(x_1 + x_2 + x_3)}$$
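Partial derivatives can be checked by perturbing one variable at a time. The sketch below assumes the example function is y = sin(x1) + x2^2 + e^(-(x1 + x2 + x3)), with the minus sign in the exponent being this sketch's reading of the example; `numerical_partial`, the step size, and the sample point are illustrative choices.

```python
import math

def f(x1, x2, x3):
    # Assumed reading of the example function (minus sign in the exponent).
    return math.sin(x1) + x2 ** 2 + math.exp(-(x1 + x2 + x3))

def grad_f(x1, x2, x3):
    """Analytic partial derivatives of f under the same assumption."""
    e = math.exp(-(x1 + x2 + x3))
    return (math.cos(x1) - e, 2 * x2 - e, -e)

def numerical_partial(func, point, i, h=1e-6):
    """Central difference in coordinate i, other coordinates held fixed."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)

point = (0.3, -1.2, 0.5)
for i, analytic in enumerate(grad_f(*point)):
    print(i + 1, analytic, numerical_partial(f, point, i))
```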
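the total derivative of $y(t) = f(x_1(t), x_2(t), \ldots, x_n(t))$
gives the direction and speed of change of y, with respect to t
$$\dot{y}(t) = \frac{df}{dt} = \frac{\partial f}{\partial x_1} \dot{x}_1(t) + \cdots + \frac{\partial f}{\partial x_n} \dot{x}_n(t) = \nabla f \cdot (\dot{x}_1(t), \ldots, \dot{x}_n(t))^T$$
Gradient of f: $\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$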
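The total derivative is the inner product of the gradient with the vector of time derivatives. The sketch below uses a hypothetical function f(x1, x2) = x1 * x2 with x1(t) = t^2 and x2(t) = sin(t), all chosen only for illustration, and compares the formula with a numerical derivative of y(t).

```python
import math

def f(x1, x2):
    # Hypothetical example function.
    return x1 * x2

def grad_f(x1, x2):
    """Gradient of f: (df/dx1, df/dx2)."""
    return (x2, x1)

def x_of_t(t):
    """Hypothetical trajectories x1(t) = t^2, x2(t) = sin(t)."""
    return (t ** 2, math.sin(t))

def x_dot(t):
    """Time derivatives of the trajectories above."""
    return (2 * t, math.cos(t))

def total_derivative(t):
    """dy/dt as gradient of f dotted with (x1'(t), x2'(t))."""
    g = grad_f(*x_of_t(t))
    return sum(gi * vi for gi, vi in zip(g, x_dot(t)))

def numerical_dy_dt(t, h=1e-6):
    """Central-difference derivative of y(t) = f(x1(t), x2(t))."""
    return (f(*x_of_t(t + h)) - f(*x_of_t(t - h))) / (2 * h)

for t in (0.5, 1.0, 2.0):
    print(t, total_derivative(t), numerical_dy_dt(t))
```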
Chain-rule: z is a function of y, y is a function of x, x is a
function of t
$$\frac{dz}{dt} = \frac{dz}{dy} \cdot \frac{dy}{dx} \cdot \frac{dx}{dt}$$
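As a worked instance of the chain rule, take the hypothetical functions z = y^2, y = sin x, and x = 3t (chosen only for illustration). Multiplying the three individual derivatives gives the same result as differentiating z(t) = sin^2(3t) directly.

```latex
\begin{align*}
z &= y^2, \qquad y = \sin x, \qquad x = 3t \\
\frac{dz}{dy} &= 2y, \qquad \frac{dy}{dx} = \cos x, \qquad \frac{dx}{dt} = 3 \\
\frac{dz}{dt} &= \frac{dz}{dy} \cdot \frac{dy}{dx} \cdot \frac{dx}{dt}
  = 2y \cdot \cos x \cdot 3
  = 6 \sin(3t) \cos(3t)
  = 3 \sin(6t)
\end{align*}
```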
dynamic system:
$$\begin{cases} \dot{x}_1(t) = f_1(x_1, \ldots, x_n) \\ \quad \vdots \\ \dot{x}_n(t) = f_n(x_1, \ldots, x_n) \end{cases}$$
– change of $x_i$ may potentially affect other x
– all $x_i$ continue to change (the system evolves)
– reaches equilibrium when $\dot{x}_i = 0 \;\, \forall i$
– stability/attraction: special equilibrium point
(minimal energy state)
– pattern of $(x_1, \ldots, x_n)$ at a stable state often
represents a solution of the problem
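The evolution of such a system toward equilibrium can be illustrated with a simple forward-Euler simulation. The two-variable linear system below is hypothetical, picked because its only equilibrium (the origin) is stable; the step size `dt`, the number of steps, and the names `simulate` and `example_system` are assumptions of this sketch, not a method given in the notes.

```python
def simulate(f, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of x'(t) = f(x), starting from x0."""
    x = list(x0)
    for _ in range(steps):
        dx = f(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

def example_system(x):
    """Hypothetical coupled system: each variable affects the other."""
    x1, x2 = x
    return [-x1 + 0.5 * x2, -x2 + 0.5 * x1]

final = simulate(example_system, [1.0, -2.0])
print(final)                  # close to the equilibrium (0, 0)
print(example_system(final))  # derivatives are close to 0 there
```

Because each derivative depends on both variables, a change in one variable affects the other, and the state stops changing only once all derivatives vanish.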