Mathematical Structures for Systems and Control Ravi Banavar

Mathematical Structures for Systems and
Control
Ravi Banavar
Debasish Chatterjee
Systems & Control Engineering, IIT-Bombay, Powai, Mumbai
– 400 076, India, Phone: +91-22-2576-7879, url: http://www.sc.iitb.ac.in/ banavar
E-mail address: [email protected]
Systems & Control Engineering, IIT-Bombay, Powai, Mumbai
– 400 076, India, Phone: +91-22-2576-7879, url: http://www.sc.iitb.ac.in/ chatterjee http://www.sc.iitb.ac.in/ banavar
E-mail address: [email protected]
2010 Mathematics Subject Classification. Primary
Key words and phrases. groups, rings, fields, vector spaces, . . .
Abstract. This is the complete set of lecture notes for the course Mathematical Structures for Systems & Control, at the Systems & Control
Engineering, IIT Bombay.
Dedication text.
Contents
Preface
ix
Introduction
1
Notation
1
A Prelude
Chapter 1.
.
3
Groups, fields and vector spaces
Coordinate systems
S2 .
Groups
5
5
8
.
Matrix groups
9
.
Rigid body motion and the groups - SO(3) and SE(3)
10
.
Three results on rotations
11
.
Skew-symmetric matrices, the cross product and rotations
14
.
Euler angle parameterization of rotations
16
.
Interesting facts about rotations
19
.
More on the rigid body transformation group SE(3)
22
0.
Rings and Fields
23
1.
Ring
23
2.
Field
24
3.
Vector Spaces
25
4.
Basis
27
5.
Linear functionals and dual basis
28
6.
Annihilator
29
vii
viii
Contents
7.
Direct sum
30
8.
Multilinear functionals (tensors)
31
9.
Linear transformations
33
S2 0.
M atrixrepresentations, theadjointandsimilaritytransf ormations 34
S2 1.
Eigenvalues
38
S2 2.
M ultilinearf ormsandthedeterminant
38
Chapter 2.
.
Additional structures in vector spaces
Norms
S2 .
Innerproducts
Bibliography
41
41
43
49
Preface
Text.
– Ravi Banavar and Debasish Chatterjee
ix
Introduction
Notation
[l]Symbol Intended meaning
R
the set of real numbers ] − ∞, +∞[
the set of positive integers {1, 2, . . .}
Z
the set of all integers {. . . , −2, −2, 0, 1, 2, . . .}
the set of non-negative integers {0, 1, 2, . . .}
the
√ set of complex numbers
+ −1, or the imaginary unit
(z)
real part of z ∈
(z)
imaginary part of z ∈
|z|
absolute value of z ∈
z
complex conjugate of z ∈
F
topological closure of a set F
In
n × n identity matrix
k·k
norm
h·, ·i
inner-product
A
transpose of a matrix A
A
adjoint of an operator A
(A)
trace of a square matrix A
(·)
Fourier transform
1-2 AB
A is defined by B
AB
B is defined by A
A∼
A is isomorphic to B
=B
1
A Prelude
In the study of many engineering disciplines, one encounters differential
equations, matrix equations, eigen values and eigen vectors, functions and
polynomials. The last two centuries have witnessed the birth and evolution
of a sound mathematical framework to study these notions. Comprehending the basic mathematical structure on which these many entities rest on,
greatly enhances our appreciation of their usage and provides more insight
into the role they play in a problem. These notes are an attempt in this
direction. The examples included are mainly from the fields of mechanics,
control systems and signals.
The first three chapters of these notes have been constructed from liberal
borrowings from the following classic textbooks:
• Finite Dimensional Vector Spaces - P. Halmos, Springer, 84
• Ordinary Differential Equations - V. Arnold, Springer, 92
3
Chapter 1
Groups, fields and
vector spaces
In a broad sense, the study of groups is the study of symmetries or invariants. For instance, what are the objects (transformations) that render the
length of a vector in Euclidean space invariant ? They are translations and
rotations. Similarly, in a rigid body,
• the distance between any two points, as well as
• the orientation of a coordinate frame fixed to the rigid body,
remain invariant under rotations and translations. Energy conservation in
the physical world implies invariance of physical laws with time. The kinetic
energy of a rigid body is invariant with respect to translations and rotations
of the frame with respect to which it is measured. To name a few areas, the
theory of groups finds applications in
• Rigid body dynamics and control - robotics, satellite and aerospace
dynamics,
• Computer graphics,
• Cryptography.
1. Coordinate systems
1.1. Rotational transformations. Consider a plane as shown in figure
1 and a Cartesian frame of reference. From high- school understanding, we
assign coordinates to any point p as (xp , yp ) where xp and (yp ) indicate the
5
6
1. Groups, fields and vector spaces
Y
Y
0
p
yp
X0
0
yp
0
α
xp
o |{z}
xp
X
Figure 1. Rotation of frames
component of a segment drawn from the origin to the point p along the
X-axis and Y-axis respectively.
In problems, the choice of a coordinate frame is at the discretion of the
user. Let us now choose a different Cartesian frame of reference, with the
origin at the same point and oriented at an angle α (taken counter-clockwise)
from the previous one. The coordinates of the point p in the new frame,
denoted by (x1p , yp1 ), are given by
1 xp
cos α sin α
xp
=
(1)
.
yp1
− sin α cos α
yp
The matrix
cos α sin α
− sin α cos α
that relates the coordinates of the point p between two frames is termed
a transformation matrix. More specifically, it is termed a rotational transformation matrix (or a rotation matrix) in two-dimensional space. Now
consider a third frame of reference that is oriented at an angle of β from the
second one. The new coordinates (x2p , yp2 )are given by
2 1 xp
xp
cos β sin β
=
yp2
− sin β cos β
yp1
(2)
=
cos β sin β
− sin β cos β
cos α sin α
− sin α cos α
xp
yp
Notice that the product of two rotation matrices
cos β sin β
cos α sin α
cos(β + α) sin(β + α)
(3)
=
,
− sin β cos β
− sin α cos α
− sin(β + α) cos(β + α)
1. Coordinate systems
7
Y0
p
X0
Y
α
o
o
X
Figure 2. Affine transformations
is a rotation matrix, and
2 xp
cos(β + α) sin(β + α)
xp
(4)
=
.
− sin(β + α) cos(β + α)
yp
yp2
1.2. Affine transformations. From pure rotations of the coordinate frame
examined in the previous section, we now examine a transformation that involves a rotation followed by a translation of the origin of the new coordinate
frame. See figure 4. Re-examining the first transformation that we studied,
with the additional assumption that the origin translates a and b units alone
the x2 and y 2 axes respectively, the new coordinates (x1p , yp1 )are given by
(5)
x1p
yp1
=
cos α sin α
− sin α cos α
xp
yp
+
a
b
.
To enable matrix multiplication to express the new coordinates of successive
changes of coordinate frames, we adopt a slightly altered notation. We
append a 1 to the coordinates of the point p and express this as a 3 × 1
column vector as


xp
 yp  .
1
8
1. Groups, fields and vector spaces
The transformation from one coordinate system
expressed in terms of a 3 × 3 matrix as

 1  
xp
cos α sin α a
 yp1  =  − sin α cos α b  
(6)
0
0
1
1
to the other can now be

xp
yp  .
1
Note that the left-top side 2×2 matrix denotes a rotation and the right 2×1
column vector denotes the translation. This set of matrices that denotes
both translation and rotation of a coordinate frame is identified with the
two-tuple
R p
R ∈ SO(2), p ∈ R2
and is denoted by SE(2) (special-Euclidean in 2-dimensions). It has the
following properties: The two examples presented, motivate us to define a
mathematical object that plays an important role in the study of dynamical
systems.
2. Groups
Consider a set S and define a binary operation, denoted by the symbol +.
The operation + between any two objects a and b of the set is denoted as
a + b,
and yields an element which belongs to the set S (this is called the closure
property.) With this basic structure we impose certain requirements on the
binary operation to define a group.
A group is a set G with a binary
operation + that satisfies the following properties.
• For all x, y ∈ G, x + y ∈ G (Closure) and (x + y) + z = x + (y + z)
(Associativity.)
• There exists a unique 0 ∈ G such that x + 0 = 0 + x = x for every
x ∈ G (Existence of the identity element.)
• For every x ∈ G there exists a unique x−1 ∈ G such that x + x−1 =
0. (Existence of the inverse.)
1. Which of the following structures qualifies to be termed a group ?
(1) Z with the conventional addition operation ?
(2) Z with the conventional multiplication operation ?
(Note: from henceforth, if not mentioned, the words ”addition” and
”multiplication” will denote the conventional addition and multiplication operations in the reals, respectively.)
(3) R with the addition operation ? with the multiplication operation
?
3. Matrix groups
9
(4) The set of all polynomials (with real coefficients) of degree ≤ n
with the addition operation ? (note: addition of two polynomials
implies adding coefficients of terms with identical indices.)
(5) The set of all rotational transformations that relate coordinates of
points on a rigid body undergoing pure rotation from a body-fixed
frame to an earth-fixed frame in three-dimensional space ?
(6) The set of all transformations that relate coordinates of points on
a rigid body undergoing general motion from a body-fixed frame
to an earth-fixed frame in three-dimensional space ?
(7) The set
A = {ea : a ∈ R+ }
with the multiplication operation ?
(8) The set
A = {ea : a ∈ R}
with the multiplication operation ?
A commutative group satisfies
• For all x, y ∈ G, x + y = y + x (Commutativity.)
Which of the sets amongst those in the previous question (2) qualifies to be
termed a commutative group ?
3. Matrix groups
Groups, whose elements are matrices, are called matrix groups. Matrix
groups form an object of study by themselves. Here we state a few frequently
encountered matrix groups.
• GL(n, R), the general linear group, is the set of n × n nonsingular matrices with real entries with the binat operation being
the usual multiplication.
• O(n), the orthogonal group of order n, is the subset of GL(n, R)
with the additional property that R ∈ SO(n) ⇒
RRT = I.
• SO(n), the special orthogonal group of order n, is the subset
of O(n) with the additional property that R ∈ SO(n) ⇒
det(R) = 1.
• The symplectic group Sp(2n, R) consists of 2n × 2n matrices
with real entries that satisfy
AT JA = J,
10
1. Groups, fields and vector spaces
J=
where
0
In
.T hebinaryoperation, onceagain, istheconventionalmatrixmultiplication.
−In 0
• , the general linear group, is the set of n × n non-singular matrices with complex entries, with the binary operation being conventional matrix multiplication.
• U (n, C), the unitary group of order n is the subset of that
satisfies
< Ax, Ay >=< x, y >,
or
|det(A)| = 1.
• SU (n, C), the special unitary group of order n is the subset of
U (n, C) with the additional property that A ∈ SU (n) ⇒
det(A) = 1.
4. Rigid body motion and the groups - SO(3) and SE(3)
Rigid body motion is characterized by two properties
• The distance between any two points in the body remains invariant
• The orientation of the body is preserved. (A right-handed coordinate system remains right-handed)
Two groups which are of particular interest to us in the context of rigid body
motion are SO(3) - the special orthogonal group that represents rotations and SE(3) - the special Euclidean group that represents general rigid body
motions (both rotations and translations.)
• Elements of SO(3) are 3 × 3 real matrices and satisfy
RT R = I
with det(R) = 1.
• An element of SE(3) is of the form (p, R) where p ∈ R3 and R ∈
SO(3). The two tuple (p, R) is represented as
R p
R ∈ SO(3), p ∈ R3
0 1
and the group operation is the usual matrix multiplication.
Rigid body motions are usually described using two frames of reference
(see figure 3.) One is called the body frame that remains fixed to the body
and the other is the inertial frame that remains fixed in inertial space. For
a body undergoing pure rotation, and with the origins of the two frames
coinciding, a rotation matrix R ∈ SO(3) maps the initial coordinates of
any fixed point p in the body to its final coordinates after the rotation.
5. Three results on rotations
11
zb
qb
B
yb
qa
za
pab
xb
ya
A
gab
xa
Figure 3. Rigid body motion
Similarly, coordinate changes in a motion that comprises of both rotation
and translation, are given by a matrix of the form
R p
R ∈ SO(3), p ∈ R3
0 1
where the notation has been touched upon before.
5. Three results on rotations
Every A ∈ SO(3) has an eigen value equal to 1.
Proof. Recall that λ ∈ C is an eigen value of A if there exist a non-zero
vector x (eigen vector) such that Ax = λx. Given AT A = I, we have
(Ax)∗ (Ax) = (λx)∗ (λx) ⇒ x∗ AT Ax = |λ|2 x∗ x ⇒ x∗ x = |λ|2 x∗ x
Along with the fact that det(A) = 1 = λ1 · λ2 · λ3 , this gives three possible
options for the eigen value of A - (1, −1, −1), (1, 1, 1) or (1, α + iβ, α − iβ).
Claim 5.1. The rotation group SO(2) can be identified with S 1 (the unit
circle).
Proof. Now
S 1 = {x ∈ R2 : kxk = 1}
Parametrize the elements of S 1 in terms of θ ∈ [0, 2π]. For each θ ∈ [0, 2π],
the counter-clockwise rotation of the vectors {(1, 0), (0, 1)} in R2 (these form
a basis) by the angle θ
(1, 0) →(cos θ
sin θ)
(0, 1) →(− sin θ
cos θ)
12
1. Groups, fields and vector spaces
is given by the matrix
Rθ =
cos θ − sin θ
sin θ cos θ
which is an element of SO(2).
Conversely, take an element of SO(2) of the form
a1 a2
R=
a3 a4
Then from the properties of an element of SO(2), we have
a1 a4 − a2 a3 = 1; a21 + a23 = 1; a22 + a24 = 1; a1 a2 + a3 a4 = 0
It is possible to find a θ ∈ [0, 2π] such that that R can be represented in the
form Rθ .
A note: For those not familiar with the notion of a matrix representation of a linear transformation, we shall study this concept at a later stage,
but to interpret the next theorem, here is a brief explanation. Suppose
A : R3 → R3 is a linear transformation. Then its matrix representation in
the basis {ξ1 , ξ2 , ξ3 } is the set of 9 scalars {αij : i, j = 1, . . . , 3}
Aξ1 =
3
X
αj1 ξj ,
Aξ2 =
j=1
3
X
αj1 ξ,
j=1
Aξ3 =
3
X
αj1 ξj
j=1
and represented as


α11 α12 α13
 α21 α22 α23 
α31 α32 α33
(7)
(Euler’s theorem)
Every A ∈ SO(3) is a rotation through an angle θ ∈ S 1 about an axis
ω ∈ R3 .
Proof. Since 1 is an eigen value of A, we have Aw = w where w ∈ R3 is
an eigen vector. Choose two vectors e1 and e2 that are orthogonal to each
other as well as w. So
< w, e1 >= 0, < w, e2 >= 0, < e1 , e2 >= 0
6. Skew-symmetric matrices, the cross product and rotations
13
rotation.pdf
Figure 4. A rotating rigid body
The two vectors {e1 , e2 } lie in the plane perpendicular to w and it follows
that {w, e1 , e2 } form a basis for R3 . Since A is orthogonal
0 =< w, e1 >=< Aw, Ae1 >=< w, Ae1 >
(8)
0 =< w, e2 >=< Aw, Ae2 >=< w, Ae2 >
and the matrix representation of A in this basis, computed as
(9)
Aw = w Ae1 = a1 e1 + a2 e2 Ae2 = a3 e1 + a4 e2
is of the form


1 0 0
 0 a1 a3  .
0 a2 a4
Now
a1 a3
a2 a4
is an element of SO(2) (why ?) and hence there exists a θ ∈ [0, 2π] such
that
a1 a3
cos θ − sin θ
=
a2 a4
sin θ cos θ
It follows that A is a rotation about w through the angle θ .
14
1. Groups, fields and vector spaces
6. Skew-symmetric matrices, the cross product and rotations
Consider a rigid body undergoing pure rotation about a fixed axis in inertial
space with a constant angular velocity ω. We shall denote this angular
→
→
velocity vector as ω. Let rq denote the position vector of a point q in the
rigid body from a point on the axis of rotation. Then, from elementary
physics,
→
d rq
→
→
→
vq =
= ω × rq .
dt
→
Expressing vectors in a coordinate frame fixed to the earth as ω= (ω1 , ω2 , ω3 )T
→
and rq (t) = (q1 (t), q2 (t), q3 (t))T , the above vectorial equation takes the coordinate form

 
 

q1 (t)
ω1
q1 (t)
d 
q2 (t)  =  ω2  ×  q2 (t)  ,
(10)
dt
q3 (t)
ω3
q3 (t)
where × denotes the cross product of vectors in R3 , defined as follows.
Property 6.1. The cross product between two vectors a = (a1 , a2 , a3 )T and
b = (b1 , b2 , b3 )T in R3 is defined as


a2 b3 − a3 b2
a × b =  a3 b1 − a1 b3  .
a1 b2 − a2 b1
.
Property 6.2. The operation of the cross-product could, alternatively be
represented as the multiplication by a skew-symmetric matrix as


0
−a3 a2
a
0
−a1  .
b → a × b = âb where â =  a3
−a2 a1
0
Equation (10) now takes the form
dq
= ω × q = ω̂q,
dt
which is a set of three time-invariant linear differential equations in the
variables q1 , q2 , q3 . In more generality, we often encounter equations of the
form ẋ = Ax in many engineering systems, where A is an n×n matrix of real
numbers and x1 (t), . . . , xn (t) are n, real variables, dependant on time. The
solution to such a set of differential equations, with the initial condition,
4
x0 = (x1 (0), . . . , xn (0)),
is given by
x(t) = eAt x0 ∀t ≥ 0.
6. Skew-symmetric matrices, the cross product and rotations
15
where
1 2
A + ···
2!
and the infinite series converges for all real matrices A. In the current
context, the solution is
q(t) = eω̂t q0 ,
where
1
4
eω̂ = I + ω̂ + (ω̂)2 + · · ·
2!
is a rotation matrix. Note that the exponential map is also denoted by
exp(ω̂). We now state a few facts about skew-symmetric matrices, the exponential map and rotations. The set of skew-symmetric matrices in R3×3
with the operation [·, ·], termed as a bracket, and defined as
4
eA = I + A +
[X, Y ] = XY − Y X
forms a Lie algebra and is denoted as so(3). .
Using the vector notation in R3 , the Lie bracket on so(3) between two elements ω1 , ω2 ∈ R3 is given by
[ω1 , ω2 ] = ω1 × ω2 = ωˆ1 ωˆ2 − ωˆ2 ωˆ1
Consider a rotation about the axis (1, 0, 0) in the standard basis, where the
rotation is parametrized by t as follows


cos(t) − sin(t) 0
Rz (.) : t → SO(3) Rz (t) =  sin(t) cos(t) 0 
0
0
1
This same rotation is now expressed as the exponential of a skew-symmetric
matrix. Take axis of rotation be ω = (1, 0, 0), assume unit angular velocity
and let the time of rotation be t. Then the rotation achieved is





0 −t 0
0 −t 0
0 −t 0
1
eω̂t = I +  t 0 0  +  0 t 0   t 0 0  + . . .
2!
0 0 0
0 0 0
0 0 0
which turns out to be


cos(t) − sin(t) 0
Rz =  cos(t) sin(t) 0  .
0
0
1
Given the axis of rotation, the angular velocity and the time of rotation, the
exponential map denoted by ”exp” gives the actual rotation. Mathematically, the exponential map is a transformation from so(3) to SO(3) given
as
4
exp(ω̂) = I + ω̂ + ω̂ 2 /2! + . . . ∈ SO(3)
Remark: The axis of rotation ω is often normalized such that kωk = 1
and the angular velocity vector written as αω where α is the magnitude of
16
1. Groups, fields and vector spaces
the angular velocity. The exponential map from the Lie algebra so(3) to
the group SO(3) is a many-to-one map that is surjective. (A given rotation
(∈ SO(3))can be obtained in more than one (∈ so(3)) element). Example:
Consider the elements of so(3)

 

0 −α 0
0 −α − 2π
0
 0 0 α   0
0
α + 2π 
0 0 0
0
0
0
Both yield the same value for exp(·)
7. Euler angle parameterization of rotations
One of the ways of parametrizing rotations in three dimensional space is
through using Euler angles. These angles denote the successive rotations
about three axes to produce the resultant rotation. Consider the following
sequence of three rotations as shown in figure 5.
• Rotate about the blue z (denote it by Z0 ) by an angle α. Denote
the rotated frame by X1 Y1 Z1 .
• Rotate about the green N (also X1 ) by an angle β. Denote the
rotated frame by X2 Y2 Z2 .
• The final rotation is about the red Z (also Z2 ) by an angle γ.
Denote the rotated frame by X3 Y3 Z3 .


p x3
Now the coordinates of a point p, expressed as  py3  in the third frame
p z3
are given in the second frame as



 


px2
px3
cos γ − sin γ 0
p x3
 py2  = RZ2 (γ)  py3  =  sin γ cos γ 0   py3 
pz2
p z3
0
0
1
p z3


px2
and the coordinates of point p, expressed as  py2  in the second frame
pz2
are given in the first frame as



 


p x1
px2
1
0
0
px2
 py1  = RX1 (β)  py2  =  0 cos β − sin β   py2 
p z1
pz2
0 sin β cos β
pz2
7. Euler angle parameterization of rotations
17
Figure 5. ZXZ euler angle convention [Source: Wikipedia]


p x1
and finally, the coordinates of point p, expressed as  py1  in the first
p z1
frame are given in the initial frame as



 

px1
px
cos α − sin α 0
p x1
 py  = RZ0 (α)  py1  =  sin α cos α 0   py1 
p z1
pz
pz1
0
0
1

18
1. Groups, fields and vector spaces
The composite rotation is thus RZ0 (α)RX1 (β)RZ2 (γ) and




p x3
px
 py  = RZ0 (α)RX1 (β)RZ2 (γ)  py3 
p z3
pz
(11)



cos α cos γ − sin α cos β sin γ − cos α sin γ − sin α cos β cos γ sin α sin β
p x3
=  sin α cos γ + cos α cos β sin γ − sin α sin γ + cos α cos β cos γ − cos α sin β   py3 
p z3
sin β sin γ
sin β cos γ
cos β
The domains of α, β, γ could be taken as
α ∈ [0, 2π), β ∈ [0, 2π), γ ∈ [0, 2π)
Question: What are the values of α, β, γ for the rotation matrix


a11 a12 0
 a21 a22 0 ?
0
0 1
To give further interpretation to Euler angles and their use in engineering,
assume that there exists a mechanical system that imparts angular velocities
to a rigid body and, thse angular velocities are denoted based on the Euler
angles. Call these angular rates as (α̇, β̇, γ̇). The resultant angular velocity
vector, then, is
→
ω= α̇k̂ + β̇ î1 + γ̇ k̂2 .
where k̂, î1 , k̂2 denote unit vectors about the X0 , Z1 , Y2 axes respectively.
Expressing all vectors in the coordinates of the first frame, we have
 
 
0
1
k̂2 → RZ0 (α)RX1 (β)  0  î1 → RZ0 (α)  0 
1
0
  

 
0
cos α − sin α 0
1
0
0
0
RZ0 (α)RX1 (β)  0  =  sin α cos α 0   0 cos β − sin β   0 
1
0
0
1
0 sin β cos β
1


sin α sin β
=  − cos α sin β 
cos β
  
  

1
cos α − sin α 0
1
cos α
RZ0 (α)  0  =  sin α cos α 0   0  =  sin α 
0
0
0
1
0
0
Finally,

 


α̇
ω1
0 cos α sin α sin β
 ω2  =  0 sin α − cos α sin β   β̇  .
ω3
1
0
cos β
γ̇
|
{z
}
8. Interesting facts about rotations
19
A(α, β)
4
The mapping from ω = (ω1 , ω2 , ω3 ) to (α̇, β̇, γ̇) is non-singular only when
β 6= 0. What does this imply ? When β = 0,


0 cos α 0
A(α, 0) =  0 sin α 0  .
1
0
1
It can be seen that the rank of A is 2. Suppose we wish to achieve a certain angular velocity vector (a, b, c) for the mechanism, and do this through
specifying the Euler angle rates. Then
(12)
a = (cos α)β̇ b = (sin α)β̇ c = α̇ + γ̇
From the above equations we see that a and b cannot be arbitrarily specified
and further, infinitely many combinations of α̇ and γ̇ achieve the same c.
In aerospace/mechanical engineering, this phenomenon is called the gimbal
lock. It occurs because the map f from Euler angles to SO(3) (rotations)
f : (α, β, γ) → SO(3)
is not a covering map, it is not a local homeomorphism at every point, and
thus at some points the rank of the map must drop below 3, at which point,
this phenomenon called the gimbal lock occurs. (Note: The rank of a map
or function at a point is the rank of the Jacobian at that point.) In this
case, the Jacobian is A(α, β). Euler angles provide a parametrization for
any rotation in three dimensional space using three numbers, but as seen,
this description not unique, as also there are some points where not every
rotation can be realized by the given set of Euler angles. This feature arises
from a basic topological fact, and this is that there is no covering map from
the 3-torus (S 1 × S 1 × S 1 ) to SO(3); the only (non-trivial) covering map is
from the 3-sphere, and this prompts the use of quaternions.
8. Interesting facts about rotations
(This part is optional, but to the interested reader it whets your appetite
about the fascinating world of rotations and, groups at large.)
We now move on to relate four mathematical entities - the 3-dimensional
sphere S 3 , the real-projective space in 3-dimensions RP3 , the special unitary
group SU (2, C) and the special orthogonal group SO(3).
(1) The 3-dimensional sphere S 3 . We view this sphere as embedded
in the 4-dimensional real vector space R4 . Thus
S 3 = {(x0 , x1 , x2 , x3 ) ∈ R4 : x20 + x21 + x22 + x23 = 1}.
20
1. Groups, fields and vector spaces
(2) The real-projective space RP3 in 3-dimensions consists of the
set of all straight lines passing through the origin in R3 with the
origin excluded. An element of RP3 is called an equivalence class,
defined as
[ξ] = [ξ1 , ξ2 , ξ3 ] = {ξa : a ∈ R − {0}} = {(ξ1 a, ξ2 a, ξ3 a) : a ∈ R − {0}}.
As seen from the above definition, each straight line (with the origin
excluded) forms an equivalence class and is considered an element
of this set.
(3) The special unitary group SU (2, C). We shall denote the elements of this set as
x0 + ix1
x2 + ix3
(x0 , x1 , x2 , x3 ) ∈ R4 det(·) = x20 +x21 +x22 +x23 = 1
−(x2 − ix3 ) x0 − ix1
(4) The special orthogonal group SO(3): Recall that the special
orthogonal group SO(3) indicates a rotation in 3-dimensional
space. In terms of an axis η ∈ R3 and an angle of rotation θ ∈ R,
taken in a right-handed sense, a rotation by an angle θ about an axis
η is the same as that by an angle −θ about an axis −η. Hence the
two tuples (η, θ) and (−η, −θ) denote the same element in SO(3).
We now proceed to define four mappings between the four entities that we
have just introduced.
A mapping from S 3 to RP3 : Define two sets on S 3 as
M = S 3 −{(0, 0, 0, 1), (0, 0, 0, −1)}
and
N = S 3 −{(1, 0, 0, 0), (−1, 0, 0, 0)}
and consider the mappings
fM : M 3 (x0 , x1 , x2 , x3 ) →[x0 , x1 , x2 ] ∈ RP3
We notice that this is a 2-to-1 map with two antipodal (diametrically opposite) points on M mapping to the same element of RP3 . So
S 3 3 (−x0 , −x1 , −x2 , −x3 ) →[x0 , x1 , x2 ] ∈ RP3
Similarly
fN : N 3 (x0 , x1 , x2 , x3 ) →[x1 , x2 , x3 ] ∈ RP3
The two mapping fM and fN map S 3 to RP3 .
A mapping from SU (2, C) to SO(3): Let
0 i
SU (2, C) ⊃ M = SU (2, C) − {
}
i 0
and
1 0
SU (2, C) ⊃ N = SU (2, C) − {
}
0 1
S 3 = M ∪N
8. Interesting facts about rotations
21
Consider
M3
x0 + ix1
x2 + ix3
−(x2 − ix3 ) x0 − ix1
→((x0 , x1 , x2 ), πx3 ) ∈ SO(3)
| {z }
The above map is once again a 2-to-1 map since
−x0 − ix1 −x2 − ix3
→((−x0 , −x1 , −x2 ), −πx3 ) ∈ SO(3)
M3
(x2 − ix3 ) −x0 + ix2
|
{z
}
also yields the same rotation. Similarly
x0 + ix1
x2 + ix3
→((x1 , x2 , x3 ), πx0 ) ∈ SO(3)
N3
−(x2 − ix3 ) x0 − ix1
| {z }
is again a 2-to-1 map.
A mapping from S 3 to SU (2, C): Define this as
3
f : S 3 (x0 , x1 , x2 , x3 ) →
x0 + ix1
x2 + ix3
−(x2 − ix3 ) x0 − ix1
∈ SU (2, C)
As is easily seen, this is an isomorphism.
A mapping from RP3 to SO(3): Define this by mapping the unit ball
D = {(x0 , x1 , x2 ) ∈ R3 : k(x0 , x1 , x2 )k ≤ 1}
q
D 3 (x0 , x1 , x2 ) 6= 0 →((x0 , x1 , x2 ), π (x20 + x21 + x22 )) ∈ SO(3)
| {z }
D 3 0 → I ∈ SO(3)
Antipodal points in D - (x0 , x1 , x2 ) and −(x0 , x1 , x2 ) maps to the same
element in SO(3). We now map D to S 3 as follows.
q
D 3 (x0 , x1 , x2 ) 6= 0 →(x0 , x1 , x2 , + (1 − (x20 + x21 + x22 )) ∈ S 3
with the positive sign in the last element indicating the upper hemisphere
of S 3 . Further, an antipodal point in D gets identified with the same point
in
q
D 3 (−x0 , −x1 , −x2 ) 6= 0 →(x0 , x1 , x2 , + (1 − (x20 + x21 + x22 )) ∈ S 3
The points on the boundary of D are mapped to the equator on S 3 as
D 3 (x0 , x1 , x2 )(k(x0 , x1 , x2 )k = 1) →(x0 , x1 , x2 , 0)
q
3
RP 3 [x0 , x1 , x2 ] →((x0 , x1 , x2 ), π (x20 + x21 + x22 )) ∈ SO(3)
| {z }
Based on the discussion of this section, we have the diagram 6.
22
1. Groups, fields and vector spaces
S3
∼
=
SU (2, C)
2 to 1
RP3
2 to 1
∼
=
SO(3)
Figure 6. Diagram for the 4 maps.
8.1. The quaternion algebra. Based on the previous discussion, let us
introduce a multiplicative structure on R4 as follows. We identify an element
- (x0 , x1 , x2 , x3 ) - in R4 with a 2 × 2 complex matrix
x0 + ix1
x2 + ix3
−(x2 − ix3 ) x0 − ix1
and denote this collection of 2×2 matrices as R4 . This is a real vector space,
with a basis
1 0
i 0
0 1
0 i
1=
, i=
, j=
, k=
0 1
0 −i
−1 0
i 0
Further, if the basis defined is declared as orthonormal,
< 1, i >= 0, < 1, k >= 0, · · · and so on,
then the norm of an element x ∈ R4 is
kxk22 = x20 + x21 + x22 + x23
which is identical to the norm of the element (x0 , x1 , x2 , x3 ) in R4 . This is
termed an isometry.
Under matrix multiplication, we have
i∗i = j ∗j = k∗k = −1
i∗j = −j ∗i = k, j ∗k = −k∗j = i, k∗i = −i∗k = j
with 1 as the multiplicative identity. The space we have constructed is a
4-dimensional real vector space and is called the algebra of quaternions.
From the discussion in the previous section, SU (2, C), that was viewed as
the sphere S 3 embedded in R4 , is the set of unit quaternions.
9. More on the rigid body transformation group SE(3)
The set of matrices of the form
4
ω̂ v
ˆ
ξ=
ω̂ ∈ so(3); v ∈ R3 (⊂ R4×4 )
0 0
11. Ring
23
with the bracket operation [·, ·] defined as
(ω̂1 ω̂2 − ω̂2 ω̂1 ) ω̂1 v2 − ω̂2 v1
[ξˆ1 , ξˆ2 ] =
0
0
forms a Lie algebra and is denoted as se(3).
Given the angular velocity
ω, the linear velocity v and the time of motion t, define a matrix
4
ω̂ v
ξˆ =
0 0
ˆ gives the actual rigid body transformation.
Then the exponential exp(ξt)
Mathematically, the exponential map is a transformation from se(3) to
SE(3) given as
4
ˆ =
ˆ + ξˆ2 t2 /2! + . . . ∈ SE(3)
exp(ξt)
I + ξt
The exponential map from the Lie algebra se(3) to the group SO(3) is
a many-to-one map that is surjective. (A given rotation (∈ SE(3))can be
obtained in more than one (∈ se(3)) element).
10. Rings and Fields
Our next two structures are a ring and a field. Though we shall not be discussing these in detail, they are essential in making our way to the important
structure of vector spaces that, we shall discuss in much detail.
11. Ring
By defining an additional binary operation in a group, we impose additional
structure and define a mathematical object termed a ring. A ring is a set
R with two binary operations + and × such that
(1) a + (b + c) = (a + b) + c ∈ R (+ is associative)
(2) There exists a unique element 0 ∈ R (called the zero element) such
that
a+0=0+a=a
(3) For every element a ∈ R there exists an element a−1 ∈ R such that
a + (a−1 ) = 0
(4) a + b = b + a ∈ S (+ is commutative) (Note that the axioms 1
through 4 make the set a commutative group with respect to the
operation +. The × operation satisfies:)
(5) a × (b × c) = (a × b) × c ∈ R (× is associative)
(6) a × (b + c) = (a × b) + (a × c) ∈ R (× is distributive over +)
24
1. Groups, fields and vector spaces
Which of the following sets qualifies for a ring ?
(1) The set Z with + and ×.
(2) The set of rational number with + and ×.
(3) The set of all n × n matrices with polynomial entries (Pn×n ) with
the conventional matrix multiplication and addition.
(4) The set of all minimum phase transfer functions (denoted by Gmp )
with the usual notion of transfer function addition and multiplication.
(5) The set of all proper transfer functions (denoted by Gprop )
(6) The set of all stable transfer functions (denoted by Gstb )
(7) The set of all 2 × 2 matrices with elements from Z (denoted as
Z2×2 )
If there exists a unique element 1 ∈ R such that
1 × x = x × 1 = x ∈ R ∀x ∈ R,
then the ring R is said to have an identity - 1.
Consider a ring R with
identity. Then any x ∈ R is called a unit in R if there exists a y ∈ R such
that
x × y = y × x = α.
Which elements are the units in the ring Z, Z2×2 , Pn×n , Gst ?
which satisfies the additional axiom
A ring
• a × b = b × a ∈ R (× is commutative)
is called a commutative ring.
Is Z a commutative ring ? Z2×2 ?
12. Field
A field F is a commutative ring with identity satisfying the following axioms
• F contains atleast two elements
• Every nonzero element of F is a unit
The notion of ”inverse” with respect to the × operation thus enters in the
definition of a field.
(1) Consider the set {0, 1, 2} where + and × are the usual addition
and multiplication operations. Is this a ring ? Is this a field ?
(2) Is the set Z2×2 a field ?
13. Vector Spaces
25
13. Vector Spaces
A set of elements V with the binary operation + is said to form a linear
vector space over the field F if they satisfy the following axioms for any
x, y, z ∈ V
• x+y =y+x∈V
• (x + y) + z = x + (y + z) ∈ V
• There is a unique element 0 in V such that
0 + x = x ∀x ∈ V
• For every element x ∈ V there exists an element x−1 ∈ R such that
x + (x−1 ) = 0
(Note that these first four conditions make a vector space a commutative
group.) Further on, there are a few more conditions, based on the notion of
a ”scalar” and ”vector” association defined as follows:
• For every α ∈ F and x ∈ V, there exists an element αx ∈ V.
• For every α, β ∈ F and x, y ∈ V,
– αx + βy ∈ V
– α(βx) = (αβ)x
– α(x + y) = αx + αy
– (α + β)x = αx + βx
– 1x = x ∀x ∈ V
Remark 1.
Which of the following qualifies for a vector space ?
• The set R over the field R.
• The string of n-tuples of real numbers over the field R.
• The set of all real-valued continuous functions over the interval [0, 1]
over the field R.
• The set of all polynomials of degree n or less in the indeterminate x
xn + a1 xn−1 + . . . + an ai ∈ C
over the field C.
13.1. Linear independence, span and subspace. A set of elements
{x1 , x2 , . . . , xn } (not including the zero element θ) in a vector space V is
said to be linearly independent if
α1 x1 + α2 x2 + . . . + αn xn = 0 ⇒ α1 = α2 = . . . = αn = 0
Else, the set of elements is linearly dependant.
26
Note
1. Groups, fields and vector spaces
more clarity needed here...Any set that includes the zero
element.......
(1) Consider the real vector space R2 over the field of real numbers. Is the
set
1
0
(13)
−1
−2
linearly independent ?
(2) Let F denote the real, vector space of all continuous functions {f : R →
R}. Give an example of a linearly independent set in this vector space.
The linear span of a set of elements
S = {x1 , . . . , xp }
in a vector space V is defined as
{
p
X
αi xi : αi ∈ F }
1
and is denoted by span S.
A subset M of a vector space V which satisfies
x, y ∈ M ⇒ αx + βy ∈ M ∀x, y ∈ M
is called a subspace of V.
Remark 2. By its definition, a subspace satisfies all properties of a vector
space.
The maximal number of elements in any linearly independent set of a subspace M is the dimension of M .
(1) Let V be the set of all 2 × 2 matrices with real (R) entries considered
over the field R. Show that V is a vector space. Prove that V has
dimension 4.
(2) Let V be the real vector space of all functions from R into R. Which
of the following sets of functions are subspaces of V ?
(a) all f such that f (x2 ) = f (x)2
(b) all f such that f (0) = f (1)
(c) all f which are continuous
(3) In question 1, which of the following sets of matrices A in V are subspaces of V ?
(a) all invertible A
(b) all non-invertible A
(c) all A such that A2 = A
14. Basis
27
(4) In question 1, let W1 be the set of matrices of the form
x −x
(14)
y z
and let W2 be the set of matrices of the form
a b
(15)
−a c
(a) Prove that W1 and W2 are subspaces of V
T
(b) Find the dimensions of W1 , W2 , W1 + W2 and W1 W2
(5) Find the coordinates (the scalars) of the vector (1, 0, 1) in the basis of
C 3 consisting of the vectors
{(2i, 1, 0), (2, −1, 1), (0, 1 + i, 1 − i)}
(6) Classify each of the following sets as a vector space and/or a group.
Specify the binary operation and check all other requirements.
Rn , Rn×n , O(n), SO(n), S 1 , S 1 × S 1 , SE(3), Z
(7) If R ∈ SO(3) and ω ∈ R3 , then show that Rω̂RT = (Rω)
If R ∈ SO(3) and v, w ∈ R3 , then show that R(v × w) = (Rv) × (Rw)
(9)
(8) Is so(3) a vector space ? What is its dimension and give a basis.
14. Basis
A set of elements in a vector space is called a basis if it satisfies two properties
• The set is linearly independent.
• Each element of the vector space belongs to the linear span of the set.
Remark 3. A basis for a vector space is not unique.
Any two bases for a vector space have the same number of elements.
Proof. to be done.
Note
The number of elements in a basis of a vector space is called the dimension of the vector space.
Two vector spaces V1 and V2 over the same
field F are said to be isomorphic to each other if there exists a one-to-one
and onto correspondence between the elements in each vector space that
preserves linearity, i. e.
V 1 3 x 1 → y1 ∈ V 2 ,
V1 3 x2 → y2 ∈ V2 ⇒ αx1 + βx2 → αy1 + βy2
28
Note
1. Groups, fields and vector spaces
Examples
(1) Prove that if the set {x1 , x2 , . . . , xm } spans a subspace M of a vector
space V , so does the set
{x1 − x2 , x2 − x3 , . . . , xm }
(2) Let U be a subspace of R5 defined by
U = {(ξ1 , . . . , ξ5 ) ∈ R5 : ξ1 = 3ξ2 , ξ3 = 7ξ4 }
Find a basis for U .
(3) Give an example of a nonempty subset U of R2 such that U is closed
under scalar multiplication, but U is not a subspace of R2 .
(4) Let U1 and U2 be
S two subspaces of a vector space V . Under what
conditions Is U1 U2 a subspace of V ? Support your answer with
arguments.
(5) Prove that if U1 and U2 are subspaces of a finite dimensional vector
space V , then
dim(U1 + U2 ) ≤ dim(U1 ) + dim(U2 )
(6) Consider the vector space R4 and the subspace U = {(ξ1 , 0, ξ3 , 0) :
ξ1 , ξ3 ∈ R}. Give two subspaces U1 and U2 such that
R4 = U ⊕ U1
R 4 = U + U2
R4 6= U ⊕ U2
15. Linear functionals and dual basis
A linear functional F on a vector space V over the field F is a mapping
F : V → F that satisfies
F(α1 x1 + α2 x2 ) = α1 F(x1 ) + α2 F(x2 )
∀x1 , x2 ∈ V
and
∀α1 , α2 ∈ F
Alternate notation for a linear functional is as follows. Let y be a linear
functional. Then, its action on an element x in the vector space is denoted
by
[x, y] and [α1 x1 + α2 x2 , y] = α1 [x1 , y] + α2 [x2 , y]
The set of all linear functionals over a vector space V forms a vector space
and is called the dual of V and denoted by V0 . The dimension of V0 is the
same as V. Elements of the dual space will be denoted by the letter y.
Given a vector space V, a basis X = {x1 , . . . , xn } and n-scalars α1 , . . . , αn ,
there exists a unique linear functional y which satisfies [xi , y] = αi .
Proof. Left as an exercise.
Given a vector space V and a basis X =
{x1 , . . . , xn }, there exists a unique basis {y1 , . . . , yn } with the property
[xj , yi ] = δij
i, j = 1, . . . , n,
16. Annihilator
29
and is termed the dual basis.
Proof. Given the set of scalars (1, 0, . . . , 0), the linear functional y1 is
unique. Similarly for y2 , . . . , yn . We now proceed to show that {y1 , . . . , yn }
forms a basis for V0 .
P
Linear independence: Let αi be a set of scalars such that i αi yi = 0. Then
X
[xj ,
αi yi ] = 0∀j = 1, . . . , n ⇒ α1 = · · · = αn = 0
i
Since all the αi s are zero, this implies linear independence.
Every member of V0 can be expressed as a linear combination of {y1 , . . . , yn }
: Let y ∈ V0 and [xi , y] = βi . Then
X
y=
βi yi
i
is the expression for y in terms of the yi s.
The dual of Cn (Rn ) is identified with Cn (Rn ) itself.
Proof. Let y ∈ Cn0 and
X = {x1 , . . . , xn }
be a basis in Cn . Then there exists {α1 , . . . , αn }(αi ∈ C) that are unique
and which satisfy
[xi , y] = αi
The unique n-tuple (α1P
, . . . , αn ) is identified with the linear functional y
since for any Cn 3 x = ni=1 βi xi
[x, y] =
n
X
αi βi
i=1
Alternatively, every n-tuple in Cn corresponds to a linear functional on Cn .
16. Annihilator
If S is a subset of a vector space V, then y ∈ V0 is said to annihilate S if
[x, y] = 0 ∀x ∈ S.
The set of all elements y ∈ V0 with the property that ”y annihilates S ” is
called the annihilator of S and is denoted by S o The annihilator S o is
always a subspace of V0 .
If M is a subspace of dimension m of a finite
dimensional vector space V of dimension n, then the dimension of M 0 is
n − m.
30
1. Groups, fields and vector spaces
Proof. Let {x1 , . . . , xm } form a basis for M and let {x1 , . . . , xm , xm+1 , . . . , xn }
form a basis for V. Let {y1 , . . . , ym , ym+1 , . . . , yn } be the corresponding dual
basis. We shall show that M o = span{ym+1 , . . . , yn }.
o
Let
property of a basis, y can be expressed as
P y ∈ M . From the
o , [x , y] = 0 ∀i = 1, . . . , m. This implies α =
y = i αi yi . Since y ∈ MP
i
1
n
o
. . . = αm = 0. So y =
i=m+1 αi yi ⇒ y ∈ span{ym+1 , . . . , yn } ⇒ M ⊂
span{ym+1 , . . . , yn }.
P
Let y = ni=m+1 βi yi ∈ span{ym+1 , . . . , yn }. Then for any x ∈ M
[x, y] = [
m
X
i=1
n
X
xi ,
βj yj ] = 0 ⇒ y ∈ M o ⇒ span{ym+1 , . . . , yn } ⊂ M o .
j=m+1
The result follows from the fact that the dimension of span{ym+1 , . . . , yn }
is n − m.
17. Direct sum
Two subspaces M and N of a vector space V are said to form a direct sum
of V and denoted as V = M ⊕ N if they satisfy the following property
• Every x ∈ V can be expressed in the form
x = z + s z ∈ M, s ∈ N
in a unique way.
(1) If V is an n-dimensional vector space over a finite field , and if 0 <
m < n then the number of m- dimensional subspaces of V is same as
the number of (n − m)-dimensional subspaces.
(2) Suppose that x, y, u and v are vectors in C4 ; let M and N be the subspaces of C4 spanned by x,y and u,v respectively. In which of the
following cases is it true that C4 = M ⊕ N ?
(a) x=(1,1,0,0) , y=(1,0,1,0)
u=(0,1,0,1) , v=(0,0,1,1)
(b) x=(-1,1,1,0) , y=(0,1,-1,1)
u=(1,0,0,0) , v=(0,0,0,1)
(c) x=(1,0,0,1) , y=(0,1,1,0)
u=(,1,0,1,0) , v=(0,1,0,1) .
(3) If M is the subspace consisting of all those vectors (ξ1 , ....ξn , ξn+1 , ....ξ2n )
in C2n for which ξ1 = ... = ξn = 0, and if N is the subspace of all those
vectors for which ξj = ξn+j , j=1,...,n, then C2n = M ⊕ N .
18. Multilinear functionals (tensors)
31
(4) Construct three subspaces M, N1 and N2 of a vector space V so that
M ⊕ N1 = M ⊕ N2 = V but N1 6= N2 . (Note that this means that there
is no cancellation law for direct sums). What is the geometric picture
corresponding to this situation ?
(5) (a) If U, V and W are vector spaces, what is the relation between
U⊕(V ⊕ W) and (U ⊕ V)⊕W (i.e., in what sense is the formulation of direct sums as associative operation) ?
(b) In what sense is the formation of direct sums commutative ?
(6) Consider the quotient spaces obtained by reducing the spaces P of polynomials modulo various subspaces. If M = Pn , is P/M finite dimensional? What if M is the subspace consisting of all even polynomials
divisible by xn (where xn (t) = tn ) ?
(7) Prove that each of the corresponding described below is a linear transformation.
(a) V is the set C of complex numbers regarded as a real vector space;
Ax is complex conjugate of x .
(b) V is P; if x is a polynomial, then (Ax)(t)=x(t + 1) − x(t) .
(8) Prove that if V is a finite-dimensional vector space, then the space of
all linear transformations on V is finite-dimensional, and find its dimension .
18. Multilinear functionals (tensors)
The notion of a linear functional is easily extended to a function with multiple arguments, wherein each argument belongs to a vector space and the
function satisfies linearity with respect to each of these.
A k-linear
functional Fk on a vector space V over the field F is a mapping Fk :
V
· · × V} → F that satisfies linearity in each argument.
| × ·{z
k−times
Remark 4. The k-linear functional, as defined above, is also called a covariant k-tensor. If each of the the arguments V in Fk were to be replaced
by the dual space V0 (with the linearity property still holding true) as
0
Fk : V
· · × V}0 → F,
| × ·{z
k−times
then the object obtained is called a contravariant k-tensor and we denote
the k-linear functional by a superscript notation as Fk . In many texts, a
32
1. Groups, fields and vector spaces
covariant k-tensor is also denoted as a (0, k)-tensor, while a contravariant
k-tensor is denoted as a (k, 0)-tensor.
The set of all k-linear functionals on an n-dimensional vector space V
(or covariant k-tensors), denoted by Tk , forms a vector space of dimension
nk .
Proof. (The axioms of a vector space are easily shown.) We move on to
show the dimension. We construct a basis for Tk as follows. Let X =
{e1 , . . . , en } be a basis for V. We construct nk covariant tensors as follows.
The first covariant tensor F 1 . . . 1 is constructed as follows. Define
| {z }
k−times
F1...1 (ψ1 , . . . , ψk ) =
1
for
(ψ1 , ψ2 , . . . , ψk−1 , ψk ) = (e1 , e1 , . . . , e1 , e1 )
0 otherwise
The second covariant tensor
1
for
(ψ1 , ψ2 , . . . , ψk−1 , ψk ) = (e2 , e1 , . . . , e1 , e1 )
F2...1 (ψ1 , . . . , ψk ) =
0 otherwise
and extending this construction, the covariant k-tensor Fi1 ...ik is constructed
as
1
for
(ψ1 , . . . , ψk ) = (ei1 , . . . , eik )
Fi1 ...ik (ψ1 , . . . , ψk ) =
0 otherwise
Now we claim that this constructed set of k-tensors {Fi1 ...ik , 1 ≤ i1 , . . . , ik ≤
n} forms a basis.
Property 1: Any covariant k-tensor can be expressed as a linear combination
of the elements of this set.
Consider an arbitrary covariant k-tensor F which takes values at the basis
vectors as
F(ei1 , . . . , eik ) = βi1 ...ik .
Now for any arbitrary vectors (vi1 , . . . , vik ),
X
X
X
X
F(vi1 , . . . , vik ) = F(
αi1 ei1 , . . . ,
αik eik ) =
···
αi1 . . . αik βi1 ...ik Fi1 ...ik (ei1 , . . . , eik )
i1
=
X
i1
···
X
ik
βi1 ...ik Fi1 ...ik (vi1 , . . . , vik ) ⇒ F =
ik
i1
ik
X
i1
···
X
βi1 ...ik Fi1 ...ik
ik
Property 2 : The set {Fi1 ...ik , 1 ≤ i1 , . . . , ik ≤ n} is a linearly independent
set.
Let
X
X
···
αi1 ...ik Fi1 ...ik = 0
i1
ik
19. Linear transformations
33
Consider the action of the above linear combination on the vector (vi1 , vi2 , . . . , vik−1 , vik ) =
(e1 , e1 , . . . , e1 ). We have
X
X
αi1 ...ik Fi1 ...ik (e1 , e1 , . . . , e1 ) = α1...1 = 0
···
i1
ik
Continuing with this procedure with vectors (e2 , e1 , . . . , e1 ), (e3 , e1 , . . . , e1 ), . . . (e1 , e1 , . . . , en ),
we show that all the coefficients αi1 ...ik are zero, thus proving linear independence.
A k-linear functional that satisfies the property
Fk (v1 , . . . , vi , . . . , vj , . . . , vk ) = Fk (v1 , . . . , vj , . . . , vi , . . . , vk )
for all i, j is called a symmetric k-linear functional.
A k-linear functional that satisfies the property
Fk (v1 , . . . , vi , . . . , vj , . . . , vk ) = −Fk (v1 , . . . , vj , . . . , vi , . . . , vk )
for all i, j is called a skew-symmetric k-linear functional.
The set of all skew-symmetric k-linear functionals (k ≤ n) on V (or
skew-symmetric covariant k-tensors) forms a subspace of dimension n Ck of
Tk .
Proof. From the skew-symmetry property, we have F(vi1 , . . . , vik ) = 0
whenever two of the vectors in its argument are identical. Moving on the
lines of the proof of theorem (18), we construct a set of skew-symmetric
covariant k-tensors from the set
{Fi1 ...ik , 1 ≤ i1 , . . . , ik ≤ n : Fi1 ...ik (ei1 , . . . , eik ) = 1 and zero for all other arguments},
by eliminating all those elements for which any two vectors are identical.
Applying this logic, we have n choices for the first subscript i1 , (n − 1)
choices for the second subscript i2 and so on till we have (n − k + 1) choices
for the subscript is . The set this obtained contains n × (n − 1) . . . (n − k + 1)
elements and is a basis. Details can be worked out.
(1) Find the dimension of the subspace of symmetric k-linear functionals
on a vector space V.
(2) Find the dimension of the subspace of skew-symmetric k-linear functionals on a vector space V.
19. Linear transformations
A correspondence A from a vector space V to W
V 3 x → Ax ∈ W
34
1. Groups, fields and vector spaces
that satisfies
A(αx + βy) = αAx + βAy
∀ x, y ∈ V
α, β ∈ F
is called a linear transformation from V to W.
The set of all linear
transformations from V to W forms a vector space. Proof: Left as an
exercise. What is the dimension of the vector space of all linear transformations from V to W ?
Proof. Hint: Consider a basis X = {x1 , . . . , xn }. Construct a set of n2
linear transformations
{A11 , A12 , . . . , A1n , A21 , . . . , Ann },
defined as follows
Aij xk =
xj
for
k=i
0 otherwise
For instance, take A23 . Then A23 x2 = x3 and A23 xi = 0 for all i 6= 2.
Peculiarities of linear transformations:
Consider the vector space of linear transformations on V. The product AB
of two linear transformations A and B is defined as follows:
ABx = A(Bx) ∀x ∈ V
(1) The product of two non-zero transformations could yield a zero transformation. Example: Let V = P3 , the vector space of polynomials
d
of degree 2 or less in the indeterminate t. Now consider A = dt
and
2
d
B = dt2 .
(2) The product, in general, is in non-commutative (AB 6= BA).
Example: This is easily seen in matrices as there product, in general,
is non-commutative.
.
20. Matrix representations, the adjoint and similarity
transformations
Given a linear transformation A from V to W, a basis X = {x1 , . . . , xn }
in V and a basis Y = {y1 , . . . , yp } in W, the array of np scalars (αij , i =
1, . . . , p, j = 1, . . . , n)


α11 · · · α1n
 ..
..
.. 
 .
.
. 
αp1 · · ·
αpn
20. Matrix representations, the adjoint and similarity transformations
35
defined by the relation
Axj =
p
X
αkj yk j = 1, . . . , n
k=1
is called the matrix representation of A with respect to the basis X and Y.
(1) Let A be the linear transformation on Pn defined by (Ax)(t) = x(t + 1),
and let {x1 , ..., xn+1 } be the basis of Pn defined by xj (t) = tj , j =
0, ..., n − 1. Find the matrix of A with respect to this basis.
(2) Find the matrix of the operation of conjugation on C, considered
as a
√
real vector space, with respect to the basis {1, i}(where i = −1).
(3) Consider the vector space of all two-by-two matrices and let A be the
linear transformation that sends each matrix X onto P X, where
1 1
P =
1 1
. Find the matrix of A with respect to the basis consisting of
1 0
0 1
0 0
0 0
,
,
,
0 0
0 0
1 0
0 1
(4) Let A be the linear transformation on C2 defined by A(ξ1 , ξ2 ) = (ξ1 +
ξ2 , ξ2 ). Prove that if a linear transformation B commutes with A, then
there exists a polynomial p such that B = p(A).
(5) If A and B are linear transformations on a vector space, and if AB = 0,
does it follow that BA = 0?
(6) (a) Suppose that V is a finite-dimensional vector space with basis {x1 , ..., xn }.
Suppose that α1 , ..., αn are pairwise distinct scalars. If A is a linear
transformation such that Axj = αj xj , j = 1, ..., n, and if B is a linear transformation that commutes with A, then there exists scalars
β1 , ..., βn such that Bxj = βj xj .
(b) Prove that if B is a linear transformation on a finite dimensional
vector space V and if B commutes with every linear transformation on
V then B is a scalar. (that is there exits a scalar β such that Bx = βx
for all x in V)
(7) (a) It is easy to extend matrix theory to linear transformations between
different vector spaces. Suppose that U and V are vector spaces over
the same field, let {x1 , ..., xn } and {y1 , ..., ym } be the bases of U and V
respectively, and let A be the linear transformation from U to V. The
matrix of A is, by definition, the rectangular m by n, array of scalars
defined by
Axj = Σi αij yi .
Define addition and multiplication of rectangular matrices so as to generalize as many as possible of the results of 38inHalmos.(N otethattheproductof anm1
36
1. Groups, fields and vector spaces
by n1 matrix and an m2 by n2 matrix, in that order, will be defined
only if n1 =m2 .)
(b)Suppose that A and B are multipliable matrices. Partition A into
four rectangular blocks(top left, top right,bottom left,bottom right) and
then partition B similarly so that number of columns in the top left
part of A is the same as the number of rows in the top left part of B.
If, in an obvious shorthand, there partitions are indicated by
B11 B12
A11 A12
,
,B =
A=
B21 B22
A21 A22
then
AB =
A11 B11 + A12 B21 A11 B12 + A12 B22
A21 B11 + A22 B21 A21 B12 + A22 B22
,
(c) Use subspaces and complements to express the result of (b) in terms
of linear transformations (instead of matrices).
(d) Generalize both (b) and (c) to larger number of pieces (instead of
four).
(8) Suppose that the matrix of a linear transformation (on a two-dimensional
vector space) with respect to some coordinates system is
0 0
0 1
. How many subspaces are there invariant under the transformation?
20.1. The inverse and adjoint transformations. If a linear transformation A on a vector space V satisfies the two properties
(1) x1 6= x2 ⇒ Ax1 6= Ax2 ,
(2) For every y ∈ V there exists an x ∈ V such that Ax = y ,
then the linear transformation is said to be invertible and the transformation
which corresponds x to y, where Ax = y is called the inverse of A and is
denoted by A−1 and A−1 y = x. Show that A−1 is a linear transformation.
For finite dimensional vector spaces, the above two conditions are equivalent
to the single condition
Claim 20.1. If a linear transformation A on a finite dimensional vector
space satisfies the condition
Ax = 0 ⇒ x = 0
then A is said to be invertible.
What does a linear transformation induce on the dual space ? A linear
transformation A on a vector space V induces a transformation A0 on the
20. Matrix representations, the adjoint and similarity transformations
37
dual space V0 called the adjoint (or dual) transformation defined as
[x, A0 y] = [Ax, y] ∀x ∈ V, y ∈ V0 .
Claim 20.2. The adjoint is a linear transformation on V0 .
Proof. Left as an exercise.
20.2. Similarity transformations. We now pose questions of the following nature:
What happens to the matrix representation of a linear transformation under
a change of basis ?
Claim 20.3. Given a linear transformation A on a vector space V and two
bases X and Y, the matrix representation [A]X is related to [A]Y as
[A]Y = [C]−1
X [A]X [C]X
where C is an invertible linear transformation defined as Cxi = yi .
Proof. Now
Ayi = βji yj = βji Cxj = βji (γkj xk ).
We also have
Ayi = ACxi = Aγmi xm = γmi (Axm ) = γmi (αlm xl )
Comparing the scalars associated with xk in each of the two preceding expressions, we obtain
γkj βji = αkm γmi ,
which in matrix form is
[C]X [A]Y = [A]X [C]X
The result follows.
Claim 20.4. Given two linear transformations A and B on a vector space
V, both of which have identical matrix representations αij in basis X and Y
respectively, the linear transformations are related as
A = C −1 BC
where C is an invertible linear transformation defined as Cxi = yi .
Proof.
Byi = αki yk = αki Cxk = C(αki xk ) = CAxi
Now Byi = BCxi as well. Comparing the previous two expressions, we have
BC = CA.
38
1. Groups, fields and vector spaces
21. Eigen values
Which are those vectors x ∈ V that just get scaled x → βx(β ∈ F ) under the
action of a linear transformation A ? How much do they get scaled ? The
answer to this question is related to the notion of an eigen value. β ∈ F
is called an eigen value of a linear transformation A if there exists atleast
one non-zero vector xβ such that
Axβ = βxβ
The vector xβ is termed an eigen vector corresponding to the eigen value
β.
The eigen values of a linear transformation remain invariant under a
similarity transformation.
Proof. : left as an exercise.
22. Multilinear forms and the determinant
In this section we usher in the notion of a determinant through the theory
of multilinear forms and then establish the equivalence of this notion with
our earlier comprehension of the determinant.
Recall that the space of skew-symmetric n-linear forms over an n-dimensional
vector space V has dimension 1. Let A be a linear transformation from V to
V and let X = {x1 , . . . , xn } be a basis for V. Now let w be a skew-symmetric
n-linear functional and define a transformation Ā on Tks−sym as
(16)
(Āw)(x1 , . . . , xn ) = w(Ax1 , . . . , Axn )
Since the space of skew-symmetric n-linear forms over an n-dimensional
vector space V has dimension 1,
(17)
(Āw)(x1 , . . . , xn ) = δw(x1 , . . . , xn )
where δ is a scalar. We shall show that this scalar δ is indeed the determinant of the linear transformation A in the basis X. Let [αij ] be the matrix
representation of A in the given basis. Then
n
n
X
X
(18)
w(Ax1 , . . . , Axn ) = w(
αj1 xj , . . . ,
αjn xj )
j=1
j=1
Using the property of linearity in each argument, the RHS of the above
equation could be expanded. On doing so, the terms that have any two
22. Multilinear forms and the determinant
39
identical entries would disappear and the resulting summation would yield
terms, each of which looks like
w(απ(1)1 xπ(1) , . . . , απ(n)n xπ(n) ) = απ(1)1 · · · απ(n)n w(xπ(1) , . . . , xπ(n) )
= απ(1)1 · · · απ(n)n πw(x1 , . . . , xn )
(19)
for some permutation π of the integers (1, . . . , n). Summing over all
possible permutations, we have
n
n
X
X
X
w(
αj1 xj , . . . ,
αjn xj ) =
απ(1)1 · · · απ(n)n πw(x1 , . . . , xn )
j=1
(20)
j=1
all permutations π
X
=
απ(1)1 · · · απ(n)n sign(π)w(x1 , . . . , xn )
all permutations π
From (17) and (18), we have
X
δ=
(21)
απ(1)1 · · · απ(n)n sign(π)
all permutations π
The RHS of the above equation is the computation of the determinant as
we know it. So we now term δ as det(A).
Property 22.1. Say C = BA. Then δC = δB δA .
Proof.
δC w(x1 , . . . , xn ) = (C̄w)(x1 , . . . , xn ) = w(Cx1 , . . . , Cxn )
= w(BAx1 , . . . , BAxn ) = (B̄w)(Ax1 , . . . , Axn ) = δB w(Ax1 , . . . , Axn )
(22)
= δB (Āw)(x1 , . . . , xn ) = δB δA w(x1 , . . . , xn )
Property 22.2. If A is an invertible transformation, then δA 6= 0.
Proof.
(23)
1 = AA−1 ⇒ δ1 = 1 = δA δA−1 ⇒ δA 6= 0
Chapter 2
Additional structures
in vector spaces
1. Norms
A norm on a vector space V is a non-negative, real-valued map (denoted by ‖·‖)
‖·‖ : V → R_{≥0}
which satisfies the following properties:
(1) ‖r‖ ≥ 0 for all r ∈ V, and ‖r‖ = 0 ⟺ r = 0;
(2) ‖r + y‖ ≤ ‖r‖ + ‖y‖ for all r, y ∈ V (the triangle inequality);
(3) ‖βr‖ = |β| ‖r‖ for all β ∈ F and r ∈ V.
The notion of length in the real Euclidean space R^n is familiar to most of us. This length is the Euclidean norm, or 2-norm, defined as
‖x‖_2 := ( ∑_{i=1}^{n} |x_i|^2 )^{1/2}   for x = (x_1, . . . , x_n).
1.1. A few typical norms. On C^n and R^n consider:
• For x ∈ C^n, the p-norm (1 ≤ p < ∞) is defined as
‖x‖_p := ( ∑_{i=1}^{n} |x_i|^p )^{1/p},
and the ∞-norm as
‖x‖_∞ := max_{1≤i≤n} |x_i|.
• For x ∈ C^0[a, b] (the space of continuous functions on the real interval [a, b]), the 2-norm is defined as
‖x‖_2 := ( ∫_a^b (x(s))^2 ds )^{1/2}.
Note that all these norms satisfy the three properties stated in the
definition.
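As a quick numerical illustration (added here, not part of the original notes), the following Python sketch evaluates the p-norms on C^3 directly from the formulae above and approximates the 2-norm of a continuous function by a trapezoidal quadrature; the particular vector and function are assumptions chosen for illustration.

import numpy as np

x = np.array([3.0, -4.0, 1.0 + 2.0j])                 # an element of C^3

def p_norm(x, p):
    # p-norm: (sum over i of |x_i|^p)^(1/p)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

print(p_norm(x, 1), p_norm(x, 2), np.max(np.abs(x)))  # 1-, 2- and infinity-norms

# 2-norm of x(s) = sin(s) in C^0[0, pi], approximated by the trapezoidal rule
s = np.linspace(0.0, np.pi, 10001)
y = np.sin(s) ** 2
two_norm = np.sqrt(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)
print(two_norm, np.sqrt(np.pi / 2.0))                 # the exact value is sqrt(pi/2)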
1.2. Norms on signals. The notion of a "signal" is well entrenched in engineering. The current (or voltage) in an electrical element, the velocity of a moving body, the concentration of a component in a mixture, the torque supplied by a motor and the speed of a rotor shaft are some examples of physical quantities that are termed signals. Mathematically speaking, signals are functions of time or of other arguments. Scalar signals encountered in many electrical and mechanical applications are often real-valued functions of time (and n-dimensional signals are n-tuple-valued functions of time). These signals are amenable to being modelled as elements of a vector space. We now endow signals with norms. Norms on signals often give an indication of physical quantities such as energy, amplitude and so on.
Examples of norms on signals:
• The 1-, 2- and ∞-norms for signals belonging to the space S_c of R^n-valued continuous signals r : (−∞, ∞) → R^n are
‖r‖_1 := ∫_{−∞}^{∞} ∑_{i=1}^{n} |r_i(t)| dt,   r ∈ S_c,
‖r‖_2 := ( ∫_{−∞}^{∞} ∑_{i=1}^{n} (r_i(t))^2 dt )^{1/2},   r ∈ S_c,
‖r‖_∞ := sup_{t∈R} ‖r(t)‖_2,   where ‖r(t)‖_2 = ( ∑_{i=1}^{n} (r_i(t))^2 )^{1/2},   r ∈ S_c.
• The 1-, 2- and ∞-norms for signals belonging to the space S_d of R^n-valued discrete sequences r : Z → R^n are
‖r‖_1 := ∑_{j∈Z} ∑_{i=1}^{n} |r_i(j)|,   r ∈ S_d,
‖r‖_2 := ( ∑_{j∈Z} ∑_{i=1}^{n} (r_i(j))^2 )^{1/2},   r ∈ S_d,
‖r‖_∞ := sup_{j∈Z} ‖r(j)‖_2,   where ‖r(j)‖_2 = ( ∑_{i=1}^{n} (r_i(j))^2 )^{1/2},   r ∈ S_d.
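For a sequence with only finitely many non-zero samples, the sums and supremum above are finite and can be computed directly. A minimal sketch with an assumed R^2-valued sequence:

import numpy as np

# r(j) in R^2, non-zero only for j = 0, ..., 4 (rows are the samples r(0), ..., r(4))
r = np.array([[1.0, 0.0],
              [0.5, -2.0],
              [0.0, 1.0],
              [3.0, 0.5],
              [-1.0, 1.0]])

norm_1 = np.sum(np.abs(r))                          # sum over j and i of |r_i(j)|
norm_2 = np.sqrt(np.sum(r ** 2))                    # square root of the sum of squares
norm_inf = np.max(np.linalg.norm(r, axis=1))        # sup over j of the Euclidean norm of r(j)
print(norm_1, norm_2, norm_inf)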
• Recall that matrices too form a vector space, so we now introduce norms on matrices. Let A ∈ C^{m×n}. The 1-, 2- and ∞-norms are defined as
‖A‖_1 = max_{1≤j≤n} ∑_{i=1}^{m} |a_{ij}|   (maximum column sum),
‖A‖_2 = ( λ_max(A*A) )^{1/2} = σ_max(A)   (maximum singular value of A),
‖A‖_∞ = max_{1≤i≤m} ∑_{j=1}^{n} |a_{ij}|   (maximum row sum),
and the Frobenius norm is
‖A‖_F = ( Trace(A*A) )^{1/2} = ( ∑_{i=1}^{m} ∑_{j=1}^{n} |a_{ij}|^2 )^{1/2}.
• We are all aware of matrix multiplication. Say y = Ax, where we regard x as an input and y as an output. The norms on these input/output vectors induce a norm on the matrix, called the induced norm. The matrix norm induced by a vector p-norm is thus defined as
‖A‖_p := sup_{x≠0} ‖Ax‖_p / ‖x‖_p.
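All of these matrix norms are available in numpy, and the induced-norm definition can be probed empirically by sampling non-zero vectors; the matrix below is an assumption chosen for illustration.

import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.0,  4.0, 1.0]])

print(np.linalg.norm(A, 1))        # maximum column sum
print(np.linalg.norm(A, 2))        # maximum singular value
print(np.linalg.norm(A, np.inf))   # maximum row sum
print(np.linalg.norm(A, 'fro'))    # Frobenius norm

# The induced 2-norm as a supremum of ||Ax|| / ||x|| over random non-zero x
rng = np.random.default_rng(0)
ratios = []
for _ in range(10000):
    x = rng.standard_normal(3)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
print(max(ratios))                 # approaches, but never exceeds, np.linalg.norm(A, 2)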
2. Inner products
High school physics teaches us the notion of a dot (scalar) product between vectors in R^3. The dot product ushers in the notion of an angle between vectors, and thereby, orthogonality. We now introduce this in a more general way and call the resulting structure an inner product. An inner product on a vector space relates two vectors to a scalar. More precisely, an inner product on a real (or complex) vector space V is a scalar-valued map (·, ·) : V × V → F (= R or C) which satisfies
– (x, x) ≥ 0 for all x ∈ V, with (x, x) = 0 if and only if x = 0;
– (x, y) = (y, x)* for all x, y ∈ V, where * denotes the complex conjugate;
– (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ V and α, β ∈ F.
In an inner product space, the norm is induced by the inner product and we have
‖x‖ := ( (x, x) )^{1/2}.
We have already encountered some examples:
– On R^n, a valid inner product is
(x, y) := ∑_{i=1}^{n} ξ_i η_i,   where x := (ξ_1, . . . , ξ_n) and y := (η_1, . . . , η_n).
The norm induced by this inner product is the Euclidean norm
‖x‖ = ( ∑_{i=1}^{n} ξ_i^2 )^{1/2}.
– Consider the space C^0[a, b]. A valid inner product is
(x, y) := ∫_a^b x(s) y(s) ds.
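A brief sketch checking the two inner products above and the norms they induce; the vectors and functions are assumed for illustration, and the integral is approximated by the trapezoidal rule.

import numpy as np

# Standard inner product on R^n and the norm it induces
x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, -1.0])
print(np.dot(x, y))                                   # (x, y)
print(np.sqrt(np.dot(x, x)), np.linalg.norm(x))       # induced norm = Euclidean norm

# Inner product on C^0[0, 1] for x(s) = s and y(s) = s^2, by trapezoidal quadrature
s = np.linspace(0.0, 1.0, 10001)
prod = s * s ** 2
print(np.sum((prod[1:] + prod[:-1]) * np.diff(s)) / 2.0)   # the exact value is 1/4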
2.1. Approximation problems in engineering. Example 1: Recall a problem often encountered in experimental laboratory reports. From the experimental readings of two variables - say pressure and temperature - at ten different conditions
(P_1, T_1), . . . , (P_{10}, T_{10}),
we were required to find the "best" straight-line approximation between the pressure P and the temperature T. So how did we proceed? Supposing the equation of the straight line is P = mT + K, where m is the slope and K is a constant, we performed the minimization
min_{m,K} ∑_{i=1}^{10} [P_i − (mT_i + K)]^2.
Such a minimization, using the sum of the squares of the errors, is called the least squares problem. Where does all this fit into our discussion of norms and inner product spaces? Look at the space R^{10} (why 10?) and consider the vectors
P := (P_1, . . . , P_{10}),   T := (T_1, . . . , T_{10})   and   G := (1, . . . , 1).
Let us take the standard inner product on R^{10} and the norm induced by it. Our minimization problem can now be restated as
min_{m,K} ‖P − (mT + KG)‖^2.
So what are we doing? Consider the subspace
S := span{T, G}.
The vector mT + KG, with m and K scalars, represents an arbitrary vector in S. By the minimization, we are trying to find that vector in the subspace which is closest to P. How is this "closeness" measured? In terms of the norm of the error e := P − (mT + KG). When is this
norm the smallest? It is smallest when the error vector e is orthogonal to the subspace S. Mathematically this means
(e_opt, S) = 0,
where "opt" stands for optimal (in this case the minimizer). This implies
(e_opt, T) = 0   and   (e_opt, G) = 0.
So we have
(P − (m_opt T + K_opt G), T) = 0   and   (P − (m_opt T + K_opt G), G) = 0,
which implies
m_opt ‖T‖^2 + K_opt (G, T) = (P, T)   and   m_opt (T, G) + K_opt ‖G‖^2 = (P, G).
These are two simultaneous linear equations for the scalars m_opt and K_opt, which can be solved to obtain their values.
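A sketch of this computation (the pressure-temperature readings below are synthetic, generated about an assumed line P = 2T + 5 purely for illustration): it assembles and solves the two normal equations above, and cross-checks the answer against numpy's generic least-squares routine.

import numpy as np

rng = np.random.default_rng(1)
T = np.linspace(300.0, 390.0, 10)                 # ten assumed temperature readings
P = 2.0 * T + 5.0 + rng.normal(0.0, 1.0, 10)      # noisy pressure readings about P = 2T + 5
G = np.ones(10)

# Normal equations: m ||T||^2 + K (G, T) = (P, T),  m (T, G) + K ||G||^2 = (P, G)
M = np.array([[T @ T, G @ T],
              [T @ G, G @ G]])
b = np.array([P @ T, P @ G])
m_opt, K_opt = np.linalg.solve(M, b)
print(m_opt, K_opt)

# The same answer from the generic least-squares routine: min ||P - [T G][m K]^T||
sol, *_ = np.linalg.lstsq(np.column_stack([T, G]), P, rcond=None)
print(sol)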
Example 2: On the space C^0[0, 2π], let us consider the inner product
(x, y) := ∫_0^{2π} x(s) y(s) ds.
The functions
{sin(t), cos(t), sin(2t), cos(2t), sin(3t), cos(3t), . . .},
together with the constant function, form a basis for this infinite-dimensional space. Let us consider a finite
set of elements from this basis,
M := {sin(t), cos(t), sin(2t), cos(2t), . . . , sin(mt), cos(mt)},
and let us look at the problem of approximating any function f(·) in C^0[0, 2π] by a linear combination of the elements of M such that the norm of the error
e := f(t) − ∑_{i=1}^{m} ( a_i sin(it) + b_i cos(it) )
is minimized. Mathematically, we have
min_{a_i, b_i} ‖ f(t) − ∑_{i=1}^{m} ( a_i sin(it) + b_i cos(it) ) ‖^2.
The solution procedure is the same as in the previous example and is based on the fact that the optimal error is orthogonal to the subspace span{M}.
To obtain the a_i's and b_i's we solve the equations
( f(t) − ∑_{i=1}^{m} ( a_i sin(it) + b_i cos(it) ), sin(t) ) = 0,
( f(t) − ∑_{i=1}^{m} ( a_i sin(it) + b_i cos(it) ), cos(t) ) = 0,
⋮
( f(t) − ∑_{i=1}^{m} ( a_i sin(it) + b_i cos(it) ), sin(mt) ) = 0,
( f(t) − ∑_{i=1}^{m} ( a_i sin(it) + b_i cos(it) ), cos(mt) ) = 0.
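Since the elements of M are mutually orthogonal under this inner product, the Gram matrix is (up to quadrature error) π times the identity and the equations above decouple; the sketch below nevertheless solves the full system so that it mirrors those equations directly. The target function f(t) = t(2π − t), the grid size and the trapezoidal quadrature are assumptions for illustration.

import numpy as np

m = 5
t = np.linspace(0.0, 2.0 * np.pi, 20001)
f = t * (2.0 * np.pi - t)                      # assumed target function, for illustration only

def ip(u, v):
    # (u, v) = integral over [0, 2*pi] of u(t) v(t) dt, by the trapezoidal rule
    w = u * v
    return float(np.sum((w[1:] + w[:-1]) * np.diff(t)) / 2.0)

# The elements of M, in the order sin(t), cos(t), ..., sin(mt), cos(mt)
basis = []
for i in range(1, m + 1):
    basis.append(np.sin(i * t))
    basis.append(np.cos(i * t))

# The orthogonality conditions in matrix form: (Gram matrix) * coefficients = inner products with f
Gram = np.array([[ip(u, v) for v in basis] for u in basis])
rhs = np.array([ip(f, u) for u in basis])
coeffs = np.linalg.solve(Gram, rhs)
print(coeffs)   # a_1, b_1, ..., a_m, b_m; here Gram is (nearly) pi times the identity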
Question 1 (The Pythagorean theorem)
This 2500-year-old result is known to all of us. In R^n - with the standard Euclidean norm induced by the inner product - prove that if u and v are orthogonal, then
‖u + v‖^2 = ‖u‖^2 + ‖v‖^2.
Question 2 (The parallelogram law)
We are familiar with this from physics. If u and v are vectors in an inner-product space, show that
‖u + v‖^2 + ‖u − v‖^2 = 2( ‖u‖^2 + ‖v‖^2 ).
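These two identities are meant to be proved by the reader; as a sanity check, the sketch below verifies them numerically in R^3 for an assumed orthogonal pair u, v.

import numpy as np

u = np.array([1.0, 2.0, -2.0])
v = np.array([2.0, 1.0, 2.0])        # chosen so that (u, v) = 2 + 2 - 4 = 0

sq = lambda w: np.dot(w, w)          # squared Euclidean norm
print(np.isclose(sq(u + v), sq(u) + sq(v)))                    # Pythagorean theorem
print(np.isclose(sq(u + v) + sq(u - v), 2 * (sq(u) + sq(v))))  # parallelogram law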
Question 3 (Orthonormal basis)
We are familiar with the notion of a basis. In an inner-product space, a basis satisfying (e_j, e_k) = δ_{jk} for any two basis vectors e_j and e_k is called an orthonormal basis. Suppose (e_1, . . . , e_n) is an orthonormal basis for an inner-product space V. Show that for every v ∈ V,
v = (v, e_1) e_1 + (v, e_2) e_2 + · · · + (v, e_n) e_n
and
‖v‖^2 = ∑_{i=1}^{n} |(v, e_i)|^2.
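A numerical illustration of the two identities in Question 3, using an orthonormal basis of R^3 obtained from a QR factorization of an assumed random matrix:

import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # columns of Q form an orthonormal basis
e = [Q[:, i] for i in range(3)]

v = np.array([1.0, -2.0, 0.5])
expansion = sum(np.dot(v, ei) * ei for ei in e)
print(np.allclose(expansion, v))                                       # v = sum of (v, e_i) e_i
print(np.isclose(sum(np.dot(v, ei) ** 2 for ei in e), np.dot(v, v)))   # ||v||^2 = sum of |(v, e_i)|^2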
Question 4 (Orthogonal complement)
We are familiar with the complement of a subspace. Now let us see what the orthogonal complement of a subspace is. Given a subspace U of an inner-product space V, its orthogonal complement, denoted by U^⊥, is
U^⊥ = {v ∈ V : (v, u) = 0 for all u ∈ U}.
Show that V = U ⊕ U^⊥.
Question 5 (Approximation of a function)
We are all familiar with the Taylor series. To approximate sin x up to 5th order with the Taylor series we wrote
sin x ≈ x − x^3/3! + x^5/5!.
Now let us approximate sin x in a different way. Consider the inner-product space C^0[−π, π] with the inner product
(z, y) := ∫_{−π}^{π} z(s) y(s) ds,
and define the set S := {1, x, x^2, x^3, x^4, x^5}. We wish to approximate sin x as
∑_{i=0}^{5} c_i x^i.
Using a computer, find the c_i's. Also comment on which approximation is better - the Taylor series or this one?
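A sketch of the computation requested above, with the integrals approximated by the trapezoidal rule on an assumed grid: it assembles the Gram matrix of S under the given inner product, solves for the c_i, and compares the worst-case errors of the two approximations on [−π, π].

import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
f = np.sin(x)

def ip(u, v):
    # (z, y) = integral over [-pi, pi] of z(s) y(s) ds, by the trapezoidal rule
    w = u * v
    return float(np.sum((w[1:] + w[:-1]) * np.diff(x)) / 2.0)

basis = [x ** i for i in range(6)]             # S = {1, x, ..., x^5}
Gram = np.array([[ip(u, v) for v in basis] for u in basis])
rhs = np.array([ip(f, u) for u in basis])
c = np.linalg.solve(Gram, rhs)
print(c)                                       # the coefficients c_0, ..., c_5

ls_approx = sum(ci * x ** i for i, ci in enumerate(c))
taylor = x - x ** 3 / 6.0 + x ** 5 / 120.0
print(np.max(np.abs(f - ls_approx)))           # worst-case error of the least-squares polynomial
print(np.max(np.abs(f - taylor)))              # worst-case error of the Taylor polynomial (about 0.52)

One typically finds that the least-squares polynomial has a much smaller worst-case error over the whole interval, while the Taylor polynomial is more accurate in a small neighbourhood of 0.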
Bibliography