Mathematical Structures for Systems and Control

Ravi Banavar
Debasish Chatterjee

Systems & Control Engineering, IIT-Bombay, Powai, Mumbai – 400 076, India, Phone: +91-22-2576-7879, url: http://www.sc.iitb.ac.in/ banavar
E-mail address: [email protected]

Systems & Control Engineering, IIT-Bombay, Powai, Mumbai – 400 076, India, Phone: +91-22-2576-7879, url: http://www.sc.iitb.ac.in/ chatterjee
E-mail address: [email protected]

2010 Mathematics Subject Classification. Primary
Key words and phrases. groups, rings, fields, vector spaces, . . .

Abstract. This is the complete set of lecture notes for the course Mathematical Structures for Systems & Control, offered at Systems & Control Engineering, IIT Bombay.

Dedication text.

Contents

Preface
Introduction
Notation
A Prelude
Chapter 1. Groups, fields and vector spaces
1. Coordinate systems
2. Groups
3. Matrix groups
4. Rigid body motion and the groups SO(3) and SE(3)
5. Three results on rotations
6. Skew-symmetric matrices, the cross product and rotations
7. Euler angle parameterization of rotations
8. Interesting facts about rotations
9. More on the rigid body transformation group SE(3)
10. Rings and Fields
11. Ring
12. Field
13. Vector Spaces
14. Basis
15. Linear functionals and dual basis
16. Annihilator
17. Direct sum
18. Multilinear functionals (tensors)
19. Linear transformations
20. Matrix representations, the adjoint and similarity transformations
21. Eigenvalues
22. Multilinear forms and the determinant
Chapter 2. Additional structures in vector spaces
1. Norms
2. Inner products

Preface

Text.

– Ravi Banavar and Debasish Chatterjee

Introduction

Notation

Symbol — Intended meaning
R — the set of real numbers ]−∞, +∞[
N — the set of positive integers {1, 2, . . .}
Z — the set of all integers {. . . , −2, −1, 0, 1, 2, . . .}
N0 — the set of non-negative integers {0, 1, 2, . . .}
C — the set of complex numbers
i — +√−1, the imaginary unit
Re(z) — real part of z ∈ C
Im(z) — imaginary part of z ∈ C
|z| — absolute value of z ∈ C
z̄ — complex conjugate of z ∈ C
F̄ — topological closure of a set F
In — the n × n identity matrix
‖·‖ — norm
⟨·, ·⟩ — inner product
AT — transpose of a matrix A
A∗ — adjoint of an operator A
tr(A) — trace of a square matrix A
F(·) — Fourier transform
A := B — A is defined by B
A =: B — B is defined by A
A ≅ B — A is isomorphic to B

A Prelude

In the study of many engineering disciplines, one encounters differential equations, matrix equations, eigenvalues and eigenvectors, functions and polynomials. The last two centuries have witnessed the birth and evolution of a sound mathematical framework for studying these notions. Comprehending the basic mathematical structures on which these entities rest greatly enhances our appreciation of their usage and provides more insight into the role they play in a problem. These notes are an attempt in this direction. The examples included are mainly from the fields of mechanics, control systems and signals.

The first three chapters of these notes have been constructed from liberal borrowings from the following classic textbooks:
• Finite Dimensional Vector Spaces - P. Halmos, Springer, 84
• Ordinary Differential Equations - V. Arnold, Springer, 92

Chapter 1
Groups, fields and vector spaces

In a broad sense, the study of groups is the study of symmetries or invariants. For instance, what are the objects (transformations) that render the length of a vector in Euclidean space invariant?
They are translations and rotations. Similarly, in a rigid body, • the distance between any two points, as well as • the orientation of a coordinate frame fixed to the rigid body, remain invariant under rotations and translations. Energy conservation in the physical world implies invariance of physical laws with time. The kinetic energy of a rigid body is invariant with respect to translations and rotations of the frame with respect to which it is measured. To name a few areas, the theory of groups finds applications in • Rigid body dynamics and control - robotics, satellite and aerospace dynamics, • Computer graphics, • Cryptography. 1. Coordinate systems 1.1. Rotational transformations. Consider a plane as shown in figure 1 and a Cartesian frame of reference. From high- school understanding, we assign coordinates to any point p as (xp , yp ) where xp and (yp ) indicate the 5 6 1. Groups, fields and vector spaces Y Y 0 p yp X0 0 yp 0 α xp o |{z} xp X Figure 1. Rotation of frames component of a segment drawn from the origin to the point p along the X-axis and Y-axis respectively. In problems, the choice of a coordinate frame is at the discretion of the user. Let us now choose a different Cartesian frame of reference, with the origin at the same point and oriented at an angle α (taken counter-clockwise) from the previous one. The coordinates of the point p in the new frame, denoted by (x1p , yp1 ), are given by 1 xp cos α sin α xp = (1) . yp1 − sin α cos α yp The matrix cos α sin α − sin α cos α that relates the coordinates of the point p between two frames is termed a transformation matrix. More specifically, it is termed a rotational transformation matrix (or a rotation matrix) in two-dimensional space. Now consider a third frame of reference that is oriented at an angle of β from the second one. The new coordinates (x2p , yp2 )are given by 2 1 xp xp cos β sin β = yp2 − sin β cos β yp1 (2) = cos β sin β − sin β cos β cos α sin α − sin α cos α xp yp Notice that the product of two rotation matrices cos β sin β cos α sin α cos(β + α) sin(β + α) (3) = , − sin β cos β − sin α cos α − sin(β + α) cos(β + α) 1. Coordinate systems 7 Y0 p X0 Y α o o X Figure 2. Affine transformations is a rotation matrix, and 2 xp cos(β + α) sin(β + α) xp (4) = . − sin(β + α) cos(β + α) yp yp2 1.2. Affine transformations. From pure rotations of the coordinate frame examined in the previous section, we now examine a transformation that involves a rotation followed by a translation of the origin of the new coordinate frame. See figure 4. Re-examining the first transformation that we studied, with the additional assumption that the origin translates a and b units alone the x2 and y 2 axes respectively, the new coordinates (x1p , yp1 )are given by (5) x1p yp1 = cos α sin α − sin α cos α xp yp + a b . To enable matrix multiplication to express the new coordinates of successive changes of coordinate frames, we adopt a slightly altered notation. We append a 1 to the coordinates of the point p and express this as a 3 × 1 column vector as xp yp . 1 8 1. Groups, fields and vector spaces The transformation from one coordinate system expressed in terms of a 3 × 3 matrix as 1 xp cos α sin α a yp1 = − sin α cos α b (6) 0 0 1 1 to the other can now be xp yp . 1 Note that the left-top side 2×2 matrix denotes a rotation and the right 2×1 column vector denotes the translation. 
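The bookkeeping in equation (6) is easy to check numerically. Below is a minimal sketch (in Python with NumPy; the notes themselves use no code, and the function name frame_change is our own) that builds the 3 × 3 matrix of (6) and verifies that the composition of two such coordinate changes is again a matrix of the same form, i.e. a rotation block together with a translation column.

```python
import numpy as np

def frame_change(alpha, a, b):
    """3x3 homogeneous matrix of eq. (6): old-frame coordinates are mapped to
    coordinates in a frame rotated by alpha, with the translation entering as
    the column (a, b)."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[ c,  s, a],
                     [-s,  c, b],
                     [0.0, 0.0, 1.0]])

# point p in the original frame, written in homogeneous coordinates
p = np.array([2.0, 1.0, 1.0])

T1 = frame_change(np.pi / 6, 0.5, -1.0)   # first change of frame
T2 = frame_change(np.pi / 4, 1.0,  2.0)   # second change of frame

# successive changes of coordinates are a single matrix product
p2 = T2 @ (T1 @ p)
assert np.allclose(p2, (T2 @ T1) @ p)

# the composite is again of the same form: orthogonal block + translation column
T = T2 @ T1
assert np.allclose(T[:2, :2] @ T[:2, :2].T, np.eye(2))   # rotation block
assert np.allclose(T[2], [0.0, 0.0, 1.0])                # last row preserved
```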
This set of matrices that denotes both translation and rotation of a coordinate frame is identified with the two-tuple R p R ∈ SO(2), p ∈ R2 and is denoted by SE(2) (special-Euclidean in 2-dimensions). It has the following properties: The two examples presented, motivate us to define a mathematical object that plays an important role in the study of dynamical systems. 2. Groups Consider a set S and define a binary operation, denoted by the symbol +. The operation + between any two objects a and b of the set is denoted as a + b, and yields an element which belongs to the set S (this is called the closure property.) With this basic structure we impose certain requirements on the binary operation to define a group. A group is a set G with a binary operation + that satisfies the following properties. • For all x, y ∈ G, x + y ∈ G (Closure) and (x + y) + z = x + (y + z) (Associativity.) • There exists a unique 0 ∈ G such that x + 0 = 0 + x = x for every x ∈ G (Existence of the identity element.) • For every x ∈ G there exists a unique x−1 ∈ G such that x + x−1 = 0. (Existence of the inverse.) 1. Which of the following structures qualifies to be termed a group ? (1) Z with the conventional addition operation ? (2) Z with the conventional multiplication operation ? (Note: from henceforth, if not mentioned, the words ”addition” and ”multiplication” will denote the conventional addition and multiplication operations in the reals, respectively.) (3) R with the addition operation ? with the multiplication operation ? 3. Matrix groups 9 (4) The set of all polynomials (with real coefficients) of degree ≤ n with the addition operation ? (note: addition of two polynomials implies adding coefficients of terms with identical indices.) (5) The set of all rotational transformations that relate coordinates of points on a rigid body undergoing pure rotation from a body-fixed frame to an earth-fixed frame in three-dimensional space ? (6) The set of all transformations that relate coordinates of points on a rigid body undergoing general motion from a body-fixed frame to an earth-fixed frame in three-dimensional space ? (7) The set A = {ea : a ∈ R+ } with the multiplication operation ? (8) The set A = {ea : a ∈ R} with the multiplication operation ? A commutative group satisfies • For all x, y ∈ G, x + y = y + x (Commutativity.) Which of the sets amongst those in the previous question (2) qualifies to be termed a commutative group ? 3. Matrix groups Groups, whose elements are matrices, are called matrix groups. Matrix groups form an object of study by themselves. Here we state a few frequently encountered matrix groups. • GL(n, R), the general linear group, is the set of n × n nonsingular matrices with real entries with the binat operation being the usual multiplication. • O(n), the orthogonal group of order n, is the subset of GL(n, R) with the additional property that R ∈ SO(n) ⇒ RRT = I. • SO(n), the special orthogonal group of order n, is the subset of O(n) with the additional property that R ∈ SO(n) ⇒ det(R) = 1. • The symplectic group Sp(2n, R) consists of 2n × 2n matrices with real entries that satisfy AT JA = J, 10 1. Groups, fields and vector spaces J= where 0 In .T hebinaryoperation, onceagain, istheconventionalmatrixmultiplication. −In 0 • , the general linear group, is the set of n × n non-singular matrices with complex entries, with the binary operation being conventional matrix multiplication. 
• U (n, C), the unitary group of order n is the subset of that satisfies < Ax, Ay >=< x, y >, or |det(A)| = 1. • SU (n, C), the special unitary group of order n is the subset of U (n, C) with the additional property that A ∈ SU (n) ⇒ det(A) = 1. 4. Rigid body motion and the groups - SO(3) and SE(3) Rigid body motion is characterized by two properties • The distance between any two points in the body remains invariant • The orientation of the body is preserved. (A right-handed coordinate system remains right-handed) Two groups which are of particular interest to us in the context of rigid body motion are SO(3) - the special orthogonal group that represents rotations and SE(3) - the special Euclidean group that represents general rigid body motions (both rotations and translations.) • Elements of SO(3) are 3 × 3 real matrices and satisfy RT R = I with det(R) = 1. • An element of SE(3) is of the form (p, R) where p ∈ R3 and R ∈ SO(3). The two tuple (p, R) is represented as R p R ∈ SO(3), p ∈ R3 0 1 and the group operation is the usual matrix multiplication. Rigid body motions are usually described using two frames of reference (see figure 3.) One is called the body frame that remains fixed to the body and the other is the inertial frame that remains fixed in inertial space. For a body undergoing pure rotation, and with the origins of the two frames coinciding, a rotation matrix R ∈ SO(3) maps the initial coordinates of any fixed point p in the body to its final coordinates after the rotation. 5. Three results on rotations 11 zb qb B yb qa za pab xb ya A gab xa Figure 3. Rigid body motion Similarly, coordinate changes in a motion that comprises of both rotation and translation, are given by a matrix of the form R p R ∈ SO(3), p ∈ R3 0 1 where the notation has been touched upon before. 5. Three results on rotations Every A ∈ SO(3) has an eigen value equal to 1. Proof. Recall that λ ∈ C is an eigen value of A if there exist a non-zero vector x (eigen vector) such that Ax = λx. Given AT A = I, we have (Ax)∗ (Ax) = (λx)∗ (λx) ⇒ x∗ AT Ax = |λ|2 x∗ x ⇒ x∗ x = |λ|2 x∗ x Along with the fact that det(A) = 1 = λ1 · λ2 · λ3 , this gives three possible options for the eigen value of A - (1, −1, −1), (1, 1, 1) or (1, α + iβ, α − iβ). Claim 5.1. The rotation group SO(2) can be identified with S 1 (the unit circle). Proof. Now S 1 = {x ∈ R2 : kxk = 1} Parametrize the elements of S 1 in terms of θ ∈ [0, 2π]. For each θ ∈ [0, 2π], the counter-clockwise rotation of the vectors {(1, 0), (0, 1)} in R2 (these form a basis) by the angle θ (1, 0) →(cos θ sin θ) (0, 1) →(− sin θ cos θ) 12 1. Groups, fields and vector spaces is given by the matrix Rθ = cos θ − sin θ sin θ cos θ which is an element of SO(2). Conversely, take an element of SO(2) of the form a1 a2 R= a3 a4 Then from the properties of an element of SO(2), we have a1 a4 − a2 a3 = 1; a21 + a23 = 1; a22 + a24 = 1; a1 a2 + a3 a4 = 0 It is possible to find a θ ∈ [0, 2π] such that that R can be represented in the form Rθ . A note: For those not familiar with the notion of a matrix representation of a linear transformation, we shall study this concept at a later stage, but to interpret the next theorem, here is a brief explanation. Suppose A : R3 → R3 is a linear transformation. Then its matrix representation in the basis {ξ1 , ξ2 , ξ3 } is the set of 9 scalars {αij : i, j = 1, . . . 
, 3} defined by the relations

Aξ1 = Σ_{j=1}^{3} αj1 ξj ,   Aξ2 = Σ_{j=1}^{3} αj2 ξj ,   Aξ3 = Σ_{j=1}^{3} αj3 ξj ,

and represented as the array

(7)   [ α11 α12 α13
        α21 α22 α23
        α31 α32 α33 ] .

(Euler's theorem) Every A ∈ SO(3) is a rotation through an angle θ ∈ S1 about an axis w ∈ R3.

Proof. Since 1 is an eigenvalue of A, we have Aw = w, where w ∈ R3 is a corresponding eigenvector. Choose two vectors e1 and e2 that are orthogonal to each other as well as to w, so that

⟨w, e1⟩ = 0,   ⟨w, e2⟩ = 0,   ⟨e1, e2⟩ = 0.

The two vectors {e1, e2} lie in the plane perpendicular to w and it follows that {w, e1, e2} form a basis for R3. Since A is orthogonal,

(8)   0 = ⟨w, e1⟩ = ⟨Aw, Ae1⟩ = ⟨w, Ae1⟩
      0 = ⟨w, e2⟩ = ⟨Aw, Ae2⟩ = ⟨w, Ae2⟩ ,

and the matrix representation of A in this basis, computed from

(9)   Aw = w,   Ae1 = a1 e1 + a2 e2,   Ae2 = a3 e1 + a4 e2,

is of the form

      [ 1  0  0
        0  a1 a3
        0  a2 a4 ] .

Now

      [ a1 a3
        a2 a4 ]

is an element of SO(2) (why?) and hence there exists a θ ∈ [0, 2π] such that

      [ a1 a3 ]   [ cos θ  −sin θ ]
      [ a2 a4 ] = [ sin θ   cos θ ] .

It follows that A is a rotation about w through the angle θ.

6. Skew-symmetric matrices, the cross product and rotations

Consider a rigid body undergoing pure rotation about a fixed axis in inertial space with a constant angular velocity. We shall denote this angular velocity vector as →ω. Let →rq denote the position vector of a point q in the rigid body from a point on the axis of rotation. Then, from elementary physics,

→vq = d→rq/dt = →ω × →rq .

[Figure 4. A rotating rigid body]

Expressing vectors in a coordinate frame fixed to the earth as →ω = (ω1, ω2, ω3)T and →rq(t) = (q1(t), q2(t), q3(t))T, the above vectorial equation takes the coordinate form

(10)   d/dt (q1(t), q2(t), q3(t))T = (ω1, ω2, ω3)T × (q1(t), q2(t), q3(t))T ,

where × denotes the cross product of vectors in R3, defined as follows.

Property 6.1. The cross product between two vectors a = (a1, a2, a3)T and b = (b1, b2, b3)T in R3 is defined as

a × b = [ a2 b3 − a3 b2
          a3 b1 − a1 b3
          a1 b2 − a2 b1 ] .

Property 6.2. The operation of the cross product can alternatively be represented as multiplication by a skew-symmetric matrix:

b → a × b = â b   where   â = [  0   −a3   a2
                                 a3    0   −a1
                                −a2   a1    0 ] .

Equation (10) now takes the form

dq/dt = ω × q = ω̂ q,

which is a set of three time-invariant linear differential equations in the variables q1, q2, q3. More generally, we often encounter equations of the form ẋ = Ax in engineering systems, where A is an n × n matrix of real numbers and x1(t), . . . , xn(t) are n real variables dependent on time. The solution of such a set of differential equations, with the initial condition x0 := (x1(0), . . . , xn(0)), is given by

x(t) = e^{At} x0   ∀ t ≥ 0,

where

e^{A} := I + A + (1/2!) A² + · · ·

and the infinite series converges for all real matrices A. In the current context, the solution is q(t) = e^{ω̂t} q0, where

e^{ω̂} = I + ω̂ + (1/2!) ω̂² + · · ·

is a rotation matrix. Note that the exponential map is also denoted by exp(ω̂).

We now state a few facts about skew-symmetric matrices, the exponential map and rotations. The set of skew-symmetric matrices in R^{3×3}, with the operation [·, ·], termed a bracket and defined as

[X, Y] := XY − YX,

forms a Lie algebra and is denoted so(3).
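The notes contain no code, but the hat map of Property 6.2 and the bracket identity stated just below are easy to check numerically. A minimal sketch in Python with NumPy (our choice of language; the function name hat is ours):

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix a_hat such that hat(a) @ b == np.cross(a, b)."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3,  a2],
                     [ a3, 0.0, -a1],
                     [-a2,  a1, 0.0]])

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(3)

# Property 6.2: the hat map reproduces the cross product
assert np.allclose(hat(a) @ b, np.cross(a, b))

# the matrix bracket [X, Y] = XY - YX corresponds to the cross product
# of the underlying vectors: [a_hat, b_hat] = (a x b)^
bracket = hat(a) @ hat(b) - hat(b) @ hat(a)
assert np.allclose(bracket, hat(np.cross(a, b)))
```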
Using the vector notation in R3, the Lie bracket on so(3) between two elements ω̂1, ω̂2 corresponds to the cross product of the underlying vectors ω1, ω2 ∈ R3:

[ω̂1, ω̂2] = ω̂1 ω̂2 − ω̂2 ω̂1 = (ω1 × ω2)^ .

Consider a rotation about the axis (0, 0, 1) (the z-axis) in the standard basis, where the rotation is parametrized by t as follows:

Rz : R → SO(3),   Rz(t) = [ cos t  −sin t  0
                            sin t   cos t  0
                            0       0      1 ] .

This same rotation can be expressed as the exponential of a skew-symmetric matrix. Take the axis of rotation to be ω = (0, 0, 1), assume unit angular velocity and let the time of rotation be t. Then the rotation achieved is

e^{ω̂t} = I + ω̂t + (1/2!)(ω̂t)² + · · ·   with   ω̂t = [ 0  −t  0
                                                        t   0  0
                                                        0   0  0 ] ,

which turns out to be

[ cos t  −sin t  0
  sin t   cos t  0
  0       0      1 ] = Rz(t).

Given the axis of rotation, the angular velocity and the time of rotation, the exponential map, denoted "exp", gives the actual rotation. Mathematically, the exponential map is a transformation from so(3) to SO(3) given as

exp(ω̂) := I + ω̂ + ω̂²/2! + · · · ∈ SO(3).

Remark: The axis of rotation ω is often normalized such that ‖ω‖ = 1 and the angular velocity vector written as αω, where α is the magnitude of the angular velocity. The exponential map from the Lie algebra so(3) to the group SO(3) is a many-to-one map that is surjective. (A given rotation in SO(3) can be obtained from more than one element of so(3).)

Example: Consider the elements of so(3)

[ 0  −α  0        [ 0  −(α + 2π)  0
  α   0  0    and   α + 2π   0    0
  0   0  0 ]        0        0    0 ] .

Both yield the same value for exp(·).
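These computations can be reproduced numerically. The following minimal sketch (Python with NumPy; a truncated power series stands in for the matrix exponential, which is an implementation choice of ours) checks that exp(ω̂t) equals Rz(t), that the result lies in SO(3), and that angles differing by 2π give the same rotation.

```python
import numpy as np

def hat(w):
    w1, w2, w3 = w
    return np.array([[0.0, -w3,  w2],
                     [ w3, 0.0, -w1],
                     [-w2,  w1, 0.0]])

def expm_series(A, terms=40):
    """Truncated power series I + A + A^2/2! + ... (adequate for small matrices)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

t = 0.7
w = np.array([0.0, 0.0, 1.0])            # rotation axis: the z-axis, unit angular velocity
R = expm_series(hat(w) * t)

Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])

assert np.allclose(R, Rz)                    # exp(w_hat * t) is the rotation Rz(t)
assert np.allclose(R.T @ R, np.eye(3))       # R is orthogonal
assert np.isclose(np.linalg.det(R), 1.0)     # with determinant +1

# many-to-one: adding 2*pi to the angle gives the same rotation
assert np.allclose(expm_series(hat(w) * (t + 2 * np.pi)), R)
```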
7. Euler angle parameterization of rotations

One of the ways of parametrizing rotations in three-dimensional space is through Euler angles. These angles denote three successive rotations about three axes that produce the resultant rotation. Consider the following sequence of three rotations, as shown in figure 5.

• Rotate about the blue Z (denote it by Z0) by an angle α. Denote the rotated frame by X1 Y1 Z1.
• Rotate about the green N (also X1) by an angle β. Denote the rotated frame by X2 Y2 Z2.
• The final rotation is about the red Z (also Z2) by an angle γ. Denote the rotated frame by X3 Y3 Z3.

[Figure 5. ZXZ Euler angle convention. Source: Wikipedia]

Now the coordinates of a point p, expressed as (px3, py3, pz3)T in the third frame, are given in the second frame as

(px2, py2, pz2)T = RZ2(γ) (px3, py3, pz3)T,   where   RZ2(γ) = [ cos γ  −sin γ  0
                                                                 sin γ   cos γ  0
                                                                 0       0      1 ] ,

the coordinates of p, expressed as (px2, py2, pz2)T in the second frame, are given in the first frame as

(px1, py1, pz1)T = RX1(β) (px2, py2, pz2)T,   where   RX1(β) = [ 1  0       0
                                                                 0  cos β  −sin β
                                                                 0  sin β   cos β ] ,

and finally, the coordinates of p, expressed as (px1, py1, pz1)T in the first frame, are given in the initial frame as

(px, py, pz)T = RZ0(α) (px1, py1, pz1)T,   where   RZ0(α) = [ cos α  −sin α  0
                                                              sin α   cos α  0
                                                              0       0      1 ] .

The composite rotation is thus RZ0(α) RX1(β) RZ2(γ), and

(px, py, pz)T = RZ0(α) RX1(β) RZ2(γ) (px3, py3, pz3)T,

(11)   RZ0(α) RX1(β) RZ2(γ) =
[ cos α cos γ − sin α cos β sin γ    −cos α sin γ − sin α cos β cos γ     sin α sin β
  sin α cos γ + cos α cos β sin γ    −sin α sin γ + cos α cos β cos γ    −cos α sin β
  sin β sin γ                          sin β cos γ                         cos β      ] .

The domains of α, β, γ could be taken as α ∈ [0, 2π), β ∈ [0, 2π), γ ∈ [0, 2π).

Question: What are the values of α, β, γ for the rotation matrix

[ a11  a12  0
  a21  a22  0
  0    0    1 ] ?

To give further interpretation to Euler angles and their use in engineering, assume that there exists a mechanical system that imparts angular velocities to a rigid body, and that these angular velocities are specified in terms of the Euler angle rates. Call these rates (α̇, β̇, γ̇). The resultant angular velocity vector is then

→ω = α̇ k̂ + β̇ î1 + γ̇ k̂2 ,

where k̂, î1, k̂2 denote unit vectors along the Z0, X1 and Z2 axes respectively. Expressing all vectors in the coordinates of the initial frame, we have

k̂ → (0, 0, 1)T,
î1 → RZ0(α) (1, 0, 0)T = (cos α, sin α, 0)T,
k̂2 → RZ0(α) RX1(β) (0, 0, 1)T = (sin α sin β, −cos α sin β, cos β)T.

Finally,

(ω1, ω2, ω3)T = A(α, β) (α̇, β̇, γ̇)T,   where   A(α, β) = [ 0   cos α    sin α sin β
                                                            0   sin α   −cos α sin β
                                                            1   0        cos β       ] .

The mapping between (α̇, β̇, γ̇) and ω = (ω1, ω2, ω3) is non-singular only when sin β ≠ 0. What does this imply? When β = 0,

A(α, 0) = [ 0  cos α  0
            0  sin α  0
            1  0      1 ] .

It can be seen that the rank of A is 2. Suppose we wish to achieve a certain angular velocity vector (a, b, c) for the mechanism, and to do this through specifying the Euler angle rates. Then

(12)   a = (cos α) β̇,   b = (sin α) β̇,   c = α̇ + γ̇.

From the above equations we see that a and b cannot be specified arbitrarily and, further, infinitely many combinations of α̇ and γ̇ achieve the same c. In aerospace/mechanical engineering this phenomenon is called gimbal lock. It occurs because the map f from Euler angles to SO(3) (rotations),

f : (α, β, γ) → SO(3),

is not a covering map: it is not a local homeomorphism at every point, and thus at some points the rank of the map must drop below 3; at such points the gimbal lock occurs. (Note: the rank of a map at a point is the rank of its Jacobian at that point.) In this case the Jacobian is A(α, β). Euler angles provide a parametrization of any rotation in three-dimensional space using three numbers, but as seen, this description is not unique, and at some configurations not every angular velocity can be realized through the given set of Euler angle rates. This feature arises from a basic topological fact: there is no covering map from the 3-torus (S1 × S1 × S1) to SO(3); the only (non-trivial) covering map is from the 3-sphere, and this prompts the use of quaternions.

8. Interesting facts about rotations

(This part is optional, but for the interested reader it whets the appetite for the fascinating world of rotations and groups at large.) We now move on to relate four mathematical entities - the 3-dimensional sphere S3, the real projective space in 3 dimensions RP3, the special unitary group SU(2, C) and the special orthogonal group SO(3).

(1) The 3-dimensional sphere S3. We view this sphere as embedded in the 4-dimensional real vector space R4. Thus

S3 = {(x0, x1, x2, x3) ∈ R4 : x0² + x1² + x2² + x3² = 1}.

(2) The real projective space RP3 in 3 dimensions consists of the set of all straight lines passing through the origin in R3, with the origin excluded. An element of RP3 is called an equivalence class, defined as

[ξ] = [ξ1, ξ2, ξ3] = {aξ : a ∈ R − {0}} = {(aξ1, aξ2, aξ3) : a ∈ R − {0}}.

As seen from the above definition, each straight line (with the origin excluded) forms an equivalence class and is considered an element of this set.

(3) The special unitary group SU(2, C).
We shall denote the elements of this set as x0 + ix1 x2 + ix3 (x0 , x1 , x2 , x3 ) ∈ R4 det(·) = x20 +x21 +x22 +x23 = 1 −(x2 − ix3 ) x0 − ix1 (4) The special orthogonal group SO(3): Recall that the special orthogonal group SO(3) indicates a rotation in 3-dimensional space. In terms of an axis η ∈ R3 and an angle of rotation θ ∈ R, taken in a right-handed sense, a rotation by an angle θ about an axis η is the same as that by an angle −θ about an axis −η. Hence the two tuples (η, θ) and (−η, −θ) denote the same element in SO(3). We now proceed to define four mappings between the four entities that we have just introduced. A mapping from S 3 to RP3 : Define two sets on S 3 as M = S 3 −{(0, 0, 0, 1), (0, 0, 0, −1)} and N = S 3 −{(1, 0, 0, 0), (−1, 0, 0, 0)} and consider the mappings fM : M 3 (x0 , x1 , x2 , x3 ) →[x0 , x1 , x2 ] ∈ RP3 We notice that this is a 2-to-1 map with two antipodal (diametrically opposite) points on M mapping to the same element of RP3 . So S 3 3 (−x0 , −x1 , −x2 , −x3 ) →[x0 , x1 , x2 ] ∈ RP3 Similarly fN : N 3 (x0 , x1 , x2 , x3 ) →[x1 , x2 , x3 ] ∈ RP3 The two mapping fM and fN map S 3 to RP3 . A mapping from SU (2, C) to SO(3): Let 0 i SU (2, C) ⊃ M = SU (2, C) − { } i 0 and 1 0 SU (2, C) ⊃ N = SU (2, C) − { } 0 1 S 3 = M ∪N 8. Interesting facts about rotations 21 Consider M3 x0 + ix1 x2 + ix3 −(x2 − ix3 ) x0 − ix1 →((x0 , x1 , x2 ), πx3 ) ∈ SO(3) | {z } The above map is once again a 2-to-1 map since −x0 − ix1 −x2 − ix3 →((−x0 , −x1 , −x2 ), −πx3 ) ∈ SO(3) M3 (x2 − ix3 ) −x0 + ix2 | {z } also yields the same rotation. Similarly x0 + ix1 x2 + ix3 →((x1 , x2 , x3 ), πx0 ) ∈ SO(3) N3 −(x2 − ix3 ) x0 − ix1 | {z } is again a 2-to-1 map. A mapping from S 3 to SU (2, C): Define this as 3 f : S 3 (x0 , x1 , x2 , x3 ) → x0 + ix1 x2 + ix3 −(x2 − ix3 ) x0 − ix1 ∈ SU (2, C) As is easily seen, this is an isomorphism. A mapping from RP3 to SO(3): Define this by mapping the unit ball D = {(x0 , x1 , x2 ) ∈ R3 : k(x0 , x1 , x2 )k ≤ 1} q D 3 (x0 , x1 , x2 ) 6= 0 →((x0 , x1 , x2 ), π (x20 + x21 + x22 )) ∈ SO(3) | {z } D 3 0 → I ∈ SO(3) Antipodal points in D - (x0 , x1 , x2 ) and −(x0 , x1 , x2 ) maps to the same element in SO(3). We now map D to S 3 as follows. q D 3 (x0 , x1 , x2 ) 6= 0 →(x0 , x1 , x2 , + (1 − (x20 + x21 + x22 )) ∈ S 3 with the positive sign in the last element indicating the upper hemisphere of S 3 . Further, an antipodal point in D gets identified with the same point in q D 3 (−x0 , −x1 , −x2 ) 6= 0 →(x0 , x1 , x2 , + (1 − (x20 + x21 + x22 )) ∈ S 3 The points on the boundary of D are mapped to the equator on S 3 as D 3 (x0 , x1 , x2 )(k(x0 , x1 , x2 )k = 1) →(x0 , x1 , x2 , 0) q 3 RP 3 [x0 , x1 , x2 ] →((x0 , x1 , x2 ), π (x20 + x21 + x22 )) ∈ SO(3) | {z } Based on the discussion of this section, we have the diagram 6. 22 1. Groups, fields and vector spaces S3 ∼ = SU (2, C) 2 to 1 RP3 2 to 1 ∼ = SO(3) Figure 6. Diagram for the 4 maps. 8.1. The quaternion algebra. Based on the previous discussion, let us introduce a multiplicative structure on R4 as follows. We identify an element - (x0 , x1 , x2 , x3 ) - in R4 with a 2 × 2 complex matrix x0 + ix1 x2 + ix3 −(x2 − ix3 ) x0 − ix1 and denote this collection of 2×2 matrices as R4 . This is a real vector space, with a basis 1 0 i 0 0 1 0 i 1= , i= , j= , k= 0 1 0 −i −1 0 i 0 Further, if the basis defined is declared as orthonormal, < 1, i >= 0, < 1, k >= 0, · · · and so on, then the norm of an element x ∈ R4 is kxk22 = x20 + x21 + x22 + x23 which is identical to the norm of the element (x0 , x1 , x2 , x3 ) in R4 . 
This is termed an isometry. Under matrix multiplication, we have

i ∗ i = j ∗ j = k ∗ k = −1,
i ∗ j = −j ∗ i = k,   j ∗ k = −k ∗ j = i,   k ∗ i = −i ∗ k = j,

with 1 as the multiplicative identity. The space we have constructed is a 4-dimensional real vector space and is called the algebra of quaternions. From the discussion in the previous section, SU(2, C), which was viewed as the sphere S3 embedded in R4, is the set of unit quaternions.

9. More on the rigid body transformation group SE(3)

The set of matrices of the form

ξ̂ := [ ω̂  v
        0  0 ]   ω̂ ∈ so(3), v ∈ R3   (⊂ R^{4×4}),

with the bracket operation [·, ·] defined as

[ξ̂1, ξ̂2] = [ ω̂1 ω̂2 − ω̂2 ω̂1    ω̂1 v2 − ω̂2 v1
              0                  0             ] ,

forms a Lie algebra and is denoted se(3). Given the angular velocity ω, the linear velocity v and the time of motion t, define a matrix

ξ̂ := [ ω̂  v
        0  0 ] .

Then the exponential exp(ξ̂t) gives the actual rigid body transformation. Mathematically, the exponential map is a transformation from se(3) to SE(3) given as

exp(ξ̂t) := I + ξ̂t + ξ̂² t²/2! + · · · ∈ SE(3).

The exponential map from the Lie algebra se(3) to the group SE(3) is a many-to-one map that is surjective. (A given rigid body transformation in SE(3) can be obtained from more than one element of se(3).)

10. Rings and Fields

Our next two structures are a ring and a field. Though we shall not discuss these in detail, they are essential in making our way to the important structure of vector spaces, which we shall discuss in much detail.

11. Ring

By defining an additional binary operation on a group, we impose additional structure and obtain a mathematical object termed a ring. A ring is a set R with two binary operations + and × such that
(1) a + (b + c) = (a + b) + c ∈ R (+ is associative)
(2) There exists a unique element 0 ∈ R (called the zero element) such that a + 0 = 0 + a = a
(3) For every element a ∈ R there exists an element −a ∈ R such that a + (−a) = 0
(4) a + b = b + a ∈ R (+ is commutative)
(Note that axioms 1 through 4 make the set a commutative group with respect to the operation +. The × operation satisfies:)
(5) a × (b × c) = (a × b) × c ∈ R (× is associative)
(6) a × (b + c) = (a × b) + (a × c) ∈ R (× is distributive over +)

Which of the following sets qualifies as a ring?
(1) The set Z with + and ×.
(2) The set of rational numbers with + and ×.
(3) The set of all n × n matrices with polynomial entries (denoted Pn×n) with the conventional matrix addition and multiplication.
(4) The set of all minimum phase transfer functions (denoted Gmp) with the usual notions of transfer function addition and multiplication.
(5) The set of all proper transfer functions (denoted Gprop).
(6) The set of all stable transfer functions (denoted Gstb).
(7) The set of all 2 × 2 matrices with elements from Z (denoted Z2×2).

If there exists a unique element 1 ∈ R such that 1 × x = x × 1 = x for all x ∈ R, then the ring R is said to have an identity 1. Consider a ring R with identity. An element x ∈ R is called a unit in R if there exists a y ∈ R such that x × y = y × x = 1. Which elements are the units in the rings Z, Z2×2, Pn×n, Gstb?

A ring which satisfies the additional axiom
• a × b = b × a ∈ R (× is commutative)
is called a commutative ring. Is Z a commutative ring? Is Z2×2?

12. Field

A field F is a commutative ring with identity satisfying the following axioms:
• F contains at least two elements
• Every nonzero element of F is a unit
The notion of an "inverse" with respect to the × operation thus enters in the definition of a field.
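As an illustration of the last axiom, the finite set {0, 1, . . . , p−1} with addition and multiplication taken modulo a prime p is a field. A short computational check (a sketch in Python; the choice p = 7 and the helper name inverse_mod are ours) exhibits a multiplicative inverse for every nonzero element:

```python
# Z_p: integers {0, 1, ..., p-1} with addition and multiplication modulo p.
# For p prime, every nonzero element has a multiplicative inverse, so Z_p is a field.
p = 7

def inverse_mod(x, p):
    """Multiplicative inverse of x in Z_p (p prime), via Fermat's little theorem."""
    return pow(x, p - 2, p)

for x in range(1, p):
    y = inverse_mod(x, p)
    assert (x * y) % p == 1          # every nonzero x is a unit
    print(f"{x} * {y} = 1 (mod {p})")
```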
(1) Consider the set {0, 1, 2} where + and × are the usual addition and multiplication operations. Is this a ring ? Is this a field ? (2) Is the set Z2×2 a field ? 13. Vector Spaces 25 13. Vector Spaces A set of elements V with the binary operation + is said to form a linear vector space over the field F if they satisfy the following axioms for any x, y, z ∈ V • x+y =y+x∈V • (x + y) + z = x + (y + z) ∈ V • There is a unique element 0 in V such that 0 + x = x ∀x ∈ V • For every element x ∈ V there exists an element x−1 ∈ R such that x + (x−1 ) = 0 (Note that these first four conditions make a vector space a commutative group.) Further on, there are a few more conditions, based on the notion of a ”scalar” and ”vector” association defined as follows: • For every α ∈ F and x ∈ V, there exists an element αx ∈ V. • For every α, β ∈ F and x, y ∈ V, – αx + βy ∈ V – α(βx) = (αβ)x – α(x + y) = αx + αy – (α + β)x = αx + βx – 1x = x ∀x ∈ V Remark 1. Which of the following qualifies for a vector space ? • The set R over the field R. • The string of n-tuples of real numbers over the field R. • The set of all real-valued continuous functions over the interval [0, 1] over the field R. • The set of all polynomials of degree n or less in the indeterminate x xn + a1 xn−1 + . . . + an ai ∈ C over the field C. 13.1. Linear independence, span and subspace. A set of elements {x1 , x2 , . . . , xn } (not including the zero element θ) in a vector space V is said to be linearly independent if α1 x1 + α2 x2 + . . . + αn xn = 0 ⇒ α1 = α2 = . . . = αn = 0 Else, the set of elements is linearly dependant. 26 Note 1. Groups, fields and vector spaces more clarity needed here...Any set that includes the zero element....... (1) Consider the real vector space R2 over the field of real numbers. Is the set 1 0 (13) −1 −2 linearly independent ? (2) Let F denote the real, vector space of all continuous functions {f : R → R}. Give an example of a linearly independent set in this vector space. The linear span of a set of elements S = {x1 , . . . , xp } in a vector space V is defined as { p X αi xi : αi ∈ F } 1 and is denoted by span S. A subset M of a vector space V which satisfies x, y ∈ M ⇒ αx + βy ∈ M ∀x, y ∈ M is called a subspace of V. Remark 2. By its definition, a subspace satisfies all properties of a vector space. The maximal number of elements in any linearly independent set of a subspace M is the dimension of M . (1) Let V be the set of all 2 × 2 matrices with real (R) entries considered over the field R. Show that V is a vector space. Prove that V has dimension 4. (2) Let V be the real vector space of all functions from R into R. Which of the following sets of functions are subspaces of V ? (a) all f such that f (x2 ) = f (x)2 (b) all f such that f (0) = f (1) (c) all f which are continuous (3) In question 1, which of the following sets of matrices A in V are subspaces of V ? (a) all invertible A (b) all non-invertible A (c) all A such that A2 = A 14. Basis 27 (4) In question 1, let W1 be the set of matrices of the form x −x (14) y z and let W2 be the set of matrices of the form a b (15) −a c (a) Prove that W1 and W2 are subspaces of V T (b) Find the dimensions of W1 , W2 , W1 + W2 and W1 W2 (5) Find the coordinates (the scalars) of the vector (1, 0, 1) in the basis of C 3 consisting of the vectors {(2i, 1, 0), (2, −1, 1), (0, 1 + i, 1 − i)} (6) Classify each of the following sets as a vector space and/or a group. Specify the binary operation and check all other requirements. 
Rn , Rn×n , O(n), SO(n), S 1 , S 1 × S 1 , SE(3), Z (7) If R ∈ SO(3) and ω ∈ R3 , then show that Rω̂RT = (Rω) If R ∈ SO(3) and v, w ∈ R3 , then show that R(v × w) = (Rv) × (Rw) (9) (8) Is so(3) a vector space ? What is its dimension and give a basis. 14. Basis A set of elements in a vector space is called a basis if it satisfies two properties • The set is linearly independent. • Each element of the vector space belongs to the linear span of the set. Remark 3. A basis for a vector space is not unique. Any two bases for a vector space have the same number of elements. Proof. to be done. Note The number of elements in a basis of a vector space is called the dimension of the vector space. Two vector spaces V1 and V2 over the same field F are said to be isomorphic to each other if there exists a one-to-one and onto correspondence between the elements in each vector space that preserves linearity, i. e. V 1 3 x 1 → y1 ∈ V 2 , V1 3 x2 → y2 ∈ V2 ⇒ αx1 + βx2 → αy1 + βy2 28 Note 1. Groups, fields and vector spaces Examples (1) Prove that if the set {x1 , x2 , . . . , xm } spans a subspace M of a vector space V , so does the set {x1 − x2 , x2 − x3 , . . . , xm } (2) Let U be a subspace of R5 defined by U = {(ξ1 , . . . , ξ5 ) ∈ R5 : ξ1 = 3ξ2 , ξ3 = 7ξ4 } Find a basis for U . (3) Give an example of a nonempty subset U of R2 such that U is closed under scalar multiplication, but U is not a subspace of R2 . (4) Let U1 and U2 be S two subspaces of a vector space V . Under what conditions Is U1 U2 a subspace of V ? Support your answer with arguments. (5) Prove that if U1 and U2 are subspaces of a finite dimensional vector space V , then dim(U1 + U2 ) ≤ dim(U1 ) + dim(U2 ) (6) Consider the vector space R4 and the subspace U = {(ξ1 , 0, ξ3 , 0) : ξ1 , ξ3 ∈ R}. Give two subspaces U1 and U2 such that R4 = U ⊕ U1 R 4 = U + U2 R4 6= U ⊕ U2 15. Linear functionals and dual basis A linear functional F on a vector space V over the field F is a mapping F : V → F that satisfies F(α1 x1 + α2 x2 ) = α1 F(x1 ) + α2 F(x2 ) ∀x1 , x2 ∈ V and ∀α1 , α2 ∈ F Alternate notation for a linear functional is as follows. Let y be a linear functional. Then, its action on an element x in the vector space is denoted by [x, y] and [α1 x1 + α2 x2 , y] = α1 [x1 , y] + α2 [x2 , y] The set of all linear functionals over a vector space V forms a vector space and is called the dual of V and denoted by V0 . The dimension of V0 is the same as V. Elements of the dual space will be denoted by the letter y. Given a vector space V, a basis X = {x1 , . . . , xn } and n-scalars α1 , . . . , αn , there exists a unique linear functional y which satisfies [xi , y] = αi . Proof. Left as an exercise. Given a vector space V and a basis X = {x1 , . . . , xn }, there exists a unique basis {y1 , . . . , yn } with the property [xj , yi ] = δij i, j = 1, . . . , n, 16. Annihilator 29 and is termed the dual basis. Proof. Given the set of scalars (1, 0, . . . , 0), the linear functional y1 is unique. Similarly for y2 , . . . , yn . We now proceed to show that {y1 , . . . , yn } forms a basis for V0 . P Linear independence: Let αi be a set of scalars such that i αi yi = 0. Then X [xj , αi yi ] = 0∀j = 1, . . . , n ⇒ α1 = · · · = αn = 0 i Since all the αi s are zero, this implies linear independence. Every member of V0 can be expressed as a linear combination of {y1 , . . . , yn } : Let y ∈ V0 and [xi , y] = βi . Then X y= βi yi i is the expression for y in terms of the yi s. The dual of Cn (Rn ) is identified with Cn (Rn ) itself. Proof. 
Let y ∈ Cn0 and X = {x1 , . . . , xn } be a basis in Cn . Then there exists {α1 , . . . , αn }(αi ∈ C) that are unique and which satisfy [xi , y] = αi The unique n-tuple (α1P , . . . , αn ) is identified with the linear functional y since for any Cn 3 x = ni=1 βi xi [x, y] = n X αi βi i=1 Alternatively, every n-tuple in Cn corresponds to a linear functional on Cn . 16. Annihilator If S is a subset of a vector space V, then y ∈ V0 is said to annihilate S if [x, y] = 0 ∀x ∈ S. The set of all elements y ∈ V0 with the property that ”y annihilates S ” is called the annihilator of S and is denoted by S o The annihilator S o is always a subspace of V0 . If M is a subspace of dimension m of a finite dimensional vector space V of dimension n, then the dimension of M 0 is n − m. 30 1. Groups, fields and vector spaces Proof. Let {x1 , . . . , xm } form a basis for M and let {x1 , . . . , xm , xm+1 , . . . , xn } form a basis for V. Let {y1 , . . . , ym , ym+1 , . . . , yn } be the corresponding dual basis. We shall show that M o = span{ym+1 , . . . , yn }. o Let property of a basis, y can be expressed as P y ∈ M . From the o , [x , y] = 0 ∀i = 1, . . . , m. This implies α = y = i αi yi . Since y ∈ MP i 1 n o . . . = αm = 0. So y = i=m+1 αi yi ⇒ y ∈ span{ym+1 , . . . , yn } ⇒ M ⊂ span{ym+1 , . . . , yn }. P Let y = ni=m+1 βi yi ∈ span{ym+1 , . . . , yn }. Then for any x ∈ M [x, y] = [ m X i=1 n X xi , βj yj ] = 0 ⇒ y ∈ M o ⇒ span{ym+1 , . . . , yn } ⊂ M o . j=m+1 The result follows from the fact that the dimension of span{ym+1 , . . . , yn } is n − m. 17. Direct sum Two subspaces M and N of a vector space V are said to form a direct sum of V and denoted as V = M ⊕ N if they satisfy the following property • Every x ∈ V can be expressed in the form x = z + s z ∈ M, s ∈ N in a unique way. (1) If V is an n-dimensional vector space over a finite field , and if 0 < m < n then the number of m- dimensional subspaces of V is same as the number of (n − m)-dimensional subspaces. (2) Suppose that x, y, u and v are vectors in C4 ; let M and N be the subspaces of C4 spanned by x,y and u,v respectively. In which of the following cases is it true that C4 = M ⊕ N ? (a) x=(1,1,0,0) , y=(1,0,1,0) u=(0,1,0,1) , v=(0,0,1,1) (b) x=(-1,1,1,0) , y=(0,1,-1,1) u=(1,0,0,0) , v=(0,0,0,1) (c) x=(1,0,0,1) , y=(0,1,1,0) u=(,1,0,1,0) , v=(0,1,0,1) . (3) If M is the subspace consisting of all those vectors (ξ1 , ....ξn , ξn+1 , ....ξ2n ) in C2n for which ξ1 = ... = ξn = 0, and if N is the subspace of all those vectors for which ξj = ξn+j , j=1,...,n, then C2n = M ⊕ N . 18. Multilinear functionals (tensors) 31 (4) Construct three subspaces M, N1 and N2 of a vector space V so that M ⊕ N1 = M ⊕ N2 = V but N1 6= N2 . (Note that this means that there is no cancellation law for direct sums). What is the geometric picture corresponding to this situation ? (5) (a) If U, V and W are vector spaces, what is the relation between U⊕(V ⊕ W) and (U ⊕ V)⊕W (i.e., in what sense is the formulation of direct sums as associative operation) ? (b) In what sense is the formation of direct sums commutative ? (6) Consider the quotient spaces obtained by reducing the spaces P of polynomials modulo various subspaces. If M = Pn , is P/M finite dimensional? What if M is the subspace consisting of all even polynomials divisible by xn (where xn (t) = tn ) ? (7) Prove that each of the corresponding described below is a linear transformation. (a) V is the set C of complex numbers regarded as a real vector space; Ax is complex conjugate of x . 
(b) V is P; if x is a polynomial, then (Ax)(t)=x(t + 1) − x(t) . (8) Prove that if V is a finite-dimensional vector space, then the space of all linear transformations on V is finite-dimensional, and find its dimension . 18. Multilinear functionals (tensors) The notion of a linear functional is easily extended to a function with multiple arguments, wherein each argument belongs to a vector space and the function satisfies linearity with respect to each of these. A k-linear functional Fk on a vector space V over the field F is a mapping Fk : V · · × V} → F that satisfies linearity in each argument. | × ·{z k−times Remark 4. The k-linear functional, as defined above, is also called a covariant k-tensor. If each of the the arguments V in Fk were to be replaced by the dual space V0 (with the linearity property still holding true) as 0 Fk : V · · × V}0 → F, | × ·{z k−times then the object obtained is called a contravariant k-tensor and we denote the k-linear functional by a superscript notation as Fk . In many texts, a 32 1. Groups, fields and vector spaces covariant k-tensor is also denoted as a (0, k)-tensor, while a contravariant k-tensor is denoted as a (k, 0)-tensor. The set of all k-linear functionals on an n-dimensional vector space V (or covariant k-tensors), denoted by Tk , forms a vector space of dimension nk . Proof. (The axioms of a vector space are easily shown.) We move on to show the dimension. We construct a basis for Tk as follows. Let X = {e1 , . . . , en } be a basis for V. We construct nk covariant tensors as follows. The first covariant tensor F 1 . . . 1 is constructed as follows. Define | {z } k−times F1...1 (ψ1 , . . . , ψk ) = 1 for (ψ1 , ψ2 , . . . , ψk−1 , ψk ) = (e1 , e1 , . . . , e1 , e1 ) 0 otherwise The second covariant tensor 1 for (ψ1 , ψ2 , . . . , ψk−1 , ψk ) = (e2 , e1 , . . . , e1 , e1 ) F2...1 (ψ1 , . . . , ψk ) = 0 otherwise and extending this construction, the covariant k-tensor Fi1 ...ik is constructed as 1 for (ψ1 , . . . , ψk ) = (ei1 , . . . , eik ) Fi1 ...ik (ψ1 , . . . , ψk ) = 0 otherwise Now we claim that this constructed set of k-tensors {Fi1 ...ik , 1 ≤ i1 , . . . , ik ≤ n} forms a basis. Property 1: Any covariant k-tensor can be expressed as a linear combination of the elements of this set. Consider an arbitrary covariant k-tensor F which takes values at the basis vectors as F(ei1 , . . . , eik ) = βi1 ...ik . Now for any arbitrary vectors (vi1 , . . . , vik ), X X X X F(vi1 , . . . , vik ) = F( αi1 ei1 , . . . , αik eik ) = ··· αi1 . . . αik βi1 ...ik Fi1 ...ik (ei1 , . . . , eik ) i1 = X i1 ··· X ik βi1 ...ik Fi1 ...ik (vi1 , . . . , vik ) ⇒ F = ik i1 ik X i1 ··· X βi1 ...ik Fi1 ...ik ik Property 2 : The set {Fi1 ...ik , 1 ≤ i1 , . . . , ik ≤ n} is a linearly independent set. Let X X ··· αi1 ...ik Fi1 ...ik = 0 i1 ik 19. Linear transformations 33 Consider the action of the above linear combination on the vector (vi1 , vi2 , . . . , vik−1 , vik ) = (e1 , e1 , . . . , e1 ). We have X X αi1 ...ik Fi1 ...ik (e1 , e1 , . . . , e1 ) = α1...1 = 0 ··· i1 ik Continuing with this procedure with vectors (e2 , e1 , . . . , e1 ), (e3 , e1 , . . . , e1 ), . . . (e1 , e1 , . . . , en ), we show that all the coefficients αi1 ...ik are zero, thus proving linear independence. A k-linear functional that satisfies the property Fk (v1 , . . . , vi , . . . , vj , . . . , vk ) = Fk (v1 , . . . , vj , . . . , vi , . . . , vk ) for all i, j is called a symmetric k-linear functional. A k-linear functional that satisfies the property Fk (v1 , . . . , vi , . . . 
, vj , . . . , vk ) = −Fk (v1 , . . . , vj , . . . , vi , . . . , vk ) for all i, j is called a skew-symmetric k-linear functional. The set of all skew-symmetric k-linear functionals (k ≤ n) on V (or skew-symmetric covariant k-tensors) forms a subspace of dimension n Ck of Tk . Proof. From the skew-symmetry property, we have F(vi1 , . . . , vik ) = 0 whenever two of the vectors in its argument are identical. Moving on the lines of the proof of theorem (18), we construct a set of skew-symmetric covariant k-tensors from the set {Fi1 ...ik , 1 ≤ i1 , . . . , ik ≤ n : Fi1 ...ik (ei1 , . . . , eik ) = 1 and zero for all other arguments}, by eliminating all those elements for which any two vectors are identical. Applying this logic, we have n choices for the first subscript i1 , (n − 1) choices for the second subscript i2 and so on till we have (n − k + 1) choices for the subscript is . The set this obtained contains n × (n − 1) . . . (n − k + 1) elements and is a basis. Details can be worked out. (1) Find the dimension of the subspace of symmetric k-linear functionals on a vector space V. (2) Find the dimension of the subspace of skew-symmetric k-linear functionals on a vector space V. 19. Linear transformations A correspondence A from a vector space V to W V 3 x → Ax ∈ W 34 1. Groups, fields and vector spaces that satisfies A(αx + βy) = αAx + βAy ∀ x, y ∈ V α, β ∈ F is called a linear transformation from V to W. The set of all linear transformations from V to W forms a vector space. Proof: Left as an exercise. What is the dimension of the vector space of all linear transformations from V to W ? Proof. Hint: Consider a basis X = {x1 , . . . , xn }. Construct a set of n2 linear transformations {A11 , A12 , . . . , A1n , A21 , . . . , Ann }, defined as follows Aij xk = xj for k=i 0 otherwise For instance, take A23 . Then A23 x2 = x3 and A23 xi = 0 for all i 6= 2. Peculiarities of linear transformations: Consider the vector space of linear transformations on V. The product AB of two linear transformations A and B is defined as follows: ABx = A(Bx) ∀x ∈ V (1) The product of two non-zero transformations could yield a zero transformation. Example: Let V = P3 , the vector space of polynomials d of degree 2 or less in the indeterminate t. Now consider A = dt and 2 d B = dt2 . (2) The product, in general, is in non-commutative (AB 6= BA). Example: This is easily seen in matrices as there product, in general, is non-commutative. . 20. Matrix representations, the adjoint and similarity transformations Given a linear transformation A from V to W, a basis X = {x1 , . . . , xn } in V and a basis Y = {y1 , . . . , yp } in W, the array of np scalars (αij , i = 1, . . . , p, j = 1, . . . , n) α11 · · · α1n .. .. .. . . . αp1 · · · αpn 20. Matrix representations, the adjoint and similarity transformations 35 defined by the relation Axj = p X αkj yk j = 1, . . . , n k=1 is called the matrix representation of A with respect to the basis X and Y. (1) Let A be the linear transformation on Pn defined by (Ax)(t) = x(t + 1), and let {x1 , ..., xn+1 } be the basis of Pn defined by xj (t) = tj , j = 0, ..., n − 1. Find the matrix of A with respect to this basis. (2) Find the matrix of the operation of conjugation on C, considered as a √ real vector space, with respect to the basis {1, i}(where i = −1). (3) Consider the vector space of all two-by-two matrices and let A be the linear transformation that sends each matrix X onto P X, where 1 1 P = 1 1 . 
Find the matrix of A with respect to the basis consisting of 1 0 0 1 0 0 0 0 , , , 0 0 0 0 1 0 0 1 (4) Let A be the linear transformation on C2 defined by A(ξ1 , ξ2 ) = (ξ1 + ξ2 , ξ2 ). Prove that if a linear transformation B commutes with A, then there exists a polynomial p such that B = p(A). (5) If A and B are linear transformations on a vector space, and if AB = 0, does it follow that BA = 0? (6) (a) Suppose that V is a finite-dimensional vector space with basis {x1 , ..., xn }. Suppose that α1 , ..., αn are pairwise distinct scalars. If A is a linear transformation such that Axj = αj xj , j = 1, ..., n, and if B is a linear transformation that commutes with A, then there exists scalars β1 , ..., βn such that Bxj = βj xj . (b) Prove that if B is a linear transformation on a finite dimensional vector space V and if B commutes with every linear transformation on V then B is a scalar. (that is there exits a scalar β such that Bx = βx for all x in V) (7) (a) It is easy to extend matrix theory to linear transformations between different vector spaces. Suppose that U and V are vector spaces over the same field, let {x1 , ..., xn } and {y1 , ..., ym } be the bases of U and V respectively, and let A be the linear transformation from U to V. The matrix of A is, by definition, the rectangular m by n, array of scalars defined by Axj = Σi αij yi . Define addition and multiplication of rectangular matrices so as to generalize as many as possible of the results of 38inHalmos.(N otethattheproductof anm1 36 1. Groups, fields and vector spaces by n1 matrix and an m2 by n2 matrix, in that order, will be defined only if n1 =m2 .) (b)Suppose that A and B are multipliable matrices. Partition A into four rectangular blocks(top left, top right,bottom left,bottom right) and then partition B similarly so that number of columns in the top left part of A is the same as the number of rows in the top left part of B. If, in an obvious shorthand, there partitions are indicated by B11 B12 A11 A12 , ,B = A= B21 B22 A21 A22 then AB = A11 B11 + A12 B21 A11 B12 + A12 B22 A21 B11 + A22 B21 A21 B12 + A22 B22 , (c) Use subspaces and complements to express the result of (b) in terms of linear transformations (instead of matrices). (d) Generalize both (b) and (c) to larger number of pieces (instead of four). (8) Suppose that the matrix of a linear transformation (on a two-dimensional vector space) with respect to some coordinates system is 0 0 0 1 . How many subspaces are there invariant under the transformation? 20.1. The inverse and adjoint transformations. If a linear transformation A on a vector space V satisfies the two properties (1) x1 6= x2 ⇒ Ax1 6= Ax2 , (2) For every y ∈ V there exists an x ∈ V such that Ax = y , then the linear transformation is said to be invertible and the transformation which corresponds x to y, where Ax = y is called the inverse of A and is denoted by A−1 and A−1 y = x. Show that A−1 is a linear transformation. For finite dimensional vector spaces, the above two conditions are equivalent to the single condition Claim 20.1. If a linear transformation A on a finite dimensional vector space satisfies the condition Ax = 0 ⇒ x = 0 then A is said to be invertible. What does a linear transformation induce on the dual space ? A linear transformation A on a vector space V induces a transformation A0 on the 20. Matrix representations, the adjoint and similarity transformations 37 dual space V0 called the adjoint (or dual) transformation defined as [x, A0 y] = [Ax, y] ∀x ∈ V, y ∈ V0 . Claim 20.2. 
The adjoint is a linear transformation on V0 . Proof. Left as an exercise. 20.2. Similarity transformations. We now pose questions of the following nature: What happens to the matrix representation of a linear transformation under a change of basis ? Claim 20.3. Given a linear transformation A on a vector space V and two bases X and Y, the matrix representation [A]X is related to [A]Y as [A]Y = [C]−1 X [A]X [C]X where C is an invertible linear transformation defined as Cxi = yi . Proof. Now Ayi = βji yj = βji Cxj = βji (γkj xk ). We also have Ayi = ACxi = Aγmi xm = γmi (Axm ) = γmi (αlm xl ) Comparing the scalars associated with xk in each of the two preceding expressions, we obtain γkj βji = αkm γmi , which in matrix form is [C]X [A]Y = [A]X [C]X The result follows. Claim 20.4. Given two linear transformations A and B on a vector space V, both of which have identical matrix representations αij in basis X and Y respectively, the linear transformations are related as A = C −1 BC where C is an invertible linear transformation defined as Cxi = yi . Proof. Byi = αki yk = αki Cxk = C(αki xk ) = CAxi Now Byi = BCxi as well. Comparing the previous two expressions, we have BC = CA. 38 1. Groups, fields and vector spaces 21. Eigen values Which are those vectors x ∈ V that just get scaled x → βx(β ∈ F ) under the action of a linear transformation A ? How much do they get scaled ? The answer to this question is related to the notion of an eigen value. β ∈ F is called an eigen value of a linear transformation A if there exists atleast one non-zero vector xβ such that Axβ = βxβ The vector xβ is termed an eigen vector corresponding to the eigen value β. The eigen values of a linear transformation remain invariant under a similarity transformation. Proof. : left as an exercise. 22. Multilinear forms and the determinant In this section we usher in the notion of a determinant through the theory of multilinear forms and then establish the equivalence of this notion with our earlier comprehension of the determinant. Recall that the space of skew-symmetric n-linear forms over an n-dimensional vector space V has dimension 1. Let A be a linear transformation from V to V and let X = {x1 , . . . , xn } be a basis for V. Now let w be a skew-symmetric n-linear functional and define a transformation Ā on Tks−sym as (16) (Āw)(x1 , . . . , xn ) = w(Ax1 , . . . , Axn ) Since the space of skew-symmetric n-linear forms over an n-dimensional vector space V has dimension 1, (17) (Āw)(x1 , . . . , xn ) = δw(x1 , . . . , xn ) where δ is a scalar. We shall show that this scalar δ is indeed the determinant of the linear transformation A in the basis X. Let [αij ] be the matrix representation of A in the given basis. Then n n X X (18) w(Ax1 , . . . , Axn ) = w( αj1 xj , . . . , αjn xj ) j=1 j=1 Using the property of linearity in each argument, the RHS of the above equation could be expanded. On doing so, the terms that have any two 22. Multilinear forms and the determinant 39 identical entries would disappear and the resulting summation would yield terms, each of which looks like w(απ(1)1 xπ(1) , . . . , απ(n)n xπ(n) ) = απ(1)1 · · · απ(n)n w(xπ(1) , . . . , xπ(n) ) = απ(1)1 · · · απ(n)n πw(x1 , . . . , xn ) (19) for some permutation π of the integers (1, . . . , n). Summing over all possible permutations, we have n n X X X w( αj1 xj , . . . , αjn xj ) = απ(1)1 · · · απ(n)n πw(x1 , . . . , xn ) j=1 (20) j=1 all permutations π X = απ(1)1 · · · απ(n)n sign(π)w(x1 , . . . 
, xn ) all permutations π From (17) and (18), we have X δ= (21) απ(1)1 · · · απ(n)n sign(π) all permutations π The RHS of the above equation is the computation of the determinant as we know it. So we now term δ as det(A). Property 22.1. Say C = BA. Then δC = δB δA . Proof. δC w(x1 , . . . , xn ) = (C̄w)(x1 , . . . , xn ) = w(Cx1 , . . . , Cxn ) = w(BAx1 , . . . , BAxn ) = (B̄w)(Ax1 , . . . , Axn ) = δB w(Ax1 , . . . , Axn ) (22) = δB (Āw)(x1 , . . . , xn ) = δB δA w(x1 , . . . , xn ) Property 22.2. If A is an invertible transformation, then δA 6= 0. Proof. (23) 1 = AA−1 ⇒ δ1 = 1 = δA δA−1 ⇒ δA 6= 0 Chapter 2 Additional structures in vector spaces 1. Norms A norm on a vector space V is a non-negative, real-valued map (denoted by k.k) k.k : V → R≥0 which satisfies the following properties (1) krk ≥ 0 ∀r ∈ V and krk = 0 ⇐⇒ r = 0 (2) kr + yk ≤ krk + kyk ∀r, y ∈ V (Triangular inequality) (3) kβrk = |β| krk ∀β ∈ F, r ∈ V The notion of length from the real-Euclidean space (Rn ) is familiar to most of us. This length is the Euclidean norm or the 2-norm and is defined as n 4 X kxk2 = ( |xi |2 )1/2 for 4 x = (x1 , . . . , xn ) i=1 1.1. A few typical norms. On Cn and Rn consider • For x ∈ Cn , the p-norm (1 ≤ p < ∞)is defined as n 4 X kxkp = ( |xi |p )1/p , i=1 and the ∞ norm as 4 kxk∞ = max |xi |. 1≤i≤n 41 42 2. Additional structures in vector spaces • For x ∈ C 0 [a, b] (the space of continuous functions on the real interval [a, b]), the 2-norm is defined as s Z b 4 kxk2 = (x(s))2 ds. a Note that all these norms satisfy the three properties stated in the definition. 1.2. Norms on signals. The notion of a ”signal” is fairly well entrenched in engineering. Current (or voltage) in an electrical element, velocity of a moving body, concentration of a component in a mixture, torque supplied by a motor, the speed of a rotor shaft, are some examples of physical quantities that are termed signals. Mathematically speaking, signals are functions of time or other arguments. Scalar signals encountered in many electrical and mechanical applications are often real-valued functions of time (and n-dimensional signals are ntuple valued functions of time.) These signals are amenable to being modeled as elements of a vector space. We now endow signals with norms. Norms on signals, quite often, give an indication of physical quantities like energy, amplitude and so on. Examples of norms on signals: • The 1, 2 and ∞ norms for signals belonging to the space of Rn -valued continuous signals Sc : (−∞, ∞) → Rn are Z ∞X n 4 krk1 = |ri (t)|dt r ∈ Sc −∞ i=1 v uZ 4 u krk2 = t ∞ n X (ri (t))2 dt r ∈ Sc −∞ i=1 4 krk∞ = sup(kr(t)k2 ) where t∈R v u n uX kr(t)k2 = t (ri (t))2 r ∈ Sc i=1 • The 1, 2 and ∞ norms for signals belonging to the space of Rn -valued discrete sequences Sd : Z → Rn are n 4 X X krk1 = { |(ri (j)|} r ∈ Sd j=Z i=1 v u n 4 uX X krk2 = t { (ri (j))2 } j=Z i=1 r ∈ Sd 2. Inner products 43 4 krk∞ = sup(kr(j)k2 ) v u n uX kr(j)k2 = t (ri (j))2 where j∈Z r ∈ Sd i=1 • Recall that that matrices too form a vector space. So we now introduce norms on matrices. Let A ∈ C m×n . The 1,2 and ∞ norms are defined as m X kAk1 = max |aij | (column sum) 1≤j≤n p kAk2 = λmax (A∗ A) i=1 = σmax (A) kAk∞ = max 1≤i≤m n X (maximum singular value of A) |aij | (row sum) j=1 and the Frobenius norm is v uX n p um X ∗ t kAkF = Trace(A A) = |aij |2 i=1 j=1 bullet We are all aware of matrix multiplication. Say y = Ax where we consider x as an input and y as an output. 
2. Inner products

High school physics teaches us the notion of a dot (scalar) product between vectors in $\mathbb{R}^3$. The dot product ushers in the notion of an angle between vectors and, thereby, orthogonality. We now introduce this structure in a more general way and call it the inner product. An inner product on a vector space relates two vectors to a scalar. More precisely, an inner product on a real (or complex) vector space $V$ is a scalar-valued map $(\cdot, \cdot) : V \times V \to F$ ($= \mathbb{C}$ or $\mathbb{R}$) which satisfies
– $(x, x) \geq 0$, and $(x, x) = 0$ if and only if $x = 0$,
– $(x, y) = (y, x)^*$ for all $x, y \in V$, where $*$ denotes the complex conjugate,
– $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$ for all $x, y, z \in V$.
In an inner-product space, the norm is induced by the inner product and we have
$$\|x\| \triangleq \sqrt{(x, x)}.$$
We have encountered:
– On $\mathbb{R}^n$, a valid inner product is
$$(x, y) \triangleq \sum_{i=1}^{n} \xi_i \eta_i, \quad \text{where } x \triangleq (\xi_1, \ldots, \xi_n) \text{ and } y \triangleq (\eta_1, \ldots, \eta_n).$$
The norm induced by this inner product is the Euclidean norm $\|x\| = \sqrt{\sum_{i=1}^{n} \xi_i^2}$.
– On the space $C^0[a,b]$, a valid inner product is
$$(x, y) \triangleq \int_a^b x(s)\, y(s)\, ds.$$

2.1. Approximation problems in engineering.

Example 1: Recall a problem often encountered in experimental laboratory reports. From the experimental readings of two variables, say pressure and temperature, at ten different conditions, $(P_1, T_1), \ldots, (P_{10}, T_{10})$, we were required to find the "best" straight-line approximation between the pressure ($P$) and the temperature ($T$). So how did we proceed? Supposing the equation of the straight line is $P = mT + K$, where $m$ is the slope and $K$ is a constant, we performed the minimization
$$\min_{m, K} \sum_{i=1}^{10} \big[P_i - (mT_i + K)\big]^2.$$
Such a minimization, using the sum of the squares of the error, was called the least squares problem. Where does all this fit into our discussion of norms and inner-product spaces? Look at the space $\mathbb{R}^{10}$ (why 10?) and consider the vectors
$$P \triangleq (P_1, \ldots, P_{10}), \qquad T \triangleq (T_1, \ldots, T_{10}) \qquad \text{and} \qquad G \triangleq (1, \ldots, 1).$$
Let us take the standard inner product on $\mathbb{R}^{10}$ and the norm induced by it. Our minimization problem can now be restated as
$$\min_{m, K} \|P - (mT + KG)\|^2.$$
So what are we doing? Consider the subspace
$$S \triangleq \operatorname{span}\{T, G\}.$$
The vector $mT + KG$, with $m$ and $K$ scalars, represents an arbitrary vector in $S$. By the minimization, we are trying to find that vector in the subspace which is closest to $P$. How is this "closeness" measured? In terms of the norm of the error $e \triangleq P - (mT + KG)$. When is this norm the smallest? It is smallest when the error vector $e$ is orthogonal to the subspace $S$. Mathematically this means
$$(e_{\mathrm{opt}}, s) = 0 \quad \text{for all } s \in S,$$
where "opt" stands for optimal (in this case the minimizer). This implies
$$(e_{\mathrm{opt}}, T) = 0 \quad \text{and} \quad (e_{\mathrm{opt}}, G) = 0.$$
So we have
$$\big(P - (m_{\mathrm{opt}} T + K_{\mathrm{opt}} G),\, T\big) = 0 \quad \text{and} \quad \big(P - (m_{\mathrm{opt}} T + K_{\mathrm{opt}} G),\, G\big) = 0,$$
which implies
$$m_{\mathrm{opt}} \|T\|^2 + K_{\mathrm{opt}} (G, T) = (P, T) \quad \text{and} \quad m_{\mathrm{opt}} (T, G) + K_{\mathrm{opt}} \|G\|^2 = (P, G).$$
These are two simultaneous linear equations in the scalars $m_{\mathrm{opt}}$ and $K_{\mathrm{opt}}$, which can be solved to obtain their values (a numerical sketch follows).
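As a minimal numerical sketch of the least squares problem above (Python with NumPy; the temperature and pressure readings are made-up numbers for illustration only, not experimental data from these notes), the normal equations obtained from the two orthogonality conditions are assembled and solved, and the result is cross-checked against NumPy's least-squares routine.

```python
import numpy as np

# Hypothetical readings at 10 experimental conditions (illustrative values).
T = np.array([300., 310., 320., 330., 340., 350., 360., 370., 380., 390.])
P = np.array([1.01, 1.05, 1.08, 1.13, 1.16, 1.21, 1.24, 1.29, 1.33, 1.36])
G = np.ones_like(T)

# Normal equations from (e_opt, T) = 0 and (e_opt, G) = 0:
#   m*(T,T) + K*(G,T) = (P,T)
#   m*(T,G) + K*(G,G) = (P,G)
M = np.array([[T @ T, G @ T],
              [T @ G, G @ G]])
rhs = np.array([P @ T, P @ G])
m_opt, K_opt = np.linalg.solve(M, rhs)

# Cross-check against NumPy's least-squares solver on [T G] [m K]^T ~ P.
m_ls, K_ls = np.linalg.lstsq(np.column_stack([T, G]), P, rcond=None)[0]
print(m_opt, K_opt)
print(np.isclose(m_opt, m_ls), np.isclose(K_opt, K_ls))  # expected: True True
```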
Example 2: On the space $C^0[0, 2\pi]$, let us consider the inner product
$$(x, y) \triangleq \int_0^{2\pi} x(s)\, y(s)\, ds.$$
The functions $\{\sin(t), \cos(t), \sin(2t), \cos(2t), \sin(3t), \cos(3t), \ldots\}$ form a basis for this infinite-dimensional space. Let us consider a finite set of elements from this basis,
$$M \triangleq \{\sin(t), \cos(t), \sin(2t), \cos(2t), \ldots, \sin(mt), \cos(mt)\},$$
and let us look at the problem of approximating any function $f(\cdot)$ in $C^0[0, 2\pi]$ by a linear combination of the elements of $M$ such that the norm of the error
$$e \triangleq f(t) - \Big(\sum_{i=1}^{m} a_i \sin(it) + b_i \cos(it)\Big)$$
is minimized. Mathematically, we have
$$\min_{a_i,\, b_i} \Big\| f(t) - \Big(\sum_{i=1}^{m} a_i \sin(it) + b_i \cos(it)\Big) \Big\|^2.$$
The solution procedure is the same as in the previous example and is based on the fact that the optimal error is orthogonal to the subspace $\operatorname{span}\{M\}$. To obtain the $a_i$'s and $b_i$'s we solve the equations
$$\Big(f(t) - \Big(\sum_{i=1}^{m} a_i \sin(it) + b_i \cos(it)\Big),\, \sin t\Big) = 0,$$
$$\Big(f(t) - \Big(\sum_{i=1}^{m} a_i \sin(it) + b_i \cos(it)\Big),\, \cos t\Big) = 0,$$
$$\vdots$$
$$\Big(f(t) - \Big(\sum_{i=1}^{m} a_i \sin(it) + b_i \cos(it)\Big),\, \sin mt\Big) = 0,$$
$$\Big(f(t) - \Big(\sum_{i=1}^{m} a_i \sin(it) + b_i \cos(it)\Big),\, \cos mt\Big) = 0.$$

Question 1 (The Pythagorean theorem). This 2500-year-old result is known to all of us. In $\mathbb{R}^n$, with the standard Euclidean norm induced by the inner product, prove that if $u$ and $v$ are orthogonal then
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

Question 2 (The parallelogram law). We are familiar with this from physics. If $u, v$ are vectors in an inner-product space, show that
$$\|u + v\|^2 + \|u - v\|^2 = 2\big(\|u\|^2 + \|v\|^2\big).$$

Question 3 (Orthonormal basis). We are familiar with the notion of a basis. In an inner-product space, a basis satisfying $(e_j, e_k) = \delta_{jk}$ for any two basis vectors $e_j$ and $e_k$ is called an orthonormal basis. Suppose $(e_1, \ldots, e_n)$ is an orthonormal basis for an inner-product space $V$. Show that for every $v \in V$,
$$v = (v, e_1)e_1 + (v, e_2)e_2 + \cdots + (v, e_n)e_n \quad \text{and} \quad \|v\|^2 = \sum_{i=1}^{n} |(v, e_i)|^2.$$

Question 4 (Orthogonal complement). We are familiar with the complement of a subspace. Now let us see what the orthogonal complement of a subspace is. Given a subspace $U$ of an inner-product space $V$, its orthogonal complement, denoted by $U^\perp$, is
$$U^\perp = \{v \in V : (v, u) = 0 \ \forall\, u \in U\}.$$
Show that $V = U \oplus U^\perp$.

Question 5 (Approximation of a function). We are all familiar with the Taylor series. To approximate $\sin x$ up to 5th order with the Taylor series we wrote
$$\sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}.$$
Now let us approximate $\sin x$ in a different way. Consider the inner-product space $C^0[-\pi, \pi]$ with the inner product
$$(z, y) \triangleq \int_{-\pi}^{\pi} z(s)\, y(s)\, ds,$$
and define the set $S \triangleq \{1, x, x^2, x^3, x^4, x^5\}$. We wish to approximate $\sin x$ as
$$\sum_{i=0}^{5} c_i x^i.$$
Using a computer, find the $c_i$'s. Also comment on which approximation is better: the Taylor series or the other one. (One possible way of setting up the computation is sketched below.)
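One possible way of setting up the computation asked for in Question 5 is sketched here (Python with NumPy and SciPy; the use of scipy.integrate.quad for the integrals and the evaluation grid are incidental implementation choices). It assembles the Gram matrix of the monomials under the stated inner product, solves the resulting normal equations for the $c_i$'s, and prints the maximum error over $[-\pi, \pi]$ alongside that of the 5th-order Taylor polynomial, leaving the comparison to the reader.

```python
import numpy as np
from scipy.integrate import quad

# Inner product on C^0[-pi, pi]: (z, y) = integral of z(s)*y(s) ds over [-pi, pi].
def ip(z, y):
    return quad(lambda s: z(s) * y(s), -np.pi, np.pi)[0]

basis = [lambda s, k=k: s**k for k in range(6)]   # S = {1, x, x^2, ..., x^5}

# Orthogonality of the optimal error to span(S) gives the normal equations
#   sum_j G[i, j] c_j = b[i],  with G[i, j] = (x^i, x^j) and b[i] = (sin, x^i).
G = np.array([[ip(bi, bj) for bj in basis] for bi in basis])
b = np.array([ip(np.sin, bi) for bi in basis])
c = np.linalg.solve(G, b)

# Compare with the 5th-order Taylor polynomial x - x^3/3! + x^5/5!.
x = np.linspace(-np.pi, np.pi, 1001)
p_ls = sum(ci * x**i for i, ci in enumerate(c))
p_taylor = x - x**3 / 6 + x**5 / 120
print("coefficients c_i:", c)
print("max error, inner-product fit:", np.abs(np.sin(x) - p_ls).max())
print("max error, Taylor polynomial:", np.abs(np.sin(x) - p_taylor).max())
```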