ME(EE) 550 Foundations of Engineering Systems Analysis

Chapter 05: Linear Transformations and Functionals

The concepts of normed vector spaces and inner product spaces, presented in Chapter 3 and Chapter 4, respectively, synergistically combine the topological structure of metric spaces and the algebraic structure of vector spaces. We now present linear transformations between such spaces; these linear transformations form a vector space in their own right. We also introduce the concept of a norm on the space of linear transformations. This chapter should be read along with Chapter 4 and Chapter 5 of Naylor & Sell; in particular, some of the solved examples and exercises in Naylor & Sell would be very useful.

1 Basic concepts

Definition 1.1. (Transformations, Operators, and Functionals) Let $V$ and $W$ be two vector spaces (not necessarily of the same dimension) defined over the same field $\mathbb{F}$ (which is either $\mathbb{R}$ or $\mathbb{C}$). Then,
(i) A mapping $f : V \to W$ is called a transformation from $V$ into $W$.
(ii) A mapping $f : V \to V$ is called an operator from $V$ into itself. Hence, an operator belongs to a specific class of transformations.
(iii) A mapping $f : V \to \mathbb{F}$ is called a functional from $V$ into its field. Hence, a functional belongs to a specific class of transformations.
Furthermore, if the mapping $f$ is linear, i.e., if $f(\alpha x \oplus_V y) = \alpha f(x) \oplus_W f(y)$ $\forall x, y \in V$ and $\forall \alpha \in \mathbb{F}$, then these mappings are respectively called a linear transformation, a linear operator, or a linear functional. The collection of all linear transformations from $V$ into $W$ forms a vector space, denoted as $L(V, W)$, over the field $\mathbb{F}$.

Example 1.1. Let $V = \mathbb{F}^n$ and $W = \mathbb{F}^m$ for some $n, m \in \mathbb{N}$. Then, a linear transformation $A : V \to W$ is represented by an $(m \times n)$ matrix, i.e., $A \in \mathbb{F}^{m \times n}$.

Example 1.2. Let $V = P(\mathbb{F})$, where $P(\mathbb{F})$ denotes the space of polynomials of any degree with coefficients in $\mathbb{F}$. Then, a linear mapping $A : V \to V$ is an operator on the infinite-dimensional space $V$.

Example 1.3. Let $V = P(\mathbb{R})$.
Then, any mapping $f : V \to \mathbb{R}$ is a functional. For example, the norm in a normed vector space is a functional; however, the norm is not a linear functional (e.g., $\|-x\| = \|x\| \neq -\|x\|$ for $x \neq 0$).

Definition 1.2. (Injectivity, surjectivity, and bijectivity of transformations) Let $V$ and $W$ be two vector spaces (not necessarily of the same dimension) defined over the same field $\mathbb{F}$. Let $T : V \to W$ be a transformation. Then,
(i) $T$ is called one-to-one or injective if $T(x) = T(y) \Rightarrow x = y$ $\forall x, y \in V$. If $T : V \to W$ is injective, then it has a left inverse $S : W \to V$ such that $ST = I_V$.
(ii) $T$ is called onto or surjective if the range space of $T$ is equal to $W$, i.e., if $\forall z \in W$ $\exists x \in V$ such that $T(x) = z$. If $T : V \to W$ is surjective, then it has a right inverse $S : W \to V$ such that $TS = I_W$.
(iii) $T$ is called bijective if $T$ is both injective and surjective. In that case, there exists a unique inverse of $T$, denoted as $T^{-1} : W \to V$, that is also bijective, and $T^{-1}T = I_V$, $TT^{-1} = I_W$.

Definition 1.3. (Null space) Let $L : V \to W$ be a linear transformation. Then, the null space of $L$ is defined as
$$N(L) \triangleq \{x \in V : Lx = 0_W\}$$

Proposition 1.1. (Injectivity and null spaces) Let $A \in L(V, W)$ be a linear transformation. Then, $A$ is injective if and only if $N(A) = \{0_V\}$.

Proof. To show the if part, let $N(A) = \{0_V\}$ and let $Ax = Ay$. Then, by linearity, $A(x - y) = 0_W \Rightarrow (x - y) = 0_V$, i.e., $x = y$. So, $Ax = Ay \Rightarrow x = y$, which implies that $A$ is injective. Next, to show the only if part, let $N(A) \neq \{0_V\}$. Then, $\exists z \neq 0_V$ such that $Az = 0_W$. Writing $z = x - y$ with $x \neq y$ (e.g., $x = z$ and $y = 0_V$) yields $A(x - y) = 0_W \Rightarrow Ax = Ay$ with $x \neq y$, which implies that $A$ is not injective.

Definition 1.4. (Boundedness of transformations) Let $T : V \to W$ be a transformation (not necessarily linear), where $V$ and $W$ are normed vector spaces (with possibly different norms) over the same field $\mathbb{F}$. Then, $T$ is defined to be bounded if $\exists M \in (0, \infty)$ such that $\|T(x)\|_W \le M\|x\|_V$ $\forall x \in V$, and $M$ is called a bound of the transformation $T$.

Remark 1.1.
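Let $T : V \to W$ be a transformation (not necessarily linear), where $V$ and $W$ are normed vector spaces (with possibly different norms) over the same field $\mathbb{F}$. Then, $T$ is unbounded if $\forall M \in (0, \infty)$ $\exists x \in V$ such that $\|T(x)\|_W > M\|x\|_V$.

Remark 1.2. Let $V$ and $W$ be two normed vector spaces (not necessarily of the same dimension) defined over the same field $\mathbb{F}$. Then, the collection of all bounded linear transformations $L : V \to W$ forms a vector space, denoted as $BL(V, W)$, over the field $\mathbb{F}$ that is a subspace of $L(V, W)$.

Definition 1.5. (Induced norm of bounded linear transformations) Let $A \in BL(V, W)$. Then, the induced norm of $A$ is defined as
$$\|A\|_{ind} \triangleq \inf\big\{M \in (0, \infty) : \|Ax\|_W \le M\|x\|_V \ \forall x \in V\big\}$$

Proposition 1.2. (Equivalence of induced norms) Let $A \in BL(V, W)$. Then, the following statements are equivalent.
(i) If $\dim V > 0$, then $\|A\|_{ind} = \sup_{x \neq 0_V}\frac{\|Ax\|_W}{\|x\|_V}$.
(ii) $\|A\|_{ind} = \sup_{\|x\|_V = 1}\|Ax\|_W$.
(iii) $\|A\|_{ind} = \sup_{\|x\|_V \le 1}\|Ax\|_W$.

Proof. (i) For $x \neq 0_V$, the inequality $\|Ax\|_W \le M\|x\|_V$ is equivalent to $\frac{\|Ax\|_W}{\|x\|_V} \le M$, and it holds trivially for $x = 0_V$. Hence, the admissible bounds $M$ are exactly those satisfying $M \ge \sup_{x \neq 0_V}\frac{\|Ax\|_W}{\|x\|_V}$, which implies that
$$\sup_{x \neq 0_V}\frac{\|Ax\|_W}{\|x\|_V} = \inf\big\{M \in (0, \infty) : \|Ax\|_W \le M\|x\|_V \ \forall x \in V\big\} = \|A\|_{ind}$$
(ii) Let $y \neq 0_V$. Define $x \triangleq \frac{y}{\|y\|_V}$, so that $\|x\|_V = 1$ and $\frac{\|Ay\|_W}{\|y\|_V} = \|Ax\|_W$ $\forall y \neq 0_V$, which, together with (i), implies that $\|A\|_{ind} = \sup_{\|x\|_V = 1}\|Ax\|_W$.
(iii) Let $z = \alpha x$, where $\|x\|_V = 1$ and $|\alpha| \le 1$ for an arbitrary $\alpha \in \mathbb{F}$. Then, $\|z\|_V = |\alpha|\|x\|_V \le 1$; conversely, every $z$ with $\|z\|_V \le 1$ can be written in this form. This implies that
$$\sup_{\|z\|_V \le 1}\|Az\|_W = \sup_{\|x\|_V = 1,\, |\alpha| \le 1}|\alpha|\|Ax\|_W = \sup_{\|x\|_V = 1}\|Ax\|_W = \|A\|_{ind}$$

Theorem 1.1. (Equivalence of continuity and boundedness for linear transformations) Let $A \in L(V, W)$. Then, $A$ is continuous if and only if $A$ is bounded.

Proof. To show the if part, let $A$ be bounded, i.e., $\exists M \in (0, \infty)$ such that $\|Az\|_W \le M\|z\|_V$ $\forall z \in V$. Let $x, y \in V$ and $z = x - y$. Then, since $A$ is linear, $Az = Ax - Ay$. For continuity of $A$, we must show that $\forall x \in V$ $\forall \varepsilon > 0$ $\exists \delta(\varepsilon, x) > 0$ such that $\|x - y\|_V < \delta \Rightarrow \|Ax - Ay\|_W < \varepsilon$. This is achieved by choosing $\delta = \frac{\varepsilon}{M}$, because then $\|Ax - Ay\|_W \le M\|x - y\|_V < M\delta = \varepsilon$.

Next, to show the only if part, let $A$ be continuous, and let $\{x^k\}$ be a sequence of nonzero vectors in $V$.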
Let us define $z^k \triangleq \frac{x^k}{k\|x^k\|_V}$, so that $\|z^k\|_V = \frac{1}{k}$ and hence $z^k \to 0_V$ as $k \to \infty$. Since $A$ is linear and continuous, $Az^k \to A0_V = 0_W$ as $k \to \infty$. Now, if $A$ is unbounded, then the vectors $x^k$ can be chosen such that $\|Ax^k\|_W > k\|x^k\|_V$ $\forall k \in \mathbb{N}$, which implies that
$$\|Az^k\|_W = \frac{\|Ax^k\|_W}{k\|x^k\|_V} > 1 \quad \forall k \in \mathbb{N}$$
This is a contradiction because $\{Az^k\}$ must converge to $0_W$ as $k \to \infty$. Therefore, $A$ must be bounded.

Corollary 1.1. (Boundedness of finite-dimensional linear transformations) If $V$ is finite-dimensional, then every $A \in L(V, W)$ is bounded and hence continuous.

Proof. Let $\dim V = n$ for some $n \in \mathbb{N}$, and let $\{e_k : k = 1, \cdots, n\}$ be a basis of $V$ (e.g., for $V = \mathbb{F}^n$, $e_j$ consists of all zero elements except 1 in the $j$th position). Every $x \in V$ has a unique representation $x = \sum_{k=1}^n\alpha_ke_k$. Since $A$ is linear, it follows that
$$\|Ax\|_W = \Big\|\sum_{k=1}^n\alpha_kAe_k\Big\|_W \le \sum_{k=1}^n|\alpha_k|\|Ae_k\|_W \le \Big(\sum_{k=1}^n|\alpha_k|\Big)\max_k\|Ae_k\|_W$$
Since the vectors $e_k$ are linearly independent, it follows from Lemma 1.2 in Chapter 3 (on linear combinations in normed spaces; see also Kreyszig, pp. 72-73) that there exists $c \in (0, \infty)$ such that
$$\Big\|\sum_{k=1}^n\alpha_ke_k\Big\|_V \ge c\sum_{k=1}^n|\alpha_k| \quad \text{for every choice of the } \alpha_k\text{'s}$$
Therefore, $\|Ax\|_W \le \frac{1}{c}\max_k\|Ae_k\|_W\|x\|_V$, i.e., $A$ is bounded. Then, by Theorem 1.1, $A$ is continuous.

Corollary 1.2. (Continuity of a linear transformation at a point) Let $V$ and $W$ be normed vector spaces over the same field, and let a linear transformation $A \in L(V, W)$ be continuous at a point $y \in V$. Then, $A$ is bounded on $V$.

Proof. Let $\{x^k\}$ be a convergent sequence in $V$ such that $x^k \to x \in V$. Then, $\|Ax^k - Ax\|_W = \|A(x^k - x)\|_W = \|A(x^k - x + y) - Ay\|_W$. As $k \to \infty$, it follows that $(x^k - x + y) \to y$. Since $A$ is continuous at $y \in V$, it follows that $A(x^k - x + y) \to Ay$ as $k \to \infty$. Therefore, $Ax^k \to Ax$. Hence, $A$ is continuous on $V$, which implies, by Theorem 1.1, that $A$ is bounded on $V$.

Corollary 1.3. Let $A \in L(V, W)$ be bounded. Let $\{x^k\}$ be a sequence in $V$ that converges to $x \in V$. Then,
(i) The sequence $\{Ax^k\}$ in $W$ converges to $Ax \in W$.
(ii) The null space $N(A)$ is closed in $V$.

Proof.
Part (i): Since $A$ is bounded, $A$ is continuous by Theorem 1.1, and the image of a convergent sequence under a continuous mapping is also a convergent sequence by Theorem 3.7.1 in Naylor & Sell (see p. 74). This is also seen from the following inequality:
$$\|Ax^k - Ax\|_W = \|A(x^k - x)\|_W \le \|A\|_{ind}\|x^k - x\|_V \to 0 \text{ as } k \to \infty$$
Part (ii): Let $\{x^k\}$ be a sequence in $N(A)$ that converges to some $x \in V$. Then, it follows from Part (i) that $Ax^k \to Ax$. Since $Ax^k = 0_W$ $\forall k$, we conclude that $Ax = 0_W \Rightarrow x \in N(A)$. Therefore, $N(A)$ is closed in $V$.

Remark 1.3. Let $V$ and $W$ be two normed vector spaces (not necessarily of the same dimension) defined over the same field $\mathbb{F}$. If $V$ is finite-dimensional, then every linear transformation in $L(V, W)$ is bounded, i.e., $L(V, W) = BL(V, W)$, regardless of whether $W$ is finite-dimensional or not. However, if $V$ is infinite-dimensional, then $L(V, W)$ may contain unbounded linear transformations, regardless of whether $W$ is finite-dimensional or not.

Example 1.4. (Unbounded transformation) Let $P_\infty[0, 1]$ be the space of all real polynomials on $[0, 1]$ equipped with the $L_\infty$-norm. Let $D \triangleq \frac{d}{dt}$ be a transformation $D : P_\infty[0, 1] \to P_\infty[0, 1]$. $D$ is a linear transformation because
$$D(p_1 + \alpha p_2) = \frac{d\big(p_1(t) + \alpha p_2(t)\big)}{dt} = \frac{dp_1(t)}{dt} + \alpha\frac{dp_2(t)}{dt} = Dp_1 + \alpha Dp_2 \quad \forall p_1, p_2 \in P_\infty[0, 1] \ \forall \alpha \in \mathbb{R}$$
Now we show that $D$ is an unbounded transformation. Let $x_k(t) = t^k$, $k \in \mathbb{N}$. Then,
$$Dx_k = \frac{d(t^k)}{dt} = kt^{k-1} = kx_{k-1}$$
Therefore, $\|Dx_k\|_{L_\infty} = k\|x_{k-1}\|_{L_\infty}$. Since $\|x_k\|_{L_\infty} = 1$ for all $k \ge 0$ (with $x_0(t) = 1$), it follows that $\|Dx_k\|_{L_\infty} = k\|x_k\|_{L_\infty}$, and there is no upper bound on $k \in \mathbb{N}$. Therefore, $D$ is unbounded. It follows from Theorem 1.1 that $D$ is a discontinuous transformation. Discontinuity of the derivative operator has been demonstrated earlier from the $\varepsilon$-$\delta$ perspective.

Proposition 1.3. (Bounded inverse) Let $A \in BL(V, W)$ be a bounded linear transformation.
Then, $A$ is invertible on $R(A)$ and $A^{-1} \in L(R(A), V)$ is bounded if and only if there exists a lower bound $M \in (0, \infty)$ such that $\|Ax\|_W \ge M\|x\|_V$ $\forall x \in V$.

Proof. To prove the if part, let there exist a lower bound $M \in (0, \infty)$ such that $\|Ax\|_W \ge M\|x\|_V$ $\forall x \in V$. To show that $A^{-1}$ exists on $R(A)$, it suffices to demonstrate that $A$ is injective, i.e., $N(A) = \{0_V\}$ (see Proposition 1.1). Let $x \in N(A)$, i.e., $Ax = 0_W$. By the lower bound, $0 = \|Ax\|_W \ge M\|x\|_V$. Since $M > 0$, it follows that $\|x\|_V = 0 \Rightarrow x = 0_V$. Therefore, $N(A) = \{0_V\}$.

Next, we show that $A^{-1} \in L(R(A), V)$ is bounded. Let $u, v \in R(A)$ with $u = Ax$ and $v = Ay$. Then, linearity of $A^{-1}$ follows from the fact that, $\forall \alpha, \beta \in \mathbb{F}$,
$$A^{-1}(\alpha u + \beta v) = A^{-1}(\alpha Ax + \beta Ay) = A^{-1}A(\alpha x + \beta y) = \alpha x + \beta y = \alpha A^{-1}u + \beta A^{-1}v$$
We proceed as follows to show that $A^{-1}$ is bounded:
$$\|u\|_W = \|Ax\|_W \ge M\|x\|_V = M\|A^{-1}u\|_V \Rightarrow \|A^{-1}u\|_V \le \frac{1}{M}\|u\|_W$$
To show the only if part, let $A^{-1}$ exist on $R(A)$, and let $A^{-1} \in BL(R(A), V)$. Then, there exists $m \in (0, \infty)$ such that, $\forall x \in V$,
$$\|x\|_V = \|I_Vx\|_V = \|A^{-1}Ax\|_V \le m\|Ax\|_W$$
By setting $M = \frac{1}{m}$, it follows that $\|Ax\|_W \ge M\|x\|_V$ $\forall x \in V$.

2 Linear Bounded Functionals and Dual Spaces

Let us recall that if $V$ is a vector space over the field $\mathbb{F}$, then $f : V \to \mathbb{F}$ is called a functional. In general, a functional belongs to the class of transformations. This section focuses on the space of bounded linear functionals on a normed vector space $V$, which is called the dual space $V^\star$. The concept of dual spaces is very important for understanding even the rudimentary aspects of optimization theory. We present a few examples of linear functionals.

Example 2.1. On $\mathbb{R}^n$, a linear functional is expressed as $f(x) = \sum_{k=1}^n\alpha_k\xi_k$, where $\alpha_k \in \mathbb{R}$ and $x = [\xi_1 \cdots \xi_n]^T$, i.e., $f(x) = a^Tx$ with $a = [\alpha_1 \cdots \alpha_n]^T$.

Example 2.2. On $L_2[0, 1]$, a linear functional $f$ is expressed as $f(x) = \int_0^1 dt\, y(t)x(t)$, where $y \in L_2[0, 1]$ is given.

The notions of boundedness and norm of a functional follow those of a transformation.

Definition 2.1.
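(Norm of a bounded functional) Let $V$ be a normed vector space over the field $\mathbb{F}$. Then, a functional $f : V \to \mathbb{F}$ is defined to be bounded if $\exists M \in (0, \infty)$ such that $|f(x)| \le M\|x\|_V$ $\forall x \in V$. The norm of a bounded linear functional $f : V \to \mathbb{F}$ is defined as
$$\|f\| \triangleq \sup_{\|x\|_V = 1}|f(x)|$$

Definition 2.2. (Dual space) Let $V$ be a normed vector space over the field $\mathbb{F}$. Then, the dual space of $V$, denoted as $V^\star$, is the vector space of all bounded linear functionals on $V$, i.e.,
$$V^\star \triangleq \big\{f \in L(V, \mathbb{F}) : \exists M \in (0, \infty) \text{ such that } |f(x)| \le M\|x\|_V \ \forall x \in V\big\}$$

Remark 2.1. Every bounded linear functional on a normed space $(V, \|\bullet\|)$ is continuous by Theorem 1.1, because every functional is a transformation (into $\mathbb{F}$, normed by $|\bullet|$). However, note that not all linear functionals are bounded, as seen below.

Example 2.3. (An example of an unbounded linear functional) Let us consider the subspace $U$ of $\ell_\infty$ over the real field $\mathbb{R}$ in which each sequence has finitely many nonzero elements. Let us define a linear functional $f : U \to \mathbb{R}$ as $f(x) = \sum_{k=1}^N n_k\xi_{n_k}$, where $N \in \mathbb{N}$ and the (finitely many) nonzero elements of the sequence $x \in U$ are $\xi_{n_1}, \xi_{n_2}, \cdots, \xi_{n_N}$. The weights $n_k$ have no upper bound: for the sequence $e_n$ whose only nonzero element is $\xi_n = 1$, we have $\|e_n\|_{\ell_\infty} = 1$ but $f(e_n) = n$. Hence, the linear functional $f$ is unbounded.

Theorem 2.1. (Completeness of dual spaces) Let $V$ be a normed space over a (complete) field $\mathbb{F}$. Then, its dual space $V^\star$ is a Banach space.

Proof. Let $\{z^k\}$ be a Cauchy sequence in $V^\star$. For any $x \in V$, $\{z^k(x)\}$ is a Cauchy sequence of scalars because $|z^k(x) - z^\ell(x)| \le \|z^k - z^\ell\|\|x\|_V$ and $\|z^k - z^\ell\| \to 0$ as $k, \ell \to \infty$. Since the field $\mathbb{F}$ is complete, the Cauchy sequence $\{z^k(x)\}$ of scalars converges to a scalar $z(x) \in \mathbb{F}$, that is, $z^k(x) \to z(x)$ $\forall x \in V$. We need to show that the functional $z$ is linear and bounded, and that $z^k \to z$ in $V^\star$. Linearity of $z$ is established as follows:
$$z(\alpha x + \beta y) = \lim_{k\to\infty}z^k(\alpha x + \beta y) = \lim_{k\to\infty}\big(\alpha z^k(x) + \beta z^k(y)\big) = \alpha z(x) + \beta z(y)$$
For boundedness and convergence, let $\varepsilon > 0$ and choose $K \in \mathbb{N}$ such that $\|z^k - z^\ell\| < \varepsilon$ $\forall k, \ell \ge K$. Then, $|z^k(x) - z^\ell(x)| \le \varepsilon\|x\|_V$, and letting $\ell \to \infty$ yields $|z^k(x) - z(x)| \le \varepsilon\|x\|_V$ $\forall x \in V$ $\forall k \ge K$. Hence, $z = z^k - (z^k - z)$ is bounded, being the difference of two bounded linear functionals, so $z \in V^\star$; furthermore, $\|z^k - z\| \le \varepsilon$ $\forall k \ge K$, i.e., $z^k \to z$ in $V^\star$.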
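Hence, $V^\star$ is a Banach space.

Theorem 2.2. (Dual space of $\mathbb{R}^n$) The dual space $(\mathbb{R}^n)^\star$ is isometrically isomorphic to $\mathbb{R}^n$ with the Euclidean norm.

Proof. Let $x = [\xi_1 \cdots \xi_n]^T \in \mathbb{R}^n$, where $n \in \mathbb{N}$, and let $\|x\| \triangleq \big(\sum_{k=1}^n|\xi_k|^2\big)^{\frac{1}{2}}$. Let $f \in (\mathbb{R}^n)^\star$ be expressed as $f(x) = \sum_{k=1}^n\eta_k\xi_k$, where $\eta_k \in \mathbb{R}$ (namely, $\eta_k = f(e_k)$ for the standard basis vectors $e_k$); this is a linear combination of the $\xi_k$'s, so $f$ is linear. Furthermore, $f$ is bounded because, by the Cauchy-Schwarz inequality,
$$|f(x)| = \Big|\sum_{k=1}^n\eta_k\xi_k\Big| \le \Big(\sum_{k=1}^n|\eta_k|^2\Big)^{\frac{1}{2}}\Big(\sum_{k=1}^n|\xi_k|^2\Big)^{\frac{1}{2}} = \Big(\sum_{k=1}^n|\eta_k|^2\Big)^{\frac{1}{2}}\|x\| < \infty$$
If we choose $x = [\eta_1 \cdots \eta_n]^T$, then $|f(x)| = \sum_{k=1}^n|\eta_k|^2$, i.e., equality holds in the Cauchy-Schwarz inequality. That is, $\|f\| = \big(\sum_{k=1}^n|\eta_k|^2\big)^{\frac{1}{2}}$ and $f(x) = y^Tx$, where $y = [\eta_1 \cdots \eta_n]^T \in \mathbb{R}^n$; the correspondence $f \leftrightarrow y$ is therefore a norm-preserving linear bijection.

Theorem 2.3. (Dual space of $\ell_p$) Let $p \in (1, \infty)$ and let $q$ be its conjugate, i.e., $\frac{1}{p} + \frac{1}{q} = 1$. Then, the dual space $\ell_p^\star$ is isometrically isomorphic to $\ell_q$.

Proof. Let $\{e_k\}$ be a Schauder basis for $\ell_p$, where $e_k \triangleq \{\delta_{kj}\}_{j \in \mathbb{N}}$. Then, every $x \in \ell_p$ has a unique representation $x = \sum_{k=1}^\infty\xi_ke_k$, where $x \triangleq \{\xi_1\ \xi_2\ \xi_3 \cdots\}$. Let $f \in \ell_p^\star$. Since $f$ is linear and bounded (hence continuous), it follows that
$$f(x) = f\Big(\sum_{k=1}^\infty\xi_ke_k\Big) = \sum_{k=1}^\infty\xi_kf(e_k) = \sum_{k=1}^\infty\xi_k\eta_k$$
by defining $\eta_k \triangleq f(e_k)$. Let us denote $y \triangleq \{\eta_1\ \eta_2\ \eta_3 \cdots\}$.

Let a sequence $\{x_n\}$ in $\ell_p$ be defined as $x_n \triangleq \{\xi_k^{(n)}\}$, where
$$\xi_k^{(n)} = \begin{cases}\dfrac{|\eta_k|^q}{\eta_k} & \text{if } k \le n \text{ and } |\eta_k| > 0\\[4pt] 0 & \text{if } k > n \text{ or } |\eta_k| = 0\end{cases}$$
so that $\sum_{k=1}^\infty|\xi_k^{(n)}|^p < \infty$. It follows from the constraint $\frac{1}{p} + \frac{1}{q} = 1$ that $(q - 1)p = q$.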
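The extremal vectors $x_n$ just constructed can be checked numerically. The Python sketch below is an illustration only and is not part of the original notes. It assumes a real sequence $y = \{\eta_k\}$ with finitely many nonzero terms, so each $x_n$ is a finite list, and the helper names `lp_norm`, `pairing`, `extremal_vector`, and `holder_ratio` are hypothetical. It checks that $|f(x_n)|/\|x_n\|_{\ell_p}$ equals $\|y\|_{\ell_q}$, which is the equality case of the Hölder inequality. It then spot-checks the inequality $|f(x)| \le \|x\|_{\ell_p}\|y\|_{\ell_q}$ on random vectors.

```python
import math
import random


def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1, assuming 1 < p < infinity."""
    return p / (p - 1.0)


def lp_norm(x, p):
    """l_p norm of a finite list, viewed as a finitely supported sequence."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def pairing(x, eta):
    """f(x) = sum_k xi_k * eta_k, where eta_k = f(e_k)."""
    return sum(a * b for a, b in zip(x, eta))


def extremal_vector(eta, p):
    """The x_n of the proof: xi_k = |eta_k|^q / eta_k if eta_k != 0, else 0."""
    q = conjugate_exponent(p)
    return [abs(e) ** q / e if e != 0 else 0.0 for e in eta]


def holder_ratio(eta, p):
    """Return (|f(x_n)| / ||x_n||_p, ||y||_q); assumes eta is not all zero."""
    q = conjugate_exponent(p)
    x = extremal_vector(eta, p)
    return abs(pairing(x, eta)) / lp_norm(x, p), lp_norm(eta, q)


if __name__ == "__main__":
    eta = [3.0, -1.0, 0.0, 2.0]  # assumed sample values of eta_k = f(e_k)
    p = 3.0
    ratio, norm_q = holder_ratio(eta, p)
    print(ratio, norm_q)  # equal up to floating-point rounding
    assert math.isclose(ratio, norm_q, rel_tol=1e-9)

    # Spot-check the Hoelder inequality |f(x)| <= ||x||_p ||y||_q.
    rng = random.Random(0)
    q = conjugate_exponent(p)
    for _ in range(1000):
        x = [rng.uniform(-1.0, 1.0) for _ in eta]
        assert abs(pairing(x, eta)) <= lp_norm(x, p) * lp_norm(eta, q) + 1e-12
```

With these sample values, $p = 3$ and $q = 1.5$, and both numbers are approximately $(3^{1.5} + 1 + 2^{1.5})^{2/3} \approx 4.33$. By the Hölder inequality, any other vector with the same $\ell_p$ norm can only give a smaller ratio.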
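By substituting the expression for $\xi_k^{(n)}$ in $f(x)$, it follows that
$$f(x_n) = \sum_{j=1}^n|\eta_j|^q$$
and, from the property of the induced norm $\|f\|$,
$$|f(x_n)| \le \|f\|\|x_n\| = \|f\|\Big(\sum_{k=1}^\infty|\xi_k^{(n)}|^p\Big)^{\frac{1}{p}} = \|f\|\Big(\sum_{k=1}^n|\eta_k|^{(q-1)p}\Big)^{\frac{1}{p}} = \|f\|\Big(\sum_{k=1}^n|\eta_k|^q\Big)^{\frac{1}{p}}$$
Combining the above equations, it follows that
$$\sum_{k=1}^n|\eta_k|^q \le \|f\|\Big(\sum_{k=1}^n|\eta_k|^q\Big)^{\frac{1}{p}}$$
Dividing both sides of the above inequality by $\big(\sum_{k=1}^n|\eta_k|^q\big)^{\frac{1}{p}}$ (if this quantity is zero, the next inequality holds trivially) and making use of the identity $1 - \frac{1}{p} = \frac{1}{q}$, it follows that
$$\Big(\sum_{k=1}^n|\eta_k|^q\Big)^{1 - \frac{1}{p}} = \Big(\sum_{k=1}^n|\eta_k|^q\Big)^{\frac{1}{q}} \le \|f\|$$
Since the positive integer $n$ in the above inequality can be arbitrarily large, letting $n \to \infty$ yields
$$\Big(\sum_{k=1}^\infty|\eta_k|^q\Big)^{\frac{1}{q}} \le \|f\| \Rightarrow y = \{\eta_k\} \in \ell_q \text{ and } \|y\|_{\ell_q} \le \|f\|$$
To establish the equality $\|y\|_{\ell_q} = \big(\sum_{k=1}^\infty|\eta_k|^q\big)^{\frac{1}{q}} = \|f\|$, it is necessary to show that $\big(\sum_{k=1}^\infty|\eta_k|^q\big)^{\frac{1}{q}} \ge \|f\|$. By the Hölder inequality, it follows that
$$|f(x)| = \Big|\sum_{j=1}^\infty\xi_jf(e_j)\Big| = \Big|\sum_{j=1}^\infty\xi_j\eta_j\Big| \le \|x\|_{\ell_p}\|y\|_{\ell_q}$$
Therefore, $\frac{|f(x)|}{\|x\|_{\ell_p}} \le \|y\|_{\ell_q}$ $\forall x \neq 0 \Rightarrow \|f\| \le \|y\|_{\ell_q}$. Hence, by combining the inequalities, it follows that $\|f\| = \|y\|_{\ell_q}$. The mapping $\ell_q \to \ell_p^\star$ defined by $y \mapsto f$ (with $f(x) = \sum_k\xi_k\eta_k$) is linear and surjective, and the linear span of the vectors in the Schauder basis $\{e_k\}$ is dense in $\ell_p$; furthermore, this mapping is norm-preserving. Therefore, the dual space $\ell_p^\star$ is isometrically isomorphic to $\ell_q$.

Theorem 2.4. (Dual space of $\ell_1$) The dual space $\ell_1^\star$ is isometrically isomorphic to $\ell_\infty$.

Proof. Let $\{e_k\}$ be a Schauder basis for $\ell_1$, where $e_k \triangleq \{\delta_{kj}\}_{j \in \mathbb{N}}$. Then, every $x \in \ell_1$ has a unique representation $x = \sum_{k=1}^\infty\xi_ke_k$, where $x \triangleq \{\xi_1\ \xi_2\ \xi_3 \cdots\}$. Let $f \in \ell_1^\star$. Since $f$ is linear and bounded, it follows that
$$f(x) = f\Big(\sum_{k=1}^\infty\xi_ke_k\Big) = \sum_{k=1}^\infty\xi_kf(e_k) = \sum_{k=1}^\infty\xi_k\eta_k$$
by defining $\eta_k \triangleq f(e_k)$, which are uniquely determined by $f$. Let us denote $y \triangleq \{\eta_1\ \eta_2\ \eta_3 \cdots\}$. Since $\|e_k\|_{\ell_1} = 1$, it follows that
$$|\eta_k| = |f(e_k)| \le \|f\| \Rightarrow \sup_k|\eta_k| \le \|f\| \text{ and } y \in \ell_\infty$$
To establish the equality $\|y\|_{\ell_\infty} = \sup_k|\eta_k| = \|f\|$, it is necessary to show that $\sup_k|\eta_k| \ge \|f\|$.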
To this end, note that
$$|f(x)| = \Big|\sum_{k=1}^\infty\xi_k\eta_k\Big| \le \|x\|_{\ell_1}\sup_k|\eta_k|$$
Therefore, $\frac{|f(x)|}{\|x\|_{\ell_1}} \le \sup_k|\eta_k| = \|y\|_{\ell_\infty}$ $\forall x \neq 0$, which implies $\|f\| \le \|y\|_{\ell_\infty}$. Hence, by combining the inequalities, it follows that $\|f\| = \|y\|_{\ell_\infty}$. The mapping $\ell_\infty \to \ell_1^\star$ defined by $y \mapsto f$ is linear and surjective, and the linear span of the vectors in the Schauder basis $\{e_k\}$ is dense in $\ell_1$; furthermore, this mapping is norm-preserving. Therefore, the dual space $\ell_1^\star$ is isometrically isomorphic to $\ell_\infty$.

Theorem 2.5. (Dual space of $c_0$) The dual space $c_0^\star$ is isometrically isomorphic to $\ell_1$.

Proof. It is known that $c_0$ is a closed subspace of the complete space $\ell_\infty$, so $c_0$ is complete relative to the metric induced by the norm $\|\bullet\|_{\ell_\infty}$; moreover, the sequences $e_k \triangleq \{\delta_{kj}\}_{j \in \mathbb{N}}$ form a Schauder basis for $c_0$. Let $f \in c_0^\star$. Since $f$ is linear and bounded, it follows that
$$f(x) = f\Big(\sum_{k=1}^\infty\xi_ke_k\Big) = \sum_{k=1}^\infty\xi_kf(e_k) = \sum_{k=1}^\infty\xi_k\eta_k$$
by defining $\eta_k \triangleq f(e_k)$, which are uniquely determined by $f$. Let us denote $y \triangleq \{\eta_1\ \eta_2\ \eta_3 \cdots\}$. For each $n \in \mathbb{N}$, let the vector $z \in c_0$ have all zero coordinates after the $n$th and, for $j = 1, 2, \cdots, n$, let $z_j = \frac{|\eta_j|}{\eta_j}$ if $\eta_j \neq 0$ and $z_j = 0$ if $\eta_j = 0$. Then, $\|z\|_{\ell_\infty} \le 1$ and
$$\sum_{k=1}^n|\eta_k| = f(z) \le \|f\|$$
Since $n$ is arbitrary, letting $n \to \infty$ yields $y = \{\eta_k\} \in \ell_1$ and $\|y\|_{\ell_1} \le \|f\|$. Conversely,
$$\|f\| \triangleq \sup_{\|x\| = 1}\Big|\sum_{k=1}^\infty\eta_k\xi_k\Big| \le \sum_{k=1}^\infty|\eta_k| = \|y\|_{\ell_1}$$
Hence, the equality $\|f\| = \|y\|_{\ell_1}$ is established (it is trivial if $y = 0_{\ell_1}$, which implies $f = 0_{c_0^\star}$). Bijectivity between $\ell_1$ and $c_0^\star$ is established in the same way as between $\ell_q$ and $\ell_p^\star$ in Theorem 2.3.

Remark 2.2. The dual space $\ell_\infty^\star$ of $\ell_\infty$ is a very abstract object that is rarely encountered in engineering; it may occasionally come up in analytic number theory. Note that $\ell_\infty^\star$ is not isometrically isomorphic to $\ell_1$.

Theorem 2.6. (Riesz-Fréchet Theorem, also called the Riesz Representation Theorem) Let $H$ be a Hilbert space over the (complete) field $\mathbb{F}$ and $H^\star$ be its dual space.
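Then, every vector in $H^\star$ uniquely identifies a vector in $H$, i.e., $\forall f \in H^\star$ $\exists$ a unique $y \in H$ such that $f(x) = \langle x, y\rangle_H$ $\forall x \in H$, and $\|f\|_{ind} = \|y\|_H$ (see Naylor & Sell, p. 345).

Proof. If $f = 0_{H^\star}$, i.e., $f(x) = 0$ $\forall x \in H$, then $y = 0_H$ satisfies $f(x) = \langle x, y\rangle_H$ $\forall x \in H$. Therefore, we assume $f \neq 0_{H^\star}$, i.e., $\exists x \in H$ such that $f(x) \neq 0$. Then, the null space $N(f) \triangleq \{x \in H : f(x) = 0\}$ is a proper closed subspace of $H$ (closed by Corollary 1.3, because $f$ is bounded), so that, by the projection theorem, $N(f) \oplus N^\perp(f) = H$. Moreover, $\dim N^\perp(f) = 1$, because $f$ restricted to $N^\perp(f)$ is injective (its null space there is $N(f) \cap N^\perp(f) = \{0_H\}$) and takes values in the one-dimensional space $\mathbb{F}$. Now, let $z \in N^\perp(f)$ with $z \neq 0_H$, normalized so that $f(z) = 1$ (possible because $f(z) \neq 0$). For any $x \in H$, $f\big(x - f(x)z\big) = f(x) - f(x)f(z) = 0$, so $x - f(x)z \in N(f)$ and $f(x)z$ is the orthogonal projection of $x$ onto the one-dimensional space $N^\perp(f)$. Therefore,
$$\big\langle x - f(x)z,\, z\big\rangle_H = 0 \Rightarrow f(x) = \frac{\langle x, z\rangle_H}{\|z\|_H^2}$$
In other words, the projection of $x$ onto $N^\perp(f)$ is $f(x)z = \langle x, u\rangle_Hu$, where $u \triangleq \frac{z}{\|z\|_H}$ is a unit vector spanning the one-dimensional space $N^\perp(f)$. Notice that the vector $u$ is independent of the choice of $x$; however, $u$ depends on the choice of $f$. By setting $y \triangleq \frac{z}{\|z\|_H^2}$, we have $f(x) = \langle x, y\rangle_H$ $\forall x \in H$. Thus, existence of $y \in H$ such that $f(x) = \langle x, y\rangle_H$ $\forall x \in H$ is established.

To show uniqueness of $y$, let there exist $\tilde y \in H$ such that $f(x) = \langle x, \tilde y\rangle_H$ $\forall x \in H$. Then, $\langle x, y\rangle_H - \langle x, \tilde y\rangle_H = f(x) - f(x) = 0 \Rightarrow \langle x, y - \tilde y\rangle_H = 0$ $\forall x \in H$; choosing $x = y - \tilde y$ yields $\|y - \tilde y\|_H^2 = 0 \Rightarrow y = \tilde y$. Thus, uniqueness of $y \in H$ is established.

Finally, to show that $\|f\|_{ind} = \|y\|_H$, we proceed as follows:
$$\|y\|_H^2 = \langle y, y\rangle_H = f(y) = |f(y)| \le \|f\|_{ind}\|y\|_H \Rightarrow \|f\|_{ind} \ge \|y\|_H$$
where the trivial case of $y = 0_H$ is excluded, for which $\|f\|_{ind} = 0$ and $\|y\|_H = 0$. By the Cauchy-Schwarz inequality, it follows that
$$|f(x)| = |\langle x, y\rangle_H| \le \|x\|_H\|y\|_H \ \forall x \in H \Rightarrow \sup_{\|x\|_H \neq 0}\frac{|f(x)|}{\|x\|_H} \le \|y\|_H \Rightarrow \|f\|_{ind} \le \|y\|_H$$
Hence, $\|f\|_{ind} = \|y\|_H$.

3 Hahn-Banach Theorem

The Hahn-Banach theorem concerns the extension of linear functionals and has many applications. It allows manipulation of normed spaces and their bounded linear functionals, and also provides an adequate theory of dual spaces.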
Specifically, it states that a bounded linear functional on a subspace of a normed vector space can be extended to a bounded linear functional on the entire space with the same norm.

3.1 Zorn's Lemma

Zorn's Lemma is necessary for proving the Hahn-Banach Theorem, and it also has other applications. We introduce the concept of Zorn's Lemma.

Definition 3.1. A partially ordered set, abbreviated as poset, is a set $S$ on which a binary relation, known as partial ordering and denoted as $\preccurlyeq$, satisfies the following conditions for every $\alpha, \beta, \gamma \in S$:
$\alpha \preccurlyeq \alpha$ (Reflexivity)
If $\alpha \preccurlyeq \beta$ and $\beta \preccurlyeq \alpha$, then $\alpha = \beta$ (Antisymmetry)
If $\alpha \preccurlyeq \beta$ and $\beta \preccurlyeq \gamma$, then $\alpha \preccurlyeq \gamma$ (Transitivity)

Definition 3.2. Two elements $\alpha$ and $\beta$ of a partially ordered set are called comparable if they satisfy the condition $\alpha \preccurlyeq \beta$ or $\beta \preccurlyeq \alpha$ or both; two elements are called incomparable if neither $\alpha \preccurlyeq \beta$ nor $\beta \preccurlyeq \alpha$ holds.

Definition 3.3. A totally ordered (also called linearly ordered) set, or a chain, is a partially ordered set in which every pair of elements is comparable. In other words, a chain is a partially ordered set having no incomparable elements.

Definition 3.4. Let $(P, \preccurlyeq)$ be a partially ordered set. Then, $Q$ is a maximal totally ordered subset of $P$ if (i) $Q \subseteq P$, (ii) $(Q, \preccurlyeq)$ is totally ordered, and (iii) if any member of $P$ not in $Q$ is adjoined to $Q$, then the resulting set is no longer totally ordered by $\preccurlyeq$.

Remark 3.1. Every single-element subset of a nonempty partially ordered set is totally ordered.

Definition 3.5. Let $S$ be a partially ordered set. An upper bound of $W \subseteq S$ is an element $\alpha \in S$ such that
$$\theta \preccurlyeq \alpha \quad \forall \theta \in W \tag{1}$$
A lower bound of $W \subseteq S$ is an element $\beta \in S$ such that
$$\beta \preccurlyeq \theta \quad \forall \theta \in W \tag{2}$$
Depending on $S$ and $W$, an upper bound or a lower bound of $W$ may or may not exist.

Definition 3.6. Let $(S, \preccurlyeq)$ be a partially ordered set. An element $\alpha \in S$ is called a maximal element of $S$ if $\theta \preccurlyeq \alpha$ for every $\theta \in S$ that is comparable to $\alpha$.
In other words,
$$\text{If } \theta \in S, \text{ then } (\alpha \preccurlyeq \theta) \Rightarrow (\alpha = \theta) \tag{3}$$
Similarly, a minimal element of $S$ is an element $\beta \in S$ such that
$$\text{If } \theta \in S, \text{ then } (\theta \preccurlyeq \beta) \Rightarrow (\beta = \theta) \tag{4}$$
A partially ordered set $S$ may or may not have a maximal element or a minimal element. Furthermore, a maximal element need not be an upper bound. Similarly, a minimal element need not be a lower bound.

Example 3.1. Let $S = (0, 1) \subset \mathbb{R}$; then, $(S, \le)$ is a totally ordered set that has no maximal element and no minimal element. However, $1 \in \mathbb{R}$ is an upper bound of $S$; similarly, $0 \in \mathbb{R}$ is a lower bound of $S$. As a matter of fact, 1 is the least upper bound of $S$ and 0 is the greatest lower bound of $S$.

Example 3.2. Let $S$ be the set of all points $(x, y)$ in the plane $\mathbb{R}^2$ with $y \le 0$. Let us define an ordering $\preccurlyeq$ on $S$ as
$$(x, y) \preccurlyeq (\tilde x, \tilde y) \Leftrightarrow (x = \tilde x) \wedge (y \le \tilde y)$$
Then, the partially ordered set $(S, \preccurlyeq)$ has infinitely many maximal elements, namely the points $(x, 0)$ with $x \in \mathbb{R}$.

Zorn's Lemma: Let $S \neq \emptyset$ be a partially ordered set such that every chain $T \subseteq S$ has an upper bound. Then, $S$ has at least one maximal element.

Hausdorff Maximality Theorem: Every (nonempty) partially ordered set contains a maximal totally ordered subset. In other words, if $S$ is a maximal totally ordered subset of a (nonempty) partially ordered set $X$ and if $T$ is a totally ordered subset of $X$, then $S \subseteq T \subseteq X \Rightarrow S = T$.

Axiom of Choice: Let $I \neq \emptyset$ be an index set and let $\{S_\alpha : \alpha \in I\}$ be a family of nonempty subsets of a set $S$. Then, there exists a mapping, called the choice function, $f : I \to S$ such that $f(\alpha) \in S_\alpha$ $\forall \alpha \in I$. That is, for every family of nonempty sets, there exists a choice function. The axiom of choice can also be stated as: the product of a family of nonempty sets indexed by a nonempty set is nonempty.

Remark 3.2. Zorn's Lemma and the Hausdorff Maximality Theorem are equivalent, and they are also equivalent to the Axiom of Choice. For details, see the Appendix, pp. 392-393, on the Hausdorff Maximality Theorem in Real and Complex Analysis by Rudin, and p. 13 in Algebra by Thomas Hungerford.

Let us illustrate a simple application of Zorn's lemma.
We first make the following assertions:
• $V$ is a vector space and $A$ is a set of linearly independent vectors belonging to $V$.
• $X$ is the collection of all linearly independent sets of vectors in $V$ such that $A$ is a subset of each member of $X$; note that $X \neq \emptyset$ because $A \in X$.
• $\subseteq$ is a partial ordering on $X$.
• $H$ denotes a Hamel basis of $V$ such that $A \subseteq H$, whose existence is to be shown.
• $I$ is a nonempty index set and $Y = \{B_i : i \in I\}$ is a chain of $X$.
• $B = \bigcup_{i \in I}B_i$.
Any two sets in the chain $Y$ are nested (i.e., $B_i \subseteq B_j$ or $B_j \subseteq B_i$), so $Y$ has the upper bound $B$. Moreover, $B \in X$ because any finitely many vectors of $B$ lie in a single $B_i$ and are therefore linearly independent. Since the chain $Y$ is arbitrary, $X$ has a maximal element $H$ by Zorn's lemma. Finally, $H$ spans $V$ (otherwise, a vector of $V$ outside the span of $H$ could be adjoined to $H$, contradicting maximality), so $H$ is a Hamel basis of $V$ with $A \subseteq H$.

3.2 Extension of Linear Functionals

In the Hahn-Banach theorem, the objective is to extend a linear functional $f$, which is defined on a subspace $U$ of a vector space $V$ and has a certain boundedness property.

Definition 3.7. (Extension of a linear functional) Let $f$ be a linear functional on a proper subspace $U$ of a vector space $V$ over the real field $\mathbb{R}$. A linear functional $f_{ext}$ on another subspace $W \subseteq V$ is called an extension of $f$ from $U$ to $W$ if
• $U$ is a proper subspace of $W$, i.e., $U \subsetneq W$.
• $f_{ext}(x) = f(x)$ $\forall x \in U$.

Definition 3.8. (Sublinear functional) Let $V$ be a vector space over the real field $\mathbb{R}$, and let $p : V \to \mathbb{R}$. Then, $p$ is called a sublinear functional on $V$ if it has the following two properties:
• Subadditive: $p(x + y) \le p(x) + p(y)$ $\forall x, y \in V$
• Positive homogeneous: $p(\alpha x) = \alpha p(x)$ $\forall \alpha \in [0, \infty)$ $\forall x \in V$

Remark 3.3. A norm on a vector space is a sublinear functional.

Theorem 3.1. (Hahn-Banach Theorem: Extension of Linear Functionals) Let $V$ be a vector space over the real field $\mathbb{R}$, and let $p$ be a sublinear functional on $V$. Let $f$ be a linear functional on a subspace $U \subset V$. If $f(x) \le p(x)$ $\forall x \in U$, then there exists an extension $f_{ext}$ of $f$ from $U$ to $V$ such that $f_{ext}(x) \le p(x)$ $\forall x \in V$.

Proof.
The theorem is proved in the following three steps:
• Step 1: Let us construct the set $E$ consisting of the linear functional $f$ and all linear extensions $g$ of $f$ that satisfy the relation $g(x) \le p(x)$ on the domain $D(g)$. The set $E$ is partially ordered, and Zorn's lemma yields a maximal element $f_{ext} \in E$.
• Step 2: The linear functional $f_{ext}$ is defined on the entire space $V$.
• Step 3: The relation $f_{ext}(x) \le p(x)$ $\forall x \in V$ is established.

Step 1: $E$ is nonempty because $f \in E$. Let us define a partial ordering on $E$ as: $g \preccurlyeq h \Leftrightarrow h$ is an extension of $g$ (or $h = g$), that is, $D(g) \subseteq D(h)$ and $h(x) = g(x)$ $\forall x \in D(g)$. For any chain $H \subseteq E$, let us define a linear functional $\tilde g$ as
$$D(\tilde g) = \bigcup_{g \in H}D(g) \quad \text{and} \quad \tilde g(x) = g(x) \text{ if } x \in D(g) \tag{5}$$
Note that, for an $x \in D(g_1) \cap D(g_2)$ with $g_1, g_2 \in H$, we have $g_1(x) = g_2(x)$ because $H$ is a chain, so that $g_1 \preccurlyeq g_2$ or $g_2 \preccurlyeq g_1$; hence, $\tilde g$ is well defined, $D(\tilde g)$ is a subspace, and $\tilde g \in E$. Then, $g \preccurlyeq \tilde g$ for all $g \in H$. Hence, $H$ has an upper bound. Since the selection of $H \subseteq E$ is arbitrary, Zorn's lemma implies that $E$ has a maximal element; let us call this maximal element $f_{ext}$. By definition, $f_{ext}$ is a linear extension of $f$ that satisfies the condition
$$f_{ext}(x) \le p(x) \quad \forall x \in D(f_{ext}) \tag{6}$$

Step 2: Now we prove, by contradiction, that $D(f_{ext}) = V$. Let us assume that the assertion is false, i.e., $D(f_{ext})$ is a proper subspace of $V$. Then, there exists $z \in V \setminus D(f_{ext})$, and $z \neq 0$ because $0 \in D(f_{ext})$. Let the subspace $W$ be spanned by $D(f_{ext})$ and the vector $z$. Thus, any $x \in W$ can be expressed as
$$x = y + \alpha z \quad \text{where } y \in D(f_{ext}) \text{ and } \alpha \text{ is a scalar} \tag{7}$$
This representation is unique because $y \in D(f_{ext})$ and $z \notin D(f_{ext})$. A linear functional $g$ on $W$ is defined by
$$g(y + \alpha z) = f_{ext}(y) + \alpha c \quad \text{where } g(z) = c \in \mathbb{R} \tag{8}$$
Note that $g$ is a proper extension of $f_{ext}$: setting $\alpha = 0$ gives $g(y) = f_{ext}(y)$ $\forall y \in D(f_{ext})$, and $D(f_{ext})$ is a proper subset of $D(g)$ because $z \in D(g)$.
Consequently, if it is proven that $g \in E$ by showing that $g(x) \le p(x)$ $\forall x \in D(g)$ for a suitable $c$, then this will contradict the maximality of $f_{ext}$, so the assumption $D(f_{ext}) \neq V$ is false, i.e., the truth of the statement $D(f_{ext}) = V$ is established.

Step 3: We show that $g$, with a suitable real constant $c$ in Eq. (8), satisfies the condition $g(x) \le p(x)$ $\forall x \in D(g)$. To avoid a clash with the running variable $z$ below, let $w$ denote the vector adjoined in Step 2 (called $z$ in Eqs. (7) and (8)), so that every $x \in D(g)$ is written as $x = y + \alpha w$ and $g(x) = f_{ext}(y) + \alpha c$. Let $y, z \in D(f_{ext})$. Since $p$ is subadditive and $f_{ext} \le p$ on $D(f_{ext})$,
$$f_{ext}(y) - f_{ext}(z) = f_{ext}(y - z) \le p(y - z) = p(y + w - w - z) \le p(y + w) + p(-w - z) \tag{9}$$
Taking the last term to the left and the term $f_{ext}(y)$ to the right in Eq. (9), we have
$$-p(-w - z) - f_{ext}(z) \le p(y + w) - f_{ext}(y) \tag{10}$$
Since $y$ does not appear on the left and $z$ does not appear on the right, the inequality in Eq. (10) continues to hold if the supremum $m$ is taken over $z \in D(f_{ext})$ on the left and the infimum $M$ over $y \in D(f_{ext})$ on the right, so that $m \le M$. Therefore, with the constant $c$ in Eq. (8) chosen in the closed interval $[m, M]$, it follows from Eq. (10) that
$$-p(-w - z) - f_{ext}(z) \le c \quad \forall z \in D(f_{ext}) \tag{11}$$
$$c \le p(y + w) - f_{ext}(y) \quad \forall y \in D(f_{ext}) \tag{12}$$
For $\alpha = 0$, we already have $x \in D(f_{ext})$, and $g(x) = f_{ext}(x) \le p(x)$ by Eq. (6). Let us first prove $g(x) \le p(x)$ for $\alpha < 0$. Replacing $z$ in Eq. (11) by $\alpha^{-1}y$ and multiplying both sides by the positive quantity $-\alpha$ yields
$$\alpha p(-w - \alpha^{-1}y) + f_{ext}(y) \le -\alpha c \tag{13}$$
From Eqs. (8) and (13), using $x = y + \alpha w$ and the positive homogeneity of $p$, it follows that
$$g(x) = f_{ext}(y) + \alpha c \le -\alpha p(-w - \alpha^{-1}y) = p(\alpha w + y) = p(x) \tag{14}$$
For $\alpha > 0$, let us replace $y$ in Eq. (12) by $\alpha^{-1}y$ to obtain
$$c \le p(\alpha^{-1}y + w) - f_{ext}(\alpha^{-1}y) \tag{15}$$
Multiplication of Eq. (15) by $\alpha$ yields
$$\alpha c \le \alpha p(\alpha^{-1}y + w) - \alpha f_{ext}(\alpha^{-1}y) = p(x) - f_{ext}(y) \tag{16}$$
A combination of Eq. (16) with Eq. (8) yields
$$g(x) = f_{ext}(y) + \alpha c \le p(x) \tag{17}$$

Remark 3.4. The above derivation of the Hahn-Banach theorem requires neither continuity (i.e., boundedness) of the linear functionals nor separability (i.e., a countable dense subset) of the vector space.
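In finite dimensions, the one-step extension of Steps 2 and 3 can be carried out explicitly, with no appeal to Zorn's lemma. The Python sketch below is an illustration under assumed toy data that do not appear in the notes:
- $V = \mathbb{R}^2$ and $p$ is the Euclidean norm;
- $D(f_{ext}) = \text{span}\{e_1\}$ with $f_{ext}(te_1) = 0.6t$;
- the adjoined vector is $w = e_2$.

Because $p$ is sublinear, and hence convex, the quantity $p(y + w) - f_{ext}(y)$ on the right of Eq. (10) is convex in $y$. A ternary search (a numerical technique swapped in here, not taken from the notes) therefore approximates $M$, and likewise $m$. The helper names `ternary_min`, `g`, and `dominated_by_p` are hypothetical. Random sampling then checks that $g$ from Eq. (8) satisfies $g \le p$ for $c \in [m, M]$, but not for a $c$ outside that interval.

```python
import math
import random

# Assumed toy data (not from the notes): V = R^2, p = Euclidean norm,
# D(f_ext) = span{e1} with f_ext(t*e1) = A*t, and adjoined vector w = e2.
A = 0.6


def p(x):
    """Sublinear functional p on R^2 (here the Euclidean norm)."""
    return math.hypot(x[0], x[1])


def f_ext(t):
    """The functional being extended, evaluated at t*e1."""
    return A * t


def ternary_min(h, lo=-1e3, hi=1e3, iters=200):
    """Approximate the minimum of a convex function h on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) < h(m2):
            hi = m2
        else:
            lo = m1
    return h((lo + hi) / 2.0)


# M = inf_y [p(y + w) - f_ext(y)] over y = t*e1 (right side of Eq. (10)).
M = ternary_min(lambda t: p((t, 1.0)) - f_ext(t))
# m = sup_z [-p(-w - z) - f_ext(z)] over z = t*e1 (left side of Eq. (10)).
m = -ternary_min(lambda t: p((-t, -1.0)) + f_ext(t))


def g(x, c):
    """One-step extension of Eq. (8): g(t*e1 + s*w) = f_ext(t) + s*c."""
    t, s = x  # valid because e1 and w = e2 are the standard basis
    return f_ext(t) + s * c


def dominated_by_p(c, trials=20000, seed=0):
    """Sample random points of R^2 and check g(x, c) <= p(x)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))
        if g(x, c) > p(x) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    print(m, M)  # approximately -0.8 and 0.8 for A = 0.6
    for c in (m, 0.0, M):
        assert dominated_by_p(c)
    assert not dominated_by_p(M + 0.1)  # c outside [m, M] breaks g <= p
```

For these data, $m = -0.8$ and $M = 0.8$. The admissible extensions are therefore $g(x) = 0.6\xi_1 + c\,\xi_2$ with $|c| \le 0.8$, which are exactly the extensions of Euclidean norm at most one.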
In some restricted cases (e.g., finite-dimensional spaces and separable Hilbert spaces), it is possible to prove the Hahn-Banach Theorem without using Zorn's Lemma, as seen in Chapter 5, p. 111 of Optimization by Vector Space Methods by Luenberger. Neither of these two restrictions, namely, boundedness of the linear functionals and separability of the vector space, is a serious limitation in many engineering applications; for example, Sobolev spaces, which form the backbone of finite-element analysis, are separable Hilbert spaces. Therefore, in many engineering applications, it is possible to apply the Hahn-Banach theorem to separable normed spaces (e.g., $\ell_1$) without invoking Zorn's lemma, although their dual spaces (e.g., $\ell_\infty$) may be nonseparable. The steps of such an analysis are briefly outlined below.

Let $\{x_1, x_2, \cdots, x_n, \cdots\}$ be a countable dense set in $V$; let the sublinear functional $p$ be bounded (and hence continuous), and let $f$ be a bounded linear functional on a subspace $U$ of $V$ such that $f(x) \le p(x)$ $\forall x \in U$. From this set of vectors, let us select, one at a time, a subset of linearly independent vectors $\{y_1, y_2, \cdots, y_n, \cdots\}$ that is linearly independent of the subspace $U \subset V$. The set of vectors $\{y_1, y_2, \cdots, y_n, \cdots\}$ together with the subspace $U$ generates a dense subspace $S$, i.e., $\overline{S} = V$. Now, the functional $f$ on $U$ can be extended to a functional $g$ on the subspace $S$ by extending $f$ from $U$ to $[U + y_1]$, then to $[[U + y_1] + y_2]$, and so on, each step being a one-step extension as in Steps 2 and 3 of the proof of Theorem 3.1. Finally, the resulting functional $g$ (which is bounded because $p$ is) can be extended from the dense subspace $S$ to the space $V$. To see this extension, let $z \in V$ and let $\{s_n\}$ be a sequence of vectors in $S$ converging to $z$. Let $F(z) \triangleq \lim_{n\to\infty}g(s_n)$, where the limit exists because $g$ is bounded and $\{s_n\}$ is a Cauchy sequence. Notice that $F$ is linear and
$$F(z) = \lim_{n\to\infty}g(s_n) \le \lim_{n\to\infty}p(s_n) = p(z)$$
and thus $F(z) \le p(z)$ on $V$.

Theorem 3.2.
(Hahn-Banach Theorem: Generalization) Let $V$ be a vector space over the real field $\mathbb{R}$ or the complex field $\mathbb{C}$, and let $p$ be a real-valued functional on $V$ having the following two properties:
• Subadditive: $p(x + y) \le p(x) + p(y)$ $\forall x, y \in V$
• Absolutely homogeneous: $p(\alpha x) = |\alpha|p(x)$ $\forall \alpha \in \mathbb{F}$ $\forall x \in V$
Let $f$ be a linear functional on a subspace $U \subset V$. If $|f(x)| \le p(x)$ $\forall x \in U$, then there exists an extension $f_{ext}$ of $f$ from $U$ to $V$ such that $|f_{ext}(x)| \le p(x)$ $\forall x \in V$.

Proof. If $V$ is a vector space over $\mathbb{R}$, the proof is identical to that of Theorem 3.1 (such a $p$ is sublinear), which yields $f_{ext}(x) \le p(x)$; moreover, $-f_{ext}(x) = f_{ext}(-x) \le p(-x) = p(x)$, so $|f_{ext}(x)| \le p(x)$ $\forall x \in V$. If $V$ is a vector space over $\mathbb{C}$, then the functional $f$ is complex-valued and is split into real and imaginary parts as
$$f(x) = f^{real}(x) + i f^{imag}(x) \tag{18}$$
where both $f^{real}$ and $f^{imag}$ are real-valued. We note that
$$f^{real}(x) \le |f(x)| \le p(x) \quad \forall x \in U$$
Regarding $V$ as a vector space over $\mathbb{R}$, it follows from Theorem 3.1 that there exists a real-linear extension $f^{real}_{ext}$ of $f^{real}$ from $U$ to $V$ that satisfies the condition
$$f^{real}_{ext}(x) \le p(x) \quad \forall x \in V$$
Equating the real and imaginary parts of the equation
$$i\big[f^{real}(x) + if^{imag}(x)\big] = if(x) = f(ix) = f^{real}(ix) + if^{imag}(ix) \quad \forall x \in U$$
we have
$$f^{imag}(x) = -f^{real}(ix) \quad \forall x \in U \tag{19}$$
Let us define
$$f_{ext}(x) = f^{real}_{ext}(x) - if^{real}_{ext}(ix) \quad \forall x \in V \tag{20}$$
It follows from Eqs. (19) and (20) that $f_{ext}(x) = f(x)$ on $U$, i.e., $f_{ext}$ is an extension of $f$ from $U$ to $V$. It remains to prove the following tasks:
1. $f_{ext}$ is a linear functional on the complex vector space $V$.
2. $|f_{ext}(x)| \le p(x)$ $\forall x \in V$.
Task 1 holds from the fact that, for any complex scalar $a + ib$, the following relation holds based on Eq. (20):
$$\begin{aligned} f_{ext}\big((a + ib)x\big) &= f^{real}_{ext}(ax + ibx) - if^{real}_{ext}(iax - bx)\\ &= af^{real}_{ext}(x) + bf^{real}_{ext}(ix) - i\big[af^{real}_{ext}(ix) - bf^{real}_{ext}(x)\big]\\ &= (a + ib)\big[f^{real}_{ext}(x) - if^{real}_{ext}(ix)\big]\\ &= (a + ib)f_{ext}(x)\end{aligned}$$
Now we prove Task 2. Note that $p(x) \ge 0$ $\forall x \in V$, because $0 = p(0) \le p(x) + p(-x) = 2p(x)$; hence, Task 2 holds trivially if $f_{ext}(x) = 0$. Let $x \in V$ be such that $f_{ext}(x) \neq 0$. Using the polar notation, $f_{ext}(x) = |f_{ext}(x)|\exp(i\theta) \Rightarrow |f_{ext}(x)| = \exp(-i\theta)f_{ext}(x)$.
Since $|f_{ext}(x)|$ is real, the number $f_{ext}(\exp(-i\theta)x) = \exp(-i\theta) f_{ext}(x)$ equals its own real part, which by Eq. (20) is $f^{real}_{ext}(\exp(-i\theta)x)$. The absolute homogeneity of $p$ then yields
$$|f_{ext}(x)| = f_{ext}(\exp(-i\theta)x) = f^{real}_{ext}(\exp(-i\theta)x) \le p(\exp(-i\theta)x) = |\exp(-i\theta)|\, p(x) = p(x)$$
The proof is thus complete. Further details are available in Real and Complex Analysis by Rudin (see Chapter 5, p. 105).

4 Applications of Hahn-Banach Theorem to Bounded Linear Functionals

Theorem 4.1. (Hahn-Banach Theorem: Normed Spaces) Let $f$ be a bounded linear functional on a subspace $U$ of a normed vector space $V$ over the real field $\mathbb{R}$ or the complex field $\mathbb{C}$. Then, there exists a bounded linear functional $f_{ext}$ on $V$ that is an extension of $f$ to $V$ and has the same norm, i.e.,
$$\|f_{ext}\|_V = \|f\|_U \tag{21}$$
Proof. If $U = \{0\}$, then $f = 0$ and consequently $f_{ext} = 0$. Let $f \ne 0$. Since we will use Theorem 3.2 to prove this theorem, we must first find an appropriate functional $p$. We select
$$p(x) \triangleq \|f\|_U\, \|x\| \quad \forall x \in V,$$
which is subadditive by the triangle inequality and absolutely homogeneous (see Remark 3.3), and which satisfies $|f(x)| \le \|f\|_U \|x\| = p(x)\ \forall x \in U$. Using Theorem 3.2, it follows that there exists a linear functional $f_{ext}$, which is an extension of $f$ and satisfies
$$|f_{ext}(x)| \le p(x) = \|f\|_U \|x\| \quad \forall x \in V$$
Taking the supremum over all unit-norm $x \in V$, we obtain the inequality
$$\|f_{ext}\|_V = \sup_{\|x\|=1} |f_{ext}(x)| \le \|f\|_U \tag{22}$$
Conversely, since $f_{ext} = f$ on $U$, the supremum defining $\|f_{ext}\|_V$ is taken over a larger set than the one defining $\|f\|_U$, so the norm cannot decrease under extension:
$$\|f_{ext}\|_V \ge \|f\|_U \tag{23}$$
A combination of Eqs. (22) and (23) proves the theorem.

Corollary 4.1. Let $V$ be a normed space and let $x_0 \ne 0$ be an arbitrary vector in $V$. Then, there exists a bounded linear functional $g$ on $V$ such that $\|g\| = 1$ and $g(x_0) = \|x_0\|_V$.
Proof. Let $U$ be the subspace spanned by the vector $x_0$. Let us define a linear functional $f$ on $U$ as $f(\alpha x_0) \triangleq \alpha \|x_0\|$, where $\alpha$ is a scalar. Then, $f$ is bounded and $\|f\| = 1$ because, if $x = \alpha x_0$, then
$$|f(x)| = |f(\alpha x_0)| = |\alpha|\, \|x_0\| = \|\alpha x_0\| = \|x\|$$
Then, Theorem 4.1 implies that $f$ has a bounded linear extension $g \triangleq f_{ext}$ from $U$ to $V$ with $\|g\| = \|f\| = 1$ and $g(x_0) = f(x_0) = \|x_0\|$.

Corollary 4.2.
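Let $V$ be a normed vector space with dual space $V^*$. Then, every $x \in V$ has the following property:
$$\|x\|_V = \sup_{f \in V^*,\ \|f\|=1} |f(x)| \tag{24}$$
and if $x_0 \in V$ is such that $f(x_0) = 0\ \forall f \in V^*$, then $x_0 = 0$.
Proof. For $x = 0$, both sides of Eq. (24) vanish. For $x \ne 0$, applying Corollary 4.1 with $x_0$ replaced by $x$ yields a functional $g \in V^*$ with $\|g\| = 1$ and $g(x) = \|x\|$, so that
$$\sup_{\|f\|=1} |f(x)| \ge |g(x)| = \|x\|$$
The reverse inequality follows from $|f(x)| \le \|f\|\,\|x\| = \|x\|$ for every $f$ with $\|f\| = 1$. Finally, if $f(x_0) = 0\ \forall f \in V^*$, then Eq. (24) gives $\|x_0\| = 0$, i.e., $x_0 = 0$.

Lemma 4.1. Let $U$ be a proper closed subspace of a normed vector space $V$. Let $x_0 \in V \setminus U$ be arbitrary, and let $\delta$ denote the distance from $x_0$ to $U$:
$$\delta = \inf_{y \in U} \|y - x_0\| > 0 \tag{25}$$
where $\delta > 0$ because $U$ is closed and $x_0 \notin U$. Then, there exists $f_{ext} \in V^*$ such that
$$\|f_{ext}\| = 1;\quad f_{ext}(y) = 0\ \forall y \in U;\quad \text{and}\quad f_{ext}(x_0) = \delta \tag{26}$$
Proof. Let the subspace $W$ be spanned by $U$ and $x_0$. Since $x_0 \notin U$, every $z \in W$ has a unique representation $z = y + \alpha x_0$ with $y \in U$ and a scalar $\alpha$. Let a linear functional $f$ be defined on $W$ as
$$f(z) = f(y + \alpha x_0) = \alpha \delta \tag{27}$$
We will first show that $f$ satisfies the conditions in Eq. (26) on $W$ and then extend $f$ to $f_{ext}$ on $V$ by Theorem 4.1. Linearity of $f$ is readily seen. Since $\delta > 0$, it follows that $f \ne 0$. It follows from Eq. (27) that $f(y) = 0$ and $f(x_0) = \delta$ by setting $\alpha$ to $0$ and (with $y = 0$) to $1$, respectively.

For $\alpha = 0$, $f(z) = 0$. For $\alpha \ne 0$, since $-\alpha^{-1} y \in U$, it follows from Eq. (25) that
$$|f(z)| = |\alpha|\delta = |\alpha| \inf_{\tilde y \in U} \|\tilde y - x_0\| \le |\alpha|\, \|-\alpha^{-1} y - x_0\| = \|y + \alpha x_0\|$$
Therefore, $|f(z)| \le \|z\|\ \forall z \in W$. Hence, $f$ is bounded and $\|f\| \le 1$. Next we show that $\|f\| \ge 1$. By the definition of infimum, $U$ contains a sequence $\{y_k\}$ such that $\|y_k - x_0\| \to \delta$ as $k \to \infty$. Let $z_k \triangleq y_k - x_0$. Then, $f(z_k) = -\delta$ by setting $\alpha = -1$ in Eq. (27). Furthermore,
$$\|f\| = \sup_{z \in W \setminus\{0\}} \frac{|f(z)|}{\|z\|} \ge \frac{|f(z_k)|}{\|z_k\|} = \frac{\delta}{\|z_k\|} \xrightarrow{k\to\infty} 1$$
Hence, $\|f\| \ge 1$, which implies that $\|f\| = 1$. By Theorem 4.1, $f$ is extended to a functional $f_{ext} \in V^*$ without increasing the norm, i.e., $\|f_{ext}\| = \|f\| = 1$; since $f_{ext} = f$ on $W$, which contains $U$ and $x_0$, Eq. (26) holds.

4.1 Dual Spaces and Separability

Theorem 4.2. (Separability) For a normed vector space $V$, if the dual space $V^*$ is separable, then $V$ itself is separable.
Proof.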
Given that $V^*$ is separable, its unit sphere $U^* \triangleq \{f \in V^* : \|f\| = 1\} \subset V^*$, being a subset of a separable metric space, also contains a countable dense subset, say, $\{f_k : k \in \mathbb{N}\}$, where
$$\|f_k\| = \sup_{\|x\|=1} |f_k(x)| = 1$$
Therefore, by the definition of supremum, there exist unit vectors $x_k \in V$ such that $|f_k(x_k)| \ge 0.5$. Let $W$ be the closure of the space spanned by $\{x_k\}$. Then, $W$ is separable because $W$ has a countable dense subset, namely, the set of all linear combinations of the vectors $x_k$ with rational coefficients (rational real and imaginary parts in the complex case). To show that $W = V$ by contradiction, let us assume that $W \ne V$. Since $W$ is closed, it follows from Lemma 4.1 that there exists $f_{ext} \in V^*$ with $\|f_{ext}\| = 1$ and $f_{ext}(y) = 0\ \forall y \in W$. Since $x_k \in W$, we have $f_{ext}(x_k) = 0\ \forall k$, which implies
$$0.5 \le |f_k(x_k)| = |f_k(x_k) - f_{ext}(x_k)| = |(f_k - f_{ext})(x_k)| \le \|f_k - f_{ext}\|\, \|x_k\| = \|f_k - f_{ext}\|$$
Thus, $\|f_k - f_{ext}\| \ge 0.5$ for every $k$. This is a contradiction because $f_{ext} \in U^*$ (since $\|f_{ext}\| = 1$) and $\{f_k\}$ is dense in $U^*$. Hence, $W = V$, and $V$ is separable.

Corollary 4.3. $\ell_\infty^*$ is not isometrically isomorphic to $\ell_1$.
Proof. Let us assume that $\ell_\infty^*$ is isometrically isomorphic to $\ell_1$. Since $\ell_1$ is separable, so is $\ell_\infty^*$. By Theorem 4.2, $\ell_\infty$ must then be separable. This is a contradiction because $\ell_\infty$ is not separable.

Remark 4.1. $L_\infty^*$ is not isometrically isomorphic to $L_1$ by the same argument as in Corollary 4.3.

Remark 4.2. The converse of Theorem 4.2 is false: for example, $\ell_1$ is separable, whereas its dual space $\ell_1^*$, which is isometrically isomorphic to $\ell_\infty$, is not.

Remark 4.3. The space $c_0$ of all sequences of scalars converging to zero is separable, and $c_0^*$ is isometrically isomorphic to $\ell_1$.

4.2 Bounded Linear Functionals on C[a, b]

This section presents a general representation formula for bounded linear functionals on $C[a,b]$, where $C[a,b]$ is the space of continuous functions on a fixed compact interval $[a,b]$ with the metric defined as
$$d(x, y) = \max_{t \in [a,b]} |x(t) - y(t)| \tag{28}$$

Definition 4.1.
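A function $w$ on $[a,b]$ is defined to be of bounded variation on $[a,b]$ if its total variation is finite, i.e.,
$$\mathrm{Var}(w) = \sup \sum_{j=1}^{n} |w(t_j) - w(t_{j-1})| < \infty \tag{29}$$
where the supremum is taken over all partitions
$$P_n \triangleq \{a = t_0 < t_1 < \cdots < t_n = b\} \quad \text{for all } n \in \mathbb{N} \tag{30}$$

Theorem 4.3. (Riesz Theorem on Functionals) Every bounded linear functional $f$ on $C[a,b]$ can be represented by a Riemann-Stieltjes integral
$$f(x) = \int_a^b x(t)\, dw(t) \tag{31}$$
where $w$ is of bounded variation on $[a,b]$ and has the total variation
$$\mathrm{Var}(w) = \|f\| \tag{32}$$
Proof. Since $C[a,b]$ is a subspace of the space $B[a,b]$ of all bounded functions on $[a,b]$ with the norm
$$\|x\| = \sup_{t \in [a,b]} |x(t)| \tag{33}$$
it follows from Theorem 4.1 that $f$ has an extension $f_{ext}$ from $C[a,b]$ to $B[a,b]$, and the bounded functional $f_{ext}$ has the same norm as $f$, i.e., $\|f_{ext}\| = \|f\|$. The function $w$ in Eq. (31) is defined as
$$w(a) \triangleq 0, \qquad w(t) \triangleq f_{ext}(x_t) \quad \text{for } t \in (a, b]$$
where $x_t \triangleq \chi_{[a,t]}$ is the characteristic function of $[a,t]$, i.e., $x_t = 1$ on $[a,t]$ and $x_t = 0$ elsewhere. For a partition as in Eq. (30), let $x_j \triangleq x_{t_j}$, and write each increment of $w$ in polar notation as $w(t_j) - w(t_{j-1}) = |w(t_j) - w(t_{j-1})| \exp(i\theta_j)$; set $\varepsilon_j \triangleq \exp(-i\theta_j)$ (for real-valued $w$, $\varepsilon_j = \pm 1$ is the sign of the increment), so that $|\varepsilon_j| = 1$. Then, since $w(t_0) = w(a) = 0$, we have
$$\begin{aligned}
\sum_{j=1}^{n} |w(t_j) - w(t_{j-1})| &= |f_{ext}(x_1)| + \sum_{j=2}^{n} |f_{ext}(x_j) - f_{ext}(x_{j-1})|\\
&= \varepsilon_1 f_{ext}(x_1) + \sum_{j=2}^{n} \varepsilon_j \big(f_{ext}(x_j) - f_{ext}(x_{j-1})\big)\\
&= f_{ext}\Big(\varepsilon_1 x_1 + \sum_{j=2}^{n} \varepsilon_j (x_j - x_{j-1})\Big)\\
&\le \|f_{ext}\|\, \Big\|\varepsilon_1 x_1 + \sum_{j=2}^{n} \varepsilon_j (x_j - x_{j-1})\Big\|
\end{aligned}$$
On the right-hand side, $\|f_{ext}\| = \|f\|$, and the other factor $\|\cdots\|$ equals 1 because $|\varepsilon_j| = 1$ and, at each $t \in [a,b]$, exactly one of the functions $x_1, x_2 - x_1, \cdots, x_n - x_{n-1}$ is nonzero, with value 1 there. This is true because of the choice of $x_t$ as the characteristic function $\chi_{[a,t]}$: the functions $x_1$ and $x_j - x_{j-1}$ are the characteristic functions of the disjoint intervals $[a, t_1]$ and $(t_{j-1}, t_j]$, which together cover $[a,b]$. On the left-hand side, we take the supremum over all partitions of $[a,b]$. Then, it follows that
$$\mathrm{Var}(w) \le \|f\| \tag{34}$$
Hence, $w$ is of bounded variation on $[a,b]$.

Next we prove Eq. (31). Let $x \in C[a,b]$. For every partition $P_n$ of the form in Eq. (30), we define the step function
$$z_{P_n} = x(t_0)\, x_1 + \sum_{j=2}^{n} x(t_{j-1})\, (x_j - x_{j-1})$$
which is bounded on $[a,b]$, so that $f_{ext}(z_{P_n})$ is defined. By definition of $w$,
$$\begin{aligned}
f_{ext}(z_{P_n}) &= x(t_0)\, f_{ext}(x_1) + \sum_{j=2}^{n} x(t_{j-1}) \big(f_{ext}(x_j) - f_{ext}(x_{j-1})\big)\\
&= x(t_0)\, w(t_1) + \sum_{j=2}^{n} x(t_{j-1}) \big(w(t_j) - w(t_{j-1})\big)\\
&= \sum_{j=1}^{n} x(t_{j-1}) \big(w(t_j) - w(t_{j-1})\big)
\end{aligned} \tag{35}$$
where the last equality follows from $w(t_0) = w(a) = 0$. By making the partition $P_n$ finer, so that $\max_j (t_j - t_{j-1}) \to 0$ as $n \to \infty$, the sum on the right-hand side of Eq. (35) approaches the integral in Eq. (31). At the same time, $z_{P_n} \to x$ in the norm of Eq. (33) because $x$ is uniformly continuous on $[a,b]$, so $f_{ext}(z_{P_n}) \to f_{ext}(x)$. Since $x \in C[a,b]$, $f_{ext}(x) = f(x)$, and the integral in Eq. (31) is therefore equal to $f(x)$.

Finally, since
$$\Big|\int_a^b x(t)\, dw(t)\Big| \le \max_{t\in[a,b]} |x(t)|\, \mathrm{Var}(w),$$
i.e., $|f(x)| \le \|x\|\, \mathrm{Var}(w)$, taking the supremum over all $x \in C[a,b]$ with $\|x\| = 1$ yields
$$\|f\| \le \mathrm{Var}(w) \tag{36}$$
The combination of Eqs. (34) and (36) yields the equality in Eq. (32). The proof is complete.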
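To make Eqs. (29) and (35) concrete, the following Python sketch evaluates, on a single partition, the variation sum whose supremum defines $\mathrm{Var}(w)$ and the left-endpoint Riemann-Stieltjes sum $\sum_j x(t_{j-1})(w(t_j) - w(t_{j-1}))$ that appears in the proof of Theorem 4.3. It is a numerical illustration under stated assumptions, not part of the proof: the helper names are hypothetical, only uniform partitions are used, and one partition yields only a lower bound on $\mathrm{Var}(w)$, not the supremum itself.

```python
from typing import Callable, List

# Hypothetical helpers illustrating Eqs. (29), (30), and (35).
# Assumption: only uniform partitions are used; real-valued x and w for simplicity.


def uniform_partition(a: float, b: float, n: int) -> List[float]:
    """Partition a = t_0 < t_1 < ... < t_n = b with equal spacing (cf. Eq. (30))."""
    return [a + (b - a) * j / n for j in range(n + 1)]


def variation_on_partition(w: Callable[[float], float], ts: List[float]) -> float:
    """Sum of |w(t_j) - w(t_{j-1})| over one partition.

    Var(w) in Eq. (29) is the supremum of this quantity over all partitions,
    so a single partition only gives a lower bound on Var(w).
    """
    return sum(abs(w(ts[j]) - w(ts[j - 1])) for j in range(1, len(ts)))


def riemann_stieltjes_sum(x: Callable[[float], float],
                          w: Callable[[float], float],
                          ts: List[float]) -> float:
    """Left-endpoint sum  sum_j x(t_{j-1}) * (w(t_j) - w(t_{j-1})), as in Eq. (35)."""
    return sum(x(ts[j - 1]) * (w(ts[j]) - w(ts[j - 1])) for j in range(1, len(ts)))


if __name__ == "__main__":
    ts = uniform_partition(0.0, 1.0, 1000)
    w = lambda t: t * t   # monotone on [0, 1], so Var(w) = w(1) - w(0) = 1
    x = lambda t: t       # a continuous integrand on [0, 1]
    s = riemann_stieltjes_sum(x, w, ts)  # approximates the integral of t d(t^2) over [0, 1]
    v = variation_on_partition(w, ts)
    sup_x = max(abs(x(t)) for t in ts)
    # Partition-level analogue of the bound behind Eq. (36): |sum| <= max|x| * variation.
    print(f"RS sum = {s:.6f}, variation = {v:.6f}, bound holds: {abs(s) <= sup_x * v}")

    # Point evaluation f(x) = x(0.5) corresponds to a unit step w with w(a) = 0.
    step = lambda t: 1.0 if t > 0.5 else 0.0
    print(riemann_stieltjes_sum(x, step, ts))  # prints 0.5, i.e., x(0.5)
```

With $w(t) = t^2$ and $x(t) = t$, refining the partition drives the sum toward $\int_0^1 t\, d(t^2) = 2/3$, in line with the limiting argument after Eq. (35). The step function in the second example represents the point evaluation $x \mapsto x(0.5)$, whose norm and total variation both equal 1, consistent with Eq. (32).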