Chapter 10: Conjugate Direction Methods
An Introduction to Optimization, Spring 2012
Wei-Ta Chu
2012/4/13

Introduction

Conjugate direction methods can be viewed as being intermediate between the method of steepest descent and Newton's method. They solve quadratics of $n$ variables in $n$ steps. The usual implementation, the conjugate gradient algorithm, requires no Hessian matrix evaluations; no matrix inversion and no storage of an $n \times n$ matrix are required. The conjugate direction methods typically perform better than the method of steepest descent, but not as well as Newton's method.

For a quadratic function of $n$ variables $f(x) = \frac{1}{2} x^\top Q x - x^\top b$, $x \in \mathbb{R}^n$, $Q = Q^\top > 0$, the best directions of search are the $Q$-conjugate directions. Basically, two directions $d^{(1)}$ and $d^{(2)}$ in $\mathbb{R}^n$ are said to be $Q$-conjugate if $d^{(1)\top} Q d^{(2)} = 0$.

Definition 10.1: Let $Q$ be a real symmetric $n \times n$ matrix. The directions $d^{(0)}, d^{(1)}, \ldots, d^{(m)}$ are $Q$-conjugate if for all $i \neq j$ we have $d^{(i)\top} Q d^{(j)} = 0$.

Lemma 10.1: Let $Q$ be a symmetric positive definite $n \times n$ matrix. If the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$, $k \le n-1$, are nonzero and $Q$-conjugate, then they are linearly independent.

Proof: Let $\alpha_0, \ldots, \alpha_k$ be scalars such that
$\alpha_0 d^{(0)} + \alpha_1 d^{(1)} + \cdots + \alpha_k d^{(k)} = 0.$
Premultiplying this equality by $d^{(j)\top} Q$, $0 \le j \le k$, yields
$\alpha_j d^{(j)\top} Q d^{(j)} = 0,$
because all other terms $d^{(j)\top} Q d^{(i)}$, $i \neq j$, vanish by $Q$-conjugacy. But $Q > 0$ and $d^{(j)} \neq 0$; hence $d^{(j)\top} Q d^{(j)} > 0$, and therefore $\alpha_j = 0$, $j = 0, 1, \ldots, k$. Therefore, $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$ are linearly independent.

Example

Let
$Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$
Note that $Q = Q^\top$. The matrix $Q$ is positive definite because all its leading principal minors are positive:
$\Delta_1 = 3 > 0, \quad \Delta_2 = \det \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix} = 12 > 0, \quad \Delta_3 = \det Q = 20 > 0.$
Our goal is to construct a set of $Q$-conjugate vectors $d^{(0)}, d^{(1)}, d^{(2)}$. Let $d^{(0)} = [1, 0, 0]^\top$ and $d^{(1)} = [d_1^{(1)}, d_2^{(1)}, d_3^{(1)}]^\top$. We require that $d^{(0)\top} Q d^{(1)} = 0$. We have
$d^{(0)\top} Q d^{(1)} = 3 d_1^{(1)} + d_3^{(1)}.$
Let $d_1^{(1)} = 1$, $d_2^{(1)} = 0$, $d_3^{(1)} = -3$. Then $d^{(1)} = [1, 0, -3]^\top$, and thus $d^{(0)\top} Q d^{(1)} = 0$.

To find the third vector $d^{(2)} = [d_1^{(2)}, d_2^{(2)}, d_3^{(2)}]^\top$, which would be $Q$-conjugate with $d^{(0)}$ and $d^{(1)}$, we require that $d^{(0)\top} Q d^{(2)} = 0$ and $d^{(1)\top} Q d^{(2)} = 0$. We have
$d^{(0)\top} Q d^{(2)} = 3 d_1^{(2)} + d_3^{(2)} = 0, \qquad d^{(1)\top} Q d^{(2)} = -6 d_2^{(2)} - 8 d_3^{(2)} = 0.$
If we take $d^{(2)} = [1, 4, -3]^\top$, then the resulting set of vectors is mutually $Q$-conjugate.

The Conjugate Direction Algorithm

This method of finding $Q$-conjugate vectors is inefficient. A systematic procedure for finding $Q$-conjugate vectors can be devised using the idea underlying the Gram-Schmidt process of transforming a given basis of $\mathbb{R}^n$ into an orthonormal basis of $\mathbb{R}^n$.

Consider minimizing the quadratic function of $n$ variables
$f(x) = \frac{1}{2} x^\top Q x - x^\top b,$
where $Q = Q^\top > 0$ and $x \in \mathbb{R}^n$. Note that because $Q > 0$, the function $f$ has a global minimizer that can be found by solving $Qx = b$.

Basic Conjugate Direction Algorithm. Given a starting point $x^{(0)}$ and $Q$-conjugate directions $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$; for $k \ge 0$,
$g^{(k)} = \nabla f(x^{(k)}) = Q x^{(k)} - b,$
$\alpha_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}},$
$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}.$

Theorem 10.1: For any starting point $x^{(0)}$, the basic conjugate direction algorithm converges to the unique $x^*$ (that solves $Qx = b$) in $n$ steps; that is, $x^{(n)} = x^*$.

Proof: Consider $x^* - x^{(0)} \in \mathbb{R}^n$. Because the $d^{(i)}$ are linearly independent, there exist constants $\beta_i$, $i = 0, \ldots, n-1$, such that
$x^* - x^{(0)} = \beta_0 d^{(0)} + \cdots + \beta_{n-1} d^{(n-1)}.$
Now premultiply both sides of this equation by $d^{(k)\top} Q$, $0 \le k < n$, to obtain
$d^{(k)\top} Q (x^* - x^{(0)}) = \beta_k d^{(k)\top} Q d^{(k)},$
where the terms $d^{(k)\top} Q d^{(i)}$, $i \neq k$, vanish by the $Q$-conjugacy property. Hence,
$\beta_k = \frac{d^{(k)\top} Q (x^* - x^{(0)})}{d^{(k)\top} Q d^{(k)}}.$

Now, we can write
$x^{(k)} = x^{(0)} + \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}.$
Therefore,
$x^{(k)} - x^{(0)} = \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}.$
So writing
$x^* - x^{(0)} = (x^* - x^{(k)}) + (x^{(k)} - x^{(0)})$
and premultiplying the above by $d^{(k)\top} Q$, we obtain
$d^{(k)\top} Q (x^* - x^{(0)}) = d^{(k)\top} Q (x^* - x^{(k)}) = -d^{(k)\top} g^{(k)},$
because $g^{(k)} = Q x^{(k)} - b$, $Q x^* = b$, and $d^{(k)\top} Q (x^{(k)} - x^{(0)}) = 0$ by $Q$-conjugacy. Thus,
$\beta_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}} = \alpha_k,$
and $x^* = x^{(n)}$, which completes the proof.

Example

Find the minimizer of
$f(x) = \frac{1}{2} x^\top \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix} x - x^\top \begin{bmatrix} -1 \\ 1 \end{bmatrix}$
using the conjugate direction method with the initial point $x^{(0)} = [0, 0]^\top$ and $Q$-conjugate directions $d^{(0)} = [1, 0]^\top$ and $d^{(1)} = [-3/8, 3/4]^\top$. We have
$g^{(0)} = Q x^{(0)} - b = [1, -1]^\top,$
and hence
$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = -\frac{1}{4}.$
Thus,
$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = [-1/4, 0]^\top.$

To find $x^{(2)}$, we compute
$g^{(1)} = Q x^{(1)} - b = [0, -3/2]^\top$
and
$\alpha_1 = -\frac{g^{(1)\top} d^{(1)}}{d^{(1)\top} Q d^{(1)}} = -\frac{-9/8}{9/16} = 2.$
Therefore,
$x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [-1, 3/2]^\top.$
Because $f$ is a quadratic function in two variables, $x^{(2)} = x^*$.
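As an illustration, the following Python sketch implements the basic conjugate direction algorithm above, generating the $Q$-conjugate directions by $Q$-orthogonalizing the standard basis in the spirit of the Gram-Schmidt remark earlier in the section. The function names are choices of this sketch, not from the slides; $Q$ is the 3-by-3 matrix from the construction example, while the right-hand side $b$ is an illustrative choice.

```python
import numpy as np

def q_conjugate_directions(Q):
    """Q-orthogonalize the standard basis so that d_i^T Q d_j = 0 for i != j
    (a Gram-Schmidt-style construction)."""
    n = Q.shape[0]
    directions = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        d = e.copy()
        for dj in directions:
            # subtract the Q-projection of e onto the previous direction dj
            d -= (dj @ Q @ e) / (dj @ Q @ dj) * dj
        directions.append(d)
    return directions

def conjugate_direction_method(Q, b, x0, directions):
    """Basic conjugate direction algorithm for f(x) = 1/2 x^T Q x - b^T x."""
    x = np.asarray(x0, dtype=float)
    for d in directions:
        g = Q @ x - b                      # gradient g^(k) = Q x^(k) - b
        alpha = -(g @ d) / (d @ Q @ d)     # exact step size along d^(k)
        x = x + alpha * d                  # x^(k+1) = x^(k) + alpha_k d^(k)
    return x                               # x^(n) = x* for a quadratic

if __name__ == "__main__":
    Q = np.array([[3.0, 0.0, 1.0],
                  [0.0, 4.0, 2.0],
                  [1.0, 2.0, 3.0]])
    b = np.array([1.0, 0.0, 1.0])          # illustrative right-hand side
    dirs = q_conjugate_directions(Q)
    x = conjugate_direction_method(Q, b, np.zeros(3), dirs)
    print(x, np.linalg.solve(Q, b))        # the two results should agree
```

After $n = 3$ steps the iterate coincides (up to rounding) with the solution of $Qx = b$, in agreement with Theorem 10.1.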
The Conjugate Direction Algorithm

For a quadratic function of $n$ variables, the conjugate direction method reaches the solution after $n$ steps. The method also has a useful property at the intermediate steps. Suppose that we start at $x^{(0)}$ and search in the direction $d^{(0)}$ to obtain
$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}.$
We claim that
$g^{(1)\top} d^{(0)} = 0.$
To see this, note that $\alpha_0$ has the property that $\phi_0'(\alpha_0) = 0$, where $\phi_0(\alpha) = f(x^{(0)} + \alpha d^{(0)})$. Apply the chain rule to get
$\phi_0'(\alpha) = \nabla f(x^{(0)} + \alpha d^{(0)})^\top d^{(0)}.$
Evaluating the above at $\alpha = \alpha_0$, we get $\phi_0'(\alpha_0) = g^{(1)\top} d^{(0)}$. Because $\phi_0$ is a quadratic function of $\alpha$, and the coefficient of the $\alpha^2$ term in $\phi_0$ is $\frac{1}{2} d^{(0)\top} Q d^{(0)} > 0$, the above implies that $g^{(1)\top} d^{(0)} = 0$.

Using a similar argument, we can show that $g^{(k+1)\top} d^{(k)} = 0$ for all $k$, and hence we have the following lemma.

Lemma 10.2: In the conjugate direction algorithm,
$g^{(k+1)\top} d^{(i)} = 0$
for all $k$, $0 \le k \le n-1$, and $0 \le i \le k$.

Proof: Note that
$g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)},$
because $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$ and $g^{(k)} = Q x^{(k)} - b$.

We prove the lemma by induction. The result is true for $k = 0$ because $g^{(1)\top} d^{(0)} = 0$, as shown above. We now show that if the result is true for $k - 1$ (i.e., $g^{(k)\top} d^{(i)} = 0$ for $i \le k-1$), then it is true for $k$ (i.e., $g^{(k+1)\top} d^{(i)} = 0$ for $i \le k$). Fix $k > 0$ and $0 \le i \le k - 1$. By the induction hypothesis, $g^{(k)\top} d^{(i)} = 0$. Because
$g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)},$
and $d^{(k)\top} Q d^{(i)} = 0$ by $Q$-conjugacy, we have
$g^{(k+1)\top} d^{(i)} = g^{(k)\top} d^{(i)} + \alpha_k d^{(k)\top} Q d^{(i)} = 0.$
It remains to be shown that $g^{(k+1)\top} d^{(k)} = 0$. Indeed,
$g^{(k+1)\top} d^{(k)} = \phi_k'(\alpha_k) = 0,$
because $\alpha_k$ is the minimizer of $\phi_k(\alpha) = f(x^{(k)} + \alpha d^{(k)})$. Therefore, by induction, $g^{(k+1)\top} d^{(i)} = 0$ for all $0 \le k \le n-1$ and $0 \le i \le k$.

By Lemma 10.2 we see that $g^{(k+1)}$ is orthogonal to any vector from the subspace spanned by $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. We now show that not only does $x^{(k+1)}$ satisfy $g^{(k+1)\top} d^{(k)} = 0$, but it also minimizes $f$ over the affine set $x^{(0)} + \operatorname{span}[d^{(0)}, \ldots, d^{(k)}]$. In other words, if we write
$\mathcal{V}_k = x^{(0)} + \operatorname{span}[d^{(0)}, d^{(1)}, \ldots, d^{(k)}],$
then we can express
$f(x^{(k+1)}) = \min_{x \in \mathcal{V}_k} f(x).$
As $k$ increases, the subspace $\operatorname{span}[d^{(0)}, \ldots, d^{(k)}]$ "expands," and will eventually fill the whole of $\mathbb{R}^n$ (provided that $d^{(0)}, \ldots, d^{(n-1)}$ are linearly independent). Therefore, for some sufficiently large $k$, $x^*$ will lie in $\mathcal{V}_k$. For this reason, the above result is sometimes called the expanding subspace theorem.

To prove the expanding subspace theorem, define the matrix
$D^{(k)} = [d^{(0)}, d^{(1)}, \ldots, d^{(k)}].$
Note that $\mathcal{V}_k = \{ x^{(0)} + D^{(k)} a : a \in \mathbb{R}^{k+1} \}$. Also,
$x^{(k+1)} = x^{(0)} + \alpha_0 d^{(0)} + \cdots + \alpha_k d^{(k)} = x^{(0)} + D^{(k)} \bar{a}, \qquad \bar{a} = [\alpha_0, \ldots, \alpha_k]^\top.$
Hence, $x^{(k+1)} \in \mathcal{V}_k$.

Now, consider any vector $x \in \mathcal{V}_k$. There exists a vector $a$ such that $x = x^{(0)} + D^{(k)} a$. Let $\psi_k(a) = f(x^{(0)} + D^{(k)} a)$. Note that $\psi_k$ is a quadratic function of $a$ and has a unique minimizer that satisfies the FONC. By the chain rule,
$D\psi_k(a) = Df(x^{(0)} + D^{(k)} a) D^{(k)}.$
Therefore,
$D\psi_k(\bar{a}) = Df(x^{(k+1)}) D^{(k)} = g^{(k+1)\top} D^{(k)}.$
By Lemma 10.2, $g^{(k+1)\top} D^{(k)} = 0^\top$. Therefore, $\bar{a}$ satisfies the FONC for the quadratic function $\psi_k$, and hence $\bar{a}$ is the minimizer of $\psi_k$; that is,
$f(x^{(k+1)}) = \min_{x \in \mathcal{V}_k} f(x).$

The Conjugate Gradient Algorithm

The conjugate direction algorithm is very effective. However, to use the algorithm, we need to specify the $Q$-conjugate directions. Fortunately, there is a way to generate $Q$-conjugate directions as we perform iterations. The conjugate gradient algorithm does not use prespecified conjugate directions, but instead computes the directions as the algorithm proceeds. At each stage of the algorithm, the direction is calculated as a linear combination of the previous direction and the current gradient, in such a way that all the directions are mutually $Q$-conjugate.

For a quadratic function of $n$ variables, we can locate the function minimizer by performing $n$ searches along mutually conjugate directions. We consider the quadratic function
$f(x) = \frac{1}{2} x^\top Q x - x^\top b,$
where $Q = Q^\top > 0$. Our first search direction from an initial point $x^{(0)}$ is in the direction of steepest descent; that is, $d^{(0)} = -g^{(0)}$, where $g^{(0)} = Q x^{(0)} - b$. Thus,
$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}, \qquad \alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}}.$

In the next stage, we search in a direction $d^{(1)}$ that is $Q$-conjugate to $d^{(0)}$. We choose $d^{(1)}$ as a linear combination of $g^{(1)}$ and $d^{(0)}$. In general, at the $(k+1)$th step, we choose $d^{(k+1)}$ as a linear combination of $g^{(k+1)}$ and $d^{(k)}$. Specifically, we choose
$d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}, \qquad k = 0, 1, 2, \ldots$
The coefficients $\beta_k$, $k = 0, 1, \ldots$, are chosen in such a way that $d^{(k+1)}$ is $Q$-conjugate to $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. This is accomplished by choosing $\beta_k$ to be
$\beta_k = \frac{g^{(k+1)\top} Q d^{(k)}}{d^{(k)\top} Q d^{(k)}}.$

The algorithm:
1. Set $k := 0$; select the initial point $x^{(0)}$.
2. $g^{(0)} = \nabla f(x^{(0)})$. If $g^{(0)} = 0$, stop; else, set $d^{(0)} = -g^{(0)}$.
3. $\alpha_k = -\dfrac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}}$.
4. $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$.
5. $g^{(k+1)} = \nabla f(x^{(k+1)})$. If $g^{(k+1)} = 0$, stop.
6. $\beta_k = \dfrac{g^{(k+1)\top} Q d^{(k)}}{d^{(k)\top} Q d^{(k)}}$.
7. $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$.
8. Set $k := k + 1$; go to step 3.

Proposition 10.1: In the conjugate gradient algorithm, the directions $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ are $Q$-conjugate.

Example

Consider the quadratic function
$f(x_1, x_2, x_3) = \frac{3}{2} x_1^2 + 2 x_2^2 + \frac{3}{2} x_3^2 + x_1 x_3 + 2 x_2 x_3 - 3 x_1 - x_3.$
We find the minimizer using the conjugate gradient algorithm with the starting point $x^{(0)} = [0, 0, 0]^\top$. We can represent $f$ as
$f(x) = \frac{1}{2} x^\top Q x - x^\top b,$
where
$Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}.$

We have $g^{(0)} = Q x^{(0)} - b = -b$, $d^{(0)} = -g^{(0)} = [3, 0, 1]^\top$, and
$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = \frac{10}{36} = 0.2778.$
Hence,
$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = [0.8333, 0, 0.2778]^\top,$
$g^{(1)} = Q x^{(1)} - b = [-0.2222, 0.5556, 0.6667]^\top,$
$\beta_0 = \frac{g^{(1)\top} Q d^{(0)}}{d^{(0)\top} Q d^{(0)}} = 0.0802, \qquad d^{(1)} = -g^{(1)} + \beta_0 d^{(0)} = [0.4630, -0.5556, -0.5864]^\top.$

Hence,
$\alpha_1 = -\frac{g^{(1)\top} d^{(1)}}{d^{(1)\top} Q d^{(1)}} = 0.2187, \qquad x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [0.9346, -0.1215, 0.1495]^\top,$
$g^{(2)} = Q x^{(2)} - b = [-0.0467, -0.1869, 0.1402]^\top,$
$\beta_1 = \frac{g^{(2)\top} Q d^{(1)}}{d^{(1)\top} Q d^{(1)}} = 0.0707, \qquad d^{(2)} = -g^{(2)} + \beta_1 d^{(1)} = [0.0794, 0.1476, -0.1817]^\top,$
$\alpha_2 = -\frac{g^{(2)\top} d^{(2)}}{d^{(2)\top} Q d^{(2)}} = 0.8231, \qquad x^{(3)} = x^{(2)} + \alpha_2 d^{(2)} = [1, 0, 0]^\top,$
and $x^{(3)} = x^*$, since $Q x^{(3)} = b$.
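For concreteness, here is a minimal Python sketch of steps 1 through 8 above, applied to the $Q$ and $b$ of the worked example. The tolerance-based stopping test and the function name are implementation choices of this sketch, not part of the slides.

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Conjugate gradient algorithm for the quadratic f(x) = 1/2 x^T Q x - b^T x."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b                             # step 2: initial gradient
    if np.linalg.norm(g) < tol:
        return x
    d = -g                                    # steepest-descent start
    for _ in range(len(b)):                   # at most n steps for a quadratic
        alpha = -(g @ d) / (d @ Q @ d)        # step 3
        x = x + alpha * d                     # step 4
        g = Q @ x - b                         # step 5
        if np.linalg.norm(g) < tol:
            break
        beta = (g @ (Q @ d)) / (d @ Q @ d)    # step 6
        d = -g + beta * d                     # step 7
    return x

if __name__ == "__main__":
    Q = np.array([[3.0, 0.0, 1.0],
                  [0.0, 4.0, 2.0],
                  [1.0, 2.0, 3.0]])
    b = np.array([3.0, 0.0, 1.0])
    print(conjugate_gradient(Q, b, np.zeros(3)))   # approximately [1, 0, 0]
```

Running the sketch reproduces the final iterate of the example, the solution of $Qx = b$.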
Proof of Proposition 10.1

We use induction. We first show that $d^{(0)\top} Q d^{(1)} = 0$. Substituting for $d^{(1)} = -g^{(1)} + \beta_0 d^{(0)}$, we see that
$d^{(0)\top} Q d^{(1)} = d^{(0)\top} Q \left( -g^{(1)} + \frac{g^{(1)\top} Q d^{(0)}}{d^{(0)\top} Q d^{(0)}} \, d^{(0)} \right) = 0.$
Assume now that $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$, $k < n - 1$, are $Q$-conjugate directions. From Lemma 10.2 we have
$g^{(k+1)\top} d^{(j)} = 0, \qquad j = 0, 1, \ldots, k.$
Thus, $g^{(k+1)}$ is orthogonal to each of the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. We now show that
$g^{(k+1)\top} g^{(j)} = 0, \qquad j = 0, 1, \ldots, k.$

Fix $j \in \{0, \ldots, k\}$. We have
$d^{(j)} = -g^{(j)} + \beta_{j-1} d^{(j-1)}$
(for $j = 0$, read $d^{(0)} = -g^{(0)}$). Substituting this equation into the previous one yields
$0 = g^{(k+1)\top} d^{(j)} = -g^{(k+1)\top} g^{(j)} + \beta_{j-1} g^{(k+1)\top} d^{(j-1)}.$
Because $g^{(k+1)\top} d^{(j-1)} = 0$, it follows that $g^{(k+1)\top} g^{(j)} = 0$.

We are now ready to show that $d^{(k+1)\top} Q d^{(j)} = 0$, $j = 0, \ldots, k$. We have
$d^{(k+1)\top} Q d^{(j)} = \left( -g^{(k+1)} + \beta_k d^{(k)} \right)^\top Q d^{(j)}.$
If $j < k$, then $d^{(k)\top} Q d^{(j)} = 0$, by virtue of the induction hypothesis. Hence, we have
$d^{(k+1)\top} Q d^{(j)} = -g^{(k+1)\top} Q d^{(j)}.$
But $Q d^{(j)} = (g^{(j+1)} - g^{(j)})/\alpha_j$. Thus,
$d^{(k+1)\top} Q d^{(j)} = -\frac{g^{(k+1)\top} (g^{(j+1)} - g^{(j)})}{\alpha_j} = 0,$
because $g^{(k+1)\top} g^{(j+1)} = 0$ and $g^{(k+1)\top} g^{(j)} = 0$ (here $j + 1 \le k$). It remains to show that $d^{(k+1)\top} Q d^{(k)} = 0$. We have
$d^{(k+1)\top} Q d^{(k)} = \left( -g^{(k+1)} + \beta_k d^{(k)} \right)^\top Q d^{(k)}.$
Using the expression for $\beta_k$, we get $d^{(k+1)\top} Q d^{(k)} = 0$, which completes the proof.

The Conjugate Gradient Algorithm for Nonquadratic Problems

The algorithm can be extended to general nonlinear functions by interpreting the quadratic $f(x) = \frac{1}{2} x^\top Q x - x^\top b$ as a second-order Taylor series approximation of the objective function. For a quadratic, the matrix $Q$, the Hessian of the quadratic, is constant. However, for a general nonlinear function the Hessian is a matrix that has to be reevaluated at each iteration of the algorithm. Observe that $Q$ appears only in the computation of the scalars $\alpha_k$ and $\beta_k$. Because
$\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$
can be replaced by a numerical line search procedure, we need only concern ourselves with the formula for $\beta_k$.

Hestenes-Stiefel formula. Replace the term $Q d^{(k)}$ by $(g^{(k+1)} - g^{(k)})/\alpha_k$. The two terms are equal in the quadratic case. Indeed, $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$. Premultiplying both sides by $Q$, subtracting $b$ from both sides, and recognizing that $g^{(k)} = Q x^{(k)} - b$, we get
$g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)},$
which we can rewrite as $Q d^{(k)} = (g^{(k+1)} - g^{(k)})/\alpha_k$. Therefore, the Hestenes-Stiefel formula is
$\beta_k = \frac{g^{(k+1)\top} [g^{(k+1)} - g^{(k)}]}{d^{(k)\top} [g^{(k+1)} - g^{(k)}]}.$

Polak-Ribiere formula. Starting from the Hestenes-Stiefel formula, we multiply out the denominator to get
$d^{(k)\top} [g^{(k+1)} - g^{(k)}] = d^{(k)\top} g^{(k+1)} - d^{(k)\top} g^{(k)}.$
By Lemma 10.2, $d^{(k)\top} g^{(k+1)} = 0$. Also, since $d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}$, premultiplying this by $g^{(k)\top}$ we get
$g^{(k)\top} d^{(k)} = -g^{(k)\top} g^{(k)} + \beta_{k-1} g^{(k)\top} d^{(k-1)} = -g^{(k)\top} g^{(k)},$
where once again we used Lemma 10.2. Hence, we get the Polak-Ribiere formula
$\beta_k = \frac{g^{(k+1)\top} [g^{(k+1)} - g^{(k)}]}{g^{(k)\top} g^{(k)}}.$

Fletcher-Reeves formula. Starting with the Polak-Ribiere formula, we multiply out the numerator to get
$g^{(k+1)\top} [g^{(k+1)} - g^{(k)}] = g^{(k+1)\top} g^{(k+1)} - g^{(k+1)\top} g^{(k)}.$
We now use $g^{(k+1)\top} g^{(k)} = 0$, which we get by using the equation $g^{(k)} = -d^{(k)} + \beta_{k-1} d^{(k-1)}$ and applying Lemma 10.2. This leads to the Fletcher-Reeves formula
$\beta_k = \frac{g^{(k+1)\top} g^{(k+1)}}{g^{(k)\top} g^{(k)}}.$

Without the Hessian matrix $Q$, all we need are the objective function and gradient values at each iteration. For the quadratic case the three expressions for $\beta_k$ are exactly equal. A very important issue in minimization problems of nonquadratic functions is the line search. If the line search is known to be inaccurate, the Hestenes-Stiefel formula for $\beta_k$ is recommended. (A sketch of a nonlinear conjugate gradient iteration using these three formulas is given after the homework below.)

Homework 2

Exercises 8.3, 8.15
Exercise 9.1
Exercise 10.9
Hand in your homework at the class on Apr. 27.
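As referenced above, the following Python sketch shows one way a nonlinear conjugate gradient iteration with the three $\beta_k$ formulas might look. The Armijo backtracking line search, the descent-direction safeguard, and the Rosenbrock test function are illustrative choices of this sketch, not prescribed by the slides.

```python
import numpy as np

def backtracking(f, x, d, g, alpha=1.0, rho=0.5, c=1e-4):
    """Armijo backtracking line search along the direction d."""
    while f(x + alpha * d) > f(x) + c * alpha * (g @ d) and alpha > 1e-12:
        alpha *= rho
    return alpha

def beta_value(formula, g_new, g_old, d):
    """The three beta formulas; they coincide for a quadratic with exact line search."""
    y = g_new - g_old
    if formula == "HS":                       # Hestenes-Stiefel
        return (g_new @ y) / (d @ y)
    if formula == "PR":                       # Polak-Ribiere
        return (g_new @ y) / (g_old @ g_old)
    return (g_new @ g_new) / (g_old @ g_old)  # Fletcher-Reeves

def nonlinear_cg(f, grad, x0, formula="HS", tol=1e-6, max_iter=2000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                        # safeguard: restart with steepest descent
            d = -g
        alpha = backtracking(f, x, d, g)      # numerical line search replaces alpha_k
        x = x + alpha * d
        g_new = grad(x)
        d = -g_new + beta_value(formula, g_new, g, d) * d
        g = g_new
    return x

if __name__ == "__main__":
    # Rosenbrock function as an illustrative nonquadratic test problem
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    df = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                             200 * (x[1] - x[0]**2)])
    for formula in ("HS", "PR", "FR"):
        print(formula, nonlinear_cg(f, df, np.zeros(2), formula))
```

Comparing the three formulas on a nonquadratic problem like this one shows how their behavior diverges when the line search is inexact, which is the point made at the end of the section above.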