CONCURRENT LEARNING IN THE PRESENCE OF UNCERTAIN INPUT ALLOCATION

By BENJAMIN DAVID REISH
Bachelor of Science in Mechanical Engineering
Oklahoma Christian University
Edmond, OK
2004

Submitted to the Faculty of the Graduate College of Oklahoma State University in partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE, July, 2015

Thesis Approved:
Dr. Girish Chowdhary, Thesis Advisor
Dr. Prabhakar Pagilla
Dr. Lisa Mantini

Acknowledgments

Without my wife, Melissa, this degree would have been impossible. She is my supporter and encourager. She has endeavored tirelessly to provide for the family and raise our children while I completed this degree. She is my cheering section, and without her I would still be hating my job. She encouraged me to look for what makes me happy and not to be limited by what I think are 'my' responsibilities.

I want to thank my parents for their encouragement and financial support throughout this degree. They helped shoulder some of the financial strain that comes with having a family, a house, and a fifty-mile one-way commute, and allowed me to concentrate on my classwork and research to finish this degree.

I want to thank Dr. Byron Newberry for planting a seed years ago about obtaining an advanced degree. I do not know that he realizes the impact he had when he talked with me about his graduate school experience, initially and then repeatedly over several years in the small conversations we had together.

Part of this material is based upon work supported by the National Aeronautics and Space Administration under Research Initiation Grant No. NNX13AB21A issued through the Research Infrastructure Development Program.

Acknowledgments reflect the views of the author and are not endorsed by committee members or Oklahoma State University.

Name: BENJAMIN DAVID REISH
Date of Degree: JULY, 2015
Title of Study: CONCURRENT LEARNING IN THE PRESENCE OF UNCERTAIN INPUT ALLOCATION
Major Field: MECHANICAL AND AEROSPACE ENGINEERING

Abstract: Most Model Reference Adaptive Control (MRAC) methods assume that the input allocation matrix (B in the state-space representation ẋ = Ax + Bu) is known. These methods cannot be used in situations that require adaptation in the presence of uncertain input allocation, such as when controls reverse on a flexible aircraft due to wing twist, or when actuator mappings are unknown. To handle such situations, a Concurrent Learning Model Reference Adaptive Control method is developed for linear uncertain dynamical systems in which the input allocation matrix is uncertain. The approach relies on estimating the input allocation matrix from online recorded and instantaneous data concurrently, while the system is being actively controlled using the online-updated estimate. It is shown that the tracking error and weight error convergence depend on the accuracy of the estimates of the unknown parameters. This establishes the necessity of purging the concurrent learning history stack, and three algorithms for purging the history stack for eventual re-population are presented. System stability is shown by demonstrating that the solutions to the system are locally bounded and stay within that local set; local ultimate boundedness is plausible. Then a relaxation of the uncertain input allocation matrix assumption is discussed and shown to be locally bounded in the closed-loop control case and ultimately bounded within the set described.
Simulations validate the theoretical results for both the uncertain input allocation case and the relaxed uncertain input allocation case.

Nomenclature

Symbol — Description — Notes

A — state matrix of the plant — assumed known
A_rm — state matrix of the reference model
B — allocation matrix of the actual plant — uncertain in this work
B̂ — controller's estimate of the B matrix
B̂† — pseudo-inverse of B̂
B_rm — reference model's input matrix
B̃ — difference between B̂ and B
D — known part of B when dissolved — Chpt. 5
δ_K — vector stored for concurrent learning
δ̂_K — vector with error from B̃ — uncertain B
δ_Kr — vector stored for concurrent learning
δ̂_Kr — vector with error from B̃ — uncertain B
e — error between x and x_rm
ė — trajectory of the system error through time
ε₀ — threshold for deciding to shock the history stacks
ε_B — matrix regressor for the B̂
ε_ts — norm of B̃ at time t_s
ε̂_K — weight error for K using B̂
ε_{K i} — ith weight error for K used in concurrent learning history stack — concurrent learning
ε̂_Kr — weight error for K_r using B̂
ε_{Kr j} — jth weight error for K_r used in concurrent learning history stack
ε̂_W — combined ε̂_K and ε̂_Kr
Γ_B — update law learning rate for B̂
Γ_r — update law learning rate for K_r
Γ_x — update law learning rate for K
Γ_W — update law learning rate for W
K — state feedback gain
K* — ideal matching gain needed for MRAC
K̃ — difference between K and K*
K_r — feedforward gain
K*_r — ideal matching gain needed for MRAC
K̃_r — difference between K_r and K*_r
Λ — unknown diagonal matrix of the dissolved B
Λ̂ — estimator for Λ — of dissolved B
λ_Λ — outer bound of the projection operator for Λ̂
Λ̃ — Λ̂ − Λ
P — positive definite matrix from Lyapunov eqn.
p_max — max allowed points in history stack
Φ₀ — set for Projection operator
Φ₁ — set for Projection operator
Φ_T — set for Projection operator
Q — positive definite matrix from Lyapunov eqn.
r — reference command
R_B̂ — regressor history stack for B̂
R_Λ — regressor history stack for Λ
R_W — regressor history stack for W
σ — combined x and r vectors
θ — vector used with Projection operator
θ_Tstart — inner bound of Projection operator
θ_Tend — outer bound of Projection operator
u — input to the plant
W — combined K and K_r matrices
W* — combined K* and K*_r matrices
W̃ — combined K̃ and K̃_r matrices
x — state of the plant — assumed measurable
ẋ — derivative of the state, x — assumed measurable
x_des — desired state
x_rm — reference model state — used in MRAC
ẋ_rm — derivative of the reference model state
X_B̂ — input history stack for B̂
X_Λ — input history stack for Λ̂
X_W — state history stack for W

Table of Contents

Chapter — Page

1 Literature Survey . . . 1
  1.1 State-Space Modeling . . . 1
  1.2 Model Reference Adaptive Control . . . 2
  1.3 Persistency of Excitation . . . 3
  1.4 Nussbaum Gains . . . 5
  1.5 Eigenvectors and Eigenvalues . . . 5
2 Model Reference Adaptive Control . . . 7
  2.1 Introduction . . . 7
  2.2 Classic Equations . . . 7
  2.3 Concurrent Learning-MRAC . . . 9
  2.4 CL-MRAC with Uncertain Allocation Matrix . . . 11
  2.5 Combining Variables . . . 13
  2.6 The Projection Operator . . . 14
  2.7 Derivatives with the Projection Operator . . . 17
  2.8 Summary . . . 17
3 Error in the History Stack . . . 18
  3.1 Introduction . . . 18
  3.2 Shocking . . . 18
    3.2.1 Heuristic Algorithm . . . 19
    3.2.2 Hypothesis Testing Algorithm . . . 20
    3.2.3 Variance of Average Allocation Matrix Estimate Error . . . 22
  3.3 Summary . . . 22
4 Stability . . . 23
  4.1 Introduction . . . 23
  4.2 Boundedness without a Full Rank History Stack . . . 25
  4.3 Boundedness with a Full Rank History Stack . . . 27
  4.4 Bounded Operation . . . 31
  4.5 Ultimate Boundedness . . . 32
  4.6 Rate of Convergence . . . 36
  4.7 Summary . . . 38
5 Dissolved B Matrix . . . 39
  5.1 Introduction . . . 39
  5.2 Stability . . . 41
    5.2.1 Bounded Operation . . . 41
    5.2.2 Locally Ultimately Bounded . . . 44
    5.2.3 Rate of Convergence . . . 45
  5.3 Summary . . . 46
6 Simulations . . . 47
  6.1 Introduction . . . 47
  6.2 Uncertain Allocation Matrix . . . 47
  6.3 Dissolved Input Allocation Matrix . . . 52
  6.4 Summary . . . 58
7 Conclusion . . . 59
  7.1 Summary . . . 59
  7.2 Future Work . . . 60
References . . . 61
A Derivations . . . 65
  A.1 Expanding the Input Matrix into Known and Unknown Components . . . 66
  A.2 Define Concurrent Learning Error Terms . . . 68
  A.3 Derivative of the Error . . . 70
  A.4 The Lyapunov Candidate and Derivative . . . 71
    A.4.1 Properties of the Trace Operator . . . 71
    A.4.2 The Lyapunov Candidate Equation . . . 71
    A.4.3 The Lyapunov Candidate Derivative . . . 73
  A.5 Expanding the Hatted Epsilons . . . 82
  A.6 Expanding the Input . . . 83
    A.6.1 Expanding the Input with W . . . 87
B Concurrent Learning Update . . . 90
  B.1 The MRAC Update Law . . . 90
    B.1.1 CL-MRAC Update Law . . . 90
  B.2 MRAC Example . . . 91
    B.2.1 Model Reference Adaptive Control Only . . . 91
    B.2.2 Model Reference Adaptive Control with Concurrent Learning . . . 92
    B.2.3 Concurrent Learning in the Weight Space . . . 96
C Acronyms . . . 100
D Additional Plots . . . 101

List of Tables

Algorithm 1. Heuristic, Time Based Method . . . 19
Algorithm 2. Hypothesis Test on Expectation of ($\hat{\dot{x}} - \dot{x}$) . . . 20
Algorithm 3. Variance of Average Allocation Matrix Estimate Error . . . 21

List of Figures

2.1 Projection Operator in the Weight Space . . . 15
4.1 Notional Representation of System State, B̃, and W̃ with Time . . . 24
6.1 Time History of Reference Model Tracking Using Algorithms 1, 2, and 3 . . . 48
6.2 Time History of Reference Model Tracking Using Algorithms 1, 2, and 3, zoomed to first 10 seconds . . . 49
6.3 System Tracking Errors Shown Using Algorithms 1, 2, and 3 . . . 50
6.4 Minimum Eigenvalues of the B̂, K, and K_r History Stacks . . . 51
6.5 B̂ Convergence Using Algorithms 1, 2, and 3, with Ideal Values . . . 52
6.6 W Convergence Using Algorithms 1, 2, and 3, with Ideal Values . . . 53
6.7 Dissolved Input Allocation Matrix, not using concurrent learning . . . 54
6.8 Dissolved Input Allocation Matrix, States, using concurrent learning . . . 56
6.9 Dissolved Input Allocation Matrix, Errors, using concurrent learning . . . 57
6.10 Dissolved Input Allocation Matrix, Adaptive Weights, with concurrent learning . . . 57
6.11 Dissolved Input Allocation Matrix, Λ̂ Convergence, using concurrent learning . . . 58
B.1 Step Response of Two Systems . . . 92
B.2 MRAC only State Tracking . . . 93
B.3 Time history of MRAC only Adaptive Gains . . . 94
B.4 Time history of MRAC only State Tracking Error . . . 94
B.5 CL-MRAC State Tracking . . . 95
B.6 CL-MRAC Tracking Errors . . . 95
B.7 CL-MRAC Adaptive Weights . . . 96
B.8 CL-MRAC Weight Space . . . 97
B.9 CL-MRAC Weight Stochastic Gradient . . . 98
D.1 System Tracking Errors Shown Using Algorithms 1, 2, and 3 . . . 102
D.2 Dissolved Input Allocation Matrix Weight Space . . . 103
D.3 Dissolved Input Allocation Matrix Λ̂ . . . 104

CHAPTER 1
Literature Survey

Uncertain input allocation is a rare topic in the controls world. In many cases, input allocation is the one thing about a system that is designed and therefore known with little uncertainty. Even so, there are times when input allocation is uncertain. In this chapter, Model Reference Adaptive Control (MRAC) is the focus among the available adaptive control tools. Then there is a discussion of the persistency of excitation (PE) required by classical MRAC architectures to drive the adaptive gains to their ideal values. Finally, there is a short discussion of a more general formulation and of eigenvectors and eigenvalues.
1.1 State-Space Modeling

Sets of equations have been used to model interconnected systems. If those equations take the form of multiple ordinary differential equations (ODEs) which use a minimum number of variables to describe the internal status of the system, the set is said to be in state-space form. In that form, the ODEs may be separated into parts which are linearly combined: one a function of the states of the system, and one a function of the input to the system (if any). If the system is linear, then these equations may be written as matrices, one multiplying the state vector and one multiplying the input vector. The state vector need not be made up of measurable quantities. The input vector is generally measurable because of its physical nature: inputs are designed to affect the system, so the extent to which that happens is generally known. In general, the state-space matrix form of a linear, time-invariant, continuous system is denoted as

\dot{x}(t) = Ax(t) + Bu(t)    (1.1)
y(t) = Cx(t) + Du(t)    (1.2)

where ẋ is the state derivative vector, x is the state vector, A is a matrix of coefficients, B is the input allocation matrix, u is the input to the system, y is the measurable output of the system, C is the output matrix, and D is the direct transmission matrix. The situation is more difficult when the output is defined such that C is not an identity matrix, but that may be handled. A special situation where y = x is used here and is called full state feedback. In full state feedback, C is an identity matrix and D is a zero matrix. [3]

In this thesis the following terms are defined as a naming convention to differentiate the input to the system, u, from the matrix that shows how it is distributed to the system.

Definition 1 The allocation matrix, B, is the matrix, when the system is described in state-space matrix form, which is multiplied by the system's input vector.

Definition 2 The state matrix, A, is the matrix, when the system is described in state-space matrix form, which is multiplied by the state vector of the system.
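To make the state-space notation concrete, the following is a minimal simulation sketch of (1.1)-(1.2) in Python; the two-state matrices, input signal, and Euler step size are illustrative assumptions, not values used elsewhere in this thesis.

```python
import numpy as np

# Hypothetical two-state LTI plant: x_dot = A x + B u, y = C x + D u.
A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])   # state matrix (assumed known)
B = np.array([[0.0],
              [1.0]])          # input allocation matrix
C = np.eye(2)                  # full state feedback: y = x
D = np.zeros((2, 1))

def simulate(x0, u_of_t, dt=1e-3, t_end=5.0):
    """Forward-Euler integration of x_dot = A x + B u."""
    steps = int(t_end / dt)
    x = x0.astype(float)
    history = np.zeros((steps, len(x0)))
    for k in range(steps):
        u = u_of_t(k * dt)
        x = x + dt * (A @ x + B @ u)   # Euler step of (1.1)
        history[k] = x
    return history

traj = simulate(np.array([1.0, 0.0]), lambda t: np.array([np.sin(t)]))
print(traj[-1])   # state after 5 s
```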
1.2 Model Reference Adaptive Control

Adaptive control of uncertain systems has been studied and applied in many areas [1, 7, 11, 15, 21, 30]. A widely used approach is MRAC, which drives the instantaneous tracking error to zero. In most MRAC approaches, the input allocation matrix of the plant (e.g., the B matrix in the standard state-space representation ẋ = Ax + Bu) is assumed known, or at least the sign of the diagonal elements of the matrix is assumed known [1, 7, 11, 15, 17, 21, 26, 30]. These methods cannot be used in situations that require adaptation in the presence of uncertain input allocation, such as when controls reverse on a flexible-wing aircraft due to wing twist, or when an autopilot must control an unmanned system whose actuator mappings are unknown.

Authors have studied the problem of uncertain allocation matrices. Lavretsky et al. used a diagonal scaling matrix and adaptive laws to approximate symmetric control effectiveness loss [18]. Somanath showed uncertainties in the allocation matrix could be handled if the allocation matrix was some multiplicative combination of an uncertain matrix and the reference model input allocation matrix [29]; his work on hypersonic vehicles required the reference model input allocation matrix to be in the subspace of the plant allocation matrix. Tao et al. show how to control a multi-input system where some of the actuators become inoperative, stuck at fixed or varying positions, at some unknown time [31]; this does not address the uncertain allocation matrix problem, it just reduces the allocation matrix by the number of stuck actuators. All these works require an assumption that the input allocation matrix be defined as a diagonal scaling matrix multiplied by a known matrix, which is usually the reference model allocation matrix.

The works just discussed all use MRAC formulations with one model, but there are other formulations available. Multiple model adaptive control techniques [14, 22] can handle uncertain input allocation by choosing between candidate models for the B matrix; however, these candidate models need to be available beforehand. The retrospective cost adaptive control method [28, 33] could also potentially handle uncertain input allocation situations, but stability of the system may not be guaranteed while data is being collected for learning. It appears that little work has been done on the general case of controlling a stable system when the input allocation matrix is uncertain.

1.3 Persistency of Excitation

Boyd and Sastry showed that model reference adaptive control needs persistently exciting input in order to drive the adaptive gains, or weights, to their ideal values [2]. With CL-MRAC, by contrast, the adaptive gains are shown to converge if the input signals are exciting over a finite time. Exciting signals can be defined using Tao's definition in [30]: over some interval [t, t + T], where t > t₀ and T > 0, the input signal is exciting if

\int_t^{t+T} u(\tau) u^T(\tau)\, d\tau \geq \lambda I    (1.3)

for some λ > 0, where I is an identity matrix of appropriate dimension. The signal is persistently exciting if (1.3) holds for all t ≥ t₀.

When the adaptive weights attain their ideal values, the tracking error reduces [21]. Without persistently exciting (PE) input, however, the adaptive weights are not guaranteed to converge to their ideal values under traditional gradient-based MRAC update laws, because the tracking error goes to zero. Concurrent learning MRAC (CL-MRAC) is a method that guarantees the adaptive weights converge to the ideal values with only finite excitation [5, 8, 10]. CL-MRAC achieves this by using specifically selected online data concurrently with instantaneous data for adaptation.

The argument can be made that any non-zero signal will fulfill this condition. This is true; what is lacking is the persistent component. With a non-zero but non-persistent signal, the MRAC formulation will drive the tracking error, e(t), to zero, but that is no guarantee that the adaptive weights will converge. In fact, zero tracking error stops adaptation and guarantees the weights will not converge to their ideal values. As an example, a step input would be exciting per the definition in (1.3), but an MRAC system would not converge to its ideal weights; the tracking error e(t) would simply go to zero. An example of this can be seen in Appendix B.2.1 on page 91 with a sinusoidal input to a two-state MRAC system.

Persistently exciting input is a different thing. Just because the input meets the definition in (1.3) does not mean the input is persistently exciting: the signal has to satisfy (1.3) for all time. The term 'persistent' applies to maintaining a non-zero tracking error, e(t). Boyd and Sastry discuss this in terms of the number of spectral lines the input has in [2].
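The excitation definition (1.3) is straightforward to check numerically. The sketch below approximates the excitation integral for a sampled input and reports its minimum eigenvalue; the signals and window length are illustrative assumptions. Note that a positive result certifies excitation over that window only, not persistent excitation.

```python
import numpy as np

def excitation_level(u_samples, dt):
    """Approximate int_t^{t+T} u(tau) u(tau)^T dtau for sampled inputs
    and return its smallest eigenvalue (the lambda in (1.3))."""
    gram = sum(np.outer(u, u) for u in u_samples) * dt
    return np.linalg.eigvalsh(gram).min()

dt = 0.01
t = np.arange(0.0, 2.0, dt)              # a finite window [t, t+T]
u_step = np.c_[np.ones_like(t)]          # step input, m = 1
u_zero = np.c_[np.zeros_like(t)]
print(excitation_level(u_step, dt))      # > 0: exciting over the window
print(excitation_level(u_zero, dt))      # 0: not exciting
```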
The idea is that with enough frequencies in the input, the system's tracking error, e(t), will not converge to zero unless the adaptive weights converge to their ideal values. MRAC is a minimization scheme in this context: with the amount of excitation Boyd and Sastry require, the least tracking error occurs when the adaptive weights equal the ideal weights.

However, existing results in CL-MRAC do not extend to the case when the input allocation matrix is uncertain. Simulation results of CL-MRAC working with an uncertain allocation matrix were presented in [26]; however, that work did not provide rigorous justification of stability when the input allocation is uncertain. In particular, unlike previously studied CL-MRAC approaches which assumed the allocation matrix was known [5, 8], the case of an uncertain input allocation matrix requires that the concurrent learning histories be purged once the estimate of the input allocation matrix converges close to its actual value, as shown in [27], though the stability argument in that paper did not put enough restrictions on the system. A more complete and correct treatment is given here.

1.4 Nussbaum Gains

Nussbaum used a more general formulation in which the input allocation function is unknown [23]. He used an equation set like

\dot{x}(t) = x(t) + \lambda f(x(t), y(t))
\dot{y}(t) = g(x(t), y(t))

where λ is a non-zero real number and f and g are differentiable. He shows there exist specific families of functions for which the proposed system is stable, but in order for the proof to work, Nussbaum needs random input commands to eventually find a value, c, that limits the values of g to be either positive or negative (he shows both cases) for values of x greater than c. Once c has been found, the sign of g is known for values greater than c. The hinge is the random input, though: aerospace applications generally do not react well to random input.

1.5 Eigenvectors and Eigenvalues

Important components of any matrix are its eigenvalues and eigenvectors. Eigenvalues are the roots of a matrix's characteristic polynomial, obtained from |A − λI| = 0, where |·| is the determinant operator, A is the matrix under investigation, I is the identity matrix, and λ is the working variable. So, for every n × n matrix there exists an nth-order characteristic equation, which by the fundamental theorem of algebra has n roots. Once the roots of the characteristic equation are known, the eigenvectors can be found by inspecting A xᵢ = λᵢ xᵢ for xᵢ, i ∈ {1, 2, ..., n}, assuming the eigenvalues are real and distinct. If not, there are methods that will be left to the reader in [3], among other texts. The xᵢ's found following this method are the eigenvectors associated with the ith eigenvalue. Together, the eigenvectors and eigenvalues give the directions (vectors) and the associated growth factors (values) by which a vector is scaled when multiplied by A. [3, Chap. 7]
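As a numeric illustration of the procedure just described, the sketch below computes the eigenvalues and eigenvectors of an arbitrary example matrix and verifies A xᵢ = λᵢ xᵢ for each pair.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])       # arbitrary example matrix

# Roots of the characteristic polynomial |A - lambda I| = 0
# and the associated eigenvectors A x_i = lambda_i x_i.
eigvals, eigvecs = np.linalg.eig(A)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)   # verify each eigenpair
print(eigvals)   # both eigenvalues negative, so this A is Hurwitz
```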
CHAPTER 2
Model Reference Adaptive Control

2.1 Introduction

This work presents a hybrid MRAC method that directly uses an online-updated estimate of the uncertain allocation matrix in the adaptation laws. An outline of Model Reference Adaptive Control is presented first. Then Concurrent Learning Model Reference Adaptive Control is shown, and a discussion of how the CL-MRAC process was changed to be able to identify an input allocation matrix is reported. Combined variables and projection operators are the final segments of the chapter.

2.2 Classic Equations

Let D ⊂ Rⁿ be compact and let x(t) ∈ D be the state vector of the system. Consider a linear, time-invariant dynamical system of the form

\dot{x}(t) = Ax(t) + Bu(t)    (2.1)

where A is the known state matrix and B is a known allocation matrix. Equation (2.1) is called the plant. The linear reference model is of the form

\dot{x}_{rm}(t) = A_{rm} x_{rm}(t) + B_{rm} r(t)    (2.2)

where A_rm is chosen to be Hurwitz, B_rm is chosen to be an identity matrix, and the reference model input is

r(t) = -B_{rm}^{\dagger} A_{rm} x_{des}(t)    (2.3)

where † denotes the pseudoinverse operation of [24] (Tikhonov regularization [32] may be used to guarantee the inverse exists). The desired state, x_des, is a bounded signal. The control law applied to the plant is defined as the following, per [1, 30]:

u(t) = K^T(t) x(t) + K_r^T(t) r(t)    (2.4)

where the first term on the right-hand side is the state feedback and the second is the reference-input feedforward term. Defining the error as

e(t) = x(t) - x_{rm}(t),    (2.5)

it can be shown (see [1, 10, 30]) that the time derivative of the tracking error is

\dot{e}(t) = A_{rm} e(t) + B\tilde{K}^T(t) x(t) + B\tilde{K}_r^T(t) r(t)    (2.6)

where

\tilde{K}(t) = K(t) - K^*    (2.7)
\tilde{K}_r(t) = K_r(t) - K_r^*    (2.8)

are the vanishing weights. This progression is also shown in detail in Appendix A.3 on page 70. The starred parameters come from the assumption of matched uncertainty, which guarantees the existence of ideal constant gains, K* and K*_r.

Assumption 1 (Matching Conditions) There exist two constant, non-zero matrices, K* and K*_r, such that A + BK*ᵀ = A_rm and BK_r*ᵀ = B_rm.

Using (2.1), (2.2), (2.4), and the matching conditions, the state derivative of the plant can be represented as

\dot{x}(t) = A_{rm} x(t) + B_{rm} r(t) + B\tilde{K}^T(t) x(t) + B\tilde{K}_r^T(t) r(t)    (2.9)

which is derived in detail in Appendix A.1 on page 66. The adaptation laws for K and K_r are

\dot{K}(t) = -\Gamma_x x(t) e^T(t) P B    (2.10)
\dot{K}_r(t) = -\Gamma_r r(t) e^T(t) P B    (2.11)

where P comes from the Lyapunov equation. Let P, Q ∈ R^{n×n} be positive definite matrices such that the Lyapunov equation holds:

A_{rm}^T P + P A_{rm} = -Q.    (2.12)

Then (2.10) and (2.11) are also the derivatives of K̃ and K̃_r, because from (2.7) and (2.8), K̃ and K̃_r are linear combinations of K and K*, and of K_r and K*_r. Since K* and K*_r are both constant matrices, their derivatives are 0, leaving just the derivatives of K and K_r as shown in (2.10) and (2.11).
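The update laws (2.10)-(2.11) translate nearly line for line into code. The following is a minimal MRAC loop for a hypothetical two-state plant with B known, as assumed in this section; all matrices, learning rates, and the reference command are illustrative assumptions, not values from the simulations of Chapter 6.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical plant (2.1) and reference model (2.2); B is known here.
A = np.array([[0.0, 1.0], [4.0, -1.0]])       # unstable open loop
B = np.array([[1.0, 0.0], [0.5, 1.0]])        # square allocation matrix
A_rm = np.array([[0.0, 1.0], [-4.0, -2.0]])   # Hurwitz by choice
B_rm = np.eye(2)

# P from the Lyapunov equation (2.12): A_rm^T P + P A_rm = -Q
Q = np.eye(2)
P = solve_continuous_lyapunov(A_rm.T, -Q)

Gx, Gr = 10.0, 10.0          # learning rates Gamma_x, Gamma_r
K = np.zeros((2, 2))         # adaptive gains for u = K^T x + K_r^T r
Kr = np.zeros((2, 2))

dt = 1e-3
x = np.zeros(2)
x_rm = np.zeros(2)
for k in range(int(20.0 / dt)):
    t = k * dt
    r = np.array([np.sin(t), np.cos(0.5 * t)])   # exciting reference command
    u = K.T @ x + Kr.T @ r                       # control law (2.4)
    e = x - x_rm                                 # tracking error (2.5)
    # adaptation laws (2.10)-(2.11)
    K += dt * (-Gx * np.outer(x, e) @ P @ B)
    Kr += dt * (-Gr * np.outer(r, e) @ P @ B)
    # propagate plant (2.1) and reference model (2.2)
    x = x + dt * (A @ x + B @ u)
    x_rm = x_rm + dt * (A_rm @ x_rm + B_rm @ r)
print(np.linalg.norm(x - x_rm))   # tracking error after 20 s (small once adapted)
```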
2.3 Concurrent Learning Model Reference Adaptive Control

CL-MRAC differs from MRAC in that concurrent learning operates on an estimate of the weight error concurrently with the tracking error. The regressor terms for the jth recorded data point for adaptive gains K and K_r are:

\epsilon_{K_j} = \tilde{K}^T x_j = K^T x_j - \delta_{K_j}    (2.13)
\delta_{K_j} = K^{*T} x_j = K^T x_j - B^{\dagger}(\dot{x}_j - A_{rm} x_j - B_{rm} r_j - B\epsilon_{K_r j})    (2.14)
\epsilon_{K_r j} = \tilde{K}_r^T r_j = K_r^T r_j - \delta_{K_r j}    (2.15)
\delta_{K_r j} = K_r^{*T} r_j = B^{\dagger} B_{rm} r_j    (2.16)

Along with δ_{K j} and δ_{Kr j}, concurrent learning also stores x_j and r_j at that time. Note that the evaluation of these regressors requires an estimate of ẋ_j for a recorded data point. The estimate can be computed using fixed-point smoothing. This method has been validated through several flight tests to yield acceptable results (see [6, 8-10]); furthermore, [20] shows that the CL-MRAC framework is robust to noise in estimating ẋ_j. Also, ẋ_j is a stored value, which means there is time to use Kalman filtering to improve the estimate of ẋ_j before using it, if so desired. In this work, zero-mean, truncated, white Gaussian noise was added to ẋ to simulate measurement error.

The regressor in (2.13) comes from rearranging (2.9), and the regressor in (2.15) comes from the matching condition (Assumption 1) that defined K*_r together with the definition of K̃_r, (2.8). Multiplying by x_j and r_j respectively compresses the data into an n × 1 vector for storage in the history stack. The regressor terms in (2.13) and (2.15) are summed and used in the update laws for K and K_r in CL-MRAC (see [4, 26]). The CL-MRAC update laws are

\dot{K}(t) = -\Gamma_x \left( x(t) e^T(t) P B + \sum_{j=1}^{p_{max}} x_j \epsilon_{K_j}^T \right)    (2.17)
\dot{K}_r(t) = -\Gamma_r \left( r(t) e^T(t) P B + \sum_{j=1}^{p_{max}} r_j \epsilon_{K_r j}^T \right)    (2.18)

where p_max is the maximum number of data points to be stored. The error terms (the summations in (2.17) and (2.18)) are part of CL-MRAC and would not be present in MRAC update laws.

As a counterpoint to requiring a highly oscillatory input per [2], concurrent learning may be used with step inputs or single sinusoids. As long as the input is exciting per (1.3) over some (short) finite time during which concurrent learning may record data, that data can be used along with the normal MRAC adaptation law, as in (2.17) and (2.18), to drive the adaptive weights to their ideal values per the following theorem.

Theorem 1 Consider the system in (2.1), the control law of (2.4), and let p ≥ n be the number of recorded data points. Let X_k = [x₁, x₂, ..., x_p] be the history stack matrix containing recorded states, and R_k = [r₁, r₂, ..., r_p] be the history stack matrix containing recorded reference signals. Assume that over a finite interval [0, T] the exogenous reference input r(t) is exciting, that the history stack matrices are empty at t = 0, and that they are consequently updated using Algorithm 1 of [10], the Concurrent Learning History Stack Minimum Eigenvalue Maximization (CL-HSMEM) routine. Then the concurrent learning weight update laws of (2.17) and (2.18) guarantee that the zero solution [e(t), K̃(t), K̃_r(t)] ≡ 0 is globally exponentially stable.

Proof. See [10] for the proof.

An example of concurrent learning on a two-state system with step inputs is shown in Appendix B.2.2 on page 92.

2.4 CL-MRAC with Uncertain Allocation Matrix

In order to describe the method for uncertain input allocation CL-MRAC, the framework of CL-MRAC in section 2.3 must be expanded to estimate the input allocation uncertainty. The classical implementation of MRAC from section 2.2 would not operate consistently with an uncertain allocation matrix, because the adaptive laws, grounded in the Lyapunov function for the system, assume the sign (and magnitude) of the input allocation matrix. The wrong sign would drive the adaptation in the opposite direction, causing divergence of parameters and instability. CL-MRAC will bound parameter growth ([10, 26]), but the uncertain B matrix is still not tackled.

Since the input allocation matrix, B, is uncertain, the controller uses an internal estimate of B, denoted B̂(t). Then B̂(t) is related to B by the following:

\tilde{B}(t) \equiv \hat{B}(t) - B    (2.19)

where B̃(t) is not directly measurable. The regressor for B̂(t)u(t) is defined as ẋ(t) − Ax(t), with the following assumption.

Assumption 2 The state matrix, A, is assumed to be known.

The knowledge of A is a restrictive assumption; however, there is a lack of results on the topic of adaptive control with uncertain input allocation even with this assumption. Future work will try to relax Assumption 2 along the lines of the empirical evidence presented in [26].
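The data selection underlying the history stacks can be sketched as follows. This greedy routine, which admits a candidate point only if it raises the stack's minimum singular value, is an illustrative stand-in for the CL-HSMEM routine of [10] referenced in Theorem 1, not that exact algorithm.

```python
import numpy as np

class HistoryStack:
    """Fixed-size store of regressor columns for concurrent learning.
    A candidate replaces a stored column only if it raises the minimum
    singular value of the stack (a greedy CL-HSMEM-like rule)."""
    def __init__(self, dim, p_max):
        self.X = np.zeros((dim, 0))
        self.p_max = p_max

    @staticmethod
    def _min_sv(X):
        return np.linalg.svd(X, compute_uv=False).min() if X.size else 0.0

    def consider(self, col):
        col = col.reshape(-1, 1)
        if self.X.shape[1] < self.p_max:            # still filling the stack
            cand = np.hstack([self.X, col])
            if self.X.shape[1] == 0 or self._min_sv(cand) > 1e-9:
                self.X = cand
            return
        best, best_j = self._min_sv(self.X), None
        for j in range(self.p_max):                 # try swapping each column
            trial = self.X.copy()
            trial[:, j:j+1] = col
            sv = self._min_sv(trial)
            if sv > best:
                best, best_j = sv, j
        if best_j is not None:
            self.X[:, best_j:best_j+1] = col

stack = HistoryStack(dim=2, p_max=4)
for t in np.arange(0.0, 3.0, 0.05):
    stack.consider(np.array([np.sin(t), np.cos(2 * t)]))
print(HistoryStack._min_sv(stack.X))   # > 0 once the stack is full rank
```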
To utilize B̂(t), equations (2.6), (2.9), and (2.15)-(2.16) are rewritten by solving (2.19) for B and inserting, as follows:

\dot{x} = A_{rm} x + B_{rm} r + \hat{B}\tilde{K}^T x - \tilde{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}\tilde{K}_r^T r    (2.20)
\dot{e} = A_{rm} e + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}(\tilde{K}^T x + \tilde{K}_r^T r)    (2.21)
\hat{\delta}_K = K^{*T} x + \hat{B}^{\dagger}\tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x) = K^T x - \hat{B}^{\dagger}(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r)    (2.22)
\hat{\delta}_{K_r} = K_r^{*T} r - \hat{B}^{\dagger}\tilde{B} K_r^{*T} r = \hat{B}^{\dagger} B_{rm} r    (2.23)
\hat{\epsilon}_K = \tilde{K}^T x_j = K^T x_j - \hat{\delta}_{K_j}    (2.24)
\hat{\epsilon}_{K_r} = \tilde{K}_r^T r_j = K_r^T r_j - \hat{\delta}_{K_r j}    (2.25)

The full derivations can be found in Appendix A.1. Again, (2.22) is derived from (2.20). Note that δ̂_K has elements of B̃ stored in it under this derivation. This situation will remain, but with concurrent learning, B̃ can be forced to become small. In the same way, (2.23) also stores more than estimates of K*_r r; it includes elements of B̃ as well. The error regressor for B̂ is

\epsilon_B = \hat{B}(t) u(t) - \dot{x}(t) + A x(t)    (2.26)

(or B̃u in expanded form), which is then used in concurrent learning for B̂. The update law for B̂(t) is chosen as follows:

\dot{\hat{B}}(t) = -\Gamma_B \left[ \left( \hat{B}(t)u(t) - \dot{x}(t) + Ax(t) \right) u^T(t) + \sum_{i=1}^{p_{max}} \epsilon_{B_i} u_i^T \right] = -\Gamma_B \left( \tilde{B}(t) u(t) u^T(t) + \sum_{i=1}^{p_{max}} \tilde{B}(t) u_i u_i^T \right)    (2.27)

where B̃(t) is defined in (2.19). And (2.17) and (2.18) are rewritten using the estimator, B̂, instead of B as

\dot{K}(t) = -\Gamma_x \left( x(t) e^T(t) P \hat{B}(t) + \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K_i}^T \right)    (2.28)
\dot{K}_r(t) = -\Gamma_r \left( r(t) e^T(t) P \hat{B}(t) + \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r j}^T \right)    (2.29)

because these use terms the controller knows. As in [4, 8, 26], the history stacks of the concurrent learning mechanism are populated while the input signal is exciting per the definition given in (1.3), up to a given number of points, p_max. The CL-HSMEM routine of [10] is used to select and replace future data points in the history stack.

Remark 2.1 Solving for δ̂_K and δ̂_{Kr} is the only place the controller inverts the B̂ matrix. Singularities may be avoided here by placing a dead-zone around zero for B̂ and turning off the CL-HSMEM routine until the norm of B̂ moves away from zero.

Assumption 3 The input allocation matrix, B, is uncertain except for its dimension.

The uncertainty may include the signs of the elements of B, but knowledge of the dimension of B is necessary because it affects the dimensions of K* and K*_r, and therefore of K and K_r. A completely different result would be found if the input allocation matrix were a vector instead of a square matrix (or non-square). Also, knowing whether the system is or is not multi-input is rather straightforward.

2.5 Combining Variables

For ease of exposition, several matrices will be concatenated together. W, W*, W̃, and Ẇ, along with σ and ε̂_W, are now defined.

Definition 3 Let W be a 2n × n matrix of the form:

W \equiv \begin{bmatrix} K \\ K_r \end{bmatrix}.    (2.30)

Using (2.30) will reduce the number of terms in the results in later chapters. Definition 3 subsequently requires the definition of the following:

Definition 4 (W*, W̃, and Ẇ)

W^* \equiv \begin{bmatrix} K^* \\ K_r^* \end{bmatrix}    (2.31)
\tilde{W} \equiv \begin{bmatrix} \tilde{K} \\ \tilde{K}_r \end{bmatrix}    (2.32)
\dot{W} \equiv \begin{bmatrix} \dot{K} \\ \dot{K}_r \end{bmatrix}    (2.33)

where (2.33) refers to (2.17) and (2.18). The state and reference input shall be combined as well.

Definition 5 Let σ be a 2n × 1 matrix of the form:

\sigma \equiv \begin{bmatrix} x \\ r \end{bmatrix}.    (2.34)

This is necessary for use with W.

Definition 6 Let ε̂_{W i} be a 2n × 1 matrix of the form:

\hat{\epsilon}_{W_i} \equiv \begin{bmatrix} \hat{\epsilon}_K \\ \hat{\epsilon}_{K_r} \end{bmatrix}    (2.35)

where ε̂_K refers to (2.24) and ε̂_{Kr} refers to (2.25).
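The estimator (2.26)-(2.27) can be exercised in isolation. The sketch below adapts B̂ for a hypothetical plant with known A, storing input/derivative pairs so the recorded regressors can be re-evaluated with the current estimate; the matrices, gain, and recording rule are illustrative assumptions, and ẋ is treated as measured.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -1.0]])      # known state matrix (Assumption 2)
B_true = np.array([[1.0, 0.0], [0.3, -1.0]])  # unknown to the estimator
B_hat = np.zeros((2, 2))                      # controller's estimate of B
Gamma_B = 2.0
stored = []                                   # (u_i, xdot_i - A x_i) pairs

dt, x = 1e-3, np.zeros(2)
for k in range(int(10.0 / dt)):
    t = k * dt
    u = np.array([np.sin(t), np.cos(3.0 * t)])
    xdot = A @ x + B_true @ u                 # plant; xdot assumed measurable
    if k % 1000 == 0 and len(stored) < 10:    # crude recording rule for X_Bhat
        stored.append((u.copy(), xdot - A @ x))
    # update law (2.27): instantaneous regressor (2.26) plus recorded regressors,
    # each epsilon_B,i re-evaluated with the current estimate B_hat
    dB = np.outer(B_hat @ u - xdot + A @ x, u)
    for ui, yi in stored:
        dB += np.outer(B_hat @ ui - yi, ui)
    B_hat -= dt * Gamma_B * dB
    x = x + dt * xdot
print(np.round(B_hat, 2))   # approaches B_true as the recorded data becomes rich
```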
2.6 The Projection Operator

One of the many methods to increase the robustness of a solution is to add a projection operator of the form found in [19, 25] to the update law. A projection operator works by creating a ball inside which the weights are allowed to move freely; if the weight vector approaches the boundary of the specified ball, the projection operator begins to remove the component of the weight update that is perpendicular to the boundary. The amount removed increases as the boundary is approached, and by the time the weights reach the boundary, all of the perpendicular component of the update has been removed, so that the weight vector is projected parallel to the boundary. This is shown in Figure 2.1. [19]

Figure 2.1: Projection Operator in the Weight Space, adapted from [19]. Any θ such that f(θ) ≤ 1 is an allowable value, with Φ_T delineated by f(θ) = 1 and Φ₀ by f(θ) = 0. If the weights arrive at the outer boundary moving along y, the projection operator redirects y to become parallel to the boundary, Proj(θ, y, f), by removing the ∇f component from y.

The function f : Rⁿ → R is used to smoothly transition from not projecting at all to fully projecting the vector, y, parallel to the boundary where f(θ) = 1. For this thesis, the function is of the form

f(\theta, \theta_{Tstart}, \theta_{Tend}) = \frac{\|\theta\|_F^2 - \theta_{Tstart}^2}{2\theta_{Tend}\theta_{Tstart} + \theta_{Tend}^2}    (2.36)

where θ_Tstart is the dashed line shown in Figure 2.1 at which the projection operator begins to affect the result, and θ_Tend is the additional width from θ_Tstart to the maximum value that should be allowed by the projection operator. Thus, when ‖θ‖ = θ_Tstart + θ_Tend, the entire component of the update perpendicular to the boundary, delineated by f(θ) = 1, has been removed and the transition is complete. ‖·‖_F is the Frobenius norm for matrices and the 2-norm for vectors, as defined in [30]. If θ_Tstart and θ_Tend have been pre-specified, then (2.36) may be denoted f(θ). For completeness, the gradient of f(θ), where θ is a vector, has components

\frac{\partial f}{\partial \theta_i} = \frac{\partial}{\partial \theta_i}\left[\frac{\sum_j \theta_j^2 - \theta_{Tstart}^2}{2\theta_{Tend}\theta_{Tstart} + \theta_{Tend}^2}\right] = \frac{2\theta_i}{2\theta_{Tend}\theta_{Tstart} + \theta_{Tend}^2}.    (2.37)

Thus, a transition set of values of θ where the projection operator is active is created:

\Phi_1 = \{\theta \mid f(\theta) \leq 1\}    (2.38)
\Phi_0 = \{\theta \mid f(\theta) \leq 0\}    (2.39)
\Phi_T = \Phi_1 \setminus \Phi_0    (2.40)

All values of θ in set Φ₁ are allowable, as are those in set Φ₀; the point here is to define one region where the projection operator does nothing and another region, Φ_T, where the projection operator does its job.

Definition 7 (Projection Operator) The projection operator is then defined as

Proj(\theta, y, f) = \begin{cases} y - \dfrac{\nabla f(\theta)(\nabla f(\theta))^T y}{\|\nabla f(\theta)\|^2} f(\theta) & f(\theta) > 0 \wedge y^T \nabla f(\theta) > 0 \\ y & \text{otherwise} \end{cases}    (2.41)

where θ is the vector the operator is limiting to within the set defined by the function f per (2.36), and y is the vector which defines the growth of θ.

The preceding definition is for the case of two vectors, θ and y. To generalize the operator to the matrix case, the projection operator is defined as

Proj(\Theta, Y, F) = [Proj(\theta_1, y_1, f(\theta_1)), \ldots, Proj(\theta_m, y_m, f(\theta_m))]    (2.42)

where Θ, Y ∈ R^{n×m}, given Θ = [θ₁, ..., θ_m], Y = [y₁, ..., y_m], and F = [f(θ₁), ..., f(θ_m)], with j = 1 to m. If more control of the extents of the boundary is needed, the matrix case could be redefined with an indexed list of bounds for each column of the matrix, so that each column's norm is kept within the indexed values, like the format in [19]. The Gamma Projection operator is also developed in [19] to handle the case when the learning-rate gain matrix for MRAC is not of the form Γ = λI.
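Definition 7, together with (2.36) and (2.37), translates directly into code for the vector case; the bounds in the example below are illustrative.

```python
import numpy as np

def f(theta, t_start, t_end):
    """Transition function (2.36); f <= 0 inside the inner ball,
    f = 1 on the outer boundary ||theta|| = t_start + t_end."""
    return (theta @ theta - t_start**2) / (2.0 * t_end * t_start + t_end**2)

def grad_f(theta, t_start, t_end):
    """Gradient (2.37): 2 theta / (2 t_end t_start + t_end^2)."""
    return 2.0 * theta / (2.0 * t_end * t_start + t_end**2)

def proj(theta, y, t_start, t_end):
    """Projection operator (2.41): scale away the component of y along
    grad f as the boundary is approached."""
    fv = f(theta, t_start, t_end)
    g = grad_f(theta, t_start, t_end)
    if fv > 0.0 and y @ g > 0.0:
        return y - (np.outer(g, g) @ y / (g @ g)) * fv
    return y

theta = np.array([2.9, 0.0])   # just inside the outer bound of 3.0
y = np.array([1.0, 1.0])       # raw update pushing outward
print(proj(theta, y, t_start=2.0, t_end=1.0))
# outward component mostly removed; tangential component kept
```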
2.7 Derivatives with the Projection Operator

Using the operator explained in section 2.6, the derivatives for both W and B̂ are converted to the following:

\dot{W} = Proj\left(W,\ -\Gamma_W \left( \sigma e^T P \hat{B} + \sum_{i=1}^{p_{max}} \sigma_i \hat{\epsilon}_{W_i}^T \right),\ f(W, \omega_b, \omega_{max})\right)    (2.43)
\dot{\hat{B}} = Proj\left(\hat{B},\ -\Gamma_B \left( \tilde{B} u u^T + \sum_{k=1}^{p_{max}} \tilde{B} u_k u_k^T \right),\ f(\hat{B}, \beta_b, \beta_{max})\right)    (2.44)

where ω_b is the norm value that begins the transition of the projection operator for the weights, ω_max is the maximum norm value allowed for the weights, and β_b and β_max are, respectively, the beginning and maximum norm values allowed for B̂.

2.8 Summary

Model Reference Adaptive Control has been reviewed, along with the changes necessary to add concurrent learning to the update laws and the changes necessary to estimate an uncertain input allocation matrix. Then projection operators were discussed as a method to add robustness to a system. Finally, the projection operator was used to define the derivatives for the concurrent learning model reference adaptive controller.

CHAPTER 3
Error in the History Stack

3.1 Introduction

The mechanism of concurrent learning will store errors, defined in section 2.4, in the history stack if the data is linearly independent. This chapter discusses the need for shocking the adaptive weight history stack to remove those accumulated errors.

3.2 Shocking

The concurrent learning history stacks are empty at t = t₀ and are filled with data per the CL-HSMEM routine explored in [10]. The following lemma shows that B̃ goes to zero.

Lemma 1 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), the weight update law of (2.44), and Assumption 2. Then B̃(t) → 0 exponentially as t → ∞.

Proof. From Theorem 1, let (2.26) be the reference signals, R_k, and let u be used in place of the state vector for X_k; the result is then straightforward.

Hence, by Lemma 1, there exists a time, t_s > 0, such that ‖B̃(t_s)‖ ≤ ε₀, where ε₀ is a small positive constant. Before t_s, the data stored in the history stack consists of estimates of W* formed by using (2.22) and (2.23) together. However, when these estimates were recorded, the B̃ term was large, causing the stored values to be very different from the expected K* and K*_r. As the theoretical results show later, this incorrect data in the W history stack helps ensure that the system response stays bounded, but W will not converge to its ideal values as long as the incorrect data remains in the stack. Therefore, the stack for W must be shocked, or purged, to remove this incorrect data and allow collection of new data while the estimate, B̂, is closer to the actual allocation matrix. Hence, the condition for shocking the stack becomes ‖B̃(t)‖ ≤ ε₀. Since B̃(t) is assumed not to be directly measurable, three methods to estimate it are presented: a heuristic (Algorithm 1), hypothesis testing (Algorithm 2), and investigating the variance of the expectation of the regressor (Algorithm 3).

Algorithm 1: Heuristic, Time Based Method
Require: x(t), x_rm(t), B̂(t), u(t), dt
  ẋ(t) ⇐ plant (2.1)
  ẋ_rm(t) ⇐ model (2.2)
  \dot{\hat{B}}(t) ⇐ control law (2.44)
  if ‖\dot{\hat{B}}(t)‖ < ε₀ then    ▷ Choose ε₀ to be small
    Step counter, cnt
  end if
  if cnt · dt == 1 sec then    ▷ Heuristic
    Purge history stack for W
  end if

3.2.1 Heuristic Algorithm

Algorithm 1 estimates B̃(t) by counting the iterations for which the norm of \dot{\hat{B}}(t) stays below ε₀. When \dot{\hat{B}} has been small for a number of iterations that accounts for a second of time, the algorithm shocks the history stack.
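A sketch of Algorithm 1's counting logic in isolation follows; ε₀ and the step size are illustrative, and the one-second window follows the listing. The caller supplies ‖\dot{\hat{B}}(t)‖ each iteration and purges X_W and R_W when the method returns true.

```python
class HeuristicShock:
    """Algorithm 1: count control-law iterations where ||Bhat_dot|| < eps0
    and signal a purge of the W history stack after one second's worth."""
    def __init__(self, dt, eps0=1e-4):
        self.dt, self.eps0, self.cnt = dt, eps0, 0

    def step(self, bhat_dot_norm):
        if bhat_dot_norm < self.eps0:   # "Step counter, cnt"
            self.cnt += 1
        # "if cnt * dt == 1 sec then purge"
        if abs(self.cnt * self.dt - 1.0) < 0.5 * self.dt:
            return True                 # caller purges X_W and R_W
        return False

shock = HeuristicShock(dt=1e-3)
purged = [k for k in range(2000) if shock.step(5e-5)]
print(purged)   # fires once, after one second of a quiet estimate
```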
While Algorithm 1 is simple to implement, it needs to be told by an outside source that the allocation matrix has changed. The algorithm is not robust to multiple changes of the allocation matrix without outside information.

3.2.2 Hypothesis Testing Algorithm

To address the limitation of Algorithm 1, a hypothesis test is used in Algorithm 2 to detect changes in the B matrix relative to B̂, based on a rolling set of 2-norms of the expectations of

\tilde{B}(t)u(t) \approx \hat{\dot{x}}(t) - \dot{x}(t)    (3.1)

where \hat{\dot{x}}(t) = Ax(t) + \hat{B}(t)u(t), so that, apart from measurement errors, (3.1) is a good approximation of the effect of B̃. A rolling set is a data-smoothing construct holding a fixed number of elements which are replaced, oldest first, by new elements. Using only p_max elements in the rolling set, the set is updated whenever ‖u(t)‖ is greater than zero; otherwise, the set could artificially go to zero. The check to purge the history stacks is executed every iteration because, due to Lemma 1, concurrent learning drives B̃(t) toward zero continuously.

Algorithm 2: Hypothesis Test on Expectation of (\hat{\dot{x}} − ẋ)
Require: x(t), x_rm(t), B̂(t), u(t)
  ẋ(t) ⇐ plant (2.1), ẋ_rm(t) ⇐ model (2.2), and \dot{\hat{B}}(t) ⇐ control law (2.44)
  \hat{\dot{x}}(t) ⇐ Ax(t) + B̂(t)u(t)
  if ‖u(t)‖₂ > 0 then
    xrollmean ⇐ (1/p_max) Σᵢ^{p_max} (\hat{\dot{x}}ᵢ − ẋᵢ)(\hat{\dot{x}}ᵢ − ẋᵢ)ᵀ
    Add (\hat{\dot{x}} − ẋ)(\hat{\dot{x}} − ẋ)ᵀ − xrollmean to rolling list, xroll
    xrollave ⇐ mean(xroll)
    xrollS ⇐ sqrt(var(xroll))
  end if
  Expected-ẋ ⇐ (\hat{\dot{x}} − ẋ)(\hat{\dot{x}} − ẋ)ᵀ − xrollmean
  UCL ⇐ 4.781 · xrollS / √p_max    ▷ 99.99% Upper Control Limit
  Tstat ⇐ ‖Expected-ẋ‖₂ · √p_max / xrollS    ▷ Statistic
  if Tstat < UCL then
    Purge history stack for W
  end if

Algorithm 2 estimates B̃(t) with (\hat{\dot{x}}(t) − ẋ(t))(\hat{\dot{x}}(t) − ẋ(t))ᵀ, since \hat{\dot{x}}(t) − ẋ(t) = (B̂(t) − B)u(t) = B̃(t)u(t). The upper control limit of the hypothesis test changes with the mean and standard deviation of the rolling set. The statistic depends on the current measurement as well as the standard deviation: it is built from the difference between the plant and the controller's estimate of the plant. The outer product of this difference is normed to return a positive number, so the statistic is positive semi-definite. The variance approaches zero faster due to being squared. Unlike Algorithm 1, if the allocation matrix changes again, the difference \hat{\dot{x}}(t) − ẋ(t) will be non-zero and the control limit will increase, as will the statistic, automatically allowing successive changes in the B matrix to be detected.

Algorithm 3: Variance of Average Allocation Matrix Estimate Error
Require: x(t), x_rm(t), B̂(t), u(t)
  ẋ(t) ⇐ plant (2.1), ẋ_rm(t) ⇐ model (2.2), and \dot{\hat{B}}(t) ⇐ control law (2.44)
  \hat{\dot{x}}(t) ⇐ Ax(t) + B̂(t)u(t)
  if ‖u(t)‖₂ > 0 then
    xrollmean ⇐ (1/p_max) Σᵢ^{p_max} (\hat{\dot{x}}ᵢ − ẋᵢ)(\hat{\dot{x}}ᵢ − ẋᵢ)ᵀ
    Add (\hat{\dot{x}} − ẋ)(\hat{\dot{x}} − ẋ)ᵀ − xrollmean to rolling list, xSnroll
    xSnrollstd ⇐ sqrt(var(xSnroll))
  end if
  if xSnrollstd < tol then    ▷ small tolerance
    Purge history stack for W
    Set flag to indicate that stack has been purged
  end if
  if flag set then
    if xSnrollstd > tol2 then
      Reset flag to indicate that the stack can be purged again
    end if
  end if
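Algorithms 2 and 3 share the same rolling set of de-meaned outer products. The sketch below implements that bookkeeping together with Algorithm 2's statistic; reducing the matrix samples to scalars with norms, the window size, and the demonstration signal are implementation assumptions beyond what the listings fix.

```python
import numpy as np
from collections import deque

class RollingShockTest:
    """Sketch of the rolling-set test of Algorithm 2 (Algorithm 3 reuses
    the same rolling set but thresholds its standard deviation)."""
    def __init__(self, p_max=20):
        self.roll = deque(maxlen=p_max)   # oldest elements drop automatically
        self.p_max = p_max

    def step(self, xdot_hat, xdot, u):
        d = xdot_hat - xdot               # equals Btilde u, per (3.1)
        outer = np.outer(d, d)
        if np.linalg.norm(u) > 0.0:       # only update while input is active
            mean = np.mean(self.roll, axis=0) if self.roll else np.zeros_like(outer)
            self.roll.append(outer - mean)
        if len(self.roll) < self.p_max:
            return False
        norms = [np.linalg.norm(m) for m in self.roll]
        s = max(np.std(norms), 1e-12)             # guard the division
        ucl = 4.781 * s / np.sqrt(self.p_max)     # 99.99% upper control limit
        tstat = norms[-1] * np.sqrt(self.p_max) / s
        return tstat < ucl                # True: purge the W history stack

test = RollingShockTest()
rng = np.random.default_rng(0)
decisions = [test.step(1e-6 * rng.standard_normal(2), np.zeros(2), np.ones(2))
             for _ in range(100)]
print(decisions[-1])   # whether a purge is signaled depends on the noise scale
```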
3.2.3 Variance of Average Allocation Matrix Estimate Error

Algorithm 3 is defined in order to shock the stack less often than Algorithm 2. It focuses on the standard deviation of the expectation of ẋ(t) − Ax(t), reasoning that when ‖B̃(t)‖ < ε₀, then sqrt(var((\hat{\dot{x}}(t) − ẋ(t))(\hat{\dot{x}}(t) − ẋ(t))ᵀ)) will be small. Therefore, two tolerances are given. The first, tol, is small, around 10⁻⁸, and is used to verify that the history stack should be shocked; the second tolerance, tol2, is much larger (≈ 10⁻³) and is used to indicate that the difference between \hat{\dot{x}}(t) and ẋ(t) has increased. In this way, the algorithm allows multiple changes to the allocation matrix to be detected. While the selection of tolerances is arbitrary, using the variance of the expectation of the difference between \hat{\dot{x}}(t) and ẋ(t) gives a measure of how consistently the expectation is near zero.

3.3 Summary

In this chapter, retained errors were discussed as part of concurrent learning. Those errors are present due to the uncertainty of the B matrix, and three methods were presented to remove them from the concurrent learning history stack. Algorithm 1 uses a simple time-counting scheme, Algorithm 2 uses a hypothesis test to decide whether to shock the stack, and Algorithm 3 looks at the variance of the average allocation matrix estimate error to decide if it is time to shock the stack.

CHAPTER 4
Stability

4.1 Introduction

Lyapunov stability theory is used to demonstrate the local boundedness of the zero solution. This chapter proceeds through a sequence of events, shown notionally in Figure 4.1. The system state begins to grow in response to input while the estimators for B and W* are wrong. Shortly thereafter, X_B̂ becomes full rank at time t_f, and B̃ begins to shrink exponentially following Lemma 1. W̃ continues to grow at first because of the large values of B̃ stored in the history stack. Because W is growing, ‖x‖ grows through the control law. Then ‖B̃‖ is found to be less than ε₀ and the shocking method shocks X_W. A little time passes while X_W again becomes full rank after t_s, and B̃ continues to decrease. Then W̃, too, shrinks, which pulls ‖x‖ back as well.

First, system boundedness before the B̂ history stack is full rank is presented. Then system boundedness after X_B̂ becomes full rank is shown. After that, a theorem combines the first two results. Then the ultimate boundedness of the system is described, and finally the convergence rate is discussed.

In this thesis, vec(·) is the vectorize operation, in which the columns of a matrix are stacked one upon the next. The function λ_max(·) returns the maximum eigenvalue of its argument, while λ_min(·) returns the minimum eigenvalue of its argument. Also, Algorithm 1 of [10], the concurrent learning history stack minimum eigenvalue maximization (CL-HSMEM) routine, is used throughout this chapter.

Figure 4.1: Notional Representation of System State, B̃, and W̃ with Time. Time is along the bottom, and representations of the sets containing ∫ẋ(τ)dτ, B̃, and W̃ are pictured above that axis. Time t_s is when the history stack shock occurs.

4.2 Boundedness without a Full Rank History Stack

The problem of showing that there is a limited amount of growth during the time required to select a full rank history stack is evaluated first. Define t_f as the time when the history stack for B̂, X_B̂, becomes full rank. This controller is only for use on inherently stable platforms, like some fixed-wing aircraft and unmanned aerial vehicles: the system must be able to sustain itself during the time between t₀, initialization, and t_f. The following theorem shows that the integral of ‖ẋ‖ over a fixed interval of time is finite because of the assumption of a linear plant and the use of projection operators.
Theorem 2 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), the weight update laws of (2.44) and (2.43), and let p_max be an integer such that p_max > n, where n is the number of states in the system and p_max is the maximum number of recorded data points. Assume the ideal values, W*, and actual system parameters, B, are within Φ_{0,W} and Φ_{0,B}, respectively, per (2.39), for the projection operators used, and that W(0) and B̂(0) are within those sets as well. Let the norm bounds for the projection operators be chosen such that b_max ≥ 2‖P‖, where P is selected with the Lyapunov equation, and w_max ≥ 2‖W*‖ > 2. From Assumption 3, B is uncertain. Let X_B̂ = [u₁, u₂, ..., u_{p_max}] ∈ R^{n×p_max} be the history stack matrix containing recorded inputs, and let R_B̂ = [ε_{B,1}, ε_{B,2}, ..., ε_{B,p_max}] ∈ R^{n×p_max} be the history stack matrix of recorded regressors for B̂. Assume X_B̂ and R_B̂ are 0 at t₀ and are updated by Algorithm 1 of [10], the concurrent learning history stack minimum eigenvalue maximization (CL-HSMEM) routine. Assume that the exogenous reference input, r(t), is sufficiently rich to allow the CL-HSMEM routine to select n linearly independent points, and let t_f be the time at which X_B̂ becomes full rank. Then the zero solution of [e W̃ B̃] is bounded.

Proof. Selecting a quadratic Lyapunov candidate like

V(\zeta) = \frac{1}{2} e^T P e + \frac{1}{2}\,tr\left(\tilde{W}^T \Gamma_w^{-1} \tilde{W}\right) + \frac{1}{2}\,tr\left(\tilde{B}^T \Gamma_B^{-1} \tilde{B}\right)    (4.1)

allows the compilation of all the energy in the system into one variable. Noting that W̃ and B̃ are bounded by the projection operators leaves eᵀPe. From (2.5), the necessity is to bound x(t), because x_rm(t) is bounded due to the reference input being bounded and the reference model A_rm being Hurwitz. So, evaluate the integral of ‖ẋ‖ to see that it exists and is finite:

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau = \int_{t_0}^{t_f} \|Ax(\tau) + Bu(\tau)\|\, d\tau    (4.2)

Then, since u = Wᵀσ,

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau = \int_{t_0}^{t_f} \left\|Ax(\tau) + BW^T(\tau)\sigma(\tau)\right\|\, d\tau    (4.3)

and σ may be separated into [xᵀ rᵀ]ᵀ as follows:

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau = \int_{t_0}^{t_f} \left\|Ax(\tau) + BW^T(\tau)\begin{bmatrix} x(\tau) \\ r(\tau) \end{bmatrix}\right\|\, d\tau.    (4.4)

The reference input is bounded by c_r > max‖r(t)‖, and the adaptation of W is bounded by the projection operators, as is B̂. The upper bounds are found by

w_{max} = \sup_{\tilde{W} \in \Phi_{1,W}} \|\tilde{W}\|    (4.5)
b_{max} = \sup_{\tilde{B} \in \Phi_{1,B}} \|\tilde{B}\|.    (4.6)

Substituting w_max for W and b_max for B and bounding r(t) obtains

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau \leq \int_{t_0}^{t_f} \left( \|Ax(\tau)\| + b_{max}\left(w_{max}\|x(\tau)\| + w_{max} c_r\right) \right) d\tau.    (4.7)

Then, letting A_p = A + b_max w_max I_n, the remainder is

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau \leq \int_{t_0}^{t_f} \left( \|A_p x(\tau)\| + b_{max} w_{max} c_r \right) d\tau    (4.8)

and applying the triangle inequality and integrating the constant term leaves

\int_{t_0}^{t_f} \|\dot{x}(\tau)\|\, d\tau \leq \int_{t_0}^{t_f} \|A_p x(\tau)\|\, d\tau + c.    (4.9)

Because the plant in this thesis is linear time-invariant, ‖x(τ)‖ can grow no faster than an exponential, so the integral of ‖A_p x(τ)‖ is bounded above by the integral of an exponential function plus c, which is finite over the finite interval from t₀ to t_f. Therefore, the zero solution of [e W̃ B̃] is bounded from t₀ to t_f.

Remark 4.1 The assumption that the reference input is rich enough is not difficult to satisfy. If the reference input is identically zero and the system initializes at the zero state, then there is no requirement for the system to move, and there is no linearly independent data to collect. Even with the system initializing at a non-zero state, only the error term drives change, which will not be a rich signal.
So, some level of excitation within the reference input is required to allow selection of linearly independent data points for the W history stack.

Remark 4.2 The growth of the state, x, depends on A_p in this derivation. A_p may be partially selected by the designer by choosing stable platforms, like some fixed-wing aircraft. If highly unstable platforms are chosen, or nonlinear plants are used, Theorem 2 does not necessarily hold.

4.3 Boundedness with a Full Rank History Stack

Proceeding toward the next event, the history stack for B̂ is now full rank, and the following theorem shows boundedness of the system while B̃ converges toward zero through concurrent learning, with B still not known.

Theorem 3 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), the weight update laws of (2.44) and (2.43), and Theorem 1. Assume that a full rank history stack exists for B̂, that the ideal values for W* and B are within Φ_{0,W} and Φ_{0,B}, respectively, per (2.39), for the projection operators used, and that W(0) and B̂(0) are within the set Φ₀, too. Let the norm bounds for the projection operators be chosen such that b_max ≥ 2‖P‖, where P is selected from the Lyapunov equation, and w_max ≥ 2‖W*‖ > 2. From Assumption 3, B is uncertain, but by using Theorem 1 and one of the methods for shocking the history stack for W discussed in section 3.2, the zero solution of [e W̃ B̃] is bounded.

Proof. Focusing on a specific interval of time, [tᵢ, tᵢ₊₁], allows the '(t)' to be dropped for ease of reading. The Lyapunov candidate is chosen to be

V(\zeta) = \frac{1}{2} e^T P e + \frac{1}{2}\,tr\left(\tilde{W}^T \Gamma_w^{-1} \tilde{W}\right) + \frac{1}{2}\,tr\left(\tilde{B}^T \Gamma_B^{-1} \tilde{B}\right)    (4.10)

where ζ = [eᵀ vec(W̃)ᵀ vec(B̃)ᵀ]ᵀ and P, Q ∈ R^{n×n} agree with (2.12). The Lyapunov candidate (4.10) can be bounded above and below by

\min\left(\lambda_{min}(P), \lambda_{min}(\Gamma_w^{-1}), \lambda_{min}(\Gamma_B^{-1})\right)\|\zeta\|^2 \leq 2V(\zeta) \leq \max\left(\lambda_{max}(P), \lambda_{max}(\Gamma_w^{-1}), \lambda_{max}(\Gamma_B^{-1})\right)\|\zeta\|^2    (4.11)

where the λ_max(·) and λ_min(·) operators are described in section 4.1. Let [t₁, t₂, ..., t_{p_max}] ≤ tᵢ < tᵢ₊₁ be the sequence of times at which each data point was recorded in the past, with tᵢ the initial starting time. The derivative of the Lyapunov candidate along the system trajectory of (2.21) for each interval [tᵢ, tᵢ₊₁], with simplification and dropping (ζ) for ease, is

\dot{V} = -e^T \frac{Q}{2} e - tr\left(\tilde{W}^T \begin{bmatrix} \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K_i}^T \\ \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r j}^T \end{bmatrix}\right) - tr\left(\tilde{B} u u^T \tilde{B}^T\right) - tr\left(\sum_{k=1}^{p_{max}} \tilde{B} u_k u_k^T \tilde{B}^T\right) - \frac{1}{2}\left(\sigma^T \tilde{W} \tilde{B}^T P e + e^T P \tilde{B} \tilde{W}^T \sigma\right)    (4.12)

where σ is defined in (2.34); the whole derivation is in Appendix A.4.3.

Define ε̂_{Kr} = Δ_{Kr} + ε_{Kr} and ε̂_K = Δ_K + Δ_{Kr} + ε_K, where ε_{Kr} is from (2.15), ε_K is from (2.13), and Δ_K and Δ_{Kr} are the differences caused by using B̂ instead of B, as shown in (2.22): Δ_K = B̂†B̃(K̃ᵀx + K̃_rᵀr), and (2.23): Δ_{Kr} = −B̂†B̃K_r*ᵀr. Define Ω_σ = Σ_{i=1}^{p_max} σᵢσᵢᵀ, which creates a non-negative matrix. Then

\dot{V} = -e^T \frac{Q}{2} e - tr\left(\tilde{W}^T \Omega_\sigma \tilde{W}\right) - tr\left(\sum_{k=1}^{p_{max}} \tilde{B} u_k u_k^T \tilde{B}^T\right) - tr\left(\tilde{W}^T \begin{bmatrix} \sum_{i=1}^{p_{max}} x_i \Delta_{K_i}^T \\ \sum_{j=1}^{p_{max}} r_j \Delta_{K_r j}^T \end{bmatrix}\right) - \frac{1}{2}\left(\sigma^T \tilde{W} \tilde{B}^T P e + e^T P \tilde{B} \tilde{W}^T \sigma\right) - tr\left(\tilde{B} u u^T \tilde{B}^T\right)    (4.13)

and the expansion of ε̂_K and ε̂_{Kr} is in Appendix A.5. The first three terms in (4.13) are negative definite, and the last term is non-positive, depending on whether the input, u, is non-zero.

Let c_Q = λ_min(Q) and c_P = ‖P‖. Now, A_rm of the reference model is chosen to be Hurwitz, and the reference signal, r, is a scaled version of x_des, which is bounded. Therefore, there exist scalars c_rm, c_r > 0 such that c_rm > max(‖x_rm‖) and c_r > max(‖r‖). Let c_Ωσ = λ_min(Ω_σ) and

c_{err} > \left\| \begin{bmatrix} \sum_{i=1}^{p_{max}} x_i \Delta_{K_i}^T \\ \sum_{j=1}^{p_{max}} r_j \Delta_{K_r j}^T \end{bmatrix} \right\|.    (4.14)

The concurrent learning error terms, ε (in all variants), are functions of xᵢ and r_j, which are time-invariant and bounded over the interval. Let the constants c_R, c_K > 0 be such that c_R > ‖[c_rm c_r]ᵀ‖ and c_K > ‖[K* K*_r]ᵀ‖. Now, formulating an upper bound for V̇ in terms of e, W̃, and B̃ and simplifying yields
Let cΩσ = λmin (Ωσ ) and cerr Ppmax T i=1 xi ∆K i > P . pmax T j=1 rj ∆Kr j (4.14) The concurrent learning error terms, ^, (in all variants) are functions of xi and rj which are time invariant and bounded over the interval. Let the constants cR , cK > 0 where f B e cR > [ crm cr ]T and cK > [ K∗ K∗r ]T . Now, formulating an upper bound for V̇ e, W, and simplifying yields 2 2 cQ f e f e f 2 kek2 − cΩσ W − cu B + cerr W + cP B W kek 2 2 2 2 e f f e + cP crm B W kek − kek + cR W + cK B . V̇ ≤ − (4.15) f and B e once more, it can be seen that the projection Returning to the derivatives of W operator in use would limit both of these to at most wmax and bmax , respectively which are f and B e began within their respective sets Φ1 per defined by (4.5) and (4.6). Since both W (2.38), they are bounded within that set. Thus an upper bound for V̇ is cQ kek2 − cΩσ w2max − cu b2max + cerr wmax + cP bmax wmax kek2 2 2 + cP crm bmax wmax kek − kek + cR (wmax + cK )2 b2max . V̇ ≤ − (4.16) By setting the left-hand-side of (4.16) to 0, and neglecting the negative constant terms in (4.16), the conservative set outside of which V̇ is negative requires three constants. Let c1 , c2 , and c3 be greater than zero and let c1 = cQ + b2max (wmax + cK )2 − cP bmax wmax 2 (4.17) c2 = cP crm bmax wmax + 2cR b2max (wmax + cK )2 (4.18) c3 = cerr wmax . (4.19) 29 Then the inequality (4.16) may be restated as 0 ≤ − c1 kek2 + c2 kek + c3 c3 c2 ≤ . kek kek − c1 c1 (4.20) (4.21) To show that c1 is positive, remember that the selection of bmax and wmax was guided by the theorem. So, expanding c1 obtains cQ + b2max (w2max + 2cK wmax + c2K ) − cP bmax wmax 2 cQ 0< + b2max w2max + 2cK wmax b2max + b2max c2K − cP bmax wmax 2 c1 = (4.22) (4.23) and rearranging delivers cQ + b2max w2max + b2max c2K + 2cK wmax b2max > cP bmax wmax . 2 (4.24) Using the minimum values, per the theorem, (4.24) becomes cQ + (2cP )2 (2cK )2 + (2cP )2 c2K > 2cK (2cK )(2cP )2 + cP (2cP )(2cK ) 2 cQ + 5cK > 4cK + 1 8c2P cK (4.25) (4.26) Note that the left hand side is greater than the right hand side ignoring the cQ term if cK is chosen greater than 1. Now, a set outside of which the derivative of the Lyapunov candidate is negative may be shown: Ω= c2 c3 kek kek − ≤ c1 c1 (4.27) where c1 is from (4.17), c2 is from (4.18), and c3 is from (4.19). Thus the tracking error f and B e are bounded by their projection operators. Therefore, the zero is bounded and W h i fB e is bounded. solution of e W e conRemark 4.3 Though the above proof shows boundedness only, the thought is that B verges towards zero faster than the system tracking error grows because of the exponential convergence of concurrent learning. Thus the new errors in the current update stored in the history stack for W, XW , will be smaller in magnitude than those already stored. 30 4.4 Bounded Operation Here, the previous two theorems are combined together to show boundedness from the initial time, t0 , onward to a time T1 > tf when the history stack for W becomes full rank after being shocked. Theorem 4 Consider the system of (2.1), the reference model of (2.2), the control law of (2.4), the weight update laws of equations (2.27), and (2.33), and let pmax be an integer such that pmax > m > n where m = 2n is the number of rows in W, n is the number of states in the system, and pmax is the maximum number of recorded data points. 
Theorem 4 Consider the system of (2.1), the reference model of (2.2), the control law of (2.4), and the weight update laws of equations (2.27) and (2.33), and let $p_{\max}$ be an integer such that $p_{\max} > m > n$, where $m = 2n$ is the number of rows in W, n is the number of states in the system, and $p_{\max}$ is the maximum number of recorded data points. Assume the ideal values, $W^*$, and actual system parameters, B, are within $\Phi_{0,W}$ and $\Phi_{0,B}$, respectively, per (2.39), for the projection operators used, and that $W(0)$ and $\hat{B}(0)$ are within those sets as well. Let the norm bounds for the projection operator be chosen such that $b_{\max} \geq 2\|P\|$, where P is selected with the Lyapunov equation, and $w_{\max} \geq 2\|W^*\| > 2$. From Assumption 3, B is uncertain, so select a method of shocking from section 3.2. Let $X_{\hat{B}} = [u_1, u_2, \ldots, u_{p_{\max}}] \in \mathbb{R}^{n \times p_{\max}}$ be the history stack matrix containing recorded inputs, and let $R_{\hat{B}} = [\epsilon_{\hat{B},1}, \epsilon_{\hat{B},2}, \ldots, \epsilon_{\hat{B},p_{\max}}] \in \mathbb{R}^{n \times p_{\max}}$ be the history stack matrix of recorded regressors for $\hat{B}$. Assume $X_{\hat{B}}$ and $R_{\hat{B}}$ are 0 at $t_0$ and are updated by the CL-HSMEM routine of [10]. Let $X_W = [\sigma_1, \sigma_2, \ldots, \sigma_{p_{\max}}] \in \mathbb{R}^{m \times p_{\max}}$ be the history stack matrix of combined state and reference input points, and let $R_W = [\hat{\epsilon}_{W,1}, \hat{\epsilon}_{W,2}, \ldots, \hat{\epsilon}_{W,p_{\max}}] \in \mathbb{R}^{m \times p_{\max}}$ be the history stack containing the regressors for W from (2.35); let $X_W$ and $R_W$ be updated by the CL-HSMEM routine from 0 at time $t_0$. Assume that the exogenous reference input, r(t), is sufficiently rich to allow the CL-HSMEM routine to select n linearly independent points, and let $t_f$ be the time at which $X_{\hat{B}}$ becomes full rank. Let $T_1 > t_s$ be the time at which $X_W$ becomes full rank after shocking, and assume that the reference input continues to be sufficiently rich from $t_f$ to $T_1$ so that the CL-HSMEM routine may select at least m data points. Then the zero solution of $[\,e,\ \tilde{W},\ \tilde{B}\,]$ for this system is bounded.

Proof. This situation contains two cases: the time before $t_f$, when $X_{\hat{B}}$ becomes full rank, and the time after $t_f$. The first case, prior to $X_{\hat{B}}$ becoming full rank, fulfills the requirements of Theorem 2 and is therefore bounded. The second case, from time $t_f$ to $T_1$, the time when the history stack for W becomes full rank, meets the requirements of Theorem 3 and is therefore bounded. Then, since a common Lyapunov candidate was used in both cases, the zero solution for this system is bounded.

4.5 Ultimate Boundedness

In the previous section, the boundedness of the concurrent learning adaptive scheme was set forth in the presence of uncertain input allocation. In this section, the same scheme is investigated, but the local ultimate bound is found. In section 3.2, the term $\epsilon_0$ was defined for use with the shocking methods. Let $\epsilon_{t_s} = \|\tilde{B}(t_s)\|$. In this chapter, the proof of Theorem 4 shows that the closed loop system is bounded before $\|\tilde{B}\| < \epsilon_{t_s}$, which occurs at a time later than both $t_f$, when the history stack for $\hat{B}$ becomes full rank, and $t_s$, when the shocking method from section 3.2 first acts. Here, ultimate boundedness will be investigated, but only inside the bounds of the projection operators, assuming the ideal weights and parameter values are within the boundary of the projection operators, too. Thus, the following theorem is not for global ultimate boundedness, but for local ultimate boundedness. Once the history stack has been shocked, the $b_{\max}$ terms in (4.16) can be replaced with $\epsilon_{t_s}$ for upper bounding purposes.
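The CL-HSMEM routine of [10] and the shocking methods of section 3.2 are not reproduced here; the following is a minimal sketch in their spirit, assuming a simple minimum-singular-value admission test and a whole-stack purge. The names (HistoryStack, try_add, shock) and the admission logic are illustrative, not the routine from [10].

```python
import numpy as np

class HistoryStack:
    """Minimal history-stack sketch: store data points (columns of X) and
    their regressors (columns of R); admit a point only if it raises the
    minimum singular value of X, as a stand-in for CL-HSMEM selection."""

    def __init__(self, dim, p_max):
        self.p_max = p_max
        self.X = np.zeros((dim, 0))   # recorded points
        self.R = np.zeros((dim, 0))   # recorded regressors

    def _min_sv(self, X):
        return 0.0 if X.shape[1] == 0 else np.linalg.svd(X, compute_uv=False)[-1]

    def try_add(self, point, regressor):
        """Admit (point, regressor) if the stack is not full, or if replacing
        some column increases the minimum singular value of X."""
        p, r = point.reshape(-1, 1), regressor.reshape(-1, 1)
        if self.X.shape[1] < self.p_max:
            self.X, self.R = np.hstack([self.X, p]), np.hstack([self.R, r])
            return True
        base = self._min_sv(self.X)
        for k in range(self.p_max):
            Xk = self.X.copy(); Xk[:, [k]] = p
            if self._min_sv(Xk) > base:
                self.X[:, [k]], self.R[:, [k]] = p, r
                return True
        return False

    def shock(self):
        """Purge all recorded data for eventual re-population."""
        dim = self.X.shape[0]
        self.X, self.R = np.zeros((dim, 0)), np.zeros((dim, 0))

    def full_rank(self):
        return np.linalg.matrix_rank(self.X) == self.X.shape[0]
```

A shock here simply empties the stack, matching the purpose, though not necessarily the mechanics, of Algorithms 1 through 3.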
Theorem 5 Let $D \subset \mathbb{R}^n \times \Phi_{1,W} \times \Phi_{1,B} \subset \mathbb{R}^{n+m}$ be a domain that contains the origin, where m is the dimension of $[\,\mathrm{vec}(\tilde{W})^T\ \mathrm{vec}(\tilde{B})^T\,]^T$; let $\nu > 0 \in \mathbb{R}$ be a constant; and let $V : D \to \mathbb{R}$ be a continuously differentiable function such that $\alpha_1(\|\zeta\|) \leq V(\zeta) \leq \alpha_2(\|\zeta\|)$ and the derivative of $V(\zeta)$ along the trajectories of the system satisfies $\dot{V}(\zeta) \leq -M(\zeta)\ \ \forall\ \nu > \|\zeta\| \geq \mu > 0,\ \forall\ \zeta \in D$, where $\alpha_1$ and $\alpha_2$ are positive, increasing functions whose limit as their argument goes to $\infty$ is $\infty$, and $M(\zeta)$ is a continuous, positive definite function. Assume the history stack for $\hat{B}$ is full rank. Additionally, let r(t) be such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_1 = t_s + \delta t$, 2n linearly independent data points are selected by the CL-HSMEM routine, where 2n is full rank for the W history stack, $X_W$. Then the closed loop system described by (2.1), (2.2), (2.4), (2.44), and (2.43), using a shocking method from section 3.2 and Theorem 4, is ultimately bounded for $t > T_1$, the time when $X_W$ becomes full rank after the first shock occurred.

Proof. Beginning with the $\hat{B}$ history stack full rank implies that the system is bounded from Theorem 4. Since 2n linearly independent points were stored in $X_W$, the CL-HSMEM routine from [10] found sufficiently exciting input, or rich reference input per Boyd and Sastry in [2], from the time $t_s$ when the history stack was shocked to $T_1$. The system continues to be described by the Lyapunov candidate from (4.10), restated here for ease of reference:

$$V(\zeta) = \tfrac12 e^T P e + \tfrac12\mathrm{tr}\{\tilde{W}^T \Gamma_W^{-1} \tilde{W}\} + \tfrac12\mathrm{tr}\{\tilde{B}^T \Gamma_B^{-1} \tilde{B}\}. \tag{4.28}$$

The positive, increasing functions that bound (4.28) above and below follow from (4.11), so $\alpha_1$ and $\alpha_2$ are defined as

$$\alpha_1(\zeta) = \tfrac12\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_w^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}\,\|\zeta\|^2 \tag{4.29}$$
$$\alpha_2(\zeta) = \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_w^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\,\|\zeta\|^2 \tag{4.30}$$

where both $\alpha_1$ and $\alpha_2 \to \infty$ as $\|\zeta\| \to \infty$. The Lyapunov candidate derivative is (4.13), reproduced here:

$$\dot{V} = -e^T\frac{Q}{2}e - \mathrm{tr}\{\tilde{W}^T \Omega_\sigma \tilde{W}\} - \mathrm{tr}\left\{\tilde{W}^T\begin{bmatrix}\sum_{i=1}^{p_{\max}} x_i\Delta_{K,i}^T \\[2pt] \sum_{j=1}^{p_{\max}} r_j\Delta_{K_r,j}^T\end{bmatrix}\right\} - \mathrm{tr}\Big\{\tilde{B}\Big(uu^T + \sum_{k=1}^{p_{\max}} u_k u_k^T\Big)\tilde{B}^T\Big\} - \tfrac12\big(\sigma^T\tilde{W}\tilde{B}^T P e + e^T P\tilde{B}\tilde{W}^T\sigma\big) \tag{4.31}$$

which can be more narrowly bounded in this stage of operation. Let $\epsilon_{t_s}$ be the constant $\epsilon_{t_s} = \|\tilde{B}(t_s)\|$. Then $\|\tilde{B}\| < \epsilon_{t_s} < b_{\max}$ from the projection operator, due to Lemma 1. Since Theorem 4 assumed that the ideal weights and parameters were within the bounds of the projection operators, concurrent learning is driving $\tilde{W}$ toward zero with some bias; that bias has been reduced by shocking $X_W$. Thus (4.31) may be bounded above, neglecting some of the negative definite terms, by

$$\dot{V} \leq -\frac{c_Q}{2}\|e\|^2 - c_{\Omega_\sigma}\|\tilde{W}\|^2 + c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2 + c_P c_{rm}\epsilon_{t_s}\|\tilde{W}\|\|e\| + c_{err}\|\tilde{W}\|. \tag{4.32}$$

Let $\theta_1$ and $\theta_2$ be constants such that $\theta_1, \theta_2 \in (0,1)$ and $\theta_1 + \theta_2 < 1$. Part of $\frac{c_Q}{2}\|e\|^2$ and part of $c_{\Omega_\sigma}\|\tilde{W}\|^2$ are used to dominate $c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2$ over a certain subset of $e$ and $\tilde{W}$; another fraction of $c_{\Omega_\sigma}\|\tilde{W}\|^2$ is used to dominate the $c_{err}\|\tilde{W}\|$ term. Using $\theta_1$ and $\theta_2$, (4.32) is rewritten as

$$\dot{V} \leq -(1-\theta_1)\frac{c_Q}{2}\|e\|^2 - (1-\theta_1-\theta_2)c_{\Omega_\sigma}\|\tilde{W}\|^2 + c_P c_{rm}\epsilon_{t_s}\|\tilde{W}\|\|e\| - \left[\theta_1\left(\frac{c_Q}{2}\|e\|^2 + c_{\Omega_\sigma}\|\tilde{W}\|^2\right) - c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2\right] - \left[\theta_2 c_{\Omega_\sigma}\|\tilde{W}\|^2 - c_{err}\|\tilde{W}\|\right]. \tag{4.33}$$
To find the values where $c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2$ is dominated, look at just the terms involved:

$$0 > -\theta_1\left(\frac{c_Q}{2}\|e\|^2 + c_{\Omega_\sigma}\|\tilde{W}\|^2\right) + c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2 \tag{4.34}$$
$$\theta_1\left(\frac{c_Q}{2}\|e\|^2 + c_{\Omega_\sigma}\|\tilde{W}\|^2\right) > c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2. \tag{4.35}$$

In the worst case, the normed variables become equal, like one variable carrying all the combined exponents, and the inequality reduces to

$$\theta_1\frac{c_Q}{2}\|z\|^2 + \theta_1 c_{\Omega_\sigma}\|z\|^2 > c_P\epsilon_{t_s}\|z\|^3 \tag{4.36}$$
$$\theta_1\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right)\|z\|^2 > c_P\epsilon_{t_s}\|z\|^3 \tag{4.37}$$
$$\frac{\theta_1}{c_P\epsilon_{t_s}}\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right) > \|z\| \tag{4.38}$$

where z is the pseudo-variable that combines the exponents. Then, for $\|e\|$ and $\|\tilde{W}\|$ less than $\frac{\theta_1}{c_P\epsilon_{t_s}}\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right) = \nu$, the $c_P\epsilon_{t_s}\|\tilde{W}\|\|e\|^2$ term is dominated and may be neglected from the inequality. Following the same progression for the $c_{err}$ term obtains

$$0 > -\theta_2 c_{\Omega_\sigma}\|\tilde{W}\|^2 + c_{err}\|\tilde{W}\| \tag{4.39}$$
$$\theta_2 c_{\Omega_\sigma}\|\tilde{W}\|^2 > c_{err}\|\tilde{W}\| \tag{4.40}$$
$$\|\tilde{W}\| > \frac{c_{err}}{\theta_2 c_{\Omega_\sigma}} = \mu. \tag{4.41}$$

It may be noted from (4.14) that the $c_{err}$ term itself is proportional to $\epsilon_{t_s}$. Since $t > t_s$, the errors collected are relatively small because $\|\tilde{B}\|$ is small, and since $X_W$ is assumed full rank, the value of $c_{\Omega_\sigma}$ is increased at every opportunity, so this bound will be near zero. Then the positive definite function, M, is

$$M(\zeta) = (1-\theta_1)\frac{c_Q}{2}\left[\|e\| - \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\|\tilde{W}\|\right]^2 + \left[(1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q}\right]\|\tilde{W}\|^2 \tag{4.42}$$

provided $(1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q} \geq 0$. This is not a difficult condition to meet. Note that $c_{\Omega_\sigma}$ enters the positive term, since $\theta_1 + \theta_2 < 1$ by definition, and $c_{\Omega_\sigma}$ comes from the W concurrent learning history stack, which the CL-HSMEM routine is increasing at every opportunity, so it is large once $X_W$ is full rank. For the negative term, note that $\epsilon_{t_s}$ appears squared in the numerator, and the other elements are chosen directly or indirectly by the designer. So the bounding function is

$$\dot{V}(\zeta) \leq -M(\zeta) \qquad \forall\ \|\zeta\|\ \text{with}\ \ \frac{\theta_1}{c_P\epsilon_{t_s}}\left(\frac{c_Q}{2} + c_{\Omega_\sigma}\right) > \|\zeta\| > \frac{c_{err}}{\theta_2 c_{\Omega_\sigma}} \tag{4.43}$$

which implies that there are constants $\psi_1, \psi_2$ such that $\psi_2 = \sup_{\mu < \|\zeta\| < \nu} V(\zeta)$ and $\psi_1 = \inf_{\mu < \|\zeta\| < \nu} V(\zeta)$. Then there is a positively invariant set, $\Psi = \{\zeta : \psi_1 < V(\zeta) < \psi_2\}$, and in that set the Lyapunov candidate derivative is negative, so the Lyapunov candidate will decrease. Thus, following Khalil's ultimate bound discussion in [16] and solving $\alpha_1^{-1}(\alpha_2(\mu))$ for the stated $\alpha_1$ and $\alpha_2$, the ultimate bound is

$$\mathrm{ultbound} = \sqrt{\frac{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_w^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}}{\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_w^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}}} \cdot \frac{c_{err}}{\theta_2\, c_{\Omega_\sigma}}. \tag{4.44}$$

Remark 4.4 If the shocking method selected allows for multiple shocks, then each time the history stack for W is shocked, the value of $\epsilon_{t_s}$ could be updated, and since more time has passed, the value of $\|\tilde{B}\|$ will have decreased. Since $c_{err}$ depends on $\|\tilde{B}\|$, the ultimate bound may become arbitrarily small.

Remark 4.5 If $(1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q} < 0$, the system is still bounded by the previous theorem.
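As a quick numeric illustration of (4.44), the bound can be evaluated directly; a sketch, with matrices and constants that are made up for the example rather than taken from the thesis's simulations:

```python
import numpy as np

def ultimate_bound(P, Gw, GB, c_err, theta2, c_omega):
    """Evaluate the local ultimate bound (4.44):
    sqrt(max-eig / min-eig ratio) * c_err / (theta2 * c_omega)."""
    mats = (P, np.linalg.inv(Gw), np.linalg.inv(GB))
    top = max(np.linalg.eigvalsh(M).max() for M in mats)
    bot = min(np.linalg.eigvalsh(M).min() for M in mats)
    return np.sqrt(top / bot) * c_err / (theta2 * c_omega)

# Illustrative values only: P as if from a Lyapunov solve, scalar learning rates.
P = np.array([[2.0, 0.5], [0.5, 1.5]])
Gw = 10.0 * np.eye(2)          # Gamma_W
GB = 5.0 * np.eye(2)           # Gamma_B
print(ultimate_bound(P, Gw, GB, c_err=0.1, theta2=0.4, c_omega=3.0))
```

Per Remark 4.4, re-shocking reduces $c_{err}$ through $\epsilon_{t_s}$, which shrinks this bound.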
4.6 Rate of Convergence

Inside the set $\Psi$, defined in the previous section, the Lyapunov candidate is positive while its derivative is negative. This fulfills the requirements for local asymptotic stability. Yet, due to noise in the measurements, the zero solution may not be attainable, though exponential convergence into a set may be. Toward that end, the definition of exponential pth ultimate boundedness from [12] is given.

Definition 8 Let x(t) be a solution to the nonlinear system $\dot{x}(t) = f(x(t))$ with $x(0) = x_0$. Then x(t) is said to be exponentially pth ultimately bounded if $\|x(t)\|_p \leq \alpha\|x(0)\|_p e^{-ct} + k$ for some positive constants $\alpha$, c, and k.

So, if the system is within the set $\Psi$ and the reference input continues to be rich, the convergence into the interior of $\Psi$ is exponential in the sense of Definition 8. The following theorem shows this.

Theorem 6 If, in addition to the system meeting the conditions of Theorem 5, r(t) is such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_1 = t_s + \delta t$, m linearly independent data points are selected by the CL-HSMEM routine, then the solution $(e, \tilde{W}, \tilde{B})$ of the closed loop system of (2.21) and (2.43) is exponentially ultimately bounded for $t \geq T_1$.

Proof. By meeting the criteria for Theorem 5, $X_W$, the history stack for W, is full rank. Restating (4.11), where $\zeta = [\,e^T\ \mathrm{vec}(\tilde{W})^T\ \mathrm{vec}(\tilde{B})^T\,]^T$, returns

$$\tfrac12\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_W^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}\|\zeta\|^2 \leq V(\zeta) \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\|\zeta\|^2. \tag{4.45}$$

Dividing (4.11) by $\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}$ gives a useful inequality for $\|\zeta\|^2$:

$$\frac{\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_W^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}}\,\|\zeta\|^2 \;\leq\; \frac{2V(\zeta)}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}} \;\leq\; \|\zeta\|^2. \tag{4.46}$$

Let $c_4 = (1-\theta_1-\theta_2)c_{\Omega_\sigma} - \frac{(c_P c_{rm}\epsilon_{t_s})^2}{2(1-\theta_1)c_Q}$. Rearranging (4.43) yields

$$\dot{V} \leq -\tfrac12\min\{(1-\theta_1)c_Q,\ c_P c_{rm}\epsilon_{t_s},\ 2c_4\}\,\|\zeta\|^2 + \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\,\|\tilde{W}\|\|e\| \tag{4.47}$$

and applying (4.46) to (4.47) obtains

$$\dot{V}(\zeta) \leq -\frac{\min\{(1-\theta_1)c_Q,\ c_P c_{rm}\epsilon_{t_s},\ 2c_4\}}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}}\,V(\zeta) + \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\,\|\tilde{W}\|\|e\|. \tag{4.48}$$

Thus, by meeting Theorem 5, the system is ultimately bounded. Now let

$$c = \frac{\min\{(1-\theta_1)c_Q,\ c_P c_{rm}\epsilon_{t_s},\ 2c_4\}}{\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}} \tag{4.49}$$

and note that, due to the boundedness, the quantity

$$k = \sup_{t > T_1} \int_{T_1}^{t} e^{-c(t-\tau)}\, \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\, \|\tilde{W}(\tau)\|\|e(\tau)\|\, d\tau \tag{4.50}$$

exists. Let $\bar{k} = \frac{c_P c_{rm}\epsilon_{t_s}}{(1-\theta_1)c_Q}\|\tilde{W}(t)\|\|e(t)\|$; then, from (4.48), the derivative is bounded by

$$\dot{V}(\zeta) \leq -c\,V(\zeta) + \bar{k}. \tag{4.51}$$

Since $V(\zeta(T_1)) \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\|\zeta(T_1)\|^2 = \alpha_2(\zeta(T_1))$ and $\alpha_1(\zeta) = \tfrac12\min\{\lambda_{\min}(P),\ \lambda_{\min}(\Gamma_W^{-1}),\ \lambda_{\min}(\Gamma_B^{-1})\}\|\zeta\|^2 \leq V(\zeta)$, then

$$\|\zeta\|^2 \leq \alpha_2(\zeta(T_1))e^{-ct} + k \tag{4.52}$$
$$\|\zeta\|^2 \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_B^{-1})\}\|\zeta(T_1)\|^2 e^{-ct} + k. \tag{4.53}$$

Therefore, for $t \geq T_1$, the solution $(e, \tilde{W}, \tilde{B})$ of the closed loop system of (2.6), (2.43), and (2.44) is exponentially ultimately bounded in the sense of Definition 8.

Remark 4.6 If the shocking method selected allows multiple shocks, then with each shock the value of $\epsilon_{t_s}$ could be updated to its new (smaller) value, and the $c_{err}$ term would be proportionally smaller as well, because of its relationship to $\epsilon_{t_s}$ and because $\|\tilde{B}\|$ converges exponentially. After each shock, new limits would be computed, and the solution would be exponentially ultimately bounded into another subset of the previous set.

Remark 4.7 If the measurement of $\dot{x}$ is without noise, the zero solution of $(e, \tilde{W}, \tilde{B})$ could become a possibility. At that point, an exponential solution converging to an arbitrarily small value could be calculated and applied. Only then would the original exponential convergence rate of [10] be recovered.

4.7 Summary

In this chapter, the stability of concurrent learning used in the presence of uncertain input allocation was studied. Five theorems were presented to step with the system through its operating time, from initialization to after both history stacks are full rank.
The solution to the closed loop system is found to be bounded and, under some additional conditions, ultimately bounded and even exponentially ultimately bounded in the sense of Definition 8.

CHAPTER 5

Dissolved Control Allocation Matrix

5.1 Introduction

Having explored the uncertain control allocation matrix case, Assumption 3 is now relaxed. Instead of the B matrix being uncertain, other than in dimension, let it be dissolved into two matrices multiplied together.

Assumption 4 Assume $\Lambda$ is a diagonal matrix whose magnitude is uncertain and D is known, such that

$$B = D\Lambda. \tag{5.1}$$

Since D is known, it gives insight into the dimension of B, but relaxes Assumption 3. The matching condition does not change, but is shown for clarity.

Assumption 5 (Matching Condition) Let $W^* \in \mathbb{R}^{2n\times n}$ be a constant matrix such that

$$W^* = \begin{bmatrix} K^* \\ K_r^* \end{bmatrix} \tag{5.2}$$

where $K^*$ and $K_r^*$ fulfill

$$A_{rm} = A + D\Lambda K^{*T} \tag{5.3}$$
$$B_{rm} = D\Lambda K_r^{*T}. \tag{5.4}$$

Under Assumption 5, $W^*$ includes the uncertain $\Lambda^{-1}$. Somanath used a similar formulation in [29], but he used the reference model input allocation matrix in place of the D used here. Once again, since $\Lambda$ is uncertain, estimating it becomes necessary. The estimate is $\hat\Lambda$, and the vanishing term is

$$\tilde\Lambda = \hat\Lambda - \Lambda. \tag{5.5}$$

Using (2.1) for the plant, (2.2) for the reference model, and Assumption 5, the matching conditions mentioned earlier in this chapter, the state derivative is

$$\dot{x} = A_{rm} x + B_{rm} r + D\Lambda\tilde{W}^T\sigma \tag{5.6}$$

or, in full matrix form with the substitution $\Lambda = \hat\Lambda - \tilde\Lambda$,

$$\dot{x} = [\,A_{rm}\ \ B_{rm}\,]\,\sigma + D\hat\Lambda\tilde{W}^T\sigma - D\tilde\Lambda\tilde{W}^T\sigma \tag{5.7}$$

where $\tilde{W}$ is from (2.32) and $\sigma$ is from (2.34). W is the working variable, and the input is defined as $u = W^T\sigma$. Subtracting (2.2) from (5.6), the state tracking error derivative is obtained:

$$\dot{e} = A_{rm} e + D\hat\Lambda\tilde{W}^T\sigma - D\tilde\Lambda\tilde{W}^T\sigma. \tag{5.8}$$

Then the update law of the adaptive weights is selected to follow the normal MRAC update, except that $D\hat\Lambda$ is used in place of B:

$$\dot{W} = \mathrm{Proj}\left(W,\ -\Gamma_W\,\sigma e^T P D\hat\Lambda - \Gamma_W \sum_{i}^{p_{\max}} \sigma_i\,\hat\epsilon_{W,i}^T\right). \tag{5.9}$$

The update law for $\hat\Lambda$ uses the regressor in (5.10), shown below:

$$\epsilon_\Lambda = \hat\Lambda u - D^\dagger(\dot{x} - Ax) = (\hat\Lambda - \Lambda)u = \tilde\Lambda u \tag{5.10}$$

$$\dot{\hat\Lambda} = \mathrm{Proj}\left(\hat\Lambda,\ -\Gamma_\Lambda\,\epsilon_\Lambda u^T - \Gamma_\Lambda \sum_{j}^{p_{\max}} \epsilon_{\Lambda,j}\, u_j^T\right) \tag{5.11}$$

where $\Gamma_\Lambda$ is the learning rate. Since the structure of $\Lambda$ is known per Assumption 4 to be diagonal, only the main diagonal elements of the above derivative are used. As per section 2.6, the $\mathrm{Proj}(\cdot,\cdot)$ operator is equivalent to $\mathrm{Proj}(\cdot,\cdot, f(\cdot, \theta_d, \theta_b))$, where $\theta_d$ is the selected normed length across which the projection operator is to work (the width of $\Phi_T$), and $\theta_b$ is the normed length at which the projection operator begins to work (the boundary of $\Phi_0$ from section 2.6).
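A minimal sketch of the diagonal $\hat\Lambda$ update (5.10)-(5.11), assuming measurable $\dot{x}$, forward-Euler integration, and a crude radial scaling in place of the smooth projection operator of [19]; the function name, step size, and bound are illustrative, not the thesis's implementation:

```python
import numpy as np

def lambda_hat_step(Lam_hat, u, xdot, x, A, D, stack_u, stack_eps,
                    Gamma=1.0, lam_max=5.0, dt=0.001):
    """One Euler step of the diagonal Lambda-hat law (5.10)-(5.11).
    stack_u / stack_eps hold recorded inputs u_j and regressors eps_Lambda,j."""
    # Instantaneous regressor (5.10): eps = Lam_hat u - D^+(xdot - A x) = Lam_tilde u
    eps = Lam_hat @ u - np.linalg.pinv(D) @ (xdot - A @ x)
    # Raw derivative (5.11): instantaneous term plus the concurrent-learning sum
    dLam = -Gamma * np.outer(eps, u)
    for uj, epsj in zip(stack_u, stack_eps):
        dLam -= Gamma * np.outer(epsj, uj)
    # Per Assumption 4, Lambda is diagonal: keep only the main diagonal
    dLam = np.diag(np.diag(dLam))
    Lam_new = Lam_hat + dt * dLam
    # Stand-in for the projection operator: scale back at the outer bound
    nrm = np.linalg.norm(Lam_new)
    if nrm > lam_max:
        Lam_new *= lam_max / nrm
    return Lam_new
```

The radial scaling only mimics the outer saturation of the true projection operator; the smooth blending across $\Phi_T$ described above is omitted for brevity.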
5.2 Stability

The stability of the system will be analyzed by breaking the duration into separate intervals. The first theorem covers from the initialization time until the controller has two full rank history stacks. The second theorem establishes local ultimate boundedness, and the third theorem in this chapter establishes exponential ultimate boundedness in the sense of Definition 8.

5.2.1 Bounded Operation

Theorem 7 Consider the plant of (2.1), the reference model of (2.2), the control law of (2.4), Assumptions 2, 4, and 5, and the adaptive laws (5.9) and (5.11). Assume the ideal values, $W^*$, and actual system parameters, $\Lambda$, are within $\Phi_{0,W}$ and $\Phi_{0,\Lambda}$, respectively, per (2.39), for the projection operators used, and that $W(0)$ and $\hat\Lambda(0)$ are within those sets as well. Let the norm bounds for the projection operator be chosen such that $\lambda_\Lambda \geq 2\|P\|$, where P is selected with the Lyapunov equation, and $w_{\max} \geq 2\|W^*\| > 2\|D\|$. Let $X_\Lambda = [u_1, u_2, \ldots, u_{p_{\max}}] \in \mathbb{R}^{n\times p_{\max}}$ be the history stack matrix containing recorded inputs, and let $R_\Lambda = [\epsilon_{\Lambda,1}, \epsilon_{\Lambda,2}, \ldots, \epsilon_{\Lambda,p_{\max}}] \in \mathbb{R}^{n\times p_{\max}}$ be the history stack matrix of recorded regressors for $\hat\Lambda$. Assume $X_\Lambda$ and $R_\Lambda$ are 0 at $t_0$ and are updated by the CL-HSMEM routine during periods of rich reference input over an interval $[t_0, t_0 + \delta t]$, allowing for the selection of n points making $X_\Lambda$ full rank, and the selection of 2n points over $[t_s, t_s + \delta t]$ making $X_W$ full rank after time $t_s$, when the first history stack shock occurs due to a method selected from section 3.2. Then the zero solution of $[\,e,\ \tilde{W},\ \tilde\Lambda\,]$ for the closed loop system is bounded.

Proof. Initially, the system is within an allowable set and, due to the projection operators, will stay there. Neither history stack is full rank, so the CL-HSMEM routine will select data points as they come along until $X_\Lambda$ becomes full rank at a time $t_f$. Thus, the proof that the system is bounded from $t_0$ to $t_f$ follows the same logic as that of Theorem 2. Then, from time $t_f$ to $t_s$, $X_\Lambda$ is full rank. Concurrent learning is driving $\tilde\Lambda \to 0$ per Theorem 1, similar to Lemma 1, so $\|\tilde\Lambda\|$ is approaching and will attain values less than $\epsilon_{t_s}$. That happens at time $t_s$, and the shocking method will remove the old data from the history stack $X_W$. To show boundedness, select a quadratic Lyapunov candidate:

$$V(\zeta) = \tfrac12 e^T P e + \tfrac12\mathrm{tr}\{\tilde{W}^T \Gamma_W^{-1} \tilde{W}\} + \tfrac12\mathrm{tr}\{\tilde\Lambda^T \Gamma_\Lambda^{-1} \tilde\Lambda\}. \tag{5.12}$$

The Lyapunov candidate derivative follows the expansion of Appendix A.4.3 with $D\hat\Lambda$ in place of $\hat{B}$; substituting the update laws (5.9) and (5.11), combining like terms, and canceling where possible ((5.13)-(5.15)) gives

$$\dot{V}(\zeta) = -e^T\frac{Q}{2}e - \tfrac12\big(\sigma^T \tilde{W} \tilde\Lambda^T D^T P e + e^T P D \tilde\Lambda \tilde{W}^T \sigma\big) - \mathrm{tr}\Big\{\tilde{W}^T \sum_{i}^{p_{\max}}\sigma_i\sigma_i^T\, \tilde{W}\Big\} - \mathrm{tr}\Big\{\tilde{W}^T \sum_{i}^{p_{\max}}\sigma_i\Delta_{W,i}^T\Big\} - \mathrm{tr}\Big\{\tilde\Lambda u u^T \tilde\Lambda^T + \tilde\Lambda \sum_{j}^{p_{\max}} u_j \epsilon_{\Lambda,j}^T\Big\} \tag{5.16}$$

where $\hat\epsilon_W = \epsilon_W + \Delta_W$, from (2.35). Remembering that $\sigma = [\,x\ r\,]^T$, it can be restated as $\sigma = [\,(e + x_{rm})\ r\,]^T$. Let $c_{rm}$ be a constant greater than the upper bound of $\|x_{rm}\|$, which is bounded because it is driven by r, a bounded input, and the reference model was chosen to be Hurwitz. Let $c_{\Omega_\sigma} = \lambda_{\min}\big(\sum_{i}^{p_{\max}}\sigma_i\sigma_i^T\big)$, let $c_{err} > \|\sum_{j}^{p_{\max}}\sigma_j\Delta_{W,j}^T\|$, and let $c_r$ be a constant greater than the maximum norm of r. Then $\sigma$ can be bounded as

$$\|\sigma\| \leq \left\|\begin{bmatrix} e \\ 0\end{bmatrix}\right\| + \left\|\begin{bmatrix} c_{rm} \\ c_r\end{bmatrix}\right\| \tag{5.17}$$

and (5.16) can be bounded using the triangle inequality:

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + \|\tilde\Lambda\|\|D\|\|P\|\|\tilde{W}\|\|e\|\|\sigma\| - c_{\Omega_\sigma}\|\tilde{W}\|^2 - \|u\|^2\|\tilde\Lambda\|^2 - \sum_{j}^{p_{\max}}\|\tilde\Lambda u_j\|^2 + c_{err}\|\tilde{W}\|. \tag{5.18}$$

Define $c_q = \lambda_{\min}(Q)$; let $c_u$ be a positive constant with $c_u \leq \lambda_{\min}\big(\sum_{j}^{p_{\max}} u_j u_j^T\big)$; let $c_d = \|D\|$ and $c_P = \|P\|$; and let $c_R$ be a constant greater than the norm of $[\,c_{rm}\ c_r\,]^T$. Then (5.18), with the input term expanded using $u = W^T\sigma$ and the bound (5.17), becomes

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + c_d c_P\|\tilde\Lambda\|\|\tilde{W}\|\|e\|^2 + c_R c_d c_P\|\tilde\Lambda\|\|\tilde{W}\|\|e\| - (\|e\| - c_R)^2(\|\tilde{W}\| - c_K)^2\|\tilde\Lambda\|^2 - c_{\Omega_\sigma}\|\tilde{W}\|^2 - c_u\|\tilde\Lambda\|^2 + 8 c_R c_K\|\tilde\Lambda\|^2 + c_{err}\|\tilde{W}\|. \tag{5.19}$$
Since projection operators are being used with $\dot{\hat\Lambda}$ and $\dot{W}$, their upper bounds are known. Substituting these bounds into (5.19) obtains

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + c_d c_P \lambda_\Lambda w_{\max}\|e\|^2 + c_R c_d c_P \lambda_\Lambda w_{\max}\|e\| - (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\|^2 + 2 c_R (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\| - c_R^2(w_{\max} - c_K)^2\lambda_\Lambda^2 - c_{\Omega_\sigma}w_{\max}^2 - c_u\lambda_\Lambda^2 + 8 c_R c_K \lambda_\Lambda^2 + c_{err}w_{\max}. \tag{5.20}$$

Neglecting the negative definite constant terms in (5.20) leaves

$$\dot{V}(\zeta) \leq -c_q\|e\|^2 + c_d c_P \lambda_\Lambda w_{\max}\|e\|^2 - (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\|^2 + c_R c_d c_P \lambda_\Lambda w_{\max}\|e\| + 2 c_R (w_{\max} - c_K)^2\lambda_\Lambda^2\|e\| + 8 c_R c_K \lambda_\Lambda^2 + c_{err}w_{\max} \tag{5.21}$$

which is quadratic in $\|e\|$. To find the set, $\Omega_y$, where $\dot{V}$ is negative, set the left-hand side of (5.21) to 0 and solve for $\|e\|$:

$$c_4 = c_q + (w_{\max} - c_K)^2\lambda_\Lambda^2 - c_d c_P \lambda_\Lambda w_{\max}$$
$$c_5 = c_R c_d c_P \lambda_\Lambda w_{\max} + 2 c_R (w_{\max} - c_K)^2\lambda_\Lambda^2$$
$$c_6 = 8 c_R c_K \lambda_\Lambda^2 + c_{err} w_{\max}$$
$$\dot{V}(\zeta) \leq -c_4\|e\|^2 + c_5\|e\| + c_6 \tag{5.22}$$
$$\Omega_y = \left\{ e \;:\; \left(\|e\| - \frac{c_5}{2c_4}\right)^2 \leq \frac{c_5^2}{4c_4^2} + \frac{c_6}{c_4} \right\} \tag{5.23}$$

and it may be noted that $c_4$ is positive from the assumed limits in the theorem statement for $w_{\max}$ and $\lambda_\Lambda$, following the process used for $c_1$ in section 4.3. Let

$$\phi = \max_{e \in \Omega_y} V(e, w_{\max}, \lambda_\Lambda) \tag{5.24}$$

be the criterion for another set, $\Omega_\phi$, such that

$$\Omega_\phi = \{ e : V(e, w_{\max}, \lambda_\Lambda) \leq \phi \}; \tag{5.25}$$

then $\Omega_\phi$ is a positively invariant set with respect to the Lyapunov candidate, and $\Omega_y \subset \Omega_\phi$. $\dot{V}(\zeta)$ is negative outside $\Omega_\phi$ while $V(\zeta)$ is positive, so solutions starting outside $\Omega_\phi$ will enter it, and those beginning within $\Omega_\phi$ will not leave it, since $\dot{V}(\zeta)$ is negative at the boundary. Thus the zero solution of the system is bounded.

Remark 5.1 Note that knowledge of the signs of the elements of $\Lambda$ is not necessary for the boundedness condition. This sort of formulation has been used by others to model loss of actuators (see [11, 17, 31]), so negativeness does not arise in that application.

Remark 5.2 Another difference from the existing works is that, with the use of concurrent learning, D does not have to be $B_{rm}$. This allows a broader selection of reference models.

5.2.2 Locally Ultimately Bounded

After $t_s + \delta t$, $X_W$ is full rank and the system is bounded. If the system is examined in this state, a narrower bound may be found.

Theorem 8 Let $D \subset \mathbb{R}^n \times \Phi_{1,W} \times \Phi_{1,\Lambda} \subset \mathbb{R}^{n+m}$ be a domain that contains the origin, where m is the dimension of $[\,\mathrm{vec}(\tilde{W})^T\ \mathrm{vec}(\tilde\Lambda)^T\,]^T$ and n is the number of states; let constants $\mu, \nu \in \mathbb{R}$ be greater than 0; and let $V : D \to \mathbb{R}$ be a continuously differentiable function such that $\alpha_1(\|\zeta\|) \leq V(\zeta) \leq \alpha_2(\|\zeta\|)$ and the derivative of $V(\zeta)$ along the trajectories of the system satisfies $\dot{V}(\zeta) \leq -M(\zeta)\ \forall\ \nu > \|\zeta\| \geq \mu > 0,\ \forall\ \zeta \in D$, where $\alpha_1$ and $\alpha_2$ are positive, increasing functions whose limit as their argument goes to $\infty$ is $\infty$, and $M(\zeta)$ is a continuous, positive definite function. Assume the history stack for $\hat\Lambda$ is full rank. Additionally, let r(t) be such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_2 = t_s + \delta t$, 2n linearly independent data points are selected by the CL-HSMEM routine, where 2n is full rank for the W history stack, $X_W$. Then the closed loop system described by (2.1), (2.2), (2.4), (5.9), and (5.11), using a shocking method from section 3.2 and Theorem 7, is ultimately bounded for $t > T_2$, the time when $X_W$ becomes full rank after the first shock occurred.

Proof. The system fulfilling the criteria of Theorem 7 meets the requirement for $X_W$ to be full rank after $T_2$. Thus, following Theorem 5 with the necessary changes based on Assumption 4, the result is straightforward.

5.2.3 Rate of Convergence

Here, as in Chapter 4, the convergence rate will be analyzed to see whether exponential convergence into a set is feasible.
Theorem 9 If, in addition to the system meeting the conditions of Theorem 8, r(t) is such that the input is exciting over a finite interval $(0, t_s + \delta t)$, so that by $T_2 = t_s + \delta t$, 2n linearly independent data points are selected by the CL-HSMEM routine, then the solution $(e, \tilde{W}, \tilde\Lambda)$ of the closed loop system of (5.8) and (5.9) is exponentially ultimately bounded for $t \geq T_2$.

Proof. From Theorem 8, $X_W$ is full rank, $T_2$ is known, and the system is ultimately bounded. Then, making the necessary changes to align with Assumption 4,

$$\|\zeta\|^2 \leq \tfrac12\max\{\lambda_{\max}(P),\ \lambda_{\max}(\Gamma_W^{-1}),\ \lambda_{\max}(\Gamma_\Lambda^{-1})\}\,\|\zeta(T_2)\|^2 e^{-ct} + k \tag{5.26}$$

follows, where the constants are defined as in section 4.6. Therefore, for $t \geq T_2$, the solution $[\,e\ \tilde{W}\ \tilde\Lambda\,]$ of the closed loop system of (5.8), (5.9), and (5.11) is exponentially ultimately bounded in the sense of Definition 8.

5.3 Summary

The dissolved B matrix variation on the uncertain input allocation matrix has been presented. The variation is like those used by other authors to model actuator loss, but here the loss does not have to be scaled between [0, 1], nor does the scaling need to be positive. The known segment of the input allocation matrix, D, is also not required to equal the input allocation matrix of the reference model. The solution to the closed loop system is found to be bounded; with certain conditions met, ultimately bounded; and under even more specific circumstances, exponentially ultimately bounded.

CHAPTER 6

Simulations

6.1 Introduction

In this chapter, simulations using the shocking methods and the concurrent learning controller described previously are presented. Both the uncertain input allocation matrix case and the dissolved B case are simulated.

6.2 Uncertain Allocation Matrix

In this section, an example is presented where the controller is placed in a system with a known state matrix but an unknown input allocation matrix (a general case). The system state and input allocation matrices are defined as follows:

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 2 & -3 \end{bmatrix} \qquad B = \begin{bmatrix} -1 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0 \\ 0.5 & 0 & -0.5 \end{bmatrix}$$

The reference model state matrix is chosen to be Hurwitz with an identity input allocation matrix, and the estimate of B is $\hat{B}(0) = I_3$. Note that $\hat{B}(0)$ is significantly different from B, not only in magnitude but also in the signs of the parameters on the main diagonal. The reference input for state $X_1$ is set to 2 for the first 5 seconds. Then the reference for state $X_2$ is set to 2 from 15 to 25 seconds, and the reference for state $X_3$ is set to 2 from 35 to 45 seconds. Then state $X_1$ is set to 1, state $X_2$ to $-1$, and state $X_3$ to 0.5 from 60 to 80 seconds. To show that the system will track reference inputs where state $X_2$ is the derivative of state $X_1$ and state $X_3$ is the derivative of state $X_2$, from 85 to 95 seconds the reference input $R_1$ is a sine wave, with its derivative for $R_2$ and its second derivative for $R_3$. In Figure 6.1, the system states are shown tracking the reference model states.

Figure 6.1: Time History of Reference Model Tracking Using Algorithms 1, 2, and 3. The reference r(t) is shown on its own. Note the tracking is quite good after each algorithm shocks the stacks. A zoomed-in view is shown in Figure 6.2.
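As a companion to this setup, the following is a minimal sketch of the plant and reference-model simulation scaffolding under the matrices above, using forward-Euler integration. The reference schedule is abbreviated, the $A_{rm}$ choice is an illustrative Hurwitz matrix rather than the thesis's, and the adaptive laws are omitted, so this reproduces only the loop structure, not the controller:

```python
import numpy as np

# Plant and initial allocation estimate (entries as given above)
A = np.array([[0., 1., 0.], [0., 0., 1.], [-1., 2., -3.]])
B = np.array([[-1., 0.1, 0.1], [0.1, 0.9, 0.], [0.5, 0., -0.5]])
B_hat = np.eye(3)

# Hurwitz reference model with identity input allocation (illustrative choice)
A_rm = np.array([[-2., 1., 0.], [0., -2., 1.], [0., 0., -2.]])
B_rm = np.eye(3)

def reference(t):
    """Abbreviated version of the step schedule described in the text."""
    if t < 5.0:
        return np.array([2.0, 0.0, 0.0])
    if 15.0 <= t < 25.0:
        return np.array([0.0, 2.0, 0.0])
    return np.zeros(3)

dt, T = 0.001, 30.0
x, x_rm = np.zeros(3), np.zeros(3)
for k in range(int(T / dt)):
    r = reference(k * dt)
    u = np.zeros(3)         # placeholder: CL-MRAC law u = K^T x + K_r^T r goes here
    x += dt * (A @ x + B @ u)
    x_rm += dt * (A_rm @ x_rm + B_rm @ r)
```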
The tracking is quite good after an initial disparity, which is shown in greater detail in Figure 6.2. This is to be expected, since the controller holds an estimate of B at initialization that is vastly different from what B actually is.

Figure 6.2: Time History of Reference Model Tracking Using Algorithms 1, 2, and 3, plotted to show the first 10 seconds. The reference r(t) is shown on its own. The initial variability of the states is shown while $\hat{B}$ converges and each of the methods selects at least a point to shock the history stack. The shocks occurred at the blue '+' for Alg. 1, at the red 'X's for Alg. 2, and at the black triangle for Alg. 3.

Figure 6.3: System Tracking Errors Shown Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot). Note that the error reduces quickly once the stacks are shocked. Only the first 20 seconds is shown; the full time scale is shown in Appendix D.

Note that even with the initial disparity, the system states remain bounded. The disparity starts to reduce immediately as each of the algorithms shocks the history stacks. It should be noted that baseline MRAC laws diverged for the presented case of uncertain B matrix, and hence their response is not shown. The reference input is also shown in Figures 6.1 and 6.2. In Figure 6.3, the error between the system states and the reference model states is shown for the three algorithms. The error reduces as the B matrix is identified and the W matrix converges. The full time scale of the system tracking error is shown in Appendix D, Figure D.1, but the initial disparity seems the most important to show.

The convergence of $\hat{B}$ and W depends on the minimum eigenvalue of the respective history stack matrix. A time history of the minimum eigenvalues of the three history stacks is displayed in Figure 6.4, with the different algorithms noted.

Figure 6.4: Minimum Eigenvalues of the $\hat{B}$, K, and $K_r$ History Stacks Using Algorithms 1, 2, and 3. The markers denote when one of the algorithms purged the history stacks.

The markers in Figures 6.4 and 6.5 indicate when the stacks were purged under the different algorithms: a '+' for Algorithm 1, 'X's for Algorithm 2, and a triangle for Algorithm 3. Algorithm 2 purges the stacks many times in the first second and then does so once again at about 78 seconds, which is about the same time that all three state errors converge back to zero in Figure D.1. Notice that the minimum eigenvalues for Algorithm 2, K and $K_r$, drop to zero but begin increasing again as the input becomes exciting (85 sec). Algorithms 1 and 3 purge the stacks only once.
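The minimum-eigenvalue monitoring behind Figure 6.4 amounts to the following small computation; a sketch, where the stack columns are the recorded points:

```python
import numpy as np

def stack_min_eig(X):
    """Minimum eigenvalue of the Gram matrix X X^T of a history stack whose
    columns are recorded data points; zero until the stack spans the space."""
    return float(np.linalg.eigvalsh(X @ X.T).min())

# Example: three recorded 3-vectors that do not yet span R^3
X = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.]])
print(stack_min_eig(X))   # 0.0 -> stack not yet full rank
```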
Figure 6.5 indicates the convergence of the $\hat{B}$ matrix to the B matrix under the three algorithms. Note that only the first 30 seconds of the simulation are shown in Figure 6.5; the balance is quite similar to the last 5 seconds shown in the figure.

Figure 6.5: $\hat{B}$ Convergence Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot), Ideal Values (dotted). Note that the dashed lines of Alg. 2 arrive faster than the others at the ideal values. Markers are the same as in Figure 6.4.

The time histories of the element values of W converge to their ideal values (the dotted lines) in Figure 6.6, which is split into K and $K_r$ for clarity. The elements have arrived within the first 45 seconds, which concludes the state-by-state excitation of the input.

Figure 6.6: W Convergence Split into K and $K_r$ for Clarity Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot), Ideal Values (dotted). Note that the steps in convergence align with steps in the input, while the bounding of growth aligns with shocking the stacks. The dashed line of Alg. 2 peaks at 20.25 among the $K_x$ elements and dips to -14.26 among the $K_r$ elements before converging.

Remark 6.1 It is interesting to note that the bounding of the growth of the W values aligns with each algorithm's time for shocking the history stack, while the steps in W toward the ideal values align with steps in the input. This may be seen in Figure 6.6. All three algorithms have shocked the stacks by about 5 seconds. In the upper plot of the $K_x$ element values, the solid lines for Algorithm 1 grow until about 5 seconds; then the lines converge to their ideal values, with steps aligning with steps in the reference input, which can be seen most clearly in the lower plot of the $K_r$ element values, around 15 and 35 seconds.

Remark 6.2 This simulation was accomplished with a plant that is unstable and has complex eigenvalues.

6.3 Dissolved Input Allocation Matrix

This section describes the relaxed uncertain input allocation matrix assumption, Assumption 4. Usually, this scheme is used to model actuator control loss. The system state and input allocation matrices are the same as in the previous section. The reference model state matrix is chosen to be Hurwitz with an identity input allocation matrix. Per Assumption 4, $B = D\Lambda$ and $\Lambda$ is uncertain. As an example, Figure 6.7 shows the state response when $\Lambda$ is positive definite and concurrent learning is not used; MRAC is able to reduce the state tracking error, similar to [26].

Figure 6.7: Dissolved Input Allocation Matrix, not using concurrent learning. Note that $\Lambda$ must be positive definite in this example; notice the high oscillations in the system states, after which the states track the reference model reasonably well.

While the control method without concurrent learning works, it involves a large amount of oscillation during the adaptation process. Going from the MRAC example of Figure 6.7 back to the relaxed uncertain input allocation matrix case, the D and $\Lambda$ matrices are defined as

$$D = \begin{bmatrix} -0.5 & -0.0333 & 0.05 \\ 0.05 & -0.3 & 0 \\ 0.25 & 0 & -0.25 \end{bmatrix} \tag{6.1}$$

$$\Lambda = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix} \tag{6.2}$$

which multiply to equal B, while the controller's estimate of $\Lambda$ is initially $\hat\Lambda(0) = I_3$, where $I_3$ is the identity matrix of rank 3. Note that $\hat\Lambda(0)$ is significantly different from $\Lambda$, not only in magnitude but also in the sign of one parameter on the main diagonal. That negative sign in $\Lambda$ causes at least one of the eigenvectors of D to point in the opposite direction compared to the eigenvectors of B.
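A quick numeric check of the dissolution (6.1)-(6.2) against the B matrix of section 6.2 (entries as given above); a sketch, illustrating both the product and the eigenvector sign effect mentioned in the text:

```python
import numpy as np

D = np.array([[-0.5, -0.0333, 0.05],
              [ 0.05, -0.3,   0.0 ],
              [ 0.25,  0.0,  -0.25]])
Lam = np.diag([2.0, -3.0, 2.0])
B = np.array([[-1.0, 0.1,  0.1],
              [ 0.1, 0.9,  0.0],
              [ 0.5, 0.0, -0.5]])

print(np.allclose(D @ Lam, B, atol=1e-3))   # True: D * Lambda recovers B (to rounding)
# The negative entry of Lambda flips the sign of the corresponding column of D,
# which is why eigenvector directions of D and B can disagree.
```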
Remark 6.3 At this point, the normal adaptive law of MRAC, (2.28) for instance, would fail and cause the weights to diverge, because it uses D, which is known, while the plant uses B, and their eigenvectors are different. In systems where D is in the subspace of B, this does not matter, because their eigenvectors are the same, but that is not the case here.

For this case's simulation, the input is a series of steps of different magnitudes, shown in the bottom plot of Figure 6.8, for the first 55 seconds. Then the reference takes different magnitudes on all three states simultaneously for 20 seconds. Finally, the reference executes a sine waveform for state $X_1$, its derivative for state $X_2$, and its second derivative for state $X_3$ from 85 seconds to 95 seconds; the reference is then zero on all states through the end of the simulation. Figure 6.8 shows that adding concurrent learning to this adaptation process removes the oscillation mentioned for Figure 6.7, and it shows the reference input. Remember that at the beginning, the scaling matrix and the weights matrix are both wrong, so tracking errors are to be expected. Those errors lessen with time, as can be seen in Figure 6.9. Again, as the tracking error approaches zero, non-concurrent-learning MRAC adaptation would slow and effectively stop; with concurrent learning, the parameters and weights continue to be driven to their ideal values, which is shown in Figure 6.10. There, the dashed lines are the ideal values for the weights, and the history stack was shocked at about 9 seconds. This is marked on Figure 6.11 with a '+', along with the display of the convergence of $\hat\Lambda$ to its ideal values. The full-length time history of $\hat\Lambda$ is shown in the appendix on page 104.

Figure 6.8: Dissolved Input Allocation Matrix, using concurrent learning. Note the reduction in oscillation compared to Figure 6.7; this simulation has a negative sign in $\Lambda$ and takes longer to settle.

Figure 6.9: Dissolved Input Allocation Matrix Errors, using concurrent learning. The errors decrease with time.

Figure 6.10: Dissolved Input Allocation Matrix Adaptive Weights, with concurrent learning. Note the same steps toward the ideal (dotted) values aligning with steps in the reference input.

Figure 6.11: Dissolved Input Allocation Matrix, $\hat\Lambda$ Convergence, using concurrent learning. The time scale was reduced to show the period prior to the history stack shock, which happened at the '+'. The full time history is pictured in the appendix.

6.4 Summary

Two cases have been shown in this chapter: the uncertain input allocation matrix case and the dissolved input allocation matrix case. In both cases, the reference model allocation matrices were quite different from the plant's allocation matrix, and in both cases the controllers converged onto the ideal weights and reduced the tracking error.

CHAPTER 7

Conclusion

7.1 Summary

This thesis addresses the problem of uncertain input allocation in linear systems control.
Using concurrent learning, the result can be achieved with a known state matrix. The approach relies on simultaneous estimation of the uncertain input allocation matrix while the system is actively controlled using that estimate of the uncertain allocation matrix. Lyapunov theory was used to show that the approach results in the system states being bounded. Purging, or shocking, of the concurrent learning history stack is required to remove data retained while the B matrix estimates were far from their true values. Three algorithms for shocking the history stack were presented. Algorithms 2 and 3 have the advantage of being able to handle multiple changes in the input allocation matrix if those changes happen relatively slowly; Algorithm 1 would not be able to do that in its present form. The local ultimate boundedness and local exponential ultimate boundedness of this result are shown. Then a relaxation of the uncertain input allocation matrix assumption is discussed and shown to be locally ultimately bounded in the closed-loop control case, and locally exponentially ultimately bounded as well. Simulation results for both systems are included to show the controller's performance characteristics. These results establish the feasibility of using a learning-based adaptive controller to handle uncertainties in input allocation matrices.

7.2 Future Work

Recommendations for future work include:

- Relax the assumption that A is known. This can be accomplished if the sign of the input allocation matrix is known and does not change, along the lines of [26].
- Work with NASA's flexible Generic Transport Model, f-GTM, since the wings may cause situations where the input allocation matrix is uncertain.
- Delve into multiple-objective optimization, MO-Op, where objectives may fight each other's fulfillment. In the case of the f-GTM, the objectives could be something like fuel efficiency, ride comfort, and stability.
- Use risk-averse methods to make inferences on the stability of systems.
- Investigate the uncertain B matrix from singular perturbation theory, where $\tilde{B}$ is the fast variable and the other system variables are considered slow.

References

[1] Karl Johan Åström and Björn Wittenmark. Adaptive Control. Addison-Wesley, Reading, MA, 2nd edition, 1995.

[2] Stephen Boyd and Shankar Sastry. Necessary and sufficient conditions for parameter convergence in adaptive control. Automatica, 22(6):629–639, 1986.

[3] William Brogan. Modern Control Theory. Prentice Hall, Englewood Cliffs, NJ, 1991.

[4] Girish Chowdhary. Concurrent Learning for Convergence in Adaptive Control Without Persistency of Excitation. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 2010.

[5] Girish Chowdhary and Eric N. Johnson. Concurrent learning for convergence in adaptive control without persistency of excitation. In 49th IEEE Conference on Decision and Control, pages 3674–3679, 2010.

[6] Girish Chowdhary and Eric N. Johnson. Theory and flight test validation of a concurrent learning adaptive controller. Journal of Guidance, Control, and Dynamics, 34(2):592–607, March 2011.

[7] Girish Chowdhary, Maximilian Mühlegg, Jonathan P. How, and Florian Holzapfel. Concurrent learning adaptive model predictive control. In Qiping Chu, Bob Mulder, Daniel Choukroun, Erik-Jan van Kampen, Coen de Visser, and Gertjan Looye, editors, Advances in Aerospace Guidance, Navigation and Control, pages 29–47. Springer Berlin Heidelberg, 2013.

[8] Girish Chowdhary, Maximilian Mühlegg, and Eric N. Johnson.
Exponential parameter and tracking error convergence guarantees for adaptive controllers without persistency of excitation. International Journal of Control, 87(8):1583–1603, 2014.

[9] Girish Chowdhary, Tongbin Wu, Mark Cutler, and Jonathan P. How. Rapid transfer of controllers between UAVs using learning based adaptive control. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2013.

[10] Girish Chowdhary, Tansel Yucelen, Maximilian Mühlegg, and Eric N. Johnson. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. International Journal of Adaptive Control and Signal Processing, 27(4):280–301, 2013.

[11] Travis E. Gibson, Anuradha M. Annaswamy, and Eugene Lavretsky. Adaptive systems with closed-loop reference models: Stability, robustness and transient performance. arXiv preprint arXiv:1201.4897, 2012.

[12] Wassim M. Haddad and VijaySekhar Chellaboina. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press, Princeton, 2008.

[13] Martin Hagan. Neural Network Design. PWS Publishing, Boston, MA, 1st edition, 1996.

[14] Zhuo Han and Kumpati Narendra. Multiple adaptive models for control. In Conference on Decision and Control, pages 60–65, Atlanta, December 2010. IEEE.

[15] Petros A. Ioannou and Petar V. Kokotovic. Adaptive Systems with Reduced Models. Springer Verlag, Secaucus, NJ, 1983.

[16] Hassan Khalil. Nonlinear Systems. Prentice Hall, Upper Saddle River, NJ, 3rd edition, 2002.

[17] Nakwan Kim. Improved Methods in Neural Network Based Adaptive Output Feedback Control, with Applications to Flight Control. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 2003.

[18] Eugene Lavretsky. Combined/composite model reference adaptive control. IEEE Transactions on Automatic Control, 54(11):2692–2707, November 2009.

[19] Eugene Lavretsky, Travis E. Gibson, and Anuradha M. Annaswamy. Projection operator in adaptive systems, 2011.

[20] Maximilian Mühlegg, Girish Chowdhary, and Eric N. Johnson. Concurrent learning adaptive control of linear systems with noisy measurements. In AIAA Guidance, Navigation, and Control Conference. American Institute of Aeronautics and Astronautics, 2012.

[21] Kumpati S. Narendra and Anuradha M. Annaswamy. Stable Adaptive Systems. Prentice-Hall, Englewood Cliffs, 1989.

[22] Kumpati S. Narendra and Jeyendran Balakrishnan. Adaptive control using multiple models. IEEE Transactions on Automatic Control, 42(2):171–187, February 1997.

[23] Roger Nussbaum. Some remarks on a conjecture in parameter adaptive control. Systems & Control Letters, 3(5):243–246, November 1983.

[24] R. Penrose. A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society, 51:406–413, 1955.

[25] Jean-Baptiste Pomet and Laurent Praly. Adaptive nonlinear regulation: Estimation from the Lyapunov equation. IEEE Transactions on Automatic Control, 37(6):729–740, June 1992.

[26] Ben Reish, Girish Chowdhary, Kemal Ure, and Jonathan P. How. Concurrent learning adaptive control in the presence of uncertain control allocation matrix. In Proc. of the Conference on Guidance, Navigation, and Control, pages 1–19. AIAA, 2013.

[27] Benjamin Reish and Girish Chowdhary. Concurrent learning adaptive control for systems with unknown sign of control effectiveness. In Proceedings of the Conference on Decision and Control, pages 4131–4136. IEEE, 2014.

[28] Mario A. Santillo and Dennis S. Bernstein.
Adaptive control based on retrospective cost optimization. Journal of Guidance, Control, and Dynamics, 33(2):289–304, March-April 2010.

[29] Amith Somanath. Adaptive Control of Hypersonic Vehicles in Presence of Actuation Uncertainties. SM thesis, Massachusetts Institute of Technology, Cambridge, MA, June 2010.

[30] Gang Tao. Adaptive Control Design and Analysis. Wiley, New York, 2003.

[31] Gang Tao, Suresh M. Joshi, and Xiaoli Ma. Adaptive state feedback and tracking control of systems with actuator failures. IEEE Transactions on Automatic Control, 46(1):78–95, January 2001.

[32] Andrey Nikolayevich Tikhonov. On the stability of inverse problems. Dokl. Akad. Nauk SSSR, 39(5):195–198, 1943.

[33] Ming-Jui Yu, Yousaf Rahman, Ella M. Atkins, Ilya V. Kolmanovsky, and Dennis S. Bernstein. Minimal modeling adaptive control of the NASA generic transport model with unknown control-surface faults. In Proc. of the AIAA Guidance, Navigation, and Control (GNC) Conference, pages 1–21, Boston, MA, August 2013.

APPENDIX A

Derivations

This appendix contains the full derivations of the jumps shown in the preceding chapters. Section A.1 shows the derivation of the normal concurrent learning MRAC equation using the known and unknown allocation matrix components. Section A.2 shows the derivation of the concurrent learning regressor terms. Section A.3 shows the derivation of the system error in terms of $\tilde{B}$ and $\hat{B}$. Section A.4 derives the Lyapunov candidate and its derivative and works through bounding them. Section A.5 expands the regressors for K and $K_r$ into terms capable of being bounded. Section A.6 shows how the input can be expressed in terms of e, $\tilde{K}$, and $\tilde{K}_r$.

A.1 Expanding the Input Matrix into Known and Unknown Components

Assume $A_{rm} = A + B K^{*T}$ and $B_{rm} = B K_r^{*T}$. Assume $B = \hat{B} - \tilde{B}$, because $\tilde{B} \equiv \hat{B} - B$. Then $\hat{B}$ is the component known to the controller, while $\tilde{B}$ is the component unknown to the controller.
$$\dot{x} = A x + B u = A x + (\hat{B} - \tilde{B}) u \tag{A.1, A.2}$$

with the input

$$u = u_{pd} + u_{rm} = K^T x + K_r^T r \tag{A.3}$$

so that

$$\dot{x} = A x + \hat{B}(K^T x + K_r^T r) - \tilde{B}(K^T x + K_r^T r). \tag{A.4}$$

Adding and subtracting $(\hat{B} - \tilde{B}) K^{*T} x$ and $(\hat{B} - \tilde{B}) K_r^{*T} r$, and regrouping with $\tilde{K} = K - K^*$ and $\tilde{K}_r = K_r - K_r^*$, the intermediate steps (A.5)-(A.15) are routine rearrangement and cancellation, arriving at

$$\dot{x} = (A x + B K^{*T} x) + B K_r^{*T} r + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}\tilde{K}^T x - \tilde{B}\tilde{K}_r^T r \tag{A.16}$$
$$\phantom{\dot{x}} = A_{rm} x + B_{rm} r + (\hat{B} - \tilde{B})\tilde{K}^T x + (\hat{B} - \tilde{B})\tilde{K}_r^T r \tag{A.19}$$
$$\dot{x} = A_{rm} x + B_{rm} r + B\tilde{K}^T x + B\tilde{K}_r^T r. \tag{A.20}$$

It can be seen that the derivation of $\dot{x}$ is the same from either perspective, using B or using $\hat{B} - \tilde{B}$. Equation (A.19) is useful for further discussions.

A.2 Define Concurrent Learning Error Terms

Now the definitions of the terms have to be addressed. Using (A.19) as a starting point for the derivation is convenient, since it has both $\hat{B}$ and $\tilde{B}$ displayed in it. Rearranging (A.19) and isolating the $\tilde{K}$ term ((A.21)-(A.27)):

$$\hat{B}\tilde{K}^T x = \dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x) \tag{A.27}$$
$$\epsilon_K = \tilde{K}^T x = \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x)\big). \tag{A.28}$$

From $\epsilon_K$ to what concurrent learning stores:

$$(K^T - K^{*T}) x = \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x)\big) \tag{A.29}$$
$$\delta_K = K^{*T} x = K^T x - \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r + \tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x)\big) \tag{A.30}$$
$$\delta_K = K^{*T} x + \hat{B}^\dagger\tilde{B}(\tilde{K}_r^T r + \tilde{K}^T x) = K^T x - \hat{B}^\dagger\big(\dot{x} - A_{rm} x - B_{rm} r - \hat{B}\tilde{K}_r^T r\big). \tag{A.31}$$

So (A.30) shows why the stacks have to be shocked, and (A.31) shows what the controller actually uses to calculate $\delta_K$ on the right-hand side of the history stack entry, so that the $\tilde{B}$ errors end up on the left-hand side. Now for the definition of the $K_r$ term.
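The identity (A.19)/(A.20) can be checked numerically; a small sketch with random matrices (dimensions arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
Ks, Krs = rng.normal(size=(n, n)), rng.normal(size=(n, n))   # K*, K_r*
K, Kr = rng.normal(size=(n, n)), rng.normal(size=(n, n))
x, r = rng.normal(size=n), rng.normal(size=n)

A_rm, B_rm = A + B @ Ks.T, B @ Krs.T          # matching conditions
Kt, Krt = K - Ks, Kr - Krs                     # K tilde, K_r tilde

lhs = A @ x + B @ (K.T @ x + Kr.T @ r)         # x_dot = Ax + Bu, per (A.1)-(A.4)
rhs = A_rm @ x + B_rm @ r + B @ (Kt.T @ x) + B @ (Krt.T @ r)   # (A.20)
print(np.allclose(lhs, rhs))                   # True
```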
$$\tilde{K}_r^T = K_r^T - K_r^{*T} \tag{A.32}$$
$$\epsilon_{K_r} = \tilde{K}_r^T r = K_r^T r - K_r^{*T} r = K_r^T r - B^\dagger B_{rm} r = K_r^T r - (\hat{B} - \tilde{B})^\dagger B_{rm} r. \tag{A.33-A.35}$$

Working back an alternate way of showing the same thing:

$$B_{rm} = B K_r^{*T} = (\hat{B} - \tilde{B}) K_r^{*T} = \hat{B} K_r^{*T} - \tilde{B} K_r^{*T} \tag{A.37}$$
$$\hat{B} K_r^{*T} = B_{rm} + \tilde{B} K_r^{*T} \tag{A.38}$$
$$K_r^{*T} = \hat{B}^\dagger\big(B_{rm} + \tilde{B} K_r^{*T}\big). \tag{A.39}$$

Having found an equation for $K_r^{*T}$, plugging it into (A.33):

$$\epsilon_{K_r} = \tilde{K}_r^T r = K_r^T r - \hat{B}^\dagger\big(B_{rm} + \tilde{B} K_r^{*T}\big) r \tag{A.42}$$
$$\hat{\epsilon}_{K_r} = \tilde{K}_r^T r + \hat{B}^\dagger\tilde{B} K_r^{*T} r = K_r^T r - \hat{B}^\dagger B_{rm} r \tag{A.43}$$
$$\hat{\delta}_{K_r} = \hat{B}^\dagger B_{rm} r. \tag{A.46}$$

Again, an error term shows up that makes the reason for shocking easier to understand; once again, (A.43) shows why shocking is necessary.

A.3 Derivative of the Error

Look at what happens to $\dot{e}$ when $B = \hat{B} - \tilde{B}$ is used.

$$e = x - x_{rm} \tag{A.47}$$
$$\dot{e} = \dot{x} - \dot{x}_{rm} = A x + B u - A_{rm} x_{rm} - B_{rm} r \tag{A.48, A.49}$$
$$\phantom{\dot{e}} = A x + \hat{B}(K^T x + K_r^T r) - \tilde{B}(K^T x + K_r^T r) - A_{rm} x_{rm} - B_{rm} r. \tag{A.52}$$

Adding and subtracting $(\hat{B} - \tilde{B}) K^{*T} x$ and $(\hat{B} - \tilde{B}) K_r^{*T} r$ and regrouping, exactly as in section A.1 (steps (A.53)-(A.67)), delivers

$$\dot{e} = A_{rm} e + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}(\tilde{K}^T x + \tilde{K}_r^T r). \tag{A.68}$$

Here again, the error term from using $\hat{B}$ is obvious in (A.68).

A.4 The Lyapunov Candidate and Derivative

Make $\zeta = [\,e^T\ \ \mathrm{vec}(\tilde{K})^T\ \ \mathrm{vec}(\tilde{K}_r)^T\ \ \mathrm{vec}(\tilde{B})^T\,]^T$ for ease of exposition. The trace will be used in this Lyapunov candidate, so a subsection is reserved for pertinent properties of the trace operator. The section then continues with the expansion of the Lyapunov candidate derivative.
A.4.1 Properties of the Trace Operator

The trace, $\mathrm{tr}\{\cdot\}$, is defined as the sum of the main diagonal elements of a square matrix. Another way to think about it is $\mathrm{tr}\{A\} = \sum_i A_{ii}$, or even (assuming the $\mathrm{vec}(\cdot)$ operator is defined) $\mathrm{tr}\{A\} = \mathrm{vec}(A)^T\mathrm{vec}(I)$, where I is the identity matrix of the same dimension as A. A matrix and its transpose have equal traces: $\mathrm{tr}\{A\} = \mathrm{tr}\{A^T\}$. The trace of a product of several matrices is equal to the trace of those matrices in any cyclic permutation: $\mathrm{tr}\{ABCD\} = \mathrm{tr}\{BCDA\} = \mathrm{tr}\{CDAB\} = \mathrm{tr}\{DABC\}$. This is different from an arbitrary reordering of the factors; in general, $\mathrm{tr}\{ABC\} \neq \mathrm{tr}\{ACB\}$. If the matrices A, B, and C are symmetric, then the order does not matter and their traces are equal. [3, Chapter 4]
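A quick numeric illustration of the cyclic property (a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))

print(np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)))  # True: cyclic shift
print(np.isclose(np.trace(A @ B @ C), np.trace(A @ C @ B)))  # generally False: swap
```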
A.4.2 The Lyapunov Candidate Equation

Define the candidate as a non-negative, scalar-valued function that measures the energy in the system. Usually, the working variables of the system make an appearance, and since the value should be non-negative, the working variables are squared:

$$V(e, \tilde{K}, \tilde{K}_r, \tilde{B}) = \tfrac12 e^T P e + \tfrac12\mathrm{tr}\{\tilde{K}^T \Gamma_x^{-1} \tilde{K}\} + \tfrac12\mathrm{tr}\{\tilde{K}_r^T \Gamma_r^{-1} \tilde{K}_r\} + \tfrac12\mathrm{tr}\{\tilde{B}^T \Gamma_B^{-1} \tilde{B}\} \tag{A.69}$$

$$\dot{V} = \tfrac12\big(\dot{e}^T P e + e^T P \dot{e}\big) + \tfrac12\mathrm{tr}\{\dot{\tilde{K}}^T \Gamma_x^{-1} \tilde{K} + \tilde{K}^T \Gamma_x^{-1} \dot{\tilde{K}}\} + \tfrac12\mathrm{tr}\{\dot{\tilde{K}}_r^T \Gamma_r^{-1} \tilde{K}_r + \tilde{K}_r^T \Gamma_r^{-1} \dot{\tilde{K}}_r\} + \tfrac12\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}. \tag{A.70}$$

The derivatives of the important variables are summarized here, with the concurrent learning terms included:

$$\dot{e} = A_{rm} e + \hat{B}\tilde{K}^T x + \hat{B}\tilde{K}_r^T r - \tilde{B}(\tilde{K}^T x + \tilde{K}_r^T r) \tag{A.71}$$
$$\dot{\tilde{K}} = -\Gamma_x\Big(x e^T P \hat{B} + \sum_{j=1}^{p_{\max}} x_j\,\hat{\epsilon}_{K,j}^T\Big) \tag{A.74}$$
$$\dot{\tilde{K}}_r = -\Gamma_r\Big(r e^T P \hat{B} + \sum_{j=1}^{p_{\max}} r_j\,\hat{\epsilon}_{K_r,j}^T\Big) \tag{A.75}$$
$$\dot{\tilde{B}} = -\Gamma_B\Big(u(\hat{B} u - \dot{x} + A x)^T + \sum_{k=1}^{p_{\max}} u_k(\hat{B} u_k - \dot{x}_k + A x_k)^T\Big). \tag{A.76}$$

A.4.3 The Lyapunov Candidate Derivative

Substituting the error dynamics (A.71) into (A.70) and expanding ((A.77)-(A.80)) produces the quadratic term $\tfrac12 e^T(A_{rm}^T P + P A_{rm}) e$ plus cross terms in $\tilde{K}$, $\tilde{K}_r$, and $\tilde{B}$. The Lyapunov equation is necessary for MRAC-type control system stability to be analyzed: for a chosen positive definite, symmetric $Q \in \mathbb{R}^{n\times n}$, and assuming $A = A_{rm}$, select the positive definite, symmetric P such that $A_{rm}^T P + P A_{rm} = -Q$. Applying the Lyapunov equation replaces the quadratic term with $-\tfrac12 e^T Q e$ ((A.81)-(A.82)). Substituting the update law (A.74) for $\dot{\tilde{K}}$, the cross terms $x^T\tilde{K}\hat{B}^T P e$ and $e^T P\hat{B}\tilde{K}^T x$ generated by the error dynamics cancel against the instantaneous part of the update law, and the cyclic property of the trace collects the remainder ((A.83)-(A.91)), leaving

$$\dot{V} = \tfrac12\big(-e^T Q e + r^T\tilde{K}_r\hat{B}^T P e + e^T P\hat{B}\tilde{K}_r^T r - r^T\tilde{K}_r\tilde{B}^T P e - e^T P\tilde{B}\tilde{K}_r^T r\big) - \mathrm{tr}\Big\{e^T P\tilde{B}\tilde{K}^T x + \tilde{K}^T\sum_{i=1}^{p_{\max}} x_i\,\hat{\epsilon}_{K,i}^T\Big\} + \tfrac12\mathrm{tr}\{\dot{\tilde{K}}_r^T\Gamma_r^{-1}\tilde{K}_r + \tilde{K}_r^T\Gamma_r^{-1}\dot{\tilde{K}}_r\} + \tfrac12\mathrm{tr}\{\dot{\tilde{B}}^T\Gamma_B^{-1}\tilde{B} + \tilde{B}^T\Gamma_B^{-1}\dot{\tilde{B}}\}. \tag{A.92}$$

Now that $\tilde{K}$ is taken care of, move on to $\tilde{K}_r$.
Pulling the factor of one half into the \tilde{K} trace term,

\dot{V}(\zeta) = \tfrac{1}{2}\big[-e^T Q e + r^T \tilde{K}_r \hat{B}^T P e + e^T P \hat{B}\tilde{K}_r^T r - r^T \tilde{K}_r \tilde{B}^T P e - e^T P \tilde{B}\tilde{K}_r^T r\big] - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{K}}_r^T \Gamma_r^{-1} \tilde{K}_r + \tilde{K}_r^T \Gamma_r^{-1} \dot{\tilde{K}}_r\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}   (A.93)

Substituting the update law (A.75) into the \tilde{K}_r trace terms and cancelling \Gamma_r^{-1}\Gamma_r,

\tfrac{1}{2}\mathrm{tr}\Big\{-\Big(r e^T P \hat{B} + \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big)^T \tilde{K}_r - \tilde{K}_r^T\Big(r e^T P \hat{B} + \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big)\Big\} = \tfrac{1}{2}\mathrm{tr}\Big\{-\hat{B}^T P e r^T \tilde{K}_r - \tilde{K}_r^T r e^T P \hat{B} - \Big(\sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big)^T \tilde{K}_r - \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\}   (A.94)–(A.96)

Exactly as with \tilde{K}, \mathrm{tr}\{\hat{B}^T P e r^T \tilde{K}_r\} = r^T \tilde{K}_r \hat{B}^T P e = e^T P \hat{B}\tilde{K}_r^T r and \mathrm{tr}\{\tilde{K}_r^T r e^T P \hat{B}\} = e^T P \hat{B}\tilde{K}_r^T r, so the \hat{B}-dependent terms cancel,   (A.97)–(A.99)

leaving

\dot{V}(\zeta) = \tfrac{1}{2}\big[-e^T Q e - r^T \tilde{K}_r \tilde{B}^T P e - e^T P \tilde{B}\tilde{K}_r^T r\big] - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\Big\{-2\tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}   (A.100)

Since -r^T \tilde{K}_r \tilde{B}^T P e = -(e^T P \tilde{B}\tilde{K}_r^T r)^T = -e^T P \tilde{B}\tilde{K}_r^T r, the \tilde{K}_r pieces combine the same way the \tilde{K} pieces did,   (A.101)–(A.103)

and the derivative reduces to

\dot{V}(\zeta) = -e^T \frac{Q}{2} e - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\{\dot{\tilde{B}}^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1} \dot{\tilde{B}}\}   (A.104)

Now expand \dot{\tilde{B}} using its update law.
Substituting (A.76) into the \tilde{B} trace terms of (A.104),

\dot{V}(\zeta) = -e^T \frac{Q}{2} e - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} + \tfrac{1}{2}\mathrm{tr}\Big\{\Big[-\Gamma_B\Big((\hat{B}u - \dot{x} + Ax)u^T + \sum_{k=1}^{p_{max}} (\hat{B}u_k - \dot{x}_k + Ax_k)u_k^T\Big)\Big]^T \Gamma_B^{-1} \tilde{B} + \tilde{B}^T \Gamma_B^{-1}\Big[-\Gamma_B\Big((\hat{B}u - \dot{x} + Ax)u^T + \sum_{k=1}^{p_{max}} (\hat{B}u_k - \dot{x}_k + Ax_k)u_k^T\Big)\Big]\Big\}   (A.105)

Since \dot{x} = Ax + Bu, the residual is \hat{B}u - \dot{x} + Ax = (\hat{B} - B)u = \tilde{B}u, and likewise for each recorded point, so the \tilde{B} trace terms become

\tfrac{1}{2}\mathrm{tr}\Big\{-\Big(\tilde{B}u u^T + \sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T\Big)^T \tilde{B} - \tilde{B}^T\Big(\tilde{B}u u^T + \sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T\Big)\Big\}   (A.106)–(A.108)

By the transpose and cyclic properties of the trace, the two halves are equal, giving

\tfrac{1}{2}\mathrm{tr}\Big\{-2\tilde{B}u u^T \tilde{B}^T - 2\sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T \tilde{B}^T\Big\}   (A.109)–(A.111)

and therefore

\dot{V}(\zeta) = -e^T \frac{Q}{2} e - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\} - \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\} - \mathrm{tr}\Big\{\tilde{B}u u^T \tilde{B}^T + \sum_{k=1}^{p_{max}} \tilde{B}u_k u_k^T \tilde{B}^T\Big\}   (A.112)

A.5 Expanding the Hatted Epsilons

As noted, \hat{\epsilon}_K = \Delta_K + \Delta_{K_r} + \epsilon_K and \hat{\epsilon}_{K_r} = \Delta_{K_r} + \epsilon_{K_r}. In (A.112), the \hat{\epsilon}_K and \hat{\epsilon}_{K_r} have to be expanded in order to attempt a bound on the Lyapunov candidate derivative. Expanding \hat{\epsilon}_K, with \epsilon_{K,i} = \tilde{K}^T x_i and \Omega_x = \sum_{i=1}^{p_{max}} x_i x_i^T:

\mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \hat{\epsilon}_{K,i}^T\Big\}   (A.113)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r} + \epsilon_{K,i})^T\Big\}   (A.114)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i \epsilon_{K,i}^T\Big\}   (A.115)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\tilde{K}^T x_i)^T\Big\}   (A.116)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i x_i^T \tilde{K}\Big\}   (A.117)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}^T x + \tilde{K}^T \sum_{i=1}^{p_{max}} x_i (\Delta_K + \Delta_{K_r})^T\Big\} + \mathrm{tr}\{\tilde{K}^T \Omega_x \tilde{K}\}   (A.118)–(A.119)

Expanding \hat{\epsilon}_{K_r}, with \epsilon_{K_r,j} = \tilde{K}_r^T r_j and \Omega_r = \sum_{j=1}^{p_{max}} r_j r_j^T:

\mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \hat{\epsilon}_{K_r,j}^T\Big\}   (A.120)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j (\Delta_{K_r} + \epsilon_{K_r,j})^T\Big\}   (A.121)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \Delta_{K_r}^T + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j (\tilde{K}_r^T r_j)^T\Big\}   (A.122)–(A.123)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \Delta_{K_r}^T + \tilde{K}_r^T \Omega_r \tilde{K}_r\Big\}   (A.124)–(A.125)

= \mathrm{tr}\Big\{e^T P \tilde{B}\tilde{K}_r^T r + \tilde{K}_r^T \sum_{j=1}^{p_{max}} r_j \Delta_{K_r}^T\Big\} + \mathrm{tr}\{\tilde{K}_r^T \Omega_r \tilde{K}_r\}   (A.126)–(A.127)

A.6 Expanding the Input

Here the input, u, is expanded into terms that may be bounded. The idea is that, since u enters (A.112) squared, the uu^T term is non-negative; this section shows that it could be expanded into the four working variables, but it does not need to be.
u = K^T x + K_r^T r   (A.128)

= (\tilde{K} + K^*)^T x + (\tilde{K}_r + K_r^*)^T r   (A.129)

= (\tilde{K} + K^*)^T (e + x_{rm}) + (\tilde{K}_r + K_r^*)^T r   (A.130)

= (\tilde{K}^T + K^{*T})(e + x_{rm}) + (\tilde{K}_r^T + K_r^{*T}) r   (A.131)

= \tilde{K}^T e + \tilde{K}^T x_{rm} + K^{*T} e + K^{*T} x_{rm} + \tilde{K}_r^T r + K_r^{*T} r   (A.132)

Now expand uu^T into elements that are boundable:

uu^T = (K^T x + K_r^T r)(K^T x + K_r^T r)^T   (A.133)

= (K^T x + K_r^T r)(x^T K + r^T K_r)   (A.134)

= (\tilde{K}^T e + \tilde{K}^T x_{rm} + K^{*T} e + K^{*T} x_{rm} + \tilde{K}_r^T r + K_r^{*T} r)(e^T \tilde{K} + x_{rm}^T \tilde{K} + e^T K^* + x_{rm}^T K^* + r^T \tilde{K}_r + r^T K_r^*)   (A.135)

Multiplying out and collecting the six "square" terms first,

uu^T = \tilde{K}^T e e^T \tilde{K} + \tilde{K}^T x_{rm} x_{rm}^T \tilde{K} + K^{*T} e e^T K^* + K^{*T} x_{rm} x_{rm}^T K^* + \tilde{K}_r^T r r^T \tilde{K}_r + K_r^{*T} r r^T K_r^*
+ \tilde{K}^T e x_{rm}^T \tilde{K} + \tilde{K}^T e e^T K^* + \tilde{K}^T e x_{rm}^T K^* + \tilde{K}^T e r^T \tilde{K}_r + \tilde{K}^T e r^T K_r^*
+ \tilde{K}^T x_{rm} e^T \tilde{K} + \tilde{K}^T x_{rm} e^T K^* + \tilde{K}^T x_{rm} x_{rm}^T K^* + \tilde{K}^T x_{rm} r^T \tilde{K}_r + \tilde{K}^T x_{rm} r^T K_r^*
+ K^{*T} e e^T \tilde{K} + K^{*T} e x_{rm}^T \tilde{K} + K^{*T} e x_{rm}^T K^* + K^{*T} e r^T \tilde{K}_r + K^{*T} e r^T K_r^*
+ K^{*T} x_{rm} e^T \tilde{K} + K^{*T} x_{rm} x_{rm}^T \tilde{K} + K^{*T} x_{rm} e^T K^* + K^{*T} x_{rm} r^T \tilde{K}_r + K^{*T} x_{rm} r^T K_r^*
+ \tilde{K}_r^T r e^T \tilde{K} + \tilde{K}_r^T r x_{rm}^T \tilde{K} + \tilde{K}_r^T r e^T K^* + \tilde{K}_r^T r x_{rm}^T K^* + \tilde{K}_r^T r r^T K_r^*
+ K_r^{*T} r e^T \tilde{K} + K_r^{*T} r x_{rm}^T \tilde{K} + K_r^{*T} r e^T K^* + K_r^{*T} r x_{rm}^T K^* + K_r^{*T} r r^T \tilde{K}_r   (A.136)–(A.137)

That is, 6 positive definite terms and 30 not-so-positive-definite terms. Using the same constants as in Chapter 3 (\kappa_x \ge \|K^*\|, \kappa_r \ge \|K_r^*\|, c_{rm} \ge \|x_{rm}\|, and c_r \ge \|r\|), bounding each term by the product of norms gives the upper bound of uu^T:

uu^T \le \|\tilde{K}\|^2\|e\|^2 + c_{rm}^2\|\tilde{K}\|^2 + \kappa_x^2\|e\|^2 + \kappa_x^2 c_{rm}^2 + c_r^2\|\tilde{K}_r\|^2 + c_r^2\kappa_r^2
+ c_{rm}\|e\|\|\tilde{K}\|^2 + \kappa_x\|\tilde{K}\|\|e\|^2 + c_{rm}\kappa_x\|\tilde{K}\|\|e\| + c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + \kappa_r c_r\|\tilde{K}\|\|e\|
+ c_{rm}\|e\|\|\tilde{K}\|^2 + c_{rm}\kappa_x\|\tilde{K}\|\|e\| + c_{rm}^2\kappa_x\|\tilde{K}\| + c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + \kappa_r c_{rm} c_r\|\tilde{K}\|
+ \kappa_x\|\tilde{K}\|\|e\|^2 + c_{rm}\kappa_x\|\tilde{K}\|\|e\| + \kappa_x^2 c_{rm}\|e\| + \kappa_x c_r\|e\|\|\tilde{K}_r\| + \kappa_x\kappa_r c_r\|e\|
+ c_{rm}\kappa_x\|\tilde{K}\|\|e\| + c_{rm}^2\kappa_x\|\tilde{K}\| + \kappa_x^2 c_{rm}\|e\| + \kappa_x c_{rm} c_r\|\tilde{K}_r\| + \kappa_x\kappa_r c_r c_{rm}
+ c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + \kappa_x c_r\|e\|\|\tilde{K}_r\| + \kappa_x c_{rm} c_r\|\tilde{K}_r\| + \kappa_r c_r^2\|\tilde{K}_r\|
+ \kappa_r c_r\|\tilde{K}\|\|e\| + \kappa_r c_r c_{rm}\|\tilde{K}\| + \kappa_x\kappa_r c_r\|e\| + \kappa_x\kappa_r c_r c_{rm} + \kappa_r c_r^2\|\tilde{K}_r\|   (A.138)

and combining like terms,

uu^T \le \|\tilde{K}\|^2\|e\|^2 + 2\kappa_x\|\tilde{K}\|\|e\|^2 + 2c_{rm}\|e\|\|\tilde{K}\|^2 + 2c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + \kappa_x^2\|e\|^2 + c_{rm}^2\|\tilde{K}\|^2 + c_r^2\|\tilde{K}_r\|^2 + 4c_{rm}\kappa_x\|\tilde{K}\|\|e\| + 2\kappa_r c_r\|\tilde{K}\|\|e\| + 2\kappa_x c_r\|e\|\|\tilde{K}_r\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + 2\kappa_r c_{rm} c_r\|\tilde{K}\| + 2c_{rm}^2\kappa_x\|\tilde{K}\| + 2\kappa_x^2 c_{rm}\|e\| + 2\kappa_x\kappa_r c_r\|e\| + 2\kappa_x c_{rm} c_r\|\tilde{K}_r\| + 2\kappa_r c_r^2\|\tilde{K}_r\| + c_r^2\kappa_r^2 + c_{rm}^2\kappa_x^2 + 2c_r c_{rm}\kappa_x\kappa_r.   (A.139)

Then rearranging to place the positive definite terms first:

uu^T \le \|\tilde{K}\|^2\|e\|^2 + \kappa_x^2\|e\|^2 + c_{rm}^2\|\tilde{K}\|^2 + c_r^2\kappa_r^2 + c_{rm}^2\kappa_x^2 + 2c_r c_{rm}\kappa_x\kappa_r + 2\kappa_x\|\tilde{K}\|\|e\|^2 + 2c_{rm}\|e\|\|\tilde{K}\|^2 + 2c_r\|\tilde{K}\|\|e\|\|\tilde{K}_r\| + c_r^2\|\tilde{K}_r\|^2 + 4c_{rm}\kappa_x\|\tilde{K}\|\|e\| + 2\kappa_r c_r\|\tilde{K}\|\|e\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + 2\kappa_r c_{rm} c_r\|\tilde{K}\| + 2c_{rm}^2\kappa_x\|\tilde{K}\| + 2\kappa_x^2 c_{rm}\|e\| + 2\kappa_x\kappa_r c_r\|e\| + 2\kappa_x c_{rm} c_r\|\tilde{K}_r\| + 2\kappa_r c_r^2\|\tilde{K}_r\|.   (A.140)

Since uu^T enters (4.13) with a negative sign, the bound can be found by comparing coefficients.
So then,

-uu^T \le -\Big[\|\tilde{K}\|^2 - 2\kappa_x\|\tilde{K}\|\Big]\Big[\|e\|^2 - 2c_{rm}\|e\| + \frac{\kappa_x^2 c_r + \kappa_x\kappa_r c_{rm}}{c_{rm}} + \frac{3\kappa_r c_{rm} c_r}{2\kappa_x} + c_{rm}^2\Big] + \frac{\kappa_x\kappa_r c_r}{c_{rm}}\|e\|^2 + \frac{3\kappa_r c_r c_{rm}}{2\kappa_x}\|\tilde{K}\|^2 + 5c_{rm}^2 + 7c_r^2\kappa_r^2 + 12c_{rm} c_r \kappa_x\kappa_r - c_r\|\tilde{K}_r\|^2 + \frac{8c_{rm}\kappa_x + c_r c_{rm}\kappa_x + c_r c_{rm}\kappa_r}{c_{rm}}\|e\|\|\tilde{K}\| + 2c_r\|e\|\|\tilde{K}\|\|\tilde{K}_r\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\| + (\kappa_x c_r c_{rm} + 2\kappa_r c_r^2)\|\tilde{K}_r\|.   (A.141)

Continuing to complete the square where possible, steps (A.142) through (A.147) absorb the \|e\|\|\tilde{K}\| cross term into a square in \|e\| and then collect the \|\tilde{K}_r\| terms into a square, one completion at a time, giving

-uu^T \le -\Big[\|\tilde{K}\|^2 - 2\kappa_x\|\tilde{K}\|\Big]\Big[\|e\|^2 - 2c_{rm}\|e\| + \frac{\kappa_x^2 c_r + \kappa_x\kappa_r c_{rm}}{c_{rm}} + \frac{3\kappa_r c_{rm} c_r}{2\kappa_x} + c_{rm}^2\Big]
+ \frac{\kappa_x\kappa_r c_r}{c_{rm}}\Big[\|e\| + \frac{8c_{rm}\kappa_x + c_r c_{rm}\kappa_x + c_r c_{rm}\kappa_r}{2\kappa_x\kappa_r c_r}\|\tilde{K}\|\Big]^2
- \Big[\frac{(8c_{rm}\kappa_x + c_r c_{rm}\kappa_x + c_r c_{rm}\kappa_r)^2}{4c_{rm}\kappa_x\kappa_r c_r} - \frac{3\kappa_r c_r c_{rm}}{2\kappa_x}\Big]\|\tilde{K}\|^2
- c_r\Big[\|\tilde{K}_r\| - \frac{\kappa_x c_{rm} + 2\kappa_r c_r}{2}\Big]^2 + \frac{c_r}{4}(\kappa_x c_{rm} + 2\kappa_r c_r)^2
+ 5c_{rm}^2 + 7c_r^2\kappa_r^2 + 12c_{rm} c_r \kappa_x\kappa_r + 2c_r\|e\|\|\tilde{K}\|\|\tilde{K}_r\| + 2c_r c_{rm}\|\tilde{K}\|\|\tilde{K}_r\|.   (A.148)

A.6.1 Expanding the Input with W

Instead of K and K_r, use W:

-uu^T = -W^T\sigma(W^T\sigma)^T = -W^T\sigma\sigma^T W   (A.149)

= -(\tilde{W} + W^*)^T\sigma\sigma^T(\tilde{W} + W^*)   (A.150)

Let c_K \ge \|W^*\|. Then

-uu^T \le -(\tilde{W}^T + c_K)\sigma\sigma^T(\tilde{W} + c_K)   (A.151)

\le -(\tilde{W}^T + c_K)(\tilde{W} + c_K)\sigma^T\sigma   (A.152)

\le -\big[\tilde{W}^T\tilde{W} + \tilde{W}c_K + c_K\tilde{W} + c_K^2\big]\sigma^T\sigma   (A.153)–(A.154)

Apply the norm, and let c_R \ge \|[\,c_{rm} \; c_r\,]\|.
-uu^T \le -\|\tilde{W}\|^2\|\sigma\|^2 - 2c_K\|\tilde{W}\|\|\sigma\|^2 - c_K^2\|\sigma\|^2   (A.155)

Since \sigma is the combined x and r vector,

\|e\| \ge \|\sigma\| - c_R   (A.156)

\|e\| + c_R \ge \|\sigma\|   (A.157)

\|\sigma\|^2 \le (c_R + \|e\|)(c_R + \|e\|)   (A.158)

\le \|e\|^2 + 2c_R\|e\| + c_R^2   (A.159)

Going back to (A.155),

-uu^T \le -\|\tilde{W}\|^2(\|e\|^2 + 2c_R\|e\| + c_R^2) - 2c_K\|\tilde{W}\|(\|e\|^2 + 2c_R\|e\| + c_R^2) - c_K^2(\|e\|^2 + 2c_R\|e\| + c_R^2)   (A.160)

\le -\|\tilde{W}\|^2\|e\|^2 - 2c_K\|\tilde{W}\|\|e\|^2 - c_K^2\|e\|^2 - 2c_R\|e\|\|\tilde{W}\|^2 - c_R^2\|\tilde{W}\|^2 - 4c_R c_K\|\tilde{W}\|\|e\| - 2c_K c_R^2\|\tilde{W}\| - 2c_R c_K^2\|e\| - c_K^2 c_R^2   (A.161)–(A.162)

Now pull out the negative sign and look at the function:

\|\tilde{W}\|^2\|e\|^2 + 2c_K\|\tilde{W}\|\|e\|^2 + c_K^2\|e\|^2 + 2c_R\|e\|\|\tilde{W}\|^2 + c_R^2\|\tilde{W}\|^2 + 4c_R c_K\|\tilde{W}\|\|e\| + 2c_K c_R^2\|\tilde{W}\| + 2c_R c_K^2\|e\| + c_K^2 c_R^2   (A.163)

= (\|e\|^2 + 2c_R\|e\| + c_R^2)(\|\tilde{W}\|^2 + 2c_K\|\tilde{W}\| + c_K^2)   (A.164)

so that

-uu^T \le -(\|e\| + c_R)^2(\|\tilde{W}\| + c_K)^2   (A.165)

APPENDIX B

Concurrent Learning Update

The purpose of this appendix is to assist the reader in understanding the Model Reference Adaptive Control (MRAC) update law and how it is affected by adding concurrent learning (CL-MRAC).

B.1 The MRAC Update Law

In MRAC with no concurrent learning, learning stops as soon as the error between the reference model and the actual system decreases to very small values. The adaptive law is

\dot{\hat{W}} = -\Gamma x e^T P B   (B.1)

where \hat{W} is the adapting variable, \Gamma > 0 is the adaptive learning rate, x is the state vector, e is the tracking error vector, P is the solution to the Lyapunov equation A^T P + P A = -Q [16], and B is the input allocation matrix.

B.1.1 CL-MRAC Update Law

Concurrent Learning (CL) may be used to alleviate this condition, as discussed in [4]. CL acts like extra excitation when the input is not exciting as defined by Tao in [30]. The MRAC-only adaptive update law, (B.1), is only effective while x and e are large: if e goes to zero, the adaptation stops. The CL-MRAC adaptation law is

\dot{\hat{W}} = -\Gamma \Big[ x e^T P B + \sum_{i=1}^{n} x_i \hat{\epsilon}_i^T \Big]   (B.2)

where the new terms are x_i, previously recorded states; \hat{\epsilon}_i, previously recorded adaptation errors; and n, the upper limit on the number of stored or budgeted points. These are stored in a history stack which is governed by the algorithm outlined in [10]. Note that the summation term in (B.2) does not depend on the current state or the current state tracking error. The points in the history stack were recorded when the input was exciting, so the summation adds excitation when the first term of (B.2) is small. Therefore, the adaptation continues even while the state tracking error is small, and the ideal values will be attained.

The addition of concurrent learning is like a stochastic gradient ascent for neural networks [13]. The stored data points are like the test points in a neural network: execute the gradient search at each one and add the results together. The total result will generally be more accurate in pointing to the goal than any of the single points taken alone.
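The update laws above translate directly into a few lines of code. The sketch below (Python with NumPy; the function name, argument layout, and the assumption that the history stack holds pairs (x_i, \hat{\epsilon}_i) are illustrative choices, not the implementation used in this thesis) evaluates (B.2):

    import numpy as np

    def cl_mrac_update(x, e, P, B, Gamma, stack):
        # Instantaneous MRAC term (B.1): -Gamma x e^T P B
        dW = -Gamma @ np.outer(x, e @ P @ B)
        # Concurrent-learning summation in (B.2): one rank-one term
        # per recorded (x_i, eps_i) pair in the history stack.
        for x_i, eps_i in stack:
            dW -= Gamma @ np.outer(x_i, eps_i)
        return dW

With an empty stack the function reduces to (B.1); once the stack holds data recorded under exciting input, the summation keeps the update alive even when e is nearly zero.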
B.2 MRAC Example

To help illustrate this concept, a simple MRAC control problem is presented next. Two linear systems are shown in Figure B.1, where a.) is the step response of the plant, or actual system, and b.) is the step response of the desired model. Note that Figure B.1 a.) overshoots the amplitude of 1, the desired value, and then oscillates for 13 seconds before settling to within 2% of the desired value, which is the definition of settling time. Figure B.1 b.) has no overshoot and settles in 3 seconds.

Figure B.1: Step Response of Two Linear Systems. Note the differences in magnitude and settling time between the plant, a.), and the reference model, b.). The goal is to make a system that naturally acts like a.) instead take on the response of the system shown in b.).

B.2.1 Model Reference Adaptive Control Only

The state tracking is shown in Figure B.2, and the effect of the control law is evident in the first three seconds: the error is initially large, but as adaptation proceeds the error decreases and the reference input becomes dominant. The tracking error is shown in Figure B.4, and the time history of the adaptive gains is plotted in Figure B.3. The lack of adaptation can be seen by comparing when the gains in Figure B.3 stop adapting (flatten out) with when the state tracking errors, especially the error for the second state, become very small in Figure B.4. Since the gains have not attained their ideal values, when the system encounters a new region of the state-space the actual system will not respond as the reference model does. In the past, this problem was overcome by requiring the system to have persistently exciting input [2], but that is not easily attained in an aerospace application. Step inputs are much more tolerable.

Figure B.2: Model Reference Adaptive Control update law. Note the difference in the first couple of seconds while the input is jumping around. Then the input settles down and the tracking error is quite small by 8 seconds. The bottom plot shows the commanded input and the total input including the adaptive gains.

Figure B.3: Model Reference Adaptive Control update law gains Kx and Kr. Note that the actual gains never attain their ideal values, indicated by the green lines.

Figure B.4: Model Reference Adaptive Control time history of tracking error. Note that the error for both states is quite small by about 8 seconds, which is when the gains stopped adapting. Also, the original response of the plant can be seen in the first 3 seconds prior to being damped out by the reference model.
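To reproduce the flavor of Figures B.2 through B.4, the following self-contained sketch simulates MRAC-only adaptation on a second-order system. The plant, reference model, and learning rates are illustrative stand-ins, not the values used to generate the figures; the point is that the gain updates are proportional to the tracking error and stall once it becomes small:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A   = np.array([[0.0, 1.0], [-1.0, -0.2]])   # lightly damped plant (illustrative)
    B   = np.array([[0.0], [1.0]])
    Arm = np.array([[0.0, 1.0], [-4.0, -4.0]])   # well-damped reference model
    Brm = np.array([[0.0], [4.0]])
    P   = solve_continuous_lyapunov(Arm.T, -np.eye(2))  # Arm^T P + P Arm = -Q

    dt, gx, gr = 1e-3, 10.0, 10.0
    x  = np.zeros((2, 1)); xrm = np.zeros((2, 1))
    K  = np.zeros((2, 1)); Kr  = np.zeros((1, 1))
    for k in range(int(20.0 / dt)):
        r = np.array([[np.sin(0.5 * k * dt)]])  # smooth command, as in Figure B.2
        u = K.T @ x + Kr.T @ r
        e = x - xrm
        # MRAC-only laws: proportional to e, so adaptation stops as e -> 0
        K  += dt * (-gx * x @ (e.T @ P @ B))
        Kr += dt * (-gr * r @ (e.T @ P @ B))
        x   += dt * (A @ x + B @ u)
        xrm += dt * (Arm @ xrm + Brm @ r)

    # Matching conditions B K*^T = Arm - A and B Kr* = Brm give
    # K* = [-3, -3.8]^T and Kr* = 4 for these matrices.
    print(K.ravel(), Kr.ravel())

The printed gains typically land near, but not on, the ideal values, mirroring the flattened-out gain traces in Figure B.3.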
B.2.2 Model Reference Adaptive Control with Concurrent Learning

This concept is illustrated in Figures B.5 through B.9. A linear system that oscillates is selected for this example, while the model settles to the demanded signal within 3 seconds with no overshoot. The state tracking and the input to the system are shown in Figure B.5, where the green dashed line indicates the model. The tracking is quite good even at the start. The demanded signal is a series of steps, both positive and negative, which is more like what aerospace applications use than the sinusoid of Figures B.2 through B.4. Figure B.6 shows the error between the system state and the model state. Note that the error has jumps that coincide with the steps in the demanded signal shown in Figure B.5. Those are not apparent in Figure B.4 because the input there is a smooth function.

Figure B.5: Concurrent Learning MRAC State Tracking. The system states are solid lines while the model states are dashed. The bottom plot shows the input to the system as solid and the commanded signal as dashed. Note the good state tracking in the top and middle plots. A stepped input was selected over a sinusoidal one.

Figure B.6: Concurrent Learning MRAC Tracking Errors. The system state tracking errors compared to the reference model. Note that the jumps in the error coincide with steps in the command signal from Figure B.5.

Figure B.7: Concurrent Learning MRAC Adaptive Weights. Note the gains, solid, attain their ideal values, dashed, within 6 seconds.

The time history of the adaptive weights for the concurrent learning system is shown in Figure B.7. Note that the ideal values are attained quickly and are continuously reattained whenever the gains are driven off the ideal values by large errors due to steps in the demand signal. Figure B.3 displays nothing like this behavior: the adaptation just stops.

B.2.3 Concurrent Learning in the Weight Space

The weight space of a system is the space where the weights or gains live. It can be R^n or of higher dimension than the state space; in the case of neural networks, the weight space can be several times larger than the state space in dimension. It is separate and apart from the state space. The weight space of the concurrent learning MRAC is displayed in Figure B.8, where the ideal values are at the juncture of the orange lines and the randomly selected starting values are at the bottom center of the figure. The figure shows the way that MRAC and concurrent learning work together to guide the gains to their ideal values based on the data each has. It becomes apparent that the MRAC adaptive law and the concurrent learning law are at odds with each other. This means the actual adaptation is smaller and smoother than either would produce by itself.

Figure B.8: Concurrent Learning MRAC Adaptive Weight Space. The actual weights are difficult to see among the arrows. The blue arrows show the direction of the summation CL vector while the green arrows show the direction of adaptation if only the MRAC adaptive law were used. The orange lines cross at the ideal values for the gains.
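The arrow decomposition plotted in Figure B.8 and in Figure B.9 below can be computed directly from the pieces of (B.2). A sketch (a hypothetical helper with the same assumed stack layout as before; it expects a non-empty stack):

    import numpy as np

    def adaptation_directions(x, e, P, B, Gamma, stack):
        mrac = -Gamma @ np.outer(x, e @ P @ B)         # green arrow: MRAC law alone
        cl_terms = [-Gamma @ np.outer(x_i, eps_i)      # red arrows: one per stored point
                    for x_i, eps_i in stack]
        cl_sum = sum(cl_terms)                         # blue arrow: summation of the red ones
        return mrac, cl_terms, cl_sum, mrac + cl_sum   # black arrow: combined CL-MRAC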
Figure B.9: Concurrent Learning MRAC Weight Stochastic Gradient. In each panel, the red arrows are the directions of adaptation from each element of the history stack. The blue line is the summation of the red arrows. The green arrow is the MRAC adaptation-law direction, and the black arrow is the combined concurrent learning and MRAC adaptation direction. Plot a.) shows this for time instant 7.64 seconds, while plot b.) shows it for time instant 8.64 seconds. Note that in a.) the combined vector is close to, but not directly pointing at, the ideal values indicated by the X. In b.) the combined vector points directly at the ideal values.

Now looking at two instances of the adaptation split into the MRAC adaptive law and concurrent learning law directions, the plots in Figure B.9 expound on the similarity of concurrent learning to stochastic gradient optimization in neural networks. The red vectors in both panels are the directions attained by utilizing the elements of the history stack to obtain a bearing on the ideal values for the system. The blue arrow is the summation of the red ones, and the green arrow is the MRAC adaptation vector as in Figure B.8. The black arrow is the combined concurrent learning MRAC adaptation vector. In Figure B.9 a.), the MRAC adaptation vector points in the opposite direction from the ideal weights, indicated by the X. One second later, Figure B.9 b.) shows the combined concurrent learning MRAC vector passing through the ideal values. Also note in Figure B.9 b.) that the MRAC adaptation vector is difficult to see, being so small. It would have produced minuscule adaptation under MRAC alone. With concurrent learning, the adaptation is driven even while the state tracking error is small.

Without concurrent learning, MRAC controllers are only high-gain controllers and will not attain their ideal values unless persistently excited. The ideal values of the adaptive weights are important because, once they are attained, the system will act like the model for any input at any point in the state-space. If the ideal values have not been attained when a new region of the state-space is encountered, the adaptive law, (B.1), will direct the change in the gains based on the error, which will most likely be large. In an attempt to reduce the error, the MRAC adaptation will change the gains, but this change is not really guided by anything; cf. Figure B.8, where the green vectors point away from the ideal value point.

APPENDIX C

Acronyms

Acronym      Expanded Version
CL           Concurrent Learning
CL-HSMEM     Concurrent Learning History Stack Minimum Eigenvalue Maximization routine
f-GTM        flexible Generic Transport Model
LQR          Linear Quadratic Regulator
MO-Op        Multi-Objective Optimization
MPC          Model Predictive Control
MRAC         Model Reference Adaptive Control
ODE          Ordinary Differential Equation
PE           Persistence of Excitation

APPENDIX D

Additional Plots

This appendix contains plots in addition to those from Chapter 6.

Figure D.1: System Tracking Errors Shown Using Algorithms 1 (solid), 2 (dashed), and 3 (dash-dot). Note that the error reduces quickly once the stacks are shocked.

Figure D.2: Dissolved Input Allocation Matrix Weight Space. The red stars '*' are the initial weights while the orange diamonds show the ideal values.
Figure D.3: Dissolved Input Allocation Matrix Time History of \hat{\Lambda}.

VITA

Benjamin David Reish

Candidate for the Degree of Master of Science

Thesis: CONCURRENT LEARNING IN THE PRESENCE OF UNCERTAIN INPUT ALLOCATION

Major Field: Mechanical and Aerospace Engineering

Biographical:

Personal Data: Lives in Edmond, OK.

Education: Completed the requirements for the Master of Science degree with a major in Mechanical and Aerospace Engineering at Oklahoma State University in May, 2015. Holds a Bachelor of Science in Mechanical Engineering from Oklahoma Christian University, 2004.

Experience: Cognizant Engineer, F101/F118 Engines Section, Tinker AFB, OK: jet engine performance and engine accessories maintenance, 2005 to 2012. Teacher's Assistant, Mechanical Engineering Department, Oklahoma Christian University, Edmond, OK: instrumentation lab, spring 2011. Teacher's Assistant, Mechanical and Aerospace Engineering Department, Oklahoma State University, Stillwater, OK: dynamics, spring 2014. Research Assistant, DASLab, Oklahoma State University, Stillwater, OK: spring 2013 to present.

Professional Memberships: IEEE and DASLab