2010 American Control Conference, Marriott Waterfront, Baltimore, MD, USA, June 30-July 02, 2010. WeB14.1

Stochastic Iterative Learning Control Design for Nonrepetitive Events

Sandipan Mishra and Andrew Alleyne

Abstract— This paper proposes a lifted-domain ILC design technique for repetitive processes with significant nonrepetitive disturbances. The learning law is based on the minimization of the expected value of a cost function (the error norm) at each iteration. The derived learning law is iteration-varying and depends on the ratio of the covariance of the nonrepetitive component of the error to the covariance of the residual total error. This implies that in early iterations the learning is rapid (large learning gains), and as iterations go by the algorithm becomes conservative and learns slowly. The proposed algorithm is also extended to the case where the learning filter is fixed and the optimal (iteration-varying) learning rate needs to be determined. Finally, the performance of the proposed method is evaluated vis-a-vis a geometrically decaying learning algorithm and an optimal fixed-rate learning algorithm through simulation of a micro-robotic deposition system.

I. INTRODUCTION

Iterative learning control (ILC) is a feedforward control design technique for repetitive processes. ILC algorithms use information from earlier trials of the repetitive process to improve performance in the current trial [1]. ILC has been employed in a wide variety of applications, including industrial robotics, computer-numerical control tools, injection molding systems, rapid thermal processing, microscale robotic deposition, and rehabilitation robotics. Detailed surveys of ILC algorithms and their applications can be found in [2], [3]. The most attractive features of ILC are its simple implementation and its robustness to model uncertainty.
ILC algorithms are designed under the assumption that the disturbances are repetitive, i.e., that they remain the same from one run of the process to another. This assumption, however, is not valid for most processes, where nonrepetitive disturbances or events often occur. Therefore, it is important to account for the effect of noise and nonrepetitive events on ILC performance. Recently, significant interest has developed in the ILC community in understanding and analyzing the effects of nonrepetitive disturbances in ILC. Saab [4] developed a stochastic ILC framework for designing optimal P-type learning. Moore [5] proposed a higher-order ILC algorithm to suppress the effect of nonrepetitive but structured disturbances. The algorithms in [6], [7] use a forgetting factor to avoid the accumulation of nonrepetitive disturbances. In [8], Helfrich et al. designed optimal feedback and ILC algorithms taking into account the frequency-domain characteristics of the nonrepetitive and repetitive disturbances. Bristow [9] analyzed the effect of the learning rate on the amplification of nonrepeating components of the error and proposed optimal learning algorithms for a fixed convergence rate. Longchamp [10] statistically evaluated and compared three alternative ILC algorithms for robustness to nonrepetitive disturbances and showed that the heuristic geometrically decaying learning algorithm proposed in [11] provides good robustness. However, there has so far been no systematic determination of an optimal (decreasing) learning rate taking into account the frequency spectra of the nonrepetitive and repetitive components of the error in ILC. In this paper, we present stochastic machinery in the lifted domain to understand the effect of nonrepetitive events, and we design algorithms that take the nature of these nonrepetitive events into account in order to improve performance for linear ILC systems.
We propose a stochastic norm-optimal learning algorithm that accounts for the repetitive and nonrepetitive components of the disturbance by minimizing the expected value of the 2-norm of the error at each iteration. This norm-optimal learning algorithm is shown to be iteration-varying, with faster learning rates in initial iterations and slower learning rates as iterations go by. The rate of learning is effectively governed by the ratio of the nonrepetitive error to the total error at each iteration. This is intuitive, since we wish to learn aggressively at first, when large repetitive disturbances exist, and then slow down learning in order to avoid accumulation of nonrepeating disturbances. Section II describes the general problem framework for a repetitive process with repetitive and nonrepetitive disturbances.

II. ILC PROBLEM SETUP FOR A FEEDBACK-CONTROLLED REPETITIVE PROCESS

Let us consider a discrete-time linear time-invariant (LTI) plant, denoted by P. The plant is controlled via an LTI feedback controller C. This closed-loop system is stable and executes a repetitive process with a period of N samples. We want the output of the system to track a trajectory y_d(j), where j ranges from 0 to N − 1. This is repeated several times, with the system coming back to the rest condition at the end of each iteration of the cycle and starting at the rest condition at the beginning of each iteration. This system is illustrated in Figure 1. From this, we can derive the following input-output relationships.

S. Mishra is a postdoctoral research associate in the Department of Mechanical Science and Engineering, University of Illinois, Urbana-Champaign, IL 61802. [email protected]
A. Alleyne is a faculty member with the Department of Mechanical Science and Engineering, University of Illinois, Urbana-Champaign, IL 61802.
[email protected]

y_k(j) = (I + P(z)C(z))^{-1} P(z)C(z) y_d(j) + (I + P(z)C(z))^{-1} P(z) (u_{f,k}(j) + d(j) + n_k(j))   (1)
e_k(j) = y_d(j) − y_{m,k}(j)   (2)

Fig. 1. Block diagram of the closed-loop system for the kth trial (feedback controller C(z), plant P(z), feedforward input u_{f,k}, input disturbances d and n_k, and measurement noise ξ_k).

We make the assumption that the overall closed-loop system is of relative degree 1. This assumption may be relaxed by shifting the input signal forward in time by r steps for systems with relative degree r. Consider the vectors obtained by stacking the signals y_d(·), y_k(·), u_{f,k}(·), etc. for each cycle. These are denoted, for example, as

y_d = [y_d(0), y_d(1), y_d(2), ..., y_d(N − 1)]^T
y_k = [y_k(0), y_k(1), y_k(2), ..., y_k(N − 1)]^T
u_{f,k} = [u_{f,k}(0), u_{f,k}(1), u_{f,k}(2), ..., u_{f,k}(N − 1)]^T

Assuming that the system starts at the initial rest condition (y_k(j) ≡ 0 ∀ j ≤ 0, ∀ k), Eq. (1) for each step j ∈ {0, 1, 2, ..., N − 1} can be unified into

y_k = G_{yd} y_d + G_{uf} u_{f,k} + G_d (d + n_k) + G_ξ ξ_k
e_k = y_d − y_k   (3)

where G_{yd}, G_{uf}, G_d, G_ξ ∈ ℜ^{N×N}. We will refer to the lifting operation on a transfer function H(jω), giving the corresponding matrix H, as L(H(jω)) = H. A linear learning law u_{f,k+1}(·) = Q(z) u_{f,k}(·) + L(z) e_k(·) can be written in lifted form as

u_{f,k+1} = Q u_{f,k} + L e_k   (4)

where Q : ℜ^N → ℜ^N and L : ℜ^N → ℜ^N. We club all the repetitive (iteration-invariant) terms together into w = (I − G_{yd}) y_d − G_d d, and all the nonrepetitive terms together into v_k = G_d n_k. We also drop the subscript f on the control effort and the matrices henceforth, and assume that the matrix G_u ≡ G is full column rank. While this is not always true, by shifting the output signal in time until the relative degree is 0, we can make G full column rank.

III. ILC AND NONREPETITIVE SIGNALS

In this section, we derive optimal learning matrices for the ILC algorithm based on the minimization of the cost function given by the expected value of the error 2-norm,

J = E[e_{k+1}^T e_{k+1}]   (5)

Our goal is to determine the learning matrix L_k so that this cost is minimized at each iteration. The plant output equations in lifted form for the kth trial of the process are given by

y_k = G_{yd} y_d + G_{uf} u_{f,k} + G_d (d + n_k)   (6)
e_k = y_d − y_k   (7)
u_{f,k+1} = u_{f,k} + L_k e_k   (8)

Consider the learning law u_{f,k+1} = u_{f,k} + L e_k. Assuming zero measurement noise (ξ(·) = 0), we get the following equations for error and control effort evolution:

e_k = w + v_k − G u_k   (9)
u_{k+1} = (I − L_k G) u_k + L_k w + L_k v_k   (10)
e_{k+1} = w + v_{k+1} − G u_{k+1}   (11)

We assume that the nonrepetitive component v_k is a zero-mean Gaussian random variable in ℜ^N with covariance V. The frequency spectrum of the nonrepetitive component of the error is captured in the structure of V. We also introduce the following definitions of input and error covariance matrices:

X_{uu,k} = E[(u_k − E[u_k])(u_k − E[u_k])^T]   (12)
X_{ee,k} = E[(e_k − E[e_k])(e_k − E[e_k])^T]   (13)

Note that

J = E[e_{k+1}^T e_{k+1}] = trace(E[e_{k+1} e_{k+1}^T]) = trace(X_{ee,k+1} + E[e_{k+1}] E[e_{k+1}]^T)   (14)

From the error (Eq. (9)) and control (Eq. (10)) update equations, we get

X_{ee,k+1} = G X_{uu,k+1} G^T + V   (15)
X_{uu,k+1} = (I − L_k G) X_{uu,k} (I − L_k G)^T + L_k V L_k^T   (16)
X_{ee,k+1} = (I − G L_k) G X_{uu,k} G^T (I − G L_k)^T + G L_k V L_k^T G^T + V   (17)

Also,

E[e_{k+1}] E[e_{k+1}]^T = (I − G L_k) E[e_k] E[e_k]^T (I − G L_k)^T   (18)

Plugging the above into Eq. (14), we get

E[e_{k+1}^T e_{k+1}] = trace{(I − G L_k)(E[e_k] E[e_k]^T + G X_{uu,k} G^T)(I − G L_k)^T + G L_k V L_k^T G^T + V}
= trace{(I − G L_k)(E[e_k e_k^T] − V)(I − G L_k)^T + G L_k V L_k^T G^T + V}
= trace{(I − G L_k) E[e_k e_k^T] (I − G L_k)^T + G L_k V + V L_k^T G^T}

We denote E[e_k e_k^T] = Φ_k, and note that E[e_k^T e_k] = trace(Φ_k). Therefore, we get

Φ_{k+1} = (I − G L_k) Φ_k (I − G L_k)^T + G L_k V + V L_k^T G^T
= Φ_k + G L_k (V − Φ_k) + (V − Φ_k) L_k^T G^T + G L_k Φ_k L_k^T G^T   (19)

Φ_k is already determined from the kth run of the process, so we must find the optimal learning matrix L_k based on Φ_k and V. Further, Φ_{k+1} is an indicator of the expected size of the repeating disturbance, in stochastic terms, for the (k + 1)th cycle. The update equation for Φ is

Φ_{k+1} = (I − G L_k) Φ_k (I − G L_k)^T + G L_k V + V L_k^T G^T   (20)

In order to find the optimal matrix L_k, we set the gradient of the cost function (trace(Φ_{k+1})) to zero:

∇_{L_k} trace{(I − G L_k) Φ_k (I − G L_k)^T + G L_k V + V L_k^T G^T} = 0   (21)
⇒ L_k^opt = (G^T G)^{-1} G^T (I − V Φ_k^{-1})   (22)

From Eq. (22) we can see that if V ≈ Φ_k, then we should have very slow learning (L_k^opt ≈ 0), while if V << Φ_k, then L_k^opt ≈ (G^T G)^{-1} G^T, which means one-step inversion learning. This makes intuitive sense: if the nonrepetitive components are very large (V ≈ Φ_k), then learning will only degrade performance, so we should NOT learn (L_k^opt ≈ 0). On the other hand, if there are no nonrepetitive disturbances, the best learning algorithm is the one-step inversion learning algorithm (L_k^opt = (G^T G)^{-1} G^T). For notational clarity, we will drop the superscript opt on the optimal learning gain for the remainder of the paper.

Properties of Optimal Learning

1) Φ_0 may be initialized to the size of the error in the first iteration (with no learning, L ≡ 0). Further, if the repetitive and nonrepetitive parts of the error are independent, Φ_0 = V + W, where W is the covariance of the repetitive error.
2) trace(Φ_{k+1}) ≤ trace(Φ_k), i.e., J_{k+1} ≤ J_k. This property is very desirable since it guarantees monotonicity of convergence of the expected value of the Euclidean norm of the error. Further, this condition is sufficient to claim stability of the proposed learning law. See the Appendix for a proof.
3) The contraction rate ρ_k = σ̄(I − G L_k) is smaller for earlier iterations. In other words, ρ_k ≤ ρ_{k+1} < 1. Also, ρ_k → 1 as k → ∞.

In other words, we start with aggressive learning, and as iterations go by the learning decreases. In the limit, the convergence rate approaches 1 (i.e., no learning). This is a desirable property that has been used heuristically in many ILC algorithms; see for example [11].

IV. FIXED STRUCTURE LEARNING RATE

In many applications, the learning functions (lifted matrices) are already determined from the available mathematical model of the system, but the rate of learning needs to be decided. In other words, the structure of the matrix L is fixed but has a variable learning rate, i.e., L_k = β_k L_0 with β_k ∈ ℜ^+. We propose a method to determine the optimal iteration-varying gain to minimize the expected value of the 2-norm of the error.

Case 1: L_0 = G^{-1}. In this case,

Φ_{k+1} = (I − G L_k) Φ_k (I − G L_k)^T + G L_k V + V L_k^T G^T
= (I − β_k I) Φ_k (I − β_k I)^T + β_k V + β_k V
⇒ trace(Φ_{k+1}) = (1 − β_k)^2 trace(Φ_k) + 2 β_k trace(V)   (23)

In order to get the optimal β_k, we set

d/dβ_k [(1 − β_k)^2 trace(Φ_k) + 2 β_k trace(V)] = 0   (24)

and solve for β_k:

⇒ β_k trace(Φ_k) = trace(Φ_k) − trace(V)   (25)
⇒ β_k = 1 − trace(V)/trace(Φ_k) = 1 − trace(V)/E[e_k^T e_k]   (26)

Note that β_k ∈ [0, 1] ∀ k; therefore, the ILC scheme is monotonically convergent in error.

Case 2: L_0 is a general learning matrix satisfying σ̄(I − G L_0) < 1. As before, we obtain the optimal β_k at each step by minimizing trace(Φ_{k+1}), i.e., by setting d/dβ_k trace(Φ_{k+1}) = 0 and solving for β_k:

d/dβ_k trace{Φ_k + β_k G L_0 (V − Φ_k) + β_k (V − Φ_k) L_0^T G^T + β_k^2 G L_0 Φ_k L_0^T G^T} = 0
⇒ β_k = trace(G L_0 (Φ_k − V)) / trace(G L_0 Φ_k L_0^T G^T)

V. NORM OPTIMAL STOCHASTIC ITERATIVE LEARNING CONTROL

The earlier derivations assume that G is full column rank. We now consider the norm-optimal cost function

J = E[e_{k+1}^T e_{k+1} + u_{k+1}^T S^T S u_{k+1}]   (27)

which does not require G to be full rank if S^T S is chosen to be > 0. The penalty on the control effort also improves the robustness of the learning algorithm [12]. Consider the augmented system

[e_{k+1}; S u_{k+1}] = [w; 0] + [v_{k+1}; 0] − [G; −S] u_{k+1}   (28)
ē_{k+1} = w̄ + v̄_{k+1} − Ḡ u_{k+1}   (29)

The cost function therefore becomes J = E[ē_{k+1}^T ē_{k+1}] = trace(E[ē_{k+1} ē_{k+1}^T]) = trace(Φ̄_{k+1}).
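The machinery of Sections III and IV can be illustrated numerically before completing this derivation. The sketch below is illustrative only: it uses a toy lower-triangular lifted plant and assumed covariance values (not the µ-RD model of Section VI) to exercise the optimal gain of Eq. (22), the covariance propagation of Eq. (20), and the Case 1 rate of Eq. (26).

```python
import numpy as np

N = 20  # trial length in samples (illustrative)

# Toy lifted plant: lower-triangular Toeplitz matrix of assumed impulse-
# response coefficients (full column rank, consistent with Section II).
h = 0.8 ** np.arange(N)
G = np.array([[h[i - j] if i >= j else 0.0 for j in range(N)]
              for i in range(N)])

V = 0.01 * np.eye(N)   # assumed nonrepetitive error covariance
W = 1.00 * np.eye(N)   # assumed repetitive error covariance
Phi = W + V            # Phi_0 = W + V (Property 1)

pinvG = np.linalg.solve(G.T @ G, G.T)   # (G^T G)^{-1} G^T

traces = [np.trace(Phi)]
for k in range(30):
    # Optimal iteration-varying gain, Eq. (22)
    Lk = pinvG @ (np.eye(N) - V @ np.linalg.inv(Phi))
    # Covariance propagation, Eq. (20)
    A = np.eye(N) - G @ Lk
    Phi = A @ Phi @ A.T + G @ Lk @ V + V @ Lk.T @ G.T
    traces.append(np.trace(Phi))

# Property 2: the expected squared error norm never increases
assert all(t2 <= t1 + 1e-9 for t1, t2 in zip(traces, traces[1:]))

# Case 1 scalar rate, Eq. (26): beta_0 = 1 - trace(V)/trace(Phi_0)
beta0 = 1.0 - np.trace(V) / traces[0]
print(round(beta0, 4))  # -> 0.9901
```

In this sketch trace(Φ_k) settles toward trace(V), matching the intuition that learning should effectively stop once only the nonrepetitive component of the error remains.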
Further, we have

V̄ = [V 0; 0 0],   Φ̄_k = [Φ_{ee,k} Φ_{eu,k}; Φ_{ue,k} Φ_{uu,k}]

and the learning update

u_{k+1} = u_k + L̄ [e_k; S u_k]

Using the same development as in Section III, we get the optimal learning gain

L̄_k = (Ḡ^T Ḡ)^{-1} Ḡ^T (I − V̄ Φ̄_k^{-1})   (30)

Plugging Eq. (30) into the learning update law and consolidating the error and control effort terms, we obtain the learning update law

u_{k+1} = (G^T G + S^T S)^{-1} {[G^T G + G^T V Δ_k^{-1} Φ_{eu,k} Φ_{uu,k}^{-1} S] u_k + G^T [I − V Δ_k^{-1}] e_k}   (31)

where Δ_k = Φ_{ee,k} − Φ_{eu,k} Φ_{uu,k}^{-1} Φ_{ue,k} is the Schur complement of Φ_{uu,k} in Φ̄_k. It is interesting to note that we now have a Q-filter effect because of the penalty on the control effort. Further, the convergence rate is

ρ_k = σ̄((G^T G + S^T S)^{-1} Ḡ^T V̄ Φ̄_k^{-1} Ḡ) < 1   (32)

VI. SIMULATION RESULTS

We now demonstrate the performance of the proposed method on a simulated model of a Micro Robotic Deposition (µ-RD) system. A detailed description of the modeling and control of this system may be found in [13]. The closed-loop system model for the y-axis of this system is given by

P(z) = 0.0459 (z + 0.9963)(z^2 − 1.768z + 0.9567)(z^2 − 0.2238z + 0.7933) / [(z − 1)(z − 0.9772)(z^2 − 1.764z + 0.9562)(z^2 − 0.1784z + 0.7898)]

The µ-RD system is run (experimentally) for M = 50 iterations with u_{f,0} ≡ 0. The reference trajectory is a constant-velocity scan as shown in Figure 2.

Fig. 2. Plot of reference trajectory for the µ-RD system.

The covariances of the nonrepetitive (V) and repetitive (W) components of the error (e_j^0) are determined by

E[e^0] = (1/M) Σ_{j=1}^{M} e_j^0   (33)
P_{ww}(jω) = |W(jω)|^2 = |F(E[e^0])|^2   (34)
W = L(P_{ww}(jω))
P_{vv}(jω) = |V(jω)|^2 = (1/M) Σ_{j=1}^{M} |F(e_j^0 − E[e^0])|^2
V = L(P_{vv}(jω))
Φ_0 = W + V

where F is the discrete-time Fourier transform operator and L is the lifting operator introduced in Section II. Figure 3 shows the experimentally obtained V(jω) and W(jω). We fit transfer functions through these, given by

V'(jω) = 7.926 × 10^{-5} / (e^{jω} − 0.9921)
W'(jω) = (2.864 × 10^{-4} e^{jω} + 2.832 × 10^{-4}) / (e^{2jω} − 1.973 e^{jω} + 0.9732)

Fig. 3. Frequency spectra of repetitive (W(jω)) and nonrepetitive (V(jω)) components of the tracking error.

A. Fixed-Structure Learning Rate

In this section, we consider two algorithms with decaying learning rates. The first algorithm, discussed in [10], is the geometrically decaying algorithm

u_{k+1} = u_k + (1/(k + 1)) L e_k   (35)

where L = (G^T G)^{-1} G^T. The second algorithm uses the same learning structure L, but uses the method proposed in Section IV to determine the learning rate:

u_{k+1} = u_k + (1 − trace(V)/(e_k^T e_k)) L e_k   (36)

Figure 4 shows the rate of learning against iteration number. Note that both algorithms decay to zero learning as k → ∞; therefore, both methods have the same performance against nonrepetitive disturbances in steady state. However, the proposed method has a faster rate of decay, and as a result the transient performance is better. Figure 5 shows the plot of the error 2-norm over iterations. We notice that, as predicted, the transient response of the optimal learning algorithm is better than that of the heuristic geometrically decaying learning algorithm, while the steady-state performance of both algorithms is the same.

Fig. 4. Convergence rate (1 − learning rate) for the trajectory-following error vs. number of trials k for the fixed-structure learning algorithms (optimal rate and geometrically decaying rate).

Fig. 5. 2-norm of the trajectory-following error e_k vs. number of trials k for the fixed-structure learning algorithms (optimal rate and geometrically decaying rate).

B. Stochastic Optimal Learning

The fixed-structure learning rate algorithms do not take into account the frequency distribution of the nonrepetitive and repetitive components of the error signal. In this section, we compare two algorithms: (a) the fixed learning rate algorithm proposed in [9], and (b) the optimal iteration-varying learning rate algorithm proposed in Section III. The lifted version of the fixed-rate learning algorithm in [9], for a maximum rate of convergence η, is given by

L = Π_G ((1 − √η) W)((1 + √η) W + √η (1 − √η) V)^{-1}   (37)
u_{k+1} = u_k + L e_k   (38)

The attractive feature of this algorithm is that it explicitly accounts for the relative magnitudes of nonrepetitive and repetitive disturbances in different frequency bands. However, since it does not have a decaying learning rate, there is always some accumulation of nonrepetitive disturbances. We compare the performance of the above fixed-rate optimal learning algorithm to:

L_k = (G^T G)^{-1} G^T (I − V Φ_k^{-1})   (39)
Φ_{k+1} = (I − G L_k) Φ_k (I − G L_k)^T + G L_k V + V L_k^T G^T   (40)
u_{k+1} = u_k + L_k e_k   (41)

where Φ_0 = W + V. Figure 6 shows the plot of the 2-norm of the error against iteration number for the above two schemes. η was chosen as 0.3 to match the initial learning rate of both algorithms. We see that the performance of the two algorithms is comparable in the initial iterations. However, as iterations go by, the stochastic optimal ILC algorithm has a smaller error norm. This is because there is no residual learning for nonrepetitive disturbances, which dominate in later iterations. At the same time, there is more computational burden at each iteration in recomputing the learning matrix L_k.

Fig. 6. 2-norm of the trajectory-following error e_k vs. number of trials k for the fixed-rate optimal learning algorithm and the proposed stochastic optimal ILC algorithm.

C. Discussion

Based on the simulation results presented above, we make the following observations.

Remark 1: The geometrically decaying algorithm is attractive from an implementation point of view because of its simplicity. Further, since the rate of learning decays to zero, there is no residual impact of nonrepetitive disturbances from earlier iterations. At the same time, in comparison to the optimal learning rate algorithm, the transient performance of the geometrically decaying algorithm is a little worse.
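This transient gap can be checked without simulation noise by propagating the trace recursion of Eq. (23), which is valid for the one-step-inversion structure L_0 = G^{-1}, under each rate schedule. The sketch below uses illustrative trace values (not the µ-RD data), with trace(W) = 20 and trace(V) = 10.

```python
# Deterministic comparison of the two decaying-rate schemes via the trace
# recursion of Eq. (23), valid for the one-step-inversion structure
# L0 = G^{-1}: trace(Phi_{k+1}) = (1 - b_k)^2 trace(Phi_k) + 2 b_k trace(V).
# Illustrative sizes (not the mu-RD data).
trV = 10.0      # trace(V), nonrepetitive error level
trPhi0 = 30.0   # trace(Phi_0) = trace(W) + trace(V) = 20 + 10

def propagate(rate, K=40):
    """Propagate trace(Phi_k) for K iterations under a rate schedule b_k."""
    tr = [trPhi0]
    for k in range(K):
        b = rate(k, tr[-1])
        tr.append((1.0 - b) ** 2 * tr[-1] + 2.0 * b * trV)
    return tr

geom = propagate(lambda k, tr: 1.0 / (k + 1))             # Eq. (35)
opt = propagate(lambda k, tr: max(0.0, 1.0 - trV / tr))   # Eq. (26)

print(round(geom[1], 2), round(opt[1], 2))    # -> 20.0 16.67
print(round(geom[-1], 2), round(opt[-1], 2))  # -> 10.25 10.25
```

The first printed pair shows the faster transient of the optimal rate; the second shows both schedules settling near the trace(V) level, i.e., the same steady-state behavior.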
Further, the fixed decay rate means that there is no resetting of the learning in case of an unanticipated change. By using the error norm in the learning-rate decay term in Eq. (36), a sudden increase in the error norm re-triggers the learning algorithm. Finally, the frequency spectra of the repetitive and nonrepetitive disturbances are not taken into account by either of these algorithms, leading to suboptimal performance.

Remark 2: The optimal fixed-rate learning algorithm proposed in [9] uses the frequency spectra of the nonrepetitive and repetitive components of the error and performs very well in the initial iterations of the process, when the repetitive and nonrepetitive disturbance spectra match well with those of the first iteration. This can be seen clearly in Figure 6. However, the fixed learning rate forces some accumulation of error caused by nonrepetitive components, as can also be seen in Figure 6. In contrast, the varying learning rate algorithm with a decaying rate of learning (proposed in Section III) does not cause accumulation of the nonrepeating error. This performance benefit, however, comes at the cost of substantially more computation (forward propagation of the covariance matrices).

VII. CONCLUSIONS

Performance of ILC algorithms is degraded in the presence of nonrepetitive components in the error signal. With a faster rate of learning, the effect of nonrepetitive disturbances is amplified; at the same time, in order to remove repetitive disturbances, the rate of learning should be high. Therefore, there exists a clear tradeoff between amplifying nonrepetitive disturbances and attenuating repetitive disturbances. This paper posed this tradeoff in a lifted-domain ILC framework and provided expressions for optimal learning rates. The optimal learning rate was found to depend on the relative magnitudes of the nonrepetitive error and the total error at each iteration.
As the nonrepetitive and total error sizes become comparable, the rate of learning decreases. As a result, nonrepetitive disturbances are not amplified in steady state. A comparison with a geometrically decaying algorithm showed that the steady-state error performance is the same, but the transient performance is poorer than that of the optimized learning rate algorithm. The tradeoff here is that the geometrically decaying algorithm does not require heavy computation between iterations. Therefore, in applications where between-iteration times are small, the geometrically decaying algorithm provides an attractive solution at the cost of a marginal loss in transient performance. The proposed optimal method, on the other hand, provides better performance in cases where the frequency spectra of the nonrepetitive and repetitive disturbances overlap. A central drawback of the method proposed in this paper is the need for forward propagation of the covariance matrices. To overcome this, an online estimation method for these matrices can be investigated; this estimation may be done by processing the error information along the iteration. At the same time, frequency-domain equivalents of the learning algorithms derived in this paper would provide elegant and computationally advantageous implementations. We aim to address these issues in the future.

REFERENCES

[1] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of robots by learning," J. of Robotic Systems, vol. 1, no. 2, pp. 123–140, 1984.
[2] D. Bristow, M. Tharayil, and A. Alleyne, "A survey of iterative learning control," IEEE Control Systems Magazine, vol. 26, no. 3, pp. 96–114, June 2006.
[3] H.-S. Ahn, Y. Chen, and K. Moore, "Iterative learning control: Brief survey and categorization," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 6, pp. 1099–1121, Nov. 2007.
[4] S. Saab, "Stochastic P-type/D-type iterative learning control algorithms," International Journal of Control, vol. 76, pp. 139–148, 2003.
[5] Y. Chen and K. Moore, "Harnessing the nonrepetitiveness in iterative learning control," in Proc. of the 41st IEEE Conference on Decision and Control, vol. 3, Dec. 2002, pp. 3350–3355.
[6] S. Arimoto, "Robustness of learning control for robot manipulators," in Proc. of the 1990 IEEE Int. Conf. on Robotics and Automation, Cincinnati, OH, USA, 1990, pp. 1528–1533.
[7] G. Heinzinger, D. Fenwick, B. Paden, and F. Miyazaki, "Robust learning control," in Proc. of the 28th IEEE Conf. on Decision and Control, Tampa, FL, USA, Dec. 1989, pp. 436–440.
[8] B. Helfrich, C. Lee, D. Bristow, X. Xiao, J. Dong, A. Alleyne, S. Salapaka, and P. Ferreira, "Combined H-infinity feedback and iterative learning control design with application to nanopositioning systems," in Proc. of the 2008 American Control Conference, June 2008, pp. 3893–3900.
[9] D. Bristow, "Frequency domain analysis and design of iterative learning control for systems with stochastic disturbances," in Proc. of the 2008 American Control Conference, June 2008, pp. 3901–3907.
[10] M. Butcher, A. Karimi, and R. Longchamp, "A statistical analysis of certain iterative learning control algorithms," International Journal of Control.
[11] K. Tao, R. Kosut, and G. Aral, "Learning feedforward control," in Proc. of the 1994 American Control Conference, vol. 3, June–July 1994, pp. 2575–2579.
[12] K. Barton, J. van de Wijdeven, A. Alleyne, O. Bosgra, and M. Steinbuch, "Norm optimal cross-coupled iterative learning control," in Proc. of the 47th IEEE Conference on Decision and Control, Dec. 2008, pp. 3020–3025.
[13] D. Bristow and A. Alleyne, "A high precision motion control system with application to microscale robotic deposition," IEEE Transactions on Control Systems Technology, vol. 14, no. 6, pp. 1008–1020, Nov. 2006.

APPENDIX

Claim: trace(Φ_{k+1}) ≤ trace(Φ_k), i.e., J_{k+1} ≤ J_k.

Proof: Plugging the optimal gain of Eq. (22), for which G L_k = Π_G (I − V Φ_k^{-1}) with Π_G = G (G^T G)^{-1} G^T, into Eq. (20), we get

Φ_{k+1} = Φ_k + Π_G (I − V Φ_k^{-1})(V − Φ_k) + (V − Φ_k)(I − V Φ_k^{-1})^T Π_G^T + Π_G (I − V Φ_k^{-1}) Φ_k (I − V Φ_k^{-1})^T Π_G^T   (42)

Further,

0 ≤ trace(Π_G (I − V Φ_k^{-1}) Φ_k (I − V Φ_k^{-1})^T Π_G^T)
= trace(Π_G^T Π_G (I − V Φ_k^{-1}) Φ_k (I − V Φ_k^{-1})^T)
= −trace(Π_G (I − V Φ_k^{-1})(V − Φ_k))

Using the above, we have

trace(Φ_{k+1}) = trace(Φ_k) − trace(Π_G (I − V Φ_k^{-1}) Φ_k (I − V Φ_k^{-1})^T Π_G^T) ≤ trace(Φ_k)

Hence, J_{k+1} ≤ J_k.
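The monotonicity claim can also be checked numerically. The following sketch is an illustrative check (not part of the original development): it propagates Eq. (20) under the optimal gain of Eq. (22) for a randomly generated full-column-rank plant and positive-definite covariances, and asserts that trace(Φ_k) never increases.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 8
I = np.eye(N)

# Randomly generated full-column-rank plant and positive-definite
# covariances (illustrative values only).
G = np.tril(rng.standard_normal((N, N))) + 3.0 * I
A = rng.standard_normal((N, N))
V = 0.05 * (A @ A.T + N * I)   # nonrepetitive covariance
B = rng.standard_normal((N, N))
W = B @ B.T + N * I            # repetitive covariance
Phi = W + V                    # Phi_0 = W + V

pinvG = np.linalg.solve(G.T @ G, G.T)   # (G^T G)^{-1} G^T

prev = np.trace(Phi)
for k in range(50):
    Lk = pinvG @ (I - V @ np.linalg.inv(Phi))            # Eq. (22)
    M = I - G @ Lk
    Phi = M @ Phi @ M.T + G @ Lk @ V + V @ Lk.T @ G.T    # Eq. (20)
    cur = np.trace(Phi)
    assert cur <= prev + 1e-8, "monotonicity violated"
    prev = cur

# Phi_k stays bounded below by V, so the error floor is set by the
# nonrepetitive component.
print(prev >= np.trace(V) - 1e-6)  # -> True
```

This reflects the two sides of the claim: trace(Φ_k) is nonincreasing under the optimal gain, yet it cannot drop below trace(V), since Φ_{k+1} − V is a sum of positive semidefinite terms.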