A Framework for Finding Robust Optimal Solutions over Time

Yaochu Jin, Ke Tang, Xin Yu, Bernhard Sendhoff and Xin Yao

(Yaochu Jin, Ke Tang, Xin Yu and Xin Yao are with the Nature Inspired Computation and Applications Laboratory (NICAL), School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China. Bernhard Sendhoff is with Honda Research Institute Europe, 63073 Offenbach, Germany. Yaochu Jin is also with the Department of Computing, University of Surrey, Guildford, Surrey, GU2 7XH, UK. Xin Yao is also with the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. emails: [email protected], [email protected], [email protected])

Abstract— Dynamic optimization problems (DOPs) are those whose specifications change over time, resulting in changing optima. Most research on DOPs has so far concentrated on tracking the moving optima (TMO) as closely as possible. In practice, however, it is very costly, if not impossible, to keep changing the design whenever the environment changes. To address DOPs more practically, we recently introduced a conceptually new problem formulation, referred to as robust optimization over time (ROOT). Under ROOT, an optimization algorithm aims to find an acceptable (optimal or sub-optimal) solution that changes slowly over time, rather than the moving global optimum. In this paper, we propose a generic framework for solving DOPs using the ROOT concept, which searches for optimal solutions that are robust over time by means of local fitness approximation and prediction. Empirical investigations comparing a few representative TMO approaches with an instantiation of the proposed framework are conducted on a number of test problems to demonstrate the advantage of the proposed framework in the ROOT context.

I. INTRODUCTION

Most real-world optimization problems are subject to changing environments. Changes in an optimization problem can involve variations in the objective functions, decision variables, environmental parameters, and constraints. The number of objectives, decision variables and constraints may also vary from time to time during the optimization. Such optimization problems are termed dynamic optimization problems (DOPs) [1], [2]. Population-based optimization algorithms (POAs), such as evolutionary algorithms (EAs), are considered to be well suited for solving DOPs [3]. So far, most research on dynamic optimization has focused on tracking moving optima (TMO) [2], [4], [5], [6], [7], [8], where, whenever a change occurs, the new global optimal solution is targeted. Such approaches are theoretically possible but of less practical interest, since in most real-world scenarios it is impractical to constantly change the design (or plan) in use. In our previous work [9], we pointed out a number of limitations of TMO approaches to DOPs, including the difficulty of relocating the new optima in severely or quickly changing environments, the impracticality of using the new solutions due to too many human operations [10] or limited resources such as time and cost, and the risk of being deceived by the "time-linkage" problem [11]. Meanwhile, we proposed to find robust optimal solutions over time, which was referred to as robust optimization over time (ROOT) [9].
According to ROOT, the goal of optimization is to find a sequence of acceptable solutions, which can be optimal or sub-optimal, instead of the global optimum in any given environment. One assumption is that when the environment changes, the old solution can still be used in the new environment as long as its quality is acceptable. The criterion for an acceptable solution is problem-specific and can be pre-defined by the user.

The main new contribution of this paper is a generic framework for solving DOPs based on the ROOT concept. The proposed framework consists of a population-based optimization algorithm (POA), a database, a fitness approximator and a fitness predictor. A definition of robustness over time, which is the essence of the framework, is also proposed. Simply put, a solution's robustness over time is estimated from both its past and future performance. To demonstrate the feasibility of the framework, an instantiation of the generic framework is given, where a radial-basis-function (RBF) network [12] is adopted as the local approximator [13], [14] of past fitness values and an autoregressive (AR) model [15] is used for predicting future performance. The performance of the instantiation, as well as that of a few representative TMO approaches, is examined on nine DOP test problems that belong to three categories of ROOT problems, including mMPB [9] and two other benchmark problems proposed in this paper. Finally, we demonstrate the effectiveness of the framework and empirically analyze the performance of the implemented prototype on the above test problems.

II. RELATED WORK

In this section, we present a brief introduction to research on robust optimization and dynamic optimization, both of which are closely related to ROOT. More comprehensive surveys on dynamic optimization and robust optimization can be found in [1], [2], [16]. Without loss of generality, we assume the following maximization problem is under consideration:

Maximize: F(\vec{x}) = f(\vec{x}, \vec{\alpha}),   (1)

where f is the objective function, \vec{x} is a vector of design variables and \vec{\alpha} is the vector of environmental parameters.

A. Robust Optimization

One widely used definition of robustness is a solution's expected performance over all possible disturbances. The resultant fitness of Eq. 1 then becomes

F(\vec{x}) = \int_{-\infty}^{+\infty} f(\vec{x} + \vec{\delta}, \vec{\alpha})\, p(\vec{\delta})\, d\vec{\delta},   (2)

or

F(\vec{x}) = \int_{-\infty}^{+\infty} f(\vec{x}, \vec{\alpha} + \vec{\xi})\, p(\vec{\xi})\, d\vec{\xi},   (3)

where \vec{\delta} and \vec{\xi} describe noise in the design variables and environmental parameters, respectively, and p(\vec{\delta}) and p(\vec{\xi}) are their probability density functions.

In practice, it is often impossible to analytically calculate the expected values in Eq. 2 and Eq. 3. An intuitive alternative is to estimate them using Monte Carlo integration by sampling over a number of realizations of \vec{\delta} and \vec{\xi} [17], [18], [19]. To provide a more reliable estimate in the Monte Carlo integration while avoiding additional fitness evaluations, surrogate models [20], [21] have been employed to substitute the original fitness function [22], [25], [26]. Surrogates have also been studied in memetic algorithms [23], [24]. Ong et al. [25] employed radial basis functions (RBFs) as local surrogate models and incorporated them into a Baldwinian trust-region framework to estimate a solution's worst-case performance. The experimental results verified the effectiveness of the above work on robust optimization [22], [25].
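As a concrete illustration of the sampling approach, the following minimal Python sketch estimates the expected fitness in Eq. 2 by Monte Carlo integration over Gaussian disturbances of the design variables; the objective function and the noise level used here are hypothetical placeholders, not taken from the cited studies.

```python
import numpy as np

def expected_fitness(f, x, alpha, sigma=0.1, n_samples=1000, seed=None):
    """Monte Carlo estimate of Eq. 2: E_delta[ f(x + delta, alpha) ],
    assuming Gaussian disturbances delta ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    deltas = rng.normal(0.0, sigma, size=(n_samples, x.size))
    return float(np.mean([f(x + d, alpha) for d in deltas]))

# Hypothetical objective: a single peak whose sharpness depends on alpha.
def f(x, alpha):
    return alpha - (alpha - 1.0) * np.sum(x**2)

print(expected_fitness(f, x=[0.2, -0.1], alpha=2.0, sigma=0.1, seed=0))
```

Note that each of the n_samples draws costs one fitness evaluation, which is precisely what motivates the surrogate-assisted variants cited above.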
Most recently, a probabilistic model has been developed in [28] for choosing, among a number of given surrogates, the surrogate that is expected to bring about the highest fitness improvement, based on the work in [29], [30]. However, dynamic optimization problems as defined in this paper have not been considered there.

B. TMO Approach to DOPs

In a DOP, the fitness function is deterministic at any given time instant, but dependent on time t. Hence the resultant fitness of Eq. 1 takes the form

F(\vec{x}) = f(\vec{x}, \vec{\alpha}(t)),   (4)

where t is the time index with t \in [0, T_{end}] (T_{end} > 0 is the life cycle of the problem). Now the problem parameters \vec{\alpha}(t) change over time, and the objective of TMO is to maximize the function defined in Eq. 4 at any time instant. In practice, an assumption of a "small to medium" degree of environmental change is often made so that "tracking" makes sense [27]. To enhance the ability of POAs to track a moving optimum, two main strategies have been adopted. One is to make use of historical information [27], [31], [32], [33], [34], [35] and the other is to re-introduce or maintain population diversity [4], [36], [37], [38], [39], [40], [41].

C. Robust Optimization over Time

In the context of ROOT, the problem to be solved is similar to that defined in Eq. 4. However, the goal here is to find solutions whose quality is acceptable over a certain time interval, although they need not be the global optima at any time instant. From this perspective, the target is to optimize the expected fitness over time, i.e. [9],

F(\vec{x}) = \int_{t_0}^{t_0+T} \int_{-\infty}^{+\infty} f(\vec{x}, \vec{\alpha}(t))\, p(\vec{\alpha}(t))\, d\vec{\alpha}(t)\, dt,   (5)

where p(\vec{\alpha}(t)) is the probability density function of \vec{\alpha}(t) at time t, T is the length of the time interval, and t_0 is a given starting time. The dynamics of the problem parameters may be known, e.g., \vec{\alpha}(t) may be determined by

\vec{\alpha}(t) = g(\vec{\alpha}(0), \Delta\vec{\alpha}, t),   (6)

where \Delta\vec{\alpha} is the change of \vec{\alpha} between two successive time instants. From Eqs. 2-4 and 6, we can see that robustness in ROOT not only takes into account uncertainties in the decision space and parameter space, but also the effect of these uncertainties in the time domain. Our definition also takes into account "time-linkage" problems [11], where the optimal decision in the long run is deceived by always seeking the global optimum of the problem at any given time.

III. DISCRETE-TIME DYNAMIC OPTIMIZATION PROBLEMS

A. Problem Formulation

In this section, we discuss robustness over time in the context of a class of discrete-time dynamic optimization problems (DTDOPs), together with formal definitions of TMO and ROOT. We also briefly analyze the characteristics of ROOT on DTDOPs.

In Eq. 4, the time t is normally assumed to be discrete and is measured in function evaluations in the field of DOPs. Here we consider a class of dynamic optimization problems that are most widely studied in the literature: the parameters of the problem change over time with stationary periods between changes. In other words, \vec{\alpha}(t) does not have to change continuously over time, and thus within T_{end} there may be a sequence of l = \lceil T_{end}/\tau \rceil different instances of the same problem (the problem with parameters <\vec{\alpha}_1, \vec{\alpha}_2, \ldots, \vec{\alpha}_l>), where 1/\tau is the frequency of change, which usually remains constant but could also be time-variant. Therefore, the problem can be re-formulated as:

< f(\vec{x}, \vec{\alpha}_1), f(\vec{x}, \vec{\alpha}_2), \ldots, f(\vec{x}, \vec{\alpha}_l) >.   (7)

We call this category of problems DTDOPs.
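To make the DTDOP formulation in Eq. 7 concrete, the following Python sketch generates a sequence of problem instances under the simple additive dynamics of Eq. 9 introduced below; the objective function and the distribution of Δα are illustrative assumptions taken from the scalar example of Eq. 12.

```python
import numpy as np

def make_dtdop(alpha0, n_instances, seed=None):
    """Generate the parameter sequence <alpha_1, ..., alpha_l> of a DTDOP
    (Eq. 7) under additive dynamics alpha_i = alpha_{i-1} + d_alpha (Eq. 9),
    with d_alpha ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    alphas = [alpha0]
    for _ in range(n_instances - 1):
        alphas.append(alphas[-1] + rng.normal(0.0, 1.0))
    return alphas

# Each instance i is the static problem f(x, alpha_i); here we reuse the
# scalar example f(x, alpha) = alpha - (alpha - 1) x^2 from Eq. 12.
f = lambda x, alpha: alpha - (alpha - 1.0) * x**2
alphas = make_dtdop(alpha0=2.0, n_instances=5, seed=1)
print([round(f(0.5, a), 3) for a in alphas])
```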
For DTDOPs defined in Eq. 7, TMO means finding the global optimal solution S_i^* within the i-th time interval T_i for the problem f(\vec{x}, \vec{\alpha}_i), where i = 1, 2, \ldots, l. Thus the task is to find a sequence of corresponding global optimal solutions < S_1^*, S_2^*, \ldots, S_l^* >.

B. Definition of ROOT in DTDOPs

The task of ROOT is to find a sequence of acceptable solutions that are robust over time. Formally, for the DTDOP defined in Eq. 7, the goal is to find a sequence of solutions < S_1, S_2, \ldots, S_k >, and a solution S_i (i = 1, 2, \ldots, k) is said to be robust if its performance is acceptable for at least two consecutive time intervals. Thus we have 1 ≤ k ≤ l. Note that k = l means that no such robust solutions exist and for every changed environment a new solution must be found. On the other hand, the ideal situation occurs when k = 1, where only one solution is needed for all environments, and thus there is no need to search for a new solution when the environment changes.

Consider the dynamic problem defined above in the time interval [t_L, t_U] ⊆ [0, T_{end}] with simple additive dynamics between two successive changes. Let l_L = \lceil t_L/\tau \rceil and l_U = \lceil t_U/\tau \rceil; we obtain the problem

< f(\vec{x}, \vec{\alpha}_{l_L}), f(\vec{x}, \vec{\alpha}_{l_L+1}), \ldots, f(\vec{x}, \vec{\alpha}_{l_U}) >,   (8)

with the dynamics

\vec{\alpha}_i = \vec{\alpha}_{i-1} + \Delta\vec{\alpha},   (9)

where i = l_L+1, l_L+2, \ldots, l_U. Recall that f is to be maximized. The dynamics parameter \Delta\vec{\alpha} is regarded as a random variable obeying a certain distribution such as a Gaussian distribution or a uniform distribution. Here the parameters of the distribution functions (e.g., the mean and the standard deviation of the Gaussian distribution) are assumed to be time-invariant. In this case, two measures suggested in [9] can be used to estimate robustness in ROOT:

F(\vec{x}; l_L, l_U) = \frac{1}{l_U - l_L + 1} \sum_{i=1}^{l_U - l_L + 1} \int f(\vec{x}, \vec{\alpha}_{l_L} + (i-1)\Delta\vec{\alpha})\, p(\Delta\vec{\alpha})\, d\Delta\vec{\alpha},   (10)

and

D(\vec{x}; l_L, l_U) = \frac{1}{l_U - l_L + 1} \sum_{i=1}^{l_U - l_L + 1} \int \big(f(\vec{x}, \vec{\alpha}_{l_L} + (i-1)\Delta\vec{\alpha}) - F(\vec{x}; l_L, l_U)\big)^2 p(\Delta\vec{\alpha})\, d\Delta\vec{\alpha},   (11)

where p(\Delta\vec{\alpha}) is the probability density function of \Delta\vec{\alpha}. It can be seen that F measures the average performance over all environments within the time interval [t_L, t_U], and D measures the degree to which the performance varies when the environment changes.

Taking the following scalar objective function, to be maximized, as an example:

f(x, \alpha) = \alpha - (\alpha - 1)x^2, \quad \alpha \in \mathbb{R},\; x \in \mathbb{R},   (12)

where \alpha changes according to Eq. 9 and \Delta\alpha is assumed to be normally distributed, \Delta\alpha \sim N(0, 1). In the time interval [t_L, t_U], according to Eqs. 10 and 11, the two measures can be calculated as

F = \alpha_{l_L} - (\alpha_{l_L} - 1)x^2,   (13)

and

D = \frac{(l_U - l_L)[2(l_U - l_L) + 1]}{6} (1 - x^2)^2.   (14)

From Eqs. 13 and 14, we can see that if \alpha_{l_L} ≤ 1, maximizing F and minimizing D are not conflicting goals. However, if \alpha_{l_L} > 1, maximizing F and minimizing D represent conflicting objectives, and hence we are dealing with multi-objective robust optimization. Using the goals in Eqs. 13 and 14 (as an example Pareto front, here we set \alpha_{l_L} = 2 and l_U - l_L = 3), the Pareto front of the example function in Eq. 12 is shown in Fig. 1.

[Fig. 1. Pareto front of the example function in Eq. 12 when maximizing F in Eq. 13 and minimizing D in Eq. 14. The curve represents the non-dominated solutions of the multi-objective optimization problem (horizontal axis: F; vertical axis: D).]
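The closed forms in Eqs. 13 and 14 can be checked numerically; the short Python sketch below estimates Eqs. 10 and 11 for the example in Eq. 12 by sampling Δα and compares against the analytical values (all settings follow the example above: α_{l_L} = 2, l_U − l_L = 3).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_L, span, x = 2.0, 3, 0.5          # alpha_{l_L}, l_U - l_L, a candidate solution
f = lambda x, a: a - (a - 1.0) * x**2   # Eq. 12

d_alpha = rng.normal(0.0, 1.0, size=100_000)   # Delta alpha ~ N(0, 1)
# Fitness in each of the span+1 environments under Eq. 9 (the same Delta alpha
# draw is reused along a trajectory, matching the argument of Eqs. 10-11).
vals = np.stack([f(x, alpha_L + i * d_alpha) for i in range(span + 1)])

F_mc = vals.mean()                                    # Eq. 10
D_mc = ((vals - F_mc) ** 2).mean()                    # Eq. 11
F_th = alpha_L - (alpha_L - 1.0) * x**2               # Eq. 13
D_th = span * (2 * span + 1) / 6.0 * (1 - x**2) ** 2  # Eq. 14
print(F_mc, F_th)   # both close to 1.75
print(D_mc, D_th)   # both close to 1.969
```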
IV. POPULATION-BASED ALGORITHMS (POAS) FOR ROBUST OPTIMIZATION OVER TIME

A. The Framework

We propose a framework for solving ROOT problems using POAs, as they have been widely used and shown to be successful in the TMO approach to DOPs. From the robustness measures defined in Eqs. 10 and 11, we can see that to estimate a solution's performance over a time interval, we must take into account its past and future performance. If we assume the change in performance is not random, both past and future performance can be estimated from historical data collected during optimization. Furthermore, a local approximator is needed to estimate the past fitness values of a solution if they cannot be directly retrieved from the database. Thus, our framework for solving ROOT problems consists of an optimizer (a POA, e.g., a genetic algorithm or a particle swarm optimization algorithm), a database, an approximator and a predictor.

1) Approximator and Predictor: When the optimization problem changes, we can employ an approximation model constructed from historical data to estimate a solution's performance at a previous time instant. An approximation model, often known as a meta-model or surrogate, can also be very helpful when fitness evaluations are highly time-consuming, as in many real-world applications [20], [45]. In this work, a local approximation model is constructed using a solution's neighboring historical data in the database to estimate its past performance. Many models, such as radial-basis-functions (RBFs) [25], [14], polynomial functions [13], [22], and kriging models [44], can be chosen. For an overview of fitness approximation in EAs, the reader is referred to [2], [45].

By contrast, the task of the predictor is to estimate a solution's future performance. To this end, an approximation model must be constructed based on the solution's past performances, which can either be directly retrieved from the database or estimated using the local approximator introduced above. Any time series prediction technique can be adopted for the predictor, such as autoregressive (AR) models [15], support vector machines (SVMs) [46], and the models used for approximation mentioned in the previous subsection.

2) Database: The database is used to store the historical data based on which the local approximator and the predictor are constructed. It starts without any records and collects data while the POA is running. In each generation, each candidate's location in the search space, its fitness value and the associated time instant are stored in the database.

B. Estimation of Robustness over Time

From the previous sections, we can see that the critical issue in solving ROOT problems is to estimate a solution's robustness over time as defined in Eqs. 10 and 11. We limit our discussion to estimating the F measure in this paper. Generally, the integral part of F is not easy to estimate, since in real-world applications there is little prior knowledge about the problem and the uncertainty. However, if we know the performance of a solution at several time steps, we can compute its average performance over a certain time interval and take this average as the solution's robustness over the considered interval. In other words, at time step l_0, we estimate the robustness of a solution \vec{x} as follows:

\hat{F}(\vec{x}; l_0) = \sum_{l=l_0-p}^{l_0+q} \hat{f}(\vec{x}, \vec{\alpha}_l),   (15)

where \hat{F} looks backward to p past values and forward to q future values of a solution.
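A minimal Python sketch of the estimation in Eq. 15 is given below. Here `approximate_past` and `predict_future` stand for the local approximator and the predictor of the framework, whose concrete forms (RBF and AR models in our instantiation) are described in Section V; the function names and the simple dictionary-based database are illustrative assumptions.

```python
def estimate_robustness(x, l0, p, q, database, approximate_past, predict_future):
    """Estimate F-hat(x; l0) of Eq. 15: the sum of approximated fitness values
    in environments l0-p..l0 and q predicted future fitness values."""
    past = []
    for l in range(l0 - p, l0 + 1):
        record = database.get((tuple(x), l))   # reuse a stored fitness if available
        past.append(record if record is not None
                    else approximate_past(x, l, database))
    future = predict_future(past, q)           # values for environments l0+1..l0+q
    return sum(past) + sum(future)
```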
Note that the approximated fitness \hat{f} is used instead of the real fitness value, because the estimation of a solution's robustness should not introduce extra fitness evaluations. In this way, it is guaranteed that the problem being solved will not change again during the process of estimation. The pseudo-code of the proposed algorithm for ROOT, termed (p+q)-POA, is described in Algorithm 1, where g is the generation index, l is the environment index, and 'DB', 'LA' and 'Pre' denote the database, the local approximator and the predictor, respectively. (p+q)-POA starts with an empty database and collects data during the run. In each generation, it operates like an ordinary POA except that the fitness is now an estimated F value.

Algorithm 1 (p+q)-POA – A population-based optimization algorithm for ROOT
  g ← 0
  l ← 0
  initialize population P(0)
  evaluate every individual in P(0) with the real f
  add (indi, 0) to DB for every indi ∈ P(0)
  repeat
    if the environment changes then
      l ← l + 1
    end if
    P'(g) ← select(P(g))
    P''(g) ← algorithmOperators(P'(g))
    evaluate every individual in P''(g) with the real f
    add (indi, l) to DB for every indi ∈ P''(g)
    estimate F of all P''(g) individuals via LA(p) and Pre(q)
    assign the estimated F value to the fitness of every individual in P''(g)
    P(g+1) ← survival(P(g) ∪ P''(g))
    g ← g + 1
  until termination criteria met

C. Requirements on Estimators

To guarantee an acceptable performance of the framework, the estimator should be unbiased. However, for a POA that relies on rank-based information to drive the search, a low standard deviation of the estimation error is more important than the mean, as long as the biases are consistent across different points [22]. For example, if there is an estimator whose estimation error has a probability distribution with zero standard deviation, the search result of a POA with rank-based selection will be the same no matter whether the search is based on the estimator or on the real fitness function. For a PSO algorithm, since the updates of the personal best information and the global best information are all based on comparisons between fitness values, such fitness approximation errors will not change the locations of attraction for the population.

V. AN INSTANTIATION OF THE FRAMEWORK

As an instantiation of the framework, we adopt a radial-basis-function (RBF) model [12] as the local approximator and an autoregressive (AR) model [15] as the predictor. Since AR is a linear model, we also employ a nonlinear RBF model as the predictor for comparison.

A. Radial-Basis-Function Model

An RBF approximation model can be defined as follows:

y(\vec{x}_k) = \sum_{i=1}^{n_c} \omega_i \phi(\|\vec{x}_k - \vec{c}_i\|_2),   (16)

where \{\vec{x}_k, y(\vec{x}_k)\}, k = 1, 2, \ldots, n_s, is the training dataset, \vec{x}_k \in \mathbb{R}^d is the k-th input vector, y(\vec{x}_k) is the k-th output, n_c is the number of centers, \vec{c}_i is the i-th center, \phi(\cdot) is a radial basis kernel applied to the Euclidean distance, \|\cdot\|_2 represents the Euclidean norm, and \vec{\omega} = \{\omega_1, \omega_2, \ldots, \omega_{n_c}\}^T \in \mathbb{R}^{n_c} denotes the weight vector. In this work, we adopt a linear kernel for the RBF model, which takes the following form:

y(\vec{x}_k) = \sum_{i=1}^{n_c} \omega_i \|\vec{x}_k - \vec{c}_i\|_2.   (17)

Normally, n_s ≥ n_c. When n_s = n_c, the model is an interpolation model. Each training datum is chosen as a center of the RBF, and the weights can be calculated as

\vec{\omega}^* = K^{-1}\vec{y},   (18)

where \vec{y} = \{y(\vec{x}_1), y(\vec{x}_2), \ldots, y(\vec{x}_{n_s})\}^T \in \mathbb{R}^{n_s} is the output vector and the ij-th element of matrix K is \|\vec{x}_i - \vec{x}_j\|_2. When n_s > n_c, it becomes a regression model.
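For the interpolation case just described, the model can be written down directly; the Python sketch below builds the distance matrix K with the linear kernel of Eq. 17 and solves Eq. 18 for the weights (a minimal illustration, not the exact code used in the experiments).

```python
import numpy as np

def fit_rbf_interpolation(X, y):
    """Fit the linear-kernel RBF interpolation model of Eqs. 17-18.
    X: (n_s, d) training inputs (all used as centers), y: (n_s,) outputs."""
    K = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # K_ij = ||x_i - x_j||_2
    w = np.linalg.solve(K, y)                                   # Eq. 18: w = K^{-1} y
    return w

def rbf_predict(x, X_centers, w):
    """Evaluate Eq. 17 at a new point x."""
    return w @ np.linalg.norm(X_centers - x, axis=-1)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = fit_rbf_interpolation(X, y=np.array([1.0, 2.0, 0.5]))
print(rbf_predict(np.array([0.5, 0.5]), X, w))
```

A duplicated training point would make K singular, which is one reason the database must not store the same sample twice (see below); the regression case in Eq. 19 replaces the inverse with a least-squares solution.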
The centers can be selected from the training data via a k-means algorithm [47]. The weights can then be computed using the least squares method as follows:

\vec{\omega}^* = (K^T K)^{-1} K^T \vec{y},   (19)

where the ij-th element of matrix K is \|\vec{x}_i - \vec{c}_j\|_2.

In the l-th environment, when estimating an individual's past fitness value in the (l - l_0)-th environment at its position \vec{x}, we first check whether the tuple (\vec{x}, l - l_0, f(\vec{x}, \vec{\alpha}_{l-l_0})) exists in the database. If so, we use the recorded fitness value. Otherwise, we construct a local RBF model using the n_s nearest neighbors (measured by Euclidean distance) of \vec{x} in the (l - l_0)-th environment and obtain the estimated value \hat{f}(\vec{x}, \vec{\alpha}_{l-l_0}). A data sample must not be stored in the database more than once, to avoid singularity of the matrix K when constructing an RBF model.

To obtain good approximation models, the training data should be properly distributed in the search space. To this end, we employ the symmetric Latin hypercube design (SLHD) [48] to generate the initial population, and once the environment changes, we generate individuals amounting to half of the population size using SLHD to replace the worst half of the population. SLHD has the same characteristics as the regular Latin hypercube design (LHD). To avoid ill-distributed training data, we use the approach described in [22] to detect severe outliers and replace them with an estimated fitness value averaged over the neighboring points in the decision space at that time instant.

B. Autoregressive Model

An AR model of order ψ is denoted by AR(ψ). Given a time series of data X_l, a typical AR(ψ) model takes the form [15]

X_l = \epsilon_l + \sum_{i=1}^{\psi} \eta_i X_{l-i},   (20)

where \epsilon_l is white noise and \vec{\eta} = \{\eta_1, \eta_2, \ldots, \eta_\psi\}^T denotes the vector of model parameters. To perform the prediction task, we form the input-output pair (x_l, y_l) = (\{X_{l-1}, \ldots, X_{l-\psi}\}^T, X_l). Hence, in order to predict y_l, the training data are (x_{l-1}, y_{l-1}), \ldots, (x_{l-n_s}, y_{l-n_s}) [49]. Once the value of ψ is determined, the parameters of the model can be estimated using the least squares method. Specifically, the coefficients can be calculated as

\vec{\eta}^* = (\vec{\chi}^T \vec{\chi})^{-1} \vec{\chi}^T \vec{Y},   (21)

where \vec{\chi} = \{x_{l-1}, \ldots, x_{l-n_s}\} and \vec{Y} = \{y_{l-1}, \ldots, y_{l-n_s}\}. For regression, we ensure that n_s > ψ. The training data are obtained using the RBF approximation model described in the previous section.

The aforementioned RBF model can also be used to perform the prediction task. Similar to the RBF approximator, in order to predict the value at time l, i.e., y_l, we retrieve the n_s nearest historical data of x_l from the database. The distance is again the Euclidean distance, but we apply it in two different spaces. One option is to retrieve x_l's neighbors in x_l's space; the other is in x_l^*'s space, where x_l^* = x_l - x_{l-1} = \{y_{l-1} - y_{l-2}, y_{l-2} - y_{l-3}, \ldots, y_{l-\psi} - y_{l-\psi-1}\}^T [50]. The former focuses on ψ-length historical series with similar values, while the latter pays more attention to historical series with similar dynamics. We denote the former RBF model by RBF(ψ) and the latter by GRBF(ψ).
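The following Python sketch fits the AR(ψ) predictor of Eqs. 20-21 by least squares on a fitness history and rolls it forward for q steps; it is a minimal illustration and omits the outlier handling discussed above.

```python
import numpy as np

def fit_ar(history, psi):
    """Least-squares fit of AR(psi), Eq. 21: each row of chi holds the psi
    values preceding the corresponding target in Y (most recent first)."""
    history = np.asarray(history, dtype=float)
    chi = np.stack([history[i:i + psi][::-1] for i in range(len(history) - psi)])
    Y = history[psi:]
    eta, *_ = np.linalg.lstsq(chi, Y, rcond=None)
    return eta

def ar_forecast(history, eta, q):
    """Predict q future values with Eq. 20 (noise term set to zero)."""
    buf = list(history)
    for _ in range(q):
        buf.append(float(np.dot(eta, buf[-len(eta):][::-1])))
    return buf[-q:]

hist = [1.0, 1.2, 1.1, 1.3, 1.25, 1.4, 1.38]  # a solution's past fitness values
eta = fit_ar(hist, psi=2)
print(ar_forecast(hist, eta, q=3))
```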
C. PSO as the Optimizer

We adopt three variants of the particle swarm optimization (PSO) algorithm [42], [43] as the optimizer: a PSO with a simple restart strategy (denoted rPSO), a PSO with the memory scheme used in [6] (denoted memPSO), and a PSO using the species technique [54] (denoted SPSO). memPSO and SPSO represent two typical TMO techniques for DOPs, one using a memory scheme and the other multiple populations. In the following, we provide a brief description of rPSO.

In a swarm of size µ, the i-th particle's position is denoted by \vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in}), the best position it has found so far (called pbest) is denoted by \vec{p}_i = (p_{i1}, p_{i2}, \ldots, p_{in}), the best position of the whole population (called gbest) or of the current particle's neighborhood (called lbest) is denoted by \vec{p}_b = (p_{b1}, p_{b2}, \ldots, p_{bn}), and the rate of change of the particle's position, called its velocity, is denoted by \vec{v}_i = (v_{i1}, v_{i2}, \ldots, v_{in}), where i = 1, 2, \ldots, µ. The basic PSO used in this work is the PSO with a constriction factor [55], in which, at each iteration step g, the i-th particle updates the d-th dimension of its velocity according to Eq. 22 and its position in the search space according to Eq. 23 as follows:

v_{id}(g) = \chi\big(v_{id}(g-1) + c_1 r_1 (p_{id} - x_{id}(g-1)) + c_2 r_2 (p_{bd} - x_{id}(g-1))\big),   (22)

x_{id}(g) = x_{id}(g-1) + v_{id}(g),   (23)

where

\chi = \frac{2}{|2 - c - \sqrt{c^2 - 4c}|}, \quad c = c_1 + c_2, \; c > 4.0.   (24)

The constriction factor χ provides a damping effect on a particle's velocity and ensures that the particle will converge over time. c_1 and c_2 are constants, typically 2.05 each, giving χ = 0.729844 according to Eq. 24. r_1 and r_2 are random numbers uniformly distributed in [0, 1]. Moreover, the velocity \vec{v}_i can be constricted within the range [−V_{MAX}, +V_{MAX}]. In this way, the likelihood of a particle flying out of the search space is reduced. The value of ±V_{MAX} is usually set to the lower and upper bounds of the allowed search ranges, as suggested in [43]. We refer readers to [6] and [54] for detailed descriptions of memPSO and SPSO.
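A minimal Python sketch of one rPSO velocity-and-position update (Eqs. 22-24), with velocity clamping to ±V_MAX as described above; swarm bookkeeping (pbest/gbest updates, restart on change) is omitted.

```python
import numpy as np

C1 = C2 = 2.05
C = C1 + C2
CHI = 2.0 / abs(2.0 - C - np.sqrt(C * C - 4.0 * C))  # Eq. 24, approx. 0.729844

def pso_step(x, v, pbest, gbest, vmax, rng):
    """One constriction-factor update (Eqs. 22-23) for a single particle."""
    r1, r2 = rng.random(x.size), rng.random(x.size)
    v = CHI * (v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x))  # Eq. 22
    v = np.clip(v, -vmax, vmax)                                    # velocity clamping
    return x + v, v                                                # Eq. 23

rng = np.random.default_rng(0)
x, v = np.zeros(2), np.zeros(2)
x, v = pso_step(x, v, pbest=np.array([1.0, 1.0]), gbest=np.array([2.0, 0.0]),
                vmax=10.0, rng=rng)
print(x, v)
```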
D. Computational Complexity

We first briefly analyze the computational cost of constructing an RBF model and an AR model, and then the overall computational overhead of the (p+q)-POA over a standard POA. Constructing a model involves three steps:

(1) Compute the Euclidean distance between the model fitting point and all records with the same environment index (e.g., index l) in the database. This costs time in the order of O(n_{db_l} d), where n_{db_l} is the number of records in the database for the l-th environment and d is the dimensionality of the search space.

(2) Sort the above records with regard to their Euclidean distances to the fitting point. The time complexity of the Quicksort algorithm [51] is O(n_{db_l} \log_2 n_{db_l}) on average and O(n_{db_l}^2) in the worst case.

(3) Compute the coefficients of the RBF and AR models. Since we ensure that no singular matrix arises in the calculation, for interpolation we employ LU decomposition [52] to solve Eq. 18, and for regression we employ Cholesky decomposition [53] to solve Eq. 19 and Eq. 21. Both methods have a complexity of O(n_r^3), where n_r is the rank of the matrix involved; specifically, n_r = n_c for the RBF model and n_r = ψ for the AR model.

Therefore, the overall complexity of constructing an RBF and an AR model is

O(n_{db_l} d + n_{db_l}^2 + n_c^3 + \psi^3).   (25)

For the (p+q)-POA, to estimate p past values, p RBF approximators are needed, while for the q future values a single AR predictor suffices. Therefore, if the database size n_{db} is the same in each environment, the overall computational overhead per fitness evaluation is

O(p n_{db} d + p n_{db}^2 + p n_c^3 + \psi^3).   (26)

Normally, n_{db} is much larger than d, n_c and ψ. Thus the computational overhead is mainly determined by

O(p n_{db}^2).   (27)

VI. TEST PROBLEMS

As ROOT can be regarded as a more practical way of addressing DOPs, we can create test problems for ROOT by modifying existing benchmark problems for DOPs. The first test problem is the modified moving peak benchmark (mMPB) [9]. It was derived from the MPB benchmark [27] by allowing each peak to have its own change severities of width and height. In this way, some parts of the search space change more severely than others, which is very useful in evaluating solutions that are robust over time. Based on a similar idea, we further propose in this study the modified dynamic rotation peak benchmark generator (mDRPBG) and the modified dynamic composition benchmark generator (mDCBG). mDRPBG comes from DRPBG [56] by allowing each peak to have its own width change severity and height change severity, while mDCBG is obtained from DCBG [56] by allowing the height of each composition part to have its own change severity. The three benchmark problems are described as follows.

A. TP1 – Modified Moving Peak Benchmark

We use the cone peak function and do not consider any base landscape in mMPB. An n-dimensional test function with m peaks is defined as:

F(\vec{x}, t) = \max_{i=1}^{m} \{H_i(t) - W_i(t) \cdot \|\vec{x} - \vec{X}_i(t)\|_2\},   (28)

where \vec{H}, \vec{W}, \vec{X} denote the peak heights, widths and positions, respectively. Every ∆e evaluations, the height and width of the i-th peak are changed as follows:

H_i(t+1) = H_i(t) + height\_severity_i \cdot N(0, 1),
W_i(t+1) = W_i(t) + width\_severity_i \cdot N(0, 1),   (29)

where N(0, 1) denotes a normally distributed one-dimensional random number with mean zero and variance one, and height\_severity and width\_severity denote the height change severity and width change severity, respectively. The position is moved by a vector \vec{v}_i of fixed length s in a random direction (λ = 0) or a direction exhibiting a trend (λ > 0) as follows:

\vec{X}_i(t+1) = \vec{X}_i(t) + \vec{v}_i(t+1).   (30)

The shift vector \vec{v}_i(t+1) is a linear combination of a random vector \vec{r} and the previous shift vector \vec{v}_i(t), normalized to length s, i.e.,

\vec{v}_i(t+1) = \frac{s}{\|\vec{r} + \vec{v}_i(t)\|_2} \big((1-\lambda)\vec{r} + \lambda\vec{v}_i(t)\big).   (31)

The random vector \vec{r} is created by drawing random numbers in [−0.5, 0.5] for each dimension and normalizing its length to s.

B. TP2 – Modified Dynamic Rotation Peak Benchmark Generator

We use the random change type in mDRPBG, so the differences between mMPB and mDRPBG are the definition of the peak function and the way the peak positions change. An n-dimensional test function with m peaks is defined as:

F(\vec{x}, t) = \max_{i=1}^{m} \{H_i(t) / (1 + W_i(t) \cdot \|\vec{x} - \vec{X}_i(t)\|_2 / \sqrt{n})\},   (32)

where \vec{H}, \vec{W}, \vec{X} denote the peak heights, widths and positions, respectively. Every ∆e evaluations, the height and width of the i-th peak change in the same way as in mMPB (Eq. 29). The i-th peak position \vec{X}_i is changed by rotating its projection on randomly paired dimensions from the first dimension axis of each pair to the second by an angle θ_i. The dynamics of the rotation angle \vec{θ} and the specific rotation algorithm [56] are as follows:

Step 1. θ_i(t+1) = θ_i(t) + θ\_severity · N(0, 1).
Step 2. Randomly select l dimensions (l is an even number) from the n dimensions to compose a vector r = [r_1, r_2, \ldots, r_l].
Step 3. For each pair of dimensions r[j] and r[j+1], construct a rotation matrix R_{r[j],r[j+1]}(θ_i(t+1)).
Step 4. Obtain a transformation matrix A_i(t+1) for the i-th peak by
A_i(t+1) = R_{r[1],r[2]}(θ_i(t+1)) · R_{r[3],r[4]}(θ_i(t+1)) \cdots R_{r[l-1],r[l]}(θ_i(t+1)).
Step 5. \vec{X}_i(t+1) = \vec{X}_i(t) · A_i(t+1),

where the rotation matrix [57] R_{r[j],r[j+1]}(θ_i(t+1)) rotates the projection of \vec{X}_i(t) on the plane r[j]–r[j+1] by an angle θ_i(t+1) from the r[j]-th axis to the r[j+1]-th axis. As for the value of l, if n is an even number, l = n; otherwise, l = n − 1.
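A short Python sketch of the rotation procedure in Steps 1-5 is given below (a minimal illustration: the random pairing of dimensions follows Step 2, and the Givens-style rotation matrix of Step 3 acts on the plane of each pair; the sign convention of the rotation is an assumption).

```python
import numpy as np

def rotation_matrix(n, j, k, theta):
    """Step 3: an n x n identity with a rotation by theta in the (j, k) plane."""
    R = np.eye(n)
    R[j, j] = R[k, k] = np.cos(theta)
    R[j, k] = np.sin(theta)
    R[k, j] = -np.sin(theta)
    return R

def rotate_peak(X_i, theta_i, rng):
    """Steps 2-5: rotate peak position X_i by theta_i on randomly paired dims."""
    n = X_i.size
    l = n if n % 2 == 0 else n - 1
    dims = rng.permutation(n)[:l]                  # Step 2: random pairing
    A = np.eye(n)                                  # Step 4: product of rotations
    for j in range(0, l, 2):
        A = A @ rotation_matrix(n, dims[j], dims[j + 1], theta_i)
    return X_i @ A                                 # Step 5

rng = np.random.default_rng(1)
print(rotate_peak(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), theta_i=0.1, rng=rng))
```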
C. TP3 – Modified Dynamic Composition Benchmark Generator

We use base functions whose original optimum is \vec{0}, so the composition function with m base functions in n dimensions can be described as:

F(\vec{x}, t) = \sum_{i=1}^{m} \Big(w_i \cdot \big(f_i'\big(\tfrac{\vec{x} - \vec{O}_i(t)}{\lambda_i} \cdot \vec{M}_i\big) + H_i(t)\big)\Big),   (33)

where f_i(\vec{x}) is the i-th base function, \vec{M}_i is the orthogonal rotation matrix for each f_i(\vec{x}), and \vec{O}_i(t) is the optimum of the changed f_i(\vec{x}) caused by rotating the landscape at time t. The weight value w_i for each f_i(\vec{x}) is calculated as:

w_i = \exp\Big(-\sqrt{\frac{\sum_{k=1}^{n}(x_k - o_k^i)^2}{2 n \sigma_i^2}}\Big),

w_i = \begin{cases} w_i & \text{if } w_i = \max(w_i) \\ w_i \cdot (1 - \max(w_i)^{10}) & \text{if } w_i \neq \max(w_i) \end{cases},

w_i = w_i \Big/ \sum_{i=1}^{m} w_i,

where for each base function f_i(\vec{x}), σ_i is the convergence range factor and λ_i is the stretch factor, which is defined as:

\lambda_i = \sigma_i \cdot \frac{X_{max} - X_{min}}{x_{max}^i - x_{min}^i},

where [X_{min}, X_{max}]^n is the search range of F(\vec{x}) and [x_{min}^i, x_{max}^i]^n is the search range of f_i(\vec{x}). In Eq. 33, f_i'(\vec{x}) = C \cdot f_i(\vec{x}) / |f_{max}^i|, where C is a predefined constant and f_{max}^i is the estimated maximum value of f_i(\vec{x}), which is estimated as:

f_{max}^i = f_i(\vec{x}_{max} \cdot \vec{M}_i).

In mDCBG, \vec{M} is initialized using the same transformation matrix construction algorithm as in mDRPBG and then remains unchanged. Every ∆e evaluations, \vec{H} and \vec{O} change in the same way as the parameters \vec{H} and \vec{X} in mDRPBG. In this study, we select only the Sphere function as the base function, which takes the form

f(\vec{x}) = \sum_{i=1}^{n} x_i^2, \quad x_i \in [-100, 100].   (34)
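The weight computation above can be followed step by step in the Python sketch below, which evaluates the mDCBG weights w_i for a candidate x given the current optima O_i (a minimal sketch; the rotation matrices and the height dynamics are left out).

```python
import numpy as np

def composition_weights(x, O, sigma):
    """Compute the normalized weights w_i of Eq. 33 for a point x.
    O: (m, n) current optima of the base functions, sigma: (m,) range factors."""
    n = x.size
    d2 = np.sum((x - O) ** 2, axis=1)                     # sum_k (x_k - o_k^i)^2
    w = np.exp(-np.sqrt(d2 / (2.0 * n * sigma ** 2)))
    w_max = w.max()
    w = np.where(w == w_max, w, w * (1.0 - w_max ** 10))  # suppress non-dominant parts
    return w / w.sum()                                    # normalize to sum to 1

O = np.array([[0.0, 0.0], [3.0, 3.0], [-2.0, 1.0]])       # m = 3 base-function optima
print(composition_weights(np.array([0.5, 0.2]), O, sigma=np.ones(3)))
```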
VII. EMPIRICAL RESULTS

In this section, the first goal is to investigate the performance of POAs designed for TMO in the context of ROOT. We want to see whether they can be easily adapted to ROOT and, if so, what performance they can achieve. Second, we evaluate the effectiveness of the proposed framework for ROOT and compare it with that of existing TMO EAs. Finally, we empirically analyze the influence of each component of the framework on performance, so that we can provide suggestions on how to select proper approximators and predictors. Before presenting the empirical results, we describe the parameter settings and the performance indicators used in the experiments.

A. Parameter Setting

We have a built-in random number generator for each test problem, and to generate test instances we always set the generator seed to 1. For each problem, we generate instances of dimensions 2, 5 and 10, so we have 9 test instances in total. The parameters height\_severity and width\_severity are initialized by drawing random numbers within [1, 10]^m and [0.1, 1]^m, respectively, and then fixed. The change frequency is measured in function evaluations (FEs). All investigated algorithms were run 25 times on each instance, each run with a different initial population. The parameter settings of the test problems are summarized in Table I, and all parameters of the PSOs studied in this work are listed in Table II.

TABLE I: PARAMETER SETTINGS OF TEST PROBLEMS

Parameters               TP1              TP2              TP3
Number of changes        50               50               50
∆e                       2500 FEs         5000 FEs         10000 FEs
m                        5                5                5
n                        2, 5, 10         2, 5, 10         2, 5, 10
Search range             [0, 50]^n        [−5, 5]^n        [−5, 5]^n
Height range             [30, 70]         [10, 100]        [10, 100]
Initial height           50               50               50
Width range              [1, 12]          [1, 10]          N/A
Initial width            6                5                N/A
Rotation angle range     N/A              [−π/6, π/6]      [−π/6, π/6]
Initial angle            N/A              0                0
height severity range    [1, 10]          [1, 10]          [1, 10]
width severity range     [0.1, 1]         [0.1, 1]         N/A
θ severity               N/A              0.1              0.1
Other parameters         λ = 0, s = 1.0   N/A              C = 2000, σ_i = 1.0 (i = 1, 2, ..., m)
δ_drop                   0.2              0.2              0.2

TABLE II: PARAMETER SETTINGS OF PSOS

PSO: c1 = c2 = 2.05, χ = 0.729844; swarm size µ = 50
Parameters   TP1   TP2   TP3
V_MAX        50    10    10

SPSO:
Parameters   TP1   TP2   TP3
r_s          15    3     3
P_MAX        10    10    10

memPSO (premature convergence threshold: 0.01; loss of diversity threshold: 0.001):
Parameters   TP1   TP2   TP3
r_s          15    3     3
P_MAX        10    10    10
δ_pre        15    30    50

To examine the influence of the time interval p+q on the performance of the framework, for (p+q)-PSO we vary the value of p+q from 1 to 10, and for each p+q value we vary the percentage of p from 0 to 100%. For the RBF model used in the instantiation, we set n_c = (D+1)(D+2)/2 [14], where D is the dimension of the input vector. When regression is used, the number of training data is set to more than twice the model size (i.e., n_c for RBF and ψ for AR). Note that for each investigated algorithm on each test problem, we assume that environmental changes can be detected. Consequently, we focus on the algorithms' capability of addressing ROOT problems.

B. Performance Evaluation

Let S = <S_1, S_2, \ldots, S_k> be a sequence of solutions found by an algorithm for the problem defined in Eq. 7 (<f(\vec{x}, \vec{\alpha}_1), f(\vec{x}, \vec{\alpha}_2), \ldots, f(\vec{x}, \vec{\alpha}_l)>). A solution is regarded as robust if it is used for at least two consecutive problem instances, and thus we have 1 ≤ k ≤ l. We first define an error measure ε_i and a sensitivity measure D_i for a single solution S_i as follows:

\epsilon_i = \frac{1}{N_i} \sum_{j=l_0}^{l_0+N_i-1} |opt_j - f(S_i, \vec{\alpha}_j)|,   (35)

D_i = \sqrt{\frac{1}{N_i - 1} \sum_{j=l_0}^{l_0+N_i-1} \big(|opt_j - f(S_i, \vec{\alpha}_j)| - \epsilon_i\big)^2}.   (36)

In Eqs. 35 and 36, N_i is the number of different environments in which S_i is used, f(\vec{x}, \vec{\alpha}_{l_0}) is the first environment in which S_i is used, f(S_i, \vec{\alpha}_j) is the fitness value of solution S_i in the j-th environment, and opt_j is the fitness value of the true optimum of the j-th problem instance. Note that in Eq. 36, if the denominator were N_i, then D_i would be zero for solutions that are not robust, even though these solutions are not good in the sense of ROOT. Therefore, we set the denominator to N_i − 1 to avoid this situation. This also means that D_i is meaningful only for solutions that are robust over time. Now we denote the sequence of all robust solutions by S_r = <S_{r_1}, S_{r_2}, \ldots, S_{r_k}> (1 ≤ r_k ≤ k), and we denote the index sequence <r_1, r_2, \ldots, r_k> by R.
With the two measures above, we can resort to the following three measures to evaluate a solution sequence found by an algorithm.

• ρ – the robustness rate of the solution sequence:

\rho = 1 - \frac{k-1}{l-1},   (37)

where we add −1 to the numerator and the denominator because we need at least one solution for a problem, so ρ ∈ [0, 1]. When k = 1, the ideal situation occurs and ρ = 1. When k = l, no robust solutions have been found and ρ = 0.

• E_avg – the average error of the solution sequence:

E_{avg} = \frac{1}{k} \sum_{j=1}^{k} \epsilon_j.   (38)

E_avg measures how close the sequence of solutions is to the true optima over time.

• D_avg – the average sensitivity of the robust solution sequence:

D_{avg} = \frac{1}{|R|} \sum_{j \in R} D_j,   (39)

where |R| is the length of the sequence R. D_avg measures the general sensitivity of the sequence of robust solutions to the environmental changes.

From the above definitions, it can be seen that the goal of TMO is to minimize E_avg, in which case ρ = 0 and D_avg does not exist. In contrast, the primary goal of the ROOT approach to DOPs pursued in this paper is to maximize ρ, i.e., to find as many robust solutions over time as possible. In addition, we also consider the success rate ξ over repeated runs of an algorithm on a dynamic problem. A run is successful if at least one robust solution is found by the algorithm.

The final solution sequence is obtained as follows. The first solution for the first environment is the best solution found so far by the algorithm. For the j-th environment, where 2 ≤ j ≤ l, if the solution used in the (j−1)-th environment is S_i (1 ≤ i ≤ j−1) and if

\frac{|f(S_i, \vec{\alpha}_j) - opt_j|}{opt_j} \leq \delta_{drop},   (40)

then solution S_i will still be used for the j-th environment. Otherwise, the best solution found so far by the algorithm for the j-th environment will be adopted as the solution S_{i+1}. The parameter δ_drop denotes the maximal tolerated performance degradation of a solution when the environment changes.
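The evaluation protocol above (Eqs. 35-40) can be summarized in code; the Python sketch below builds the final solution sequence with the δ_drop rule of Eq. 40 and computes ρ, E_avg and D_avg. It is a minimal sketch assuming the per-environment fitness values and the true optima are given as arrays with opt_j > 0, as in the maximization benchmarks used here.

```python
import numpy as np

def evaluate_sequence(fit, opt, delta_drop=0.2):
    """fit[i][j]: fitness of the solution adopted at environment i, evaluated
    in environment j; opt[j]: true optimum of environment j. Builds the
    solution sequence via Eq. 40, then returns (rho, E_avg, D_avg)."""
    l = len(opt)
    spans, start = [], 0       # spans: environments each kept solution served
    for j in range(1, l):
        # keep the current solution while relative degradation <= delta_drop
        if abs(fit[start][j] - opt[j]) / opt[j] > delta_drop:
            spans.append(j - start)
            start = j
    spans.append(l - start)
    k = len(spans)
    rho = 1.0 - (k - 1) / (l - 1)                                    # Eq. 37
    errors, sens, s = [], [], 0
    for N in spans:
        e = np.abs(np.array([opt[j] - fit[s][j] for j in range(s, s + N)]))
        errors.append(e.mean())                                      # Eq. 35
        if N >= 2:   # D_i is meaningful only for robust solutions (Eq. 36)
            sens.append(np.sqrt(((e - e.mean()) ** 2).sum() / (N - 1)))
        s += N
    E_avg = float(np.mean(errors))                                   # Eq. 38
    D_avg = float(np.mean(sens)) if sens else float("nan")           # Eq. 39
    return rho, E_avg, D_avg
```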
C. Simulation Results

To examine the effectiveness of the framework itself, (p+q)-PSO here works on the true past and future fitness values. The experimental results regarding ρ, E_avg and D_avg of the investigated algorithms on the test problems are plotted in Figs. 2-4. In Fig. 4, missing data points and lines indicate that the corresponding algorithms failed to find robust solutions over time. These results are also summarized in Table III, where the resultant success rates of finding robust solutions over time (i.e., ξ) are included and (1+4)-PSO is selected as a representative. In Table III, the values in bold are significant at the 0.05 level using the Wilcoxon rank sum test. To take a closer look at the behaviors of the algorithms, Fig. 5 shows the evolutionary curves with respect to the performance error and the Euclidean distance in the search space between the best solution found so far and the global optimum for the test problems of dimension 5. The plots are based on the data collected from the run with the median performance regarding ρ among 25 runs.

[Fig. 2. The robustness rate ρ of the solution sequence obtained by varying the (p+q) configuration and the length of the optimization interval (panels TP1D2–TP3D10; x-axis: p/(p+q); curves: p+q = 1, ..., 10, rPSO, memPSO, SPSO).]

Results in Table III clearly suggest that (1+4)-PSO succeeded in finding robust solutions over time in all instances of the test problems and outperformed the traditional TMO algorithms with respect to both the success rate and the robustness rate. This is expected, because (1+4)-PSO explicitly takes robustness in the time domain into account. It is also confirmed by Fig. 2, which suggests that most (p+q)-PSO algorithms achieved better robustness rates than the TMO algorithms.

1) Traditional TMO Algorithms for DOPs: As can be seen in Table III, the TMO algorithms succeeded in finding robust solutions over time on the 5-dimensional instances of TP1, TP2 and TP3. This confirms the expectation that some techniques, such as reusing historical information and maintaining diversity for TMO, are beneficial to ROOT as well. Fig. 2 clearly shows that SPSO obtained the best performance among the three algorithms. This indicates that diversity plays an important role in both TMO and ROOT, because by maintaining a certain level of diversity the optimizer can retain its search ability. The performance of memPSO is similar to that of rPSO, since the historical information stored in the memory was used for tracking the global optima rather than for finding robust solutions over time. Additionally, it can be observed from Fig. 3 that the TMO algorithms achieved better E_avg results than most of the (p+q)-PSO algorithms. This is natural, since minimizing the average error between the obtained solutions and the true optima is consistent with the goal of TMO.

[Fig. 3. The average error E_avg of the solution sequence obtained by varying the (p+q) configuration and the length of the optimization interval (panels TP1D2–TP3D10; x-axis: p/(p+q)).]
TABLE III: OVERALL PERFORMANCE OF THE INVESTIGATED ALGORITHMS (mean(std); "∼" indicates that D_avg is undefined because no robust solutions were found)

Dimension 2:
TP   Algorithm   ξ      ρ            Eavg            Davg
TP1  rPSO        1.00   0.73(0.04)   7.37(1.12)      4.27(0.48)
TP1  memPSO      1.00   0.75(0.03)   7.58(1.11)      4.15(0.44)
TP1  SPSO        1.00   0.77(0.00)   5.29(0.12)      4.79(0.08)
TP1  (1+4)-PSO   1.00   0.80(0.01)   6.20(0.11)      3.92(0.16)
TP2  rPSO        1.00   0.04(0.00)   2.17(0.94)      7.00(0.02)
TP2  memPSO      0.96   0.04(0.01)   2.58(1.48)      7.00(0.02)
TP2  SPSO        1.00   0.04(0.00)   5.96(0.41)      5.88(1.01)
TP2  (1+4)-PSO   1.00   0.27(0.02)   36.23(1.66)     35.41(1.47)
TP3  rPSO        0.00   0.00(0.00)   7.35(2.31)      ∼
TP3  memPSO      0.00   0.00(0.00)   8.75(1.55)      ∼
TP3  SPSO        0.00   0.00(0.00)   11.41(0.81)     ∼
TP3  (1+4)-PSO   0.28   0.01(0.02)   190.00(5.44)    242.80(42.66)

Dimension 5:
TP   Algorithm   ξ      ρ            Eavg            Davg
TP1  rPSO        1.00   0.65(0.00)   13.62(0.06)     3.33(0.03)
TP1  memPSO      1.00   0.65(0.00)   13.62(0.06)     3.33(0.02)
TP1  SPSO        1.00   0.67(0.04)   10.97(2.11)     3.74(0.55)
TP1  (1+4)-PSO   1.00   0.74(0.05)   13.44(2.76)     4.72(0.78)
TP2  rPSO        0.92   0.03(0.02)   10.07(7.12)     6.95(2.33)
TP2  memPSO      0.92   0.03(0.02)   12.23(5.85)     7.15(2.24)
TP2  SPSO        0.92   0.04(0.02)   15.41(1.18)     2.63(1.55)
TP2  (1+4)-PSO   1.00   0.18(0.03)   44.06(1.56)     20.82(4.98)
TP3  rPSO        0.48   0.01(0.01)   21.74(10.28)    1.45(0.62)
TP3  memPSO      0.68   0.01(0.01)   21.41(7.03)     1.72(0.82)
TP3  SPSO        0.88   0.02(0.01)   5.68(0.91)      2.87(0.91)
TP3  (1+4)-PSO   1.00   0.08(0.02)   178.89(10.64)   160.08(16.41)

Dimension 10:
TP   Algorithm   ξ      ρ            Eavg            Davg
TP1  rPSO        1.00   0.10(0.09)   26.06(2.65)     9.84(1.44)
TP1  memPSO      1.00   0.10(0.09)   26.02(2.66)     9.81(1.43)
TP1  SPSO        1.00   0.50(0.04)   18.11(1.97)     2.23(0.19)
TP1  (1+4)-PSO   1.00   0.59(0.04)   18.45(1.86)     4.08(0.84)
TP2  rPSO        0.32   0.01(0.01)   22.16(8.91)     5.84(0.76)
TP2  memPSO      0.40   0.01(0.01)   18.85(3.88)     5.78(0.58)
TP2  SPSO        0.20   0.00(0.01)   22.21(1.50)     1.58(2.13)
TP2  (1+4)-PSO   1.00   0.13(0.03)   43.17(1.35)     20.82(3.47)
TP3  rPSO        0.00   0.00(0.00)   58.79(12.52)    ∼
TP3  memPSO      0.00   0.00(0.00)   53.31(9.53)     ∼
TP3  SPSO        0.00   0.00(0.00)   40.51(4.80)     ∼
TP3  (1+4)-PSO   0.56   0.02(0.02)   244.05(13.60)   200.83(16.09)

According to Fig. 4, the TMO algorithms also showed better D_avg performance than most of the (p+q)-PSO algorithms on most TP1 and TP2 test problems. The reason is that, although these algorithms aim to track the moving optima, the optima change only slightly during some time intervals. As a result, the optima found by the TMO algorithms are also robust solutions over time during those intervals. Evidence supporting this reasoning can be found in Fig. 5. For example, on problem "TP1D5", from the 500-th to the 1000-th generation the optima changed very slowly, so the optimal solution found at the 500-th generation could be used during this interval with acceptable performance. To summarize, TMO algorithms that do not explicitly consider robustness in the time domain can find robust solutions over time only if the global moving optimum happens to be a robust optimal solution over time.

2) The (p+q)-PSO Framework: In this section, we mainly investigate the influence of the configuration (i.e., the values of p and q) of (p+q)-PSO on its performance. From the results plotted in Figs. 2-5, we can make some observations by comparing the framework's performance under different configurations.

[Fig. 4. The average standard deviation D_avg of the solution sequence obtained by varying the (p+q) configuration and the length of the optimization interval (panels TP1D2–TP3D10; x-axis: p/(p+q)).]
First, the robustness rate does not always increase as the length of the time interval increases. Generally, there is an optimal length of the time interval for a given problem. As can be seen from Fig. 2, the optimal length ranges from 4 to 8 on TP1, from 1 to 5 on TP2, and from 1 to 3 on TP3. The optimal value of p+q is problem-dependent. According to Fig. 5, TP2 and TP3 change much more severely than TP1, so the likelihood of using one solution over a long time interval on TP2 and TP3 is smaller than on TP1. This is why larger values of p+q favored TP1, while smaller values are more appropriate for TP2 and TP3.

Second, the following observations can be made regarding the ratio of p to p+q. As Fig. 2 illustrates, for the 2- and 5-dimensional instances of TP1 and TP2 and for TP3D5, increasing p first improved the performance; however, further increasing p was detrimental. According to Fig. 2, for TP1 the performance was even worse than that of the TMO algorithms when p > q. For TP1D10, TP2D10, and TP3 of dimensions 2 and 10, increasing p generally degraded the performance. These phenomena can be explained as follows. If there are some "consistencies" in the dynamics during some periods of time, past values are beneficial; when p is larger than the length of such a period, the outdated information misleads the search and hence degrades the performance. "Consistency" in the dynamics here means that successive environments share some common characteristics, or in other words, the environment does not change severely. Taking TP3D5 as an example, it can be seen from Fig. 5 that there were some periods, such as the first 1000 generations and the period around the 4000-th generation, in which the degrees of change were mild. This is why some "sweet spots" for the value of p existed on TP3D5.

Third, it can be seen from Figs. 3 and 4 that some algorithms perform better regarding E_avg but worse regarding D_avg. These results verify the analysis in Section III-B that, in some situations, E_avg and D_avg are two conflicting goals. In fact, from Figs. 2-4, we find that ρ, E_avg and D_avg cannot be simultaneously optimized. For instance, when ρ = 0 there is no robust solution over time and ROOT degenerates to TMO; consequently, the best E_avg performance is obtained.

[Fig. 5. The evolutionary curves with respect to the performance error and the Euclidean distance in the search space between the best solution found so far and the global optimum. The plots show the median performance with respect to the robustness rate ρ among 25 runs for a dimension of 5 (panels: TP1D5, TP2D5, TP3D5).]
When ρ = 1, only one solution is used for all environments, so the worst D_avg performance is obtained. To illustrate this, we have plotted all the available results of (p+q)-PSO in the E_avg–D_avg–ρ space in Fig. 1. From this figure, we can clearly see that among the plotted points, no single point outperforms the others with respect to all three measures simultaneously.

VIII. CONCLUSIONS

In contrast to the traditional approaches to solving DOPs, where the target is to track the moving global optimum, we aimed at finding robust optimal solutions over time, which opens up a new perspective on solving DOPs, referred to as robust optimization over time (ROOT). We proposed a generic framework as well as a robustness measure in the time domain for ROOT. An instantiation of the framework was also proposed, where a PSO algorithm is employed as the optimizer, an RBF model is adopted for estimating past fitness, and an AR model for predicting future fitness. The instantiation was empirically evaluated on three benchmarks for ROOT in comparison with representative TMO approaches.

From the simulation results, the following conclusions can be drawn. First, re-using historical information and maintaining diversity, as proposed for TMO approaches, are also beneficial for ROOT approaches. However, without explicitly considering robustness in the time domain, TMO approaches are in general unable to find robust solutions over time. Second, the proposed framework can reliably find robust solutions over time. For the cases where TMO approaches can also find robust solutions over time, the proposed framework achieves significantly better performance in terms of the robustness rate. Third, our proposed framework can also track moving optima, although the moving optima in this case will be robust solutions over time. In fact, ROOT provides an effective way of addressing DOPs with different changing behaviors, e.g., shifting and rotation. We also found that there are conflicting objectives in solving ROOT problems. More specifically, the robustness rate, the distance to the true optima and the sensitivity of solutions to the changes cannot be optimized simultaneously. Finally, for better performance of the framework, a low standard deviation of the estimator's estimation error is essential. For the estimators used in the instantiation studied in this paper, the RBF approximators failed to capture the characteristics of past environments based only on the historical data collected online. However, the simple AR and GRBF predictors performed the prediction task quite well on some problem instances.

As discussed in Section III, the ROOT approach to DOPs typically has two objectives to take into account, namely, maximizing the average performance over time and minimizing the performance variation. These two objectives can be consistent or conflicting, depending on the nature of the problem. In the former case, ROOT can be analyzed and addressed using a single-objective optimization approach [58]. Our future work will address the conflicting objectives in ROOT by employing a multi-objective approach [59], [60]. In addition, there is much room for improvement in constructing approximation models for estimating past fitness values. Furthermore, an in-depth analysis of the proposed framework is necessary, including more intensive experimental evaluations on a larger number of benchmark functions.
ACKNOWLEDGMENT

This work is partly supported by an EPSRC grant (no. EP/E058884/1) on "Evolutionary Algorithms for Dynamic Optimisation Problems: Design, Analysis and Applications", the European Union 7th Framework Program under Grant No. 247619, a grant from Honda Research Institute Europe, two National Natural Science Foundation Grants (No. 61028009 and No. 61175065), and the National Natural Science Foundation of Anhui Province (No. 1108085J16).

REFERENCES

[1] J. Branke, Evolutionary Optimization in Dynamic Environments. Norwell, MA: Kluwer, 2002.
[2] Y. Jin and J. Branke, "Evolutionary optimization in uncertain environments – A survey," IEEE Transactions on Evolutionary Computation, vol. 9, no. 3, pp. 303–317, 2005.
[3] T. Weise, Global Optimization Algorithms – Theory and Application, second edition, 2009. Online available at: http://www.itweise.de/projects/book.pdf.
[4] X. Yu, K. Tang, T. Chen, and X. Yao, "Empirical analysis of evolutionary algorithms with immigrants schemes for dynamic optimization," Memetic Computing, vol. 1, no. 1, pp. 3–24, 2009.
[5] J. Brest, A. Zamuda, B. Bošković, M. S. Maučec, and V. Žumer, "Dynamic optimization using self-adaptive differential evolution," in Proceedings of the 2009 IEEE Congress on Evolutionary Computation, pp. 415–422, 2009.
[6] E. L. Yu and P. N. Suganthan, "Evolutionary programming with ensemble of explicit memories for dynamic optimization," in Proceedings of the 2009 IEEE Congress on Evolutionary Computation, pp. 431–438, 2009.
[7] H. K. Singh, A. Isaacs, T. T. Nguyen, T. Ray, and X. Yao, "Performance of infeasibility driven evolutionary algorithm (IDEA) on constrained dynamic single objective optimization problems," in Proceedings of the 2009 IEEE Congress on Evolutionary Computation, pp. 3127–3134, 2009.
[8] S. Yang and C. Li, "A clustering particle swarm optimizer for locating and tracking multiple optima in dynamic environments," IEEE Transactions on Evolutionary Computation, vol. 14, no. 6, pp. 959–974, 2010.
[9] X. Yu, Y. Jin, K. Tang, and X. Yao, "Robust optimization over time – A new perspective on dynamic optimization problems," in Proceedings of the 2010 IEEE Congress on Evolutionary Computation, pp. 3998–4003, 2010.
[10] H. Handa, "Fitness function for finding out robust solutions on time-varying functions," in Proceedings of the 2006 Genetic and Evolutionary Computation Conference, pp. 1195–1200, 2006.
[11] P. A. N. Bosman, "Learning, anticipation and time-deception in evolutionary online dynamic optimization," in Proceedings of the 2005 Genetic and Evolutionary Computation Conference, pp. 39–47, 2005.
[12] M. J. D. Powell, "The theory of radial basis function approximation in 1990," in Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision Algorithms and Radial Basis Functions, W. Light, Ed., London, U.K.: Oxford Univ. Press, pp. 105–210, 1992.
[13] K.-H. Liang, X. Yao, and C. Newton, "Evolutionary search of approximated n-dimensional landscapes," International Journal of Knowledge-Based Intelligent Engineering Systems, vol. 4, no. 3, pp. 172–183, 2000.
[14] R. G. Regis and C. A. Shoemaker, "Local function approximation in evolutionary algorithms for the optimization of costly functions," IEEE Transactions on Evolutionary Computation, vol. 8, no. 5, pp. 490–505, 2004.
[15] G. E. P. Box, G. M. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, 3rd edition. Upper Saddle River, NJ, USA: Prentice Hall PTR, 1994.
[16] H.-G. Beyer and B. Sendhoff, "Robust optimization – A comprehensive survey," Computer Methods in Applied Mechanics and Engineering, vol. 196, pp. 3190–3218, 2007.
Sendhoff, “Robust optimization – A comprehensive survey,” Computer Methods in Applied Mechanics and Engineering, vol. 196, pp. 3190–3218, 2007. [17] J. Branke, “Creating robust solutions by means of an evolutionary algorithm,” Parallel Problem Solving from Nature–PPSN V, pp. 119– 128, 1998. [18] H. Greiner, “Robust optical coating design with evolution strategies,” Applied Optics, vol. 35, no. 28, pp. 5477–5483, 1996. [19] D. Wiesmann, U. Hammel, and T. Back, “Robust design of multilayer optical coatings by means of evolutionary algorithms,” IEEE Transactions on Evolutionary Computation, vol. 2, no. 4, pp. 162–167, 1998. [20] Y. Jin, “Fitness approximation in evolutionary computation - A survey,” In: Proceedings of Genetic and Evolutionary Computation Conference, pp. 1105 – 1112, New York, July 2002 [21] Y. Jin, “Surrogate-assisted evolutionary computation: Recent advances and future challenges,” Swarm and Evolutionary Computation, vol. 1, no. 2, pp. 61 – 70, 2011 [22] I. Paenke, J. Branke, and Y. Jin, “Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 405–420, 2006. [23] X. S. Chen, Y. S. Ong, M. H. Lim and K. C. Tan, “A Multi-Facet Survey on Memetic Computation,” IEEE Transactions on Evolutionary Computation, vol. 15, no. 5, pp. 591 – 607, 2011. [24] Y. S. Ong, M. H. Lim and X. S. Chen, “Research Frontier: Memetic Computation - Past, Present & Future”, IEEE Computational Intelligence Magazine, vol. 5, no. 2, pp. 24 – 36, 2010. [25] Y. S. Ong, P. B. Nair, and K. Y. Lum, “Max-min surrogate-assisted evolutionary algorithm for robust design,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 392–404, 2006. [26] D. Lim, Y. S. Ong, Y. Jin, B. Sendhoff, and B. S. Lee. “Inverse multiobjective robust evolutionary optimization”. Genetic Programming and Evolvable Machines. vol. 7, no. 4, 383–404, 2006 [27] J. Branke, “Memory enhanced evolutionary algorithms for changing optimization problems,” in Proceedings of the 1999 IEEE Congress on Evolutionary Computation, vol. 3, pp. 1875–1882, 1999. [28] M. N. Le, Y. S. Ong, S. Menzel, Y. Jin, and B. Sendhoff, “Evolution by adapting surrogates,” Evolutionary Computation, 2012 (accepted) [29] M. N. Le, Y. S. Ong, Y. Jin and B. Sendhoff, “A unified framework for symbiosis of evolutionary mechanisms with application to water clusters potential model design”, IEEE Computational Intelligence Magazine, vol. 7, no. 1, pp. 20 – 35, 2012 [30] M. N. Le, Y. S. Ong, Y. Jin and B. Sendhoff, “Lamarckian memetic algorithms: local optimum and connectivity structure analysis,” Memetic Computing Journal, vol. 1, no. 3, pp. 175 –190, 2009. [31] S. Yang, “Explicit memory schemes for evolutionary algorithms in dynamic environments,” In S. Yang, Y.-S. Ong, and Y. Jin (eds.), Evolutionary Computation in Dynamic and Uncertain Environments, Chapter 1, pp. 3 – 28, Springer-Verlag Berlin Heidelberg, 2007. [32] D. E. Goldberg and R. E. Smith, “Nonstationary function optimization using genetic algorithms with dominance and diploidy,” in Genetic Algorithms, J. J. Grefenstette, Ed: Lawrence Erlbaum, pp. 59–68, 1987. [33] B. S. Hadad and C. F. Eick et al., “Supporting polyploidy in genetic algorithms using dominance vectors,” in Evolutionary Programming. ser. LNCS, P. J. Angeline et al., Eds. Berlin, Germany: Springer-Verlag, vol. 1213, pp. 223–234, 1997. [34] J. Lewis, E. Hart, and G. 
Ritchie, “A comparison of dominance mechanisms and simple mutation on nonstationary problems,” in Parallel Problem Solving from Nature. ser. LNCS, A. E. Eiben, T. Bäck, M. Schoenauer, and H.-P. Schwefel, Eds. Berlin, Germany: SpringerVerlag, vol. 1498, pp. 139–148, 1998. [35] C. Ryan, “Diploidy without dominance,” in Proceedings of 3rd Nordic Workshop Genetic Algorithms, J. T. Alander, Ed., pp. 63–70, 1997. [36] H. G. Cobb, “An investigation into the use of hypermutation as an adaptive operator in genetic algorithms having continuous, timedependent nonstationary environments,” Naval Res. Lab., Washington, DC, Tech. Rep. AIC-90-001, 1990. [37] Y. Jin and B. Sendhoff. “Constructing dynamic test problems using the multi-objective optimization concept,” In: Applications of Evolutionary Computing. LNCS 3005, pp.525-536, Springer, 2004 [38] S. Yang and R. Tinós R, “A hybrid immigrants scheme for genetic algorithms in dynamic environments,”, International Journal of Automation and Computing, vol. 4, no. 3, pp. 243–254, 2007. [39] X, Yu, K. Tang, and X. Yao, “An immigrants scheme based on environmental information for genetic algorithms in changing environments,” in Proceedings of the 2008 IEEE Congress on Evolutionary Computation, pp. 1141–1147, 2008. [40] J. Branke, T. Kaußler, C. Schmidt, and H. Schmeck, “A multipopulation approach to dynamic optimization problems,” in Adaptive Computing in Design and Manufacturing 2000, ser. LNCS. Berlin, Germany: Springer-Verlag, 2000. [41] R. K. Ursem, “Multinational GA optimization techniques in dynamic environments,” in Proceedings of Genetic and Evolutionary Computation Conference, D.Whitley et al., Eds., pp. 19–26, 2000. [42] J. Kennedy and R. C. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, 1995. [43] J. Kennedy and R. C. Eberhart, Swarm Intelligence. San Fransisco, CA, US: Morgan Kaufmann, 2001. [44] T. Simpson, T. Mauery, J. Korte, and F. Mistree, “Comparison of response surface and Kriging models for multidisciplinary design optimization,” Technical Report 98-4755, AIAA, 1998. [45] Y. Jin, “A comprehensive survey of fitness approximation in evolutionary computation,” Soft Computing, vol. 9, no. 1, pp. 3–12, 2005. [46] N. Sapankevych and R. Sankar, “Time series prediction using support vector machines: A survey,” IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 24–38, 2009. [47] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 281–297, 1967. [48] K. Q. Ye, W. Li, and A. Sudjianto, “Algorithmic construction of optimal symmetric Latin hypercube designs,” Journal of Statistical Planning and Inference, vol. 90, pp. 145–159, 2000. [49] S. Chen, C. F. N. Cowan, and P. M. Grant, “Orthogonal least squares learning algorithm for radial basis function networks,” IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302–309, 1991. [50] E. S. Chng, S. Chen, and B. Mulgrew, “Gradient radial basis function networks for nonlinear and nonstationary time series prediction,” IEEE Transactions on Neural Networks, vol. 7, no. 1, pp. 190–194, 1996. [51] C. A. R. Hoare, “Quicksort,” Computer Journal, vol. 5, no. 1, pp. 10–15, 1962. [52] G. H. Goloub and C. F. van Loan. Matrix Computations, 3rd ed. Baltimore, MD: The John Hopkins Press, 1996. [53] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. 
Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992. [54] D. Parrott and X. Li, “Locating and tracking multiple dynamic optima by a particle swarm model using speciation,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 440–458, 2006. [55] M. Clerc and J. Kenndy, “The particle swarm – explosion, stability, and convergence in a multidimensional complex space,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, pp. 58–73, 2002. [56] C. Li, S. Yang, T. T. Nguyen, E. L. Yu, X. Yao, Y. Jin, H.-G. Beyer, and P. N. Suganthan, “Benchmark generator for CEC’2009 competition on Dynamic Optimization,” Technical Report 2008, Department of Computer Science, University of Leicester, U.K., 2008. [57] R. Salomon, “Reevaluating genetic algorithm performance under coordinate rotation of benchmark functions: A survey of some theoretical and practical aspects of genetic algorithms,” BioSystems, vol. 39, no. 3, pp. 263–278, 1996. [58] H. Fu, B. Sendhoff, K. Tang and X. Yao. “Characterizing Environmental Changes in Robust Optimization Over Time,” Congress on Evolutionary Computation, June 2012. [59] Y. Jin and B. Sendhoff, “Trade-off between performance and robustness: An evolutionary multiobjective approach,” In: Proceedings of Second International Conference on Evolutionary Multi-criteria Optimization. LNCS 2632, Springer, pp. 237 – 251, Faro, April 2003 [60] Y. Jin and B. Sendhoff, “A systems approach to evolutionary multiobjective structural optimization and beyond,” IEEE Computational Intelligence Magazine, vol. 4, no. 3, pp. 62 – 76, 2009.