Soft Computing. DOI 10.1007/s00500-015-1878-z

METHODOLOGIES AND APPLICATION

Analysing and characterising optimization problems using length scale

Rachael Morgan · Marcus Gallagher
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane 4072, Australia

© Springer-Verlag Berlin Heidelberg 2015
Communicated by V. Loia.

Abstract
Analysis of optimization problem landscapes is fundamental in the understanding and characterisation of problems and the subsequent practical performance of algorithms. In this paper, a general framework is developed for characterising black-box optimization problems based on length scale, which measures the change in objective function with respect to the distance between candidate solution pairs. Both discrete and continuous problems can be analysed using the framework; however, in this paper, we focus on continuous optimization. Length scale analysis aims to efficiently and effectively utilise the information available in black-box optimization. Analytical properties regarding length scale are discussed and illustrated using simple example problems. A rigorous sampling methodology is developed and demonstrated to improve upon current practice. The framework is applied to the black-box optimization benchmarking problem set, and shows greater ability to discriminate between the problems in comparison to seven well-known landscape analysis techniques. Dimensionality reduction and clustering techniques are applied comparatively to an ensemble of the seven techniques and to the length scale information to gain insight into the relationships between problems. A fundamental summary of length scale information is an estimate of its probability density function, which is shown to capture salient structural characteristics of the benchmark problems.

Keywords Length scale · Fitness landscape analysis · Black-box optimization · Search space diagnostics

1 Introduction

The black-box optimization problem setting has been the focus of a large amount of research in evolutionary computation and other fields. Theoretically, it is well known that under mild conditions, all algorithms perform equally on average across all possible problems (Macready and Wolpert 1996). In stark contrast, thousands of algorithms have been proposed in the literature and have shown good performance on a range of benchmark and other problems. Indeed, it would be expected that a hill-descending/local search algorithm would easily outperform a hill-climbing algorithm on any reasonable set of minimization problems. Here, the definition of reasonable is critical; it has been proven that the "No Free Lunch" scenario does not hold if mild conditions (e.g. smoothness) are imposed on the space of optimization problems (Whitley and Watson 2005). Therefore, understanding the relationship between optimization algorithms and problems of interest emerges as a key research question for both theory and practice. Well-defined and measurable problem properties and characteristics are essential components in facilitating this understanding.

There have been many attempts to characterise or measure properties of optimization problems in the literature. In mathematical programming, key assumptions are made regarding the objective function (e.g. convexity, differentiability) and algorithms are developed to take full advantage of these properties (Fletcher 1987).
By doing so, the algorithms implicitly assume a "model" of the landscape. Since their behaviour on the model problem can often be understood theoretically, their performance on real-world/black-box problems can be viewed as a reflection of how closely the problem landscape matches the problem model. In metamodel or surrogate-based optimization (known as fitness approximation in evolutionary computation), an explicit model is built (e.g. via regression) from a sample of solutions evaluated using the true objective function (assumed to be expensive) (Jin 2005; Forrester et al. 2008; Bartz-Beielstein et al. 2010). Search is then performed more economically on the surrogate model. If the model matches the problem, good solutions on the model should correspond to good solutions on the true objective function. In metaheuristics, researchers have developed practical fitness landscape analysis techniques and derived other metrics for measuring problem "difficulty". As a longer-term goal, if problems can be characterised and robust problem metrics developed, then this information could be used to partially or fully automate the algorithm selection problem (Rice 1976; Smith-Miles 2008).

In this paper, we propose a new approach to analysing and characterising black-box optimization problems. The essence of the work is remarkably simple: the notion of length scale as the observed change in the objective function with respect to the distance between two candidate solutions. In contrast to previous research, our work builds on efficiently and effectively utilising the information available in the black-box setting, rather than deriving features based on specific characteristics of algorithms or conjecture about what constitute important landscape properties.

Section 2 gives an overview of relevant literature on optimization problem analysis. Since our work is focused on (but is not limited to) continuous problems, we review and discuss the key issues in this setting (extensive recent reviews for the discrete case are cited). Section 3 formally introduces length scale, discusses its fundamental properties and presents examples. Related work based on similar quantities (i.e. finite differences and Lipschitz constants) but with fundamentally different aims is also discussed. Statistical and information-theoretic methods to analyse, summarise and interpret length scale information are discussed and illustrated on simple example problems in Sect. 4. Considerations for sampling length scales in practice are discussed in Sect. 5, and a methodology is developed and compared to commonly used sampling approaches. In Sect. 6, the length scale framework is applied to the noiseless black-box optimization benchmarking (BBOB) problem set (Hansen et al. 2010) and compared with seven well-known landscape analysis techniques. Conclusions and avenues for future work are discussed in Sect. 7.

This paper is an extension of the work presented in Morgan and Gallagher (2012), with the addition of significant theoretical properties of length scale, a rigorous methodology for sampling length scales, experimental comparisons between landscape analysis techniques in the literature, and powerful extensions to the framework that allow clustering and visualisations of problem similarity.

2 Analysing and characterising optimization problems

It is widely accepted that the comparative performance of algorithms may vary across problems within a particular problem class (Macready and Wolpert 1996).
In light of this, there has been increasing interest in developing features/properties to characterise problem structure and to relate these features to the behaviour and performance of algorithms (Hutter et al. 2006; Smith-Miles 2008).

A considerable amount of problem analysis has been developed for combinatorial optimization problems. Predominantly, the analysis of the travelling salesman problem (TSP) has resulted in a number of features that utilise problem-specific knowledge and have been shown to contribute to problem difficulty (Zhang and Korf 1996; Zhang 2004; Gent and Walsh 1996; Stadler and Schnabl 1992; van Hemert 2005). For example, Cheeseman et al. (1991) and Ridge and Kudenko (2007) show that increasing the standard deviation of the distances between cities for randomly generated TSP instances increases problem difficulty for a number of algorithms. Features such as these are very problem-specific and in most cases cannot be easily transferred to other problem classes. Hence, problem-specific features have been developed for other problem classes such as graph colouring, boolean satisfiability, time-tabling and knapsack problems. Comprehensive reviews of discrete fitness landscape and problem-specific techniques can be found in (Pitzer and Affenzeller 2012; Reidys and Stadler 2002; Smith-Miles and Lopes 2011; Talbi 2009).

Problem-specific features are very insightful and useful; however, they do not allow comparisons between black-box problems. Typically, problem-specific features cannot be applied to different problem classes. No domain knowledge is available in the black-box scenario, therefore analysis should be based on candidate solutions, x, from the feasible search space, S, and their respective objective function values, f(x). The notion of the objective function as a (d-dimensional) landscape or surface defined over S is intuitive and can be found across the literature in optimization and related areas. A discrete fitness landscape is formally defined by f and a graph, G, representing S. The edges in G can be defined by a given move operator and induce a neighbourhood in the space. Topological properties of the fitness landscape can be precisely defined in this framework; for example, a (strict) local optimum is a point x where all neighbouring solutions have worse fitness than f(x). For any discrete or combinatorial problem instance, it is possible to computationally determine whether or not x is a local optimum by exhaustive evaluation of all neighbouring points. However, complete enumeration of the search space is often impractical due to the finite, but very large, number of candidate solutions. Hence, fitness landscape analysis techniques typically employ random, statistical or other sampling methods to examine a set of points of interest (and/or their fitness values) from a landscape. Features based on the landscape metaphor are inherently very general and can also be applied to problems of different classes and/or non-black-box problems.

In a continuous search space, topological landscape features that are conceptually similar to the discrete case can be defined mathematically [see Pitzer and Affenzeller (2012) for a review], but evaluating these features on a real problem instance is problematic.
In contrast to discrete problems, each solution in continuous space has an infinite number of neighbours in theory, and a finite but extremely large number in practice due to the finite-precision representation of floating-point numbers. Hence, discrete problem features that are reliant on neighbourhood information—including autocorrelation (Weinberger 1990), correlation length (Stadler 1996), fitness evolvability portraits (Smith et al. 2002), fitness clouds (Collard et al. 2004) and variants of information content (Vassilev et al. 2000; Steer et al. 2008)—must introduce additional assumptions and parameters to be used in a continuous space, such as the size and distribution of the neighbourhood, as well as methods for adequately sampling the neighbourhood. Common recommendations and approaches include sampling from a (bounded) uniform neighbourhood distribution, discretising the space (Smith et al. 2002), varying the neighbourhood size (Malan and Engelbrecht 2009) and utilising algorithm trajectories (Muñoz et al. 2012b). However, the validity of these assumptions and their effect on empirical results is not well understood.

Another significant difference between discrete and continuous landscapes is tied to the distance between points in S (using some appropriate metric). For a discrete landscape, the minimum possible pairwise distance will occur between a point and one of its neighbours. There will also be a finite set of possible distance values between all pairs of vertices in G. For a continuous landscape, the minimum distance between points can be made arbitrarily small (in practice, until the limit of precision is reached) and the number of possible distance values is infinite. Consider a problem with binary representation, S = {0, 1}^d. To solve the problem we determine whether each variable xi ∈ x should take the value 0 or 1, and there is no notion of the scale of xi. For a continuous problem, however, finding an appropriate scale for each xi is important (e.g. does f(x) vary in a significant way with changes in xi of order 10³, 10⁻³ or 10⁻³⁰?). Discrete fitness landscape analysis techniques do not capture such information because it is not relevant for the discrete case.

Some researchers have adapted landscape analysis techniques by modifying the distance metrics used. For example, fitness distance correlation (FDC) was originally proposed using Hamming distance (Jones and Forrest 1995); however, it is generally used with Euclidean distance in the continuous setting (Gallagher 2000; Müller and Sbalzarini 2011; Muñoz et al. 2012a). Problem analysis techniques originating in the continuous domain—e.g. dispersion, a measure of the average distance between pairs of high-quality solutions (Lunacek and Whitley 2006)—also typically utilise Euclidean distance. Gallagher calculated FDC for the error surface (i.e. fitness landscape) of the training problem for a multi-layer perceptron neural network (Gallagher 2000, 2001). For the specific learning task considered (student–teacher model), the global optimum was known; however, this would not normally be the case for a neural network training problem. Solutions were sampled within a specified range around the global optimum. Wang and Li calculated the FDC of a continuous NK-landscape model and on some standard test problems (Wang and Li 2008). The FDC of the CEC 2005 benchmark problem set calculated on solutions uniformly sampled from S has also been analysed (Müller and Sbalzarini 2011).
While the results from these papers show some interesting structure and differences between problems, the limitations of FDC that have been noted for discrete problems remain (e.g. Müller and Sbalzarini (2011) conclude that FDC alone is not sufficient for problem design or measuring problem difficulty). How the results are impacted by the numerous experimental/implementation details that are specific to the continuous case (see the discussion above) remains unknown.

Information content and its variants have also been adapted and analysed on a variety of continuous problems. In Malan and Engelbrecht (2009), variants of information content are estimated on seven continuous benchmark problems in 1 and 30 dimensions using an increasing-step random walk, while Muñoz et al. (2012b) used samples generated by instances of a (1+1) covariance matrix adaptation evolution strategy (CMA-ES), particle swarm optimization (PSO) and random search to estimate variants of information content on 2D continuous problems from the BBOB problem set (Hansen et al. 2010).

While many techniques have been adapted from discrete to continuous, some landscape analysis techniques have originated from the continuous problem domain. One such technique is dispersion, which indicates the degree to which high-quality solutions are concentrated/clustered, where quality is determined by sampling n solutions and using truncation selection to retain the fittest tn solutions, where t ∈ (0, 1]. Dispersion has been shown to be a useful metric in studying the performance of algorithms relative to particular problems (and their structure); CMA-ES, hybrid PSO/CMA-ES algorithms, pattern search methods and local search have been analysed on a number of benchmark problems (Lunacek and Whitley 2006; Whitley et al. 2006; Müller et al. 2009). The dispersion values of 2D, 5D, 10D and 20D problems from BBOB'10 have also been used in the feature set of an algorithm prediction model (Muñoz et al. 2012b). Dispersion makes only limited use of the fitness values of solutions via the value of t used to produce the sample of solutions. Pairwise distances between solutions have also been analysed in samples of approximate local minima for multi-layer perceptron training (Gallagher 2000; Gallagher et al. 2002) and for combinatorial problems (Solla et al. 1986).

Mersmann et al. (2011) proposed six feature classes (with 50 features in total) that aim to characterise the structure of continuous problems. The features use various statistical and machine learning techniques to indicate the degree of separability, global structure, modality, global to local optima contrast, basin size homogeneity, search space homogeneity, variable scaling and plateaus in the landscape. The 50 features were used to discriminate between the expert-assigned problem categories in the BBOB problem set.

Regardless of whether techniques originate in the discrete or continuous domain, there are some important issues that limit the ability of the techniques to characterise problems. Arguably the most pertinent is the failure of existing techniques to utilise all information available in the black-box scenario. This stems from the approaches commonly used in the development of new landscape analysis techniques. Designers typically aim to capture a particular problem structure of interest. For example, the density of states (Rosé et al.
1996) estimates a probability distribution of objective function values—no other structures or topological features within the landscape are intended to be captured. With a given structure/topological property of interest, designers then decide how best to capture the structure (e.g. what type of sample should be used, as well as how the sample should be filtered and analysed). Herein lies a major issue; by only considering a certain aspect of landscape structure and by filtering the information available, information that describes (and potentially differentiates) a landscape may be lost. Furthermore, many techniques compress the information gleaned into a single scalar value (e.g. correlation length) representing the structural feature of interest (e.g. ruggedness). Such compression leads to further information loss. Hence, in the context of characterising and differentiating problems, existing problem features (when considered individually) are inherently limited. Features have been combined in an attempt to characterise problems [see Mersmann et al. (2011) or the literature concerning automatic algorithm selection, e.g. Hutter et al. (2006)]; however, the coverage of structural information and interaction between features is complex and remains unclear. Open questions include whether existing techniques encapsulate the same information (leading to redundancy in the ensemble of features), and whether the ensemble of features captures all of the information required to characterise problems. Indeed, selecting a subset of relevant landscape features is analogous to the feature selection problem in machine learning (Guyon and Elisseeff 2003).

In summary, there is a significant need for developing more powerful analysis techniques for black-box continuous optimization problems. Existing methods have a number of limitations due to their (original) discrete formulation and/or their specific foci. In the next section, we propose a new approach towards addressing these concerns.

3 Length scale

A continuous fitness landscape is defined by three components: a search space, S ⊆ R^n; a distance metric, d : S × S → R, that defines a notion of similarity between solutions; and an objective function, f : S → R (Stadler 2002; Pitzer et al. 2012). Together, the search space and distance metric form a metric space. We aim to develop a framework that can be used to study the characteristics of a landscape independent of any particular algorithm, but that can also be potentially used to evaluate algorithm behaviour. We make no assumptions regarding the structure within landscapes, nor do we intend to capture pre-defined landscape properties. Rather, we utilise all information provided in the black-box setting (i.e. the ability to evaluate f at any point x ∈ S) to implicitly capture the landscape structure required to characterise and differentiate problems. The framework can be applied in practice and is amenable to statistical and information-theoretic analysis.

Definition 1 Let xi and xj be two distinct (xi ≠ xj) solutions in S with corresponding objective function values f(xi) and f(xj). The length scale, r, taking values in [0, ∞), is defined as

    r(xi, xj) = |f(xi) − f(xj)| / d(xi, xj).    (1)

The length scale intuitively measures how much the objective function changes with respect to a step between two points in the search space.
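As a concrete illustration of Definition 1, the following minimal Python sketch computes r for a pair of solutions, assuming Euclidean distance for d (the metric used in our experiments); the function name length_scale is illustrative only and is not part of the released implementation referenced later.

```python
import numpy as np

def length_scale(f, xi, xj):
    """Length scale r between two distinct solutions (Eq. 1),
    assuming Euclidean distance as the metric d."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    dist = np.linalg.norm(xi - xj)
    if dist == 0.0:
        raise ValueError("solutions must be distinct")
    return abs(f(xi) - f(xj)) / dist

# Example 1 below: for f(x) = ax, every pair of solutions gives r = |a|
f = lambda x: -3.0 * x[0]
print(length_scale(f, [1.0], [4.5]))  # 3.0
```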
Because r relies on two feasible solutions, it is applicable to both unconstrained and constrained problems (assuming feasible solution pairs can be sampled), as well as bounded and unbounded problems. Length scale is defined as a magnitude over a finite interval in the search space; directional information regarding a step from xi to/from xj is not considered. The development of the length scale framework is motivated by the analysis of continuous problems; however, because r relies solely on S, d and f, length scale analysis of discrete problems is equally possible.

In some cases, a simple expression for all length scales of a landscape can be derived, as illustrated by the following examples. The resulting expressions may be used either to infer structure directly (as in the following examples), or indirectly via further analysis techniques (see Sect. 4.2).

Example 1 1D linear function. Given f(x) = ax where x, a ∈ R and d(x^i, x^j) = |x^i − x^j|, the length scale between x^i and x^j is |a|. Here, r captures the intuition that any step in S will be accompanied by a proportional change in f. The length scale of a d-dimensional neutral/flat landscape (i.e. f(x) = c, c ∈ R) is also a special case of this where r = 0.

For most continuous problems, r will not be a constant over S. In different regions of the space, r will depend on the local topology of the fitness landscape (varying slope, basins of attraction, ridges, saddle points, etc.).

Example 2 1D quadratic (sphere) function. Given f(x) = ax² where x, a ∈ R and d(x^i, x^j) = |x^i − x^j|, the length scale between x^i and x^j is:

    r = |f(x^i) − f(x^j)| / d(x^i, x^j)
      = |a(x^i)² − a(x^j)²| / |x^i − x^j|
      = |a||(x^i − x^j)(x^i + x^j)| / |x^i − x^j|
      = |a||x^i + x^j|

The derived expression for r indicates that the relative change in fitness is dependent on the location of x^i and x^j. For example, r is small only when both |x^i| and |x^j| are small (near the optimum, x = 0) or x^i ≈ −x^j (directly across the basin). This suggests that an algorithm needs to reduce its step size to successfully approach the optimum (e.g. gradient descent can work well on such a problem because the gradient smoothly approaches 0 at the optimum).

The derivation of a simple and intuitive length scale expression is not always feasible, in particular if the problem is black-box and its definition is unknown. Instead, length scale values can be sampled from the landscape, and the sample analysed to gain insight into the problem. A number of suitable analysis techniques are given in Sect. 4, and sampling considerations and methodologies are discussed in Sect. 5.

3.1 Length scale distribution

By considering r as a continuous random variable, the length scales in a landscape can be summarised by their distribution.

Definition 2 The length scale distribution is defined as the probability density function p(r).

The length scale distribution, p(r), describes the probability of observing different length scale values for a given fitness landscape. Consider Example 1 (1D linear function); since r = |a|, p(r) is a Dirac delta function with a spike at r = |a|. The d-dimensional flat function also results in a Dirac delta function, with a spike at r = 0. Now reconsider Example 2 (1D quadratic function), where r = |a||x^i + x^j|. Assuming uniform enumeration of S, r is (up to the scale factor |a|) the absolute value of the sum of two independent, continuous uniform random variables.
Introducing lower (bl) and upper (bu) bounds on S, let Z = X + Y, where X, Y ∼ U[bl, bu] are independent. The distribution of Z is triangular (Grinstead and Snell 2012):

    p_Z(z) = (z − 2bl) / (bu − bl)²,    if 2bl ≤ z ≤ (bl + bu)
             (2bu − z) / (bu − bl)²,    if (bl + bu) < z ≤ 2bu
             0,                         otherwise

Hence, p(r) for Example 2 is a "folded" triangular distribution, the density of |Z|: p(r) = p_Z(r) + p_Z(−r) for r ≥ 0. For simple problems such as these where the problem definition is known, the exact form of p(r) can be derived. Generally, it can be approximated by fitting a probability density estimator, p̂(r), to a finite sample of length scales. Indeed, in Sect. 6.3, the length scale distributions of problems in the BBOB problem set are estimated and analysed in detail.

3.2 Properties of length scale

From a structural perspective, the regularities and structures that fundamentally define a landscape are captured by distance, and are therefore invariant to isometric (distance-preserving) mappings such as translation, rotation and reflection (Borenstein and Poli 2006). We define an equivalence relation (Rosen 1999) (in terms of structure and information) between two problems if there is an isometric mapping between the search spaces as well as between the objective functions. For example, consider f : R^D → R, and let g(x) = f(x − α₁) + α₂ where x, α₁ ∈ R^D and α₂ ∈ R. While both S and f have been translated (by α₁ and α₂, respectively) to define g, the structure of the landscape has not changed, and so from a landscape analysis perspective—and from the point of view of any reasonable optimization algorithm—f and g are equivalent (denoted f ≡ g). Because equivalent problems share the same structure, they should be characterised equivalently, and so it follows that problem characteristics with an invariance to isometric mappings are attractive. This is analogous to algorithm design, where algorithms are invariant to transformed, but equivalent, problems [e.g. the invariance of CMA-ES (Hansen 2000)].

Length scale is the ratio of two distance functions, hence it is invariant to isometric mappings including translation, rotation and reflection. Length scale is sensitive to other, non-distance-preserving transformations, such as shearing and scaling of S. For uniform scaling of S by α ∈ R \ {0}, the length scale values are scaled by a factor of 1/|α|. Likewise, for scaling of f by α, the length scale values are scaled by |α|.

From Definition 1, if two length scales are identical, then the objective functions are equivalent between the pairs of solutions under consideration. That is, given solution pairs Sa = {xi, xj} and Sb = {xp, xq}, and functions fa : Sa → R and fb : Sb → R, if

    |fa(xi) − fa(xj)| / d(xi, xj) = |fb(xp) − fb(xq)| / d(xp, xq)

then fa ≡ fb. Additionally, two non-equivalent functions will always produce non-equal length scales. Hence length scale is a very useful indicator of functional equivalence (over pairs of solutions) in the context of optimization. Assuming enumeration of all solutions (which is possible in the discrete case), two equivalent problems will yield identical length scales, and therefore identical length scale distributions. Thus, p(r) is a unique characteristic of landscapes. Problems with local regions of equivalence will have length scales in common, and so problem similarity can be measured by the degree to which problems share common length scales.
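The invariance of length scale to isometric mappings is easy to check numerically. The following minimal sketch (our own construction; all names are illustrative) rotates and translates a quadratic landscape and confirms that length scales between corresponding solution pairs are unchanged:

```python
import numpy as np

def tour_length_scales(f, X):
    """Length scales between consecutive rows of a sample X."""
    fx = np.array([f(x) for x in X])
    return np.abs(np.diff(fx)) / np.linalg.norm(np.diff(X, axis=0), axis=1)

rng = np.random.default_rng(0)
f = lambda x: x @ x                      # 2D sphere function

# g is f with S rotated by R and translated by t (an isometry)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([2.0, -1.0])
g = lambda y: f(R.T @ (y - t))

X = rng.uniform(-5, 5, size=(1000, 2))   # sample for f
Y = X @ R.T + t                          # corresponding points for g
print(np.allclose(tour_length_scales(f, X), tour_length_scales(g, Y)))  # True
```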
Indeed, in Sect. 6 we exploit this idea and use statistical techniques based on p(r) to measure the similarity of problems within the BBOB problem set.

3.3 Related work

Our definition of r is related to the difference quotient (also known as Newton's quotient, and a generalisation of finite difference techniques) from calculus and numerical analysis. The difference quotient is defined as (f(x + h) − f(x))/h, and can be used to estimate the gradient at a point x, as h → 0 (Overton 2001). Implementations of gradient-based algorithms utilise approximations of this form if the gradient of f is not available. Finite difference methods are widely used in the solution of differential equations, but are not directly related to this paper.

Length scale is also related to the Lipschitz constant, defined as a constant, L ≥ 0, for which the Lipschitz condition |f(xi) − f(xj)| ≤ L‖xi − xj‖, ∀xi, xj, is satisfied (Sergeyev and Kvasov 2010). The ideal Lipschitz constant is the smallest L for which the Lipschitz condition holds, and functions satisfying the condition are known as Lipschitz continuous. The definition of the Lipschitz constant is very similar to length scale; indeed, the supremum of r is equivalent to the ideal Lipschitz constant. However, the differences in the definitions of r and L define vastly different values; valid Lipschitz constants may overestimate the largest rate of change, while length scales will always underestimate or be equal to the largest rate of change. Consider again Example 2: the 1D quadratic function. From above, r = |a||x^i + x^j|. For convenience, we introduce bounds x ∈ [0, 1] and let a = 1 (hence f = x²). Given that x^i, x^j ∈ [0, 1], valid r values are in (0, 2). For bounds [0, 1], Lipschitz constants in [2, ∞) satisfy the Lipschitz condition. Hence, values for r and L are very different for this simple function.

Lipschitzian optimization algorithms are a class of global optimization algorithms that use knowledge of Lipschitz constants to solve Lipschitz continuous problems (Horst and Tuy 1996). A function's Lipschitz constant is not always known a priori (e.g. in the black-box scenario), and so various methods have been developed in order to estimate L from the landscape. Estimating L is itself a global optimization problem, and so heuristics are used to actively search the landscape for L. Heuristics vary between methods, with some calculating what are essentially length scales between solution pairs, and refining the search to solutions with large/promising length scales (Strongin 1973; Wood and Zhang 1996; Beliakov 2006). Because the aim is to accurately estimate L, only the supremum of the length scales in the search data is of interest and retained. The remaining length scale samples are not used in Lipschitzian optimization, and to our knowledge have never been analysed in the context of problem analysis. Consequently, notions and concepts similar to length scale analysis are absent from the Lipschitzian optimization literature. While the techniques used to estimate L illustrate potential approaches for sampling r, the sample is biased due to the heuristics used to search for L.

To summarise, length scale analysis captures information regarding all rates of change, over a wide variety of intervals (distances) on the problem. While the concept of length scale is similar to the difference quotient and the Lipschitz constant, the utilisation of this information at all scales is novel in the context of optimization problem analysis.
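The gap between sampled length scales and valid Lipschitz constants is simple to see numerically. A small sketch (our own construction) samples r for f(x) = x² on [0, 1]: the maximum sampled r approaches, but never exceeds, the ideal Lipschitz constant L = 2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Length scales of f(x) = x^2 on [0, 1]: r = |x_i + x_j| < 2,
# while any valid Lipschitz constant satisfies L >= 2.
x = rng.uniform(0.0, 1.0, size=(100_000, 2))
xi, xj = x[:, 0], x[:, 1]
mask = xi != xj                      # keep distinct pairs only
r = np.abs(xi[mask]**2 - xj[mask]**2) / np.abs(xi[mask] - xj[mask])
print(r.max())                       # close to (but below) 2
```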
4 Analysing and interpreting length scales

Length scale values capture information about the structure of a fitness landscape. In the following, we outline several methods to analyse, summarise and interpret the length scales obtained from a landscape.

4.1 Length scale heatmaps of 1D problems

When S ⊆ R, we can visualise the length scale data using a heatmap, where the axes correspond to candidate solutions x^i and x^j, and the cells on the map are coloured according to the respective r values. To obtain the length scales, the objective function must be evaluated at x^i and x^j, and the values of x^i and x^j can be enumerated within specified bounds and precision. Heatmaps will appear symmetric across the diagonal (drawn from top left to bottom right), which follows from the symmetric definition of r.

To illustrate the richness of length scale information, we construct an artificial 1D function with a variety of different topological features, including neutrality, linear slopes and both convex and concave basins of attraction.

Example 3 1D "mixed-structure" function, defined as follows and shown in Fig. 1a:

    f(x) = −1,                        if 1 ≤ x < 1.5
           50(x − 1.75)² − 4.15,      if 1.5 ≤ x < 2
           5.125x − 11.25,            if 2 ≤ x < 3
           50(x − 3.25)² + 1,         if 3 ≤ x < 3.5
           0.75(x − 4.35)² + 3.583,   if 3.5 ≤ x < 5
           3 log(|x − 5.6|) + 5.5,    if 5 ≤ x < 5.5
           3 log(|x − 5.4|) + 5.5,    if 5.5 ≤ x < 6
           0,                         otherwise

where S = [0, 6].

[Fig. 1 Problem analysis using a heatmap of r. (a) 1D "mixed-structure" function. (b) r enumerated over the "mixed-structure" function]

Figure 1b shows the length scales calculated between pairs of points, x^i, x^j, at increments of 10⁻³ across the search space. We have shaded/coloured the values using a logarithmic scale to better visualise magnitudes of change in r. It is clear from the length scales in Fig. 1b that there are many different structures within the landscape. While it may not be immediately obvious what the structures are, the boundaries between them can be identified by the sudden transitions in shade/colour and pattern. For example, the transition from black to light at x = 1.5 on the diagonal of the figure clearly indicates a change in structure. Likewise, the subtle transition at x = 5 shows a change in structure.

The nature of the structures within the landscape can be further understood by analysing the lower (or upper) diagonal in Fig. 1b. Darker shades represent small changes in objective function values, while lighter shades indicate large jumps in objective function value. Solid blocks of colour, such as between [0, 1], signal a constant change in objective function over a region. The dark lines and curves show steps in the space where f(x^i) ≈ f(x^j), e.g. moving from a point on one side of a basin to a point on the other side of the minimum at the same fitness. Approximate locations of optima, such as x = 3.25, may be located by observing dark lines where the area on either side of the line changes to lighter shades. The visualisation in Fig. 1b also shows how simple structures, such as convex and concave basins of attraction, can combine to give a complex objective function.
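A heatmap of this kind is straightforward to reproduce. The sketch below (our own code, using a coarser grid than the 10⁻³ increments of Fig. 1b to keep memory modest) enumerates r over the "mixed-structure" function:

```python
import numpy as np

def f_mixed(x):
    """The 1D 'mixed-structure' function of Example 3 (S = [0, 6])."""
    if   1.0 <= x < 1.5: return -1.0
    elif 1.5 <= x < 2.0: return 50*(x - 1.75)**2 - 4.15
    elif 2.0 <= x < 3.0: return 5.125*x - 11.25
    elif 3.0 <= x < 3.5: return 50*(x - 3.25)**2 + 1.0
    elif 3.5 <= x < 5.0: return 0.75*(x - 4.35)**2 + 3.583
    elif 5.0 <= x < 5.5: return 3*np.log(abs(x - 5.6)) + 5.5
    elif 5.5 <= x < 6.0: return 3*np.log(abs(x - 5.4)) + 5.5
    else:                return 0.0

xs = np.arange(0.0, 6.0, 1e-2)
fx = np.array([f_mixed(x) for x in xs])
D = np.abs(xs[:, None] - xs[None, :])     # pairwise distances
F = np.abs(fx[:, None] - fx[None, :])     # pairwise |f differences|
with np.errstate(divide="ignore", invalid="ignore"):
    R = F / D                             # r; the diagonal is undefined

# A log-scaled image of R reproduces the structure of Fig. 1b, e.g.:
# import matplotlib.pyplot as plt
# plt.imshow(np.log10(R + 1e-12), origin="lower", extent=[0, 6, 0, 6])
```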
Consider steps within [3, 6]; the change in objective function values varies significantly in a complex, unpredictable manner, which is exactly the type of information that black-box optimization algorithms contend with when solving a given problem.

4.2 Length scale distribution and its related summaries

Visual analysis of r over a large set of values is challenging for higher dimensions. One possibility is to summarise the values of r that occur over a given problem landscape. This summary may be used to compare and characterise problems, as well as to predict the values of r obtained if further exploration of the landscape is conducted (particularly if the sampling technique is fixed). Our hypothesis is that problems with structure of similar complexities should yield similar length scale distributions. Subsequently, statistical summaries of p(r)—such as measures of central tendency (e.g. mean, median and mode), shape (e.g. skewness and kurtosis) and variability (e.g. range, percentiles and standard deviation)—are potentially very useful for characterising problems. However, such statistics are compressed representations of r values, and so it follows that they may not be unique and will vary depending on the structure present in a problem.

Concepts from information theory can also be used to characterise p(r). Shannon entropy is used as a measure of the uncertainty of a random variable (Cover and Thomas 1991). The entropy, h(r), measures the expected amount of information needed to describe a random variable:

    h(r) = −∫₀^∞ p(r) log₂ p(r) dr.    (2)

The Dirac delta function minimises differential entropy, meaning that the d-dimensional flat and 1D linear functions minimise h(r). The uniform density function (in a bounded region) has the largest entropy of any density function bounded within the same region. To obtain a uniform p(r), there must be length scales of uniformly varying size, e.g. random noise functions. Other landscapes will yield h(r) within these extremes.

Information theory also provides tools to directly compare two length scale distributions. The similarity between two length scale distributions can be calculated using the KL-divergence (Kullback and Leibler 1951):

    D_KL(p ‖ q) = ∫₀^∞ p(r) log₂ (p(r) / q(r)) dr.    (3)

KL-divergence is not a symmetric measure; in general, D_KL(p ‖ q) ≠ D_KL(q ‖ p). A symmetric alternative, known as J-divergence, can be used instead:

    D_J(p ‖ q) = D_KL(p ‖ q) + D_KL(q ‖ p).    (4)

Given that the J-divergence can be used to measure the similarity between two distributions, the J-divergence between length scale distributions is a proxy for comparing the structural characteristics between landscapes. This is a general and potentially very powerful tool; the structural similarity between two problems can be quantified without explicitly measuring and comparing specific structural properties.

In conclusion, the length scale distribution provides a unique summary of the landscape that is amenable to numerous statistical and information-theoretic analysis techniques. The analysis techniques discussed in this section are not exhaustive, and we expect that additional techniques and/or extensions may provide further insights into problem structure.
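In practice, p(r) is rarely available in closed form, so these summaries are computed from a density estimate fitted to a finite sample of length scales. The sketch below (our own code; later sections use Sheather–Jones "solve-the-equation" bandwidth selection, for which SciPy's default Scott's rule is substituted here) evaluates h(r) and D_J numerically over a grid:

```python
import numpy as np
from scipy.stats import gaussian_kde

def entropy_bits(p, grid):
    """Numerical estimate of h(r) = -integral p(r) log2 p(r) dr (Eq. 2)."""
    d = np.clip(p(grid), 1e-300, None)   # avoid log(0)
    return -np.trapz(d * np.log2(d), grid)

def j_divergence(p, q, grid):
    """Symmetric J-divergence between two densities (Eqs. 3 and 4)."""
    dp = np.clip(p(grid), 1e-300, None)
    dq = np.clip(q(grid), 1e-300, None)
    dkl_pq = np.trapz(dp * np.log2(dp / dq), grid)
    dkl_qp = np.trapz(dq * np.log2(dq / dp), grid)
    return dkl_pq + dkl_qp

# Usage, given two 1D arrays of sampled length scales r_a and r_b:
# p_hat, q_hat = gaussian_kde(r_a), gaussian_kde(r_b)
# grid = np.linspace(0, max(r_a.max(), r_b.max()), 1024)
# print(entropy_bits(p_hat, grid), j_divergence(p_hat, q_hat, grid))
```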
5 Sampling length scales

When exact derivation and/or enumeration of the length scales for a problem is infeasible, a representative sample of length scales can be analysed instead. Let p(r) be the true length scale distribution, and let p̂(r) be the length scale distribution estimated from a finite sample of length scales. Intuitively, as the number of sampled length scales increases towards complete enumeration, p̂(r) converges to p(r). However, in practice, the methodology used to sample r and the overall size of the sample will affect the convergence of p̂(r) to p(r).

Two solutions are required to compute a single r value, and so a sample of solution pairs is required to construct a sample of r. In Morgan and Gallagher (2012), a sample of solution pairs was generated from all unique pairwise combinations of a sample of solutions. More specifically, with an initial sample of m solutions (assumed to adequately cover S), all unique combinations of pairs are used to construct a sample of length scales. Using this technique, m(m − 1)/2 length scales are sampled from m unique solutions in the landscape. While this approach uses the maximum information available from m solutions, it is limited in that a length scale sample of size O(n) is based on only O(√n) unique solutions.

We conjecture that to obtain a sample of length scales representative of the true distribution, r should be sampled from as wide a variety of solutions in S as possible. In the extreme case, n length scales can be calculated using n pairs of unique solutions, i.e. 2n unique solutions. If computational effort/storage is an important consideration, the number of unique solutions used to generate the length scales can be reduced. For example, n length scales can be generated via a sample of n unique solutions by pairing each solution with exactly two other solutions. One way to achieve this is by considering the sampled solutions as a random tour and calculating r between "steps" along the tour.

The method used to generate the solutions is an important aspect of the length scale framework and deserves careful consideration. For example, uniform random sampling of high-dimensional problems can yield a sparse sample where the Euclidean distances between solutions are similar (Beyer et al. 1999; Aggarwal et al. 2001; Morgan and Gallagher 2014). Hence, a uniform random sample is not ideal for generating length scales in high dimensions; the denominator of r (the distance between solutions) would be similar across all sampled r, thereby essentially reducing r to the magnitude of change in the objective function. The purpose of the length scale framework is to analyse the objective function at a variety of scales (i.e. distances), and so a sample of solutions at varying distances apart in S is required. In this paper and in Morgan and Gallagher (2012), we obtain a sample of solutions by using a Lévy random walk, where steps are taken in a random, isotropic direction and step sizes are sampled from a Lévy distribution (Shlesinger et al. 1987).
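A minimal sketch of such a walk is given below (our own code; SciPy's levy distribution is used for step lengths, and boundary handling, which is not detailed here, is omitted):

```python
import numpy as np
from scipy.stats import levy

def levy_walk(dim, n_steps, gamma, x0=None, seed=None):
    """Levy random walk: isotropic random directions, step lengths drawn
    from a Levy distribution with scale gamma and location delta = 0."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim) if x0 is None else np.asarray(x0, dtype=float)
    walk = [x.copy()]
    for _ in range(n_steps):
        u = rng.normal(size=dim)
        u /= np.linalg.norm(u)                        # unit direction
        step = levy.rvs(loc=0.0, scale=gamma, random_state=rng)
        x = x + step * u
        walk.append(x.copy())
    return np.array(walk)

# Length scales between consecutive steps of the walk, e.g. for a sphere:
walk = levy_walk(dim=2, n_steps=1000, gamma=1e-3, seed=0)
fx = np.einsum("ij,ij->i", walk, walk)                # f(x) = x . x
r = np.abs(np.diff(fx)) / np.linalg.norm(np.diff(walk, axis=0), axis=1)
```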
For the 1D linear and constant objective functions, any pair of solutions will yield the single length scale value that completely characterises the problem. However, for a needlein-a-haystack problem (where the objective function is a constant value for all but one solution), both the needle and solutions at unique distances away from the needle must be sampled. Of course, the underlying structure of the problem is unknown in the black-box scenario, and so choosing an appropriate sample size is difficult in practice. We suggest that sample sizes are made as large as practically possible and that p̂(r ) is examined across samples of increasing sizes in order to assess convergence, and hence, adequacy of the sample. If p(r ) is known, the KLdivergence can be used to directly measure convergence, since D K L p || p̂ = 0 when p̂(r ) = p(r ). Often, p(r ) is unknown, and so the KL-divergences between different sample sizes—i.e. D K L p̂n+1 || p̂n —can be assessed as an indicator for convergence. That is, once an adequate sample size is achieved, subsequent sampling will not drastically alter the distribution, and so the KL-divergence between the subsequent sample size and the current sample size will be negligible. The experiments in Morgan and Gallagher (2012) calculated length scales using all n2 unique solution pairs in a sample of n solutions. The following experiment investigates our conjecture that length scale is affected by the variety of the solutions in the sample. Specifically, we compare the following sampling methodologies: – MU 1 : generate a uniform random sample of n solutions and calculate n2 length scales from all pairwise solution combinations. n – MU 2 : generate a uniform random sample 2 solutions, nof choose a random tour and calculate 2 length scales between “steps” along the tour. – M L1 : generate n a Lévy random walk of n solutions and calculate 2 length scales from all pairwise solution combinations. – M L2 : generate a Lévy random walk of n2 solutions, choose a random tour and calculate n2 length scales between “steps” along the tour. Note that all four methodologies generate a sample of n2 length scales, however, they differ in the type and size of the original sample of candidate solutions. To evaluate the different methodologies, length scales are calculated for Example 2 (1D quadratic function), where the analytic form of p(r ) is known. Note that D K L p || p̂ → 0 as the sampling adequacy increases. Length scale samples are sized according to n2 , where n = [10, 50, 100, 500, 1000, 5000, 10, 000], and 30 different samples are generated for each size. For both Lévy walks, γ = 10−3 . p̂(r ) is estimated from the samples via kernel density estimation with a Gaussian kernel, using the “solvethe-equation plug-in” method (Sheather and Jones 1991) for bandwidth selection. The KL-divergence is estimated via numerical approximation (over the sample) of Eq. 3. Figure 2a shows the mean and standard deviation (as error bars) of D K L p || p̂ for each sample size. In the black-box scenario, p(r ) is unknown, and so the KL-divergence between the distributions for each sample size and its subsequent sample size (e.g. n = 10 and n = 50) is calculated. The mean and standard deviation of the divergences between sample sizes is shown in Fig. 2b. Since KL-divergence is non-negative, error bars yielding negative values are omitted from the figure. 
Figure 2a shows the mean and standard deviation (as error bars) of D_KL(p ‖ p̂) for each sample size. In the black-box scenario, p(r) is unknown, and so the KL-divergence between the distributions for each sample size and its subsequent sample size (e.g. n = 10 and n = 50) is calculated. The mean and standard deviation of the divergences between sample sizes are shown in Fig. 2b. Since KL-divergence is non-negative, error bars yielding negative values are omitted from the figure.

[Fig. 2 Estimating sampling adequacy via convergence of p̂(r). Lines represent the mean (over 30 trials), while error bars represent the standard deviation. Since KL-divergence is non-negative, error bars yielding negative values are omitted. (a) D_KL(p ‖ p̂). (b) D_KL(p̂_{n+1} ‖ p̂_n)]

Figure 2a shows that for small samples of r, both uniform random sampling methods are superior to the Lévy random walks on this problem. This is not surprising; large steps in a Lévy walk are not as probable as small steps, and so it can take a number of samples before Lévy walks adequately explore the landscape. Interestingly, by conducting the Lévy walk and calculating r between steps along a random tour of the resulting sample, we are able to achieve a well-represented length scale sample (i.e. D_KL(p ‖ p̂) for M_L2 is ≤ 1) much faster than with M_L1, which was used in Morgan and Gallagher (2012). Furthermore, on this problem M_L2 is competitive with uniform random sampling after 10³ samples. Thus our conjecture that a variety of solutions yields well-represented length scales also seems to be well founded; for both uniform random sampling and Lévy random walks, generating a sample, choosing a random tour and calculating r between steps in the tour gives a more accurate sample for almost all sample sizes. However, because uniform random sampling in high dimensions yields solutions with similar Euclidean distances between them (Beyer et al. 1999; Aggarwal et al. 2001; Morgan and Gallagher 2014), we advocate and use Lévy random walks to sample the length scales of continuous problems.

The divergences of the p̂(r)'s from a black-box perspective are shown in Fig. 2b. Both variants of the uniform random sample have small divergences, and are therefore quite stable, even for a low number of samples. M_L2 achieves a very low divergence (≤ 0.1 bits) after approximately 10³ samples, whereas M_L1 does not achieve low divergence until 10⁶ samples.

Small KL-divergences do not necessarily mean that p̂(r) has converged to p(r); however, they do indicate how much length scale information (in terms of bits) might be gained by sampling further. If little can be gained, either all the important structure has been sampled (in which case p̂(r) is a good estimate of p(r)), or there exists important structure that is hard to find (e.g. a needle in a haystack). In the former case the sample is adequate, but in the latter case any landscape analysis technique would struggle, and length scale is no exception. The trends in Fig. 2b closely follow those in Fig. 2a, suggesting that the proposed black-box methodology provides a good summary of convergence, and hence can reliably assist practitioners in determining the adequacy of their samples. This technique is used in Sect. 6 to determine adequate sample sizes for the BBOB problem set.

6 Experimental analysis of BBOB problems

In this section, we analyse the noiseless BBOB problem set using the length scale framework. In particular, we compare the length scale analysis to other landscape features from the literature, investigate the robustness of the framework, extend the framework to visualise and classify the relationships between problems, and examine problems' length scale distributions.
The methodology used in these experiments is general and can be easily applied to other black-box problems. Source code is available at https://github.com/RachM/length-scale.

6.1 Comparing length scale to existing features

The following experiment investigates how well a subset of problem features from the literature—namely the correlation length, FDC, information content, partial information content, information stability and dispersion—characterise the BBOB problem set. These features were chosen because they yield scalar values, are easy to interpret and are widely used in the problem analysis context. We then compare the features to the entropy of the length scale distribution, h(r) (other summaries of r—such as the mean and variance—could similarly be calculated). 2D, 5D, 10D and 20D problems are analysed, and Euclidean distance is used as the distance metric between solutions. Sample sizes of 1000d², 5000d² and 10,000d² were tested on instances from all the BBOB problems, across 2D, 5D, 10D and 20D, and there was only a negligible (≤ 1 bit) difference in sampling more than 1000d² solutions. Thus, in the following experiments, all features are calculated from a sample of solutions obtained using a Lévy random walk (γ = 10⁻³) of 1000d² solutions in S = [−5, 5]^d.

The robustness of the features is investigated by examining them over varying instances of the problems, and varying samples of those instances. 30 problem instances are produced by supplying seeds 1 to 30 to the BBOB problem generator. For each instance, 30 different samples (of size 1000d²) of S are generated, meaning that for a given problem in dimension d (e.g. 2D Sphere), there are 30 × 30 samples of the problem. Hence each feature is calculated 900 times for a single problem. The BBOB problem generator applies several linear and non-linear transformations to both x and f on many of the problems. Consequently, the structure between problem instances may vary (e.g. structure within the bounds on one instance may be removed on another). Thus, while we expect the features to characterise the underlying nature of the problems, we do not expect the features to be invariant between instances.

FDC is calculated using the global optimum, x*, as well as the best solution in the sample, x̂* (the latter gives insight into how well FDC performs in the black-box scenario). The variants are denoted FDC_x* and FDC_x̂*, respectively. The choice of ε for information content, partial information content and information stability is highly dependent on the problem, as results vary depending on the value of ε chosen. In the black-box scenario, the search for an appropriate value
Figure 3 displays the mean and one standard deviation (as error bars) for each feature across the 24 BBOB problems and d. Correlation length is typically used to indicate ruggedness, and it specifically captures the maximum distance between solutions such that the correlation between objective function values is significant. The correlation lengths shown in Fig. 3a are generally high across the BBOB problem set, and apart from problems F16 (Weierstrass), F21 (Gallagher’s 101), F22 (Gallagher’s 21) and F23 (Katsuura), the correlation lengths are of very similar values. While most values of l are similar across d, F17 (Schaffers F7), F18 (Schaffers F7 moderately ill-conditioned), F21 and F22 vary slightly. The standard deviation in correlation length is small, mostly around 0.05 (2D F18 varies the most, with a standard deviation of 0.11), and decreases slightly as d increases for all problems. High dispersion values indicate that the “good” solutions in the sample are well separated and distributed throughout S, thus implying a rugged landscape. Since both l and dispersion are measures of ruggedness, it is not surprising that the problems in Fig. 3b with higher dispersion values mostly correspond to the problems in Fig. 3a with low l. Both Fig. 3a, b show F16 and F23 to be more rugged than the other problems. Furthermore, F1 (Sphere) to F15 (Rastrigin) are generally quite smooth compared to F16 to F24 (Lunacek bi-Rastrigin). However, in contrast to l, the dispersion varies with d. Because bound-normalisation is used, this is unlikely to be an artefact of the “curse of dimensionality” discussed in Morgan and Gallagher (2014) (indeed, the dispersions of F24 and F25 are invariant to d), and so the variations are likely caused by structural changes as d increases. Interestingly, as d increases, the differences in dispersion between the problems becomes less pronounced (e.g. the range of dispersions for 2D problems is 0.26, compared to 0.09 for 20D). Overall, l and dispersion provide very limited ability to characterise and differentiate between the BBOB problems. High FDC values indicate a strong correlation between the fitness of solutions and their distance from the global optimum. Figure 3c, d show considerably different values across the problem set. For some problems, the FDC variants actually indicate conflicting landscapes, e.g. F DCx∗ indicates F4 (Büche-Rastrigin) is slightly deceptive (which it is), while this is not the case with F DCx̂∗ . This is an important result, as it demonstrates that F DCx̂∗ is not always a reliable approximation of F DCx∗ and so conclusions based on the theory of FDC should not be drawn from F DCx̂∗ results. F6 (Attractive Sector) and F24 are also deceptive problems for some algorithms, however, neither FDC variant suggest this. F DCx∗ and F DCx̂∗ are largely invariant across d, however F DCx̂∗ is typically larger than F DCx∗ , and F DCx∗ has a lot more variation between samples. F DCx̂∗ exhibits similar trends to l; F16, F21, F22 and F24 are far less correlated than the other problems. The similarity between F DCx̂∗ and l is not surprising as they both correlate the fitness values of points within the sample. Information content and partial information content are shown in Fig. 3e, f respectively, and it is clear that they are largely unable to differentiate the BBOB problems. Information content measures the variety of fluctuations in f along the sample. 
A value of log₆ 2 (≈ 0.3869) indicates a highly rugged landscape with no neutral regions, and the results (erroneously) suggest that the BBOB problems are all highly rugged. Indeed, F1 and F5 (Linear Slope) are extremely smooth, and yet their information content values suggest otherwise. Partial information content indicates the degree of modality by measuring the variety of non-neutral regions in the sample (Vassilev et al. 2000), and the results in Fig. 3f are very similar to information content. With the exception of F7 (Step Ellipsoidal), the partial information content is invariant to both the problems and d, and has extremely small variance between samples. F7 contains numerous neutral regions, which both information content and partial information content appear to have captured. Figure 3f and further analysis of F7's information content (not shown here) suggest that the 5D problem contains significantly more transitions between neutral and non-neutral regions than the 2D, 10D and 20D problems. Subsequent results for 7D, 9D, 11D and 19D F7 problems showed similar behaviour to 5D, suggesting that, in general, F7 problems with odd d contain fewer neutral regions than those with even d. Figure 3f also indicates F17, F19 (Composite Griewank–Rosenbrock), F23 and F24 are more multi-modal than the other problems.

The information stability is the largest transition in objective function values encountered along the walk, i.e. max |f(xi) − f(xj)|. Hence, it is conceptually very similar to max(r). Figure 3g, h show the information stability and h(r) respectively; both contain very similar trends, however, there are some minor differences (e.g. h(r) is more varied across d for F21 to F23). Both information stability and h(r) are generally well correlated with the conditioning of the problem; highly conditioned problems, like F12 (Bent Cigar), have high information stability and h(r) values.

[Fig. 3 Problem analysis features for the BBOB problems. (a) Correlation length. (b) Dispersion. (c) FDC_x*. (d) FDC_x̂*. (e) Information content. (f) Partial information content. (g) Information stability. (h) Length scale entropy]
While the results here show similar trends between h(r) and information stability, length scale distributions with different shapes and locations can produce different trends. Encouragingly, both features are clearly able to distinguish the similarities and differences between the BBOB problems. In addition, both features exhibit invariance to d and show very little variance between samples. Thus, in terms of characterising the BBOB problems, h(r) and information stability are clearly superior to the other techniques analysed. The expected amount of information needed to describe r is quantified in h(r), hence problems with similar h(r) values are likely to have similar degrees of structural complexity. For example, F8 (Rosenbrock) and F9 (Rotated Rosenbrock) are structurally identical (they differ only by a rotation of S), and Fig. 3h shows they have almost identical h(r) values. In practice, h(r) can be used to identify similar (or dissimilar) problems, and further information can be obtained by comparing the p(r)s. This is utilised in Sect. 6.3, where length scale distributions are compared both visually and using J-divergence.

To summarise, the features compared in these experiments show some ability to capture landscape structure; however, most features are unable to detect the known differences across the BBOB problem set. In terms of characterising and distinguishing problems, the existing features seem limited. There were two notable exceptions, information stability and h(r), which produced a wide range of consistent, reliable values with clear relationships to the problems.

6.2 Comparing length scale with an ensemble of features

The length scale analysis in Sect. 6.1 was clearly able to characterise the BBOB problems, while many of the existing features struggled. Collectively, the existing features may offer a greater ability to characterise and distinguish problems (Hutter et al. 2006; Smith-Miles and Tan 2012), and so in the following we evaluate the ability of an ensemble of the features to characterise problems, and compare this to the length scale framework. In addition, we propose a novel, yet natural, extension to the length scale framework that can be used to classify and visualise the problems, and we compare this approach to the feature-ensemble approach. To do so, we analyse the high-dimensional "feature space" defined by the features, and the relationships between problems in such a space. Conceptually, the feature space is not novel; e.g. Smith-Miles and Tan (2012) generated problem feature vectors for a variety of TSP problems and used principal component analysis to visualise the instances in the (reduced) feature space.

The following experiments use the 20D BBOB results from Sect. 6.1. In the feature-ensemble approach, each problem is represented by a 7D feature vector consisting of the correlation length, dispersion, FDC_x*, FDC_x̂*, information content, partial information content and information stability, averaged across the seeds/walks. The features are normalised by their appropriate bounds; because information stability is unbounded, we normalised its values by the range of information stability values obtained across the problems. The 20D BBOB problems are thus located in a 7D (unit hypercube) feature space that can be further analysed via clustering, classification and visualisation techniques. As discussed in Sect. 4.2, the J-divergence between two length scale distributions is a natural measure of problem similarity.
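As an unofficial sketch of how such a similarity measure can be computed, the function below estimates the J-divergence, i.e. the symmetrised Kullback-Leibler divergence KL(p‖q) + KL(q‖p) (Kullback and Leibler 1951), between two sets of sampled length scales. The shared evaluation grid, the clipping constant and the default KDE bandwidth rule are assumptions of this sketch rather than the paper's exact choices.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import trapezoid

def j_divergence(r1, r2, grid_size=2048, eps=1e-12):
    """Symmetrised Kullback-Leibler (J-) divergence between Gaussian-kernel
    KDE estimates of two length scale distributions p(r) and q(r)."""
    grid = np.linspace(0.0, 1.1 * max(r1.max(), r2.max()), grid_size)
    p = np.clip(gaussian_kde(r1)(grid), eps, None)  # clip to avoid log(0)
    q = np.clip(gaussian_kde(r2)(grid), eps, None)
    return trapezoid(p * np.log(p / q) + q * np.log(q / p), grid)
```

Applying such a function to every pair of problems' length scale samples yields a symmetric dissimilarity matrix (24 × 24 for the BBOB set) of the kind used for the embeddings and dendrograms discussed next.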
Hence we propose using the J-divergences between problems to implicitly define the problem space. That is, the J-divergences can be used to infer certain properties of the problem space without knowledge of the explicit locations of problems within the space. In contrast to the feature-ensemble approach, this approach does not explicitly define, and hence constrain, the problem space.

Dimensionality reduction techniques are frequently used to visualise high-dimensional data in two or three dimensions (Lee and Verleysen 2007). To visualise the problem spaces resulting from the length scale and the feature-ensemble approaches, we employ t-distributed stochastic neighbourhood embedding (t-SNE) (Van der Maaten and Hinton 2008). t-SNE is a state-of-the-art, probabilistic, non-linear dimensionality reduction technique that aims to distribute points in a lower-dimensional space such that the original, high-dimensional neighbourhood relationships are preserved. More specifically, a non-convex cost function modelling the discrepancy between the low- and high-dimensional relationships is minimised using a variant of stochastic gradient descent. The algorithm is parameterised by a perplexity term, which essentially controls the number of effective neighbours near a given point. Based on the recommendations in Van der Maaten and Hinton (2008) and exploratory experimentation, the perplexity was set to 5 for all visualisations in this paper. The explicit locations of the high-dimensional points are not required by t-SNE; rather, t-SNE relies on a notion of dissimilarity between points. Thus, in the following, the average J-divergences between the 24 BBOB problems (across the different walks/seeds) were used to calculate a 24 × 24 dissimilarity matrix. To apply t-SNE with the feature-ensemble approach, a 24 × 24 distance matrix was generated by calculating the Euclidean distance between problems' feature vectors.

Fig. 4 Feature spaces of 20D BBOB problems reduced via t-SNE (points labelled by problem ID, grouped as F1-5, F6-9, F10-14, F15-19, F20-24). a Feature-ensemble approach. b Length scale approach

Fig. 5 Dendrograms of the 20D BBOB problems. a Feature-ensemble approach. b Length scale approach

Due to the stochastic nature of t-SNE, 1000 different trials were conducted, with a maximum of 5000 iterations for each trial. Figure 4 shows the best t-SNE visualisation (in terms of the final cost) from the trials for each approach. The cost of a t-SNE solution indicates the discrepancy between the neighbourhood relationships in the two-dimensional visualisation and the original high-dimensional data; the costs were quite consistent across the 1000 trials. Specifically, the feature-ensemble approach ranged between 0.1064 and 0.5522 with a median cost of 0.1314, while the length scale approach ranged between 0.1979 and 0.3244 with a median cost of 0.2034. While the feature-ensemble approach shown in Fig.
4a has a lower cost, the relationships between structurally similar problems are not evident and, overall, it lacks the ability to discriminate between problems. For example, F8 and F9 (the Rosenbrock problems) are well separated in the space, despite the problems differing only by a rotation. Figure 4b demonstrates the ability of the length scale framework to capture structurally similar problems: not only are F8 and F9 close, but so are F2 and F10 (the ellipsoidal problems), as well as F3 and F15 (the Rastrigin problems).

To determine whether specific cluster structure is evident in the data, hierarchical clustering was applied to the J-divergence matrix and the feature vector distance matrix using unweighted average distance linkages. The resulting dendrograms for the 20D BBOB problems can be seen in Fig. 5. Problems whose linkage is less than half the maximum linkage share common shaded/coloured lines. Interestingly, the clusters yielded by both hierarchical clusterings correspond well with the clusters in the visualisations produced by t-SNE. However, the clusters and relationships suggested by the t-SNE visualisations and dendrograms do not correspond to the problem categories in the original BBOB specification. The BBOB problems and categories were defined by researchers with the aim of providing a wide variety of test problems with different landscape structures. This was done using intuition in two and three dimensions, modifying previously proposed problems, and experience with algorithms. Hence, the categories might not capture the important underlying relationships between problems, or might not be the best categorisation for algorithm evaluation. The feature-ensemble approach shown in Fig. 5a lacks strong clusters, although there are perhaps three weak clusters (F7, F16/F23 and the remaining problems). While many similar problems are close in proximity (e.g. F3/F15, F9/F19, F17/F18 and F21/F22), many structurally dissimilar problems are also close in proximity (e.g. F1/F8, F3/F20, F12/F18, F24/F9). In contrast, the J-divergence of length scales (Fig. 5b) clusters the problems into many different classes that correspond well with the underlying structures of the problems. For example, F8 and F9 are together, as are the elliptically structured F2, F10 and F11 (Discus), and the Gaussian-constructed F21 and F22. There are some exceptions; F17 and F18 are separated despite both being Schaffers F7 functions (F18 is moderately ill-conditioned).

6.3 Interpreting length scale distributions

The results above indicate that there are some strong similarities between problems, and so in this section we examine the length scale distributions, p(r), in greater detail. Similar to Sect. 5, each p(r) is a kernel density estimator with a Gaussian kernel and bandwidth selected with the "solve-the-equation" plug-in method. For each problem there are 900 (30 seeds × 30 walks) p(r) estimates, and so we show the average (with shading of one standard deviation) of the distributions across the seeds/walks. For most problems, the length scales did not vary significantly between dimensions (e.g. see Fig. 6b). F5 (Linear Slope), however, varied considerably between 2D and the other dimensions (see Fig. 6a). More specifically, the J-divergences between F5 in 2D and in 5D, 10D and 20D are 2.23, 4.55 and 5.36 respectively. In comparison, F5's next largest J-divergence is very low (0.68), between 5D and 20D.
The F5 problem is a linear slope with a gradient that varies logarithmically between 1 and 10 across the x_i. If we were to consider only the length scales of axis-aligned steps, the length scale values yielded would lie within [1, 10], with an increasing presence of "small" length scales as dimensionality increases (due to the logarithmic distribution of gradient values along the solution variables). Introducing non-axis-aligned steps simply means that length scales between the extreme gradient values will also occur. The length scale distributions yielded for F5 closely reflect this structural change over dimensionality. In 2D there are many length scales in the problem of approximately 10, and the remaining length scales vary quite uniformly between 1 and 10. As dimensionality increases, p(r) reflects the presence of more small length scales.

Fig. 6 Examples of p(r) that have large and small variation with d. a F5. b F20

The length scale distribution indicates the changes in objective function value that can be expected to occur between solutions. The scaling on the r axis of Fig. 6a, b illustrates that r can be extremely different between two problems. From the length scales sampled, F20 (Schwefel) has length scales that are three orders of magnitude larger than those of F5. More specifically, F5 has changes in objective function value that are at most 20 times a given step size, while F20 has changes that are at most 76,300 times a given step size. Such information may be useful in the selection of algorithm parameters or the interpretation of algorithm search trajectories.

Given that the distributions for the BBOB problems were found to be mainly similar across d, for the remainder of the length scale distribution analysis we examine problems in 20D. The resulting length scale distributions are quite varied across the problems; however, there are a few notable similarities, some of which are shown in Fig. 7. Figure 7a shows p(r) for F3 (Rastrigin), F4 (Büche-Rastrigin) and F15 (Rotated Rastrigin). F15 is a non-separable and less regular variant of F3, and as a result they have almost identical length scales and a low J-divergence of 1.14. On the other hand, F4 has slightly different structure to F3 and F15; however, since the global structure is similar across the problems, we observe some similarity in p(r). The J-divergence between F4 and F3 is 0.92, compared to the high value of 19.95 between F4 and F15. The length scale distributions for F6 (Attractive Sector), F8 (Rosenbrock) and F9 (Rotated Rosenbrock) can be seen in Fig. 7b. The larger peak in the F9 distribution indicates that there are more low-valued length scales. Both F8 and F9 are variations of Rosenbrock, and we observe similar changes in fitness and hence similar distributions of length scales. The J-divergence between F8 and F9 is very low (0.24), indicating that the two problems are almost identical. Contrary to this, the J-divergence between F6 and F8 is 1.56, and between F6 and F9 it is 1.46.

Fig. 7 Examples of similar p(r) estimated on BBOB problems. a 20D F3, F4 and F15. b 20D F6, F8 and F9

A number of other problems (e.g. F17 and F18) were also found to have similar length scale distributions (with low J-divergence).
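The F8/F9 observation can be reproduced, at least qualitatively, with a small experiment: the sketch below compares the length scale distributions of a plain and a randomly rotated Rosenbrock function, reusing length_scales and j_divergence from the earlier sketches. The plain (unshifted) Rosenbrock function is only a stand-in for the actual BBOB F8/F9 instances, and the uniform box sample means the match is approximate rather than exact, since the box itself is not rotation-invariant.

```python
import numpy as np
from scipy.stats import ortho_group

def rosenbrock(X):
    """Generalised Rosenbrock function, evaluated row-wise."""
    return np.sum(100.0 * (X[:, 1:] - X[:, :-1] ** 2) ** 2
                  + (1.0 - X[:, :-1]) ** 2, axis=1)

rng = np.random.default_rng(1)
d, n = 20, 200
Q = ortho_group.rvs(d, random_state=1)  # random rotation of the space
X = rng.uniform(-5.0, 5.0, size=(n, d))

r_plain = length_scales(X, rosenbrock(X))          # analogue of F8
r_rotated = length_scales(X, rosenbrock(X @ Q.T))  # analogue of F9

# With the rotation-invariant Euclidean metric, the two samples of r
# should be close, yielding a small J-divergence (cf. 0.24 for F8/F9).
print(j_divergence(r_plain, r_rotated))
```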
The difference in shape and scale between Fig. 7a, b illustrates the potential differences between length scale distributions. It is clear that problems with similar structure have similar p(r)s and low J-divergences between them, while problems with different structure have different p(r)s and large J-divergences. This demonstrates that p(r) is a powerful tool for characterising and differentiating optimization problems. In addition, the distributions in Figs. 6 and 7 are surrounded by very thin shading, indicating that the standard deviation across samples is low.

7 Conclusions and future work

This paper has proposed length scale as a fundamental property of optimization problems. The framework is very general and is particularly applicable to black-box problems, as it is based on sampling candidate solutions and their objective function values. The simplicity of length scale means it can be readily calculated from samples of the landscape and/or algorithm search data. Source code for all experiments in this paper is available at https://github.com/RachM/length-scale.

Analytical properties of length scale have been discussed, and techniques for problem analysis were proposed using statistical and information-theoretic summaries of length scale. Length scale analysis on simple example problems illustrated the framework's ability to capture important problem structures and the complexity of their interactions. The framework was also used to assess the adequacy of landscape samples of varying size, thereby providing insight into appropriate sample sizes. This methodology could be used in other applications where a notion of sampling adequacy is needed. Experimental results on the BBOB problem set show that length scale analysis provides a greater ability to discriminate between the problems in comparison to seven well-known landscape analysis techniques. In addition, summaries of r were shown to be statistically robust to different samples of given problem instances. Valuable insights into the complex relationships between the BBOB problems were gained using dimensionality reduction and hierarchical clustering techniques applied to the length scale data.

In practice, using a sample of r values may result in different landscapes yielding the same sets of length scales, and hence the same length scale summaries. Identical sets of r values can be obtained from two landscapes sharing similar structure, where the structure discriminating them is not captured in the sample. However, any practical landscape analysis technique is limited to the information obtainable via sampling. Compressing O(n) length scale values into a single summary value (e.g. the mean of sampled r values, or h(r)) may incur information loss. This too is an unavoidable issue for many existing landscape techniques. Hence, while the use of length scale summaries may aid in characterising and analysing problems, they are not necessarily unique to individual problem instances. We believe that there is considerable scope to apply a range of other techniques for modelling and summarising length scale information. It should be possible to explore the relationship between well-defined topological properties, such as landscape modality, and the shape of p(r).

Length scale is sensitive to the scale of x and f. As discussed in Sect. 3.2, functions that differ by a scaling factor α ∈ R yield length scale values that also differ by a scale of α.
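In symbols, using the length scale definition from Sect. 3 (the absolute change in f divided by the distance between a solution pair), this scaling property follows in one line, with the absolute value accounting for negative α:

r_αf(x_i, x_j) = |α f(x_i) − α f(x_j)| / d(x_i, x_j) = |α| · |f(x_i) − f(x_j)| / d(x_i, x_j) = |α| · r_f(x_i, x_j)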
Hence the current, un-normalised approach yields values that distinguish the scaling between functions, but it also affords analysis that can recognise when functions differ merely by a scaling factor. If upper bounds on |f(x_i) − f(x_j)| and d(x_i, x_j) are known (which is not usually the case in black-box optimization), then the length scale could theoretically be normalised. For functions differing by a scaling factor α, the normalised length scale values are equal. We envisage that the normalisation of length scale values could be useful in situations where the scaling of f or x is not important.

Our experiments used entropy to summarise p(r), but other ideas from statistics and information theory (including those already discussed in Sect. 4.2) deserve investigation. In particular, the current methods used to analyse length scales ignore spatial information, such as the locations of the solutions. Our experimental results assume that the sampling methodology used produces a representative sample of the search space. While the convergence of different samples' length scale distributions and the low variance of length scale between samples indicate that the sample size was sufficient, whether or not the sample is an adequate representation of the landscape requires further quantification.

The set of solutions evaluated by an algorithm during a run could also be analysed in the length scale framework. Comparisons between the landscape's length scales and the algorithm's resulting length scales can be made, and possible insights into algorithm behaviour may be drawn. In addition, this information could potentially be used to make online algorithm parameter adjustments. Similar to the comparisons of problem length scale distributions conducted in this paper, the length scale distributions of the solutions produced by algorithm instances could also be directly compared to each other. We are also interested in exploring the relationship between algorithm performance and length scale metrics (e.g. h(r)), as it may be possible to incorporate the length scale framework into algorithm prediction models.

Compliance with ethical standards

Conflicts of interest None.

References

Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: Proceedings of the 8th international conference on database theory. Springer, London, pp 420–434
Bartz-Beielstein T, Chiarandini M, Paquete L, Preuss M (2010) Experimental methods for the analysis of optimization algorithms. Springer, New York
Beliakov G (2006) Interpolation of Lipschitz functions. J Comput Appl Math 196(1):20–44
Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is "nearest neighbor" meaningful? In: Proceedings of the 7th international conference on database theory. Springer, London, pp 217–235
Borenstein Y, Poli R (2006) Kolmogorov complexity, optimization and hardness. In: IEEE congress on evolutionary computation (CEC 2006), pp 112–119
Cheeseman P, Kanefsky B, Taylor W (1991) Where the really hard problems are. In: Proceedings of the 12th international joint conference on AI, Morgan Kaufmann, pp 331–337
Collard P, Vérel S, Clergue M (2004) Local search heuristics: fitness cloud versus fitness landscape. In: The 2004 European conference on artificial intelligence, IOS Press, pp 973–974
Cover TM, Thomas JA (1991) Elements of information theory. Wiley, New York
Fletcher R (1987) Practical methods of optimization, 2nd edn. Wiley, Hoboken
Forrester A, Sóbester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, Hoboken
Gallagher M (2000) Multi-layer perceptron error surfaces: visualization, structure and modelling. PhD thesis, Department of Computer Science and Electrical Engineering, University of Queensland
Gallagher M (2001) Fitness distance correlation of neural network error surfaces: a scalable, continuous optimization problem. In: Raedt LD, Flach P (eds) European conference on machine learning, Singapore. Lecture notes in artificial intelligence, vol 2167, pp 157–166
Gallagher M, Downs T, Wood I (2002) Empirical evidence for ultrametric structure in multi-layer perceptron error surfaces. Neural Process Lett 16(2):177–186
Gent I, Walsh T (1996) The TSP phase transition. Artif Intell 88(1–2):349–358
Grinstead CM, Snell JL (2012) Introduction to probability. American Mathematical Society, Providence
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Hansen N (2000) Invariance, self-adaptation and correlated mutations in evolution strategies. In: Schoenauer M et al (eds) Parallel problem solving from nature – PPSN VI. Lecture notes in computer science, vol 1917, Springer, pp 355–364
Hansen N, Finck S, Ros R, Auger A (2010) Real-parameter black-box optimization benchmarking 2010: noiseless functions definitions. Technical report RR-6829, INRIA
Horst R, Tuy H (1996) Global optimization: deterministic approaches. Springer, New York
Hutter F, Hamadi Y, Hoos H, Leyton-Brown K (2006) Performance prediction and automated tuning of randomized and parametric algorithms. In: Benhamou F (ed) Principles and practice of constraint programming. Lecture notes in computer science, vol 4204, Springer, pp 213–228
Jin Y (2005) A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput 9(1):3–12
Jones T, Forrest S (1995) Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In: Proceedings of the 6th international conference on genetic algorithms. Morgan Kaufmann, San Francisco, pp 184–192
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction. Springer, New York
Lunacek M, Whitley D (2006) The dispersion metric and the CMA evolution strategy. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, New York, pp 477–484
Macready W, Wolpert D (1996) What makes an optimization problem hard. Complexity 5:40–46
Malan K, Engelbrecht A (2009) Quantifying ruggedness of continuous landscapes using entropy. In: IEEE congress on evolutionary computation, pp 1440–1447
Mersmann O, Bischl B, Trautmann H, Preuss M, Weihs C, Rudolph G (2011) Exploratory landscape analysis. In: Proceedings of the 13th annual conference on genetic and evolutionary computation. ACM, New York, pp 829–836
Morgan R, Gallagher M (2012) Length scale for characterising continuous optimization problems. In: Coello CAC et al (eds) Parallel problem solving from nature – PPSN XII. Lecture notes in computer science, vol 7491, Springer, pp 407–416
Morgan R, Gallagher M (2014) Sampling techniques and distance metrics in high dimensional continuous landscape analysis: limitations and improvements. IEEE Trans Evol Comput 18(3):456–461
Muñoz MA, Kirley M, Halgamuge S (2012a) A meta-learning prediction model of algorithm performance for continuous optimization problems. In: Coello CAC et al (eds) Parallel problem solving from nature – PPSN XII. Lecture notes in computer science, vol 7491, Springer, pp 226–235
Muñoz MA, Kirley M, Halgamuge SK (2012b) Landscape characterization of numerical optimization problems using biased scattered data. In: IEEE congress on evolutionary computation, pp 1180–1187
Müller C, Baumgartner B, Sbalzarini I (2009) Particle swarm CMA evolution strategy for the optimization of multi-funnel landscapes. In: IEEE congress on evolutionary computation, pp 2685–2692
Müller CL, Sbalzarini IF (2011) Global characterization of the CEC 2005 fitness landscapes using fitness-distance analysis. In: Proceedings of the 2011 international conference on applications of evolutionary computation, part I. Springer, Berlin, Heidelberg, pp 294–303
Overton M (2001) Numerical computing with IEEE floating point arithmetic. Cambridge University Press, Cambridge
Pitzer E, Affenzeller M (2012) A comprehensive survey on fitness landscape analysis. In: Fodor J, Klempous R, Suárez Araujo C (eds) Recent advances in intelligent engineering systems. Studies in computational intelligence. Springer, New York, pp 161–191
Pitzer E, Affenzeller M, Beham A, Wagner S (2012) Comprehensive and automatic fitness landscape analysis using HeuristicLab. In: Moreno-Díaz R, Pichler F, Quesada-Arencibia A (eds) Computer aided systems theory – EUROCAST 2011. Lecture notes in computer science, vol 6927, Springer, pp 424–431
Reidys C, Stadler P (2002) Combinatorial landscapes. SIAM Rev 44(1):3–54
Rice JR (1976) The algorithm selection problem. Adv Comput 15:65–118
Ridge E, Kudenko D (2007) An analysis of problem difficulty for a class of optimisation heuristics. In: Proceedings of the 7th European conference on evolutionary computation in combinatorial optimization, Springer, pp 198–209
Rosé H, Ebeling W, Asselmeyer T (1996) The density of states – a measure of the difficulty of optimisation problems. In: Voigt HM et al (eds) Parallel problem solving from nature – PPSN IV. Lecture notes in computer science, vol 1141, Springer, pp 208–217
Rosen K (1999) Handbook of discrete and combinatorial mathematics, 2nd edn. Discrete mathematics and its applications. Taylor & Francis, Routledge
Sergeyev YD, Kvasov DE (2010) Lipschitz global optimization. Wiley Encycl Oper Res Manag Sci 4:2812–2828
Sheather S, Jones M (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B (Methodological) 53:683–690
Shlesinger MF, West BJ, Klafter J (1987) Lévy dynamics of enhanced diffusion: application to turbulence. Phys Rev Lett 58:1100–1103
Smith T, Husbands P, O'Shea M (2002) Fitness landscapes and evolvability. Evol Comput 10(1):1–34
Smith-Miles K (2008) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv 41(1):1–25
Smith-Miles K, Lopes L (2011) Measuring instance difficulty for combinatorial optimization problems. Comput Oper Res 39(5):875–889
Smith-Miles K, Tan TT (2012) Measuring algorithm footprints in instance space. In: IEEE congress on evolutionary computation (CEC), pp 1–8
Solla SA, Sorkin GB, White SR (1986) Configuration space analysis for optimization problems. In: Bienenstock E et al (eds) Disordered systems and biological organization. NATO ASI series, vol F20, Springer, Berlin, pp 283–293
Stadler PF (1996) Landscapes and their correlation functions. J Math Chem 20:1–45
Stadler PF (2002) Fitness landscapes. In: Lässig M, Valleriani A (eds) Biological evolution and statistical physics. Lecture notes in physics, vol 585, Springer, pp 183–204
Stadler PF, Schnabl W (1992) The landscape of the traveling salesman problem. Phys Lett A 161(4):337–344
Steer K, Wirth A, Halgamuge S (2008) Information theoretic classification of problems for metaheuristics. In: Li X et al (eds) Simulated evolution and learning. Lecture notes in computer science, vol 5361, Springer, pp 319–328
Strongin R (1973) On the convergence of an algorithm for finding a global extremum. Eng Cybern 11:549–555
Talbi E (2009) Metaheuristics: from design to implementation. Wiley series on parallel and distributed computing. Wiley, Hoboken
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605
van Hemert J (2005) Property analysis of symmetric travelling salesman problem instances acquired through evolution. Evol Comput Comb Optim 3448:122–131
Vassilev VK, Fogarty TC, Miller JF (2000) Information characteristics and the structure of landscapes. Evol Comput 8:31–60
Wang Y, Li B (2008) Understand behavior and performance of real coded optimization algorithms via NK-linkage model. In: IEEE world congress on computational intelligence, pp 801–808
Weinberger E (1990) Correlated and uncorrelated fitness landscapes and how to tell the difference. Biol Cybern 63:325–336
Whitley D, Watson JP (2005) Complexity theory and the no free lunch theorem. In: Search methodologies, Springer, pp 317–339
Whitley D, Lunacek M, Sokolov A (2006) Comparing the niches of CMA-ES, CHC and pattern search using diverse benchmarks. In: Runarsson TP et al (eds) Parallel problem solving from nature – PPSN IX. Lecture notes in computer science, vol 4193, Springer, pp 988–997
Wood GR, Zhang BP (1996) Estimation of the Lipschitz constant of a function. J Glob Optim 8:91–103
Zhang W (2004) Phase transitions and backbones of the asymmetric traveling salesman problem. J Artif Intell Res 21(1):471–497
Zhang W, Korf RE (1996) A study of complexity transitions on the asymmetric traveling salesman problem. Artif Intell 81(1–2):223–239