
ISBN 978-609-95241-4-6
L. Sakalauskas, A. Tomasgard, S. W. Wallace (Eds.):
Proceedings. Vilnius, 2012, pp. 136–141
© The Association of Lithuanian Serials,
Lithuania, 2012
doi:10.5200/stoprog.2012.24
International Workshop
“Stochastic Programming for Implementation
and Advanced Applications”
(STOPROG-2012)
July 3–6, 2012, Neringa, Lithuania
ON THE ARITHMETIC OF INFINITY ORIENTED IMPLEMENTATION
OF THE MULTI-OBJECTIVE P-ALGORITHM
Antanas Žilinskas
Vilnius University, Institute of Mathematics and Informatics
Akademijos str. 4, LT-08663 Vilnius, Lithuania
E-mail: [email protected]
Abstract. The single-objective P-algorithm is a global optimization algorithm based on a statistical model of objective functions and the axiomatic theory of rational decisions. It has proven quite suitable for the optimization of black-box expensive functions. Recently the P-algorithm has been generalized to multi-objective optimization. In the present paper, the implementation of that algorithm is considered using the new computing paradigm of the arithmetic of infinity. The strong homogeneity of the multi-objective P-algorithm is proven, thus enabling a rather simple application of the algorithm to problems involving infinities and infinitesimals.
Keywords: arithmetic of infinity, multi-objective optimization, global optimization.
1. Introduction
New computing paradigms and hardware projects, which potentially enable enormous performance and/or super-high precision, urge the development of new algorithms and the adaptation of well-established conventional algorithms to the new prospects. A recently proposed computing paradigm, the arithmetic of infinity, lays the foundation for computations involving infinities and infinitesimals [9, 13]. Although hardware implementing the arithmetic of infinity is not available at present, in principle such hardware can be designed and built, as shown in [12]. The arithmetic of infinity is attractive for the development of algorithms for numerical problems in various fields of applied mathematics, e.g. mathematical modeling [11], operations research and mathematical programming [2], global optimization [20], quantitative analysis of fractals [10], and others.

In the present paper, multi-objective optimization problems are considered where the computation of objective vectors using the standard computer arithmetic is problematic because of either underflows or overflows. Besides the fundamentally new minimization problems, where the computation of the objective function values involves either infinities or infinitesimals, the arithmetic of infinity can also be helpful in tackling optimization problems whose objective functions are computed using numbers that differ by many orders of magnitude. For example, the objective functions of the optimization problems of statistics considered in [20, 22] are computed operating with numbers that differ by a factor of more than $10^{200}$.

The arithmetic of infinity can be applied to the optimization of such challenging objective functions in two ways. First, the optimization algorithm itself is implemented in the arithmetic of infinity. Second, the arithmetic of infinity is applied only to scale the objective function values so that they become suitable for processing in the conventional computer arithmetic, and a conventionally implemented optimization algorithm is used to process the scaled values. The second implementation is considerably simpler than the first, since the arithmetic of infinity is applied only to scale the function values. However, the second implementation can be recognized as correct only if both implementations generate the same sequence of points at which the vectors of objectives are computed. It has been shown in [20] that not all single-objective global optimization algorithms admit a correct implementation of the second type. In the present paper, we formulate a sufficient condition for the correct second-type implementation, called strong homogeneity, and show that the multi-objective P-algorithm satisfies that condition.

The multi-objective P-algorithm, like its single-objective prototype, is oriented to black-box expensive functions, which comprise a class of global optimization problems most difficult to tackle. Both versions of the algorithm are based on similar statistical models of objective functions. For the axiomatic essentials of the approach to global optimization based on statistical models of objective functions, we refer to [15, 16]. Recently it has been proven that the single-objective P-algorithm is strongly homogeneous [20]. Here we generalize that result by showing that the multi-objective P-algorithm is strongly homogeneous as well. In the next section, the property of strong homogeneity of multi-objective optimization algorithms is formulated, which is sufficient for the correct second-type implementation of an algorithm meant for solving problems related to the arithmetic of infinity. Section 3 presents a brief description of the multi-objective P-algorithm. The proof of the strong homogeneity of the multi-objective P-algorithm is presented in Section 4.
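To make the two implementation types concrete, the following minimal sketch illustrates the second-type implementation; the names run_second_type, scale_to_finite, and the ask/tell optimizer interface are hypothetical placeholders, not an API from the cited works. The arithmetic of infinity touches only the scaling step, while the optimizer itself runs in conventional floating-point arithmetic.

```python
# A minimal sketch of the second-type implementation: the arithmetic of
# infinity is used only to rescale objective values into a floating-point
# representable range; a conventional optimizer processes the scaled values.
# All names below are illustrative placeholders.

def run_second_type(objective, optimizer, scale_to_finite, n_steps):
    """Optimize `objective`, whose raw values may over-/underflow, by feeding
    the optimizer affinely scaled values z = c*y + b instead of y itself."""
    for _ in range(n_steps):
        x = optimizer.ask()         # point chosen by the conventional algorithm
        y_raw = objective(x)        # may involve infinities or infinitesimals
        z = scale_to_finite(y_raw)  # affine scaling in the arithmetic of infinity
        optimizer.tell(x, z)        # conventional arithmetic from here on
    return optimizer.best_point()
```

The sketch shows why strong homogeneity is exactly the property required for correctness: the optimizer sees z = c·y + b instead of y, so the generated sequence of points coincides with that of the first-type implementation only if the algorithm is insensitive to such an affine scaling.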
2. The definition of strong homogeneity
The minimization problem

$$\min_{X \in A} F(X), \quad F(X) = (f_1(X), \ldots, f_m(X))^T, \quad A \subset \mathbb{R}^d, \tag{1}$$
is considered, where the vector objective function F(X) is defined over a simple feasible region; for concreteness it can be assumed that A is a hyperrectangle. For the definition of a solution to the multi-objective optimization problem with nonlinear objectives we refer to [7]. Depending on the properties of the problem, different approaches to its solution can be applied. However, in the most general statement, any algorithm can be described as a sequence of mappings
$$\pi_n : A^n \times (\mathbb{R}^m)^n \to A,$$

which define the points of the current computation of objectives, depending on the results of the previous iterations:

$$X_{n+1} = \pi_n(X_1, \ldots, X_n, Y_1, \ldots, Y_n), \quad Y_i = F(X_i), \ i = 1, \ldots, n. \tag{2}$$
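As an illustration, scheme (2) amounts to the following generic loop (a sketch in Python; pi is an arbitrary user-supplied mapping, not a specific algorithm from this paper):

```python
# A sketch of scheme (2): an algorithm is a sequence of mappings pi_n that
# select the next evaluation point from all previous points and objective
# vectors.

def sequential_minimize(F, pi, X1, n_steps):
    X, Y = [X1], [F(X1)]
    for _ in range(1, n_steps):
        X_next = pi(X, Y)        # X_{n+1} = pi_n(X_1,...,X_n, Y_1,...,Y_n)
        X.append(X_next)
        Y.append(F(X_next))      # Y_{n+1} = F(X_{n+1})
    return X, Y
```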
Definition 1. Let us consider two vector-valued objective functions F(X) and H(X), X ∈ A, differing only in the scales of function values, i.e.,

$$H(X) = (c_1 f_1(X), \ldots, c_m f_m(X))^T + B, \tag{3}$$

where C and B are constant vectors in $\mathbb{R}^m$ that can assume not only finite, but also infinite and infinitesimal values expressed by the numerals defined in [9, 13]. The sequences of points generated by an algorithm, when applied to these functions, are denoted by $X_i$, i = 1, 2, …, and $V_i$, i = 1, 2, …, respectively. An algorithm that generates identical sequences, $X_i = V_i$, i = 1, 2, …, is called strongly homogeneous.

A weaker property of algorithms is considered in [3], where algorithms that generate identical sequences for the scalar functions f(X) and h(X) = f(X) + b are called homogeneous. Since proper scaling of function values by translation alone is not always possible, we consider here the invariance of optimization results with respect to a more general (affine) transformation of the objective function values. The concept of strong homogeneity for single-objective optimization algorithms was introduced in [20].
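The scaling (3) can be sketched as follows; finite floating-point values stand in here for the grossone-based numerals of [9, 13], and make_scaled_objective is an illustrative name:

```python
# A sketch of the transformation (3): H differs from F only by a componentwise
# affine change of scale with vectors C and B.

def make_scaled_objective(F, C, B):
    """Return H with H(X)_j = C_j * F(X)_j + B_j."""
    def H(X):
        return [c * y + b for c, y, b in zip(C, F(X), B)]
    return H
```

An algorithm is then strongly homogeneous precisely when running it on make_scaled_objective(F, C, B) reproduces the sequence of points it generates on F itself.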
3. A brief description of the multi-objective P-algorithm
To validate the selection of a site for the current computation/observation of the vector of objectives from the decision theory perspective, a model of the objective functions is needed. The considered approach is based on statistical models. In the present paper, we assume that the objective functions considered are random realizations of the stochastic function chosen for a model. However, the P-algorithm considered below can also be constructed using the more general statistical models proposed in [16], in conformity with the ideas of the theory of subjective probabilities. To facilitate the implementation, Gaussian stochastic functions are normally chosen as statistical models [1]. Some authors refer to optimization based on statistical models of objective functions as kriging; the term "kriging" was coined in geostatistics to describe statistical model-based interpolation methods [14]. Statistical model-based global single-objective optimization algorithms have proved well suited for problems with black-box expensive objective functions. It is interesting to note that another global optimization approach aimed at that class of problems, namely the radial basis function approach [4], yields algorithms that coincide with the P-algorithm based on the Gaussian model [18]. Recently the statistical model-based global optimization approach has attracted considerable attention of experts in multi-objective optimization; see e.g. [5, 6, 8, 21].
The statistical models chosen for the particular objectives $f_j(X)$ comprise a vector-valued Gaussian random field Ξ(X), which is accepted as the statistical model of F(X). In many real-world applied problems, and thus in test problems as well, the objectives are not (or only weakly) interrelated. Accordingly, in the present paper, the components of Ξ(X) are supposed to be independent. Correlation between the components of Ξ(X) could be included in the model; however, it would imply some numerical and statistical inference problems that require further investigation.
It is assumed that a priori information on the expected behavior (the form of variation over A) of the objective functions is scarce. The heuristic assumption on the lack of a priori information is formalized as an assumption that $\xi_j(X)$, j = 1, …, m, are homogeneous isotropic random fields, i.e., that their mean values $\mu_j$ and variances $\sigma_j^2$ are constants, and that the correlation between $\xi_j(X_i)$ and $\xi_j(X_k)$ depends only on $\|X_i - X_k\|$; here and further $\|\cdot\|$ denotes the Euclidean norm in the vector spaces $\mathbb{R}^d$ and $\mathbb{R}^m$ considered. The choice of the exponential correlation function $\rho(t) = \exp(-ct)$, c > 0, is motivated by previous experience with single-objective global optimization algorithms based on such a statistical model. The parameters of the statistical model should be estimated using the data on F(·); since the components of Ξ(X) are assumed to be independent, the parameters of each $\xi_j(X)$ can be estimated separately.
The minimization at the (n+1)-th step is considered. The points where the objective functions have been computed are denoted by $X_i$, i = 1, …, n, and the corresponding objective vectors by $Y_i = F(X_i)$. The vector-valued Gaussian random field with values in $\mathbb{R}^m$,

$$\Xi(X) \in \mathbb{R}^m, \quad X \in A \subset \mathbb{R}^d, \tag{4}$$

is accepted as a model of the vector objective function considered. In the frame of that model, an unknown vector of objectives F(X), $X \neq X_i$, i = 1, …, n, is interpreted as a random vector whose distribution is defined by the conditional distribution function of Ξ(X):

$$\Phi_X^n(Y) = P\{\Xi(X) \le Y \mid \Xi(X_i) = Y_i,\ i = 1, \ldots, n\}. \tag{5}$$

The choice of the current observation point, i.e. of the point where to compute the vector of objectives at the current minimization step, is a decision under uncertainty. The statistical model (4) represents the uncertainty with respect to the result of that decision; therefore the choice of the current observation point $X_{n+1} \in A$ means a choice of a distribution function from the set of distribution functions $\Phi_X^n(Y)$, $X \in A$. The postulates of rational decision making under uncertainty, when applied to substantiate such a choice, imply the P-algorithm [17, 21], a multi-objective version of which is defined by the following expression:

$$X_{n+1} = \arg\max_{X \in A} P\{\Xi(X) \le Y^{on} \mid \Xi(X_i) = Y_i,\ i = 1, \ldots, n\}, \tag{6}$$

where $Y^{on} = (y_1^{on}, y_2^{on}, \ldots, y_m^{on})^T$ is a vector not dominated by $Y_i$, i = 1, …, n; for the sake of explicitness it is assumed that $y_j^{on} = \min_{1 \le h \le n} y_{jh}$, j = 1, …, m.
The implementation of the multi-objective P-algorithm is similar to that of the single-objective P-algorithm [15]. Since the components of the vector random field are assumed independent, the probability in (6) is computed as the product of the probabilities related to the individual components of Ξ(X):
$$P\{\Xi(X) \le Y^{on} \mid \Xi(X_i) = Y_i,\ i = 1, \ldots, n\} = \prod_{j=1}^{m} P\{\xi_j(X) \le y_j^{on} \mid \xi_j(X_i) = y_{ji},\ i = 1, \ldots, n\} =$$

$$\prod_{j=1}^{m} \int_{-\infty}^{y_j^{on}} \frac{1}{\sqrt{2\pi}\, s_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)} \exp\left(-\frac{(t - m_j(X \mid X_i, y_{ji}, i = 1, \ldots, n))^2}{2 s_j^2(X \mid X_i, y_{ji}, i = 1, \ldots, n)}\right) dt =$$

$$\prod_{j=1}^{m} G\left(\frac{y_j^{on} - m_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)}{s_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)}\right), \tag{7}$$
where $m_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)$ and $s_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)$ denote the conditional mean and the conditional standard deviation of the random field $\xi_j(X)$ at the point X, and G(·) stands for the standard Gaussian cumulative distribution function.
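The following self-contained sketch evaluates the criterion (6)-(7) numerically under the assumptions of this section (independent components, exponential correlation). The simple-kriging formulas used below for the conditional mean and variance are one standard reading of these conditional characteristics; the code is an illustration, not the author's implementation.

```python
# A numerical sketch of the selection criterion (7).
import numpy as np
from scipy.stats import norm

def rho(t, c=5.0):
    """Exponential correlation function rho(t) = exp(-c*t)."""
    return np.exp(-c * t)

def p_criterion(X, Xs, Ys, mu, sigma2, y_on, c=5.0):
    """Probability P{Xi(X) <= Y^on | data} of (6), computed as the product (7).

    X      : (d,) candidate point,
    Xs     : (n, d) past observation points X_i,
    Ys     : (n, m) objective vectors Y_i,
    mu     : (m,) estimated component means,
    sigma2 : (m,) estimated component variances,
    y_on   : (m,) reference vector Y^on (componentwise minima)."""
    Sigma = rho(np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=2), c)
    Upsilon = rho(np.linalg.norm(Xs - X, axis=1), c)
    w = np.linalg.solve(Sigma, Upsilon)            # Sigma^{-1} * Upsilon
    prob = 1.0
    for j in range(Ys.shape[1]):
        m_j = mu[j] + w @ (Ys[:, j] - mu[j])       # conditional mean
        s2_j = sigma2[j] * (1.0 - Upsilon @ w)     # simple-kriging conditional variance
        s_j = np.sqrt(max(s2_j, 1e-12))            # guard against rounding below zero
        prob *= norm.cdf((y_on[j] - m_j) / s_j)    # factor G(...) of (7)
    return prob
```

Maximizing p_criterion over A, e.g. by multistart local search or over a dense grid, yields the next observation point $X_{n+1}$ of (6).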
4. Strong homogeneity of the multi-objective P-algorithm
To evaluate the influence of data scaling on the whole optimization process, two objective functions are considered:

$$F(X) \quad \text{and} \quad H(X) = (c_1 f_1(X), \ldots, c_m f_m(X))^T + B, \tag{8}$$

where C and B are constant vectors in $\mathbb{R}^m$ that can assume not only finite, but also infinite and infinitesimal values expressed by the numerals defined in [9, 13]. Let the first n function values be computed for both functions at the same points $X_i$, i = 1, …, n, and let the corresponding function values be denoted by $Y_i = F(X_i)$ and $Z_i = H(X_i)$. The next points of computation of the values of F(·) and H(·) are denoted by $X_{n+1}$ and $V_{n+1}$, respectively. We are interested in the strong homogeneity of the P-algorithm, i.e. in the equality $X_{n+1} = V_{n+1}$.
To apply the P-algorithm to a particular problem, the parameters of the statistical model should be estimated accordingly. A sample of observations comprises the values of the objective functions computed at points generated randomly in A with a uniform distribution; let the sample size be denoted by k < n. The parameters of the stochastic function are estimated using these observations, which are also taken into account in the further optimization process. A Gaussian homogeneous isotropic random field is specified by its mean, variance, and correlation function. Normally the correlation function is chosen a priori, depending on the supposed properties of the considered problem. For example, the correlation function $\rho(t) = \exp(-ct)$ is frequently chosen with the parameter $3 \le c \le 7$ for A scaled to the unit hypercube; the larger the number of local minimizers, the larger the value of c chosen. The mean and variance are estimated using standard statistical methods.
Let us consider the estimation of the vectors of mean values and variances of the components of the Gaussian random vector ζ. When only a small number k of observations is available, the following formulae are frequently used to obtain rough estimates of the mean and variance of the components of ζ, although they are well justified only for independent observations:
$$\gamma_j = \frac{1}{k} \sum_{i=1}^{k} \zeta_{ji}, \qquad \vartheta_j^2 = \frac{1}{k} \sum_{i=1}^{k} (\zeta_{ji} - \gamma_j)^2, \tag{9}$$

where $\zeta_j = (\zeta_{j1}, \ldots, \zeta_{jk})^T$ is the vector of the observed values of the j-th component of the random vector ζ.
The estimates of μ and σ², obtained according to (9) using the data $Y_i$ and $Z_i$, are denoted by $\tilde{\mu}, \tilde{\sigma}^2$ and $\bar{\mu}, \bar{\sigma}^2$, respectively. It is obvious that the following equalities hold:

$$\bar{\mu} = (c_1 \tilde{\mu}_1, \ldots, c_m \tilde{\mu}_m)^T + B, \qquad \bar{\sigma}_j = c_j \tilde{\sigma}_j, \quad j = 1, \ldots, m. \tag{10}$$
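For finite c and b the equalities (10) are immediate to verify numerically; a one-component sketch:

```python
# A quick check of (10) for the rough estimates (9): scaling the data as
# z = c*y + b scales the mean estimate to c*mean + b and the standard-deviation
# estimate to c*std (for c > 0). Plain NumPy, finite values only.
import numpy as np

y = np.random.default_rng(1).normal(size=50)
c, b = 4.0, -3.0
z = c * y + b
assert np.isclose(z.mean(), c * y.mean() + b)
assert np.isclose(z.std(), c * y.std())
```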
The maximum likelihood estimates of the mean and variance of the components of the Gaussian random vector ζ in the case of correlated observations are defined by the following formulae:

$$\gamma_j = \frac{\sum_{i=1}^{k} \sum_{h=1}^{k} \zeta_{ji} \tau_{ih}}{\sum_{i=1}^{k} \sum_{h=1}^{k} \tau_{ih}}, \tag{11}$$

$$\vartheta_j^2 = \frac{1}{k} (\zeta_j - \gamma_j I)^T K^{-1} (\zeta_j - \gamma_j I), \tag{12}$$
where the correlation coefficients between $\zeta_{ji}$ and $\zeta_{jh}$ are equal to $\rho_{ih}$, and $\tau_{ih}$ are the elements of the inverse correlation matrix

$$K^{-1} = \begin{pmatrix} \rho_{11} & \ldots & \rho_{1k} \\ \ldots & \ldots & \ldots \\ \rho_{k1} & \ldots & \rho_{kk} \end{pmatrix}^{-1} = \begin{pmatrix} \tau_{11} & \ldots & \tau_{1k} \\ \ldots & \ldots & \ldots \\ \tau_{k1} & \ldots & \tau_{kk} \end{pmatrix}. \tag{13}$$
It can be proved that for the estimates of μ and σ², obtained according to (11)-(12) using the data $Y_i$ and $Z_i$ respectively, the equalities (10) hold. Below we assume that the estimates of the mean value and variance of the homogeneous isotropic Gaussian random field used for the statistical model satisfy the equalities (10).
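A compact sketch of the estimates (11)-(12), exploiting the symmetry of K; numerical conditioning of K is ignored here:

```python
# Estimates (11)-(12) for one component from k correlated observations.
import numpy as np

def ml_estimates(zeta_j, K):
    tau = np.linalg.inv(K)                    # matrix of the elements tau_ih, cf. (13)
    gamma = (tau @ zeta_j).sum() / tau.sum()  # (11); uses the symmetry of tau
    r = zeta_j - gamma                        # residual vector zeta_j - gamma_j * I
    theta2 = r @ tau @ r / len(zeta_j)        # (12)
    return gamma, theta2
```

Substituting $\zeta_j \to c_j \zeta_j + b_j I$ in this sketch scales $\gamma_j$ to $c_j \gamma_j + b_j$ and $\vartheta_j^2$ to $c_j^2 \vartheta_j^2$, which is the content of the equalities (10) for these estimates.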
Theorem. The Gaussian-model-based multi-objective P-algorithm is strongly homogeneous.
Proof. As shown in Section 3, the current point of computation of the vector value of H(X) by the P-algorithm (denoted by $V_{n+1}$) is defined by the following formula:

$$V_{n+1} = \arg\max_{X \in A} \prod_{j=1}^{m} G\left(\frac{z_j^{on} - m_j(X \mid X_i, z_{ji}, i = 1, \ldots, n)}{s_j(X \mid X_i, z_{ji}, i = 1, \ldots, n)}\right). \tag{14}$$
The conditional mean and the conditional variance in (14) depend on the data $X_i$, $Y_i$, i = 1, …, n, as follows:

$$m_j(X \mid X_i, z_{ji}, i = 1, \ldots, n) = \bar{\mu}_j + (z_j - \bar{\mu}_j I)^T \Sigma^{-1} \Upsilon = c_j \tilde{\mu}_j + b_j + (c_j y_j + b_j I - (c_j \tilde{\mu}_j + b_j) I)^T \Sigma^{-1} \Upsilon = c_j m_j(X \mid X_i, y_{ji}, i = 1, \ldots, n) + b_j, \tag{15}$$

$$s_j^2(X \mid X_i, z_{ji}, i = 1, \ldots, n) = c_j^2 \tilde{\sigma}_j^2 - (c_j y_j - c_j \tilde{\mu}_j I)^T \Sigma^{-1} (c_j y_j - c_j \tilde{\mu}_j I) = c_j^2 s_j^2(X \mid X_i, y_{ji}, i = 1, \ldots, n), \tag{16}$$
where Σ is the matrix composed of the correlation coefficients $\rho(\|X_i - X_h\|)$ between $\xi_j(X_i)$ and $\xi_j(X_h)$, i, h = 1, …, n, and $\Upsilon = (\rho(\|X - X_1\|), \ldots, \rho(\|X - X_n\|))^T$.
Replacing $m_j(X \mid X_i, z_{ji}, i = 1, \ldots, n)$ and $s_j^2(X \mid X_i, z_{ji}, i = 1, \ldots, n)$ in (14) by the expressions (15) and (16), and taking into account that $z_j^{on} = c_j y_j^{on} + b_j$, implies the equality

$$V_{n+1} = \arg\max_{X \in A} \prod_{j=1}^{m} G\left(\frac{z_j^{on} - m_j(X \mid X_i, z_{ji}, i = 1, \ldots, n)}{s_j(X \mid X_i, z_{ji}, i = 1, \ldots, n)}\right) = \arg\max_{X \in A} \prod_{j=1}^{m} G\left(\frac{y_j^{on} - m_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)}{s_j(X \mid X_i, y_{ji}, i = 1, \ldots, n)}\right) = X_{n+1}.$$
The equality between the current point of computation of the vector value of H(·), i.e. $V_{n+1}$, and the current point of computation of the vector value of F(·), i.e. $X_{n+1}$, completes the proof of the strong homogeneity of the multi-objective P-algorithm.
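The theorem can also be checked numerically for finite scales. The sketch below reuses p_criterion from the sketch in Section 3 and verifies that the criterion maximized over a random grid selects the same point for the data $Y_i$ and for the scaled data $Z_i$; positive scales are assumed so that the componentwise minima transform consistently.

```python
# A numerical illustration of the theorem (finite C and B only).
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 2, 2, 8
Xs = rng.random((n, d))                           # past points in A = [0, 1]^2
Ys = rng.normal(size=(n, m))                      # objective vectors Y_i
C, B = np.array([3.0, 0.5]), np.array([-2.0, 7.0])
Zs = C * Ys + B                                   # scaled data Z_i = H(X_i)

grid = rng.random((500, d))                       # grid standing in for the maximization over A

def next_point(data):
    mu, s2 = data.mean(axis=0), data.var(axis=0)  # rough estimates (9)
    ref = data.min(axis=0)                        # componentwise minima, the vector Y^on
    return int(np.argmax([p_criterion(x, Xs, data, mu, s2, ref) for x in grid]))

assert next_point(Ys) == next_point(Zs)           # X_{n+1} = V_{n+1}
```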
5. Conclusions

The multi-objective P-algorithm is strongly homogeneous. Therefore the computationally advantageous second-type implementation of the multi-objective P-algorithm in the arithmetic of infinity is correct.
Acknowledgement
The research was supported by the Research Council of Lithuania under Grant No. MIP-063/2012.
References
1. Calvin, J. M.; Zilinskas, A. (2000). A one-dimensional P-algorithm with convergence rate O(n^{−3+δ}) for smooth functions, Journal of Optimization Theory and Applications 106: 297–307. http://dx.doi.org/10.1023/A:1004699313526
2. De Cosmis, S.; De Leone, R. (2012). The use of grossone in mathematical programming and operations research, Applied Mathematics and Computation 218(16): 8029–8038. http://dx.doi.org/10.1016/j.amc.2011.07.042
3. Elsakov, S. M.; Shiryaev, V. I. (2010). Homogeneous algorithms for multiextremal optimization, Computational
Mathematics and Mathematical Physics 50(10): 1642–1654. http://dx.doi.org/10.1134/S0965542510100027
4. Gutmann, H. (2001). A radial basis function method for global optimization, Journal of Global Optimization 19: 201–227. http://dx.doi.org/10.1023/A:1011255519438
5. Keane, A.; Scanlan, J. (2007). Design search and optimization in aerospace engineering, Phil. Trans. R. Soc. A 365: 2501–2529. http://dx.doi.org/10.1098/rsta.2007.2019
6. Knowles, J. (2006). ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems, IEEE Trans. Evolutionary Computation 10(1): 50–66.
http://dx.doi.org/10.1109/TEVC.2005.851274
7. Miettinen, K. (1999). Nonlinear multiobjective optimization, Springer.
8. Nakayama, H. (2009). Sequential approximate multiobjective optimization using computational intelligence,
Springer.
9. Sergeyev, Ya. D. (2005). A few remarks on philosophical foundations of a new applied approach to infinity, Scheria 26–27: 63–72.
10. Sergeyev, Ya. D. (2007). Blinking fractals and their quantitative analysis using infinite and infinitesimal numbers, Chaos, Solitons & Fractals 33(1): 50–75. http://dx.doi.org/10.1016/j.chaos.2006.11.001
11. Sergeyev, Ya. D. (2009). Numerical computations and mathematical modeling with infinite and infinitesimal
numbers, J. Appl. Math. Comput. 29: 177–195. http://dx.doi.org/10.1007/s12190-008-0123-7
12. Sergeyev, Ya. D. (2009). Computer system for storing infinite, infinitesimal, and finite quantities and executing
arithmetical operations with them. EU patent 1728149.
13. Sergeyev, Ya. D. (2010). Lagrange lecture: methodology of numerical computations with infinities and infinitesimals, Rendiconti del Seminario Matematico dell'Università e del Politecnico di Torino 68(2): 95–113.
14. Stein, M. (1999). Interpolation of spatial data, some theory of kriging, Springer.
http://dx.doi.org/10.1007/978-1-4612-1494-6
15. Törn, A.; Zilinskas, A. (1989). Global optimization, Lecture Notes in Computer Science 350: 1–255.
http://dx.doi.org/10.1007/3-540-50871-6
16. Zilinskas, A. (1982). Axiomatic approach to statistical models and their use in multimodal optimization theory,
Mathematical Programming 22: 104–116. http://dx.doi.org/10.1007/BF01581029
17. Zilinskas, A. (1985). Axiomatic characterization of a global optimization algorithm and investigation of its
search strategies, Operations Research Letters 4: 35–39. http://dx.doi.org/10.1016/0167-6377(85)90049-5
18. Zilinskas, A. (2010). On similarities between two models of global optimization: statistical models and radial basis functions, Journal of Global Optimization 48(1): 173–182. http://dx.doi.org/10.1007/s10898-009-9517-9
19. Zilinskas, A. (2011). Small sample estimation of parameters for Wiener process with noise, Communications in
Statistics, Theory and Methods 40: 3020–3028. http://dx.doi.org/10.1080/03610926.2011.562788
20. Zilinskas, A. (2012). On strong homogeneity of two global optimization algorithms based on statistical models
of multimodal objective functions, Applied Mathematics and Computation 218(16): 8131–8136.
http://dx.doi.org/10.1016/j.amc.2011.07.051
21. Zilinskas, A. (2012). A statistical model-based algorithm for black-box multi-objective optimization, International Journal of Systems Science, accepted.
22. Zilinskas, A.; Zilinskas, J. (2010). Interval arithmetic based optimization in nonlinear regression, Informatica
21(1): 149–158.