1 On Using the Hypervolume Indicator to Compare Pareto Fronts

On Using the Hypervolume Indicator to Compare Pareto Fronts: Applications to
Multi-Criteria Optimal Experimental Design
Yongtao Cao (a), Byran J. Smucker (b), Timothy J. Robinson (c,1)
(a) Department of Mathematics, Indiana University of Pennsylvania, Indiana, PA USA
(b) Department of Statistics, Miami University, Oxford, OH USA
(c) Department of Statistics, University of Wyoming, Laramie, WY USA
Abstract
The Pareto approach to optimal experimental design simultaneously considers
multiple objectives by constructing a set of Pareto optimal designs while explicitly
considering trade-offs between opposing criteria. Various algorithms have been
proposed to populate Pareto fronts of designs, and evaluating and comparing these
fronts—and by extension the algorithms that produce them—is crucial. In this paper,
we first propose a framework for comparing algorithm-generated Pareto fronts based
on a refined hypervolume indicator. We then theoretically address how the choice of
the reference point affects comparisons of Pareto fronts, and demonstrate that our
approach is Pareto compliant. Based on our theoretical investigation, we provide
rules for choosing reference points when two-dimensional Pareto fronts are
compared. Because theoretical results for three-dimensional fronts are difficult to
obtain, we propose an empirical rule for the three-dimensional case by making an
analogy to the rules for two dimensions. We also consider the use of our procedure
in evaluating the progress of a front-constructing algorithm, and illustrate our work
with two examples from the literature.
Keywords: Pareto front, multi-objective optimization, design of experiments, point
exchange
(1) Corresponding author: Department of Statistics, University of Wyoming, Laramie, WY 82071; e-mail address: [email protected]
1. Introduction
Most experiments are conducted with multiple, competing objectives in mind. Therefore,
designing under a single criterion may be inadequate. For instance, Gilmour and Trinca (2012)
show via examples that the traditional D-optimal designs allow no ability to estimate pure error;
on the other hand, the best design for estimating pure error performs very poorly in terms of the
D-criterion. In situations like this, the final choice of an experimental design should reflect
appropriate compromise across the criteria of interest. But choosing a design based upon the
simultaneous optimization of multiple design criteria is often a difficult problem. Without a
priori knowledge about the interdependencies between the criteria, the conventional compound
design and constrained design approaches (e.g. Cook and Wong, 1994) for solving multiple-criteria
optimal design problems could lead to relatively poor solutions (Coello Coello et al., 2007; Das
and Dennis, 1997). Furthermore, the tradeoff between the objectives cannot be fully understood
without simultaneously considering all criteria.
The Pareto front approach (Park, 2009; Lu et al., 2011; Sambo et al., 2013) not only
accounts for the varying interest and importance of the various objectives simultaneously but
also provides the most insight about the tradeoffs between the alternative choices, which in turn
enables better decision making. This procedure involves finding a set of Pareto optimal designs
and then using the experimenter’s evaluation of the existing trade-offs between the designs to
ultimately choose between them. The set of criterion vectors associated with the Pareto optimal set of
designs is known as the Pareto front. The shape of the Pareto front provides useful information
about the amount of tradeoff between the different criteria and how much compromise is needed
from some criteria to improve others. Critical to this approach is the assumption that the Pareto
front has been sufficiently populated. The true Pareto front, however, is rarely known and hence
any algorithm used to generate a set of designs (e.g. the exchange algorithms of Lu et al., 2011
and Sambo et al., 2014, or the multi-objective evolutionary algorithm of Park 2009) merely
results in an approximation of the true Pareto front. The quality of this approximation depends
upon (1) the proximity of the points on the approximated front to the points on the true Pareto
front; and (2) the diversity of the points on the approximated front, where more diversity is
typically better.
These characteristics are important in both offline settings, in which one
compares multiple fronts produced by competing algorithms, and online settings, in which a
front is evaluated as it evolves with the rate of this evolution potentially used as a termination
criterion. In this article, we are concerned with how Pareto fronts are evaluated and compared,
rather than with algorithm development.
A popular measure of the quality of an approximated Pareto front is the front’s
hypervolume (Zitzler and Thiele, 1998), which measures the size of the space enclosed by all
solutions on the Pareto front and a user-defined reference point (for a formal definition see
Section 2). This measure of Pareto front quality has gained increasing interest in recent years
and has become the standard offline indicator to evaluate the performance of multi-objective
optimization algorithms (Zitzler et al., 2008). It has also been used as an online indicator to guide
the optimization process (Knowles et al., 2003; Zitzler and Künzli, 2004; Emmerich et al., 2005;
Beume et al., 2007; Bader and Zitzler, 2011). Its success and popularity are due to the fact that it
simultaneously accounts for proximity and diversity and is strictly Pareto compliant. This means
that whenever one Pareto front approximation dominates another, the hypervolume of the former
is greater than that of the latter. One significant drawback to this measure is that its magnitude is
dependent upon an arbitrarily chosen reference point. We will return to this point, in detail, in
Sections 3 and 4.
Though the hypervolume measure is a well-established indicator of a front’s quality, it
has only recently been introduced to the statistics literature. Lu and Anderson-Cook (2012)
develop a hypervolume-like indicator within the context of optimal experimental design, but
there are several issues with their proposed measure: (1) they use different reference points for
different approximate Pareto fronts which leads to unfair comparisons; (2) they choose the
reference point in a way that does not permit a contribution to the hypervolume by all points; (3)
Pareto compliance is not maintained because dominated points are used to compare an
approximate front to the reference front; and (4) when used in an online setting, their proposed
procedure can suggest decreases in Pareto front quality even as a front evolves in the context of a
front construction algorithm. These issues are explained in more detail in Sections 2.2 and 4.1.
In this paper we address the aforementioned issues and propose an improved
hypervolume-based measure for use in Pareto optimal design. In Section 2, we review the notion
of Pareto optimal design, describe the computation of the hypervolume indicator, and explain in
more detail the problems with the outstanding versions of the measure as well as our solutions to
those problems. We also propose an interpretable scalar metric for describing how well a Pareto
front is approximated and illustrate how the proposed indicator can be used in comparing
competing Pareto fronts. In Section 3, we develop theoretical properties regarding the influence
of the reference point on the calculation of the hypervolume indicator. Guidance is provided for
choosing the reference point, in the presence of two criteria, based on our theoretical
investigations, and we suggest a similar approach for the three-dimensional case. In Section 4,
we illustrate our proposed procedure in both an offline and online setting in the context of multi-objective optimal experimental design, consider the influence of the reference point in a three-dimensional example, and explore the reasons for unintuitive decreases in the online uses of the
hypervolume measure. In Section 5 we provide a recap and discussion.
Though we proceed with experimental design as the context, we note that our results and
conclusions are more generally applicable to any multi-objective optimization setting in which
the Pareto approach is employed and similar algorithms are used.
2. The hypervolume indicator and procedure for comparing discrete Pareto fronts
Without loss of generality, assume that the goal of a general multiple-criteria design optimization problem is to simultaneously maximize $C \ge 2$ design criteria. Let $\mathbf{f}(\xi) = (f_1(\xi), f_2(\xi), \ldots, f_C(\xi))'$ denote the $C \times 1$ vector of criterion values for design $\xi$. Let $\Omega$ denote the search space of all feasible designs. A design $\xi_1 \in \Omega$ is said to dominate $\xi_2 \in \Omega$ if $f_j(\xi_1) \ge f_j(\xi_2)$ for all $j \in \{1, \ldots, C\}$ and there exists at least one $j \in \{1, \ldots, C\}$ for which $f_j(\xi_1) > f_j(\xi_2)$. In this case, the criterion vector $\mathbf{f}(\xi_1)$ is said to dominate the criterion vector $\mathbf{f}(\xi_2)$ and we write $\xi_1 \succ \xi_2$. If $f_j(\xi_1) \ge f_j(\xi_2)$ for all $1 \le j \le C$, we say $\xi_1$ weakly dominates $\xi_2$ and we write $\xi_1 \succeq \xi_2$. Henceforth, the criterion vector corresponding to a particular design is referred to as a point in the criterion space. A design is Pareto optimal if and only if no other design dominates it, and its corresponding criterion vector is then a non-dominated vector. The set of Pareto optimal designs constitutes the Pareto optimal set and the corresponding criterion vectors are said to be on the Pareto front or frontier. A good overview of the Pareto-related concepts is available in Coello Coello et al. (2007).
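The dominance relations translate directly into code. The following is a minimal Python sketch, assuming that every criterion is maximized and that criterion vectors are stored as tuples; the function names `dominates`, `weakly_dominates` and `pareto_front` are illustrative choices and do not come from the paper.

```python
def weakly_dominates(u, v):
    """True if u is at least as good as v on every (maximized) criterion."""
    return all(a >= b for a, b in zip(u, v))


def dominates(u, v):
    """True if u weakly dominates v and is strictly better on at least one criterion."""
    return weakly_dominates(u, v) and any(a > b for a, b in zip(u, v))


def pareto_front(points):
    """Return the unique non-dominated criterion vectors, in first-seen order."""
    unique = list(dict.fromkeys(map(tuple, points)))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Hypothetical criterion vectors for five designs (two criteria, both maximized).
candidates = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
print(pareto_front(candidates))
```

The filter is quadratic in the number of points, which is usually acceptable for the small, discrete fronts considered here.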
Given the experimental design setting, we treat every Pareto front as finite and discrete. We then assume that a given point on the Pareto front can be written as the ordered pair $(f_1(\xi), f_2(\xi)) \in \mathbb{R}^2$ in two dimensions or the ordered triplet $(f_1(\xi), f_2(\xi), f_3(\xi)) \in \mathbb{R}^3$ in three dimensions, where $f_1(\xi)$, $f_2(\xi)$ and $f_3(\xi)$ correspond to design criterion values 1, 2 and 3 respectively. In the remainder of this paper we operate in two or three dimensions, and a Pareto front with cardinality $p$ (the number of points in a Pareto front) is written as
$$PF = \{(f_1(\xi_1), f_2(\xi_1)), \ldots, (f_1(\xi_p), f_2(\xi_p))\}$$
in two dimensions or
$$PF = \{(f_1(\xi_1), f_2(\xi_1), f_3(\xi_1)), \ldots, (f_1(\xi_p), f_2(\xi_p), f_3(\xi_p))\}$$
in three. Moreover, $\succ$ and $\succeq$ are also defined on sets of criterion vectors; e.g. $PF_1 \succ PF_2$ if every point in $PF_2$ is dominated by at least one point in $PF_1$.
To simplify our theoretical investigation, in the remainder of this paper we will standardize the criterion vector to $[0,1]^C$, rather than using the original scale. Specifically, for $PF \subset \mathbb{R}^C$, we standardize by scaling every criterion $f_j(\xi)$ to $f_j(\xi)^* \in [0,1]$ with
$$f_j(\xi)^* = \frac{f_j(\xi) - f_j(\xi)_{worst}}{f_j(\xi)_{best} - f_j(\xi)_{worst}}, \quad j = 1, 2, \ldots, C, \tag{1}$$
where $f_j(\xi)_{worst}$ corresponds to the minimum (or maximum, if the criterion is to be minimized) observed value of criterion $j$ within $PF$ and $f_j(\xi)_{best}$ denotes the maximum (minimum) observed value of criterion $j$ within $PF$. Note that this scaling maps the original criterion space $\mathbb{R}^C$ to $[0,1]^C$.
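As a concrete illustration of the scaling in equation (1), the sketch below standardizes a small front in Python. The function name `standardize` and its `basis` and `maximize` arguments are assumptions made for this example; `basis` lets the best and worst values be taken from a different front (for instance a combined front), and the sketch assumes that each criterion takes at least two distinct values in the basis.

```python
def standardize(points, basis=None, maximize=None):
    """Scale each criterion to [0, 1] using the worst and best values in `basis`."""
    basis = points if basis is None else basis
    n_criteria = len(basis[0])
    maximize = [True] * n_criteria if maximize is None else maximize
    worst, best = [], []
    for j in range(n_criteria):
        column = [p[j] for p in basis]
        low, high = min(column), max(column)
        if low == high:
            raise ValueError(f"criterion {j + 1} is constant; equation (1) is undefined")
        # For a maximized criterion the worst value is the minimum, and vice versa.
        worst.append(low if maximize[j] else high)
        best.append(high if maximize[j] else low)
    return [
        tuple((p[j] - worst[j]) / (best[j] - worst[j]) for j in range(n_criteria))
        for p in points
    ]


# Hypothetical raw criterion values on two different scales.
raw_front = [(10.0, 0.2), (20.0, 0.1), (30.0, 0.0)]
print(standardize(raw_front))
```

After scaling, the worst observed value of every criterion maps to 0 and the best to 1, so minimized criteria can be treated as maximized ones on the standardized scale.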
2.1 The hypervolume indicator
As its name suggests, the hypervolume of a given Pareto front measures the volume of the criterion space that is weakly dominated by the points on the Pareto front. In order to define the hypervolume indicator, a bounded space has to be made by the Pareto front and a user-defined reference point. The hypervolume indicator (Zitzler and Thiele, 1998) for $PF \subset \mathbb{R}^C$, denoted as $I_H(PF)$, is dependent upon the reference point $\mathbf{r} = (r_1, r_2, \ldots, r_C)' \in \mathbb{R}^C$ and is formalized as
$$I_H(PF, \mathbf{r}) = \Lambda\left(\bigcup_{\mathbf{s} \in PF} \mathrm{space}(\mathbf{s}, \mathbf{r})\right),$$
with $\mathrm{space}(\mathbf{s}, \mathbf{r}) = \{\mathbf{v} \in \mathbb{R}^C : \mathbf{r} \prec \mathbf{v} \preceq \mathbf{s}\}$ being the criterion space (rectangles in 2 dimensions and hyper-rectangles in dimensions > 2) containing all criterion vectors, $\mathbf{v} \in \mathbb{R}^C$, that are weakly dominated by the elements $\mathbf{s} \in PF$ and themselves dominate $\mathbf{r} = (r_1, r_2, \ldots, r_C)'$, where $r_i$ is the $i$th coordinate of the reference point, and $\Lambda$ is the usual Lebesgue measure. Note that strictly speaking, the reference point could have positive elements and the individual components of $\mathbf{r} = (r_1, r_2, \ldots, r_C)'$ need not be the same. However, such a reference point would lose practical meaning, and so we restrict the elements to be the same and non-positive.
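For two criteria the union of rectangles can be measured with a single sweep. The sketch below is one simple way to do this in Python, not the authors' implementation; it assumes both criteria are maximized and that `ref` holds the actual (non-positive) coordinates of the reference point. The name `hypervolume_2d` is an illustrative choice.

```python
def hypervolume_2d(front, ref):
    """Area weakly dominated by `front` and dominating `ref` (maximization)."""
    r1, r2 = ref
    area, best_f2 = 0.0, r2
    # Visit points from the largest criterion-1 value down; each point that raises
    # the best criterion-2 level seen so far adds one horizontal strip.
    for f1, f2 in sorted(front, key=lambda p: (-p[0], -p[1])):
        if f1 > r1 and f2 > best_f2:
            area += (f1 - r1) * (f2 - best_f2)
            best_f2 = f2
    return area


# The two extreme points of the standardized space with reference point (-1, -1).
print(hypervolume_2d([(0.0, 1.0), (1.0, 0.0)], (-1.0, -1.0)))
```

Because dominated points never raise the running criterion-2 level, they add nothing to the area, which matches the definition as the measure of a union.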
2.2 Issues when comparing discrete Pareto fronts
When several competing algorithms exist for populating the Pareto front, an important
question is how to make comparisons among the algorithms. This is especially difficult when, as
is typically the case, the shape and cardinality of the true Pareto front is unknown. In Section 2.3
we propose a framework for algorithm comparison that avoids the problems described in the
Introduction. First, however, we present these problems in more detail.
The first two problems raised in the Introduction concern the reference point. Indeed, the
primary and overarching issue is that the hypervolume of a Pareto front is dependent upon a
user-defined reference point, and the subsequent ranking of competing Pareto fronts is dependent
upon the location of this point. When the true Pareto front is unknown, the reference point is
usually chosen as either the nadir point of the investigated Pareto front (Zitzler et al., 2007; Lu
and Anderson-Cook, 2012) or a point that is slightly worse than the nadir (Zitzler et al., 2008;
Sambo et al., 2013). (Note that the nadir point is defined as the vector with the worst values of
each criterion as its elements.) Auger et al. (2009, 2010, 2012), Brockhoff (2010), and Friedrich
et al. (2013) consider reference point selection but the assumptions made are unrealistic for
design optimization. Specifically, these works assume (1) that the user is only interested in a pre-specified number of solutions and (2) that one objective can be explicitly formulated as a
continuous and differentiable function of the remaining objectives. While these works provide
insight into the general problem by superimposing a pre-knowledge of the theoretical form of the
true, unknown Pareto front, they are of little practical use to solve the design selection problem.
Another issue regarding Pareto front comparison is accounting for point dominance in the
formulation of a reference front. Knowles et al. (2006) suggest that such a reference front can be
obtained by combining all the competing fronts based upon Pareto dominance. However, when
comparing the individual Pareto fronts to the reference front or to each other, neither the multi-objective optimization community (Coello Coello et al., 2007) nor the statistics community (Lu
and Anderson-Cook, 2012; Sambo et al., 2013) has noticed that if points on the individual Pareto
fronts that are dominated by the reference front are not removed, the comparison will not be
Pareto compliant. In other words, if this adjustment is not made, the comparison procedure
cannot reliably distinguish between Pareto fronts in terms of quality as measured by the
hypervolume indicator. This issue is explored more concretely in Section 4.1.
The aforementioned problems deal fundamentally with making fair comparisons among
two or more front approximations, which is of interest in offline settings in which the fronts have
been fully constructed and their quality is awaiting adjudication. Evaluation and comparison in
online settings (i.e. situations where an algorithm is evaluated as it produces an increasingly
dense approximation of the true front) is also of interest to indicate the extent to which the
front’s quality is increasing. This allows the algorithm’s progress to be tracked and terminated
when substantial improvements cease. Conceptually, any measure reflecting front quality should
increase in magnitude as the front evolves. However, recent empirical studies (Judt et al., 2012,
2013) along with our empirical results using the method of Lu and Anderson-Cook (2012)
suggest that existing hypervolume measures can exhibit decreases in magnitude while the Pareto
front quality itself is increasing. We will take this issue up further in Section 4.4.
2.3 Comparing Pareto Fronts Using the Contribution Rate Indicator
In this section, we propose a method for comparing competing Pareto fronts utilizing the
contribution rate indicator. For a given multiple criteria optimal experimental design problem,
we assume that the true Pareto front exists but is unknown, and that $n$ approximation Pareto fronts, $PF_1, PF_2, \ldots, PF_n$, are obtained by executing either $n$ competing algorithms or the same algorithm with $n$ different input settings. We propose comparing Pareto fronts via the following procedure:
(1) Combine all the Pareto fronts, $PF_1, PF_2, \ldots, PF_n$, according to the definition of Pareto domination into a single set of non-dominated criterion vectors, denoted by $PF_S$. Since the true Pareto front is generally unknown and $PF_S$ is the best available approximation, $PF_S$ is regarded as a surrogate for the true Pareto front. Obtain $PF_1', PF_2', \ldots, PF_n'$ such that $PF_i' = PF_i \cap PF_S$. That is, $PF_i'$ only contains those criterion vectors from $PF_i$ which are not collectively dominated by $PF_S$.
(2) Standardize $PF_S$ by employing equation (1). Note in this step each $PF_i'$ is also standardized.
(3) Calculate the hypervolume indicator for each of the approximation Pareto fronts, denoted by $I_H(PF_i', \mathbf{r})$, using the standardized $PF_i'$ along with the hypervolume for the standardized $PF_S$, denoted by $I_H(PF_S, \mathbf{r})$.
(4) Compute the contribution rate associated with each of the approximations as:
$$CR(PF_i', PF_S) = \frac{I_H(PF_i', \mathbf{r})}{I_H(PF_S, \mathbf{r})}. \tag{2}$$
Note that values of $CR(PF_i', PF_S)$ close to 1 suggest that the $i$th approximation front is very close to the surrogate Pareto front. The contribution rate for Pareto front $PF_i$ can be thought of as the ‘approximating efficiency’ associated with $PF_i$. If $CR(PF_i', PF_S) > CR(PF_j', PF_S)$ with $1 \le i \ne j \le n$, $PF_i$ is a better approximation than $PF_j$, and the comparison of the contribution rate gives a sense of the magnitude of the difference. This contribution rate indicator, of course, is itself only an approximation since $PF_S$ is not necessarily the true Pareto front. We present an application of the proposed procedure in Section 4.1.
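The four steps can be sketched in a few lines of Python for the two-criterion case. This is a hedged illustration rather than the authors' code: it assumes both criteria are maximized, that the surrogate front contains at least two distinct values of each criterion, and that the reference point is (-r, -r) on the standardized scale. All function names and the two example fronts are hypothetical.

```python
def dominates(u, v):
    """True if u dominates v when every criterion is maximized."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def nondominated(points):
    """Unique non-dominated criterion vectors, in first-seen order."""
    unique = list(dict.fromkeys(map(tuple, points)))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def hypervolume_2d(front, ref):
    """Area weakly dominated by `front` and dominating `ref` (maximization)."""
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front, key=lambda p: (-p[0], -p[1])):
        if f1 > ref[0] and f2 > best_f2:
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


def contribution_rates(fronts, r):
    """Contribution rate of each front with respect to the reference point (-r, -r)."""
    # Step 1: surrogate front and the reduced fronts (points that survive in it).
    surrogate = nondominated([p for front in fronts for p in front])
    keep = set(surrogate)
    reduced = [[p for p in dict.fromkeys(map(tuple, front)) if p in keep]
               for front in fronts]
    # Step 2: standardize with the worst and best values observed in the surrogate.
    worst = [min(p[j] for p in surrogate) for j in range(2)]
    best = [max(p[j] for p in surrogate) for j in range(2)]

    def scale(p):
        return tuple((p[j] - worst[j]) / (best[j] - worst[j]) for j in range(2))

    # Steps 3 and 4: hypervolumes on the standardized scale and their ratio.
    ref = (-r, -r)
    hv_surrogate = hypervolume_2d([scale(p) for p in surrogate], ref)
    return [hypervolume_2d([scale(p) for p in front], ref) / hv_surrogate
            for front in reduced]


front_a = [(0, 10), (5, 5), (10, 0)]
front_b = [(0, 10), (4, 4), (10, 0)]  # (4, 4) is dominated by (5, 5) in front_a
print(contribution_rates([front_a, front_b], r=0.5))
```

In this example the first front coincides with the surrogate, so its contribution rate is 1, while the second front loses its dominated interior point before its hypervolume is computed.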
3. The choice of the reference point
As a starting point for the investigation of the reference point’s influence on the
comparison of Pareto fronts in higher dimensions, we first consider how the choice of the
reference point will change the hypervolume indicator when there are just two criteria.
3.1 Choice of reference point in two dimensions
The first result shows how the value of the hypervolume indicator for a 2-dimensional
Pareto front changes as the reference point changes.
Lemma 1. For a 2-dimensional discrete Pareto front in $[0,1]^2$ of size $p$, with respect to a reference point $\mathbf{r} = (r_1, r_2)'$ with $r_1 \le 0$ and $r_2 \le 0$, the hypervolume indicator can be written as
$$I_H(PF, \mathbf{r}) = \sum_{i=1}^{p} f_1(\xi_i) f_2(\xi_i) - \sum_{i=1}^{p-1} f_1(\xi_i) f_2(\xi_{i+1}) - r_1 f_2(\xi_1) - r_2 f_1(\xi_p) + r_1 r_2. \tag{3}$$
Proof. Let $PF = \{(f_1(\xi_1), f_2(\xi_1)), \ldots, (f_1(\xi_p), f_2(\xi_p))\} \subset [0,1]^2$ be a 2-dimensional Pareto front with $p \ge 2$ solutions. Without loss of generality, assume that the solutions are sorted by criterion 1 in ascending order, i.e., $f_1(\xi_i) < f_1(\xi_{i+1})$ for $i = 1, \ldots, p-1$. Then for a given reference point $\mathbf{r} = (r_1, r_2)'$ with $r_1 \le 0$ and $r_2 \le 0$, the definition of the hypervolume indicator in Section 2.1 can be written as
$$\begin{aligned} I_H(PF, \mathbf{r}) &= \left(f_1(\xi_1) - r_1\right)\left(f_2(\xi_1) - r_2\right) + \left(f_1(\xi_2) - f_1(\xi_1)\right)\left(f_2(\xi_2) - r_2\right) \\ &\quad + \left(f_1(\xi_3) - f_1(\xi_2)\right)\left(f_2(\xi_3) - r_2\right) + \cdots + \left(f_1(\xi_p) - f_1(\xi_{p-1})\right)\left(f_2(\xi_p) - r_2\right) \\ &= \sum_{i=1}^{p} f_1(\xi_i) f_2(\xi_i) - \sum_{i=1}^{p-1} f_1(\xi_i) f_2(\xi_{i+1}) - r_1 f_2(\xi_1) - r_2 f_1(\xi_p) + r_1 r_2. \qquad \square \end{aligned}$$
Two implications are clear from Lemma 1. First, since the solutions are ordered by criterion 1, $f_1(\xi_p)$ is the largest value of criterion 1 and $f_2(\xi_1)$ is the largest value of criterion 2. Therefore, when $\mathbf{r}$ is changed, the change in the value of $I_H(PF, \mathbf{r})$ will depend upon the reference point and the largest value of each criterion. This is illustrated in Figure 1, where a five point Pareto front is displayed for a hypothetical two-criterion problem. If we define the boundary points as the optimal solutions with respect to a single criterion, then in Figure 1 the blue area represents the contribution of the left boundary point with respect to the reference point $\mathbf{r} = (r_1, r_2)$, the yellow area represents the contribution of the right boundary point, and the red area the contribution of the reference point alone. The second implication is that when $\mathbf{r} = (0, 0)$ the contributions of the two boundary points as well as the reference point itself will be eliminated entirely.
Figure 1. Illustration of the hypervolume for a hypothesized 2-dimensional Pareto front consisting of 5
points with nadir point (0,0) and reference point $\mathbf{r} = (r_1, r_2)$. The blue area is the hypervolume
contributed by the left extreme point; the red area is the hypervolume contributed by the reference point;
the yellow area is the hypervolume contributed by the right extreme point; and the green area is the
hypervolume contributed by the interior of the Pareto front.
Lemma 2. The difference in hypervolume between two 2-dimensional Pareto fronts $PF_1 \subset [0,1]^2$ and $PF_2 \subset [0,1]^2$, with respect to a common reference point $\mathbf{r} = (r_1, r_2)'$ with $r_1 \le 0$ and $r_2 \le 0$, is given by
$$\begin{aligned} I_H(PF_1, \mathbf{r}) - I_H(PF_2, \mathbf{r}) &= \left[\sum_{i=1}^{p_1} f_1(\xi_i) f_2(\xi_i) - \sum_{i=1}^{p_1-1} f_1(\xi_i) f_2(\xi_{i+1})\right] - \left[\sum_{i=1}^{p_2} f_1(\eta_i) f_2(\eta_i) - \sum_{i=1}^{p_2-1} f_1(\eta_i) f_2(\eta_{i+1})\right] \\ &\quad - r_1\left[f_2(\xi_1) - f_2(\eta_1)\right] - r_2\left[f_1(\xi_{p_1}) - f_1(\eta_{p_2})\right], \end{aligned} \tag{4}$$
where $\xi_i$, $i = 1, \ldots, p_1$ are designs in $PF_1$ and $\eta_i$, $i = 1, \ldots, p_2$ are designs in $PF_2$, each front being sorted by criterion 1 in ascending order as in Lemma 1.
Proof. Equation (4) is established directly by writing $I_H(PF_1, \mathbf{r})$ and $I_H(PF_2, \mathbf{r})$ using Equation (3). □
Lemma 2 implies that when using the hypervolume indicator to compare two Pareto
fronts, the difference will not only depend on the fronts but also upon the reference point. Now
we turn to several results relevant to the on-line and off-line use of hypervolume to compare
Pareto fronts.
Theorem 1. If two 2-dimensional Pareto fronts $PF_1 \subset [0,1]^2$ and $PF_2 \subset [0,1]^2$ have the same maximum criterion 1 value and the same maximum criterion 2 value, then $I_H(PF_1, \mathbf{r}) - I_H(PF_2, \mathbf{r})$ is independent of $\mathbf{r}$.
Proof. As in Lemma 1 we assume that $f_1(\xi_{p_1})$ is the maximum criterion 1 value and $f_2(\xi_1)$ is the maximum criterion 2 value in $PF_1$, and likewise $f_1(\eta_{p_2})$ and $f_2(\eta_1)$ in $PF_2$. By Lemma 2, if $f_1(\xi_{p_1}) = f_1(\eta_{p_2})$ and $f_2(\xi_1) = f_2(\eta_1)$ then the terms involving $r_1$ and $r_2$ in equation (4) vanish, so $I_H(PF_1, \mathbf{r}) - I_H(PF_2, \mathbf{r})$ does not depend on $\mathbf{r}$. □
Theorem 1 implies that if two Pareto fronts have the same boundary points (e.g. both find
the same optimal design for each of the criteria individually), the reference point plays no role in
the comparison of the fronts.
Theorem 2. Partition $PF_1 \subset [0,1]^2$ as $PF_1 = A_1 \cup B_1$ and $PF_2 \subset [0,1]^2$ as $PF_2 = A_2 \cup B_2$. If $A_1 = A_2$ and $B_1 \succ B_2$, then $I_H(PF_1, \mathbf{r}) > I_H(PF_2, \mathbf{r})$ holds for any reference point $\mathbf{r} = (r_1, r_2)'$ with $r_1 < 0$ and $r_2 < 0$.
Proof. The proof can be broken into three cases:
(i) $A_1 = A_2 = \emptyset$ and $B_1 \succ B_2$. In this case, $PF_1 \succ PF_2$ and by the definition of Pareto dominance, we have that the set bounded by $\mathbf{r}$ and $PF_2$ is a proper subset of the set bounded by $\mathbf{r}$ and $PF_1$. Taking the Lebesgue measure (i.e., the hypervolume) of the two sets, we obtain $I_H(PF_1, \mathbf{r}) > I_H(PF_2, \mathbf{r})$.
(ii) $A_1 = A_2 \ne \emptyset$ and $B_1 \succ B_2$, where neither $B_1$ nor $B_2$ is empty. In this case $PF_1$ and $PF_2$ share a certain number of solutions, but of the remaining solutions in each front, each solution in $PF_2$ is dominated by at least one solution in $PF_1$. The proof of this case is very similar to the arguments given in case (i), therefore we have $I_H(PF_1, \mathbf{r}) > I_H(PF_2, \mathbf{r})$.
(iii) $A_1 = A_2 \ne \emptyset$, $B_2 = \emptyset$ and $B_1$ is not empty. In this case, $PF_2$ is only a portion of $PF_1$. Here $A_1 = A_2 \ne \emptyset$ gives $I_H(A_1, \mathbf{r}) = I_H(A_2, \mathbf{r}) > 0$, $B_2 = \emptyset$ gives $I_H(B_2, \mathbf{r}) = 0$, and $B_1 \ne \emptyset$ gives $I_H(B_1, \mathbf{r}) > 0$. Since every point of $B_1$ is non-dominated within $PF_1$, it weakly dominates a region of positive measure that is not covered by $A_1$, which yields $I_H(PF_1, \mathbf{r}) > I_H(A_1, \mathbf{r}) = I_H(PF_2, \mathbf{r})$. □
Note that $r_1 < 0$ and $r_2 < 0$ are required. Otherwise, if $\mathbf{r} = (0, 0)$ and $B_1$ contains only points for which one criterion value is 0, such as $(0, f_2(\xi))$ and/or $(f_1(\xi), 0)$, then $I_H(B_1, \mathbf{r}) = 0 = I_H(B_2, \mathbf{r})$, which yields $I_H(PF_1, \mathbf{r}) = I_H(PF_2, \mathbf{r})$. This clearly violates Pareto compliance.
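The need for strictly negative reference coordinates can be seen in a small numerical example. The Python sketch below uses hypothetical fronts (they are not the fronts of Figure 2) in which one front extends the other by the two boundary points of the standardized space; the helper `hypervolume_2d` is an illustrative sweep for two maximized criteria.

```python
def hypervolume_2d(front, ref):
    """Area weakly dominated by `front` and dominating `ref` (maximization)."""
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front, key=lambda p: (-p[0], -p[1])):
        if f1 > ref[0] and f2 > best_f2:
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


shared = [(0.5, 0.5)]                   # points common to both fronts
boundary = [(0.0, 1.0), (1.0, 0.0)]     # extra points found only by the first front
front_1, front_2 = shared + boundary, shared

for ref in [(0.0, 0.0), (-0.1, -0.1)]:
    print(ref, hypervolume_2d(front_1, ref), hypervolume_2d(front_2, ref))
```

With the reference point at the origin the two hypervolumes coincide even though the first front is strictly richer, whereas any strictly negative reference point separates them.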
The three cases are illustrated graphically in Figure 2. With respect to a common reference point, we see that $PF_1$ has the maximum hypervolume indicator because it dominates $PF_2$, $PF_3$ and $PF_4$. We also have that $I_H(PF_2, \mathbf{r}) > I_H(PF_3, \mathbf{r})$ because solutions 4, 5, and 6 in $PF_3$ are collectively dominated by the solutions 4 and 5 in $PF_2$ though the two fronts share the first three solutions. Furthermore, $I_H(PF_2, \mathbf{r}) > I_H(PF_4, \mathbf{r})$ because $PF_2$ has two more solutions than $PF_4$ does.
Theorems 1 and 2 focus on situations in which the comparison of Pareto fronts by using
the hypervolume indicator is independent of the choice of the reference point. There remains an
exceedingly important practical question to be answered. How should we select the reference
point when none of the conditions in Theorems 1 and 2 hold? The answer to this question has
implications for the off-line setting in which a researcher wishes to compare multiple Pareto
fronts and also in the on-line setting where the hypervolume indicator may be used as a stopping
criterion.
To demonstrate the confusion that might result if the reference point is not chosen carefully, consider $PF_1$ and $PF_2$ shown in the upper half of Figure 3, where both of the fronts are made up of 5 points in $[0,1]^2$. If the two fronts are compared with respect to a common reference point $\mathbf{r} = (-r, -r)'$ with $r \ge 0$, we will have $I_H(PF_1, \mathbf{r}) - I_H(PF_2, \mathbf{r}) = 0.2r - 0.035$ according to Lemma 2. Thus, (a) $PF_1$ is superior to $PF_2$ in terms of hypervolume if $r > 0.175$, (b) $PF_1$ is as good as $PF_2$ if $r = 0.175$, and (c) $PF_1$ is inferior to $PF_2$ if $0 \le r < 0.175$. Clearly, the ranks differ depending on the position of $\mathbf{r}$. Consequently, consider Theorem 3 along with an illustrative example.
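A rank reversal of this kind is easy to reproduce. The Python sketch below uses two hypothetical fronts (not the fronts of Figure 3): one holds only the two extreme points, the other a single well-balanced interior point. The reference point is taken as (-r, -r) with r a non-negative scalar, and `hypervolume_2d` is an illustrative sweep for two maximized criteria.

```python
def hypervolume_2d(front, ref):
    """Area weakly dominated by `front` and dominating `ref` (maximization)."""
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front, key=lambda p: (-p[0], -p[1])):
        if f1 > ref[0] and f2 > best_f2:
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


extremes_only = [(0.0, 1.0), (1.0, 0.0)]
interior_only = [(0.6, 0.6)]


def difference(r):
    """Hypervolume of the extreme-point front minus that of the interior-point front."""
    ref = (-r, -r)
    return hypervolume_2d(extremes_only, ref) - hypervolume_2d(interior_only, ref)


for r in (0.2, 0.45, 1.0):
    print(r, difference(r))
```

For these two fronts the difference works out to 0.8r - 0.36, so the interior point wins for small r and the extreme points win once r exceeds 0.45, mirroring the dependence on r described above.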
Figure 2. Three hypothetical Pareto fronts: $PF_1$ dominates $PF_2$, $PF_3$ and $PF_4$. $PF_2$ and $PF_3$ have
common solutions 1, 2, and 3, but solution 4 in $PF_3$ is dominated by solution 4 in $PF_2$; solutions 5 and 6
in $PF_3$ are dominated by solution 5 in $PF_2$. $PF_2$ and $PF_4$ have common solutions 1, 2, and 3, but $PF_2$
has two more solutions, i.e., solutions 4 and 5.
Theorem 3. For a 2-dimensional maximization problem with the criterion space $[0, k]^2$, let $\mathbf{f}(\xi_1)$ and $\mathbf{f}(\xi_2)$ be the two extreme solutions, i.e., $\mathbf{f}(\xi_1) = (0, k)$ and $\mathbf{f}(\xi_2) = (k, 0)$. Let there exist two other solutions, $\mathbf{f}(\eta_1)$ and $\mathbf{f}(\eta_2)$, such that $0 < f_1(\eta_1), f_1(\eta_2), f_2(\eta_1), f_2(\eta_2) < k$. Then $r$ must be greater than $k$ for
$$I_H\left(\{\mathbf{f}(\xi_1), \mathbf{f}(\xi_2)\}, \mathbf{r} = (-r, -r)\right) > I_H\left(\{\mathbf{f}(\eta_1), \mathbf{f}(\eta_2)\}, \mathbf{r} = (-r, -r)\right)$$
to hold.
Proof. By Lemma 1,
$$I_H\left(\{\mathbf{f}(\xi_1), \mathbf{f}(\xi_2)\}, \mathbf{r} = (-r, -r)\right) = f_1(\xi_1) f_2(\xi_1) + f_1(\xi_2) f_2(\xi_2) - f_1(\xi_1) f_2(\xi_2) + r f_2(\xi_1) + r f_1(\xi_2) + r^2 = 2kr + r^2$$
since $f_1(\xi_1) = f_2(\xi_2) = 0$ and $f_2(\xi_1) = f_1(\xi_2) = k$. Similarly,
$$I_H\left(\{\mathbf{f}(\eta_1), \mathbf{f}(\eta_2)\}, \mathbf{r} = (-r, -r)\right) = f_1(\eta_1) f_2(\eta_1) + f_1(\eta_2) f_2(\eta_2) - f_1(\eta_1) f_2(\eta_2) + r f_2(\eta_1) + r f_1(\eta_2) + r^2.$$
Therefore, $I_H\left(\{\mathbf{f}(\xi_1), \mathbf{f}(\xi_2)\}, \mathbf{r}\right) > I_H\left(\{\mathbf{f}(\eta_1), \mathbf{f}(\eta_2)\}, \mathbf{r}\right)$ is equivalent to
$$2kr > f_1(\eta_1) f_2(\eta_1) + f_1(\eta_2) f_2(\eta_2) - f_1(\eta_1) f_2(\eta_2) + r f_2(\eta_1) + r f_1(\eta_2). \tag{6}$$
Note $0 < f_1(\eta_1), f_1(\eta_2), f_2(\eta_1), f_2(\eta_2) < k$ implies $0 < f_1(\eta_1) f_2(\eta_1), f_1(\eta_2) f_2(\eta_2), f_1(\eta_1) f_2(\eta_2) < k^2$ so that
$$2k^2 + xr > f_1(\eta_1) f_2(\eta_1) + f_1(\eta_2) f_2(\eta_2) - f_1(\eta_1) f_2(\eta_2) + r f_2(\eta_1) + r f_1(\eta_2),$$
where $0 < x = f_2(\eta_1) + f_1(\eta_2) < 2k$. Then, (6) holds if $2kr > 2k^2 + xr$, which implies $r > \frac{2k^2}{2k - x} > k$, i.e., $r > k$. □
This result, though a simplification that considers only fronts of size two, is suggestive
regarding the choice of the reference point. We outline some guidelines, then illustrate with an
example.
Intuitively, we prefer a set of representative solutions which are as close as possible to the
true Pareto front while being uniformly distributed along the whole front. However, if a
judgment between several fronts is desired such that the two extreme solutions $(0, 1)$ and $(1, 0)$ in $[0,1]^2$ are most emphasized, then Theorem 3 suggests that one should choose $r > k = 1$.
On the other hand, if points are desired to be uniformly distributed along the front, consider the following. Suppose the true Pareto front in $[0,1]^2$ is the line connecting the two extremes (certainly a simplifying assumption) or, alternatively, assume an arbitrary true Pareto front and consider the projection of this front onto the line, which we call the Projected Uniform Pareto front (PUPf). Consider a surrogate for the true front that is uniformly distributed along the PUPf, and further consider any pair of adjacent points in this approximation along with the isosceles right triangle formed by taking the line connecting the pair of points as the hypotenuse. Then for the coordinate system implied by these two points (with the origin at the intersection opposite the hypotenuse), $k = \frac{1}{|PF_S| - 1}$ and the distance between the points is $\frac{\sqrt{2}}{|PF_S| - 1}$, where $|PF_S|$ denotes the number of elements in the surrogate front. For these two points in isolation, then, Theorem 3 suggests $r > k = \frac{1}{|PF_S| - 1}$. Since this applies to any two adjacent points, we broaden this suggestion as a tentative guideline for the reference point in the original coordinate system.
Therefore, if extreme points are favored, we suggest $r > 1$; if, as is more likely, fronts with uniformly distributed points are preferred, the suggestion is that $r$ be somewhat larger than $\frac{1}{|PF_S| - 1}$. Since the length of the line connecting the extremes is $\sqrt{2} > 1$ and the distance between each adjacent point on the PUPf is $\frac{\sqrt{2}}{|PF_S| - 1} > \frac{1}{|PF_S| - 1}$, we recommend choosing $\mathbf{r} = (r_1, r_2)'$ with $r_1 = r_2 = -\frac{\sqrt{2}}{|PF_S| - 1}$, i.e. $r = \frac{\sqrt{2}}{|PF_S| - 1}$ in the notation of Theorem 3, where $|PF_S|$ is the number of designs in the surrogate Pareto front.
This strategy provides a way for the decision maker to explicitly incorporate preferences
regarding the desired distributions of points in the Pareto front and also account for the number
of points in the Pareto front.
As an illustration of our proposed strategy, we compare the four Pareto fronts in Figure 3. Since there are 5 points in each, we choose $r = \frac{\sqrt{2}}{5 - 1} \approx 0.35$ if we wish to emphasize uniformly distributed points, but $r = \sqrt{2} \approx 1.41$ if we prefer the two extreme solutions. The hypervolume indicators for the four fronts with respect to the two reference points are presented in Table 1. Since $PF_1$ includes the two extreme solutions and is uniformly distributed along the PUPf, it has the highest hypervolume regardless of the reference point. $PF_2$ is better than $PF_3$ when $r = 0.35$, because this reference point favors uniformly distributed points. When $r = 1.41$, $PF_3$ is better since this reference point prefers the extreme points. However, even though $PF_4$ has one extreme point, it is inferior to $PF_2$ when $r = 1.41$ because its other end, $(0.2, 0.8)$, is quite far from $(1, 0)$. In contrast, the two ends of $PF_2$ are very close to the extreme points.
Figure 3. Comparing four 2-dimensional Pareto fronts to demonstrate how the reference point can affect
the ranks of the fronts.
Table 1. The hypervolume indicators for the four 2-dimensional Pareto fronts in Figure 3 with respect to
two different reference points.
Pareto front    r = 0.35    r = 1.41
PF1             1.1975      5.1831
PF2             1.1625      4.9361
PF3             0.9950      4.9806
PF4             0.7175      3.8551
3.2 Three dimensions and the choice of reference point
A generalization of Theorems 1 and 3 from two dimensions to three is not straightforward, since a point’s hypervolume contribution no longer possesses a simple geometric shape,
as opposed to the two-criterion case where it is always rectangular. As such, the hypervolume
indicator for a higher dimensional Pareto front is more intricately dependent upon the choice
of the reference point. In order to investigate how to choose a reference point, we consider a
straightforward method for computing the hypervolume of a Pareto front in $d > 2$ dimensions:
the Hypervolume by Slicing Objectives (HSO) algorithm (While et al. 2006; Fonseca et al. 2006).
This procedure assumes a non-dominated set, and consists of the following steps:
(1) Sort the points in decreasing order of the coordinate values of a chosen dividing criterion.
(2) Sweep the set by a $(d - 1)$-dimensional hyperplane along the dividing criterion, defining
d-dimensional slices between consecutive points.
(3) Calculate the hypervolume of each single slice by multiplying its height (measured along
the dividing criterion) by the hypervolume of its next lower dimensional base.
(4) Steps 1-3 are recursively repeated until in Step 3 the hypervolume can be calculated in
two dimensions.
(5) Add up the hypervolumes of each individual slice to produce the total hypervolume.
It may be helpful to visualize the above procedure in the 3-dimensional case. As shown
in Figure 4, the seven points in a hypothetical 3-dimensional Pareto front are sorted in
descending order along the z-coordinate, i.e., criterion 3. Since these seven points can be
classified into four distinct groups, the Pareto front is divided into four 3-dimensional slices. It
can be seen that points 5, 6 and 7 form the bottom slice, point 1 forms the top slice, point 4 and
points 2 and 3 form the middle two slices. The base of each slice is a 2-dimensional Pareto front
and the height of each slice is the distance between the two consecutive z-values. The
hypervolume of a slice is then the hypervolume of the base 2-dimensional Pareto front multiplied
by the height of the slice. Finally, as illustrated in Figure 5, the hypervolume of a 3-dimensional
Pareto front is the sum of the hypervolume of all the slices.
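The slicing steps can be sketched in Python as follows. This is an illustrative implementation under assumptions not stated in the text: all criteria are maximized, every point dominates the reference point, and the last coordinate serves as the dividing criterion at each level of the recursion. The function names `hso` and `hypervolume_2d` are hypothetical, and the sketch favors clarity over the efficiency of the published algorithms.

```python
def hypervolume_2d(points, ref):
    """Area dominated by 2-D points (maximization) and bounded below by ref."""
    area = 0.0
    best_y = ref[1]
    # Sweep from the largest first criterion; each point adds a new horizontal strip.
    for x, y in sorted(points, reverse=True):
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hso(points, ref):
    """Hypervolume of points with respect to ref, slicing on the last criterion."""
    if len(ref) == 2:
        return hypervolume_2d(points, ref)
    # Step 1: sort in decreasing order of the dividing criterion.
    pts = sorted(points, key=lambda p: p[-1], reverse=True)
    # Distinct levels of the dividing criterion; the bottom slice ends at the reference point.
    levels = sorted({p[-1] for p in pts}, reverse=True) + [ref[-1]]
    total = 0.0
    for upper, lower in zip(levels, levels[1:]):
        # Step 2: the slice between two consecutive levels has as its base
        # every point at or above the upper level, projected one dimension down.
        base = [p[:-1] for p in pts if p[-1] >= upper]
        # Steps 3-4: height times the (recursively computed) hypervolume of the base.
        total += (upper - lower) * hso(base, ref[:-1])
    # Step 5: the slice hypervolumes have been summed.
    return total


front = [(1.0, 0.5, 0.5), (0.5, 1.0, 0.5), (0.5, 0.5, 1.0)]
print(hso(front, (0.0, 0.0, 0.0)))  # 0.5
```

In this small example there are two distinct levels of the third criterion, so the front splits into two slices: a top slice of height 0.5 with base area 0.25 and a bottom slice of height 0.5 with base area 0.75.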
The relationship between the reference point and the hypervolume indicator is much
more complicated in three or more dimensions than in two dimensions. For instance, in
Supplementary Material A, a result is given which shows the relationship between the reference
point, the points on the front, and the hypervolume for the situation given in Figure 4. In this
case, the hypervolume depends on the reference point as well as the two boundary points (those
points that are on the edge of the Pareto front); i.e. the ones which have criterion values
$f_1^{(i_{p_i})}$ and $f_2^{(i_1)}$ for $i = 1, \ldots, s$, where $s$ is the number of slices and assuming that criterion 3 is the
dividing criterion of the 2-dimensional Pareto fronts defined by each of the slices. For an
example in which $s = 4$, see Figure 4.
Figure 4. A hypothetical 3-dimensional Pareto front consisting of 7 points. The coordinates x, y, z
correspond to criterion 1, criterion 2, and criterion 3, respectively.
Figure 5. An illustration of the HSO algorithm. The hypervolume of a 3-dimensional Pareto front breaks
into four 3-dimensional slices. The area of the bottom of each slice is a 2-dimensional Pareto front and the
height of each slice is obtained along the third criterion.
Establishing theoretical results regarding the reference point in three dimensions is
difficult; to date, only one paper in the literature has attempted it (Auger et al., 2010). Perhaps
even harder is to develop a theory that would clearly guide experimenters in making this
important selection. Instead, we suggest some strategies analogous to the two-dimensional case.
If uniformly distributed solutions along the whole Pareto front are preferred, then we
would expect each slice to have the same height. If there are $s$ different levels of the dividing
criterion, there are $s - 1$ such slices, so that each has a height of $\frac{1}{s - 1}$. If we wish to give the
bottom layer as much weight as the others, we would set $r_3 = -\frac{1}{s - 1}$, which would give this slice
the same height as the others. More generally, if we assume the typical scenario, that the three
criteria are equally important and a uniform distribution of solutions is preferred, we suggest
$r_1 = r_2 = r_3 = -\frac{1}{s - 1}$. On the other hand, if the individually optimal designs are to be emphasized,
choose $r_1 = r_2 = r_3 = -1$.
We note again that these guidelines are not theoretically supported, but are instead
suggestions extrapolated from the two-dimensional case. In Section 4.3 we will provide some
results to illustrate the above choices.
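A minimal Python sketch of this empirical rule is given below. It assumes criteria scaled to [0, 1] and maximized, counts the distinct levels of the dividing criterion after rounding, and places every coordinate of the reference point at -1/(s - 1). The function name `reference_from_slices`, its arguments, and the toy front are hypothetical and serve only as an illustration.

```python
def reference_from_slices(front, dividing_axis=2, decimals=3):
    """Return (s, reference point) for the uniform-front rule r_k = -1/(s - 1)."""
    # Distinct levels of the dividing criterion, after rounding.
    levels = {round(p[dividing_axis], decimals) for p in front}
    s = len(levels)
    if s < 2:
        raise ValueError("need at least two distinct levels of the dividing criterion")
    r = -1.0 / (s - 1)
    return s, (r,) * len(front[0])


# Hypothetical five-point front; two of the third-criterion values differ
# only in the fourth decimal place.
toy_front = [
    (1.0, 0.2, 0.0),
    (0.8, 0.4, 0.2501),
    (0.6, 0.6, 0.2504),
    (0.4, 0.8, 0.5),
    (0.2, 1.0, 1.0),
]
print(reference_from_slices(toy_front, decimals=3)[0])  # 4
print(reference_from_slices(toy_front, decimals=4)[0])  # 5
```

The count of levels, and therefore the suggested reference point, can change with the rounding precision and with the criterion chosen to divide on, so it may be worth running the function for several settings.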
4. Applications with numerical evaluations
In this section, we apply the comparison procedure proposed in Section 2 and the
theoretical results and empirical rules developed in Section 3 to two published examples. First,
we review the Pareto front comparison procedure given by Lu and Anderson-Cook (2012) in
Section 4.1 and compare it with our approach in the case of a simple hypothetical 2-dimensional
Pareto front. Then, we apply our method to a 3-criterion design problem which is solved using
the Pareto Aggregate Point Exchange (PAPE) algorithm of Lu et al. (2011). This problem is used
to demonstrate our procedure in both an offline and online setting because of its well-developed baseline Pareto front.
4.1 Comparing 2-dimensional Pareto fronts
As mentioned earlier, Lu and Anderson-Cook (2012) (henceforth referred to as LA) also
proposed the use of a hypervolume-like indicator for comparing Pareto fronts. LA’s approach is
similar to what we have proposed in Section 2.3, though there are two important differences: (1)
LA does not compute $PF_i^*$ as in our first step; (2) in our third step LA computes what they call
the “Hypervolume Under the Pareto Front” (HVUPF) for $PF_S, PF_1, PF_2, \ldots,$ and $PF_n$, which always
uses the front’s nadir point as the reference point.
To illustrate these differences, we present an example taken from LA. Consider the
seven-solution Pareto front in Figure 6, scaled as usual to $[0, 1]$, where the goal is to maximize
criteria 1 and 2. Since we are comparing two fronts, this exemplifies the offline usage of the
hypervolume procedure. The solutions in $PF_1$ are denoted by the six triangles and the solutions
in $PF_2$ are denoted by the five diamonds. The combined front, $PF_S$, is composed of seven
solutions, each represented by a red dot.
Figure 6. A Pareto front involving two criteria where the objective is to simultaneously maximize both
criteria. $PF_S$ has 7 solutions (red dots); $PF_1$ has 6 solutions (open triangles); $PF_1^*$ has 4 solutions (triangles
with red dots); and $PF_2 = PF_2^*$ has 5 solutions (diamonds with red dots).
The computation of $HVUPF_{PF_S}$ proceeds by taking the sum of the five rectangle areas
R1-R5 as illustrated in Figure 7(a). The two boundary points in this Pareto front are $(0, 1)$ and
$(1, 0)$, so that the nadir point is located at $(0, 0)$. Since HVUPF uses the nadir as the reference
point, the two boundary points contribute nothing to the calculation. More explicitly, using
criterion 1 as the dividing criterion the computation proceeds as follows:
Rectangle 1 (R1) (Criterion 1 $\in [0, 0.4]$): $0.9 \times (0.4 - 0) = 0.36$,
Rectangle 2 (R2) (Criterion 1 $\in [0.4, 0.5]$): $0.75 \times (0.5 - 0.4) = 0.075$,
Rectangle 3 (R3) (Criterion 1 $\in [0.5, 0.7]$): $0.7 \times (0.7 - 0.5) = 0.14$,
Rectangle 4 (R4) (Criterion 1 $\in [0.7, 0.8]$): $0.6 \times (0.8 - 0.7) = 0.06$,
Rectangle 5 (R5) (Criterion 1 $\in [0.8, 0.9]$): $0.4 \times (0.9 - 0.8) = 0.04$.
Therefore, $HVUPF_{PF_S} = 0.675$.
Figure 7. The areas associated with (a) $HVUPF_{PF_S}$ and (b) $I_H(PF_S, r.CSR)$. The reference point for
the LA approach is the nadir point $(0, 0)$ and is denoted by r.LA. The reference point for computing
$I_H(PF_S, r)$ is $(-0.24, -0.24)$ and is denoted by r.CSR.
The computation of $HVUPF_{PF_1}$ and $HVUPF_{PF_2}$ proceeds in a similar fashion, though it is
important to note that the reference points for each of these two fronts are their individual nadir
points. For $HVUPF_{PF_1}$ we have:
$HVUPF_{PF_1} = (0.5 - 0.1) \times 0.75 + (0.7 - 0.5) \times 0.7 + (0.75 - 0.7) \times 0.5 + (0.9 - 0.75) \times 0.4 = 0.525.$
Similarly, $HVUPF_{PF_2} = 0.31$. The areas associated with $HVUPF_{PF_1}$ and $HVUPF_{PF_2}$ are shown in
Figure 8(a) and Figure 8(c), respectively.
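The nadir-based calculations can be checked with a short Python sketch. The coordinates below are not listed in the text; they are inferred from the rectangle computations and the reported values, so they should be read as assumptions about Figure 6. The helper name `hvupf_2d` is hypothetical, and it assumes a mutually non-dominated front with both criteria maximized.

```python
def hvupf_2d(front):
    """Return (nadir, area under the front), using the front's own nadir as reference."""
    pts = sorted(front)  # ascending in criterion 1, so criterion 2 is descending
    nadir = (min(x for x, _ in pts), min(y for _, y in pts))
    area = 0.0
    # One vertical rectangle per pair of adjacent points, as in R1-R5 above:
    # width between the two points, height of the right-hand point above the nadir.
    for (x_left, _), (x_right, y_right) in zip(pts, pts[1:]):
        area += (x_right - x_left) * (y_right - nadir[1])
    return nadir, area


# Coordinates inferred from the worked example (assumptions, see lead-in).
PF_S = [(0.0, 1.0), (0.4, 0.9), (0.5, 0.75), (0.7, 0.7), (0.8, 0.6), (0.9, 0.4), (1.0, 0.0)]
PF_1 = [(0.1, 0.8), (0.5, 0.75), (0.7, 0.7), (0.75, 0.5), (0.9, 0.4), (1.0, 0.0)]
PF_2 = [(0.0, 1.0), (0.4, 0.9), (0.7, 0.7), (0.8, 0.6), (0.9, 0.4)]

for name, front in (("PF_S", PF_S), ("PF_1", PF_1), ("PF_2", PF_2)):
    nadir, area = hvupf_2d(front)
    print(name, nadir, round(area, 4))
# The three areas are 0.675, 0.525 and 0.31.
```

Each front is measured from a different nadir point, (0, 0), (0.1, 0) and (0, 0.4) respectively, which is why the three values are not directly comparable.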
We outlined four drawbacks of the LA procedure in the Introduction, and expanded upon
them in Section 2.2. The first three are apparent in an offline setting such as this, and we
illustrate them here.
The first problem is that different reference points are used for different surrogate Pareto
fronts. Consider $HVUPF_{PF_1}$ and $HVUPF_{PF_2}$ pictured in Figure 8(a) and Figure 8(c). Note that the
reference point for $HVUPF_{PF_1}$ is $(0.1, 0)$, the nadir point of $PF_1$, while the reference point for
$HVUPF_{PF_2}$ is $(0, 0.4)$, the nadir point of $PF_2$. Calculating the hypervolume and making
comparisons in this case cannot be expected to give a fair and meaningful comparison between
the two Pareto fronts. The second issue also relates to the reference point: choosing the nadir as
the reference point does not permit a contribution from the extreme points in the front. Consider
the computation of $HVUPF_{PF_1}$ displayed in Figure 8(a). Since the nadir point is $(0.1, 0)$, the
points $(0.1, 0.8)$ and $(1, 0)$ are excluded in the computation of $HVUPF_{PF_1}$. The third issue arises
because dominated points are used when a surrogate front is compared to the reference front.
Consider the first rectangle R1 in Figure 8(a). LA limits the width of this rectangle by the
dominated point $(0.1, 0.8)$, despite the fact that it is collectively dominated by $PF_S$, and thus not a
Pareto optimal solution.
We now illustrate the computation of $I_H(PF^*)$ and how it rectifies the aforementioned
problems. Recall the seven-solution Pareto front in Figure 6 where the goal is to maximize
criteria 1 and 2. Also recall that there are two competing Pareto fronts, $PF_1$ and $PF_2$, and note
that while $PF_1$ contains six solutions, only four of these are within $PF_S$, and $PF_1^* = PF_1 \cap PF_S$
denotes these 4 solutions (triangles with red circles). For $PF_2$, all five solutions exist within $PF_S$
and hence $PF_2 = PF_2^* = PF_S \cap PF_2$ (diamonds with red circles). Also note that the reference point
used for the computation of $I_H(PF_1^*)$ and $I_H(PF_2^*)$ is $r.CSR = (-0.24, -0.24)$, the same as for
$I_H(PF_S, r.CSR)$. This is chosen because $PF_S$ has 7 points, so that $\frac{\sqrt{2}}{7 - 1} \approx 0.24$, and we prefer to prioritize uniformly
distributed points along the front.
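The construction of the combined front and of the reduced fronts can be sketched in Python with a simple dominance filter. The coordinates are inferred from the worked example rather than listed in the text, and the helper names `dominates` and `non_dominated` are hypothetical; the sketch assumes that the combined front is the non-dominated subset of the union of the competing fronts, with both criteria maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b in every criterion and differs from b (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and a != b


def non_dominated(points):
    """Subset of points that no other point dominates."""
    pts = set(points)
    return {p for p in pts if not any(dominates(q, p) for q in pts)}


# Coordinates inferred from the worked example (assumptions, see lead-in).
PF_1 = {(0.1, 0.8), (0.5, 0.75), (0.7, 0.7), (0.75, 0.5), (0.9, 0.4), (1.0, 0.0)}
PF_2 = {(0.0, 1.0), (0.4, 0.9), (0.7, 0.7), (0.8, 0.6), (0.9, 0.4)}

PF_S = non_dominated(PF_1 | PF_2)  # combined (surrogate) front
PF_1_star = PF_1 & PF_S            # solutions of PF_1 that survive in PF_S
PF_2_star = PF_2 & PF_S            # solutions of PF_2 that survive in PF_S

print(len(PF_S), len(PF_1_star), len(PF_2_star))  # 7 4 5
```

Here the points (0.1, 0.8) and (0.75, 0.5) drop out because they are dominated by (0.4, 0.9) and (0.8, 0.6), respectively.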
The computation of $I_H(PF_S, r.CSR)$ proceeds by taking the sum of the 7 rectangle areas
(height × width) using criterion 1 as the dividing criterion and $r.CSR = (-0.24, -0.24)$ as the
reference point.
Rectangle 1 (R1) (Criterion 1 $\in [-0.24, 0]$): $(1 - (-0.24)) \times (0 - (-0.24)) = 0.2976$,
Rectangle 2 (R2) (Criterion 1 $\in [0, 0.4]$): $(0.9 - (-0.24)) \times (0.4 - 0) = 0.456$,
Rectangle 3 (R3) (Criterion 1 $\in [0.4, 0.5]$): $(0.75 - (-0.24)) \times (0.5 - 0.4) = 0.099$,
Rectangle 4 (R4) (Criterion 1 $\in [0.5, 0.7]$): $(0.7 - (-0.24)) \times (0.7 - 0.5) = 0.188$,
Rectangle 5 (R5) (Criterion 1 $\in [0.7, 0.8]$): $(0.6 - (-0.24)) \times (0.8 - 0.7) = 0.084$,
Rectangle 6 (R6) (Criterion 1 $\in [0.8, 0.9]$): $(0.4 - (-0.24)) \times (0.9 - 0.8) = 0.064$,
Rectangle 7 (R7) (Criterion 1 $\in [0.9, 1]$): $(0 - (-0.24)) \times (1 - 0.9) = 0.024$.
Figure 8. (a) $HVUPF_{PF_1}$; (b) $I_H(PF_1^*, r.CSR)$; (c) $HVUPF_{PF_2}$; and (d) $I_H(PF_2^*, r.CSR)$ areas for
the two competing Pareto fronts $PF_1$ and $PF_2$.
We then have $I_H(PF_S, r.CSR) = 1.2126$, with this area pictured in Figure 7(b). In comparing
Figure 7(a) and Figure 7(b) and observing the calculations above, note that r.CSR allows the
extreme solutions to contribute area to the computation of $I_H(PF_S, r.CSR)$.
The computation of $I_H(PF_1^*, r.CSR) = 1.0086$ and $I_H(PF_2^*, r.CSR) = 1.1836$ proceeds in a
similar fashion. Note that the areas for the rectangles in Figure 8(b) and Figure 8(d) contain all
points that are dominated by the associated point in the front. For instance, in Figure 8(d) the
rectangle determined by the point $(0.9, 0.4)$ contains all points that are dominated by this point
whereas the HVUPF area associated with this point, shown in Figure 8(c), does not include any
of the dominated points below the horizontal line running through $(0.9, 0.4)$.
Using the LA procedure, one would conclude that $PF_1$ is better than $PF_2$ since
$HVUPF_{PF_1} = 0.525 > HVUPF_{PF_2} = 0.31$. This is unintuitive and misleading, because $PF_2$
contributes five well-distributed solutions to the surrogate of the true Pareto front whereas $PF_1$
only contributes four well-distributed solutions. Using our procedure, we observe
$I_H(PF_2^*, r.CSR) = 1.1836 > I_H(PF_1^*, r.CSR) = 1.0086$. We are also able to calculate the
contribution rate of the first Pareto front as
$CR(PF_1^*, PF_S) = \frac{I_H(PF_1^*, r.CSR)}{I_H(PF_S, r.CSR)} = \frac{1.0086}{1.2126} = 0.8318,$
and the contribution rate of the second as 0.9761. Specifically, then, Pareto front 2 is about 15%
more efficient in approximating the true Pareto front than Pareto front 1.
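As a hedged illustration of the common-reference-point calculation, the Python sketch below recomputes the hypervolume of the combined front and of the second front, together with the contribution rate of the latter. The coordinates are inferred from the rectangle computations and reported values, not listed in the text, and the function name `hypervolume_2d` is hypothetical. Both criteria are assumed to be maximized and every point is assumed to dominate the reference point.

```python
def hypervolume_2d(front, ref):
    """Area dominated by the front (maximization) and bounded below by ref."""
    area = 0.0
    best_y = ref[1]
    # Sweep from the largest criterion-1 value; each point adds a new horizontal strip.
    for x, y in sorted(front, reverse=True):
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Coordinates inferred from the worked example (assumptions, see lead-in).
PF_S = [(0.0, 1.0), (0.4, 0.9), (0.5, 0.75), (0.7, 0.7), (0.8, 0.6), (0.9, 0.4), (1.0, 0.0)]
PF_2_star = [(0.0, 1.0), (0.4, 0.9), (0.7, 0.7), (0.8, 0.6), (0.9, 0.4)]
r_csr = (-0.24, -0.24)  # one reference point shared by every front

hv_s = hypervolume_2d(PF_S, r_csr)
hv_2 = hypervolume_2d(PF_2_star, r_csr)
cr_2 = hv_2 / hv_s  # contribution rate of the second front

print(round(hv_s, 4), round(hv_2, 4), round(cr_2, 4))  # 1.2126 1.1836 0.9761
```

Because the reference point lies strictly below the nadir, the extreme solutions (0, 1) and (1, 0) each contribute a strip of area 0.024 and 0.2976, respectively, rather than nothing.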
4.2 Comparing 3-dimensional Pareto fronts
A 3-criterion design problem is presented in Lu et al. (2011) where the experimenter
wishes to obtain a 14-run screening design for 5 factors, $X_1, X_2, \ldots, X_5$, each at 2 levels ($-1$ and
$1$). The user-defined model is $\mathbf{Y} = \mathbf{X}_1 \boldsymbol{\beta}_1 + \boldsymbol{\varepsilon}$, where $\boldsymbol{\varepsilon}$ has mean vector $\mathbf{0}$ and variance-covariance
$\sigma^2 \mathbf{I}$, and $\mathbf{X}_1$ contains the intercept, all main effects and the particular two-factor
interactions $X_1 X_2$, $X_1 X_3$, $X_2 X_4$, and $X_3 X_5$. Though the experimenter wishes to have an efficient
design to estimate the parameters in $\boldsymbol{\beta}_1$, there is also the desire to protect against the possibility
that the true model is $\mathbf{Y} = \mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{X}_2 \boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, where $\mathbf{X}_2$ is the $14 \times 6$ matrix containing the
remaining six two-factor interactions. Consequently, the experimenter wishes to find a design
that is efficient in terms of the D-, $tr(AA')$-, and $tr(R'R)$-criterion. Note that the D-criterion
focuses upon the precision of the model coefficient estimates, the $tr(AA')$-criterion seeks to
minimize the effect of model mis-specification upon the coefficient estimates and the $tr(R'R)$-criterion
seeks to minimize the effect of model mis-specification upon the error variance
estimate. For more details on these criteria, see Lu et al. (2011).
This example serves as a nice benchmark for comparing algorithms in three dimensions
because previous work has established that the true Pareto front contains 351 designs,
and this Pareto front is denoted as $PF_{351}$. In what follows, we use the Pareto Aggregate Point
Exchange (PAPE) algorithm of Lu et al. (2011) to construct several Pareto front approximations,
and make online comparisons using the methods developed in this paper. Note that we discuss
the PAPE algorithm because we are comparing Pareto fronts for multiple-criteria optimal
experiment design problems, though the methods proposed herein could be used to compare
Pareto fronts in other domains, produced by other methods, as well.
The PAPE algorithm (Lu et al., 2011) is an elaboration of a classic point exchange
algorithm (e.g. Fedorov, 1972; Cook and Nachtsheim, 1980) that builds up a Pareto front by
considering exchanges between current design points and candidate points. It requires the
number of random starts to be specified, so we specify 10000 random starts and observe the
Pareto fronts at 50, 100, 500, 1000, 5000 and 10000 random starts. We then assess the results
using both our procedure and that proposed by LA. We take
$r.CSR = (-0.0038, -0.0038, -0.0038)$ based on the fact that if we use $tr(R'R)$ as the dividing
criterion, $PF_{351}$ is divided into 267 slices and our empirical rule in Section 3.2 suggests
$r_1 = r_2 = r_3 = -\frac{1}{267 - 1} \approx -0.0038$. The hypervolume indicator for $PF_{351}$ with respect to
$r.CSR = (-0.0038, -0.0038, -0.0038)$ is $I_H(PF_{351}, r.CSR) = 0.6157$.
Results presented in Table 2 demonstrate drawbacks for the LA procedure when used online in this way: the HVUPF obtained with 50 random starts (0.4828) is larger than that for 100
random starts (0.4267) and even bigger than that of 500 random starts (0.4784).
This is
problematic since in this online setting additional random starts should never make the Pareto
front worse. In contrast, the CSR procedure gives monotonic results in keeping with our
intuition. Additionally, Table 2 gives the following information: (1) $|PF_i^*|$, the number of true
Pareto-optimal solutions found; (2) $I_H(PF_i^*, r.CSR)$, the hypervolume measure; and (3)
$CR(PF_i^*, PF_S)$, the proportion of the true Pareto front that has been found.
Table 2. Performance assessment of PAPE with different input settings.
Number of random starts   |PF_i|   |PF_i^*|   HVUPF    I_H(PF_i^*, r.CSR)   CR(PF_i^*, PF_351)
50                        118      61         0.4828   0.3274               53.17%
100                       166      78         0.4267   0.3327               54.04%
500                       264      215        0.4784   0.5774               93.78%
1000                      290      252        0.5245   0.5802               94.23%
5000                      315      299        0.5867   0.5974               97.03%
10000                     322      309        0.5874   0.5978               97.09%
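The contrast between the two measures in Table 2 can be checked mechanically. The Python sketch below uses only the values reported in the table and in the text; the helper name `is_non_decreasing` is hypothetical. Because the tabulated hypervolumes are rounded to four decimals, contribution rates recomputed from them may differ from the tabulated percentages in the last digit.

```python
# Values copied from Table 2 (random starts: 50, 100, 500, 1000, 5000, 10000).
hvupf = [0.4828, 0.4267, 0.4784, 0.5245, 0.5867, 0.5874]
i_h = [0.3274, 0.3327, 0.5774, 0.5802, 0.5974, 0.5978]
i_h_true = 0.6157  # hypervolume of the 351-design front for the same reference point


def is_non_decreasing(values):
    """True if no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_non_decreasing(hvupf))  # False
print(is_non_decreasing(i_h))    # True

# Contribution rates in percent, recomputed from the rounded table entries.
contribution = [100.0 * v / i_h_true for v in i_h]
```

A check of this kind could be run after each batch of random starts to flag a measure that moves in the wrong direction.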
4.3 Empirical assessment of reference point recommendations for three dimensions
Continuing with the three-dimensional example from the preceding section, we perform a
brief empirical study of our reference point recommendations from Section 3.2. Specifically, we
compare three surrogates of the ostensibly true Pareto front, PF351:
1. PF61, the Pareto front generated by PAPE with 50 random starts;
2. PF61R, a Pareto front generated by randomly selecting 61 solutions from PF351;
3. PF61E, an “extreme” Pareto front generated by choosing, from PF351, the 20 solutions with
the largest values of criterion 1 (D-optimality), the 20 solutions with the largest values of
criterion 2 ($tr(AA')$-optimality), and the 21 solutions with the largest values of criterion
3 ($tr(R'R)$-optimality).
While PF61 is constructed directly via a front-populating algorithm, PF61R represents a
relatively uniform front and PF61E represents one in which the extremes of the fronts are most
thoroughly explored. Based on Section 3.2, if we favor uniformly distributed fronts, we would
choose the reference point to be $|r| = 0.018$, $|r| = 0.017$, and $|r| = 0.018$, respectively, based on
the number of slices for each of the fronts (58, 61, and 56, respectively, using criterion 3 as the
dividing criterion). Here we abuse notation slightly and use $|r|$ as shorthand for the entire
reference point $r = (r_1, r_2, r_3)$. If we wish to compare the three fronts, we must choose a single
reference point and since a uniform front is favored here, we might choose the smallest of the
three, $|r| = 0.017$. We would expect, in this case, that PF61R is judged to be the best and indeed,
Table 3 indicates that it is. Alternatively, if we choose $|r| = 1$ we would expect the more extreme
front, PF61E, to be superior and Table 3 demonstrates that this is the case. Indeed, somewhere
between $|r| = 0.17$ and $|r| = 0.34$, the more uniform front becomes inferior to the more extreme
one.
Though we must be careful not to make sweeping conclusions based upon a single
instance, this example along with the two-dimensional example in Section 3.1 lends some
evidence to the general conclusion that as $|r|$ increases, the hypervolume indicator increasingly
favors extreme points, while smaller $|r|$ prefers solutions that are more spread out.
Table 3. For the three-dimensional example from Section 4.2, an evaluation of several 61-solution
surrogate fronts for various reference points.
PFs      |r| = 0.017   |r| = 0.17   |r| = 0.34   |r| = 1   |r| = 1.7
PF61     0.3486        0.6579       1.1559       5.1681    14.4925
PF61R    0.5787        1.0231       1.7048       6.7641    17.7797
PF61E    0.5656        1.0187       1.7157       6.8925    18.1332
4.4 Decreases and fluctuations in the hypervolume measure
As has already been noted in Section 4.2, when the LA procedure is used online,
decreases in HVUPF can occur even while the number of random starts increases. However, a
similar problem can occur when naively comparing $PF_i$ and $PF_j$ instead of $PF_i^*$ and $PF_j^*$, a
common problem with the hypervolume indicator in the online setting due to the lack of a
consistent surrogate for the true Pareto front. The consequence of doing so is that the reference
points will shift from comparison to comparison, and this violates a condition of Theorem 2,
which requires a common reference point to ensure Pareto compliance.
Figure 9 plots the hypervolume growth of the Pareto fronts produced sequentially for the
first 20 random starts using the PAPE algorithm for the 3-criterion design problem given in
Section 4.2, using the procedure proposed in this paper. Although the general trend is increasing,
the hypervolume measure is nonmonotonic. In the online usage of the hypervolume indicator, the
reference point depends upon the current nadir point, which changes as the Pareto fronts are
updated during the optimization process.
Though there are several possible ways to avoid this problem, we suggest making
pairwise comparisons of fronts as the front evolves. For instance, one might compare the front
after the first random start with that of the second; the front after the second random start with
that of the third; etc. Or, the comparison might be made after a particular increment (e.g.
comparing the front after 50 random starts with that after 100; 100 with 150; etc.). Instead of
using the hypervolume measure directly, we make the comparisons by plotting the
percentage improvement, i.e.,
$$\frac{I_H(PF_j, r) - I_H(PF_i \cap PF_j, r)}{I_H(PF_i \cap PF_j, r)} \times 100\% \quad \text{for } j > i.$$
In this way, Pareto compliance is maintained because a common reference point can be used for
each comparison. In Figure 10, we give an illustration. Note that any improvement shows up as
positive on the graph, and a cessation of improvement is indicated by a flat line at 0.
For applications such as this, where thousands of random starts can be executed, we
suggest making comparisons every 50 or 100 random starts to smooth the measure of
improvement. If only hundreds or dozens of random starts are feasible, then perhaps
comparisons can be made every 5 or 10 random starts. Either way, this can be used as part of an
algorithm termination strategy. For instance, the algorithm could be terminated after the first
batch of 50 random starts for which no improvement in hypervolume is made.
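A minimal Python sketch of this pairwise comparison and of the suggested stopping rule is given below for the two-criterion case. It assumes maximization, a single reference point shared by both fronts, and that the earlier front is first reduced to the solutions that survive in the later front. The function names and the example fronts, whose coordinates are inferred from the example of Section 4.1, are hypothetical illustrations rather than the procedure used to produce Figure 10.

```python
def hypervolume_2d(front, ref):
    """Area dominated by the front (maximization) and bounded below by ref."""
    area = 0.0
    best_y = ref[1]
    for x, y in sorted(front, reverse=True):
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def percent_improvement(prev_front, curr_front, ref):
    """Percentage gain in hypervolume from the earlier to the later front."""
    current = set(curr_front)
    # Keep only the earlier solutions that are still on the later front.
    surviving = [p for p in prev_front if p in current]
    base = hypervolume_2d(surviving, ref)
    if base == 0.0:
        raise ValueError("no solution of the earlier front survives in the later front")
    return 100.0 * (hypervolume_2d(curr_front, ref) - base) / base


def first_stalled_batch(improvements, tol=0.0):
    """Index of the first comparison whose improvement does not exceed tol, else None."""
    for k, gain in enumerate(improvements):
        if gain <= tol:
            return k
    return None


earlier = [(0.0, 1.0), (0.4, 0.9), (0.7, 0.7), (0.8, 0.6), (0.9, 0.4)]
later = earlier + [(0.5, 0.75), (1.0, 0.0)]
ref = (-0.24, -0.24)
print(round(percent_improvement(earlier, later, ref), 2))  # 2.45
```

In practice `first_stalled_batch` would be applied to the sequence of improvements recorded after each batch of random starts, and the algorithm stopped at the first index it returns.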
Figure 9. The hypervolume growth plot for the PAPE algorithm to solve the 3-criterion design problem
given in Section 4.2.
Figure 10. Example of suggested online measure of algorithm progress.
5. Discussion
In this paper we have proposed an improved version of the hypervolume indicator and
applied it in the context of multiobjective optimal experiment design. We give a procedure to
compare competing Pareto fronts that largely avoids the pitfalls of recent work by Lu and
Anderson-Cook (2012). In particular, we have studied the relationship between the hypervolume
indicator and its reference point, given conditions for two-criteria problems that ensure that the
reference point will not affect comparisons between Pareto fronts, suggested rules to guide the
selection of the reference point for two dimensions when these conditions are not satisfied, and
given guidance for the selection of the reference point in three dimensions. We also ensure that
our procedure is Pareto compliant by removing criterion vectors that are dominated by points in
the reference Pareto front, and illustrate our methods in both an offline and online setting. For
online applications, we show how the improved hypervolume procedure can be used to evaluate
the progress of an algorithm that is generating a Pareto front of designs. Because the front is
evolving in this case, reference points will shift and the hypervolume indicator may not increase
monotonically. We propose an approach that avoids this problem by making pairwise
comparisons of Pareto fronts and measuring the percent improvement for each comparison. This
allows the hypervolume indicator to be used as an algorithm termination criterion.
Though our goal is to present methods of Pareto front comparison that are practically
useful, we have made a number of assumptions that may limit this work’s applicability. For
instance, Theorem 3 is used to motivate our recommendations for the selection of the reference
point in two dimensions but is based upon a special case in which there are only two elements in
the Pareto front. We also assume a projection of the true Pareto front onto the line connecting the
extremes. These assumptions are, at this point, necessary simplifications that allow concrete
guidance to the end-user. Further work might be undertaken to weaken or eliminate these
assumptions. In addition, our tentative recommendations in three dimensions are not based upon
any theoretical result, but instead are an analogy to the guidelines for two dimensions.
Furthermore, in the hypervolume calculation in three dimensions, there is an implicit
selection of an initial dividing criterion. The number of slices associated with this criterion then
drives our recommendation. There are at least two potential issues with this. First, the number of
slices is dependent upon the level of rounding. For instance, in the example of Sections 4.2 and
4.3, PF351 has 267 slices with respect to criterion 3 if rounding to three decimal places, and 321
slices if rounding to 4. We have used 3 decimal places in this work. Second, the dividing
criterion chosen may change the number of slices used to calculate the reference point. For
instance, PF61E of Section 4.3 has 56 slices if the third criterion is used to initially divide, but
only 40 if the first is used.
We have not found these issues to make a difference in the ultimate ordering of fronts,
though it is likely that pathological cases could be constructed for which Pareto front orderings
could change based upon the level of rounding or the chosen dividing criterion. If this is a
concern, we recommend that the practitioner compare the fronts using several scenarios (e.g.
with each criterion as the dividing criterion) to see if the ordering of the fronts changes. In the
unlikely case that it does, the user might use the general principle that a smaller $|r|$ favors more
uniform fronts and a larger $|r|$ favors the extremes to guide the ultimate selection of the
reference point.
The work in this paper has implications for multiobjective experiment design, but also
beyond. Recent work in experiment design has focused on incorporating multiple measures of
design quality into the decision making process, and the developments in this article improve the
tools available for evaluating sets of candidate designs. We emphasize, however, that this work
can be applied to a wide variety of optimization settings in which Pareto fronts are used to
evaluate trade-offs between opposing criteria.
Both offline and online usages of the hypervolume indicator are crucial to the process of
constructing Pareto fronts of experiment designs. This procedure can help researchers compare
and evaluate competing algorithms, as well as various versions of a single algorithm, in order to
determine which are most effective in populating a front. It can also guide the use of a particular
algorithm by providing information on its progress toward populating the front.
The multiobjective optimization problem has three main questions that need to be
answered: (1) How do we compare Pareto fronts? (2) How do we populate the Pareto front? (3)
How do we choose a single solution to use? We have addressed the first question, but leave the
second and third to the literature (Lu et al., 2011, Sambo et al., 2014, Park, 2009 for the second;
Lu et al., 2011 and Zio and Bazzo, 2012 for the third) or future work.
Acknowledgements
The authors would like to thank Drs. Christine Anderson-Cook and Lu Lu for their suggestions
and comments throughout this process. We also wish to express gratitude to the reviewers and
associate editor who reviewed this work and allowed us the opportunity to improve the paper.
References
Albrecht, M. C., Nachtsheim, C. J., Albrecht, T. A., Cook, R. D., 2013. Experimental design for
engineering dimensional analysis (with discussions). Technometrics, 55(3), 257-295.
Auger, A., Bader, J., Brockhoff, D., 2010. Theoretically investigating optimal μ-distributions for the
hypervolume indicator: first results for three objectives. In Parallel Problem Solving from Nature,
PPSN XI, 586-596. Springer Berlin Heidelberg.
Auger, A., Bader, J., Brockhoff, D., Zitzler, E., 2009. Theory of the hypervolume indicator:
optimal μ-distributions and the choice of the reference point. In Foundations of Genetic
Algorithms (FOGA 2009), 87-102. ACM, New York, NY, USA.
Auger, A., Bader, J., Brockhoff, D., Zitzler, E., 2012. Hypervolume-based multiobjective
optimization: theoretical foundations and practical implications. Theoretical Computer Science
425, 75-103.
Bader, J., Zitzler, E., 2011. HypE: An algorithm for fast hypervolume-based many-objective
optimization. Evolutionary Computation 19(1), 45-76.
Beume, N., Naujoks, B., Emmerich, M., 2007. SMS-EMOA: multiobjective selection based on
dominated hypervolume. European Journal of Operational Research 181 (3), 1653-1669.
Brockhoff, D., 2010. Optimal μ-distributions for the hypervolume indicator for problems with
linear bi-objective fronts: exact and exhaustive Results. In Simulated Evolution and Learning.
Springer Berlin Heidelberg, 24-34.
Coello Coello, C.A., Lamont, G.B., Van Veldhuizen, D.A., 2007. Evolutionary algorithms for
solving multi-objective problems. 2nd edition. Springer.
Cook, R. D., Nachtsheim, C. J., 1980. A Comparison of Algorithms for Constructing Exact D-Optimal Designs. Technometrics 22, 315-324.
Emmerich, M., Beume, N., Naujoks, B., 2005. An EMO algorithm using the hypervolume
measure as selection criterion. In Evolutionary Multi-Criterion Optimization. Springer Berlin
Heidelberg, 62-76.
Fedorov, V. V., 1972. Theory of Optimal Experiments. New York, NY: Academic Press.
Fleischer, M., 2003. The measure of Pareto optima: applications to multi-objective
metaheuristics. In Evolutionary multi-criterion optimization. Springer Berlin Heidelberg, 519-533.
Fonseca, C.M., Knowles, J.D., Thiele, L., Zitzler, E., 2005. A tutorial on the performance
assessment of stochastic multiobjective optimizers. In Third International Conference on
Evolutionary Multi-Criterion Optimization (EMO 2005), 216.
Fonseca, C.M., Paquete, L., López-Ibánez, M., 2006. An improved dimension-sweep algorithm
for the hypervolume indicator. In IEEE Congress on Evolutionary Computation, 2006. CEC
2006. IEEE, 1157-1163.
Friedrich, T., Neumann, F., Thyssen, C., 2013. Multiplicative approximations, optimal
hypervolume distributions, and the choice of the reference point. arXiv preprint,
http://arxiv.org/abs/1309.3816.
Gilmour, S.G., Trinca, L.A., 2012. Optimum design of experiments for statistical inference.
Journal of the Royal Statistical Society: Series C (Applied Statistics) 61(3), 345-401.
Goel, T., Haftka, R.T., Shyy, W., Watson, L.T., 2008. Pitfalls of using a single criterion for
selecting experimental designs. International Journal for Numerical Methods in Engineering,
75(2), 127-155.
Judt, L., Mersmann, O., Naujoks, B., 2013a. Non-monotonicity of obtained hypervolume in
1-greedy S-metric selection. Journal of Multi-Criteria Decision Analysis 20(5-6), 277-290.
Judt, L., Mersmann, O., Naujoks, B., 2013b. Do hypervolume regressions hinder EMOA
performance? surprise and relief. In Evolutionary Multi-Criterion Optimization, Springer Berlin
Heidelberg, 96-110.
Knowles, J.D., Corne, D.W., Fleischer, M., 2003. Bounded archiving using the Lebesgue
measure. In The 2003 Congress on Evolutionary Computation, 2003. CEC '03. IEEE, 4, 2490-2497.
Lu, L., Anderson-Cook, C.M., 2012. Adapting the hypervolume quality indicator to quantify
trade-offs and search efficiency for multiple criteria decision making using Pareto fronts. Quality
and Reliability Engineering International 29(8), 1117-1133.
Lu, L., Anderson-Cook, C.M., Robinson, T.J., 2011. Optimization of designed experiments
based on multiple criteria utilizing a Pareto frontier. Technometrics 53, 353-365.
Myers, R. H., Montgomery, D. C. and Anderson-Cook, C. M., 2009. Response surface
methodology (process and product optimization using designed experiments). John Wiley &
Sons. New Jersey.
Park, Y.-J., 2009. Multi-optimal designs for second-order response surface model.
Communications of the Korean Statistical Society 16(1), 195-208.
Sambo, F., Borrotti, M., Mylona, K., 2014. A coordinate-exchange two-phase local search
algorithm for the D- and I-optimal designs of split-plot experiments. Computational Statistics &
Data Analysis 71, 1193-1207.
Steinberg, D. M., Bursztyn, D., 2006. Comparison of designs for computer experiments.
Journal of Statistical Planning and Inference 136, 1103-1119.
While, L., Hingston, P., Barone, L., Huband, S., 2006. A faster algorithm for calculating
hypervolume. IEEE Transactions on Evolutionary Computation 10(1), 29-38.
Zio, E., Bazzo, R., 2012. A comparison of methods for selecting preferred solutions in
multiobjective decision making. In Computational Intelligence Systems in Industrial
Engineering. Atlantis Press, 23-43.
Zitzler, E., Brockhoff, D., Thiele, L., 2007. The hypervolume indicator revisited: on the design
of Pareto-compliant indicators via weighted integration. In Evolutionary Multi-Criterion
Optimization. Springer Berlin Heidelberg, 862-876.
Zitzler, E., Knowles, J., Thiele, L., 2008. Quality assessment of Pareto set approximations. In
Multiobjective Optimization. Springer Berlin Heidelberg, 373-404.
Zitzler, E., Künzli, S., 2004. Indicator-based selection in multiobjective search. In Parallel
Problem Solving from Nature-PPSN VIII. Springer Berlin Heidelberg, 832-842.
Zitzler, E., Thiele, L., 1998. Multiobjective optimization using evolutionary algorithms: a
comparative case study. In Parallel Problem Solving from Nature-PPSN V. Springer Berlin
Heidelberg, 292-301.