A FUZZY SET AND RESEMBLANCE RELATION APPROACH TO
THE VALIDATION OF SIMULATION MODELS
MARTENS J. AND PUT F.
Faculty of Economics and Applied Economics
Catholic University Leuven, Naamsestraat 69
3000 Leuven, Belgium
KERRE E.
Department of Applied Mathematics and Computer Science
Ghent University, Krijgslaan 281/S9
9000 Ghent, Belgium
Validation is no doubt one of the most important steps in the development of an effective and reliable simulation model of a real system. It aims at deciding whether
the model forms a representation of the system accurate enough for credible analysis and decision making. The methods that are currently available for validation
are binary in nature, in the sense that they can only be used to either reject or
accept the validity of a model. Since it is a commonly accepted point of view that
all models are invalid in the strict sense, we develop in this paper a new method
for validation that allows degrees of validity to be expressed on a continuous scale. The
method makes use of a fuzzy inference algorithm and of a fairly new concept in
the theory of fuzzy sets, known as resemblance relations. We demonstrate how our
method can easily be used to discriminate more from less valid simulation models
for a real-life airline network.
1. Introduction
In this paper, we develop a new method for the validation of the behaviour of simulation
models. By simulation model, we mean in fact any kind of model that imitates a system
of interest in reality. Examples of simulation models are models of entire factory floors
or individual manufacturing processes, models of the functioning of call centers, airports
and railway stations, models of communication systems and computer networks, models
of distribution systems, material handling models, models of banks and supermarkets,
highway system models, and so on. The models may be specified at the formal layer
of modelling with a certain specification formalism, or they may be computer models
that have been built with a particular simulation package. By behaviour, we have in
mind a mathematical relation between values that can be seen as properties of inputs,
and values that can be seen as properties of outputs returned in response. To give an
example, for a model of a bank, behaviour can be a relation between arrival rates of customers to the bank and average times that customers have to wait before being served.
To give another example, for a model of an airport, behaviour can be a relation between
correlations among successive arrival delays at the airport and correlations among successive departure delays. Whatever the precise nature of behaviour, we interpret the
process of validation as that of comparing behaviour computed out of model generated
input-output data, with behaviour computed out of real system input-output data.
Leaving our particular interpretation of behaviour aside for a moment, the literature
on modelling and simulation offers today a variety of different (statistical) behaviour
validation techniques. The most important techniques are the method of confidence
intervals, the method of (regression) meta-modelling, and methods based on a time
series analysis - see Law [6] for a discussion of these and other techniques. Validation
using confidence intervals is an approach that has been proposed by Balci [2]. It can be
used to evaluate the significance of a difference in the mean of a performance measure
between a model and a real system. A more sophisticated statistical validation technique
has been proposed by Kleijnen [5], and is based on the fitting of a regression model from
data of a simulation model. The regression model is called a meta-model, and the idea
is to judge upon the validity of the simulation model from an inspection of the sign and
the magnitude of the coefficients in the meta-model. As to time series based methods for
validation, the technique of spectral analysis is a powerful method for comparing
the auto-covariance function of a simulated stochastic process with the auto-covariance
function of a real stochastic process. We refer to Naylor [11] and Schruben [12] for examples.
The statistical methods for behaviour validation that are available today are binary
in nature in the sense that the validity of a model can be either accepted or rejected.
Since it is however a commonly adopted point of view in the modelling and simulation
community that all models are invalid in the strict sense, the current methods for behaviour validation appear to focus on the 'wrong' hypothesis that model and real system
have identical behaviour. In that respect, it is our aim in this paper to present a new
method for behaviour validation that allows degrees of model validity to be expressed on a
continuous scale.
The paper is organised as follows. In section 2, we argue that a statistical method
for behaviour validation is indeed binary in nature, and that interpretational anomalies
arise when it is nevertheless forced to express degrees of model validity. Then, we design
a new method for behaviour validation in section 3. The method employs a fairly new
concept in the theory of fuzzy sets, and involves a rule based comparison of the points
in model and real system behaviour. We experiment with our method in section 4 in the
context of a real-life airline case, and show how it can effectively be used to discriminate
more from less valid simulation models. The paper concludes by summarising the most
important research findings, and by presenting some ideas for future work.
2. Limitations of statistical validation techniques
A statistical method for validating the behaviour of a simulation model comes down (one
way or another) to testing the null-hypothesis that the model is valid - see also figure
1. The test itself then involves computing a sample of a test statistic out of real and
model generated data, and comparing the sample with a predetermined critical point,
the position of which depends on a chosen significance level α. In the particular case that
we illustrated in figure 1, a sample that surpasses the critical point for a predetermined
significance level is seen as sufficient evidence that the model at issue is invalid.

Figure 1. Hypothesis testing in model validation
It is easy to see that hypothesis testing is a binary approach to validation, the
outcome of which fully depends on the value of α. For, assume that the samples $x_A$,
$x_B$ and $x_C$ in figure 1 result from a comparison of real with simulated data, coming
from models A, B and C respectively. If we then fix the value of α at 5%, it follows
from the figure that model A will be retained as invalid, whereas models B and C will
be retained as valid. However, there appears to be not enough evidence to reject the
validity of model A when the value of α is set at 1%. Since the percentages of 5% and
1% are typical values for α, should we now consider model A as valid or as invalid?
In an attempt to bypass the binary outcome of hypothesis testing, one common
strategy is to judge upon the validity of a model by inspecting the p-value that goes
with the sample of the test statistic - the p-value represents the maximal significance
level for which the model will be retained as valid, and is independent of the value of α.
In particular, the lower the p-value for a model, the less likely the model is valid. That
being the case, should we now conclude that higher degrees of validity go hand in hand
with higher p-values? And hence, should we then derive from figure 1 that model C is
more valid than model B, since $p(x_C) > p(x_B)$? We believe that the answer is in the
negative. For, both $x_B$ and $x_C$ are nothing more than samples of test statistics. When
we 'take' another pair of samples - and thus recompute $x_B$ and $x_C$ from new data of
models B and C -, we may then very well have to conclude that model B is more valid
than model C if it turns out that $p(x_B) > p(x_C)$. Whereas the (unknown) densities of
the test statistics for models B and C in the figure appear to indicate that model B is
more valid than model C, we believe that the p-values of individual samples do not yield
a robust ranking in validity of the models, since there is no reason to expect that the
samples for model C will be systematically higher (and thus have lower p-values) than
the samples for model B across multiple tests.
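To make this concrete, the following sketch - our own illustration, not part of the original argument - repeatedly draws paired samples of two hypothetical test statistics whose densities are ordered as in figure 1, and counts how often the resulting p-values rank the models against that ordering; the normal densities and the one-sided z-test are assumptions chosen purely for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Hypothetical one-sided z-tests: a larger statistic gives a smaller
# p-value, i.e. stronger evidence against validity. Model B's statistic
# is centred lower than model C's, so B is 'truly' the more valid model.
n_trials = 10_000
x_b = rng.normal(loc=1.0, scale=1.0, size=n_trials)  # test statistic samples, model B
x_c = rng.normal(loc=1.3, scale=1.0, size=n_trials)  # test statistic samples, model C

p_b = norm.sf(x_b)  # one-sided p-values for model B
p_c = norm.sf(x_c)  # one-sided p-values for model C

# Fraction of paired tests in which the p-values rank B below C,
# contradicting the ordering of the underlying densities.
print(f"p-value ranking flips in {np.mean(p_b < p_c):.0%} of the trials")
```

With the means chosen here, the p-value ordering contradicts the ordering of the densities in roughly four out of ten paired tests, which is exactly the instability the argument above points at.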
Since a null-hypothesis that conveys that a model is (completely) valid will most likely
be untrue in the first place, one may bring up the argument that a confidence interval
is more useful in evaluating the validity of a model - see e.g. Law [6] for an argument
along these lines. In figure 2, we show a hypothetical confidence interval
for each of the models A, B and C. Assume that each interval is centered around an
estimate of the difference in the mean of a performance measure. On that assumption, a
confidence interval for a model that covers 0 is seen as sufficient evidence that the model
is valid, since the estimated difference in the mean of the performance measure between
the model and the real system of interest appears then to be not significant.
Generally, the higher the number of observations n that are available on the performance measure of interest, the narrower the confidence interval. As we illustrated in
figure 2, this may result in an ambiguous situation for model A. Indeed, suppose that we
first gathered $n_1$ observations on the performance measure of interest in the model and
the real system, after which we collected an additional $n_2$ such observations. That being the
case, should we now use the interval with $n = n_1$ to consider model A as valid, or should
we use the interval with $n = n_1 + n_2$ to consider model A as invalid? Furthermore, since
the width of the interval will become arbitrarily small when $n \to +\infty$, any difference
estimate will eventually turn out to be significant for model A. In that respect, the
number of observations n appears to play much too great a role when using confidence
intervals to decide upon the validity of a model.
Verifying whether or not a confidence interval for a model covers 0 is in fact the same
as testing the (not so useful) hypothesis that there is no difference at all in the mean of
the performance measure of interest between the model and the real system. Therefore,
one often confronts a confidence interval with a lower and upper tolerance limit - see
figure 2. Only when both the begin and the end point of the interval fall in between the
lower and upper tolerance limit is the model retained as valid. Thus, according to the
pair of tolerance limits that we showed in figure 2, model B would be retained as valid,
while model C would be conceived as invalid. Although this approach is useful to allow
for certain acceptable differences between a model and a real system, it does not provide
an objective means to decide upon the tolerance limits, nor does it help in setting up an
unambiguous ranking among models from more to less valid ones. For, whereas model
B seems to be more valid than model C because of the violation of a tolerance limit
by the confidence interval for the latter, we cannot rule out the possibility that the true
difference in the mean of the performance measure is smaller for model C than for model
B, and hence that model C is in that respect more valid than model B.
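A minimal sketch of the tolerance limit decision rule just described; the interval end points and tolerance limits below are invented for illustration.

```python
def retained_as_valid(ci: tuple[float, float], tol: tuple[float, float]) -> bool:
    """A model is retained as valid only when both end points of its
    confidence interval fall between the lower and upper tolerance limit."""
    (lo, hi), (tol_lo, tol_hi) = ci, tol
    return tol_lo <= lo and hi <= tol_hi

# Invented intervals in the spirit of figure 2:
print(retained_as_valid((-0.8, 1.1), (-1.5, 1.5)))  # a model like B: True
print(retained_as_valid((0.9, 2.3), (-1.5, 1.5)))   # a model like C: False
```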
Figure 2. Confidence intervals in model validation
3. A new method for behaviour validation
In this section, we develop a new method for behaviour validation that overcomes the
shortcomings of classic statistical techniques highlighted in section 2. The method, the
foundations of which we designed in Martens [7], makes use of a neuro-fuzzy inference
algorithm and of some fairly new concepts in the theory of fuzzy sets. The reader who
is not familiar with fuzzy sets is referred to Kerre [4] and Zimmermann [14] for an excellent
introduction and example applications. We delay an illustration of the concepts that we
define in this section until a discussion of the experimental results in section 4.
3.1. Resemblance relations and inference
Let K be a simulation model, the validity of which needs to be determined in view of
the real system R that K aims to replicate. Further, let K' play the role of an unknown
though ideal model for R; in the terminology of Zeigler [13], K' would be termed a
base model for R. The base model is postulated to be a hypothetical and complete
simulation, covering every single aspect of the real system. We denote the behaviour of
K and K' in the following by $\mathcal{B}(K)$ and $\mathcal{B}(K')$ respectively. The precise computation
of behaviour depends on the type of the simulation model. Here, we do not impose any
restrictions in this regard, and require only that the behaviour of the model can be
unambiguously defined as a mathematical relation in the sense that we agreed upon in
the introduction. Notice that, since the base model K' remains essentially unknown, all
that we have available on the real system side is a collection of data, out of which we
can estimate all or part of the behaviour of K'. Let this estimate be denoted by $\mathbb{B}$. We
refer to $\mathbb{B}$ as the behaviour of R.
Recalling our definition of behaviour in the introduction, we determine the validity of
K by investigating whether the points in the relation $\mathcal{B}(K)$ are reflected by approximately
equal points in the relation $\mathbb{B}$. For that purpose, we make use of a concept that has
recently been introduced in fuzzy set theory and that is known as resemblance relations
- see De Cock [3] for details and examples. Resemblance relations form a specific class of
fuzzy relations that can be used to express a notion of approximate equality between
points of a common space, without the danger of ending up with certain paradoxes that
may arise when classic so-called (fuzzy) similarity relations are used.
For D a non-empty set, (X, d) a pseudo-metric space, and g a function from D
into X, a fuzzy relation R on D is called a (g, d)-resemblance relation if and only if it
holds for all $\omega$, $\bar{\omega}$, $\omega'$ and $\bar{\omega}'$ in D that 1) $R(\omega,\omega) = 1$, 2) $R(\omega,\bar{\omega}) = R(\bar{\omega},\omega)$ and 3)
$d(g(\omega),g(\bar{\omega})) \leq d(g(\omega'),g(\bar{\omega}')) \Rightarrow R(\omega,\bar{\omega}) \geq R(\omega',\bar{\omega}')$. To give a short example, assume
that $A_1, A_2, \ldots, A_k$ are fuzzy sets on D, and let g be a function that maps every point
$\omega$ of D on a tuple $(A_1(\omega), A_2(\omega), \ldots, A_k(\omega))$ of membership values. Then, the fuzzy
relation R on D, the membership function of which is defined in $(\omega,\bar{\omega})$ by $R(\omega,\bar{\omega}) \triangleq 1 - d(g(\omega),g(\bar{\omega}))$, is a resemblance relation. When d is the sup-pseudo-metric, defined
by $d(x,x') \triangleq \max_{i=1}^{k} |x_i - x_i'|$ for every $x = (x_1,x_2,\ldots,x_k)$ and $x' = (x_1',x_2',\ldots,x_k')$
in X, and each of the fuzzy sets $A_1, A_2, \ldots, A_k$ has the interpretation of a particular
characteristic, then the resemblance $R(\omega,\bar{\omega})$ indicates whether $\omega$ and $\bar{\omega}$ share these
characteristics to the same extent.
Everything as in the example above, we let the fuzzy sets $A_1, A_2, \ldots, A_k$ be the fuzzy
rules that are learned by applying a fuzzy inference algorithm to both the behaviour of
the simulation model K, and the behaviour of the real system R. Denoting the set of
all fuzzy rules by $\mathcal{F}$, and using the sup-pseudo-metric, we call the resulting resemblance
relation a rule base induced resemblance relation, and denote it by $R_{\mathcal{F}}$. Because of the
definition of the sup-pseudo-metric, the resemblance between two behaviour points will
then be small if there is at least one rule for which the hit of one point to
the rule deviates significantly from the hit of the other point to the rule, whereas the
resemblance will be large if and only if both points have similar hits to every rule. In
short, we thus let the resemblance between behaviour points depend on their compliance
with the fuzzy rules that are learned from the behaviour of K and R. The more two
behaviour points agree with the 'knowledge' stored in the fuzzy rules, the higher
their resemblance will be.
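As an illustration of this construction, the following sketch computes a rule base induced resemblance between two behaviour points; the Gaussian rule memberships and the toy rule centres are our own assumptions, and only the combination of rule hits through one minus the sup-pseudo-metric follows the definition of $R_{\mathcal{F}}$ above.

```python
import numpy as np

def gaussian_rule(centre: np.ndarray, width: float):
    """Membership ('hit') of a behaviour point to one fuzzy rule, modelled
    here, purely for illustration, as a Gaussian around the rule centre."""
    def hit(point: np.ndarray) -> float:
        return float(np.exp(-np.sum((point - centre) ** 2) / (2.0 * width ** 2)))
    return hit

def rule_base_resemblance(rules, p: np.ndarray, q: np.ndarray) -> float:
    """Rule base induced resemblance: map each point to its tuple of rule
    hits and apply 1 minus the sup-pseudo-metric on those tuples."""
    hits_p = np.array([rule(p) for rule in rules])
    hits_q = np.array([rule(q) for rule in rules])
    return 1.0 - float(np.max(np.abs(hits_p - hits_q)))

# Two toy rules over two-dimensional behaviour points
# (arrival delay, on time departure frequency):
rules = [gaussian_rule(np.array([30.0, 0.10]), 5.0),
         gaussian_rule(np.array([50.0, 0.05]), 5.0)]
p1, p2 = np.array([32.0, 0.09]), np.array([33.0, 0.10])
print(rule_base_resemblance(rules, p1, p2))  # close to 1: similar rule hits
```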
3.2. Maximal resemblance functions
In line with the classic system theoretic notions of sub- and supersystems - see e.g.
Zeigler [13] -, we call K a subsystem of K' when $\mathcal{B}(K) \subseteq \mathcal{B}(K')$. Similarly, we call K a
supersystem of K' when $\mathcal{B}(K) \supseteq \mathcal{B}(K')$. We say that K is valid with regard to K' if
it is both a sub- and supersystem of K'. Since K will in practice never turn out to be
a subsystem nor a supersystem of K', and will hence always be invalid with regard to
K', we introduce a gradual version of the notion of sub- and supersystems, and allow
ourselves to make the statements: K is a subsystem of K' with degree x, and K is a supersystem of K'
with degree y. We abbreviate these statements by $\mathcal{B}(K) \subseteq_x \mathcal{B}(K')$ and $\mathcal{B}(K) \supseteq_y \mathcal{B}(K')$
respectively, and agree to let $x, y \in [0,1]$. With the aim then to develop a useful notion of
gradual validity, we propose to label every such statement with a truth value, estimated
from a comparison of the points in $\mathcal{B}(K)$ to the points in $\mathbb{B}$ as follows.
For every behaviour point p of K, we look up whether there is a behaviour point of
R that highly resembles p according to the rule base induced resemblance relation $R_{\mathcal{F}}$.
In case many of the behaviour points of K turn out to have a highly similar counterpart
in the behaviour of R, then we estimate statements of the form $\mathcal{B}(K) \subseteq_x \mathcal{B}(K')$ to be
very true for large values of x, and to be very false for small values of x. We use here
the verb estimate as we do not have $\mathcal{B}(K')$ but only its estimate $\mathbb{B}$ available in practice.
Otherwise, in case many of the behaviour points of K do not even have a reasonably
similar counterpart in the behaviour of R, we estimate such statements to be very true
for small values of x, and to be very false for large values of x. We call the function
that maps every behaviour point in $\mathcal{B}(K)$ to the highest resemblance value with regard
to the behaviour points in $\mathbb{B}$ a forward maximal resemblance function, and denote it by
$\overline{m}_K$. Following a completely similar reasoning in order to estimate the degree of truth of
statements of the form $\mathcal{B}(K) \supseteq_y \mathcal{B}(K')$, we call the function that maps every behaviour
point in $\mathbb{B}$ to the highest resemblance value with regard to the behaviour points in
$\mathcal{B}(K)$ a backward maximal resemblance function, and denote it by $\underline{m}_K$. Considering
forward and backward maximal resemblance functions as random variables, we now
use their cumulative distribution to estimate the truth value of $\mathcal{B}(K) \subseteq_x \mathcal{B}(K')$ and
$\mathcal{B}(K) \supseteq_y \mathcal{B}(K')$ for every $x, y \in [0,1]$ as follows.
Suppose that we randomly take a behaviour point p of K, hand over this point to
a neutral assistant, and further ask him to determine the extent to which p can be
considered as a behaviour point of R. Then, he can return $\overline{m}_K(p)$ as the truth value
of the statement $p \in \mathbb{B}$. Assume now that we no longer give p to our assistant. In case
we then ask him to determine the extent to which p can be seen as a behaviour point of
R, he will not be able to give an answer in deterministic terms, but can instead decide
to look up, for a certain preferably small value of $q \in [0,1]$, the qth quantile $x_q$ of the
cumulative distribution of $\overline{m}_K$, and return this quantile as an (approximate) probabilistic
lower bound on the truth value of the statement $p \in \mathbb{B}$. For, it holds by definition of a
quantile that the statement $p \in \mathbb{B}$ will then be true with a truth value that exceeds $x_q$,
with a probability of (approximately) $1 - q$.
Let us now substitute the proposition $\mathcal{B}(K) \subseteq_1 \mathcal{B}(K')$ by an equivalent anding of
individual propositions of the form $p \in \mathcal{B}(K')$, that all together comprise every single
behaviour point of K. Clearly, for the former proposition to be true, all of the latter
propositions must be true. In that respect, we conceive the proposition $\mathcal{B}(K) \subseteq_{1-q} \mathcal{B}(K')$ as (fully) true, in case exactly $100(1-q)\%$ of the propositions with regard to
individual behaviour points are true. Since likely none of the individual propositions
will be true in the strict sense (at best, many of them are approximately true), we
propose to let an estimate of the degree of truth of $\mathcal{B}(K) \subseteq_{1-q} \mathcal{B}(K')$ be in proportion
to the (approximate) probabilistic lower bound $x_q$, and this for relatively small values
of q.
Keeping the experimental set-up that we outlined above, our assistant can also easily
determine the extent to which a certain behaviour point p of K can be conceived as not
being a behaviour point of R, by returning the complement of $\overline{m}_K(p)$. In case our
assistant has no knowledge of p, he can still use the complement of the qth quantile
of the cumulative distribution of $\overline{m}_K$ for a certain preferably large value of $q \in [0,1]$,
as an (approximate) probabilistic lower bound on the truth value of the statement $p \notin \mathbb{B}$. Indeed, applying again the definition of a quantile, the maximal resemblance of a
random behaviour point p of K will be lower than or equal to $x_q$ with (approximate)
probability q. In that respect, the truth value of the statement $p \notin \mathbb{B}$ has a probability
of (approximately) q of being greater than or equal to $1 - x_q$.
In a similar way as before, let us now substitute the proposition $\mathcal{B}(K) \not\subseteq_1 \mathcal{B}(K')$ by
an equivalent anding of individual propositions of the form $p \notin \mathcal{B}(K')$, that all together
comprise every single behaviour point of K. Again, for the former proposition to be true,
all of the latter propositions must be true. In that respect, we conceive the proposition
$\mathcal{B}(K) \not\subseteq_q \mathcal{B}(K')$ as (fully) true, in case exactly $100q\%$ of the propositions with regard to
individual behaviour points are true. Since likely all of the individual propositions will
be true in the strict sense (though in the graded sense, all of them are more or less untrue), we suggest
to let an estimate of the degree of truth of $\mathcal{B}(K) \not\subseteq_q \mathcal{B}(K')$ be in proportion to the
(approximate) probabilistic lower bound $1 - x_q$, and this for relatively large values of q.
We illustrated our reasoning above in figure 3. In the figure, the models $K_j$, $j \in J$,
represent alternative simulation models for a common real system R with base model
K'. Assume for simplicity that the behaviour $\mathbb{B}$ of R that we have available coincides
with the behaviour $\mathcal{B}(K')$ of the base model. The functions that we showed in the figure
are example (smoothed) cumulative distributions for the forward maximal resemblance
functions $\overline{m}_{K_j}$, $j \in J$; alternatively, the functions may be seen as example (smoothed)
cumulative distributions for the backward maximal resemblance functions $\underline{m}_{K_j}$, $j \in J$.

Figure 3. Cumulative distributions of maximal resemblance functions (curves labelled from very weak to very strong containment)
The functions illustrate different grades to which the behaviour of the simulation models
is contained in that of the base model. In effect, looking at the cumulative distribution
of $\overline{m}_{K_j}$ for $j = j_1$, it appears that only very few of the points in $\mathcal{B}(K_{j_1})$ do not have
a reasonably similar point in $\mathcal{B}(K')$. According to our rationale above, the statement
$\mathcal{B}(K_{j_1}) \subseteq_{1-q} \mathcal{B}(K')$ will therefore be labelled very much true for small values of q, since
the qth quantile $x_q$ is relatively high for q small. Conversely, inspecting the cumulative
distribution of $\overline{m}_{K_j}$ for $j = j_5$, many of the points in $\mathcal{B}(K_{j_5})$ do not seem to have an
even fairly similar point in $\mathcal{B}(K')$. According to our rationale above, the statement
$\mathcal{B}(K_{j_5}) \not\subseteq_q \mathcal{B}(K')$ will therefore be labelled very much true for large values of q, since
the qth quantile $x_q$ is now relatively low for q large.
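In code, the maximal resemblance functions and the quantile $x_q$ could be estimated as in the sketch below; the resemblance measure is passed in as a parameter (for instance the rule base induced relation of subsection 3.1), and the empirical quantile of a finite sample stands in for the unknown cumulative distribution.

```python
import numpy as np

def forward_max_resemblance(model_pts, real_pts, resemblance):
    """For every behaviour point of the model, the highest resemblance to
    any behaviour point of the real system; swapping the two point sets
    yields the backward maximal resemblance function."""
    return np.array([max(resemblance(p, b) for b in real_pts)
                     for p in model_pts])

def quantile_lower_bound(max_res_values: np.ndarray, q: float) -> float:
    """Empirical qth quantile x_q of the maximal resemblance values: with
    probability roughly 1 - q, a randomly drawn behaviour point of the
    model resembles some real behaviour point to a degree above x_q."""
    return float(np.quantile(max_res_values, q))
```

For small q, the returned $x_q$ is the quantity to which we let the degree of truth of $\mathcal{B}(K) \subseteq_{1-q} \mathcal{B}(K')$ be proportional; for large q, $1 - x_q$ plays the same role for $\mathcal{B}(K) \not\subseteq_q \mathcal{B}(K')$.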
3.3. Fuzzy sets of gradual validity propositions
It will be clear from subsection 3.2 that, when the proposition $\mathcal{B}(K) \subseteq_{1-q} \mathcal{B}(K')$ is
(fully) true for some $q \in [0,1]$, then the proposition $\mathcal{B}(K) \not\subseteq_q \mathcal{B}(K')$ will also be (fully)
true for the same value of q, and vice versa. Since we stated that the only value of q for
which the propositions will be (fully) true likely equals 1, we reasoned to let the degree
of truth (or an estimate thereof when only an estimate of the behaviour of K and/or
K' is available) of the former be in proportion to the qth quantile $x_q$ of the cumulative
distribution of $\overline{m}_K$, and to let the degree of truth of the latter be in proportion to the
complement $1 - x_q$, and this for respectively small and large values of q in [0,1]. Now, as
both propositions convey in fact the same information, their degree of truth should from
an intuitive point of view be equal to one another for every $q \in [0,1]$. In that respect, we
define the degree of truth of $\mathcal{B}(K) \subseteq_{1-q} \mathcal{B}(K')$ and $\mathcal{B}(K) \not\subseteq_q \mathcal{B}(K')$ as a scaled t-norm
T of $x_q$ and its complement $1 - x_q$. In particular, denoting $a \triangleq \max_{q \in [0,1]} \{T(x_q, 1 - x_q)\}$,
we define the degree of truth by $T(x_q, 1 - x_q)/a$, and denote it by $\alpha_K(1 - q)$. Clearly,
$\{(1-q, \alpha_K(1-q))\}_{q \in [0,1]}$ constitutes a fuzzy set on [0,1]. Associating every point $1 - q$
of [0,1] with the gradual subsystem proposition $\mathcal{B}(K) \subseteq_{1-q} \mathcal{B}(K')$ (or its equivalent
$\mathcal{B}(K) \not\subseteq_q \mathcal{B}(K')$), we call $\alpha_K$ a fuzzy set of gradual subsystem propositions. We define
the degree of truth of propositions of the form $\mathcal{B}(K) \supseteq_{1-q} \mathcal{B}(K')$ or of the equivalent
form $\mathcal{B}(K) \not\supseteq_q \mathcal{B}(K')$, denoted by $\beta_K(1 - q)$, in a completely similar way, and speak
accordingly of $\{(1-q, \beta_K(1-q))\}_{q \in [0,1]}$ as a fuzzy set of gradual supersystem propositions.
Finally, we refer to a t-norm based intersection of the cylindrical extensions of $\alpha_K$ and
$\beta_K$ into $[0,1]^2$, and thus to the fuzzy set on $[0,1]^2$ that is defined by $\gamma_K \triangleq \mathrm{cext}_{[0,1]^2}(\alpha_K) \wedge \mathrm{cext}_{[0,1]^2}(\beta_K)$, as a fuzzy set of gradual validity propositions.
Based on an extensive number of preliminary experiments, we found that the membership function of a fuzzy set of gradual validity propositions is highly sensitive to
the parameter set-up that we selected in the inference algorithm that we used in the
process of learning a rule base induced resemblance relation. Therefore, we decided to
convert fuzzy sets of gradual validity propositions into defuzzified values on the unit
interval, in order to obtain an easy to use instrument for assessing the validity of a simulation model. We refer to every defuzzified value as a validity coefficient. We compute
a first validity coefficient by calculating a kind of fuzzy centroid of $\gamma_K$ as the integral
$\int_0^1 \int_0^1 \gamma_K(x,y) \min(x,y)\,dx\,dy \,/\, \int_0^1 \int_0^1 \gamma_K(x,y)\,dx\,dy$. We call such a validity coefficient the
average validity of K. We compute a second validity coefficient by calculating a kind
of middle of maxima value of $\gamma_K$ as $\int\!\!\int_M \min(x,y)\,dx\,dy \,/\, \int\!\!\int_M dx\,dy$, in the assumption
that the membership function of $\gamma_K$ attains its maximum at the points in a subset M of
the unit square. We refer to such a validity coefficient as the validity mode of K.
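A grid based sketch of both validity coefficients; the discretisation of the unit square, the tolerance used to locate the set M of maxima, and the triangular example fuzzy sets are our own choices, while the min based intersection corresponds to Zadeh's t-norm that we also use in section 4.

```python
import numpy as np

def validity_coefficients(alpha: np.ndarray, beta: np.ndarray, grid: np.ndarray):
    """Average validity and validity mode of a model from sampled estimates
    of the fuzzy sets alpha_K and beta_K on a common grid over [0, 1]."""
    # gamma_K: min-based intersection of the cylindrical extensions of
    # alpha_K and beta_K into the unit square.
    gamma = np.minimum(alpha[:, None], beta[None, :])
    x, y = np.meshgrid(grid, grid, indexing="ij")
    weight = np.minimum(x, y)
    # Fuzzy centroid of gamma_K: the average validity.
    average_validity = (gamma * weight).sum() / gamma.sum()
    # Middle of maxima over the set M where gamma_K attains its maximum.
    m = gamma >= gamma.max() - 1e-12
    validity_mode = weight[m].mean()
    return average_validity, validity_mode

# Toy example: triangular alpha_K and beta_K peaking at 0.6 and 0.5.
grid = np.linspace(0.0, 1.0, 101)
alpha = np.maximum(0.0, 1.0 - np.abs(grid - 0.6) / 0.4)
beta = np.maximum(0.0, 1.0 - np.abs(grid - 0.5) / 0.4)
print(validity_coefficients(alpha, beta, grid))
```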
4. A validation problem
In this section, we apply the method for validation that we designed in section 3 to
a real-life airline case. The case involves a punctuality study that we carried out in
the past for a (former) Belgian airline company - see Adem [1]. Whereas our goal then
was to investigate the impact of different strategies to boost the departure punctuality
of the company's inter-European flights, our aim now is to evaluate the validity of the
simulation model that we built for that purpose.
4.1. Problem description
The real system of interest is a network of inter-European flights that are organised
according to the well-known hub & spoke architecture. All flights in the network are scheduled
from the hub (Brussels) to a line station, or from a line station (Venice, Bordeaux,
Amsterdam, etc.) back to the hub. When an aircraft with a particular flight arrives at
the hub or a line station, it undergoes a process to be made ready for its next flight. This
process is called aircraft rotation or turnaround, and involves activities like unloading,
cleaning, refueling, technical inspection, boarding, etc. Below, we briefly recapitulate
a core aspect of the conceptual model behind the simulation model that we built in
Adem [1], and that will play an important role in defining behaviour in subsection 4.2.
When an aircraft arrives on time at the hub or a line station, turnaround proceeds
according to what could be termed as normal circumstances. When an aircraft arrives
'somewhat' late, leaving however sufficient time to rotate the aircraft from a target
rotation time point of view, turnaround proceeds according to what could be called
agitated circumstances. When an aircraft is however 'substantially' delayed on arrival
such that the target rotation time is no longer available, turnaround proceeds according
to what could be termed as rushed circumstances. Conceptually, the delay at departure
for the next flight of an aircraft under normal and agitated circumstances is determined
in the simulation model by sampling a departure delay from an empirical distribution
of actual departure delays of the flight at issue. Under rushed circumstances, a surplus
turnaround time is sampled and added to the target rotation time, after which the result
is added to the actual time of arrival of the aircraft, in order to determine the delay at
departure of its next flight.
In this example, we select the target rotation time parameter at the hub, denoted
by $TRT_{hub}$, as a key parameter of the simulation model that has to be decided upon,
before any of the results that are obtained by using the model to investigate the impact
of e.g. changes in the current time table, can be called reliable. The decision to try
out different values for $TRT_{hub}$ comes from the fact that we are uncertain which of the
values yields the most valid simulation model, i.e. the model with behaviour that looks
most similar to the behaviour of the real airline network. We define the different values
of interest for $TRT_{hub}$ to be 40, 45, 50, 55 and 60 minutes. Each of these values thus
points to a fully calibrated simulation model for the airline network. We denote these
models by $K_{40}$ through $K_{60}$ respectively.
4.2. Defining behaviour
In every simulation model, we let the target rotation time parameter at the line stations,
denoted by $TRT_{ls}$, vary from 15 to 60 minutes, in discrete steps of 1 minute. Each value
of $TRT_{ls}$ corresponds in fact with a workload level for the hub. For, the higher the
value of $TRT_{ls}$, the more flights will find themselves in the third category of arrival
situations that we described in subsection 4.1, and hence the more flights will be rotated
under rushed circumstances at a line station. Since the departure delay of those flights
is computed by adding a surplus turnaround time to the value of $TRT_{ls}$, the departure
delay at a line station, as well as the arrival delay at the hub, will overall be larger than
average when the value of $TRT_{ls}$ is high. In other words, the higher the value of $TRT_{ls}$,
the higher the delays on arrival at the hub, and the more effort the hub has to display
in order to prevent huge delays on departure.
For every simulation model and value of $TRT_{ls}$, we now define a stochastic input
process, realisations of which are sequences of daily average arrival delays at the hub, and
a stochastic output process, realisations of which are sequences of relative frequencies,
indicating the daily probability that an aircraft has a departure delay at the hub of no
more than 3 minutes - departures with a delay of less than or equal to 3 minutes are
considered to be on time in the airline industry. Thus, the first number in a realisation
of a stochastic input process will be the average delay on arrival at the hub on the first
day of operation of the network, the second number will be the average delay on arrival
on the second day, and so on. Similarly, the first number in a realisation of a stochastic
output process will be the probability that an aircraft has a departure delay at the hub of
3 minutes or less on the first day, the second number will indicate the same information
for the second day, and so on.
Clearly, each of the simulation models $K_{40}$ through $K_{60}$ induces a mapping of
stochastic input on stochastic output processes. We now define behaviour of a model
in terms of the long-run means and variances of these stochastic processes. According
to a first behaviour orientation, we let behaviour be a relation between mean average
arrival delays and mean relative frequencies of on time departures. According to a second behaviour orientation, we let behaviour be a relation between average arrival delay
variances and on time departure relative frequency variances. In figure 4, we plotted an
estimate of the behaviour of every simulation model under both behaviour orientations.
We computed the behaviour estimates by processing the output data of the simulation
model that we built earlier in Martens [8].
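A sketch of how the behaviour points under both orientations could be computed from run data; the dictionary based data layout and the variable names are assumptions for illustration, not the actual data processing of Martens [8].

```python
import numpy as np

def behaviour_points(daily_arrival_delay: dict, daily_on_time_freq: dict):
    """One behaviour point per TRT_ls setting (15..60 minutes). The inputs
    map a TRT_ls value to the realised daily series of average arrival
    delays, respectively on time departure relative frequencies."""
    orientation1, orientation2 = [], []
    for trt in sorted(daily_arrival_delay):
        delays = np.asarray(daily_arrival_delay[trt])
        freqs = np.asarray(daily_on_time_freq[trt])
        # Orientation 1: long-run means of the input and output process.
        orientation1.append((delays.mean(), freqs.mean()))
        # Orientation 2: long-run variances of the two processes.
        orientation2.append((delays.var(ddof=1), freqs.var(ddof=1)))
    return orientation1, orientation2
```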
4.3. Experimental results
For the purpose of 'validating' the experimental results that we obtain with our fuzzy
set theoretic technique for behaviour validation, we let the simulation model $K_{40}$ play
the role of a base model for the airline network, and consider (for both behaviour orientations) the estimate that we computed of the behaviour of $K_{40}$ as the behaviour $\mathbb{B}$
of the network. Since figure 4 then suggests that the behaviour of the other simulation
models tends to deviate more and more from that of the airline network when the value
of $TRT_{hub}$ increases, the computational results that we achieve with our method should
eventually allow us to make the ranking $K_{45} \succ K_{50} \succ K_{55} \succ K_{60}$, where $\succ$ reads as is more
valid than.
In order to learn a fuzzy rule base from an estimate of the behaviour of a simulation
model on the one hand, and from the behaviour of the airline network on the other
hand, we employed a neuro-fuzzy inference algorithm called NEFPROX - see Nauck [10]
for details on the algorithm. Based on a number of experiments that we carried out in
the past with a classification variant of the algorithm in Martens [9], we decided to retain
the following parameters of the algorithm as being significant in the process of learning
a fuzzy rule base: 1) the number of fuzzy sets per input and output variable (being 3, 6,
9 or 12), 2) the shape of these fuzzy sets (being normal (Gaussian) or triangular), and 3)
the defuzzification method that is used to achieve a crisp output prediction (being center
of gravity, middle of maxima or first of maxima).
For every simulation model and parameter set-up, we estimated a rule base induced
resemblance relation from the membership values of the behaviour point estimates of the
simulation model and the behaviour points of the real system to the fuzzy rules that were
learned by NEFPROX - recall subsection 3.1 -, and estimated from there the cumulative
distribution of the forward and backward maximal resemblance function of the model.
We depicted some of the cumulative distribution estimates that we obtained in figure 5(a)
for the first behaviour orientation, and in figure 5(c) for the second behaviour orientation.
We then converted all cumulative distribution estimates into estimates of the membership
functions of the fuzzy sets of gradual sub- and supersystem propositions of the simulation
models according to our reasoning in subsections 3.2 and 3.3. In the computation of the
membership function estimates, we used Zadeh's t-norm, and thus set $T \triangleq \min$. We
depicted some of the membership function estimates that we obtained in figure 5(b) for
the first behaviour orientation, and in figure 5(d) for the second behaviour orientation.
Figures 6(a) through (d) then contain estimates of the membership functions of a variety
of fuzzy sets of gradual validity propositions. Finally, table 1 displays an estimate of the
average validity (first number in each cell) and validity mode (second number) that we
computed for every simulation model under the first behaviour orientation (left block of
estimates) and second behaviour orientation (right block), and this for all NEFPROX
parameter set-ups of interest.
Comparing the cumulative distribution and fuzzy set estimates that we showed in
figures 5 and 6 for the simulation models $K_{45}$ and $K_{50}$ with one another, and doing
the same for the simulation models $K_{55}$ and $K_{60}$, their location and shape appear to
agree with what we expect to find from an intuitive point of view. For, the more the
value of $TRT_{hub}$ in a simulation model deviates from the ideal value (being 40 minutes),
the more the cumulative distribution and fuzzy set estimates for the model should be
bent towards the left and/or should exhibit a sharp rise near the begin point of the
unit interval or the south west corner of the unit square - keeping the parameter set-up of the NEFPROX algorithm fixed. Notice that this expectation is indeed
confirmed by the figures.
Investigating the validity coefficients that we displayed for both behaviour orientations in table 1, we conclude that, the more the value of $TRT_{hub}$ deviates from the
ideal value, the lower (overall) the average validity and validity mode of the corresponding simulation model appear to be. In that respect, if we had to rank the simulation
models in a decreasing order of validity on the basis of the computational results in the
table, then we would set up the (intuitively correct) ranking $K_{45} \succ K_{50} \succ K_{55} \succ K_{60}$.
When we then finally have to decide on the value of $TRT_{hub}$ in our simulation model
for the airline network, before using the model for making a reliable study of the impact
of different suggestions to improve the departure punctuality of flights at the hub, we
would pin down the value of $TRT_{hub}$ at 45 minutes in preference to 50, 55 and 60 minutes.
Figure 4. Simulation model behaviour estimates - (a) orientation 1, with mean average arrival delay on the horizontal axis; (b) orientation 2, with average arrival delay variance on the horizontal axis
Figure 5. Cumulative distribution and fuzzy set estimates - (a) cumulative distributions (smoothed), orientation 1; (b) fuzzy sets (smoothed), orientation 1; (c) cumulative distributions (smoothed), orientation 2; (d) fuzzy sets (smoothed), orientation 2
Figure 6. Membership function estimates of fuzzy sets of gradual validity propositions
Table 1. Validity coefficients (average validity / validity mode) under behaviour orientation 1 (left block) and orientation 2 (right block), per fuzzy partitioning of 3, 6, 9 and 12 fuzzy sets per variable

| Model | Shape | Defuzz | Or.1 / 3 | Or.1 / 6 | Or.1 / 9 | Or.1 / 12 | Or.2 / 3 | Or.2 / 6 | Or.2 / 9 | Or.2 / 12 |
|-------|-------|--------|----------|----------|----------|-----------|----------|----------|----------|-----------|
| K45 | TRIA | COG | 0.74/0.92 | 0.44/0.48 | 0.45/0.53 | 0.34/0.41 | 0.71/0.91 | 0.56/0.81 | 0.40/0.58 | 0.39/0.53 |
| | | MOM | 0.77/0.94 | 0.41/0.43 | 0.38/0.41 | 0.31/0.35 | 0.66/0.87 | 0.51/0.76 | 0.42/0.64 | 0.38/0.50 |
| | | FOM | 0.77/0.93 | 0.46/0.52 | 0.39/0.45 | 0.36/0.44 | 0.63/0.87 | 0.52/0.78 | 0.42/0.64 | 0.38/0.48 |
| | NORM | COG | 0.66/0.88 | 0.49/0.66 | 0.42/0.53 | 0.38/0.46 | 0.70/0.91 | 0.50/0.72 | 0.42/0.64 | 0.37/0.50 |
| | | MOM | 0.60/0.84 | 0.47/0.61 | 0.44/0.57 | 0.37/0.44 | 0.69/0.90 | 0.52/0.74 | 0.43/0.62 | 0.37/0.52 |
| | | FOM | 0.64/0.87 | 0.46/0.58 | 0.41/0.52 | 0.34/0.38 | 0.71/0.91 | 0.54/0.78 | 0.50/0.75 | 0.39/0.56 |
| K50 | TRIA | COG | 0.41/0.43 | 0.32/0.34 | 0.32/0.36 | 0.27/0.27 | 0.63/0.86 | 0.49/0.71 | 0.33/0.38 | 0.37/0.46 |
| | | MOM | 0.37/0.35 | 0.32/0.34 | 0.27/0.31 | 0.23/0.20 | 0.65/0.83 | 0.45/0.59 | 0.36/0.41 | 0.37/0.50 |
| | | FOM | 0.39/0.41 | 0.36/0.45 | 0.29/0.33 | 0.26/0.27 | 0.68/0.87 | 0.49/0.71 | 0.36/0.42 | 0.37/0.47 |
| | NORM | COG | 0.45/0.58 | 0.31/0.32 | 0.34/0.37 | 0.27/0.25 | 0.62/0.86 | 0.44/0.65 | 0.41/0.55 | 0.31/0.37 |
| | | MOM | 0.40/0.52 | 0.30/0.29 | 0.33/0.35 | 0.27/0.23 | 0.63/0.86 | 0.44/0.65 | 0.41/0.51 | 0.34/0.42 |
| | | FOM | 0.42/0.54 | 0.31/0.32 | 0.33/0.33 | 0.25/0.19 | 0.65/0.86 | 0.43/0.62 | 0.38/0.50 | 0.32/0.35 |
| K55 | TRIA | COG | 0.27/0.27 | 0.30/0.33 | 0.29/0.32 | 0.25/0.27 | 0.42/0.46 | 0.36/0.40 | 0.34/0.33 | 0.34/0.35 |
| | | MOM | 0.21/0.18 | 0.32/0.36 | 0.27/0.30 | 0.18/0.16 | 0.25/0.22 | 0.32/0.38 | 0.36/0.35 | 0.34/0.34 |
| | | FOM | 0.23/0.21 | 0.30/0.30 | 0.28/0.32 | 0.22/0.20 | 0.39/0.45 | 0.34/0.40 | 0.36/0.35 | 0.34/0.34 |
| | NORM | COG | 0.29/0.28 | 0.24/0.16 | 0.28/0.26 | 0.27/0.24 | 0.47/0.56 | 0.41/0.48 | 0.36/0.41 | 0.30/0.28 |
| | | MOM | 0.29/0.28 | 0.24/0.20 | 0.27/0.21 | 0.23/0.16 | 0.49/0.64 | 0.41/0.48 | 0.33/0.33 | 0.30/0.28 |
| | | FOM | 0.30/0.32 | 0.25/0.17 | 0.26/0.22 | 0.22/0.14 | 0.50/0.61 | 0.40/0.44 | 0.35/0.41 | 0.32/0.29 |
| K60 | TRIA | COG | 0.34/0.36 | 0.14/0.05 | 0.14/0.05 | 0.17/0.11 | 0.14/0.05 | 0.18/0.12 | 0.22/0.17 | 0.28/0.25 |
| | | MOM | 0.34/0.36 | 0.13/0.05 | 0.14/0.05 | 0.14/0.06 | 0.14/0.05 | 0.20/0.17 | 0.23/0.19 | 0.33/0.29 |
| | | FOM | 0.34/0.36 | 0.14/0.05 | 0.22/0.18 | 0.17/0.11 | 0.24/0.22 | 0.20/0.14 | 0.24/0.19 | 0.32/0.29 |
| | NORM | COG | 0.24/0.22 | 0.15/0.07 | 0.27/0.22 | 0.25/0.17 | 0.29/0.29 | 0.31/0.30 | 0.24/0.19 | 0.18/0.13 |
| | | MOM | 0.23/0.21 | 0.15/0.10 | 0.25/0.20 | 0.22/0.13 | 0.34/0.38 | 0.31/0.31 | 0.23/0.18 | 0.18/0.13 |
| | | FOM | 0.29/0.31 | 0.16/0.10 | 0.26/0.20 | 0.18/0.09 | 0.34/0.38 | 0.30/0.29 | 0.26/0.23 | 0.16/0.11 |
5. Conclusion
In this paper, we presented a new method to validate the behaviour of simulation models. First, in section 2, we argued that the current statistical methods for behaviour
validation are binary in nature, and do not help the model builder in ranking models
from more to less valid ones. Then, in section 3, we designed a fuzzy set theoretic
method for behaviour validation that expresses the similarity between the behaviour
of a simulation model and the behaviour of a real system on a continuous scale. Finally, in
section 4, we experimented with our method in the context of a real-life airline network.
In the future, we intend to investigate the sensitivity of the validity coefficients that
we computed to the particular inference algorithm that we employed in the process of
learning a rule base induced resemblance relation. We also plan to increase the scope of
our experiment, and intend to generate computational results for additional behaviour
orientations and other modelling and validation problem cases.
References
1. J. Adem and J. Martens. Analysing and improving network punctuality at a Belgian airline
company using simulation. Technical Report 0140, Katholieke Universiteit Leuven, 2001.
2. O. Balci and R.G. Sargent. Validation of simulation models via simultaneous confidence
intervals. American Journal of Mathematical and Management Sciences, 4(3-4):375-406,
1984.
3. M. De Cock and E.E. Kerre. On (un)suitable fuzzy relations to model approximate equality.
Fuzzy Sets and Systems, 133(2):181-192, 2003.
4. E.E. Kerre. Introduction to the Basic Principles of Fuzzy Set Theory and Some of its
Applications. Communication and Cognition, 1993.
5. J.P.C. Kleijnen and R.G. Sargent. A methodology for the fitting and validation of metamodels in simulation. European Journal of Operational Research, 120(1):14-29, 2000.
6. A.M. Law and W.D. Kelton. Simulation Modeling and Analysis. McGraw-Hill, 2000.
7. J. Martens. Model validation: An integrated, fuzzy set and system theoretic approach.
Technical Report 0311, Katholieke Universiteit Leuven, 2003.
8. J. Martens and J. Adem. Punctuality evaluation of airline networks with simulation. In
Proceedings of the 13th European Simulation Symposium, pages 440-447, 2001.
9. J. Martens, G. Wets, J. Vanthienen, and C. Mues. Improving a neuro-fuzzy classifier using exploratory factor analysis. International Journal of Intelligent Systems, 15(8):785-800,
2000.
10. D. Nauck, F. Klawonn, and R. Kruse. Neuro-Fuzzy Systems. Wiley, 1997.
11. T.H. Naylor, K. Wertz, and T.R. Wonnacott. Spectral analysis of data generated by simulation experiments with econometric models. Econometrica, 37:333-352, 1969.
12. L.W. Schruben and V.J. Cogliano. An experimental procedure for simulation response
surface model identification. Communications of the ACM, 30(8):716-730, 1987.
13. B.P. Zeigler. Theory of Modeling and Simulation. Academic Press, 1976.
14. H.-J. Zimmermann. Fuzzy Set Theory and its Applications. Kluwer Academic, 1991.