Anticipation of Stochastic Customer Requests in Vehicle
Routing: Value Function Approximation based on a Dynamic
Lookup-Table
Marlin W. Ulmer, Dirk C. Mattfeld, Felix Köster
Technische Universität Braunschweig, Germany,
[email protected]
December 4, 2014
Abstract
We present a vehicle routing problem with stochastic customer requests. A single vehicle
has to serve customers in a service area. In the beginning, a set of customers is known and
has to be served. During the day, stochastic requests arrive. Respecting a time limit, the
dispatcher has to decide whether to reject or confirm each request. Effective decision making
requires the anticipation of future requests. We apply approximate dynamic programming
(ADP), evaluating appearances and actions regarding the expected number of future customer
confirmations. As indicators for the expected value, we use the parameters time and slack.
The values are stored in a lookup table (LT). We show that the (static) interval length of
the LT-axes significantly impacts the approximation process and the solution quality. Small
intervals result in high solution qualities, large intervals in a fast and reliable approximation.
To combine both advantages, we introduce a dynamic LT. The dynamic LT adapts interval
lengths to the problem specifics during the approximation process. The dynamic LT proves to be
a generic approach, providing an efficient approximation process and effective decision making.
keywords: vehicle routing, stochastic customer requests, value function approximation, approximate dynamic programming, dynamic lookup table
1 Introduction

In this paper, we study a single-period dynamic vehicle routing problem with stochastic customer requests. We assume that a non-capacitated vehicle collects parcels during the
day. The vehicle starts and ends its tour in
a depot. Early request customers have to be
served and are known in advance (e.g., are
postponed customers from the day before).
During the service period, new requests arrive
stochastically within the service area. Arriving
at a customer provides a set of new requests.
The dispatcher has to decide whether to
permanently accept or reject new requests.
The confirmed subset of requests has to be
added to the planned tour. Our objective is
to maximize the number of collected parcels
considering the given time limit reflecting the
driver’s working hours. As a result, not every
request can be served. So, in some cases, rejecting feasible customers in one area may be worthwhile in order to serve more future requests in another.
In last mile delivery, challenges for logistic service providers increase. Especially for courier, express and parcel services, customers expect reasonably priced, fast, and reliable service (Ehmke 2012). Due to a significant increase in e-commerce sales, the number of shipped parcels has grown substantially in recent years. Besides delivery, service providers offer the pickup
of parcels (Hvattum et al. 2006). For delivery,
regular tours are defined by service network
design (Crainic 2000). For parcel pickup,
additional vehicles are scheduled dynamically,
because some pickups are requested in the
course of the day (Lin et al. 2010). For these
vehicles, a subset of the pickup-requests is
stochastic and not known in the beginning of
the service period. Further, the requesting
customers may be arbitrarily distributed in
the whole service region and the locations
may not be known beforehand. To include
new pickup-requests in the tour, the provider
has to dynamically adapt the current plan
(Gendreau et al. 2006). Due to working hour
restrictions, service providers may not be able
to serve all dynamic requests. Some requests
have to be rejected (Menge and Hebes 2011).
Decisions about new requests significantly
impact future outcomes. So, myopic plans
may lead to substantial detours and many
customer rejections in the future. To address
this risk, the anticipation of future requests in current decision making is mandatory (Meisel 2011). For the anticipation of future requests, service providers can derive probabilities of
customer requests for certain regions of the
service area by making prognoses about customer behavior (Dennis 2011) and analyzing
historical data.
The focus of this paper is on providing
anticipatory confirmation policies.
Given
a set of new requests, such a policy allows
for decisions about confirming or rejecting a
request. In the decision process, each decision
results in a known system appearance. Such
an appearance consists of the point of time
and the locations of the vehicle and the
remaining customers.
These appearances
have a significant impact on the number of
future confirmations. To exploit this impact,
we apply a value function approximation
using approximate dynamic programming
(ADP, Powell 2007). ADP considers both the
immediate and expected future confirmations.
The immediate number depends directly on
the applied decision. The expected number
of future confirmations is approximated for
every appearance. Due to the multiplicity of
possible appearances, a distinct approximation of every appearance value is not feasible.
Therefore, ADP assigns a group of appearances to a simplified (post decision) state of lower dimensionality. Usually, an appearance is reduced to a vector of key parameters (for the given problem, time and slack). To reduce dimensionality, the parameter realizations are assigned to intervals. These intervals generate the state space.

The state space can be described as a lookup table (LT, Sutton and Barto 1998). Every axis of the LT represents a parameter. The axes are divided into equidistant intervals. The number of different entries in the table, and therefore the size of the state space, is inversely proportional to the interval length. The selection of the interval length is essential for the success of ADP. A large interval length, i.e., a small LT, allows a frequent observation of the entries and, therefore, a fast and reliable approximation. Nevertheless, it may group heterogeneous appearances into a single entry (George et al. 2008). This results in a high value deviation within the observations of an entry. The decision quality decreases. A small interval length, i.e., a large LT, allows a detailed differentiation of the appearances, but entries are observed rarely. So, the approximation requires high computational effort and a large amount of memory (Sutton and Barto 1998). In this paper, we combine the advantages of both small and large interval lengths by defining a dynamic LT (DLT). The DLT is a generic approach and adapts to the problem specifics. It dynamically changes the interval lengths of the parameters according to the approximation process. Assuming different approximation behavior within the LT, "interesting" areas of the DLT are considered in more detail, while the rest of the DLT stays in the original design. This allows both a fast and reliable approximation and, if necessary, a detailed differentiation of the appearances. We compare different DLTs with LTs of static interval length for instances of real-world size. All approaches allow anticipation and increase the number of served customers significantly. For the static LT, we experience a high variance in solution quality regarding different interval lengths. An a priori determination of a suitable interval length is not possible. The DLTs provide excellent solution quality and significantly reduce the required memory consumption.

This paper is outlined as follows. In §2, we present and discuss the related literature focusing on vehicle routing problems with stochastic customer requests. In §3, we recall the general terminology and modus operandi of ADP using a (static) lookup table. To analyze the impact of the LT selection on the approximation process, we present an example highlighting the trade-off between approximation efficiency and solution quality using aggregation within the LT. We use this motivation to develop a new approach using a DLT in §4. In §5, we define the vehicle routing problem and present a variety of real-world sized instances differing in customer distribution, region size, and ratio of dynamic customers. For these instances, we apply ADP using different LTs and analyze the approximation process and the solution qualities in §6. We especially evaluate the approximation efficiency to highlight the advantages of a dynamic lookup table. The paper concludes with a summary of the results and directions for future research in §7.

2 Literature review

We consider a stochastic and dynamic vehicle routing problem (Kall and Wallace 1994). The problem is stochastic, because not all information is provided in the beginning, but is revealed over time. It is dynamic, because the problem setting allows decision making during the service period. The literature on stochastic and dynamic vehicle routing is vast. Stochastic impacts are uncertain travel times (Thomas and White III 2007, Lecluyse et al. 2009, Ehmke et al. 2015), service times (Daganzo 1984, Larsen et al. 2002), stochastic customer demands (Erera and Daganzo 2003, Sungur et al. 2008), and requests (Psaraftis 1980, Ichoua et al. 2007). An extensive overview of the different problem settings is given by Pillac et al. (2013). In the sequel, we focus on work considering stochastic customer requests. For these problems, decisions consider both routing and request confirmations. We first present myopic approaches. Then, we review the anticipatory approaches.

2.1 Myopic vs. Anticipatory

Solution approaches can be divided into myopic and anticipatory. Myopic approaches only use already revealed information, while anticipatory approaches exploit further knowledge (Butz et al. 2003). To deal with stochastic customer requests, myopic approaches are mainly focused on routing, following a greedy confirmation policy. First come, first serve-policies are applied by Psaraftis (1980, 1995), Bertsimas and Van Ryzin (1991), Swihart and Papastavrou (1999), and Larsen et al. (2002). Tassiulas (1996) partitions the service region and subsequently serves the subareas. Gendreau et al. (1999, 2006) combine tabu search and an adaptive memory with a rolling horizon algorithm to dispatch customer requests to a fleet of vehicles. Other frequent myopic approaches are basic waiting strategies (e.g., wait at start, wait at end), for instance applied by Mitrović-Minić and Laporte (2004).

For customer anticipation, knowledge about future requests has to be incorporated in the decision process. This knowledge allows forecasts and can be derived from historic observations or depends on predictions of future events. In most approaches, information about the future regards the positions (or distributions) of possible customers and their request behavior over time (Hvattum et al. 2006). Nevertheless, customers can request service at arbitrary locations at any point in time. So, we experience a nearly infinite number of possible problem outcomes and appearances. Due to the high dimensionality of the information, it is necessary to aggregate information to apply optimization algorithms (Rogers et al. 1991, Provost and Fawcett 2013). To include forecasts into optimization, all anticipatory approaches apply a simplification of the problem or within the algorithm. These, mainly static, simplifications have a significant impact on algorithm efficiency and solution quality. On the one hand, only simplified information allows the efficient application of solution algorithms. On the other hand, a simplification has to maintain the problem-specific characteristics to effectively achieve good solution qualities (Meisel and Mattfeld 2010). Simplification is applied in the problem setting, by using decomposition, and within the stochastic optimization model. To anticipate future customers, predefined policy functions using aggregated information (e.g., waiting strategies), sampling approaches reducing the number of possible outcomes, and approaches approximating the value of certain groups of appearances are applied.

2.2 Problem Simplification and Decomposition

Many anticipatory approaches use a geographical aggregation (Campbell 2006) to simplify the problem structure. Here, possible customers are represented by a set of nodes in a graph model. So, the vast number of possible customer locations is simplified to a known set of reasonable size. For some problems, this bears the risk of a discrepancy between (aggregated) decision and practical implementation
(Ulmer and Mattfeld 2013) and, therefore, of inefficient solutions. To achieve suitable practical actions, an appropriate disaggregation policy is required. Another approach to reduce dimensionality is problem decomposition. In some cases, a greedy confirmation policy is combined with anticipatory routing (Thomas and White III 2007). Alternatively, a predefined routing algorithm is applied, while anticipation is confined to the confirmation policy (Schmid 2012). In both cases, the number of possible decisions, and so the decision space dimensionality, can be reduced significantly.

2.3 Policy Function Approximation

In a policy function approximation, decision policies follow certain rules depending on aggregated information. For vehicle routing problems, not every possible future customer is considered. Instead, the information is translated into some key figures to evaluate customers (Powell et al. 1988, van Hemert and La Poutré 2004), certain routes (Branke et al. 2005), or waiting locations and waiting times (Thomas 2007). Mainly, these approaches determine certain routes and waiting locations to insert new requests efficiently in the scheduled tour. Larsen et al. (2004) propose to wait at idle points with high probabilities of future requests. Branke et al. (2005) maximize the probability to include new requests in the tour by using evolutionary algorithms and waiting strategies to find the best tour and waiting position. Ichoua et al. (2006) partition the service area into zones and calculate the different request probabilities. Waiting is only applied near zones with a request probability higher than a certain threshold. Thomas (2007) introduces a center-of-gravity waiting strategy. He dynamically calculates the center of gravity for the remaining (feasible) potential customers. The vehicle waits at the customer right before this center. So, it does not pass by a majority of potential customers. Policy function approximations allow the application of efficient optimization algorithms. Nevertheless, the anticipation of future requests is limited due to the static rules and the generally high information aggregation.

2.4 Sampling

Monte Carlo sampling is used to reduce the dimensionality of the optimization problem. Sampling approaches simulate a set of future events to evaluate current decisions. The set of all possible outcomes is represented by a limited number of sampled outcomes. Each sampled outcome generates a lookahead model (Powell 2014). Optimization is applied to these models. Hence, the outcome space is simplified, while the detailed level of information within the outcomes can be maintained. Sampling allows a more detailed consideration of future events, but often requires significant computational effort, depending on the number of generated samples. To anticipate stochastic customer requests in vehicle routing, future customer requests are simulated. These requests are used to evaluate different routes and decisions. Bent and Van Hentenryck (2003, 2004) introduce a multiple plan approach and a multiple scenario approach, where future customer requests are sampled and integrated in plans containing a set of routes. This approach is also used by Sungur et al. (2010) and Flatberg et al. (2007). Ghiani et al. (2009) use short term sampling for a pick-up and delivery problem. Hvattum et al. (2006) approach a real-world case study with sampling. They use historical data of customer requests to minimize the expected travel costs. Ghiani et al. (2012) compare the center-of-gravity waiting approach with a sampling method and achieve nearly similar results. Even though the sampling approach requires high computational effort, it is not able to surpass the efficient policy function approximations.

2.5 Value Function Approximation

A decision mainly depends on the expected future contributions, i.e., the value of an appearance (Bellman 1957a). For problem settings of real-world size, calculating these expected values is computationally intractable and value function approximation (VFA) is applied. Among others, approximation is achieved using ADP (Powell 2009). ADP approximates appearance values by simulation and, therefore, finds reasonable decisions combining immediate and expected future contributions. Again, aggregation techniques are necessary to reduce the manifold appearances to a suitable set of states, which can be efficiently evaluated. Mainly, an appearance is assigned to a vector of key parameters. The parameter selection is essential to achieve anticipation. The parameters must enable grouping similar appearances to identical states. The similarity of appearances depends on the problem specifics and the applied routing and confirmation heuristics. Different routing heuristics may significantly impact the value of an appearance.

In some cases, the value of a state is calculated directly by a weighted set of basis functions depending on the parameters. The basis functions express hypotheses of the correlations between the key parameters and the value. During the simulation runs, these weights are approximated using multiple linear regression. Meisel et al. (2011) use a basis function considering the slack and the average additional time to include the remaining possible customers. They state that the number of expected future confirmations directly depends on the ratio between slack and average time to include a customer.

To allow a direct and unbiased mapping between the key parameters and a value, each parameter vector is assigned to a value stored in a lookup table (LT). This enables a more detailed evaluation of the appearances without the use of any a priori hypothesis. Given a known set of possible customers combined with decomposition of the algorithm, this allows a nearly exact appearance representation. Meisel et al. (2009) use a vector of customer statuses to describe an appearance. Each state contains information about the time, the vehicle position, and the set of possible future customers. Considering a priori unknown customers distributed in the whole service area, this approach is no longer applicable. Schmid (2012) partitions the service area into zones. A state consists of the number of vehicles and customers in the different zones. In combination with cheapest insertion for routing, Ulmer et al. (2014) use a LT containing the numerical parameters time and slack as appearance representation, achieving an acceptable confirmation policy at the expense of a large state space and, therefore, high memory consumption and slow approximation.

The dimensionality of a LT is highly limited due to computational and memory reasons (Barto 1998). To reduce dimensionality, the numerical parameters of a LT can be aggregated to intervals. George et al. (2008) analyze the influence of the aggregation level on the approximation process. Results show that the aggregation level has a significant impact on the approximation process and the solution quality. On the one hand, a highly aggregated LT allows for a frequent visit of the entries and supports a fast approximation. On the other hand, for a detailed consideration of the problem specifics, a fine-grained level of aggregation is necessary. In essence, we experience a trade-off between method efficiency and solution quality. To reduce the number of approximation iterations and to improve
solution quality, a dynamical adaption of the LT to the problem and algorithm specifics is promising. George et al. (2008) provide the first approach to combine different levels of aggregation. The values are calculated using a set of LTs differing in the level of aggregation. George et al. depict that for some problems this approach results in a fast learning process with good solutions. Nevertheless, the calculation effort and memory consumption using multiple LTs increase drastically, caused especially by the LT with the lowest level of aggregation. Thus, we propose to use a single LT with areas of different aggregation levels (interval lengths), adapted dynamically during the approximation process.

3 Approximate Dynamic Programming

To define the dynamic LT in §4, we first recall the terminology and modus operandi of ADP. We focus on ADP using post decision states (PDSs) and a LT for evaluation. ADP consists of three main modules, depicted in Figure 1: On the operational level, ADP simulates a problem. For decision making, it extracts an appearance and a set of possible actions, which are assigned to the according set of post decision states. The immediate contributions of the actions and the set of PDSs are used in a Markov Decision Process (MDP, Bellman 1957b) to achieve a decision. For the evaluation of the PDSs, the MDP draws on the value function approximation. Finally, the provided decision is assigned back to the specific action applied to the simulation. The outcome of an iteration is used to update the VFA. During the iterations, the VFA approximates the expected PDS values. This subsequently leads to a more accurate evaluation, and the solution quality increases. To achieve an efficient and effective approximation process, the interfaces between system and MDP, i.e., the assignments of an appearance-action pair to a PDS for evaluation, are essential.

[Figure 1: ADP-Algorithm: Interaction between Problem-System and MDP]

For optimization, ADP uses the functionality of a Markov Decision Process (Bellman 1957b). MDPs provide optimal solutions for stochastic and dynamic multi-stage optimization problems of small size. Given a small set of possible states and the probabilities for transitions between the states, a MDP allows the calculation of the optimal decision considering immediate and expected future contributions. In each state s of a finite set S = {s0, . . . , sn}, a subset of possible decisions depending on state s is given: Xs ⊆ X = {x1, . . . , xm}. The outcome of each combination (s, x) is known beforehand and is defined as the PDS (s, x) = p ∈ P = S × X. A realization ω of the (stochastic) transition function Ω : P → S leads to the subsequent state s′ = ω(s, x). Each decision x ∈ X in state s ∈ S generates an immediate contribution C : S × X → R. Additionally, it leads to a more or less valuable PDS p. The expected value V (p) is calculated by the (discounted) sum of the expected following contributions. So, the value function can be defined
by V : P → R. To select a decision, the immediate contribution C(s, x) and the PDS-value V (p) = V (s, x) are combined. An optimal decision x∗s for state s fulfills:

x∗s = arg max_{x ∈ Xs} (C(s, x) + V (s, x))    (1)

The expected value of a PDS is calculated recursively using an adaption of the Bellman equation (Bellman 1957a):

V (s, x) = ∑_{s′ ∈ S} π(s′, (s, x)) V (s′)    (2)

Here, π : P × S → [0, 1] is the probability of the transition between (s, x) and s′.

For many problem settings, the number of possible system appearances and actions is vast, so it is impossible to apply the plain MDP. Further, π is not accessible or not computationally tractable. Hence, to apply a MDP to a more complex problem, appearances are grouped into states. The state values are approximated using simulation.

3.1 Value Function Approximation

In every decision point in the operational system (appearance), a set of possible actions is given. The appearance and the actions are assigned to a set of post decision states to apply the MDP, as seen in Figure 1. The MDP chooses the decision leading to the most valuable PDS, considering the immediate contributions of the actions as seen in (1). The expected values are provided by the value function approximation. The chosen decision is reassigned to the according action, which is applied to the system appearance in the simulation. By updating the state values, ADP approximates the value function during the iterations. Here, every post decision state is initialized with value V̂0. During the simulation, the values are updated with the realized values V̂run as given in (3):

V̂i+1 = (1 − α) V̂i + α V̂run    (3)

The parameter α represents the stepsize of the approximation process. The more frequently a state is visited, the more accurate is the approximation. If states are only visited sparsely, the approximation and the solution quality might be impaired (Barto 1998, p. 193). In multistage problems, V̂run contains the subsequent contributions of one iteration. Hence, further ineffective solutions impact V̂run. By choosing an appropriate stepsize and initial values, V̂ approximates the values of the PDSs within the iterations.

3.2 State Space Representation

The simulation depicted in Figure 3 models the problem on an operational level, i.e., appearances and actions consist of every detailed piece of information provided by the simulation. To apply the MDP, these appearance-action pairs have to be assigned to post decision states by aggregation of information. Therefore, a vector of key parameters (e.g., point of time) ki ∈ Ki ∀ i ∈ {1, . . . , n} is derived from appearance and action and is used for evaluation. Mainly, parameters can be binary, ordinal, or numerical. There are two different approaches for evaluation: value approximation using a lookup table and the application of weighted basis functions (Powell 2007). In this paper, we consider PDS spaces represented by LTs, because they allow a more accurate and unrestricted mapping between appearances and values. Nevertheless, they often lead to computational intractability and consume large amounts of memory. In the following, we define the classical static LT and show the impact of the LT size on the approximation process and solution quality. As a result, we derive the motivation for the dynamic LT.
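The interplay of decision rule (1) and update rule (3) can be sketched in a few lines of Python. This is a minimal illustration under assumptions of ours, not the implementation used in this paper: the LT key built from time and slack intervals, the stepsize value, and the candidate decisions are all made up for the example.

```python
# Sketch of one ADP iteration with a lookup table (LT), assuming
# hypothetical post decision states keyed by (time, slack) intervals.
from collections import defaultdict

V = defaultdict(float)          # V-hat: LT entry -> approximated value, V0 = 0
alpha = 0.1                     # stepsize of update rule (3), chosen arbitrarily

def pds_key(time, slack, interval=15):
    """Map a post decision state to its LT entry (static intervals)."""
    return (time // interval, slack // interval)

def choose(decisions):
    """Decision rule (1): maximize immediate plus approximated future value."""
    return max(decisions, key=lambda d: d["C"] + V[pds_key(d["time"], d["slack"])])

def update(entry, v_run):
    """Update rule (3): V_{i+1} = (1 - alpha) * V_i + alpha * V_run."""
    V[entry] = (1 - alpha) * V[entry] + alpha * v_run

# Hypothetical decision point: confirm a request (detour, one more parcel)
# or reject it (more remaining slack).
decisions = [
    {"name": "confirm", "C": 1.0, "time": 95, "slack": 20},
    {"name": "reject",  "C": 0.0, "time": 80, "slack": 45},
]
best = choose(decisions)                          # picks by C + V-hat
update(pds_key(best["time"], best["slack"]), v_run=4.0)
```

Using a `defaultdict` mirrors initializing every post decision state with the same value V̂0 = 0 without materializing the whole table in advance.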
3.2.1 Static Lookup Table

A PDS p is represented by a set of key parameters p = (k1, k2, . . . , kn), ki ∈ Ki, ∀ i ∈ {1, . . . , n}. Without loss of generality, we only consider numerical parameters. The resulting LTs might be extended by extra non-numerical parameters. The PDS space is evaluated by an n-dimensional LT. Each axis of the LT represents a parameter. The axes are divided into static intervals. The PDSs are assigned to the entries of the static LT (SLT) by mapping parameter values to the LT-intervals. The size of a SLT is defined by the interval lengths and is essential for the approximation process. Small interval lengths allow a highly accurate representation of an appearance. Nevertheless, they impede the approximation process and are computationally challenging. Additionally, a large SLT requires a significant amount of memory (Sutton and Barto 1998). For a certain parameter, let K1 be the LT with the smallest interval length. Now, within K1, we can aggregate the parameters to intervals of size I. For a particular parameter, this leads to:

|K^I| = |K^1| / I    (4)

Let I be the interval length of all Ki^I, i ∈ {1, . . . , n}, and P^I the resulting LT of the post decision state space. Then, combining several parameters, we can calculate the impact of uniform aggregation on the size of P^I:

|P^I| ≤ ∏_{i=1}^{n} |Ki^I| = (1/I^n) ∏_{i=1}^{n} |Ki^1|    (5)

In essence, the number of possible LT-entries decreases significantly given larger interval lengths within the parameter sets.

Let us show by a simplified, single-stage example how the interval length impacts the approximation process. Given a single parameter, in the lowest level of aggregation, the according SLT contains four entries P = {p1, . . . , p4}. Additionally, we consider an aggregated SLT P̄ = {p̄}. The value of every entry in P follows a normal distribution with expected value V(pi) and variance σ²(pi). Additionally, let νi be the probability of observing pi. The according values are shown in Table 1. Here, the entry values behave heterogeneously; the expected values and deviations rise from p1 to p4. The according values of p̄ are a result of the aggregation.

To show the impact of the SLT size on the approximation process, we calculate the expected necessary number of iterations n∗i for every entry and the total number n∗ for termination, i.e., for sufficient approximation in every entry. As a termination criterion, we allow a difference of the average values V̂ to the expected values of up to 0.05. Further, we calculate the number of iterations n̄ for the aggregated entry p̄ and compare the results. For each entry, we calculate the distribution of the average of the realizations (i.e., α = 1/n). Then, we derive the probability Pk that the average lies in the allowed deviation range after k entry iterations. We determine n∗i as the minimal number of iterations with a probability higher than 95.0%. Let V̂l(pi) be the value of the l-th entry realization of pi. Then, the minimum number of iterations for entry pi can be calculated as seen in Formula (6). Additionally, the total number of iterations n∗ is the maximum number of individual iterations considering the probability of observing the entry:

n∗ = max_{i ∈ {1,...,4}} { n∗i / νi }    (7)

Hence, rarely visited entries increase the necessary number of iterations of the algorithm.
Table 1: Expected Entry Values and Deviation

Entries   p1     p2     p3     p4     p̄
V         1.0    2.0    4.0    6.0    3.0
σ²        0.0    1.0    3.0    4.0    4.3
ν         0.2    0.3    0.4    0.1    1.0
n∗i       0      1,537  4,610  6,146  6,590

n∗i = arg min_{k ∈ N+} { Pk ( |V(pi) − (1/k) ∑_{l=1}^{k} V̂l(pi)| ≤ 0.05 ) ≥ 0.95 }    (6)
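The criterion behind Formula (6) can be checked by simulation. The sketch below is our own illustrative Monte Carlo estimate, assuming normally distributed entry realizations with the Table 1 parameters of entry p2; the function name and the number of repetitions are arbitrary choices.

```python
# Monte Carlo check of the termination criterion behind Formula (6):
# probability that the k-sample average of entry realizations lies
# within 0.05 of the expected value. Parameters are taken from Table 1.
import random

random.seed(1)

def p_within(k, mean, var, reps=500, tol=0.05):
    """Estimate P(|V(p_i) - average of k realizations| <= tol)."""
    hits = 0
    for _ in range(reps):
        avg = sum(random.gauss(mean, var ** 0.5) for _ in range(k)) / k
        if abs(mean - avg) <= tol:
            hits += 1
    return hits / reps

# For entry p2 (V = 2.0, sigma^2 = 1.0), n*_2 = 1,537 iterations should
# push this probability to roughly 95%.
prob = p_within(1537, 2.0, 1.0)
print(prob)
```

The estimate hovers around 0.95, since for k = 1,537 the standard deviation of the average is 1/√1537 ≈ 0.0255, and 0.05/0.0255 ≈ 1.96, the two-sided 95% quantile of the normal distribution.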
In the example, p4 requires the most visits for termination, with n∗4 = 6,146. Due to the probability ν4 = 0.1 of observing p4, the expected number of runs for termination of the whole process is n∗ = n∗4 / 0.1 = 61,460.
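The termination bound (7) can be reproduced directly from the Table 1 figures; the dictionary below just restates those values.

```python
# Total iterations n* = max_i n*_i / nu_i, following (7), with the
# entry-wise iteration counts n*_i and observation probabilities nu_i
# taken from Table 1.
n_star_i = {"p1": 0, "p2": 1537, "p3": 4610, "p4": 6146}
nu = {"p1": 0.2, "p2": 0.3, "p3": 0.4, "p4": 0.1}

n_star = max(n_star_i[p] / nu[p] for p in n_star_i)
print(n_star)  # ~61460, driven by the rarely observed entry p4
```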
We now show that aggregation can reduce the number of iterations significantly. Therefore, the entries are aggregated to one single entry P̄ = {p̄}. The new expected value V(p̄) is the weighted sum of the single expected values:

V(p̄) = ∑_{i=1}^{4} νi V(pi) = 3.0    (8)
The variance σ²(p̄) can be calculated as

σ²(p̄) = ∑_{i=1}^{4} νi (σ²(pi) + V(pi)²) − ( ∑_{i=1}^{4} νi V(pi) )² = 4.3.    (9)
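The aggregated statistics can be verified numerically from the Table 1 values. The short check below uses the mixture form (law of total variance, including the within-entry variances), which is the form consistent with the stated result of 4.3; the variable names are ours.

```python
# Mean and variance of the aggregated entry p-bar from Table 1,
# treating p-bar as a mixture of the entries p1..p4.
V = [1.0, 2.0, 4.0, 6.0]        # expected entry values V(p_i)
var = [0.0, 1.0, 3.0, 4.0]      # entry variances sigma^2(p_i)
nu = [0.2, 0.3, 0.4, 0.1]       # observation probabilities nu_i

mean = sum(n * v for n, v in zip(nu, V))                                    # (8)
variance = sum(n * (s + v * v) for n, s, v in zip(nu, var, V)) - mean ** 2  # (9)
print(mean, variance)
```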
The probability distribution of V (p̄) is the
weighted sum of the single distributions. The
number of necessary iterations to achieve a
maximal deviation of 0.05 with a probability
of at least 95% is n̄ = 6,590. The number
of necessary iterations is reduced by 89.3%
compared to P . For our example, aggregation
allows a significantly faster approximation.
Nevertheless, aggregation has a large impact
on decision making. As we can see from (9),
the variance of V (p̄) exceeds the variance
of all original entries p1 , . . . , p4 . Aggregation results in a rise of the deviation of the
entry value. We additionally experience a
bias |V (p̄) − V (pi )| up to 3.0 for all former
states. Using p̄ results in a less accurate representation and may lead to ineffective decisions.
We now show the influence of the aggregation on the solution quality. We consider a decision point s and two possible decisions xa, xb. Decision xa leads to post decision state (or entry) p1, xb to p4. The immediate contributions are C(xa) = 2.0, C(xb) = 1.0. Considering (1), in P, the overall values are C(xa) + V(p1) = 3.0 and C(xb) + V(p4) = 7.0. Hence, we choose xb and achieve an expected overall outcome of 7.0. With SLT P̄, the two decisions result in the same entry p̄. Hence, decision xa is chosen, with an overall outcome of 3.0. Due to aggregation, we experience a loss of 4.0. In essence, the example shows that aggregation allows a faster approximation, but simultaneously leads to a loss in solution quality.
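This decision example can be replayed in code. The entry values are taken from the text; the helper function and dictionary names are our own illustrative choices.

```python
# Decision example: two decisions lead to entries p1 and p4; the
# aggregated SLT maps both to the single entry p-bar.
C = {"xa": 2.0, "xb": 1.0}            # immediate contributions
V_true = {"xa": 1.0, "xb": 6.0}       # true entry values V(p1), V(p4)
V_agg = {"xa": 3.0, "xb": 3.0}        # both decisions hit p-bar with V(p-bar)

def best(values):
    """Decision rule (1): maximize C(x) + V(entry reached by x)."""
    return max(C, key=lambda x: C[x] + values[x])

exact = best(V_true)                  # xb: 1.0 + 6.0 = 7.0
aggregated = best(V_agg)              # xa: 2.0 + 3.0 = 5.0 beats 4.0
loss = (C[exact] + V_true[exact]) - (C[aggregated] + V_true[aggregated])
print(exact, aggregated, loss)        # loss of 4.0 due to aggregation
```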
Obviously, the selection of the interval lengths strongly impacts the approximation process. Even small tuning results in significantly different and hard-to-predict behavior of the approximation process. Given an SLT, the interval lengths have to be defi