Creating Meaningful Training Data for Difficult Job Shop Scheduling

Helga Ingimundardóttir
University of Iceland
March 28th, 2012
Outline
Introduction
Job Shop Scheduling
Ordinal Regression
Generating Training Data
Future Work
Introduction
Motivation
General Goal
• The general goal is how to search for good solutions for an arbitrary problem domain.
• Automate the design of optimization algorithms.
• Use of randomly sampled problem instances and their corresponding optimal vs. suboptimal solutions.
Case Study: Job Shop Scheduling Problem
There are many heuristic methods available for the JSSP, where different methodologies outperform others depending on the particular problem instance.
Relationship between problem structure and heuristic performance for JSSP
• Finding out how the problem instances differ in structure.
• Determine when a particular heuristic solution is likely to fail and explore in further detail the causes of such failures.
• Preliminary research was done in Ingimundardottir and Runarsson [2011] for a simple data distribution. This talk will be on how to deal with harder instances.
Previous Work
Methods previously proposed for investigating the relationship between problem structure and heuristic effectiveness:
• Footprints in instance space [Smith-Miles and Lopes, 2010, Corne and Reynolds, 2011] or landmarking [Pfahringer and Bensusan, 2000], which give an indicator of how an algorithm generalizes over the instance space.
• The focus has been on searching through a large set of algorithms and determining the most suitable one for a given subset of the instance space.
• However, there is a lack of research based on a single algorithm and understanding how it works on the instance space, in the hope of being able to extrapolate where it excels in order to aid its failing aspects.
Job Shop Scheduling
Job Shop Scheduling (1)
JSSP
A simple job shop scheduling problem is where n jobs are scheduled on a set of m machines, subject to the constraints that:
• each job must follow a predefined machine order,
• a machine can handle at most one job at a time.
The objective is to schedule the jobs so as to minimize the maximum completion time, known as the makespan, µ.
• Job j has an indivisible operation time on machine a, p(j, a), which is assumed to be integral, where j ∈ {1, .., n} and a ∈ {1, .., m}.
Job Shop Scheduling (2)
• Job j has a specified processing order through the machines; it is a permutation vector, σ, of {1, .., m}, i.e. j can be processed on σ(j, a) only after it has completed on σ(j, a − 1).
Problem instance generation
A total of N = 1,500 random problem instances were generated by fixing the number of jobs n = 6 and machines m = 6 and (see the sketch below):
• Sampling discrete processing times from the uniform distribution, p ∼ U(1, 200).
  ◦ Ingimundardottir and Runarsson [2011] used the uniform distributions U(50, 100) and U(1, 100) with robust results.
• Machine order σ is a random permutation of {1, ..., m}.
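As a rough illustration of this instance-generation procedure (a minimal sketch only; the function name, array layout and use of NumPy are assumptions, not the author's code):

    import numpy as np

    def generate_instance(n=6, m=6, p_low=1, p_high=200, rng=None):
        """Sample one random n x m JSSP instance (hypothetical helper).

        p[j, a]     : integral processing time of job j on machine a, p ~ U(1, 200)
        sigma[j, :] : machine order for job j, a random permutation of 0..m-1
        """
        rng = rng or np.random.default_rng()
        p = rng.integers(p_low, p_high + 1, size=(n, m))         # discrete U(1, 200)
        sigma = np.array([rng.permutation(m) for _ in range(n)])
        return p, sigma

    # e.g. the N = 1,500 instances used on this slide
    instances = [generate_instance() for _ in range(1500)]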
Job Shop Scheduling (3)
Dispatching rules (DR) for constructing JSSP schedules
• Starts with an empty schedule and adds one job at a time.
• When a machine is free the DR inspects the waiting/available jobs and selects the job with the highest priority (see Panwalkar and Iskander [1977] for a survey of over 100 DR; a sketch of this loop follows below).
• A complete schedule consists of ℓ = n × m sequential dispatches.
• At each dispatch/step k, features φ(k) for the temporal schedule are calculated.
• The performance of a DR is compared with its optimal makespan, as a ratio: ρ = µ_DR / µ_opt.
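A minimal sketch of such a priority-based dispatching loop with a linear priority h(φ) = w · φ; the feature extraction is left as a placeholder callable, and all names here are assumptions rather than the author's implementation:

    import numpy as np

    def dispatch_schedule(p, sigma, w, features):
        """Build a schedule with a linear dispatching rule; returns the makespan mu_DR.

        p[j, a]  : processing time of job j on machine a
        sigma[j] : machine order (permutation) for job j
        w        : weight vector of the linear priority h(phi) = w . phi
        features : callable giving the feature vector phi for a candidate dispatch
        """
        n, m = p.shape
        next_op = np.zeros(n, dtype=int)    # next operation index per job
        job_ready = np.zeros(n)             # time each job becomes available
        mach_ready = np.zeros(m)            # time each machine becomes free
        for _ in range(n * m):              # ell = n * m sequential dispatches
            ready = [j for j in range(n) if next_op[j] < m]
            # select the ready job with the highest priority
            j = max(ready, key=lambda j: w @ features(j, next_op, job_ready, mach_ready, p, sigma))
            a = sigma[j, next_op[j]]
            start = max(job_ready[j], mach_ready[a])
            job_ready[j] = mach_ready[a] = start + p[j, a]
            next_op[j] += 1
        return job_ready.max()

The ratio ρ = µ_DR / µ_opt then additionally needs the instance's optimal makespan, e.g. from an exact solver.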
Features for JSSP

φ      Feature description
φ1     processing time for job on machine
φ2     start-time
φ3     end-time
φ4     when machine is next free
φ5     current makespan
φ6     work remaining
φ7     most work remaining
φ8     slack time for this particular machine
φ9     slack time for all machines
φ10    slack time weighted w.r.t. number of operations already assigned
φ11    time job had to wait
φ12    size of slot created by assignment
φ13    total processing time for job
φ14    total processing time for all jobs
φ15    mean processing time for all jobs
φ16    range of processing times over all jobs

Table: Feature space F for JSSP based on Ingimundardottir and Runarsson [2011], which successfully captured the essence of a JSSP data structure.
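Purely as an illustration of how the first few table entries can be read off the partial-schedule state used in the dispatching sketch above (the mapping of quantities to φ-indices is my interpretation, not taken from the paper):

    def some_features(j, a, next_op, job_ready, mach_ready, p, sigma, current_makespan):
        """Sketch of phi1-phi6 for the candidate dispatch of job j on machine a."""
        start = max(job_ready[j], mach_ready[a])              # phi2: start-time
        end = start + p[j, a]                                 # phi3: end-time
        m = p.shape[1]
        # work remaining for job j: processing times of its not-yet-dispatched operations
        remaining = sum(p[j, sigma[j, k]] for k in range(next_op[j], m))
        return [p[j, a],                                      # phi1: processing time for job on machine
                start,                                        # phi2
                end,                                          # phi3
                mach_ready[a],                                # phi4: when machine is next free
                max(end, current_makespan),                   # phi5: current makespan
                remaining]                                    # phi6: work remaining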
Job Shop Scheduling
Example
A schedule being built at step k = 17. The dashed boxes represent six
different possible jobs that could be scheduled next using a DR.
Ordinal Regression
Ordinal Regression (1)
Preference learning problem
Specified by a set of preference pairs:

    S = { {(z_o, +1)}_{k=1}^ℓ , {(z_s, −1)}_{k=1}^ℓ | ∀ o ∈ O^(k), s ∈ S^(k) } ⊂ Φ × Y

where the point/rank pairs are (see the sketch below):
• Optimal decision: z_o = φ(o) − φ(s), ranked +1
• Sub-optimal decision: z_s = φ(s) − φ(o), ranked −1
In Ingimundardottir and Runarsson [2011] the training set is created from known optimal sequences of dispatch.
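A minimal sketch of how such pairs could be assembled at one dispatch step k, assuming the feature vectors (NumPy arrays) for the optimal dispatches O^(k) and sub-optimal dispatches S^(k) have already been computed; the helper name is an assumption:

    def preference_pairs(phi_optimal, phi_suboptimal):
        """Build (z, y) pairs at one dispatch step k.

        phi_optimal    : feature vectors phi(o) for o in O^(k)
        phi_suboptimal : feature vectors phi(s) for s in S^(k)
        """
        pairs = []
        for phi_o in phi_optimal:
            for phi_s in phi_suboptimal:
                pairs.append((phi_o - phi_s, +1))   # z_o, ranked +1
                pairs.append((phi_s - phi_o, -1))   # z_s, ranked -1
        return pairs

This yields 2 · |O^(k)| · |S^(k)| rows per step, which is where the sampling question on the later slides comes from.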
Ordinal Regression (2)
Logistic regression
• Mapping of points to ranks: {h(·) : Φ → Y}, where

    φ_o ≻ φ_s ⇔ h(φ_o) > h(φ_s)

• The preference is defined by a linear function:

    h(φ) = Σ_{i=1}^{d} w_i φ_i = w · φ

• Logistic regression learns the optimal parameters w (see the sketch below).
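A sketch of that learning step, here using scikit-learn's logistic regression as a stand-in for whatever implementation was actually used; it is fitted on the (z, y) pairs from the previous sketch, and the learned coefficients play the role of w in h(φ) = w · φ:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # `pairs` pooled over dispatch steps and problem instances, as built above
    Z = np.array([z for z, _ in pairs])
    y = np.array([label for _, label in pairs])

    model = LogisticRegression(fit_intercept=False)   # no bias term: h(phi) = w . phi
    model.fit(Z, y)
    w = model.coef_.ravel()                           # weights of the linear priority

    # a pair of candidate dispatches is then ranked by comparing w @ phi_o with w @ phi_s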
Generating Training Data
Problem Structure & Heuristic Efficiency
Rice [1976] framework for the algorithm selection problem
• Problem space or instance space P = {(p_j, σ_j)}_{j=1}^N
• Feature space F, measurable properties of the instances in P (see the Features for JSSP table)
• Algorithm space A = {h}, the set of all algorithms under inspection
• Performance space Y = {ρ_j}_{j=1}^N, the outcome for P using an algorithm from A
• Y : A × F → Y is the mapping of the algorithm and feature space onto the performance space, namely the step-by-step scheduling.
Preliminary Experiments
Example
Performance w.r.t. ρ for training and test data using different algorithms for the six-job six-machine JSSP when p ∼ U(1, 200).
Generating JSSP training data
• Determine the order/sequence of jobs assigned, at the first available time slot (to the left).
• When a job is assigned, a new state occurs and the features are updated.
• At each time step, a good/bad ordinal data pair is only created if the final makespan is different.
Comparison w.r.t. optimal decisions
◦ The approach taken in Ingimundardottir and Runarsson [2011] was to compare schedules w.r.t. known optimal solutions.
◦ The approach taken now is to verify analytically, by fixing the current temporal schedule as an initial state, whether it is possible to obtain an optimal makespan from that state (see the sketch below).
• This takes into consideration that there can be multiple optimal solutions to the same problem instance.
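A sketch of that labelling step, assuming a hypothetical exact solver optimal_makespan(p, sigma, partial) that returns the best makespan still reachable once the given partial schedule is fixed:

    def label_dispatches(p, sigma, partial_schedule, candidates, mu_opt, optimal_makespan):
        """Split the candidate dispatches at the current step into O^(k) and S^(k).

        optimal_makespan : hypothetical exact solver for the instance with the
                           already-executed dispatches fixed as an initial state
        mu_opt           : optimal makespan of the full instance
        """
        optimal, suboptimal = [], []
        for job in candidates:
            mu = optimal_makespan(p, sigma, partial_schedule + [job])
            (optimal if mu == mu_opt else suboptimal).append(job)
        # preference pairs are only created when both lists are non-empty,
        # i.e. when the final makespan actually differs between candidates
        return optimal, suboptimal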
Job Shop Scheduling Game Tree
Example
Game Tree for JSSP for the first two dispatches.
Job Shop Scheduling Game Tree
Example
In the top layer one can see all possible dispatches (dashed) for an
empty schedule.
Job Shop Scheduling Game Tree
Example
In the middle layer one can see all possible dispatches given that one
of the dispatches from the layer above has been executed (solid).
Job Shop Scheduling Game Tree
Example
In the bottom layer, task j = 3 on machine M4 has been dispatched, and all possible next dispatches from that time step are shown.
Game Tree
Main Characteristics for JSSP
• Root node denotes the initial (empty) schedule.
• Leaf nodes denote complete schedules.
• Height of the tree is ℓ = n · m.
• Worst case is a full n-ary tree.
• Traversing from root to leaf gives the sequence of dispatches that yielded the resulting schedule.
• Tree creation can be done recursively for all possible dispatches (see the sketch below).
• Such an exhaustive search would yield at most n^ℓ leaf nodes.
• Even for small dimensions of n and m, it is too computationally expensive to investigate them all.
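A sketch of that recursive tree creation, purely to make the n^ℓ growth concrete; a dispatch is identified here only by which job is chosen next, and the function name is an assumption:

    def enumerate_schedules(n, m, prefix=()):
        """Recursively yield every complete dispatch sequence (leaf node) of the game tree.

        Each job is dispatched exactly m times (once per operation), so every leaf has
        length ell = n * m; in the worst case this is bounded by n**(n * m) leaves.
        """
        if len(prefix) == n * m:                   # reached a leaf: complete schedule
            yield prefix
            return
        for job in range(n):
            if prefix.count(job) < m:              # job still has operations left
                yield from enumerate_schedules(n, m, prefix + (job,))

    # already 1,680 leaves for a 3 x 3 instance; a 6 x 6 instance is far out of reach
    print(sum(1 for _ in enumerate_schedules(3, 3)))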
Size of training set (1)
A separate DR for each dispatch iteration
• At each dispatch k a number of data pairs are created
  ◦ for each of the N problem instances created.
• Deliberately create a separate data set for each dispatch
  ◦ Resulting in ℓ linear scheduling rules for solving an n × m JSSP.
Defining the size of the training set as l = |Φ| gives the size of the preference set as |S| = 2l.
• If l is too large, then sampling needs to be done.
Size of training set (2)
Sampling approach in Ingimundardottir and Runarsson [2011]
The strategy was to follow some single optimal job j ∈ O^(k), thus creating |O^(k)| · |S^(k)| feature pairs at each dispatch k, resulting in a training size of:

    l = Σ_{q=1}^{N} Σ_{k=1}^{ℓ} |O^(k)| · |S^(k)|

For the data distribution considered there, this simple sampling was sufficient for a favourable outcome. However, for a considerably harder data distribution this strategy did not work well (see the Preliminary Experiments figure).
Size of training set (3)
Applying CMA-ES to minimize ρ w.r.t. w directly gave a significantly more favourable result in predicting optimal vs. suboptimal dispatches (see the sketch below).
• The nature of CMA-ES is to explore suboptimal routes until it converges to an optimal one.
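For orientation, a sketch of that direct approach using the cma package (CMA-ES as in Hansen and Ostermeier [2001]; whether this particular package was used is an assumption), minimising the mean ratio ρ over the training instances w.r.t. the weight vector w of the linear dispatching rule from the earlier sketch:

    import numpy as np
    import cma  # pip install cma

    def mean_rho(w, instances, mu_opts, features):
        """Mean rho = mu_DR / mu_opt over the training instances for weights w."""
        ratios = [dispatch_schedule(p, sigma, np.asarray(w), features) / mu_opt
                  for (p, sigma), mu_opt in zip(instances, mu_opts)]
        return float(np.mean(ratios))

    d = 16  # one weight per feature phi1..phi16
    es = cma.CMAEvolutionStrategy(d * [0.0], 0.5)
    while not es.stop():
        candidates = es.ask()
        es.tell(candidates, [mean_rho(w, instances, mu_opts, features) for w in candidates])
    w_best = es.result.xbest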
Size of training set (4)
CMA-ES inspired approach
• Obviously, looking only into one optimal route isn't sufficient information.
• The training set should also make the distinction between suboptimal and sub-suboptimal features, etc.
  ◦ e.g. inspect the Pareto ranking at each dispatch k, creating an even greater training set that needs to be sampled.
Due to the nature of the sequence representation, the earlier stages of the dispatching are more or less equivalent (and thus irrelevant), hence it could be appropriate to follow some random path to begin with and then follow some policy until completion at step ℓ.
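A sketch of that random-path-then-policy idea, reusing the state updates from the earlier dispatching sketch (all names are assumptions):

    import numpy as np

    def rollout(p, sigma, w, features, k_random, rng=None):
        """Dispatch randomly for the first k_random steps, then follow the linear
        rule h(phi) = w . phi until completion at step ell = n * m; returns the makespan."""
        rng = rng or np.random.default_rng()
        n, m = p.shape
        next_op = np.zeros(n, dtype=int)
        job_ready, mach_ready = np.zeros(n), np.zeros(m)
        for k in range(n * m):
            ready = [j for j in range(n) if next_op[j] < m]
            if k < k_random:        # random warm-up path through the early, near-equivalent stages
                j = rng.choice(ready)
            else:                   # follow the dispatching policy from there on
                j = max(ready, key=lambda j: w @ features(j, next_op, job_ready, mach_ready, p, sigma))
            a = sigma[j, next_op[j]]
            start = max(job_ready[j], mach_ready[a])
            job_ready[j] = mach_ready[a] = start + p[j, a]
            next_op[j] += 1
        return job_ready.max()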
Size of training set (5)
Main aspects for generating training data
• What sort of rankings should be compared during each step?
• Which path(s) should be investigated? Only one, some or all paths?
Future Work
Future Work
• This talk pointed out some aspects that need to be taken into consideration when creating training data, especially since Ingimundardottir and Runarsson [2012] illustrated the interaction of a specific algorithm with a given data structure and its properties.
• Feature selection is of paramount importance in order for algorithms to become successful, therefore one needs to give great thought to how features are selected.
  ◦ What kind of feature behaviour yields bad schedules? And can they be steered onto the path of more promising feature characteristics?
• In general, this sort of investigation can be used in better algorithm design which is better equipped to deal with varying data instances.
  ◦ Which could tailor to individual data instances, i.e. a footprint-oriented algorithm.
Bibliography (1)
D. Corne and A. Reynolds. Optimisation and generalisation: footprints in instance space. Parallel Problem Solving from Nature, PPSN XI, pages 22–31, 2011.
N. Hansen and A. Ostermeier. Completely Derandomized Self-Adaptation in Evolution Strategies. Evol. Comput., 9(2):159–195, June 2001.
H. Ingimundardottir and T. Runarsson. Supervised learning linear priority dispatch rules for job-shop scheduling. In C. Coello, editor, Learning and Intelligent Optimization, volume 6683 of Lecture Notes in Computer Science, pages 263–277. Springer Berlin, Jan. 2011.
Bibliography (2)
H. Ingimundardottir and T. P. Runarsson. Determining the Characteristic of Difficult Job Shop Scheduling Instances for a Heuristic Solution Method. In M. Schoenauer, editor, Learning and Intelligent Optimization, 6th International Conference, LION 6, Paris, Jan. 2012. Springer Lecture Notes in Computer Science.
S. Panwalkar and W. Iskander. A Survey of Scheduling Rules. Operations Research, 25(1):45–61, Jan. 1977.
B. Pfahringer and H. Bensusan. Meta-learning by landmarking various learning algorithms. In International Conference on Machine Learning, 2000.
J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976.
Bibliography (3)
K. Smith-Miles and L. Lopes. Generalising Algorithm Performance in
Instance Space: A timetabling case study. In C. A. Coello Coello,
editor, Learning and Intelligent Optimization, 5th International
Conference, LION 5, Lecture Notes in Computer Science, 2010.
Thank you for your attention
Questions?
Helga Ingimundardóttir, [email protected]