Accelerating a local search algorithm for large instances of the independent task scheduling problem with the GPU

Frederic Pinel, Johnatan Pecero, Pascal Bouvry
Computer Science and Communications, University of Luxembourg

GreenDays@Paris
Outline

• Motivation
• Initial algorithm
• Adaptation
• Large instances
• Conclusion
Motivation
• Independent tasks
• Makespan combines several perspectives:
• User: flowtime
• Provider: load balance, energy (low machine heterogeneity)
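The two objectives above can be made concrete with a small sketch. This is illustrative only (the `schedule` representation and the function names are not from the slides): makespan is the latest machine completion time, flowtime the sum of individual task completion times.

```python
# Illustrative sketch: makespan (provider view) and flowtime (user view)
# for a schedule of independent tasks. `schedule` maps each machine to the
# ordered list of run times of the tasks assigned to it.

def makespan(schedule):
    """Latest completion time over all machines."""
    return max(sum(times) for times in schedule.values())

def flowtime(schedule):
    """Sum of the completion times of all tasks."""
    total = 0
    for times in schedule.values():
        finish = 0
        for t in times:          # tasks run back to back on each machine
            finish += t
            total += finish      # this task completes at time `finish`
    return total

sched = {"m0": [3, 2], "m1": [4]}
print(makespan(sched))  # 5
print(flowtime(sched))  # 3 + 5 + 4 = 12
```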
Parallel CGA

Figure: Generating a solution (selection, recombination, mutation, replacement)
Parallel CGA
• Parallel asynchronous cellular genetic algorithm
• Initialized with the Min-Min heuristic
• Local search
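The PA-CGA itself is not listed in the slides; as a rough, toy illustration of the cellular idea (each individual interacts only with its grid neighbors, cells updated asynchronously in place), here is a minimal sketch on a 1-D ring with a bit-string fitness. All names and the fitness function are illustrative, not the actual algorithm.

```python
# Toy asynchronous cellular GA on a ring (the real PA-CGA uses a 2-D grid,
# Min-Min seeding, local search, and parallel threads per cell partition).
import random

random.seed(0)
N, L = 8, 16                        # population size, genome length
pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
fit = lambda g: sum(g)              # toy fitness: maximize number of ones

for step in range(500):
    i = step % N                    # sweep cells asynchronously, in place
    nbrs = [pop[(i - 1) % N], pop[(i + 1) % N]]
    mate = max(nbrs, key=fit)       # selection restricted to the neighborhood
    cut = random.randrange(1, L)
    child = pop[i][:cut] + mate[cut:]   # one-point crossover
    j = random.randrange(L)
    child[j] ^= 1                       # bit-flip mutation
    if fit(child) >= fit(pop[i]):       # replace if not worse
        pop[i] = child

print(max(fit(g) for g in pop))
```

Because replacement is in place, later cells in a sweep already see earlier cells' updates, which is what makes the scheme asynchronous.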
Feedback

Figure: Main effect and interactions (scale 0.0 to 1.0) of the parameters Pop, P_mutation, Iter_mutation, P_crossover, P_search, Iter_search, Load_search, Threads
Adaptation
• Simplified algorithm
• Min-Min, incremental formulation
• Increased local search, complete-state formulation
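As context for the Min-Min bullet above, here is a sketch of the standard Min-Min heuristic (the slide's "incremental formulation" maintains the per-machine completion times the same way, updating them as tasks are assigned). The `etc` matrix and function name are illustrative.

```python
# Sketch of the Min-Min heuristic for independent task scheduling.
# etc[t][m] = expected time to compute task t on machine m.

def min_min(etc, n_machines):
    ready = [0.0] * n_machines          # current completion time per machine
    unassigned = set(range(len(etc)))
    assignment = {}
    while unassigned:
        # For each task, take its minimum completion time over machines,
        # then assign the task whose minimum is smallest ("min of mins").
        best_t, best_m, best_ct = None, None, float("inf")
        for t in unassigned:
            for m in range(n_machines):
                ct = ready[m] + etc[t][m]
                if ct < best_ct:
                    best_t, best_m, best_ct = t, m, ct
        assignment[best_t] = best_m
        ready[best_m] = best_ct         # incremental update of machine load
        unassigned.remove(best_t)
    return assignment, max(ready)       # schedule and its makespan

etc = [[3, 5], [2, 4], [4, 1]]
print(min_min(etc, 2))                  # makespan 5
```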
Results

Figure: Consistent, high-h. tasks, low-h. machines (makespan of Min-Min, 2PH, PA-CGA-1 to PA-CGA-5)
Results

Figure: Semi-consistent, high-h. tasks, low-h. machines (makespan of Min-Min, 2PH, PA-CGA-1 to PA-CGA-5)
Results

Figure: Consistent, low-h. tasks, low-h. machines (makespan of Min-Min, 2PH, PA-CGA-1 to PA-CGA-5)
Results

Figure: Semi-consistent, low-h. tasks, low-h. machines (makespan of Min-Min, 2PH, PA-CGA-1 to PA-CGA-5)
Min-Min on GPU
Figure: Parallel reduction in Min-Min
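The figure above shows a parallel reduction; its core idea can be sketched as follows. This is a sequential simulation in Python for illustration (the names are not from the slides): on the GPU, each halving round is executed by threads in parallel, so finding a minimum over n values takes O(log n) rounds instead of O(n) steps.

```python
# Tree-shaped min-reduction, as used to find a task's best machine in the
# GPU Min-Min. Each while-iteration corresponds to one parallel round.

def parallel_min(values):
    vals = list(values)
    while len(vals) > 1:
        half = (len(vals) + 1) // 2
        # One round: element i is compared with element i + half.
        vals = [min(vals[i], vals[i + half]) if i + half < len(vals) else vals[i]
                for i in range(half)]
    return vals[0]

print(parallel_min([7.0, 3.5, 9.1, 2.2, 6.0]))  # 2.2
```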
Min-Min runtime
Figure: Min-Min runtime in seconds (log scale) for CPU with 1, 4 and 8 threads vs. GPU, on instance sizes 4096x128 to 65536x2048
Performance

Figure: Makespan (series 32, 320, 3200, CGA) on instance sizes 512x16 to 65536x2048
Performance
600
32
320
3200
CGA - Kgen
500
Runtime (sec)
400
300
200
100
0
512x16
4096x128
8192x256
16384x512
Instance size
Figure: Runtime
32768x1024
65536x2048
Conclusion
• Failure
• Solution → Feedback → loop
• Learning process
Machine learning opportunities
• Learn on problem instance
• Task profiling
• Co-scheduling
• Learn allocation rules
• Adapt (parameters, heuristics)
• Algorithm (oracle: solved instances)
Questions
Thank you.