Load Balancing, Beowulf, and Grid Computing

Load Balancing and Grid Computing
David Finkel
Computer Science Department
Worcester Polytechnic Institute
References
• “The Anatomy of the Grid”, Ian Foster, Carl Kesselman,
Steven Tuecke, International Journal of Supercomputer
Applications, 2001
• “A Performance Oriented Migration Framework for the
Grid”, Satish S. Vadhiyar and Jack J. Dongarra,
Proceedings of CCGrid 2003, Third IEEE/ACM
International Symposium on Cluster Computing and the
Grid
• Innumerable papers by PEDS members Finkel, Wills and
Finkel, and Claypool and Finkel, with additional coauthors.
What is the Grid? (Foster et al. paper)
• Distributed computing infrastructure for
advanced science and engineering
• Runs over the Internet, potentially worldwide
• Several approaches have emerged; the paper
discusses the Globus Toolkit
The Grid Concept
• Coordinated resource sharing and problem
solving in dynamic, multi-institutional
virtual organizations.
• Highly controlled, with resource providers
and consumers defining what is shared and
the conditions of sharing.
• Issues to address: Protocols, privacy,
security, costs, …
Related approaches
• Application Service Providers
• Storage Service Providers
• CORBA
• DCE
• Volunteer Computing (SETI@home, Distriblets, SLINC)
Fabric Layer
• Provides access and control to resources
• Resources: Computational, storage, network
• Enquiry functions: to determine
characteristics and state of a resource
• Management functions: Start, stop
computations, reserve bandwidth
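The split between enquiry and management functions can be illustrated with a minimal sketch. The class and method names (`ComputeResource`, `enquire`, `start`, `stop`) are hypothetical and chosen only for illustration; this is not the Globus Toolkit API.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeResource:
    """Hypothetical Fabric-layer wrapper around one computational resource."""
    name: str
    total_cpus: int
    running: dict = field(default_factory=dict)  # job id -> CPUs in use

    # Enquiry function: report characteristics and current state.
    def enquire(self) -> dict:
        used = sum(self.running.values())
        return {"name": self.name,
                "total_cpus": self.total_cpus,
                "free_cpus": self.total_cpus - used}

    # Management function: start a computation if capacity allows.
    def start(self, job_id: str, cpus: int) -> bool:
        if cpus > self.enquire()["free_cpus"]:
            return False
        self.running[job_id] = cpus
        return True

    # Management function: stop a computation and release its CPUs.
    def stop(self, job_id: str) -> None:
        self.running.pop(job_id, None)
```

Higher layers would call `enquire` before deciding where to place work, and `start` or `stop` to act on that decision.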
Collective Layer
• Protocols and services not associated with a
particular resource
– Directory services for discovery of resources
– Co-allocation, scheduling, brokering
– Monitoring the Virtual Organization for failure,
intrusion detection, etc.
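As a rough illustration of directory services and co-allocation, the sketch below keeps a table of advertised resources and lets a broker pick a set of them. `Directory`, `register`, `discover`, and `co_allocate` are hypothetical names, and real Grid directory services are considerably richer.

```python
class Directory:
    """Hypothetical directory service: resource name -> advertised free CPUs."""

    def __init__(self):
        self._entries = {}

    def register(self, name, free_cpus):
        # A resource (or its Fabric-layer wrapper) advertises its state.
        self._entries[name] = free_cpus

    def discover(self, min_cpus):
        # Names of resources advertising at least min_cpus free CPUs, sorted.
        return sorted(n for n, c in self._entries.items() if c >= min_cpus)


def co_allocate(directory, cpus_per_node, nodes_needed):
    """Broker sketch: pick nodes_needed resources, or None if too few qualify."""
    candidates = directory.discover(cpus_per_node)
    if len(candidates) < nodes_needed:
        return None
    return candidates[:nodes_needed]
```

Returning `None` when too few resources qualify reflects the all-or-nothing nature of co-allocation: a partial set is of no use to a computation that needs every node at once.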
Load Sharing - Overview
• Transferring work from a heavily loaded
node to a lightly loaded node
• Purpose: To improve application
performance
• Transferring processes is not suitable for fine-grain parallelism
• Also known as: Load Balancing, Process
Migration.
Load Sharing Issues
• Criteria for heavily loaded and lightly loaded nodes
• Measuring load (policy, implementation)
• Exchanging information about load, state
• Which jobs to transfer
• When to transfer (new processes only, already-running processes)
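One simple way to make the first issue concrete is a two-threshold policy: a node above a high-water mark is a sender, and a node below a low-water mark is a receiver. The sketch below assumes load is reported as a utilization in [0, 1]; the function name and the threshold values are illustrative assumptions, not taken from the slides.

```python
def plan_transfers(loads, high=0.8, low=0.3):
    """Pair each heavily loaded node with a lightly loaded one.

    loads: dict of node name -> load in [0, 1] (e.g., CPU utilization).
    high, low: illustrative thresholds for "heavily" and "lightly" loaded.
    Returns a list of (sender, receiver) pairs.
    """
    # Most heavily loaded senders first.
    senders = sorted((n for n, l in loads.items() if l > high),
                     key=lambda n: loads[n], reverse=True)
    # Most lightly loaded receivers first.
    receivers = sorted((n for n, l in loads.items() if l < low),
                       key=lambda n: loads[n])
    # zip stops at the shorter list, so unmatched nodes are left alone.
    return list(zip(senders, receivers))
```

Nodes whose load falls between the two thresholds neither send nor receive work, which helps avoid transfers that merely shuffle load back and forth.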
Load Sharing in the Grid
• “A Performance Oriented Migration
Framework for the Grid”, Vadhiyar and
Dongarra
• Part of the GrADS project (Grid
Application Development Software), based at
the Univ. of Tennessee and other institutions
• Designed for long-running computations
Load Sharing in the Grid - 2
• Basic idea – the load sharing system can run a
performance model of a computation to estimate
running time and resource requirements.
• Application programmer is responsible for
providing performance model for the application,
and hooks to stop application, checkpoint state,
and re-start application.
• Based on MPI Programming Library, Globus
Toolkit
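The three things the application programmer supplies (a performance model, a checkpoint hook, and a restart hook) can be sketched as a small interface. Everything here is a hypothetical illustration: the class `MigratableApp`, its method names, and the toy model that assumes perfectly parallel work are not the GrADS or MPI interfaces.

```python
import pickle


class MigratableApp:
    """Hypothetical application exposing a performance model and hooks."""

    def __init__(self, total_iters):
        self.total_iters = total_iters
        self.done_iters = 0

    # Performance model: estimated remaining seconds on n_procs processors,
    # assuming (for illustration only) perfectly parallel iterations.
    def predict_remaining(self, n_procs, secs_per_iter=1.0):
        return (self.total_iters - self.done_iters) * secs_per_iter / n_procs

    # One unit of application work.
    def step(self):
        self.done_iters += 1

    # Checkpoint hook: serialize the state needed to resume later.
    def checkpoint(self):
        return pickle.dumps({"total": self.total_iters,
                             "done": self.done_iters})

    # Restart hook: rebuild the application from a checkpoint.
    @classmethod
    def restart(cls, blob):
        state = pickle.loads(blob)
        app = cls(state["total"])
        app.done_iters = state["done"]
        return app
```

A usage example: run 40 of 100 iterations, checkpoint, then restart and ask the model how long the rest should take on four processors.

```python
app = MigratableApp(100)
for _ in range(40):
    app.step()
resumed = MigratableApp.restart(app.checkpoint())
print(resumed.predict_remaining(4))
```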
Load Sharing in the Grid - 3
• Before application begins, Application
Manager runs performance model to predict
execution times, number of processors.
• Determines whether an appropriate set of
processors is available, schedules jobs
• Monitors progress of the application as it runs
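To show how a performance model can drive the choice of processor count, the sketch below uses a toy model in which compute time shrinks with more processors while communication cost grows. The model, the function names, and the `comm_cost` parameter are assumptions made for illustration; the paper's actual models are application-specific.

```python
def predict_time(work, procs, comm_cost):
    """Toy model: parallel compute time plus per-processor communication."""
    return work / procs + comm_cost * (procs - 1)


def choose_processors(work, available, comm_cost=1.0):
    """Return (processor count, predicted time) minimizing predicted time."""
    best = min(range(1, available + 1),
               key=lambda p: predict_time(work, p, comm_cost))
    return best, predict_time(work, best, comm_cost)
```

Under this model, using every available processor is not always best: beyond some point the added communication outweighs the reduced compute time.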
Load Sharing in the Grid - 4
• Load sharing can occur if
– Application progress is delayed
– Additional resources become available
• App Manager sends message to application
so it will
– Checkpoint
– Stop computation
• Re-start on new collection of nodes
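A hedged sketch of the decision logic: migration is triggered when progress lags the prediction, and is worthwhile only if the predicted time on the new nodes plus the cost of checkpointing and restarting beats staying put. The function names, the `tolerance` value, and the cost comparison are illustrative assumptions rather than the rule used in the paper.

```python
def is_delayed(predicted_secs, actual_secs, tolerance=1.2):
    """True if actual time exceeds the prediction by more than the tolerance."""
    return actual_secs > tolerance * predicted_secs


def should_migrate(remaining_current, remaining_new, migration_cost):
    """True if restarting on the new nodes is predicted to finish sooner.

    remaining_current: predicted remaining seconds on the current nodes.
    remaining_new: predicted remaining seconds on the candidate nodes.
    migration_cost: seconds to checkpoint, stop, and restart.
    """
    return remaining_new + migration_cost < remaining_current
```

Including the migration cost in the comparison matters for long-running computations near completion, where the overhead of moving can exceed the time saved.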
Research Directions
• Load sharing on the Grid:
– There’s a large body of pre-Grid research on
load balancing in distributed systems
– Can the results of this research be used to
design load balancing systems for the Grid?