
Cluster Resource Management:
Scalable Approaches
Ning Li
Jordan Parker
Mid-semester Status Report
CS 736 – Fall 2000
Ning Li
Jordan Parker
Scalable Cluster Resource
Management
1
Why Study Cluster Resource Management?
• Clusters have become increasingly popular for large parallel computing.
– Web Servers
• Clusters are growing increasingly large, on the order of thousands of nodes.
• Clusters are providing multiple services.
Multiple Services: Example
• An Internet Service Provider is hosting
many different websites for clients
– How do you schedule according to the amount
of bandwidth a client is paying for?
• Proportional Share
• Cluster Reserves
• Our technique aims to be more scalable.
Overview
• Introduction / Reason for Research
• Related Work
• Infrastructure
• Evaluation
Related Work
• A. C. Arpaci-Dusseau, D. E. Culler, and A. Mainwaring. Scheduling with Implicit Information in Distributed Systems. In Proceedings of SIGMETRICS '98, the Conference on Measurement and Modeling of Computer Systems, 1998.
• A. Fox, S. D. Gribble, Y. Chawathe, and E. A. Brewer. Cluster-Based Scalable Network Services. In Proceedings of the 16th Symposium on Operating Systems Principles (SOSP-16), St-Malo, France, October 1997.
• M. Aron, P. Druschel, and W. Zwaenepoel. Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers. In Proceedings of ACM SIGMETRICS 2000, June 2000.
• C. A. Waldspurger and W. E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management. In Proceedings of the First Symposium on Operating Systems Design and Implementation, Monterey, CA, November 1994, pp. 1-11.
• NS: Network Simulator Manual. http://www.isi.edu/nsnam/ns/ns-documentation.html.
What makes us different?
• Goal: to provide a scalable solution for resource management.
• Other papers focused primarily on the quality of the management itself
– This often meant a single manager for all the nodes.
– Clearly, this could become a scalability bottleneck.
• Effectiveness: Other solutions are probably better for smaller clusters; we hope to do better for large (>1000 nodes) clusters.
The Management Scheme
• Cluster Reserves with multiple managers
– Mainly a comparison
• A new lottery-like algorithm (Banks)
• A hierarchical management network
Infrastructure
• The hierarchical algorithms
• Use NS to simulate our algorithms
Hierarchical View
[Figure: hierarchical view of managers and nodes 1 to 12]
A Problem and a Solution
• Problem: a single manager is not scalable
• Solution: a hierarchy of managers, plus fault tolerance
(Example: a hierarchy with two levels of managers)
Approach 1:
• Modify the Cluster Reserves optimization algorithm
– use it when a manager manages nodes
– AND when a level n+1 manager manages level n managers.
Approach 2:
• Introduce a bank account mechanism
– use the bank algorithm for a manager managing nodes
– use a transfer strategy for a level n+1 manager managing level n managers
Problem Specification:
N: number of nodes in the cluster
S: number of service classes
T: a vector of N elements; T_i: resource (number of tickets) on node i
T_total: total resource in the cluster, i.e., the sum of all T_i (not in the Cluster Reserves paper)
r and u: N x S matrices; r_ij and u_ij: the percentage resource allocation and resource usage, respectively, at node i for service class j
D: a vector of S elements; D_j: the desired percentage resource allocation for service class j over the cluster
Input: r, u, T, and D
Output: an N x S matrix R; R_ij: the new percentage resource allocation for service class j on node i
Solution Step 1:
• Compute the least feasible deviation between desired and actual
allocations.
$$ \min_{R} \sum_{j=1}^{S} \left| \sum_{i=1}^{N} R_{ij} T_i - T_{\text{total}} D_j \right| \qquad (1) $$
• Resource allocations on any cluster node should sum to no more than 100:
$$ \sum_{j=1}^{S} R_{ij} \le 100 \quad \text{for all } i \in \{1, \dots, N\} $$
• On any node, a service class's new allocation should be no more than its usage if the class is not a resource sink there, i.e., if its previous allocation exceeds its usage:
$$ R_{ij} \le u_{ij} \quad \text{for all } i, j \text{ with } r_{ij} > u_{ij} $$
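As a minimal sketch (not the authors' solver) of what Step 1 evaluates, the helpers below compute objective (1) for a candidate R and check the two constraints. Allocations are assumed to be plain lists of lists of percentages, and the function names are hypothetical; a real implementation would search over R, for example by rewriting the absolute values as a linear program.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # N x S matrix of percentages


def deviation(R: Matrix, T: Sequence[float], D: Sequence[float]) -> float:
    """Objective (1): sum over classes j of |sum_i R_ij * T_i - T_total * D_j|."""
    T_total = sum(T)  # total cluster resource
    return sum(
        abs(sum(R[i][j] * T[i] for i in range(len(T))) - T_total * D[j])
        for j in range(len(D))
    )


def is_feasible(R: Matrix, r: Matrix, u: Matrix) -> bool:
    """Check the per-node cap and the non-sink constraint from Step 1."""
    for i, row in enumerate(R):
        if sum(row) > 100:  # allocations on node i may not exceed 100%
            return False
        for j, R_ij in enumerate(row):
            # a class that did not use its previous allocation is capped at its usage
            if r[i][j] > u[i][j] and R_ij > u[i][j]:
                return False
    return True


# Two equal nodes, two classes that each want 50% of the cluster
T, D = [100, 100], [50, 50]
print(deviation([[50, 50], [50, 50]], T, D))  # 0
print(deviation([[70, 30], [50, 50]], T, D))  # 4000
```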
Solution Step 2:
Compute the new resource allocations such that
1) the deviation computed in the first step is achieved, and
2) the computed resource allocations are close to the ideal allocation D (this differs from the Cluster Reserves paper; we want to see which approach is better):
$$ \min_{R} \sum_{i=1}^{N} \sum_{j=1}^{S} \left( R_{ij} - D_j \right)^2 \qquad (2) $$
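To illustrate how Step 2 refines Step 1, the sketch below takes a small, hypothetical list of candidate allocations, keeps those that achieve the minimum deviation (1), and returns the one closest to D under objective (2). Brute-force enumeration is only for illustration; a real solver would treat (2) as a quadratic program with the Step 1 deviation as an extra constraint.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def deviation(R: Matrix, T: Sequence[float], D: Sequence[float]) -> float:
    """Objective (1), repeated so this block is self-contained."""
    T_total = sum(T)
    return sum(
        abs(sum(R[i][j] * T[i] for i in range(len(T))) - T_total * D[j])
        for j in range(len(D))
    )


def distance_to_ideal(R: Matrix, D: Sequence[float]) -> float:
    """Objective (2): sum_i sum_j (R_ij - D_j)^2."""
    return sum((R_ij - D[j]) ** 2 for row in R for j, R_ij in enumerate(row))


def step2_select(candidates: List[Matrix], T, D, tol: float = 1e-9) -> Matrix:
    """Keep candidates with the best Step-1 deviation, then minimize (2)."""
    best = min(deviation(R, T, D) for R in candidates)
    tied = [R for R in candidates if deviation(R, T, D) <= best + tol]
    return min(tied, key=lambda R: distance_to_ideal(R, D))


T, D = [100, 300], [50, 50]
A = [[50, 50], [50, 50]]
B = [[20, 80], [60, 40]]
C = [[100, 0], [0, 100]]
print(step2_select([B, A, C], T, D))  # [[50, 50], [50, 50]]
```

Both A and B balance the cluster perfectly (deviation 0), but A stays closer to the ideal D on every node, so objective (2) prefers it.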
A New Idea/Addition
• Distribute unassigned cluster resource to the service classes that need it
• Since the manager knows when, and how much, resource each service class contributed before, it can give appropriate priority to those classes when assigning unused resource.
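One possible reading of this idea, sketched under our own assumptions: the manager serves needy classes in decreasing order of the tickets they contributed earlier, giving each as much as it needs until the unassigned pool runs out. The greedy ordering and the `need`/`contributed` inputs are hypothetical; splitting the pool proportionally would be an equally valid policy.

```python
from typing import Dict


def distribute_unassigned(unassigned: int,
                          need: Dict[str, int],
                          contributed: Dict[str, int]) -> Dict[str, int]:
    """Grant unassigned tickets, highest past contributors first (hypothetical policy)."""
    grants: Dict[str, int] = {}
    # classes that gave back more resource earlier are served first
    order = sorted(need, key=lambda c: contributed.get(c, 0), reverse=True)
    for c in order:
        if unassigned <= 0:
            break
        g = min(need[c], unassigned)
        if g > 0:
            grants[c] = g
            unassigned -= g
    return grants


print(distribute_unassigned(100, {"a": 80, "b": 50, "c": 10}, {"a": 5, "b": 20}))
# {'b': 50, 'a': 50}
```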
Approach 2: Bank Account Mechanism
• Each manager has a bank.
• Each bank has an account for each service class.
• Each account records the number of tickets saved and when they were deposited.
• Together, depositing, drawing, and transferring tickets are used to achieve both performance isolation and resource utilization.
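A minimal sketch of one bank, assuming each account is kept as a queue of (deposit time, tickets) lots so that the deposit time of every ticket is remembered. The `Bank` class and its method names are hypothetical; drawing the oldest lots first and re-stamping transferred tickets with the transfer time are our own choices, not something the slides specify.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class Bank:
    """One bank per manager with one account per service class (hypothetical design)."""

    def __init__(self) -> None:
        # class name -> deque of (deposit_time, tickets) lots, oldest first
        self.accounts: Dict[str, Deque[Tuple[float, int]]] = {}

    def balance(self, cls: str) -> int:
        return sum(n for _, n in self.accounts.get(cls, ()))

    def deposit(self, cls: str, tickets: int, now: float) -> None:
        if tickets > 0:
            self.accounts.setdefault(cls, deque()).append((now, tickets))

    def draw(self, cls: str, tickets: int) -> int:
        """Withdraw up to `tickets`, oldest lots first; return how many were drawn."""
        lots = self.accounts.get(cls, deque())
        drawn = 0
        while lots and drawn < tickets:
            ts, n = lots[0]
            take = min(n, tickets - drawn)
            drawn += take
            if take == n:
                lots.popleft()
            else:
                lots[0] = (ts, n - take)  # partially used lot keeps its timestamp
        return drawn

    def transfer(self, cls: str, tickets: int, other: "Bank", now: float) -> int:
        """Move up to `tickets` of cls's savings to another bank, re-stamped with `now`."""
        moved = self.draw(cls, tickets)
        other.deposit(cls, moved, now)
        return moved


bank = Bank()
bank.deposit("web", 30, now=0.0)
bank.deposit("web", 20, now=5.0)
print(bank.draw("web", 40), bank.balance("web"))  # 40 10
```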
Bank Algorithm: part 1
Checking each service class j on each node i, compare the previous ticket usage u_ij, allocation r_ij, and desired allocation D_j:
1) u_ij < r_ij and r_ij <= D_j:
  R_ij = u_ij
  deposit D_j - R_ij into j's bank account
2) u_ij < r_ij and r_ij > D_j:
  R_ij = min(u_ij, D_j)
  deposit D_j - R_ij into j's bank account if it is greater than 0
3) u_ij = r_ij and r_ij < D_j:
  R_ij = D_j (or R_ij = u_ij + k, where k is a small number)
4) u_ij = r_ij and r_ij >= D_j:
  R_ij = D_j
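The four cases can be written as a single function for one (node i, class j) pair. This sketch assumes usage never exceeds allocation (u_ij <= r_ij), so the final branch covers u_ij = r_ij; the function name is hypothetical, and the optional `k` selects the alternative R_ij = u_ij + k in case 3.

```python
from typing import Optional, Tuple


def bank_part1(u: float, r: float, D: float,
               k: Optional[float] = None) -> Tuple[int, float, float]:
    """Return (case, new allocation R_ij, tickets to deposit) for one class on one node."""
    if u < r:                      # the class left some of its allocation unused
        if r <= D:                 # case 1
            R = u
            return 1, R, D - R
        R = min(u, D)              # case 2
        return 2, R, max(D - R, 0)
    # assumed u == r: the class used its whole allocation
    if r < D:                      # case 3
        R = D if k is None else u + k
        return 3, R, 0
    return 4, D, 0                 # case 4


print(bank_part1(20, 40, 50))  # (1, 20, 30)
```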
Bank Algorithm: part 2
let t_i be the number of tickets currently allocated on node i
IF t_i >= T_i
  normalize the tickets so that t_i = T_i
ELSE
  check the balance B_ij in the bank account of each class j in case 4 above
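For the over-committed branch, one natural choice (an assumption; the slide only says "normalize") is to scale every class's tickets on node i by T_i / t_i, which keeps their ratios and makes them sum to exactly T_i. Allocations here are ticket counts keyed by hypothetical class names, and T_i is assumed to be positive.

```python
from typing import Dict


def normalize(alloc: Dict[str, float], T_i: float) -> Dict[str, float]:
    """Scale allocations on node i so they sum to T_i when t_i >= T_i (T_i > 0)."""
    t_i = sum(alloc.values())
    if t_i < T_i:
        # under-committed: left unchanged for the bank-draw step (ELSE branch)
        return dict(alloc)
    scale = T_i / t_i
    return {c: a * scale for c, a in alloc.items()}


print(normalize({"web": 100, "mail": 100}, 100))  # {'web': 50.0, 'mail': 50.0}
```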
Bank Algorithm: part 2 (continued)
option 1: check classes in decreasing balance order
  let b_ij = min(B_ij, h), where h is a relatively small number
  R_ij += b_ij, and draw b_ij from j's bank account
  t_i += b_ij
  until t_i >= T_i
option 2: check all classes in case 4 above with balance >= 0
  allocate T_i - t_i tickets to these classes in proportion to their bank balances, and draw from the bank accounts accordingly
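Both options can be sketched with plain dictionaries: `R` holds the tickets of every class on node i and `B` the bank balances of the case-4 classes only (both names are hypothetical). Capping each option 2 grant at the class's balance is our assumption, since a class cannot draw more than it has saved; option 1 follows the slide literally, so t_i may overshoot T_i by less than h.

```python
from typing import Dict, Tuple

Tickets = Dict[str, float]


def option1(R: Tickets, B: Tickets, T_i: float, h: float) -> Tuple[Tickets, Tickets]:
    """Richest case-4 class first, each draws at most h tickets, until t_i >= T_i."""
    R, B = dict(R), dict(B)
    t_i = sum(R.values())
    for c in sorted(B, key=B.get, reverse=True):  # decreasing balance order
        if t_i >= T_i:
            break
        b = min(B[c], h)
        R[c] = R.get(c, 0) + b
        B[c] -= b
        t_i += b
    return R, B


def option2(R: Tickets, B: Tickets, T_i: float) -> Tuple[Tickets, Tickets]:
    """Split the free T_i - t_i tickets in proportion to bank balances."""
    R, B = dict(R), dict(B)
    gap = T_i - sum(R.values())
    total = sum(b for b in B.values() if b > 0)
    if gap <= 0 or total == 0:
        return R, B
    for c, bal in list(B.items()):
        if bal > 0:
            grant = min(bal, gap * bal / total)  # never draw more than was saved
            R[c] = R.get(c, 0) + grant
            B[c] = bal - grant
    return R, B


R = {"a": 40, "b": 30, "c": 10}   # t_i = 80 on a node with T_i = 100
B = {"a": 15, "b": 5}             # balances of the case-4 classes a and b
print(option1(R, B, 100, h=10))
print(option2(R, B, 100))
```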
Bank Algorithm: part 3
If there are still unassigned tickets, assign them to the classes in case 4 above in proportion to their share or their need.
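Part 3 with the "share" variant can be sketched as follows. Splitting the leftover tickets in proportion to each case-4 class's desired share D_j is one reading of the slide; the "need" variant would swap in a different weight. The function name and inputs are hypothetical.

```python
from typing import Dict


def bank_part3(R: Dict[str, float], case4_shares: Dict[str, float],
               T_i: float) -> Dict[str, float]:
    """Give any tickets still unassigned on node i to case-4 classes by share."""
    R = dict(R)
    left = T_i - sum(R.values())
    total = sum(case4_shares.values())
    if left <= 0 or total <= 0:
        return R
    for c, share in case4_shares.items():
        R[c] = R.get(c, 0) + left * share / total
    return R


print(bank_part3({"a": 55, "b": 35, "c": 5}, {"a": 30, "b": 10}, 100))
# {'a': 58.75, 'b': 36.25, 'c': 5}
```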
Notes and Other Strategies:
• Note: Tickets in a bank account have a timestamp associated with them and expire once they reach a certain age.
• Strategy: The manager could force some compensation if t_i >= T_i on all the nodes before adjustment and some classes have high balances in their accounts. The manager could allocate a reasonable amount of tickets as in option 2 above, then normalize so that t_i equals T_i.
• Strategy: A class on a node may choose to reserve some tickets for its own use on that same node in the near future, rather than depositing them in the bank. We'll investigate this option.
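Ticket expiry can be sketched by dropping deposit lots older than a maximum age. Here `max_age` and the (deposit time, tickets) lot format are hypothetical, since the slide leaves the age limit open.

```python
from typing import List, Tuple

Lot = Tuple[float, int]  # (deposit time, tickets)


def expire(lots: List[Lot], now: float, max_age: float) -> Tuple[List[Lot], int]:
    """Split an account's lots into those still valid and the total that expired."""
    kept = [(ts, n) for ts, n in lots if now - ts <= max_age]
    expired = sum(n for ts, n in lots if now - ts > max_age)
    return kept, expired


print(expire([(0.0, 10), (5.0, 20), (9.0, 5)], now=10.0, max_age=6.0))
# ([(5.0, 20), (9.0, 5)], 10)
```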
Transfer Strategy: Very simple
• Based on the previous usage reports from lower-level managers, the current manager transfers tickets from one account to another where they are badly needed.
Transfer Strategy:
More detailed (if needed)
check class-manager pairs <j,i> in decreasing usage/share order, i.e., check the classes that most need more tickets first
  check j's accounts on other managers l where usage/share is low
    transfer min(B_lj, b) tickets from j's account on manager l to j's account on manager i, where b is a constant
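A sketch of the detailed transfer strategy, keyed by (class, manager) pairs. The thresholds that decide when usage/share counts as "high" or "low" are our assumption (1.0 by default, i.e., using more or less than the share), donors are visited lowest ratio first, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (service class j, manager id)


def plan_transfers(usage: Dict[Key, float], share: Dict[Key, float],
                   balance: Dict[Key, float], b: float,
                   high: float = 1.0, low: float = 1.0):
    """Move up to b tickets per donor into the accounts of the neediest <j, i> pairs."""
    bal = dict(balance)
    ratio = {k: usage[k] / share[k] for k in usage if share.get(k, 0) > 0}
    moves: List[Tuple[str, str, str, float]] = []
    # neediest class-manager pairs first (decreasing usage/share)
    for (j, i) in sorted(ratio, key=ratio.get, reverse=True):
        if ratio[(j, i)] <= high:
            break  # no remaining pair is short of tickets
        donors = sorted((l for (c, l) in ratio
                         if c == j and l != i and ratio[(c, l)] < low),
                        key=lambda l: ratio[(j, l)])
        for l in donors:
            amount = min(bal.get((j, l), 0), b)
            if amount > 0:
                bal[(j, l)] -= amount
                bal[(j, i)] = bal.get((j, i), 0) + amount
                moves.append((j, l, i, amount))
    return moves, bal


usage = {("web", "m1"): 90, ("web", "m2"): 20, ("web", "m3"): 40}
share = {k: 50 for k in usage}
balance = {("web", "m2"): 30, ("web", "m3"): 5}
moves, bal = plan_transfers(usage, share, balance, b=10)
print(moves)  # [('web', 'm2', 'm1', 10), ('web', 'm3', 'm1', 5)]
```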
Thinking of better strategies. :-)
• Any ideas?
Network View
[Figure: network view of nodes 1 to 12]
Full Network Overview
[Figure: nodes 1 to 12 connected to a WAN]
Failure Design
• We essentially tried to create a structure similar to a tree
• Thus we handle deleting a failed node, and recovering from it, much like removing a node from a tree
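Treating the management network as a tree, failure handling could look like ordinary tree deletion. In this sketch (our assumption, not necessarily the exact recovery the authors use) a failed leaf is simply detached, while a failed manager is replaced by its first child, which also adopts the remaining children. The `children`/`parent` maps and `remove_node` are hypothetical.

```python
from typing import Dict, List


def remove_node(children: Dict[int, List[int]], parent: Dict[int, int],
                failed: int) -> None:
    """Remove `failed` from the management tree in place."""
    kids = children.pop(failed, [])
    p = parent.pop(failed, None)
    if not kids:                        # a leaf: just detach it from its manager
        if p is not None:
            children[p].remove(failed)
        return
    heir, rest = kids[0], kids[1:]      # assumption: the first child takes over
    children.setdefault(heir, []).extend(rest)
    for c in rest:
        parent[c] = heir
    if p is None:                       # the root failed: heir becomes the new root
        parent.pop(heir, None)
    else:                               # heir takes the failed node's slot under p
        parent[heir] = p
        siblings = children[p]
        siblings[siblings.index(failed)] = heir


children = {1: [2, 3], 2: [5, 6, 7], 3: [8, 9]}
parent = {2: 1, 3: 1, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
remove_node(children, parent, 6)  # leaf failure
remove_node(children, parent, 2)  # manager failure: 5 takes over
print(children)  # {1: [5, 3], 3: [8, 9], 5: [7]}
```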
Minor Node(6) Failure
[Figure: node hierarchy (nodes 1 to 12) illustrating the failure of node 6]
1st Level Manager(2) Failure
[Figure: node hierarchy (nodes 1 to 12) illustrating the failure of first-level manager 2]
2nd Level Manager(1) Failure
[Figure: node hierarchy (nodes 1 to 12) illustrating the failure of second-level manager 1]
Node Insertion
• Simply find a manager with room for more nodes
• If there is no room, make a leaf node into a manager
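The insertion rule can be sketched as a breadth-first search: attach the new node to the shallowest manager with a free slot, or, if every manager is full, turn the shallowest leaf into a manager. The BFS order, the fixed `capacity` per manager, and the function name are our assumptions.

```python
from collections import deque
from typing import Dict, List


def insert_node(children: Dict[int, List[int]], parent: Dict[int, int],
                root: int, new: int, capacity: int) -> int:
    """Attach `new` to the tree and return the id of its manager."""
    queue = deque([root])
    first_leaf = None
    while queue:                        # breadth-first: shallowest managers first
        m = queue.popleft()
        kids = children.get(m, [])
        if not kids:
            if first_leaf is None:
                first_leaf = m          # remember the shallowest leaf
        elif len(kids) < capacity:
            break                       # found a manager with a free slot
        queue.extend(kids)
    else:
        m = first_leaf                  # every manager is full: promote a leaf
    children.setdefault(m, []).append(new)
    parent[new] = m
    return m


children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
parent = {c: p for p, kids in children.items() for c in kids}
print(insert_node(children, parent, root=1, new=8, capacity=2))   # 4
print(insert_node(children, parent, root=1, new=9, capacity=2))   # 4
print(insert_node(children, parent, root=1, new=10, capacity=2))  # 5
```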
Why discuss failure?
• Not relevant to the performance of our scheduler; we don't even plan to simulate it (unless we have lots of free time), but …
• It does show that the network layout we've designed can easily handle failures
• Making the tree balance itself and handling failures could be relatively straightforward
Network Simulator - NS
• Our Components
– A new Agent Class: RsrcAgent
• Agents are servers running on a node
– A script to create the NS input file
• Specifies network layout
– Number of Nodes
– Nodes per Manager
• Specifies the request trace
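The layout part of such a script might compute the manager hierarchy from the two parameters before writing anything NS-specific. This sketch assumes dedicated manager IDs numbered after the N worker nodes and groups consecutive IDs; `build_layout` is hypothetical, and emitting the actual NS script and request trace is left out because it depends on how RsrcAgent is registered.

```python
from typing import Dict, List, Tuple


def build_layout(num_nodes: int, per_manager: int) -> Tuple[Dict[int, List[int]], int]:
    """Group nodes 0..N-1 under managers, level by level, until one root remains.

    Returns (children, root): children maps each manager id to the ids it manages.
    Manager ids are assumed to be allocated after the worker-node ids.
    """
    if num_nodes < 1 or per_manager < 2:
        raise ValueError("need num_nodes >= 1 and per_manager >= 2")
    children: Dict[int, List[int]] = {}
    level = list(range(num_nodes))   # ids at the current level, bottom first
    next_id = num_nodes
    while len(level) > 1:
        upper = []
        for k in range(0, len(level), per_manager):
            children[next_id] = level[k:k + per_manager]
            upper.append(next_id)
            next_id += 1
        level = upper
    return children, level[0]


children, root = build_layout(12, 4)
print(root, children[root])  # 15 [12, 13, 14]
```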
NS implementation status
• Look at code
Evaluation
• NS should make evaluation easy
• Extract load-balance information from the nodes
• More importantly, measure the rate at which the nodes handle queries
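Once a simulation run has finished, the two measurements could be computed from a log of which node handled each query. The input format, the `summarize` helper, and the max/mean imbalance metric (1.0 means perfectly balanced) are our assumptions, not part of the NS output.

```python
from typing import Dict, Iterable, List, Tuple


def summarize(handled_by: Iterable[int], nodes: List[int],
              duration: float) -> Tuple[Dict[int, float], float]:
    """Per-node query rate (queries/second) and a max/mean load-imbalance ratio."""
    counts = {n: 0 for n in nodes}        # include idle nodes in the average
    for node in handled_by:
        counts[node] += 1
    rates = {n: c / duration for n, c in counts.items()}
    mean = sum(counts.values()) / len(counts)
    imbalance = max(counts.values()) / mean if mean > 0 else 1.0
    return rates, imbalance


rates, imbalance = summarize([1, 1, 2, 1], nodes=[1, 2, 3], duration=2.0)
print(rates)  # {1: 1.5, 2: 0.5, 3: 0.0}
```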