Aim of the work - High Performance Computing Lab

Issues and Opportunities
of Cloud Federations
Massimo Coppola
in collaboration with
Laura Ricci, Emanuele Carlini,
Patrizio Dazzi, Ranieri Baraglia
Summary
• Cloud Computing
  • Where do we come from: HPC, Parallel Computing, Grids, P2P
• Federations of Clouds
  • What and why
  • What we inherit from our past experiences
    • Autonomic, P2P, Resource Scheduling
• Cloud applied to virtual environments
• Business models for cloud federations
Parallelism, to Grid, to Clouds ...
• To approach today’s Clouds, and boldly go beyond them, many techniques and theoretical results can be reused
  • sometimes they are reinvented under a different name...
• Scheduling and resource management from Parallel and Grid Computing
• P2P techniques to cheaply and widely spread information
• Autonomic management based on performance models of applications
Grid and Cloud computing with XtreemOS
Part 3 - Basic of System Administration
Massimo Coppola
ISTI-CNR, Italy
with contributions by Christine Morin and countless
collaborators within XtreemOS
Eurosys 2010, Paris
XtreemOS IP project
is funded by the European Commission
under contract IST-FP6-033576
SRDS and RSS
• SRDS (Service and Resource Discovery Service) as part of the XtreemOS releases
• Requested for node selection by the AEM
• New functionalities
  • Support of multiple underlying DHTs (Scalaris, Overlay Weaver)
  • Support of XACML policy filters
  • Support of the new multithreaded DIXI
• Tested using up to 500 machines from Grid'5000
XtreemOS System
Contrail IaaS Federation
A Contrail Federation integrates multiple Clouds, both public and private, in a common platform.
User identities, data, and resources are interoperable within the federation, thanks to
• common support for authentication and authorization
• common mechanisms for policy definition, monitoring, and enforcement of all aspects of QoS: SLA, QoP, etc.
• the basis of a common economic model
Federation Objectives
• Develop a Federation support that integrates and actively coordinates the SLA management provided by single Cloud providers
• Do not disrupt the providers’ business models
  • Cloud administration is not Federation management
• Allow exploiting a Federation as a single Cloud
  • Cloudbursting to and from the Federation
• Federation support must be scalable
  • Number of apps running, providers, resources, users
Cloud revolutions
• Is there a place for “small” Cloud providers?
  • they offer lower scalability, are not worldwide
• Large Cloud providers are subject to contrasting forces
  • concentrating data centers where management is cheaper
  • placing resources scattered over the Internet structure, to improve the networking cost
    • multimedia streaming and real-time applications enjoy lower latencies and round-trips, less overall bandwidth
Cloud revolutions
• Federations as a way to flexibly merge separate providers
  • Smooth the size disadvantage
  • Increase the “market size”
  • Provide a competitive edge, as small providers are already geographically distributed
Distributed Architecture
• Abstract API is replicated onto each Federation access point (FAP)
• FAPs act as brokers, but share a common view
  • Security, provider status, user actions
• FAPs are not restricted to the “local” provider
• Policies and authentication/authorization are common
• Contention issues
• Final resource allocation is on providers
• Shared info helps management
• AP either hosted by provider, or on independent HW
Holistic approach to QoS
• Extend the set of characteristics to be measured on the platform
• Protection
  • Type of security mechanisms which are in place
  • Guarantees offered by storage holder, network infrastructure
  • Auth. protocols, encryption mechanisms, isolation
• Privacy
  • Geo-localization
  • Can have deep legal implications
• More in the future
  • E.g. power consumption: overall power, efficiency
Planning for SLAs
• Choose the best provider(s) and map the application on the virtual resources provided
• Besides constraints, a multiple-criteria choice
  • Many user criteria
  • Federation has its own goals
    • balance user satisfaction
    • balance provider satisfaction
• How do you choose the resources?
• What if one provider is not enough?
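The multi-criteria choice above can be illustrated with a weighted-sum ranking. This is only a minimal sketch under stated assumptions, not the Contrail planner: the provider names, the three criteria (`cost`, `latency_ms`, `availability`), the weights, and the thresholds are all hypothetical. Hard SLA constraints filter the offers first, then the remaining ones are ranked by a weighted sum of normalized criteria.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    provider: str        # hypothetical provider name
    cost: float          # price per hour (assumed positive), lower is better
    latency_ms: float    # assumed positive, lower is better
    availability: float  # fraction in [0, 1], higher is better

def feasible(offer, max_cost, min_availability):
    """Hard SLA constraints: offers violating them are discarded."""
    return offer.cost <= max_cost and offer.availability >= min_availability

def rank(offers, weights, max_cost, min_availability):
    """Rank feasible offers by a weighted sum of normalized criteria."""
    ok = [o for o in offers if feasible(o, max_cost, min_availability)]
    if not ok:
        return []
    worst_cost = max(o.cost for o in ok)
    worst_lat = max(o.latency_ms for o in ok)

    def score(o):
        # Each term lies in [0, 1]; 1 is best.
        return (weights["cost"] * (1 - o.cost / worst_cost)
                + weights["latency"] * (1 - o.latency_ms / worst_lat)
                + weights["availability"] * o.availability)

    return sorted(ok, key=score, reverse=True)

offers = [
    Offer("A", cost=1.0, latency_ms=80, availability=0.99),
    Offer("B", cost=0.5, latency_ms=120, availability=0.95),
    Offer("C", cost=2.5, latency_ms=20, availability=0.999),
]
weights = {"cost": 0.5, "latency": 0.3, "availability": 0.2}
ranking = rank(offers, weights, max_cost=2.0, min_availability=0.9)
print([o.provider for o in ranking])
```

A weighted sum is the simplest way to collapse several criteria into one ranking. It hides trade-offs that a Pareto-front approach would expose, and it does not address the case where one provider is not enough.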
Application and SLA splitting
• Application deployment on multiple providers: a federation is more than the sum of its providers
  • Type and amount of resources needed
  • Sudden elasticity
  • Peculiar resource dislocation
• Tough issue
  • Multi-criteria and problem size
  • Both at SLA negotiation and at run-time
  • Matching application structure and SLA
  • Identifying a suitable set of providers and mapping
Standard interoperation
• Standards are still “flowing” in the Cloud
  • except de facto ones
• Interoperation is mandatory
• We are building an open-source OVF toolkit → a standard converter
  • with INRIA and XLAB
  • (de)serializes in-memory Java structures from/to OVF and other standards for VM and application description
  • will be extended to deal with SLA standards
Future directions
• Apply autonomic heuristics to Clouds and Federations, and develop new ones.
• New business models to be applied in Cloud Federations
  • For Service Providers, Federation aggregators and/or end-users
• W.r.t. the security and trust counterpart: 24/7 UCON authorization and “geographic” SLA constraints
Digital Virtual Environments
• Players can move and interact with the surrounding environment
• Shared sense of space among players
• Modifications of the environment visible to every player
• Area Of Interest (AOI)
Virtual Environments
• Complex and challenging applications
  • High number of players
  • Near real-time constraints
• Quadratic (or cubic) load (bandwidth, CPU) depending on the number of players: seasonal
• QoS requirements depend on the user behavior
  • movements vs interactions
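The quadratic load mentioned above follows from simple counting. As a minimal sketch (the function names and the fixed neighbour count are hypothetical): if every player sends an update to every other player, each tick costs n(n-1) messages, while restricting updates to the players inside an Area Of Interest makes the cost grow linearly in n.

```python
def broadcast_messages(n_players):
    """Messages per tick if every player updates every other player."""
    return n_players * (n_players - 1)

def aoi_messages(n_players, neighbours_in_aoi):
    """Messages per tick if updates go only to the players inside the AOI."""
    return n_players * min(neighbours_in_aoi, n_players - 1)

for n in (10, 100, 1000):
    print(n, broadcast_messages(n), aoi_messages(n, 8))
```

The fixed AOI size of 8 neighbours is an illustrative assumption. In a real environment the AOI population varies with player behavior, which is why crowded areas remain the hard case.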
Aim of the work
• Distributed architecture for Virtual Environments
  • scalable in QoS and cost
• Exploit the (illusion of) infinite resources of Cloud Computing and the free resources of user machines.
Hybrid Architecture?
• Private server racks are fine... but they are statically sized for the peak load
• Pure P2P should scale up... but makes it hard to manage the QoS in limit situations
• Only cloud? Costly for large instances
Combination of the Cloud and P2P to support the DVE in an inexpensive and QoS-aware fashion
Cloud & P2P Combination
Letting the cloud manage the bootstrap and peak load
Concrete Architecture
• State Action Manager (SAM)
  • manages the state. Medium rate, no error tolerance, conflicts
• Positional Action Manager (PAM)
  • manages the position. High rate, some error tolerance, no conflicts
SAM
• Cloud IaaS instances run on a DHT together with the users’ machines
• Heuristics decide when to move load from users to the Cloud
• Backups for user machines
[Figure: results w/o heuristic vs. with heuristic]
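The slide does not say which heuristics SAM uses to move load from users to the Cloud. The sketch below is therefore only an assumed, threshold-based illustration: when a user machine exceeds its capacity, its heaviest entities are handed over to the Cloud until the machine fits again. The function name, the data layout, and the load units are all hypothetical.

```python
def rebalance(peer_load, capacity, cloud):
    """Move entities from overloaded peers to the cloud.

    peer_load: dict peer -> dict entity -> load units (updated in place)
    capacity:  dict peer -> maximum load the peer can sustain
    cloud:     dict entity -> load, updated in place and returned
    """
    for peer, entities in peer_load.items():
        total = sum(entities.values())
        # Offload the heaviest entities first until the peer fits its capacity.
        for entity in sorted(entities, key=entities.get, reverse=True):
            if total <= capacity[peer]:
                break
            load = entities.pop(entity)
            cloud[entity] = load
            total -= load
    return cloud

peers = {"p1": {"e1": 5, "e2": 3, "e3": 1}, "p2": {"e4": 2}}
capacity = {"p1": 4, "p2": 4}
cloud = rebalance(peers, capacity, {})
print(cloud, peers)
```

Offloading the heaviest entities first keeps the number of migrations low. A cost-aware variant would also weigh the price of the Cloud resources each migration consumes.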
PAM
(she likes to gossip!)
• “Wisdom of the Crowds”
• A best-effort gossip-based algorithm
• Storage Cloud as support
• Around 70-80% fewer requests to the Cloud
[Figure: percentage of object retrieval using gossip; accurate, slower heuristic vs. faster heuristic]
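To illustrate how a best-effort gossip protocol spreads knowledge of objects among peers, here is a minimal push-gossip simulation. It is a generic sketch, not the PAM algorithm: the fanout, the number of nodes, and the one-object-per-node start state are assumptions. Whatever a node has not learned through gossip would have to be requested from the storage Cloud.

```python
import random

def gossip_round(views, fanout, rng):
    """One push round: every node sends its view to `fanout` random peers."""
    nodes = list(views)
    snapshot = {n: set(v) for n, v in views.items()}  # views at round start
    for sender in nodes:
        peers = [n for n in nodes if n != sender]
        for target in rng.sample(peers, min(fanout, len(peers))):
            views[target] |= snapshot[sender]

def coverage(views, all_objects):
    """Average fraction of objects known per node."""
    return sum(len(v) / len(all_objects) for v in views.values()) / len(views)

rng = random.Random(42)
all_objects = set(range(20))
# Assumption: each node initially knows only the one object it hosts.
views = {node: {node} for node in range(20)}
history = [coverage(views, all_objects)]
for _ in range(6):
    gossip_round(views, fanout=2, rng=rng)
    history.append(coverage(views, all_objects))
print([round(c, 2) for c in history])
```

Coverage never decreases from one round to the next because views only grow. How fast it approaches 1 depends on the fanout and on the random peer choices.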
Workload for Simulations
[Figures: load and number of players; positions of objects/avatars]
What’s next?
• Elastic provisioning and prediction in SAM
• Dynamic management of the AOI in PAM
Some References
Carlini, E., M. Coppola, P. Dazzi, L. Ricci, and G. Righetti. “Cloud
Federations in Contrail”. Euro-Par 2011: Parallel Processing Workshops,
LNCS 7155, 2012.
Carlini, E., M. Coppola, and L. Ricci. “Flexible Load Distribution for Hybrid
Distributed Virtual Environments”. submitted
Carlini, E., M. Coppola, and L. Ricci. “Gossip-Based Best-Effort Interest
Management for Distributed Virtual Environments”. submitted
Carlini, E., M. Coppola, and L. Ricci (2010). Integration of P2P and Clouds
to Support Massively Multiuser Virtual Environments. In: Network and
Systems Support for Games (NetGames), 2010 9th Annual Workshop on.
IEEE, pp.1–6.
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5679660
Beware!
Backup slides behind.
Load Characterization
[Figure labels: Cloud, P2P, Cloud]
SAM Architecture
PAM: Area Coverage
Finding a subset of areas that maximizes
the coverage is an NP problem
Two heuristics:
- greedy: slower, but more accurate
- score: faster, but less accurate
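The trade-off between the two heuristics can be sketched as follows. The greedy version is the standard greedy for maximum coverage: it re-evaluates every remaining area after each pick. The slide does not define the score heuristic, so the version here is an assumption: it ranks the areas once by size and never re-evaluates. Area names and contents are hypothetical.

```python
def greedy_cover(areas, k):
    """Pick k areas, each time taking the one that adds most uncovered points."""
    covered, chosen = set(), []
    remaining = dict(areas)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered

def score_cover(areas, k):
    """Rank areas once by size and take the top k (no re-evaluation)."""
    chosen = sorted(areas, key=lambda name: len(areas[name]), reverse=True)[:k]
    covered = set().union(*(areas[name] for name in chosen))
    return chosen, covered

areas = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {6, 7},
}
print(greedy_cover(areas, 2))
print(score_cover(areas, 2))
```

On this input the greedy choice covers six points and the one-shot score covers five, because area "b" overlaps heavily with "a". The greedy version pays for that accuracy by rescanning the remaining areas at every step.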
Some Collaborations