Introduction and Overview of Advanced Computer Networks

CLOUD COMPUTING AND DATA
CENTER NETWORKING
ZILONG YE, PH.D.
[email protected]
WHAT IS CLOUD COMPUTING?
 Cloud computing is a general term used to describe a class of network-based computing that takes place over the Internet,
 basically a step on from grid computing:
 a collection/group of integrated and networked hardware, software, and Internet infrastructure (called a platform).
 Using the Internet for communication and transport, it provides hardware, software, and networking services to clients.
 These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing a very simple graphical interface or API (Application Programming Interface).
WHAT IS CLOUD COMPUTING?
 In addition, the platform provides on-demand services that are always on, anywhere, anytime.
 Pay per use and as needed, elastic:
 scale up and down in capacity and functionality.
 The hardware and software services are available to the general public, enterprises, corporations, and business markets.
CLOUD SUMMARY
 Cloud computing is an umbrella term used to refer to Internet-based development and services.
 A number of characteristics define cloud data, applications, services, and infrastructure:
 Remotely hosted: Services or data are hosted on remote infrastructure.
 Ubiquitous: Services or data are available from anywhere.
 Commodified: The result is a utility computing model similar to that of traditional utilities, like gas and electricity: you pay for what you use!
CLOUD ARCHITECTURE
(Figure: overall cloud architecture diagram.)
CLOUD SERVICE MODELS
 Software as a Service (SaaS): e.g., SalesForce CRM, LotusLive
 Platform as a Service (PaaS): e.g., Google App Engine
 Infrastructure as a Service (IaaS)
Adapted from: Effectively and Securely Using the Cloud Computing Paradigm, by Peter Mell and Tim Grance
BASIC CLOUD CHARACTERISTICS
 The “no need to know” the underlying details of the infrastructure: applications interface with the infrastructure via APIs.
 The “flexibility and elasticity” allows these systems to scale up and down at will,
 utilizing resources of all kinds:
 CPU, storage, server capacity, load balancing, and databases.
 The “pay as much as used and needed” type of utility computing, and the “always on, anywhere, any place” type of network-based computing.
BASIC CLOUD CHARACTERISTICS
 Clouds are transparent to users and applications; they can be built in multiple ways:
 branded products, proprietary open source, hardware or software, or just off-the-shelf PCs.
 In general, they are built on clusters of PC servers and off-the-shelf components, plus open-source software combined with in-house applications and/or system software.
CLOUD COMPUTING CHARACTERISTICS
Common Characteristics:
 Massive scale
 Resilient computing
 Homogeneity
 Geographic distribution
 Virtualization
 Service orientation
 Low-cost software
 Advanced security
Essential Characteristics:
 On-demand self-service
 Broad network access
 Rapid elasticity
 Resource pooling
 Measured service
VIRTUALIZATION
 Virtual workspaces:
 An abstraction of an execution environment that can be made dynamically
available to authorized clients by using well-defined protocols,
 Resource sharing (e.g. CPU, memory share),
 Software configuration (e.g. O/S, provided services).
 Implemented on Virtual Machines (VMs):
 Abstraction of a physical host machine,
 Hypervisor intercepts and emulates instructions from VMs, and allows
management of VMs,
 VMWare, Xen, etc.
 Provide infrastructure API:
 Plug-ins to hardware/support structures
(Figure: virtualized stack: several applications, each on its own OS, running above a hypervisor that runs on the hardware.)
VIRTUAL MACHINES
(Figure: several VMs, each running applications on its own guest OS (e.g., Linux, NetBSD, Windows), hosted by a Virtual Machine Monitor (VMM) / hypervisor on shared hardware. Example VMMs: Xen, VMWare, UML, Denali, etc.)
VIRTUALIZATION IN GENERAL
 Advantages of virtual machines:
 Run operating systems where the physical hardware is unavailable,
 Easier to create new machines, backup machines, etc.,
 Software testing using “clean” installs of operating systems and
software,
 Emulate more machines than are physically available,
 Timeshare lightly loaded systems on one host,
 Debug problems (suspend and resume the problem machine),
 Easy migration of virtual machines (with or without a shutdown),
 Run legacy systems!
WHAT IS HADOOP?
 At Google, MapReduce operations are run on a special file system called the Google File System (GFS) that is highly optimized for this purpose.
 GFS is not open source.
 Doug Cutting and others at Yahoo! reverse-engineered GFS and called the result the Hadoop Distributed File System (HDFS).
 The software framework that supports HDFS, MapReduce, and other related entities is called Project Hadoop, or simply Hadoop.
 This is open source and distributed by Apache.
FAULT TOLERANCE
 Failure is the norm rather than exception
 An HDFS instance may consist of thousands of server machines, each storing part of the file system’s data.
 Since there is a huge number of components, and each component has a non-trivial probability of failure, there is always some component that is non-functional.
 Detection of faults and quick, automatic recovery from them is a core architectural goal
of HDFS.
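The "failure is the norm" argument above can be made concrete with a small calculation (the server count and failure rate below are illustrative assumptions, not figures from the slides): if each of n servers fails independently with probability p on a given day, the chance that at least one server is down is 1 - (1 - p)^n.

```python
# Sketch: why failure is the norm at scale (illustrative numbers).
def prob_some_failure(n: int, p: float) -> float:
    """Probability that at least one of n independent components fails."""
    return 1 - (1 - p) ** n

# With 3000 servers and a 0.1% per-server daily failure rate,
# some server is down on roughly 95% of days.
print(round(prob_some_failure(3000, 0.001), 3))  # → 0.95
```

This is why HDFS treats fault detection and automatic recovery as a core design goal rather than an afterthought.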
HADOOP DISTRIBUTED FILE SYSTEM
(Figure: an application accesses HDFS through the HDFS client, which talks to the HDFS server (master node / name nodes). The local file system uses a small block size (2 KB in the figure), while HDFS uses 128 MB blocks, which are replicated.)
HDFS ARCHITECTURE
(Figure: the Namenode stores metadata (name, replicas, ..., e.g., /home/foo/data, 6, ...). Clients send metadata ops to the Namenode and block ops (read/write) directly to Datanodes; blocks are replicated across Datanodes on different racks, e.g., Rack 1 and Rack 2.)
MAPREDUCE
 MapReduce is a programming model Google has used successfully to process its “big-data” sets (~20 petabytes per day).
 A map function extracts some intelligence from raw data.
 A reduce function aggregates the data output by the map according to some guide.
 Users specify the computation in terms of a map and a reduce function.
 The underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, and
 also handles machine failures, efficient communication, and performance issues.
(Figure: MapReduce dataflow: large-scale data is split; Map tasks emit <key, value> pairs, e.g., <key, 1>; a parse-hash step partitions pairs by key; reducers (say, Count) produce the output partitions P-0000 (key, count1), P-0001 (key, count2), P-0002 (key, count3).)
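The map/reduce/shuffle steps above can be sketched as a word count in plain Python (not Hadoop; the function names are illustrative):

```python
# Minimal word-count sketch of the map/reduce model.
from collections import defaultdict

def map_fn(line):
    """Map: extract <word, 1> pairs from a line of raw text."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: aggregate all values emitted for one key."""
    return (word, sum(counts))

def mapreduce(lines):
    # Shuffle: group intermediate values by key
    # (the Hadoop runtime does this between map and reduce).
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["big data big clusters", "big data"]))
# → {'big': 3, 'data': 2, 'clusters': 1}
```

In real Hadoop, the same map and reduce functions run in parallel on many machines, with the framework handling the shuffle, machine failures, and re-execution.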
CLOUD COMPUTING RESEARCH TOPICS
 Virtual machine placement
 One-dimensional VM placement
 Multi-dimensional VM placement
 VM placement in a single DC
 VM placement in multiple DCs
 Virtual machine live migration
 Availability-aware virtual machine placement
VIRTUAL MACHINE PLACEMENT
 A single physical machine can host multiple VMs, as long as the capacity of the physical machine is not exceeded
 VM placement: determine how to allocate VMs to physical machines
 Objective: minimize the number of physical machines used
 Constraints: physical machine capacity, other QoS requirements, SLA requirements
 The one-dimensional VM placement problem is similar to the bin-packing problem.
BIN PACKING (1-D)
Bin Packing Problem: pack a set of items into the minimum number of bins, each of capacity 1.
 Items to be packed: .5, .7, .5, .2, .4, .2, .5, .1, .6
BIN PACKING (1-D)
 Items: .5, .7, .5, .2, .4, .2, .5, .1, .6
 Optimal packing: {.5, .5}, {.7, .2, .1}, {.6, .4}, {.5, .2}; N0 = 4 bins
NEXT FIT PACKING ALGORITHM
 Next Fit: keep a single open bin; place the next item there if it fits, otherwise close that bin and open a new one (closed bins are never revisited).
 Items: .5, .7, .5, .2, .4, .2, .5, .1, .6 (optimal: N0 = 4 bins)
 Next Fit packing: {.5}, {.7}, {.5, .2}, {.4, .2}, {.5, .1}, {.6}; N = 6
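The Next Fit heuristic can be sketched in a few lines of Python (a sketch, not from the slides; it assumes every item fits in an empty bin):

```python
# Next Fit: one open bin at a time; if the item fits, add it,
# otherwise close the bin and open a new one.
def next_fit(items, capacity=1.0, eps=1e-9):
    """Pack items in the given order; returns the list of bins."""
    bins = [[]]
    load = 0.0
    for item in items:
        if load + item <= capacity + eps:  # fits in the current open bin
            bins[-1].append(item)
            load += item
        else:                              # close the bin, open a new one
            bins.append([item])
            load = item
    return bins

items = [.5, .7, .5, .2, .4, .2, .5, .1, .6]
print(len(next_fit(items)))  # → 6 bins, versus the optimal 4
```

The small `eps` tolerance avoids spurious rejections from floating-point rounding (e.g., sums like 0.7 + 0.2 + 0.1).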
FIRST FIT PACKING ALGORITHM
 First Fit: place each item into the first (lowest-indexed) open bin that still has room; open a new bin only if no existing bin fits.
 Items: .5, .7, .5, .2, .4, .2, .5, .1, .6
 Next Fit packing (for comparison): N = 6
 First Fit packing: {.5, .5}, {.7, .2, .1}, {.4, .2}, {.5}, {.6}; N = 5
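First Fit differs from Next Fit only in that it re-checks every open bin before opening a new one. A minimal sketch:

```python
# First Fit: each item goes into the first existing bin with enough
# residual capacity; a new bin is opened only if none fits.
def first_fit(items, capacity=1.0, eps=1e-9):
    """Pack items in the given order; returns the list of bins."""
    bins, loads = [], []
    for item in items:
        for i, load in enumerate(loads):
            if load + item <= capacity + eps:   # first bin that fits wins
                bins[i].append(item)
                loads[i] += item
                break
        else:                                   # no existing bin fits
            bins.append([item])
            loads.append(item)
    return bins

items = [.5, .7, .5, .2, .4, .2, .5, .1, .6]
print(len(first_fit(items)))  # → 5 bins: one fewer than Next Fit
```

On this instance First Fit saves a bin because the .1 item can go back into the bin holding {.7, .2}, which Next Fit had already closed.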
OTHER APPROACHES
 Best fit bin packing
 Load balanced bin packing
 First fit with decreasing demand
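Of the approaches listed, "first fit with decreasing demand" (First Fit Decreasing) is simply First Fit run on the demands sorted from largest to smallest; a sketch:

```python
# First Fit Decreasing: sort demands in decreasing order, then First Fit.
def first_fit_decreasing(items, capacity=1.0, eps=1e-9):
    bins, loads = [], []
    for item in sorted(items, reverse=True):    # largest demands first
        for i, load in enumerate(loads):
            if load + item <= capacity + eps:
                bins[i].append(item)
                loads[i] += item
                break
        else:
            bins.append([item])
            loads.append(item)
    return bins

items = [.5, .7, .5, .2, .4, .2, .5, .1, .6]
print(len(first_fit_decreasing(items)))  # → 4 bins: optimal on this instance
```

Placing the large items first leaves the small items to fill the remaining gaps, which is why FFD usually beats plain First Fit.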
MULTI-DIMENSIONAL VM PLACEMENT
 Not only CPU capacity, but also memory and storage capacity, must be considered, so the problem becomes a multi-dimensional bin-packing problem
 Possible solutions:
 Map the VM to a physical machine that can satisfy all of its requirements
 e.g., the physical machine with the highest remaining capacity, averaged over all resource dimensions
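The first of these solutions, a greedy first fit where a VM is placed only if every dimension fits, can be sketched as follows (the capacities and demands are illustrative assumptions, not from the slides):

```python
# Multi-dimensional first fit: a VM fits on a host only if every
# resource dimension (CPU, memory, storage) has enough residual capacity.
def fits(residual, demand):
    return all(r >= d for r, d in zip(residual, demand))

def place_vms(vms, host_capacity):
    """Greedy multi-dimensional first fit; returns VM indices per host."""
    hosts = []          # residual (cpu, mem, storage) per open host
    placement = []      # VM indices assigned to each host
    for idx, vm in enumerate(vms):
        for h, residual in enumerate(hosts):
            if fits(residual, vm):
                hosts[h] = [r - d for r, d in zip(residual, vm)]
                placement[h].append(idx)
                break
        else:           # no open host can hold this VM in all dimensions
            hosts.append([c - d for c, d in zip(host_capacity, vm)])
            placement.append([idx])
    return placement

# Demands per VM: (CPU cores, memory GB, storage GB)
vms = [(4, 8, 100), (2, 16, 50), (4, 4, 200), (2, 4, 50)]
print(place_vms(vms, host_capacity=(8, 32, 300)))  # → [[0, 1, 3], [2]]
```

Note that VM 2 needs a new host even though the first host has spare memory and storage: a single exhausted dimension (CPU) is enough to block placement, which is what makes the multi-dimensional problem harder than the 1-D case.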
VM PLACEMENT IN MULTI-DATACENTERS
 Considers not only the VM placement, but also the network consumption between different VMs.
 Similar to the Virtual Infrastructure Mapping problem
(Figure: a virtual request, a graph of VMs annotated with computing demands (e.g., 25, 35, 60) and inter-VM bandwidth demands (e.g., 10G, 30G, 40G), is mapped onto a physical substrate whose nodes and links have capacities (e.g., 60G, 100G, 140G links).)
VM MIGRATION
 Dynamic VM placement problem
 Why VM migration?
 Failure
 Migration to save energy
 Cons: additional network consumption and overhead (the new VM must be initiated first and receive all the replicated data from the old VM before the old VM shuts down)
 Objective: minimize disruption time and SLA violations
 Questions: when to migrate? How to migrate?
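One simple answer to "when to migrate?" is a threshold policy: drain under-utilized hosts to save energy, and relieve over-utilized hosts to avoid SLA violations. The thresholds, host names, and function below are illustrative assumptions, not a policy from the slides:

```python
# Toy threshold-based migration trigger (hypothetical policy).
UNDER, OVER = 0.2, 0.8   # utilization thresholds (illustrative)

def migration_candidates(host_utilization):
    """Return (hosts to drain for energy savings, hosts to relieve for SLA)."""
    drain = [h for h, u in host_utilization.items() if u < UNDER]
    relieve = [h for h, u in host_utilization.items() if u > OVER]
    return drain, relieve

util = {"host-a": 0.10, "host-b": 0.55, "host-c": 0.92}
print(migration_candidates(util))  # → (['host-a'], ['host-c'])
```

"How to migrate" is the harder question, since the copy traffic itself consumes network capacity and the VM is disrupted during the final switchover.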
OTHER DYNAMIC VM PLACEMENT PROBLEMS
 VM splitting in order to increase the utilization of the physical machines
 Computing overhead: e.g., a VM with 100 units of computing demand may be split into two VMs of 55 units each, the extra 10 units being splitting overhead
 Additional network load between the VMs that are split from the original one
 The VMs’ computing demands change over time (so best fit may not stay the best)
 New (upgraded) VMs may join the application, and may need to be provisioned close to where the original VMs are located
RESILIENCY IN VM PLACEMENT
 One working VM typically has around 3 to 5 backup VMs
 Data are replicated from the working VM to the backup VMs
 The dataflow can follow unicast or multicast
 How to choose the multicast routing to save bandwidth
 How to make the dataflow reliable against failures
 Dynamic VM resiliency enhancement, considering the application state and the mean time to repair