Cloud Computing

Grid Computing
Cloud Computing
Elastic
Web Scale
Evolution of Computing with Network
1. Network Computing
• Network is computer (client - server)
• Separation of Functionalities
2. Cluster Computing
• Tightly coupled computing resources: CPU, storage, data, etc., usually connected within a LAN
• Managed as a single resource
• Commodity, Open source
3. Grid Computing
• Resource sharing across several domains
• Decentralized, open standards
• Global resource sharing
4. Utility Computing
• Don’t buy computers, lease computing power
• Upload, run, download
• Ownership model
What is Grid computing?
1. Grid computing is the combination of computer resources from multiple administrative
domains for a common goal.
2. The groups brought together in this collaborative effort are known as Virtual
Organizations (VOs).
3. These VOs may be formed to solve a single task and may then disappear just as quickly.
4. Grids are usually used for solving scientific, technical or business problems that
require a great number of computer processing cycles or the processing of large amounts
of data.
5. One of the main strategies of Grid computing is to use middleware to divide and
apportion pieces of a program among several computers, sometimes up to many thousands.
6. Grid computing involves computation in a distributed fashion, which may also involve
the aggregation of large-scale cluster-computing-based systems.
7. The size of a Grid may vary from small (confined to a network of computer workstations
within a corporation, for example) to large, public collaborations across many companies
and networks.
What is Grid computing?
Computing Grids
• Distribute computing intensive tasks among different physical locations
• Often used in scientific applications (number crunching)
Data Grids
• Distribute data (files or database) among different physical locations
• A tuple space is a distributed object "shared memory"
• Data is highly redundant and available
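The "shared memory" idea behind a tuple space can be sketched in a few lines. This is a toy, single-process model, not a distributed implementation: the class name `TupleSpace` is hypothetical, and the convention that `None` in a template matches any value is an assumption made for this sketch. The operation names `write`, `read` and `take` follow the ones JavaSpaces uses.

```python
class TupleSpace:
    """Toy in-memory tuple space (hypothetical; a real data grid is distributed)."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        """Put a tuple into the shared space."""
        self._tuples.append(tuple(tup))

    def _matches(self, template, tup):
        # Assumption: None in the template acts as a wildcard.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def read(self, template):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template):
        """Return the first matching tuple and remove it from the space, or None."""
        tup = self.read(template)
        if tup is not None:
            self._tuples.remove(tup)
        return tup
```

A producer would call `write(("task", 1, "pending"))`, and any consumer could later `take(("task", None, None))` without knowing who wrote the tuple.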
Grid Computing Diagram
[Diagram: a problem is split into distributed tasks, and the results are collected.]
Master Worker Pattern
Typical grid computing pattern:
1. Split up a big task into smaller processing units that can run in parallel, and
distribute them onto many computers
2. The grid software distributes the tasks and monitors them
3. The results are collected
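The pattern above can be sketched on a single machine. This is a minimal illustration, assuming a thread pool stands in for the grid software and threads stand in for the remote computers; the function names and the sum-of-squares workload are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(numbers, chunk_size):
    """Master, step 1: split the big task into independent units."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def work(chunk):
    """Worker: process one unit (here, a sum of squares)."""
    return sum(n * n for n in chunk)


def master(numbers, chunk_size=1000, workers=4):
    chunks = split(numbers, chunk_size)
    # Step 2: distribute the units; the executor plays the role of the grid software.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(work, chunks))
    # Step 3: collect and combine the partial results.
    return sum(partial_results)
```

On a real grid the workers would run on separate machines, so each unit must carry everything it needs and must not depend on the other units.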
Grid Computing Toolkits
Computing grid
• Globus Toolkit – standard toolkit for grid computing
• GridGain – very simple-to-use framework for distributed processing (Master Worker,
MapReduce)
• Hadoop (Apache) – MapReduce framework + distributed file system (like Google FS)
Data grid
• Tuple space: JavaSpaces – e.g. GigaSpaces, Rio
• Commercial data grids like Oracle 10g
• …
The Distributed computing problem
1. Overhead of Network communication
2. Processing time must be an order of magnitude higher than network communication time
to make distribution worthwhile
3. As Martin Fowler states: "put as much as possible locally to avoid network
bottlenecks"
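A rough back-of-the-envelope model makes the trade-off concrete. It assumes the compute work divides evenly across the workers and the network communication is a fixed cost that is not parallelized; both are simplifications chosen for this sketch, and the function names are hypothetical.

```python
def distributed_time(compute_s, transfer_s, workers):
    """Estimated wall-clock seconds when the work is spread over `workers` machines."""
    return compute_s / workers + transfer_s


def speedup(compute_s, transfer_s, workers):
    """How many times faster the distributed run is than a purely local run."""
    return compute_s / distributed_time(compute_s, transfer_s, workers)
```

With 100 s of computation and 1 s of communication, ten workers give a speedup of about 9. With 1 s of computation and 1 s of communication, the same ten workers are slower than running locally, because the speedup drops below 1.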
Cloud Computing
Why Cloud?
1. The term cloud is used as a metaphor for the Internet, based on the cloud drawing used
in the past to represent the telephone network,[5] and later to depict the Internet in
computer network diagrams as an abstraction of the underlying infrastructure it represents.
Cloud Computing
1. Cloud computing is Internet-based computing, whereby shared resources, software and
information are provided to computers and other devices on-demand, like a public utility.
2. It is a paradigm shift following the shift from mainframe to client-server that preceded it
in the early '80s. Details are abstracted from the users who no longer have need of,
expertise in, or control over the technology infrastructure "in the cloud" that supports
them.
3. Cloud computing describes a new supplement, consumption and delivery model for IT
services based on the Internet, and it typically involves the provision of
dynamically scalable and often virtualized resources as a service over the Internet.
Cloud Computing
[Diagram: requests to http://my.com reach a load balancer in front of rented instances
priced at 0.1 $/h each (one at 0.2 $/h). Example: 0.4 $ per hour + traffic. Example:
0.8 $ per hour + traffic.]
Cloud Computing Diagram
Cloud Computing Explained
1. Rent computing resources from a provider
2. Dynamically increase or shrink computing units
3. Pay only for resources that you actually use
• Computing power, network bandwidth, storage
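The effect of paying only for what is used can be illustrated with a small cost comparison. This is a sketch under stated assumptions: the price of 0.1 $ per unit-hour is illustrative, the capacity of one computing unit is a made-up parameter, and at least one unit is assumed to be always running.

```python
import math

PRICE_PER_UNIT_HOUR = 0.1  # $/h; illustrative value, not a current provider price


def units_needed(load, capacity_per_unit):
    """Computing units required for one hour of load (assumes a minimum of one)."""
    return max(1, math.ceil(load / capacity_per_unit))


def elastic_cost(hourly_load, capacity_per_unit, price=PRICE_PER_UNIT_HOUR):
    """Cost when the number of units grows and shrinks with the load each hour."""
    return sum(units_needed(load, capacity_per_unit) * price for load in hourly_load)


def fixed_cost(hourly_load, capacity_per_unit, price=PRICE_PER_UNIT_HOUR):
    """Cost when enough units for the peak hour are kept running all the time."""
    peak = max(units_needed(load, capacity_per_unit) for load in hourly_load)
    return peak * price * len(hourly_load)
```

For an hourly load of `[100, 100, 800, 400]` requests and units that handle 100 requests each, the elastic setup runs 14 unit-hours and the peak-sized setup runs 32.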
Cloud vs. Grid
Cloud Computing is an infrastructure that virtualizes hardware and software resources
Grid Computing is a set of patterns, tools and frameworks for distributing computation or data
A cloud can be the platform to run a computing or data grid
6 Layer Cloud Computing Stack
Example of Cloud Services
• Infrastructure: IaaS (Infrastructure as a Service)
• Software: SaaS (Software as a Service)
• Platform: PaaS (Platform as a Service)
IaaS
Infrastructure as a Service
Amazon
AWS
AWS: Amazon Web Services
1. Amazon Elastic Compute Cloud (EC2)
2. Amazon Simple Storage Service (S3)
3. Amazon CloudFront
4. Amazon Relational Database Service
5. Amazon SimpleDB
6. Amazon Simple Queue Service
7. Amazon Elastic MapReduce
Amazon Pricing Diagram
S3 Data Storage
1. Amazon S3 provides a simple web services interface that can be used to store and retrieve
any amount of data, at any time, from anywhere on the web.
2. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data
storage infrastructure that Amazon uses to run its own global network of web sites.
3. A data store like a file system, but flat (no directory structure possible)
• Separated into buckets
• A bucket can contain an unlimited number of objects
• Each object can be up to 5 GB
• Objects are secured through ACLs
• Standard SOAP or REST access (open by plain URL!)
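The flat key space and the plain-URL access can be sketched as follows. The URL shape `https://<bucket>.s3.amazonaws.com/<key>` is the virtual-hosted style that S3 serves; the bucket name, the keys and the helper functions are hypothetical, and whether a URL actually opens depends on the object's ACL.

```python
from urllib.parse import quote


def s3_object_url(bucket, key):
    """Build the plain URL of an object; slashes in the key are ordinary characters."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def list_keys(bucket_contents, prefix=""):
    """Emulate a 'directory listing' on a flat bucket by filtering keys on a prefix."""
    return sorted(key for key in bucket_contents if key.startswith(prefix))
```

A key such as `reports/2010/annual report.pdf` looks like a path, but the bucket stores it as one flat name; the "folders" exist only as a shared prefix.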
S3 Use Cases
Data Backups for EC2 instances
Simple means to provide unlimited storage to your users
• File download, File upload websites
Very simple integration in any web site
AWS – S3 – Pricing
Storage
• $0.15 per GB-month of storage used
Data Transfer
• $0.100 per GB – all data transfer in
• $0.170 per GB – first 10 TB / month data transfer out
• $0.130 per GB – next 40 TB / month data transfer out
• $0.110 per GB – next 100 TB / month data transfer out
• $0.100 per GB – data transfer out / month over 150 TB
Requests
• $0.01 per 1,000 PUT, POST, or LIST requests
• $0.01 per 10,000 GET and all other requests*
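The tiered price list can be turned into a small worked calculation. The prices are the ones listed on this slide, not current AWS prices; the conversion of 1 TB to 1024 GB is an assumption, since the slide does not say which convention applies, and the function names are made up for the example.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, price in $ per GB) for data transfer out, in order.
TRANSFER_OUT_TIERS = [
    (10 * GB_PER_TB, 0.170),
    (40 * GB_PER_TB, 0.130),
    (100 * GB_PER_TB, 0.110),
    (float("inf"), 0.100),
]


def transfer_out_cost(gb):
    """Charge each tier's price only for the portion of traffic that falls into it."""
    cost = 0.0
    remaining = gb
    for tier_size, price in TRANSFER_OUT_TIERS:
        used = min(remaining, tier_size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


def s3_monthly_cost(stored_gb, gb_in, gb_out, put_requests, get_requests):
    """Sum of storage, data transfer and request charges for one month."""
    storage = stored_gb * 0.15
    transfer = gb_in * 0.100 + transfer_out_cost(gb_out)
    requests = put_requests / 1000 * 0.01 + get_requests / 10000 * 0.01
    return storage + transfer + requests
```

For example, storing 100 GB, uploading 10 GB, serving 50 GB, and making 1,000 PUT and 10,000 GET requests comes to 15.00 + 1.00 + 8.50 + 0.01 + 0.01 = 24.52 $ for the month.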
SQS – Simple Queue Service
1. Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable, hosted queue for storing
messages as they travel between computers.
2. By using Amazon SQS, developers can simply move data between distributed components of their
applications that perform different tasks, without losing messages or requiring each component to be
always available
3. Features:
• You can create an unlimited number of queues
• Each message can be up to 8 KB in size
• A message can stay in a queue for at most 4 days
• A message is locked while it is being processed, so only one client can process it at a time
• Simple access via SOAP or HTTP Query API
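The locking behaviour can be illustrated with a toy in-memory queue. This is not the SQS API: the class `ToyQueue`, its method names and the 30-second lock duration are assumptions for this sketch; only the 8 KB message limit is taken from the feature list.

```python
import time
import uuid


class ToyQueue:
    """In-memory model of a queue with per-message locking (hypothetical, not SQS)."""

    MAX_BODY_BYTES = 8 * 1024  # message size limit from the feature list

    def __init__(self, lock_seconds=30.0, clock=time.monotonic):
        self.lock_seconds = lock_seconds
        self.clock = clock
        self.messages = {}  # message id -> (body, locked_until)

    def send(self, body):
        if len(body.encode("utf-8")) > self.MAX_BODY_BYTES:
            raise ValueError("message exceeds 8 KB")
        msg_id = str(uuid.uuid4())
        self.messages[msg_id] = (body, float("-inf"))  # not locked yet
        return msg_id

    def receive(self):
        """Hand out the first unlocked message and lock it for other clients."""
        now = self.clock()
        for msg_id, (body, locked_until) in self.messages.items():
            if locked_until <= now:
                self.messages[msg_id] = (body, now + self.lock_seconds)
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by the client once processing succeeded."""
        self.messages.pop(msg_id, None)
```

If a client receives a message and crashes before deleting it, the lock expires and another client can pick the message up, which is how the queue avoids losing work.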
SQS Pricing
Requests
$0.01 per 10,000 Amazon SQS Requests ($0.000001 per Request)
Amazon SQS requests are CreateQueue, ListQueues, DeleteQueue, SendMessage,
ReceiveMessage, DeleteMessage, SetQueueAttributes and GetQueueAttributes
Data Transfer
$0.100 per GB – all data transfer in
$0.170 per GB – first 10 TB / month data transfer out
$0.130 per GB – next 40 TB / month data transfer out
$0.110 per GB – next 100 TB / month data transfer out
$0.100 per GB – data transfer out / month over 150 TB
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable
compute capacity in the cloud.
Amazon EC2’s simple web service interface allows you to obtain and configure capacity with
minimal friction.
• It provides you with complete control of your computing resources and lets you run
on Amazon’s proven computing environment.
• Amazon EC2 reduces the time required to obtain and boot new server instances to
minutes, allowing you to quickly scale capacity, both up and down, as your
computing requirements change.
• Amazon EC2 allows you to pay only for capacity that you actually use
Application of EC2 – S3 – SQS
[Diagram components: SQS; Pool Manager (start/stop, calculate # of cu's); EC2 service
instances (AMI) with busy status; Monitor & Output (file); Ingestor process (input file);
S3 input bucket and output bucket.]
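The diagram does not say how the pool manager decides how many instances to run. One plausible rule, offered purely as an assumption, is to size the pool from the number of messages waiting in the queue. All names, the per-instance throughput parameter and the default limits below are hypothetical.

```python
import math


def instances_needed(queued_messages, messages_per_instance,
                     min_instances=0, max_instances=20):
    """Target pool size: enough instances for the backlog, within fixed limits."""
    wanted = math.ceil(queued_messages / messages_per_instance)
    return max(min_instances, min(max_instances, wanted))


def scaling_action(running, queued_messages, messages_per_instance):
    """Return ('start' | 'stop' | 'hold', count) for the pool manager to act on."""
    target = instances_needed(queued_messages, messages_per_instance)
    if target > running:
        return ("start", target - running)
    if target < running:
        return ("stop", running - target)
    return ("hold", 0)
```

With 95 queued messages, 10 messages per instance and 3 instances running, the rule asks for 7 more instances; with an empty queue it stops all of them.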
Architecture of the application
[Diagram components: SQS; Pool Manager (start/stop, calculate # of cu's); EC2 service
instances (AMI) with busy status; Monitor & Output (file); Ingestor process (input file);
S3 input bucket and output bucket.]
Application of EC2 – S3 – SQS – DB
[Diagram components: SQS; Pool Manager (start/stop, calculate # of cu's); EC2 service
instances (AMI) with busy status; Monitor & Output (file); Ingestor process (input file);
S3 input bucket and output bucket.]
Amazon CloudFront
Amazon CloudFront is a web service for content delivery.
• It integrates with other Amazon Web Services (e.g., Amazon Simple Storage Service)
to give developers and businesses an easy way to distribute content (static and
streaming content) to end users with low latency, high data transfer speeds, and no
commitments.
Amazon SimpleDB
Amazon SimpleDB is a web service providing the core database functions of data indexing
and querying in the cloud.
• Works in conjunction with Amazon S3 and Amazon EC2 to run queries on structured
data in real time.
Amazon Relational Database Service
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to
set up, operate, and scale a relational database in the cloud.
• It provides cost-efficient and resizable capacity while managing time-consuming
database administration tasks, freeing you up to focus on your applications and
business.
Amazon Elastic MapReduce
Amazon Elastic MapReduce is a web service that enables businesses, researchers, data
analysts, and developers to easily and cost-effectively process vast amounts of data.
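The MapReduce programming model behind the service can be sketched with a word count. This runs in a single process and does not use the Elastic MapReduce or Hadoop APIs; the function names are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

In a real cluster the map calls run on many machines in parallel, each on its own slice of the data, and the framework performs the shuffle across the network.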
Operating Systems Supported
Software Provided
On-Demand Instances Pricing – EC2
Pricing – S3
Software as a Service
Google Family
Google
Cloud Computing
1. Description: “Cloud Computing” is a suite of products including Gmail, Google
Calendar, Google Docs, Google Sites, and Postini.
• Gmail
• Google Calendar
• Google Docs: web-based word processing, spreadsheets, and presentations.
• Google Sites: can be used to build websites and an intranet for multiple users.
• Postini: provides e-mail security and archiving.
2. Embedded within the suite of products is a file management system which allows file
sharing on the Internet.
3. Cost: $50 per user per year
PaaS
Platform as a Service
Google Family
Scenario for
Google Platform as a Service
PaaS
Platform as a Service
Microsoft Family
Windows Live
Office Live
IaaS
Intelligence as a Service
Collaboration in the Cloud
Type of Collaboration
Categories
• Social Calendars
• Social Networking Sites
• Social Bookmarking
• Social Desktops
• Social Wikis
• Social Documents
http://web2.econsultant.com/collaboration-groups-teams-services.html - 125 sites...
Social Calendars
30 Boxes
• www.30boxes.com
Google Calendar
• www.google.com/calendar
Uses
• Keep milestones, due dates, etc. where everyone can access them
• Give others a look at your schedule so they can schedule meetings, etc. with your
time constraints in mind (without a million emails)
Social Networking Sites
Facebook
• www.facebook.com
Ning
• www.ning.com
Uses
• (Facebook) use as platform for custom apps
• (Ning) customize extensively
• Use message boards, private and group messaging functions to keep in touch
• Store documents in central location
Social Bookmarking
Del.icio.us
• http://del.icio.us
Google Bookmarks
• www.google.com/bookmarks
Uses
• Store references, citations and other data needed by the group
• Identify resources for group use
• Easy access to group documents, pics/charts and sites
Social Desktops
CentralDesktop
• www.centraldesktop.com/
MyWebDesktop
• www.mywebdesktop.net/
Uses
• Single desktop area for storage
• Unified working environment for all collaborators
• Most apps included (calendar, messaging, office software)
Social Wikis
Wetpaint
• www.wetpaint.com
PBWiki
• www.pbwiki.com
Uses
• Central, structured repository for documents
• Easy editing interface
• Built-in revision, rollback features
• Instant updates and notes access
Social Documents
Google Docs
• docs.google.com/
ThinkFree
• www.thinkfree.com
Uses
• Multi-user editing features
• Document storage
• Document formats (doc, xls, ppt, pdf, odt, etc.)
GroupWare
http://grou.ps
• Web site
• Desktop client
• Mobile interface
• Facebook App
Drupal/Joomla (CMS - http://opensourcecms.com/ )
• Create your own groupware
• Host, manage, control your data
Cloud for the Library/Information Services?