Cloudifying Games: Rain for the Thirsty

Cloud Computing Research at TU Delft (2008–ongoing)
Parallel and Distributed Systems Group
Delft University of Technology
The Netherlands
Our team:
• Undergrad: Gargi Prasad, Arnoud Bakker, Nassos Antoniou, Thomas de Ruiter, …
• Grad: Siqi Shen, Nezih Yigitbasi, Ozan Sonmez
• Staff: Henk Sips, Dick Epema, Alexandru Iosup
• Collaborators: Ion Stoica and the Mesos team (UC Berkeley), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), Derrick Kondo, Emmanuel Jeannot (INRIA), ...
EIT ICT Labs Workshop at TU Delft, May 2011 – Cloud Computing
TUD Team: 2 Staff, 2+3PhD, n MSc, ...
Our team:
• Undergrad: Adrian Lascateu, Alexandru Dimitriu (UPB, Romania), …
• Grad: Vlad Nae (U. Innsbruck, Austria), Siqi Shen, Nezih Yigitbasi (TU Delft, the Netherlands), …
• Staff: Alexandru Iosup, Dick Epema, Henk Sips (TU Delft), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), etc.
What is Cloud Computing?
VS
http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/
• “The path to abundance”
• On-demand capacity
• Pay what you use
• Great for web apps (EIP, web crawl, DB ops, I/O)
Tropical Cyclone Nargis (NASA, ISSS, 04/29/08)
• “The killer cyclone”
• Not-so-great performance for scientific applications [1]
• Long-term performance variability [2]
• How to manage?
[1] Iosup et al., Performance Analysis of Cloud Computing Services for Many Tasks Scientific Computing, IEEE TPDS, 2011.
[2] Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011.
What do We Want from Clouds?
Good IaaS, PaaS, SaaS
• Portability (virtualisation, no vendor lock-in)
• Accountability (lease what you use)
• … for eScience
• … for Massively Social Gaming
Good resource management
• Elasticity
• Reliability
• Efficiency (scheduling)
• Data-aware mechanisms
• Being “green”?
Performance evaluation (What is “Good”?)
Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message
Cloud Performance Studies
• Many-Tasks Scientific Computing
• Quantitative definition: J jobs and B bags-of-tasks
• Extracted proto-MT users from grid and parallel
production environments
• Performance Evaluation of
Four Commercial Clouds
• Amazon EC2, GoGrid, Elastic Hosts, Mosso
• Resource acquisition, Single- and Multi-Instance
benchmarking
• Low compute and networking performance
• Clouds vs Other Environments
• Order of magnitude better performance needed for clouds
• Clouds already good for short-term, deadline-driven
scientific computing
[1] Iosup et al., Performance Analysis of Cloud Computing Services for Many Tasks Scientific Computing, IEEE TPDS, 2011 (in print). http://www.st.ewi.tudelft.nl/~iosup/cloud-perf10tpds_in-print.pdf
[2] Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011. pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf
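The quantitative definition above (a user counts as "many-task" once they submit at least J jobs grouped into at least B bags-of-tasks) can be sketched as follows; the thresholds and the input shape are illustrative, not the paper's exact extraction procedure:

```python
from collections import defaultdict

def find_mt_users(jobs, min_jobs=1000, min_bags=100):
    """Classify users as 'many-task' if they submitted at least
    min_jobs jobs grouped into at least min_bags bags-of-tasks.
    jobs: iterable of (user, bag_id) pairs. Thresholds are
    illustrative, not the values used in the TPDS paper."""
    job_count = defaultdict(int)
    bags = defaultdict(set)
    for user, bag_id in jobs:
        job_count[user] += 1
        bags[user].add(bag_id)
    return {u for u in job_count
            if job_count[u] >= min_jobs and len(bags[u]) >= min_bags}
```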
Performance Evaluation of Clouds [1/3]
Tools: C-Meter
Yigitbasi et al.: C-Meter: A Framework for
Performance Analysis of Computing Clouds.
Proc. of CCGRID 2009
Performance Evaluation of Clouds [2/3]
Low Performance for Sci.Comp.
• Evaluated the performance of resources from four
production, commercial clouds.
• GrenchMark for evaluating the performance of cloud resources
• C-Meter for complex workloads
• Four production, commercial IaaS clouds: Amazon Elastic
Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid.
• Finding: cloud performance low for sci.comp.
S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T.
Fahringer, and D. Epema, A Performance Analysis of EC2 Cloud
Computing Services for Scientific Computing, Cloudcomp 2009,
LNICST 34, pp. 115–131, 2010.
Performance Evaluation of Clouds [3/3]
Cloud Performance Variability
• Long-term performance variability of production cloud services
• IaaS: Amazon Web Services; PaaS: Google App Engine
• Year-long performance information for nine services
• Finding: about half of the cloud services investigated exhibit yearly and daily patterns; the impact of performance variability depends on the application.
[Figure: Amazon S3, GET US HI operations]
A. Iosup, N. Yigitbasi, and D. Epema, On the Performance
Variability of Production Cloud Services, CCGrid 2011.
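One simple way to screen a service for the daily patterns mentioned above is to bucket hourly performance samples by hour-of-day and compare the spread of the bucket means to the overall mean. This is a toy indicator, not the statistical analysis used in the CCGrid 2011 paper:

```python
def daily_pattern_score(samples, period=24):
    """Crude indicator of a daily pattern in hourly performance
    samples: spread of per-hour-of-day means, relative to the
    overall mean. A score near 0 suggests no daily pattern."""
    buckets = [[] for _ in range(period)]
    for i, v in enumerate(samples):
        buckets[i % period].append(v)
    means = [sum(b) / len(b) for b in buckets if b]
    overall = sum(samples) / len(samples)
    return (max(means) - min(means)) / overall
```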
Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message
Traces: Sine Qua Non in Comp.Sys.Res.
• “My system/method/algorithm is better than yours
(on my carefully crafted workload)”
• Unrealistic (trivial): Prove that “prioritize jobs from
users whose name starts with A” is a good scheduling policy
• Realistic? “85% jobs are short”; “10% Writes”; ...
• Major problem in Computer Systems research
• Workload Trace = recording of real activity from a (real)
system, often as a sequence of jobs / requests submitted
by users for execution
• Main use: compare and cross-validate new job and resource
management techniques and algorithms
• Major problem: obtaining real workload traces from several sources
August 26, 2010
The Cloud Workloads Archive (CWA)
What’s in a Name?
CWA = Public collection of cloud/data center workload traces
and of tools to process these traces; allows us to:
1. Compare and cross-validate new job and resource management
techniques and algorithms, across various workload traces
2. Determine which (part of a) trace is most interesting for a specific
job and resource management technique or algorithm
3. Design a general model for data center workloads, and validate it
with various real workload traces
4. Evaluate the generality of a particular workload trace, to
determine if results are biased towards a particular trace
5. Analyze the evolution of workload characteristics across long
timescales, both intra- and inter-trace
One Format Fits Them All
• Flat format, four file types: CWJ, CWJD, CWT, CWTD
• Jobs and Tasks
• Summary (20 unique data fields) and Detail (60 fields)
• Categories of information:
  • Shared with GWA, PWA: Time, Disk, Memory, Net
  • Jobs/Tasks that change resource consumption profile
  • MapReduce-specific (two-thirds of data fields)
A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I.
Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10
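A flat, line-oriented format like the one above is straightforward to parse. The sketch below uses a hypothetical five-field subset; the real CWA summary format defines 20 fields, per the data-format document cited above:

```python
import csv
import io

# Hypothetical column subset for illustration; the actual CWA
# summary format defines 20 data fields.
FIELDS = ["job_id", "submit_time", "run_time", "user", "tasks"]

def parse_cwa_summary(text):
    """Parse a whitespace-separated, one-record-per-line trace
    into dicts, converting numeric fields. Lines starting with
    '#' are treated as comments."""
    out = []
    reader = csv.reader(io.StringIO(text), delimiter=" ",
                        skipinitialspace=True)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        for k in ("submit_time", "run_time", "tasks"):
            rec[k] = int(rec[k])
        out.append(rec)
    return out
```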
CWA Contents: Large-Scale Workloads
Trace ID | System     | Size J/T/Obs  | Period      | Notes
CWA-01   | Facebook   | 1.1M/-/-      | 5m/2009     | Time & IO
CWA-02   | Yahoo M    | 28K/28M/-     | 20d/2009    | ~Full detail
CWA-03   | Facebook 2 | 61K/10M/-     | 10d/2009    | Full detail
CWA-04   | Facebook 3 | ?/?/-         | 10d/01-2010 | Full detail
CWA-05   | Facebook 4 | ?/?/-         | 3m/02+2010  | Full detail
CWA-06   | Google 2   | (25 Aug 2010) |             |
CWA-07   | eBay       | (23 Sep 2010) |             |
CWA-08   | Twitter    | (Need help!)  |             |
CWA-09?  | Google     | 9K/177K/4M    | 7h/2009     | Coarse, Period
• Tools
  • Convert to CWA format
  • Analyze and model automatically → report
The Cloud Workloads Archive
• Looking for invariants
• Wr [%] ~40% of Total IO, but absolute values vary

Trace ID | Total IO [MB] | Rd. [MB] | Wr [%] | HDFS Wr [MB]
CWA-01   | 10,934        | 6,805    | 38%    | 1,538
CWA-02   | 75,546        | 47,539   | 37%    | 8,563

• # Tasks/Job and the ratio M:(M+R) of tasks vary
• Understanding workload evolution
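The ~40% write invariant can be checked directly from the per-trace totals: the write fraction is (Total IO - Rd.) / Total IO. A minimal check, using the two rows reported above:

```python
def write_fraction(total_io_mb, read_mb):
    """Fraction of a trace's total IO volume that is writes."""
    return (total_io_mb - read_mb) / total_io_mb

# Values in MB, from the CWA IO table above
cwa01 = write_fraction(10_934, 6_805)   # ~0.38
cwa02 = write_fraction(75_546, 47_539)  # ~0.37
```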
Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message
What’s in a name? MSG, MMOG, MMO, …
250,000,000 active players
3BN hours/week world-wide
Massively Social Gaming =
(online) games with massive
numbers of players (100K+),
for which social interaction
helps the gaming experience
Romeo and Juliet
1. Virtual world: explore, do, learn, socialize, compete
2. Content: graphics, maps, puzzles, quests, culture
3. Game data: player stats and relationships
FarmVille, a Massively Social Game
Sources: CNN, Zynga.
Source: InsideSocialGames.com
MSGs are a Popular, Growing Market
• 25,000,000 subscribed players (from 250,000,000+ active)
• Over 10,000 MSGs in operation
• Subscription market size $7.5B+/year, Zynga $600M+/year
Sources: MMOGChart, own research.
Sources: ESA, MPAA, RIAA.
Massivizing Games using Clouds
(Platform Challenge)
Build MSG platform that uses (mostly) cloud resources
• Close to players
• No upfront costs, no maintenance
• Compute platforms: multi-cores, GPUs, clusters, all-in-one!
Nae, Iosup, Prodan, Dynamic Resource Provisioning in
Massively Multiplayer Online Games, IEEE TPDS, 2011.
(Content Challenge)
Produce and distribute content for 1BN people
• Game Analytics → Game statistics
• Auto-generated game content
Iosup, POGGI: Puzzle-Based Online Games on Grid
Infrastructures, EuroPar 2009 (Best Paper Award)
(Analytics Challenge)
Build cloud-based layer to improve gaming experience
• Game Analytics → Ranking / Rating
• Game Analytics → Matchmaking / Recommendations
Iosup, Lascateu, Tapus, CAMEO: social networks for MMOGs through continuous analytics and cloud computing, ACM NetGames 2010.
Cloudifying: PaaS for MSGs
(Platform Challenge)
Build MSG platform that uses (mostly) cloud resources
• Close to players
• No upfront costs, no maintenance
• Compute platforms: multi-cores, GPUs, clusters, all-in-one!
• Performance guarantees
• Code for various compute platforms; platform profiling
• Misprediction = $$$
• What services?
• Vendor lock-in?
• My data
Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011.
Proposed hosting model: dynamic
• Using data centers for dynamic resource allocation
[Figure: load over time, with a massive join and a massive leave event]
• Main advantages:
1. Significantly lower over-provisioning
2. Efficient coverage of the world is possible
[Source: Nae, Iosup, and Prodan, ACM SC 2008]
Static vs. Dynamic Allocation
Q: What is the penalty for static vs. dynamic allocation?
[Figure: over-provisioning of ~250% for static vs. ~25% for dynamic allocation]
[Source: Nae, Iosup, and Prodan, ACM SC 2008]
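The penalty in the figure can be read as average over-provisioning: how much allocated capacity exceeds actual demand, averaged over time. A minimal sketch with made-up demand numbers (not the paper's workload):

```python
def overprovisioning(allocated, demand):
    """Mean relative over-allocation across time steps:
    average of (allocated - demand) / demand."""
    pens = [(a - d) / d for a, d in zip(allocated, demand)]
    return sum(pens) / len(pens)

demand = [10, 20, 40, 20]              # hypothetical player load
static = [max(demand)] * len(demand)   # provision for the peak, always
dynamic = [d * 1.25 for d in demand]   # track load with 25% headroom
```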
Cloudifying:
Content, Content, Content
(Content Challenge)
Produce and distribute content for 1BN people
• Game Analytics → Game statistics
• Crowdsourcing
• Storification
• Auto-generated game content
• Adaptive game content
• Content distribution / streaming content
A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)
(Procedural) Game Content (Generation)
• Derived Content: NewsGen, Storification
Hendricks, Meijer, vd Velden, Iosup,
Procedural Game Content Generation:
A Survey, Working Paper, 2010
• Game Design: Rules, Mechanics, …
• Game Scenarios: Puzzle, Quest/Story, …
• Game Systems: Eco, Road Nets, Urban Envs, …
• Game Space: Height Maps, Bodies of Water, Placement Maps, …
• Game Bits: Texture, Sound, Vegetation, Buildings, Behavior, Fire/Water/Stone/Clouds
The New Content Generation Process*
Only the puzzle concept, and the instance generation and
solving algorithms, are produced at development time
* A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)
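The generate-and-test loop implied above (only the puzzle concept plus the generation and solving algorithms exist at development time; instances are produced later) can be sketched with a toy puzzle, where "difficulty" is simply the measured solution size. Both the puzzle and the solver here are stand-ins, not POGGI's actual algorithms:

```python
import random

def solve(instance):
    """Toy 'solver': solution size = elements out of place.
    A stand-in for a real puzzle-specific solving algorithm."""
    return sum(1 for i, v in enumerate(instance) if v != i)

def generate_for_difficulty(n, lo, hi, rng, max_tries=1000):
    """Generate-and-test: keep only instances whose measured
    solution size falls in the target difficulty band [lo, hi]."""
    for _ in range(max_tries):
        inst = list(range(n))
        rng.shuffle(inst)
        if lo <= solve(inst) <= hi:
            return inst
    return None
```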
Puzzle-Specific Considerations
Generating Player-Customized Content
Puzzle difficulty
• Solution size
• Solution alternatives
• Variation of moves
• Skill moves
Player ability
• Keep population statistics and generate enough content for most likely cases
• Match player ability with puzzle difficulty
• Take into account puzzle freshness
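Matching player ability with puzzle difficulty while accounting for freshness (never re-serve a puzzle the player has seen) can be sketched as follows; the pool layout and the tolerance parameter are hypothetical:

```python
def pick_puzzle(player_skill, pool, seen, tolerance=2):
    """Pick an unseen puzzle whose difficulty is closest to the
    player's skill, within a tolerance band; None if nothing fits.
    pool: {puzzle_id: difficulty}; seen: set of served puzzle ids."""
    candidates = [(abs(d - player_skill), pid)
                  for pid, d in pool.items()
                  if pid not in seen and abs(d - player_skill) <= tolerance]
    return min(candidates)[1] if candidates else None
```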
Cloudifying: Social Everything!
• Social network = undirected graph; relationship = edge
• Community = sub-graph whose internal edge density is higher than the density of edges leaving the sub-graph
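The density definition above can be checked mechanically: count the edges inside a candidate community versus the edges crossing its boundary. A minimal sketch:

```python
def community_density(edges, community):
    """For an undirected graph given as (u, v) edge pairs, return
    (internal_edges, boundary_edges) for the candidate community."""
    internal = external = 0
    comm = set(community)
    for u, v in edges:
        if u in comm and v in comm:
            internal += 1
        elif u in comm or v in comm:
            external += 1
    return internal, external
```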
(Analytics Challenge)
Build cloud-based layer to improve gaming experience
• Ranking / Rating
• Matchmaking / Recommendations
• Play Style/Tutoring
Organize Gaming Communities
• Player Behavior
A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops.
Continuous Analytics for MMOGs
MMOG Data =
raw and derivative information
from the virtual world (millions
of users)
Continuous Analytics for MMOGs =
Analysis of MMOG data s.t.
important events are not lost
• Data collection
• Data storage
• Data analysis
• Data presentation
• … at MMOG rate and scale
Continuous Analysis for MMOGs
Main Uses By and For Gamers
1. Support player communities
2. Understand play patterns
(decide future investments)
3. Prevent and detect cheating or
disastrous game exploits
(think MMOG economy reset)
4. Broadcasting of gaming events
5. Data for advertisement companies
(new revenue stream for MMOGs)
The CAMEO Framework*
1. Address community needs
   • Can analyze skill level, experience points, rank
   • Can assess community size dynamically
2. Using on-demand technology: Cloud Comp.
   • Dynamic cloud resource allocation, Elastic IP
3. Data management and storage: Cloud Comp.
   • Crawl + store data in the cloud (best performance)
4. Performance, scalability, robustness: Cloud Comp.
* A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops, LNCS 6043, 2010.
CAMEO: Cloud Resource Management
[Figure: Amazon EC2 instances used per day, 3/6/2009-3/27/2009, up to ~2,500; curves for dynamic and steady analytics; annotations mark an unexpected periodic burst.]
• Snapshot = dataset for a set of players
• More machines = more snapshots per time unit
CAMEO: Exploiting Cloud Features
• Machines close(r) to server
• Traffic dominated by small packets (latency)
• Elastic IP to avoid traffic bans (legalese: acting on behalf of real people)
A. Iosup, A. Lascateu, N. Tapus, CAMEO: Enabling Social Networks for Massively Multiplayer Online Games through Continuous Analytics and Cloud Computing, ACM NetGames 2010.
Sample Game Analytics Results
Skill Level Distribution in RuneScape
• RuneScape: 135M+ open accounts (world record)
• Dataset: 3M players (largest measurement, to date)
• 1,817,211 players over level 100
• Max skill 2,280
• The number of mid- and high-level players is significant → a new content generation challenge
[Figure: skill level distribution, with mid-level and high-level regions marked]
Cost of Continuous RuneScape Analytics
• Put a price on MMOG analytics (here, $425/month, or less than $0.00015/user/month)
• Trade-off: accuracy vs. cost; runtime is constant
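The slide's arithmetic checks out: dividing the monthly bill by the measured 3M-player dataset gives roughly $0.00014 per user per month, below the $0.00015 bound quoted above:

```python
monthly_cost = 425.0        # USD/month, from the slide
players = 3_000_000         # size of the measured dataset
per_user = monthly_cost / players   # ~0.000142 USD/user/month
```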
Cloud Scheduling
A Provisioning-and-Allocation problem
[Diagram: a management layer provisions resources before the experiment and when needed, and allocates application jobs from queues during the experiment; many other designs are possible.]
• We have only just started working on this problem
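The provisioning-and-allocation split in the diagram can be made concrete with a deliberately naive sketch: one policy decides when to add a VM, another drains queued jobs onto the provisioned VMs. Both policies here are placeholders, since this is exactly the open problem the slide mentions:

```python
from collections import deque

def run_schedule(jobs, capacity_per_vm=1, max_vms=10):
    """Minimal provision-and-allocate loop: provision one VM per
    step while under the cap, then allocate up to the provisioned
    capacity of queued jobs each step. Returns the number of steps
    needed to drain the queue. Policies are illustrative only."""
    queue = deque(jobs)
    vms = 0
    steps = 0
    while queue:
        if vms < max_vms:          # provisioning decision
            vms += 1
        for _ in range(min(vms * capacity_per_vm, len(queue))):
            queue.popleft()        # allocation decision
        steps += 1
    return steps
```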
Take Home Message: TUD Research in Clouds
• Understanding how real clouds work (focus on data-intensive)
• Modeling cloud infrastructure (performance, availability) and workloads
• Compare clouds with other platforms (grids, parallel production env., p2p,…)
• The Cloud Workloads Archive: easy to share
cloud workload traces and research associated with them
• Complement the Grid Workloads Archive
• Scheduling: making clouds work
  • eScience and gaming applications (cloud application architectures)
  • MapReduce
• Massive Gaming: services on clouds
  • CAMEO: Massive Game Analytics
  • Toolkit for Online Social Network analysis
  • POGGI: game content generation at scale
Publications: 2008: ACM SC; 2009: ROIA, CCGrid, NetGames, EuroPar (Best Paper Award); 2010: IEEE TPDS, Elsevier CCPE, …; 2011: ICPE, CCGrid, Book Chapter CAMEO+Clouds, IEEE TPDS, IJAMC, …
Graduation (forecast): 2011-2014: 2+3 PhD, 10+ MSc, n BSc
Thank you for your attention!
Questions? Suggestions? Observations?
More Info:
- http://www.st.ewi.tudelft.nl/~iosup/research.html
- http://www.st.ewi.tudelft.nl/~iosup/research_gaming.html
- http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html
Alexandru Iosup
Do not hesitate to
contact me…
[email protected]
http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”)
Parallel and Distributed Systems Group
Delft University of Technology