Cloud Computing Research at TU Delft (2008-ongoing)
Parallel and Distributed Systems Group
Delft University of Technology, The Netherlands
3TU.

Our team:
Undergrad: Gargi Prasad, Arnoud Bakker, Nassos Antoniou, Thomas de Ruiter, …
Grad: Siqi Shen, Nezih Yigitbasi, Ozan Sonmez
Staff: Henk Sips, Dick Epema, Alexandru Iosup
Collaborators: Ion Stoica and the Mesos team (UC Berkeley), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), Derrick Kondo, Emmanuel Jeannot (INRIA), ...

EIT ICT Labs Workshop at TU Delft, May 2011 – Cloud Computing

TUD Team: 2 Staff, 2+3 PhD, n MSc, ...
Our team:
Undergrad: Adrian Lascateu, Alexandru Dimitriu (UPB, Romania), …
Grad: Vlad Nae (U. Innsbruck, Austria), Siqi Shen, Nezih Yigitbasi (TU Delft, the Netherlands), …
Staff: Alexandru Iosup, Dick Epema, Henk Sips (TU Delft), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), etc.

What is Cloud Computing?
[Image: http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/]
• “The path to abundance”
• On-demand capacity
• Pay what you use
• Great for web apps (EIP, web crawl, DB ops, I/O)
VS
[Image: Tropical Cyclone Nargis (NASA, ISSS, 04/29/08)]
• “The killer cyclone”
• Not so great performance for sci. applications (1)
• Long-term perf. variability (2)
• How to manage?
1- Iosup et al., Performance Analysis of Cloud Computing Services for Many Tasks Scientific Computing, IEEE TPDS, 2011.
2- Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011.

What do We Want from Clouds?
Good IaaS, PaaS, SaaS
• Portability (Virtualisation, no vendor lock-in)
• Accountability (lease what you use)
• … for eScience
• … for Massively Social Gaming
Good resource management
• Elasticity
• Reliability
• Efficiency (Scheduling)
• Data-aware mechanisms
• Being “green”?
Performance evaluation (What is “Good”?)

Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message

Cloud Performance Studies
• Many-Tasks Scientific Computing
  • Quantitative definition: J jobs and B bags-of-tasks
  • Extracted proto-MT users from grid and parallel production environments
• Performance Evaluation of Four Commercial Clouds
  • Amazon EC2, GoGrid, Elastic Hosts, Mosso
  • Resource acquisition, Single- and Multi-Instance benchmarking
  • Low compute and networking performance
• Clouds vs Other Environments
  • Order of magnitude better performance needed for clouds
  • Clouds already good for short-term, deadline-driven scientific computing
1- Iosup et al., Performance Analysis of Cloud Computing Services for Many Tasks Scientific Computing, IEEE TPDS, 2011 (in print) http://www.st.ewi.tudelft.nl/~iosup/cloud-perf10tpds_in-print.pdf
2- Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011, pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf

Performance Evaluation of Clouds [1/3]
Tools: C-Meter
Yigitbasi et al.: C-Meter: A Framework for Performance Analysis of Computing Clouds. Proc. of CCGRID 2009

Performance Evaluation of Clouds [2/3]: Low Performance for Sci.Comp.
• Evaluated the performance of resources from four production, commercial clouds.
  • GrenchMark for evaluating the performance of cloud resources
  • C-Meter for complex workloads
• Four production, commercial IaaS clouds: Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid.
• Finding: cloud performance is low for sci.comp.
S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Cloudcomp 2009, LNICST 34, pp. 115–131, 2010.

Performance Evaluation of Clouds [3/3]: Cloud Performance Variability
• Long-term performance variability of production cloud services
  • IaaS: Amazon Web Services
  • PaaS: Google App Engine
• Year-long performance information for nine services
• Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application.
[Figure: Amazon S3: GET US HI operations]
A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services, CCGrid 2011.

Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message

Traces: Sine Qua Non in Comp.Sys.Res.
• “My system/method/algorithm is better than yours (on my carefully crafted workload)”
  • Unrealistic (trivial): Prove that “prioritize jobs from users whose name starts with A” is a good scheduling policy
  • Realistic? “85% jobs are short”; “10% Writes”; ...
• Major problem in Computer Systems research
• Workload Trace = recording of real activity from a (real) system, often as a sequence of jobs / requests submitted by users for execution
  • Main use: compare and cross-validate new job and resource management techniques and algorithms
  • Major problem: real workload traces from several sources

The Cloud Workloads Archive (CWA): What’s in a Name?
CWA = Public collection of cloud/data center workload traces and of tools to process these traces; allows us to:
1. Compare and cross-validate new job and resource management techniques and algorithms, across various workload traces
2. Determine which (part of a) trace is most interesting for a specific job and resource management technique or algorithm
3. Design a general model for data center workloads, and validate it with various real workload traces
4. Evaluate the generality of a particular workload trace, to determine if results are biased towards a particular trace
5. Analyze the evolution of workload characteristics across long timescales, both intra- and inter-trace

One Format Fits Them All
• Flat format
  [Diagram labels: CWJ, CWJD, CWT, CWTD]
• Job and Tasks
• Summary (20 unique data fields) and Detail (60 fields)
• Categories of information
  • Shared with GWA, PWA: Time, Disk, Memory, Net
  • Jobs/Tasks that change resource consumption profile
  • MapReduce-specific (two-thirds data fields)
A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I.
Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10

CWA Contents: Large-Scale Workloads
Trace ID   System       Size (J/T/Obs)   Period        Notes
CWA-01     Facebook     1.1M/-/-         5m/2009       Time & IO
CWA-02     Yahoo M      28K/28M/-        20d/2009      ~Full detail
CWA-03     Facebook 2   61K/10M/-        10d/2009      Full detail
CWA-04     Facebook 3   ?/?/-            10d/01-2010   Full detail
CWA-05     Facebook 4   ?/?/-            3m/02+2010    Full detail
CWA-06     Google 2                                    25 Aug 2010
CWA-07     eBay                                        23 Sep 2010
CWA-08     Twitter                                     Need help!
CWA-09?    Google       9K/177K/4M       7h/2009       Coarse, Period
• Tools
  • Convert to CWA format
  • Analyze and model automatically
  [Figure label: Report]

The Cloud Workloads Archive
• Looking for invariants
  • Wr [%] ~40% Total IO, but absolute values vary
Trace ID   Total IO [MB]   Rd. [MB]   Wr [%]   HDFS Wr [MB]
CWA-01     10,934          6,805      38%      1,538
CWA-02     75,546          47,539     37%      8,563
  • # Tasks/Job, ratio M:(M+R) Tasks, vary
• Understanding workload evolution

Agenda
1. Introduction
2. Cloud Performance Studies
3. The Cloud Workloads Archive
4. Massivizing Online Social Games using Clouds
   1. Platform Challenge
   2. Content Challenge
   3. Analytics Challenge
5. Other Cloud Activities at TUD
6. Take-Home Message

What’s in a name? MSG, MMOG, MMO, …
250,000,000 active players
3BN hours/week world-wide
Massively Social Gaming = (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience
[Image: Romeo and Juliet]
1. Virtual world: Explore, do, learn, socialize, compete
+
2. Content: Graphics, maps, puzzles, quests, culture
+
3. Game data: Player stats and relationships

FarmVille, a Massively Social Game
Sources: CNN, Zynga.
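The write-share invariant on the Cloud Workloads Archive slide above (Wr [%] of roughly 40% of Total IO) can be checked by hand. The sketch below is a minimal illustration, assuming that Wr [%] is the share of Total IO that is not reads; that formula is an assumption, not stated on the slide, though it does reproduce the two percentages shown in the table. The dictionary layout and the function name `write_share` are hypothetical.

```python
# Values copied from the CWA invariants table (in MB).
traces = {
    "CWA-01": {"total_io_mb": 10_934, "read_mb": 6_805},
    "CWA-02": {"total_io_mb": 75_546, "read_mb": 47_539},
}


def write_share(total_io_mb, read_mb):
    """Share of total I/O that is not reads (ASSUMED to equal the slide's Wr [%])."""
    return (total_io_mb - read_mb) / total_io_mb


# Compute the write share per trace and print it as a rounded percentage.
shares = {trace_id: write_share(**io) for trace_id, io in traces.items()}
for trace_id, share in shares.items():
    print(f"{trace_id}: write share = {share:.0%}")
```

Under this assumption both traces land near the same share even though their absolute I/O volumes differ by about a factor of seven, which is the sense in which the slide calls it an invariant.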
Source: InsideSocialGames.com

MSGs are a Popular, Growing Market
• 25,000,000 subscribed players (from 250,000,000+ active)
• Over 10,000 MSGs in operation
• Subscription market size $7.5B+/year, Zynga $600M+/year
Sources: MMOGChart, own research.
Sources: ESA, MPAA, RIAA.

Massivizing Games using Clouds
(Platform Challenge) Build MSG platform that uses (mostly) cloud resources
• Close to players
• No upfront costs, no maintenance
• Compute platforms: multi-cores, GPUs, clusters, all-in-one!
Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011.
(Content Challenge) Produce and distribute content for 1BN people
• Game Analytics: Game statistics
• Auto-generated game content
Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)
(Analytics Challenge) Build cloud-based layer to improve gaming experience
• Game Analytics: Ranking / Rating
• Game Analytics: Matchmaking / Recommendations
Iosup, Lascateu, Tapus. CAMEO: social networks for MMOGs through continuous analytics and cloud computing, ACM NetGames 2010.

Cloudifying: PaaS for MSGs
(Platform Challenge) Build MSG platform that uses (mostly) cloud resources
• Close to players
• No upfront costs, no maintenance
• Compute platforms: multi-cores, GPUs, clusters, all-in-one!
• Performance guarantees
• Code for various compute platforms: platform profiling
• Misprediction=$$$
• What services?
• Vendor lock-in?
• My data
Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011.

Proposed hosting model: dynamic
• Using data centers for dynamic resource allocation
[Figure labels: Massive leave join, Massive join]
• Main advantages: 1.
Significantly lower over-provisioning
2. Efficient coverage of the world is possible
[Source: Nae, Iosup, and Prodan, ACM SC 2008]

Static vs. Dynamic Allocation
Q: What is the penalty for static vs. dynamic allocation?
[Figure values: 250%, 25%]
[Source: Nae, Iosup, and Prodan, ACM SC 2008]

Cloudifying: Content, Content, Content
(Content Challenge) Produce and distribute content for 1BN people
• Game Analytics: Game statistics
• Crowdsourcing
• Storification
• Auto-generated game content
• Adaptive game content
• Content distribution / Streaming content
A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)

(Procedural) Game Content (Generation)
• Derived Content: NewsGen, Storification
• Game Design: Rules, Mechanics, …
• Game Scenarios: Puzzle, Quest/Story, …
• Game Systems: Eco, Road Nets, Urban Envs, …
• Game Space: Height Maps, Bodies of Water, Placement Maps, …
• Game Bits: Texture, Sound, Vegetation, Buildings, Behavior, Fire/Water/Stone/Clouds
Hendricks, Meijer, vd Velden, Iosup, Procedural Game Content Generation: A Survey, Working Paper, 2010

The New Content Generation Process*
Only the puzzle concept, and the instance generation and solving algorithms, are produced at development time
* A.
Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)

Puzzle-Specific Considerations: Generating Player-Customized Content
Puzzle difficulty
• Solution size
• Solution alternatives
• Variation of moves
• Skill moves
Player ability
• Keep population statistics and generate enough content for most likely cases
• Match player ability with puzzle difficulty
• Take into account puzzle freshness

Cloudifying: Social Everything!
• Social Network = undirected graph, relationship = edge
• Community = sub-graph, density of edges between its nodes higher than density of edges outside sub-graph
(Analytics Challenge) Build cloud-based layer to
Improve gaming experience
• Ranking / Rating
• Matchmaking / Recommendations
• Play Style / Tutoring
Organize Gaming Communities
• Player Behavior
A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops.

Continuous Analytics for MMOGs
MMOG Data = raw and derivative information from the virtual world (millions of users)
Continuous Analytics for MMOGs = Analysis of MMOG data s.t. important events are not lost
• Data collection
• Data storage
• Data analysis
• Data presentation
• … at MMOG rate and scale

Continuous Analysis for MMOGs: Main Uses By and For Gamers
1. Support player communities
2. Understand play patterns (decide future investments)
3. Prevent and detect cheating or disastrous game exploits (think MMOG economy reset)
4. Broadcasting of gaming events
5. Data for advertisement companies (new revenue stream for MMOGs)

The CAMEO Framework*
1. Address community needs
  • Can analyze skill level, experience points, rank
  • Can assess community size dynamically
2.
Using on-demand technology: Cloud Comp.
  • Dynamic cloud resource allocation, Elastic IP
3. Data management and storage: Cloud Comp.
  • Crawl + Store data in the cloud (best performance)
4. Performance, scalability, robustness: Cloud Comp.
* A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops, LNCS 6043, (2010)

CAMEO: Cloud Resource Management
[Figure: Used Amazon EC2 Instances (axis up to 2,500) vs. Date (3/6/2009 to 3/27/2009); series: Dynamic Analytics, Steady Analytics; labels: Unexpected, Periodic Burst]
• Snapshot = dataset for a set of players
• More machines = more snapshots per time unit

CAMEO: Exploiting Cloud Features
• Machines close(r) to server
  • Traffic dominated by small packets (latency)
• Elastic IP to avoid traffic bans (legalese: acting on behalf of real people)
A. Iosup, A. Lascateu, N. Tapus, CAMEO: Enabling Social Networks for Massively Multiplayer Online Games through Continuous Analytics and Cloud Computing, ACM NetGames 2010.

Sample Game Analytics Results: Skill Level Distribution in RuneScape
• RuneScape: 135M+ open accounts (world record)
• Dataset: 3M players (largest measurement, to date)
  • 1,817,211 over level 100
  • Max skill 2,280
• Number of mid- and high-level players is significant
[Figure labels: Mid Level, High Level, New Content Generation Challenge]

Cost of Continuous RuneScape Analytics
• Put a price on MMOG analytics (here, $425/month, or less than $0.00015/user/month)
• Trade-off accuracy vs.
cost, runtime is constant

Cloud Scheduling
A Provisioning-and-Allocation problem
Many other possibilities
[Figure labels: Manage, Provision, Allocate, Queue, Before experiment, Queue, Application, Job, During experiment, When needed]
We have just started working on this problem

Take Home Message: TUD Research in Clouds
• Understanding how real clouds work (focus on data-intensive)
  • Modeling cloud infrastructure (performance, availability) and workloads
  • Compare clouds with other platforms (grids, parallel production env., p2p, …)
• The Cloud Workloads Archive: easy to share cloud workload traces and research associated with them
  • Complement the Grid Workloads Archive
• Scheduling: making clouds work
  • eScience and gaming applications (cloud application architectures)
  • MapReduce
• Massive Gaming: services on clouds
  • CAMEO: Massive Game Analytics
  • Toolkit for Online Social Network analysis
  • POGGI: game content generation at scale
Publications
2008: ACM SC
2009: ROIA, CCGrid, NetGames, EuroPar (Best Paper Award)
2010: IEEE TPDS, Elsevier CCPE, …
2011: ICPE, CCGrid, Book Chapter CAMEO+Clouds, IEEE TPDS, IJAMC, …
Graduation (Forecast)
2011-2014: 2+3 PhD, 10+ MSc, n BSc

Thank you for your attention! Questions? Suggestions? Observations?
More Info:
- http://www.st.ewi.tudelft.nl/~iosup/research.html
- http://www.st.ewi.tudelft.nl/~iosup/research_gaming.html
- http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html
Alexandru Iosup
Do not hesitate to contact me…
[email protected]
http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”)
Parallel and Distributed Systems Group
Delft University of Technology
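As a back-of-the-envelope check of the figure on the "Cost of Continuous RuneScape Analytics" slide, the sketch below divides the quoted monthly cost by the number of players in the RuneScape dataset. It assumes that the per-user figure is computed over the 3M-player dataset from the sample results slide; the slides do not state the divisor explicitly, and the variable names are hypothetical.

```python
# Figures quoted on the slides.
monthly_cost_usd = 425          # cost of continuous analytics per month
bound_per_user_usd = 0.00015    # upper bound quoted per user per month

# ASSUMPTION: the divisor is the 3M-player dataset from the RuneScape results slide.
players_in_dataset = 3_000_000

cost_per_user_usd = monthly_cost_usd / players_in_dataset
print(f"${cost_per_user_usd:.6f} per user per month")
print("within quoted bound:", cost_per_user_usd < bound_per_user_usd)
```

Under that assumption the result stays below the quoted bound, which is consistent with the slide's "less than $0.00015/user/month".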