
THE WORLD WIDE W5
A GLOBAL ARCHITECTURE FOR DECISION
SUPPORT
Rick McGeer
HP Labs, University of Victoria
November 18, 2010
©2010 HP
MOTIVATION: THE INTERCLOUD
– Internet: set of standards and protocols which permits interconnection of independently administered networks
• Network of networks
– Intercloud: set of standards and protocols which permits interconnection of independently administered clouds
• Term due to Greg Papadopoulos
• Defining infrastructure of the 2010s and beyond
– Question: What will the Intercloud look like? What makes it an intercloud (as opposed to a Cloud)?
OUTLINE
– The problem of big data
– How to build the world’s greatest digital library
– How to build the world’s greatest decision support engine
– The building blocks
• Virtualization: VMs, PlanetLab, Seattle
• Programming Paradigms: MapReduce, Hadoop, Pig
– Next Steps and a Plan
THE WORLD IS DROWNING IN DATA
– Rise of massive numbers of high-capacity sensors
• 5 MP (= 15 MB/frame) sensors now ~$10 (and dropping)
• Each of these capable of generating 1 GB per second
– Massive deployments and applications
• Genomics
• Sloan Digital Sky Survey (200 GB/day)
• ALICE detector at CERN (1 GB/second = 86 TB/day) [only 1 month of the year]
• CASA atmospheric sensing experiment (100 Mb/s/radar on-board, 4 Mb/s/radar [network limitation])
• SmartSantander (20,000 sensors in a city, Kb/sec – 100 Mb/sec, depending…)
• ~1 exabyte in medical images worldwide…
• Many, many more
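The per-sensor rate above is easy to sanity-check with a quick sketch (3 bytes per pixel, uncompressed, is my assumption):

```python
# A 5-megapixel sensor at 3 bytes per pixel (uncompressed RGB):
frame_bytes = 5e6 * 3            # = 15 MB per frame, matching the slide
fps = 1e9 / frame_bytes          # frame rate needed to sustain 1 GB/s
# fps comes out near 67: ordinary video rates already approach 1 GB/s raw
```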
WE NEED TO REDUCE AND COMPUTE
ON ALL THIS STUFF
– Data needs to be searched, reduced, analyzed
– Ex: Jim Gray and the Digital Telescope (SkyServer)
• Make SDSS data available online
• Searchable by astronomers and graduate students
• Resulted in a 100x increase in number of galaxies classified
– Ex: ALICE detector (CERN)
• Produces 1 GB/sec of physics data, sustained, one month/year
• Data distributed to LHC Grid Tier-One sites
• Processed and turned into physics, redistributed to Tier-II sites
– Ex: VHA Health Informatics (20 TB as of 2007) (courtesy Jim Gray)
• Support epidemiological studies
− 7 million enrollees, 5 million patients
− Example milestones:
• 1 billionth vital sign loaded in April ’06
• 30 minutes to population-wide obesity analysis
• Discovered seasonality in blood pressure -- NEJM fall ’06
TWO APPROACHES TO ANALYZING
DATA
1. Schlep all the data to the processing site
a) Use the network
− This is going to hurt….
− US coast-to-coast TCP performance (Linux 2.6, standard initial window): ~3.5 Mb/s
− Best possible: 10 Gb/s (dedicated fiber, TCP Offload Engine, $$$)
− Can be assisted by prefetch (cf. LambdaRAM, 20x performance increase in cyclone analysis)
b) Send the disk
− 100 TB disk, Fed Ex 24-hour delivery (Customs?): ~9 Gb/s effective
− 24-hour latency for first byte, disk performance (~60 Gb/s) thereafter….
− Actually easier just to send the computer (cf. Jim Gray)
2. Schlep the processing to the data
• Makes a lot more sense
• Processing in general reduces data; results are cheaper to send than data
• Processing power is ubiquitous
• Programs are tiny (always under 1 GB)
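The bandwidth arithmetic behind option 1 can be checked directly; a sketch (the measured TCP figure is the slide's):

```python
def effective_mbps(n_bytes, seconds):
    """Effective throughput, in megabits per second, of moving
    n_bytes in the given wall-clock time."""
    return n_bytes * 8 / seconds / 1e6

# Shipping a 100 TB disk with 24-hour delivery:
fedex = effective_mbps(100e12, 24 * 3600)   # ~9,260 Mb/s, i.e. roughly 9 Gb/s
# versus the measured coast-to-coast TCP figure of ~3.5 Mb/s
```

Shipping disks ties the best dedicated fiber, which is why sending the computer (Gray's "sneakernet" point) starts to look attractive.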
QUESTION: HOW CAN WE BUILD A
DISTRIBUTED DATA ANALYSIS SYSTEM?
– Key: Want to run programs at the data
– So….
1. Programs have to run where the data is
2. Programs have to be safe
3. Resources must be allocated in a reasonable manner…
– Return to this after a little inspiration
OUTLINE
– The problem of big data
– How to build the world’s greatest digital library
– How to build the world’s greatest decision support engine
– The building blocks
• Virtualization: VMs, PlanetLab, Seattle
• Programming Paradigms: MapReduce, Hadoop, Pig
– Next Steps and a Plan
BUILDING THE WORLD’S GREATEST
DIGITAL LIBRARY: CHOICE 1
– Build a massive database system
– Buy thousands or millions of servers
– Hire tens of thousands of programmers
– Buy out the world supply of hard disks
– Spend billions of dollars
– Cover Kansas in computers (central location, not much else going on, GpENI gives us a start)
• Contact James Sterbenz for details….
– Never get it done….
BUILDING THE WORLD’S GREATEST
DIGITAL LIBRARY: CHOICE 2
– Invent a simple protocol by which one computer can send a file to
another
– Invent a simple file format which can be easily created on a text editor
– Invent a simple, universal client which can run on anything
– Let nature take its course…
– HTTP
– HTML
– Mosaic (then Navigator, IE, Firefox, Chrome, Safari, Opera…).
CONCLUSION
– The first method might have worked…
– The second one certainly did
• Currently a zettabyte (10²¹ bytes) stored on the world’s Cloud servers…
• How much is that?
• 6.9 billion (6.9 × 10⁹) people on earth….
• About 10¹¹ bytes (100 gigabytes) for every person on earth
• Average book size: 6 MB (6 × 10⁶ bytes)
• 30,000 books for every man, woman, and child on earth….
• Doubling every 18 months…. (data from IDC)
– More to the point….
• Every trivial fact (fact which can be established by lookup) now “known” by every connected individual
• For the first time since Francis Bacon (1561-1626) one person can “know” everything…
OUTLINE
– The problem of big data
– How to build the world’s greatest digital library
– How to build the world’s greatest decision support engine
– The building blocks
• Virtualization: VMs, PlanetLab, Seattle
• Programming Paradigms: MapReduce, Hadoop, Pig
– Next Steps and a Plan
BUILDING THE WORLD’S GREATEST
DECISION SUPPORT SYSTEM: CHOICE 1
– Build a massive database system
– Buy thousands or millions of servers
– Hire tens of thousands of programmers
– Buy out the world supply of hard disks
– Spend billions of dollars
– Cover Kansas in computers (central location, not much else going on, GpENI gives us a start)
• Contact James Sterbenz for details….
– Never get it done….
BUILDING THE WORLD’S GREATEST
DECISION SUPPORT SYSTEM: CHOICE 2
– Invent a simple protocol by which one computer can send a program to another and have it reliably (and safely) executed
– Invent a simple, universal meta-schema and API which can be easily implemented in anything
– Invent a simple, universal query system which runs on everything
– Let nature take its course…
–?
–?
–?
OUTLINE
– The problem of big data
– How to build the world’s greatest digital library
– How to build the world’s greatest decision support engine
– The building blocks
• Virtualization: VMs, PlanetLab, Seattle
• Programming Paradigms: MapReduce, Hadoop, Pig
– Next Steps and a Plan
VIRTUALIZATION: KEY BUILDING BLOCK
TECHNOLOGY
– Key problem for software generally: dependence on environment
• Instruction set
• Operating system
• Installed libraries
• ….
– Solution (since at least 1965!): virtualization
• Essentially, carry the environment around with the program in a “virtual” machine
• First introduced in the IBM 360 line
• Brought to a high art in the late 1990s and early 2000s
VIRTUALIZATION IN THE MODERN ERA
– VMware: Mendel Rosenblum (Stanford) + very successful startup
• Permitted abstraction of both OS and instruction set
– “Paravirtualization”: Ian Pratt (Cambridge)
• Xen system
• Abstraction of OS
• Permitted running several isolated virtual machines on same hardware
• Gave each virtual machine the illusion that it was operating as its own isolated physical machine
− Performance and security isolation
PARAVIRTUALIZATION ENABLES THE
CLOUD
– Most applications and web services don’t need a whole machine
– Virtualization lets multiple virtual machines share the same physical machine transparently
– Permits: renting virtual machines for cheap!
• Amazon EC2: $.10 for a “micro” instance
– Bandwidth, storage, processing are easy to control on a per-virtual-machine basis
• Means that offering VM computation services is safe
• Offering VM computation services can be done for fixed, controllable cost
• No longer risky to run someone else’s code…
BUILDING BLOCK II: STANDARD MODEL
OF COMPUTATION OVER DATA
– Observation (due to Google): most highly-parallel computation can be thought of as a two-stage process:
• Map: perform identical operations over many sets of separate data
• Reduce: merge the results from the Map phase into the solution
– Google: turn this into a programming model
• Operate on (key, value) pairs
• Map step works over a list of keys to produce a list of (key, value) pairs
• Reduce step operates on a list of (key, value) pairs to produce the final result
– Can be done recursively
– Key point: the map step can be run on different data sets on many different computers, simultaneously
• Map nodes send results to Reduce nodes
• Reduce nodes form results
MAP REDUCE EXAMPLE: COUNTING
WORDS
# Runs on Map nodes
def map(key, value, outputCollector):
    words = breakStringIntoWords(value)
    for word in words:
        outputCollector.add(word, 1)
    return outputCollector

# Runs on Reduce nodes
def reduce(outputCollector, resultCollector):
    for word in outputCollector.keys():
        if resultCollector.hasKey(word):
            resultCollector[word] += outputCollector[word]
        else:
            resultCollector[word] = outputCollector[word]
    return resultCollector
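A self-contained, runnable version of the same word count (plain lists and dicts stand in for the framework's collector objects, which is my simplification):

```python
def map_words(key, value):
    """Map phase: emit a (word, 1) pair for every word in one shard."""
    return [(word, 1) for word in value.split()]

def reduce_counts(pairs, result):
    """Reduce phase: fold (word, count) pairs into a running total."""
    for word, count in pairs:
        result[word] = result.get(word, 0) + count
    return result

# Three "map nodes", each holding one shard of the text:
shards = {"s1": "the cat sat", "s2": "the dog sat", "s3": "the cat ran"}
counts = {}
for key, text in shards.items():   # map steps run in parallel in practice
    counts = reduce_counts(map_words(key, text), counts)
# counts == {"the": 3, "cat": 2, "sat": 2, "dog": 1, "ran": 1}
```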
MAP REDUCE SUCCESS STORIES,
IMPLEMENTATIONS, AND KEY NEEDS
– Success stories
• Google: used to completely redo their web index
• New York Times: uses MapReduce instances (Hadoop) running on Amazon EC2 nodes to index 4 TB of image data
– Key needs
• Distributed file system (e.g., “Cassandra” from Facebook)
• Scheduler, “sharder” for Map jobs
• New networking (the “incast” problem)
– Key implementations
• Hadoop (Apache project)
• Google, Oracle, Disco, Misco, Twister, Greenplum, Phoenix, Plasma, BashReduce
• Well over 20, many open-source
QUERY/DECISION SUPPORT
FRAMEWORKS OVER MAPREDUCE
– Key: MapReduce programmers must still write code in:
• Erlang, Haskell, Ruby, Bash, C, Go, Python, C#, CUDA, OCaml, or Java
• Many implementations, but all require programming
– Need: high-level query interface
• Something like SQL
• Tractable for non-programmers
• Abstract away details of how mapping is done
– Preliminary implementations (open source)
• Hive framework (Facebook)
• Pig (Apache project)
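To see what such a query layer buys, note that the whole word-count job from the earlier slide collapses to roughly one declarative line once the map/shuffle/reduce plumbing is hidden (Python's Counter standing in here for a real query language like Pig Latin):

```python
from collections import Counter

shards = ["the cat sat", "the dog sat", "the cat ran"]
# One declarative-style line; the "how" of mapping is abstracted away:
counts = sum((Counter(s.split()) for s in shards), Counter())
```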
W5
W5
– Network of large and small clusters, running an open query language over MapReduce to answer user queries
– Each cluster has a standard stack:
• Cluster manager with resource control (Eucalyptus, Tashi….)
• Virtualization on each node
• Map/Reduce implementation inside virtual machines
• Pig or other query language on Map/Reduce
– “Clusters” can be as small as a single PC!
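One way to picture the W5 idea of running programs at the data is as a dispatch loop over such clusters. All names below are hypothetical; this is a sketch of the principle, not any real W5 API:

```python
class Cluster:
    """Toy stand-in for one W5 site: local data plus local compute."""
    def __init__(self, name, tables):
        self.name, self.tables = name, tables    # tables: name -> list of rows

    def holds(self, table):
        return table in self.tables              # metadata lookup, not a scan

    def execute(self, table, predicate):
        # Run the filter where the data lives; ship back only the matches.
        return [row for row in self.tables[table] if predicate(row)]

def run_query(table, predicate, clusters):
    """Send the query to every cluster holding the table; merge small results."""
    out = []
    for c in clusters:
        if c.holds(table):
            out.extend(c.execute(table, predicate))
    return out

sites = [Cluster("victoria", {"sensors": [{"t": 1, "v": 9}, {"t": 2, "v": 3}]}),
         Cluster("palo_alto", {"sensors": [{"t": 3, "v": 12}]}),
         Cluster("santander", {"weather": [{"t": 1, "w": 5}]})]
hot = run_query("sensors", lambda r: r["v"] > 5, sites)  # only results move
```

The query (tiny) moves to every site holding relevant data; only the filtered results (also tiny) cross the network.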
PROBLEMS TO SOLVE
– Simple: standardize on common access, resource format
– Key first step: ssh keys, rspecs (GENI)
– Harder: MapReduce across the continents
• How does this work with 50 ms+ of latency between Map and Reduce?
• How does one do zero-data-move MapReduce?
• Optimization function?
– Hardest:
• Per-user resource and bandwidth allocation across multiple cloud instances
• When to move data, when not to
• Distributed rate control to conserve costs while meeting performance goals
• Plenty to keep plumbers happy for a while
OUTLINE
– The problem of big data
– How to build the world’s greatest digital library
– How to build the world’s greatest decision support engine
– The building blocks
• Virtualization: VMs, PlanetLab, Seattle
• Programming Paradigms: MapReduce, Hadoop, Pig
– Next Steps and a Plan
CAN IT WORK?
– Only way to find out is to try it
– Build it, and see what happens
– Build one or more cloud cluster(s) and operate it (them) 24/7
• Eucalyptus node manager
• Walrus storage instance
• Cassandra distributed file system
• Standard image featuring MapReduce scheduler as a minimum
• Pig queries as a maximum
• Load it up with operational datasets
− Neptune, Herzberg, others?
– Invite scientists to use it
– Find out problems, fix them, make it work
THANKS!