
Lecture 3 – MapReduce: Implementation
CSE 490h – Introduction to Distributed Computing, Spring 2009

Except as otherwise noted, the content of this presentation is
licensed under the Creative Commons Attribution 2.5 License.
Last Class
- Input Handling
- Map Function
- Partition Function
- Compare Function
- Reduce Function
- Output Writer

map (Functional Programming)
Creates a new list by applying f to each element of
the input list; returns output in order.

[diagram: f applied independently to each element of the input list]

map f lst : ('a -> 'b) -> ('a list) -> ('b list)
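
As a concrete illustration (not from the slides), a minimal OCaml
version of map; the standard library's List.map behaves the same way:

(* map : ('a -> 'b) -> 'a list -> 'b list
   Applies f to each element of lst, preserving order. *)
let rec map f lst =
  match lst with
  | [] -> []
  | x :: rest -> f x :: map f rest

(* Example: map (fun x -> x * x) [1; 2; 3] evaluates to [1; 4; 9]. *)
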
Fold
Moves across a list, applying f to each element
plus an accumulator; f returns the next
accumulator value, which is combined with the
next element of the list.

[diagram: f threaded through the list, from an initial accumulator
value to the final returned value]

fold f x0 lst : ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b
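
A minimal OCaml sketch of fold, written as a right fold; the slide's
pair form ('a * 'b -> 'b) is equivalent to the curried 'a -> 'b -> 'b
used here, and the standard library provides List.fold_left /
List.fold_right:

(* fold : ('a -> 'b -> 'b) -> 'b -> 'a list -> 'b
   Threads the accumulator x0 through the list via f. *)
let rec fold f x0 lst =
  match lst with
  | [] -> x0
  | x :: rest -> f x (fold f x0 rest)

(* Example: fold (fun x acc -> x + acc) 0 [1; 2; 3; 4] evaluates to 10. *)
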
[diagram: MapReduce data flow. Input key/value pairs from data stores
1..n are fed to parallel map tasks, which emit intermediate
(key, value) pairs. == Barrier ==: aggregates intermediate values by
output key into (key 1, intermediate values), (key 2, intermediate
values), and (key 3, intermediate values). Parallel reduce tasks then
produce the final values for each key.]
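
To make the data flow concrete, here is a toy single-process OCaml
sketch of the pipeline for word count (an assumed example; the real
system distributes every stage across machines):

(* Map: one (filename, contents) input pair -> list of (word, 1) pairs. *)
let map_fn (_key, contents) =
  String.split_on_char ' ' contents
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* Barrier / shuffle: group intermediate pairs by output key. *)
let shuffle pairs =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (k, v) ->
       let vs = try Hashtbl.find tbl k with Not_found -> [] in
       Hashtbl.replace tbl k (v :: vs))
    pairs;
  Hashtbl.fold (fun k vs acc -> (k, vs) :: acc) tbl []

(* Reduce: (word, [1; 1; ...]) -> (word, total count). *)
let reduce_fn (key, values) = (key, List.fold_left ( + ) 0 values)

let () =
  [ ("doc1", "the quick fox"); ("doc2", "the lazy dog") ]
  |> List.concat_map map_fn
  |> shuffle
  |> List.map reduce_fn
  |> List.iter (fun (w, c) -> Printf.printf "%s\t%d\n" w c)
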
Advantages of MapReduce
- Flexible for a wide range of problems
- Fault tolerant
- Scalable

Overview
- Hardware
- Task assignment
- Failure
- Non-Determinism
- Optimizations

Commodity Hardware
- Cheap hardware:
  - 4 GB memory
  - 100 megabit/sec network
  - x86 processors running Linux

Cheap Hardware + Lots of It = Failure!
Master vs Worker
- Users submit jobs into a scheduling system
  - Implement map and reduce
  - Specify M map tasks and R reducers
- Many copies of the program started
  - One task is the master
- Master assigns map/reduce tasks to idle workers

Map Tasks
- Input broken up into 16 MB - 64 MB chunks
- M map tasks processed in parallel

Reduce Tasks
- R reduce tasks
- Assigned by a partitioning function
  - Typically: hash(key) mod R (sketched below)
  - Sometimes useful to customize
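
A sketch of that default partitioner in OCaml (Hashtbl.hash stands in
for whatever hash function the real implementation uses):

(* Send each intermediate key to one of r reduce tasks. *)
let partition key r = Hashtbl.hash key mod r
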
Master Data Structures
- For each map/reduce task, store its state and the identity of the
worker machine (see the sketch below)
  - State: Idle, In-Progress, or Complete
- For each complete map task, store the locations of its output
(R locations)
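
One plausible shape for this bookkeeping, sketched in OCaml (the type
and field names are hypothetical, not taken from the paper):

type task_state = Idle | In_progress | Complete

type task = {
  mutable state : task_state;
  mutable worker : string option;  (* identity of the assigned machine *)
  mutable outputs : string list;   (* for a complete map task: the R
                                      intermediate file locations *)
}
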
Worker with Map Tasks
- Parses input data into key/value pairs
- Applies map
- Buffered pairs written to disk, partitioned into R regions
- Locations of output eventually passed to the master

Worker with Reduce Tasks
- Reads data from map machines via RPC
- Sorts data by intermediate key
- Applies reduce
- Output appended to the final output file

After Reduce
- When all tasks are complete, the master wakes up the user program
- Output available in R output files, with names specified by the user

How do you pick M and R?
- How many scheduling decisions? O(M+R)
- How much state in memory at the master? O(M*R)
- M: much larger than the number of machines
- R: a small multiple of the number of machines
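
For a sense of scale, the original MapReduce paper (Dean & Ghemawat,
OSDI 2004) cites typical values of M = 200,000 and R = 5,000 on about
2,000 worker machines: roughly 205,000 scheduling decisions, and, at
about one byte of master state per map/reduce task pair, on the order
of a gigabyte for the O(M*R) state.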

Failures & Issues
- Worker Failure
- Master Failure
- Stragglers
- Crashes, Etc.

Worker Failure
- Master periodically pings each worker
  - No response -> assumes the worker failed
- Map tasks on a failed worker: both completed & in-progress tasks
reset to idle (completed map output lives on the failed machine's
local disk, so it is lost)
- Reduce tasks on a failed worker: only in-progress tasks reset to
idle (see the sketch below)
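
A sketch of the master's reaction when a worker stops responding,
reusing the hypothetical task record from above. Completed map output
must be re-created because it lived on the dead worker's local disk,
while completed reduce output is already in the global file system:

let handle_worker_failure ~dead ~map_tasks ~reduce_tasks =
  let reset t = t.state <- Idle; t.worker <- None in
  List.iter
    (fun t ->
       match t.state, t.worker with
       | (In_progress | Complete), Some w when w = dead -> reset t
       | _ -> ())
    map_tasks;
  List.iter
    (fun t ->
       match t.state, t.worker with
       | In_progress, Some w when w = dead -> reset t
       | _ -> ())
    reduce_tasks
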
Master Failure
- Could write periodic checkpoints of the master's state
- In practice: just let the user deal with it (abort and retry the job)

Stragglers (Causes)
- Why do some tasks run slowly?
  - Bad disk with correctable errors (reads succeed, but slowly)
  - Too many other tasks competing for the machine
  - Caching disabled
Stragglers (Solutions)
- Schedule backup executions of the remaining in-progress tasks when
the operation is close to completion
- A task is complete when either the primary or the secondary task
completes (see the sketch below)
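
In code, the "first copy wins" rule can be as small as this OCaml
sketch (again using the hypothetical task record from earlier):

(* Whichever of the primary or backup copy reports first flips the
   task to Complete; the duplicate's later report is ignored. *)
let mark_complete t =
  match t.state with
  | Complete -> ()  (* the other copy already finished *)
  | _ -> t.state <- Complete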

Crashes, Etc.
- Causes:
  - Bad records
  - Bugs in third-party code
- Solution: skip over the bad records?

Non-Determinism
- Deterministic: the distributed implementation produces the same
result as a sequential execution of the program
- Non-deterministic: the map and/or reduce functions are themselves
non-deterministic

Non-Determinism
- Guarantee: the output of a specific reduce task is equivalent to
some sequential execution
- But: the outputs of different reduce tasks may correspond to
different sequential executions

Non-Determinism
- There may be no single sequential execution that matches the
complete output
- Why? Because reduce tasks R1 and R2 may have read outputs from
different executions of the same map task M

Advanced Stuff
- Input Types
- Combiner Function
- Counters

Input Types
- May need to change how input is read
- Implement the reader interface (sketched below)
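
A hypothetical reader interface, sketched as an OCaml module type (the
actual interface differs; this only shows the shape of the
customization point):

module type READER = sig
  type t
  val open_split : string -> t              (* e.g. open one input chunk *)
  val next : t -> (string * string) option  (* next key/value pair;
                                               None at end of split *)
end
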

Combiner
- “Combiner” functions can run on the same machine as a mapper
- Causes a mini-reduce phase to occur before the real reduce phase,
to save bandwidth

Under what conditions is it sound to use a combiner?
Combiner Function
- Can only be used if the operation is commutative and associative
(example below)
- Commutative: a + b + c = b + c + a
- Associative: (a × b) × c = a × (b × c)
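
A combiner sketch for word count in OCaml: pre-summing the (word, 1)
pairs on the mapper's machine is sound here because integer addition
is both commutative and associative:

let combine pairs =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (k, v) ->
       let c = try Hashtbl.find tbl k with Not_found -> 0 in
       Hashtbl.replace tbl k (c + v))
    pairs;
  Hashtbl.fold (fun k c acc -> (k, c) :: acc) tbl []

(* combine [("the", 1); ("fox", 1); ("the", 1)] yields ("the", 2) and
   ("fox", 1), in some order. *)
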
Counters
- Global counters
- Master handles the issue of duplicate executions (see the sketch below)
- Useful for sanity checking or debugging
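
A sketch of that deduplication in OCaml (the details are assumptions,
not the paper's code): each task reports named counter deltas, and only
the first successful execution of a given task is folded into the
totals, so re-executed tasks are not double-counted:

let totals : (string, int) Hashtbl.t = Hashtbl.create 16
let seen : (string, unit) Hashtbl.t = Hashtbl.create 16

let report ~task_id counts =
  if not (Hashtbl.mem seen task_id) then begin
    Hashtbl.add seen task_id ();
    List.iter
      (fun (name, n) ->
         let cur = try Hashtbl.find totals name with Not_found -> 0 in
         Hashtbl.replace totals name (cur + n))
      counts
  end
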

Discussion Questions

1. Give an example of a MapReduce problem
not listed in the reading. In your example, what
are the map and reduce functions (including
inputs and outputs)?
2. What part of the MapReduce implementation
do you find most interesting? Why?
3. Give an example of a distributable problem
that should not be solved with MapReduce.
What are the limitations of MapReduce that
make it ill-suited for your task?
Discussion Questions

1. Assuming you had a corpus of webpages as input such that the key
for each mapper is the URL and the value is the text of the page,
how would you design a mapper and a reducer to construct an
inverse graph of the web - that is, for each URL output the list
of web pages that point to it?
2. TF–IDF is a statistical value assigned to words in a document
corpus that indicates the relative importance of the word. As part
of computing it, the Inverse Document Frequency of a word is found
from: The number of documents in the corpus divided by the number
of documents containing the word. Given a corpus of documents, and
given that you know how many documents are in the corpus, how
would you use MapReduce to find this quantity for every word in
the corpus simultaneously?