MapReduce

MapReduce Framework
Korea University of Technology and Education
CSE
Jun-Ki Min
About this Presentation




This is presented based solely on publicly available
information
Information is incomplete and could be inaccurate
Presentation reflects my understanding which may
be erroneous
Some parts of this presentation come from
• Kyuseok Shim, “MapReduce Algorithms for Big Data Analysis”, VLDB 2012 Tutorial.
• Tom White, “Hadoop: The Definitive Guide”, O’Reilly
• Anold Rajaraman and Dan Weld, “MapReduce Architecture”
Contents
Introduction of Big Data
 Hadoop/MapReduce
 MapReduce Framework

Introduction of Big Data
Era of Bigdata
• Massive data
• Social Network,
• sensor data,
• mobile hand held devices, etc.
• Definition of Big Data
• Data sets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute, 2011)
• 3V (volume, velocity, variety) (META Group, 2001; now Gartner)
Flood of Data
NYSE generates 1 TB of new trade data per day
Flood of Data
Facebook hosts 10 billion photos (1
petabyte)
Flood of Data
Internet Archive stores 2 petabytes of
data
Individuals’ Data are Growing
Apace
It becomes easier to take more and more photos
Individuals’ Data are Growing
Apace
Microsoft Research’s MyLifeBits Project
Capture and encoding
SQL
LifeLog, my life in a terabyte
Amount of Public Data Increases

Available Public Data Sets on AWS
• Annotated Human Genome
• Public database of chemical structures
• Various census data and labor statistics
‘Digital Universe’ Nears a
Zettabyte
• Digital Universe: the total amount of data stored in the world’s computers
• Zettabyte: 10^21 bytes >> Exabyte >> Petabyte >> Terabyte
• Korean number units: man (10^4), eok (10^8), jo (10^12), gyeong (10^16), hae (10^20), ja (10^24), yang (10^28), gu (10^32), ...
Big Data!
massive data
Social Network, sensor data, mobile hand held
devices, etc.
“More data usually beats better algorithms”
How to store & analyze large data?
Data Storage

Current HDD
• capacity: 1 TB
• transfer rate: 100 MB/s
How long does it take to read all the data off the disk?
How about using multiple disks?
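The two questions above can be answered with a quick calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and an idealized case where the data is spread evenly over the disks and all disks are read in parallel at full speed; the function name is only illustrative.

```python
def read_time_seconds(capacity_bytes, rate_bytes_per_s, disks=1):
    """Time to read a data set spread evenly over `disks` drives read in parallel."""
    return capacity_bytes / (rate_bytes_per_s * disks)

TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

one_disk = read_time_seconds(1 * TB, 100 * MB)            # single 1 TB drive
hundred_disks = read_time_seconds(1 * TB, 100 * MB, 100)  # same data on 100 drives

print(f"1 disk:    {one_disk:.0f} s (about {one_disk / 3600:.1f} hours)")
print(f"100 disks: {hundred_disks:.0f} s")
```

Under these assumptions one disk needs 10,000 seconds (roughly 2.8 hours), while 100 disks read in parallel need about 100 seconds.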
Multiple Disk

Hardware Failure

Most analysis tasks need to combine data
that is distributed across the disks

What Hadoop Provides
• Reliable shared storage (HDFS)
• Reliable analysis system (MapReduce)
"In pioneer days, they used oxen for
heavy pulling, and when one ox couldn’t
budge a log, they didn’t try to grow a
bigger ox"
. . . Grace Murray Hopper

Comparison with Other Systems
RDBMS
 Grid Computing


Volunteer Computing
RDBMS
[Comparison table not recovered; its footnotes follow]
* Low latency for point queries or updates
** Update times of a relatively small amount of data
Grid Computing
Shared storage (SAN)
 Works well for predominantly CPU-intensive jobs
 Becomes a problem when nodes need to access large data
Volunteer Computing


Volunteers donate CPU time from their idle computers
Work units are sent to computers around the world

Suitable for very CPU-intensive work with small data
sets

Risky due to running work on untrusted machines
Hadoop/MapReduce
Brief History of Hadoop


Created by Doug Cutting
Originated in Apache Nutch (2002)
• Open source web search engine, a part of the Lucene
project





NDFS (Nutch Distributed File System, 2004)
MapReduce (2005)
Doug Cutting joins Yahoo! (Jan 2006)
Official start of Apache Hadoop project (Feb 2006)
Adoption of Hadoop on Yahoo! Grid team (Feb 2006)
The Apache Hadoop Project
Pig
Chukwa
MapReduce
Core
Hive
HBase
HDFS
ZooKeeper
Avro
Hadoop


Open-source implementation of the MapReduce framework, an Apache project
Hadoop Distributed File System (HDFS)
• Stores big files across machines
• Stores each file as a sequence of blocks
• Each block of a file is replicated for fault tolerance


Distributes processing of large data across up to thousands of
commodity machines
Key components
• MapReduce - distributes applications
• Hadoop Distributed File System (HDFS) - distributes data

A single Namenode (master) and multiple Datanodes (slaves)
• Namenode: manages the file system and access to files by clients
• Datanode: manages the storage attached to the node it runs on
Divide & Conquer Strategy
• Divide: split the problem into several smaller parts that are easier to solve.
• Conquer: solve each of the smaller problems.
• Combine: (if necessary) gather the solutions of the smaller problems.

An Example of Divide & Conquer

Problem: determine whether a word x is in a dictionary D that consists of
many words. Print ‘yes’ if x is in D and ‘no’ otherwise.
Parameters: D, x

Linear Search

Compare each record in turn, from the first to the last
Time: |D|

Binary Search: Divide & Conquer strategy
Assumption) the values are sorted.
Time: log |D|
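The two search strategies can be compared with a minimal sketch. The step counters and the generated dictionary `D` are illustrative assumptions, not part of the original slides; the point is only that linear search touches up to |D| records while binary search halves the range at every step.

```python
def linear_search(words, x):
    """Scan every record in order; return (found, number of comparisons)."""
    comparisons = 0
    for w in words:
        comparisons += 1
        if w == x:
            return True, comparisons
    return False, comparisons

def binary_search(sorted_words, x):
    """Divide & conquer on sorted input; return (found, number of halving steps)."""
    lo, hi, steps = 0, len(sorted_words), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_words[mid] == x:
            return True, steps
        if sorted_words[mid] < x:
            lo = mid + 1   # x can only be in the upper half
        else:
            hi = mid       # x can only be in the lower half
    return False, steps

# Hypothetical dictionary of 1024 sorted words.
D = sorted(f"word{i:04d}" for i in range(1024))
x = "word1023"
print("yes" if binary_search(D, x)[0] else "no")
print("linear comparisons:", linear_search(D, x)[1])
print("binary steps:", binary_search(D, x)[1])
```

For a dictionary of 1024 words the linear scan may need 1024 comparisons, while the binary search needs on the order of log2(1024) = 10 steps.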
MapReduce Framework
MapReduce Framework

Distributed Parallel Execution Environment

Motivation of MapReduce
• Large-Scale Data Processing
• Want to use 1000s of CPUs
• But don’t want hassle of managing things
• MapReduce Architecture provides
• Automatic parallelization & distribution
• Fault tolerance
• I/O scheduling
• Monitoring & status updates
MapReduce framework


Large cluster of low-end commodity machines
Each machine
• is connected via an underlying network to the rest of the cluster
• has a processor
• has fast primary memory
• has slower secondary memory, which is used as part of a global shared memory

When a phase is finished
• every machine has written its output data to shared memory
• Data is synchronized
• Local primary memory is cleared before each synchronization
MapReduce framework
• Fault tolerance, load balancing and
synchronization
• Are achieved by having each machine work on multiple tasks,
making it easier to reprocess and reassign these tasks in case of
machine failure.
• A master
• Controls how these tasks are assigned across
the worker processors
• Workers
Map and Reduce Function


Borrows from functional programming
Users should implement two primary
methods:
• Map: (key1, val1) → [(key2, val2)]
• Reduce: (key2, [val2]) → [(key3, val3)]
Map and Reduce Function

Preliminary
• MapReduce library splits the input files into M pieces

Map function
• Map task: processes one piece (split) of the input
• Output
• <key, value> pairs
• Stored in the worker’s secondary memory → fault tolerance
• MapReduce library groups all intermediate values
associated with the same intermediate key
• A worker for a map task partitions a set of (key,
value_set) into R pieces (e.g., key mod R)
• A worker notifies the master of the location of the
partitioned sets
Map and Reduce Function

Reduce
• Accepts an intermediate key and a set of values
for that key
Output
• Stored in global memory
• Either the final output of the algorithm or the
input to a new round of MapReduce
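The map, partition, group and reduce steps described above can be sketched in a single process. This is only an illustration of the data flow, not Hadoop's API: `run_mapreduce`, `parity_map` and `sum_reduce` are hypothetical names, and `hash(key) % R` stands in for the "key mod R" partitioning rule.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn, R=2):
    """Single-process sketch: map, partition into R pieces, group by key, reduce."""
    # One dictionary per reduce task: intermediate key -> list of values.
    partitions = [defaultdict(list) for _ in range(R)]
    for key1, val1 in records:
        # Map: (key1, val1) -> [(key2, val2)]
        for key2, val2 in map_fn(key1, val1):
            # Partitioning rule (assumption): hash(key2) mod R picks the reduce task.
            partitions[hash(key2) % R][key2].append(val2)
    output = []
    for part in partitions:
        # Reduce: (key2, [val2]) -> [(key3, val3)]
        for key2, values in part.items():
            output.extend(reduce_fn(key2, values))
    return output

def parity_map(_key, value):
    """Hypothetical map function: tag each number as odd or even."""
    return [("odd" if value % 2 else "even", value)]

def sum_reduce(key, values):
    """Hypothetical reduce function: sum all values of one key."""
    return [(key, sum(values))]

records = [(0, 1), (1, 2), (2, 3), (3, 4)]
result = sorted(run_mapreduce(records, parity_map, sum_reduce))
print(result)
```

The output of one round is again a list of key-value pairs, so it could be fed as `records` into a further round, which is how multi-round algorithms are chained.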
Data flow in MapReduce
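Distributed Execution Overview
[Figure: the User Program forks a Master and several Workers. The Master assigns map tasks and reduce tasks to the Workers. Map Workers read the input splits (Split 0, Split 1, Split 2) and write their intermediate results locally; reduce Workers remotely read and sort those results and write Output File 0 and Output File 1.]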
Combine Function
• Reduces the result size of map functions
• Performs a reduce-like function on each machine
• Decreases the shuffling cost
• It is desirable to design MapReduce
algorithms to use combine functions

Simple Example of MapReduce
• Max value
• Word Counting
• Inverted Index
• Page Rank
Max Value

Goal
• Output the maximum value among the given numbers
• Input: a text file of numbers
• Output: the maximum of the input numbers

Key features
• FileInputFormat/FileOutputFormat
• Data types
• Single reducer / single reduce function
Serial Algorithm
Max = -inf
while (!EOF) {
    value = read(data);
    if (value > Max) Max = value;
}

Big Data → inefficient
MapReduce Algorithm
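• Map Phase
• Each slave computes the largest value in the chunk assigned to it
• Reduce Phase
• Outputs the largest of the values computed by the Mappers

Viewed as a serial computation, MapReduce is slower (more total work)
• Example) data: 1,000,000 items, 100 computers
• Serial algorithm: 1,000,000 comparisons
• MapReduce algorithm: 10,000 × 100 machines + 100 comparisons
• Parallel version: 10,100 comparisons (10,000 + 100)
• The 100 machines run in parallel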
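The two phases can be sketched as follows. This is a sequential simulation under stated assumptions: the chunks are processed one after another here, whereas on a cluster each chunk would be handled by a different machine; the function names and the generated data are hypothetical.

```python
def map_max(chunk):
    """Map task: one worker returns the maximum of its own chunk."""
    return max(chunk)

def reduce_max(partial_maxima):
    """Reduce task: a single reducer returns the largest partial maximum."""
    return max(partial_maxima)

def mapreduce_max(values, workers):
    """Split `values` into at most `workers` chunks, then map and reduce."""
    size = -(-len(values) // workers)  # ceiling division: items per chunk
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return reduce_max([map_max(chunk) for chunk in chunks])

# Hypothetical input: 1,000,000 numbers split over 100 workers, as in the example.
data = [(i * 7919) % 1_000_003 for i in range(1_000_000)]
print(mapreduce_max(data, 100))
```

Each of the 100 chunks holds 10,000 numbers, so the longest chain of work is one chunk scan followed by a scan of the 100 partial maxima, which matches the 10,000 + 100 count above.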
Word counting

Goal
• Count and output the number of occurrences of each word
in a given text file
• Input: a text file; output: each word and its count
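A minimal single-process sketch of word counting in the map/shuffle/reduce style is given here. The function names are hypothetical, and the five sample documents are those of this deck's word-counting example; words are assumed to be separated by commas or spaces.

```python
from collections import defaultdict

def wc_map(doc_id, text):
    """Map: emit (word, 1) for every word in the document."""
    return [(word, 1) for word in text.replace(",", " ").split()]

def wc_reduce(word, counts):
    """Reduce: sum the list of 1s gathered for one word."""
    return (word, sum(counts))

def word_count(docs):
    groups = defaultdict(list)  # shuffle: word -> list of emitted values
    for doc_id, text in docs.items():
        for word, one in wc_map(doc_id, text):
            groups[word].append(one)
    return dict(wc_reduce(word, counts) for word, counts in groups.items())

docs = {
    "Doc1": "Financial, IMF, Economics, Crisis",
    "Doc2": "Financial, IMF, Crisis",
    "Doc3": "Economics, Harry",
    "Doc4": "Financial, Harry, Potter, Film",
    "Doc5": "Crisis, Harry, Potter",
}
counts = word_count(docs)
print(counts)
```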
An Example of Word Counting
with MapReduce
Documents
Doc1 Financial, IMF, Economics, Crisis
Doc2 Financial, IMF, Crisis
Doc3 Economics, Harry
Doc4 Financial, Harry, Potter, Film
Doc5 Crisis, Harry, Potter
Map on M1 (Doc1, Doc2): output (Key Value)
Financial 1
IMF 1
Economics 1
Crisis 1
Financial 1
IMF 1
Crisis 1
Map on M2 (Doc3, Doc4, Doc5): output (Key Value)
Economics 1
Harry 1
Financial 1
Harry 1
Potter 1
Film 1
Crisis 1
Harry 1
Potter 1
An Example of Word Counting
with MapReduce
Before reduce functions are called,
for each distinct key, the list of its values is generated
Key Value list
Financial 1, 1, 1
IMF 1, 1
Economics 1, 1
Crisis 1, 1, 1
Harry 1, 1, 1
Potter 1, 1
Film 1
Reduce output (Key Value)
Financial 3
IMF 2
Economics 2
Crisis 3
Harry 3
Potter 2
Film 1
An Example of Word Counting
with Combine Function
Documents of the first map task
Financial, IMF, Economics, Crisis
Financial, IMF, Crisis
Map output (Key Value)
Financial 1
IMF 1
Economics 1
Crisis 1
Financial 1
IMF 1
Crisis 1
Combine output (Key Value)
Financial 2
IMF 2
Economics 1
Crisis 2
An Example of Word Counting
with Combine Function
Documents of the second map task
Economics, Harry
Financial, Harry, Potter, Film
Crisis, Harry, Potter
Map output (Key Value)
Economics 1
Harry 1
Financial 1
Harry 1
Potter 1
Film 1
Crisis 1
Harry 1
Potter 1
Combine output (Key Value)
Economics 1
Harry 3
Financial 1
Potter 2
Film 1
Crisis 1
An Example of Word Counting
with Combine Function
Before reduce functions are called,
for each distinct key, the list of its values is generated
Key Value list
Financial 2, 1
IMF 2
Economics 1, 1
Crisis 2, 1
Harry 3
Potter 2
Film 1
Reduce output (Key Value)
Financial 3
IMF 2
Economics 2
Crisis 3
Harry 3
Potter 2
Film 1
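The effect of the combine step on the amount of shuffled data can be sketched as follows. The split of the five documents over two map tasks follows the example; `map_with_combine` is a hypothetical helper, and `collections.Counter` is used here as a stand-in for a local reduce-like sum.

```python
from collections import Counter

def map_with_combine(texts):
    """Map + combine on one machine: pre-sum the (word, 1) pairs locally."""
    local = Counter()
    for text in texts:
        for word in text.replace(",", " ").split():
            local[word] += 1  # combine: reduce-like sum before shuffling
    return local

m1 = map_with_combine(["Financial, IMF, Economics, Crisis",
                       "Financial, IMF, Crisis"])
m2 = map_with_combine(["Economics, Harry",
                       "Financial, Harry, Potter, Film",
                       "Crisis, Harry, Potter"])

pairs_without_combine = sum(m1.values()) + sum(m2.values())  # one pair per word occurrence
pairs_with_combine = len(m1) + len(m2)                       # one pair per distinct word per machine
total = m1 + m2                                              # reduce: sum partial counts per word

print(pairs_without_combine, pairs_with_combine)
print(dict(total))
```

In this small example the combine step shrinks the shuffled data from 16 key-value pairs to 10, while the final counts stay the same. This works because summation is associative and commutative; a combine function is not automatically valid for every reduce function.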
An Example of Building an
Inverted Index
Doc1: IMF, Financial Economics Crisis
Doc2: IMF, Financial Crisis
Doc3: Harry Economics
Doc4: Financial Harry Potter Film
Doc5: Harry Potter Crisis
The following is the inverted index of the above data
IMF -> Doc1:1, Doc2:1
Financial -> Doc1:6, Doc2:6, Doc4:1
Economics -> Doc1:16, Doc3:7
Crisis -> Doc1:26, Doc2:16, Doc5:14
Harry -> Doc3:1, Doc4:11, Doc5:1
Potter -> Doc4:17, Doc5:7
Film -> Doc4:24
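The index above can be reproduced with a short sketch, assuming that each posting `DocN:k` records the 1-based character offset at which the word starts in document N (this reading is consistent with every entry listed). The function names are hypothetical.

```python
import re
from collections import defaultdict

def index_map(doc_id, text):
    """Map: emit (word, (doc_id, position)) using 1-based character offsets."""
    return [(m.group(), (doc_id, m.start() + 1))
            for m in re.finditer(r"[A-Za-z]+", text)]

def build_inverted_index(docs):
    index = defaultdict(list)  # shuffle + reduce: collect the postings of each word
    for doc_id, text in docs.items():
        for word, posting in index_map(doc_id, text):
            index[word].append(posting)
    return {word: sorted(postings) for word, postings in index.items()}

docs = {
    "Doc1": "IMF, Financial Economics Crisis",
    "Doc2": "IMF, Financial Crisis",
    "Doc3": "Harry Economics",
    "Doc4": "Financial Harry Potter Film",
    "Doc5": "Harry Potter Crisis",
}
index = build_inverted_index(docs)
for word, postings in index.items():
    print(word, "->", ", ".join(f"{d}:{p}" for d, p in postings))
```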
An Example of Building an
Inverted Index
Documents
Doc1 Financial, IMF, Economics, Crisis
Doc2 Financial, IMF, Crisis
Doc3 Economics, Harry
Doc4 Financial, Harry, Potter, Film
Doc5 Crisis, Harry, Potter
Map output (Key Value)
Financial Doc1:1
IMF Doc1:12
Economics Doc1:17
Crisis Doc1:28
Financial Doc2:1
IMF Doc2:12
Crisis Doc2:17
Economics Doc3:1
Harry Doc3:12
Financial Doc4:1
Harry Doc4:12
Potter Doc4:19
Film Doc4:27
Crisis Doc5:1
Harry Doc5:9
Potter Doc5:16
Shuffle output (Key Value lists)
Financial Doc1:1, Doc2:1, Doc4:1
IMF Doc1:12, Doc2:12
Economics Doc1:17, Doc3:1
Crisis Doc1:28, Doc2:17, Doc5:1
Harry Doc3:12, Doc4:12, Doc5:9
Potter Doc4:19, Doc5:16
Film Doc4:27
An Example of Building an
Inverted Index
Reduce input and output (Key Value lists; the lists are identical on both sides)
Financial Doc1:1, Doc2:1, Doc4:1
IMF Doc1:12, Doc2:12
Economics Doc1:17, Doc3:1
Crisis Doc1:28, Doc2:17, Doc5:1
Harry Doc3:12, Doc4:12, Doc5:9
Potter Doc4:19, Doc5:16
Film Doc4:27
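2.2 MapReduce-based PageRank Algorithm

What is PageRank?
• A method of assigning weights to documents that have a
hyperlink structure, such as the World Wide Web, according
to their relative importance. The algorithm can be applied to
any collection of items linked by mutual citation and reference.
• Google was founded on search technology based on PageRank.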
PageRank Equation


Let
• D be the set of all Web pages
• I(p) be the set of pages that link to the page p
• |O(q)| be the total number of links going out of page q
• d be the damping factor (usually 0.85)
The PageRank of page p, denoted by PR(p), is

$$PR(p) = d \sum_{q \in I(p)} \frac{PR(q)}{|O(q)|} + (1 - d)\,\frac{1}{|D|}$$

• First term: following a link
• Second term: random jump (constant)
[Figure: each page q1, q2, ... in I(p) contributes PR(q1)/|O(q1)|, PR(q2)/|O(q2)|, ... to p]
Example of Page Rank



10 nodes
Initial PageRank of each node = 1.0/10 = 0.1
PR(v0) = 0.85 × (0.1/1 + 0.1/2 + 0.1/3 + 0.1/3) + 0.15/10 ≈ 0.199
Computing PageRank

Sketch of PageRank computation
• Start with current PR(pi) values
• Each page pi distributes its current PR(pi) “credit” evenly to all of the pages it links to
• Each target page adds up the “credit” from all in-bound links to compute its next PR(pi) value
• Iterate until values converge

Properties of PageRank computation
• Computed iteratively, and the effect of each iteration is local
• Calculation depends only on the PageRank values of the previous iteration
• Individual rows of the adjacency matrix can be processed in parallel
PageRank with MapReduce
Map: distribute PageRank “credit” to link targets
Reduce: gather up PageRank “credit” from multiple sources
to compute new PageRank value
Iterate until convergence
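One way to sketch this map/reduce iteration in a single process is shown here. It assumes the formula PR(p) = d × Σ PR(q)/|O(q)| + (1 − d)/|D| with d = 0.85; the three-page graph, the function names, the tolerance and the iteration limit are hypothetical, and pages without out-links are not handled.

```python
from collections import defaultdict

def pagerank_map(page, rank, out_links):
    """Map: a page sends an equal share of its rank to every page it links to."""
    return [(target, rank / len(out_links)) for target in out_links]

def pagerank_reduce(credits, d, n):
    """Reduce: new rank = d * (sum of received credit) + (1 - d) / |D|."""
    return d * sum(credits) + (1 - d) / n

def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    n = len(links)
    ranks = {p: 1.0 / n for p in links}  # initial rank 1/|D| for every page
    for _ in range(max_iter):
        inbox = defaultdict(list)        # shuffle: target page -> received credits
        for page, outs in links.items():
            for target, credit in pagerank_map(page, ranks[page], outs):
                inbox[target].append(credit)
        new_ranks = {p: pagerank_reduce(inbox[p], d, n) for p in links}
        if max(abs(new_ranks[p] - ranks[p]) for p in links) < tol:
            return new_ranks             # converged
        ranks = new_ranks
    return ranks

# Hypothetical graph (not from the slides): every page has at least one out-link.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(ranks)
```

Each pass of the loop corresponds to one MapReduce round. As a check against the ten-node example, `pagerank_reduce([0.1/1, 0.1/2, 0.1/3, 0.1/3], 0.85, 10)` evaluates to approximately 0.199.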
PageRank with MapReduce
• Each emitted key-value pair <pij, PR> from map functions
can go to any machine
• The computed PageRank values must be updated over the
network
[Figure: each machine (M1, Mi, Mk) stores, for its own pages, a table of Page id, #ofOutLinks, Links and PR. In the example, p11 has 2 out-links (pi1, pj3), p12 has 2 out-links (pk1, p11) and pk1 has 1 out-link (p11). Map emits the key-value pairs <pi1, PR(p11)/2>, <pj3, PR(p11)/2>, <pk1, PR(p12)/2>, <p11, PR(p12)/2> and <p11, PR(pk1)>. Reduce gathers the credit per key: PR(pi1) = PR(p11)/2, PR(pk1) = PR(p12)/2 and PR(p11) = PR(pk1) + PR(p12)/2. The stored PR values are then updated, which causes network traffic.]