For SaaS

Design of a Cloud Management Layer
for High-Performance File Transfer
Outline
• Introduction
• Background
• System Design
• Implementation
• Experiments
Introduction
• Motivation
– File transfer with load sharing and fault tolerance.
[Figure: three tiers: Clients, Cloud Management Layer, Cloud Storage]
Introduction
• Clients can upload their files to the cloud, with a guarantee that the files will not be lost.
[Figure: the user (client) uploads and downloads files through the management layer, which stores them on the cluster storage server]
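Introduction (Cont.)
• Surveillance Application
– Host storage: 2 TB (2000 GB), shared by 30 devices
– Recording size per device: 0.2–0.5 GB per hour
– Storage per device: 2000 GB / 30 ≈ 67 GB
– Recording time at 0.3 GB/h: 67 GB / 0.3 GB/h ≈ 223 h
– 223 h / 24 h ≈ 9.3 days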
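The retention estimate above can be reproduced with a small helper. This is a sketch under two simplifying assumptions not stated on the slide: storage is split evenly across devices, and each device records continuously at a constant rate. `retention_days` is a hypothetical name. Without rounding the intermediate 67 GB, the 0.3 GB/h case comes to 2000 / (30 × 0.3 × 24) ≈ 9.26 days, consistent with the slide's ≈ 9.3.

```python
def retention_days(storage_gb: float, devices: int, rate_gb_per_hour: float) -> float:
    """Days of footage each device can keep before its share of storage fills.

    Assumes an even split of storage across devices and continuous recording
    at a constant rate (simplifying assumptions, not from the slide).
    """
    per_device_gb = storage_gb / devices        # 2000 / 30 ≈ 66.7 GB
    hours = per_device_gb / rate_gb_per_hour    # hours until that share is full
    return hours / 24.0                         # convert hours to days


if __name__ == "__main__":
    # The slide's scenario: 2 TB host, 30 devices, 0.2-0.5 GB/h per device.
    for rate in (0.2, 0.3, 0.5):
        print(f"{rate} GB/h -> {retention_days(2000, 30, rate):.2f} days")
```

Evaluating the full 0.2–0.5 GB/h range shows the retention window spans roughly 5.6 to 13.9 days, which is why the per-hour recording size matters as much as the total capacity.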
Introduction (Cont.)
• Platforms and APIs used
– Hadoop: Hadoop Distributed File System (HDFS) for file storage; HBase for user authentication
– Socket: file data transmission
Background (Cont.)
What is Cloud Computing?
• Scalable computing and storage resources.
Cloud Service Models
• Software as a Service (SaaS): Google Docs, Google Talk, Dropbox
• Platform as a Service (PaaS): Google App Engine, Windows Azure, Hadoop
• Infrastructure as a Service (IaaS): AWS EC2, IBM SmartCloud, HiNet hicloud CaaS
Background (Cont.)
Hadoop
[Figure: Hadoop stack: a cloud app runs on HBase (database), MapReduce (parallel processing), and the Hadoop Distributed File System (HDFS, file system)]
• Other components of Hadoop
– Pig: dataflow language and parallel execution framework
– Hive: data warehouse infrastructure
– ZooKeeper: distributed coordination service
– Chukwa: system for collecting management data
– Avro: data serialization system
Background (Cont.)
What is the Hadoop Distributed File System (HDFS)?
[Figure: client PCs transmit data to the cluster storage server, which consists of a NameNode and DataNodes]
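In HDFS, the NameNode holds only metadata (which blocks make up each file and which DataNodes store each block), while the DataNodes hold the block data itself. The toy model below illustrates that split. It assumes a 128 MB block size and a replication factor of 3 (the HDFS defaults since Hadoop 2.x), and it uses simple round-robin placement in place of HDFS's real rack-aware placement policy. `ToyNameNode` is hypothetical and is not the HDFS API.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # bytes; HDFS default block size since Hadoop 2.x
REPLICATION = 3                 # HDFS default replication factor


class ToyNameNode:
    """Hypothetical in-memory stand-in for NameNode metadata (not the HDFS API).

    Tracks which blocks form each file and which DataNodes hold replicas of
    each block; the file bytes themselves would live on the DataNodes.
    """

    def __init__(self, datanodes: List[str]) -> None:
        self.datanodes = datanodes
        self.files: Dict[str, List[Tuple[int, List[str]]]] = {}
        self._next_block_id = 0
        self._cursor = 0  # round-robin start; real HDFS uses rack-aware placement

    def add_file(self, path: str, size_bytes: int) -> List[Tuple[int, List[str]]]:
        """Split a file into fixed-size blocks and choose DataNodes for each replica."""
        num_blocks = -(-size_bytes // BLOCK_SIZE)  # ceiling division; 0 bytes -> 0 blocks
        replicas = min(REPLICATION, len(self.datanodes))
        blocks = []
        for _ in range(num_blocks):
            holders = [
                self.datanodes[(self._cursor + k) % len(self.datanodes)]
                for k in range(replicas)
            ]
            blocks.append((self._next_block_id, holders))
            self._next_block_id += 1
            self._cursor = (self._cursor + 1) % len(self.datanodes)
        self.files[path] = blocks
        return blocks

    def locate(self, path: str) -> List[Tuple[int, List[str]]]:
        """What a client asks the NameNode before reading blocks from DataNodes."""
        return self.files[path]
```

For example, `ToyNameNode(["dn1", "dn2", "dn3", "dn4"]).add_file("/videos/cam01.mp4", 300 * 1024 * 1024)` splits the file into three blocks, each replicated on three different DataNodes, so losing any single DataNode leaves every block readable.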
Background (Cont.)
• HBase offers high availability, high performance, and flexible scalability.
[Figure: HBase data model: table 't1' indexed by row key (r1, r2); columns are grouped into column families ('f1', 'f2', …, 'fn') and identified by column qualifiers ('c1' to 'c4'); cells hold the values v1 to v4]
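HBase's logical data model can be pictured as a sorted map: row key → column (written `family:qualifier`) → value. Real HBase also versions every cell by timestamp, which this sketch omits. `ToyTable` and its methods are hypothetical and are not the HBase client API; the sketch only shows how rows, column families, and qualifiers relate.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory model of an HBase table (no timestamps/versions)."""

    def __init__(self, name: str, families: Iterable[str]) -> None:
        self.name = name
        self.families = set(families)  # column families are declared per table
        # row key -> {"family:qualifier": value}; qualifiers are free-form
        self.rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row: str, column: str, value: str) -> None:
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row][f"{family}:{qualifier}"] = value

    def get(self, row: str, column: Optional[str] = None):
        cells = self.rows.get(row, {})  # .get does not create empty rows
        return dict(cells) if column is None else cells.get(column)

    def scan(self) -> Iterator[Tuple[str, Dict[str, str]]]:
        # HBase returns rows in lexicographic row-key order.
        for row in sorted(self.rows):
            yield row, dict(sorted(self.rows[row].items()))
```

Usage, with an illustrative placement of the figure's values (the exact cells of v1 to v4 are not recoverable from the slide):

```python
t1 = ToyTable("t1", ["f1", "f2"])
t1.put("r1", "f1:c1", "v1")
t1.put("r2", "f1:c2", "v2")
t1.put("r2", "f2:c3", "v3")
t1.put("r2", "f2:c4", "v4")
```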
Outline
• Introduction
• Background
• System Design
• Implementation
• Experiments
• Conclusions
• Reference
System Design (Cont.)
• Components
– Manager Selection
– Authentication
– Synchronization
– Cloud-Based File Transfer
[Figure: the client connects to the management layer (Node1, Node2, Node3, …, Noden), which runs on Hadoop (HBase and HDFS)]
System Design (Cont.)
• Manager Selection
1. The client connects to the management layer, which performs load balancing.
2. The management layer collects memory-load information from each node.
3. The node with the lowest memory load is selected as the connection port (Node i).
4. The client connects with Node i.
5. Authentication.
[Flowchart: Client → steps 1 to 4; a failed connection to Node i returns to node selection, and a successful one proceeds to step 5, Authentication]
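The selection steps reduce to "pick the least-loaded node; if the connection fails, fall back to the next one." A minimal sketch, assuming the memory-load readings from step 2 are already collected into a dict and that `connect` is a caller-supplied function returning `True` on success. `select_and_connect` and its `candidates` parameter are hypothetical; the default of 5 mirrors the experiment setting of selecting the five least-loaded nodes, which is one reading of that parameter.

```python
from typing import Callable, Dict, Optional


def select_and_connect(
    memory_load: Dict[str, float],
    connect: Callable[[str], bool],
    candidates: int = 5,
) -> Optional[str]:
    """Return the node the client ends up connected to, or None if all attempts fail.

    memory_load: node name -> memory load (e.g. fraction of RAM in use), from step 2.
    connect:     attempts a connection to a node and reports success (step 4).
    candidates:  how many of the least-loaded nodes to try (assumed parameter).
    """
    # Step 3: order nodes by memory load, lowest first.
    ranked = sorted(memory_load, key=memory_load.get)
    for node in ranked[:candidates]:
        if connect(node):  # success -> continue with Authentication (step 5)
            return node
        # Failure: fall back to the next least-loaded node.
    return None
```

Retrying the next-ranked node avoids another round of load collection after a failure; the trade-off is that the load readings may be slightly stale by the time a fallback node is tried.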
System Design (Cont.)
• Authentication (after Manager Selection)
1. The user enters an account and password.
2. Authentication (HBase verification).
3. Synchronization.
[Flowchart: if HBase verification fails, the user re-enters the account and password; on success, the client proceeds to Synchronization]
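The slide does not say how credentials are stored in HBase. A minimal sketch, assuming a user table keyed by account name whose row holds a random salt and a PBKDF2 hash of the password (an assumption; storing salted hashes rather than plaintext passwords is standard practice). A plain dict stands in for the HBase table, and `register`, `verify`, the `cred:` column names, and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict

ITERATIONS = 100_000  # PBKDF2 work factor (assumed; tune for your hardware)

# Stand-in for an HBase user table:
# row key = account, columns "cred:salt" and "cred:hash" (hypothetical schema).
user_table: Dict[str, Dict[str, bytes]] = {}


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(account: str, password: str) -> None:
    """Store a fresh random salt and the derived hash, never the password itself."""
    salt = os.urandom(16)
    user_table[account] = {"cred:salt": salt, "cred:hash": _hash(password, salt)}


def verify(account: str, password: str) -> bool:
    """True: proceed to Synchronization. False: prompt for account/password again."""
    row = user_table.get(account)
    if row is None:
        return False
    candidate = _hash(password, row["cred:salt"])
    # Constant-time comparison so timing does not reveal how many bytes matched.
    return hmac.compare_digest(candidate, row["cred:hash"])
```

In the real system, `user_table.get(account)` would correspond to an HBase get on the user table's row for that account.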
System Design (Cont.)
• Synchronization (after Authentication)
1. HDFS file synchronization on the client side.
2. Check whether the files exist.
3. Cloud-based file transfer.
4. Background monitoring.
[Flowchart: if a file does not exist, the client performs a cloud-based file transfer; if it exists, the client moves on to background monitoring]
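Step 2, checking whether files exist, can be expressed as a set comparison of file listings. The slide does not specify the direction of synchronization. This sketch assumes it is two-way: files only the client has would be written to the servers, and files only HDFS has would be sent to the client. `plan_transfers` is a hypothetical helper operating on relative paths.

```python
from typing import Iterable, List, Tuple


def plan_transfers(
    client_files: Iterable[str], server_files: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Compare relative paths on both sides and decide what must be transferred.

    Returns (to_upload, to_download):
      to_upload   - paths only the client has -> would be written to the servers
      to_download - paths only HDFS has       -> would be sent to the client
    Paths present on both sides need no transfer; background monitoring takes over.
    """
    client, server = set(client_files), set(server_files)
    return sorted(client - server), sorted(server - client)
```

This check looks only at existence. Detecting changed content in files that exist on both sides is left to background monitoring.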
System Design (Cont.)
• Synchronization: step 4, Background Monitoring
– Insertion: inserting files into the servers
– Update: updating files on the servers
– Deletion: deleting files from the servers
[Figure: each operation detected by background monitoring is carried out through cloud-based file transfer]
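The slide does not say how background monitoring detects changes. One simple approach, assumed here, is periodic polling: snapshot each file's modification time and size, then compare consecutive snapshots to classify changes as insertions, updates, or deletions. An OS file-change notification API would be an alternative. `take_snapshot`, `diff_snapshots`, and `monitor` are hypothetical, and the 5-second default interval is arbitrary.

```python
import os
import time
from typing import Callable, Dict, List, Optional, Tuple

Snapshot = Dict[str, Tuple[float, int]]  # relative path -> (mtime, size in bytes)


def take_snapshot(root: str) -> Snapshot:
    """Record the modification time and size of every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # removed mid-scan: treat it as already deleted
            snap[os.path.relpath(full, root)] = (st.st_mtime, st.st_size)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Classify changes between two polls as insertion, update, or deletion."""
    return {
        "insertion": sorted(p for p in new if p not in old),
        "update": sorted(p for p in new if p in old and new[p] != old[p]),
        "deletion": sorted(p for p in old if p not in new),
    }


def monitor(root: str, handle: Callable[[str, str], None],
            interval: float = 5.0, rounds: Optional[int] = None) -> None:
    """Poll root every `interval` seconds and call handle(operation, path) per change.

    rounds=None polls forever; a number stops after that many polls.
    """
    previous = take_snapshot(root)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = take_snapshot(root)
        for operation, paths in diff_snapshots(previous, current).items():
            for path in paths:
                handle(operation, path)  # e.g. issue a Write or Delete request
        previous = current
        done += 1
```

Polling is simple and portable. Its cost is latency (up to one interval) and a full directory walk per poll, which grows with the number of files being watched.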
System Design (Cont.)
• Cloud-Based File Transfer (after Synchronization & Monitoring)
1. Send a request using a socket.
2. Analyze the packet (to extract the command and path).
3. Execute the command: write a file, send a file, or delete a file.
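The packet format is not given on the slide. This sketch assumes a hypothetical text header `COMMAND path\n` followed by an optional payload, with the commands `WRITE`, `SEND`, and `DELETE` matching step 3. A dict stands in for HDFS. `build_request`, `parse_request`, and `execute` are hypothetical helpers.

```python
import socket
from typing import Dict, Optional, Tuple

COMMANDS = {"WRITE", "SEND", "DELETE"}


def build_request(command: str, path: str, payload: bytes = b"") -> bytes:
    """Step 1 (client): hypothetical header 'COMMAND path\\n' plus optional payload."""
    return f"{command} {path}\n".encode("utf-8") + payload


def parse_request(packet: bytes) -> Tuple[str, str, bytes]:
    """Step 2 (server): extract the command, path, and payload from a packet."""
    header, _, payload = packet.partition(b"\n")
    command, _, path = header.decode("utf-8").partition(" ")  # path may contain spaces
    if command not in COMMANDS or not path:
        raise ValueError(f"malformed request header: {header!r}")
    return command, path, payload


def execute(store: Dict[str, bytes], command: str, path: str,
            payload: bytes) -> Optional[bytes]:
    """Step 3 (server): write, send, or delete a file in the stand-in store."""
    if command == "WRITE":
        store[path] = payload
        return None
    if command == "SEND":
        return store[path]
    store.pop(path, None)  # DELETE; ignore files that are already gone
    return None


if __name__ == "__main__":
    client, server = socket.socketpair()  # connected pair, standing in for TCP
    client.sendall(build_request("WRITE", "/cam01/0001.mp4", b"frame-data"))
    client.shutdown(socket.SHUT_WR)       # end of request: server's recv returns b""
    packet = b""
    while chunk := server.recv(4096):
        packet += chunk
    store: Dict[str, bytes] = {}
    execute(store, *parse_request(packet))
    print(store)
    client.close()
    server.close()
```

Marking the end of a request by closing the write side keeps the sketch short but allows only one request per connection. A real protocol over TCP would frame each message, for example with a length prefix, so that one connection can carry many requests.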
Outline
• Introduction
• Background
• System Design
• Experiments
• Conclusions
• Reference
Experiments
• Manager Selection
– Random assignment
– Load-balanced assignment
• Number of least-loaded nodes selected: 5
• Update interval: 10 seconds
• Number of simultaneous connections: 10
• Environment 1
– Servers: 50, 100, 150, 200, 250
– Clients: 2000
• Environment 2
– Servers: 50
– Clients: 1000, 2000, 3000, 4000, 5000
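The two assignment policies can be compared with a small simulation. This sketch reflects one reading of the settings above and is not the authors' experiment. Each client adds one unit of load. The load-balanced policy picks randomly among the `candidates` least-loaded servers, using load information refreshed only every `refresh_every` assignments to mimic a periodic update interval. Connection limits and disconnections are ignored. `assign` and all of its parameters are hypothetical, and no results are claimed here.

```python
import random
from typing import List


def assign(clients: int, servers: int, policy: str, candidates: int = 5,
           refresh_every: int = 1, seed: int = 0) -> List[int]:
    """Return per-server connection counts after assigning all clients.

    policy="random":   each client picks a server uniformly at random.
    policy="balanced": each client picks randomly among the `candidates`
                       least-loaded servers according to a load snapshot that
                       is refreshed every `refresh_every` assignments.
    """
    if policy not in ("random", "balanced"):
        raise ValueError(f"unknown policy {policy!r}")
    rng = random.Random(seed)  # fixed seed for repeatable runs
    load = [0] * servers
    snapshot = load[:]         # possibly stale copy of the load the manager sees
    for i in range(clients):
        if policy == "random":
            target = rng.randrange(servers)
        else:
            if i % refresh_every == 0:
                snapshot = load[:]
            ranked = sorted(range(servers), key=snapshot.__getitem__)
            target = rng.choice(ranked[:candidates])
        load[target] += 1
    return load
```

Comparing, for example, `max(assign(2000, 50, "random"))` against `max(assign(2000, 50, "balanced", refresh_every=50))` shows how the peak per-server load changes between the two policies. Raising `refresh_every` shows how stale load information erodes the benefit of load balancing.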
Experiments (Cont.)
• Experiment 1
– Servers: 50, 100, 150, 200, 250
– Clients: 2000
Experiments (Cont.)
• Experiment 2
– Servers: 50
– Clients: 1000, 2000, 3000, 4000, 5000