Tianjin University of Science & Technology
A young group of the China-VO family
Big-data-oriented astronomical data processing based on Hadoop & Spark
• Footprint Generation
• Cross-match
Task scheduling strategies for astronomical workflows in the cloud
Footprint Generation
• Sky coverage - an important piece of information about astronomical observations.
• Applications:
• intersections
• unions
• other logical operations based on the geometric coverage of regions of the sky
• cross-match
• Multi-Order Coverage (MOC) HEALPix maps generated on the Hadoop and Spark platforms
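To illustrate the multi-order coverage idea, here is a minimal sketch in plain Python, not the Hadoop/Spark implementation from the slides. It assumes the footprint is already a set of HEALPix cell indices in the NESTED scheme at one maximum order, where the four children of cell p are 4p to 4p+3 at the next order. The function name `compact_to_moc` and the two sample footprints are hypothetical.

```python
from collections import defaultdict


def compact_to_moc(pixels, max_order):
    """Merge NESTED HEALPix cells given at max_order into a multi-order map.

    Whenever all four sibling cells are present they are replaced by their
    parent cell (index >> 2) one order lower.
    Returns {order: sorted list of cell indices}.
    """
    moc = {}
    current = set(pixels)
    for order in range(max_order, 0, -1):
        siblings = defaultdict(int)
        for p in current:
            siblings[p >> 2] += 1
        full_parents = {parent for parent, n in siblings.items() if n == 4}
        # Cells whose parent is incomplete stay at this order.
        kept = sorted(p for p in current if (p >> 2) not in full_parents)
        if kept:
            moc[order] = kept
        current = full_parents
    if current:
        moc[0] = sorted(current)
    return moc


# Hypothetical footprints of two surveys, as order-2 cell indices.
survey_a = {0, 1, 2, 3, 4, 5}
survey_b = {4, 5, 6, 7, 20}

# Logical operations are plain set operations at the finest order,
# followed by compaction.
union = compact_to_moc(survey_a | survey_b, 2)
intersection = compact_to_moc(survey_a & survey_b, 2)
```

Working on flat sets at the finest order keeps unions and intersections trivial; the compaction step only shortens the stored result.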
Footprint generation based on Spark
Data: 2MASS, 12.6 GB, 41,067,000 records
Environment: dual-core with 4 GB memory; Spark 2.0.2, Hadoop 2.7.3
Node number    Time (s)
4              138
8              69
Hadoop-based cross-match
Step 1: data distribution (1 Map + 1 Reduce)
Step 2: distance calculation (1 Map)
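The two steps can be sketched in plain Python to show the data flow. This is an illustration under stated assumptions, not the Hadoop job from the slides: records are `(id, ra_deg, dec_deg)` tuples, the partition key is a 1-degree declination zone (the slides do not say how data is distributed), and objects close to a zone border are not duplicated into the neighbouring zone, so matches across borders are missed. All function names and sample records are hypothetical.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 1.0  # assumed partition size, not taken from the slides


def zone_of(dec_deg):
    """Partition key: index of the declination zone containing the object."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def distribute(catalog_name, records):
    """Step 1, map side: emit (zone, (catalog, record)) pairs."""
    for rec in records:
        yield zone_of(rec[2]), (catalog_name, rec)


def group_by_key(pairs):
    """Step 1, reduce side: collect both catalogs' records per zone."""
    blocks = defaultdict(lambda: defaultdict(list))
    for key, (catalog_name, rec) in pairs:
        blocks[key][catalog_name].append(rec)
    return blocks


def match_block(block, radius_deg):
    """Step 2, map only: brute-force distance test inside one zone."""
    for id_a, ra_a, dec_a in block["A"]:
        for id_b, ra_b, dec_b in block["B"]:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                yield id_a, id_b


def cross_match(cat_a, cat_b, radius_deg):
    pairs = list(distribute("A", cat_a)) + list(distribute("B", cat_b))
    matches = []
    for block in group_by_key(pairs).values():
        matches.extend(match_block(block, radius_deg))
    return sorted(matches)


# Hypothetical sample records, matched with a 1-arcsecond radius.
cat_a = [("s1", 10.0, 20.5), ("s2", 50.0, -5.2)]
cat_b = [("t1", 10.0002, 20.5001), ("t2", 120.0, 20.5)]
matches = cross_match(cat_a, cat_b, 1.0 / 3600.0)
```

Because step 2 needs no shuffle, each zone can be processed independently once step 1 has co-located the records of both catalogs.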
Experimental results
[Figure: cross-match running time (s) versus node number]
Data: SDSS, 100,106,811 records; 2MASS, 470,992,970 records
Node number    Time (s)
4              273
8              136
16             69
32             38
64             25
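The scaling behind these timings can be made explicit with a small calculation. The sketch below takes the 4-node run as the baseline (an assumption, since no single-node time is reported) and derives relative speedup and parallel efficiency; the function name `scaling_table` is hypothetical.

```python
# Measured cross-match times from the table: node number -> seconds.
timings = {4: 273, 8: 136, 16: 69, 32: 38, 64: 25}


def scaling_table(timings):
    """Return (nodes, time, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        # Ideal speedup equals the factor by which the node count grew.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, timings[nodes], speedup, efficiency))
    return rows


for nodes, seconds, speedup, efficiency in scaling_table(timings):
    print(f"{nodes:>3} nodes  {seconds:>4} s  speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

Scaling is close to linear up to 16 nodes; at 64 nodes the speedup over the 4-node run is about 10.9 against an ideal of 16, an efficiency of roughly 68%.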
Spark-based cross-match
• integrated with footprint generation
typical rich-BoT workflows
further optimization
• scientific workflow scheduling research
China-VO and Alibaba-Cloud
The China-VO cloud will provide users with:
• data
• software
• computing resources
Scientific workflow
• one of the most commonly used application models in astronomy
[Figure: example workflow graph with task nodes t1-t5 and data nodes d1-d8]
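A workflow of this kind, with tasks exchanging data sets, can be written down as a small graph structure. The sketch below reuses the task and data names from the figure, but the read/write edges are invented for illustration because the figure's actual connections are not recoverable here. It derives task dependencies from data producers and groups tasks into levels that can run in parallel.

```python
# Hypothetical workflow: the names follow the figure, the edges are invented.
workflow = {
    "t1": {"reads": ["d1", "d2"], "writes": ["d3"]},
    "t2": {"reads": ["d3"], "writes": ["d4"]},
    "t3": {"reads": ["d3", "d5"], "writes": ["d6"]},
    "t4": {"reads": ["d4", "d6"], "writes": ["d7"]},
    "t5": {"reads": ["d6"], "writes": ["d8"]},
}


def task_dependencies(workflow):
    """A task depends on every task that produces one of its inputs."""
    producer = {d: t for t, io in workflow.items() for d in io["writes"]}
    return {
        t: {producer[d] for d in io["reads"] if d in producer}
        for t, io in workflow.items()
    }


def execution_levels(workflow):
    """Group tasks into levels; tasks in one level have no mutual dependencies."""
    deps = task_dependencies(workflow)
    levels, done = [], set()
    while len(done) < len(deps):
        ready = sorted(t for t in deps if t not in done and deps[t] <= done)
        if not ready:
            raise ValueError("workflow contains a cycle")
        levels.append(ready)
        done.update(ready)
    return levels


levels = execution_levels(workflow)
```

Data sets that no task produces (d1, d2 and d5 in this invented example) are the inputs a user would have to upload before the workflow starts.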
What is worthy of concern
• running efficiency
• rental cost
• energy consumption
Goals: high performance, low cost, low energy consumption
User steps: upload data, launch application
How to achieve these goals?
• data placement
• resource allocation
• task scheduling
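Running efficiency, rental cost and energy consumption usually pull in different directions, so candidate plans are often compared by Pareto dominance rather than by a single score. The sketch below is a generic illustration of that comparison, not the method used in this work; the plan names and their (time, cost, energy) values are made up.

```python
def dominates(a, b):
    """True if plan a is no worse than b in every objective and better in one.

    All objectives are minimised: (running time, rental cost, energy).
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Keep only the plans that no other plan dominates."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Made-up candidate plans: (running time in s, rental cost, energy in kWh).
plans = {
    "fast": (60, 9.0, 3.0),
    "balanced": (90, 6.0, 2.0),
    "cheap": (150, 4.0, 1.5),
    "wasteful": (95, 7.0, 2.5),
}
front = pareto_front(plans)
```

Here "wasteful" drops out because "balanced" beats it on all three objectives, while the remaining plans each represent a different trade-off.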
[Figure: workflow graph with task nodes t1-t5 and data nodes]
Characteristics of astronomical workflow applications
• data-intensive and compute-intensive
• rich-BoT structures
• task execution time difficult to estimate
• complex network structure
• heterogeneous machines
Our contributions
1. Task and data clustering based on data correlation
2. Cloud environment modeling and a heuristic-rule-based task scheduling method
3. Dynamic multi-layer deadline decomposition
4. Multi-objective optimization
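As a rough illustration of clustering by data correlation (contribution 1), the sketch below counts how many tasks use each pair of data sets and groups pairs whose count reaches a threshold. This is a simplified stand-in, not the published algorithm: the correlation measure, the threshold, the single-linkage grouping and the task-to-data assignments are all assumptions made for the example.

```python
from itertools import combinations


def data_correlation(task_inputs):
    """Count, for every pair of data sets, how many tasks use both."""
    corr = {}
    for datasets in task_inputs.values():
        for a, b in combinations(sorted(set(datasets)), 2):
            corr[(a, b)] = corr.get((a, b), 0) + 1
    return corr


def cluster_by_correlation(task_inputs, threshold):
    """Group data sets linked by a correlation of at least `threshold`."""
    parent = {d: d for datasets in task_inputs.values() for d in datasets}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for (a, b), count in data_correlation(task_inputs).items():
        if count >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in parent:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical assignment of input data sets to tasks.
task_inputs = {
    "t1": ["d1", "d2"],
    "t2": ["d1", "d2", "d3"],
    "t3": ["d3", "d4"],
    "t4": ["d3", "d4"],
    "t5": ["d5"],
}
clusters = cluster_by_correlation(task_inputs, threshold=2)
```

Data sets in one cluster are candidates for placement in the same data center, so that the tasks using them together trigger fewer transfers.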
• A new energy-aware task scheduling method for data-intensive applications in the cloud, Journal of Network and Computer Applications, 2016, 59:14-27. (SCI: WOS:000367491600003)
• A Data Placement Algorithm for Data Intensive Applications in Cloud, International Journal of Grid and Distributed Computing, 2016, 9(2):145-156. (EI: 20161002063730)
• A data placement strategy for data-intensive scientific workflows in cloud, 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015, 928-934, 2015.5.4-2015.5.7. (EI: 20153701270757)
• Heuristic Data Placement for Data-Intensive Applications in Heterogeneous Cloud, Journal of Electrical & Computer Engineering, 2016, 2016(13):1-8. (EI)
• Qing Zhao, Haonan Dai, Congcong Xiong, Peng Wang, Heuristic Data Layout for Heterogeneous Cloud Data Centers, 2015 International Symposium on Information Technology Convergence, 2015.10.13-2015.10.15
• Qing Zhao, Jizhou Sun, Ce Yu, Jian Xiao, Chenzhou Cui, Xiao Zhang, Improved parallel processing function for high-performance large-scale astronomical cross-matching, Transactions of Tianjin University, 2011, 17(1):62-67. (EI: 20112013983867)
• Qing Zhao, Congcong Xiong, An Improved Data Layout Algorithm Based on Data Correlation Clustering in Cloud, 2014 International Symposium on Information Technology Convergence, 2014
• Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Liqiang Lv, Jian Xiao, A paralleled large-scale astronomical cross-matching function, 9th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2009, 604-614, 2009.6.8-2009.6.11. (EI: 20093912332041)
• Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Jian Xiao, Big data oriented paralleled astronomical cross-match, Journal of Computer Application, 2010, 30(8):2056-2059
• Qing Zhao, Jizhou Sun, Jian Xiao, Ce Yu, Chenzhou Cui, Xu Liu, Ao Yuan, Distributed astronomical cross-match based on MapReduce, Journal of Computer Application Research, 2010, 27(9):3322-3325