Tianjin University of Science & Technology
A young group of the China-VO family

Big-data-oriented astronomical data processing based on Hadoop & Spark
• Footprint generation
• Cross-match

Task scheduling strategies for astronomical workflows in the cloud

Footprint Generation
• Sky coverage: an important piece of information about astronomical observations.
• Applications:
  • intersections
  • unions
  • other logical operations based on the geometric coverage of regions of the sky
  • cross-match
• Multi-order coverage HEALPix maps generated on the Hadoop & Spark platform

Footprint generation based on Spark
Data: 2MASS, 12.6 GB, 41,067,000 records
Environment: dual-core nodes with 4 GB memory, Spark 2.0.2, Hadoop 2.7.3

  Node number    4      8
  Time (s)       138    69

Hadoop-based cross-match
Step 1: data distribution (1 Map + 1 Reduce)
Step 2: distance calculation (1 Map)

Experimental results
Data: SDSS, 100,106,811 records; 2MASS, 470,992,970 records
[Figure: cross-match time (s) versus node number]

  Node number    4      8      16     32     64
  Time (s)       273    136    69     38     25

Spark-based cross-match
• integrated with footprint generation

Typical rich-BoT workflows

Further optimization
• scientific workflow scheduling research

China-VO and Alibaba Cloud
The China-VO cloud will provide users with:
• data
• software
• computing resources

Science workflow
• one of the most commonly used application models in astronomy
[Figure: example workflow graph with tasks t1 to t5 and datasets d1 to d8]

What is worthy of concern
• running efficiency
• rental cost
• energy consumption
[Figure: the user uploads data and launches the application; the goals are high performance, low energy consumption, and low cost]

How to achieve these goals?
• data placement
• resource allocation
• task scheduling
[Figure: workflow tasks t1 to t5 and datasets d1 to d8 grouped for placement]

Characteristics of astronomical workflow applications
• Data-intensive & compute-intensive
• Rich-BoT structures
• Task execution time difficult to estimate
• Complex network structure
• Heterogeneous machines

Our contributions
1. Task and data clustering based on data correlation
2. Cloud environment modeling and the heuristic-rule-based task scheduling method
3. Dynamic multi-layer deadline decomposition
4. Multi-objective optimization

• A new energy-aware task scheduling method for data-intensive applications in the cloud, Journal of Network and Computer Applications, 2016, 59: 14-27. (SCI: WOS:000367491600003)
• A Data Placement Algorithm for Data Intensive Applications in Cloud, International Journal of Grid and Distributed Computing, 2016, 9(2): 145-156. (EI: 20161002063730)
• A data placement strategy for data-intensive scientific workflows in cloud, 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2015), 928-934, 2015.5.4-2015.5.7. (EI: 20153701270757)
• Heuristic Data Placement for Data-Intensive Applications in Heterogeneous Cloud, Journal of Electrical & Computer Engineering, 2016, 2016(13): 1-8. (EI)
• Qing Zhao, Haonan Dai, Congcong Xiong, Peng Wang, Heuristic Data Layout for Heterogeneous Cloud Data Centers, 2015 International Symposium on Information Technology Convergence, 2015.10.13-2015.10.15.
• Qing Zhao, Jizhou Sun, Ce Yu, Jian Xiao, Chenzhou Cui, Xiao Zhang, Improved parallel processing function for high-performance large-scale astronomical cross-matching, Transactions of Tianjin University, 2011, 17(1): 62-67. (EI: 20112013983867)
• Qing Zhao, Congcong Xiong, An Improved Data Layout Algorithm Based on Data Correlation Clustering in Cloud, 2014 International Symposium on Information Technology Convergence, 2014.
• Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Liqiang Lv, Jian Xiao, A paralleled large-scale astronomical cross-matching function, 9th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2009), 604-614, 2009.6.8-2009.6.11. (EI: 20093912332041)
• Qing Zhao, Jizhou Sun, Ce Yu, Chenzhou Cui, Jian Xiao, Big data oriented paralleled astronomical cross-match, Journal of Computer Application, 2010, 30(8): 2056-2059.
• Qing Zhao, Jizhou Sun, Jian Xiao, Ce Yu, Chenzhou Cui, Xu Liu, Ao Yuan, Distributed astronomical cross-match based on MapReduce, Journal of Computer Application Research, 2010, 27(9): 3322-3325.
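
Illustrative sketch: the two-step cross-match
The slides describe the Hadoop-based cross-match only as "data distribution (1 Map + 1 Reduce)" followed by "distance calculation (1 Map)". The single-process Python sketch below shows one way those two steps could fit together. It is not the group's implementation. The declination-zone partitioning, the (id, ra, dec) record layout, and the names zone_of, distribute, match_zone and cross_match are all assumptions made for illustration; a production system might partition by HEALPix cell instead.

```python
import math
from collections import defaultdict

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def zone_of(dec, zone_height):
    """Hypothetical partition key: index of the declination band."""
    return int(math.floor((dec + 90.0) / zone_height))

def distribute(cat_a, cat_b, zone_height):
    """Step 1 (stands in for 1 Map + 1 Reduce): key each record by zone,
    then group records that share a key."""
    zones = defaultdict(lambda: ([], []))
    for rec in cat_a:
        # Catalog A records go to exactly one zone.
        zones[zone_of(rec[2], zone_height)][0].append(rec)
    for rec in cat_b:
        # Catalog B records are copied to the neighbouring zones as well,
        # so pairs that straddle a zone border are not lost.
        z = zone_of(rec[2], zone_height)
        for neighbour in (z - 1, z, z + 1):
            zones[neighbour][1].append(rec)
    return zones

def match_zone(a_recs, b_recs, radius_deg):
    """Step 2 (stands in for 1 Map): brute-force distance test in one zone."""
    return [(a[0], b[0])
            for a in a_recs
            for b in b_recs
            if angular_sep_deg(a[1], a[2], b[1], b[2]) <= radius_deg]

def cross_match(cat_a, cat_b, radius_deg, zone_height=0.5):
    """Return (id_a, id_b) pairs closer than radius_deg on the sky."""
    if radius_deg > zone_height:
        raise ValueError("zone_height must be at least the match radius")
    matches = []
    zones = distribute(cat_a, cat_b, zone_height)
    for key in sorted(zones):
        a_recs, b_recs = zones[key]
        matches.extend(match_zone(a_recs, b_recs, radius_deg))
    return matches
```

Because each catalog A record lives in a single zone and each catalog B record appears at most once per zone, every matching pair is reported exactly once. On a cluster, each zone would be handled by a separate task, which is the kind of data parallelism the node-number table reflects.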