IEEE/ACM International Conference on Big Data Computing

Conference Program
2016 IEEE/ACM 9th International Conference
on Utility and Cloud Computing
2016 IEEE/ACM 3rd International Conference
on Big Data Computing, Applications
and Technologies
6-9 December 2016
Shanghai, China
Message from the UCC 2016 General and Organizing Chairs
It is our great pleasure to welcome you to the 9th IEEE/ACM International Conference on Utility and
Cloud Computing (UCC 2016), held at Tongji University, Shanghai, China, from 6 to 9 December 2016.
The IEEE/ACM International Conference on Utility and Cloud Computing (UCC) is a premier
IEEE/ACM conference covering all areas related to cloud computing as a utility and provides an
international forum for leading researchers and practitioners in this important and growing field. As
with the previous successful instances of this conference series, UCC 2016, to be held in Shanghai,
brings academics and industrial researchers together to discuss leading innovations in cloud
computing research and novel uses of this technology in applications. We received 85 submissions
this year, of which 22 were accepted, leading to an acceptance rate of ~26%. We received papers
from 30 countries, with most submissions being from China, the United Kingdom, and the United
States. The country with the highest acceptance rate (given the number of submissions) is the United
Kingdom.
An international conference requires the hard work and dedication of many people. First, we would
like to thank Program Committee Chairs Josef Spillner (Zurich University of Applied Sciences,
Switzerland) and Alan Sill (Texas Tech, USA) together with all Program Committee members and
reviewers for their considerable time and effort. Their dedication and commitment to coordinating
the review process and ensuring that reviews were of high quality have been essential to once again
provide a high-quality program. We would like to thank Honorary Chair Prof. Rajkumar Buyya and
the Steering Committee members for their valuable help and support for UCC 2016. We would like
to express our appreciation to Workshop Chairs Khalid Elgazzar (Carnegie Mellon University, USA)
and Shangguang Wang (BUPT, China) for coordinating the workshops. We would also like to thank
Proceedings Chairs Samee Khan (North Dakota State University, USA), Ashiq Anjum (University of
Derby, UK), and Bo Yuan (Tongji University, China) for preparing the proceedings for the
conference. Special thanks go to the Publicity Chairs George Papadopoulos (University of Cyprus,
Cyprus), Zhiyi Huang (University of Otago, New Zealand), Zhangxi Lin (Texas Tech, USA), Yan
Wu (Jiangsu University, China), and Luiz Bittencourt (UNICAMP, Brazil) for their efforts to
distribute the Call for Papers in their respective regions around the world. We would like to thank
Poster Chair Rafael Tolosana-Calasanz (University of Zaragoza, Spain) for arranging the poster
session for the conference, and Doctoral Symposium Chairs Rami Bahsoon (University of
Birmingham, UK) and Zhihui Du (Tsinghua University, China) for organizing a PhD Symposium.
Collectively, their efforts have produced the peer-reviewed, high-quality program that you will see at
UCC 2016.
We also express our gratitude to the Registration and Financial Chair Richard Hill (University of
Derby, UK) for managing the registration system and other technical assistance and to Web Chair
John Panneerselvam (University of Derby, UK) for maintaining the conference website. A very
special appreciation goes out to the UCC 2016 local organizing team for their hard work and local
arrangements for the event.
Finally, we would like to thank the authors for choosing UCC 2016 as the venue at which to present
their research and all of the participants attending this event. We hope that the conference fosters
interaction among researchers and provides a stimulating forum for exchanging new ideas and
sharing development experiences in the rapidly changing field of cloud computing and utility
computing.
We would like to single out one individual who has played a major role in coordinating activities this
year. His hard work, people skills, and unending support for virtually all UCC activities have enabled
us to meet IEEE/ACM deadlines and produce the high-quality program you will see this year. This is
Prof. Lu Liu from the University of Derby. It has been a pleasure to interact with him over the last
year, and to work alongside him with other members of the organizing committee.
We hope you will all enjoy the conference and your stay in Shanghai, China.
Changjun Jiang, Tongji University, China
Omer Rana, Cardiff University, UK
Nick Antonopoulos, University of Derby, UK
UCC 2016 General Chairs
Zhijun Ding, Tongji University, China
Yaying Zhang, Tongji University, China
Cheng Wang, Tongji University, China
Lu Liu, University of Derby, UK
UCC 2016 Organizing Chairs
Message from the UCC 2016 Technical Program
Committee Co-Chairs
We are delighted to see UCC return to China this year. As a strongly community-driven conference, the
9th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2016) has the ambition
to bring together researchers and practitioners from all parts of the world. This ambition leads to
constant changes in conference locations, formats, topics, and trends. And yet, the topics do not diverge
too much from one year to another, suggesting that cloud and utility computing challenges remain
interesting everywhere.
This year, the carefully selected TPC members, many of whom are previous UCC authors, have
selected 15 full papers and 6 short papers out of 85 submissions in total. While acceptance rates are poor
metrics for conference quality, they are nevertheless a useful indicator, and in this case are determined to
be 18% for full papers and 26% overall. This highly selective process with 193 reviews in total confirms
the commitment by the UCC TPC and the steering committee to deliver a high-quality conference that
reports on significant works. Five interesting full-paper sessions and one packed short-paper session
represent current global cloud research.
Submissions have been received from 30 countries. The host country, China, saw 56 submissions,
followed by the United Kingdom with 26, and the United States with 20. The topics cover the whole
cloud stack from hardware and infrastructure to middleware, scheduling and monitoring up to the
applications, and their software design and tuning.
Enjoy the conference with all of its tracks, and enjoy the wonderful city of Shanghai together with the
social programme offered by the local host. Shanghai’s history is one of cultural encounters and of
debates about difficult situations, how to proceed, and which technologies to use to advance quickly.
It is the perfect setting for serious debates among academic and industrial researchers and other
attendees at UCC 2016.
Josef Spillner, Zurich University of Applied Sciences, Switzerland
Alan Sill, Texas Tech, USA
UCC 2016 Technical Program Committee Co-Chairs
Message from the IEEE/ACM BDCAT 2016 Program Co-Chairs
On behalf of the program committee, it is our pleasure to welcome you to the 3rd IEEE/ACM
International Conference on Big Data Computing, Applications and Technologies, being held in Shanghai,
China.
Rapid advances in digital sensors, networks, storage, and computation along with their availability at low
cost are leading to the creation of huge collections of data, dubbed “Big Data.” As a result, a Big Data
computing paradigm has emerged, enabling new insights that can change the way business, science, and
governments deliver services to their consumers, and can impact society as a whole. BDCAT provides an
international forum for researchers and practitioners to present and discuss new discoveries,
developments, and results, as well as the latest trends in big data computing, technologies, and
applications.
Since the 1st Big Data Computing conference (BDC 2014, held in London, UK), the conference has grown
continuously. This year we reviewed 100 submissions from 23 countries. The conference
accepted 24 papers as regular papers, leading to an acceptance rate of 24%. The conference also accepted
an additional 15 papers as short papers. For this we would like to acknowledge the dedication and
tremendous efforts of the program committee and reviewers, who gave their time and expertise as we
handled these submissions.
An event such as BDCAT 2016 is not possible without the coordinated efforts of many dedicated
individuals who volunteer their time and expertise. We would like to acknowledge the leadership of the
conference Honorary Chairs, Prof. Geoffrey Fox at Indiana University, Prof. Rajkumar Buyya at the
University of Melbourne, and Prof. Beng Chin Ooi at the National University of Singapore. We are also
grateful for the dedication and hard work of the Local Organizing Chair, Prof. Cheng Wang at Tongji
University. We also acknowledge the Publicity Chairs, Prof. Yaser Jararweh at the Jordan University of
Science and Technology and Prof. Shruti Kohli at the University of Birmingham.
We hope that you will find the BDCAT 2016 technical program interesting and thought provoking, and
that it provides you with a valuable opportunity to share ideas with researchers and practitioners from
academia and industry from around the world.
Prof. Ashiq Anjum
Department of Computing and Mathematics
University of Derby, UK
Prof. Xinghui Zhao
School of Engineering and Computer Science
Washington State University Vancouver, USA
Program at a Glance
Date/Time, Activity (rooms for parallel sessions in parentheses: C201, C301, C401, A401)
5-Dec
14:00 - 18:00 Registration (Location: Tongji Sino French Center, Siping Road Campus, Tongji University)
6-Dec
08:00 - 18:00 Registration (Location: Tongji Sino French Center, Siping Road Campus, Tongji University)
9:00 - 9:20 Opening (Location: C201)
9:20 - 10:20 Plenary Keynote 1: Minyi Guo, "Platform Development for Collaborative Computing with Urban Big Data" (Location: C201)
10:20 - 10:40 Tea/Coffee break (JCR)
10:40 - 12:10 Parallel Sessions: UCC 1 (C201), BDCAT 1 (C301), SCCTSA 1 (C401), BIUC 1 (A401)
12:10 - 13:10 Lunch: Tongji Sanhaowu Restaurant
13:10 - 14:10 Plenary Keynote 2: Jae Kyu Lee, "Can the Bright Cloud be a Business Model?"
14:10 - 15:40 Parallel Sessions: UCC 2 (C201), BDCAT 2 (C301), SCCTSA 2 (C401), BIUC 2 (A401)
15:40 - 16:00 Tea/Coffee break (JCR)
16:00 - 17:30 Parallel Sessions: UCC 3 (C201), BDCAT 3 (C301), Posters Session (C401), BIUC 3 (A401)
18:00 - 20:30 Reception (Location: Kingswell Hotel Tongji)
7-Dec
9:00 - 10:00 Plenary Keynote 3: Xinbing Wang, "Paperbook: Design and Implementation" (Location: C201)
10:00 - 10:30 Tea/Coffee break (JCR)
10:30 - 12:00 Parallel Sessions: UCC 4 (C201), BDCAT 4 (C301), SD3C 1 (C401)
12:00 - 13:00 Lunch: Tongji Sanhaowu Restaurant
13:00 - 14:00 Plenary Keynote 4: Geyong Min, "Cloud-Assisted and Data-Driven Knowledge Discovery for Future Internet" (Location: C201)
14:00 - 15:30 Parallel Sessions: UCC 5 (C201), BDCAT 5 (C301), CloudAM 1 (C401)
15:30 - 15:50 Tea/Coffee break (JCR)
15:50 - 17:20 Parallel Sessions: PhD Symposium (C201), BDCAT 6 (C301), CloudAM 2 (C401)
8-Dec
9:00 - 10:00 Plenary Keynote 5: Hui Lei, "When Big Data Meets Cognitive Computing ... on the Cloud" (Location: C201)
10:00 - 10:30 Tea/Coffee break (JCR)
10:30 - 12:00 Parallel Sessions: UCC 6 (C201), BDCAT 7 (C301), RTDPCC 1 (C401)
12:00 - 13:00 Lunch: Tongji Sanhaowu Restaurant
13:00 - 14:00 Plenary Keynote 6: Jiannong Cao, "Performance Modeling and Optimization in Mobile Cloud Computing Environment" (Location: C201)
14:00 - 15:30 Parallel Sessions: BDCAT 12 (C201), BDCAT 8 (C301), BDCAT 11 (C401)
15:30 - 15:50 Tea/Coffee break (JCR)
15:50 - 17:30 Plenary Panel: UCC and BDCAT (Location: C201)
18:00 - 20:30 Best Paper Awards/Planning for 2017/Banquet (Location: Kingswell Hotel Tongji)
9-Dec
9:00 - 10:00 Plenary Keynote 7: Dharma Rajan, "The Art of Transforming Traditional Utilities to the Cloud Model" (Location: C201)
10:00 - 10:30 Tea/Coffee break (JCR)
10:30 - 12:00 Parallel Sessions: IDP 1 (C201), BDCAT 9 (C301), RTDPCC 2 (C401)
12:00 - 13:00 Lunch: Tongji Sanhaowu Restaurant
13:00 - 14:30 Parallel Sessions: IDP 2 (C201), BDCAT 10 (C301), RTDPCC 3 (C401)
14:30 - 14:50 Tea/Coffee break (JCR)
14:50 - 17:30 Parallel Sessions: Tutorial 1 (C201), Tutorial 2 (C301), Tutorial 3 (C401)
17:30 Closing (Location: C201)
Conference Venues:
Registration, Plenary Keynotes and Technical Sessions: Tongji Sino French Center (同济中法中心, 四平路校区)
Lunch: Tongji Sanhaowu Restaurant (同济三好坞餐厅, 四平路校区)
Reception and Banquet: Kingswell Hotel Tongji (同济君禧大酒店, 四平路校区)
9th IEEE/ACM International Conference on Utility and Cloud Computing
Program
Room/Location: C201
Date/Time
Activity
Program
December 6th, 2016
10:40-12:10 Session 1: Hardware-as-a-Service and Energy
Session Chair: Gleb Radchenko
Anca Iordache, Guillaume Pierre, Peter Sanders, Jose Gabriel De F. Coutinho and Mark
Stillwell: "Democratizing High Performance in the Cloud with FPGA Groups" (full paper)
Peter Garraghan, Yaser Al-Anii, Jon Summers, Harvey Thompson, Nik Kapur and Karim
Djemame: "A Unified Model for Holistic Power Usage in Cloud Datacenter Servers" (full paper)
Mauro Canuto, Mario Macias and Jordi Guitart: "A Methodology for Full-System Power
Modeling in Heterogeneous Data Centers" (full paper)
14:10-15:40 Session 2: Emerging Topics in Cloud Computing
Session Chair: Radu Prodan
Mohan Baruwal Chhetri, Quoc Bao Vo and Ryszard Kowalczyk: "CL-SLAM: Cross-Layer
SLA Monitoring Framework for Cloud Service-Based Applications" (short paper)
Michael Borkowski, Stefan Schulte and Christoph Hochreiner: "Predicting Cloud Resource
Utilization" (short paper)
Wei Wang: "Towards an Emerging Cloudware Paradigm for Transparent Computing" (short paper)
Kuan-Hsin Lee, I-Cheng Lai and Che-Rung Lee: "Optimizing Back-and-forth Live
Migration" (short paper)
Thiago A. L. Genez, Luiz F. Bittencourt, Rizos Sakellariou and Edmundo Madeira: "A
Flexible Scheduler for Workflow Ensembles" (short paper)
William Tarneberg, Vishal Chandrasekaran and Marty Humphrey: "Experiences Creating a
Framework for Smart Traffic Control using AWS IOT" (short paper)
16:00-17:30 Session 3: Scheduling and Scalability
Session Chair: Luiz Bittencourt
Vahid Arabnejad, Kris Bubendorfer and Bryan Ng: "Deadline Distribution Strategies for
Scientific Workflow Scheduling in Commercial Clouds" (full paper)
Carlos Mera-Gómez, Rami Bahsoon and Rajkumar Buyya: "Elasticity Debt: A Debt-Aware
Approach to Reason About Elasticity Decisions in the Cloud" (full paper)
Hamid Mohammadi Fard, Sasko Ristov and Radu Prodan: "Handling the Uncertainty in
Resource Performance for Executing Workflow Applications in Clouds" (full paper)
December 7th, 2016
10:30-12:00 Session 4: Virtualisation
Session Chair: Fahimeh Farahnakian
Dinuni Fernando, Hardik Bagdi, Yaohui Hu, Ping Yang, Kartik Gopalan, Charles Kamhoua
and Kevin Kwiat: "Quick Eviction of Virtual Machines Through Proactive Live Snapshots" (full paper)
Vincenzo De Maio, Gabor Kecskemeti and Radu Prodan: "An Improved Model for Live
Migration in Data Centre Simulators" (full paper)
Muyang He, Paul Pang, Denis Lavrov, Ding Lu, Yuan Zhang and Abdolhossein
Sarrafzadeh: "Reverse Replication of Virtual Machines (rRVM) for Low Latency and
High Availability Services" (full paper)
14:00-15:30 Session 5: Monitoring and Tuning
Session Chair: Mohan Baruwal Chhetri
Dániel Géhberger, Péter Mátray and Gábor Németh: "Data-Driven Monitoring for Cloud
Compute Systems" (full paper)
Bernhard Primas, Peter Garraghan, Karim Djemame and Natasha Shakhlevich: "Resource
Boxing: Converting Realistic Cloud Task Utilization Patterns for Theoretical Scheduling" (full paper)
Oleg Sukhoroslov, Sergey Volkov and Alexander Afanasiev: "Program Autotuning as a
Service: Opportunities and Challenges" (full paper)
December 8th, 2016
10:30-12:00 Session 6: Services and Federation
Session Chair: Dirk Habich
Yash Khandelwal, Suresh Purini and Puduru Reddy: "Fast Algorithms for Optimal Coalition
Formation in Federated Clouds" (full paper)
Philipp Leitner, Jürgen Cito and Emanuel Stöckli: "Modelling and Managing Deployment
Costs of Microservice-Based Cloud Applications" (full paper)
Richard Sinnott, Natasha Thomas, Himanshu Bansal and Zeyu Zhao: "My Ever
Changing Moods: Sentiment-based Event Detection on the Cloud" (full paper)
3rd IEEE/ACM International Conference on Big Data Computing, Applications and
Technologies
Program
Room/Location: C301 (exceptions: Session 11 in C401, Session 12 in C201)
Date/Time
Activity
Program
December 6th, 2016
10:40-12:10 Session 1: Big Data & Machine Learning
Session Chair: Ying Xie
Linh Le, Jie Hao, Ying Xie and Jennifer Priestley: "Deep Kernel: Learning Kernel Function
from Data Using Deep Neural Network" (full paper)
Salman Salloum and Joshua Zhexue Huang: "Empirical Analysis of Asymptotic Ensemble
Learning for Big Data" (full paper)
Muhammad Usman Yaseen: "Cloud-based Blur and Illumination Invariant Object
Classification" (full paper)
14:10-15:40 Session 2: Hadoop & Spark
Session Chair: Iman Elghandour
Orazio Tomarchio, Giuseppe Di Modica, Marco Cavallo and Carmelo Polito: "H2F: a
Hierarchical Hadoop Framework for big data processing in geo-distributed environments" (full paper)
Shashank Gugnani, Xiaoyi Lu and Dhabaleswar Panda: "Performance Characterization of
Hadoop Workloads on SR-IOV-enabled Virtualized InfiniBand Clusters" (full paper)
Yi Chen and Behzad Bordbar: "DRESS: A Rule Engine on Spark for Event Stream
Processing" (full paper)
16:00-17:30 Session 3: Visualization & Social Networks
Session Chair: Mohsen Farid
Chris Muelder, Robert Faris and Kwan-Liu Ma: "A Visual Analytics Approach to Author
Name Disambiguation" (full paper)
Ying Xie, Pooja Chenna, Jing Selena He, Linh Le and Jacey Planteen: "Visualization of Big
High Dimensional Data in a 3 Dimensional Space" (short paper)
Aqsa Hameed, Saqib Ali, Roger Cottrell and Bebo White: "Applying Big Data Warehousing
and Visualization Techniques on PingER Data" (short paper)
Bo Yuan, Lu Liu and Nick Antonopoulos: "Efficient Service Discovery in Decentralized
Online Social Networks" (short paper)
December 7th, 2016
10:30-12:00 Session 4: Health Applications
Session Chair: Richard Sinnott
Arjun Athreya, Kee Yuan Ngiam, Zhaojing Luo, Tai E Shyong, Zbigniew Kalbarczyk and
Ravishankar Iyer: "Towards Longitudinal Analysis of a Population’s Electronic Health
Records using Factor Graphs" (full paper)
Chunxiao Xing, Fengjing Shao, Shunyao Wu and Rencheng Sun: "Disease gene discovery of
single-gene disorders based on Complex Network" (full paper)
Mohammed Bahja and Mark Lycett: "Identifying Patient Experience from Online Resources
via Sentiment Analysis and Topic Modelling Approach" (short paper)
14:00-15:30 Session 5: Data Model & Information Retrieval
Session Chair: Xinghui Zhao
Amir Sinaeepourfard, Jordi Garcia, Xavier Masip-Bruin and Eva Marín-Tordera: "Towards
a Comprehensive Data LifeCycle model for Big Data Environments" (full paper)
Christina Lioma, Birger Larsen, Wei Lu and Yong Huang: "A Study of Factuality,
Objectivity and Relevance: Three Desiderata in Large-Scale Information Retrieval?" (full paper)
Francisco J. Clemente-Castelló, Bogdan Nicolae, M. Mustafa Rafique, Rafael Mayo Gual
and Juan Carlos Fernandez: "On Exploiting Data Locality for Iterative MapReduce
Applications in Hybrid Clouds" (short paper)
15:50-17:20 Session 6: Spatial Data Analytics
Session Chair: Yong Xue
Mariam Malak Fahmy, Iman Elghandour and Magdy Nagy: "CoS-HDFS: Co-Locating
Geo-Distributed Spatial Data in Hadoop Distributed File System" (full paper)
Lau Pik Lik, Tanmay Chaturvedi, Kai Kiat Benny Ng, Kai Li, Marakkalage Sumudu
Hasala and Chau Yuen: "Spatio and Temporal Analysis of Urban Space Utilization
Renewable Wireless Sensor Network" (full paper)
Vinutha Magal Shreenath and Sebastiaan Meijer: "Spatial Big Data for designing large
scale infrastructure" (short paper)
December 8th, 2016
10:30-12:00 Session 7: Scalability & Performance
Session Chair: Radu Prodan
Jingcai Guo: "An Improved Incremental Training Approach for Large Scaled Dataset
based on Support Vector Machine" (full paper)
Linlin You and Bige Tunçer: "SAPAM: a Scalable “Activities in Places” Analysis
Mechanism for Informed Place Design" (full paper)
Mohamed Hassaan and Iman Elghandour: "DAMB: A Real-Time Big Data Analysis
Framework on a CPU/GPU Heterogeneous Cluster: A Meteorological Application Case
Study" (full paper)
14:00-15:30 Session 8: Pattern Detection & Recognition
Session Chair: Philipp Leitner
Christophe Courtin and Miguel Tomasena: "A Benchmarking Platform for Analyzing
Corpora of Traces: The recognition of the users' involvement in fields of competencies" (full paper)
Abdul Razaq, Huaglory Tianfield and Peter Barrie: "A Big Data Analytics Based Approach
to Anomaly Detection" (full paper)
Jingjiao Zhang, Jiaqing Fu, Chunhong Zhang and Zheng Hu: "Not Too Late to Identify
Potential Churners: Early Churn Prediction in Telecommunication Industry" (short paper)
14:00-15:30 Session 11: Big Data Applications 1
Session Chair: Nick Antonopoulos (Room: C401)
Patrick Glauner, Jorge Meira, Lautaro Dolberg, Radu State, Franck Bettinger, Yves
Rangoni and Diogo Duarte: "Neighborhood Features Help Detecting Non-Technical Losses
in Big Data Sets" (full paper)
Qing Sun, Niu Jianwei, Zhong Yao and Qiu Dongmin: "Research on Semantic Orientation
Classification of Chinese Online Product Reviews Based on Multi-aspect Sentiment
Analysis" (short paper)
Anne Tall, Jun Wang and Dezhi Han: "Survey of Data Intensive Computing Technologies
Application to Security Log Data Management" (short paper)
Mohammed Nasser, Ibrahim Kamel and Zaher Al Aghbari: "Social Community Detection
based on Node Distance and Interest" (short paper)
14:00-15:30 Session 12: Big Data Applications 2
Session Chair: Rafael Tolosana (Room: C201)
Guoying Zhang, Min He, Hao Wu, Guanghui Cai and Jianhong Ge: "Non-negative Multiple
Matrix Factorization with Social Similarity for Recommender Systems" (full paper)
Jian Li, Guanjun Liu, Changjun Jiang and Chungang Yan: "A Hybrid Method of
Recommending POIs Based on Context and Personal Preference Confidence" (short paper)
Jie Hou and Ya Zhang: "Synergy and Antagonism in Online Advertising" (short paper)
Manxing Du, Radu State, Mats Brorsson and Tigran Avanesov: "Behavior Profiling for
Mobile Advertising" (short paper)
December 9th, 2016
10:30-12:00 Session 9: Visual and Graph Analytics
Session Chair: Luiz Fernando Bittencourt
Tim Kiefer, Dirk Habich and Wolfgang Lehner: "Penalized Graph Partitioning based
Allocation Strategy for Database-as-a-Service Systems" (full paper)
Phuong-Hanh Du and Ngoc-Hoa Nguyen: "Optimizing the shortest path query on
large-scale dynamic directed graph" (full paper)
Zhenglong Yan, Wenjian Luo, Chenyang Bu and Li Ni: "Clustering Spatial Data by the
Neighbors Intersection and the Density Difference" (short paper)
13:00-14:30 Session 10: Memory & Storage
Session Chair: Ashiq Anjum
Jie Zhou, Yanyan Shen, Sumin Li and Linpeng Huang: "NVHT: An Efficient Key-Value
Storage Library for Non-Volatile Memory" (full paper)
Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov and Eduard Ayguade: "Node
Architecture Implications for In-Memory Data Analytics in Scale-in Clusters" (full paper)
Mehnuma Tabassum Omar and K. M Azharul Hasan: "A Scalable Storage System for
Structured Data based on Higher Order Index Array" (short paper)
3rd International Workshop on Smart City Clouds: Technologies, Systems and
Applications (SCCTSA 2016)
Program
Room/Location: C401
Date/Time
Activity
Program
6th December
10:40 - 12:10 Systems and Application
Session Chair: Zaheer Khan
Opening
Guest Talk: Richard McClatchey - Emerging Technologies for Supporting Smarter Cities
Antorweep Chakravorty, Bikash Agrawal, Tomasz Wiktorski and Chunming Rong:
"Enrichment of Machine Learning based Activity Classification in Smart Homes using
Ensemble Learning"
Kamran Soomro, Zaheer Khan and Khawar Hasham: "Towards Provisioning of Real-time
Smart City Services Using Clouds"
Rawad Hammad and David Ludlow: "Towards A Smart Learning Environment for Smart
City Governance"
14:10 - 15:40 Security and Safety
Session Chair: Zaheer Khan
Aljawharah Almuaythir and Mohammad Anwar Hossain: "Cloud-Based Parametrized
Publish/Subscribe System for Public Safety Applications in Smarter Cities"
Shohreh Hosseinzadeh, Samuel Laurén and Ville Leppänen: "Security in Container-based
Virtualization Through Vtpm"
Yu Lei, Philip S. Yu: "Service Topic Model with Probability Distance"
5th International Workshop on Clouds and (eScience)
Applications Management - CloudAM 2016
Program
Room/Location: C401
Date/Time
Activity
Program
7th December
14:00 - 15:30
Service Composition,
Scheduling & Performance
Session Chair: Luiz Bittencourt
Opening
Kuo-Chan Huang, Yu-Chun Lu, Meng-Han Tsai, Ying-Jhih Wu and Hsi-Ya Chang:
"Performance-Efficient Service Deployment and Scheduling Methods for Composite Cloud
Services"
Bilkisu Larai Muhammad-Bello, Masayoshi Aritsugi: "TCloud: A Transparent Framework
for Public Cloud Service Comparison"
Jorge Mario Cortés-Mendoza, Andrei Tchernykh, Alexander Drozdov and Loic Didelot:
"Robust Cloud VoIP Scheduling with VMs Startup Time Delay Uncertainty"
Vojtech Uhlir, Ondrej Tomanek, Lukas Kencl: "Latency-based Benchmarking of Cloud
Service Providers"
15:50 - 17:20
Resource Management
Session Chair: Ashiq Anjum
Luiz Henrique Nunes, Julio Cezar Estrella, Stephan Reiff-Marganiec, Alexandre Cláudio
Botazzo Delbem and Charith Perera: "The Effects of Relative Importance of User
Constraints in Cloud of Things Resource Discovery: A Case Study"
Victor Medel, Omer Rana, Jose Angel Bañares and Unai Arronategui: "Modelling
Performance and Resource Management in Kubernetes"
Rafael Tolosana-Calasanz, Javier Diaz-Montes, Luiz F. Bittencourt, Omer Rana and Manish
Parashar: "Capacity Management for Streaming Applications over Cloud Infrastructures
with Micro Billing Models"
Closing
5th International Workshop on Bright Internet-based Utility Computing
BIUC 2016
Program
Room/Location: A401
Date/Time
Activity
Program
6th December
10:40-12:10 Session 1: Big data & bright Internet
Session Chair: Zhangxi Lin
Dazeng Yuan, Mingxing He, Shengke Zeng, Xiao Li, and Long Lu: "(t,p)-Threshold Point
Function Secret Sharing Scheme Based on Polynomial Interpolation And Its Application" (full paper)
Shengke Zeng, Shuangquan Tan, Yong Chen, Mingxing He, Meichen Xia, and Xiao Li:
"Privacy-preserving Location-based Service based on Deniable Authentication" (full paper)
Meina Song, Xuejun Zhao, Haihong E, Zhonghong Ou: "Statistic-based CRM approach via time
series segmenting RFM on large scale data" (full paper)
Hu Yang and Yu He: "The Penalized Weighted Clustering Algorithm for Missing and Noisy
Data" (full paper)
14:10-15:40 Session 2: Social network analytics
Session Chair: Zhu Jian Ming
Jianfeng Li, Zhangxi Lin and Jiaxian Qiu: "A social network-based evaluation of credit in online
P2P lending market" (full paper)
Guofang Ma, Yuexuan Wang, Xiaolin Zheng and Litao Xiao: "Leveraging Social Trust Relations
to Improve Cross-domain Recommendation" (full paper)
Rui Gu and Kanliang Wang: "Empirical Study of Mobile Social Network Users’ Dissemination
of Health-Threatening Information" (full paper)
Yuhao Li and Kanliang Wang: "Avoid It in a private and social space? An Empirical Study of
Marketers-generated Content Avoidance" (full paper)
Wenping Zhang and Wei Xu: "Is a Hospital Reliable? Its Website Tells" (full paper)
16:00-17:30 Session 3: Emerging Topics
Session Chair: Wei Xu
Arodh Lal Karn, Niranjan Sapkota and Muhammad Rafiq: "Incorporating News in Real Time
Trading: A High Frequency Trading perspective" (full paper)
Zhang Ge, Zhang Shuo and Yang Yiping: "Analysis of Hotelling Model in Enterprise Cloud
Computing Competition based on User Participation" (full paper)
Yufan Wang and Yingjing Wu: "Research on Determinants of E-commerce Blend Degree on
Sustained Competitive Abilities of SME in Inner Mongolia" (full paper)
Fu Yong Gui and Zhu Jian Ming: "Operation Mechanism and Data Value Analysis for G2B
System Based on Blockchain" (full paper)
Ning Zhang and Shan Zhong: "Using Blockchain to Protect Personal Privacy in the Scenario of
Internet Car Rental" (full paper)
2nd Workshop on Sustainable Data Centers and Cloud Computing
SD3C 2016
Program
Room/Location: C401
Date/Time
Activity
Program
7th December
10:30 - 12:00 Sustainable Data Centers and Cloud Computing
Session Chair: Robert Birke
Opening
Bob Duncan, Andreas Happe and Alfred Bratterud: "Enterprise IoT Security and
Scalability: How Unikernels can Improve the Status Quo"
Mstapha Ait-Idir and Nazim Agoulmine: "Enhancing Cloud capabilities for SLA
enforcement of Cloud scheduled applications"
Petteri Mäki, Sampsa Rauti, Shohreh Hosseinzadeh, Lauri Koivunen and Ville
Leppänen: "Interface Diversification in IoT Operating Systems"
Alessandro Carrega and Matteo Repetto: "Exploiting Novel Software Development
Paradigms to Increase the Sustainability of Data Centers"
Xiangyue Huang, Zhifeng Zhao and Honggang Zhang: "Latency Analysis of
Cooperative Caching with Multicast for 5G Wireless Networks"
5th International Workshop on Intelligent Data Processing
IDP 2016
Program
Room/Location: C201
Date/Time
Activity
Program
9th December
10:30 - 12:00 Recognition and Prediction
Session Chair: Haolan Zhang
Keynote: Sonya Zhang, "The Use of Machine Learning in Business"
Wen Xu, Jing He and Hao Lan Zhang: "Real-Time Target Detection and Recognition with
Deep Convolutional Networks for Intelligent Visual Surveillance"
Valentina Franzoni, Giulio Biondi, Alfredo Milani and Yuanxi Li: "Web-based Similarity for
Emotion Recognition in Web Objects"
Bilal Mehboob, Muzamal Liaqat and Nazar Abbas: "Student Performance Prediction and
Risk Analysis by Using Data Mining Approach"
13:00 - 14:30 Intelligent Processing
Session Chair: Haolan Zhang
Xiaoyun Li, Shizhong Huang, Huanyu Zhao, Xueyan Guo, Libo Xu, Xingsen Li and Youjia
Li: "Image Compression Based on Restricted Wavelet Synopses with Maximum Error
Bound"
Mengyuan Pan, Yang Yang and Zhenqiang Mi: "Research on an extended SVD
Recommendation algorithm based on user’s neighbor model"
Wang Suzhen and Zhou Haowei: "The research of MapReduce load balancing based on
multiple partition algorithm"
International Symposium on Real-time Data Processing for Cloud Computing
(RTDPCC 2016)
Program
Room/Location: C401
Date/Time
Activity
Program
8th December
10:30-12:00 Data Processing
Session Chair: Lu Liu
Keynote: Prof Yong Xue, 'Big Earth Data – a New Dimension for Digital Earth'
Nan Guo, Yuan He, ChunGang Yan, Lu Liu, Cheng Wang: "Multi-level Topical Text
Categorization with Wikipedia"
Jun Yu and Zengfu Wang: "A Monocular Video-Based Facial Expression Recognition
System by Combining Static and Dynamic Knowledge"
Paul Comerford, John N. Davies and Vic Grout: "Reducing packet delay through filter
merging"
9th December
10:30 - 12:00 Strategy and Applications
Session Chair: Xiaojun Zhai
Keynote: Dr Liangxiu Han, “Meeting Society Challenges: Big Data Driven Approaches”
Guilin Shao and Jiming Chen: "A Load Balancing Strategy Based on Data Correlation in
Cloud Computing"
Xingzhen Bai, Maoyong Cao, Lu Liu and John Panneerselvam: "Collaborative Estimation and
Actuation of Wireless Sensor and Actuator Networks for the Greenhouse Environment"
Li Song, Hong Zhong and Jie Cui: "A Certificateless Public Auditing Scheme for
Cloud-based Wireless Body Area Network with Revocation and Privacy Preserving"
13:00 - 14:30 Network Access and Control
Session Chair: Xiaojun Zhai
Jie Cui, Hong Zhong and Xuan Tang: "A Fined-grained Privacy-Preserving Access Control
Protocol in Wireless Sensor Networks"
Mohammad Al-Athamneh, Fatih Kurugollu, Danny Crookes and Mohsen Farid: "Video
Authentication Based on Statistical Local Information"
Shoukun Wang, Kaigui Wu and Changze Wu: "Attribute-Based Solution with Time
Restriction Delegate for Flexible and Scalable Access Control in Cloud Computing"
Bksp Kumar Raju, Bhupendra Moharil and G Geethakumari: "FaaSeC: Enabling
Forensics-as-a-Service for Cloud Computing Systems"
PhD Symposium
Program
Room/Location: C201
Date/Time
Activity
Program
7th December
15:50 - 17:20 PhD Symposium
Session Chair: Zhihui Du
Carlos Ruiz, Hector A. Duran-Limon, Nikos Parlavantzas: "Towards a Software
Product Line-based approach to adapt IaaS cloud configurations"
Salim Saay, Alex Norta: "A Reference Architecture for a National e-Learning
Infrastructure"
Yatheendraprakash Govindaraju, Hector Duran-Limon: "A QoS and Energy aware
Load Balancing and Resource Allocation Framework for IaaS Cloud"
Moeez Masroor Subhani, Ashiq Anjum: "Clinical and Genomics Data"
Bilal Arshad, Ashiq Anjum: "Graph Based Data Integration for System Integrity and
Scalable Analytics"
Posters Session
Program
Room/Location: C401
Date/Time: 6th December, 16:00 - 17:30
Activity: Posters Session
Session Chair: Rafael Tolosana-Calasanz
16:00 Sima Soltani, Khalid Elgazzar and Patrick Matin: "QuRAM Service Recommender: A
Platform for IaaS Service Selection"
16:15 Victor Medel, Omer Rana, Jose A. Bañares and Unai Arronategui: "Adaptive Application
Scheduling under Interference in Kubernetes"
16:30 Gleb Radchenko, Dmitry Savchenko and Nikita Ashikhmin: "Testing-as-a-Service
Approach for Cloud Applications"
16:45 Ke Zhang, Ran Zhao, Hongxia Zhang, Lei Yu, Yisong Chang, Zhao Zhang and Mingyu
Chen: "Titian2: A Scalable System-level Emulator with All Programmability for Datacenter
Servers in Cloud Computing"
17:00 Jianxiong Wan, Gefei Zhang, Xiang Gui and Ran Zhang: "Reducing the VM Instance
Rental Cost in the Cloud Spot Market"
Tutorials
Program
Date/Time: 9th December, 14:50-17:30
Tutorial 1 (Room C201): Patrick Glauner and Radu State, "Deep Learning on Big Data Sets in
the Cloud with Apache Spark and Google TensorFlow"
Tutorial 2 (Room C301): Fionn Murtagh and Mohsen Farid, "Survey Analytics from
Questionnaires and Textual Social Media Analytics"
Tutorial 3 (Room C401): Jiming Wu, "Use Amazon Elastic MapReduce to Process Big Data"
Plenary Keynote: Platform Development for Collaborative
Computing with Urban Big data
Professor Minyi Guo, Shanghai Jiao Tong University, China
Abstract:
Nowadays, sensing technologies and large-scale computing infrastructures have produced a variety
of big data in urban spaces, e.g. human mobility, air quality, traffic patterns, and geographical data.
This big data carries rich knowledge about a city and can help tackle urban challenges when used
correctly. We believe this is the right time to research holistic urban big data, which has been made
possible by recent advances in communication technologies that allow wireless connection and
untethered data exchange among vast numbers of urban sensing and computing devices, as well as by
advanced data and computing science that provides the necessary methods and computing power to
understand, model, and reason about urban data and people. In this talk, we give some properties for
processing urban big data, introduce a system for urban big data processing, and discuss how
collaborative computing bridges the data and computation in cyberspace and the environment, systems, people
and things in the physical world.
Biography:
Minyi Guo is currently Zhiyuan Chair professor and chair of the Department of
Computer Science and Engineering, Shanghai Jiao Tong University (SJTU),
China. Before joining SJTU, Dr. Guo was a professor in the School of
Computer Science and Engineering, University of Aizu, Japan. Dr. Guo received
the National Science Fund for Distinguished Young Scholars from NSFC in 2007,
and was supported by the “1000 recruitment program of China” in 2010. His
present research interests include parallel/distributed computing, compiler
optimizations, embedded systems, pervasive computing, and cloud computing.
He has more than 300 publications in major journals and international conferences in these areas,
including the IEEE Transactions on Parallel and Distributed Systems, the IEEE Transactions on
Computers, the ACM Transactions on Autonomous and Adaptive Systems, INFOCOM, IPDPS, ICS,
ISCA, HPCA, SC, WWW, PODC, etc. He received 5 best paper awards from international
conferences. He is on the editorial board of IEEE Transactions on Parallel and Distributed Systems
and Journal of Parallel and Distributed Computing.
Plenary Keynote: Can the Bright Cloud be a Business Model?
Professor Jae Kyu Lee, Korea Advanced Institute of Science & Technology, South Korea
Abstract:
The Bright Internet aims at a safer Internet platform where malicious behaviors can be deterred
because their origins can be identified. As such, the primary goal of the Bright Internet is
the establishment of a Preventive Security paradigm, in contrast with the current paradigm of
protective security, in which each party secures only its own system.
The current cloud computing service providers have no choice but to adopt the protective security
paradigm. In this talk, the benefits of adopting the Bright Internet platform in cloud service
provisioning will be presented. A question is how to motivate the individual Cloud Service Providers
(CSPs) to adopt the Bright Internet platform.
For this purpose, we analyze the benefits of adopting the Bright Internet platform in terms of
marketing, economy, and compliance with regulation.
1) Marketing Advantage: Suppose that the Bright Internet Global Governance Center certifies the
cleanness level of outgoing messages which will upgrade their trustworthiness to their online business
partners. If the clients of a CSP need such trustworthiness for their business creation, then the CSP
needs to offer the Bright Internet based cloud services.
2) Economic Advantage: Suppose the Bright Internet Global Governance Center evaluates the levels
of harm created by the originating companies such as CSPs. If the cost of preventive measures is
lower than the payment for the penalty, CSPs will be motivated to invest in preventive security
for their clients.
3) Compliance Advantage: If the social value of preventive security is bigger than the sum of
individual investments for it, the legislation that requires the preventive security measures will be
socially justified. Then the CSPs will have a good reason to adopt the preventive measures like the
Bright Internet.
We present the architecture of Bright Cloud that justifies these business models. To explain the
concept of Bright Cloud, this talk will explain the three goals of Bright Internet (Preventive Security,
Freedom of Anonymous Expression for the Innocent Netizens, and Privacy Protection) and Five Basic
Principles (Origin Responsibility, Deliverer Responsibility, Identifiable Anonymity, Global
Collaborative Search, and Privacy Protection). The specific Bright Cloud business models may adopt
the essential principles that are most suitable for the specific business strategy. The first mover of
Bright Cloud will be able to get the benefit of marketing advantage, and eventually the benefits of
economic and compliance advantages.
Biography:
Jae Kyu Lee was the HHI Chair Professor at the Korea Advanced Institute
of Science and Technology (KAIST), and has been Professor Emeritus at
KAIST since September 2016. He is currently the Director Emeritus of
Bright Internet Research Center at KAIST, a Distinguished Visiting
Professor at Heinz College of Carnegie Mellon University, and the
Honorary Yingluo Wang Professor at School of Management at Xian
Jiaotong University in China as a co-director of the Bright Internet
Global Governance Research Center, China.
He is the Immediate Past President and a Fellow of the Association for Information Systems, and
conference chair of the International Conference on Information Systems 2017 in Seoul. He is also the
chair of the inaugural Bright Internet Global Summit, which will be held in Seoul as a pre-ICIS 2017 event.
He received a Ph.D. in Operations and Information Systems from the Wharton School, University of
Pennsylvania (1985), and has been a Professor of Information Systems and Electronic Commerce at
KAIST since then. He was the founding editor-in-chief of the journal, Electronic Commerce Research
and Applications (Elsevier, SSCI and SCIE Accredited), and was the founding chair of the
International Conference on Electronic Commerce. He was a chair of the International Conference on
Electronic Commerce (ICEC 1998, and ICEC 2000) and Pacific Asia Conference on Information
Systems (2001, 2006).
He was the President of the Korea Society of Management Information Systems and the Korea Society of
Intelligent Information Systems, and served on the program committees of numerous international
conferences in information systems, intelligent systems, and e-commerce.
He authored four English books and seven Korean books with many editions in the area of Electronic
Commerce, Information Systems, and Intelligent Systems, including Electronic Commerce: A
Managerial Perspective (2014 Springer; coauthored with Efraim Turban) and Artificial Intelligence in
Finance and Investing (Irwin). He has published many papers in international journals such as MIS
Quarterly, Information Systems Research, Decision Support Systems, Communications of the ACM,
Management Science, International Journal of Electronic Commerce, Expert Systems with
Applications, European Journal of Information Systems, and many others. He has presented many keynote
speeches at ICIS, PACIS, AMCIS, and ICEC. He has received best paper awards ten times from
major conferences, and received a national decoration from the Korean Government for his contribution
to the development of the IT industry.
His research interest has been the application of Artificial Intelligence for Managerial Decision
Support, Electronic Commerce, and Green IT, and his current research interest is the establishment of
the Bright Internet platform. He has conducted 45 funded projects on topics including the Bright Internet,
Green Business, eCommerce strategies for financial sectors, SCM and eProcurement systems,
case-based project management systems, and intelligent scheduling systems for ship building, power
generation, and refinery.
Plenary Keynote: Paperbook: Design and Implementation
Professor Xinbing Wang, Shanghai Jiaotong University, China
Abstract:
In this keynote, we will introduce a novel academic system, Paperbook or AceMap, to analyze
big scholarly data and present the results through a “map” approach. AceMap integrates several
algorithms in the field of network analysis and data mining, and then displays the information in a clear
and intuitive way, aiming to help researchers in their work. After describing the big picture,
we present achieved results and our work in progress. So far, AceMap has implemented the following
functions: dynamic citation network display, paper clustering, academic genealogy, author and
conference homepages, etc. We have also designed and run distributed network analysis
algorithms on a cutting-edge Spark system and utilized modern visualization tools to present the
results. Finally, we conclude the keynote with an outlook on future work.
Biography:
Professor Xinbing Wang received the B.S. degree (with hons.) in
Automation from Shanghai Jiao Tong University, Shanghai, China, in 1998,
the M.S. degree in computer science and technology from Tsinghua
University, Beijing, China, in 2001, and the Ph.D. degree with a major in
electrical and computer engineering and minor in mathematics from North
Carolina State University, Raleigh, in 2006. Currently, he is a Professor in
the Department of Electronic Engineering, and Department of Computer
Science, Shanghai Jiao Tong University, Shanghai, China. Dr. Wang has
been an Associate Editor for IEEE/ACM Transactions on Networking, IEEE
Transactions on Mobile Computing, and ACM Transactions on Sensor
Networks. He has also served on the Technical Program Committees of several conferences including
ACM MobiCom 2012, 2014, ACM MobiHoc 2012-2017, and IEEE INFOCOM 2009-2017.
Plenary Keynote: Cloud-Assisted and Data-Driven Knowledge
Discovery for Future Internet
Professor Geyong Min, University of Exeter, U.K.
Abstract:
Autonomic Future Internet (AFI) coupled with the emerging SDN/NFV technologies is regarded as
a promising and viable solution for addressing many grand challenges faced by future 5G networks
and Cloud computing systems. The ambition of AFI is to exploit an autonomic, intelligent and
self-managing Future Internet with consequent improvement in system efficiency and performance,
increased profitability, and reduced OPEX and CAPEX. Two key features of AFI are
self-management and cognitive learning; the former is essential for complexity reduction and fast
adaptation to changing situations and the latter can increase the intelligence through flexible
knowledge utilization.
In this talk, we will present state-of-the-art network architecture for AFI that is seamlessly
integrated with SDN and NFV. The core Knowledge Plane within this unified architecture is
responsible for real-time network big data analysis and knowledge discovery in order to maintain
high-level behaviors of how the system should be configured, managed, and optimized. To establish a
powerful, flexible and scalable Knowledge Plane in AFI, we will present the innovative big data
processing technologies and cost-effective platform developed in Cloud-assisted computational
framework. This framework includes the unified representation of heterogeneous big data and
real-time incremental data analysis tools for extracting valuable insights to support better decision making
for system design, resource management and optimization. This talk offers the theoretical
underpinning for efficient processing of big data, and also opens up a new horizon of research and
development by exploiting the key intelligence and insights hidden in rich network big data for design
and improvement of Future Internet and Cloud computing systems.
Biography:
Professor Geyong Min is a Chair in High Performance Computing and
Networking with the Computer Science discipline in the College of
Engineering, Mathematics and Physical Sciences at the University of Exeter,
UK. His recent research has been supported by European FP6/FP7, UK
EPSRC, Royal Academy of Engineering, Royal Society, and industrial
partners including Motorola, IBM, Huawei Technologies, INMARSAT, and
InforSense Ltd. Prof. Min is the Co-ordinator of two recently funded FP7
projects: 1) Quality-of-Experience Improvement for Mobile Multimedia
across Heterogeneous Wireless Networks; and 2) Cross-Layer Investigation
and Integration of Computing and Networking Aspects of Mobile Social
Networks. As a key team member and participant, he has made significant contributions to several EU
funded research projects on Future Generation Internet. He has published more than 200 research
papers in leading international journals including IEEE/ACM Transactions on Networking, IEEE
Journal on Selected Areas in Communications, IEEE Transactions on Communications, IEEE
Transactions on Wireless Communications, IEEE Transactions on Multimedia, IEEE Transactions on
Computers, IEEE Transactions on Parallel and Distributed Systems, and at reputable international
conferences, such as SIGCOMM-IMC, ICDCS, IPDPS, GLOBECOM, and ICC. He is an Associate
Editor of several international journals, e.g., IEEE Transactions on Computers. He served as the
General Chair/Program Chair of a number of international conferences in the area of Information and
Communications Technologies.
Plenary Keynote: When Big Data Meets Cognitive
Computing…on the Cloud
Hui Lei, Director and CTO, Watson Health Cloud, IBM, U.S., IEEE Fellow
Abstract:
The cloud has turned into an important platform for business innovation and industry
transformation, leveraging the rapid growth of big data and the emerging paradigm of cognitive
computing. Specifically, big data is becoming the world’s new natural resource and is driving
fundamental changes in technology, business and society. With its exponentially increasing volume,
velocity and variety, big data promises to be for the 21st century what steam power was for the 18th
century, electricity for the 19th, and gas and oil for the 20th. At the same time, the rise of cognitive
systems represents the dawn of a new era of computing. A necessary and natural evolution of
traditional programmable systems, cognitive systems are able to scale and extend human knowledge,
reason with purpose, and learn and improve over time. More importantly, cognitive computing is a key
enabling technology for turning big data into insights and delivering on the full value of big data.
In this talk, I will draw upon our experience at IBM building the Watson Health Cloud, and discuss how
big data and cognitive computing can come together to enable innovative health solutions that tackle
many of the clinical, societal, and economic issues faced by today’s health industry. I will present use
cases, highlight the challenges, describe our approaches, and relate to client experiences as
appropriate.
Biography:
Dr. Hui Lei is CTO, Watson Health Cloud at IBM. An IBM Distinguished
Engineer, he provides leadership on the Watson Health Cloud technical
strategy, and spearheads the design and development of the Watson Health
Cloud platform. Prior to his current role, Dr. Lei was Senior Manager, Cloud
Platform Technologies at the IBM T. J. Watson Research Center, where he led
IBM’s worldwide research strategies in cloud infrastructure services and cloud
managed services. Dr. Lei’s technical vision and creative contributions have
influenced many commercial software products and services, which range
across big data solutions, cloud service offerings, middleware platform for mobile and pervasive
computing, and e-business tooling. Dr. Lei is an active and recognized member of the international
technical community. He is a Fellow of the IEEE, Editor-in-Chief of the IEEE Transactions on Cloud
Computing, and Chair of the IEEE Computer Society Technical Committee on Business Informatics
and Systems. He has taken part in many international conferences as a steering committee chair,
general chair, technical program chair, or keynote speaker. He is also a prolific inventor and has over
70 patents to his credit. He received his PhD in Computer Science from Columbia University.
Plenary Keynote: Performance Modeling and Optimization in
Mobile Cloud Computing Environment
Professor Jiannong Cao, Hong Kong Polytechnic University, China, IEEE Fellow
Abstract:
Mobile cloud computing has emerged as a new paradigm in IT industry and led to many research
and development initiatives. High performance for both end users and system providers remains
an essential goal but is much more difficult to achieve in the new paradigm. Due to the diversity of
applications in mobile cloud computing, there exist different performance models and thus various
methodologies to enhance the application performance. In this talk, we focus on three types of mobile
cloud applications, i.e., the workflow applications, data streaming applications, and the content
delivery applications, and discuss how to model the performance of the applications. Based on the
performance models, we then present our methods to optimize the application performance. In
particular, for the workflow and data streaming applications, we will present a series of new solutions
on computation partitioning to optimize the application performance, while for the content delivery
application, we will present our recent work on load dispatching and service placement to minimize
the overall latency of end users in accessing the content/services.
Biography:
Prof Jiannong Cao is currently a chair professor and head of the Department of
Computing at Hong Kong Polytechnic University, Hung Hom, Hong Kong. His
research interests include parallel and distributed computing, computer networks,
mobile and pervasive computing, fault tolerance, and middleware. He has co-authored 3 books,
co-edited 9 books, and published over 300 papers in major
international journals and conference proceedings. He is a fellow of IEEE, a senior
member of China Computer Federation, and a member of ACM. He was the Chair
of the Technical Committee on Distributed Computing of IEEE Computer Society
from 2012 - 2014. Prof. Cao has served as an associate editor and a member of the editorial boards of
many international journals, including ACM Transactions on Sensor Networks, IEEE Transactions on
Computers, IEEE Transactions on Parallel and Distributed Systems, IEEE Networks, Pervasive and
Mobile Computing Journal, and Peer-to-Peer Networking and Applications. He has also served as a
chair and member of organizing / program committees for many international conferences, including
PERCOM, INFOCOM, ICDCS, IPDPS, ICPP, RTSS, DSN, ICNP, SRDS, MASS, PRDC, ICC,
GLOBECOM, and WCNC.
Prof Cao received the BSc degree in computer science from Nanjing University, Nanjing, China, and
the MSc and the Ph.D degrees in computer science from Washington State University, Pullman, WA,
USA.
Plenary Keynote: The Art of Transforming Traditional Utilities to
the Cloud Model
Dharma Rajan, Practice Solutions Architect – VMware, U.S.
Abstract:
Public, private, and hybrid clouds are now a de facto industry model. New cloud services are being
introduced by the industry at a very fast pace. This keynote session will take you through the journey
of cloud evolution, from enterprise to utilities, and the industry transformation that is happening. We
will drive through telco cloud with the advent of SDN and NFV, as well as look at how 5G and IoT
cloud evolution will enable new service models. An artful transformation to software-defined smart
cities with smart utilities operated from the cloud is becoming close to reality. With trends in
automation, orchestration, and evolving technology like multi-cloud micro-services, Mobile Virtual
Network Operators can offer new revenue generating cloud services that might transform the way we
do business and research.
Biography:
Dharma Rajan is a leading expert in cloud technology working as lead Solution
Architect at VMware, USA. His areas of expertise span infrastructure virtualization,
hybrid cloud, NFV, and cloud security. Prior to joining VMware, Dharma worked
at Ericsson, USA for over a decade, building 4G platform architectures, carrier grade
networks, and network management systems. He has also worked at Cisco Systems,
USA on enterprise architecture. He has several technical publications and is an invited
speaker at major industry events and world conferences. He holds an MS in Computer Engineering
from NCSU, USA and M.Tech in CAD from IIT-Kanpur, India.
IDP Keynote-1:
The Use of Machine Learning in Business
Sonya Zhang, California State Polytechnic University, U.S.
Abstract:
Machine Learning is no doubt gaining momentum and reaching the top of
Gartner’s hype curve. As data analytics becomes a more common practice, businesses
are now looking deeper into their data to increase efficiency and competitiveness
using machine learning, which can learn from data, find hidden insights, and make
predictions without being explicitly programmed. Today, machine learning can be
found in many business applications, ranging from facial and object recognition, fraud
detection, product or content recommendation, to effective web search and targeted
ads. In this talk I will give a brief introduction on machine learning, and then focus on
current applications and examples of machine learning in different business functions,
business models, and industries, and finally, the opportunities and challenges.
Biography:
Sonya Zhang is an Associate Professor of Computer Information
Systems at the College of Business Administration, California
State Polytechnic University, Pomona. She received her PhD in
Information Systems and Technology from Claremont Graduate
University. She also holds an M.S. in Computer Science, and an
MBA from Illinois State University.
Sonya’s research specialties are: Web and Software
Development, Digital Analytics, Internet Entrepreneurship, and Online Learning. She
co-authored The Smarter Startup: A Better Approach to Online Business for
Entrepreneurs. Her work also appeared in Journal of Computer Information Systems,
ACM Interactions, Journal of Information Systems Education, Journal of Information
Technology Education, International Journal of Healthcare Information Systems and
Informatics, HICSS, AMCIS and IEEE conference proceedings.
Prior to joining academia, Sonya was a software engineer in health informatics and
higher education for seven years, working on ERP, Business Intelligence, CMS,
eLearning and eHealth products/projects.
RTDPCC Keynote-1:
Big Earth Data – a New Dimension for Digital Earth
Professor Yong Xue, University of Derby, U.K.
Abstract:
Digital Earth is a multi-resolution, three-dimensional representation of the planet,
into which we can embed vast quantities of geo-referenced data (Al Gore, 1998). As a
new dimension of the Digital Earth, in addition to Computational Science, Mass
Storage, Satellite Imagery, Broadband networks, Interoperability and Metadata, Big
Data technologies provide a set of advanced tools that can improve development of
Digital Earth. After a period of slow but steady scientific progress, this scientific area
seems to be mature for new research and application breakthroughs. The rapid
progress in the development of integrated Big Data and Earth observation tools has
boosted this process (Goodchild et al. 2012, Guo et al. 2016).
As one of the Big Data fields, Earth observation Big Data is unleashing an
interesting time of transition, driving the innovation and development of disciplines,
becoming a new key to the cognition of nature and a new engine for Earth sciences.
Based on widely collected Earth observation big data combined with models of the
Earth system, the development of theory and methods for knowledge discovery
related to big Earth data is an important scientific issue needing attention.
Biography:
Professor Dr. Yong Xue (senior member of IEEE) is a Professor
in Computation at the University of Derby, United Kingdom. He
received his BSc degree in Physics and his MSc degree in
remote sensing and GIS from Peking University, China in 1986
and 1989, respectively. He received his PhD in remote sensing
and GIS from University of Dundee, UK in 1995. His main
research interests include Geocomputation, aerosol optical
depth retrieval from remotely sensed data, thermal inertia
modeling and heat exchange calculation for the boundary layer.
Prof. Xue has published over 104 peer-reviewed journal papers
(with the highest Impact Factor at 7.885) and over 148 peer-reviewed conference
papers. His publications have been cited over 1330 times in total, with one paper
cited over 130 times (Google Scholar). He has served as a technical
programme committee member for several international conferences, such as
IEEE/IGARSS conferences and the International Conferences on Computational
Science (ICCS). Professor Xue is an Associate Editor of the International Journal of
Remote Sensing published by Taylor and Francis, UK, a Chartered Physicist and a
member of the Institute of Physics, UK, and has been the Chapter chair of the joint chapter of
IEEE Aerospace Engineering Society/Oceanic Engineering Society/Geosciences and
Remote Sensing Society in the United Kingdom since 2004.
RTDPCC Keynote-2:
Meeting Society Challenges: Big Data Driven Approaches
Dr. Liangxiu Han, Manchester Metropolitan University, U.K.
Abstract:
This talk will focus on new developments and methods based on big data driven
approaches to address society challenges, and on their application in domains such
as Health, Food, and Smart Cities.
Biography:
Dr. Liangxiu Han is a Reader in Computer Science at Manchester
Metropolitan University, where she is a Deputy Director of two
centres: the Informatics Research Centre and the Man Met Crime and
Well-Being Big Data Centre. Having worked in both industry and
academia, Dr. Han has over 14 years of research and practical
experience in developing intelligent ICT-enabled software solutions
for large scale data processing and data
analysis and mining in different application domains (e.g. Health,
Smart Cities, Bioscience, Cyber Security, Energy, etc.) using
various datasets including images, sensor data, and web pages
(funded by Innovate UK, EPSRC, EU-FP7, Government and
Industry respectively). As a Principal Investigator (PI) or Co-PI,
Han has been conducting research in relation to large-scale data
processing, data mining, cloud computing, software architecture (funded by EPSRC, BBSRC,
Innovate UK, Horizon 2020, Industry, Charity, respectively, etc.). Dr. Han is a member of
EPSRC Peer Review College, an independent expert for Horizon 2020 proposal
evaluation/review and British Council Peer Review Panel. She is also a reviewer for the IEEE
Computer Society and for journals including the Journal of Parallel and Distributed Computing,
Journal of Information Science from Elsevier science, IEEE Transaction on Service Computing,
Brain Computing, IEEE Transaction on Biomedical Imaging engineering, Bioinformatics, Brain
Informatics, Clustering Computing, etc., as well as a reviewer and programme committee
member for various international conferences. She has also been involved in a number of
professional activities in the UK and China.
Deep Learning on Big Data Sets in the Cloud with
Apache Spark and Google TensorFlow
Patrick GLAUNER and Radu STATE
Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg
{first.last}@uni.lu
September 12, 2016
Abstract
Machine learning is the branch of artificial intelligence giving computers the ability to learn
patterns from data without being explicitly programmed. Deep Learning is a set of cutting-edge
machine learning algorithms that are inspired by how the human brain works. It allows feature
hierarchies to be learned from the data rather than modeling hand-crafted features. It has proven to
significantly improve performance in challenging data analytics problems. In this tutorial, we will
first provide an introduction to the theoretical foundations of neural networks and Deep Learning.
Second, we will demonstrate how to use Deep Learning in a cloud using a distributed environment
for Big Data analytics. This combines Apache Spark and TensorFlow, Google's in-house Deep
Learning platform made for Big Data machine learning applications. Practical demonstrations will
include character recognition and time series forecasting in Big Data sets. Attendees will be
provided with code snippets that they can easily amend in order to analyze their own data. A
related but shorter tutorial focusing on Deep Learning on a single computer was given at the Data
Science Luxembourg Meetup in April 2016. It was attended by 70 people, making it the
best-attended event of this Meetup series in Luxembourg since its beginning.
1. Intended audience
This tutorial assumes no prior experience with Apache Spark, machine learning, Deep Learning or
TensorFlow. Attendees will be able to acquire both the theoretical foundations and hands-on
experience in this tutorial. Attendees with prior experience in machine learning will benefit from
a comprehensive review of the theoretical foundations. However, this tutorial
will include advanced topics of Deep Learning, such as new regularization methods and long short-term
memory (LSTM) networks, which are aimed primarily at attendees with prior experience in this domain. This part
can be skipped by beginners and will not be crucial to their overall learning experience. In order to
fully benefit from this tutorial, attendees should bring their own laptop. This will allow them to
perform experiments on their computer at the same time and to discuss practical questions.
2. Learning outcome
Attendees will get an understanding of what machine learning is and how Deep Learning, its
cutting-edge flavor, works. They will not only learn how to apply a distributed environment to Big Data
analytics that can be deployed in a cloud; they will also experience it using Apache Spark and
Google TensorFlow on real Big Data sets. This knowledge will allow them to apply these
techniques and infrastructure to their own analytics problems in a cloud.
3. Description
3.1 Motivation
Machine learning allows computers to learn from data without being explicitly programmed. However,
hand-crafting features from raw input data is a major challenge in machine learning. Deep Learning
allows increasingly complex feature hierarchies to be learned automatically from the raw input data.
Deep Learning builds on the theory of neural networks, which are celebrating a comeback under this
new term. Deep Learning has been shown to significantly outperform other learning algorithms in a
variety of tasks, such as image recognition [1], speech recognition [2] and playing the game of
Go [3]. However, Deep Learning is not an easy-to-use silver bullet and requires intensive training.
To date, there is no comprehensive book on this topic, and expertise must be painstakingly collected
from many different sources. Therefore, the goal of this tutorial is to provide a comprehensive
introduction to the foundations of Deep Learning. Another shortcoming of Deep Learning is the
potentially long training time of a deep neural network. TensorFlow is Google's in-house Deep
Learning platform, which allows deep neural networks to be trained efficiently on GPUs. A different
approach is to use map reduce architectures such as Apache Spark. This tutorial will demonstrate
the effectiveness of combining both approaches on real Big Data sets and show how to deploy the
combination in a cloud.
3.2 Outline of the proposed content
The proposed structure of this tutorial is as follows:
1. This tutorial will begin with a quick introduction to the most relevant foundations of machine
learning.
2. It will then provide a comprehensive introduction to neural networks, a learning algorithm that
is inspired by how the human brain works. This also includes a discussion of the limitations of
backpropagation, the traditional neural network training method.
3. Neural networks are the foundation of Deep Learning: a Deep Learning model is essentially a
neural network with many layers of neurons. In this section, Deep Learning will be presented to
the audience, together with how its new training methods overcome the limitations of
backpropagation in order to efficiently train powerful deep neural networks.
4. Training Deep Learning architectures is time-consuming. However, training neural networks
is basically a series of matrix multiplications. Matrix multiplications can be efficiently
distributed. Typical distribution methods include map reduce and training on GPUs.
5. Apache Spark [4] uses map reduce in order to distribute computations among nodes. In contrast,
Google TensorFlow [5] allows training to be distributed across one or multiple GPUs. Both
approaches will be presented, along with how they can be combined to take advantage of both
concepts and achieve the most efficient outcome in a cloud.
6. In the first practical demonstration, multiple deep neural networks are trained to recognize
characters using the notMNIST dataset [6], which consists of the characters A-J in different fonts.
This will include a discussion of convolutional neural networks (CNN), which are inspired by how
the human vision system works.
7. In the second practical demonstration, multiple deep neural networks are trained to forecast a
time series. This will include a discussion of recurrent neural networks (RNN), which are able
to process temporal information. Long short-term memory (LSTM) networks, a modular and highly
effective type of RNN, will also be discussed. Finally, advanced time series forecasting will be
covered, such as electricity load forecasting using a dataset from the "Global Energy
Forecasting Competition 2012 - Load Forecasting" Kaggle challenge [7].
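The observation in the outline that matrix multiplications can be efficiently distributed can be illustrated with a minimal sketch. It is written in plain Python rather than Apache Spark or TensorFlow, and all function names (`matmul`, `split_inner`, `mapreduce_matmul`) are hypothetical helpers introduced only for this illustration. The product X·W is split along the inner dimension into independent blocks; each block product could be computed on a different node (the map step), and the partial results are summed (the reduce step).

```python
from functools import reduce


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_inner(x, w, parts):
    """Split X column-wise and W row-wise into matching blocks."""
    k = len(w)
    step = -(-k // parts)  # ceiling division
    return [
        ([row[start:start + step] for row in x], w[start:start + step])
        for start in range(0, k, step)
    ]


def mapreduce_matmul(x, w, parts=2):
    """Compute X * W as a sum of independent block products."""
    # Map: each block product is independent and could run on its own node.
    partials = map(lambda block: matmul(*block), split_inner(x, w, parts))
    # Reduce: sum the partial products to obtain the full result.
    return reduce(add, partials)


# Hypothetical inputs: a batch of two samples and a 3x2 weight matrix.
x = [[1, 2, 3], [4, 5, 6]]
w = [[1, 0], [0, 1], [1, 1]]
print(mapreduce_matmul(x, w))  # [[4, 5], [10, 11]]
```

In a real deployment the map step would be executed by the cluster framework; the sketch only shows why the computation decomposes cleanly.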
4. Prior tutorials
A related tutorial was given at the Data Science Luxembourg Meetup [8] in April 2016 under the title
"Deep Learning with TensorFlow" [9]. That 1-hour tutorial assumed expertise in machine learning and
focused on the theoretical foundations of Deep Learning and how to apply regular deep feed-forward
neural networks in TensorFlow to the notMNIST dataset for character recognition. It was attended by
approximately 70 people, who asked many questions and gave consistently positive feedback. This
was the most popular Data Science Luxembourg Meetup event since this monthly Meetup series
started in November 2012. A tutorial on Deep Learning for load forecasting [10] was accepted at
IEEE PES Innovative Smart Grid Technologies (ISGT), Europe [11] and will be given in October 2016.
However, its focus will be on Deep Learning on a single computer using TensorFlow for time series
forecasting.
[1] Y. LeCun, Y. Bengio and G. E. Hinton, "Deep Learning", Nature, vol. 521, pp. 436-444, 2015.
[2] G. Hinton, L. Deng, D. Yu, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath,
G. Dahl and B. Kingsbury, "Deep Neural Networks for Acoustic Modeling in Speech Recognition", IEEE
Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
[3] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser,
I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner,
I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel and D. Hassabis, "Mastering the
game of Go with deep neural networks and tree search", Nature, vol. 529, pp. 484-489, 2016.
[4] http://spark.apache.org/
[5] http://www.tensorflow.org/
[6] http://yaroslavvb.blogspot.lu/2011/09/notmnist-dataset.html
[7] http://www.kaggle.com/c/global-energy-forecasting-competition-2012-load-forecasting
This 3-hour tutorial will be different in the following ways:
• Use of Apache Spark combined with TensorFlow, taking advantage of a distributed
environment in order to efficiently process Big Data sets in a cloud.
• The length of this tutorial makes it possible to also cover CNNs, and not just regular
feed-forward architectures, for the image recognition example.
• It will include not only RNNs in the time series example but also provide a comparison to
other state-of-the-art models such as Hidden Markov Models.
• Prior machine learning experience will not be assumed and the theoretical foundations will be
covered in the beginning.
• It will focus on Deep Learning and skip the last part on the future of artificial intelligence and
the technological singularity.
5. Materials
A comprehensive tutorial slide deck will be provided, which contains figures, definitions,
explanations, relevant parts of the code snippets and an annotated bibliography. The complete and
functional code snippets will also be provided, together with a list of the required libraries, so
that attendees can easily install them. All code snippets can be deployed in a cloud to reduce
training time.
6. Bio sketch
Patrick GLAUNER graduated as valedictorian with a B.Sc. degree in computer science from Karlsruhe
University of Applied Sciences in 2012 and received the M.Sc. degree in machine learning from
Imperial College London in 2015. He was a Fellow at CERN, the European Organization for Nuclear
Research, worked at SAP and is an alumnus of the German National Academic Foundation
(Studienstiftung des deutschen Volkes). He is currently a Ph.D. student in machine learning in the
Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, under the
supervision of Dr. Radu STATE. He also holds an adjunct faculty appointment at Karlsruhe University
of Applied Sciences, where he teaches artificial intelligence. His interests include anomaly
detection, big data, computer vision, deep learning and time series.
Radu STATE received the M.Sc. degree from the Johns Hopkins University, Baltimore, MD, USA, and
the Ph.D. degree and an HDR from the University of Lorraine, Nancy, France. He is a Senior Researcher
with the Interdisciplinary Centre for Security, Reliability and Trust in Luxembourg, where he heads the SEDAN
research group. He was a Professor at the University of Lorraine and a Senior Researcher at INRIA
Nancy, Grand Est. Having authored more than 100 papers, his research interests cover network and
system security and management.
[8] http://www.meetup.com/LuxRgroup/events/229662811/
[9] P. Glauner, "Introduction to Deep Learning and Google TensorFlow", Data Science Luxembourg Meetup,
Luxembourg, Luxembourg, 2016.
[10] P. Glauner and R. State, "Load Forecasting with Artificial Intelligence on Big Data", Sixth IEEE
Conference on Innovative Smart Grid Technologies, Europe (ISGT Europe 2016), tutorial session,
Ljubljana, Slovenia, 2016.
[11] http://sites.ieee.org/isgt-europe-2016/
Survey Analytics from Questionnaires and Textual Social
Media Analytics. With Accompanying Practical Sessions,
examples and case studies in English.
Prof Fionn Murtagh
Professor, Big Data Lab, University of Derby;
and Goldsmiths, University of London.
[email protected]
Dr Mohsen Farid
Associate Professor, Big Data Lab, University
of Derby.
[email protected]
1. Course Description
The work of the celebrated social scientist Pierre Bourdieu (1930-2002) includes the thoughtful and
creative use of Correspondence Analysis in his book Distinction, published in English in 1984. It is
on such a geometric data analysis approach that this course is based.
The focus is on: (1) interpretation of results, graphical displays and other outputs, (2) practical
implementation using the R statistical and visualization environment, and (3) providing intuition
for, and a full understanding of, the geometry and statistical processing.
We use data collected in various questionnaires, starting from work by Bourdieu on cultural taste.
Other questionnaire analysis case studies will be related to transport, cooking and lifestyle,
student experience, consumer behavior, and music appreciation.
Next, we consider questionnaires whose outcomes combine closed, fixed-format questions with
free-text responses, analyzed conjointly.
Finally, data sourced from social media microblogging, i.e. Twitter, will be studied.
Data Sources: Questionnaire Numerical Scoring Responses, Free Text Responses, and Twitter Data
Sources.
2. Syllabus
• Tools
The course uses the R programming and visualization language.
• Topics
In accompanying online course materials, there will be a practical introduction to the R
language and environment. This is for participants who have not used R before.
Part 1: Questionnaire analysis case study: the Bourdieu taste data, with detailed discussion
of the output and of the R code used.
Part 2: Geometric intuition: the methodology used for graphical display and hierarchical
clustering, and putting it all together.
Part 3: Carrying out geometric data analysis, including clustering, using R. This includes
producing publication/presentation outputs, storing data for later work, and maintaining the R
scripts that are used.
Part 4: Further case studies of questionnaire analysis.
Part 5: Questionnaire analysis using conjoint, or integrally related, analysis of closed
questions and open or free-text questions.
Part 6: Coverage of social media data sources, centered especially on Twitter. All
sessions will be associated with practical exercises, using case studies.
Final Part: Concluding short debate and discussion on the potential and scope of analytics,
statistical treatment of data, and text mining.
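As a small illustration of the geometric intuition mentioned in Part 2, the sketch below computes chi-squared distances between row profiles of a contingency table, which is the distance Correspondence Analysis uses to compare rows. The course itself uses R; this sketch is in Python purely for illustration. The table values are hypothetical, not taken from the Bourdieu data, and the function names are introduced only for this example.

```python
from math import sqrt


def row_profiles(table):
    """Divide each row of a contingency table by its row total."""
    return [[value / sum(row) for value in row] for row in table]


def column_masses(table):
    """Share of the grand total that falls into each column."""
    total = sum(sum(row) for row in table)
    return [sum(col) / total for col in zip(*table)]


def chi2_distance(p, q, masses):
    """Chi-squared distance between two profiles, weighted by column masses."""
    return sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))


# Hypothetical table: rows are respondent groups, columns are answer categories.
table = [
    [20, 10, 10],
    [10, 20, 10],
    [40, 20, 20],
]
profiles = row_profiles(table)
masses = column_masses(table)

# Groups 1 and 3 differ in size but share the same profile, so their distance is 0.
print(chi2_distance(profiles[0], profiles[2], masses))
# Groups 1 and 2 answer differently, so their distance is positive.
print(round(chi2_distance(profiles[0], profiles[1], masses), 4))
```

Because rows are compared as profiles, a group that is twice as large but answers in the same proportions lands on the same point in the display.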
3. Target Audience
Practitioners and researchers working in any of the domains covered by the case studies and
practical exercises. Students who are undertaking, or who are planning to undertake, any such
work.
Domains of general relevance include:
• Health and medical surveys,
• Marketing,
• Security and forensics,
• Information and data sourcing through web-based questionnaires,
• Lifestyle and wellbeing analytics,
• Legal studies,
• Political studies,
• Language and literature,
• Digital humanities.
The presentation language of the short course is English. Case studies will also be in English;
however, issues related to other languages, such as Arabic, may be addressed.
4. Facilities Required
• Classrooms equipped with a computer (with the complete software environment) connected to
an overhead projector and screen, plus a writing board.
• Computers for participants. Participants may also use their own laptops (with the
complete software environment).
• Software:
o R, open source and openly available for all computer platforms, with pertinent toolboxes
as required.
• Course Material
o All course materials, including the data and examples of software use for the case
studies, will be made available for course participants on a password-protected web
site.
Use Amazon Elastic MapReduce to Process Big Data
Jiming Wu
Associate Professor, California State University, East Bay
Abstract:
This tutorial teaches attendees how to use Amazon Elastic MapReduce (Amazon EMR) to
analyze large amounts of data. Amazon EMR is a web service that provides a managed Hadoop
framework to simplify big data processing. Topics will include how to 1) create an Amazon Web
Services account, 2) employ the Amazon cloud storage service, 3) run an EMR cluster, 4) set up an
EMR job, and 5) examine EMR job output.
Intended Audience: graduate students with a concentration in business analytics, data analytics, or
data science.
Learning outcome: attendees will be able to use Amazon EMR to process Big Data.
Description: This is an introduction to the Amazon Elastic MapReduce system. Topics include
MapReduce features, the Hadoop distributed filesystem, input/output, the Amazon storage system, and
EMR clusters. Students will have the opportunity to use the Amazon MapReduce system to process Big
Data. The objective of this tutorial is to impart working knowledge and skills associated with Big
Data technologies and to help students better understand how companies leverage these technologies
to analyze Big Data.
Outline of the content: 1) learn how to create an Amazon Web Services account, 2) discuss how to
employ the Amazon cloud storage service, 3) explain how to create and run an EMR cluster, 4) describe
how to set up an EMR job, and 5) show how to access and interpret EMR job output.
An example of finding the maximum temperature:
1. Set up a Hadoop cluster on Amazon Elastic MapReduce (EMR).
2. Submit max-temperature.jar to EMR. Please refer to the following website for how to
submit a custom JAR:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-launch-customjar-cli.html
3. Set up the input file folder on the Amazon storage service (S3).
4. In the Amazon console, set up the JAR (max-temperature.jar) and then set up the arguments:
MaxTemperature
s3://chapter2/input
s3://chapter2/output
5. Run the JAR on Amazon Elastic MapReduce.
6. In the Amazon SSH command interface, the local drive folder is /home/hadoop and the
HDFS folder is /user/hadoop.
7. Copy a file from S3 to the local disk: aws s3 cp s3://chapter2/MaxTemperatureMapper.java .
8. Copy a file from the local disk to HDFS: hdfs dfs -copyFromLocal file.gz .
9. Compile the Java files:
javac -cp src/:hadoop-common-2.6.1.jar:hadoop-mapreduce-client-core-2.6.1.jar:commons-cli-2.0.jar -d . MaxTemperature.java MaxTemperatureReducer.java MaxTemperatureMapper.java
10. Create a JAR file: jar -cvf max-temperature.jar MaxTemperature*.class
11. Run the JAR file:
hadoop jar max-temperature.jar MaxTemperature /user/hadoop/input0/sample.txt /user/hadoop/output01
12. Display the output on screen: hadoop fs -cat /user/hadoop/output01/part-r-00001
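The logic that such a maximum-temperature job performs can be sketched in a few lines of Python. This is not the Java code in max-temperature.jar, whose source is not shown here. The input format (one "year,temperature" pair per line), the sample values and all function names are assumptions made only for this illustration.

```python
from collections import defaultdict


def map_record(line):
    """Mapper: emit a (year, temperature) pair for one input line.

    Assumes a simplified 'year,temperature' line format (hypothetical).
    """
    year, temperature = line.strip().split(",")
    return year, int(temperature)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_max(year, temperatures):
    """Reducer: keep only the maximum temperature seen for a year."""
    return year, max(temperatures)


def max_temperature(lines):
    """Run map, shuffle and reduce locally over an iterable of lines."""
    groups = shuffle(map_record(line) for line in lines if line.strip())
    return dict(reduce_max(year, temps) for year, temps in sorted(groups.items()))


# Hypothetical sample records.
sample = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]
print(max_temperature(sample))  # {'1949': 111, '1950': 22}
```

On a cluster, the mapper and reducer run on many nodes in parallel and the framework performs the grouping step; the sketch runs all three phases in a single process.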
Statement: this tutorial has never been given before.
Materials: PowerPoint slides and Word documents will be provided to attendees.
Bio-sketch: Jiming Wu is an Associate Professor in the Department of Management at California State
University, East Bay. He received his B.S. from Shanghai Jiao Tong University, M.S. from Texas Tech
University, and Ph.D. from the University of Kentucky. His research interests include knowledge
management, IT adoption and acceptance, and computer and network security. His work has appeared
in MIS Quarterly, Journal of the Association for Information Systems, European Journal of Information
Systems, Information & Management, Decision Support Systems, and elsewhere.
Conference map