Kuo Kwo-Sen - Research Data Alliance

RDA PLENARY 5
BIG DATA (ANALYTICS) IG SESSION
10 March 2015
Paradise Point, San Diego, California
NEXT JOINT SESSION CANCELED
• The planned joint breakout session with the Reproducibility IG has been canceled:
  - 10 March, 16:00-17:30, Sunset Ballroom Salon V
  - The responsible co-chairs from both IGs are unable to attend this Plenary.
3/10/15
RDA P5, San Diego, California
AGENDA
(FINAL VERSION, PROMISE!)
Time         Presenter/Moderator       Presentation/Discussion
14:00-14:05  Kuo                       Session Introduction (Actually we are waiting for people to come in…)
14:05-14:25  Markus Götz               Smart Data Analytics – 3 use cases in different domains
14:25-14:45  Line Pouchard             Issues in Big Data Curation
14:45-15:05  Peter Baumann             Use Case Advances Report: ND Arrays, Spatiotemporal Earth Data
14:45-15:00  Kuo                       The Case for IG Name Change – Big Data Analytics IG
15:00-15:30  Members and Participants  BD IG Outcome and Deliverable Discussion; WG Creation and Coordination Discussion
PRESENTATIONS START
DRAFT NBD-PWG REFERENCE ARCHITECTURE
[Figure: draft NBD-PWG reference architecture diagram. Axes: Information Value Chain and IT Value Chain.]
Roles: System Orchestrator; Data Provider; Big Data Application Provider (Collection, Preparation/Curation, Analytics, Visualization, Access); Data Consumer.
Big Data Framework Provider:
  - Processing: Computing and Analytic (Streaming, Interactive, Batch)
  - Platforms: Data Organization and Distribution (Indexed Storage, File Systems)
  - Infrastructures: Networking, Computing, Storage (Virtual Resources, Physical Resources)
  - Resource Management; Messaging/Communications
Also shown: Security & Privacy; Management.
Key: DATA = Big Data Information Flow; SW = Software Tools and Algorithms Transfer; Service Use.
MISSION
The ultimate goal of the RDA Big Data Interest Group is to
produce a set of recommendation documents to advise
diverse research communities with respect to:
• How to select an appropriate Big Data solution for a particular science application with optimal value.
  - Important: Need to connect with various science/research domains!
• What the best practices are for dealing with the various data and computing issues associated with such a solution.
OBJECTIVES
• Clarifying, and sometimes defining, terminologies related to Big Data, leveraging:
  - ISO/IEC JTC 1 Terms and Definitions, the NIST Big Data PWG (NBD-PWG) Definitions and Taxonomies documents, and the RDA Terminologies WG
• Characterizing leading Big Data technologies.
  - Important: Need to collaborate with relevant RDA IGs and initiate Working Groups.
  - Example characteristics include: performance, resource utilization, scalability, usability, flexibility, extensibility, propensity to support scientific collaborations, etc.
• Collaborating with external entities through IG member involvement, including:
  - ISO, NIST, INCITS, OGC, NBD-PWG, EarthCube, EarthServer, etc.
• Producing a set of recommendation documents based on results obtained from activities toward the above objectives, including:
  - A systematic classification of algorithms pertinent to the characterization of Big Data technologies,
  - Characterizations of the Big Data technologies investigated, especially their value characteristics in each category of use cases,
  - Frequency of each class of algorithms and/or queries used by workflows in various use cases, delineated by science domains/subdomains, and
  - Feasible combinations of analysis algorithms, analytical tools, data and resource characteristics, and scientific queries.
PARTICIPATION
• Domain scientists wishing to utilize Big Data solutions for their research and/or applications,
• Data specialists with experience in data production, curation, analysis, and management, especially involving large volumes and varieties of data,
• Computational scientists or software engineers with special interests in data analysis techniques and algorithm analysis, especially pertaining to Big Data-relevant technologies and tools,
• Experts, or aspiring experts, in various Big Data technologies and tools,
• Computational infrastructure and architecture experts in fields such as distributed computing, high-performance computing, and database systems,
• Data scientists with a blended interest involving some subset of the activities mentioned above, in particular the sharing, use, and reuse of open scientific datasets, and
• Managers involved in any combination of the activities mentioned above.
INTERACTION MECHANISM
• Monthly teleconference with a planned agenda to discuss specific issues.
  - Proposing 10 AM US Eastern Time (4 PM Central European Time) on the 1st or 2nd Thursday of each month.
  - We will use GoToMeeting instead of the default RDA means for teleconferencing.
  - Agenda should be available 1 week before the meeting.
  - Meeting minutes should be available within 1 week after the meeting.
• Asynchronous collaboration using RDA Wiki, Google Docs, and email lists.
• Semiannual RDA Plenary meetings to hold sessions for progress reports and face-to-face interactions amongst interested parties.
SCHEDULE
Year  Qr.  Task
2015  Q1   • Revise the BDA IG (original group name) charter to suit the broadened scope of the proposed IG name change to “Big Data Interest Group”.
           • Prepare for the RDA 5th Plenary.
      Q2   • Start the planning and organization of studies into the characterization of various popular Big Data technologies.
           • Evaluate potential WG spinoffs based on characterization work.
      Q3   • Progress reports on characterization studies.
           • Prepare for the RDA 6th Plenary.
           • Create spinoff WGs on detailed big data studies.
      Q4   • Progress reports on characterization studies.
2016  Q1   • Produce a report on characterization studies.
           • Prepare for the RDA 7th Plenary.
           • Initial results of spinoff WGs and their findings.