
AAAI-08 Tutorial on
Computational Workflows for
Large-Scale Artificial Intelligence Research
Part VII:
Future Challenges in
Computational Workflows and
Opportunities for AI Research
USC Information Sciences Institute
Yolanda Gil ([email protected])
AAAI-08 Tutorial July 13, 2008
Scientific Collaborations: Publications
[from Science, April 2005]
Sharing Data Collection: LIGO (ligo.caltech.edu)
Sharing Computing Resources
Ongoing Research
Workflow Lifecycle [Deelman and Gil 06]
[Figure: workflow lifecycle diagram. Stages: Workflow Template -> (populate with data) -> Workflow Instance -> (map to available resources) -> Executable Workflow -> (execute) -> Data Products and Data, Metadata, Provenance Information -> (adapt, modify) -> Workflow Template. Supporting elements: Workflow and Component Libraries; Data, Metadata Catalogs; Resource, Application Component Descriptions; Compute, Storage and Network Resources.]
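The lifecycle stages can be illustrated with a small Python sketch. It is not the implementation of any particular workflow system: the class names, the function names, the dataset and resource identifiers, and the round-robin resource assignment are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkflowTemplate:
    name: str
    steps: list  # ordered component names; no data bound yet


@dataclass
class WorkflowInstance:
    template: WorkflowTemplate
    inputs: dict  # input name -> dataset identifier


@dataclass
class ExecutableWorkflow:
    instance: WorkflowInstance
    assignments: dict  # step name -> resource name


def populate(template, inputs):
    """Template -> instance: bind concrete datasets to the template."""
    return WorkflowInstance(template, dict(inputs))


def map_to_resources(instance, resources):
    """Instance -> executable: assign each step a resource (round-robin, a toy policy)."""
    steps = instance.template.steps
    return ExecutableWorkflow(
        instance, {s: resources[i % len(resources)] for i, s in enumerate(steps)}
    )


def execute(workflow):
    """Pretend to run each step and return one provenance record per step."""
    provenance = []
    for step in workflow.instance.template.steps:
        provenance.append({
            "step": step,
            "resource": workflow.assignments[step],
            "inputs": sorted(workflow.instance.inputs.values()),
        })
    return provenance


# Hypothetical example data, invented for this sketch.
template = WorkflowTemplate("classify", ["discretize", "train", "evaluate"])
instance = populate(template, {"training_set": "dataset-001"})
executable = map_to_resources(instance, ["cluster-a", "cluster-b"])
records = execute(executable)
```

The point of keeping three separate types is that the same template can be populated with different data, and the same instance can be mapped to different resources, without editing the earlier stage.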
Workflow Creation
Workflow completion
Workflows as components of other workflows
Automatic workflow assembly from libraries of components
[McDermott 02] [McIlraith & Son 03] [Blythe et al 04] …
Interleaving workflow composition and execution
Automatically add data conversion and formatting components
[Gil et al 07]
“Science of design” for computational workflows as software artifacts
[Deelman & Gil 07] [Gil et al 07] [Gil 08]
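One item above, automatically adding data conversion and formatting components, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the cited systems: the `Component` type, the format labels, and the single-converter lookup are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    input_format: str
    output_format: str


def complete_workflow(steps, converters):
    """Insert a converter wherever two adjacent steps disagree on data format.

    Assumes one converter is enough to bridge any mismatch; chains of
    converters are not searched for in this sketch.
    """
    by_formats = {(c.input_format, c.output_format): c for c in converters}
    completed = []
    for step in steps:
        if completed and completed[-1].output_format != step.input_format:
            key = (completed[-1].output_format, step.input_format)
            if key not in by_formats:
                raise ValueError(f"no converter from {key[0]} to {key[1]}")
            completed.append(by_formats[key])
        completed.append(step)
    return completed


# Hypothetical components, invented for this sketch.
steps = [Component("extract", "raw", "csv"), Component("train", "arff", "model")]
converters = [Component("csv_to_arff", "csv", "arff")]
completed_names = [c.name for c in complete_workflow(steps, converters)]
```

Here the user specifies only `extract` and `train`; the conversion step between them is added by the completion routine.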
Workflow Catalogs
Workflow description and formal representation
Workflow discovery
[Goderis et al 06] [Goderis et al 07]
Query-based workflow matching
[Goble et al 06]
Workflow reuse and repurposing
W3C semantic workflow language activity
[Horrocks and Li 02] [Baader 01]
Workflow sharing
[DeRoure & Goble 07]
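As a rough illustration of query-based workflow matching, the following Python sketch filters a catalog by the data a user has, the outputs they want, and the components they require. The `WorkflowDescription` fields and the set-containment matching rule are assumptions for this example; the cited work uses richer formal representations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowDescription:
    name: str
    inputs: frozenset      # data types the workflow consumes
    outputs: frozenset     # data types the workflow produces
    components: frozenset  # component types it contains


def find_workflows(catalog, available_inputs=None, wanted_outputs=(), required_components=()):
    """Return names of workflows that satisfy every constraint in the query."""
    hits = []
    for wf in catalog:
        # None means "no constraint on inputs".
        if available_inputs is not None and not wf.inputs <= set(available_inputs):
            continue
        if not set(wanted_outputs) <= wf.outputs:
            continue
        if not set(required_components) <= wf.components:
            continue
        hits.append(wf.name)
    return sorted(hits)


# Hypothetical catalog entries, invented for this sketch.
catalog = [
    WorkflowDescription("cluster-genes", frozenset({"expression"}),
                        frozenset({"clusters"}), frozenset({"normalize", "kmeans"})),
    WorkflowDescription("classify-docs", frozenset({"text", "labels"}),
                        frozenset({"model"}), frozenset({"tokenize", "svm"})),
]
by_output = find_workflows(catalog, wanted_outputs={"model"})
by_input = find_workflows(catalog, available_inputs={"expression"})
```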
Workflow Learning
1) From a user’s demonstration of service invocations [Burstein et al 08] [Kim & Gil 08]
2) From tutorial instruction [Groth & Gil 08]
3) Generalizing from examples
[Figure from [Burstein et al 08]]
Five Opportunities for Future Research
1) Reduce Setup Cost -> Workflow as First-Class Citizen in Scientific Research
Today: Workflow design and implementation is costly
• Developed through collaboration
– Application scientists in several areas, software engineers, distributed systems experts, etc.
• Developed over many months
– Must adapt existing code, must create “glue” code
• Validated and refined over time
Goal: Must be done by scientists themselves at minimal cost:
• To create them
• To understand them
• To learn to use them for research
• To adapt them for another purpose or analysis variant
• To refine/update them over time
2) Workflow-Centered User Interaction
Workflow template as selected method
User visibility into the data analysis process
User steering during execution based on results
Interleaving generation and execution (data-driven adaptation)
Recording provenance
Automation of routine tasks that are not critical to the experiment
3) Workflows for Cross-Disciplinary Analyses -> Enable Integrative Science
Today: Workflow systems can generate detailed provenance and metadata for new data products
• Describe individual datasets so they can be used by others
• Reuse of new data products by other systems is currently rare
– Reuse is common within systems/communities
Goal: Workflows generating data that is used across disciplines
• Meaningful reuse of data products (results) by other workflows
• True test of the utility of provenance and metadata information
4) Using Workflows for Educating New (and Old!) Scientists
Today: Scientific analyses are less and less accessible to newcomers
• Steep learning curve that includes a variety of areas of expertise
– Application science(s), modeling, software engineering, distributed computing, etc.
Goal: Workflow systems could be configured to enable learning of additional capabilities on demand
• Could isolate less proficient users from advanced capabilities while enabling them to learn and practice what they learn
• Everyone should be able to contribute as they learn
5) Workflows as Efficient Instruments of Systematic Exploration and Discovery
Today: Workflows manually selected by user
• User decides what data/analysis to conduct
• Not a systematic exploration of space
• Visualization is only one way to understand results
• Human is the bottleneck; current practice will not scale
Goal: Workflows conduct automated heuristic discovery and pattern detection
• Automate systematic exploration of all possible workflows
• Formulate heuristics for scientific discovery: recurring domain-independent data analysis patterns [Simon 82]
• Search for patterns (or pattern types)
• Workflows could include pattern detection and discovery components
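A toy version of systematic exploration can be written in Python by enumerating every combination of alternative components and ranking the outcomes. Everything here is an assumption for illustration: the step alternatives, the scoring function, and the exhaustive strategy itself, which grows exponentially with the number of steps and is exactly what the discovery heuristics named above would need to prune.

```python
from itertools import product


def run(combo, data):
    """Apply each (name, function) pair in order to the data."""
    for _, fn in combo:
        data = fn(data)
    return data


def explore(alternatives, data, score):
    """Run every workflow formed by picking one alternative per step.

    alternatives: one list of (name, function) pairs per workflow step.
    Returns (score, component names) pairs, best score first.
    """
    results = []
    for combo in product(*alternatives):
        names = tuple(name for name, _ in combo)
        results.append((score(run(combo, data)), names))
    return sorted(results, reverse=True)


# Hypothetical two-step workflow space, invented for this sketch.
alternatives = [
    [("identity", lambda xs: list(xs)), ("double", lambda xs: [2 * x for x in xs])],
    [("sum", sum), ("max", max)],
]
ranked = explore(alternatives, [1, 2, 3], score=lambda value: value)
best_score, best_workflow = ranked[0]
```

With two alternatives at each of two steps, four workflows are executed and ranked without the user choosing among them by hand.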
Cyberinfrastructure: Not Just Big Iron
“The Federal government must rebalance R&D investments to:
• Create a new generation of well-engineered, scalable, easy-to-use software
suitable for computational science that can reduce the complexity and time
to solution for today’s challenging scientific applications and can create
accurate models and simulations that answer new questions
• Design, prototype, and evaluate new hardware architectures that can deliver
larger fractions of peak hardware performance on key applications
• Focus on sensor- and data-intensive computational science applications in
light of the explosive growth of data”
President’s Information Technology Advisory Committee (PITAC) report on
“Computational Science: Ensuring America’s Competitiveness”,
May 2005
Tomorrow’s Cyberinfrastructure Layers Enabled by Knowledge-Rich Workflow Systems [Gil 08]
[Figure: layered cyberinfrastructure diagram. Labels as extracted: Portals; Portals Interfaces; Workflow-Centered Portals; Heuristic Discovery; Data; Services; Workflow Sharing; Application; Tools; Workflow Systems; Resource Sharing; Resource Access.]
“As We May Think”
“Wholly new forms of encyclopedias will appear, ready made with a
mesh of associative trails running through them […]. The lawyer has
at his touch the associated opinions and decisions of his whole
experience, and of the experience of friends and authorities. The
patent attorney has on call the millions of issued patents, with
familiar trails to every point of his client's interest. […] The chemist,
struggling with the synthesis of an organic compound, has all the
chemical literature before him in his laboratory, with trails following
the analogies of compounds, and side trails to their physical and
chemical behavior. […]
There is a new profession of trail blazers, those who find delight in the
task of establishing useful trails through the enormous mass of the
common record. The inheritance from the master becomes, not only
his additions to the world's record, but for his disciples the entire
scaffolding by which [their additions] were erected.”
--- Vannevar Bush, 1945
http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm