A Steering Portal for Condor/DAGMan
Naoya Maruyama on behalf of Akiko Iino
Hidemoto Nakada, Satoshi Matsuoka
Tokyo Institute of Technology
Background

- Common Grid usage scenario
  - Zillions of batch jobs scheduled over a combination of private/public resources within a VO
- Some jobs require steering during the workflow
  - "Human decision required"
- Most previous steering work focused on GUI-level interactivity
  - Real-time, interactive steering of the application itself
  - Does not meld well with batch jobs
  - Needs significant application customization

Objectives and Contributions

- Objectives
  - A steering portal for workflow (DAGMan) jobs with easy descriptions, without modifications to the application, Condor, or DAGMan
- Contributions
  - A portal that allows steering with simple additions to DAGMan scripts
  - Confirmed low overhead with exemplar applications
    - Quantitative assessment of the user steps required

Outline

- Background
- Motivating example
- Required features of steering
- Steering example
- Overview and prototype implementation
- Evaluation
- Conclusion

Exemplar Application: Phylogenetic Tree Inference

- Infer phylogenetic relationships between different species from their genomic sequences [Hasegawa&Shimodaira04]
  [Figure label: Common Ancestor]
- App characteristics
  - Basically executes multiple parallel jobs in sequence => workflow of batch jobs
  - But it is difficult to judge the termination condition of the application phases => needs human steering

Phylogenetic Tree Inference Breakdown

- Narrow down the candidate phylogenetic trees: hard to automate => difficult to run as batch jobs
- Phases and tools
  - Compute posterior probability: "MrBayes"
  - Compute likelihood value: "PAML"
  - Test: "CONSEL"

List of Applications in the Workflow

Job      Description                    Input                                   Output              Compute Time Required
MrBayes  Compute posterior probability  Initial topology                        List of topologies  ~2 weeks on 24 high-end CPUs
PAML     Compute likelihood value       List of topologies                      Likelihood values   ~10 days on 26 high-end CPUs
CONSEL   Test                           List of topologies & likelihood values  Probability values  1~2 hours on 1 CPU

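The Actual Workflow

[Workflow diagram: nodes numbered 1 to 6, with steps 1 and 5 drawn as several parallel nodes and part of the workflow annotated "Need steering"]

1. Execute MrBayes
2. Termination judgement
3. Manual input of new parameters
4. Post-process MrBayes
5. Execute PAML
6. Execute CONSEL
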
MrBayes Example and Problems

- As a standalone app, requests interactive input
  - Up to the user to judge computational convergence
- But lacks the information display needed for good judgment
  - Not on this screen!

Problems:
1. User needs to periodically poll the screen and make interactive input
2. User also has to look at output files from 1000 jobs!

MrBayes Examples and Problems (2)

[Figure: output file, visualized so that the user can]
- Decide on convergence
- Decide on the next parameter

Problems:
3. Manual conversion to graphical display
4. Changing the appropriate parameters

Steering portal features for batch workflows with interactivity elements

- Pausing/resuming computation
  - Progress computation as much as possible until user input is absolutely needed
  - Resume immediately after input
- Allow flexible parameter modifications
  - Various ways to specify parameters for output and input
  - Various ways to notify users: interactive screen, email, etc.
  - Various ways of observing parameters: various portal functions
  - Various ways to modify parameters
    - Even switching back and forth between your terminal and a cell phone 10,000 miles away!

Example (1): Job Submission

- Standard Condor/DAGMan job submission
  - But includes steering functions in the job description

Example (2): User Notification

- Various notification methods, incl. email
- Displays the portal URL in the message
- Works on various devices, incl. cell phones

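The notification step could be sketched roughly as follows. This is a minimal illustration, not the prototype's code: the function name, the addresses, the subject format, and the portal URL are all assumptions made for the example. It only builds the message; actual delivery (for instance via smtplib) is left out.

```python
from email.message import EmailMessage


def build_notification(to_addr: str, from_addr: str, node: str, portal_url: str) -> EmailMessage:
    """Build a plain-text email that tells the user a node is waiting for steering.

    All names and the message wording are hypothetical; the slides only state
    that the message carries the portal URL.
    """
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = f"[steering] input needed for node {node}"
    # Short plain-text body so it stays readable on small devices such as cell phones.
    msg.set_content(
        f"Workflow node {node} is paused and waiting for your decision.\n"
        f"Open the steering page: {portal_url}\n"
    )
    return msg


if __name__ == "__main__":
    # Hypothetical addresses and URL, for illustration only.
    print(build_notification(
        "user@example.org",
        "portal@example.org",
        "mrbayes",
        "https://portal.example.org/steer/mrbayes",
    ))
```

Keeping the body to plain text with a single link is one way to meet the "works on various devices" requirement without device-specific formatting.
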
Example (3): Steering Portal

- The portal generates steering web pages dynamically depending on the workflow context
- Screenshot callouts:
  - Visualize current status
  - Parameter input
  - Continuation of the workflow

Overview of our Steering Portal

[Architecture diagram. Labels:]
- Workflow and steering description
- DAGMan/Condor submission
- Individual job submissions
- Condor pool
- DAGMan features used: POST scripting feature, Retry function
- Steering portal: user notification, web page generation, and job control
- Steering interactions: steering-notification, steering-display, steering-input

Overview of Steering Portal (2)

- The user defines several steering components for the steering portal in a script:
  A) A set of applications in the workflow
  B) Condor DAGMan + steering workflow description
  - Translator for converting output to input to continue the workflow
  - Visualization program to display application output on the steering web page
  - Application input/output specifications
  - Parameters that require steering
- The steering portal:
  - Reads the above script
  - Automatically generates the steering web page
  - Interacts with DAGMan to notify users (email, etc.) and take input from the web portal

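As an illustration of generating a steering page from a description of the steerable parameters, here is a minimal sketch. The parameter dictionary, the form layout, the "continue"/"rerun" button values, and the CGI path are assumptions for the example; the slides do not show the actual description format or page layout.

```python
from html import escape


def render_steering_form(node: str, params: dict[str, str], action_url: str) -> str:
    """Build a minimal HTML form with one text field per steerable parameter.

    `params` maps a parameter name to its current value (hypothetical format).
    """
    rows = []
    for name, current in params.items():
        # Escape every user-visible value so it cannot break the markup.
        rows.append(
            f"<label>{escape(name)} "
            f'<input type="text" name="{escape(name)}" value="{escape(current)}"></label><br>'
        )
    return "\n".join([
        f"<h1>Steering: {escape(node)}</h1>",
        f'<form method="post" action="{escape(action_url)}">',
        *rows,
        # Two hypothetical decisions: accept the output, or rerun with new values.
        '<input type="submit" name="decision" value="continue">',
        '<input type="submit" name="decision" value="rerun">',
        "</form>",
    ])


if __name__ == "__main__":
    # Illustrative parameter name, value, and CGI path only.
    print(render_steering_form("mrbayes", {"ngen": "100000"}, "/cgi-bin/steer.cgi"))
```

Because the page is derived entirely from the parameter list, the same function would serve any node in the workflow, which is the point of generating pages from the script rather than writing them by hand.
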
Prototype Implementation

- Coordination between DAGMan and the steering portal
  - Use the DAGMan POST scripting function to invoke the steering portal
  - Use the DAGMan Retry function to resume workflow execution
- Prototype implementation of the steering portal
  - Interpretation of the steering descriptions embedded in the DAGMan workflow
  - Appropriate and multiple notifications and steering interfaces available
    - Notification and interfaces currently selected according to the script
    - Automated selection for the future
  - Mail and messaging notification function with embedded services
  - CGI web page generation onto the portal server using ssh
  - Steering from anywhere, anytime (incl. cell phones and PDAs)

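One way the POST script and Retry coordination could work is sketched below. It assumes standard DAGMan behavior: a non-zero POST script exit marks the node as failed, and a RETRY entry makes DAGMan run the node again. The node name, file names, decision values, and the mapping from decision to exit code are assumptions for illustration, not the prototype's actual logic.

```python
CONTINUE, RERUN = "continue", "rerun"  # hypothetical decision values from the portal


def dag_fragment(node: str, submit_file: str, post_script: str, max_retries: int) -> str:
    """Return DAGMan input-file lines that attach a POST script and a retry budget."""
    return "\n".join([
        f"JOB {node} {submit_file}",
        # $JOB and $RETURN are DAGMan macros: the node name and the job's exit code.
        f"SCRIPT POST {node} {post_script} $JOB $RETURN",
        f"RETRY {node} {max_retries}",
    ])


def post_exit_code(job_return: int, decision: str) -> int:
    """Map the job result and the user's decision to the POST script exit code.

    A non-zero POST exit fails the node; with RETRY set, DAGMan reruns it,
    which is how a "rerun with new parameters" decision could resume the workflow.
    """
    if job_return != 0:
        return 1  # the job itself failed; let DAGMan retry it
    if decision == CONTINUE:
        return 0  # user accepted the output: node succeeds, workflow moves on
    if decision == RERUN:
        return 1  # user supplied new parameters: fail the node so RETRY reruns it
    raise ValueError(f"unknown decision: {decision!r}")


if __name__ == "__main__":
    # Hypothetical node and file names.
    print(dag_fragment("mrbayes", "mrbayes.sub", "steer_post.py", 10))
    print(post_exit_code(0, RERUN))
```

Under these assumptions the retry count bounds how many times a user can ask for a rerun, so it would need to be set generously for long steering sessions.
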
Evaluation

- Apply to sample applications (a simple pi calculation and a more complex phylogenetic tree example)
- Evaluate the necessary "work steps"
- Items of evaluation
  A) Modification to the application program itself
  B) Condor DAGMan workflow description
  C) Translator for converting output to input to continue the workflow
  D) Visualization program to display application output on the steering web page
  E) Application input/output specifications
  F) Parameters that require steering
  G) Modifications to the Condor job submit file

Sample Pi Program

Eval. Item
A           No modification to the original program
E           Input: 4 inputs from stdin; Output: 3 number columns
F           2 inputs out of the 4 stdin inputs

Eval. Item  # Files  # Lines in Total
B           2        4
C           0        0
D           1        3
G           1        6

Phylogenetic Tree Program

Eval. Item
A           No modification to the original program
E           Input: 1 setup file, 1 data file; Output: 2 files
F           1 parameter value

Eval. Item  # Files  # Lines in Total
B           3        6
C           1        40
D           1        16
G           20 (1)   180

(1) 20 nine-line files; only 1 line differs amongst them

Conclusion and Future Work

- Conclusion
  - Proposed a steering portal that allows interactive steering of batch-scheduled jobs in Condor/DAGMan
  - Created prototypes with flexible notification and visualization/steering features
  - Applied to sample apps including Pi and phylogenetic trees
- Future work
  - Support and automatically select various interfaces
  - Apply to other applications, esp. with larger workflows and more complex interactions
  - Apply to other workflow engines


Contact info

- Satoshi Matsuoka, [email protected], Tokyo Institute of Technology