A Steering Portal for Condor/DAGMAN
Naoya Maruyama, on behalf of Akiko Iino, Hidemoto Nakada, Satoshi Matsuoka
Tokyo Institute of Technology

Background
  - Common Grid usage scenario
      - Zillions of batch jobs scheduled over a combination of private/public resources within a VO
      - Some jobs require steering during the workflow: "human decision required"
  - Most previous steering work focused on GUI-level interactivity
      - Real-time, interactive steering of the application itself
      - Does not meld well with batch jobs
      - Needs significant application customization

Objectives and Contributions
  - Objectives
      - A steering portal for workflow (DAGMan) jobs with easy descriptions, without application, Condor, or DAGMan modifications
  - Contributions
      - Portal that allows steering with simple additions to DAGMan scripts
      - Confirmed low overhead with exemplar applications
      - Quantitative assessment of the user steps required

Outline
  - Background
  - Motivating example
  - Required features of steering
  - Steering example
  - Overview and prototype implementation
  - Evaluation
  - Conclusion

Exemplar Application: Phylogenetic Tree Inference
  - Infer phylogenetic relationships between different species from their genomic sequences [Hasegawa&Shimodaira04]
  - App characteristics
      - Basically executes multiple parallel jobs in sequence => workflow of batch jobs
      - But the termination condition of the application phases is difficult to judge => needs human steering
  [Figure: phylogenetic tree; label: Common Ancestor]

Phylogenetic Tree Inference Breakdown
  - Compute posterior probability ("MrBayes")
  - Compute likelihood value ("PAML")
  - Test ("CONSEL")
  - Narrowing down the candidate phylogenetic trees is hard to automate => difficult to run as batch jobs

List of Applications in the WF
  Job      Description                    Input                                   Output              Compute Time Required
  MrBayes  Compute posterior probability  Initial topology                        List of topologies  ~2 weeks on 24 high-end CPUs
  PAML     Compute likelihood value       List of topologies                      Likelihood values   ~10 days on 26 high-end CPUs
  CONSEL   Test                           List of topologies & likelihood values  Probability values  1~2 hours on 1 CPU

The Actual Workflow
  [Figure: workflow diagram of the numbered steps below, with a "Need Steering" annotation]
  1. Exec. MrBayes
  2. Termination judgement
  3. Manual input of new parameters
  4. Post-process MrBayes
  5. Execute PAML
  6. Execute CONSEL

MrBayes Example and Problems
  - As a standalone app, requests interactive input
  - Up to the user to judge computational convergence
  - But lacks the information display needed for good judgment ("Not on this screen!")
  Problems:
  1. User needs to periodically poll the screen and make interactive input
  2. User must also look at output files from 1000 jobs!

MrBayes Examples and Problems (2)
  - Decide on convergence
  - Decide on next parameter
  [Figure labels: Output file, Visualize]
  Problems:
  3. Manual conversion to graphical display
  4. Changing the appropriate parameters

Steering Portal Features for Batch Workflows with Interactivity Elements
  - Pausing/resuming computation
      - Allow flexible parameter modifications
      - Progress computation as much as possible until user input is absolutely needed
      - Resume immediately after input
  - Various ways to specify parameters for output and input
      - Various ways to notify users: interactive screen, email, etc.
      - Various ways of parameter observation: various portal functions
      - Various ways to modify parameters
      - Even switching back and forth between your terminal and a cell phone 10,000 miles away!

Example (1): Job Submission
  - Standard Condor/DAGMan job submission
  - But includes steering functions in the job description

Example (2): User Notification
  - Various notification methods, incl. email
  - Displays the portal URL in the message
  - Works on various devices, incl. cell phones

Example (3): Steering Portal
  - Visualize current status
  - Parameter input
  - Continuing of workflow
  - Portal generates steering web pages dynamically depending on workflow context

Overview of our Steering Portal
  [Figure: architecture diagram. Labels: Workflow and steering description; DAGMan/Condor submission; Individual job submissions; Condor pool; POST scripting; Retry function; Steering Portal (web page generation and job control); Features: steering-notification, steering-display, steering-input; User notification]

Overview of Steering Portal (2)
  The user defines several steering components for the steering portal in a script:
  A) A set of applications in the workflow
  B) Condor DAGMan + steering workflow description
      - Translator for converting output to input to continue workflow
      - Visualization program to display application output on steering web page
      - Application input/output specifications
      - Parameters that require steering
  The steering portal does:
  - Read the above script
  - Automatically generate the steering web page
  - Interact with DAGMan to notify users (email, etc.) and take input from the web portal

Prototype Implementation
  - Coordination between DAGMan and the Steering Portal
      - Use the DAGMan POST scripting function to invoke the steering portal
      - Use the DAGMan Retry function to resume workflow execution
  - Prototype implementation of the Steering Portal
      - Interpretation of the steering descriptions embedded in the DAGMan workflow
      - Appropriate and multiple notifications and steering interfaces available
          - Notification and interfaces currently selected according to script
          - Automated selection for the future
      - Mail and messaging notification function with embedded services
      - CGI web page generation onto the portal server using ssh
      - Steering from anywhere, anytime (incl. cell phones and PDAs)

Evaluation
  - Apply to sample applications (simple pi calculation and more complex phylogenetic tree example)
  - Evaluate the necessary "work steps"
  Items of evaluation:
  A) Modification to the application program itself
  B) Condor DAGMan workflow description
  C) Translator for converting output to input to continue workflow
  D) Visualization program to display application output on steering web page
  E) Application input/output specifications
  F) Parameters that require steering
  G) Modifications to the Condor job submit file

Sample Pi Program
  Eval. Item  Result
  A           No mod to the original program
  E           Input: 4 inputs from stdin; Output: 3 number columns
  F           2 inputs out of the 4 stdin

  Eval. Item  # Files  # Lines in Total
  B           2        4
  C           0        0
  D           1        3
  G           1        6

Phylogenetic Tree Program
  Eval. Item  Result
  A           No mod to the original program
  E           Input: 1 setup file, 1 data file; Output: 2 files
  F           1 parameter value

  Eval. Item  # Files  # Lines in Total
  B           3        6
  C           1        40
  D           1        16
  G           20 (1)   180
  (1) 20 9-line files, only 1 line differs amongst them

Conclusion and Future Work
  - Conclusion
      - Proposed a Steering Portal that allows interactive steering of batch-scheduled jobs in Condor/DAGMan
      - Created prototypes with flexible notification and visualization/steering features
      - Applied to sample apps including Pi and phylogenetic trees
  - Future work
      - Support and automatically select various interfaces
      - Apply to other applications, esp. those with larger workflows and more complex interactions
      - Apply to other workflow engines

Contact info
  Satoshi Matsuoka, [email protected], Tokyo Institute of Technology
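
The coordination described under Prototype Implementation (a DAGMan POST script invokes the steering portal, and the DAGMan Retry function resumes execution) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' code. The file names mrbayes.sub, paml.sub, consel.sub, steer_post.py, and decision.json, and the "converged" key, are all hypothetical. The sketch also assumes the portal records the user's decision in a JSON file that the POST script waits for. It relies only on two DAGMan behaviors: a node counts as failed when its POST script exits non-zero, and RETRY reruns a failed node.

```python
import json
import time
from pathlib import Path


def render_dag(retries: int = 10) -> str:
    """Return a DAGMan description for the MrBayes -> PAML -> CONSEL chain.

    All file names here are hypothetical placeholders.
    """
    lines = [
        "JOB mrbayes mrbayes.sub",
        "JOB paml paml.sub",
        "JOB consel consel.sub",
        # The POST script runs after the MrBayes job and is the steering hook.
        "SCRIPT POST mrbayes steer_post.py decision.json",
        # A non-zero POST exit fails the node; RETRY then reruns it.
        f"RETRY mrbayes {retries}",
        "PARENT mrbayes CHILD paml",
        "PARENT paml CHILD consel",
    ]
    return "\n".join(lines) + "\n"


def wait_for_decision(path: Path, poll_seconds: float = 30.0) -> dict:
    """Block until the (assumed) portal has written the user's decision.

    Blocking here is what pauses the workflow at the steered node.
    """
    while not path.exists():
        time.sleep(poll_seconds)
    return json.loads(path.read_text())


def exit_code_for(decision: dict) -> int:
    """Map a steering decision to a POST-script exit code.

    0: the user judged the run converged, so the node succeeds and the
       workflow continues to the next job.
    1: the user asked for another run, so the node fails and RETRY reruns it.
    """
    return 0 if decision.get("converged") else 1


def main(argv: list) -> int:
    """Entry point a real steer_post.py would pass to sys.exit()."""
    return exit_code_for(wait_for_decision(Path(argv[1])))
```

In this sketch the portal's web form would write decision.json, along with any new parameter file for the next run, before the POST script returns. A real script would also send the notification email and remove the decision file between retries.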