Centre for Parallel Computing, University of Westminster
8th IDGF Workshop, Hannover, August 17th 2011 – International Desktop Grid Federation

Experiences with the University of Westminster Desktop Grid
S C Winter, T Kiss, G Terstyanszky, D Farkas, T Delaitre

Contents
• Introduction to the Westminster Local Desktop Grid (WLDG)
  – Architecture, deployment management
  – EDGeS Application Development Methodology (EADM)
• Application examples
• Conclusions

Introduction to the Westminster Local Desktop Grid (WLDG)
[Campus map: six campuses (New Cavendish St, Marylebone Road, Regent Street, Wells Street, Little Titchfield St, Harrow) with per-campus node counts of 576, 559, 395, 31, 66 and 254.]

WLDG Environment
• DG server on a private University network
• Over 1500 client nodes on 6 different campuses
• Most machines are dual core; all run Windows
• Runs the SZTAKI Local Desktop Grid package
• Based on student laboratory PCs
  – If a PC is not being used by a student, it switches to DG mode
  – If no more work arrives from the DG server, it shuts down (green policy)

The DG Scenario
[Diagram: the end-user creates graph and concrete workflows in the gUSE / WS-PGRADE portal and submits them to the UoW Local Desktop Grid; the DG Submitter submits jobs and retrieves results via the 3G Bridge to the BOINC server; BOINC workers download the executable and input files and upload the results.]

WLDG: ZENworks deployment
• BOINC clients are installed and maintained automatically by specifically developed Novell ZENworks objects
  – The BOINC client InstallShield executable was converted into an MSI package (using the /a switch on the BOINC client executable)
  – From this MSI file a ZENworks object was generated that installs the client software
  – The BOINC client is part of the generic image installed on all lab PCs throughout the University
  – This guarantees that any newly purchased and installed PC automatically becomes part of the WLDG
• All clients are registered under the same user account

EDGeS Application Development Methodology (EADM)
• A generic methodology for DG application porting
• Motivation: porting or developing an application for an SG/DG platform requires special focus
• Defines how the recommended software tools, e.g. those developed by EDGeS, can aid this process
• Supports iterative methods:
  – well-defined stages suggest a logical order
  – but, since the process is non-linear in most cases, the results of previous stages can be revisited and revised at any point

EADM – Defined Stages
1. Analysis of current application
2. Requirements analysis
3. Systems design
4. Detailed design
5. Implementation
6. Testing
7. Validation
8. Deployment
9. User support, maintenance & feedback

Application Examples
• Digital Alias-Free Signal Processing
• AutoDock Molecular Modelling

Digital Alias-Free Signal Processing (DASP)
• Users: Centre for Systems Analysis, University of Westminster
• Traditional DSP is based on uniform sampling
  – Suffers from aliasing
• Aim: Digital Alias-free Signal Processing (DASP)
  – One solution is Periodic Non-uniform Sampling (PNS)
• The DASP application designs PNS sequences
• Selecting the optimal sampling sequence is a computationally expensive process
  – A linear equation has to be solved and a large number of solutions (~10^10) compared
• The analyses of the solutions are independent of each other – suitable for DG parallelisation

DASP – Parallelisation
[Diagram: the linear equation is solved for q_r, q_{r+1}, …, q_{2r-1}; the solutions are then striped round-robin across m computers – Computer 1 finds the best permutation for solutions 1, 1+m, 1+2m, …; Computer 2 for solutions 2, 2+m, 2+2m, …; Computer m for solutions m, 2m, 3m, …; each returns its locally best solution, from which the globally best solution is selected. A sketch of this decomposition follows.]
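As an illustration of the decomposition in the diagram, here is a minimal, self-contained Python sketch (names such as quality are hypothetical; the real WLDG implementation packages each stripe as a BOINC work unit rather than a local process):

```python
"""Illustrative sketch of the DASP round-robin decomposition (not the WLDG
code): each of m workers analyses every m-th candidate solution independently,
then the m locally best results are reduced to the globally best one."""
from concurrent.futures import ProcessPoolExecutor

def quality(solution):
    # Placeholder scoring function: the real analysis of a PNS candidate
    # is application-specific and computationally expensive.
    return -sum((x - 1.0) ** 2 for x in solution)

def best_of_stripe(args):
    solutions, i, m = args
    # Worker i handles the round-robin stripe: solutions i, i+m, i+2m, ...
    return max(solutions[i::m], key=quality)

def globally_best(solutions, m):
    with ProcessPoolExecutor(max_workers=m) as pool:
        local_bests = pool.map(best_of_stripe,
                               [(solutions, i, m) for i in range(m)])
        # Final reduction: compare the m locally best solutions.
        return max(local_bests, key=quality)

if __name__ == "__main__":
    candidates = [(a / 7.0, b / 7.0) for a in range(100) for b in range(100)]
    print(globally_best(candidates, m=4))
```

Because each stripe is analysed independently, the only communication is the final comparison of the m locally best results, which is exactly what makes the problem suitable for a desktop grid.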
DASP – Performance test results

Period T | Sequential   | DG (worst) | DG (median) | DG (best)    | # of work units | Speedup (best case) | # of nodes involved
18       | 13 min       | 9 min      | 7 min       | 4 min        | 50              | 3.3                 | 59
20       | 2 hr 29 min  | 111 min    | 43 min      | 20 min       | 100             | 7.5                 | 97
22       | 26 hr 40 min | 5 hr 1 min | 3 hr 24 min | 2 hr 31 min  | 723             | 11                  | 179
24       | ~820 hr      | n/a        | n/a         | 17 hr 54 min | 980             | 46                  | 372

DASP – Addressing the performance issues
• Inefficient load balancing
  – solutions of the equation should be grouped based on the execution time required to analyse individual solutions
• Inefficient work unit generation
  – some of the solutions should be divided into subtasks (more work units)
  – this limits the possible speed-up
• The user community and application developers should consider redesigning the algorithm

AutoDock Molecular Modelling
• Users: Dept of Molecular & Applied Biosciences, UoW
• AutoDock:
  – a suite of automated docking tools
  – designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure
• Application components:
  – AutoDock performs the docking of the ligand to a set of grids describing the target protein
  – AutoGrid pre-calculates these grids

Need for Parallelisation
• One run of AutoDock finishes in a reasonable time on a single PC
• However, thousands of scenarios have to be simulated and analysed to get stable and meaningful results
  – AutoDock has to be run multiple times with the same input files but with different random factors
  – Simulation runs are independent of each other – suitable for DG
• AutoGrid does not require Grid resources

AutoDock component workflow
[Workflow diagram: the ligand and receptor pdb files are converted into pdbqt files by prepare_ligand4.py and prepare_receptor4.py; AUTOGRID takes the gpf descriptor file and pre-calculates the map files; multiple parallel AUTODOCK instances take the dpf descriptor file and produce dlg files; two helper scripts select the best dlg files and generate the final pdb file.]

Computational workflow in P-GRADE
Inputs: receptor.pdb, ligand.pdb, the gpf and dpf descriptor files, the number of work units, and the AutoGrid executables and scripts (uploaded by the developer; do not change them).
1. The Generator job creates the specified number of AutoDock jobs.
2. The AutoGrid job creates pdbqt files from the pdb files, runs the autogrid application, generates the map files and zips them into an archive file. This archive becomes the input of all AutoDock jobs.
3. The AutoDock jobs run on the Desktop Grid and produce dlg files as output.
4. The Collector job collects the dlg files, takes the best results and concatenates them into an output pdb file.

AutoDock – Performance test results
[Chart: speedup (0–200) against the number of work units (10 to 3000).]

DG Drawbacks: The “Tail” Problem
[Two charts contrasting job completion when Jobs >> Nodes and when Jobs ≈ Nodes.]

Tackling the Tail Problem
• Augment the DG infrastructure with more reliable nodes, e.g. service grid or cloud resources
• Redesign the scheduler to detect the tail and resubmit tardy tasks to the SG or cloud (see the sketch below)
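A minimal sketch of that detect-and-resubmit idea (hypothetical types and thresholds, not the actual WLDG/3G Bridge scheduler): once most of a batch has completed, any task running well past the median runtime is flagged for speculative duplication on a reliable SG or cloud node.

```python
"""Illustrative tail detection with speculative duplication (assumed names
and thresholds; the real scheduler sits behind the BOINC server)."""
import time
from dataclasses import dataclass, field

TAIL_THRESHOLD = 0.9  # assumed: act once 90% of the batch has finished
TARDY_FACTOR = 2.0    # assumed: tardy = running for > 2x the median runtime

@dataclass
class Task:
    task_id: int
    start_time: float
    done: bool = False

@dataclass
class Batch:
    tasks: list = field(default_factory=list)
    runtimes: list = field(default_factory=list)  # runtimes of finished tasks

    def progress(self):
        done = sum(t.done for t in self.tasks)
        return done, len(self.tasks)

    def median_runtime(self):
        r = sorted(self.runtimes)
        return r[len(r) // 2] if r else float("inf")

def find_stragglers(batch, now):
    """Return tasks worth duplicating on a reliable (SG/cloud) resource."""
    done, total = batch.progress()
    if done / total < TAIL_THRESHOLD:
        return []  # not in the tail yet: let the DG finish normally
    limit = TARDY_FACTOR * batch.median_runtime()
    return [t for t in batch.tasks
            if not t.done and now - t.start_time > limit]

if __name__ == "__main__":
    now = time.time()
    batch = Batch(
        tasks=[Task(0, now - 500)] +
              [Task(i, now - 60, done=True) for i in range(1, 10)],
        runtimes=[50, 55, 60, 58, 52, 61, 54, 57, 59],
    )
    print([t.task_id for t in find_stragglers(batch, now)])  # -> [0]
```

The first result returned for a duplicated work unit wins and the redundant copy is cancelled, trading a small amount of extra computation for a much shorter tail.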
Cloudbursting: Indicative Results

AutoDock – Conclusions
• The Cygwin-on-Windows implementation inhibited performance – this can be improved using, e.g.:
  – the DG to EGEE bridge
  – cloudbursting
• AutoDock is a black-box legacy application
  – source code not available
  – code-based improvement not possible

Further Applications
• Ultrasound Computer Tomography – Forschungszentrum Karlsruhe
• EMMIL – E-marketplace optimization – SZTAKI
• Anti-Cancer Drug Research (CancerGrid) – SZTAKI
• Integrator of Stochastic Differential Equations in Plasmas – BIFI
• Distributed Audio Retrieval – Cardiff University
• Cellular Automata based Laser Dynamics – University of Sevilla
• Radio Network Design – University of Extremadura
• X-ray diffraction spectrum analysis – University of Extremadura
• DNA Sequence Comparison and Pattern Discovery – Erasmus Medical Center
• PLINK – Analysis of genotype/phenotype data – Atos Origin
• 3D video rendering – University of Westminster

Conclusions – Performance Issues
• Performance enhancements
  – accrue from cyclical enterprise-level hardware and software upgrades
• These are countered by performance degradation
  – arising from the shared nature of the resources
• Need to focus on robust performance measures
  – in the face of random, unpredictable run-time behaviours

Conclusions – Load Balancing Strategies
• Heterogranular workflows
  – Tasks can differ widely in size and run time
  – Performance prediction, based e.g. on previous runs, can inform mapping (up to a point)…
  – …but beyond this, the code may need to be re-engineered (white box only)…
  – …or bottleneck tasks offloaded to reliable resources
• Homogranular workflows
  – Classic example: the parameter sweep problem
  – Fine-grained problems (#Tasks >> #Nodes) help smooth out the overall performance, but…
  – …the tail problem can be significant (especially if #Tasks ≈ #Nodes)
  – Remedy: smart detection of delayed tasks coupled with speculative duplication

Conclusions – Deployment Issues
• Integration within an enterprise desktop management environment has many advantages, e.g.:
  – PCs and applications are continually upgraded
  – Hosts and licenses are “free” to the DG
• …but also some drawbacks:
  – No direct control
    • Typical environments can be slack and dirty
    • Corporate objectives can override DG service objectives
    • Examples: the current UoW Win7 deployment, the green agenda
  – Service relationship, based on trust
    • DG bugs can easily damage the trust relationship if not caught quickly
    • Example: a recent GenWrapper bug
  – Non-dedicated resource
    • Must give way to priority users, e.g. students

The End
Any questions?