Unit 4 - Programming Model
Prof. B. Chandramouli
Syllabus
• Open source grid middleware packages
• GT4 Architecture, configuration
• Usage of Globus
• Main components and programming model
• Introduction to Hadoop Framework
• MapReduce, input splitting, map and reduce functions
• Specifying I/O parameters, configuring and running a job
• Design of Hadoop file system, HDFS concepts
• Command line and Java interface, data flow of file read and file write
Open source grid middleware packages
• A grid system includes
• Computational resources
• Storage resources
• Network resources
• Scientific instruments
• Grid middleware is software that provides users with
• Access to resources
• Computing ability
Popular grid middleware packages
• UNICORE (open source)
• Focus: High level programming models (Java on Unix)
• GLOBUS (open source)
• Focus: Low level services (C and Java on Unix)
• GRIDBUS (open source)
• Focus: Abstraction and market models (Java on Unix)
• LEGION (not open source)
• Focus: High level programming models (C++ on Unix)
Building a Grid service with GT4
OGSI is the basic building block of OGSA. GT3 implemented OGSI directly; GT4 implements
its successor, the WS-Resource Framework (WSRF). A grid service is an extension of a
web service, so it can be built with GT4.
Globus Toolkit 4
• GT4 has become a de facto standard for grid middleware
• Using GT4, we can build a computational grid and run grid based applications.
• The Globus toolkit has four major parts (services):
• Security – Components to provide a security envelope and secure access (GSI
– Grid Security Infrastructure).
• Information – Monitoring and discovery of resources and services (MDS)
• Execution – Grid Resource Allocation and Management (GRAM)
• Data management – Access and transfer of data (GridFTP)
User employing Globus services in a Grid
Security Shield – Grid Security Infrastructure (GSI)
GT4 Architecture
GT4 Architecture aspects 1 of 3
• Service implementations (i.e., infrastructure services):
• Resource allocation management (GRAM)
• Data access and data movement (GridFTP, Reliable File Transfer - RFT)
• Replica management (RLS – Replica Location Service)
• Credential management – security (MyProxy, Delegation, Community
Authorization Service - CAS)
• Discovery and monitoring of resources (Index, Trigger)
GT4 Architecture aspects 2 of 3
• Containers:
• Java
• Python
• C
• These containers are open source environments that host web
services, including support for
• WS-Resource Framework (WSRF)
• WS-Notification
• WS-Security
GT4 Architecture aspects 3 of 3
• Class libraries
• Used by client programs to invoke services
Client Server Communication
• WS-I (Web Services Interoperability) compliant transport that communicates
using SOAP messaging
• X.509 end-entity and proxy certificates for single sign-on.
GT4 service components - GRAM
• Grid Resource Allocation and Management
• After discovery of resources, GRAM initiates, monitors and manages the
execution of computations on remote computers
• GRAM is also responsible for restarting the process in the event of resource
failure or service failure
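The restart behaviour described above can be sketched as a small state machine. This is only an illustration and not GRAM code: the state names are a simplified subset, and `run_with_restart`, `flaky_task` and `max_restarts` are hypothetical names chosen for the example.

```python
from enum import Enum


class JobState(Enum):
    # Simplified subset of job states, chosen for this sketch.
    PENDING = "Pending"
    ACTIVE = "Active"
    DONE = "Done"
    FAILED = "Failed"


def run_with_restart(task, max_restarts=2):
    """Run `task`, restarting it after a failure up to `max_restarts` times.

    Returns the final state and the list of states the job passed through.
    """
    history = [JobState.PENDING]
    for attempt in range(max_restarts + 1):
        history.append(JobState.ACTIVE)
        try:
            task(attempt)
        except RuntimeError:
            # Simulated resource or service failure: record it and retry.
            history.append(JobState.FAILED)
            continue
        history.append(JobState.DONE)
        return JobState.DONE, history
    return JobState.FAILED, history


def flaky_task(attempt):
    # Hypothetical workload: the first attempt lands on a failed resource.
    if attempt == 0:
        raise RuntimeError("resource failure")


final_state, history = run_with_restart(flaky_task)
print(final_state.value, [s.value for s in history])
# Done ['Pending', 'Active', 'Failed', 'Active', 'Done']
```

The history list plays the role of the monitoring information a user would see while the job is managed on the remote machine.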
GT4 service components - GSI
• Grid Security Infrastructure
• Provides authentication to grid users
• Ensures secure communication
• Single sign-on through certificates
• Data encryption
• Technologies used are
• SSL – Secure Sockets Layer
• PKI – Public Key Infrastructure
• X.509 – Certificate standard used for security
Single sign-on using Trust Authority
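To make single sign-on with certificates more concrete, here is a toy model of how a short-lived proxy certificate chains back to a trusted authority. It is a sketch under strong assumptions: the `Cert` class, the subject names and the integer timestamps are hypothetical, and no cryptographic signature is checked, which a real GSI implementation must do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    expires_at: int  # hypothetical timestamp in arbitrary units


def chain_is_trusted(chain, trusted_cas, now):
    """Check a chain ordered from the proxy certificate up to the user certificate.

    Toy rules only: every certificate must be unexpired, each certificate must
    be issued by the subject of the next one, and the last certificate must be
    issued by a trusted authority. Signatures are NOT verified here.
    """
    if not chain:
        return False
    for cert in chain:
        if cert.expires_at <= now:
            return False
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    return chain[-1].issuer in trusted_cas


# Hypothetical identities for illustration.
user = Cert(subject="/O=Grid/CN=Alice", issuer="/O=Grid/CN=Example CA", expires_at=1000)
proxy = Cert(subject="/O=Grid/CN=Alice/CN=proxy", issuer="/O=Grid/CN=Alice", expires_at=12)
trusted = {"/O=Grid/CN=Example CA"}

print(chain_is_trusted([proxy, user], trusted, now=5))   # True
print(chain_is_trusted([proxy, user], trusted, now=20))  # False: proxy expired
```

The user signs in once to create the proxy; services then accept the proxy for its short lifetime without asking for the long-term credential again.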
GT4 service components – GridFTP, RFT, RLS
• Data management package
• Transmits, stores and manages massive data sets
• Components of this service are
• GridFTP (normal FTP + enhanced security)
• RFT (Reliable File Transfer)
• RLS (Replica Location Service)
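The idea behind a replica location service can be sketched as a catalog that maps one logical file name to the physical copies of that file. This is a minimal illustration, not the RLS API: `ReplicaCatalog`, its methods and the example URLs are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog mapping logical file names to physical replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        # Record a replica once, keeping registration order.
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        # Return a copy so callers cannot modify the catalog by accident.
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
catalog.register("climate-2005.dat", "gsiftp://site-a.example.org/data/climate-2005.dat")
catalog.register("climate-2005.dat", "gsiftp://site-b.example.org/mirror/climate-2005.dat")

print(catalog.lookup("climate-2005.dat"))
print(catalog.lookup("missing.dat"))  # []
```

A client that needs the file asks for the logical name and can then pick the nearest or least loaded replica to fetch over GridFTP.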
GT4 Job workflow
GT4 Configuration *
• GridFTP configuration
• Installed by default when GT4 is installed. No specific configuration is required, but
GridFTP must be started using commands to bring up the FTP services
• RFT configuration
• It performs third party transfers between GridFTP servers and records transfer status in a
database (PostgreSQL v8.1.4). This database must be installed.
• GRAM configuration
• Installed by default when GT4 is installed. No specific configuration is required. GRAM
executes and manages jobs through the local scheduler.
• * Refer to the book for commands
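Why RFT needs a database can be shown with a small sketch: transfer status is stored persistently, so a transfer that fails can be retried later without repeating the ones already finished. This is not RFT code. It uses Python's built-in `sqlite3` in place of PostgreSQL, and the table layout, function names and URLs are assumptions made for the example.

```python
import sqlite3


def open_status_db():
    # In-memory SQLite stands in for the PostgreSQL database used by RFT.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers ("
        "source TEXT, destination TEXT, status TEXT, "
        "PRIMARY KEY (source, destination))"
    )
    return db


def request_transfer(db, source, destination):
    db.execute(
        "INSERT OR IGNORE INTO transfers VALUES (?, ?, 'Pending')",
        (source, destination),
    )


def run_pending(db, copy):
    """Attempt every transfer that is not yet done and record the outcome."""
    rows = db.execute(
        "SELECT source, destination FROM transfers "
        "WHERE status != 'Done' ORDER BY source"
    ).fetchall()
    for source, destination in rows:
        try:
            copy(source, destination)
            status = "Done"
        except OSError:
            status = "Failed"
        db.execute(
            "UPDATE transfers SET status = ? WHERE source = ? AND destination = ?",
            (status, source, destination),
        )


def statuses(db):
    return dict(db.execute("SELECT source, status FROM transfers ORDER BY source"))


attempts = {}


def flaky_copy(source, destination):
    # Hypothetical transfer function: b.dat fails on its first attempt.
    attempts[source] = attempts.get(source, 0) + 1
    if source.endswith("b.dat") and attempts[source] == 1:
        raise OSError("connection dropped")


db = open_status_db()
request_transfer(db, "gsiftp://site-a.example.org/a.dat", "gsiftp://site-b.example.org/a.dat")
request_transfer(db, "gsiftp://site-a.example.org/b.dat", "gsiftp://site-b.example.org/b.dat")

run_pending(db, flaky_copy)
first_pass = statuses(db)
run_pending(db, flaky_copy)
second_pass = statuses(db)

print(sorted(first_pass.values()))   # ['Done', 'Failed']
print(sorted(second_pass.values()))  # ['Done', 'Done']
```

On the second pass only the failed transfer is retried, because the finished one is already marked in the database.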
Usage of Globus GT4
• Defining a job
• A job is a single process or multiple processes created as an
outcome of a job request
• Staging files
• Transferring executables and data files to the required destination
without user intervention. To transfer, we must provide source and
destination URLs
• Submitting a job (2 steps)
• Data transfer (GridFTP or GASS protocol)
• Job submission (GRAM has tools to submit a job)
• Monitoring a job (3 tasks)
• Track status of the submitted job
• Collect output
• Clean up files
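Defining a job and its staging step can be sketched by building a small XML job description. The element names used here (`job`, `executable`, `argument`, `stdout`, `fileStageIn`, `transfer`, `sourceUrl`, `destinationUrl`) follow the GT4 WS-GRAM job description format as best recalled and should be treated as assumptions to check against the toolkit documentation. The helper `build_job_description`, the paths and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_job_description(executable, arguments, stage_in, stdout_path):
    """Build a job description with optional file staging.

    `stage_in` is a list of (source_url, destination_url) pairs: the source
    and destination URLs that staging requires.
    """
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        ET.SubElement(job, "argument").text = arg
    ET.SubElement(job, "stdout").text = stdout_path
    if stage_in:
        staging = ET.SubElement(job, "fileStageIn")
        for source_url, destination_url in stage_in:
            transfer = ET.SubElement(staging, "transfer")
            ET.SubElement(transfer, "sourceUrl").text = source_url
            ET.SubElement(transfer, "destinationUrl").text = destination_url
    return job


# Hypothetical job: count the lines of an input file staged from another site.
job = build_job_description(
    executable="/usr/bin/wc",
    arguments=["-l", "input.txt"],
    stage_in=[("gsiftp://site-a.example.org/data/input.txt",
               "file:///tmp/job1/input.txt")],
    stdout_path="/tmp/job1/stdout.txt",
)
print(ET.tostring(job, encoding="unicode"))
```

The resulting document would then be handed to a GRAM submission tool, after which the job is monitored, its output collected and its staged files cleaned up.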
Main components and Programming model of GT4
• Main components
• Security component
• GSI (Grid Security Infrastructure)
• Data Management component
• GridFTP
• RFT (Reliable File Transfer)
• Data Replica component
• RLS (Replica Location Service)
• DRS (Data Replication Service)
• Execution management
• GRAM (Grid Resource Allocation and Management)
• Monitoring and Discovery Services (MDS)
• Aggregator services (general framework to build services and
aggregate data)
• Index
• Trigger
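The roles of the Index and Trigger services in MDS can be sketched as follows: the index aggregates the latest information published by each resource and answers queries, while a trigger runs an action when published data meets a condition. This is an illustration only, not the MDS API. The `Index` class, the `free_cpus` field and the node names are hypothetical.

```python
class Index:
    """Toy index that aggregates the latest report from each resource."""

    def __init__(self):
        self._entries = {}
        self._triggers = []

    def add_trigger(self, condition, action):
        # Run `action` whenever published info satisfies `condition`.
        self._triggers.append((condition, action))

    def publish(self, resource, info):
        self._entries[resource] = info
        for condition, action in self._triggers:
            if condition(info):
                action(resource, info)

    def query(self, predicate):
        # Discovery: names of resources whose info matches the predicate.
        return sorted(r for r, info in self._entries.items() if predicate(info))


alerts = []
index = Index()
index.add_trigger(
    condition=lambda info: info["free_cpus"] == 0,
    action=lambda resource, info: alerts.append(resource),
)

index.publish("node-a", {"free_cpus": 8})
index.publish("node-b", {"free_cpus": 0})

print(index.query(lambda info: info["free_cpus"] >= 4))  # ['node-a']
print(alerts)                                            # ['node-b']
```

Here the query corresponds to resource discovery before a job is submitted, and the alert corresponds to monitoring a resource that has run out of capacity.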
Possible 2 mark questions
1. Define grid middleware
2. List 4 popular grid middleware packages
3. What are the 4 major parts (services) of GT4?
4. Describe the purpose of GSI, GRAM, MDS and GridFTP
5. List 2 safe file transfer protocols in Globus
6. What 3 components need to be installed during GT4 configuration?
7. What does staging a file mean?
Possible big questions
8 marks
1. Write a short note on GT4 configuration
2. Write a short note on GT4 usage
3. Write a short note on GT4 main components
16 marks
1. Explain with a neat diagram the GT4 architecture in detail.
2. Explain with a neat diagram the GT4 Job workflow in detail.