Somu Jayabalan
CSS 534: Parallel Programming Grid and Cloud - Programming Tasks
Assignment #4: Visualization of Sentinel Agent execution
Problem
To schedule MPI applications, Condor must be configured so that the machines running MPI jobs are
dedicated. Once Condor begins executing an MPI program, it runs the program until it ends; the
program is not preempted or suspended in the middle. If the program has a long computation cycle,
system resources on those machines may run low while it executes. Continuing on the same machines
under that condition degrades performance and lengthens execution time. Moreover, the resources
executing the program may stop satisfying the user-specified resource criteria while the program
is still running.
Recommendation
Resource capacity should therefore be checked while the user program is running, to decide whether
the program should continue on the same machines or be transferred to different nodes. If better
computing resources are found, the program can be stopped on the current nodes and resumed on the
new nodes.
Implementation
AgentTeamwork is a job management system similar to Condor. It is a Java-based system developed by
Prof. Fukuda. It consists of a daemon process (UWPlace) and a collection of mobile agents, which
run inside the daemon process.
PFAgent: PFAgent is a mobile agent that runs on all participating nodes and broadcasts resource
information (CPU, memory, network bandwidth, etc.).
Commander Agent: The Commander Agent is injected by the user to execute the user program. It then
spawns the Sentinel Agent and exits once the user program completes.
Sentinel Agent: The Sentinel Agent contacts the PFAgent to get the best computing nodes matching
the user-specified criteria and decides where to execute the user program. It also monitors the
execution of the user program; if it finds a better computing node, it stops the user program and
resumes it on that node. The Sentinel Agent moves to the new node along with the user program.
Once the user program completes, the Sentinel Agent notifies the Commander Agent.
[Figure: agent diagram. Recoverable labels: Commander, "spawns", "Submits job", Sentinel, and
"Migrates" (twice), with the Sentinel shown at successive nodes.]
In this final project, I visualized where the Sentinel Agent moves during the execution of the user
program. Since this is an MPI program, the visualization also shows the nodes (smaller circles)
that are listed in the mpd.hosts file. Whenever the Sentinel Agent moves, it sends information to
the Commander Agent, which writes it to a file (nodes.txt). The graphics application keeps reading
nodes.txt and displays its contents visually. At the end of the execution, the Commander Agent
writes "end" to the node file. When the graphics application reads "end", it stops reading the
node file.
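The reader side of this file protocol can be sketched in Java as follows. This is only an illustration, not the project's actual graphics code: the class name NodeTrace, the method readVisitedNodes, and the one-host-name-per-line layout of nodes.txt are all assumptions, since the report does not give the file format. The sketch reads a snapshot of the input once, whereas a live viewer would keep re-reading the file until the marker appears.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and file layout are assumptions, not AgentTeamwork API.
class NodeTrace {
    // Marker the Commander Agent writes once the job has finished.
    static final String END_MARKER = "end";

    // Returns host names in visiting order, stopping at the end marker or end of input.
    static List<String> readVisitedNodes(Reader source) {
        List<String> visited = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String host = line.trim();
                if (host.isEmpty()) {
                    continue;            // skip blank lines
                }
                if (host.equals(END_MARKER)) {
                    break;               // job finished, stop reading
                }
                visited.add(host);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return visited;
    }

    public static void main(String[] args) {
        String sample = "Uw1-320-00\nUw1-320-06\nend\nUw1-320-05\n";
        System.out.println(readVisitedNodes(new StringReader(sample)));
        // prints [Uw1-320-00, Uw1-320-06]
    }
}
```

To read the real file instead of the sample string, a java.io.FileReader opened on nodes.txt could be passed as the source.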
Execution output
Green represents nodes on which execution has completed, and red represents the currently
executing node. In the screenshot below, the Sentinel Agent initially started on Uw1-320-00 and
then migrated to Uw1-320-06, Uw1-320-05, Uw1-320-01, Uw1-320-04 and finally Uw1-320-07. The
smaller circles represent the nodes that were part of the mpd.hosts file.
Analysis
The original version of AgentTeamwork was implemented with a static list of nodes (defined in XML).
During my independent study, I enhanced the framework to select nodes dynamically, based on their
resource capacity. With the static list, the program may end up running on nodes that have low
capacity. I conducted a performance evaluation with the best computing nodes as well as the worst
computing nodes. The table below summarizes the results.
Iterations                                      Iteration#1   Iteration#2   Iteration#3
Best computing node execution time (seconds)    158.345       157.766       158.921
Worst computing node execution time (seconds)   167.682       166.039       163.266
Improvement                                     5.5%          4.9%          2.7%
In all three iterations, the best computing nodes outperformed the worst computing nodes.
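The report does not state how the improvement figures were derived. A plausible reading, assumed here, is the reduction in execution time relative to the worst-node time, (worst - best) / worst. The short Java sketch below (class and method names are illustrative) applies that formula to the measured times; it gives about 5.57%, 4.98% and 2.66%, which is close to the reported 5.5%, 4.9% and 2.7%.

```java
import java.util.Locale;

// Illustrative only: assumes improvement is measured relative to the worst-node time.
class Improvement {
    // Percentage reduction in run time of the best node relative to the worst node.
    static double percent(double bestSeconds, double worstSeconds) {
        return (worstSeconds - bestSeconds) / worstSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Execution times taken from the results table.
        double[] best  = {158.345, 157.766, 158.921};
        double[] worst = {167.682, 166.039, 163.266};
        for (int i = 0; i < best.length; i++) {
            System.out.println(String.format(Locale.ROOT, "Iteration#%d: %.2f%%",
                    i + 1, percent(best[i], worst[i])));
        }
    }
}
```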
Discussions
The best computing node is determined with the following formulas. I calculate a rank for each
computing node and then sort the nodes by rank. A higher rank represents a better computing node
and a lower rank a worse one.
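\[
\text{cpu\_capacity} = (\text{\#ofCPUs} \times \text{\#ofCores} \times \text{CPUSpeed}) \times (1 - \text{cpu\_load})
\]
\[
\text{cpu\_rank}_0 = \left( \frac{\text{cpu\_capacity}_0}{\text{cpu\_capacity}_0 + \text{cpu\_capacity}_1 + \dots + \text{cpu\_capacity}_n} \right) \%
\]
\[
\text{memory\_free\_rank}_0 = \left( \frac{\text{memory\_free}_0}{\text{memory\_free}_0 + \text{memory\_free}_1 + \dots + \text{memory\_free}_n} \right) \%
\]
\[
\text{memory\_load}_0 = \left( \frac{\text{TotalMemoryAvailable} - \text{memory\_free}_0}{\text{memory\_available}_0 + \text{memory\_available}_1 + \dots + \text{memory\_available}_n} \right)
\]
\[
\text{memory\_pressure\_rank}_0 = \left( \frac{100 - \text{memory\_load}_0}{\text{memory\_load}_0 + \text{memory\_load}_1 + \dots + \text{memory\_load}_n} \right) \%
\]
\[
\text{bandwidth\_rank}_0 = \left( \frac{\text{bandwidth}_0}{\text{bandwidth}_0 + \text{bandwidth}_1 + \dots + \text{bandwidth}_n} \right) \%
\]
\[
\text{overall\_rank}_0 = (\text{cpu\_rank}_0 \times 0.5) + (\text{memory\_free\_rank}_0 \times 0.2) + (\text{memory\_pressure\_rank}_0 \times 0.1) + (\text{bandwidth\_rank}_0 \times 0.2)
\]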
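As a rough illustration of how the ranking above might be computed, here is a minimal Java sketch. It is not the AgentTeamwork implementation: the class NodeRanker and all method names are hypothetical, the share-based ranks are taken as percentages of the sum over all nodes, and the memory-pressure rank follows the formula as read from the text, with memory load given in percent.

```java
import java.util.Arrays;

// Hypothetical sketch of the ranking formulas; not the AgentTeamwork code.
class NodeRanker {
    // Weights from the report, which describes them as arbitrarily selected.
    static final double W_CPU = 0.5;
    static final double W_MEM_FREE = 0.2;
    static final double W_MEM_PRESSURE = 0.1;
    static final double W_BANDWIDTH = 0.2;

    // cpu_capacity = (#CPUs * #cores * speed) * (1 - load), with load in [0, 1].
    static double cpuCapacity(int cpus, int cores, double speed, double load) {
        return cpus * cores * speed * (1.0 - load);
    }

    // Share of values[i] in the sum over all nodes, as a percentage.
    // Used for the CPU, free-memory and bandwidth ranks.
    static double sharePercent(double[] values, int i) {
        double total = Arrays.stream(values).sum();
        return total == 0.0 ? 0.0 : values[i] / total * 100.0;
    }

    // (100 - load_i) / (load_0 + ... + load_n), as a percentage; loads are in percent.
    static double memoryPressureRank(double[] memoryLoad, int i) {
        double total = Arrays.stream(memoryLoad).sum();
        return total == 0.0 ? 0.0 : (100.0 - memoryLoad[i]) / total * 100.0;
    }

    // Weighted sum of the four per-node ranks.
    static double overallRank(double cpuRank, double memFreeRank,
                              double memPressureRank, double bandwidthRank) {
        return cpuRank * W_CPU + memFreeRank * W_MEM_FREE
                + memPressureRank * W_MEM_PRESSURE + bandwidthRank * W_BANDWIDTH;
    }

    public static void main(String[] args) {
        // Two made-up nodes, for illustration only.
        double[] cpu = {cpuCapacity(1, 4, 2.0, 0.5), cpuCapacity(1, 4, 2.0, 0.0)};
        double[] memFree = {2048.0, 6144.0};
        double[] memLoad = {60.0, 40.0};
        double[] bandwidth = {100.0, 100.0};
        for (int i = 0; i < cpu.length; i++) {
            double rank = overallRank(sharePercent(cpu, i), sharePercent(memFree, i),
                    memoryPressureRank(memLoad, i), sharePercent(bandwidth, i));
            System.out.println("node " + i + " overall rank = " + rank);
        }
    }
}
```

Sorting the nodes by the returned value in descending order would then place the best computing node first.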
Further research
1) The weights allocated to the CPU, memory and bandwidth ranks were selected arbitrarily. Further
research is needed to determine appropriate weights and to find the correlations between them.
2) I migrate the Sentinel Agent if the overall_rank of the best node exceeds the current node's
rank by more than 2 (overall_rank > (current_rank + 2)). The migration cost (the time taken to
save the current program and to resume it on the destination node) still needs to be measured.
3) The performance evaluation needs to be conducted under simulated load conditions, which means I
need to develop stress scripts for CPU and memory.
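The migration rule from item 2 can be sketched as a small Java predicate. The first method is the rule stated in the report, overall_rank > (current_rank + 2). The second overload is purely hypothetical: it shows one way a measured migration cost could be folded into the decision once that cost is known, and its parameters (expected saving and migration cost, both in seconds) are assumptions rather than quantities the report measures.

```java
// Illustrative sketch; class and method names are not part of AgentTeamwork.
class MigrationPolicy {
    // Margin by which the best node's rank must exceed the current node's rank.
    static final double RANK_MARGIN = 2.0;

    // Rule from the report: migrate if overall_rank > (current_rank + 2).
    static boolean shouldMigrate(double bestRank, double currentRank) {
        return bestRank > currentRank + RANK_MARGIN;
    }

    // Hypothetical extension: also require the expected time saving to exceed
    // the cost of saving the program and resuming it on the destination node.
    static boolean shouldMigrate(double bestRank, double currentRank,
                                 double expectedSavingSeconds, double migrationCostSeconds) {
        return shouldMigrate(bestRank, currentRank)
                && expectedSavingSeconds > migrationCostSeconds;
    }

    public static void main(String[] args) {
        System.out.println(shouldMigrate(30.0, 27.5));            // true: 30.0 > 29.5
        System.out.println(shouldMigrate(30.0, 28.0));            // false: 30.0 is not > 30.0
        System.out.println(shouldMigrate(30.0, 27.5, 4.0, 6.0));  // false: cost exceeds saving
    }
}
```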