AVERAGES, DISTRIBUTIONS AND SCALABILITY OF MPI COMMUNICATION TIMES
FOR ETHERNET AND MYRINET NETWORKS
Nor Asilah Wati Abdul Hamid and Paul Coddington
Presented by:
Ibrahim Saidu GS22854
Kumane Saed GS24433
Cheng Kian Yong GS24460
Luay GS 21605
Lecturer: Dr. Nor Asilah Wati Abdul Hamid
INTRODUCTION
• In the past few years, commodity clusters have
become the dominant architecture for high-performance
computing.
• Most parallel programs that run on clusters use the
Message Passing Interface (MPI) to communicate
data between the nodes of the cluster.
• It is well known that Myrinet with GM has
significant advantages over Fast Ethernet with TCP.
• In the case of Ethernet with TCP, retransmit timeouts
(RTOs) can also occur.
PROBLEM STATEMENT
• Most modern parallel computers are clusters using Myrinet
or Ethernet communication networks.
• Several studies have been published comparing the
performance of these two networks for parallel computing.
However, these focus on average performance and do not
address the distributions of communication times, which
can have long tails due to contention effects.
• In the case of Ethernet with TCP, retransmit
timeouts (RTOs) can also occur.
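The gap between an average and the full distribution can be illustrated with a small sketch. The sample below is synthetic and purely hypothetical (990 messages at 25 ms and 10 delayed messages at 225 ms); it is not data from the study, and the helper name `summarize` is an assumption made for illustration.

```python
import statistics

def summarize(times_ms):
    """Return the mean, median and maximum of a list of communication times."""
    return {
        "mean": statistics.mean(times_ms),
        "median": statistics.median(times_ms),
        "max": max(times_ms),
    }

# Hypothetical sample: most messages are fast, a few suffer a long delay.
sample = [25.0] * 990 + [225.0] * 10
summary = summarize(sample)
print(summary)
```

In this sketch the mean stays close to the median while the maximum is far away from both, which is why a comparison based only on averages can hide a long tail.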
OBJECTIVES
• To investigate the effect of retransmit timeouts
(RTOs) on Ethernet performance and how much
could be gained from reducing the effects of RTOs.
• To analyze the distributions of
communication times for standard MPI routines
on Ethernet with TCP and Myrinet with GM
communication networks on the same cluster.
• To study the scalability of the distributions
as the number of communicating processes
increases.
RELATED WORK
• [4,5,6,7] measure only the average times for point-to-point
(ping-pong) communications between two nodes.
• [3] studied the effects of TCP Retransmit Timeouts (RTOs)
on MPI communications over Ethernet networks,
including collective communications.
• [3,4,5,6] compare network performance using
application benchmarks such as the NAS Parallel
Benchmarks.
• [3,4] analyzed the effects of tuning Ethernet drivers or TCP
configuration to improve MPI performance on Ethernet
networks.
• [8] used MPIBench to compare the MPI performance
(including distributions of communication times) of
Ethernet and Myrinet networks, but these were not direct
comparisons.
• [9] compared the performance of different Ethernet network
topologies in commodity clusters and showed that there were
significant problems with the performance of collective
communications in MPICH version 1.2.0 on Fast Ethernet
networks.
• [11] used a later version of MPICH for the collective
communication routines, which gives much better
performance on Ethernet networks and perhaps reduces the
number of RTOs.
METHODOLOGY
IBM eServer 1350 Linux Cluster
• Fast Ethernet Architecture
Benchmark
• Measurements of MPI communication times were
obtained using MPIBench [1,2,8]. All measurements
were run with dedicated access to the cluster, so there
were no other processes affecting the results.
RESULTS
1. Send/Receive
Send/Receive (Cont..)
• Communication times for Fast Ethernet are about 10 times
higher than for Myrinet.
• For larger message sizes the difference is primarily
due to the difference in bandwidth between the two networks.
• For Ethernet there is a jump between 64 and 128 CPUs
(32 to 64 nodes), which occurs because the communication
is no longer between processors connected by a
single switch.
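The distinction between latency-dominated and bandwidth-dominated message sizes can be sketched with the common linear cost model, time = latency + size / bandwidth. This model is not taken from the slides, and every parameter value below is a hypothetical placeholder for two unnamed networks, not a measurement from the study.

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_mbit_s):
    """Linear cost model: fixed latency plus serialization time.

    1 Mbit/s equals 1 bit per microsecond, so bits / (Mbit/s) is in microseconds.
    """
    return latency_us + (size_bytes * 8) / bandwidth_mbit_s

def latency_share(size_bytes, latency_us, bandwidth_mbit_s):
    """Fraction of the modelled time that is due to latency."""
    total = transfer_time_us(size_bytes, latency_us, bandwidth_mbit_s)
    return latency_us / total

# Hypothetical parameters for illustration only, NOT values from the study.
NETWORKS = {
    "network_a": {"latency_us": 100.0, "bandwidth_mbit_s": 100.0},
    "network_b": {"latency_us": 10.0, "bandwidth_mbit_s": 1000.0},
}

for size in (64, 1024, 1048576):
    for name, params in NETWORKS.items():
        share = latency_share(size, **params)
        print(f"{name} {size:>8} B: latency share {share:.2f}")
```

Under this model the latency term dominates for small messages and becomes negligible for large ones, so any difference at large sizes comes almost entirely from the bandwidth term.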
Send/Receive (Cont..)
• The long delays are consistent with the TCP
Retransmit Timeout (RTO), which the TCP
specifications say should be given by
RTO = SRTT + 4 * RTTVAR
• This corresponds to the average communication time without
an RTO (SRTT = 25 ms) plus the 200 ms minimum value for 4 *
RTTVAR set by the Linux kernel.
• Longer delays are presumably caused by communications that
suffer 2 or 3 RTOs before finally being completed.
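The RTO formula above can be written out as a short sketch. It follows the description on this slide, where the Linux kernel enforces a 200 ms minimum on the 4 * RTTVAR term. The function name `rto_ms` and the RTTVAR value passed in are assumptions made for illustration; only SRTT = 25 ms and the 200 ms minimum come from the slides.

```python
# Minimum for the 4 * RTTVAR term, as stated in the slides for the Linux kernel.
LINUX_MIN_VARIANCE_TERM_MS = 200.0

def rto_ms(srtt_ms, rttvar_ms, min_variance_term_ms=LINUX_MIN_VARIANCE_TERM_MS):
    """RTO = SRTT + 4 * RTTVAR, with a floor applied to the variance term."""
    variance_term = max(4.0 * rttvar_ms, min_variance_term_ms)
    return srtt_ms + variance_term

# SRTT = 25 ms is from the slides; rttvar_ms = 1.0 is a hypothetical value
# small enough that the 200 ms floor applies.
example_rto = rto_ms(srtt_ms=25.0, rttvar_ms=1.0)
print(example_rto)
```

With these inputs the floor applies, so the sketch gives 25 ms + 200 ms = 225 ms for a single timeout.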
2. Combined Send and Receive
Combined Send/Receive (Cont..)
• Results are approximately a factor of 2 larger than those
for MPI_Send/MPI_Recv.
• This indicates that the duplex capability of these
networks is not being utilized.
3. Barrier
Barrier (Cont…)
• The big jump in the Ethernet result is probably due to
a different algorithm being used in the MPICH 1.2.6 code.
• Ethernet is approximately 4-5 times slower than
Myrinet.
4. Broadcast
Broadcast (Cont…)
• When communication goes through a single Ethernet switch,
rather than between switches, there are no RTOs for broadcast.
• Myrinet distributions have quite long tails, which are
caused by a small number of repetitions of the
benchmark.
5. Alltoall
Alltoall (Cont…)
• The average completion time for Myrinet increases
gradually with message size and number of processes.
• Ethernet performance for more than 32 CPUs shows
the effect of Retransmit Timeouts.
6. Conclusions
• As expected, the Myrinet network performs
significantly better than Fast Ethernet.
• The TCP RTO on the Ethernet network does affect
communication performance, but only for large
message sizes and large numbers of processors, where
the network becomes saturated.
• The effects are much less serious than those seen in
previous measurements.
FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
Thank you