
Master Thread
SendAll(MOTIF_NEWLGH)
SendAll(MOTIF_SCAN)
GetAll(MOTIF_REDNORM)
SendAll(MOTIF_SCAN)
GetAll(MOTIF_REDNORM)
GetAll(MOTIF_REDPWM)
SendAll(MOTIF_SAVEVARS)
GetAll(MOTIF_GETVARS)
SendAll(MOTIF_PATELIM)
Instead, we keep the sequences permanently resident at the processing element and send it processing “triggers” (i.e. master-slave).
Since sequences can be searched and modified independently, they are spread across multiple processing elements (i.e. parallel master-slaves).
Note that the processing elements maintain state information and may perform asynchronous operations in between receiving triggers.
[Diagram: Master Thread and Slave Threads timelines along a time axis, ending with SendAll(MOTIF_STOP)]
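The trigger pattern described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it uses Python threads and queues in place of MPI ranks, the numeric trigger values are placeholders, the "scan" is a stand-in that totals sequence lengths, and the hypothetical `get_all` is assumed to send a trigger and then sum one reply per slave.

```python
import queue
import threading

# Trigger names come from the slide; their values and the work done per
# trigger are placeholders chosen for this sketch.
MOTIF_SCAN, MOTIF_REDNORM, MOTIF_STOP = range(3)


def slave(sequences, triggers, results):
    """Keep `sequences` resident and react to triggers until MOTIF_STOP."""
    partial = 0  # state that persists between triggers
    while True:
        trigger = triggers.get()
        if trigger == MOTIF_STOP:
            break
        if trigger == MOTIF_SCAN:
            # Stand-in for a real scan: total length of the local sequences.
            partial = sum(len(s) for s in sequences)
        elif trigger == MOTIF_REDNORM:
            results.put(partial)  # contribute the local value to a reduction


class Master:
    def __init__(self, chunks):
        # One trigger queue and one thread per chunk of sequences.
        self.results = queue.Queue()
        self.trigger_queues = [queue.Queue() for _ in chunks]
        self.threads = [
            threading.Thread(target=slave, args=(chunk, q, self.results))
            for chunk, q in zip(chunks, self.trigger_queues)
        ]
        for t in self.threads:
            t.start()

    def send_all(self, trigger):
        for q in self.trigger_queues:
            q.put(trigger)

    def get_all(self, trigger):
        # Assumed semantics: broadcast the trigger, then sum one reply per slave.
        self.send_all(trigger)
        return sum(self.results.get() for _ in self.trigger_queues)

    def stop(self):
        self.send_all(MOTIF_STOP)
        for t in self.threads:
            t.join()


def run_demo():
    chunks = [["ACGT", "AC"], ["GGGTTT"], ["A", "C", "G"]]
    master = Master(chunks)
    master.send_all(MOTIF_SCAN)
    total = master.get_all(MOTIF_REDNORM)
    master.stop()
    return total
```

Only the small trigger values cross the queues; the sequences never move after start-up, which is the point of keeping them resident at each processing element.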
Marchand, Bajic, Kaushik KAUST Oct 2011
[Figure: Motif, Pure MPI vs. MPI-OpenMP]
Now the maximum speedup is 239.6x over 256 cores (MPI-OpenMP).
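For readers who want to redo this arithmetic, the sketch below computes speedup against a baseline run and the corresponding relative efficiency. It reads "over 256 cores" as "relative to the 256-core run", which is an assumption; the function names and the run times in the usage note are hypothetical and are not taken from the slides.

```python
def speedup(baseline_time, time):
    """Speedup of a run relative to a baseline run (same time units)."""
    return baseline_time / time


def relative_efficiency(speedup_value, baseline_cpus, cpus):
    """Fraction of the ideal speedup achieved when scaling from
    `baseline_cpus` to `cpus` (ideal speedup is cpus / baseline_cpus)."""
    return speedup_value / (cpus / baseline_cpus)
```

With made-up run times of 3,000,000 msec and 12,500 msec, `speedup(3_000_000, 12_500)` gives 240.0. If the 239.6x figure were reached at 65536 CPUs (the slide does not say which CPU count it refers to), `relative_efficiency(239.6, 256, 65536)` would be about 0.936.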
MPI Overheads
#CPUs: 256 512 1024 2048 4096 8192 16384 32768 65536
Pure MPI, Initial: 18112 9072 4799 2375 11655 56369 279519 1156811
MPI-OpenMP, Initial: 541 109 80 78 247 2289 10124 49814 255710
Pure MPI, Grouped: 9353 18444 9278 4734 2892 3493 12614 43995 44782
MPI-OpenMP Grouped, Pure MPI MultiLevel, MPI-OpenMP MultiLevel: 328 36 643 148 43 133 82 5425 3316 68 3466 1497 60 1765 1918 154 856 404 722 451 424 2807 224 148 13692 29 28
[Figure axes: Speedup and Run Time (msec) vs. # CPUs (256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536); series: Pure-MPI, MPI-OpenMP]