Ring pipelined algorithm for the algebraic path problem on the CELL

Claude Tadonki
Mines ParisTech – CRI – Mathématiques et Systèmes
Laboratoire de l’Accélérateur Linéaire/IN2P3/CNRS
France
2nd Workshop on Architecture and Multi-Core Applications
23rd International Symposium on Computer Architecture and High Performance Computing (SBAC PAD 2011)
October 26–29, 2011, Vitória, Espírito Santo, Brazil.
Large Scale Kronecker Product on Supercomputers
C. TADONKI
The Kronecker product (definition and applications)
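Here is a minimal Python sketch of the definition. It uses the standard library only, stores matrices as lists of rows, and the name `kron` is ours, not taken from the talk. For an $m \times n$ matrix $A$ and a $p \times q$ matrix $B$, $A \otimes B$ is the $mp \times nq$ block matrix whose $(i, j)$ block is $a_{ij} B$.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of a (m x n) and b (p x q): an (m*p) x (n*q) matrix."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    # Entry (i*p + k, j*q + l) of the result is a[i][j] * b[k][l].
    return [
        [a[i][j] * b[k][l] for j in range(n) for l in range(q)]
        for i in range(m)
        for k in range(p)
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 5], [6, 7]]
    for row in kron(A, B):
        print(row)
```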
The Kronecker product (properties and problem formulation)
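Suppose the problem is to multiply the Kronecker product of $N$ square matrices $A_i$ of orders $n_i$ by a vector $x$ of length $L = n_1 n_2 \cdots n_N$, that is, $y = (A_1 \otimes \cdots \otimes A_N)\,x$. A naive reference then forms the $L \times L$ matrix explicitly and multiplies it by $x$. The sketch below uses hypothetical helper names and the standard library only. It is meant purely as a correctness check on small inputs.

```python
from functools import reduce
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two dense matrices stored as lists of rows."""
    p, q = len(b), len(b[0])
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(q)]
        for i in range(len(a))
        for k in range(p)
    ]


def kron_matvec_naive(mats: List[Matrix], x: Vector) -> Vector:
    """Reference: form A_1 (x) ... (x) A_N explicitly, then multiply by x.

    The full matrix has L*L entries, where L = n_1 * ... * n_N.
    """
    big = reduce(kron, mats)
    if len(big[0]) != len(x):
        raise ValueError("x must have length L = n_1 * ... * n_N")
    return [sum(r * v for r, v in zip(row, x)) for row in big]


if __name__ == "__main__":
    A1 = [[1, 2], [3, 4]]
    A2 = [[1, 0, 1], [0, 2, 0], [1, 1, 1]]
    x = list(range(1, 7))  # L = 2 * 3 = 6
    print(kron_matvec_naive([A1, A2], x))
```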
The Kronecker product (complexity and recurrence equation)
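Forming the matrix $A_1 \otimes \cdots \otimes A_N$ first would
• require a huge amount of memory ($L^2$ entries, where $L = n_1 n_2 \cdots n_N$ and $n_i$ is the order of $A_i$)
• yield a lot of redundant multiplications: at least $L^2$ for the matrix-vector product alone, on top of the cost of forming the matrix
Using the so-called normal factorization, we can derive an optimal scheme which reduces the number of floating-point multiplications to $L \sum_{i=1}^{N} n_i$.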
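The following is a sequential sketch of the normal factorization. It assumes square dense $A_i$ of orders $n_i$ and $L = n_1 \cdots n_N$:

$A_1 \otimes \cdots \otimes A_N = \prod_{i=1}^{N} (I_{n_1 \cdots n_{i-1}} \otimes A_i \otimes I_{n_{i+1} \cdots n_N})$

Each factor has $L n_i$ nonzeros. Applying the $N$ factors in turn therefore costs $L(n_1 + \cdots + n_N)$ multiplications, compared with at least $L^2$ for a dense product. The Python below never forms the big matrix, and it counts the multiplications so this bound can be checked. The helper names are hypothetical and this is not the author's code. The parallel distribution of this work discussed in the talk is not shown.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def apply_factor(a_i: Matrix, left: int, right: int, v: Vector) -> Tuple[Vector, int]:
    """Compute (I_left (x) A_i (x) I_right) v and the number of multiplications used."""
    n = len(a_i)
    out = [0] * len(v)
    mults = 0
    for a in range(left):
        for b in range(right):
            for j in range(n):
                acc = 0
                # Row (a, j, b) of the factor touches only entries (a, k, b) of v.
                for k in range(n):
                    acc += a_i[j][k] * v[(a * n + k) * right + b]
                    mults += 1
                out[(a * n + j) * right + b] = acc
    return out, mults


def kron_matvec_factored(mats: List[Matrix], x: Vector) -> Tuple[Vector, int]:
    """y = (A_1 (x) ... (x) A_N) x via the normal factorization, without forming the matrix."""
    orders = [len(m) for m in mats]
    total = 1
    for n in orders:
        total *= n  # total = L
    if len(x) != total:
        raise ValueError("x must have length L = n_1 * ... * n_N")
    y, mults = list(x), 0
    left = 1
    for a_i, n in zip(mats, orders):
        right = total // (left * n)
        y, count = apply_factor(a_i, left, right, y)
        mults += count
        left *= n
    return y, mults
```

Take orders $(2, 3)$, so $L = 6$. The count is $6 \times (2 + 3) = 30$ multiplications, compared with $36$ for the dense matrix-vector product alone.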
The Kronecker product and its applications
Performance issues and heuristic for finding a good topology
The total (parallel) execution time depends on
• the sizes of the matrices
• the gap between the virtual topology and the physical topology
• the way the task is split among the processors (decomposition)
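To make the decomposition question concrete, here is a hypothetical sketch. It is not the heuristic from the talk. It rests on two assumptions. First, $p$ processors form a virtual grid $p = p_1 \times \cdots \times p_N$ in which each $p_i$ divides $n_i$. Second, a broadcast among the $p_i$ processors of one dimension is a communication loop of $p_i - 1$ steps. Under that assumed cost model, the code enumerates the admissible grids and keeps the cheapest one.

```python
from itertools import product
from typing import List, Optional, Tuple


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def grids(orders: List[int], p: int) -> List[Tuple[int, ...]]:
    """All (p_1, ..., p_N) with p_i dividing n_i and p_1 * ... * p_N == p (assumed constraint)."""
    result = []
    for combo in product(*(divisors(n) for n in orders)):
        prod = 1
        for d in combo:
            prod *= d
        if prod == p:
            result.append(combo)
    return result


def loop_broadcast_steps(grid: Tuple[int, ...]) -> int:
    """Hypothetical cost: a broadcast loop among p_i processors takes p_i - 1 steps."""
    return sum(pi - 1 for pi in grid)


def best_grid(orders: List[int], p: int) -> Optional[Tuple[int, ...]]:
    """Cheapest admissible grid under the assumed cost model, or None if none exists."""
    candidates = grids(orders, p)
    if not candidates:
        return None
    return min(candidates, key=loop_broadcast_steps)
```

Take orders (4, 4, 6) and p = 8. Five grids are admissible. Under this model, four of them cost 4 steps and the balanced grid (2, 2, 2) costs 3, so the balanced grid is selected. The sketch ignores the mapping onto the physical topology, which the slide lists as another factor.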
Performance
We consider N = 6 matrices of
orders 30, 36, 32, 18, 24, 16,
thus L = 159 252 480
We see that
• our heuristic yields a significant improvement compared to trivial decompositions
• we start losing scalability as the number of cores increases (communication overhead)
We then turn to a hybrid implementation.
Performance of the hybrid implementation
We see that
• the hybrid implementation performs better for larger numbers of cores
• for smaller numbers of cores, the shared-memory (SM) implementation suffers from more cache misses
The trade-off and a better memory layout still need to be investigated.
END & QUESTIONS