Computer Science 111

Computer Science 320
Sequential Dependencies
All-Pairs-Shortest-Paths
• Given a weighted graph, find the shortest path between each pair of vertices for which a path exists
• Useful for planning or scheduling trips between cities, designing networks, etc.
Graph shows vertices and edges; weights (distances between adjacent cities) are yet to be filled in
Distance matrix shows distances (weights) between adjacent vertices; ∞ means the distance of a path is not yet known
All-pairs algorithm fills in the shortest distances between all pairs of vertices
Floyd’s Algorithm
• Published by Robert Floyd in 1962
• Input: n by n distance matrix d for a graph of n vertices
• Output: the same matrix, but with the length of each path replaced by the length of the shortest path, or ∞ if there is no path
Floyd’s Algorithm
for i = 0 to n – 1
    for r = 0 to n – 1
        for c = 0 to n – 1
            d_rc = min(d_rc, d_ri + d_ic)
d_rc = distance from vertex r to vertex c
d_ri = distance from vertex r to vertex i
d_ic = distance from vertex i to vertex c
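The triple loop above can be tried on a small graph. The following is a minimal sketch in plain Java, not the course's FloydSeq program; the class name FloydSketch, the method name floyd, and the 4-vertex sample matrix are made up for illustration. It assumes a zero diagonal and uses Double.POSITIVE_INFINITY for ∞.

```java
import java.util.Arrays;

final class FloydSketch {
    static final double INF = Double.POSITIVE_INFINITY;

    /** Replaces each entry of d with the shortest-path length, in place. */
    static double[][] floyd(double[][] d) {
        int n = d.length;
        for (int i = 0; i < n; ++i)             // intermediate vertex
            for (int r = 0; r < n; ++r)         // source vertex (row)
                for (int c = 0; c < n; ++c)     // destination vertex (column)
                    d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical sample graph: edges 0-1 (3), 1-2 (2), 2-3 (1), 0-3 (7).
        double[][] d = {
            {0,   3,   INF, 7},
            {3,   0,   2,   INF},
            {INF, 2,   0,   1},
            {7,   INF, 1,   0},
        };
        for (double[] row : floyd(d))
            System.out.println(Arrays.toString(row));
    }
}
```

In this sample the direct 0-to-3 edge of length 7 is replaced by the path 0, 1, 2, 3 of length 6, and the unknown 0-to-2 distance becomes 5.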
Resources for Distance Matrices
• edu.rit.io.DoubleMatrixFile supports reading and writing distance matrices
• Program FloydRandom creates a new distance matrix from a random seed, an adjacency radius, and the number of vertices
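To make the three FloydRandom inputs concrete, here is one way such a generator could work. This is a sketch under assumptions, not the actual FloydRandom source: it assumes vertices are random points in the unit square and that two vertices are adjacent when their Euclidean distance is at most the radius. The class and method names are hypothetical.

```java
import java.util.Random;

final class RandomDistanceMatrixSketch {
    /**
     * Builds an n by n distance matrix. Assumption: vertices are random
     * points in the unit square; non-adjacent pairs get infinity.
     */
    static double[][] randomMatrix(long seed, double radius, int n) {
        Random prng = new Random(seed);   // same seed gives the same matrix
        double[] x = new double[n];
        double[] y = new double[n];
        for (int v = 0; v < n; ++v) {
            x[v] = prng.nextDouble();
            y[v] = prng.nextDouble();
        }
        double[][] d = new double[n][n];
        for (int r = 0; r < n; ++r) {
            for (int c = 0; c < n; ++c) {
                double dist = Math.hypot(x[r] - x[c], y[r] - y[c]);
                d[r][c] = (dist <= radius) ? dist : Double.POSITIVE_INFINITY;
            }
        }
        return d;
    }
}
```

A larger radius yields a denser graph; a radius of at least the square's diagonal makes every pair adjacent.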
FloydSeq
// Read distance matrix from input file.
DoubleMatrixFile in = new DoubleMatrixFile();
DoubleMatrixFile.Reader reader = in.prepareToRead
    (new BufferedInputStream(new FileInputStream(infile)));
reader.read();
reader.close();
d = in.getMatrix();
n = d.length;
// Run Floyd's algorithm; t2 and t3 bracket the computation for timing.
long t2 = System.currentTimeMillis();
for (int i = 0; i < n; ++i) {
    double[] d_i = d[i];        // row of the intermediate vertex i
    for (int r = 0; r < n; ++r) {
        double[] d_r = d[r];    // row being updated
        for (int c = 0; c < n; ++c)
            d_r[c] = Math.min(d_r[c], d_r[i] + d_i[c]);
    }
}
long t3 = System.currentTimeMillis();
Parallelize!
• Can the outer loop be done in parallel?
• Can the middle loop be done in parallel?
• Can the innermost loop be done in parallel?
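One way to explore these questions is to check what a single outer iteration reads and writes. Iteration i reads row i and column i of the matrix, and, assuming a zero diagonal, it leaves both unchanged, so the r and c iterations within it do not depend on one another. Iteration i + 1, however, needs the results of iteration i. The sketch below checks the first claim empirically; the class and method names are hypothetical.

```java
final class PivotInvarianceSketch {
    /** Performs one outer-loop iteration of Floyd's algorithm for vertex i. */
    static void relaxThrough(double[][] d, int i) {
        int n = d.length;
        for (int r = 0; r < n; ++r)
            for (int c = 0; c < n; ++c)
                d[r][c] = Math.min(d[r][c], d[r][i] + d[i][c]);
    }

    /**
     * Runs iteration i and reports whether row i and column i kept
     * their values. Assumes d[i][i] == 0.
     */
    static boolean pivotUnchanged(double[][] d, int i) {
        int n = d.length;
        double[] rowBefore = d[i].clone();
        double[] colBefore = new double[n];
        for (int r = 0; r < n; ++r)
            colBefore[r] = d[r][i];
        relaxThrough(d, i);
        for (int k = 0; k < n; ++k)
            if (d[i][k] != rowBefore[k] || d[k][i] != colBefore[k])
                return false;
        return true;
    }
}
```

Calling pivotUnchanged for i = 0, 1, ..., n - 1 in order runs the whole algorithm while checking the property at every step.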
Parallelize with Row Slicing
for i = 0 to n – 1
    parallel for r = 0 to n – 1
        for c = 0 to n – 1
            d_rc = min(d_rc, d_ri + d_ic)
Parallelize with Column Slicing
for i = 0 to n – 1
    for r = 0 to n – 1
        parallel for c = 0 to n – 1
            d_rc = min(d_rc, d_ri + d_ic)
FloydSmpRow
new ParallelTeam().execute(new ParallelRegion() {
    public void run() throws Exception {
        for (int ii = 0; ii < n; ++ii) {
            final int i = ii;
            final double[] d_i = d[i];
            execute(0, n - 1, new IntegerForLoop() {
                public void run(int first, int last) {
                    for (int r = first; r <= last; ++r) {
                        double[] d_r = d[r];
                        for (int c = 0; c < n; ++c)
                            d_r[c] = Math.min(d_r[c], d_r[i] + d_i[c]);
                    }
                }
            });
        }
    }
});
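For readers without the Parallel Java library, the same row-slicing idea can be sketched with standard Java parallel streams. This swaps in java.util.stream for ParallelTeam and IntegerForLoop, so it is an illustration of the technique rather than the FloydSmpRow program; the class and method names are hypothetical. The end of each forEach call plays the role of the barrier after each outer iteration.

```java
import java.util.stream.IntStream;

final class FloydRowStreamsSketch {
    /** Row-sliced Floyd's algorithm, in place. Assumes a zero diagonal. */
    static double[][] floydRows(double[][] d) {
        final int n = d.length;
        for (int ii = 0; ii < n; ++ii) {        // outer loop stays sequential
            final int i = ii;
            final double[] d_i = d[i];
            // Rows are divided among worker threads; forEach returns only
            // after every row has been processed.
            IntStream.range(0, n).parallel().forEach(r -> {
                double[] d_r = d[r];
                for (int c = 0; c < n; ++c)
                    d_r[c] = Math.min(d_r[c], d_r[i] + d_i[c]);
            });
        }
        return d;
    }
}
```

Each thread writes only its own rows. Row i is rewritten with unchanged values during iteration i, which is why other threads can read it safely in practice.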
SmpRow Performance
Cache churning when the data set becomes large relative to the number of threads
FloydSmpCol
new ParallelTeam().execute(new ParallelRegion() {
    public void run() throws Exception {
        for (int ii = 0; ii < n; ++ii) {
            final int i = ii;
            final double[] d_i = d[i];
            for (int r = 0; r < n; ++r) {
                final double[] d_r = d[r];
                execute(0, n - 1, new IntegerForLoop() {
                    public void run(int first, int last) {
                        for (int c = first; c <= last; ++c)
                            d_r[c] = Math.min(d_r[c], d_r[i] + d_i[c]);
                    }
                });
            }
        }
    }
});
SmpCol Performance
n^2 barrier waits, as opposed to n barrier waits, so a larger sequential fraction
But the sequential fraction decreases as the problem size increases, because the n^3 running time predominates
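The barrier count can be made visible with a small column-slicing sketch. It uses standard Java parallel streams in place of the Parallel Java classes, and the class and method names are hypothetical. Every parallel loop ends in an implicit barrier, so the method counts one barrier per (i, r) pair, n^2 in total, while the arithmetic work stays at n^3 updates.

```java
import java.util.stream.IntStream;

final class FloydColStreamsSketch {
    /**
     * Column-sliced Floyd's algorithm, in place. Returns the number of
     * parallel loops (barriers) executed. Assumes a zero diagonal.
     */
    static long floydCols(double[][] d) {
        final int n = d.length;
        long barriers = 0;
        for (int ii = 0; ii < n; ++ii) {
            final int i = ii;
            final double[] d_i = d[i];
            for (int r = 0; r < n; ++r) {
                final double[] d_r = d[r];
                // d_r[i] does not change during iteration i, so it is read
                // once here instead of inside the parallel loop.
                final double d_ri = d_r[i];
                IntStream.range(0, n).parallel().forEach(c ->
                    d_r[c] = Math.min(d_r[c], d_ri + d_i[c]));
                ++barriers;     // forEach returns only when all columns are done
            }
        }
        return barriers;
    }
}
```

The ratio of barriers to updates is n^2 / n^3 = 1/n, which is consistent with the barrier overhead mattering less as n grows.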