Written by
מאיר בכור 027382977
אביתר שרעבי 32033946
Model
The model we discuss is a computer with multiple processors but only one memory unit.
All the processors are synchronized by the same clock.
The processors are all connected to each other and to the memory.
If more than one processor writes the same value to the same memory address at the same time, the value is written correctly. If the values differ, any one of them may be written.
Model
More than one processor can read the same memory address at the same time.
Other models:
The processors are on different computers.
There is no shared memory for all the processors.
The processors do not use the same clock.
Array Maximum Problem
On a computer with one processor:
Time: O(N).
Algorithm: Going over an array and keeping
the maximum.
On a computer with K processors:
Time: O(N/K), plus the time to combine the K partial results.
Algorithm: Each processor handles N/K elements of the array and finds their maximum. The K partial maxima are then combined into the maximum of the whole array.
Array Maximum Problem
On a computer with O(N) processors.
Time: O(log(N)).
Algorithm: In the first round every processor compares 2 items and keeps the larger, so after the first round we have N/2 numbers. In the next round N/4 processors each take 2 of these numbers and keep the larger, so only N/4 results remain after the second round.
After log(N) rounds we have the maximum of the array.
Array Maximum Problem
Example: with 8 elements (1, 2, ..., 8) the time is 3 = Log(8) rounds.
Array Maximum Problem
The number of computations performed is 7 (4 in the first round, 2 in the second and 1 in the last). This is the same number of computations as in the serial algorithm, but they are done in less time.
This algorithm works for many other functions besides Max, such as Min and Sum (and therefore Avg, as Sum divided by N).
It works for every associative function.
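The round-by-round pairing described above can be sketched in Python as follows. This is only a serial simulation under assumptions: the name `tree_reduce` is hypothetical, and the inner list comprehension stands in for the processors that would all work in the same round.

```python
def tree_reduce(items, op):
    """Combine items pairwise, round by round; return (result, rounds)."""
    if not items:
        raise ValueError("items must not be empty")
    level = list(items)
    rounds = 0
    while len(level) > 1:
        # Each pair would be handled by its own processor in the same round.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired item waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], max))
```

Passing `min` or an addition function as `op` gives the Min and Sum variants, since only associativity of `op` is used.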
Finding The Two Greatest
Numbers
Simple solution for O(N) processors.
Algorithm: Find the maximum, remove it from the array and find the maximum again.
Time: 2 Log(N).
Smart algorithm for O(N) processors.
Algorithm:
First round: each processor handles 2 items, finds their max and puts the other item in a candidate group for that max.
Rounds 2..log(n): each processor handles 2 results of the previous round. It compares the 2 max values and takes the larger as the new max. It then takes the candidate group of the new max and adds the max of the other group to it; this is the new candidate group.
Finding The Two Greatest
Numbers
After the last round, the max is the maximum of the array and the second maximum is the maximum of the candidate group.
Sample:
Array: 7, 10, 1, 3, 100, 8, 55, 6.
Finding The Two Greatest
Numbers
Each entry is written as max {candidate group}.
Round 1: 10 {7}     3 {1}     100 {8}     55 {6}
Round 2: 10 {7, 3}     100 {8, 55}
Round 3: 100 {8, 55, 10}
Results: The maximum is the maximum of the array (100)
and the second maximum is the maximum of the candidate
group (55).
Finding The Two Greatest
Numbers
Time:
Log(N) + LogLog(N).
Log(N) to find the maximum and the candidate group.
LogLog(N) to find the maximum in the candidate group.
The candidate group grows by 1 in each round (the max of the other group), so at the end its size is Log(N).
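A minimal sketch of the candidate-group tournament, simulated serially in Python. The function name `two_largest` and the tuple layout are assumptions made for illustration; each loop iteration over a pair corresponds to one processor in a round.

```python
def two_largest(values):
    """Return (max, second_max, candidate_group) via the tournament."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Each entry is (current max, items that lost directly to that max).
    level = [(v, []) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (m1, c1), (m2, c2) = level[i], level[i + 1]
            if m1 >= m2:
                nxt.append((m1, c1 + [m2]))  # loser's max joins winner's group
            else:
                nxt.append((m2, c2 + [m1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired entry waits for the next round
        level = nxt
    best, candidates = level[0]
    return best, max(candidates), candidates


if __name__ == "__main__":
    print(two_largest([7, 10, 1, 3, 100, 8, 55, 6]))
```

In this sketch the final `max(candidates)` is computed serially; on the parallel machine it would be another tree reduction over the Log(N) candidates.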
Merge problem
Description: We have 2 sorted arrays B and C of size N, and we need to divide their items into 2 new arrays A1 and A2 of size N, so that the N largest items of B and C are in A1 and the N smallest are in A2.
Simple solution: We can merge B and C into one sorted array A and copy the N largest elements to A1 and the N smallest elements to A2. But this algorithm cannot use multiple processors, so the cost is still O(N).
Merge problem
Smart algorithm for O(N) processors.
Processor i compares Bi with Cn+1-i; the larger of the two goes to A1 and the other to A2.
Correctness proof.
If Bi > Cn+1-i then Bi >= B1..Bi-1 and Bi > Cn+1-i >= C1..Cn-i, so Bi is at least as large as N elements (i - 1 from B and N - i + 1 from C), and Bi belongs in A1.
If Cn+1-i > Bi then Cn+1-i is at least as large as N elements (N - i from C and i from B), so Cn+1-i belongs in A1.
Merge problem
Example:
B: 1, 8, 10, 17
C: 9, 12, 67, 100
Pairs compared: (B1, Cn), (B2, Cn-1), (B3, Cn-2), (B4, Cn-3).
A1: 100, 67, 12, 17.
A2: 1, 8, 10, 9.
Time: All the comparisons can be done at the same time, so the cost is O(1).
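The pairing of Bi with Cn+1-i can be sketched in Python as below. The name `split_largest` is hypothetical and indexes are 0-based here, so processor i compares `b[i]` with `c[n - 1 - i]`; the loop iterations are independent and would run at the same time on N processors.

```python
def split_largest(b, c):
    """Split two sorted arrays of size n into (a1, a2): n largest, n smallest."""
    n = len(b)
    if len(c) != n:
        raise ValueError("b and c must have the same size")
    a1 = [None] * n
    a2 = [None] * n
    for i in range(n):  # one independent comparison per processor
        x, y = b[i], c[n - 1 - i]
        a1[i], a2[i] = (x, y) if x > y else (y, x)
    return a1, a2


if __name__ == "__main__":
    print(split_largest([1, 8, 10, 17], [9, 12, 67, 100]))
```

Note that A1 and A2 come out unsorted, as in the example above; only the split between the larger half and the smaller half is guaranteed.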
Prefix Problem
Description: Find the sum of every prefix of the elements X1..Xn.
S11 = X1
S12 = X1 + X2
S1n = X1 + X2 + … + Xn-1 + Xn
Simple solution: Compute each of the N sums separately with N processors. Each sum takes O(LogN), so the total time is O(NLogN).
Prefix Problem
Algorithm:
for i = 0 to n-1 doip
    Si = Xi
for j = 0 to log(n) - 1 do
    for i = 2^j to n-1 doip
        Si = Si + Si-2^j
Here doip means "do in parallel" on the different processors. Within a round, all processors read the old values before any of them writes.
At the end the results are in the array S.
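The doubling rounds above can be simulated serially in Python. This is a sketch under assumptions: `parallel_prefix_sums` is a hypothetical name, and the snapshot `prev` models the fact that every processor reads the old values before any write in the same round.

```python
def parallel_prefix_sums(x):
    """Return (prefix_sums, rounds) using the doubling scheme."""
    s = list(x)
    n = len(s)
    step = 1      # step = 2^j in round j
    rounds = 0
    while step < n:
        prev = list(s)  # all reads of a round see the values of the last round
        for i in range(step, n):  # independent updates, one per processor
            s[i] = prev[i] + prev[i - step]
        step *= 2
        rounds += 1
    return s, rounds


if __name__ == "__main__":
    print(parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
```

Without the snapshot, an update could read a value already changed in the same round and the sums would be wrong.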
Prefix Problem
Example: With 8 numbers X1..X8
Sij is Xi + Xi+1… + Xj.
Element   Start   After round 1   After round 2   After round 3
1         X1      S11             S11             S11
2         X2      S12             S12             S12
3         X3      S23             S13             S13
4         X4      S34             S14             S14
5         X5      S45             S25             S15
6         X6      S56             S36             S16
7         X7      S67             S47             S17
8         X8      S78             S58             S18
Prefix Problem
Time:
In each round the number of finished results S1i doubles, so after log(n) rounds we have all the results.
To use this algorithm, each processor needs to be connected to log(n) other processors.
Prefix Problem
Usage example
Problem: we have an arithmetic expression and we need to test whether its brackets are arranged legally.
Algorithm: we create an array X with 1 for each "(" and -1 for each ")", and run the prefix algorithm on it. The results need to satisfy:
S11 = 1 and S11..S1n-1 >= 0 and S1n = 0.
Time with N processors: O(logN), that is O(log(N)) for the prefix algorithm and O(1) for the test.
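The bracket test can be sketched in Python as follows. The function name `brackets_legal` is hypothetical, and the serial `itertools.accumulate` is used here only as a stand-in for the parallel prefix algorithm; the final check is the condition on the prefix sums stated above.

```python
from itertools import accumulate


def brackets_legal(expr):
    """Check bracket nesting through prefix sums of +1 / -1."""
    # Keep only the brackets: +1 for "(" and -1 for ")".
    x = [1 if ch == "(" else -1 for ch in expr if ch in "()"]
    sums = list(accumulate(x))  # stand-in for the parallel prefix algorithm
    if not sums:
        return True  # no brackets at all
    # Every prefix sum must be non-negative and the last one must be 0.
    return all(s >= 0 for s in sums) and sums[-1] == 0


if __name__ == "__main__":
    print(brackets_legal("(a+b)*(c-(d))"), brackets_legal(")a+b("))
```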
Partition Problem
Description: We have an array X in which some of the elements are marked. We need to move all the marked elements to one array and the unmarked elements to another array.
Simple solution: We take 2 stacks; we push the marked elements onto one stack and the unmarked elements onto the other. This takes O(N) time.
Simple solution 2: We take two indexes, one at the start of the array and one at the end. The first searches for a marked element and the second for an unmarked one; when both have found one, they exchange the items they point to, and they move on until they meet. This takes O(N) time too, but its two searches can proceed independently, so it is more parallel.
Partition Problem
Smart algorithm for O(N) processors:
Create a new array B: if element i is marked then B[i] = 1, else B[i] = 0.
Create an array C with the prefix sums of B, that is C[i] = B[1] + B[2] + … + B[i].
If X[i] is marked then Y1[C[i]] = X[i].
If X[i] is not marked then Y2[i-C[i]] = X[i].
Partition Problem
Example:
X = 2, 4, 7, 8, 1, 3, 10, 12, 15
B = 0, 1, 0, 0, 0, 1, 1, 0, 1
C = 0, 1, 1, 1, 1, 2, 3, 3, 4
Y1 = 4, 3, 10, 15
Y2 = 2, 7, 8, 1, 12
Partition Problem
Time with O(N) processor.
Computing B: O(1).
Computing C: O(log(n)) using the prefix
algorithm.
Computing Y1 and Y2: O(1).
Total: O(log(n)).
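The three steps (B, C, then Y1 and Y2) can be sketched in Python as below. The name `partition` is hypothetical, the flag array B is passed in directly, and `itertools.accumulate` stands in for the parallel prefix algorithm. Indexes are 0-based, so the target positions become `c[i] - 1` and `i - c[i]`.

```python
from itertools import accumulate


def partition(x, b):
    """Split x into (y1, y2): elements with flag 1 and elements with flag 0."""
    c = list(accumulate(b))          # C[i] = B[0] + ... + B[i]
    flagged = c[-1] if c else 0
    y1 = [None] * flagged
    y2 = [None] * (len(x) - flagged)
    for i in range(len(x)):          # independent writes, one per processor
        if b[i]:
            y1[c[i] - 1] = x[i]      # 0-based version of Y1[C[i]]
        else:
            y2[i - c[i]] = x[i]      # 0-based version of Y2[i - C[i]]
    return y1, y2


if __name__ == "__main__":
    x = [2, 4, 7, 8, 1, 3, 10, 12, 15]
    b = [0, 1, 0, 0, 0, 1, 1, 0, 1]
    print(partition(x, b))
```

Because every element gets a distinct target position, no two processors write to the same cell, and the original order inside each output array is preserved.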
Sorting Algorithm
Description: Sort array A using O(N^2) processors and put the result into array C.
Simple algorithm: A serial comparison-based sorting algorithm takes at least on the order of Nlog(N) time.
Smart algorithm
Create a matrix B of size N*N and initialize all its cells to zero.
We look at the N^2 processors as a matrix of processors. Processor Pi,j computes Ai >= Aj; if it is true then B[i,j] = 1.
Sorting Algorithm
For each i from 1 to N, C[Sum(i)] = A[i], where Sum(i) is the sum of B[i,1] to B[i,N].
Example: A = 3, 5, 2, 9, 1
Matrix B
      1  2  3  4  5
1     1  0  1  0  1
2     1  1  1  0  1
3     0  0  1  0  1
4     1  1  1  1  1
5     0  0  0  0  1
Sorting Algorithm
C = 1, 2, 3, 5, 9.
Time:
Using O(N^2) processors, finding the matrix B takes O(1) and finding C costs O(log(N)).
So the total cost of the algorithm is O(log(N)).
Using O(N) processors, finding B takes O(N) time and finding C takes O(N) time, so the total is O(N).
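A small Python sketch of this comparison-matrix sort, simulated serially. The name `rank_sort` is hypothetical, and the sketch assumes that all elements are distinct: with equal elements two rows would get the same sum and collide in C.

```python
def rank_sort(a):
    """Sort distinct values by counting, for each one, how many it is >= to."""
    n = len(a)
    # b[i][j] = 1 if a[i] >= a[j]; every cell is one independent comparison.
    b = [[1 if a[i] >= a[j] else 0 for j in range(n)] for i in range(n)]
    c = [None] * n
    for i in range(n):
        rank = sum(b[i])      # Sum(i); a tree reduction on the parallel machine
        c[rank - 1] = a[i]    # 0-based version of C[Sum(i)] = A[i]
    return c


if __name__ == "__main__":
    print(rank_sort([3, 5, 2, 9, 1]))
```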
Sorting Algorithm
Description: Sort array A using O(N) processors and put the result into array C.
Algorithm: Merge sort. The largest cost in the merge sort algorithm is the cost of the merge.
With a serial algorithm, merging 2 sorted arrays costs O(N) and the merge sort algorithm costs O(Nlog(N)).
We use the regular algorithm but with a smarter merge algorithm.
Sorting Algorithm
Smart merge algorithm
Description: We need to merge two sorted arrays A, B into a sorted array R.
Algorithm: We describe a recursive algorithm Merge.
C = merge(odd(A), even(B)).
D = merge(even(A), odd(B)).
Where odd(A) is all the items of A with an odd index, and even(A) is all the items of A with an even index.
Sorting Algorithm
When C = C0, C1, C2, …, Cn
D = D0, D1, D2, …, Dn
E = C0, D0, C1, D1, …, Cn, Dn.
Compare each pair Ci, Di, and if Ci > Di then swap Ci and Di in array E.
Array E is then the merge of A and B.
Sorting Algorithm
Example: A = 3, 5, 8, 10
B = 4, 7, 9, 12
Even(A) = 5 ,10 Odd(A) = 3, 8
Even(B) = 7, 12 Odd(B) = 4, 9
C = 3, 7, 8, 12
D = 4, 5, 9, 10
E = 3, 4, 7, 5, 8, 9, 12, 10
After swapping in E
E = 3, 4, 5, 7, 8, 9, 10, 12
Time: Using O(N) processors the merge takes O(log(N)) time. The merge sort has log(N) levels of merging, so the total cost of the merge sort is O(log^2(N)).
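The recursive merge and the merge sort built on it can be sketched in Python as below. This is a serial simulation under assumptions: the names `odd_even_merge` and `merge_sort` are hypothetical, both inputs of a merge must have the same length, and the array length must be a power of two. Odd and even indexes are counted from 1, as in the example, so `a[0::2]` is odd(A).

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[1::2])  # odd(A) with even(B)
    d = odd_even_merge(a[1::2], b[0::2])  # even(A) with odd(B)
    e = []
    for ci, di in zip(c, d):              # independent compare-and-swap steps
        e += [ci, di] if ci <= di else [di, ci]
    return e


def merge_sort(values):
    """Sort a list whose length is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)
    half = n // 2
    return odd_even_merge(merge_sort(values[:half]), merge_sort(values[half:]))


if __name__ == "__main__":
    print(odd_even_merge([3, 5, 8, 10], [4, 7, 9, 12]))
```

The two recursive calls are independent of each other, and so are the final compare-and-swap steps, which is where the parallel version gains its speed.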
Find Algorithm
Description: If array X contains the value Val then Res needs to be True, else Res needs to be False.
Simple Algorithm: A serial algorithm takes O(N) time.
Smart Algorithm: Using O(N) processors.
Res = False. Each processor i tests if X[i] = Val; if true, it sets Res = True. All the writers write the same value, so the concurrent write is allowed by the model.
Time: O(1).
Model Description
Many processors.
Processors can send messages to each other through communication channels.
We want each processor to have a unique identification.
Since we have O(n) processors, we need O(log n) bits to represent the Id.
Model Description
Clean Net: a network in which a processor doesn't know anything about its neighbors, not even their Ids. It only knows how many neighbors it has.
We will explicitly mention when we deal with a Clean Net; otherwise every processor has a unique Id.
Model Description
A message should include the sender and receiver Ids and some information, O(log n) bits in total.
If X wants to send a message to Y through Z, it costs 2 steps to send the message.
X -> Z -> Y
Model Description
Local computation doesn't take time.
We will analyze:
time complexity - the number of steps the algorithm takes in the worst case.
communication complexity - the total number of messages sent during the execution of the algorithm in the worst case.
Distributed vs. Sequential
Communication - needed in the distributed model but not in the sequential one.
Partial knowledge - together the processors know everything, but no single processor necessarily knows everything.
Processors or communication channels can be down.
Distributed vs. Sequential
Synchronization - we need to synchronize the processors.
Synchronous Model
There is a global clock.
In every clock cycle each processor:
- sends messages to its neighbors.
- receives messages from its neighbors.
- performs local computation in 0 time.
- changes state.
Asynchronous Model
There is no global clock.
If a message was sent it will eventually arrive at its destination (assuming no failures), but we can't assume anything about the arrival time.
We measure the time from the beginning of the execution until the last processor has stopped.
Asynchronous Model
For time complexity calculations we assume that every message arrives within one time unit in the worst case.
Model Representation
We can represent the processor network with a graph.
Each node in the graph is a processor.
There is an edge between two nodes if
there is a direct communication channel
between the two processors they
represent.
Complexity
C(Π, G, I) - communication complexity:
the total number of messages sent during the execution in the worst case.
T(Π, G, I) - time complexity:
the number of clock cycles the execution takes in the worst case.
Where Π is the protocol, G is the graph and I is the input.
Complexity - examples
The following examples are on a complete graph with nodes 1, 2, ..., n.
Complexity - example 1
Protocol A: node 1 sends the message m to node 2.
C(A, G, I) = 1.
T(A, G, I) = 1.
1 --m--> 2
Complexity - example 2
Protocol B: node 1 sends the message mi to node i, for every i ∈ G.
C(B, G, I) = n.
T(B, G, I) = 1.
1 --mi--> i
Complexity - example 3
Protocol C: node i sends the message mi to node i+1, for every i ∈ G.
C(C, G, I) = n.
T(C, G, I) = 1.
i --mi--> i+1
Complexity - example 4
Protocol D: node i sends the message m to node i+1 in cycle i.
C(D, G, I) = n.
T(D, G, I) = n.
1 --m--> 2 --m--> 3 ...
Transmission Problem
Input: there is a message m in the node V0.
Output: the message m is written in all the nodes of the graph.
dG(x,y) - the length of the shortest path from x to y in graph G.
D = Diameter(G) = max { dG(x,y) : x,y ∈ V }.
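The definitions of dG(x,y) and of the diameter can be made concrete with a short Python sketch. It assumes a connected, undirected graph stored as an adjacency dictionary; the names `bfs_distances` and `diameter` are hypothetical, and this is a sequential computation, not a distributed protocol.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return {node: dG(source, node)} for an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(graph):
    """Largest shortest-path distance over all pairs (graph assumed connected)."""
    return max(max(bfs_distances(graph, v).values()) for v in graph)


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(diameter(path))
```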
Algorithms for the
Transmission Problem
Direct Delivery.
Spanning Tree.
DFS.
Flooding.
Direct Delivery
Based on the assumptions:
- there is a routing system such that messages are sent along the shortest path.
- V0 knows the addresses of all other nodes in the graph.
V0 sends the message m n-1 times, each time to a different node.
DD Communication
Complexity
V0 sends n-1 messages.
Each message takes O(D) steps.
C(DD, G, I) = O(n*D).
DD Time Complexity
Under the assumptions:
1. synchronous model.
2. V0 sends one new message in every clock cycle.
There won't be collisions between messages, because messages travel along the shortest path, and therefore we can't have more than one message at a given distance from V0.
DD Time Complexity
The last message is sent in cycle n-1.
It takes O(D) steps for the last message to arrive.
T(DD, G, I) = O( n+D ).
DD Time Complexity
We can show the same time complexity even without assumption 2.
If two messages in a node compete for the same edge, we first send the message destined for the node with the smaller Id.
The message for node i, at time t, must be at distance t-i+1 from V0 (or already in Vi).
Spanning Tree
Assumptions:
We have a spanning tree in the graph that all the nodes are aware of (each node knows which of its edges are part of the spanning tree).
Each node that receives the message sends it on its other spanning tree edges.
Spanning Tree Complexity
We send the message once on each spanning tree edge.
C(ST) = n-1.
We need as many rounds as the depth of the tree until the last node receives the message.
T(ST) = O( Depth( tree, V0 ) ).
If we choose a BFS tree: T(ST) = O(D).
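The message and round counts can be checked with a small synchronous simulation in Python. This is a sketch under assumptions: the name `tree_broadcast` is hypothetical, the tree is given as an adjacency dictionary holding only spanning tree edges, and each node forwards the message on every tree edge except the one it arrived on.

```python
def tree_broadcast(tree, root):
    """Simulate broadcast on a spanning tree; return (informed, messages, rounds)."""
    informed = {root}
    frontier = [(root, None)]       # (node, neighbor the message came from)
    messages = 0
    rounds = 0
    while frontier:
        nxt = []
        for node, came_from in frontier:
            for nb in tree[node]:
                if nb != came_from:  # do not send back to the sender
                    messages += 1
                    informed.add(nb)
                    nxt.append((nb, node))
        if nxt:
            rounds += 1              # one clock cycle delivered new messages
        frontier = nxt
    return informed, messages, rounds


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
    print(tree_broadcast(tree, 0))
```

On the 4-node tree in the usage lines, the simulation sends n-1 = 3 messages and needs 2 rounds, the depth of the tree from node 0.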
Building a Spanning Tree
If we don't have a spanning tree, we can build one using any algorithm A for Transmission.
Execute algorithm A.
Each node V chooses as its parent the node W from which it received the message for the first time.
Building a Spanning Tree
V informs W that W is its parent.
The edge E(W,V) is marked as a spanning tree edge.
Since the transmission algorithm delivers the message to all nodes, we know that all the nodes are in the spanning tree.
We have no cycles since V chooses only one parent.
DFS
We traverse the graph in DFS order.
If we reach a new node we leave a copy of the message, mark the node and continue the traversal.
If we reach a marked node we go back.
DFS Complexity
In the DFS algorithm we move on each
edge exactly twice.
C(DFS) = T(DFS) = O(E).
Flooding
Each node that receives the message for the first time sends it to all of its neighbors.
When a node receives the message again, it just discards it.
Flooding is also effective in a Clean Net.
Flooding Complexity
On each edge the message passes at most twice, once in each direction.
C(Flood) = O(E).
After t time units the message has reached all the nodes whose distance from V0 is smaller than or equal to t.
T(Flood) = O(D).
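A synchronous simulation of flooding in Python can illustrate both bounds. It is only a sketch: the name `flood` is hypothetical, the graph is an adjacency dictionary of a connected undirected graph, and a node sends to all its neighbors, including the one it first heard from. The recorded first sender of each node is the parent that a spanning tree construction would use.

```python
def flood(graph, source):
    """Simulate flooding; return (parent, messages, rounds)."""
    parent = {source: None}   # first sender of each node
    frontier = [source]       # nodes that got the message in the last cycle
    messages = 0
    rounds = 0
    while frontier:
        arrivals = []
        for u in frontier:
            for v in graph[u]:          # send to every neighbor
                messages += 1
                arrivals.append((v, u))
        nxt = []
        for v, u in arrivals:
            if v not in parent:         # first time: accept and forward later
                parent[v] = u
                nxt.append(v)
            # otherwise the message is discarded
        if nxt:
            rounds += 1                 # cycle in which new nodes were reached
        frontier = nxt
    return parent, messages, rounds


if __name__ == "__main__":
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(flood(cycle, 0))
```

On the 4-node cycle in the usage lines, every node sends once to each of its 2 neighbors, so 8 messages cross the 4 edges (2 per edge), and the last node is reached after 2 cycles, the diameter of the cycle.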