Data Mining Engineering

Parallel and Distributed Systems
Peter Brezany
Institute for Software Science
University of Vienna
P.Brezany
Institute for Software Science – University of Vienna
Typical One-Processor Architecture (SISD Architecture)
SISD: Single Instruction stream, Single Data stream
Array Processor (SIMD Architecture)
SIMD: Single Instruction stream, Multiple Data streams
Loop Parallelizing for SIMDs
A typical scientific program spends approximately 90% of its execution time in loops.
Example in Java:
float[] A = new float[1000], B = new float[1000];
for (int i = 1; i < 1000; i++) {
A[i-1] = B[i];
}
The above loop can be expressed in Fortran 95 as a single array assignment (assuming A and B are declared with index bounds 0:999):
A(0:998) = B(1:999)
This statement can be directly mapped onto a SIMD processor.
There is an initiative to extend Java with similar constructs.
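As a rough Java analogue of the Fortran 95 array assignment, the whole-array shift can be written as one bulk operation with System.arraycopy, or as a data-parallel loop over independent iterations with a parallel IntStream. This is a sketch for illustration only: the class and method names are hypothetical, and neither form guarantees that the JVM emits SIMD instructions (that is up to the JIT compiler).

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ArraySectionShift {

    // Sequential loop from the slide: A[i-1] = B[i] for i = 1..n-1.
    static void loopShift(float[] a, float[] b) {
        for (int i = 1; i < b.length; i++) {
            a[i - 1] = b[i];
        }
    }

    // Bulk copy, the closest Java counterpart of A(0:998) = B(1:999):
    // copies b.length-1 elements starting at b[1] into a starting at a[0].
    static void bulkShift(float[] a, float[] b) {
        System.arraycopy(b, 1, a, 0, b.length - 1);
    }

    // Data-parallel form: the iterations are independent, so the
    // runtime may execute them in any order or concurrently.
    static void parallelShift(float[] a, float[] b) {
        IntStream.range(1, b.length).parallel().forEach(i -> a[i - 1] = b[i]);
    }

    public static void main(String[] args) {
        float[] b = new float[1000];
        for (int i = 0; i < b.length; i++) b[i] = i;
        float[] a1 = new float[1000], a2 = new float[1000], a3 = new float[1000];
        loopShift(a1, b);
        bulkShift(a2, b);
        parallelShift(a3, b);
        // All three variants must produce the same array.
        System.out.println(Arrays.equals(a1, a2) && Arrays.equals(a1, a3));
    }
}
```

Running main prints true: the three variants write the same values, and A[999] is left untouched in each case.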
Parallel Multiprocessor Hardware (MIMD Architectures)
• Distributed-memory machines (DM multiprocessors, DM MIMDs)
  – Each processor has local memory and disk
  – Communication via message passing
  – Hard to program: explicit data distribution
  – Goal: minimize communication
• Shared-memory machines (SM multiprocessors, SM MIMDs, SMPs)
  – Shared global address space and disk
  – Communication via shared-memory variables
  – Ease of programming
  – Goal: maximize locality, minimize false sharing
• Current trend: clusters of SMPs
Distributed Memory Architecture (Shared Nothing)
[Figure: four CPUs, each with its own local memory, connected by an interconnection network]
DMM: Shared Disk Architecture
[Figure: four CPUs, each with its own local memory, connected by an interconnection network to a global shared disk subsystem]
Loop Parallelizing for DM MIMDs
Example in Java:
float[] A = new float[10], B = new float[10];
for (int i = 1; i < 10; i++) { A[i-1] = B[i]; }
For two processors, P1 and P2, a straightforward
solution would be:
Loop Parallelizing for DM MIMDs (2)
Loop Parallelizing for DM MIMDs (3)
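Code on P1 (owns global elements 0..4 of A and B):
// "send" and "receive" stand for message-passing primitives
float[] A = new float[5], B = new float[5]; float temp;
for (int i = 1; i < 6; i++) {
    if (i == 5) {
        // A[4] needs global B[5], which is stored on P2
        receive message from P2 into temp;
        A[i-1] = temp;
    } else {
        A[i-1] = B[i];
    }
}
Code on P2 (owns global elements 5..9; local index k holds global element k+5):
float[] A = new float[5], B = new float[5]; float temp;
for (int i = 0; i < 5; i++) {
    if (i == 0) {
        // local B[0] is global B[5], which P1 needs for A[4]
        temp = B[0];
        send temp to P1;
    } else {
        A[i-1] = B[i];
    }
}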
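To check that the two programs together compute the sequential loop they implement (A[i-1] = B[i] for i = 1..9), they can be simulated in a single JVM. This sketch assumes a block distribution (P1 owns global elements 0..4, P2 owns 5..9), models P1 and P2 as threads with private local arrays, and stands in for send/receive with a java.util.concurrent.BlockingQueue; a real distributed-memory program would use a message-passing library such as MPI instead. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DmLoopSimulation {

    static final int N = 10;          // global array length
    static final int LOCAL = N / 2;   // elements owned by each processor

    // Runs the P1/P2 programs as two threads and returns the global A.
    // Each "processor" touches only its own local arrays; the only shared
    // object is the message channel from P2 to P1.
    static float[] run(float[] globalB) {
        BlockingQueue<Float> p2ToP1 = new ArrayBlockingQueue<>(1);
        float[] a1 = new float[LOCAL], b1 = new float[LOCAL];
        float[] a2 = new float[LOCAL], b2 = new float[LOCAL];
        // Block distribution of B: P1 gets B[0..4], P2 gets B[5..9].
        System.arraycopy(globalB, 0, b1, 0, LOCAL);
        System.arraycopy(globalB, LOCAL, b2, 0, LOCAL);

        Thread p1 = new Thread(() -> {
            try {
                for (int i = 1; i < LOCAL + 1; i++) {
                    if (i == LOCAL) {
                        a1[i - 1] = p2ToP1.take();   // receive global B[5] from P2
                    } else {
                        a1[i - 1] = b1[i];
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread p2 = new Thread(() -> {
            try {
                for (int i = 0; i < LOCAL; i++) {
                    if (i == 0) {
                        p2ToP1.put(b2[0]);           // send local B[0] (global B[5]) to P1
                    } else {
                        a2[i - 1] = b2[i];
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        p1.start();
        p2.start();
        try {
            p1.join();   // join() also makes the threads' writes visible here
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Gather the local pieces of A back into one global array.
        float[] globalA = new float[N];
        System.arraycopy(a1, 0, globalA, 0, LOCAL);
        System.arraycopy(a2, 0, globalA, LOCAL, LOCAL);
        return globalA;
    }

    public static void main(String[] args) {
        float[] b = new float[N];
        for (int i = 0; i < N; i++) b[i] = i;
        float[] expected = new float[N];
        for (int i = 1; i < N; i++) expected[i - 1] = b[i];   // sequential loop
        System.out.println(Arrays.equals(run(b), expected));
    }
}
```

The single message carries exactly the one value that crosses the partition boundary. This is the communication the distributed-memory version must minimize.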
Shared Memory Architecture (Shared Everything, SMP)
[Figure: four CPUs connected by an interconnection network to a global shared memory]
Loop Parallelizing for SMPs
Example in Java:
float[] A = new float[1000], B = new float[1000];
for (int i = 1; i < 1000; i++) {
A[i-1] = B[i];
}
If we have, e.g., two processors, P1 and P2, a straightforward
(non-optimal) solution would be:
Code on P1:
for (int i = 1; i < 500; i++) { A[i-1] = B[i]; }
Code on P2:
for (int i = 500; i < 1000; i++) { A[i-1] = B[i]; }
Data elements of A and B are stored in the shared memory.
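On a shared-memory machine the same split can be expressed directly with threads that all see the same arrays. In this minimal sketch (the class and method names are hypothetical) two Java threads play the roles of P1 and P2. The two index ranges write disjoint parts of A and only read B, so no locking is required, and Thread.join() ensures that the main thread sees both threads' results.

```java
import java.util.Arrays;

public class SmpLoop {

    // Executes A[i-1] = B[i] for i in [from, to) on the shared arrays.
    static void chunk(float[] a, float[] b, int from, int to) {
        for (int i = from; i < to; i++) {
            a[i - 1] = b[i];
        }
    }

    // For length 1000, P1 handles i = 1..499 and P2 handles i = 500..999.
    static float[] run(float[] b) {
        float[] a = new float[b.length];      // shared by both threads
        int mid = b.length / 2;
        Thread p1 = new Thread(() -> chunk(a, b, 1, mid));
        Thread p2 = new Thread(() -> chunk(a, b, mid, b.length));
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a;
    }

    public static void main(String[] args) {
        float[] b = new float[1000];
        for (int i = 0; i < b.length; i++) b[i] = i;
        float[] expected = new float[1000];
        for (int i = 1; i < b.length; i++) expected[i - 1] = b[i];
        System.out.println(Arrays.equals(run(b), expected));
    }
}
```

Unlike the distributed-memory version, no data is moved explicitly: P2 simply reads B[500] from the shared memory.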
Cluster of SMPs
[Figure: four 4-CPU SMP nodes connected by an interconnection network]
Abstract Machine Model
Cluster of PCs
[Figure panels: No Pipeline; Pipeline]
Towards Parallel Databases
Relational Data Model
Example: relation Cities
Name      Population   Land
Munich    1211617      Bayern
Bremen    535058       Bremen
...       ...          ...
Schema: Cities(Name: STRING, Population: INTEGER, Land: STRING)
Relation (represented by a table):
{(Munich, 1211617, Bayern), (Bremen, 535058, Bremen), ...}
Key: {Name}
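One way to picture the relation and its key in Java is to store each tuple as a record and index the tuples by the key attribute Name; the map then cannot hold two tuples with the same Name, mirroring the key constraint. This is an illustrative sketch only (the Cities class and City record are hypothetical names; records require Java 16 or later), not how a database system actually stores relations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Cities {

    // One tuple of the schema Cities(Name, Population, Land).
    record City(String name, int population, String land) {}

    // Tuples indexed by the key attribute Name, kept in insertion order.
    private final Map<String, City> byName = new LinkedHashMap<>();

    // Inserts a tuple; rejects it if another tuple already has this key.
    void insert(City c) {
        if (byName.putIfAbsent(c.name(), c) != null) {
            throw new IllegalArgumentException("duplicate key: " + c.name());
        }
    }

    // Looks up the unique tuple with the given key, or null if absent.
    City lookup(String name) {
        return byName.get(name);
    }

    public static void main(String[] args) {
        Cities cities = new Cities();
        cities.insert(new City("Munich", 1211617, "Bayern"));
        cities.insert(new City("Bremen", 535058, "Bremen"));
        System.out.println(cities.lookup("Munich"));
    }
}
```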
Relational Data Model - Queries
SELECT <Attribute List>
FROM <Relation Name>
[WHERE <Condition>]   (optional)
...                   (other options)
Example:
SELECT * FROM Cities
This is equivalent to:
SELECT Name, Population, Land FROM Cities
A query with a condition:
SELECT Name, Land FROM Cities WHERE Population > 600000
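For readers more familiar with Java than SQL, the last query corresponds roughly to a filter (the WHERE clause) followed by a projection (the attribute list) over a collection of tuples. The sketch below uses Java streams on an in-memory list; the CityQuery, City and NameLand names are hypothetical (Java 16 or later for records and Stream.toList()), and a real DBMS evaluates the SQL statement with its own query processor.

```java
import java.util.List;

public class CityQuery {

    // One tuple of Cities(Name, Population, Land).
    record City(String name, int population, String land) {}

    // A projected result tuple (Name, Land).
    record NameLand(String name, String land) {}

    // SELECT Name, Land FROM Cities WHERE Population > minPopulation
    static List<NameLand> select(List<City> cities, int minPopulation) {
        return cities.stream()
                .filter(c -> c.population() > minPopulation)   // WHERE
                .map(c -> new NameLand(c.name(), c.land()))    // SELECT Name, Land
                .toList();
    }

    public static void main(String[] args) {
        List<City> cities = List.of(
                new City("Munich", 1211617, "Bayern"),
                new City("Bremen", 535058, "Bremen"));
        System.out.println(select(cities, 600000));
    }
}
```

Like an SQL query without DISTINCT, the resulting list keeps duplicate result tuples if several input tuples project to the same (Name, Land) pair.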
Grid Idea