of these memory operations have a Fixed Cost

Computer Engineering of Wave Machines for
Seismic Modeling and Seismic Migration
R. Phillip Bording
February 17, 2004
Husky Energy Chair in Oil and Gas Research
Memorial University of Newfoundland
0
Max Address
M U N - February 17, 2005 - Phil
Bording
1
Session 1
History of Design
Tyco Brahe
Napier
Charles Babbage – mechanical design
John Atanasoff – Storage – spinning capacitor
-
Konrad Zuse - Floating Point
Mauchley and Ekert
von-Neumann
Harvard memory – code
memory - data
Princeton
memory code and data
Session 2
Current Design Issues
Scaling laws
Moore’s Law
Transistors – VLSI
Memory – Technology
Division of Design
The memory Challenge
The processor Challenge
The ILLIAC – PEPE
IBM 7094
IBM 360/44
IBM 360/95
Array Processors
the software of array processor calls
Programming Models
vectors
shared memory
distributed memory
Lamda Rules
M U N - February 17, 2005 - Phil
Bording
4
Division of design
Company A
Memory
Memory
Weak Link
ALU
ALU
One Company
Company B
M U N - February 17, 2005 - Phil
Bording
5
Moore’s Laws
Every 18 months the density of
transistors on a VLSI chip doubles
The investments of $ doubles with every
new VLSI plant
M U N - February 17, 2005 - Phil
Bording
6
Illiac
8 X 8 Processors
Nearest Neighbor Connections
M U N - February 17, 2005 - Phil
Bording
7
Parallel Ensemble Processing
Elements - PEPE
Radar Processing Computer
Associative Computing
Data Outputs
P0
....
Pn-3
Pn-2
Pn-1
Pn
Data Inputs
M U N - February 17, 2005 - Phil
Bording
8
IBM Machines
Early 1960’s 7094, 36 bit arithmetic

1600 and 1400 processors completely different
Middle 1960’s New Machine – IBM 360

36 bit words, but memory parity was added



8 bit byte + 1 bit parity
Uniform business machine architectures
32 and 64 bit floating point
Not any industry standard for format of
floating point
M U N - February 17, 2005 - Phil
Bording
9
Array Processors
IBM and CDC designed DMA
processors – Direct Memory Access
Frees the main processor to compute
 Allows separate simple processors to do
the i/o

The idea translated into attached
processors for arithmetic processing
M U N - February 17, 2005 - Phil
Bording
10
Array Processors
Arrays of data are moved to a local very
high speed memory – fast registers
Arithmetic is performed by special
instructions passed to array processor
CPU
Array Processor
M U N - February 17, 2005 - Phil
Bording
11
Software Design Issues
Vector Programming
Cache Programming
Message Passing Programming
NUMA Programming
Grid Programming
ALL of these memory operations have a
Fixed Cost
Code Performance
Improvements
M U N - February 17, 2005 - Phil
Bording
are dominated
by fixed costs
12
Hardware Design Issues
10 Years equals 100 Fold Speedup
Memory Latency – cost of getting the
first word is a constant
Wires have failed to scale
Bigger cache memories are slower
Code Performance Improvements are
dominated by fixed costs
M U N - February 17, 2005 - Phil
Bording
13
Linear Address Space
0
Max Address
Address Pointer
Latency is the time to access the first word
Bandwidth is the rate of accessing successive words
M U N - February 17, 2005 - Phil
Bording
14
von Neumann
Architecture
Princeton
Address Pointer
Arithmetic
Logic
Unit
(ALU)
Data/Instructions
Memory
Pc = Pc + 1
Program Counter
Featuring Deterministic Execution
M U N - February 17, 2005 - Phil
Bording
15
Cache Memory
Architecture
Main Memory is large
and slow.
Cache is much smaller
and much faster.
Control logic control
keeps the main memory
coherent.
C
A
C
H
E
C
O
N
T
R
O
L
Memory
Cache
Memory
Address Pointer
Featuring Non-Deterministic Execution
M U N - February 17, 2005 - Phil
Bording
16
Cache Memory
- Three Levels
Architecture
Memory
MultiGigabytes
2 Gigahertz Clock
Large
and
Slow
160 X
Cache
Control
Logic
2X
8X
L1 Cache
Memory
L2 Cache
Memory
32 Kilobytes
16X
L3 Cache
Memory
128 Kilobytes 16 Megabytes
Featuring Really Non-Deterministic
M U N - February 17, 2005 - Phil
Execution
Bording
Address Pointer
17
Programming Models
for
Parallel Computing
M U N - February 17, 2005 - Phil
Bording
18
Distributed Computing
Message Passing Interface
Program Address Spaces
0
Max 0
Max 0
Max 0
Max
Multiple Address Pointers
M U N - February 17, 2005 - Phil
Bording
19
Distributed Computing
with Message Passing
Program Address Spaces
Messages Left and Right
Multiple Address Pointers
M U N - February 17, 2005 - Phil
Bording
20
M U N - February 17, 2005 - Phil
Bording
21
Multi-Threading
OpenMP Programming Model
Global Program Address Space
Local
0
Local
Local
Local
n-1 n
2n-1 2n
3n-1 3n
4n-1
Multiple Address Pointers
M U N - February 17, 2005 - Phil
Bording
Address and Cache
Bus with Conflict
Resolution
22
Uniqueness of Store
Multi-Threading
Program Address Space
0
Duplicate Pointers
Multiple Address Pointers
to the same Location – Conflict on storing a result
So whoMisU Nmanaging
the multiple pointers?
- February 17, 2005 - Phil
Bording responsibility.
It is the programmers
23
Multiple Bank Memory
Systems
Memory Banks
Bank
0
Starting
Address
Mod 4
1
+1
+N
2
3
+2
+2N
+3
+3N
Vector Programming Model
M U N - February 17, 2005 - Phil
Bording
24

Download Report

of these memory operations have a Fixed Cost

Paperzz.com

Your Paperzz