
AC:
PRESTO = Precursory Research for Embryonic Science and Technology
JST= Japan Science and Technology
Quad-tree image compression using
reconfigurable free-space optical interconnections
and pipelined parallel processors
[Figure labels: LCD/SLM (x4)]
Alvaro Cassinelli*, Makoto Naruse*,** and Masatoshi Ishikawa*
Ishikawa-Hashimoto lab. University of Tokyo*, PRESTO JST**
Plan of the presentation
I. OCULAR architectures for computing
- Reconfigurable Single Stage (OCULAR-I)
- Reconfigurable Multi-stage (OCULAR-II)
II. OCULAR-II demonstration: Quad-tree compression.
- Quad-tree compression algorithm
- Set-up and Demonstration
- Discussion
III. Conclusion and further work
I. OCULAR architectures for computing
OCULAR = Optoelectronic Computer Using Laser Arrays with Reconfiguration
I.1 Reconfigurable Single Stage (OCULAR-I)
I.2 Reconfigurable Multi-stage (OCULAR-II)
[Figure labels: 2D array of data; Photo Detector Array; Processing Element Array; VCSEL array; Optical Interconnections; Optical feed-back; Output]
I.1 Single-stage paradigm for parallel computing
Optical technology offers enhanced parallel communication primitives
…of great benefit for network-based parallel computers
= distributed memory
 shared memory
Static: fixed interconnection (X, Y, and Z)
…switches inside processors (local control)
Dynamic: reconfigurable interconnection (X, Y or Z)
…switches outside processors (local or global/external control possible)
[Figure labels: processors P1, P2, … Pn; Mem; ALU; mux; control; controller]
I.1 Dynamic architecture vs. static [slide not shown in main presentation]
In an n-degree static topology, each processor has n distinct optoelectronic I/O ports…
- Technologically challenging
- Non-reusable architecture
- Poor scalability
…anyway, static networks can be redesigned as single-stage dynamic networks…
…processors, switches and interconnections located in distinct modules
- Optimal use of electronics, optoelectronics and optics
- Scalability, hardware reusability in other topologies
- Possible introduction of multiple stages…
[Figure labels: processors P1, P2, … Pn; switches; interconnections; feed-back loop]
I.1 OCULAR-I system architecture
Dynamic single-stage optical architecture
[Figure labels: processors P1, P2, … Pn; links X, Y, Z; Photo-detector array; Elementary Processor Array; VCSEL array]
[ Modular architecture ]
2D optoelectronic processing layer (PD-PE-VCSEL)
+
Switches and interconnections: reconfigurable diffractive optics module
Processing Module
[ VCSEL array ]
850 nm VCSELs
Modulation > 1 GHz (possible 10-50 GHz)
[ Photo-detector array ]
Si photo-detectors with integrated amplifier / threshold
[ SIMD Processor array ]
8x8 PEs (on FPGA)
PE: registers A and B, local memory (24 bits), ALU, mapped I/O (4-neighbors, VCSEL, PD)
Electronic mesh for rapid short-range communication between PEs.
Each array attached to a PCB
10 MHz operation demonstrated
alvaro:
In this optical interconnection module, we require adjustable components to adapt the diffraction positions on the LD and PD. We have designed a 14 x 25 x 6.2 cm zooming Fourier transform lens as the adjustable component. The focal length is adjustable from 360 mm to 440 mm by moving one of the lenses, as illustrated in the figure. This function is important for matching interconnection parameters such as the pixel pitches of the VCSEL array, the PD array and the CGH, and for compensating for wavelength variation of the VCSEL array.
Reconfigurable interconnection module
Folded 4-f system
The module generates the interconnection pattern…
…it is therefore responsible for interconnection and switching
CGH is generated by an optically addressable SLM, using a laser diode and a liquid crystal display coupled through a fiber optic plate.
[Figure labels: FT lens; Laser diode; links X, Y, Z]
Space-invariant interconnections – good/bad?
Free-space – alignment issues?
Multi-level CGH – good diffraction efficiency
Reconfiguration (“switch”) freq. – 100 Hz…
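The role of the zoom lens in the notes above can be illustrated with a rough paraxial sketch. It assumes the standard small-angle relation for a Fourier-transform lens, offset = f * wavelength / grating period; the 100 um CGH period and the 855 nm drifted wavelength are hypothetical values chosen for illustration and do not come from the slides.

```python
def first_order_offset_mm(focal_mm: float, wavelength_nm: float, period_um: float) -> float:
    """Paraxial offset (mm) of the first diffraction order in the back focal plane."""
    # offset = f * lambda / period, with lambda and period both converted to metres
    return focal_mm * (wavelength_nm * 1e-9) / (period_um * 1e-6)


def compensating_focal_mm(focal_mm: float, nominal_nm: float, actual_nm: float) -> float:
    """Focal length that restores the nominal spot offset after a wavelength drift."""
    return focal_mm * nominal_nm / actual_nm


if __name__ == "__main__":
    PERIOD_UM = 100.0  # hypothetical CGH period, not taken from the slides
    for f in (360.0, 400.0, 440.0):  # zoom range quoted in the notes
        print(f, "mm ->", first_order_offset_mm(f, 850.0, PERIOD_UM), "mm")
    # Hypothetical drift from 850 nm to 855 nm, compensated by shortening f
    print(compensating_focal_mm(400.0, 850.0, 855.0))
```

Under these assumptions the spot offset scales linearly with the focal length, which is why a zoom lens can match the diffraction pattern to the PD pitch.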
I.2 Multi-stage paradigm for parallel computing
A single-stage architecture can be “spanned” into multiple stages (Stage 1, Stage 2, … Stage m), each stage with its own processors P1, P2, … Pn and its own S&I module (S&I-1, S&I-2, … S&I-m).
Single-Stage [computing]: Hypercube, Mesh, Cube Cycle, Tree, Pyramid, De Bruijn
Multi-Stages [computing & networking]: Delta, Omega, Benes, Clos, Shuffle/exchange, Banyan
The cost of multiplying the processors is paid back as…
Simplicity & Speed – S&I does not need to be complex (shuffle-exchange networks).
Scalability / Reconfigurability – for different topologies.
Pipelining – possible.
Theoretical background – Multi-stage architectures have been studied for decades in networking applications…
I.2 OCULAR-II system architecture
[Figure labels: optoelectronic processing module (Photo-detector array, Elementary Processor Array, VCSEL array); optical interconnection modules; two-layer module]
II. Quad-tree compression on OCULAR-II
II.1 Quad-tree compression algorithm
II.2 Set-up and Demonstration
II.3 Discussion
[Figure labels: Sender array (PE array, VCSELs); Interconnection module (SLM); Receiver array (Photo Detectors, PE array); electrical feed-back through host computer]
II.1 Principle of the quad-tree compression algorithm
Image… …corresponding tree (level 3 down to level 0; quadrants A, B, C, D)
- This group of pixels is a level 2 leaf of address B
- level 1 leaf of address DB
- …this pixel is a level 0 leaf of address CDA
- …this pixel is NOT a leaf
Leaf = ( level , address )
Image as a tree = ( 2 , B ) + ( 1 , DB ) + ( 0 , CDA )
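The (level, address) representation can be sketched in software as follows. This is a minimal sequential sketch, not the optical implementation. It assumes the quadrant layout "A B / D C" that appears later in the slides, and it assumes addresses are written from the root downwards; the function name `quadtree_leaves` is hypothetical.

```python
from typing import List, Tuple

Leaf = Tuple[int, str]  # (level, address)

# Quadrant labels, assumed layout:  A B
#                                   D C
# Each entry is (label, row offset, column offset) in units of half a block.
QUADRANTS = (("A", 0, 0), ("B", 0, 1), ("D", 1, 0), ("C", 1, 1))


def quadtree_leaves(img: List[List[int]]) -> List[Leaf]:
    """Return the leaves covering the ON pixels of a 2^N x 2^N binary image."""

    def visit(row: int, col: int, span: int, address: str) -> List[Leaf]:
        block = [img[r][c]
                 for r in range(row, row + span)
                 for c in range(col, col + span)]
        if not any(block):       # empty block: contributes no leaf
            return []
        if all(block):           # uniform ON block: one leaf at this level
            return [(span.bit_length() - 1, address)]
        half = span // 2         # mixed block: descend into the four quadrants
        leaves: List[Leaf] = []
        for label, dr, dc in QUADRANTS:
            leaves += visit(row + dr * half, col + dc * half, half, address + label)
        return leaves

    return visit(0, 0, len(img), "")
```

With this convention, an 8x8 image (N=3) gives addresses of length 3 - level, matching leaves such as ( 2 , B ), ( 1 , DB ) and ( 0 , CDA ).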
II.1 Quad-tree compression on OCULAR-II architecture
Rem: data from the receiver side to the sender side is electronically fed back through the host computer…
• Initialization
- Load 2^N x 2^N image. ON pixels are set as lowest-level leaves in local PE memories.
• From stage to stage: detect upper leaves
- sequentially broadcast leaf values to the corresponding upper PE.
- compare on receiver side.
- update the leaf level of the upper-level PE, if its corners turned out to be lower “false” leaves.
• From stage to stage: cutting branches
- parallel broadcast signal for resetting false low-level leaves.
• End on last stage:
- Download data from last array.
- Save data (level, address) from PEs which are still leaves.
[Figure labels: quadrants numbered 1 to 4; detect upper leaves; cutting branches]
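The stage-to-stage procedure above can be imitated by a sequential, bottom-up sketch. It assumes that each block is represented by its top-left PE and that an upper PE becomes a leaf only when all four of its corners are leaves of the current level; both are assumptions made for this illustration. The name `compress_bottom_up` is hypothetical, and the loops replace what the hardware does in parallel.

```python
from typing import Dict, List, Tuple


def compress_bottom_up(img: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Map the (row, col) of each PE that is still a leaf to its leaf level."""
    size = len(img)
    n_levels = size.bit_length() - 1  # N for a 2^N x 2^N image

    # Initialization: every ON pixel is a lowest-level (level 0) leaf.
    leaf = {(r, c): 0 for r in range(size) for c in range(size) if img[r][c]}

    for level in range(n_levels):
        step = 1 << level  # side length of the blocks at the current level
        for r in range(0, size, 2 * step):
            for c in range(0, size, 2 * step):
                corners = [(r, c), (r, c + step),
                           (r + step, c), (r + step, c + step)]
                # Detect upper leaves: compare the four broadcast values.
                if all(leaf.get(p) == level for p in corners):
                    # Cutting branches: reset the false low-level leaves...
                    for p in corners:
                        del leaf[p]
                    # ...and promote the upper PE to the next level.
                    leaf[(r, c)] = level + 1
    return leaf
```

The returned dictionary plays the role of the data downloaded from the last array: only PEs that are still leaves remain.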
Example: interconnection for processing of level 1
1) Detecting leaves
…Is A a level one leaf?
2) Conditional broadcast
…If so, A must update its leaf level and cut lower branches.
[Figure labels: CCD image of PD plane (zero order, first order); quadrant PEs A, B, C, D; legend: broadcasting PE on array n, computing PE on array n+1]
[slide not shown in main presentation]
II.2 OCULAR-II demonstrator setup
• Demonstration is carried out on a two-layer OCULAR-II prototype.
Multiple-layer processing is simulated thanks to electronic feed-back between the first and second processor arrays.
• Interconnections for each level are time-multiplexed on the SLM module.
• Two-level CGHs are used (sufficient diffraction efficiency).
[Figure labels: PE array 1; VCSEL array; optical interconnection module; PD array; PE array 2; CGH and diffraction pattern for Level 0, Level 1, Level 2]
…quad-tree algorithm and hypercube network
Image 2^(n/2) x 2^(n/2) pixels large
2^n elementary processors arranged in
an n-dimensional hypercube topology
Y
X
Z
W
Quad-tree on OCULAR-II: pairs of (6-dimensional) hypercube links are generated
and multiplexed in time thanks to the SLM-based interconnection module…
…on level 1: X, Z
…on level 2: Y, W
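One way to see why quad-tree processing embeds in a hypercube is sketched below. It assumes a node numbering in which the row bits are followed by the column bits, and it counts levels from 0. Both are assumptions for this illustration and may differ from the X, Y, Z, W labelling in the figure. The function names are hypothetical.

```python
def pe_address(row: int, col: int, bits: int) -> int:
    """Hypercube node label: row bits followed by column bits (assumed numbering)."""
    return (row << bits) | col


def level_links(row: int, col: int, level: int, bits: int) -> tuple:
    """The two hypercube neighbours of PE (row, col) used at a quad-tree level.

    Flipping bit `level` of the column index reaches the horizontal quadrant
    partner; flipping bit `level` of the row index reaches the vertical one.
    """
    horizontal = pe_address(row, col ^ (1 << level), bits)
    vertical = pe_address(row ^ (1 << level), col, bits)
    return horizontal, vertical


def hamming(a: int, b: int) -> int:
    """Number of differing bits, i.e. the hop count on a hypercube."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    BITS = 3  # 8x8 array, so 2 * 3 = 6 hypercube dimensions
    here = pe_address(2, 4, BITS)
    for level in range(BITS):
        h, v = level_links(2, 4, level, BITS)
        print(level, hamming(here, h), hamming(here, v))
```

Under this numbering each quad-tree level needs exactly one pair of hypercube dimensions, and the diagonal quadrant is two hops away.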
…
II.2 Quad-Tree Compression Demonstration Setup
CGH
monitor
“receiver”
array
(SIMD + PD)
Monitor
CCD
Interconnection
module
“sender” array
(SIMD + VCSELs)
Control and results on
host computer …
Example: holograms required during level 1 processing.
1) Broadcast hologram (quadrant comparison)
A: potential leaf on level one
2) Re-Broadcast hologram (cutting branches)
[Figure labels: zero order, first order; quadrant PEs A, B, C, D; legend: broadcasting PE, computing PE]
[slide not shown in main presentation]
Level 0. Detecting upper leaves.
Level 0
quadrants
AB
DC
D
C
A
B
level 0
leaves
…symbolic representation of the initial tree, containing 28
level 0 (most of them false) leaves
Detail of level 0 broadcasting
[slide not shown in main presentation]
= “D” corners with leaf bit ON
sender array
= “D” corners with leaf bit OFF.
photo-detector chip surface as seen
through the alignment CCD camera
receiver array
In this demonstration we used
two-level phase CGHs
computed by SA.
Only the 1st order of diffraction is
used as the interconnection
pattern.
Level 0. Cutting branches.
newly
created leaf
on level 1
D
C
A
B
Level 1. Detecting upper leaves.
Level 1
quadrants
A B
D C
D
C
A
B
Level 1. Cutting branches.
newly
created leaf
on level 2
D
C
A
B
Level 2. Detecting leaves and cutting branches.
D
Level 2
quadrants
A B
A
C
B
D C
…symbolic representation of the encoded image as a
minimal tree with seven leaves.
II.3 Discussion
Compression of a 2^N x 2^N pixel image takes O(5N) clock cycles...
SIMD array, VCSELs and photo-detectors can run at more than 100 MHz…
=> two million 1024x1024 images compressed per second!
8x8 image (N=3): 28 pixels ON = 28 initial leaves.
15 iterations…
…only seven final leaves
However, SLM reconfiguration limits operation to about one hundred hertz at most...
Also, one has to remember that our chips are only 8x8 pixels large.
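The figures quoted in this discussion follow from simple arithmetic, sketched below. It takes the 5N cycle count and the 100 MHz clock from the slide at face value and ignores the SLM reconfiguration limit; the function names are hypothetical.

```python
def compression_cycles(n: int, cycles_per_level: int = 5) -> int:
    """Clock cycles to compress a 2^n x 2^n image, using the 5N estimate."""
    return cycles_per_level * n


def images_per_second(n: int, clock_hz: float) -> float:
    """Clock-limited throughput, ignoring SLM reconfiguration time."""
    return clock_hz / compression_cycles(n)


if __name__ == "__main__":
    print(compression_cycles(3))          # 8x8 image, N = 3
    print(images_per_second(10, 100e6))   # 1024x1024 image, N = 10, 100 MHz clock
```

For N = 3 this gives the 15 iterations of the demonstration, and for N = 10 at 100 MHz it gives 100e6 / 50 = two million images per second.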
III. Conclusion and further work
III.1 Summary
III.2 Research underway and further work
III.1 Summary
We have successfully tested the OCULAR-II multistage architecture with
reconfigurable optical interconnections by implementing quad-tree
compression on binary images (= example of embedded hypercube)
However…
The optically addressed SLM-based interconnection module accounts for the strongest
bandwidth limitation (about one hundred hertz)
Electronic feed-back through the host computer generates parasitic signals
and synchronization problems!
Alignment is not difficult, but may become a critical issue in “true” multistage
architectures...
III.2 Further work: OCULAR-III
[ Research underway ]
Alignment issues (between 2D arrays)
- dynamic alignment using actuators and control theory.
- pre-aligned connectors using fiber bundles.
Concurrent multistage paradigm using fixed interconnections
- design of fixed, guided-wave-based pre-aligned
interconnection modules (the processor array is in
charge of the switching function) => OCULAR-III
Design of an integrated (VLSI) optoelectronic layer (with switching…)
[ Future research directions ]
- Test of these “modular” architectures for building
computing and networking MINs.
- Design of all-optical networks using the above paradigm.
[Figure labels: fiber bundle; IBnC; network interconnection modules; processor arrays]
http://www.k2.t.u-tokyo.ac.jp/index-e.html