AC: PRESTO = Precursory Research for Embryonic Science and Technology; JST = Japan Science and Technology

Quad-tree image compression using reconfigurable free-space optical interconnections and pipelined parallel processors
Alvaro Cassinelli*, Makoto Naruse*,** and Masatoshi Ishikawa*
Ishikawa-Hashimoto lab., University of Tokyo*, PRESTO JST**
[Figure labels: LCD/SLM (repeated four times)]

Plan of the presentation
I. OCULAR architectures for computing
- Reconfigurable Single Stage (OCULAR-I)
- Reconfigurable Multi-stage (OCULAR-II)
II. OCULAR-II demonstration: Quad-tree compression
- Quad-tree compression algorithm
- Set-up and Demonstration
- Discussion
III. Conclusion and further work

I. OCULAR architectures for computing
OCULAR = Optoelectronic Computer Using Laser Arrays with Reconfiguration
I.1 Reconfigurable Single Stage (OCULAR-I)
I.2 Reconfigurable Multi-stage (OCULAR-II)
[Figure labels: Processing Element Array; Photo Detector Array; VCSEL array; Optical Interconnections; 2D array of data; Optical feed-back; Output]

I.1 Single-stage paradigm for parallel computing
Optical technology offers enhanced parallel communication primitives…
…of great benefit for network-based parallel computers (= distributed memory, as opposed to shared memory).
• Static: fixed interconnection (X, Y, and Z); switches inside processors (local control).
• Dynamic: reconfigurable interconnection (X, Y or Z); switches outside processors (local or global/external control possible).
[Figure labels: P1, P2, …, Pn; Mem; ULA; mux; control; controller]

I.1 Dynamic architecture vs. static [slide not shown in main presentation]
In an n-degree static topology, each processor has n distinct optoelectronic I/O ports…
[Figure labels: switches; interconnections; processors; P1, P2, …, Pn]
• Technologically challenging
• Non-reusable architecture
• Bad scalability
…anyway, static networks can be redesigned as single-stage dynamic networks…
[Figure labels: P1, P2, …, Pn; Feed-back loop]
…processors, switches and interconnections are located in distinct modules:
• Optimal use of electronics, optoelectronics and optics
• Scalability, hardware reusability in other topologies
• Possible introduction of multiple stages…

I.1 OCULAR-I system architecture
Dynamic single stage… …optical architecture
[Figure labels: interconnection patterns X, Y, Z; P1, P2, …, Pn; VCSEL array; Photo-detector array; Elementary Processor Array]
[ Modular architecture ]
2D optoelectronic processing layer (PD-PE-VCSEL) + switches and interconnections: reconfigurable diffractive optics module

Processing Module
[ VCSEL array ] [ Photo-detector array ]
• Si photo-detectors with 850 nm VCSELs
• Integrated amplifier / threshold
• Modulation > 1 GHz (possibly 10-50 GHz)
[ SIMD Processor array ]
• 8x8 PEs (on FPGA)
• Electronic mesh for rapid short-range communication between PEs
• Each array attached to a PCB
• 10 MHz operation demonstrated
[Figure labels (PE): registers A, B; local memory (24 bits); ALU; mapped I/O; 4-neighbors; VCSEL; PD]

Reconfigurable interconnection module
Folded 4-f system
The 14 x 25 x 6.2 cm module generates the interconnection pattern…
alvaro: In this optical interconnection module, we require adjustable components to adapt the diffraction position on the LD and PD. We have designed a zooming Fourier transform lens as the adjustable component. The focal length is adjustable from 360 mm to 440 mm by moving one of the lenses, as illustrated in the figure. This function is important for matching interconnection parameters such as the pixel pitches of the VCSEL array, the PD array and the CGH, and for compensating for wavelength variation of the VCSEL array.
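To illustrate why an adjustable focal length helps with pitch matching and wavelength drift, here is a minimal sketch based on the standard paraxial relation for a grating placed in front of a Fourier transform lens, x = f · λ / Λ (first-order spot offset x, focal length f, wavelength λ, grating period Λ). It is not the authors' design procedure: the function names, the grating period and the detector offset are hypothetical values chosen only so that the result falls inside the 360 mm to 440 mm range quoted in the note.

```python
# Paraxial sketch: offset of the first diffraction order in the back focal
# plane of a Fourier-transform lens, x = f * wavelength / period.
# All numeric values below are hypothetical, not taken from the presentation.


def first_order_offset(focal_length_m: float, wavelength_m: float, period_m: float) -> float:
    """Offset (m) of the first diffraction order from the optical axis."""
    return focal_length_m * wavelength_m / period_m


def focal_length_for_offset(offset_m: float, wavelength_m: float, period_m: float) -> float:
    """Focal length (m) that places the first order at the requested offset."""
    return offset_m * period_m / wavelength_m


if __name__ == "__main__":
    period = 1.36e-3   # hypothetical CGH grating period (m)
    target = 250e-6    # hypothetical offset of the target photo-detector (m)
    # Compare the nominal 850 nm wavelength with a slightly drifted one.
    for wavelength in (850e-9, 855e-9):
        f = focal_length_for_offset(target, wavelength, period)
        print(f"{wavelength * 1e9:.0f} nm -> f = {f * 1e3:.1f} mm")
```

Under these assumed numbers, a few nanometres of wavelength drift is absorbed by a focal-length change of a few millimetres, which is the kind of compensation the zoom lens is described as providing.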
[Figure labels: FT lens; laser diode; interconnection patterns X, Y, Z]
…the module is therefore responsible for interconnection and switching.
The CGH is generated by an optically addressable SLM, using a laser diode and a liquid crystal display coupled through a fiber-optic plate.
• Space-invariant interconnections: good/bad?
• Free-space: alignment issues?
• Multi-level CGH: good diffraction efficiency
• Reconfiguration (“switch”) frequency: 100 Hz…

I.2 Multi-stage paradigm for parallel computing
A single-stage architecture can be “spanned” into multiple stages.
[Figure labels: Single-Stage (P1, P2, …, Pn); Multi-Stages (Stage 1, S&I-1, Stage 2, S&I-2, …, Stage m, S&I-m)]
[computing] Hypercube, Mesh, Cube Cycle, Tree, Pyramid, De Bruijn
[computing & networking] Delta, Omega, Benes, Clos, Shuffle/exchange, Banyan
• Simplicity & Speed: S&I does not need to be complex (shuffle-exchange networks).
The cost of multiplying the processors is paid back as…
• Scalability / Reconfigurability: for different topologies.
• Pipelining: possible.
• Theoretical background: multi-stage architectures have been studied for decades in networking applications…

I.2 OCULAR-II system architecture
[Figure labels: Optoelectronic processing module; Elementary Processor Array; Photo-detector array; VCSEL array; Optical interconnection module (three times); Two layer module]

II. Quad-tree compression on OCULAR-II
II.1 Quad-tree compression algorithm
II.2 Set-up and Demonstration
II.3 Discussion
[Figure labels: Sender array (PE array, VCSELs); Interconnection module (SLM); Receiver array (Photo Detectors, PE array); Electrical feed-back through host computer]

II.1 Principle of the quad-tree compression algorithm
[Figure: image and corresponding tree, levels 3 to 0, quadrants labelled A, B, C, D]
• This group of pixels is a level 2 leaf of address B
• …level 1 leaf of address DB
• …this pixel is a level 0 leaf of address CDA
• …this pixel is NOT a leaf
Leaf = ( level , address )
Image as a tree = ( 2 , B ) + ( 1 , DB ) + ( 0 , CDA )

II.1 Quad-tree compression on OCULAR-II architecture
Rem: data from the receiver side to the sender side is electronically fed back through the host computer…
• Initialization: load the 2^N x 2^N image. ON pixels are set as lowest-level leaves in the local PE memories.
• From stage to stage:
• detect upper leaves
- sequentially broadcast the leaf values to the corresponding upper PE.
- compare on the receiver side.
- update the leaf level of the upper-level PE if its corners turn out to be “false” lower-level leaves.
• cut branches
- broadcast in parallel a signal for resetting the false low-level leaves.
• End on last stage:
- Download data from the last array.
- Save data (level, address) from the PEs which are still leaves.
[Figure labels: quadrant broadcast order 1, 2, 3, 4; detect upper leaves; cutting branches]

Example: interconnection for processing of level 1
1) Detecting leaves
…Is A a level one leaf?
[Figure: CCD image of PD plane, quadrants A, B, C, D; (first order), (zero order); legend: broadcasting PE on array n, computing PE on array n+1]
2) Conditional broadcast
…If so, A must update its leaf level and cut lower branches.
[Figure labels: A, B, C, D]
[slide not shown in main presentation]

II.2 OCULAR-II demonstrator setup
[Figure labels: PE array 1; VCSEL array; Optical interconnection module; PD array; PE array 2]
• The demonstration is carried out on a two-layer OCULAR-II prototype. Multiple-layer processing is simulated through electronic feed-back between the first and second processor arrays.
• The interconnections for each level are time-multiplexed on the SLM module.
• Two-level CGHs are used (enough diffraction efficiency).
[Figure: CGH and diffraction pattern for Level 0, Level 1 and Level 2]

…quad-tree algorithm and hypercube network
An image of 2^(n/2) x 2^(n/2) pixels maps onto 2^n elementary processors arranged in an n-dimensional hypercube topology.
[Figure labels: hypercube links X, Y, Z, W]
Quad-tree on OCULAR-II: pairs of (6-dimensional) hypercube links are generated and multiplexed in time thanks to the SLM-based interconnection module…
…on level 1: X, Z
…on level 2: Y, W
…

II.2 Quad-Tree Compression Demonstration Setup
[Figure labels: CGH monitor; “receiver” array (SIMD + PD); Monitor CCD; Interconnection module; “sender” array (SIMD + VCSELs); Control and results on host computer]

Example: holograms required during level 1 processing.
1) Broadcast hologram (quadrant comparison)
[Figure: quadrants A, B, C, D; (first order), (zero order); potential leaf on level one]
2) Re-broadcast hologram (cutting branches)
[Figure: quadrants A, B, C, D; legend: broadcasting PE, computing PE]
[slide not shown in main presentation]

Level 0. Detecting upper leaves.
[Figure: Level 0 quadrants (A B / D C); level 0 leaves]
…symbolic representation of the initial tree, containing 28 level 0 leaves (most of them false).

Detail of level 0 broadcasting [slide not shown in main presentation]
[Figure: sender array and receiver array; legend: “D” corners with leaf bit ON, “D” corners with leaf bit OFF; photo-detector chip surface as seen through the alignment CCD camera]
In this demonstration we used two-level phase CGHs computed by SA. Only the 1st order of diffraction is used as the interconnection pattern.

Level 0. Cutting branches.
[Figure: newly created leaf on level 1]

Level 1. Detecting upper leaves.
[Figure: Level 1 quadrants (A B / D C)]

Level 1. Cutting branches.
[Figure: newly created leaf on level 2]

Level 2. Detecting leaves and cutting branches.
[Figure: Level 2 quadrants (A B / D C)]
…symbolic representation of the encoded image as a minimal tree with seven leaves.

II.3 Discussion
• Compression of a 2^N x 2^N pixel image takes O(5N) clock cycles.
• The SIMD array, VCSELs and photo-detectors can run at more than 100 MHz. A 1024x1024 image (N = 10) needs about 50 cycles, which at 100 MHz gives two million 1024x1024 images compressed per second!
• 8x8 image (N = 3): 28 pixels ON = 28 initial leaves. 15 iterations… …only seven final leaves.
• However, SLM reconfiguration limits operation to at most a hundred hertz.
• Also, one has to remember that our chips are only 8x8 pixels large.

III. Conclusion and further work
III.1 Summary
III.2 Research underway and further work

III.1 Summary
We have successfully tested the OCULAR-II multistage architecture with reconfigurable optical interconnections by implementing quad-tree compression on binary images (= example of embedded hypercube).
However…
• The optically addressed SLM-based interconnection module accounts for the strongest bandwidth limitation (a hundred hertz).
• Electronic feed-back through the host computer generates parasitic signals and synchronization problems!
• Alignment is not difficult, but may become a critical issue in “true” multistage architectures...

III.2 Further work: OCULAR-III
[ Research underway ]
Alignment issues (between 2D arrays)
- dynamic alignment using actuators and control theory.
- pre-aligned connectors using fiber-bundles.
Concurrent multistage paradigm using fixed interconnections
- design of fixed, guided-wave-based pre-aligned interconnection modules (the processor array is in charge of the switching function) => OCULAR-III
Design of an integrated (VLSI) optoelectronic layer (with switching…)
[ Future research directions ]
- Test of these “modular” architectures for building computing and networking MINs.
- Design of all-optical networks using the above paradigm.
[Figure labels: Fiber bundle; IBnC; network interconnection modules; Processor arrays]
http://www.k2.t.u-tokyo.ac.jp/index-e.html
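
The quad-tree encoding of section II.1 (ON pixels start as level 0 leaves; at each level, upper leaves are detected and the four lower "false" leaves are cut) can be sketched in ordinary sequential software as follows. This is only an illustrative sketch, not the optical SIMD implementation demonstrated on OCULAR-II: the function names are hypothetical, and the quadrant labelling (A top-left, B top-right, C bottom-right, D bottom-left) is an assumption read from the "A B / D C" quadrant figures.

```python
from typing import List, Sequence, Tuple

# Assumed quadrant labels, read clockwise from the top-left corner:
#   A B
#   D C
# Keys are (row bit, column bit) of a child block inside its parent block.
LETTER = {(0, 0): "A", (0, 1): "B", (1, 1): "C", (1, 0): "D"}


def block_address(row: int, col: int, depth: int) -> str:
    """Path from the root down to block (row, col), one letter per tree level."""
    return "".join(
        LETTER[((row >> k) & 1, (col >> k) & 1)] for k in range(depth - 1, -1, -1)
    )


def quadtree_leaves(image: Sequence[Sequence[int]]) -> List[Tuple[int, str]]:
    """Encode the ON pixels of a 2^N x 2^N binary image as (level, address) leaves."""
    size = len(image)
    n_levels = size.bit_length() - 1
    if size == 0 or (1 << n_levels) != size or any(len(r) != size for r in image):
        raise ValueError("image must be square with a power-of-two side")

    # Initialization: every ON pixel starts as a level 0 leaf.
    leaves = [{(r, c) for r in range(size) for c in range(size) if image[r][c]}]

    for level in range(1, n_levels + 1):
        side = size >> level  # number of blocks per row at this level
        upper = set()
        for r in range(side):
            for c in range(side):
                corners = {(2 * r + dr, 2 * c + dc) for dr, dc in LETTER}
                # Detect upper leaves: all four corners must be leaves.
                if corners <= leaves[level - 1]:
                    upper.add((r, c))
                    # Cut branches: the four corners were "false" leaves.
                    leaves[level - 1] -= corners
        leaves.append(upper)

    # Report the surviving leaves, highest level first.
    return [
        (level, block_address(r, c, n_levels - level))
        for level in range(n_levels, -1, -1)
        for (r, c) in sorted(leaves[level])
    ]
```

For example, a 4x4 image whose top-left 2x2 block is fully ON, plus one isolated ON pixel at row 2, column 2, is reduced from five initial leaves to two: one level 1 leaf and one level 0 leaf.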