SKA PHASE 1 CORRELATOR AND TIED ARRAY BEAMFORMER

Document number: WP2-040.060.010-TD-002
Revision: 1
Author: J. D. Bunton
Date: 2011-04-01
Status: Approved for release

Additional Authors: G.A. Hampson
Submitted by: J.D. Bunton, CSIRO, 2011-04-01
Approved by: W. Turner, Signal Processing Domain Specialist, SPDO, 2011-04-01

DOCUMENT HISTORY
Revision   Date of Issue   Engineering Change Number   Comments
A          -               -                           First draft release for internal review

DOCUMENT SOFTWARE
Package          Version            Filename
Wordprocessor    MsWord Word 2003   03c-wp2-040.060.010-td-002-1-ASKAP_SKA1_concept-description-2003
Block diagrams
Other

ORGANISATION DETAILS
Name: SKA Program Development Office
Physical/Postal Address: Jodrell Bank Centre for Astrophysics, Alan Turing Building, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Fax: +44 (0)161 275 4049
Website: www.skatelescope.org

TABLE OF CONTENTS
1 INTRODUCTION
1.1 Purpose of the document
2 REFERENCES
3 INTRODUCTION
3.1 Scope
4 SKA SPECIFICATION
5 TECHNICAL ASSUMPTIONS
6 SPARSE APERTURE ARRAY CORRELATOR
6.1 Architecture Overview
6.2 Correlation Processing
6.3 Correlator Data Reordering
6.4 Data transport: SKA Phase 1 & 2 options
7 SPF DISH CORRELATOR
7.1 Architecture Overview
7.2 Extension to PAF correlator
8 TIED ARRAY BEAMFORMING IN THE CORRELATOR
8.1 Coherent Tied Array Beamforming
8.2 Incoherent Beamforming
8.3 Incoherent Beams from Coherent Station beams
9 POWER DISSIPATION
10 PATH TO THE PHASE 1 CORRELATOR
11 COST
11.1 Cost reduction
11.2 Early Construction
12 CONCLUSION
13 ACKNOWLEDGMENTS

LIST OF FIGURES
Figure 1 "Pizza" box implementation for 1.5 AA beams. Dual FPGA implementation shown; each has 25 bi-directional 10 Gb/s links and there are 25 bi-directional 10 Gb/s links between FPGAs
Figure 2 Arrangement of data to the 55 correlation cells. Each group comprises data for 10 signals for a single frequency channel
Figure 3 ASKAP's Redback2 processing board, with 4 processing FPGAs each with attached DRAM and 8 SFP+ modules
Figure 4 Single Pixel Feed correlator board. SFP+ inputs are 10 Gb/s, inter-FPGA links dual 5 Gb/s and backplane links 2.5 Gb/s
Figure 5 DragonFly2 digitiser and filterbank board

LIST OF TABLES
Table 1 Summary of SKA Phase 1 correlator specifications
Table 2 ATCA class correlator board designs at CSIRO
Table 3 Estimated correlator cost

LIST OF ABBREVIATIONS
AA ............ Aperture Array
APERTIF ....... APERture Tile In Focus
ASKAP ......... Australian SKA Pathfinder
CMAC .......... Complex Multiply Accumulate
EVLA .......... Extended Very Large Array
FPGA .......... Field Programmable Gate Array
FTE ........... Full Time Equivalent
GS ............ GigaSamples
MWA ........... Murchison Widefield Array
SKA ........... Square Kilometre Array
SKAMP ......... SKA Molonglo Prototype
SFP+ .......... Small Form-factor Pluggable (10 Gb/s optical module)
SPF ........... Single Pixel Feed
Shelf ......... Also card cage, chassis, or crate of boards
ATCA .......... Advanced TCA, an industry standard shelf
FX ............ A correlator architecture where the transform to the frequency domain (F) precedes the cross multiplies (X)

Copyright and Disclaimer
© 2010 CSIRO. To the extent permitted by law, all rights are reserved and no part of this publication covered by copyright may be reproduced or copied in any form or by any means except with the written permission of CSIRO.

Important Disclaimer
CSIRO advises that the information contained in this publication comprises general statements based on scientific research. The reader is advised and needs to be aware that such information may be incomplete or unable to be used in any specific situation. No reliance or actions must therefore be made on that information without seeking prior expert professional, scientific and technical advice. To the extent permitted by law, CSIRO (including its employees and consultants) excludes all liability to any person for any consequences, including but not limited to all losses, damages, costs, expenses and any other compensation, arising directly or indirectly from using this publication (in part or in whole) and any information or material contained in it.

1 Introduction
This memo describes a potential correlator design, based on FPGAs, which would meet the requirements of the two separate correlator systems needed for SKA Phase 1. As this document describes a design to be built some time in the future, a number of assumptions are made as to the progress of technology and its limitations. With these assumptions, viable designs based on 'conservative' 10 Gb/s optical links are described. For the sparse aperture array a correlator system consisting of up to 320 "pizza" boxes is proposed. For the dish array, two standard ATCA shelves would suffice. In this memo we also describe how a tied-array beamforming function could be incorporated within the correlator.

This document is part of a series generated in support of the Signal Processing CoDR, which includes the following:
Signal Processing High Level Description [11]
Technology Roadmap [12]
Design Concept Descriptions [13] through [23]
Signal Processing Requirements [24]
Signal Processing Costs [25]
Signal Processing Risk Register [26]
Signal Processing Strategy to Proceed to the Next Phase [27]
Signal Processing CoDR Review Plan [29]
Software & Firmware Strategy [30]

1.1 Purpose of the document
The purpose of this document is to provide a concept description as part of a larger document set in support of the SKA Signal Processing CoDR. It provides a 'bottom up' perspective of correlation for the different receptor types proposed for the SKA. This document has been produced in accordance with the Systems Engineering Management Plan and the Signal Processing PrepSKA Work Breakdown document and includes:
First draft block diagrams of the relevant subsystem
First draft estimates of cost
First draft estimates of power.
At present, details on reliability have not been included. SKA Memo 125 and the DRM have been used as the baseline for best information on system parameters while the Systems Requirement Specification, SRS, is being created.

2 References
[1] Dewdney, P., et al.,
"SKA Phase 1: Preliminary System Description", SKA Memo 130, Nov 2010
[2] Bunton, J.D., "SKA Strawman Correlator", SKA Memo 126, Aug 2010
[3] Iguchi, S., Okumura, S.K., Okiura, M., Momose, M., Chikada, Y., "4-Gsps 2-bit FX Correlator with 262144-point FFT", URSI General Assembly 2002, Maastricht, 17-24 August 2002, paper 970. Available at: http://alma.mtk.nao.ac.jp/~iguchi/alma.files/p0970.pdf
[4] DeBoer, D.R., et al., "Australian SKA Pathfinder: A High-Dynamic Range Wide-Field of View Survey Telescope Array", IEEE Proceedings, Sept 2009
[5] Kooistra, E., "RadioNet FP7: UniBoard", CASPER Workshop 2009, Cape Town, Sept 29, 2009
[6] ITRS, "International Technology Roadmap For Semiconductors, 2010 Update, Overview". Available at: http://www.itrs.net/Links/2010ITRS/2010Update/ToPost/2010_Update_Overview.pdf
[7] De Souza, L., Bunton, J.D., Campbell-Wilson, D., Cappallo, R., Kincaid, B., "A Radioastronomy Correlator Optimised for the Virtex-4 SX FPGA", IEEE 17th International Conference on Field Programmable Logic and Applications, Amsterdam, Netherlands, Aug 27-29, 2007
[8] Hussein, J., Klein, M., and Hart, M., "Lowering Power at 28nm with Xilinx 7 Series FPGAs", Xilinx White Paper 389, Feb 2011. Available at: http://www.xilinx.com/support/documentation/white_papers/wp389_Lowering_Power_at_28nm.pdf
[9] Stratix V Device Handbook, Altera. Available at: http://www.altera.com/literature/hb/stratix-v/stratix5_handbook.pdf
[10] Xilinx 7 Series Product Brief. Available at: http://www.xilinx.com/publications/prod_mktg/7-Series-Product-Brief.pdf
[11] Signal Processing High Level Description, WP2-040.030.010-TD-001 Rev D
[12] Signal Processing Technology Roadmap, WP2-040.030.011.TD-001 Rev D
[13] Software Correlator Concept Description, WP2-040.040.010-TD-001 Rev A
[14] GSA Correlator Concept Description, WP2-040.050.010-TD-001 Rev A
[15] ASKAP Correlator Concept Description, WP2-040.060.010-TD-001 Rev A
[16] UNIBOARD Concept Description, WP2-040.070.010-TD-001 Rev B
[17] CASPER Correlator Concept Description, WP2-040.080.010-TD-001 Rev A
[18] ASIC-Based Correlator for Minimum Power Consumption, Concept Description, WP2-040.090.010-TD-001 Rev B
[19] SKADS Processing, WP2-040.100.010-TD-001 Rev A
[20] Central Beamformer Concept Description, WP2-040.110.010-TD-001 Rev A
[21] Station Beamformer Concept, WP2-040.120.010-TD-001 Rev A
[22] SKA Non Imaging Processing Concept Description: GPU Processing for Real-Time Isolated Radio Pulse Detection, WP2-040.130.010-TD-001 Rev A
[23] A Scalable Computer Architecture For On-Line Pulsar Search on the SKA, WP2-040.130.010-TD-002 Rev A
[24] Signal Processing Requirement Specification, WP2-040.030.000.SRS-001 Rev B
[25] SKA Signal Processing Costs, WP2-040.030.020-TD-001 Rev C
[26] Signal Processing Risk Register, WP2-040.010.010.RE-001 Rev A
[27] Signal Processing Strategy to Proceed to the Next Phase, WP2-040.010.030.PLA-001 Rev A
[28] Signal Processing CoDR
[29] CoDR Review Plan, WP2-040.020.011-PLA-001 Rev B
[30] Software and Firmware Strategy, WP2-040.200.012-PLA-001 Rev A

3 Introduction

3.1 Scope
The correlator design considers a 4+4-bit correlator and we include some discussion of the filterbanks needed to implement the FX (frequency-based cross multiplier) correlator.
However, the filterbank is not included in the final costings presented here, as it is considered to be part of the beamformer system in the case of the aperture array, or is located at the dish antenna for the dish array.

4 SKA SPECIFICATION
The specifications for SKA Phase 1 are set out in SKA Memo 130 [1], and comprise two antenna arrays: a sparse aperture array and a dish array. The sparse aperture array covers the frequency range from 0.07 to 0.45 GHz; each of the 50 aperture array stations generates 480 full bandwidth beams. The dish array comprises 250 dishes, each equipped with two octave-band single pixel feeds (SPF). One SPF covers 0.45-1 GHz and the other 1-2 GHz. These specifications are given in Table 1 together with the compute rate of the corresponding correlator. The compute rate is given in terms of complex multiply accumulate operations per second (CMAC/s) as the input to the correlator is complex data.

                                            Number of   Processed         Beams per   Correlation
                                            antennas    bandwidth (GHz)   antenna     (TeraCMAC/s)
Sparse Aperture Array                       50          0.38              480         900
Dish with SPF, 1-2 GHz frequency coverage   250         1.0               1           125
Dish with SPF, 0.45-1 GHz frequency coverage 250        0.55              1           69

Table 1 Summary of SKA Phase 1 correlator specifications

It can be seen that the major compute load is from the sparse aperture array, with the high frequency SPF system being only about one seventh of the compute load.

5 TECHNICAL ASSUMPTIONS
To generate a base design we define some guiding assumptions. The following are largely taken from SKA Memo 126 [2] but adapted to suit SKA Phase 1.

Assumption 1: Data Precision
The ADC resolution for the SPF systems is ~8 bits. Correlator data precision is 4 bits. While the lower bandwidth systems on PAFs or on an AA may have more ADC resolution, it is assumed that beamforming will occur at the antenna and that subsequent data transport to the correlator will be at 4-bit precision.

Assumption 2: FFT length
The maximum length of an FFT or polyphase filterbank is ~1000 frequency bins within an FPGA or ASIC. As the size of an FFT increases the computational load increases as log(N) but the memory is proportional to N. Eventually the memory dominates. When this occurs it is better to use external memory and process long FFTs in two stages. An example is the ~256 thousand point FFT implemented for the ALMA compact array correlator [3]. In that design an ASIC implements a 512-point FFT; two such ASICs with a corner-turner memory in between are used to implement the long FFT. Synthesis on FPGAs shows that a ~2000 point polyphase filterbank uses similar percentages of the block RAM and multiplier resources. Beyond this, the design is dominated by memory, leading to under-utilisation of multiplier and logic resources.
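To make Assumption 2 concrete, the sketch below (Python/NumPy, illustrative only) builds a long FFT from two short FFT stages separated by a twiddle multiply and a corner turn, the same decomposition as the ALMA compact array example; the 512 x 512 split is chosen to match that example, and the code is not part of any proposed firmware.

```python
import numpy as np

def two_stage_fft(x, n1, n2):
    # Long FFT of length n1*n2 built from two short stages with a
    # twiddle multiply and a corner turn (transpose) in between.
    a = x.reshape(n1, n2)                                # view the input as an n1 x n2 block
    b = np.fft.fft(a, axis=0)                            # first stage: n2 FFTs of length n1
    k1 = np.arange(n1).reshape(n1, 1)
    m2 = np.arange(n2).reshape(1, n2)
    b = b * np.exp(-2j * np.pi * k1 * m2 / (n1 * n2))    # twiddle factors between the stages
    d = np.fft.fft(b, axis=1)                            # second stage: n1 FFTs of length n2
    return d.T.ravel()                                   # corner turn back to natural order

# 512 x 512 = 262,144-point transform, checked against a direct FFT
x = np.random.randn(512 * 512) + 1j * np.random.randn(512 * 512)
assert np.allclose(two_stage_fft(x, 512, 512), np.fft.fft(x))
```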
Assumption 3: Inputs per Shelf
A single shelf (crate, chassis or card cage) is limited to ~256 optical connections (16 boards with 16 inputs each). Both the ASKAP beamformer [4] and a full UNIboard system [5] are designed close to this limit. Digital optical modules are usually sold with both a transmitter and a receiver, so this gives ~256 optical inputs and 256 outputs. It is assumed that the input and output of an optical module can be used for independent data paths, for example a correlator input and a tied array beam output. This can halve the number of expensive optical modules. It will be seen that data transport is a major correlator cost, so this optimisation is vital. However, the use of optical systems with independent inputs and outputs may preclude the use of commercial switches.

Assumption 4: Optical Fibre Transmitter
Data transport to the correlator will be on fibres, and for SKA Phase 1 it is assumed that this data is transported at 10 Gb/s per fibre. 100GE transport may be an option, but it is as yet uncertain whether this technology will have reached a sufficient level of maturity and affordability for SKA Phase 1. For long haul links, individual single mode fibre transmitters are used. For short haul links up to 100 m, multimode fibre can be used. This allows the use of optical transmitters that illuminate 12-fibre ribbon cables, giving a data rate of up to 120 Gb/s per transmitter.

Assumption 5: FPGA Capabilities
We chose to build the correlator out of "midsized" FPGAs. The largest FPGAs come at a premium in terms of cost per unit of processing. Furthermore, the largest FPGAs are always the last to be released, which further reduces their usefulness. Next-generation midsized FPGAs are due for release in 2011, with production quantities on the market in 2012. The midsized FPGAs in this generation have ~2000 multipliers that clock at close to 400 MHz. The ITRS roadmap [6] shows another generation in 2 years, after which it is expected to become a 3-year cycle. Hence, in 2017 it is expected that midsized production FPGAs will have ~8000 multipliers. These FPGAs should clock at 450 MHz in correlator applications. This gives the FPGA a processing capacity of 3.6T 18-bit multiplies per second. Each multiplier can be used as a complex 4+4 bit multiplier [7], so this translates to 3.6 TCMAC/s per FPGA. This equates to 29T arithmetic operations per second.

Assumption 6: FPGA Power Dissipation
Power dissipation of current generation FPGAs is ~10 W when the device utilisation is at a level expected for the ASKAP beamformer and correlator. Xilinx claim next generation FPGAs will consume half the power for the same logic functionality [8]; thus a 2000-multiplier next generation FPGA is expected to dissipate ~10 W. Xilinx also achieved a halving of power dissipation between Virtex-5 and Virtex-6. If this trend continued, a 2017 generation FPGA with 8000 multipliers would also dissipate ~10 W. Instead, it is assumed that the power per unit of logic will decrease by a factor of 0.7 for each subsequent generation of FPGAs. This would see the midsized 2017 FPGA dissipating ~20 W.

A summary of the assumptions is:
1. Input to the correlator has 4-bit precision.
2. Maximum FFT length implemented internally to an FPGA is ~2000 points.
3. A single hardware shelf supports ~256 optical transceivers.
4. Data transmission is at 10 Gb/s per fibre: individual transmitters for long haul, fibre ribbon transmitters for short haul.
5. A midsized production FPGA has 8000 18-bit multipliers in 2017.
6. Power dissipation of a midsized FPGA in 2017 is ~20 W.

6 SPARSE APERTURE ARRAY CORRELATOR

6.1 Architecture Overview
For the sparse aperture arrays, beamforming occurs at each station. The preferred beamforming technique first decimates the bandwidth from the antenna into a number of frequency channels and then beamforms within each frequency channel. We propose that the data is decimated to its final frequency resolution of 1 kHz at this location and then quantised to 4-bit resolution before being transported to the correlator. The data rate for a single polarisation beam from a station is ~380 MSample/s x 8 bits/sample, or 3.04 Gb/s; each sample consists of a 4+4 bit complex number. Thus each 10 Gb/s link can transport the data for 3 single polarisation beams. The station has 480 dual polarisation beams, each with a bandwidth of 380 MHz. To transport this data 320 single mode fibres are required. This is a large capacity, but is considered reasonable when compared to ASKAP's [4] data transport capacity from each of its antennas on 192 fibres.
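A minimal numerical check of the station output figures above (a sketch; serial-link coding and framing overheads are ignored):

```python
# Station output data rate and fibre count for the sparse AA (sketch).
bw_hz          = 380e6      # processed bandwidth per beam, Hz
bits_per_samp  = 8          # 4+4 bit complex samples
dual_pol_beams = 480        # dual polarisation beams per station

rate_single_pol    = bw_hz * bits_per_samp               # 3.04 Gb/s per single-pol beam
beams_per_link     = int(10e9 // rate_single_pol)        # 3 single-pol beams per 10 Gb/s fibre
fibres_per_station = dual_pol_beams * 2 // beams_per_link # 320 fibres per station
print(rate_single_pol / 1e9, beams_per_link, fibres_per_station)
```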
There are 50 sparse aperture array stations. Assumption 3 puts a limit of 256 on the number of fibres into a single correlator shelf. Thus, each shelf can have as input 5 fibres from each of the 50 stations. The resulting correlator would have 64 shelves, and each shelf has a processing load of 14 TCMAC/s. Using Assumption 5, this computing load can be accomplished with just 4 midsized FPGAs. A standard 14-16 slot AdvancedTCA shelf or similar is not a good fit to these requirements. It is better to consider the processing load imposed by a single fibre from each of the 50 stations. Each fibre is configured to carry data for 1.5 dual polarisation beams. This is achieved by transporting the full bandwidth for one beam and half the bandwidth for another. The compute load is 1/320 of the total, or 2.8 TCMAC/s. This can be processed by a single FPGA. FPGAs with sufficient I/O bandwidth have already been announced [9][10].

Figure 1 "Pizza" box implementation for 1.5 AA beams. Dual FPGA implementation shown; each has 25 bi-directional 10 Gb/s links and there are 25 bi-directional 10 Gb/s links between FPGAs

The physical implementation of the correlator could be in pizza box sized modules, Figure 1, with 25 dual-height SFP+ modules to accept the beam data. The outputs of the SFP+ modules are routed to the processing FPGAs within the box. A two-FPGA implementation is shown in Figure 1, which includes a separate control unit, either FPGA or DSP, that communicates via gigabit Ethernet. With the two-FPGA solution, half the input bandwidth to one FPGA must be transported to the other. This is implemented on the 25 10 Gb/s links between the FPGAs. Download of correlations from the "pizza" box is assumed to be implemented via the SFP+ optical modules. If the correlator implements a 1.2 second correlation on 1 kHz resolution data then ~1/4 of the SFP+ transmitters are needed to download 24+24 bit correlation data.

6.2 Correlation Processing
Using the correlation cell proposed in [7], a single multiplier can correlate one group of 16 antenna signals against another group of 16. The cell processes 256 correlations at any one time and on average processes a bandwidth equal to the clock rate divided by 256. For the aperture array correlator, groups of 10 are a better choice. Exactly 10 groups of 10 are needed to cover the 100 inputs from the antennas (50 dual pol). Each correlation cell can be considered as forming full Stokes parameters for 25 baselines. Some cells form correlations between identical sets of inputs. It is these cells that calculate the autocorrelations, but there is some inefficiency because cross-correlations are duplicated. Because of these inefficiencies and the added autocorrelations, 55 correlation cells are needed to form all correlations simultaneously for a single frequency channel. The arrangement of inputs to the 55 cells is shown in Figure 2. The cells with two sets of identical inputs are located along the diagonal.

Figure 2 Arrangement of data to the 55 correlation cells. Each group comprises data for 10 signals for a single frequency channel

The FPGA for SKA Phase 1 has ~8000 multiplier/correlation cells, allowing the FPGA to process all correlations for 145 individual frequency channels at one time. The 1.5 beams processed by the FPGA have 570,000 1 kHz frequency channels. These 570,000 channels are divided into 3,931 sets of 145 channels. A single time sample for a 1 kHz channel must be processed each 1 ms, implying a processing time of 0.254 μs for a single time sample in one of the 3,931 sets. The correlation cell takes 100 clock cycles to process all data for a single time sample, so the FPGA must clock at 393 MHz to process the data. The class of FPGAs discussed in Assumption 5 is capable of comfortably achieving this clock rate. This implies a single FPGA per "pizza" box is sufficient to implement the correlator. In a later section the added capability of two FPGAs allows coherent tied array beamforming as well.
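The clock-rate argument of this section can be restated numerically as follows (a sketch; the figures simply follow the text above):

```python
# Clock rate needed by one "pizza"-box FPGA for the sparse AA correlator (sketch).
groups            = 10                                  # 10 groups of 10 signals (50 stations, dual pol)
cells_per_channel = groups * (groups + 1) // 2          # 55 correlation cells per frequency channel
multipliers       = 8000                                # Assumption 5, 2017 midsized FPGA
channels_at_once  = multipliers // cells_per_channel    # 145 channels processed in parallel
n_channels        = int(1.5 * 380e6 / 1e3)              # 570,000 x 1 kHz channels in 1.5 beams
sets              = n_channels // channels_at_once      # ~3,931 sets of 145 channels
cycles_per_sample = groups * groups                     # 100 clock cycles per cell per time sample
clock_hz          = cycles_per_sample * sets / 1e-3     # one sample per channel every 1 ms
print(cells_per_channel, channels_at_once, sets, clock_hz / 1e6)   # ..., ~393 MHz
```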
6.3 Correlator Data Reordering
The correlation cell stores only one set of correlations at any one time. Ideally, the cell should process all time samples needed to calculate the correlation for a single frequency channel. These correlations are then dumped to the imaging system while the correlation cell processes data for the next frequency channel. The minimum dump time of the correlator is 1.2 s [1]. In a 1.2 second integration at 1 kHz resolution the correlation cell processes 1,200 time samples. To achieve this order of processing it is necessary to store 1,200 time samples for 380,000 frequency channels of the 480 dual polarisation beams per antenna station. Each time sample is a single byte, giving a total data set size of 440 GB, or 880 GB with double buffering. In 2017, 32 GB DRAMs should be commodity items and the 880 GB can be stored in 28 of these. The I/O bandwidth required is 0.38 GHz by 1 byte/Hz by 480 beams by two polarisations, which gives 364 GByte/s for read and the same for write; the total I/O bandwidth is ~730 GByte/s. In 2017 the clock rate for commodity DRAM modules is expected to be ~5 GHz with 8 bytes transferred per transaction, giving a possible I/O bandwidth of 40 GByte/s per DRAM. Thus, the 28 DRAMs have sufficient I/O bandwidth. The 28 DRAMs per antenna station can be located at the antenna station or at the correlator. The total for all 50 stations is 1,400 DRAMs. At the correlator, this can be implemented by adding three DRAMs to each of the two FPGAs in a correlator "pizza" box.

6.4 Data transport: SKA Phase 1 & 2 options
The design given above assumed 10 Gb/s data links, but 28 Gb/s SERDES transceivers are already promised on next generation FPGAs. If matching optical transceivers are made available in the same form factor as SFP+ modules then the number of optical fibres from a sparse AA station could be reduced to ~120. Each "pizza" box unit would now house five midsized FPGAs with interconnection between the FPGAs. This would be a simple extension of the design shown in Figure 1. It is more likely that 40 Gb/s and eventually 100 Gb/s links will become available. At these data rates a multi-board correlator design may be needed for the SKA Phase 1 aperture array correlator.

7 SPF DISH CORRELATOR

7.1 Architecture Overview
Assuming that the filterbank operations occur at the dish, as implemented within ASKAP, then the data from the dish at a 1 GHz bandwidth is 16 Gb/s for two polarisations at a 4+4 bit complex data resolution. To transport this data, two 10 Gb/s links are needed, each carrying data for 500 MHz for the two polarisations. There are 250 antennas in the array, so by Assumption 3 a single shelf of boards can take a single fibre from each antenna. The processing requirement for the 500 MHz of bandwidth on the fibres into a single shelf is 63 TCMAC/s. A single FPGA can implement ~8000 correlation cells at 0.45 GHz, or 3.6 TCMAC/s. The correlation requirement for the shelf can therefore be implemented in 18 FPGAs.
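A minimal restatement of the shelf sizing above (a sketch; the factor of 4 polarisation products per baseline and the inclusion of autocorrelations are the usual full-Stokes accounting assumed here):

```python
import math

# Per-shelf compute load and FPGA count for the SPF dish correlator (sketch).
n_ant     = 250
bw_hz     = 500e6                                   # bandwidth per fibre into the shelf
baselines = n_ant * (n_ant + 1) // 2                # baselines, including autocorrelations
cmac_s    = bw_hz * 4 * baselines                   # 4 polarisation products per baseline
fpga_cmac = 8000 * 0.45e9                           # 8000 cells clocked at 0.45 GHz
print(cmac_s / 1e12, math.ceil(cmac_s / fpga_cmac)) # ~63 TCMAC/s, 18 FPGAs
```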
To accept the 256 optical inputs, the shelf would consist of 16 boards each housing two FPGAs, which may have fewer than 8000 multipliers. ASKAP is currently using the "Redback2" processing board shown in Figure 3, which comprises four processing FPGAs. Four of the next generation FPGAs could provide a board with the required compute capacity for the SPF SKA Phase 1 correlator. Current generation FPGAs have limited high speed I/O; however, with next generation FPGAs a direct implementation of the current design would look like Figure 4. The current board has 8 SFP+ modules on the front panel. The extra two large chips on the board are a control FPGA and a cross point switch. The cross point switch would not be required in future generations of this board as the input data is switched by the FPGAs.

Figure 3 ASKAP's Redback2 processing board, with 4 processing FPGAs each with attached DRAM and 8 SFP+ modules

Using the ATCA standard, each board would have 16 single-height SFP+ modules on the front. These would be used for data input and also correlator output. The 16 inputs per board give the system the ability to handle 256 inputs. This could be 250 antennas and 6 dual polarisation RFI signals. The use to which the RFI signals can be put is not considered here.

Figure 4 Single Pixel Feed correlator board. SFP+ inputs are 10 Gb/s, inter-FPGA links dual 5 Gb/s and backplane links 2.5 Gb/s

The 16 inputs route to four FPGAs, which then route the data to the backplane. After the backplane routing, each FPGA has 31.25 MHz of data from 64 antennas. For simplicity, the input bandwidth is made 512 MHz and each FPGA has data for 32 MHz. A second stage of data redistribution is now needed between the four FPGAs. This is done on the dual 5 Gb/s links between FPGAs. After this data redistribution, each FPGA has 8 MHz of data for all 256 inputs. In an implementation with one processing FPGA there would still be 16 10 Gb/s inputs and 60 connections to the backplane, but no inter-FPGA links, giving 74 high speed connections to the FPGA. It is interesting to note that the next generation FPGAs [9][10] approach this I/O capacity, but unfortunately they lack the required computing capacity.

As with the AA correlator, the correlator board will process a limited number of fine frequency channels at a time. In this case, the inputs could be broken up into 32 groups of 16. This requires 528 correlation cells to process a single frequency channel at one time. If the cell clocks at 400 MHz and it takes 256 (16 x 16) clock cycles to process a single data set, then each group of 528 multipliers can process fine frequency channels for 1.5625 MHz of bandwidth. Processing the 500 MHz bandwidth that comes into the 16 processing boards in each shelf requires 320 groups of cells. Across the 64 FPGAs in the shelf this translates into 2,640 multipliers per FPGA for a 4-FPGA board and 5,280 multipliers per FPGA if there are just two processing FPGAs per board.
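A small check of the group-of-16 accounting above (a sketch only; the values follow the text):

```python
# Correlation-cell groups for the SPF dish correlator shelf (sketch).
groups          = 32                                   # 256 inputs as 32 groups of 16
cells_per_chan  = groups * (groups + 1) // 2           # 528 cells per fine frequency channel
bw_per_group_hz = 400e6 / 256                          # 1.5625 MHz: 400 MHz clock, 256 cycles/sample
groups_needed   = 500e6 / bw_per_group_hz              # 320 groups for 500 MHz per shelf
mult_per_fpga   = groups_needed * cells_per_chan / 64  # 2,640 when spread over 64 FPGAs
print(cells_per_chan, bw_per_group_hz / 1e6, groups_needed, mult_per_fpga)
```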
As the frequency resolution could be as fine as ~1 kHz, each group of 528 correlation cells could process 1,563 fine frequency channels; however, there is insufficient memory in the FPGAs to store these correlations. One solution is to add external memory and accumulate the data to that memory. At a 400 MHz clock rate the 2,640 correlation cells generate 2640 x 256 correlations, or 67,580 correlations, every complete integration cycle. If the integration is over 512 time samples this occurs every 328 μs. If the complex correlation data is accumulated to dual 32-bit words then a 64-bit word must be read and written every 4.8 ns. Four times this data rate is achievable with current commodity DDR3 DRAMs, so a single DRAM DIMM is needed with every FPGA in either the four or two FPGA design.

As with the sparse AA correlator, it is assumed that the filterbanks are physically located at the antenna. A board with this level of capability is the ASKAP "DragonFly2" board illustrated in Figure 5. An essential part of this board is the FPGA, which provides the interface between the ADCs on the right and the SFP+ 10 Gb/s optical modules on the left. In ASKAP it was reasoned that, as the FPGA was needed for the interface, it could be utilised to do some intermediate processing: within ASKAP this FPGA implements the coarse filterbank. Surprisingly, the RFI generated by this board was so low that it was difficult to measure in the laboratory. For SKA Phase 1 it would be desirable for the FPGA to also implement the fine filterbank. The advantage of this approach is that a standard 8-bit ADC can be used at the antenna and truncation to 4+4 bits for the correlator occurs at only one place: after the fine filterbank. The implementation of the fine filterbank would require the addition of RAM to the boards to store the coarse filterbank data. This may raise the RFI levels, which might require a dual-board design with the RAM isolated from the ADC. In either case, the system is quite compact. Note that implementing the filterbank at the antenna does not significantly impact the reliability of the dish hardware as a system, as the digitising is done at the antenna in any case.

Figure 5 DragonFly2 digitiser and filterbank board

7.2 Extension to PAF correlator
Assuming that beamforming for a PAF occurs at the antenna, then to the correlator it appears that the antenna is simply generating more 500 MHz bandwidth signals. Each one of these will need a correlator shelf. Consider, for example, a PAF beamformer that generates 36 dual polarisation beams each with a bandwidth of 500 MHz. This has an order of magnitude higher survey speed than the 1-2 GHz SPF. The correlator hardware required for this comprises 36 correlator shelves. It is possible to fit three shelves in an equipment cabinet, so the full system has 12 cabinets.

8 TIED ARRAY BEAMFORMING IN THE CORRELATOR

8.1 Coherent Tied Array Beamforming
The inputs to the correlator are likely to be commodity optical modules such as SFP+. These are designed as bidirectional units. Assuming 10^5 frequency channels in the SPF correlator, 24+24 bits per correlation and 125,000 correlations per channel, the correlator output data rate with 1 second correlations is 600 Gb/s. This uses 60 of the ~500 optical outputs, leaving 440 free for other purposes such as outputting the data from tied array beamforming that can be implemented in the correlator. If coherent tied array beam data is also 4+4 bit complex data, then the 440 outputs are sufficient for 220 dual polarisation beams.
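A rough check of this output budget (a sketch; the assumption that one dual polarisation 1 GHz tied array beam occupies two 10 Gb/s outputs, mirroring the input links, is ours rather than stated in the text):

```python
# SPF correlator output budget (sketch).
channels      = 1e5                       # output frequency channels
corr_per_chan = 125_000                   # ~250*251/2 baselines x 4 polarisation products
bits_per_corr = 48                        # 24+24 bit complex correlation
out_rate      = channels * corr_per_chan * bits_per_corr   # 6e11 b/s with 1 s dumps
links_for_corr   = out_rate / 10e9                          # 60 of the ~500 SFP+ outputs
spare_links      = 500 - links_for_corr                     # 440 left over
tied_array_beams = spare_links / 2        # dual-pol 1 GHz beam at 4+4 bit ~ 16 Gb/s, 2 links
print(out_rate / 1e9, links_for_corr, spare_links, tied_array_beams)
```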
To a first approximation there are as many tied array beams as antennas. For the AA correlator, one quarter of the optical outputs are used for transporting correlation data, leaving 38 per "pizza" box for tied array data. Each output can transport data for 1.5 beams, so each "pizza" box has output capability for 57 beams. Over the 320 "pizza" boxes this gives ~18,000 tied array beams. In both cases the number of tied array beams (per antenna beam) is approximately equal to the number of antennas.

To form a coherent tied array beam output, a single complex time sample from each antenna is multiplied by a complex weight (usually corresponding to a given phase) and all the data summed. The number of complex multiply accumulate operations (CMACs) is 2 x the number of tied array dual polarisation beams x the number of inputs summed per beam. Both these factors are approximately equal to the number of antennas Nant, assuming the two polarisations are summed separately. Hence, the number of CMAC operations is equal to 2 x Nant^2. For the correlator, the same number of inputs requires 2 x Nant x (Nant + 1) CMACs if autocorrelations are also calculated. The two compute requirements are approximately equal and require the same number of multipliers, assuming 4-bit complex beam weights. (4-bit weights introduce up to 3.5° phase error and 6% amplitude error; analysis is needed to determine whether this is sufficiently accurate.) If the beamforming is done in the correlator, then the added cost of generating Nant tied array beams doubles the compute load; it requires a doubling of FPGA resources. This doubling of FPGA resources has already been factored into the designs for both the AA and SPF correlators. If more than 220 SPF tied array beams or 18,000 AA tied array beams are needed, the signal from the antennas must be duplicated, possibly in an optical splitter, and the correlator/tied array beamformer systems duplicated.

After beamforming, the data is at the frequency resolution of the correlator. As the channelised data is produced by analysis polyphase filterbanks, the beam data can be brought back to the original sample rate (frequency resolution) by a matching synthesis polyphase filterbank. An oversampling analysis polyphase filterbank is needed if the errors in the resampled data are to be kept low.

8.2 Incoherent Beamforming
Full incoherent beamforming generates one beam per antenna beam, and the data is generated as power averaged over small time intervals, usually 1 ms or less. The total data rate is much lower than that for coherent beams. The internal processing within the correlator is similar to that for a single coherent beam: instead of multiplying the input data by a 4+4 bit weight, it is multiplied by the conjugate of itself. This adds very little to the compute load compared to that of the correlator.
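For illustration only, the sketch below contrasts the per-channel coherent sum (weighted voltage sum, then power) with the incoherent sum (per-antenna power, then sum); the array sizes and weights are arbitrary examples, not design parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ant, n_time = 50, 1200                     # e.g. 50 stations, 1200 samples in one channel
x = rng.standard_normal((n_ant, n_time)) + 1j * rng.standard_normal((n_ant, n_time))
w = np.exp(2j * np.pi * rng.random(n_ant))   # complex beam weights (phase-only here)

# Coherent tied array beam: weighted voltage sum, one CMAC per antenna per sample.
coherent_voltage = (w[:, None] * x).sum(axis=0)
coherent_power   = np.abs(coherent_voltage) ** 2

# Incoherent beam: each sample multiplied by its own conjugate, then summed over antennas.
incoherent_power = (np.abs(x) ** 2).sum(axis=0)
print(coherent_power.mean(), incoherent_power.mean())
```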
8.3 Incoherent Beams from Coherent Station beams
Coherent beams have the highest sensitivity and incoherent beams the largest field of view. Intermediate between these two are beams that are formed by incoherent summation of station beams. This is a strong possibility for the dish array and is what happens in practice for incoherent beams generated from AA station beams. The dish array is broken up into a number of stations or sub-arrays, all of a similar physical size. Within each station coherent beamforming is performed. This is followed by incoherent summation of the corresponding station beams. The compute cost for forming a station beam, for all stations, is equal to that for forming a coherent tied array beam, as it is still necessary to multiply each input sample by a complex weight. However, each summation is now over the individual station. The station beams are then incoherently summed.

In the section on coherent beamforming the number of beams generated was limited to a number equal to the number of antennas. This approximately doubled the FPGA resources in the correlator. It is assumed the same limitation applies here. The field of view of a station beam is approximately the field of view of the dish, FoVdish, multiplied by the filling factor Ffill and divided by the number of antennas in the station, Nstation. Thus the fraction of FoVdish that can be covered by the Nant beams is Ffill x Nant / Nstation. A high station filling factor would be 0.1. Hence the full FoVdish could be covered if Nstation = 25 for SKA Phase 1. This all assumes that all stations have a similar antenna layout. However, some antenna configurations have stations with antennas arranged in lines at various angles. This will greatly reduce the common field of view and increase the number of beams needed to cover the field of view of the dish.

Data output requirements for the incoherent data are much less than those for coherent beams. Typically the data might be 1 ms power integrations over 1 MHz bands. This reduces the data rate by a factor of 1000. Even allowing an increase in precision from 4 to 32 bits, the bit rate is still 1/125 of that for coherent beams. This can be transmitted on two 10G links from each of the correlator shelves in the SPF dish correlator.

9 POWER DISSIPATION
The inclusion of beamforming in the same subsystem as the correlator doubles the number of FPGAs and hence the power requirements. The following assumes the correlator also generates tied array beams.

For the AA correlator, the inclusion of beamforming for 50 beams results in a system that has 640 FPGAs, assuming 2017-class FPGAs are used. From Assumption 6, each FPGA would dissipate ~20 W. If the power supplies for the FPGAs are 80% efficient, then the dissipation attributable to the FPGAs is 16 kW. The power dissipation of the AA beamformer and fine filterbank is out of scope for this document. Optical transceivers are currently ~1 W per link. By 2017 the expectation is that the power per link will more than halve, resulting in an optical SFP+ module dissipation of ~8 kW. This brings the total power to 24 kW.

For the SPF dish correlator the FPGA power dissipation can be calculated by noting that the compute requirements are about one seventh of those for the AA correlator. This gives a power dissipation of just over 2.28 kW for the shelves and 250 W for the SFP+ modules, or a correlator dissipation of 2.5 kW. For the SPF dish correlator the ADC and filterbank systems are also needed. The main components are a high-speed dual channel ADC, FPGA and DRAM. With today's technology this is ~10 W per antenna and by 2017 it is expected to be less than ~3 W. This adds ~750 W to the power dissipation. Optical SFP+ modules add another ~250 W of dissipation, taking the total power dissipation to ~3.5 kW for the entire system. If 2014-class FPGAs are adopted, then we estimate that the power dissipation would increase by 40%, i.e. the AA correlator would consume ~34 kW and the dish correlator ~4.5 kW.
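A compact restatement of the 2017 power budgets above (a sketch; the 0.5 W per optical link follows the "more than halve" expectation, and the one-seventh scaling of the dish shelves is taken from the text):

```python
# Power budget sketch for the 2017 designs (values follow the text above).
aa_fpgas, dish_share = 640, 1 / 7
fpga_w, psu_eff      = 20.0, 0.8
aa_links, dish_links = 16_000, 500
link_w               = 0.5                              # ~half of today's ~1 W per link

aa_fpga_kw    = aa_fpgas * fpga_w / psu_eff / 1e3       # 16 kW of FPGA dissipation
aa_total_kw   = aa_fpga_kw + aa_links * link_w / 1e3    # + 8 kW of optics = 24 kW
dish_kw       = aa_fpga_kw * dish_share + dish_links * link_w / 1e3   # ~2.5 kW of shelves
dish_total_kw = dish_kw + 250 * 3 / 1e3 + 0.25          # + ADC/filterbank FPGAs + antenna optics
print(aa_total_kw, round(dish_kw, 2), round(dish_total_kw, 2))        # 24, ~2.5, ~3.5 kW
```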
10 PATH TO THE PHASE 1 CORRELATOR
Experience at CSIRO has shown there is considerable benefit in a program of continuous hardware development. This not only keeps together a cohesive team who maintain the knowledge from earlier system implementations, but also has the advantage that lessons learned with each hardware implementation are used to improve the next generation of hardware. Individuals within the team will likely change over time, but as long as there is continuity in the core skill areas the team's expertise remains high. For ATCA-class hardware CSIRO has already implemented many designs, most of which are listed in Table 2. This productivity was achieved by having two teams operating concurrently for all of the earlier boards.

Board      Processing   64-bit DRAM   18-bit multipliers    Input data   Technology
           FPGAs        interfaces    used for processing   rate
MOPS       17           2             1496                  20 Gb/s      Virtex-II Pro
SKAMP      14           4             1680                  40 Gb/s      Virtex-4
CABB       10           6             3056                  40 Gb/s      Virtex-4 and -II Pro
Redback1   4            4             2560                  120 Gb/s     Virtex-5
Redback2   4            4             3072                  160 Gb/s     Virtex-6

Table 2 ATCA class correlator board designs at CSIRO

One obvious trend in Table 2 is the decreasing number of FPGAs on the boards at each technology step. The lesson here was that more complex boards take longer to design and it was better to implement simpler boards. Another trend is the improving ratio of DRAMs to FPGAs, even though the ratio of DRAMs to multipliers is roughly constant. A major limitation of FPGAs is their limited internal memory, and this is compensated for by the addition of external memory. The actual processing capabilities of the boards have not increased greatly, but the board development time (schematic and layout) is now down to 3 months (for Redback1; Redback2 was the same basic design with an FPGA upgrade and took a couple of weeks to lay out). What has reduced is the cost of producing each board. This cost has steadily declined, so there has been a significant reduction in the cost per unit of processing capability with time.

CSIRO plans to continue developing boards in future generations of FPGAs, and the lessons learnt regarding the limitations of previous boards will shape their design. The current Redback2 board uses Virtex-6, and the limitation in the design is the I/O capability of the FPGAs. Next generation FPGAs, both Xilinx and Altera, have increased both the speed and the number of SERDES inputs, so this limitation will not occur in next generation boards. A simple upgrade of the Redback2 FPGAs to next generation FPGAs produces a system that is close to that needed for the dish SPF correlator, Figure 4. This design is discussed in section 7.

To implement the SKA correlator and other digital systems such as beamformers, it is suggested that at least one team be assembled to develop designs. The team is to develop a design for all digital systems (beamformers, filterbanks and correlators) with each generation of FPGA, so that the final system has the refinements derived from correcting the limitations of earlier generations of hardware.
This approach also means that a new design will eventuate very rapidly when a new generation of FPGAs is announced. The ASKAP experience was that it took considerable time to develop a concept starting from scratch. However, the second generation Redback system was rapidly developed, and it is expected that the same should be true of third and later generations of hardware. The team that delivered two generations of hardware and half the firmware for ASKAP comprised an average of 6 people over a year and a half. The pace of development for SKA Phase 1 will likely be less concentrated, and it should also be able to build on the foundation of firmware developed for projects like APERTIF, MWA, EVLA and SKAMP as well as ASKAP.

For the hardware, assuming 2-3 years between generations, about 1 FTE is needed, spread over a number of different skill sets. The engineers need to be skilled in schematic and board layout, working with at least 2 concept developers. In addition, the functions of management and prototype production must be covered. A number of different people are needed to cover these functions, and it is suggested that these people are embedded in a radioastronomy institute that would use their skills for other functions when not involved in SKA hardware development. It is suggested that at least two concept developers be used, as a single developer can get trapped by adherence to a particular philosophy or approach, and at least one other person is needed to see faults or alternative approaches. Here a healthy rivalry can be very productive. The input from these people is, however, quite sporadic. In addition to hardware development, there need to be 2-3 FPGA programmers to develop new firmware and adapt existing firmware to new hardware. This is an area where more personnel can greatly speed development. In addition, for SKA Phase 1 a number of production personnel are needed in the years that the full system is deployed. This gives a team size of at least six during production and installation. In the years leading up to this, a team of a minimum of 3 FTEs is needed, spread over 5-6 people.

The above presupposes a design that is deliberately chosen so that it is easily implementable. Designs of higher complexity could be chosen, but these would require more input into the hardware design. This would require more effort but could lead to a design that has fewer boards, with a higher risk of late delivery. With the "simple" design approach the requirement could be as low as 3 FTEs for 3 years plus 6 FTEs for 2 years, or a total input to design and production of ~20 FTE-years. In addition, ~$1M is needed to allow prototyping of full capability subsystems.

11 COST
The cost of midsized FPGAs in the quantities needed for the SKA, assuming the same FPGA is used throughout all beamformers and correlators, is expected to be ~$1000 each. This is less than the current list price of the equivalent FPGA but more than competitive high volume pricing. The highest performance FPGAs are necessarily of a similar size to high-end CPUs, hence with similar margins they should have a similar cost base. These CPUs have a cost of ~$1000. In this memo, devices half the size of the biggest FPGAs are used ("midsized"). Considering the size reduction and improved yield, they are potentially about a third the cost of the highest-performance devices, with a cost much less than $1000. Thus an estimated volume price of $1000 could be considered conservative.

In 2017 the AA correlator comprises 320 midsized FPGAs, excluding tied array beamforming. If the FPGAs cost ~$1000 each including DRAMs, and the boards, assembly, "pizza" boxes and power supplies another ~$2000, then the cost for processing in the correlator is $960,000. To these costs we need to add the optical links; there are 16,000 10 Gb/s links. If these are $100 each then this becomes the dominant cost at $1,600,000. Hence, the estimated cost is $2.6M using 10 Gb/s links.
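A one-screen summary of the AA cost estimate above (a sketch; the unit prices are the assumptions stated in the text):

```python
# AA correlator cost estimate (sketch, 2017 pricing assumptions from the text).
pizza_boxes = 320
fpga_cost   = 1000       # per midsized FPGA, including DRAM
infra_cost  = 2000       # board, assembly, "pizza" box and power supply
link_cost   = 100        # per 10 Gb/s optical link
links       = 50 * 320   # 50 stations x 320 fibres each = 16,000 links

processing = pizza_boxes * (fpga_cost + infra_cost)          # $0.96M
transport  = links * link_cost                               # $1.6M
print(processing, transport, (processing + transport) / 1e6) # ~$2.6M total
```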
The SPF dish correlator has two ATCA shelves at ~$10k each and a 48 V power supply costing ~$5k. Each shelf has 16 processing boards, for a total of 32 processing boards. Each processing board has two FPGAs. Assuming $2000 for the two FPGAs on each board and the same cost for the board itself gives a total cost of $4,000 per board. The cost for all 32 boards is $128,000. The input uses 500 SFP+ modules and these are estimated to cost $50,000, giving a total cost of about $180,000. The 500 MHz PAF correlator is 18 times this value, or $3.2M. Adding coherent beamforming capability adds a single FPGA to each board for both designs, remembering that only part of the FPGA capability is used in the dish correlator design. This adds $320,000 to the AA correlator and $64,000 to the dish correlator.

11.1 Cost reduction
In the section on tied array beamforming it was shown that all the optical outputs in the correlator can be used if coherent beamforming is fully implemented. This represents a massive amount of data to process and it is probably impractical to implement coherent transient searches on this data. It is more likely that incoherent data will be used for searching, with a limited number of coherent tied array beams being used for tasks such as pulsar timing, SETI searches and targeted observations of transient sources. Assuming the total number of coherent tied array beams is of order tens for dishes and a hundred for aperture arrays, the majority of optical transmitters are unused. After considering correlation data, coherent beams and incoherent beams, ~200 outputs are unused in the dish correlator shelf and 37 per AA correlator "pizza" box. The major cost of any optical transceiver is the optical transmitter. In ASKAP, input to the correlator has been implemented with receive-only ROSA modules. The cost saving in using these for the majority of the correlator inputs is ~$35,000 for the dish correlator and ~$1M for the aperture array correlator. With limited coherent beamforming the dish correlator needs only two FPGAs per board, and the dish correlator cost is reduced to $165,000 and the AA correlator to $1.6M.

11.2 Early Construction
The designs given above are for systems assumed to be constructed in 2018. If an earlier generation of FPGAs is used then roughly twice as many are needed, but initial construction could be advanced to as early as 2015. For both systems, the actual boards have sufficient area to accept the extra FPGAs. For the sparse aperture array correlator with 10 Gb/s links, each "pizza" box holds two FPGAs and the added cost is ~$320,000; this increases to ~$640,000 for fully implemented coherent beamforming. For the dish correlator the number of FPGAs increases from 2 to 3, and to 4 with beamforming, and the added cost is $32-64k. In both cases the added cost is comparatively small as the major cost is in the signal transport and the infrastructure to support the FPGAs. For either generation of FPGA there is little difference.
The correlator would be prototyped in the earlier generation of FPGAs and the decision to move to the next generation can be left until late in the construction phase.

Correlator Type            2017 Cost ($k)   2017 with ROSAs ($k)   2014 Cost ($k)
AA                         2600             1600                   2900
AA + coherent BF           2900             NA                     3500
SPF Dish                   180              145                    210
SPF Dish + coherent BF     210              NA                     274
PAF Dish                   3240             2610                   3780
PAF Dish + coherent BF     3780             NA                     4930

Table 3 Estimated correlator cost

12 CONCLUSION
We have outlined potential designs for the SKA Phase 1 AA and dish correlators. They are comparatively simple and consist of 1-4 FPGAs per board and a number of 10G optical transceivers. The designs assume technological advances in FPGA compute power over the coming years and are incremental changes to existing boards used in radioastronomy. The firmware will also be incremental on that used currently.

As well as implementing the correlator, the same designs can implement tied array beamforming. Using bi-directional optical modules, a board can output almost as many beams as there are antennas and station beams. This is ~220 coherent tied array beams for the dish correlator/beamformer and ~19,000 for the full aperture array correlator/beamformer. The computing requires a doubling of the FPGA resources on each processing board. Implementing full incoherent beamforming has a much lower cost as only one beam is generated. Coherent station beamforming followed by incoherent beamforming is also possible.

The single pixel feed dish correlator fits into a single cabinet and has two shelves of processing boards. Each processing board has 2 FPGAs as well as an industry standard backplane interconnect between the 16 boards in a shelf. For a 500 MHz PAF correlator the system is 18 times larger. The sparse aperture array correlator could use a single-board "pizza" box design. The limiting factor in the size of this correlator is the number of optical receivers. With 10 Gb/s technology there are 16,000 fibre links. Each "pizza" box takes a single fibre from each sparse aperture array station, and ~320 "pizza" box modules are required to take in the data. With higher speed optical connections the number of "pizza" box modules can be reduced. The estimated cost of these correlators is given in Table 3.

13 ACKNOWLEDGMENTS
The expertise in the construction of high performance processing boards at CSIRO would not be possible without a great team of people. The author would like to thank Andrew Brown, Evan Davis, Dick Ferris and Joseph Pathikulangara for their contribution to the design and layout of the boards mentioned in Table 2. Behind this group is a large dedicated team who have procured the parts and ensured the boards were assembled and debugged.