The SKA LOW correlator design challenges
John Bunton | CSP System Engineer
C4SKA, Auckland, 9-10 February 2017
CSIRO ASTRONOMY AND SPACE SCIENCE

SKA1 Low antenna station (Australia)
Station beamforming is part of the Receptor Sub-Element (LFAA)

Low Stations
Low Frequency Aperture Array - LFAA
LFAA has 512 stations, maximum baseline less than 65 km
• Distributed between 1-16 subarrays
Each station has 256 dual-polarisation log-periodic antennas
• Frequency band 50-350 MHz
Signal is sent using RF-over-Fibre to the digitiser/beamformer
Output to CSP_Low
• 384 “coarse” channels per station (781 kHz each, 300 MHz total)
• Channels can have any of 8 different look directions
• Stations can be in any of 16 subarrays
• Up to 128 look directions!
• 5.8 Tbps of data

Central Signal Processing
Central Signal Processing (CSP) is tasked to take the data from LFAA and produce
• visibility data and
• pulsar products.
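The LFAA output figures quoted on the Low Stations slide can be cross-checked with a little arithmetic. The sketch below is illustrative only. The station count, channel count, total bandwidth and look-direction limits come from the slide; the 16-bit complex sample format and the 32/27 oversampling factor are assumptions introduced here to show one way a rate of about 5.8 Tbps could arise, and all function names are hypothetical.

```python
# Figures quoted on the "Low Stations" slide.
N_STATIONS = 512
N_COARSE_CHANNELS = 384
TOTAL_BANDWIDTH_HZ = 300e6
LOOK_DIRECTIONS_PER_CHANNEL = 8
N_SUBARRAYS = 16
N_POLARISATIONS = 2  # antennas are dual polarisation

# Assumptions, NOT stated in the slides.
BITS_PER_COMPLEX_SAMPLE = 16   # assumed 8-bit real + 8-bit imaginary
OVERSAMPLING_FACTOR = 32 / 27  # assumed oversampling of the coarse channels


def coarse_channel_width_hz() -> float:
    """Width of one coarse channel if the band is split evenly."""
    return TOTAL_BANDWIDTH_HZ / N_COARSE_CHANNELS


def max_look_directions() -> int:
    """Upper limit on simultaneous look directions across all subarrays."""
    return LOOK_DIRECTIONS_PER_CHANNEL * N_SUBARRAYS


def lfaa_output_rate_bps() -> float:
    """Aggregate LFAA output rate under the assumptions above."""
    sample_rate_hz = TOTAL_BANDWIDTH_HZ * OVERSAMPLING_FACTOR
    return (N_STATIONS * N_POLARISATIONS * sample_rate_hz
            * BITS_PER_COMPLEX_SAMPLE)


if __name__ == "__main__":
    print(f"coarse channel width: {coarse_channel_width_hz() / 1e3:.2f} kHz")
    print(f"max look directions:  {max_look_directions()}")
    print(f"LFAA output rate:     {lfaa_output_rate_bps() / 1e12:.2f} Tbps")
```

Without the assumed oversampling factor the same product gives about 4.9 Tbps, so the quoted 5.8 Tbps evidently includes some overhead of this kind.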
CSP is divided into 4 sub-elements
• Correlator and beamformer
• Pulsar timing
• Pulsar search and
• Local monitor and control
Correlator and beamformer (CBF) work package is done by CSIRO (Australia), ASTRON (Netherlands) and AUT (New Zealand)

Correlator - “standard” mode
Low correlator is full Stokes (all polarisation parameters)
Low: 524,800 correlations per frequency channel
Up to 65k channels across the band
Low: 34G correlations per dump, 0.25 s dumps, 11.0 Tbps output
Note: originally 0.9 s
Compute load in the correlator (one correlation accumulate is equivalent to 8 Flop)
Low: 524,800 correlations at 0.3 GHz = 1.26 Petaflops
Other processing within CSP is similar in compute load

Telescope   Correlator (TFLOPS)   Input data rate (Tbit/s)   Output data rate (Gbit/s)
MWA                           8                       0.08                           3
LOFAR                        19                       0.34                          97
JVLA                        131                       3.1                            1
ASKAP                       224                      12.4                           20
ALMA                        746                      10.4                           48
SKA1_LOW                   1258                       5.8                        10995
SKA1_MID                   2484                      29.9                         2543

Frequency Resolution
“standard” observing
• 4.6 kHz across 300 MHz
• 52k frequency channels
Or Zoom Mode - 4 bands
• Zoom band bandwidth each 4, 8, 16, 32, 64, 128 or 256 MHz
• Note: originally just 4, 8, 16, 32 MHz
• Zoom band centre frequency: anywhere in the observing band
• Band overlap allowed
• An LFAA frequency channel can be in any zoom band
• 16k output frequency channels per zoom band
• Resolutions 0.23 to 14.5 kHz

Requirement churn
A recent Engineering Change Proposal allows an exchange of bandwidth for number of inputs.
Each of the 512 LFAA stations can be configured to sum subsets of the 256 antennas
• Example: over 150 MHz, sum two separate sets of 128 antennas
  - Looks like two smaller stations (substations), each with 150 MHz bandwidth.
  - Input to correlator is now 1024 stations; to keep the total number of correlations constant, bandwidth is reduced to 75 MHz.
Must design so it is possible to accommodate 1024 or 2048 substations
Other major recent changes
• Added zoom modes 64-256 MHz.
Required a major design change.
• Decrease in integration time: major increase in output data rate
Design must be capable of adapting to requirement change.

Station based processing
FX correlator implemented: channelise data to final resolution before correlation
• 8 different frequency resolutions (226 Hz to 14.5 kHz) - 8 filterbanks ???
• Finest zoom mode: 4096 channel filterbank
  - AH HA! Implement finest zoom and integrate in frequency for the rest
  - Integration of 1, 2, 4, 8, 16, 24, 32 or 64 channels for all resolutions
Relative delay of astronomical signal to stations varies with time
• Must be removed
• Implemented as sample delay correction (coarse) and phase slope across filterbank channels.
RFI flagging: input data has flags and internal flagging is needed

Corner turning
Filterbanks and correlation engine cannot process all frequency channels simultaneously
• Must process part of the bandwidth at a time
• Filterbanks and correlator process a few of the 384 LFAA channels at a time
Store all frequency channels for a short-term integration time and read out to the filterbanks all time data for a limited number of channels at a time
Input data: all frequencies for limited time (0.2 ms)
Output data: all time data for limited bandwidth (0.9 s, 0.25 s)

Gemini board Proof of Concept (POC)
Single FPGA Xilinx Virtex Ultrascale+, water cooled, 4x Hybrid Memory Cubes (2 link, 4G), 4x 12-fibre optical at 25G, 4x
Four to be mounted in a 1U chassis
BUT 2 link, 4G HMC is now end-of-life
Redesign underway for Prototype
HMC high bandwidth memory replaced by integrated HBM (smaller but faster)
Add DDR4 for bulk memory

Design Evolution - POC design - Separate Subsystems
With Gemini POC originally had a separate Correlator, Beamformer and Station Based processing and 0.9 s integration time
Major corner turn for correlator in the Station based processing
(144 Gbps per FPGA).
But insufficient HMC for 0.9 s (1.3 TB double buffer, but had 43 FPGAs with 16 G each)
Must store and accumulate the full time integration in the correlator: 0.34 TB
But this uses most of the available HMC bandwidth

Prototype (Unified) Design
Change to HBM reduced the available memory by half. Major problems fitting the design in.
Go to unified design: Station based processing and correlation in the same FPGA.
• Number of FPGAs that accept inputs from LFAA increased from 43 to 288
  - Six times reduction in input bandwidth per FPGA
  - Can now use DDR4 for corner turn
  - AND no buffer size limitation, 0.9 s possible
  - Correlator can output data to SDP for a frequency channel as soon as it is computed. Very little memory needed for correlator buffer
What looked like a disaster with the loss of a key component has led to a better and more robust design

Connecting the FPGA
The Unified design has 288 FPGAs
All FPGAs must be able to communicate with all others
Switch ??
But the heart of a switch is usually an FPGA
Number the FPGAs (X,Y,Z) (X,Y 1:6) (Z 1:8)
Arrange as a cube with these coordinates
Cross connect within rows and columns
Including self connection each FPGA has 6 in X, 6 in Y and 8 in Z connections
20 connections in total - 500 Gbps
[Figure: FPGA Cube. Axes X, 1 to 6; Y, 1 to 6; Z, 1 to 8. Example nodes (1,1,1), (3,1,1), (3,4,1), (3,4,3).]

Data Flow
Input FPGAs have at most one LFAA input (2 stations, full bandwidth)
ZXY connections to uniformly distribute data for processing
Allows uniform distribution of compute and output in Zoom mode
Beamformers must bring all frequency data together; use the same XYZ connections
[Figure: data-flow block diagram. Station Processing: LFAA inputs, Z Buffer In, Filterbank, Delay, RFI, XY Buffers. Array Processing: Ingest, Doppler Correction, Correlator, Output Buffer, PSS Beamformer, PST Beamformer, PST Buffer, VLBI Reformat, Corr Emit, PSS Emit, PST Emit. Interconnect stages Z, XY, XYZ. Outputs: SDP, PSS, PST, VLBI.]

0.9 to 0.25 sec integration
With 0.9 s integration output uses 72 outputs (50% full)
• One in 4 FPGAs have an output.
• Aggregate using Z connect. All 8 interconnected FPGAs send data to two output FPGAs, half to each
At 0.25 s (changed requirement) it is 144 at 100%. Use 180 at 80%
• Simply change to 5 out of 8 FPGAs having outputs. Each FPGA sends 1/5 of its data to an output FPGA.
• Design can accommodate 22 Tbps of output data to SDP without modifications.
• Small change to hardware (duplicate the 8 Z connections on each FPGA) and can do 44 Tbps

Substations
Unified design eases usage of fast memory
Without substations 4 LFAA channels are processed in parallel
• Need for uniform distribution of load from 4 zoom bands.
With 2048 substations process 1 LFAA channel at a time, with 4x the stations
• Same data rate to correlator, same size for input buffer to correlator for 2048 substations
Correlator processes 2048 stations in 16 passes
• Buffer for correlation products increases from 55 MB to 0.88 GB
  - Progressive readout during processing could reduce this but is more complex

Conclusion
CSIRO/ASTRON/AUT have designed a flexible FPGA based system for the LOW Correlator
It has sufficient resources, I/O and memory to accommodate recent requirement changes and still have spare capacity
• 20 of 48 internal optical connections per FPGA are currently used.
• Further expansion possible: not I/O limited
Zoom mode changes required a major redesign of data ordering but no change to the hardware
Changes to integration time and addition of substations were changes only to some subsystems

Revised Low Correlator and Beamformers
[Figure: block diagram of Gemini groups, from "1st group of 16" to "8th group of 16". Inputs from LFAA feed the Filterbanks; cross connects feed the Correlator and the PSS and PST Beamformers; outputs go to SDP, PSS, PST.]
Diagram annotations:
128 Gemini, all internal links bi-directional
6 Gemini per group, 1/8 BW per link
Per Filterbank/Correlator: now 4 LFAA stations per Gemini, previous was 12
Interconnects in reverse aggregate the data; separate filterbank
Cross connects deliver part band to each correlator and beamformer, e.g. 2 complete PSS beams
All processing modules identical.
16 groups of 8 also an option

Gemini version II
One board to rule them all (functions that is)
One HMC retained for high bandwidth external memory
Two DDR4 to be added for a high memory depth system
Up to 4 12-fibre 25G optics + QSFP, SFP
Change to card rack system; each card a single Gemini II with all I/O
Water cooling, up to 200 W per card
One FPGA per LRU - reduced (1/4) I/O per line replaceable unit (LRU)
Pluggable optics, power and water at rear: easy replacement
All data connections optical

SKA1 Overview
SKA1-low stations include Station Beamformer
Central Signal Processing includes Correlator and Pulsar systems

Thank you
CASS
John Bunton
SKA1 CSP System Engineer
t +61 2 9372 4420
e [email protected]
w www.atnf.csiro.au/projects/askap
PO BOX 76 EPPING, 1710, AUSTRALIA

Central Signal Processing
For Mid and Low, Central Signal Processing (CSP) consists of
Correlator between all pairs of elements (dish or station)
Tied Array Beamforming: coherent sum of signals from all elements
Tied Array beams are processed by the Pulsar Search engine (limited bandwidth) and the Pulsar Timing engine, and are used for VLBI
LMC - Monitor of performance and control of all functions (NRC Canada, )

SKA1 MID antennas (South Africa)

Mid Dishes
133 15 m offset Gregorian dishes + 64 MeerKAT dishes
Total of 197 dishes (distributed between 1-16 subarrays), maximum baseline less than 150 km
Receivers for 5 bands
0.35 to 1.050 GHz, full bandwidth 0.70 GHz at 8 bit resolution
0.95 to 1.76 GHz, full bandwidth 0.81 GHz at 8 bit resolution
1.65 to 3.05 GHz, not installed during construction
2.80 to 5.18 GHz, not installed during construction
4.6 to 13.8 GHz, 2 x 2.5 GHz!
at 4 bits resolution
16 subarrays

CSP Organisation at PDR 2014 (Correlator)
In December 2014 the Preliminary Design Review (PDR) was held
At that time there were three telescopes: Low, Mid and Survey.
A Physical Implementation Proposal (PIP) was submitted for each
Low, led by Oxford University, with three designs in a single PIP
  Uniboard (ASTRON), PowerMX (NRC Canada), Redback (CSIRO)
Survey, led by AUT (NZ), considered many options in a single PIP
  Redback (CSIRO), PowerMX (NRC Canada), multicore processor, GPUs and ASIC
Mid, led by NRC Canada, had three separate PIPs
  PowerMX (Canada), Redback (CSIRO Australia) & SKARAB (S.A.)
Project management MDA Canada, Local Monitor Control NRC

Pulsars
The Pulsar teams are
Pulsar Timing, led by Swinburne University
  CPU/GPU based
Pulsar Search, led by Manchester University
  CPU/GPU based with FPGA acceleration/power saving
  Pulsar search on limited bandwidth (120 MHz Low, 300 MHz Mid)
They process array beams (coherent, polarisation corrected sums of data from ~400 stations) generated by CBF

A shake up for CSP Correlator/Beamformer
One outcome of the review was that the SKA Office wanted just one design to proceed for each telescope
THEN, as the total cost was too high, rebaselining occurred.
Decisions
Stop the Survey Telescope
Delay work on PAFs (led by CSIRO) (critical to Survey)
The politicians stepped in and decided which designs would proceed
NRC Canada continue to lead Mid (PowerMX)
CSIRO to lead Low with ASTRON (Redback/Uniboard) + AUT
This resulted in the South African and UK teams leaving (taking most of the Systems Engineering with them)

Pulsar Timing Beamforming
Mid and Low form 16 tied array beams
Delay aligned, coherent summation of dual polarisation data
Must apply polarisation correction for each value summed
~3M Jones matrices for Low
Time resolution of data: 200 ns Mid, 2 µs Low
Basically no significant time ripples allowed, narrow impulse response
For Mid the approach taken was a fractional time delay filter on the wideband signal to do delay correction and achieve a narrow impulse response
For Low summation is done on narrow band channels, phase only, which is cheaper than fractional time delay, followed by synthesis.
New approach to avoid the synthesis filterbank is being investigated.

Pulsar Timing
Pulsar signal is smeared due to dispersion (delay ∝ wavelength²)
Must remove dispersion.
Pulsar timing implements overlap-save convolution on the beamformed time series and the correction filter.
Time series ~1 minute, bandwidth 10 MHz, multi-million point FFTs in GPUs.
Very stringent timing requirements on data supplied: less than 10 ns error over a 10 year period.

Pulsar Search (PSS)
Mid to form 1500 power beams at a bandwidth of 300 MHz
Low to form 500 power beams at a bandwidth of 120 MHz
Dishes/stations in a compact area used, PSS beams to fill dish/station beam
Coherent summation of dish/station data, phase on narrow channels ~20 kHz
Polarisation correction to dish/station data (beam centre) and for Low after beamforming.
(~800,000 Jones matrices for Mid)
Search each beam for pulsars in the PSS engine (GPU/CPU, 16 racks for Low)
500 dispersion measures
In each dispersion measure, acceleration search to 300 m s⁻¹

Other Functions
Both Mid and Low require a transient buffer.
In Low allocated to LFAA: 256 GB per station, 150 MHz of data, 2-bit precision
In Mid allocated to CSP: 32 GB per dish, 300 MHz of data, 2-bit precision
Mid produces four VLBI beams.
VLBI possible: Europe, America, Australia, Asia.
Not sufficient low frequency telescopes for VLBI with Low.

Hardware
Pulsar Search (PSS) and Pulsar Timing (PST) use common hardware for Mid and Low.
CPU/GPU based (FPGA acceleration for PSS).
PST: two racks at each site
PSS: 16 racks at Low, dissipating ~160 kW; Mid 59 racks @ ~470 kW
  wider bandwidth, more beams but lower total delay to search
Each compute node processes 2 beams (TBC)
Correlator and Beamformers are FPGA based
Mid based on PowerMX
Low based on Perentie (development from Redback + Uniboard)

Perentie (CSIRO/ASTRON) for Low
July 2015: final confirmation that CSIRO would lead CSP for Low
Condition of leadership was to collaborate with ASTRON on design.
At that time ASTRON were completing their Uniboard II
CSIRO platform was Redback-3. Both multi-FPGA boards.
For SKA CSIRO had proposed Redback-5, a board with a single FPGA
After a lengthy downselect process it was decided in November to proceed with the GEMINI board
Four of these to be mounted in a 1U chassis
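
Worked check of the headline correlator figures
The numbers on the "standard" mode and Requirement churn slides (524,800 correlations per channel, 1.26 Petaflops, 34G correlations per dump, constant load when trading bandwidth for substations) can be reproduced with the short sketch below. It is a cross-check under stated assumptions, not the project's own calculation: counting each polarisation as a separate correlator input and including autocorrelations is one convention that happens to reproduce the slide figure, 65,536 is an assumed exact value for "65k" channels, and the function names are hypothetical.

```python
FLOP_PER_CORRELATION = 8      # from the slide: one correlation accumulate ~ 8 Flop
N_POLARISATIONS = 2           # dual polarisation stations
N_CHANNELS = 65536            # ASSUMED exact value for "up to 65k channels"
DUMP_TIME_S = 0.25            # from the slide
OUTPUT_RATE_BPS = 11.0e12     # from the slide


def correlations_per_channel(n_stations: int) -> int:
    """Input pairs per frequency channel, autocorrelations included.

    ASSUMPTION: each polarisation counts as a separate correlator input.
    """
    n_inputs = n_stations * N_POLARISATIONS
    return n_inputs * (n_inputs + 1) // 2


def compute_load_flops(n_stations: int, bandwidth_hz: float) -> float:
    """Correlator compute load: one accumulate per pair per complex sample."""
    return correlations_per_channel(n_stations) * bandwidth_hz * FLOP_PER_CORRELATION


def correlations_per_dump(n_stations: int) -> int:
    """Correlation products written out in one integration (dump)."""
    return correlations_per_channel(n_stations) * N_CHANNELS


def implied_bits_per_correlation(n_stations: int) -> float:
    """Bits per product implied by the quoted output rate (derived, not quoted)."""
    return OUTPUT_RATE_BPS * DUMP_TIME_S / correlations_per_dump(n_stations)


if __name__ == "__main__":
    print(correlations_per_channel(512))                      # pairs per channel
    print(compute_load_flops(512, 0.3e9) / 1e15)              # Petaflops, 512 stations
    print(compute_load_flops(1024, 75e6) / 1e15)              # Petaflops, 1024 substations
    print(correlations_per_dump(512) / 1e9)                   # G correlations per dump
    print(implied_bits_per_correlation(512))                  # bits per product
```

Run as a script, the 512-station, 300 MHz case and the 1024-substation, 75 MHz case come out at essentially the same compute load, which is the trade described on the Requirement churn slide.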