The SKA LOW correlator design challenges

The SKA LOW correlator design
challenges
John Bunton | CSP System Engineer
C4SKA, Auckland, 9-10 February, 2017
CSIRO ASTRONOMY AND SPACE SCIENCE
SKA1 Low antenna station (Australia)
Station beamforming part of
Receptor Sub-Element (LFAA)
Low Stations
Low Frequency Aperture Array - LFAA
LFAA has 512 stations, maximum baseline less than 65 km
• Distributed between 1-16 subarrays
Each station has 256 dual polarisation log periodic antennas.
• Frequency band 50-350 MHz
Signal is sent using RF-over-Fibre to Digitiser/beamformer.
Output to CSP_Low
• 384 “coarse” channels per station (781 kHz each, 300 MHz total)
• Channels can have any of 8 different look directions.
• Stations can be in any of 16 subarrays
• Up to 128 look directions!
• 5.8 Tbps of data
Central Signal Processing
Central Signal Processing (CSP) tasked to take the data from LFAA
and produce
• visibility data and
• Pulsar products.
CSP divided into 4 sub-elements
• Correlator and beamformer
• Pulsar timing
• Pulsar search and
• Local Monitor and control
Correlator and beamformer (CBF) work package is done by
CSIRO (Australia), ASTRON (Netherlands) and AUT (New Zealand)
Correlator – “standard” mode
Low correlator full Stokes (all polarisation parameters)
Low 524,800 correlations per frequency channel
Up to 65k channels across the band
Low 34G correlations per dump, 0.25s dumps, 11.0Tbps output
Note originally 0.9s
Compute load in the correlator (one correlation accumulate is equivalent to 8 Flop)
Low 524,800 correlations at 0.3GHz = 1.26 Petaflops
Other processing within CSP similar in compute load
                            MWA    LOFAR  JVLA  ASKAP  ALMA  SKA1_LOW  SKA1_MID
Correlator TFLOPS           8      19     131   224    746   1258      2484
Input data rate Tbits/sec   0.08   0.34   3.1   12.4   10.4  5.8       29.9
Output data rate Gbit/sec   3      97     1     20     48    10995     2543
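The headline figures for Low can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check, not part of the design: it assumes the 524,800 figure counts all pairs of the 1024 single-polarisation inputs (512 stations x 2), autocorrelations included, and it reads "65k" as 65536. The bits-per-correlation value it prints is inferred from the quoted output rate, not a stated specification.

```python
# Back-of-envelope check of the "standard" mode figures for SKA1 Low.
N_STATIONS = 512            # LFAA stations (from the slides)
N_POL = 2                   # dual polarisation
BANDWIDTH_HZ = 0.3e9        # 300 MHz processed bandwidth
FLOP_PER_CORRELATION = 8    # one correlation accumulate = 8 Flop (from the slides)
N_CHANNELS = 65536          # assumption: "65k" read as 2**16
DUMP_S = 0.25               # integration (dump) time in seconds
OUTPUT_BPS = 11.0e12        # quoted output rate


def correlations_per_channel(n_stations, n_pol=N_POL):
    """All pairs of single-polarisation inputs, autocorrelations included."""
    n_inputs = n_stations * n_pol
    return n_inputs * (n_inputs + 1) // 2


n_corr = correlations_per_channel(N_STATIONS)
compute_flops = n_corr * BANDWIDTH_HZ * FLOP_PER_CORRELATION
corr_per_dump = n_corr * N_CHANNELS
# Inferred, not specified: bits per correlation needed to reach the output rate.
implied_bits = OUTPUT_BPS * DUMP_S / corr_per_dump

print("correlations per channel:", n_corr)
print("compute load (PFlops):", compute_flops / 1e15)
print("correlations per dump (G):", corr_per_dump / 1e9)
print("implied bits per correlation:", implied_bits)
```

Under these assumptions the pair count reproduces 524,800 and the compute load comes out close to the quoted 1.26 Petaflops.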
Frequency resolution
Frequency Resolution “standard” observing
• 4.6 kHz across 300 MHz
• 52k frequency channels
Or Zoom Mode - 4 bands
• Zoom band bandwidth
• each 4, 8, 16, 32, 64, 128 or 256 MHz
• Note originally just 4, 8, 16, 32 MHz
• Zoom band centre frequency – anywhere in observing band.
• Band overlap allowed
• An LFAA frequency channel can be in any zoom band
• 16k output frequency channels per zoom band
• resolutions 0.23 to 14.5 kHz
Requirement churn
A recent Engineering Change Proposal allows an exchange of
bandwidth for number of inputs.
Each of 512 LFAA stations can be configured to sum subsets of the 256 antennas
• Example over 150 MHz sum two separate sets of 128 antennas
– Looks like two smaller stations (substations) each with 150MHz bandwidth.
– Input to correlator now 1024 stations; to keep the total number of correlations
constant, bandwidth is reduced to 75MHz.
Must design so it is possible to accommodate 1024 or 2048 substations
Other major recent changes
• Added zoom modes 64-256 MHz. Required a major design change.
• Decrease in integration time – major increase in output data rate
Design must be capable of adapting to requirement change.
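The bandwidth-for-inputs trade can be sketched numerically. This is a minimal illustration, assuming the correlation count scales as the square of the number of correlator inputs, so that inputs squared times bandwidth stays constant; the function name is hypothetical. The 1024-input case matches the 75 MHz quoted on the slide; the 2048-input value is simply what the same rule would give and is not stated in the source.

```python
FULL_STATIONS = 512          # stations at full bandwidth (from the slides)
FULL_BANDWIDTH_MHZ = 300.0   # full LFAA bandwidth (from the slides)


def bandwidth_for_inputs(n_inputs):
    """Bandwidth that keeps (inputs**2 x bandwidth) constant.

    Assumption: correlation count is approximated as proportional to
    the square of the number of correlator inputs.
    """
    return FULL_BANDWIDTH_MHZ * (FULL_STATIONS / n_inputs) ** 2


for n in (512, 1024, 2048):
    print(n, "inputs ->", bandwidth_for_inputs(n), "MHz")
```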
Station based processing
FX correlator implemented – channelise data to final resolution
before correlations
• 8 different frequency resolutions (226 Hz to 14.5 kHz) - 8 filterbanks ???
• Finest zoom mode 4096 channel filterbank
– AH HA! Implement finest zoom and integrate in frequency for the rest
– Integration of 1, 2, 4, 8, 16, 24, 32, or 64 channels for all resolutions
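The single-filterbank idea can be illustrated with the numbers above: channelise once at the finest resolution and sum adjacent channels to get the coarser ones. The sketch below is an assumption-laden illustration, with hypothetical function names; it takes 226 Hz as the finest channel width and uses plain lists of power values in place of real filterbank output.

```python
FINEST_HZ = 226.0                                  # finest zoom resolution (from the slides)
INTEGRATION_FACTORS = (1, 2, 4, 8, 16, 24, 32, 64)  # channels summed per output channel


def output_resolutions_hz(finest_hz=FINEST_HZ, factors=INTEGRATION_FACTORS):
    """Resolution obtained by summing `k` adjacent finest channels."""
    return [finest_hz * k for k in factors]


def integrate_in_frequency(fine_channels, k):
    """Sum groups of `k` adjacent fine channels (length must divide evenly)."""
    if len(fine_channels) % k != 0:
        raise ValueError("channel count must be a multiple of k")
    return [sum(fine_channels[i:i + k]) for i in range(0, len(fine_channels), k)]


resolutions = output_resolutions_hz()
coarse = integrate_in_frequency([1.0] * 4096, 64)  # toy data: flat spectrum
print(resolutions)
print(len(coarse), coarse[0])
```

With these factors the coarsest resolution is 64 x 226 Hz, which is about 14.5 kHz, in line with the range quoted on the slide.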
Relative delay of astronomical signal to stations varies with time
• Must be removed
• Implemented as sample delay correction (coarse) and phase slope across
filterbank channels.
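A minimal sketch of the two-stage delay correction described above: a whole-sample (coarse) shift, then a phase slope across the fine channels for the remainder. The sample period of 1.28 microseconds assumes a critically sampled 781.25 kHz coarse channel, and the sign convention and function names are illustrative assumptions, not the CBF implementation.

```python
import cmath
import math


def split_delay(delay_s, sample_period_s):
    """Split a delay into a whole-sample part and a sub-sample residual."""
    n_samples = round(delay_s / sample_period_s)
    residual_s = delay_s - n_samples * sample_period_s
    return n_samples, residual_s


def phase_slope(residual_s, n_chan, chan_width_hz):
    """Per-channel phasors that remove the residual delay.

    Channel frequencies are measured from the centre of the coarse channel
    (index n_chan // 2), so the centre channel gets zero phase.
    Assumed sign convention: a delay tau appears as exp(-2j*pi*f*tau),
    so the correction is exp(+2j*pi*f*tau).
    """
    return [
        cmath.exp(2j * math.pi * (k - n_chan // 2) * chan_width_hz * residual_s)
        for k in range(n_chan)
    ]


SAMPLE_PERIOD_S = 1.28e-6   # assumption: 1 / 781.25 kHz, critically sampled
n, r = split_delay(3.3e-6, SAMPLE_PERIOD_S)
phasors = phase_slope(r, 8, 226.0)
print(n, r)
```

The residual is always within half a sample of zero, so the phase slope stays small across a coarse channel.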
RFI flagging – input data has flags and internal flagging needed
Corner turning
Filterbanks and correlation engine cannot process all frequency
channels simultaneously
• Must process part of the bandwidth at a time
• Filterbanks and correlator process a few of the 384 LFAA channels at a time
Store all frequency channels for a short-term integration time and
read out to the filterbanks all time data for a limited number of channels at a time
Input data – All frequencies for limited time (0.2ms)
Output data – All time data for limited bandwidth (0.9, 0.25s)
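In software terms the corner turn is a transpose of a buffered block of data. The toy sketch below, with a hypothetical function name and tiny made-up dimensions, shows the reordering only: data arrives as "all channels for one time step" and leaves as "all time steps for one channel".

```python
def corner_turn(blocks):
    """Transpose blocks[time][channel] into out[channel][time]."""
    return [list(per_channel) for per_channel in zip(*blocks)]


# Toy input: 4 time steps x 3 channels, each sample tagged (time, channel).
blocks = [[(t, c) for c in range(3)] for t in range(4)]
turned = corner_turn(blocks)

# Each row of `turned` now holds the full time series of one channel.
for channel, series in enumerate(turned):
    print(channel, series)
```

In the real system the whole input block must be held in memory before the first output row is complete, which is why the buffer size and memory bandwidth dominate the design.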
Gemini board
Proof of Concept (POC)
Single FPGA Xilinx Virtex Ultrascale+, water
cooled, 4xHybrid Memory Cubes (2 link,
4G), 4x12 fibre optical at 25G, 4x
Four to be mounted in a 1U chassis
BUT 2 link, 4G HMC is now end-of-life
Redesign underway for Prototype
HMC high bandwidth memory replaced by
integrated HBM (smaller but faster)
Add DDR4 for bulk memory
Design Evolution - POC design
– Separate Subsystems
With Gemini POC originally had a separate Correlator, Beamformer
and Station Based processing and 0.9s integration time
Major corner turn for correlator in the Station based processing
(144Gbps per FPGA). But insufficient HMC for 0.9s (1.3TB double
buffer but had 43 FPGAs with 16 G each)
Must store and accumulate the full time integration in the correlator: 0.34TB
But this uses most of the available HMC bandwidth
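The memory shortfall can be reproduced from figures given elsewhere in the talk. This is a rough sketch under stated assumptions: the input rate is taken as the 5.8 Tbps quoted for LFAA, and "16 G each" is read as 16 GB of HMC per FPGA (four 4 GB cubes).

```python
INPUT_RATE_BPS = 5.8e12      # LFAA output rate (from the slides)
INTEGRATION_S = 0.9          # original integration time
N_FPGAS = 43                 # input FPGAs in the POC design
HMC_BYTES_PER_FPGA = 16e9    # assumption: "16 G each" means 16 GB per FPGA


def double_buffer_bytes(rate_bps, integration_s):
    """Bytes needed to hold two full integrations (one filling, one draining)."""
    return 2 * rate_bps * integration_s / 8


needed = double_buffer_bytes(INPUT_RATE_BPS, INTEGRATION_S)
available = N_FPGAS * HMC_BYTES_PER_FPGA

print("needed (TB):", needed / 1e12)
print("available (TB):", available / 1e12)
print("fits:", available >= needed)
```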
Prototype (Unified) Design
Change to HBM reduced the available memory by half. Major
problems fitting the design in
Go to unified design - Station based processing and correlation in
the same FPGA.
• Number of FPGAs that accept inputs from LFAA increased from 43 to 288
– Six times reduction in input bandwidth per FPGA
– Can now use DDR4 for corner turn
– AND no buffer size limitation, 0.9s possible
– Correlator can output data to SDP for a frequency channel as soon as it is
computed. Very little memory needed for correlator buffer
What looked like a disaster with loss of a key component has led to a
better and more robust design
Connecting the FPGA
The Unified design has 288 FPGAs
All FPGAs must be able to communicate with all others
Switch ??
But the heart of a switch is usually an FPGA
Number the FPGAs (X,Y,Z) (X,Y 1:6) (Z 1:8)
Arrange as a cube with these coordinates
Cross connect within rows and columns
Including self connection each FPGA has
6 in X, 6 in Y and 8 in Z connections
20 connections in total - 500Gbps
[Figure: FPGA Cube with axes X 1 to 6, Y 1 to 6 and Z 1 to 8; example nodes (1,1,1), (3,1,1), (3,4,1) and (3,4,3)]
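The cube addressing can be sketched as follows. This is an illustrative model only: it assumes a 6 x 6 x 8 arrangement, full cross connection along each axis with self connections counted, and 25 Gbps per connection (inferred from 20 connections giving 500 Gbps). The X-then-Y-then-Z hop order in `route` is a hypothetical choice for the example; the function names are not from the design.

```python
from itertools import product

NX, NY, NZ = 6, 6, 8   # cube dimensions, 6 * 6 * 8 = 288 FPGAs
LINK_GBPS = 25         # assumption: one 25G optical lane per connection


def links(node):
    """FPGAs reachable in one hop: every node sharing two coordinates.

    Self connections are included, as on the slide (6 + 6 + 8 = 20).
    """
    x, y, z = node
    along_x = [(i, y, z) for i in range(1, NX + 1)]
    along_y = [(x, j, z) for j in range(1, NY + 1)]
    along_z = [(x, y, k) for k in range(1, NZ + 1)]
    return along_x + along_y + along_z


def route(src, dst):
    """Hops from src to dst, correcting one coordinate per hop (X, Y, then Z)."""
    hops, current = [], list(src)
    for axis in range(3):
        if current[axis] != dst[axis]:
            current[axis] = dst[axis]
            hops.append(tuple(current))
    return hops


nodes = list(product(range(1, NX + 1), range(1, NY + 1), range(1, NZ + 1)))
print(len(nodes), "FPGAs")
print(len(links((3, 4, 3))), "connections,", len(links((3, 4, 3))) * LINK_GBPS, "Gbps")
print(route((1, 1, 1), (3, 4, 3)))
```

Because each hop fixes one coordinate, any FPGA reaches any other in at most three hops without a central switch.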
Data Flow
Input FPGAs have at most one LFAA input (2 stations full bandwidth)
ZXY connections to uniformly distribute data for processing
Allows uniform distribution of compute and output in Zoom mode
Beamformers must bring all frequency data together, using the same XYZ connections
[Figure: data flow block diagram. Station Processing blocks: LFAA inputs, Buffer In, Filterbank, Delay, RFI, Doppler Correction. Array Processing blocks: Buffers, Ingest, Correlator, Output Buffer, Corr Emit, PSS Beamformer, PSS Emit, PST Beamformer, PST Buffer, PST Emit, VLBI Reformat. Interconnect stages labelled Z, XY and XYZ. Outputs: SDP, PSS, PST and VLBI.]
0.9 to 0.25 sec integration
With 0.9s integration output uses 72 FPGAs (50% full)
• One in 4 FPGAs have output.
• Aggregate using Z connect. All 8 interconnected FPGAs send data to two output
FPGAs, half to each
At 0.25s (changed requirement) is 144 at 100%. Use 180 at 80%
• Simply change to 5 out of 8 FPGAs have outputs. Each FPGA sends 1/5 of its
data to an output FPGA.
• Design can accommodate 22Tbps of output data to SDP without modifications.
• Small change to hardware (duplicate the 8 Z connections on each FPGA) and can
do 44Tbps
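The output-link arithmetic can be sketched as below. This is an interpretation, not a statement of the design: it assumes that the 0.25 s rate of about 11 Tbps corresponds to 144 fully loaded output FPGAs, and that the 22 Tbps figure is what results when all 8 FPGAs in each Z group emit. The function name is hypothetical.

```python
N_FPGAS = 288
GROUP_SIZE = 8               # FPGAs interconnected along Z
FULL_OUTPUTS_AT_025S = 144   # slides: 0.25 s dumps fill 144 outputs at 100%
RATE_AT_025S_TBPS = 11.0     # quoted output rate at 0.25 s


def output_plan(outputs_per_group):
    """Return (output FPGAs, utilisation at 0.25 s, capacity in Tbps)."""
    n_out = N_FPGAS * outputs_per_group // GROUP_SIZE
    utilisation = FULL_OUTPUTS_AT_025S / n_out
    capacity_tbps = RATE_AT_025S_TBPS * n_out / FULL_OUTPUTS_AT_025S
    return n_out, utilisation, capacity_tbps


for per_group in (2, 5, 8):
    print(per_group, "of 8 ->", output_plan(per_group))
```

Under this reading, 2 of 8 gives the original 72 output FPGAs, 5 of 8 gives 180 at 80%, and 8 of 8 gives the 22 Tbps ceiling.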
Substations
Unified design eases usage of fast memory
Without substations 4 LFAA channels are processed in parallel
• Need for uniform distribution of load from 4 zoom bands.
With 2048 substations process 1 LFAA station at a time - 4X stations
• Same data rate to correlator, same size for input buffer to correlator for 2048
substations
Correlator processes 2048 stations in 16 passes
• Buffer for correlation products increases from 55MB to 0.88GB
– Progressive readout during processing could reduce this but more complex
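The 16 passes and the buffer growth are consistent with simple square-law scaling. The sketch below assumes the correlator handles 512 x 512 station blocks per pass, so 2048 substations need (2048 / 512) squared passes, and that the product buffer grows by the same factor; the actual tiling is not described in the slides, and the function names are hypothetical.

```python
BASE_STATIONS = 512     # stations handled in one correlator pass (assumption)
BASE_BUFFER_MB = 55.0   # correlation product buffer without substations


def correlator_passes(n_substations, block=BASE_STATIONS):
    """Passes needed if each pass correlates one block x block tile."""
    return (n_substations // block) ** 2


def product_buffer_mb(n_substations):
    """Buffer for correlation products, scaled by the number of passes."""
    return BASE_BUFFER_MB * correlator_passes(n_substations)


print(correlator_passes(2048), "passes")
print(product_buffer_mb(2048) / 1000, "GB")
```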
Conclusion
CSIRO/ASTRON/AUT have designed a flexible FPGA based system for
the LOW Correlator
It has sufficient spare resources, I/O and memory to accommodate
recent requirement changes and still have spare capacity
• 20 of 48 internal optical connections per FPGA are currently used.
• Further expansion possible – not I/O limited
Zoom mode changes required major redesign of data ordering but
no change to the hardware
Changes to integration time and addition of substations were
changes only to some subsystems
Revised Low Correlator and Beamformers
[Figure: block diagram. Inputs from LFAA feed Gemini boards (filterbanks/correlator), drawn from the 1st group of 16 to the 8th group of 16; cross connects link them to the correlator and the PSS and PST beamformers, with outputs to SDP, PSS and PST. Diagram labels: 6 Gemini per group; separate filterbank.]
128 Gemini
Now 4 LFAA stations per Gemini. Previous was 12
Cross connects deliver part of the band to each correlator and beamformer,
e.g. 2 complete PSS beams
1/8 BW per link
Interconnects in reverse aggregate the data
All internal links bi-directional
All processing modules identical.
16 groups of 8 also an option
Gemini version II
One board to rule them all (functions that is)
One HMC retained for High Bandwidth External memory
Two DDR4 to be added for High Memory depth system
Up to 4 12-fibre 25G optics + QSFP, SFP
Change to card rack system. Each card a single Gemini II with all I/O
Water cooling, up to 200W per card
One FPGA per LRU - Reduced (1/4) I/O per line replaceable unit (LRU)
Pluggable Optics, Power and Water at rear – easy replacement
All data connections optical
SKA1 Overview
SKA1-low stations include Station Beamformer
Central Signal Processing includes Correlator and Pulsar systems
Thank you
CASS
John Bunton
SKA1 CSP System Engineer
t +61 2 9372 4420
e [email protected]
w www.atnf.csiro.au/projects/askap
PO BOX 76 EPPING, 1710, AUSTRALIA
Central Signal Processing
For Mid and Low Central Signal Processing (CSP) consists of
Correlator between all pairs of elements (dish or station)
Tied Array Beamforming
coherent sum of signals from all elements
Tied Array beams are processed by
Pulsar Search engine (limited bandwidth)
Pulsar Timing engine
and are used for VLBI
LMC - Monitor of performance and control of all functions
(NRC Canada, )
SKA1 MID antennas (South Africa)
Mid Dishes
133 15m offset Gregorian Dishes + 64 MeerKAT dishes
Total of 197 dishes (Distributed between 1-16 subarrays)
maximum baseline, less than 150 km
Receivers for 5 bands
0.35 to 1.050 GHz full bandwidth 0.70 GHz at 8 bit resolution
0.95 to 1.76 GHz full bandwidth 0.81 GHz at 8 bit resolution
1.65 to 3.05 GHz not installed during construction
2.80 to 5.18 GHz not installed during construction
4.6 to 13.8 GHz 2 x 2.5 GHz! at 4 bit resolution
16 subarrays
CSP Organisation at PDR 2014 (Correlator)
In December 2014 the Preliminary Design Review (PDR) was held
At that time there were three telescopes: Low, Mid and Survey.
Physical Implementation Proposal (PIP) submitted for each
Low led by Oxford University with three designs in a single PIP
Uniboard (ASTRON), PowerMX (NRC Canada), Redback (CSIRO)
Survey led by AUT (NZ) considered many options in a single PIP
Redback (CSIRO), PowerMX (NRC Canada),
Multicore processor, GPUs and ASIC
MID led by NRC Canada had three separate PIPs
PowerMX (Canada), Redback (CSIRO Australia) & SKARAB (S.A.)
Project management MDA Canada, Local Monitor Control NRC
Pulsars
The Pulsar teams are
Pulsar Timing led by Swinburne University
CPU/GPU based
Pulsar Search led by Manchester University
CPU/GPU based with FPGA acceleration/power saving
Pulsar search on limited bandwidth (120 MHz Low, 300 MHz Mid)
They process array beams (coherent, polarisation corrected sums
of data from ~400 stations) generated by CBF
A shake up for CSP Correlator/Beamformer
One outcome of the review was that the SKA Office wanted just one
design to proceed for each Telescope
THEN, as total cost was too high, Rebaselining occurred. Decisions
Stop the Survey Telescope
Delay work on PAFs (led by CSIRO) (Critical to Survey)
The politicians stepped in and decided which designs would proceed
NRC Canada continue to lead Mid (PowerMX)
CSIRO to lead Low with ASTRON, (Redback/Uniboard)+AUT
This resulted in the South African and UK teams leaving
(taking most of the Systems Engineering with them)
Pulsar Timing Beamforming
Mid and Low form 16 tied array beams
Delay aligned, coherent summation of dual polarisation data
Must apply polarisation correction for each value summed
~3M Jones Matrices for Low
Time resolution of data 200 ns Mid, 2 µs Low
Basically no significant time ripples allowed, narrow impulse response
For Mid approach taken was fractional time delay filter on wideband
signal to do delay correction and achieve narrow impulse response
For Low summation done on narrow band channels, phase only
which is cheaper than fractional time delay – followed by synthesis.
New approach to avoid synthesis filterbank being investigated.
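The Low approach (phase-only delay correction on narrow-band channels, with a polarisation correction per summed value) can be sketched for a single channel and time sample. This is a minimal illustration, not the CBF implementation: the function names, the sign convention, and the choice to apply the Jones matrix before the phase rotation are all assumptions.

```python
import cmath
import math


def apply_jones(jones, sample):
    """Multiply a 2x2 Jones matrix by a dual-polarisation sample (x, y)."""
    (a, b), (c, d) = jones
    x, y = sample
    return a * x + b * y, c * x + d * y


def tied_array_sample(samples, jones_matrices, delays_s, freq_hz):
    """Coherent sum over stations for one narrow-band channel.

    samples:        one (x, y) complex pair per station
    jones_matrices: one 2x2 polarisation correction per station
    delays_s:       geometric delay of each station
    freq_hz:        channel centre frequency
    Assumed convention: a delay tau shows up as exp(-2j*pi*f*tau),
    so the phase-only correction is exp(+2j*pi*f*tau).
    """
    sum_x = sum_y = 0j
    for sample, jones, tau in zip(samples, jones_matrices, delays_s):
        weight = cmath.exp(2j * math.pi * freq_hz * tau)
        x, y = apply_jones(jones, sample)
        sum_x += weight * x
        sum_y += weight * y
    return sum_x, sum_y


# Toy usage: three stations see the same signal with different delays.
IDENTITY = ((1, 0), (0, 1))
freq = 1.0e6
delays = [0.0, 1.0e-7, 2.5e-7]
signal = (1 + 0j, 0.5j)
received = [
    tuple(s * cmath.exp(-2j * math.pi * freq * tau) for s in signal)
    for tau in delays
]
beam_x, beam_y = tied_array_sample(received, [IDENTITY] * 3, delays, freq)
print(beam_x, beam_y)
```

The phase-only correction is exact only at the channel centre, which is why it is applied on narrow channels.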
Pulsar Timing
Pulsar signal is smeared due to dispersion (delay ∝ wavelength²)
Must remove dispersion.
Pulsar timing implements overlap-save convolution of the
beamformed time series with the correction filter.
Time series ~1 minute, bandwidth 10 MHz, multi-million point
FFTs in GPUs.
Very stringent timing requirements on data supplied
less than 10ns error over a 10 year period.
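The size of the dispersion delay can be estimated with the standard cold-plasma relation, in which delay scales with wavelength squared (frequency to the power of minus two). The sketch below uses the usual dispersion constant of about 4.1488 x 10^3 s MHz^2 per pc cm^-3; the dispersion measure of 10 pc cm^-3 in the example is a hypothetical value chosen for illustration, and the function name is not from the source.

```python
K_DM_S_MHZ2 = 4.148808e3   # dispersion constant, s MHz^2 pc^-1 cm^3


def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Arrival-time delay of f_lo relative to f_hi for dispersion measure dm.

    dm is in pc cm^-3 and frequencies are in MHz. Lower frequencies
    arrive later, so the result is positive when f_lo < f_hi.
    """
    return K_DM_S_MHZ2 * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)


# Hypothetical example: DM = 10 pc cm^-3 across the SKA1 Low band.
delay_across_band = dispersion_delay_s(10.0, 50.0, 350.0)
# Same DM across a single 10 MHz slice at the bottom of the band.
delay_in_slice = dispersion_delay_s(10.0, 50.0, 60.0)
print(delay_across_band, delay_in_slice)
```

Delays of seconds across the band at low frequencies are the reason the correction filter, and hence the FFT, has to be so long.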
Pulsar Search (PSS)
Mid to form 1500 power beams at a bandwidth of 300MHz
Low to form 500 power beams at a bandwidth of 120MHz
dishes/stations in compact area used, PSS beams to fill dish/station beam
Coherent summation of dish/station data, phase on narrow channels ~20kHz
Polarisation correction to dish/station data (beam centre) and for
Low after beamforming. (~800,000 Jones Matrices for MID)
Search each beam for Pulsars in PSS engine (GPU/CPU 16 racks, Low)
500 dispersion measures
in each dispersion measure acceleration search to 300ms-1
Other Functions
Both Mid and Low require a transient buffer.
in Low allocated to LFAA, 256GB per Station, 150MHz of data, 2-bit precision
in Mid allocated to CSP, 32GB per dish, 300MHz of data, 2-bit precision
Mid produces four VLBI beams.
VLBI possible with Europe, America, Australia, Asia.
Not enough low-frequency telescopes for VLBI with Low.
Hardware
Pulsar Search (PSS) and Pulsar Timing (PST) use common hardware
for Mid and Low. CPU/GPU based (FPGA acceleration for PSS).
PST two racks at each site
PSS 16 racks at Low, dissipating ~160 kW, Mid 59 racks @ ~470 kW
wider bandwidth, more beams but lower total delay to search
Each compute node processes 2 beams (TBC)
Correlator and Beamformers are FPGA based
Mid based on PowerMX
Low based on Perentie (development from Redback+Uniboard)
Perentie (CSIRO/ASTRON) for Low
July 2015 Final confirmation that CSIRO would lead CSP for Low
Condition of leadership was to collaborate with ASTRON on design.
At that time ASTRON were completing their Uniboard II
CSIRO platform was Redback-3. Both multi-FPGA boards.
For SKA, CSIRO had proposed Redback-5, a board with a single FPGA
After a lengthy downselect process it was decided in November to
proceed with the GEMINI board
Four of these to be mounted in a 1U chassis