INFN-FE, Angelo Cotta Ramusino 2013-10-24
GTK_DataLinksToDDRBuffer module description
last modified 2013-10-24
entity GTK_DataLinksToDDRBuffer is
generic (
DL_NUMBER_OF_HIT_PER_FRAME : natural := 320;
DL_NUM_48bit_LINK_SINCH_K28_5 : natural := 128;
-- FOR SIMULATION_ONLY OTHERWISE DL_NUM_48bit_LINK_SINCH_K28_5 = 150000
DPRAM_CACHE_ADR_WIDTH_IN : natural := 10;
DPRAM_CACHE_ADR_WIDTH_OUT : natural := 8; -- DPRAM_CACHE_ADR_WIDTH_IN - 2;
DPRAM_IN_WIDTH : natural := 64;
DPRAM_OUT_WIDTH : natural := 256;
NUMBER_PRE_SELECTION_WINDOWS : natural := 16
);
port
(
clock320 : IN STD_LOGIC; -- acr 2013-02-26 "main_clock" label changed to "clock320" (320.64MHz)
frontend_clk : IN STD_LOGIC; -- acr 2013-02-26 "frontend_clk" port added
data_merger_reset : IN STD_LOGIC;
DL_reset : IN STD_LOGIC; -- connected to "tx_digitalreset_1x" at the calling level
DDR2_phy_clk : IN STD_LOGIC; -- acr 2013-02-28 added "DDR2_phy_clk" feeding the read side of the "DPRAM_cache" instances
DL_tx_clkout : IN STD_LOGIC; -- acr 2012-01-24 frequency of "tx_clkout_TX" is serializer_clk (3.2G) / 10 (8b/10b encoding) / 2 (byte serializer) = 160.32MHz
DL_frame_generation_enable : IN STD_LOGIC; -- acr 2012-01-31 introducing an extra control port to control frame transmission
DL_frame_rollover_pulse_40MHz : IN STD_LOGIC; -- acr 2012-01-24
DL_rx_clkout : IN STD_LOGIC_VECTOR(1 downto 0);
DL_rx_freqlocked : IN STD_LOGIC_VECTOR(1 downto 0);
DL_rx_pll_locked : IN STD_LOGIC_VECTOR(1 downto 0);
DL_rx_reset : IN STD_LOGIC_VECTOR(1 downto 0);
dpram_cache_rd_adr : IN STD_LOGIC_VECTOR(DPRAM_CACHE_ADR_WIDTH_OUT-1 downto 0); -- output port is 4x wide -> out depth is 2^(2 bits less than input)
dpram_cache_rden : IN STD_LOGIC;
--acr 2013-01-23 BEGIN including XCVR ports to make them available at the top level
GTK_rx_datain_ch0 : IN STD_LOGIC_VECTOR(15 DOWNTO 0);
GTK_rx_ctrldetect_ch0 : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
GTK_rx_datain_ch1 : IN STD_LOGIC_VECTOR(15 DOWNTO 0);
GTK_rx_ctrldetect_ch1 : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
tx_dataout_ch0 : OUT STD_LOGIC_VECTOR (15 DOWNTO 0);
tx_dataout_ch1 : OUT STD_LOGIC_VECTOR (15 DOWNTO 0);
tx_ctrlenable_ch0 : OUT STD_LOGIC_VECTOR ( 1 DOWNTO 0);
tx_ctrlenable_ch1 : OUT STD_LOGIC_VECTOR ( 1 DOWNTO 0);
--acr 2013-01-23 END including XCVR ports to make them available at the top level
dbg_trig_gen_enable_out : out std_logic; -- acr 2012-08-10
DPRAM_0_data_valid : OUT STD_LOGIC;
DPRAM_1_data_valid : OUT STD_LOGIC;
dpram_cache_q : OUT STD_LOGIC_VECTOR(DPRAM_OUT_WIDTH-1 downto 0);
--acr 2013-01-23 BEGIN including diagnostic outputs to make them available at the top level
total_frame_counter_ch0,
error_frame_counter_ch0,
good_frame_counter_ch0,
total_frame_counter_ch1,
error_frame_counter_ch1,
good_frame_counter_ch1: OUT std_logic_vector (31 downto 0); -- acr 2013-01-23 don't use array () of std_logic_vector to be compatible with Verilog port assignments
error_flag_ch0,
error_flag_ch1 : OUT std_logic
--acr 2013-01-23 END including diagnostic outputs to make them available at the top level
);
end entity;
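
The default generic values fix the geometry of each DPRAM cache. The short Python sketch below is not part of the design; it only checks that the defaults are mutually consistent (write and read sides hold the same number of bits, and the read address is 2 bits narrower because the read port is 4 times wider). The assumption that the write-side locations are shared equally among NUMBER_PRE_SELECTION_WINDOWS windows is made here for illustration.

```python
# Default generic values copied from the entity declaration above.
DPRAM_CACHE_ADR_WIDTH_IN = 10
DPRAM_CACHE_ADR_WIDTH_OUT = 8
DPRAM_IN_WIDTH = 64
DPRAM_OUT_WIDTH = 256
NUMBER_PRE_SELECTION_WINDOWS = 16


def cache_geometry():
    """Derive depths and capacities of one DPRAM cache from the generics."""
    write_depth = 2 ** DPRAM_CACHE_ADR_WIDTH_IN      # number of 64-bit locations
    read_depth = 2 ** DPRAM_CACHE_ADR_WIDTH_OUT      # number of 256-bit locations
    width_ratio = DPRAM_OUT_WIDTH // DPRAM_IN_WIDTH  # read port vs write port width
    return {
        "write_depth": write_depth,
        "read_depth": read_depth,
        "width_ratio": width_ratio,
        "write_bits": write_depth * DPRAM_IN_WIDTH,
        "read_bits": read_depth * DPRAM_OUT_WIDTH,
        # Assumption: locations are shared equally among the windows.
        "locations_per_window": write_depth // NUMBER_PRE_SELECTION_WINDOWS,
    }


geometry = cache_geometry()
print(geometry)
```

With the defaults, both ports see the same total capacity, which is what allows the comment "DPRAM_CACHE_ADR_WIDTH_IN - 2" on the read address width.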

Clock domain(s):
o “clock320” : in; 320.64 MHz;
o “frontend_clk” : in; 240.48 MHz;
o “DDR2_phy_clk” : in; 120.24 MHz (half of DDR2 SDRAM clock frequency);
o “DL_tx_clkout” : in; 160.32 MHz;
o “DL_rx_clkout” : in; 160.32 MHz;
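
The relations among these frequencies can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: it recomputes the serial line rate from the comment on the DL_tx_clkout port (serializer clock / 10 / 2), the DDR2 SDRAM clock from DDR2_phy_clk, and the largest frequency of which all listed clocks are integer multiples. The dictionary and variable names are made up for this example.

```python
from functools import reduce
from math import gcd

# Clock frequencies from the list above, in kHz (integers keep the arithmetic exact).
CLOCKS_KHZ = {
    "clock320": 320_640,
    "frontend_clk": 240_480,
    "DDR2_phy_clk": 120_240,
    "DL_tx_clkout": 160_320,
    "DL_rx_clkout": 160_320,
}

# Serial line rate implied by the DL_tx_clkout port comment:
# serializer clock / 10 (8b/10b encoding) / 2 (byte serializer) = DL_tx_clkout.
line_rate_kbps = CLOCKS_KHZ["DL_tx_clkout"] * 10 * 2

# DDR2_phy_clk is half of the DDR2 SDRAM clock frequency.
ddr2_sdram_clk_khz = CLOCKS_KHZ["DDR2_phy_clk"] * 2

# Largest frequency of which every listed clock is an integer multiple.
common_base_khz = reduce(gcd, CLOCKS_KHZ.values())
multiples = {name: f // common_base_khz for name, f in CLOCKS_KHZ.items()}

print(line_rate_kbps, ddr2_sdram_clk_khz, common_base_khz, multiples)
```

The result is a line rate of 3.2064 Gbps, a DDR2 SDRAM clock of 240.48 MHz, and a common base of 40.08 MHz with multiples 8, 6, 3 and 4.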

Overview:
The main function of the GTK_DataLinksToDDRBuffer module is to capture, format, pre-order and buffer the data
coming from the GTK-RO de-serializers connected to 2 TDCpix data output channels. The module thus contains 2
GTK_DataReceiver modules, 4 input derandomizer FIFOs (two per channel) and 2 DPRAM cache memories.
The two DPRAMs are alternately connected (through a DDR2 controller module described by a different source file)
to the external DDR2 memory buffer, in which the TDCpix data waits to be selected by the L0 trigger. The
de-serialized, 16-bit wide input data is formatted into 48-bit words representing either “hit” measurements or “framing”
words (two framing words are generated in this debug design). The 48-bit words are buffered into derandomizer FIFOs,
which are written during one frame and read out during the next frame if the CRC-16 calculated on the received stream
matches the CRC-16 value contained in the second “framing” word; a CRC mismatch instead causes the filled
derandomizer FIFO to be cleared. Two FIFOs are needed for each channel so that there is always one free for writing
while the other is being read.
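
The double buffering described above can be sketched with a small behavioral model. The Python class below is not derived from the VHDL source: the class and method names are hypothetical, and the CRC-16 computation itself is left out (only its pass/fail result is modelled). It shows one channel writing a frame into one FIFO, then swapping roles at the end of the frame and clearing the filled FIFO when the CRC check fails.

```python
from collections import deque


class PingPongFifo:
    """Behavioral model of one channel's pair of derandomizer FIFOs."""

    def __init__(self):
        self.fifos = (deque(), deque())  # index 0 = even, index 1 = odd
        self.write_sel = 0               # FIFO currently selected for writing

    def write(self, word):
        """Store one 48-bit word of the frame being received."""
        self.fifos[self.write_sel].append(word)

    def end_of_frame(self, crc_ok):
        """Swap write/read roles; clear the filled FIFO on a CRC mismatch."""
        filled = self.write_sel
        if not crc_ok:
            self.fifos[filled].clear()
        self.write_sel ^= 1
        return filled  # index of the FIFO now available for reading

    def read_all(self, index):
        """Drain the FIFO selected for reading and return its contents."""
        out = list(self.fifos[index])
        self.fifos[index].clear()
        return out


ch0 = PingPongFifo()
for w in (0x111, 0x222):
    ch0.write(w)
frame0 = ch0.read_all(ch0.end_of_frame(crc_ok=True))   # good frame: data is read out

ch0.write(0x333)
frame1 = ch0.read_all(ch0.end_of_frame(crc_ok=False))  # bad frame: FIFO was cleared
print(frame0, frame1)
```

In this model a frame with a CRC mismatch simply yields an empty read, which corresponds to the cleared FIFO described in the text.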
The contents of the input FIFO that is active for reading are read out and transferred to the active dual-port
SRAM buffer, whose 1024 locations are divided into 16 pages to allow “bucket-ordering” of the TDCpix data while it
is stored in the cache DPRAM. The least significant 4 bits of the leading edge coarse time of a data word are used to
determine the destination page. When all data from the active input FIFO has been read out (FIFO_empty is set),
a page termination code is written to each page. If the input FIFO had been cleared because of a CRC mismatch, each of
the 16 DPRAM pages will simply contain the terminator code. In order to provide continuous operation the
GTK_DataLinksToDDRBuffer module contains two DPRAM_cache modules which are alternately written and read.
The DPRAM cache memories are 64 bits wide at the write side and 256 bits wide at the read side; the read side width
matches the input width of the data port of the DDR2 controller. The GTK_DataLinksToDDRBuffer also contains 2
GTK_DataGenerator modules, one per channel, which simulate the data streams coming from the TDCpix.
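
As an illustration of the “bucket-ordering” step, the Python sketch below distributes data words over 16 pages according to the least significant 4 bits of the coarse time, as stated in the paragraph above, and then closes every page with a terminator. It is a simplified model, not the firmware: the function name is hypothetical, the words are represented as (coarse_time, payload) pairs, and the TERMINATOR value is a placeholder because the actual termination code is not given in this document.

```python
NUM_PAGES = 16
TERMINATOR = "TERMINATOR"  # placeholder: the real termination code is not specified here


def bucket_order(words):
    """Sort (coarse_time, payload) pairs into pages and terminate each page."""
    pages = [[] for _ in range(NUM_PAGES)]
    for coarse_time, payload in words:
        page = coarse_time & 0xF  # least significant 4 bits select the page
        pages[page].append(payload)
    # Once the input FIFO is empty, every page is closed with a termination code.
    for page in pages:
        page.append(TERMINATOR)
    return pages


pages = bucket_order([(0x10, "a"), (0x21, "b"), (0x30, "c")])
empty_frame = bucket_order([])  # e.g. input FIFO cleared after a CRC mismatch
print(pages[0], pages[1], empty_frame[0])
```

The second call shows the CRC mismatch case: with no data read from the FIFO, each of the 16 pages contains only the terminator.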
In the design hierarchy targeted for synthesis the GTK_DataGenerator modules actually drive two high
speed (3.2 Gbps) serializers which are in turn connected to the GTK-RO optical transceivers, so that data can be looped
back through optical fibers. In the design hierarchy used for simulation the loopback of data is done at the testbench
level and involves the un-serialized 16-bit data paths.
Please note that in the “test_assembly_oct_slave_ta1” module the 16-bit wide output data port of the
“GTK_DataGenerator” is directly connected to the input data port of the “GTK_DataReceiver” in the testbench
“test_assembly_oct_slave_ta1_tb”.

Detailed information:
The details of the operation of the GTK_DataGenerator and the GTK_DataReceiver modules are given in their
specific documents.
The GTK_DataLinksToDDRBuffer module operates on different clock domains:
o “DL_tx_clkout”: in; 160.32 MHz; it comes from the XCVR module at the top of the design hierarchy
(or a clock generator in the simulation testbench) and it is only used by the GTK_DataGenerator
o “DL_rx_clkout”: in; 160.32 MHz; it comes from the XCVR module at the top of the design hierarchy
(or a clock generator in the simulation testbench) and it is used by the GTK_DataReceiver to move
the de-serialized data into the input de-randomizer FIFOs, which also use DL_rx_clkout at the
“write” side
o “clock320”: in; 320.64 MHz; it is actually unused in the latest version of the source code
o “frontend_clk”: in; 240.48 MHz; it comes from the main PLL at the top level of the design hierarchy
(or a clock generator in the simulation testbench) and it is the main clock for the operation of the
GTK_DataLinksToDDRBuffer: it is the timing source for the extraction of data from the input
derandomizer FIFOs and for the “bucket-ordering” and storing of the data into the DPRAM cache
o “DDR2_phy_clk”: in; 120.24 MHz (half of DDR2 SDRAM clock frequency); it is the timing source
for the burst transfer of the data from the “read” side of the DPRAM cache to the input port of the
UNIPHY-based DDR2 controller block
Some topics worth noting are described below:
o even / odd toggling: each of the 2 channels handled by the GTK_DataLinksToDDRBuffer module is equipped
with 2 derandomizer FIFOs which are alternately written and read out. In the current implementation the
toggling between even and odd is caused by a pulse (lasting one “frontend_clk” period) on the signal
“FIFO_ch0_paritytoggle_condition_fend_clk”, which is generated when the GTK_DataReceiver module sets
(for one “DL_rx_clkout” clock period) its output “GTK_crcvalid_from_trueCRCchkr_chX” (where X stands for 0
or 1), regardless of whether the CRC check is successful or not; if the check is not successful, the de-randomizer
FIFO which contains the data of the frame just received is cleared (fig.1)
o in the current version of the GTK_DataLinksToDDRBuffer source code there is a
“derand_FIFO_wrenable_link_X” signal enabling the writing of incoming serial data into the input
derandomizer FIFO for channel X. The enable signal is set by the signal
“FIFO_chx_wrenable_condition_fend_clk”, which is set upon the first successful CRC check for that channel.
All bad frames and the first good frame preceding the onset of “derand_FIFO_wrenable_link_X” are
therefore lost, i.e. unprocessed
o the readout of the de-randomizer FIFO is performed under control of the “data_merger_state_machine” with
state register “xfer_sm_state”. The FIFO read operation starts, assuming that the CRC check was successful,
when a pulse is detected on “FIFO_chX_rdenable_condition_fend_clk” (derived from
“GTK_crcvalid_from_trueCRCchkr_chX”) and thus when a whole frame worth of data is stored in the input
de-randomizer FIFO. The read operation ends, assuming that the CRC check was successful, when the FIFO
becomes empty: at this point the flag “xfer_to_DPRAM_cache_dne_chX” is set, and when both channels have
set this flag the state machine proceeds to writing the termination token at the end of each of the 16
“bucket-ordering” pages into which the cache DPRAM is subdivided. If the CRC check was unsuccessful,
the “data_merger_state_machine” finds that the (active for reading) de-randomizer FIFOs are empty
(because they have been reset) with no “xfer_to_DPRAM_cache_dne_chX” set: this condition causes the
“data_merger_state_machine” to wait for the “DL_frame_rollover_pulse_40MHz_metafree” and then jump to
writing the termination tokens at the end of each “bucket-ordering” page (fig.2)
o fig.3 shows a detail of the DPRAM cache address management during the “bucket-ordering” of incoming data:
the least significant 4 bits of the leading edge coarse time of a data word are used to determine the destination
window. For each window a pointer is incremented each time a new data word is inserted. When the input de-randomizer
FIFO is emptied, the “data_merger_state_machine” manages the storage of a termination token at the end of
each “bucket-order” address window.
o DPRAM0 / DPRAM1 toggling: after being written with the termination tokens, the currently “write-enabled”
DPRAM cache becomes “read-enabled”, i.e. available for data transfer to the external DDR2 memory via the
ALTERA UNIPHY-based DDR2 controller described in a different source file. The
complementary DPRAM cache is assumed by this time to have been read out by the DDR2 controller, and it
becomes “write-enabled” at the same time
o the “end_of_raw_event_extr_flag” signal is set (synchronously with “frontend_clk”) after the terminators are
written to the DPRAM cache and this, with a suitable delay, sets the output “DPRAM_X_data_valid” toward
the module which controls the transfer of the TDCpix (or simulated) frame data from the DPRAM cache to the
target row of the external DDR2 memory
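
The combined effect of the write-enable latch and of the CRC check on a sequence of frames can be summarised with a small frame-level model. The Python sketch below is a simplification made for illustration only: the function name and the three outcome labels are hypothetical, and clock domains, pulses and per-channel details are ignored. It takes the CRC result of each received frame of one channel and reports what happens to that frame according to the topics listed above.

```python
def frame_outcomes(crc_ok_per_frame):
    """Classify each frame of one channel given its CRC check result."""
    write_enabled = False  # models "derand_FIFO_wrenable_link_X"
    outcomes = []
    for crc_ok in crc_ok_per_frame:
        if not write_enabled:
            # Nothing is written before the first successful CRC check,
            # so this frame (even if good) is lost.
            outcomes.append("lost")
            if crc_ok:
                write_enabled = True
        elif crc_ok:
            # Data is bucket-ordered into the DPRAM cache, then terminators are written.
            outcomes.append("transferred")
        else:
            # FIFO cleared on CRC mismatch: the pages receive only the terminators.
            outcomes.append("terminators_only")
    return outcomes


outcomes = frame_outcomes([False, True, True, False, True])
print(outcomes)
```

In this example the first two frames are lost (a bad frame and the first good frame), the third and fifth are transferred, and the fourth produces pages containing only terminators.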
To do: state the sizes of the input FIFO, of the DPRAM cache and of the windows, and describe how the terminators are built.
Note: the CRC error induced on frame 3 must be removed.
It should be pointed out that the 4 bits examined for the bucket ordering in this test project are bits 24..21 (bits 7..4 of the
coarse time stamp)
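
Following the note on bits 24..21, the selection of the destination window and the per-window pointer can be sketched as below. This Python model is hedged: the bit positions come from the note above, while the address layout (16 contiguous windows of 64 locations each, i.e. 1024 / 16) and all names are assumptions made for the example and are not confirmed by the source code.

```python
WINDOW_LSB = 21                    # bits 24..21 of the 48-bit word (from the note above)
NUM_WINDOWS = 16
WINDOW_SIZE = 1024 // NUM_WINDOWS  # assumption: 64 contiguous locations per window


def window_index(word48):
    """Extract the 4-bit window selector from a 48-bit data word."""
    return (word48 >> WINDOW_LSB) & 0xF


class WindowPointers:
    """One write pointer per window, incremented at every inserted word."""

    def __init__(self):
        self.pointers = [0] * NUM_WINDOWS

    def next_address(self, word48):
        """Return the write address for this word and advance its window pointer."""
        w = window_index(word48)
        address = w * WINDOW_SIZE + self.pointers[w]  # assumed layout: window base + pointer
        self.pointers[w] += 1
        return address


wp = WindowPointers()
a0 = wp.next_address(0b1010 << 21)  # first word for window 10
a1 = wp.next_address(0b1010 << 21)  # second word for window 10
a2 = wp.next_address(0)             # first word for window 0
print(a0, a1, a2)
```

Under the assumed layout the three words land at addresses 640, 641 and 0.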
fig.1 screenshot from functional simulation of “sim_test_assembly_oct_slave_ta1” (exploiting an abstract model for the
UNIPHY controller): even/odd toggling details. Annotations in the figure: “CRCbad” is high when “CRCvalid”
is set (error induced by firmware); the ODD FIFO is cleared; the EVEN FIFO is now selected to receive data for the
next frame; the gtk_datareceiver is already receiving the next data frame and calculating the new CRC.
fig.2 screenshot from functional simulation of “sim_test_assembly_oct_slave_ta1” (exploiting an abstract model for the
UNIPHY controller): “bucket-ordering” of incoming data and storage into the DPRAM cache memories in two cases:
positive CRC-16 check (left frame) and mismatch between the CRC-16 calculated on incoming data and that extracted
from the second trailer word. Annotations in the figure, CRC match: transferring (simulated) TDCpix data from the
de-randomizer FIFOs to the DPRAM cache with “bucket ordering”, then writing page terminator tokens after data
storage. CRC mismatch: writing page terminator tokens after NO data storage.
fig.3 screenshot from functional simulation of “sim_test_assembly_oct_slave_ta1” (exploiting an abstract model for the
UNIPHY controller): detail of DPRAM cache address management during the “bucket-ordering” of incoming data. The
least significant 4 bits of the leading edge coarse time of a data word are used to determine the destination window. For
each window a pointer is incremented each time a new data word is inserted.