INFN-FE, Angelo Cotta Ramusino 2013-10-24
GTK_DataLinksToDDRBuffer module description
last modified 2013-10-24

library ieee;
use ieee.std_logic_1164.all;

entity GTK_DataLinksToDDRBuffer is
  generic (
    DL_NUMBER_OF_HIT_PER_FRAME    : natural := 320;
    DL_NUM_48bit_LINK_SINCH_K28_5 : natural := 128;  -- FOR SIMULATION_ONLY OTHERWISE DL_NUM_48bit_LINK_SINCH_K28_5 = 150000
    DPRAM_CACHE_ADR_WIDTH_IN      : natural := 10;
    DPRAM_CACHE_ADR_WIDTH_OUT     : natural := 8;    -- DPRAM_CACHE_ADR_WIDTH_IN - 2;
    DPRAM_IN_WIDTH                : natural := 64;
    DPRAM_OUT_WIDTH               : natural := 256;
    NUMBER_PRE_SELECTION_WINDOWS  : natural := 16
  );
  port (
    clock320                      : IN STD_LOGIC;  -- acr 2013-02-26 "main_clock" label changed to "clock320" (320.64MHz)
    frontend_clk                  : IN STD_LOGIC;  -- acr 2013-02-26 "frontend_clk" port added
    data_merger_reset             : IN STD_LOGIC;
    DL_reset                      : IN STD_LOGIC;  -- connected to "tx_digitalreset_1x" at the calling level
    DDR2_phy_clk                  : IN STD_LOGIC;  -- acr 2013-02-28 added "DDR2_phy_clk" feeding the read side of the "DPRAM_cache" instances
    DL_tx_clkout                  : IN STD_LOGIC;  -- acr 2012-01-24 frequency of "tx_clkout_TX" is serializer_clk (3.2G) / 10 (8b/10b encoding) / 2 (byte serializer) = 160.32MHz
    DL_frame_generation_enable    : IN STD_LOGIC;  -- acr 2012-01-31 introducing an extra control port to control frame transmission
    DL_frame_rollover_pulse_40MHz : IN STD_LOGIC;  -- acr 2012-01-24
    DL_rx_clkout                  : IN STD_LOGIC_VECTOR(1 downto 0);
    DL_rx_freqlocked              : IN STD_LOGIC_VECTOR(1 downto 0);
    DL_rx_pll_locked              : IN STD_LOGIC_VECTOR(1 downto 0);
    DL_rx_reset                   : IN STD_LOGIC_VECTOR(1 downto 0);
    dpram_cache_rd_adr            : IN STD_LOGIC_VECTOR(DPRAM_CACHE_ADR_WIDTH_OUT-1 downto 0);  -- output port is 4x wide -> out depth is 2^(2 bits less than input)
    dpram_cache_rden              : IN STD_LOGIC;
    --acr 2013-01-23 BEGIN including XCVR ports to make them available at the top level
    GTK_rx_datain_ch0             : IN STD_LOGIC_VECTOR(15 DOWNTO 0);
    GTK_rx_ctrldetect_ch0         : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
    GTK_rx_datain_ch1             : IN STD_LOGIC_VECTOR(15 DOWNTO 0);
    GTK_rx_ctrldetect_ch1         : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
    tx_dataout_ch0                : OUT STD_LOGIC_VECTOR (15 DOWNTO 0);
    tx_dataout_ch1                : OUT STD_LOGIC_VECTOR (15 DOWNTO 0);
    tx_ctrlenable_ch0             : OUT STD_LOGIC_VECTOR ( 1 DOWNTO 0);
    tx_ctrlenable_ch1             : OUT STD_LOGIC_VECTOR ( 1 DOWNTO 0);
    --acr 2013-01-23 END including XCVR ports to make them available at the top level
    dbg_trig_gen_enable_out       : out std_logic;  -- acr 2012-08-10
    DPRAM_0_data_valid            : OUT STD_LOGIC;
    DPRAM_1_data_valid            : OUT STD_LOGIC;
    dpram_cache_q                 : OUT STD_LOGIC_VECTOR(DPRAM_OUT_WIDTH-1 downto 0);
    --acr 2013-01-23 BEGIN including diagnostic outputs to make them available at the top level
    total_frame_counter_ch0, error_frame_counter_ch0, good_frame_counter_ch0,
    total_frame_counter_ch1, error_frame_counter_ch1, good_frame_counter_ch1 : OUT std_logic_vector (31 downto 0);  -- acr 2013-01-23 don't use array () of std_logic_vector to be compatible with Verilog port assignments
    error_flag_ch0, error_flag_ch1 : OUT std_logic
    --acr 2013-01-23 END including diagnostic outputs to make them available at the top level
  );
end entity;

Clock domain(s):
  o “clock320”: in; 320.64 MHz;
  o “frontend_clk”: in; 240.48 MHz;
  o “DDR2_phy_clk”: in; 120.24 MHz (half of DDR2 SDRAM clock frequency);
  o “DL_tx_clkout”: in; 160.32 MHz;
  o “DL_rx_clkout”: in; 160.32 MHz;

Overview:

The GTK_DataLinksToDDRBuffer module’s main function is to capture, format, pre-order and buffer the data coming from the GTK-RO de-serializers connected to 2 TDCpix data output channels. The module thus contains 2 GTK_DataReceiver modules, 4 input de-randomizer FIFOs (two per channel) and 2 DPRAM cache memories. The two DPRAMs are alternately connected (through a DDR2 controller module described by a different source file) to the external DDR2 memory buffer, in which the TDCpix data waits to be selected by the L0 trigger. The de-serialized, 16-bit wide input data is formatted into 48-bit words representing either “hit” measurements or “framing” words (two framing words are generated in this debug design).
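As a quick consistency check on the figures quoted above, the following Python sketch reproduces the clock arithmetic given in the “DL_tx_clkout” port comment (serializer clock / 10 / 2) and derives an upper bound on the number of 48-bit words a 16-bit link can deliver per second. The exact line rate of 3206.4 Mbps is an assumption inferred from the 160.32 MHz value (the source comment rounds it to 3.2G), and the function names are illustrative only.

```python
from fractions import Fraction

def parallel_clock_mhz(line_rate_mbps, encoding_bits=10, byte_serializer=2):
    """Parallel clock = line rate / 10 (8b/10b symbol) / 2 (byte serializer)."""
    return Fraction(line_rate_mbps) / encoding_bits / byte_serializer

def words48_per_second(parallel_clk_mhz, bus_width=16, word_width=48):
    """Upper bound on 48-bit words per second on a 16-bit bus (no idle cycles)."""
    clocks_per_word = word_width // bus_width   # 3 transfers of 16 bits each
    return parallel_clk_mhz * 1_000_000 / clocks_per_word

# Assumed exact line rate: 3206.4 Mbps (quoted as "3.2G" in the source comment).
clk = parallel_clock_mhz(Fraction("3206.4"))
print(float(clk))                       # 160.32
print(float(words48_per_second(clk)))   # 53440000.0
```

Exact fractions are used instead of floats so that the comparison with the documented 160.32 MHz is not affected by rounding.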
The 48-bit words are buffered into de-randomizer FIFOs, which are written during one frame and read out during the next frame if the CRC-16 calculated on the received stream matches the CRC-16 value contained in the second “framing” word; a CRC mismatch instead causes the filled de-randomizer FIFO to be cleared. Two FIFOs are needed for each channel so that one is always free for writing while the other is being read.

The contents of the input FIFO that is active for reading are transferred to the active dual-port SRAM buffer, whose 1024 locations are divided into 16 pages to allow “bucket-ordering” of the TDCpix data while it is stored in the cache DPRAM. The least significant 4 bits of the leading edge coarse time of a data word determine the destination page. When all data from the active input FIFO has been read out (FIFO_empty is set), a page termination code is written to each page. If the input FIFO had been cleared because of a CRC mismatch, each of the 16 DPRAM pages simply contains the terminator code.

To provide continuous operation, the GTK_DataLinksToDDRBuffer module contains two DPRAM_cache modules which are alternately written and read. The DPRAM cache memories are 64 bits wide at the write side and 256 bits wide at the read side; the read side width matches the input width of the data port of the DDR2 controller.

The GTK_DataLinksToDDRBuffer also contains 2 GTK_DataGenerator modules, one per channel, which simulate the data streams coming from the TDCpix. In the design hierarchy targeted for synthesis, the GTK_DataGenerator modules drive two high speed (3.2 Gbps) serializers, which are in turn connected to the GTK-RO optical transceivers, so that data can be looped back through optical fibers. In the design hierarchy used for simulation, the loopback of data is done at the testbench level and involves the un-serialized 16-bit data paths.
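The double-buffering of the de-randomizer FIFOs described in the overview can be illustrated with a small behavioural model. This is a Python sketch for one channel, not the firmware: the class and method names are hypothetical, the CRC result is passed in as a boolean rather than computed, and the even/odd swap is modelled as a single call at the end of each frame.

```python
from collections import deque

class DerandomizerPair:
    """Behavioural model of the two de-randomizer FIFOs of one channel."""

    def __init__(self):
        self.fifos = [deque(), deque()]   # index 0 = EVEN, index 1 = ODD
        self.write_sel = 0                # FIFO currently selected for writing

    def write(self, word48):
        """Store one 48-bit word of the frame being received."""
        self.fifos[self.write_sel].append(word48)

    def end_of_frame(self, crc_ok):
        """Toggle even/odd at the end of a frame, whatever the CRC result.

        Returns the FIFO that becomes active for reading; it is cleared
        first if the CRC check failed.
        """
        just_written = self.fifos[self.write_sel]
        if not crc_ok:
            just_written.clear()          # bad frame: discard its data
        self.write_sel ^= 1               # the other FIFO takes the next frame
        return just_written

pair = DerandomizerPair()
pair.write(1)
pair.write(2)
good = pair.end_of_frame(crc_ok=True)
print(list(good))                  # [1, 2]
good.clear()                       # the reader drains the FIFO during the next frame
pair.write(3)
bad = pair.end_of_frame(crc_ok=False)
print(list(bad))                   # []
```

The model relies on the reader emptying the returned FIFO before the next toggle, which mirrors the requirement that one FIFO is always free for writing.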
Please note that, for the “test_assembly_oct_slave_ta1” module, the 16-bit wide output data port of the “GTK_DataGenerator” is directly connected to the input data port of the “GTK_DataReceiver” in the testbench “test_assembly_oct_slave_ta1_tb”.

Detailed information:

The details of the operation of the GTK_DataGenerator and the GTK_DataReceiver modules are given in their own specific documents.

The GTK_DataLinksToDDRBuffer module operates on different clock domains:
  o “DL_tx_clkout”: in; 160.32 MHz; it comes from the XCVR module at the top of the design hierarchy (or a clock generator in the simulation testbench) and it is only used by the GTK_DataGenerator
  o “DL_rx_clkout”: in; 160.32 MHz; it comes from the XCVR module at the top of the design hierarchy (or a clock generator in the simulation testbench) and it is used by the GTK_DataReceiver to move the de-serialized data into the input de-randomizer FIFO, which also uses DL_rx_clkout at the “write” side
  o “clock320”: in; 320.64 MHz; it is actually unused in the latest version of the source code
  o “frontend_clk”: in; 240.48 MHz; it comes from the main PLL at the top level of the design hierarchy (or a clock generator in the simulation testbench) and it is the main clock for the operation of the GTK_DataLinksToDDRBuffer: it is the timing source for the extraction of data from the input de-randomizer FIFOs and for the “bucket-ordering” and storing of the data into the DPRAM cache
  o “DDR2_phy_clk”: in; 120.24 MHz (half of DDR2 SDRAM clock frequency); it is the timing source for the burst transfer of the data from the “read” side of the DPRAM cache to the input port of the UNIPHY-based DDR2 controller block

Some points worth noting are described below:
  o even / odd toggling: each of the 2 channels handled by the GTK_DataLinksToDDRBuffer module is equipped with 2 de-randomizer FIFOs which are alternately written and read out.
    In the current implementation the toggling between even and odd is caused by a pulse (lasting one “frontend_clk” period) on the signal “FIFO_ch0_paritytoggle_condition_fend_clk”, which is generated when the GTK_DataReceiver module sets (for one “DL_rx_clkout” clock period) its output “GTK_crcvalid_from_trueCRCchkr_chX” (where X stands for 0 or 1), regardless of whether the CRC check is successful or not; if the check is not successful, the de-randomizer FIFO which contains the data of the frame just received is cleared (fig.1).
  o in the current version of the GTK_DataLinksToDDRBuffer source code there is a “derand_FIFO_wrenable_link_X” signal enabling the writing of incoming serial data into the input de-randomizer FIFO for channel X. The enable signal is set by the signal “FIFO_chx_wrenable_condition_fend_clk”, which is set upon the first successful CRC check for that channel. All bad frames and the first good frame preceding the onset of “derand_FIFO_wrenable_link_X” are therefore lost, i.e. unprocessed.
  o the readout of the de-randomizer FIFO is performed under control of the “data_merger_state_machine”, with state register “xfer_sm_state”. The FIFO read operation starts, assuming that the CRC check was successful, when a pulse is detected on “FIFO_chX_rdenable_condition_fend_clk” (derived from “GTK_crcvalid_from_trueCRCchkr_chX”), and thus when a whole frame worth of data is stored in the input de-randomizer FIFO. The read operation ends, again assuming a successful CRC check, when the FIFO becomes empty: at this point the flag “xfer_to_DPRAM_cache_dne_chX” is set, and when both channels have set this flag the state machine proceeds to write the termination token at the end of each of the 16 “bucket-ordering” pages into which the cache DPRAM is subdivided.
    If the CRC check was unsuccessful, the “data_merger_state_machine” finds that the de-randomizer FIFOs active for reading are empty (because they have been reset) with no “xfer_to_DPRAM_cache_dne_chX” set: this condition causes the “data_merger_state_machine” to wait for “DL_frame_rollover_pulse_40MHz_metafree” and then jump to writing the termination tokens at the end of each “bucket-ordering” page (fig.2).
  o fig.3 shows a detail of the DPRAM cache address management during the “bucket-ordering” of incoming data: the least significant 4 bits of the leading edge coarse time of a data word are used to determine the destination window. For each window a pointer is incremented each time a new data word is inserted. When the input de-randomizer FIFO is emptied, the “data_merger_state_machine” manages the storage of a termination token at the end of each “bucket-order” address window.
  o DPRAM0 / DPRAM1 toggling: after being written with the termination tokens, the currently “write-enabled” DPRAM cache becomes “read-enabled”, i.e. available for data transfer to the external DDR2 memory via the ALTERA UNIPHY-based DDR2 controller described in a different source file.
    At the same time the complementary DPRAM cache, which is assumed to have been read out by the DDR2 controller by this time, becomes “write-enabled”.
  o the “end_of_raw_event_extr_flag” signal is set (synchronously with “frontend_clk”) after the terminators are written to the DPRAM cache; with a suitable delay, this sets the output “DPRAM_X_data_valid” toward the module which controls the transfer of the TDCpix (or simulated) frame data from the DPRAM cache to the target row of the external DDR2 memory.

Notes:
  o still to be documented: the sizes of the input FIFO, of the DPRAM cache and of the windows, and how the terminators are built
  o nota bene: the CRC error induced on frame 3 must be removed
  o it should be pointed out that the 4 bits examined for the bucket ordering in this test project are bits 24..21 (bits 7..4 of the coarse time stamp)

fig.1 screenshot from functional simulation of “sim_test_assembly_oct_slave_ta1” (exploiting an abstract model for the UNIPHY controller): even/odd toggling details.
Annotations in the figure:
  o the ODD FIFO is cleared
  o the EVEN FIFO is now selected to receive data for the next frame
  o “CRCbad” is high when “CRCvalid” is set (error induced by firmware)
  o the gtk_datareceiver is already receiving the next data frame and calculating the new CRC

fig.2 screenshot from functional simulation of “sim_test_assembly_oct_slave_ta1” (exploiting an abstract model for the UNIPHY controller): “bucket-ordering” of incoming data and storage into the DPRAM cache memories in two cases: positive CRC-16 check (left frame) and mismatch between the CRC-16 calculated on incoming data and that extracted from the second trailer word.
Annotations in the figure:
  o CRC match
  o transferring (simulated) TDCpix data from the de-randomizer FIFOs to the DPRAM cache with “bucket ordering”
  o CRC mis-match
  o writing page terminator tokens after data storage
  o writing page terminator tokens after NO data storage (CRC mismatch)

fig.3 screenshot from functional simulation of “sim_test_assembly_oct_slave_ta1” (exploiting an abstract model for the UNIPHY controller): detail of DPRAM cache address management during the “bucket-ordering” of incoming data. The least significant 4 bits of the leading edge coarse time of a data word are used to determine the destination window. For each window a pointer is incremented each time a new data word is inserted.
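
The address management summarised in the fig.3 caption can be sketched as follows. This is a behavioural Python model under stated assumptions, not the firmware: the 1024-location cache is split into 16 equal windows of 64 locations, the window index is taken from 4 bits of the coarse time (the bit offset is a parameter, since the text quotes the least significant 4 bits while the author's note mentions bits 7..4 for this test project), the value of TERMINATOR is a placeholder because the real token format is not documented here, and a full window is simply reported as an error.

```python
CACHE_DEPTH = 1024                         # 2**DPRAM_CACHE_ADR_WIDTH_IN write-side locations
NUM_WINDOWS = 16                           # NUMBER_PRE_SELECTION_WINDOWS
WINDOW_SIZE = CACHE_DEPTH // NUM_WINDOWS   # 64 locations per window (assumed equal split)
TERMINATOR = 0xFFFF_FFFF_FFFF_FFFF         # placeholder: the real token is not documented

def bucket_order(coarse_times_and_words, bit_offset=0):
    """Fill a model of the cache DPRAM window by window.

    coarse_times_and_words: iterable of (coarse_time, data_word) pairs.
    bit_offset: position of the lowest of the 4 selecting bits in coarse_time.
    """
    cache = [None] * CACHE_DEPTH
    pointers = [0] * NUM_WINDOWS           # one write pointer per window
    for coarse_time, word in coarse_times_and_words:
        window = (coarse_time >> bit_offset) & 0xF
        # Keep one location free for the terminator token.
        if pointers[window] >= WINDOW_SIZE - 1:
            raise OverflowError(f"window {window} is full")
        cache[window * WINDOW_SIZE + pointers[window]] = word
        pointers[window] += 1
    # Once the input FIFO is empty, close every window with a terminator.
    for window in range(NUM_WINDOWS):
        cache[window * WINDOW_SIZE + pointers[window]] = TERMINATOR
    return cache, pointers

cache, pointers = bucket_order([(0x12, "a"), (0x03, "b"), (0x22, "c")])
print(pointers[2], pointers[3])                           # 2 1
print(cache[128], cache[129], cache[130] == TERMINATOR)   # a c True
print(cache[0] == TERMINATOR)                             # True
```

Calling bucket_order with an empty input leaves a single terminator at the start of every window, which corresponds to the CRC-mismatch case shown in fig.2.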