Flow Stats Module
James Moscola
September 12, 2007

SPP V1 LC Egress with 1x10Gb/s Tx
[Block diagram. Labeled blocks: MSF, RBUF, Rx1, Rx2, Key Extract, Lookup, TCAM, Hdr Format, SWITCH, Port Splitter, QM0-QM3, Flow Stats1, Flow Stats2, Stats (1 ME), Freelist, 1x10G Tx1, 1x10G Tx2, TBUF, RTM, SRAM1-SRAM3, and the XScale (NAT Miss Scratch Ring, NAT Pkt return, Archive Records). Links are labeled NN, SCR, or SRAM.]

SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram. Same blocks as the 1x10Gb/s configuration, except that the transmit blocks are 5x1G Tx1 (P0-P4) and 5x1G Tx2 (P5-P9).]

Overview of Flow Stats
Main functions
» Uniquely identify flows based on 6-tuple
    Hash header values to get an index into a table of records
» Maintain packet and byte counts for each flow
    Compare packet header with header values in record, and increment if same
    Otherwise, follow hash chain until correct record is found
» Send flow information to XScale for archiving every five minutes
Secondary functions
» Maintain hash table
    Identify and remove flows that are no longer active
    Invalid flows are removed so memory can be reused

Design Considerations
Efficiently maintaining a hash table with chained collisions
» Efficiently inserting and deleting records
» Efficiently reading hash table records
Synchronization issues
» Multiple threads modifying hash table and chains

Flow Record
Total Record Size = 8 32-bit words
» V is valid bit
    Only needed at head of chain
    '1' for valid record
    '0' for invalid record
» Start timestamp (16 bits) is set when record starts counting flow
    Reset to zero when record is archived
» End timestamp (16 bits) is set each time a packet is seen for the given flow
» Packet and Byte counters are incremented for each packet on the given flow
    Reset to zero when record is archived
» Next Record Number is next record in hash chain
    0x1FFFF if record is tail
    Address of next record = (next_record_num * record_size) + collision_table_base_addr

Record layout:
    LW0: Source Address (32b)
    LW1: Destination Address (32b)
    LW2: SrcPort (16b) | DestPort (16b)
    LW3: Reserved (12b) | Slice ID (12b) | Protocol (8b)
    LW4: V (1b) | Reserved (14b) | Next Record Number (17b)
    LW5: Packet Counter (32b)
    LW6: Byte Counter (32b)
    LW7: Start Timestamp (16b) | End Timestamp (16b)
    Highlighted fields on the slide are members of the 6-tuple

Timestamp Details
Timestamp on XScale is 64 bits
Storing 64-bit start and end timestamps would make each flow record too large for a single SRAM read
Instead, store only the 16 bits of each timestamp required to represent a five-minute time interval
» Clock frequency = 1.4 GHz
» Timestamp increments every 16 clock cycles
» Use bits 41:26 for 16-bit timestamps
    (2^26 * 16 cycles) / 1.4 GHz = 0.767 seconds
    (2^41 * 16 cycles) / 1.4 GHz = 25131.69 seconds (418 minutes)
» Time interval that can be represented using these bits
    0.767 seconds through 418 minutes

Hash Table Memory
Allocating 4 MBytes in SRAM Channel 3 for hash table
» Supports ~130K records
» Divided memory 75% for the main table and 25% for the collision table
» Memory required = Main_table_size + Collision_table_size
    = .75*(#records * #bytes/record) + .25*(#records * #bytes/record)
    = ~98K records + ~32K records
    = ~3 MBytes + ~1 MByte
Space for main table and collision table can be adjusted to tune performance
» Larger main table means fewer collisions, but still need adequate space for collision table
[Figure: Main Table (~75%) next to Collision Table (~25%)]

Inserting Into Hash Table
IXP has 3 different hash functions (48-bit, 64-bit, 128-bit)
» Using the 64-bit hash function is sufficient and takes less time than the 128-bit hash function
    Source Addr and Protocol are not included in the hash key
    HASH(D.Addr, S.Port, D.Port);
Result of hash is used to address the main hash table
» Since we want ~100K records in the main table, the hash result is reduced to get as close to 100K entries as possible by adding a 16-bit and a 15-bit chunk of the hash result
    hash_result(15:0) + hash_result(30:16) = record_number
» Records in the main table represent the head of a chain
» If slot at head of chain is empty (valid_bit=0), store record there
» If slot at head of chain is occupied, compare 6-tuple
    If 6-tuple matches
        If packet_count == 0 then (existing flows will have a packet_count of 0 when previous packets on the flow have just been archived)
            - Increment packet_counter for record
            - Add size of current packet to byte_counter
            - Set start and end timestamps
        If packet_count > 0 then
            - Increment packet_counter for record
            - Add size of current packet to byte_counter
            - Set end timestamp
    If 6-tuple doesn't match, then a collision has occurred and the record needs to be stored in the collision table
[Figure: Main Table and Collision Table]

Hash Collisions
Hash collisions are chained in a linked list
» Head of list is in the main table
» Remainder of list is in collision table
SRAM ring maintains list of free slots in collision table
» Slots are numbered from 0 to #_Collision_Table_Slots
    Same as next_record_number
    To convert to memory address: (slot_num * record_size) + collision_table_base_addr
» When a collision occurs, a pointer to an open slot in the collision table can be retrieved from the SRAM ring
» When a record is removed from the collision table, a pointer is returned to the SRAM ring for the invalidated slot
[Figure: Main Table, Collision Table, and SRAM Ring free list]

Archiving Hash Table Records
Send all valid records in hash table to XScale for archiving every 5 minutes
For each record in the main table (i.e. start of chain) ...
» For each record in hash chain ...
    If record is valid ...
        If packet count > 0 then
            - Send record to XScale via SRAM ring
            - Set packet count to 0
            - Set byte count to 0
            - Leave record in table
        If packet count == 0 then
            - Flow has already been archived
            - No packet has arrived on flow in 5 minutes
            - Record is no longer valid
            - Delete record from hash table to free memory

Info sent to XScale for each flow every 5 minutes:
    LW0: Source Address (32b)
    LW1: Destination Address (32b)
    LW2: SrcPort (16b) | DestPort (16b)
    LW3: Reserved (12b) | Slice ID (12b) | Protocol (8b)
    LW4: Packet Counter (32b)
    LW5: Byte Counter (32b)
    LW6: Start Timestamp_high (32b)
    LW7: Start Timestamp_low (32b)
    LW8: End Timestamp_high (32b)
    LW9: End Timestamp_low (32b)

Deleting Records from Hash Table
While archiving records
» If packet count is zero then remove record from hash table
    Record has already been archived, and no packets have arrived in the last five minutes
To remove a record
» If ((record == head) && (record == tail))
    Valid_bit = 0
» Else if ((record == head) && (record != tail))
    Replace record with record.next
    Free the slot for the moved record
» Else if (record != head)
    Set previous record's next pointer to record.next
    Free slot for the deleted record
[Figure: Main Table, Collision Table, and SRAM Ring free list]

Memory Synchronization Issues
Multiple threads reading/writing same block of memory
Only allow 1 ME to modify structure of hash table
» Inserting and deleting nodes
Use global registers to indicate that the structure of the hash table is being modified
» Eight global lock registers (1 per thread) to indicate which chain in the hash table is being modified
» When a thread wants to insert/delete a record from hash table
    Store pointer to the head of the hash chain in the thread's dedicated global lock register
    If another thread is processing a packet that hashed to the same hash chain, it waits for the lock register to clear and restarts processing the packet
    Otherwise, continue processing the packet normally
    Clear global lock register when done with inserts/deletes
    Value of 0xFFFFFFFF indicates that lock is clear

Flow Stats Execution
ME 1
» Init - Configure hash function
» 8 threads
    Read packet header
    Hash packet header
    Send header and hash result to ME2 for processing
ME 2 (thread numbers may need adjusting)
» Init - Load SRAM ring with addresses for each slot in the collision table
    Init - Set TIMESTAMP to 0
» 7 threads (ctx 1-7)
    Insert records into hash table
    Increment counter for records
» 1 thread (ctx 0)
    Archive and delete hash table records

Diagram of Flow Stats Execution (ME1)
[Flowchart. Steps and cycle estimates:]
    get buffer handle from QM: 60 cycles
    read packet header (DRAM): 300 cycles
    read buffer descriptor (SRAM): 150 cycles
    send buffer handle to TX: 60 cycles
    build hash key: ~50 cycles
    compute hash: 100 cycles
    send packet info to ME2: 60 cycles
    Total: ~570 cycles

Diagram of Flow Stats Execution (ME2): Incrementing Counters
» Adds records to hash chain, but doesn't remove them
Best: ~360 cycles
Worst: ~520 + 160x cycles
[Flowchart. Legend: iterating through hash chain; locking head of chain. Steps and cycle estimates:]
    get packet info from ME1: 60 cycles
    read hash table record (SRAM): 150 cycles
    valid?
        No: set register to lock chain, insert new record (150 cycles), clear lock register
        Yes: compare record to header (~10 cycles)
    match?
        Yes, count == 0: set register to lock chain, write START/END time & new counts (150 cycles), clear lock register
        Yes, count > 0: set register to lock chain, write END time & new counts (150 cycles), clear lock register
        No: check tail?
    tail?
        Yes: set register to lock chain, get record slot from freelist (150 cycles), insert new record (150 cycles), clear lock register
        No: read next record in chain (150 cycles) and compare again (repeated x times)

Diagram of Flow Stats Execution (ME2): Archiving Records
» Removes records from hash chain, but doesn't add them
» Archiving of records occurs every five minutes
[Flowchart. Legend: waiting to archive; locking head of chain. Steps:]
    read current time; 5 minutes? If no, keep waiting
    read next record from main table
    valid? If no, move on (done with all records?)
    count == 0?
        No: send record to XScale, set register to lock chain, reset counters and timestamps, clear lock register
        Yes, head of list and tail of list: set register to lock chain, set valid bit to zero, clear lock register
        Yes, head of list but not tail: set register to lock chain, read record.next, replace record with record.next, clear lock register, return record.next slot to freelist
        Yes, not head of list: set register to lock chain, write next_ptr to previous list item, clear lock register, return record slot to freelist
    more records in chain? If yes, read next record in chain
    done with all records? If no, read next record from main table

Return from Swap
When returning from each CTX switch, always check global lock registers
» If any of the global locks contains the address of the hash chain that the current thread is trying to modify, then the hash chain is locked and the current thread must restart processing of the current packet
» If none of the global locks contains the address of the hash chain that the current thread is trying to modify, then the current thread can continue processing that packet as usual
[Flowchart: check global lock values; match current chain? Yes: restart processing packet. No: continue processing packet.]

SPP V1 LC Egress with 1x10Gb/s Tx
[Block diagram of QM0-QM3, Flow Stats1, Flow Stats2, 1x10G Tx1, Freelist, SRAM3, and the XScale (Archive Records), annotated with the data formats passed between blocks:]
    From QM: V (1b) | Rsv (3b) | Port (4b) | Buffer Handle (24b), where V is the valid bit
    Flow Stats1 to Flow Stats2: Source Address (32b), Destination Address (32b), SrcPort (16b), DestPort (16b), Protocol (8b), Reserved (8b), Packet Length (16b), Rsvd (3b), Slice ID (12b), Hash Result (17b)
    Flow Stats2 to XScale: Source Address (32b), Destination Address (32b), SrcPort (16b), DestPort (16b), Reserved (12b), Slice ID (12b), Protocol (8b), Packet Counter (32b), Byte Counter (32b), Start Timestamp_high (32b), Start Timestamp_low (32b), End Timestamp_high (32b), End Timestamp_low (32b)

Flow Statistics Module
Scratch rings
» QM_TO_FS_RING_1: 0x2400 - 0x27FF // for receiving from QM
» QM_TO_FS_RING_2: 0x2800 - 0x2BFF // for receiving from QM
» FS1_TO_FS2_RING: 0x2C00 - 0x2FFF // for sending data from FS1 to FS2
» FS_TO_TX_RING_1: 0x3000 - 0x33FF // for sending data to TX1
» FS_TO_TX_RING_2: 0x3400 - 0x37FF // for sending data to TX2
SRAM rings
» FS2_FREELIST: 0x???? - 0x???? // stores list of open slots in collision table
» FS2_TO_XSCALE: 0x???? - 0x???? // for sending record information to the XScale for archiving

LC Egress SRAM Channel 3 info for Flow Stats
» HASH_CHAIN_TAIL 0x1FFFF // indicates the end of a hash chain
» ARCHIVE_DELAY 0x0188 // 5 minutes
» RECORD_SIZE 8 * 4 = 32 // 8 32-bit words/record * 4 bytes/word
» TOTAL_NUM_RECORDS 130688 // MAX with 4 MB table is ~130K records
» NUM_HASH_TABLE_RECORDS 98304 // NUM_HASH_TABLE_RECORDS<=TOTAL_NUM_RECORDS (mod 32 = 0)
» NUM_COLLISION_TABLE_RECORDS TOTAL_NUM_RECORDS - NUM_HASH_TABLE_RECORDS = 32384
» LCE_FS_HASH_TABLE_BASE SRAM_CHANNEL_3_BASE_ADDR + 0x200000 = 0xC0200000
» LCE_FS_HASH_TABLE_SIZE 0x400000
» LCE_FS_COLLISION_TABLE_BASE (HASH_TABLE_BASE + (RECORD_SIZE * NUM_HASH_TABLE_RECORDS)) = 0xC0500000

End
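
Worked example: timestamp arithmetic
The figures on the Timestamp Details slide can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, and it assumes the ARCHIVE_DELAY value of 0x0188 is counted in units of the 16-bit timestamp (bit 26 of the 64-bit timestamp), which the slides suggest but do not state outright.

```python
# Values taken from the Timestamp Details slide; names are hypothetical.
CLOCK_HZ = 1.4e9        # clock frequency
CYCLES_PER_TICK = 16    # timestamp increments every 16 clock cycles
TS_LOW_BIT = 26         # stored field is bits 41:26 of the 64-bit timestamp
TS_WIDTH = 16


def ticks_to_seconds(ticks):
    """Convert a count of 64-bit timestamp ticks to seconds."""
    return ticks * CYCLES_PER_TICK / CLOCK_HZ


def short_timestamp(ts64):
    """Extract the 16-bit field (bits 41:26) from a 64-bit timestamp."""
    return (ts64 >> TS_LOW_BIT) & ((1 << TS_WIDTH) - 1)


# Weight of the lowest stored bit: the resolution of the 16-bit timestamp.
resolution_s = ticks_to_seconds(1 << TS_LOW_BIT)

# Weight of the highest stored bit (bit 41).
msb_weight_s = ticks_to_seconds(1 << 41)

# Assumption: ARCHIVE_DELAY (0x0188) is expressed in 16-bit timestamp units.
archive_delay_s = 0x0188 * resolution_s

print(round(resolution_s, 3))     # 0.767
print(round(msb_weight_s, 2))     # 25131.69
print(300 < archive_delay_s < 301)  # True, i.e. roughly five minutes
```

Under that assumption, 0x0188 (392) timestamp units works out to just over 300 seconds, which agrees with the "5 minutes" comment on the constants slide.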
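
Worked example: record numbers and SRAM addresses
The hash folding from the Inserting Into Hash Table slide and the address formula from the Flow Record slide can be checked against the constants listed for SRAM Channel 3. This is a minimal sketch with hypothetical function names; the numeric values come from the slides, and a plain integer stands in for the result of the IXP hash unit.

```python
# Constants from the "LC Egress SRAM Channel 3 info" slide.
RECORD_SIZE = 8 * 4                      # 8 32-bit words/record * 4 bytes/word
TOTAL_NUM_RECORDS = 130688
NUM_HASH_TABLE_RECORDS = 98304
NUM_COLLISION_TABLE_RECORDS = TOTAL_NUM_RECORDS - NUM_HASH_TABLE_RECORDS
HASH_TABLE_BASE = 0xC0200000
COLLISION_TABLE_BASE = HASH_TABLE_BASE + RECORD_SIZE * NUM_HASH_TABLE_RECORDS
HASH_CHAIN_TAIL = 0x1FFFF


def main_table_record_number(hash_result):
    """Fold the hash: hash_result(15:0) + hash_result(30:16)."""
    low16 = hash_result & 0xFFFF
    next15 = (hash_result >> 16) & 0x7FFF
    return low16 + next15


def main_table_addr(record_number):
    """Byte address of a head-of-chain record in the main table."""
    return HASH_TABLE_BASE + record_number * RECORD_SIZE


def collision_table_addr(next_record_num):
    """(next_record_num * record_size) + collision_table_base_addr."""
    if next_record_num == HASH_CHAIN_TAIL:
        raise ValueError("tail marker has no address")
    return next_record_num * RECORD_SIZE + COLLISION_TABLE_BASE


# Largest record number the fold can produce: 0xFFFF + 0x7FFF.
max_record_number = main_table_record_number(0x7FFFFFFF)

print(hex(COLLISION_TABLE_BASE))        # 0xc0500000
print(max_record_number)                # 98302
print(NUM_COLLISION_TABLE_RECORDS)      # 32384
```

The largest folded value, 98302, stays below NUM_HASH_TABLE_RECORDS (98304), so every hash result lands inside the main table without a modulo step.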
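
Worked example: chained table with freelist and archive pass
The insert, archive, and delete rules from the Inserting Into Hash Table, Archiving Hash Table Records, and Deleting Records from Hash Table slides can be modeled in software. The sketch below is a single-threaded simulation, not microengine code: the class and method names are invented for illustration, a deque stands in for the SRAM ring free list, Python objects stand in for SRAM records, and the lock registers are left out entirely.

```python
from collections import deque

HASH_CHAIN_TAIL = 0x1FFFF  # next-record value marking the end of a chain


class Record:
    def __init__(self):
        self.valid = False
        self.key = None            # stands in for the 6-tuple
        self.packets = 0
        self.bytes = 0
        self.start = 0
        self.end = 0
        self.next = HASH_CHAIN_TAIL


class FlowTable:
    def __init__(self, main_slots, collision_slots):
        self.main = [Record() for _ in range(main_slots)]
        self.coll = [Record() for _ in range(collision_slots)]
        self.free = deque(range(collision_slots))  # free collision slots

    @staticmethod
    def _fill(rec, key):
        rec.valid, rec.key = True, key
        rec.packets = rec.bytes = 0
        rec.next = HASH_CHAIN_TAIL

    def update(self, index, key, length, now):
        """Count one packet of `length` bytes for flow `key`."""
        rec = self.main[index]
        if not rec.valid:
            self._fill(rec, key)           # empty head: store record there
        while rec.key != key:              # collision: walk the chain
            if rec.next == HASH_CHAIN_TAIL:
                slot = self.free.popleft() # IndexError if no free slot is left
                rec.next = slot
                rec = self.coll[slot]
                self._fill(rec, key)
            else:
                rec = self.coll[rec.next]
        if rec.packets == 0:               # new or just-archived flow
            rec.start = now
        rec.packets += 1
        rec.bytes += length
        rec.end = now

    @staticmethod
    def _emit(rec, out):
        out.append((rec.key, rec.packets, rec.bytes, rec.start, rec.end))
        rec.packets = rec.bytes = rec.start = 0

    def archive(self):
        """Emit active records, delete idle ones, return the emitted list."""
        out = []
        for head in self.main:
            # Idle head: invalidate it, or replace it with record.next.
            while head.valid and head.packets == 0:
                if head.next == HASH_CHAIN_TAIL:
                    head.valid = False
                else:
                    slot = head.next
                    head.__dict__.update(self.coll[slot].__dict__)
                    self.free.append(slot)
            if not head.valid:
                continue
            self._emit(head, out)
            prev = head
            while prev.next != HASH_CHAIN_TAIL:
                slot = prev.next
                rec = self.coll[slot]
                if rec.packets > 0:
                    self._emit(rec, out)
                    prev = rec
                else:                      # idle non-head record: unlink it
                    prev.next = rec.next
                    self.free.append(slot)
        return out
```

A short usage example: two flows that hash to the same main-table slot are counted, archived, and then aged out over successive archive passes.

```python
table = FlowTable(main_slots=4, collision_slots=4)
table.update(1, "A", 100, now=10)
table.update(1, "B", 50, now=11)    # collides with A, goes to the collision table
table.update(1, "A", 20, now=12)
first = table.archive()             # both flows are active and get archived
table.update(1, "B", 5, now=20)
second = table.archive()            # A is idle and is deleted; B moves to the head
third = table.archive()             # B is now idle and is deleted as well
```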
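
Worked example: lock register check after a context swap
The check described on the Return from Swap slide amounts to comparing the address of the current hash chain against every global lock register. The sketch below models the eight registers as a Python list; the function names are hypothetical, and skipping the calling thread's own register is an assumption of this sketch rather than something the slides specify.

```python
LOCK_CLEAR = 0xFFFFFFFF   # value indicating that a lock register is clear
NUM_THREADS = 8           # one global lock register per thread

lock_registers = [LOCK_CLEAR] * NUM_THREADS


def lock_chain(thread_id, chain_head_addr):
    """Record that `thread_id` is modifying the chain at `chain_head_addr`."""
    lock_registers[thread_id] = chain_head_addr


def clear_lock(thread_id):
    """Clear the thread's lock register once the insert/delete is done."""
    lock_registers[thread_id] = LOCK_CLEAR


def must_restart(thread_id, chain_head_addr):
    """True if another thread has locked the chain this thread is working on.

    Assumption: a thread ignores its own lock register.
    """
    return any(
        value == chain_head_addr
        for other, value in enumerate(lock_registers)
        if other != thread_id
    )


# Thread 0 (the archiving context) locks a chain; thread 3 wakes up on it.
lock_chain(0, 0xC0200040)
restart_same_chain = must_restart(3, 0xC0200040)    # True: restart the packet
restart_other_chain = must_restart(3, 0xC0200060)   # False: continue as usual
clear_lock(0)
restart_after_clear = must_restart(3, 0xC0200040)   # False: lock was cleared
```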