APEX CAM as Cache for External CAM

White Paper M-WP-APEXCAMX-01, January 2001, ver. 1.0

Introduction

In designs that use large, discrete content-addressable memory (CAM) devices, embedded CAM can be used as a cache block for the external CAM device, storing the most frequently requested data. External CAM devices have longer access times than the embedded CAM in APEX™ devices. This white paper describes how embedded CAM can be used with external CAM and a cache megafunction to decrease access time and perform faster searches.

The CAM memory architecture implements fast searches by using the contents of the memory to locate the address. With CAM, the system supplies the data, and the CAM searches its memory space and returns the address at which the data was found. CAM can accelerate any application that requires fast searches through a database or list.

Cache for Large External CAM

CAM cache speeds memory access by providing an embedded structure that stores words frequently requested from the external CAM. Because the CAM cache is faster and has lower latency than an external device, it can present the requesting system with a match address several clock cycles before the external CAM device does. The CAM cache is connected in parallel with the external memory source; the system sends each read request simultaneously to the cache and the external CAM. If the data word is stored in the CAM cache memory, the cache reports the match to the system, usually several clock cycles before the external CAM device finds its match. If the CAM cache reports a miss, it waits for the external CAM to find a match and then writes the new word and address into the cache.

There are two major cache architectures: direct mapped and fully associative. Direct mapped cache requires that each word be stored at one particular location in the cache, while fully associative cache allows a word to be stored at any location in the cache.

Direct Mapped Cache

To reduce resource usage, direct mapped cache stores each word from external memory at the same specific cache location. The cache typically uses CAM to index a RAM table containing the appropriate data. The cache cannot be implemented exclusively in the APEX embedded CAM block because the embedded CAM has a smaller address width than the external CAM. For example, a 32-word cache block has an address width of five bits, whereas a 256-word CAM block has an address width of eight bits. Therefore, a RAM block is required to store the additional three address bits that the embedded CAM block cannot store.

Figure 1 shows an example of a 4-word × 5-bit cache block designed to cache a 32-word × 5-bit external CAM block. This is shown only as an example; actual applications store more data. The match address is stored collectively by the RAM and CAM blocks: the CAM block stores the upper two bits of the five-bit external CAM address, and the RAM block stores the lower three bits of the address. If the external CAM device stores the word 45 at address location 00110, the CAM portion of the cache stores the word 45 at location 00 while the RAM portion stores the address bits 110 at address location 00. The match address is formed by concatenating the CAM output address 00 with the RAM output data 110 to produce the correct match address 00110.

Figure 1. Direct Mapped Cache Block Diagram (a 4-word CAM tag block, a RAM block holding the lower address bits, and the cache controller)
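As an illustration of the address split in Figure 1, the following VHDL fragment is a minimal sketch, not part of the Altera reference design, that reassembles the external match address from the CAM and RAM outputs. The entity and signal names are illustrative only.

library ieee;
use ieee.std_logic_1164.all;

-- Minimal sketch: rebuild the 5-bit external match address of Figure 1
-- from the 2-bit CAM match address and the 3-bit RAM data word.
entity dm_match_concat is
  port (
    cam_match_addr : in  std_logic_vector(1 downto 0);  -- upper two address bits (CAM output)
    ram_data       : in  std_logic_vector(2 downto 0);  -- lower three address bits (RAM output)
    match_addr     : out std_logic_vector(4 downto 0)   -- reconstructed external CAM address
  );
end entity dm_match_concat;

architecture rtl of dm_match_concat is
begin
  -- For the example in the text: "00" & "110" = "00110".
  match_addr <= cam_match_addr & ram_data;
end architecture rtl;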
The direct mapped structure is an efficient cache implementation because it consumes the least logic, but it achieves a lower hit rate than associative cache. The major drawback of direct mapped cache is that it can miss continually when two addresses that map to the same cache location are referenced repeatedly.

Fully Associative Cache

Fully associative cache places no restrictions on where words can be stored in the cache. This improves throughput because it reduces the number of cache misses. However, fully associative cache consumes more memory resources than direct mapped cache and also requires a replacement algorithm to determine where to write new words into the cache.

Fully associative cache uses the same type of CAM indexing and RAM search structure as direct mapped cache. It can place words anywhere in the cache because it stores the entire external address in RAM. Figure 2 shows the block diagram of a fully associative cache block. If the external CAM device stores the word 45 at address location 00110, the CAM portion of the cache memory stores the word 45 at location 00, which indexes address location 00 in the RAM block, where the full address 00110 is stored.

Figure 2. Fully Associative Cache Block Diagram (the CAM tag block indexes a RAM block that stores the complete external CAM address)
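To make the indexing step concrete, the following VHDL fragment is a minimal sketch, not part of the Altera reference design, of the RAM lookup in a fully associative cache. The RAM is preloaded with the example addresses shown in Figure 2; the entity and signal names are illustrative only.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal sketch: the CAM match location indexes a small RAM that holds
-- the complete 5-bit external CAM address (contents taken from Figure 2).
entity fa_addr_lookup is
  port (
    clock          : in  std_logic;
    cam_match_addr : in  std_logic_vector(1 downto 0);  -- location of the matching CAM tag
    ext_addr       : out std_logic_vector(4 downto 0)   -- full external CAM address
  );
end entity fa_addr_lookup;

architecture rtl of fa_addr_lookup is
  type ram_t is array (0 to 3) of std_logic_vector(4 downto 0);
  -- Example contents from Figure 2: tags 45, 61, 21, 1C map to these addresses.
  signal addr_ram : ram_t := ("00110", "00011", "01000", "11111");
begin
  process (clock)
  begin
    if rising_edge(clock) then
      ext_addr <= addr_ram(to_integer(unsigned(cam_match_addr)));
    end if;
  end process;
end architecture rtl;

A direct mapped cache uses the same lookup, except that the RAM word holds only the address bits that the embedded CAM block cannot store.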
The CAM cache system is modeled by the block diagram in Figure 3. As the diagram indicates, the cache uses registers on its inputs to produce a two-clock-cycle hit latency. Any of these registers can be removed to reduce the latency at the expense of the maximum clock frequency.

Figure 3. CAM Cache Block Diagram (the internal CAM/RAM cache in the APEX 20KE device sits between the system and the external CAM)

Cache Reference Design

Altera has created reference designs that implement either direct mapped or fully associative cache. These designs are available on the Altera web site (http://www.altera.com). Table 1 describes the input and output ports of the cache reference designs, and Table 2 describes the parameters used to customize them.

Table 1. Cache Reference Designs Port Listing

  clock (required, input): Clock signal for all registers.
  aclr (required, input): Register clear signal.
  data_valid (required, input): Indicates that the current value on the datain bus is registered and initiates the search process.
  datain[] (required, input): Input data for searching, writing, or erasing from the cache.
  write (required, input): Indicates that the cache should write the values on the address_ext and datain buses.
  delete (required, input): Deletes the values on the datain and address_ext buses from the cache.
  ext_match_found (required, input): Indicates that the external CAM has found a match; address_ext reflects the matching address when ext_match_found is high.
  address_ext[] (required, input): Address or associated data value from the external process.
  cache_busy (optional, output): Indicates that the cache is busy writing or erasing.
  cache_hit (required, output): Signals that the cache has found a match for the value on the datain bus.
  dataout[] (required, output): Contains the associated data for the value on the datain bus.

Table 2. Cache Reference Designs Parameter Listing

  depth: Cache depth.
  datain_size: Width of the data values stored in the cache.
  dataout_size: Width of the address values stored in the cache.
  cache_widthad: Width of the CAM block's local address bus; this value must be log2(depth).
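To summarize the interface in Tables 1 and 2, the following VHDL package contains a component declaration sketch. The port and generic names are taken from the tables; the component name cam_cache, the generic types, and the assumption that address_ext and dataout share the dataout_size width are illustrative and may differ from the actual reference design.

library ieee;
use ieee.std_logic_1164.all;

package cam_cache_pkg is

  -- Sketch of the cache reference design interface; cam_cache is a
  -- placeholder name for the entity provided by the reference design.
  component cam_cache
    generic (
      depth         : natural;  -- cache depth (Table 2)
      datain_size   : natural;  -- width of the data values stored in the cache
      dataout_size  : natural;  -- width of the address values stored in the cache
      cache_widthad : natural   -- width of the CAM block's local address bus, log2(depth)
    );
    port (
      clock           : in  std_logic;
      aclr            : in  std_logic;
      data_valid      : in  std_logic;  -- registers datain and starts a search (Figure 4)
      datain          : in  std_logic_vector(datain_size - 1 downto 0);
      write           : in  std_logic;  -- direct cache write (Figure 6)
      delete          : in  std_logic;  -- erase request (Figure 7)
      ext_match_found : in  std_logic;  -- external CAM found a match (Figure 5)
      address_ext     : in  std_logic_vector(dataout_size - 1 downto 0);
      cache_busy      : out std_logic;  -- high while the cache writes or erases
      cache_hit       : out std_logic;  -- asserted two clock cycles after data_valid on a hit
      dataout         : out std_logic_vector(dataout_size - 1 downto 0)
    );
  end component;

end package cam_cache_pkg;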
CAM Cache Operation

The APEX CAM cache is implemented as a parameterizable VHDL design. (There are two reference designs: one for direct mapped cache and another for fully associative cache.) These designs are composed of three modules:

■ Controller finite state machine (FSM)
■ CAM module
■ RAM module

The controller FSM is responsible for writing to and reading from the CAM and RAM modules. You can easily customize the FSM for additional functionality. The CAM cache supports four operations: read, write-back (cache miss), write directly to cache, and erase.

Read

A cache read is initiated by applying an input pattern on the datain bus and asserting data_valid (see Figure 4). The data_valid signal registers the value on the datain bus and signals the controller FSM to search the cache for a match. The cache asserts cache_hit two clock cycles after data_valid if a match is found. The matching address can be sampled from the dataout bus while cache_hit is high.

Figure 4. Read Timing Waveform (clock, data_valid, datain, cache_hit, and dataout during a cache hit)

Write-Back (Cache Miss)

The controller FSM writes a new data value into the cache every time there is a cache miss. If the cache_hit signal does not go high two clock cycles after data_valid, the cache enters a wait state. The controller remains in the wait state until the external memory finds a match (see Figure 5). When the external memory completes its search, assert the ext_match_found signal along with the correct address or translation value (address_ext) and the datain value. The ext_match_found signal indicates that the controller should write the values on the datain and address_ext buses into the cache memory. The cache asserts the cache_busy signal for two clock cycles while storing the new data and is ready for a new access when cache_busy is deasserted.

Figure 5. Cache Read & Write-Back Timing Waveform (clock, data_valid, datain, cache_hit, ext_match_found, address_ext, and cache_busy during a miss)

Write Directly to Cache

The system can write a value directly into the cache (see Figure 6). The cache is written by applying a value on the datain and address_ext buses and asserting the write signal for one clock cycle. cache_busy goes high during a cache write, and the cache is ready for a new access when cache_busy returns low. The cache_busy signal is asserted for four or six clock cycles, depending on whether the word being written already exists in the cache. The CAM block of the cache operates in single-match mode, so every time the CAM is written, a check is performed to determine whether the word already exists in the CAM.

Figure 6. Cache Write Timing Waveform (clock, write, datain, address_ext, and the two possible cache_busy waveforms)
Notes:
(1) This cache_busy waveform applies when datain does not already exist in the cache.
(2) This cache_busy waveform applies when datain already exists in the cache.

A datain value is written to a different internal cache address for direct mapped and fully associative caches. A direct mapped cache writes the datain value to the cache location specified by the higher-order bits of the address_ext bus. An associative cache uses a replacement algorithm to determine the cache address location to which a word should be written. This associative cache uses a counter-based replacement algorithm to determine the internal write address for the cache CAM and RAM blocks, as in the sketch below.
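The following VHDL fragment is a minimal sketch of such a counter-based (round-robin) replacement pointer; it is illustrative only, and the reference design's replacement logic may differ in detail. The generic name cache_widthad follows Table 2; everything else is assumed.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal sketch: a wrapping counter that supplies the internal CAM/RAM
-- write address for a fully associative cache and advances after every
-- cache write.
entity replace_counter is
  generic (
    cache_widthad : natural := 2  -- log2(depth); 2 for the 4-word example cache
  );
  port (
    clock       : in  std_logic;
    aclr        : in  std_logic;
    cache_write : in  std_logic;  -- pulses once per cache write
    write_addr  : out std_logic_vector(cache_widthad - 1 downto 0)
  );
end entity replace_counter;

architecture rtl of replace_counter is
  signal count : unsigned(cache_widthad - 1 downto 0) := (others => '0');
begin
  write_addr <= std_logic_vector(count);

  process (clock, aclr)
  begin
    if aclr = '1' then
      count <= (others => '0');
    elsif rising_edge(clock) then
      if cache_write = '1' then
        count <= count + 1;  -- wrap-around gives round-robin replacement
      end if;
    end if;
  end process;
end architecture rtl;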
Data can be written to the cache in one of two ways: the system can write to the cache when it writes to external memory, or an external process can write directly to the cache. The system should write to the cache whenever it writes a word to external memory, because that word will probably be referenced again in the near future. To maintain data integrity, you should also write to the external CAM block whenever you write directly to the cache. Because the cache is a write-through cache, it is important to provide a memory controller that writes to the external CAM when you write directly to the cache. You can support the write-through behavior by creating a write buffer and a memory controller in your system-level design that store the data written to the cache and then write that data to the external CAM.

Erase

The system can use the delete signal to delete values from the cache (see Figure 7). Assert the delete signal along with the appropriate datain and address_ext values for one clock cycle. After delete is asserted, the controller FSM checks whether the cache memory contains the value on the datain bus. If the datain value is found in the cache, the FSM deletes that value from the cache. The delete operation completes three clock cycles after the delete signal is asserted.

Figure 7. Cache Erase Timing Waveform (clock, delete, datain, address_ext, and cache_busy)

Using CAM Cache with a RAM Block

CAM blocks are often used to index RAM blocks for data translation. Input data is compared against the CAM contents; if a match is found, the CAM output address indexes the RAM table, which returns the associated data for the CAM input. The APEX 20KE CAM cache can be configured to store either the external CAM output address or the external RAM data output. Generally, the cache has the greatest performance impact when the associated RAM data, rather than the discrete CAM output address, is stored in the cache. This type of cache is larger than one that stores only the output address, but it increases throughput because the process does not have to access the external RAM during a cache hit.

When you use the cache to store the associated data, set the ports and parameters according to Table 3.

Table 3. Cache Configuration for Storing Associated Data in RAM

  datain_size: Set to the same width as the external CAM data input port.
  dataout_size: Set to the same width as the external RAM data output port. However, if the external RAM uses bursting, set dataout_size equal to the size of the RAM word, not the RAM output port.
  datain: Connect the CAM data input to the datain bus.
  address_ext: Present the RAM output data on this port when writing to the cache.
  Other: Connect all other signals to the system process and manage them to meet the timing waveforms presented in this document.

The cache can be used to store the CAM output address if it is configured according to Table 4.

Table 4. Cache Configuration for Storing CAM Output Address

  datain_size: Set to the same width as the external CAM data input port.
  dataout_size: Set to the same width as the CAM output address port.
  datain: Connect the CAM data input to the datain bus.
  address_ext: Apply the CAM output address value to this port during a cache write.
  Other: Connect all other signals to the system process and manage them to meet the timing waveforms presented in this document.
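As a wiring illustration for the Table 3 configuration, the following sketch instantiates the component declared in the earlier cam_cache_pkg sketch (assumed to be compiled into the work library). The cache depth, all widths, and the external signal names are assumptions chosen for the example; only the generic and port names come from Tables 1 through 3.

library ieee;
use ieee.std_logic_1164.all;
use work.cam_cache_pkg.all;  -- component declaration from the earlier sketch

-- Sketch: cache configured to store the associated external RAM data
-- (Table 3). Widths are assumed: 32-bit search data, 16-bit RAM words.
entity cache_assoc_data_example is
  port (
    clock            : in  std_logic;
    reset            : in  std_logic;
    search_strobe    : in  std_logic;                      -- starts a cache lookup
    ext_cam_data_in  : in  std_logic_vector(31 downto 0);  -- word searched in the external CAM
    ext_search_done  : in  std_logic;                      -- external CAM/RAM search complete
    ext_ram_data_out : in  std_logic_vector(15 downto 0);  -- associated data from the external RAM
    hit              : out std_logic;
    associated_data  : out std_logic_vector(15 downto 0)   -- returned directly on a cache hit
  );
end entity cache_assoc_data_example;

architecture rtl of cache_assoc_data_example is
  signal tie_low : std_logic := '0';  -- direct writes and erases are unused in this sketch
begin
  u_cache : cam_cache
    generic map (
      depth         => 32,  -- assumed cache depth
      datain_size   => 32,  -- same width as the external CAM data input port (Table 3)
      dataout_size  => 16,  -- same width as the external RAM word (Table 3)
      cache_widthad => 5    -- log2(32), per Table 2
    )
    port map (
      clock           => clock,
      aclr            => reset,
      data_valid      => search_strobe,
      datain          => ext_cam_data_in,
      write           => tie_low,
      delete          => tie_low,
      ext_match_found => ext_search_done,
      address_ext     => ext_ram_data_out,  -- Table 3: RAM output data drives address_ext on a write-back
      cache_busy      => open,
      cache_hit       => hit,
      dataout         => associated_data
    );
end architecture rtl;

For the Table 4 configuration, dataout_size would instead match the external CAM output address width, and the CAM output address would be applied to address_ext during a cache write.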
Conclusion

CAM can provide a solution for any application that requires a fast memory search. APEX CAM provides a fast and efficient method for implementing the parallel search structure required by cache memory. The cache megafunction can be used in a number of applications, such as ATM switching, that require larger CAMs. A combination of embedded CAM (APEX CAM) and external CAM (discrete CAM) can be used to speed the search process.

101 Innovation Drive, San Jose, CA 95134, (408) 544-7000, http://www.altera.com

Copyright 2001 Altera Corporation. Altera, APEX, and APEX 20KE are trademarks and/or service marks of Altera Corporation in the United States and other countries. Altera acknowledges the trademarks of other organizations for their respective products or services mentioned in this document. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. All rights reserved.