
White Paper
APEX CAM as Cache for External CAM
Introduction
In designs that use larger, discrete content-addressable memory (CAM) devices, you can use embedded CAM as a
cache block that stores the most frequently requested data from the external CAM device. External CAM devices have
longer access times than the embedded CAM in APEX™ devices. This white paper describes how embedded CAM can be
combined with external CAM and a cache megafunction to reduce access time and speed up searches.
The CAM memory architecture implements fast searches by using the contents of the memory, rather than an address, to
locate data. With CAM, the system supplies a data word, CAM compares it against its entire memory space, and CAM
returns the address where the word was found.
CAM can accelerate any application requiring fast searches through a database or list.
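The search described above can be modeled in a few lines of Python. This is a behavioral sketch only: the function name `cam_search` is hypothetical, and the loop stands in for comparisons that CAM hardware performs on every location in the same clock cycle. Returning the lowest matching address is an assumption that mirrors single match behavior.

```python
from typing import Optional, Sequence


def cam_search(contents: Sequence[Optional[int]], data: int) -> Optional[int]:
    """Return the address of the location holding `data`, or None on a miss.

    Hardware compares all locations in parallel; this loop models only the
    result, not the timing. The lowest matching address wins.
    """
    for address, word in enumerate(contents):
        if word == data:
            return address
    return None


cam = [0x45, 0x61, 0x21, 0x1C]
print(cam_search(cam, 0x21))  # 2
print(cam_search(cam, 0x99))  # None
```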
Cache for Large External CAM
CAM cache speeds memory access by providing an embedded structure that stores words that are frequently
requested from the external CAM. Because the CAM cache is faster and has less latency than an external device, it
can present the requesting system with a match address several clock cycles before the external CAM device. The
CAM cache is connected in parallel with the external memory source; the system sends a read request simultaneously
to the cache and the external CAM. If the data word is stored in the CAM cache memory, the cache reports the match to
the system, usually several clock cycles before the external CAM device finds a match. If the CAM cache reports a
miss, it waits for the external CAM to find a match and then writes the new word and its address into the cache.
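The parallel request and write-back flow can be sketched behaviorally in Python. This is an illustration under stated assumptions: `parallel_lookup` is a hypothetical name, both memories are modeled as dictionaries from data word to match address, the two-cycle cache latency comes from the reference design described later, and the six-cycle external latency is an invented figure used only to show the difference. The sketch also ignores cache capacity.

```python
CACHE_LATENCY = 2      # cycles to a cache hit (two-cycle hit latency of the reference design)
EXTERNAL_LATENCY = 6   # hypothetical external CAM latency, for illustration only


def parallel_lookup(cache, external, word):
    """Issue the same search to the cache and the external CAM at once.

    `cache` and `external` map data words to match addresses.
    Returns (match_address, cycles_until_match, source).
    """
    if word in cache:
        return cache[word], CACHE_LATENCY, "cache"
    address = external.get(word)   # cache missed: wait for the external CAM
    if address is not None:
        cache[word] = address      # write-back so the next request hits
    return address, EXTERNAL_LATENCY, "external"


external = {0x45: 0b00110, 0x61: 0b01100}
cache = {}
print(parallel_lookup(cache, external, 0x45))  # (6, 6, 'external')
print(parallel_lookup(cache, external, 0x45))  # (6, 2, 'cache')
```

The second request for the same word is answered by the cache, which is the benefit the paragraph above describes.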
There are two major cache architectures: direct mapped and fully associative. Direct mapped cache requires that each
word be stored at only one particular location in the cache, while fully associative cache allows a word to be stored
at any location in the cache.
Direct Mapped Cache
To reduce resource usage, direct mapped cache always stores a given word from external memory in the same cache
location, which is selected by the upper bits of that word's external address.
Cache typically uses CAM to index a RAM table containing the appropriate data. Cache cannot be implemented
exclusively in the APEX embedded CAM block because the embedded CAM block has a smaller address width than the
external CAM. For example, a 32-word cache block has an address width of five bits, whereas a 256-word external CAM
has an address width of eight bits. Therefore, a RAM block is required to store the additional three address bits
that the embedded CAM block cannot store.
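The width arithmetic in the example above can be checked with a short Python helper. The name `address_width` is hypothetical; it returns ceil(log2(words)), which equals log2(words) for the power-of-two sizes used here.

```python
def address_width(words: int) -> int:
    """Number of address bits needed for `words` locations, i.e. ceil(log2(words))."""
    return (words - 1).bit_length()


cache_bits = address_width(32)       # 5 bits for a 32-word cache block
external_bits = address_width(256)   # 8 bits for a 256-word external CAM
ram_bits = external_bits - cache_bits
print(ram_bits)  # 3 bits that the RAM block must hold
```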
Figure 1 shows an example of a 4-word × 5-bit cache block designed to cache a 32-word × 5-bit external CAM block.
This is only an illustration; actual applications store more data. The match address is stored collectively by both
RAM and CAM. In this example, the CAM location supplies the upper two bits of the five-bit external CAM address and
the RAM block stores the lower three bits of the address.
If the external CAM device stores the word 45 at address location 00110, the CAM portion of the cache stores the
word 45 at location 00 while the RAM portion stores address bits 110 at address location 00. The match address is
formed by concatenating the CAM output address 00 with the RAM output data 110 to produce the correct match
address 00110.
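The split-and-concatenate scheme of Figure 1 can be sketched as a small Python class. This is a behavioral model, not the VHDL reference design: the class and method names are hypothetical, and the sketch does not perform the duplicate-word check that the real CAM block does on writes.

```python
class DirectMappedCache:
    """Direct mapped sketch: the upper address bits select the CAM location,
    and RAM holds the remaining lower address bits."""

    def __init__(self, cache_bits: int, ext_bits: int):
        self.low_bits = ext_bits - cache_bits   # bits kept in RAM
        depth = 1 << cache_bits
        self.cam = [None] * depth                # data word (tag) per location
        self.ram = [0] * depth                   # lower address bits per location

    def write(self, word: int, ext_address: int) -> None:
        index = ext_address >> self.low_bits     # upper bits pick the location
        self.cam[index] = word
        self.ram[index] = ext_address & ((1 << self.low_bits) - 1)

    def read(self, word: int):
        for index, tag in enumerate(self.cam):   # CAM compares every entry
            if tag == word:
                # Concatenate CAM address (upper bits) with RAM data (lower bits).
                return (index << self.low_bits) | self.ram[index]
        return None


cache = DirectMappedCache(cache_bits=2, ext_bits=5)
cache.write(0x45, 0b00110)
print(format(cache.read(0x45), "05b"))  # 00110
```

As in the text, word 45 lands in CAM location 00 and RAM location 00 holds 110; the read rebuilds 00110 from the two parts.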
M-WP-APEXCAMX-01
January 2001, ver. 1.0
Altera Corporation
APEX CAM as Cache for External CAM White Paper
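Figure 1. Direct Mapped Cache Block Diagram
[Diagram: a cache controller drives a 4-word CAM (tag) block and a 4-word RAM block. CAM contents (address: data):
00: 45, 01: 61, 10: 21, 11: 1C. RAM contents (address: data): 00: 110, 01: 100, 10: 000, 11: 111. The 2-bit CAM
address and the 3-bit RAM data form the 5-bit match address. Signals: data out to external CAM, 5-bit address from
external CAM. Labeled parameters: ram_widthadd, cache_widthadd, dataout_size.]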
The direct mapped structure is an efficient cache implementation because it consumes the least logic, but it has a
lower hit rate than fully associative cache. Its major drawback is that it can miss continually when two addresses
that map to the same cache location are referenced alternately.
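The repeated-miss case can be demonstrated with a small Python simulation. The helper names `count_hits` and `upper_two_bits` are hypothetical, and the model tracks only which external address occupies each slot, which is enough to count hits for a direct mapped cache indexed as in Figure 1.

```python
def count_hits(references, index_of, depth):
    """Count hits for a direct mapped cache where index_of(addr) picks the slot."""
    slots = [None] * depth
    hits = 0
    for addr in references:
        i = index_of(addr)
        if slots[i] == addr:
            hits += 1
        else:
            slots[i] = addr   # miss: the new entry evicts whatever was in the slot
    return hits


def upper_two_bits(addr):
    """Slot index for 5-bit addresses in a 4-entry cache (as in Figure 1)."""
    return addr >> 3


# 00110 and 00001 both map to slot 0, so they keep evicting each other.
print(count_hits([0b00110, 0b00001] * 4, upper_two_bits, 4))  # 0
# 00110 and 01001 map to slots 0 and 1, so only the first two references miss.
print(count_hits([0b00110, 0b01001] * 4, upper_two_bits, 4))  # 6
```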
Fully Associative Cache
Fully associative cache does not place any restrictions on where words can be stored in the cache. This improves
throughput because it reduces the number of cache misses. However, fully associative cache consumes more memory
resources than direct mapped cache and also requires a replacement algorithm to determine where to write new words
into the cache.
Fully associative cache uses the same structure as direct mapped cache, in which a CAM search indexes a RAM table.
Fully associative cache can place words anywhere in the cache because it stores the entire external address in RAM.
Figure 2 shows the block diagram of a fully associative cache block. If the external CAM device stores the word 45 at
address location 00110, the CAM portion of the cache memory stores the word 45 at location 00, which indexes to
address location 00 in RAM containing the address 00110.
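A fully associative version of the same sketch stores the full 5-bit address in RAM, so any word can occupy any slot. The class name is hypothetical, and the wrapping write counter is an assumed round-robin policy chosen to match the counter-based replacement mentioned later for the associative reference design; it is not a copy of that design's logic.

```python
class FullyAssociativeCache:
    """Fully associative sketch: any word may use any slot; RAM keeps the
    whole external address. A wrapping counter picks the slot to replace."""

    def __init__(self, depth: int):
        self.cam = [None] * depth   # data word (tag) per slot
        self.ram = [0] * depth      # full external address per slot
        self.counter = 0            # next slot to overwrite (assumed round-robin)

    def read(self, word):
        for index, tag in enumerate(self.cam):
            if tag == word:
                return self.ram[index]
        return None

    def write(self, word, ext_address):
        index = self.counter
        self.counter = (self.counter + 1) % len(self.cam)
        self.cam[index] = word
        self.ram[index] = ext_address


cache = FullyAssociativeCache(depth=4)
for word, address in [(0x45, 0b00110), (0x61, 0b00011), (0x21, 0b01000), (0x1C, 0b11111)]:
    cache.write(word, address)
print(format(cache.read(0x61), "05b"))  # 00011
```

Words 45 and 61 both have upper address bits 00, so they would collide in the direct mapped example but coexist here.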
Figure 2. Fully Associative Cache Block Diagram
[Diagram: a cache controller drives a 4-word CAM (tag) block and a 4-word RAM block. CAM contents (address: data):
00: 45, 01: 61, 10: 21, 11: 1C. RAM contents (address: data): 00: 00110, 01: 00011, 10: 01000, 11: 11111.
Signals: data out to external CAM, 5-bit address from external CAM. Labeled parameters: ex_widthadd, ram_widthadd.]
The complete CAM cache system is modeled by the block diagram in Figure 3. The cache registers its inputs, which
produces a two-cycle hit latency. Any of these registers can be removed to reduce the latency at the expense of
maximum clock frequency.
Figure 3. CAM Cache Block Diagram
[Diagram: the system sends data to both the internal cache (CAM and RAM, inside an APEX 20KE device) and the
external CAM; each returns an address to the system.]
Cache Reference Design
Altera has created reference designs to implement either direct mapped or fully associative cache. These designs are
available on the Altera web site (http://www.altera.com). Table 1 describes the input and output ports of the cache
reference designs. Table 2 describes the parameters used to customize the cache reference designs.
Table 1. Cache Reference Designs Port Listing
Port Name         Required  Type    Description
clock             Yes       Input   Clock signal for all registers
aclr              Yes       Input   Register clear signal
data_valid        Yes       Input   Registers the current value on the datain bus and initiates the search process
datain[]          Yes       Input   Input data for searching, writing, or erasing from cache
write             Yes       Input   Indicates that the cache should write the values on the address_ext and datain buses
delete            Yes       Input   Deletes the values on the datain and address_ext buses from the cache
ext_match_found   Yes       Input   Indicates that the external CAM has found a match; address_ext reflects the
                                    matching address when ext_match_found is high
address_ext[]     Yes       Input   Address or associated data value from the external process
cache_busy        No        Output  Indicates that the cache is busy writing or erasing
cache_hit         Yes       Output  Signals that the cache has found a match for the value on datain; the matching
                                    value is available on dataout
dataout[]         Yes       Output  Contains the associated data for the value on the datain bus
Table 2. Cache Reference Designs Parameter Listing
Parameter       Description
depth           Cache depth (number of words)
datain_size     Width of the data values stored in cache
dataout_size    Width of the address values stored in cache
cache_widthad   Width of the CAM block’s local address bus; this value must be log2(depth)
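The constraint in Table 2 can be checked before instantiating a design. The following Python dataclass is a hypothetical mirror of the four parameters, not the VHDL generics themselves; it assumes `depth` is a power of two so that log2(depth) is an integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheParams:
    """Hypothetical mirror of the Table 2 parameters, with the log2 check."""
    depth: int
    datain_size: int
    dataout_size: int
    cache_widthad: int

    def __post_init__(self):
        # Assumption: depth is a power of two, so log2(depth) is exact.
        if self.depth <= 0 or self.depth & (self.depth - 1):
            raise ValueError("depth must be a positive power of two")
        expected = self.depth.bit_length() - 1   # log2(depth)
        if self.cache_widthad != expected:
            raise ValueError(f"cache_widthad must be log2(depth) = {expected}")


params = CacheParams(depth=32, datain_size=8, dataout_size=5, cache_widthad=5)
```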
CAM Cache Operation
The APEX CAM cache is implemented as a parameterizable VHDL design. (There are two reference designs: one for
direct mapped cache and another for fully associative cache.) These designs are composed of three modules:
■ Controller finite state machine (FSM)
■ CAM module
■ RAM module
The controller FSM is responsible for writing to and reading from the CAM and RAM modules. You can easily
customize the FSM for additional functionality.
The CAM cache supports four different operations: read, write-back (cache miss), write directly to cache, and erase.
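The white paper does not list the controller's internal states, so the following Python transition function is a hypothetical, simplified model of how an FSM could sequence the four operations. The state names, the priority of write over delete over data_valid, and the collapsing of multi-cycle states into single steps are all assumptions for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SEARCH = auto()          # compare datain against the CAM contents
    WAIT_EXTERNAL = auto()   # miss: wait for ext_match_found
    WRITE = auto()           # store datain/address_ext while cache_busy is high
    ERASE = auto()           # remove datain while cache_busy is high


def next_state(state, *, data_valid=False, write=False, delete=False,
               hit=False, ext_match_found=False):
    """Return the next controller state (one step per operation, not per clock)."""
    if state is State.IDLE:
        if write:            # assumed priority: write, then delete, then read
            return State.WRITE
        if delete:
            return State.ERASE
        if data_valid:
            return State.SEARCH
        return State.IDLE
    if state is State.SEARCH:
        return State.IDLE if hit else State.WAIT_EXTERNAL
    if state is State.WAIT_EXTERNAL:
        return State.WRITE if ext_match_found else State.WAIT_EXTERNAL
    return State.IDLE        # WRITE and ERASE return to IDLE when cache_busy drops


# Walk through a read miss followed by a write-back.
s = next_state(State.IDLE, data_valid=True)   # SEARCH
s = next_state(s, hit=False)                  # WAIT_EXTERNAL
s = next_state(s, ext_match_found=True)       # WRITE
s = next_state(s)                             # IDLE
```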
Read
A cache read is initiated by applying an input pattern on the datain bus and asserting data_valid (see Figure 4).
The data_valid signal registers the value on the datain bus and signals the controller FSM to search the cache
for a match. The cache asserts cache_hit two clock cycles after data_valid if a match is found. The matching
address can be sampled from the dataout bus when cache_hit is high.
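When verifying the read timing in simulation, a testbench can sample each signal once per clock and check that cache_hit and the expected dataout value appear two cycles after data_valid. The following Python check is a sketch of that idea; the function name and the list-per-signal trace format are assumptions, not part of the reference design.

```python
HIT_LATENCY = 2  # cache_hit follows data_valid by two clock cycles


def read_hit_ok(data_valid, cache_hit, dataout, expected) -> bool:
    """Check per-cycle samples (index = clock cycle) for one cache read."""
    start = data_valid.index(1)        # cycle where data_valid is asserted
    sample = start + HIT_LATENCY       # cycle where the hit should appear
    return cache_hit[sample] == 1 and dataout[sample] == expected


data_valid = [0, 1, 0, 0, 0]
cache_hit  = [0, 0, 0, 1, 0]
dataout    = [None, None, None, 0b00110, None]
print(read_hit_ok(data_valid, cache_hit, dataout, 0b00110))  # True
```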
Figure 4. Read Timing Waveform
[Waveform signals: clock, data_valid, datain (input data), cache_hit, dataout (match data).]
Write-Back (Cache Miss)
The controller FSM writes a new data value into the cache every time there is a cache miss. If the cache_hit signal
does not go high two clock cycles after data_valid, the cache enters a wait state. The controller remains in the
wait state until the external memory finds a match (see Figure 5). Assert the ext_match_found signal along with
the correct address or translation value (address_ext) and datain value when the external memory completes a
search. The ext_match_found signal indicates that the controller should write the values on the datain and
address_ext buses into the cache memory. The cache asserts the cache_busy signal for two clock cycles while
storing new data. The cache is ready for a new access when cache_busy is deasserted.
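The write-back rule (cache_busy high for two clock cycles while the new word is stored) can be checked the same way in a testbench. This Python helper is a sketch with a hypothetical name; it measures the first cache_busy pulse at or after the cycle where a trigger signal such as ext_match_found is high, without assuming exactly which cycle the pulse starts on.

```python
def busy_pulse_length(trigger, cache_busy) -> int:
    """Length of the first cache_busy pulse at or after the trigger cycle."""
    i = trigger.index(1)
    while i < len(cache_busy) and not cache_busy[i]:
        i += 1                      # skip cycles before the pulse starts
    length = 0
    while i < len(cache_busy) and cache_busy[i]:
        length += 1
        i += 1
    return length


ext_match_found = [0, 1, 0, 0, 0, 0]
cache_busy      = [0, 0, 1, 1, 0, 0]
print(busy_pulse_length(ext_match_found, cache_busy) == 2)  # True: write-back rule met
```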
Figure 5. Cache Read & Write-Back Timing Waveform
[Waveform signals: clock, data_valid, datain (input data), cache_hit, ext_match_found, address_ext (write address),
cache_busy.]
Write Directly to Cache
The system can write a value directly into the cache (see Figure 6). The cache is written by applying a value on the
datain and address_ext buses and asserting the write signal for one clock cycle. cache_busy goes high
during a cache write, and the cache is ready for a new access when cache_busy returns low. The cache_busy signal
is asserted for four or six clock cycles, depending on whether the word being written already exists in the cache.
Because the CAM block of the cache operates in single match mode, every CAM write is preceded by a check to see
whether the word already exists in the CAM.
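The text states only that a write first checks whether the word is already present. The following Python sketch shows one plausible reason for that check: in single match mode the CAM must never hold two copies of a word, so an older copy is cleared before the new entry is written. Clearing the old copy, the function name, and the list-based CAM and RAM model are assumptions, not a description of the reference design's internals.

```python
def single_match_write(cam, ram, index, word, value) -> bool:
    """Write `word` at `index` without leaving duplicate copies in the CAM.

    Returns True if the word already existed (the longer write case).
    Clearing the older copy is an assumed policy for illustration.
    """
    existed = word in cam
    for i, tag in enumerate(cam):
        if tag == word:
            cam[i] = None          # clear the old copy so only one entry can match
    cam[index] = word
    ram[index] = value
    return existed


cam = [0x45, None, None, None]
ram = [0b00110, 0, 0, 0]
print(single_match_write(cam, ram, 2, 0x45, 0b01001))  # True
print(cam)  # [None, None, 69, None]
```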
Figure 6. Cache Write Timing Waveform
[Waveform signals: clock, write, datain (input data), address_ext (write address), cache_busy (1), cache_busy (2).]
Notes:
(1) This cache_busy signal is applicable when datain does not exist in the cache.
(2) This cache_busy signal is applicable when datain exists in the cache.
Direct mapped and associative caches write a datain value to different internal cache addresses. A direct mapped
cache writes the datain value to the cache location specified by the higher-order bits of the address_ext bus. An
associative cache uses a replacement algorithm to determine which cache location a word should be written to. The
associative cache reference design uses a counter-based replacement algorithm to determine the internal write
address to the cache CAM and RAM blocks.
Data can be written to the cache in one of two ways. Either a system can write to cache when writing to external
memory, or an external process can write directly to the cache. A system should write to the cache whenever writing a
word to external memory because that word will probably be referenced again in the near future.
To maintain data integrity, you should write to the external CAM block when writing directly to the cache. Because
the cache is a write-through cache, it is important to provide a memory controller to write to the external CAM when
you directly write to the cache. You can support the write-through cache by creating a write buffer and a memory
controller in your system-level design that stores data written to the cache and then writes this data to your external
CAM.
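The write buffer suggested above can be sketched in Python as a queue that records each direct cache write and later forwards it to the external CAM. Everything here is hypothetical system-level code: the class name, the callables standing in for the two memories, and the `drain` method that a memory controller would call when the external CAM is free.

```python
from collections import deque


class WriteThroughBuffer:
    """Hypothetical system-level helper: each direct cache write is queued
    for the external CAM so the two copies stay consistent."""

    def __init__(self, cache_write, external_write):
        self.cache_write = cache_write        # callable(word, address)
        self.external_write = external_write  # callable(word, address)
        self.pending = deque()

    def write(self, word, address):
        self.cache_write(word, address)       # cache is updated immediately
        self.pending.append((word, address))  # external CAM is updated later

    def drain(self, max_writes=None):
        """Forward queued writes to the external CAM; return how many were sent."""
        sent = 0
        while self.pending and (max_writes is None or sent < max_writes):
            self.external_write(*self.pending.popleft())
            sent += 1
        return sent


cache, external_cam = {}, {}
buf = WriteThroughBuffer(cache.__setitem__, external_cam.__setitem__)
buf.write(0x45, 0b00110)   # cache holds the word; external_cam does not yet
buf.drain()                # memory controller copies it to the external CAM
```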
Erase
The system can use the delete signal to delete values from the cache (see Figure 7). Assert the delete signal
along with the appropriate datain and address_ext values for one clock cycle. After delete is asserted, the
controller FSM checks to see if the cache memory contains the value on the datain bus. If the datain value is
found in the cache, the FSM will proceed to delete that value from the cache. The delete operation is completed three
clock cycles after asserting the delete signal.
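The two steps of an erase (search for the datain value, then remove it if found) can be modeled as follows. This is a behavioral Python sketch with a hypothetical function name; it matches on datain only, and clearing the RAM word is shown for tidiness, since invalidating the CAM entry alone is enough to stop future matches.

```python
def erase(cam, ram, word) -> bool:
    """Delete `word` from the cache if present; return True if it was found."""
    for index, tag in enumerate(cam):
        if tag == word:
            cam[index] = None   # invalidate the CAM entry so it can no longer match
            ram[index] = None   # clear the associated RAM word as well
            return True
    return False                # datain not in cache: nothing to delete


cam = [0x45, 0x61, None, None]
ram = [0b00110, 0b00011, None, None]
print(erase(cam, ram, 0x61))  # True
print(erase(cam, ram, 0x99))  # False
```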
Figure 7. Cache Erase Timing Waveform
[Waveform signals: clock, delete, datain (input data), address_ext (erase address), cache_busy.]
Using CAM Cache with a RAM Block
CAM blocks are often used to index RAM blocks for data translation. Input data is compared against the CAM contents.
If a match is found, the CAM output address indexes the RAM table, which returns the associated data for the CAM
input. The APEX 20KE CAM cache can be configured to store either the external CAM output address or the external
RAM data output.
Generally, the cache has the greatest performance impact when it stores the associated RAM data rather than the
discrete CAM output address. This type of cache is larger than one that stores the output address; however, it
increases throughput because the process does not have to access the external RAM during a cache hit. When you use
the cache to store the associated data, set the ports and parameters according to Table 3.
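The trade-off can be made concrete by counting external accesses per lookup. In this Python sketch (hypothetical function names, dictionaries standing in for the external CAM, external RAM, and cache), caching the RAM data means a hit needs no external access, while caching only the CAM output address still needs one external RAM read on every hit.

```python
def lookup_caching_data(cache, ext_cam, ext_ram, word):
    """Cache holds the associated RAM data (Table 3 setup).
    Returns (data, external_accesses)."""
    if word in cache:
        return cache[word], 0                # hit: no external access at all
    data = ext_ram[ext_cam[word]]            # miss: CAM search, then RAM read
    cache[word] = data
    return data, 2


def lookup_caching_address(cache, ext_cam, ext_ram, word):
    """Cache holds the CAM output address (Table 4 setup).
    A hit still needs one external RAM read."""
    if word in cache:
        return ext_ram[cache[word]], 1
    address = ext_cam[word]
    cache[word] = address
    return ext_ram[address], 2


ext_cam = {0x45: 0b00110}
ext_ram = {0b00110: 0xBEEF}
data_cache, addr_cache = {}, {}
lookup_caching_data(data_cache, ext_cam, ext_ram, 0x45)      # (48879, 2): miss
lookup_caching_data(data_cache, ext_cam, ext_ram, 0x45)      # (48879, 0): hit
lookup_caching_address(addr_cache, ext_cam, ext_ram, 0x45)   # (48879, 2): miss
lookup_caching_address(addr_cache, ext_cam, ext_ram, 0x45)   # (48879, 1): hit
```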
Table 3. Cache Configuration for Storing Associated Data in RAM
Port or Parameter   Description
datain_size         datain_size should be the same width as the external CAM data input port.
dataout_size        dataout_size should be the same width as the external RAM data output port. However, if the
                    external RAM uses bursting, set the dataout_size parameter equal to the size of the RAM word,
                    not the RAM output port.
datain              Connect the CAM data input to the datain bus.
address_ext         Present the RAM output data on this port when writing to the cache.
Other               Connect all other signals to the system process and manage them to meet all timing
                    waveforms presented.
The cache can be used to store the CAM output address if it is configured according to Table 4.
Table 4. Cache Configuration for Storing CAM Output Address
Port or Parameter   Description
datain_size         datain_size should be the same width as the external CAM data input port.
dataout_size        dataout_size should be the same width as the CAM output address port.
datain              Connect the CAM data input to the datain bus.
address_ext         Apply the CAM output address value to this port during a cache write.
Other               Connect all other signals to the system process and manage them to meet all timing
                    waveforms presented.
Conclusion
CAM can provide a solution for any application requiring a fast memory search. APEX CAM provides a fast and
efficient method for implementing the parallel search structure required by cache memory. The cache megafunction
can be used in a number of applications that require larger CAMs, such as ATM switching. A combination of
embedded CAM (APEX CAM) and external CAM (discrete CAM) can be used to speed up the search process.
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
http://www.altera.com
Copyright © 2001 Altera Corporation. Altera, APEX, and APEX 20KE are trademarks and/or service marks of Altera Corporation in the United
States and other countries. Altera acknowledges the trademarks of other organizations for their respective products or services mentioned in
this document. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and
copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard
warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or
liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing
by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published
information and before placing orders for products or services. All rights reserved.