Excalibur Solutions—
Multi-Master Reference
Design
Application Note 181
November 2002, ver. 2.3
Introduction
The advent of the system-on-a-programmable-chip (SOPC) era has caused
a shift in the implementation challenges facing programmable logic
device (PLD) designers. Where designers once focused simply on achieving
a specified clock-to-out time or system fMAX, more abstract issues such as
system throughput and bandwidth have gained importance in the PLD design
process. At the
same time, the basic challenges of improving system performance, the
system feature set, and time to market still remain. The Excalibur™
embedded processor solutions incorporate an open standard embedded
bus architecture. With this architecture, the connection between the host
processor and the embedded peripherals simplifies and accelerates
system integration.
Among other considerations, the specification phase of a design focuses
on the means of communication between individual system components.
The embedded bus architecture that is chosen affects the system’s feature
set and its performance. The Excalibur embedded processor PLD solution,
the flagship of the SOPC era, incorporates an embedded stripe. It provides
designers with an embedded processor solution built on a bus architecture
that, when used in conjunction with good design practices, addresses
today’s design challenges.
The multi-master reference design demonstrates a bus structure
implementation that supports a large number of master and slave
peripherals attached to the embedded processor stripe. This
implementation allows masters in the PLD to communicate with slaves in
both the PLD and the stripe—the processor runs embedded software to
simulate a master in the stripe that can communicate with a set of shared
slaves in the PLD. The multi-master reference design provides a baseline
for designers to develop and refine their own system architectures by
simulating various types of bus transactions and analyzing how well the
design performs. In this way, the designer can optimize a design for both
size and performance.
This application note gives a perspective on implementing multi-master
systems in Excalibur devices. It describes the bus architectures used in the
multi-master reference design, and discusses alternative design
capabilities. The document also explains how to simulate the design and
gives an overview of the components of the Advanced Microcontroller
Bus Architecture (AMBA™) advanced high-performance bus (AHB).
Altera Corporation
AN-181-2.3
AN 181: Excalibur Solutions—Multi-Master Reference Design
Note: Refer to “Revision History” on page 28 to see the changes made
for this version of the document.
Related Documents
You should read the following related documents:
■ Wrapper Latency White Paper
■ AN142: Using the Embedded Stripe Bridges
■ AN192: Embedded Stripe Performance
Embedded Stripe Bus Architecture
The Excalibur embedded stripe architecture uses two AMBA advanced
high-performance buses (AHBs), referred to as AHB1 and AHB2. These
two buses provide the embedded stripe peripherals with an efficient
means of processing data: the lower-speed peripherals are grouped on
AHB2, which maximizes the capacity for the faster peripherals, located on
AHB1, to run at higher speeds.
Note: Appendix A gives more information on the AHB and its
components.
Table 1 details the peripherals in AHB1:
Table 1. Peripherals on AHB1
Masters    Slaves
ARM922T    SRAM
           DPRAM
           Interrupt Controller
           Watchdog Timer
           AHB1-2 bridge
Table 2 details the peripherals in AHB2:
Table 2. Peripherals on AHB2
Masters                 Slaves
Configuration logic     UART
PLD-to-stripe bridge    EBI
AHB1-2 bridge           Timer
                        SRAM
                        DPRAM
                        Stripe-to-PLD bridge
Figure 1 shows a block diagram of the stripe bus architecture.
Figure 1. Excalibur Embedded Stripe Bus Structure
[Block diagram. Labeled blocks inside the Embedded Stripe: Processor,
Interrupt Controller, Watchdog Timer, Memory Controller, EBI, UART, Timer,
Configuration Logic, AHB1-2 Bridge, two Single-Port SRAM blocks, two
Dual-Port SRAM blocks, Stripe-to-PLD Bridge, and PLD-to-Stripe Bridge,
attached to AHB1 and AHB2. Flash and SDRAM are shown outside the stripe,
and the bridges connect to the PLD Array.]
Note: In Figure 1, masters are shown in black and slaves are shown in
gray. Although the configuration logic is both a master and a slave on
AHB2, for this discussion it is referred to as a master.
The embedded stripe bus architecture allows the embedded processor to
access both the stripe peripherals and the PLD via the bridges. Peripherals
in the PLD have access to AHB2 peripherals via the PLD-to-stripe bridge.
The memory elements (SRAM, dual-port SRAM, and SDRAM) in the
embedded stripe are located on both the AHB1 and AHB2 buses. Each
memory element arbitrates internally between the two buses to prevent
data corruption.
The AHB2 clock is derived from the AHB1 clock. AHB1 and AHB2
masters access slaves on the appropriate buses at their specified clock
speeds.
The embedded stripe bus structure provides an efficient means of
communication. It allows peripherals to be accessed from the different
bus structures in the system, yet segregates the high-speed and low-speed
peripherals, so that overall system performance is increased.
When implementing peripherals in the PLD, the strategy for
interconnecting multiple masters sharing multiple slave peripherals must
be carefully considered for optimal implementation. The multi-master
reference design demonstrates an implementation for such an
interconnection. For a specific design, some or all of this strategy may be
used.
Functional Description
The multi-master reference design offers one approach to attaching
peripherals to the embedded stripe. It is a behavioral implementation that
illustrates the fundamental concepts of an AHB system together with
other concepts that are particular to Excalibur bus architecture. The
following section describes the multi-master bus architecture. For a more
detailed description of each component in the system, see Appendix A.
Bus Architecture
The multi-master reference design comprises two independent bus
structures, PLD bus 1 and PLD bus 2, which can both process AHB
transactions. They each have a shared address and control bus, a write
data bus, and a read data and slave response bus. For each bus, an arbiter
controls which master has access to the bus, and an address decoder
provides the chip selects for each of the slaves.
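The arbitration described above can be illustrated with a small behavioral model. The following Python sketch is not taken from the reference design's Verilog; it only mirrors the description of arbiter.v as a simple priority-encoded arbiter with no fairness. The function name and the choice of index 0 as the highest priority are assumptions made for illustration.

```python
def priority_arbiter(hbusreq):
    """Return a one-hot HGRANT list for a list of HBUSREQ bits.

    Assumption: index 0 has the highest priority. The application note
    states only that the arbiter is priority encoded with no fairness.
    """
    hgrant = [0] * len(hbusreq)
    for master, requesting in enumerate(hbusreq):
        if requesting:
            hgrant[master] = 1
            break  # lower-priority requesters wait; there is no fairness
    return hgrant


# Masters 1 and 2 request together; master 1 is granted the bus.
print(priority_arbiter([0, 1, 1]))
```

Because no fairness is implemented, a master that requests continuously can hold off every lower-priority master, which is worth keeping in mind when the bus carries long bursts.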
Figure 2 on page 5 is a block diagram of the multi-master reference
design.
Figure 2. Block Diagram of the Multi-Master Reference Design
[Block diagram. Labeled blocks in the stripe: CPU, two stripe memory
elements, AHB1, AHB1 to AHB2 bridge, AHB2, PLD-to-stripe bridge (slave 0),
and stripe-to-PLD bridge (master 0). Labeled blocks in the PLD: PLD bus 1,
PLD bus 2, the interconnect matrix with layer 1 and layer 2, masters 1
to 5, and slaves 1 to 9.]
An interconnect matrix connects the bus structures. It has two layer
interfaces and three slave interfaces. The layer interface provides the bus
structures with a means of accessing the common pool of slaves. To the
PLD buses, the interconnect matrix interface looks identical to a
conventional AHB slave interface.
The addition of an interconnect matrix provides a system in which
masters on PLD bus 1 and PLD bus 2 have access to a common pool of
slaves in the PLD. The embedded processor also has access to the common
pool of slaves in the PLD, because the stripe-to-PLD bridge is a master
on PLD bus 2.
The same functionality can be achieved by mirroring the AHB2 bus in the
PLD, i.e., replacing the interconnect matrix in Figure 2 with a direct
connection to the PLD-to-stripe bridge. When the AHB2 bus is mirrored
in this fashion there is a strong possibility of bus lockups. The interconnect
matrix provides the same functionality but prevents bus lockups. For
more information on this topic, see Appendix A.
Figure 3 is a block diagram of PLD bus 2; PLD bus 1 is similar to PLD
bus 2.
Figure 3. Block Diagram of PLD Bus 2
[Block diagram. Master 0 (stripe-to-PLD bridge), master 1, and master 2
each supply address and control signals and write data (Address & Control
0 to 2, Write Data 0 to 2) to the shared address and control bus
(HADDR[31:0]) and the write data bus. The arbiter (HBUSREQ[2:0] and
HGRANT[2:0]) and the decoder (HSEL_BUS[31:0]) are shown alongside. Slaves
7, 8, and 9 and layer 2 of the interconnect matrix return HRDATA and slave
responses on the read data and slave response bus.]
Peripherals
As shown in Figure 2 on page 5, the multi-master reference design has 6
AHB masters and 10 AHB slaves. Each peripheral implements a specific
portion of the available AHB protocol.
Masters
The masters in this system model the front-end interface to the AHB. As
indicated by their names, they produce the address and control
information to process a specific type of AHB transaction.
Table 3 lists the masters in the system.
Table 3. Masters in the Multi-Master Reference Design
No  Master                PLD Bus  Description; Bus Protocol
0   Stripe-to-PLD bridge  2        Bridges between PLD bus 2 and the AHB2 bus
1   Single read           2        Single read: byte, half-word and word
2   ALU master            2        Drives slave 5 (the ALU slave) through the interconnect matrix
3   Single write          1        Single write: byte, half-word and word
4   Burst write           1        Burst write of unspecified length
5   Burst read            1        Burst read of unspecified length
The single-word read and single-word write masters perform the basic
transactions—single reads and writes. The masters have a back-end
interface where the user specifies the location and data for the transaction.
The masters then drive the AHB protocol on the front end to process the
transaction.
The burst write and burst read masters initiate a 16-word burst to a
location specified by the user on the back-end interface. The masters drive
the AHB protocol on the front end to process the burst. The location
specified on the back-end interface is the first address of the burst; the
masters increment the address internally to complete the burst. Although
the burst is 16 words long, the masters encode it as an incrementing burst
of unspecified length (INCR) rather than as a fixed-length 16-beat burst
(INCR16). The two encodings are functionally the same for this transfer;
the reference design uses unspecified-length incrementing bursts for
simplicity.
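As an illustration of the internal address increment, the following Python sketch lists the address of each beat of a 16-word burst. It is a behavioral model only, not the masters' Verilog; the function and constant names are hypothetical, and the start address is simply the base of slave 2 (a burst-transaction slave).

```python
WORD_BYTES = 4     # one 32-bit word per beat
BURST_BEATS = 16   # burst length used by the burst masters


def burst_addresses(start_address, beats=BURST_BEATS, size_bytes=WORD_BYTES):
    """Return the address of every beat of an incrementing burst."""
    return [(start_address + beat * size_bytes) & 0xFFFFFFFF
            for beat in range(beats)]


addrs = burst_addresses(0x80004000)
print(hex(addrs[0]), hex(addrs[1]), hex(addrs[-1]))
```

The 16 beats cover 64 bytes, which matches the 16-word register file of the burst slaves.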
The arithmetic logic unit (ALU) master most closely resembles a
conventional bus master. It drives the ALU slave with an operation and
two operands and then reads back the results from the ALU slave. The
location and data for the ALU master are hard-coded in the RTL code.
Master Back-End Interface
The back-end interface of the master allows users to specify the location
and data content of a transaction. Table 4 on page 8 shows the signal
names for the back-end interface.
Table 4. Signal Names for the Back-End Interface
Pin Name        Direction  Description
START_TRANS     Input      Starts the transaction
HADDRESS[31:0]  Input      Transaction address
HWDATA[31:0]    Input      Transaction write data
HRDATA[31:0]    Output     Transaction read data
BUS_ERROR       Output     Asserted when an error occurs on the bus
The START_TRANS pin signals the master to start a transaction to the
location specified by the current value of HADDRESS. For the write
masters, the data transferred is on HWDATA. For the read masters, the data
read back is presented on HRDATA. All masters except the ALU master use
this interface; the ALU master does not need it, because ALU transactions
always go to a specific location.
Note: See “Stimulating Masters” on page 20 for more details on the
master back-end interface.
Slaves
The slaves model the front-end interface to the AHB system. As indicated
by their names, they process specific types of transactions. See Table 5 for
details.
Table 5. Slaves in the Multi-Master Reference Design
No  Slave                     Bus          Description; Bus Protocol
1   Single-transaction slave  PLD bus 1    Accepts a single AHB transaction
2   Burst-transaction slave   PLD bus 1    Accepts INCR burst transactions
3   Narrow data bus slave     PLD bus 1    16-bit peripheral on a 32-bit bus
4   Single-transaction slave  Int. matrix  Accepts a single AHB transaction
5   ALU slave                 Int. matrix  Unsigned arithmetic unit ADD, SUB, and MULT
6   Burst-transaction slave   Int. matrix  Accepts INCR burst transactions
7   Single-transaction slave  PLD bus 2    Accepts a single AHB transaction
8   Burst-transaction slave   PLD bus 2    Accepts INCR burst transactions
9   Wide data bus slave       PLD bus 2    64-bit peripheral on a 32-bit bus
10  PLD-to-stripe bridge      PLD bus 1    Embedded stripe bridge
Single-transaction slaves are able to process single-word read and write
transactions. If a burst transaction is targeted to a single-transaction slave,
the slave issues an error response. The slave also issues an error response
if a transaction targets an address that lies within the slave’s address
space but has no memory location behind it.
Burst slaves are able to accept both burst reads and writes as well as single
transactions. Burst slaves issue an error response to transactions whose
memory locations are not physically in the slave’s assigned address space.
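The error rules for the single-transaction and burst slaves can be summarized in a short behavioral model. The Python sketch below is illustrative only and is not derived from the slaves' Verilog. The HBURST and HRESP values are the standard AHB encodings; the function name, the `accepts_bursts` parameter, and the assumption that the 16-word register file sits at the start of the slave's 4-Kbyte space are hypothetical.

```python
HBURST_SINGLE = 0b000   # AHB burst-type encodings
HBURST_INCR = 0b001
HRESP_OKAY = 0b00       # AHB response encodings
HRESP_ERROR = 0b01

REGFILE_WORDS = 16      # 16-word register file per slave
SLAVE_SPACE = 0x1000    # 4-Kbyte address space per slave


def slave_response(offset, hburst, accepts_bursts):
    """Return the response for an access at a byte offset into the slave.

    Assumption: the register file occupies the first 16 words of the
    slave's address space.
    """
    if not 0 <= offset < SLAVE_SPACE:
        raise ValueError("offset is outside the slave's address space")
    if hburst != HBURST_SINGLE and not accepts_bursts:
        return HRESP_ERROR      # burst sent to a single-transaction slave
    if offset // 4 >= REGFILE_WORDS:
        return HRESP_ERROR      # in the address space, but no register there
    return HRESP_OKAY


print(slave_response(0x40, HBURST_SINGLE, accepts_bursts=False))
```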
The narrow data bus slave is a slave implementation with a 32-bit wide
front-end interface and a 16-bit wide back-end interface. It is half-word
addressed rather than word-addressed. It can also accept bursts and its
error checking is the same as the burst slaves’.
The wide data bus slave is a slave implementation with a 32-bit wide
front-end interface and a 64-bit wide back-end interface: the third bit of
the address bus chip selects between two 32-bit wide register files. The
wide slave also accepts bursts and its error checking is the same as the
burst slaves’.
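The word selection in the wide slave can be sketched as follows. This Python model is an illustration under stated assumptions rather than the behavior of wide_slave.v: it assumes that the "third bit" of the address bus is HADDR[2], that a value of 0 selects the lower 32 bits of a 64-bit location, and that locations are packed from the start of the slave's 4-Kbyte space. The function name is hypothetical.

```python
def wide_slave_select(haddr):
    """Map a bus address to (double-word index, register-file select).

    Assumptions: the select bit is HADDR[2], and 0 selects the register
    file that holds the lower 32 bits of each 64-bit location.
    """
    offset = haddr & 0xFFF          # byte offset inside the 4-Kbyte space
    return offset >> 3, (haddr >> 2) & 1


print(wide_slave_select(0x80200004))
```

Under these assumptions, two consecutive 32-bit bus accesses fill one 64-bit location before the index advances.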
Note: All slaves do some error checking, but they do not check for all
possible error conditions.
The ALU slave performs unsigned arithmetic on operands specified by
the master driving the peripheral. The master must first write both
operands and then write to the operation register. Wait states are inserted,
based on the operations requested. After the wait condition is removed,
the master can then read back the results of the operations.
Slave Register Files
Each slave, except the ALU slave, has a 16-word register file. The register
file for the narrow slave is half-word wide, because it models a narrow
peripheral on a wide bus. The register file for the wide slave is
double-word wide, because it models a wide peripheral on a narrow bus.
The ALU slave has only a five-word deep register file. The first two
locations are the two operands to the computation, the third location is the
operation register, and the last two locations are where the results for the
computation are stored. Figure 4 on page 10 shows the register map for
the ALU slave.
Figure 4. ALU Register File Map
Offset  Register     Bits 31:0
04H     Operand 1    Operand 1
08H     Operand 2    Operand 2
0CH     Operation    0 in the upper bits; Oper in the low-order bits
10H     Result Low   Result_Low
14H     Result High  Result_High
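The programming sequence for the ALU slave (write both operands, write the operation, then read the results) can be modeled in a few lines. The Python sketch below is a behavioral illustration, not the design's alu_slave.v. The register offsets follow Figure 4 and the operation codes (5 = add, 6 = sub, 7 = mult) follow the alu_slave.v entry in Table 8. The class name, the wrap-around treatment of unsigned subtraction, and the handling of unknown operation codes are assumptions; wait states are not modeled.

```python
OPERAND1, OPERAND2, OPERATION = 0x04, 0x08, 0x0C   # offsets from Figure 4
RESULT_LOW, RESULT_HIGH = 0x10, 0x14
OP_ADD, OP_SUB, OP_MULT = 5, 6, 7                  # codes listed in Table 8

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class AluSlaveModel:
    """Behavioral model of the five-location ALU register file."""

    def __init__(self):
        self.regs = {OPERAND1: 0, OPERAND2: 0, OPERATION: 0,
                     RESULT_LOW: 0, RESULT_HIGH: 0}

    def write(self, offset, value):
        if offset not in (OPERAND1, OPERAND2, OPERATION):
            raise ValueError("register is not writable in this sketch")
        self.regs[offset] = value & MASK32
        if offset == OPERATION:
            self._compute()   # writing the operation starts the computation

    def read(self, offset):
        return self.regs[offset]

    def _compute(self):
        a, b = self.regs[OPERAND1], self.regs[OPERAND2]
        op = self.regs[OPERATION]
        if op == OP_ADD:
            result = a + b
        elif op == OP_SUB:
            result = (a - b) & MASK64   # assumed wrap-around on underflow
        elif op == OP_MULT:
            result = a * b
        else:
            return                      # assumed: unknown codes do nothing
        self.regs[RESULT_LOW] = result & MASK32
        self.regs[RESULT_HIGH] = (result >> 32) & MASK32


alu = AluSlaveModel()
alu.write(OPERAND1, 0xFFFFFFFF)
alu.write(OPERAND2, 2)
alu.write(OPERATION, OP_MULT)
print(hex(alu.read(RESULT_HIGH)), hex(alu.read(RESULT_LOW)))
```

The two result registers exist because the product of two 32-bit operands can need up to 64 bits.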
Slave Address Map
Each slave in the multi-master reference design occupies a 4-Kbyte
address space, except for the PLD-to-stripe bridge. The addressable range
of the PLD-to-stripe bridge includes the address space for the PLD as well
as the address space of the AHB2 peripherals. The addresses for the slaves
in the multi-master reference design are listed in Table 6.
Table 6. Addresses for Slaves in the Multi-Master Reference Design
No  Slave                     Bus          Assigned Address Space
1   Single-transaction slave  PLD bus 1    80002000 – 80002FFF
2   Burst-transaction slave   PLD bus 1    80004000 – 80004FFF
3   Narrow data bus slave     PLD bus 1    80008000 – 80008FFF
4   Single-transaction slave  Int. matrix  80010000 – 80010FFF
5   ALU slave                 Int. matrix  80020000 – 80020FFF
6   Burst-transaction slave   Int. matrix  80040000 – 80040FFF
7   Single-transaction slave  PLD bus 2    80080000 – 80080FFF
8   Burst-transaction slave   PLD bus 2    80100000 – 80100FFF
9   Wide data bus slave       PLD bus 2    80200000 – 80200FFF
10  PLD-to-stripe bridge      PLD bus 1    90000000 – FFFFFFFF (1)
Note:
(1) Currently only SRAM is in this address space. Other AHB2 slaves should be added
to this space.
The decoder in the reference design chip selects the PLD-to-stripe bridge
in the 90000000-FFFFFFFF address range. Any AHB2 peripherals added
to the design could be added to this space.
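The address map in Table 6 lends itself to a simple decoding model. The Python sketch below is illustrative only: the address ranges are copied from Table 6, but the function name is hypothetical, and the model flattens the separate PLD bus 1, PLD bus 2, and interconnect matrix decoders into a single lookup. It also returns None for unmapped addresses, whereas the design files include a default slave that fills unused address space.

```python
SLAVE_MAP = [
    # (slave number, first address, last address), copied from Table 6
    (1, 0x80002000, 0x80002FFF),
    (2, 0x80004000, 0x80004FFF),
    (3, 0x80008000, 0x80008FFF),
    (4, 0x80010000, 0x80010FFF),
    (5, 0x80020000, 0x80020FFF),
    (6, 0x80040000, 0x80040FFF),
    (7, 0x80080000, 0x80080FFF),
    (8, 0x80100000, 0x80100FFF),
    (9, 0x80200000, 0x80200FFF),
    (10, 0x90000000, 0xFFFFFFFF),   # PLD-to-stripe bridge
]


def decode(haddr):
    """Return the number of the slave selected by haddr, or None."""
    for slave, first, last in SLAVE_MAP:
        if first <= haddr <= last:
            return slave
    return None


print(decode(0x80020010), decode(0x90000000), decode(0x80003000))
```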
Implementation Capabilities
This section describes the capabilities of the multi-master reference design
implementation in more detail. It also gives suggestions for other bus
structures and shows an example application that uses the embedded
stripe.
The purpose of the multi-master reference design is to demonstrate
masters in the PLD communicating with slaves both in the PLD and in the
embedded stripe, and to show the processor communicating with a group
of shared slaves in the PLD. In Figure 5, the gray path traces master 5
accessing a stripe memory element on AHB2, as well as slaves 3 and 4; the
blue path traces the processor core accessing slaves in the PLD.
Figure 5. Multi-Master Reference Design Diagram
[Block diagram with the same labeled blocks as Figure 2: CPU, stripe
memory elements, AHB1, AHB1 to AHB2 bridge, AHB2, PLD-to-stripe bridge
(slave 0), and stripe-to-PLD bridge (master 0) in the stripe; PLD bus 1,
PLD bus 2, the interconnect matrix with layer 1 and layer 2, masters 1
to 5, and slaves 1 to 9 in the PLD. The gray and blue access paths are
highlighted.]
In Figure 5, masters 0, 1, and 2 have access to slaves 7, 8 and 9, because
they are on the same local bus. Masters 0, 1, and 2 also have access to
slaves 4, 5, and 6 via the interconnect matrix. Masters 3, 4, and 5 have
access to slaves 1, 2, and 3, as they are on the same local bus; and they have
access to slaves 4, 5, and 6 via the interconnect matrix. Masters 3, 4, and 5
also have access to embedded stripe peripherals on AHB2.
In addition, because PLD bus 1 and PLD bus 2 are two independent bus
structures, they can process transactions simultaneously. Likewise,
PLD bus 1 and PLD bus 2 can access two different slaves on the
interconnect matrix simultaneously.
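The access rules above can be captured in a small reachability model. The Python sketch below is an illustration, not part of the reference design. The master and slave groupings come from the description of Figure 5, and slave 10 stands for the PLD-to-stripe bridge as numbered in Table 5. The function names and the simplified rule for simultaneous accesses (different buses and different target slaves) are assumptions.

```python
LOCAL_SLAVES = {"PLD bus 1": {1, 2, 3}, "PLD bus 2": {7, 8, 9}}
SHARED_SLAVES = {4, 5, 6}          # reached through the interconnect matrix
STRIPE_BRIDGE = 10                 # PLD-to-stripe bridge on PLD bus 1
MASTER_BUS = {0: "PLD bus 2", 1: "PLD bus 2", 2: "PLD bus 2",
              3: "PLD bus 1", 4: "PLD bus 1", 5: "PLD bus 1"}


def reachable_slaves(master):
    """Return the set of slave numbers that a master can access."""
    bus = MASTER_BUS[master]
    slaves = LOCAL_SLAVES[bus] | SHARED_SLAVES
    if bus == "PLD bus 1":
        slaves = slaves | {STRIPE_BRIDGE}   # path to AHB2 peripherals
    return slaves


def can_run_concurrently(access_a, access_b):
    """Check two (master, slave) accesses for simultaneous operation.

    Assumed rule: the masters must be on different PLD buses and must
    target different slaves.
    """
    (m_a, s_a), (m_b, s_b) = access_a, access_b
    if s_a not in reachable_slaves(m_a) or s_b not in reachable_slaves(m_b):
        return False
    return MASTER_BUS[m_a] != MASTER_BUS[m_b] and s_a != s_b


print(sorted(reachable_slaves(5)))
print(can_run_concurrently((1, 4), (3, 5)))
```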
Using Alternative Bus Implementations
In the multi-master reference design, masters in the PLD can communicate
with slaves in the PLD and in the embedded stripe; and the embedded
processor can communicate with slaves in the PLD. However, this
implementation uses a large number of peripherals, which may not be
typical for many systems. In addition, the reference design allows masters
to communicate with the majority of the slaves in the system, which again
might not be the case with most systems. Changes to the number of
peripherals in the system and how they communicate can have a dramatic
impact on the complexity, performance, and device utilization. The
following examples show how bus structures affect system complexity,
performance, and device utilization:
1. Implementation with no common PLD slaves
2. Implementation using a switch fabric
3. Implementation that shares slave ports and local masters
System Implementation with No Common PLD slaves
If a system does not require multiple masters to have access to a common
pool of slaves, including the stripe-to-PLD bridge, the interconnect matrix
can be taken out. In this scenario, one or both of the PLD buses can
probably be trimmed to meet the needs of the system. Figure 6 shows a
block diagram of this implementation.
Figure 6. Interconnect Matrix Removed
[Block diagram. Labeled blocks in the embedded stripe: CPU, two stripe
memory elements, AHB1, AHB1 to AHB2 bridge, AHB2, PLD-to-stripe bridge
(slave 0), and stripe-to-PLD bridge (master 0). Labeled blocks in the PLD:
masters 1 and 2 and slaves 1, 2, and 3, with no interconnect matrix.]
The bus structure in Figure 6 demonstrates how system complexity and
device utilization have been reduced by removing the interconnect matrix
and simplifying the PLD bus structure. With this implementation, system
performance increases.
System Implementation Using a Switch Fabric
Although the interconnect matrix is a complex piece of logic, its flexibility
makes it an attractive option for a bus structure. One way to modify the
system is to change the number of layers and slave interfaces to match the
number of masters and slaves in the system. Figure 7 on page 14 shows
this implementation.
Figure 7. Full Interconnect Matrix Implementation
[Block diagram. Labeled blocks in the stripe: CPU, two stripe memory
elements, AHB1, AHB1 to AHB2 bridge, AHB2, PLD-to-stripe bridge (slave 0),
and stripe-to-PLD bridge (master 0). Labeled blocks in the PLD: the
interconnect matrix with layers 1, 2, and 3, masters 1 and 2, and
slaves 1, 2, and 3.]
The implementation shown in Figure 7 is a complete switch fabric:
multiple masters are able to access multiple slaves simultaneously. The
disadvantage of this implementation is that performance decreases as the
number of peripherals increases, due to the highly combinatorial nature
of the interconnect matrix.
System Implementation Sharing Slave Ports and Local Masters
Another bus structure variation arranges peripherals around the
interconnect matrix to address each peripheral’s specific needs. Making
slaves local to a master and using one slave port for several slaves
simplifies the interconnect matrix and can increase system performance.
Figure 8 on page 15 shows an example of this implementation.
Figure 8. Implementation with Local Slaves and Multiple Slaves on One Slave Port
[Block diagram. Labeled blocks in the embedded stripe: CPU, two stripe
memory elements, AHB1, AHB1 to AHB2 bridge, AHB2, PLD-to-stripe bridge
(slave 0), and stripe-to-PLD bridge (master 0). Labeled blocks in the PLD:
the interconnect matrix with layers 1 and 2, master 2, and slaves 1
to 4.]
The bus structure in Figure 8 shows the interconnect matrix with two
layer interfaces and two slave interfaces. This makes the interconnect
matrix less complex and can increase performance. Slaves 1 and 2 are local
to master 1 and slaves 3 and 4 are connected to a single slave port.
For more information on the interconnect matrix, see Multi-Layer AHB at
http://www.arm.com.
Summary
Embedded bus architectures play a vital role in the feature set and
performance of an SOPC design. The multi-master reference design
described in this application note illustrates how masters in an Excalibur
device can communicate with slaves in both the PLD portion and the
stripe, and allows designers to simulate a variety of bus transactions. With
Altera’s Excalibur embedded processor solutions, users have an ideal
framework that addresses their SOPC design needs. However, to
maximize the benefits of the Excalibur solutions, designers must
implement PLD bus structures that complement the embedded stripe bus
architecture.
Installation
This section details the software requirements and the directory structure.
Software Requirements
The following software is required to build and simulate the multi-master
reference design.
■ Quartus® II software version 2.2
■ ARM Developer Suite for Altera (Altera ADS-Lite) software version
  1.1 or GNUPro for ARM toolkit
■ Model Technology™ ModelSim® software version 5.6d
The instructions in this section assume that:
■ You are using a PC running Windows.
■ You are familiar with the Quartus II, Altera ADS-Lite or GNUPro for
  ARM, and ModelSim software; and the software is installed on your
  PC in the default location.
To make efficient use of the design file provided, you should have a
working knowledge of the following areas:
■ Bus interfacing and the AHB; for the AMBA specification, go to
  http://www.arm.com
■ Design tool flow for Excalibur devices using the following software:
  – The Quartus II software
  – ModelSim simulation tool
  – The Altera ADS-Lite software or GNUPro toolkit for ARM
■ Verilog HDL and assembly language
Directory Structure
To install the multi-master reference design, unzip an181.zip into the
installation directory of your choice.
After installation, the directory structure is as follows:
<Installation Directory>\multimaster_ahb
    \Ads
        \simulation
            \Modelsim
        \software
    \Gnu
        \simulation
            \Modelsim
        \software
    \rtl
    \testbench
The multi-master reference design contains two Quartus II projects, which
differ in the embedded software tool chain that is used to compile the
software. The \Ads directory contains a Quartus II project that uses the
ADS-Lite tool chain, and the \Gnu directory contains a Quartus II project
that uses the GNUPro for ARM tool chain.
Note: The project and the project directory that is not used can be
deleted or ignored.
<Installation Directory>\example_designs\multimaster_ahb\<ads or
gnu> contains the Quartus II project files, files generated by the Excalibur
MegaWizard® Plug-In, power kit files, and the sbd2sim.bat file. The
power kit files are used to make a multi-cycle assignment to the ALU
slave. Where computations carried out by the ALU take more than one
clock cycle to complete, wait states are inserted and the event is timed as
a multi-cycle path. sbd2sim.bat is a batch file that runs as a post-software
build command and is used to produce the correct set of stripe model
initialization files.
The <Installation Directory>\example_designs
\multimaster_ahb\<ads or gnu>\software directory contains an
assembly code file, multi_master_reference_design.s, that performs
reads and writes to different addresses in the design.
Note: The assembly files for the GNUPro and ADS tool chains are
different.
<Installation Directory>\example_designs\multimaster_ahb\<ads or
gnu>\simulation\modelsim contains all the files necessary for all
simulation flows. See Table 7 for a list of these files.
Table 7. Simulation Files in <Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim
File                                      Description
compile_and_run_rtl_busfuncmodel_v.do     Compiles and runs all files necessary for functional simulation with the BFM
compile_and_run_rtl_fullmodel_v.do        Compiles and runs all files necessary for functional simulation with the stripe model
compile_and_run_timing_busfuncmodel_v.do  Compiles and runs all files necessary for timing simulation with the BFM
compile_and_run_timing_fullmodel_v.do     Compiles and runs all files necessary for timing simulation with the stripe model
input.dat                                 Stimulus file for the BFM
modelsim.mpf                              ModelSim project file. Updated to work with the stripe model
slavememory.0.dat                         Initial contents file for memory bank 0
slavememory.cfg.dat                       Configures the memory bank for the BFM slave port
wave_interconnect_matrix.do               Waveform of the interconnect matrix I/O
wave_rtl_bfm.do                           Waveform file for functional BFM simulation
wave_timing_busfuncmodel.do               Waveform file for timing BFM simulation
wave_rtl_fullstripe.do                    Waveform file for functional simulation with the full stripe model
wave_timing_fullstripe.do                 Waveform file for timing simulation with the full stripe model
The <Installation Directory>\example_designs\multimaster_ahb\rtl
directory contains all the design files for the design. See Table 8 for a
description of each file in the directory.
Table 8. Design Files in <Installation Directory>\example_designs\multimaster_ahb\rtl
File                                     Description
address_control_mux.v                    Address and control information multiplexer
ahb_include.v                            Include file that contains all of the parameters for the design
ahb_slave_sm.v                           State machine for the ALU slave
alu.v                                    ALU for the ALU slave
alu_regfile.v                            Register file for the ALU slave
alu_slave.v                              Top level file for the ALU slave (slave 5). It instantiates all of the components for the ALU slave and performs unsigned addition, subtraction, and multiplication. The register file for this slave has five locations. Location 1 is operand1. Location 2 is operand2. Location 3 is the operation (5 = add, 6 = sub, 7 = mult). Location 4 is the lower 32 bits of the result. Location 5 is the high 32 bits of the result. The wait states for the different operations can be changed in the ahb_include.v file
arbiter.v                                Simple priority encoded arbiter. No fairness implemented
burst_slave.v                            Simple slave that is able to handle incremental burst transactions (slave 8 and slave 2)
default_slave.v                          Default slave used to fill up address space
input_stage.v                            Interconnect matrix input stage used to buffer data
interconnect_decoder.v                   Interconnect matrix decoder
interconnect_matrix.v                    Top level file for the interconnect matrix. Instantiates all of the components for the interconnect matrix
interconnect_mux.v                       Interconnect multiplexer
interconnect_mux_resp_layer1.v           Layer 1 interconnect response multiplexer
interconnect_mux_resp_layer2.v           Layer 2 interconnect response multiplexer
master_alu.v                             Simple AHB master used to drive the ALU slave (master 2). The values that are actually driven to the slave are hard coded in the ahb_include.v file. Any changes to be made to data that is driven to the ALU slave should be made to ahb_include.v
master_burst_read.v                      Master that performs burst reads from locations specified by the user. Does a 16-word burst read starting at the address specified on HADDRESS (master 5)
master_burst_write.v                     Master that performs burst writes to locations specified by the user. Does a 16-word write burst starting at the address specified on HADDRESS (master 4)
master_single_read.v                     Master that reads a single location from the location specified on HADDRESS (master 1)
master_single_write.v                    Master that writes a single location to the location specified on HADDRESS (master 3)
multi-master_reference_design.v          Top level design file
narrow_regfile.v                         Narrow slave’s register file, 16 half-word locations
narrow_slave.v                           Narrow data bus width slave. Can accept bursts (slave 3)
pld_bus1_decoder.v                       PLD bus 1 address decoder
pld_bus2_decoder.v                       PLD bus 2 address decoder
read_data_bus_and_slave_response_pld1.v  Read data and slave response multiplexer for PLD bus 1
read_data_bus_and_slave_response_pld2.v  Read data and slave response multiplexer for PLD bus 2
regfile.v                                16-word register file that is used by most of the slaves to store data
signal_transaction_slave.v               Slave that is able to handle single transactions (slave 7, slave 4, and slave 1)
wait_state_gen.v                         Wait state generator for ALU slave
wide_slave.v                             Wide data bus-width slave that accepts bursts (slave 9)
write_data_bus_mux.v                     Write data bus multiplexer
Simulation
All testbench, assembly, and other input stimulus files are provided with
the multi-master reference design. The stimulus files are kept as simple
as possible, so that users can easily make changes to view different
transactions.
Stimulating Masters
To start a transaction, the testbench strobes the START_TRANS line on the
master. The values in HADDRESS and HWDATA define the location and the
data to transfer if the transaction is a write. For a read, the data is read back
on HRDATA. If the transaction location generates an error, the BUS_ERROR
signal is asserted. On the slave side, there is a 16-location deep register file
where the contents can be viewed in simulation.
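The stimulus sequence can be pictured with a short behavioral model of one write followed by one read. The Python sketch below is illustrative only and does not reproduce the testbench or the Verilog masters. The class and method names are hypothetical, each call stands for one strobe of START_TRANS with HADDRESS (and HWDATA for a write) already set, and only word accesses are modeled. The base address is that of slave 7 in Table 6.

```python
class RegisterFileSlave:
    """Slave with a 16-location register file at the start of its space."""

    def __init__(self, base):
        self.base = base
        self.regfile = [0] * 16

    def transfer(self, haddress, hwdata=None):
        """Return (HRDATA, BUS_ERROR); pass hwdata to perform a write."""
        index = (haddress - self.base) // 4
        if not 0 <= index < len(self.regfile):
            return 0, True                   # no register at this location
        if hwdata is not None:
            self.regfile[index] = hwdata & 0xFFFFFFFF
            return 0, False
        return self.regfile[index], False


slave7 = RegisterFileSlave(0x80080000)
_, write_error = slave7.transfer(0x80080004, hwdata=0xCAFE0001)  # write
hrdata, read_error = slave7.transfer(0x80080004)                 # read back
_, bad_error = slave7.transfer(0x80080040)        # beyond the register file
print(hex(hrdata), write_error, read_error, bad_error)
```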
Verifying the Design Flow
The multi-master project files are set up to allow four different verification
flows, two RTL simulations and two timing simulations:
■ RTL simulations
– Bus functional model (BFM) RTL simulation
– Full stripe model RTL simulation
■ Timing simulations
– BFM timing simulation
– Full stripe model timing simulation
For a more detailed explanation of the simulation tool flow, see
the Excalibur ARM-Based Hardware Design Tutorial on the
Excalibur Tools for ARM CD.
RTL Simulation
RTL simulation can be done with both the bus functional model and the
full stripe model; scripts automate a portion of the simulation process. The
steps to perform RTL simulation are given below.
Running BFM RTL Simulation
Follow the steps below to run an RTL simulation on the BFM:
1. Start the ModelSim simulation tool and open the ModelSim project
modelsim.mpf in
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim.
Click Cancel if the load project dialog box appears.
2. To create the project work directory, at the ModelSim prompt type:
vlib work↵
3. Open a Command Prompt window and use the bus translator to
convert input.dat to the mastercommands.dat file.
4. Navigate to
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim
and at the command prompt, type:
exc_bus_translate input.dat↵
5. Return to the ModelSim simulation tool. Choose Execute Macro
(Macro menu) and select the
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim\compile_and_run_rtl_busfuncmodel_v.do
script file.
6. Examine the waveform output, which shows the PLD buses and each
of the peripherals. For masters, the AHB signals and the
START_TRANS signals are shown. For slaves, the AHB signals and the
contents of the register files are shown.
Figure 9 on page 23 shows the portion of the waveform file with the
PLD bus 1 and PLD bus 2 signals:
■ At 1005 ns in simulation time, the master port, which is on PLD
bus 2, starts a transaction to slave 4 (address 80010004H) on the
interconnect matrix. This is signified by a value of NONSEQ
(binary-encoded as 10, see ahb_include.v) on HTRANS.
■ At 1035 ns, the transaction completes with the master port
writing a value of 0000000A to slave 4. Also at that time, master
3 on PLD bus 1 starts a transaction to slave 4. Because slave 4 is
still granted to PLD bus 2, a wait state is inserted to complete the
data phase of the transaction from the master port.
■ During the wait state, the master port initiates a read of the same
location that it wrote to in the previous transaction. Because
master 3 is in the process of writing to slave 4, the master port is
wait-stated, and at 1155 ns, instead of reading back the 0000000A
that it originally wrote, it reads back AFAFAFAF, which is the
data written to slave 4 by master 3.
Figure 9. BFM RTL Simulation Waveform
7. Choose End simulation (Design menu) to end the current
simulation.
8. Examine output.dat in
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim.
The BFM is generally intended to be used to verify a single AHB
master or a single AHB slave, and its text output can be a
valuable tool when verifying a complete system. However, the
full stripe model should still be used for the final system check
out.
Running Full Stripe Model RTL Simulation
Follow the steps below to run an RTL simulation on the full stripe model:
1. Create the project work directory, if it has not already been created.
Perform steps 1 and 2 of “Running BFM RTL Simulation” on
page 21.
2. Run the Quartus II software and open the project file
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\multi_master_reference_design.quartus.
3. Choose Start Software build (Processing menu) to compile the
assembly code.
The Quartus II software calls the Altera ADS-Lite tools to
compile the assembly code, and uses a post-software build
command to create the memory initialization files needed for
simulation and to copy them to
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim.
4. Start the ModelSim simulation tool, if it is not already running.
Choose Execute Macro (Macro menu) and select the
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim\compile_and_run_rtl_fullmodel_v.do
script file.
5. Examine the waveforms produced by the initial transactions of the
RTL full stripe simulation. The waveform for the full stripe model
has the same signals as the BFM waveform, with the addition of the
embedded processor’s general purpose registers, the AHB1 bus, and
the AHB2 bus.
■ In the full stripe simulation, the testbench is the same one used
for the BFM simulation. The BFM is now replaced with the actual
stripe model, and the code reads and writes to different locations
in the design.
■ At 3585 ns in simulation, master 4, on PLD bus 1, initiates a write
burst to the SRAM module on AHB2, which is signified by a
value of NONSEQ appearing on HTRANS followed by a SEQ. The
burst finishes at 4245 ns.
■ At around 16600 ns, the processor initiates execution of the
following instructions:
ldr   r12, =SRAM1_BASE   ;;Load the base address of SRAM1 into r12
ldmia r12!, {r1-r8}      ;;First 8 words in SRAM
ldmia r12!, {r1-r8}      ;;Second 8 words in SRAM
This piece of code loads the base address of SRAM 1 into general
purpose register r12 and then performs two eight-word burst
reads of SRAM, reading the data that was written by master 4 into
registers r1 through r8. Figure 10 on page 25 shows the waveform
for these transactions.
Figure 10. Full Stripe Model RTL Simulation
If you are not familiar with the behavior of the AHB, Altera
recommends that you spend time stepping through both the
BFM and full stripe model waveforms, and the associated
assembly code.
The script file <Installation Directory>\example_designs\
multimaster_ahb\<ads or gnu>\simulation\modelsim\
wave_interconnect_matrix.do is also provided, so that you can
study this implementation of an interconnect matrix.
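The effect of the two ldmia instructions in step 5 can be checked with a small behavioural model. The sketch below is not part of the reference design: the SRAM1_BASE value and the memory contents are hypothetical and stand in for the 16 words written by master 4. It shows only the load-multiple-with-write-back behaviour: each ldmia reads eight consecutive words and advances r12 by 32 bytes, and the second load overwrites r1 through r8.

```python
def ldmia_writeback(memory, base, reg_count):
    """Model ARM `ldmia rN!, {...}`: load reg_count consecutive words starting
    at byte address `base`, and return the values plus the written-back base."""
    values = [memory[base + 4 * i] for i in range(reg_count)]
    return values, base + 4 * reg_count

SRAM1_BASE = 0x0800_0000  # hypothetical address, for illustration only

# Assumed contents: pretend the 16-word burst stored the values 0..15.
memory = {SRAM1_BASE + 4 * i: i for i in range(16)}

r12 = SRAM1_BASE                                   # ldr r12, =SRAM1_BASE
first, r12 = ldmia_writeback(memory, r12, 8)       # r1-r8 <- words 0..7
second, r12 = ldmia_writeback(memory, r12, 8)      # r1-r8 <- words 8..15

print(first, second, hex(r12))
```

After both instructions, r1 through r8 hold only the second group of eight words, so the waveform is the place to observe the first group as it passes over the bus.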
Timing Simulation
Timing simulation can also be performed with both the BFM and the full
stripe model; scripts automate a portion of the simulation process.
If the OEM edition of ModelSim is used for timing simulation,
the simulation run can take a very long time to complete.
The steps for performing timing simulation are given below.
Running BFM Timing Simulation
Follow the steps below to run a timing simulation on the BFM:
1. If you have not already done so, perform steps 1 through 4 of the
BFM RTL simulation.
2. Choose Open Project (File menu) and open the project file
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\multi_master_reference_design.quartus.
3. Choose EDA Tools Settings (Assignments menu).
4. Choose the Simulation option and click Settings.
5. Turn off Output Excalibur stripe as a single module (see Figure 11
on page 27). Click OK.
Figure 11. Verilog HDL Output Settings
6. Choose Start Compilation (Processing menu) to synthesize and
place-and-route the design.
7. Return to the ModelSim simulation tool. Choose Execute Macro
(Macro menu) and select the
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim\compile_and_run_timing_busfuncmodel_v.do
script file.
8. Examine the waveforms to see the effects of the BFM timing
simulation.
9. To end the simulation, choose End simulation (Design menu).
10. Examine output.dat in
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim.
Full Stripe Model Timing Simulation
Follow the steps below to run a timing simulation on the full stripe model:
1. If you have not already done so, perform steps 1 through 3 of the full
stripe model RTL simulation.
2. If you have not already done so, perform steps 2 through 5 of the
BFM timing simulation.
3. Choose EDA Tools Settings (Assignments menu).
4. Choose the Simulation option and click Settings.
5. Turn on Output Excalibur stripe as a single module (see Figure 11
on page 27). Click OK.
6. Choose Start Compilation (Processing menu) to synthesize and
place-and-route the design.
7. Return to the ModelSim simulation tool. Choose Execute Macro
(Macro menu) and select the
<Installation Directory>\example_designs\multimaster_ahb\<ads or gnu>\simulation\modelsim\compile_and_run_timing_fullmodel_v.do
script file.
8. Examine the waveforms to see the effects of the full stripe timing
simulation.
Because nodes are synthesized out, it can be difficult to view internal
nodes in the design and the waveform file provided is not as detailed
as the RTL versions. For this reason, it is beneficial to combine BFM
simulation with a processor running special software that has been
written to aid hardware verification.
Revision History
Table 9 shows the document revision history.
Table 9. Revision History
Date            Description
November 2002   Quartus II version 2.2 updates and 3rd-party synthesis removed.
June 2002       Modification to accommodate Quartus II version 2.1 and the GNUPro tools.
April 2002      Modification to acknowledge Quartus II version 2.0.
October 2001    Version 2.0 first publication.
Appendix A
Design Notes
This appendix describes the individual components of the multi-master
reference design. It reviews the construction of one of the PLD buses and
discusses issues that are not implemented in the reference design but
should be considered when constructing a system that is to go into
production.
AMBA Overview
AMBA is an open standard that defines on-chip communications for
high-performance embedded systems.
The current version of the AMBA specification, AMBA Specification,
Rev 2.0, defines three bus architectures:
■ The advanced peripheral bus (APB) is a bus structure for low-speed
simple interface peripherals. The APB interfaces with the system bus
via an ASB or AHB bridge.
■ The advanced system bus (ASB) was the first system bus
incorporated into the AMBA specification. The ASB is a
multiple-master system that has a pipelined operation and supports
burst transfers.
■ The advanced high-performance bus (AHB) is the latest system bus.
The AHB is a multi-master pipelined system which supports burst
transfers. The AHB also has separate address and data buses and
includes support for wider data bus configurations.
Figure 12 on page 30 is a block diagram of a typical AMBA bus structure.
Figure 12. Typical AMBA Bus Structure
[Block diagram: a high-performance processor, a high-bandwidth external
memory interface, high-bandwidth on-chip RAM, and a DMA bus master sit on
the AHB or ASB. An AHB-to-APB or ASB-to-APB bridge connects that bus to
the APB, which carries the UART, timer, keypad, and PIO.]
AHB Protocol Review
An AHB master initiates every AHB transaction, whether read or write.
Before beginning a transaction, an AHB master must first acquire
ownership of the bus by asserting its HBUSREQ signal to request the bus.
The master has ownership of the bus when the arbiter asserts HGRANT.
Once granted the bus, the master begins the transaction, which consists of
two distinct phases: an address phase and a data phase. During the
address phase, the master drives out address and control information for
the transaction. In addition, the targeted slave for the transaction samples
the address and control information presented by the initiating master.
Based on the sampled address and control information, the AHB slave
decides how the transaction will finish and drives that information back
to the master in the data phase. If the transaction is a write, the master
drives write data during the data phase and the slave samples that
information. If the transaction is a read, the slave drives out the read data
during the data phase and the master samples it. Figure 13 on page 31 is a
timing diagram for a basic AHB transaction.
Figure 13. Timing Diagram for a Basic AHB Transaction
[Timing diagram: HCLK, HADDR[31:0], Control, HWDATA[31:0], HREADY, and
HRDATA[31:0]. Address A and Control A are driven in the address phase;
Data A appears on HWDATA or HRDATA in the data phase.]
For more information on the AHB Protocol, see the AMBA Specification,
Rev 2.0.
The AHB architecture uses a centralized multiplexing scheme. Because
signals are presented on the bus at different times and in different
directions, it is best to consider the multiplexing as three distinct
multiplexers. PLD bus 1 of the multi-master reference design makes use
of three multiplexers to transfer data:
■ Address and control multiplexer
■ Write data bus multiplexer
■ Read data bus and slave response multiplexer
The address and control multiplexer is a 3-to-1 multiplexer that has
a set of address and control signals for each master on the bus. HMASTER,
which is an output of the arbiter, is the selector for this multiplexer. The
multiplexer output is the shared address and control bus that is used by
the rest of the bus. Figure 14 on page 32 shows the address and
control multiplexer.
Figure 14. Address and Control Multiplexer
[Block diagram: the address and control signals of masters 0 through 2
(HADDR[31:0], HTRANS[1:0], HBURST[2:0], HSIZE[2:0], HWRITE) feed inputs 0
through 2 of a multiplexer. HMASTER drives Sel, and the output is the
shared address and control bus.]
The write data bus multiplexer is another 3-to-1 multiplexer that has a
write data bus for each of the possible masters in an AHB system. The
output of this multiplexer is the shared write data bus for the rest of the
bus. HMASTER_delayed is an output from the arbiter. It is a delayed
version of HMASTER and is used as the selector for this multiplexer. The
delay occurs because the write data is presented during the data phase of
a transaction, which starts after the completion of the address phase.
Figure 15 on page 33 shows the write data multiplexer.
Figure 15. Write Data Multiplexer
[Block diagram: the write data buses (HWDATA[31:0]) of masters 0 through 2
feed inputs 0 through 2 of a multiplexer. HMASTER_delayed drives Sel, and
the output is the shared write data bus.]
The read data bus and slave response multiplexer is a 4-to-1 multiplexer
that has a set of slave response signals and a read data bus for each of the
slaves in the system. The multiplexer output is the shared read data and
slave response bus used by the rest of the system, and the selector is
HSEL_BUS. HSEL_BUS is an output of the decoder that selects the slave
that the master initiating the transaction is trying to access. Figure 16 on
page 33 shows the read data multiplexer.
Figure 16. Read Data Multiplexer
[Block diagram: the read data and response signals of slaves 0 through 3
(HRDATA[31:0], HRESP[1:0], HREADY) feed inputs 0 through 3 of a
multiplexer. HSEL_BUS drives Sel, and the output is the shared read data
and slave response bus.]
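The relationship between the two selectors can be illustrated with a short behavioural model. This is a sketch in Python rather than the Verilog HDL of the reference design, and the addresses, data values, and the two-cycle HMASTER sequence are invented for illustration. It shows the address and control multiplexer following HMASTER while the write data multiplexer follows HMASTER_delayed one cycle later.

```python
from collections import namedtuple

AddrCtrl = namedtuple("AddrCtrl", "haddr htrans hwrite")

def mux(inputs, select):
    """Behavioural N-to-1 multiplexer: forward the input chosen by `select`."""
    return inputs[select]

# Hypothetical per-master outputs, held constant for the two cycles shown.
addr_ctrl = [
    AddrCtrl(0x1000, "NONSEQ", 1),  # master 0
    AddrCtrl(0x2000, "NONSEQ", 1),  # master 1
    AddrCtrl(0x0000, "IDLE", 0),    # master 2
]
hwdata = [0xAAAA0000, 0xBBBB1111, 0x0]

hmaster_per_cycle = [0, 1]   # assumed arbiter output on successive cycles
hmaster_delayed = None       # no data phase is in progress before cycle 0
trace = []
for hmaster in hmaster_per_cycle:
    shared_ac = mux(addr_ctrl, hmaster)
    # The write data bus follows the master that owned the previous address phase.
    shared_wd = mux(hwdata, hmaster_delayed) if hmaster_delayed is not None else None
    trace.append((shared_ac.haddr, shared_wd))
    hmaster_delayed = hmaster  # registered copy of HMASTER, one cycle late

for cycle, (haddr, wd) in enumerate(trace):
    print(cycle, hex(haddr), None if wd is None else hex(wd))
```

In the second cycle the shared address bus already carries master 1's address while the shared write data bus still carries master 0's data, which is the overlap the delayed selector exists to support.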
Because multiplexers are highly combinatorial, they can cause bottlenecks
in the system performance. The most successful optimization efforts are
likely to result from creating an efficient way of multiplexing the signals.
This is especially important if a large number of peripherals are connected
to the bus structure.
The multiplexers used in the reference design are a behavioral
model and have not been optimized.
Arbitration
Arbitration is the mechanism used to ensure that only one master has
access to the bus at any given time. This is accomplished by monitoring
the request signals from the masters in the system and granting access to
the highest-priority master requesting the bus. The arbitration algorithm
implemented determines a master’s priority. The PLD buses have very
simple priority-based arbiters. Master 0 on PLD bus 2, which is the
stripe-to-PLD bridge, has the highest priority in this system, because it
helps to minimize the number of wait states that can occur on AHB2 while
peripherals in the stripe write or read from the PLD bus. Master 2 is the
lowest-priority master.
Figure 17. Arbiter
[Block diagram: the arbiter has inputs HBUSREQ[15:0], the shared address
and control bus, HCLOCK, and HRESETn, and outputs HGRANT[15:0],
HMASTER[3:0], and HMASTER_delayed[3:0].]
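A fixed-priority scheme of this kind reduces to picking the lowest-numbered requesting master. The function below is a minimal Python sketch of that rule, not the arbiter shipped with the reference design; returning None when nobody requests the bus is an assumption made here, since the text does not say which master is parked on an idle bus.

```python
def priority_grant(hbusreq):
    """Return the index of the highest-priority requesting master.

    hbusreq is a list of 0/1 request flags, index 0 being the
    highest-priority master. Returns None if no master is requesting
    (an assumption of this sketch).
    """
    for master, requesting in enumerate(hbusreq):
        if requesting:
            return master
    return None

# Master 0 (the stripe-to-PLD bridge on PLD bus 2) always wins when it requests.
print(priority_grant([1, 0, 1]))
# Master 2 is granted only when masters 0 and 1 are silent.
print(priority_grant([0, 0, 1]))
```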
The AHB protocol allows the address phase of one transaction to overlap
with the data phase of a previous transaction. The arbiter plays an
important role in this process. Not only does the arbiter have to monitor
the masters’ bus requests, it must also monitor the activity on the bus so it
can tell when a transaction is about to finish. Once the previous
transaction is about to finish, it grants the address and control bus to the
next master requiring access to the system.
The arbiter can discern when the current transaction is almost complete by
monitoring the HTRANS, HBURST, and HREADY signals on the bus.
Depending on the value of HBURST, the arbiter can tell how many beats of
data will be transferred. Then it keeps track of how many beats of data
have actually passed through the system by monitoring HTRANS and
HREADY.
A much simpler approach to deducing when a master has finished
transferring data is to monitor HTRANS. Masters in an AHB system are
required to issue an IDLE transaction when they are granted the bus, but
have no data to transfer. In the clock cycle after a master has finished
processing a transaction, it must issue an IDLE transaction because it is
still granted the bus. In this case, the arbiter can watch for IDLE to be
presented on HTRANS and it can then grant access to the next master. The
arbiter in the reference design uses this implementation, which greatly
simplifies the arbiter design at the small cost of losing the pipelining effect
during the interval between switching masters. Figure 18 on page 36
shows the difference in timing between the two implementations.
Figure 18. Timing Diagram of AHB Transactions
[Timing diagrams of HCLK, HADDR[31:0], HTRANS[1:0], Control,
HWDATA[31:0], HREADY, and HRDATA[31:0] for two cases. Overlapping address
and data phases: the arbiter monitors HTRANS[1:0], HBURST[2:0], and
HREADY, and address phase B overlaps data phase A. Non-overlapping address
and data phases: the arbiter only has to monitor HTRANS[1:0], and the
sequence is address phase A, data phase A, an IDLE cycle, and address
phase B.]
Figure 18 shows two situations: one in which the arbiter
monitors all of the control signals; and the other in which the
arbiter monitors only HTRANS.
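The simpler of the two approaches can be sketched as a small clocked model. The class below is an illustration in Python, not the reference design's arbiter: the exact clocking, the reset owner, and the string encoding of HTRANS are assumptions. It changes bus ownership only in a cycle where the current owner presents IDLE, and keeps a one-cycle-delayed copy of HMASTER for the write data multiplexer.

```python
class IdleHandoverArbiter:
    """Fixed-priority arbiter that re-arbitrates only when HTRANS is IDLE."""

    def __init__(self, n_masters):
        self.n_masters = n_masters
        self.hmaster = 0          # assumed owner after reset
        self.hmaster_delayed = 0

    def clock(self, hbusreq, htrans_shared):
        """Model one rising clock edge and return the new bus owner."""
        self.hmaster_delayed = self.hmaster
        if htrans_shared == "IDLE":
            for master in range(self.n_masters):
                if hbusreq[master]:
                    self.hmaster = master
                    break
        return self.hmaster

arb = IdleHandoverArbiter(3)
owners = [
    arb.clock([0, 0, 1], "IDLE"),    # bus idle: master 2 is granted
    arb.clock([0, 1, 1], "NONSEQ"),  # master 2 transferring: master 1 must wait
    arb.clock([0, 1, 0], "IDLE"),    # master 2 presents IDLE: master 1 is granted
]
print(owners, arb.hmaster_delayed)
```

Master 1 has higher priority than master 2 here, yet it waits until IDLE appears, which is the lost pipelining the text describes.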
AHB Master
The master is the active element on the AHB. It is responsible for initiating
both read and write transactions and for producing appropriate signals at
the right time to adhere to the AHB protocol. During the address phase,
the master must drive address and control information for the transaction.
The actual address and control information that it produces depends on
the application and back-end master logic. During the data phase, the
master drives the write data for a write transaction and samples the read
data for a read transaction and the response signals from the slave that
accepted the transaction. Depending on the response from the slave, the
master must either process the data received, process the next transaction,
or react to the slave’s response to the transaction. Figure 19 on page 37 is
an example state diagram for an AHB master.
Figure 19. Example Master State Diagram
[State diagram with three states: IDLE, Address Phase, and Data Phase.
Inputs: HCLOCK, HRESETn, Start_trans, HGRANTx, HREADY, HRESP[1:0], and
HRDATA[31:0]. Outputs: HBUSREQx, HLOCK, HTRANS[1:0], HADDR[31:0], HWRITE,
HSIZE[2:0], HBURST[2:0], and HWDATA[31:0]. Each transition is labelled
with the input values that trigger it and the output values that are
driven.]
Because the construction of the state machine depends on the back-end
logic of the master in question, the state machine implementation can vary
dramatically. Figure 19 on page 37 exemplifies how a state machine that
produces AHB-compliant signals might look. One thing to note about this
diagram is that a master can drive out address and control information as
it moves from the idle phase to the address phase without being granted
the bus. This is because the signals are multiplexed and are not presented
to the rest of the bus elements until the arbiter grants access to the master.
Figure 20 shows the master input and output signals.
Figure 20. Master I/O
[Block diagram: the master has inputs HGRANTx, HREADY, HRESP[1:0],
HRESETn, HCLOCK, and HRDATA[31:0], and outputs HBUSREQx, HADDR[31:0],
HTRANS[1:0], HWRITE, HSIZE[2:0], HBURST[2:0], and HWDATA[31:0].]
Features enabled by HLOCK and HPROT[3:0] are not used in the
multi-master reference design implementation.
Decoder
The AHB makes use of a single centralized combinatorial address
decoder, which uses the higher-order bits of the shared address bus as its
inputs and produces the chip-selects for all the slaves in a system.
The same output signals can also be used as the selector for the slave
response and read data multiplexer. Figure 21 on page 39 shows the
decoder input and output.
Figure 21. Decoder I/O
[Block diagram: the decoder takes the shared HADDR[31:10] as its input and
drives HSEL_BUS[21:0].]
Bottlenecks can occur in the address decoder during operation, so
simple address decoding schemes are recommended to help system
performance. The reference design gives a 4-Kbyte address space to each
slave in the system, with the exception of the PLD-to-stripe bridge in the
stripe. The PLD-to-stripe bridge is at the top of the address space and is
given plenty of address space for stripe peripherals.
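A decode scheme with one 4-Kbyte window per slave needs only a subtraction and a shift. The Python sketch below illustrates the idea; the base address and the number of slaves are hypothetical and do not reproduce the reference design's memory map, and the larger PLD-to-stripe bridge region is left out.

```python
SLAVE_WINDOW_BITS = 12  # 4 Kbytes = 2**12 bytes per slave

def decode(haddr, base, n_slaves):
    """Return a one-hot HSEL list for `haddr`.

    Slave i owns [base + i * 4 KB, base + (i + 1) * 4 KB). An address
    outside every window yields all zeros.
    """
    index = (haddr - base) >> SLAVE_WINDOW_BITS
    return [1 if i == index else 0 for i in range(n_slaves)]

BASE = 0x4000_0000  # hypothetical base address, for illustration only

print(decode(BASE + 0x0004, BASE, 4))   # first slave's window
print(decode(BASE + 0x1FFC, BASE, 4))   # last word of the second window
print(decode(BASE + 0x4000, BASE, 4))   # beyond the last window: no select
```

Because only the upper address bits take part in the comparison, the decoder stays shallow, which is the reason the text recommends simple schemes.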
AHB Slave
AHB slaves can be considered passive elements in the bus structure
because they do not initiate transactions on the bus. A key requirement
when planning a slave is to understand the series of events that occurs on
the bus and then design accordingly. As the AHB protocol states, there is
an address phase and a data phase for every AHB transaction; therefore,
one way to construct the slave is to build a state machine that has a state
for each phase in the transaction. Figure 22 on page 40 shows a state
diagram for an AHB slave.
Figure 22 on page 40 shows only an example, which supports
only basic transactions. It is not a fully-compliant state diagram.
Figure 22. Example State Diagram for AHB Slave
[State diagram with three states: Address Phase (latch address and
control), Data Phase, and Error Phase. Inputs: HCLOCK, HRESETn, HSEL,
HREADY_in, HWRITE, HADDRESS, HTRANS, HSIZE, HBURST, and HWDATA. Outputs:
HREADY_out, HRESP, and HRDATA. A wait condition from the back-end logic
drives HREADY_out = 0 with HRESP = OKAY; an error condition drives
HREADY_out = 0 with HRESP = ERROR. Each transition is labelled with the
input values that trigger it and the output values that are driven.]
The state diagram can vary with the type of application and with the
back-end slave logic. During the address phase, the address decoder
asserts the select signal of the targeted slave. Once selected, a slave is
required to sample the address and control information for the transaction.
During the data phase of the transaction, the slave responds to the address
and control information presented during the address phase. It processes
the transaction if the information being presented is valid and the
back-end logic is ready to receive it.
Figure 23 shows the slave inputs and outputs.
Figure 23. Slave Inputs and Outputs
[Block diagram: the slave has inputs HSELx, HADDR[31:0], HWRITE,
HTRANS[1:0], HSIZE[2:0], HBURST[2:0], HWDATA[31:0], HREADY_in, HRESETn,
and HCLOCK, and outputs HREADY_out, HRESP[1:0], and HRDATA[31:0].]
Features enabled by HMASTER[3:0], HMASTERLOCK, and
HSPLIT[15:0] are not used in the multi-master reference
design implementation.
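The two-state approach described for the slave can be sketched as follows. This Python model is an illustration only: it handles single, non-overlapping transactions, ignores HTRANS, HSIZE, HBURST, and error responses, and takes its fixed wait-state count as a stand-in for a wait condition from the back-end logic. The class and method names are hypothetical.

```python
class RegFileSlave:
    """16-word register-file slave with an address state and a data state."""

    def __init__(self, wait_states=0):
        self.regs = [0] * 16
        self.wait_states = wait_states
        self.state = "ADDRESS"
        self.pending = None      # latched (word index, hwrite)
        self.countdown = 0

    def clock(self, hsel, haddr, hwrite, hwdata):
        """Advance one clock and return (hready_out, hrdata)."""
        if self.state == "ADDRESS":
            if hsel:
                # Latch address and control for use in the data phase.
                self.pending = ((haddr >> 2) & 0xF, hwrite)
                self.countdown = self.wait_states
                self.state = "DATA"
            return 1, None
        if self.countdown > 0:
            self.countdown -= 1
            return 0, None       # wait state: HREADY_out low
        index, is_write = self.pending
        self.state = "ADDRESS"
        if is_write:
            self.regs[index] = hwdata
            return 1, None
        return 1, self.regs[index]

slave = RegFileSlave(wait_states=1)
write_trace = [
    slave.clock(1, 0x8, 1, None),     # address phase of a write to word 2
    slave.clock(0, 0x0, 0, 0x1234),   # data phase, wait state
    slave.clock(0, 0x0, 0, 0x1234),   # data phase completes
]
read_trace = [
    slave.clock(1, 0x8, 0, None),     # address phase of a read from word 2
    slave.clock(0, 0x0, 0, None),     # data phase, wait state
    slave.clock(0, 0x0, 0, None),     # read data returned
]
print(write_trace, read_trace)
```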
PLD Bus 2 Structure
Combining the components detailed above yields the full bus
implementation shown in Figure 24 on page 42. Each master, including
the stripe-to-PLD bridge, has access to the slaves that are local to this bus
and to the slaves that are connected to the interconnect matrix.
Figure 24. Layer 2 Bus Structure
[Block diagram: master 0 (the stripe-to-PLD bridge), master 1, and master
2 drive address and control 0 through 2 and write data 0 through 2 onto
the address and control bus and the write data bus. The arbiter uses
HBUSREQ[2:0] and HGRANT[2:0]. The decoder takes HADDR[31:0] and drives
HSEL_BUS[31:0]. Slave 7, slave 8, slave 9, and the interconnect matrix
return HRDATA and slave responses on the read data and slave response
bus.]
The implementation shown in Figure 24 suggests that this solution could
be used as a PLD bus structure on its own, with the PLD-to-stripe bridge
replacing the interconnect matrix. Masters in the PLD would have access
to slaves in the PLD and in the stripe, and masters in the stripe would have
access to slaves in the PLD. However, this implementation is not a
complete solution on its own and is subject to bus lock-ups. If the
processor initiates a transaction whose destination is a slave in the PLD,
the transaction starts on AHB1 and travels to AHB2 via the AHB1-2
bridge, finally accessing the slave by reaching the bus in the PLD via the
stripe-to-PLD bridge. A bus lock-up can occur if a master in the PLD starts
a transaction whose destination is on AHB2 after the processor has gained
access to AHB2, but before it gains access to the bus in the PLD.
Figure 25 on page 43 shows a timing diagram of this lock-up occurring.
Figure 25. Timing Diagram Showing Bus Lock Up
[Timing diagram with time markers T1 through T6. AHB1: processor HGRANT
and AHB1 HMASTER (processor). AHB2: AHB1-to-AHB2 bridge HBUSREQ
(requesting AHB2), AHB1-to-AHB2 bridge HGRANT (AHB2 granted to the AHB1-2
bridge), AHB2 HMASTER (AHB1-to-AHB2 bridge), and stripe-to-PLD bridge
HBUSREQ (requesting the PLD bus). PLD bus: PLD master 1 HBUSREQ
(requesting the PLD bus), PLD master 1 HGRANT (PLD bus granted to the PLD
master), PLD bus HMASTER (PLD master), and PLD-to-stripe bridge HBUSREQ
(requesting AHB).]
If the embedded processor begins a transaction whose destination is slave
7 before T1, at some point the AHB1-to-AHB2 bridge requests the AHB2
bus (shown at T1). In addition, at T1 a master in the PLD requests the bus
in the PLD with, for example, the ultimate destination being the SRAM in
the stripe. At time T2, AHB2 is granted to the AHB1-2 bridge and the PLD
bus is granted to PLD master 1. At time T3, the lock-up occurs: the
transaction that originated at the processor is at the stage where the
stripe-to-PLD bridge is trying to access the PLD bus, but PLD master 1
owns the PLD bus. Also at this time, the transaction originating from PLD
master 1 is at the stage where the PLD-to-stripe bridge is trying to access
AHB2, but the AHB1-2 bridge owns AHB2. Because neither transaction can
reach its final destination, the bus becomes locked.
To fix the lock problem, it is necessary to break the overlapping paths
from the stripe into the PLD and from the PLD to the stripe. An
interconnect matrix solves both of these problems.
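The lock-up at T3 is a circular wait: each transaction holds one bus and waits for the bus held by the other. The Python sketch below expresses that as a wait-for cycle check. It is an analysis aid written for this note, not part of the reference design, and the transaction and bus names are labels chosen here for illustration.

```python
def find_deadlock(holds, wants):
    """Return the requesters forming a wait-for cycle, or [] if none.

    holds maps a bus name to the transaction that currently owns it.
    wants maps a transaction to the bus it is waiting for.
    """
    for start in wants:
        seen = []
        node = start
        while node in wants and node not in seen:
            seen.append(node)
            node = holds.get(wants[node])  # owner of the bus being waited on
        if node == start:
            return seen
    return []

# State at T3: each transaction owns one bus and waits for the other.
holds = {"AHB2": "processor_txn", "PLD_bus": "pld_master1_txn"}
wants = {"processor_txn": "PLD_bus", "pld_master1_txn": "AHB2"}
locked = find_deadlock(holds, wants)

# If the PLD master's transaction did not need AHB2, no cycle would exist.
not_locked = find_deadlock(holds, {"processor_txn": "PLD_bus"})
print(locked, not_locked)
```

Breaking either dependency removes the cycle, which is what separating the stripe-to-PLD and PLD-to-stripe paths achieves.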
Interconnect Matrix
The interconnect matrix structure allows multiple masters in the system
to access slaves in parallel. This is done by having a point-to-point
connection from each layer input to every slave connected to the matrix.
With this implementation, a master on layer 1 can access a slave on the
interconnect matrix while a master on layer 2 is accessing another slave.
When two layers require access to the same slave, the master that is not
granted access to the slave is placed into a wait condition until the other
transaction is finished, which greatly simplifies arbitration. From the layer
side, the interconnect resembles an AHB slave interface. From the slave
side, the interconnect matrix resembles an AHB master interface.
Figure 26 shows a block diagram of the internal components of the
interconnect matrix.
Figure 26. Interconnect Matrix Internal Components
[Block diagram: inside the interconnect matrix, Layer1 and Layer2 each
connect to an input stage and a decoder, and one multiplexer drives each of
Slave1, Slave2, and Slave3.]
Each layer has an input stage and decoder, and is connected to a
multiplexer for each slave that is on the matrix. The input stage is a buffer
for the address and control information for the transaction. When a layer
is put into a wait condition because a slave is being accessed by another
layer, the address and control information for that transaction must be
saved. The decoder decodes the address that is presented and serves as
the selector for the multiplexer. The multiplexer routes the granted layer
to its slave and pulls HREADY low for any layer that does not have access
to that slave.
Interconnect Input Stage
The input stage buffers data from a master that is trying to access a slave
that is already being accessed by the other layer. It performs this
function by monitoring the HREADY_slave signal from the multiplexer
structure. If this signal is high, the information presented by the layer is
passed straight through. If HREADY_slave is low, the data is registered
and is not presented to the slave until HREADY_slave is asserted again.
Figure 27 shows the I/O for the input stage.
Figure 27. Input Stage
[I/O diagram. Labels recovered from the figure: HADDRESS[31..0], HREADY,
HRESETn, HSEL, HSEL_delayed, HCLOCK; block label: Interconnect Decoder.]
The multi-master reference design also uses the input stage to register the
write data from the layer. The write data is registered one clock cycle after
the address and control information.
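The pass-through and hold behavior of the input stage can be sketched as
a small behavioral model. This is an illustration under stated
assumptions, not the design's RTL: the class name, the method name, and
the dictionary used to carry address and control information are
hypothetical, and the model is not cycle-accurate with respect to the
separate write-data register.

```python
class InputStage:
    """Behavioral sketch of one layer's input stage (hypothetical model)."""

    def __init__(self):
        # Address/control information saved while the layer is waiting.
        self.held = None

    def clock(self, layer_request, hready_slave):
        """Advance one clock and return what is presented to the slave."""
        if hready_slave:
            # Slave path is free: present a held request first, if any;
            # otherwise pass the layer's request straight through.
            out = self.held if self.held is not None else layer_request
            self.held = None
            return out
        # Slave is busy: register the request once and present nothing.
        if self.held is None:
            self.held = layer_request
        return None


stage = InputStage()
request = {"haddr": 0x100, "hwrite": True}
print(stage.clock(request, hready_slave=False))  # nothing presented yet
print(stage.clock(None, hready_slave=True))      # held request is presented
```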
Interconnect Decoder
The interconnect decoder decodes the address presented by the input
stage and drives HSEL to select the slave for which the transaction is
intended. HSEL_delayed is a delayed version of HSEL. Both HSEL and
HSEL_delayed are driven to the multiplexer structure and used as
selectors. Figure 28 shows the I/O for the interconnect decoder.
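A rough behavioral sketch of the decoder follows. The address map, the
class name, and the one-clock delay for HSEL_delayed are all assumptions
made for illustration; the text above states only that HSEL_delayed is a
delayed version of HSEL.

```python
# HYPOTHETICAL address map: one (base, size) pair per slave on the matrix.
ADDRESS_MAP = [(0x0000, 0x1000), (0x1000, 0x1000), (0x2000, 0x1000)]


class InterconnectDecoder:
    """Behavioral sketch of a per-layer decoder (hypothetical model)."""

    def __init__(self, address_map=ADDRESS_MAP):
        self.address_map = address_map
        self.hsel = [0] * len(address_map)
        self.hsel_delayed = [0] * len(address_map)

    def clock(self, haddr):
        """Decode haddr; return (HSEL, HSEL_delayed) as per-slave bit lists."""
        # HSEL_delayed takes the value HSEL had before this update
        # (assumed delay of one clock).
        self.hsel_delayed = self.hsel
        self.hsel = [int(base <= haddr < base + size)
                     for base, size in self.address_map]
        return self.hsel, self.hsel_delayed


decoder = InterconnectDecoder()
print(decoder.clock(0x1004))  # selects the second slave
print(decoder.clock(0x2000))  # selects the third; delayed shows the second
```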
Figure 28. Interconnect Decoder
[I/O diagram of the interconnect decoder. Labels recovered from the
figure: HADDRESS[31..0], HSEL, HREADY, HSEL_delayed, HRESETn, HCLOCK.]
Interconnect Multiplexer Structure
An interconnect multiplexer structure comprises one multiplexer for
every slave (which routes the layers' address, control, and write data to
the slave) and one multiplexer for every layer (which routes the slaves'
responses and read data back to the layer). The multi-master reference
design uses a total of five multiplexers in the interconnect matrix, of which
three are interconnect slave multiplexers and two are interconnect
response multiplexers.
Figure 29 shows an interconnect slave multiplexer, which routes address,
control, and write data.
Figure 29. Interconnect Slave Multiplexer
[I/O diagram of the interconnect slave multiplexer. Labels recovered from
the figure:
Layer 1 side: Layer1_req_HSEL, HADDR_layer1[31..0], HWRITE_layer1,
HTRANS_layer1[1..0], HSIZE_layer1[2..0], HBURST_layer1[2..0],
HWDATA_layer1[31..0], HREADY_in_layer1
Layer 2 side: Layer2_req_HSEL, HADDR_layer2[31..0], HWRITE_layer2,
HTRANS_layer2[1..0], HSIZE_layer2[2..0], HBURST_layer2[2..0],
HWDATA_layer2[31..0], HREADY_in_layer2
Slave side: HSEL_slave, HADDR_slave[31..0], HWRITE_slave,
HTRANS_slave[1..0], HSIZE_slave[2..0], HBURST_slave[2..0],
HREADY_in_slave
Grant: Layer_granted_out, Layer_granted_delay
Clock and reset: HCLOCK, HRESETn]
The interconnect slave multiplexer is also responsible for arbitrating
between the layers that request the slave it is attached to. It accepts
request signals from each of the layers and, based upon the requests,
grants access to the slave by driving layer_granted_out.
Layer_granted_out and its variant layer_granted_delay are both driven to
the interconnect response multiplexer.
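The grant and routing behavior can be sketched as follows. The
arbitration policy shown (the current owner keeps the slave while it keeps
requesting, otherwise the lowest-numbered requesting layer wins) is an
assumption for illustration only; the text above does not specify the
policy. The class name and the one-clock delay of layer_granted_delay are
also assumptions.

```python
class SlaveMultiplexer:
    """Behavioral sketch of one interconnect slave multiplexer."""

    def __init__(self):
        self.layer_granted_out = None    # layer granted in this cycle
        self.layer_granted_delay = None  # layer granted in the previous cycle

    def clock(self, req_hsel, layer_signals):
        """req_hsel: {layer: bool}; layer_signals: {layer: signal dict}.

        Returns the signal set routed to the slave, or None if no layer
        is requesting.
        """
        self.layer_granted_delay = self.layer_granted_out
        owner = self.layer_granted_out
        if owner is None or not req_hsel.get(owner, False):
            # ASSUMED policy: the lowest layer number wins a free slave.
            requesting = sorted(l for l, r in req_hsel.items() if r)
            owner = requesting[0] if requesting else None
        self.layer_granted_out = owner
        return layer_signals[owner] if owner is not None else None


mux = SlaveMultiplexer()
signals = {1: {"HADDR": 0x10}, 2: {"HADDR": 0x20}}
print(mux.clock({1: True, 2: True}, signals), mux.layer_granted_out)
print(mux.clock({1: False, 2: True}, signals), mux.layer_granted_out)
```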
The interconnect response multiplexer shown in Figure 30 routes the
slaves’ transaction responses back to the layers. It is also responsible for
placing the layers in a wait condition if the destination slave is busy.
Figure 30. Interconnect Response Multiplexer
[I/O diagram of the interconnect response multiplexer. Labels recovered
from the figure:
Per slave (slave1, slave2, slave3): HRESP_slaveN[2..0],
HRDATA_slaveN[31..0], HREADY_slaveN, Layer_granted_slaveN,
Layer_granted_slaveN_delay
Layer side: HSEL[2..0], HSEL_delayed[2..0], HTRANS[1..0],
HRESP_layer[2..0], HRDATA_layer[31..0], HREADY_layer
Clock and reset: HCLOCK, HRESETn]
Note: The multi-master reference design contains two interconnect
response multiplexers.
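The role of a response multiplexer can be sketched for one layer as
follows. This is a simplified model under stated assumptions: the function
name and argument layout are hypothetical, and the sketch ignores the
address-phase and data-phase pipelining that the delayed HSEL and grant
signals exist to handle. An HRESP value of 0 stands for the AHB OKAY
response.

```python
def response_mux(layer, hsel, granted, slave_outputs):
    """Return the HREADY/HRESP/HRDATA values seen by one layer.

    hsel: index of the slave this layer addresses, or None if idle.
    granted: granted[i] is the layer that slave i's multiplexer granted.
    slave_outputs: one dict per slave with HREADY, HRESP, and HRDATA.
    """
    if hsel is None:
        # Idle layer: report ready with an OKAY response.
        return {"HREADY": 1, "HRESP": 0, "HRDATA": 0}
    if granted[hsel] != layer:
        # Destination slave is busy with another layer: insert a wait.
        return {"HREADY": 0, "HRESP": 0, "HRDATA": 0}
    # This layer owns the slave: route the slave's response back.
    return dict(slave_outputs[hsel])


outputs = [
    {"HREADY": 1, "HRESP": 0, "HRDATA": 0xAAAA},
    {"HREADY": 1, "HRESP": 0, "HRDATA": 0xBBBB},
    {"HREADY": 0, "HRESP": 0, "HRDATA": 0},
]
granted = [1, 2, None]
print(response_mux(1, 0, granted, outputs))  # layer 1 reads slave 1's data
print(response_mux(1, 1, granted, outputs))  # layer 1 waits for slave 2
```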
Interconnect Matrix Summary
The flexibility of the interconnect matrix allows for several different bus
topologies. The multi-master reference design uses the interconnect
matrix to bridge several bus structures together, but it can be used in
various configurations to meet differing design needs. For additional
information, see the multi-layer AHB specification at
http://www.arm.com.
Appendix A
Implementation Results
This appendix considers the implementation of the multi-master
reference design on an EPXA10 chip with regard to the efficiency of its use
of logic elements (LEs).
Device Utilization
The multi-master reference design uses approximately 12,900 LEs, or 33%
of an EPXA10 device. The majority of the resource usage is in building the
large number of register files—the multi-master reference design bus
structure is only a small portion of the total number of LEs. Table 10 shows
the logic element distribution between the different instances in the
design. These are approximate numbers generated by the
LeonardoSpectrum synthesis tool.
Note: Resource sharing is enabled; therefore, some modules might be
sharing portions of logic.
Table 10. Multi-Master Reference Design Logic Element Distribution

Instance                      Number of   Logic      Percentage of
                              Instances   Elements   Total LEs
----------------------------  ---------   --------   -------------
Master_single_read            1           206        1.63
Master_alu                    1           122        0.97
Master_single_write           1           143        1.13
Master_burst_write            1           218        1.73
Master_burst_read             1           183        1.45
Single_transaction_slave      3           99         0.79
Regfile                       8           7152       56.75
Burst_slave                   3           171        1.36
Narrow slave                  1           72         0.57
Narrow_regfile                1           462        3.67
Wide_slave                    1           168        1.33
Input_stage                   2           96         0.76
Interconnect_mux              3           163        1.29
Interconnect_mux_resp_layer1  1           96         0.76
Interconnect_mux_resp_layer2  1           95         0.75
Ahb_slave_sm                  1           113        0.90
Wait_state_gen                1           63         0.50
Alu_regfile                   1           204        1.62
ALU and other logic           1           2777       22.03
----------------------------  ---------   --------   -------------
Total                                     12603
As the table shows, the majority of the LEs in this design are used for the
register files and the arithmetic unit. One way to decrease the LE count for
register files is to make use of the embedded system block (ESB). The
actual bus structure (multiplexers, decoders, arbiters, and the interconnect
matrix) uses fewer than 1,300 LEs, leaving plenty of room for peripheral logic.
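The distribution in Table 10 can be checked with a few lines of
arithmetic. The LE counts below are copied from the table; grouping the
three register-file rows together is an illustrative choice, not a
breakdown given in the table itself.

```python
# Logic element counts per instance type, copied from Table 10.
le_counts = {
    "Master_single_read": 206, "Master_alu": 122,
    "Master_single_write": 143, "Master_burst_write": 218,
    "Master_burst_read": 183, "Single_transaction_slave": 99,
    "Regfile": 7152, "Burst_slave": 171, "Narrow slave": 72,
    "Narrow_regfile": 462, "Wide_slave": 168, "Input_stage": 96,
    "Interconnect_mux": 163, "Interconnect_mux_resp_layer1": 96,
    "Interconnect_mux_resp_layer2": 95, "Ahb_slave_sm": 113,
    "Wait_state_gen": 63, "Alu_regfile": 204,
    "ALU and other logic": 2777,
}

total = sum(le_counts.values())

# Register files of all three kinds, grouped for illustration.
regfile_les = (le_counts["Regfile"] + le_counts["Narrow_regfile"]
               + le_counts["Alu_regfile"])
regfile_pct = round(100 * regfile_les / total, 2)

print("total LEs:", total)                  # 12603, matching the table
print("register-file LEs:", regfile_les)    # 7818
print("register-file share (%):", regfile_pct)  # 62.03
```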
Performance
The multi-master reference design runs at greater than 33 MHz in an
EPXA10 device. The slowest communication is between the stripe-to-PLD
bridge and a slave on the interconnect matrix. The challenge with this path
is that, for any master to send a transaction to a slave on the interconnect
matrix, signals must propagate through two layers of multiplexing. To
improve on an fMAX of 33 MHz, the multiplexers should be the focus of
optimization. Because of the centralized multiplexing scheme inherent to
AHB, the number of select lines on the multiplexers increases with the
number of peripherals. The delay incurred through a multiplexer
increases dramatically as the number of select lines increases. If a design
has a large number of peripherals, keeping the multiplexers small is the
first way to improve fMAX.
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
http://www.altera.com
Applications Hotline:
(800) 800-EPLD
Copyright © 2002 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the
stylized Altera logo, specific device designations, and all other words and logos that are identified as
trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera
Corporation in the U.S. and other countries. All other product or service names are the property of their
respective holders. Altera products are protected under numerous U.S. and foreign patents and pending
applications, mask work rights, and copyrights. Altera warrants performance of its
semiconductor products to current specifications in accordance with Altera’s standard
warranty, but reserves the right to make changes to any products and services at any time
without notice. Altera assumes no responsibility or liability arising out of the application
or use of any information, product, or service described herein except as expressly agreed
to in writing by Altera Corporation. Altera customers are advised to obtain the latest
version of device specifications before relying on any published information and before
placing orders for products or services.