Briefly explain how the biquad filter works. Reference the verilog code and the OPENCORES
website.
A digital signal processing (DSP) filter works by processing a sequence of numbers at discrete intervals, usually
with the same time interval between each number. The numbers are usually generated by sampling an analog
signal using an ADC, although they can be generated by a similar circuit such as a numerically controlled
oscillator, or from a previously stored data file.
DSP filters are usually characterized by a difference equation (similar to a differential equation with continuous
filters):
$$ y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] - a_1 y[n-1] - a_2 y[n-2] $$
In order to determine what the filter does, one needs to find the transfer function of the filter. This is done by
finding the z-transform of the difference equation.
$$ H(z) = \frac{Y(z)}{X(z)} = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} $$
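The step from the difference equation to the transfer function can be written out explicitly. This is a sketch that assumes the usual sign convention (feedback terms subtracted on the right-hand side) and zero initial conditions. Moving the feedback terms to the left and applying the delay property, under which x[n-k] transforms to z^{-k} X(z), gives:

$$
\begin{aligned}
y[n] + a_1 y[n-1] + a_2 y[n-2] &= b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] \\
Y(z)\left(1 + a_1 z^{-1} + a_2 z^{-2}\right) &= X(z)\left(b_0 + b_1 z^{-1} + b_2 z^{-2}\right)
\end{aligned}
$$

Dividing both sides by X(z) and by the bracketed polynomial on the left gives the ratio Y(z)/X(z).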
This can be factored to yield poles and zeroes in the complex z-plane, from which the frequency response can be
calculated by evaluating the transfer function over the unit circle. The coefficients can be determined by using
typical DSP filter design procedures (such as converting a continuous filter expression into a sampled one, or by
windowing a set of ideal coefficients).
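As an illustration of evaluating the transfer function on the unit circle, the following Python sketch computes the magnitude response and the pole locations of a second-order section. The coefficient values are hypothetical and chosen only for the example; they are not taken from the OPENCORES design, and the sign convention (denominator 1 + a1 z^-1 + a2 z^-2) is an assumption.

```python
import cmath
import math

def biquad_response(b, a, omega):
    """Evaluate H(z) at z = exp(j*omega); b = (b0, b1, b2), a = (a1, a2)."""
    z_inv = cmath.exp(-1j * omega)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = 1 + a[0] * z_inv + a[1] * z_inv ** 2
    return num / den

def biquad_poles(a):
    """Roots of z^2 + a1*z + a2 = 0, i.e. the poles of H(z)."""
    a1, a2 = a
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    return ((-a1 + disc) / 2, (-a1 - disc) / 2)

# Hypothetical coefficients, for illustration only.
B = (0.25, 0.5, 0.25)
A = (-0.5, 0.25)

if __name__ == "__main__":
    for w in (0.0, math.pi / 2, math.pi):
        print(f"omega = {w:.3f} rad/sample, |H| = {abs(biquad_response(B, A, w)):.4f}")
    print("pole magnitudes:", [abs(p) for p in biquad_poles(A)])
```

With these example values the response is largest at DC and falls to zero at half the sampling rate, and both poles lie at radius 0.5, inside the unit circle.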
An implementation of this filter is realized in the time domain by implementing the difference equation. There
are several different methods of implementing a difference equation, each with its own advantages and
disadvantages with regard to circuit complexity, output delay, and so on.
The biquad filter is a fairly simple implementation that implements the difference equation almost as it is written.
The input is sent through a delay line (each delay consisting of 2 cycles), and taps from the delay line are
multiplied by the coefficients and added. Each multiply-and-accumulate stage is given a delay so as to ease the
timing constraints on the multipliers. After the first set of coefficients has been used, the output from previous
stages is added. This feedback loop causes the filter to keep producing output even if the input is 0. If the
coefficients are not properly chosen, the loop may oscillate or become unstable, depending on the location
of the poles (a pole on or outside the unit circle gives a response that does not decay), although a properly
designed filter's output will die out as time progresses.
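The behaviour of the feedback loop can be illustrated with a floating-point model of the difference equation. This is only a sketch: it ignores the pipelining and fixed-point arithmetic of the Verilog implementation, assumes the feedback terms are subtracted, and uses hypothetical coefficient values.

```python
def biquad_filter(x, b, a):
    """Apply y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # delay-line state, cleared as it would be by a reset
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn  # shift the input delay line
        y2, y1 = y1, yn  # shift the output (feedback) delay line
        out.append(yn)
    return out

# A single non-zero sample followed by zeros (an impulse).
impulse = [1.0] + [0.0] * 39

# Hypothetical coefficients: poles at radius 0.5 (stable) ...
stable = biquad_filter(impulse, (1.0, 0.0, 0.0), (-0.5, 0.25))
# ... and a double pole at z = 1.1 (unstable).
unstable = biquad_filter(impulse, (1.0, 0.0, 0.0), (-2.2, 1.21))

if __name__ == "__main__":
    print("stable, last sample:  ", stable[-1])
    print("unstable, last sample:", unstable[-1])
```

Even though the input is zero after the first sample, both filters keep producing output. With the poles inside the unit circle the output decays towards zero, while with the poles outside it the output keeps growing.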
Because the filter is designed to be a general-purpose filter, the coefficients are not predetermined. This
increases the complexity of the circuit: logic must be added to load the coefficients (which is done by the
coefio module), and simplifications cannot be made (for instance, if a coefficient were 1 or 0, a multiplier could be
omitted). The coefio module works by allowing the external circuit to set each of the coefficients, one at a time.
This module then records and holds each of these values, which are connected to the main filter. The main
filter contains a series of multipliers, adders and delay units which together implement the difference equation.
What is "WishBONE"?
WishBONE is a standardized, open bus interface for connecting IP cores on a chip. The interface supports many
different operations. In the filter used in this assignment, the WishBONE interface is used to allow the filter
coefficients to be loaded.
The advantage of using a standardized interface, rather than creating a customized one, is that a designer using
the circuit is much more likely to know how to use the standardized interface, so design time when
implementing the circuit is minimized. In addition, if the module is replacing an older module using the same
interface, then very little code will need to be changed to implement and test the design. With standardized
interfaces, it is much more likely that different modules will be interchangeable. However, using a standard
carries a performance penalty, as the interface is a general-purpose one and is thus unable to take advantage
of any simplifying assumptions that may be present in a given design. For instance, if the coefficients of a filter
were fixed, the circuit could be greatly simplified and the performance improved by using these fixed
coefficients. The tradeoff between a standardized interface and a customized one is thus up to the designer.
How are the filter coefficients stored?
The coefficients are stored in the registers of the coefio module. The external circuit feeds it the
coefficients one at a time, using the address and enable pins to determine which register is
changed.
Can they be changed?
Because each of the coefficients is stored in a register, there is no physical reason why the coefficients cannot be
changed. The interface itself also allows the coefficients to be changed, since there is no
code that prevents existing coefficients from being overwritten. It should also be noted that the reset input will always
reset all of the coefficients to 0.
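The storage and overwrite behaviour described above can be sketched as a small behavioural model in Python. This is not a translation of coefio.v: the address map and the exact acknowledge behaviour are assumptions made for illustration, and only the pin names come from the design.

```python
class CoefRegisterBank:
    """Behavioural sketch of an address-selected coefficient register bank."""

    # Hypothetical address map; the real coefio.v may assign addresses differently.
    ADDRESS_MAP = {0: "a11", 1: "a12", 2: "b10", 3: "b11", 4: "b12"}

    def __init__(self):
        self.regs = {name: 0 for name in self.ADDRESS_MAP.values()}

    def clock(self, rst_i=0, stb_i=0, we_i=0, adr_i=0, dat_i=0):
        """Model one clock edge and return (ack_o, dat_o)."""
        if rst_i:
            # Reset clears every coefficient register.
            for name in self.regs:
                self.regs[name] = 0
            return 0, 0
        name = self.ADDRESS_MAP.get(adr_i)
        if stb_i and we_i and name is not None:
            # A write needs both strobe and write enable; it simply overwrites.
            self.regs[name] = dat_i & 0xFFFF
        dat_o = self.regs[name] if name is not None else 0
        # Assumed here: the module acknowledges whenever it is selected.
        return (1 if stb_i else 0), dat_o


if __name__ == "__main__":
    bank = CoefRegisterBank()
    bank.clock(stb_i=1, we_i=1, adr_i=2, dat_i=0x0100)  # load b10
    bank.clock(stb_i=1, we_i=1, adr_i=2, dat_i=0x0200)  # overwrite b10
    print(bank.clock(stb_i=1, adr_i=2))                 # read b10 back
    bank.clock(rst_i=1)                                 # clear everything
    print(bank.regs)
```

In this model nothing prevents a second write to the same address, which mirrors the observation that the coefficients can be changed at any time.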
Why do you think there are two multiplier modules (multa and multb)?
The two multiplier modules are used in different areas. The multa modules are used for the feedback coefficients, while
the multb modules are used for the feed-forward coefficients. Each has different constraints based on
its expected input and output values.
For instance, as numbers progress through the filter and are multiplied and added, the expected range will
increase. Thus, in order to prevent overflow, the number of bits for the feedback multipliers needs to be
higher than for those in the feed-forward section. As can be seen in the code, the number of bits for the input and
the output of the A and B coefficients are all different, although the input and output sizes within each set are all the
same (i.e. each feedback coefficient multiplier has the same number of output bits).
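The bit-growth argument can be checked with a little arithmetic. The sketch below assumes signed two's-complement values and uses 8-bit operands purely as an illustration; the actual widths declared in multa.v and multb.v are not reproduced here.

```python
def product_bits(a_bits, b_bits):
    """Bits needed to hold any product of two signed two's-complement values."""
    return a_bits + b_bits

def sum_bits(term_bits, n_terms):
    """Conservative width for the sum of n_terms signed values of term_bits bits."""
    return term_bits + (n_terms - 1).bit_length()

def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement number."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

if __name__ == "__main__":
    p = product_bits(8, 8)
    print("one 8x8 product:   ", p, "bits")
    print("sum of 3 products: ", sum_bits(p, 3), "bits")
    print("sum of 5 products: ", sum_bits(p, 5), "bits")
    # The worst-case product (-128 * -128) needs the full 16 bits.
    print(fits_signed((-128) * (-128), 15), fits_signed((-128) * (-128), 16))
```

Each multiplication and each addition widens the range of values that must be represented, so a multiplier fed from later in the datapath has to accept wider operands than one fed directly from the input.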
Provide schematics of all the elaborated modules from Part 2. Hint: There is a quick way to plot
everything out.
The output from the plot command is found at the end of this report. These plots were generated using the
"all designs in hierarchy" option.
Review the verilog module code, the schematic for bqmain, and the documentation at the
OPENCORES site and the PDF file in the DOCs directory to arrive at a reasonable description of
each of the pins of the two blocks "coefio" and "biquad".
Coefio Pin description
The coefio module allows the coefficients of the filter to be loaded and/or changed. To decrease the number of
pins and wires used externally to the filter, the module only allows one coefficient to be loaded at a time.
Pin name      Pin type   Description
adr_i[2:0]    Input      Address selection input. Allows the external interface to select which coefficient is loaded or output.
clk_i         Input      Input clock to the module, which is required since the module uses sequential logic.
dat_i[15:0]   Input      Input data bus. Its value is used to set the selected coefficient.
rst_i         Input      Rising-edge reset. When high, all registers, and thus all coefficients, are set to 0.
stb_i         Input      Strobe input. Indicates that the device has been selected. This appears redundant (it seems to perform the same function as the write enable input), but it is required by the standard: the same write enable may be shared among multiple devices, while each device receives its own strobe input.
we_i          Input      Write enable input. Must be high in order for data to be written.
a11[15:0]     Output     Holds the current value of the a11 coefficient.
a12[15:0]     Output     Holds the current value of the a12 coefficient.
ack_o         Output     Indicates that the module has acknowledged the current request to write data.
b10[15:0]     Output     Holds the current value of the b10 coefficient.
b11[15:0]     Output     Holds the current value of the b11 coefficient.
b12[15:0]     Output     Holds the current value of the b12 coefficient.
dat_o[15:0]   Output     Outputs the current value of the selected coefficient.
Biquad Pin description
The biquad module performs the actual filtering of the input signal. It requires that another module external to
it store and hold all of the filter coefficients (coefio in this case).
Pin name      Pin type   Description
a11[7:0]      Input      Coefficient value for the a11 coefficient.
a12[7:0]      Input      Coefficient value for the a12 coefficient.
b10[7:0]      Input      Coefficient value for the b10 coefficient.
b11[7:0]      Input      Coefficient value for the b11 coefficient.
b12[7:0]      Input      Coefficient value for the b12 coefficient.
clk           Input      Clock signal for controlling the delay lines.
nreset        Input      Reset signal for resetting all registers to 0. Active when nreset is low.
valid         Input      Indicates that the input is valid.
x[7:0]        Input      Input to the filter.
yout[7:0]     Output     Output of the filter.
Write a script to perform all the operations you did manually in Part 2.
The following script was created.
sh mkdir WORK
define_design_lib WORK -path "./WORK"
analyze -format verilog -lib WORK {"/home/jlam/5804/biquad/HDLs/bqmain.v"}
analyze -format verilog -lib WORK {"/home/jlam/5804/biquad/HDLs/biquad.v"}
analyze -format verilog -lib WORK {"/home/jlam/5804/biquad/HDLs/coefio.v"}
analyze -format verilog -lib WORK {"/home/jlam/5804/biquad/HDLs/multa.v"}
analyze -format verilog -lib WORK {"/home/jlam/5804/biquad/HDLs/multb.v"}
elaborate bqmain -arch "verilog" -lib DEFAULT -update
create_schematic -size infinite -gen_database
Write a script to perform all the operations you did manually in Part 3.
The following script was created.
set_load 0.2 "ack_o"
set_load 0.2 "dat_o[15]"
set_load 0.2 "dat_o[14]"
set_load 0.2 "dat_o[13]"
set_load 0.2 "dat_o[12]"
set_load 0.2 "dat_o[11]"
set_load 0.2 "dat_o[10]"
set_load 0.2 "dat_o[9]"
set_load 0.2 "dat_o[8]"
set_load 0.2 "dat_o[7]"
set_load 0.2 "dat_o[6]"
set_load 0.2 "dat_o[5]"
set_load 0.2 "dat_o[4]"
set_load 0.2 "dat_o[3]"
set_load 0.2 "dat_o[2]"
set_load 0.2 "dat_o[1]"
set_load 0.2 "dat_o[0]"
set_load 0.2 "y[7]"
set_load 0.2 "y[6]"
set_load 0.2 "y[5]"
set_load 0.2 "y[4]"
set_load 0.2 "y[3]"
set_load 0.2 "y[2]"
set_load 0.2 "y[1]"
set_load 0.2 "y[0]"
create_clock -name "clk_i" -period 25 -waveform { "0" "12.5" } { "clk_i" }
create_clock -name "dspclk" -period 12.5 -waveform { "0" "6.25" } { "dspclk" }
set_clock_skew -plus_uncertainty 0.25 "dspclk"
set_clock_skew -minus_uncertainty 0.25 "dspclk"
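As a quick check on what these constraints ask for, the clock periods can be converted to frequencies. This assumes the library's time unit is nanoseconds, which the script itself does not state:

$$
f_{\mathrm{clk\_i}} = \frac{1}{25\ \mathrm{ns}} = 40\ \mathrm{MHz}, \qquad
f_{\mathrm{dspclk}} = \frac{1}{12.5\ \mathrm{ns}} = 80\ \mathrm{MHz}
$$

Under the same assumption, the 0.25 uncertainty applied in each direction to dspclk is 0.25 / 12.5 = 2% of its period.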
Once your design is synthesized, Design Analyzer can give you estimates on required area, power, etc.
You can generate these reports by going to Analysis-->Report. Find the estimated area for the layout of
the biquad filter.
The following report was generated, indicating an expected total cell area of approximately 360,000 µm² (0.36 mm²).
****************************************
Report : area
Design : bqmain
Version: X-2005.09
Date   : Wed Dec 12 13:21:33 2007
****************************************

Library(s) Used:

    tcb773pwc (File: /CMC/kits/cmosp35/synopsys/2000.11-SP1/syn/tcb773pwc.db)

Number of ports:               59
Number of nets:                99
Number of cells:                2
Number of references:           2

Combinational area:        268940.000000
Noncombinational area:      88007.500000
Net Interconnect area:      undefined  (Wire load has zero net area)

Total cell area:           356947.500000
Total area:                 undefined
Provide all the plots for the synthesized design. Hint: There is a quick way to generate all the
plots. Compare the synthesized schematics to the elaborated schematics. Discuss three
differences that you can find.
The results of the plot command can be found at the end of this report.
Some of the differences between the two plots are discussed in the following:
1. Previously undefined modules are now given implementations. Previously, the multipliers were drawn as
simple blocks, since the code describing each contained a simple c=a*b statement that does not imply
any particular logic. It does not make sense to associate a logical implementation with it at elaboration,
since the load and timing requirements are not yet known. Even the form that the
multiplier takes is not known, so it must be given a black-box view. Once the design is
synthesized, the synthesizer must choose how to implement each block, and since it now has the timing
and load requirements, it is able to make good decisions as to how to implement the logic.
2. Instances used multiple times are given unique instantiations. For instance, in the original plot, only 2
modules for multipliers exist: multa and multb. In the post-synthesis version, each multiplier has
been given a unique module for each time that it is used. This happens because the uniquify
command was run before synthesis to request it. This is necessary because each
instantiation has its own set of timing requirements and loads, and so it makes sense to treat each
module separately and optimize the contents of each for the specific circumstance that it is in.
3. Generalized logic blocks in the biquad and coefio modules have been implemented using logic gates.
For instance, the original plot of the biquad module included large rectangles labeled SEQGEN, each
with a set of inputs and outputs. In Design Compiler, SEQGEN cells are generic, technology-independent
placeholders for sequential elements (registers), and operations specified in the code, such as an XOR
between two sets of wires, are likewise drawn as generic blocks. Since it is not yet known how this logic should
be implemented, no library cells are chosen for it. In the post-synthesis version, each one of these blocks has
been replaced with the appropriate gates, each corresponding to a specific standard cell in the library.
4. Gates that have been specified in the design have been associated with very specific cells. For instance,
in the biquad module, the original schematic specified inverters. In the post-synthesis version, the
inverters have all been associated with cells in the standard cell library, such as winv_2.
What exactly does "Uniquify" do?
In the filter design, several modules (in this case, the multipliers) were instantiated more than once. The
uniquify command tells the synthesizer to create a new copy of the module for each instantiation. Thus, since multa
is used twice, two different multa modules are created, multa_0 and multa_1. Each new copy is initially identical to
the original, but having a separate copy allows each instantiation to be optimized according to where in
the circuit it is being used.
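The renaming step can be illustrated with a small Python model. This is only a conceptual sketch of the idea, not how Design Compiler implements it; the instance names and the dictionary used to stand in for a module definition are hypothetical.

```python
import copy
from collections import Counter

def uniquify(instances, modules):
    """Give every instance of a multiply-used module its own copy of that module.

    instances: list of (instance_name, module_name) pairs
    modules:   dict mapping module_name to its definition
    """
    uses = Counter(ref for _, ref in instances)
    next_index = Counter()
    new_instances, new_modules = [], {}
    for inst, ref in instances:
        if uses[ref] > 1:
            new_ref = f"{ref}_{next_index[ref]}"  # e.g. multa -> multa_0, multa_1
            next_index[ref] += 1
        else:
            new_ref = ref  # modules used once keep their name
        new_modules[new_ref] = copy.deepcopy(modules[ref])
        new_instances.append((inst, new_ref))
    return new_instances, new_modules

# Hypothetical instance names, for illustration only.
INSTANCES = [("u_fb1", "multa"), ("u_fb2", "multa"), ("u_io", "coefio")]
MODULES = {"multa": {"body": "c = a * b"}, "coefio": {"body": "registers"}}

if __name__ == "__main__":
    insts, mods = uniquify(INSTANCES, MODULES)
    print(insts)
    print(sorted(mods))
```

After this step each copy can be modified independently, which is what allows the synthesizer to optimize multa_0 and multa_1 differently.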
In your synthesized "coefio" module, find U86. What is the name of this component?
When the design was first compiled, instance U86 was found to be an inverter. This can be seen in Figure 1.
Figure 1 Location of the U86 module
Because it is an inverter, no documentation for it was found in the DesignWare manuals on the Synopsys website.
Its function is obvious, so it does not need to be documented.
When the design was compiled using the DC setup file for assignment 5, the instance U86 was found, although it
was located in the biquad module (in the biquad_DW01_inc_7 submodule). It is a winv_2 cell.
The submodule it is in (the DW01_inc cell) is defined in the DesignWare manual as the following:
Incrementer DW01_inc adds 1 to an input number A to produce the output SUM.
Do you think that the synthesized logic is the best that Synopsys can generate? If not, explain how you
could improve on the output.
It is very improbable that the generated design is the best that can be created. This is because the
true circumstances in which the circuit will be used are not known. For instance, only rough estimates of the
clock uncertainty were entered. If more information about the clock were entered, the design would have more
precise timing requirements. This would enable the synthesizer to choose cells better, since it would no longer have
to operate with worst-case estimates. Some paths would then likely have less stringent timing requirements,
allowing smaller cells to be used. More information about the input/output signals would likewise
enable the synthesizer to find a better solution.
The design can also be optimized by changing some of the settings in the synthesizer. For instance, the results
files indicate that the synthesizer was set to “medium” effort, which is the typical default. Setting this
to a higher value would cause the synthesizer to take more time synthesizing the design, but would likely
produce a better result. In addition, the synthesizer likely has settings that let it optimize specifically for
power, area or speed, thus enabling a better design if one constraint is more important than the others.
The design can also be improved upon by modifying the library that it uses. For instance, the current library has
several different versions of each gate, each capable of driving a different load, and each having its own values
for power and speed. This allows the synthesizer to pick the version of a gate that best suits where it is being
used. Increasing the granularity by adding more versions of gates would give the synthesizer even more
options, so it would likely be able to find a better solution. In addition to this, creating customized
versions of blocks such as incrementers or multipliers would allow their logic to be optimized to a greater
degree. For example, if a combinational logic circuit were implemented as a single complex gate using static CMOS
(or some other logic family), fewer transistors would be used than if the circuit were implemented using
several separate gates.
From Part 1, briefly explain how the targeted library was changed using the .synopsys_dc.setup
file. Indicate the lines in the setup file that were changed. Do you think that targetting the wcells
library resulted in a longer compilation time (Part 4)? If you think that it took longer to compile,
provide a reason why it took longer to compile.
By comparing the two DC setup files, it can be seen that the key difference between the two files is that the
following lines have been commented out in the previous setup file:
link_library = "* wcells.db wsram.db"
target_library = {wcells.db wsram.db}
symbol_library = {wcells.sdb}
while the following lines were commented out of the current setup file:
link_library = "* tcb773pwc.db tpd773pnwc.db wsram.db"
target_library = {tcb773pwc.db tpd773pnwc.db wsram.db}
symbol_library = {tcb773p.sdb tpd773pn.sdb}
These changes cause the target library to be switched to wcells. This is necessary because the wcells library
contains the abstract views needed to complete the floor planning, as well as the transistor-level contents
of each cell.
As for the compile time, no significant difference was observed. It should be noted
that the exact time of each run was not measured, but each compile appeared to take around 10 to 20 minutes.
In theory, there should not be a significant difference between the two libraries, as the compiler has to
follow the same steps for each. With the simpler wcells library, however, there are fewer cells and thus
fewer options for the compiler to pick from, so it might be a little faster. This would, of course, come at the
expense of poorer performance.
From Part 4, once your design is synthesized, Design Analyzer can give you estimates on
required area, power, etc. You can generate these reports by going to Analysis-->Report. Find
the estimated area for the layout of the biquad filter. Compare the reported area to the area
reported for the synthesis run you performed for Assignment #4. Explain the size difference if
there is one.
The following report was obtained for the synthesis:
Information: Updating design information... (UID-85)
****************************************
Report : area
Design : bqmain
Version: X-2005.09
Date   : Wed Dec 12 14:52:05 2007
****************************************

Library(s) Used:

    wcells (File: /CMC/kits/cmosp35/synopsys/2000.11-SP1/syn/wcells.db)

Number of ports:               59
Number of nets:                99
Number of cells:                2
Number of references:           2

Combinational area:        477041.000000
Noncombinational area:      95469.000000
Net Interconnect area:      undefined  (Wire load has zero net area)

Total cell area:           572510.000000
Total area:                 undefined
As can be seen, approximately 573,000 µm² is required, which is about 1.6 times the area required by the previous
synthesis. This is what is expected, as the wcells library is less complex, so a larger number of simpler cells is
needed to perform what a single, more complex cell in the bbox library could do on its own.
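The comparison can be reproduced directly from the unrounded figures in the two area reports. The short Python check below uses only numbers that appear in those reports; the variable names are arbitrary.

```python
# Combinational + noncombinational area from each report.
area_tcb = 268940.0 + 88007.5      # tcb773pwc run (Assignment #4)
area_wcells = 477041.0 + 95469.0   # wcells run (this assignment)

ratio = area_wcells / area_tcb

if __name__ == "__main__":
    print(f"tcb773pwc total cell area: {area_tcb}")
    print(f"wcells total cell area:    {area_wcells}")
    print(f"ratio:                     {ratio:.3f}")
```

Both sums match the "Total cell area" lines of the reports, and the ratio comes out at roughly 1.60.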
Provide all the plots for the synthesized design from Part 4. Hint: There is a quick way to
generate all the plots. Compare the synthesized schematics generated in this assignment to the
synthesized schematics generated in Assignment #4. Discuss a couple of differences that you
can find.
The plot for the synthesis can be found in the Appendix. Some of the notable differences between the two versions
include the following:
Logic complexity of each is different: It was observed that the bbox version contained fewer cells, but of greater
complexity. For instance, gates with individual inputs inverted, 3-input XOR gates, and adders were found, all of
which were missing in the wcells version, where they were likely implemented by a larger number of simpler gates.
Different cell names are used: In the wcells version, each of the cell names is from the wcells library, so names
such as winv_2 and wand2_2 are observed, whereas in the other version the names follow another
convention, such as INV4. This is expected: since the target libraries are different, the cells
implementing the design will be different and will thus have different names. Since many of the cells in each
version perform the same task (i.e. invert, AND, etc.), the names stay similar (both have inv or
AND in the name) and only the naming convention changes.
Logic is arranged differently: Since the wcells version uses more cells, but of a less diverse variety, the logic
for each of the blocks is unlikely to remain the same. This is observed, and there is no reason why the block
view of one version should match that of the other. For instance, in the wcells version, a group of AND and NAND
gates is found, whereas in the other view NAND, AND and OR gates are found (with fewer of each).
How many standard cells were required to create the "coeficio" block? How did you find this
out?
By looking at the attributes window in First Encounter (see Figure 2), it can be seen that the coefioi module
contains 473 standard cells. This was verified by looking at the design browser after the routing was completed
(see Figure 3).
Figure 2 Information window showing details about the coefio cell, indicating the number of cells it contains
Figure 3 Design Browser window showing number of cells in each module
In Part 8 and Part 9, power rings and stripes were created. The rings run around the perimeter of
the core, and the strips run vertically through the core. However, there doesn't seem to be any
horizontal power routing through the core! Is this a mistake? Why or why not?
The power stripes were created in only one direction because that is all that is needed. In a standard cell
design, the power and ground rails run in a single direction along each row, passing from one cell to the next. Thus, the only
additional routing that is needed is from one row of cells to the next. In this design, the rows all run from left
to right, so horizontal power routing is already provided by the rows themselves and only vertical routing is needed.
In Part 10, FE decided where to place the standard cells corresponding to each module. It did try
to keep the standard cells for each module in a contiguous group. Read the FE documentation
and get FE to place the standard cells for each module exactly where you specify. For example,
you can draw a rectangular region and force FE to place the standard cells for a given module in
that region only.
Figure 4 Floorplanning window after specifying the location of the coefioi module
By double-clicking on the module in the floor planner, a constraint for the placement of the module can be
added. By setting the constraint type to “Fence” as opposed to “Guide” (see Figure 4), all of the cells
in the module will be placed inside the specified rectangle (see Figure 5). The guide-type constraint only acts
as a soft constraint, and allows the placer to put cells outside of the rectangle if doing so would improve the
performance. The reason why a constraint would be used is that the placement of cells is a highly
complicated procedure, and the computer is unlikely to find the single best placement the first time. While it can
optimize a design by using algorithms like simulated annealing, it is likely that the algorithm will get stuck in a
local minimum, leaving the design in a reasonably good state but far from the ideal one. Thus, if
the engineer examines the design and determines beforehand where the modules should go, the
computer will be much more likely to find a design that is close to optimal for routing, power, size and so on.
Figure 5 Floor plan of the final design showing coefioi modules inside the specified rectangle
In Part 11, FE routed the design. Which metal layers did it use? If it didn't use metal 1, why not?
What information does FE give you when you click on a metal routing line?
The routing of the standard cells can be seen in Figure 6. The blue wires are M1, the red ones M2 and the green
ones M3. As can be seen, no metal 1 has been used for the routing. This is because the routing within
the standard cells is done mostly on the metal 1 layer, which does not leave room for intercell routing on
M1.
As can be seen in the figure, the routing on M3 is all vertical, since doing otherwise would cause it to
overlap the power and ground rails. Thus, the horizontal routing can be done using only M2, since M3 can handle all
of the vertical routing. Dividing the routing between two layers like this simplifies the routing.
Figure 6 Floor plan of die after routing
When a metal routing line is selected, the information that can be obtained from the tool can be seen in Figure 7.
Here, it can be seen that the information includes the wire direction, the wire bounding box (giving the extents
of the metal routing), the metal layer used, and the net it is attached to. This information is also shown in the
query window when hovering over a wire.
Figure 7 Attribute editor after double clicking a wire
In Part 12, did the position of your U101 match the position of my U101 shown as above? If it
didn't, justify the difference.
When the routing was performed, 4 instances named U101 were found. These can be seen in Figure 9 as the
modules highlighted in yellow. The reason why there would be a difference between the two positions is that during the
compilation and synthesis steps, the computer attempts to optimize the design, and this optimization is partly
random. It is thus unlikely that two different runs, optimized with different versions of the same
tools and using different versions of the same kits, will yield exactly the same results. Since many optimization
algorithms use random numbers to obtain initial guesses, it is also unlikely that 2 runs will yield exactly the same
result, even if performed with the same equipment and tools.
Figure 8 Design Browser window showing the found instances with the U101 name
Figure 9 Floor plan of design with U101 modules highlighted