Journal of Systems Architecture 52 (2006) 1–12
www.elsevier.com/locate/sysarc
A core generator for arithmetic cores and testing
structures with a network interface ☆
D. Bakalis a,c,*, K.D. Adaos a, D. Lymperopoulos a, M. Bellos a,
H.T. Vergos a, G.Ph. Alexiou a,b, D. Nikolos a,b
a Computer Engineering and Informatics Department, University of Patras, 265 00 Patras, Greece
b Research Academic Computer Technology Institute, 61 Riga Feraiou Str., 262 21 Patras, Greece
c Electronics Laboratory, Department of Physics, University of Patras, 265 00 Patras, Greece
Received 7 October 2002; received in revised form 30 September 2003; accepted 16 December 2004
Available online 13 June 2005
Abstract
We present Eudoxus, a tool for generation of architectural variants for arithmetic soft cores and testing structures
targeting a wide variety of functions, operand sizes and architectures. Eudoxus produces structural and synthesizable
VHDL and/or Verilog descriptions for: (a) several arithmetic operations including addition, subtraction, multiplication,
division, squaring, square rooting and shifting, and (b) several testing structures that can be used as test pattern generators and test response compactors. Interaction with the user takes place through a network interface. Since the end user
is presented with a variety of unencrypted structural cores, each one describing an architecture with its own area, delay
and power characteristics, he can choose the one that best fits his specific needs and then further optimize or
customize it. Therefore, designs utilizing these cores are completed in less time and with less effort.
© 2005 Elsevier B.V. All rights reserved.
☆ Based on "EUDOXUS: A WWW-based Generator of Reusable Arithmetic Cores", by D. Bakalis, K.D. Adaos, D. Lymperopoulos, G.Ph. Alexiou and D. Nikolos, which appeared in Proceedings of 12th IEEE International Workshop on Rapid System Prototyping, Monterey, CA, USA, pp. 182–187. © 2001 IEEE.
* Corresponding author. Tel.: +30 2610 996287; fax: +30 2610 997456.
E-mail address: [email protected] (D. Bakalis).
1. Introduction
Engineers who deal with System-on-a-Chip
(SoC) design face the challenge of integrating a rich
set of features in a single piece of silicon, with
high performance requirements, on progressively
shorter product life cycles in order to achieve
today's strict time-to-market goals [1,2]. To cope
with these challenges, a block-based approach
that emphasizes design reuse becomes necessary.
1383-7621/$ - see front matter 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.sysarc.2004.12.006
D. Bakalis et al. / Journal of Systems Architecture 52 (2006) 1–12
The use of pre-designed and pre-verified design
blocks (cores), available in technology independent
descriptions such as in Hardware Description Languages (HDLs) reduces time-to-market through
faster design cycles and leads to higher productivity of the design teams. Moreover, it minimizes
risk and enables cost-effective solutions. In order
to be effective, this block-based design methodology requires access to an extensive library of
reusable cores or equivalently to a versatile core
generator.
The size and complexity of today's
SoCs also require the verification of the design
before its ASIC implementation. This verification
step is commonly accomplished by using a platform of reconfigurable hardware utilizing one or
more FPGAs. To exploit the full potential of the
rapid prototyping process and minimize the development time and effort, it is desirable to use the same HDL description
in both ASIC and FPGA
implementations.
As the amount of information required to complete a design increases, the Internet has
emerged as a suitable tool for the delivery of information. Network-based applications and design
tools have become popular in recent years
[3,4] since they offer several advantages compared
with conventional workstation-hosted design techniques. First, design tools are developed
for a single platform, resulting in reduced cost.
Furthermore, upgrades and bug fixes are immediately available to users, and installation costs
are minimized. Moreover, better access control
can be achieved. Finally, performance-intensive
design tools can run on powerful network servers,
yet they can be accessed from low-performance
low-cost desktop computers [5]. Several attempts
at producing network-based design tools and
applications have been reported in recent
years [5–14].
Arithmetic modules, such as adders and multipliers, are common and essential building blocks
for many applications [15]. Built-In Self-Test
(BIST) structures are also candidates for reuse
across multiple designs. Algorithms for arithmetic
or testing operations, coded in a technology independent HDL and synthesizable in a variety of
commercial products, offer the designer efficient
and versatile building blocks for developing complex System-on-a-Chip solutions. Most available
synthesis tools offer some kind of block generator
for arithmetic modules. Common limitations are the
lack of a variety of architectures for all possible
implementation technologies and the fact that
the produced modules are encrypted and therefore
cannot be customized.
Several arithmetic module generators have been
presented in recent years. In [12] an on-line
arithmetic design tool for FPGAs is presented.
The tool was developed in the Java programming language and generates standard VHDL code. However, it is restricted to on-line arithmetic, that is,
all operations are performed digit serially, most
significant bit first. A web-based arithmetic module generator is presented in [13] which is capable
of producing multipliers, squarers and adders. In
[16] a module generator for generating complex
arithmetic structures is presented. Rather than
using distinct adder or multiplier modules, this
generator combines the supported arithmetic
expressions into a single module. The disadvantage of this approach is that it prevents the designer
from examining different alternatives to meet his
design goals. The authors of [17] present a
macro-cell generator for floating-point division
and square root operations. A module generator
for logic-emulation applications, which is able to
generate macro-cells of arbitrarily complex functions described in HDL, is proposed in [18]. This
module generator accepts a Verilog description and
produces partitioned Configurable Logic Block
(CLB) netlists for decomposing the design into
multiple FPGA chips. In [19], a generator for multiplier and squarer structures is proposed. The
generator utilizes Wallace tree summation and
produces synthesizable VHDL code. However, it
is a stand-alone tool limited to only these two
types of arithmetic structures. Another standalone generator that implements multiplexer-based
structures is presented in [20].
In this work we present Eudoxus, a generator
of arithmetic cores and testing structures with a
network interface. (The core generator is named Eudoxus after the ancient
Greek mathematician and astronomer who substantially
advanced number theory.) The output of Eudoxus is
synthesizable VHDL and/or Verilog structural
descriptions of the arithmetic modules or testing
structures. The user only needs to provide the type
of the requested operation and the size and form of
the operands. Since a completely structural, nonencrypted, description is produced, reusability is
assured. Eudoxus handles several arithmetic operations, such as addition, subtraction, multiplication,
division, square rooting, squaring and shifting.
Moreover, for each such operation Eudoxus can
produce more than one distinct architecture.
The designer can thus select the one that best
meets his specifications or even choose to customize the one closest to his specifications. Therefore,
the generator can both alleviate the design effort
and help reduce time to market.
Finally, Eudoxus can produce testing structures
that are based on Linear Feedback Shift Registers
(LFSRs), Cellular Automata (CA) and accumulators. These testing structures can be used for pseudorandom test pattern generation and test response
compaction in Built-In Self-Test.
The core generator in this work is a direct
improvement of the generator presented in [21].
Several new arithmetic cores have been added to
the generator as well as the testing structures.
The number of different cores supported in the
new version of the generator is twice the number
of different cores supported in the original version.
Furthermore, design examples and comparisons
are also included in this work.
The rest of the paper is organized as follows:
Section 2 presents the core generator. Two examples of circuits whose design was simplified by
the proposed generator are given in Section 3.
A comparison between the results obtained after
synthesis of the descriptions produced by Eudoxus
and those produced by commercial generators is
given in Section 4.
2. The core generator
The characteristics of the core generator, the
way it produces the cores, the currently supported
cores and its network interface are explained in the
following.
2.1. Characteristics of the core generator
The following decisions were made regarding
the implementation of the core generator. In order
to ensure technology independence and reusability, Eudoxus produces its descriptions at an
abstract level, that is, in structural Verilog and/or
VHDL. The need for supporting both
languages is driven by the fact that both of them
are considered established standards in the design
world. The support of both HDLs helps the designer to easily integrate a module with the rest
of a design and map it to a specific technology library through the use of any commercially available logic synthesis tool. Furthermore, since the
cores are not encrypted, customization is possible.
The cores generated by the tool are described in a
hierarchical way, with small behavioral or structural subfunctions in the leaf cells (e.g. full adders,
half adders). To further optimize his design, the designer may therefore choose to
define his own gate-level implementations of the
leaf cells, even if he has no knowledge of the algorithm used for implementing the arithmetic operation. He also has
the ability to insert pipeline registers between
the stages of a core or to combine more than one
core to implement specialized operations.
Eudoxus supports a comprehensive set of different architectures for each of the four basic
arithmetic operations (addition, subtraction, multiplication and division), assuming that the
operands are applied either in parallel or serially.
This feature provides the designer with a large number
of design alternatives for a specific type of arithmetic operation and the ability to choose the one that
best meets his area, delay, power and testability
constraints. Moreover, Eudoxus is able to produce
some other arithmetic modules that are frequently
used in system design, such as fractional and integer squarers and square rooters and multiplexer-based shifters. It also supports a comprehensive
set of different architectures for pseudorandom
test pattern generation and test response compaction.
Finally, Eudoxus is implemented as a software
tool with a network-based user interface, enabling
the designer to interact with it without having to
install it on a local workstation. This remote
operation additionally simplifies the maintenance
of the tool.
2.2. Core production
In this section we briefly describe the way Eudoxus produces the cores. Every supported architecture
exhibits a layout regularity, that is, each
architecture can be thought of as a regular interconnection of basic design blocks (cells). The cells
may be as simple as a single gate or slightly more
complex, as for example a full adder, a Booth Encoder or a carry save adder stage. The regularity of
an architecture can be either linear or two-dimensional. For example, a ripple carry adder has a linear regularity whereas a parallel prefix adder or an
array multiplier has a two-dimensional one. Once
the functionality, the architecture and the size of
the requested core have been specified by the user,
Eudoxus produces a structural HDL description
of the core based on the following steps:
Step 1 (core definition): The names of the primary inputs and primary outputs of the core
are determined based on its size and functionality. Information about the type of the core as
well as its primary inputs and outputs are then
placed in the beginning of the core description.
Step 2 (basic cell definition): The generator,
using an internal look-up table, determines the
basic cells that are required to create the core,
e.g. full adders, one-bit controlled adders/subtracters, etc. Their HDL descriptions are then
produced and placed after the core information
part in the core description.
Step 3 (core description): The module of the
requested core is produced next. The size
of the core determines the number and type of
the basic cells that are needed. At first, the module's input and output ports are declared. Then
multiple instances of each of the basic cells are
produced. In every instantiated basic cell, each
input is connected to either a primary
input of the core or an output of another basic cell, whereas
each output is connected to either a primary output of the core or an input of another basic cell.
The generator, according to the functionality and architecture of the requested core, determines
the interconnections between the basic cells
and the primary inputs and outputs of the core.
Step 4 (test-bench): Once the core has been fully
described in HDL, a test-bench for functional
verification is also produced which can provide
input vectors to the core giving the designer the
ability to check its correct functionality by a
logic simulation tool.
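
As an illustration of Steps 1 to 3, the following Python sketch emits a structural Verilog description of an n-bit ripple carry adder, the linear-regularity example mentioned above. It is not the Eudoxus implementation (which is written in ANSI C): the function names, the module names `full_adder` and `rca_<n>`, and the exact Verilog text are assumptions made for this example, and the test-bench of Step 4 is omitted.

```python
def full_adder_cell() -> str:
    """Step 2: HDL text of the only basic cell a ripple carry adder needs."""
    return "\n".join([
        "module full_adder(input a, input b, input cin, output s, output cout);",
        "  assign s = a ^ b ^ cin;",
        "  assign cout = (a & b) | (a & cin) | (b & cin);",
        "endmodule",
    ])


def ripple_carry_adder(n: int) -> str:
    """Return a structural Verilog description of an n-bit ripple carry adder."""
    if n < 2:
        raise ValueError("operand size must be at least 2 bits")
    lines = [
        # Step 1: core information placed at the beginning of the description.
        f"// Core: {n}-bit ripple carry adder (illustrative sketch)",
        f"// Inputs: a[{n - 1}:0], b[{n - 1}:0], cin  Outputs: s[{n - 1}:0], cout",
        "",
        # Step 2: descriptions of the basic cells follow the core information.
        full_adder_cell(),
        "",
        # Step 3: declare the ports of the requested core.
        f"module rca_{n}(input [{n - 1}:0] a, input [{n - 1}:0] b, input cin,",
        f"             output [{n - 1}:0] s, output cout);",
        f"  wire [{n}:0] c;  // carry chain between neighbouring cells",
        "  assign c[0] = cin;",
    ]
    # Step 3 (continued): one full adder per bit; the carry output of cell i
    # drives the carry input of cell i + 1 (linear regularity).
    for i in range(n):
        lines.append(
            f"  full_adder fa{i}(.a(a[{i}]), .b(b[{i}]), .cin(c[{i}]), "
            f".s(s[{i}]), .cout(c[{i + 1}]));"
        )
    lines += [f"  assign cout = c[{n}];", "endmodule"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(ripple_carry_adder(4))
```

A two-dimensional architecture such as an array multiplier would follow the same pattern with a nested loop over rows and columns of cells.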
2.3. Arithmetic cores, testing structures
and supported sizes
The currently supported cores are presented in
Table 1. This table is divided into four major parts.
The first one presents arithmetic cores where all
operands are applied in parallel (Parallel Arithmetic Cores). The second part presents the cores
where one operand is applied in parallel whereas
the other operand is applied serially (Serial/Parallel Arithmetic Cores) and the third part presents
the cores where all operands are applied serially
(Serial Arithmetic Cores). The fourth part presents
some structures that are suitable for pseudorandom testing of the arithmetic cores (Testing Structures). These structures include pseudorandom test
pattern generators, based on Linear Feedback
Shift Registers (LFSRs), Cellular Automata (CA)
and accumulators, and test response analyzers
based on Multiple Input Linear Feedback Shift
Registers (MISRs) and can be combined with circuits under test to produce complete BIST
schemes.
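
To make the role of these testing structures concrete, the sketch below models an LFSR-based pattern generator and a MISR-based response analyzer in Python. It is a behavioral illustration only, not one of the generated HDL cores; the function names, the 4-bit width and the tap positions (feedback from bits 3 and 2, a maximal-length choice for 4 bits) are assumptions made for this example.

```python
def _feedback(state: int, taps) -> int:
    """XOR of the tapped state bits."""
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return fb


def lfsr_patterns(width: int, taps, seed: int, count: int):
    """Yield `count` successive LFSR states, used as pseudorandom test patterns."""
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        state = ((state << 1) | _feedback(state, taps)) & mask


def misr_signature(width: int, taps, responses) -> int:
    """Compact a sequence of circuit responses into a single signature."""
    mask = (1 << width) - 1
    sig = 0
    for r in responses:
        # Shift with feedback as in the LFSR, then XOR in the parallel response.
        sig = (((sig << 1) | _feedback(sig, taps)) ^ r) & mask
    return sig


if __name__ == "__main__":
    taps = (3, 2)
    patterns = list(lfsr_patterns(4, taps, seed=1, count=15))
    # Toy circuit under test (an assumption): adds 3 modulo 16.
    good = [(p + 3) & 0xF for p in patterns]
    faulty = list(good)
    faulty[0] ^= 0b0100  # inject a single-bit error in one response
    print(misr_signature(4, taps, good), misr_signature(4, taps, faulty))
```

In a BIST scheme the signature of the fault-free circuit is computed once and compared with the signature obtained on chip.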
We have to note that all the above algorithms
lead to highly regular structures, that is, the produced cores are designed in the form of regularly
repeated patterns of identical circuits. Thus, due
to their geometrical regularity, they are suitable
for efficient synthesis and implementation in both
FPGA and ASIC technologies.
The implementation of the above algorithms for
arithmetic operations imposes some restrictions on
the size of the operands that can be handled. For
example, since we use two-bit recoding for Booth
multipliers, the operand size must be a multiple
of 2. Moreover, in the current version of the tool,
we support only modules with equally sized operands (in the case of dividers, the size of one
operand is twice the size of the other). The currently supported sizes are also shown in Table 1.
The third and fifth columns of Table 1 indicate
the smallest and largest size supported. The fourth
column indicates the step between two supported
sizes. For example, in the case of parallel multi-level carry look-ahead adders, the tool can produce
all cores with operand size a multiple of 4, e.g. 4, 8,
12, ..., 256.

Table 1
Arithmetic cores, testing structures and operand sizes

Operation                             Architecture                                           Minimum size  Step  Maximum size

Parallel Arithmetic Cores
Adders/subtracters                    Ripple Carry Adder/Subtracter                          2             1     1024
                                      Group Carry Look-Ahead Adder/Subtracter                4             4     1024
Adders                                Ripple Carry Adder                                     2             1     1024
                                      Group Carry Look-Ahead Adder                           4             4     1024
                                      Multi-Level Carry Look-Ahead Adder [22]                4             4     256
                                      Ladner-Fischer Parallel-Prefix Adder [23]              2             1     1024
                                      Kogge–Stone Parallel-Prefix Adder [24]                 2             1     1024
Subtracters                           Ripple Carry Subtracter                                2             1     1024
                                      Group Carry Look-Ahead Subtracter                      4             4     1024
Multipliers                           Carry Propagate Array Multiplier                       2             1     256
                                      Carry Save Array Multiplier [22]                       2             1     256
                                      TriSection Pezaris Array Multiplier [25]               2             1     256
                                      Baugh–Wooley Array Multiplier [26]                     2             1     256
                                      Modified Booth Multiplier [27]                         2             2     256
Dividers                              Restoring Cellular Array Dividers [22]                 8, 16, 32, 64 (supported sizes)
                                      Non-Restoring Cellular Array Dividers [22]             8, 16, 32, 64 (supported sizes)
Square rooters                        Non-Restoring Fractional Square Rooter [22]            4             2     256
Squarers                              Non-Restoring Fractional Squarer [22]                  4             2     256
Shifters                              Multiplexer-based Shifter [28]                         2             1     256

Serial/Parallel Arithmetic Cores
Multipliers                           Unsigned Serial/Parallel Multiplier                    2             1     256
                                      2's complement Serial/Parallel Multiplier [29]         2             1     256

Serial Arithmetic Cores
Adders                                Serial adder                                           2             1     1024
                                      Serial Column Adder                                    4             2     1024
Multipliers                           Serial multiplier [30]                                 4             2     1024
Squarers                              Serial squarer [30]                                    4             2     1024
Complementers                         Serial 1's/2's complement                              2             1     1024

Testing Structures
Pseudorandom test pattern generators  Linear Feedback Shift Registers (LFSRs) [31]           2             1     256
                                      Cellular Automata based [32]                           2             1     256
                                      Accumulator-based without carry feedback [33]          2             1     1024
                                      Accumulator-based with carry feedback [33]             2             1     1024
Response analyzers                    Multiple Input Linear Feedback Shift Register (MISR)   2             1     256
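
The minimum/step/maximum convention of Table 1 can be expressed in a few lines of Python. This is only a sketch of how the table is read; the function names are assumptions and do not come from the tool.

```python
def supported_sizes(minimum: int, step: int, maximum: int) -> range:
    """All operand sizes covered by one row of Table 1."""
    return range(minimum, maximum + 1, step)


def is_supported(size: int, minimum: int, step: int, maximum: int) -> bool:
    """True if `size` is one of the sizes the row allows."""
    return size in supported_sizes(minimum, step, maximum)


if __name__ == "__main__":
    # Multi-level carry look-ahead adder row: minimum 4, step 4, maximum 256.
    print(list(supported_sizes(4, 4, 256))[:3])
    # Modified Booth multiplier row: minimum 2, step 2, maximum 256.
    print(is_supported(7, 2, 2, 256))
```

The divider rows list explicit sizes instead of a range, so they would be checked against that list directly.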
2.4. Structure of the core generator
The interface of the core generator is described in Fig. 1. The arrows define the flow of
information between the client, where the designer
works, and the server, where the generator is installed. The designer, using any web browser,
initiates a request to use the generator. If
he is authorized to use the tool, a parameter form
is passed to him. Information about the supported cores and sizes is available to help him
make his decisions. The designer defines the type
of the arithmetic or testing operation, the size of
the operands and the HDL of the requested core. Then, the parameters are passed
back to the HTTP server, which starts the core
generator as a system process. Eudoxus produces
the requested core and stores it on the HTTP server.
Then a hyperlink to the produced core is passed
to the client, which is able to browse or download
the core. At the client, the core can be combined
with the rest of a design and mapped to a specific
technology with a logic synthesis tool or can be
simulated for functional verification using the
test-bench module that is produced together with
the core. Some snapshots of Eudoxus are given in
Fig. 2.

Fig. 1. The interface of the generator.
The generator of arithmetic cores and testing
structures has been implemented using ANSI C
whereas the user interface was based on HTML
and CGI/Perl scripts.
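
The server-side step of this flow (receive the form parameters, check them, start the generator as a system process) might look roughly as follows. The actual interface uses CGI/Perl scripts and its details are not given here, so everything specific in this Python sketch is hypothetical: the form field names `core`, `size` and `hdl`, the two-entry catalogue, and the `eudoxus` executable name with its flags. The function only builds the command line and does not run anything.

```python
from urllib.parse import parse_qs

# Hypothetical catalogue: core name -> (minimum, step, maximum), after Table 1.
CATALOGUE = {
    "ripple_carry_adder": (2, 1, 1024),
    "modified_booth_multiplier": (2, 2, 256),
}
LANGUAGES = {"vhdl", "verilog"}


def build_generator_command(query: str) -> list:
    """Validate a submitted parameter form and return the generator argv."""
    params = {key: values[0] for key, values in parse_qs(query).items()}
    core = params.get("core", "")
    hdl = params.get("hdl", "").lower()
    size = int(params.get("size", "0"))  # raises ValueError if not a number
    if core not in CATALOGUE:
        raise ValueError(f"unsupported core: {core!r}")
    if hdl not in LANGUAGES:
        raise ValueError(f"unsupported HDL: {hdl!r}")
    minimum, step, maximum = CATALOGUE[core]
    if size not in range(minimum, maximum + 1, step):
        raise ValueError(f"unsupported operand size: {size}")
    # Hypothetical executable name and flags, for illustration only.
    return ["eudoxus", "--core", core, "--size", str(size), "--hdl", hdl]


if __name__ == "__main__":
    print(build_generator_command("core=ripple_carry_adder&size=32&hdl=VHDL"))
```

Rejecting unsupported parameters before the process is started keeps invalid requests from ever reaching the generator.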
3. Design examples
The applicability of the presented generator is
demonstrated in this section in the design of two
applications, namely of (a) a floating-point multiplier, and (b) a FIR digital filter.
3.1. Floating-point multiplier
We have used the core generator to design a
floating-point multiplier that supports the IEEE
754-1985 standard [34]. This multiplier supports
both the single precision and the double precision
format of the above standard and can perform a
number of functions. Specifically, it can perform
the following:
• it can multiply two single precision operands
and produce the result in the same format,
• it can multiply two double precision operands
and produce the result in the same format and
• it can be fed with two single precision operands,
convert them to double precision and provide
the result in the latter format. The conversion
logic block depicted in Fig. 3 contains the circuitry required to convert the operands from
one format to another.

Fig. 2. Snapshots of Eudoxus.
At first, the multiplier's functionality was described in behavioral HDL and simulated against
a software model. Several design alternatives then had
to be examined to reach an implementation
that balances area and performance requirements.
The floating-point multiplier design makes use of
three kinds of arithmetic cores (see Fig. 3):
• A 53-bit unsigned multiplier for the mantissas.
• Two 11-bit integer adders/subtracters for the
exponents. The first one is used to subtract the
bias from the first exponent whereas the second
adds the second exponent.
• An 11-bit incrementer used for adjusting the
exponent.
• A 53-bit incrementer for the rounding
procedure.

Fig. 3. A floating-point, double precision, IEEE standard
multiplier. (Blocks shown: sign, exponent and mantissa inputs, Conversion Logic, bias subtraction, XOR, 11-bit adder, 53-bit unsigned multiplier, incrementer, rounding.)

The maximum delay of the floating-point multiplier depends heavily on the delay of the unsigned
multiplier for the mantissas. To this end, several
implementations of this fixed-point multiplier had
to be taken into account. Table 2 presents the
floating-point multiplier's performance and area results attained when alternative architectures, provided by
our generator, were used for the mantissa multiplication. We have used Group Carry Look-Ahead
cores for both the 11-bit adders/subtracters and
the two incrementers. Results were obtained using
the Synopsys Design Compiler logic synthesis tool
along with a 0.25 µm CMOS technology.
The availability of the presented core generator
enabled us to examine and evaluate all the different
architectures of Table 2 in a very short time and with
zero design effort.
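
The data path built from these cores can be mimicked in software. The Python sketch below multiplies two double precision numbers using only integer operations that correspond to the listed cores: an XOR for the signs, two exponent additions, a 53-bit by 53-bit unsigned multiplication and two increments. The mapping of each step to a core is our reading of the description, the function names are assumptions, and the sketch handles normalized operands and results only (no zeros, subnormals, infinities or NaNs, and no single precision conversion).

```python
import struct

BIAS = 1023  # exponent bias of the double precision format


def unpack(x: float):
    """Split a double into sign, biased exponent and 53-bit mantissa."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0 or exp == 0x7FF:
        raise ValueError("sketch handles normalized operands only")
    return sign, exp, frac | (1 << 52)  # restore the hidden leading one


def fp_multiply(a: float, b: float) -> float:
    s1, e1, m1 = unpack(a)
    s2, e2, m2 = unpack(b)
    sign = s1 ^ s2                # XOR of the signs
    exp = (e1 - BIAS) + e2        # first adder removes the bias, second adds e2
    prod = m1 * m2                # 53 x 53 unsigned multiplier, 106-bit result
    shift = 52
    if prod >> 105:               # product in [2, 4): shift one more position
        shift = 53
        exp += 1                  # exponent adjustment (11-bit incrementer)
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (mant & 1)):
        mant += 1                 # round to nearest even (53-bit incrementer)
        if mant >> 53:            # rounding overflowed the mantissa
            mant >>= 1
            exp += 1
    if not 1 <= exp <= 2046:
        raise OverflowError("result outside the normalized range")
    bits = (sign << 63) | (exp << 52) | (mant & ((1 << 52) - 1))
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


if __name__ == "__main__":
    print(fp_multiply(1.5, 2.25) == 1.5 * 2.25)
```

Because the mantissa product dominates the work, it is natural that the choice of unsigned multiplier architecture dominates the delay of the hardware version as well.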
Table 2
Floating point multiplier's area and delay results for three
different unsigned multiplier architectures

Unsigned multiplier architecture   Floating point multiplier
                                   Area (sq. microns)   Delay (ns)
54-bit Baugh–Wooley                864,151              41.80
53-bit Carry Save Array            690,870              30.06
54-bit Modified Booth              579,522              14.00

3.2. FIR digital filter

A second example of the use of Eudoxus is the
design of a Finite Impulse Response (FIR) digital
filter with variable coefficients. We have implemented a 16-tap FIR filter with coefficient and data
width of eight bits. The filter's response is given by

$$y(n) = \sum_{k=0}^{n-1} h(k)\, x(n-k) \qquad (1)$$

where the filter coefficients are denoted by h(k) and
x(k) denotes the input at time k. By proper design,
the FIR filter can have linear phase. In this case, the
filter coefficients are symmetrical and Eq. (2) holds.

$$h(k) = h(15-k), \quad k = 0, \ldots, 15 \qquad (2)$$

Eq. (1) in this case can be expressed as follows:

$$y(n) = \sum_{k=0}^{7} \left[ x(k) + x(15-k) \right] h(k) \qquad (3)$$

Symmetrical coefficients lead to a reduction of
the number of multiplications to one half. Fig. 4
presents a possible structure of the filter described
by Eq. (3). This architecture was used in our design
example.

Fig. 4. 16-Tap linear-phase FIR Filter. (Elements shown: a delay line holding x(n) to x(n-15), adders pairing x(n) with x(n-15) through x(n-7) with x(n-8), coefficients h(0) to h(7), an adder tree and the output y(n).)

We have implemented and evaluated three different versions of this FIR filter architecture by
using three different kinds of multiplication units.
By using Eudoxus we created Verilog descriptions
for the multipliers. These descriptions were
included in custom (hand written) Verilog code
describing the other parts of the FIR filter. The total design was implemented in both FPGA and
ASIC technology by using Altera's MAX+Plus II
and Synopsys' Design Compiler, respectively.
The three versions of the multiplier unit were Modified Booth, Baugh–Wooley and Pezaris. Each multiplier block is used eight times in the
corresponding filter implementation. Table 3 summarizes the implementation results of these three
versions.

Table 3
FIR filter implementation results

                  Target: ASIC                              Target: Altera FPGAs
                  (library: 0.25 µm CMOS)                   (device: EPF10K100EQC208-1)
Multiplier        Area (sq. microns)   Speed (MHz)          Area (logic cells)   Speed (MHz)
Modified Booth    160,863              140                  2589                 19.68
Baugh–Wooley      191,568              120                  1848                 21.73
Pezaris           194,032              125                  2116                 22.72
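
The saving obtained from symmetrical coefficients can be checked numerically. The Python sketch below compares a direct 16-tap convolution with the folded form that uses eight multiplications per output sample. It is a software model, not the Verilog design: the function names and test data are assumptions, and the folded sum is written with the delayed samples x(n-k) and x(n-15+k), which is how the delay line of Fig. 4 pairs them; samples before time 0 are taken as zero.

```python
def fir_direct(x, h):
    """Direct form: one multiplication per tap and output sample."""
    taps = len(h)
    return [
        sum(h[k] * x[n - k] for k in range(taps) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_folded(x, h_half):
    """Linear-phase form: add the paired samples first, then multiply once."""
    half = len(h_half)
    taps = 2 * half

    def sample(m):
        return x[m] if 0 <= m < len(x) else 0

    out = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            acc += h_half[k] * (sample(n - k) + sample(n - (taps - 1) + k))
        out.append(acc)
    return out


if __name__ == "__main__":
    h_half = [1, -2, 3, 4, -5, 6, 7, 8]   # h(0)..h(7), arbitrary test values
    h = h_half + h_half[::-1]             # h(k) = h(15 - k)
    x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3, 2, -3, 8, 4]
    print(fir_direct(x, h) == fir_folded(x, h_half))
```

The folded form needs eight multiplier instances instead of sixteen, which matches the eight multiplier blocks used in each filter implementation.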
4. Evaluation of the proposed generator and
comparison against commercial generators
Most available synthesis tools, whether they
target ASIC or FPGA technologies, offer some
kind of core generators for arithmetic modules or
testing structures. For example, we can refer to
Synopsys DesignWare library, Xilinx Core Generator and Altera MegaFunction Wizard.
To judge the efficiency of the core descriptions
produced by Eudoxus, we compared a number of
parallel adder and multiplier cores against the
cores produced by Synopsys DesignWare using a
0.25 µm CMOS technology. The results of the synthesis process regarding area and delay are shown
in Table 4. In all examined cases, the area and
delay of the cores that are generated by Eudoxus
are comparable to those of the DesignWare generated cores. Furthermore, in some cases, Eudoxus
generated cores that are faster than those generated by DesignWare. We thus come to the conclusion that Eudoxus can produce cores that are
as efficient as those generated by commercial
generators. This can be attributed to the fact that
the cores generated by Eudoxus implement efficient algorithms for arithmetic operations.
Table 4
Comparison of the proposed generator and the Synopsys DesignWare library

Circuit                   Size   Synopsys DesignWare                 Eudoxus
                                 Area (sq. microns)   Delay (ns)     Area (sq. microns)   Delay (ns)
Ripple Carry Adder        8      1014                 1.75           1584                 1.69
                          32     4055                 6.88           6336                 6.42
Carry Look-Ahead Adder    8      1473                 1.20           1449                 1.33
                          32     6399                 4.22           4910                 1.56
Parallel-Prefix Adder a   8      1821                 0.68           2740                 0.82
                          32     9258                 1.59           16,101               1.18
Booth Multiplier          8      11,025               2.95           12,260               3.02
                          16     39,006               4.16           50,308               3.95

a Parallel-prefix adders are based on the Brent–Kung architecture in Synopsys DesignWare and on the Kogge–Stone architecture in
Eudoxus.

Similar comparisons of Eudoxus against the Altera or Xilinx core generators, however, showed that
in some cases Altera or Xilinx cores are better than
those of Eudoxus. This is due to the fact that the Altera
or Xilinx core generators exploit the structure of the
LUTs of the FPGA in order to drive the synthesis
tool and produce optimized cores (for example they
can instruct the synthesis tool to use the
fast carry chains effectively), whereas Eudoxus has no knowledge of the underlying technology and therefore
the produced cores strongly depend on the efficiency of the FPGA synthesis tool. However, even
in these cases, the user can either customize the Eudoxus generated cores or select an alternative core
in order to achieve the required characteristics in
terms of area, delay and power. This is not feasible
with the FPGA commercial generators.
The main advantage of Eudoxus compared to
generators that accompany synthesis tools is the
flexibility and portability that it offers. In commercial generators, the user can only define the parameters of the core that he is interested in and integrate
the core with the rest of the design. In several cases,
the user does not have access inside the produced
core, which is somehow encrypted, and therefore
modifications and customization are prohibited.
Finally, the generator can only be used along with
the corresponding synthesis tool; produced cores
cannot be imported into a different synthesis tool. On
the other hand, Eudoxus generated cores are not
encrypted. This allows easy customization of the
cores. For example, the designer can define his
own gate-level implementations in the leaf cells of
the core. Furthermore, Eudoxus lets the designer select, among several different architectures performing a specific operation,
the one that best suits his needs. Moreover, because
of the structural description, the cores that the proposed generator produces can be used with any
logic synthesis tool. This feature provides high portability of the produced cores.
Finally, commercial generators are only available to those who have installed the corresponding
logic synthesis tool at their workplace. Eudoxus,
on the other hand, is always available to the user
through a network interface.
5. Conclusion
We have presented a tool for generation of
cores performing arithmetic operations and testing
structures with network accessibility. Once the
user determines the type of the arithmetic operation or testing structure that he wishes, along with
the operand size and the required architecture, the
tool produces the requested core in structural
VHDL and/or Verilog. The main benefits of the
presented generator are:
1. It supports many (all commonly used) arithmetic operations including addition, subtraction,
multiplication, division, squaring, square rooting and shifting.
2. It also supports pseudorandom testing structures that can be used for creating BIST schemes.
A core without its corresponding testing scheme
is not welcome in several applications.
3. For each operation several architectures are
supported. This leaves the designer with the
ability to perform an architectural exploration
to choose the architecture that best suits his specific needs. For that reason, Eudoxus minimizes
the design effort and reduces time-to-market.
4. The structural code that the tool produces is
unencrypted, allowing the end designer to perform customization or further optimization.
Acknowledgment
This research was partially supported by the
Public Benefit Foundation ‘‘Alexander S. Onassis’’
via its scholarships program.
References
[1] M. Keating, P. Bricaud, Reuse Methodology Manual for
System-on-a-Chip Designs, Kluwer Academic Publishers,
1998.
[2] G. Dare, D. Linzmeier, B. Deitrich, K. Whitelaw, Circuit
generation for creating architecture-based virtual components, in: Proc. of Design Automation and Test in Europe
Conference (User Forum), 2000, pp. 79–83.
[3] D. Alles, G. Vergottini, Taking a look at internet based
design in the year 2001, Electronic Design (January 6)
(1997) 42–50.
[4] M. Spiller, A. Newton, EDA and the Network, in: Proc. of
International Conference on Computer-Aided Design,
1997, pp. 470–476.
[5] L. Walczowski, D. Nalbantis, W. Waller, K. Shi, Analogue
layout generation by world wide web server-based agents,
in: Proc. of European Design and Test Conference, 1997,
pp. 384–388.
[6] T. Vassileva, V. Tchoumatchenko, I. Astinov, I. Furnadjiev, Virtual VHDL Laboratory, in: Proc. of 5th
International Conference on Electronics, Circuits and
Systems, 1998, pp. 325–328.
[7] D. Nalbantis, L. Walczowski, W. Waller, Multiple server
WWW-based synthesis of VLSI circuits, in: Proc. of 5th
International Conference on Electronics, Circuits and
Systems, 1998, pp. 537–540.
[8] V. Tchoumatchenko, T. Vassileva, I. Vassilev, I. Furnadjiev, WWW based distributed FPGA design, in: Proc.
of Design Automation and Test in Europe Conference
(User Forum), 2000, pp. 97–101.
[9] H. Lavana, A. Khetawat, F. Brglez, K. Kozminski,
Executable workflows: a Paradigm for collaborative design
on the internet, in: Proc. of Design Automation Conference, 1997, pp. 553–558.
[10] J. Pardo, M. Iriso, T. Riesgo, E. de la Torre, Y. Torroja, J.
Uceda, AVI: a tool to learn VHDL through internet, in:
Proc. of XV Design of Circuits and Integrated Systems
Conference, 2000, pp. 224–229.
[11] M. Hayes, M. Jamrozik, Internet distance learning: the
problems, the pitfalls and the future, Journal of VLSI
Signal Processing 29 (1–2) (2001) 63–69.
[12] A. Schneider, R. McIlhenny, M. Ercegovac, BigSky—an
on-line arithmetic design tool for FPGAs, in: Proc. of
IEEE Symposium on Field-Programmable Custom Computing Machines, 2000, pp. 303–304.
[13] J. Pihl, J. Oeye, A web-based arithmetic module generator
for high performance VLSI applications, in: Proc. of
International Workshop on IP-Based Synthesis and System
Design, 1998.
[14] Xtensa Configurable Processor, http://www.tensilica.com.
[15] M. Jacome, H. Peixoto, A survey of digital design reuse,
IEEE Design and Test of Computers 18 (3) (2001) 98–107.
[16] D. Kumar, B. Erickson, ASOP: Arithmetic sum-of-products generator, in: Proc. of International Conference on
Computer Design, 1994, pp. 522–526.
[17] M. Aberbour, A. Houelle, H. Mahrez, N. Vaucher, A.
Guyot, On portable macrocell FPU generators for division
and square root operators complying to the full IEEE-754
Standard, IEEE Transactions on VLSI Systems 6 (1) (1998)
114–121.
[18] W. Fang, A. Wu, D. Chen, EmGen—a module generator
for logic emulation applications, IEEE Transactions on
VLSI Systems 7 (4) (1999) 488–492.
[19] J. Pihl, E. Aas, A multiplier and squarer generator for high
performance DSP applications, in: Proc. of 39th Midwest
Symposium on Circuits and Systems, vol. 1, 1996, pp. 109–
112.
[20] J. Abke, E. Barke, J. Stohmann, A universal module
generator for LUT-based FPGAs, in: Proc. of International Workshop on Rapid System Prototyping, 1999, pp.
230–235.
[21] D. Bakalis, K.D. Adaos, D. Lymperopoulos, G. Ph.
Alexiou, D. Nikolos, EUDOXUS: a WWW-based generator of reusable arithmetic cores, in: Proc. of International
Workshop on Rapid System Prototyping, 2001, pp. 182–
187.
[22] K. Hwang, Computer Arithmetic Principles, Architecture
and Design, John Wiley & Sons, 1979.
[23] R.E. Ladner, M.J. Fischer, Parallel prefix computation,
Journal of the ACM 27 (4) (1980) 831–838.
[24] P. Kogge, H. Stone, A parallel algorithm for the efficient
solution of a general class of recurrence equations, IEEE
Transactions on Computers 22 (8) (1973) 783–791.
[25] D. Pezaris, A 40 ns 17-bit by 17-bit array multiplier, IEEE
Transactions on Computers C-20 (4) (1971) 442–447.
[26] R. Baugh, A. Wooley, A two's complement parallel array
multiplication algorithm, IEEE Transactions on Computers C-22 (1–2) (1973) 1045–1047.
[27] M. Annaratone, Digital CMOS Circuit Design, Kluwer
Academic Publishers, 1986.
[28] R.O. Duarte, M. Nicolaidis, H. Bederr, Y. Zorian, Efficient
totally self-checking shifter design, Journal of Electronic
Testing: Theory and Applications 12 (1–2) (1998) 29–
39.
[29] G. Alexiou, N. Kanopoulos, A new serial/parallel two's
complement multiplier for VLSI digital signal processing,
International Journal of Circuit Theory & Applications 20
(1992) 209–214.
[30] P. Ienne, M.A. Viredaz, Bit serial multipliers and squarers,
IEEE Transactions on Computers 43 (12) (1994) 1445–
1450.
[31] P.H. Bardell, W.H. McAnney, J. Savir, Built-in Test for
VLSI: Pseudo-random Techniques, John Wiley & Sons,
New York, 1987.
[32] K. Cattell, S. Zhang, Minimal cost one-dimensional linear
hybrid cellular automata of degree through 500, Journal of
Electronic Testing: Theory and Applications 6 (2) (1995)
255–258.
[33] A.P. Stroele, BIST pattern generators using addition and
subtraction operations, Journal of Electronic Testing:
Theory and Applications 11 (1) (1997) 69–80.
[34] ANSI/IEEE, IEEE standard for binary floating-point
arithmetic, ANSI/IEEE Trans. Std. 754-1985.
Dimitris N. Bakalis received the
Diploma degree in 1995, the M.Sc.
degree in 2000 and the Ph.D. degree in
2001 in Computer Engineering, all
from the Department of Computer
Engineering and Informatics at the
University of Patras in Greece. He
currently holds a lecturer position in
the Physics Department at the same
university. His main research interests
include Rapid Prototyping, VLSI and
System Design and Test, Low Power Design, Test and Design
for Testability. He is a member of the IEEE.
Kostas D. Adaos received a Diploma
degree from the Department of Computer Engineering and Informatics at
the University of Patras in Greece in
1993, where he is pursuing his Ph.D.
degree. He has gained experience in
various positions in industry where he
has worked as an IC and system
designer. He is currently working as an
IC designer for Atmel Corporation.
His main research interests include
Rapid Prototyping, VLSI and System Design. He is a member
of the IEEE.
Dimitrios Lymperopoulos received his
Diploma in Computer Engineering in
2003 from the Department of Computer Engineering and Informatics,
University of Patras, Greece. In 2003
he joined the Embedded Networks and
Applications Lab (ENALAB) of the
Electrical Engineering Department at
Yale University. He is currently pursuing a Ph.D. degree in the area of
embedded systems and wireless sensor
networks.
Maciej Bellos received the Diploma in
Computer Engineering in 1999 and the
M.Sc. degree in Computer Science and
Technology in 2001 from the Department of Computer Engineering and
Informatics, University of Patras,
Greece. He is currently pursuing the
Ph.D. degree. His research is focused
on efficient IC testing and design for
testability. He is a member of the
Technical Chamber of Greece.
H.T. Vergos received the Diploma in
Computer Engineering, and his Ph.D.
in Fault Tolerant Computer Architectures in 1991 and 1996, respectively,
both from the Department of Computer Engineering & Informatics,
University of Patras, Greece, where he
currently holds an Assistant Professor position.
In 1998, he worked for Atmel Multimedia & Communications Group,
developing the first IEEE 802.11 compliant wireless MAC device. He is the author of more than 30
scientific publications and holds one world patent. His interests
include Computer Arithmetic, Rapid Prototyping, Dependable
System Architectures, and Low Power Design and Test.
George Ph. Alexiou received a B.Sc. in
Physics in 1976 and a Ph.D. in Electronics in 1980, both from the University of Patras, Greece, where he is
now a professor in the Department of
Computer Engineering and Informatics and Director of the Microelectronics (VLSI) Lab. He has served on all
programme committees of the IEEE
International Symposium on Quality
Electronic Design (ISQED) since
2000. He has also served on the programme committee of the IEEE
Rapid System Prototyping Workshop since 2000. He has
published papers in a number of international journals and
conference proceedings. His research interests include VLSI
design, VLSI CAD tools, signal processing, digital systems and
RF data communications. He is a member of the IEEE.
Dimitris Nikolos received the B.Sc.
degree in physics, the M.Sc. degree in
Electronics and the Ph.D. degree in
Computer Science, from the University
of Athens, Athens, Greece. He is currently a Full Professor in the Computer
Engineering and Informatics Department of the University of Patras,
Patras, Greece, and head of the Technology and Computer Architecture
Laboratory. He has authored or coauthored more than 140 scientific papers in refereed international journals and conferences and holds one USA patent. His
main research interests are fault-tolerant computing, computer
architecture, VLSI design, low power design, test and design for
testability. Prof. Nikolos was co-recipient of the Best Paper
Award for his work ‘‘Extending the Viability of IDDQ Testing
in the Deep Submicron Era’’ presented at the 3rd IEEE Int.
Symposium on Quality Electronic Design (ISQED 2002). He
has served as the Program Co-chairman of five IEEE Int. On-Line Testing Workshops (1997–2001). He also served on the
program committees for the IEEE Int. On-Line Testing Symposiums (2003–2005), the IEEE International Symposium on
Defect and Fault Tolerance in VLSI Systems (1997–1999),
the Third and Fourth European Dependable Computing Conferences, and the Design Automation and Test in Europe (DATE) Conferences (2000–2005). He was a Guest Co-editor for the June 2002
special issue of the Journal of Electronic Testing, Theory and
Applications (JETTA), which was devoted to the 2001 IEEE
International On-Line Testing Workshop. He is a member of
the IEEE.